Understanding Attention...

윤제로의 제로베이스

Self Paper-Seminar

윤_제로 2023. 5. 22. 22:13

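I got curious whether using attention in NeRF could improve performance, so I'm going to give it a try.

So the first step is to get a better understanding of attention.

If I just read, I'll obviously skim, so I'm going to write it up on Tistory as I read along.

The source is here: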

https://colab.research.google.com/github/metamath1/ml-simple-works/blob/master/mnistattn/mnist_attn.ipynb

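Overview

We classify MNIST with an MLP: each image is cut into patches, each patch is flattened into a vector, and those vectors are fed into the network.

Attention is then applied to each of the split patches.

The same thing could be done without splitting the image, by using a conv filter of size patch * patch with a stride of patch * patch as well. Here, though, the image is actually cut up...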
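
The conv remark can be checked on a toy case. This is only a sketch with made-up data: `img` and `kernel` are random arrays, not MNIST or trained weights. It shows that a filter with kernel size = stride = patch size visits exactly the same non-overlapping windows as cutting the image into patches first.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((28, 28))      # dummy image, not real MNIST
kernel = rng.random((7, 7))     # one hypothetical conv filter
P = 7                           # patch size == kernel size == stride

# (a) Cross-correlation (what deep-learning libraries call convolution) at stride P.
conv_out = np.empty((28 // P, 28 // P))
for j in range(28 // P):
    for k in range(28 // P):
        conv_out[j, k] = np.sum(img[P*j:P*(j+1), P*k:P*(k+1)] * kernel)

# (b) Cut the image into patches first, then take one dot product per patch.
patches = img.reshape(28 // P, P, 28 // P, P).transpose(0, 2, 1, 3)  # (4, 4, 7, 7)
patch_out = np.einsum("jkrc,rc->jk", patches, kernel)

print(np.allclose(conv_out, patch_out))  # True
```

One difference to keep in mind: a conv layer shares its filter across all positions, whereas the model in this post uses a separate Linear for each patch.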

Image splitting function

# https://stackoverflow.com/questions/16856788/slice-2d-array-into-smaller-2d-arrays
def blockshaped(arr, nrows, ncols):
    """
    arr: (N, H, W)

    Return an array of shape (n, h//nrows, w//ncols, nrows, ncols) where
    n * h//nrows * w//ncols * nrows * ncols = arr.size

    If arr is a 2D array, the returned array should look like n subblocks with
    each subblock preserving the "physical" layout of arr.
    """
    n, h, w = arr.shape
    assert h % nrows == 0, "{} rows is not evenly divisible by {}".format(h, nrows)
    assert w % ncols == 0, "{} cols is not evenly divisible by {}".format(w, ncols)
    # original (2D input only)
    # return arr.reshape(h//nrows, nrows, -1, ncols).swapaxes(1,2).reshape(-1, nrows, ncols)
    return ( arr.reshape(n, h//nrows, nrows, -1, ncols).transpose(0, 1, 3, 2, 4)
                .reshape(-1, h//nrows, w//ncols, nrows, ncols) )
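
To see what blockshaped returns, here is a small check on a dummy batch. It is only a sketch: the function body is copied from above without the asserts, and `imgs` is made-up data filled with running integers rather than MNIST.

```python
import numpy as np

def blockshaped(arr, nrows, ncols):
    # Same reshape/transpose as above: (N, H, W) -> (N, H//nrows, W//ncols, nrows, ncols)
    n, h, w = arr.shape
    return (arr.reshape(n, h // nrows, nrows, -1, ncols)
               .transpose(0, 1, 3, 2, 4)
               .reshape(-1, h // nrows, w // ncols, nrows, ncols))

# Dummy batch: 3 "images" of 28x28 (not real MNIST).
imgs = np.arange(3 * 28 * 28).reshape(3, 28, 28)
blocks = blockshaped(imgs, 7, 7)
print(blocks.shape)  # (3, 4, 4, 7, 7)

# Patch (j, k) of image i is the 7x7 window at rows 7j..7j+6 and cols 7k..7k+6.
i, j, k = 1, 2, 3
window = imgs[i, 7 * j:7 * (j + 1), 7 * k:7 * (k + 1)]
print(np.array_equal(blocks[i, j, k], window))  # True
```

So the second and third axes index the patch position in the grid, and the last two axes index the pixel inside the patch.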

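Image splitting and augmentation

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file into a pandas DataFrame.
D_train = pd.read_csv("sample_data/mnist_train_small.csv")

# Columns 1 onward are stored as the image X; column 0 is stored as the target y.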
X_train = D_train.iloc[:,1:]
y_train = D_train.iloc[:,0]

X_train = X_train.to_numpy()
y_train = y_train.to_numpy()

X_train.shape, y_train.shape

sample_idx = 32
X_samples = X_train[sample_idx:sample_idx+3]
y_samples = y_train[sample_idx:sample_idx+3]

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(10,5))
ax[0].imshow(X_samples[0].reshape(28,28))
ax[1].imshow(X_samples[1].reshape(28,28))
ax[2].imshow(X_samples[2].reshape(28,28))

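The purpose of attention is to decide which part of the image to focus on in order to predict the digit.

In the MNIST dataset the digits sit neatly in the center of the image, so we apply augmentation to randomize their position and size.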

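import albumentations as A
import cv2
import matplotlib.gridspec as gridspec
import numpy as np

# https://github.com/albumentations-team/albumentations#a-simple-example
# Set up shift, scale, and rotate. The shift and scale limits are fractions in (0, 1.0);
# rotation is disabled here (rotate_limit=0).
# A negative scale_limit means shrinking, so the image is only shrunk, never enlarged.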
transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.3, scale_limit=(-0.4, 0), 
                       rotate_limit=0, p=0.8, border_mode=cv2.BORDER_CONSTANT)
])

# Augment an image
transformed_image = np.array(
        [transform(image=X_sample.reshape(28,28).astype(np.float32))["image"] 
        for X_sample in X_samples])
print(transformed_image.shape)

X_blocked = blockshaped(transformed_image, 7, 7)
print(X_blocked.shape)

# https://stackoverflow.com/questions/34933905/matplotlib-adding-subplots-to-a-subplot
fig = plt.figure(figsize=(10, 3))
outer = gridspec.GridSpec(1, 3, wspace=0.2, hspace=0.2)

for i in range(3):
    inner = gridspec.GridSpecFromSubplotSpec(4, 4, subplot_spec=outer[i],
                                             wspace=0., hspace=0.)
    
    for j in range(4):
        for k in range(4):
            ax = plt.Subplot(fig, inner[j,k])
            ax.set_xticks([])
            ax.set_yticks([])
            ax.imshow(X_blocked[i,j,k])
            fig.add_subplot(ax)

plt.show()

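As the figure shows, the digits have moved around and each image has been torn into a 4*4 grid of patches.

From here, each patch is fed into its own separate Linear layer.

The 16 resulting vectors of length 16 are then concatenated into one vector of total length 256.

This vector is fed into linear(256,10) to produce the final output vector of length 10...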
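
The shape bookkeeping in the three steps above can be sketched in NumPy. This is an illustration under assumptions, not the model itself: the weights `W` and `V` are random stand-ins, biases are left out, and `x` is dummy data in place of flattened MNIST patches.

```python
import numpy as np

rng = np.random.default_rng(0)
# batch, patches, pixels per patch, hidden units per patch, classes
N, L, D, O, C = 10, 16, 49, 16, 10

x = rng.random((N, L, D))                    # flattened patches (dummy data)
W = rng.standard_normal((L, D, O)) * 0.1     # one (49, 16) weight per patch, no bias
V = rng.standard_normal((L * O, C)) * 0.1    # final (256, 10) weight, no bias

# Patch i goes through its own weight W[i]; the 16 results are joined along axis 1.
hidden = np.concatenate([x[:, i] @ W[i] for i in range(L)], axis=1)   # (10, 256)
logits = np.maximum(hidden, 0) @ V                                    # ReLU, then (10, 10)

print(hidden.shape, logits.shape)  # (10, 256) (10, 10)
```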

Dataset definition

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class MnistDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        self.transform = transform
        # Read the CSV file and prepare the images and targets.
        D = pd.read_csv(csv_file)
        self.X = D.iloc[:, 1:].to_numpy().reshape(D.shape[0], 28, 28) /255. # (N,28,28)
        self.y = D.iloc[:, 0].to_numpy()  # (N,)

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        X_train = self.X[idx]
        y_train = self.y[idx]

        # If a transform was given, augment the image
        # before returning it.
        if self.transform:
            # use the transform stored on the dataset, not the global variable
            transformed = self.transform(image=X_train)
            X_train = transformed["image"]

        return { "image": torch.tensor(X_train, dtype=torch.float),
                "target": torch.tensor(y_train, dtype=torch.long) }

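Mini-batch test with DataLoader

# train_dataset is built the same way as in the training section further down
train_dataset = MnistDataset("sample_data/mnist_train_small.csv", transform=transform)
train_loader = DataLoader(train_dataset, batch_size=10, shuffle=True)
# batch size of 10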

Building the models

We build two models: a baseline without attention and one with attention.

They are called Mnist and MnistAttn.

class Mnist(nn.Module):
    def __init__(self, image_size, patch_size, hidden_patch, n_class):
        """
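        Images and patches are square.
        image_size:   size of the original image before it is split, 28
        patch_size:   size of one patch when the image is split, 7
        hidden_patch: number of outputs of the Linear layer applied to each patch, 16
        n_class:      number of classes, 10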
        """
        super(Mnist, self).__init__()

        self.patch_size = patch_size
        
        assert image_size % patch_size == 0, \
            "{} is not evenly divisible by {}".format(image_size, patch_size)

        self.n_patch = image_size // patch_size # 4
        self.blocked_linears = torch.nn.ModuleList([ 
                                    nn.Linear(patch_size**2, hidden_patch) 
                                    for i in range(self.n_patch**2) ])

        self.hidden = nn.Linear(self.n_patch**2 * hidden_patch, n_class)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        """
        x : (N, n_patch_rows, n_patch_cols, patch_rows, patch_cols)
        """
        # The shape comments assume a mini-batch of 10, image size (28,28),
        # patch size (7,7), and 16 outputs from each per-patch Linear().
        # In the comments,
        # N: batch size
        N, _, _, Pr, Pc = x.shape

        # forward receives the already-split image; here the input is (10, 4, 4, 7, 7).
        # Collapse the 4x4 grid of patches into a single dimension of 16.
        x = x.reshape(N, -1, Pr, Pc) # (N, 16, 7, 7) 

        # Flatten each (7,7) patch into a single row.
        x = x.reshape(N, self.n_patch**2, self.patch_size**2) # (N, 16, 49)
        
        # Run the sixteen Linear() layers stored in the ModuleList one after another,
        # then concatenate the sixteen results along axis 1.
        blocks = torch.cat( [ linear_i(x[:, i]) 
                for i, linear_i in enumerate(self.blocked_linears) ], dim=1) # (N, 16*16)

        x = nn.functional.relu(blocks) # (N, 256)->(N, 256)  
        x = self.hidden(x) # (N, 256)->(N,10)
        x = self.logsoftmax(x) # (N,10)->(N,10)
        
        return x

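In MnistAttn, each of the 16 image patches goes into two Linear layers: one produces a hidden vector of length 16, and the other produces a single number used to compute the attention weight.

The numbers produced by Linear(49,1) are passed through tanh (as the code shows) and then softmax to turn them into weights.

Each weight is then multiplied with its length-16 hidden vector, everything is concatenated into a 256-vector, and that goes through linear(256,10).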
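
The weighting step can be sketched in NumPy as follows. The arrays `scores` and `blocks` are random stand-ins (assumptions for illustration) for the Linear(49,1) outputs and the hidden vectors, and `softmax` is a small helper written here, not a library call.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, O = 2, 16, 16

scores = rng.standard_normal((N, L))   # stand-in for the 16 Linear(49,1) outputs per image
blocks = rng.random((N, L, O))         # stand-in for the 16 hidden vectors per image

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

alpha = softmax(np.tanh(scores), axis=1)      # (N, 16), each row sums to 1
weighted = blocks * alpha[:, :, None]         # broadcast (N,16,16) * (N,16,1)
flat = weighted.reshape(N, -1)                # (N, 256), the input to linear(256,10)

print(alpha.sum(axis=1))   # each entry is 1 up to rounding
print(flat.shape)          # (2, 256)
```

Because tanh keeps every score inside (-1, 1), the ratio between the largest and smallest weight within one image stays below e^2 (about 7.39), so this attention can downweight a patch but never switch it off completely.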

class MnistAttn(nn.Module):
    def __init__(self, image_size, patch_size, hidden_patch, n_class):
        """
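        Images and patches are square.
        image_size:   size of the original image before it is split, 28
        patch_size:   size of one patch when the image is split, 7
        hidden_patch: number of outputs of the Linear layer applied to each patch, 16
        n_class:      number of classes, 10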
        """
        super(MnistAttn, self).__init__()

        self.patch_size = patch_size
        self.hidden_patch = hidden_patch
        self.alpha = None # attention weights
        
        assert image_size % patch_size == 0, \
            "{} is not evenly divisible by {}".format(image_size, patch_size)

        self.n_patch = image_size // patch_size # 4
        # 16 Linear layers, each (49, 16): per-patch hidden vectors
        self.blocked_linears = torch.nn.ModuleList([ 
                                    nn.Linear(patch_size**2, hidden_patch) 
                                    for i in range(self.n_patch**2) ])
        # 16 Linear layers, each (49, 1): per-patch attention scores
        self.attns = torch.nn.ModuleList([
                                    nn.Linear(patch_size**2, 1)
                                    for i in range(self.n_patch**2) ])
        # (256, 10)
        self.hidden = nn.Linear(self.n_patch**2 * hidden_patch, n_class)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        """
        x : (N, n_patch_rows, n_patch_cols, patch_rows, patch_cols)
        """
        # The shape comments assume a mini-batch of 10, image size (28,28),
        # patch size (7,7), and 16 outputs from each per-patch Linear().
        # In the comments,
        # N: batch size
        # o: output size of the Linear that takes a patch
        # l: number of patches
        N, _, _, Pr, Pc = x.shape

        x = x.reshape(N, -1, Pr, Pc) # (N, 16, 7, 7)
        x = x.reshape(N, self.n_patch**2, self.patch_size**2) # (N, 16, 49)
        
        blocks = ( torch.cat([ nn.functional.relu(linear_i(x[:, i])) # (N, o) * l, (N, 16) * 16
                    for i, linear_i in enumerate(self.blocked_linears) ], dim=0)
                    .reshape(-1, N, self.hidden_patch).transpose(1,0) ) # (N, l, o), (N, 16, 16) 
        
        attn_weights = torch.cat( [ torch.tanh(attn_i(x[:, i])) 
                for i, attn_i in enumerate(self.attns) ], dim=1)
        attn_weights = nn.functional.softmax(attn_weights, dim=1) # (N, l), (N, 16)
        self.alpha = attn_weights
        attn_weights = torch.unsqueeze(attn_weights, 2) # (N, l, 1), (N, 16, 1)

        x = blocks * attn_weights # (N,l,o)*(N,l,1)=(N, 16, 16)*(N, 16, 1)=(N, 16, 16)
        
        # reshape (N, 16, 16) -> (N, 16*16)
        x = x.reshape(N,-1)
        
        x = self.hidden(x) # (N,10)
        x = self.logsoftmax(x)
        
        return x
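
The cat / reshape / transpose chain that builds `blocks` in forward() is a bit hard to read, so here is a NumPy sketch of what it does. The list `outs` holds random stand-ins for the 16 per-patch outputs (an assumption for illustration). The point is that the chain gives the same (N, l, o) array as stacking the outputs on a new axis 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, O = 10, 16, 16
outs = [rng.random((N, O)) for _ in range(L)]   # stand-ins for the 16 per-patch outputs

# Route taken in forward(): concatenate on axis 0, reshape, then swap the first two axes.
a = np.concatenate(outs, axis=0).reshape(-1, N, O).swapaxes(0, 1)   # (N, L, O)

# Stacking directly on a new axis 1 gives the same array.
b = np.stack(outs, axis=1)                                          # (N, L, O)

print(a.shape, np.array_equal(a, b))  # (10, 16, 16) True
```

In PyTorch the analogous shortcut would be torch.stack(..., dim=1), which might be easier to follow than the three-step version.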

Training

batch_size = 10

# transform = A.Compose([
#     A.ShiftScaleRotate(shift_limit=0.3, scale_limit=(0, 0), 
#                        rotate_limit=0, p=0.8)
# ])

transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.3, scale_limit=(-0.4, 0), 
                       rotate_limit=0, p=0.8, border_mode=cv2.BORDER_CONSTANT)
])

train_dataset = MnistDataset("sample_data/mnist_train_small.csv", 
                             transform=transform)
print(len(train_dataset))
train_loader = DataLoader(train_dataset, batch_size=batch_size, 
                          num_workers=4, shuffle=True, pin_memory=True)

test_dataset = MnistDataset("sample_data/mnist_test.csv",
                            transform=transform)
print(len(test_dataset))
test_loader = DataLoader(test_dataset, batch_size=batch_size, 
                         num_workers=4, shuffle=True, pin_memory=True)

criterion = nn.NLLLoss()
epoches = 250
lr = 0.3

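# hidden_patch is 17 here versus 16 for MnistAttn, which gives the two models
# similar parameter counts (16,330 vs. 16,170).
model = Mnist(28, 7, 17, 10)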
model.cuda()

model_attn = MnistAttn(28, 7, 16, 10)
model_attn.cuda()

from tqdm import tqdm

def train(model, acc):
    
    for epoch in range(epoches):
        cur_loss = 0
        correct = 0
        for i, data in tqdm(enumerate(train_loader), total=len(train_loader)):
            X = data['image']
            y = data['target']

            # Cut the mini-batch into 7x7 patches.
            X = torch.tensor(blockshaped(X.numpy(), 7, 7))
            
            X = X.cuda()
            y = y.cuda()
            
            # No optimizer is used, so reset the gradients on the model itself,
            # since the model owns the params.
            model.zero_grad()
            out = model(X)
            
            loss = criterion(out, y)
            loss.backward()

            # No optimizer is used, so update the params by hand (a plain SGD step).
            for p in model.parameters():
                p.data.add_(p.grad.data, alpha = -lr)
            
            correct += (torch.exp(out).argmax(axis=1) == y).float().sum()
            # detach() so the running sum does not keep every batch's graph alive
            cur_loss += loss.detach()

        acc_train = 100. * correct / len(train_dataset)

        # test acc.
        with torch.no_grad():
            correct = 0
            for i, data in enumerate(test_loader):
                X = data['image']
                y = data['target']

                X = torch.tensor(blockshaped(X.numpy(), 7, 7), device="cuda:0")
                # X = X.cuda()
                y = y.cuda()

                out = model(X)
                correct += (torch.exp(out).argmax(axis=1) == y).float().sum()
        acc_test = 100. * correct / len(test_dataset)

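        # Average over the training batches; i cannot be used here because the test loop overwrote it.
        print(f"EPOCH: {(epoch+1):02d}, LOSS: {cur_loss.item()/len(train_loader):e}, TRAIN ACC.: {acc_train:e}, TEST ACC.: {acc_test:e}")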
        
        acc['train'].append(acc_train.item())
        acc['test'].append(acc_test.item())

model_acc = {'train':[], 'test':[]}

train(model, model_acc)

model_attn_acc = {'train':[], 'test':[]}

train(model_attn, model_attn_acc)
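
The training loop updates the parameters by hand instead of using an optimizer. As a minimal sketch of that update rule, here is one step on a toy linear classifier in NumPy. Everything in it is an assumption for illustration: the data is random, the model is a single weight matrix `W` with 4 input features, and the gradient is written out by hand instead of coming from backward().

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 10, 4, 10
x = rng.random((N, D))                 # dummy features
y = rng.integers(0, C, size=N)         # dummy labels
W = np.zeros((D, C))
lr = 0.3

def nll_of_log_softmax(W):
    z = x @ W
    z = z - z.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))   # log-softmax
    return -logp[np.arange(N), y].mean(), logp                # NLL averaged over the batch

loss_before, logp = nll_of_log_softmax(W)

# For each sample, d(NLL)/d(logits) is softmax minus the one-hot target.
grad_z = np.exp(logp)
grad_z[np.arange(N), y] -= 1.0
grad_W = x.T @ grad_z / N              # divide by N because the loss is a batch mean

W -= lr * grad_W                       # same rule as p.data.add_(p.grad.data, alpha=-lr)
loss_after, _ = nll_of_log_softmax(W)

print(loss_before, loss_after)
```

With all-zero weights the starting loss is log(10), the loss of a uniform guess over 10 classes, and the single step lowers it on this toy problem.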
