PyTorch Lecture 10: Basic CNN

lfgwy 2022. 7. 26. 04:04

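Over the break I started studying Prof. Sung Kim's PyTorch Zero To All lectures with my student society, so I decided to write up my own notes as I go.

Many thanks to Prof. Sung Kim for sharing such a good course.

Lecture link: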

https://www.youtube.com/playlist?list=PLlMkM4tgfjnJ3I-dbhO9JTw7gNty6o_2m

CNN (Convolutional Neural Network)

A CNN (Convolutional Neural Network) is a network built from convolution layers and pooling layers.

A CNN is a locally connected neural net: unlike the fully connected neural nets covered so far, each unit looks at only a part of the image. Because of this, it has fewer weights than a fully connected layer and handles images more flexibly.
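
To make the "fewer weights" point concrete, here is a rough parameter count. The layer sizes (a 28 x 28 grayscale input, 10 outputs, ten 5 x 5 filters) are assumptions chosen for illustration, not figures from the lecture.

```python
# Hypothetical sizes chosen for illustration (28 x 28 grayscale input).
height, width, depth = 28, 28, 1
num_outputs = 10

# Fully connected: every output unit has one weight per input pixel, plus a bias.
fc_params = (height * width * depth) * num_outputs + num_outputs

# Convolution: each filter has kernel * kernel * depth weights, plus a bias,
# and the same filter is reused at every position of the image.
kernel_size = 5
num_filters = 10
conv_params = (kernel_size * kernel_size * depth) * num_filters + num_filters

print(fc_params)    # 7850
print(conv_params)  # 260
```

The convolution count does not depend on the image size at all, which is where most of the saving comes from.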

Convolution Layer

With the simple softmax classifier from the previous lecture, every pixel of the given image is used as input. A convolution layer, in contrast, looks at only a part (patch) of the image at a time.

In the example below, the image is a 3 (width) x 3 (height) x 1 (depth) grayscale image, and we use a 2 x 2 x 1 filter (kernel).

A filter is a small weight matrix. Its width and height can be chosen freely, but its depth is normally set equal to the depth of the image. The filter slides across the image and takes a dot product at each position to produce the output.

The distance the filter moves at each step is called the stride. In the example below the stride is 1, so the filter moves one cell at a time: at each step it takes the dot product of the current patch and the filter, then shifts by one cell. A 2 x 2 x 1 filter fits on a 3 x 3 x 1 image in 4 positions, so the output has 4 values and a size of 2 x 2 x 1.
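
The sliding dot product can be sketched in plain Python. The function name `conv2d` and the pixel and filter values are made up for illustration. Like deep learning libraries, this sketch does not flip the kernel.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the grid of dot products."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = (image_h - kernel_h) // stride + 1
    out_w = (image_w - kernel_w) // stride + 1
    output = []
    for row in range(out_h):
        output_row = []
        for col in range(out_w):
            top, left = row * stride, col * stride
            # Dot product between the current patch and the filter.
            total = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            output_row.append(total)
        output.append(output_row)
    return output


# Illustrative 3 x 3 image and 2 x 2 filter (values are arbitrary).
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[6, 8], [12, 14]]
```

The filter fits in four positions, so the result is a 2 x 2 grid, matching the count in the text.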

A technique commonly used with convolution layers is padding. Padding means filling the border around the input image with a specific value, and it is mainly used to control the size of the output. The example below uses zero padding, one of the most common padding schemes, which fills the border of the input image with zeros. With it, the output is 3 x 3 x 1 instead of the earlier 2 x 2 x 1.
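
A minimal sketch of zero padding, together with the standard output-size formula (W - F + 2P) / S + 1. The helper names `zero_pad` and `conv_output_size` are hypothetical. The last line uses a 3 x 3 filter with padding 1 as an assumed case that keeps the output at 3 x 3; the figure's exact padding layout is not reproduced here.

```python
def zero_pad(image, pad):
    """Surround a 2-D list with `pad` rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: (W - F + 2P) / S + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = zero_pad(image, 1)
print(len(padded), len(padded[0]))        # 5 5
print(conv_output_size(3, 2))             # 2 (no padding, 2 x 2 filter)
print(conv_output_size(3, 3, padding=1))  # 3 (assumed 3 x 3 filter, padding 1)
```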

Naturally, more than one filter can be used. The number of filters determines the depth of the activation map.

* Activation map: the output produced as a filter slides over the input of a convolutional layer is called a feature map, and the result of applying an activation function to a feature map is called an activation map.

In the example below, 6 filters are used. Each filter yields one 28 x 28 x 1 activation map, so together they form a 28 x 28 x 6 activation map.
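
As a small sketch of the same idea, the code below applies two filters to one image and passes each feature map through ReLU, giving an activation map of depth 2. The image, the filter values, and the helper names are assumptions for illustration, and the helper only handles square inputs with stride 1.

```python
def conv2d(image, kernel):
    """Stride-1 sliding dot product of a square kernel over a square image."""
    k = len(kernel)
    size = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(size)
        ]
        for r in range(size)
    ]


def relu(feature_map):
    """Replace negative values with 0."""
    return [[max(0, v) for v in row] for row in feature_map]


image = [[3, 0, 1],
         [0, 2, 0],
         [1, 0, 4]]
filters = [
    [[1, 0], [0, 1]],
    [[1, -1], [0, 0]],
]
# One activation map per filter; the number of filters is the output depth.
activation_maps = [relu(conv2d(image, f)) for f in filters]
print(len(activation_maps))  # 2
print(activation_maps[0])    # [[5, 0], [0, 6]]
print(activation_maps[1])    # [[3, 0], [0, 2]]
```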

Source: Prof. Sung Kim's PyTorch Zero To All lecture slides (originally from CS231n)

A ConvNet is a stack of several convolution layers.

Pooling Layer

Pooling is also called subsampling, which means reducing a given image to a smaller one. Pooling is applied to the activation maps produced by a convolution layer.

There are several kinds of pooling, including max pooling and average pooling.

The example below applies max pooling with a 2 x 2 filter and stride 2, keeping only the largest value in each patch.

* Average pooling takes the mean of each patch.
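
Both pooling variants can be sketched with one small function. The name `pool2d`, its `mode` argument, and the 4 x 4 input values are assumptions for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size patches."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(out_h):
        pooled_row = []
        for c in range(out_w):
            patch = [
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                pooled_row.append(max(patch))
            else:
                pooled_row.append(sum(patch) / len(patch))
        pooled.append(pooled_row)
    return pooled


activation_map = [[1, 1, 2, 4],
                  [5, 6, 7, 8],
                  [3, 2, 1, 0],
                  [1, 2, 3, 4]]
max_pooled = pool2d(activation_map, mode="max")
avg_pooled = pool2d(activation_map, mode="average")
print(max_pooled)  # [[6, 8], [3, 4]]
print(avg_pooled)  # [[3.25, 5.25], [2.0, 2.0]]
```

With a 2 x 2 window and stride 2 the patches do not overlap, so a 4 x 4 input shrinks to 2 x 2.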

# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        # MNIST images are grayscale, so there is 1 input channel; 10 is the
        # number of output channels we want, and the kernel size is 5.
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # Takes the 10 output channels of the previous convolution layer as
        # input and produces 20 output channels.
        self.mp = nn.MaxPool2d(2)
        # The max pooling kernel size is 2.
        self.fc = nn.Linear(320, 10)
        # The final fully connected layer is a linear layer.
        # The value 320 is hard to know in advance. One technique is to put in
        # an arbitrary value and run the code to trigger an error on purpose;
        # the correct value can then be read from the error message.
        # The image attached below the code block shows what this looks like.

    def forward(self, x):
        in_size = x.size(0)
        # batch size
        x = F.relu(self.mp(self.conv1(x)))
        # convolution layer -> pooling layer -> activation function(ReLU)
        x = F.relu(self.mp(self.conv2(x)))
        # convolution layer -> pooling layer -> activation function(ReLU)
        x = x.view(in_size, -1)  # flatten the tensor
        # Flatten the tensor before passing it to the final linear layer.
        # Here in_size is the batch size.
        x = self.fc(x)
        return F.log_softmax(x, dim=1)
        # dim=1 applies log-softmax across the class scores of each sample.
        # Computing log and softmax together is numerically more stable than
        # applying them separately, and the resulting log-probabilities are
        # what F.nll_loss expects.

model = Net()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Since PyTorch 0.4, tensors track gradients directly, so no Variable wrapper is needed.
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        # negative log likelihood loss
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    # No backpropagation happens during evaluation, so disable gradient
    # tracking to save memory (this replaces the removed volatile=True flag).
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
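
The model returns log-softmax values and the loss is the negative log likelihood. A plain-Python sketch of that pairing is shown below, using the log-sum-exp trick for numerical stability. The function names mirror the PyTorch ones but are standalone illustrations, and the logits are made-up values chosen so that a naive `math.exp(1000.0)` would overflow.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def nll_loss(log_probs, target):
    """Negative log likelihood of the target class for one sample."""
    return -log_probs[target]


logits = [1000.0, 1001.0, 1002.0]  # large values on purpose
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=2)
print(round(loss, 4))  # 0.4076
```

Subtracting the maximum before exponentiating keeps every exponent at or below zero, which is why the large logits cause no overflow here.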

This is the image mentioned in the code block.
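
The value 320 can also be worked out by hand by tracing the spatial size through each layer. The sketch below assumes the layer settings from the code above (5 x 5 kernels, no padding, stride 1, 2 x 2 max pooling); the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution: (W - F + 2P) / S + 1."""
    return (size - kernel_size + 2 * padding) // stride + 1


def pool_out(size, kernel_size):
    """Spatial size after pooling with stride equal to the kernel size,
    which is the default for nn.MaxPool2d."""
    return size // kernel_size


size = 28                  # MNIST images are 28 x 28
size = conv_out(size, 5)   # conv1: 28 -> 24
size = pool_out(size, 2)   # pool:  24 -> 12
size = conv_out(size, 5)   # conv2: 12 -> 8
size = pool_out(size, 2)   # pool:   8 -> 4
channels = 20              # conv2 produces 20 output channels
flattened = channels * size * size
print(flattened)  # 320
```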
