
CNN 005

· About 3 min read

Computer Vision

  • Classification
  • Classification with Localization
  • Object Detection
| - | ANN | CNN |
|---|---|---|
| Input | 1D vector | 3D tensor (height, width, channels) |
| Connections | Fully connected | Local connections (receptive fields) |
| Overfitting | Prone to overfitting | Less prone to overfitting |

Convolutional Neural Networks (CNN)

  1. Convolutional Layer (CONV)
  2. Pooling Layer (POOL)
  3. Fully Connected Layer (FC)

LENET-5

Convolutional Layer (CONV)

  • The first layer to extract features from an input image
  • Core building block of a CNN
  • Convolution is the basic operation in this layer
  • A number of filters (e.g. edge detectors) are applied to the input image.
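
As a minimal sketch of what "applying a filter" means, the snippet below slides a 3×3 vertical edge detector over a small image with no padding and a stride of 1. The function name `conv2d`, the 6×6 image, and the filter values are illustrative assumptions, not taken from any particular library.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is cross-correlation.
    """
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    out = []
    for i in range(n_rows - f_rows + 1):
        row = []
        for j in range(n_cols - f_cols + 1):
            # Element-wise product of the filter and the patch under it.
            total = 0
            for a in range(f_rows):
                for b in range(f_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 6x6 image: bright left half (10), dark right half (0).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Illustrative vertical edge detector.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 30, 30, 0]
```

The 4×4 feature map responds only in the columns where the bright and dark regions meet, which is what makes this filter an edge detector.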

Padding

  • Padding is used to control the spatial size of the output feature maps.
  • Negative values can naturally appear at the edges of the output when padding is used; they are usually not a problem because activation functions and later layers follow.
  • Input matrix dimension: $n \times n \times c$ (height, width, channels)
  • Filter size: $f \times f$
  • Padding ($P$): number of pixels added to each side of the input border (e.g. $P = 1$)
  • $(n \times n) * (f \times f) \to (n + 2P - f + 1) \times (n + 2P - f + 1)$
    • Example: a $5 \times 5$ input with a $3 \times 3$ filter and padding of 1 results in a $5 \times 5$ output feature map.
  • If the input and output matrix dimensions are the same, then $P = \frac{f - 1}{2}$.
  • Valid padding ($P = 0$): no padding
  • Same padding ($P = \frac{f - 1}{2}$): the output size equals the input size (for a stride of 1 and odd $f$).
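
The padding arithmetic above can be checked with a short sketch. The helper names `conv_output_size`, `same_padding`, and `zero_pad` are hypothetical, and the code assumes a square input, a stride of 1, and zero-valued padding.

```python
def conv_output_size(n, f, p=0):
    """Output side length for an n x n input, f x f filter, padding p (stride 1)."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps the output the same size as the input.

    Assumption: f is odd, otherwise (f - 1) / 2 is not a whole number.
    """
    if f % 2 == 0:
        raise ValueError("same padding as (f - 1) / 2 needs an odd filter size")
    return (f - 1) // 2


def zero_pad(image, p):
    """Add p rows/columns of zeros on every side of a 2D list."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


print(conv_output_size(5, 3, p=0))                # valid padding -> 3
print(conv_output_size(5, 3, p=same_padding(3)))  # same padding  -> 5
print(zero_pad([[1, 2], [3, 4]], 1))
```

With a 3×3 filter, `same_padding` gives $P = 1$, so a 5×5 input stays 5×5, matching the example in the list.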

Stride

  • The stride is the number of pixels by which the filter slides across the input image at each step.
[Figures: No Padding, Strides | Stride with Padding]
  • GitHub: vdumoulin/conv_arithmetic
  • Input matrix dimension: $n \times n$
  • Filter size: $f \times f$
  • Padding: $P$
  • Stride: $S$
  • Output size = $\left\lfloor \frac{n + 2P - f}{S} + 1 \right\rfloor \times \left\lfloor \frac{n + 2P - f}{S} + 1 \right\rfloor$
    • Example: input matrix dimension $5 \times 5$, filter size $3 \times 3$, padding $1$, stride $2$ results in an output size of $3 \times 3$, since $\left\lfloor \frac{5 + 2 - 3}{2} + 1 \right\rfloor = 3$.
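
A small sketch of the output-size formula with stride, assuming a square input and filter. The function name `strided_output_size` is hypothetical; integer division plays the role of the floor.

```python
def strided_output_size(n, f, p=0, s=1):
    """Side length of the output: floor((n + 2P - f) / S + 1)."""
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    # Adding 1 after the integer division equals flooring the whole expression.
    return (n + 2 * p - f) // s + 1


# 5x5 input, 3x3 filter, padding 1, stride 2
print(strided_output_size(5, 3, p=1, s=2))  # 3

# 5x5 input, 3x3 filter, no padding, stride 2
print(strided_output_size(5, 3, p=0, s=2))  # 2
```

The floor matters when the filter does not fit evenly: a 6×6 input with a 3×3 filter and stride 2 gives $\lfloor 2.5 \rfloor = 2$, so the last column and row of the input are never covered.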

Pooling Layer (POOL)

  • A downsampling operation that reduces the spatial dimensions (height and width) of a feature map.
  • Reduces the number of parameters and the amount of computation for large images while retaining the most valuable information.
  • Max pooling
  • Average pooling
  • Sum pooling
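
The three pooling variants differ only in how each window is reduced to a single value. The sketch below assumes a 2×2 window with a stride of 2; the function name `pool2d` and the sample feature map are illustrative.

```python
def pool2d(matrix, size=2, stride=2, mode="max"):
    """Reduce each size x size window of a 2D list to one value."""
    reducers = {
        "max": max,
        "sum": sum,
        "average": lambda window: sum(window) / len(window),
    }
    reduce_window = reducers[mode]
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = []
    for i in range(0, n_rows - size + 1, stride):
        row = []
        for j in range(0, n_cols - size + 1, stride):
            window = [matrix[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_window(window))
        out.append(row)
    return out


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

print(pool2d(feature_map, mode="max"))      # [[6, 5], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.5, 2.0], [2.5, 6.0]]
print(pool2d(feature_map, mode="sum"))      # [[14, 8], [10, 24]]
```

In each case the 4×4 input shrinks to 2×2, and pooling itself has no learnable weights.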

Fully Connected Layer (FC)

  • A traditional Multi-layer Perceptron (MLP) layer
  • For multi-class classification, Softmax activation is usually used.
  • Softmax ensures the outputs are non-negative and sum to 1, so they can be read as class probabilities.
  • The outputs of the CONV and POOL layers represent high-level features of the input image.
  • The FC layer takes these features to classify the input image into the desired output classes.
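
As a minimal sketch of the final classification step, the snippet below applies Softmax to the raw scores of an FC layer. The three scores are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability choice rather than something the notes above prescribe.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical FC-layer scores for three classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))

print(probs)       # three positive values that sum to 1
print(predicted)   # 0, the class with the largest score
```

The predicted class is the index with the highest probability, which is always the index of the largest raw score.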