CNN 005
Computer Vision
- Classification
- Classification with Localization
- Object Detection
| - | ANN | CNN |
|---|---|---|
| Input | 1D vector | 3D tensor (height, width, channels) |
| Connections | Fully connected | Local connections (receptive fields) |
| Overfitting | Prone to overfitting | Less prone to overfitting |
Convolutional Neural Networks (CNN)
- Convolutional Layer (CONV)
- Pooling Layer (POOL)
- Fully Connected Layer (FC)

Convolutional Layer (CONV)
- The first layer to extract features from an input image
- Core building block of a CNN
- Convolution is the basic operation in this layer
- A number of filters (e.g. edge detectors) are applied to the input image.
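To make the filter idea concrete, here is a minimal sketch in plain Python. It assumes a single-channel image, stride 1, and no padding. The function name `conv2d_valid`, the 6x6 image, and the vertical-edge filter are illustrative choices, not taken from these notes. As in most deep learning libraries, the filter is not flipped, so the operation is technically a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    out = []
    for i in range(n_rows - f_rows + 1):
        row = []
        for j in range(n_cols - f_cols + 1):
            total = 0
            # Element-wise product of the filter and the patch under it.
            for a in range(f_rows):
                for b in range(f_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 6x6 image: bright left half (10), dark right half (0).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# A common vertical-edge filter: positive left column, negative right column.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The 4x4 feature map is non-zero only in the two middle columns, which is where the bright-to-dark transition sits in the input.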
Padding
- Padding is used to control the spatial size of the output feature maps.
- Negative values can naturally arise in the output near the padded edges (for example, with an edge-detection filter). They are usually not a problem because activation functions and later layers come afterward.
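- Input matrix dimension: (height, width, channels), with spatial size $n \times n$
- Filter size: $f \times f$
- Padding ($p$): the number of pixels added to each border of the input, e.g. $p = 1$
- Output size (stride 1): $(n + 2p - f + 1) \times (n + 2p - f + 1)$
- Example: with a padding of 1, an $n \times n$ input and a $3 \times 3$ filter result in an $n \times n$ output feature map, since $n + 2 \cdot 1 - 3 + 1 = n$.
- If the input and output matrix dimensions are the same, then $p = (f - 1) / 2$.
- Valid padding ($p = 0$): no padding
- Same padding ($p = (f - 1) / 2$): the output size equals the input size, which requires appropriate padding.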
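The padding rules can be sketched in a few lines of plain Python. This assumes zero padding, a square input, an odd filter size, and stride 1. The helper names `zero_pad`, `output_size_stride1`, and `same_padding` are hypothetical and used only for illustration.

```python
def zero_pad(matrix, p):
    """Return a copy of `matrix` with `p` rows/columns of zeros on every side."""
    width = len(matrix[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]           # top border
    for row in matrix:
        padded.append([0] * p + list(row) + [0] * p)   # left/right borders
    padded.extend([0] * width for _ in range(p))       # bottom border
    return padded


def output_size_stride1(n, f, p):
    """Output side length for an n x n input, f x f filter, padding p, stride 1."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps the output size equal to the input size (odd f, stride 1)."""
    return (f - 1) // 2


padded = zero_pad([[1, 2], [3, 4]], 1)
for row in padded:
    print(row)

print(output_size_stride1(6, 3, 0))                # valid padding shrinks the output
print(output_size_stride1(6, 3, same_padding(3)))  # same padding preserves the size
```

With valid padding a 6x6 input and a 3x3 filter give a 4x4 output, while same padding (here one pixel) keeps it at 6x6.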
Stride
- Stride is the number of pixels by which the filter slides across the input image at each step.
Figure: convolution with strides and no padding (left); convolution with strides and padding (right).
- GitHub: vdumoulin/conv_arithmetic
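- Input matrix dimension: $n \times n$
- Filter size: $f \times f$
- Padding: $p$
- Stride: $s$
- Output size = $\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$ (per spatial dimension)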
- Example: an input matrix dimension of $7 \times 7$, a filter size of $3 \times 3$, a padding of $0$, and a stride of $2$ result in an output size of $3 \times 3$.
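The standard output-size rule, floor((n + 2p - f) / s) + 1 per spatial dimension, is easy to check in code. The sketch below assumes a square input and a square filter. The function name `conv_output_size` and the sample numbers are illustrative only.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n x n input, f x f filter, padding p, stride s."""
    if s < 1:
        raise ValueError("stride must be at least 1")
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    # Integer division implements the floor in the formula.
    return (n + 2 * p - f) // s + 1


print(conv_output_size(7, 3, p=0, s=2))  # stride 2, no padding
print(conv_output_size(6, 3, p=1, s=1))  # same padding, stride 1
print(conv_output_size(5, 3, p=1, s=2))  # stride 2 with padding
```

The floor matters when the filter does not fit an exact number of steps: positions where the filter would hang over the edge are simply skipped.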
Pooling Layer (POOL)
- A downsampling operation that reduces the dimensionality of a matrix (the feature map).
- Reduces the number of parameters for large images while retaining the most valuable information.
- Max pooling
- Average pooling
- Sum pooling
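The three pooling variants differ only in how each window is reduced to one number. Here is a minimal sketch in plain Python, assuming square windows and no padding. The helper names `pool2d` and `mean` and the 4x4 feature map are made up for illustration.

```python
def pool2d(matrix, size, stride, reduce_fn):
    """Apply `reduce_fn` to each size x size window, moving by `stride`."""
    out = []
    for i in range(0, len(matrix) - size + 1, stride):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, stride):
            window = [matrix[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 1],
    [2, 9, 1, 1],
    [1, 3, 2, 3],
    [5, 6, 1, 2],
]

max_pooled = pool2d(feature_map, 2, 2, max)   # strongest activation per window
avg_pooled = pool2d(feature_map, 2, 2, mean)  # average activation per window
sum_pooled = pool2d(feature_map, 2, 2, sum)   # total activation per window

print(max_pooled)
print(avg_pooled)
print(sum_pooled)
```

With a 2x2 window and stride 2, the 4x4 input shrinks to 2x2 in all three cases. Pooling has no learnable weights, so it adds no parameters of its own.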
Fully Connected Layer (FC)
- A traditional Multi-Layer Perceptron (MLP) layer
- For multi-class classification, usually Softmax activation is used.
- Softmax ensures the outputs are positive and sum to 1, so they can be read as class probabilities.
- The outputs of the CONV and POOL layers represent high-level features of the input image.
- The FC layer takes these features to classify the input image into the desired output classes.
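The classification step can be sketched as one fully connected layer followed by softmax. This uses only the standard library. The feature vector, weights, and biases are arbitrary made-up numbers, and `dense` and `softmax` are illustrative helper names. In a real network the weights would be learned during training.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into positive values that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened output of the CONV and POOL layers (4 features).
features = [1.0, 2.0, 0.5, 0.0]

# Hypothetical weights (3 classes x 4 features) and biases.
weights = [
    [0.2, 0.1, 0.0, 0.3],
    [0.5, -0.2, 0.1, 0.0],
    [-0.1, 0.4, 0.2, 0.1],
]
biases = [0.0, 0.1, -0.1]

logits = dense(features, weights, biases)
probs = softmax(logits)
predicted_class = probs.index(max(probs))

print(probs)
print(predicted_class)
```

The predicted class is the index with the highest probability. Softmax does not change the ranking of the scores, it only rescales them.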

