Skip to main content

CNN 012

· 6 min read
  • Three main layers of a CNN
    • CONV: Convolution Layer
    • POOL: Pooling Layer
    • FC: Fully Connected Layer
    • CONV extracts features, POOL downsamples feature maps, and FC makes the final prediction.
  • Why CNNs use over ANNs for image processing
    • Computationally efficient
    • Using Filters to capture spatial features
    • Sharing weights across the image
  • Overfitting
    • The model essentially memorizes the training data, leading to poor performance on unseen data
    • To prevent overfitting, we can use techniques like:
      • Dropout: Randomly dropping out neurons during training to prevent co-adaptation
      • Batch Normalization: Normalizing the inputs of each layer to stabilize learning
      • L1/L2 Regularization: Adding a penalty to the loss function to discourage large weights
        • L1 regularization adds a penalty based on the absolute value of the weights (can be zero, can make model sparse and useful for feature selection.)
        • L2 regularization adds a penalty based on the squared value of the weights (not can be zero, reduce model complexity and overfitting.)
      • Data Augmentation: Creating new training samples by applying transformations to existing data
  • ReLU
    • If the input is below zero, ReLU does output 0.
    • If the input is above zero, it outputs the input value itself.
    • max(0, x)
    • ReLU can output any number from 0 to infinity, which allows it to capture a wide range of features in the data.
    • It fixes gradient vanishing problem by allowing gradients to flow through the network without being squashed to zero, which can happen with activation functions like sigmoid or tanh.
  • Sigmoid: Binary Classification
  • Softmax: Multi-class Classification
  • Backpropagation: sends the error backward through the network and calculates gradients, so the model knows how to update its weights and biases.
  • Gradient Descent
    • It uses Backpropagation to calculate the exact slope (the gradient) of the error (loss).
    • Then it takes a step in the opposite direction of the gradient to minimize the error.
    • It repeats this interative process until it reaches a local minimum.
  • Vanishing Gradient Problem
    • The gradient becomes too small, so earlier layers learn very slowly or almost stop learning.
    • ReLU helps mitigate this problem by allowing gradients to flow through the network without being squashed to zero.
  • Learning Rate α\alpha
    • It's a hyperparameter that controls how big of a step the model takes down the slope.
    • If α\alpha is too small, the model will take tiny steps and may take a long time to converge.
    • If α\alpha is too large, the model may overshoot the minimum and diverge.
  • Precision: TPTP+FP\frac{TP}{TP + FP}
    • Of all the patients the model predicted as having the disease, how many actually have the disease?
  • Recall: TPTP+FN\frac{TP}{TP + FN}
    • Of all the patients who actually have the disease, how many did the model successfully catch?
    • Recall is more important in medical diagnosis because we want to minimize false negatives (missing a disease).
  • Sliding Window
    • Computationally expensive because it requires multiple passes over the image with different window sizes and strides.
    • Multiple scales are needed to detect objects of varying sizes, which further increases the computational cost.
  • Stride: controls the step size of the sliding filter. Larger stride means smaller output.
  • Edge
    • The points or pixels in an image where brightness or intensities change sharply.
    • Sobel filter
    • Prewitt filter
    • Canny edge detector
  • Padding: adds zeros around the image so the CONV does not shrink the feature map too much.
  • Keep the output dimension the same as the input dimension, we can use padding.
    • P=F12P = \frac{F - 1}{2}
  • Image Classification: Assigning a label to an entire image (e.g., cat, dog, car).
  • Object Detection: Identifying and localizing multiple objects within an image (e.g., bounding boxes)
  • Instance Segmentation: Identifying and segmenting each object instance in an image (e.g., pixel-level masks)
  • Momentum: uses an exponentially weighted average of past gradients to smooth updates and accelerate convergence.
  • RMSProp: uses an exponentially weighted average of squared gradients to adapt the learning rate for each parameter.
  • Adam: combines Momentum and RMSProp by using both the first moment, average gradient, and the second moment, average squared gradient.
  • Hyperparameters: learning rate, batch size, number of epochs, optimizer type, dropout rate, etc.
  • Supervised Learning: The model learns from labeled data, classification, regression.
  • Unsupervised Learning: The model learns from unlabeled data, clustering.
  • Loss/Cost function: an estimate of how far the model's predictions are from the actual target/answer.
  • AI is a broad concept of machines performing human-like tasks.
  • ML is a subset of AI that learns from data
  • DL is a subset of ML that uses deep neural networks with many layers.
  • ML's major problem
    • insufficient data
    • non-representative training data
    • poor-quality data
    • irrelevant features
    • overfitting
    • underfitting
  • When we use ML?
    • a large amount of data for finding patterns and making predictions
    • too many rules or too much complexity for humans to handle
  • Faster R-CNN: Propose regions first, then classify them
    • RPN: Region Proposal Network, which generates candidate object proposals
  • YOLO: Predict boxes and class probabilities directly from the image in one pass
    • Anchor boxes: predefined bounding boxes of different sizes and aspect ratios used to predict the location of objects in YOLO.
  • NMS: Non-Maximum Suppression, selects the best bounding box among overlapping boxes based on confidence scores.
  • 1×1 convolution mixes channel information and can reduce the number of channels, so later convolutions become cheaper.
  • Inception module: learns small, medium, and large visual features at the same time.
  • Transfer Learning Strategies:
    • First, if the new dataset is small and similar to the original dataset, we can use the pre-trained model directly.
    • Second, if the dataset is similar but has different classes, we freeze the convolutional layers and train only the fully connected classification layer.
    • Third, if the dataset is small but not very similar, we freeze the early convolutional layers and fine-tune the later convolutional layers plus the FC layer.
    • Finally, if the dataset is large and different, we can fine-tune the whole network.
  • IoU: Intersection over Union, a metric used to evaluate the accuracy of object detection models by comparing the predicted bounding box with the ground truth bounding box.