CNN 008

· About a 4-minute read

Datasets

PASCAL Visual Object Classes

PASCAL VOC

  • a popular dataset for object detection, classification and segmentation
  • 20 categories

ImageNet

  • a dataset that also provides an object detection task
  • about 500,000 images, 200 categories
  • less popular as a detection benchmark because of its large number of classes and the size of the dataset

COCO

Microsoft Common Objects in Context dataset

  • a large-scale object detection, segmentation, and captioning dataset
  • 330,000 images, 80 object categories
  • 200,000 labeled images, 1.5 million object instances
  • 91 stuff categories

Intersection over Union (IoU)

$IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}$

  • a metric used for the evaluation of an object detector
  • measures how closely the predicted bounding box for a detected object matches the ground-truth box
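
The formula above can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)`; the function name `iou` is chosen here for illustration only.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    overlap = inter_w * inter_h

    # Area of union = sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - overlap

    return overlap / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes sharing a 1x1 corner: overlap 1, union 7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold such as 0.5.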

AP

Average Precision

| Metric | Description |
| --- | --- |
| $AP$ | AP at IoU=.50:.05:.95 (primary challenge metric) |
| $AP^{IoU=.50}$ | AP at IoU=.50 (PASCAL VOC metric) |
| $AP^{IoU=.75}$ | AP at IoU=.75 (strict metric) |
| $AP^{small}$ | AP for small objects: $area < 32^2$ |
| $AP^{medium}$ | AP for medium objects: $32^2 < area < 96^2$ |
| $AP^{large}$ | AP for large objects: $area > 96^2$ |
| $AR^{max=1}$ | AR given 1 detection per image |
| $AR^{max=10}$ | AR given 10 detections per image |
| $AR^{max=100}$ | AR given 100 detections per image |
| $AR^{small}$ | AR for small objects: $area < 32^2$ |
| $AR^{medium}$ | AR for medium objects: $32^2 < area < 96^2$ |
| $AR^{large}$ | AR for large objects: $area > 96^2$ |
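
To make the AP rows more concrete, here is a rough sketch of how AP for one class at one IoU threshold could be computed. It assumes each detection has already been matched against the ground truth and flagged as a true or false positive, and it uses all-point interpolation of the precision-recall curve, which is only one common convention; official evaluation tools may sample the curve differently. The names `average_precision` and `iou_thresholds` are illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class at one IoU threshold."""
    if num_ground_truth == 0 or not scores:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, is_true_positive), key=lambda p: p[0], reverse=True)

    precisions, recalls = [], []
    tp = fp = 0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Make precision monotonically non-increasing (the "envelope" of the curve).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# IoU thresholds .50:.05:.95 used for the primary metric in the table above.
iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2))
    print(iou_thresholds)
```

Under these assumptions, the primary metric would be the mean of `average_precision` over all ten thresholds and all classes.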

Taxonomy of Object Detection

History of Object Detection

Classification with Localization

  • Classification Task
    • Input: Image
    • Output: Class label
    • Performance Metric: Accuracy
  • Localization Task
    • Input: Image
    • Output: Bounding box coordinates $(x, y, Ht, Wd)$ or $(x, y, x', y')$
    • Performance Metric: IoU
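
The two box formats above carry the same information and can be converted into each other. The sketch below assumes that $(x, y)$ is the top-left corner and $(x', y')$ the bottom-right corner; the notes do not state this explicitly, and the function names are illustrative.

```python
def xyhw_to_corners(box):
    """Convert (x, y, ht, wd) to (x, y, x2, y2), assuming (x, y) is the top-left corner."""
    x, y, ht, wd = box
    return (x, y, x + wd, y + ht)


def corners_to_xyhw(box):
    """Convert (x, y, x2, y2) back to (x, y, ht, wd)."""
    x, y, x2, y2 = box
    return (x, y, y2 - y, x2 - x)


if __name__ == "__main__":
    corners = xyhw_to_corners((10, 20, 30, 40))
    print(corners)
    print(corners_to_xyhw(corners))
```

The corner format is convenient for computing IoU, while the size-based format is often more natural as a regression target.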

Localization Loss

Localization as a regression problem
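
Treating localization as regression means the network outputs four numbers and is penalized for their distance from the ground-truth box. The notes do not name a specific loss, so the sketch below assumes a smooth L1 loss, one common choice for box regression; an L2 loss would fit the same idea. The function name and the `beta` parameter are illustrative.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over box coordinates.

    Quadratic for small errors (|d| < beta), linear for large ones,
    which makes it less sensitive to outliers than a pure L2 loss.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            total += 0.5 * d * d / beta
        else:
            total += d - 0.5 * beta
    return total


if __name__ == "__main__":
    predicted = (0.0, 0.0, 1.0, 1.0)
    ground_truth = (0.5, 2.0, 1.0, 1.0)
    print(smooth_l1(predicted, ground_truth))
```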

Detection as a Classification Problem

Region Proposal

  • Find blobs in the image that are most likely to contain objects.
  • Selective Search: generates roughly 1,000-2,000 region proposals per image, running on the CPU

R-CNN

Region based CNN

  • Convolutional Neural Network as feature extractor
  • SVM as classifier
  • Bounding box regression for localization
  • Pass each region through CNN to extract features, then classify using SVM and refine bounding box using regression
  • Each image region is warped to a fixed size (e.g., 227x227) before being passed through the CNN
  • The proposal method yields around 2,000 Regions of Interest (RoIs) per image, and each one is passed through the CNN separately, which is computationally expensive
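
The bounding box regression step can be illustrated with the offset parameterization commonly associated with the R-CNN family. The notes do not spell out the formula, so treat this as an assumed sketch: boxes are written as `(cx, cy, w, h)` (center and size), and the function names are illustrative.

```python
import math


def encode_offsets(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) that map a proposal onto a ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw       # center shift, scaled by proposal width
    ty = (gy - py) / ph       # center shift, scaled by proposal height
    tw = math.log(gw / pw)    # log-space width ratio
    th = math.log(gh / ph)    # log-space height ratio
    return (tx, ty, tw, th)


def decode_offsets(proposal, offsets):
    """Apply predicted offsets to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


if __name__ == "__main__":
    proposal = (50.0, 50.0, 20.0, 40.0)
    ground_truth = (55.0, 60.0, 40.0, 20.0)
    targets = encode_offsets(proposal, ground_truth)
    print(targets)
    print(decode_offsets(proposal, targets))
```

Scaling the shifts by the proposal size and predicting sizes in log space keeps the targets in a similar range regardless of how large the object is.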

Fast R-CNN

  • Run the whole image through the CNN once to get a feature map, then classify each region proposal using RoI pooling and fully connected layers
    • Region of Interest (RoIs) from proposal method
    • Crop and Resize features
    • Per-Region Network
    • Linear + Softmax for Object category
    • Linear for Box offset
  • Reduces computation because the convolutional features are shared across all regions
  • RoIs still come from selective search and are projected onto the feature map
  • mAP: 70% for PASCAL VOC 2007
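
The "crop and resize features" step (RoI pooling) can be sketched as max pooling over a grid of bins inside each RoI. This is a simplified, single-channel illustration in plain Python, assuming the RoI is already expressed in feature-map coordinates as `(x1, y1, x2, y2)` with exclusive end indices; `roi_max_pool` is a hypothetical name, not a library function.

```python
import math


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a 2D feature map (list of rows) to a fixed output size."""
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h, roi_w = y2 - y1, x2 - x1

    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor for the start, ceil for the end, so no bin is empty.
        ys = y1 + math.floor(i * roi_h / out_h)
        ye = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * roi_w / out_w)
            xe = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    print(roi_max_pool(fmap, (0, 0, 4, 4)))
    print(roi_max_pool(fmap, (1, 1, 4, 4)))
```

Whatever the size of the RoI, the output has the same fixed shape, which is what lets the per-region fully connected layers accept it.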

Faster R-CNN

  • Use a CNN to generate the proposals
  • RPN (Region Proposal Network) to generate region proposals
    • A small neural network that predicts proposals from the feature map
  • RoI pooling to extract features for each proposal
    • then classify and refine bounding box
  • mAP: 78.8% for PASCAL VOC 2007
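
One way to picture how an RPN predicts proposals from the feature map is through anchor boxes: a fixed set of reference boxes placed at every feature-map location, which the network then scores and refines. The notes do not describe anchors, so the stride, scales, and aspect ratios below are assumed example values, and `generate_anchors` is a hypothetical helper.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image coordinates, one set per feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell, mapped back to image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays close to scale * scale.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    anchors = generate_anchors(feat_h=2, feat_w=3)
    print(len(anchors))
    print(anchors[0])
```

Because the anchors are computed from the shared feature map, proposals come almost for free compared with running a separate method such as selective search.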

| Model | Description |
| --- | --- |
| R-CNN | Look at every patch one by one |
| Fast R-CNN | Look once, and then inspect patches on the feature map |
| Faster R-CNN | Propose patches using a neural network (RPN) |

R-CNN Family Comparison

| Feature | R-CNN | Fast R-CNN | Faster R-CNN |
| --- | --- | --- | --- |
| Region proposal | Selective search | Selective search | RPN (learned) |
| CNN usage | Per region | Once per image | Once per image |
| Speed | Very slow | Faster | Can work in near real-time |
| Training | Multi-stage, discrete | Partially end-to-end | Fully end-to-end |
| Accuracy | Good | Better | Best of all three |

Image Annotation for Object Detection

  • Difficulty: annotating images is not easy, even for humans