Object detection extends recognition: instead of only identifying objects (what is it?), it also localizes them (where is it?).

Involves two tasks:

  • localization: positions of objects (bounding boxes)
  • recognition: classification of each detected object

Detection is challenging. Key benchmarks include:

  • PASCAL VOC (20 categories).
  • ImageNet detection challenge, ILSVRC DET (200 categories).
  • COCO (80 categories, highly diverse).

evaluation

intersection over union (IoU)

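  • measures localization accuracy: area of overlap between the predicted and ground-truth boxes divided by the area of their union
  • IoU = 1: perfect overlap
  • IoU = 0: no overlap
  • IoU ≥ 0.5 is a common threshold for counting a detection as correct (e.g., PASCAL VOC)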
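A minimal sketch of IoU, i.e. area(A ∩ B) / area(A ∪ B), for two axis-aligned boxes. The (x1, y1, x2, y2) box format and the function name `iou` are assumptions for illustration; coordinates are treated as continuous, whereas some evaluation toolkits use inclusive pixel coordinates, which changes the area formula slightly.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give zero intersection area
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(iou((0, 0, 2, 2), (3, 3, 4, 4)))  # disjoint boxes -> 0.0
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # intersection 2, union 6 -> about 0.333
```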

metrics: precision and recall

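  • True Positive (TP): predicted box with IoU >= threshold against a ground-truth box of the same class
  • False Positive (FP): predicted box with IoU < threshold (or a duplicate detection of an already-matched object)
  • False Negative (FN): missed detection (ground-truth object not detected)
  • precision = TP / (TP + FP), recall = TP / (TP + FN)
  • example: a model predicts 3 boxes (1 correct, 2 incorrect) for an image with 2 ground-truth objects
    • precision = 1/3
    • recall = 1/2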
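One way to turn these definitions into code, sketched for a single class: predictions are visited in descending score order and each is greedily matched to the unmatched ground-truth box with the highest IoU; a match at IoU ≥ threshold counts as TP, anything else as FP, and leftover ground truths are FN. This greedy scheme is an assumed simplification, similar in spirit to (not identical with) benchmark evaluation code; `precision_recall` and the `(score, box)` input format are hypothetical. The block repeats a compact IoU helper so it runs on its own.

```python
def iou(a, b):
    """IoU of boxes (x1, y1, x2, y2)."""
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * \
            max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, thr=0.5):
    """preds: list of (score, box); gts: list of boxes; single class assumed."""
    matched = set()  # indices of ground truths already claimed by a prediction
    tp = fp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None and best_iou >= thr:
            tp += 1
            matched.add(best_j)
        else:
            fp += 1  # no overlap, too little overlap, or only duplicates left
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


# The example from the notes: 3 predicted boxes (1 correct), 2 ground truths
gts = [(0, 0, 2, 2), (10, 10, 12, 12)]
preds = [(0.9, (0, 0, 2, 2)), (0.8, (5, 5, 6, 6)), (0.7, (20, 20, 21, 21))]
print(precision_recall(preds, gts))  # (0.333..., 0.5)
```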

traditional methods

sliding window

  1. Enumerate possible locations using a sliding window.
  2. Run a recognition model on each window.
  3. Aggregate detections across scales.
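Steps 1 and 2 can be sketched at a single scale as follows. The square window, the `stride` parameter, and the `classify` callable (any function mapping an image patch to a score) are assumptions for illustration; the toy classifier in the usage lines simply returns mean brightness and stands in for a real recognition model.

```python
import numpy as np


def sliding_windows(img_h, img_w, win, stride):
    """Yield (y, x) top-left corners of all win x win windows that fit in the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield y, x


def detect(image, classify, win=4, stride=2, thresh=0.5):
    """Run `classify` on every window; keep (score, box) where score > thresh."""
    h, w = image.shape[:2]
    detections = []
    for y, x in sliding_windows(h, w, win, stride):
        patch = image[y:y + win, x:x + win]
        score = classify(patch)
        if score > thresh:
            detections.append((score, (x, y, x + win, y + win)))  # (x1, y1, x2, y2)
    return detections


# Toy usage: a bright 4x4 square on a dark 8x8 image
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
dets = detect(image, classify=lambda p: float(p.mean()))
print(dets)  # [(1.0, (2, 2, 6, 6))]
```

Even on this 8x8 image the classifier runs 9 times; the count grows with image size, smaller strides, and every extra window size, which is the cost problem listed below.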

Challenges:

  • Objects vary in size → need multiple window sizes.
  • Computationally expensive: the classifier must run on every window position at every scale.

image pyramid

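to handle different object sizes:

  • fix the window size but repeatedly resize (downscale) the image, creating a pyramid
  • this is equivalent to varying the window size: a fixed window on a downscaled image covers a larger region of the original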
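A minimal pyramid sketch, assuming nearest-neighbour downscaling by a constant `factor` (a real system would smooth or interpolate before subsampling). The function names and the `min_size` stopping rule are hypothetical. `to_original` shows the equivalence from the notes: a box found at a downscaled level maps back to a larger box in the original image, which is also how detections from different levels can be aggregated in one coordinate frame.

```python
import numpy as np


def downscale(image, factor):
    """Nearest-neighbour resize by `factor` (< 1 shrinks the image)."""
    h, w = image.shape[:2]
    new_h, new_w = max(1, int(h * factor)), max(1, int(w * factor))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return image[rows][:, cols]


def image_pyramid(image, factor=0.5, min_size=4):
    """Return (scale, level) pairs, stopping once a level is smaller than min_size."""
    levels = []
    scale, current = 1.0, image
    while min(current.shape[:2]) >= min_size:
        levels.append((scale, current))
        scale *= factor
        current = downscale(image, scale)
    return levels


def to_original(box, scale):
    """Map a box (x1, y1, x2, y2) found at pyramid level `scale` to original coordinates."""
    return tuple(int(round(c / scale)) for c in box)


pyramid = image_pyramid(np.zeros((16, 16)))
print([lvl.shape for _, lvl in pyramid])  # [(16, 16), (8, 8), (4, 4)]
print(to_original((0, 0, 4, 4), 0.25))    # a 4x4 window at scale 0.25 covers (0, 0, 16, 16)
```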

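deformable part models (DPM)

  • part-based detection: an object is modeled as a coarse root filter plus several part filters, each of which may shift (deform) from its ideal position relative to the root at a cost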
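A heavily simplified sketch of how a part-based score can be computed at one root location: the root response plus, for each part, the best trade-off between part response and a quadratic penalty for moving away from its anchor. Assumptions for illustration: all responses live on one grid, and the penalty is a single weight `deform_w` times squared displacement. The original DPM formulation instead places parts at a finer resolution, learns weights on (dx, dy, dx², dy²), and uses distance transforms rather than this brute-force search; `dpm_score` and its parameters are hypothetical.

```python
import numpy as np


def dpm_score(root_resp, part_resps, anchors, root_pos, deform_w):
    """
    root_resp:  2D array of root filter responses
    part_resps: list of 2D arrays of part filter responses (same grid as root here)
    anchors:    list of (dy, dx) ideal part offsets relative to the root position
    root_pos:   (y, x) location of the root being scored
    deform_w:   weight of the quadratic deformation penalty
    """
    ry, rx = root_pos
    score = root_resp[ry, rx]
    for resp, (ay, ax) in zip(part_resps, anchors):
        h, w = resp.shape
        best = -np.inf
        # Brute-force search: best part placement = response minus deformation cost
        for y in range(h):
            for x in range(w):
                dy, dx = y - (ry + ay), x - (rx + ax)
                best = max(best, resp[y, x] - deform_w * (dy * dy + dx * dx))
        score += best
    return score


root = np.zeros((5, 5))
root[2, 2] = 1.0
part_at_anchor = np.zeros((5, 5))
part_at_anchor[3, 3] = 2.0   # part exactly at its anchor (2 + 1, 2 + 1)
part_shifted = np.zeros((5, 5))
part_shifted[4, 4] = 2.0     # part one step off in both directions

print(dpm_score(root, [part_at_anchor], [(1, 1)], (2, 2), deform_w=0.5))  # 1 + 2 = 3.0
print(dpm_score(root, [part_shifted], [(1, 1)], (2, 2), deform_w=0.5))    # 1 + (2 - 0.5 * 2) = 2.0
```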

linear models for classification

optimization: gradient descent and backpropagation
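As one concrete instance of a linear classifier trained by gradient descent, here is a logistic regression sketch. The choice of logistic regression, mean cross-entropy loss, learning rate, and epoch count are assumptions, not taken from the notes. For a single linear layer, "backpropagation" reduces to the chain rule: the gradient with respect to the pre-activation z is (p - y), which is then pushed back to the weights and bias.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit p(y=1 | x) = sigmoid(x . w + b) by batch gradient descent on mean cross-entropy."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)   # forward pass
        grad_z = (p - y) / n     # dLoss/dz for sigmoid + cross-entropy, averaged
        grad_w = X.T @ grad_z    # chain rule back to the weights
        grad_b = grad_z.sum()    # and to the bias
        w -= lr * grad_w         # gradient descent step
        b -= lr * grad_b
    return w, b


# Toy, linearly separable data: class 0 in the lower left, class 1 in the upper right
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
              [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = train_logistic(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds)  # [0 0 0 1 1 1]
```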

neural networks