extends recognition: not only identifying objects, but also localizing them (where is each object?)
Involves:
- localization: positions of objects, given as bounding boxes
- recognition: classification of each detected object
Challenging task. Key benchmarks include:
- PASCAL VOC (20 categories).
- ImageNet detection task (ILSVRC DET, 200 categories).
- COCO (80 categories, highly diverse).
evaluation
intersection over union (IoU)
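- measures localization accuracy: area of intersection of the predicted and ground-truth boxes divided by the area of their union
- IoU = 1: perfect overlap (perfect detection)
- IoU = 0: no overlap (incorrect)
- IoU >= 0.5 is commonly treated as acceptable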
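A minimal sketch of the IoU computation for two axis-aligned boxes. The box format (x1, y1, x2, y2) with x1 < x2 and y1 < y2, and the function name `iou`, are assumptions made for illustration; the notes do not fix a representation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


same = iou((0, 0, 10, 10), (0, 0, 10, 10))        # identical boxes
disjoint = iou((0, 0, 10, 10), (20, 20, 30, 30))  # no overlap
half = iou((0, 0, 10, 10), (5, 0, 15, 10))        # intersection 50, union 150
```

Here `same` is 1, `disjoint` is 0, and `half` is 1/3, which would fall below a 0.5 threshold.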
metrics: precision and recall
- True Positive: predicted box matches a ground truth (IoU >= threshold)
- False Positive: predicted box matches no ground truth (IoU < threshold)
- False Negative: missed detection (ground truth not detected)
Example:
- if a model predicts 3 boxes (1 correct, 2 incorrect) and there are 2 ground truths
- precision = TP / (TP + FP) = 1/3
- recall = TP / (TP + FN) = 1/2
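The example above can be reproduced with a small sketch. It assumes boxes in (x1, y1, x2, y2) format and a simple greedy rule where each ground truth can be matched by at most one prediction; real benchmark protocols also sort predictions by confidence, which is omitted here. The names `precision_recall` and `iou` are illustrative, not from the notes.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, thresh=0.5):
    """Greedy matching: each ground truth is matched at most once."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)  # true positive
            tp += 1
    # Unmatched predictions are false positives (len(preds) - tp);
    # unmatched ground truths are false negatives (len(gts) - tp).
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60), (0, 20, 5, 25)]
precision, recall = precision_recall(preds, gts)  # 1 TP, 2 FP, 1 FN
```

With these made-up boxes, only the first prediction overlaps a ground truth enough (IoU 0.81), so precision is 1/3 and recall is 1/2.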
traditional methods
sliding window
- Enumerate possible locations using a sliding window.
- Run a recognition model on each window.
- Aggregate detections across scales.
Challenges:
- Objects vary in size → Need multiple window sizes.
- Computationally expensive.
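A rough sketch of the sliding-window loop at a single scale, written in plain Python on a list-of-lists image to stay dependency-free. `mean_score` is a hypothetical stand-in for a real recognition model, and the window size, stride, and threshold are arbitrary assumptions.

```python
def sliding_windows(image, win_h, win_w, stride):
    """Yield (x, y, patch) for every window position that fits in the image."""
    img_h, img_w = len(image), len(image[0])
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            patch = [row[x:x + win_w] for row in image[y:y + win_h]]
            yield x, y, patch


def mean_score(patch):
    """Hypothetical 'recognition model': average pixel value of the patch."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))


def detect(image, score_fn, win_h=4, win_w=4, stride=2, thresh=0.9):
    """Run score_fn on each window; keep boxes scoring at least thresh."""
    detections = []
    for x, y, patch in sliding_windows(image, win_h, win_w, stride):
        score = score_fn(patch)
        if score >= thresh:
            detections.append((x, y, x + win_w, y + win_h, score))
    return detections


# Toy 8x8 image with a bright 4x4 "object" in the bottom-right corner.
image = [[1.0 if r >= 4 and c >= 4 else 0.0 for c in range(8)] for r in range(8)]
detections = detect(image, mean_score)
```

Even this toy case evaluates the model 9 times for one window size; the count grows quickly with image size, smaller strides, and additional window sizes, which is the cost noted above.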
image pyramid
to handle different object sizes:
- fix the window size but resize the image (creating a pyramid of scaled copies)
- this is equivalent to varying the window size on the original image
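A small sketch of building an image pyramid, assuming a list-of-lists grayscale image and nearest-neighbour downsampling (real pipelines usually smooth before resizing). The helper names, the 0.5 scale step, and the minimum size are illustrative assumptions.

```python
def downsample(image, scale):
    """Nearest-neighbour resize of a list-of-lists image by `scale` (0 < scale <= 1)."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, int(src_h * scale))
    dst_w = max(1, int(src_w * scale))
    return [
        [image[min(src_h - 1, int(r / scale))][min(src_w - 1, int(c / scale))]
         for c in range(dst_w)]
        for r in range(dst_h)
    ]


def image_pyramid(image, scale_step=0.5, min_size=4):
    """Yield (scale, resized_image) until the image is smaller than the window."""
    scale = 1.0
    level = image
    while len(level) >= min_size and len(level[0]) >= min_size:
        yield scale, level
        scale *= scale_step
        level = downsample(image, scale)


def to_original(box, scale):
    """Map a box found at a pyramid level back to original image coordinates."""
    return tuple(v / scale for v in box)


image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
levels = [(s, len(img), len(img[0])) for s, img in image_pyramid(image)]
box = to_original((0, 0, 4, 4), 0.25)
```

A fixed 4x4 window on the 0.25-scale level maps back to a 16x16 region of the original, which is why resizing the image is equivalent to varying the window size.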
deformable parts model (DPM)
- part-based detection: an object is modeled as a root template plus parts that can shift relative to it
linear models for classification
- perceptron
- softmax
- loss function: cross-entropy
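A minimal sketch of the pieces listed above, in plain Python. It assumes a perceptron with labels in {-1, +1} and the classic mistake-driven update, and a softmax classifier whose class scores come from a linear map `W x + b`; all function names and the toy numbers are assumptions for illustration.

```python
import math


def perceptron_update(w, b, x, y, lr=1.0):
    """One perceptron step; y is -1 or +1. Updates only on a mistake."""
    activation = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    if y * activation <= 0:
        w = [w_i + lr * y * x_i for w_i, x_i in zip(w, x)]
        b = b + lr * y
    return w, b


def linear_scores(W, b, x):
    """One score per class: row k of W dotted with x, plus bias b[k]."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]


def softmax(scores):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


w, b = perceptron_update([0.0, 0.0], 0.0, [1.0, 2.0], 1)  # mistake, so update

W = [[1.0, 0.0], [0.0, 1.0]]
probs = softmax(linear_scores(W, [0.0, 0.0], [2.0, 2.0]))  # equal scores
loss = cross_entropy(probs, 0)                             # log(2)
```

With equal scores for two classes, softmax gives probability 0.5 to each and the cross-entropy loss is log 2; the loss shrinks toward 0 as the probability of the true class approaches 1.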