assigning semantic labels to pixels, regions, or entire images.
For example:
- Image-level: Does this image contain a building? (Answer: Yes/No)
- Object-level: Where is the car? What are the people doing?
- Attribute-level: What material is the building made of? What time does the clock show?
Recognition tasks vary in granularity:
- Category-level: Detecting any cereal box.
- Instance-level: Identifying a specific cereal box (e.g., Kellogg’s).
Challenges
- number of categories is very large
- defining categories is also subjective
- variations in appearance:
  - viewpoint
  - illumination
  - scale
  - deformation (non-rigid objects)
  - occlusion
  - clutter
Recognition pipeline
- feature extraction: turn each image into a feature vector
- training: learn a function f that maps features to labels
- testing: apply f to new, unseen images
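The three steps above can be sketched end to end on toy data. This is only an illustration, not the method from the lecture: the feature is a normalized RGB histogram, f is a nearest-centroid classifier chosen for brevity, and the names `color_histogram`, `train`, `predict`, and `solid` as well as the tiny solid-color "images" are all made up for the example.

```python
from collections import Counter

def color_histogram(image, bins=4):
    """Feature extraction: normalized joint RGB histogram.

    `image` is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    """
    step = 256 // bins
    counts = Counter()
    total = 0
    for row in image:
        for r, g, b in row:
            counts[(r // step, g // step, b // step)] += 1
            total += 1
    # Flatten to a fixed-length vector so every image has the same dimension.
    return [counts[(i, j, k)] / total
            for i in range(bins) for j in range(bins) for k in range(bins)]

def train(features, labels):
    """Training: learn f as one mean feature vector (centroid) per label."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Testing: apply f by returning the label of the closest centroid."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

def solid(color, h=4, w=4):
    """Hypothetical helper: an h x w image filled with a single color."""
    return [[color] * w for _ in range(h)]

# Toy training set (assumed data, not from the notes).
train_images = [solid((200, 30, 30)), solid((220, 10, 40)),
                solid((20, 40, 210)), solid((30, 10, 250))]
train_labels = ["red", "red", "blue", "blue"]

f = train([color_histogram(im) for im in train_images], train_labels)

# A new, unseen image with a different size and a slightly different red.
new_image = solid((240, 20, 20), h=6, w=3)
prediction = predict(f, color_histogram(new_image))
print(prediction)
```

Any other feature or classifier could be swapped in without changing the shape of the pipeline: extract features, fit f on labeled examples, then apply f to images that were not in the training set.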
Feature representation
- color histograms - simple, not robust to scale
- SIFT - local detector - scale-invariant, good for keypoints
- deep learning - highly flexible but data heavy
How to choose features?
(I think RGB color histograms are scale-invariant though, at least when normalized by pixel count.)
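The aside about scale invariance can be checked on a toy image. The sketch below assumes a histogram normalized by pixel count and nearest-neighbor upscaling by an integer factor. The helpers `normalized_histogram`, `upscale`, and `brighten` are hypothetical names for this example. It also shifts the brightness to show what the histogram is sensitive to instead.

```python
from collections import Counter

def normalized_histogram(image, bins=4):
    """Fraction of pixels falling into each quantized (r, g, b) bin."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step)
                     for row in image for (r, g, b) in row)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def upscale(image, factor):
    """Nearest-neighbor upscaling: each pixel becomes a factor x factor block."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def brighten(image, offset):
    """Add a constant to every channel, clipped at 255 (crude illumination change)."""
    return [[tuple(min(255, c + offset) for c in px) for px in row]
            for row in image]

# Toy 2 x 3 image (assumed data): three red, two blue, one green pixel.
image = [[(250, 0, 0), (250, 0, 0), (0, 0, 250)],
         [(0, 250, 0), (250, 0, 0), (0, 0, 250)]]

small = normalized_histogram(image)
large = normalized_histogram(upscale(image, 3))
bright = normalized_histogram(brighten(image, 100))

# Same color proportions at both sizes, different bins after brightening.
same_under_scale = all(abs(small[b] - large.get(b, 0.0)) < 1e-12 for b in small)
same_under_light = set(small) == set(bright)
print(same_under_scale, same_under_light)
```

With pixel replication the color proportions are preserved exactly, so the normalized histogram does not change. A resize that interpolates would blend colors along edges, so the invariance there is only approximate. An unnormalized histogram (raw counts) would grow with image size, which may be what "not robust to scale" refers to.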