assigning semantic labels to pixels, regions, or entire images.
For example:
- Image-level: Does this image contain a building? (Answer: Yes/No)
- Object-level: Where is the car? What are the people doing?
- Attribute-level: What material is the building made of? What time does the clock show?

Recognition tasks vary in granularity:
- Category-level: Detecting any cereal box.
- Instance-level: Identifying a specific cereal box (e.g., Kellogg’s).
 
Challenges
- scale of categories (the number of possible categories is very large)
- defining categories is also subjective
- variations in appearance:
  - viewpoint
  - illumination
  - scale
  - deformation (non-rigid objects)
  - occlusion
  - clutter
 
 
recognition pipeline
- feature extraction: turn each image into a feature vector
- training: fit a classifier, i.e. learn a function f that maps features to labels
- testing: apply f to new, unseen images
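
To make the three pipeline steps concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not a real recognition system: images are flat lists of (r, g, b) pixels, the feature is just the mean color, and f is a nearest-centroid classifier. The function names and the "sky"/"grass" data are made up for illustration.

```python
# Toy recognition pipeline: feature extraction -> training -> testing.
# Assumption: an image is a list of (r, g, b) pixels with channels in 0..255.

def extract_features(image):
    """Feature extraction: mean value of each color channel."""
    n = len(image)
    return tuple(sum(pixel[c] for pixel in image) / n for c in range(3))

def train(images, labels):
    """Training: learn f as one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        feats = extract_features(image)
        prev = sums.get(label, (0.0, 0.0, 0.0))
        sums[label] = tuple(p + x for p, x in zip(prev, feats))
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}

def predict(centroids, image):
    """Testing: apply f by returning the label of the nearest centroid."""
    feats = extract_features(image)

    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[label]))

    return min(centroids, key=sq_dist)

# Hypothetical toy data: mostly-blue "sky" and mostly-green "grass" images.
train_images = [
    [(20, 40, 220)] * 16,
    [(40, 60, 200)] * 16,
    [(30, 200, 40)] * 16,
    [(50, 180, 60)] * 16,
]
train_labels = ["sky", "sky", "grass", "grass"]

f = train(train_images, train_labels)   # training
unseen = [(35, 55, 210)] * 16           # a new image, not in the training set
print(predict(f, unseen))               # testing; prints "sky"
```

Swapping `extract_features` for a better representation, or the centroid rule for a stronger classifier, leaves the three-step structure unchanged.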

 
feature representation
- color histograms: simple, not robust to scale
- SIFT: local detector, scale-invariant, good for keypoints
- deep learning: highly flexible but data heavy
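
As a small illustration of the simplest entry in this list, the sketch below computes a joint RGB color histogram in plain Python and checks how it reacts to a change of scale. The bin count, the function names, and the tiny test image are my own choices, not from the lecture. It suggests that sensitivity to scale depends on normalization: raw counts change when the image is enlarged, while normalized proportions do not.

```python
def color_histogram(image, bins=4, normalize=True):
    """Joint RGB histogram of an image given as rows of (r, g, b) pixels (0..255)."""
    hist = [0] * (bins ** 3)
    total = 0
    for row in image:
        for r, g, b in row:
            # Map each channel value 0..255 to a bin index 0..bins-1.
            rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
            hist[rb * bins * bins + gb * bins + bb] += 1
            total += 1
    if normalize:
        return [count / total for count in hist]
    return hist

def upsample_2x(image):
    """Nearest-neighbor 2x upsampling: each pixel becomes a 2x2 block."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# Hypothetical 2x2 test image: two red pixels, one blue, one green.
small = [
    [(255, 0, 0), (255, 0, 0)],
    [(0, 0, 255), (0, 255, 0)],
]
big = upsample_2x(small)

# Raw counts differ between the two sizes.
print(color_histogram(small, normalize=False) == color_histogram(big, normalize=False))  # False
# Normalized proportions are identical.
print(color_histogram(small) == color_histogram(big))  # True
```

Real resizing usually interpolates and blends colors at edges, so in practice the normalized histogram is only approximately unchanged. It also stays sensitive to illumination, since a brighter image moves pixels into different bins.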
 
how to choose features?
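(I think a normalized RGB histogram is roughly scale invariant, though: it stores color proportions, not pixel counts. Only the raw counts change with image size.)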