assigning semantic labels to pixels, regions, or entire images.

For example:

  • Image-levelDoes this image contain a building? (Answer: Yes/No)
  • Object-levelWhere is the car? What are the people doing?
  • Attribute-level: What material is the building made of? What time does the clock show? Recognition tasks vary in granularity:
  • Category-level: Detecting any cereal box.
  • Instance-level: Identifying a specific cereal box (e.g., Kellogg’s).

Challenges

  • scale of categories
    • defining categories is also subjective
  • variations
    • viewpoint
    • illumination
    • scale
    • deformation (non-rigid objects)
    • occlusion
    • clutter

recognition pipeline

feature representation

how to choose features?

(i think rgb is scale invariant tho)