k-nearest neighbors (kNN): a classifier

  • store all training data
    • for a test image, find the k most similar training examples (its k nearest neighbors) and vote on their labels
  • predict a test image’s class from the classes of its k nearest training images
    • for example, with k = 3, if the three nearest training images to test image X were two pictures of Angelina Jolie and one picture of Audrey Hepburn, we would predict that X is a picture of Angelina Jolie
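The distance-then-vote procedure above can be sketched in NumPy (a minimal sketch; `knn_predict` and the toy data are illustrative names, not from the notes):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    # Euclidean distance from the test point to every training point
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the k nearest training examples
    nearest = np.argsort(dists)[:k]
    # majority vote over their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# toy example: two clusters with labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # → 1
```

Note there is no training step: the classifier literally stores all training data and defers the work to prediction time.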

curse of dimensionality

  • in high dimensions, distances become less informative, so kNN degrades; dimensionality reduction helps

principal component analysis (pca)

  1. center the data - subtract mean from each feature
  2. compute covariance matrix
  3. singular value decomposition (SVD) - factorize
  4. project data: keep top k components (like those covering 95% variance)
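The four steps above can be sketched with NumPy. Taking the SVD of the centered data matrix is equivalent to eigendecomposing its covariance matrix, so step 2 is implicit here (a minimal sketch; `pca` is an illustrative name):

```python
import numpy as np

def pca(X, k):
    # 1. center the data: subtract the mean of each feature
    X_centered = X - X.mean(axis=0)
    # 2–3. SVD of the centered data (equivalent to eigendecomposition
    #      of the covariance matrix); rows of Vt are principal components
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # fraction of total variance explained by each component
    explained = S**2 / np.sum(S**2)
    # 4. project onto the top k principal components
    return X_centered @ Vt[:k].T, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, explained = pca(X, k=2)
print(Z.shape)  # (100, 2)
```

To choose k by the "95% of variance" rule, keep the smallest k such that `explained[:k].sum() >= 0.95`.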

use PCA because it:

  • removes redundancy between correlated features
  • speeds up downstream computation (e.g. kNN distance calculations)
  • has a classic application: eigenfaces

classification

  • assign input vector to a class
  • geometric interpretation of classifiers:
    • classifier divides input space into decision regions separated by decision boundaries

challenges

  • small k: sensitive to noise
  • large k: may include irrelevant distant points
  • solution: use cross-validation to choose k, balancing train and test error
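Choosing k by cross-validation can be sketched as follows: split the data into folds, measure kNN error on each held-out fold, and pick the k with the lowest mean error (a minimal sketch; `cv_error` and the toy clusters are illustrative, not from the notes):

```python
import numpy as np

def cv_error(X, y, k, n_folds=5):
    """Mean n-fold cross-validation error of a kNN classifier with k neighbors."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)  # everything outside the held-out fold
        wrong = 0
        for i in fold:
            dists = np.linalg.norm(X[train] - X[i], axis=1)
            nearest = train[np.argsort(dists)[:k]]
            labels, counts = np.unique(y[nearest], return_counts=True)
            wrong += labels[np.argmax(counts)] != y[i]
        errors.append(wrong / len(fold))
    return float(np.mean(errors))

# toy data: two well-separated clusters, so any small odd k works
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
y = np.array([0] * 10 + [1] * 10)
best_k = min(range(1, 10, 2), key=lambda k: cv_error(X, y, k))
```

Odd values of k avoid ties in binary-label voting; on real data the error curve is typically U-shaped in k, reflecting the noise-versus-smoothing trade-off above.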