• adapt pre-trained models to a new task instead of training from scratch

two approaches:

  • feature extraction:
    • use pre-trained model as a frozen backbone (its layers already encode generic features)
    • train a new classifier head on top of the backbone's output features
  • fine-tuning:
    • start with pre-trained weights
    • keep most of the big model frozen; unfreeze and train just a few layers (typically the last ones, at a low learning rate)
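
rough sketch of both approaches in PyTorch. Assumptions: `make_backbone` is a hypothetical stand-in for a real pre-trained network (weights here are random, nothing is downloaded), and the layer sizes, 10-class head, and learning rates are illustrative only. Freezing is done by setting `requires_grad = False` on the parameters.

```python
import torch
from torch import nn


def make_backbone() -> nn.Sequential:
    # Hypothetical stand-in for a pre-trained backbone (assumption: in
    # practice you would load real pre-trained weights here).
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
    )


def set_trainable(module: nn.Module, trainable: bool) -> None:
    # Frozen parameters receive no gradients and are never updated.
    for p in module.parameters():
        p.requires_grad = trainable


def count_trainable(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Feature extraction: freeze the whole backbone, train only a new head.
fe_backbone = make_backbone()
set_trainable(fe_backbone, False)
fe_head = nn.Linear(64, 10)
feature_extractor = nn.Sequential(fe_backbone, fe_head)
fe_optimizer = torch.optim.Adam(fe_head.parameters(), lr=1e-3)

# Fine-tuning: start from the same kind of weights, but unfreeze the last
# backbone layer and train it together with the head.
ft_backbone = make_backbone()
set_trainable(ft_backbone, False)
set_trainable(ft_backbone[2], True)  # index 2 is the second Linear layer
ft_head = nn.Linear(64, 10)
fine_tuned = nn.Sequential(ft_backbone, ft_head)
ft_optimizer = torch.optim.Adam([
    # Smaller learning rate for pre-trained weights, larger for the new head.
    {"params": ft_backbone[2].parameters(), "lr": 1e-5},
    {"params": ft_head.parameters(), "lr": 1e-3},
])

# One training step on random data to show which layers get gradients.
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(fine_tuned(x), y)
ft_optimizer.zero_grad()
loss.backward()
ft_optimizer.step()

print("feature extraction trainable params:", count_trainable(feature_extractor))
print("fine-tuning trainable params:", count_trainable(fine_tuned))
```

after `backward()`, the frozen first layer has no gradient (`ft_backbone[0].weight.grad is None`) while the unfrozen layer and the head do. The only difference between the two setups is which parameters are left trainable.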

this reuse of pre-trained models powers modern vision-language models (VLMs)