what is it?

  • social media platform and visual discovery engine
  • it helps people find and save ideas and inspiration
  • also has a focus on fashion and e-commerce

key questions

  • What interests should we recommend to a new user?
  • How do we generate an engaging home feed?
  • How do pins relate to each other?
  • What interests does a pin belong to?

Pinterest is trying to do 3 things with Computer Vision:

  1. understand the aesthetic qualities of a product or service to make better recommendations
  2. look inside an image with multiple items and search for similar results using any of those items
  3. make the camera the tool you use to query the world

key features

pins

  • media that users can save into different boards
  • boards can be organized into sections (subgroups)
  • users can pin content from other sources to Pinterest

recommendation system

  • Machine Learning is used to learn interests of users

  • AI is used to categorize and sort uploaded photos, making them easier to rank

    • a representation learning model compares photos and groups them based on similar qualities
      • visual patterns
      • metadata and user-labeled data
      • By saving pins into boards, users actually create a labeled dataset describing their preferences
    • images that are not visually similar can also be linked based on shared boards, captions, etc.
    • the ranking model considers the following signals (see the scoring sketch at the end of this section)
      • domain quality
        • how well photos from a website perform on the app
      • pin quality
        • how well a photo performs based on user interactions + engagement (with pinner’s own followers, with larger audience, etc.)
      • pinner quality
        • engagement rate, quality of posted pins, how much engagement a user provides to others
      • topic relevance
        • compare user preferences with pin content embeddings to rank the most relevant content to display
  • PinSage represents pins and their relationships (e.g., images saved to the same boards) as a graph and learns embeddings over it to generate content recommendations
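A toy sketch of the graph idea above, assuming pin embeddings are already computed and a pin-to-pin graph is built from co-saves on boards. All names and data are hypothetical; the real PinSage trains a graph convolutional network at much larger scale.

```python
import numpy as np

def aggregate_embeddings(pin_embeddings, neighbors, alpha=0.5):
    """One round of neighborhood aggregation: mix each pin's own embedding
    with the mean embedding of its graph neighbors."""
    mixed = {}
    for pin, emb in pin_embeddings.items():
        neigh = neighbors.get(pin, [])
        if neigh:
            neigh_mean = np.mean([pin_embeddings[n] for n in neigh], axis=0)
            mixed[pin] = alpha * emb + (1 - alpha) * neigh_mean
        else:
            mixed[pin] = emb
    return mixed

def recommend(query_pin, embeddings, top_k=3):
    """Rank other pins by cosine similarity to the query pin."""
    q = embeddings[query_pin]
    scores = {}
    for pin, emb in embeddings.items():
        if pin == query_pin:
            continue
        scores[pin] = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical data: visual embeddings plus a graph of pins co-saved on boards
pin_embeddings = {p: np.random.rand(8) for p in ["sofa", "lamp", "rug", "dress"]}
neighbors = {"sofa": ["lamp", "rug"], "lamp": ["sofa"], "rug": ["sofa"], "dress": []}

mixed = aggregate_embeddings(pin_embeddings, neighbors)
print(recommend("sofa", mixed))
```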

  • Pinnability estimates how likely a given user is to interact with a Pin, keeping recommendations relevant
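The scoring sketch referenced above: a minimal illustration of how the four ranking signals (domain quality, pin quality, pinner quality, topic relevance) could be combined into one score, with topic relevance computed as cosine similarity between a user embedding and a pin embedding. The weights, field names, and values are assumptions for illustration, not Pinterest's actual Pinnability model.

```python
import numpy as np

def topic_relevance(user_embedding, pin_embedding):
    """Cosine similarity between a user's interest embedding and a pin's content embedding."""
    return float(np.dot(user_embedding, pin_embedding) /
                 (np.linalg.norm(user_embedding) * np.linalg.norm(pin_embedding)))

def rank_score(pin, user_embedding, weights=(0.2, 0.3, 0.2, 0.3)):
    """Weighted combination of the four ranking signals (all assumed normalized to [0, 1])."""
    w_domain, w_pin, w_pinner, w_topic = weights
    return (w_domain * pin["domain_quality"]
            + w_pin * pin["pin_quality"]
            + w_pinner * pin["pinner_quality"]
            + w_topic * topic_relevance(user_embedding, pin["embedding"]))

# Illustrative usage with made-up values
user_embedding = np.random.rand(16)
candidate = {"domain_quality": 0.8, "pin_quality": 0.7, "pinner_quality": 0.9,
             "embedding": np.random.rand(16)}
print(round(rank_score(candidate, user_embedding), 3))
```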

visual lens

  • search for items or ideas within images with Computer Vision

  • automatic Object Detection to find all objects in an image in real-time
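Pinterest's production detector is a Faster R-CNN variant (detailed under visual search below). As a rough stand-in for the idea, a detection pass with torchvision's pretrained Faster R-CNN looks like this; it assumes torchvision >= 0.13, and the image file name is hypothetical.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained Faster R-CNN (region proposal network + detection head + NMS built in)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = convert_image_dtype(read_image("living_room.jpg"), torch.float)  # hypothetical file
with torch.no_grad():
    detections = model([image])[0]

# Keep confident detections; each box could seed a visual search for that object
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.7:
        print(label.item(), [round(v) for v in box.tolist()], round(score.item(), 2))
```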

  • query understanding layer

    • compute visual features
      • objects
      • salient colors
      • lighting
      • image quality conditions
    • compute semantic features such as annotations and category
  • blender - blends results from multiple sources (see the blending sketch at the end of this section)

    • visual search returns visually similar results
      • enables object-to-object matching
      • seamless with auto object detection
      • challenge: collecting labeled bounding boxes for regions of interest is expensive, so image crops (visual search queries) are aggregated to learn which objects Pinners are interested in
      • the annotations of each crop's visually similar results are aggregated to assign a weak label across hundreds of object categories
      • uses Faster R-CNN (a detector built on convolutional neural networks (CNNs))
        • identifies regions likely to contain objects of interest by running a CNN to produce a feature map
          • for each location on the feature map, the network considers a fixed set of candidate regions (anchors) and uses a binary softmax classifier to estimate how likely each is to contain an object of interest
        • for each candidate region, performs spatial pooling to produce a feature vector of fixed size
        • the feature vector is fed into a detection network, which uses a softmax to classify the region as background or an object category
        • bounding-box regression further adjusts the boundaries to refine the detection
        • non-maximum suppression to filter duplicate detections
    • object search returns scenes with visually similar objects
      • traditional visual search systems treat whole image as a unit
      • Pinterest wanted to understand images at a more fine-grained level
      • it knows both the location and semantic meaning of billions of objects in its image corpus!
      • objects are the unit: given an input image, the system finds the most visually similar objects across billions of images, maps each match back to the image it came from, and returns scenes containing similar objects (see the object search sketch at the end of this section)
    • image search returns personalized text search results that are semantically relevant to the input image
    • the blender dynamically adjusts blending ratios and sources based on info from the query understanding layer
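The blending sketch referenced above: merge ranked results from the three sources, with source weights chosen from query-understanding output. The heuristics, field names, and scores are assumptions for illustration, not Pinterest's actual blender.

```python
def choose_blend_ratios(query_info):
    """Pick source weights from query-understanding output (illustrative heuristic)."""
    if query_info.get("objects_detected", 0) > 1:
        # Multi-object scenes lean on object search
        return {"visual_search": 0.3, "object_search": 0.5, "image_search": 0.2}
    return {"visual_search": 0.5, "object_search": 0.2, "image_search": 0.3}

def blend(results_by_source, query_info, top_k=10):
    """Merge per-source ranked lists of (result_id, score) into one ranking."""
    ratios = choose_blend_ratios(query_info)
    combined = {}
    for source, results in results_by_source.items():
        for result_id, score in results:
            combined[result_id] = combined.get(result_id, 0.0) + ratios[source] * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Illustrative usage
results = {
    "visual_search": [("pin_a", 0.9), ("pin_b", 0.8)],
    "object_search": [("pin_c", 0.95), ("pin_a", 0.6)],
    "image_search": [("pin_d", 0.7)],
}
print(blend(results, {"objects_detected": 3}))
```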
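And the object search sketch referenced above: a nearest-neighbor lookup over object embeddings that maps each matched object back to the scene it came from. In production this would run over an approximate nearest-neighbor index of billions of objects; everything here is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical index: one embedding per detected object, plus the scene (image) it belongs to
object_embeddings = np.random.rand(1000, 64)            # stand-in for billions of objects
object_to_scene = [f"scene_{i % 200}" for i in range(1000)]

def object_search(query_embedding, top_k=5):
    """Return the scenes whose objects are most visually similar to the query object."""
    norms = np.linalg.norm(object_embeddings, axis=1) * np.linalg.norm(query_embedding)
    sims = object_embeddings @ query_embedding / norms
    best = np.argsort(-sims)[:top_k]
    return [(object_to_scene[i], float(sims[i])) for i in best]

query = np.random.rand(64)   # embedding of one detected object in the input image
print(object_search(query))
```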

data architecture

  • Apache Kafka - ingest and process live data feeds
  • Redshift - data warehousing, managing and analyzing data
  • Hadoop - batch processing of massive datasets
  • Storm - real-time computation and stream processing
  • HBase - backend storage
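For a sense of how the streaming side fits together, here is a minimal event-producer sketch using the kafka-python client. The broker address, topic name, and event fields are made up for illustration.

```python
import json
from kafka import KafkaProducer

# Connect to a Kafka broker and serialize events as JSON
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Hypothetical engagement event; downstream consumers (e.g., Storm topologies)
# would read this topic to update real-time stats
producer.send("pin-engagement", {"user_id": 42, "pin_id": "abc123", "action": "save"})
producer.flush()
```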

analytics

  • service providing stats on website traffic
  • engagement, impressions, pin clicks, outbound clicks, saves
  • data can be used to investigate product popularity based on time of day, trends, etc.
  • marketing agencies can use this data to optimize products' selling potential
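A small sketch of the time-of-day analysis mentioned above, using pandas on a hypothetical engagement log; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical export of engagement events
events = pd.DataFrame({
    "pin_id": ["a", "a", "b", "b", "c"],
    "action": ["impression", "save", "pin_click", "outbound_click", "save"],
    "timestamp": pd.to_datetime([
        "2024-05-01 09:15", "2024-05-01 21:40", "2024-05-02 10:05",
        "2024-05-02 20:30", "2024-05-03 21:10",
    ]),
})

# Engagement counts by hour of day, e.g. to see when saves peak
by_hour = (events
           .assign(hour=events["timestamp"].dt.hour)
           .groupby(["hour", "action"])
           .size()
           .unstack(fill_value=0))
print(by_hour)
```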