what is it?

  • social media platform and visual discovery engine
  • it helps people find and save ideas and inspiration
  • also has a focus on fashion and e-commerce

key questions

  • What interests should we recommend to a new user?
  • How do we generate an engaging home feed?
  • How do pins relate to each other?
  • What interests does a pin belong to?

Pinterest is trying to do 3 things with Computer Vision:

  1. understand the aesthetic qualities of a product or service to make better recommendations
  2. look inside an image with multiple items and search for similar results using any of those items
  3. make the camera the tool you use to query the world

key features

pins

  • media that users can save into different boards
  • boards can be organized into sections (subgroups)
  • users can pin content from other sources to Pinterest

recommendation system

  • Machine Learning is used to learn interests of users

  • AI is used to categorize and sort uploaded photos, making them easier to rank

    • a representation learning model compares photos and groups them based on similar qualities
      • visual patterns
      • metadata and user-labeled data
      • By saving pins into boards, users actually create a labeled dataset describing their preferences
    • images that are not visually similar can also be linked based on shared boards, captions, etc.
    • the ranking model considers the following signals (see the scoring sketch at the end of this section)
      • domain quality
        • how well photos from a website perform on the app
      • pin quality
        • how well a photo performs based on user interactions + engagement (with pinner’s own followers, with larger audience, etc.)
      • pinner quality
        • engagement rate, quality of posted pins, how much engagement a user provides to others
      • topic relevance
        • compare user preferences with pin content embeddings to rank the most relevant content to display
  • PinSage represents pins and their relationships (e.g., images saved to the same boards) as a graph and learns embeddings over it to generate content recommendations
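A toy sketch of the graph idea above, assuming pin embeddings are already computed and a pin-to-pin graph is built from co-saves on boards. All names and data are hypothetical; the real PinSage trains a graph convolutional network at much larger scale.

```python
import numpy as np

def aggregate_embeddings(pin_embeddings, neighbors, alpha=0.5):
    """One round of neighborhood aggregation: mix each pin's own embedding
    with the mean embedding of its graph neighbors."""
    mixed = {}
    for pin, emb in pin_embeddings.items():
        neigh = neighbors.get(pin, [])
        if neigh:
            neigh_mean = np.mean([pin_embeddings[n] for n in neigh], axis=0)
            mixed[pin] = alpha * emb + (1 - alpha) * neigh_mean
        else:
            mixed[pin] = emb
    return mixed

def recommend(query_pin, embeddings, top_k=3):
    """Rank other pins by cosine similarity to the query pin."""
    q = embeddings[query_pin]
    scores = {}
    for pin, emb in embeddings.items():
        if pin == query_pin:
            continue
        scores[pin] = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical data: visual embeddings plus a graph of pins co-saved on boards
pin_embeddings = {p: np.random.rand(8) for p in ["sofa", "lamp", "rug", "dress"]}
neighbors = {"sofa": ["lamp", "rug"], "lamp": ["sofa"], "rug": ["sofa"], "dress": []}

mixed = aggregate_embeddings(pin_embeddings, neighbors)
print(recommend("sofa", mixed))
```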

  • Pinnability estimates how likely a given user is to interact with a Pin, keeping recommendations relevant
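The scoring sketch referenced above: a minimal illustration of how the four ranking signals (domain quality, pin quality, pinner quality, topic relevance) could be combined into one score, with topic relevance computed as cosine similarity between a user embedding and a pin embedding. The weights, field names, and values are assumptions for illustration, not Pinterest's actual Pinnability model.

```python
import numpy as np

def topic_relevance(user_embedding, pin_embedding):
    """Cosine similarity between a user's interest embedding and a pin's content embedding."""
    return float(np.dot(user_embedding, pin_embedding) /
                 (np.linalg.norm(user_embedding) * np.linalg.norm(pin_embedding)))

def rank_score(pin, user_embedding, weights=(0.2, 0.3, 0.2, 0.3)):
    """Weighted combination of the four ranking signals (all assumed normalized to [0, 1])."""
    w_domain, w_pin, w_pinner, w_topic = weights
    return (w_domain * pin["domain_quality"]
            + w_pin * pin["pin_quality"]
            + w_pinner * pin["pinner_quality"]
            + w_topic * topic_relevance(user_embedding, pin["embedding"]))

# Illustrative usage with made-up values
user_embedding = np.random.rand(16)
candidate = {"domain_quality": 0.8, "pin_quality": 0.7, "pinner_quality": 0.9,
             "embedding": np.random.rand(16)}
print(round(rank_score(candidate, user_embedding), 3))
```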

visual lens

  • search for items or ideas within images with Computer Vision

  • automatic Object Detection to find all objects in an image in real-time
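Pinterest's production detector is a Faster R-CNN variant (detailed under visual search below). As a rough stand-in for the idea, a detection pass with torchvision's pretrained Faster R-CNN looks like this; it assumes torchvision >= 0.13, and the image file name is hypothetical.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained Faster R-CNN (region proposal network + detection head + NMS built in)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = convert_image_dtype(read_image("living_room.jpg"), torch.float)  # hypothetical file
with torch.no_grad():
    detections = model([image])[0]

# Keep confident detections; each box could seed a visual search for that object
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.7:
        print(label.item(), [round(v) for v in box.tolist()], round(score.item(), 2))
```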

  • query understanding layer

    • compute visual features
      • objects
      • salient colors
      • lighting
      • image quality conditions
    • compute semantic features such as annotations and category
  • blender - blends results from multiple sources (see the blending sketch at the end of this section)

    • visual search returns visually similar results
      • enables object-to-object matching
      • seamless with auto object detection
      • challenge: collecting labeled bounding boxes for regions of interest is expensive, so image crops (visual search queries) are aggregated to learn which objects Pinners are interested in
      • the annotations of each crop's visually similar results are aggregated to assign a weak label across hundreds of object categories
      • uses Faster R-CNN (a detector built on convolutional neural networks (CNNs))
        • identifies regions likely to contain objects of interest by running a CNN to produce a feature map
          • for each location on the feature map, the network considers a fixed set of candidate regions (anchors) and uses a binary softmax classifier to estimate how likely each is to contain an object of interest
        • for each candidate region, performs spatial pooling to produce a feature vector of fixed size
        • the feature vector is fed into a detection network, which uses a softmax to classify the region as background or an object category
        • bounding-box regression further adjusts the boundaries to refine the detection
        • non-maximum suppression to filter duplicate detections
    • object search returns scenes with visually similar objects
      • traditional visual search systems treat whole image as a unit
      • Pinterest wanted to understand images at a more fine-grained level
      • it knows both the location and semantic meaning of billions of objects in its image corpus!
      • objects are the unit: given an input image, the system finds the most visually similar objects across billions of images, maps each match back to the image it came from, and returns scenes containing similar objects (see the object search sketch at the end of this section)
    • image search returns personalized text search results that are semantically relevant to the input image
    • the blender dynamically adjusts blending ratios and sources based on info from the query understanding layer
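The blending sketch referenced above: merge ranked results from the three sources, with source weights chosen from query-understanding output. The heuristics, field names, and scores are assumptions for illustration, not Pinterest's actual blender.

```python
def choose_blend_ratios(query_info):
    """Pick source weights from query-understanding output (illustrative heuristic)."""
    if query_info.get("objects_detected", 0) > 1:
        # Multi-object scenes lean on object search
        return {"visual_search": 0.3, "object_search": 0.5, "image_search": 0.2}
    return {"visual_search": 0.5, "object_search": 0.2, "image_search": 0.3}

def blend(results_by_source, query_info, top_k=10):
    """Merge per-source ranked lists of (result_id, score) into one ranking."""
    ratios = choose_blend_ratios(query_info)
    combined = {}
    for source, results in results_by_source.items():
        for result_id, score in results:
            combined[result_id] = combined.get(result_id, 0.0) + ratios[source] * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Illustrative usage
results = {
    "visual_search": [("pin_a", 0.9), ("pin_b", 0.8)],
    "object_search": [("pin_c", 0.95), ("pin_a", 0.6)],
    "image_search": [("pin_d", 0.7)],
}
print(blend(results, {"objects_detected": 3}))
```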
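And the object search sketch referenced above: a nearest-neighbor lookup over object embeddings that maps each matched object back to the scene it came from. In production this would run over an approximate nearest-neighbor index of billions of objects; everything here is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical index: one embedding per detected object, plus the scene (image) it belongs to
object_embeddings = np.random.rand(1000, 64)            # stand-in for billions of objects
object_to_scene = [f"scene_{i % 200}" for i in range(1000)]

def object_search(query_embedding, top_k=5):
    """Return the scenes whose objects are most visually similar to the query object."""
    norms = np.linalg.norm(object_embeddings, axis=1) * np.linalg.norm(query_embedding)
    sims = object_embeddings @ query_embedding / norms
    best = np.argsort(-sims)[:top_k]
    return [(object_to_scene[i], float(sims[i])) for i in best]

query = np.random.rand(64)   # embedding of one detected object in the input image
print(object_search(query))
```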

data architecture

  • Apache Kafka - ingest and process live data feeds
  • Redshift - data warehousing, managing and analyzing data
  • Hadoop - batch processing of massive datasets
  • Storm - real-time computation and stream processing
  • HBase - backend storage
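For a sense of how the streaming side fits together, here is a minimal event-producer sketch using the kafka-python client. The broker address, topic name, and event fields are made up for illustration.

```python
import json
from kafka import KafkaProducer

# Connect to a Kafka broker and serialize events as JSON
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Hypothetical engagement event; downstream consumers (e.g., Storm topologies)
# would read this topic to update real-time stats
producer.send("pin-engagement", {"user_id": 42, "pin_id": "abc123", "action": "save"})
producer.flush()
```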

analytics

  • service providing stats on website traffic
  • engagement, impressions, pin clicks, outbound clicks, saves
  • data can be used to investigate product popularity based on time of day, trends, etc.
  • marketing agencies can use this data to optimize products' selling potential
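A small sketch of the time-of-day analysis mentioned above, using pandas on a hypothetical engagement log; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical export of engagement events
events = pd.DataFrame({
    "pin_id": ["a", "a", "b", "b", "c"],
    "action": ["impression", "save", "pin_click", "outbound_click", "save"],
    "timestamp": pd.to_datetime([
        "2024-05-01 09:15", "2024-05-01 21:40", "2024-05-02 10:05",
        "2024-05-02 20:30", "2024-05-03 21:10",
    ]),
})

# Engagement counts by hour of day, e.g. to see when saves peak
by_hour = (events
           .assign(hour=events["timestamp"].dt.hour)
           .groupby(["hour", "action"])
           .size()
           .unstack(fill_value=0))
print(by_hour)
```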