• open-source software framework for storing and processing massive datasets across clusters of computers
  • distributed architecture: data is broken into pieces, and the pieces are processed in parallel across the cluster
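
The split-then-process-in-parallel idea in the bullets above can be sketched on a single machine. This is only an illustration, not the framework's own API: the function names (`split_into_chunks`, `count_words`, `parallel_word_count`), the word-count task, and the use of a thread pool as a stand-in for cluster nodes are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break the dataset into fixed-size pieces (the last one may be shorter)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Process one piece independently: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def parallel_word_count(records, chunk_size=2, workers=4):
    """Process every chunk in parallel, then merge the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    # Hypothetical stand-in: each worker thread plays the role of a cluster node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    lines = ["a b a", "b c", "a", "c c d"]
    print(parallel_word_count(lines))
```

Each chunk is handled without reference to any other chunk, which is what makes the work parallelizable; only the final merge needs to see all partial results. In a real cluster the chunks would live on different machines and the merge would happen over the network.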