Inside Facebook
Facebook Using Hadoop for Large Scale Internal Analytics
By Justin Smith

Facebook’s engineering team has posted some details on the tools it’s using to analyze the huge data sets it collects. One of the main tools it uses is Hadoop, an open source project that makes it easier to analyze vast amounts of data.

Some interesting tidbits from the post:

  • Some of these early projects have matured into publicly released features (like the Facebook Lexicon) or are being used in the background to improve user experience on Facebook (by improving the relevance of search results, for example).
  • Facebook has multiple Hadoop clusters deployed now – with the biggest having about 2500 cpu cores and 1 PetaByte of disk space. We are loading over 250 gigabytes of compressed data (over 2 terabytes uncompressed) into the Hadoop file system every day and have hundreds of jobs running each day against these data sets. The list of projects that are using this infrastructure has proliferated – from those generating mundane statistics about site usage, to others being used to fight spam and determine application quality.
  • Over time, we have added classic data warehouse features like partitioning, sampling and indexing to this environment. This in-house data warehousing layer over Hadoop is called Hive and we are looking forward to releasing an open source version of this project in the near future.
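
To make the warehouse features in the last point a little more concrete, here is a minimal Python sketch of partitioning and sampling. It is not Hive or Facebook's code. The record layout, the dates, and the function names (`partition_by_day`, `sample_partition`, `count_events`) are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical log records; the field names and dates are made up.
RECORDS = [
    {"day": "2008-06-01", "user": "a", "event": "search"},
    {"day": "2008-06-01", "user": "b", "event": "view"},
    {"day": "2008-06-02", "user": "a", "event": "view"},
    {"day": "2008-06-02", "user": "c", "event": "search"},
    {"day": "2008-06-02", "user": "d", "event": "view"},
]


def partition_by_day(records):
    """Group records by their 'day' field, one bucket per day."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["day"]].append(record)
    return dict(partitions)


def count_events(partitions, day):
    """Count one day's records by reading only that day's bucket."""
    return len(partitions.get(day, []))


def sample_partition(partition, fraction, seed=0):
    """Keep each record with probability `fraction` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [record for record in partition if rng.random() < fraction]


if __name__ == "__main__":
    partitions = partition_by_day(RECORDS)
    print(sorted(partitions))
    print(count_events(partitions, "2008-06-02"))
    print(len(sample_partition(partitions["2008-06-02"], 0.5)))
```

The point of partitioning is that a query about one day never has to touch the other days' data, and sampling lets an analyst test a job on a fraction of a partition before running it on everything.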

