Facebook Engineering Explains “Worst Outage We’ve Had in Over Four Years”

Facebook was down for two and a half hours earlier today for many of its 500-some million users around the world, in what the company describes as “the worst outage we’ve had in over four years.” During the downtime, social plugins such as the Like button, along with the developer platform, were also inaccessible. The site also went down yesterday, but apparently for a shorter time and for fewer people.

As the engineering team details in a post published this afternoon, a cache configuration problem cascaded into a major system failure, and Facebook ultimately had to turn off the site for many, if not all, users. The company tells us it doesn’t “have exact numbers, but this [was] very widespread.” From the post:

The way to stop the feedback cycle was quite painful – we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

This got the site back up and running today, and for now we’ve turned off the system that attempts to correct configuration values. We’re exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
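To illustrate what dealing "more gracefully with feedback loops and transient spikes" can mean in practice, here is a minimal sketch of one common damping technique: retrying a failed database query with capped exponential backoff and random jitter, so that clients spread out their retries instead of hitting an overloaded cluster all at once. This is not Facebook's design, which the post does not describe in detail. The names `backoff_delay`, `fetch_with_backoff`, and `query`, and all default values, are hypothetical and chosen only for illustration.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=30.0, rng=random):
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def fetch_with_backoff(query, attempts=5, sleep=time.sleep, rng=random):
    """Call query(), retrying on ConnectionError with jittered backoff.

    `query` is a hypothetical zero-argument callable standing in for a
    database read. The last failure is re-raised once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return query()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up instead of retrying forever
            # Wait longer after each failure so retries do not pile up.
            sleep(backoff_delay(attempt, rng=rng))
```

The point of the jitter is that clients which fail at the same moment do not all retry at the same moment, which is one way a transient spike can be kept from feeding on itself.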

While Facebook has had occasional site performance problems, it has generally managed to stay up for almost all users almost all of the time, and performance has improved in recent years.


2 Responses to “Facebook Engineering Explains “Worst Outage We’ve Had in Over Four Years””

  1. John Johnston says:

    Nasty business. Must have been some pretty stressed engineers there having to take the site down for that long.

  2. essay_writing says:

    As I understand, the errors have been accumulating and this led to the system breakdown? If yes, I guess more tests and better monitoring the system all the time it was working could have avoided the breakdown.

Mediabistro, a division of Prometheus Global Media. Copyright © 2014 Mediabistro Inc.