Details on Facebook Chat Architecture
For those interested in building scalable systems, Eugene Letuchy, lead engineer on Facebook Chat, has posted details on many of the key engineering decisions his team made in designing the Chat back-end infrastructure.
Letuchy writes, “When your feature’s userbase will go from 0 to 70 million practically overnight, scalability has to be baked in from the start.” That’s an understatement! While we’ve gotten several reports of Facebook Chat breaking under Firefox 3.0 RC1, on the whole Facebook Chat’s rollout has been very, very solid.
Some highlights from the report:
- The most resource-intensive operation performed in a chat system is not sending messages. It is rather keeping each online user aware of the online-idle-offline states of their friends, so that conversations can begin.
- Another challenge is ensuring the timely delivery of the messages themselves. The method we chose to get text from one user to another involves loading an iframe on each Facebook page, and having that iframe’s Javascript make an HTTP GET request over a persistent connection that doesn’t return until the server has data for the client. The request gets reestablished if it’s interrupted or times out. This isn’t by any means a new technique: it’s a variation of Comet, specifically XHR long polling, and/or BOSH.
- Having a large number of long-running concurrent requests makes the Apache part of the standard LAMP stack a dubious implementation choice.
- For Facebook Chat, we rolled our own subsystem for logging chat messages (in C++) as well as an epoll-driven web server (in Erlang) that holds online users’ conversations in-memory and serves the long-polled HTTP requests. Both subsystems are clustered and partitioned for reliability and efficient failover. Why Erlang? In short, because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space “processes”, share-nothing message-passing semantics, built-in distribution, and a “crash and recover” philosophy proven by two decades of deployment on large soft-realtime production systems.
- Having Thrift available freed us to split up the problem of building a chat system and use the best available tool to approach each sub-problem.
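
The long-polling cycle described in the highlights above can be sketched in-process, without any HTTP at all. This is a minimal Python illustration and not Facebook's implementation (their server is written in Erlang): `LongPollChannel`, `send`, `poll`, and `client_loop` are hypothetical names, and a thread-safe queue stands in for the held-open HTTP GET request.

```python
import queue
import threading


class LongPollChannel:
    """Hypothetical per-user mailbox standing in for the chat server."""

    def __init__(self):
        self._inbox = queue.Queue()

    def send(self, message):
        # Another user (or the server) delivers a message to this user.
        self._inbox.put(message)

    def poll(self, timeout):
        """Simulate one long-polled request.

        Blocks until at least one message is available or the timeout
        expires. Returns a list of messages, empty on timeout.
        """
        try:
            first = self._inbox.get(timeout=timeout)
        except queue.Empty:
            return []  # the request "timed out" with no data
        messages = [first]
        # Drain anything else already queued so one response carries it all.
        while True:
            try:
                messages.append(self._inbox.get_nowait())
            except queue.Empty:
                return messages


def client_loop(channel, received, stop, timeout=0.05):
    """Client side: re-issue the request after every response or timeout."""
    while not stop.is_set():
        received.extend(channel.poll(timeout))
```

The point of the sketch is the shape of the loop: the client always has one request outstanding, the server answers only when it has data (or gives up after a timeout), and the client immediately asks again. In a real deployment each blocked `poll` would be an open connection, which is why the number of concurrent long-running requests becomes the dominant cost.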

May 20th, 2008 at 12:14 pm
Great information. We think it was a good move for Facebook to add an embedded chat program to the mix, just as long as they don’t gather the information within the conversations for their own benefit. One can become wary of these social networks and their motives regarding personal privacy these days.
May 23rd, 2008 at 2:43 am
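[...] Recently I have seen quite a few introductions to the technical architecture of Facebook Chat, such as those on InfoQ and Inside Facebook. To summarize: the publicly available material is mostly conceptual, with no new or special highlights. [...]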
December 9th, 2008 at 9:14 am
I was just wondering, how do I recall chat messages? Are they stored on my computer somehow?