Details on Facebook Chat Architecture
May 20th, 2008
For those interested in building scalable systems, Eugene Letuchy, lead engineer on Facebook Chat, has posted details on many of the key engineering decisions his team made in designing the Chat back-end infrastructure.
Letuchy writes, “When your feature’s userbase will go from 0 to 70 million practically overnight, scalability has to be baked in from the start.” That’s an understatement! While we’ve gotten several reports of Facebook Chat breaking under Firefox 3.0 RC1, on the whole Facebook Chat’s rollout has been very, very solid.
Some highlights from the report:
- The most resource-intensive operation performed in a chat system is not sending messages. It is rather keeping each online user aware of the online-idle-offline states of their friends, so that conversations can begin.
- Another challenge is ensuring the timely delivery of the messages themselves. The method we chose to get text from one user to another involves loading an iframe on each Facebook page, and having that iframe’s Javascript make an HTTP GET request over a persistent connection that doesn’t return until the server has data for the client. The request gets reestablished if it’s interrupted or times out. This isn’t by any means a new technique: it’s a variation of Comet, specifically XHR long polling, and/or BOSH.
- Having a large number of long-running concurrent requests makes the Apache part of the standard LAMP stack a dubious implementation choice.
- For Facebook Chat, we rolled our own subsystem for logging chat messages (in C++) as well as an epoll-driven web server (in Erlang) that holds online users’ conversations in-memory and serves the long-polled HTTP requests. Both subsystems are clustered and partitioned for reliability and efficient failover. Why Erlang? In short, because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space “processes”, share-nothing message-passing semantics, built-in distribution, and a “crash and recover” philosophy proven by two decades of deployment on large soft-realtime production systems.
- Having Thrift available freed us to split up the problem of building a chat system and use the best available tool to approach each sub-problem.
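
To make the long-polling idea from the highlights concrete, here is a minimal sketch in Python. It is not Facebook's implementation (theirs is an Erlang web server behind an iframe): it swaps the HTTP layer for an in-memory queue per user so the hold-until-data-or-timeout behavior can be seen in isolation. The names `LongPollHub`, `poll`, `send`, and `client_loop`, and the timeout values, are all hypothetical.

```python
import asyncio
from collections import defaultdict


class LongPollHub:
    """Hypothetical in-memory stand-in for a server with one queue per user."""

    def __init__(self, timeout=0.2):
        self.timeout = timeout  # how long a poll may wait before replying empty
        self.queues = defaultdict(asyncio.Queue)

    async def poll(self, user):
        """Hold the 'request' open until a message arrives or the timeout expires."""
        queue = self.queues[user]
        try:
            first = await asyncio.wait_for(queue.get(), self.timeout)
        except asyncio.TimeoutError:
            return []  # empty reply; the client is expected to poll again
        messages = [first]
        while not queue.empty():  # drain anything else that is already waiting
            messages.append(queue.get_nowait())
        return messages

    def send(self, to_user, text):
        """Deliver a message; a waiting poll for that user wakes up."""
        self.queues[to_user].put_nowait(text)


async def client_loop(hub, user, rounds):
    """Client side: re-issue the poll as soon as the previous one returns."""
    received = []
    for _ in range(rounds):
        received.extend(await hub.poll(user))
    return received


async def demo():
    hub = LongPollHub(timeout=0.2)
    listener = asyncio.create_task(client_loop(hub, "alice", rounds=3))
    await asyncio.sleep(0.05)
    hub.send("alice", "hi")
    await asyncio.sleep(0.05)
    hub.send("alice", "are you there?")
    return await listener


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each waiting `poll` here is a cheap coroutine rather than a thread or a process, which is the same property the report cites when it questions Apache and praises Erlang's lightweight processes for holding many open requests at once.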

May 20th, 2008 at 12:14 pm
Great information. We think it was a good move for Facebook to include an embedded chat program in the mix, just as long as they don’t gather the information within the conversations for their own benefit. One can become wary of these social networks and their motives regarding personal privacy these days.
May 23rd, 2008 at 2:43 am
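[...] Recently I have seen quite a few introductions to the technical architecture of Facebook Chat, such as those from InfoQ and Inside Facebook. To summarize: the public material is mostly conceptual, with no new or special highlights. [...]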
December 9th, 2008 at 9:14 am
I was just wondering, how do I recall chat messages? Are they stored on my computer somehow?