Introduction • 1.15 billion monthly active users • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments) • 2.7 billion Likes per day • 300 million photos uploaded per day • 500+ terabytes of new data ingested into the databases every day Given these statistics, Facebook has to rely on powerful technology to handle this traffic and give its users a faster and safer social experience.
Technologies For faster data transfer • Cookies and Caches • GZip compression • AJAX and JSON • XMPP messaging For data storage • HBase & Haystack • Zookeeper • Memcached • Scribe
Cookies and Caches Cookies are small pieces of data that are stored on your computer, mobile phone or other device. A cache is a type of memory used by the web browser. When a page uses resources that rarely change, the browser caches its CSS/JS and reads them from local memory on later visits, which reduces data transfer. These technologies help Facebook provide and understand a range of products and services. Facebook uses them to do things like: • make Facebook easier or faster to use; • enable features and store information about you (including on your device or in your browser cache) and your use of Facebook; • deliver, understand and improve advertising; • monitor and understand the use of Facebook products and services; • protect you, others and Facebook.
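To make the two mechanisms concrete, here is a minimal sketch using Python's standard `http.cookies` module. The helper names, the cookie name `locale`, and the one-day cache lifetime are assumptions chosen for illustration; nothing here is taken from Facebook's actual configuration.

```python
from http.cookies import SimpleCookie


def build_static_headers(max_age_seconds):
    """Headers asking the browser to cache a rarely changing asset (CSS/JS)."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


def build_cookie(name, value):
    """Return a Set-Cookie header value holding a small piece of per-user data."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True  # hide the cookie from page scripts
    return cookie[name].OutputString()


def parse_cookie_header(header):
    """Turn the Cookie header sent back by the browser into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {key: morsel.value for key, morsel in cookie.items()}


# Hypothetical usage: cache a stylesheet for one day and remember a locale.
static_headers = build_static_headers(86400)
set_cookie_value = build_cookie("locale", "en_US")
received = parse_cookie_header("locale=en_US; theme=dark")
```

The browser stores the cookie and sends it back on later requests, while the `Cache-Control` header lets it reuse the stylesheet without downloading it again.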
Gzip Compression Gzip is a software application used for file compression and decompression. The server compresses text-based responses such as CSS and JS before sending them (image formats are usually compressed already), and the client decompresses them after download. The data and the UI are unchanged, but the amount of data transferred is smaller. Facebook's servers use Gzip compression to make pages load faster.
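The round trip can be sketched with Python's standard `gzip` module. The sample stylesheet and the function names are made up for illustration; a real web server normally applies this through its own configuration rather than application code.

```python
import gzip


def compress_for_transfer(payload):
    """Compress a response body before it is sent over the network."""
    return gzip.compress(payload)


def decompress_on_client(payload):
    """Restore the original bytes after download."""
    return gzip.decompress(payload)


# A repetitive stylesheet, standing in for a typical text asset.
css = ("body { margin: 0; padding: 0; font-family: sans-serif; }\n" * 200).encode("utf-8")

compressed = compress_for_transfer(css)
restored = decompress_on_client(compressed)

print(len(css), "bytes before,", len(compressed), "bytes on the wire")
```

The restored bytes are identical to the original, so the page looks the same; only the transfer size changes.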
AJAX and JSON AJAX is a group of interrelated web development techniques used on the client side to create asynchronous web applications; JSON is the lightweight data format commonly exchanged in those requests. With AJAX, web applications can send data to, and retrieve data from, a server asynchronously (in the background) without interfering with the display and behavior of the existing page. Data can be retrieved using the XMLHttpRequest object. Where AJAX and JSON are mainly used in Facebook: • Like, Comment, Share • Post story • Send message • Load feed • Dialog boxes: likes, mutual friends, etc.
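As a rough illustration of the JSON side of such an exchange, the sketch below shows a server-side handler for a background "Like" request. The function name, the `post_id` field, and the in-memory counter are all hypothetical; the real request and response formats are not described in these slides.

```python
import json


def handle_like_request(request_body, like_counts):
    """Process the JSON body of a background 'Like' request.

    request_body: JSON text sent by the page, e.g. '{"post_id": "42"}'.
    like_counts:  dict standing in for the real data store (an assumption).
    Returns JSON text that the page can use to update the counter in place.
    """
    request = json.loads(request_body)
    post_id = request["post_id"]
    like_counts[post_id] = like_counts.get(post_id, 0) + 1
    return json.dumps({"post_id": post_id, "likes": like_counts[post_id]})


# Hypothetical usage: the page sends this body without reloading.
counts = {"42": 7}
response_text = handle_like_request('{"post_id": "42"}', counts)
```

Because only this small JSON document travels back to the browser, the page can update the Like counter without reloading anything else.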
XMPP Messaging XMPP stands for Extensible Messaging and Presence Protocol; it is also known as the Jabber protocol. Facebook chat and messages work on this platform. Every Facebook user has a unique ID and a personal chat address such as email@example.com. When someone sends a message to that user, the core script converts it to XML and sends it to the Jabber server. The recipient then receives the message almost instantly, thanks to highly reliable servers.
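The "convert it to XML" step can be sketched as building an XMPP `<message>` stanza. This uses only Python's standard `xml.etree.ElementTree`; the addresses are placeholders, and the sketch leaves out the stream setup, authentication, and network transport that a real XMPP client needs.

```python
import xml.etree.ElementTree as ET


def build_message_stanza(sender, recipient, text):
    """Serialize a chat message as an XMPP <message> stanza."""
    message = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    body = ET.SubElement(message, "body")
    body.text = text  # ElementTree escapes characters such as < and &
    return ET.tostring(message, encoding="unicode")


# Placeholder addresses, chosen only for illustration.
stanza = build_message_stanza("alice@example.com", "bob@example.com", "Lunch at 1 < 2?")
parsed = ET.fromstring(stanza)
```

The server routes the stanza by its `to` attribute, and the receiving client reads the text from the `<body>` element.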
Manage data in large clusters
HBase and Haystack HDFS (Hadoop Distributed File System) • HBase & HDFS are elastic by design • Multiple table shards (regions) per physical server • On node additions, the load balancer automatically reassigns shards from overloaded nodes to new nodes • Because the file system underneath is itself distributed, data for reassigned regions is instantly servable from the new nodes • Regions can be dynamically split into smaller regions • Pre-sharding is not necessary • Splits are near instantaneous
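The two ideas on this slide, splitting a region and moving regions to a newly added node, can be modeled in a few lines. This is a toy model under simplifying assumptions (regions as key ranges, load measured by region count); it is not HBase's actual balancer, and every name in it is hypothetical.

```python
def split_region(region, split_key):
    """Split one key range (start, end) into two adjacent ranges at split_key."""
    start, end = region
    if not (start < split_key < end):
        raise ValueError("split key must fall strictly inside the region")
    return [(start, split_key), (split_key, end)]


def rebalance_onto_new_node(assignments, new_node):
    """Move regions from the most loaded nodes until the new node holds
    roughly the average number of regions. Returns a new mapping."""
    result = {node: list(regions) for node, regions in assignments.items()}
    result[new_node] = []
    total = sum(len(regions) for regions in result.values())
    target = total // len(result)
    while len(result[new_node]) < target:
        donor = max(
            (node for node in result if node != new_node),
            key=lambda node: len(result[node]),
        )
        result[new_node].append(result[donor].pop())
    return result


# Hypothetical cluster: node1 is overloaded when node3 joins.
before = {"node1": ["r1", "r2", "r3", "r4"], "node2": ["r5", "r6"]}
after = rebalance_onto_new_node(before, "node3")
halves = split_region(("a", "m"), "g")
```

Only the assignment table changes here; in the design described above, the region data does not have to be copied because the file system underneath is already distributed.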
HBase and Haystack Automatic failover • Node failures automatically detected by HBase Master • Regions on failed node are distributed evenly among surviving nodes. • Multiple regions/server model avoids need for substantial overprovisioning • HBase Master failover • 1 active, rest standby • When active master fails, a standby automatically takes over
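A toy sketch of both failover rules follows: regions of a failed node are handed to the least loaded survivors, and a standby master is promoted when the active one fails. The function names and the "first standby wins" rule are assumptions for illustration, not HBase's real election mechanism.

```python
def redistribute_after_failure(assignments, failed_node):
    """Spread the failed node's regions evenly across the surviving nodes."""
    survivors = {
        node: list(regions)
        for node, regions in assignments.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the regions")
    for region in assignments.get(failed_node, []):
        # Always hand the next region to the currently least loaded survivor.
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(region)
    return survivors


def promote_standby(standbys):
    """Pick a new active master from the standby list (first one, by assumption)."""
    if not standbys:
        raise RuntimeError("no standby master available")
    return standbys[0], standbys[1:]


# Hypothetical cluster state before node "c" fails.
cluster = {"a": ["r1", "r2"], "b": ["r3", "r4"], "c": ["r5", "r6", "r7", "r8"]}
recovered = redistribute_after_failure(cluster, "c")
new_active, remaining_standbys = promote_standby(["master2", "master3"])
```

Because each server holds many small regions, the extra load lands on every survivor in small pieces, which is why large spare capacity is not needed.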
Zookeeper Zookeeper is open source software that Facebook uses mainly for two purposes: • As the controller for implementing sharding and failover of application servers • As a store for their discovery service. Because Zookeeper provides Facebook with a highly available repository and notification mechanism, it goes a long way towards helping Facebook build a highly available service.
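The "repository plus notification" idea behind a discovery service can be illustrated with a small in-memory model. This is deliberately not the ZooKeeper API: the class, its methods, and the addresses are hypothetical, it runs in a single process, and real ZooKeeper watches have different semantics.

```python
class ToyDiscoveryRegistry:
    """In-memory stand-in for a discovery store with change notifications."""

    def __init__(self):
        self._services = {}   # service name -> set of addresses
        self._watchers = {}   # service name -> list of callbacks

    def watch(self, service, callback):
        """Call callback(addresses) whenever the service's address list changes."""
        self._watchers.setdefault(service, []).append(callback)

    def register(self, service, address):
        self._services.setdefault(service, set()).add(address)
        self._notify(service)

    def unregister(self, service, address):
        self._services.get(service, set()).discard(address)
        self._notify(service)

    def lookup(self, service):
        return sorted(self._services.get(service, set()))

    def _notify(self, service):
        for callback in self._watchers.get(service, []):
            callback(self.lookup(service))


# Hypothetical usage: a client keeps its server list current through a watch.
registry = ToyDiscoveryRegistry()
seen = []
registry.watch("chat", seen.append)
registry.register("chat", "10.0.0.1:5222")
registry.register("chat", "10.0.0.2:5222")
registry.unregister("chat", "10.0.0.1:5222")
```

Clients that subscribe this way learn about a failed or newly added server as soon as the registry changes, instead of polling for the list.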
Memcached If you've read anything about scaling large websites, you've probably heard about memcached. Memcached is a high-performance, distributed memory object caching system. It is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from the results of database calls, API calls, or page rendering. Facebook is the world's largest user of memcached and uses it to speed up the site by alleviating database load.
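How a cache in front of the database alleviates load can be shown with the common cache-aside pattern. In this sketch a plain dict stands in for memcached and a function stands in for the database; both, along with all names, are assumptions for illustration rather than a real memcached client.

```python
class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}              # stand-in for a memcached cluster
        self._load_from_db = load_from_db
        self.db_calls = 0             # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]   # cache hit: no database work
        self.db_calls += 1
        value = self._load_from_db(key)
        self._cache[key] = value      # store the result for later requests
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying row changes."""
        self._cache.pop(key, None)


def fake_profile_query(user_id):
    """Hypothetical slow database call."""
    return {"id": user_id, "name": f"user-{user_id}"}


profiles = CacheAside(fake_profile_query)
first = profiles.get(7)
second = profiles.get(7)   # served from the cache
```

Repeated reads of the same key reach the database only once until the entry is invalidated.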
Scribe: Log server Scribe was developed at Facebook using Apache Thrift and released as open source in 2008. Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine. [Diagram labels: Desktop site, Mobile site, Application, Legacy SMS/Email, Scribe]
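One way to get the robustness described here is to buffer log entries locally whenever the downstream server is unreachable and forward them later. The sketch below models that idea only; it is not Scribe's code or its Thrift interface, and the class, the `send` callable, and the category names are hypothetical.

```python
class BufferingForwarder:
    """Forward (category, message) log entries, buffering them on failure."""

    def __init__(self, send):
        # send(category, message) is assumed to raise ConnectionError
        # when the central log server cannot be reached.
        self._send = send
        self._pending = []

    def log(self, category, message):
        self._pending.append((category, message))
        self.flush()

    def flush(self):
        """Try to deliver buffered entries in order; return True if none remain."""
        while self._pending:
            category, message = self._pending[0]
            try:
                self._send(category, message)
            except ConnectionError:
                return False          # keep the entry and retry later
            self._pending.pop(0)
        return True

    def pending_count(self):
        return len(self._pending)


# Hypothetical central server that is down at first, then recovers.
central_log = []
network = {"up": False}


def send_to_central(category, message):
    if not network["up"]:
        raise ConnectionError("central server unreachable")
    central_log.append((category, message))


forwarder = BufferingForwarder(send_to_central)
forwarder.log("desktop_site", "page view")
forwarder.log("mobile_site", "login")
buffered_while_down = forwarder.pending_count()
network["up"] = True
delivered = forwarder.flush()
```

Entries are delivered in their original order once the connection returns, so a temporary outage delays logs rather than losing them.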