The Technology Behind Facebook Revealed

Presented by: Prakhar Gethe (CEO and Co-founder, Team Zenith)

How Facebook Works and Functions: A Complete Approach

Why Facebook Is a Giant

Facebook is the “social network”. People have been “facebooking” each other for about 7 years now (at the time of writing), making Facebook the most used social network, with over 500 million users worldwide.

• 50% of active users log on to Facebook on any given day
• The average user has 130 friends
• People spend over 700 billion minutes per month on Facebook
• There are over 900 million objects that people interact with (pages, groups, events and community pages)
• The average user is connected to 80 community pages, groups and events
• The average user creates 90 pieces of content each month
• More than 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) are shared each month

Scaling Challenge Of Facebook

Here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:

• Facebook serves 570 billion page views per month (according to Google Ad Planner).
• There are more photos on Facebook than on all other photo sites combined (including sites like Flickr).
• More than 3 billion photos are uploaded every month.
• Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
• More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
• Facebook has more than 30,000 servers (and that figure was already a year old at the time).
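To put the monthly figures in per-second terms, here is a quick back-of-the-envelope calculation. It assumes a 30-day month and perfectly even traffic, which real traffic never has (daily peaks are well above the average), so treat the results as rough averages only.

```python
# Back-of-the-envelope conversion of monthly totals into average per-second rates.
# Assumption: a 30-day month and uniform traffic (real traffic has daily peaks).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds

monthly_totals = {
    "page views": 570e9,            # 570 billion page views per month
    "photo uploads": 3e9,           # 3 billion photos uploaded per month
    "content items shared": 25e9,   # 25 billion pieces of content per month
}

def per_second(monthly_total: float) -> float:
    """Average rate per second for a monthly total."""
    return monthly_total / SECONDS_PER_MONTH

for name, total in monthly_totals.items():
    print(f"{name}: ~{per_second(total):,.0f} per second")
```

So the monthly totals correspond to roughly 220,000 page views, 1,157 photo uploads and 9,645 shared items per second on average; peak rates are higher.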

Software That Helps Facebook Scale

In some ways Facebook is still a LAMP site (kind of), but it has had to change and extend its operation to incorporate a lot of other elements and services, and modify the approach to existing ones. For example:

Facebook still uses PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance.

Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput).

Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there (on the “other side” of the Memcached layer).
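To illustrate what "MySQL as a key-value store, with joins done on the web servers" means, here is a minimal sketch. It assumes two hypothetical tables (`users` and `friends`) that are only ever accessed by key; plain Python dicts stand in for MySQL (and for the Memcached layer in front of it). The join that SQL would normally perform is done in application code instead. This illustrates the idea and is not Facebook's code.

```python
# Sketch: the database is used only for key -> value lookups; the "join"
# happens in application (web-server) code. Dicts stand in for MySQL tables.
# The table names and fields are hypothetical.
users = {  # user_id -> user record
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
}
friends = {  # user_id -> list of friend user_ids
    1: [2, 3],
    2: [1],
    3: [1],
}

def multi_get(table: dict, keys: list) -> dict:
    """Batch key-value lookup, like a single multi-get against the store."""
    return {k: table[k] for k in keys if k in table}

def friend_names(user_id: int) -> list:
    """Application-side join: fetch friend ids, then batch-fetch their records."""
    friend_ids = friends.get(user_id, [])
    records = multi_get(users, friend_ids)
    return [records[fid]["name"] for fid in friend_ids if fid in records]

print(friend_names(1))  # ['Bob', 'Carol']
```

Keeping each database query a simple key lookup makes the data easy to shard and cache; the trade-off is more logic and more round trips on the web tier.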

Then there are the custom-written systems, like Haystack, a highly scalable object store used to serve Facebook’s immense number of photos, or Scribe, a logging system that can operate at the scale of Facebook (which is far from trivial).

But enough of that. Let’s present (some of) the software that Facebook uses to provide us all with the world’s largest social network site.

Technology Used By Facebook

• Back-end: PHP, C++, Java, Python, FBML (developed at Facebook), Erlang, XHP (developed at Facebook)
• Database: MySQL 5.6, Memcached, Haystack, Cassandra, Scribe, Presto
• Front-end: Ajax, JSON, JavaScript, jQuery

For Back-end

• PHP: PHP is a server-side scripting language designed for web development but also used as a general-purpose programming language. It stands for PHP: Hypertext Preprocessor.

• C++: C++ is a general-purpose, statically typed, free-form, multi-paradigm, compiled programming language.

• Java: Java is a concurrent, class-based, object-oriented programming language specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be recompiled to run on another.

• Python: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.

• FBML: FBML (Facebook Markup Language) is a software environment provided by the social networking service Facebook for third-party developers to create their own applications and services that access data in Facebook.

• Erlang: Erlang is a general-purpose, concurrent, garbage-collected programming language and runtime system. It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It supports hot swapping, so that code can be changed without stopping a system.

• XHP: XHP is an augmentation of PHP, developed at Facebook, that allows XML syntax for the purpose of creating custom and reusable HTML elements.

For Front-end

Ajax: Ajax (an acronym for Asynchronous JavaScript and XML) is a group of interrelated web development techniques used on the client side to create asynchronous web applications. With Ajax, web applications can send data to, and retrieve data from, a server asynchronously (in the background) without interfering with the display and behavior of the existing page. Data can be retrieved using the XMLHttpRequest object. Despite the name, the use of XML is not required (JSON is often used instead), and the requests do not need to be asynchronous.

JavaScript: JavaScript (JS) is an interpreted computer programming language. As part of web browsers, its implementations allow client-side scripts to interact with the user, control the browser, communicate asynchronously, and alter the document content that is displayed. It has also become common in server-side programming, game development and the creation of desktop applications.

jQuery: jQuery is a multi-browser (cf. cross-browser) JavaScript library designed to simplify the client-side scripting of HTML. It was released in January 2006 at BarCamp NYC by John Resig. At the time of writing it was developed by a team of developers led by Dave Methvin. Used by over 65% of the 10,000 most visited websites, jQuery was then the most popular JavaScript library in use.

JSON: JSON (JavaScript Object Notation) is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and a web application, as an alternative to XML.

XML: Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and in several other related specifications, all free open standards.

Database Technologies

MySQL 5.6: MySQL is (as of July 2013) the world's second most widely used open-source relational database management system (RDBMS). It is named after co-founder Michael Widenius's daughter, My. The "SQL" part stands for Structured Query Language.

Memcached: Memcached is by now one of the most famous pieces of software on the internet. It’s a distributed memory caching system which Facebook (and a ton of other sites) use as a caching layer between the web servers and MySQL servers (since database access is relatively slow). Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack). Facebook runs thousands of Memcached servers with tens of terabytes of cached data at any one point in time. It is likely the world’s largest Memcached installation.
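A caching layer between web servers and a database is commonly used in a look-aside (cache-aside) pattern: read from the cache first, fall back to the database on a miss, then populate the cache; on writes, update the database and invalidate the cached entry. The sketch below uses in-memory dicts as stand-ins for both Memcached and MySQL, an assumption made for illustration; a real deployment would use a Memcached client library and a database driver.

```python
# Look-aside (cache-aside) caching sketch. Dicts stand in for Memcached and MySQL.
cache: dict = {}                      # stand-in for Memcached
database = {"user:1": "Alice"}        # stand-in for MySQL
db_reads = 0                          # counts how often we hit the "database"

def get(key: str):
    """Read through the cache; on a miss, load from the database and fill the cache."""
    global db_reads
    if key in cache:
        return cache[key]             # cache hit: no database access
    db_reads += 1
    value = database.get(key)         # cache miss: slow path
    if value is not None:
        cache[key] = value            # populate cache for later readers
    return value

def update(key: str, value: str) -> None:
    """Write to the database, then invalidate (delete) the cached copy."""
    database[key] = value
    cache.pop(key, None)              # next get() reloads the fresh value

print(get("user:1"), get("user:1"), db_reads)  # Alice Alice 1
update("user:1", "Alice B.")
print(get("user:1"), db_reads)                 # Alice B. 2
```

Deleting the cached value on write (instead of overwriting it) keeps the write path simple and idempotent; the next reader repopulates the cache from the database.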

Haystack: Haystack is Facebook’s high-performance photo storage/retrieval system (strictly speaking, Haystack is an object store, so it doesn’t necessarily have to store photos). It has a ton of work to do. There are more than 20 billion uploaded photos on Facebook, and each one is saved in four different resolutions, resulting in more than 80 billion photos. And it’s not just about being able to handle billions of photos; performance is critical. As mentioned previously, Facebook serves around 1.2 million photos per second, a number which doesn’t include images served by Facebook’s CDN. That’s a staggering number.
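Haystack's core idea, as described in Facebook's published design, is to pack many photos into a few large append-only files and keep a small in-memory index from photo ID to file offset and size, so that reading a photo needs a single seek instead of several filesystem-metadata lookups. The sketch below models that idea with an in-memory byte buffer standing in for the volume file. The class and method names are hypothetical, and it omits everything a real store needs (checksums, deletion flags, replication, crash recovery).

```python
import io

class VolumeSketch:
    """Toy append-only object store: one big 'file' plus an in-memory index.

    Hypothetical illustration of the Haystack idea, not Facebook's implementation.
    """

    def __init__(self) -> None:
        self._file = io.BytesIO()   # stands in for a large on-disk volume file
        self._index = {}            # photo_id -> (offset, size), kept in memory

    def put(self, photo_id: int, data: bytes) -> None:
        offset = self._file.seek(0, io.SEEK_END)   # always append at the end
        self._file.write(data)
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id: int) -> bytes:
        offset, size = self._index[photo_id]       # memory lookup, no metadata I/O
        self._file.seek(offset)                    # a single seek...
        return self._file.read(size)               # ...and a single read

volume = VolumeSketch()
volume.put(42, b"jpeg-bytes-small")
volume.put(43, b"jpeg-bytes-large")
print(volume.get(42))  # b'jpeg-bytes-small'
```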

Cassandra: Cassandra is a distributed storage system with no single point of failure. It’s one of the poster children for the NoSQL movement and has been made open source (it’s even become an Apache project). Facebook used it for its Inbox search. Other than Facebook, a number of other services use it, for example Digg.
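One reason Cassandra has no single point of failure is that it has no master: keys are spread across nodes by consistent hashing, and each key is replicated to several consecutive nodes on a hash ring, so losing one node leaves other replicas available. The sketch below shows that placement scheme in simplified form. The node names, the replication factor of 3, and MD5 as the hash function are assumptions for illustration; real Cassandra adds virtual nodes, configurable partitioners and replication strategies, and much more.

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    """Map a string to a position on the hash ring (MD5 chosen arbitrarily)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Simplified consistent-hashing ring with N-way replication."""

    def __init__(self, nodes: list, replication_factor: int = 3) -> None:
        self.rf = min(replication_factor, len(nodes))
        # Each node owns one token (position) on the ring, kept sorted.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.tokens = [token for token, _ in self.ring]

    def replicas(self, key: str) -> list:
        """First node clockwise from the key's position, plus the next rf-1 nodes."""
        start = bisect.bisect(self.tokens, ring_position(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("inbox:user:1234"))  # three distinct nodes
```

Adding or removing a node only moves the keys between that node and its neighbours on the ring, rather than reshuffling all data.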

Scribe: Scribe is a flexible logging system that Facebook uses for a multitude of purposes internally. It’s been built to be able to handle logging at the scale of Facebook, and automatically handles new logging categories as they show up (Facebook has hundreds).

Presto: Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

Hadoop Distributed File System (HDFS): To understand how it’s possible to scale a Hadoop® cluster to hundreds (and even thousands) of nodes, you have to start with the Hadoop Distributed File System (HDFS). Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing.
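The block idea can be shown in a few lines: a file is cut into fixed-size blocks and each block is assigned to several nodes. The tiny block size, three-way replication, and simple round-robin placement below are assumptions chosen for readability; real HDFS uses blocks of many megabytes (128 MB by default in recent versions) and rack-aware placement decided by the NameNode.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    replication = min(replication, len(nodes))
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

data = b"x" * 10                      # a 10-byte "file"
blocks = split_into_blocks(data, 4)   # 4-byte blocks -> sizes 4, 4, 2
placement = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print([len(b) for b in blocks])  # [4, 4, 2]
print(placement[0])              # ['dn1', 'dn2', 'dn3']
```

Because every block lives on several nodes, a map task for that block can be scheduled on any node holding a copy, which lets the work spread across the cluster.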

Other Applications

BigPipe: BigPipe is a dynamic web page serving system that Facebook has developed. Facebook uses it to serve each web page in sections (called “pagelets”) for optimal performance. For example, the chat window is retrieved separately, the news feed is retrieved separately, and so on. These pagelets can be retrieved in parallel, which is where the performance gain comes in, and it also gives users a site that keeps working even if some part of it is deactivated or broken.
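The pagelet idea can be approximated with Python's asyncio: each pagelet is produced independently and concurrently, and a failing pagelet is replaced by a placeholder instead of breaking the whole page. This is only a sketch of the concept; the real BigPipe streams each pagelet to the browser as soon as it is ready and renders it there with JavaScript, which the sketch does not model. The pagelet names and delays are hypothetical.

```python
import asyncio

async def render_pagelet(name: str, delay: float, fail: bool = False) -> str:
    """Pretend to fetch data and render one section of the page."""
    await asyncio.sleep(delay)                 # stands in for backend calls
    if fail:
        raise RuntimeError(f"{name} backend unavailable")
    return f"<div id='{name}'>...</div>"

async def render_page() -> list:
    pagelets = [
        render_pagelet("newsfeed", 0.03),
        render_pagelet("chat", 0.01, fail=True),   # simulate a broken section
        render_pagelet("ads", 0.02),
    ]
    # Run all pagelets concurrently; collect exceptions instead of aborting.
    results = await asyncio.gather(*pagelets, return_exceptions=True)
    return [
        r if not isinstance(r, Exception) else "<div class='unavailable'></div>"
        for r in results
    ]

page = asyncio.run(render_page())
print(page)
```

The total time is roughly that of the slowest pagelet rather than the sum of all of them, and the broken chat section does not take the news feed down with it.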

Hadoop and Hive: Hadoop is an open source map-reduce implementation that makes it possible to perform calculations on massive amounts of data. Facebook uses it for data analysis (and as we all know, Facebook has massive amounts of data). Hive originated from within Facebook, and makes it possible to use SQL queries against Hadoop, making it easier for non-programmers to use. Both Hadoop and Hive are open source (Apache projects) and are used by a number of big services, for example Yahoo and Twitter.
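The map-reduce model that Hadoop implements can be illustrated with the classic word-count example, run here in a single Python process. A real Hadoop job distributes the map, shuffle and reduce phases across a cluster; only the data flow is shown. The Hive-style query in the comment and its table name are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str) -> list:
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs) -> dict:
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list) -> tuple:
    """Combine all values for one key; here, sum the counts."""
    return key, sum(values)

# Conceptually similar to a Hive query such as:
#   SELECT word, COUNT(*) FROM words GROUP BY word
lines = ["the cat sat", "The dog sat"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```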

Thrift: Facebook uses several different languages for its different services. PHP is used for the front-end, Erlang is used for Chat, and Java and C++ are also used in several places (and perhaps other languages as well). Thrift is an internally developed cross-language framework that ties all of these different languages together, making it possible for them to talk to each other. This has made it much easier for Facebook to keep up its cross-language development. Facebook has made Thrift open source, and support for even more languages has been added.

Varnish: Varnish is an HTTP accelerator which can act as a load balancer and also cache content which can then be served lightning-fast. Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source.

For Chat
• Epoll server written in Erlang
• Accessed using Thrift

Message Search
• Inverted index stored in HBase

epoll: epoll is the Linux I/O event notification facility. The epoll event mechanism is designed to scale to larger numbers of connections than select and poll.
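The point of epoll is that a single thread can watch many connections and be told only which ones are ready, instead of handing the kernel the full list of descriptors on every call as select and poll do. Python's `selectors` module picks epoll automatically on Linux (and kqueue, poll or select elsewhere), so the sketch below runs on any platform. It uses a local socket pair in place of real chat connections, an assumption made to keep the example self-contained; it is not how Facebook's Erlang chat servers are written.

```python
import selectors
import socket

sel = selectors.DefaultSelector()          # EpollSelector on Linux
server_side, client_side = socket.socketpair()
server_side.setblocking(False)

# Register interest in "readable" events; `data` can carry per-connection state.
sel.register(server_side, selectors.EVENT_READ, data="chat-connection-1")

client_side.sendall(b"hello")              # a message arrives on the connection

# Block until at least one registered socket is ready (or 1 second passes).
received = []
for key, mask in sel.select(timeout=1.0):
    if mask & selectors.EVENT_READ:
        received.append((key.data, key.fileobj.recv(1024)))

print(received)  # [('chat-connection-1', b'hello')]

sel.unregister(server_side)
server_side.close()
client_side.close()
sel.close()
```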

HBase: HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java.
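Message search relies on an inverted index: for every term, the set of messages that contain it. HBase would store these rows across a cluster; the sketch below keeps them in a Python dict to show just the data structure. The sample messages, the naive tokenizer (lowercase, strip simple punctuation, split on whitespace) and the AND-query semantics are simplifying assumptions.

```python
from collections import defaultdict

messages = {
    101: "Lunch at noon?",
    102: "Noon works, see you at lunch",
    103: "Running late for the meeting",
}

def tokenize(text: str) -> list:
    """Very naive tokenizer: lowercase, strip simple punctuation, split on spaces."""
    return text.lower().replace("?", " ").replace(",", " ").split()

# Build the inverted index: term -> set of message ids containing it.
index = defaultdict(set)
for msg_id, text in messages.items():
    for term in tokenize(text):
        index[term].add(msg_id)

def search(query: str) -> list:
    """Return ids of messages containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

print(search("lunch noon"))  # [101, 102]
```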

The Graph API

The Graph API presents a simple, consistent view of the Facebook social graph, uniformly representing objects in the graph (e.g., people, photos, events, and pages) and the connections between them (e.g., friend relationships, shared content, and photo tags).

• RESTful API for accessing data on the Facebook graph
• OAuth 2.0-based authentication
• JSON modeling of objects and connections
• Every object in the social graph has a unique ID. You can access the properties of an object by requesting https://graph.facebook.com/ID
• Alternatively, people and pages with usernames can be accessed using their username as an ID
• All responses are JSON objects
• Specifications: http://developers.facebook.com/docs/api
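As an illustration of the request/response shape described above, the sketch builds the object URL for an ID and parses a JSON response. It makes no network call: the response body is a hypothetical sample and its field names are assumptions. Today's Graph API also requires an access token and uses versioned paths, so check the current documentation before relying on any of this.

```python
import json
from urllib.parse import quote

GRAPH_ROOT = "https://graph.facebook.com/"

def object_url(object_id: str) -> str:
    """URL for reading an object's properties: https://graph.facebook.com/ID"""
    return GRAPH_ROOT + quote(object_id, safe="")

# Hypothetical response body; real responses depend on the object type,
# the API version, and the permissions of the access token.
sample_response = '{"id": "12345", "name": "Example Page"}'

obj = json.loads(sample_response)     # every response is a JSON object
print(object_url("12345"))            # https://graph.facebook.com/12345
print(obj["id"], obj["name"])         # 12345 Example Page
```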

Facebook Markup Language

FBML is an evolved variant of a subset of HTML, with some elements removed. It allows Facebook application developers to customize the "look and feel" of their applications, to a limited extent. It is the specification of how to encode content so that Facebook's servers can read and publish it.

FBML plays an important role in building applications: it is used to tap into various Facebook elements. It operates a lot like HTML, and it makes various tasks easy, such as:
• sending a user an e-mail
• creating a two-column form
• embedding Flash video
• creating a dashboard
• posting on a wall
• displaying a header, etc.

Facebook’s New Messages

• The new Messages interweaves your chats, texts and emails. It’s a central place to control all of your private communication, both on and off Facebook.

• Simply put, it can be a single inbox for all of your messages, no matter how you choose to send them.

• A facebook.com Email Address
• SMS From Facebook
• Chat History

Open Source Software For Mobile

xctool: xctool is a replacement for Apple's xcodebuild that makes it easier to build and test iOS and Mac products. It's especially helpful for continuous integration.

Rebound: Rebound is a Java library that models spring dynamics. Rebound spring models can be used to create animations that feel natural by introducing real-world physics to your application.
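The idea behind Rebound, driving an animation with a simulated spring rather than a fixed easing curve, can be sketched with a damped spring integrated in small time steps. This is not Rebound's API: the function, the parameter names `tension` and `friction`, and their values are chosen for this sketch, and the integrator is simple semi-implicit Euler at 60 steps per second.

```python
def simulate_spring(start: float, end: float, tension: float = 40.0,
                    friction: float = 7.0, dt: float = 1 / 60,
                    steps: int = 300) -> list:
    """Animate a value from `start` toward `end` as a damped spring.

    Acceleration = -tension * displacement - friction * velocity (unit mass),
    integrated with semi-implicit Euler; returns the position at each frame.
    """
    position, velocity = start, 0.0
    frames = []
    for _ in range(steps):
        displacement = position - end
        acceleration = -tension * displacement - friction * velocity
        velocity += acceleration * dt          # update velocity first...
        position += velocity * dt              # ...then position (semi-implicit)
        frames.append(position)
    return frames

frames = simulate_spring(0.0, 1.0)
print(round(frames[-1], 3))   # 1.0: the value settles at the target
print(max(frames) > 1.0)      # True: it overshoots slightly, like a real spring
```

Lowering `friction` makes the motion bouncier; raising it makes the value glide to the target without overshooting.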

Buck: Buck is a build system for Android that encourages the creation of small, reusable modules consisting of code and resources. Because Android applications are predominantly written in Java, Buck also functions as a Java build system.

rng.io: Powers the Ringmark testing framework at rng.io, as donated to the W3C Coremob Community Group.

Facebook SDK for iOS: Use the Facebook SDK for iOS to integrate with Facebook, help build engaging social apps, and get more installs.

facebook-android-sdk: Use the Facebook SDK for Android to integrate with Facebook, help build engaging social apps, and get more installs.

fishhook: fishhook is a very simple library that enables dynamically rebinding symbols in Mach-O binaries running on iOS, both in the simulator and on devices.

Open Source Software For Web

React: React is a JavaScript library for building user interfaces. React uses a declarative paradigm that makes it easier to reason about your application. It's efficient: React computes the minimal set of changes necessary to keep your DOM up-to-date. And it's flexible: React works with the libraries and frameworks that you already know.

HHVM: HipHop VM (HHVM) is an open-source virtual machine designed for executing programs written in PHP. HHVM uses a just-in-time compilation approach to achieve superior performance while maintaining the flexibility that PHP developers are accustomed to. HipHop VM (and before it HPHPc) has realized more than a 5x increase in throughput for Facebook compared with Zend PHP 5.2.

Huxley: Huxley is a test-like system for catching visual regressions in Web applications. It watches you browse, takes screenshots, and tells you when they change.

Regenerator: Regenerator is a source transformer enabling ECMAScript 6 generator functions (yield) in today's JavaScript (ES5). The generator syntax provides a much cleaner alternative to using callbacks when writing asynchronous server-side code.

facebook-php-sdk: Use the Facebook SDK for PHP to integrate with Facebook, help build engaging social apps, and get more users.

Some other tools: node-haste, jstransform, rng.io, Rebound.

Tornado: Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.

Open Source Software For Data

Presto: Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

mysql-5.6: Facebook's branch of the Oracle MySQL 5.6 database.

Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.

There is a Scribe server running on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups. If the central Scribe server isn’t available, the local Scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central Scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of Scribe servers.
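The store-and-forward behaviour described above (forward to a central server, fall back to a local buffer when it is unreachable, resend after it recovers) can be sketched as follows. The class names and the `available` flag are hypothetical; an in-memory list stands in for the local disk file, batching is left out, and the backlog is simply retried on the next log call. Real Scribe is a C++ server built on Thrift and configured through its own config files.

```python
class CentralServer:
    """Stand-in for the central Scribe server; can be marked unavailable."""

    def __init__(self) -> None:
        self.available = True
        self.received = []

    def send(self, batch: list) -> None:
        if not self.available:
            raise ConnectionError("central server down")
        self.received.extend(batch)

class LocalAgent:
    """Stand-in for the per-node Scribe server: forward, or buffer on failure."""

    def __init__(self, central: CentralServer) -> None:
        self.central = central
        self.buffer = []        # stands in for the file on local disk

    def log(self, category: str, message: str) -> None:
        batch = self.buffer + [(category, message)]   # resend backlog first, in order
        try:
            self.central.send(batch)
            self.buffer = []                          # everything delivered
        except ConnectionError:
            self.buffer = batch                       # keep it for a later retry

central = CentralServer()
agent = LocalAgent(central)
agent.log("web", "GET /home")
central.available = False
agent.log("web", "GET /profile")       # buffered locally
central.available = True
agent.log("web", "GET /photos")        # backlog flushed together with the new message
print(len(central.received), len(agent.buffer))  # 3 0
```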

Open Source Software For Infra

RocksDB: RocksDB is an embeddable persistent key-value store for fast storage. RocksDB can also be the foundation for a client-server database, but the project's current focus is on embedded workloads.

Open Compute: The Open Compute Project Foundation is a rapidly growing community of engineers around the world whose mission is to design and enable the delivery of the most efficient server, storage and data center hardware designs for scalable computing.

pfff: pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.

Swift: Swift is an easy-to-use, annotation-based Java library for creating Thrift-serializable types and services.

Folly: Folly is an open-source C++ library developed and used at Facebook. It is a library of C++11 components designed with practicality and efficiency in mind. It complements (as opposed to competing against) offerings such as Boost and of course std. In fact, Facebook defines its own component only when something it needs is either not available or does not meet the needed performance profile.

Flashcache: Flashcache is a general-purpose write-back block cache for Linux.

Some other relevant tools: Tornado, pyaib, Watchman, HHVM.

Gradual releases and dark launches

Facebook has a system called Gatekeeper that lets it run different code for different sets of users.

This lets Facebook do gradual releases of new features, activate certain features only for Facebook employees, etc.

Gatekeeper also lets Facebook do something called “dark launches”, which means activating elements of a certain feature behind the scenes before it goes live.
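Gatekeeper's internals are not described here, so the following is only a sketch of how such a feature gate can work in general: a feature is enabled for an explicit group (for example, employees) plus a percentage of other users, chosen by hashing the user ID so that each user gets the same answer on every request. The feature names, rules, employee IDs and hashing scheme are all hypothetical.

```python
import hashlib

# Hypothetical gate configuration: feature -> rules.
GATES = {
    "new_messages": {"employees_only": False, "rollout_percent": 10},
    "video_chat":   {"employees_only": True,  "rollout_percent": 0},
}
EMPLOYEES = {7, 42}   # hypothetical employee user ids

def bucket(feature: str, user_id: int) -> int:
    """Stable bucket in [0, 100) derived from the feature name and user id."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100

def is_enabled(feature: str, user_id: int) -> bool:
    gate = GATES.get(feature)
    if gate is None:
        return False                    # unknown features stay off
    if user_id in EMPLOYEES:
        return True                     # employees see every gated feature
    if gate["employees_only"]:
        return False
    return bucket(feature, user_id) < gate["rollout_percent"]

enabled = sum(is_enabled("new_messages", uid) for uid in range(10_000))
print(enabled)  # roughly 10% of the 10,000 users
```

Raising `rollout_percent` gradually widens the audience, and users already in the feature stay in because their bucket never changes. A dark launch fits the same model: the gated code path runs behind the scenes while nothing new is shown to the user.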

Facebook has also widgetized large portions of its application, meaning that widgets can be written in an appropriate language instead of simply using PHP. These widgets interface with the other parts of the application through internal APIs.

Like many other big sites, Facebook uses a content delivery network (CDN) to help serve static content.

And then of course there is the huge data center Facebook is building in Oregon to help it scale out with even more servers.

Shocking facts about Facebook

A third of all divorce filings in 2011 contained the word “Facebook”.

Iceland used Facebook to rewrite its constitution!

Adding the number 4 to the end of Facebook’s URL will automatically direct you to Mark Zuckerberg’s wall.

Facebook pays $500 to anyone who can hack into it!

A couple got murdered because they de-friended someone on Facebook.

A man was ordered to apologize on Facebook or go to jail.

Read more at www.omg-facts.com
