How Facebook/Meta Was Made: The Technology Behind the World's Largest Social Platform
Sep 16, 2024
35 min read
Facebook's Technology Stack: How It Powers a Global Platform
The Rise
When Mark Zuckerberg launched Facebook from his Harvard dorm room in 2004, few could have predicted that it would grow into the global platform we know today, serving over 3 billion users across the world.
The Non-Tech Story:
The story of Facebook’s creation is one of ambition, ingenuity, and perhaps a little bit of luck. It’s a tale of a college dorm room project that grew into a global social media empire, changing the way we connect, communicate, and share information. Let’s dive deep into the beginnings of Facebook, from its humble origins at Harvard University to its meteoric rise as one of the most influential tech companies in history.
The Genesis: A Dorm Room Dream (2003-2004)
In the fall of 2003, a 19-year-old sophomore named Mark Zuckerberg was studying computer science and psychology at Harvard University. Like many Harvard students, Zuckerberg was intensely focused on academics, but he also had a passion for programming. He had already built a few small web projects, such as CourseMatch, a platform for students to select courses based on who else was enrolling, and Facemash, a controversial “hot-or-not” style website that allowed users to compare the attractiveness of students. Though Facemash was taken down by the Harvard administration, it planted the seeds for what would come next.
The Harvard "Face Book" Concept
At the time, Harvard University had a collection of printed directories known as "face books," which contained photos and basic information of students. These face books were distributed to students at the start of each academic year. Inspired by the concept, Zuckerberg believed that there should be an online version of this, accessible to students across the university.
His vision was simple: create a centralized online platform where students could share their profiles, photos, and social activities, forming a virtual connection hub.
On February 4, 2004, with the help of his roommates Dustin Moskovitz and Chris Hughes and his classmate Eduardo Saverin, Zuckerberg launched TheFacebook.com from his dorm room. The initial goal was to create a social network for Harvard students, allowing them to create profiles, upload photos, and connect with each other.
The Early Success
Within 24 hours of launching TheFacebook, more than 1,000 Harvard students had signed up. The demand was immediate and overwhelming. Soon after, Zuckerberg expanded access to students at other Ivy League schools such as Yale and Columbia, and within a few months, TheFacebook was spreading to college campuses across the U.S.
Eduardo Saverin was the business head of the project, contributing $1,000 to cover server costs. Meanwhile, Dustin Moskovitz played a crucial role in expanding the platform’s reach to other schools. In the background, Zuckerberg continued coding and improving the platform, constantly iterating on its features and functionality.
The Dropout Decision
By the summer of 2004, Facebook had gained significant traction. Zuckerberg made a monumental decision to drop out of Harvard and move to Palo Alto, California, where he and his team set up an office. This decision was largely driven by the realization that Facebook was no longer just a campus project – it had the potential to become something much bigger.
The Move to Silicon Valley and Early Growth (2004-2006)
Securing Investment
As Facebook grew, it caught the attention of several prominent investors in Silicon Valley. One of the most significant moments in the early history of Facebook came when Zuckerberg met Peter Thiel, co-founder of PayPal. Thiel was impressed by the potential of Facebook and offered an initial investment of $500,000 in exchange for a small stake in the company. This was Facebook's first major funding round and helped Zuckerberg and his team scale the platform.
Sean Parker's Influence
At the same time, Sean Parker, the co-founder of Napster, joined Facebook as its first president. Parker, a seasoned entrepreneur, had already built relationships with key players in Silicon Valley and helped Zuckerberg navigate the startup landscape. Parker's involvement brought a level of credibility and guidance that Facebook needed at the time. He advised Zuckerberg to drop the "The" from the name, and so TheFacebook.com became simply Facebook.
Expanding Beyond Colleges
Initially, Facebook was only available to college students; users had to have a .edu email address to sign up. In September 2006, however, Facebook opened its doors to anyone over the age of 13 with a valid email address. This was a pivotal moment in the company's history, as it allowed Facebook to reach a global audience.
The Era of Features: News Feed, Likes, and the Birth of the Social Graph (2006-2009)
As Facebook’s user base grew, the platform began to evolve beyond just a simple profile-based social network. Zuckerberg and his team introduced a series of groundbreaking features that transformed Facebook into a dynamic, interactive experience.
The News Feed (2006)
One of the most transformative updates was the introduction of the News Feed in 2006. Until then, users had to manually visit their friends' profiles to see updates. The News Feed changed that by aggregating friends' activities—such as status updates, photo uploads, and profile changes—into a centralized stream.
The News Feed faced initial backlash, with users expressing concerns about privacy and the level of visibility their actions had. However, over time, the News Feed became Facebook’s most defining feature, allowing users to stay up-to-date with their friends in real-time.
The Like Button (2009)
In 2009, Facebook introduced the Like button, which became an iconic feature of the platform. The idea was simple: users could show their appreciation for a post by clicking "Like." This feature not only gave users an easy way to engage with content, but it also provided Facebook with valuable data about user preferences, which became critical in shaping the algorithms that powered the News Feed.
Scaling the Social Graph: Facebook’s Key Innovation
At the heart of Facebook is its Social Graph—a map of relationships and interactions between users. The Social Graph is the foundation of Facebook's platform, allowing it to understand the complex web of connections between people, pages, groups, and content.
As Facebook scaled, managing the Social Graph became one of the company’s biggest technical challenges. The sheer volume of data that Facebook had to process in real-time to serve content to billions of users was unprecedented.
To solve this, Facebook developed a series of custom backend systems and storage solutions, such as TAO (a geographically distributed data store), to ensure that it could manage the massive volume of data generated by its users. The Social Graph became the driving force behind many of Facebook’s most important features, such as friend suggestions, content recommendations, and targeted advertising.
From Startup to Tech Giant: The IPO and Beyond (2012-Present)
By 2012, Facebook had more than 1 billion active users, making it the largest social network in the world. That same year, Facebook went public, offering shares on the NASDAQ in one of the largest initial public offerings (IPOs) in history. Facebook's IPO raised $16 billion, valuing the company at over $100 billion.
Despite some initial volatility, Facebook’s stock has since risen steadily, as the company has continued to grow and diversify its product offerings.
Acquisitions and Expansion
Facebook's growth strategy was not limited to organic expansion. Over the years, the company has made a series of high-profile acquisitions that have expanded its reach:
- Instagram (2012): Acquired for $1 billion, Instagram has become one of the most popular photo-sharing platforms in the world.
- WhatsApp (2014): Facebook bought WhatsApp for $19 billion, adding a powerful global messaging platform to its portfolio.
- Oculus VR (2014): The acquisition of Oculus marked Facebook's entry into the world of virtual reality, with the goal of shaping the future of immersive technology and social interactions.
The Future of Facebook: The Metaverse and Beyond
In October 2021, Zuckerberg made another bold move by rebranding Facebook, Inc. as Meta Platforms, Inc., signaling a shift in the company's long-term vision. The name "Meta" reflects Zuckerberg's growing interest in building the Metaverse—a virtual world where people can interact, work, and play using virtual reality (VR) and augmented reality (AR) technologies.
This new chapter in Facebook’s story represents an evolution from a social media company to a broader technology company, focused on creating new ways for people to connect in the virtual space.
SUMMING UP THE STORY:
The story of Facebook is a remarkable journey from a small dorm room project to one of the most powerful companies on the planet. What began as a simple tool for college students has transformed into a global platform that connects billions of people, reshapes industries, and influences culture, politics, and society.
While the road hasn't always been smooth—Facebook has faced criticism and controversies around data privacy, content moderation, and monopolistic practices—the company’s technological innovation and ability to scale have been nothing short of extraordinary.
The future of Facebook—or Meta—is still unfolding. As the company invests in building the Metaverse and pushing the boundaries of AI, VR, and AR, the next chapter in this incredible tale is just beginning.
REAL HARD WORK: CODE, TECH STACK, AND ENGINEERING
Behind the scenes, Facebook's success is largely built on a complex, ever-evolving technological infrastructure designed to handle massive scale, ensure user security, and deliver a seamless user experience.
This blog takes you deep inside Facebook’s technology stack, tracing its evolution from a humble LAMP stack to the cutting-edge systems that power features like the News Feed, Messenger, Instagram, and WhatsApp today.
1. Early Days: LAMP Stack to Custom Infrastructure
In the early days, Facebook was built using the classic LAMP stack (Linux, Apache, MySQL, PHP). Let’s break down each component:
Linux: Facebook's servers initially ran on Linux, providing a stable, cost-effective, and customizable operating system.
Apache: The web server used to serve web pages and handle HTTP requests.
MySQL: Facebook's early database engine, used to store user data, status updates, and relationship data.
PHP: The programming language that powered Facebook's dynamic content.
However, as Facebook grew exponentially, the limitations of the LAMP stack became evident:
Scaling Issues: MySQL, though reliable, struggled to handle Facebook's growing user base and massive traffic spikes.
PHP Limitations: As PHP applications grew larger and more complex, it became difficult to optimize them for performance and reliability.
This led Facebook to reinvent its infrastructure, optimizing every layer to handle billions of daily interactions.
LAMP STACK EXPLAINED
In its early days, Facebook was built on a LAMP stack, which stands for Linux, Apache, MySQL, and PHP. This stack was a common choice for web applications in the early 2000s because it was open-source, cost-effective, and easy to set up. The LAMP stack provided a solid foundation for Facebook's initial development, allowing it to quickly iterate on features and expand to universities beyond Harvard.
1. The LAMP Stack Components
1.1 Linux (Operating System)
Facebook's servers initially ran on Linux, an open-source operating system known for its stability and flexibility. Linux was favored by many startups because of its low cost (free) and widespread developer support.
- Advantages of Linux:
- Open-source and free to use, which was crucial for a cash-strapped startup like Facebook.
- Stability and scalability to handle increased server loads as Facebook's user base grew.
- Strong developer community support, making it easier to troubleshoot and customize.
1.2 Apache (Web Server)
Apache HTTP Server was the web server Facebook used to handle HTTP requests from users. Apache is one of the oldest and most popular web servers, known for its modular architecture, which allowed Facebook to customize it as needed.
- Why Apache?
- Apache’s modularity allowed Facebook to extend its functionality as needed.
- It could handle a large number of concurrent connections, making it ideal for a growing social network.
- As an open-source tool, it offered flexibility and cost savings.
1.3 MySQL (Database)
Facebook used MySQL as its relational database to store and retrieve data. Early on, MySQL housed critical information, including user profiles, relationships, posts, and other site data. It was a solid choice for early-stage web apps due to its simplicity and the ability to handle structured data using SQL queries.
- Advantages of MySQL:
- It was open-source, saving Facebook money in the early days.
- It provided support for SQL (Structured Query Language), which allowed developers to quickly interact with the database.
- MySQL had built-in replication features, which became important as Facebook expanded and needed database redundancy for fault tolerance.
1.4 PHP (Scripting Language)
Facebook was originally written in PHP, a server-side scripting language that was popular for building dynamic web applications. PHP made it easy for Facebook's developers to write and deploy code quickly, which was critical in Facebook's fast-paced development environment.
- Why PHP?
- Ease of use: PHP was simple and had a low learning curve, allowing Zuckerberg and his team to quickly iterate on Facebook's features.
- Rapid development: The dynamic nature of PHP allowed the team to quickly build new features and update the site on the fly.
- Embedded in HTML: PHP could be embedded directly into HTML code, allowing for dynamic web pages that could fetch data from the MySQL database.
2. Early Days and the Benefits of LAMP
In the beginning, the LAMP stack offered Facebook a solid foundation to launch and rapidly grow within college campuses. It allowed for fast prototyping, quick deployment, and relatively easy scaling for a small number of users. The combination of these open-source technologies was lightweight and capable enough for a startup that didn’t have the vast resources of a large tech company.
However, as Facebook started to grow exponentially, the limitations of the LAMP stack began to show.
3. Main Issues with Facebook’s Early LAMP Stack
While the LAMP stack was great for the early days, it wasn’t built to handle the scale that Facebook would soon require. As millions of users began signing up, Facebook encountered several major challenges, which prompted them to rethink parts of their architecture.
3.1 Scalability
One of the biggest issues Facebook faced with the LAMP stack was scalability. The components of the LAMP stack weren't designed to handle the massive scale and data loads that Facebook experienced as it grew.
- MySQL Bottlenecks: MySQL, while effective for early-stage growth, struggled under the massive volume of user data, relationships, and interactions. Facebook had to manage millions of reads and writes per second, which was too much for a single MySQL instance. Sharding (splitting the database into smaller chunks) and replication techniques were implemented, but these solutions created complexity and data consistency issues.
- PHP Performance: As PHP dynamically generated web pages, it became slower under heavy traffic loads. Although it was easy to develop with, PHP required the server to execute scripts each time a user accessed a page. This led to performance bottlenecks as Facebook's traffic exploded.
3.2 Database Issues
MySQL, being a relational database, worked well in small-scale environments but wasn't optimized for the type of rapid growth Facebook experienced.
- Single Point of Failure: In the early days, Facebook's reliance on a single MySQL database created a potential single point of failure. As traffic surged, the MySQL database couldn't handle the sheer volume of user requests, leading to slowdowns and downtime.
- Data Integrity and Sharding: To address scalability issues, Facebook initially tried sharding MySQL, which involves splitting data across multiple databases. However, sharding introduced complexity, especially in ensuring data integrity and consistency across shards. It also required custom logic to decide which data was stored on which shard, complicating the architecture.
3.3 Performance Bottlenecks
The dynamic nature of PHP, combined with the use of Apache, caused performance bottlenecks.
- PHP Compilation Overhead: PHP scripts are interpreted and executed on the server each time a request is made. This creates overhead, especially when millions of users are making simultaneous requests. Compiling PHP on the fly wasn't sustainable for Facebook's growing user base.
- Apache's Limitations: Although Apache was modular and customizable, it struggled to handle the increasing volume of HTTP requests. Its thread-based model became inefficient when serving millions of users concurrently, and its memory usage and thread management became a limiting factor.
3.4 Caching and Dynamic Content
Facebook, in its early days, relied heavily on dynamic content generation. Every time a user visited a profile, PHP would generate the page dynamically by querying MySQL and serving the data. This was inefficient at scale because the same content (such as static profile data) was being generated over and over again.
- Lack of Caching: Initially, Facebook didn't use aggressive caching techniques to store commonly accessed data (e.g., user profiles, friend lists, etc.). This led to excessive database queries, which put immense strain on MySQL and slowed down page load times for users.
3.5 Infrastructure Complexity
As Facebook grew, managing multiple LAMP servers became more complex. Each server required manual configuration and management. Keeping them synchronized and ensuring that they could handle high traffic loads without crashing became a constant challenge.
4. Facebook's Solutions and Evolution Beyond LAMP
To address these issues, Facebook had to innovate and move beyond the limitations of the LAMP stack. Over time, they began building custom solutions and adopting more advanced technologies to handle their massive scale.
- Memcached: One of the first optimizations Facebook made was implementing Memcached, a distributed caching system that reduced the load on MySQL by storing commonly accessed data in memory. Memcached allowed Facebook to store frequently used data (like user profiles) in memory, greatly reducing the number of database queries (see the cache-aside sketch after this list).
- HipHop for PHP: To solve performance issues related to PHP, Facebook developed HipHop, a source code transformer that converted PHP into optimized C++ code. This dramatically improved performance by compiling PHP into a more efficient language.
- Cassandra and TAO: As MySQL's limitations became more apparent, Facebook started building custom solutions for data storage. They developed Cassandra, a distributed NoSQL database originally built to power Inbox search (and later open-sourced), to handle large amounts of data in a highly scalable way. Later, Facebook developed TAO, a geographically distributed data store optimized for reading and writing the social graph, allowing Facebook to scale its social network without relying on MySQL's relational structure.
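To make the cache-aside pattern described above concrete, here is a minimal Python sketch using the open-source pymemcache client. It is illustrative only: the key scheme and the query_mysql helper are assumptions, not Facebook's actual code.

from pymemcache.client.base import Client
import json

cache = Client(("localhost", 11211))  # assumes a Memcached instance on localhost

def load_user_profile(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)  # 1. check the in-memory cache first
    if cached is not None:
        return json.loads(cached)
    profile = query_mysql(user_id)  # 2. cache miss: fall back to the database
    cache.set(key, json.dumps(profile), expire=300)  # 3. repopulate with a TTL
    return profile

def query_mysql(user_id):
    # Hypothetical stand-in for the real MySQL lookup.
    return {"id": user_id, "name": "example"}

On a hit, the request never touches MySQL at all; serving the vast majority of reads from memory like this is what kept the database tier alive at Facebook's scale.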
What is Cassandra?
Cassandra is a powerful and versatile database that is well-suited for handling large-scale, distributed data. Its scalability, high availability, and performance make it a popular choice for modern applications.
Cassandra played a crucial role in Facebook's growth and scalability. As the social network expanded and the volume of user data increased, Facebook needed a database that could handle massive amounts of data while maintaining high performance and availability.
Here's how Cassandra helped Facebook:
Scalability: Cassandra's distributed architecture allowed Facebook to easily scale its database infrastructure by adding more nodes. This ensured that the database could keep up with the increasing number of users and data.
High Availability: Cassandra's fault tolerance and replication capabilities ensured that the database remained available even in the event of hardware failures or network outages. This was critical for a service like Facebook that needed to be accessible to users around the world.
Performance: Cassandra's optimized data model and efficient query processing made it possible for Facebook to handle large-scale data analytics and real-time operations; its first production use at Facebook was powering Inbox search.
Flexibility: Cassandra's flexible data model allowed Facebook to adapt to changing data requirements and add new features without requiring major database migrations. This helped Facebook innovate and respond quickly to user needs.
Cost-Effectiveness: Cassandra's ability to scale horizontally using commodity hardware made it a cost-effective solution for Facebook. This helped the company manage its infrastructure costs as it grew.
By adopting Cassandra, Facebook was able to build a highly scalable, reliable, and performant database infrastructure that supported its massive user base and rapid growth. Cassandra's capabilities allowed Facebook to focus on innovation and delivering a great user experience, rather than worrying about the underlying infrastructure.
TAO ("The Associations and Objects") is a highly specialized and optimized data store developed by Facebook to handle the unique needs of its social graph. The name reflects its core functionality: managing objects such as users, posts, and comments, and the associations between them, such as likes and friendships. TAO was designed to address the scalability, consistency, and performance issues Facebook encountered with traditional relational databases as its user base grew to billions.
What is TAO?
TAO was created to efficiently manage Facebook’s massive social graph—the structured data representing all users, their relationships (e.g., friendships), and interactions (e.g., posts, comments, likes). It provides a geographically distributed system optimized for reading and writing the social graph in a way that traditional relational databases, such as MySQL, struggled to handle at scale.
Why TAO?
Facebook’s core operation revolves around the social graph—complex data relationships between users, posts, likes, comments, and more. Handling this social graph required a system that could quickly and efficiently process a vast number of reads (e.g., fetching a user’s friends or recent activity) and writes (e.g., recording a new post or like) while maintaining consistency and low latency across Facebook’s globally distributed user base.
Before TAO, Facebook used MySQL to store these relationships, but as the social network grew, MySQL struggled to manage the scale, particularly with associations (e.g., "User X likes Post Y"). Facebook needed a solution that was optimized for relationships, specifically focusing on:
High-read, high-write performance for social interactions.
Low-latency access to social graph data, which is read far more often than written.
Consistency and scalability across Facebook’s global infrastructure.
Key Features of TAO
Optimized for Social Graph Data: TAO is built to manage the kinds of associative data that form the backbone of Facebook's operations, including friendships, posts, comments, likes, and shares. It uses a graph model, where each entity (such as a user, post, or like) is a node, and relationships between them (such as "likes" or "friends with") are edges.
Geographically Distributed System: Facebook operates at a global scale, and users expect fast load times no matter where they are. TAO was designed with geo-distribution in mind. Data is replicated across Facebook’s data centers, ensuring low-latency reads from geographically closer servers.
High Read and Write Performance: TAO is designed to handle a much higher read-write ratio than traditional databases. This is crucial because social network interactions typically involve many more reads (checking friends' status updates, seeing likes, etc.) than writes. TAO optimizes for high read throughput while maintaining the ability to efficiently handle millions of writes.
Data Sharding and Replication: To ensure scalability, TAO uses sharding, where data is divided into smaller partitions distributed across different servers. It also ensures data replication, meaning the same piece of data is stored in multiple locations, improving availability and redundancy.
Eventual Consistency with Causal Consistency: TAO leverages a model of eventual consistency but improves upon it with causal consistency for certain operations. This means that while data may not be immediately synchronized across all servers, TAO ensures that changes respect the causal relationships (e.g., if you like a post, your friends will eventually see that you liked it, but not before the post itself has appeared in their feed).
API for Associations: TAO provides a simple API for managing associations between objects, such as:
Association Writes: Creating relationships between objects (e.g., "User X likes Post Y").
Association Queries: Retrieving relationships (e.g., "Show all the posts liked by User X").
Object Queries: Fetching specific objects (e.g., retrieving the details of a post).
Caching for Fast Data Access: TAO uses aggressive caching to reduce load on backend databases and ensure low-latency access to frequently queried data. The cache-first architecture ensures that TAO reads data from in-memory caches whenever possible, only falling back to database queries when necessary.
How TAO Works
At its core, TAO operates by organizing Facebook’s social graph into objects and associations:
Objects: These represent entities such as users, posts, photos, comments, etc.
Associations: These are relationships between objects, such as friendships, likes, and comments.
Object and Association Data Model
Objects: Each object (e.g., a user or post) has a unique ID and associated metadata (e.g., username, post content). These objects are stored in a key-value format.
Associations: Associations between objects represent the connections in the social graph. For example, "User X likes Post Y" or "User X is friends with User Y." These associations are directional, meaning they can represent relationships that have a defined start and endpoint (e.g., a user liking a post).
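To make the object/association model concrete, here is a toy in-memory sketch in Python. The operation names loosely mirror those in Facebook's published TAO paper (obj_add, assoc_add, assoc_get), but the implementation is purely illustrative; the real TAO is a distributed cache backed by sharded MySQL.

import itertools
import time
from collections import defaultdict

class TinyTao:
    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}                # object id -> (type, metadata)
        self.assocs = defaultdict(list)  # (id1, assoc type) -> [(timestamp, id2)]

    def obj_add(self, otype, **data):
        oid = next(self._ids)
        self.objects[oid] = (otype, data)
        return oid

    def assoc_add(self, id1, atype, id2):
        # Associations are directed edges, stored newest-first,
        # echoing TAO's time-ordered association lists.
        self.assocs[(id1, atype)].insert(0, (time.time(), id2))

    def assoc_get(self, id1, atype):
        return [id2 for _, id2 in self.assocs[(id1, atype)]]

graph = TinyTao()
alice = graph.obj_add("user", name="Alice")
post = graph.obj_add("post", text="Hello, world")
graph.assoc_add(alice, "likes", post)
print(graph.assoc_get(alice, "likes"))  # -> [2]: "Alice likes the post"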
Read/Write Optimization
TAO provides different endpoints for reading and writing data, making it highly efficient at both:
Writes: TAO handles millions of association writes per second (such as new friendships, posts, or likes). Each write is distributed across different shards to ensure scalability.
Reads: Facebook’s primary use case involves reading social graph data, such as loading a user’s news feed. TAO’s caching system ensures that these reads are fast and efficient, reducing the load on the underlying database.
Causal Consistency Model
TAO provides causal consistency, ensuring that read-after-write consistency is maintained. This is particularly important in social networks where the order of operations matters—e.g., a comment should not appear before the post it comments on.
In practical terms, causal consistency ensures that when a user performs an action (like liking a post), their friends will see it in the correct order (i.e., they see the post before they see the "like").
Challenges Solved by TAO
Scaling the Social Graph: TAO was designed to handle billions of objects and trillions of associations. It scales horizontally, meaning Facebook can add more servers as needed to accommodate growing data volumes.
Low Latency for Global Users: Facebook users expect fast, seamless interactions, no matter their location. TAO ensures low-latency access to social graph data by geo-distributing data across Facebook’s global data centers and using caching to reduce latency.
Handling High Read-Write Volume: The sheer volume of reads and writes Facebook handles is enormous. TAO is optimized for this workload, allowing Facebook to handle millions of likes, comments, and posts per second while ensuring quick reads for users checking their news feeds or profiles.
Data Consistency Across Data Centers: TAO provides eventual consistency while ensuring that data is replicated across multiple data centers. By focusing on causal consistency, Facebook ensures that users see data in the correct order, even if the system is distributed across the globe.
Real-World Impact of TAO
TAO is crucial for Facebook's operations, enabling the platform to support a massive number of users and interactions while maintaining low-latency and high availability. It's not just a backend system but a fundamental technology that ensures users can seamlessly interact with the social graph.
Efficient Social Graph Queries: Users can load their profiles, check their news feed, and see who liked their posts, all in real-time, thanks to TAO’s efficient handling of associations and caching.
Massive Scale: TAO handles billions of reads and millions of writes per second, allowing Facebook to operate at a global scale without bottlenecks.
Consistency and Reliability: By offering eventual consistency with causal guarantees, TAO ensures that user interactions, such as likes and comments, are displayed in the correct order across Facebook’s global infrastructure.
1.1 Transitioning to HipHop: A PHP Revolution
Facebook addressed PHP's performance issues by developing HipHop for PHP (HPHP), a source code transformer that converted PHP code into highly optimized C++ code.
This reduced server load and improved performance dramatically.
Outcome: HPHP reduced Facebook's CPU usage by up to 50%, allowing Facebook to serve millions of users more efficiently. However, as Facebook scaled further, it needed a more sustainable solution, leading to the creation of the HipHop Virtual Machine (HHVM), which now powers most of Facebook's backend.
HipHop: A PHP Revolution—How It Powers Facebook's Backend
Facebook’s backend journey began with PHP, a simple, easy-to-use language that allowed for rapid development. However, as the platform grew exponentially, PHP’s inherent inefficiencies posed major performance challenges at scale. To address these, Facebook revolutionized its backend by developing HipHop for PHP (HPHP), a set of transformative tools that drastically improved performance, scalability, and efficiency.
Let's dive into the details of how HipHop reshaped Facebook's architecture, the evolution of this technology, and how it powered Facebook’s backend.
Why PHP and Its Limitations at Scale
In its early days, Facebook was primarily built in PHP due to its simplicity and speed of development. PHP enabled Facebook to iterate quickly, which was essential for a startup that needed to move fast and implement new features. PHP's flexibility and low barrier to entry made it ideal for developers to add and test new features frequently.
However, as Facebook's user base grew into the hundreds of millions and then billions, PHP’s dynamic nature presented significant problems:
Performance Overhead: PHP is interpreted at runtime, which incurs overhead compared to compiled languages. This is fine for small projects but inefficient at Facebook's massive scale.
Memory Usage: PHP's architecture resulted in higher memory usage, making it harder to scale efficiently without upgrading hardware constantly.
Inefficiency in CPU Utilization: Since PHP was interpreted at runtime, it couldn't leverage system resources like CPU and memory as efficiently as compiled languages like C++ or Java.
Facebook quickly realized that continuing with standard PHP would lead to massive inefficiencies in both server resources and performance. They needed to innovate without disrupting the existing PHP codebase that the entire platform was built upon. This led to the creation of HipHop for PHP.
What is HipHop for PHP?
HipHop for PHP (HPHP) was Facebook's custom solution to the performance limitations of PHP. Unveiled by Facebook engineers in 2010, HipHop replaced PHP's interpreted execution with ahead-of-time compilation to a far more efficient compiled form.
How HipHop Works
HipHop for PHP translates PHP code into C++, which is then compiled into machine code. This allowed Facebook to execute PHP scripts as compiled binaries rather than relying on the PHP interpreter, significantly improving performance.
Key features of HipHop include:
PHP to C++ Compilation: HPHP takes PHP source code and converts it into optimized C++ code, which is then compiled into a binary executable. This removes the overhead of interpretation at runtime, allowing PHP applications to run faster.
High Throughput: The compiled C++ binaries executed much faster than their interpreted PHP counterparts, leading to significant performance boosts for CPU-bound tasks like rendering pages or running algorithms.
Memory Efficiency: HipHop reduced memory consumption because compiled languages handle memory more efficiently than dynamically interpreted ones.
JIT Compilation: Later iterations of HipHop moved toward a Just-in-Time (JIT) compilation technique, ensuring that even dynamic portions of PHP code could be optimized on the fly.
No Disruption to Development: The best part of HipHop was that Facebook's engineers didn't need to learn a new language or framework. They could continue writing in PHP while HipHop handled the heavy lifting of optimizing performance behind the scenes.
Performance Gains
The introduction of HipHop immediately led to significant improvements in Facebook’s backend performance:
50% Reduction in CPU Usage: By compiling PHP code to C++, HipHop cut Facebook's CPU consumption in half, leading to more efficient resource utilization across its data centers.
40% More Requests Per Server: Facebook could now handle 40% more traffic per server, significantly reducing infrastructure costs.
Higher Throughput: Page load times decreased, and the backend could process more requests simultaneously, allowing Facebook to scale faster without exponentially increasing hardware costs.
Evolution of HipHop: HipHop Virtual Machine (HHVM)
Though HipHop provided a major breakthrough, it had limitations. Converting PHP to C++ required long compile times, and certain PHP features (like eval()) were hard to support in this model. To address these shortcomings, Facebook evolved HipHop into the HipHop Virtual Machine (HHVM), announced in 2011 and fully deployed in place of the original compiler by 2013.
HHVM—A Just-in-Time (JIT) Compiler
HHVM marked the next step in Facebook’s optimization journey. It works by using Just-in-Time (JIT) compilation, dynamically compiling PHP code during execution, allowing for greater flexibility and speed. With HHVM:
Dynamic Compilation: Unlike HipHop, which converted PHP code into static C++ binaries, HHVM uses JIT compilation to dynamically convert PHP into machine code at runtime. This allowed for more flexibility, enabling support for dynamic PHP features while still delivering performance benefits.
Improved Performance: JIT compilation offers better runtime optimization by analyzing code during execution and applying more advanced optimizations. HHVM performs better than traditional interpreters and outpaces the original HipHop in many use cases.
Support for Hack: Facebook introduced Hack, a statically typed language built on top of PHP, alongside HHVM. Hack introduced features like gradual typing, collections, and lambda expressions. Developers could use the flexibility of PHP with the type safety and performance advantages of a statically typed language.
HHVM Key Benefits
Faster Execution: With JIT compilation, HHVM could outperform both standard PHP and the original HipHop compiler, offering even faster page render times and better CPU utilization.
Memory Optimization: HHVM optimized memory usage better than standard PHP, reducing overhead and improving Facebook's scalability.
Real-Time Flexibility: Since HHVM compiled code on the fly, it supported dynamic PHP features and real-time updates to code, allowing for rapid iteration while still delivering performance benefits.
Type Safety with Hack: The Hack language introduced by Facebook allowed for type annotations, giving developers a balance between the flexibility of PHP and the benefits of static typing, improving code safety and performance.
Impact of HipHop and HHVM on Facebook’s Backend
The combination of HipHop and HHVM had a profound impact on how Facebook’s backend operates:
Scalability: HipHop and HHVM made it possible for Facebook to scale to billions of users without a proportional increase in hardware costs. By improving server performance, they reduced the need for adding physical servers to handle increased traffic.
Resource Efficiency: Reducing CPU usage by 50% and improving memory efficiency allowed Facebook to serve more requests per server, reducing its infrastructure footprint and energy consumption. This not only saved costs but also helped Facebook build more environmentally friendly data centers.
Developer Productivity: Developers could continue to use PHP while benefiting from the optimizations of HipHop and HHVM. The introduction of Hack gave developers more powerful tools without requiring a complete overhaul of the codebase.
Faster Development Cycles: HHVM's JIT compilation allowed Facebook engineers to make real-time changes and deploy them quickly. This was crucial for maintaining the "move fast and break things" philosophy, where experimentation and iteration were central to Facebook's success.
Beyond HipHop: Facebook's Evolving Backend Architecture
Though HipHop and HHVM played a crucial role in Facebook’s infrastructure for years, the company has continued to evolve its backend to meet ever-growing challenges. In addition to HHVM, Facebook now uses a mix of technologies like:
GraphQL: For efficient data querying across its APIs.
MySQL and TAO: For managing structured data at massive scale.
BigPipe: For efficient page rendering by streaming HTML content incrementally.
The shift from PHP to Hack via HHVM and the optimizations it brought were foundational steps in making Facebook’s infrastructure one of the most powerful and efficient systems on the web today.
Conclusion of the HipHop Story:
HipHop for PHP was a revolutionary step in Facebook’s journey to becoming a global tech giant. By transforming PHP into a compiled language through HipHop and later optimizing it further with HHVM, Facebook managed to solve the massive scalability and performance challenges that came with its exponential growth. This innovation not only allowed Facebook to continue using PHP’s simple and flexible syntax but also turned it into a high-performance language capable of powering one of the world’s most complex and high-traffic websites.
Today, HipHop’s legacy lives on in HHVM and Hack, forming the backbone of Facebook’s highly optimized infrastructure. This evolution showcases Facebook’s ability to push the boundaries of web performance, ensuring it can serve billions of users around the world with lightning speed and efficiency.
Read-after-write consistency is a consistency model in which, once a write operation is confirmed, every subsequent read reflects that updated value. In other words, reads never return data older than the most recent confirmed write.
Read-After-Write Consistency in Facebook's Database
1. Data Architecture: Facebook's database system involves a complex architecture with various components, including:
MySQL: Used for traditional relational data and critical transactional consistency.
Cassandra: A distributed NoSQL database used for handling large amounts of unstructured data, known for its high availability and scalability.
TAO: A distributed data store developed by Facebook to manage its social graph data efficiently, providing read and write access to the enormous amount of user data and interactions.
TAO provides causal consistency, ensuring that read-after-write consistency is maintained.
2. Read-After-Write Consistency Challenges:
Scalability and Performance: Maintaining read-after-write consistency can be challenging in a highly distributed system where data is replicated across multiple nodes. Ensuring that all nodes have the latest write before a read operation can affect performance and scalability.
Network Partitions: In distributed systems, network partitions or failures can result in inconsistencies where some nodes may not immediately reflect the latest write.
3. Consistency Strategies:
Quorum-Based Reads/Writes: Facebook's distributed systems, like Cassandra, often use quorum-based approaches. For a read operation to be considered consistent, it typically requires a majority (quorum) of nodes to agree on the latest write, ensuring that the read reflects the most recent write (see the sketch after this list).
Timestamp-Based Ordering: In some systems, like TAO, operations are ordered using timestamps. This allows the system to maintain consistency by ensuring that writes are processed in the order they occur, and reads always access the latest data.
Write-Ahead Logs (WAL): Systems like MySQL use write-ahead logs to ensure that data is not lost and consistency is maintained. These logs record all changes made to the database, allowing for recovery and consistency in case of failures.
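The quorum rule reduces to simple arithmetic: with N replicas, a write acknowledged by W nodes and a read that consults R nodes must overlap in at least one node whenever R + W > N. A minimal sketch (illustrative, not Cassandra's actual code):

def read_sees_latest_write(n, w, r):
    # Any write set of size W and read set of size R drawn from N replicas
    # intersect when R + W > N, so the read sees the latest acknowledged write.
    return r + w > n

print(read_sees_latest_write(n=5, w=3, r=3))  # True: read-after-write holds
print(read_sees_latest_write(n=5, w=1, r=2))  # False: a stale read is possible

Tuning W and R is exactly the latency-versus-consistency trade-off discussed below: smaller quorums respond faster but admit stale reads.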
4. Implementation in Facebook’s Systems:
Cassandra: For writes, Cassandra uses a commit log and memtables to ensure data durability. When a read request is made, Cassandra checks the memtable and SSTables to ensure it returns the latest data, which is essential for read-after-write consistency.
TAO: TAO’s design includes mechanisms to ensure that after a write, subsequent reads will reflect the latest state of the social graph. It uses a combination of caching and replication strategies to achieve consistency and high performance.
5. Practical Considerations:
Latency vs. Consistency Trade-offs: Facebook has to balance between low latency and consistency. While strong consistency (read-after-write) is crucial, some operations might tolerate eventual consistency to improve performance.
Consistency Models for Different Use Cases: Facebook employs different consistency models based on the use case. Critical operations, such as user authentication and critical updates, typically require strong consistency, while less critical operations might use eventual consistency for improved performance.
In summary, read-after-write consistency in Facebook’s database system involves ensuring that once a write operation is completed, all subsequent reads will reflect this update. This is achieved through a combination of consistency strategies, including quorum-based approaches, timestamp ordering, and write-ahead logs, tailored to the specific requirements of different components of Facebook’s architecture.
2. Frontend Evolution: From Static Pages to Dynamic User Experiences
(A bit of trivia: Facebook's signature blue-and-white palette is often attributed to Mark Zuckerberg's red-green color blindness; blue is reportedly the color he sees best.)
As user expectations shifted toward rich, interactive experiences, Facebook transitioned from static HTML pages to dynamic, client-side rendered content.
Facebook's frontend technology stack has evolved significantly over the years to handle its massive scale and complex user interactions. As of the latest updates, Facebook uses a variety of technologies to build and maintain its frontend. Here’s an overview of the key technologies used:
1. React
Description: React is a JavaScript library for building user interfaces, developed and maintained by Facebook. It allows developers to build reusable UI components and manage the state of applications efficiently.
Usage: React is central to Facebook’s frontend development, used extensively across its platform for creating interactive and dynamic user experiences. It enables Facebook to handle complex user interfaces with high performance and responsiveness.
2. GraphQL
Description: GraphQL is a query language for APIs and a runtime for executing those queries, providing a more efficient and flexible alternative to traditional REST APIs. It was developed by Facebook to improve data fetching and manipulation.
Usage: Facebook uses GraphQL to allow frontend applications to request exactly the data they need from the server, reducing the amount of data transferred and improving performance.
GraphQL addresses the issue of overfetching data through its flexible and precise querying capabilities. Overfetching occurs when a client requests more data than it actually needs, which can lead to inefficiencies and wasted resources.
Here's how GraphQL prevents overfetching:
1. Client-Specified Queries
In GraphQL, the client specifies exactly what data it needs in its query. The query structure directly reflects the data requirements of the client, ensuring that only the requested fields are returned. This is in contrast to traditional REST APIs, where the server defines the structure of responses, often leading to overfetching.
Example: If a client only needs a user's name and email, it can send a query like this:
{
  user(id: "123") {
    name
    email
  }
}
The server will respond with only the requested name and email, without including any additional fields.
2. Nested Queries
GraphQL supports nested queries, allowing clients to request related data in a single query. This avoids multiple requests to different endpoints and ensures that the exact data needed is fetched in one go, preventing the retrieval of unnecessary data.
Example: A client can request user details and their associated posts in one query:
{
  user(id: "123") {
    name
    posts {
      title
      content
    }
  }
}
Here, the client gets the user's name and their posts, including only the title and content of each post, without fetching irrelevant details.
3. Arguments and Filters
GraphQL allows clients to use arguments and filters to refine their queries. This means clients can specify conditions to retrieve only the data that meets certain criteria, thus avoiding the retrieval of excess data.
Example: If a client wants posts created within the last month:
{
  posts(filter: { createdAfter: "2024-08-15" }) {
    title
    content
  }
}
The server will return posts that meet the filtering criteria, avoiding unnecessary data.
4. Custom Resolvers
In GraphQL, custom resolvers on the server can further control and optimize the data retrieval process. Resolvers can be written to handle complex data-fetching logic and ensure that only relevant data is fetched based on the query’s requirements.
Example: A resolver for the user field might only fetch and return user details if the client requests them, and avoid fetching additional related data unless specifically requested.
5. Type System and Introspection
GraphQL’s type system allows clients to know exactly what data is available and what types each field has. The introspection capabilities enable clients to query the schema itself to understand the data structure, ensuring they request only the necessary fields.
Example: Clients can use introspection queries to explore available fields and types:
{
  __type(name: "User") {
    fields {
      name
      type {
        name
      }
    }
  }
}
This allows clients to tailor their queries based on the precise schema information.
6. Avoiding Fixed Endpoints
Unlike REST, where different endpoints may return different data structures, GraphQL operates through a single endpoint. This approach allows clients to specify exactly what they need in each request, avoiding the fixed and often overly broad responses of REST endpoints.
Example: Instead of multiple REST endpoints for user details and user posts, a single GraphQL endpoint can handle diverse queries as specified by the client.
In summary, GraphQL prevents overfetching by giving clients precise control over the data they request. Through client-specified queries, nested queries, arguments, and filters, and a flexible schema, GraphQL ensures that only the necessary data is retrieved and delivered, optimizing performance and reducing unnecessary data transfer.
3. Relay
Description: Relay is a JavaScript framework for managing and querying data in React applications. It works with GraphQL to provide a declarative data-fetching approach.
Usage: Relay is used to manage data-fetching and state management in Facebook’s React applications, ensuring that data is consistent and efficiently synchronized with the UI.
4. Preact
Description: Preact is a lightweight alternative to React with a similar API but a smaller footprint. It’s used to optimize performance and reduce load times.
Usage: While Facebook predominantly uses React, Preact might be employed in some parts of their system for performance optimizations or lightweight applications.
5. Flow
Description: Flow is a static type checker for JavaScript developed by Facebook. It helps catch type errors and enhance code quality during development.
Usage: Flow is used in Facebook’s codebase to ensure type safety and catch potential errors early, contributing to more reliable and maintainable code.
6. TypeScript
Description: TypeScript is a statically typed superset of JavaScript that compiles to plain JavaScript. It provides optional static typing, which helps catch errors and improve code quality.
Usage: Some of Facebook's open-source JavaScript projects (such as Jest) have migrated to TypeScript, though Flow remains the primary type checker for Facebook's internal codebase.
7. Buck
Description: Buck is a build system developed by Facebook to support fast and incremental builds. It’s optimized for building large-scale projects.
Usage: Buck is used to build Facebook's large codebases efficiently, handling dependencies and compilation tasks to speed up development cycles.
8. Webpack
Description: Webpack is a module bundler for JavaScript applications. It takes modules with dependencies and generates static assets representing those modules.
Usage: Webpack is widely used for bundling JavaScript code, optimizing assets, and managing dependencies; Facebook has also built its own bundlers, such as Metro for React Native.
9. CSS-in-JS
Description: CSS-in-JS is a styling approach where CSS is composed using JavaScript, allowing for dynamic styling based on component state.
Usage: Facebook uses CSS-in-JS techniques in conjunction with React to manage component-specific styles, improving encapsulation and maintainability.
10. Flipper
Description: Flipper is a debugging tool for mobile applications developed by Facebook. It provides tools for inspecting and debugging mobile apps.
Usage: Flipper is used for debugging React Native applications, allowing developers to inspect the state, network requests, and performance of mobile apps.
11. Jest
Description: Jest is a JavaScript testing framework developed by Facebook. It provides a robust and easy-to-use testing environment for JavaScript applications.
Usage: Jest is used to write and run tests for Facebook’s frontend code, ensuring that components and features work correctly and remain bug-free.
12. Styled Components
Description: Styled Components is a library for styling React components using tagged template literals. It enables the use of component-level styles in a scoped and modular manner.
Usage: Styled Components may be used for styling React components in a way that allows for better encapsulation and reusability.
These technologies collectively enable Facebook to build and maintain a highly interactive, scalable, and performant frontend for its web and mobile applications.
2.1 React: Building Dynamic User Interfaces
React, a JavaScript library created by Facebook, revolutionized the way Facebook and other companies build modern web apps. React allows developers to create large, dynamic web applications with small, reusable components.
Why React?
Declarative UI: React enables developers to describe how the UI should look at any given point in time, and React takes care of updating the DOM efficiently.
Component-Based: With React, Facebook could break down its massive UI into smaller, reusable components, simplifying development and maintenance.
React's Role at Facebook:
React is the backbone of Facebook’s frontend development, powering everything from the News Feed to the notifications system. It has also become the foundation for Facebook’s other platforms, such as Instagram and WhatsApp Web.
2.2 Relay and GraphQL: Efficient Data Fetching
React works in tandem with Relay and GraphQL, two more technologies developed by Facebook to improve data-fetching efficiency.
Relay: Relay is a JavaScript framework for managing data in React applications. It optimizes how React components fetch and update data, ensuring that only the necessary data is retrieved, reducing bandwidth usage.
GraphQL: Unlike REST APIs that return fixed data structures, GraphQL allows clients to query exactly what they need. This reduces over-fetching of data and makes API responses more efficient.
Real-World Example: Facebook's News Feed uses GraphQL to deliver personalized content to users. As users scroll through the feed, GraphQL queries fetch only the relevant posts, images, and comments in real time.
3. Backend Evolution: From MySQL to Distributed Systems
Handling the world's largest social network requires a highly scalable and reliable backend. Facebook's backend is a combination of traditional relational databases like MySQL and custom-built, distributed storage systems optimized for speed, scalability, and fault tolerance.
3.1 Scaling MySQL: Sharding and Replication
Even though Facebook moved away from relying solely on MySQL, it remains an important part of their infrastructure. To make MySQL scalable:
Sharding: Facebook uses sharding to split large databases into smaller, manageable pieces, each stored on different servers. Each shard contains a subset of the total data, reducing the load on individual servers (see the routing sketch below).
Replication: Facebook also employs replication to ensure data availability and fault tolerance. Each shard has multiple replicas distributed across different data centers, so even if one server fails, data can be retrieved from another replica.
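A minimal sketch of hash-based shard routing, assuming a simple modulo scheme. Facebook's real routing layer is far more sophisticated (and must also cope with resharding and hot spots), but the core idea looks like this:

NUM_SHARDS = 4  # illustrative; production systems run thousands of shards

def shard_for_user(user_id, num_shards=NUM_SHARDS):
    # Deterministically map a user's rows to one shard.
    return user_id % num_shards

def connection_for_user(user_id, shard_pool):
    # shard_pool maps a shard index to a MySQL connection (or replica set).
    return shard_pool[shard_for_user(user_id)]

print(shard_for_user(1234))  # -> 2: user 1234's data lives on shard 2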
3.2 TAO: Facebook’s Social Graph Storage
Facebook's social graph is the core of its platform, storing information about users, friendships, and interactions. TAO is a highly scalable, geographically distributed data store that handles billions of reads and millions of writes per second.
Purpose: TAO is optimized for reading and writing the relationships (edges) between users (nodes) in the social graph. When you send a friend request, like a post, or follow a page, TAO processes that interaction.
Architecture: TAO's distributed architecture allows it to process these interactions quickly, ensuring low-latency access even when handling billions of requests per second across multiple data centers.
4. Real-Time Communication and Messenger
With billions of messages sent daily, Facebook's real-time messaging system, Messenger, is a technological feat in its own right. To deliver messages instantly, Facebook uses several key technologies:
4.1 WebSockets for Real-Time Communication
WebSockets are crucial for real-time communication in Messenger. Unlike traditional HTTP requests, which require a back-and-forth round trip for each message, WebSockets establish a persistent connection between the client and the server, enabling real-time, bidirectional data flow.
How WebSockets Work in Messenger: When a user sends a message, the WebSocket connection ensures that the message is delivered instantly to the recipient. It also powers typing indicators, message receipts, and real-time read notifications.
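Here is a toy chat relay built with Python's open-source websockets library. It sketches the persistent, bidirectional connection idea; it is not Messenger's actual architecture.

import asyncio
import websockets

CONNECTED = set()  # every open client connection

async def handler(websocket):
    # Each client holds one persistent connection; no per-message HTTP round trips.
    CONNECTED.add(websocket)
    try:
        async for message in websocket:
            # Push the message to every other connected client immediately.
            for peer in CONNECTED:
                if peer is not websocket:
                    await peer.send(message)
    finally:
        CONNECTED.remove(websocket)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())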
4.2 MQTT: A Lightweight Messaging Protocol
Facebook uses MQTT, a lightweight publish-subscribe messaging protocol, to minimize the overhead associated with sending real-time messages, especially on mobile devices where bandwidth is limited.
Why MQTT?: MQTT was designed to be lightweight, making it ideal for mobile devices with limited battery life and data plans. It also supports the massive scale that Facebook Messenger operates on, handling millions of concurrent users.
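A minimal publish-subscribe sketch using the open-source paho-mqtt client (1.x-style API). The broker address and topic names are placeholders, not anything Facebook runs:

import paho.mqtt.client as mqtt  # paho-mqtt 1.x style API

def on_message(client, userdata, msg):
    # Fires whenever a message arrives on a subscribed topic.
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.com", 1883)       # placeholder broker
client.subscribe("messenger/demo/room1")         # cheap, low-overhead subscription
client.publish("messenger/demo/room1", "hello")  # small, binary-friendly payload
client.loop_forever()

The protocol's tiny fixed header (as little as 2 bytes) is what makes it so battery- and bandwidth-friendly compared with repeated HTTP requests.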
4.3 Chatbots and AI in Messenger
Facebook introduced chatbots to Messenger, allowing businesses to automate customer support and other services. These chatbots are powered by AI and natural language processing (NLP) models.
NLP and AI: Facebook uses Wit.ai, an NLP engine that helps chatbots understand and respond to human language. Machine learning models are continuously trained on user interactions to improve the accuracy and relevance of chatbot responses.
5. Photos and Videos: Delivering Visual Content at Scale
With over 100 million hours of video watched daily, Facebook's infrastructure for handling visual content—photos and videos—is incredibly sophisticated.
5.1 Video Processing Pipeline
When a user uploads a video, Facebook’s video pipeline processes it in multiple steps to ensure smooth playback across all devices and bandwidth conditions.
Transcoding: Facebook transcodes videos into multiple formats and resolutions, allowing users to watch videos in the best quality supported by their device and network connection.
Adaptive Bitrate Streaming (ABR): Facebook uses ABR to adjust video quality in real time based on the user's connection speed, ensuring smooth playback without buffering.
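At the heart of ABR is a rendition-picking decision. A deliberately simplified sketch, with a made-up bitrate ladder and a throughput-with-headroom rule:

# Illustrative ladder: (height in pixels, required bandwidth in kbit/s).
RENDITIONS = [(1080, 6000), (720, 3500), (480, 1500), (360, 800), (240, 400)]

def pick_rendition(measured_kbps, headroom=0.8):
    # Spend only ~80% of measured throughput so brief dips don't cause rebuffering.
    budget = measured_kbps * headroom
    for height, required in RENDITIONS:
        if required <= budget:
            return height
    return RENDITIONS[-1][0]  # fall back to the lowest rung

print(pick_rendition(3000))  # -> 480: a 3 Mbit/s connection gets the 480p stream

Real players re-run this decision every few seconds as the connection fluctuates, stepping quality up or down between video segments.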
5.2 AI-Driven Content Understanding
Facebook employs AI models to analyze photos and videos, enabling features like automatic tagging, facial recognition, and content moderation. For example:
Automatic Tagging: Facebook uses deep learning models trained on millions of photos to automatically recognize and suggest tags for users in images.
Content Moderation: AI models scan photos and videos for inappropriate or harmful content, helping keep Facebook a safe platform for all users.
6. AI and Machine Learning at Facebook
Artificial intelligence is deeply embedded in Facebook's technology stack, powering everything from content recommendation to hate speech detection.
6.1 FBLearner Flow: Machine Learning at Scale
Facebook's internal machine learning platform, FBLearner Flow, automates the entire machine learning pipeline, from data preprocessing to model deployment.
How It Works: Engineers feed large datasets into FBLearner Flow, which then trains and deploys machine learning models at scale. These models are used to power everything from content ranking in the News Feed to friend suggestions.
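FBLearner Flow's actual operator API is internal to Facebook, but the workflow idea (chaining preprocessing, training, and deployment stages into a pipeline) can be sketched in a few lines. Everything below is hypothetical:

def preprocess(rows):
    # Stub featurization step.
    return [r * 2 for r in rows]

def train(features):
    # Stub "training": the model here is just the mean of the features.
    return sum(features) / len(features)

def deploy(model):
    print(f"deploying model: {model}")
    return model

def run_pipeline(data, stages):
    # Feed each stage's output into the next, like one linear path of a workflow DAG.
    for stage in stages:
        data = stage(data)
    return data

run_pipeline([1, 2, 3], [preprocess, train, deploy])  # prints "deploying model: 4.0"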
6.2 Deep Learning with PyTorch
Facebook is also a major contributor to PyTorch, an open-source deep learning framework. PyTorch is used extensively within Facebook for developing advanced machine learning models.
PyTorch's Role: PyTorch powers several core Facebook features, including:
Image and Video Understanding: Facebook uses deep learning models to analyze and understand visual content uploaded by users.
Natural Language Processing (NLP): Facebook's NLP models, developed using PyTorch, are used to understand user queries in Messenger, provide translation services, and assist in content moderation.
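For a flavor of what PyTorch code looks like, here is a self-contained toy classifier with a single training step. Real Facebook models are vastly larger; nothing here reflects their actual architectures.

import torch
import torch.nn as nn
import torch.nn.functional as F

# A tiny feed-forward classifier, e.g. "acceptable" vs "flag for review".
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)

x = torch.randn(16, 128)             # a batch of 16 feature vectors
labels = torch.randint(0, 2, (16,))  # random labels, purely for illustration
logits = model(x)
loss = F.cross_entropy(logits, labels)
loss.backward()                      # autograd computes gradients for training
print(logits.shape, loss.item())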
7. Facebook's Mobile Apps: iOS and Android Development
Building for mobile platforms like iOS and Android requires a separate tech stack to deliver the best user experience.
7.1 React Native: A Unified Framework for Mobile Development
Facebook developed React Native, a cross-platform mobile development framework, to allow developers to write apps for iOS and Android using a single codebase.
Benefits of React Native :
Code Reusability: React Native enables developers to reuse up to 90% of the code between platforms, speeding up development and reducing maintenance costs.
Hot Reloading: React Native allows developers to see the results of code changes in real time, without recompiling the entire app, improving development speed.
8. Data Centers: Powering a Global Infrastructure
Facebook operates some of the largest and most energy-efficient data centers in the world. These data centers are the backbone of Facebook's infrastructure, handling the massive amount of data generated daily.
8.1 Prineville Data Center: The First Facebook Data Center
Facebook's first custom-built data center in Prineville, Oregon, marked the beginning of Facebook's journey toward building its own data center infrastructure.
- Energy Efficiency: The Prineville data center uses evaporative cooling and a custom power distribution system to reduce energy consumption.
8.2 Facebook’s Global Data Center Network
Facebook now operates data centers across the world, from the United States to Europe and Asia. These data centers are connected by Facebook's private fiber network, ensuring low-latency access to Facebook services no matter where users are located.
9. Content Delivery Network (CDN): Serving Billions of Users
Facebook uses a custom-built content delivery network (CDN) to deliver static assets—like images, CSS, and JavaScript—quickly to users around the globe.
Edge Caching: Facebook's CDN uses edge caching to store copies of popular content at servers located close to users, reducing load times and server strain.
AI-Powered Caching: Facebook uses AI algorithms to predict which content is likely to go viral and caches it in advance, ensuring it can be delivered quickly to millions of users simultaneously.
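Edge caching at its core is a bounded, recency-aware key-value store. A minimal LRU sketch in Python; the capacity and URLs are placeholders, and Facebook's real edge servers are far more elaborate:

from collections import OrderedDict

class EdgeCache:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = OrderedDict()  # url -> cached asset, oldest first

    def get(self, url):
        if url in self.items:
            self.items.move_to_end(url)  # mark as recently used
            return self.items[url]       # hit: served from the edge, no origin trip
        return None                      # miss: caller fetches from the origin

    def put(self, url, content):
        self.items[url] = content
        self.items.move_to_end(url)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used asset

edge = EdgeCache(capacity=2)
edge.put("/img/a.jpg", b"...")
edge.put("/img/b.jpg", b"...")
edge.get("/img/a.jpg")          # touch a.jpg so it stays hot
edge.put("/img/c.jpg", b"...")  # evicts b.jpg, the least recently used
print(edge.get("/img/b.jpg"))   # -> None

The "AI-powered" variant described above amounts to pre-warming this cache: predicted-hot content is inserted before the requests arrive.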