Scaling PHP as a Scripting Language
One common way to address these inefficiencies is to rewrite the more complex parts of your PHP application directly in C++ as PHP Extensions. This largely transforms PHP into a glue language between your front end HTML and application logic in C++. From a technical perspective this works well, but drastically reduces the number of engineers who are able to work on your entire application. Learning C++ is only the first step to writing PHP Extensions, the second is understanding the Zend APIs. Given that our engineering team is relatively small — there are over one million users to every engineer — we can’t afford to make parts of our codebase less accessible than others.
Scaling Facebook is particularly challenging because almost every page view is a logged-in user with a customized experience. When you view your home page we need to look up all of your friends, query their most relevant updates (from a custom service we’ve built called Multifeed), filter the results based on your privacy settings, then fill out the stories with comments, photos, likes, and all the rich data that people love about Facebook. All of this in just under a second. HipHop allows us to write the logic that does the final page assembly in PHP and iterate it quickly while relying on custom back-end services in C++, Erlang, Java, or Python to service the News Feed, search, Chat, and other core parts of the site.
Since 2007 we’ve thought about a few different ways to solve these problems and have even tried implementing a few of them. The common suggestion is to just rewrite Facebook in another language, but given the complexity and speed of development of the site this would take some time to accomplish. We’ve rewritten aspects of the Zend Engine — PHP’s internals — and contributed those patches back into the PHP project, but ultimately haven’t seen the sort of performance increases that are needed. HipHop’s benefits are nearly transparent to our development speed.
Hacking Up HipHop
One night at a Hackathon a few years ago (see Prime Time Hack), I started my first piece of code transforming PHP into C++. The languages are fairly similar syntactically and C++ drastically outperforms PHP when it comes to both CPU and memory usage. Even PHP itself is written in C. We knew that it was impossible to successfully rewrite an entire codebase of this size by hand, but wondered what would happen if we built a system to do it programmatically.
Finding new ways to improve PHP performance isn’t a new concept. At run time the Zend Engine turns your PHP source into opcodes which are then run through the Zend Virtual Machine. Open source projects such as APC and eAccelerator cache this output and are used by the majority of PHP powered websites. There’s also Zend Server, a commercial product which makes PHP faster via opcode optimization and caching. Instead, we were thinking about transforming PHP source directly into C++ which can then be turned into native machine code. Even compiling PHP isn’t a new idea, open source projects like Roadsend and phc compile PHP to C, Quercus compiles PHP to Java, and Phalanger compiles PHP to .Net.
Needless to say, it took longer than that single Hackathon. Eight months later, I had enough code to demonstrate it is indeed possible to run faster with compiled code. We quickly added Iain Proctor and Minghui Yang to the team to speed up the pace of the project. We spent the next ten months finishing up all the coding and the following six months testing on production servers. We are proud to say that at this point, we are serving over 90% of our Web traffic using HipHop, all only six months after deployment.
How HipHop Works
The main challenge of the project was bridging the gap between PHP and C++. PHP is a scripting language with dynamic, weak typing. C++ is a compiled language with static typing. While PHP allows you to write magical dynamic features, most PHP is relatively straightforward. It’s more likely that you see if (…) {…} else {..} than it is to see function foo($x) { include $x; }. This is where we gain in performance. Whenever possible our generated code uses static binding for functions and variables. We also use type inference to pick the most specific type possible for our variables and thus save memory.
The transformation process includes three main steps:
Static analysis where we collect information on who declares what and dependencies,
Type inference where we choose the most specific type between C++ scalars, String, Array, classes, Object, and Variant, and
Code generation which for the most part is a direct correspondence from PHP statements and expressions to C++ statements and expressions.
We have also developed HPHPi, which is an experimental interpreter designed for development. When using HPHPi you don’t need to compile your PHP source code before running it. It’s helped us catch bugs in HipHop itself and provides engineers a way to use HipHop without changing how they write PHP.
Overall HipHop allows us to keep the best aspects of PHP while taking advantage of the performance benefits of C++. In total, we have written over 300,000 lines of code and more than 5,000 unit tests. All of this will be released this evening on GitHub under the open source PHP license.
Learn More this Evening
This evening we’re hosting a small group of developers to dive deeper into HipHop for PHP and will be streaming this tech talk live. Check back here around 7:30pm Pacific time if you’d like to watch.
As I’m sure there will be plenty of questions, starting this evening take a look at the HipHop wiki or join the HipHop developer mailing list. You’ll also find us at FOSDEM, SCALE, PHP UK, ConFoo, TEK X, and OSCON over the next few months talking about HipHop for PHP. We’re very excited to evolve HipHop into a thriving open source project along with all of you.