Notes on a highly scalable WordPress Delivery Platform - Over 10K live requests / second, 20K concurrent connections

By Angsuman Chakraborty, Gaea News Network
Saturday, January 31, 2009

As you may be aware if you follow my tweets, we are testing a highly scalable WordPress delivery platform that can serve over 10,000 requests per second from a single server and handle over 20,000 concurrent connections without failure. Sounds amazing? Then read on…

WordPress, as you may well be aware, is resource-hungry (but well-featured) blogging software. Just how resource hungry is WordPress?

WordPress, a Performance Nightmare & Resource Godzilla

We tested WordPress with a copy of the Simple Thoughts (this blog) database on a dual-processor quad-core Xeon server at 2 GHz (i.e. 8 cores of 2 GHz each) with 4 GB RAM and SATA-2 hard disks in RAID-1 (2 × 7.2K RPM disks; mirrored reads roughly double read throughput), a pretty high-end machine if I may say so. All standard MySQL optimizations were done, and PHP had eAccelerator enabled.

WordPress, running on nginx (which is far better than Apache), saturated the server while serving only 50-60 requests per second, with no write activity at all (new posts, comments, pingbacks, trackbacks, etc.). At 100 requests per second, the server froze. It had to be cold-rebooted (pressing the power button to switch it off, then on) at our data center to bring it back up.

Note: WordPress executes lots of MySQL queries just to render a single page. Many plugins execute additional SQL queries, adding to the load. More often than not, you will find MySQL to be the single biggest bottleneck in WordPress performance.
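To see why the per-page query count matters so much, here is a back-of-the-envelope throughput model. The 50-queries-per-page and 2.5 ms-per-query figures are illustrative assumptions, not measurements from this server, but they land right in the observed 50-60 requests/second range:

```python
# Rough model: an uncached page render is serialized behind its
# database queries, so per-worker throughput is bounded by total
# query time per page. All numbers here are illustrative.

def max_requests_per_second(queries_per_page, query_ms, workers):
    page_ms = queries_per_page * query_ms  # DB time spent per page render
    pages_per_sec = 1000.0 / page_ms       # ceiling for a single worker
    return pages_per_sec * workers

# e.g. 50 queries/page at 2.5 ms each, across 8 cores/workers:
print(max_requests_per_second(50, 2.5, 8))  # 64.0
```

Halving either the query count or the query latency doubles the ceiling, which is why caching (skipping the queries entirely) is so effective.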

You can read my Top 5 tips to improve WordPress performance for easy optimizations you can do.

WordPress Boosters

WordPress sites have traditionally used the wp-cache 2, and now WP Super Cache, plugins to cache pages for faster delivery. WordPress.com, the hosted platform for WordPress blogs run by the company behind WordPress, also uses one of these caching plugins. They work to some extent. Testing a single URL with Apache Bench on a single medium-range server, we could handle a maximum of 400 concurrent requests, and the requests-per-second figure was much lower.

The bad news is that in real-life scenarios both plugins perform much worse, for several reasons:

  1. Every comment on a page causes the page to be regenerated, thereby increasing the load. Read my solution for increasing WordPress performance on heavily commented blogs.
  2. The cache management of both plugins is dumb, to put it mildly. Lots of unnecessary regeneration is done, which increases load on the system.
  3. They include complicated logic for handling plugins that may change page content.
  4. The cache is served by PHP, which is not a speed demon in its best days :)

The bottom line is that if your blog keeps getting more popular, neither of these caching plugins will suffice.
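One way around the fourth point (the cache being served by PHP) is to let the web server hand out the cached file directly, bypassing PHP entirely. A minimal nginx sketch of the idea; the path follows WP Super Cache's default supercache directory layout and is an assumption to adjust for your install, not a tested production config:

```nginx
# Serve a WP Super Cache static file if one exists for this URL,
# falling back to PHP only on a cache miss. The cache path is an
# assumption based on the plugin's default directory layout.
location / {
    try_files /wp-content/cache/supercache/$host$uri/index.html
              $uri $uri/ /index.php?$args;
}
```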

But there is a bigger problem.

How to ensure stability of WordPress sites?

Traditionally WordPress runs on the LAMP stack, where the Apache web server and MySQL database are part of the equation. One of the biggest problems is that under high load a WordPress site will frequently freeze for extended periods of time, and sometimes go completely dead. This is clearly unacceptable.

While there are ways to scale WordPress, how can we ensure stability even under unexpectedly high load?

Approaches to scaling WordPress

First, you should run MySQL on a separate server to distribute the load better. You can even have multiple Apache HTTP servers pointing to the same MySQL instance, with the load distributed by a frontend proxy such as HAProxy, Pound, or nginx.
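As a sketch, the frontend-proxy piece might look like this in nginx; the hostnames and ports are placeholders, not a tested configuration:

```nginx
# Round-robin two Apache backends behind a single nginx frontend.
# Both backends point at the same separate MySQL server.
upstream wp_backends {
    server 10.0.0.11:8080;   # Apache #1 (placeholder address)
    server 10.0.0.12:8080;   # Apache #2 (placeholder address)
}

server {
    listen 80;
    location / {
        proxy_pass http://wp_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```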

BTW: We are always talking about dedicated servers here, guys. If you are on a shared server, this article is not for you (except the wp-cache / WP Super Cache part), at least not yet :)

If you are courageous, you can try using nginx instead of Apache as your HTTP server. nginx is less resource hungry and performs better than Apache, but it uses a different configuration file format, so expect to make lots of changes to get nginx behaving exactly as Apache did. Don't believe everything you read on the net about running WordPress on nginx. While most of it is partially true, none of it covers all the bases; the authors clearly haven't tested their WordPress installations thoroughly. You cannot just cut-and-paste their configuration to run the Taragana network of sites, for example.

Soon MySQL will again become the bottleneck, irrespective of how powerful a server you use for your database. There are five distinct roads in front of you:

  1. MySQL clustering - high cost, high RAM requirement. Last I checked, it required the whole database to be resident in memory.
  2. Master-slave replication - Have one MySQL server for inserts & updates, with multiple slave servers to serve the pages. This requires changing one of the core WordPress files, wp-db, to redirect queries appropriately. WordPress.com uses this approach.
  3. Master-master replication - This allows all the MySQL databases to function as equals, handling reads as well as inserts & updates. We have tested this option and it works well.
  4. Sharding - Breaking the database up across different machines. This could be very useful for WordPress MU, where you can move a user's non-shared tables to their own database. However, it will have to be combined with the other options when some users outgrow their servers. It is complicated to set up, will require lots of testing, and will most likely require changes to core WordPress code.
  5. Move different blogs to different servers and pray that none of them outgrows its server. This, like sharding, will need to be supplemented with the other methods discussed above.
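The heart of the master-slave option is a read/write split: WordPress does it in PHP by patching wp-db, but the routing decision itself fits in a few lines. A sketch, with plain strings standing in for real database connections (the class and its names are hypothetical, not actual WordPress code):

```python
import itertools

class ReplicatedRouter:
    """Route writes to the master and spread reads round-robin
    across slaves. The 'connections' are hypothetical stand-ins
    for real database handles; only the routing logic is shown."""

    WRITE_VERBS = ("insert", "update", "delete", "replace",
                   "alter", "create", "drop")

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in self.WRITE_VERBS:
            return self.master        # every write goes to the master
        return next(self._slaves)     # reads rotate over the slaves

# Usage, with labels instead of connections:
router = ReplicatedRouter("master", ["slave1", "slave2"])
print(router.route("INSERT INTO wp_posts VALUES (...)"))  # master
print(router.route("SELECT * FROM wp_posts"))             # slave1
print(router.route("SELECT * FROM wp_comments"))          # slave2
```

Replication lag is the catch: a comment written to the master may not be on a slave yet when the page regenerates, which is part of why the wp-db changes are not trivial.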

None of these will ensure that your server does not crash under very high load. You just have to hope (and test) that your real load doesn't exceed your capacity.

Simple solution for the complex WordPress problem…

I started looking for a simpler solution: an architecture that ensures the site never crashes, even at ridiculously high load, while delivering tremendously high throughput (crossing the 10K requests per second barrier) on a single dedicated server. After some research we came up with a simple solution that solves both problems.

Let’s first look at the stats.

We served a copy of the Simple Thoughts blog (~3000 posts and 10,000 comments) from a single server (dual quad-core 2 GHz processors with 4 GB RAM, and a 2 × 250 GB RAID-1 array whose mirrored reads roughly double read throughput). We tested it with a simulation of live load, created from our log files with httperf. We also used Apache Bench and our own Site Load Tester to test the setup comprehensively and to ensure there wasn't any mistake in the results.
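Building a replay workload from access logs is mostly log parsing. A sketch of extracting the successfully served GET paths from combined-format log lines to feed a tool like httperf; the regex and sample lines assume the standard combined log format:

```python
import re

# Pull the request path out of each combined-format access log line,
# keeping only GETs that returned 200, to build a replay URL list.
LOG_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def replay_urls(log_lines):
    urls = []
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and m.group("status") == "200":
            urls.append(m.group("path"))
    return urls

sample = [
    '1.2.3.4 - - [31/Jan/2009:10:00:00 +0000] "GET /about/ HTTP/1.1" 200 512 "-" "Mozilla"',
    '1.2.3.4 - - [31/Jan/2009:10:00:01 +0000] "GET /missing HTTP/1.1" 404 208 "-" "Mozilla"',
    '1.2.3.4 - - [31/Jan/2009:10:00:02 +0000] "POST /wp-comments-post.php HTTP/1.1" 302 0 "-" "Mozilla"',
]
print(replay_urls(sample))  # ['/about/']
```

A real-load simulation would keep the 404s and POSTs too, since (as discussed below in the comments) bad-request and bot traffic is exactly what breaks page caches; filtering here is just to keep the sketch small.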

We were able to serve over 10,000 requests per second at 1.6 Gbps throughput, and to handle over 20,000 concurrent connections on this server, all without failures!

Never did the CPU usage go above 10%. We have room to grow.

Note: I didn't test beyond 20K concurrent connections because it will not be required in real life; we would hit the bandwidth limit sooner :)

Java serves WordPress?

Yes, you heard that right. We are making extensive use of Java technology and the Grizzly server in our architecture to serve WordPress pages; it is an integral part of the design. What we are doing with Java cannot be accomplished with PHP. We are also using nginx and FastCGI (for PHP).

So where is the dream WordPress setup now?

We are still rigorously testing the new setup at our new dedicated server in Chicago. We hope to go live next week on Simple Thoughts and other blogs.

BTW: You can get live updates about our progress and more if you follow me on twitter.

Discussion

March 1, 2009: 8:29 am

As you probably know the system is functional on this blog and other blogs in our network.

Mike,

I like your page fragment caching idea. You can also do things like estimating the caching time for each category of pages, thereby reducing the load on the server. We do fine-tuning like this in our translator plugins' caching.

March 1, 2009: 8:20 am

Hi Mike,

Thanks for the compliments. Looking forward to learning about your plugin when it's released.

March 1, 2009: 8:13 am

Hi Donncha,

Thanks for your comment.

With a load tester like Apache Bench, which repeatedly downloads the same URL, all caching plugins give excellent performance stats.

However, in real-load simulation we realized that in spite of having a caching plugin (wp-cache 2, WP Super Cache or our own Light Cache), we were being dragged down either by incorrect page requests (from bots looking for an exploit) or by search engine bots; and it takes only a few such requests at a time to bog down WordPress. It isn't fun watching Apache & MySQL together bring idle CPU time down to 0% :) So at low load the server would run like a champ and we could see no problem, and yet whenever a large number of bots and humans happened to hit un-cached or expired pages, all hell would break loose and there would be serious slowdowns for extended periods of time. The other issue is comments. On heavily commented blogs, pages would go stale with every comment and suddenly the server would be overloaded, all benefits of page caching gone!
We reduced the load over time with several strategies mentioned earlier on my blog, like staggering page staling on comments, longer update periods, reducing max_write_lock_count to 1 to ease bottlenecks caused by updates, and many more. All of them helped to some extent, yet we kept ending up back at square one. The biggest issue was reliability: I couldn't go to sleep confident that WordPress would continue to run seamlessly without getting overloaded, irrespective of the number of bots or humans hitting it simultaneously.
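The "staggering page staling" idea can be sketched as adding random jitter to each page's cache lifetime, so that a burst of pages cached at the same moment does not expire (and regenerate) all at once. The TTL and jitter values here are illustrative knobs, not the actual settings we used:

```python
import random

def staggered_expiry(base_ttl, jitter_fraction=0.25, rng=random):
    """Return a cache TTL with random jitter, so pages cached at the
    same moment expire spread out over a window instead of together.
    base_ttl (seconds) and jitter_fraction are illustrative values."""
    jitter = base_ttl * jitter_fraction
    return base_ttl + rng.uniform(-jitter, jitter)

# Ten pages cached together now expire spread across 2700-4500 s
ttls = [staggered_expiry(3600) for _ in range(10)]
print(min(ttls) >= 2700 and max(ttls) <= 4500)  # True
```

The same trick spreads out regeneration storms after a comment invalidates many related pages (front page, archives, feeds) at once.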

The problem we solved foremost in this architecture is reliability: now I know the server will hold up, irrespective of the amount of load thrown at it.

The solution, however, as you said, is not for shared hosting. At the least you need to be on a VPS, or better yet a dedicated server; even a super cheap $30 dedicated server will do just fine.

February 28, 2009: 12:56 pm

You should check out the latest WP Super Cache. Files served by mod_rewrite rules are extremely fast, and that accounts for 99% of human traffic (bot traffic is another matter, that’s hard to cache for because bots only visit a page once so it’s not worth caching)

On pages with lots of comments the “cache rebuild” code serves a slightly out of date page to anonymous clients that request the same page within a few seconds. Instead of the page being generated many times it’s only generated once. It makes a significant difference to server load as can be seen on the graph on my announcement post:
http://ocaoimh.ie/2009/01/23/wp-super-cache-089/

Great post though, but I think it’s beyond the means and ability of most WordPress blogs on shared hosting to implement.

You should check out Batcache too, especially if you have multiple web servers where supercache really isn’t appropriate.

February 15, 2009: 5:45 am

Thanks Angsuman - lots of great detail in this post. I’ll be revisiting when I need to fine tune things!


Mike
February 13, 2009: 7:41 pm

Interesting article. It’s refreshing to read something on WordPress performance that isn’t a copy-pasta rehash of the same tired old junk.

It piqued my interest because I'm currently working on a derivative of SuperCache (weird site alert - it's an in-joke sort of thing) that supports Memcached, eAccelerator, APC & XCache key storage in addition to the traditional disk system. I think it finds a niche where WP-Cache and SuperCache start to get bogged down in constantly globbing directories with thousands of files (although, to be perfectly honest, I think it's just better).

While all the other storage engines scale better than disk (as long as you have the RAM), the Memcached-Alt backend has a generation/revision based expiry for ultra fast purging on huge sites.

The plugin has been fully functional for some time but I’m in the middle of a major overhaul…

While the SuperCache static files via mod_rewrite functionality is similar to the vanilla SuperCache now, it will soon be backed up with a much faster DB query based GC & purging system rather than the current recursive hunt and peck file mtime system.

Based on my observation that a significant percentage of page generation time can be due to plugins processing post and comment text, I’m also working on integrating page fragment caching. This can provide a huge performance boost - particularly on comment-heavy sites.

After all that’s done, I hope to concentrate on adding off-the-shelf compatibility for nginx (and possibly others) since the whole thing is a tad Apache-centric at the moment.

February 1, 2009: 9:09 pm

We use file-based intelligent caching.
PHP is served by FastCGI, which nginx delegates to.

February 1, 2009: 8:36 pm

This is kewl!

What I would be interested in knowing is, did you try caching parts of the page using memcached or something? That could have helped reduce the number of SQL queries, and scale.

Second question would be, were you serving the PHP out of nginx or grizzly. If grizzly, were you making use of the php-java-bridge or an RPC library of some sort? How were you setting up the server params?

February 1, 2009: 12:25 pm

nice blogg i like it and i do read it very often
