Cheapest Performance Improvement Tip for Heavily Used File Systems (High IO Wait) / Heavily Loaded MySQL Database / Overworked WordPress

By angsuman, Gaea News Network
Sunday, May 30, 2010

I am sure you have heard a lot of theoretical discussions on which is the best file system to use, which IO scheduler to pick and which storage device to buy - ATA (obsolete), SATA, SAS, SSD etc. Today I will talk about my personal experience.

My production server was seeing average IO wait times in excess of 50%. The primary cause was MySQL hammering a 2-disk RAID-1 of 10K Western Digital VelociRaptor drives. %util on the RAID-1 array was almost always at 100%, which is really, really bad, on top of the unbearable IO wait on the machine.

The load average was over 20. All this while, the CPU idle time remained around 70% on average; CPU idle time isn't a good indicator of IO stress. We were at a point where MySQL wasn't performing anywhere close to what it should, and work was piling up.
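
If you want to see that split for yourself, vmstat shows it directly (a quick sketch; use whatever reporting interval suits you):

vmstat 5

In the CPU columns on the right, 'id' is idle time and 'wa' is time spent waiting for IO; a high 'wa' next to a high 'id' is exactly the situation described above.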

I looked at other possible causes, like whether I was getting optimum IOPS or throughput from the drives and whether my software RAID-1 was operating properly. The hard disks were fine and so was the software RAID-1: it gave me almost double the read speed while not compromising on write speed.
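
For reference, this is roughly how you can check both yourself (/dev/md0 and /dev/sda below are just example names; substitute your own array and drives):

cat /proc/mdstat              # shows whether the md array is clean or degraded
mdadm --detail /dev/md0       # per-member status of the software RAID-1
smartctl -a /dev/sda          # SMART health data for an individual drive
hdparm -t /dev/md0            # quick buffered read-speed test of the array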

Tip: Software RAID-1 versus Hardware RAID-1

Hardware RAID-1 introduces another point of failure, and you will find lots of horror stories about hardware RAID controllers on the web, such as:

  • A single drive failure causing the controller to fail the other drive as well
  • The controller failing and data being unrecoverable
  • The controller failing and a replacement being unavailable, etc.

Under high MySQL loads I would trust software RAID more than a hardware controller any day. CPU utilization by software RAID shouldn't be an issue with modern CPUs, especially not with the 8-core Xeons that I am using.

Tip: How do you find out about your disk health, IO wait times and %utilization?

Use iostat. The command I commonly use is:

iostat -dxmn 5 sda sdb sdc …

Replace sda, sdb etc. with your own hard disk names.
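
If you are on software RAID you can include the md device alongside the member disks (assuming your array is md0):

iostat -dxm 5 sda sdb md0

In the extended (-x) output, the two columns worth watching are await (average milliseconds each request spends queued and serviced) and %util (how close the device is to saturation).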

The SSD route …

Going for an SSD was the only cheap option left, the other being moving MySQL to a different server. I decided to try the SSD first.

Which SSD to use: SLC or MLC

First of all, an SSD costs a lot when leased from a dedicated web hosting provider. My provider Gigenet (pronounced gi-gi-net) charges $115 per month for an 80 GB Intel X25-M (MLC). They mentioned having a couple of 32 GB X25-E drives as well, but I like more space and…

The Intel SLC drive has a larger MTBF*, but the X25-M (MLC) is no slouch either, with an MTBF of 1.2 million hours, which was more than I needed. I also figured that with more spare disk space, wear on the disk will be lower, especially with the newer generation of wear-levelling algorithms. So I chose the Intel X25-M for my MySQL database.

My previous setup provided RAID-1, which gives you peace of mind in case of disk failure; my single SSD doesn't. My solution is to take regular automatic backups of the database with a MySQL master-slave setup, taking the backups from the slave.
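
The backup itself can be as simple as a nightly mysqldump run against the slave (a sketch; the backup path is a placeholder and the options assume InnoDB tables):

# run on the slave so the dump never blocks the master
mysqldump --single-transaction --quick --all-databases > /backup/mysql-$(date +%F).sql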

Result of switching from two 10K WD VelociRaptors in RAID-1 to a single Intel X25-M SSD

Average load has come down to 0.96 (I have a dual-processor, quad-core Xeon machine).

Average CPU idle time is just below 90%.

Average %iowait is at 6.24 (note: I have another SATA RAID-1 setup for OS and webserver).

%util for SSD: 6.74

In short, my server is chugging along happily and everything is back to normal.

SSD specific optimization on Linux

The only optimization I have done so far is to add noatime in /etc/fstab for the SSD filesystem (I do it for all filesystems as a rule).
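
The relevant /etc/fstab line ends up looking something like this (the device name, mount point and filesystem type here are placeholders, not my actual ones):

/dev/sdc1   /var/lib/mysql   ext3   defaults,noatime   1 2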

I also looked at Linux IO schedulers, like using the noop or deadline scheduler instead of the default, but I haven't tested and decided on that yet.
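
For the record, switching the scheduler on a single device is just the following (sdc standing in for whatever your SSD is called; the change lasts until the next reboot):

cat /sys/block/sdc/queue/scheduler            # lists available schedulers, current one in brackets
echo noop > /sys/block/sdc/queue/scheduler    # switch that device to noop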

Finally: Cheapest way to improve MySQL performance

Yes, I know SSDs are costly. Having said that, it is often the cheapest option you have to speed up your MySQL database without tearing out your remaining hair on fruitless little optimizations. Cost is not just the money you spend but also the time you spend on silly optimizations and on watching your system for further improvements.

Note: After adding the SSD I did notice some select queries regularly waiting on updates (before the SSD everything was dog slow, so it was hard to optimize at that point), so I set low_priority_updates to 1 in the MySQL configuration (my.cnf), which took care of the problem. I also added a couple of indexes.
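
For reference, the my.cnf change is just the following under the [mysqld] section; the ALTER TABLE line is only an illustration with made-up names, since the right indexes depend entirely on your own slow queries:

[mysqld]
low_priority_updates = 1

ALTER TABLE some_table ADD INDEX idx_some_column (some_column);   -- hypothetical table/column names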

*MTBF: Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a system during operation.
