

Tuesday, January 26, 2016

Running bash commands in parallel


A modern server is typically multi-core, perhaps even multi-CPU. That is plenty of computing power to unleash on a given job. However, unless you run a job in parallel, you are not maximizing the use of all that power.

Below are some typical everyday operations we can speed up using parallel computing:

  1. Backup files from multiple source directories to a removable disk.
  2. Resize image files in a directory.
  3. Compress files in a directory.

To execute a job in parallel, you can use any of the following commands:

  • ppss
  • pexec
  • GNU parallel

This post focuses on the GNU parallel command.

Installation of GNU parallel

To install GNU parallel on a Debian/Ubuntu system, run the following command:

$ sudo apt-get install parallel

General Usage

The GNU parallel program provides many options which you can specify to customize its behavior. Interested readers can read its man page to learn more about their usage. In this post, I will narrow the execution of GNU parallel to the following scenario.

My objective is to run a shell command in parallel, but on the same multi-core machine. The command can take multiple options, but only 1 is variable. Specifically, you run concurrent instances of the command by providing a different value for that one variable option. The different values are fed, one per line, to GNU parallel via the standard input.

The rest of this post shows how GNU parallel can backup multiple source directories by running rsync in parallel.

Parallel backup

The following command backs up 2 directories in parallel: /home/peter and /data.

$ echo -e '/home/peter\n/data' | parallel -j-2 -k --eta rsync -R -av {} /media/myBKUP

Standard input

The echo command assembles the 2 source directory locations, separated by a newline character (\n), and pipes them to GNU parallel.
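Any command that emits one value per line can serve as the input source. As a sketch, printf is a more portable way than echo -e to assemble that list:

```shell
# printf repeats its format string once per argument,
# producing one directory path per line on standard output
printf '%s\n' /home/peter /data
```

Piping the output of find or cat into parallel works the same way when the input list lives in a file or a directory tree.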

How many jobs?

By default, GNU parallel deploys 1 job per core. You can override the default using the -j option.

-j specifies the maximum number of parallel jobs that GNU parallel can deploy. The maximum number can be specified in 1 of several ways:

  • -j followed by a number

    -j2 means that up to 2 jobs can run in parallel.

  • -j+ followed by a number

    -j+2 means that the maximum number of jobs is the number of cores plus 2.

  • -j- followed by a number

    -j-2 means that the maximum number of jobs is the number of cores minus 2.

If you don't know how many cores the machine has, run the command below:

$ parallel --number-of-cores
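coreutils' nproc reports the same count, which makes it easy to work out by hand what -j-2 resolves to (a sketch; GNU parallel itself runs at least 1 job even if the subtraction reaches zero or below):

```shell
# Number of cores on this machine
cores=$(nproc)

# What -j-2 would resolve to: cores minus 2, floored at 1
jobs=$(( cores > 2 ? cores - 2 : 1 ))
echo "$jobs"
```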

Keeping output order

Each job may output lines to the standard output. When multiple jobs are run in parallel, the default behavior is that a job's output is displayed as soon as the job finishes. You may find this confusing because the output order may be different from the input order. The -k option keeps the output sequence the same as the input sequence.
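The reordering that -k guards against is easy to reproduce with plain background jobs, no parallel required: whichever job finishes first prints first, regardless of start order.

```shell
# Started first, but sleeps, so it finishes and prints second
( sleep 0.2; echo first-input ) &

# Started second, finishes immediately, so it prints first
( echo second-input ) &

wait
```

Under parallel -k, the output for the first input line would be buffered and displayed first even though its job finished later.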

Showing progress

The --eta option reports progress while GNU parallel executes, including the estimated remaining time (in seconds).

Input place-holder

GNU parallel substitutes the {} parameter with the next line in the standard input.

Each input line is a directory location, e.g., /home/peter. Instead of the full location, you can specify other replacement strings in order to extract a portion thereof, e.g., the directory name (/home) or the basename (peter). Please refer to the man page for details.
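Those two portions correspond to the dirname and basename operations; the man page documents {//} and {/} as the replacement strings for them. A quick coreutils sketch of what each would extract:

```shell
# For the input line /home/peter:
p=/home/peter
dirname "$p"    # prints /home  (the portion {//} substitutes)
basename "$p"   # prints peter  (the portion {/} substitutes)
```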


GNU parallel is a tool that Linux administrators should add to their repertoire. Running a job in parallel can only improve one's efficiency. If you are already familiar with xargs, you will find the syntax familiar. Even if you are new to the command, there is a wealth of on-line help on the GNU parallel website.

Friday, November 13, 2015

How to optimize Apache concurrency for running WordPress

To a technically savvy person, hosting your own website on a VPS server can bring a tremendous sense of accomplishment and a wealth of learning opportunities. If WordPress is what you fancy, with a minimal monthly financial commitment, you can host a WordPress website on the LAMP platform (Linux, Apache, MySQL, PHP). For example, the entry-level, $5 per month plan offered by DigitalOcean, of which I am an affiliate, will give you a 512MB RAM, single-core VPS.

With such a small RAM capacity, you will need to optimize how your Apache webserver is configured to run PHP applications such as WordPress and Drupal. The goal is to maximize the number of concurrent web connections.

This tutorial details the Apache/PHP setup procedure on Debian 8.2, aka Jessie. The procedure assumes Apache is yet to be installed. However, if Apache2 is already installed, you will find practical information below on how to reconfigure Apache2 to run a different multi-processing module.

Background knowledge

According to a recent Netcraft webserver survey, Apache powers 46.91% of the top million busiest websites on the Internet. Busy websites mean many concurrent web connections.

Concurrent connection requests to Apache are handled by its Multi-Processing Modules. MPMs can be loosely classified as threaded or non-threaded. Older Apache releases default to an MPM named Prefork. This MPM is non-threaded: each connection request is handled by a dedicated, self-contained Apache process.

Newer Apache releases default to a threaded MPM, either Worker or Event. The Worker MPM uses one worker thread per connection. One issue with this approach is that a thread remains tied up as long as its connection is kept alive, even when the connection is inactive.

The Event MPM, a variant of Worker, addresses the aforesaid keep-alive issue. A main thread is used as the traffic controller that listens for requests and passes requests to worker threads on demand. In this scenario, an inactive but kept-alive connection does not tie up a worker thread.

Note that MPMs are mutually exclusive: only 1 MPM can be active at any given time.

Traditionally, the Apache core functionality serves static web content (e.g., HTML text and images). To serve dynamic content, such as PHP pages from WordPress, Apache requires special modules that execute PHP code.

For the Prefork MPM, each spawned Apache process embeds its own copy of the PHP handler (mod_php). Concurrency in this model is limited by the number of processes that Apache can spawn given the available memory.

For both Worker and Event MPMs, PHP requests are passed to an external FastCGI process, PHP5-FPM. PHP-FPM stands for PHP-FastCGI Process Manager. Essentially, the webserver and the PHP handler are split into separate processes. Apache communicates with PHP-FPM through an Apache module, either mod_fastcgi or mod_fcgid. Optimizing concurrency in this model means configuring both the MPM and the PHP handler (PHP-FPM) to have pools of processes and threads to handle requests.

The rest of this tutorial covers the cases of installing the Event MPM from scratch as well as migrating to Event from the Prefork MPM.

Installing Apache2

This tutorial starts with the installation of Apache2. If Apache is already installed, you should find out which MPM is currently running using the command apache2ctl -V, and proceed to the next section.

$ sudo apt-get update && sudo apt-get upgrade
$ sudo apt-get install apache2

Next, note the Apache version you just installed and the MPM that is loaded.

$ sudo apache2ctl -V
Server version: Apache/2.4.10 (Debian)
Architecture: 64-bit
Server MPM: event
threaded: yes (fixed thread count)
forked: yes (variable process count)

The above output tells us that we are running Apache release 2.4. Beginning with 2.4, Apache runs the Event MPM by default. If you are running an older version of Apache, the default MPM is either Prefork or Worker.

Configuring Apache2

  1. Back up the Apache configuration file, /etc/apache2/apache2.conf.

    $ sudo cp /etc/apache2/apache2.conf{,.orig}

  2. Edit the configuration file.

    Below is a subset of configuration parameters belonging to the Apache core module. You should adjust their values in order to optimize concurrency. The corresponding values are what I use for an entry-level VPS.

    Timeout 100
    KeepAlive On
    MaxKeepAliveRequests 1000
    KeepAliveTimeout 5

    For an in-depth explanation of the above parameters, please refer to Apache on-line documentation.

  3. Enable mod_rewrite.

    While the mod_rewrite module is not strictly relevant to optimizing concurrency, I've included it here as a reminder to install the module. It is an important module for running WordPress.

    $ sudo a2enmod rewrite

Installing Event MPM

If you are already running the Event MPM, skip to the next section, 'Configuring Event MPM'. Otherwise, follow the procedure below.

  1. Install the Event MPM.

    $ sudo apt-get install apache2-mpm-event

  2. Disable existing MPM.

    Recall that only 1 of Prefork, Worker or Event MPM can be running at any given time. Therefore, if you were previously running Prefork or Worker, you must first disable it, and then enable Event.

    To disable the Prefork MPM, run this command:

    $ sudo a2dismod mpm_prefork

    To disable the Worker MPM, run this:

    $ sudo a2dismod mpm_worker

  3. Enable the Event MPM.

    $ sudo a2enmod mpm_event

    Note that the above enable and disable commands are quite 'forgiving'. If you attempt to enable an MPM that is already enabled, or disable an MPM that is already disabled, it will simply return a harmless informational message.

Configuring Event MPM

To configure the Event MPM, modify its configuration file, /etc/apache2/mods-available/mpm_event.conf. Before making any changes, back it up using the following command:

$ sudo cp /etc/apache2/mods-available/mpm_event.conf{,.orig}

Edit the file to specify the following parameters:

<IfModule mpm_event_module>
  StartServers 2
  MinSpareThreads 25
  MaxSpareThreads 75
  ThreadLimit 25
  ThreadsPerChild 25
  MaxRequestWorkers 250
  MaxConnectionsPerChild 10000
  ServerLimit 12
</IfModule>

The above configuration is what I recommend for an entry-level VPS (512MB RAM, single-core). You need to adjust the parameters to satisfy your own system requirements. For a detailed explanation of the above parameters, click here. Note that the Event MPM shares the same parameters as the Worker MPM.
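One sanity check worth running on any Worker/Event configuration: Apache caps the thread pool at ServerLimit × ThreadsPerChild, so that product must be at least MaxRequestWorkers. The values above pass:

```shell
# ServerLimit (12) * ThreadsPerChild (25) must cover MaxRequestWorkers (250)
echo $(( 12 * 25 ))   # prints 300, which covers the 250 ceiling
```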

Installing PHP5 handler

To execute PHP code, Apache requires a PHP handler. PHP5-FPM is the PHP handler to use with the Event MPM.

For a new PHP installation, install php5-fpm followed by the meta-package php5.

$ sudo apt-get install php5-fpm php5

In addition to the above packages, I also installed other PHP5 packages which WordPress requires. While they are not strictly relevant to optimizing concurrency, I've included them here for completeness.

$ sudo apt-get install php5-mysql php5-gd php5-curl

Configuring virtual host

Suppose your WordPress website has the domain name (used in this tutorial as a placeholder for your own domain). To set up a virtual host with that domain name, follow the steps below:

  1. Create the Apache configuration file for

    Instead of creating the file from scratch, use the default site as a template.

    $ sudo cp /etc/apache2/sites-available/000-default.conf /etc/apache2/sites-available/

  2. Edit the configuration file.

    Customize the following site-specific parameters:

    DocumentRoot /var/www/
  3. Create DocumentRoot directory.

    $ sudo mkdir /var/www/
    $ sudo chown -R <webuser>:<webuser> /var/www/


    • WordPress should be installed in the directory /var/www/ For instructions on how to install WordPress, refer to my earlier post.

    • The DocumentRoot directory should be owned by a non-root user.

  4. Enable the new site.

    $ sudo a2ensite

  5. Disable the default site.

    $ sudo a2dissite 000-default.conf

Configuring PHP handler

Follow the procedure below to configure PHP5-FPM.

  1. Create a custom PHP configuration file for by copying the template from the default site.

    $ sudo cp /etc/php5/fpm/pool.d/www.conf /etc/php5/fpm/pool.d/

  2. Edit the configuration file.

    Customize the following parameters.

    user = <webuser>
    group = <webuser>
    listen = /var/run/
    pm = dynamic
    pm.max_children = 5
    pm.start_servers = 2
    pm.min_spare_servers = 1
    pm.max_spare_servers = 3
    pm.max_requests = 2000


    • The user and group parameters specify respectively the Unix user and group names under which the FPM processes will run. You should specify a non-root user for both.

    • The listen parameter specifies the source address that the FPM will listen to for receiving PHP requests. In this case, it will listen to the Unix socket /var/run/

    • The rest of the parameters are for an entry-level VPS system. You should adjust their values to satisfy your system requirements.

    • Click here for more details about the above parameters.

  3. Restart PHP5-FPM.

    $ sudo systemctl restart php5-fpm

Installing FastCGI

Apache requires a FastCGI module to interface with the external PHP5-FPM processes. You can use 1 of 2 FastCGI modules: mod_fastcgi and mod_fcgid. Click here for a discussion of their differences. This tutorial uses mod_fastcgi.

Before you install mod_fastcgi, you must:

  1. Enable non-free.

    Debian pre-packages the mod_fastcgi module in the non-free archive area of its repositories. Make sure that non-free is included in the /etc/apt/sources.list file.

  2. Disable mod_php.

    If Apache2 was previously installed with the Prefork MPM, it is most likely configured to execute PHP using the mod_php module. In this case, you must disable mod_php before you install mod_fastcgi. Otherwise, the install will fail with the error message, 'Apache is running a threaded MPM, but your PHP Module is not compiled to be threadsafe. You need to recompile PHP.'

    To disable mod_php, run this command:

    $ sudo a2dismod php5

To install mod_fastcgi, execute the following command:

$ sudo apt-get install libapache2-mod-fastcgi

Configuring FastCGI

  1. Back up configuration file.

    Before you edit the configuration file /etc/apache2/mods-available/fastcgi.conf, back it up using the following command.

    $ sudo cp /etc/apache2/mods-available/fastcgi.conf{,.orig}

  2. Edit the file.

    Insert the following lines:

    <IfModule mod_fastcgi.c>
      AddHandler php5-fcgi .php
      Action php5-fcgi /php5-fcgi
      Alias /php5-fcgi /usr/lib/cgi-bin/
      FastCgiExternalServer /usr/lib/cgi-bin/ -socket /var/run/ -pass-header Authorization
      <Directory /usr/lib/cgi-bin>
          Require all granted
      </Directory>
    </IfModule>

    • should be replaced with your own domain name.

    • To access the website, you need to grant the proper permission explicitly using the Require all granted statement. Without it, access to the website will be denied with the error message 'You don't have permission to access /php5-fcgi/index.php on this server.'

  3. Enable additional modules.

    $ sudo a2enmod actions fastcgi alias

  4. Restart Apache.

    The final step is to restart Apache to make all the above changes go live.

    $ sudo systemctl restart apache2

Threads in action

Concurrency for WordPress occurs at both the webserver (Apache2) and the PHP handler (PHP-FPM) levels. You can use the ps -efL command to monitor the processes and threads at either level.

To monitor Apache processes and threads, execute the following ps command.

$ ps -efL |grep apach[e]
www-data 31681 24441 31681 0 27 03:25 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 31681 24441 31684 0 27 03:25 ? 00:00:00 /usr/sbin/apache2 -k start

The second and the fourth columns are the process ID (PID) and the thread ID respectively. Note that the above output reports 2 different threads (31681 and 31684) of the same process (31681).
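To pull out just those two columns, pipe the ps output through awk. A sketch, applied here to a sample line copied from the output above so the extraction can be seen without a running Apache (on a live system you would use `ps -efL | awk '/apach[e]/ { print $2, $4 }'`):

```shell
# Columns 2 and 4 of `ps -efL` output are the PID and the thread ID
echo 'www-data 31681 24441 31684 0 27 03:25 ? 00:00:00 /usr/sbin/apache2 -k start' \
  | awk '{ print $2, $4 }'   # prints 31681 31684
```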

Execute the following command to monitor PHP.

$ ps -efL |grep ph[p]
root 24398 1 24398 0 1 Nov10 ? 00:00:17 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)
peter 31519 24398 31519 0 1 03:14 ? 00:00:17 php-fpm: pool
peter 31520 24398 31520 0 1 03:14 ? 00:00:16 php-fpm: pool
peter 31827 24398 31827 0 1 04:15 ? 00:00:15 php-fpm: pool


When traffic to your website increases over time, your webserver must scale up to handle the increase in traffic. This tutorial explains how to configure Apache2 and PHP to optimize the number of concurrent connections. After you try it out, if you still find that your website cannot keep up with the traffic, you should consider upgrading your VPS plan to have more RAM.

If you are interested in WordPress, please refer to my earlier posts.

Friday, October 30, 2015

WPScan: Stop feeling vulnerable about WordPress


WordPress is the most popular content management system (CMS) on the planet today. You can customize the look and feel of a WordPress website using third-party themes. If you want a functionality not offered by the WordPress core, you will most likely find a third-party plugin that satisfies your requirement. With the plethora of themes and plugins comes a major challenge in assuring their quality. Intruders can potentially exploit the vulnerabilities in poorly designed themes and plugins to gain unauthorized access to a WordPress website.

WPScan is a WordPress vulnerability scanner that is free for non-commercial use. It scans your WordPress website and reveals any known vulnerabilities in the installed plugins and themes.

The rest of this post explains how to install and run WPScan.


WPScan comes pre-installed on only a handful of lesser-known Linux distributions. If you run Debian, Ubuntu, CentOS or Fedora, you must download the WPScan source and build it yourself. Because WPScan is written in Ruby, to build WPScan, you need to install the Ruby development environment.

Your first decision is to select a machine on which to build WPScan. This is the machine you use to launch WPScan later. Note that you can (and should) run WPScan on a different machine than the WordPress host. The examples in this post are based on a Debian 8.2 machine, aka Jessie.

Your next decision is how you will install the Ruby development environment, including the supporting development libraries. The WPScan website outlines 2 ways to install the necessary development environment on a Debian server: the Debian package management system and the Ruby Version Manager (RVM).

RVM is the WPScan-recommended method. It allows you to install multiple, self-contained Ruby environments on the same system. RVM puts a dedicated Ruby environment under your home directory (e.g., /home/peter/.rvm). You can find the RVM procedure on the WPScan home page. I've followed the steps, and it works as advertised.

I opted instead for the Debian package manager method because it is a shorter procedure and I did not need the versatility (and the complexity) that RVM offers.

Below are the steps to install WPScan using the Debian package manager. The procedure is largely based on what is on the WPScan home page. I've added a couple of missing packages that are actually required.

  1. Update your Debian repository.

    $ sudo apt-get update

  2. Install the supporting development packages.

    $ sudo apt-get install git make gcc ruby ruby-dev libcurl4-openssl-dev zlib1g-dev

  3. Download the WPScan source.

    $ cd; git clone

  4. Install WPScan.

    $ cd wpscan
    $ sudo gem install bundler
    $ bundle install --without test --path vendor/bundle


The WPScan development team maintains a WordPress vulnerability database. Before you scan your WordPress website, you should first update the local copy of the vulnerability database as follows:

$ cd $HOME/wpscan
$ ruby wpscan.rb --update

To scan a WordPress website for vulnerabilities, use the following general WPScan command sequence.

$ cd $HOME/wpscan
$ ruby wpscan.rb --url <target> --enumerate <options>


  • The --url parameter specifies which WordPress website to scan.

  • The --enumerate parameter specifies which WordPress components to scan for vulnerabilities. I use the following options:
    * vp: Scan only vulnerable plugins.
    * vt: Scan only vulnerable themes.
    * u: Scan users.
    * tt: Scan timthumbs.
    You can combine multiple options, separated by commas.
    For example: --enumerate vp,vt,u

  • For further on-line help, run this command:

    $ ruby wpscan.rb --help

Results Analysis

I invoked WPScan to scan the WordPress website deployed on my desktop (localhost). The report is displayed below. Note that the output has been abbreviated in order to improve its overall readability.

WPScan managed to find:

  1. 3 vulnerabilities in the WordPress core.

    Those vulnerabilities were fixed in the most recently released version of WordPress.

  2. 1 vulnerability in the WP Super Cache plugin.

    Again, the vulnerability was fixed in the most recently released version of the plugin.

  3. No known vulnerability in the installed themes.

$ ruby wpscan.rb --url localhost --enumerate vp,vt  
[+] WordPress version 4.3 identified from meta generator    
[!] 3 vulnerabilities identified from the version number    

[!] Title: WordPress <= 4.3-Authenticated Shortcode Tags Cross-Site Scripting (XSS)    
[i] Fixed in: 4.3.1    

[!] Title: WordPress <= 4.3-User List Table Cross-Site Scripting (XSS)    
[i] Fixed in: 4.3.1    

[!] Title: WordPress <= 4.3-Publish Post and Mark as Sticky Permission Issue    
[i] Fixed in: 4.3.1      
[+] Enumerating installed plugins (only ones with known vulnerabilities)...   
[+] We found 1 plugins:  

[+] Name: wp-super-cache - v1.4.4  
[!] The version is out of date, the latest version is 1.4.6  

[!] Title: WP Super Cache <= 1.4.4 - Cross-Site Scripting (XSS)  
[i] Fixed in: 1.4.5  

[+] Enumerating installed themes (only ones with known vulnerabilities)...  
[+] No themes found  


WPScan is an important tool in your defense against possible attacks on your WordPress websites. It is recommended that you schedule WPScan to run regularly to detect known WordPress vulnerabilities. Yet, running WPScan is only half of your job. You remain vulnerable until you patch the vulnerabilities.

In general, the WordPress community fixes most known vulnerabilities and distributes the fixes quickly after the vulnerabilities are first reported. It is important that you keep your WordPress core and the third-party themes and plugins up-to-date. If your WordPress platform is up-to-date, WPScan will most likely return a clean report, and you can stop feeling vulnerable about your WordPress website.

Below are my other posts on WordPress:

Wednesday, July 29, 2015

MySQLTuner series part 3 - minimizing MySQL memory footprint

This is part 3 of the series on using MySQLTuner to optimize MySQL database performance and stability. Part 1 explains how to install and run MySQLTuner. Part 2 addresses the area of database defragmentation. This post illustrates how to manage MySQL memory footprint.

MySQLTuner output

MySQLTuner was used to analyze a WordPress database deployed on the LAMP platform (Linux, Apache, MySQL, PHP). The host was a VPS server with only 512 MB of memory.

$ perl

If you scroll down to the Recommendations section of the above report, it is hard to miss the eye-catching message: 'MySQL's maximum memory usage is dangerously high. Add RAM before increasing MySQL buffer variables.'

Indeed, adding more RAM is often the cheapest and simplest solution to out-of-memory problems. By spending an extra $5 per month, I can upgrade my VPS server to have 1 GB of RAM. But, before you go spend your hard-earned money on RAM, let's explore some other ways to reduce MySQL's memory footprint.

Maximum number of database connections

Lines that begin with two exclamation marks ('!!') are warnings. Note the following lines in the above Performance Metrics section:

-------- Performance Metrics --------------------------------------
[--] Up for: 36d 23h 28m 39s (758K q [0.237 qps], 53K conn, TX: 6B, RX: 98M)
[--] Total buffers: 192.0M global + 2.7M per thread (151 max threads)
[!!] Maximum possible memory usage: 597.8M (120% of installed RAM)
[OK] Highest usage of available connections: 8% (13/151)

According to the above warning, MySQL could potentially use up to 597.8 MB of RAM. Where did the number come from?

The number was derived from the preceding line. MySQL required 192MB globally and 2.7 MB per connection to the database. By default, the maximum number of connections was 150+1. (The 151st connection would be restricted to database administration only.) Hence, the maximum memory usage was 192 + 150 * 2.7, equaling 597.
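The arithmetic can be checked directly; the small gap from the reported 597.8 MB comes from rounding in the 2.7 MB per-thread figure that MySQLTuner printed:

```shell
# global buffers + (150 regular connections * per-connection buffers), in MB
awk 'BEGIN { printf "%.0f\n", 192 + 150 * 2.7 }'   # prints 597
```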

Should you allow for 150 connections? Keep in mind that each connection, even in the idle state, will take up some memory. MySQLTuner can help you answer the question with confidence.

MySQLTuner reports the highest number of concurrent connections since the last MySQL restart (13 in the above example). The database should be up for a minimum of 24 hours before you run MySQLTuner. In fact, the longer the time elapsed since the last restart, the more trustworthy the statistic.

You can find out from the MySQLTuner report how long MySQL has been up. Go back to the first line under the Performance Metrics section. In the above example, MySQL had been up for 36 days since the last restart.

Although MySQL was configured to accept 150 connections, the highest number of concurrent connections made in the past 36 days was only 13 (8% of the maximum). In light of that, we could lower the maximum number of connections allowed, thereby reducing the total memory footprint of MySQL.

Before we go ahead to reconfigure MySQL, we will consider the wait-timeout threshold which affects how long idle connections stay alive before timing out.

Wait timeout

One of the General recommendations in the above example was:

'Your applications are not closing MySQL connections properly.'

In other words, database connections were opened but not properly closed after queries or updates were completed. These idle connections would hang around until a predefined timeout threshold was reached. The default timeout threshold was 8 hours. So, if a query completed in 2 seconds but the connection was not closed properly, the connection would live for another 28,798 seconds before timing out. In the meantime, the idle connections continued to consume resources, including counting toward the maximum number of open connections.
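The 28,798-second figure falls straight out of the default: wait_timeout defaults to 28,800 seconds (8 hours), and the query in the example used 2 of them.

```shell
# Default wait_timeout (8 hours, in seconds) minus the 2-second query
echo $(( 8 * 3600 - 2 ))   # prints 28798
```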

The culprit was easily identified in the above case: the database was used exclusively for WordPress, an application written in PHP. However, solving the problem can be out of your reach, unless you are a PHP developer.

The good news is that you can reduce the timeout interval by adjusting a MySQL configuration parameter. By making idle connections time out faster, there will be fewer concurrent connections. For WordPress/PHP applications, I set the wait timeout to 60 seconds.

It is also worth mentioning that because quicker timeouts leave fewer idle connections, you can further reduce the maximum number of connections.

Re-configuring MySQL

To change the maximum number of connections or the wait timeout threshold, edit the MySQL configuration file as follows.

$ sudo vi /etc/mysql/my.cnf

The configuration variables of interest are max_connections, and wait_timeout. Enter a value for each variable using the following syntax:

max_connections = 50
wait_timeout = 60

For the above configuration changes to take effect, a restart of the MySQL daemon is needed.

For non-systemd systems, run the following command:

$ sudo service mysql restart

For systemd-enabled systems, run:

$ sudo systemctl restart mysql

Alternatively, you can dynamically change the configuration variables, thereby avoiding the database restart. To do that, issue the following commands.

$ mysql -u root -p -e "SET GLOBAL max_connections=50;"
$ mysql -u root -p -e "SET GLOBAL wait_timeout=60;"

Note that modifying the MySQL configuration file is still required if you want the changes to persist after future system restarts.

What's next?

MySQLTuner is not something you run once and forget about. Your web traffic pattern changes over time. You should schedule it to run regularly and examine the output. Please refer back to Part 1 of this series for instructions on how to schedule a run.

The more knowledgeable you are about database optimization, the more effective you will be at using the information provided by MySQLTuner. I recommend the following videos if you want to learn more about MySQL optimization:

You may also be interested in the following MySQL-related articles from this blog:

For Apache optimization, please refer to this blog post.

Tuesday, July 7, 2015

MySQLTuner series part 2 - defragmenting a MySQL database

Part 1 of this series spells out how to install and run MySQLTuner, a script which recommends MySQL configuration changes. The goal is to optimize database performance and stability. This post describes how to interpret and use MySQLTuner output, specifically in the area of database defragmentation.

Proceed with caution

A word of caution is warranted before I plunge into the details of implementing MySQLTuner's suggestions. MySQLTuner does not excuse you from learning the basic database optimization principles and following industry best practices. Following a MySQLTuner recommendation without researching and understanding its ramifications is a gamble that may end up worsening your database performance and reliability.

Optimizing MySQL configuration is not a trivial matter, and must be done in a controlled manner. You should change only one MySQL configuration variable at a time. After every change, monitor the system to verify that the expected outcome is achieved without any negative side effect.

General comments

MySQLTuner is a Perl script which you can invoke like this:

$ perl

The following is the MySQLTuner output for a low-memory VPS server running on the LAMP platform (Linux, Apache, MySQL, PHP). The VPS is dedicated for running a WordPress blog.

One is often tempted to bypass the first several sections of the report on database metrics, and head straight to the Recommendations section. But, the metrics provide the crucial context for the recommendations that follow, and should be read carefully.

Storage engine statistics

The Storage engine statistics section of the report summarizes the total number and size of InnoDB and MyISAM tables in your database.

In the above example, 18 InnoDB and 4 MyISAM tables were detected. But the report does not identify the tables. If you want to list all InnoDB tables, execute the command below.

$ echo 'select concat(table_schema,".",table_name) from information_schema.tables where engine="InnoDB";'|mysql -u root -p

To list all MyISAM tables, replace InnoDB with MyISAM in the above command.

The key actionable statistic in this section is the total number of fragmented tables (20 in the example). Fragmentation occurs during normal database operations when records are inserted and deleted, leaving behind 'gaps' in the database.

MySQLTuner does not report the size of the 'gaps' or unused space in the fragmented tables. You can find out by running the following MySQL statement.

mysql> select ENGINE, TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE,  \
          DATA_FREE / (INDEX_LENGTH + DATA_LENGTH) as frag_ratio  \
       from information_schema.tables                             \
       where DATA_FREE > 0 order by frag_ratio desc;
| InnoDB | wp_options | 1179648 | 16384 | 11534336 | 9.6438 |

The DATA_LENGTH and INDEX_LENGTH columns contain respectively the size of the data and the index for a table. DATA_FREE is the size of the unused space in a table. The fragmentation ratio is the ratio of unused space to the sum of the used data and index space.
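Plugging the wp_options row from the output above into that formula reproduces its frag_ratio (values in bytes; a quick awk check):

```shell
# DATA_FREE / (INDEX_LENGTH + DATA_LENGTH) for wp_options
awk 'BEGIN { printf "%.4f\n", 11534336 / (16384 + 1179648) }'   # prints 9.6438
```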

If your tables are large, you can round the length columns to megabytes (MB) by using the following SQL statement:

mysql> select ENGINE, TABLE_NAME,                          \
          round(DATA_LENGTH  /1024 / 1024) as data_length, \
          round(INDEX_LENGTH /1024 /1024) as index_length, \
          round(DATA_FREE / 1024 /1024) as data_free,      \
          data_free / (index_length + data_length) as frag_ratio \
       from information_schema.tables                      \
       where DATA_FREE > 0 order by frag_ratio desc;  

Database Defragmentation

If you scroll down to the Recommendations section of the report, you will see that the first general recommendation is 'Run OPTIMIZE TABLE to defragment tables for better performance'. You may execute the OPTIMIZE TABLE SQL statement for each of the 22 tables. Alternatively, you can run the mysqlcheck command as follows:

$ mysqlcheck -Aos --auto-repair -u root -p


  1. Optimizing a table will lock it up. In other words, no update to the table is allowed while the operation is being performed. For a large production table, the substantial downtime is something that the database administrator should consider before deciding to optimize a table.

  2. Optimizing a table does not necessarily reclaim its free space. This is especially true for InnoDB tables. Prior to MySQL version 5.6, all InnoDB tables were by default stored in a single file. This behavior is controlled by the MySQL configuration variable innodb_file_per_table. Optimizing InnoDB tables stored together in a single file may inadvertently produce the undesirable effect of increasing the file size.

  3. InnoDB tables fragment differently than the legacy MyISAM tables. mysqlcheck optimizes an InnoDB table by recreating it. For each InnoDB table that it optimizes, mysqlcheck generates the following informational message: 'Note : Table does not support optimize, doing recreate + analyze instead'. You can safely ignore those messages.

  4. The mysqld server process must be running for mysqlcheck to execute.

  5. -A (--all-databases)

    With -A specified, all tables of all databases are optimized.

    If you want to defragment only a specific table of a specific database, customize the following command.

    $ mysqlcheck -os <database> <table> -u root -p

  6. -o (--optimize)

    This option specifies that the optimize operation is to be performed.

  7. -s (--silent)

    -s enables silent mode: only error messages are displayed.

  8. --auto-repair

    If mysqlcheck finds a target table that is corrupted, it will try to repair it.

What's next?

Part 3 of this series continues the discussion on MySQLTuner output, specifically about the management of database memory footprint.