HOW TO: Optimizing PHP
PHP is a very fast programming language, but there is more to optimizing PHP than just speed of code execution.
In this chapter, we explain why optimizing PHP involves many factors which are not code related, and why tuning PHP requires an understanding of how PHP performs in relation to all the other subsystems on your server, and then identifying bottlenecks caused by these subsystems and fixing them. We also cover how to tune and optimize your PHP scripts so they run even faster.
Achieving High Performance
When we talk about good performance, we are not only talking about how fast your PHP scripts will run. Performance is a set of tradeoffs between speed, accuracy and scalability. An example of speed versus accuracy: your scripts might be tuned to run fast with caching, but the data will tend to grow stale and be less accurate. As an example of speed versus scalability, you could write a script that runs fast by loading everything into memory, or write a more scalable one that only loads data in chunks so that it does not exhaust application memory (updated 30 Oct 2009 from speed vs scalability to speed vs accuracy vs scalability).
In the example below, A.php is a sprinter that can run fast, and B.php is a marathon runner that can jog forever at nearly the same speed. For light loads, A.php is substantially faster, but as the web traffic increases, the performance of B.php only drops a little while A.php just runs out of steam.
Let us take a more realistic example to clarify matters further. Suppose we need to write a PHP script that reads a 250K file and generates an HTML summary of the file. We write 2 scripts that do the same thing: hare.php, which reads the whole file into memory at once and processes it in one pass, and tortoise.php, which reads the file one line at a time, never keeping more than the longest line in memory. Tortoise.php will be slower as multiple reads are issued, requiring more system calls.
Hare.php requires 0.04 seconds of CPU and 10 MB of RAM, and tortoise.php requires 0.06 seconds of CPU and 5 MB of RAM. The server has 100 MB of free physical RAM and its CPU is 99% idle. To simplify things, assume no memory fragmentation occurs.
With 10 concurrent scripts running, hare.php will use up all the free memory (10 x 10 MB = 100 MB). At that point, tortoise.php will still have 50 MB of free memory. The 11th concurrent script will bring hare.php to its knees as it starts using virtual memory, slowing it down to maybe half its original speed; each invocation of hare.php now takes 0.08 seconds of CPU time. Meanwhile, tortoise.php will still be running at its normal 0.06 seconds of CPU time.
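The two approaches can be sketched as follows. This is only an illustration of the memory tradeoff, not the original hare.php and tortoise.php; the function names and the trivial "summary" (a line and byte count) are assumptions made for the example.

```php
<?php
// Hare: read the whole file into memory at once, then process it in one pass.
// Peak memory grows with the size of the file.
function summarize_hare(string $path): array
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $lines = 0;
    if ($contents !== '') {
        // Count newline-terminated lines, plus a final unterminated line if any.
        $lines = substr_count($contents, "\n") + (substr($contents, -1) === "\n" ? 0 : 1);
    }
    return ['lines' => $lines, 'bytes' => strlen($contents)];
}

// Tortoise: read one line at a time, so memory use is bounded by the
// longest line, at the cost of more read calls.
function summarize_tortoise(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = 0;
    $bytes = 0;
    while (($line = fgets($handle)) !== false) {
        $lines++;
        $bytes += strlen($line);
    }
    fclose($handle);
    return ['lines' => $lines, 'bytes' => $bytes];
}
```

Both functions return the same result for the same file; only their memory and system call profiles differ.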
In the table below, the faster PHP script for each load is marked with an asterisk:

CPU seconds required to satisfy the given number of HTTP requests

Connections     1 request    10 requests    11 requests
hare.php        0.04 *       0.40 *         0.88 (runs out of RAM)
tortoise.php    0.06         0.60           0.66 *
As the above example shows, obtaining good performance is not merely writing fast PHP scripts. High performance PHP requires a good understanding of the underlying hardware, the operating system and supporting software such as the web server and database.
Bottlenecks
The hare and tortoise example has shown us that bottlenecks cause slowdowns. With infinite RAM, hare.php will always be faster than tortoise.php. Unfortunately, the above model is a bit simplistic and there are many other bottlenecks to performance apart from RAM:
(a) Networking
Your network is probably the biggest bottleneck. Let us say you have a 10 Mbit link to the Internet, over which you can pump roughly 1 megabyte of data per second. If each web page is 30k, a mere 33 web pages per second will saturate the line.
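The arithmetic is simple enough to put in a helper so you can try your own link speed and page size. This is a rough sketch: the function name is made up for the example, and it ignores protocol overhead, images and compression.

```php
<?php
// Rough upper bound on pages served per second before the link saturates.
// $linkBytesPerSecond: usable link throughput in bytes per second.
// $pageBytes: average size of one page in bytes.
function pages_per_second(float $linkBytesPerSecond, int $pageBytes): int
{
    if ($pageBytes <= 0) {
        throw new InvalidArgumentException('Page size must be positive');
    }
    return (int) floor($linkBytesPerSecond / $pageBytes);
}
```

With the figures from the text, pages_per_second(1000000, 30000) gives 33.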
More subtle networking bottlenecks include frequent access to slow network services such as DNS, or allocating insufficient memory for networking software.
(b) CPU
If you monitor your CPU load, you will find that sending plain HTML pages over a network does not tax your CPU at all because, as we mentioned earlier, the bottleneck will be the network. However, for the complex dynamic web pages that PHP generates, your CPU speed will normally become the limiting factor. Having a server with multiple processors or having a server farm can alleviate this.
(c) Shared Memory
Shared memory is used for inter-process communication, and to store resources that are shared between multiple processes such as cached data and code. If insufficient shared memory is allocated any attempt to access resources that use shared memory such as database connections or executable code will perform poorly.
(d) File System
Accessing a hard disk can be 50 to 100 times slower than reading data from RAM. File caches using RAM can alleviate this. However low memory conditions will reduce the amount of memory available for the file-system cache, slowing things down. File systems can also become heavily fragmented, slowing down disk accesses. Heavy use of symbolic links on Unix systems can slow down disk accesses too.
Default Linux installs are also notorious for setting hard disk default settings which are tuned for compatibility and not for speed. Use the command hdparm to tune your Linux hard disk settings.
(e) Process Management
On some operating systems such as Windows creating new processes is a slow operation. This means CGI applications that fork a new process on every invocation will run substantially slower on these operating systems. Running PHP in multi-threaded mode should improve response times (note: older versions of PHP are not stable in multi-threaded mode).
Avoid overcrowding your web server with too many unneeded processes. For example, if your server is purely for web serving, avoid running (or even installing) X-Windows on the machine. On Windows, avoid running Microsoft Find Fast (part of Office) and 3-dimensional screen savers that result in 100% CPU utilization.
Some of the programs that you can consider removing include unused networking protocols, mail servers, antivirus scanners, hardware drivers for mice, infrared ports and the like. On Unix, I assume you are accessing your server using SSH. Then you can consider removing:
daemons such as telnetd, inetd, atd, ftpd, lpd, smbd (Samba)
sendmail for incoming mail
portmap for NFS
xfs, fvwm, xinit, X
You can also disable at startup various programs by modifying the startup files which are usually stored in the /etc/init* or /etc/rc*/init* directory.
Also review your cron jobs to see if you can remove them or reschedule them for off-peak periods.
(f) Connecting to Other Servers
If your web server requires services running on other servers, it is possible that those servers become the bottleneck. The most common example of this is a slow database server that is servicing too many complicated SQL requests from multiple web servers.
When to Start Optimizing?
Some people say that it is better to defer tuning until after the coding is complete. This advice only makes sense if your programming team's coding is of a high quality to begin with, and you already have a good feel of the performance parameters of your application. Otherwise you are exposing yourselves to the risk of having to rewrite substantial portions of your code after testing.
My advice is that before you design a software application, you should do some basic benchmarks on the hardware and software to get a feel for the maximum performance you might be able to achieve. Then as you design and code the application, keep the desired performance parameters in mind, because at every step of the way there will be tradeoffs between performance, availability, security and flexibility.
Also choose good test data. If your database is expected to hold 100,000 records, avoid testing with only a 100 record database — you will regret it. This once happened to one of the programmers in my company; we did not detect the slow code until much later, causing a lot of wasted time as we had to rewrite a lot of code that worked but did not scale.
Tuning Your Web Server for PHP
We will cover how to get the best PHP performance for the two most common web servers in use today, Apache 1.3 and IIS. A lot of the advice here is relevant for serving HTML also.
The authors of PHP have stated that there is no performance nor scalability advantage in using Apache 2.0 over Apache 1.3 with PHP, especially in multi-threaded mode. When running Apache 2.0 in pre-forking mode, the following discussion is still relevant (21 Oct 2003).
(a) Apache 1.3/2.0
Apache is available on both Unix and Windows. It is the most popular web server in the world. Apache 1.3 uses a pre-forking model for web serving. When Apache starts up, it creates multiple child processes that handle HTTP requests. The initial parent process acts like a guardian angel, making sure that all the child processes are working properly and coordinating everything. As more HTTP requests come in, more child processes are spawned to process them. As the HTTP requests slow down, the parent will kill the idle child processes, freeing up resources for other processes. The beauty of this scheme is that it makes Apache extremely robust. Even if a child process crashes, the parent and the other child processes are insulated from the crashing child.
The pre-forking model is not as fast as some other possible designs, but to me that is "much ado about nothing" on a server serving PHP scripts, because other bottlenecks will kick in long before Apache performance issues become significant. The robustness and reliability of Apache are more important.
Apache 2.0 offers operation in multi-threaded mode. My benchmarks indicate there is little performance advantage in this mode. Also be warned that many PHP extensions are not compatible (e.g. GD and IMAP). Tested with Apache 2.0.47 (21 Oct 2003).
Apache is configured using the httpd.conf file. The following parameters are particularly important in configuring child processes (updated 30 Oct 2009: in Apache 2, these settings have been moved to conf/extra/httpd-mpm.conf. Make sure you also uncomment the include of extra/httpd-mpm.conf in httpd.conf):
MaxClients (default: 256)
The maximum number of child processes to create. The default means that up to 256 HTTP requests can be handled concurrently. Any further connection requests are queued.

StartServers (default: 5)
The number of child processes to create on startup.

MinSpareServers (default: 5)
The number of idle child processes that should be kept available. If the number of idle child processes falls below this number, 1 child is created initially, then 2 after another second, then 4 after another second, and so forth until 32 children are created per second.

MaxSpareServers (default: 10)
If more than this number of idle child processes are alive, the extra processes will be terminated.

MaxRequestsPerChild (default: 0)
Sets the number of HTTP requests a child can handle before terminating. Setting it to 0 means never terminate. Set this to a value between 100 and 10000 if you suspect memory leaks are occurring, or to free under-utilized resources.
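To get a feel for how long Apache needs to catch up with a sudden burst of traffic, the doubling schedule described for MinSpareServers (1, 2, 4 ... up to 32 children per second) can be modelled as below. This is a sketch of that description only, with a made-up function name; it is not taken from Apache's source code.

```php
<?php
// Number of one-second spawn cycles needed to create $children new child
// processes when the spawn rate starts at 1 and doubles every cycle,
// capped at $maxPerCycle children per cycle.
function spawn_cycles_needed(int $children, int $maxPerCycle = 32): int
{
    $cycles = 0;
    $rate = 1;
    $created = 0;
    while ($created < $children) {
        $created += $rate;
        $cycles++;
        $rate = min($rate * 2, $maxPerCycle);
    }
    return $cycles;
}
```

For example, spawn_cycles_needed(63) is 6, since 1 + 2 + 4 + 8 + 16 + 32 = 63. This is why a larger pool of spare servers helps a site that sees sharp traffic spikes.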
For large sites, values close to the following might be better:
MinSpareServers 32
MaxSpareServers 64
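When choosing MaxClients, the hare and tortoise arithmetic from earlier applies: the number of children multiplied by the memory each child needs should fit in free physical RAM. The helper below is a rule-of-thumb sketch, not an official Apache formula. Its name and parameters are assumptions, and because child processes share some memory, the real limit is usually somewhat higher than this estimate.

```php
<?php
// Estimate how many Apache children fit in free RAM without swapping.
// $freeRamKb:   free physical memory in kilobytes.
// $perChildKb:  approximate memory used by one child (e.g. RSS from "top").
// $hardLimit:   never exceed this (256 is the default MaxClients in the text).
function estimate_max_clients(int $freeRamKb, int $perChildKb, int $hardLimit = 256): int
{
    if ($perChildKb <= 0) {
        throw new InvalidArgumentException('Per-child memory must be positive');
    }
    return max(1, min($hardLimit, intdiv($freeRamKb, $perChildKb)));
}
```

With the earlier example figures, estimate_max_clients(100 * 1024, 10 * 1024) gives 10 for hare.php and estimate_max_clients(100 * 1024, 5 * 1024) gives 20 for tortoise.php.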
Apache on Windows behaves differently. Instead of using child processes, Apache uses threads. The above parameters are not used. Instead we have one parameter: ThreadsPerChild, which defaults to 50. This parameter sets the number of threads that can be spawned by Apache. As there is only one child process in the Windows version, the default setting of 50 means only 50 concurrent HTTP requests can be handled. For web servers experiencing higher traffic, increase this value to between 256 and 1024.
Other useful performance parameters you can change include:
SendBufferSize (default: set to OS default)
Determines the size of the output buffer (in bytes) used in TCP/IP connections. This is primarily useful for congested or slow networks when packets need to be buffered; you then set this parameter close to the size of the largest file normally downloaded. One TCP/IP buffer will be created per client connection.

KeepAlive [on|off] (default: On)
In the original HTTP specification, every HTTP request had to establish a separate connection to the server. To reduce the overhead of frequent connects, the keep-alive header was developed. Keep-alives tell the server to reuse the same socket connection for multiple HTTP requests.
If a separate dedicated web server serves all images, you can disable this option. This technique can substantially improve resource utilization.

KeepAliveTimeout (default: 15)
The number of seconds to keep the socket connection alive. This time includes the generation of content by the server and acknowledgements by the client. If the client does not respond in time, it must make a new connection.
This value should be kept low, as otherwise the socket will be idle for extended periods.

MaxKeepAliveRequests (default: 100)
Socket connections will be terminated when the number of requests set by MaxKeepAliveRequests is reached. Keep this to a high value below MaxClients or ThreadsPerChild.

TimeOut (default: 300)
Disconnect when idle time exceeds this value. You can set this value lower if your clients have low latencies.

LimitRequestBody (default: 0)
Maximum size of a PUT or POST. 0 means there is no limit.
If you do not require DNS lookups and you are not using the htaccess file to configure Apache settings for individual directories, you can set:
# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off
# disable htaccess checks
<Directory />
AllowOverride none
</Directory>
If you are not worried about directory security when accessing symbolic links, turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent additional lstat() system calls from being made:
Options FollowSymLinks
#Options SymLinksIfOwnerMatch
(b) IIS Tuning
IIS is a multi-threaded web server available on Windows NT and 2000. From the Internet Services Manager, it is possible to tune the following parameters:

Performance Tuning based on the number of hits per day
Determines how much memory to preallocate for IIS. (Performance Tab)

Bandwidth throttling
Controls the bandwidth per second allocated per web site. (Performance Tab)

Process throttling
Controls the CPU% available per web site. (Performance Tab)

Timeout
Default is 900 seconds. Set to a lower value on a Local Area Network. (Web Site Tab)

HTTP Compression
In IIS 5, you can compress dynamic pages, HTML and images. It can be configured to cache compressed static HTML and images. By default compression is off.
HTTP compression has to be enabled for the entire physical server. To turn it on, open the IIS console, right-click on the server (not any of the subsites, but the server in the left-hand pane), and get Properties. Click on the Service tab, and select "Compress application files" to compress dynamic content, and "Compress static files" to compress static content.
You can also configure the default isolation level of your web site. In the Home Directory tab under Application Protection, you can define your level of isolation. A highly isolated web site will run slower because it is running as a separate process from IIS, while running the web site in the IIS process is the fastest but will bring down the server if there are serious bugs in the web site code. Currently I recommend running PHP web sites using CGI, or using ISAPI with Application Protection set to high.
You can also use regedit.exe to modify the following IIS 5 registry settings stored at the following location [Updated 30 Oct 2009: Tips for IIS6 and IIS7]:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\
MemCacheSize
Sets the amount of memory that IIS will use for its file cache. By default IIS will use 50% of available memory. Increase if IIS is the only application on the server. Value is in megabytes.

MaxCachedFileSize
Determines the maximum size of a file cached in the file cache in bytes. Default is 262,144 (256K).

ObjectCacheTTL
Sets the length of time (in milliseconds) that objects in the cache are held in memory. Default is 30,000 milliseconds (30 seconds).

MaxPoolThreads
Sets the number of pool threads to create per processor. Determines how many CGI applications can run concurrently. Default is 4. Increase this value if you are using PHP in CGI mode.

ListenBackLog
Specifies the maximum number of active Keep Alive connections that IIS maintains in the connection queue. Default is 15, and should be increased to the number of concurrent connections you want to support. Maximum is 250.
If the settings are missing from this registry location, the defaults are being used.
High Performance on Windows: IIS and FastCGI
After much testing, I find that the best PHP performance on Windows is offered by using IIS with FastCGI. CGI is a protocol for calling external programs from a web server. It is not very fast because CGI programs are terminated after every page request. FastCGI modifies this protocol for high performance, by making the CGI program persist after a page request, and reusing the same CGI program when a new page request comes in.
As the installation of FastCGI with IIS is complicated, you should use Zend Core for Windows or php.iis.net. This will install PHP and FastCGI for the best performance possible. The Zend Core installer can also install Apache.
This section on FastCGI updated 30 Oct 2009.
PHP's Zend Engine
The Zend Engine is the internal compiler and runtime engine used by PHP. It was developed by Zeev Suraski and Andi Gutmans, and the name Zend is a combination of their first names. In the early days of PHP4, PHP worked in the following fashion:
The PHP script was loaded by the Zend Engine and compiled into Zend opcode. Opcodes, short for operation codes, are low level binary instructions. Then the opcode was executed and the HTML generated was sent to the client. The opcode was flushed from memory after execution.
Today, there are a multitude of products and techniques to help you speed up this process. In the following diagram, we show how modern PHP scripts work; all the shaded boxes are optional.
PHP Scripts are loaded into memory and compiled into Zend opcodes. These opcodes can now be optimized using an optional peephole optimizer called Zend Optimizer. Depending on the script, it can increase the speed of your PHP code by 0-50%.
Formerly after execution, the opcodes were discarded. Now the opcodes can be optionally cached in memory using several alternative open source products and the Zend Accelerator (formerly Zend Cache), which is a commercial closed source product. The only opcode cache that is compatible with the Zend Optimizer is the Zend Accelerator. An opcode cache speeds execution by removing the script loading and compilation steps. Execution times can improve between 10-200% using an opcode cache.
Where to find Opcode Caches (Modified 30 Oct 2009)
For an overview, see this Wikipedia article on PHP Accelerators.
Zend Platform: A commercial opcode cache developed by the Zend Engine team. Very reliable and robust. Visit http://zend.com for more information.
You will need to test the following open source opcode caches before using them on production servers as their performance and reliability very much depends on the PHP scripts you run.
The eAccelerator is quite popular, and I am using it (Added 28 Feb 2005).
Alternative PHP Cache: http://pecl.php.net/apc. I believe that PHP6 will come with APC built in.
Caching: the Ultimate Speed Booster
One of the secrets of high performance is not to write faster PHP code, but to avoid executing PHP code by caching generated HTML in a file or in shared memory. The PHP script is only run once and the HTML is captured, and future invocations of the script will load the cached HTML. If the data needs to be updated regularly, an expiry value is set for the cached HTML. HTML caching is not part of the PHP language nor Zend Engine, but implemented using PHP code. There are many class libraries that do this. One of them is the PEAR Cache, which we will cover in the next section. Another is the Smarty template library.
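The idea can be sketched in a few lines of plain PHP using output buffering and a file on disk. This is a minimal illustration rather than the PEAR Cache or Smarty implementation; the function name, its parameters and the choice of the file modification time as the expiry clock are all assumptions made for the example.

```php
<?php
// Return cached HTML from $cacheFile if it is younger than $ttlSeconds;
// otherwise run $render, capture everything it prints, and store it.
function cached_html(string $cacheFile, int $ttlSeconds, callable $render): string
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;              // cache hit: no page code is run
        }
    }
    ob_start();                          // cache miss: capture the output
    $render();
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}
```

A page would then call something like echo cached_html("cache/home.html", 60, "render_home_page"); where render_home_page is a hypothetical function that prints the page.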
Finally, the HTML sent to a web client can be compressed. This is enabled by placing the following code at the beginning of your PHP script:
<?php
ob_start("ob_gzhandler");
:
:
?>
If your HTML is highly compressible, it is possible to reduce the size of your HTML file by 50-80%, reducing network bandwidth requirements and latencies. The downside is that you need to have some CPU power to spare for compression.
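To estimate what compression would save on one of your own pages before enabling it, you can compress a sample of the HTML yourself. The sketch below assumes the zlib extension is available and uses gzencode() at level 6; the function name is made up, and the savings ob_gzhandler achieves on a live page may differ somewhat.

```php
<?php
// Percentage by which gzip shrinks the given HTML (0 for an empty string).
function gzip_savings_percent(string $html): float
{
    if ($html === '') {
        return 0.0;
    }
    $compressed = gzencode($html, 6);
    if ($compressed === false) {
        throw new RuntimeException('Compression failed');
    }
    return 100.0 * (1 - strlen($compressed) / strlen($html));
}
```

Repetitive markup such as long HTML tables compresses very well, while pages that are mostly short or already compressed content gain little.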
HTML Caching with PEAR Cache
The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.
The most common use of the PEAR Cache is to cache HTML text. To do this, we use the Output buffering class, which caches all text printed or echoed between the start() and end() functions:
<?php
require_once("Cache/Output.php");
$cache = new Cache_Output("file", array("cache_dir" => "cache/") );
if ($contents = $cache->start(md5("this is a unique key!"))) {
    #
    # aha, cached data returned
    #
    print $contents;
    print "<p>Cache Hit</p>";
} else {
    #
    # no cached data, or cache expired
    #
    print "<p>Don't leave home without it...</p>"; # place in cache
    print "<p>Stand and deliver</p>"; # place in cache
    print $cache->end(10); # cache the buffered output for 10 seconds
}
?>
Since I wrote these lines, a superior PEAR cache system has been developed: Cache Lite.
The Cache constructor takes the storage driver to use as the first parameter. File, database and shared memory storage drivers are available; see the pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that the "file" storage driver offers the best performance. The second parameter is the storage driver options. The options are "cache_dir", the location of the caching directory, and "filename_prefix", which is the prefix to use for all cached files. Strangely enough, cache expiry times are not set in the options parameter.
To cache some data, you generate a unique id for the cached data using a key. In the above example, we used md5("this is a unique key!").
The start() function uses the key to find a cached copy of the contents. If the contents are not cached, an empty string is returned by start(), and all future echo() and print() statements will be buffered in the output cache, until end() is called.
The end() function returns the contents of the buffer, and ends output buffering. The end() function takes as its first parameter the expiry time of the cache. This parameter can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to default to 24 hours.
Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:
<?php
require_once("Cache.php");
$cache = new Cache("file", array("cache_dir" => "cache/") );
$id = $cache->generateID("this is a unique key");
if ($data = $cache->get($id)) {
print "Cache hit.<br>Data: $data";
} else {
$data = "The quality of mercy is not strained...";
$cache->save($id, $data, $expires = 60);
print "Cache miss.<br>";
}
?>
To save the data we use save(). If your unique key is already a legal file name, you can bypass the generateID() step. Objects and arrays can be saved because save() will serialize the data for you. The last parameter controls when the data expires; this can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to use the default of 24 hours. To retrieve the cached data we use get().
You can delete a cached data item using $cache->delete($id) and remove all cached items using $cache->flush().
New: A faster Caching class is Cache-Lite. Highly recommended.
Squid and HTTP Accelerators
Perhaps the most significant change to PHP performance I have experienced since I first wrote this article is my use of Squid, a web accelerator that is able to take over the management of all static http files from Apache. You may be surprised to find that the overhead of using Apache to serve both dynamic PHP and static images, javascript, css and html is extremely high. From my experience, 40-50% of our Apache CPU utilisation is in handling these static files (the remaining CPU usage is taken up by PHP running in Apache).
It is better to offload downloading of these static files by using Squid in httpd-accelerator mode. In this scenario, you use Squid as the front web server on port 80, and set all .php requests to be dispatched to Apache on port 81, which I have running on the same server. The static files are served by Squid.
Here is a sample setup with Squid 2.6. A portion of the default configuration file for Squid, modified for acceleration, is shown below. In this sample the server runs Squid on port 80 and forwards requests to a web server listening on port 8000 on the same machine. Make sure that all http_access permissions in the default config file are commented out. We assume that all files are cached for 7 hours (420 minutes). Then add at the bottom of the default config file:
http_port 80 vport=8000
cache_peer 127.0.0.1 parent 8000 3130 originserver
http_access allow all
# change below to match your hostname (used in logs as host)
visible_hostname 10.1.187.23
cache_store_log none
refresh_pattern -i \.jpg$ 0 50% 420
refresh_pattern -i \.gif$ 0 50% 420
refresh_pattern -i \.png$ 0 50% 420
refresh_pattern -i \.js$ 0 20% 420
refresh_pattern -i \.htm$ 0 20% 420
refresh_pattern -i \.html$ 0 20% 420
See Google for some examples of this.
Distributed Caches (Added 30 Oct 2009)
Web sites continue to grow and nowadays clusters of web servers are common. Local caching is no longer an option and we need to be able to cache across multiple servers. PHP has support for memcached, a distributed key-value caching system. There is a memcache client API for PHP and the extension is available for download. The memcached server can also be downloaded here.
One scenario is that you could have a cluster of 3 PHP servers sharing a common cache located on a single memcache server. For even larger sites, you could run multiple memcache servers, each memcache server holding a portion of the cache. For more examples, see this article on LiveJournal's experience with memcache.
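To see how "each memcache server holding a portion of the cache" can work, here is a small sketch that maps a cache key to one server from a list by hashing the key. The function name and server addresses are hypothetical, and the simple modulo scheme is an assumption chosen for clarity; real memcache client libraries normally do this distribution for you and may use a different algorithm.

```php
<?php
// Pick the server responsible for $key. Every web server that uses the
// same list in the same order will pick the same cache server for a key.
function pick_cache_server(string $key, array $servers): string
{
    $servers = array_values($servers);
    if (count($servers) === 0) {
        throw new InvalidArgumentException('No cache servers configured');
    }
    // Mask the checksum so the index is never negative on 32-bit builds.
    $index = (crc32($key) & 0x7fffffff) % count($servers);
    return $servers[$index];
}
```

For example, pick_cache_server("user:42", array("10.0.0.1:11211", "10.0.0.2:11211")) always returns the same one of the two addresses for that key. Note that with this naive scheme, adding or removing a server remaps most keys.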
Using Benchmarks
In earlier sections we have covered many performance issues. Now we come to the meat and bones: how to go about measuring and benchmarking your code so you can obtain decent information on what to tune.
If you want to perform realistic benchmarks on a web server, you will need a tool to send HTTP requests to the server. On Unix, common tools to perform benchmarks include ab (short for apachebench), which is part of the Apache release, and the newer flood (httpd.apache.org/test/flood). On Windows you can use Microsoft's free Web Capacity Analysis Tool (Updated 30 Oct 2009).
These programs can make multiple concurrent HTTP requests, simulating multiple web clients, and present you with detailed statistics on completion of the tests.
You can monitor how your server behaves as the benchmarks are conducted on Unix using "vmstat 1". This prints out a status report every second on the performance of your disk i/o, virtual memory and CPU load. Alternatively, you can use "top d 1", which gives you a full screen update on all processes running, sorted by CPU load, every 1 second [30 Oct 2009 Note: I now recommend running vmstat for at least 5 seconds with "vmstat 5" as there is too much fluctuation over just 1 second].
On Windows 2000 or later, you can use the Performance Monitor or the Task Manager to view your system statistics.
If you want to test a particular aspect of your code without having to worry about the HTTP overhead, you can benchmark using the built-in microtime() function. Called without arguments, it returns the current time, accurate to the microsecond, as a string; called as microtime(true), it returns a floating-point number of seconds that is suitable for calculations, as used in the following snippet (modified 30 Oct 2009).
$time = microtime(true);
#
# code to be benchmarked here
#
echo "<p>Time elapsed: ",microtime(true) - $time, " seconds";
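A single run is often too short to measure reliably, so it helps to repeat the code many times and average. The helper below is one possible way to wrap the microtime(true) technique; the function name and the default iteration count are assumptions for the example.

```php
<?php
// Run $code $iterations times and return the average seconds per call.
function time_callable(callable $code, int $iterations = 1000): float
{
    if ($iterations <= 0) {
        throw new InvalidArgumentException('Iterations must be positive');
    }
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $code();
    }
    return (microtime(true) - $start) / $iterations;
}
```

For example, time_callable(function () { md5("benchmark me"); }, 10000) gives the average cost of one md5() call, including the small overhead of the loop and the function call itself.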
Alternatively, you can use a profiling tool such as APD or XDebug. Also see my article squeezing code with xdebug.
Benchmarking Case Study
This case study details a real benchmark we did for a client. In this instance, the customer wanted a guaranteed response time of 5 seconds for all PHP pages that did not involve running long SQL queries. The following server configuration was used: an Apache 1.3.20 server running PHP 4.0.6 on Red Hat 7.2 Linux. The hardware was a twin Pentium III 933 MHz beast with 1 GB of RAM. The HTTP requests will be for the PHP script "testmysql.php". This script reads and processes about 20 records from a MySQL database running on another server. For the sake of simplicity, we assume that all graphics are downloaded from another web server.
We used "ab" as the benchmarking tool. We set "ab" to perform 1000 requests (-n1000), using 10 simultaneous connections (-c10). Here are the results:
# ab -n1000 -c10 http://192.168.0.99/php/testmysql.php
This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/
Server Software: Apache/1.3.20
Server Hostname: 192.168.0.99
Server Port: 80
Document Path: /php/testmysql.php
Document Length: 25970 bytes
Concurrency Level: 10
Time taken for tests: 128.672 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 26382000 bytes
HTML transferred: 25970000 bytes
Requests per second: 7.77
Transfer rate: 205.03 kb/s received
Connnection Times (ms)
min avg max
Connect: 0 9 114
Processing: 698 1274 2071
Total: 698 1283 2185
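The summary figures in the ab output above can be cross-checked with a little arithmetic: requests per second is the number of completed requests divided by the elapsed time, and with a fixed number of concurrent connections the mean time per request is roughly the concurrency divided by the throughput. The function names in this sketch are made up for the example.

```php
<?php
// Throughput: completed requests divided by elapsed wall-clock seconds.
function requests_per_second(int $requests, float $seconds): float
{
    return $requests / $seconds;
}

// Approximate mean time per request when $concurrency requests are
// always in flight: concurrency divided by throughput.
function mean_latency_seconds(int $concurrency, float $requestsPerSecond): float
{
    return $concurrency / $requestsPerSecond;
}
```

Here requests_per_second(1000, 128.672) is about 7.77, matching the report, and mean_latency_seconds(10, 7.77) is about 1.287 seconds, close to the 1283 ms average total time that ab measured.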
While running the benchmark, on the server side we monitored the resource utilization using the command "top d 1". The parameters "d 1" mean to delay 1 second between updates. The output is shown below.
10:58pm up 3:36, 2 users, load average: 9.07, 3.29, 1.79
74 processes: 63 sleeping, 11 running, 0 zombie, 0 stopped
CPU0 states: 92.0% user, 7.0% system, 0.0% nice, 0.0% idle
CPU1 states: 95.0% user, 4.0% system, 0.0% nice, 0.0% idle
Mem: 1028484K av, 230324K used, 798160K free, 64K shrd, 27196K buff
Swap: 2040244K av, 0K used, 2040244K free 30360K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1142 apache 20 0 7280 7280 3780 R 21.2 0.7 0:20 httpd
1154 apache 17 0 8044 8044 3788 S 19.3 0.7 0:20 httpd
1155 apache 20 0 8052 8052 3796 R 19.3 0.7 0:20 httpd
1141 apache 15 0 6764 6764 3780 S 14.7 0.6 0:20 httpd
1174 apache 14 0 6848 6848 3788 S 12.9 0.6 0:20 httpd
1178 apache 13 0 6864 6864 3804 S 12.9 0.6 0:19 httpd
1157 apache 15 0 7536 7536 3788 R 11.0 0.7 0:19 httpd
1159 apache 15 0 7540 7540 3788 R 11.0 0.7 0:19 httpd
1148 apache 11 0 6672 6672 3784 S 10.1 0.6 0:20 httpd
1158 apache 14 0 7400 7400 3788 R 10.1 0.7 0:19 httpd
1163 apache 20 0 7540 7540 3788 R 10.1 0.7 0:19 httpd
1169 apache 12 0 6856 6856 3796 S 10.1 0.6 0:20 httpd
1176 apache 16 0 8052 8052 3796 R 10.1 0.7 0:19 httpd
1171 apache 15 0 7984 7984 3780 S 9.2 0.7 0:18 httpd
1170 apache 16 0 7204 7204 3796 R 6.4 0.7 0:20 httpd
1168 apache 10 0 6856 6856 3796 S 4.6 0.6 0:20 httpd
1377 natsoft 11 0 1104 1104 856 R 2.7 0.1 0:02 top
1152 apache 9 0 6752 6752 3788 S 1.8 0.6 0:20 httpd
1167 apache 9 0 6848 6848 3788 S 0.9 0.6 0:19 httpd
1 root 8 0 520 520 452 S 0.0 0.0 0:04 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
Looking at the output of "top", the twin CPU Apache server is running flat out with 0% idle time. What is worse is that the load average is 9.07 for the past minute (and 3.29 for the past 5 minutes, 1.79 for the past 15 minutes). The load average is the average number of processes that are ready to be run. For a twin processor server, any load above 2.0 means that the system is being overloaded. You might notice that there is a close relationship between load (9.07) and the number of simultaneous connections (10) that we have defined with ab.
Luckily we have plenty of physical memory, with about 798,160K (roughly 780 MB) free and no virtual memory used.
Further down we can see the processes ordered by CPU utilization. The most active ones are the Apache httpd processes. The first httpd task is using 7280K of memory, and is taking an average of 21.2% of CPU and 0.7% of physical memory. The STAT column indicates the status: R is runnable, S is sleeping, and W means that the process is swapped out.
Given the above figures, and assuming this is a typical peak load, we can perform some planning. If the load average is 9.0 for a twin-CPU server and assuming each task takes about the same amount of time to complete, then a lightly loaded server should be 9.0 / 2 CPUs = 4.5 times faster. So an HTTP request that used to take 1.283 seconds to satisfy at peak load will take about 1.283 / 4.5 = 0.285 seconds to complete.
To verify this, we benchmarked with 2 simultaneous client connections (instead of 10 in the previous benchmark) to give an average of 0.281 seconds, very close to the 0.285 seconds prediction!
# ab -n100 -c2 http://192.168.0.99/php/testmysql.php
[ some lines omitted for brevity ]
Requests per second: 7.10
Transfer rate: 187.37 kb/s received
Connnection Times (ms)
min avg max
Connect: 0 2 40
Processing: 255 279 292
Total: 255 281 332
Conversely, doubling the connections, we can predict that the average connection time should double from 1.283 to 2.566 seconds. In the benchmarks, the actual time was 2.570 seconds.
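The scaling arithmetic used in these predictions can be written down as a small helper. This is only a rough sketch, written for current PHP versions, under the same assumption the text makes: every request takes about the same time, so response time scales linearly with load. The function name predictResponseTime() is made up for illustration.

```php
<?php
// Hypothetical helper: scale a measured response time by the ratio of
// the target load to the measured load (linear-scaling assumption).
function predictResponseTime(float $measuredSeconds, float $measuredLoad, float $targetLoad): float
{
    return $measuredSeconds * ($targetLoad / $measuredLoad);
}

// Peak: load average 9.0 on 2 CPUs, 1.283 s per request.
// A twin-CPU server that is just fully used has a load of about 2.0.
$light = predictResponseTime(1.283, 9.0, 2.0);
printf("light load: %.3f s\n", $light);

// Doubling the connections from 10 to 20 doubles the predicted time.
$double = predictResponseTime(1.283, 10, 20);
printf("20 connections: %.3f s\n", $double);
```

The two calls reproduce the 0.285 and 2.566 second predictions quoted in the text; the measured values (0.281 and 2.570 seconds) come from the benchmarks, not from this model.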
Overload on 40 connections
When we pushed the benchmark to 40 connections, the server became overloaded and 35% of the requests failed. Further investigation showed that the MySQL persistent connections were failing with "Too Many Connections" errors.
The benchmark also demonstrates the lingering behavior of Apache child processes. Each PHP script uses 2 persistent connections, so at 40 connections, we should only be using at most 80 persistent connections, well below the default MySQL max_connections of 100. However, idle Apache child processes are not assigned to new requests immediately because of latencies, keep-alives and other technical reasons; these lingering child processes held the remaining 20+ persistent connections that were "the straws that broke the camel's back".
The Fix
By switching to non-persistent database connections, we were able to fix this problem and obtained a result of 5.340 seconds. An alternative solution would have been to increase the MySQL max_connections parameter from the default of 100.
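To make the connection arithmetic concrete, here is a minimal sketch of the worst-case budget, assuming every Apache child process keeps its own persistent connections open. The helper name and the figure of 11 lingering children are illustrative assumptions, not measurements from the benchmark.

```php
<?php
// Hypothetical helper: worst-case number of persistent connections held
// open, assuming every Apache child keeps its own set of connections.
function persistentConnectionsNeeded(int $apacheChildren, int $connectionsPerScript): int
{
    return $apacheChildren * $connectionsPerScript;
}

$maxConnections = 100; // MySQL default cited in the article

// 40 busy children, 2 persistent connections each: fits under the limit.
$busyOnly = persistentConnectionsNeeded(40, 2);

// Illustrative figure only: 11 extra lingering children push it over.
$withLingering = persistentConnectionsNeeded(40 + 11, 2);

printf("%d busy-only, %d with lingering children (limit %d)\n",
    $busyOnly, $withLingering, $maxConnections);
```

With non-persistent connections, a child releases its connections when the script ends, so idle children no longer count against the limit.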
Conclusions
The above case study once again shows us that optimizing your performance is extremely complex. It requires an understanding of multiple software subsystems including network routing, the TCP/IP stack, the amount of physical and virtual memory, the number of CPUs, the behavior of Apache child processes, your PHP scripts, and the database configuration.
In this case the PHP code was quite well tuned, so the first bottleneck was the CPU, which caused a slowdown in response time. As the load increased, the system slowed down in a near linear fashion (which is a good sign) until we encountered the more serious bottleneck of MySQL client connections. This caused multiple errors in our PHP pages until we fixed it by switching to non-persistent connections.
From the above figures, we can calculate for a given desired response time, how many simultaneous HTTP connections we can handle. Assuming two-way network latencies of 0.5 seconds on the Internet (0.25s one way), we can predict:
As our client wanted a maximum response time of 5 seconds, the server can handle up to 34 simultaneous connections. This works out to a peak capacity of 34/5 = 6.8 page views per second.
To get the maximum number of page views a day that the server can handle, multiply the peak capacity per second by 50,000 (this technique is suggested by the webmasters at pair.com, a large web hosting company), to give 340,000 page views a day.
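The capacity figures can be reproduced with a few lines of arithmetic. This sketch assumes that processing time grows linearly with the number of connections and extrapolates from the 40-connection result (5.340 seconds). How the original figures were derived is not shown here, so treat the linear model as an assumption; the variable names are made up.

```php
<?php
// Figures from the benchmarks above; the linear model is an assumption.
$secondsPerConnection = 5.340 / 40; // measured at 40 connections
$networkLatency       = 0.5;        // assumed two-way Internet latency
$maxResponseTime      = 5.0;        // the client's requirement

// Largest number of simultaneous connections within the time budget.
$connections = (int) round(($maxResponseTime - $networkLatency) / $secondsPerConnection);

// Each connection completes one page per $maxResponseTime seconds.
$pagesPerSecond = $connections / $maxResponseTime;

// Rule of thumb quoted above: daily views = peak per second * 50,000.
$pagesPerDay = (int) round($pagesPerSecond * 50000);

printf("%d connections, %.1f pages/s, %d pages/day\n",
    $connections, $pagesPerSecond, $pagesPerDay);
```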
Code Optimizations
The patient reader who is still wondering why so much emphasis is given to discussing non-PHP issues is reminded that PHP is a fast language, and many of the likely bottlenecks causing slow speeds lie outside PHP.
Most PHP scripts are simple. They involve reading some session information, loading some data from a content management system or database, formatting the appropriate HTML and echoing the results to the HTTP client. Assuming that a typical PHP script completes in 0.1 seconds and the Internet latency is 0.2 seconds, only 33% of the 0.3 seconds response time that the HTTP client sees is actual PHP computation. So if you improve a script's speed by 20%, the HTTP client will see response times drop to 0.28 seconds, which is an insignificant improvement. Of course the server can probably handle 20% more requests for the same page, so scalability has improved.
The above example does not mean we should throw our hands up and give up. It means that we should not feel proud tweaking the last 1% of speed from our code, but we should spend our time optimizing worthwhile areas of our code to get higher returns.
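The response-time arithmetic above is easy to check for other numbers. The sketch below uses the article's figures (0.1 seconds of PHP time, 0.2 seconds of latency); the helper name clientResponseTime() is hypothetical.

```php
<?php
// Hypothetical helper: total time seen by the client after speeding up
// only the PHP portion of the request by the given fraction.
function clientResponseTime(float $scriptSeconds, float $latencySeconds, float $speedup): float
{
    return $scriptSeconds * (1 - $speedup) + $latencySeconds;
}

$before = clientResponseTime(0.1, 0.2, 0.0); // no optimization
$after  = clientResponseTime(0.1, 0.2, 0.2); // script made 20% faster

printf("before %.2f s, after %.2f s, saved %.1f%%\n",
    $before, $after, 100 * ($before - $after) / $before);
```

A 20% faster script shortens the client-visible time by much less than 20%, because the latency term is untouched.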
High Return Code Optimizations
The places where such high returns are achievable are in the while and for loops that litter our code, where each slowdown in the code is magnified by the number of times we iterate over them. The best way of understanding what can be optimized is to use a few examples:
Example 1
Here is one simple example that prints an array:
for ($j=0; $j<sizeof($arr); $j++)
echo $arr[$j]."<br>";
This can be sped up substantially by changing the code to:
for ($j=0, $max = sizeof($arr), $s = ''; $j<$max; $j++)
$s .= $arr[$j]."<br>";
echo $s;
First we need to understand that the expression $j<sizeof($arr) is evaluated on every iteration of the loop. As sizeof($arr) is actually a constant (invariant) while the loop runs, we cache its value in the $max variable. In technical terms, this is called loop invariant optimization.
The second issue is that in PHP 4, echoing multiple times is slower than storing everything in a string and echoing it in one call. This is because echo is an expensive operation that could involve sending TCP/IP packets to a HTTP client. Of course accumulating the string in $s has some scalability issues as it will use up more memory, so you can see a trade-off is involved here.
An alternate way of speeding the above code would be to use output buffering. This will accumulate the output string internally, and send the output in one shot at the end of the script. This reduces networking overhead substantially at the cost of more memory and an increase in latency. In some of my code consisting entirely of echo statements, performance improvements of 15% have been observed.
ob_start();
for ($j=0, $max = sizeof($arr); $j<$max; $j++)
    echo $arr[$j]."<br>";
Note that output buffering with ob_start() can be used as a global optimization for all PHP scripts. In long-running scripts, you will also want to flush the output buffer periodically so that some feedback is sent to the HTTP client. This can be done with ob_end_flush(). This function also turns off output buffering, so you might want to call ob_start() again immediately after the flush.
In summary, this example has shown us how to optimize loop invariants and how to use output buffering to speed up our code.
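To check the effect on your own server, the two loop styles can be timed side by side. This is a minimal sketch for current PHP versions: the timeOutput() helper and the test data are made up, and because the helper captures output in a buffer, it compares only the loop styles, not the cost of sending unbuffered output to a real HTTP client.

```php
<?php
// Time a callable while capturing whatever it echoes.
// Returns [elapsed seconds, captured output].
function timeOutput(callable $fn): array
{
    ob_start();
    $start = microtime(true);
    $fn();
    $elapsed = microtime(true) - $start;
    return [$elapsed, ob_get_clean()];
}

$arr = range(1, 10000); // made-up test data

// Original form: sizeof() is re-evaluated on every iteration.
[$t1, $out1] = timeOutput(function () use ($arr) {
    for ($j = 0; $j < sizeof($arr); $j++) {
        echo $arr[$j] . "<br>";
    }
});

// Loop invariant hoisted into $max, output accumulated in $s.
[$t2, $out2] = timeOutput(function () use ($arr) {
    for ($j = 0, $max = sizeof($arr), $s = ''; $j < $max; $j++) {
        $s .= $arr[$j] . "<br>";
    }
    echo $s;
});

printf("naive: %.4f s, hoisted: %.4f s, same output: %s\n",
    $t1, $t2, $out1 === $out2 ? 'yes' : 'no');
```

The timings will vary by PHP version and hardware, so treat any single run as indicative only.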
Example 2
In the following code, we iterate through a PEAR DB recordset, using a special formatting function to format a row, and then we echo the results. This time, I benchmarked the execution time at 10.2 ms (this excludes the database connection and SQL execution time):
function FormatRow(&$recordSet)
{
    $arr = $recordSet->fetchRow();
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}
for ($j = 0; $j < $rs->numRows(); $j++) {
    print FormatRow($rs);
}
From example 1, we learnt that we can optimize the code by changing the code to the following (execution time: 8.7 ms):
function FormatRow(&$recordSet)
{
    $arr = $recordSet->fetchRow();
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}
ob_start();
for ($j = 0, $max = $rs->numRows(); $j < $max; $j++) {
    print FormatRow($rs);
}
My benchmarks showed me that the use of $max contributed 0.5 ms and ob_start contributed 1 ms to the 1.5 ms speedup.
However by changing the looping algorithm we can simplify and speed up the code. In this case, execution time is reduced to 8.5 ms:
function FormatRow($arr)
{
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}
ob_start();
while ($arr = $rs->fetchRow()) {
    print FormatRow($arr);
}
One last optimization is possible here. We can remove the overhead of the function call (potentially sacrificing maintainability for speed) to shave off another 0.1 milliseconds (execution time: 8.4 ms):
ob_start();
while ($arr = $rs->fetchRow()) {
print '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}
By switching to PEAR Cache, execution time dropped again to 3.5 ms for cached data:
require_once("Cache/Output.php");
ob_start();
$cache = new Cache_Output("file", array("cache_dir" => "cache/") );
$t = getmicrotime(); // getmicrotime() is a user-defined timing helper, not shown here
if ($contents = $cache->start(md5("this is a unique key!"))) {
    print "<p>Cache Hit</p>";
    print $contents;
} else {
    print "<p>Cache Miss</p>";
    ##
    ## Code to connect and query database omitted
    ##
    while ($arr = $rs->fetchRow()) {
        print '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
    }
    print $cache->end(100);
}
print (getmicrotime()-$t);
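The same idea can be sketched without PEAR, using only output buffering and the file system. This is not the PEAR Cache implementation, just a minimal illustration of HTML caching; the cachedOutput() helper, the file naming scheme and the sample rows are all assumptions.

```php
<?php
// Hypothetical helper: return cached output for $key if it is younger
// than $ttl seconds, otherwise run $render, store its output and return it.
function cachedOutput(string $cacheDir, string $key, int $ttl, callable $render): string
{
    $file = $cacheDir . '/' . md5($key) . '.html';

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file); // cache hit
    }

    ob_start();          // cache miss: capture the generated HTML
    $render();
    $html = ob_get_clean();

    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// Usage with a stand-in for the database loop (made-up rows).
$calls = 0;
$render = function () use (&$calls) {
    $calls++;
    foreach ([['a', 'b'], ['c', 'd']] as $arr) {
        print '<b>' . $arr[0] . '</b><i>' . $arr[1] . '</i>';
    }
};

$dir = sys_get_temp_dir();
$key = 'demo-' . uniqid('', true);                // fresh key so the first call misses
$first  = cachedOutput($dir, $key, 100, $render); // miss: runs $render
$second = cachedOutput($dir, $key, 100, $render); // hit: served from the file
unlink($dir . '/' . md5($key) . '.html');         // clean up the demo file
```

On a hit the expensive rendering code is skipped entirely, which is where the large saving comes from.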
We summarize the optimization methods below:
Execution Time (ms)   Optimization Method
9.9                   Initial code, no optimizations, excluding database connection and SQL execution times.
9.2                   Using ob_start
8.7                   Optimizing loop invariants ($max) and using ob_start
8.5                   Changing from for-loop to while-loop, passing an array to FormatRow(), and using ob_start
8.4                   Removing FormatRow() and using ob_start
3.5                   Using PEAR Cache and using ob_start
From the above figures, you can see that the biggest speed improvements are derived not from tweaking the code, but from simple global optimizations such as ob_start(), or from using radically different algorithms such as HTML caching.
Optimizing Object-oriented Programming
In March 2001, I conducted some informal benchmarks with classes on PHP 4.0.4pl1, and I derived some advice from the results. The three main points are:
1. Initialise all variables before use.
2. Dereference all global/property variables that are frequently used in a method and put the values in local variables if you plan to access the value more than twice.
3. Try placing frequently used methods in the derived classes.
Warning: as PHP is going through a continuous improvement process, things might change in the future.
More Details
I have found that calling an object method (a function defined in a class) is about twice as slow as a normal function call. To me that's quite acceptable and comparable to other OOP languages.
Inside a method (the following ratios are approximate only):
- Incrementing a local variable in a method is the fastest. It is nearly the same as incrementing a local variable in a function.
- Incrementing a global variable is 2 times slower than a local variable.
- Incrementing an object property (e.g. $this->prop++) is 3 times slower than a local variable.
- Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
- Just declaring a global variable without using it in a function also slows things down (by about the same amount as incrementing a local var). PHP probably does a check to see if the global exists.
- Method invocation appears to be independent of the number of methods defined in the class because I added 10 more methods to the test class (before and after the test method) with no change in performance.
- Methods in derived classes run faster than ones defined in the base class.
- A function call with one parameter and an empty function body takes about the same time as doing 7-8 $localvar++ operations. A similar method call is of course about 15 $localvar++ operations.
Update: 11 July 2004: The above test was on PHP 4.0.4, about 3 years ago. I tested this again in PHP 4.3.3 and calling a function now takes about 20 $localvar++ operations, and calling a method takes about 30 $localvar++ operations. This could be because $localvar++ runs faster now, or because function calls are slower.
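Since these ratios change between PHP versions, it is worth re-measuring them on your own installation. The following is a rough micro-benchmark sketch for current PHP versions; the class name Bench, its method names and the iteration count are made up for illustration.

```php
<?php
// Rough micro-benchmark of three of the increment cases listed above.
$GLOBALS['counter'] = 0;

class Bench
{
    public $prop = 0;

    public function local(int $n): int
    {
        $x = 0;                       // initialised local variable
        for ($i = 0; $i < $n; $i++) { $x++; }
        return $x;
    }

    public function property(int $n): int
    {
        for ($i = 0; $i < $n; $i++) { $this->prop++; }
        return $this->prop;
    }

    public function viaGlobal(int $n): int
    {
        global $counter;
        for ($i = 0; $i < $n; $i++) { $counter++; }
        return $counter;
    }
}

$n = 100000;
$b = new Bench();
$results = [];
foreach (['local', 'property', 'viaGlobal'] as $method) {
    $start = microtime(true);
    $results[$method] = $b->$method($n);
    printf("%-10s %.4f s\n", $method, microtime(true) - $start);
}
```

Run it several times and compare the ratios rather than the absolute times.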
Summary of Tweaks
- The more you understand the software you are using (Apache, PHP, IIS, your database) and the deeper your knowledge of the operating system, networking and server hardware, the better you can perform global optimizations on your code and your system.
- Try to use as much caching as possible. Typically I would use this configuration: Squid -- PHP and memcache or file caching -- Database.
- For PHP scripts, the most expensive bottleneck is normally the CPU. If you are not getting out of memory messages, more CPUs are probably more useful than more RAM (Updated 30 Oct 2009).
- Compile PHP with the "configure --enable-inline-optimization" option to generate the fastest possible PHP executable.
- Tune your database and index the fields that are commonly used in your SQL WHERE criteria. ADOdb, the very popular database abstraction library, provides a SQL tuning mode, where you can view your invalid, expensive and suspicious SQL, their execution plans and in which PHP script the SQL was executed.
- Use HTML caching if you have data that rarely changes. Even if the data changes every minute, caching can help provided the data is synchronized with the cache. Depending on your code complexity, it can improve your performance by a factor of 10.
- Benchmark your most complex code early (or at least a prototype), so you get a feel of the expected performance before it is too late to fix. Try to use realistic amounts of test data to ensure that it scales properly.
Updated 11 July 2004: To benchmark with an execution profile of all function calls, you can try the xdebug extension. For a brief tutorial of how I use xdebug, see "squeezing code with xdebug". There are also commercial products that do this, e.g. Zend Studio.
- Consider using an opcode cache. This gives a speedup of 10-200%, depending on the complexity of your code. Make sure you do some stress tests before you install a cache because some are more reliable than others.
- Use ob_start() at the beginning of your code. This gives you a 5-15% boost in speed for free on Apache. You can also use gzip compression with ob_gzhandler() for extra fast downloads (this requires spare CPU cycles) - Updated 30 Oct 2009.
- Consider installing Zend Optimizer. This is free and does some optimizations, but be warned that some scripts actually slow down when Zend Optimizer is installed. The consensus is that Zend Optimizer is good when your code has lots of loops. Today many opcode accelerators have similar features (added this sentence 21 Oct 2003).
- Optimize your loops first. Move loop invariants (constants) outside the loop.
- Use the array and string functions where possible. They are faster than writing equivalent code in PHP.
- The fastest way to concatenate multiple small strings into one large string is to create an output buffer (ob_start) and to echo into the buffer. At the end get the contents using ob_get_contents. This works because memory allocation is normally the killer in string concatenation, and output buffering allocates a large 40K initial buffer that grows in 10K chunks. Added 22 June 2004.
- Pass objects and arrays using references in functions. Return objects and arrays as references where possible also. If this is a short script, and code maintenance is not an issue, you can consider using global variables to hold the objects or arrays.
- If you have many PHP scripts that use session variables, consider recompiling PHP using the shared memory module for sessions, or use a RAM Disk. Enable this with "configure --with-mm" then re-compile PHP, and set session.save_handler=mm in php.ini.
- For searching for substrings, the fastest code is using strpos(), followed by preg_match() and lastly ereg(). Similarly, str_replace() is faster than preg_replace(), which is faster than ereg_replace().
- Added 11 July 2004: Order large switch statements with the most frequently occurring cases on top. If some of the most common cases are in the default section, consider explicitly defining these cases at the top of the switch statement.
- For processing XML, parsing with regular expressions is significantly faster than using DOM or SAX.
- Unset() variables that are not used anymore to reduce memory usage. This is mostly useful for resources and large arrays.
- For classes with deep hierarchies, functions defined in derived classes (child classes) are invoked faster than those defined in base class (parent class). Consider replicating the most frequently used code in the base class in the derived classes too.
- Consider writing your code as a PHP extension or a Java class or a COM object if you need that extra bit of speed. Be careful of the overhead of marshalling data between COM and Java.
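The string-function advice in the list above can be checked with a short script. The sketch below shows that the plain string functions and their regular-expression counterparts give the same result for literal patterns, and times one pair. The helper timeIt() and the sample input are made up. ereg() and ereg_replace() are left out because they were removed in PHP 7.

```php
<?php
// Hypothetical timing helper: run $fn $iterations times, return seconds.
function timeIt(callable $fn, int $iterations = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) { $fn(); }
    return microtime(true) - $start;
}

$haystack = str_repeat('lorem ipsum ', 1000) . 'needle'; // made-up input

// Substring test: strpos() versus the equivalent regular expression.
$foundPlain = strpos($haystack, 'needle') !== false;
$foundRegex = preg_match('/needle/', $haystack) === 1;

// Literal replacement: str_replace() versus preg_replace().
$plain = str_replace('ipsum', 'dolor', $haystack);
$regex = preg_replace('/ipsum/', 'dolor', $haystack);

$tPlain = timeIt(function () use ($haystack) { strpos($haystack, 'needle'); });
$tRegex = timeIt(function () use ($haystack) { preg_match('/needle/', $haystack); });
printf("strpos: %.4f s, preg_match: %.4f s\n", $tPlain, $tRegex);
```

Note the strict !== false comparison with strpos(): a match at position 0 would otherwise be mistaken for "not found".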
Useless Optimizations
Some optimizations are useful. Others are a waste of time - sometimes the improvement is negligible, and sometimes the PHP internals change, rendering the tweak obsolete.
Here are some common PHP legends:
a. echo is faster than print
Echo is supposed to be faster because it doesn't return a value while print does. From my benchmarks with PHP 4.3, the difference is negligible. And under some situations, print is faster than echo (when ob_start is enabled).
b. strip off comments to speed up code
If you use an opcode cache, comments are already ignored. This is a myth from PHP3 days, when each line of PHP was interpreted at run-time.
c. 'var='.$var is faster than "var=$var"
This used to be true in PHP 4.2 and earlier. This was fixed in PHP 4.3. Note (22 June 2004): apparently the 4.3 fix reduced the overhead, but not completely. However I find the performance difference to be negligible.
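Rather than trusting any of these legends, you can time the variants yourself. Here is a minimal sketch for the quoting case, written for current PHP versions; the helper name timeLoop() and the iteration count are arbitrary choices.

```php
<?php
// Hypothetical timing helper: run $fn $iterations times, return seconds.
function timeLoop(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) { $fn(); }
    return microtime(true) - $start;
}

$var = 'value';
$concat = function () use ($var) { return 'var=' . $var; }; // concatenation
$interp = function () use ($var) { return "var=$var"; };    // interpolation

printf("concatenation: %.4f s\n", timeLoop($concat));
printf("interpolation: %.4f s\n", timeLoop($interp));
```

Both closures build the same string, so any difference in the timings is purely the cost of the quoting style.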
Do References Speed Your Code?
References do not provide any performance benefits for strings, integers and other basic data types. For example, consider the following code:
function TestRef(&$a)
{
$b = $a;
$c = $a;
}
$one = 1;
TestRef($one);
And the same code without references:
function TestNoRef($a)
{
$b = $a;
$c = $a;
}
$one = 1;
TestNoRef($one);
PHP does not actually create duplicate variables when "pass by value" is used, but uses high speed reference counting internally. So in TestRef(), $b and $c take longer to set because the references have to be tracked, while in TestNoRef(), $b and $c just point to the original value of $a, and the reference counter is incremented. So TestNoRef() will execute faster than TestRef().
In contrast, functions that accept array and object parameters have a performance advantage when references are used. This is because arrays and objects do not use reference counting, so multiple copies of an array or object are created if "pass by value" is used. So the following code:
function ObjRef(&$o)
{
    $a = $o->name;
}
is faster than:
function ObjRef($o)
{
    $a = $o->name;
}
Note: In PHP 5, all objects are passed by reference automatically, without the need of an explicit & in the parameter list. PHP 5 object performance should be significantly faster.
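Because the internals have changed a great deal since PHP 4, it is safer to measure than to assume. The sketch below times the same read-only array function with and without a reference parameter on whatever PHP version you run. It makes no claim about which variant wins; the function names and data size are made up.

```php
<?php
// Read-only access to an array, passed by value and by reference.
function sumByValue(array $data): int
{
    $total = 0;
    foreach ($data as $v) { $total += $v; }
    return $total;
}

function sumByRef(array &$data): int
{
    $total = 0;
    foreach ($data as $v) { $total += $v; }
    return $total;
}

$data = range(1, 10000); // made-up test data

$start = microtime(true);
for ($i = 0; $i < 200; $i++) { $byValue = sumByValue($data); }
$tValue = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < 200; $i++) { $byRef = sumByRef($data); }
$tRef = microtime(true) - $start;

printf("by value: %.4f s, by reference: %.4f s\n", $tValue, $tRef);
```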
Many thanks also to Andrei Zmievski for reviewing this article.