MOD_PERL_TUNING(1)
NAME
mod_perl_tuning - mod_perl performance tuning
DESCRIPTION
Described here are examples and hints on how to configure
a mod_perl enabled Apache server, concentrating on tips
for configuration for high-speed performance. The primary
way to achieve maximal performance is to reduce the
resources consumed by the mod_perl enabled HTTPD
processes.
This document assumes familiarity with Apache
configuration directives, some familiarity with the
mod_perl configuration directives, and that you have
already built and installed a mod_perl enabled Apache
server. Please also read the mod_perl documentation that
comes with mod_perl for programming tips. Some
configurations below use features from mod_perl version
1.03, which were not present in earlier versions.
These performance tuning hints are collected from my
experiences in setting up and running servers for handling
large promotional sites, such as The Weather Channel's
"Blimp Site-ings" game, the MSIE 4.0 "Subscribe to Win"
game, and the MSN Million Dollar Madness game.
BASIC CONFIGURATION
The basic configuration for mod_perl is as follows. In
the httpd.conf file, I add configuration parameters to
make the http://www.domain.com/programs URL be the base
location for all mod_perl programs. Thus, access to
http://www.domain.com/programs/printenv will run the
printenv script, as we'll see below. Also, any *.perl
file will be interpreted as a mod_perl program just as if
it were in the programs directory, and *.rperl will be
mod_perl, but without any HTTP headers automatically sent;
you must do this explicitly. If you don't want these last
two mappings, just leave them out of your configuration.
In the configuration files, I use /var/www as the
ServerRoot directory, and /var/www/docs as the
DocumentRoot. You will need to change it to match your
particular setup. The network address below in the access
to perl-status should also be changed to match yours.
Additions to httpd.conf:
# put mod_perl programs here
# startup.perl loads all functions that we want to use within mod_perl
PerlRequire /var/www/perllib/startup.perl
<Directory /var/www/docs/programs>
AllowOverride None
Options ExecCGI
SetHandler perl-script
PerlHandler Apache::Registry
PerlSendHeader On
</Directory>
# like above, but PerlSendHeader is off
<Directory /var/www/docs/rprograms>
AllowOverride None
Options ExecCGI
SetHandler perl-script
PerlHandler Apache::Registry
PerlSendHeader Off
</Directory>
# allow arbitrary *.perl files to be scattered throughout the site.
<Files *.perl>
SetHandler perl-script
PerlHandler Apache::Registry
PerlSendHeader On
Options +ExecCGI
</Files>
# like *.perl, but do not send HTTP headers
<Files *.rperl>
SetHandler perl-script
PerlHandler Apache::Registry
PerlSendHeader Off
Options +ExecCGI
</Files>
<Location /perl-status>
SetHandler perl-script
PerlHandler Apache::Status
order deny,allow
deny from all
allow from 204.117.82.
</Location>
Now, you'll notice that I use a PerlRequire directive to
load in the file startup.perl. In that file, I include
all of the use statements that occur in any of my mod_perl
programs (either from the programs directory, or the
*.perl files). Here is an example:
#! /usr/local/bin/perl
use strict;
# load up necessary perl function modules to be able to call from Perl-SSI
# files. These objects are reloaded upon server restart (SIGHUP or SIGUSR1)
# if PerlFreshRestart is "On" in httpd.conf (as of mod_perl 1.03).
# only library-type routines should go in this directory.
use lib "/var/www/perllib";
# make sure we are in a sane environment.
$ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/ or die "GATEWAY_INTERFACE not Perl!";
use Apache::Registry; # for things in the "/programs" URL
# pull in things we will use in most requests so it is read and compiled
# exactly once
use CGI::Base ();
use CGI::BasePlus ();
use CGI::Request ();
use CGI::Form ();
use Mysql ();
1;
What this does is pull in all of the code used by the
programs (but does not import any of the module methods)
into the main HTTPD process, which then creates the child
processes with the code already in place. You can also
put any new modules you like into the /var/www/perllib
directory and simply use them in your programs. There is
no need to put use lib "/var/www/perllib"; in all of your
programs. You do, however, still need to use the modules
in your programs. Perl is smart enough to know it doesn't
need to recompile the code, but it does need to import the
module methods into your program's name space.
If you only have a few modules to load, you can use the
PerlModule directive to pre-load them with the same
effect.
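For example, if the list is short, lines like these in
httpd.conf (the module names here are just the ones from the
startup.perl example above) pre-load with the same effect:

```apache
# pre-load a few modules without a startup script
PerlModule CGI::Request
PerlModule Mysql
```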
The biggest benefit here is that the child process never
needs to recompile the code, so it is faster to start, and
the child process actually shares the same physical copy
of the code in memory due to the way the virtual memory
system in modern operating systems works.
You will want to replace the use lines above with modules
you actually need.
Simple Test Program
Here's a sample script called printenv that you can stick
in the programs directory to test the functionality of the
configuration.
#! /usr/local/bin/perl
use strict;
# print the environment in a mod_perl program under Apache::Registry
print "Content-type: text/html\n\n";
print "<HEAD><TITLE>Apache::Registry Environment</TITLE></HEAD>\n";
print "<BODY><PRE>\n";
print map { "$_ = $ENV{$_}\n" } sort keys %ENV;
print "</PRE></BODY>\n";
When you run this, check the value of the
GATEWAY_INTERFACE variable to see that you are indeed
running mod_perl.
REDUCING MEMORY USE
As a side effect of using mod_perl, your HTTPD processes
will be larger than without it. There is just no way
around it, as you have this extra code to support your
added functionality.
On a very busy site, the number of HTTPD processes can
grow to be quite large. For example, on one large site,
the typical HTTPD was about 5MB in size. With 30 of these,
all of RAM was exhausted, and we started to go to swap.
With 60 of these, swapping turned into thrashing, and the
whole machine slowed to a crawl.
To reduce thrashing, you must limit the maximum number of
HTTPD processes to a number just larger than what will
fit into RAM (in this case, 45). The
drawback is that when the server is serving 45 requests,
new requests will queue up and wait; however, if you let
the maximum number of processes grow, the new requests
will start to get served right away, but they will take
much longer to complete.
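With Apache's standard directives, capping the process count
might look like this in httpd.conf (45 is the figure from the
example above; tune it to what fits in your RAM):

```apache
# never fork more processes than fit in RAM
MaxClients      45
# keep a modest pool of idle servers ready for new requests
MinSpareServers  5
MaxSpareServers 10
```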
One way to reduce the amount of real memory taken up by
each process is to pre-load commonly used modules into the
primary HTTPD process so that the code is shared by all
processes. This is accomplished by inserting the use Foo
(); lines into the startup.perl file for any use Foo;
statement in any commonly used Registry program. The idea
is that the operating system's VM subsystem will share the
data across the processes.
You can also pre-load Apache::Registry programs using the
Apache::RegistryLoader module so that the code for these
programs is shared by all HTTPD processes as well.
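A startup.perl fragment along these lines pre-compiles a
Registry program in the parent process (the script path is the
printenv example from above; check the Apache::RegistryLoader
documentation for the exact calling conventions):

```perl
use Apache::RegistryLoader ();

# compile the script once in the parent so all children share it
Apache::RegistryLoader->new->handler(
    "/programs/printenv",               # URI the script is served as
    "/var/www/docs/programs/printenv"   # file to compile
);
```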
NOTE: When you pre-load modules in the startup script, you
will need to kill and restart HTTPD for changes to take
effect. A simple kill -HUP or kill -USR1 will not reload
that code unless you have set the PerlFreshRestart
configuration parameter in httpd.conf to be "On".
REDUCING THE NUMBER OF LARGE PROCESSES
Unfortunately, simply reducing the size of each HTTPD
process is not enough on a very busy site. You also need
to reduce the quantity of these processes. This reduces
memory consumption even more, and results in fewer
processes fighting for the attention of the CPU. If you
can reduce the quantity of processes to fit into RAM, your
response time improves even more.
The idea of the techniques outlined below is to offload
the normal document delivery (such as HTML and GIF files)
from the mod_perl HTTPD, and let it only handle the
mod_perl requests. This way, your large mod_perl HTTPD
processes are not tied up delivering simple content when a
smaller process could perform the same job more
efficiently.
In the techniques below where there are two HTTPD
configurations, the same httpd executable can be used for
both configurations; there is no need to build HTTPD both
with and without mod_perl compiled into it.
These approaches work best when most of the requests are
for static content rather than mod_perl programs. Log
file analysis becomes a bit of a challenge when you have
multiple servers running on the same host, since you must
log to different files.
TWO MACHINES
The simplest way is to put all static content on one
machine, and all mod_perl programs on another. The only
trick is to make sure all links are properly coded to
refer to the proper host. The static content will be
served up by lots of small HTTPD processes (configured not
to use mod_perl), and the relatively few mod_perl requests
can be handled by the smaller number of large HTTPD
processes on the other machine.
The drawback is that you must maintain two machines, and
this can get expensive. For extremely large projects,
this is the best way to go.
TWO IP ADDRESSES
Similar to above, but one HTTPD runs bound to one IP
address, while the other runs bound to another IP address.
The only difference is that one machine runs both servers.
Total memory usage is reduced because the majority of
files are served by the smaller HTTPD processes, so there
are fewer large mod_perl HTTPD processes sitting around.
This is accomplished using the httpd.conf directive
BindAddress to make each HTTPD respond only to one IP
address on this host. One will have mod_perl enabled, and
the other will not.
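For instance, the two httpd.conf files might differ only in
lines like these (the addresses are placeholders for your own):

```apache
# httpd.conf for the small, static-content server
BindAddress 10.0.0.1
Port 80

# httpd.conf for the mod_perl server
BindAddress 10.0.0.2
Port 80
```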
TWO PORT NUMBERS
If you cannot get two IP addresses, you can also split the
HTTPD processes as above by putting one on the standard
port 80, and the other on some other port, such as 8042.
The only configuration changes will be the Port and log
file directives in the httpd.conf file (and also one of
them does not have any mod_perl directives).
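A sketch of the mod_perl server's httpd.conf changes (the log
file names are illustrative; any distinct paths will do):

```apache
# mod_perl server: alternate port and its own log files
Port 8042
ErrorLog    /var/www/logs/error_log.modperl
TransferLog /var/www/logs/access_log.modperl
```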
The major flaw with this scheme is that some firewalls
will not allow access to the server running on the
alternate port, so some people will not be able to access
all of your pages.
If you use this approach or the one above with dual IP
addresses, you probably do not want to have the *.perl and
*.rperl sections from the sample configuration above, as
this would require that your primary HTTPD server be
mod_perl enabled as well.
Thanks to Gerd Knops for this idea.
USING ProxyPass WITH TWO SERVERS
To overcome the limitation of the alternate port above,
you can use dual Apache HTTPD servers with just a slight
difference in configuration. Essentially, you set up two
servers just as you would with the two-ports-on-one-IP-
address method above. However, in your primary HTTPD
configuration you add a line like this:
ProxyPass /programs http://localhost:8042/programs
where your mod_perl enabled HTTPD is running on port 8042
and serves only the programs directory within its
DocumentRoot. This assumes that you have included the
mod_proxy module in your server when it was built.
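A minimal sketch of the relevant primary-server lines, assuming
mod_proxy is compiled in:

```apache
# act as a reverse proxy only, not an open forward proxy
ProxyRequests Off
# relay mod_perl requests to the back-end server on port 8042
ProxyPass /programs http://localhost:8042/programs
```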
Now, when you access
http://www.domain.com/programs/printenv it will internally
be passed through to your HTTPD running on port 8042 as
the URL http://localhost:8042/programs/printenv and the
result relayed back transparently. To the client, it all
seems as if it is just one server running. This can also
be used on the dual-host version to hide the second server
from view if desired.
Thanks to Bowen Dwelle for this idea.
SQUID ACCELERATOR
Another approach to reducing the number of large HTTPD
processes on one machine is to use an accelerator such as
Squid (http://squid.nlanr.net/Squid/) between the clients
and your large mod_perl HTTPD processes. The idea here is
that squid will handle the static objects from its cache
while the HTTPD processes will handle mostly just the
mod_perl requests. This reduces the number of HTTPD
processes and thus reduces the amount of memory used.
To set this up, just install the current version of Squid
(at this writing, this is version 1.1.16) and use the
RunAccel script to start it. You will need to reconfigure
your HTTPD to use an alternate port, such as 8042, rather
than its default port 80. To do this, just change the
httpd.conf line Port to match the port specified in the
squid.conf file. Your URLs do not need to change.
In the squid.conf file, you will probably want to add
programs and perl to the cache_stoplist parameter so that
these are always passed through to the HTTPD server under
the assumption that they always produce different results.
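In squid.conf the relevant line might look like this (the
cache_stoplist parameter takes a list of words matched against
URLs; cgi-bin and ? are in Squid's default list):

```
# never cache URLs containing these words; always pass to the HTTPD
cache_stoplist cgi-bin ? programs perl
```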
This is very similar to the two port, ProxyPass version
above, but the Squid cache may be more flexible to fine
tune for dynamic documents that do not change on every
view. The Squid proxy server also seems to be more stable
and robust than the Apache 1.2.4 proxy module.
SUMMARY
To gain maximal performance of mod_perl on a busy site,
one must reduce the amount of resources used by the HTTPD
to fit within what the machine has available. The best
way to do this is to reduce memory usage. If your
mod_perl requests are fewer than your static page
requests, then splitting the servers into mod_perl and
non-mod_perl versions further allows you to tune the
amount of resources used by each type of request. Using
the ProxyPass directive allows these multiple servers to
appear as one to the users. Using the Squid accelerator
also achieves this effect, but Squid takes care of
deciding when to access the large server automatically.
If all of your requests require processing by mod_perl,
then the only thing you can really do is throw a lot of
memory on your machine and try to tweak the perl code to
be as small and lean as possible, and to share the virtual
memory pages by pre-loading the code.
AUTHOR
Vivek Khera. My email is MyFirstName@MyLastName.org. I'd
spell it out for you, but if this ends up on the net, I
don't want some automatic mass-mail address collector to
find me.
This document is copyright 1997 by Vivek Khera.
If you have contributions for this document, please send
them to me or post them to the mailing list. Perl POD
format is best, but plain text will do, too.
If you need assistance, contact the mod_perl mailing list
at modperl@LISTPROC.ITRIBE.NET first. There are lots of
people there that can help. Also, check the web pages
http://perl.apache.org/ and http://www.apache.org/ for
explanations of the configuration options.
$Revision: 1.6 $ $Date: 1998/03/19 23:08:22 $