Improving mod_perl Sites' Performance: Part 6
Forking
by Stas Bekman
January 07, 2003
It's desirable to avoid forking under mod_perl, as when you do, you are forking the entire Apache server -- lock, stock and barrel. Not only are your Perl code and Perl interpreter being duplicated, but so are mod_ssl, mod_rewrite, mod_log, mod_proxy, mod_speling (it's not a typo!) or whatever other modules you have used in your server, as well as all the core routines.
Modern operating systems come with a lightweight version of fork, which adds only a little overhead when called, since it is optimized to do the absolute minimum of memory-page duplication. The copy-on-write technique is what allows it to do so. The gist of this technique is as follows: the parent process' memory pages aren't immediately copied to the child's space on fork(); this happens only when the child or the parent modifies the data in some memory page. Before a shared page gets modified, it is marked as dirty, and the process that modifies it gets its own private copy of that page, since it cannot be shared any more.
If you need to call a Perl program from your mod_perl code, then it's better to try to convert the program into a module and call it as a function, without spawning a special process to do that. Of course, if you cannot do that or the program is not written in Perl, then you have to call it via system() or its equivalent, which spawns a new process. If the program is written in C, then you can try to write Perl glue code with the help of XS or SWIG, and then the program will be executed as a Perl subroutine.
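For example, here is a minimal sketch of the difference; MyApp::Report, its generate() function, and the script path are hypothetical names used only for illustration:
# spawning a whole new process for every request:
#   system('/usr/local/bin/make_report.pl', $user) == 0
#       or die "system() failed: $?";

# calling the same logic as a plain function, once the script
# has been converted into a module:
use MyApp::Report ();
my $user   = 'stas';
my $report = MyApp::Report::generate($user);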
Also, by trying to spawn a sub-process, you might be trying to do the "wrong thing". If what you really want is to send information to the browser and then do some post-processing, then look into the PerlCleanupHandler directive. The latter allows you to run code in the child process after the request has been processed and the user has received the response. This doesn't release the mod_perl process to serve other requests, but it allows you to send the response to the client faster. If this is your situation and you need to run some cleanup code, then you may want to register this code during the request-processing stage like so:
my $r = shift;
$r->register_cleanup(\&do_cleanup);
sub do_cleanup {
    # some clean-up code here
}
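Alternatively, the cleanup code can be installed from the configuration file with the PerlCleanupHandler directive itself. A minimal sketch, where MyApp::Cleanup is a hypothetical module whose handler() contains the post-processing code:
# httpd.conf
PerlCleanupHandler MyApp::Cleanup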
But when a long-term process needs to be spawned, there is not much choice but to use fork(). We cannot just run this process within the Apache process because it'll keep the Apache process busy, instead of allowing it to do the job it was designed to do. Also, if Apache stops, then the long-term process might be terminated as well unless coded properly to detach from Apache's process group.
In the following sections, I'm going to discuss how to properly spawn new processes under mod_perl.
Forking a New Process
This is a typical way to call fork() under mod_perl:
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    # Parent runs this block
} else {
    # Child runs this block
    # some code comes here
    CORE::exit(0);
}
# possibly more code here, usually run by the parent
When using fork(), you should check its return value, because if it returns undef, it means that the call was unsuccessful and no process was spawned -- something that can happen when the system is running too many processes and cannot spawn new ones.
When the process is successfully forked, the parent receives the PID of the newly spawned child as the return value of the fork() call, and the child receives 0. Now the program splits into two. In the above example, the code inside the first block after if will be executed by the parent, and the code inside the first block after else will be executed by the child process.
It's important not to forget to explicitly call exit() at the end of the child code when forking -- if you don't, and there is some code outside the if/else block, then the child process will execute it as well. But under mod_perl there is another nuance: you must use CORE::exit() and not exit(), which would be automatically overridden by Apache::exit() if used in conjunction with Apache::Registry and similar modules. We actually do want the spawned process to quit when its work is done; otherwise, it'll just stay alive, use resources and do nothing.
The parent process usually completes its execution path and enters the pool of free servers to wait for a new assignment. If the execution path is to be aborted earlier for some reason, then one should use Apache::exit() or die(). In the case of Apache::Registry or Apache::PerlRun handlers, a simple exit() will do the correct thing.
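For instance, a minimal sketch of aborting early in the parent of an Apache::Registry script; $error is a hypothetical condition used only for illustration:
# $r is the Apache request object, obtained earlier with: my $r = shift;
if ($error) {
    $r->log_error("aborting early: $error");
    exit;    # aliased to Apache::exit() under Apache::Registry
}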
The child shares its memory pages with the parent until it has to modify some of them, which triggers a copy-on-write process that copies these pages to the child's domain before the child is allowed to modify them. But this all happens afterward. At the moment the fork() call is executed, the only work to be done before the child process goes on its separate way is to set up the page tables for the virtual memory, which imposes almost no delay at all.
Freeing the Parent Process
In the child code, you must also close all pipes to the connection socket that were opened by the parent process (i.e., STDIN and STDOUT) and inherited by the child, so the parent will be able to complete the request and free itself for serving other requests. If you need the STDIN and/or STDOUT streams, then you should reopen them. You may need to close or reopen the STDERR filehandle. It's opened to append to the error_log file, as inherited from its parent, so chances are that you will want to leave it untouched.
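For instance, a minimal sketch of reopening the streams onto /dev/null in the child, so that the filehandles stay valid but the connection to the client is released (STDERR is left alone so it keeps appending to the error_log):
# in the child branch, right after fork():
close STDIN;
close STDOUT;
open STDIN,  '</dev/null' or die "Can't reopen STDIN: $!";
open STDOUT, '>/dev/null' or die "Can't reopen STDOUT: $!";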
Under mod_perl, the spawned process also inherits a file descriptor that's tied to the socket through which all the communication between the server and the client occurs. Therefore, we need to free this stream in the forked process. If we don't do that, then the server cannot be restarted while the spawned process is still running. If an attempt is made to restart the server, then you will get the following error:
[Mon Dec 11 19:04:13 2000] [crit]
(98)Address already in use: make_sock:
could not bind to address 127.0.0.1 port 8000
Apache::SubProcess comes to our aid and provides the cleanup_for_exec() method, which takes care of closing this file descriptor.
So the simplest way to free the parent process is to close all three STD* streams if we don't need them, and untie the Apache socket. In addition, you may want to change the process' current directory to /, so the forked process won't keep the mounted partition busy if it is to be unmounted at a later time. To summarize all these issues, here is an example of a fork that takes care of freeing the parent process:
use Apache::SubProcess;
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    # Parent runs this block
} else {
    # Child runs this block
    $r->cleanup_for_exec(); # untie the socket
    chdir '/' or die "Can't chdir to /: $!";
    close STDIN;
    close STDOUT;
    close STDERR;
    # some code comes here
    CORE::exit(0);
}
# possibly more code here, usually run by the parent
Of course, the real code is to be placed between the code that frees the parent and the child process' termination.
Detaching the Forked Process
Now what happens if the forked process is running and we decide that we need to restart the Web server? This forked process will be aborted, since when the parent process dies during the restart, it'll kill its child processes as well. In order to avoid this, we need to detach the process from its parent session by opening a new session. We do this with the help of the setsid() system call, provided by the POSIX module:
use POSIX 'setsid';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    # Parent runs this block
} else {
    # Child runs this block
    setsid or die "Can't start a new session: $!";
    ...
}
Now the spawned child process has a life of its own, and it doesn't depend on the parent any longer.
Avoiding Zombie Processes
Now let's talk about zombie processes.
Normally, every process has a parent. Many processes are children of the init process, whose PID is 1. When you fork a process, you must wait() or waitpid() for it to finish. If you don't wait() for it, then it becomes a zombie.
A zombie is a child process that has terminated but whose exit status has not been collected. When the child quits, it reports its termination to its parent; if the parent never wait()s to collect the exit status, then the dead child remains in the process table as a ghost process. It shows up in the process list, but it cannot be killed. It will go away only when you stop the parent process that spawned it, so that init can adopt and reap it!
Generally, the ps(1) utility displays these processes with the <defunct> tag, and you will see the zombies counter increment when doing top(). These zombie processes can take up system resources and are generally undesirable.
So the proper way to do a fork is:
my $r = shift;
$r->send_http_header('text/plain');
defined (my $kid = fork) or die "Cannot fork: $!";
if ($kid) {
    waitpid($kid, 0);
    print "Parent has finished\n";
} else {
    # do something
    CORE::exit(0);
}
In most cases, the only reason you would want to fork is when you need to spawn a process that will take a long time to complete. So if the Apache process that spawns this new child process has to wait for it to finish, then you have gained nothing. You can neither wait for its completion (because you don't have the time to), nor continue without waiting, because then you will get yet another zombie process. This is called a blocking call, since the process is blocked from doing anything else until the call completes.
The simplest solution is to ignore your dead children. Just add this line before the fork() call:
$SIG{CHLD} = 'IGNORE';
When you set the CHLD (SIGCHLD in C) signal handler to 'IGNORE', terminated child processes are reaped automatically by the system and are therefore prevented from becoming zombies. This doesn't work everywhere, however; it has proven to work at least on Linux.
Note that you cannot localize this setting with local(); if you do, then it won't have the desired effect.
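To make the pitfall concrete, here is a sketch of what not to do (the surrounding handler() is hypothetical):
sub handler {
    my $r = shift;
    # WRONG: local() restores the previous SIGCHLD disposition as soon
    # as handler() returns, which may well be before the child exits,
    # so the 'IGNORE' setting is likely gone when the signal arrives:
    local $SIG{CHLD} = 'IGNORE';
    # ... fork() here ...
    return;
}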
So now the code would look like this:
my $r = shift;
$r->send_http_header('text/plain');
$SIG{CHLD} = 'IGNORE';
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    print "Parent has finished\n";
} else {
    # do something time-consuming
    CORE::exit(0);
}
Note that the waitpid() call is gone. The $SIG{CHLD} = 'IGNORE'; statement protects us from zombies, as explained above.
Another, more portable but slightly more expensive, solution is to use a double-fork approach:
my $r = shift;
$r->send_http_header('text/plain');
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    waitpid($kid, 0);
} else {
    defined (my $grandkid = fork) or die "Kid cannot fork: $!\n";
    if ($grandkid) {
        CORE::exit(0);
    } else {
        # code here
        # do something long lasting
        CORE::exit(0);
    }
}
$grandkid becomes a "child of init", i.e. the child of the process whose PID is 1.
Note that the previous two solutions don't allow you to know the exit status of the process, but in my example I didn't care about it.
Another solution is to use a different SIGCHLD handler:
use POSIX 'WNOHANG';
$SIG{CHLD} = sub { while( waitpid(-1,WNOHANG)>0 ) {} };
This is useful when you fork() more than one process. The handler could call wait() as well, but for a variety of reasons involving the handling of stopped processes and the rare event when two children exit at nearly the same moment, the best technique is to call waitpid() in a tight loop with a first argument of -1 and a second argument of WNOHANG. Together, these arguments tell waitpid() to reap the next child that's available, and prevent the call from blocking if there happens to be no child ready for reaping. The handler will loop until waitpid() returns a negative number or zero, indicating that no additional reapable children remain.
While you test and debug your code that uses one of the above examples, you might want to write some debug information to the error_log file so you know what happens.
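A minimal sketch of such debugging output -- under mod_perl, warn() messages end up in the error_log, and $$ holds the current process ID:
warn "[$$] about to fork\n";
defined (my $kid = fork) or die "Cannot fork: $!\n";
if ($kid) {
    warn "[$$] parent: spawned child $kid\n";
} else {
    warn "[$$] child: starting the long-running job\n";
    # ... the actual work ...
    CORE::exit(0);
}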
Read the perlipc manpage for more information about signal handlers.