17th Feb 2017
UNIX Concepts: Zombies
UNIX: Zombies, and the Killing of Parents and Children
Zombies(5) UNIX System V (Concepts) Zombies(5)
NAME
Defunct, zombie and immortal processes
DESCRIPTION
When a process dies, it becomes a zombie (almost dead)
process whose only remaining purpose is to hold its death
certificate (the exit status data returned by the wait
family of system calls). When the death certificate has
been collected, the process is finally removed from
existence and from the system's process table. Zombie
processes are marked as <defunct> in ps listings.
If the parent of a child has not disowned the child and the
parent dies before collecting the child's death certificate,
the child is sent to the state orphanage. As long as the
parent is alive and has not disowned the child, a child that
dies remains around as a zombie until the parent finally
collects its death certificate. The state orphanage, process
1, a.k.a. /etc/init, is the second process created after the
system is booted and has several principal functions:
starting, and in some cases maintaining, the system daemons
and waiting for its children to die. It is also given the
job of waiting for the deaths of orphaned children, which
allows zombie children to be put to rest.
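The life cycle described above can be watched from user space. Below is a minimal Python sketch. It assumes a Linux system, where /proc/<pid>/stat reports a process's one-letter state ('Z' for a zombie). The helper names `proc_state` and `zombie_lifecycle` are hypothetical.

The parent forks a child that dies at once and observes it as a zombie. It then collects the death certificate with `os.waitpid`, after which the process table entry is gone.

```python
import os
import signal
import time


def proc_state(pid):
    """Return the one-letter state of pid from /proc (Linux only), or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    # The command name is in parentheses and may contain spaces,
    # so the state letter sits two characters after the last ')'.
    return data[data.rindex(")") + 2]


def zombie_lifecycle(exit_code=7, timeout=5.0):
    """Fork a child, watch it become a zombie, then collect its death certificate."""
    # A zombie only lingers if SIGCHLD is not set to be ignored.
    old = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: die immediately
        # Parent: poll, without reaping, until the child is a zombie.
        deadline = time.monotonic() + timeout
        state = proc_state(pid)
        while state != "Z":
            if time.monotonic() > deadline:
                raise TimeoutError("child never became a zombie")
            time.sleep(0.01)
            state = proc_state(pid)
        # Collect the death certificate; this removes the zombie.
        _, status = os.waitpid(pid, 0)
        return state, os.WEXITSTATUS(status), proc_state(pid)
    finally:
        if old is not None:
            signal.signal(signal.SIGCHLD, old)


if __name__ == "__main__":
    print(zombie_lifecycle())  # ('Z', 7, None) on Linux
```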
As an aside, when the system is booted, the boot loader
copies the kernel into memory, creates a stack and calls the
kernel's main procedure which, in turn, makes itself into
process 0 and forks itself; that child, process 1, executes
/etc/init. In parallel with /etc/init starting the system and
the system daemons, process 0 may continue to fork and
execute portions of the kernel as asynchronous processes.
Process 1 and other processes with process 0 as their
parent may be protected from being sent a KILL signal.
Processes waiting at very high priorities cannot be killed.
A signal is first only posted to the kernel control data of
the process; the remaining processing, including the possible
jump to process termination, occurs only at lower
priorities, that is, below PZERO. In fact, the final
processing of a signal within a process occurs just as the
process is being readied for return to user state. On a
multiple CPU system, if the signaling process and the
signaled process are running on different CPUs and the
signaled process is running in user state, the signaling
CPU interrupts the signaled CPU so that the signal can be
processed for the signaled process. If the process never
reaches this point of return to user state, the signal is
never acted on (in the case of the KILL signal, the process
is never killed).
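For contrast, consider an ordinary process in an interruptible sleep, loosely the modern analogue of waiting below PZERO. When signaled, it does head back toward user state, so KILL takes effect.

Below is a minimal Python sketch assuming a POSIX system. The name `kill_sleeping_child` is hypothetical.

```python
import os
import signal
import time


def kill_sleeping_child():
    """Fork a child that sleeps interruptibly, send it KILL, and report how it died."""
    pid = os.fork()
    if pid == 0:
        time.sleep(60)  # an interruptible sleep, not a high-priority kernel wait
        os._exit(0)     # never reached once the KILL signal is acted on
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    # (terminated by a signal?, which signal)
    return os.WIFSIGNALED(status), os.WTERMSIG(status)


if __name__ == "__main__":
    print(kill_sleeping_child())  # (True, 9)
```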
Page 1 (printed 12/13/97)
A zombie process, since it is already almost dead, cannot
be killed further.
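This can be checked by sending KILL to a zombie and then reading its death certificate, which still records the original, normal exit. Below is a minimal Python sketch with these assumptions:

* Linux, or another system providing `os.waitid` with `WNOWAIT`. This lets the parent wait for the death without collecting the certificate.
* SIGCHLD is not ignored.

The name `kill_a_zombie` is hypothetical.

```python
import os
import signal


def kill_a_zombie(exit_code=5):
    """Send KILL to a zombie and show that its death certificate is unchanged."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: die with a known status
    # Block until the child has died, but leave it waitable (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    # The zombie still exists, so kill() succeeds, yet there is nothing left to kill.
    os.kill(pid, signal.SIGKILL)
    # The death certificate still records a normal exit, not death by KILL.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(kill_a_zombie())  # (True, 5)
```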
A slightly more technical presentation of the above material:
If the parent of a forked (or sproced) child did not have
SIGCHLD set to the ignore signal condition, and the parent
exits or is terminated by the system before it has issued
one of the wait system calls and retrieved the ending status
of the child, the parentage of the child process is
reassigned to process 1 (/etc/init in most cases). As long
as the parent is alive and SIGCHLD is not set to the ignore
signal condition, the process struct of the terminated
process is retained in the kernel so that the ending status
of the child can be returned if and when a wait system call
is issued for the child process. After process 1
(/etc/init) has initialized the non-kernel functions of the
operating system, it loops on the wait system call. When an
orphan dies, process 1 receives and discards its ending
status; this releases the process struct of the terminated
process.
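The role of the SIGCHLD ignore condition can be checked directly. Below is a minimal Python sketch assuming a POSIX system with System V style semantics, which Linux follows. With SIGCHLD ignored, a dead child leaves no waitable death certificate. A wait call then fails with ECHILD, which Python raises as `ChildProcessError`. The name `child_left_zombie` is hypothetical.

```python
import os
import signal


def child_left_zombie(ignore_sigchld):
    """Fork a child that exits at once; report whether its death certificate was kept."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    old = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: die immediately
        try:
            # With SIGCHLD ignored, this blocks until the child is gone,
            # then fails because no ending status was retained.
            os.waitpid(pid, 0)
            return True
        except ChildProcessError:
            return False
    finally:
        if old is not None:
            signal.signal(signal.SIGCHLD, old)


if __name__ == "__main__":
    print(child_left_zombie(False), child_left_zombie(True))  # True False
```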
/etc/init also watches for the deaths of its own children
so that it can start other processes that depend on that
termination, or restart another copy of the process that
just terminated. For example, the historical login
processing is: init forks itself and execs getty; the getty
program (in the child) waits for the communications port to
open; getty emits the ``login: '' prompt; getty execs login
on top of itself; login authenticates the user, initializes
the user's uid, gid, current and root directories, etc., and
execs the user's login shell on top of itself. When that
process (now the login shell) terminates, /etc/init receives
its ending status (init is the parent), forks itself and
execs getty again, and the cycle repeats.
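The wait-and-respawn loop described for /etc/init can be sketched in a few lines. This is a toy model in Python, not init's actual code. `spawn_service` is a hypothetical stand-in for the fork-and-exec of getty: its child simply lives briefly and exits. The loop stops after a fixed number of spawns so that it terminates.

```python
import os
import time


def spawn_service(lifetime):
    """Hypothetical stand-in for init's fork+exec of getty: the child lives briefly."""
    pid = os.fork()
    if pid == 0:
        time.sleep(lifetime)
        os._exit(0)
    return pid


def init_loop(max_spawns=3, lifetime=0.05):
    """Loop on wait(); respawn the service when it dies, discard anyone else's status."""
    service_pid = spawn_service(lifetime)
    spawned, reaped = [service_pid], []
    while True:
        pid, _status = os.wait()  # blocks until any child dies
        reaped.append(pid)
        if pid != service_pid:
            continue  # some other child: its zombie is released, status ignored
        if len(spawned) >= max_spawns:
            return spawned, reaped
        service_pid = spawn_service(lifetime)  # restart the service, like a getty
        spawned.append(service_pid)
```

Here `os.wait()` collects any child of the calling process. In a real init, which inherits orphans, that same call is what also puts orphaned zombies to rest.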
No, there is not a ``fix''. That is the way it is designed
to work. It has been this way at least as far back as
Release 3 UNIX (and Release 6 was the first version to
officially escape Bell Labs).
Immortal processes (except those specifically protected by
the kernel, that is, those processes whose parent is 0) are
caused by the process waiting for an event (usually I/O
related) at a very high priority (typically described as
waiting above PZERO). Since such processes usually have
critical system resources locked, breaking the lock in a
manner that does not release those resources could be a
major disaster.
A zombie is immortal. An immortal process is not
necessarily a zombie. A zombie or defunct process is the
death certificate of a process that has already terminated.
The only system resource being consumed by it is the process
block used to store its termination status until the parent
process asks for the exit status with a wait(2) family
system call.
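Within your own programs, the usual way to keep zombies from accumulating is for the parent to collect death certificates promptly. Below is a minimal Python sketch assuming POSIX `waitpid` with the `WNOHANG` option. `reap_finished_children` is a hypothetical helper. It collects every child that has already died without blocking on those still running, and could be called periodically or from a SIGCHLD handler.

```python
import os


def reap_finished_children():
    """Collect the death certificate of every child that has already died, without blocking."""
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has died yet
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)  # negative: killed by that signal
    return reaped
```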
When the parent finally dies, any surviving children,
including the zombies, are reassigned to the system
orphanage, process 1. Process 1, /etc/init, is the system
reaper of orphaned children as well as of its own. The other
purposes of init are system startup and shutdown and the
respawning (restarting, if you wish) of system services
such as gettys.
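Reassignment of orphans can be observed with a grandchild that outlives its parent. Below is a minimal Python sketch, with one caveat. On a classic System V system the new parent reported would be 1. On modern Linux, an orphan may instead be adopted by a designated "subreaper" process. For that reason the sketch only checks that the parent changed. The name `orphan_new_parent` is hypothetical.

```python
import os
import time


def orphan_new_parent(timeout=5.0):
    """Return (pid of the dead parent, pid of the orphan's new parent)."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: fork a grandchild, then exit without waiting for it.
        original_parent = os.getpid()
        if os.fork() == 0:
            os.close(r)
            # Grandchild: wait until its parent has died and it has been reassigned.
            deadline = time.monotonic() + timeout
            while os.getppid() == original_parent and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(w, f"{original_parent} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)
    os.close(w)
    os.waitpid(middle, 0)  # collect the middle process, our own child
    chunks = []
    while chunk := os.read(r, 64):  # read until the grandchild closes the pipe
        chunks.append(chunk)
    os.close(r)
    old_parent, new_parent = map(int, b"".join(chunks).split())
    return old_parent, new_parent


if __name__ == "__main__":
    print(orphan_new_parent())
```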
If zombied orphans survive long enough for you to observe
them, that is cause for concern about init's health.
Defunct processes are zombie processes; they can be removed
by killing the parent process, after which init inherits and
reaps them. Use the PPID value to locate the parent; if the
PPID is 1, then rebooting is the only solution.
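Finding defunct processes and their parents is usually done with ps. The same information can be read directly from /proc on Linux. This minimal Python sketch assumes Linux's /proc/<pid>/stat layout, in which the state letter and then the PPID follow the parenthesized command name. The name `find_zombies` is hypothetical.

```python
import os


def find_zombies():
    """Return a list of (pid, ppid) for every zombie visible in /proc (Linux only)."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process vanished while we were scanning
        # Fields after the ')' that closes the command name: state, ppid, ...
        state, ppid = data[data.rindex(")") + 2:].split()[:2]
        if state == "Z":
            zombies.append((int(entry), int(ppid)))
    return zombies


if __name__ == "__main__":
    for pid, ppid in find_zombies():
        note = " (parent is process 1)" if ppid == 1 else ""
        print(f"zombie {pid}, parent {ppid}{note}")
```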
There are immortal processes that derive from another
source. For these, the only practical solution is rebooting.
If a process, while in the kernel, locks system-critical
resources, the process raises its processing priority to or
above the PZERO level. The kernel will not interrupt such
processes to deliver signals. If the event for which the
process is waiting never occurs, the process becomes
immortal. For example, if a tape drive loses power during an
I/O operation, it will never send an I/O-completion
interrupt. The tape drive is a system-critical resource, so
the process is waiting at or above PZERO, and an immortal
process results. For another example, in SGI IRIX,
kernel-mode NFS network communication appears to be handled
at or above PZERO. Other examples are possible.
AUTHOR
Randolph J. Herber.
Steve Parker - Linux / DevOps Consultant