17th Feb 2017
UNIX Concepts: Zombies
UNIX: Zombies, and the Killing of Parents and Children
Zombies(5)                UNIX System V (Concepts)                Zombies(5)

NAME
     Defunct, zombie and immortal processes

DESCRIPTION
     When a process dies, it becomes a zombie (almost dead) process whose only remaining purpose is to hold its death certificate (the exit status data returned by the wait family of system calls). When the death certificate has been collected, the process is finally removed from existence and from the system's process table. Zombie processes are marked as <defunct> in ps listings.

     As long as the parent is alive and has not disowned the child, a child that dies remains around as a zombie until the parent finally collects its death certificate. If the parent has not disowned the child and the parent dies before collecting the child's death certificate, the child is sent to the state orphanage.

     The state orphanage, process 1 a.k.a. /etc/init, is the second process created after the system is booted. It has several principal functions: starting and in some cases maintaining the system daemons, and waiting for its children to die. It is given the job of waiting for the deaths of orphaned children as well. This allows zombie children to be put to rest.

     As an aside, when the system is booted, the boot loader copies the kernel into memory, creates a stack and calls the kernel's main procedure which, in turn, makes itself into process 0 and forks itself; that child, process 1, executes /etc/init. In parallel with /etc/init starting the system and the system daemons, process 0 may continue to fork and execute portions of the kernel as asynchronous processes.

     Process 1 and other processes with process 0 as their parent may be protected from being given a KILL signal. Processes waiting at very high priorities cannot be killed, because the signal is first posted to the kernel control data of the process, but the remaining processing and the possible jump to process termination occur only at lower priorities, that is, below PZERO.

     In fact, the final processing of a signal within a process occurs just as the process is being readied for return to user state. If the system is a multiple CPU system, the signaling process and the signaled process are running on different CPUs, and the signaled process is running in user state, then the signaling CPU interrupts the signaled CPU so that the signal can be processed for the signaled process. If the process execution does not reach this point of return to user state, then the process cannot be signaled (in the case of the KILL signal, killed). A zombie process, since it is already almost dead, cannot be killed further.

     Slightly more technical presentation of the above material:

     If the parent of a forked (or sproced) child did not have SIGCHLD set to the ignore signal condition, and the parent exits or is terminated by the system before it has issued one of the wait system calls and retrieved the ending status of the child, the parentage of the child process is reassigned to process 1 (/etc/init in most cases). As long as the parent is alive and SIGCHLD is not set to the ignore signal condition, the process struct of the terminated process is retained in the kernel so that the ending status of the child can be returned if and when a wait system call is issued for the child process.

     Process 1 (/etc/init), after it has initialized the non-kernel functions of the operating system, loops on the wait system call. When an orphan dies, process 1 receives and ignores its ending status; this releases the process struct of the terminated process. /etc/init is also looking for the death of its own children, so that it can start other processes dependent on that termination or restart another copy of the process that just terminated.
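
     The loop on the wait system call described above can be sketched in a few lines. This is only an illustration, not init's actual code: the helper names spawn and reap_all are hypothetical, and unlike init, which discards the ending status, the sketch returns the statuses so that the result can be inspected.

```python
import os


def spawn(exit_code):
    """Fork a child that exits at once with exit_code; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate immediately without running the parent's cleanup.
        os._exit(exit_code)
    return pid


def reap_all():
    """Loop on wait() until this process has no children left.

    Returns {pid: exit_code}. A child killed by a signal is recorded as the
    negated signal number (a convention chosen for this sketch).
    """
    collected = {}
    while True:
        try:
            pid, status = os.wait()  # blocks until some child terminates
        except ChildProcessError:
            break  # ECHILD: nothing left to wait for
        if os.WIFEXITED(status):
            collected[pid] = os.WEXITSTATUS(status)
        else:
            collected[pid] = -os.WTERMSIG(status)
    return collected


if __name__ == "__main__":
    for code in (0, 3, 7):
        spawn(code)
    print(reap_all())
```

     Each successful wait() is what releases the terminated child's process table entry; until then the child stays in the table as a zombie.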

     The historical login processing shows init restarting a process in this way: init forks itself and execs getty; the getty program (in the child) waits for the communications port to open; getty emits the ``login: '' prompt; getty execs login on top of itself; login authenticates the user, initializes the user uid, gid, current and root directories, etc., and execs the user's login shell on top of itself. When the login process terminates, /etc/init receives its ending status (it is the parent) and it forks itself and execs getty, ....

     No, there is not a ``fix''. That is the way it is designed to work. It has been this way back at least as far as Release 3 UNIX (and Release 6 was the first version to officially escape Bell Labs).

     Immortal processes (except those specifically protected by the kernel, that is, those processes whose parent is 0) are caused by the process waiting for an event (usually I/O related) at a very high priority (typically described as waiting above PZERO). Since such processes usually have critical system resources locked, breaking the lock in a manner that does not release those resources could become a major disaster.

     A zombie is immortal. An immortal process is not necessarily a zombie.

     A zombie or defunct process is the death certificate of a process that has already terminated. The only system resource being consumed by it is the process block used to store its termination status until the parent process asks for the exit status with a wait(2) family system call. When the parent finally dies, any surviving children, including the zombies, are reassigned to the system orphanage, process 1.

     Process 1, /etc/init, is the system reaper of orphaned children as well as its own. The other purposes of init are system start up and shutdown and the respawning (restarting, if you wish) of system services such as gettys.
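
     The life of a zombie as described above can be observed directly. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names process_state and zombie_demo are hypothetical. It uses waitid with WNOWAIT to wait until the child has died without collecting its death certificate, reads the child's state while it is still a zombie, and only then reaps it.

```python
import os


def process_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Layout is "pid (comm) state ...". The command name may itself contain
    # spaces or ')', so split on the last ')' before taking the next field.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=42):
    """Create a zombie, look at it, then collect its death certificate.

    Returns (state_before_wait, collected_exit_code).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child dies at once

    # Block until the child has terminated, but leave it unreaped (WNOWAIT),
    # so it is still in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state = process_state(pid)  # a zombie is reported as state 'Z'

    # Collect the exit status; this removes the process table entry.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(zombie_demo())
```

     Between the waitid call and the waitpid call the child is the "death certificate" the text describes: it runs no code and holds only its process table entry and exit status.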

     The fact that zombied orphans survive long enough for you to observe them is cause for concern about init's health.

     Defunct processes are zombie processes; these can be deleted by killing the parent program. Use the PPID value to locate the parent; if the PPID is 1, then rebooting is the only solution.

     There are immortal processes which derive from another source. For these, the only practical solution is rebooting. If a process, while in the kernel, locks system critical resources, then the process raises its processing priority to or above the PZERO level. Such processes will not be interrupted by the kernel. If the event for which the process is waiting will never occur, then the process becomes immortal. For example, if a tape drive is unpowered during an I/O operation, then it will never send an I/O complete signal. The tape drive is a system critical resource, and therefore the process is waiting at or above PZERO. For another example, in SGI IRIX, kernel mode NFS network communications appear to be handled at or above PZERO. Other examples are possible. In each case, an immortal process results.

AUTHOR
     Randolph J. Herber.

(printed 12/13/97)
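
The advice above about using the PPID to locate a zombie's parent can be sketched as follows. This is an illustration under assumptions: the function names parse_zombies and list_zombies are hypothetical, and the ps invocation assumes a ps that accepts "-e -o pid,ppid,stat,comm" (procps on Linux does; the stat column is not required by POSIX, so other systems may differ).

```python
import subprocess


def parse_zombies(ps_output):
    """Return [(pid, ppid, command)] for each process whose state starts with 'Z'.

    Expects a header line followed by PID, PPID, STAT and COMMAND columns.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)  # the command may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, stat, command = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), command.strip()))
    return zombies


def list_zombies():
    """Run ps (assumed flags, see above) and return the zombies it reports."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,ppid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_zombies(result.stdout)
```

Calling list_zombies() returns one tuple per defunct process; the second element of each tuple is the PPID of the parent that has not yet collected the exit status.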
Steve Parker - Linux / DevOps Consultant