- Parents and children
- Process scheduler and task structs
A process is a running instance of a program. If you run the same program twice, you usually get two separate processes: they may start from the same executable file, but they don't share the same state.
A process has an identity called a PID (process ID). A PID is just a number the kernel assigns so the process can be referred to precisely (by other programs and by the kernel itself).
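As a small illustration of "same program, two processes", the sketch below starts the same Python interpreter twice and asks each instance for its own PID. The helper name start_twice is made up for this example; it assumes a Python 3 interpreter is available as sys.executable.

```python
import subprocess
import sys


def start_twice():
    """Start the same program twice and return the PID each instance reports."""
    # Both instances run the same executable with the same arguments.
    cmd = [sys.executable, "-c", "import os; print(os.getpid())"]
    first = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    second = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    out_first, _ = first.communicate()
    out_second, _ = second.communicate()
    return int(out_first), int(out_second)


if __name__ == "__main__":
    pid_a, pid_b = start_twice()
    print("first instance PID:", pid_a)
    print("second instance PID:", pid_b)
```

The two numbers differ from run to run, but they are never equal to each other: each instance is its own process with its own identity.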
Parents and children
Processes are organized as a tree.
When one process asks the kernel to create another process, the creator becomes the parent, and the new one becomes the child. The kernel remembers this relationship:
- each process has exactly one parent (except for PID 1, the root of the tree)
- each process can have zero or more children
This relationship is not just "nice to know": it is used for control and cleanup. For example, a parent can start multiple children to do work, and then wait until those children are done.
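The "start several children, then wait for them" pattern can be sketched with the fork and waitpid calls that Unix-like systems provide (exposed in Python as os.fork and os.waitpid). The function name run_children and the choice to use the child index as exit code are assumptions made only for this example.

```python
import os


def run_children(count):
    """Fork `count` children, wait for each one, and return {pid: exit_code}."""
    child_pids = []
    for index in range(count):
        pid = os.fork()
        if pid == 0:
            # Child process: pretend to do some work, then exit.
            # The exit code is simply the child's index in this sketch.
            os._exit(index)
        # Parent process: remember the PID of the child just created.
        child_pids.append(pid)

    results = {}
    for pid in child_pids:
        # Block until this particular child has exited.
        _, status = os.waitpid(pid, 0)
        results[pid] = os.WEXITSTATUS(status)
    return results


if __name__ == "__main__":
    for pid, code in run_children(3).items():
        print(f"child {pid} exited with code {code}")
```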
At the top of the tree there is a special process: PID 1. PID 1 is the first long-lived user-space process started during boot (we referred to it as the init program in the chapter about the operating system startup).
It is the root of the process tree, and it is always present as the "default parent" when needed.
If a parent process exits while some of its children are still running, those children do not automatically stop. They keep running, but they become orphans (they no longer have their original parent).
The kernel then "re-parents" them: it assigns them a new parent so the tree stays connected. The usual new parent is PID 1.
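Re-parenting can be observed from inside an orphan. In the sketch below (all names are illustrative), a "middle" process creates a grandchild and exits at once; the grandchild polls its parent PID until it changes and reports the new value through a pipe. On many systems the reported value is 1, but this is only an assumption: on Linux the new parent can also be another designated ancestor (a so-called subreaper), for example a service manager or a container runtime.

```python
import os
import time


def observe_reparenting():
    """Return (pid_of_exited_parent, new_parent_pid_seen_by_the_orphan)."""
    read_fd, write_fd = os.pipe()
    middle_pid = os.fork()
    if middle_pid == 0:
        # Middle process: create a grandchild, then exit right away.
        my_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait (at most 5 seconds) until the parent PID changes.
            try:
                os.close(read_fd)
                deadline = time.monotonic() + 5.0
                while os.getppid() == my_pid and time.monotonic() < deadline:
                    time.sleep(0.01)
                os.write(write_fd, str(os.getppid()).encode())
            finally:
                os._exit(0)
        os._exit(0)

    # Original process: collect the middle process, then read the report.
    os.close(write_fd)
    os.waitpid(middle_pid, 0)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    return middle_pid, int(data.decode())


if __name__ == "__main__":
    old_parent, new_parent = observe_reparenting()
    print("original parent PID:", old_parent)
    print("parent PID after re-parenting:", new_parent)
```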
When a process exits, it produces an exit status (a small number that tells how it ended, for example success vs an error code). The kernel keeps that status so the parent can retrieve it.
This leads us to another couple of terms:
Zombie: an exited process that is no longer running, but still has a minimal record kept by the kernel so the exit status can be collected. A zombie does not keep executing and it does not keep its full memory around. It exists mainly as bookkeeping.
Reaping: the final cleanup step where the kernel removes that last bookkeeping record. This happens when some process collects the exit status of a zombie.
The process that collects the exit status is called the reaper. Normally the parent is the reaper: it asks the kernel "did my child exit, and what was the status?". Once that happens, the zombie can be fully removed.
If a parent is gone and cannot reap its children, this is where PID 1 matters again: orphaned children get re-parented, and PID 1 acts as the fallback reaper so zombies do not accumulate permanently.
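The zombie and reaping steps can be watched on a Linux system, assuming /proc is mounted. In the sketch below the child exits immediately, the parent waits until the kernel reports the child's state as "Z" (zombie) in /proc/<pid>/stat, and only then collects the exit status. The helper names proc_state and zombie_then_reap are invented for this example.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as stat_file:
            data = stat_file.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the ")" closing the command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_then_reap(exit_code=7):
    """Return (state_before_reaping, collected_exit_code, state_after_reaping)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # Child: exit immediately.

    # Parent: do not reap yet; poll (at most 5 seconds) until the child is a zombie.
    deadline = time.monotonic() + 5.0
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    # Reaping: collecting the status lets the kernel drop the last record.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status), proc_state(pid)


if __name__ == "__main__":
    before, code, after = zombie_then_reap()
    print("state before reaping:", before)
    print("exit code collected:", code)
    print("state after reaping:", after)
```

Between the child's exit and the waitpid call, the child is exactly the "minimal record" described above: it no longer runs, but its PID and exit status are still held by the kernel.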
Process scheduler and task structs
A computer usually has a small number of CPU cores, but it runs many processes. The part of the kernel that decides which process runs on which core, and for how long, is the process scheduler.
A process can be running, ready to run, sleeping (waiting for something), or stopped. The exact names vary, but the idea is that the kernel must know whether a process can make progress right now or not.
Inside the kernel, a process is represented by a data structure called task_struct. In C terms, this is literally a struct: a block of fields that the kernel uses as its record for a running task.
A task_struct contains, among other things:
- The process identifier (PID), which is the small integer you see in tools like ps or top.
- The current scheduling state (running, sleeping, stopped, etc) and the priority or scheduling parameters used by the scheduler.
- Links to the process's memory description (what memory ranges exist, what they are used for, and how they map to actual physical memory).
- The table of open file descriptors (the kernel’s list of "this process currently has file X open as descriptor 3", etc).
- Credential and security information (user id, group id, capabilities), which is how the kernel knows what this process is allowed to do.
- Signal state (signals are asynchronous notifications like "terminate", "interrupt", "child exited").
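The task_struct itself is only visible inside the kernel, but on Linux a text view of several of these fields is exposed under /proc. The sketch below reads /proc/self/status and /proc/self/fd for the current process; it assumes a Linux system with /proc mounted, and the function names and the selection of fields are choices made for this example.

```python
import os


def read_status(pid="self"):
    """Parse /proc/<pid>/status into a dict of field name -> text value."""
    fields = {}
    with open(f"/proc/{pid}/status") as status_file:
        for line in status_file:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def summarize(pid="self"):
    """Pick out the fields that correspond to the list above."""
    status = read_status(pid)
    return {
        "pid": int(status["Pid"]),
        "parent_pid": int(status["PPid"]),
        "state": status["State"],                    # e.g. "R (running)"
        "real_uid": int(status["Uid"].split()[0]),   # first column is the real UID
        "real_gid": int(status["Gid"].split()[0]),   # first column is the real GID
        "pending_signals": status["SigPnd"],         # hexadecimal bitmask
        "open_fds": sorted(int(n) for n in os.listdir(f"/proc/{pid}/fd")),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```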
The scheduler’s job is easier to understand if you separate processes into two broad categories:
Runnable processes: processes that could make progress right now if they were given CPU time.
Waiting processes: processes that cannot make progress right now because they are waiting for something (for example: data to arrive from disk, a packet to arrive from the network, a timer to expire, or a lock to be released).
The scheduler mainly chooses between runnable processes. Waiting processes are not competing for the CPU until whatever they are waiting for happens.
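The split between runnable and waiting processes can be made concrete with a toy model. The sketch below is not how the Linux scheduler works internally: it assumes a single core, equal time slices, and a plain round-robin queue, and the Task fields are invented for the example. It only shows that waiting tasks are skipped until their event arrives.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining: int          # time slices of CPU work still needed
    waiting_until: int = 0  # tick at which the awaited event arrives (0 = not waiting)


def simulate(tasks, max_ticks=100):
    """Return which task ran on each tick ("idle" when nothing was runnable)."""
    runnable = deque(t for t in tasks if t.waiting_until == 0)
    waiting = [t for t in tasks if t.waiting_until > 0]
    timeline = []
    for tick in range(1, max_ticks + 1):
        # Wake up every task whose awaited event has arrived by this tick.
        for task in [t for t in waiting if t.waiting_until <= tick]:
            waiting.remove(task)
            runnable.append(task)
        if not runnable and not waiting:
            break  # All work is finished.
        if not runnable:
            timeline.append("idle")  # Only waiting tasks exist: the core idles.
            continue
        # Round-robin: run the task at the front of the queue for one slice.
        task = runnable.popleft()
        task.remaining -= 1
        timeline.append(task.name)
        if task.remaining > 0:
            runnable.append(task)
    return timeline


if __name__ == "__main__":
    tasks = [Task("A", 2), Task("B", 1), Task("C", 1, waiting_until=3)]
    print(simulate(tasks))  # prints ['A', 'B', 'A', 'C']
```

Task C needs only one time slice, yet it runs last: until tick 3 it is waiting, so it is not a candidate for the CPU at all.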