Permissions, processes and file descriptors

Sessions and users
Permissions on files
File descriptors
Spawning new processes
Daemons and init systems

On early Unix machines, the operating system was designed for multiple people sharing the same hardware. A large computer would sit in a room, and several users would connect to it through terminals. Each person had their own login, their own running programs and their own files, but they were all using the same CPU, the same RAM and the same disks. From the beginning, Unix needed a way to keep those users separated enough that one person could not accidentally (or deliberately) destroy another person’s work. That is where users, groups, permissions and the process model come from: they are the basic tools the kernel uses to decide "who are you?" and "what are you allowed to do?".

Sessions and users

When you log in to a Unix system, a login session is created for you. A session is a group of processes that share some common context: a controlling terminal, a user identity and an environment. The program that handles the login verifies your password, then starts a shell or a desktop as your first process in that session. From there, every external command you run becomes a child process of that shell. If you close the terminal or log out, the session ends, and your terminal or graphical seat is freed for the next login.

The "who are you?" part is represented by user accounts. Each user has a username (such as "alice" or "root"), but internally the kernel uses a numeric user id, often abbreviated as UID. Every process has a UID attached to it, and the kernel uses that UID to decide what the process can access. The superuser account, traditionally called root, has UID 0, which bypasses most permission checks; this is why running programs as root is powerful but dangerous.

Groups are a second axis for permissions. A group is a named collection of users, again represented internally by numeric group ids (GIDs). A user can belong to several groups at once. This allows the system to express simple policies like "this directory is writable by anybody in the ‘developers’ group, but not by other users" without having to list every user individually.
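To make the numeric ids concrete, here is a minimal sketch that asks the kernel for the identity of the current process and translates the numbers back into names. The function name describe_identity is made up for this illustration; the calls it uses come from Python's standard os, pwd and grp modules.

```python
import grp
import os
import pwd


def describe_identity():
    """Return the UID, GID and group memberships of the current process."""
    uid = os.getuid()
    gid = os.getgid()
    try:
        username = pwd.getpwuid(uid).pw_name
    except KeyError:
        username = None  # a UID with no entry in the user database
    groups = []
    for group_id in os.getgroups():
        try:
            groups.append(grp.getgrgid(group_id).gr_name)
        except KeyError:
            groups.append(str(group_id))  # a GID with no name: keep the number
    return {"uid": uid, "username": username, "gid": gid, "groups": groups}


if __name__ == "__main__":
    print(describe_identity())
```

The kernel only ever works with the numbers; the names come from a user database that is looked up in user space.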

Permissions on files

The "what are you allowed to do?" part is represented by permissions. Permissions on files are usually represented as three small groups of flags: one group for the file's owner, one for the file's associated group, and one for everyone else. Each group can contain up to three basic permissions:

r (read)
w (write)
x (execute)

For a regular file:

read means the process may read the bytes of the file
write means it may change those bytes
execute means it may ask the kernel to load the file into memory as a program and run it

For a directory, the same letters have slightly different effects:

read controls whether you can list the names of entries in the directory
write controls whether you can create or remove entries in that directory
execute controls whether you are allowed to "enter" the directory when the kernel walks through a path (for example, when opening /home/alice/file.txt it must "enter" /home and /home/alice)

If a permission is not granted, you see a dash instead of the letter:

- means "this permission is not allowed here"

If you look at the long listing produced by a command such as ls -l, you might see something like:

-rwxr-x---

Those 10 characters can be read as:

- : type of the object ("-" means regular file, "d" would mean directory)
rwx : permissions for the owner of the file (read, write, execute)
r-x : permissions for the group (read, no write, execute)
--- : permissions for others (no read, no write, no execute)

When you run a command such as chmod, you are asking the kernel to change these flags.
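As a small illustration, the sketch below changes the flags on a temporary file and then renders them in the same ten-character form that ls -l uses. The helper name mode_string_after_chmod is invented for this example. The mode is written in octal, a common notation where each digit is the sum of read (4), write (2) and execute (1), so 0o750 means rwx for the owner, r-x for the group and nothing for others.

```python
import os
import stat
import tempfile


def mode_string_after_chmod(mode):
    """Create a temporary file, chmod it, and return its ls-style mode string."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, mode)  # ask the kernel to replace the permission flags
        return stat.filemode(os.stat(path).st_mode)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(mode_string_after_chmod(0o750))  # prints -rwxr-x---
```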

File descriptors

Every running program is wrapped in a process, and the kernel keeps a fair amount of information with that process: which user and groups it belongs to, which terminal (if any) it is attached to, which current working directory it is using, and which files and communication channels it currently has open. Input and output are handled via file descriptors.

A file descriptor is a small non-negative integer used by a process to refer to an open resource.

By convention, every process starts with at least three file descriptors already open.

Descriptor 0 is standard input, used for reading data in.

Descriptor 1 is standard output, used for normal program output.

Descriptor 2 is standard error, used for error messages and diagnostics.

When a process opens a file or a socket, the kernel picks the lowest unused descriptor number (3, 4, 5, 6 and so on) and associates it with that resource. From then on, the process does not have to repeat the path or the socket details; it just says "read from descriptor 3" or "write to descriptor 5", and the kernel looks up the real thing. The "file" in "file descriptor" matches the broad Unix sense: the underlying object can be a regular file, a terminal, a pipe, a socket or a device.
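The "lowest unused number" rule can be observed directly. The following sketch (the function name is hypothetical) opens /dev/null twice, closes the first descriptor and opens it again. It assumes a single-threaded program, so nothing else opens or closes descriptors in between.

```python
import os


def lowest_descriptor_demo():
    """Show that a freed descriptor number is handed out again on the next open."""
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)   # next free number after `first`
    os.close(first)                             # `first` becomes the lowest free number
    reused = os.open(os.devnull, os.O_RDONLY)   # so the kernel hands it out again
    os.close(second)
    os.close(reused)
    return first, second, reused


if __name__ == "__main__":
    print(lowest_descriptor_demo())
```

The exact numbers depend on what the process already has open, but the third value always equals the first.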

When you run a program from a shell, the shell passes its own descriptors into the child so that the program reads from your terminal and writes back to it. Shell redirection (>, <, 2>, and so on) works by rearranging these numbers before the new program starts.

For example, if you redirect output to a file, the shell opens the file first, gets a descriptor for it and then asks the kernel to make that descriptor become the child’s descriptor 1. The child itself just writes to descriptor 1 as usual; it does not need to know whether that goes to a terminal, a file or something else.
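A rough sketch of what a shell does for "command > file" is shown below. The function name run_with_stdout_redirected is made up, and the rearranging here happens in the child just before the new program starts, which is one common arrangement rather than the only one. The key call is dup2, which makes descriptor 1 refer to the same open file as another descriptor.

```python
import os
import sys
import tempfile


def run_with_stdout_redirected(argv, path):
    """Run argv in a child whose descriptor 1 points at `path`; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: rearrange descriptors, then become the new program.
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)   # descriptor 1 now refers to the file
                os.close(fd)     # the extra number is no longer needed
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)        # only reached if something above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.txt")
        command = [sys.executable, "-c", "print('hello')"]
        code = run_with_stdout_redirected(command, target)
        with open(target) as f:
            print(code, repr(f.read()))
```

The program being run only calls print; it never learns that its output ended up in a file.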

Spawning new processes

New processes in Unix are created in a way that many people find unintuitive. Instead of a single call that says "start this program with these options", the traditional model is split into two separate steps: fork and exec. This design is powerful but also odd: a program first asks the kernel to create a child that is almost an exact copy of itself, and only afterwards does that child turn into the new program you actually wanted to run.

Fork asks the kernel to make a copy of the current process. After a successful fork, there are two processes running: the original (the parent) and a new one (the child). They both continue execution from the same point in the code, with the same memory contents, the same current directory and the same open file descriptors. The kernel gives the child a new process id, and the fork call returns different values in the two processes so that they can tell who is who: 0 in the child, and the child's process id in the parent.
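The two return values are easiest to see in a tiny experiment. In this sketch (fork_demo is a name chosen for the example), the child reports its own process id through a pipe, and the parent compares it with the value it got back from fork.

```python
import os


def fork_demo():
    """Fork once; return (pid that fork gave the parent, pid the child saw for itself)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0 here.
        try:
            os.close(read_end)
            os.write(write_end, str(os.getpid()).encode())
        finally:
            os._exit(0)          # leave without running the parent's cleanup code
    # Parent: fork returned the child's process id.
    os.close(write_end)
    reported = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)           # collect the child so it does not linger as a zombie
    return pid, int(reported)


if __name__ == "__main__":
    print(fork_demo())
```

Both branches of the if run, each in a different process, which is what makes fork feel strange at first.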

Exec, short for "execute", replaces the current process image with a new program loaded from an executable file. When a process calls exec, the kernel discards its current code and data segments and maps in the segments from the requested executable instead. The process itself stays the same: it keeps its process id, its open file descriptors (unless they are marked close-on-exec), its user ids and, optionally, its environment variables.

The usual pattern in shells and many other programs is "fork, adjust some details in the child, then use exec to swap the child with a new program".
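The whole pattern can be sketched in a few lines. Here the "details" adjusted in the child are the working directory and standard output, and the new program is a one-line Python script that reports its own process id and directory. All names (REPORT, fork_adjust_exec) are invented for this illustration.

```python
import os
import sys

# The program the child turns into: it prints its pid and working directory.
REPORT = "import os; print(os.getpid(), os.getcwd())"


def fork_adjust_exec(workdir):
    """Fork, adjust cwd and stdout in the child, exec a reporter program.

    Returns (pid from fork, pid reported after exec, cwd reported after exec).
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_end)
            os.chdir(workdir)         # adjust: working directory
            os.dup2(write_end, 1)     # adjust: stdout goes into the pipe
            os.close(write_end)
            os.execv(sys.executable, [sys.executable, "-c", REPORT])
        finally:
            os._exit(127)             # only reached if exec failed
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        output = pipe.read()
    os.waitpid(pid, 0)
    reported_pid, reported_cwd = output.split(maxsplit=1)
    return pid, int(reported_pid), reported_cwd.strip()


if __name__ == "__main__":
    print(fork_adjust_exec("/"))
```

The process id seen after exec matches the one returned by fork, and the new program starts in the directory the child chose: exec swapped the program, not the process.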

Historically, this two-step model comes from early Unix systems running on limited hardware. The kernel designers wanted a small, uniform set of system calls, and the ability to build more complex behaviour by combining them. At that time, processes were small, and copying them at fork time was not as expensive as it would be now. Fork also gave a convenient way to start child processes that inherit most of the parent’s state, which was useful for shells and servers that wanted to adjust just a few details before running a different program. Over time, copy-on-write techniques reduced the cost of making the copy, but the basic fork/exec interface stayed, largely because a lot of existing code relied on it.

The downside of this model is that it is easy to get details wrong. Because fork starts by duplicating almost everything, the child will inherit all open file descriptors by default. If the parent had a sensitive file open, or a network socket used to talk to a privileged service, and it forgets to close or mark those descriptors as "do not inherit", the new program will unexpectedly have access to them. That can become a security problem, especially when starting programs that run with different permissions or are less trusted. Environment variables and the current directory can also leak information or influence behaviour in ways the parent did not intend.
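The descriptor leak, and the "do not inherit" marking that prevents it, can be demonstrated with a short sketch. The names CHECK and child_sees_descriptor are hypothetical. One caveat: Python marks the descriptors it creates as non-inheritable by default, which is the opposite of the raw system call default, so the example sets the flag explicitly in both directions.

```python
import os
import sys
import tempfile

# Program run in the child: exit 0 only if descriptor argv[1] is open and
# refers to the file identified by device argv[2] and inode argv[3].
CHECK = """
import os, sys
try:
    st = os.fstat(int(sys.argv[1]))
except OSError:
    sys.exit(1)
same = (st.st_dev, st.st_ino) == (int(sys.argv[2]), int(sys.argv[3]))
sys.exit(0 if same else 1)
"""


def child_sees_descriptor(inheritable):
    """Report whether a program started via fork + exec can use the parent's file."""
    with tempfile.TemporaryFile() as secret:
        fd = secret.fileno()
        os.set_inheritable(fd, inheritable)   # False marks it close-on-exec
        st = os.fstat(fd)
        argv = [sys.executable, "-c", CHECK,
                str(fd), str(st.st_dev), str(st.st_ino)]
        pid = os.fork()
        if pid == 0:
            try:
                os.execv(sys.executable, argv)
            finally:
                os._exit(127)
        _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(child_sees_descriptor(True), child_sees_descriptor(False))
```

With the flag left inheritable, the new program can use a file it never opened; with close-on-exec set, the kernel closes the descriptor during exec.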

There are performance issues as well. Even with copy-on-write, creating a child with fork means setting up a new virtual memory mapping that initially mirrors the parent. For very large processes this can still be expensive, because the kernel has to duplicate page tables and other bookkeeping, even if the child is about to call exec and replace all of its memory with something else. In multi-threaded programs, fork has additional complications: only the thread that calls fork is preserved in the child, while others simply disappear, potentially leaving global data structures in inconsistent states.

Because of these pitfalls, the POSIX standard later introduced posix_spawn as a higher-level way to start a new program. Instead of explicitly forking and then calling exec in the child, a program calls posix_spawn with a description of what it wants: which program to run, which arguments and environment to use, which file descriptors to pass through or close, and optionally which directory or credentials to adopt. The implementation can then create the child process in a more direct and controlled way, often without doing a full copy of the parent at all. This can be faster, avoids many of the accidental inheritance problems, and gives a clearer description of the new process’s initial state. From the outside, the result is the same as with fork and exec: a new child process appears, with its own process id, running the requested program.
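For comparison with the fork and exec approach, here is a minimal sketch of the same kind of output redirection expressed as a posix_spawn request, using the os.posix_spawn wrapper available in Python 3.8 and later. The function name spawn_with_stdout_to is made up. The redirection is described as data (a "file action") instead of being carried out by code running in a forked copy of the parent.

```python
import os
import sys
import tempfile


def spawn_with_stdout_to(path, argv):
    """Start argv with descriptor 1 opened on `path`; return the exit code."""
    file_actions = [
        # In the new process: open `path` directly as descriptor 1.
        (os.POSIX_SPAWN_OPEN, 1, path,
         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644),
    ]
    # argv[0] must be the full path of the executable; posix_spawn does not search PATH.
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=file_actions)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.txt")
        command = [sys.executable, "-c", "print('spawned')"]
        code = spawn_with_stdout_to(target, command)
        with open(target) as f:
            print(code, repr(f.read()))
```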

Daemons and init systems

Not every program is started by a person typing a command. Many programs are meant to run in the background and provide a service, like managing network connections, logging messages or scheduling tasks. These background service programs are commonly called daemons. A daemon usually starts during boot, keeps running for a long time, and does not have a terminal attached.

Even though daemons run "in the background", they still start life like any other process, with the same idea of standard input, standard output, and standard error. The difference is that a daemon usually does not have a terminal for those to point to. For that reason, daemons typically make sure standard input goes to /dev/null (a special file that acts as a black hole: data written to it gets discarded, and trying to read it will always return "end of file"), and standard output and standard error go somewhere sensible, like a log file or a central logging system.
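The behaviour of /dev/null, and the way a descriptor is pointed at it, can be sketched as follows. Both function names are invented for this example. A daemon would typically call something like point_at_devnull(0) early in its startup; the sketch takes the target descriptor as a parameter so it can be tried on any descriptor without disturbing your terminal.

```python
import os


def devnull_behaviour():
    """Return (bytes read from /dev/null, number of bytes it accepted on write)."""
    fd = os.open(os.devnull, os.O_RDWR)
    try:
        data = os.read(fd, 1024)                # immediate end of file
        written = os.write(fd, b"discarded")    # accepted, then thrown away
    finally:
        os.close(fd)
    return data, written


def point_at_devnull(target_fd):
    """Make target_fd refer to /dev/null, as a daemon does for descriptor 0."""
    fd = os.open(os.devnull, os.O_RDWR)
    os.dup2(fd, target_fd)
    if fd != target_fd:
        os.close(fd)


if __name__ == "__main__":
    print(devnull_behaviour())   # prints (b'', 9)
```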

Modern systems often prefer supervising daemons: a manager starts the daemon, watches it, restarts it if it crashes, and keeps a clear record of whether it is running.

That manager is part of what we previously referred to as the init system: the first user-space program (PID 1) that starts during boot and then starts the rest of the system. The most common init system on modern Linux distributions is systemd, with OpenRC as a distant second.

The two take different approaches to this job. Both can start services at boot, stop them at shutdown, and give you commands to manage services while the machine is running, but:

Systemd uses a fairly uniform configuration format to describe services and their relationships, and it actively tracks them after they start. It can start many services in parallel, restart services when they fail, and provide a consistent interface across the system. The tradeoff is that it is a large, integrated set of components, so there is more complexity in one place, and programs built around it tend to follow its way of doing things, making them potentially harder to run on Linux installations that do not use systemd.
OpenRC is more traditional and more script-focused. Services are commonly started and stopped using readable shell scripts, and it tries to stay closer to the traditional "small tools working together" Unix style. That can make it feel simpler to inspect and customize. The tradeoff is that you often rely on separate tools for features that systemd bundles tightly, and service behavior can vary more depending on how those scripts are written.