Kernel level

Kernel, syscalls and drivers

System Programming Fundamentals

The process scheduler
The memory management subsystem
The virtual file system
The networking unit
The inter-process communication unit
Syscalls, drivers and modules

The Linux kernel is the core of the operating system. It runs in a privileged mode of the CPU (often called kernel mode) and controls hardware, memory and processes. Everything else you think of as "Linux" lives in user space and talks to the kernel when it needs something privileged done.

The kernel’s job is to provide a consistent API to user-space programs and to distribute the underlying hardware resources between them. It is useful to think of the kernel as a collection of cooperating subsystems. The core kernel subsystems in Linux are:

The process scheduler
The memory management subsystem
The virtual file system
The networking unit
The inter-process communication unit

The process scheduler

The process scheduler is the part of the kernel that decides which process gets to run on the CPU, and for how long. A process is a running program with its own memory and resources. When there are more runnable processes than CPU cores, the scheduler also picks which process runs on which core. To do this, it divides CPU time into small slices and performs context switches: it saves the CPU state (registers, program counter, stack pointer) of the currently running process and restores the state of another one.

There are two conceptual styles of multitasking. In cooperative multitasking, each process is expected to voluntarily yield the CPU, for example by calling into the OS when it is idle; if a buggy program never yields, it can freeze the system. In preemptive multitasking, which Linux uses, the kernel sets up a timer and can forcibly stop a running process when its time slice is over, then run another one. That way no single user process can hog the CPU forever.
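
The difference between yielding and being preempted can be observed from user space. The sketch below assumes a Unix-like system where Python's resource module reports context-switch counters; the helper names (measure, sleepy, busy) are made up for this illustration, and the exact numbers depend on the machine and its load.

```python
import resource
import time


def context_switch_counts():
    """Return (voluntary, involuntary) context switches for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def measure(work):
    """Run work() and return how many switches of each kind happened meanwhile."""
    vol_before, invol_before = context_switch_counts()
    work()
    vol_after, invol_after = context_switch_counts()
    return vol_after - vol_before, invol_after - invol_before


def sleepy():
    # Sleeping blocks in the kernel: the process gives up the CPU voluntarily.
    for _ in range(20):
        time.sleep(0.001)


def busy():
    # Pure computation never yields; it leaves the CPU only when preempted.
    total = 0
    for i in range(500_000):
        total += i * i
    return total


if __name__ == "__main__":
    print("sleepy (voluntary, involuntary):", measure(sleepy))
    print("busy   (voluntary, involuntary):", measure(busy))
```

On a typical Linux machine the sleeping workload tends to show mostly voluntary switches, while the busy loop shows involuntary ones if other processes compete for the same core.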

The memory management subsystem

The memory management subsystem is the part of the kernel that manages address spaces and programs the CPU’s memory management unit (MMU). The MMU’s job is simple but strict: given a virtual address used by a program, it translates that address into a physical address (a location in RAM) and enforces the access permissions of the page the address belongs to. To do so it reads page tables, data structures maintained by the kernel that describe how virtual addresses map to physical pages of RAM and what permissions each page has (for example, readable, writable or executable).
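
The translation step can be modeled in a few lines. This is a toy, single-level model written for illustration only: real hardware walks multi-level tables, and the names translate, PageFault and the dictionary layout are assumptions of this sketch, not kernel interfaces.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the OS, commonly 4096 bytes


class PageFault(Exception):
    """Raised when the toy MMU cannot complete a translation."""


def translate(page_table, vaddr, access):
    """Translate a virtual address using a toy single-level page table.

    page_table maps virtual page number -> (physical frame number, permissions),
    where permissions is a string made of "r", "w" and "x".
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(f"page {vpn:#x} is not mapped")
    frame, perms = entry
    if access not in perms:
        raise PageFault(f"page {vpn:#x} does not allow '{access}' access")
    # The offset inside the page is carried over unchanged.
    return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    table = {0x400: (0x1A2, "rx"), 0x401: (0x0B7, "rw")}
    print(hex(translate(table, 0x400 * PAGE_SIZE + 0x10, "x")))
    try:
        translate(table, 0x400 * PAGE_SIZE, "w")
    except PageFault as exc:
        print("fault:", exc)
```

The second call fails because the page is mapped read/execute only, which mirrors how a write to a read-only page is refused by the hardware and reported to the kernel.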

When the CPU needs to access memory, it goes through the hardware MMU, which consults these page tables that the kernel set up. By controlling those tables, the kernel can give each process its own isolated address space, and prevent processes from reading or writing each other’s memory.

Each process gets its own virtual address space. The kernel creates and switches between these address spaces by loading different page-table roots into the CPU’s MMU registers when it schedules a new process on a core. That way, when process A runs, its virtual address 0x400000 points at some physical memory; when process B runs, the same virtual address might point to a completely different physical page, or to nothing at all. This gives isolation: one process cannot normally read or overwrite another process’s memory, because its own page tables contain no mapping to those physical pages.
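
A small experiment makes this visible. The sketch assumes a Unix-like system with fork and relies on a CPython detail: id() returns the object's virtual address. The function name fork_and_modify is made up for this example.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a variable, and compare both views."""
    value = [1]                      # ordinary private memory
    read_fd, write_fd = os.pipe()    # used only to report back from the child
    pid = os.fork()
    if pid == 0:
        # Child process: a separate address space with the same layout.
        try:
            os.close(read_fd)
            value[0] = 99
            os.write(write_fd, f"{id(value)} {value[0]}".encode())
        finally:
            os._exit(0)              # never fall back into the parent's code
    os.close(write_fd)
    report = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:                # empty read: the child closed its end
            break
        report += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_addr, child_value = (int(x) for x in report.split())
    return {
        "same_virtual_address": child_addr == id(value),
        "parent_value": value[0],
        "child_value": child_value,
    }


if __name__ == "__main__":
    print(fork_and_modify())
```

Both processes use the same virtual address for the list, yet the child's write never shows up in the parent, because each address resolves through a different set of page tables.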

On top of this basic translation, the kernel’s memory management subsystem implements higher-level policies and tricks. It can mark some pages as shared between processes (for example, shared libraries or explicit shared-memory regions), use copy-on-write so that a forked child initially shares pages with its parent until one of them writes, and move rarely used pages out to disk (swap) while keeping their virtual addresses valid. All of these features come from the kernel updating the page tables and related data structures; the hardware MMU then enforces whatever the kernel has decided.
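
The contrast between shared and copy-on-write pages can be sketched as follows. The example assumes a Unix-like system, where Python's mmap.mmap(-1, size) creates an anonymous mapping that is shared across fork by default; shared_vs_private is a name invented for this sketch.

```python
import mmap
import os
import struct


def shared_vs_private():
    """Return (value seen in shared page, value seen in private memory)."""
    shared = mmap.mmap(-1, mmap.PAGESIZE)   # anonymous mapping, shared on Unix
    struct.pack_into("i", shared, 0, 0)     # an int stored at offset 0
    private = [0]                           # ordinary memory, copy-on-write after fork
    pid = os.fork()
    if pid == 0:
        # Child: write to both kinds of memory, then exit.
        try:
            struct.pack_into("i", shared, 0, 42)
            private[0] = 42
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    seen_shared = struct.unpack_from("i", shared, 0)[0]
    shared.close()
    return seen_shared, private[0]


if __name__ == "__main__":
    print(shared_vs_private())
```

The parent sees the child's write to the shared page, but not the write to the private list: that write made the kernel give the child its own copy of the affected page.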

The virtual file system

The virtual file system, usually shortened to VFS, is the part of the kernel that implements the idea of "files and directories" that user programs see. Underneath the VFS, there can be many different concrete filesystems. A "concrete filesystem" here means a particular way of laying out files and directories on some storage, along with rules for how to read and write them. For example, ext4 and XFS are common on-disk filesystems for Linux. There are also network filesystems that store data on another machine, and "virtual" or "pseudo" filesystems like procfs and sysfs that do not store data on a disk at all but instead expose information from the kernel as if it were files.

The VFS sits between user programs and these various filesystem drivers. It defines a common set of operations such as "open this path", "read from this file", "write to this file" and "list the contents of this directory". When a program asks to open a path, the VFS figures out which filesystem that path belongs to and then calls into the right filesystem driver to actually fetch or update the data. Because of this indirection, user space does not have to care whether a file is on a local disk, on a network share or generated on the fly by the kernel. Everything is reached with the same simple model.
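
To illustrate the uniform model, the sketch below reads a regular file and, where it exists, a procfs file with exactly the same three calls. The helper name read_all is made up; the /proc path is only used if the system provides it.

```python
import os
import tempfile


def read_all(path):
    """Read a whole file with the low-level open/read/close calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:            # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A regular file, stored by whatever filesystem backs the temp directory.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"stored by a concrete filesystem\n")
    print(read_all(tmp.name))
    os.unlink(tmp.name)

    # A procfs file: its content is generated by the kernel on each read.
    if os.path.exists("/proc/self/status"):
        print(read_all("/proc/self/status").decode().splitlines()[0])
```

Nothing in read_all knows which filesystem it is talking to; the VFS routes each call to the right driver.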

The networking unit

The networking subsystem is the part of the kernel that lets programs send and receive data over networks. A network here can mean anything from a LAN to the entire Internet. Without this subsystem, each program would have to know how to drive network hardware and how to speak all the protocols itself, which would be impossible to maintain. Instead, the kernel provides a common networking service and hides the details.
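
From a program's point of view, that common service is the socket interface. A minimal sketch, assuming the loopback interface is available; loopback_round_trip is a name chosen for this example:

```python
import socket


def loopback_round_trip(message):
    """Send bytes over a TCP connection to ourselves and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
        server.listen(1)
        address = server.getsockname()
        with socket.create_connection(address) as client:
            connection, _ = server.accept()
            with connection:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)   # tell the peer no more data follows
                received = b""
                while True:
                    chunk = connection.recv(4096)
                    if not chunk:
                        break
                    received += chunk
    return received


if __name__ == "__main__":
    print(loopback_round_trip(b"hello over TCP"))
```

The program never touches a network card or builds a TCP segment; connection setup, buffering and delivery all happen inside the kernel.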

On top of basic sending and receiving, the networking subsystem also enforces policies. It can filter packets according to rules (this is what a firewall does), keep network connection state so it knows which packets belong to which connection, and apply routing tables that describe where to send packets based on their destination addresses.
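
A routing decision can be pictured as a longest-prefix match: among all entries that contain the destination, the most specific one wins. The sketch below is a toy model of that rule, not the kernel's actual data structure, and the table contents are invented for illustration.

```python
import ipaddress


def pick_route(routes, destination):
    """Return the next hop of the most specific route matching destination.

    routes is a list of (prefix, next_hop) string pairs; None means no match.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return None if best is None else best[1]


if __name__ == "__main__":
    # Hypothetical table: a default route plus two more specific ones.
    table = [
        ("0.0.0.0/0", "192.168.1.1"),
        ("10.0.0.0/8", "10.0.0.1"),
        ("10.1.2.0/24", "10.1.2.254"),
    ]
    for addr in ("10.1.2.7", "10.9.9.9", "8.8.8.8"):
        print(addr, "->", pick_route(table, addr))
```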

The inter-process communication unit

The inter-process communication unit is the part of the kernel that deals with programs talking to each other. Many real systems are made of lots of processes that need to cooperate. They might need to send each other data ("here are the results I just computed") or simple signals ("I’m done, you can continue now").

Processes do not send data to each other directly. Instead, each one asks the kernel to do it on its behalf. The kernel offers a few basic kinds of communication channel, such as pipes, signals and message queues (we will look at them in another chapter). Pipes and message queues are places where one process can put some information and another process can later pick it up; a signal is a short notification that the kernel delivers to a process. The kernel keeps track of who created each channel, who is allowed to use it, and what information is currently stored there.
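
As a first taste, here is a minimal sketch of a pipe between a parent and a child process, assuming a Unix-like system with fork. The function name send_through_pipe is made up, and the example only handles short messages that fit in a single write.

```python
import os


def send_through_pipe(message):
    """Fork a child that writes message into a pipe; return what the parent reads."""
    read_fd, write_fd = os.pipe()    # the kernel creates the channel
    pid = os.fork()
    if pid == 0:
        # Child: the writer. It only needs the write end.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)
    # Parent: the reader. Closing the unused write end lets read() see end-of-data.
    os.close(write_fd)
    received = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        received += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    return received


if __name__ == "__main__":
    print(send_through_pipe(b"here are the results I just computed"))
```

The two processes never touch each other's memory: the bytes travel through a buffer that the kernel owns.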

A lot of the work in this unit is about waiting. Often a process cannot continue right away: it may be waiting for some data from another process, or waiting for permission to use a shared resource. In that case it tells the kernel "I am waiting for something to happen here" and the kernel takes it off the CPU so that other processes can run. Later, when another process sends data or finishes using the resource, it tells the kernel, and the kernel notices that someone was waiting. The waiting process is then put back into the pool of runnable processes so it can continue. In this way, the inter-process communication unit acts as a kind of traffic controller and message hub inside the kernel, making sure that information and "ready" signals move between processes in an orderly and safe way.
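
The waiting behaviour can be measured directly. In this sketch (Unix-like system assumed, wait_for_child is an invented name) the parent reads from an empty pipe, so the kernel puts it to sleep until the child writes:

```python
import os
import time


def wait_for_child(delay):
    """Block on an empty pipe until a child writes after `delay` seconds."""
    read_fd, write_fd = os.pipe()
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: pretend to work for a while, then announce completion.
        try:
            os.close(read_fd)
            time.sleep(delay)
            os.write(write_fd, b"done")
        finally:
            os._exit(0)
    os.close(write_fd)
    # The pipe is empty, so this call blocks: the parent is taken off the CPU
    # and becomes runnable again only once data arrives.
    message = os.read(read_fd, 16)
    waited = time.monotonic() - start
    os.close(read_fd)
    os.waitpid(pid, 0)
    return message, waited


if __name__ == "__main__":
    message, waited = wait_for_child(0.2)
    print(message, f"after about {waited:.2f} s")
```

The parent does no polling of its own; the wake-up is arranged entirely by the kernel when the write happens.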

Syscalls, drivers and modules

To use these kernel services, user programs do not call kernel functions directly. Instead they use system calls, often shortened to "syscalls". A system call is a controlled entry point into the kernel: a program asks the kernel to perform an operation it is not allowed to do by itself, such as opening a file, creating a process or configuring a network interface.
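
Two small experiments show what this looks like from a program. The sketch assumes a Unix-like system where ctypes.CDLL(None) gives access to the C library already loaded into the process; the function names are made up for the example.

```python
import ctypes
import errno
import os

# Handle to the C library already loaded into this process.
libc = ctypes.CDLL(None, use_errno=True)


def pid_via_libc():
    """Call the C library's getpid(), a thin wrapper around the getpid system call."""
    return libc.getpid()


def open_error_code(path):
    """Ask the kernel to open path; return the error code it reports, or 0 on success."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


if __name__ == "__main__":
    print("pid from libc:", pid_via_libc(), "pid from os module:", os.getpid())
    code = open_error_code("/this/path/should/not/exist")
    print("open failed with", errno.errorcode.get(code, code))
```

The program only gets to ask. The kernel checks the request and either performs it or answers with an error code, which Python surfaces as an OSError.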

Hardware access itself is mostly handled by device drivers. A device driver is a piece of kernel code that knows how to talk to a particular piece of hardware: disks, network cards, USB controllers, graphics adapters and so on. The driver translates generic kernel operations (such as "send this packet" or "read this block", which are supposed to work in the same way on all Linux machines) into the specific register writes, commands and protocols that this particular piece of hardware understands.
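
The idea of one generic operation with several device-specific implementations can be sketched as a toy model. Everything here is hypothetical: real drivers are kernel C code, and BlockDriver, RamDiskDriver and PatternDriver exist only in this example.

```python
from abc import ABC, abstractmethod


class BlockDriver(ABC):
    """Hypothetical generic interface, modeled on the "read this block" request."""

    block_size = 512

    @abstractmethod
    def read_block(self, index):
        """Return block_size bytes for the given block number."""


class RamDiskDriver(BlockDriver):
    """Toy device that keeps its blocks in a bytearray."""

    def __init__(self, blocks):
        self._storage = bytearray(blocks * self.block_size)

    def read_block(self, index):
        start = index * self.block_size
        return bytes(self._storage[start:start + self.block_size])


class PatternDriver(BlockDriver):
    """Toy device that generates each block from its block number."""

    def read_block(self, index):
        return bytes([index % 256]) * self.block_size


def checksum_first_blocks(driver, count):
    """Generic code: works with any driver that implements read_block."""
    return sum(sum(driver.read_block(i)) for i in range(count))


if __name__ == "__main__":
    for driver in (RamDiskDriver(4), PatternDriver()):
        print(type(driver).__name__, checksum_first_blocks(driver, 3))
```

The caller, checksum_first_blocks, is written once against the generic interface and never learns how each device produces its data.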

Many drivers in Linux are built as kernel modules. A kernel module is a chunk of code that can be loaded into or removed from the kernel while it is running. Unlike a user-space program, a module does not get its own process or memory space; once loaded it becomes part of the kernel and runs with full kernel privileges. This modularity allows Linux to support a wide range of hardware without compiling everything directly into a single huge kernel image; it also allows administrators and developers to add, update or experiment with drivers without rebooting.
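
On a Linux system, the currently loaded modules are listed in /proc/modules, one per line. The sketch below parses the first four fields of each line (name, size, reference count and the modules that use it); the function names are invented, and on systems without that file the list is simply empty.

```python
import os


def parse_modules(text):
    """Parse /proc/modules-style text into a list of dictionaries."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                 # skip lines that do not look like module entries
        name, size, refcount, users = fields[:4]
        modules.append({
            "name": name,
            "size": int(size),
            # The count can be "-" on kernels built without module unloading.
            "refcount": int(refcount) if refcount.isdigit() else None,
            # "-" means no other module uses this one; otherwise a comma-separated list.
            "used_by": [] if users == "-" else [u for u in users.split(",") if u],
        })
    return modules


def loaded_modules(path="/proc/modules"):
    """Return the parsed module list, or an empty list if it cannot be read."""
    if not os.path.exists(path):
        return []
    try:
        with open(path) as handle:
            return parse_modules(handle.read())
    except OSError:
        return []


if __name__ == "__main__":
    for module in loaded_modules()[:5]:
        print(module["name"], module["size"], module["used_by"])
```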