Hardware level

Memory

System Programming Fundamentals


Modern computers have various types of memory they can store data in, the main ones being:

CPU registers

We already analyzed these in the previous chapter. They're tiny (from a few bytes to kilobytes) slots inside the CPU. They hold the values the CPU is using right now (like numbers for the current calculation). Access is basically instantaneous for the CPU.

CPU caches

The CPU tries to infer which data is going to be accessed more often and then stores it temporarily in its cache. There are three common layers:
- L1 cache: the smallest (16 to 128 kilobytes) and fastest cache, each CPU core has one.
- L2 cache: a bit bigger (256 kilobytes to 8 megabytes) and slightly slower, often per core but sometimes shared among multiple cores.
- L3 cache: the biggest (1 to 32 megabytes or even more) and slowest cache, usually shared across cores.
When the CPU needs some data, it looks in L1 first, then L2, then L3. Finding the data at one of these levels is a "cache hit", which is great; if it’s in none of them, that’s a "cache miss" and the CPU has to fetch the data from RAM, which is much slower. If you access the same data multiple times in a row (temporal locality) or nearby data in memory (spatial locality), it’s more likely to be in cache and your code runs faster. It’s generally a good idea to organize data access to take advantage of this. For example, if you’re doing many operations on the elements of an array, do as many operations as possible on the current element before moving on to the next, so it stays hot in cache. Jumping all over a data structure makes it harder for the CPU to predict what you’ll need next and causes more cache misses, forcing the CPU to fetch the data all the way from RAM.
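To make the effect of spatial locality concrete, here is a minimal sketch of a toy cache simulator. It is not how a real CPU cache works in detail: it assumes a direct-mapped cache with 64-byte lines and 32 KB of capacity (all made-up parameters), and it ignores associativity and prefetching. The names `count_misses`, `sequential` and `strided` are invented for this example. It counts misses for two walks over the same array, one in order and one jumping around.

```python
# Toy model of a direct-mapped cache; real CPU caches are set-associative
# and add hardware prefetching, so only the trend is meaningful here.
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_LINES = 512   # 512 lines * 64 bytes = 32 KB of toy cache


def count_misses(addresses):
    """Return how many of the given byte addresses miss in the toy cache."""
    lines = [None] * NUM_LINES  # memory block currently held by each line
    misses = 0
    for address in addresses:
        block = address // LINE_SIZE   # which 64-byte block of memory
        index = block % NUM_LINES      # the only line that block may occupy
        if lines[index] != block:
            misses += 1                # must fetch from the next level
            lines[index] = block
    return misses


ELEMENT_SIZE = 8       # bytes per element, e.g. a 64-bit integer
NUM_ELEMENTS = 65536   # 512 KB array, far larger than the toy cache

# Sequential walk: element 0, 1, 2, ...
sequential = [i * ELEMENT_SIZE for i in range(NUM_ELEMENTS)]

# Strided walk over the same elements: 0, 8, 16, ..., then 1, 9, 17, ...
STRIDE = 8
strided = [i * ELEMENT_SIZE
           for start in range(STRIDE)
           for i in range(start, NUM_ELEMENTS, STRIDE)]

if __name__ == "__main__":
    print("sequential misses:", count_misses(sequential))
    print("strided misses:   ", count_misses(strided))
```

In this model the sequential walk misses once per cache line and then hits on the seven neighbouring elements, while the strided walk touches a different line on every access and evicts each line before coming back to it.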

Random Access Memory (RAM)

This is the main memory of a computer, where the kernel, the programs and their data live while they run. RAM is volatile memory (it loses its contents when power is off). On desktops/laptops you’ll usually see 8–64 GB; workstations/servers can have hundreds of GB or even TBs. It's much bigger than caches but slower.

Storage units (SSD/HDD)

This is long-term storage, used to keep files even when the computer is off. Nowadays a single drive can hold terabytes, but it's much slower than RAM. When you start a program, the OS loads pieces of it from storage into RAM, so the CPU can read the program's instructions faster. Solid State Drives (SSDs) are far quicker than Hard Disk Drives (HDDs), but both are orders of magnitude slower than RAM and caches.

These are listed in order from the fastest (and most expensive per byte), CPU registers, to the slowest (and cheapest), HDDs (often absent from modern PCs, which tend to use SSDs for storage). This is called the memory hierarchy: ideally you want a program to work as much as possible on the fastest types of memory, and to use the SSD only for storing data long-term (reading and writing files from the SSD is often a cause of lag in unoptimized programs).

Virtual memory, protection and I/O

CPUs usually define privilege levels to protect the system. For example, ARM processors distinguish unprivileged "user mode" from privileged modes such as "supervisor mode", and x86 processors have multiple protection rings, from ring 0 to ring 3, with a higher number indicating a lower level of privilege.

Having different privilege levels allows the CPU to give a certain program, called the kernel, control over the others. This is the basis of how modern OSs work.

Modern kernels organize memory into fixed-size chunks called pages (commonly 4 KB, but other sizes exist), and provide a page table to each program.
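Because memory is handed out in whole pages, any request is effectively rounded up to a multiple of the page size. The following is a small sketch of that arithmetic. `mmap.PAGESIZE` is the page size reported by Python's standard library for the current system; `pages_needed` is a helper invented for this example.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the OS, often 4096 bytes


def pages_needed(num_bytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold num_bytes (rounding up)."""
    return (num_bytes + page_size - 1) // page_size


if __name__ == "__main__":
    print("page size:", PAGE_SIZE)
    for size in (1, 4096, 4097, 1_000_000):
        print(size, "bytes ->", pages_needed(size, 4096), "page(s) of 4 KB")
```

Even a single byte occupies a full page, which is why a buffer of 4097 bytes needs two 4 KB pages.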

Programs don't usually operate on raw, physical RAM addresses. Instead, each process runs in its own virtual address space, and a hardware unit called the MMU (Memory Management Unit) translates virtual addresses to physical addresses using page tables set up by the kernel for that process.

So when a program "accesses" address 0x00005555556aa3c4, it’s really asking the MMU to translate that virtual address; the MMU consults the process’s page table and returns the corresponding physical location in RAM.
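The translation can be sketched in a few lines, under simplifying assumptions: 4 KB pages and a single flat table from virtual page numbers to physical frame numbers. Real MMUs walk multi-level tables in hardware and cache the results. The function names and the frame number `0x1A2B3` are made up for illustration.

```python
PAGE_SIZE = 4096                           # 4 KB pages, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits address a byte within a page


def split_address(virtual_address):
    """Split a virtual address into (virtual page number, offset in page)."""
    page_number = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    return page_number, offset


def translate(virtual_address, page_table):
    """Translate via a toy single-level table: {virtual page: physical frame}."""
    page_number, offset = split_address(virtual_address)
    frame = page_table[page_number]        # a KeyError here stands in for a fault
    return (frame << OFFSET_BITS) | offset


# Hypothetical mapping: the frame number 0x1A2B3 is made up for illustration.
toy_page_table = {0x5555556AA: 0x1A2B3}

if __name__ == "__main__":
    address = 0x00005555556AA3C4
    print([hex(part) for part in split_address(address)])
    print(hex(translate(address, toy_page_table)))
```

Only the page number is translated; the offset within the page (the low 12 bits, `0x3c4` here) is carried over unchanged.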

By managing those page tables, the kernel can give a program exclusive pages, share pages between processes (e.g., for shared libraries), or even deny access. If a program touches memory that isn’t mapped or isn’t allowed, the CPU raises an error (a fault) and the OS steps in (to load it, create it, or crash the program, depending on the case).
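As a rough illustration of sharing, denying access and faults, here is a toy model in which each process has its own table and each entry carries a write permission. Everything in it (`PageEntry`, `PageFault`, `access`, the page and frame numbers) is hypothetical; real page table entries hold more flags, and the fault is raised by the hardware rather than by a function call.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096


class PageFault(Exception):
    """Stand-in for the hardware fault raised on an invalid access."""


@dataclass
class PageEntry:
    frame: int               # physical frame number backing this page
    writable: bool = False   # read-only unless the kernel says otherwise


def access(page_table, virtual_address, write=False):
    """Return the physical address, or raise PageFault like a toy MMU."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None:
        raise PageFault(f"page {page_number:#x} is not mapped")
    if write and not entry.writable:
        raise PageFault(f"page {page_number:#x} is read-only")
    return entry.frame * PAGE_SIZE + offset


# Two hypothetical processes: both map the same physical frame 7 read-only
# (think of a shared library), and each has one private writable page.
process_a = {0x10: PageEntry(frame=7), 0x20: PageEntry(frame=3, writable=True)}
process_b = {0x10: PageEntry(frame=7), 0x20: PageEntry(frame=9, writable=True)}

if __name__ == "__main__":
    # Same virtual address, same physical location: the page is shared.
    print(access(process_a, 0x10004), access(process_b, 0x10004))
    try:
        access(process_a, 0x10004, write=True)
    except PageFault as fault:
        print("fault:", fault)
```

The same virtual address `0x20008` lands in a different physical frame for each process, which is what keeps their private data isolated.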

So the CPU executes the instructions of every program, but it executes the ones from the kernel of your OS with a higher level of authorization, which gives the kernel direct control over the hardware. All other programs are executed at a lower level of authorization and must rely on the kernel to access the hardware.

We've seen how access to RAM is managed using virtual memory, but how does it work for other pieces of hardware? Well... using virtual memory. Some addresses are not mapped to RAM, but to other components. For example, a certain range of addresses will correspond to the GPU's internal memory, and when the kernel must allow a program to communicate with the GPU, it will give access to those addresses; the program will treat these addresses pretty much as regular RAM addresses, but when it writes to them it will instead send instructions to the GPU. This technique is called memory-mapped I/O (MMIO), and it allows the kernel to manage pretty much all hardware access just by managing which addresses a program can access.
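The idea can be modelled in software, although real MMIO happens in hardware and cannot be reproduced from ordinary Python code. The sketch below is a toy: `ToyBus` and `ToyDevice` are invented names, and the address layout (64 KB of RAM followed by a 256-byte device window) is made up. It only shows that one and the same store operation ends up in RAM or at a device depending on the address.

```python
class ToyDevice:
    """Hypothetical device that records every value written to its registers."""

    def __init__(self):
        self.received = []

    def write_register(self, register, value):
        self.received.append((register, value))


class ToyBus:
    """Routes each physical address either to RAM or to a device's range."""

    def __init__(self, ram_size, device, device_base, device_size):
        self.ram = bytearray(ram_size)
        self.device = device
        self.device_base = device_base
        self.device_end = device_base + device_size

    def store(self, address, value):
        if self.device_base <= address < self.device_end:
            # Same store operation, but it lands on the device, not in RAM.
            self.device.write_register(address - self.device_base, value)
        else:
            self.ram[address] = value


if __name__ == "__main__":
    # Made-up layout: 64 KB of RAM, then a 256-byte device window at 0x10000.
    gpu = ToyDevice()
    bus = ToyBus(ram_size=0x10000, device=gpu,
                 device_base=0x10000, device_size=0x100)
    bus.store(0x0040, 0x2A)    # ordinary RAM write
    bus.store(0x10004, 0x01)   # write to device register 4
    print(bus.ram[0x0040], gpu.received)
```

In this model, granting or denying a program access to the device is just a matter of whether the range starting at `0x10000` appears in its page table.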

Swap memory

So far we’ve talked about virtual memory mainly as a way to isolate processes and control what parts of RAM they can access. The same mechanism also helps with another problem: what to do when the system is running out of RAM.

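Swap is a technique the operating system uses when RAM is getting full. It reserves a part of the storage device (SSD/HDD) and uses it as extra space for memory, so the system can keep running even if there isn’t enough RAM for everything at once. The kernel moves pages that haven’t been used recently to that space and marks them as not present in the page table; when a program touches one of them, the resulting fault lets the kernel load the page back into RAM.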

This keeps the system from immediately failing when memory is low, but it comes with a cost: reading and writing to storage is much slower than RAM. Light swap usage can help during short spikes, but if the system is constantly swapping, programs tend to feel slow and unresponsive.
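The difference between fitting in RAM and constant swapping can be sketched with a toy model. It assumes the kernel always evicts the least recently used page (LRU); real kernels use more elaborate policies, and `count_swap_ins` is a name invented for this example. It counts how many times a page has to be read back from swap for a given sequence of page accesses.

```python
from collections import OrderedDict


def count_swap_ins(page_accesses, ram_frames):
    """Count how often a page must be loaded from swap in a toy LRU model."""
    in_ram = OrderedDict()   # pages currently in RAM, least recently used first
    swapped_out = set()      # pages that were evicted to swap space
    swap_ins = 0
    for page in page_accesses:
        if page in in_ram:
            in_ram.move_to_end(page)           # mark as most recently used
            continue
        if page in swapped_out:
            swapped_out.remove(page)
            swap_ins += 1                      # slow path: read from storage
        if len(in_ram) >= ram_frames:
            victim, _ = in_ram.popitem(last=False)
            swapped_out.add(victim)            # write the victim to swap
        in_ram[page] = True
    return swap_ins


if __name__ == "__main__":
    # The working set (3 pages) fits in 3 frames of RAM.
    print("fits:     ", count_swap_ins([0, 1, 2] * 5, ram_frames=3))
    # One page too many: cycling through 4 pages with only 3 frames.
    print("thrashing:", count_swap_ins([0, 1, 2, 3] * 5, ram_frames=3))
```

With just one page more than fits in RAM, this access pattern makes almost every access go to storage, which is the "slow and unresponsive" behaviour described above.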

Swap space is also commonly used for hibernation. When you hibernate a computer, the operating system saves the contents of RAM to storage so it can power off completely, and later restore the exact state you left (open programs, documents, and so on) by loading that saved memory back into RAM.