Hardware level

System Programming Fundamentals

Central Processing Unit

The CPU is connected to other components via buses, small lines that transmit data and signals back and forth.

The data that the CPU receives is temporarily stored in registers, which on modern 64-bit CPUs are mostly 64 bits wide (although these CPUs also have bigger and smaller registers for particular operations). In the previous chapter we talked about memory cells: registers are the fastest memory cells available to your CPU. They're relatively expensive and small, but they can be read and written very quickly.
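To make the "64 bits each" part concrete, here's a minimal Python sketch of what it means for a value to live in a fixed-width register. The names `MASK` and `add64` are made up for this illustration; a real CPU does this in hardware, and also records the overflow in a flag, which this toy ignores.

```python
# Toy model of a 64-bit register: a value that always fits in 64 bits.
REGISTER_BITS = 64
MASK = (1 << REGISTER_BITS) - 1  # 64 ones in binary, the largest storable value


def add64(a: int, b: int) -> int:
    """Add two values the way a 64-bit register would, discarding overflow bits."""
    return (a + b) & MASK


print(hex(MASK))       # 0xffffffffffffffff
print(add64(2, 3))     # 5
print(add64(MASK, 1))  # 0: the result doesn't fit in 64 bits, so it wraps around
```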

The CPU has various specialized units to operate on the binary data in various ways: some for doing integer arithmetic, others for operating on floating point numbers, others yet for operating on vectors etc.
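One way to see why integers and floating point numbers need different units: the same 64 bits mean completely different things depending on how you interpret them. This small sketch uses Python's standard `struct` module to look at the bits of the floating point number 1.0 as if they were an integer. It only illustrates the representation; it says nothing about how the units themselves are built.

```python
import struct

# Pack the floating point number 1.0 into 8 bytes (64 bits, little-endian)...
raw = struct.pack("<d", 1.0)

# ...then reinterpret those same 8 bytes as an unsigned 64-bit integer.
as_int = struct.unpack("<Q", raw)[0]

print(hex(as_int))  # 0x3ff0000000000000
```

An integer unit given those bits would see a huge number, while a floating point unit would see 1.0.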

So the data arrives through buses, gets stored in registers, gets operated on by the units, and is then sent out via buses again. This is the so-called datapath of the CPU.

The CPU also has a control unit, which directs the sequence of operations, following a cycle like this:

fetch phase: the control unit has a special register that stores the program counter, i.e. the address in memory from which it's supposed to get the next instruction. During this phase it goes to that address and gets the instruction (and also updates the counter to point to the next instruction in the program it's executing, ready for the next cycle)
decode phase: the control unit converts the instruction into microcode, which is its internal instruction format. Many modern CPU manufacturers maintain this separation between a machine code format (which programmers can target when writing programs with an assembler) and a microcode format (which is what the CPU actually executes). This way they keep a certain degree of control over the processor, which allows them, for example, to patch the behavior of the CPU with updates: the manufacturer can publish an update that changes the way your machine code gets translated into microcode (this can be done to improve the security and performance of the CPU, but some suspect it could also be used to implement backdoors in our machines)
execute phase: the CPU executes the microcode, trying to optimize and parallelize the execution as much as possible
before starting a new cycle, the control unit checks an interrupt interface to see whether any interrupts have arrived from other hardware components. An interrupt is a signal that a piece of hardware can send to the CPU to communicate that something happened or needs to happen. Without interrupts, the CPU would just execute a program top to bottom, with no way for external events to get its attention. Interrupts allow us to have interactive programs, because the control unit will check for these signals at each cycle, check their priority level, and if necessary tell the OS that it needs to dispatch a certain function to handle the interrupt. For example, when you press a key x on your keyboard, the keyboard sends an interrupt to the CPU; the CPU finishes its current cycle, checks the interrupts, and tells the operating system to run the function that has been registered to handle keyboard input (which in most cases will pass the signal down to the current program, for example a text editor, which will read the x and insert it into the text).
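The cycle described above can be sketched as a small Python simulation. Everything here is invented for the illustration: the three instructions (`LOAD`, `ADD`, `HALT`), the register names, and the `interrupts_at` parameter that pretends a piece of hardware raised an interrupt during a given cycle. A real control unit is far more complex (and decodes into microcode rather than Python tuples), so take this as a toy model only.

```python
def run(program, interrupts_at=None):
    """Run a toy program; interrupts_at maps a cycle number to an interrupt name."""
    interrupts_at = interrupts_at or {}
    registers = {"r0": 0, "r1": 0}
    pc = 0      # program counter: index of the next instruction to fetch
    cycle = 0
    log = []    # interrupts handled, in order

    while True:
        # Fetch: read the instruction at pc, then point pc at the next one.
        instruction = program[pc]
        pc += 1

        # Decode: split the instruction into an operation and its operands.
        op, *args = instruction

        # Execute: carry out the operation on the registers.
        if op == "LOAD":
            reg, value = args
            registers[reg] = value
        elif op == "ADD":
            dst, src = args
            registers[dst] += registers[src]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")

        # Before the next cycle, check whether an interrupt is pending.
        if cycle in interrupts_at:
            log.append(f"cycle {cycle}: handled {interrupts_at[cycle]}")
        cycle += 1

    return registers, log


program = [
    ("LOAD", "r0", 2),
    ("LOAD", "r1", 3),
    ("ADD", "r0", "r1"),
    ("HALT",),
]
registers, log = run(program, interrupts_at={1: "key press"})
print(registers)  # {'r0': 5, 'r1': 3}
print(log)        # ['cycle 1: handled key press']
```

Note how the interrupt doesn't disturb the instruction that's currently executing: it's only looked at once the cycle is finished.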

All the different components of a CPU are kept synchronized by a high-frequency clock. The clock is like a drummer that keeps the rhythm of the execution, with modern CPUs executing billions of cycles per second (1 hertz = 1 cycle per second, 1 gigahertz = 1 billion cycles per second). In this simplified model, at each tick of the clock the control unit executes a new cycle, processing data on the datapath.
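To get a feel for these numbers, here's a quick back-of-the-envelope calculation. The 3 GHz frequency is just an assumed example, not a measurement of any particular CPU, and `cycle_time_ns` is a helper made up for this sketch.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds, given the clock frequency."""
    return 1e9 / frequency_hz


frequency_hz = 3e9  # assumed example: a 3 GHz clock
print(f"one cycle lasts about {cycle_time_ns(frequency_hz):.2f} ns")
# one cycle lasts about 0.33 ns
```

In other words, at 3 GHz a single tick lasts about a third of a billionth of a second.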

What we have described so far is the model of a single CPU core. A modern CPU has multiple cores (often 4 to 8, sometimes many more), which allows it to execute multiple tasks at the same time. Usually there's one core that's designated to boot the computer, and it can then use certain instructions to start up the other ones.
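If you're curious how many cores your own machine exposes, Python's standard library can ask the operating system. Keep in mind that `os.cpu_count()` reports logical CPUs, which can be higher than the number of physical cores on CPUs that run more than one hardware thread per core.

```python
import os

# Number of logical CPUs the operating system reports.
# The call may return None if the number cannot be determined.
logical_cpus = os.cpu_count()
print(f"Logical CPUs visible to the OS: {logical_cpus}")
```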

This is a very generic idea of how a modern CPU works. Different CPUs will vary a lot in their details, having different architectures.

The most common CPU architectures nowadays are ARM (which is used in Apple computers, smartphones, and a lot of embedded devices and hardware components) and x86 (which is used in most Linux and Windows computers).
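You can check which of the two families your own machine belongs to with Python's standard `platform` module. The `family` helper and its list of names are assumptions made for this sketch: operating systems report the architecture under different spellings, and the list below covers only some common ones.

```python
import platform


def family(machine: str) -> str:
    """Map an architecture name as reported by the OS to a broad family."""
    name = machine.lower()
    if name in ("x86_64", "amd64", "i386", "i686"):
        return "x86"
    if name.startswith("arm") or name == "aarch64":
        return "ARM"
    return "other"


machine = platform.machine()  # e.g. "x86_64", "AMD64", "arm64" or "aarch64"
print(f"{machine!r} looks like: {family(machine)}")
```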

ARM is a RISC architecture (Reduced Instruction Set Computer). It has a more limited set of instructions, which means developers must find ways to write their programs using only those instructions, sometimes making software harder to write and to optimize. But it's also cheaper to produce and it consumes less energy to run (that's how smartphones and embedded devices can get by with small batteries, and how certain Apple laptops can have such long-lasting battery charge).

x86 is a CISC architecture (Complex Instruction Set Computer). It has a much larger set of instructions, which can make it easier to write code for, but also causes higher energy consumption.
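Here's a toy illustration of the difference, using the task "add a register to a value stored in memory". On x86 this can be written as a single instruction that operates directly on memory, while ARM only lets dedicated load and store instructions touch memory, so the same job takes three steps. The Python functions below are invented for this sketch and only mimic that shape; the instruction counts say nothing about actual speed or energy use.

```python
def cisc_add_to_memory(memory, registers, addr, reg):
    """CISC style: a single instruction reads memory, adds, and writes back."""
    memory[addr] += registers[reg]
    return 1  # instructions used


def risc_add_to_memory(memory, registers, addr, reg, scratch):
    """RISC style: memory is only touched by explicit load and store steps."""
    registers[scratch] = memory[addr]     # LOAD  scratch <- memory[addr]
    registers[scratch] += registers[reg]  # ADD   scratch <- scratch + reg
    memory[addr] = registers[scratch]     # STORE memory[addr] <- scratch
    return 3  # instructions used


cisc_memory = {0x10: 40}
cisc_count = cisc_add_to_memory(cisc_memory, {"r0": 2}, 0x10, "r0")

risc_memory = {0x10: 40}
risc_count = risc_add_to_memory(risc_memory, {"r0": 2, "r1": 0}, 0x10, "r0", "r1")

print(cisc_memory[0x10], cisc_count)  # 42 1
print(risc_memory[0x10], risc_count)  # 42 3
```

Both versions end up with the same value in memory; the difference is in how much work each individual instruction is allowed to do.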

There are many lesser-known architectures, but these are the two most important ones.

The most direct way to program a CPU is to use an assembler. An assembler allows you to write human-readable code (in what's called an assembly language) and have it translated into instructions for a certain CPU. For example, FASM is one of the most used assemblers for x86. With the help of a macro library (the 'lib.inc' file included on the first line), it allows you to write code like this:

include 'lib.inc'

elf64

data
  msg line "Hello World"

code 
  print msg
  exit 0

This is not as bad as most people would expect Assembly to be, but it's not as simple as print("Hello World") either, which is why most developers prefer to sacrifice the performance and control of Assembly in favor of higher-level languages.
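To demystify what an assembler does, here's a toy one written in Python. It understands only a handful of x86-64 instruction forms and turns each line of text into the bytes the CPU would read. The function names and the tiny supported subset are choices made for this sketch; it is not how FASM itself is implemented, and a real assembler also handles labels, many more instructions, and executable file formats.

```python
import struct

# Register numbers as used in x86 instruction encoding (only two needed here).
REGISTERS = {"eax": 0, "edi": 7}


def assemble_line(line: str) -> bytes:
    """Translate one line of a tiny x86-64 subset into machine code bytes."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic == "nop":
        return b"\x90"
    if mnemonic == "ret":
        return b"\xc3"
    if mnemonic == "syscall":
        return b"\x0f\x05"
    if mnemonic == "mov":
        # Only the form "mov <32-bit register>, <number>" is supported:
        # opcode 0xB8 + register number, then the number as 4 little-endian bytes.
        reg, value = (part.strip() for part in operands.split(","))
        return bytes([0xB8 + REGISTERS[reg]]) + struct.pack("<I", int(value, 0))
    raise ValueError(f"unsupported instruction: {line!r}")


def assemble(source: str) -> bytes:
    """Assemble every non-empty line and concatenate the resulting bytes."""
    lines = [line for line in source.splitlines() if line.strip()]
    return b"".join(assemble_line(line) for line in lines)


source = """
mov eax, 60
mov edi, 0
syscall
"""
code = assemble(source)
print(code.hex(" "))  # b8 3c 00 00 00 bf 00 00 00 00 0f 05
```

On 64-bit Linux those three instructions ask the kernel to end the program with exit code 0 (60 is the number of the exit system call), which is roughly the job that a line like exit 0 has to do under the hood.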