Bitfields and bitflags

Modern software has an incredible variety of use cases.

We use software to communicate from one country to another, to navigate roads, to play movies, to model the physical constraints of buildings and vehicles, to simulate virtual realities in videogames, etc.

Software engineers are often looking for uniform interfaces: ways to represent data in a format that all of these different programs can digest, allowing them to communicate and interoperate with each other.

For example, when you visit a webpage as a user, you open it with your browser; but when you build your own website, you might want to edit the page with a text editor. So webpages are written as HTML documents, which your text editor can edit as regular text and your browser can render as interactive web applications. This is an example of a uniform interface that both the browser and the text editor can work with.

At the lowest level, basically all modern software relies on a single universal interface: binary code.

The smallest unit of binary code is the binary digit, aka bit.

The goal of this unit is to hold state, to indicate one of multiple possibilities.

A bit has only two possible states, which can be represented as 0 and 1. Two states were chosen because that is easier to implement in hardware than a unit with 3 (or 4, or 5...) states. But with only 2 possible states, a bit cannot hold much information. You could represent a single yes/no value with a bit, for example:

yes = 1

no = 0

But that's not very useful. So we tend to group bits into bigger blocks, like bytes. There has been a lot of debate about how many bits a byte should contain, but modern tech has converged on it containing 8 bits.
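To see why grouping bits helps, here is a minimal Python sketch of how the number of possible states grows with the number of bits. The helper name `state_count` is made up for this illustration.

```python
# A group of n bits can be in 2 ** n distinct states,
# because each extra bit doubles the number of combinations.
def state_count(n_bits: int) -> int:
    return 2 ** n_bits

one_bit = state_count(1)    # a single bit: 0 or 1
one_byte = state_count(8)   # an 8-bit byte

# Every pattern that 2 bits can take, written as binary strings
two_bit_patterns = [format(value, "02b") for value in range(state_count(2))]

print(one_bit)           # 2
print(one_byte)          # 256
print(two_bit_patterns)  # ['00', '01', '10', '11']
```

So a single 8-bit byte can distinguish 256 different possibilities, compared to just 2 for a lone bit.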

Basically everything in your computer (text files, pictures, videos, executables, etc.) is stored as binary code, as sequences of bytes.

If you think about it, this makes raw binary very hard to decipher. If I just give you a huge binary sequence, how would you know if it's a movie or a program? They're both represented as binary code!

That's why software employs various ways to tag binary sequences and give them context: at the OS (operating system) level, files usually have a file extension (like .txt or .exe); programming languages usually have a type system, which allows the compiler to know "this is an integer", "that one is a pointer to a function", etc.
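As a small illustration of why context matters, the following Python sketch reads the same two bytes in two different ways. The byte values and the choice of ASCII text and big-endian integers are just assumptions picked for this example; they are not the only possible interpretations.

```python
# The same two bytes, with no context attached
data = bytes([0b01001000, 0b01101001])  # decimal 72 and 105

# Interpretation 1: treat each byte as an ASCII character
as_text = data.decode("ascii")

# Interpretation 2: treat both bytes together as one unsigned integer
# (most significant byte first, an arbitrary choice for this sketch)
as_number = int.from_bytes(data, byteorder="big")

print(as_text)    # Hi
print(as_number)  # 18537
```

Nothing in the bytes themselves says which reading is the "right" one. That information has to come from somewhere else, such as a file extension or a type.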

Different data can be encoded in different ways. When you write a program, you can come up with your own way to encode data.

Bitfields and bitflags

The simplest example of a custom encoding is probably the bitfield.

Let's say you are looking at apartments in your city, and you have a list of requirements:

It comes already furnished
It doesn't cost too much
It's not too far from a bus station
It has air conditioning
It's in a safe area of the city
It has wifi

You can use one byte per apartment to record which of these conditions it satisfies, with one bit per condition: 1 indicates that the feature is present in the apartment, and 0 indicates that it is missing. For example, let's say you find an apartment that satisfies 5 of your requirements, but lacks air conditioning:

Requirement	Does this apartment satisfy it?	Binary value
It comes already furnished	yes	1
It doesn't cost too much	yes	1
It's not too far from a bus station	yes	1
It has air conditioning	no	0
It's in a safe area of the city	yes	1
It has wifi	yes	1
(unused)		0
(unused)		0

Since you only have 6 requirements and a byte is 8 bits, the last two bits are left unused (we don't care whether they're 1 or 0). This structure as a whole is called a bitfield (in this example it fits in just one byte, but it could span more bytes if needed). In this case we say the bitfield has 6 bitflags, one for each requirement.

That's a lot of information packed into a single byte!
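Here is one way the apartment example could look in Python. Treat it as a sketch under a few assumptions: the constant names and the `has` helper are invented for this illustration, and putting the first requirement in the lowest bit is an arbitrary choice (any fixed order works as long as you use it consistently).

```python
# One bit position per requirement (the order is an assumption of this sketch)
FURNISHED        = 1 << 0  # 0b00000001
AFFORDABLE       = 1 << 1  # 0b00000010
NEAR_BUS         = 1 << 2  # 0b00000100
AIR_CONDITIONING = 1 << 3  # 0b00001000
SAFE_AREA        = 1 << 4  # 0b00010000
WIFI             = 1 << 5  # 0b00100000
# Bits 6 and 7 are unused

# The apartment from the table: everything except air conditioning.
# Bitwise OR combines the individual flags into one bitfield.
apartment = FURNISHED | AFFORDABLE | NEAR_BUS | SAFE_AREA | WIFI

def has(bitfield: int, flag: int) -> bool:
    """Return True if the given flag is set in the bitfield."""
    return (bitfield & flag) != 0

print(format(apartment, "08b"))          # 00110111
print(has(apartment, WIFI))              # True
print(has(apartment, AIR_CONDITIONING))  # False

# Setting a flag uses OR; clearing a flag uses AND with the inverted flag
with_ac = apartment | AIR_CONDITIONING
without_wifi = apartment & ~WIFI
print(format(with_ac, "08b"))       # 00111111
print(format(without_wifi, "08b"))  # 00010111
```

The whole description of the apartment fits in the single number 55, and checking any one requirement is a single bitwise AND.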

Bitfields are extremely useful for encoding program-specific information in a very small number of bits, but most of the time we want to encode kinds of information that are common to many different programs, like numbers or text. For those we have a variety of standard encodings, as we will see in the next lessons.