User level

"Everything is a file"

System Programming Fundamentals

Unix started in the early 1970s at Bell Labs, built by a small group of developers working on machines that were weak by today's standards. Memory and storage were scarce, and the system had to be simple enough that a small team could understand and maintain it. Instead of trying to design a huge operating system that solved every problem in one place, they focused on a small kernel and a collection of reusable programs that ran on top of it. Most interaction with the system happened through a text-based shell, where users typed commands and combined programs at the command line.

From that context came the basic Unix philosophy in userland: write small programs that each do one job well, make them read input and write output in simple formats (often plain text), and design them so they can be connected together. The shell and the process model make it easy to chain programs with pipes, redirect input and output, and treat files, devices and some communication channels in similar ways. Once you understand how programs use these generic abstractions, it becomes easier to reason about Unix systems as a whole, because higher-level behavior is mostly built by composing these pieces.

Unix kernels try to expose as many resources as possible through a single abstraction: the file. This is not just about documents or images stored on disk. The same interface is used for directories, hardware devices, communication channels between programs and even some views into the kernel itself. This idea is often summarized as "everything is a file", and it is one of the main reasons Unix systems feel simple and consistent even though the underlying hardware and software are very diverse.

A regular file is the simplest case. It is an ordered sequence of bytes plus some metadata. The metadata includes things like the file's size, who owns it, which permissions it has and when it was last modified. (Strictly speaking, the file name is not part of that metadata: it is stored in the directory that contains the file.) The operating system does not care what those bytes "mean". A text editor will interpret them as characters, an image viewer will interpret them as pixels, a video player as frames and audio, and so on. From the kernel’s point of view, it just provides operations like "open this file", "read some bytes", "write some bytes" and "close the file".
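The four operations named above map almost one-to-one onto system calls. The following is a minimal sketch in Python, whose os module exposes thin wrappers around them; the function names roundtrip and describe and the file name notes.txt are made up for illustration, and the sketch assumes a small payload that a single write call can store in full.

```python
import os
import stat
import tempfile


def roundtrip(path, payload):
    """Write payload to path, then read it back through raw file descriptors."""
    # open: ask the kernel for a descriptor; create or truncate the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)  # the kernel stores bytes; it does not interpret them
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        # Ask for one byte more than we wrote; the kernel returns what exists.
        return os.read(fd, len(payload) + 1)
    finally:
        os.close(fd)


def describe(path):
    """Return a few pieces of the metadata the kernel keeps about a file."""
    st = os.stat(path)
    return {
        "size": st.st_size,
        "owner_uid": st.st_uid,
        "permissions": stat.filemode(st.st_mode),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "notes.txt")  # hypothetical file name
        print(roundtrip(demo_path, b"hello, bytes\n"))
        print(describe(demo_path))
```

Nothing in roundtrip depends on the payload being text; the same calls would store image or audio data unchanged.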

Files are organized into directories (often called folders). A directory is itself a special kind of file that stores a list of names and the references associated with those names. Some of those names refer to regular files, some refer to other directories. By chaining names together with slashes you get a path, which is how you tell the system "start from this directory and walk through these names until you reach the file I care about".

The directory at the very top is conventionally called the root directory and written as "/". All other paths are somewhere under that root. When you use a path like /home/user/logs/app.log, the kernel resolves it one component at a time by walking these directories:

first it starts at the root directory "/"
then it looks up "home" in the root directory
then "user" in the /home directory
then "logs" in the /home/user directory
then "app.log" in the /home/user/logs directory
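The lookup sequence above can be sketched as a small function that lists which name is looked up in which directory. This is only a string-level illustration under simplifying assumptions: it does not touch the filesystem, and it ignores symbolic links and the special names "." and "..", which a real kernel must handle. The function name resolution_steps is hypothetical.

```python
import os


def resolution_steps(path):
    """Return the (directory, name) lookups needed to resolve an absolute path."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    steps = []
    current = "/"  # resolution of an absolute path starts at the root
    for name in path.split("/"):
        if not name:
            continue  # skip empty components from the leading or repeated slashes
        steps.append((current, name))  # look up `name` inside `current`
        current = os.path.join(current, name)
    return steps


if __name__ == "__main__":
    for directory, name in resolution_steps("/home/user/logs/app.log"):
        print(f"look up {name!r} in {directory}")
```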

Sometimes you want to have the same underlying file appear in multiple places in the directory tree without copying its contents. Symbolic links, often shortened to symlinks, exist for this purpose. A symbolic link is a tiny file whose contents are just another path. When you open a symlink, the kernel silently follows the path stored inside it and opens the target instead. This lets administrators and programs rearrange the tree (or provide convenient aliases) without moving the underlying data.
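To make the difference between "the link" and "the file it points to" concrete, here is a minimal sketch assuming a Unix-like system where the process may create symlinks. The names data.txt, alias.txt and symlink_demo are invented for the example.

```python
import os
import tempfile


def symlink_demo(directory):
    """Create a file and a symlink to it; show what the link stores and opens."""
    target = os.path.join(directory, "data.txt")  # hypothetical names
    link = os.path.join(directory, "alias.txt")
    with open(target, "w") as f:
        f.write("hello\n")

    os.symlink(target, link)         # the link's contents are just the target path
    stored_path = os.readlink(link)  # read the link itself, without following it
    with open(link) as f:            # open() follows the link to the target
        content = f.read()

    return {
        "stored_path": stored_path,
        "content": content,
        "is_link": os.path.islink(link),              # inspects the link itself
        "same_file": os.path.samefile(target, link),  # follows the link
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(symlink_demo(tmp))
```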

The "everything is a file" idea becomes more interesting when you look at hardware devices. Under directories such as /dev you will find many entries that look like files but do not store ordinary data on disk. These are device files: small records the kernel uses to expose hardware through the same open/read/write/close interface. There are two main categories of device files: character devices and block devices.

Character devices represent things you typically interact with as a stream of bytes. Examples include serial ports, keyboards, mice or simple hardware sensors. You open the device file, then read bytes as they arrive or write bytes to send commands. For most such devices there is no concept of "jump to byte 1000" inside the stream; you just consume the bytes in order as the device produces them. From the program’s perspective this looks a lot like reading from or writing to a regular file, but underneath the kernel is talking to the device driver instead of a disk.
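A quick way to see a character device in action is to read from one that exists on practically every system. The sketch below assumes a Linux-like /dev layout with /dev/null and /dev/urandom present; the function names are illustrative only.

```python
import os
import stat


def is_char_device(path):
    """Return True if the path names a character device file."""
    return stat.S_ISCHR(os.stat(path).st_mode)


def read_random_bytes(count, device="/dev/urandom"):
    """Read a few bytes from a character device with the ordinary file calls."""
    fd = os.open(device, os.O_RDONLY)  # same open call as for a regular file
    try:
        return os.read(fd, count)      # bytes come from the driver, not a disk
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("/dev/null is a character device:", is_char_device("/dev/null"))
    print(read_random_bytes(8).hex())
```

The calling code is identical to what it would be for a regular file; only the path differs.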

Block devices represent hardware that naturally works in fixed-size chunks of data, such as disks, partitions or some kinds of non-volatile memory. Instead of treating them as endless streams, the kernel treats them as arrays of equally sized blocks. A filesystem is then layered on top of a block device to turn those blocks into directories and regular files. When a program opens a file from such a filesystem, it does not talk to the block device directly; it talks to the filesystem code, which in turn uses the block device to read and write the right blocks at the right positions.
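The "array of equally sized blocks" view boils down to simple offset arithmetic: block number times block size. The following sketch uses a regular file as a stand-in for a block device, since opening a real one usually requires elevated privileges. The 512-byte block size and the helper names are assumptions made for the example; real devices report their own block size.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed for this sketch; real devices report their own size


def read_block(fd, index, block_size=BLOCK_SIZE):
    """Read block number `index` by computing its byte offset."""
    return os.pread(fd, block_size, index * block_size)


def write_block(fd, index, data, block_size=BLOCK_SIZE):
    """Overwrite block number `index`; data must be exactly one block long."""
    if len(data) != block_size:
        raise ValueError("data must be exactly one block")
    os.pwrite(fd, data, index * block_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # regular file standing in for a disk
        with open(image, "wb") as f:
            f.write(bytes(4 * BLOCK_SIZE))     # four zero-filled blocks
        fd = os.open(image, os.O_RDWR)
        try:
            write_block(fd, 2, b"\xab" * BLOCK_SIZE)
            print(read_block(fd, 2)[:4].hex())
        finally:
            os.close(fd)
```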

Processes also need ways to talk to each other and to the network, and Unix again reuses the file interface for this. Sockets are endpoints for communication. A network socket might represent a TCP connection to another machine; a Unix domain socket might represent a local connection between processes on the same system. In both cases, a program gets a file descriptor for the socket and can then use read and write operations to receive and send data. Unix domain sockets often appear in the filesystem as special entries under directories like /run or /tmp, which reinforces the idea that "you name a path and talk through it", even though a program attaches to such an entry with a connect call rather than a plain open, and the bytes are never stored on disk.
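As a rough illustration, the sketch below creates a Unix domain socket at a filesystem path, connects to it from the same process, and then moves bytes with plain read and write calls on the descriptors. It assumes a Unix-like system and a short temporary path (socket paths have a fairly small length limit); demo.sock and unix_socket_demo are hypothetical names.

```python
import os
import socket
import stat
import tempfile


def unix_socket_demo(directory):
    """Bind a Unix domain socket to a path and exchange bytes through it."""
    path = os.path.join(directory, "demo.sock")  # hypothetical socket name
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)   # creates the special entry in the filesystem
        server.listen(1)
        is_socket = stat.S_ISSOCK(os.stat(path).st_mode)

        client.connect(path)       # attach to the entry by path
        conn, _ = server.accept()
        try:
            # From here on, ordinary descriptor I/O works on both ends.
            os.write(client.fileno(), b"ping")
            received = os.read(conn.fileno(), 4)
        finally:
            conn.close()
    finally:
        client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)  # the entry outlives the socket unless removed
    return is_socket, received


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(unix_socket_demo(tmp))
```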

Another simple communication mechanism is the named pipe, also called a FIFO. FIFO stands for "first in, first out": the data is read in the same order it was written. A named pipe appears in the filesystem with a path, just like a regular file. One process opens it for writing, another opens it for reading. The kernel then transports the bytes between them. Nothing is permanently stored on disk; the named pipe entry is just a rendezvous point so that the two processes can find each other. Once again, the programs simply open a path and use normal read/write calls, without needing to know how the kernel forwards the data.
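The rendezvous behavior can be sketched with two threads standing in for the two processes. The example assumes a Unix-like system that supports mkfifo; demo.fifo and fifo_demo are made-up names. Note that each open call blocks until the other side has opened the pipe too, which is why the writer runs in its own thread.

```python
import os
import tempfile
import threading


def fifo_demo(directory, message=b"first\nsecond\n"):
    """Send bytes through a named pipe and return what the reader received."""
    path = os.path.join(directory, "demo.fifo")  # hypothetical FIFO name
    os.mkfifo(path)  # creates the filesystem entry; no data is stored in it

    def writer():
        # Blocks until a reader has opened the FIFO as well.
        with open(path, "wb") as pipe_out:
            pipe_out.write(message)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        # Blocks until the writer has opened the FIFO; read() then returns
        # everything up to the end-of-file seen when the writer closes it.
        with open(path, "rb") as pipe_in:
            data = pipe_in.read()
    finally:
        thread.join()
        os.unlink(path)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(fifo_demo(tmp))
```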

All of these objects (regular files, directories, symlinks, device files, sockets and FIFOs) are not identical internally, but they largely reuse the same small set of operations. In most cases programs open them, read from them, write to them and close them using the same system calls. This uniform interface makes Unix systems easier to program against: tools that were originally designed to read from or write to regular files often work unchanged with device files, sockets or pipes.
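One way to see this uniformity is to write a single helper that only knows about file descriptors and then hand it three different kinds of object. The sketch below is illustrative; read_all and uniform_demo are hypothetical names, and an anonymous pipe and a socket pair are used so that the example stays within one process.

```python
import os
import socket
import tempfile


def read_all(fd, chunk_size=4096):
    """Read from any descriptor until end-of-file, whatever is behind it."""
    parts = []
    while True:
        data = os.read(fd, chunk_size)
        if not data:  # an empty result signals end-of-file
            return b"".join(parts)
        parts.append(data)


def uniform_demo():
    """Feed a regular file, a pipe and a socket to the same helper."""
    results = {}

    fd, path = tempfile.mkstemp()  # regular file
    try:
        os.write(fd, b"from a file")
        os.lseek(fd, 0, os.SEEK_SET)  # rewind before reading back
        results["file"] = read_all(fd)
    finally:
        os.close(fd)
        os.unlink(path)

    read_end, write_end = os.pipe()  # anonymous pipe
    os.write(write_end, b"from a pipe")
    os.close(write_end)              # closing the writer produces end-of-file
    results["pipe"] = read_all(read_end)
    os.close(read_end)

    left, right = socket.socketpair()  # connected pair of sockets
    left.sendall(b"from a socket")
    left.close()
    results["socket"] = read_all(right.fileno())
    right.close()

    return results


if __name__ == "__main__":
    print(uniform_demo())
```

The helper never asks what kind of object it was given; it relies only on read returning an empty result at end-of-file.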

This unification has trade-offs. Some resources do not naturally behave like byte streams, and forcing them into that shape can make interfaces harder to understand rather than simpler. Many devices and kernel features need operations that do not fit cleanly into read and write. The file metaphor can also hide important differences in performance and behavior. Treating a disk file, a terminal and a network socket as interchangeable "things you read and write" is convenient at first, but it can encourage code that ignores latency, blocking behavior or error modes that matter a lot for interactive or networked programs. In practice, Unix keeps the common subset of operations uniform, but still exposes many special cases and additional APIs to go beyond that subset.
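The differences that the file metaphor hides show up as soon as code asks for slightly more than read and write. The sketch below probes a descriptor in two ways: whether it can be repositioned, and what a non-blocking read does when no data is available. The function name probe is invented for the example, and the behavior described in the comments assumes a Unix-like system.

```python
import errno
import os
import tempfile


def probe(fd):
    """Report whether fd is seekable and how an empty non-blocking read behaves."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        seekable = True
    except OSError as exc:
        if exc.errno != errno.ESPIPE:  # ESPIPE: "illegal seek" on pipes, sockets
            raise
        seekable = False

    os.set_blocking(fd, False)  # ask the kernel not to make read() wait
    try:
        data = os.read(fd, 1)
        outcome = "eof" if data == b"" else "data"
    except BlockingIOError:
        outcome = "would block"  # no data yet, but more may still arrive
    return seekable, outcome


if __name__ == "__main__":
    file_fd, file_path = tempfile.mkstemp()  # empty regular file
    read_end, write_end = os.pipe()          # empty pipe, writer still open
    try:
        print("file:", probe(file_fd))
        print("pipe:", probe(read_end))
    finally:
        for descriptor in (file_fd, read_end, write_end):
            os.close(descriptor)
        os.unlink(file_path)
```

An empty regular file reports end-of-file immediately, while an empty pipe with a live writer reports that the read would block. Code that treats the two as interchangeable has to handle both outcomes.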