
Unix shell, core utils and environment variables

System Programming Fundamentals

Most of what people think of as "using Unix" happens in user space: you log in, you get a shell, and you start running small programs that read input and print output. The shell is the program that reads your command line, starts programs, and connects them. It's just another program, but it sits in a special position: it is usually the first thing you interact with after login, and it is responsible for starting most of the other programs you run.
You’ll often see the word tty, which today names the terminal device your shell is attached to. It comes from teletypewriter. Early terminals were literally typewriter-like machines that printed output on paper. Later they became video terminals, but Unix kept the name.
One of the big ideas in Unix userland is composability: instead of a few huge programs that do everything, you have many small tools that each do one job, and you connect them. This works because tools tend to follow a simple contract: read bytes from standard input, write bytes to standard output, and write errors to standard error. If programs behave like that, you can chain them together without the programs needing to know about each other. And since in Unix (almost) "everything is a file", almost all data can be treated as a stream of text, flowing from one program to another, and being modified in the process. Text is the universal interface that Unix programs can use to interoperate with each other.
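The contract described above can be sketched with a tiny shell function. This is only an illustration: the name upper and its error message are made up for this example. The function reads standard input, writes its result to standard output, and reports misuse on standard error.

```sh
# Hypothetical filter: reads stdin, writes the upper-cased text to stdout,
# and complains on stderr (not stdout) when it is misused.
upper() {
  if [ "$#" -ne 0 ]; then
    echo "upper: takes no arguments, reads standard input" >&2
    return 2
  fi
  tr '[:lower:]' '[:upper:]'
}

# Normal use: data arrives on stdin, the result leaves on stdout.
result=$(printf 'hello\n' | upper)
echo "$result"

# Misuse: capture only stderr (2>&1 first, then stdout goes to /dev/null).
complaint=$(upper extra 2>&1 >/dev/null </dev/null) || true
echo "$complaint"
```

Because the error message travels on a separate stream, a program reading the function's output never sees it mixed into the data.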
Pipes are the shell feature that makes this feel natural. A pipe connects the standard output of one program to the standard input of another. If you write A | B, the shell starts both programs and creates a pipe between them so that whatever A prints becomes what B reads. This does not require "special support" inside A or B: they just read from input and write to output like normal.
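As a minimal sketch of A | B, here printf plays the role of A and grep plays the role of B. The fruit names are sample data invented for the example.

```sh
# printf writes three lines to its standard output;
# grep reads them from its standard input and keeps the lines containing "an".
fruit_with_an=$(printf 'apple\nbanana\ncherry\n' | grep 'an')
echo "$fruit_with_an"
```

Neither program knows about the other: printf would print the same bytes to a terminal, and grep would filter the same way if you typed the lines by hand.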
You can also redirect these data flows away from the terminal. Instead of sending output to the terminal, you can send it to a file with > (which replaces the file's contents), or append with >>. You can feed a file into a program with <. And because errors are usually separated from normal output, you can redirect standard error specifically (commonly with 2>). All of this is just the shell rearranging the program’s standard file descriptors before it starts the program.
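The four redirections can be tried together in a scratch directory. This sketch assumes mktemp -d is available (it is on most systems, although it is not required by POSIX); the file names are arbitrary.

```sh
workdir=$(mktemp -d)                      # scratch directory, removed at the end

echo "first"  >  "$workdir/out.txt"       # > creates the file or replaces its contents
echo "second" >> "$workdir/out.txt"       # >> appends to the file
line_count=$(wc -l < "$workdir/out.txt" | tr -d ' ')   # < feeds the file to stdin

# ls fails here, so its message goes to standard error (2>),
# while its (empty) normal output goes to listing.txt.
ls "$workdir/no-such-file" > "$workdir/listing.txt" 2> "$workdir/errors.txt" || true
listing=$(cat "$workdir/listing.txt")
if [ -s "$workdir/errors.txt" ]; then error_captured=yes; else error_captured=no; fi

echo "lines: $line_count, listing: '$listing', error captured: $error_captured"
rm -rf "$workdir"
```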

Core utils

The Unix core utils are the small everyday programs that make this style practical. Different systems package them differently, but the common theme is simple text-oriented tools you can combine. Common examples include:
  • ls (list directory contents)
  • cp, mv, rm (copy, move, remove files)
  • mkdir, rmdir (create/remove directories)
  • cat (print a file), less (view text page by page)
  • head, tail (take the beginning/end of a stream)
  • grep (search for lines that match a pattern)
  • find (walk directories and select files)
  • sort, uniq (reorder lines and remove duplicates)
  • wc (count lines/words/bytes)
  • sed, awk (stream text transforms, more programmable)
  • xargs (turn input lines into command arguments)
  • tar (pack/unpack archive files)
  • ps, top (inspect running processes)
  • kill (send a signal to a process)
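To give a feel for how these tools combine, here is a small sketch that finds the most frequent line in a stream using only utilities from the list above. The input words are invented sample data.

```sh
# sort groups identical lines together, uniq -c counts each group,
# sort -rn puts the highest count first, head keeps that one line,
# and awk prints its second field (the word itself).
top_word=$(printf 'pear\napple\npear\nfig\napple\npear\n' \
  | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "$top_word"

# The same idea answers a different question: how many distinct lines are there?
distinct=$(printf 'pear\napple\npear\nfig\napple\npear\n' | sort | uniq | wc -l | tr -d ' ')
echo "$distinct"
```

No single tool here knows how to "find the most common word"; the answer comes from the way they are connected.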
On many Linux desktops and servers, these utilities come from the GNU project (GNU coreutils, GNU grep, GNU sed, and so on). They tend to be feature-rich and consistent across full Linux distributions. BusyBox is a different approach: it provides many of the same command names, but implemented in a single small executable, aimed at minimal systems.

Environment variables

An environment variable is a named string value attached to a running process. The shell sets some of them for you, and you can set your own. When a process starts another process, the child usually inherits a copy of the parent’s environment. That makes environment variables a simple way to pass shared configuration around without rewriting every command line.
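The inheritance rule can be observed directly by starting a child shell with sh -c. In this sketch GREETING is a made-up variable name. One detail worth knowing: in the shell, a variable only becomes part of the environment once it is exported.

```sh
unset GREETING
GREETING="hello"                                   # plain shell variable, not exported yet
child_before=$(sh -c 'echo "${GREETING:-unset}"')  # the child does not see it

export GREETING                                    # now it belongs to the environment
child_after=$(sh -c 'echo "${GREETING:-unset}"')   # the child inherits a copy

# Set a variable for a single command without touching the parent's value.
one_off=$(GREETING="hi there" sh -c 'echo "$GREETING"')

# The child only has a copy: changing it there does not flow back.
sh -c 'GREETING=changed'
parent_after=$GREETING

echo "$child_before / $child_after / $one_off / $parent_after"
```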
Some environment variables you’ll see often are:
  • HOME: your home directory path
  • USER: your username
  • SHELL: the path of your login shell
  • PWD: the current working directory (what you get when you run pwd)
  • LANG: language/locale settings (text encoding, sorting rules, messages)
  • EDITOR: which text editor a program should launch when it needs one
  • TMPDIR: where temporary files should go (if set)
  • PATH: where to search for executable commands
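Since some of these variables may not be set at all, scripts often read them with a fallback. The sketch below uses the shell's ${NAME:-default} form; the fallback values /tmp and vi are common conventions chosen for this example, not something the system guarantees.

```sh
# Use the variable's value if it is set and non-empty, otherwise the default.
tmp_base=${TMPDIR:-/tmp}
editor=${EDITOR:-vi}

echo "home directory:  ${HOME:-not set}"
echo "temporary files: $tmp_base"
echo "editor:          $editor"
```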
PATH is a colon-separated list of directories that the shell searches when you type a command. If you type ls, the shell checks each directory in PATH, in order, until it finds one containing an executable file named "ls".
This search only happens when the command is not a shell built-in, and the first match wins. For example, if PATH is something like:
/usr/local/bin:/usr/bin:/bin
and you type:
mytool
the shell will check:
/usr/local/bin/mytool
/usr/bin/mytool
/bin/mytool
and run the first one that exists and is executable. The order matters: if two different directories contain a program with the same name, the one that appears earlier in PATH is the one you’ll run.
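The effect of ordering can be reproduced safely in a scratch directory. This sketch assumes mktemp -d is available and that the scratch location allows running scripts; the directory names first and second are arbitrary, and mytool is the hypothetical command from the text.

```sh
sandbox=$(mktemp -d)
mkdir "$sandbox/first" "$sandbox/second"

# Put a different executable called mytool in each directory.
for d in first second; do
  printf '#!/bin/sh\necho "mytool from %s"\n' "$d" > "$sandbox/$d/mytool"
  chmod +x "$sandbox/$d/mytool"
done

old_path=$PATH
PATH="$sandbox/first:$sandbox/second:$old_path"
first_wins=$(mytool)              # found in first/, so second/ is never consulted
found_at=$(command -v mytool)     # command -v reports which file the shell would run

PATH="$sandbox/second:$sandbox/first:$old_path"
second_wins=$(mytool)             # same name, different program, because the order changed

PATH=$old_path                    # restore the original search path
rm -rf "$sandbox"
echo "$first_wins / $second_wins"
```

When a command behaves unexpectedly, command -v is a quick way to check which of several same-named files is actually being picked up.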
If you type a command with a slash in it, PATH is not used. For example, ./script means "run the file named script in the current directory". /usr/bin/python means "run exactly the file named python in /usr/bin".
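A short sketch of the difference, assuming mktemp -d is available and that the current directory is not listed in PATH (the usual setup). The script name hello-demo is made up so that it does not collide with a real command.

```sh
sandbox=$(mktemp -d)
printf '#!/bin/sh\necho "ran by path"\n' > "$sandbox/hello-demo"
chmod +x "$sandbox/hello-demo"

start_dir=$(pwd)
cd "$sandbox" || exit 1

# Bare name: the shell searches PATH, which does not include this directory.
by_name=$(hello-demo 2>/dev/null || echo "not found")

# Names containing a slash skip the PATH search entirely.
by_relative=$(./hello-demo)
by_absolute=$("$sandbox/hello-demo")

cd "$start_dir" || exit 1
rm -rf "$sandbox"
echo "$by_name / $by_relative / $by_absolute"
```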