Progress pill
Mathematical Foundations of Cryptography 1

Pseudorandomness

In our discussion of random and uniform variables, we drew a specific distinction between “random” and “uniform”. That distinction is typically maintained in practice when describing random variables. However, in our current context, this distinction needs to be dropped and “random” and “uniform” are used synonymously. I will explain why at the end of the section.
To start, we can call a binary string of length random (or uniform), if it was the result of sampling a uniform variable which gives each binary string of such a length an equal probability of selection.
Suppose, for instance, the set of all binary strings with length 8: . (It is typical to write an 8-bit string in two quartets, each called a nibble.) Let's call this set of strings .
Per the definition above, we can, then, call a particular binary string of length 8 random (or uniform), if it was the result of sampling a uniform variable that gives each string in an equal probability of selection. Given that the set includes elements, the probability of selection upon sampling would have to be for each string in the set.
A key aspect to the randomness of a binary string is that it is defined with reference to the process by which it was selected. The form of any particular binary string on its own, therefore, reveals nothing about its randomness in selection.
For example, many people intuitively have the idea that a string like could not have been selected randomly. But this is clearly false.
Defining a uniform variable over all the binary strings of length 8, the likelihood of selecting from the set is the same as that of a string such as . Thus, you cannot tell anything about the randomness of a string, just by analyzing the string itself.
We can also speak of random strings without specifically meaning binary strings. We might, for instance, speak of a random hex string . In this case, the string would have been selected at random from the set of all hex strings of length 6. This is equivalent to randomly selecting a binary string of length 24, as each hex digit represents 4 bits.
Typically the expression “a random string”, without qualification, refers to a string randomly selected from the set of all strings with the same length. This is how I have described it above. A string of length can, of course, also be randomly selected from a different set. One, for example, that only constitutes a subset of all the strings of length , or perhaps a set that includes strings of varying length. In those cases, however, we would not refer to it as a “random string”, but rather “a string that is randomly selected from some set S”.
A key concept within cryptography is that of pseudorandomness. A pseudorandom string of length appears as if it was the result of sampling a uniform variable that gives each string in an equal probability of selection. In fact, however, the string is the result of sampling a uniform variable that only defines a probability distribution—not necessarily one with equal probabilities for all possible outcomes—on a subset of . The crucial point here is that no one can really distinguish between samples from and , even if you take many of them.
Suppose, for instance, a random variable . Its outcome set is , this is the set of all binary strings of length 256. This set has elements. Each element has an equal probability of selection, , upon sampling.
In addition, suppose a random variable . Its outcome set only includes binary strings of length 256. It has some probability distribution over those strings, but this distribution is not necessarily uniform.
Suppose that I now took 1000s of samples from and 1000s of samples from and gave the two sets of outcomes to you. I tell you which set of outcomes is associated with which random variable. Next, I take a sample from one of the two random variables. But this time I do not tell you which random variable I sample. If were pseudorandom, then the idea is that your probability of making the right guess as to which random variable I sampled is practically no better than .
Typically, a pseudorandom string of length is produced by randomly selecting a string of size , where is a positive integer, and using it as an input for an expansionary algorithm. This random string of size is known as the seed.
Pseudorandom strings are a key concept to making cryptography practical. Consider, for instance, stream ciphers. With a stream cipher, a randomly selected key is plugged into an expansionary algorithm to produce a much larger pseudorandom string. This pseudorandom string is then combined with the plaintext via an XOR operation to produce a ciphertext.
If we were unable to produce this type of pseudorandom string for a stream cipher, then we would need a key that is as long as the message for its security. This is not a very practical option in most cases.
The notion of pseudorandomness discussed in this section can be defined more formally. It also extends to other contexts. But we need not delve into that discussion here. All you really need to intuitively understand for much of cryptography is the difference between a random and a pseudorandom string. [2]
The reason for dropping the distinction between “random” and “uniform” in our discussion should now also be clear. In practice, everyone uses the term pseudorandom to indicate a string that appears as if it was the result of sampling a uniform variable . Strictly speaking, we should call such a string “pseudo-uniform,” adopting our language from earlier. As the term “pseudo-uniform” is both clunky and not used by anyone, we will not introduce it here for clarity. Instead, we just drop the distinction between “random” and “uniform” in the current context.
Notes
[2] If interested in a more formal exposition on these matters, you can consult Katz and Lindell’s Introduction to Modern Cryptography, esp. chapter 3.
Quiz
Quiz1/5
Why is a pseudorandom string said to be indistinguishable from a truly random string?