Mathematical Foundations of Cryptography 1

Random variables

  • Random variables
  • Graphing random variables
  • Uniform variables
  • Conditional probability
Cryptography relies on mathematics. And if you want to build more than a superficial understanding of cryptography, you need to be comfortable with that mathematics.
This chapter introduces most of the basic mathematics you will encounter in learning cryptography. The topics include random variables, modulo operations, XOR operations, and pseudorandomness. You should master the material in these sections for any non-superficial understanding of cryptography.
The next section deals with number theory, which is much more challenging.

Random variables

A random variable is typically denoted by a non-bold, uppercase letter. So, for instance, we might talk about a random variable X, a random variable Y, or a random variable Z. This is the notation I will also employ from here on out.
A random variable can take on two or more possible values, each with a certain positive probability. The possible values are listed in the outcome set.
Each time you sample a random variable, you draw a particular value from its outcome set according to the defined probabilities.
Let's turn to a simple example. Suppose a random variable X that is defined as follows:
  • X has the outcome set {1, 2}
  • Pr(X = 1) = Pr(X = 2) = 1/2
It is easy to see that X is a random variable. First, there are two or more possible values that X can take on, namely 1 and 2. Second, each possible value has a positive probability of occurring whenever you sample X, namely 1/2.
All that a random variable requires is an outcome set with two or more possibilities, where each possibility has a positive probability of occurring upon sampling. In principle, then, a random variable can be defined abstractly, devoid of any context. In this case, you might think of “sampling” as running some natural experiment to determine the value of the random variable.
The variable X above was defined abstractly. You might, thus, think of sampling the variable X above as flipping a fair coin and assigning “2” in the case of heads and “1” in the case of tails. For each sample of X, you flip the coin again.
Alternatively, you might also think of sampling X as rolling a fair die and assigning “2” in case the die lands 1, 2, or 3, and assigning “1” in case the die lands 4, 5, or 6. Each time you sample X, you roll the die again.
Really, any natural experiment that would yield the right probabilities for the possible values of X above can be imagined as lying behind the sampling.
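The two experiments above can be sketched in a few lines of Python. This is only an illustrative simulation (the function names are my own, not from the text); the point is that both procedures induce the same distribution, Pr(X = 1) = Pr(X = 2) = 1/2.

```python
import random

def sample_x_coin():
    # Flip a fair coin: heads -> 2, tails -> 1.
    return 2 if random.random() < 0.5 else 1

def sample_x_die():
    # Roll a fair die: 1, 2, or 3 -> 2; 4, 5, or 6 -> 1.
    return 2 if random.randint(1, 6) <= 3 else 1

# Both experiments yield an empirical frequency for "2" close to 1/2.
n = 100_000
coin_freq = sum(sample_x_coin() == 2 for _ in range(n)) / n
die_freq = sum(sample_x_die() == 2 for _ in range(n)) / n
print(round(coin_freq, 2), round(die_freq, 2))  # both approximately 0.5
```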
Frequently, however, random variables are not just introduced abstractly. Instead, the set of possible outcome values has explicit real-world meaning (rather than just as numbers). In addition, these outcome values might be defined against some specific type of experiment (rather than as any natural experiment with those values).
Let's now consider an example of a random variable that is not defined abstractly. To determine which of two teams starts a football game, X is defined as follows:
  • X has the outcome set {red kicks off, blue kicks off}
  • Flip a particular coin: tails = “red kicks off”; heads = “blue kicks off”
In this case, the outcome set of X is provided with a concrete meaning, namely which team starts in a football game. In addition, the possible outcomes and their associated probabilities are determined by a concrete experiment, namely flipping a particular coin.
Within discussions of cryptography, random variables are usually introduced against an outcome set with real-world meaning. It might be the set of all messages that could be encrypted, known as the message space, or the set of all keys the parties using the encryption can choose from, known as the key space.
Random variables in discussions on cryptography are, however, not usually defined against some specific natural experiment, but against any experiment that might yield the right probability distributions.
Random variables can have discrete or continuous probability distributions. Random variables with a discrete probability distribution—that is, discrete random variables—have a countable number of possible outcomes (in our examples, always a finite number). The random variable X in both examples given so far was discrete.
Continuous random variables can instead take on any value within one or more intervals. You might say, for instance, that a random variable, upon sampling, will take on any real value between 0 and 1, and that each real number in this interval is equally likely. Within this interval, there are infinitely many possible values.
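As a quick illustration, Python's standard library can sample such a continuous uniform variable on the interval [0, 1) (up to floating-point precision). This is just a sketch to contrast with the discrete examples above:

```python
import random

# Each call to random.random() returns a real number (to float
# precision) in [0, 1); subintervals of equal length are equally likely.
draws = [random.random() for _ in range(5)]
print(draws)
```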
For cryptographic discussions, you will only need to understand discrete random variables. Any discussion of random variables from here on out should, therefore, be understood as referring to discrete random variables, unless specifically stated otherwise.

Graphing random variables

The possible values and associated probabilities for a random variable can be easily visualized through a graph. For instance, consider the random variable X from the previous section with an outcome set of {1, 2}, and Pr(X = 1) = 1/2 and Pr(X = 2) = 1/2. We would typically display such a random variable in the form of a bar graph as in Figure 1.
Figure 1: Random variable X
The wide bars in Figure 1 are, of course, not meant to suggest that the random variable X is actually continuous. The bars are only made wide to be more visually appealing (a thin vertical line above each outcome provides a less intuitive visualization).
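For a rough feel of what such a graph conveys, here is a minimal text-based sketch of a bar chart, assuming the outcome set {1, 2} with probability 1/2 each; the helper function is hypothetical, not from the text.

```python
# distribution maps each possible outcome of X to its probability.
distribution = {1: 0.5, 2: 0.5}

def bar_chart(dist, width=40):
    # Render one horizontal bar per outcome, scaled by its probability.
    lines = []
    for value, prob in sorted(dist.items()):
        bar = "#" * int(prob * width)
        lines.append(f"X = {value} | {bar} {prob}")
    return "\n".join(lines)

print(bar_chart(distribution))
```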

Uniform variables

In the expression “random variable,” the term “random” just means “probabilistic”. In other words, it just means that two or more possible outcomes of the variable occur with certain probabilities. These outcomes, however, do not necessarily have to be equally likely (though the term “random” can indeed have that meaning in other contexts).
A uniform variable is a special case of a random variable: it can take on two or more values, all with equal probability. The random variable X depicted in Figure 1 is clearly a uniform variable, as both possible outcomes occur with a probability of 1/2. There are, however, many random variables that are not uniform variables.
Consider, for example, the random variable Y. It has an outcome set {1, 2, 3, 8, 10}; its probability distribution is depicted in Figure 2.
While two of Y's possible outcomes have an equal probability of occurring, Y can take on the remaining values with different probabilities upon sampling. Hence, while Y is indeed a random variable, it is not a uniform variable.
Figure 2: Random variable Y
For a final example, consider the random variable Z. It has the outcome set {1, 3, 7, 11, 12} and the following probability distribution: Pr(Z = 1) = Pr(Z = 3) = Pr(Z = 7) = Pr(Z = 11) = Pr(Z = 12) = 1/5.
You can see it depicted in Figure 3. The random variable Z is, in contrast to Y, a uniform variable, as all the probabilities for the possible values upon sampling are equal.
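A sampling of Z is easy to sketch in Python, since `random.choice` draws uniformly from a list; the empirical frequencies should all come out close to 1/5. This is an illustrative simulation, not code from the text:

```python
import random
from collections import Counter

# Z is uniform over its outcome set: Pr(Z = z) = 1/5 for each z.
outcomes = [1, 3, 7, 11, 12]
n = 100_000
counts = Counter(random.choice(outcomes) for _ in range(n))

for z in outcomes:
    print(z, round(counts[z] / n, 2))  # each frequency approximately 0.20
```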
Figure 3: Random variable Z

Conditional probability

Suppose that Bob intends to uniformly select a day from the last calendar year. What should we conclude is the probability of the selected day being in Summer?
As long as we think Bob’s process will indeed be truly uniform, we should conclude that there is a 1/4 probability Bob selects a day in Summer. This is the unconditional probability of the randomly selected day being in Summer.
Suppose now that instead of uniformly drawing a calendar day, Bob only selects uniformly from among those days on which the noon temperature at Crystal Lake (New Jersey) was 21 degrees Celsius or higher. Given this additional information, what can we conclude about the probability that Bob will select a day in Summer?
We should really draw a different conclusion than before, even without any further specific information (e.g., the temperature at noon each day last calendar year).
Knowing that Crystal Lake is in New Jersey, we would certainly not expect the temperature at noon to be 21 degrees Celsius or higher in Winter. Instead, it is much more likely to be a warm day in the Spring or Fall, or a day somewhere in the Summer. Hence, knowing the noon temperature at Crystal Lake on the selected day was 21 degrees Celsius or higher, the probability that the day selected by Bob is in the Summer becomes much higher. This is the conditional probability of the randomly selected day being in Summer, given that the noon temperature at Crystal Lake was 21 degrees Celsius or higher.
In contrast to the previous example, two events can also be completely unrelated, so that knowing one has occurred tells us nothing about the probability of the other. In that case, we say that the events are independent.
Suppose, for example, that a certain fair coin has landed heads. Given this fact, what, then, is the probability that it will rain tomorrow? The conditional probability in this case should be the same as the unconditional probability that it will rain tomorrow, as a coin flip does not generally have any impact on the weather.
We use a "|" symbol for writing out conditional probability statements. For instance, the probability of event A given that event B has transpired can be written as follows:

Pr(A|B)
So, when two events, A and B, are independent, then:

Pr(A|B) = Pr(A)
The condition for independence can be simplified as follows:

Pr(A and B) = Pr(A) · Pr(B)
A key result in probability theory is known as Bayes' Theorem. It basically states that Pr(A|B) can be rewritten as follows:

Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B)
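As a sanity check, Bayes' Theorem can be verified on a small toy example of my own (not from the text): a fair die roll, with events A = "the roll is even" and B = "the roll is greater than 3".

```python
from fractions import Fraction

# Sample space: a fair die roll, outcomes 1..6, each with probability 1/6.
A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is greater than 3"

def pr(event):
    return Fraction(len(event), 6)

pr_a_and_b = Fraction(len(A & B), 6)   # Pr(A and B) = 2/6
pr_a_given_b = pr_a_and_b / pr(B)      # definition of conditional probability
pr_b_given_a = pr_a_and_b / pr(A)

# Bayes' Theorem: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)
assert pr_a_given_b == pr_b_given_a * pr(A) / pr(B)
print(pr_a_given_b)  # 2/3
```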
Instead of using conditional probabilities with specific events, we can also look at the conditional probabilities involved with two or more random variables over a set of possible events. Suppose two random variables, X and Y. We can denote any possible value for X by x, and any possible value for Y by y. We might say, then, that two random variables are independent if the following statement holds:

Pr(X = x|Y = y) = Pr(X = x)

for all x and y.
Let's be a bit more explicit about what this statement means.
Suppose that the outcome sets for X and Y are defined as follows: X = {x1, …, xn} and Y = {y1, …, ym}. (It is typical to indicate sets of values by bold-faced, upper-case letters, so the X and Y here refer to the sets of possible values, not to the random variables themselves.)
Now suppose you sample Y and observe some value y. The statement above tells us that the probability of now obtaining a particular value x from sampling X is exactly the same as if we had never observed y. This is true for any value y we could have drawn from our initial sampling of Y. Finally, this holds true not just for one particular x: for any value in the outcome set of X, the probability of it occurring is not influenced by the outcome of a sampling of Y. All this also applies to the case where X is sampled first.
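The statement can also be checked empirically. The sketch below samples two variables independently (outcome sets borrowed from the earlier examples, purely for illustration) and confirms that conditioning on an observed value of Y leaves the distribution of X essentially unchanged:

```python
import random

# Draw pairs (x, y) from two independent uniform variables.
n = 200_000
samples = [(random.choice([1, 2]), random.choice([1, 3, 7, 11, 12]))
           for _ in range(n)]

# Unconditional frequency of X = 1, and its frequency given Y = 7.
pr_x1 = sum(1 for x, _ in samples if x == 1) / n
given_y7 = [x for x, y in samples if y == 7]
pr_x1_given_y7 = sum(1 for x in given_y7 if x == 1) / len(given_y7)

print(round(pr_x1, 2), round(pr_x1_given_y7, 2))  # both approximately 0.5
```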
Let's end our discussion on a slightly more philosophical point. In any real-world situation, the probability of some event is always assessed against a particular set of information. There is no “unconditional probability” in any very strict sense of the word.
For instance, suppose I asked you for the probability that pigs will fly by 2030. Though I give you no further information, you clearly know a lot about the world that can influence your judgment. You have never seen pigs fly. You know that most people will not expect them to fly. You know that they are not really built to fly. And so on.
Hence, when we speak of an “unconditional probability” of some event in a real-world context, that term really can only have meaning if we take it to mean something like “the probability without any further explicit information”. Any understanding of a “conditional probability” should, then, always be understood against some specific piece of information.
I might, for instance, ask you the probability that pigs will fly by 2030, after giving you evidence that some goats in New Zealand have learned to fly after a few years of training. In this case, you will probably adjust your judgment of the probability that pigs will fly by 2030. So the probability that pigs will fly by 2030 is conditional upon this evidence about goats in New Zealand.
Quiz
What is the definition of a random variable in the context of cryptography?