- The Checksum
- Correspondence between the Entropy and the Mnemonic Phrase
- Conversion of the Binary Sequence into a Mnemonic Phrase
- Characteristics of the BIP39 Word List
- Which Length to Choose for Your Mnemonic Phrase?
The mnemonic phrase, also called "seed phrase", "recovery phrase", "secret phrase", or "24-word phrase", is a sequence usually composed of 12 or 24 words, which is generated from entropy. It is used to deterministically derive all the keys of an HD wallet. This means that from this phrase, it is possible to deterministically generate and recreate all the private and public keys of the Bitcoin wallet, and consequently access the funds that are protected with it. The purpose of the mnemonic phrase is to provide a means of backup and recovery of bitcoins that is both secure and easy to use. It was introduced in 2013 with the BIP39 standard.
Let's discover together how to go from entropy to a mnemonic phrase.
The Checksum
To transform entropy into a mnemonic phrase, one must first add a checksum (or "control sum") at the end of the entropy. This checksum is a short sequence of bits that ensures the integrity of the data by verifying that no accidental modification has been introduced.
To calculate the checksum, the SHA256 hash function is applied to the entropy (just once; this is one of the rare cases in Bitcoin where a single SHA256 hash is used instead of a double hash). This operation produces a 256-bit hash. The checksum consists of the first bits of this hash, and its length depends on that of the entropy, according to the following formula:
$$\text{CS} = \frac{\text{ENT}}{32}$$
where $\text{ENT}$ represents the length of the entropy in bits, and $\text{CS}$ the length of the checksum in bits.
For example, for an entropy of 256 bits, the first 8 bits of the hash are taken to form the checksum:
$$\text{CS} = \frac{256}{32} = 8 \text{ bits}$$
Once the checksum is calculated, it is concatenated with the entropy to obtain an extended bit sequence noted $\text{ENT} \Vert \text{CS}$ ("concatenate" means to put end-to-end).
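The checksum step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `checksum_bits` and `entropy_with_checksum` are chosen here for the example, and bits are represented as strings of '0' and '1' purely for readability.

```python
import hashlib

def checksum_bits(entropy: bytes) -> str:
    """Return the BIP39 checksum of `entropy` as a string of '0'/'1' characters."""
    ent = len(entropy) * 8                      # ENT: entropy length in bits
    if ent not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128, 160, 192, 224 or 256 bits long")
    cs = ent // 32                              # CS = ENT / 32
    digest = hashlib.sha256(entropy).digest()   # a single SHA256, not a double hash
    digest_bits = bin(int.from_bytes(digest, "big"))[2:].zfill(256)
    return digest_bits[:cs]                     # the first CS bits of the hash

def entropy_with_checksum(entropy: bytes) -> str:
    """Return the entropy bits followed by the checksum bits."""
    ent = len(entropy) * 8
    entropy_bits = bin(int.from_bytes(entropy, "big"))[2:].zfill(ent)
    return entropy_bits + checksum_bits(entropy)

# Illustration only: an all-zero entropy must never be used for a real wallet.
print(len(entropy_with_checksum(bytes(32))))
```

With a 256-bit entropy, the returned sequence is 264 bits long: the 256 bits of entropy followed by the 8 bits of checksum.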
Correspondence between the Entropy and the Mnemonic Phrase
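The number of words in the mnemonic phrase depends on the size of the initial entropy, as illustrated in the following table with:
$\text{ENT}$: the size in bits of the entropy; $\text{CS}$: the size in bits of the checksum; $w$: the number of words in the final mnemonic phrase.

| $\text{ENT}$ | $\text{CS}$ | $\text{ENT} \Vert \text{CS}$ | $w$ |
| --- | --- | --- | --- |
| 128 | 4 | 132 | 12 |
| 160 | 5 | 165 | 15 |
| 192 | 6 | 198 | 18 |
| 224 | 7 | 231 | 21 |
| 256 | 8 | 264 | 24 |

For example, for a 256-bit entropy, the result $\text{ENT} \Vert \text{CS}$ is 264 bits and yields a mnemonic phrase of 24 words.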
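The relationship between entropy size, checksum size, and word count is simple arithmetic, and can be checked with a short sketch. The helper name `mnemonic_sizes` is an assumption made for this example; the five entropy sizes are those allowed by BIP39.

```python
def mnemonic_sizes(ent: int) -> tuple[int, int, int]:
    """Return (CS, ENT + CS, number of words) for an entropy of `ent` bits."""
    cs = ent // 32          # checksum length in bits
    total = ent + cs        # length of the extended sequence
    words = total // 11     # each word encodes 11 bits
    return cs, total, words

# Print one row per entropy size allowed by BIP39.
for ent in (128, 160, 192, 224, 256):
    cs, total, words = mnemonic_sizes(ent)
    print(ent, cs, total, words)
```

The checksum length is what makes the extended sequence an exact multiple of 11 bits in every case.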
Conversion of the Binary Sequence into a Mnemonic Phrase
The bit sequence is then divided into segments of 11 bits. Each 11-bit segment, once converted to decimal, corresponds to a number between 0 and 2047, which designates the position of a word in a list of 2048 words standardized by BIP39.
For example, for a 128-bit entropy, the checksum is 4 bits, and thus the total sequence measures 132 bits. It is divided into 12 segments of 11 bits; the checksum occupies the last 4 bits of the twelfth segment.
Each segment is then converted into a decimal number that represents a word in the list. For example, the binary segment 01011010001 is equivalent in decimal to 721. By adding 1 to align with the list's indexing (which starts at 1 and not 0), this gives the word rank 722, which is "focus" in the list. This correspondence is repeated for each of the 12 segments, in order to obtain a 12-word phrase.
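The splitting into 11-bit segments can be sketched as follows. This is an illustrative example under stated assumptions: the function name `mnemonic_indices` is hypothetical, the indices it returns are 0-based (add 1 to obtain the rank used above), and the word list itself is not included, so the final lookup of each word is left out.

```python
import hashlib

def mnemonic_indices(entropy: bytes) -> list[int]:
    """Return the 0-based word-list index of each 11-bit segment."""
    ent = len(entropy) * 8
    cs = ent // 32
    digest = hashlib.sha256(entropy).digest()
    # CS is at most 8, so the checksum fits in the first byte of the hash.
    checksum = digest[0] >> (8 - cs)
    # Append the checksum bits to the right of the entropy bits.
    combined = (int.from_bytes(entropy, "big") << cs) | checksum
    total = ent + cs
    # Read the segments from left to right, 11 bits at a time.
    return [
        (combined >> (total - 11 * (i + 1))) & 0x7FF
        for i in range(total // 11)
    ]

# The segment from the text, converted from binary to decimal.
print(int("01011010001", 2))

# Illustration only: an all-zero entropy must never be used for a real wallet.
print(mnemonic_indices(bytes(16)))
```

With a list of the 2048 words loaded in order (for instance into a hypothetical Python list named `wordlist`), each word would then be obtained with `wordlist[index]`.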
Characteristics of the BIP39 Word List
A particularity of the BIP39 word list is that no word shares the same first four letters in the same order with another word. This means that writing down only the first four letters of each word is sufficient to save the mnemonic phrase. This can be interesting for saving space, especially for those who wish to engrave it on a metal support.
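To illustrate why four letters are enough, here is a small sketch that builds a lookup table from four-letter prefixes to full words and fails if two words collide. The function name `build_prefix_index` is an assumption for this example, and `sample` is only a handful of words, not the full list of 2048.

```python
def build_prefix_index(words: list[str]) -> dict[str, str]:
    """Map the first four letters of each word to the full word."""
    index: dict[str, str] = {}
    for word in words:
        prefix = word[:4]   # words of three letters are kept whole
        if prefix in index:
            raise ValueError(
                f"{index[prefix]!r} and {word!r} share the prefix {prefix!r}"
            )
        index[prefix] = word
    return index

# A tiny sample for illustration, not the complete BIP39 list.
sample = ["abandon", "ability", "able", "about", "focus", "zoo"]
index = build_prefix_index(sample)
print(index["focu"])
```

Run on the complete list, a check of this kind would raise an error if any two words shared their first four letters.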
This list of 2048 words exists in several languages. These are not simple translations, but distinct words for each language. However, it is strongly recommended to stick to the English version, as versions in other languages are generally not supported by wallet software.
Which Length to Choose for Your Mnemonic Phrase?
To determine the optimal length of your mnemonic phrase, one must consider the actual security it provides. A 12-word phrase ensures 128 bits of security, while a 24-word phrase offers 256 bits.
However, this difference in phrase-level security does not improve the overall security of a Bitcoin wallet, as the private keys derived from this phrase only benefit from 128 bits of security. Indeed, as we have seen previously, Bitcoin private keys are generated from random numbers (or derived from a random source) ranging between $1$ and $n-1$, where $n$ represents the order of the generator point $G$ of the secp256k1 curve, a number slightly less than $2^{256}$. One might therefore think that these private keys offer 256 bits of security. However, their security lies in the difficulty of finding a private key from its associated public key, a difficulty established by the mathematical problem of the discrete logarithm on elliptic curves (ECDLP). To date, the best-known algorithm for solving this problem is Pollard's rho algorithm, which reduces the number of operations needed to break a key to roughly the square root of the number of possible keys.
For 256-bit keys, such as those used in Bitcoin, Pollard's rho algorithm thus reduces the complexity to $2^{128}$ operations:
$$O(\sqrt{2^{256}}) = O(2^{128})$$
Therefore, it is considered that a private key used in Bitcoin offers 128 bits of security.
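The square-root estimate can be checked numerically. The sketch below uses the published order of the secp256k1 generator point; the function name `generic_attack_bits` is an assumption for this example, and the result is only an order of magnitude, not a precise cost model of Pollard's rho.

```python
import math

# Order n of the generator point of secp256k1 (a public curve parameter).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def generic_attack_bits(group_order: int) -> float:
    """Return log2(sqrt(group_order)), the approximate attack cost in bits."""
    return math.log2(group_order) / 2

print(N.bit_length())                  # size of n in bits
print(round(generic_attack_bits(N)))   # approximate security level in bits
```

Since $n$ is only slightly less than $2^{256}$, the estimate is essentially the same as for $2^{256}$ itself.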
As a result, choosing a 24-word phrase does not provide additional protection for the wallet, as 256 bits of security on the phrase is pointless if the derived keys only offer 128 bits of security. To illustrate this principle, it's like having a house with two doors: an old wooden door and a reinforced door. In the event of a burglary, the reinforced door would be of no use, since the intruder would go through the wooden door. The situation here is analogous: the overall security is that of the weakest element.
A 12-word phrase, which also offers 128 bits of security, is therefore currently sufficient to protect your bitcoins against any theft attempt. As long as the digital signature algorithm does not change to use larger keys or to rely on a mathematical problem other than the ECDLP, a 24-word phrase remains superfluous. Moreover, a longer phrase increases the risk of loss during backup: a backup that is twice as short is always easier to manage.
Before continuing with the derivation of the wallet from this mnemonic phrase, I will introduce you, in the following chapter, to the BIP39 passphrase, as it plays a role in the derivation process, and it is at the same level as the mnemonic phrase.
