- Definition and Principle of Hashing
- Characteristics of Hash Functions
- Applications of Hash Functions in Bitcoin
The first type of cryptographic algorithms used in Bitcoin encompasses hash functions. They play an essential role at different levels of the protocol, but also within Bitcoin wallets. Let's discover together what a hash function is and what it's used for in Bitcoin.
Definition and Principle of Hashing
Hashing is a process that transforms information of arbitrary length into another piece of information of fixed length through a cryptographic hash function. In other words, a hash function takes an input of any size and converts it into a fixed-size fingerprint, called a "hash".
The hash can also sometimes be referred to as "digest", "condensate", "condensed", or "hashed".
For example, the SHA256 hash function produces a hash of a fixed length of 256 bits. Thus, if we use the input "PlanB", a message of arbitrary length, the generated hash will be the following 256-bit fingerprint:
24f1b93b68026bfc24f5c8265f287b4c940fb1664b0d75053589d7a4f821b688
Characteristics of Hash Functions
These cryptographic hash functions have several essential characteristics that make them particularly useful in the context of Bitcoin and other computer systems:
- Irreversibility (or preimage resistance)
- Tamper resistance (avalanche effect)
- Collision resistance
- Second preimage resistance
1. Irreversibility (preimage resistance):
Irreversibility means that it is easy to calculate the hash from the input information, but the inverse calculation, that is, finding the input from the hash, is practically impossible. This property makes hash functions perfect for creating unique digital fingerprints without compromising the original information. This characteristic is often referred to as a one-way function.
In the given example, obtaining the hash
24f1b9… by knowing the input "PlanB" is simple and quick. However, finding the message "PlanB" by only knowing 24f1b9… is impossible.Therefore, it is impossible to find a preimage for a hash such that , where is a cryptographic hash function.
2. Tamper resistance (avalanche effect)
The second characteristic is tamper resistance, also known as the avalanche effect. This characteristic is observed in a hash function if a small change in the input message results in a radical change in the output hash.
If we go back to our example with the input "PlanB" and the SHA256 function, we have seen that the generated hash is as follows:
24f1b93b68026bfc24f5c8265f287b4c940fb1664b0d75053589d7a4f821b688
If we make a very slight change to the input by using "Planb" this time, then simply changing from an uppercase "B" to a lowercase "b" completely alters the SHA256 output hash:
bb038b4503ac5d90e1205788b00f8f314583c5e22f72bec84b8735ba5a36df3f
This property ensures that even a minor alteration of the original message is immediately detectable, as it does not just change a small part of the hash, but the entire hash. This can be of interest in various fields to verify the integrity of messages, software, or even Bitcoin transactions.
3. Collision Resistance
The third characteristic is collision resistance. A hash function is collision-resistant if it is computationally impossible to find 2 different messages that produce the same hash output from the function. Formally, it is difficult to find two distinct messages and such that:
In reality, it is mathematically inevitable that collisions exist for hash functions, because the size of the inputs can be larger than the size of the outputs. This is known as the Dirichlet drawer principle: if objects are distributed into drawers, with , then at least one drawer will necessarily contain two or more objects. For a hash function, this principle applies because the number of possible messages is (almost) infinite, while the number of possible hashes is finite ( in the case of SHA256).
Thus, this characteristic does not mean that there are no collisions for hash functions, but rather that a good hash function makes the probability of finding a collision negligible. This characteristic, for example, is no longer verified on the SHA-0 and SHA-1 algorithms, predecessors of SHA-2, for which collisions have been found. These functions are therefore now advised against and often considered obsolete.
For a hash function of bits, the collision resistance is of the order of , in accordance with the birthday attack. For example, for SHA256 ( ), the complexity of finding a collision is of the order of attempts. In practical terms, this means that if one passes different messages through the function, one is likely to find a collision.
4. Resistance to Second Preimage
Resistance to second preimage is another important characteristic of hash functions. It states that given a message and its hash , it is computationally infeasible to find another message such that:
Therefore, resistance to second preimage is somewhat similar to collision resistance, except here, the attack is more difficult because the attacker cannot freely choose .
Applications of Hash Functions in Bitcoin
The most used hash function in Bitcoin is SHA256 ("Secure Hash Algorithm 256 bits"). Designed in the early 2000s by the NSA and standardized by the NIST, it produces a 256-bit hash output.
This function is used in many aspects of Bitcoin. At the protocol level, it is involved in the Proof-of-Work mechanism, where it is applied in double hashing to search for a partial collision between the header of a candidate block, created by a miner, and the difficulty target. If this partial collision is found, the candidate block becomes valid and can be added to the blockchain.
SHA256 is also used in the construction of a Merkle tree, which is notably the accumulator used for recording transactions in blocks. This structure is also found in the Utreexo protocol, which allows for reducing the size of the UTXO Set. Additionally, with the introduction of Taproot in 2021, SHA256 is exploited in MAST (Merkelised Alternative Script Tree), which allows revealing only the spending conditions actually used in a script, without disclosing the other possible options. It is also used in the calculation of transaction identifiers, in the transmission of packets over the P2P network, in electronic signatures... Finally, and this is of particular interest in this training, SHA256 is used at the application level for the construction of Bitcoin wallets and the derivation of addresses.
Most of the time, when you come across the use of SHA256 in Bitcoin, it will actually be a double hash SHA256, noted "HASH256", which simply consists of applying SHA256 twice successively:
This practice of double hashing adds an extra layer of security against certain potential attacks, even though a single SHA256 is today considered cryptographically secure.
Another hashing function available in the Script language and used for deriving receiving addresses is the RIPEMD160 function. This function produces a 160-bit hash (thus shorter than SHA256). It is generally combined with SHA256 to form the HASH160 function:
This combination is used to generate shorter hashes, notably in the creation of certain Bitcoin addresses which represent hashes of keys or script hashes, as well as to produce key fingerprints.
Finally, at the application level only, the SHA512 function is sometimes also used, which indirectly plays a role in key derivation for wallets. This function is very similar to SHA256 in its operation; both belong to the same SHA2 family, but SHA512 produces, as its name indicates, a 512-bit hash, compared to 256 bits for SHA256. We will detail its use in the following chapters.
You now know the essential basics about hashing functions for what follows. In the next chapter, I propose to discover in more detail the workings of the function that is at the heart of Bitcoin: SHA256. We will dissect it to understand how it achieves the characteristics we have described here. This next chapter is quite long and technical, but it is not essential to follow the rest of the training. So, if you have difficulty understanding it, do not worry and move directly to the following chapter, which will be much more accessible.
Quiz
Quiz1/5
cyp2012.1
What is the main function of a hash function?