Understanding and protecting against chain analysis

What is Bitcoin chain analysis?

  • Definition and operation
  • Chain analysis objectives
  • Defending yourself against chain analysis
  • Chain analysis methods
  • Satoshi Nakamoto and chain analysis

Definition and operation

Blockchain analysis is the practice of tracing the flow of bitcoins on the blockchain. Generally speaking, it starts from characteristics observed in samples of previous transactions. The analyst then looks for these same characteristics in the transaction under study and deduces plausible interpretations from them. This problem-solving method, which takes a practical approach to reach a satisfactory rather than a guaranteed solution, is known as a "heuristic."
In simple terms, there are three main stages in chain analysis:
  1. Observing the blockchain;
  2. Identifying known features;
  3. Deducing hypotheses.
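To make these three stages concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Tx` structure is a hypothetical, simplified view of a transaction (not a real node API), the addresses are placeholders, and the two feature detectors (an output returning to an input address, and a round amount with an arbitrary threshold) are only examples of "known features".

```python
from dataclasses import dataclass


@dataclass
class Tx:
    """Hypothetical, simplified view of a transaction (not a real node API)."""
    txid: str
    input_addresses: list[str]
    outputs: dict[str, int]  # address -> amount in satoshis


def returns_to_input_address(tx: Tx) -> list[str]:
    # Known feature: an output pays an address that also appears in the inputs.
    return [
        f"output {addr} is probably change returning to the sender"
        for addr in tx.outputs
        if addr in tx.input_addresses
    ]


def has_round_amount(tx: Tx) -> list[str]:
    # Known feature: a round amount (here a multiple of 100,000 sats,
    # an arbitrary threshold chosen for this example).
    return [
        f"output {addr} is probably the payment (round amount)"
        for addr, sats in tx.outputs.items()
        if sats % 100_000 == 0
    ]


KNOWN_FEATURES = [returns_to_input_address, has_round_amount]


def analyze(tx: Tx) -> list[str]:
    """Stages 2 and 3: look for known features and collect the hypotheses."""
    hypotheses = []
    for detect in KNOWN_FEATURES:
        hypotheses.extend(detect(tx))
    return hypotheses


# Stage 1: observe a transaction (made-up data standing in for the blockchain).
tx = Tx("example-txid", ["addr_A"], {"addr_B": 500_000, "addr_A": 123_456})

hypotheses = analyze(tx)
for hypothesis in hypotheses:
    print(hypothesis)
```

Note that the results are only hypotheses: each detector reports what is plausible, not what is certain.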
Blockchain analysis can be performed by anyone. All you need is access to the blockchain's public information via a full node to observe transaction movements and make hypotheses. There are also free tools that facilitate this analysis, such as OXT.me, which we'll explore in detail in the last two chapters of this section. However, the main risk to privacy comes from companies specializing in chain analysis. These companies have taken blockchain analysis to an industrial scale and sell their services to financial institutions and governments. Among these companies, Chainalysis is surely the best known.

Chain analysis objectives

One of the aims of blockchain analysis is to group together various activities on Bitcoin in order to establish that a single user carried them out. It then becomes possible to try to link this cluster of activities to a real identity.
Think back to the previous chapter. I explained why Bitcoin's privacy model was originally based on the separation of user identity from transactions. It would therefore be tempting to think that blockchain analysis is useless, since even if we manage to aggregate on-chain activities, we can't associate them with a real identity.
Theoretically, this statement is correct. In the first part of this course, we saw that cryptographic key pairs are used to set spending conditions on UTXOs. In essence, these key pairs divulge no information about the identity of their holders. So, even if we manage to group together the activities associated with different key pairs, this tells us nothing about the entity behind these activities.
However, the practical reality is far more complex. Many behaviors can link a real identity to on-chain activity. In analysis, such a link is called an entry point, and there are many of them.
The most common is KYC (Know Your Customer). If you withdraw your bitcoins from a regulated platform to one of your personal receiving addresses, then the platform, and anyone with access to its records, can link your identity to that address. More broadly, an entry point can be any form of interaction between your real life and a Bitcoin transaction. For example, if you publish a receiving address on your social networks, this could be an entry point for analysis. If you pay your baker in bitcoins, he will be able to associate your face (part of your identity) with a Bitcoin address.
These entry points are virtually unavoidable when using Bitcoin. Although we may seek to restrict their scope, they will always be present. That's why it's crucial to combine several methods that aim to preserve your privacy. While maintaining a separation between your real identity and your transactions is a useful approach, it remains insufficient today. Indeed, if all your on-chain activities can be grouped together, then even the smallest entry point is likely to compromise the single layer of privacy you've established.
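The effect of a single entry point on an already aggregated set of activities can be sketched as follows. This is only an illustration under stated assumptions: the clusters, addresses, and identity label are made up, and `identify_addresses` is a hypothetical helper, not a function from any real analysis tool.

```python
# Hypothetical clusters already produced by chain analysis:
# cluster id -> addresses assumed to belong to the same entity.
clusters = {
    "cluster_1": {"addr_A", "addr_B", "addr_C"},
    "cluster_2": {"addr_D"},
}

# Hypothetical entry points: address -> real-world identity.
entry_points = {"addr_B": "Alice (KYC withdrawal)"}


def identify_addresses(clusters, entry_points):
    """Propagate each known identity to every address of its cluster."""
    identified = {}
    for addresses in clusters.values():
        owners = {entry_points[a] for a in addresses if a in entry_points}
        # Only label the cluster when the entry points agree on one identity.
        if len(owners) == 1:
            owner = next(iter(owners))
            for address in addresses:
                identified[address] = owner
    return identified


identified = identify_addresses(clusters, entry_points)
for address in sorted(identified):
    print(address, "->", identified[address])
```

Here, one labeled address is enough to attach an identity to the three addresses of its cluster, while `addr_D` stays unidentified because it was never grouped with them.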

Defending yourself against chain analysis

We therefore also need to be able to resist blockchain analysis when we use Bitcoin. By doing so, we can minimize the aggregation of our activities and limit the impact of an entry point on our privacy.
What better way to counter blockchain analysis than to learn about the methods used in it? To enhance your Bitcoin privacy, it's essential to understand these methods. This will give you a better grasp of techniques such as coinjoin or payjoin (techniques we'll look at in the final parts of the course), and reduce the mistakes you might make.
Here, we can draw an analogy with cryptography and cryptanalysis. A good cryptographer is, first and foremost, a skilled cryptanalyst. To devise a new encryption algorithm, you need to know what attacks it will face and to study why previous algorithms were broken. The same principle applies to Bitcoin privacy. Understanding blockchain analysis methods is the key to protecting against them. That's why I've included a whole section on chain analysis in this course.

Chain analysis methods

It's important to understand that chain analysis is not an exact science. It relies on heuristics derived from previous observations or logical interpretations. These rules give fairly reliable results, but never absolute precision. In other words, the conclusions of chain analysis always carry a degree of probability. For example, it may be possible to estimate with varying degrees of confidence that two addresses belong to the same entity, but total certainty will always be out of reach.
In practice, chain analysis aggregates several heuristics in order to minimize the risk of error. In a way, it's an accumulation of evidence that brings us closer to reality.
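One way to picture this accumulation of evidence is a simple Bayesian update, sketched below. This is an assumption made for illustration, not a description of how analysis companies actually score their results: the prior and the likelihood ratios are invented numbers, and the heuristics are treated as independent, which is rarely true in practice.

```python
import math


def combine_evidence(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior probability that two addresses share an owner.

    Each likelihood ratio says how much more likely a heuristic's observation
    is if the addresses share an owner than if they do not. The ratios are
    assumed independent (a strong simplification).
    """
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))


prior = 0.5  # made-up starting belief
one_heuristic = combine_evidence(prior, [3.0])
three_heuristics = combine_evidence(prior, [3.0, 4.0, 2.0])

print(f"one heuristic:    {one_heuristic:.2f}")
print(f"three heuristics: {three_heuristics:.2f}")
```

With these invented values, one heuristic gives 0.75 and three concordant heuristics give 0.96: confidence grows as the evidence accumulates, but it never reaches 1.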
These heuristics can be grouped into different categories, which we will describe in detail below:
  • Transaction patterns;
  • Transaction-internal heuristics;
  • Heuristics external to the transaction.

Satoshi Nakamoto and chain analysis

The first two chain analysis heuristics were identified by Satoshi Nakamoto himself. He discusses them in section 10 ("Privacy") of Bitcoin's White Paper. They are:
  • the CIOH (Common Input Ownership Heuristic);
  • address reuse.
Source: S. Nakamoto, "Bitcoin: A Peer-to-Peer Electronic Cash System", https://bitcoin.org/bitcoin.pdf, 2008.
We'll see what they are in the following chapters, but it's already interesting to note that these two heuristics remain preeminent in chain analysis today.
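As a preview, here is a minimal sketch of how these two heuristics can be used together to group addresses. It assumes the usual reading of the CIOH (all inputs of a transaction are presumed to belong to the same entity) and of address reuse (the same address seen in several transactions is presumed to belong to the same entity). The `AddressClusters` class, the union-find approach, and the sample addresses are choices made for this example, not the method of any particular tool.

```python
class AddressClusters:
    """Union-find over addresses; each set is one presumed entity."""

    def __init__(self):
        self.parent = {}

    def find(self, addr):
        # Register unseen addresses as their own cluster.
        self.parent.setdefault(addr, addr)
        while self.parent[addr] != addr:
            # Path halving keeps later lookups short.
            self.parent[addr] = self.parent[self.parent[addr]]
            addr = self.parent[addr]
        return addr

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_transaction(self, input_addresses):
        """CIOH: presume all inputs of one transaction share an owner."""
        if not input_addresses:
            return
        first = input_addresses[0]
        self.find(first)
        for addr in input_addresses[1:]:
            self.union(first, addr)

    def same_entity(self, a, b):
        return self.find(a) == self.find(b)


clusters = AddressClusters()
clusters.add_transaction(["addr_A", "addr_B"])  # CIOH links A and B
clusters.add_transaction(["addr_B", "addr_C"])  # addr_B is reused: links C too
clusters.add_transaction(["addr_D"])            # unrelated transaction

print(clusters.same_entity("addr_A", "addr_C"))  # True
print(clusters.same_entity("addr_A", "addr_D"))  # False
```

Because clusters are keyed by address, reusing `addr_B` in a second transaction is enough to merge the two groups. Remember that both rules are heuristics: a coinjoin, for example, combines inputs from different users, so the CIOH can produce false links.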