Understanding and protecting against chain analysis

External heuristics

Address reuse
Script similarity and wallet imprints
The Common Input Ownership Heuristic (CIOH)
Off-chain data
Temporal models

The study of external heuristics means analyzing the similarities, patterns, and characteristics of certain elements that are not specific to the transaction itself. In other words, while we previously limited ourselves to exploiting elements intrinsic to the transaction with internal heuristics, we are now broadening our field of analysis to include the transaction's environment, thanks to external heuristics.

Address reuse

This is one of the most well-known heuristics among Bitcoiners. Address reuse enables the establishment of a link between different transactions and UTXOs. It occurs when a Bitcoin receiving address is used several times.

As we saw in the previous chapter, address reuse within a single transaction can serve as an internal heuristic to identify the change. It can also serve as an external heuristic: when the same address appears in several transactions, those transactions can be attributed to a single entity.

The interpretation of address reuse is that all UTXOs locked to that address belong (or have belonged) to the same entity. This heuristic leaves little room for uncertainty: once reuse is identified, the resulting interpretation is very likely to correspond to reality. It therefore enables the grouping of different on-chain activities.

As explained in the introduction to part 3, this heuristic was identified by Satoshi Nakamoto himself. In the White Paper, he mentions a solution to help users avoid producing it, which is simply to use a fresh address for each new transaction:

"As an additional firewall, a new key pair should be used for each transaction to keep them from being linked to a common owner."

Source: S. Nakamoto, "Bitcoin: A Peer-to-Peer Electronic Cash System", https://bitcoin.org/bitcoin.pdf, 2008.

For example, here is an address that is reused in several transactions:

bc1qqtmeu0eyvem9a85l3sghuhral8tk0ar7m4a0a0

Source: Mempool.space
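
To make the grouping concrete, here is a minimal sketch of how an analyst might detect address reuse in a set of outputs. The records, the placeholder names (`tx_a`, `addr_1`, ...) and the function `find_reused_addresses` are hypothetical illustrations; real data would come from a node or a block explorer.

```python
from collections import defaultdict

# Hypothetical, simplified records: (txid, address paid by one output).
outputs = [
    ("tx_a", "addr_1"),
    ("tx_b", "addr_2"),
    ("tx_c", "addr_1"),
    ("tx_d", "addr_3"),
    ("tx_e", "addr_1"),
]


def find_reused_addresses(records):
    """Map each address seen in more than one transaction to those txids."""
    txids_by_address = defaultdict(set)
    for txid, address in records:
        txids_by_address[address].add(txid)
    return {
        address: sorted(txids)
        for address, txids in txids_by_address.items()
        if len(txids) > 1
    }


reused = find_reused_addresses(outputs)
print(reused)  # {'addr_1': ['tx_a', 'tx_c', 'tx_e']}
```

Every transaction listed under the same address can then be attributed to the same entity, which is exactly the link that a fresh address per payment avoids.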

Script similarity and wallet imprints

In addition to address reuse, there are many other heuristics that allow you to link actions to the same wallet or address cluster.

Firstly, an analyst can look for similarities in script usage. For example, less common scripts, such as multisig, may be easier to spot than SegWit v0 scripts. The larger the group we're hiding in, the harder it is to single us out. This is one of the reasons why, in good coinjoin protocols, all participants use exactly the same type of script.

More generally, an analyst can also focus on the characteristic fingerprints of a wallet. These are usage-specific behaviors that can be identified and then exploited as tracing heuristics. In other words, if we observe a recurring set of internal characteristics in transactions attributed to the traced entity, we can look for the same characteristics in other transactions.

For example, we may notice that the traced user systematically sends their change to P2TR addresses (bc1p...). If this behavior recurs, we can use it as a heuristic for the rest of our analysis. Other usable fingerprints include the order of UTXOs, the position of the change among the outputs, RBF (Replace-by-Fee) signaling, the version number, the nSequence field, and the nLockTime field.

As @LaurentMT points out in Space Kek #19 (a French-language podcast), the usefulness of wallet fingerprints in chain analysis increases significantly over time. Indeed, the growing number of script types and the gradual, uneven adoption of these new features by wallet software accentuate the differences between wallets. In some cases, it is even possible to identify the exact software used by the entity being tracked. It is therefore important to understand that the study of wallet fingerprints is particularly relevant for recent transactions, more so than for those made in the early 2010s.

To sum up, a fingerprint can be any specific practice, performed automatically by the wallet or manually by the user, that we can identify in other transactions to aid our analysis.
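
As a rough illustration of how such fingerprints could be compared, the sketch below scores how many characteristics two transactions share. The `Fingerprint` fields and the `similarity` function are hypothetical choices made for this example, not a standard; a real analysis would use more features and weight them differently. The RBF rule assumed in the comment is the BIP125 one (an input with nSequence below 0xfffffffe signals replaceability).

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Fingerprint:
    version: int         # transaction version number
    locktime_set: bool   # True if nLockTime is non-zero
    signals_rbf: bool    # True if any input has nSequence < 0xfffffffe
    change_script: str   # script type of the presumed change output


def similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Share of fingerprint fields on which two transactions agree."""
    pairs = list(zip(astuple(a), astuple(b)))
    return sum(x == y for x, y in pairs) / len(pairs)


# Fingerprint observed on transactions already attributed to the entity.
known = Fingerprint(2, True, True, "p2tr")

# Hypothetical candidate transactions found elsewhere on the chain.
candidate_1 = Fingerprint(2, True, True, "p2tr")
candidate_2 = Fingerprint(1, False, False, "p2wpkh")
candidate_3 = Fingerprint(2, True, False, "p2tr")

for candidate in (candidate_1, candidate_2, candidate_3):
    print(similarity(known, candidate))  # 1.0, then 0.0, then 0.75
```

A high score is only a hint: many users run the same wallet software, so a matching fingerprint narrows the search rather than proving common ownership.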

The Common Input Ownership Heuristic (CIOH)

The Common Input Ownership Heuristic (CIOH) is a heuristic that states that when a transaction has multiple inputs, they are all likely to emanate from a single entity. Consequently, their ownership is common.

To apply the CIOH, we first observe a transaction with several inputs. This could be 2 inputs or 30 inputs. Once this characteristic has been identified, we check whether the transaction matches a known collaborative transaction model. For example, if there are 5 inputs with roughly the same amount and 5 outputs with exactly the same amount, we'll know that this is the structure of a coinjoin. In that case, we won't be able to apply the CIOH.

On the other hand, if the transaction doesn't fit into any known collaborative transaction model, then we can interpret that all inputs are likely to come from the same entity. This can be very useful for extending an already known cluster or continuing a trace.
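
The procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: transactions are reduced to a list of input addresses and a list of output amounts, and `looks_like_coinjoin` is a deliberately crude, hypothetical filter (several equal outputs and at least as many inputs), not a real coinjoin detector. Clusters are merged with a small union-find structure.

```python
from collections import Counter


def looks_like_coinjoin(tx, min_equal_outputs=3):
    """Crude, hypothetical filter: several equal-valued outputs and at
    least as many inputs suggest a collaborative transaction."""
    if not tx["outputs"]:
        return False
    equal_count = Counter(tx["outputs"]).most_common(1)[0][1]
    return equal_count >= min_equal_outputs and len(tx["inputs"]) >= equal_count


def cluster_by_cioh(transactions):
    """Merge the input addresses of every non-coinjoin transaction."""
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path halving
            address = parent[address]
        return address

    for tx in transactions:
        for address in tx["inputs"]:
            find(address)  # register every address, even if never merged
        if len(tx["inputs"]) < 2 or looks_like_coinjoin(tx):
            continue  # the CIOH does not apply
        root = find(tx["inputs"][0])
        for address in tx["inputs"][1:]:
            parent[find(address)] = root

    clusters = {}
    for address in parent:
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical transactions: input addresses and output amounts in sats.
transactions = [
    {"inputs": ["A", "B"], "outputs": [50_000, 12_345]},
    {"inputs": ["B", "C"], "outputs": [70_000]},
    {"inputs": ["D", "E", "F"], "outputs": [100_000, 100_000, 100_000]},
    {"inputs": ["G"], "outputs": [5_000, 900]},
]

print(cluster_by_cioh(transactions))
# [['A', 'B', 'C'], ['D'], ['E'], ['F'], ['G']]
```

Addresses A, B and C end up in one cluster because B appears as an input alongside both A and C, while the coinjoin-like transaction leaves D, E and F unlinked.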

The CIOH was first described by Satoshi Nakamoto. He discusses it in section 10 of the White Paper:

"Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner."

It's particularly fascinating to note that Satoshi Nakamoto, even before the official launch of Bitcoin, had already identified the two main privacy vulnerabilities for users, namely CIOH and address reuse. Such foresight is quite remarkable, as these two heuristics remain, even today, the most useful in blockchain analysis.

To give you an example, here is a transaction on which we can probably apply CIOH:

20618e63b6eed056263fa52a2282c8897ab2ee71604c7faccfe748e1a202d712

Source: Mempool.space

Off-chain data

Of course, chain analysis is not limited to on-chain data. Any data from a previous analysis or available on the Internet can also be used to refine an analysis.

For example, if we observe that traced transactions are systematically broadcast from the same Bitcoin node, and we manage to identify its IP address, we may be able to identify other transactions from the same entity, as well as determine part of the issuer's identity. Although this practice is not easily achievable, as it requires the operation of numerous nodes, it may be employed by some companies specializing in blockchain analysis.

The analyst also has the option of relying on previously published open-source analyses or on their own earlier work. Perhaps we'll be able to find an output that points to a cluster of addresses we've already identified. Sometimes, it's also possible to rely on outputs that point to an exchange platform, as the addresses of these companies are generally known.

In the same way, we can proceed by elimination. For example, if analyzing a transaction with two outputs reveals that one of them belongs to an already known address cluster that is distinct from the entity being traced, then we can interpret the other output as probably being the change.
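
This elimination step can be written as a small function. The sketch below is illustrative only: `known_clusters` stands for hypothetical attribution data gathered in earlier analyses, and the labels `exchange_1` and `traced_user` are invented for the example.

```python
def change_by_elimination(output_addresses, known_clusters, traced_entity):
    """Return the output presumed to be change, or None if undecidable.

    known_clusters maps an address to an entity label. An output is ruled
    out as change when it is attributed to an entity other than the traced
    one; unattributed outputs remain candidates.
    """
    candidates = [
        address
        for address in output_addresses
        if known_clusters.get(address, traced_entity) == traced_entity
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical attribution data from a previous analysis.
known_clusters = {"addr_x": "exchange_1"}

print(change_by_elimination(["addr_x", "addr_y"], known_clusters, "traced_user"))
# addr_y
```

When neither output is attributed, or when both are, the function returns None: elimination alone cannot decide, and other heuristics are needed.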

Chain analysis also includes a more general OSINT (Open Source Intelligence) component, involving internet searches. It is for this reason that we advise against publishing addresses directly on social networks or on a website, whether pseudonymous or not.

Temporal models

We think about it less, but certain human behaviors are recognizable on-chain. Perhaps the most useful one for an analyst is your sleep pattern. When you sleep, you don't broadcast Bitcoin transactions, and you generally sleep at roughly the same time each day. This is why temporal analysis is common practice in chain analysis. Quite simply, it consists of recording the times at which a given entity's transactions are broadcast to the Bitcoin network. By analyzing these temporal patterns, we can deduce a wealth of information.

First of all, a temporal analysis can sometimes identify the nature of the traced entity. If we observe that transactions are broadcast consistently over a 24-hour period, this will indicate a high level of economic activity. The entity behind these transactions is likely to be a company, potentially international and perhaps with automated in-house procedures.

For example, I recognized this pattern a few months ago when analyzing the transaction that had mistakenly allocated 19 bitcoins in fees. A simple temporal analysis enabled me to hypothesize that we were dealing with an automated service, and therefore likely with a large entity, such as an exchange platform.

Indeed, a few days later, it was discovered that the funds belonged to PayPal via the Paxos exchange platform.

Conversely, if we observe that the temporal pattern is confined to roughly 16 specific hours of the day, then we can estimate that we're dealing with an individual user, or perhaps a local company, depending on the volumes exchanged.

Beyond the nature of the entity observed, the temporal pattern can also tell us approximately where the user is located, thanks to time zones. In this way, we can match other transactions and utilize their timestamps as an additional heuristic to enhance our analysis.

For example, on the reused address I mentioned earlier, we can see that transactions, both incoming and outgoing, are concentrated within a 13-hour interval.

bc1qqtmeu0eyvem9a85l3sghuhral8tk0ar7m4a0a0

Source: OXT.me

This range probably corresponds to Europe, Africa, or the Middle East. We can therefore assume that the user behind these transactions lives in one of these regions.
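
To show how such an interval can be derived, here is a minimal sketch that finds the shortest window of UTC hours containing all of an entity's transactions. It assumes we have Unix timestamps that approximate broadcast times (block timestamps are only a coarse proxy). The function name `activity_window` and the sample data are invented for the example and are not the real history of the address above.

```python
from datetime import datetime, timezone


def activity_window(timestamps):
    """Return (start_hour, length) of the shortest circular window of UTC
    hours containing every broadcast, i.e. the complement of the longest
    run of silent hours."""
    active = {
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    }
    if not active:
        raise ValueError("no timestamps given")
    best_start, best_gap = 0, -1
    for start in range(24):
        gap = 0
        while gap < 24 and (start + gap) % 24 not in active:
            gap += 1
        if gap > best_gap:
            best_start, best_gap = start, gap
    return (best_start + best_gap) % 24, 24 - best_gap


# Invented sample: one broadcast per day at these UTC hours.
hours = [7, 9, 12, 15, 18, 19]
sample = [
    datetime(2024, 1, day, hour, tzinfo=timezone.utc).timestamp()
    for day, hour in enumerate(hours, start=1)
]

print(activity_window(sample))  # (7, 13)
```

A window of 24 hours points to an automated or international service, while a shorter window such as this one is compatible with an individual. Comparing its position with typical waking hours then suggests candidate time zones.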

In a different vein, a temporal analysis of this type also led to the hypothesis that Satoshi Nakamoto was not operating from Japan, but from the USA (see "The Time Zones of Satoshi Nakamoto").
