Understanding and protecting against chain analysis

Internal heuristics

An internal heuristic is a specific characteristic that we identify within a transaction itself, without needing to examine its environment, and which enables us to make deductions. Unlike patterns, which focus on the overall structure of the transaction at a high level, internal heuristics rely on the detailed data that can be extracted from the transaction. This includes:

The amounts of the various input and output UTXOs;
Everything to do with scripts: receiving addresses, versioning, locktimes, and so on.

Generally speaking, this type of heuristic enables us to identify the change output in a given transaction. By doing so, we can then track an entity across multiple transactions. Indeed, if we identify a UTXO belonging to a user we wish to track, it's crucial to determine, when they carry out a transaction, which output has been transferred to another user and which output represents the change, and thus remains in their possession.

Once again, let me remind you that these heuristics are not absolutely precise. Taken individually, they only enable us to identify likely scenarios. It's the accumulation of several heuristics that helps to reduce uncertainty, without ever being able to eliminate it completely.

Internal similarities

This heuristic involves examining the similarities between the inputs and outputs of the same transaction. If the same characteristic is observed on the inputs and on just one of the transaction's outputs, then it is likely that this output constitutes the change.

The most obvious feature is the reuse of a receiving address in the same transaction.

This heuristic leaves little room for doubt. Unless the private key has been compromised, an address that appears both in the inputs and in the outputs necessarily reveals the activity of a single user. The resulting interpretation is that the change is the output with the same address as the input. We can then continue to trace the individual from this change.
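As a minimal sketch of this check, assuming a transaction has already been reduced to two plain lists of address strings, the function below returns the index of the single output that reuses an input address. The function name and the placeholder labels are hypothetical and used only for illustration.

```python
def change_by_address_reuse(input_addresses, output_addresses):
    """Return the index of the single output whose address also appears
    among the inputs, or None when zero or several outputs match."""
    spent_from = set(input_addresses)
    matches = [i for i, addr in enumerate(output_addresses) if addr in spent_from]
    return matches[0] if len(matches) == 1 else None


# Placeholder labels stand in for real addresses (illustrative only).
inputs = ["addr_alice"]
outputs = ["addr_bob", "addr_alice"]
print(change_by_address_reuse(inputs, outputs))  # 1
```

Returning None when no output matches, or when several do, keeps the sketch from guessing in cases the heuristic does not cover.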

For example, here is a transaction to which this heuristic can probably be applied:

54364146665bfc453a55eae4bfb8fdf7c721d02cb96aadc480c8b16bdeb8d6d0

Source: Mempool.space

These similarities between inputs and outputs don't stop at address reuse. Any similarity in the use of scripts can be used to apply a heuristic. For example, we can sometimes observe the same versioning between the input and one of the transaction outputs.

Consider a transaction in which input n° 0 unlocks a P2WPKH script (SegWit V0, with an address starting with bc1q). Output n° 0 uses the same type of script. Output n° 1, on the other hand, uses a P2TR script (SegWit V1, with an address beginning with bc1p). The interpretation of this feature is that the output with the same versioning as the input is likely the change. It would therefore still belong to the same user.
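The same reasoning can be sketched in code. The example below is a simplification that classifies mainnet addresses by prefix only (bc1q, bc1p, 1, 3) rather than decoding the actual scripts, so it cannot tell P2WPKH from P2WSH, for instance. The function names and the placeholder address strings are assumptions made for illustration.

```python
def script_family(address):
    """Rough script family guessed from a mainnet address prefix."""
    if address.startswith("bc1q"):
        return "segwit_v0"
    if address.startswith("bc1p"):
        return "segwit_v1"
    if address.startswith("1"):
        return "p2pkh"
    if address.startswith("3"):
        return "p2sh"
    return "unknown"


def change_by_script_family(input_addresses, output_addresses):
    """Return the index of the only output sharing the inputs' script
    family, or None when the heuristic does not apply."""
    families = {script_family(a) for a in input_addresses}
    if len(families) != 1 or "unknown" in families:
        return None
    matches = [
        i for i, a in enumerate(output_addresses) if script_family(a) in families
    ]
    return matches[0] if len(matches) == 1 else None


# Placeholder strings, not valid addresses: only the prefix matters here.
print(change_by_script_family(["bc1q_input"], ["bc1q_out0", "bc1p_out1"]))  # 0
```

When both outputs share the input's script family, the function returns None, which reflects that the heuristic gives no signal in that case.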

Here is a transaction to which this heuristic can probably be applied:

db07516288771ce5d0a06b275962ec4af1b74500739f168e5800cbcb0e9dd578

Source: Mempool.space

In this transaction, we can see that input n° 0 and output n° 1 use P2WPKH scripts (SegWit V0), while output n° 0 uses a different script type, P2PKH (Legacy).

In the early 2010s, this heuristic based on script versioning was of little use because few script types were available at the time. However, successive Bitcoin updates have introduced an increasing diversity of script types. This heuristic is therefore becoming more and more relevant: with a wider range of script types, users are divided into smaller groups, which increases the chances that the input and only one output share the same script type. For this reason, from a privacy perspective only, it's advisable to opt for the most common type of script. For example, as I write these lines, Taproot scripts (bc1p) are less frequently used than SegWit V0 scripts (bc1q). Although the former offer economic and privacy benefits in certain specific contexts, for more traditional single-signature uses it may make sense to stick with an older standard for privacy reasons until the new standard is more widely adopted.

Round number payments

Another internal heuristic that can help us identify the change is the round number heuristic. Generally speaking, when faced with a simple payment pattern (1 input and 2 outputs), if one of the outputs has a round amount, then it likely represents the payment.

By elimination, if one output represents the payment, the other represents the change. It can therefore be inferred that the user behind the input is likely still in possession of the output identified as the change.
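A minimal sketch of this elimination, assuming output amounts are given in sats and that "round" means divisible by a chosen granularity. The 10,000 sat default, the function name, and the example amounts are arbitrary assumptions, not a standard definition.

```python
def change_by_round_amount(output_sats, granularity=10_000):
    """For a 2-output transaction, return the index of the likely change
    output when exactly one output is a round amount, else None."""
    round_idx = [
        i for i, v in enumerate(output_sats) if v > 0 and v % granularity == 0
    ]
    if len(output_sats) != 2 or len(round_idx) != 1:
        return None
    # The round output is read as the payment, so the other is the change.
    return 1 - round_idx[0]


# Hypothetical amounts: 500,000 sats looks like a payment.
print(change_by_round_amount([500_000, 1_273_914]))  # 1
```

If both outputs are round, or neither is, the function returns None because the heuristic offers no basis for choosing.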

It should be stressed that this heuristic is not always applicable, since the majority of payments are still priced in fiat units of account. Indeed, when a retailer in France accepts bitcoin, they will generally not display fixed prices in sats. Instead, they will convert the price in euros into the amount of bitcoin to be paid. The resulting transaction amounts, expressed in sats, will therefore not be round numbers.

Nevertheless, an analyst could attempt to make this conversion, taking into account the exchange rate in effect at the time the transaction was broadcast on the network. Let's take the example of a transaction with an input of 97,552 sats and two outputs, one of 31,085 sats and the other of 64,152 sats. At first glance, this transaction does not appear to involve round amounts. However, by applying the exchange rate of €64,339 per bitcoin at the time of the transaction, we obtain a conversion into euros as follows:

An input of €62.76;
An output of €20;
An output of €41.27.

Once the amounts are converted into fiat currency, the round number payment heuristic can be applied to this transaction. The €20 output probably went to a merchant, or at least changed ownership. By deduction, the €41.27 output is likely to have remained in the original user's possession.
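The conversion above can be reproduced with a short sketch. It uses Python's decimal module to avoid floating-point rounding surprises, and it treats "round" as a whole number of euros after rounding to the cent, which is an assumption chosen for this illustration. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

SATS_PER_BTC = Decimal(100_000_000)


def sats_to_fiat(sats, price_per_btc):
    """Convert an amount in sats to fiat, rounded to the nearest cent."""
    value = Decimal(sats) / SATS_PER_BTC * Decimal(price_per_btc)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def is_round_fiat(value):
    """True when the fiat value is a whole number of units."""
    return value == value.to_integral_value()


price = 64_339  # EUR per BTC, the rate used in the example above
for sats in (97_552, 31_085, 64_152):
    eur = sats_to_fiat(sats, price)
    print(sats, eur, is_round_fiat(eur))
# 97552 62.76 False
# 31085 20.00 True
# 64152 41.27 False
```

In practice an analyst does not know the exact rate the merchant applied, so a real check would likely need to tolerate small deviations around round values rather than rely on exact equality.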

If, one day, bitcoin becomes the preferred unit of account in our exchanges, this heuristic could become even more useful for analysis.

For example, here is a transaction to which this heuristic can probably be applied:

2bcb42fab7fba17ac1b176060e7d7d7730a7b807d470815f5034d52e96d2828a

Source: Mempool.space

The largest output

When we identify a sufficiently large gap between the two outputs of a simple payment transaction, we can estimate that the largest output is likely to be the change.

This largest output heuristic is surely the most imprecise of all. On its own, it's pretty weak. However, this feature can be combined with other heuristics to reduce the uncertainty of our interpretation.

For example, if we're looking at a transaction with a round output and a much larger second output, the round number heuristic and the largest output heuristic both point to the same change output, which reduces our level of uncertainty.
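One way to picture this combination is a simple vote between heuristics, sketched below. The 10x ratio used to decide that the gap is "sufficiently large", the 10,000 sat granularity, the function names, and the example amounts are all assumptions made for illustration; a real analysis would weigh heuristics rather than count them equally.

```python
from collections import Counter


def round_amount_guess(output_sats, granularity=10_000):
    """Change index if exactly one of two outputs is round, else None."""
    round_idx = [
        i for i, v in enumerate(output_sats) if v > 0 and v % granularity == 0
    ]
    if len(output_sats) != 2 or len(round_idx) != 1:
        return None
    return 1 - round_idx[0]


def largest_output_guess(output_sats, min_ratio=10):
    """Change index if one of two outputs is at least min_ratio times
    larger than the other, else None."""
    if len(output_sats) != 2:
        return None
    small, large = sorted(output_sats)
    if small <= 0 or large < small * min_ratio:
        return None
    return output_sats.index(large)


def combine_guesses(guesses):
    """Return (change_index, votes) for the most supported guess,
    or (None, 0) when there is no guess or the top guesses tie."""
    votes = Counter(g for g in guesses if g is not None)
    if not votes:
        return None, 0
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, 0
    return ranked[0]


outputs = [100_000, 4_862_311]  # hypothetical amounts in sats
guesses = [round_amount_guess(outputs), largest_output_guess(outputs)]
print(combine_guesses(guesses))  # (1, 2)
```

When the two heuristics disagree, the sketch returns (None, 0) instead of picking a side, which mirrors the point that no single heuristic is conclusive.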

For example, here is a transaction to which this heuristic can probably be applied:

b79d8f8e4756d34bbb26c659ab88314c220834c7a8b781c047a3916b56d14dcf

Source: Mempool.space

Quiz

If, in a simple payment transaction, the same type of script is used in the input and in only one of the two outputs, which output likely represents the change?