Merkle Trees (or hash trees) are data structures which represent multiple individual pieces of data in a way which offers incredibly efficient proofs that a single piece of data exists in the structure without revealing any of the other pieces of data, and only requiring the verifier of the proof to know a fixed-size fingerprint of the tree (which is the length of the output of whatever hash function is being used to create the tree).

If the term ‘Merkle Tree’ brings out suppressed memories of drawing pyramids of meticulously-labeled connected boxes on Post-it notes, you’re probably familiar with a diagram like this:

To quickly cover terminology, a ‘leaf’ is one of the bottom boxes, which contain some representation of source data, and a ‘node’ is any other box. Leaves are also technically nodes—but who cares.

It’s important to note that the top box in the above diagram wouldn’t calculate H(H(AB + CD) + H(EF + GH)), instead it would calculate H(child_{0} + child_{1}), as H(AB + CD) has been stored as child_{0} and H(EF + GH) has been stored as child_{1} of the layer below. For any traditional Merkle Tree with ** n** elements, generation of the full tree requires

*hashes (ignoring the original hashes of the data before the actual tree is constructed, which is another*

**n – 1****n**hashes). The workd required to generate a Merkle Tree scales linearly with the number of leaves.

- Hash file 4
- Concatenate result of 1.) with element (1, 5)
- Hash result of 2.)
- Concatenate result of 3.) with element (2, 3)
- Hash result of 4.)
- Concatenate result of 5.) with element (3, 0)
- Hash result of 6.)
- If the result of 7.) matches the original Merkle Root Jane published in June, file 4 must have existed when she published the original Merkle Root! Otherwise, she either screwed up or tried to lie and thought nobody would actually validate her proof.

**Great, how are they useful?**Aside from an interesting optimization exercise, Merkle Trees have found their way into databases, file systems, cryptocurrencies, and more. They provide a quick way to represent (or fingerprint/summarize) large sets of data in a way where portions of the data can be selectively proven as members. Generally, a hash function is applied to a file, and can be applied at a later point in time. If the digest (or output) doesn’t change, the file hasn’t changed. In a similar fashion, Merkle Trees can make sure a collection of files haven’t changed—and can be used to discover which files, if any, have been modified.