Begin transmission…
Why I love Merkle Mountain Ranges
The Merkle Mountain Range (MMR) architecture is uniquely suited to handle the dynamic nature of a data/event stream-based system. Unlike traditional Merkle trees, MMRs allow for efficient, non-interactive appends. This means that as new data/events are recorded, they can be added to the on-chain commitment without necessitating a complete restructuring of the existing tree.
Each node in an MMR is linked not just to its immediate children but also to nodes across the structure in a way that ensures any changes to the data/events are chronologically coherent and tamper-evident. This property is vital for protocols based on transparency and provenance!
Let's dive into where MMRs come from and how they work.
Background
The generic term for an MMR-based accumulator is a binary numeral tree (a name I like much more than MMR, but we’ll mostly stick with MMR for now). Binary numeral trees are a more generic concept that includes things like growable (or history) Merkle trees, Merkle Mountain Ranges, and other such constructions (see also Reyzin and Yakoubov). These are all essentially data structures formed by a functional scan (a pattern that comes up a lot in programming) over an ordered set of elements, where the scan operation is a hashing function. They are also all purely functional (and authenticated) data structures, which makes them ideal in blockchain and distributed contexts, and well-suited to processing unbounded streams of data or events.
Binary numeral trees provide both version/history tracking proofs and inclusion proofs. Again, they are an “alternative” form of Merkle tree. While the latter relies on a perfectly balanced binary tree, the former can be seen either as a list of perfectly balanced binary trees or as a single binary tree truncated from the top right. A binary numeral tree is strictly append-only (although there are efficient methods to mutate elements in place if required): elements are added from left to right, adding a parent as soon as two children exist, filling up the range accordingly. The root of the tree is called a commitment.
In the figure to the left, we have two versions of the same log/stream depicted in subfigures 1 and 2. Proofs between versions can be generated for any (set of) past versions, as shown in subfigure 3. Relatedly, inclusion proofs for a given event or set of events can be generated using standard Merkle inclusion proof mechanisms.
History trees that leverage content-addressable storage backends also benefit from deduplication, the immutable nature of hash trees, and more.
Structure
This binary numeral tree structure is essentially just a list of Merkle trees (known as eigentrees or simply peaks). When we store only the total leaf count of the MMR along with the list of peaks (the Merkle roots of those trees), we have a Merkle Mountain Range Accumulator, an MMRA. You can think of an MMRA as a commitment to all the leaves in the trees. In practice, the definition of an MMRA looks something like this:
```rust
struct State {
    peaks: VecLike<Cid>, // Some Vec-like structure
    leaf_count: u64,
}
```
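A nice consequence of this representation (and the source of the name “binary numeral tree”): the peaks mirror the binary representation of the leaf count, so the number of peaks is just the number of 1-bits. A quick sketch to illustrate:

```rust
/// Each 1-bit in the leaf count corresponds to one perfect binary
/// tree (peak), so the peak count is simply the population count.
/// For example, 11 leaves = 0b1011 -> three peaks of 8, 2, and 1 leaves.
fn peak_count(leaf_count: u64) -> u32 {
    leaf_count.count_ones()
}

fn main() {
    assert_eq!(peak_count(11), 3);
}
```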
The above-linked article from the Neptune protocol provides an excellent summary of operations over an MMRA; you should go read it! For completeness, I’ll also provide a relatively simple implementation/discussion here.
Mutating
A Merkle Mountain Range (MMR) can be seen as a collection of Merkle trees. In an MMR, the individual Merkle trees are merged when two trees reach the same size. This merging process adds parent nodes on top of the previous peaks/roots of the trees as they form:
Borrowing Figure 4 from the Neptune article, adding node number 19 does not merge any existing Merkle trees but adds a peak (19) to the list of peaks. The orange circles are the old peaks, and the pink circle is the newly added peak. The newly added peak digest is simply the hash digest that was inserted with the append.
Adding node number 20 merges the trees rooted at 19 and 20 into a tree rooted at 21. Node 21 in turn merges the trees rooted at 21 and 18 to form the tree rooted at 22. The orange circles are the old peaks, and the pink circle is the newly added peak. The hash digest for node 21 is calculated by concatenating and hashing nodes 19 and 20. The hash digest of node 22 (the new peak) is calculated by concatenating and hashing the digests of nodes 18 and 21.
The above illustrates that the new list of peaks can be calculated from the old list of peaks along with the newly inserted leaf. In other words: we do not need the full set of hash digests in the tree to calculate the new peaks. The old peaks function as authentication paths (in reverse order) for any new peaks that need to be calculated. This means a new MMRA formed by adding a new leaf can be calculated when knowing the newly inserted leaf and the previous MMRA.
The relatively simple algorithm looks something like this (in Rust-like pseudo-code):
```rust
impl State {
    pub fn push<S: DeserializeOwned + Serialize>(
        &mut self,
        obj: S,
    ) -> anyhow::Result<()> {
        let leaf = hash(&obj)?;
        // Push the new leaf onto the peaks
        self.peaks.set(self.peaks.count(), leaf)?;
        // Count trailing ones in the binary representation of leaf_count.
        // This works because the new leaf fills the next available spot,
        // and each trailing one marks a pair of equal-sized trees that
        // must now be merged.
        let mut new_peaks = (!self.leaf_count).trailing_zeros();
        while new_peaks > 0 {
            // Pop the last two peaks and push their hash
            let right = self.peaks.delete(self.peaks.count() - 1)?;
            let left = self.peaks.delete(self.peaks.count() - 1)?;
            // Push a new peak onto the peaks array
            self.peaks.set(
                self.peaks.count(),
                hash_pair(left.as_ref(), right.as_ref())?,
            )?;
            new_peaks -= 1;
        }
        self.leaf_count += 1;
        Ok(())
    }
}
```
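To make the merge rule concrete, here’s a self-contained toy version of push (my own sketch, not the protocol code: it “hashes” by string formatting instead of computing Cids over IPLD blocks), which prints the peak list as leaves are appended:

```rust
/// A toy MMR accumulator over strings, purely to illustrate the merge
/// rule. Not production code: the "hash" is just string concatenation.
struct ToyMmr {
    peaks: Vec<String>,
    leaf_count: u64,
}

impl ToyMmr {
    fn push(&mut self, leaf: &str) {
        self.peaks.push(leaf.to_string());
        // Merges required = trailing ones in the binary form of leaf_count.
        let mut merges = (!self.leaf_count).trailing_zeros();
        while merges > 0 {
            let right = self.peaks.pop().unwrap();
            let left = self.peaks.pop().unwrap();
            self.peaks.push(format!("H({left},{right})"));
            merges -= 1;
        }
        self.leaf_count += 1;
    }
}

fn main() {
    let mut mmr = ToyMmr { peaks: vec![], leaf_count: 0 };
    for leaf in ["a", "b", "c", "d", "e"] {
        mmr.push(leaf);
        println!("{} leaves -> {:?}", mmr.leaf_count, mmr.peaks);
    }
    // 5 leaves (0b101) -> two peaks: ["H(H(a,b),H(c,d))", "e"]
}
```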
In the above, the hash_pair function simply computes the hash of a pair of Cids as the hash of a block containing the pair as a tuple-like structure:
```rust
use cid::multihash::Code;
use cid::multihash::MultihashDigest;
use cid::Cid;
use fvm_ipld_encoding::{to_vec, DAG_CBOR};

/// Compute the hash of a pair of CIDs.
/// The hash is the CID of a new block containing the
/// concatenation of the two CIDs.
/// We do not include the index of the element(s) because
/// incoming data should already be "nonced".
fn hash_pair(
    left: Option<&Cid>,
    right: Option<&Cid>,
) -> anyhow::Result<Cid> {
    if left.is_none() || right.is_none() {
        return Err(anyhow::anyhow!("hash_pair requires two CIDs"));
    }
    // Encode the CIDs into a binary format
    let data = to_vec(&[left, right])?;
    // Compute the CID for the block
    let mh_code = Code::Blake2b256;
    let mh = mh_code.digest(&data);
    let cid = Cid::new_v1(DAG_CBOR, mh);
    Ok(cid)
}
```
Commitment
Unlike a Merkle tree, an MMR generally has no single root by construction, so we need a method to compute one (otherwise it would defeat the purpose of using a hash tree). This process is called “bagging the peaks”:
```rust
impl State {
    /// Collect the peaks and combine to compute the root commitment.
    pub fn bag_peaks(&self) -> anyhow::Result<Cid> {
        let peaks_count = self.peaks.count();
        if peaks_count == 0 {
            return Ok(Cid::default());
        }
        if peaks_count == 1 {
            return Ok(self.peaks.get(0)?.unwrap().to_owned());
        }
        // Start with the two rightmost peaks...
        let mut root = hash_pair(
            self.peaks.get(peaks_count - 2)?,
            self.peaks.get(peaks_count - 1)?,
        )?;
        // ...then fold the remaining peaks in, from right to left.
        for i in 2..peaks_count {
            root = hash_pair(self.peaks.get(peaks_count - 1 - i)?, Some(&root))?;
        }
        Ok(root)
    }
}
```
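For intuition: with three peaks p0, p1, and p2 (left to right), the loop above computes root = hash_pair(p1, p2) and then root = hash_pair(p0, root), i.e., H(p0, H(p1, p2)). The fold runs right to left, so the newest (and smallest) trees sit deepest in the bag.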
In practice, this singular root would be the Cid that we post on-chain as our Fingerprint, or root commitment. Here is a simplified version of the previous tree (with 11 elements added), where the dotted edges indicate the links required to form the bagged root (15) in pink:
Implicitly, we can prune any full trees from the structure, keeping only their roots, which is essentially what the MMRA is doing, keeping storage costs to a minimum.
Elements
The type of the elements themselves is currently unspecified. In practice, it need only be some type that can be represented as an IPLD object for which a Cid can be computed. See the above hash pseudo-code for an example.
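That hash function was never spelled out above, so here is a minimal sketch of what it might look like, mirroring the conventions of hash_pair (an assumption on my part, not the canonical implementation):

```rust
use cid::multihash::Code;
use cid::multihash::MultihashDigest;
use cid::Cid;
use fvm_ipld_encoding::{to_vec, DAG_CBOR};
use serde::Serialize;

/// Compute the Cid of an arbitrary serializable object by encoding it
/// as DAG-CBOR and hashing the resulting block (a sketch; assumes the
/// same codec and hasher as hash_pair above).
fn hash<S: Serialize>(obj: &S) -> anyhow::Result<Cid> {
    // Encode the object into a binary format
    let data = to_vec(obj)?;
    // Compute the CID for the block
    let mh = Code::Blake2b256.digest(&data);
    Ok(Cid::new_v1(DAG_CBOR, mh))
}
```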
DePIN Corner: W3bstream (IoTeX)
by Marla Natoli & Dan Buchholz
W3bstream is building an open, decentralized infrastructure for computing over device data to power verifiable, cryptographic proofs. A key requirement here is data availability: data needs to be readily available for compute. That’s where Textile comes in with our latest solution called “Repos” (previously Basin). With our solution, data is initially routed to Textile for immediate, high-availability reads, benefiting from our caching layer. Once the data is cached, it becomes instantly accessible, which is crucial for tasks like real-time dispute resolution or immediate verification. W3bstream can then capture this data for compute, generate proofs, and connect them to the blockchain. One simple use case is verifiable rewards calculations: using W3bstream and Repos, a DePIN project could make reward calculations more programmatic and transparent.
Do you have a use case that’s relevant to this workflow? Get in touch and we’d love to talk to you about creating a joint POC.
Let’s be mermaids!
When writing up a post on Merkle Mountain Ranges recently, I found myself needing to create lots of binary tree diagrams. My go-to tool for this is generally Mermaid. But with MMRs, we have a somewhat non-standard tree structure, or something more like a bunch of detached subtrees. Mermaid isn’t all that great in this case, because of how its tree layout algorithm works. But after some searching online and a little help from ChatGPT, I came up with a nice solution that was pretty minimal, so I thought I’d share it here. Here’s the bulk of the main example:
```mermaid
graph
    15((15)):::root --> 7
    15 --> 14
    7 --> 3
    7 --> 6
    14 --> 10
    14 --> 13
    3 --> 1
    3 --> 2
    6 --> 4
    6 --> 5
    10 --> 8
    10 --> 9
    13 --> 11
    13 --> 12
    18((18)):::root --> 16
    18 --> 17
    19((19)):::added
    22:::hide ~~~ 18
    22 ~~~ 21
    21:::hide ~~~ 19
    21 ~~~ 20:::hide
    23:::hide ~~~ 22

    classDef node fill:none,stroke:transparent
    classDef root fill:#f96
    classDef added fill:#f39
    classDef hide stroke:transparent,fill:transparent,color:transparent
```
As you can see, it is pretty much just three trees, each with a height of four. However, the two “trees” on the right, rooted at 18 and 19, are incomplete and mostly contain hidden elements. The ~~~ link indicates an invisible link, and you can define classes for your nodes to control their appearance, including making them invisible! This approach also makes it really easy to create sequences of diagrams, each revealing more or new elements that were previously hidden.
Here’s the same figure, with a few more elements revealed to show what happens when you append to an MMR. Only the link types and classes on the nodes themselves needed to change. Pretty handy. And now we have only two subtrees at roots 15 and 22.
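For reference, one plausible way to express that revision is to swap the hidden classes and invisible links for visible ones on nodes 20 through 22; only the changed lines are sketched here (the rest of the graph and the classDefs stay the same as above):

```mermaid
graph
    %% Only the changed lines; the rest of the graph is unchanged.
    22((22)):::added --> 18
    22 --> 21
    21 --> 19
    21 --> 20
    23:::hide ~~~ 22
    classDef added fill:#f39
    classDef hide stroke:transparent,fill:transparent,color:transparent
```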
Nice and clean. Mermaid is a fantastic tool, and so simple. Plus, so many tools that I use daily, like GitHub, Notion, etc., integrate Mermaid diagrams by default.
Measuring and learning for user design
by Jim Kosem
For the past week or so I’ve begun digging into measuring design. This isn’t something that a designer typically thinks of. Sure, we talk about evidence a lot, and we assume something is being measured and that the right person will tell us when something isn’t quite working out as planned or as assumed when it was designed and built. But sometimes part of the design is figuring out how to measure it yourself.
Figuring out evidence gathering, though, should be considered part and parcel of the design process, just as much as doing user research, interviewing, and whatever other plethora of exercises we can do in person with people. But figuring out how your work is being used, in real life, without someone there observing, is something different. One interesting thing here is learning about measures like Real Experience Score (RES) and how to infer perceptibility and usability from performance measurements of a live site; it gets you thinking, as it should. It should get a designer thinking about how the thing actually loads, what users see, how quickly, how little of it shifts around, and how we can tweak designs in appreciation of that.
Other updates this week
ETHDenver is around the corner. Sign up for our co-sponsored Proof of Data event to hear from a panel of speakers and deep dives across DePINs, compute, AI/ML, identity, and more! https://lu.ma/proofofdata
End transmission…
Want to dive deeper, ask questions, or just nerd out with us? Jump into our Telegram or Discord—including weekly research office hours or developer office hours. And if you’d like to discuss any of these topics in more detail, comment on the issue over on GitHub!
Are you enjoying Weeknotes? We’d love your feedback—if you fill out a quick survey, we’ll be sure to reach out directly with community initiatives in the future! Fill out the form here.