Data Availability (“DA”) is a hot topic now that EIP-4844 is live! There are a few major players in the space, and all of them take similar approaches but with a few differences. We'll take a look at EigenDA, Celestia, Avail, and Arbitrum AnyTrust. But first, let's start off with a bit of background information.
DA layers ensure that block data is provably published so that applications and rollups can know what the state of the chain is—but once the data is published, DA layers do not guarantee that historical data will be permanently stored and remain retrievable. DAs either deploy a validity proof or a fraud/fault proof (validity proofs are more common). Data availability sampling (“DAS”) by light clients is also a common feature to ensure data can be recovered. However, the term DA typically refers to simple blocks/transaction data, so it differs from large, arbitrary, long-term data availability and storage.
The following section outlines common terms across DA protocols. Skip if familiar.
To clarify, here's a quick recap of what DAs implement:
Validity proofs: use zk-SNARKs/STARKs to ensure that all data and transactions are valid before they are included onchain.
Computationally intensive but provides strong security guarantees.
Fraud/fault proofs: allow data to be posted onchain before it is guaranteed valid, using a challenge period for transaction dispute resolution.
Less computationally intensive but weaker security guarantees (i.e., security relies on at least one honest party actively watching the chain and generating fraud proofs).
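To make the trade-off concrete, here is a minimal sketch of the two finalization flows; the verifier functions and Batch fields are hypothetical stand-ins, not any production rollup's API:

```python
from dataclasses import dataclass

# Hypothetical sketch contrasting the two finalization flows; the verifier
# callables stand in for real SNARK/STARK or interactive fraud-proof
# verifiers, not any production rollup's API.

CHALLENGE_PERIOD = 7 * 24 * 3600  # e.g., a 7-day window, in seconds


@dataclass
class Batch:
    state_root: bytes
    posted_at: int  # unix timestamp when posted onchain


def finalize_zk_batch(batch, proof, verify_validity_proof):
    # Validity flow: a batch is final the moment its proof verifies,
    # at the cost of expensive proof generation.
    if not verify_validity_proof(batch.state_root, proof):
        raise ValueError("invalid proof; batch rejected")
    return "final"


def finalize_optimistic_batch(batch, now, fraud_proofs, verify_fraud_proof):
    # Fraud flow: a batch is assumed valid; anyone may submit a fraud proof
    # during the challenge period, so finality waits out the window.
    if any(verify_fraud_proof(batch.state_root, fp) for fp in fraud_proofs):
        return "reverted"
    if now < batch.posted_at + CHALLENGE_PERIOD:
        return "pending"  # still challengeable
    return "final"
```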
KZG commitment scheme: used alongside erasure encoding to prove the encoded data is correct without needing a fraud proof.
E.g., full nodes can prove transaction inclusion to light nodes using a succinct proof.
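For intuition, here is the opening check of the generic KZG scheme in its standard notation (a sketch of the textbook construction, not any specific DA network's parameters); the point is that the proof is a single group element regardless of the data size:

```latex
% Standard KZG opening: commit to the data polynomial p, then prove
% p(z) = y with one group element. Setup: pairing e : G_1 x G_2 -> G_T,
% generator g, trusted-setup secret \tau.
C = g^{p(\tau)}                                       % constant-size commitment
q(x) = \frac{p(x) - y}{x - z}, \qquad \pi = g^{q(\tau)}   % opening proof at z
e\left(C \cdot g^{-y},\, g\right) = e\left(\pi,\, g^{\tau} \cdot g^{-z}\right)   % verifier's pairing check
```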
Erasure encoding: reduces per-node storage requirements by splitting data across many nodes while ensuring the original data can be recovered if some of it is lost.
This involves increasing the total size of a piece of data (splitting it into blocks and adding redundant, erasure-encoded blocks) while decreasing each individual node's storage requirement.
The blocks are then distributed across many nodes. If the original data is needed, it can be recovered by piecing blocks back together from the network, as long as a defined tolerance threshold of blocks survives (see the toy sketch below).
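Here's a toy Python sketch of a polynomial-based erasure code, the same idea behind the Reed-Solomon codes these networks use (real systems operate over much larger fields with far more efficient algorithms): the k data chunks define a polynomial, extra shards are extra evaluations of it, and any k shards recover the data.

```python
# Toy 1D erasure code over a prime field: k data chunks become n > k shards;
# any k shards reconstruct the original data. Illustrative only; real DA
# layers use Reed-Solomon coding over large fields, often with KZG
# commitments proving the encoding was done correctly.
P = 2**31 - 1  # a small Mersenne prime; real systems use much larger fields


def _lagrange_eval(points, x):
    # Evaluate the unique degree < len(points) polynomial through `points` at x.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total


def encode(data, n):
    # Interpret the k data chunks as evaluations at x = 0..k-1, then extend
    # to n shards by evaluating the same polynomial at x = 0..n-1.
    points = list(enumerate(data))
    return [(x, _lagrange_eval(points, x)) for x in range(n)]


def decode(shards, k):
    # Any k surviving shards pin down the polynomial; re-evaluate at 0..k-1.
    assert len(shards) >= k, "below the recovery threshold"
    return [_lagrange_eval(shards[:k], x) for x in range(k)]


data = [42, 7, 99, 1]                                     # k = 4 chunks
shards = encode(data, 8)                                  # n = 8 (2x redundancy)
survivors = [shards[1], shards[3], shards[6], shards[7]]  # any 4 suffice
assert decode(survivors, 4) == data
```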
Data availability sampling: ensures data availability without requiring nodes to hold the entire dataset; complements erasure encoding to help guarantee data is available.
I.e., light clients randomly sample pieces of erasure-coded block data to gain assurance that the entire block is available in the network for reconstruction; nodes that fail to serve samples can be slashed.
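A small simulation shows why a handful of samples suffices. Assuming 2x erasure coding, an adversary must withhold over half of the extended block to prevent reconstruction, so each uniform sample catches the withholding with probability greater than 1/2:

```python
import random

# Toy DAS intuition: with 2x erasure coding, an adversary must withhold
# over half of the extended block to make it unrecoverable. Each uniformly
# random sample then hits a withheld chunk with probability > 1/2, so the
# chance that all s samples miss the withholding is below 2**-s.


def passes_sampling(n_chunks, withheld, samples, rng):
    # A light client requests `samples` random chunk indices; it accepts
    # the block as available only if every request can be served.
    return all(rng.randrange(n_chunks) not in withheld for _ in range(samples))


rng = random.Random(0)
n = 512                            # chunks in the erasure-extended block
withheld = set(range(n // 2 + 1))  # adversary withholds just over half
fooled = sum(passes_sampling(n, withheld, 20, rng) for _ in range(10_000))
print(f"client fooled in {fooled}/10000 trials")  # ~0; bound is 2**-20
```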
Data availability committee: a trusted set of nodes (or validators in a DA PoS network) that store full copies of data and publish onchain attestations proving the data is held and retrievable.
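As a rough sketch of the idea (hypothetical names and a stubbed signature check, not any production committee's API), acceptance reduces to a threshold count of valid attestations over the data commitment:

```python
import hashlib

# Hypothetical DAC check: accept a batch only if at least `THRESHOLD`
# committee members have attested to its data commitment. Signature
# verification is stubbed out here; real systems typically use BLS
# signatures so attestations aggregate into one cheap onchain check.

COMMITTEE = {"node-a", "node-b", "node-c", "node-d", "node-e"}
THRESHOLD = 3  # e.g., 3-of-5 must attest


def commitment(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def accept_batch(data: bytes, attestations: dict[str, str]) -> bool:
    c = commitment(data)
    signers = {m for m, signed in attestations.items()
               if m in COMMITTEE and signed == c}  # stub for sig verification
    return len(signers) >= THRESHOLD
```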
The following section outlines common L2s and how they approach DA.
According to Avail, there are a few different approaches that L2s take to DA. Note this is in the sense of block/transaction DA and differs from the “arbitrary” / large DA approach Textile focuses on with Basin and object storage:
Rollups: post proofs (validity or fraud) onchain along with state commitments.
Plasma: all data and computation, except for deposits, withdrawals, and Merkle roots, is kept offchain.
Optimiums: adaptations of Optimistic rollups that also take data availability offchain while using fraud proofs for verification.
I.e., they differ from traditional rollups in that transaction data lives entirely in offchain storage.
E.g., Optimism offers a “plasma mode” where data is uploaded to the DA storage layer via plain HTTP calls (see the sketch after this list).
Validiums: adaptations of ZK rollups that shift data availability offchain while continuing to use validity proofs.
E.g., Starknet posts a STARK validity proof and also sends a state diff, which represents the changes in the L2 state since the last validity proof was sent (updates/modifications made to the network's state).
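For a feel of what “plain HTTP calls” can look like, here is a hedged sketch of a client for a generic offchain DA storage server; the /put and /get routes, port, and commitment format are illustrative assumptions rather than Optimism's exact API:

```python
import requests

# Hedged sketch of a client for a generic DA storage server, in the spirit
# of Optimism's plasma mode. The routes, port, and hex commitment format
# are illustrative assumptions, not any project's exact API.

DA_SERVER = "http://localhost:3100"  # hypothetical endpoint


def store_batch(data: bytes) -> str:
    # Upload batch data; the server returns a commitment (e.g., a hash)
    # that the rollup posts onchain instead of the full data.
    resp = requests.post(f"{DA_SERVER}/put", data=data, timeout=10)
    resp.raise_for_status()
    return resp.text  # hex-encoded commitment


def fetch_batch(commitment: str) -> bytes:
    # Anyone verifying the chain resolves the onchain commitment back to
    # the full batch data via the DA layer.
    resp = requests.get(f"{DA_SERVER}/get/{commitment}", timeout=10)
    resp.raise_for_status()
    return resp.content
```

The common thread across all of these designs is that the artifact posted onchain stays small (a proof, a state commitment, or a data commitment) while the bulk data lives wherever the DA layer keeps it.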