Weeknotes: Decentralized range queries, new Showcase, web3.py + events/logs, & more

A weekly digest of progress, updates, and insights from Tableland.

Begin transmission…

Harnessing Range Queries for Decentralized Data

by Andrew Hill

In the realm of decentralized data, where information resides across multiple nodes, we're thinking about ways to make efficient retrieval and analysis possible. Optimized data formats, specifically designed for efficient range queries, are an important building block. Let's explore how these formats can empower decentralized data ecosystems with their unique capabilities, including trustless data access.

The Power of Range Queries

Range queries retrieve specific subsets of data matching defined criteria without loading the entire dataset, which is crucial in decentralized settings where bandwidth and storage are costly.

Formats Excelling at Range Queries

  • Parquet: A columnar format optimized for range queries thanks to its column-based storage (see the sketch after this list).

  • Avro: A row-based format with schema-based indexing for efficient range queries on specific fields.

  • Zarr: Designed for high-dimensional scientific data with dimensional and attribute-based indexing.

  • PMTiles: Optimized for efficient tile-based retrieval of raster data.

  • FlatGeoBuf: Efficiently queries geospatial data with encoded geometries.
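
For example, a columnar reader can pull down just the columns and row groups a query touches instead of the whole file. Here's a minimal sketch using pyarrow; the file name and column names are hypothetical placeholders, not a real dataset:

# Minimal sketch: the file and column names below are hypothetical
import pyarrow.parquet as pq

table = pq.read_table(
    "weather.parquet",
    columns=["station", "temp"],         # read only the columns you need
    filters=[("ts", ">=", 1700000000)],  # prune row groups via predicate pushdown
)
print(table.num_rows)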

Techniques for Remote Range Queries

  • HTTP Range Requests: Formats like Parquet and ORC are splittable, enabling efficient partial data retrieval (a minimal sketch follows this list).

  • Virtual File Systems: Tools like sql.js-httpvfs enable SQLite (see here) to perform range queries on remote data using SQL.
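
To make the first technique concrete, here's a minimal sketch of an HTTP range request in Python; the URL is just a placeholder for wherever the file happens to be hosted:

import requests

url = "https://example.com/data/weather.parquet"  # placeholder URL
# Ask the server for only the first 64 KiB of the file
resp = requests.get(url, headers={"Range": "bytes=0-65535"})
print(resp.status_code)   # 206 Partial Content when the server honors ranges
print(len(resp.content))  # only the requested bytes are transferred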

Trustless Data Access

To build a system where data can be exchanged in a verifiable way, you can imagine a setup where files are chunked into blocks that form the leaves of a Merkle tree (sketched below):

  • Verifiable Data Blocks: Each data block (e.g., in Parquet or ORC) is a leaf in a Merkle tree, with a unique hash. This structure allows for the generation of a root hash representing the entire dataset.

  • Efficient Inclusion Proofs: When accessing a data block, the client receives a Merkle proof. This proof, a sequence of hashes, verifies that the block is part of the overall dataset without downloading the entire thing.
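
This isn't a description of any particular implementation, but a conceptual sketch (in Python, with SHA-256 standing in for the hash function) shows how the root, an inclusion proof, and verification fit together:

# Conceptual sketch only: a SHA-256 Merkle tree over data blocks with inclusion proofs
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list, index: int) -> list:
    # Returns (sibling_hash, sibling_is_right) pairs from leaf to root
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index + 1 if index % 2 == 0 else index - 1
        proof.append((level[sibling], sibling > index))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    node = sha256(leaf)
    for sibling, sibling_is_right in proof:
        node = sha256(node + sibling) if sibling_is_right else sha256(sibling + node)
    return node == root

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]  # e.g., row groups in a file
root = merkle_root(blocks)
proof = merkle_proof(blocks, 2)
assert verify(b"block-2", proof, root)  # block 2 is part of the dataset

A client that holds only the root hash can verify any block it fetches against a logarithmic-size proof, which is what makes partial, trustless reads practical.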

Decentralized Data Collaboration

Efficient range queries and proofs facilitate:

  • Collaborative analysis across nodes: Enabling secure, verifiable data access.

  • Cost-effective exploration: Reduces bandwidth usage by transferring only relevant data with proof of integrity.

  • Enhanced trust and privacy: Data remains secure and verifiable on its native node.

As long as you, the user, trust your client, then any queries to remote resources that provide verifiable results can also be trusted (and used in your workflow). This is a great example of local compute over remote data.

Diverse Use Cases

  • Parquet and ORC: Ideal for analytics and data warehousing.

  • Avro: Versatile for data exchange and schema evolution.

  • Zarr: Best for scientific data with multidimensional arrays and time series.

  • PMTiles: Efficient for raster data in web mapping.

  • FlatGeoBuf: Suitable for GIS and mapping platforms.

Structured Decentralized Data

Optimized data formats, range queries, and a system of proofs are key in shaping decentralized data. They could enable efficient, secure, and verifiable data exploration, fostering a collaborative and democratized data landscape.

Example: WeatherXM & device data

WeatherXM is a decentralized network that is powered by community-operated devices optimized for weather data collection. The raw data is processed by the network and then pushed to Textile's data availability network for hot and cold storage retrieval, which makes it available for compute over data and downstream data pipelines. This design lets data consumers use the network’s data in an open, permissionless, and verifiable way.

We built a simple demonstration of how to consume WeatherXM network data to aggregate it and create map-based visualizations (e.g., precipitation by geography). The example uses Python for data analysis and runs in a simple GitHub Actions workflow on a cron schedule, but you could imagine how a more advanced data pipeline implementation could take advantage of similar logic.
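
To give a flavor of the analysis step, here's an illustrative pandas aggregation. The file path and column names are assumptions made for the example, not the actual WeatherXM schema:

# Illustrative only: the file path and column names are assumptions
import pandas as pd

df = pd.read_parquet("weatherxm_export.parquet")  # e.g., data pulled from the network
by_cell = (
    df.groupby("cell_id", as_index=False)["precipitation"]
    .sum()
    .sort_values("precipitation", ascending=False)
)
by_cell.to_csv("precipitation_by_cell.csv", index=False)  # feeds a map-based visualization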

Check out the source code and data outputs from the GitHub Actions workflow and queries: here

New docs Showcase, SQL table examples, & more

by Dan Buchholz

We released a number of new additions to the docs site, including:

  • A showcase component that displays various partners & a brief on their use case: here

  • A new "features" page that lists out some of the basics and links to relevant docs pages: here

  • New SQL walkthroughs for how to set up tables for gaming (state, leaderboard, inventory) and DataDAOs: here and here.

  • Dynamic cost estimator so you can approximate costs based on current token prices: here

  • New guide on how to chunk queries to under 35kb in the SDK to help with large query use cases: here

  • New Local Tableland docs that describe the available methods & usage: here

Also, if you're interested in being added to the project Showcase, be sure to reach out in our Discord or just create a docs issue on GitHub with your project's information! We'll continually add to the Showcase and also have plans to backfill it with past hackathon winners.

Digital Infrastructure Inc. (core developer of DIMO Network) Raised a Series A

by Marla Natoli

Our friends at Digital Infrastructure Inc. raised an $11.5M Series A round led by CoinFund, setting them up to accelerate the development of DIMO apps and hardware and continue to advance the DIMO Network. We're so excited about our ongoing work with DIMO in creating a decentralized vehicle database. Read more in CoinFund's blog post: here

Using web3.py to parse event logs

by Dan Buchholz

We're in the process of creating some demos that showcase how to use Textile Basin. One of these projects is a simple setup where GitHub Actions runs a Python script on a cron schedule, and the script parses events from a smart contract and continually updates a markdown page in the repo.

The first step is to import and set up an instance of the Web3 class. This step connects to the Filecoin Calibration testnet (the Basin contract is deployed here):

# This file is at: ./vaults/__main__.py
from json import loads
from pathlib import Path
from typing import Any, Dict, List

from web3 import Web3

url = "https://api.calibration.node.glif.io/rpc/v1"
w3 = Web3(Web3.HTTPProvider(url))

Then, we need to pass the contract's ABI and deployed address so we can parse the events & logs properly.

def get_create_events(
    start_block: int, end_block: int
) -> List[Dict[str, Any]]:
    # Assuming you're storing the abi file in the directory root
    abi_file = Path(__file__).parent / "abi.json"
    with open(abi_file, "r") as basin_abi:
        abi = loads(basin_abi.read())
    # The event we're filtering for
    new_vault_event = "PubCreated"
    # The address of the contract (strongly typed)
    address = Web3.to_checksum_address(
        "0xaB16d51Fa80EaeAF9668CE102a783237A045FC37"
    )
    contract = w3.eth.contract(address=address, abi=abi)
    # Filter events between a start and end block
    events = contract.events[new_vault_event].get_logs(
        fromBlock=start_block, toBlock=end_block
    )
    return events

Note that on most (testnet) chains, there isn't much of a limitation on how far back you can parse events, but the RPC provider for Filecoin Calibration only lets you go back 720 hours. Also note that web3.py imposes a 60480 block range limit, so for large request windows, you have to make multiple calls to the get_logs method.
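
A simple way to handle that is to walk the window in fixed-size chunks and concatenate the results, reusing the get_create_events function from above (the chunk size just mirrors the limit mentioned):

# Work around the block range limit by querying in chunks
MAX_RANGE = 60480  # block range limit noted above

def get_events_in_chunks(start_block: int, end_block: int) -> List[Dict[str, Any]]:
    events: List[Dict[str, Any]] = []
    from_block = start_block
    while from_block <= end_block:
        to_block = min(from_block + MAX_RANGE - 1, end_block)
        events.extend(get_create_events(from_block, to_block))
        from_block = to_block + 1
    return events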

Once you call the method, you'll have all of the event information and can parse the data however you'd like! In this example, PubCreated emits the keccak hash of the vault name (converted to bytes) and the owner's address.
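
For instance, each entry returned by get_logs is an AttributeDict whose decoded parameters live under args (the exact field names depend on the event's ABI), so a quick dump looks something like this; the block numbers are placeholders:

events = get_events_in_chunks(1_400_000, 1_500_000)  # placeholder block range
for event in events:
    print(event["blockNumber"], dict(event["args"]))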

Digging into interaction design

by Jim Kosem

A lot of interaction design is not just figuring out where a user is going to go, but how they know where they’re at. Much of the practice of designing the user experience is about this, about navigation and sense of where actions sit. This is slightly different when we’re designing for developers. Whilst developers are of course users as well, their tools are slightly different. For the developer experience, navigation is crucial, because the tool is more about management and maintenance and making sure things are where they need to be.

This past week has been a lot about exactly that: making sure things are where they need to be, and reconciling the parallel paths we've taken of developing, designing, and releasing all at the same time. This navigation and organisation design has been highly iterative for us. Unlike much of the development process, you can't really decide all of this until you put it all out there.


Other updates this week

  • We’ll be at ETHDenver and have a number of events in the works. Keep an eye out for updates, and if you’ll be there, hop into our Discord and let us know! We’d love to meet up IRL.

End transmission…

Want to dive deeper, ask questions, or just nerd out with us? Jump into our Telegram or Discord—including weekly research office hours or developer office hours. And if you’d like to discuss any of these topics in more detail, comment on the issue over in GitHub!

Are you enjoying Weeknotes? We’d love your feedback—if you fill out a quick survey, we’ll be sure to reach out directly with community initiatives in the future! Fill out the form here.
