Weeknotes: Basin vaults tracker, DePINs & data availability, & user design thoughts

Explore a python Cookiecutter project for tracking new Basin vaults, dig into DePIN compute challenges & data availability, and learn about our approach to user design.

Begin transmission…

Basin vault tracker + python Cookiecutter

by Dan Buchholz

In our last post, we walked through the basics of web3.py and parsing events. We used this in a demo project that makes it easy to find new vaults created on the network, and we'll also be publishing a deep-dive blog post on Mirror about how it works. You can check out the source code here.

The demo uses Cookiecutter, which is an extensible framework for creating python projects—sourcery-ai created a template that was perfect for the use case. It came prepackaged with:

There are also features for Docker images and GitHub Actions (tests, Docker publishing), but I removed these for the sake of simplicity. From an implementation standpoint, the project uses a custom daily GitHub Action that runs the python module. The logic processes new events seen at the Textile Basin smart contract, stores run information in a state.json file, and then uses that state to maintain an aggregate list of vaults that have been created on the network.
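As a rough sketch of that state-tracking step (the actual schema of state.json is an assumption here; field names like last_block and vaults are illustrative, not the project's real layout):

```python
import json
from pathlib import Path

# Hypothetical state file layout; the real project's schema may differ
STATE_FILE = Path("state.json")


def load_state():
    # Fall back to an empty state on the very first run
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"last_block": 0, "vaults": []}


def save_state(last_block, new_vaults):
    # Record where this run stopped and merge new vaults into the aggregate list
    state = load_state()
    state["last_block"] = last_block
    state["vaults"].extend(new_vaults)
    STATE_FILE.write_text(json.dumps(state, indent=2))
```

On each daily run, the module would load the state, query logs from last_block + 1 up to the latest block, and then save the updated state for the next run.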

Because Filecoin Calibration RPC providers all limit historical state access, the daily action is necessary. On top of that, providers also impose limits on the block range you can query. For example, Ankr does not let you exceed a 2880-block difference with web3.py's get_logs method, so you must implement some sort of chunking logic to take the full range, chop it up, and then execute multiple requests.

Let's walk through a basic example of how to achieve this. The first step is to set up a connection to the chain and define the events we want to parse from the logs (this logic lives in a file like ./my_module/fetch.py):

```python
from json import loads
from pathlib import Path

from web3 import Web3

url = "https://rpc.ankr.com/filecoin_testnet"
w3 = Web3(Web3.HTTPProvider(url))


def get_contract_create_events(start_block, end_block):
    # The ABI file is in the root directory
    abi_file = Path(__file__).parent.parent / "abi.json"
    with open(abi_file, "r") as basin_abi:
        abi = loads(basin_abi.read())
    # Creation events occur when `PubCreated` is emitted
    new_vault_event = "PubCreated"
    # Basin contract on Filecoin Calibration
    basin_address = Web3.to_checksum_address(
        "0xaB16d51Fa80EaeAF9668CE102a783237A045FC37"
    )
    # Create a Basin contract connection
    contract = w3.eth.contract(address=basin_address, abi=abi)
    # ... (continued below)
```

Then, we need to take this a step further and ensure that if the range from start_block to end_block exceeds 2880 blocks, chunking occurs.

```python
def get_contract_create_events(start_block, end_block):
    # ... (setup from above)
    chunks = chunk_block_range(start_block, end_block)
    events = []
    for chunk in chunks:
        new_events = contract.events[new_vault_event].get_logs(
            fromBlock=chunk["start_block"],
            toBlock=chunk["end_block"],
        )
        if new_events:
            events.append(new_events)
    return events
```

A simple example of how to chunk the queries into multiple calls could use this helper:

```python
def chunk_block_range(start_block, end_block):
    block_range = end_block - start_block
    if block_range > 2880:
        start_chunk = start_block
        end_final = end_block
        chunks = []
        # Use >= 0 so the final block in the range is not skipped
        while block_range >= 0:
            end_chunk = start_chunk + 2880  # Block range max is 2880
            if end_chunk > end_final:
                end_chunk = end_final
            chunks.append({
                "start_block": start_chunk,
                "end_block": end_chunk
            })
            start_chunk = end_chunk + 1
            block_range = end_final - start_chunk
        return chunks
    # A range within the limit needs no chunking
    return [{"start_block": start_block, "end_block": end_block}]
```
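Equivalently, the same idea can be written as a compact loop, which is handy for a quick sanity check (parameterizing max_range is an assumption here; Ankr's limit is 2880):

```python
def chunk_block_range(start_block, end_block, max_range=2880):
    # Split the inclusive [start_block, end_block] range into sub-ranges
    # whose block difference never exceeds `max_range`
    chunks = []
    start = start_block
    while start <= end_block:
        end = min(start + max_range, end_block)
        chunks.append({"start_block": start, "end_block": end})
        start = end + 1
    return chunks


# A 6000-block range needs three requests against a 2880-block limit:
# (0, 2880), (2881, 5761), (5762, 6000)
chunks = chunk_block_range(0, 6000)
```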

After executing the get_contract_create_events method, you then want to extract the data from the logs. The following logic shows how to pull information out of the event's args (the Basin contract emits an owner and pub) and how to grab the block number from the log.

```python
def get_data_from_events(contract_events):
    data = []
    for events in contract_events:
        for event in events:
            args = event["args"]
            owner = args["owner"]
            vault = args["pub"]
            block_num = event["blockNumber"]
            data.append(
                {
                    "owner": owner,
                    "vault_hash": vault.hex(),
                    "block_num": block_num,
                }
            )
    return data
```
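To make the shape of those logs concrete, here is a mocked log entry (the field names match web3.py's log dictionaries; the owner address, pub bytes, and block number are made up for illustration):

```python
# A mocked web3.py log entry shaped like a `PubCreated` event
mock_event = {
    "args": {
        "owner": "0x1234567890abcdef1234567890abcdef12345678",
        "pub": bytes.fromhex("deadbeef"),  # emitted onchain as raw bytes
    },
    "blockNumber": 1_234_567,
}

args = mock_event["args"]
record = {
    "owner": args["owner"],
    "vault_hash": args["pub"].hex(),  # hex-encode the raw bytes
    "block_num": mock_event["blockNumber"],
}
# record["vault_hash"] == "deadbeef"
```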

Our ensuing blog post will walk through this in a bit more detail, including the GitHub Actions setup, state tracking with files, and automating a data summary markdown file with all of these details.

Thinking about DePINs + data availability

by Dan Buchholz

The Textile team has been working closely with teams like WeatherXM, DIMO, and several other DePIN or compute networks. One of the primary challenges we're helping address is "data availability" (DA). For those new to the term, DA is the notion of storing data for short- or long-term periods so that it remains accessible, typically for compute over data or to provide persistence guarantees.

For example, say a compute network is running a data pipeline that processes DePIN device data. Each node has specific tasks it must execute. In a decentralized compute network's architecture, the output from each task might be some calculation or proof/hash, but the data itself is still important and might need to be accessible by subsequent nodes in the pipeline. In other words, there needs to be a shared data store for writing or retrieving data. Requiring a single node to store and persist all of this data isn't feasible, so DA-focused protocols help fill this gap.

Since each node needs to access the same state, decentralized storage options must guarantee the data's availability. There are a couple of ways developers can achieve this with what we're building.

The Tableland protocol is a web3-native SQL database. Smart contracts can write data through onchain SQL actions that then get materialized offchain in the Tableland network database. This is great for permissionless and serverless-like setups, but keep in mind that the size of data is still dependent on EVM chain limitations—like gas limits/costs, transaction throughput, speed, etc. In other words, you won't be writing 100s of MiB or GiB to an EVM chain's Tableland storage within a single transaction.

Thus, Textile Basin was (experimentally) developed to be more flexible and allow for significantly larger datasets. Traditional databases (like Postgres) or raw file uploads can be added to a "vault," a data container owned/signed by an EVM-compatible key pair. Both a hot/cache layer (for fast retrieval) and a cold layer (via Filecoin) ensure the data is made available for immediate computation (data pipeline) or eventual computation (perhaps, dispute resolution or rewards calculations). The size of the data can be significantly larger—e.g., a traditional database replication/streaming process is quite common.

Now, back to the DePIN decentralized compute theme. Both Tableland and Textile Basin can be used as a single source of truth for nodes to access shared state. Our protocols come with access controls that enable these shared state workflows, and the custom storage timelines (hot vs. cold) let the user/network of the Basin protocol determine how exactly they want the DA layer to look for them. Plus, there is consistency from a DX aspect because as data is written to the vault, it's retrievable by the same CID format (bafy…), regardless of the hot vs. cold retrieval request.

Design begins with feedback

by Jim Kosem

Much of product design is knowing that it won’t necessarily ever finish. Finish, that is, in the sense that the non-software world understands the word. When you ship software, as we’re due to do with the public release of Tableland Studio, whose user experience and user interface design I’ve been working on over the past couple of months, you’re releasing a thing that can and invariably will change.

I would argue that only when you release it and people begin using it in earnest and in anger does the real design begin. By that, I mean moving from speculating to problem-solving. Only when you start getting feedback, and people’s work and livelihoods depend on what you made, will you know what you should start designing. An unreleased bit of software is a sketch. It’s an idea. Designers don’t just come up with ideas, we create solutions, and until we know the real problems, from real-life use, the real design can’t begin. Only when you begin to make the hard decisions, which often involve taking out things you thought were really good ideas, do you start solving the problems, which is what design is all about.

Other updates this week

  • If you’ll be at ETHDenver, we’ll be hosting a Proof of Data summit that focuses on DePINs, compute networks, AI/ML, and reputation/identity. It hasn’t officially been announced, so we’ll share more details about the speakers and schedule in the coming weeks—but we figured we’d let our readers know first!

End transmission…

Want to dive deeper, ask questions, or just nerd out with us? Jump into our Telegram or Discord—including weekly research office hours or developer office hours. And if you’d like to discuss any of these topics in more detail, comment on the issue over in GitHub!

Are you enjoying Weeknotes? We’d love your feedback—if you fill out a quick survey, we’ll be sure to reach out directly with community initiatives in the future! Fill out the form here.

If you’re looking for deep dives on all things Tableland and Textile, you can also check out our Mirror blogs here.
