Weeknotes: November 27, 2023

A weekly digest of progress, updates, and insights from Tableland.

Begin transmission…

Polygon gas limit bug fix due to ethers.js

We just published new patch versions for the Tableland SDK (v5.1.1) and CLI (v5.4.1). There was an issue on Polygon mainnet and Mumbai that caused all transactions to fail due to a gas limit problem. For example, a simple single-cell table mutation would fail because the gas limit set by ethers was too low, causing the transaction to consume 99%+ of the limit and run into an "out of gas" error.

If you're curious, you can see how the logic change was implemented here. It's pretty straightforward: if the chain is Polygon, it'll add a 20% gas bump. Note that the ethers getFeeData() method doesn't affect the gas limit, only the estimated gas price or fee per gas. This is how the logic works:

  • ethers exports an Overrides type, which lets you set gas settings (and some others).

  • With the ethers estimateGas method, you can see what gas limit would be used by default.

  • Then, you can add a 20% bump to this number and use that as the actual transaction's gas limit, ensuring there's 10-20% extra room in the actual execution so that the transaction succeeds.

  • Below, there's an overrides variable set up beforehand as an instance of the Overrides type. The gas is then estimated for the contract method setController (with its params), bumped by 20%, and the actual method is executed with the overrides.

const overrides: Overrides = {}; // Set other fields, if needed
const gasLimit = await contract.estimateGas.setController(
  caller,
  tableId,
  controller,
  overrides
);
overrides.gasLimit = Math.floor(gasLimit.toNumber() * 1.2);
await contract.setController(caller, tableId, controller, overrides);

  • Note: this example is for ethers v5, which is what the SDK uses. For ethers v6, the order of estimateGas and the method name (e.g., setController) is switched; you'd call contract.setController.estimateGas(...) instead of contract.estimateGas.setController(...).

Experimenting with Basin + python + polars for WeatherXM data analysis

by Dan Buchholz

Tableland Basin lets developers use familiar web2 workflows (like Postgres or parquet file uploads) to bring their data into the web3 realm, replicating it to Filecoin for cold storage. (There's a lot that we're adding to Basin to expand the feature set and improve developer experience, too!) WeatherXM, in particular, uses a workflow where they sign & replicate large parquet files on a specific schedule; these files contain device data like temperature readings, humidity, wind, etc.

I put together a simple demonstration of how developers can use wxm data in this repo, which calculates and writes summary metrics to a CSV and markdown file. Here are some notes on what it does and how it was built:

  • The project is written in python and uses polars to analyze wxm data.

  • There's a script that runs Basin CLI commands in a subprocess to get information about the Basin signed data in the xm_data.p1 namespace/publication, including the deal CID.

  • Then, remote CAR files are requested via Web3.Storage (or Piñata) using the polars scan_parquet method.

    • Polars is pretty fast compared to normal pandas DataFrames, and it has a number of parquet-specific features—all Basin-replicated data is created as parquet files (useful in data-oriented use cases).

    • There are "lazy" and "streaming" features that significantly improve how fast a dataframe is built and how fast queries run.

  • Once the polars DataFrame is created, range queries (i.e., queries over a specific historical time range) are executed to average across all columns and compute a few aggregates (like total precipitation and the number of unique devices); see the sketch after this list.

  • After the analysis is complete, the script writes the summary metrics to files; a GitHub Action runs on a schedule to execute the analysis automatically.
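
For illustration, here's a minimal sketch of that kind of range query and aggregation with polars. The column names (device_id, timestamp, temperature, precipitation_accumulated) are assumptions for the example and won't necessarily match the actual wxm schema:

import datetime as dt
import polars as pl

# Toy data standing in for the wxm device readings (hypothetical columns)
df = pl.DataFrame({
    "device_id": ["a", "a", "b", "c"],
    "timestamp": [
        dt.datetime(2023, 11, 20),
        dt.datetime(2023, 11, 21),
        dt.datetime(2023, 11, 21),
        dt.datetime(2023, 11, 22),
    ],
    "temperature": [10.5, 11.0, 9.2, 8.7],
    "precipitation_accumulated": [0.0, 1.2, 0.4, 0.0],
})

# Range query: filter to a historical window, then aggregate
start, end = dt.datetime(2023, 11, 20), dt.datetime(2023, 11, 21)
summary = df.filter(pl.col("timestamp").is_between(start, end)).select(
    pl.col("temperature").mean().alias("avg_temperature"),
    pl.col("precipitation_accumulated").sum().alias("total_precipitation"),
    pl.col("device_id").n_unique().alias("unique_devices"),
)
print(summary)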

The demo is relatively complete for anyone to run on their own. A few cleanups are still needed, but it's fully functioning. For example, each file that wxm uploads is 200-250MB, and there are (currently) 5 total files. Since wxm will be continuously pushing more data, the script needs to account for this, or it will consume too much memory, particularly in the GitHub Action runner with its limited memory. Also, IPFS gateways can be pretty slow, so downloading the files locally and then cleaning them up after the dataframe is created might improve the setup time.
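
Here's a rough sketch of that download-then-clean-up idea; note this isn't how the demo currently works, the gateway URL below is a placeholder, and in practice remote_files would be built from the Basin deal CIDs:

import os
import tempfile
import urllib.request

import polars as pl

# Placeholder gateway URL(s); in practice these come from the Basin deals
remote_files = ["https://w3s.link/ipfs/<cid>/wxm_0.parquet"]

with tempfile.TemporaryDirectory() as tmp:
    local_paths = []
    for i, url in enumerate(remote_files):
        path = os.path.join(tmp, f"wxm_{i}.parquet")
        urllib.request.urlretrieve(url, path)  # download once to local disk
        local_paths.append(path)
    # Scan the local copies instead of hitting the gateway on every read
    df = pl.scan_parquet(local_paths).collect(streaming=True)
# The temporary directory and downloaded files are removed on exit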

Getting Basin deals

Basin is still early in development and doesn't have an HTTP API—yet (it's coming)! In order to dynamically fetch publication information and retrieve the data, some workarounds with subprocesses were used:

import subprocess

# Get publications for address 0xfc7C55c4A9e30A4e23f0e48bd5C1e4a865dA06C5 (the wxm publication signer)
command = ["basin", "publication", "list", "--address", address]
result = subprocess.run(command, capture_output=True, text=True)
out = result.stdout  # Capture the output from the CLI command

# Get all deals for `xm_data.p1` publication
command = ["basin", "publication", "deals", "--publication", pub, "--format", "json"]
result = subprocess.run(command, capture_output=True, text=True)
deals = result.stdout

Once the API is launched, it'll make this fetching and retrieval process significantly more streamlined.

Lazy DataFrames & streaming

One interesting learning was how using a polars LazyFrame and the streaming option impacted performance. Here's a quick snippet of the setup; remote_files is simply an array of IPFS gateway URLs that point to each parquet file under the xm_data.p1 publication:

import polars as pl

lazy = pl.scan_parquet(source=remote_files, cache=True, retries=3)
df = lazy.collect(streaming=True)

The scan_parquet method is what creates a LazyFrame, which allows for whole-query optimisation in addition to parallelism. Once this is created, the .collect() method will create a pandas-style DataFrame, and the streaming option runs parts of the query in a streaming fashion. Prior to this setup, I was using a simpler, non-streaming approach, but moving to streaming cut the time to create a DataFrame by over 50%. Quite an improvement! And for what it's worth, it was my first time using polars, so there are definitely more optimizations that could be implemented. For example, using read_parquet might make more sense with how the queries are implemented.
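
As a rough illustration of why the lazy setup helps, here's a self-contained sketch that writes a tiny local parquet file instead of pointing at IPFS gateways, with made-up column names. The filter and aggregation are folded into the query plan before anything is materialized, and collect(streaming=True) then executes that plan in a streaming fashion:

import datetime as dt
import polars as pl

# Write a tiny local parquet file to stand in for a wxm gateway file
pl.DataFrame({
    "timestamp": [dt.datetime(2023, 11, 20), dt.datetime(2023, 11, 21)],
    "temperature": [10.5, 9.2],
}).write_parquet("sample.parquet")

lazy = pl.scan_parquet("sample.parquet")  # builds a LazyFrame; nothing is read yet

# The filter can be pushed down into the parquet scan during optimisation
query = lazy.filter(pl.col("timestamp") >= dt.datetime(2023, 11, 21)).select(
    pl.col("temperature").mean().alias("avg_temperature")
)
print(query.explain())              # show the optimised query plan
df = query.collect(streaming=True)  # run the plan in streaming mode
print(df)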

User design & prototyping

by Jim Kosem

This past week, as we began designing and implementing findings from our most recent user research, we started talking about accessibility. As we move from prototypes to products, we also need to move conceptually from usable to accessible. Accessibility often means standards and guidelines (for instance WCAG 2.2, which we will likely be targeting), but in reality it can mean much more. In the past I've worked with quite a wide array of different user groups, some with significant access needs, for instance blind users.

One thing I've learned while trying to establish accessibility practice in many organisations is that it's not so much box ticking as an approach. Usability is accessibility. While there is a need to run automated tests and include the front-end libraries that do a lot of the heavy lifting for you, a lot of it is design: what you put where, and making it work for the largest number of people you can. Often this means simplifying content structures and interaction patterns. Many think this means making things boring, but in reality you're making things approachable and easier to understand, and thus something users won't hesitate to come back to again and again.


Other updates this week

  • A few of us are at an event in New York City all week, so if you’re interested in meeting up IRL, give us a shout in Discord!

End transmission…

Want to dive deeper, ask questions, or just nerd out with us? Jump into our Discord for weekly research office hours. Or, for hands-on technical support, you can join our weekly developer office hours.

And if you’d like to discuss any of these topics in more detail, comment on the issue over in GitHub!
