Begin transmission…
Replicating data from Farcaster to Textile
by Dan Buchholz
Background
Farcaster is a “sufficiently” decentralized social network modeled after Twitter/X but rooted in web3. Its monorepo comes packed with additional tooling, including a way to easily replicate data from the network to a database.
Namely, the Farcaster replicator app takes data from Hubs (e.g., “Hubble” is a Hub implementation) and stores it in a Postgres database. This allows developers to query Farcaster data and easily build applications on top of the network's data. You can check out the Farcaster docs for more information on the existing functionality.
For a detailed walkthrough, check out our blog post / tutorial on Mirror: here.
Textile + Farcaster replication
Textile’s `vaults` tooling lets developers take data from common web2 environments, such as a Postgres database, and write that data to decentralized storage with almost no additional effort! The `vaults` CLI (here) lets you create vaults—web3-native data containers—and stream data from a Postgres database to decentralized storage, retrievable by "event" CIDs which are formatted as Parquet files. Every event is written to both a hot cache layer with a TTL (time-to-live) and a cold storage layer. Thus, anyone can extract openly replicated data from a vault and load it into their applications instead of running the replicator app themselves.
Review the source code here, which is a fork of the Farcaster monorepo & replicator app: dtbuchholz/hub-monorepo
In our demonstration, the Textile replication process is handled using shell scripts that execute CLI commands and a custom Docker image. To use the `vaults` CLI tool, the `wal2json` Postgres extension must be installed and configured, and the `vaults` binary must also be installed. You can review dtbuchholz/textile-vaults-farcaster (here) for the published Docker image, which supports the linux/arm64 architecture and handles these setup steps. The following defines all of the tables that get replicated:
- casts
- chain_events
- fids
- fnames
- links
- messages
- reactions
- signers
- storage_allocations
- user_data
- username_proofs
- verifications
For each of these tables, a vault is created, and then the streaming process begins. The Farcaster replicator app runs a database migration on the Docker Postgres image noted above, so vault creation can only start once that migration has finished. The pre-vaults check is rather naive in that it simply polls the database until the tables exist:
```shell
# Function to check if a table exists in the database
table_exists() {
  local table_name="$1"
  exists=$(PGPASSWORD="${POSTGRES_PASSWORD}" psql -p 5432 -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
    -c "SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_schema = 'public' AND table_name = '${table_name}');" \
    -tA)
  if [ "$exists" = "t" ]; then
    return 0 # Table exists
  else
    return 1 # Table does not exist
  fi
}
```
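A check like that is typically wrapped in a retry loop. Here is a minimal sketch; the `wait_for` helper and its argument shape are my own illustration, not part of the replicator scripts:

```shell
# Hypothetical helper: run a check command until it succeeds,
# retrying up to $1 times with $2 seconds between attempts.
wait_for() {
  local retries="$1" delay="$2"
  shift 2
  local i
  for ((i = 0; i < retries; i++)); do
    if "$@"; then
      return 0 # Check passed
    fi
    sleep "$delay"
  done
  return 1 # Gave up after all retries
}

# e.g., block until the casts table exists, checking every 5 seconds:
# wait_for 60 5 table_exists "casts"
```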
Once the tables are confirmed to exist, the script creates and streams the table data to a vault. The variables defined below either come from a `.env` file or are part of looping logic in the script:
```shell
vaults create \
  --account "$(vaults account address "${private_key_file}")" \
  --cache 10080 \
  --dburi "postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}" \
  --window-size 1800 \
  "${TEXTILE_VAULT_NAMESPACE}.${table}"

vaults stream \
  --private-key "${TEXTILE_PRIVATE_KEY}" \
  "${TEXTILE_VAULT_NAMESPACE}.${table}"
```
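In the script, those two commands run once per replicated table. A sketch of that looping logic, where the `tables` array mirrors the list above and the exact loop shape is an assumption on my part:

```shell
# Assumed loop shape: build a namespaced vault name per table, then
# run `vaults create` and `vaults stream` for each one.
TEXTILE_VAULT_NAMESPACE="demo_farcaster_2"
tables=(casts chain_events fids fnames links messages reactions signers
  storage_allocations user_data username_proofs verifications)

# Fully qualified vault name, e.g. demo_farcaster_2.casts
vault_name() {
  echo "${TEXTILE_VAULT_NAMESPACE}.$1"
}

for table in "${tables[@]}"; do
  echo "vault: $(vault_name "$table")"
  # vaults create ... "$(vault_name "$table")"
  # vaults stream ... "$(vault_name "$table")"
done
```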
For context, here is some background on the flags noted above:

- `--account`: Sets the account address to use for the vault, which is the public key of the private key you set in the `.env` file (provided by the `vaults account address` command).
- `--cache`: Sets the TTL in minutes, so `10080` is equivalent to 7 days of cached data storage & retrieval.
- `--dburi`: Defines the Postgres database to stream data from.
- `--window-size`: Sets the window size in seconds, which is the period to batch changes before sending them to Textile—in this example, every 30 minutes.
This initiated a streaming process that ran for around 6 hours (before I exited the process on my machine). You can check out the resulting data by looking at the events created under the demo_farcaster_2 account. For example, running the command below with the `vaults` CLI will show you all of the events that came from the `user_data` table:
```shell
vaults events --vault demo_farcaster_2.user_data --format json
```
Which will log the event CIDs:
```json
[
  {
    "cid": "bafkreietgpbkfvdjyqlpwph75eajcp2pix3uw5dajjrfxvzvxv6gti7duy",
    "timestamp": 1707970383,
    "is_archived": false,
    "cache_expiry": "2024-02-22T04:13:08.355842"
  }
]
```
You can then retrieve these locally as Parquet files (or CAR files → Parquet, if coming from cold storage):
```shell
vaults retrieve bafkreietgpbkfvdjyqlpwph75eajcp2pix3uw5dajjrfxvzvxv6gti7duy
```
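If you want to script the retrieval end to end, you can pull the CID out of the events JSON and pipe it along. A rough sketch using only standard tools; the `first_cid` helper is hypothetical (jq would be a cleaner choice if it's available):

```shell
# Hypothetical helper: extract the first "cid" value from the JSON
# printed by `vaults events ... --format json`, using grep/sed only.
first_cid() {
  grep -o '"cid": *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)".*/\1/'
}

# Assumed pipeline (requires the vaults CLI):
#   vaults events --vault demo_farcaster_2.user_data --format json \
#     | first_cid | xargs vaults retrieve
```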
And then using something like DuckDB, you can read the original data from Postgres…which was written directly from the Farcaster Hub backfilling/replication process:
```sql
> duckdb
SELECT * FROM read_parquet(
  'bafkreietgpbkfvdjyqlpwph75eajcp2pix3uw5dajjrfxvzvxv6gti7duy-1707948730199375716.db.parquet'
)
LIMIT 1;
```
Logging the first user data posted on the Farcaster network—an imgur link of the Farcaster logo:
```json
[
  {
    "id": "018da9ad-5b90-8ec2-15e7-b8df0b2a80db",
    "created_at": "2024-02-14 14:12:07.126457-08",
    "updated_at": "2024-02-14 14:12:07.126457-08",
    "timestamp": "2023-03-15 13:09:51-07",
    "deleted_at": null,
    "fid": 1,
    "type": 1,
    "hash": "481d5e80de1b2803e89c6ad96ab349979a689358",
    "value": "https://i.imgur.com/I2rEbPF.png"
  }
]
```
DePIN Corner: NATIX
by Marla Natoli
NATIX is creating a crowd-sourced camera network to serve countless geospatial data needs. In response to the challenge of preserving privacy when collecting video footage, NATIX points to its AI software, which collects metadata and claims to adhere to the highest grade of privacy compliance, enabling the sharing of highly valuable real-time data without risking exposure of sensitive subjects in the video footage itself.
There are several potential use cases for this data, and NATIX breaks down the types of data serving different use cases: (1) permanent and transient static data such as signage, landmarks & infrastructure (often provided by companies like Mapbox), and (2) dynamic data such as traffic conditions, available parking spots, etc., which is much more difficult to obtain. They view themselves as well-positioned to transform the way this data is created and used, incentivizing a large network of participants to easily share it.
Of course, to maximize the value of this data, it must be accurate and instantly available for data consumers. Their protocol uses computational proofs delivered alongside the geospatial data to support this.
At Textile, we’re building tools to support the vision of DePINs like NATIX, whether that means near real-time data availability for computational proofs, archiving that data for longer-term use cases like dispute resolution, or making it easy for networks to share and monetize their data across ecosystems. We envision a future where all the data created by DePINs is verified, easily used and monetized by data consumers, and where the value created by that data is transparently and programmatically distributed to its contributors.
If you’re a DePIN that’s currently thinking about how to make the most of the data your network creates, we’d love to hear from you to better understand your needs.
Set up some time here.
Writing scales, meetings don’t
by Jim Kosem
There are many ways to work, and even more ways to build software. Sometimes this is done with people all in the same place, and sometimes, as at Textile, it’s with people physically nowhere near each other. Working this way means that not just communication and consensus building need to be different; how we think through things together needs to be rethought as well.
There is an old axiom that writing is thinking. This is definitely the case when you’re not in the same place as someone and need another way of thinking out in the open. However, when you are in the same place, to think things out, you have a meeting. You can throw as much software as you want at this, but this is just how it is: people would rather have meetings than write things out. This is because writing is hard. It’s hard to construct a sentence that makes sense to someone reading it tomorrow or next month. It’s hard to put together a paragraph that says what you’ve done and why it matters. But there is one distinct advantage to putting your thinking in writing, whether alone or with others: writing scales, and conversations don’t. A well-written sentence will do the job, and keep people doing the right thing, better than a dozen meetings most of the time.
Other updates this week
ETHDenver is almost here! We’re hosting an event that’s all about data, DePINs, compute, AO, and more. Sign up if you’re interested; we’re also open to 1:1 meetings: https://lu.ma/proofofdata
We’re participating in the Data Economy hackathon (here) as well as the upcoming Backdrop hackathon (here). Be sure to apply! Our focus is on both Tableland and Textile usage for data-oriented use cases.
End transmission…
Want to dive deeper, ask questions, or just nerd out with us? Jump into our Telegram or Discord—including weekly research office hours or developer office hours. And if you’d like to discuss any of these topics in more detail, comment on the issue over in GitHub!
Are you enjoying Weeknotes? We’d love your feedback! If you fill out a quick survey, we’ll be sure to reach out directly with community initiatives in the future: Fill out the form here