Building a simple decentralized version control system with IPFS + Textile

tutorials • Dec 19, 2018

Update: some of the methods below are out of date. For a good overview of the latest and greatest methods, be sure to read the Tour of Textile.

Using Textile Threads & Schemas to make (document) history

When we started building Textile’s Threads and Schema APIs, we had photo backup and sharing in mind. Threads allow private groups to post photos and interact over a decentralized network, maintaining complete control over their own content. But of course, they can facilitate secure sharing, coordination, and storage of many types of data over a decentralized network. So along the way, we started to think about files more generally, and how Textile could provide easier/better access to structured files on IPFS. What we ended up developing is a highly flexible, offline-first, secure, decentralized ‘database’, useful for a range of decentralized data products. To really drive home the flexibility and utility of the frameworks, we’ve started playing around with different use-cases and applications as demos. In today’s demo, we’ll use a Textile Thread and a simple JSON Schema to build a simple document version/history control system.

Our goal is to create a simple history of document updates that can be tracked, rewound, and potentially even forked and merged if we wanted. Think of tracking changes to a blog post, or a collaborative document editing environment (see Next Steps below), or even a micro-blog social media feed. While we’re at it, we’ll make sure our document history works between a set of decentralized/distributed peers. We’ll focus on JSON documents to start, and we’ll demonstrate testing with multiple peers from the command line. Basically, we’re making a decentralized collaborative document feed! The structure of our tutorial will be:

Setup (creating peers to play with)
Define our data structure
Create our (shared) thread
Invite some (well, a) peer(s)
Stream some data via direct p2p connection
Bask in the glory of a successful test

Setup

First, let’s create two peers to test with…

textile wallet init
textile wallet accounts "mnemonic word list ..." -d 2
textile init -s <account-0-seed>
textile init -s <account-1-seed> -r /Users/carson/.tex2/repo

We’ll then edit our second peer’s config file to specify different ports to use for the APIs (I’m using VSCode, you can use whatever text editor you like, or use our new config tool):

code ~/.tex2/repo/textile

For this example, I just incremented the port numbers slightly. This will allow us to refer to the different peers (which I’ll call Peer 1 and Peer 2) by their API urls… more on this in a bit.

    "Addresses": {
        "API": "127.0.0.1:40602",
        "CafeAPI": "127.0.0.1:40602",
        "Gateway": "127.0.0.1:5052"
    }
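If you would rather script that edit than make it by hand, the port bump can be sketched in Python. This is only an illustration, not part of the Textile tooling: `bump_ports` is a hypothetical helper, the starting values are made up, and it assumes every entry under "Addresses" is a plain host:port string like the ones shown above.

```python
def bump_ports(addresses, offset):
    """Return a copy of an "Addresses" mapping with every port shifted by offset."""
    bumped = {}
    for name, addr in addresses.items():
        # Split on the last colon so the host part is left untouched
        host, port = addr.rsplit(":", 1)
        bumped[name] = "{}:{}".format(host, int(port) + offset)
    return bumped


# Hypothetical starting values, used only for illustration
example = {"API": "127.0.0.1:40600", "Gateway": "127.0.0.1:5050"}
print(bump_ports(example, 2))
```

Whatever offsets you choose, each address in the file should end up on its own port so the services do not collide.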

With those changes in place, we can start Peer 1:

textile daemon

And in a separate terminal, start Peer 2. Notice that I’m specifying a specific repo directory here.

textile daemon --repo-dir=/Users/carson/.tex2/repo/

JSON Schema

Now, let’s create a Thread to use for testing things out. Since we’re going to be sending document updates across the line in an incremental fashion, we’ll create a new Thread with a Schema that supports JSON data. This is pretty simple to set up, requiring only that we use the /json mill, and a few other minor keys for controlling pinning and encryption (we’ll keep things in plaintext for now). For some details on Threads in general, jump on over here. And for a very brief introduction to Schemas, take a look at this previous demo/example before moving on.

{
    "pin": true,
    "plaintext": true,
    "mill": "/json",
    "json_schema": ...
}

Notice the (currently empty) json_schema entry. This is where/how we define the actual expected JSON structure for our Thread data. It is based on the JSON Schema vocabulary for annotating and validating JSON documents. So, our Textile Thread Schema will actually contain the JSON schema that we can use to validate the input JSON data before adding it to a Thread. This is really nice, because it ensures that Textile’s Thread data stays structured, making it easier for application developers to trust that the data added to a Thread conforms to their expected data model.

JSON Patches

Again, since we’re going to be sending document updates, we’ll want a JSON Schema that supports modifying JSON documents. The JSON Patch specification, as outlined in RFC 6902, is one such specification:

JSON Patch defines a JSON document structure for expressing a
sequence of operations to apply to a JavaScript Object Notation
(JSON) document…

That’s pretty much perfect for our use-case, though calculating & applying RFC 7396 JSON merge patches could also be really useful…
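To make the shape of a JSON Patch document concrete, here is a rough, standard-library-only sketch of the structural rules RFC 6902 sets out: a patch is a list of operation objects, each with an "op" and a "path", plus a "value" or "from" member depending on the operation. The `is_json_patch` helper is hypothetical and far looser than a real JSON Schema validator; it is only meant to show what a well-formed patch looks like.

```python
# Operations defined by RFC 6902 and the extra member each one requires
REQUIRED_MEMBER = {
    "add": "value",
    "replace": "value",
    "test": "value",
    "move": "from",
    "copy": "from",
    "remove": None,
}


def is_json_patch(doc):
    """Return True if doc has the basic shape of an RFC 6902 JSON Patch."""
    if not isinstance(doc, list):
        return False
    for op in doc:
        if not isinstance(op, dict):
            return False
        name = op.get("op")
        if not isinstance(name, str) or name not in REQUIRED_MEMBER:
            return False
        if not isinstance(op.get("path"), str):
            return False
        extra = REQUIRED_MEMBER[name]
        if extra is not None and extra not in op:
            return False
    return True


print(is_json_patch([{"op": "add", "path": "/bio", "value": "Carson is a human"}]))  # True
print(is_json_patch([{"op": "add", "path": "/bio"}]))  # False: "add" needs a "value"
```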

In any case, we’ll need a JSON Schema representation of our JSON Patch spec, which could be a bit of a pain to create… except someone has already done this for us! You can grab it from the JSON schema ‘store’. To make this whole process nice and reproducible, here’s how (in another Terminal) we can grab the schema, and create a Textile Thread schema using the jq library we used in a previous post:

wget http://json.schemastore.org/json-patch -O json-patch.json
jq '{"pin": true, "plaintext": true, "mill": "/json", "json_schema": .}' json-patch.json > patch-schema.json
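If you don’t have jq handy, the same wrapping step can be sketched in Python. The keys are the ones from the Thread schema shown earlier; `make_thread_schema` is a hypothetical helper name, and the placeholder schema below is only a stand-in for the contents of json-patch.json.

```python
import json


def make_thread_schema(json_schema):
    """Wrap a JSON Schema in the Thread schema keys used in this post."""
    return {
        "pin": True,
        "plaintext": True,
        "mill": "/json",
        "json_schema": json_schema,
    }


# A stand-in schema for illustration; the real one comes from json-patch.json
placeholder = {"type": "array"}
print(json.dumps(make_thread_schema(placeholder), indent=4))
```

In practice you would json.load the downloaded file, pass the result to the helper, and json.dump the output to patch-schema.json.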

Create a Thread

With that patch-schema.json file ready to go, create a Thread using the above schema. We’ll keep it --open and call it json-patch:

textile threads add --open --schema=patch-schema.json json-patch

Next, we’ll find our other peer’s id, and invite them to the Thread from Peer 1:

textile peer --api=http://127.0.0.1:40602 # prints peer-id of Peer 2
textile invites create -t <thread-id> -p <peer-id>

We can then use Peer 2 to accept the invite (you can use the same terminal, because we’re specifying the API host directly):

textile invites accept --api=http://127.0.0.1:40602 <invite-id>

The output from your two running daemons should now look something like this (Peer 1):

Textile daemon version v1.0.0
Repo:    /Users/carsonfarmer/.textile/repo
API:     127.0.0.1:40600
Gateway: 127.0.0.1:5050
PeerID:  <peer-one-id>
Account: <account-0-id>
12 Dec 18 18:26 UTC  <peer-one> added JOIN update to thread <blah>
12 Dec 18 18:41 UTC  <peer-two> added JOIN update to thread <blah>
12 Dec 18 18:41 UTC  <peer-two> joined <blah>

and this (Peer 2):

Textile daemon version v1.0.0
Repo: /Users/carsonfarmer/.textile2/repo/
API: 127.0.0.1:40602
Gateway: 127.0.0.1:5052
PeerID: <peer-two-id>
Account: <account-1-id>
12 Dec 18 18:40 UTC <peer-one> invited you to join <blah>
12 Dec 18 18:26 UTC <peer-one> added JOIN update to thread <blah>
12 Dec 18 18:41 UTC <peer-two> added JOIN update to thread <blah>

Subscribing to updates

Now we’ll want to set things up so that Peer 2 can listen for updates to the Thread, and take action accordingly (alternatively Peer 2 could simply periodically check for updates: textile ls -t <thread-id>). In the meantime, Peer 1 will be responsible for pushing document updates to the Thread as JSON Patches. I’ve written a super simple Python script that will allow Peer 2 to subscribe to updates, and apply said updates to a default (empty) JSON doc:

import json
import requests
from jsonpatch import apply_patch

api_base = "http://127.0.0.1:40602/api/v0/"
sub_path = "sub/<thread-id>"
doc = {}

# First, hit the 'streaming' subscription API
with requests.get(api_base + sub_path, stream=True) as lines:
    # For each response from the sub streaming API
    for line in lines.iter_lines():
        # If there is some data
        if line:
            # Parse and walk the JSON structure
            update = json.loads(line)
            if update["block"]["type"] == "FILES":
                for f in update["info"]["files"]:
                    h = f["file"]["hash"]
                    # Use the ipfs API to grab the actual data
                    r = requests.get(api_base + "ipfs/" + h)
                    # Apply the patch to the doc...
                    apply_patch(doc, r.json(), in_place=True)
                    print(doc)

With that running, Peer 2 will end up with a series of JSON Patches, which they automatically apply to their local ‘working copy’ of a JSON doc. Here’s the breakdown:

The first few lines are imports (pip install jsonpatch requests first), and setting up the API paths
We are using the sub and ipfs endpoints from our Peer 2 daemon's API
For each response from the sub streaming API, grab the added JSON Patch data and apply it to our local JSON doc, and print out the result
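The parsing step in that loop can be tried without a running daemon. The sketch below pulls the file hashes out of a single line of subscription output, using only the fields the script above reads (block.type, info.files[].file.hash). The `extract_hashes` helper and the sample update are made up for illustration; real updates will carry more fields.

```python
import json


def extract_hashes(line):
    """Return the file hashes carried by one line of subscription output."""
    update = json.loads(line)
    # Only FILES updates carry patch data; JOIN and friends are skipped
    if update.get("block", {}).get("type") != "FILES":
        return []
    files = update.get("info", {}).get("files", [])
    return [f["file"]["hash"] for f in files]


# Made-up update, shaped like the fields the script above reads
sample = json.dumps({
    "block": {"type": "FILES"},
    "info": {"files": [{"file": {"hash": "Qm-example-hash"}}]},
})
print(extract_hashes(sample))                                   # ['Qm-example-hash']
print(extract_hashes(json.dumps({"block": {"type": "JOIN"}})))  # []
```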

Since the Thread is itself a record of the updates, you could also use the textile ls API to list a range of updates and compute the patched version with those. This would allow you to (re)create the doc at any point in its history! The offset and limit parameters let you control where you start listing from, and how many updates to list per response.

textile ls --offset=<update-id> --limit=100 -t <thread-id>

And here’s a simple Python function that lets you do this (I didn’t include the limit in here, but you could add it):

def recreate_doc(thread_id, offset=None):
    # Reuses requests, apply_patch, and api_base from the script above
    headers = {
        "X-Textile-Opts": "thread={}".format(thread_id)
    }
    if offset is not None:
        headers["X-Textile-Opts"] += ",offset={}".format(offset)
    b = requests.get(api_base + "files", headers=headers)
    doc = {}
    # Walk each update, then each file attached to that update
    for update in b.json():
        for f in update["files"]:
            h = f["file"]["hash"]
            # Use the ipfs API to grab the actual patch data
            r = requests.get(api_base + "ipfs/" + h)
            patch = r.json()
            apply_patch(doc, patch, in_place=True)
    return doc
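If you do want the limit as well, the header value can be assembled by a small helper. This is a sketch under one assumption: that the API accepts a `limit` option in X-Textile-Opts the same way it accepts `thread` and `offset`, mirroring the CLI’s --limit flag. The `textile_opts` name is hypothetical, so check the API docs before relying on it.

```python
def textile_opts(thread_id, offset=None, limit=None):
    """Build the X-Textile-Opts header as a comma-separated option string."""
    opts = ["thread={}".format(thread_id)]
    if offset is not None:
        opts.append("offset={}".format(offset))
    if limit is not None:
        # Assumed to mirror the CLI's --limit flag
        opts.append("limit={}".format(limit))
    return {"X-Textile-Opts": ",".join(opts)}


print(textile_opts("<thread-id>", offset="<update-id>", limit=100))
```

The returned dict can be passed straight to requests.get as the headers argument.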

Alternatively, since you have jsonpatch installed, you could do something like this from the command line:

textile sub -t <thread-id> | jq -r --unbuffered '.info.files[].file.hash' | xargs -L 1 sh -c 'textile cat $0 | jsonpatch doc.json > tmp.json && cp tmp.json doc.json && cat doc.json'

Which pipes all the JSON Patch updates from the subscription API through jq, xargs, and even jsonpatch to keep an up-to-date JSON doc (doc.json) and then cat (see, I told you we’d have cats 😉) it to the console!

Send the updates!

Ok, now it’s finally time to test things out. The easiest thing to do is to send some nicely formatted JSON Patch docs to the Thread from Peer 1:

echo '[{"op":"add", "path":"/name", "value":{"first":"Carson"}}]' > patch.json
textile add patch.json -t <thread-id>

Which will output the updated JSON doc on Peer 2's Python subscription script. We can then send some more patches, and see the doc build up before our eyes:

echo '[{"op":"add", "path":"/name", "value":{}}]' > patch.json
textile add patch.json -t <thread-id>

echo '[{"op":"add", "path":"/name/first", "value":"Carson"}]' > patch.json
textile add patch.json -t <thread-id>

echo '[{"op":"add", "path":"/name/last", "value":"Farmer"}]' > patch.json
textile add patch.json -t <thread-id>

echo '[{"op":"add", "path":"/bio", "value":"Carson is a human"}]' > patch.json
textile add patch.json -t <thread-id>

echo '[{"op":"add", "path":"/website", "value":"https://carsonfarmer.com"}]' > patch.json
textile add patch.json -t <thread-id>

And we can even edit/modify past entries!

echo '[{"op":"replace", "path":"/bio", "value":"Carson is probably a human"}]' > patch.json
textile add patch.json -t <thread-id>

You should see output similar to this over on Peer 2’s Python console:

[Figure: two terminals side by side; the script on the right is the first Python script above.]
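To see how the document builds up without running any daemons, here is a standard-library-only sketch that replays the sequence of patches starting from the empty /name object. The `apply_ops` function is a deliberately tiny stand-in for jsonpatch’s apply_patch: it handles only add, replace, and remove on object members with non-empty paths (no arrays, no move/copy/test). Keeping every intermediate state also shows how the history can be rewound.

```python
import copy


def apply_ops(doc, patch):
    """Apply add/replace/remove operations to nested dicts and return a new doc."""
    doc = copy.deepcopy(doc)
    for op in patch:
        # Split "/name/first" into ["name", "first"], undoing JSON Pointer escapes
        keys = [k.replace("~1", "/").replace("~0", "~")
                for k in op["path"].split("/")[1:]]
        parent = doc
        for key in keys[:-1]:
            parent = parent[key]  # KeyError if a parent object is missing
        last = keys[-1]
        if op["op"] == "add":
            parent[last] = op["value"]
        elif op["op"] == "replace":
            if last not in parent:
                raise KeyError(last)
            parent[last] = op["value"]
        elif op["op"] == "remove":
            del parent[last]
        else:
            raise ValueError("unsupported op: " + op["op"])
    return doc


patches = [
    [{"op": "add", "path": "/name", "value": {}}],
    [{"op": "add", "path": "/name/first", "value": "Carson"}],
    [{"op": "add", "path": "/name/last", "value": "Farmer"}],
    [{"op": "add", "path": "/bio", "value": "Carson is a human"}],
    [{"op": "add", "path": "/website", "value": "https://carsonfarmer.com"}],
    [{"op": "replace", "path": "/bio", "value": "Carson is probably a human"}],
]

doc = {}
states = []  # one snapshot per update, so any point in history can be revisited
for patch in patches:
    doc = apply_ops(doc, patch)
    states.append(doc)

print(states[-1])        # the latest version of the doc
print(states[3]["bio"])  # Carson is a human
```

Note that an add to /name/first only works once /name exists, which is why the empty object is added first.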

You could also create the diffs directly from two JSON docs, and do it that way (again using a tool installed with jsonpatch):

jsondiff original.json modified.json > patch.json
textile add patch.json -t <thread-id>
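For a feel of what such a diff tool does, here is a simplified, standard-library-only sketch that compares two JSON objects and emits add, replace, and remove operations. It is not how jsondiff is implemented: `make_patch` and `escape` are hypothetical helpers, lists and scalars are compared as whole values, and no attempt is made to produce a minimal patch.

```python
def escape(key):
    """Escape a key for use in a JSON Pointer (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def make_patch(old, new, path=""):
    """Return add/replace/remove operations that turn dict old into dict new."""
    ops = []
    for key in old:
        if key not in new:
            ops.append({"op": "remove", "path": path + "/" + escape(key)})
    for key, value in new.items():
        child = path + "/" + escape(key)
        if key not in old:
            ops.append({"op": "add", "path": child, "value": value})
        elif isinstance(old[key], dict) and isinstance(value, dict):
            # Recurse so that only the changed members are reported
            ops.extend(make_patch(old[key], value, child))
        elif old[key] != value:
            ops.append({"op": "replace", "path": child, "value": value})
    return ops


original = {"bio": "Carson is a human"}
modified = {"bio": "Carson is probably a human"}
print(make_patch(original, modified))
```

The resulting list has the same shape as the hand-written patch.json files above, so it could be dumped to a file and added to the Thread in the same way.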

This is all really just for fun, and shows you some cool examples to get you thinking about Textile and the types of workflows and data pipelines that you can build with Threads, Schemas, and all the tooling that Textile provides. But there is a lot more to explore, build, and design…

Next steps

A simple personal linear version control system is all well and good, but it’s not going to change how you version and interact with documents. This post doesn’t address things like outdated JSON Patches or similar potential issues, but these could be resolved with a set of JSON Patch Operational Transformations, or a proper CRDT implementation (which we’re working on!). This post also doesn’t tackle collaboration between peers. Threads were designed with this type of application (shared state between multiple group members) in mind, so it would be an obvious extension. In fact, this is just what we’ll do in a future post!

During development of our latest textile-go lib, we found that a lot of the concepts that come up when talking about collaborative editing, operational transformations, patching, queuing, etc., applied to a wide range of decentralized interactions. So we’ve actually started adding many of the above ideas and solutions to textile-go already! While they are only roughly laid out in this post, soon we’ll have full-blown OT and CRDT capabilities built directly into Textile. With these types of tools in place, the sky’s the limit on what you can start building on top of Textile…

Speaking of which, we hope you enjoyed this quick n’ dirty demo/tutorial. We’re going to continue to publish more of these as we continue to roll out our new APIs and tools, so do let us know what you think! You can reach out over Twitter or Slack, or pull us aside the next time you see us at a conference or event. We’re happy to provide background, thoughts, and details on where we’re taking this work. In the meantime, don’t forget to check out our GitHub repos for code and PRs that showcase our current and old implementations. We try to make sure all our development happens out in the open, so you can see things as they develop. Additionally, don’t miss out on signing up for our waitlist, where you can get early access to Textile Photos, the beautiful mobile interface to Textile’s cross-platform tooling.