An introductory look at the cryptography of identity in plain Python
One of the key features of the distributed web is being able to access data by its content, rather than its location — the idea of content vs location addressing. This allows users to access data efficiently from any peer that has the data they want. But this idea doesn’t stop with content. A peer’s location can also change over time (native roaming) and even hop between multiple devices (runtime freedom). This would be impossible with traditional location-based addresses (like IP addresses), where you would reference peers by their network location.
How does this location-agnostic network work, exactly? When your IPFS peer node communicates with other peers on the IPFS network, a few (ok, a lot, but we’ll ignore most of them for now) things happen behind the scenes. One of the most important is that peers identify each other via their peer ID. This ID gives each peer a unique identity on the distributed web, so that peers know they are communicating with the right peer.
So where does this unique IPFS peer ID come from? And for that matter, how is it used? Well, today we’re going to answer these very questions! We’ll do this by pulling apart the cryptographic functions IPFS uses under the hood to generate it. This is a pretty technical post, and it assumes some proficiency with Python, so if you have any questions, don’t hesitate to leave a comment!
Getting to know your peers
When you initialize a new peer, IPFS uses a public-key (or asymmetric) cryptographic system to generate a pair of keys: a public key, which can be shared, and a private key, which must be kept secret. With this keypair, an IPFS peer node can perform authentication, where the public key verifies that a message really was signed by the peer holding the paired private key, and encryption, where only the peer holding the private key can decrypt a message encrypted with the corresponding public key. By default, IPFS uses the widely used RSA cryptosystem to generate keypairs:
$ ipfs init
initializing IPFS node at ~/.ipfs/
generating 2048-bit RSA keypair...done
peer identity: Qm...
With the public key in hand, it is now possible to generate a cryptographic ID: your peer identity. We’ll go over the details of this process in a moment. In the meantime, know that a peer’s ID is essentially a cryptographic hash of its public key. This ID enables peers to find each other and to authenticate each other once they are connected. This also means connections between peers are encrypted and authenticated by default.
This works because when two peers connect to each other, they exchange public keys. In fact, any time two peers connect on IPFS, there are multiple checks to make sure that the peer IDs match the public keys being exchanged. Assuming all the checks go off without a hitch, communications between the peers are then encrypted, with the keys they just exchanged used to authenticate the session. Here’s a great video explaining how public key encryption works. I also highly recommend you check out the libp2p website and this great Twitter thread to learn a bit more about some of these ideas.
Creating a cryptographic identity
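In the case of IPFS, the cryptographic peer ID is simply the SHA-256 multihash of a peer’s public key. But before we actually hash the public key, we need to encode it in a well-defined way. This is done by placing a serialized representation of the public key (in RSA DER format), tagged with its key type, inside a Protocol Buffer (Google’s data interchange format), and then serializing that Protocol Buffer to bytes. Those bytes are what get hashed; when IPFS shows you the key (for example, in the output of ipfs id), it displays those same bytes as a base64-encoded string.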
Whoa, that’s a mouthful! If you’re like me, you’d much rather ‘see’ the process in action than read about it, so let’s go through this whole process in code. We’re going to play around with Python in this example, because most of the required libraries are readily available and relatively easy to use. For the most part, you should be able to follow along by copying and pasting the commands into a simple Python prompt. Essentially, all we’re going to do is replicate that first step of initializing a new IPFS peer. So let’s get started…
We’re only going to use a few Python packages to do this, so let’s just grab them all up front. They can be installed with pip (base64 is part of Python’s standard library, so it doesn’t need to be installed):
$ pip install base58 cryptography pyrobuf
Next, we’ll set up the main imports. This is really just boilerplate to make sure we have all the functions and classes we need as we move along; you can copy and paste it for now, as we’ll go over each function once we start using it.
import base58
import base64

from pyrobuf_util import to_varint
from cryptography.hazmat.primitives.asymmetric.rsa import (
    generate_private_key,
    RSAPublicKeyWithSerialization,
    RSAPrivateKeyWithSerialization,
)
from cryptography.hazmat.primitives.hashes import SHA256, Hash
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.serialization import (
    Encoding,
    PublicFormat,
    load_der_private_key,
    load_der_public_key,
)
Ok, with all of that boilerplate out of the way, it’s time to get started. But first, we’ll need to download the crypto.proto definition file from the libp2p repository and compile it into a Python module that lets us encode and parse the protocol buffer data. Luckily for us, there’s a really nice Python package called pyrobuf that automates this whole process. Run the following from the command line/terminal:
$ wget https://raw.githubusercontent.com/libp2p/go-libp2p-crypto/master/pb/crypto.proto
$ pyrobuf --install crypto.proto
Phew, that was (relatively) easy! We now have a Python library we can import, with our protobuf definitions available as Python classes. Now we can jump back to our script (or Python REPL) and add the additional import:
from crypto_proto import PrivateKey, PublicKey, RSA
Now for some cryptography! We’ll need to generate a new 2048-bit RSA keypair with the standard public exponent:
private_key = generate_private_key(
    public_exponent=65537,
    key_size=2048,
    backend=default_backend(),
)
What we have just done is generate a new RSA private key. The key_size describes how many bits long the key should be. Larger keys provide more security; currently, keys of 1024 bits and below are considered breakable, while 2048 or 4096 bits are reasonable defaults for new keys. Right now, IPFS defaults to 2048. The public_exponent should be 65537 unless you have a specific reason to choose otherwise.
Let’s go ahead and grab the public key, which is what we’ll actually use to form the peer ID, and serialize it to DER bytes:
public_key = private_key.public_key()
b = public_key.public_bytes(
    encoding=Encoding.DER,
    format=PublicFormat.SubjectPublicKeyInfo,
)
Great, now we’ll encode the public key bytes into a protobuf using the classes created for us by the pyrobuf library:
proto = PublicKey()
proto.Type = RSA
proto.Data = b
public_buf = proto.SerializeToString()
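If pyrobuf gives you trouble (it builds Cython extensions), the PublicKey message is small enough to serialize by hand. This is a minimal sketch, assuming crypto.proto declares Type as field 1 and Data as field 2, with RSA = 0 in its KeyType enum; encode_varint and encode_public_key are hypothetical helper names, not part of pyrobuf or libp2p. For a 2048-bit key, its output should match what proto.SerializeToString() produces.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as an unsigned (LEB128) protobuf varint."""
    if value < 0:
        raise ValueError("varints here must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: last byte
            return bytes(out)


# Assumption: RSA is enum value 0 in crypto.proto's KeyType.
KEY_TYPE_RSA = 0


def encode_public_key(der: bytes, key_type: int = KEY_TYPE_RSA) -> bytes:
    """Hand-serialize a PublicKey message with its Type and Data fields."""
    return (
        b"\x08" + encode_varint(key_type)  # tag (1 << 3) | 0: field 1, varint
        + b"\x12" + encode_varint(len(der)) + der  # tag (2 << 3) | 2: field 2, bytes
    )
```

A 2048-bit RSA key in DER (SubjectPublicKeyInfo) form is 294 bytes, so the serialized message begins with the bytes 08 00 12 a6 02. In base64, that prefix is why RSA public keys printed by ipfs id typically start with CAASpgI.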
Or, you could skip most of the above steps and copy the protobuf-encoded public key directly from your existing IPFS peer. Assuming you have IPFS installed and the daemon running, enter the following into your terminal and copy the Public Key value (it is base64-encoded):
$ ipfs id --format="Peer Id: <id>\nPublic Key: <pubkey>"
Back in your script, instead of the previous steps (i.e., generating a new keypair, etc.), you would do something like:
public_buf = base64.b64decode(b"...")
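Before hashing a pasted key, it can be reassuring to look inside it. This minimal sketch parses the two fields of a serialized PublicKey message by hand, under the same assumption as above that crypto.proto uses field 1 for Type and field 2 for Data; decode_varint and parse_public_key are hypothetical names for illustration only.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned varint starting at buf[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def parse_public_key(public_buf: bytes) -> tuple[int, bytes]:
    """Return (key_type, data) from a serialized PublicKey message."""
    key_type = data = None
    pos = 0
    while pos < len(public_buf):
        tag, pos = decode_varint(public_buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if field == 1 and wire_type == 0:  # Type: varint enum
            key_type, pos = decode_varint(public_buf, pos)
        elif field == 2 and wire_type == 2:  # Data: length-delimited bytes
            length, pos = decode_varint(public_buf, pos)
            data = public_buf[pos:pos + length]
            if len(data) != length:
                raise ValueError("truncated Data field")
            pos += length
        else:
            raise ValueError(f"unexpected field {field} with wire type {wire_type}")
    if key_type is None or data is None:
        raise ValueError("missing Type or Data field")
    return key_type, data
```

With the decoded buffer, key_type, der = parse_public_key(public_buf) should give a key_type of 0 for an RSA key, and load_der_public_key(der, backend=default_backend()) (imported earlier) should turn der back into a public key object.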
Either way, we’ll now compute the SHA-256 digest of the protobuf-encoded public key:
h = Hash(SHA256(), backend=default_backend())
h.update(public_buf)
digest = h.finalize()
Next, we’ll compute a multihash of the digest. SHA2-256 is the current IPFS default, and its code (0x12) is defined in this table. I highly recommend you read up on multiformats and multihashes if you are at all curious about multihashes and where these magic hash function codes come from.
hash_function = 0x12
length = len(digest)
multihash = to_varint(hash_function) + to_varint(length) + digest
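To see why every one of these IDs begins the same way, here is a sketch that uses the standard library’s hashlib in place of the cryptography Hash object (both produce the same 32-byte SHA-256 digest). sha256_multihash and split_multihash are hypothetical helpers, and the splitter deliberately only handles single-byte varints.

```python
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256


def sha256_multihash(data: bytes) -> bytes:
    """Build a sha2-256 multihash: <code varint><length varint><digest>."""
    digest = hashlib.sha256(data).digest()  # 32 bytes
    # Both the code (0x12) and the length (32) are below 0x80, so each varint
    # fits in a single byte and the prefix is always b"\x12\x20".
    return bytes([SHA2_256, len(digest)]) + digest


def split_multihash(mh: bytes) -> tuple[int, int, bytes]:
    """Split a multihash whose code and length each fit in one varint byte."""
    code, length, digest = mh[0], mh[1], mh[2:]
    if code >= 0x80 or length >= 0x80:
        raise ValueError("multi-byte varints are not handled in this sketch")
    if len(digest) != length:
        raise ValueError("digest length does not match the length prefix")
    return code, length, digest
```

That constant b"\x12\x20" prefix is what makes the base58-encoded result start with Qm.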
Finally, we’ll base58-encode the multihash and print it out…
…giving you something that starts with Qm: your peer’s public key identity. If you imported your existing peer’s public key, you might want to check that the two IDs match (which they should).
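With the base58 package installed earlier, that last step would be something like print(base58.b58encode(multihash).decode()), since b58encode returns bytes. To make the whole pipeline and the matching check concrete without any third-party packages, here is a stdlib-only sketch. It assumes the Bitcoin base58 alphabet (the one used for Qm… IDs), and b58encode, peer_id_from_public_buf and peer_id_matches_key are hypothetical names, not the base58 package’s or libp2p’s API.

```python
import hashlib

# Bitcoin-style base58 alphabet: no 0, O, I or l, to avoid look-alikes.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Base58-encode bytes, treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    # Each leading zero byte is represented by a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def peer_id_from_public_buf(public_buf: bytes) -> str:
    """Base58 of the sha2-256 multihash of a protobuf-encoded public key."""
    digest = hashlib.sha256(public_buf).digest()
    multihash = bytes([0x12, len(digest)]) + digest  # code 0x12, length 32
    return b58encode(multihash)


def peer_id_matches_key(peer_id: str, public_buf: bytes) -> bool:
    """Check that a claimed peer ID really is the hash of this public key."""
    return peer_id_from_public_buf(public_buf) == peer_id
```

Passing the public_buf from either route above to peer_id_from_public_buf should reproduce the 46-character Qm… ID that ipfs id reports, and peer_id_matches_key is a toy version of the kind of ID-versus-key check peers perform when they connect.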
And that’s all, folks. You now know pretty much exactly how IPFS creates unique, cryptographic identities for communicating over the distributed web. What IPFS and libp2p do with these IDs is a topic for a future post. In the meantime, why not check out some of our other stories, or sign up for our Textile Photos waitlist to see what we’re building with IPFS. While you’re at it, drop us a line and tell us what cool distributed web projects you’re working on; we’d love to hear about it!