Making sense of the many IPFS configuration options
Welcome to the next installment of our ongoing Tutorial Series on “Setting up an IPFS Peer”! This time around, we’re going to piece together some of the IPFS repository (repo) configuration options to help us customize our IPFS peer node. While the IPFS devs have come up with a smart set of default options, certain configurations will be more useful in certain contexts, so it’s important to figure out what will work best for your particular application. So if you haven’t done so already, take a quick spin through the first post in our series to ensure you have a working understanding of IPFS peer nodes. For today’s tutorial, you won’t need a cloud-based peer, but you are welcome to use that one to experiment with.
Putting the pieces together…
The IPFS config file is a JSON-formatted text file, located in your IPFS repo (/data/ipfs/config if you followed our tutorial). It has a number of options for controlling your IPFS repo and daemon, including how your peer is addressed by other peers, what peers it connects to by default (bootstrap peers), how it stores and represents data (files), how it discovers other peers, and a whole lot more. The main components include Addresses, API, Bootstrap, Datastore, Discovery, Reprovider, Gateway, Identity, and Swarm. In the remainder of this section, we’ll briefly explain the purpose of each of these entries.
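As a quick orientation, here is a minimal Python sketch that parses a config document and lists its top-level entries. The SAMPLE_CONFIG string is a trimmed, hypothetical stand-in for the real file (every section is left empty); on a real peer you would read the text from your repo's config file instead.

```python
import json

# Trimmed, hypothetical stand-in for the contents of an IPFS config file.
# Only the top-level entry names come from the discussion above.
SAMPLE_CONFIG = """
{
  "Addresses": {},
  "API": {},
  "Bootstrap": [],
  "Datastore": {},
  "Discovery": {},
  "Reprovider": {},
  "Gateway": {},
  "Identity": {},
  "Swarm": {}
}
"""


def top_level_sections(config_text):
    """Return the sorted names of the top-level entries in a config document."""
    return sorted(json.loads(config_text).keys())


sections = top_level_sections(SAMPLE_CONFIG)
print(sections)
```

Because the file is plain JSON, any JSON-aware tool can read it. It is still a good idea to keep a backup copy before editing it by hand.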
The config file stores a few different address types (Swarm, API, Gateway), each of which uses the multiaddr addressing format. These addresses are commonly modified, so make sure you are comfortable with the following concepts.
Swarm addresses are addresses that the local daemon will listen on for connections from other IPFS peers. You should try to ensure that these addresses can be dialed from a separate computer and that there are no firewalls blocking the ports you specify.
The API address is the address that the daemon will serve the HTTP API from. This API is used to control the daemon through the command line (or via curl if you’re feeling adventurous). Unlike the Swarm addresses, you should ensure that the API address is not dialable from outside of your machine, or potentially malicious parties may be able to send commands to your IPFS daemon.
The Gateway address is the address that the daemon will serve the gateway interface from. The gateway may be used to view files through IPFS, and serve static web content. This port may or may not be dialable from outside your machine; that’s entirely up to you. The Gateway address is optional: if you leave it blank, the gateway server will not start.
The Addresses config option also specifies Announce and NoAnnounce array options. Both can be empty. The first specifies the Swarm addresses to announce to the network. If left empty, the daemon will announce inferred swarm addresses (based on your public IP address, open ports, etc). Conversely, the NoAnnounce option specifies the array of swarm addresses not to announce to the network. You might use these options if you want greater control over how your peer-to-peer connections work.
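To make the address types more concrete, the sketch below models an Addresses entry as a Python dictionary and checks which addresses are bound to the loopback interface only. The specific hosts and ports are illustrative assumptions rather than values read from a real repo, and parse_simple_multiaddr is a hypothetical helper that understands only the simple /ip4 or /ip6 plus /tcp shape, not the full multiaddr format.

```python
import ipaddress

# Illustrative Addresses entry; hosts and ports are assumptions for this sketch.
addresses = {
    "Swarm": ["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001"],
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "Announce": [],
    "NoAnnounce": [],
}


def parse_simple_multiaddr(addr):
    """Split a multiaddr shaped like /ip4|ip6/<host>/tcp/<port> into host and port.

    Only this simple shape is handled; real multiaddrs cover many more protocols.
    """
    parts = addr.strip("/").split("/")
    if len(parts) != 4 or parts[0] not in ("ip4", "ip6") or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddr: {addr}")
    return {"host": ipaddress.ip_address(parts[1]), "port": int(parts[3])}


def is_local_only(addr):
    """Return True when the address is bound to the loopback interface."""
    return parse_simple_multiaddr(addr)["host"].is_loopback


print("API local only:", is_local_only(addresses["API"]))
print("Swarm local only:", [is_local_only(a) for a in addresses["Swarm"]])
```

A check like this is a cheap way to confirm that the API address stays private while the Swarm addresses remain reachable from other machines.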
The API config entry is a little bit simpler. It contains information (settings) to be used by the API gateway. Essentially, your daemon is running a lightweight HTTP server that will respond to client (e.g., IPFS commands, curl) requests. The HTTPHeaders sub-entry (currently the only entry under the API config option) is a map of HTTP headers to set on responses from your API HTTP server. You might want to edit these settings if you need to allow additional access control methods, or require authorization headers, etc.
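As a rough illustration, the HTTPHeaders map can be pictured as header names mapped to lists of values. The header values below are hypothetical (a single local web app allowed to call the API), and response_headers is an illustrative helper showing how such a map could be flattened into response headers; it is not how the daemon itself is implemented.

```python
import json

# Hypothetical API entry: allow one local web app origin to call the HTTP API.
api_section = {
    "HTTPHeaders": {
        "Access-Control-Allow-Origin": ["http://localhost:3000"],
        "Access-Control-Allow-Methods": ["GET", "POST"],
    }
}


def response_headers(api_config):
    """Flatten the header map into (name, value) pairs, one per configured value."""
    pairs = []
    for name, values in api_config.get("HTTPHeaders", {}).items():
        for value in values:
            pairs.append((name, value))
    return pairs


print(json.dumps(api_section, indent=2))
print(response_headers(api_section))
```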
The Bootstrap config array specifies the list of IPFS peers that your daemon will connect to on startup. The default values for this are the ipfs.io bootstrap nodes, which are a set of VPS servers distributed around the world. If you want to run your own private IPFS network, you might want to change this to your own set of IPFS peers, or simply add other peers under your control.
The Datastore config option contains a bunch of information related to the construction and operation of the on-disk storage system — how your repo stores data that you’ve added, pinned, and accessed. Other than the following storage size options, you’re probably going to want to leave this section alone.
Firstly, the StorageMax option is a soft upper limit on the size of your IPFS repository’s datastore. In other words, how much disk space your repo is allowed to consume. Related to this, the StorageGCWatermark option is the percentage of the StorageMax value at which a garbage collection will be triggered automatically if the daemon is run with automatic gc enabled (that option currently defaults to false). The default is currently 90%. A third related option is the GCPeriod, which is the time duration (default is 1 hour) specifying how frequently to run a garbage collection. Again, this is only used if automatic gc is enabled.
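The relationship between StorageMax and StorageGCWatermark is simple arithmetic, sketched below. The "10GB" value, the decimal unit multipliers, and the "greater than or equal" comparison are assumptions made for illustration; parse_size is a hypothetical helper that handles only the KB, MB, GB, and TB suffixes or a plain byte count.

```python
# Decimal multipliers: an assumption about how size strings such as "10GB" are read.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a size string such as "10GB" (or a plain byte count) into bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)


def gc_threshold_bytes(storage_max, watermark_percent):
    """Repo size at which an automatic garbage collection would be triggered."""
    return parse_size(storage_max) * watermark_percent // 100


def should_collect(repo_size_bytes, storage_max, watermark_percent=90):
    """Return True once the repo has grown to the watermark or beyond."""
    return repo_size_bytes >= gc_threshold_bytes(storage_max, watermark_percent)


print(gc_threshold_bytes("10GB", 90))
print(should_collect(9_500_000_000, "10GB"))
```

With these assumed values, a repo allowed 10 GB would become eligible for automatic collection at 9 GB, and only if automatic gc is enabled.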
Other options in this section include the BloomFilterSize, which is a number representing the size (in bytes) of the blockstore’s bloom filter. Leaving this at zero disables this feature. So why might you want to turn on your repo’s bloom filter? First, bloom filters are space-efficient probabilistic data structures used to test whether an element is a member of a set. In the case of IPFS, a bloom filter can be used to speed up blockstore lookups (checking for specific hashes). For now, there’s limited data on what value to specify here, though there are tools to help you calculate the optimal size. Unless you know what you’re doing, you can leave this at zero.
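If bloom filters are new to you, the following toy implementation shows the general idea. It is a teaching sketch only and not the blockstore's actual filter: the class name, the choice of SHA-256, and the number of hash functions are all assumptions. The size_bytes argument plays the same role as BloomFilterSize, the size of the bit array in bytes.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bytes, num_hashes=3):
        if size_bytes <= 0:
            raise ValueError("size_bytes must be positive")
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bytes)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        """Set the bits for this key."""
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


seen = BloomFilter(size_bytes=1024)
seen.add(b"hypothetical-block-hash")
print(seen.might_contain(b"hypothetical-block-hash"))
```

A "definitely absent" answer lets a lookup skip the disk entirely, which is where the speed-up comes from. A "possibly present" answer still requires the real lookup.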
The last option in this section is the Spec, which defines the structure of the IPFS datastore. It is a composable structure, where each datastore is represented by a JSON object. Unless you really know what you are doing, you should probably leave this one alone! For more information on possible values for this configuration option, see docs/datastores.md.
The Discovery config option is pretty important. After all, you want other peers to be able to discover your peer, right? So it is important to properly configure your node discovery mechanisms. By default, multicast DNS peer discovery (MDNS) is turned on. This is useful for enabling peer discovery on your local network. If you are running IPFS on machines with public IPv4 addresses, then you should probably just disable this. Relatedly, the Interval option controls the number of seconds to wait between discovery checks for MDNS.
This brings us to the content Routing mode. How does your peer node actually find and access content? Today, IPFS uses a Kademlia-based distributed hash table (dht option), and continues to learn from DHT research. Essentially, a DHT is a decentralized distributed system that provides a (key, value) lookup service, and any participating node can efficiently retrieve the value associated with a given key. In the IPFS world, this means a peer uses the DHT to find peers who have a copy of the file (value) they are looking for via its CID hash (key). The peer can then connect to those peers, and download the file from them directly. See our discussion of the distributed web and content addressing for a more complete discussion of these ideas.
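The core trick in a Kademlia-style DHT is that keys and peer IDs live in the same numeric space, and "closeness" is measured with XOR. The sketch below illustrates that general idea with small integers; it is not the routing code IPFS uses, and the function names are hypothetical.

```python
import hashlib


def key_id(name):
    """Map arbitrary bytes into the shared ID space by hashing them."""
    return int.from_bytes(hashlib.sha256(name).digest(), "big")


def xor_distance(a, b):
    """Kademlia-style distance between two IDs: their bitwise XOR."""
    return a ^ b


def closest_peers(target, peer_ids, count=2):
    """Return the count peers whose IDs are nearest to the target key."""
    return sorted(peer_ids, key=lambda peer: xor_distance(peer, target))[:count]


# Tiny 4-bit IDs keep the arithmetic easy to follow by hand.
peers = [0b1000, 0b0010, 0b1011, 0b0111]
nearest = closest_peers(0b1010, peers)
print([bin(p) for p in nearest])
```

Records for a key are kept on the peers closest to it, so a lookup only has to walk toward the key rather than ask the whole network.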
By default, your node will act as a DHT node. This means it will store and serve small bits of data to the network. This is how IPFS distributes content: IPNS records, content provider records (who has what content), peer address records (to map peer IDs to IP addresses), etc. This usually doesn’t take up that much memory. However, constantly answering DHT queries can significantly increase CPU usage. One can set the Routing mode to dhtclient, which doesn’t serve requests to the IPFS network, saving bandwidth. Here, you are essentially not participating in the DHT (i.e., your peer is not a DHT node).
Directly related to Discovery is the Reprovider entry. Here, we can control the time (Interval) between rounds of reproviding local content to the routing system. If unset, it defaults to 12 hours. Setting it to “0” disables content reproviding altogether. With reproviding disabled, other nodes on the network will not be able to discover that you have the objects that you have. If you want to have this disabled and still keep the network aware of what you have, you will have to manually announce your content periodically. If you leave it enabled (a good idea), then you can choose a Strategy for deciding what should be announced. The Strategy can be one of: "all" (default), which announces all stored data, "pinned", which only announces pinned data, or "roots", which will only announce directly pinned keys and root keys of recursive pins.
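The three Strategy values can be thought of as progressively smaller sets of keys. The following sketch models that selection with plain Python sets; the data layout (a dictionary mapping each recursive pin root to its child keys) and all key names are hypothetical and chosen only to illustrate the descriptions above.

```python
def keys_to_announce(strategy, all_blocks, direct_pins, recursive_pins):
    """Pick the keys a reprovide round would announce under each strategy.

    recursive_pins maps each recursively pinned root key to its child keys.
    """
    if strategy == "all":
        return set(all_blocks)
    roots = set(direct_pins) | set(recursive_pins)
    if strategy == "roots":
        return roots
    if strategy == "pinned":
        pinned = set(roots)
        for children in recursive_pins.values():
            pinned |= set(children)
        return pinned
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical repo: one direct pin, one recursive pin, one unpinned cached block.
blocks = {"file-a", "dir-root", "dir-child-1", "dir-child-2", "cached-block"}
direct = {"file-a"}
recursive = {"dir-root": {"dir-child-1", "dir-child-2"}}

for name in ("all", "pinned", "roots"):
    print(name, sorted(keys_to_announce(name, blocks, direct, recursive)))
```

The smaller the announced set, the less work each reprovide round does, at the cost of other peers having fewer ways to find your content.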
Similar to the API options, Gateway options control the HTTP gateway. Again, we can control the HTTPHeaders to set on gateway responses. By default, an HTTP gateway for IPFS only supports the HTTP GET method. This allows you to fetch a resource by its hash and, if the hash is a directory, by the path from that directory to a named file. If you enable the Writeable flag for a gateway, it gains the ability to understand the HTTP POST, PUT, and DELETE methods. This allows clients to add data to IPFS, but doesn’t trust them with the full daemon API. You can enable this mode by setting Gateway.Writeable to true in the daemon configuration, or by passing the --writeable flag on the daemon's command line. Additionally, the Gateway config entry allows you to specify a URL (RootRedirect) to which requests for / will be redirected.
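To picture what a GET request against the gateway looks like, this sketch turns a Gateway multiaddr plus a hash and optional path into a URL. The gateway address, the hash "QmExampleHash", and the file path are placeholders, and gateway_url is a hypothetical helper that handles only the simple /ip4 plus /tcp address shape.

```python
from urllib.parse import quote


def gateway_url(gateway_multiaddr, content_hash, path=""):
    """Build the GET URL for a hash (and optional path inside it) on a gateway.

    Handles only multiaddrs shaped like /ip4/<host>/tcp/<port>.
    """
    parts = gateway_multiaddr.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ip4" or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddr: {gateway_multiaddr}")
    host, port = parts[1], int(parts[3])
    url = f"http://{host}:{port}/ipfs/{content_hash}"
    if path:
        # Percent-encode the path but keep "/" separators intact.
        url += "/" + quote(path.strip("/"))
    return url


print(gateway_url("/ip4/127.0.0.1/tcp/8080", "QmExampleHash", "docs/readme.md"))
```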
The Identity entry is an easy one. When you run something like ipfs id, you’ll get output that contains the peer’s Identity information. There are two main entries in the config file. The first is PeerID, which is the unique PKI identity label for this config’s peer. This is set on init and never read; it is merely stored in the config for convenience, and IPFS will always generate the PeerID from its keypair at runtime. The second is PrivKey, a base64-encoded protobuf describing (and containing) the node’s private key. This is not something you can change or control.
Finally, we come to the Swarm entry. Options for configuring the swarm include AddrFilters, which is an array of address filters (multiaddr netmasks) used to filter dials. What does this mean? Basically, using this config setting (it is empty by default) you can stop your node from dialing peers in certain IP address ranges. For example, one might want to exclude all IPv4 peers, and all IPv6 link-local peers to avoid some connection issues.
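Here is a small sketch of how multiaddr netmasks of the form /ip4/<network>/ipcidr/<bits> can be matched against a peer's IP address. The two filters mirror the example just given (all IPv4, plus IPv6 link-local); the helper names are hypothetical, and the matching logic illustrates the concept rather than the daemon's own implementation.

```python
import ipaddress

# Filters mirroring the example: every IPv4 address, plus IPv6 link-local.
addr_filters = ["/ip4/0.0.0.0/ipcidr/0", "/ip6/fe80::/ipcidr/10"]


def filter_to_network(addr_filter):
    """Convert /ip4|ip6/<network>/ipcidr/<bits> into an ipaddress network."""
    parts = addr_filter.strip("/").split("/")
    if len(parts) != 4 or parts[0] not in ("ip4", "ip6") or parts[2] != "ipcidr":
        raise ValueError(f"unsupported filter: {addr_filter}")
    return ipaddress.ip_network(f"{parts[1]}/{parts[3]}")


def is_filtered(ip, filters):
    """Return True when the IP falls inside any filtered range."""
    address = ipaddress.ip_address(ip)
    return any(address in filter_to_network(f) for f in filters)


for candidate in ("192.168.1.5", "fe80::1", "2001:db8::1"):
    print(candidate, is_filtered(candidate, addr_filters))
```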
NAT traversal techniques are required for many network applications, such as peer-to-peer file sharing. However, in some locations (e.g. data-centers) you don’t need NAT discovery. You can disable NAT discovery by setting DisableNatPortMap to true. You can also DisableBandwidthMetrics, so that IPFS does not keep track of bandwidth usage. Doing this may lead to a slight performance improvement, as well as a reduction in memory usage. So if you don’t need it, this is a good one to tweak.
Another two Swarm config options that I like to have enabled on all my peer nodes are p2p-circuit relay transport support (set DisableRelay to false), and hop relaying (set EnableRelayHop to true). If EnableRelayHop is enabled, the node will act as an intermediate (Hop Relay) node in relay circuits for connected peers. What does all this mean? Circuit relaying provides peers with the means to indirectly connect to other peers that they cannot reach directly, either because of NAT or because of protocol incompatibilities, such as browser-based (js-ipfs) peers connecting to desktop (go-ipfs) peers. This is a nice feature to have if you want to be able to relay information for browser-based Dapps.
Other than these Swarm settings, you might also want to tweak your connection manager configuration (Swarm.ConnMgr). For instance, you can adjust your LowWater count, which is the minimum number of peer connections to try to maintain. Similarly, you could adjust your HighWater line, which is the number of connections that, when exceeded, will trigger a connection garbage cleanup operation (i.e., it will drop some connections). Finally, your GracePeriod is a time duration (default is "20s") that new connections are immune from being closed by the connection manager. In low power situations, you might want to kick the GracePeriod up to 1 minute, but drastically reduce the LowWater and HighWater values.
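The interplay between LowWater, HighWater, and GracePeriod is easier to see in a small simulation. The sketch below assumes that, once HighWater is exceeded, the manager closes connections until it is back at LowWater and never closes one that is still inside its grace period. Closing the oldest connections first is an arbitrary choice made for this example, and all names and numbers are hypothetical.

```python
def connections_to_close(connections, now, low_water, high_water, grace_period):
    """Pick which peers to disconnect in one trimming pass.

    connections maps a peer name to the time (in seconds) its connection opened.
    """
    if len(connections) <= high_water:
        return []  # HighWater not exceeded, nothing to trim
    # Connections younger than the grace period are immune from trimming.
    candidates = [
        peer for peer, opened in connections.items() if now - opened >= grace_period
    ]
    candidates.sort(key=lambda peer: connections[peer])  # oldest first (assumption)
    excess = len(connections) - low_water
    return candidates[:excess]


# Five open connections, observed at t=100s with a 20 second grace period.
open_connections = {"a": 0, "b": 10, "c": 20, "d": 95, "e": 99}
closed = connections_to_close(
    open_connections, now=100, low_water=2, high_water=4, grace_period=20
)
print(closed)
```

In this run the two newest connections survive because they are still within the grace period, which is why a longer GracePeriod paired with low water marks can suit a low power peer.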
There are a number of other entries in the IPFS config file, including Experimental features, settings to control publishing IPNS records, and even FUSE Mount point configuration options. We won’t go over those options here, as they are more complex than your average IPFS user needs, or are changing frequently (particularly the Experimental features). For those interested in learning more, we highly recommend you check out the Experimental pubsub features.
And there we have it! We’ve covered pretty much all the critical bases. There’s a lot to unpack here, so feel free to jump back up to a particular section, refer back here later, and generally use this post as a guide for tweaking your peer node. In our next tutorial, we’ll be peeking under the hood of IPFS daemon profiles, so you can get a better understanding of how profiles control configuration options behind the scenes.
In the meantime, why not check out some of our other stories, or sign up for our Textile Photos waitlist to see what we’re building with IPFS, or even drop us a line and tell us what cool distributed web projects you’re working on. We’d love to hear about it!