Gossip Optimization Part 2: Breaking the Hub — Factomize
In my previous blog post on the gossip network, I detailed how the current network has a tendency to form a hub network and how that introduces both inefficiencies and scalability problems. A short recap: when booting up, all nodes connect to the seed nodes, leaving them with a disproportionally vast connection count. This impacts the fanout of messages with the seed nodes receiving a disproportionate amount of messages, the duplicates of which are dropped.
The ideal network structure is every node in the system connected to an equal amount of other nodes. This is made difficult by the fact that nodes are not aware of the network topology.
So how do we go from a hub network to a random network?
Introducing CAT: Cyclic Auto Truncate
The idea behind this algorithm is simple. To start, let’s define some variables:
A round is defined as R minutes and each node is configured to want C connections. There are S seed servers.
- If the node has fewer than S connections, connect to the seed nodes.
- If the node has fewer than C connections, reach out to a random peer and ask for a share. They reply with 2 of their random connections. Repeat until the node has C connections.
- At the start of a new round, drop random connections until there are
C - 2left.
That’s it. Each round, the nodes are forced to drop some of their connections (although this is asynchronous, so other nodes closing their connections means some nodes will already have reached the target), causing the overall network to keep rearranging itself. This dropping of connections is particularly aggressive for seed nodes during the network bootstrapping phase, though that will eventually level out, assuming that once the network is stable, nodes will trickle in and out over time.
It’s very hard to visualize the data for large networks and I wanted something more tangible to measure the performance of a network. I have come up with the following:
This is the statistical mean of how far off nodes are from the desired amount of
Dev = | C - Sum(connection count of all nodes) / T |. Lower is better.
The smallest amount of active connections a node has. Zero means there’s a node that has not yet connected to anything.
The largest amount of active connections a node has. Can’t be higher than
If the overall network is not connected (graph theory), it is disconnected
Just to give you at least some idea of what these networks end up looking like, I want to bring two examples to the table. One with a lower number of nodes to make it clearer, one with a more realistic number of nodes (closer to MainNet). Red nodes are seed nodes, black nodes are regular nodes, green nodes are limited.
For each example, I have three graphs, created using Halfviz. The first graph is the old algorithm, letting Halfviz position the nodes. The second graph is using the CAT algorithm using the same position for the nodes. The third graph is the same results as the second but letting Halfviz re-order the nodes to visualize the changes of “de-stressing” the network.
Example 1: T=32, C=8, S=4, L=0, 16 Rounds
The hub is immediately obvious with each seed node having connections to every other node in the network. Min = 8 (for the 28 black nodes) and Max = 31 (for the 4 seed nodes) with a Deviation of
The same network using CAT means the seed nodes have fewer connections. Min = 8, Max = 11, Deviation
After letting Halfviz reorder the nodes, the hub nodes are free to drift away from the center.
Example 2: T=150, C=16, S=10, L=32, 16 Rounds
The effect of the hub-cluster is much more apparent in a sea of red, with each seed node having almost ten times as many connections. Min = 16, Max = 149 (!!), Deviation
The red is practically gone now that there are around 1300 fewer connections to seed servers. Min = 14, Max = 21, Deviation
The hub-breaking is also much more apparent, with seed nodes all over the map.
Alright, now that we have some sort idea what the end result will look like, let’s take a look at how these networks form.
The most interesting aspect is that regardless of node size, both graphs converge very quickly (within 4 rounds). This is due to the extremely aggressive mechanic of dropping to
C-2 connections every round. This works in the model because bandwidth is not taken into account but in reality, a single node is unlikely to be able to handle 3000+ connections for any length of time. (During peak usage, bandwidth can exceed 1MiBi/s per connection)
A more realistic approach would be to set a maximum of say
2 * C after which it rejects connections:
In the uncapped chart above, the seed nodes peaked at 150 connections but in this graph, the maximum connections are 32. It now takes up to seven rounds to near equilibrium. Further, there is a waiting period (during the rounds that Min=0) where there are some nodes who are waiting for a connection to a seed node opens up, which lasts as long as six rounds. These numbers get worse the more nodes there are, with 5000 nodes reaching equilibrium around round 14 and a wait time of 15 rounds.
To avoid wait times, nodes could transmit a single peer share instead of refusing the connection entirely, thereby giving a waiting node alternatives.
By comparison, the old algorithm stabilizes very quickly as expected, and the maximum is as high as the number of nodes in the network.
Effect of Limited Nodes
At low percentages, the impact of limited nodes is negligible. At around 40% (not counting seeds) of nodes being limited, there’s a noticeable uptrend of deviation. At 50%+ limited nodes, the ability of nodes to connect to each other and share peers is damaged significantly. At 100% (not counting seeds), the topology reverts to an extreme hub structure.
In the worst case scenario, the network still works, though performance will be impacted by the hub, as described in my last blog.
An eclipse attack is when a node is only connected to the real network through nodes controlled by an adversary:
In this example, red nodes are controlled by a malicious entity. The blue
Node 6 is eclipsed by the red nodes and red is in full control of all of the information that blue sends or receives. Red cannot change messages or forge signatures, however.
Let’s assume that a
Node A has C good connections and an attacker wants to eclipse the node. They could set up C nodes and peer all of them to
Node A, giving it C*2 connections. During the next cycle,
Node A would drop random connections until reaching C-2. Half of those would likely be bad connections, so there's another fifty-percent chance that the peer request would be from a bad node, giving it two more. The next cycle, the attacker can stuff another C nodes, preventing regular nodes from connecting. Eventually, an attacker could get 100% of available connections, requiring the use of
2*C attacking nodes.
There are multiple avenues at mitigating this line of attack: to change the dropping algorithm and to modify the choice of which connections to accept.
Instead of dropping random connections, prefer to drop connections in areas that have a dense cluster of addresses. This is similar to how the old peering code selects connections out of the list of known peers. Connections are sorted into C different buckets based on their prefix similarity, then connections are dropped from buckets filled with an above average number of connections first.
The drawback is that this creates a less-than-random set of connections. The full impact of that is outside the scope of this blog.
The benefit would be a dramatically increased resource requirement of the attacker. Nodes launched from a single computer or the same datacenter would be filtered out this way. The attacker would have to have access to a network of well-distributed IP addresses around the globe.
Flood control. During normal operation of the network, nodes don’t join the system in big bursts. If a lot of nodes connect during a single round, that’s usually a sign of something going wrong. We can expect a roughly equal number of nodes leaving and connecting due to the cycling, many of which we have seen before. We can, therefore, prefer the connections of nodes we have previous partnered with in some ratio, like allowing one new connection every X cycles, and otherwise only filling up with nodes requested from peers.
The drawback would be that it’s harder for a legitimate group of nodes to join the network at the same time, in the sense that they would experience an above-average number of connection losses every round.
Don’t allow more than C connections. If a connection arrives when the node already has the desired amount, just send the incoming connection a list of alternate peers from your list of peers and then disconnect it. After that, the specific IP is put on a blocklist and if it connects again within a specific timeframe, it will once again be rejected and the timeframe reset.
The goal is that an attacker is unable to tell when exactly a slot for connection will be free again and they have to rely on good luck or control of a very large number of nodes.
The drawback is that there could be a situation in which a legitimate node is unable to find a valid peer and locked out for that time frame.
Every X rounds, drop connections down to
C-S and grab the S seeds. This would ensure a node will never permanently remain eclipsed, although performance will still be dramatically impacted.
Authority nodes have additional mitigation in the form of special peers. Each authority node has a small set of other nodes hardcoded, giving them a permanent and unbreakable connection to each other with prioritized message transfers.
The models seem to work out favorably but, of course, are not realistic. There are things like random disconnects, nodes joining and leaving at will, etc that are hard to account for in a simple model. The next step will be implementing CAT in the new P2P package itself and running network-based simulations, as well as figuring out good values and security strategies.
A benefit of the hub network is that it’s hard-to-impossible to have a disconnected network, at the cost of running it very inefficiently. Ensuring that the network never fragments is a key challenge of the CAT algorithm. The benefit of the CAT algorithm is that it’s much easier to implement. You just have
C peers and can persist your current connections to disk, rather than having to deal with a complicated partial peer view and selection algorithm.
If all goes well, restructuring the network could prove to be a dramatic boost in overall network bandwidth capacity.
Originally published at https://factomize.com on July 11, 2019.