Experimental Improvements for Intercoin Consensus

This thread builds on the Overview and Implications of Intercoin’s Consensus. Before commenting here, I recommend you read that post.

Suppose we have the following assumptions about a consensus group:

  1. Notaries have to sign their messages
  2. Nodes are strongly connected

Then what are the threat models for a group that’s relatively small, say 10-30 notaries? We know from this 2009 paper by Jaap-Henk Hoepman that the probability of a double-spend is small. We also know that if a coin is relatively low in value ($1, say), then no one is going through all the trouble trying to double-spend it. And if notaries for coins are distributed evenly the network, then double-spending $100 or $100,000 will require the attacker to subvert a proportional amount of notaries, which becomes more and more infeasible. It seems that, in practical terms, the network is secure.

Moreover, the notaries introduce a checkpoint into the ledger every time a consensus is reached about the latest owner of a coin. This is similar to the Ripple Consensus Process where every new notary has to agree on the previous history or get instantly ignored. It’s also similar to MaidSAFE’s architecture with new vaults joining a section.

While there isn’t much notaries can do to corrupt the consensus, they can act to prevent further progress indefinitely. Some of the coins would become un-spendable until the situation is fixed. It seems that fixing would require “escalation” to larger and larger sections of the network, that would need to store an ever-growing Merkle tree of corrupted nodes (the “liars list”), along with proofs of their signed corrupted messages (conflicting messages, or gibberish, etc.) Perhaps the network can cover the cost of storing this Merkle tree by confiscating the staked Intercoins of the malicious notaries as a bounty. But, that’s a fixed amount, so it may not be enough to cover the storing of this “liars list” in perpetuity. Alternative token-economic approaches should be explored. (Although I should point out that the liars-list is the only singleton in the entire network that needs to be stored by every node, and thus Intercoin notaries would already be orders of magnitude more efficient and scalable than Bitcoin / Ethereum full nodes to operate.).

Recovery from Attacks

However, the question is how do we recover from a gradual infiltration by malicious nodes? This can happen via large-scale attack where notaries start being infected by a virus. Or it can happen as multiple notaries start flooding a permissionless network, asking to join. There has to be a vetting process, similar to how countries or co-ops throttle new immigrants and make them jump through hoops to get citizenship. But a determined attacker can send in sleeper notaries that behave well, until one day when they collude to destroy the network.

Let’s assume this is an ongoing process. Can the consensus groups identify and ignore these malicious nodes? According to Wikipedia, it is possible:

A second solution requires unforgeable message signatures. For security-critical systems, digital signatures (in modern computer systems, this may be achieved in practice using public-key cryptography) can provide Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals.

However, where has such a scheme succeeded in practice? @joelkatz has said in the past he has never seen a successful implementation. Ripple uses humans to manually prune their UNLs and he said he doesn’t know of any way to automate this.

While this may work for a global blockchain maintained by Ripple and its partners, we need something that can be autonomous and self-healing, for communities to use. In addition, we want to explore the possibility of small sub-networks operating without being connected to the Internet.

There are impossibility results showing that this may be impossible to guarantee 100% in an async environment where notaries can disconnect (or pretend to disconnect, to prevent further progress). However, we may be able to design a system that works 99% of the time, and eventually makes progress. Even Ripple faces a similar phenomenon from time to time, which is related to Buridan’s principle. (As a side note, I’d be curious to see any rigorous proofs that Ripple’s consensus process is byzantine-fault-tolerant. As a point of comparison, RAFT isn’t, but some derivatives are.)

The Vision (rigorous analysis needed)

Imagine people in a community with bad internet connectivity or high latency, perhaps on the Moon. They start out paying each other with trustlines, no third party needed. Some people would accept this “debt money” for small transactions, without worrying about how many others the sender might owe. Others would require a third party ledger.

This third party ledger could consist of Intercoin software running on 1 notary on 1 computer in someone’s basement. As long as the computer is honest, there is no need to replace it. Users can then provision other computers (e.g. their personal phones, or notaries) to check this computer once in a while. For example, they can “anonymously” ask it the hash of the latest state of a coin, and receive signed responses. If the notary ever issues conflicting accounts, they can gossip this to everyone, and push them to change the notary.

Here’s the thing… the result of that gossip must be stored on more computers, than there were previously, thereby “escalating” the cost of securing that consensus group. Perhaps we may go from 1 notary to 2. To simplify things, let’s say those 2 notaries were watching the 1 notary the entire time, eager to spring into action and “steal business” from it by earning transaction fees.

(I should mention that, since each coin has a capped value, transferring large amounts will require proportional transaction costs, making Intercoin closer to payment systems like VISA, than Bitcoin / Ethereum with its fixed costs for any amount. This is consistent with Intercoin’s purpose to be a payment system, and make crypto mainstream, to be used in everyday transactions rather than large transfers of money that may anger governments trying to enforce capital controls and anti-money-laundering measures).

Thus we go from trustlines (0 notaries) to 1 notary, to 2, 4, 8, etc. every time the previous group is corrupted. Or, we can grow more slowly, such as ceil(N^2) or something. The point is that, every time a smaller group is corrupted (coins turn red, or stay yellow for too long), they are automatically migrated to a different, larger group. It could be that the list of liars would then have to include lots of off-site notaries, and engage in migration and recovery.

To make this work, we would need to have:

  1. Notary group N watch a coin
  2. A larger “offsite” notary group N+1 watch notary group N by issuing “anonymous” requests to get the latest state of a coin
  3. End-users (recipients, etc.) would accept small payments without waiting for N+1 to confirm N is still healthy
  4. For larger payments, end-users would require notaries in group N to approve the transaction and group N+1 to confirm health of group N
  5. If group N+1 obtains a Proof of Corruption for group N (coin is red), or a Proof of Timeout (coin is yellow) then it may tell the network to migrate coins to group N+1 provision group N+2 to watch. They would then have to always store and report that group N is no longer used to secure transactions for its coins.

What’s great about this is that in the initial stages, anyone can set up Intercoin on a local network, even on the moon, and have it work quickly until the first Proof of Corruption is gossiped. Then they have to halt the network while they manually migrate it to another computer, with one or two remote notaries on Earth watching for more corruption, and accept the latency of checking larger transactions. If that ever gets corrupted then more notaries are recruited, and so on.

Since we have end-to-end encryption, the corruption would mostly be in terms of stalled consensus. Is there a way to detect this automatically, and recover automatically?

What do we know about such systems in practice? Hopefully by engaging in discussion here, we can arrive at some conclusions and improve the architecture for Intercoin 2.0