Unedited Notes

From Ripple Wiki
Jump to: navigation, search

These are notes that I need to convert to proper docs. Just putting here in the meantime

> What I'd like to know is the exact protocol that nodes use to come to a consensus. I'm having trouble imagining how we can reliably get half the nodes to quickly agree on exactly which transactions are to go in the next block ("ledger" as Arthur called it), especially in a network with potentially millions of nodes and tens of thousands of transactions every block. I'm imagining in that scenario that very few nodes will propose identical blocks, and that a long mess of negotiations would be required to come to consensus for each block.

The key thing that makes this work is that every honest node values having a consensus much more highly than caring about what's in the consensus. And in the consensus-building process, nodes will become aware of transactions they weren't previously aware of. Assuming the transaction can apply to the consensus ledger, every honest node that encounters a transaction during the consensus-building process will, at the least, agree that it should go in the *next* ledger.

The algorithm will extend the consensus window adaptively if it's not able to reach a consensus. Once a consensus window starts, the world is "frozen" by honest nodes according to what they saw at the beginning of that window. So there's no moving target. Honest nodes won't introduce any new transactions into the consensus process for that interval (and dishonest nodes that do will just rapidly squander any dynamic trust they've built).

Also, consensus is being built from both ends. If some nodes is claiming something rare that you are having trouble acquiring, it will also have trouble getting nodes to agree to it. So while you're trying to move to it, it's trying to move away from it.

And the data exchange algorithm is very efficient. It uses a tree structure to quickly identify any sections that are identical. And if two nodes are advertising only slightly different trees, you will only need to acquire the common portions of those trees once.

We have a scheme to ratchet up transaction fees if the system gets overloaded. So an attacker has to pay more for each transaction than you are willing to pay for a single one in order to flood you out of the system.

> Bigger picture, I'm also wondering why there are blocks in the first place. Couldn't there just be a transaction-by-transaction consensus? Then there would be no need for negotiations over which block non-conflicting transactions are assigned, only over which of two or more conflicting transactions get confirmed. Since you need to resolve conflicting transactions whether you have blocks or not, and can't always use blocks to order conflicting transactions since they may all occur at the same time, it seems that having no blocks requires strictly fewer messages to be passed between nodes to come to consensus.

The main problem is that you would also need to agree on the ordering of transactions because the semantics of a transaction can vary based on when it occurs. This would reduce the rate at which transactions could be accepted by a huge factor. By agreeing on the previous ledger and a block of transactions as a group, they can then be applied in a "canonical order", assuring all nodes agree on the next ledger. If we could have thought of some efficient way to get nodes to agree on the sequencing of transactions in large sets and with large numbers of such transactions overlapping, we might have done things that way. But we think the added efficiency of not having to reach a consensus for each transaction drastically improves scalability.

Also, having these windows allows the world to be frozen. Nodes, of course, still process transactions, but they don't allow new transactions to affect the ongoing consensus process. Once a consensus is reached, it's like a checkpoint moves forward and then new transactions can be introduced into the next consensus process. So the block methodology also avoids the "trying to hit a moving target" problem.

> Another question: In your implementation, does a node receive word of new transactions through its validators, or through some other mechanism?

A new transaction could be introduced to the network by any mechanism a node supports. Once a node has it, it uses a somewhat complex algorithm to decide whether to forward it or not. Basically, if the fee is adequate and it appears the transaction can succeed if included in a consensus set, it is flooded over the network.

1) There are two other reasons for processing transactions in chunks.

First, it provides a clearly defined checkpoint after each set of agreed transactions is applied. All nodes should agree on this ledger and can sign its hash. Clients can then walk the ledger's hash tree to find proof of balances and so on without having to extend unnecessary trust.

Second, it makes it easier for nodes to join the network of validators. During the consensus window, you can finish acquiring the last closed ledger. Once you've done that, and if you have the consensus transaction set at the end of the consensus window, you can apply the consensus set and then you are in sync with the network.

2) It occurred to me that I may not have given you as much detail as you wanted about the mechanics of the consensus process. It has four basic components: timing, distributing information, establishing a consensus, and handling load.

Timing is pretty simple. A consensus window starts whenever there are unconfirmed transactions or whenever more than a fixed amount of time has passed from the last consensus window. Nodes synchronize their clocks using SNTP built into the server. They also track an offset between wall time and "network time" (which can occur due to propagation delays and the like).

During a consensus window, transaction sets also need to be distributed. Nodes vote for a transaction set by hash, and then you must acquire the transaction set corresponding to that hash or you cannot count that vote. When a node has a transaction set, it sends an "I have this set" message. Nodes first try to acquire transaction sets from nodes they have. If a node wishes to acquire a transaction set and no directly-connected node has that set, it sends an "I need this set" message. If a node has a set and sees an "I need this set" message, it sends an "I can get this set for you" message. The node can then fetch the set using the intermediary node as a proxy.

Nodes cache all data that passes through them. So if sets overlap, you won't fetch the data twice. The tree is organized so that if large sections are similar, that is detected quickly.

Consensus is established on each transaction and on the ledger's close time. Transaction consensus is based on the number of nodes we know have voted yes on a transaction. If a node doesn't know about a transaction, it will implicitly vote no by not including that node in its transaction set. Transaction sets are very easy to "diff" because of their hash tree structure. Nodes create a "disputed transaction" to track any transaction they are aware of a dispute over.

The consensus algorithm intentionally makes it very easy to vote out a transaction. This makes consensus much more robust. And all honest nodes that saw the transaction will include it in the next set anyway. A node takes an initial position on a transaction, and then waits until most nodes have taken an initial position themselves. It then looks to see if at least half of the nodes have voted yes on the transaction. If not, it switches its vote to no. If a consensus is not reached rapidly, this threshold increases so that more than half of the notes must vote yes or the node switches its own vote to no. This causes rapid avalanche to agreement.

A similar process tries to reach a consensus on the ledger's close time. A close granularity is selected for each ledger. Say it's 30 seconds. Each node votes on the time it saw the ledger close and announces that to other nodes. Nodes use the exact time to maintain their network time offset. But they check for consensus by rounding each time to 30 seconds. If there's no rapid convergence on a ledger close time, nodes "agree to disagree" and include the previous ledger's close time plus one second and a "failed to agree" flag. If node successfully agree on the ledger close time, the granularity is occasionally reduced. If nodes fail to agree on the ledger close time, the granularity is increased.

For load handling, nodes analyze the way a consensus process went and attempt to extend a consensus window or reduce load if they're not happy with the process. Key to this process is the way transaction fees work. We have a base transaction fee schedule that determines the relative costs of transactions. It's based on the realistic worst-case work a transaction makes us do. Nodes also maintain their own "multiplier". If a transaction doesn't meet a node's threshold based on its multiplier, the node votes no on that transaction and does not forward it.

If a transaction gets voted into the consensus set, however, a node must apply it regardless of the transaction fee. If a node cannot keep up with the consensus transaction set, it "bows out". Nodes that bow out still confirm that they witnessed the consensus process with signed confirmations (otherwise, you might worry that the network split because you're not seeing activity from nodes). Nodes that are still processing transaction see nodes bowing out and consider these votes to "slow down" the network. Honest nodes won't let the network shrink to a microscopic set of "super nodes". But slow nodes will gracefully bow out and let the network process at the higher rate a large number of nodes are capable of.

So if the network is attacked, as nodes start to get overwhelmed, they'll raise their transaction fees. The slowest nodes may bow out and let the network volume go up. But as too many nodes get overwhelmed, it will take higher and higher transaction fees to get 50% of the nodes to accept your transaction. An attacker will have to keep paying this higher amount on every single transaction while a legitimate user need only pay enough to get one transaction past the threshold.

> What selection heuristic do you propose for maximizing the chances of convergence on subsequent rounds? Do you pick every transaction that is in a majority of your neighbours' sets?

Yes. At first. You make an initial proposal. Wait a certain amount of time to let other node's initial proposals come in. And then include a transaction in subsequent proposals if and only if it has a majority. That's the starting algorithm. However, it can "piddle around" 50% for a long time, so the algorithm then has a heavy bias to exclude transactions.

Here's the high-level view on why our algorithm is stable:

1) Every honest node wants a consensus. They will wait as long as it takes in order to get one. We have no fixed amount of time in which a consensus must be reached.

2) There is no moving target during a consensus window. With respect to establishing that consensus, the world is frozen. There is a fixed amount of information to be known about the state, and more information is always gathered by nodes. They don't forget anything. The ratcheting up of the agreement level required ensures a consensus will eventually be reached.

3) Dishonest nodes cannot stop transactions from propagating to the vast majority of honest nodes. A node would have to have every single one of its connections to a dishonest node. (And we imagine 'core' nodes agreeing to directly connect to each other as a safety.)

4) So long as a transaction can be applied to the ledger and the vast majority of nodes see it before the consensus window, there's nothing dishonest nodes can do to stop honest nodes from including it. (Nodes will extend the consensus window if they aren't getting votes or acquiring transaction sets from trusted nodes that have voted.)

5) If a transaction does not get into a consensus set, but is valid, every honest node that has seen that transaction will vote to include it in the next consensus set.

6) No honest node particularly cares what's in the consensus set, provided it includes transactions that were seen well before the consensus window started. There is no way a dishonest party could get something into the transaction set that shouldn't be there and have that cause any harm. Invalid transactions will have no effect, even if they get in the consensus set.

One other thing I should mention: Our transaction format is robust against conflicts for resources such as exchange offers.

For example, if you form a transaction that involves exchanging US dollars for Euros, you don't have to specify the exact offer you want. You can specify the input and output currencies and issuers and the worst rate you're willing to pay. Someone can't conflict our your transaction by taking a specific offer you wanted, only by taking all offers you would have considered acceptable.

Essentially, the transaction itself contains constraints and multiple paths. The ledger is self-indexing so that at transaction processing time, the best result within those constraints and using those paths will result from processing the transaction, with the transaction failing if there is no way to meet the constraints.

So to conflict out a transaction that wasn't your own, you'd have to take every resource that transaction could have used, which will generally be much more than what it actually would have used. And it's better for almost everyone if a "bigger" transaction that satisfies more people gets in anyway.

> I think you're missing the rest of the canonical ordering method here... Can you elaborate?

Currently, we do a two-phase scheme.

First, we go through all the transactions in hash order. For those that "soft fail" (fail, but could succeed if retried), we sort them.

Then, we loop through the transactions in sorted order until no more get in or we hit a retry limit.

The sorted order is first by account and second by sequence number. (The account sort is slightly perturbed to avoid having "early accounts" and "late accounts". Though it's not clear which is better anyway.

This sorting doesn't enforce any kind of preference. It just saves loops. If we always tried only in hash order and there were 10 transactions from the same account that were sequenced, it would take many loops to "stumble on" the correct sequence.

>> Limited on-the-fly calculation of alternate routing might be applied.
> Can you elaborate on this too?

A transaction can include a set of paths and paths can include directories of offers. At transaction processing time, the internal indexing of the ledger is used to take from the best path and the best offers at that time. Partial payments from multiple paths can be used. For example, if I have Bitcoins at two gateways, need to pay you USD, and you accept USD at either gateway, I can take the best offers from the four possible exchange combinations of bicoins at either gateway for USD at either gateway to pay to you. Or I can pay you US dollars first by giving you back your own IOUs and then second giving you my IOUs at a gateway without having to specify in the transaction exactly how many of your IOUs I'll redeem. (Because other transactions might change our balance if people push through us.)

Our current mechanism is not really converging. Each node has a single open ledger and processes transactions initially against the open ledger. When the ledger closes, each node initially proposes the set of transactions that it applied to its open ledger. During this window, new transactions are applied to the same open ledger until consensus is reached.

Once consensus is reached, each node rewinds to the last closed ledger and applies the consensus transaction set. The becomes the new "last closed ledger". Then it creates a new open ledger starting from the new last closed ledger. It applies any transactions not voted into the consensus set. Then it applies any transaction applied against its open ledger after the consensus window started. This ledger becomes its new open ledger and when that ledger closes, it proposes its transaction set.

If you assume a constantly flow of transactions, it's this simple -- as soon as we reach a consensus, we start on the next consensus. In the idle case, it's a bit more complex, but basically it's a fixed time from when the previous ledger closed. Nodes track when other nodes have closed, so in the idle case, it will cascade.

In principle, a node could announce a set with a new transaction at any time. This isn't always illegitimate, since it could theoretically see a supermajority of nodes it trusts that we don't trust announcing that transaction. If the transaction does garner a majority in this consensus window, then it goes in. If not, then assuming the transaction can succeed, all honest nodes that saw it during this consensus window will vote it into the next.

> (When you say "not converging", do you mean it's not working properly, or that it's not using a "convergence method", whatever that might mean?)

I think I meant to say not overlapping. I mean it doesn't have multiple sets of transactions being considered at the same time. The network really has only two states, idle and consensus. If it's idle, it's only because there's no unconfirmed transactions. If it's in a consensus state, it's trying to build a consensus on one set of transactions.

> So there's basically only one cutoff per block, which is a predetermined time after the last block closed, where you start eliminating transactions not proposed by a majority of your neighbours and punting them to the next block in order to reach consensus?


It's a predetermined time after the last block closed only if there are no transactions. If there is at least one transaction every two seconds (the minimum consensus time to ensure nodes have a fair chance to make an initial proposal and to ensure nodes don't change their position based on incomplete information), it will effectively always be trying to reach a consensus because there will always be at least one unconfirmed transaction that appears valid.

The consensus process is heavily biased towards reaching a consensus by excluding a transaction, which makes it extremely robust. If a transaction unambiguously belongs in a set, every honest node should vote it in, so it doesn't matter. If a transaction was received late or otherwise doesn't unambiguously belong in a set, so long as it can go in the next set, every honest node will vote yes on including in that next set.

> Also, how can you be sure a ledger set is truly at consensus and closed, and there isn't another wave of eliminations coming from the other side of the network that will change it?

There's an absolute minimum two second window from ledger close until a node will change its position or check to see if there's a consensus. So a node has at least two seconds to get in an initial proposal.

You can tell if there's a consensus because you can look at the number of nodes whose last proposal did or didn't include a particular transaction. If there is a consensus on every disputed transaction, then there's a consensus on the set as a whole. Nodes propose a set of transactions by hash and that hash votes yes on every transaction in it and implicitly votes no on every transaction not in it.

The two second window is an absolute minimum. There's a safety based on how long the previous ledger took to reach consensus and how many validators participated in the previous consensus. If the previous round had, say, 100 validators you trust and you only see 12, you will wait extra time (before you change your position or declare consensus) to see if those nodes participate.

The vast majority of validators, absent some kind of network problem, will get a chance to get in an initial proposal. Once a node you trust sees an initial proposal from you, it will not declare a consensus until it can converge with you or it sees a supermajority. (In which case either you're dishonest, have a bad list of trusted nodes, or the supermajority will eventually convince you.)

Nodes also monitor whether other nodes they trust have already declared a consensus and this weighs into their decision to declare a consensus. So as the nodes you trust declare a consensus, you become assured that there aren't a set of disconnected nodes that haven't converged. If you care about those nodes, some of the nodes you trust will not be declaring a consensus.

> After ledger close, nodes propose candidate ledger sets at 2-second intervals until a consensus is reached.

Not quite. Right at ledger close, nodes propose an initial candidate set. There's at least a two second delay before any nodes change their position. Nodes can transmit new proposals as often as one per second.

> There is no agreement on consensus/ledger open or ledger close times, but I can tell what your peers are doing by the messages they are sending out, and keep in sync that way.

There is actually also a mechanism to reach a consensus on the time the ledger closed and this is included in the ledger. Nodes include in their initial proposal their best estimate of when the ledger closed. They try to agree within a certain time resolution. If they do, the close time is included in the ledger and the resolution may be increased. If they fail to agree, the close time plus one second is included in the ledger, a flag marks that the close time is not correct, and the resolution is increased for the next ledger. Nodes analyze the close times to determine their own offset to the network's view of close time to try to improve the close time agreement resolution. Nodes also track wall time and try to 'push' network close time towards wall time. A deterministic algorithm based on the previous ledger determines the close resolution for the next ledger. Nodes initial proposal includes the close time at full resolution, but agreement is only needed to within the close resolution.

The primary purpose of this close time is to permit a transaction to depend on the close time of the previous ledger to support things like time lock transactions.

> What's the difference between changing my position and transmitting a new proposal?

The terms could be used interchangeably. The proposal structure has a sequence number in it. The initial proposal for each consensus window has a zero sequence and contains the node's very best estimate of the ledger closing time and proposes the transactions in its open ledger just as it closed. Changed positions have a non-zero sequence number and have an adjusted ledger close time and transaction set hash to build consensus. The important thing about the first proposal is that it lets other nodes know that you plan to participate in this consensus process. If it creates any disputed transactions or a disagreement over the rounded ledger close time, then it will delay nodes that trust it from declaring consensus until they see a super-majority, the position is changed, or enough time passes that they conclude the node that proposed it has stopped participating.

> Did you mean that there's a minimum 2-second delay after the first proposal, and thereafter only minimum 1-second delays between further proposals?

Yes. Nodes wait at least two seconds from when they think the ledger closed to ensure that other nodes have a fair chance to make an initial proposal. Thereafter, nodes will not change their positions (by making a subsequent proposal) more often than once per second. These times are, of course, tunable. And the network does not need to agree on them. As soon as a node thinks the ledger closed, it makes its initial proposal, so other nodes will know that it has closed.

If a node does get left behind in the consensus process, it can issue "partial validations". These let nodes that trust it know that it wasn't cut off from the network, that it did witness the consensus process, that it wanted to participate, and that it was unable to. Other nodes use this to control transaction fees to ensure the network doesn't contract into a small group of nodes. While this would process transactions faster, it would make controlling the majority of the network too easy.

If the consensus process does somehow break, nothing particularly terrible happens. Nodes will see two or more different ledgers being validated by nodes they trust. There's an algorithm to decide which one wins, and nodes would rapidly resynchronize to one or the other of the candidate sets. This would disrupt the network and obviously if it ever happened it would be something we would need to fix immediately. You can't use this process to empty someone's account because you can't change anything without a transaction; however, you could undo a very-recently-confirmed transaction and substitute a conflicting one.

One trivial fix is to not consider a transaction confirmed until a ledger including it has been validated, not just converged on. This would add about 3 seconds to the confirmation time. Basically, you track how many nodes validated the last closed ledger. And until at least, 60% of those nodes validate the ledger that includes your transaction, you don't consider it confirmed. Actually, that might be a good idea to do by default. It's probably worth the extra few seconds for the extra confidence, particularly until we have enough operating history to be very sure the network won't split.

> Does "validated" here mean the same things as "proposed"?

No. A proposal suggests a particular candidate transaction set to be the one that turns one ledger into the next ledger during a consensus window. A validation certifies that a node believes that a particular ledger (including the results of transactions applied to it) is the correct last closed ledger and is published as soon after a node detects consensus as possible.

> Can you share the algorithm for deciding between multiple validated ledgers and resynchronizing the network?

At the simplest level, the one with the most trusted validations wins. If a node believes it has been correctly tracking the network, the node counts itself. There are simple tie breakers if needed.

It is theoretically possible that a near 50/50 network ledger split could leave the network the network in disagreement for awhile, possibly even requiring manual intervention. If that ever happened, we'd have to track down the cause and repair it as quickly as possible. The primary goal of the consensus algorithm design is to make this nearly impossible.

Of course, it is always visible if the network is split. You'll see nodes sending validations for different ledger chains. A node always knows how many trusted nodes agree with it and how many trusted nodes disagree with it.