This website is no longer actively supported. Please see the Ripple Developer Center for up-to-date documentation and other resources.

Hash Tree

From Ripple Wiki
Jump to: navigation, search

Introduction

The Ripple network uses a hash tree structure that is indexed like a radix tree. This structure is used for account state trees, proposed transaction sets, and applied transaction trees.

Indexes and Hashes

Each data entry in the tree has two parameters, each a 256-bit value, an index and a hash.

The index is the key used to locate the information. In the case of proposed transaction sets or applied transaction trees, it's the transaction ID. In the case of account state trees, it's a hash of information that identifies what the node is about such as the account ID and/or currency.

The hash is the hash of the actual data stored in the node. In the case of proposed transaction sets, the index and hash are the same. The hash algorithm used is half of a SHA-512 hash.

For example, as an account state node's contents change, its hash will change. Its index in the state tree will remain the same.

Inner Nodes

Each inner node in a hash tree contains sixteen hashes corresponding to sixteen possible branches. A hash of zero indicates no node on that branch. A non-zero hash corresponds to the hash of the inner node or leaf node next along that branch.

Leaf Nodes

Each leaf node contains a single data element. The left node also contains the index. A leaf node also contains an identifier that identifies it as a leaf node of a particular type, account state tree entry, proposed transaction set entry, or applied transaction with metadata.

Radix Structure

The tree has a radix structure based on nibbles of the index. For example, to find a transaction whose ID begins '40EA...' you first fetch the root node. If it's a leaf node, either it's the desired node or not. Either way you're done.

If it's an inner node, you look at hash 4 (you can consider the hashes numbered 0 to 15), since 4 is the first nibble. If that hash is zero, then no nodes with indexes starting with '4' are present in the table, so you're done. If it's non-zero, you find the object with that hash. If it's a leaf node, either it's the desired node or not. Either way you're done.

If it's an inner node, you look at hash 0, since 0 is the second nibble in the index you are looking for. Again, if the hash is zero, you're done. No indexed that start '40' are in the tree. If it's a leaf node, you're done. That's the only node with an index that starts '40', and either it's the one you want or not.

If you find an inner node, you look at hash 14 (E = 14), and continue until you either find the leaf node you're looking for or find proof it is not in the tree.

Order Invariant

The tree structure is designed such that the final tree is dependent only on the nodes contained in the tree. That is, if two trees contain the same leaf nodes, they will also contain identical inner nodes and thus have an identical hash of their root node.

This is accomplished with basically only one rule. An inner node may not exist in a tree unless it has at least two leaf nodes under it. If it has no leaves under it, it can be removed. If it has one leaf under it, it can be replaced with that leaf node.

Wire and Prefix Formats

Each node is hashed to get its hash in a canonical format. This format is called 'Prefix' format because it contains a 32-bit prefix, such as 0x4C575200 for LedgerMaster nodes. This prefix prevents possible attacks where, for example, a transaction can be formed that looks like an inner node.

Only in prefix format can the hash of an object be verified. So if, for example, a peer is fetching an account state tree, it must ensure that each leaf node it receives is in fact the correct leaf node for that entry in that tree. To do this, it must check its hash against the hash in the inner node above it. To hash it, it must convert it into prefix format.

In some cases, a compressed format is used. For example, inner nodes frequently contain a number of zero hash entries. If an inner node is mostly empty, it may be more efficient just to list the non-zero branches along with the hash number. This format is called 'wire' format and is used when synchronizing trees with nodes that understand the tree structure.

Fetching, Synchronizing, Comparing, and Proving

To fetch a tree, you need every object whose hash is mentioned in an inner node of that tree.

A tree can be fetched with just an operation to get an object given its hash. Presumably, you know the ID of the tree you want to fetch. That ID is the hash of the root node. So you can fetch the root node.

If it's an inner node, you now have the hashes of more objects you can fetch by hash if you do not have them. You will eventually discover every leaf node and can fetch, by hash, all the data elements in the tree.

If smarter operations are available, synchronization can be a bit faster. A node can give you not just the node you requested by also the children of that node. This saves you an extra back and forth as you discover hashes mentioned in the nodes you fetch.

If you have a similar tree, you will discover branches you already have and will not need to synchronize those branches. For example, if you are fetching a candidate transaction tree and already have the transaction, there is no need to fetch it. If you are fetching an account state tree and already have one from the past, you will not need to fetch account state entries that have not changed.

Tree comparisons are needed to compare candidate transaction sets to discover disputed transactions. This comparison is also efficient. If a branch is identical, this will be immediately discovered. Only branches containing differences need to be investigated.

In addition, the tree structure provides simple proof that a particular entry is or is not in the tree. For example, to prove that a particular entry is in a tree whose hash is known, the path to that entry can be provided. A client can verify the hash of each node in the chain and see that it ends on the particular entry.

Similarly, to prove an entry is not in the tree, you walk to find that entry and continue until you find either a different leaf or a zero hash in an inner node. In either case, that path proves that there is no entry with that index in the tree.

In the Ripple system, this means that once a client knows the ID of the consensus ledger, it can easily be shown compact conclusive proof, by any node that has the full ledger, that a particular entry is or is not in that ledger.

In fact, because of the ledger structure, a fairly compact path can be constructed to any entry in any prior ledger, including proof that a particular transaction was or was not processed and what results it had.