High-level architecture
The architecture is composed of the following components:
- EdgeNode is a special type of node that is responsible for processing requests.
- WriterNode is responsible for executing blocks and saving the resulting states to the Tetrapod master database.
- Tetrapod is the remote master database from which EdgeNodes read partial states.
This is a departure from the traditional blockchain client architecture: the many roles of a traditional blockchain client are split into several smaller components.
The smaller components are easier to scale and maintain, and the decoupling of the components allows for more flexibility in terms of deployment.
If you are curious about the performance gains of this architecture, refer to the Benchmarks section.
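To make the division of roles concrete, here is a minimal Python sketch. It is not our implementation: `TetrapodStore`, `WriterNode.execute_block`, and `EdgeNode.get_balance` are hypothetical names, the remote database is modeled as an in-process dict, and a block is assumed to be a list of `(account, delta)` balance changes. What it illustrates is that only the writer commits state, while any number of EdgeNodes read individual keys from the same store.

```python
class TetrapodStore:
    """Hypothetical stand-in for the remote master database (an in-process dict here)."""

    def __init__(self):
        self._data = {}
        self.head = 0  # height of the latest committed block

    def get(self, key):
        # Partial-state read: callers fetch individual keys, never the whole state.
        return self._data.get(key)

    def commit(self, height, writes):
        # Only WriterNode calls this; it persists the state produced by one block.
        self._data.update(writes)
        self.head = height


class WriterNode:
    """Executes blocks and saves the resulting states to the store."""

    def __init__(self, store):
        self.store = store

    def execute_block(self, height, txs):
        # Assumed block format: a list of (account, delta) balance changes.
        writes = {}
        for account, delta in txs:
            current = writes.get(account, self.store.get(account) or 0)
            writes[account] = current + delta
        self.store.commit(height, writes)


class EdgeNode:
    """Serves read requests; keeps no local database and never writes."""

    def __init__(self, store):
        self.store = store

    def get_balance(self, account):
        return self.store.get(account) or 0
```

Because the EdgeNodes in this sketch hold no state of their own, adding another one is just a matter of constructing `EdgeNode(store)`.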
The EdgeNode
EdgeNode is a streamlined version of a node, featuring only the state machine and the request handlers. It neither participates in consensus nor executes any blocks.
Because it does not take part in any of the write operations of a blockchain network, it is completely free of the performance degradation usually associated with read-write contention.
The design allows for the following properties:
- Stateless. EdgeNode can start without a local database, and boot-up time is usually less than 2 seconds.
- Better concurrency. Because EdgeNode does not suffer from read-write contention, we can apply patches that allow better concurrency. We carefully designed EdgeNode so that there are no mutual exclusion locks (mutexes) on the hot path.
- Superior latency using an in-memory database as a cache. Depending on a remote database adds latency when resolving certain data. To mitigate this, we augmented EdgeNode with an in-memory database that uses an algorithm likely to keep hot data in memory. This way, we can effectively suppress unnecessary round-trips.[1]
[1] In practice, achieving a 100% cache hit rate is impossible. However, given the fully concurrent nature of EdgeNode and the close proximity between the EdgeNode and the remote database, round-trip latency is usually negligible (< 1 ms). This means that the combined gains from concurrency and the in-memory database usually yield superior overall performance, while retaining the ability to resolve "missing" data efficiently.
As a result, a cluster of EdgeNodes can handle much higher throughput than a traditional blockchain client.
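The cache-then-fallback behavior described in the EdgeNode properties can be sketched as a read-through cache. This is an illustration under stated assumptions: least-recently-used (LRU) eviction stands in for our actual eviction algorithm, which is not described here; the sketch is single-threaded and does not model EdgeNode's mutex-free hot path; and it omits invalidating entries when WriterNode commits new state. `ReadThroughCache` and `remote_get` are hypothetical names.

```python
from collections import OrderedDict


class ReadThroughCache:
    """In-memory cache in front of a remote store (hypothetical sketch).

    Assumption: LRU eviction; concurrency and invalidation are not modeled.
    """

    def __init__(self, remote_get, capacity):
        self.remote_get = remote_get  # callable: key -> value, one round-trip per call
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        # Cache miss: resolve the "missing" data from the remote database.
        self.misses += 1
        value = self.remote_get(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

Every miss costs one round-trip to the remote database, so the hit rate determines how often that latency is paid.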
Combining these properties, we can achieve elastic scalability.
Elastic Scalability
Elastic scalability is the ability to scale up and down based on demand, and is a highly sought-after property of any system at scale.
Because EdgeNode is stateless, auto-scaling is as simple as spinning EdgeNodes up or down, without any storage replication.
This is a significant improvement over the traditional blockchain client architecture with the following benefits:
- Faster adaptation. EdgeNodes can be scaled up or down in a matter of seconds, so adapting to sudden traffic surges is no longer a problem.
- Unlimited scalability. We can spin up as many EdgeNodes as we want. With the efficiency described above, aggregate throughput grows almost linearly with the number of EdgeNodes.
- Cost-effectiveness. EdgeNode requires fewer resources to run, so we can run more EdgeNodes on the same amount of resources and achieve better cost-effectiveness.
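With near-linear scaling, deciding how many EdgeNodes to run reduces to simple arithmetic. The helper below is a hypothetical sketch, not our auto-scaler: it assumes a per-node capacity measured in requests per second, a target utilization that leaves headroom for surges, and illustrative default bounds (`min_nodes`, `max_nodes`, and `target_utilization` are all assumptions).

```python
import math


def desired_edge_nodes(request_rate, per_node_capacity, min_nodes=1, max_nodes=100,
                       target_utilization=0.7):
    """Number of EdgeNodes needed for request_rate, assuming near-linear scaling.

    per_node_capacity: requests/s one EdgeNode can sustain (to be measured).
    target_utilization: fraction of capacity to use, leaving headroom for surges.
    Defaults are illustrative assumptions, not recommended values.
    """
    if per_node_capacity <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    needed = math.ceil(request_rate / (per_node_capacity * target_utilization))
    # Clamp to the configured bounds.
    return max(min_nodes, min(max_nodes, needed))
```

Because no storage has to be replicated, acting on the result only means starting or stopping EdgeNode processes.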
The WriterNode
WriterNode is a node dedicated to executing blocks and saving the resulting states to the Tetrapod master database.
Whereas EdgeNode is designed for high throughput, WriterNode is designed for high reliability. The reliability of WriterNode is paramount to the reliability of the entire cluster: it is the "main computer" that computes the next state of the blockchain, and any failure in WriterNode will result in a mismatched state, causing the cluster to fall out of consensus.
WriterNode is shielded from external factors that may impact its reliability. It does not participate in consensus (this is delegated to a separate process), nor does it handle any RPC requests (which can put pressure on the block execution process).
However, uninterrupted operation alone does not guarantee increased reliability, so we have implemented additional features to make WriterNode extra reliable in disaster scenarios.
Rollbacks for consensus failures
From time to time, consensus failures occur and re-syncs are required. These re-syncs, whether from a snapshot or starting from genesis, are extremely time-consuming and a major pain point for many blockchain clients.
WriterNode features a zero-time rollback mechanism to revert to a previous state. Internally, rollbacks are treated as a fork in the canonical chain; it simply rewinds to a previous state and reapplies any blocks thereafter. This feature ensures a speedy recovery from consensus failures.
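One way to make rollbacks cheap is to keep an undo journal per block, so rewinding touches only the keys changed since the target height rather than the whole state. The sketch below assumes this journal design (the section above does not specify how WriterNode stores its history), models state as a Python dict, and uses hypothetical names (`RollbackableState`, `apply_block`, `rollback`, `reorg`). `reorg` treats the rollback as a fork: it rewinds, then reapplies the canonical blocks.

```python
_MISSING = object()  # marks keys that did not exist before a block wrote them


class RollbackableState:
    """Key-value state with a per-block undo journal (hypothetical sketch).

    Assumption: each block is a dict of key -> new value.
    """

    def __init__(self):
        self.state = {}
        self.height = 0
        self._journals = []  # _journals[i] holds the values overwritten by block i + 1

    def apply_block(self, writes):
        # Record prior values before overwriting them.
        undo = {key: self.state.get(key, _MISSING) for key in writes}
        self.state.update(writes)
        self._journals.append(undo)
        self.height += 1

    def rollback(self, target_height):
        """Rewind to target_height by replaying undo journals newest-first."""
        if not 0 <= target_height <= self.height:
            raise ValueError("target height out of range")
        while self.height > target_height:
            undo = self._journals.pop()
            for key, old in undo.items():
                if old is _MISSING:
                    del self.state[key]
                else:
                    self.state[key] = old
            self.height -= 1

    def reorg(self, target_height, new_blocks):
        """Treat the rollback as a fork: rewind, then reapply the canonical blocks."""
        self.rollback(target_height)
        for writes in new_blocks:
            self.apply_block(writes)
```

In this sketch the cost of a rollback is proportional to the number of keys written by the rewound blocks; a real implementation would also need to prune old journals, which is omitted here.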
Zero downtime upgrades
With WriterNode being a separate component in our stack, turning it off does not impact overall RPC availability (although no new states are produced while it is down).
Leveraging this, we can fully automate the upgrade procedure: the new WriterNode version is pre-packaged in the background, and only the WriterNode is swapped out when the new version is due to roll out.
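The swap can be sketched as follows. This is a simplified, hypothetical model: `Store`, `Writer`, `upgrade_writer`, and `edge_read` are illustrative names, and preparing the new version in the background is reduced to constructing a new `Writer`. It shows the two properties the procedure relies on: reads keep succeeding against the last committed state while the writer is swapped, and the new writer resumes at the block after the last committed height.

```python
class Store:
    """Stand-in for Tetrapod: committed state plus the last committed height."""

    def __init__(self):
        self.state = {}
        self.head = 0


class Writer:
    """Hypothetical WriterNode that commits blocks strictly in order."""

    def __init__(self, store, version):
        self.store = store
        self.version = version
        self.running = True

    def commit(self, height, writes):
        if not self.running:
            raise RuntimeError("writer is stopped")
        if height != self.store.head + 1:
            raise ValueError(f"expected block {self.store.head + 1}, got {height}")
        self.store.state.update(writes)
        self.store.head = height

    def stop(self):
        self.running = False


def upgrade_writer(old, new_version):
    """Swap only the writer; the store and every reader stay online.

    The replacement resumes at store.head + 1 (enforced by Writer.commit).
    """
    old.stop()
    return Writer(old.store, new_version)


def edge_read(store, key):
    """EdgeNodes read straight from the store, so reads work during the swap."""
    return store.state.get(key)
```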
Tetrapod
Storage Over-Redundancy is the phenomenon where the same data is stored in multiple places while only a small fraction of the aggregate stored data is actually utilized.
In our design, data is stored once in a single remote database, and nodes read partial states from it. This is a departure from the traditional blockchain client architecture, where the entire state is replicated on every node. This reduces the aggregate size of data under management and makes better use of each byte stored.
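The difference in aggregate storage is simple arithmetic. With N nodes each keeping a full copy of a state of S bytes, the cluster stores N × S bytes; with one shared database it stores S bytes times however many replicas the database keeps for durability. The replica count is an assumption (the section above does not state one), and `aggregate_storage` is a hypothetical helper.

```python
def aggregate_storage(state_bytes, nodes, db_replicas=1):
    """Bytes stored cluster-wide under each model (hypothetical helper).

    replicated: every node keeps a full copy of the state (traditional client).
    shared: one remote database holds the state, stored db_replicas times
            (db_replicas is an assumption; it is not specified in the text).
    """
    replicated = state_bytes * nodes
    shared = state_bytes * db_replicas
    return replicated, shared
```

Note that the shared figure does not depend on the number of nodes, which is what allows EdgeNodes to be added without adding storage.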