Amazon Dynamo (Not DynamoDB)
Yes, you read that right. Amazon developed Dynamo in-house and described it in a research paper published in 2007. Dynamo was used by Amazon’s internal services and was designed to be a better fit than SQL databases for certain use cases.
Why Did Amazon Develop an In-House Database Solution?
Below are the main reasons:
Scalability: SQL databases often have limitations in scalability.
High Availability: To provide an "ALWAYS-ON" experience.
Key-Based Storage: There was no requirement for complex queries offered by RDBMS; often, key-based retrieval was sufficient for use cases.
Failure Tolerance: Dynamo treats failure handling as the normal case, so node or network failures do not have to hurt availability or performance.
Operational Efficiency: At Amazon’s massive scale, the company could not afford the overhead of traditional databases, which created the need for a purpose-built solution.
Flexible Consistency Models: Dynamo allowed applications to balance consistency and availability based on their needs.
Dynamo is an Eventually Consistent Database. What is Eventual Consistency?
A database is not a single entity; it consists of multiple nodes behind an abstraction layer. A simple example is the leader-follower replication pattern: when a write request arrives for a key, the leader serves it (that is, it stores the data and returns a success response) and then asynchronously propagates the data to its followers. Dynamo provides eventual consistency in a similar way: updates are propagated to all replicas asynchronously, so a put() call may return to its caller before the update has been applied at every replica. Until the replicas catch up, a read may return an older value; in exchange, the system stays highly available even during failures.
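To make the idea concrete, here is a minimal in-memory sketch of asynchronous replication. It is not Dynamo's implementation: the Replica and Cluster classes, the propagate() method, and the "cart:42" key are all hypothetical names invented for illustration, and the "network" is just a queue.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data (hypothetical toy class)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster: a write is acknowledged once a single replica has stored it."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # updates accepted but not yet applied everywhere

    def put(self, key, value):
        first, *rest = self.replicas
        first.data[key] = value              # stored on one replica right away
        for replica in rest:
            self.pending.append((replica, key, value))  # queued for later
        return "ok"                          # the caller gets success immediately

    def propagate(self):
        """Stand-in for background replication: drain the queue."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value

    def read_from(self, index, key):
        return self.replicas[index].data.get(key)


cluster = Cluster(["a", "b", "c"])
cluster.put("cart:42", ["book"])
stale = cluster.read_from(2, "cart:42")   # None: replica "c" has not caught up yet
cluster.propagate()
fresh = cluster.read_from(2, "cart:42")   # ['book']: replicas have converged
```

The window between put() returning and propagate() finishing is exactly where a reader can see an old value.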
Dynamo is an "Always Writable" Data Store. What Does This Mean?
Most databases resolve conflicts at write time. As a result, a put() operation may be rejected with an error when it conflicts with another update. Dynamo instead defers conflict resolution to the read phase. This ensures that writes are never rejected, maintaining high availability even during network partitions or failures. Dynamo also implements hinted handoff: if the intended node is unavailable, the write is stored temporarily on another node, which forwards the data once the failed node recovers.
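Hinted handoff can be sketched roughly as follows. This is a simplified illustration under assumptions, not Dynamo's actual code: the Node class, the write() and hand_off() functions, and the explicit fallback argument are hypothetical, and a real system would pick the fallback node from its replica list rather than take it as a parameter.

```python
class Node:
    """Toy storage node (hypothetical) that can also hold hints for other nodes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (intended_node_name, key, value) held for others


def write(key, value, intended, fallback):
    """Accept the write even if the intended node is down."""
    if intended.up:
        intended.data[key] = value
        return intended.name
    # Intended node is unreachable: park the write on the fallback with a hint.
    fallback.hints.append((intended.name, key, value))
    return fallback.name


def hand_off(holder, nodes_by_name):
    """Forward parked writes to any intended node that has recovered."""
    remaining = []
    for target_name, key, value in holder.hints:
        target = nodes_by_name[target_name]
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target_name, key, value))  # keep for next attempt
    holder.hints = remaining


a, b = Node("a"), Node("b")
nodes = {"a": a, "b": b}

a.up = False
accepted_by = write("k", "v1", intended=a, fallback=b)  # "b": write not rejected

a.up = True
hand_off(b, nodes)  # "a" now holds k = v1, and "b" has no hints left
```

The write succeeds while node "a" is down, which is the "always writable" property in miniature.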
What is Conflict Resolution, When Does It Happen, and Why Is It Needed?
A simple conflict arises when two clients update the value of the same key at the same time, and their write requests go through two different nodes of the database. This results in conflicting versions of the key’s value (similar to a "merge conflict" in Git).
Conflict resolution is needed to ensure the datastore reaches a consistent state and prevents data loss or corruption.
How Does Dynamo Resolve Conflicts on Reads?
Whenever a get(key) request is performed, Dynamo checks for the available versions of the value corresponding to that key. It follows these scenarios:
Only one version exists → Return it to the user.
Multiple versions exist, and their vector clocks show that one version descends from the others → The newer version supersedes the older ones and is returned.
Multiple conflicting versions exist, and precedence cannot be determined using vector clocks → Dynamo returns all versions to the client application, which has to reconcile them itself (similar to resolving a merge conflict in Git). In Amazon’s shopping cart service, for example, conflicting versions were merged automatically by combining the cart items instead of choosing one version over another.
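The three cases above can be sketched in a few lines. This is an illustrative sketch, not Dynamo's code: vector clocks are modeled as plain dicts mapping a node name to a counter, and descends(), resolve(), and merge_carts() are hypothetical helper names. The cart merge is a simple set union, in the spirit of the shopping cart example.

```python
def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(versions):
    """Drop versions that another version supersedes; keep the rest.

    `versions` is a list of (value, clock) pairs. One survivor means the
    store can answer by itself; several survivors are concurrent and must
    be handed back to the application.
    """
    survivors = []
    for i, (value, clock) in enumerate(versions):
        superseded = any(
            j != i and other != clock and descends(other, clock)
            for j, (_, other) in enumerate(versions)
        )
        if not superseded and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


def merge_carts(versions):
    """Application-level reconciliation: union of all cart items."""
    merged = set()
    for items, _clock in versions:
        merged |= set(items)
    return sorted(merged)


v1 = (["book"], {"A": 1})
v2 = (["book", "pen"], {"A": 2})           # written after v1 on node A
v3 = (["book", "lamp"], {"A": 1, "B": 1})  # written after v1 on node B

ordered = resolve([v1, v2])         # [v2]: v2 descends from v1
concurrent = resolve([v1, v2, v3])  # [v2, v3]: neither descends from the other
cart = merge_carts(concurrent)      # ['book', 'lamp', 'pen']
```

One known trade-off of a union merge, noted in the Dynamo paper, is that an item removed in one version can reappear after the merge.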
Additional Features of Dynamo
No Single Point of Failure: Dynamo avoids a central master node, ensuring it can scale and function even if some nodes fail.
Efficient Data Distribution: Data is spread across multiple nodes using consistent hashing to distribute load and reduce bottlenecks.
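A rough sketch of consistent hashing is shown below. It is a simplified model rather than Dynamo's partitioning scheme: the HashRing class, its method names, and the vnodes parameter are hypothetical, and replication to several successor nodes is left out. Each node is placed at several points on the ring, and a key belongs to the first node point found clockwise from the key's own position.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point on the ring using MD5 (a 128-bit integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with several points per node."""

    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # First ring point clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._ring, (ring_position(key), ""))
        _pos, node = self._ring[idx % len(self._ring)]
        return node


ring = HashRing(["a", "b", "c"])
keys = [f"key{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("c")
after = {k: ring.node_for(k) for k in keys}

# Only keys that lived on the removed node change owner.
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in moved: when a node leaves, only the keys it owned are reassigned, while every other key stays where it was.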
Conclusion
Built for Availability: Dynamo prioritizes uptime and scalability over strict data consistency.
Key Features: It introduced concepts like eventual consistency, always-writable operations, and application-managed conflict resolution.
Influence on Databases: Dynamo's design influenced later NoSQL databases such as Apache Cassandra and Riak.
Customizable Balance: Applications can adjust availability, consistency, and performance settings based on their needs.
You might be wondering: what is DynamoDB, then? It is a different system that inherited some concepts, and its name, from Dynamo. We will cover it in another post.
References:
Dynamo paper: https://www.amazon.science/publications/dynamo-amazons-highly-available-key-value-store