I certainly wouldn't dispute that it can be hard to reason about concurrent operations in traditional database systems - particularly if you want to do it across multiple database systems, which do indeed have differing implementations. That said, if the author thinks eventual consistency is easy/not-that-bad/whatever by comparison, I would tend to question their understanding of programming for eventual consistency. Eventual consistency for non-trivial applications is really, really, really hard to get right.
Consistency in relational databases is a hard topic because maintaining consistency in any concurrent system is hard. Traditional systems provide you with a set of tools (isolation levels, foreign keys) that make a decent programmer capable of building a safe concurrent application. Throwing away those tools and replacing them with nothing does not make life easier. The tool is easier to understand, but the problem is harder to solve.
Something that also stands out to me from the article is the general negativity about implementations the author knows in depth, and positivity about the implementations he doesn't know well and has only read about. I think this is what often drives people to adopt new things: they avoid some old problems, but the new problems they create aren't well understood yet, and possible solutions to those new problems even less so.
It's based on fairly recent research by Michael J. Cahill, et al.
It's simple. If you are confused, then set `default_transaction_isolation = 'serializable'`. You will get errors if there's a data race.
Given that, what's the point of the article? That sub-SERIALIZABLE modes have complex semantics? Yes, that's true, but they are still much more likely to help you than the NoSQL "you're on your own to avoid races" approach.
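To make that concrete, a minimal sketch (assuming PostgreSQL via psycopg2; the connection string and `accounts` table are hypothetical): run the transaction at SERIALIZABLE and treat a serialization failure as a signal to retry.

```python
import psycopg2
from psycopg2 import errors, extensions

# Hypothetical connection string and table; adjust for your setup.
conn = psycopg2.connect("dbname=test")
conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_SERIALIZABLE)

try:
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
            (balance,) = cur.fetchone()
            cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                        (balance + 100, 1))
except errors.SerializationFailure:
    # SQLSTATE 40001: a data race was detected; rerun the transaction.
    pass
```

The same effect can be had server-wide by setting `default_transaction_isolation = 'serializable'` in postgresql.conf.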
If you want to avoid lots of really challenging problems, PostgreSQL is often the best bet by far.
I'm pretty sure every attempt to solve all development and maintenance problems in a single piece of software will fail. And asking "How do I do X with Y?" is the wrong question. "How can you ensure zero points of failure?" is the right question, not "How does PostgreSQL help you ensure zero points of failure?".
Postgres doesn't. But VoltDB, FoundationDB, and MarkLogic all have serializable transactions and no single point of failure, if you are looking to solve both problems at the same time.
I agree with the author that the various levels of isolation, etc. within the current crop of SQL databases are a morass. I’ll point to some recent fine work by Martin Kleppmann (https://github.com/ept/hermitage) that explores the issue and shows how many systems fall short of serializable isolation in their default modes. (And sometimes in modes labeled “serializable”!) In his tests, three databases actually achieve full serializability: PostgreSQL, MS SQL Server, and FoundationDB.
But don’t give up on ACID yet! If you can actually get real serializability, you have a very powerful model that is also very simple to reason about. Serializable isolation gives you a virtual “whole database lock” that lets you modify lots of things all over the database before you “unlock” it and hand the database to the next client. The amazing thing about MVCC combined with serializable isolation is that you get to use this "simplest possible" mental model even though hundreds or thousands of concurrent clients might be all hitting the database at the same time.
> The amazing thing about MVCC combined with serializable isolation is that you get to use this "simplest possible" mental model even though hundreds or thousands of concurrent clients might be all hitting the database at the same time.
I think you've proved the author's point better than he ever could. With serializable isolation, you don't have concurrent clients all hitting the same data at the same time...you've got locking/blocking. The two things you're trying to put together are mutually exclusive.
The whole point of the article is that when you're trying to write concurrently to a database, it will be necessarily complex. ACID databases attempt to sweep that complexity under the rug and create a simpler mental model. But those abstractions are leaky and bubble up to the surface in ways that are often unexpected or difficult to handle. This is especially true when you try to scale beyond a single machine...achieving serializable isolation whilst replicating between database nodes is all but impossible.
Most NoSQL databases take the approach of explicitly exposing the complexity to the application with the assumption that concurrency is an absolute requirement and the application will understand how to achieve consistency better than a generic data tier can.
> I think you've proved the author's point better than he ever could. With serializable isolation, you don't have concurrent clients all hitting the same data at the same time...you've got locking/blocking. The two things you're trying to put together are mutually exclusive.
Not so. With serialization as implemented by PostgreSQL it's perfectly possible for serializable transactions to avoid blocking/rollback on read/write conflicts as long as there's some reasonable serialization order. Some DBs (default config of SQL Server for example) will lock in this situation, but it's not a requirement of the model.
If you have lots of clients writing to the same data at the same time, yes, you will have lock waits/deadlocks. At that point, though, unless your updates are trivial (i.e. only ever hitting one lockable unit per logical operation) you've got to deal with what is likely some extremely complicated update logic all by yourself. And if your updates are trivial, well, you can probably structure your relational DB so that you only ever have lock waits, which is often okay. Sure, there are situations where it's not (highly contended counter, say), but that's certainly a small minority of situations.
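To illustrate the non-blocking behavior, a sketch of the classic write-skew case (assuming psycopg2; the DSN and `oncall` table are hypothetical): neither transaction blocks the other, and PostgreSQL's SSI aborts one of them only because no serialization order exists (shown here at commit, though the error can also surface earlier).

```python
import psycopg2
from psycopg2 import extensions

a = psycopg2.connect("dbname=test")  # hypothetical DSN
b = psycopg2.connect("dbname=test")
for conn in (a, b):
    conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_SERIALIZABLE)
ca, cb = a.cursor(), b.cursor()

# Invariant we'd like to keep: at least one doctor stays on duty.
# Both transactions read the same predicate; neither blocks.
ca.execute("SELECT count(*) FROM oncall WHERE on_duty")
cb.execute("SELECT count(*) FROM oncall WHERE on_duty")

# Each writes a *different* row based on what it read; still no blocking.
ca.execute("UPDATE oncall SET on_duty = false WHERE doctor = 'alice'")
cb.execute("UPDATE oncall SET on_duty = false WHERE doctor = 'bob'")

a.commit()
b.commit()  # fails with a serialization error (SQLSTATE 40001):
            # SSI found no serial order for the two transactions
```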
I would argue that if a developer finds typical ACID semantics hard to understand, they almost certainly aren't prepared for managing eventual consistency.
edit: If I might add to this, a quote from the Google F1 paper:
"We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level. Full transactional consistency is one of the most important properties of F1."
Hmm... Serializable isolation and concurrency go together fine and certainly don't require blocking or even locking when using MVCC and optimistic concurrency. For that matter serializable transactions and scalability are not incompatible either (though few NoSQL systems have tackled the problem yet).
I think the author's main point is that sub-serializable isolation levels are confusing and can lead to wrong behavior in subtle ways. I agree, but to me it sounds like the failing is in using too weak a consistency/isolation model, not too strong a one!
> Hmm... Serializable isolation and concurrency go together fine and certainly don't require blocking or even locking when using MVCC and optimistic concurrency. For that matter serializable transactions and scalability are not incompatible either (though few NoSQL systems have tackled the problem yet).
For non-conflicting operations, I agree entirely that serializable transactions can scale just fine. In fact, just about every serializable concurrency control algorithm (e.g., two-phase locking, partitioned OCC, MVCC with clever timestamp allocation) can scale just fine for non-conflicting operations.
However, for conflicting read-write operations, serializability will incur higher costs than many alternatives (e.g., eventual consistency, weak isolation implemented in a scalable manner). Serializing access to shared data items fundamentally requires some serial execution.
This fundamental overhead is a key reason why almost every commodity RDBMS defaults to one of these weaker isolation levels and why databases like Oracle don't support serializable isolation at all. It's not that their respective database architects don't know how to implement serializability -- it's that weak isolation is faster and, in many cases, reduces deadlocks and system-induced aborts, even if it increases programmer burden.
> Serializing access ... fundamentally requires some serial execution.
This is true, but I don't think this requirement limits scalability, because the serial part can be arbitrarily cheap (e.g. approve ordered batches of work; see the toy sketch below).
You also say that many databases could implement serializable transactions but don't because of the "higher costs" and that "weak isolation is faster". This sounds like a tradeoff to me, so, of course, there will never be one right answer. Well, maybe someday for problems that permit I-confluence :)
However, the article attests to the high costs of sacrificing serializability in programmer productivity and system complexity. Those are serious downsides that need to be weighed very carefully against any actual, measured, performance advantages that are on the table.
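A toy sketch of the batching idea (all names hypothetical): the serial section only assigns a position to each batch, so its cost per item shrinks as batches grow, while the real work happens in parallel.

```python
import itertools
import threading

class BatchSequencer:
    """The only serial step: hand out one commit position per batch."""
    def __init__(self):
        self._next = itertools.count()
        self._lock = threading.Lock()

    def approve(self, batch):
        with self._lock:           # O(1) critical section per batch
            position = next(self._next)
        return position, batch     # applying the batch happens in parallel
```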
> Serializing access to shared data items fundamentally requires some serial execution.
Sure, but typical MVCC systems that provide SI will serialize transactions in some cases anyway. For example, if two transactions try to update the same row in Postgres, the second will block until the first one commits/aborts. This is true under RC, RR, or SERIALIZABLE.
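A sketch of that blocking behavior (assuming psycopg2 at the default READ COMMITTED level; the DSN and `accounts` table are hypothetical):

```python
import threading
import psycopg2

DSN = "dbname=test"  # hypothetical
a = psycopg2.connect(DSN)
b = psycopg2.connect(DSN)

ca = a.cursor()
ca.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
# Session A now holds the row lock until it commits or aborts.

def second_writer():
    cb = b.cursor()
    # Blocks on A's row lock; the same is true under RR and SERIALIZABLE.
    cb.execute("UPDATE accounts SET balance = balance + 2 WHERE id = 1")
    b.commit()

t = threading.Thread(target=second_writer)
t.start()
t.join(timeout=2)
print("second writer blocked:", t.is_alive())  # True: still waiting
a.commit()  # releases the lock; the blocked UPDATE now proceeds
t.join()
```

Under RR or SERIALIZABLE in Postgres, the unblocked UPDATE would then fail with a serialization error instead of proceeding.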
> This fundamental overhead is a key reason why almost every commodity RDBMS defaults to one of these weaker isolation levels and why databases like Oracle don't support serializable isolation at all.
I don't think that's true: if you already have an MVCC implementation that provides SI, the additional overhead of providing SSI is relatively modest. I'd say the main reason Oracle doesn't provide SERIALIZABLE is historical: their MVCC implementation predated Cahill's work by many years, and people literally didn't know how to use MVCC to provide SSI at the time.
> Serializable isolation and concurrency go together fine and certainly don't require blocking or even locking when using MVCC and optimistic concurrency
If you're willing to embed retry logic throughout your application, sure. But that's a steep price to pay if your application can handle an eventually-consistent state.
They aren't mutually exclusive. You can't modify the same row concurrently and have serializability, but that isn't a common requirement. The common requirement is that you can have concurrent transactions operating on different pieces of data.
If you're not modifying the same data, then there's no need for ACID...AD will do just fine. The whole point of the CI is to govern what happens when there is contention for the same data. Brushing that occurrence aside and saying it's rare is asinine...it's going to happen and programmers need to account for it. With serializable isolation, you risk your application blocking (can be acceptable) or even deadlocking (never acceptable).
That probably came across wrong. What I meant was that the ability to do concurrent access is the important thing, not the performance of it. CPUs aren't any good at contended access to memory cells, but they still allow it and it works. If you want your system to scale to many threads you need to break the sharing.
Same with databases. The important thing is that it gives you strong consistency when there is contention. It'll never be as performant as operating on different rows concurrently though. When you touch different rows, you don't feel the pain of locking, which is the common case.
I agree, and one thing that is missing from this discussion is that many document databases are atomic around a single document update. That gives the developer a very easy model to work with for a lot of things: you can update any part of a document without conflicts (see the sketch below). In a relational system that would probably require updates/reads across multiple tables. Of course it won't work for everything, but you get a model that's easy to understand.
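For example (a sketch assuming MongoDB via pymongo; the collection and fields are made up), several changes to one document apply atomically with no transaction machinery:

```python
from pymongo import MongoClient

client = MongoClient()
orders = client.shop.orders  # hypothetical database/collection

# All of these changes to a single document apply atomically:
# other readers see either none of them or all of them.
orders.update_one(
    {"_id": 42},
    {
        "$set": {"status": "shipped", "shipping.carrier": "UPS"},
        "$inc": {"items_shipped": 3},
        "$push": {"history": {"event": "shipped"}},
    },
)
```

In a relational schema the same logical change might span an orders row, a shipments row, and a history table, and so would need a multi-statement transaction.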
That is true - but it's commonly observed that it's rare for a nontrivial application to manage to limit every logical operation to single-document updates.
You don't need to be using serializable isolation to face blocking or deadlocks. So-called "deadlocks" really aren't that bad when they are detected and all but one of the competing transactions are failed. You just catch the error and retry.
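The catch-and-retry is mechanical enough to wrap once (a sketch assuming psycopg2; `transfer` and the `accounts` table are hypothetical):

```python
import psycopg2
from psycopg2 import errors

def run_with_retry(conn, txn_body, max_attempts=5):
    """Run txn_body(cursor) in a transaction, retrying on deadlocks
    and serialization failures (SQLSTATEs 40P01 and 40001)."""
    for attempt in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back on error
                with conn.cursor() as cur:
                    return txn_body(cur)
        except (errors.DeadlockDetected, errors.SerializationFailure):
            if attempt == max_attempts - 1:
                raise

def transfer(cur):
    # Hypothetical body: move 100 units between two accounts.
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")

run_with_retry(psycopg2.connect("dbname=test"), transfer)
```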
serializability != one giant lock around your database.
concurrency and serializability go together great. it's only if transactions actually read/write or write/write the same data that they are done one at a time (and even that's not exactly true, depending on the concurrency control algorithms). MVCC often provides better performance because read-only transactions will not block writers.
and achieving serializability across database nodes is DEFINITELY NOT impossible.
One of the functions of an Operating System is to provide (the illusion of) isolation between Processes and the resources they acquire. Roughly,
Transaction : Database :: Process : OperatingSystem
The four transaction isolation levels, defined nearly forty years ago, were an attempt at categorizing incomplete isolation. Can you imagine launching an operating system process and saying "It's OK if my memory locations are changed by another process"? Or "Don't let my memory locations be changed by another process, but it's OK if another process prevents me from deleting a resource"? That's what we're asking of our non-serializable database transactions.
Why not just provide strong isolation? Partly because it is hard and partly because the performance impact of strong transaction isolation is greater than the performance impact of process context switching.
But if you're living with less than strong transaction isolation, then tautologically, strange and unexpected things eventually happen: seeing state that never existed, seeing state changes that you didn't make, failing to see state that you should have seen. Rarely, the application can reliably detect and handle some of these situations. Typically, the application only thinks it can.
Sometimes, I think the core issue is with the notions of isolation and serializability themselves. Developers want to believe that events are noticed simultaneously with their occurrence, and that all observers (transactions) see the same history unfold in the same order. But the pesky physical world doesn't work that way.
While reading this I was thinking about Datomic. I know that transactions are serializable, and are actually serialized and stored in the database. Essentially you get a full DB lock: within a transaction you can access the full database and do whatever you want. That does, of course, lock the transactor, but reads can still go on consistently.
Wasn't it also the research behind VoltDB that found coordination is the problem, and that you should do all transactions on a single core? Am I remembering this correctly?
Seems to me Datomic hits a very nice spot where you have relatively fast writes and conceptually unlimited reads.
If somebody knows more than me, I'm happy to learn.
PS:
Also, why are we still saying "NoSQL" and making generalisations about it? By now there are so many NoSQL databases that they have literally nothing in common except that they're not traditional SQL databases.
Mostly, "NoSQL" means "doesn't try to magically turn a set of indexes into a relational-set-theory abstraction using a Sufficiently Advanced Query Planner." If you had a NoSQL database that did that, people would insist that the "NoSQL" moniker was incorrect, even if it did, in fact, use some querying protocol other than SQL.
It would seem a much more honest argument to me if MVCC were being compared to another implementation of consistency management, rather than compared to NO IMPLEMENTATION at all.
That's not the point of the article, which is not very well made. The author tries to explain that MVCC also has its quirks under high write workloads; it's not a silver bullet. That holds even compared with "eventual consistency databases" which "do nothing to help you" - quite the contrary, that is a deliberate design choice.
I think this article must be read as part of the debate "ACID or no ACID", where Google with F1 [1] and App Engine [2], FoundationDB, and WiredTiger push for strong transactional features because it's easier for developers.
> “what’s the difference between Consistency and Isolation again?”
This is what bothers me most about this article. How can a person with seemingly advanced knowledge of relational databases not understand that Consistency is about dataset state and Isolation is about transactions? It's not that you get consistent results between transactions or even queries, but that if a key is an integral number, then `itoa()` works (given the bit length isn't too large) no matter what, period.
If you want to support concurrent read and write operations on a database, then it's not much different from multithreaded programming - hard and impossible to get right without compromises.
The author makes this statement rhetorically, without explaining more than that all four ACID properties are tightly coupled.
In the case of Consistency and Isolation, you might have an inconsistent database if Isolation allows phantom reads. Now that I think (again) about it, it seems to me Consistency can only be a consequence of Atomicity and Isolation. Maybe not... because if Isolation allows writing things based on phantom reads, then the database will be inconsistent even if it is Atomic.
tl;dr: yes, the author was quick in that sentence; that doesn't mean he is stupid. The `itoa()` example is far-fetched and not very instructive.
Sorry about that, had to finish up quickly. I will try to clarify my view, and sorry again if this wall of text is too long. I just wanted to stress that Consistency is about data itself, while Isolation is about dataset acquisition (the same goes for Atomicity).
My view is that Consistency should only be mentioned when talking about data. It is much like method argument type checking in type-safe languages: if you get a float passed in, it WILL contain a valid number, and you do not need any additional explicit checking for that. Consistency is data validation.
Isolation, on the other hand, is a form of concurrency control. Once you have shared data between concurrent threads, care must be taken. SQL brings this to a new level by having only shared data between threads (transactions). Since performance is directly related to concurrency control techniques, some compromise must be taken. The article is just about those compromises. Phantom reads can be viewed as a manifestation of holding and releasing locks (semaphores) during thread execution.
I believe that ACID properties are all orthogonal and databases should be examined where/when/how those properties are violated, e.g. Isolation is not always ensured because of the mentioned MVCC.
Atomicity: data is treated in atomic datasets;
Consistency: data validation;
Isolation: data immutability during transaction;
Durability: data retention (and as a consequence Consistency) over prolonged periods of time.
I'm not entirely sure I understand how things are any better in either world. Especially if you are approaching the problem with the idea that you will have no failure conditions, things are going to be tough. Prohibitively so.
I think part of the goal with Eventual Consistency is that it lets you attack the underlying issues in a qualitatively different way.
Relying on the DB, your options are constrained by your choice of relational tables and rows, and further affected by your DB vendor and configuration. When you lift the conflicts out to a higher layer of abstraction, you can do something more object-oriented, leveraging some of the same tools you'll need anyway to deal with other inconsistencies and quirks.
> Sorry, I'm not impressed with serializable isolation via a single writer mutex.
I think that for a large number of applications, this would work just fine. Especially if the underlying storage is SSD-based -- something that's now easily obtained via VPS providers like DigitalOcean, Linode, and Vultr. After all, most database-driven applications aren't large-scale operations. So it's probably better to favor correctness and simplicity over concurrency and scalability.
I'm probably the minority on this, but I think innodb's docs on its isolation modes are clearer and easier to understand than postgres'. That could be because I have internalized them over many years though.
To me, the only sensible isolation level for transactions is serializable. We've had so many consistency issues because developers didn't understand when to lock and when not to. Everyone (including me) gets it wrong from time to time.
Very careful use of nonlocking selects can be correct and improve concurrency, but across a huge codebase, bugs slip in. If I could turn on a mode where the default is "select ... lock in share mode" but opt into a "select ... nonlocking" I would be eternally grateful. Judicious use of wrappers where we force the user to specify the lock mode has helped curb this trend quite a bit.
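Roughly what such a wrapper can look like (a sketch with made-up names; the lock clauses are MySQL syntax):

```python
from enum import Enum

class LockMode(Enum):
    NONLOCKING = ""
    SHARED = "LOCK IN SHARE MODE"
    EXCLUSIVE = "FOR UPDATE"

def select_sql(columns, table, where, lock_mode: LockMode):
    """Build a SELECT; the caller MUST state a lock mode explicitly,
    so a nonlocking read is a visible, reviewable decision."""
    sql = f"SELECT {columns} FROM {table} WHERE {where}"
    if lock_mode is not LockMode.NONLOCKING:
        sql += " " + lock_mode.value
    return sql

# Usage: the lock decision is spelled out at every call site.
q = select_sql("id, balance", "accounts", "id = %s", LockMode.SHARED)
```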
I'd still take MVCC with tons of warts over eventual consistency any day of the week. It is impossible to reason about eventual consistency because the database is literally allowed to do basically anything. Causal consistency should really be the default starting point for AP systems.
> To me, the only sensible isolation level for transactions is serializable.
There's something to be said for this. As was discussed a few weeks ago, "repeatable read" semantics differ quite a bit between MySQL/InnoDB and MS SQL. In MySQL/InnoDB, you can have two transactions update the same record in repeatable read mode, and lose the second update without an error being reported (see the sketch below).
But you take a huge hit on performance in serializable mode. You can only do one thing at a time.
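As for the lost update mentioned above, a sketch of the interleaving (assuming PyMySQL; connection details and the `accounts` table are hypothetical, with `balance` starting at 100): both transactions read under InnoDB's default REPEATABLE READ, and a committed update is silently overwritten.

```python
import pymysql

# Hypothetical connection settings and accounts table.
def connect():
    return pymysql.connect(host="localhost", user="test", password="test",
                           database="test", autocommit=False)

s1, s2 = connect(), connect()
c1, c2 = s1.cursor(), s2.cursor()

# Both transactions read balance = 100 under REPEATABLE READ.
c1.execute("SELECT balance FROM accounts WHERE id = 1")
(b1,) = c1.fetchone()
c2.execute("SELECT balance FROM accounts WHERE id = 1")
(b2,) = c2.fetchone()

# Each writes a value computed from its own (now stale) read.
c1.execute("UPDATE accounts SET balance = %s WHERE id = 1", (b1 + 10,))
s1.commit()
c2.execute("UPDATE accounts SET balance = %s WHERE id = 1", (b2 + 20,))
s2.commit()  # succeeds: final balance is 120, and the +10 is silently lost
```

Run the same interleaving against PostgreSQL at REPEATABLE READ and the second UPDATE fails with a serialization error instead of silently winning.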
No. In serializable mode it simply appears that one thing is happening at a time. You can touch unrelated data concurrently with no performance impact.
While they aren't distributed, Apache Lucene and Apache Lucy (the "loose C" port of Lucene that I work on) both implement MVCC: each index reader object represents a point-in-time view of the index that persists for the object's lifetime.
Core developers such as myself have to deal with some subtleties when implementing MVCC, but I wouldn't say that MVCC is too terribly hard for our users. The thing is, our interface does not look anything like SQL -- you have to use a different mental model when interacting with our data stores, and of course they are not suitable for all applications where you might use SQL.
What I took away from the article is that MVCC does not fit well within SQL semantics.
This is a great thing to read and discuss, as opposed to blindly jumping on one bandwagon or another.
A relational database is a framework for storing data that comes with some rules and in exchange helps to guarantee certain characteristics. A flat file is too. The choice of which one to use depends on the problem.
Often there are several good choices. Imagine all the relational db zealots who frown upon using a relational db in a slightly unconventional way, such as storing events in a single table using a json column in postgres. Yet postgres provides some useful features and so it may actually be very smart to use it that way.
He's trying to claim that MVCC pushes complexity on to the user, but the examples of that are just typical "mysql does it wrong". So, that means mysql pushes complexity on to the user, not MVCC.
To his credit, the author acknowledges PostgreSQL is not his strong suit. As a pg fanatic myself I think there is a lot of negativity toward SQL in general that is more fairly leveled at MySQL and Oracle in specific.