When I tell people, “No transactions!” their first reaction is either “he’s mad” or “for my use-case its necessary”.
I know this was my reaction when it was first said to me.
I happened to be working on a money based system at the time; something you would think that is the very definition of a system that “must” use transactions.
As engineers we are instinctively drawn to things that are “right”, “clean” or “correct” and we are therefore attracted to the warm fuzzy feeling we get from using ACID compliant transactions.
We happily use SQL transactions with the knowledge that our data will always be perfectly clean and neat.
However, should you ever need to build a system that truly scales, this same perfectionism will kill you.
In order to get the maximum possible performance out of your system, you might just have to do something uncomfortable and sacrifice “correct” on the altar of performance.
A simple use-case
An oversimplified order system with the following tables:
Version 1 – Traditional style with Transactions
Consider the following code:
If this code looks like something you might have written, don’t worry. This style works, it is not however as fast as it could be.
Before you go refactoring all your code to one of the other styles (below), consider if you really need to bother.
If the code path is seldom used or if you just don’t have the traffic yet, then please don’t waste your time and your customer’s money.
Version 2 – No transactions
Consider for a second, what is the cost of simply removing the transaction and just making 2 SQL calls?
Firstly, you could end up with “dirty” data; Would anyone actually even notice? Could it be cleaned up offline or at some low-load time?
Would the user’s UX be compromised? This is something that should be vigorously avoided.
This is what that code might look like:
Version 3 – Self cleaning
If you decide that occasional dirty data is unacceptable; you can achieve the same goal as the original transaction by detecting the error and cleaning up yourself.
In the below example I have introduced a 1% error rate which is unrealistic in the extreme on purpose:
You could even add retry and other things here and make this version superior to the first example.
I’ve created a benchmark using the above code fragments (uploaded here) and here are the results.
- The code with transactions is 5-6% slower 100% of the time.
- The “clean up” code in this case is not that complicated and it ended up being slower than the version with transactions. Even if it had been slower than the version with transactions, it only be slower for the very small percentage of time that errors occur.
- Sometimes dirty data is ok (really it is)
- People often counter with “I’m using row-level locking, the transactions should not be a problem for me”. This is not true as the implicit locks will lock the indexes and other things that are shared. But please feel free to benchmark and see.