Tell me what is your data structure and I’ll tell you who you are.
We have come from RDBMS – people born in 20th century. No doubts about it. MySQL, SQL Server, Oracle, PostGreSQL – does it ring the bell? What about Sqlite? You have done all or most of them? I guess you believe that ACID is the Alpha and Omega of the database system. We learnt this paradigm in school and cannot leave it easily. Same as the believe that everyone needs a house and therefor a mortgage.
Atomicity gives you peace of mind. Either all requested operations occur or none. Sleep well, my friend.
Consistency saves your job. All updates will be done according to the database constraints, such as foreign keys and triggers.
Isolation speaks for itself. All your updates will be isolated from other users. Before you finish, they won’t see your changes half-done.
Durability guarantees that transaction persists after commit, whatever happens to the database. Just don’t forget to make a regular copy.
Doesn’t it sound too good? This looks like perfect and healthy McDonald burger. But yet, … nothing is final. Newton mechanics is just a subset of the universe law. Einstein proved it beautifully. Yet the theory of relativity is also not final… so?
Stop it, I am not going to enter into highly scientific discussions and will speak from a point of view of a simple software developer and a project manager that is responsible for fitting into allocated budget.
1. Everything comes at a price.
How much ACID costs? Smallest package of SQL server starts from few thousands EUR. Ask your local reseller. Then comes the hardware and administration cost. NoSql is a small part of SQL server cost.
2. Nothing is what it seems.
Even you have a complete infrastructure of SQL database, you aren’t ACID. It is a beautiful illusion. ACID spawns over one database server but any real life transaction is distributed geographically and timely. All classical book examples of ACID transaction assume that both accounts, credited and debited one, are in the same bank and in same database. What if they are in different databases? Real system has extra level of multiple transactions that cannot be guaranteed to be ACID automatically and need human control. It means full automation can cover only small part of the complete transaction. This said, ACID of a single database isn’t very important. 1% of failed transactions can be found and corrected by human.
A careful reader would ask: Good, this is about banking, but what about a system gathering data in M2M environment and controlling machines. It cannot afford 1% of failures.
Such systems do not use classical database structure anyway. They rely on other ways to prove that transaction is properly committed or rolled back. They have complex real-time state machine that guarantees very high reliability.
3. Optimistic or pessimistic locking.
ACID transactions use pessimistic locking, e.i. they acquire all necessary locks before making changes. Locks guarantee that other processes cannot changes data affected by the current transaction. This is good for relatively low frequency of updates. In more concurrent environment locks would become a bottleneck. Many transactions would have to be aborted and repeated or given up. This is not a good solution. Optimistic locking uses different approach. It reads the status of all records to be updated and prepares all updates. Then it read the status again and if it has not been changed, updates are committed. If status has been changed, it invokes conflict resolution procedure.
4. How many transactions really need ACID
Those, who developed real database systems, know that very few updates need ACID guaranteed transactions. In fact it would be less than 1% of all queries. ACID transactions only necessary if updates affects multiple tables and concurrency is expected. If your database requires lot of ACID transactions, it may point to bad design and eventually lead to poor performance.
Example of ACID transaction.
Data is steadily coming from many water meters. Every record has ID and value. Application stores every record and at the same time updates sum of all values for current day.
Every transaction need to insert a new record into ‘meters’ table and add its value to a sum field of sum_day table.
Every transaction must do following
1. Read sum_day record and lock it for writing
2. Insert value into meters table
3. Update sum in the sum_day table
4. Release the lock from sum_day record
It looks good, except that it removes the concurrency totally. All transactions will be serialised by the locking of sum_day record. This is classical example of wrong solution that is right by definitions.
Correct solution would be to save separately sum_day for every meter. In this case concurrency would not be affected because it is very unlikely that same meter will send data 2 times at the same time. When total sum is needed it can run a query on all day_sum of all meters and give the total value. It would be much faster than calculating it on every write operation and effectively eliminates the need of locking.
5. noSQL ultra short introduction
NoSQL has concept of collections and documents. Collection is like a table and document like a record in that table. The difference from SQL is that document structure is not fixed and every document can contain any set of fields of any types.
– What a mess – says the database programmer of 20th century. But don’t rush to conclusion. In fact, you can use it as SQL style table, but it’s not limited to it. SQL style table is just a subset of document approach. Very simple example. In SQL server you have a users table that among other fields has middle name and maiden name. Very few users will fill it but it will be present in the structure and fill the memory (I know that modern systems will not allocate storage space for empty fields, but still it will use memory during processing because of tabular nature of data). In case of noSQL these fields will exist only if they have data.
Another difference is that document can have nested documents. This solves very important problem of SQL database, namely related tables. Remember, it is called RDBMS, where R stands for relational. We can make one-to-one or one-to-many-relations. In fact, in case of one-to-many relation, the secondary table contains nested records of the master table. They are just stored separately. And to connect them, relations and foreign keys are required. For example, people may have multiple addresses. In SQL we would use users table and addresses table connected by user_id field. In noSQL we can create nested array of addresses and it will be stored inside of the user document. We spare: extra table, foreign key and transactions. This works for relatively small amount of one-to-many relations. In case of big amount we still need a separate collection
What will be the future database system?
My knowledge and experience is very modest to answer this question, yet I am thinking about it. Same as mobile phone users imagine their dream phone. As usual, different approaches will mutate to adopt the best features of each one and finally come to mixture of SQL, ACID and NoSQL. We can already see these movements. PostgreSQL and Oracle are constantly adding support for NoSql and MongoDB has many features of relational database.