No SQL databases …. Humm what are they? and why should I be using them?

NoSQL databases are being talked of a lot now a days and there are a number of commercially available NoSQL databases. The question now arises to what these NoSQL databases are and why would anyone want to use them. To answer this question lets look behind into the history of databases and find out how the no sql databases became popular.

Rise of relational databases (SQL based) started in the early 1980’s. They brought a lot of benefits with them, which made it one of the in disposable tools. Firstly they introduced this simple query based language to create, insert , read, delete and host of other operations. They were easy to integrate with applications. They could manage concurrency for simultaneous transactions and organize data in simple tables with rows and columns. In addition theses relational databases provided persistence and easy report generation. All these benefits provided by relational  databases made them the industry choice and made SQL the defacto language for any database related interaction.

Alright !!, so if relational databases provide us everything why do I needs a NoSQL database. Well with all the benefits that relational databases provide us, there are a few shortcomings.Most of the applications developed assemble objects in terms of a cohesive whole things where we might integrate data from multiple smaller tables. To be more clear, suppose I want to create a simple customer profile page which just consists of the customer’s personal details, in addition to his wishlist, purchase history and credit card details. Now in this case when we create a plain POJO( if its a java application) for customer profile page we put in the name, address, wishlist, card details etc. all into one java class. In the databases however, all this information is stored at multiple tables and maybe in multiple databases. So when the user want’s to create his personal profile, the data stored in the objects that capture the user information need to be stripped out and stored at multiple tables. So what basically happens is that a very simple logical structure gets splattered across lots of rows and tables.We use various ORM tools commonly to now map these objects to the database.

So now many of you might think there has to be a better way then using mapping, why not directly store objects. One of the main reasons object based databases did not come up in the 90’s or in early 2000’s is that SQL databases provided a very efficient way of integration between different applications.However things have changed a lot now, after the humongous data growth. SQL databases were developed essentially for a single node, and there was a limitation on how large this node could be. As the data kept growing, focus shifted from single node databases to multi-node distributed databases.A lot was done to adapt the relational databases for clusters, however they always seemed to have some shortcomings. So the momentum grew for NoSQL databases.

Some of the clear characteristics of NoSQL databases are that they are non-relational, scale able, open source and schema-less. However all the NoSQL databases are not the same. They tend to differ in their data model. Some of the most common NoSQL databases seem to be document stores, key-value pair stores, data stores that store graphs and data stores that have column based indexing. This is very well illustrated in the slide from Martin Fowler.

Capture                                                                              Reference: Martin Fowler

 The simplest data model to consider is the key-value pair based data model, where data is store based off a key. Its kind of like a hash-map where you give the key and the required data is returned, it doesn’t care on what is stored based off the key. The next kind of data model to consider is the document based data model, where you have a large chunk of complex data representing a whole thing. In our previous example about customer profile page, it may very well be the case where we have all the information recorded and stored in a document. In most of the cases these documents tend to use JSON, since that carries less metadata and is easy to understand and transfer. Usually all these documents are searchable and do not pertain to a single schema.Well that said these two data models are a lot similar. Key-value based databases allow you to store some metadata, which is often in the form of Id’s. So in the case of the customer profile page the key-value database will store metadata in the form of customerID. Now this is the same way in which a record is searched in a document based database. Every document generally has an Id field and most of the lookups are done based off these Ids. So we say give me all the details for customer with Id equal to something.

 Now the next thing to discuss is the column based data model. They are a bit more complex than the earlier two models that we have seen. Here is what it is. Capture                                                                                                                            Reference: Martin Fowler

 So in the column-family based databases what we have is a row-key that is paired with a column family. So in the example above we retrieve the name of the individual record by 1234-profile.name. This structure allows for faster retrieval. In all these above discussed NoSQL databases one common underlying fact is that, everywhere we store data for one whole aggregate. In our example we always store data for a customer profile page altogether rather than having all the individual pieces of data scattered over multiple places. This is called as aggregate model by Martin Fowler.

Now graph databases are not aggregate based databases like the other three. They really are databases that are good at maintaining relations. They can solve queries like friend of A who likes B living in town C. These kind of queries are extremely difficult for relational databases. Such kind of queries are solved by graph databases.

Now its wrong to say that SQL databases are good for nothing. Say, if we wanted to query list of all products available in a store, then a relational database will do best, our aggregate based  databases will be a no match. So it essentially boils down to your applications needs.

Recently there has been a lot of focus on very quick lookups. Elastic search is one of the most frequent used search framework built on top of Apache Lucene. Elastic search too has a concept of reading data through a data source like a file system that can be easily mounted. Does that mean that elastic search as a framework can be considered as an option for a no-sql database ? Well it actually depends on the application needs again. Here is a very neatly written concise article explaining the same concept: https://www.found.no/foundation/elasticsearch-as-nosql/ 

Lets continue further discussion in the subsequent posts. Have a good time reading ..:)