Data Storage: NoSQL Stores, Key-Value Stores, Columnar Stores,
Document Stores, Graph Databases, Case Studies, HDFS, HBase, Hive,
MongoDB, Neo4j

NoSQL stands for "Not Only SQL" or "Not SQL."
A NoSQL database is a non-relational data management system that does not require a fixed schema.
Its major purpose is to serve distributed data stores with very large data storage needs.
NoSQL is used for big data and real-time web apps.
For example, companies like Twitter, Facebook and Google collect terabytes of user data every single day.

A traditional RDBMS uses SQL syntax to store and retrieve data for further insights. In contrast, NoSQL encompasses a wide range of database technologies that can store structured, semi-structured, unstructured, and polymorphic data.

The system response time becomes slow when you use an RDBMS for massive volumes of data. To resolve this problem, we could "scale up" our systems by upgrading our existing hardware, but this process is expensive. The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out." NoSQL databases are non-relational and designed with web applications in mind, so they generally scale out better than relational databases.
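
As a rough illustration of scaling out, the sketch below spreads keys across several hosts by hashing each key. The host names and the pick_host helper are hypothetical, and simple modulo hashing is only one possible placement scheme; it is not how any particular NoSQL product is implemented.

```python
import hashlib

def pick_host(key, hosts):
    """Map a key to one host by hashing the key (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]

# Hypothetical host names, used only for illustration.
hosts = ["host-a", "host-b", "host-c"]

# Each key is always routed to the same host, so load is shared across hosts.
placement = {key: pick_host(key, hosts)
             for key in ("user:1", "user:2", "user:3", "user:4")}
print(placement)
```

With modulo placement, adding a host changes the target of most keys, which is why distributed stores often use consistent hashing instead.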

NoSQL Data Stores

A NoSQL data store stores and handles very large volumes of data and provides high availability, which here means it can keep serving many concurrent users.

NoSQL achieves this with scale-out architectures: the system runs on many machines, and these machines can be commodity hardware. NoSQL data stores generally support adding hardware whenever it is needed.

The other properties of NoSQL data stores are that they are non-relational, meaning they typically don't guarantee full ACID properties, and that they don't have to adhere to a fixed schema.

Most of the NoSQL data stores available in the market are open source. Why?

Users prefer open source data stores to avoid vendor lock-in.

Every NoSQL data store stores records. Based on the type of record it can store, a NoSQL data store is classified into one of four categories:

Key-value pair
Document-oriented
Column-family table
Graph

1. Key-value stores are the simplest. Every item in the database is stored as an attribute name (or "key") together with its value. Riak, Voldemort, and Redis are the most well-known in this category.
2. Column databases store data together as columns instead of rows and are optimized for queries over large datasets. The most popular are Cassandra and HBase.
3. Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents. MongoDB is the most popular of these databases.
4. Graph databases are used to store information about networks, such as social connections. Examples are Neo4j and HyperGraphDB.

Key-Value

Data is stored in key/value pairs. The design is meant to handle large amounts of data and heavy load.
Key-value stores keep data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
It is one of the most basic types of NoSQL database.
This kind of NoSQL database is used like a collection, dictionary, or associative array.
Key-value stores help the developer store schema-less data. They work well for data such as shopping cart contents.
Redis, Dynamo, and Riak are some examples of key-value stores. Riak is modeled on Amazon's Dynamo paper; Redis has a different, independent design.

For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99".
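
The idea can be sketched with an in-memory hash table. The KeyValueStore class and the "cart:42" key below are hypothetical names chosen for illustration; real key-value stores add persistence, replication, and networking on top of this basic interface.

```python
class KeyValueStore:
    """A minimal in-memory key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key replaces its value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Website", "Guru99")
# The value is opaque to the store, so it can be any structure.
store.put("cart:42", {"items": ["book", "pen"]})

print(store.get("Website"))   # Guru99
print(store.get("missing"))   # None
```

Retrieval is always by key, which is what makes lookups simple and fast but also why queries over the values themselves are awkward.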

Advantages

Can handle large amounts of data and heavy load.
Easy retrieval of data by keys.

Limitations

Complex queries that involve multiple key-value pairs may be slow, because data can only be looked up by key.
Data with many-to-many relationships is hard to model, and related entries may conflict.

Key-value pair data stores

Redis is an open source in-memory key-value pair data store. Redis is often called "the Swiss Army Knife of web application development." It can be used for caching, queuing, and storing session data for faster access than a traditional relational database, among many other use cases.
Memcached is another widely used in-memory key-value pair storage system.

Column-oriented

A column-family table class of NoSQL data stores builds on the key-value pair type. Each key-value pair is considered a row in the store while the column family is similar to a table in the relational database model.
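
A minimal sketch of this model, using nested dictionaries: the column family plays the role of a table, each row key maps to that row's columns, and rows need not share the same columns. The users_family data and the get_cell helper are hypothetical and do not reflect the storage format of any specific product.

```python
# Column family (table-like) -> row key -> column name -> value.
users_family = {
    "row1": {"name": "Alice", "city": "Pune"},
    "row2": {"name": "Bob"},  # rows may have different columns
}

def get_cell(family, row_key, column, default=None):
    """Look up one column value for one row key."""
    return family.get(row_key, {}).get(column, default)

print(get_cell(users_family, "row1", "city"))  # Pune
print(get_cell(users_family, "row2", "city"))  # None
```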

Column-oriented databases work on columns and are based on Google's BigTable paper. Every column is treated separately, and the values of a single column are stored contiguously.

Column based NoSQL database

They deliver high performance on aggregation queries such as SUM, COUNT, AVG, and MIN because the data for a column is readily available together.
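
To see why, the sketch below converts a few row-oriented records into a column-oriented layout and aggregates one column. The sample records are made up for illustration; a real columnar store keeps each column contiguous on disk rather than in Python lists.

```python
# Row-oriented layout: each record holds all of its fields together.
rows = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 25},
    {"id": 3, "region": "east", "amount": 5},
]

# Column-oriented layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate only needs to scan the single column it refers to.
amounts = columns["amount"]
total = sum(amounts)
count = len(amounts)
average = total / count
minimum = min(amounts)

print(total, count, minimum)  # 40 3 5
```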

Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM, and library card catalogs.

HBase, Cassandra, and Hypertable are examples of column-based NoSQL databases.

Column-family table data stores

Apache Cassandra
Apache HBase

Document-Oriented

A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value part is stored as a document. The document is stored in JSON or XML format. The value is understood by the database and can be queried.

Relational Vs. Document

In the diagram, the left side shows a relational table with rows and columns, and the right side shows a document database, whose structure is similar to JSON. For the relational database, you have to know in advance which columns you have. For a document database, the data is stored like a JSON object, and you are not required to define its structure beforehand, which makes it flexible.
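
A small sketch of the document side: two JSON-like documents with different fields are stored under keys, and a query looks inside the values. The documents, the find helper, and its dotted-path convention are hypothetical; real document databases offer much richer query languages.

```python
import json

# Two documents with different shapes; no schema is declared up front.
documents = {
    "post:1": {"title": "Intro to NoSQL", "tags": ["nosql", "db"],
               "author": {"name": "Asha"}},
    "post:2": {"title": "Graph basics",
               "author": {"name": "Ravi", "city": "Delhi"}},
}

_MISSING = object()  # marks a path that a document does not contain

def find(docs, path, expected):
    """Return keys of documents whose value at a dotted path equals expected."""
    matches = []
    for key, doc in docs.items():
        value = doc
        for part in path.split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                value = _MISSING
                break
        if value is not _MISSING and value == expected:
            matches.append(key)
    return matches

print(find(documents, "author.name", "Ravi"))  # ['post:2']

# A document serializes naturally to JSON text.
as_json = json.dumps(documents["post:1"])
```

Because the store understands the document's structure, it can filter on nested fields, which a plain key-value store cannot do.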

The document type is mostly used for CMS systems, blogging platforms, real-time analytics, and e-commerce applications. It should not be used for complex transactions that require multiple operations, or for queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular document-oriented DBMSs.

Graph

A graph database represents and stores data in three aspects: nodes, edges and properties.

A node is an entity, such as a person or business.

An edge is the relationship between two entities. For example, an edge could represent that a node for a person entity is an employee of a business entity.

A property represents information about a node. For example, a node representing a person could have a gender property with the value "female" or "male".

A graph database stores entities as well as the relations amongst those entities. Each entity is stored as a node, and each relationship is stored as an edge between nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature. Traversing relationships is fast because they are already captured in the database and there is no need to calculate them.
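
The sketch below models nodes, edges, and properties with plain Python structures and walks the stored relationships breadth-first. The sample people, the business, and the edge types are made up for illustration. The visited set keeps the traversal from looping forever when the graph contains a cycle.

```python
from collections import deque

# Nodes carry properties; every node and edge has a unique identifier.
nodes = {
    "p1": {"label": "Person", "name": "Asha", "gender": "female"},
    "p2": {"label": "Person", "name": "Ravi", "gender": "male"},
    "b1": {"label": "Business", "name": "Acme"},
}
edges = [
    {"id": "e1", "source": "p1", "target": "b1", "type": "EMPLOYEE_OF"},
    {"id": "e2", "source": "p2", "target": "b1", "type": "EMPLOYEE_OF"},
    {"id": "e3", "source": "p1", "target": "p2", "type": "KNOWS"},
    {"id": "e4", "source": "p2", "target": "p1", "type": "KNOWS"},  # forms a cycle
]

# Relationships are stored directly, so neighbors are a simple lookup.
adjacency = {node_id: [] for node_id in nodes}
for edge in edges:
    adjacency[edge["source"]].append(edge["target"])

def reachable(start):
    """Breadth-first traversal that visits each node at most once."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for neighbor in adjacency[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(reachable("p1"))  # ['p1', 'b1', 'p2']
```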

Graph databases are mostly used for social networks, logistics, and spatial data.

Advantages:
Fast traversal, because connections are stored directly.
Spatial data can be easily handled.
Limitations:
Incorrect connections, such as unintended cycles, may lead to infinite loops during traversal.

Graph data stores

Neo4j is one of the most widely used graph databases and runs on the Java Virtual Machine stack.
Cayley is an open source graph data store that originated at Google and is written primarily in Go.
Titan is a distributed graph database built for multi-node clusters.

CS.Lectures

Monday, June 14, 2021

Data Stores || NoSQL Stores