NoSQL Database (M5)

  • NoSQL database stands for "Not Only SQL" or "Not SQL."
  • It is a non-relational data management system that does not require a fixed schema.
  • Its major purpose is to serve distributed data stores with very large data storage needs.
  • NoSQL is used for Big data and real-time web apps. 
  • For example, companies like Twitter, Facebook and Google collect terabytes of user data every single day. 

A traditional RDBMS uses SQL to store and retrieve data. NoSQL, in contrast, covers a wide range of database technologies that can store structured, semi-structured, unstructured and polymorphic data. When an RDBMS is used for massive volumes of data, system response time can become slow.

To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive.

The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out."
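
One common way to spread load across hosts is to hash each key and use the hash to pick a host. The sketch below is only an illustration under that assumption: the host names and the `pick_host` helper are hypothetical and are not taken from any particular database.

```python
import hashlib

def pick_host(key, hosts):
    """Map a key to one host by hashing it (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

# Hypothetical host names, used only for illustration.
hosts = ["db-host-0", "db-host-1", "db-host-2"]

# Each key is always routed to the same host, so reads find earlier writes.
placement = {key: pick_host(key, hosts) for key in ["user:1", "user:2", "user:3", "user:4"]}
```

With plain modulo sharding, adding or removing a host changes where most keys live, so real systems usually prefer schemes that move fewer keys, such as consistent hashing.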

 

NoSQL databases are non-relational and were designed with web applications in mind, so they scale out better than relational databases.

Features of NoSQL

  • They have higher scalability.

  • They use distributed computing.

  • They are cost effective.

  • They support flexible schema.

  • They can process both unstructured and semi-structured data.

  • There are no complex relationships, such as the ones between tables in an RDBMS.

 

Non-relational

  • NoSQL databases never follow the relational model
  • Never provide tables with flat fixed-column records
  • Work with self-contained aggregates or BLOBs
  • Do not require object-relational mapping or data normalization
  • No complex features such as query languages, query planners, referential integrity, joins, or ACID

Schema-free

  • NoSQL databases are either schema-free or have relaxed schemas
  • Do not require any sort of definition of the schema of the data
  • Offers heterogeneous structures of data in the same domain

 

 

Simple API

  • Offers easy-to-use interfaces for storing and querying data
  • APIs allow low-level data manipulation and selection methods
  • Text-based protocols are common, mostly HTTP REST with JSON
  • Mostly no standards-based NoSQL query language is used
  • Web-enabled databases run as internet-facing services

Distributed

  • Multiple NoSQL databases can be executed in a distributed fashion
  • Offers auto-scaling and fail-over capabilities
  • ACID guarantees are often sacrificed for scalability and throughput
  • Mostly no synchronous replication between distributed nodes; instead asynchronous multi-master replication, peer-to-peer replication, or HDFS replication is used
  • Often provides only eventual consistency
  • Shared-nothing architecture: this enables less coordination and higher distribution.

NoSQL is Shared Nothing.

 

Types of NoSQL Databases

  • Key-value pair based
  • Column-oriented
  • Graph-based
  • Document-oriented

Applications That Work Best With NoSQL Database

Key Value Pair Based

Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy load.

Key-value stores keep data as a hash table where each key is unique and the value can be JSON, a BLOB (Binary Large Object), a string, etc.

For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99".

It is one of the most basic types of NoSQL database. This kind of database is used for collections, dictionaries, associative arrays, etc. Key-value stores help the developer store schema-less data. They work well for shopping cart contents.

Redis, Dynamo and Riak are examples of key-value stores. Dynamo is Amazon's own system, and Riak's design is based on Amazon's Dynamo paper.

Advantages:
 Can handle large amounts of data and heavy load,
 Easy retrieval of data by keys.
Limitations:
 Complex queries that involve multiple key-value pairs may be slow.
 Data involving many-to-many relationships is awkward to represent and may lead to conflicts.
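
The key-value model described above can be sketched with an ordinary dictionary. This is a toy in-memory illustration, not how any real key-value database is implemented; the `KeyValueStore` class and the `cart:42` key are hypothetical names.

```python
import json

class KeyValueStore:
    """Toy in-memory key-value store: unique keys, opaque values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not interpret the value; it keeps it as a JSON string.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Website", "Guru99")
store.put("cart:42", {"items": ["book", "pen"], "total": 12.5})
```

Lookups by key are direct, but finding every cart that contains a given item would mean scanning all values, which is the kind of complex query this model handles poorly.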

Column-based

Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately, and the values of a single column are stored contiguously.


They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as the data is readily available in a column.

Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM and library card catalogs.

HBase, Cassandra and Hypertable are examples of column-based NoSQL databases.
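
To see why aggregates suit a column layout, the sketch below pivots a few rows into one list per column and then aggregates over a single column. The sample rows and the `to_columns` helper are made up for illustration; real column stores add compression, on-disk layout and much more.

```python
# Row-oriented layout: one record per row (hypothetical sample data).
rows = [
    {"id": 1, "name": "A", "price": 30.0},
    {"id": 2, "name": "B", "price": 29.99},
    {"id": 3, "name": "C", "price": 39.95},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate only has to read the one column it touches.
total_price = sum(columns["price"])
max_price = max(columns["price"])
row_count = len(columns["id"])
```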

Document-Oriented:

Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part is stored as a document. The document is stored in JSON or XML formats. The value is understood by the DB and can be queried.

Relational Vs. Document

A relational database stores data in rows and columns, whereas a document database stores data in a structure similar to JSON. For a relational database you have to know in advance which columns you have. In a document database the data is stored like a JSON object, and you do not have to define its structure beforehand, which makes it flexible.

The document type is mostly used for CMS systems, blogging platforms, real-time analytics and e-commerce applications. It should not be used for complex transactions that require multiple operations, or for queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented DBMS systems.

Advantages:
 This format is well suited to semi-structured data.
 Storing, retrieving and managing documents is easy.
Limitations:
 Handling multiple documents is challenging
 Aggregation operations may not work accurately.
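
The difference from a plain key-value store is that the database can look inside the value. A minimal sketch, assuming documents are Python dictionaries and using a hypothetical `find` helper that matches top-level fields only:

```python
# Documents in the same collection may have different fields (flexible schema).
documents = {
    "post:1": {"type": "post", "title": "Hello", "tags": ["intro"]},
    "post:2": {"type": "post", "title": "NoSQL notes", "author": {"name": "Asha"}},
    "user:1": {"type": "user", "name": "Asha", "email": "asha@example.com"},
}

def find(docs, **criteria):
    """Return the keys of documents whose top-level fields equal all criteria."""
    return [
        key
        for key, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

posts = find(documents, type="post")
asha = find(documents, type="user", name="Asha")
```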

Graph-Based

A graph database stores entities as well as the relations among those entities. Each entity is stored as a node, and each relationship as an edge between nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature. Traversing relationships is fast because they are already captured in the DB and do not need to be calculated.

Graph databases are mostly used for social networks, logistics and spatial data.

Advantages:
 Fastest traversal because of connections.
 Spatial data can be easily handled.
Limitations:
Wrong connections may lead to infinite loops.
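
A small sketch of nodes, edges and traversal, assuming the graph is held as an adjacency list. The people and the FOLLOWS relationship are hypothetical data. The visited set is what keeps a cycle in the data from turning into an endless loop.

```python
from collections import deque

# Adjacency list: node -> list of (relationship, neighbour) edges.
graph = {
    "alice": [("FOLLOWS", "bob")],
    "bob": [("FOLLOWS", "carol"), ("FOLLOWS", "alice")],  # edge back to alice forms a cycle
    "carol": [("FOLLOWS", "dave")],
    "dave": [],
}

def reachable(graph, start):
    """Breadth-first traversal; the visited set stops cycles from looping forever."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for _relationship, neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

from_alice = reachable(graph, "alice")
```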

What is the CAP Theorem?

The CAP theorem is also called Brewer's theorem. It states that it is impossible for a distributed data store to offer more than two out of the following three guarantees:

  1. Consistency
  2. Availability
  3. Partition Tolerance

Consistency:

The data should remain consistent even after the execution of an operation. This means once data is written, any future read request should contain that data. For example, after updating the order status, all the clients should be able to see the same data.

Availability:

The database should always be available and responsive. It should not have any downtime.

Partition Tolerance:

Partition Tolerance means that the system should continue to function even if the communication among the servers is not stable. For example, the servers can be partitioned into multiple groups which may not communicate with each other. Here, if part of the database is unavailable, other parts are always unaffected.

BASE is an abbreviation for “basically available, soft-state, and eventual consistency,” and the meanings are described as follows.


(1) Basically available: The DB system can execute and always provide services. Some parts of the DB system may have partial failures while the rest of the DB system continues to operate. Some NoSQL DBs typically keep several copies of specific data on different servers, which allows the DB system to respond to all queries even if a few of the servers fail.
(2) Soft-state: The DB system does not require a state of strong consistency. Strong consistency means that no matter which replica of a piece of data is updated, all later reads of that data must return the latest value.
(3) Eventual consistency: The DB system needs to meet the consistency requirement after a certain time. Sometimes the DB may be in an inconsistent state. For example, some NoSQL DBs keep multiple copies of certain data on multiple servers. These copies may be inconsistent for a short time, which happens when one copy of the data is updated while the other copies still hold the old version. Eventually, the replication mechanism in the NoSQL DB system will update all replicas to be consistent.
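
The replica behaviour described above can be imitated with a toy model. This is a sketch under simplifying assumptions: writes go to replica 0 first, and the other replicas catch up only when `sync()` is called. The class names and the `order:7` key are hypothetical.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy model: a write hits one replica, the others catch up when sync() runs."""

    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        self.replicas[0].data[key] = value  # first copy is updated immediately
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))  # queued for later

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Apply queued updates in order so that all replicas agree."""
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore(replica_count=3)
store.write("order:7", "shipped")
stale = store.read("order:7", replica_index=2)  # replica 2 has not caught up yet
store.sync()
fresh = store.read("order:7", replica_index=2)  # now consistent with replica 0
```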

Advantages of NoSQL

  • Can be used as Primary or Analytic Data Source
  • Big Data Capability
  • No Single Point of Failure
  • Easy Replication
  • No Need for Separate Caching Layer
  • It provides fast performance and horizontal scalability.
  • Can handle structured, semi-structured, and unstructured data with equal effect
  • Object-oriented programming which is easy to use and flexible
  • NoSQL databases don't need a dedicated high-performance server
  • Support Key Developer Languages and Platforms
  • Simpler to implement than an RDBMS
  • It can serve as the primary data source for online applications.
  • Handles big data which manages data velocity, variety, volume, and complexity
  • Excels at distributed database and multi-data center operations
  • Eliminates the need for a specific caching layer to store data
  • Offers a flexible schema design which can easily be altered without downtime or service disruption

Disadvantages of NoSQL

  • No standardization rules
  • Limited query capabilities
  • RDBMS databases and tools are comparatively mature
  • It does not offer traditional database capabilities such as consistency when multiple transactions are performed simultaneously.
  • As the volume of data increases, it becomes difficult to keep keys unique
  • Doesn't work as well with relational data
  • The learning curve is steep for new developers
  • Many options are open source, which makes them less popular with enterprises.

When should NoSQL be used:
1. When a huge amount of data needs to be stored and retrieved.
2. When the relationships between the data you store are not that important.
3. When the data changes over time and is not structured.
4. When support for constraints and joins is not required at the database level.
5. When the data is growing continuously and you need to scale the database regularly to handle it.



XML | DTD | XPath | X Query | X Schema (M2.2)

XML

XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language (SGML).

A markup language is a system for annotating a document.

Students often underline or highlight a passage so they can revise it easily. In a markup language, the highlighting or underlining is replaced by tags.

XML tags identify the data and are used to store and organize the data, rather than specifying how to display it like HTML tags, which are used to display the data. XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many successful features of HTML.

There are three important characteristics of XML that make it useful in a variety of systems and solutions −

  • XML is extensible − XML allows you to create your own self-descriptive tags, or language, that suits your application.

  • XML carries the data, does not present it − XML allows you to store the data irrespective of how it will be presented.

  • XML is a public standard − XML was developed by an organization called the World Wide Web Consortium (W3C) and is available as an open standard.

    Why XML ?

    Platform Independent and Language Independent: The main benefit of XML is that you can use it to take data from a program like Microsoft SQL Server, convert it into XML, and then share that XML with other programs and platforms. This lets two platforms communicate, which is otherwise often very difficult.

    The main thing that makes XML truly powerful is its international acceptance. Many corporations use XML interfaces for databases, programming, office applications, mobile phones and more, thanks to its platform independence.

    Advantages of XML

    Simplicity
    Open standard and platform/vendor-independent
    Extensibility
    Reuse
    Separation of content and presentation
    Improved load balancing
    Support for the integration of data from multiple sources
    Ability to describe data from a wide variety of applications
    More advanced search engines
    New opportunities

    1) XML separates data from HTML

    If you need to display dynamic data in your HTML document, it will take a lot of work to edit the HTML each time the data changes.

    With XML, data can be stored in separate XML files. This way you can focus on using HTML/CSS for display and layout, and be sure that changes in the underlying data will not require any changes to the HTML.

    With a few lines of JavaScript code, you can read an external XML file and update the data content of your web page.

    2) XML simplifies data sharing

    In the real world, computer systems and databases contain data in incompatible formats.

    XML data is stored in plain text format. This provides a software- and hardware-independent way of storing data.

    This makes it much easier to create data that can be shared by different applications.

    3) XML simplifies data transport

    One of the most time-consuming challenges for developers is to exchange data between incompatible systems over the Internet.

    Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible applications.

    4) XML simplifies Platform change

    Upgrading to new systems (hardware or software platforms), is always time consuming. Large amounts of data must be converted and incompatible data is often lost.

    XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing data.

    5) XML increases data availability

    Different applications can access your data, not only in HTML pages, but also from XML data sources.

    With XML, your data can be available to all kinds of "reading machines" (Handheld computers, voice machines, news feeds, etc), and make it more available for blind people, or people with other disabilities.

    6) XML can be used to create new internet languages

    A lot of new Internet languages are created with XML.

    Here are some examples:

  • XHTML
  • WSDL for describing available web services
  • WAP and WML as markup languages for handheld devices
  • RSS languages for news feeds
  • RDF and OWL for describing resources and ontology
  • SMIL for describing multimedia for the web

    XML Example

    XML documents form a hierarchical structure that looks like a tree, known as the XML tree, which starts at "the root" and branches to "the leaves".

        
    <bookstore>
        <book category="COOKING">
            <title lang="en">Everyday Italian</title>
            <author>Giada De Laurentiis</author>
            <year>2005</year>
            <price>30.00</price>
        </book>
        <book category="CHILDREN">
            <title lang="en">Harry Potter</title>
            <author>J K. Rowling</author>
            <year>2005</year>
            <price>29.99</price>
        </book>
        <book category="WEB">
            <title lang="en">Learning XML</title>
            <author>Erik T. Ray</author>
            <year>2003</year>
            <price>39.95</price>
        </book>
    </bookstore> 

    The root element in the example is <bookstore>. All elements in the document are contained within <bookstore>.

    The <book> element has 4 children: <title>, <author>, <year> and <price>.
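
The tree structure can be checked by parsing the bookstore document. The sketch below uses Python's standard `xml.etree.ElementTree` module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml_text = """
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
"""

root = ET.fromstring(xml_text.strip())

root_tag = root.tag                                  # the root element
first_book = root[0]                                 # first child of the root
child_tags = [child.tag for child in first_book]     # children of <book>
category = first_book.get("category")                # attribute lookup
titles = [book.find("title").text for book in root]  # leaf text of each book
```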

    Is XML a Programming Language?

    A programming language consists of grammar rules and its own vocabulary which is used to create computer programs. These programs instruct the computer to perform specific tasks. XML does not qualify to be a programming language as it does not perform any computation or algorithms. It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.

  • XML stands for eXtensible Markup Language
  • XML is a markup language much like HTML
  • XML was designed to store and transport data
  • XML was designed to be self-descriptive
  • XML is a W3C Recommendation
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The XML above does not do anything. XML is just information wrapped in tags.

Many computer systems contain data in incompatible formats. Exchanging data between incompatible systems (or upgraded systems) is a time-consuming task for web developers. Large amounts of data must be converted, and incompatible data is often lost.

XML stores data in plain text format. This provides a software- and hardware-independent way of storing, transporting, and sharing data.

XML also makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing data.

With XML, data can be available to all kinds of "reading machines" like people, computers, voice machines, news feeds, etc.

In XML, data can have an elaborate and intricate structure that is significantly richer and more complex than a table of rows and columns.

 DTD

  • DTD stands for Document Type Definition.
  • DTD defines the structure and the legal elements and attributes of an XML document.
  • Before working with a DTD, it helps to distinguish a well-formed document from a valid one.
  • An XML document is called "well-formed" if it follows the correct XML syntax.
  • A "valid" XML document is a well-formed document that has also been validated against a DTD.

Example

DTD for the note document (written as an internal DOCTYPE declaration):

<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

The DTD above is interpreted like this:

  • !DOCTYPE note -  Defines that the root element of the document is note
  • !ELEMENT note - Defines that the note element must contain the elements: "to, from, heading, body"
  • !ELEMENT to - Defines the to element to be of type "#PCDATA"
  • !ELEMENT from - Defines the from element to be of type "#PCDATA"
  • !ELEMENT heading  - Defines the heading element to be of type "#PCDATA"
  • !ELEMENT body - Defines the body element to be of type "#PCDATA"
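
Python's standard library can check well-formedness but does not validate against a DTD. The sketch below is therefore a hand-written check that mimics only what this particular DTD requires (root `note`, the four children in order, text-only content). It is not a general DTD validator, and the function name `matches_note_model` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Taken from <!ELEMENT note (to,from,heading,body)>
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_model(xml_text):
    """Return True if the text is well-formed and follows the note content model."""
    try:
        root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    except ET.ParseError:
        return False
    if root.tag != "note":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # #PCDATA means text only, so no child may contain nested elements.
    return all(len(child) == 0 for child in root)

valid_note = (
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)
missing_body = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>"
not_well_formed = "<note><to>Tove</note>"
```

The second document is well-formed but not valid, while the third is not even well-formed.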

 

XPath  

XPath is a path language used to navigate the structure of an XML document, or of an HTML page treated as a tree of elements.

It is a syntax for locating any element on a web page using an XML path expression.

XPath can be used with both HTML and XML documents to find the location of any element through the document's DOM structure.

Difference between “/” and “//” in XPath

There are mainly three differences between single slash and double slash.

1. Single slash is used to create absolute XPath whereas Double slash is used to create relative XPath.

2. Single slash selects an element from the root node. For example, /html will select the root HTML element.

A double slash searches for elements anywhere on the web page. For example, //table will select all the table elements from anywhere on the web page.

3. A single slash (/) in the middle of a path defines a parent-child relationship. For example, //div/table returns the table elements that are direct children of a div.

If double slash (//) is used in middle, it defines a descendant relationship. For example, /html//title returns title element which is descendant of html element.
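
The child versus descendant distinction can be tried with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. Paths there are evaluated relative to an element, so `./` stands in for an absolute path in this sketch. The small page and the `ids` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    "<html>"
    "<head><title>Demo</title></head>"
    "<body><div>"
    "<table id='t1'/>"
    "<section><table id='t2'/></section>"
    "</div></body>"
    "</html>"
)

def ids(elements):
    """Collect the id attribute of each element, in document order."""
    return [element.get("id") for element in elements]

# Step by step from the root, one child level per slash.
title_by_steps = page.find("./head/title").text

# Double slash at the start: tables anywhere below the root.
all_tables = ids(page.findall(".//table"))

# Single slash in the middle: only tables that are direct children of a div.
direct_children = ids(page.findall(".//div/table"))

# Double slash in the middle: tables at any depth below a div.
any_depth = ids(page.findall(".//div//table"))
```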

FLWOR

FLWOR is pronounced "flower", and is an acronym for the keywords used to introduce each clause (for, let, where, order by, and return).

for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title

With FLWOR you can sort the result:

for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title

The for clause selects all book elements under the bookstore element into a variable called $x.

The where clause selects only book elements with a price element with a value greater than 30.

The order by clause defines the sort order. Here the result is sorted by the title element.

The return clause specifies what should be returned. Here it returns the title elements.

FLWOR expressions are frequently used to combine related information. The possible combinations are generated by using variables in the for clause and using a where clause to filter out combinations that are not useful. This is known as a "join".
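
For readers more familiar with general-purpose code, the for / where / order by / return steps can be mirrored in Python. This is only an analogy and not an XQuery engine: the `titles_over` helper is hypothetical, it returns title text rather than title elements, and it runs over the three books from the bookstore example.

```python
import xml.etree.ElementTree as ET

books_xml = """
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <price>39.95</price>
    </book>
</bookstore>
"""

def titles_over(root, min_price):
    """Mirror a FLWOR expression: for, where, order by, return."""
    books = root.findall("./book")                         # for $x in .../bookstore/book
    selected = [book for book in books
                if float(book.findtext("price")) > min_price]  # where $x/price > min_price
    selected.sort(key=lambda book: book.findtext("title"))     # order by $x/title
    return [book.findtext("title") for book in selected]       # return $x/title (text only)

root = ET.fromstring(books_xml.strip())
over_30 = titles_over(root, 30)
over_20 = titles_over(root, 20)
```

Only one of the three books has a price strictly greater than 30, because 30.00 itself does not pass the `>` test.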

XML SCHEMA

  • The XML Schema language is also referred to as XML Schema Definition (XSD).
  • An XML Schema describes the structure of an XML document, just like a DTD.
  • An XML document with correct syntax is called "Well Formed".
  • An XML document validated against an XML Schema is both "Well Formed" and "Valid".
  • The purpose of an XML Schema is to define the legal building blocks of an XML document, such as:
      • the elements and attributes that can appear in a document
      • the number of (and order of) child elements
      • data types for elements and attributes
      • default and fixed values for elements and attributes

XSD Example

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

</xs:schema>

XML Schemas are More Powerful than DTD

  • XML Schemas are written in XML
  • XML Schemas are extensible to additions
  • XML Schemas support data types
  • XML Schemas support namespaces

Why Use an XML Schema?

With XML Schema, your XML files can carry a description of their own format.

With XML Schema, independent groups of people can agree on a standard for interchanging data.

With XML Schema, you can verify data.

XML Schemas Support Data Types

One of the greatest strengths of XML Schemas is the support for data types:

  • It is easier to describe document content
  • It is easier to define restrictions on data
  • It is easier to validate the correctness of data
  • It is easier to convert data between different data types

XML Schemas use XML Syntax

Another great strength of XML Schemas is that they are written in XML:

  • You don't have to learn a new language
  • You can use your XML editor to edit your Schema files
  • You can use your XML parser to parse your Schema files
  • You can manipulate your Schemas with the XML DOM
  • You can transform your Schemas with XSLT

 

XQuery

XQuery is to XML what SQL is to databases.

XQuery is designed to query XML data.


  • XQuery is the language for querying XML data
  • XQuery for XML is like SQL for databases
  • XQuery is built on XPath expressions
  • XQuery is supported by many major database systems
  • XQuery is a W3C Recommendation

 

XQuery Example

for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title

 

XQuery is About Querying XML

XQuery is a language for finding and extracting elements and attributes from XML documents.

Here is an example of what XQuery could solve:

"Select all CD records with a price less than $10 from the CD collection stored in cd_catalog.xml"

XQuery can be used to:
  • Extract information to use in a Web Service
  • Generate summary reports
  • Transform XML data to XHTML
  • Search Web documents for relevant information

XQuery is compatible with several W3C standards, such as XML, Namespaces, XSLT, XPath, and XML Schema.

XPath

XPath is a major element in the XSLT standard.

XPath can be used to navigate through elements and attributes in an XML document.

  • XPath stands for XML Path Language
  • XPath uses "path like" syntax to identify and navigate nodes in an XML document
  • XPath contains over 200 built-in functions
  • XPath is a W3C recommendation

XPath Path Expressions

XPath uses path expressions to select nodes or node-sets in an XML document.

These path expressions look very much like the path expressions you use with traditional computer file systems.

 

XPath includes over 200 built-in functions.

There are functions for string values, numeric values, booleans, date and time comparison, node manipulation, sequence manipulation, and much more.

Today XPath expressions can also be used in JavaScript, Java, XML Schema, PHP, Python, C and C++, and lots of other languages.

XPath is Used in XSLT

XPath is a major element in the XSLT standard.

With XPath knowledge you will be able to take great advantage of your XSLT knowledge.


 

https://www.javatpoint.com/xquery-tutorial

 

Database Important Questions

Module2

2015

1. What is range partitioning?

  • Range partitioning is a type of relational database partitioning where the partition is based on a predefined range for a specific data field such as uniquely numbered IDs, dates or simple values like currency. 
  • A partitioning key column is assigned with a specific range, and when a data entry fits this range, it is assigned to this partition; otherwise it is placed in another partition where it fits. 
  • In a range partitioned table, rows are distributed based on a "partitioning key" where the only requisite is whether or not the data falls within the range specification of the key. 
  • For example, if the partition key is a date column, and January 2015 is a partition, then all data containing values from January 1, 2015 to January 31, 2015 will be placed in this partition. 
  • Range partitioning is quite useful for applications requiring high performance 
  • This makes data segregation easy and access to each smaller partition is fast

Characteristics of range partitioning:

  • Each partition has an exclusive upper bound.
  • Each partition except the very first has an inclusive lower bound, which is the upper bound of the previous partition.
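
Assigning a row to a range partition amounts to finding the first upper bound that is greater than the key. A minimal sketch with exclusive upper bounds, using the standard `bisect` module; the monthly partition names and the `partition_for` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds in ascending order, one per partition.
upper_bounds = [date(2015, 2, 1), date(2015, 3, 1), date(2015, 4, 1)]
partition_names = ["p_2015_01", "p_2015_02", "p_2015_03"]

def partition_for(value):
    """Return the name of the partition whose range contains the value."""
    index = bisect_right(upper_bounds, value)  # first bound strictly greater than value
    if index == len(upper_bounds):
        raise ValueError(f"{value} is above the highest partition bound")
    return partition_names[index]

last_day_of_january = partition_for(date(2015, 1, 31))
first_day_of_february = partition_for(date(2015, 2, 1))
```

A key equal to a bound falls into the next partition, which is what an exclusive upper bound means in practice.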

2. Discuss about inter and intra query parallelism in parallel database

Parallelism is used to support speedup, where queries are executed faster because more resources, such as processors and disks, are provided. Parallelism is also used to provide scale-up, where increasing workloads are handled without an increase in response time, by increasing the degree of parallelism.

Intraquery Parallelism

  • Intraquery parallelism defines the execution of a single query in parallel on multiple processors and disks. 
  • Using intraquery parallelism is essential for speeding up long-running queries.
  • This application of parallelism decomposes the serial SQL query into lower-level operations such as scan, join, sort, and aggregation.
  • These lower-level operations are executed concurrently, in parallel.

It is the form of parallelism where Single Query is executed in parallel on many processors.

Types

Intra-operation parallelism – the process of speeding up a query through parallelizing the execution of individual operations. The operations which can be parallelized are Sort, Join, Projection, Selection and so on.

Inter-operation parallelism – the process of speeding up a query through parallelizing various  operations which are part of the query. For example, a query which involves join of 4 tables can be executed in parallel in two processors in such a way that each processor shall join two relations locally and the result1 and result2 can be joined further to produce the final result.
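
The four-table example can be sketched with two workers that each perform one join, followed by a final join of the two intermediate results. The relations, the `hash_join` helper and the use of threads are all assumptions made for illustration; a real parallel database would run the operators on separate processors.

```python
from concurrent.futures import ThreadPoolExecutor

def hash_join(left, right, key):
    """Join two lists of row dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]

# Four small hypothetical relations sharing an "id" column.
r1 = [{"id": 1, "a": "a1"}, {"id": 2, "a": "a2"}]
r2 = [{"id": 1, "b": "b1"}, {"id": 2, "b": "b2"}]
r3 = [{"id": 1, "c": "c1"}, {"id": 3, "c": "c3"}]
r4 = [{"id": 1, "d": "d1"}, {"id": 2, "d": "d2"}]

# The two independent joins run at the same time, one per worker.
with ThreadPoolExecutor(max_workers=2) as pool:
    future1 = pool.submit(hash_join, r1, r2, "id")
    future2 = pool.submit(hash_join, r3, r4, "id")
    result1, result2 = future1.result(), future2.result()

# The final join combines the two intermediate results.
final = hash_join(result1, result2, "id")
```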

Example Database systems which support Intra-query Parallelism

Informix, Teradata.

Advantages

To speed up a single complex, long-running query.

Best suited for complex scientific calculations (queries).

Supported Parallel Database Architectures

 

Interquery Parallelism

  • In interquery parallelism, different queries or transactions execute in parallel with one another.
  • This form of parallelism can increase transactions throughput.
  • The primary use of interquery parallelism is to scale up a transaction processing system to support a more significant number of transactions per second.

It is a form of parallelism where many different Queries or Transactions are executed in parallel with one another on many processors.

Advantages

It increases Transaction Throughput. That is, number of transactions executed in a given time can be increased.

It scales up the Transaction processing system. Hence, best suited for On-Line Transaction Processing (OLTP) systems.
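
Interquery parallelism can be pictured as several unrelated queries submitted at once, each handled by its own worker. The sketch below uses threads and an in-memory dictionary purely as stand-ins; the account data and the query functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only data shared by all queries.
accounts = {"A": 100, "B": 250, "C": 75}

def balance_of(name):
    return accounts[name]

def total_balance():
    return sum(accounts.values())

def count_above(threshold):
    return sum(1 for value in accounts.values() if value > threshold)

# Three independent "queries", each given to a separate worker.
queries = [(balance_of, ("B",)), (total_balance, ()), (count_above, (90,))]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(function, *args) for function, args in queries]
    results = [future.result() for future in futures]
```

No single query gets faster here. What improves is how many queries finish in a given time, which is the throughput benefit described above.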

Supported Parallel Database Architectures

It is easy to implement in a shared-memory parallel system. Lock tables and log information are maintained in the same memory, so it is easy to handle transactions that share locks with other transactions, and locking and logging can be done efficiently.

In other parallel architectures, such as shared disk and shared nothing, locking and logging must be done through message passing between processors, which is costly compared with the shared-memory architecture. Cache coherency problems can also occur.

Example Database systems which support Inter-query Parallelism

Oracle 8 and Oracle Rdb

 

 

3. Briefly describe the different types of architectures in DBMS


2016

1. Differentiate between parallel systems and Distributed System.


2. Discuss how intra and inter query parallelism is obtained in parallel database.

3. Discuss various types of database system architecture with a diagram.


2017

1. Explain range-partitioning sort briefly

It works in 2 steps:

  • Range partitioning the relation
  • Sorting each partition separately

Our aim is to sort, in parallel, a relation (table) Ri that resides on n disks on an attribute A.

Steps:

Step 1: At every processor, partition the relation Ri on the sorting attribute A using a range vector v. Send the records that fall in the ith range to processor Pi, where they are temporarily stored on disk Di.
Step 2: Sort each partition locally at each processor Pi, then send the sorted results to be merged with all the other sorted results. The merge is trivial, because the ranges are ordered and the sorted partitions only need to be concatenated.

Point to note:

Range partitioning must be done using a good range-partitioning vector. Otherwise, skew may become a problem.
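
The two steps can be sketched on a single machine, with each list standing in for the data held by one processor. The `range_partition_sort` helper and the sample values are hypothetical. The function also returns the partition sizes so that the effect of a poor range vector is visible.

```python
from bisect import bisect_right

def range_partition_sort(records, vector):
    """Sort records by range-partitioning on vector, then sorting each partition."""
    partitions = [[] for _ in range(len(vector) + 1)]
    # Step 1: route each record to the partition whose range contains it.
    for record in records:
        partitions[bisect_right(vector, record)].append(record)
    # Step 2: sort each partition locally (one processor each in a real system).
    sorted_parts = [sorted(part) for part in partitions]
    # The ranges are already ordered, so merging is plain concatenation.
    merged = [record for part in sorted_parts for record in part]
    return merged, [len(part) for part in partitions]

records = [42, 7, 19, 88, 3, 55, 61, 23]

balanced_result, balanced_sizes = range_partition_sort(records, [20, 50])
skewed_result, skewed_sizes = range_partition_sort(records, [5, 6])
```

Both vectors give the same sorted output, but with the second one almost all records land in a single partition, so one processor would do nearly all of the work.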

 

2. With pipelined parallelism, it is often a good idea to perform several operations in a pipeline on a single processor, even when many processors are available. Justify.

a. Explain why.
b. Would the arguments you advanced in part a hold if the machine has a shared-memory architecture? Explain why or why not.
c. Would the arguments in part a hold with independent parallelism? (That is, are there cases where, even if the operations are not pipelined and there are many processors available, it is still a good idea to perform several operations on the same processor?)

a. The speed-up obtained by parallelizing the operations would be offset by the data transfer overhead, as each tuple produced by an operator would have to be transferred to its consumer, which is running on a different processor. 

b. In a shared-memory architecture, transferring the tuples is very efficient. So the above argument does not hold to any significant degree. 


c. Even if two operations are independent, it may be that they both supply their outputs to a common third operator. In that case, running all three on the same processor may be better than transferring tuples across processors. 
 

3. What form of parallelism (interquery, interoperation, or intraoperation)  is likely to be the most important for each of the following tasks?
a. Increasing the throughput of a system with many small queries
b. Increasing the throughput of a system with a few large queries, when the number of disks and processors is large

Answer:
a. When there are many small queries, inter-query parallelism gives good throughput. Parallelizing each of these small queries would increase the initiation overhead without any significant reduction in response time.
b. With a few large queries, intra-query parallelism is essential to get fast response times. Given that there are a large number of processors and disks, only intra-operation parallelism can take advantage of the parallel hardware, because queries typically have few operations, but each one needs to process a large number of tuples.

2018

1. Differentiate inter and intra operation parallelism 

Intra-operation parallelism – speeding up a query by parallelizing the execution of each individual operation. Operations that can be parallelized this way include sort, join, projection and selection.

Inter-operation parallelism – speeding up a query by executing the different operations that make up the query in parallel. For example, a query that joins 4 tables can run on two processors so that each processor joins two of the relations locally; the two intermediate results (result1 and result2) are then joined to produce the final result.
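
The 4-table example can be sketched in Python as below. It is a rough illustration only: the relations r1 to r4, the join column "id" and the helper hash_join are invented for the example, and two threads stand in for the two processors (CPython threads show the structure, not a real speed gain).

```python
from concurrent.futures import ThreadPoolExecutor

def hash_join(left, right, key):
    """Equi-join two lists of dicts on the column `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Four hypothetical relations that share the join column "id".
r1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
r2 = [{"id": 1, "b": 10}, {"id": 2, "b": 20}]
r3 = [{"id": 1, "c": True}, {"id": 2, "c": False}]
r4 = [{"id": 2, "d": 3.5}]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Inter-operation parallelism: the two joins are independent
    # operations of the same query, so they run at the same time.
    f1 = pool.submit(hash_join, r1, r2, "id")
    f2 = pool.submit(hash_join, r3, r4, "id")
    result1, result2 = f1.result(), f2.result()

# The two intermediate results are joined to give the final result.
final = hash_join(result1, result2, "id")
print(final)
```

Intra-operation parallelism would instead split the work inside one hash_join call, for example by partitioning both inputs on "id" and joining each pair of partitions on a different processor.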

Example Database systems which support Intra-query Parallelism

 

2. Explain about parallel database

Parallel Databases :

Organizations now need to handle huge amounts of data at high transfer rates. A client-server or centralized system is not efficient enough for such requirements. The parallel database came into the picture from this need to improve efficiency: a parallel database system seeks to improve performance by carrying out its operations in parallel.

Need :

Multiple resources such as CPUs and disks are used in parallel. Operations are performed simultaneously, as opposed to serial processing. A parallel server can allow users on multiple machines to access a single database. It also parallelizes many operations, such as data loading, query processing, building indexes, and evaluating queries.

Advantages

  1. Performance improvement – Connecting multiple resources such as CPUs and disks in parallel can significantly increase the performance of the system.

  2. High availability – In a parallel database, nodes have little contact with each other, so the failure of one node does not cause the entire system to fail. This amounts to significantly higher database availability.

  3. Proper resource utilization – Because of parallel execution, the CPU is rarely idle, so resources are used well.

  4. Increased reliability – When one site fails, execution can continue at another available site that holds a copy of the data, which makes the system more reliable.

Performance Measurement of Databases :
The two main performance measures for parallel databases are speedup and scaleup.

 

Speedup

Speedup measures how much faster a fixed-size problem runs when it is moved from a small system to an N-times larger system, i.e. the same task finishes in less time because the degree of parallelism is increased.
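
A small numeric sketch may help here. It assumes the usual textbook definitions: speedup is the time on the small system divided by the time on the larger system for the same problem, and scaleup is the time for a problem on the small system divided by the time for an N-times larger problem on the N-times larger system. The timings are made-up numbers, not measurements.

```python
def speedup(time_small, time_large):
    """Same fixed-size problem: time on the small system divided by
    time on the N-times larger system. Linear speedup equals N."""
    return time_small / time_large

def scaleup(time_small, time_large):
    """Problem of size Q on the small system divided by the time for a
    problem of size N*Q on the N-times larger system. Linear scaleup
    equals 1."""
    return time_small / time_large

# Hypothetical timings in seconds, for a system made 4 times larger.
n = 4
s = speedup(time_small=120.0, time_large=40.0)
print(s, "linear" if s >= n else "sublinear")   # 3.0 sublinear

sc = scaleup(time_small=100.0, time_large=125.0)
print(sc, "linear" if sc >= 1 else "sublinear")  # 0.8 sublinear
```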


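Two kinds of scaleup are relevant in parallel database systems, depending on how the size of the task is measured:
  • Batch scaleup: the size of the database grows, and the task is a single large job whose runtime depends on the database size.
  • Transaction scaleup: the rate at which transactions are submitted grows, and the database size grows in proportion to it.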


3. Discuss in detail the various database system architecture designs

2019 

 1. Differentiate data servers and transaction servers


2. With the help of an algorithm and example, explain how fragment and replicate join is performed during intra query parallelism

This is the general case of the asymmetric fragment-and-replicate join technique. The asymmetric technique suits the case where one of the relations to be joined is small enough to fit in memory. If both relations are large and the join condition is not an equality, the general fragment-and-replicate join is needed. It works as follows:
1. The system fragments relation r into m fragments r0, r1, r2, ..., rm-1, and s into n fragments s0, s1, s2, ..., sn-1. Any partitioning technique (round-robin, hash or range partitioning) can be used.
2. The values of m and n are chosen based on the number of available processors: at least m*n processors are needed to perform the join.
3. The fragments of r and s are then distributed to the processors. Every tuple of one relation must be compared with every tuple of the other, so each fragment ri must meet every fragment of s, and each fragment sj must meet every fragment of r. The data is distributed as follows:
                a. Name the m*n processors P0,0, P0,1, …, P0,n-1, P1,0, P1,1, …, Pm-1,n-1. Processor Pi,j performs the join of ri with sj.
                b. Replicate ri to processors Pi,0, Pi,1, Pi,2, …, Pi,n-1, where 0, 1, 2, …, n-1 index the fragments of s. This ensures that every ri is compared with the whole of s.
                c. Replicate sj to processors P0,j, P1,j, P2,j, …, Pm-1,j, where 0, 1, 2, …, m-1 index the fragments of r. This ensures that every sj is compared with the whole of r.
4. Each processor Pi,j computes its join locally; the union of the local results is the join result.
Figure 2 given below shows the process of the general-case fragment-and-replicate join (it may not be the most appropriate example, but it clearly shows the process).

Points to Note

1. Asymmetric fragment-and-replicate join is the special case of the general fragment-and-replicate join in which m or n is 1, i.e. one of the relations is not partitioned.

2. Compared to the asymmetric technique, fragment-and-replicate join reduces the size of the tables at every processor.

3. Any partitioning technique and any join technique can be used.

4. Fragment-and-replicate join suits both equi-joins and non-equi-joins.

5. It involves a higher cost in partitioning and replicating the data.
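
The fragment-and-replicate steps described above can be sketched in Python as follows. This is a single-process illustration under assumed names: the m*n processors are entries of a dictionary, round-robin is picked as the partitioning technique, nested loops as the local join, and the sample relations are invented.

```python
from itertools import product

def round_robin(relation, parts):
    """Split a list into `parts` fragments in round-robin order."""
    return [relation[i::parts] for i in range(parts)]

def fragment_and_replicate_join(r, s, m, n, predicate):
    r_frags = round_robin(r, m)   # r0 .. r(m-1)
    s_frags = round_robin(s, n)   # s0 .. s(n-1)

    # Processor P(i,j) receives a copy of ri and a copy of sj, so ri is
    # replicated to n processors and sj to m processors.
    processors = {(i, j): (r_frags[i], s_frags[j])
                  for i, j in product(range(m), range(n))}

    # Each processor joins its two fragments locally (nested loops
    # here); the final result is the union of all local results.
    result = []
    for (i, j), (ri, sj) in processors.items():
        result.extend((a, b) for a in ri for b in sj if predicate(a, b))
    return result

r = [1, 5, 9, 12]
s = [3, 6, 10]
# Non-equi join condition: r.A < s.B
out = fragment_and_replicate_join(r, s, m=2, n=3,
                                  predicate=lambda a, b: a < b)
print(sorted(out))
# [(1, 3), (1, 6), (1, 10), (5, 6), (5, 10), (9, 10)]
```

Setting n=1 leaves s unpartitioned and replicates all of it to every processor, which is the asymmetric case from point 1.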

 

3. Explain the various parallel database architectures. How does the execution of data vary according to the architecture used?

Shared memory system

  • A shared memory system uses multiple processors that are attached to a global shared memory via an intercommunication channel or communication bus.
  • Each processor in a shared memory system has a large cache, so most references to the shared memory are avoided.
  • If a processor writes to a memory location, the copies of that location cached at other processors must be updated or removed.
Figure: shared memory system

Advantages of Shared memory system
  • Data is easily accessible to any processor.
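  • One processor can send messages to another efficiently.
Disadvantages of Shared memory system
  • Processors spend more time waiting as the number of processors grows, because they all contend for the same bus and memory.
  • The bus or interconnect has limited bandwidth and becomes a bottleneck.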

Shared Disk System

  • A shared disk system uses multiple processors, each with its own local memory, that can all access multiple disks via an intercommunication channel.
  • Each processor has its own memory, so the memory bus is not a bottleneck.
  • Systems built around this architecture are called clusters.
Figure: shared disk system
Advantages of Shared Disk System
  • Fault tolerance is achieved using shared disk system.
    Fault tolerance: If a processor or its memory fails, the other processors can take over its tasks, because the data resides on disks that all processors can access.
Disadvantages of Shared Disk System
  • A shared disk system has limited scalability, because a large amount of data travels through the interconnection channel to the disks.
  • Adding more processors slows down the existing ones, as they all compete for the same interconnect.
Applications of Shared Disk System
Digital Equipment Corporation (DEC): DEC clusters running the Rdb relational database used the shared disk system. Rdb is now owned by Oracle.

Shared nothing disk system

  • Each processor in the shared nothing system has its own local memory and local disk.
  • Processors communicate with each other through an intercommunication channel.
  • Each processor acts as a server for the data stored on its local disk.
Figure: shared nothing disk system

Advantages of Shared nothing disk system
  • Processors and disks can be added as required in a shared nothing disk system.
  • A shared nothing disk system can support many processors, which makes it more scalable.
Disadvantages of Shared nothing disk system
  • Data partitioning is required in a shared nothing disk system.
  • The cost of communication and of accessing non-local disks is much higher.
Applications of Shared nothing disk system
  • Teradata database machine.
  • The Grace and Gamma research prototypes.
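
The shared nothing idea, where each node serves only the data on its own disk and data must therefore be partitioned, can be sketched as follows. The classes Node and SharedNothingCluster and the hash partitioning on integer keys are assumptions made for the illustration, not part of any real system.

```python
class Node:
    """One shared nothing node: its own memory and its own local disk."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_disk = {}

class SharedNothingCluster:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def _owner(self, key):
        # Hash partitioning on an integer key decides which node
        # stores the record.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).local_disk[key] = value

    def get(self, key):
        # The request is routed to the owning node, which serves it
        # from its local disk; no other node holds this key.
        return self._owner(key).local_disk.get(key)

cluster = SharedNothingCluster(num_nodes=3)
for k in range(7):
    cluster.put(k, f"row-{k}")

print(cluster.get(4))                                  # row-4
print([sorted(n.local_disk) for n in cluster.nodes])   # [[0, 3, 6], [1, 4], [2, 5]]
```

A request for a key that lives on another node has to travel over the interconnect, which is the communication cost listed among the disadvantages.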

Hierarchical System or Non-Uniform Memory Architecture

  • The hierarchical system is a hybrid of the shared memory, shared disk and shared nothing systems.
  • The hierarchical model is also known as Non-Uniform Memory Architecture (NUMA).
  • In this system each group of processors has a local memory, but processors in other groups can also access that memory coherently.
  • NUMA uses both local and remote memory (memory belonging to another group), so accessing remote memory takes longer than accessing local memory.
Advantages of NUMA
  • Improves the scalability of the system.
  • The memory bottleneck (contention for a single shared memory) is reduced in this architecture.
Disadvantages of NUMA
  • The cost of the architecture is higher compared to other architectures.


Monk and Inversions

using System; public class Solution { public static void Main () { int T = Convert . ToInt32 ( Console . ReadLine...