CS.Lectures: Query Processing and Query optimization

Query Processing includes translations on high-level Queries into low-level expressions that can be used at the physical level of the file system, query optimization and actual execution of the query to get the actual result.

In Simple, Query Processing is the activity performed in extracting data from the database.

Textbook Based:

A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated.

The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation names—that appear in the text of the query, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.

The query must also be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.

An internal representation of the query is then created, usually as a tree data structure called a query tree. It is also possible to rep-resent the query using a graph data structure called a query graph.

The DBMS must then devise an execution strategy or query plan for retrieving the results of the query from the database files. A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.

Query processing has various steps for fetching the data from the database. The steps involved are:

Parsing and translation
Optimization
Evaluation

Parsing and translation

Before processing a query, a computer system needs to translate the query, The translation process in query processing is similar to the parser of a query. When a user executes any query, for generating the internal form of the query, the parser in the system checks the syntax of the query, verifies the name of the relation in the database, the tuple, and finally the required attribute value. The parser creates a tree of the query, known as 'parse-tree.' Further, translate it into the form of relational algebra. With this, it evenly replaces all the use of the views when used in the query. During parse call, the database performs the following checks- Syntax check, Semantic check and Shared pool check, after converting the query into relational algebra.

Syntax check

Semantic check

Shared Pool check

Hard Parse and Soft Parse –

If there is a fresh query and its hash code does not exist in the shared pool then that query has to pass through from the additional steps known as hard parsing otherwise if hash code exists then query does not pass through additional steps. It just passes directly to execution engine (refer detailed diagram). This is known as soft parsing.

Hard Parse includes following steps – Optimizer and Row source generation.

Query Optimization

It is a process in which multiple query execution plan for satisfying a query are examined and the most efficient query plan is satisfied for execution.
Database catalogue stores the execution plans and then optimizer pass the lowest cost plan for execution.

There are two main techniques that are employed during query optimization.

The first technique is based on heuristic rules for ordering the operations in a query execution strategy. A heuristic is a rule that works well in most cases but is not guaranteed to work well in every case. The rules typically reorder the operations in a query tree.

The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate. These techniques are usually combined in a query optimizer.

Two internal representations of a query:

◼ Query Tree

◼ Query Graph

Row Source Generation –
The Row Source Generation is a software that receives an optimal execution plan from the optimizer and produces an iterative execution plan that is usable by the rest of the database. the iterative plan is the binary program that when executes by the SQL engine produces the result set.

Query Evaluation Plan

In order to fully evaluate a query, the system needs to construct a query evaluation plan.
The annotations in the evaluation plan may refer to the algorithms to be used for the particular index or the specific operations.
Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation primitives carry the instructions needed for the evaluation of the operation.
Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query. The query evaluation plan is also referred to as the query execution plan.
A query execution engine is responsible for generating the output of the given query. It takes the query execution plan, executes it, and finally makes the output for the user query.

Saturday, December 26, 2020

Query Processing and Query optimization

Query Evaluation Plan