cs.lectures: 12/26/20

Query optimization in Centralized Systems

The optimal access path is determined after the alternative access paths are derived for the relational algebra expression. This chapter focus on query optimization in a centralized system.

Query processing for a centralized system is done to achieve:

The response time of a query is minimized.
The system throughput is maximized
The memory and storage used for processing are reduced.
Parallelism is increased.

Steps for Query Optimization

There are three steps for query optimization. They are -

Step 1 − Query Tree Generation

A relational algebra expression is represented by a tree data structure known as a query tree. Leaf nodes represent the tables of the query. The internal nodes represent the relational algebra operations and the complete query is represented by a root.

When the operand table is made available, the internal node is executed. The result table replaces the node and the process is continued until the result table replaces the root node.

Example 1

The query considered is as follows:

πEmpID(σEName="ArunKumar"(EMPLOYEE))

The query tree appears as follows:

Step 2 − Query Plan Generation

The query plan is prepared once the query tree is generated. All the operations of the query tree are included with access paths which are known as query plan. The relational operations on the performance of the tree are specified by the access paths. For instance, the access path for a selection operation provides information on the use of B+ tree index.

The information about the intermediate tables that are required to be passed from one operator to another is provided by a query plan. The information about the usage of temporary tables, combining the operations is provided by the query plan.

Step 3− Code Generation

The final step of query optimization is the generation of the code. The type of the underlying operating system determines the form of the query. The results are produced by running the query code thus generated by the Execution Manager.

Different approaches/algorithms to Query Optimization

Heuristic Based Optimization

Semantic Query Optimization

Query Processing and Query optimization

Query Processing includes translations on high-level Queries into low-level expressions that can be used at the physical level of the file system, query optimization and actual execution of the query to get the actual result.

In Simple, Query Processing is the activity performed in extracting data from the database.

Textbook Based:

A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated.

The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation names—that appear in the text of the query, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.

The query must also be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.

An internal representation of the query is then created, usually as a tree data structure called a query tree. It is also possible to rep-resent the query using a graph data structure called a query graph.

The DBMS must then devise an execution strategy or query plan for retrieving the results of the query from the database files. A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.

Query processing has various steps for fetching the data from the database. The steps involved are:

Parsing and translation
Optimization
Evaluation

Parsing and translation

Before processing a query, a computer system needs to translate the query, The translation process in query processing is similar to the parser of a query. When a user executes any query, for generating the internal form of the query, the parser in the system checks the syntax of the query, verifies the name of the relation in the database, the tuple, and finally the required attribute value. The parser creates a tree of the query, known as 'parse-tree.' Further, translate it into the form of relational algebra. With this, it evenly replaces all the use of the views when used in the query. During parse call, the database performs the following checks- Syntax check, Semantic check and Shared pool check, after converting the query into relational algebra.

Syntax check

Semantic check

Shared Pool check

Hard Parse and Soft Parse –

If there is a fresh query and its hash code does not exist in the shared pool then that query has to pass through from the additional steps known as hard parsing otherwise if hash code exists then query does not pass through additional steps. It just passes directly to execution engine (refer detailed diagram). This is known as soft parsing.

Hard Parse includes following steps – Optimizer and Row source generation.

Query Optimization

It is a process in which multiple query execution plan for satisfying a query are examined and the most efficient query plan is satisfied for execution.
Database catalogue stores the execution plans and then optimizer pass the lowest cost plan for execution.

There are two main techniques that are employed during query optimization.

The first technique is based on heuristic rules for ordering the operations in a query execution strategy. A heuristic is a rule that works well in most cases but is not guaranteed to work well in every case. The rules typically reorder the operations in a query tree.

The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate. These techniques are usually combined in a query optimizer.

Two internal representations of a query:

◼ Query Tree

◼ Query Graph

Row Source Generation –
The Row Source Generation is a software that receives an optimal execution plan from the optimizer and produces an iterative execution plan that is usable by the rest of the database. the iterative plan is the binary program that when executes by the SQL engine produces the result set.

Query Evaluation Plan

In order to fully evaluate a query, the system needs to construct a query evaluation plan.
The annotations in the evaluation plan may refer to the algorithms to be used for the particular index or the specific operations.
Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation primitives carry the instructions needed for the evaluation of the operation.
Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query. The query evaluation plan is also referred to as the query execution plan.
A query execution engine is responsible for generating the output of the given query. It takes the query execution plan, executes it, and finally makes the output for the user query.

Algorithms for Relational Database Schema Design

We have three algorithms for creating a relational decomposition from a universal relation as follows,

Dependency-Preserving Decomposition into 3NF Schemas
Non-additive Join Decomposition into BCNF Schemas
Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas

Dependency-Preserving Decomposition into 3NF Schemas

Dependency Preservation

A Decomposition D = { R1, R2, R3....Rn } of R is dependency preserving wrt a set F of Functional dependency if,

(F1 ∪ F2 ∪ ... ∪ Fm)+ = F+.

Consider a relation R,

R ---> F{...with some functional dependency(FD)....}

R is decomposed or divided into R1 with FD { f1 } and R2 with { f2 }, then

there can be three cases:

1. f1 U f2 = F -----> Decomposition is dependency preserving.

2. f1 U f2 is a subset of F -----> Not Dependency preserving.

3. f1 U f2 is a superset of F -----> This case is not possible.

A dependency-preserving decomposition D = {R1, R2, ..., Rm} of a universal relation R based on a set of functional dependencies F, such that each Ri in D is in 3NF.
It guarantees only the dependency-preserving property; it does not guarantee the nonadditive join property.

Relational Synthesis into 3NF with Dependency Preservation

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

The first step of the algorithm is to find a minimal cover for F.

Note that multiple minimal covers may exist for a given set F

Minimal cover

Definition 1:

A minimal cover of a set of FDs F is a minimal set of functional dependencies F_min that is equivalent to F. There can be many such minimal covers for a set of functional dependencies F.

Definition 2:

A set of FDs F is minimum if F has as few FDs as an equivalent set of FDs.

Properties/Steps of minimal cover

1. Right Hand Side (RHS) of all FDs should be a single attribute.

2. Remove extraneous attributes (an attribute which can remove it without changing the closure of the set of functional dependencies).

3. Eliminate redundant functional dependencies.

Example

Let us apply these properties to F = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}

Step 1:

According to the property, there should be only a single attribute on the RHS.

so we can write,

F= {A → C, AB → C, C → D, C → I, CD → I, EC → A, EC → B, EI → C}

Step 2:

We can remove AB --> C & CD --> I as these both are extraneous because C can be derived from A itself and I can be derived from C alone. Therefore,

F= { F2 = {A → C, C → D, C → I, EC → A, EC → B, EI → C}

Step 3:

None of the FDs is redundant. Hence, F is the minimal cover.

Nonadditive Join Decomposition into BCNF Schemas

The algorithm decomposes a universal relation schema R = {A1, A2, ..., An} into a decomposition D = {R1, R2, ..., Rm} such that each Ri is in BCNF and the decomposition D has the lossless join property with respect to F.

Relational Decomposition into BCNF with Nonadditive Join Property

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

Set D := {R} ;

While there is a relation schema Q in D that is not in BCNF do

{

choose a relation schema Q in D that is not in BCNF;

find a functional dependency X → Y in Q that violates BCNF;

replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);

} ;

Each time through the loop, we decompose one relation schema Q that is not in BCNF into two relation schemas. According to Property NJB for binary decompositions, the decomposition D has the nonadditive join property. At the end of the algorithm, all relation schemas in D will be in BCNF.

It is necessary to determine whether a relation schema Q is in BCNF or not. One method for doing this is to test, for each functional dependency X → Y in Q, whether X+ fails to include all the attributes in Q, thereby determining whether or not X is a (super)key in Q. Another technique is based on an observation that whenever a relation schema Q has a BCNF violation, there exists a pair of attributes A and B in Q such that {Q – {A, B} } → A; by computing the closure {Q – {A, B} }+ for each pair of attributes {A, B} of Q, and checking whether the closure includes A (or B), we can determine whether Q is in BCNF.

Example:

Given R = {A, B, C} and the functional dependencies F = {AB → C, C → B}

Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas

Relational Synthesis into 3NF with Dependency Preservation and Nonadditive Join Property

As we know, it is not possible to have all three of the following:

(1) guaranteed non-lossy design

(2) guaranteed dependency preservation

(3) all relations in BCNF

Now we give an alternative algorithm where we achieve conditions 1 and 2 and only guarantee 3NF which, Preserves dependencies & has the nonadditive join property such that each resulting relation schema in the decomposition is in 3NF.