Parallel and Distributed Databases/Database System Architectures (M2)

Parallel Database

  • A parallel database is a system in which multiple processors execute queries simultaneously.
  • A parallel database may use thousands of small processors.

Advantages of parallel databases

  • Improved performance in the processing of data.
  • Increased data-processing speed, by using multiple resources such as CPUs and disks in parallel.
  • High availability: the same data is stored at multiple locations.
  • Increased reliability: even if one data site fails, execution can continue, as other copies of the data are available.

Distributed Database


Data are often generated and stored on different database systems, and there is a need to execute queries and update transactions across multiple databases. This need led to the development of distributed database systems.

A distributed database is a collection of multiple interconnected databases, spread physically across various locations, that communicate via a computer network.

Advantages of distributed databases

  • Modular development: to expand the system to new locations, we simply add nodes to the current network; the new nodes do not interrupt the current network's functions.
  • Increased reliability: if one node fails, its work can be distributed among the other nodes on the network, so the failure of one node does not stop the system.
  • Improved performance: a small database is easier to handle than a large one, so a large database is distributed into smaller databases across various locations, which are easier to manage and perform better.
  • Increased availability: failure of one node does not affect data availability, as the data can be obtained from other nodes on the network.
  • Faster response: because data is available locally, data retrieval is efficient.




Database System Architectures

  • Centralized and Client-Server Architectures
  • Server System Architectures
  • Parallel Systems
  • Distributed Systems

Centralized Database Systems

  • Centralized database systems are those that run on a single computer system.
  • They are widely used for enterprise-scale applications.
  • There are two ways in which computers are used: as single-user systems and as multiuser systems.
A typical single-user system is a system used by a single person, usually with only one
processor (usually with multiple cores), and one or two disks.

A typical multiuser system, on the other hand, has multiple disks, a large amount of memory, and multiple processors. Such systems serve a large number of users who are connected to the system remotely, and they are called server systems.

Database systems designed for single-user systems usually do not provide many of the facilities that a multiuser database provides.

Such systems may not support SQL and may instead provide an API for data access. They are referred to as embedded databases, since they are usually designed to be linked to a single application program and are accessible only from that application.

Multiuser database systems support the full transactional features. Such databases are usually designed as servers, which service requests received from application programs; the requests could be in the form of SQL queries or they could be requests for retrieving, storing, or updating data specified using an API.

Most general-purpose computer systems in use today have a few multicore processors (typically one to four), with each multicore processor having a few cores. Main memory is shared across all the processors in a general-purpose computer system. Parallelism with such a small number of cores, and with shared memory, is referred to as coarse-grained parallelism.

Databases running on coarse-grained parallel machines traditionally did not attempt to partition a single query among the processors; instead, they ran each query on a single processor, allowing multiple queries to run concurrently. Thus, such systems support higher throughput; that is, they allow a greater number of transactions to run per second.

In contrast, machines with fine-grained parallelism have a large number of processors, and database systems running on such machines attempt to parallelize single tasks (queries, for example) submitted by users.

Client-Server System

Server systems satisfy requests generated at multiple client systems. Database functionality can be broadly divided into two parts:

Front-end
Back-end

Advantages of replacing mainframes with networks of workstations or personal computers connected to back-end server machines:

  • better functionality for the cost
  • flexibility in locating resources and expanding facilities
  • better user interfaces
  • easier maintenance

Server System Architectures

Server systems can be broadly categorized as transaction servers and data servers.

Transaction-Server Systems

Transaction-server systems, also called query-server systems, provide an interface to which clients can send requests to perform an action; in response, they execute the action and send back results to the client.

Usually, client machines ship transactions to the server systems, where those transactions are executed, and results are shipped back to clients that are in charge of displaying the data. 

Requests may be specified through the use of SQL or through a specialized application program interface.


A transaction-server system consists of multiple processes accessing data in shared memory.

The processes that form part of the database system include:
  • Server processes: These are processes that receive user queries (transactions), execute them, and send the results back. The queries may be submitted to the server processes from a user interface, or from a user process running embedded SQL, or via JDBC, ODBC, or similar protocols.
  • Lock manager process: This process implements lock manager functionality, which includes lock grant, lock release, and deadlock detection.
  • Database writer process: There are one or more processes that output modified buffer blocks back to disk continuously.
  • Log writer process: This process outputs log records from the log record buffer to stable storage. Server processes simply add log records to the log record buffer in shared memory, and if a log force is required, they request the log writer process to output log records (recall that a log force causes the log contents in memory to be output to stable storage).
  • Checkpoint process: This process performs periodic checkpoints.
  • Process monitor process: This process monitors other processes, and if any of them fails, it takes recovery actions for the process, such as aborting any transaction being executed by the failed process and then restarting the process.
The shared memory contains all shared data, such as:

  • Buffer pool.
  • Lock table.
  • Log buffer, containing log records waiting to be output to the log on stable storage.
  • Cached query plans, which can be reused if the same query is submitted again.

To ensure that no two processes access the same data structure at the same time, database systems implement mutual exclusion using either:

  • Operating system semaphores
  • Atomic instructions such as test-and-set or compare-and-swap, which are supported by the computer hardware.
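As an illustration, mutual exclusion built on test-and-set can be sketched in Python. The real primitive is a single atomic CPU instruction; here a threading.Lock merely stands in for that hardware atomicity, so this is a model of the technique, not how a server actually implements it:

```python
import threading

class AtomicFlag:
    """Simulates a hardware test-and-set word; the Lock stands in for
    the atomicity the CPU instruction would provide."""
    def __init__(self):
        self._flag = False
        self._guard = threading.Lock()

    def test_and_set(self):
        # Atomically read the old value and set the flag to True.
        with self._guard:
            old = self._flag
            self._flag = True
            return old

    def clear(self):
        self._flag = False

class SpinLock:
    """Mutual exclusion built on test-and-set, as a server process
    might use to protect a shared data structure (e.g., the lock table)."""
    def __init__(self):
        self._flag = AtomicFlag()

    def acquire(self):
        while self._flag.test_and_set():   # spin until we observe False
            pass

    def release(self):
        self._flag.clear()

# Two threads incrementing a shared counter under the spin lock.
counter = 0
lock = SpinLock()

def worker():
    global counter
    for _ in range(10000):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 20000: no increments were lost
```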

Data-Server Systems

Data-server systems allow clients to interact with the servers by making requests to read or update data, in units such as files, pages, or objects.

For example, file servers provide a file-system interface where clients can create, update, read, and delete files. 

Data servers are used in local-area networks where:
  • There is a high-speed connection between the clients and the server.
  • The client machines have comparatively high processing power.
  • The tasks to be executed are compute-intensive.
In such an environment, the server machine ships data to the client machines, where all processing is performed; the client then ships the data back to the server machine.

Data servers for database systems offer much more functionality; they support units of data such as pages, tuples, or objects that are smaller than a file. They provide indexing facilities for data, and they provide transaction facilities so that the data are never left in an inconsistent state if a client machine or process fails. We use the term data item to refer to tuples, objects, files, and documents, and we use the terms data server and data storage system interchangeably. Data servers support communication of entire data items.

Data servers in earlier generations of storage systems supported a concept called
page shipping, where the unit of communication is a database page that may potentially
contain multiple data items. Page shipping is not used today, since storage systems do
not expose the underlying storage layout to clients.

Caching at Clients
The time cost of communication between a client application and a server is high compared to that of a local memory reference.

The following issues determine the time cost of communication between client and server:
  • Data shipping (page shipping versus item shipping): the unit of communication for data is either a page or an item (a tuple or an object); data can be fetched by fetching a whole page or a single item.
  • Locks: locks are usually granted by the server for the data item or page that it ships to the client machine.

The time to send a message over a network and get a response back is called the network round-trip time, or network latency. As a result, applications running at the clients adopt several optimization strategies to reduce the effects of network latency.

The optimization strategies include the following:
  • Prefetching: fetching items even before they are requested.
  • Data caching: data shipped to a client on behalf of a transaction can be cached at the client within the scope of that transaction. Data can be cached even after the transaction completes, allowing successive transactions at the same client to make use of the cached data; however, cache coherency then becomes an issue.
  • Lock caching: locks can be retained by the client even after a transaction completes, so that later transactions at the same client can reuse them.
  • Adaptive lock granularity: the granularity of locks (for example, page-level versus item-level) is adjusted dynamically. Lock de-escalation is a way of adaptively decreasing the lock granularity if there is higher contention.

Parallel Systems

Parallel systems improve processing and I/O speeds by using a large number of computers in parallel.

In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially.

A coarse-grain parallel machine consists of a small number of powerful processors; a massively parallel or fine-grain parallel machine uses thousands of smaller processors.

Measures of Performance for Parallel Systems

There are two main measures of the performance of a database system: 
(1) throughput, the number of tasks that can be completed in a given time interval.
(2) response time, the amount of time it takes to complete a single task from the time it is submitted. 

A system that processes a large number of small transactions can improve throughput by
processing many transactions in parallel. A system that processes large transactions can improve response time as well as throughput by performing subtasks of each transaction in parallel.

Two important issues in studying parallelism are speedup and scaleup. Running a
given task in less time by increasing the degree of parallelism is called speedup. Handling larger tasks by increasing the degree of parallelism is called scaleup.

Speedup

Speedup is the execution of a fixed-size task in less time by increasing the degree of parallelism: a fixed-size problem executing on a small system is given to a system that is N times larger. If the small system takes time TS and the larger system takes time TL, the speedup is TS/TL; the speedup is linear if it equals N.


Scaleup

Scaleup is the ability to handle a larger task by increasing the degree of parallelism. Two kinds of scaleup are relevant in parallel database systems, depending on how the size of the task is measured:

  • In batch scaleup, the size of the database increases, and the tasks are large jobs whose runtime depends on the size of the database.
  • In transaction scaleup, the rate at which transactions are submitted to the database increases, and the size of the database increases proportionally to the transaction rate. This kind of scaleup is what is relevant in transaction-processing systems where the transactions are small updates.
Scaleup is usually the more important metric for measuring the efficiency of parallel database systems. The goal of parallelism in database systems is usually to make sure that the database system can continue to perform at an acceptable speed, even as the size of the database and the number of transactions increase.

Several factors work against efficient parallel operation and can diminish both speedup and scaleup:
  • Sequential computation and start-up costs
  • Interference
  • Skew
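The effect of the sequential part on speedup can be illustrated with Amdahl's law (a standard result, not derived in these notes); the timings and fractions below are made-up values:

```python
def speedup(t_small, t_large):
    """Speedup = TS / TL: time on the small system over time on the
    N-times larger system."""
    return t_small / t_large

def amdahl_speedup(serial_fraction, n):
    """Upper bound on speedup with n processors when a fraction of the
    task is inherently sequential (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

print(speedup(100.0, 25.0))     # 4.0: the task finished 4x faster
print(amdahl_speedup(0.1, 8))   # ~4.71: far below linear speedup of 8
```

Even with only 10% sequential computation, 8 processors yield less than a 5x speedup, which is why start-up costs, interference, and skew matter so much in practice.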
Interconnection Networks
Parallel systems consist of a set of components (processors, memory, and disks) that
can communicate with each other via an interconnection network.
Commonly used types of interconnection networks:
  • Bus. All the system components can send data on and receive data from a single communication bus.
  • Ring. The components are nodes arranged in a ring (circle), and each node is connected to its two adjacent nodes in the ring.
  • Mesh. The components are nodes in a grid, and each component connects to all its adjacent components in the grid.
  • Hypercube. The components are numbered in binary, and a component is connected to another if the binary representations of their numbers differ in exactly one bit. Thus, each of the n components is connected to log(n) other components. In a hypercube interconnection, a message from a component can reach any other component by going through at most log(n) links.
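The bit-flip rule for hypercube connectivity can be checked with a few lines of Python (the node numbers and dimension below are illustrative):

```python
def hypercube_neighbors(node, dim):
    """Neighbors of `node` in a dim-dimensional hypercube: flip each of
    the dim bits in turn, since connected components differ in exactly
    one bit of their binary numbering."""
    return [node ^ (1 << b) for b in range(dim)]

# 8 components (n = 8, dim = log2(8) = 3): each node has 3 neighbors.
print(hypercube_neighbors(0b000, 3))  # [1, 2, 4]
print(hypercube_neighbors(0b101, 3))  # [4, 7, 1]
```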
Parallel Database Architectures

There are several architectural models for parallel machines.
  • Shared memory. All the processors share a common memory.
  • Shared disk. A set of nodes that share a common set of disks; each node has its own processor and memory. Shared-disk systems are sometimes called clusters.
  • Shared nothing. A set of nodes that share neither a common memory nor common disk.
  • Hierarchical. This model is a hybrid of the preceding three architectures. This model is the most widely used today.
Distributed Systems

In a distributed database system, the database is stored on nodes located at geographically separated sites. 

The nodes in a distributed system communicate with one another through various communication media, such as high-speed private networks or the internet. They do not share the main memory or disks.


The main differences between shared-nothing parallel databases and distributed databases include the following:

  • The nodes of a distributed database are typically geographically separated, whereas the nodes of a parallel system are usually located together.
  • The nodes of a distributed database are usually administered separately, whereas a parallel system is administered as a whole.
  • The network interconnecting the nodes of a distributed database is usually slower than the interconnection network of a parallel system.


In a distributed database system, we differentiate between local and global transactions.

A local transaction is one that accesses data only from nodes where the transaction was initiated.
A global transaction, on the other hand, is one that either accesses data in a node different from the one at which the transaction was initiated, or accesses data in several different nodes.

Distributed databases that are built by integrating existing database systems have somewhat different characteristics.

Sharing data. The major advantage in building a distributed database system is the provision of an environment where users at one site may be able to access the data residing at other sites.
Autonomy. The primary advantage of sharing data by means of data distribution is that each site can retain a degree of control over data that are stored locally. In a centralized system, the database administrator of the central site controls the database. In a distributed system, there is a global database administrator responsible for the entire system, with part of this responsibility delegated to the local database administrator at each site.

Types:

In a homogeneous distributed database system, nodes share a common global schema; all nodes run the same distributed database-management software, and the nodes actively cooperate in processing transactions and queries.

However, in many cases, a distributed database has to be constructed by linking
together multiple already-existing database systems, each with its own schema and possibly running different database-management software. The sites may not be aware of
one another, and they may provide only limited facilities for cooperation in query and
transaction processing. Such systems are sometimes called federated database systems
or heterogeneous distributed database systems.

Nodes in a distributed database communicate over wide-area networks (WANs).

Disadvantages:
  • Added complexity required to ensure proper coordination among sites.
  • Software development cost.
  • Greater potential for bugs.
  • Increased processing overhead.

Query Optimization in Centralized Systems

The optimal access path is determined after the alternative access paths are derived for the relational algebra expression. This section focuses on query optimization in a centralized system.

Query processing in a centralized system aims to ensure that:

  • The response time of a query is minimized.
  • The system throughput is maximized.
  • The memory and storage used for processing are reduced.
  • Parallelism is increased.

Steps for Query Optimization

There are three steps in query optimization:

Step 1 − Query Tree Generation

A relational algebra expression is represented by a tree data structure known as a query tree. Leaf nodes represent the tables of the query, the internal nodes represent the relational algebra operations, and the root represents the complete query.

When the operand table is made available, the internal node is executed. The result table replaces the node and the process is continued until the result table replaces the root node.

Example 1

The query considered is as follows:

πEmpID(σEName="ArunKumar"(EMPLOYEE))

The query tree appears as follows:
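A minimal in-memory sketch of this tree, evaluated bottom-up as described above (the class, names, and sample EMPLOYEE rows are illustrative, not part of the notes):

```python
class Node:
    """A query-tree node: leaves are tables, internal nodes are
    relational-algebra operations, the root yields the final result."""
    def __init__(self, op, child=None, **args):
        self.op, self.child, self.args = op, child, args

tables = {"EMPLOYEE": [{"EmpID": 1, "EName": "ArunKumar"},
                       {"EmpID": 2, "EName": "Meena"}]}

# pi_EmpID( sigma_{EName='ArunKumar'} (EMPLOYEE) )
tree = Node("PROJECT",
            Node("SELECT",
                 Node("TABLE", name="EMPLOYEE"),
                 predicate=lambda r: r["EName"] == "ArunKumar"),
            attrs=["EmpID"])

def evaluate(node):
    # An internal node executes once its operand table is available;
    # the result table replaces the node, bottom-up, until the root
    # has been evaluated.
    if node.op == "TABLE":
        return tables[node.args["name"]]
    rows = evaluate(node.child)
    if node.op == "SELECT":
        return [r for r in rows if node.args["predicate"](r)]
    if node.op == "PROJECT":
        return [{a: r[a] for a in node.args["attrs"]} for r in rows]

print(evaluate(tree))  # [{'EmpID': 1}]
```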


Step 2 − Query Plan Generation

Once the query tree is generated, the query plan is prepared. A query plan associates an access path with each operation in the query tree; the access path specifies how the relational operation is to be performed. For instance, the access path for a selection operation may specify the use of a B+-tree index.

A query plan also specifies how the intermediate tables are passed from one operator to another, and how temporary tables are used or operations are combined.

Step 3 − Code Generation

Code generation is the final step of query optimization. The form of the generated code depends on the type of the underlying operating system. Once the query code is generated, the Execution Manager runs it and produces the results.

Different approaches/algorithms to Query Optimization


Heuristic Based Optimization
Semantic Query Optimization


Query Processing and Query optimization

Query processing includes the translation of high-level queries into low-level expressions that can be used at the physical level of the file system, query optimization, and the actual execution of the query to get the result.

Put simply, query processing is the activity of extracting data from the database.

Textbook Based:

A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated. 

The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation names—that appear in the text of the query, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.

 The query must also be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried. 

An internal representation of the query is then created, usually as a tree data structure called a query tree. It is also possible to represent the query using a graph data structure called a query graph.

The DBMS must then devise an execution strategy or query plan for retrieving the results of the query from the database files. A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.

Query processing has various steps for fetching the data from the database. The steps involved are:

  • Parsing and translation
  • Optimization
  • Evaluation



Parsing and translation

Before processing a query, the system must translate it. When a user submits a query, the parser checks the syntax of the query, verifies the names of the relations in the database, and checks the required attributes, in order to generate the internal form of the query. The parser creates a tree representation of the query, known as the parse tree, and then translates it into relational algebra, replacing any views used in the query with their definitions. During the parse call, the database performs the following checks: a syntax check, a semantic check, and a shared-pool check.

Syntax check: verifies that the query follows the grammar of the query language.

Semantic check: verifies that the referenced objects (relations, attributes) exist and are meaningful in the schema.

Shared pool check: checks whether a parsed representation of the query already exists in the shared pool, which decides between a hard parse and a soft parse.

Hard Parse and Soft Parse

If a query is fresh and its hash code does not exist in the shared pool, the query has to pass through additional steps, known as hard parsing. If the hash code already exists, the query skips these additional steps and is passed directly to the execution engine. This is known as soft parsing.

Hard parsing includes the following additional steps: optimization and row source generation.
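A hypothetical sketch of the hash-based shared-pool check (the function and variable names are invented for illustration; a real server caches compiled plans, not strings):

```python
import hashlib

shared_pool = {}  # hash of SQL text -> cached execution plan

def optimizer(sql):
    # Stand-in for the hard-parse steps (optimization + row source
    # generation); a real optimizer would produce an execution plan.
    return f"plan_for({sql})"

def parse(sql):
    key = hashlib.sha256(sql.encode()).hexdigest()
    if key in shared_pool:
        return "soft", shared_pool[key]   # plan reused, optimizer skipped
    plan = optimizer(sql)                  # hard parse
    shared_pool[key] = plan
    return "hard", plan

print(parse("SELECT * FROM emp")[0])  # hard (first time seen)
print(parse("SELECT * FROM emp")[0])  # soft (hash hit in the pool)
```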

Query Optimization

Query optimization is the process in which multiple execution plans for satisfying a query are examined and the most efficient plan is selected for execution. The database catalogue stores the execution plans, and the optimizer passes the lowest-cost plan on for execution.

There are two main techniques that are employed during query optimization. 

The first technique is based on heuristic rules for ordering the operations in a query execution strategy. A heuristic is a rule that works well in most cases but is not guaranteed to work well in every case. The rules typically reorder the operations in a query tree.

The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate. These techniques are usually combined in a query optimizer.

Two internal representations of a query:

  • Query Tree
  • Query Graph

Row Source Generation

The row source generator is software that receives the optimal execution plan from the optimizer and produces an iterative execution plan usable by the rest of the database. The iterative plan is a binary program that, when executed by the SQL engine, produces the result set.

Query Evaluation Plan

  • In order to fully evaluate a query, the system needs to construct a query evaluation plan.
  • The annotations in the evaluation plan may refer to the algorithms to be used for a particular operation or to the specific index to use.
  • Relational algebra with such annotations is referred to as evaluation primitives. The evaluation primitives carry the instructions needed for evaluating the operation.
  • Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query. The query evaluation plan is also referred to as the query execution plan.
  • The query execution engine is responsible for generating the output of the given query: it takes the query execution plan, executes it, and produces the output for the user query.

Algorithms for Relational Database Schema Design


We have three algorithms for creating a relational decomposition from a universal relation:

  • Dependency-Preserving Decomposition into 3NF Schemas
  • Non-additive Join Decomposition into BCNF Schemas
  • Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas

 Dependency-Preserving Decomposition into 3NF Schemas


Dependency Preservation

A decomposition D = {R1, R2, ..., Rn} of R is dependency preserving with respect to a set F of functional dependencies if

         (F1 ∪ F2 ∪ ... ∪ Fn)+ = F+,

where Fi is the set of dependencies of F projected onto Ri.

Consider a relation R with a set F of functional dependencies (FDs). If R is decomposed into R1 with FD set f1 and R2 with FD set f2, there are three cases:

1. f1 U f2 = F -----> the decomposition is dependency preserving.
2. f1 U f2 is a subset of F -----> not dependency preserving.
3. f1 U f2 is a superset of F -----> this case is not possible.
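Note that the comparison in the definition is on closures, (F1 ∪ F2)+ = F+, not on the raw sets. The closure-based check can be sketched as follows; the relation and FDs are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (set(LHS), set(RHS)) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(F, projected_union):
    """D is dependency preserving iff every FD X -> Y in F is implied
    by f1 u f2 u ... u fn, i.e. Y lies in the closure of X computed
    under the union of the projected FD sets."""
    return all(rhs <= closure(lhs, projected_union) for lhs, rhs in F)

# R = ABC with F = {A -> B, B -> C, A -> C}, decomposed into
# R1 = AB carrying f1 = {A -> B} and R2 = BC carrying f2 = {B -> C}.
F  = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
f1 = [({"A"}, {"B"})]
f2 = [({"B"}, {"C"})]

# A -> C is not listed in f1 u f2, but it is implied by it, so the
# closures are equal and the decomposition is dependency preserving.
print(preserves_dependencies(F, f1 + f2))  # True
print(preserves_dependencies(F, f1))       # False: f1 alone loses B -> C
```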


  • The algorithm produces a dependency-preserving decomposition D = {R1, R2, ..., Rm} of a universal relation R, based on a set of functional dependencies F, such that each Ri in D is in 3NF.
  • It guarantees only the dependency-preserving property; it does not guarantee the nonadditive join property.

Relational Synthesis into 3NF with Dependency Preservation

 Input: A universal relation R and a set of functional dependencies F on the attributes of R.

  • The first step of the algorithm is to find a minimal cover for F. Note that multiple minimal covers may exist for a given set F.

Minimal cover
Definition 1:
A minimal cover of a set of FDs F is a minimal set of functional dependencies Fmin that is equivalent to F. There can be many such minimal covers for a set of functional dependencies F.
Definition 2:
A set of FDs F is minimum if F has as few FDs as any equivalent set of FDs.

Properties/steps of a minimal cover:
1. The right-hand side (RHS) of every FD should be a single attribute.
2. Remove extraneous attributes (an attribute is extraneous if it can be removed without changing the closure of the set of functional dependencies).
3. Eliminate redundant functional dependencies.

Example

Let us apply these properties to F = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}

Step 1:
According to the property, there should be only a single attribute on the RHS.
so we can write,

F= {A → C, AB → C, C → D, C → I, CD → I, EC → A, EC → B, EI → C}

Step 2:
We can drop AB → C and CD → I: B is extraneous in AB → C because C can already be derived from A alone, and D is extraneous in CD → I because I can already be derived from C alone, so both FDs merely duplicate A → C and C → I. Therefore,

F = {A → C, C → D, C → I, EC → A, EC → B, EI → C}

Step 3:
None of the remaining FDs is redundant. Hence, this F is a minimal cover.
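The extraneous-attribute reasoning in Step 2 can be verified by computing attribute closures; a small sketch over the same FDs:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (set(LHS), set(RHS)) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F after Step 1 (single-attribute right-hand sides).
F = [({"A"}, {"C"}), ({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"C"}, {"I"}),
     ({"C", "D"}, {"I"}), ({"E", "C"}, {"A"}), ({"E", "C"}, {"B"}),
     ({"E", "I"}, {"C"})]

# B is extraneous in AB -> C: C is already in {A}+.
print("C" in closure({"A"}, F))   # True
# D is extraneous in CD -> I: I is already in {C}+.
print("I" in closure({"C"}, F))   # True
```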

Nonadditive Join Decomposition into BCNF Schemas

The algorithm decomposes a universal relation schema R = {A1, A2, ..., An} into a decomposition D = {R1, R2, ..., Rm} such that each Ri is in BCNF and the decomposition D has the lossless join property with respect to F.

Relational Decomposition into BCNF with Nonadditive Join Property

 Input: A universal relation R and a set of functional dependencies F on the attributes of R.

        Set D := {R};
        while there is a relation schema Q in D that is not in BCNF do
        {
                choose a relation schema Q in D that is not in BCNF;
                find a functional dependency X → Y in Q that violates BCNF;
                replace Q in D by two relation schemas (Q − Y) and (X ∪ Y);
        };


Each time through the loop, we decompose one relation schema Q that is not in BCNF into two relation schemas. According to Property NJB for binary decompositions, the decomposition D has the nonadditive join property. At the end of the algorithm, all relation schemas in D will be in BCNF.

It is necessary to determine whether a relation schema Q is in BCNF or not. One method for doing this is to test, for each functional dependency X → Y in Q, whether X+ fails to include all the attributes in Q, thereby determining whether or not X is a (super)key in Q. Another technique is based on an observation that whenever a relation schema Q has a BCNF violation, there exists a pair of attributes A and B in Q such that {Q – {A, B} } → A; by computing the closure {Q – {A, B} }+ for each pair of attributes {A, B} of Q, and checking whether the closure includes A (or B), we can determine whether Q is in BCNF.


Example:

Given R = {A, B, C} and the functional dependencies F = {AB → C, C → B}. Here C → B violates BCNF, since C is not a superkey of R, so R is decomposed into (A, C) and (C, B), both of which are in BCNF.
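A sketch of the decomposition loop applied to this example. For simplicity it checks only the given FDs against each schema, which suffices here; a complete implementation must also consider FDs implied on each subschema:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (frozenset, frozenset) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(R, fds):
    D = [frozenset(R)]
    changed = True
    while changed:
        changed = False
        for Q in D:
            for X, _ in fds:
                if not X <= Q:
                    continue
                Y = (closure(X, fds) & Q) - X
                if Y and not closure(X, fds) >= Q:
                    # X -> Y violates BCNF in Q (X not a superkey of Q):
                    # replace Q by (Q - Y) and (X u Y).
                    D.remove(Q)
                    D.append(Q - Y)
                    D.append(X | Y)
                    changed = True
                    break
            if changed:
                break
    return sorted("".join(sorted(q)) for q in D)

F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print(bcnf_decompose("ABC", F))  # ['AC', 'BC']
```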

Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas



Relational Synthesis into 3NF with Dependency Preservation and Nonadditive Join Property

 

As we know, it is not possible to have all three of the following: 

(1) guaranteed non-lossy design

(2) guaranteed dependency preservation

(3) all relations in BCNF


Now we give an alternative algorithm that achieves conditions 1 and 2 but guarantees only 3NF: it preserves dependencies and has the nonadditive join property, and each resulting relation schema in the decomposition is in 3NF.


Input: A universal relation R and a set of functional dependencies F on the attributes of R.

Algorithm:

  1. Find a minimal cover G for F.
  2. For each left-hand side X of a functional dependency that appears in G, create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ∪ ... ∪ {Ak}}, where X → A1, X → A2, ..., X → Ak are the only dependencies in G with X as left-hand side (X is the key of this relation).
  3. If none of the relation schemas in D contains a key of R, then create one more relation schema in D that contains attributes that form a key of R.
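Steps 2 and 3 can be sketched as follows, assuming a minimal cover is already given (step 1); the example reuses the minimal cover computed in the minimal-cover section:

```python
from collections import defaultdict

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (frozenset, frozenset) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize_3nf(R, G):
    """Steps 2 and 3 of the algorithm above; assumes G is a minimal
    cover with single-attribute right-hand sides."""
    grouped = defaultdict(set)
    for X, A in G:                      # group FDs by left-hand side X
        grouped[X] |= A
    D = [X | rhs for X, rhs in grouped.items()]   # one schema per X
    # Step 3: if no schema contains a key of R, add one that does.
    if not any(closure(S, G) >= set(R) for S in D):
        key = set(R)                    # shrink R down to a key
        for a in sorted(R):
            if closure(key - {a}, G) >= set(R):
                key -= {a}
        D.append(frozenset(key))
    return sorted("".join(sorted(s)) for s in D)

# Minimal cover from the earlier example.
G = [(frozenset("A"), frozenset("C")), (frozenset("C"), frozenset("D")),
     (frozenset("C"), frozenset("I")), (frozenset("EC"), frozenset("A")),
     (frozenset("EC"), frozenset("B")), (frozenset("EI"), frozenset("C"))]
print(synthesize_3nf("ABCDEI", G))  # ['ABCE', 'AC', 'CDI', 'CEI']
```

Here the schema ABCE already contains the key CE of R, so step 3 adds nothing.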


Important Questions: Information Security

1. What is a brute force attack?

  • A brute force attack is a trial-and-error method used to obtain information such as a user password or personal identification number (PIN).
  • In a brute force attack, automated software is used to generate a large number of consecutive guesses as to the value of the desired data.
  • Brute force attacks may be used by criminals to crack encrypted data, or by security analysts to test an organization's network security. An attack of this nature can be time- and resource-consuming.
  • An example of a type of brute force attack is a dictionary attack, which might try all the words in a dictionary.

Brute force explanation with example

Consider a combination lock that we have to open.

We have 3 wheels to turn, and each wheel has values from 0-9.

Therefore the smallest number is 000 and the largest is 999, so there are 1000 possible combinations to unlock this lock.

An attacker using a brute force approach will simply try all the possible combinations until the lock opens.

One disadvantage of this method is that it is really slow; to address this, we can use a dictionary attack.

A dictionary attack tries out only the combinations in a list of likely candidates, rather than all possible combinations. Suppose my password is 1234, one of the most common passwords; if the attacker has a list of the top 1000 common passwords, my password can be cracked in seconds.
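Both attacks on the 3-wheel lock can be sketched in a few lines (the secret combination and wordlist are assumed example values):

```python
from itertools import product

SECRET = "042"   # the unknown combination (an assumed example value)

def brute_force(check):
    """Exhaustively try all 1000 combinations 000..999 of the 3 wheels."""
    for digits in product("0123456789", repeat=3):
        guess = "".join(digits)
        if check(guess):
            return guess
    return None

def dictionary_attack(check, wordlist):
    """Try only the entries of a list of likely values, not all values."""
    for guess in wordlist:
        if check(guess):
            return guess
    return None

check = lambda guess: guess == SECRET
print(brute_force(check))                          # '042'
print(dictionary_attack(check, ["1234", "042"]))   # '042'
```

The dictionary attack succeeds after 2 guesses instead of up to 1000, but only if the secret actually appears in the wordlist.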

 

2. Discuss different types of attacks that can occur in an organization. 

An attack is a deliberate act or action that takes advantage of a vulnerability to compromise a controlled system. It is accomplished by a threat agent that damages or steals an organization's information or physical asset.

Types of Attacks
  • Attacks on confidentiality, integrity, availability.
  • Brute force attack: A brute force attack is a trial-and-error method used to obtain information such as a user password or personal identification number (PIN). 
  • Timing Attack: A timing attack is a security exploit that allows an attacker to discover vulnerabilities in the security of a computer or network system by studying how long it takes the system to respond to different inputs. 
  • Sniffers: Sniffing is a process of monitoring and capturing all data packets passing through a given network. 
  • Denial of Service - It prevents the normal use of communication facilities. This attack may have a specific target.
3. Describe discretionary policies for Biba model. 

The Biba model is a hierarchical security model designed to protect system assets (or objects) from unauthorized modification; that is, it is designed to protect system integrity. In this model, subjects (users) and objects are associated with ordinal integrity levels, and subjects can modify objects only at a level equal to or below their own integrity level.

Discretionary policies
1. Access Control Lists: these are used to determine which subjects can access which objects. The access control list can then be modified by subjects with the correct privileges.

2. Object Hierarchy: integrity can be enforced by using an object’s hierarchy. With this method, there is root and objects that are ancestors to the root. To access a particular object, the subject must have the observe privileges to those objects and all the other ancestor objects all the way up to the root. 

3. Ring: rings in the system are numbered, with a lower number indicating a higher privilege. The access modes of the subject must fall within a certain range of values for it to be permitted to access an object.


4. What is phishing? Give an example. 

  • Phishing is the fraudulent attempt to obtain sensitive information or data, such as usernames, passwords and credit card details.
  • It is carried out by email spoofing, instant messaging, and text messaging; phishing often directs users to enter personal information at a fake website that matches the look and feel of the legitimate site. 
  • Phishing is an example of social engineering techniques used to deceive users.
Types of Phishing
  •  Mass Phishing (Deceptive Phishing) – Mass, large-volume attack intended to reach as many people as possible 
  • Spear Phishing – Targeted attack directed at specific individuals or companies using gathered information to personalize the message and make the scam more difficult to detect 
  • Whaling (CEO Fraud) – Type of spear phishing attack that targets “big fish,” including high-profile individuals or those with a great deal of authority or access 
  • Clone Phishing – Spoofed copy of a legitimate and previously delivered email, with the original attachments or hyperlinks replaced with malicious versions and sent from a forged email address, so it appears to come from the original sender or another legitimate source 
  • Advance-Fee Scam- Requests the target to send money or bank account information to the cyber-criminal

Example of phishing

An email or message asking to go to a certain hyperlink, and to enter a person's credentials or sensitive information is an example of phishing. The website will look like an authentic website but the URL will be different.

5. Differentiate between polymorphic and metamorphic worm.

The terms polymorphic worm and metamorphic worm are often used interchangeably, but the two differ in their respective engines.

A metamorphic worm is a worm that can reprogram itself. With each infection, it rewrites its code, making it appear different, but the main functionality of the worm doesn’t change. This change of code is done using a metamorphic engine. This ability to morph itself makes detecting these worms harder. 

A polymorphic worm can transform a program into a version consisting of different code but having the same functionality. Encryption is generally employed here; encrypting the payload with different keys can generate many worm variations. A decryption module has to be prepended before the payload.
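The polymorphic idea, that one payload encrypted under different keys looks entirely different on the wire, can be shown with a toy one-byte XOR cipher. This is a harmless illustration of the principle only; `xor_encrypt` and the key values are hypothetical.

```python
def xor_encrypt(payload: bytes, key: int) -> bytes:
    """XOR every byte of the payload with a one-byte key (toy cipher).
    XOR is its own inverse, so applying it twice restores the payload."""
    return bytes(b ^ key for b in payload)

payload = b"same functionality"
variant_a = xor_encrypt(payload, 0x21)
variant_b = xor_encrypt(payload, 0x7F)

# Different keys produce different byte patterns (different signatures)...
assert variant_a != variant_b
# ...yet each variant decrypts back to the identical payload.
assert xor_encrypt(variant_a, 0x21) == payload
assert xor_encrypt(variant_b, 0x7F) == payload
```

This is why signature-based detection struggles with polymorphic worms: only the small decryption module stays constant, while the encrypted body changes with every key.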

6. How do you reduce the impact of XSS vulnerabilities?

Cross-site scripting (XSS) is a code injection security attack targeting web applications. We can reduce the impact of XSS vulnerabilities by:

• If Cookies Are Used:
▫ Scope as strict as possible
▫ Set ‘secure’ flag
▫ Set ‘HttpOnly’ flag
• On the client, consider disabling JavaScript (if possible) or use something like the NoScript Firefox extension.


Filtering for XSS

The easiest form of cross-site scripting vulnerability elimination would be to pass all external data through a filter. Such a filter would remove dangerous keywords, for example, the infamous <script> tag, JavaScript commands, CSS styles, and other dangerous HTML markup (such as elements that contain event handlers).

Escaping from XSS

Escaping is the primary means to avoid cross-site scripting attacks. When escaping, you are effectively telling the web browser that the data you are sending should be treated as data and should not be interpreted in any other way. If an attacker manages to put a malicious script on your page, the victim will not be affected because the browser will not execute the script if it is properly escaped. In HTML, you can escape dangerous characters by using HTML entities, for example, the &# sequence followed by its character code.
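Escaping can be demonstrated with Python's standard library, which converts the dangerous characters into HTML entities exactly as described above:

```python
import html

user_input = '<script>alert("xss")</script>'
escaped = html.escape(user_input)

# The angle brackets and quotes become entities, so a browser renders the
# input as visible text instead of executing it as a script.
print(escaped)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Web frameworks typically apply this kind of escaping automatically in their template engines, but any value inserted into HTML by hand must be escaped explicitly.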

7. Describe frame spoofing with a neat diagram.

Frame Spoofing


Premature Termination of connections


➢ A number of management frames are used in 802.11 wireless LANs, such as the Beacon, Association and Authentication frames.


➢ A station needs to authenticate and then associate with an Access Point (AP) before they can exchange data frames with each other.


➢ Each party can, at any point in time, terminate the connection by transmitting a Deauthentication frame.


➢ The recipient of a management frame relies on the sender address field in the frame to identify the originator of the message.


However, an attacker can spoof the sender address in the frame. For example, he can fabricate a de-authentication frame with

Sender Address = Station_27

Receiver Address = AP


➢ The addresses used are 48-bit MAC addresses. When the AP receives the above frame, it thinks that Station_27 wishes to terminate the existing connection to itself. The AP sets the state of the connection between itself and Station_27 to be “Unauthenticated and Unassociated”.


➢ Station_27 would have to go through the time-consuming process of re-associating itself to the AP if it wished to resume the communication. The attacker could repeatedly transmit such Deauthentication frames to the AP thus effectively slowing down or even preventing communication between Station_27 and AP.







8. Describe the security enhancements present in UMTS.

The Universal Mobile Telecommunications System (UMTS), based on the GSM standards, is a mobile cellular system of the third generation that is maintained by 3GPP (3rd Generation Partnership Project).
  • Mutual Authentication: provides enhanced protection against false base station attacks by allowing the mobile to authenticate the network.
  • Data Integrity: provides enhanced protection against false base station attacks by allowing the mobile to check the authenticity of certain signalling messages.
  • Network to Network Security: Secure communication between serving networks.
  • Flexibility: Security features can be extended and enhanced as required by new threats and services.
  • Longer key length: Key length is 128 bits as against 64 bits in GSM.
  • Wider security scope: Security is based within the RNC rather than the base station.
9. What is SOAP binding? Explain with the help of an HTTP message.

  • SOAP (Simple Object Access Protocol) bindings are mechanisms which allow SOAP messages to be effectively exchanged using a transport protocol.
  • Most SOAP implementations provide bindings for common transport protocols, such as HTTP or SMTP.
  • HTTP is synchronous and widely used. A SOAP HTTP request specifies at least two HTTP headers: Content-Type and Content-Length.

Example:

<binding name="Hello_Binding" type="tns:Hello_PortType">
   <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
   <operation name="sayHello">
      <soap:operation soapAction="sayHello"/>
      <input>...
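Since the question asks for an HTTP message, a SOAP request over the HTTP binding would look roughly as follows. This is an illustrative sketch: the endpoint path, host name, and the `Content-Length` value are hypothetical, but the mandatory `Content-Type` and `Content-Length` headers mentioned above are shown.

```
POST /HelloService HTTP/1.1
Host: www.example.com
Content-Type: text/xml; charset=utf-8
Content-Length: 250
SOAPAction: "sayHello"

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2001/12/soap-envelope">
   <SOAP-ENV:Body>
      <sayHello>
         <name>John</name>
      </sayHello>
   </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```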


10. List the security threats in RFID based identification and tracking systems.

  • Man-in-the-Middle Attack: A man-in-the-middle attack happens during the transmission of a signal.  The hacker listens for communication between a tag and reader and then intercepts and manipulates the information. The hacker diverts the original signal and then sends false data while pretending to be a normal component in the RFID system.
  • Denial of Service: A Denial of Service attack is the broad concept of an RFID system failure that is associated with an attack. These attacks are usually physical attacks like jamming the system with noise interference, blocking radio signals, or even removing or disabling RFID tags.
  • Power Analysis: Power analysis attacks can be mounted on RFID systems by monitoring the power consumption levels of RFID tags.
  • Eavesdropping: Eavesdropping, like it sounds, occurs when an unauthorized RFID reader listens to conversations between a tag and reader then obtains important data.

11 a) What is role based access control. Illustrate with suitable example the concept of role inheritance.

  • Role-based access control (RBAC) is an approach to restricting system access to authorized users. 
  • It is a policy-neutral access-control mechanism defined around roles and privileges.
  • RBAC can be used to facilitate administration of security in large organizations with hundreds of users and thousands of permissions.
  • The components of RBAC such as role-permissions, user-role and role-role relationships make it simple to perform user assignments.
Example:
An organization assigns a role to every employee; the role determines which permissions the system grants to the user. For example, you can designate whether a user is an administrator, a specialist, or an end-user, and limit access to specific resources or tasks.
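Role inheritance, where a senior role automatically acquires the permissions of the roles below it, can be sketched as follows. The role names, permissions, and the `permissions` helper are hypothetical illustrations, not part of any RBAC standard API.

```python
# Each role's own permissions (hypothetical example).
ROLE_PERMS = {
    "end_user": {"read_docs"},
    "specialist": {"edit_docs"},
    "administrator": {"manage_users"},
}

# Inheritance chain: the key role inherits everything from the value role.
INHERITS = {
    "specialist": "end_user",
    "administrator": "specialist",
}

def permissions(role: str) -> set[str]:
    """Collect a role's own permissions plus everything inherited
    by walking down the inheritance chain."""
    perms = set(ROLE_PERMS[role])
    while role in INHERITS:
        role = INHERITS[role]
        perms |= ROLE_PERMS[role]
    return perms

# An administrator inherits edit_docs (specialist) and read_docs (end_user).
print(sorted(permissions("administrator")))
```

Inheritance keeps administration simple: granting a new permission to `end_user` automatically propagates it to every senior role.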

b) Differentiate between Discretionary and Role based access control.

Discretionary Access Control (DAC)
  • The owner of a protected system or resource sets policies defining who can access it.
  • DAC can involve physical or digital measures, and is less restrictive than other access control systems, as it offers individuals complete control over the resources they own. 
  • It is less secure because associated programs inherit security settings and allow malware to exploit them without the knowledge of the end-user. 
  • You can use RBAC to implement DAC.
Role-based access control (RBAC)
  • Is a mechanism that restricts system access. 
  • It involves setting permissions and privileges to enable access to authorized users. 
  • Most large organizations use role-based access control to provide their employees with varying levels of access based on their roles and responsibilities. 
  • This protects sensitive data and ensures employees can only access information and perform actions they need to do their jobs.

c) Briefly discuss Mandatory access control implemented in a typical secure operating System.

  • MAC is considered the most secure of all access control models.
  • In MAC, central authority regulates access rights based on multiple levels of security. 
  • Only users or devices with the required information security clearance can access protected resources.
  • Access rules are manually defined by system administrators and strictly enforced by the operating system or security kernel. 
  • Organizations with varying levels of data classification, like government and military institutions, typically use MAC to classify all end users. 
  • You can use role-based access control to implement MAC.

12 a) Demonstrate Chinese wall security model with neat diagram.

The Chinese Wall model is a security model that concentrates on confidentiality and finds application in the commercial world. It is based on the principles defined in the Clark-Wilson security model and was introduced by Brewer and Nash in 1989. According to the model, subjects are only granted access to data that is not in conflict with other data they already possess.
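The model's simple security rule can be sketched as an access check against conflict-of-interest classes. This is a minimal sketch; the company names, the `CONFLICT_CLASS` table, and the `may_access` helper are hypothetical.

```python
# Hypothetical conflict-of-interest classes: companies in the same class
# compete with each other, so their data sits behind the same "wall".
CONFLICT_CLASS = {
    "Bank_A": "banking",
    "Bank_B": "banking",
    "Oil_X": "oil",
}

def may_access(history: set[str], company: str) -> bool:
    """Chinese Wall simple security rule: access is allowed only if the
    subject has never accessed a *different* company in the same
    conflict-of-interest class."""
    for accessed in history:
        if (CONFLICT_CLASS[accessed] == CONFLICT_CLASS[company]
                and accessed != company):
            return False
    return True

history = {"Bank_A"}                 # the analyst has already seen Bank_A data
print(may_access(history, "Bank_A")) # same company: allowed
print(may_access(history, "Oil_X"))  # different class: allowed
print(may_access(history, "Bank_B")) # competitor: blocked by the wall
```

Unlike Bell-LaPadula, the permitted set is not fixed in advance: it shrinks dynamically as the subject's access history grows.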

b) Classify each of the following as a violation of confidentiality, integrity,  availability or some combination thereof. Also, justify your answer.
i. John copies Mary's homework. 
Confidentiality - Copying the data is a violation of confidentiality.

ii. Paul crashes Linda's system
Availability - The crash makes the system unavailable to Linda. 

iii. Carol changes the amount of Angelo's check from 100 to 1000
Integrity - The amount on the check was changed, which is a violation of data integrity.

iv. Gina forges Roger's signature on a deed.
Integrity - The unauthorized signature is a violation of integrity.


13 a) Interpret about the star property in Bell -LaPadula model.

b) Write Windows access control algorithm.





14 a) How Buffer OverFlow (BOF) vulnerability makes software insecure. Explain different ways in which BOF exploitations occur.


  • A buffer overflow, or buffer overrun, occurs when more data is put into a fixed-length buffer ( Buffers are areas of memory set aside to hold data) than the buffer can handle. 
  • The extra information, which has to go somewhere, can overflow into adjacent memory space, corrupting or overwriting the data held in that space. 
  • This overflow usually results in a system crash, but it also creates the opportunity for an attacker to run arbitrary code or manipulate the coding errors to prompt malicious actions.
The techniques to exploit a buffer overflow vulnerability vary by architecture, by the operating system and by memory region.
  • Stack-based exploitation
  • Heap-based exploitation: A buffer overflow occurring in the heap data area is referred to as a heap overflow.
  • Barriers to exploitation: Manipulation of the buffer, which occurs before it is read or executed, may lead to the failure of an exploitation attempt. These manipulations can mitigate the threat of exploitation, but may not make it impossible.
b) Explain XSS vulnerabilities.
  • Cross-site scripting (XSS) is a type of security vulnerability, typically found in web applications.
  •  XSS attacks enable attackers to inject client-side scripts into web pages viewed by other users. 
  • A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same-origin policy. 
  • XSS effects vary in range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site.
  • Cross-site scripting attacks use known vulnerabilities in web-based applications, their servers, or the plug-in systems on which they rely. 
  • Exploiting one of these, attackers fold malicious content into the content being delivered from the compromised site. 
15 a) Describe Kermack-McKendrick Model of worm propagation.


Kermack-McKendrick Model

The model consists of three compartments: 
  • The number of susceptible individuals (S)
  • The number of infectious individuals (I)
  • The number of recovered individuals (R)


SIR compartment model

The model consists of a system of three coupled nonlinear ordinary differential equations:

dS/dt = -beta * S * I / N
dI/dt = beta * S * I / N - gamma * I
dR/dt = gamma * I

where,
  • N - the total population 
  • t - time
  • S(t) - the number of susceptible people
  • I(t) - the number of people infected
  • R(t) - the number of people who have recovered and developed immunity to the infection
  • beta - the infection rate 
  • gamma - the recovery rate
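The model can be simulated numerically with a simple Euler step. This is a sketch under assumed illustrative parameters (1000 hosts, one initial infection, beta = 0.4, gamma = 0.1); `sir_step` is a hypothetical helper.

```python
def sir_step(s, i, r, beta, gamma, n, dt):
    """One Euler integration step of the SIR equations:
       dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I."""
    new_infections = beta * s * i / n * dt
    recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - recoveries, r + recoveries

# Hypothetical population: 1000 hosts, one initially infected.
n = 1000
s, i, r = 999.0, 1.0, 0.0
beta, gamma = 0.4, 0.1

for _ in range(200):                 # simulate 200 time units with dt = 1
    s, i, r = sir_step(s, i, r, beta, gamma, n, 1.0)

# The three compartments always sum to the total population.
assert abs((s + i + r) - n) < 1e-6
```

In the worm-propagation reading of the model, S are vulnerable hosts, I are infected hosts actively scanning, and R are patched or cleaned hosts.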

b) Explain any two categories of topological worms.

Email-Worm
  • An Email-Worm (also known as a mass-mailer or less commonly, an Internet worm) is a type of worm that distributes copies of itself in infectious executable files attached to fake email messages. 
  • Email-Worm typically arrives as executable files attached to fake email messages. 

P2P Worms
  • P2P Worms spread via peer-to-peer file-sharing networks (such as Kazaa, EDonkey, FastTrack, etc.). 
  • Most of these worms work in a relatively simple way: to get onto a P2P network, the worm only has to copy itself to the file-sharing directory, which is usually on the local machine. 
  • The P2P network does the rest: when a file search is conducted, it informs remote users of the file and provides services making it possible to download the file from the infected computer.

16 a) Explain how can you detect and prevent SQL Injection vulnerabilities. 



SQL Injection (SQLi) is a type of injection attack that makes it possible to execute malicious SQL statements.
  • The only efficient way to detect SQL Injections is by using a vulnerability scanner, often called a DAST tool (dynamic application security testing).
Prevention
  • Input validation - The validation process is aimed at verifying whether or not the type of input submitted by a user is allowed. Input validation makes sure it is the accepted type, length, format, etc. Only the value which passes the validation can be processed. 
  • Parametrized queries - Parameterized queries are a means of pre-compiling a SQL statement so that you can then supply the parameters in order for the statement to be executed. This method makes it possible for the database to recognize the code and distinguish it from input data.
  • Escaping - Always use character-escaping functions for user-supplied input provided by each database management system (DBMS). This is done to make sure the DBMS never confuses it with the SQL statement provided by the developer.
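Parameterized queries can be demonstrated with Python's built-in sqlite3 module. The table, row, and input value are hypothetical; the point is that the `?` placeholder keeps the attacker's input as pure data, so a classic injection string fails to match anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

# A classic injection payload that would break out of a concatenated string.
user_input = "' OR '1'='1"

# Parameterized query: the driver binds the value separately from the SQL,
# so the payload is treated as a literal name and matches no row.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # []

# A legitimate value still works through the same placeholder.
print(conn.execute(
    "SELECT name FROM users WHERE name = ?", ("alice",)
).fetchall())  # [('alice',)]
```

Had the query been built by string concatenation, the same payload would have produced `WHERE name = '' OR '1'='1'` and returned every row.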

 

b) Name any worm that exploited buffer overflow vulnerability. Explain its characteristics.

Code Red was a computer worm that exploited buffer overflow vulnerability. It did this by using a long string of the repeated letter 'N' to overflow a buffer, allowing the worm to execute arbitrary code and infect the machine with the worm.

Characteristics

  • It often uses a computer network to spread itself, relying on security failures on the target computer to access it. 
  • It will use this machine as a host to scan and infect other computers.
  • Computer worms use a recursive method to copy themselves without host programs and distribute themselves and then controlling and infecting more and more computers in a short time. 
  • Worms almost always cause at least some harm to the network, even if only by consuming bandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.

17 a) Explain link level security provided by Bluetooth.

  • In link-level security, a device starts security procedures before any physical link is established.
  • In this mode, authentication and encryption are used for all connections to and from the device.
  • The authentication and encryption processes use a separate secret link key that is shared by paired devices, once the pairing has been established.
  • The link key is generated for the first time when two devices communicate.
Link key generation:
  • Two devices communicating for the first time will go through an initialization phase; at that point they will be “associated”. 
  • The link key generation begins when the user enters identical PINs into both devices, which the devices use to generate their secret link keys. 
  • One of Bluetooth's security strengths is that in subsequent communications between devices, the link key is never transmitted outside of the device.
  • The link key is simply used in cryptographic algorithms to generate matching sequences.






 


b) Describe entity authentication and key agreement in GSM Networks.






18 a) How security is implemented in online credit card payment systems? 
 


b) What are the main concerns involved in online credit card payment systems?

19  a) Explain MAC generation and encryption in CCMP. 
 

Counter Mode with Cipher Block Chaining Message Authentication Code Protocol



b) Explain any two technologies for web services.

XML

  • XML is a markup language. With a markup language, we can structure a document using tags; with XML, we can also define custom tags.
  • Each bit of information in a document is defined by tags without the overload of formatting present in HTML.
  • This type of representation is suitable for application-to-application communication.
  • Another feature of XML is that the vocabulary can be extended. Vocabulary refers to the types of tags used to structure a document in XML.
  • XML supports multichannel portal applications

 


SOAP

  • The Simple Object Access Protocol is a standard protocol that provides a definition for XML-based information exchange by means of XML messages. 
  • SOAP provides a paradigm that allows different programs, running on the same or different operating systems, to communicate with each other using a transport protocol (mainly HTTP) and XML-based structures.
  • SOAP is a lightweight protocol that provides a message exchange pattern for structured information in a decentralized, distributed environment; it defines an extensible messaging framework based on XML to provide a message construct (SOAP messages) which can be exchanged over different underlying protocols. This framework is independent of any programming model and other implementation semantics.

SOAP Message Structure

The following block depicts the general structure of a SOAP message −


<?xml version = "1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV = "http://www.w3.org/2001/12/soap-envelope"
   SOAP-ENV:encodingStyle = "http://www.w3.org/2001/12/soap-encoding">

   <SOAP-ENV:Header>
      ...
   </SOAP-ENV:Header>

   <SOAP-ENV:Body>
      ...
      <SOAP-ENV:Fault>
         ...
      </SOAP-ENV:Fault>
      ...
   </SOAP-ENV:Body>

</SOAP-ENV:Envelope>





