Waterfall Model

 


  • It is also referred to as a linear-sequential life cycle model. 
  • The WATERFALL MODEL is a sequential model that divides software development into pre-defined phases. 
  • Each phase must be completed before the next phase can begin, with no overlap between phases.
  • Each phase is designed to perform a specific activity during the SDLC. 

Sequential Phases

  • Requirements: The first phase involves understanding what needs to be designed, along with its function and purpose. Here, the specifications of the input and output of the final product are studied and documented.
  • System Design: The requirement specifications from the first phase are studied in this phase and the system design is prepared. System design helps in specifying hardware and system requirements and also helps in defining the overall system architecture. It guides the software code to be written in the next stage.
  • Implementation: With inputs from the system design, the system is first developed in small programs called units, which are integrated in the next phase. Each unit is developed and tested for its functionality, which is referred to as unit testing.
  • Integration and Testing: All the units developed in the implementation phase are integrated into a system after each unit is tested. The integrated software then goes through continuous testing to find any flaws or errors, so that the client does not face problems during installation of the software.
  • Deployment of System: Once the functional and non-functional testing is done, the product is deployed in the customer environment or released into the market.
  • Maintenance: This step occurs after installation and involves making modifications to the system or an individual component to alter attributes or improve performance. These modifications arise either from change requests initiated by the customer or from defects uncovered during live use of the system. The client is provided with regular maintenance and support for the developed software.
 
The Waterfall model can be used when,
  • Requirements are not changing frequently
  • The application is not complicated or big
  • The project is short
  • Requirements are clear
  • The environment is stable
  • The technology and tools used are stable, not dynamic
  • Resources are available and trained

 

Advantages


  • In this model, phases are processed and completed one at a time, and they do not overlap.

  • Each phase must be completed before the next phase of development begins.

  • It is suited for smaller projects where requirements are well defined.

  • Quality assurance tests (verification and validation) are performed before each stage is completed.

  • Any changes to the software are made during the development process itself.

Disadvantages


  • It is difficult to estimate time and cost for each phase of the development process.

  • Errors can be fixed only during the phase in which they occur.

  • It is not desirable for complex projects where requirements change frequently.

  • The testing period comes quite late in the development process.

  • Small changes or errors that arise in the completed software may cause a lot of problems.

 

Software Life Cycle

 The software life cycle is a process that consists of a series of planned activities to develop or alter software products.

  • It is used by the software industry to design, develop, and test software. 
  • The SDLC aims to produce high-quality software that meets or exceeds customer expectations and reaches completion within time and cost estimates.
  • The following are the various stages/steps of the software life cycle:

# Planning and Requirement Analysis  

  • Requirement analysis is the first stage in the SDLC process.
  • It is conducted by the senior team members with inputs from all the stakeholders and domain experts in the industry. 
  • Planning for the quality assurance requirements and identification of the risks involved is also done at this stage.

# Defining (Feasibility study)

  • Once the requirement analysis phase is completed, the next step is to define and document the software needs.
  • This is done with the help of the 'Software Requirement Specification' document, also known as the 'SRS' document. 
  • It includes everything which should be designed and developed during the project life cycle.

# Designing the Software

  • In this third phase, the system and software design documents are prepared as per the requirement specification document. 
  • This helps define overall system architecture.

# Coding or Developing the Product

  • In this phase of the SDLC, the actual development begins and the product is built. 
  • The design is implemented by writing code.

# Testing the Product

  • After the code is written, it is tested against the requirements to make sure that the product solves the needs gathered during the requirements stage.
  • During this stage, unit testing, integration testing, system testing, and acceptance testing are done.

# Deployment in the Market

  • Once the software is certified and no bugs or errors are reported, it is deployed.
  • Then, based on the assessment, the software may be released as it is or with suggested enhancements for the target segment. 
  • After the software is deployed, its maintenance begins.

# Maintenance

  • Once the client starts using the developed system, real issues come up and need to be solved from time to time. 
  • This procedure, where care is taken of the developed product, is known as maintenance.


Introduction: Overview of computer security

 




Computer Security: Security applied to an automated information system to attain the applicable objectives of preserving the integrity, availability, and confidentiality of information system resources, i.e., the protection of computer systems and information from harm, theft, and unauthorized use.

Computer Security is mainly concerned with three main areas: 

# Confidentiality
# Integrity
# Availability

CIA Triangle (Triad)

# It is a simple but widely applicable security model.
# The CIA Triad refers to the three goals of information security for an organization's systems, networks, and data. 
# They are Confidentiality, Integrity, and Availability.



Confidentiality

  • Confidentiality is roughly equivalent to privacy; it means that only authorized individuals/systems can view sensitive or classified information.
  • The data being sent over the network should not be accessed by unauthorized individuals. 
  • An attacker may try to capture the data using different tools and gain access to your information. 
  • Data encryption is a common method of ensuring confidentiality (a small code sketch follows this list).
  • Another way to protect your data is through a VPN tunnel. VPN stands for Virtual Private Network and helps the data move securely over the network.
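
A minimal sketch of encryption for confidentiality, assuming the third-party Python 'cryptography' package is installed; the message and variable names are made up for illustration.

    # Symmetric encryption: only holders of the key can read the data.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()                    # secret shared only with authorized parties
    cipher = Fernet(key)

    token = cipher.encrypt(b"classified report")   # ciphertext is unreadable without the key
    print(cipher.decrypt(token))                   # b'classified report'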

Integrity

  • Integrity means protecting information from being modified by unauthorized parties.
  • It is the ability to ensure that data is an accurate and unchanged representation of the original secure information; the idea is making sure that data has not been modified.
  • Corruption of data is a failure to maintain data integrity. Corruption can occur while information is being compiled, stored, or transmitted. 
  • To check whether our data has been modified, we use a hash function such as SHA (Secure Hash Algorithm) or MD5 (Message Digest 5); a short example follows.
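
A small example of an integrity check using Python's standard hashlib module; the data values are made up for illustration.

    import hashlib

    original = b"account balance: 1000"
    digest = hashlib.sha256(original).hexdigest()    # stored or sent alongside the data

    received = b"account balance: 9000"              # a tampered copy
    if hashlib.sha256(received).hexdigest() != digest:
        print("integrity check failed: data was modified")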

Availability

  • Availability of information refers to ensuring that authorized users are able to access the information when needed.
  • This means that the computing systems used to store and process the information, the security controls used to protect it, and the communication channels used to access it must be functioning correctly.
  •  Information can be erased or become inaccessible, resulting in “loss of availability.”
  • Ensuring availability also involves preventing denial-of-service attacks, such as a flood of incoming messages to the target system, essentially forcing it to shut down.

To make information available to those who need it and who can be trusted with it, organizations use authentication and authorization. 

# Authentication is proving that a user is the person he or she claims to be. 
# Authorization is the act of determining whether a particular user (or computer system) has the right to carry out a certain activity, such as reading a file or running a program. (A toy sketch of both checks follows this list.)


# Authentication and authorization go hand in hand. 
# Users must be authenticated before carrying out the activity they are authorized to perform. 
# Security is strong when the user cannot later deny that he or she performed the activity. This is known as nonrepudiation.
# Accountability (tracing the activities of an individual on a system) also supports nonrepudiation.
# These concepts of computer security also apply to the broader term information security.
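
The sketch below illustrates the distinction, using a hypothetical in-memory user store and permission table (not any real library's API).

    import hashlib

    USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}   # username -> password hash
    PERMISSIONS = {"alice": {"read_file"}}                     # username -> allowed actions

    def authenticate(username, password):
        # Authentication: prove the user is who they claim to be.
        stored = USERS.get(username)
        return stored == hashlib.sha256(password.encode()).hexdigest()

    def authorize(username, action):
        # Authorization: check whether the authenticated user may perform the action.
        return action in PERMISSIONS.get(username, set())

    if authenticate("alice", "s3cret") and authorize("alice", "read_file"):
        print("alice may read the file")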

Challenges of Computer Security

  • Computer security is not as simple as it might first appear to the novice. The mechanisms used to meet requirements such as confidentiality, integrity, and availability can be quite complex, and understanding them may involve rather subtle reasoning (mental keenness).
  • In developing a particular security mechanism or algorithm, one must always consider potential attacks on its security features.
  • The procedures used to provide particular services are often counterintuitive (they do not work the way you would expect them to).
  • It is necessary to decide where to use the various security mechanisms, both in terms of physical placement (e.g., at what points in a network certain security mechanisms are needed) and in a logical sense (e.g., at what layer or layers of an architecture such as TCP/IP mechanisms should be placed).
  • Security requires regular, even constant, monitoring, and this is difficult in today’s short-term, overloaded environment.

Components of Information System

Components of the information system are as follows:

# Hardware: Including computer systems and other data processing, data storage, and data communications devices.

# Software: Including the operating system, system utilities, and applications. These are used to control and coordinate the hardware components and for analyzing and processing of the data.

# Data: Including files and databases, as well as security-related data, such as password files.

# Network: Refers to the local and wide area network communication links, bridges, routers, and so on. These resources facilitate the flow of information in the organization.

# People: Every system needs people if it is to be useful, probably the component that most influences the success or failure of information systems. This includes "not only the users, but those who operate and service the computers, those who maintain the data, and those who support the network of computers."

# Procedures: The policies that govern the operation of a computer system.

Need for Information Security

The purpose of information security is to ensure the key Objectives, i.e.

# Confidentiality 
# Integrity
# Availability 

Thus preventing & minimising the impact of security incidents.

The major needs for security in an organization are,

# Protecting the functionality of the organization

# Enabling the safe operation of applications

# Protecting the data that the organization collects and uses

# Safeguarding technology assets in organizations

Protecting the functionality of the organization

  • Implementing information security in an organization can protect its technology and information assets by preventing, detecting, and responding to threats.  
  • Decision-makers in organizations must set policy and operate their organizations in keeping with complex, efficient, and capable applications.

Enabling the safe operation of applications

  • The modern organization needs to create an environment that safeguards its applications, particularly those applications that serve as important elements of the organization's infrastructure.

Protecting the data that the organization collects & uses

  • Data in the organization can be in two forms: either at rest or in motion. Data in motion signifies data that is currently being used or processed by the system. 
  • An attacker may try to corrupt the data values, which affects the integrity of the data.

Safeguarding technology assets in organisations

  •  The organization must add secure infrastructure services based on the size & scope of the organization. 
  • Additional security services may be needed as the organization expands.

NSTISSC SECURITY MODEL


'National Security Telecommunications and Information Systems Security Committee' document

# It is now called the National Training Standard for Information Systems Security Professionals. 
# While the NSTISSC model covers the three dimensions of information security, it omits a discussion of detailed guidelines and policies that direct the implementation of controls.




# The model has three dimensions, each with three elements, forming a 3x3x3 cube with 27 cells representing areas that must be addressed to secure today's information systems.
# To ensure system security, each of the 27 cells must be properly addressed during the security process.
# For example, the intersection of the technology, integrity, and storage axes requires a control or safeguard that addresses the need to use technology to protect the integrity of information while in storage (the cube's dimensions are enumerated in the sketch below).
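
A brief Python sketch that enumerates the 27 cells of the cube, assuming the commonly cited axis labels (security goals, information states, and safeguards); the exact wording of the labels may differ from the source document.

    from itertools import product

    # Axis labels below are assumed from the commonly cited form of the model.
    goals = ["confidentiality", "integrity", "availability"]
    states = ["storage", "processing", "transmission"]
    safeguards = ["technology", "policy & practices", "education & training"]

    cells = list(product(goals, states, safeguards))
    print(len(cells))   # 27 cells, each needing a control or safeguard
    print(cells[0])     # e.g. ('confidentiality', 'storage', 'technology')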




Integration and Testing of Embedded Hardware & Firmware

 

# Hardware includes the physical parts (the body).
# Firmware is a software program or a set of instructions on a hardware device (the brain).

Hardware and firmware are developed and tested independently (unit testing).
  • Hardware parts are tested with small utility programs.
  • Firmware is tested with simulators.    

Integration of Hardware and Firmware

  • Integration of hardware and firmware deals with embedding the firmware into the target hardware board.
  • It is the process of 'embedding intelligence' into the product.
  • Embedded systems are of two types:
      • Operating-system based
      • Non-operating-system based

Firmware embedding techniques for non-operating-system-based embedded systems

  • Out of Circuit Programming (OCP)
  • In System Programming (ISP)
  • In Application Programming (IAP)

Out of Circuit Programming (OCP)

  • Out-of-circuit programming is performed outside the target board. 
  • The processor or memory chip into which the firmware needs to be embedded is taken out of the target board and programmed with the help of a programming device. 

In System Programming (ISP)

  • The firmware is embedded into the target device without removing it from the target board. 
  • It is a flexible and easy way of embedding firmware. 
  • The only prerequisite is that the device must have ISP support.
  • Apart from the target board, a PC, an ISP utility, and an ISP cable, no other additional hardware is required for ISP.

In Application Programming (IAP)

  • In Application Programming (IAP) is a technique used by the firmware running on the target device for modifying a selected portion of the code memory.
  • It is not a technique for the first-time embedding of user-written firmware. 
  • It modifies the program code memory under the control of the embedded application. 
  • Updating calibration data, look-up tables, etc., which are stored in the code memory, are typical examples of IAP.

Firmware embedding techniques for operating-system-based embedded systems

  • Use of factory-programmed chips - for example, a calculator.

Embedded System Development Environment 

The primary components:
  • Host system
  • Target system
  • Connecting tools between host and target (e.g., IDE, compilers, ...)

IDEs (Integrated Development Environment)

  • In embedded systems, an IDE is an integrated environment for developing and debugging target-processor-specific embedded firmware. 
  • An IDE is also known as an integrated design environment or integrated debugging environment. An IDE is a software package which bundles a “Text Editor”, “Cross-compiler”, “Linker” and a “Debugger”.
  • IDEs can be either command-line based or GUI based. 
  • IDE consists of,
1. Text Editor or Source code editor
2. A compiler and an interpreter
3. Build automation tools
4. Debugger
5. Simulators
6. Emulators and logic analyzer
  • An example of an IDE is Turbo C/C++, which provides a platform on Windows for the development of application programs with a command-line interface. 
  • The other category of IDE is known as a Visual IDE, which provides a visual development environment, for example, Microsoft Visual C++. 
  • IDEs used in Embedded firmware are slightly different from the generic IDE used for high-level language-based development in desktop applications. 
  • In Embedded applications, the IDE is either supplied by the target processor/controller manufacturer or by third-party vendors or as Open source.

Cross Compilers

  • A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running.
  • For example, a compiler that runs on a Windows 7 PC but generates code that runs on an Android smartphone is a cross compiler. 
  • A cross compiler is necessary to compile code (high-level language -> machine-level language) for multiple platforms from one development host. 
  • The fundamental use of a cross compiler is to separate the build environment from the target environment. This is useful in several situations, such as:
# Embedded computers, where a device has extremely limited resources.
# Compiling for multiple machines.
# Compiling on a server farm.
# Bootstrapping to a new platform.

Disassemblers

  • A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. 

Decompilers

  • A decompiler is a computer program that takes an executable file as input and attempts to create a high-level source file which can be recompiled successfully.
  •  It is, therefore, the opposite of a compiler, which takes a source file and makes an executable. 
  • Decompilers are usually unable to perfectly reconstruct the original source code, and as such, will frequently produce obfuscated code.
  • Nonetheless, decompilers remain an important tool in the reverse engineering of computer software.
Simulators

  • Simulators are less complex applications that simulate the internal behaviour of a device.
  • They are written in a high-level language.
  • Simulators can be difficult to use for debugging purposes.
  • IDEs provide simulator support.
  • Example: the iOS Simulator.
Emulators 

  • A hardware or software that enables one computer system to behave like another computer system.
  • It is written in machine language.
  • Emulators are more suitable for debugging.
  • Example: Android (SDK) emulators.
Debuggers

  • A computer program that is used to test and debug other programs or target programs.
  • It helps to identify errors in a computer program and to fix them.



Classification and Prediction in Data Mining

 


  • Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. 
  • Classification predicts categorical (discrete, unordered) class labels, while prediction models continuous-valued functions.
  • The goal of prediction is to forecast or deduce the value of an attribute based on values of other attributes.
  • The goal of data classification is to organize and categorize data in distinct classes.

The Data Classification process includes two steps −

  • Building the Classifier or Model
  • Using Classifier for Classification

Building the Classifier or Model


  • This step is the learning step or the learning phase.
  • In this step, the classification algorithms build the classifier.
  • The classifier is built from a training set made up of database tuples and their associated class labels.
  • Each tuple that constitutes the training set belongs to a predefined category or class. These tuples can also be referred to as samples, objects, or data points.



Using Classifier for Classification

  • In this step, the classifier is used for classification. 
  • Here, the test data is used to estimate the accuracy of the classification rules. 
  • The classification rules can be applied to new data tuples if the accuracy is considered acceptable (a short sketch of both steps follows).
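
A minimal sketch of the two-step process (build the classifier, then use it on test data), assuming the scikit-learn library; the iris dataset stands in for a set of database tuples with class labels.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)                     # tuples and their class labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = DecisionTreeClassifier().fit(X_train, y_train)  # step 1: learning phase
    accuracy = clf.score(X_test, y_test)                  # step 2: estimate accuracy on test data
    print(f"test accuracy: {accuracy:.2f}")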



Classification and Prediction Issues


The major issue is preparing the data for Classification and Prediction. Preparing the data involves the following activities −

  • Data Cleaning − Data cleaning involves removing the noise and treatment of missing values. The noise is removed by applying smoothing techniques and the problem of missing values is solved by replacing a missing value with the most commonly occurring value for that attribute.
  • Relevance Analysis − Database may also have irrelevant attributes. Correlation analysis is used to know whether any two given attributes are related.
  • Data Transformation and reduction − The data can be transformed by any of the following methods.
    • Normalization − The data is transformed using normalization. Normalization involves scaling all values for a given attribute in order to make them fall within a small specified range. Normalization is used when in the learning step, the neural networks or the methods involving measurements are used.
    • Generalization − The data can also be transformed by generalizing it to the higher concept. For this purpose, we can use the concept hierarchies.

Note − Data can also be reduced by some other methods such as wavelet transformation, binning, histogram analysis, and clustering.

Data Preprocessing in Data Mining

 

Data preprocessing is a data mining technique that is used to transform the raw data into a useful and efficient format. 
Steps involved  are,
  • Data Cleaning
  • Data Integration
  • Data Transformation
  • Data Reduction
  • Data Discretization and Concept Hierarchy Generation

 Data Cleaning

Data cleaning in data mining is the process of detecting and removing incomplete, noisy, and inconsistent data.

# We can remove the incomplete data by following methods,
  • Ignore the tuples - This approach is suitable only when the dataset we have is quite large and multiple values are missing within a tuple.
  • Fill the Missing values - We can fill the missing values manually, by attribute mean or the most probable value.
# Noisy Data (meaningless data) can be handled by,
  • Binning Method - This method works on sorted data to smooth it. The whole data is divided into segments (bins) of equal size, and one of several smoothing methods is then applied to each bin; a worked example and a code sketch of equal-depth binning follow this list.

There are three approaches to performing smoothing:

# Smoothing by bin means: In smoothing by bin means, each value in a bin is replaced by the mean value of the bin.
# Smoothing by bin median: In this method, each bin value is replaced by its bin median value.
# Smoothing by bin boundary: In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value.

Approach:
  1. Sort the array of the given data set.
  2. Divide the range into N intervals, each containing approximately the same number of samples (equal-depth partitioning).
  3. Replace the values in each bin with the bin mean, median, or boundaries.

Example

Sorted data: 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Smoothing by bin means:
      - Bin 1: 9, 9, 9, 9
      - Bin 2: 23, 23, 23, 23
      - Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:
      - Bin 1: 4, 4, 4, 15
      - Bin 2: 21, 21, 25, 25
      - Bin 3: 26, 26, 26, 34

Smoothing by bin median:
      - Bin 1: 9, 9, 9, 9
      - Bin 2: 24, 24, 24, 24
      - Bin 3: 29, 29, 29, 29


  • Regression - Data can be smoothed by fitting the data to a regression function.
  • Clustering - This approach groups similar data into clusters. Values that fall outside the set of clusters may be considered outliers.
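
The sketch below reproduces the worked binning example above in plain Python (equal-depth bins of four values; no external libraries assumed).

    data = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]   # already sorted
    depth = 4                                               # equal-depth partitioning

    bins = [data[i:i + depth] for i in range(0, len(data), depth)]

    # Smoothing by bin means: every value becomes its bin's (rounded) mean.
    by_means = [[round(sum(b) / len(b))] * len(b) for b in bins]

    # Smoothing by bin boundaries: every value snaps to the nearer of its bin's min/max.
    by_bounds = [[min((b[0], b[-1]), key=lambda e: abs(v - e)) for v in b] for b in bins]

    print(by_means)   # [[9, 9, 9, 9], [23, 23, 23, 23], [29, 29, 29, 29]]
    print(by_bounds)  # [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]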

Data Integration

Data Integration is a data preprocessing technique that combines data from multiple sources and provides users with a unified view of these data.

The data integration approaches are formally defined as triple <G, S, M> where,

G stands for the global schema,
S stands for a heterogeneous source of schema,
M stands for mapping between the queries of source and global schema.

There are mainly 2 major approaches for data integration – one is “tight coupling approach” and another is the “loose coupling approach”.

* Tight Coupling - In this coupling, data is combined from different sources into a single physical location through the process of ETL – Extraction, Transformation, and Loading.
* Loose Coupling - Here, the data only remains in the actual source databases.

Issues in Data Integration

  • Schema Integration
  • Redundancy
  • Detection and resolution of data value conflicts

Data Transformation

This is the process of transforming the data into forms appropriate for the mining process. It involves the following methods:

  • Normalization - It is done to scale the data values into a specified range, such as 0.0 to 1.0 (a small sketch follows this list).
  • Attribute Selection - In this strategy, new attributes are constructed from the given set of attributes to help the mining process.
  • Generalization - Here, low-level data are replaced with high-level data by climbing concept hierarchies. For example, the attribute "city" can be generalized to "country".
  • Smoothing - It is a process that is used to remove noise from the data set using some algorithms.
  • Aggregation - Data aggregation is the method of storing and presenting data in a summary format.
  • Discretization - This is done to replace the raw values of a numeric attribute with ranges or conceptual levels.
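
A small sketch of min-max normalization to the range [0.0, 1.0] in plain Python; the column values are made up for illustration.

    values = [200, 300, 400, 600, 1000]

    lo, hi = min(values), max(values)
    normalized = [(v - lo) / (hi - lo) for v in values]
    print(normalized)   # [0.0, 0.125, 0.25, 0.5, 1.0]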

Data Reduction

Data reduction techniques can be applied to obtain a reduced representation of the data which aims to increase storage efficiency and reduce data storage and analysis costs. The various steps to data reduction are:

  • Data Cube Aggregation - Aggregation operations are applied to the data in the construction of a data cube.
  • Numerosity Reduction - This enables storing a model of the data instead of the whole data, e.g. regression models.
  • Dimensionality Reduction - This reduces the size of the data by encoding mechanisms. It can be lossy or lossless. If the original data can be retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it is called lossy. Two effective methods of dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis); a short sketch follows this list.
  • Attribute Subset Selection - The goal of attribute subset selection is to find a minimum set of attributes such that dropping the remaining irrelevant attributes does not noticeably affect the mining result.
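
A brief sketch of dimensionality reduction with PCA, assuming the scikit-learn library; the 4-feature iris data is reduced to 2 principal components for illustration.

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    X_reduced = PCA(n_components=2).fit_transform(X)
    print(X.shape, "->", X_reduced.shape)   # (150, 4) -> (150, 2)
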
Methods of Attribute Subset Selection

1. Stepwise Forward Selection

* This procedure starts with an empty set of attributes as the minimal set. 
* The most relevant attributes are chosen (having the minimum p-value) and are added to the minimal set. 
* In each iteration, one attribute is added to the reduced set.

2. Stepwise Backward Elimination

Here, all the attributes are considered in the initial set of attributes. In each iteration, the attribute whose p-value is higher than the significance level is eliminated from the set. (A scikit-learn sketch of stepwise selection appears after the list of methods.)

3. Combination of Forward Selection and Backward Elimination

* The stepwise forward selection and backward elimination are combined to select the relevant attributes most efficiently. 
* This is the most common technique which is generally used for attribute selection.

4. Decision Tree Induction

* In this approach, a decision tree is used for attribute selection. 
* It constructs a flowchart-like structure with nodes denoting tests on attributes. 
* Each branch corresponds to an outcome of the test, and leaf nodes denote a class prediction. 
* Attributes that are not part of the tree are considered irrelevant and hence discarded.
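
As referenced above, here is an illustrative sketch of stepwise selection using scikit-learn's SequentialFeatureSelector; note that it ranks attributes by cross-validated score rather than p-values, so it only approximates the textbook procedure.

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    selector = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=2,
        direction="forward",       # use "backward" for stepwise backward elimination
    )
    selector.fit(X, y)
    print(selector.get_support())  # boolean mask over the original attributes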

Data Discretization and Concept Hierarchy Generation



Data discretization techniques can be used to divide the range of a continuous attribute into intervals. This leads to a concise, easy-to-use, knowledge-level representation of mining results.
Discretization techniques can be categorized, based on the direction in which they proceed, as top-down (splitting) or bottom-up (merging).

The concept hierarchy method can be used to reduce the data by collecting and replacing low-level concepts with higher-level concepts.

Typical methods

* Binning
* Cluster Analysis
* Histogram Analysis
* Entropy-Based Discretization


Monk and Inversions

using System; public class Solution { public static void Main () { int T = Convert . ToInt32 ( Console . ReadLine...