Particle swarm optimization (PSO)

  • Developed in 1995 by James Kennedy and Russ Eberhart.  
  • It was inspired by the social behavior of bird flocking and fish schooling. 
  • PSO applies the concept of social interaction to problem solving. 
  • PSO is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality.
  • PSO is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. 
  • Also, PSO does not use the gradient of the problem being optimized, which means PSO does not require that the optimization problem be differentiable, as is required by classic optimization methods such as gradient descent and quasi-Newton methods. 
  • However, metaheuristics such as PSO do not guarantee an optimal solution is ever found. 
  • PSO is a robust optimization technique that applies the social intelligence of swarms to solve optimization problems. 
  • Bird flocking and fish schooling are examples of swarm intelligence. In bird flocking, each bird updates its position by following the collective intelligence of the group as well as its own. 
  • The first version of this algorithm was proposed by J. Kennedy and R. Eberhart. In this version, a set of potential solutions (called particles) is dispersed at various points in the solution space. Each point has some objective function value, and thus a fitness, for its current location. Each particle’s movement is influenced by the following two factors:
  1. The best fitness achieved by the particle so far. (BFS)
  2. The best fitness achieved by any particle in the neighbourhood.  (BFN)
Suppose that the search space is n-dimensional; then particle i of the swarm can be represented by an n-dimensional vector Xi = (xi1, xi2, …, xin). The velocity of this particle can be represented by another n-dimensional vector Vi = (vi1, vi2, …, vin). 

The fitness of each particle can be evaluated according to the objective function of optimization problem. The best previously visited position of the particle i is noted as its individual best position Pi= (pi1,pi2,…,pin). 

The position of the best individual of the whole swarm is noted as the global best position G= (g1,g2, …,gn). At each step, the velocity of particle and its new position will be assigned according to the following two equations:
Vi = ω∗Vi + c1∗r1∗(Pi − Xi) + c2∗r2∗(G − Xi)                                   (1)
Xi = Xi + Vi                                                                   (2) 
 
where ω is called the inertia weight, which controls the impact of the particle's previous velocity on its current one; r1, r2 are independent uniformly distributed random variables in the range (0, 1); and c1, c2 are positive constant parameters called acceleration coefficients, which control the maximum step size. 

In PSO, Eq. (1) is used to calculate the new velocity according to the particle's previous velocity and to the distances of its current position from both its own best historical position and the best position of the entire population or its neighbourhood. Generally, the value of each component in V can be clamped to the range [-vmax, vmax] to control excessive roaming of particles outside the search space. Then the particle flies toward a new position according to Eq. (2). This process is repeated until a user-defined stopping criterion is reached.
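The loop described above can be sketched in Python. This is a minimal illustrative sketch, not a reference implementation: the function name `pso`, the parameter values, and the sphere test function are all assumptions chosen for demonstration.

```python
import random

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, vmax=1.0):
    # Scatter particles randomly in the search space with zero velocity.
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                      # personal best positions (pbest)
    pbest = [f(x) for x in X]                  # personal best fitness values
    g = P[pbest.index(min(pbest))][:]          # global best position (gbest)

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Eq. (1): inertia + cognitive + social components.
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (g[d] - X[i][d]))
                # Velocity clamping to [-vmax, vmax].
                V[i][d] = max(-vmax, min(vmax, V[i][d]))
                # Eq. (2): move the particle.
                X[i][d] += V[i][d]
            fx = f(X[i])
            if fx < pbest[i]:                  # update pbest, then gbest
                pbest[i], P[i] = fx, X[i][:]
                if fx < f(g):
                    g = X[i][:]
    return g, f(g)

# Minimize the sphere function f(x) = sum(x_d^2); the optimum is the origin.
best, val = pso(lambda x: sum(t * t for t in x), dim=2)
```

With these settings the swarm reliably drives the sphere function close to zero, though, as noted above, no metaheuristic guarantees the optimum is found.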

# Bird Flocking

A flock is a gathering of a group of animals of the same species in order to forage or travel with one another. In birds, flocks are typically associated with migration, but flocking is also important for protection from predators and for foraging benefits.

Flocking is the behavior exhibited when a group of birds, called a flock, are foraging or in flight.

From the perspective of the mathematical modeller, "flocking" is the collective motion by a group of self-propelled entities and is a collective animal behavior exhibited by many living beings such as birds, fish, bacteria, and insects. It is considered an emergent behavior arising from simple rules that are followed by individuals and does not involve any central coordination.

In 1986, Craig Reynolds described this process in 3 simple behaviors:

  • Separation- avoid crowding local flockmates
  • Alignment- move towards the average heading of local flockmates
  • Cohesion- move toward the average position of local flockmates
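The three rules above can be sketched as per-boid steering updates. This is a deliberately simplified sketch: the neighbourhood radius `r` and the rule weights are hypothetical parameters, and real boids implementations apply separation only at very close range.

```python
def flock_step(positions, velocities, r=2.0, sep_w=0.05, ali_w=0.05, coh_w=0.01):
    """One update of Reynolds' three rules on 2-D boids (illustrative sketch)."""
    new_v = []
    for i, (px, py) in enumerate(positions):
        # Local flockmates: boids within radius r of boid i.
        nbrs = [j for j, (qx, qy) in enumerate(positions)
                if j != i and (qx - px) ** 2 + (qy - py) ** 2 < r * r]
        vx, vy = velocities[i]
        if nbrs:
            n = len(nbrs)
            cx = sum(positions[j][0] for j in nbrs) / n   # neighbours' centre
            cy = sum(positions[j][1] for j in nbrs) / n
            ax = sum(velocities[j][0] for j in nbrs) / n  # average heading
            ay = sum(velocities[j][1] for j in nbrs) / n
            # Separation: steer away from the neighbours' centre;
            # alignment: move toward the average heading;
            # cohesion: steer toward the neighbours' centre.
            vx += sep_w * (px - cx) + ali_w * (ax - vx) + coh_w * (cx - px)
            vy += sep_w * (py - cy) + ali_w * (ay - vy) + coh_w * (cy - py)
        new_v.append((vx, vy))
    new_p = [(px + vx, py + vy) for (px, py), (vx, vy) in zip(positions, new_v)]
    return new_p, new_v
```

No boid is told where the flock is going; coordinated motion emerges from these purely local updates, which is exactly the "no central coordination" point made above.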

# Fish Schooling

A school is a group of fish swimming in a synchronized and polarized way. The individuals of this group adopt a schooling behavior (Pitcher, 1983). Schooling behaviors are characterized by the tendency to polarize, with individuals adopting the same orientation and swimming in the same direction.

Fish schooling and aggregation behaviors are some of the most prominent social and group activities exhibited by fishes.

• Fish may school or form aggregations for many reasons, including foraging, reproduction, and defense from predators.

• One of the most enduring hypotheses regarding fish schooling is that fish in schools can obtain a hydrodynamic advantage, thus reducing the cost of locomotion, by taking advantage of the wakes shed by neighbors within the school.

Terminologies in PSO

• Collection of flying particles (swarm) – changing solutions

• Search area – possible solutions

• Movement towards a promising area to get the global optimum

• Each particle keeps track of:
  its best solution: personal best, pbest or lbest
  the best value of any particle: global best, gbest 

  • Each particle keeps track of its coordinates in the solution space which are associated with the best solution (fitness) achieved so far by that particle. This value is called the personal best, pbest. 
  • Another best value tracked by PSO is the best value obtained so far by any particle in the neighborhood of that particle. This value is called gbest. 
  • The basic concept of PSO lies in accelerating each particle toward its pbest and the gbest locations, with a random weighted acceleration at each time step.

Working 

It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formula over the particle's position and velocity. 

Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions. 



Particle Velocity

1. Inertia: makes the particle keep moving in the same direction with the same velocity.

2. Personal influence: makes the particle return to a previous position that was better than the current one; conservative.

3. Social influence: makes the particle follow the direction of its best neighbors.

Intensification: exploits previous solutions; finds the best solution of a given region.
Diversification: searches for new solutions; finds the regions with potentially the best solutions.

Evolution

PSO is originally attributed to Kennedy, Eberhart and Shi and was first intended for simulating social behaviour, as a stylized representation of the movement of organisms in a bird flock or fish school. The algorithm was simplified and it was observed to be performing optimization. The book by Kennedy and Eberhart describes many philosophical aspects of PSO and swarm intelligence. An extensive survey of PSO applications is made by Poli. Recently, a comprehensive review on theoretical and experimental works on PSO has been published by Bonyadi and Michalewicz.

The basic PSO parameters are swarm size (number of particles), number of iterations, velocity components, and acceleration coefficients, illustrated below. In addition, PSO is also influenced by inertia weight, velocity clamping, and velocity constriction.

Neighborhood Topologies

A neighborhood must be defined for each particle [7]. This neighborhood determines the extent of social interaction within the swarm and influences a particular particle's movement. Less interaction occurs when the neighborhoods in the swarm are small [4]. With a small neighborhood, convergence is slower but the quality of solutions may improve; with a larger neighborhood, convergence is faster but there is a risk that convergence occurs prematurely [7]. To address this trade-off, the search process can start with a small neighborhood size that is then increased over time. This technique ensures an initially high diversity with faster convergence as the particles move towards a promising search region [4].

The PSO algorithm relies on social interaction among the particles in the entire swarm. Particles communicate with one another by exchanging information about the success of each particle in the swarm. When a particle in the whole swarm finds a better position, all particles move towards it. This behavior of the particles is determined by the particles' neighborhood [4]. Researchers have worked on improving this behavior by designing different types of neighborhood structures [15]. Some neighborhood structures or topologies are discussed below:


Figure 3.5 (a) illustrates the star topology, where each particle connects with every
other particle. This topology leads to faster convergence than other topologies, but
there is a susceptibility to be trapped in local minima. Because all particles know
each other, this topology is referred to as the gbest PSO.
Figure 3.5 (b) illustrates the ring topology, where each particle is connected only
with its immediate neighbors. In this process, when one particle finds a better
result, this particle passes it to its immediate neighbors, and these two immediate
neighbors pass it to their immediate neighbors, until it reaches the last particle.
Thus the best result found is spread very slowly around the ring by all particles.
Convergence is slower, but larger parts of the search space are covered than with
the star topology. It is referred to as the lbest PSO.
Figure 3.5 (c) illustrates the wheel topology, in which only one particle (a focal
particle) connects to the others, and all information is communicated through this
particle. This focal particle compares the best performance of all particles in the
swarm, and adjusts its position towards the best performance particle. Then the
new position of the focal particle is informed to all the particles.
Figure 3.5 (d) illustrates a four clusters topology, where four clusters (or cliques)
are connected with two edges between neighboring clusters and one edge between
opposite clusters.
There are further neighborhood structures or topologies (for instance, the
pyramid topology, the Von Neumann topology, and so on), but no single best
topology is known to find the optimum for all kinds of optimization problems.
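The difference between the star (gbest) and ring (lbest) topologies can be sketched as neighbour-index functions. The helper names below are hypothetical, and fitness values are assumed to be "lower is better".

```python
def star_neighbours(i, n):
    # Star / gbest topology: every particle is connected to every other one.
    return [j for j in range(n) if j != i]

def ring_neighbours(i, n, k=1):
    # Ring / lbest topology: k immediate neighbours on each side, wrapping around.
    return sorted({(i + d) % n for d in range(-k, k + 1)} - {i})

def local_best(i, fitness, neighbours):
    # The best position visible to particle i is the best over its
    # neighbourhood (including itself); with star_neighbours this is gbest.
    cand = neighbours + [i]
    return min(cand, key=lambda j: fitness[j])
```

With `star_neighbours` every particle sees the single swarm-wide best (fast convergence, local-minimum risk); with `ring_neighbours` a good result spreads only one hop per iteration, matching the slower but more exploratory behaviour described above.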

Neighbourhoods and topologies

  • The topology of the swarm defines the subset of particles with which each particle can exchange information. 
  • The basic version of the algorithm uses the global topology as the swarm communication structure. This topology allows all particles to communicate with all the other particles; thus the whole swarm shares the same best position g from a single particle. However, this approach might lead the swarm to be trapped in a local minimum, so different topologies have been used to control the flow of information among particles. 
  • For instance, in local topologies, particles only share information with a subset of particles. This subset can be a geometrical one – for example "the m nearest particles" – or, more often, a social one, i.e. a set of particles that is not depending on any distance. 
  • In such cases, the PSO variant is said to be local best (vs global best for the basic PSO).

A commonly used swarm topology is the ring, in which each particle has just two neighbours, but there are many others. The topology is not necessarily static. In fact, since the topology is related to the diversity of communication of the particles, some efforts have been done to create adaptive topologies (SPSO,  APSO,  stochastic star, TRIBES, Cyber Swarm, and C-PSO).

In short,

Two different kinds of neighborhoods are defined for PSO:

1. In the gbest swarm, all the particles are neighbors of each other; thus, the position of the best overall particle in the swarm is used in the social term of the velocity update equation.

• It is assumed that gbest swarms converge fast, as all the particles are attracted simultaneously to the best part of the search space.

• However, if the global optimum is not close to the best particle, it may be impossible for the swarm to explore other areas; this means that the swarm can be trapped in local optima.

2. In the pbest / lbest swarm, only a specific number of particles (the neighbor count) can affect the velocity of a given particle.

• The swarm will converge more slowly but has a greater chance of locating the global optimum. 

In the lbest / pbest topology, a number of neighbors are selected from each side of the particle.

Convergence

In relation to PSO the word convergence typically refers to two different definitions:

  • Convergence of the sequence of solutions (aka, stability analysis, converging) in which all particles have converged to a point in the search-space, which may or may not be the optimum,
  • Convergence to a local optimum where all personal bests p or, alternatively, the swarm's best known position g, approaches a local optimum of the problem, regardless of how the swarm behaves.

Convergence of the sequence of solutions has been investigated for PSO.[15][16][17] These analyses have resulted in guidelines for selecting PSO parameters that are believed to cause convergence to a point and prevent divergence of the swarm's particles (particles do not move unboundedly and will converge to somewhere). However, the analyses were criticized by Pedersen[22] for being oversimplified, as they assume the swarm has only one particle, that it does not use stochastic variables, and that the points of attraction, that is, the particle's best known position p and the swarm's best known position g, remain constant throughout the optimization process. However, it was shown[38] that these simplifications do not affect the boundaries found by these studies for the parameter ranges where the swarm is convergent. Considerable effort has been made in recent years to weaken the modelling assumptions utilized during the stability analysis of PSO,[39] with the most recent generalized result applying to numerous PSO variants and utilizing what were shown to be the minimal necessary modeling assumptions.[40]

Convergence to a local optimum has been analyzed for PSO in [41] and [42]. It has been proven that PSO needs some modification to guarantee finding a local optimum.

This means that determining convergence capabilities of different PSO algorithms and parameters still depends on empirical results. One attempt at addressing this issue is the development of an "orthogonal learning" strategy for an improved use of the information already existing in the relationship between p and g, so as to form a leading converging exemplar and to be effective with any PSO topology. The aims are to improve the performance of PSO overall, including faster global convergence, higher solution quality, and stronger robustness.[43] However, such studies do not provide theoretical evidence to actually prove their claims. 

Comparison with other evolutionary computation techniques

• Unlike genetic algorithms, evolutionary programming, and evolution strategies, PSO has no selection operation.

• All particles in PSO are kept as members of the population through the course of the run.

• Among these methods, PSO is the only algorithm that does not implement survival of the fittest.

• There is no crossover operation in PSO.

• Eq. (1) resembles mutation in EP.

• In EP, the balance between global and local search can be adjusted through the strategy parameter, while in PSO the balance is achieved through the inertia weight factor (ω) of Eq. (1).

Advantages & Disadvantages of PSO

Advantages of the PSO algorithm [14] [15]:

1) The PSO algorithm is derivative-free.

2) It is easy to implement, so it can be applied both in scientific research and in engineering problems.

3) It has a limited number of parameters, and the impact of the parameters on the solutions is small compared to other optimization techniques.

4) The calculation in the PSO algorithm is very simple.

5) There are techniques which ensure convergence, and the optimum value of the problem can be calculated easily within a short time.

6) PSO is less dependent on the set of initial points than other optimization techniques.

7) It is conceptually very simple.

Disadvantages of the PSO algorithm [13]:

1) The PSO algorithm suffers from partial optimism, which degrades the regulation of its speed and direction.

2) Problems with non-coordinate systems (for instance, in the energy field) exist.

Advantages
• Insensitive to scaling of design variables
• Simple implementation
• Easily parallelized for concurrent processing
• Derivative free
• Very few algorithm parameters
• Very efficient global search algorithm
 

 Disadvantages
• Tendency toward fast, premature convergence at local optimum points
• Slow convergence in the refined search stage (weak local search ability)

https://slideplayer.com/amp/1528196/

Millonas proposed five basic principles (van den Bergh 2001):

(1) Proximity: the swarm should be able to carry out simple space and time computations.

(2) Quality: the swarm should be able to sense the quality change in the environment and respond to it.

(3) Diverse response: the swarm should not limit its way to get the resources in a narrow scope.

(4) Stability: the swarm should not change its behavior mode with every environmental change.

(5) Adaptability: the swarm should change its behavior mode when this change is worthy.

Explain the proximity principle, stability principle, and adaptability principle of swarm intelligence.

Proximity principle: The basic units of a swarm should be capable of responding to environmental variance triggered by interactions among agents. Some fundamental behaviors are shared, such as searching for living resources and nest building.

Principle of stability: The population should not change its mode of behavior every time the environment changes.

Principle of adaptability: The swarm is sensitive to the changes in the environment that result in different swarm behaviour.

https://www.techferry.com/articles/swarm-intelligence.html

INFORMATION RETRIEVAL Module 3 & 4

Discuss the concept of hubs and authorities in multimedia indexing.

https://nlp.stanford.edu/IR-book/html/htmledition/hubs-and-authorities-1.html 

https://safecont.com/en/ranking-urls-hubs-authorities/ 

  • Given a query, every web page is assigned two scores: one is called its hub score and the other its authority score. 
  • For any query, we compute two ranked lists of results rather than one. 
  • The ranking of one list is induced by the hub scores and that of the other by the authority scores. 
  • There are two primary kinds of web pages useful as results for broad-topic searches:
  •  There are authoritative sources of information on the topic. We will call such pages authorities. They are the pages that will emerge with high authority scores. 
  • There are pages on the Web that are hand-compiled lists of links to authoritative web pages on a specific topic. These hub pages are not in themselves authoritative sources of topic-specific information. 
  •  They are compilations made by someone with an interest in the topic. These hub pages are the pages that will emerge with high hub scores. 
  • A good hub page is one that points to many good authorities; 
  • A good authority page is one that is pointed to by many good hub pages.
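This mutual reinforcement (good hubs point to good authorities, good authorities are pointed to by good hubs) can be sketched as a small power-iteration loop in the style of the HITS algorithm. The tiny link graph and the function name `hits` are made up for illustration.

```python
def hits(links, iters=50):
    """links: dict page -> list of pages it points to (a tiny web subgraph)."""
    pages = sorted(set(links) | {q for ps in links.values() for q in ps})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iters):
        # Authority score: sum of hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ()))
                for p in pages}
        # Hub score: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize (L2) so the scores stay bounded across iterations.
        na = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        nh = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / na for p, v in auth.items()}
        hub = {p: v / nh for p, v in hub.items()}
    return hub, auth

# 'h1' and 'h2' are hand-compiled link lists; 'a' and 'b' are content pages.
hub, auth = hits({'h1': ['a', 'b'], 'h2': ['a'], 'a': [], 'b': []})
```

As expected, the page linked by both hubs emerges with the highest authority score, and the hub pointing to more good authorities gets the highest hub score.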

Explain how correlated terms are built form local document set through association clustering.

Discuss the PageRank algorithm in the context of a small web consisting of three pages A, B, and C, where page A links to pages B and C, page B links to page C, and page C links to page A. Assume the damping factor is 0.5. Find the page rankings as well.

PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))
We get the following PageRank™  values for the single pages:
PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615

The sum of all pages' PageRanks is 3 and thus equals the total number of web pages.
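The fixed point above can be checked with a short iteration of the non-normalized PageRank formula PR(p) = (1 − d) + d·Σ PR(q)/C(q), where C(q) is the number of outlinks of q. The function name is illustrative.

```python
def pagerank(links, d=0.5, iters=100):
    """Iterate PR(p) = (1 - d) + d * sum over inlinks q->p of PR(q)/C(q)."""
    pr = {p: 1.0 for p in links}               # start all ranks at 1
    for _ in range(iters):
        pr = {p: (1 - d) + d * sum(pr[q] / len(links[q])
                                   for q in links if p in links[q])
              for p in links}
    return pr

# A -> B, C;  B -> C;  C -> A  (the three-page web from the question).
pr = pagerank({'A': ['B', 'C'], 'B': ['C'], 'C': ['A']})
# Converges to pr['A'] = 14/13, pr['B'] = 10/13, pr['C'] = 15/13.
```

The iteration is a contraction for d < 1, so it converges to the same solution obtained by solving the three equations directly, and the ranks sum to 3, the number of pages.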

What are the different methods of query expansion?

https://en.wikipedia.org/wiki/Query_expansion 

Explain how the URL frontier maintains the politeness and priority property of crawlers. 

https://nlp.stanford.edu/IR-book/html/htmledition/the-url-frontier-1.html 

Consider a Web graph with three nodes 1, 2, and 3. The links are as follows: 1->2, 3->2, 2->1, 2->3. Write down the transition probability matrix for the surfer's walk with teleporting, with the teleport probability α = 0.5. Derive the ranking of the nodes using the PageRank algorithm.
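A sketch of the computation, assuming the usual convention that the surfer teleports to a uniformly random node with probability α and otherwise follows a random outlink (nodes re-indexed 0, 1, 2 here; the function name is made up):

```python
def transition_matrix(adj, n, alpha=0.5):
    """Row-stochastic matrix: with prob. 1 - alpha follow a random outlink,
    with prob. alpha teleport to a uniformly random node."""
    P = []
    for i in range(n):
        out = adj.get(i, [])
        row = [(1 - alpha) * (1 / len(out) if j in out else 0.0) + alpha / n
               for j in range(n)]
        P.append(row)
    return P

# Links 1->2, 3->2, 2->1, 2->3, written with 0-based node indices.
P = transition_matrix({0: [1], 1: [0, 2], 2: [1]}, n=3)

# Power iteration for the steady-state vector pi with pi P = pi.
pi = [1 / 3] * 3
for _ in range(100):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
# pi converges to (5/18, 4/9, 5/18): node 2 ranks first, nodes 1 and 3 tie.
```

Each row of P is, e.g. for node 1, (1/6, 2/3, 1/6): 0.5·1 toward its single outlink plus 0.5/3 teleport mass to every node. The steady state ranks node 2 highest, since both other nodes link to it.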



What is pseudo relevance feedback? What is its disadvantage?


Suppose that a user's initial query is 'cheap CDs cheap DVDs extremely cheap CDs'. The user examines two documents, d1 and d2. He judges d1, with the content 'CDs cheap software cheap CDs', relevant, and d2, with content 'cheap thrills DVDs', non-relevant. Assume that we are using direct term frequency (with no scaling and no document frequency). There is no need to length-normalize vectors. Using Rocchio relevance feedback, what would the revised query vector be after relevance feedback?
Assume α = 1, β = 0.75, γ = 0.25.
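A worked sketch of this computation, using raw term frequencies and the standard Rocchio formula q' = αq + (β/|Dr|)·Σ d_r − (γ/|Dn|)·Σ d_n, with the common convention of clipping negative weights to zero:

```python
def rocchio(q, rel, nonrel, alpha=1.0, beta=0.75, gamma=0.25):
    # Union of all terms seen in the query and the judged documents.
    terms = set(q) | {t for d in rel for t in d} | {t for d in nonrel for t in d}
    new_q = {}
    for t in terms:
        w = (alpha * q.get(t, 0)
             + beta * sum(d.get(t, 0) for d in rel) / max(len(rel), 1)
             - gamma * sum(d.get(t, 0) for d in nonrel) / max(len(nonrel), 1))
        new_q[t] = max(w, 0.0)   # negative weights are usually clipped to 0
    return new_q

q0 = {'cheap': 3, 'CDs': 2, 'DVDs': 1, 'extremely': 1}   # raw term frequencies
d1 = {'CDs': 2, 'cheap': 2, 'software': 1}               # judged relevant
d2 = {'cheap': 1, 'thrills': 1, 'DVDs': 1}               # judged non-relevant
q1 = rocchio(q0, [d1], [d2])
# cheap: 3 + 0.75*2 - 0.25*1 = 4.25;  CDs: 2 + 0.75*2 = 3.5;
# DVDs: 1 - 0.25 = 0.75;  extremely: 1.0;  software: 0.75;  thrills: -0.25 -> 0
```

The revised query pulls toward the relevant document's terms (cheap, CDs, software) and pushes away from the non-relevant one (thrills drops out, DVDs is reduced).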





Discuss Page Ranking.


PageRank is a measure of how 'important' a web page is. It works on the basis that when another website links to your web page, it is like a recommendation or vote for that web page. Each link increases the web page's PageRank. ... A vote from a less-known website that is not relevant to the subject counts for less.

Write note on the User Relevance Feedback strategy for query reformulation. Explain the process of Query Expansion and Term Reweighting for vector model.

Write note on the process of query expansion through Local Clustering. Explain three cluster building strategies for local clustering in detail (association clusters, metric clusters, and scalar clusters).

Module4

Discuss how XML retrieval is performed

Explain how R tree helps in mapping objects into f-D space to do clustering

Explain the application of the GEMINI method to colour images. Discuss its application in the medical field.

Explain any one data structure used in multimedia indexing

 


What are spatial access methods?

Explain the generic multimedia indexing approach for multimedia IR
Briefly explain the vector space model for information retrieval from XML documents

Aim: to have each dimension of the vector space encode a word together with its position within the XML tree.
How: Map XML documents to lexicalized subtrees.

Take each text node (leaf) and break it into multiple nodes, one for each word. E.g. split Bill Gates into Bill and Gates
Define the dimensions of the vector space to be lexicalized subtrees of documents – subtrees that contain at least one vocabulary term. 


We can now represent queries and documents as vectors in this space of lexicalized subtrees and compute matches between them, e.g. using the vector space formalism.
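A minimal sketch of this idea, using the simplifying assumption that each dimension is a (path, word) pair rather than a full lexicalized subtree; the helper names are hypothetical:

```python
import math

def structural_terms(doc):
    """doc: list of (path, text) leaves, e.g. ('book/author', 'Bill Gates').
    Each word becomes its own dimension together with its XML path."""
    vec = {}
    for path, text in doc:
        for word in text.split():
            key = (path, word.lower())
            vec[key] = vec.get(key, 0) + 1
    return vec

def cosine(u, v):
    # Standard vector-space cosine similarity over the structural dimensions.
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# 'Bill Gates' is split into two word nodes under the same path.
doc = structural_terms([('book/author', 'Bill Gates'), ('book/title', 'Business')])
qry = structural_terms([('book/author', 'gates')])
```

A query word only matches when it occurs under the same path, which is the point of encoding position within the XML tree into each dimension.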

Write note on the two types of multimedia similarity queries, Whole Match and Sub-pattern Match queries, with examples.


Why do we need spatial access methods instead of sequential scanning to access multimedia objects? Explain the GEMINI algorithm (explain each step).
 

We need spatial access methods to organize spatial data (multidimensional data such as points, line segments, regions, polygons, volumes, or other kinds of geometric entities) in a way that allows the efficient retrieval of data according to a set of search criteria.

What do you mean by the curse of dimensionality? Discuss the difference between feature selection and feature extraction with examples. How do these two processes contribute to dimensionality reduction?


The curse of dimensionality basically means that the error increases with the number of features. The dimension of a dataset corresponds to the number of attributes/features that exist in it. A dataset with a large number of attributes, generally of the order of a hundred or more, is referred to as high-dimensional data. The term refers to the fact that algorithms are harder to design in high dimensions and often have a running time exponential in the number of dimensions. A higher number of dimensions theoretically allows more information to be stored, but in practice it rarely helps due to the higher possibility of noise and redundancy in real-world data.
Some of the difficulties that come with high-dimensional data manifest during analyzing or visualizing the data to identify patterns, and some manifest while training machine learning models. The difficulties related to training machine learning models on high-dimensional data are referred to as the 'curse of dimensionality'. The most commonly cited aspects of the curse of dimensionality are 'data sparsity' and 'distance concentration'.

 

Feature selection is the process of choosing precise features from a feature pool. This helps in simplification, regularization, and shortening training time. It can be done with various techniques, e.g. linear regression or decision trees.

Feature extraction is the process of converting the raw data into some other representation with which the algorithm can work. Feature extraction creates a new, smaller set of features that captures most of the useful information in the data.

The main difference between them is that feature selection keeps a subset of the original features, while feature extraction creates new ones.




What is Feature selection

  • The problem of selecting some subset of a learning algorithm's input variables upon which it should
    focus attention, while ignoring the rest; in other words, dimensionality reduction.
  • Especially when dealing with a large number of variables, there is a need for dimensionality
    reduction.
  • Feature selection can significantly improve a learning algorithm's performance.
  • Many domains have tens of thousands of variables, most of which are irrelevant or redundant.
    Feature selection limits the training data and reduces the amount of computational resources used.

Feature Extraction

  • Feature extraction is an attribute reduction process. Unlike feature selection, which ranks the
    existing attributes according to their predictive significance, feature extraction actually transforms
    the attributes. The transformed attributes, or features, are linear combinations of the original
    attributes.
  • The general workflow involves applying feature extraction on given data to extract features and
    then apply feature selection with respect to the target variable to select a subset of data. In effect,
    this helps improve the accuracy of a model.
  • For example, in an unsupervised learning problem, the extraction of
    bigrams from a text, or the extraction of contours from an image are examples of feature
    extraction.
  • Feature extraction can be used to extract the themes of a document collection, where documents
    are represented by a set of key words and their frequencies.
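As a concrete sketch of feature extraction by linear combinations, principal component analysis (PCA) projects the data onto its top-k variance directions. This is an illustrative NumPy sketch with made-up random data, not tied to any specific example above.

```python
import numpy as np

def pca_extract(X, k):
    """Project data onto the top-k principal components: each new feature
    is a linear combination of all the original attributes."""
    Xc = X - X.mean(axis=0)                       # centre the data
    # Eigenvectors of the covariance matrix give the directions of
    # maximum variance; keep the k with the largest eigenvalues.
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    top = vecs[:, np.argsort(vals)[::-1][:k]]
    return Xc @ top                               # shape: n_samples x k

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca_extract(X, 2)      # 5 original attributes -> 2 extracted features
```

Unlike feature selection, none of the 5 original columns survives unchanged: each of the 2 extracted features mixes all of them, which is exactly the transformation described above.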

using System; public class Solution { public static void Main () { int T = Convert . ToInt32 ( Console . ReadLine...