CS.Lectures: Categorization of clustering methods

In general, the major clustering methods can be classified into the following categories.

Partitioning Methods

Hierarchical Methods

Density-Based Methods

Grid-Based Methods

Partitioning approach:

in partitioning approach, we construct various partitions and then evaluate them by some criterion.
Given a database of n objects, a partitioning method constructs k partitions of the data object, where each partition represents a cluster. i.e., it clusters the data into k groups, which together satisfy the following requirements:

(1) each group must contain at least one object, and

(2) each object must belong to exactly one group

Typical methods: k-means, k-medoids,

Hierarchical approach:

A hierarchical clustering method works by grouping data objects into a tree of clusters.
New clusters are formed using a previously formed one.
Hierarchical clustering methods can be further classified into,

Agglomerative hierarchical clustering (Bottom- Up) - This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters until all of the objects are in a single cluster or until certain termination conditions are satisfied. Most hierarchical clustering methods belong to this category.

Divisive hierarchical clustering (Top- Down) - This top-down strategy does the reverse of agglomerative hierarchical clustering by starting with all objects in one cluster. It subdivides the cluster into smaller and smaller pieces until each object forms a cluster on its own or until it satisfies certain termination conditions.

Typical methods: Diana, BIRCH, CAMELEON, CURE

Density-based approach:

Most partitioning methods cluster the objects based on the distance between objects.
Such methods can find only spherical-shaped clusters and encounter difficulty in discovering clusters of arbitrary shapes.
Density-based clustering methods have been developed based on connectivity and density functions.

Typical methods: DBSACN, OPTICS, DenClue

Grid-based approach:

Grid-based methods quantize the object space into a finite number of cells that form a grid structure.
All of the clustering operations are performed on the grid structure (i.e., on the quantized space).
The main advantage of this approach is its fast processing time, which is typically independent of the number of data objects and dependent only on the number of cells in each dimension in the quantized space.

Typical methods: STING, WaveCluster, CLIQUE

Model-based:

Model-based methods hypothesize a model for each of the clusters and find the best fit of the data to the given model.

Typical methods: Expectation-Maximization (EM), SOM, COBWEB

Frequent pattern-based:

Based on the analysis of frequent patterns.

Typical methods: p-Cluster

User-guided or constraint-based:

Clustering by considering user-specified or application-specific constraints

Typical methods: COD (obstacles), constrained clustering

Wednesday, December 16, 2020

Categorization of clustering methods