In general, the major clustering methods can be classified into the following categories.
Partitioning Methods
Hierarchical Methods
Density-Based Methods
Grid-Based Methods
Partitioning approach:
- in partitioning approach, we construct various partitions and then evaluate them by some criterion.
- Given a database of n objects, a partitioning method constructs k partitions of the data object, where each partition represents a cluster. i.e., it clusters the data into k groups, which together satisfy the following requirements:
(1) each group must contain at least one object, and
(2) each object must belong to exactly one group
Typical methods: k-means, k-medoids,
Hierarchical approach:
- A hierarchical clustering method works by grouping data objects into a tree of clusters.
- New clusters are formed using a previously formed one.
- Hierarchical clustering methods can be further classified into,
Agglomerative hierarchical clustering (Bottom- Up) - This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters until all of the objects are in a single cluster or until certain termination conditions are satisfied. Most hierarchical clustering methods belong to this category.
Divisive hierarchical clustering (Top- Down) - This top-down strategy does the reverse of agglomerative hierarchical clustering by starting with all objects in one cluster. It subdivides the cluster into smaller and smaller pieces until each object forms a cluster on its own or until it satisfies certain termination conditions.
Typical methods: Diana, BIRCH, CAMELEON, CURE
Density-based approach:
- Most partitioning methods cluster the objects based on the distance between objects.
- Such methods can find only spherical-shaped clusters and encounter difficulty in discovering clusters of arbitrary shapes.
- Density-based clustering methods have been developed based on connectivity and density functions.
Typical methods: DBSACN, OPTICS, DenClue
Grid-based approach:
- Grid-based methods quantize the object space into a finite number of cells that form a grid structure.
- All of the clustering operations are performed on the grid structure (i.e., on the quantized space).
- The main advantage of this approach is its fast processing time, which is typically independent of the number of data objects and dependent only on the number of cells in each dimension in the quantized space.
Typical methods: STING, WaveCluster, CLIQUE
Model-based:
- Model-based methods hypothesize a model for each of the clusters and find the best fit of the data to the given model.
Frequent pattern-based:
- Based on the analysis of frequent patterns.
Typical methods: p-Cluster
User-guided or constraint-based:
- Clustering by considering user-specified or application-specific constraints
Typical methods: COD (obstacles), constrained clustering
No comments:
Post a Comment