Classification and Prediction in Data Mining

 


  • Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. 
  • Classification predicts categorical (discrete, unordered) class labels, whereas prediction models continuous-valued functions.
  • The goal of prediction is to forecast or deduce the value of an attribute based on the values of other attributes.
  • The goal of data classification is to organize and categorize data into distinct classes.
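To make the distinction concrete, here is a minimal Python sketch. The loan rule and its 50,000 income threshold are hypothetical, chosen only to show a classifier returning a categorical label. The prediction half fits a straight line y = a + b*x by ordinary least squares, which is one simple way to model a continuous-valued function, not the only one.

```python
def classify_loan(income):
    """Classification: return a discrete class label (hypothetical rule)."""
    return "safe" if income >= 50_000 else "risky"

def fit_line(xs, ys):
    """Prediction: fit y = a + b*x by least squares; return a predictor function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return lambda x: a + b * x

print(classify_loan(62_000))                    # safe
predict = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(predict(5))                               # 10.0
```

The classifier can only ever answer "safe" or "risky", while the fitted line can return any real number.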

The data classification process includes two steps −

  • Building the Classifier or Model
  • Using the Classifier for Classification

Building the Classifier or Model


  • This step is the learning step or the learning phase.
  • In this step, a classification algorithm builds the classifier.
  • The classifier is built from the training set, which is made up of database tuples and their associated class labels.
  • Each tuple in the training set is assumed to belong to a predefined category or class, indicated by its class label. These tuples can also be referred to as samples, objects, or data points.
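The learning step can be sketched with a deliberately simple rule learner. The sketch below uses OneR ("one rule"), a technique swapped in here for illustration, since the post does not name a specific algorithm. For each attribute it builds rules of the form IF attribute = value THEN class, using the majority class for each value, and keeps the attribute whose rules make the fewest errors on the training tuples. The attribute names and tuples are hypothetical.

```python
from collections import Counter, defaultdict

def learn_one_r(training_set, class_attr):
    """Learn IF attr = value THEN class rules on the single best attribute (OneR)."""
    best = None
    attributes = [a for a in training_set[0] if a != class_attr]
    for attr in attributes:
        # Count the class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for t in training_set:
            counts[t[attr]][t[class_attr]] += 1
        # One rule per value: predict that value's most frequent class.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Training errors = tuples whose class differs from the rule's class.
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    attr, rules, _ = best
    return attr, rules

# Hypothetical training tuples with their class labels ("buys").
training = [
    {"age": "youth", "income": "high", "buys": "no"},
    {"age": "youth", "income": "low", "buys": "no"},
    {"age": "middle", "income": "high", "buys": "yes"},
    {"age": "senior", "income": "low", "buys": "yes"},
    {"age": "senior", "income": "high", "buys": "yes"},
]
attr, rules = learn_one_r(training, "buys")
print(attr, rules)  # age {'youth': 'no', 'middle': 'yes', 'senior': 'yes'}
```

Here "age" wins because its rules classify every training tuple correctly, while the "income" rules make two errors.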


Using the Classifier for Classification

  • In this step, the classifier is used for classification. 
  • Here, test data (tuples that were not used to build the classifier) is used to estimate the accuracy of the classification rules.
  • The classification rules can be applied to the new data tuples if the accuracy is considered acceptable.
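This step can be sketched as follows, assuming the classifier is a set of IF age = value THEN class rules like those a rule learner might produce. The rules, test tuples, function names, and the 0.7 accuracy threshold are all hypothetical; the point is the order of operations: estimate accuracy on labelled test tuples first, and label new tuples only if that accuracy is acceptable.

```python
def classify(tuple_, attr, rules, default="unknown"):
    """Apply IF attr = value THEN class rules to one tuple."""
    return rules.get(tuple_.get(attr), default)

def accuracy(test_set, attr, rules, class_attr):
    """Fraction of test tuples whose predicted label matches the true label."""
    correct = sum(classify(t, attr, rules) == t[class_attr] for t in test_set)
    return correct / len(test_set)

def classify_new(new_tuples, test_set, attr, rules, class_attr, min_accuracy):
    """Label new tuples only if the test-set accuracy reaches min_accuracy."""
    acc = accuracy(test_set, attr, rules, class_attr)
    if acc < min_accuracy:
        return acc, None  # not accurate enough: do not use these rules
    return acc, [classify(t, attr, rules) for t in new_tuples]

# Hypothetical rules and a small labelled test set.
rules = {"youth": "no", "middle": "yes", "senior": "yes"}
test_set = [
    {"age": "youth", "buys": "no"},
    {"age": "middle", "buys": "yes"},
    {"age": "senior", "buys": "no"},
    {"age": "senior", "buys": "yes"},
]
new_tuples = [{"age": "youth"}, {"age": "senior"}]
acc, labels = classify_new(new_tuples, test_set, "age", rules, "buys", 0.7)
print(acc, labels)  # 0.75 ['no', 'yes']
```

With a stricter threshold such as 0.9, the same rules would be rejected and no new tuples would be labelled.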


Classification and Prediction Issues


The major issue is preparing the data for classification and prediction. Preparing the data involves the following activities −

  • Data Cleaning − Data cleaning involves removing noise and treating missing values. Noise is removed by applying smoothing techniques, and missing values can be handled by, for example, replacing each missing value with the most commonly occurring value for that attribute.
  • Relevance Analysis − The database may also contain irrelevant attributes. Correlation analysis is used to determine whether any two given attributes are related, so that redundant attributes can be identified.
  • Data Transformation and Reduction − The data can be transformed by any of the following methods.
    • Normalization − Normalization involves scaling all values of a given attribute so that they fall within a small specified range, such as 0 to 1. It is particularly useful when the learning step uses neural networks or methods involving distance measurements, because it prevents attributes with large ranges from outweighing attributes with small ranges.
    • Generalization − The data can also be transformed by generalizing it to higher-level concepts. For this purpose, we can use concept hierarchies.
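The preparation activities above can be sketched in plain Python. Each function is a hypothetical helper illustrating one technique: mode imputation for missing values, the Pearson correlation coefficient as one form of correlation analysis between two numeric attributes, min-max normalization into a specified range, and generalization through a concept hierarchy represented as a dictionary. The sample values are made up.

```python
import math
from collections import Counter

def fill_missing_with_mode(values):
    """Data cleaning: replace None with the attribute's most common value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def pearson(xs, ys):
    """Relevance analysis: Pearson correlation between two numeric attributes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Values near +1 or -1 indicate a strong linear relationship.
    return cov / (sd_x * sd_y)

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map everything to new_min
        return [new_min] * len(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def generalize(values, hierarchy):
    """Generalization: replace each value by its parent concept, if one is given."""
    return [hierarchy.get(v, v) for v in values]

print(fill_missing_with_mode(["red", None, "blue", "red"]))  # ['red', 'red', 'blue', 'red']
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))                   # close to 1.0
print(min_max_normalize([20, 30, 40]))                        # [0.0, 0.5, 1.0]
print(generalize(["Chicago", "Toronto"], {"Chicago": "USA", "Toronto": "Canada"}))
```

In practice a library would usually handle these steps, but the arithmetic is the same.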

Note − Data can also be reduced by other methods, such as wavelet transformation, binning, histogram analysis, and clustering.
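Binning can be sketched as equal-depth (equal-frequency) binning with smoothing by bin means: the sorted values are split into bins that hold the same number of values, and each value is replaced by its bin's mean, which reduces the number of distinct values. The function name and the bin depth of 3 are assumptions for illustration.

```python
def equal_depth_bin_means(values, depth):
    """Sort values, split them into bins of `depth` items, replace each by its bin mean.

    Note: the result is in sorted order, not the original order of `values`.
    """
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), depth):
        bin_ = ordered[i:i + depth]  # the last bin may hold fewer items
        mean = sum(bin_) / len(bin_)
        smoothed.extend([mean] * len(bin_))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_depth_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Nine distinct-ish values collapse to three, which is the sense in which binning reduces the data; the same idea doubles as a smoothing technique for data cleaning.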
