Data Mining: Steps and Functionalities

  • DATA MINING

    Data Mining: intelligent methods are applied to extract useful information or patterns from data.

    Data Mining as a KDD Process: data mining is the core of the knowledge discovery process.

    Steps of a KDD Process

    Data Cleaning: handles noisy, inconsistent, and incomplete data
        Missing values
        Noisy data: binning, clustering, etc.
        Inconsistencies: tools, functional dependencies

    Data Integration
        Schema integration: entity identification problem
        Redundancy: correlation analysis

    Data Selection: select only the task-relevant data

    Data Transformation: transform or consolidate data
        Smoothing, normalization, feature construction

    Data Reduction: compression
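Two of the preprocessing techniques named above, binning (for noisy data) and normalization (for transformation), can be sketched in a few lines. This is a minimal illustration, assuming equal-frequency bins smoothed by their means and min-max scaling; the function names and the sample values are hypothetical and do not come from the slides.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins, and replace
    each value by the mean of its bin. The result is in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale the values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative data only
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
normalized_prices = min_max_normalize(prices)
```

With a bin size of 3, each group of three sorted prices collapses to its mean, which damps small fluctuations; min-max scaling then maps the smallest price to 0 and the largest to 1.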

  • Pattern Evaluation: interestingness measures

    Knowledge Presentation: visualization

    Data Mining Functionalities:

    Descriptive: characterize general properties of the data

    Predictive: perform inference on the data in order to make predictions

    Mining: several kinds of patterns can be mined in parallel, at various granularities

    Functionalities covered:
        Concept/class description
        Association analysis
        Classification and prediction
        Cluster analysis
        Outlier analysis
        Evolution analysis

    Concept/Class Description:

    Data can be associated with classes or concepts
        Classes of items: computers, printers
        Concepts of customers: bigSpenders vs. budgetSpenders

    Classes and concepts can be summarized in concise and precise terms by
        Data characterization
        Data discrimination

  • Data Characterization:

    Summarization of the general characteristics of a target class
    Data is collected and aggregated
        OLAP roll-up operation
        Attribute-oriented induction
    Results: charts, cubes, rules
    Example: characteristics of customers

    Data Discrimination:

    Compares a target class with one or more contrasting classes
    Classes may be user-specified
    Examples:
        Products whose sales increased vs. decreased
        Regular shoppers vs. occasional shoppers
    Output includes comparative measures
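Characterization and discrimination can be illustrated with a small aggregation over labeled records. The sketch below assumes that averaging numeric attributes is an acceptable stand-in for a roll-up; the records, field names, and function names are all hypothetical.

```python
from statistics import mean

# Hypothetical customer records; every field name here is an assumption.
customers = [
    {"group": "regular", "age": 34, "monthly_spend": 220.0},
    {"group": "regular", "age": 41, "monthly_spend": 180.0},
    {"group": "regular", "age": 29, "monthly_spend": 200.0},
    {"group": "occasional", "age": 23, "monthly_spend": 40.0},
    {"group": "occasional", "age": 52, "monthly_spend": 60.0},
]


def characterize(records, group, attributes):
    """Summarize one class by averaging each numeric attribute."""
    members = [r for r in records if r["group"] == group]
    summary = {attr: mean(r[attr] for r in members) for attr in attributes}
    summary["count"] = len(members)
    return summary


def discriminate(records, target, contrast, attributes):
    """Compare a target class with a contrasting class, attribute by attribute."""
    t = characterize(records, target, attributes)
    c = characterize(records, contrast, attributes)
    return {
        attr: {"target": t[attr], "contrast": c[attr], "difference": t[attr] - c[attr]}
        for attr in attributes
    }


attrs = ["age", "monthly_spend"]
regular_profile = characterize(customers, "regular", attrs)
comparison = discriminate(customers, "regular", "occasional", attrs)
```

Here `regular_profile` plays the role of a characterization of regular shoppers, and `comparison` holds the comparative measures for regular vs. occasional shoppers.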

    Association Analysis:

    Discovery of association rules
    Form: X ⇒ Y
    Multi-dimensional: age(X, "20-29") ∧ income(X, "20K-25K") ⇒ buys(X, "Laptop")
    Single-dimensional: buys(X, "Laptop") ⇒ buys(X, "Software")
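A rule of the form X ⇒ Y is usually scored by its support and confidence. The slides do not name these measures, so the sketch below is an assumption about which interestingness measures are intended; the baskets are made-up data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base > 0 else 0.0


# Illustrative baskets, not data from the slides.
baskets = [
    {"Laptop", "Software"},
    {"Laptop", "Software", "Printer"},
    {"Laptop"},
    {"Printer"},
]
rule_support = support(baskets, {"Laptop", "Software"})
rule_confidence = confidence(baskets, {"Laptop"}, {"Software"})
```

For the single-dimensional rule buys(X, "Laptop") ⇒ buys(X, "Software"), two of the four baskets contain both items, and two of the three laptop baskets also contain software.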

  • Classification and Prediction:

    Classification
        Finds models that describe and differentiate classes or concepts
        The model predicts the class of objects whose label is unknown
        Models are derived from training data
        Model forms: rules, decision trees, neural networks (NN), formulae
        Preceded by relevance analysis (to eliminate irrelevant attributes)

    Prediction
        The derived model is used for prediction
        Data value prediction
        Class label prediction (classification)
        Trend identification
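To make the train-then-predict idea concrete, here is a minimal sketch of the simplest decision tree: a single threshold on one numeric attribute (a decision stump). It is an illustration under that assumption, not a method taken from the slides; the training pairs and names are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples):
    """Fit a one-level decision tree on (value, label) pairs.

    Tries a threshold midway between each pair of adjacent distinct values
    and keeps the one with the fewest training errors.
    """
    values = sorted({v for v, _ in samples})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for v, label in samples if v <= threshold]
        right = [label for v, label in samples if v > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label


def predict(stump, value):
    """Return the class label the stump assigns to a new value."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Hypothetical training set: yearly spend -> spender class.
training = [(120, "budget"), (300, "budget"), (450, "budget"),
            (900, "big"), (1500, "big"), (2100, "big")]
stump = train_stump(training)
```

After training, `predict(stump, 200)` falls on the budget side of the learned threshold and `predict(stump, 1000)` on the big-spender side.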

    Cluster Analysis

    Unsupervised: class labels are missing in the training set
    Objects are grouped to maximize intra-class similarity and minimize inter-class similarity
    Clusters can be organized into a hierarchy of classes
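One common way to group unlabeled objects so that members of a cluster are close to each other is k-means. The slides do not name a specific algorithm, so k-means is an assumption here, shown on one-dimensional data with explicitly chosen starting centers to keep the sketch deterministic.

```python
def k_means_1d(points, centers, max_iter=100):
    """Plain k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


ages = [18, 21, 22, 45, 48, 52]  # illustrative data only
centers, clusters = k_means_1d(ages, centers=[18, 52])
```

No class labels are supplied; the two groups emerge only from the distances between the values.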

    Outlier Analysis

    Outliers: objects that do not comply with the general behavior of the data
    Noise vs. rare events
    Application: fraud detection
    Methods: statistical tests, deviation-based methods
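A simple statistical test for outliers flags values that lie far from the mean in units of standard deviation. The sketch below assumes a z-score rule with a threshold of 2; the threshold, function name, and amounts are illustrative choices, not values from the slides.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


amounts = [52, 48, 50, 51, 49, 50, 47, 53, 500]  # illustrative transaction amounts
suspicious = z_score_outliers(amounts)
```

In a small sample a single extreme value inflates the standard deviation, so a modest threshold is used here; whether a flagged value is noise or a rare event such as fraud still has to be decided separately.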

  • Evolution Analysis:

    Trend detection
    Time-series data
    Involves the other functionalities, applied to time-related data
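Trend detection on a time series can be sketched with a moving average, which smooths short-term fluctuations before the overall direction is read off. This is a minimal illustration, assuming a simple (unweighted) moving average; the sales figures and function names are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def trend_direction(series, window=3):
    """Label the series by comparing the first and last smoothed values."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "increasing"
    if smoothed[-1] < smoothed[0]:
        return "decreasing"
    return "flat"


monthly_sales = [100, 120, 90, 130, 150, 140, 170]  # illustrative data only
smoothed_sales = moving_average(monthly_sales, window=3)
direction = trend_direction(monthly_sales)
```

The raw series dips in the third month, but the smoothed values rise steadily, so the series is labeled as increasing.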
