raj-endran
Data Mining: Steps and Functionalities
DATA MINING
Data Mining: Intelligent methods are applied to extract useful information or patterns from data.
Data Mining: A KDD Process: Data mining is the core step of the knowledge discovery in databases (KDD) process.
Steps of a KDD Process:

Data Cleaning:
- Handles noisy, inconsistent, and incomplete data
- Missing values
- Noisy data: binning, clustering, etc.
- Inconsistencies: tools, functional dependencies
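Binning, named above as a way to handle noisy data, can be illustrated with smoothing by bin means. The following is a minimal sketch, assuming equal-frequency bins over sorted values; the function name `smooth_by_bin_means` and the price list are illustrative and do not come from the slides.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical noisy price readings.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# prints [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each value is pulled toward its neighbours, which damps local noise at the cost of some detail.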
Data Integration:
- Schema integration: the entity identification problem
- Redundancy: correlation analysis
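Correlation analysis for redundancy detection can be sketched with the Pearson correlation coefficient, which is one common choice for numeric attributes (the slides do not name a specific measure). The attribute names and values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes from two integrated sources.
monthly_income = [1000, 1500, 2000, 2500]
annual_income = [12 * m for m in monthly_income]

r = pearson(monthly_income, annual_income)
print(r)  # a value very close to 1.0 suggests one attribute is redundant
```

A coefficient near +1 or -1 indicates that one attribute can be derived from the other, so one of them is a candidate for removal.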
Data Selection:
- Select only the task-relevant data

Data Transformation:
- Transform or consolidate data
- Smoothing, normalization, feature construction
- Data reduction: compression
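Normalization, listed above, is often done with min-max scaling. The sketch below assumes that variant (the slides do not say which normalization is meant); the function name and the income values are illustrative.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical income values.
incomes = [12000, 73600, 98000]
print(min_max_normalize(incomes))
```

The smallest value maps to `new_min`, the largest to `new_max`, and everything else falls proportionally in between.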
Pattern Evaluation:
- Interestingness measures

Knowledge Presentation:
- Visualization
Data Mining Functionalities:
- Descriptive: characterize the general properties of the data
- Predictive: perform inference on the data in order to make predictions
- Multiple kinds of patterns can be mined in parallel, at various granularities
- Functionalities: concept/class description, association analysis, classification and prediction, cluster analysis, outlier analysis, evolution analysis
Concept/Class Description:
- Data can be associated with classes or concepts
  - Classes: computers, printers
  - Concepts: BigSpenders vs. BudgetSpenders
- Class/concept description: classes and concepts can be summarized in concise and precise terms
  - Data characterization
  - Data discrimination
Data Characterization:
- Summarization of the general characteristics of a target class
- Data is collected and aggregated
  - OLAP roll-up operation
  - Attribute-oriented induction
- Results: charts, cubes, rules
- Example: characteristics of customers
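The roll-up operation mentioned above aggregates a measure to a coarser level of a dimension. A minimal sketch, assuming records stored as dictionaries; the `roll_up` helper, the field names, and the sales figures are hypothetical.

```python
from collections import defaultdict


def roll_up(records, group_key, measure):
    """Sum `measure` over all records that share the same `group_key` value."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_key]] += rec[measure]
    return dict(totals)


# Hypothetical sales facts at city level.
sales = [
    {"city": "Chennai", "country": "India", "amount": 120.0},
    {"city": "Mumbai", "country": "India", "amount": 80.0},
    {"city": "Toronto", "country": "Canada", "amount": 50.0},
]

# Roll up from the city level to the country level.
by_country = roll_up(sales, "country", "amount")
print(by_country)  # prints {'India': 200.0, 'Canada': 50.0}
```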
Data Discrimination:
- Compares a target class with one or more contrasting classes
- The classes may be user specified
- Examples:
  - Products whose sales increased vs. decreased
  - Regular shoppers vs. occasional shoppers
- Output includes comparative measures
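A comparative measure can be as simple as the per-class average of one attribute. The sketch below assumes that choice; the `compare_classes` helper, the field names, and the spending figures are hypothetical.

```python
def compare_classes(records, class_key, measure):
    """Return the mean of `measure` for each class found under `class_key`."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[class_key], []).append(rec[measure])
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


# Hypothetical customers: target class "regular", contrasting class "occasional".
shoppers = [
    {"kind": "regular", "monthly_spend": 300.0},
    {"kind": "regular", "monthly_spend": 500.0},
    {"kind": "occasional", "monthly_spend": 40.0},
    {"kind": "occasional", "monthly_spend": 60.0},
]

averages = compare_classes(shoppers, "kind", "monthly_spend")
print(averages)  # prints {'regular': 400.0, 'occasional': 50.0}
```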
Association Analysis:
- Discovery of association rules
- Form: X ⇒ Y
- Multi-dimensional: Age(X, 20-29) ∧ income(X, 20K-25K) ⇒ buys(X, Laptop)
- Single-dimensional: buys(X, Laptop) ⇒ buys(X, Software)
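Rules of the form X ⇒ Y are usually scored with support and confidence. These two measures are standard for association rules but are not spelled out in the slides, so treat the following as an illustrative sketch; the baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X and Y) / support(X) for the rule X => Y."""
    sup_x = support(transactions, antecedent)
    if sup_x == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / sup_x


# Hypothetical purchase transactions.
baskets = [
    {"Laptop", "Software"},
    {"Laptop", "Software", "Printer"},
    {"Laptop"},
    {"Printer"},
]

# Rule: buys(X, Laptop) => buys(X, Software)
print(support(baskets, {"Laptop", "Software"}))       # prints 0.5
print(confidence(baskets, {"Laptop"}, {"Software"}))  # 2 of the 3 laptop baskets
```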
Classification and Prediction:
- Classification
  - Finds models that describe and differentiate classes or concepts
  - The model predicts the class of new objects
  - Models are derived from training data
  - Model forms: rules, decision trees, neural networks (NN), formulae
  - Preceded by relevance analysis (to eliminate irrelevant attributes)
- Prediction
  - The derived model is used for prediction
  - Data value prediction
  - Class label prediction (classification)
  - Trend identification
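To make the train-then-predict cycle concrete, here is a sketch of a decision stump, a one-level decision tree over a single numeric attribute. It is a deliberately tiny stand-in for the model forms listed above, not a method taken from the slides; the function names and the spending data are hypothetical, and the training set is assumed to contain at least two distinct attribute values.

```python
def train_stump(samples):
    """samples: list of (value, label) pairs.
    Returns (threshold, label_at_or_below, label_above) with the fewest training errors."""
    labels = sorted({label for _, label in samples})
    values = sorted({v for v, _ in samples})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        for below in labels:
            for above in labels:
                errors = sum(
                    1 for v, label in samples
                    if (below if v <= threshold else above) != label
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(stump, value):
    """Apply the learned rule to a new value."""
    threshold, below, above = stump
    return below if value <= threshold else above


# Hypothetical training data: monthly spend and class label.
training = [
    (120, "BudgetSpender"),
    (200, "BudgetSpender"),
    (900, "BigSpender"),
    (1500, "BigSpender"),
]

stump = train_stump(training)
print(stump)                 # prints (550.0, 'BudgetSpender', 'BigSpender')
print(predict(stump, 1000))  # prints BigSpender
```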
Cluster Analysis:
- Unsupervised: class labels are missing from the training set
- Maximize intra-class similarity
- Minimize inter-class similarity
- Can produce a hierarchy of classes
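One way to group unlabeled data so that members of a cluster are similar to each other is k-means. The slides do not name an algorithm, so this one-dimensional version is only an illustrative sketch; the starting centroids are supplied by the caller to keep the run deterministic, and the data points are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled values with two visible groups.
points = [1, 2, 3, 10, 11, 12]
centers, groups = k_means_1d(points, centroids=[1, 12])
print(centers)  # prints [2.0, 11.0]
print(groups)   # prints [[1, 2, 3], [10, 11, 12]]
```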
Outlier Analysis:
- Outliers are objects that do not comply with the general behavior of the data
- Noise vs. rare events
- Application: fraud detection
- Methods: statistical tests, deviation-based methods
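A simple statistical test for outliers flags values that lie many standard deviations from the mean. The sketch below assumes a z-score test with a cutoff of 2.0; both the test and the cutoff are illustrative choices, and the transaction amounts are hypothetical.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts with one suspicious entry.
amounts = [20, 22, 19, 21, 23, 20, 500]
print(z_score_outliers(amounts))  # prints [500]
```

With very small samples a single extreme value inflates the standard deviation, so a stricter cutoff such as 3.0 would miss it here; whether a flagged value is noise or a rare event such as fraud still has to be decided separately.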
Evolution Analysis:
- Trend detection
- Time-series data
- Involves the other functionalities
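Trend detection on time-series data often starts by smoothing short-term fluctuations. A moving average is one common way to do that; it is an assumption here, since the slides do not name a technique, and the sales series is hypothetical.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly sales figures.
monthly_sales = [10, 12, 11, 15, 14, 18]
trend = moving_average(monthly_sales, 3)
print(trend)

# A smoothed series that rises at every step suggests an upward trend.
print(all(a < b for a, b in zip(trend, trend[1:])))  # prints True
```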