Data Preprocessing - University of Iowa
user.engineering.uiowa.edu/~coneng/lectures/Lecture_DM...

  • 1

    The University of Iowa Intelligent Systems Laboratory

    Data Mining (Knowledge Discovery in Databases)

    What is Data Mining?

    • Data Mining: "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data"

    – William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus

    • Data mining finds valuable information hidden in large volumes of data.

    • The data mining process draws on:

    – Databases

    – Statistics

    – Machine Learning

    – High Performance Computing

    – Visualization

    – Mathematics

    Data mining: Basic steps

    [Diagram: a data table of products (rows) by features (columns) passes through three stages]

    Data --> Selection --> Selected Data --> Preprocessing --> Preprocessed Data --> Mining --> Knowledge

    • Selection: data sampling

    • Preprocessing: handling missing values, outlier removal, labeling output, feature selection

    • Mining: classification, clustering, association analysis, etc.
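
The three stages named above (selection, preprocessing, mining) can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the lecture: the function names, the median fill for missing values, the z-score cutoff of 2.5, and the toy two-group split standing in for a real mining algorithm are all assumptions made for the example.

```python
import random
import statistics


def select(rows, fraction, seed=0):
    """Selection: draw a random sample of the rows (data sampling)."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def preprocess(rows, z_max=2.5):
    """Preprocessing: fill missing values (None) with the median,
    then drop values whose z-score exceeds z_max (assumed cutoff)."""
    known = [r for r in rows if r is not None]
    median = statistics.median(known)
    filled = [median if r is None else r for r in rows]
    mean = statistics.mean(filled)
    stdev = statistics.pstdev(filled)
    if stdev == 0:
        return filled  # all values identical, nothing to remove
    return [x for x in filled if abs(x - mean) / stdev <= z_max]


def mine(values):
    """Mining: a toy stand-in that splits the values into two groups at the mean."""
    mean = statistics.mean(values)
    return {
        "low": [x for x in values if x <= mean],
        "high": [x for x in values if x > mean],
    }


if __name__ == "__main__":
    # Hypothetical single-feature data with one missing value and one outlier.
    raw = [10, 11, 9, None, 10, 12, 11, 10, 9, 500]
    sample = select(raw, fraction=0.8)
    print(mine(preprocess(sample)))
```

A z-score cutoff is a weak outlier test on very small samples, because a single extreme value also inflates the standard deviation; a median-based rule may be a better choice in practice.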

    Mining tasks

    • Classification: Predict a discrete class label, e.g. YES or NO

    • Regression: Predict the actual output value

    • Clustering: Grouping similar items

    • Association rules: Identifying associations among input attributes

    • Anomaly detection: Detect deviation from normal behavior

    Examples:

    • Association rules: {Milk} --> {Coke}, {Diaper, Milk} --> {Beer}

    • Anomaly detection: Fraud detection, network intrusions

    • Recommender systems, e.g. Netflix

    • Weather forecasting

    • Patient diagnosis
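
Association rules such as {Milk} --> {Coke} are usually scored by support and confidence. The sketch below shows one way to compute both measures; the transaction list is invented for illustration and does not come from the lecture, and the function names are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical market-basket data (not from the slides).
transactions = [
    {"Milk", "Coke"},
    {"Diaper", "Milk", "Beer"},
    {"Milk", "Bread"},
    {"Diaper", "Milk", "Beer", "Coke"},
    {"Bread", "Coke"},
]

if __name__ == "__main__":
    print(confidence(transactions, {"Milk"}, {"Coke"}))
    print(confidence(transactions, {"Diaper", "Milk"}, {"Beer"}))
```

A rule is normally reported only when both its support and its confidence exceed thresholds chosen by the analyst.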

  • 2

    Data Mining Software

    • Enterprise-level (US $10,000 and more): Fair Isaac, IBM, Insightful, KXEN, Oracle, SAS, and SPSS

    • Department-level (from $1,000 to $9,999): Angoss, CART/MARS/TreeNet/Random Forests, Equbits, GhostMiner, Gornik, Mineset, MATLAB, Megaputer, Microsoft SQL Server, Statsoft Statistica, ThinkAnalytics

    • Personal-level (from $1 to $999): Excel, See5, MATLAB

    • Free: C4.5, R, Weka, Xelopes

    Polling results

    Data Mining: WEKA

    Outline

    • Data preparation

    • Preprocessing and ARFF files

    • Filters, classifiers, and visualization

    • Attribute selection

    • Training and testing

    • Quality measurements

    • Interpretation of results

  • 3

    Data Mining Procedures

    • Prepare the data in the required format

    • Preprocess the data if necessary

    • Select algorithms based on the application or domain expertise

    • Evaluate the results and repeat the experiments if necessary
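
The evaluation step in the procedure above typically starts from a confusion matrix computed on held-out test data. The following is a minimal sketch of that computation for a two-class problem; the function names, the choice of "yes" as the positive class, and the label lists are assumptions made for illustration.

```python
from collections import Counter


def confusion(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def metrics(c):
    """Return (accuracy, precision, recall) from confusion counts."""
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical test-set labels and classifier outputs.
    actual = ["yes", "yes", "no", "no", "yes", "no"]
    predicted = ["yes", "no", "no", "yes", "yes", "no"]
    print(metrics(confusion(actual, predicted)))
```

If the measures are unsatisfactory, the procedure loops back: adjust the preprocessing or the algorithm and evaluate again on data the model has not seen.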

    Data Sets

    ARFF file

    @relation weather
    @attribute No real
    @attribute outlook {sunny,overcast,rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE,FALSE}
    @attribute play {yes,no}

  • 4

    ARFF file

    WEKA

    WEKA Explorer

  • 5

    WEKA

    Parameter Distribution

    Filters

  • 6

    Filters

  • 7

    Filters

    Classifiers

  • 8

    Classifiers

    Decision Tree

  • 9

    Decision Tree

  • 10

    Classifier: PART

  • 11

    Neural Networks

    Association Rules

  • 12

    Association Rules

    Visualization

  • 13

    Clustering

    Test Data Analysis

  • 14

    Test Data Analysis

  • 15

    Test Data Analysis

  • 16

    UCI repository

    • http://archive.ics.uci.edu/ml/index.html