Introduction to Data Mining

Group Members: Karim C. El-Khazen, Pascal Suria, Lin Gui, Philsou Lee, Xiaoting Niu



- Definition
- General Concept
  - Foundations
  - Evolution
  - Applications
  - Challenges
- Algorithms
  - Classical
  - Next Generations


What is Data Mining?

Data mining is the process for the non-trivial extraction of implicit, previously unknown and potentially useful information from data stored in repositories using pattern recognition technologies as well as statistical and mathematical methods.


Foundations

- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms


Evolution

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), SQL, ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | OLAP, multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups | Prospective, proactive information delivery |


Applications

- Industry
  - Retail
  - Health maintenance group
  - Telecommunications
  - Credit card
- Web mining
- Sports and entertainment solutions


Challenges

- Ability to handle different types of data
- Graceful degradation of data mining algorithms
- Valuable data mining results
- Representation of data mining requests and results
- Mining at different abstraction levels
- Mining information from different sources of data
- Protection of privacy and data security


Hierarchy of Choices and Decisions

- Business goal
- Collecting, cleaning and preparing data
- Prediction
- Model type and algorithms


Data Description

Descriptions of data characteristics in elementary and aggregated form:

- Summarization
- Visualization
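As a rough illustration of summarization in elementary and aggregated form, the sketch below computes a few summary statistics over all records and then per group. The records, the region names, and the helper names `summarize` and `summarize_by_group` are invented for this example and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (region, unit_sales). All values are invented.
records = [
    ("Boston", 120), ("Boston", 135), ("Boston", 150),
    ("Hartford", 80), ("Hartford", 95),
]


def summarize(values):
    """Return elementary summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


def summarize_by_group(rows):
    """Aggregate values per group key, then summarize each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: summarize(vals) for key, vals in groups.items()}


overall = summarize([value for _, value in records])
by_region = summarize_by_group(records)
```

Here `overall` describes the whole data set, while `by_region` gives the same statistics at an aggregated level, one entry per region.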


Predictive Data Mining

Predictive modeling is a term used to describe the process of mathematically or mentally representing a phenomenon or occurrence with a series of equations or relationships.


Prediction: Classification

Classification predicts class membership:

- Pre-classify (using classification algorithms)
- Test to determine the quality of the model
- Predict (using an effective classifier)
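The three steps can be sketched with a deliberately simple classifier. This example uses a nearest-centroid classifier, which is an assumption chosen for brevity and is not named in the slides; the data and the function names are likewise invented.

```python
def train_centroids(samples):
    """Pre-classify: learn one centroid (mean point) per class from labeled samples."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_class(centroids, features):
    """Predict: assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(centroids, samples):
    """Test: fraction of held-out samples that the model labels correctly."""
    hits = sum(predict_class(centroids, f) == label for f, label in samples)
    return hits / len(samples)


# Invented data: (income, debt) in thousands of dollars -> credit class.
train = [((60, 5), "good"), ((75, 10), "good"), ((80, 8), "good"),
         ((25, 20), "bad"), ((30, 30), "bad"), ((20, 25), "bad")]
test = [((70, 6), "good"), ((28, 22), "bad")]

model = train_centroids(train)          # step 1: pre-classify
test_accuracy = accuracy(model, test)   # step 2: test the model
new_label = predict_class(model, (65, 12))  # step 3: predict a new case
```

Keeping the test samples out of the training set is what makes the accuracy figure a fair estimate of model quality.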


Prediction: Regression

Regression takes a numerical dataset and develops a mathematical formula that fits the data. 

When you're ready to use the results to predict future behavior, you simply take your new data, plug it into the developed formula and you get a prediction! 
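A minimal sketch of this idea, assuming a single predictor and an ordinary least-squares straight line (the slides do not say which kind of regression is meant). The monthly figures and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(slope, intercept, x):
    """Plug a new x into the fitted formula."""
    return slope * x + intercept


# Invented data: unit sales for months 1..5 (exactly 100 + 10 * month).
months = [1, 2, 3, 4, 5]
unit_sales = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, unit_sales)
forecast = predict_line(slope, intercept, 6)  # prediction for month 6
```

Real data will not lie exactly on a line, so the fitted formula is only as good as the fit; checking the residuals before trusting a forecast is a sensible habit.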


Algorithms

- Classical Techniques
  - Statistics
  - Neighborhoods
  - Clustering
- Next Generations
  - Decision Tree
  - Neural Network
  - Rule Induction


Statistics

Classical Statistics:

- Related to the collection and description of data
- Belief: there exists an underlying pattern in the data distribution
- Objective: find the best guess

Data Mining:

- Employs statistical methods
- Needs to analyze huge amounts of data
- Goes beyond traditional statistics


Neighborhoods

Basic idea:

- For a new problem, look for similar problems (neighborhoods) that have already been solved

Key point: find the neighborhood

- Calculate the distance: how close must a case be to count as a neighbor?
- Which class does the new problem belong to?

Large computational load:

- New calculation for each new case
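The neighborhood idea is commonly implemented as k-nearest-neighbor classification. The sketch below assumes Euclidean distance and a majority vote among the `k` closest solved cases; the data, the choice `k=3`, and the function name are illustrative assumptions.

```python
import math
from collections import Counter


def knn_classify(solved_cases, new_case, k=3):
    """Classify new_case by majority vote among its k nearest solved cases.

    Every call measures the distance to all stored cases, which is the
    per-case computational load mentioned above.
    """
    by_distance = sorted(
        solved_cases,
        key=lambda case: math.dist(case[0], new_case),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented solved cases: (features, class).
solved = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

label_near_a = knn_classify(solved, (1.1, 1.0))
label_near_b = knn_classify(solved, (5.0, 4.8))
```

The value of `k` answers the "how close is close enough" question indirectly: instead of a fixed distance cutoff, the `k` closest cases are treated as the neighborhood.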


Clustering

- Elements are grouped together according to different characteristics
- Every cluster shares the same values (homogeneous)

Problem: controlling the number of clusters

- Hierarchical clustering: flexibility
- Non-hierarchical clustering: given by user

Used most frequently for:

- Consolidating data into a high-level view
- Grouping records into likely behaviors
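For the non-hierarchical case, where the user supplies the number of clusters, k-means is a common choice. The sketch below is a bare-bones version under simplifying assumptions: the first `k` points seed the centroids (real implementations usually randomize this), and the points themselves are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Assumption: seed centroids with the first k points to keep the run deterministic.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Invented 2-D points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, assignment = kmeans(points, k=2)
```

The result depends on `k` and on the seeding, which is one concrete form of the "number of clusters" problem noted above.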


Decision Tree

- A way of representing a series of rules that lead to a class or value
- Structure: decision nodes, branches, leaves
- Example: A loan officer wants to determine the creditworthiness of applicants
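To make the structure concrete, here is a small hand-built tree for a loan scenario. The attributes, thresholds, and class labels are hypothetical and are not taken from the slides; the point is only to show decision nodes, branches, and leaves, and how each root-to-leaf path reads as a rule.

```python
# Hypothetical tree: decision nodes hold a test, leaves hold a class label.
tree = {
    "question": "income > 40000",
    "test": lambda a: a["income"] > 40000,
    "yes": {
        "question": "high_debt",
        "test": lambda a: a["high_debt"],
        "yes": {"label": "bad risk"},
        "no": {"label": "good risk"},
    },
    "no": {
        "question": "years_in_job > 5",
        "test": lambda a: a["years_in_job"] > 5,
        "yes": {"label": "good risk"},
        "no": {"label": "bad risk"},
    },
}


def classify(node, applicant):
    """Follow branches from the root until a leaf is reached."""
    while "label" not in node:
        branch = "yes" if node["test"](applicant) else "no"
        node = node[branch]
    return node["label"]


def rules(node, path=()):
    """List every root-to-leaf path as (conditions, class)."""
    if "label" in node:
        return [(path, node["label"])]
    question = node["question"]
    return (rules(node["yes"], path + (question,))
            + rules(node["no"], path + ("not (" + question + ")",)))


applicant = {"income": 50000, "high_debt": False, "years_in_job": 1}
decision = classify(tree, applicant)
all_rules = rules(tree)
```

Calling `rules(tree)` yields one rule per leaf, which is the sense in which a tree represents a series of rules.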


Decision Tree (continued)

Algorithms help to induce the tree and its rules, which are then used to make predictions


Neural Networks

- Efficiently modeling large and complex problems with hundreds of predictor variables
- Structure:
  - Input layer, hidden layer, output layer
  - Activation function between nodes
- Requires training and testing of relations
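The layered structure can be sketched as a forward pass through one hidden layer. This example assumes a sigmoid activation and uses invented weights; it shows only how a prediction is computed, not the training step that would normally set the weights.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Invented weights: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]

score = forward([1.0, 0.5], hidden_w, hidden_b, output_w, output_b)[0]
```

Training would adjust `hidden_w`, `hidden_b`, `output_w`, and `output_b` so that `score` matches known outcomes, and a held-out test set would check how well that generalizes.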


Neural Networks (continued)

Example: [figure not reproduced]


Rule Induction

- A method to derive a set of rules to classify cases
- For example, rule induction can be used to discover patterns relating to decisions (e.g., credit card applications)
- Rules may not cover all possible situations
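One very simple rule-induction scheme is 1R (OneR), which picks the single attribute whose value-to-class rules make the fewest errors on the training data. It is used here only as an illustration and is not named in the slides; the credit card records and attribute names are invented.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, rules, errors) for the best single-attribute rule set."""
    best = None
    for attr in attributes:
        # Count target classes for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per value: predict the majority class seen for that value.
        rule_set = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(row[target] != rule_set[row[attr]] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rule_set, errors)
    return best


def apply_rules(attr, rule_set, case):
    """Return the predicted class, or None when no rule covers the case."""
    return rule_set.get(case[attr])


# Invented credit card applications.
applications = [
    {"employment": "employed", "owns_home": "yes", "decision": "approve"},
    {"employment": "employed", "owns_home": "no", "decision": "approve"},
    {"employment": "unemployed", "owns_home": "yes", "decision": "reject"},
    {"employment": "unemployed", "owns_home": "no", "decision": "reject"},
    {"employment": "employed", "owns_home": "no", "decision": "approve"},
    {"employment": "unemployed", "owns_home": "yes", "decision": "reject"},
]

attr, rule_set, errors = one_r(applications, ["employment", "owns_home"], "decision")
covered = apply_rules(attr, rule_set, {"employment": "employed", "owns_home": "no"})
uncovered = apply_rules(attr, rule_set, {"employment": "student", "owns_home": "no"})
```

The `uncovered` case comes back as `None` because no induced rule mentions the value `"student"`, which is a small instance of rules not covering every situation.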
