University of Toronto, 8/30/2015


University of Toronto 04/19/23

Data Mining

The Art and Science of Obtaining Knowledge from Data

Dr. Saed Sayad


Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities


Explosion of Data

Data in the world doubles every 20 months!

• NASA’s Earth Orbiting System: 46 megabytes of data per second, about 4,000,000,000,000 bytes (4 terabytes) a day
• FBI fingerprint image library: 200,000,000,000,000 bytes (200 terabytes)
• In-line image analysis for particle detection: 1 megabyte per second
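The slide's figures can be checked with some simple arithmetic. This is a minimal sketch. It assumes decimal units (1 megabyte = 10**6 bytes) and a constant 46 MB/s rate over a full day. The function name `growth_factor` is hypothetical. The sketch also converts “doubles every 20 months” into growth multiples.

```python
# Assumptions: decimal units (1 MB = 10**6 bytes) and a constant 46 MB/s rate all day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

nasa_rate = 46 * 10**6                      # bytes per second
nasa_per_day = nasa_rate * SECONDS_PER_DAY  # bytes per day
print(f"{nasa_per_day:,} bytes per day")    # 3,974,400,000,000, i.e. about 4 * 10**12


def growth_factor(months, doubling_months=20):
    """Growth multiple after `months` if data doubles every `doubling_months` months."""
    return 2 ** (months / doubling_months)


print(round(growth_factor(12), 2))  # about 1.52x per year
print(growth_factor(60))            # 8.0x after five years
```

At a constant 46 MB/s, the system produces about 3.97 × 10^12 bytes per day. This agrees with the slide's figure of roughly 4,000,000,000,000 bytes.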

Explosion of Data (cont.)

[Figures not reproduced.]

What do we need?

Fast, accurate, and scalable data analysis techniques to extract useful knowledge.

The answer is Data Mining.

What is Data Mining?

“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”

Data → Data Mining → Knowledge

[Figure: Data Mining shown with its related fields: AI and Machine Learning, Statistics, Database, Data Analysis, and Data Warehouse/OLAP.]

Data Mining
  • Data Analysis: Statistics, Machine Learning
  • Database: Data Warehouse, OLAP

Database

                Text Files     Relational Database               Multi-dimensional Database
Entities        File           Table                             Cube
Attributes      Row and Col    Record, Field, Index              Dimension, Level, Measurement
Methods         Read, Write    Select, Insert, Update, Delete    Drill down, Drill up, Drill through
Language        -              SQL                               MDX
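The sketch below makes the relational Methods row concrete. It is minimal and uses Python's built-in `sqlite3` module on a hypothetical `sales` table. SQLite is relational and has no MDX or cube engine. Drill up and drill down are therefore only approximated here, by aggregating with `GROUP BY` at a coarser level (year) or a finer level (year, month).

```python
import sqlite3

# In-memory relational database; the table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (year INTEGER, month INTEGER, region TEXT, amount REAL)")

# Insert
rows = [(2014, 1, "East", 100.0), (2014, 2, "East", 150.0),
        (2014, 1, "West", 80.0), (2015, 1, "East", 120.0)]
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Update
cur.execute("UPDATE sales SET amount = 90.0 WHERE year = 2014 AND month = 1 AND region = 'West'")

# Delete
cur.execute("DELETE FROM sales WHERE year = 2015")

# Select
remaining = cur.execute("SELECT * FROM sales ORDER BY year, month, region").fetchall()

# "Drill up" analogue: aggregate to the coarser year level.
by_year = cur.execute("SELECT year, SUM(amount) FROM sales GROUP BY year").fetchall()

# "Drill down" analogue: aggregate at the finer (year, month) level.
by_month = cur.execute(
    "SELECT year, month, SUM(amount) FROM sales GROUP BY year, month ORDER BY year, month"
).fetchall()

print(remaining)
print(by_year)   # [(2014, 340.0)]
print(by_month)  # [(2014, 1, 190.0), (2014, 2, 150.0)]
conn.close()
```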

Data Analysis

• Classification
• Regression
• Clustering
• Association
• Sequence Analysis

Data Analysis

[Figure: input variables (attributes) X1, X2, … feed a model (weights W1, W2, …) that predicts output variables (targets) Y1, Y2, ….]

• Input variables can be numeric (age, income, …) or categorical (gender, occupation, …).
• Example models: linear models or decision trees.
• Numeric target: Regression (0,1). Categorical target: Classification (good, bad).
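Below is a minimal sketch of the linear-model case. It assumes two hypothetical numeric inputs (X1 = age, X2 = income in thousands) and a numeric target, which makes this a regression problem. The weights W1 and W2 are fitted by ordinary least squares with NumPy's `np.linalg.lstsq`. The data are invented and noiseless, so the recovered weights are easy to check.

```python
import numpy as np

# Hypothetical records: columns are X1 (age) and X2 (income, thousands).
X = np.array([[25, 40], [35, 60], [45, 80], [55, 70], [30, 50]], dtype=float)
true_w = np.array([0.5, 2.0])
y = X @ true_w  # noiseless numeric target Y1

# Fit Y1 ≈ W1*X1 + W2*X2 by least squares (no intercept, matching W1, W2 in the figure).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # approximately [0.5, 2.0]

# Predict the target for a new hypothetical record.
new_record = np.array([40, 65], dtype=float)
prediction = new_record @ w
print(prediction)  # approximately 150.0
```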

Data Analysis (cont.)

Clustering

[Figure: records plotted by Age and Income.]
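Clustering groups records without using a target variable. The sketch below is a plain implementation of Lloyd's k-means algorithm on hypothetical (age, income) records. It uses a deterministic farthest-first starting point. k-means is one common clustering method; the slide's figure did not necessarily use it. In practice, age and income would usually be rescaled first, because they are measured on different scales.

```python
import numpy as np


def farthest_first_init(points, k):
    """Deterministic start: the first point, then repeatedly the point farthest from the chosen centroids."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist.argmax()])
    return np.array(centroids)


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then move each centroid to its cluster mean."""
    centroids = farthest_first_init(points, k)
    for _ in range(n_iter):
        # Distances from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (age, income in thousands) records.
records = np.array([[22, 25], [25, 30], [23, 28], [50, 90], [55, 95], [52, 100]], dtype=float)
labels, centers = kmeans(records, k=2)
print(labels)   # [0 0 0 1 1 1]
print(centers)
```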

Association

Transactions:
1: chips, coke, chocolate
2: gum, chips
3: chips, coke
4: …

Probability (chips, coke)?
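One reading of “Probability (chips, coke)” is the support of the itemset {chips, coke}: the fraction of transactions that contain both items. The conditional version is the confidence of a rule such as chips → coke. This sketch computes both from the three transactions shown. Transaction 4 is elided on the slide, so it is left out rather than guessed. The function names are hypothetical.

```python
# The three transactions listed on the slide (transaction 4 is elided and omitted).
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"chips", "coke"}, transactions))       # 2 of 3 transactions
print(confidence({"chips"}, {"coke"}, transactions))  # 2 of the 3 chips transactions
print(confidence({"coke"}, {"chips"}, transactions))  # 1.0: every coke transaction has chips
```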

Sequence Analysis

…ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…

X(t-1) → X(t)
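A simple way to model the X(t-1) → X(t) idea is a first-order Markov chain, which estimates P(X(t) | X(t-1)) by counting adjacent pairs. This sketch uses only the DNA fragment shown on the slide; the surrounding “…” is omitted. It is an illustration, not the method used in the talk. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Only the fragment shown on the slide; the leading and trailing "…" are left out.
seq = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"


def transition_probs(s):
    """Estimate P(next symbol | previous symbol) from counts of adjacent pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(s, s[1:]):
        counts[prev][cur] += 1
    return {prev: {cur: n / sum(c.values()) for cur, n in c.items()}
            for prev, c in counts.items()}


probs = transition_probs(seq)


def predict_next(prev):
    """Most probable next symbol given the previous one."""
    return max(probs[prev], key=probs[prev].get)


print(probs["A"])
print(predict_next("A"))
```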


Data Mining in Research Life Cycle

[Figure: research life cycle with the stages Questions/Needs, Search, Research, Experiment, Modeling, and Report, supported by Library, Data, Database, and Data Analysis.]


Data Mining – Modeling Steps

1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment


Examples of data mining in science & engineering

1. Data mining in Biomedical Engineering

“Robotic Arm Control Using Data Mining Techniques”

2. Data mining in Chemical Engineering

“Data Mining for In-line Image Monitoring of Extrusion Processing”

1. Problem Definition

“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”

Arm movements: Supination, Pronation, Flexion, Extension

Muscle contraction (H = high, L = low):

Movement      Biceps   Triceps
Supination    H        H
Pronation     L        L
Flexion       H        L
Extension     L        H
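The contraction table can be read as a lookup from (biceps level, triceps level) to a movement. This sketch assumes the signals are normalised to [0, 1] and split into H and L at a hypothetical threshold of 0.5. A real EMG system would need a learned or calibrated decision rule, which is what the modeling step addresses.

```python
# Contraction pattern for each movement, taken from the table above (H = high, L = low).
PATTERNS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold=0.5):
    """Hypothetical rule: a normalised signal at or above the threshold counts as high (H)."""
    return "H" if signal >= threshold else "L"


def movement(biceps, triceps, threshold=0.5):
    """Map a (biceps, triceps) signal pair to a movement via the contraction table."""
    return PATTERNS[(level(biceps, threshold), level(triceps, threshold))]


print(movement(0.8, 0.2))  # Flexion
```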


2. Data Preparation

The dataset includes 80 records.

There are two input variables: the biceps signal and the triceps signal.

There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.

3. Exploration

[Figure: scatter plot of the triceps signal against record number for the Flexion, Extension, Supination, and Pronation classes.]

3. Exploration (cont.)

[Figure: scatter plot of the biceps signal against record number for the Flexion, Extension, Supination, and Pronation classes.]

4. Modeling

Classification:

• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
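Below is a minimal k-nearest-neighbors classifier on the two inputs (biceps, triceps), as an example of one method from the list. The training points are hypothetical, not the study's 80 records. The slides report that a neural network was the model actually deployed.

```python
import math
from collections import Counter

# Hypothetical training records: ((biceps, triceps), movement), signals normalised to [0, 1].
train = [
    ((0.90, 0.10), "Flexion"), ((0.85, 0.20), "Flexion"), ((0.95, 0.15), "Flexion"),
    ((0.10, 0.90), "Extension"), ((0.20, 0.85), "Extension"), ((0.15, 0.95), "Extension"),
    ((0.90, 0.90), "Supination"), ((0.85, 0.80), "Supination"), ((0.95, 0.85), "Supination"),
    ((0.10, 0.10), "Pronation"), ((0.20, 0.15), "Pronation"), ((0.15, 0.20), "Pronation"),
]


def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(train, (0.80, 0.25)))  # Flexion
```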


6. Model Deployment

A neural network model was successfully implemented inside the robotic arm.

Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Plastics Extrusion

[Figure: plastic pellets → plastic melt.]

Film Extrusion

[Figure: extruder producing plastic film, with a film defect due to a particle contaminant.]

In-Line Monitoring

[Figure: transition piece with window ports.]

In-Line Monitoring

[Figure: imaging setup with light source, extruder and interface, optical assembly, and imaging computer.]


Melt Without Contaminant Particles (WO)


Melt With Contaminant Particles (WP)


1. Problem Definition

Classify images into those with particles (WP) and those without particles (WO).

[Example images: WO and WP.]


2. Data Preparation

2000 images

54 input variables, all numeric

One output variable with two possible values: With Particle (WP) and Without Particle (WO)

2. Data Preparation (cont.)

Images were pre-processed to remove noise.

Dataset 1 (sharp images): 1350 images, including 1257 without particles and 91 with particles

Dataset 2 (sharp and blurry images): 2000 images, including 1909 without particles or with blurry particles, and 91 with particles

54 input variables, all numeric

One output variable with two possible values (WP and WO)
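The classes are strongly imbalanced. This short calculation uses the counts stated above to give the accuracy of a trivial classifier that always predicts WO on Dataset 2 in the two-class setting. It serves as a reference point for the accuracies reported under Evaluation.

```python
# Dataset 2 class counts as stated on the slide.
n_without, n_with = 1909, 91
total = n_without + n_with

# Accuracy (%) of always predicting the majority class, WO.
baseline = 100.0 * n_without / total
print(total, round(baseline, 2))  # 2000 95.45
```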


3. Exploration

Demo!

4. Modeling

Classification:

• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian

5. Evaluation

Classification accuracy (%), estimated with 10-fold cross-validation:

Dataset                 Attributes   Classes   OneR   C4.5   3-NN   Naïve Bayes
Sharp images            54           2         99.9   99.8   99.8   95.8
Sharp + blurry images   54           2         98.5   97.8   97.8   93.3
Sharp + blurry images   54           3         87     87     84     79
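In 10-fold cross-validation, each record is predicted once by a model trained on the other nine folds. Below is a minimal sketch of that procedure. It uses a hypothetical `fit_predict` interface with a majority-class baseline as the model. For simplicity, the folds are contiguous and unshuffled. In practice, data are usually shuffled first, and the folds are often stratified by class.

```python
from collections import Counter


def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, fit_predict, k=10):
    """Percentage of records classified correctly when each fold is predicted from the other k-1 folds."""
    correct = 0
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return 100.0 * correct / len(X)


def majority_fit_predict(X_train, y_train, X_test):
    """Baseline model: always predict the most frequent training label."""
    top = Counter(y_train).most_common(1)[0][0]
    return [top] * len(X_test)


# Hypothetical data: 18 WO records and 2 WP records, one feature each.
X = list(range(20))
y = ["WO"] * 18 + ["WP"] * 2
print(cross_val_accuracy(X, y, majority_fit_predict))  # 90.0
```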

If pixel_density_max < 142 then WP
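A rule of this form can be found in two steps: for each attribute, search for the threshold that gives the fewest training errors, then keep the best single attribute. This is a simplified, threshold-only learner in the spirit of OneR; the full OneR method discretizes each attribute into several buckets. The attribute values are hypothetical, including the second attribute `mean_intensity`. As a result, the threshold found here differs from the slide's 142.

```python
def best_threshold(values, labels, positive="WP"):
    """Threshold t minimising training errors for the rule 'if value < t then positive'."""
    best_t, best_errors = None, len(values) + 1
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate thresholds: midpoints between adjacent values
        errors = sum((v < t) != (lab == positive) for v, lab in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


def one_rule(attributes, labels, positive="WP"):
    """Pick the single attribute whose best threshold rule makes the fewest errors."""
    results = {name: best_threshold(vals, labels, positive) for name, vals in attributes.items()}
    name = min(results, key=lambda n: results[n][1])
    t, errors = results[name]
    return name, t, errors


# Hypothetical image features and labels.
labels = ["WP", "WP", "WP", "WO", "WO", "WO", "WO"]
attributes = {
    "pixel_density_max": [120, 130, 138, 150, 160, 170, 155],
    "mean_intensity": [100, 200, 150, 120, 180, 110, 190],
}
name, t, errors = one_rule(attributes, labels)
print(f"If {name} < {t} then WP  ({errors} training errors)")
```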

6. Model Deployment

A Visual Basic program will be developed to implement the model.


Challenges and Opportunities

• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering sectors.
• Faster, more accurate, and more scalable techniques.
• Incremental, on-line, and real-time learning algorithms.
• Parallel and distributed data processing techniques.
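On-line (incremental) learning updates a model one record at a time instead of retraining on the full dataset. Below is a minimal sketch of stochastic gradient descent for a one-input linear model y ≈ w·x + b with squared-error loss. The data stream, learning rate, and function name are hypothetical.

```python
def sgd_update(w, b, x, y, lr=0.1):
    """One on-line step for y ≈ w*x + b under squared-error loss; returns the updated (w, b)."""
    error = y - (w * x + b)
    return w + lr * error * x, b + lr * error


# Hypothetical noiseless stream generated by y = 2x + 1, replayed for many passes.
stream = [(x, 2 * x + 1) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
w, b = 0.0, 0.0
for _ in range(2000):
    for x, y in stream:
        w, b = sgd_update(w, b, x, y)  # the model changes after every single record

print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Each update uses only the current record, so the same function could consume an unbounded stream without storing it.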

Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.

You can be part of the solution!
