Data Warehousing

Lecture-31: Supervised vs. Unsupervised Learning

Virtual University of Pakistan

Ahsan Abdullah
Assoc. Prof. & Head, Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com

Data Structures in Data Mining

• Data matrix
  – Table or database
  – n records and m attributes
  – n >> m

    C1,1  C1,2  C1,3  ...  C1,m
    C2,1  C2,2  C2,3  ...  C2,m
    C3,1  C3,2  C3,3  ...  C3,m
    ...   ...   ...   ...  ...
    Cn,1  Cn,2  Cn,3  ...  Cn,m

• Similarity matrix
  – Symmetric square matrix
  – n x n or m x m

    1     S1,2  S1,3  ...  S1,n
    S2,1  1     S2,3  ...  S2,n
    S3,1  S3,2  1     ...  S3,n
    ...   ...   ...   ...  ...
    Sn,1  Sn,2  Sn,3  ...  1
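
To make the relationship between the two structures concrete, here is a minimal sketch that builds an n x n similarity matrix from a small data matrix. The similarity measure 1 / (1 + Euclidean distance) is an assumption chosen only because it gives 1 on the diagonal and a symmetric result; the lecture does not prescribe a measure. The function name and the sample records are hypothetical.

```python
import math

def similarity_matrix(data):
    """Return the n x n matrix of pairwise similarities between records.

    Assumed measure: 1 / (1 + Euclidean distance), so that identical
    records score 1 and the matrix is symmetric.
    """
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sim[i][j] = 1.0 / (1.0 + math.dist(data[i], data[j]))
    return sim

# Hypothetical data matrix: n = 4 records, m = 2 attributes.
data = [[1.0, 2.0], [2.0, 4.0], [9.0, 8.0], [1.0, 2.5]]
sim = similarity_matrix(data)

for row in sim:
    print("  ".join(f"{value:.2f}" for value in row))
```

Applying the same function to the transposed data matrix would give the m x m variant, which compares attributes instead of records.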

Main types of DATA MINING

Supervised: type and number of classes are known in advance
• Bayesian Modeling
• Decision Trees
• Neural Networks
• Etc.

Unsupervised: type and number of classes are NOT known in advance
• One-way Clustering
• Two-way Clustering

Clustering: Min-Max Distance

[Figure: scatter plot with axes Age and Salary (ticks at 20, 40, 60), showing clusters of points and one outlier.]

• Inter-cluster distances are maximized
• Intra-cluster distances are minimized
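
The min-max idea can be checked numerically. The sketch below, under stated assumptions, compares the mean pairwise distance inside each cluster (intra-cluster) with the distance between cluster centroids (inter-cluster). The (age, salary) points, the cluster names, and the use of Euclidean distance and centroid-to-centroid distance are all illustrative assumptions, not values from the lecture.

```python
import math
from itertools import combinations

def mean_intra_distance(cluster):
    """Mean Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

# Hypothetical (age, salary in thousands) points for two clusters.
young = [(22, 20), (25, 24), (28, 22)]
senior = [(52, 60), (55, 66), (58, 63)]

intra = [mean_intra_distance(c) for c in (young, senior)]
inter = math.dist(centroid(young), centroid(senior))

print("intra-cluster distances:", [round(d, 2) for d in intra])
print("inter-cluster distance: ", round(inter, 2))
```

A good clustering keeps the first set of numbers small relative to the second.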

How does clustering work?

One-way clustering example

[Figure: INPUT and OUTPUT panels of a one-way clustering. Black spots are noise; white spots are missing data.]

Data Mining Agriculture data

[Figure: INPUT and clustered OUTPUT panels, with the clusters marked.]

Classification

[Figure: unseen data are fed as inputs to the classifier (model); its output answers "Which class?" together with a confidence level.]

How does classification work?

Classification Process (1): Model Construction

Training data (observations, measurements, etc.):

    NAME    Time  Items  Gender
    Moin    10    2      M
    Munir   16    3      M
    Meher   15    1      F
    Javed   5     1      M
    Mahin   20    1      F
    Akram   20    4      M

The training data are passed to the classification algorithms, which produce the classifier (model). Here the model captures the relationship between shopping time and items bought:

    IF time/items >= 6 THEN gender = 'F'
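
As a quick check of the model construction step, the sketch below applies the rule to the six training records from the slide. The function name `classify` is hypothetical, and the ELSE branch (returning 'M' when the ratio is below 6) is an assumption: the slide states only the IF branch.

```python
# Training records from the slide: (name, time, items, gender).
training = [
    ("Moin", 10, 2, "M"),
    ("Munir", 16, 3, "M"),
    ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"),
    ("Mahin", 20, 1, "F"),
    ("Akram", 20, 4, "M"),
]

def classify(time, items):
    """Rule from the slide; the 'M' fallback is an assumed ELSE branch."""
    return "F" if time / items >= 6 else "M"

train_correct = 0
for name, time, items, gender in training:
    predicted = classify(time, items)
    train_correct += predicted == gender
    print(f"{name:6s} time/items = {time / items:5.2f} -> {predicted} (actual {gender})")

print(f"{train_correct} of {len(training)} training records match the rule")
```

Under that assumption, the rule agrees with every training record, which is what one would expect of a model constructed from this data.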

Classification Process (2): Use the Model in Prediction

Testing data:

    NAME    Time  Items  Gender
    Tahir   20    1      M
    Younas  11    2      M
    Yasin   3     1      M

Unseen data: (Firdous, Time = 15, Items = 1)

The classifier is applied to the unseen data to answer the question: Gender?
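
The prediction step can be sketched the same way: evaluate the rule on the testing data, then apply it to the unseen record. As before, `classify` is a hypothetical name and its 'M' fallback is an assumed ELSE branch that the slide does not state.

```python
def classify(time, items):
    """Rule from the lecture: IF time/items >= 6 THEN 'F' (assumed ELSE 'M')."""
    return "F" if time / items >= 6 else "M"

# Testing records from the slide: (name, time, items, actual gender).
testing = [
    ("Tahir", 20, 1, "M"),
    ("Younas", 11, 2, "M"),
    ("Yasin", 3, 1, "M"),
]

test_correct = sum(classify(t, i) == gender for _, t, i, gender in testing)
accuracy = test_correct / len(testing)
print(f"test accuracy: {test_correct}/{len(testing)}")

# Unseen record from the slide: (Firdous, Time = 15, Items = 1).
firdous = classify(15, 1)
print("Firdous ->", firdous)
```

With the assumed fallback, Tahir (ratio 20) is predicted 'F' although labeled 'M', so the rule is right on two of the three test records, and Firdous (ratio 15) is predicted 'F'. A model that fits its training data perfectly can still err on test data.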

Clustering vs. Cluster Detection

Clustering vs. Cluster Detection Example

[Figure: two panels, A and B.]

The K-Means Clustering
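
A minimal sketch of the standard K-means procedure (Lloyd's algorithm) follows: assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the centroids stop moving. Euclidean distance, the function signature, the sample points, and the starting centroids are all assumptions made for illustration; they are not taken from the lecture.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids are stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups, with K = 2 and
# deliberately poor starting centroids taken from the same group.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (2, 1)])

for c, members in zip(centroids, clusters):
    print(tuple(round(x, 2) for x in c), sorted(members))
```

Even with both starting centroids in the same group, this run separates the two groups after a few iterations. In general the result depends on the initial centroids, and K must be chosen in advance.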

The K-Means Clustering: Example

[Figure: four scatter plots, panels A, B, C and D, each with both axes running from 0 to 10.]

The K-Means Clustering: Comment
