SUSHIL KULKARNI: CLASSIFICATION IN DATA MINING

Classification algorithms used in Data Mining. This is a lecture given to MSc students.


Classification algorithms like Decision Tree, ID3, Information Theory, Entropy, CART, and Naive Bayesian classification are discussed with examples.


Page 1

SUSHIL KULKARNI

CLASSIFICATION IN DATA MINING

Page 2

Classification

What is classification?

Model Construction

ID3

Information Theory

Naïve Bayesian Classifier

Page 3

CLASSIFICATION PROBLEM

Page 4

CLASSIFICATION PROBLEM

• Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the Classification Problem is to define a mapping f: D → C where each ti is assigned to one class.

• The problem is to create classes that classify data with the help of a given set of data called the training set.

Page 5

CLASSIFICATION EXAMPLES

• Teachers classify students’ grades as A, B, C, D, or F.

• Identify mushrooms as poisonous or edible.

• Identify individuals with credit risks.

Page 6

Why Classification? A motivating application

• Credit approval

o A bank wants to classify its customers based on whether they are expected to pay back their approved loans

o The history of past customers is used to train the classifier

o The classifier provides rules, which identify potentially reliable future customers

Page 7

Why Classification? A motivating application

• Credit approval

o Classification rule:

If age = “31...40” and income = high then credit_rating = excellent

o Future customers:

Suhas: age = 35, income = high -> excellent credit rating

Heena: age = 20, income = medium -> fair credit rating
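The rule above can be sketched as a tiny function; `credit_rating` and its arguments are illustrative names for this lecture example, not a real API.

```python
# A minimal sketch of the slide's classification rule; the function name and
# argument names are illustrative assumptions.
def credit_rating(age, income):
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    return "fair"

print(credit_rating(35, "high"))    # Suhas -> excellent
print(credit_rating(20, "medium"))  # Heena -> fair
```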

Page 8

Classification: A Two-Step Process

Model construction: describing a set of predetermined classes (Excellent and Fair) using the training set.

The model is represented using classification rules.

Page 9

Supervised Learning

Supervised learning (classification)

o Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations

o New data is classified based on the training set

Page 10

Classification Process (1): Model Construction

Training Data:

NAME    RANK            YEARS  TEACH
Henna   Assistant Prof  3      no
Leena   Assistant Prof  7      yes
Meena   Professor       2      yes
Dinesh  Associate Prof  7      yes
Dinu    Assistant Prof  6      no
Amar    Associate Prof  3      no

Classification Algorithms produce the Classifier (Model):

IF rank = ‘professor’ OR years > 6 THEN teach = ‘yes’

Page 11

Classification Process (2): Use the Model in Prediction

The Classifier is applied to Testing Data:

NAME    RANK            YEARS  TEACH
Swati   Assistant Prof  2      no
Malika  Associate Prof  7      no
Tina    Professor       5      yes
June    Assistant Prof  7      yes

Unseen Data

(Dina, Professor, 4)

Teach?
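As a sketch, the learned model can be applied to the unseen tuple directly; `teaches` is a hypothetical name for the rule IF rank = ‘professor’ OR years > 6 THEN teach = ‘yes’.

```python
# Hypothetical sketch: apply the learned rule to an unseen tuple.
def teaches(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

print(teaches("Professor", 4))  # Dina -> yes
```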

Page 12

Model Construction: Example

Sr.  Gender  Age  BP      Drug
1    M       20   Normal  A
2    F       73   Normal  B
3    M       37   High    A
4    M       33   Low     B
5    F       48   High    A
6    M       29   Normal  A
7    F       52   Normal  B
8    M       42   Low     B
9    M       61   Normal  B
10   F       30   Normal  A
11   F       26   Low     B
12   M       54   High    A

Page 13

Model Construction: Example

Blood Pressure?
  High   -> Drug A
  Low    -> Drug B
  Normal -> Age?
              <= 40 -> Drug A
              > 40  -> Drug B

(Directed Tree)

Page 14

Model Construction: Example

The tree summarizes the following:

o If BP = High, prescribe Drug A

o If BP = Low, prescribe Drug B

o If BP = Normal and age <= 40, prescribe Drug A; else prescribe Drug B

Two classes, ‘Drug A’ and ‘Drug B’, are created.
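The directed tree can be read as nested if/else; this sketch checks it against the twelve training rows, confirming there is no training error.

```python
# The drug tree as nested if/else, checked against the training table above.
def prescribe(age, bp):
    if bp == "High":
        return "A"
    if bp == "Low":
        return "B"
    return "A" if age <= 40 else "B"  # bp == "Normal"

rows = [(20, "Normal", "A"), (73, "Normal", "B"), (37, "High", "A"),
        (33, "Low", "B"), (48, "High", "A"), (29, "Normal", "A"),
        (52, "Normal", "B"), (42, "Low", "B"), (61, "Normal", "B"),
        (30, "Normal", "A"), (26, "Low", "B"), (54, "High", "A")]
print(all(prescribe(age, bp) == drug for age, bp, drug in rows))  # True: no training error
```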

Page 15

Model Construction: Example

The tree is constructed with the training data and there is no training error: all the rules we made are 100% correct according to the training data.

In practical field data, it is unlikely that we get rules with 100% accuracy and high support.

Page 16

Model Construction: Example

Accuracy and Support:

o Accuracy is 100% for the given rules.

o If BP = High, prescribe Drug A (Support = 3/12)

o If BP = Low, prescribe Drug B (Support = 3/12)

o If BP = Normal and age <= 40, prescribe Drug A; else prescribe Drug B (Support = 3/12)

Page 17

Error and Support

Let t = no. of data points, r = no. of data points in a class or node, max = maximum no. of data points in a class or node, min = minimum no. of data points in a class or node.

o Accuracy = max / r

o Error = min / r

o Support = max / t

Accuracy and Error are calculated for classes; support for a class is calculated with respect to the total number of data points in the given set.
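A sketch of the three measures as functions, evaluated on the node counts used on the next slide (node P: 115 A and 5 B out of 180 points); the function names are illustrative.

```python
# Accuracy, Error and Support as defined above; names are illustrative.
def accuracy(max_count, r): return max_count / r
def error(min_count, r):    return min_count / r
def support(max_count, t):  return max_count / t

t = 180                       # total data points
max_count, min_count = 115, 5
r = max_count + min_count     # data points in the node
print(accuracy(max_count, r), error(min_count, r), support(max_count, t))
```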

Page 18

Rules with different accuracy & support

180 data points, split on X:

Node P (X < 60): 115 A, 5 B
  Accuracy = 115/120, Error = 5/120, Support = 115/180

Node Q (X > 60): 58 A, 2 B
  Accuracy = 58/60, Error = 2/60, Support = 58/180

Page 19

Criteria to grow the tree

If the attribute is categorical, then the tree is called a classification tree. [E.g. Drug prescribed]

If the attribute is continuous, then the tree is called a regression tree. [E.g. Income]

Page 20

CLASSIFICATION TREES FOR CATEGORICAL ATTRIBUTES

Page 21

INDUCTION DECISION TREE [ID3]

Decision tree generation consists of two phases:

o Tree construction
  • At the start, all the training examples are at the root
  • Partition the examples recursively based on selected attributes

o Tree pruning
  • Identify and remove branches that reflect noise or outliers

Use of decision tree: classifying an unknown sample

o Test the attribute values of the sample against the decision tree

Page 22

Training Dataset

No  age     income  student  credit_rating  buys_computer
1   <=30    high    no       fair           no
2   <=30    high    no       excellent      no
3   31…40   high    no       fair           yes
4   >40     medium  no       fair           yes
5   >40     low     yes      fair           yes
6   >40     low     yes      excellent      no
7   31…40   low     yes      excellent      yes
8   <=30    medium  no       fair           no
9   <=30    low     yes      fair           yes
10  >40     medium  yes      fair           yes
11  <=30    medium  yes      excellent      yes
12  31…40   medium  no       excellent      yes

This follows an example from Quinlan’s ID3

Page 23

Output: ID3 for “buys_computer”

age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit rating?
              excellent -> no
              fair      -> yes

‘no’ and ‘yes’ are two classes created

Page 24

ANOTHER EXAMPLE: MARKS

• If x >= 90 then grade = A.
• If 80 <= x < 90 then grade = B.
• If 70 <= x < 80 then grade = C.
• If 60 <= x < 70 then grade = D.
• If x < 60 then grade = F.

x
  >= 90 -> A
  < 90  -> x
             >= 80 -> B
             < 80  -> x
                        >= 70 -> C
                        < 70  -> x
                                   >= 60 -> D
                                   < 60  -> F

Page 25

ALGORITHM FOR ID3

Basic algorithm (a greedy algorithm):

o The tree is constructed in a top-down recursive divide-and-conquer manner

o At the start, all the training examples are at the root

o Attributes are categorical

o Samples are partitioned recursively based on selected attributes

o Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Page 26

ALGORITHM FOR ID3

Conditions for stopping partitioning:

o All samples for a given node belong to the same class

o There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf

o There are no samples left

Page 27

ID3: ADVANTAGES

Easy to understand.

Easy to generate rules.

Page 28

ID3: DISADVANTAGES

May suffer from overfitting.

Does not easily handle continuous data.

Trees can be quite large – pruning is necessary.

Page 29

INFORMATION THEORY

Page 30

INFORMATION THEORY

Page 31

INFORMATION THEORY

When all the marbles in the bowl are mixed up, little information is given.

When the marbles in the bowl are distributed in different classes, more information is given.

Page 32

ENTROPY

Entropy gives an idea of how to split on an attribute in a tree.

(‘yes’ or ‘no’ in our example.)

Page 33

BUILDING THE TREE

Page 34

Information Gain (ID3)

Select the attribute with the highest information gain.

Assume there are two classes, P and N.

o Let the set S contain p elements of class P and n elements of class N.

o The amount of information needed to decide if an arbitrary object in S belongs to P or N is defined as

I(p, n) = - (p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
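A sketch of I(p, n) in code, using the usual convention that 0 · log2(0) = 0:

```python
# I(p, n): expected information to classify an object into P or N.
from math import log2

def info(p, n):
    total = p + n
    result = 0.0
    for k in (p, n):
        if k:  # 0 * log2(0) is taken as 0
            f = k / total
            result -= f * log2(f)
    return result

print(f"{info(9, 5):.3f}")  # 0.940, as used later for the buys_computer data
```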

Page 35

Information Gain in Decision Tree Induction

Assume that using attribute A, a set S will be partitioned into sets {S1, S2, …, Sv}.

o If Si contains pi elements of P and ni elements of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

E(A) = Σ_{i=1..v} ((pi + ni)/(p + n)) · I(pi, ni)

The encoding information that would be gained by branching on A is

Gain(A) = I(p, n) - E(A)

Page 36

Training Dataset

No  age     income  student  credit_rating  buys_computer
1   <=30    high    no       fair           no
2   <=30    high    no       excellent      no
3   31…40   high    no       fair           yes
4   >40     medium  no       fair           yes
5   >40     low     yes      fair           yes
6   >40     low     yes      excellent      no
7   31…40   low     yes      excellent      yes
8   <=30    medium  no       fair           no
9   <=30    low     yes      fair           yes
10  >40     medium  yes      fair           yes
11  <=30    medium  yes      excellent      yes
12  31…40   medium  no       excellent      yes

This follows an example from Quinlan’s ID3

Page 37

Attribute Selection by Information Gain Computation

o Class P: buys_computer = “yes”

o Class N: buys_computer = “no”

o I(p, n) = I(9, 5) = 0.940

o Compute the entropy for age:

age     pi  ni  I(pi, ni)
<=30    2   3   0.971
31..40  4   0   0
>40     3   2   0.971

Hence

E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.69

Gain(age) = I(p, n) - E(age) = 0.250

Similarly,

Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

AGE GIVES THE MAXIMUM GAIN
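The whole computation can be reproduced in a short script. Note that the training table on the earlier slide lists only 12 rows, while I(9, 5) and the fourteenths assume Quinlan's full 14-row dataset; the last two rows below are added on that assumption. Exact arithmetic gives Gain(age) ≈ 0.247; the slide's 0.250 comes from rounding E(age) to 0.69.

```python
# Recomputes the information gains above on the standard 14-row dataset
# (the slide's 12 rows plus two assumed rows from Quinlan's example).
from math import log2

data = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31...40","high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31...40","low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31...40","medium", "no",  "excellent", "yes"),
    ("31...40","high",   "yes", "fair",      "yes"),  # assumed row 13
    (">40",    "medium", "no",  "excellent", "no"),   # assumed row 14
]

def info(rows):
    p = sum(1 for r in rows if r[-1] == "yes")
    n = len(rows) - p
    return sum(-k / len(rows) * log2(k / len(rows)) for k in (p, n) if k)

def gain(attr):
    partitions = {}
    for r in data:
        partitions.setdefault(r[attr], []).append(r)
    expected = sum(len(s) / len(data) * info(s) for s in partitions.values())
    return info(data) - expected

for name, i in [("age", 0), ("income", 1), ("student", 2), ("credit_rating", 3)]:
    print(f"Gain({name}) = {gain(i):.3f}")
```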

Page 38

Splitting the samples using age

age? = <=30:

income  student  credit_rating  buys_computer
high    no       fair           no
high    no       excellent      no
medium  no       fair           no
low     yes      fair           yes
medium  yes      excellent      yes

age? = 31...40 (all labeled yes):

income  student  credit_rating  buys_computer
high    no       fair           yes
low     yes      excellent      yes
medium  no       excellent      yes
high    yes      fair           yes

age? = >40:

income  student  credit_rating  buys_computer
medium  no       fair           yes
low     yes      fair           yes
low     yes      excellent      no
medium  yes      fair           yes
medium  no       excellent      no

Page 39

Output: ID3 for “buys_computer”

age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit rating?
              excellent -> no
              fair      -> yes

Page 40

CART

Page 41

CART [CLASSIFICATION AND REGRESSION TREE]

The algorithm is similar to ID3 but uses the Gini index, an impurity measure, to select variables.

If the target variable is nominal and has more than two categories, the option of merging target categories into two super-categories may be considered. This process is called twoing.

Page 42

Gini Index (IBM Intelligent Miner)

If a data set T contains examples from n classes, the gini index gini(T) is defined as

gini(T) = 1 - Σ_{j=1..n} (pj)²

where pj is the relative frequency of class j in T.
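A sketch of gini(T) computed from class counts:

```python
# gini(T) = 1 - sum of squared relative class frequencies.
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(round(gini([9, 5]), 3))  # e.g. 9 "yes" vs 5 "no" -> 0.459
print(gini([10, 0]))           # a pure node has gini 0 -> 0.0
```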

Page 43

Extracting Classification Rules from Trees

Represent the knowledge in the form of IF-THEN rules

One rule is created for each path from the root to a leaf

Each attribute-value pair along a path forms a conjunction

Page 44

Extracting Classification Rules from Trees

The leaf node holds the class prediction

Rules are easy for humans to understand

Example

IF age = “<=30” AND student = “no” THEN buys_computer = “no”

IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”

IF age = “31…40” THEN buys_computer = “yes”

IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”

IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”

Page 45

BAYESIAN CLASSIFICATION

Page 46

Classification and Regression

What is classification? What is regression?

Issues regarding classification and regression

Classification by decision tree induction

Bayesian Classification

Other Classification Methods

Regression

Page 47

What is Bayesian Classification?

Bayesian classifiers are statistical classifiers

For each new sample they provide a probability that the sample belongs to a class (for all classes)

Page 48

What is Bayesian Classification?

Example:

o sample John (age=27, income=high, student=no, credit_rating=fair)

o P(John, buys_computer=yes) = 20%

o P(John, buys_computer=no) = 80%

Page 49

Naive Bayesian Classifier Example

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

play tennis?

Page 50

Naive Bayesian Classifier Example

Class N (5 examples):

Outlook  Temperature  Humidity  Windy  Class
sunny    hot          high      false  N
sunny    hot          high      true   N
rain     cool         normal    true   N
sunny    mild         high      false  N
rain     mild         high      true   N

Class P (9 examples):

Outlook   Temperature  Humidity  Windy  Class
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
overcast  cool         normal    true   P
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P

Page 51

Naive Bayesian Classifier Example

Given the training set, we compute the probabilities:

We also have the probabilities P = 9/14 and N = 5/14

Outlook      P    N      Humidity  P    N
sunny        2/9  3/5    high      3/9  4/5
overcast     4/9  0      normal    6/9  1/5
rain         3/9  2/5

Temperature  P    N      Windy     P    N
hot          2/9  2/5    true      3/9  3/5
mild         4/9  2/5    false     6/9  2/5
cool         3/9  1/5
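The table can be recomputed from the 14 weather rows; `cond_prob` is an illustrative helper written for this sketch, not a library API.

```python
# Recomputes the conditional probabilities above from the weather data.
from fractions import Fraction

rows = [  # (outlook, temperature, humidity, windy, class)
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]

def cond_prob(attr, value, cls):
    cls_rows = [r for r in rows if r[-1] == cls]
    return Fraction(sum(1 for r in cls_rows if r[attr] == value), len(cls_rows))

print(cond_prob(0, "sunny", "P"))  # 2/9
print(cond_prob(0, "sunny", "N"))  # 3/5
print(cond_prob(2, "high", "N"))   # 4/5
```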

Page 52

Naive Bayesian Classifier

We use the notation P(A) for the probability of an event A, and P(A|B) for the probability of A conditional on another event B. H is the hypothesis and E is the evidence, i.e. the combination of attribute values; then

p(H|E) = p(E|H) · p(H) / p(E)

Example: Let H be ‘yes’ and E the combination of attribute values for the new day: Outlook = sunny, temp. = cool, humidity = high, windy = false. Call these four pieces E1, E2, E3 and E4; assuming they are independent,

p(H|E) = p(E1|H) · p(E2|H) · p(E3|H) · p(E4|H) · p(H) / p(E)

Page 53

Naive Bayesian Classifier

The denominator can be eliminated; as a final normalizing step we scale the class scores so that their probabilities sum to 1. Thus

p(H|E) ∝ p(E1|H) · p(E2|H) · p(E3|H) · p(E4|H) · p(H)

Page 54

Naive Bayesian Classifier Example

To classify a new day E: outlook = sunny, temperature = cool, humidity = high, windy = false

Prob(P|E) ∝ Prob(P) · Prob(sunny|P) · Prob(cool|P) · Prob(high|P) · Prob(false|P) = 9/14 · 2/9 · 3/9 · 3/9 · 6/9 = 0.01

Prob(N|E) ∝ Prob(N) · Prob(sunny|N) · Prob(cool|N) · Prob(high|N) · Prob(false|N) = 5/14 · 3/5 · 1/5 · 4/5 · 2/5 = 0.013
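The same arithmetic in code. With unrounded products the split is about 43.6% vs 56.4% (the slide's 43% / 57% comes from first rounding the scores to 0.01 and 0.013); either way the class is N.

```python
# Score the new day E = (sunny, cool, high, false) for each class, then
# normalize so the two class probabilities sum to 1.
p_score = 9/14 * 2/9 * 3/9 * 3/9 * 6/9   # Prob(P) * product of Prob(Ei|P)
n_score = 5/14 * 3/5 * 1/5 * 4/5 * 2/5   # Prob(N) * product of Prob(Ei|N)

total = p_score + n_score
print(f"play: {p_score/total:.1%}, don't play: {n_score/total:.1%}")
```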

Page 55

Naive Bayesian Classifier Example

Probability of ‘Playing’ = 0.01 / (0.01 + 0.013) = 43%

Probability of ‘Not Playing’ = 0.013 / (0.01 + 0.013) = 57%

Therefore E takes class label N.

Page 56

Naive Bayesian Classifier Example

Second example: X = <rain, hot, high, false>

P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class N (don’t play).

Page 57

Naive Bayesian Classifier Example

Probability of ‘Playing’ = 0.010582 / (0.010582 + 0.018286) = 37%

Probability of ‘Not Playing’ = 0.018286 / (0.010582 + 0.018286) = 63%

Therefore X takes class label N.

Page 58

REGRESSION

Page 59

What Is Regression?

Regression is similar to classification:

o First, construct a model

o Second, use the model to predict unknown values

The major technique for this is regression analysis:

• Linear and multiple regression

• Non-linear regression

Regression is different from classification:

o Classification predicts categorical class labels

o Regression models continuous-valued functions

Page 60

Predictive Modeling in Databases

Predictive modeling: predict data values or construct generalized linear models based on the database data.

One can only predict value ranges or category distributions.

Determine the major factors which influence the regression:

o Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.

Page 61

Regression Analysis and Log-Linear Models in Regression

Linear regression: Y = α + β X

o Two parameters, α and β, specify the line and are to be estimated using the data at hand.

o Using the least squares criterion on the known values (x1, y1), (x2, y2), ..., (xs, ys):

β = Σ_{i=1..s} (xi - x̄)(yi - ȳ) / Σ_{i=1..s} (xi - x̄)²

α = ȳ - β x̄
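The two estimator formulas can be sketched directly:

```python
# Least-squares estimates for the line Y = alpha + beta * X.
def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    return alpha, beta

alpha, beta = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points on y = 1 + 2x
print(alpha, beta)  # -> 1.0 2.0
```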

Page 62

Regression Analysis and Log-Linear Models in Regression

Multiple regression: Y = b0 + b1 X1 + b2 X2

o Many nonlinear functions can be transformed into the above.

o E.g., Y = b0 + b1 X + b2 X² + b3 X³ with X1 = X, X2 = X², X3 = X³

Log-linear models:

o The multi-way table of joint probabilities is approximated by a product of lower-order tables.

o Probability: p(a, b, c, d) = αab βac χad δbcd

Page 63

T H A N K S !