DATA MINING USING MATLAB CODES

By Ahmad Karawash



Overview

- Network
- Data used
- Create the graph
- Display the graph
- Learning parameters
- Inference
- Conclusion


Network


Data used

The examples use the asia10000.mat file, which contains 10,000 records for the Chest Clinic network.


Create graph

N = 8;
dag = zeros(N,N);
% Node indices
A=1; S=2; T=3; L=4; B=5; E=6; X=7; D=8;
% dag(i,j) = 1 means there is an edge from node i to node j
dag(A,T) = 1;
dag(S,[L B]) = 1;
dag(T,E) = 1;
dag(L,E) = 1;
dag(E,[X D]) = 1;
dag(B,D) = 1;

discrete_nodes = 1:N;
node_sizes = [2 2 2 2 2 2 2 2];   % every node is binary
bnet = mk_bnet(dag,node_sizes,discrete_nodes);
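The adjacency-matrix construction above can be mirrored in a few lines of Python, used here only because it runs without the Bayes Net Toolbox. This is a minimal sketch that assumes the same node order as the slide (0-based instead of 1-based); `build_dag`, `parents`, and `is_topologically_ordered` are hypothetical helper names, not toolbox functions.

```python
# Node order follows the slide: A, S, T, L, B, E, X, D.
NODES = ["A", "S", "T", "L", "B", "E", "X", "D"]
IDX = {name: i for i, name in enumerate(NODES)}

# Directed edges (parent, child), copied from the dag(...) assignments.
EDGES = [("A", "T"), ("S", "L"), ("S", "B"), ("T", "E"),
         ("L", "E"), ("E", "X"), ("E", "D"), ("B", "D")]


def build_dag(edges, n=len(NODES)):
    """Return an n x n 0/1 matrix with dag[i][j] = 1 for an edge i -> j."""
    dag = [[0] * n for _ in range(n)]
    for parent, child in edges:
        dag[IDX[parent]][IDX[child]] = 1
    return dag


def parents(dag, child):
    """List the parent names of a node, in node order."""
    c = IDX[child]
    return [NODES[p] for p in range(len(NODES)) if dag[p][c]]


def is_topologically_ordered(dag):
    """True if every edge goes from a lower index to a higher index."""
    n = len(dag)
    return all(dag[i][j] == 0 for i in range(n) for j in range(i + 1))


DAG = build_dag(EDGES)
print(parents(DAG, "E"))
print(is_topologically_ordered(DAG))
```

Checking that the matrix is strictly upper triangular is a cheap way to confirm that parents are always numbered before their children, which is the ordering the slide's node indices follow.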


Display graph

names = {'VisitToAsia', 'Smoker', 'HasTuberCulosis', 'HasLungCancer', ...
         'HasBronchitis', 'TuberculosisOrCancer', 'PositiveX-Ray', 'Dyspnoea'};

carre_rond = [1 1 1 1 1 1 1 1];   % node-shape flags, one per node
draw_graph(bnet.dag,names,carre_rond);
title('medical domain');


Learning parameters

% Assumption: asia10000.mat holds a single variable containing the 10000 records
s = load('asia10000.mat');
vars = fieldnames(s);
data = s.(vars{1});
% learn_params expects data(i,m) = value of node i in case m
if size(data,1) ~= N
    data = data';
end
nsamples = size(data,2);

% Give every node a tabular CPD, then fit the CPDs to the data
for i = 1:N
    bnet.CPD{i} = tabular_CPD(bnet,i);
end
bnet = learn_params(bnet,data);
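For fully observed data, fitting a tabular CPD amounts to counting how often each child state occurs for each parent configuration. The Python sketch below illustrates that idea on its own; it is not the toolbox implementation. The function name `learn_cpt` and the eight toy records are hypothetical (they are not taken from asia10000.mat), and states follow the slide's convention of 1 = false, 2 = true.

```python
from collections import Counter


def learn_cpt(records, child, parent_names, states=(1, 2)):
    """Estimate P(child | parents) by relative frequency.

    records: list of dicts mapping node name -> observed state.
    Returns {parent_state_tuple: {child_state: probability}}.
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for record in records:
        pa = tuple(record[p] for p in parent_names)
        joint_counts[(pa, record[child])] += 1
        parent_counts[pa] += 1
    return {
        pa: {s: joint_counts[(pa, s)] / total for s in states}
        for pa, total in parent_counts.items()
    }


# Hypothetical toy records: S = Smoker, L = HasLungCancer.
records = [
    {"S": 2, "L": 2}, {"S": 2, "L": 1}, {"S": 2, "L": 1}, {"S": 2, "L": 1},
    {"S": 1, "L": 1}, {"S": 1, "L": 1}, {"S": 1, "L": 1}, {"S": 1, "L": 1},
]

cpt_L = learn_cpt(records, "L", ["S"])
print(cpt_L[(2,)])  # distribution of L among smokers in the toy data
```

With only a handful of records some entries come out as exactly 0 or 1, which is one reason a large sample such as the 10,000 records used in the slides is preferable.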


Load CPT

CPT = cell(1,N);
for i = 1:N
    s = struct(bnet.CPD{i});   % expose the fields of the CPD object
    CPT{i} = s.CPT;
end
celldisp(CPT)

[Figure: learned CPTs displayed for nodes A, S, B, L, T, E, D, X]


Inference (via MATLAB code)

engine = jtree_inf_engine(bnet);
evidence = cell(1,N);
evidence{T} = 1;   % T=1 => has no tuberculosis
evidence{L} = 2;   % L=2 => has lung cancer
evidence{B} = 1;   % B=1 => has no bronchitis
[engine,loglik] = enter_evidence(engine,evidence);
marg = marginal_nodes(engine,E);   % query E, the node named in the output

% Display the result of the inference
fprintf('\nResult of the inference\n');
fprintf('P(E | T=1, L=2, B=1) = [%3.5f %3.5f]\n',marg.T)

Result of the inference: P(E | T, L, B) = [1.0000 0.0000]. One state of E receives probability 1 and the other 0, so E is fully determined by the evidence. This is the expected result, because E is true whenever T or L is true, and it means the network can be used for classification.
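The same query can be checked without the toolbox by brute-force enumeration over all 2^8 assignments, which is exact for a network this small. The Python sketch below is an illustration, not the junction-tree algorithm used above. It assumes the commonly quoted textbook parameters of the Asia network rather than the CPTs learned from asia10000.mat, and the names `bern`, `joint`, and `posterior` are hypothetical.

```python
from itertools import product

NAMES = ["A", "S", "T", "L", "B", "E", "X", "D"]

# Assumed textbook parameters (probability that the node is true).
P_A = 0.01
P_S = 0.5
P_T = {True: 0.05, False: 0.01}   # P(T | A)
P_L = {True: 0.10, False: 0.01}   # P(L | S)
P_B = {True: 0.60, False: 0.30}   # P(B | S)
P_X = {True: 0.98, False: 0.05}   # P(X | E)
P_D = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.8, (False, False): 0.1}   # P(D | E, B)


def bern(p_true, value):
    """Probability of a binary value given P(value is True)."""
    return p_true if value else 1.0 - p_true


def joint(a, s, t, l, b, e, x, d):
    """Joint probability of one full assignment; E is the OR of T and L."""
    p_e = 1.0 if e == (t or l) else 0.0
    return (bern(P_A, a) * bern(P_S, s) * bern(P_T[a], t) * bern(P_L[s], l)
            * bern(P_B[s], b) * p_e * bern(P_X[e], x) * bern(P_D[(e, b)], d))


def posterior(query, evidence):
    """P(query | evidence) by summing the joint over consistent assignments."""
    weights = {False: 0.0, True: 0.0}
    for values in product([False, True], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        weights[world[query]] += joint(*values)
    total = weights[False] + weights[True]
    return {k: w / total for k, w in weights.items()}


# Evidence from the slide: no tuberculosis, lung cancer, no bronchitis.
post = posterior("E", {"T": False, "L": True, "B": False})
print(post)
```

Because E is a deterministic OR of T and L, all the posterior mass lands on a single state of E whatever the other parameters are, which matches the all-or-nothing shape of the result reported above.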


Conclusion

With the learned network we can now compute the probability of any variable given any others: P(anything | anything).


Weka overview

- Data used
- Decision tree (DT)
- Naive Bayes classifier (BNC)
- K-means clustering


Data used

For classification, I use an ARFF file with the Diabetes data.

For clustering, I use the ARFF file bmw-training.arff.


Decision tree build


Decision tree build

Classifying with the decision tree gives about 84% correctly classified instances and about 15% incorrectly classified instances.


Decision tree draw


BNC build


BNC build

Classifying with the naive Bayes classifier (BNC) gives about 76% correctly classified instances and about 23% incorrectly classified instances.


Compare DT & BNC

BNC misclassifies more instances than DT (about 23% versus about 15%).

[Figure panels: BNC, DT]


K-means clustering


K-means clustering

The data is divided into 2 clusters, with 500 iterations. Interpretation of the result will be discussed.
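To make the two settings concrete (2 clusters, up to 500 iterations), here is a minimal k-means sketch in plain Python. It is not Weka's implementation: the function name `kmeans`, the choice of the first k points as initial centroids, and the six toy points are assumptions made for illustration, and the points are not taken from bmw-training.arff.

```python
def kmeans(points, k=2, max_iter=500):
    """Plain k-means with squared Euclidean distance.

    Initial centroids are the first k points (a simplification chosen
    so the example is deterministic). Returns (assignment, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged before reaching max_iter
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
assignment, centroids = kmeans(points, k=2, max_iter=500)
print(assignment)
```

The iteration limit is only an upper bound: on well-separated data like this the assignments stop changing after a couple of passes and the loop exits early.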


By Ahmad Karawash, PhD, Canada.

For more information: [email protected]