ECML/PKDD-2003 Knowledge Discovery Standards Tutorial

Presented by: Sarab Anand (University of Ulster), Marko Grobelnik (Institute Jozef Stefan) and Dietrich Wettschereck (The Robert Gordon University)

Tuesday, 23 September 2003

ECML/PKDD 2003: KD-Standards Tutorial. S. Anand, M. Grobelnik, D. Wettschereck

Tutorial Objectives
- Overview of existing KD-standards
- Motivation for using KD-standards
- How do these standards relate to each other?

The Knowledge Discovery Process
[Figure: the knowledge discovery process, annotated with the standards that cover each part]
- Global view: CRISP-DM
- Data access: SQL interfaces
- Model generation: JDM, SQL/MM, OLE DB DM
- Model representation: PMML

Tutorial Outline
- Introduction
- CRISP-DM
- SQL interfaces for Data Mining
- Break
- Java Data Mining API
- Predictive Model Mark-up Language
- Examples

CRISP-DM: A Standard Process Model for Data Mining

http://www.crisp-dm.org/

What is CRISP-DM?
- Cross-Industry Standard Process for Data Mining
- Aim:
  - To develop an industry, tool and application neutral process for conducting Knowledge Discovery
  - Define tasks, outputs from these tasks, terminology and mining problem type characterization
- Founding Consortium Members: DaimlerChrysler, SPSS and NCR
- CRISP-DM Special Interest Group: > 200 members
  - Management Consultants
  - Data Warehousing and Data Mining Practitioners

Four Levels of Abstraction
- Phases
  - Example: Data Preparation
- Generic Tasks
  - A stable, general and complete set of tasks
  - Example: Data Cleaning
- Specialized Task
  - How is the generic task carried out
  - Example: Missing Value Handling
- Process Instance
  - Example: The mean value for numeric attributes and the most frequent for categorical attributes was used
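
The process-instance example above (mean for numeric attributes, most frequent value for categorical ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: missing values are represented as `None`, and the function name `impute` and the column names are hypothetical, not part of CRISP-DM.

```python
from collections import Counter
from statistics import mean

def impute(rows, numeric, categorical):
    """Fill None values: column mean for numeric, most frequent value for categorical."""
    fill = {}
    for col in numeric:
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = counts.most_common(1)[0][0]
    # Build new rows so the raw data stays untouched (useful for documenting the step).
    return [{c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
            for r in rows]

# Hypothetical sample data with one missing value per column.
rows = [
    {"age": 30, "gender": "f"},
    {"age": None, "gender": "f"},
    {"age": 50, "gender": None},
    {"age": 40, "gender": "m"},
]
clean = impute(rows, numeric=["age"], categorical=["gender"])
```

Returning a new list rather than editing in place keeps the original records available, which fits the CRISP-DM advice to document every preparation step.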

Phases of CRISP-DM
- Not linear, repeatedly backtracking

Business Understanding Phase
- Understand the business objectives
  - What is the status quo?
    - Understand business processes
    - Associated costs/pain
  - Define the success criteria
  - Develop a glossary of terms: speak the language
  - Cost/Benefit Analysis
- Current Systems Assessment
  - Identify the key actors
    - Minimum: The Sponsor and the Key User
  - What forms should the output take?
  - Integration of output with existing technology landscape
  - Understand market norms and standards

Business Understanding Phase
- Task Decomposition
  - Break down the objective into sub-tasks
  - Map sub-tasks to data mining problem definitions
- Identify Constraints
  - Resources
  - Law, e.g. Data Protection
- Build a project plan
  - List assumptions and risk (technical/financial/business/organisational) factors

Data Understanding Phase
- Collect Data
  - What are the data sources?
    - Internal and External Sources (e.g. Axiom, Experian)
    - Document reasons for inclusion/exclusions
    - Depend on a domain expert
    - Accessibility issues
      - Legal and technical
  - Are there issues regarding data distribution across different databases/legacy systems?
    - Where are the disconnects?

Data Understanding Phase II
- Data Description
  - Document data quality issues
  - Requirements for data preparation
  - Compute basic statistics
- Data Exploration
  - Simple univariate data plots/distributions
  - Investigate attribute interactions
- Data Quality Issues
  - Missing Values
    - Understand its source: Missing vs Null values
  - Strange Distributions

Data Preparation Phase
- Integrate Data
  - Joining multiple data tables
  - Summarisation/aggregation of data
- Select Data
  - Attribute subset selection
    - Rationale for Inclusion/Exclusion
  - Data sampling
    - Training/Validation and Test sets
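
The data sampling step (training, validation and test sets) can be sketched as a seeded random split. The 60/20/20 proportions, the function name `split` and the fixed seed are assumptions chosen for illustration; the slides do not prescribe them.

```python
import random

def split(records, train=0.6, validation=0.2, seed=0):
    """Shuffle a copy of the records and cut it into three disjoint parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical data: 100 record identifiers.
train_set, validation_set, test_set = split(range(100))
```

Fixing the seed makes the sample reproducible, so the rationale for the split can be documented and the split re-created later.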

Data Preparation Phase II
- Data Transformation
  - Using functions such as log
  - Factor/Principal Components analysis
  - Normalization/Discretisation/Binarisation
- Clean Data
  - Handling missing values/Outliers
- Data Construction
  - Derived Attributes
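
A few of the transformations listed above can be illustrated with plain Python. This is a minimal sketch: min-max scaling is only one of several possible normalisations, and the bin boundaries, labels and sample values are hypothetical.

```python
import math

def min_max(values):
    """Normalisation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretise(value, boundaries, labels):
    """Discretisation: return the label of the first bin whose upper boundary exceeds value."""
    for upper, label in zip(boundaries, labels):
        if value < upper:
            return label
    return labels[-1]  # value lies above every boundary

def binarise(value, categories):
    """Binarisation: one 0/1 indicator per category."""
    return [1 if value == c else 0 for c in categories]

incomes = [10_000, 20_000, 100_000]
log_incomes = [math.log10(v) for v in incomes]   # log transform of a skewed attribute
scaled = min_max(incomes)
band = discretise(35, boundaries=[18, 65], labels=["minor", "adult", "senior"])
flags = binarise("m", ["m", "f"])
```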

The Modelling Phase
- Selection of the appropriate modelling technique
  - Data pre-processing implications
    - Attribute independence
    - Data types/Normalisation/Distributions
  - Dependent on
    - Data mining problem type
    - Output requirements
- Develop a testing regime
  - Sampling
    - Verify samples have similar characteristics and are representative of the population

The Modelling Phase
- Build Model
  - Choose initial parameter settings
  - Study model behaviour
    - Sensitivity analysis
- Assess the model
  - Beware of over-fitting
  - Investigate the error distribution
    - Identify segments of the state space where the model is less effective
  - Iteratively adjust parameter settings
    - Document reasons for these changes

The Evaluation Phase
- Validate Model
  - Human evaluation of results by domain experts
  - Evaluate usefulness of results from business perspective
    - Define control groups
    - Calculate lift curves
    - Expected Return on Investment
- Review Process
- Determine next steps
  - Potential for deployment
  - Deployment architecture
  - Metrics for success of deployment
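
As an illustration of the "calculate lift curves" step, the sketch below computes cumulative lift: the response rate among the top-scored fraction of cases divided by the overall response rate. The function name, the chosen fractions and the sample scores are hypothetical.

```python
def cumulative_lift(scores, outcomes, fractions=(0.1, 0.2, 0.5, 1.0)):
    """Return (fraction, lift) pairs; outcomes are 1 for a response, 0 otherwise."""
    # Rank the outcomes by descending model score.
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    base_rate = sum(ranked) / len(ranked)
    curve = []
    for f in fractions:
        top = ranked[:max(1, round(f * len(ranked)))]
        curve.append((f, (sum(top) / len(top)) / base_rate))
    return curve

# Hypothetical scored campaign: 3 responders among 10 customers.
scores   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]
lift = cumulative_lift(scores, outcomes)
```

A lift above 1 for a small fraction means that contacting only the top-scored customers yields proportionally more responders than contacting customers at random; at a fraction of 1.0 the lift is always 1.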

The Deployment Phase
- Knowledge Deployment is specific to objectives
  - Knowledge Presentation
  - Deployment within Scoring Engines and Integration with the current IT infrastructure
    - Automated pre-processing of live data feeds
    - XML interfaces to 3rd party tools
  - Generation of a report
    - Online/Offline
  - Monitoring and evaluation of effectiveness
- Process deployment/production
- Produce final project report
  - Document everything along the way

Microsoft OLE DB for DM

Extension of Microsoft Analysis Services for Data Mining

What is OLE DB for Data Mining?
- OLE DB for DM is Microsoft's extension of the Analysis Server product for covering DM functionality
  - It is closely connected to MS OLAP Server
  - Works within the SQL Server database suite
- It defines DM at several levels:
  - Extensions of the SQL language for describing DM tasks
  - API in the form of a COM interface for:
    (1) Programming DM clients within applications
    (2) Programming DM providers (server side components) for including new DM algorithms
  - Uses PMML for model description

Architecture of a solution using OLE DB for DM technology
[Figure: architecture diagram; the components below are connected through OLE DB for DM and OLE DB interfaces]
- End-User Application: MS Excel / MS Site Server / MS Commerce Server
- MS Analysis Server
  - Decision Trees Component
  - Clustering Component
- Database Systems: MS SQL Server, MS OLAP Server, Oracle, DB2, …

What are key DM tasks?
- Key DM tasks covered by OLE DB for DM are:
  - Predictive Modeling (Classification)
  - Segmentation (Clustering)
  - Association (Data Summarization)
  - Sequence and Deviation Analysis
  - Dependency Modeling

Defining a domain - Creating Mining Model Object

Using an OLE DB command object, the client executes a CREATE statement that is similar to a CREATE TABLE statement:

    CREATE MINING MODEL [Age Prediction]
    (
        [Customer ID]       LONG   KEY,
        [Gender]            TEXT   DISCRETE,
        [Age]               DOUBLE DISCRETIZED() PREDICT,
        [Product Purchases] TABLE
        (
            [Product Name]  TEXT   KEY,
            [Quantity]      DOUBLE NORMAL CONTINUOUS,
            [Product Type]  TEXT   DISCRETE RELATED TO [Product Name]
        )
    )
    USING [Decision Trees]

Inserting Training Data into Model

In a manner similar to populating an ordinary table, the client uses a form of the INSERT INTO statement. Note the use of the SHAPE statement to create the nested table.

    INSERT INTO [Age Prediction]
    (
        [Customer ID], [Gender], [Age],
        [Product Purchases] (SKIP, [Product Name], [Quantity], [Product Type])
    )
    SHAPE
        {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    )
    AS [Product Purchases]

Using Models to make Predictions

Predictions are made with a SELECT statement that joins the model's set of all possible cases with another set of actual cases.

    SELECT t.[Customer ID], [Age Prediction].[Age]
    FROM [Age Prediction]
    PREDICTION JOIN
    (
        SHAPE
            {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
        APPEND
        (
            {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
            RELATE [Customer ID] TO [CustID]
        )
        AS [Product Purchases]
    ) AS t
    ON  [Age Prediction].Gender = t.Gender
    AND [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name]
    AND [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]

Association Rules

The following statement creates a data mining model to find out those products which sell together, based on an association algorithm. The model is interested only in rules with at least five items:

    Create Mining Model MyAssociationModel
    (
        Transactionid       long key,
        [Product purchases] table predict
        (
            [Product Name]  text key
        )
    )
    Using [My Association Algorithm] (Minimum_size = 5)

Training an association model is exactly the same as training a tree model or a clustering model. To get all the association rules discovered by the algorithm, run the following statement:

    Select * from MyAssociationModel.content

Regression Analysis

By using a regression algorithm, the following mining model predicts loan risk level based on age, income, homeowner, and marital status:

    Create Mining Model MyRegressionModel
    (
        Customerid     long key,
        Age            long continuous,
        Homeowner      boolean discrete,
        Maritalstatus  Boolean discrete,
        LoanriskLEVEL  continuous predict
    )
    Using [My Regression Algorithm]

The following statement returns all the coefficients of the regression:

    Select * from MyRegressionModel.content

Visual Basic example using the OLE DB for DM Clustering component

    Dim ClusterConnection As New ADODB.Connection
    ClusterConnection.Provider = "MSDMine"
    DMMName = "[CollPlanDMM]"
    DataFileName = ".\CollegePlan.mdb"
    ClusterConnection.ConnectionString = "location=localhost;" & _
        "initial catalog=[FoodMart 2000];"
    ClusterConnection.Open
    ' Create the clustering model; the statement is assembled from string fragments
    ClusterConnection.Execute "CREATE MINING MODEL [ClusterModel]" & _
        " ([Student Id] LONG KEY, [College Plans] TEXT DISCRETE PREDICT," & _
        " [Gender] TEXT DISCRETE PREDICT, [Iq] LONG CONTINUOUS PREDICT," & _
        " [Parent Encouragement] TEXT DISCRETE PREDICT, [ParentIncome] LONG CONTINUOUS PREDICT)" & _
        " USING Microsoft_Clustering"
    ' …

XMLA - XML for Analysis

http://xmla.org/

What is XML for Analysis?

XML for Analysis is a set of XML Message Interfaces that use the industry standard SOAP to define the data access interaction between a client application and an analytical data provider (OLAP and Data Mining) working over the Internet.

What are the benefits of XMLA?
- Customers will gain the ability to protect server and tools investments and ensure that new analytical deployments will interoperate and work cooperatively.
- Developers will gain the ability to leverage existing developer skills and to use open access XML-based Web services, eliminating the need to program to multiple APIs and query languages.
- Independent software vendors will be able to reduce complexity and costs for development and maintenance by writing to a single access interface.

History of XMLA
[Figure: timeline spanning 2000, 2001, 2002 and 2003 with month markers Apr, Nov, May, Apr, Apr, Sep and Mar; events shown:]
- Hyperion & Microsoft Announce Co-Sponsorship of XMLA Specification
- SAS Joins Council
- First XMLA Council Meeting (creation of SIG teams)
- Microsoft Releases SDK
- Version 1.0 Released
- Version 1.1 Released
- Version 1.2 (TBD)
- InterOperate Workshop I
- InterOperate Workshop II
- Second XMLA Council Meeting
- 1st Public XMLA InterOperability Demonstration (TDWI)

Example of XMLA SOAP Request

The following is an example of an Execute method call with <Statement> set to an OLAP MDX SELECT statement:
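
The request listing from the slide is not reproduced here. As a rough illustration of the general shape of such a call, the Python sketch below assembles a SOAP envelope around an XMLA Execute element using only the standard library. The MDX statement, data source string and catalog name are hypothetical placeholders, and the property list shows only a small subset of what a provider may accept.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"

def build_execute_request(statement, data_source, catalog):
    """Return a SOAP envelope (as text) wrapping an XMLA Execute call."""
    ET.register_namespace("SOAP-ENV", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    # The statement to run goes into Command/Statement.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement
    # Properties tell the provider where to run it and how to format the result.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    plist = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    for name, value in (("DataSourceInfo", data_source),
                        ("Catalog", catalog),
                        ("Format", "Multidimensional")):
        ET.SubElement(plist, f"{{{XMLA_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")

request = build_execute_request(
    "SELECT [Measures].MEMBERS ON COLUMNS FROM [Sales]",  # hypothetical MDX
    data_source="Provider=MSOLAP;Data Source=local",       # hypothetical
    catalog="FoodMart 2000",
)
```

The resulting text would be sent as the body of an HTTP POST to the provider's XMLA endpoint; transport and authentication are left out of this sketch.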

Example of XMLA SOAP Response

This is the abbreviated response for the preceding method call:

What Provider Vendors Support XMLA?

What Consumer & Consulting Vendors Are/Will Support XMLA?

BREAK

JDM: The Java API for Data Mining

Objective
- To develop a Java API that supports
  - Building of models
  - Scoring of data using models
  - Creation, storage, access and maintenance of data and metadata supporting data mining results
- To provide for data mining systems what JDBC(TM) did for relational databases
  - Implementers of data mining applications can expose a single, standard API understood by a wide variety of client applications and components
  - Data Mining clients can be coded against a single API that is independent of the underlying data mining system/vendor

Approach and Development
- Leverages other related standards
  - PMML (DMG)
  - CWM (OMG)
  - SQL/MM (ISO)
  - JCX (JSR-16)
  - JMI (JSR-40)
  - JOLAP (JSR-69)
  - CRISP-DM
  - OLEDB DM
- Public Draft released in July, 2002
- Currently work is continuing on the final draft

Related Standards
- DMG PMML: Representation of data mining models for inter-vendor exchange (DTD/XML)
- OMG CWM DM: Object model for representing data mining metadata: models, model results (UML/DTD/XML)
- SQL/MM Pt. 6 DM: SQL objects for defining, creating, and applying data mining models, and obtaining their results (SQL)
- OLE DB for DM: SQL-like interface for data mining operations (OLE DB/SQL)
- JSR-073 JDM: Java API for defining, creating, applying, and obtaining the results of data mining models (Java)

The Expert Group
- Mark Hornick, Oracle (Lead)
- BEA Systems
- Computer Associates
- Corporate Intellect
- CalTech
- Fair Isaac
- Hyperion
- IBM
- KXEN
- Quadstone
- SAP
- SAS
- SPSS
- Strategic Analytics
- Sun Microsystems
- University of Ulster

Use Case
- A programmer is tasked with development of a target marketing tool that allows the user to
  - Choose a target campaign
  - E-mail a random sample of the customers
  - Build a model based on the responses
  - Apply the model to improve the targeting of the campaign
- Using JDM (for the 3rd and 4th tasks) the programmer
  - Defines the target data for the modelling using the Physical and Logical Data Classes
  - Uses the Classification Function Settings class to set default parameters for the learning task
  - Creates a build task that generates and persists the model
  - Creates an apply task that applies the model to select the campaign targets
  - Minimises risk associated with a change in the data mining vendor by using the standard JDM interface

How will it work?
- JDM defines a set of interfaces for
  - Defining the data to be used in the mining
    - Physical/Logical Data
  - Defining the data mining parameters
    - Function settings
      - Support for Novice Users
    - Algorithm settings
      - Expert User
      - Algorithm specific settings
  - Performing Tasks
    - Executing a data mining algorithm
    - Importing/Exporting to PMML
    - Testing the knowledge
    - Applying the knowledge on new data
      - Batch and Real-time Scoring
    - Compute Statistics
  - Interrogating the resulting knowledge
  - Persistence of all MetaData/Data

Typical Architecture
[Figure: JDM as the common layer in front of the following components]
- Corporate Warehouse
- MetaData Repository with Proprietary Data Mining Engine 1
- MetaData Repository with Proprietary Data Mining Engine 2
- …

Uses Factory Classes. Hence, Service Provider Classes need not be made public.

Conformance Rules for Service Providers
- "A la carte" approach to functions and algorithms supported
  - Vendors implement functions and algorithms that their products support
- At least one function must be supported
- All core packages must be supported
- All methods within an implemented class must be implemented
  - Semantics specified for each method must be implemented to ensure common interpretation of a given result
- Must support J2EE and/or J2SE
- Extension may be done through subclassing

Data Mining Functions Supported
- Classification
- Regression
- Attribute Importance
- Clustering
- Association Rules

Algorithms Supported
- Naïve Bayes
- Decision Trees
- Feed Forward Neural Networks
- Support Vector Machines
- K-Means

Code Example (1)

    // Get a connection
    ConnectionSpec connSpec =
        (javax.datamining.resource.ConnectionSpec) dmeCFactory.getConnectionSpec();
    connSpec.setName("user1");
    connSpec.setPassword("pswd");
    connSpec.setURI("myDME");
    javax.datamining.resource.Connection dmeConn = dmeCFactory.getConnection(connSpec);

    // Create and populate the PhysicalData object: define the data to be used
    PhysicalDataSetFactory pdsFactory =
        (PhysicalDataSetFactory) dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");
    PhysicalDataSet pd = pdsFactory.create("minivan.data");
    pd.importMetaData();
    dmeConn.saveObject("myPD", pd);

    // Create LogicalData object
    LogicalDataFactory ldFactory =
        (LogicalDataFactory) dmeConn.getFactory("javax.datamining.data.LogicalData");
    LogicalData ld = ldFactory.create(pd);

    // Specify how attributes should be used
    LogicalAttribute income = ld.getAttribute("income");
    income.setAttributeType(AttributeType.numerical);

Code Example (2)

    // Create the FunctionSettings for Classification
    ClassificationSettingsFactory cfsFactory =
        (ClassificationSettingsFactory) dmeConn.getFactory(
            "javax.datamining.supervised.classification.ClassificationSettings");
    ClassificationSettings settings = cfsFactory.create();
    settings.setTargetAttributeName("buyMinivan");
    settings.setCostMatrix(costs); // predefined cost matrix

    // Create the AlgorithmSettings and add it to the FunctionSettings
    NaiveBayesSettingsFactory nbFactory =
        (NaiveBayesSettingsFactory) dmeConn.getFactory(
            "javax.datamining.algorithm.naivebayes.NaiveBayesSettings");
    NaiveBayesSettings nbSettings = nbFactory.create();
    nbSettings.setSingletonThreshold(.01);
    nbSettings.setPairwiseThreshold(.01);

    // Associate LD and AS with the FunctionSettings
    settings.setAlgorithmSettings(nbSettings);
    settings.setLogicalData(ld);
    dmeConn.saveObject("myFS", settings);

Code Example (3)

    // Create the build task
    BuildTaskFactory btFactory =
        (BuildTaskFactory) dmeConn.getFactory("javax.datamining.task.BuildTask");
    BuildTask buildTask = btFactory.create("myPD", "myFS", "myModel");
    VerificationReport report = buildTask.verify();
    if (report != null) { // either error or warning
        ReportType reportType = report.getReportType(); // check if it's just a warning or an error
    } else {
        dmeConn.saveObject("myBuildTask", buildTask);

        // Execute the task and block until finished
        ExecutionHandle handle = dmeConn.execute("myBuildTask");
        handle.waitForCompletion(null); // wait without timeout until done

        // Access the model
        ClassificationModel model =
            (ClassificationModel) dmeConn.getObject("myModel", NamedObject.model);
    }

    // Close the connection
    dmeConn.close();

PMML: The Predictive Model Markup Language

http://www.dmg.org

Predictive Model Mark-up Language (PMML)
- Industry led standard for representing the output of data mining
- Supported by
  - Full Members: IBM, Oracle, Magnify, SPSS, SAS, StatSoft, Microsoft, Corporate Intellect, KXEN, Salford Systems
  - Numerous Associated Members
- Objective
  - Define and share predictive models using an open standard

Rationale
- Complex mosaic of software applications
  - Knowledge generators
    - Data Mining Vendors
    - Different data mining algorithms have different languages for expressing the knowledge discovered
    - Vendor dependent representations for knowledge, e.g. C/C++ routines
  - Knowledge consumers
    - Real-time Scoring / Personalisation engines
    - Marketing Tools
    - Visualisation Tools
- Need for a vendor independent representation of data mining output

PMML
- Benefits
  - Proprietary issues and incompatibilities no longer a barrier to the exchange of models between applications
  - Based on XML
  - Develop models using any generator vendor, deploy the models using any consumer vendor application
- Development
  - Current Release 2.1
  - Supported by most current releases of member vendors' applications

PMML Document
- Basic XML structure
  - DOCTYPE declaration not required
- A PMML document must
  - be a valid XML document
  - obey PMML conformance rules
- Root element <PMML>
  - 6 child elements
    - 2 required: Header, Data Dictionary
    - 4 optional

    <?xml version="1.0" ?>
    <!DOCTYPE PMML PUBLIC "PMML 2.0" "http://www.dmg.org/v2-0/pmml_v2_0.dtd">
    <PMML version="2.0">
      <Header />
      <MiningBuildTask />
      <DataDictionary />
      <TransformationDictionary />
      <SequenceMiningModel />
      <Extension />
    </PMML>
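
The rule that Header and Data Dictionary are the two required children of the root element can be checked mechanically. The following Python sketch uses only the standard library; the function name `missing_required` and the sample document are hypothetical, and the check covers only this one rule, not PMML conformance as a whole.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("Header", "DataDictionary")

def missing_required(pmml_text):
    """Return the required child elements that the <PMML> root lacks."""
    root = ET.fromstring(pmml_text)
    if root.tag != "PMML":
        raise ValueError("root element must be <PMML>")
    present = {child.tag for child in root}
    return [name for name in REQUIRED if name not in present]

# Hypothetical skeleton document without a DOCTYPE declaration.
doc = """<?xml version="1.0" ?>
<PMML version="2.0">
  <Header />
  <DataDictionary />
  <SequenceMiningModel />
</PMML>"""

problems = missing_required(doc)
```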

Header
- Attributes
  - copyright
  - description
- Elements
  - Application (that generated the PMML)
    - Name: Capri
    - Version: 2.0
  - Annotation
    - Free text
  - Timestamp
    - Date/Time of model creation

Header (2)

    <?xml version="1.0" ?>
    <PMML version="1.0">
      <Header copyright="CorporateIntellect" description="Results of CAPRI">
        <Application name="CRAL" version="3.0" />
        <Annotation>This is a PMML document with results from the
          CAPRI run on commodity market data.</Annotation>
        <Timestamp>2003-03-02 1:30:00 GMT +00:00</Timestamp>
      </Header>
      . . .
      . . .
    </PMML>

Mining Build Task
- May contain any XML value describing the configuration of the training run that produced the model
- Information provided in this element is essentially meta-data
  - Not used specifically in the deployment of the model by the PMML consumer
- Specific content structure not defined in PMML

Data Dictionary
- Attributes
  - Number of Fields
    - Aids consistency checks
- Elements
  - DataField
    - Attributes
      - Name
      - displayName
      - type
        - categorical/ordinal/continuous
        - Defines legal operations on the field values
      - Taxonomy
        - Name of taxonomy that defines a hierarchy on the values
      - isCyclic

Data Dictionary (2)
- Elements
  - Value
    - Defines domain for ordinal and categorical attributes
    - value
    - displayValue
    - property: valid/invalid/missing
  - Interval
    - Defines the range of valid values for continuous fields
    - closure: openClosed, closedOpen, openOpen, closedClosed
    - leftMargin
    - rightMargin
  - Taxonomy
    - Defines hierarchies on specific fields within the data dictionary

Data Dictionary (3)
- Attributes
  - name: associates the taxonomy with the appropriate field within the data dictionary (see DataField attribute taxonomy)
- Elements
  - ChildParent
    - Attributes
      - childField: name of field within the table (see Elements below) that represents the child value
      - parentField: name of field within the table (see Elements below) that represents the parent value
      - parentLevelField: name of field within the table (see Elements below) that represents the level in the hierarchy
      - isRecursive: Yes/No, i.e. whether the whole hierarchy is defined in the same table or in an individual table per level
    - Elements
      - Inline Table/Table Locator

DataDictionary complete

    <?xml version="1.0" ?>
    <PMML version="1.0">
      <Header />
      <DataDictionary numberOfFields="3">
        <DataField name="Type" optype="categorical">
          <Value value="BU"/>
          <Value value="…"/>
          <Value value="C"/>
        </DataField>
        <DataField name="Age" optype="continuous">
          <Interval closure="closedClosed" leftMargin="0" rightMargin="150"/>
        </DataField>
        <DataField name="PostCode" optype="categorical" taxonomy="Location" />
        <Taxonomy name="Location">
          . . .
        </Taxonomy>
      </DataDictionary>
      . . .
    </PMML>
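
A consumer can use the data dictionary both for the consistency check on the number of fields and for validating incoming values. The Python sketch below illustrates this with the standard library. It assumes the attribute names `numberOfFields` and `optype`, handles only the `closedClosed` interval closure, and uses a small hypothetical dictionary rather than the one on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary with one categorical and one continuous field.
DICTIONARY = """
<DataDictionary numberOfFields="2">
  <DataField name="Type" optype="categorical">
    <Value value="A"/><Value value="C"/>
  </DataField>
  <DataField name="Age" optype="continuous">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="150"/>
  </DataField>
</DataDictionary>
"""

def load_fields(xml_text):
    """Parse the dictionary and check the declared field count against the actual one."""
    root = ET.fromstring(xml_text)
    fields = root.findall("DataField")
    if int(root.get("numberOfFields")) != len(fields):
        raise ValueError("numberOfFields does not match the DataField elements")
    return {f.get("name"): f for f in fields}

def is_valid(field, value):
    """Continuous: inside the closed interval. Categorical: one of the listed values."""
    if field.get("optype") == "continuous":
        interval = field.find("Interval")
        low = float(interval.get("leftMargin"))
        high = float(interval.get("rightMargin"))
        return low <= float(value) <= high
    allowed = {v.get("value") for v in field.findall("Value")}
    return value in allowed

fields = load_fields(DICTIONARY)
```

For example, `is_valid(fields["Age"], "42")` accepts the value while `is_valid(fields["Age"], "200")` rejects it.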

Page 64: Crisp 12 Pre

8/20/2019 Crisp 12 Pre

http://slidepdf.com/reader/full/crisp-12-pre 64/81

ECML/PKDD 2003 : KD-Standards Tutorial S. Anand, M. Groblni!, D. "tts#$r#! 

Taxonomy Example

<Taxonomy name="Location">
  <ChildParent childColumn="Post Code" parentColumn="District">
    <TableLocator x-dbname="myDB" x-tableName="PostCodeDistrict" />
  </ChildParent>
  <ChildParent childColumn="member" parentColumn="group" isRecursive="yes">
    <InlineTable>
      <Extension extender="MySystem">
        <row member="W9" group="CentralLondon"/>
        <row member="N9" group="NorthLondon"/>
        <row member="N2" group="NorthLondon"/>
        <row member="W1" group="CentralLondon"/>
        <row member="CentralLondon" group="London"/>
        <row member="NorthLondon" group="London"/>
        <row member="EastLondon" group="London"/>
        <row member="London" group="England"/>
        ...
      </Extension>
    </InlineTable>
  </ChildParent>
</Taxonomy>
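
To make the recursive child/parent table more concrete, here is a minimal Python sketch that reads an inline table of this shape and walks from a value up to the root of the hierarchy. It uses only the standard library; the helper names `parent_map` and `ancestors` and the shortened sample table are assumptions for illustration, not an API defined by PMML.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the style of the taxonomy example (illustrative only).
TAXONOMY_XML = """
<Taxonomy name="Location">
  <ChildParent childColumn="member" parentColumn="group" isRecursive="yes">
    <InlineTable>
      <Extension extender="MySystem">
        <row member="N2" group="NorthLondon"/>
        <row member="NorthLondon" group="London"/>
        <row member="London" group="England"/>
      </Extension>
    </InlineTable>
  </ChildParent>
</Taxonomy>
"""

def parent_map(taxonomy_xml):
    """Build a child -> parent dictionary from every inline ChildParent table."""
    root = ET.fromstring(taxonomy_xml)
    parents = {}
    for cp in root.iter("ChildParent"):
        child_col = cp.get("childColumn")
        parent_col = cp.get("parentColumn")
        for row in cp.iter("row"):
            parents[row.get(child_col)] = row.get(parent_col)
    return parents

def ancestors(value, parents):
    """Follow parent links upward; stop at the root or if a cycle is detected."""
    chain, seen = [], {value}
    while value in parents:
        value = parents[value]
        if value in seen:
            break
        seen.add(value)
        chain.append(value)
    return chain
```

Calling `ancestors("N2", parent_map(TAXONOMY_XML))` walks N2, NorthLondon, London, England. A table referenced through TableLocator would need a database lookup instead, which this sketch does not attempt.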


Transformation Dictionary

Defines mapping of source data values to values more suited for use by the mining algorithm

PMML supports
  Normalization: map values to numbers; the input can be continuous or discrete.
  Discretization: map continuous values to discrete values.
  Value mapping: map discrete values to discrete values.
  Aggregation: summarize or collect groups of values, e.g. compute average


Transformation Dictionary (2)

TransformationDictionary
  Elements
    DerivedField
      Attributes
        name
        displayName
      Elements
        Expression (one of the following)
          NormContinuous
          NormDiscrete
          Discretize
          MapValues
          Aggregate


Transformation Dictionary (3)

<DerivedField name="normalAge">
  <NormContinuous field="age">
    <LinearNorm orig="45" norm="0"/>
    <LinearNorm orig="82" norm="0.5"/>
    <LinearNorm orig="105" norm="1"/>
  </NormContinuous>
</DerivedField>
<DerivedField name="male">
  <NormDiscrete field="marital status" value="m"/>
</DerivedField>
<DerivedField name="female">
  <NormDiscrete field="marital status" value="f"/>
</DerivedField>
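
The two normalizations on this slide can be sketched in Python as follows. NormContinuous is read here as piecewise linear interpolation between the LinearNorm points, and NormDiscrete as a 0/1 indicator. The function names are hypothetical, and extrapolating along the nearest segment for inputs outside the listed points is an assumption of this sketch rather than something the slide specifies.

```python
def norm_continuous(x, points):
    """Piecewise linear map through (orig, norm) points, e.g. LinearNorm entries."""
    pts = sorted(points)
    if x <= pts[0][0]:
        seg = (pts[0], pts[1])        # assumption: extrapolate below the range
    elif x >= pts[-1][0]:
        seg = (pts[-2], pts[-1])      # assumption: extrapolate above the range
    else:
        # find the pair of neighbouring points that brackets x
        seg = next((a, b) for a, b in zip(pts, pts[1:]) if a[0] <= x <= b[0])
    (x0, y0), (x1, y1) = seg
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def norm_discrete(value, target):
    """Indicator field: 1.0 when the input equals the target value, else 0.0."""
    return 1.0 if value == target else 0.0

AGE_POINTS = [(45, 0.0), (82, 0.5), (105, 1.0)]
```

With these points, an age of 63.5 lies halfway along the first segment and maps to 0.25, and `norm_discrete("m", "m")` yields 1.0.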


Transformation Dictionary (4)

<DerivedField name="binnedProfit">
  <Discretize field="Profit">
    <DiscretizeBin binValue="negative">
      <Interval closure="openOpen" rightMargin="0" />
    </DiscretizeBin>
    <DiscretizeBin binValue="positive">
      <Interval closure="closedOpen" leftMargin="0" />
    </DiscretizeBin>
  </Discretize>
</DerivedField>
<DerivedField name="houseType">
  <MapValues outputColumn="longForm">
    <FieldColumnPair field="Type" column="shortForm"/>
    <InlineTable>
      <Extension>
        <row><shortForm>BU</shortForm><longForm>bungalow</longForm></row>
        <row><shortForm>H</shortForm><longForm>house</longForm></row>
        <row><shortForm>C</shortForm><longForm>cottage</longForm></row>
      </Extension>
    </InlineTable>
  </MapValues>
</DerivedField>
<DerivedField name="itemsBought">
  <Aggregate field="item" function="multiset" groupField="transaction"/>
</DerivedField>
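
A minimal Python sketch of what these three derived fields compute may help. It assumes the inputs are already in memory; the function and constant names are invented for illustration and are not PMML identifiers.

```python
from collections import Counter, defaultdict

def binned_profit(profit):
    """Discretize: (-inf, 0) is 'negative' (openOpen), [0, +inf) is 'positive' (closedOpen)."""
    return "negative" if profit < 0 else "positive"

# MapValues: short form of the house type mapped to its long form.
HOUSE_TYPE = {"BU": "bungalow", "H": "house", "C": "cottage"}

def items_bought(rows):
    """Aggregate with function 'multiset': group items by transaction, keeping duplicates.

    rows is an iterable of (transaction, item) pairs.
    """
    groups = defaultdict(Counter)
    for transaction, item in rows:
        groups[transaction][item] += 1
    return dict(groups)
```

For example, `items_bought([(1, "milk"), (1, "milk"), (2, "bread")])` groups two units of milk under transaction 1 and one bread under transaction 2.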


The PMML Document

[Diagram: a PMML document built from Data, containing a Data Dictionary and a Transformation Dictionary, followed by Model1, Model2, ..., Modelk. Each model has its own Mining Schema and ModelStatistics.]


Mining Schema

Elements
  MiningField
    Attributes
      name
      usageType: active / predicted / supplementary
      outliers: asIs / asMissingValues / asExtremeValues
      lowValue
      highValue
      missingValueReplacement
      missingValueTreatment: asIs / asMean / asMode / asMedian / asValue
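
One plausible reading of how a consumer applies the outlier and missing-value attributes of a MiningField is sketched below in Python. The function names, the use of `None` for a missing value, and the bounds in the usage note are assumptions of this example.

```python
def treat_outliers(x, low, high, outliers="asIs"):
    """Apply an outlier policy using lowValue (low) and highValue (high)."""
    if x is None or outliers == "asIs":
        return x
    if low <= x <= high:
        return x                            # inside the range: unchanged
    if outliers == "asMissingValues":
        return None                         # treat the outlier as missing
    if outliers == "asExtremeValues":
        return low if x < low else high     # clamp to the nearest bound
    raise ValueError("unknown outlier treatment: " + outliers)

def replace_missing(x, replacement):
    """Substitute missingValueReplacement when the input is missing."""
    return replacement if x is None else x
```

With hypothetical bounds 0 and 120, `treat_outliers(150, 0, 120, "asExtremeValues")` clamps to 120, and chaining `replace_missing(treat_outliers(150, 0, 120, "asMissingValues"), 40)` yields 40.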


MiningSchema

<?xml version="1.0"?>
<PMML version="1.0">
  <Header/>
  <DataDictionary/>
  <SequenceModel functionName="sequences" algorithmName="Capri2" minimumSupport="2.17" minimumConfidence="0.00"
      numberOfItems="5" numberOfSets="5" numberOfSequences="11" numberOfRules="3">
    <Extension name="orderby" value="none"/>
    <MiningSchema>
    </MiningSchema>
  </SequenceModel>
</PMML>

<MiningSchema>
  <MiningField name="Price" usageType="predicted"/>
  <MiningField name="location" usageType="active"/>
  <MiningField name="bedrooms" usageType="active"/>
  <MiningField name="houseType" usageType="active"/>
  <MiningField name="Area" usageType="supplementary"/>
</MiningSchema>


Model Statistics

Elements
  UnivariateStatistics
    Attributes
      field
    Elements
      Discrete Statistics
      Continuous Statistics
      Counts: Valid, Invalid and Missing counts
      NumericInfo: min / max / mean / standard deviation / median / interQuartileRange
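
The figures carried by Counts and NumericInfo could be produced along the following lines. This is a sketch using Python's standard `statistics` module. Treating `None` as missing, any non-numeric entry as invalid, and using the sample standard deviation and the default quartile method are all assumptions of the example.

```python
import statistics

def univariate_statistics(values):
    """Summarize one field: validity counts plus basic numeric information."""
    valid = [v for v in values
             if isinstance(v, (int, float)) and not isinstance(v, bool)]
    missing = sum(1 for v in values if v is None)
    invalid = len(values) - len(valid) - missing
    q1, _, q3 = statistics.quantiles(valid, n=4)   # needs at least two valid values
    return {
        "valid": len(valid),
        "invalid": invalid,
        "missing": missing,
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
        "standardDeviation": statistics.stdev(valid),  # sample standard deviation
        "median": statistics.median(valid),
        "interQuartileRange": q3 - q1,
    }
```

For the input `[1, 2, 3, 4, None, "x"]` this reports four valid values, one invalid, one missing, and a mean and median of 2.5.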


Supported Data Mining Models

Tree Model
Neural Networks
Clustering Model
Regression Model
General Regression Model
Naïve Bayes Model
Association Rules
Sequence Rule Model


Sequence Model

Represents the output of Sequence Rule Mining

Attributes
  modelName
  functionName
  algorithmName
  numberOfTransactions
  minimumSupport
  minimumConfidence
  lengthLimit
  ...

Elements
  Sequence Rule
    Elements
      Antecedent Sequence
        sequenceReference
      Consequent Sequence
      Delimiter
  Sequence
    Elements
      SetReference
      Delimiter
  Set Predicate
    Array

<SequenceModel functionName="sequences" numberOfTransactions="100" minimumSupport="0.20" minimumConfidence="0.25"
    numberOfItems="6" numberOfSets="5" numberOfSequences="3" numberOfRules="1">
  <MiningSchema> ... </MiningSchema>
  <SetPredicate id="s001" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> index.html </Array>
  </SetPredicate>
  <SetPredicate id="s002" field="transaction" operator="supersetOf">
    <Array n="2" type="string"> [email protected] kdnuggets.com </Array>
  </SetPredicate>
  <SetPredicate id="s003" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> products.html </Array>
  </SetPredicate>
  <SetPredicate id="s004" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> basket.html </Array>
  </SetPredicate>
  <SetPredicate id="s005" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> checkout.html </Array>
  </SetPredicate>
  <Sequence id="seq001" numberOfSets="1" occurrence="80" support="0.80">
    <SetReference setId="s001"/>
  </Sequence>
  <Sequence id="seq002" numberOfSets="4" occurrence="40" support="0.40">
    <SetReference setId="s002"/>
    <Delimiter delimiter="acrossTimeWindows" gap="false"/>
    <SetReference setId="s003"/>
    <Delimiter delimiter="sameTimeWindow" gap="true"/>
    <SetReference setId="s004"/>
    <Delimiter delimiter="sameTimeWindow" gap="false"/>
    <SetReference setId="s005"/>
  </Sequence>
  <SequenceRule id="rule001" numberOfSets="5" occurrence="20" support="0.20" confidence="0.25">
    <AntecedentSequence>
      <SequenceReference
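
The support and confidence figures in a sequence model are simple ratios, and a set predicate with the supersetOf operator is a subset test on the items of a transaction. The Python sketch below shows both. The function names are hypothetical, and the antecedent count of 80 used in the usage note is an assumption chosen for illustration.

```python
def matches_superset_of(transaction_items, predicate_items):
    """supersetOf: the transaction must contain every item listed in the predicate."""
    return set(transaction_items) >= set(predicate_items)

def support(occurrence, number_of_transactions):
    """Fraction of all transactions in which the sequence or rule occurs."""
    return occurrence / number_of_transactions

def confidence(rule_occurrence, antecedent_occurrence):
    """Fraction of antecedent occurrences that are also followed by the consequent."""
    return rule_occurrence / antecedent_occurrence
```

With 100 transactions, an occurrence of 80 gives a support of 0.80 and an occurrence of 20 gives 0.20. If the antecedent occurred 80 times, a rule occurring 20 times would have confidence 0.25.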


PMML Consumers

Post-Processing
Visualization
Verification and Evaluation
Deployment
Hybrids and Meta-Learning


PEAR: Post-Processing Association Rules

Sets of association rules are browsed like web pages
PMML-formatted association rules can be uploaded

Jorge et al., 2002


VizWiz - PMML Visualization

Java Applet
Reads, visualizes and writes PMML files
Some non-standard extensions required for best visualization
Coupling with WEKA in progress

Wettschereck, 2003


ROCOn: Visualizing ROC graphs

Use Receiver Operator Characteristics (ROC) to compare and evaluate models
Java Applet
Understands PMML as an extension to VizWiz

Farrand and Flach (http://www.cs.bris.ac.uk/%7Efarrand/rocon/index.html)
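
For readers unfamiliar with ROC graphs, the following Python sketch shows one common way to obtain the points of a ROC curve from model scores and to compare models by the area under it. It is not taken from ROCOn. The function names are invented, labels are assumed to be 1 (positive) or 0 (negative), and tied scores are not given any special treatment.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, highest score first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of ROC points ordered by increasing FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A model that ranks every positive above every negative reaches an area of 1.0, while the reverse ranking gives 0.0. Comparing areas is one simple way to evaluate competing models.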


Summary

Standards help to streamline efforts
Sign of maturity in the field of KD
From Art to Engineering
Standards are still incomplete, but:
  Use what is available!
More tools utilizing standards are needed


References

Grossman, R.L., Hornick, M.F., Meyer, G. (2002). Data Mining Standards Initiatives, Communications of the ACM, Vol. 45:8; see also http://www.dmg.org

Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R. (2000). CRISP-DM 1.0: Step-by-step data mining guide, CRISP-DM consortium, http://www.crisp-dm.org

Clifton, C., Thuraisingham, B. (2001). Emerging standards for data mining. Computer Standards & Interfaces Vol 23, 187-193.

Compare and Contrast JOLAP and XML for Analysis, http://www.essbase.com/resourcelibrary/articles/jolapxmla.cfm

JCA http://www.jcp.org/en/jsr/detail?id=016

JOLAP http://www.jcp.org/en/jsr/detail?id=69

Jorge, A., Poças, J. and Azevedo, P. (2002). Post-processing operators for browsing large sets of association rules. Proc. Discovery Science 02 (eds. Lange, S., Satoh, K. and Smith, C. H.), Lübeck, Germany, LNCS 2534, Springer-Verlag.

Farrand, J. and Flach, P. (2003). ROCOn: a tool for visualising ROC graphs. See: http://www.cs.bris.ac.uk/%7Efarrand/rocon/index.html

Melton, J. and Eisenberg, A. SQL Multimedia and Application Packages (SQL/MM), http://www.acm.org/sigmod/record/issues/0112/standards.pdf

OMG Common Warehouse MetaModel http://www.omg.org/cwm/

SOAP http://www.w3.org/TR/SOAP/

Tang, Z., Kim, P. Building Data Mining Solutions with SQL Server 2000, http://www.dmreview.com/whitepaper/wid292.pdf

Wettschereck, D., Jorge, A., Moyle, S. (to appear). Data Mining and Decision Support Integration through the Predictive Model Markup Language Standard and Visualization. In Mladenic D, Lavrac N, Bohanec M, Moyle S (editors): Data Mining and Decision Support: Integration and Collaboration, Kluwer Publishers.

XMLA http://www.xmla.org/