
RapidMiner walkthrough

1. Install RapidMiner 7.3 from https://my.rapidminer.com/nexus/account/index.html#downloads

Please also remember to apply for an Educational license, now or after this walkthrough practice, so that unlimited data rows are allowed. (The default version only allows up to 10,000 rows.) You can do so here: https://my.rapidminer.com/nexus/account/index.html#licenses/request

When successfully installed, see the next step.

2. Open RapidMiner 7.3 and open a new process.

When done, see the next step.

3. Type Read CSV into the operator box to create a new “Read CSV” operator.

When done, see the next step.

4. Click on the Import Configuration Wizard on the right side of the interface.

When done, see the next step.

5. Select the file “SaoPedroetal(2013)_UMUAI_DesigningControlledExperiments_cummandlocalfeatures.csv”.

You will have to download it from the course web page.

When done, see the next step.

6. This is a “csv” file, so select “Comma Delimited”.

When done, see the next step.

7. Click Next until the system does not let you click Next anymore. Then click Finish.

When done, see the next step.

8. Create a “Set Role” operator in the operator box at the top-left.

Then connect the output bubble on the right side of “Read CSV” to the input bubble on the left side of “Set Role” by clicking on the output bubble and then clicking on the input bubble. Your screen should look like this.

When done, see the next step.

9. Now go over to the right side, select DesigningControlledExperiments as the variable you want to change, and set it to be a “label” in the target role box.

When done, see the next step.

10. Install the WEKA Expansion Pack. To do this, go to the Extensions menu and select Marketplace (Updates and Extensions). Search for Weka, and install the Weka Expansion Pack.

When done, see the next step.

11. Type W-J48 into the operators window, and create the W-J48 operator.

When done, see the next step.

12. Now connect the output bubble from Set Role (exa, for example set) to the input bubble on W-J48 (tra, for training set).

When done, see the next step.

13. Then connect the output bubble from W-J48 (model) to the res (result) bubble on the far right.

When done, see the next step.

14. Then press play at the top of the screen. After a minute or so (possibly longer for slower computers), you should see your model.

When done, see the next step.

15. This representation shows how the model makes decisions. You can read it as follows:

If the variable CMcvscnt is less than or equal to zero, then the model predicts No. In the original data set, there were 271 cases where this prediction was correct, and 2 cases where it was wrong. So the confidence of this prediction is 271/(271+2) = 271/273 = 99.27%. If the variable CMcvscnt is greater than zero, then the model goes to the next variable. If the variable CVSct is less than or equal to zero, then if the variable RunTSum is less than or equal to 11, the tree checks about 11 other things, to finally get to a prediction of No with 10/11 = 90.9% confidence. (Note that you have to scroll down to see the case where CVSct is greater than zero.)
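
If you want to reproduce this kind of tree printout outside RapidMiner, here is a rough Python/scikit-learn sketch. It is only an analogue: J48 is WEKA’s implementation of C4.5, while scikit-learn’s DecisionTreeClassifier uses CART, so the exact splits and counts will differ. It assumes the CSV from step 5 is in your working directory, that the label column is named DesigningControlledExperiments as in step 9, and that the remaining numeric columns are the features (adjust the names if your copy of the file differs).

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Load the same file selected in step 5
    data = pd.read_csv("SaoPedroetal(2013)_UMUAI_DesigningControlledExperiments_cummandlocalfeatures.csv")

    # The column given the "label" role in step 9; every other numeric column is a predictor
    y = data["DesigningControlledExperiments"]
    X = data.drop(columns=["DesigningControlledExperiments"]).select_dtypes(include="number")

    # Train on the full data set, as in step 14, and print the tree as text rules
    tree = DecisionTreeClassifier().fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))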

When done, see the next step.

16. Note that J48 decision trees are extremely complicated to think through all at once.

And they are one of the simpler algorithms to interpret!

When done, see the next step.

17. Click on the Design button at the top to go back to the main screen.

When done, see the next step.

18. Now add two more operators to the right of W-J48. First, an Apply Model, and second, a Performance (Binomial Classification). Choose kappa in the window to the right. Make sure that you link the operators as shown here. You can delete a link by right-clicking on it and selecting delete, or you can click on it and press the delete button. Then press run.
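
For reference, here is a minimal scikit-learn sketch of what this Apply Model plus Performance (Binomial Classification) chain does, reusing the X and y loaded in the sketch under step 15. Like the RapidMiner setup at this point, it tests on the same data it was trained on, so the numbers it prints will be optimistic.

    from sklearn.metrics import cohen_kappa_score, confusion_matrix
    from sklearn.tree import DecisionTreeClassifier

    # X and y as loaded in the step 15 sketch
    model = DecisionTreeClassifier().fit(X, y)   # build the model on the full data set
    predictions = model.predict(X)               # "Apply Model" on that same data

    print(cohen_kappa_score(y, predictions))     # kappa, as selected in the Performance operator
    print(confusion_matrix(y, predictions))      # the confusion matrix shown on the results screen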

When done, see the next step.

19. You should see this screen. This shows you the model’s kappa and confusion matrix. The kappa is excellent, in fact too good. Keep in mind we did not use cross-validation, so this model is being trained and tested on the same data set.

Here’s how to read the confusion matrix. There are 165 cases where the model says “Y” and the data says “Y”. There are 383 cases where the model says “N” and the data says “N”. There are 11 cases where the model says “N” and the data says “Y”. There are 5 cases where the model says “Y” and the data says “N”.
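
As a sanity check, you can compute kappa yourself from those four counts. This is just the standard formula (observed agreement compared to the agreement expected by chance), not RapidMiner’s code:

    # Counts from the confusion matrix above
    yy, nn, ny, yn = 165, 383, 11, 5   # model Y/data Y, model N/data N, model N/data Y, model Y/data N
    n = yy + nn + ny + yn

    observed = (yy + nn) / n                                              # observed agreement (accuracy)
    expected = ((yy + yn) * (yy + ny) + (nn + ny) * (nn + yn)) / n ** 2   # agreement expected by chance
    kappa = (observed - expected) / (1 - expected)
    print(round(kappa, 3))                                                # about 0.93 for these counts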

When done, see the next step.

20. Now go back to the main screen, and create what you see here. You should delete W-J48, Apply Model, and Performance, and add Cross-Validation. You will get some error messages. Don’t worry about those for now. In many cases, you’ll want to do Batch X-Validation instead of X-Validation. Batch X-Validation allows you to do student-level cross-validation, or item-level cross-validation, or population-level cross-validation. Regular X-Validation supports flat cross-validation, as talked about in the video lecture.

Note the options over to the right, which allow you to do k-fold cross-validation (currently set up to do 10-fold cross-validation), or to do leave-one-out cross-validation.
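
If you want to see the same choices outside RapidMiner, here is a rough scikit-learn sketch of flat k-fold, leave-one-out, and student-level cross-validation, again reusing X and y from the sketch under step 15. The student column name in the last part is hypothetical; use whatever student identifier your data actually has.

    from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut, GroupKFold
    from sklearn.tree import DecisionTreeClassifier

    clf = DecisionTreeClassifier()   # X and y as loaded in the step 15 sketch

    # Flat 10-fold cross-validation (the option currently selected in the panel)
    print(cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean())

    # Leave-one-out cross-validation (one fold per row; slow on bigger data sets)
    print(cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())

    # Student-level cross-validation, the kind Batch X-Validation supports:
    # all rows from the same student stay in the same fold.
    # groups = data["student"]   # hypothetical column name
    # print(cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=10)).mean())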

When done, see the next step.

21. Now double-click on the validation box (the tall yellow one). It will bring you to another screen. Add operators as shown here – the same ones you just deleted. The left box represents what you do with the training folds – build a model. And the right box represents what you do with the test folds – apply the model, and see how well it does. Set up everything the same way you did before, e.g. with Performance (Binomial Classification) and the kappa statistic.

When done, see the next step.

22. You can click the blue up arrow to go back to the main screen.

When done, see the next step.

23. Click to run the model. You should get this. Note that kappa is a lot lower once we’re cross-validating.

When done, see the next step.

24. So now you’ve built a model and validated it. There are a lot more things you could do.

You could:

• Use student-level cross-validation (you would have to add the variable student back in)
• Try different algorithms, such as W-JRip, W-KStar, KNN, Logistic Regression, Linear Regression (which gives you Step Regression for binomial data)
• Try creating new features (try Generate Attributes) or removing features (try Remove Correlated Attributes – a rough sketch of the idea follows this list)
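
For the last bullet, here is a minimal sketch of the idea behind Remove Correlated Attributes: drop one attribute out of each highly correlated pair. It reuses X from the sketch under step 15, and the 0.95 cutoff is an illustrative choice, not necessarily RapidMiner’s default.

    import numpy as np

    # X as loaded in the step 15 sketch
    corr = X.corr().abs()                                                # pairwise absolute correlations
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))    # keep each pair only once
    to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
    X_reduced = X.drop(columns=to_drop)
    print("Dropped:", to_drop)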

Have fun!
