RapidMiner walkthrough
1. Install RapidMiner 7.3 from https://my.rapidminer.com/nexus/account/index.html#downloads
Please also remember to apply for an Educational license, either now or after this walkthrough, so that unlimited data rows are allowed (the default version only allows up to 10,000 rows). You can do so here: https://my.rapidminer.com/nexus/account/index.html#licenses/request
When it is successfully installed, see the next step.
2. Open RapidMiner 7.3 and open a new process.
When done, see the next step.
3. Type Read CSV into the operator box to create a new “Read CSV” operator.
When done, see the next step.
4. Click on the Import Configuration Wizard on the right side of the interface.
When done, see the next step.
5. Select the file “SaoPedroetal(2013)_UMUAI_DesigningControlledExperiments_cummandlocalfeatures.csv”.
You will have to download it from the course webpage.
When done, see the next step.
6. This is a “csv” file, so select “Comma Delimited”.
When done, see the next step.
7. Click Next until the system does not let you click Next anymore. Then click Finish.
When done, see the next step.
8. Create a “Set Role” operator using the operator box at the top left.
Then connect the output bubble on the right side of “Read CSV” to the input bubble on the left side of “Set Role” by clicking on the output bubble and then clicking on the input bubble. Your screen should look like this.
When done, see the next step.
9. Now go over to the right side, select DesigningControlledExperiments as the attribute you want to change, and set it to “label” in the target role box.
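Setting the “label” role tells RapidMiner which column the model should predict; every other column stays a regular attribute the model can learn from. The Python sketch below (plain standard library, not RapidMiner code) mimics that split. It assumes the label column is literally named DesigningControlledExperiments and holds Y/N values, as steps 9 and 19 suggest; the function name split_label and the two sample rows are made up for illustration.

```python
import csv
import io

def split_label(csv_text, label_column):
    """Separate the label attribute from the regular attributes (hypothetical helper)."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append(row.pop(label_column))  # the 'label' role: what the model predicts
        features.append(row)                  # remaining columns stay regular attributes
    return features, labels

# Invented rows; the real file has many more columns and rows.
sample = "CMcvscnt,CVSct,DesigningControlledExperiments\n0,0,N\n3,1,Y\n"
X, y = split_label(sample, "DesigningControlledExperiments")
print(y)             # ['N', 'Y']
print(sorted(X[0]))  # ['CMcvscnt', 'CVSct']
```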
When done, see the next step.
10. Install the Weka Expansion Pack. To do this, go to the Extensions menu and select Marketplace (Updates and Extensions). Search for Weka, and install the Weka Expansion Pack.
When done, see the next step.
11. Type w-j48 into the Operators window, and create the W-J48 operator.
When done, see the next step.
12. Now connect the output bubble of Set Role (exa, for example set) to the input bubble of W-J48 (tra, for training set).
When done, see the next step.
13. Then connect the output bubble of W-J48 (model) to the res (result) bubble on the far right.
When done, see the next step.
14. Then press play at the top of the screen. After a minute or so (possibly longer on slower computers), you should see your model.
When done, see the next step.
15. This representation shows how the model makes decisions. You can read it as follows:
If the variable CMcvscnt is less than or equal to zero, then the model predicts No. In the original data set, there were 271 cases where this prediction was correct and 2 cases where it was wrong, so the confidence of this prediction is 271/(271+2) = 271/273 = 99.27%. If CMcvscnt is greater than zero, then the model goes on to the next variable. If CVSct is less than or equal to zero, then the model checks whether RunTSum is less than or equal to 11, then about 11 other conditions, to finally reach a prediction of No with 10/11 = 90.9% confidence. (Note that you have to scroll down to see the case where CVSct is greater than zero.)
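The confidence arithmetic above is easy to check in a few lines of Python. This is only a sketch: leaf_confidence and root_split are hypothetical names, “N” is assumed to be how the data encodes No, and only the root split on CMcvscnt is written out, since the deeper branches are not reproduced here.

```python
def leaf_confidence(correct, wrong):
    """Share of training cases reaching a leaf that match the leaf's prediction."""
    return correct / (correct + wrong)

def root_split(cmcvscnt):
    """Only the first decision of the tree; deeper branches are omitted."""
    if cmcvscnt <= 0:
        return "N", leaf_confidence(271, 2)
    return None  # the real tree goes on to test CVSct, RunTSum, and more

print(f"{leaf_confidence(271, 2):.2%}")  # 99.27%
print(f"{leaf_confidence(10, 1):.1%}")   # 90.9%
```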
When done, see the next step.
16. Note that J48 decision trees are extremely complicated to think through all at once.
And they are one of the simpler algorithms to interpret!
When done, see the next step.
17. Click on the Design button at the top to go back to the main screen.
When done, see the next step.
18. Now add two more operators to the right of W-J48: first an Apply Model, and second a Performance (Binominal Classification). Choose kappa in the window to the right. Make sure that you link the operators as shown here. You can delete a link by right-clicking on it and selecting Delete, or by clicking on it and pressing the Delete key. Then press Run.
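Apply Model attaches a prediction to every row, and the Performance operator then compares each prediction with the actual label. As a rough Python sketch of that comparison step (not RapidMiner's implementation), a binary confusion matrix is just a tally of (prediction, actual) pairs; the six pairs below are invented for illustration.

```python
from collections import Counter

def confusion_counts(predictions, labels):
    """Tally (prediction, actual label) pairs, one cell of the matrix per pair type."""
    return Counter(zip(predictions, labels))

# Hypothetical predictions and labels for six rows (not real data).
preds = ["Y", "N", "N", "Y", "N", "N"]
truth = ["Y", "N", "Y", "N", "N", "N"]
counts = confusion_counts(preds, truth)
print(counts[("Y", "Y")], counts[("N", "N")], counts[("N", "Y")], counts[("Y", "N")])  # 1 3 1 1
```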
When done, see the next step.
19. You should see this screen. It shows you the model’s kappa and confusion matrix. The kappa is excellent; in fact, it is too good. Keep in mind that we did not use cross-validation, so this model is being trained and tested on the same data set.
Here’s how to read the confusion matrix. There are 165 cases where the model says “Y” and the data says “Y”. There are 383 cases where the model says “N” and the data says “N”. There are 11 cases where the model says “N” and the data says “Y”. There are 5 cases where the model says “Y” and the data says “N”.
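To see where the kappa number comes from, here is a Python sketch of Cohen’s kappa computed from the four counts just listed. It assumes RapidMiner’s kappa criterion is the standard Cohen’s kappa; the function name is ours. Kappa compares the observed agreement with the agreement expected by chance given how often the model and the data each say “Y” and “N”.

```python
def kappa_from_confusion(yy, nn, pred_n_true_y, pred_y_true_n):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix."""
    n = yy + nn + pred_n_true_y + pred_y_true_n
    observed = (yy + nn) / n                          # share of cases on the diagonal
    pred_y, pred_n = yy + pred_y_true_n, nn + pred_n_true_y
    true_y, true_n = yy + pred_n_true_y, nn + pred_y_true_n
    expected = (pred_y * true_y + pred_n * true_n) / n ** 2  # chance agreement
    return observed, (observed - expected) / (1 - expected)

acc, kappa = kappa_from_confusion(165, 383, 11, 5)
print(f"accuracy={acc:.4f} kappa={kappa:.3f}")  # accuracy=0.9716 kappa=0.933
```

Because these counts come from training and testing on the same data, a kappa this high mostly tells you the tree fits the data it has already seen.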
When done, see the next step.
20. Now go back to the main screen and create what you see here. You should delete W-J48, Apply Model, and Performance, and add Cross Validation. You will get some error messages; don’t worry about those for now. In many cases, you’ll want to use Batch-X-Validation instead of X-Validation. Batch-X-Validation allows you to do student-level, item-level, or population-level cross-validation. Regular X-Validation supports flat cross-validation, as discussed in the video lecture.
Note the options over to the right, which allow you to do k-fold cross-validation (currently set up to do 10-fold cross-validation) or leave-one-out cross-validation.
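As a Python sketch of what flat k-fold cross-validation does with the rows (not RapidMiner’s own sampling, which can be stratified and will assign rows differently): shuffle all rows and deal them into k folds. Leave-one-out is simply the case where k equals the number of rows. The function name and the seed are illustrative choices.

```python
import random

def kfold_indices(n, k, seed=0):
    """Flat k-fold: shuffle all row indices, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

ten_fold = kfold_indices(564, 10)   # 564 = total cases in the step 19 matrix
print([len(f) for f in ten_fold])   # [57, 57, 57, 57, 56, 56, 56, 56, 56, 56]
loo = kfold_indices(6, 6)           # leave-one-out: one fold per row
print([len(f) for f in loo])        # [1, 1, 1, 1, 1, 1]
```

Each fold takes one turn as the test set while the other k-1 folds train the model.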
When done, see the next step.
21. Now double-click on the validation box (the tall yellow one). It will bring you to another screen. Add operators as shown here: the same ones you just deleted. The left box represents what you do with the training folds: build a model. The right box represents what you do with the test folds: apply the model and see how well it does. Set up everything the same way you did before, e.g., with Performance (Binominal Classification) and the kappa statistic.
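The two boxes inside the validation operator correspond to the two halves of the loop below. This Python sketch swaps in a hypothetical majority-class “model” for W-J48 so it stays short and self-contained; it ignores the features entirely and scores accuracy rather than kappa, because a model that always predicts the same class has a kappa of exactly 0. The class counts (176 Y, 388 N) are taken from the confusion matrix in step 19.

```python
import random
from collections import Counter

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_majority(train_labels):
    """Hypothetical stand-in for W-J48: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

def cross_validate(labels, k=10):
    """Average test-fold accuracy over k folds."""
    scores = []
    for test in kfold_indices(len(labels), k):
        held_out = set(test)
        train_labels = [lab for i, lab in enumerate(labels) if i not in held_out]
        model = train_majority(train_labels)          # left box: build a model
        hits = sum(labels[i] == model for i in test)  # right box: apply and score it
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

labels = ["Y"] * 176 + ["N"] * 388
print(f"mean accuracy: {cross_validate(labels):.3f}")
```

The key point is that each test fold is scored by a model that never saw those rows during training.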
When done, see the next step.
22. You can click the blue up arrow to go back to the main screen.
When done, see the next step.
23. Click the play button to run the model. You should get this. Note that kappa is a lot lower once we’re cross-validating, because the model is now scored on folds it was not trained on.
When done, see the next step.
24. So now you’ve built a model and validated it. There are many more things you could do.
You could:
• Use student-level cross-validation (you would have to add the variable student back in)
• Try different algorithms, such as W-JRip, W-KStar, KNN, Logistic Regression, or Linear Regression (which gives you Step Regression for binomial data)
• Try creating new features (try Generate Attributes) or removing features (try Remove Correlated Attributes)
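For the first suggestion, student-level cross-validation keeps all rows from one student in the same fold, so the model is always tested on students it has never seen. Here is a minimal Python sketch of that fold assignment, assuming a student column like the one mentioned above; group_kfold and the student IDs are hypothetical.

```python
import random

def group_kfold(groups, k, seed=0):
    """Student-level folds: every row of a given student lands in the same fold."""
    students = sorted(set(groups))
    random.Random(seed).shuffle(students)
    fold_of = {s: i % k for i, s in enumerate(students)}
    folds = [[] for _ in range(k)]
    for row, s in enumerate(groups):
        folds[fold_of[s]].append(row)
    return folds

# Hypothetical student IDs, one per row of data.
student = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5"]
for fold in group_kfold(student, 3):
    print(sorted({student[r] for r in fold}))
```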
Have fun!