
iTalk2Learn 2013-10-31

Deliverable 5.1

Report on evaluation plan

31 October 2013

Project acronym: iTalk2Learn

Project full title: Talk, Tutor, Explore, Learn: Intelligent Tutoring and Exploration for Robust Learning


Work Package: 5

Document title: D5.1-report_on_evaluation_plan

Version: 1.0

Official delivery date: 31.10.2013

Actual publication date:

Type of document: Report

Nature: Public

Authors: Katharina Loibl (RUB), Nikol Rummel (RUB), Manolis Mavrikis (IOE), Alice Hansen (IOE), Carlotta Schatten (UHi), Lars Schmidt-Thieme (UHi), Norbert Pfannerer (Sail), Gemma Solomons (Whizz), and Richard Marett (Whizz)

Internal reviewers: Beate Grawemeyer and Sergio Gutierrez-Santos (BBK), Carlotta Schatten and Lars Schmidt-Thieme (UHi)

Version  Date        Sections affected

0.1      30/07/2013  First draft (RUB)

0.2      09/09/2013  Exploratory tasks (IOE); Summative evaluation (IOE); Sequencing for structured tasks (UHi)

0.3      20/09/2013  Speech recognition (Sail); Automatic switching between structured and exploratory tasks (RUB)

0.4      30/09/2013  Refinement of all sections after discussion with all partners at the general meeting

0.5      02/10/2013  Exploratory tasks (IOE); Speech recognition (Sail)

1.0      29/10/2013  Final version


Executive Summary

This deliverable reports on the iTalk2Learn evaluation plans. We first define our evaluation goals. Subsequently we describe two interlinked plans: the formative evaluation plan and the summative evaluation plan. The section on the formative evaluation plan describes the iterative design and test process involved in developing adaptive sequencing and support mechanisms, exploratory learning activities, and infant speech recognition. The summative evaluation plan defines appropriate methodologies to test the new technology in two experiments, using quantitative measures.


Table of Contents

Executive Summary
Table of Contents
List of Figures
List of Tables
List of Abbreviations
1. General Introduction
2. Formative evaluation
  2.1 Exploratory learning environment and tasks
  2.2 Speech recognition
  2.3 Automatic adaptivity
    2.3.1 Sequencing for structured tasks
    2.3.2 Automatic switching between structured and exploratory tasks
    2.3.3 Support (Hints and Feedback)
3. Summative evaluation
  3.1 Experiment in Germany
  3.2 Experiment in England
4. Conclusion
References


List of Figures

Figure 1: iTalk2Learn evaluation plan

List of Tables

Table 1: Development of exploratory learning environment (phase 1)
Table 2: Task-dependent and task-independent support
Table 3: Conditions in the experiments of the summative evaluation

List of Abbreviations

AM   Acoustic model
ELE  Exploratory Learning Environment
HCI  Human-Computer Interaction
LM   Language model
LVF  Logotron Visual Fractions
M    Month
n    number
OOV  Out-of-vocabulary rate
SIG  Special Interest Group
WER  Word error rate
WOZ  Wizard of Oz
Y    Year


1. General Introduction

The iTalk2Learn project aims to facilitate robust learning in elementary education. Robust learning includes the acquisition of procedural skills and of conceptual knowledge (Koedinger, Corbett, & Perfetti, 2012). Research has shown that procedural skills can best be supported by structured practice (Rittle-Johnson, Siegler, & Alibali, 2001). In contrast, students acquire conceptual knowledge through experiencing more open-ended tasks, such as exploratory tasks (Ainsworth & Loizou, 2003; Chi, Bassok, Lewis, Reimann, & Glaser, 1989; Lewis, 1988; VanLehn, 1999; also see D1.1). Definitions of procedural and conceptual knowledge, as well as the interaction between them, are included in D1.1 and will be extended in D1.3.

Against this background, the iTalk2Learn project aims to facilitate robust learning in elementary education by creating a platform for intelligent support that combines existing structured learning tasks with new exploratory learning tasks, and that provides options for voice interaction. Intelligent components will adaptively sequence the learning tasks and provide adaptive support for students as they interact with them. In order to enable learners to communicate more naturally with the interface and to reflect on their thinking, another strand of the project is the development of speech recognition for children.

We aim to evaluate all relevant components of the iTalk2Learn platform, namely exploratory learning tasks, speech recognition for young learners, and automatic adaptivity concerning sequencing and support. The components will be evaluated in iterative design and test cycles. The progress and outcome of the project will be evaluated by using formative and summative evaluation strategies as illustrated in Figure 1. The formative evaluation plan (see chapter 2) describes the foreseen iterative process of developing, implementing, and testing the various components of the iTalk2Learn platform. The results of the formative evaluation will inform the design of the summative evaluation. The summative evaluation (see chapter 3) will evaluate the pedagogical and technological outcomes of the project in two experiments that will be conducted in two proven application scenarios, in two European languages (English and German), and with quantitative learning measures. In the following we will describe the formative evaluation plan and the summative evaluation plan in more detail.


Figure 1: iTalk2Learn evaluation plan [diagram: the iTalk2Learn evaluation divides into the formative evaluation, comprising development (preparation for design experimentation) and implementation (design experimentation), and the summative evaluation, comprising experiments in England and Germany]

2. Formative evaluation

The iTalk2Learn project aims to facilitate robust learning by creating a platform enriched with intelligent support that combines existing structured learning tasks with new exploratory learning tasks in elementary education and enables young learners to communicate more naturally with the interface.

Working on all three components (i.e., exploratory learning tasks, automatic adaptivity (both sequencing and support), and speech recognition) at once would lead to very high complexity and slow down the progress of the project, as the progress of one component is dependent upon the progress of the other components. We therefore work on these three interrelated components in parallel threads. In consequence, the formative evaluation plan foresees an evaluation of the work on these components separately. The main goal of the formative evaluation is to optimally inform the project about the current state of development of the various components of the iTalk2Learn platform. A second goal of the formative evaluation is to advance theoretical principles. As argued in diSessa and Cobb (2004), 'grand' theories of learning (such as Piaget's theory) are not always precise enough to inform instructional design decisions. Similarly, other theoretical perspectives (referred to as 'orienting frameworks', diSessa & Cobb, 2004) such as constructivism may provide general principles for conceptualising instructional design but lack the prescriptive power required to develop specific designs. Likewise, Self (1999) observes that such theories of learning or frameworks are often not adequate for facilitating the implementation of computational support and advocates for 'a mixture of



theory and empiricism' to inform the design of intelligent systems. This is exactly what we attempt in the formative evaluation phase of the project.

Following both goals (informing the project of the current state and advancing theoretical principles), we apply methodologies from the fields of Human-Computer Interaction (HCI) and Educational Design Research. The common component between both methodologies lies in their iterative approaches: In the field of HCI, developing (educational) technology iteratively means repeatedly implementing early versions of the developed technology to derive further specifications for its improvement, as well as collaborating with the end users early in the design process (Preim, 1999). This user-centred approach builds on Nielsen's model of usability engineering (Nielsen, 1989) and aims at a high level of usability. The Educational Design Research methodology (also referred to as Design-Based Research) also involves several design phases and early collaboration with the end users (Cobb, Confrey, diSessa, Lehrer, & Schauble, 2003). The aim of Educational Design Research cycles is to bring about educational improvement through investigating instructional designs that are often significantly different to the typical forms of education. As a result, the researcher is more likely to identify relevant factors that contribute to the emergence of a 'new theory', and the products of design experiments are often innovative (Cobb et al., 2003). Edelson (2002) explains that the emergence of new theoretical principles occurs as "design research explicitly exploits the design process as an opportunity to advance the researchers' understanding of teaching, learning, and educational systems" (p. 107). He identifies three types of developments that result from design research: domain theories, design frameworks, and design methodologies. Educational Design Research includes a phase of preparation for design experimentation, a phase of conducting design experimentation, and a phase of conducting retrospective analyses, together with their iteration.

Phase 1: Preparation for design experimentation

The purpose of phase 1 is to produce a "conjectured local instruction theory" (Gravemeijer & Cobb, 2007, p. 19), which will be refined through the second phase of the design experimentation approach. The first phase includes literature reviews, analyses of the state of the art, and walk-throughs of preliminary versions of the to-be-developed components of the iTalk2Learn platform.

For most components of the iTalk2Learn project, the first phase already took place or is in progress at the time of this writing. Therefore, the steps within this phase are described in more detail than the steps for the remaining phases.

Phase 2: Conducting design experimentation

During phase 2, the effectiveness and usability of the developed educational system (in our case the iTalk2Learn platform) is piloted in iterative trials. Through this experimentation period involving iterative intervention and design cycles, theoretical principles are developed through the design and redesign of an educational system. The development of these principles takes place within a "learning ecology" (Cobb et al., 2003, p. 9) that takes the complex system of the specific educational setting and context into account. In iTalk2Learn, the learning ecology is set within a number of primary schools. The classroom is a dynamic environment in which the various aspects must be seen as continually acting and re-acting with each other. This dynamism negates the possibility of interpreting the classroom as a


situation where just a collection of activities or a list of separate factors influences learning, and it introduces the need to take into account a broader set of factors influencing the new forms of learning.

The trials that will be conducted in this experimentation period will take place in the UK (with a subsample of Whizz users) and in Germany (with students recruited with the help of the RUB Schülerlabor).

Phase 3: Conducting retrospective analyses

The aim of this phase is to provide "resulting claims that are trustworthy" (Cobb et al., 2003, p. 13). It is essential that the outcomes of the retrospective analysis have been developed systematically through all levels and types of data from all iterations.

In the iTalk2Learn project the third phase will take place in the summative evaluation (see chapter 3). The summative evaluation will allow us to establish "trustworthy" claims. The third phase also allows for triangulation between all partners, producing valid and reliable findings. The results of this third phase will be reflected in the various Y2/Y3 deliverables and in the summative evaluation (D5.3). It will, however, not be part of the deliverable reporting on the formative evaluation (D5.2).

In the remainder of this chapter we describe our formative evaluation plan in line with the Educational Design Research methodology, structured according to the above-mentioned phases: Preparation for design experimentation (phase 1) and conducting design experimentation (phase 2). We describe the steps that we undertake in the respective phases for all components, that is, exploratory tasks, speech recognition, and automatic adaptivity. These components will be combined in the iTalk2Learn platform.

2.1 Exploratory learning environment and tasks

As mentioned above, the iTalk2Learn project aims at fostering robust learning, which consists of procedural skills and conceptual knowledge. In D1.1, we discussed that the acquisition of conceptual knowledge can be facilitated by engaging students in exploratory learning tasks. To this end, we develop an exploratory learning environment (ELE). In the ELE students work on exploratory tasks, which are designed by iTalk2Learn.

Phase 1: Preparation for design experimentation

In this phase iTalk2Learn is using a bootstrapping process for the development and evaluation of the ELE that brings together three sources: the literature, students' cognitive walk-throughs using paper-based tasks or tasks from related existing state-of-the-art software, and the partners' own design knowledge and expertise. This phase is nearing completion as of the writing of this deliverable. More specifically, up to now we have undertaken the following steps within phase 1: We have conducted literature reviews related to students' conceptual development in fractions and the pedagogy of fractions to inform the ELE and exploratory task designs. School-based trials have been an important aspect of this phase, working with students to analyse existing software and early iterations of the designed ELE and tasks, and gaining insight into students' understanding of fractions. We have also made use of


experts in mathematics education to act as critical friends throughout phase 1. More details are provided in Table 1.

Table 1: Development of exploratory learning environment (phase 1)

M4–M6 – Literature review on students' conceptual development of fractions, i.e. their interpretations of fractions and the representations of fractions.
Objective/brief commentary: The literature review fed into the Design Drivers to inform the design of the ELE in Phase One of the methodology (see D3.2 for a review of and explanation about the design drivers). It also informed the design of the tasks (more detail is provided in D1.1 and D1.2) that will be tested in Phase Two.

M6 – Content analysis and observation of existing fractions software in use: Whizz, Fractions Tutor, and Logotron Visual Fractions (LVF)¹.
Objective/brief commentary: We met with four 10-11 year old children to compare currently existing activities and identify how the ELE will be integrated in these products to complement and build upon the present content.

M6 – Presentation of initial ELE design ideas to the IOE Mathematics Education Special Interest Group (SIG).
Objective/brief commentary: Discussions with the IOE Mathematics Education SIG formed part of the initial bootstrapping process for Phase One, based on the design drivers (from the literature review) and our own design assumptions.

M6 – Application to IOE Ethics Committee to undertake research in schools.
Objective/brief commentary: Following the British Education Research Association's professional code of ethics, the application included a summary of the planned research, an overview of the participants, data collection and storage, and information for participants' parents. The application was approved.

M7 – Trial and review of existing state-of-the-art software: Logotron Visual Fractions (LVF), Whizz and Gizmo.
Objective/brief commentary: The trial with four 10-11 year old students fed into the Design Conjectures (reported in D3.2) and contributed to the ELE and tasks designs.

M7–M8 – Cognitive walk-throughs.
Objective/brief commentary: Four students (10-11 years old) worked with bespoke-designed virtual and paper-based tasks with a teacher. Interaction and engagement with state-of-the-art software and conceptual understanding of fractions was analysed to support the design of the ELE and exploratory tasks.

M7–M8 – Design of ELE.
Objective/brief commentary: Design document produced (see Appendix 1 of D3.2) for Testaluna as the basis for the development of the prototype.

M9 – First prototype of ELE trialled in school.
Objective/brief commentary: Four 10-11 year old students' interactions with the prototype using fractions addition problems provided data for designing exploratory tasks and the first iteration of the ELE product for Phase Two.

M9–M10 – Designs of tasks produced.
Objective/brief commentary: These were based on literature reviews and analysis of students' engagement with the existing state-of-the-art software.

M10 – Revision of ELE design based on trial.
Objective/brief commentary: Communicated with Testaluna to develop the first iteration of the ELE product for Phase Two.

M11–M12 – Continued design of tasks under the assumptions of the design of the ELE.

¹ Logotron Visual Fractions (see http://www.r-e-m.co.uk/logo/?Titleno=26562) is an existing product that utilizes a number of fraction representations. In this regard it is the closest to what we wish to achieve with models in the ELE. Using it at an early stage supported our understanding of how students use fractions models and how the ELE could effectively build on existing software.

Phase 2: Conducting design experimentation

The design experiment phase will begin now and continue throughout Y2 of the project. Phase 2 will involve iterative intervention and design cycles. Our initial intentions moving forward include trialling paper-based tasks with students (with different maths ability levels). The findings will inform the final stage of designing the exploratory tasks. As soon as the next iteration of the ELE is available, the first exploratory tasks using the ELE will be trialled. In these trials we will use a Wizard of Oz (WOZ) method to simulate the intelligent components that are being designed in parallel. In WOZ studies, the design decisions are evaluated through a process where a human reacts to students' actions via the computer based on a script, while students assume that the reactions come from the system. In this way WOZ studies allow testing the impact of certain design features before those features are technically implemented (Mavrikis & Gutierrez-Santos, 2010). Further iterative trials with increasing numbers of students will be conducted throughout the year as other features are integrated (such as speech recognition and support mechanisms) until the point at which the ELE is working most effectively for the project aims and the summative evaluation.


2.2 Speech recognition

In D3.1 we discussed that existing speech recognition developed for adults does not achieve the necessary accuracy in recognizing young children's speech. For the purpose of the evaluation of speech recognition for young learners, we are following the general methodology as outlined above.

Phase 1: Preparation for design experimentation

The first step towards developing a speech recognition system for young learners is to collect speech data from the same population as the later system users, i.e. students in Germany and the UK. We collect data to feed into the statistical models for training and testing the speech recognition module (T3.3) and the behavioural data mining task (T3.4). The data collection is currently in progress. These speech corpora in English and German contain speech collected during problem-solving scenarios and transcripts representing the utterances as well as non-speech events which occurred during the recording (e.g. coughing, background noises, filler words, hesitations, etc.). The collection process for both speech and interaction corpora is taking place in a staged manner, so that collection processes can be adjusted based on experience gained with initial sub-portions of the corpus. Data collection takes place at schools in an effort to resemble the envisaged setting where iTalk2Learn will be used.

Phase 2: Conducting design experimentation

The speech corpus collected in phase 1 will be used to train the models of the speech recognition system. The speech recognition system uses two statistical models: the acoustic model (AM), which models how the different sounds of the language are represented in the audio, and the language model (LM), which models probabilities of occurrences of words and sequences of words. The AM is trained from the collected recordings and transcripts; the LM is trained from a corpus of text, which will contain the transcripts of the speech data and additional text, e.g. from observing and writing down what the students say, handcrafting of possible phrases, or text grabbed from online forums or chat rooms.

The training process makes use of a number of parameters that need to be set accordingly, e.g. how much the probability of certain keywords should be boosted to improve their recognition. After a model has been built, it needs to be evaluated to assess its performance. In an iterative manner the parameters will then be adjusted and the process repeated.

To evaluate one of these models (AM or LM), a portion of the data, typically around 10%, is set aside and not used for training. The resulting model is then tested on the 10% of the data that was set aside. If only little training data is available, it is advisable to use all of it in the training process to achieve the best possible performance of the whole system. In that case, the testing process described above can be repeated with different 10% subsets of the data; once an adequate set of parameters has been found, a final model is trained using all training material.
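The following Python sketch illustrates this rotate-the-held-out-set procedure (essentially k-fold cross-validation). It is a minimal illustration, not the actual training pipeline: the train_fn/eval_fn callables and the toy constant-predictor demo are hypothetical stand-ins for the AM/LM training and scoring runs.

```python
import random

def cross_validate(data, train_fn, eval_fn, k=10, seed=42):
    """Repeat the 10%-held-out test over k disjoint folds and average the error."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(eval_fn(train_fn(training), held_out))
    return sum(errors) / k

# Toy stand-ins: "training" fits a constant predictor, "evaluation" is mean
# absolute error. In the real setting these would be AM/LM training runs
# scored by word error rate or perplexity; after picking the best parameter
# setting, the final model is trained on ALL of the data.
data = [float(x) for x in range(100)]
fit_constant = lambda d: sum(d) / len(d)
mae = lambda model, d: sum(abs(x - model) for x in d) / len(d)
print(cross_validate(data, fit_constant, mae))
```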

The LM can be built much more rapidly and easily than the AM, and we envision training and testing multiple LMs iteratively, and for different contexts. To evaluate a language model, we will compute two measures of the LM, the so-called perplexity and the out-of-vocabulary rate (OOV), on the held-out data.


The purpose of the LM is to assign probabilities to the next possible word, dependent on the partial utterance up to a given point in time. In other words, the LM tries to predict the next word. We measure how well this prediction fits the test data; this measure is called "perplexity".

An important part of the LM is the vocabulary. The vocabulary determines what can be recognized: any word NOT part of the vocabulary cannot be recognized. Therefore any word that is said by the children and which is not in the vocabulary will lead to at least one recognition error, possibly more, because the LM will lose its context and has a lesser chance to correctly predict the following words. But we might not want to use all of the words from the training corpus, to reduce the chance that words are acoustically similar and generate a higher probability of errors. Additionally, depending on the origin of the corpus, not all words will be relevant; e.g. there might be typos if the text was grabbed from online forums or chat rooms (online forums or chat rooms are possible sources for the text corpus that we need to look at in more detail). The design of the vocabulary also has to take into account the indicators needed by the automatic adaptivity; e.g. fillers or hesitations have to be part of the vocabulary. The OOV rate measures how well the vocabulary fits the test data, specifically how many words of the test set were missed by the selection of the vocabulary.
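As a concrete illustration of the two measures, the sketch below computes perplexity for a deliberately simple add-one-smoothed unigram LM and the OOV rate of its vocabulary. The toy word lists are illustrative stand-ins for the transcribed iTalk2Learn corpora, and a real LM would of course use higher-order n-grams.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Unigram counts plus corpus size; a stand-in for a real n-gram LM."""
    return Counter(tokens), len(tokens)

def perplexity(model, test_tokens):
    """exp of the average negative log-probability; lower is better."""
    counts, total = model
    vocab_size = len(counts)
    log_prob = 0.0
    for tok in test_tokens:
        # Add-one smoothing so unseen words still get non-zero probability.
        p = (counts.get(tok, 0) + 1) / (total + vocab_size + 1)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(test_tokens))

def oov_rate(vocabulary, test_tokens):
    """Fraction of test words that fall outside the chosen vocabulary."""
    return sum(1 for tok in test_tokens if tok not in vocabulary) / len(test_tokens)

train_tokens = "one half is the same as two quarters".split()
test_tokens = "two halves is one whole".split()
model = train_unigram(train_tokens)
print(f"perplexity: {perplexity(model, test_tokens):.1f}")
print(f"OOV rate:   {oov_rate(set(train_tokens), test_tokens):.2f}")  # 2/5 words unseen
```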

The AM cannot easily be tested alone, because in the final system it is always combined with an LM. To evaluate its quality, log files of the training process will be analyzed, which show whether the training process converges to the training data. Then a combined system containing an AM and an LM will be tested. The standard indicator is the word error rate (WER), which is the percentage of mis-recognized words in the test set. For this aim (i.e., testing the speech recognition system) we transcribe the students' utterances and compare these transcripts to the outcome of the speech recognition system. In our setup, where we envision using speech recognition, for instance, to assess the children's use of the correct mathematical terminology, and are therefore interested in these keywords only, the usual measures are precision and recall. These are calculated from the two possible types of error: an uttered keyword was not recognized, which is called a false negative; or a keyword was recognized when it was not said, which is called a false positive. Depending on the usage scenario one of these types of error may be more severe than the other; the training parameters can then be adjusted to find the optimal operating point. Precision is the percentage of correctly recognized keywords among all recognized keywords, whereas recall is the percentage of correctly recognized keywords among all uttered keywords.
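The sketch below makes both evaluation measures concrete: WER as word-level edit distance between the reference transcript and the recognizer output, and precision/recall over a small keyword list. The example sentences and keyword set are illustrative only.

```python
from collections import Counter

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[len(ref)][len(hyp)] / len(ref)

def keyword_precision_recall(reference, hypothesis, keywords):
    """Precision/recall over keyword occurrences (multiset intersection)."""
    ref_kw = Counter(w for w in reference.split() if w in keywords)
    hyp_kw = Counter(w for w in hypothesis.split() if w in keywords)
    matched = sum((ref_kw & hyp_kw).values())
    precision = matched / sum(hyp_kw.values()) if hyp_kw else 1.0
    recall = matched / sum(ref_kw.values()) if ref_kw else 1.0
    return precision, recall

ref = "the numerator is two and the denominator is five"
hyp = "the numerator is two and the nominator is five"  # one substitution error
print(word_error_rate(ref, hyp))                         # 1/9, about 0.11
print(keyword_precision_recall(ref, hyp, {"numerator", "denominator"}))  # (1.0, 0.5)
```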

Speech recognition will be included in the development of automatic adaptivity as described below.

2.3 Automatic adaptivity

As described in the introduction, the iTalk2Learn project aims to support the acquisition of robust knowledge, which includes procedural skills and conceptual knowledge (Koedinger et al., 2012). Procedural skills can be fostered by structured practice; conceptual knowledge by learning with exploratory tasks. Thus, we aim to develop a platform including both structured practice and exploratory activities by building on and enhancing existing tutors and developing new components (i.e., the ELE as described above). The platform will further facilitate natural interaction through speech recognition.


The development of automatic adaptivity for the iTalk2Learn platform will focus on three threads:

1) Concerning structured practice, we integrate existing tutors (Whizz and Fractions Tutor; however, Fractions Tutor must first be translated and adapted to the needs of German students) in our iTalk2Learn platform. To improve these tutors we use recommender technology to develop an adaptive content sequencer which selects the order of tasks depending on the student's learning process (see 2.3.1).

2) In addition, we focus on how to best combine structured practice and exploratory activities in order to reach our goal of fostering robust learning (see 2.3.2). In other words: when should students switch from structured practice to exploratory tasks, and vice versa?

3) Finally, we aim at guiding students during the whole learning process (see Table 2 and 2.3.3). With regard to structured tasks, the existing tutors already provide task-dependent hints to students. For our newly developed exploratory learning tasks we aim to also develop task-dependent support. In addition, we plan to develop task-independent support that takes students' utterances into account (for instance, we envisage that the system could prompt students to use mathematical terminology).

All threads will include work with and without indicators from speech.

Table 2: Task-dependent and task-independent support

Structured tasks
- Task-dependent support: as provided by Math-Whizz and Fractions Tutor.
- Task-independent support: to be implemented by iTalk2Learn (for instance, encouragement to use mathematical terminology).

Exploratory tasks
- Task-dependent support: to be implemented by iTalk2Learn (hints and feedback during exploratory tasks).
- Task-independent support: to be implemented by iTalk2Learn, as above.

2.3.1 Sequencing for structured tasks

As indicated above, the first component of automatic adaptivity aims at developing an adaptive content sequencer for the iTalk2Learn platform.

Phase 1: Preparation for design experimentation

To reach the goal of an adaptive content sequencer we build a probabilistic model that selects tasks based on recordings of learner-tutor interactions. Each new observation (i.e., interaction with the tutor) can be utilized to shape the model to select tasks that maximize our goal of fostering robust learning. Realizing adaptive sequencing is not easy, because the quality of the sequencer needs to be evaluated online (i.e., while students are interacting with the system).


One possible approach would be to record a dataset that explores the relationship between the sequence of the content and learning (i.e. exploring the majority of possible element combinations of a sequence). This would mean that all students (high achievers and low achievers) would have to solve very difficult tasks as well as easy tasks in a variety of combinations. This process could easily frustrate the young learners of our target population (Chi et al., 2011). We will therefore initially implement the sequencer exploiting performance prediction methods, by building a model using historic data. With historic data we refer to log files of the structured tutors that are already available. A similar approach was used by Koedinger and colleagues (2011). However, we will need to collect new data from students actually using the system with the developed sequencer to test its quality. This will be part of the second phase (i.e., conducting the design experiment).

At the beginning, the adaptive sequencing will consider performance measurements (for instance, success and error rate, time needed, etc.). We will conduct an analysis of historic Whizz data in order to evaluate which log files can be used to create an efficient student model for performance prediction. Corrupted data (e.g., a negative number for age) will be removed, and log data of students "gaming the system" (i.e., students not really learning but engaging, for example, in trial-and-error behavior to finish the assigned tasks) will be discussed. The analysis of Whizz data also involves evaluating possibilities of creating metadata that could give us more information about the actual condition of the user. Possible information in the metadata could include the error frequency, the tendency to ask for help, etc. This information will be utilized to bias the performance predictor in order to obtain a more precise prediction. The same procedure will be undertaken for Fractions Tutor; however, in this case we will already need to collect some new data in this first phase because historic data of the Fractions Tutor can only be used in a very limited way, for two reasons: 1) We will translate the Fractions Tutor and adapt it to the German students' needs. These modifications may influence the fit between the model developed with historic data and data collected with the modified Fractions Tutor. 2) The Fractions Tutor does not present the tasks in many different orders. Data from different orders of the tasks is needed to develop the model.
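A minimal sketch of this kind of log screening and metadata construction is shown below. The record fields (age, correct, duration_s, hints) are hypothetical placeholders, since the actual schema of the Whizz log files is not specified here.

```python
def clean(records):
    """Drop corrupted rows, e.g. impossible ages or negative durations."""
    return [r for r in records if 0 < r["age"] < 18 and r["duration_s"] >= 0]

def student_metadata(records):
    """Aggregate per-student indicators used to bias the performance predictor."""
    total = len(records)
    return {
        "error_rate": sum(1 for r in records if not r["correct"]) / total,
        "help_rate": sum(r["hints"] for r in records) / total,
        "mean_time": sum(r["duration_s"] for r in records) / total,
    }

logs = [
    {"age": 10, "correct": True,  "duration_s": 42.0, "hints": 0},
    {"age": -3, "correct": True,  "duration_s": 5.0,  "hints": 0},  # corrupted row
    {"age": 10, "correct": False, "duration_s": 90.0, "hints": 2},
]
print(student_metadata(clean(logs)))
# {'error_rate': 0.5, 'help_rate': 1.0, 'mean_time': 66.0}
```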

Finally, we will integrate information gained from the speech recognition into the model to adapt the sequence accordingly. In the initial phase, we evaluate existing information from state-of-the-art methods and Sail's previous experience in speech analysis. For example, if we can detect utterances by the students indicating that they perceive the task as being too easy, we will bias the task selection model in such a way that students subsequently receive a more difficult task. The model will be built with the collected and preprocessed iTalk2Learn data.
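To make the biasing idea concrete, here is a minimal, hypothetical sketch: if the recognizer flags a "too easy" utterance, the predicted success probabilities are shifted so that a harder task is preferred. The predictor values, the bias weight, and the task pool are all illustrative assumptions, not project-specified values.

```python
def select_next_task(predicted_success, too_easy_detected, target=0.7, bias=0.15):
    """Pick the task whose (possibly biased) predicted success is closest to target.

    A flagged "this is too easy" utterance inflates all predictions, which
    shifts the choice towards a harder task.
    """
    shift = bias if too_easy_detected else 0.0
    return min(predicted_success,
               key=lambda task: abs(predicted_success[task] + shift - target))

predicted = {"easy_halves": 0.95, "medium_quarters": 0.75, "hard_mixed": 0.55}
print(select_next_task(predicted, too_easy_detected=False))  # medium_quarters
print(select_next_task(predicted, too_easy_detected=True))   # hard_mixed
```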

Phase 2: Conducting design experimentation

In a second step the developed sequencer will be implemented in the iTalk2Learn platform. As already said, the model will be built with the preprocessed data and tested with some elementary students recruited with the help of the Whizz user database in England and the RUB Schülerlabor in Germany. In this test phase we will collect the interactions in log files to evaluate students' learning processes and report differences between those using the adaptive version and those using the original versions of Whizz or Fractions Tutor.


We will further evaluate whether recognizing students' speech provides useful information for sequencing the tasks. In this regard we aim at testing whether the use of speech indicators allows us to model the knowledge acquisition of the student. We expect that by including speech indicators in the model of the sequencer we will be able to provide an improved trajectory of content to the students.

Requirements and state of the art for adaptive intelligence will be reported in D2.1. A first and a more advanced prototype of the adaptive sequencer will be delivered and documented in D2.2.1 and D2.2.2, respectively. The outcomes of the evaluation including students' speech will be reported in D3.4.

2.3.2 Automatic switching between structured and exploratory tasks

In order to know how to combine structured and exploratory tasks, we are developing an intervention model. The theory-based intervention model will be described in D1.3. It will serve as a framework for answering two main questions:

1. Should students initially start the learning sequence with exploratory learning tasks or structured practice?

2. When should students switch between the two different types of tasks (i.e. from structured practice to exploratory learning tasks, or vice versa)?

Phase 1: Preparation for design experimentation

With regard to the first question, we conduct a literature review focusing on instructional approaches that use different sequences of structured and exploratory learning tasks (e.g., Van Merriënboer, Clark & de Croock, 2002; Reigeluth, Merrill, Wilson & Spiller, 1980). In particular, we focus on a current scientific debate about the timing of exploratory tasks, structured tasks, and instruction (e.g., Kapur, 2008; Kapur & Bielaczyc, 2012; Kirschner, Sweller & Clark, 2006; Schwartz & Bransford, 1998). More detail will be provided in D1.3.

In order to answer the second question we need to determine when switching between structured practice and exploratory tasks is useful. With regard to this goal, we aim at identifying specific learning indicators which show, for instance, that the respective student has already developed (enough) procedural or conceptual knowledge and hence might not need to repeatedly engage in the same type of task. In such a situation switching to the other type of task may provide students with further learning opportunities. For identifying such learning indicators (e.g. a decrease of time spent on a single task for procedural knowledge gains), we are reviewing literature from the field of HCI in general and from the field of educational technology in particular. In addition to learning indicators that can be identified in students' performance on a task, we envisage that the system could also adapt its recommendation for the next tasks based on utterances that inform about students' perception of tasks (difficult, boring, etc.).
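A minimal, hypothetical sketch of how such indicators could be combined into a switching rule is given below: steadily decreasing time-on-task together with a high success rate is read as "enough procedural practice", and a boredom utterance flagged by speech recognition can also trigger a switch. The thresholds and the boredom flag are illustrative assumptions, not project-specified values.

```python
def should_switch_to_exploration(task_times, successes, bored_utterance=False,
                                 min_tasks=4, success_threshold=0.8):
    """Recommend a switch once procedural fluency (or boredom) is signalled."""
    if len(task_times) < min_tasks:
        return False  # not enough evidence yet
    speeding_up = all(a > b for a, b in zip(task_times, task_times[1:]))
    success_rate = sum(successes) / len(successes)
    return (speeding_up and success_rate >= success_threshold) or bored_utterance

print(should_switch_to_exploration([80, 65, 50, 40], [1, 1, 0, 1, 1]))  # True
print(should_switch_to_exploration([80, 85, 70, 90], [1, 0, 0, 1]))     # False
```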

Building on the (theoretical) intervention model, we will develop methods to automatically switch between structured and exploratory tasks as the learner progresses within the iTalk2Learn platform. No state-of-the-art literature is available for machine learning that could be applied to this task. Thus,


accomplishing this task requires a lot of innovative work by several partners. In particular, UHi, RUB, and IOE will collaborate closely to develop a method for automatic switching.

Phase 2: Conducting design experimentation

In order to substantiate our theoretical considerations, we intend to test different sequences with small samples of students. These tests will provide further indicators to refine our intervention model.

Furthermore, we will test the automatic switching between structured tasks and exploratory tasks in iterative design cycles with a small set of students. The first trials will evaluate switching based on indicators derived from students' actions in the system. Later trials may additionally include indicators derived from speech. We propose to evaluate automatic switching by measuring the number of successfully solved tasks and the required time. In addition, we will implement a posttest. The descriptive statistics of these measurements will provide feedback for the refinement of switching between structured tasks and exploratory tasks.

2.3.3 Support (Hints and Feedback)

As indicated earlier (see Table 2), support (i.e., hints and feedback) can be given at two levels: task-dependent and task-independent.

Task-dependent support:

Phase 1: Preparation for design experimentation

Regarding structured tasks, the tutors (i.e., Whizz and Fractions Tutor) already provide hints depending on the student's progress. As already mentioned, we will leave the hint functionalities of the existing tutors intact.

For the exploratory learning tasks, we will develop task-dependent support, because research on guided discovery learning has shown that support is a prerequisite for learning in these settings (e.g. van Joolingen, de Jong, Lazonder, Savelsbergh, & Manlove, 2005; also see D1.1). Thus, the development of adaptive support is highly interlinked with the development of the exploratory learning tasks described in 2.1. Similar to the above components, we derive indicators for when to provide what kind of support from a literature review and from our early observations with students. The development of the exploratory tasks that has been described above also helps toward deriving initial information that facilitates the design of the task-dependent support to be provided in the ELE.

Phase 2: Conducting design experimentation

With respect to the task-dependent support within the exploratory learning tasks, we will undertake WOZ studies. These studies act as both design and evaluation studies. Evaluating the performance of ELE support is a difficult question by itself; traditional strategies, like comparing a guided version to an unguided version of the exploratory learning tasks, are not a valid approach, because it is well understood that an unguided version will not result in any productive learning; that is, in any event support will be better than no support. IOE and BBK have devised a methodology to evaluate the performance of


intelligent support in exploratory environments (exemplified in Mavrikis et al., 2012), and it will be used in the context of iTalk2Learn. The methodology relies on various metrics that allow identifying design problems and measuring students' perception of the intelligent support at various implementation stages. For example, we have employed the metric of 'relevance' as a measure of how many support interventions made by the system were relevant for the student, i.e. were appropriate to the situation as judged by experts (our only available gold standard). We will adapt this methodology according to the requirements of this project.
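For illustration, the 'relevance' metric reduces to a simple proportion over expert-coded interventions; a minimal sketch, with hypothetical judgement labels:

```python
def relevance(expert_judgements):
    """Share of support interventions judged appropriate by experts."""
    return sum(expert_judgements) / len(expert_judgements)

# One boolean per intervention in a session, coded by expert annotators.
session = [True, True, False, True, False, True]
print(f"relevance = {relevance(session):.2f}")  # 0.67
```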

In addition, we will aim to collect information on children's opinions of the exploratory environment in general, but also gauge their perception of the intelligent support in particular, in order to help us make design decisions such as the actual messages, their appearance, and the general approach we are taking.

Task-independent support:

Phase 1: Preparation for design experimentation

Task-independently (i.e., for structured tasks and exploratory tasks), we will provide further support building on advanced behavioral interaction interpretation (i.e., speech). For instance, we aim at deriving indicators of perceived task difficulty (e.g., students stating that the task is easy or difficult) and other affective factors (e.g. boredom or frustration) that can be used as additional information to adapt the task-independent support. We further envisage detecting students' use of mathematical terminology and providing feedback accordingly.

In order to derive speech indicators, we first record what students utter when working with the system. Additionally, we compile vocabulary lists for a) possible utterances regarding perceived task difficulty and b) relevant mathematical terminology. Both lists will be integrated in the LM and the AM of the speech recognition system. The speech recognition system will be trained to detect the words from these lists. In this regard, we have to run several training and testing cycles of the speech recognition system to find the optimal balance between boosting these words to ease their detection and not increasing the probability of false detections too much.

Phase 2: Conducting design experimentation

With respect to the task-independent support, we will test the developed support (e.g., feedback regarding the use of mathematical terminology, affective aspects, and perceived difficulty) with a small sample size in iterative design cycles. We will first conduct WOZ studies to test the general effect of the developed support (without relying on the system), both with and without including speech indicators. Afterwards the design experimentation will include tests with the "real" speech recognition system as described above.

3. Summative evaluation

In the summative evaluation the parallel developments of the project (i.e., exploratory tasks, automatic adaptivity, and speech recognition) will be brought together. All components are combined in a unifying


platform. This platform will allow integrating existing tutors for structured practice (i.e., Whizz or Fractions Tutor) and combining them with the developed ELE for exploratory tasks. The platform will include an automatic sequencer for sequencing the structured tasks and for switching between structured tasks and exploratory tasks. Furthermore, it will provide task-independent support. All adaptive components of the platform (i.e., sequencing structured tasks, switching between structured and exploratory tasks, and task-independent support) will work with and without speech indicators. For more information on the unifying platform see D4.1. The task-dependent support features described above are embedded in the existing tutors for structured tasks and will be embedded in the ELE for the exploratory tasks. These features rely on students' actions in the system (i.e., they are not based on speech indicators).

As described in chapter 2, our formative evaluation plan includes phase 1 (Preparation for design experimentation) and phase 2 (Conducting design experimentation) of the Educational Design Research methodology. The third phase (Retrospective analysis) of the Educational Design Research methodology will be conducted within the summative evaluation. This phase aims to evaluate the pedagogical and technological outcomes of the project in order to derive claims that are trustworthy. In particular, we will investigate two hypotheses: 1) Combining structured practice and exploratory tasks promotes robust learning. 2) Indicators from speech will enhance the automatic adaptivity. To test these hypotheses, we compare multiple versions of the unified platform as displayed in Table 3. Students will be randomly assigned to one of the conditions (i.e., they will work with one of the versions).

Table 3: Conditions in the experiments of the summative evaluation

Sequencing structured tasks²
- Full version with speech: yes, with speech indicators.
- Full version without speech: yes, without speech indicators.
- Version without ELE: yes, with or without speech indicators.

Switching between structured and exploratory tasks³
- Full version with speech: yes, with speech indicators.
- Full version without speech: yes, without speech indicators.
- Version without ELE: no.

Task-independent support
- Full version with speech: yes, with speech indicators.
- Full version without speech: yes, without speech indicators.
- Version without ELE: yes, with or without speech indicators.

Comparing the version without ELE and the full version without speech to the full version with speech allows testing the two mentioned hypotheses, which are described in more detail below:

² The structured tasks include task-dependent support as provided by Whizz or Fractions Tutor.
³ The exploratory tasks include task-dependent support, which will be developed as described in 2.3.3.


Hypothesis 1: Robust learning requires procedural skills and conceptual knowledge. Switching between structured tasks and exploratory tasks is required to foster both types of knowledge. Thus, a system including exploratory learning tasks (i.e., a full version) will foster robust learning better than a system with structured tasks only (i.e., a version without ELE). To test hypothesis 1, we will implement a control condition without the newly developed exploratory learning environment (i.e., the version without ELE). We will make sure that this condition only differs with regard to the type of task, not with regard to the content that can be learnt with the system.

Hypothesis 2: Speech recognition allows identifying learning indicators that cannot be measured otherwise. Including these indicators in the adaptivity of the system (i.e., the full version with speech) will foster learning in comparison to an adaptive system without speech-based learning indicators (i.e., the full version without speech). To test hypothesis 2 we will implement a control condition that does not use speech recognition to adapt sequencing, switching, and task-independent support (i.e., the full version without speech). In this condition, the adaptivity will be based only on students' keyboard entries.

As mentioned in the DOW, the pedagogical and technological outcomes of the project will be evaluated in two application scenarios in two European languages with quantitative learning measures. One experiment will take place in a controlled setting (i.e., a lab study) in Germany using content from the Fractions Tutor for the structured tasks. The other experiment will take place in the UK using the platform online (i.e., in a more open and less controlled setting), using content of Whizz for the structured tasks. In both scenarios, Whizz or Fractions Tutor will be combined with the developed ELE for the full versions (see Table 3).

The experiments will examine the general effectiveness of the adaptive intelligent support for robust learning and test the aforementioned hypotheses. In both experiments, we will test for learning outcomes. In order to be able to compare the results of both experiments, we unify the metrics as follows: We choose system-inherent learning indicators that can be logged by both systems, Whizz and Fractions Tutor, as well as by the developed ELE. Moreover, the post-test will consist of the same, yet translated, items measuring procedural skills and conceptual knowledge. Items testing for procedural skills will present problems isomorphic to the ones students worked on with the system. Thus, these items will require students to apply the procedures they learnt during the interaction with the system. Items testing for conceptual knowledge will ask for explanations to identify students' understanding of the learnt concepts, and ask students to adapt the learnt procedures to unfamiliar problems. The post-test will be piloted with students from our target group.

In addition to student learning, we will implement surveys measuring students' satisfaction with the system and their motivation/engagement. We will rely on metrics appropriate for students of this age. In previous projects, we have employed a 5-point Likert scale appropriate for children to evaluate important constructs including helpfulness, repetitiveness, comprehension, and affect (see Mavrikis et al., 2012), using a visual analogue scale that employs pictorial representations that children can relate to (e.g. the Fun Toolkit in Read, MacFarlane, & Casey, 2002).


3.1. Experiment in Germany

As already indicated above, RUB will evaluate the iTalk2Learn platform in a controlled setting in Germany. For the experiment, we will recruit children from elementary schools (n = 30 per condition) with the help of the "Schülerlabor" of the Ruhr-Universität Bochum. The RUB Schülerlabor is an extra-curricular location where classes can spend a day working on a specific, well-prepared topic that goes beyond the school curriculum. Due to these activities, the RUB Schülerlabor has many school contacts that will be activated to recruit students. Students will work with different versions of the system (cf. Table 3), and we will collect their interactions in log files as well as their learning outcomes measured by a (paper-based) post-test.

3.2. Experiment in England

The experiment in England will be conducted online through the iTalk2Learn platform. We will advertise through emails to IOE contacts and the Whizz customer base (those that fit our target population). We will further rely on previous agreements with two English schools and other contacts that we are making as the project progresses. This way we hope for a substantial number of students (around 40 per condition). Note that this experiment may take place in a less controlled setting (e.g. students may be asked to interact with the platform over a longer period of time from home), so the results will be qualified accordingly.

It is worth stating that the implementation of a speech recognition system in an uncontrolled setting is a risky task, due to the complexities and uncertainties behind the speech recognition software working in varied, uncontrolled contexts. Thus, the experiment will provide insights into the ecological validity of our approach.

4. Conclusion

To conclude, we plan to evaluate the iTalk2Learn progress and outcomes in two ways. First, we evaluate the process as described in the formative evaluation plan (chapter 2) using Educational Design Research methodologies. During the formative evaluation, the components of the iTalk2Learn platform (i.e. exploratory tasks, automatic adaptivity, and speech recognition) will be developed and evaluated in an iterative process of design experimentation. We will work on the components in parallel to ensure that the project progresses on time. The results of the formative evaluation will be reported in D5.2. These results will inform the project about progress and the usability of the developed components. Moreover, the results of the formative evaluation will allow us to derive specific hypotheses for the experiments conducted as part of the summative evaluation.

The summative evaluation will test the hypotheses described above (which will possibly be modified and specified depending on the results of the formative evaluation) using quantitative methods with larger sample sizes. To test the generalizability of our outcomes and the applicability of the iTalk2Learn platform to different educational settings, the summative experiments will be conducted in two


European countries (Germany and the UK), using different tutorial systems for the structured tasks (Fractions Tutor and Whizz) in two settings (a controlled laboratory setting and an open online or classroom setting). The results of the summative evaluation will be reported in D5.3.

References

Ainsworth, S., & Loizou, A. T. H. (2003). The effects of self-explaining when learning with text or diagrams. Cognitive Science, 27, 669-681.

Chi, M. T. H., Bassok, M., Lewis, M., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.

Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical tactics. User Modeling and User-Adapted Interaction (UMUAI), 21, 137-180.

Cobb, P., Confrey, J., diSessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9-13.

diSessa, A., & Cobb, P. (2004). Ontological innovation and the role of theory in design experiments. The Journal of the Learning Sciences, 13(1), 77-103.

Edelson, D. C. (2002). Design research: What we learn when we engage in design. Journal of the Learning Sciences, 11(1), 105-121.

Gravemeijer, K., & Cobb, P. (2006). Design research from a learning design perspective. In J. van den Akker, K. Gravemeijer, S. McKenney & N. Nieveen (Eds.), Educational Design Research (pp. 17-51). London: Routledge.

Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379-424.

Kapur, M., & Bielaczyc, K. (2012). Designing for productive failure. Journal of the Learning Sciences, 21(1), 45-83.

Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work. Educational Psychologist, 41(2), 75-86.

Koedinger, K. R., Corbett, A. T., & Perfetti, C. (2012). The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning. Cognitive Science, 36(5), 757-798.

Koedinger, K. R., Pavlik Jr., P. I., Stamper, J. C., Nixon, T., & Ritter, S. (2011). Avoiding problem selection thrashing with conjunctive knowledge tracing. In Proceedings of the Fourth International Conference on Educational Data Mining (pp. 91-100).

Lewis, C. H. (1988). Why and how to learn why: Analysis-based generalization of procedures. Cognitive Science, 12, 211-256.

Mavrikis, M., & Gutierrez-Santos, S. (2010). Not all wizards are from Oz: Iterative design of intelligent learning environments by communication capacity tapering. Computers & Education, 54(3), 641-651.

Mavrikis, M., Gutierrez-Santos, S., Geraniou, E., & Noss, R. (2012). Design requirements, student perception indicators and validation metrics for intelligent exploratory learning environments. Personal and Ubiquitous Computing. Available online at http://dx.doi.org/10.1007/s00779-012-0524-3

Nielsen, J. (1989). Usability engineering at a discount. In G. Salvendy & M. J. Smith (Eds.), Designing and Using Human-Computer Interfaces and Knowledge-Based Systems (pp. 214-221). Amsterdam: Elsevier Science.

Preim, B. (1999). Entwicklung interaktiver Systeme: Grundlagen, Fallbeispiele und innovative Anwendungsfelder [Development of interactive systems: Basic principles, case studies, and innovative fields of application]. Berlin, Heidelberg: Springer-Verlag.

Read, J., MacFarlane, S., & Casey, C. (2002). Endurability, engagement and expectations: Measuring children's fun. In Proceedings of Interaction Design and Children (pp. 189-198). Eindhoven: Shaker Publishing.

Reigeluth, C. M., Merrill, M. D., Wilson, B. G., & Spiller, R. T. (1980). The elaboration theory of instruction: A model for sequencing and synthesizing instruction. Instructional Science, 9(3), 195-219.

Rittle-Johnson, B., Siegler, R., & Alibali, M. (2001). Developing conceptual understanding and procedural skill in mathematics: An iterative process. Journal of Educational Psychology, 93(2), 346-362.

Schwartz, D. L., & Bransford, J. D. (1998). A time for telling. Cognition and Instruction, 16, 475-522.

Self, J. (1999). The defining characteristics of intelligent tutoring systems research: ITSs care, precisely. International Journal of Artificial Intelligence in Education, 10, 350-364.

van Joolingen, W. R., de Jong, T., Lazonder, A. W., Savelsbergh, E. R., & Manlove, S. (2005). Co-Lab: Research and development of an online learning environment for collaborative scientific discovery learning. Computers in Human Behavior, 21(4), 671-688.

VanLehn, K. (1999). Rule learning events in the acquisition of a complex skill: An evaluation of Cascade. Journal of the Learning Sciences, 8(1), 71-125.

Van Merriënboer, J. J. G., Clark, R. E., & de Croock, M. B. M. (2002). Blueprints for complex learning: The 4C/ID-model. Educational Technology Research and Development, 50, 39-64.

Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181.

Worsley, M., & Blikstein, P. (2011). What's an expert? Using learning analytics to identify emergent markers of expertise through automated speech, sentiment and sketch analysis. In Proceedings of the Fourth International Conference on Educational Data Mining (pp. 235-240).