
The Impact of Multi-Character Story Distribution and Gesture on Children's Engagement

Harrison Jesse Smith1, Brian K. Riley1, Lena Reed2, Vrindavan Harrison2, Marilyn Walker2, Michael Neff1

1 University of California, Davis, Davis CA 95616 {hjsmith, bkriley, mpneff}@ucdavis.edu

2 University of California, Santa Cruz, Santa Cruz CA 95064 {lireed, vharriso, mawalker}@ucsc.edu

Abstract. Effective storytelling relies on engagement and interaction. This work develops an automated software platform for telling stories to children and investigates the impact of two design choices on children's engagement and willingness to interact with the system: story distribution and the use of complex gesture. A storyteller condition compares stories told in a third-person, narrator voice with those distributed between a narrator and first-person story characters. Basic gestures are used in all our storytellings but, in a second factor, some are augmented with gestures that indicate conversational turn changes, reference other characters, and prompt children to ask questions. An analysis of eye gaze indicates that children attend more to the story when a distributed storytelling model is used. Gesture prompts appear to encourage children to ask questions, something that children did, but at a relatively low rate. Interestingly, the children most frequently asked "why" questions. Gaze switching happened more quickly when the story characters began to speak than for narrator turns. These results have implications for future agent-based storytelling system research.

Keywords: Embodied Storyteller · Listening Comprehension · Primary School · Case Study.

1 Introduction

For many, being read a bedtime story is a fond childhood memory. This comforting experience also creates excitement as the new world of the story unfolds. While enjoyable, such storytelling also provides the foundation for developing listening comprehension and later reading comprehension skills [28, 19, 20, 38, 48, 16, 59, 63], which are critical to educational attainment.

While the level of quality home language input that low socioeconomic status (SES) children receive is debated [25, 57], it is clear that the absence of such language input can negatively affect a child's early language development skills [26, 27, 8, 14, 4, 35, 44, 64, 67, 31, 56]. If children do not have adequate language skills


in the primary grades, they are likely to have persistent academic difficulties [30, 56, 62], leading to long-lasting consequences [52].

Computer storytelling apps may provide a way to address this early exposure gap and remediate, at least in part, the early educational deficit by providing high-quality language exposure at home or in the classroom. They can be displayed on phones or tablets and deployed at low cost. It remains unclear, however, how to effectively design these apps in order to maximize child engagement.

To help answer this question, we present the results of an experiment that examines two factors in story presentation. The first compares narrator-only storytellings (third person) with tellings that distribute the text between a narrator and story characters (first and third person). The second factor varies the amount of nonverbal behavior present in the characters, comparing a condition that uses only beat gestures and subtle head nods with a condition that includes character deixis gestures, turn-taking cues, and interaction prompts (see Section 2.3 for gesture type definitions).

To investigate these factors, we used a custom-built Unity application and cloud-based text-to-speech software to present four Aesop's fables in a repeatable, controlled fashion. The storytelling application is shown in Fig. 1. During the story presentation, we recorded participants' gaze locations. After each story concluded, the system solicited and recorded questions asked by the participant.

The experiment was run as a 2x2, within-subjects study focused on children aged 5-8. Results indicate that a multi-character, distributed telling of the story is more engaging than a narrator-only telling, based on gaze behavior. The impact of nonverbal communication appears complicated, as the additional animation of a conversational turn handover can hold student attention, rather than directing it at the intended target. However, there is some evidence that question-prompting gestures can help elicit feedback from children. Question elicitation at the end of the story resulted in questions 20% of the time, most of which (70%) were different types of why questions.

The results reported in this paper have implications for future automated and/or interactive storytelling applications. They suggest that presenting stories from multiple characters' first-person points of view is an effective way to increase student engagement. While question-prompting gestures may be a useful way to elicit questions from students, it is unclear whether nonverbal turnover gestures are an effective method for signalling the next speaker to children aged 5-8. The frequency and types of questions asked are useful for developing a conversational storytelling framework, which is a long-term aim of this project.

The contributions of this work are as follows:

– We show that a distributed storytelling model results in significantly higher engagement than a narrator-only model.

– We present data suggesting that question-prompting gestures may be effective for eliciting questions from children.

– We present the frequency and types of questions asked by children to an automated storytelling application.


– We present data showing that nonverbal turnover gestures may not be an effective method of signalling the next speaker to children aged 5-8.

– We demonstrate a webcam-based method for collecting gaze data, useful in certain experimental settings.

Fig. 1. Left: The laptop and camera placement used to collect video footage of the participants. Right: The video footage used to extract gaze targets. Inverted screen capture overlays are added in a post-processing step to provide additional context for the annotators.

2 Background

2.1 Automated Storytelling and Editing

Recognizing the importance of reading and storytelling for children's development, related work has also focused on improving children's reading skills. Project LISTEN was one of the first systems in this area: it aimed at computer tutors that could listen to a child read aloud and provide help where needed with pronunciation and other types of reading-aloud errors [47, 2]. Other work has focused on virtual peers for pedagogical purposes and tested the effect of having the peer model more advanced storytelling behaviors [15, 10, 55]. Storytelling agents have also been explored with robots as reading companions and tutors [17, 36, 65, 49], including studies placing robots in classrooms over extended periods of time [32].

An automated storyteller could potentially tailor the text of a story based on the needs of the current user. Adjusting vocabulary level or narration point-of-view could result in a more effective and rewarding storytelling session. Due to the complexities of natural language, however, it is challenging to create robust methods for automatically editing text in non-trivial ways. Several researchers in the field of natural language processing have focused efforts on this problem. One set of researchers presented methods for automatically generating dialogues from monologues [51]. In another work, the author presented methods for generating full virtual-agent based performances, given only input text [50].


2.2 Eye Tracking in Multimedia Learning

Multimedia learning materials, which distribute conveyed information across multiple visual and/or audio channels, are widely used and are an effective way to foster meaningful learning outcomes in students [42]. In the past, the efficacy of such materials was commonly assessed using post-intervention interviews and behavioral assessments [53]. While such techniques are useful for measuring the overall learning outcomes induced by the materials, they do not provide the resolution necessary to link detailed behaviors of a participant to on-screen causal elements [43]. Such linkages, and the insights they can provide, may aid in the creation of valuable design principles for different categories of multimedia applications.

An alternative to post-assessments is tracking participant gaze behavior. It is a useful measure for understanding how a viewer allocates their visual attention and how this engagement temporally fluctuates as a function of on-screen events [29]. Analyzing such engagement is particularly useful when developing design guidelines for interactive storytelling applications, whose primary purpose is fostering listening comprehension in the viewer. Such applications should engage the viewer without resorting to seductive details (motions or other stimuli that are pleasant but distracting, and which do not further comprehension).

While interest in eye tracking has increased rapidly in recent years [3], relatively few researchers have studied the eye movements of early grade school students interacting with multimedia stimuli [46, 60]. Neither study reported eye tracking of students observing the stimuli in an in-use classroom. This may be due to the chaotic nature of such classrooms, the expense and sensitivity of eye-tracking hardware, and the difficulty of properly calibrating and controlling the behaviors of a young child during a sedentary experiment. In contrast, the current study focuses on engagement and attention of early grade school students within in-use classrooms and makes use of multiple web cameras to record gaze behaviors.

2.3 Gesture

To further engage the child, we endow the child-like narrator with nonverbal communication behaviors, as endorsed by the PAL framework [34] and other related work on pedagogical agents and agent personality [11, 37, 41]. Studies of teacher communication have found a cluster of nonverbal behaviors that are particularly effective in the teaching context. Termed "immediacy", these factors generate positive affect and include eye contact, smiling, vocal expressiveness, physical proximity, leaning towards a person, using appropriate gestures, and being relaxed [58, 5, 6, 33]. They are consistently shown to impact affective learning [54, 7, 18], which affects students' predisposition towards material and motivation to use knowledge [6, 9]. Their impact on cognitive learning is less clear, with mixed findings [18, 54]. Deictic (or pointing) gestures help ground the conversation by establishing shared reference [45] and can help children distinguish ambiguous speech sounds [61]. Speech that is accompanied by gesture


leads to better recall than the same speech without gesture [13]. In teaching settings, gesture can provide a second representation, and multiple representations are known to enhance learning [23].

Beat gestures [45] are small, downward movements of the arms and hands that accompany the cadence of speech and may add emphasis, but do not convey clear meaning. They are used in this work to make the characters appear more alive. Deictic gestures [45] are used to create reference, such as by pointing. Backchannels, such as head nods and affirmative utterances, are used by the listener to signal their agreement with the speaker [66]. Conversational turn management in human dialog is largely nonverbal [66], motivating its use here.

3 Method

Participants. Participants from four K-2 classrooms in two schools in the United States participated in this experiment. Consent from school administration, classroom teachers, parents, and an institutional review board was obtained prior to the study. All participants spoke English and had normal or corrected-to-normal vision. In total, 33 participants, 12 girls and 21 boys, were included in the final analysis. Their ages ranged from 5-8 years old (M = 6.4, SD = 1.05).

Design. The study used a 2x2 experimental design in which every participant observed all four stimuli combinations. A within-subjects design was used to minimize sources of non-task-related variance, such as participants' baseline attention spans or moods on the day of the experiment. The first factor was Storytelling Perspective, which employed a Narrator Only level and a Distributed level. The second factor was Gesture Types, which employed a Complex Gesture level and a Simple Gesture level. A single story was used for each condition combination (see Fig. 2).

Materials. Four Aesop's Fables were selected for use in the experiment. Aesop's Fables are commonly used in studies on (oral) narrative comprehension and are often used in teaching materials for the K-2 age group. We selected four fables that could be animated using Narrator, Fox, and Crow characters. These were The Fox and the Grapes, The Fox and the Crow, The Dog and His Shadow, and The Crow and the Pitcher. The fable The Dog and His Shadow was converted to The Fox and His Shadow in order to use the Fox's character model and gestures.

The original text of the stories came from the versions of Aesop's Fables distributed as part of Elson's Drama Bank [21, 1]. For each story we produced (by hand) a version of the story with simpler sentences and simpler vocabulary; these story versions were double-checked by a learning scientist for their age appropriateness. Because all the original stories are presented in third person by a narrator, we used the Fabula Tales natural language generation engine to generate first-person direct versions of story sentences for half of the stories [39, 40].


Fig. 2. Overview of the conditions, along with story names, example images, and story text. The example image in row A shows the Question Gesture and the example images in row B show Nonverbal Turnover Gestures.

Fig. 2 provides examples of how each story was told and how the content was distributed amongst the characters. In the Narrator Only condition the Narrator recounted the entire story (see Rows a and c of Fig. 2). The Narrator refers to the story characters in third person, and all utterances and gestures are produced by the Narrator. The story characters appear on the screen but do not speak.

The first-person, direct speech versions of the stories are used in the Distributed condition, and thus the storytelling is split between the onscreen characters (Rows b and d of Fig. 2). The Narrator only produces the utterances that describe actions. Utterances that provide content for character speech and thought are converted to first-person direct speech and spoken by the character to whom the speech or thought is attributed, e.g., "What a beautiful bird I see! Nobody is as beautiful ..." in Row b of Fig. 2.


While all stories employed character blinks, idle breathing motions, and minor head/arm beat gestures, the Complex Gesture condition included three types of gestures not present in the Simple Gesture condition: question prompt gestures, deictic gestures, and nonverbal turnover gestures. Question prompt gestures (see the example image of Fig. 2-a) were performed by the Narrator while she verbally prompted the participants for questions about the story ("Now tell me, do you have any questions about the story?"); in the Simple Gesture condition, the Narrator only verbally prompted the participants. In the Narrator Only, Complex Gesture condition, the Narrator used two deictic gestures, pointing towards the Fox while verbally referring to him. The form of this gesture was identical to the nonverbal turnover gesture demonstrated by the Narrator in Fig. 2-b.

In the Distributed, Complex Gesture condition, characters performed conversational turnover gestures after they finished speaking, visually indicating which character would speak next (see example images in Fig. 2-b). In all stories there was a pause of 1.2 seconds between when one character stopped speaking and the next character began. When present, the conversational turnover gestures began as the character finished talking and took 0.75 seconds, leaving roughly 0.5 seconds before the next character began to speak.
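
As a minimal sketch (in Python rather than the authors' Unity implementation, and with hypothetical names), this timing can be expressed as an event schedule for a single turnover:

```python
# Sketch of the turnover timing described above: a 1.2 s silence between
# conversational turns, with the optional turnover gesture starting as the
# outgoing speaker finishes. Constants are taken from the text; names are ours.
PAUSE_BETWEEN_TURNS = 1.2   # seconds of silence between speakers
GESTURE_DURATION = 0.75     # duration of the nonverbal turnover gesture

def turnover_schedule(turn_end: float, complex_gesture: bool) -> dict:
    """Return event times (in seconds) for one conversational turnover."""
    events = {"next_speech_start": turn_end + PAUSE_BETWEEN_TURNS}
    if complex_gesture:
        events["gesture_start"] = turn_end
        events["gesture_end"] = turn_end + GESTURE_DURATION
    return events

print(turnover_schedule(10.0, complex_gesture=True))
# {'next_speech_start': 11.2, 'gesture_start': 10.0, 'gesture_end': 10.75}
```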

Stories were presented using a custom-built Unity application. The characters, story text, and gestures were provided as input to the system. AWS Polly Text-to-Speech was used to obtain speech audio and the viseme information necessary to drive character lip syncing behavior. At the end of each story, the Narrator would prompt the participant for questions about the story. During this period, the researcher used an external keyboard to control the Narrator in a Wizard-of-Oz fashion, triggering verbal and nonverbal backchanneling behaviors. After the child finished asking questions, the researcher initiated the next story.
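
To illustrate the text-to-speech step, the hedged sketch below shows how speech audio and viseme speech marks can be requested from AWS Polly via boto3. The voice, region, file name, and prompt text are illustrative assumptions rather than details from the paper, and valid AWS credentials are assumed.

```python
import boto3

polly = boto3.client("polly", region_name="us-east-1")
text = "Now tell me, do you have any questions about the story?"

# One request for the audio stream...
audio = polly.synthesize_speech(Text=text, VoiceId="Joanna",
                                OutputFormat="mp3")
with open("narrator_line.mp3", "wb") as f:
    f.write(audio["AudioStream"].read())

# ...and one for viseme speech marks: JSON lines with millisecond offsets
# that a client application can replay to select mouth shapes.
marks = polly.synthesize_speech(Text=text, VoiceId="Joanna",
                                OutputFormat="json",
                                SpeechMarkTypes=["viseme"])
for line in marks["AudioStream"].read().splitlines():
    print(line)  # e.g. {"time":125,"type":"viseme","value":"t"}
```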

Procedure. Stimuli were shown on a Dell Precision laptop with a 17-inch screen in a partially secluded classroom corner. Despite this separation, other students would sometimes distract the participant with their presence, actions, and noises. This environment therefore contained the same types of distractions that a child would experience while reading or working in school.

Upon starting the experiment, each participant watched an introductory segment in which the Narrator introduced herself, explained that she would be telling stories, and invited the participant to ask questions at the end of each story. Then all four stories were shown sequentially. Order was randomized to control for ordering effects. At the end of each story, the Narrator prompted the participant to ask any questions they had about the story. The entire procedure took, on average, 3.5 minutes. For an example screen recording showing the experimental stimuli presented to participants, please visit the following link: https://youtu.be/HEeQica-xHY.


Measures. Due to the in-classroom nature of our experiment, expensive, sensitive eye-tracking hardware was avoided. Instead, two webcams were positioned around the perimeter of the laptop screen to record the gaze behaviors of the participant (see Fig. 1-Left) for post-hoc annotation. Simultaneously, Open Broadcaster Software was used to record the contents of the screen. Taken together, this information was sufficient to determine when a participant was looking at the stimuli and at which character they were looking. See Fig. 1-Right for an example of the resulting video. The webcams also captured the questions each participant asked at the conclusion of each story.

Gaze Annotation. Two undergraduate annotators were hired to annotate gaze behaviors and transcribe the utterances of each participant. Based on the synced screen recording and dual webcam footage, annotators identified the participant's area of focus throughout the duration of the experiment by labeling it with one of four categories: Narrator, Fox, Crow, and Non-Task. Non-Task was used when the participant was not looking at any of the characters on the screen.

The data from one participant was used to train the annotators; both annotators, along with the lead researcher, collectively discussed and annotated the gaze behavior. Next, data from six participants (21 minutes, 19% of the remaining data) was independently annotated by each annotator. Inter-rater reliability was very high (observed agreement was 97% and Cohen's kappa was 0.93), so data from the remaining 26 participants was split between the annotators.
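
For readers wishing to reproduce this style of agreement check, the sketch below computes observed agreement and Cohen's kappa over frame-level labels; the label sequences shown are placeholders, not study data.

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder frame-level labels from two annotators (four categories).
annotator_a = ["Narrator", "Fox", "Fox", "Non-Task", "Crow", "Fox"]
annotator_b = ["Narrator", "Fox", "Crow", "Non-Task", "Crow", "Fox"]

# Observed agreement: fraction of frames where the labels match.
observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```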

4 Results

4.1 Visual Attention

Attention to Story. Using the gaze annotations, it was possible to determine the percentage of time participants were actively observing each story (viewing a character versus viewing a Non-Task category). These percentages are shown in Fig. 3. Summary attention statistics are given in Table 1.
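
This measure reduces to summing labeled gaze intervals. A minimal sketch, assuming an interval-based annotation format of our own invention:

```python
# Fraction of a story's duration spent looking at any character
# (Narrator, Fox, or Crow) rather than at a Non-Task target.
def attention_fraction(intervals, story_duration):
    """intervals: list of (start_s, end_s, label) gaze annotations."""
    on_task = ("Narrator", "Fox", "Crow")
    watched = sum(end - start for start, end, label in intervals
                  if label in on_task)
    return watched / story_duration

gaze = [(0.0, 12.5, "Narrator"), (12.5, 14.0, "Non-Task"), (14.0, 30.0, "Fox")]
print(f"{attention_fraction(gaze, 30.0):.1%}")  # -> 95.0%
```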

To assess whether attention differed significantly as a function of condition, we conducted a Friedman test of differences using the single factor of 'Condition' with four levels. While a repeated measures ANOVA is commonly used for 2x2 within-factors designs, the percentage values analyzed were not normally distributed, and thus the non-parametric Friedman test was used instead. The test rendered a Chi-square value of 9.13, which was significant (p=0.02). Post-hoc analysis using multiple Wilcoxon signed-rank tests with Bonferroni correction revealed multiple significant differences (Fig. 3, left). Distributed, Complex Gesture was significantly higher than both Narrator Only, Simple Gesture (p_adj < 0.01) and Narrator Only, Complex Gesture (p_adj = 0.02). Distributed, Simple Gesture was significantly higher than Narrator Only, Simple Gesture (p_adj < 0.01) and almost significantly higher than Narrator Only, Complex Gesture (p_adj = 0.09). Other differences were not significant.
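
A hedged sketch of this analysis pipeline in Python, using scipy and fabricated data in place of the study's unpublished per-participant percentages:

```python
import numpy as np
from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

# Fabricated attention percentages: 33 participants x 4 conditions.
rng = np.random.default_rng(0)
conds = {
    "NarrSimple":  rng.uniform(40, 100, 33),
    "NarrComplex": rng.uniform(40, 100, 33),
    "DistSimple":  rng.uniform(70, 100, 33),
    "DistComplex": rng.uniform(75, 100, 33),
}

# Omnibus non-parametric test across the four repeated-measures conditions.
stat, p = friedmanchisquare(*conds.values())
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")

# Post-hoc pairwise Wilcoxon signed-rank tests with Bonferroni correction.
pairs = list(combinations(conds, 2))
for a, b in pairs:
    _, p_raw = wilcoxon(conds[a], conds[b])
    print(f"{a} vs {b}: p_adj = {min(1.0, p_raw * len(pairs)):.3f}")
```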


Using the same technique, we evaluated the effect of order on attention (Fig. 3, right). As might be expected, attention wanes over time. Attention to the first story was significantly higher than to the third (p_adj = 0.008) and fourth (p_adj = 0.006) stories, and marginally higher than to the second story (p_adj = 0.10).

Table 1. Left: Summary statistics on the amount of attention paid to each story as a function of condition. Right: Summary statistics on the amount of attention paid to each story as a function of story order.

                     Condition                                           Order
                     Narrator,  Narrator,  Distributed,  Distributed,    1      2      3      4
                     Complex    Simple     Complex       Simple
Mean                 78.6%      79.5%      91.7%         89.9%           91.4%  85.2%  82.6%  80.5%
Standard Deviation   19.3       21.8       8.0           7.9             10.8   13.5   18.6   19.9

Fig. 3. The percentage of time students gazed at the Narrator, Fox, or Crow (as opposed to Non-Task) as a function of story condition. Error bars indicate standard error of the mean. Results significant at p_adj < 0.05 are denoted by an asterisk; a result approaching significance at this level is denoted by a dot.

Gaze Behavior during Conversational Turnovers. We next used gaze information to evaluate differences in the amount of time it took participants to focus on the next speaker after a conversational turnover. Because these turnovers only occur when two or more characters take turns speaking, this analysis was conducted only on data obtained from the Distributed conditions.

For each conversational turnover, we determined the time at which the new character began to speak. We then calculated, relative to this point, the amount


of time it took each participant to first glance at the new speaker. This value was positive if the speaker began talking before the participant looked to them, and negative if the participant looked to the speaker before they began to speak.

Using these values, an independent samples t-test was conducted to compare the differences in gaze switching time between the Distributed, Complex Gesture condition and the Distributed, Simple Gesture condition. The results are shown in Table 2, top. There was a significant difference between these two conditions, with participants taking longer to switch their gaze to the new character in the Distributed, Complex Gesture condition.
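
The measure and test can be sketched as follows; the offsets below are fabricated to echo Table 2's means and standard deviations, and the sample sizes are invented.

```python
import numpy as np
from scipy.stats import ttest_ind

def switch_offset(first_glance_s, speech_onset_s):
    """Positive: the participant looked after the new speaker began
    talking; negative: the participant anticipated the turnover."""
    return first_glance_s - speech_onset_s

# Fabricated gaze-switch offsets for the two Distributed conditions.
rng = np.random.default_rng(1)
complex_gesture = rng.normal(0.71, 1.47, 150)
simple_gesture = rng.normal(0.19, 1.86, 150)

# Independent-samples t-test on the pooled turnover offsets.
t, p = ttest_ind(complex_gesture, simple_gesture)
print(f"t = {t:.2f}, p = {p:.4f}")
```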

Table 2. Summary statistics of the amount of time, in seconds, it took participants to switch their gaze to a new speaker after that speaker first began their conversational turn. The top pair of rows compares instances in which turnover gestures were present to instances in which the gesture was absent. The bottom pair compares turnovers to the Narrator with turnovers to the Fox or the Crow.

                               Mean    Standard Deviation   P Value   T Statistic
Distributed, Complex Gesture   0.71    1.47                 0.004     2.88
Distributed, Simple Gesture    0.19    1.86
Turnover to Narrator           1.15    1.64                 <0.001    7.72
Turnover to Fox or Crow        -0.08   1.55

Fig. 4. Cumulative distribution functions showing the percentage of participants who looked to the next speaker relative to when the speaker began talking. Each line indicates a single conversational turnover from the story.


Table 3. Count of questions asked, separated by Gesture condition level.

                  Question Asked   No Question Asked   P Value   Chi Squared Statistic
Complex Gesture   18               48                  0.052     3.77
Simple Gesture    9                57

After visual inspection of the cumulative distributions of gaze switching vs. time (shown in Fig. 4), it appeared that, when the Narrator took over speaking, participants turned their gaze back to her more slowly and less frequently than with the other two characters. We therefore conducted a t-test to determine whether this difference was significant. The results are shown in Table 2, bottom. Participants took significantly longer to turn their gaze back to the speaker when the new speaker was the Narrator (p < 0.001). This could be because the participants were less interested in the Narrator (as they see her in every story), were more intrigued by the animal characters, and/or were more interested in the story characters.

4.2 Question Analysis

Question Frequency. In this study, we only elicited questions from participants at the end of each story. This protocol created 132 possible question opportunities and resulted in 27 questions. Seventeen participants asked no questions, eight asked one, six asked two, one asked three, and one asked four. To assess whether the Complex Gesture condition (and the question prompt gesture it contained) influenced participants' tendency to ask questions, we performed a chi-squared independence test. The results are given in Table 3. While the p = 0.05 level of significance was narrowly missed, this could be due to the small total number of questions collected. A larger sample may reveal that the question prompt gesture is a clear visual indicator encouraging children to interact with the system.
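
Because Table 3 gives the full contingency table, the reported statistic can be checked directly. In the sketch below, disabling Yates' continuity correction reproduces chi2 = 3.77 and p = 0.052; that the original test was run without the correction is our assumption.

```python
from scipy.stats import chi2_contingency

# Counts from Table 3: rows are conditions, columns are
# (question asked, no question asked).
table = [[18, 48],   # Complex Gesture
         [ 9, 57]]   # Simple Gesture
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 3.77, p = 0.052
```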

Question Type. We conducted an analysis of the types of questions the children asked in order to determine the needed future capabilities of a conversational storytelling system that can answer questions as the story unfolds. We expected questions about comprehension, of two main types: (1) questions based on understanding the meaning of sentences, arising from vocabulary or syntax within a sentence; and (2) questions based on inferring causality, since that is a key part of understanding narrative [12, 22, 24]. Our goal is to support these kinds of questions from students in a future version of our system, as well as to add question categories based on these to the narrator's repertoire. Examples are shown in Table 4. Q1 illustrates a comprehension question. We expect these would be more frequent if we allowed questions as the story unfolds. There were 19 why questions of different types. Questions Q2, Q3 and Q4 illustrate the 12 why questions related to causal understanding about how the world works or


Table 4. Example Questions from Participants

ID    N    Type                Example
Q1    1    Comprehension       What is a vine? (vocabulary)
Q2    12   Why, Causal Chain   Why did the Fox try to get the grapes? (hungry)
Q3                             Why did the Fox get the cheese?
Q4                             Why did she put rocks in the water? That sounds gross.
Q5    5    Why, Back Story     How did the Crow get the cheese?
Q6                             Why was the Crow so thirsty?
Q7    5    What Next           Is he going to get the water?
Q8                             The Fox will eat the bird?
Q9    2    Why, Storyline      Why wouldn't the Fox know that it was his reflection?
Q10                            How is the Fox able to listen to the bird sing when a bird can only chirp?

failure to fill in implicit actions or state changes. Q2 illustrates a very simple causal inference: the Fox is described as hungry, but two participants asked why the Fox wanted the grapes, while the other questions involve complex causal reasoning. The other question types target unexpected competencies that would be hard to support in our future conversational storyteller. Q5 and Q6 illustrate the five questions about the back-story, i.e., how the situation came to be at the start of the narrative, which is not part of the story content. There were also five questions about what might happen in the story world after the end of the story (What Next); this is illustrated by questions Q7 and Q8. This could partly be due to the fact that we only asked for questions at the end of the story. Finally, in Q9 and Q10 the participants question presuppositions of the story, i.e., that a Fox wouldn't recognize his reflection, and that birds can sing rather than simply chirp.

5 Conclusion

The greater visual attention children paid to stories presented in first person by story characters, in addition to the narrator, suggests that such distribution of storytelling may be an effective approach for building engagement. Gaze analysis also showed that children switched attention more quickly to story characters than to the narrator. The use of intentional gestures presents a mixed picture. It appears that gestures directed at the child are helpful in eliciting questions. Gestures for conversational turn management appeared to hold children's interest, rather than directing them to the next character to speak.

Children did ask questions of the system some of the time, and these were frequently why questions. In future work we plan to elicit questions and ask questions during the storytelling at particular story points, rather than simply at the end of the story. We expect this to increase children's engagement with the story and, hopefully, their narrative comprehension. We also wish to study deixis in cases where it is non-redundant with the text.

Acknowledgements: This work was supported through NSF grant IIS-1748058.


References

1. Aesop, Jones, V.S.V., Rackham, A., Chesterton, G.K.: Aesop's Fables: A New Translation. W. Heinemann (1933)

2. Aist, G., Kort, B., Reilly, R., Mostow, J., Picard, R.: Experimentally augmenting an intelligent tutoring system with human-supplied capabilities: adding human-provided emotional scaffolding to an automated reading tutor that listens. In: Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on. pp. 483–490. IEEE (2002)

3. Alemdag, E., Cagiltay, K.: A systematic review of eye tracking research on multimedia learning. Computers & Education 125, 413–428 (2018)

4. Alexander, K.L., Entwisle, D.R., Dauber, S.L.: First-grade classroom behavior: Its short- and long-term consequences for school performance. Child Development 64(3), 801–814 (1993)

5. Andersen, J.F.: The relationship between teacher immediacy and teaching effectiveness. Ph.D. thesis, ProQuest Information & Learning (1979)

6. Andersen, J.F.: Instructor nonverbal communication: Listening to our silent messages. New Directions for Teaching and Learning 1986(26), 41–49 (1986)

7. Andersen, J.F., Norton, R.W., Nussbaum, J.F.: Three investigations exploring relationships between perceived teacher communication behaviors and student learning. Communication Education 30(4), 377–392 (1981)

8. August, D., Hakuta, K.: Improving Schooling for Language Minority Students: A Research Agenda (1997)

9. Baylor, A.L.: Promoting motivation with virtual agents and avatars: role of visual presence and appearance. Philosophical Transactions of the Royal Society of London B: Biological Sciences 364(1535), 3559–3565 (2009)

10. Baylor, A.L., Kim, Y.: Pedagogical agent design: The impact of agent realism, gender, ethnicity, and instructional role. In: Intelligent Tutoring Systems. vol. 3220, pp. 592–603. Springer (2004)

11. Baylor, A.L., Ryu, J.: The effects of image and animation in enhancing pedagogical agent persona. Journal of Educational Computing Research 28(4), 373–394 (2003)

12. Bloom, C.P., Fletcher, C.R., Broek, P.V.D., Reitz, L., Shapiro, B.P.: An on-line assessment of causal reasoning during comprehension. Memory & Cognition (1990)

13. Breckinridge Church, R., Garber, P., Rogalski, K.: The role of gesture in memory and social communication. Gesture 7(2), 137–158 (2007)

14. Brooks-Gunn, J., Duncan, G.J.: The effects of poverty on children. The Future of Children 7, 55–71 (1997)

15. Cassell, J.: Towards a model of technology and literacy development: Story listening systems. Journal of Applied Developmental Psychology 25(1), 75–105 (2004)

16. Catts, H.W., Adlof, S.M., Hogan, T.P., Weismer, S.E.: Are specific language impairment and dyslexia distinct disorders? Journal of Speech, Language, and Hearing Research 48(6), 1378–1396 (2005)

17. Chang, A., Breazeal, C.: Tinkrbook: shared reading interfaces for storytelling. In: Proceedings of the 10th International Conference on Interaction Design and Children. pp. 145–148. ACM (2011)

18. Chesebro, J.L.: Effects of teacher clarity and nonverbal immediacy on student learning, receiver apprehension, and affect. Communication Education 52(2), 135–147 (2003)


19. Cunningham, A.E., Stanovich, K.E.: Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology 33(6), 934 (1997)

20. Duncan, G.J., Dowsett, C.J., Claessens, A., Magnuson, K., Huston, A.C., Klebanov, P., Pagani, L.S., Feinstein, L., Engel, M., Brooks-Gunn, J., et al.: School readiness and later achievement. Developmental Psychology 43(6), 1428 (2007)

21. Elson, D.: Dramabank: Annotating agency in narrative discourse. In: LREC. pp. 2813–2819 (2012)

22. Fletcher, C.R., Hummel, J.E., Marsolek, C.J.: Causality and the allocation of attention during comprehension. Journal of Experimental Psychology 16(2), 233–240 (1990)

23. Goldin-Meadow, S., Singer, M.A.: From children's hands to adults' ears: gesture's role in the learning process. Developmental Psychology 39(3), 509 (2003)

24. Graesser, A.C., Singer, M., Trabasso, T.: Constructing inferences during narrative text comprehension. Psychological Review 101(3), 371 (1994)

25. Hart, B., Risley, T.R.: Meaningful Differences in the Everyday Experience of Young American Children. Paul H. Brookes Publishing (1995)

26. Hoff, E.: Environmental supports for language acquisition. Handbook of Early Literacy Research 2, 163–172 (2006)

27. Hoff, E., Naigles, L.: How children use input to acquire a lexicon. Child Development 73(2), 418–433 (2002)

28. Hoover, W.A., Gough, P.B.: The simple view of reading. Reading and Writing 2(2), 127–160 (1990)

29. Hyona, J.: The use of eye movements in the study of multimedia learning. Learning and Instruction 20(2), 172–176 (2010)

30. Juel, C.: Learning to read and write: A longitudinal study of 54 children from first through fourth grades. Journal of Educational Psychology 80(4), 437 (1988)

31. Juel, C., Griffith, P.L., Gough, P.B.: Acquisition of literacy: A longitudinal study of children in first and second grade. Journal of Educational Psychology 78(4), 243 (1986)

32. Kanda, T., Hirano, T., Eaton, D., Ishiguro, H.: Interactive robots as social partners and peer tutors for children: A field trial. Human-Computer Interaction 19(1), 61–84 (2004)

33. Kennedy, J., Baxter, P., Belpaeme, T.: Nonverbal immediacy as a characterisation of social behaviour for human–robot interaction. International Journal of Social Robotics 9(1), 109–128 (2017)

34. Kim, Y., Baylor, A.L.: A social-cognitive framework for pedagogical agents as learning companions. Educational Technology Research and Development 54(6), 569–596 (2006)

35. Korenman, S., Miller, J.E., Sjaastad, J.E.: Long-term poverty and child development in the United States: Results from the NLSY. Children and Youth Services Review 17(1-2), 127–155 (1995)

36. Kory, J., Breazeal, C.: Storytelling with robots: Learning companions for preschool children's language development. In: Robot and Human Interactive Communication, 2014 RO-MAN: The 23rd IEEE International Symposium on. pp. 643–648. IEEE (2014)

37. Lee, K.M., Peng, W., Jin, S.A., Yan, C.: Can robots manifest personality?: An empirical test of personality recognition, social responses, and social presence in human–robot interaction. Journal of Communication 56(4), 754–772 (2006)

38. Developing Early Literacy: Report of the National Early Literacy Panel. Washington, DC: National Institute for Literacy (2008)


39. Lukin, S.M., Reed, L.I., Walker, M.: Generating sentence planning variations for story telling. In: 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. p. 188 (2015)

40. Lukin, S.M., Walker, M.A.: A narrative sentence planner and structurer for domain independent, parameterizable storytelling. Dialogue & Discourse 10(1), 34–86 (2019)

41. Mairesse, F., Walker, M.A.: Towards personality-based user adaptation: psychologically informed stylistic language generation. User Modeling and User-Adapted Interaction 20(3), 227–278 (2010)

42. Mayer, R.E.: Cognitive theory of multimedia learning. The Cambridge Handbook of Multimedia Learning 41, 31–48 (2005)

43. Mayer, R.E.: Using multimedia for e-learning. Journal of Computer Assisted Learning 33(5), 403–423 (2017)

44. McLoyd, V.C.: Socioeconomic disadvantage and child development. American Psychologist 53(2), 185 (1998)

45. McNeill, D.: Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago (1992)

46. Molina, A.I., Navarro, Ó., Ortega, M., Lacruz, M.: Evaluating multimedia learning materials in primary education using eye tracking. Comput. Stand. Interfaces 59(C), 45–60 (Aug 2018). https://doi.org/10.1016/j.csi.2018.02.004

47. Mostow, J., Aist, G., Burkhead, P., Corbett, A., Cuneo, A., Eitelman, S., Huang, C., Junker, B., Sklar, M.B., Tobin, B.: Evaluation of an automated reading tutor that listens: Comparison to human tutoring and classroom instruction. Journal of Educational Computing Research 29(1), 61–117 (2003)

48. Network, N.E.C.C.R., et al.: Child Care and Child Development: Results from the NICHD Study of Early Child Care and Youth Development. Guilford Press (2005)

49. Park, H.W., Gelsomini, M., Lee, J.J., Breazeal, C.: Telling stories to robots: The effect of backchanneling on a child's storytelling. In: Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. pp. 100–108. ACM (2017)

50. Piwek, P., Hernault, H., Prendinger, H., Ishizuka, M.: T2D: Generating dialogues between virtual agents automatically from text. In: International Workshop on Intelligent Virtual Agents. pp. 161–174. Springer (2007)

51. Piwek, P., Stoyanchev, S.: Generating expository dialogue from monologue: motivation, corpus and preliminary rules. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. pp. 333–336. Association for Computational Linguistics (2010)

52. Ritchie, S.J., Bates, T.C.: Enduring links from childhood mathematics and reading achievement to adult socioeconomic status. Psychological Science 24(7), 1301–1308 (2013)

53. Rodrigues, P., Rosa, P.J.: Eye-tracking as a research methodology in educational context: a spanning framework. In: Eye-Tracking Technology Applications in Educational Research, pp. 1–26. IGI Global (2017)

54. Rodríguez, J.I., Plax, T.G., Kearney, P.: Clarifying the relationship between teacher nonverbal immediacy and student cognitive learning: Affective learning as the central causal mediator. Communication Education 45(4), 293–305 (1996)

55. Ryokai, K., Vaucelle, C., Cassell, J.: Virtual peers as partners in storytelling and literacy learning. Journal of Computer Assisted Learning 19(2), 195–208 (2003)


56. Snow, C.E., Burns, M.S., Griffin, P.: Preventing Reading Difficulties in Young Children. Committee on the Prevention of Reading Difficulties in Young Children. Washington, DC: National Research Council (1998)

57. Sperry, D.E., Sperry, L.L., Miller, P.J.: Reexamining the verbal environments of children from different socioeconomic backgrounds. Child Development 90(4), 1303–1318 (2019). https://doi.org/10.1111/cdev.13072

58. Staudte, M., Crocker, M.W., Heloir, A., Kipp, M.: The influence of speaker gaze on listener comprehension: Contrasting visual versus intentional accounts. Cognition 133(1), 317–328 (2014)

59. Storch, S.A., Whitehurst, G.J.: Oral language and code-related precursors to reading: evidence from a longitudinal structural model. Developmental Psychology 38(6), 934 (2002)

60. Takacs, Z.K., Bus, A.G.: Benefits of motion in animated storybooks for children's visual attention and story comprehension: An eye-tracking study. Frontiers in Psychology 7, 1591 (2016)

61. Thompson, L.A., Massaro, D.W.: Children's integration of speech and pointing gestures in comprehension. Journal of Experimental Child Psychology 57(3), 327–354 (1994)

62. Torgesen, J.K.: Avoiding the devastating downward spiral: The evidence that early intervention prevents reading failure. American Educator 28(3), 6–19 (2004)

63. Vellutino, F.R., Scanlon, D.M., Zhang, H.: Identifying reading disability based on response to intervention: Evidence from early intervention research. In: Handbook of Response to Intervention, pp. 185–211. Springer (2007)

64. Wertheimer, R.F., Moore, K.A., Hair, E.C., Croan, T.: Attending kindergarten and already behind: A statistical portrait of vulnerable young children. Child Trends, Washington, DC (2003)

65. Westlund, J.K., Breazeal, C.: The interplay of robot language level with children's language learning during storytelling. In: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction Extended Abstracts. pp. 65–66. ACM (2015)

66. Whittaker, S.: Theories and methods in mediated communication. In: Handbook of Discourse Processes, pp. 246–289. Routledge (2003)

67. Zill, N.: Promoting educational equity and excellence in kindergarten. The Transition to Kindergarten, pp. 67–105 (1999)