USI module U1-5: Multimodal interaction
Jacques Terken
USI module U1, lecture 5
SAI User-System Interaction U1, Speech in the interface: 5. Multimodal Interfaces
Contents
• Demos and video clips
• Multimodal behaviour
• Multimodal interaction, architecture and multimodal fusion
• Design heuristics, guidelines and tools
• http://www.nuance.com/xmode/demo/#
• http://www.csee.ogi.edu/CHCC/ (Video Quickset)
• RASA (combination of tangible and multimodal interaction)

• May also be of interest:
– http://www.gvu.gatech.edu/gvu/events/demo-days/2001/demos010930.html
– http://ligwww.epfl.ch/~thalmann/research.html
QuickSet on iPAQ (OGI – CHCC)
Multimodal behaviour
• The development of multimodal systems depends on knowledge about the natural integration patterns that are characteristic for the combined use of different modalities
• Dealing with myths about multimodal interaction:
– Oviatt, S.L., “Ten Myths of Multimodal Interaction”, Communications of the ACM 42(11), 1999, pp. 74-81
Myth 1: If you build a multimodal system, users will interact multimodally.
Dependent on domain:
• Spatial domain: 95-100% of users have a preference for multimodal interaction
• Other domains: 20% of commands are multimodal
Dependent on type of action:
• High MM: adding, moving, modifying objects, calculating distances between objects
• Low MM: printing, scrolling, etc.
Multi-Modal Interaction (0H640)
• Distinction between general, selective and spatial actions
• General: non-object-directed actions (printing, etc.)
• Selective: choosing objects
• Spatial: manipulation of objects (adding, etc.)
Myth 2: Speech & pointing is the dominant multimodal integration pattern.
• Central in Bolt’s speak-and-point interface (“Put That There”)
• Speak-and-point covers only 14% of spontaneous multimodal actions
• In human communication, pointing accounts for approx. 20% of all gestures
• Other actions: handwriting, hand gestures, facial expressions (“rich” interaction)
Myth 3: Multimodal input involves simultaneous signals.
• Information from different modalities is often sequential
• Often gestures precede speech
Myth 4: Speech is the primary input mode in any multimodal system that includes it; gestures, head and body movement, gaze direction and other input are secondary.
• Often speech cannot carry all the information (cf. the combination of pen + speech)
• Gestures are better for some kinds of information
• Gestures often indicate the context for speech
Myth 5: Multimodal language does not differ linguistically from unimodal language.
• Users often avoid complicated commands in multimodal interaction
• Multimodal language is often shorter, syntactically simpler, and more fluent
– Unimodal: “place a boat dock on the east, no, west end of Reward Lake”
– Multimodal: [draws rectangle] “add rectangle”
• Multimodal language is easier to process
– Less anaphora and indirectness
Myth 6: Multimodal integration involves redundancy of content between modes.
• Different modalities contribute complementary information:
– Speech: subject, object, verb (objects, actions/operations)
– Gesture: location (spatial information)
• Even in the case of corrections, only 1% redundancy
Myth 7: Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.
• Combining inputs enables mutual disambiguation
• Users choose the least error-prone modality (“leveraging from users’ natural intelligence about when and how to deploy input modes effectively”)
• Combining error-prone modalities in fact yields a more stable system
Myth 8: All users’ multimodal commands are integrated in a uniform way.
• Integration patterns differ between people
• Usage is consistent within a person
• Detecting a user’s integration pattern in advance can result in better recognition
Myth 9: Different input modes are capable of transmitting comparable content (the “alt-mode” hypothesis).
• Modalities differ in:
– Type of information
– Functionality during communication
– Accuracy of expression
– Manner of integration with other modalities
Myth 10: Enhanced speed and efficiency are the main advantages of multimodal systems.
Applies indeed (to a limited extent) to the spatial domain:
• In multimodal pen/speech interaction, speed increases by approx. 10%
More important advantages in other domains:
• Errors and non-fluent speech decrease by 35-50%
• Possibility of choice of input mode:
– Less chance of fatigue per modality
– Better opportunities for repair
– Larger range of users
Advantages: Robustness
• Individual signal-processing technologies are error-prone
• Integration of complementary modalities yields synergy, capitalizing on the strengths of each modality and overcoming weaknesses in the other:
– Users will select the input mode that they consider less error-prone for particular lexical content
– Users’ language is simplified when interacting multimodally
– Users tend to switch modes after system errors, facilitating error recovery
– Users report less frustration when interacting multimodally (greater sense of control)
– Mutual compensation/disambiguation
Technologies: Types of multimodality
W3C (see http://www.w3.org/TR/mmi-reqs/ ), seen from the perspective of the system (how the input is handled):
• Sequential multimodal input
Modality A for action a, then Modality B for action b; each event is handled as a separate event
• Simultaneous (uncoordinated) multimodal input
Each event is handled as a separate event; choice between different modalities at each moment in time
• Composite (coordinated simultaneous) multimodal input
Events are integrated into a single event before interpretation (“true” multimodality)
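The three W3C categories boil down to two yes/no questions about the input stream. The sketch below makes that explicit; the `classify` helper and its boolean parameters are illustrative assumptions, not part of the W3C requirements document.

```python
from enum import Enum, auto


class InputType(Enum):
    """The three ways a system can handle multimodal input,
    following the W3C MMI requirements terminology."""
    SEQUENTIAL = auto()    # one modality at a time, separate events
    SIMULTANEOUS = auto()  # overlapping modalities, still separate events
    COMPOSITE = auto()     # overlapping modalities fused into one event


def classify(events_overlap: bool, fused_before_interpretation: bool) -> InputType:
    # Hypothetical helper: the two questions behind the W3C categories.
    if not events_overlap:
        return InputType.SEQUENTIAL
    if fused_before_interpretation:
        return InputType.COMPOSITE
    return InputType.SIMULTANEOUS
```

Only the composite case requires a fusion component; the other two can be handled by per-modality event handlers.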
The Coutaz & Nigay taxonomy, with the corresponding W3C terms:

                                        Sequential                     Simultaneous
Non-coordinated (W3C: supplementary)    Exclusive (W3C: sequential)    Concurrent (W3C: simultaneous)
Coordinated (W3C: complementary)        Alternate                      Synergistic (W3C: composite)
Mutual disambiguation (MD)
• Speech input: n-best list
1. Ditch
2. Ditches
• Gestural input
• Joint interpretation:
1. Ditches
• The benefit may be dependent on the situation (e.g. larger for non-native speakers)
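The ditch/ditches example can be sketched as a toy joint ranking over the two n-best lists. Everything here is invented for illustration (the probabilities, the gesture hypotheses, and the number-agreement test); the point is only that an incompatible top speech hypothesis gets discarded, so the second-best one wins.

```python
# Hypothetical n-best lists: (hypothesis, posterior probability) pairs.
speech_nbest = [("ditch", 0.6), ("ditches", 0.4)]
gesture_nbest = [("two areas", 0.7), ("one area", 0.3)]


def compatible(word: str, gesture: str) -> bool:
    # Toy compatibility constraint: number agreement between the spoken
    # noun and the number of areas gestured.
    plural = word.endswith("s")
    return plural == (gesture == "two areas")


def disambiguate(speech, gestures):
    """Rank joint hypotheses; incompatible pairs are discarded, so a
    lower-ranked speech hypothesis can win (mutual disambiguation)."""
    joint = [(w, g, pw * pg)
             for w, pw in speech
             for g, pg in gestures
             if compatible(w, g)]
    return max(joint, key=lambda t: t[2])
```

With these lists, "ditch" + "two areas" is ruled out by the constraint, and the joint winner is "ditches" even though it was only second on the speech n-best list.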
Early fusion
• For closely coupled and synchronized modalities such as speech and lip movements
• “Feature-level” fusion
• Based on multiple hidden Markov models or temporal neural networks; the correlation structure between modes can be taken into account automatically via learning
• Problems: modelling complexity, computational intensity, training difficulty
Late fusion
• “Semantic-level” fusion
• Individual recognizers
• Sequential integration
• Advantage: scalable – individual recognizers don’t need to be retrained
• Early approaches: a multimodal command’s posterior probability is the cross-product of the posterior probabilities of the associated constituents → no advantage taken of the mutual-compensation phenomenon
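The cross-product approach in the last bullet can be sketched in a few lines. The command and gesture hypotheses and their posteriors are made up for illustration; note that, unlike mutual disambiguation, nothing here lets one modality veto an incompatible hypothesis from the other.

```python
from itertools import product

# Hypothetical constituent n-best lists with posterior probabilities.
speech = {"add line": 0.7, "add mine": 0.3}
gesture = {"point at (3, 4)": 0.8, "area around (3, 4)": 0.2}

# Early late-fusion approach: the posterior of a multimodal command is
# simply the product of its constituents' posteriors.
commands = {(s, g): ps * pg
            for (s, ps), (g, pg) in product(speech.items(), gesture.items())}
best = max(commands, key=commands.get)
```

The best joint command is always built from the individually best constituents, which is exactly why this scheme cannot exploit mutual compensation.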
Architectural requirements for late semantic fusion
• Fine-grained timestamping
• Handling of both sequentially-integrated and simultaneously-delivered input
• Common representational format for different modalities
• Frame-based representation (multimodal fusion through unification of feature structures) → mutual disambiguation
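The first three requirements can be sketched as a common, timestamped event format plus a temporal test for deciding which events are candidates for fusion. The event format, field names, and the fixed 4-second window are illustrative assumptions; real systems derive the integration window from observed user behaviour.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    # Common representational format across modalities, carrying the
    # fine-grained timestamps that late semantic fusion needs.
    modality: str
    content: str
    t_start: float  # seconds
    t_end: float


def candidates_for_fusion(a: InputEvent, b: InputEvent, window: float = 4.0) -> bool:
    """Hypothetical temporal test: treat two events as fusion candidates
    if they overlap, or follow each other within `window` seconds.
    Sequential delivery must be allowed: gestures often precede speech."""
    gap = max(a.t_start, b.t_start) - min(a.t_end, b.t_end)
    return gap <= window
```

An overlapping pen stroke and utterance, or a stroke shortly before an utterance, both pass the test; events far apart in time are handled separately.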
Unification
[figure: feature structures for an utterance and a gesture, unified into a single joint interpretation]
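Unification of feature structures can be sketched with flat dictionaries. The `unify` helper and the feature names are illustrative simplifications (real systems use typed, nested feature structures): unification merges compatible structures and fails on a feature clash, which is what gives fusion its disambiguating power.

```python
def unify(f1: dict, f2: dict):
    """Unify two feature structures (flat dicts for simplicity):
    merge them, failing if any shared feature has conflicting values."""
    result = dict(f1)
    for key, value in f2.items():
        if key in result and result[key] != value:
            return None  # unification fails on a feature clash
        result[key] = value
    return result


# Hypothetical example: speech supplies the action and object type,
# the pen gesture supplies the location; unification combines them.
utterance = {"action": "add", "object": "rectangle"}
gesture = {"object": "rectangle", "location": (120, 45)}
joint = unify(utterance, gesture)
```

If the gesture had instead been interpreted as selecting a line, the shared `object` feature would clash and unification would reject that joint reading.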
Design of multimodal interfaces
1. Task analysis
What are the actions that need to be performed?
2. Task allocation
Which party is the most suitable candidate for performing particular actions?
3. Modality allocation
Which modality or combination of modalities is best suited to particular actions?
The current presentation focuses on 3.
Definition of ‘modality’
• Modality as sensory channel
However, stating that particular numeric information should be presented in the visual modality provides little grip
• Hence, the notion of ‘representational modality’ has been proposed (Bernsen), which distinguishes e.g. table and graph as two different modalities
• For the time being, we use ‘modality’ in the more restricted sense of sensory channel, and look for mappings between actions and modalities
Relevant dimensions
• Nature of the information
• Interaction paradigm
• Physical and dialogue context
• Platform
• Accessibility
• Multitasking
Rules of thumb, heuristics
• Michaelis and Wiggins (1982)
• Cohen and Oviatt (1994)
• Suhm (2000)
• Larsson (2003)
• Reeves, Lai et al. (2004)

• For references see Terken, J., “Guidelines and Tools for the Design of Multimodal Interfaces”, Workshop ASIDE 2005, Aalborg (DK)
Michaelis and Wiggins (1982)
• Speech generation is preferable when the
– message is short;
– message will not be referred to later;
– message deals with events in time;
– message requires an immediate response;
– visual channels of communication are overloaded;
– environment is too brightly lit, too poorly lit, subject to severe vibration, or otherwise unsuitable for transmission of visual information;
– user must be free to move around;
– user is subjected to high G forces or anoxia.
• Tentative guidelines for when NOT to use speech may be derived from these suggestions through negation.
Cohen and Oviatt (1994)
• Spoken communication with machines (both input and output) may be advantageous:
– when the user’s hands or eyes are busy
– when only a limited keyboard and/or screen is available
– when the user is disabled
– when pronunciation is the subject matter of computer use
– when natural language interaction is preferred
Suhm (2000)
Principles for choosing the set of modalities:
1. Consider speech input for entry of textual data, dialogue-oriented tasks, and command & control. Speech input is generally less efficient for navigation, manipulation of image data, and resolution of object references.
2. Consider written input for corrections, entry of digits, and entry of graphical data (formulas, sketches, etc.)
3. Consider gesture input for indicating the scope or type of commands and for resolving deictic object references
4. Consider the traditional modalities (keyboard and mouse input) as the alternative, unless the superiority of novel modalities (speech, pen input) is proven.
Further principle groups:
• Principles to circumvent limitations of recognition technology
• Principles for the implementation of pen-speech interfaces
Larsson (2003)
• Satisfy Real-world Constraints
– Task-oriented Guidelines
– Physical Guidelines
– Environmental Guidelines
• Communicate Clearly, Concisely, and Consistently with Users
– Consistency Guidelines
– Organizational Guidelines
• Help Users Recover Quickly and Efficiently from Errors
– Conversational Guidelines
– Reliability Guidelines
• Make Users Comfortable
– System Status
– Human-memory Constraints
– Social Guidelines
– …
Reeves, Lai et al. (2004)
Propose a set of multimodal design principles that are founded in perception and cognition science (but the motivation remains implicit)
Five general areas:
• Designing multimodal input and output
• Adaptivity
• Consistency
• Feedback
• Error prevention/handling
Designing Multimodal Input and Output
• Maximize human cognitive and physical abilities. Designers need to determine how to support intuitive, streamlined interactions based on users’ human information-processing abilities (including attention, working memory, and decision making). For example:
– Avoid unnecessarily presenting information in two different modalities in cases where the user must simultaneously attend to both sources to comprehend the material being presented; such redundancy can increase cognitive load at the cost of learning the material.
– Maximize the advantages of each modality to reduce the user’s memory load in certain tasks and situations:
– System visual presentation coupled with user manual input for spatial information and parallel processing;
– System auditory presentation coupled with user speech input for state information, serial processing, attention alerting, or issuing commands.
• Integrate modalities in a manner compatible with user preferences, context, and system functionality. Additional modalities should be added to the system only if they improve satisfaction, efficiency, or other aspects of performance for a given user and context. When using multiple modalities:
– Match output to acceptable user input style (for example, if the user is constrained by a set grammar, do not design a virtual agent to use unconstrained natural language);
– Use multimodal cues to improve collaborative speech (for example, a virtual agent’s gaze direction or gesture can guide user turn-taking);
– Ensure system output modalities are well synchronized temporally (for example, map-based display and spoken directions, or virtual display and non-speech audio);
– Ensure that the current system interaction state is shared across modalities and that appropriate information is displayed in order to support:
• users in choosing alternative interaction modalities;
• multidevice and distributed interaction.
3. Theoretical approaches
• Modality theory (Bernsen et al.)
‘Modality’ defined as ‘representational modality’
Modality theory (Bernsen)
Aim
• Given any particular class of task-domain information which needs to be exchanged between user and system during task performance, identify the set of input/output modalities which constitute an optimal solution to the representation and exchange of that information (Bernsen, 2001).
• Taxonomic analyses:
– (Representational) input and output modalities are characterized in terms of a limited number of basic features, such as:
– linguistic/non-linguistic,
– analogue/non-analogue,
– arbitrary/non-arbitrary,
– static/dynamic.
• Modality properties can then be applied according to the following procedure:
1. Requirements Specification >
2. Modality Properties + Natural Intelligence >
3. Advice/Insight with respect to modality choice.
• [MP1] Linguistic input/output modalities have interpretational scope, which makes them eminently suited for conveying abstract information. They are therefore unsuited for conveying high-specificity information including detailed information on spatial manipulation and location.
• [MP2] Linguistic input/output modalities, being unsuited for specifying detailed information on spatial manipulation, lack an adequate vocabulary for describing the manipulations.
• [MP3] Arbitrary input/output modalities impose a learning overhead which increases with the number of arbitrary items to be learned.
• [MP4] Acoustic input/output modalities are omnidirectional.
• [MP5] Acoustic input/output modalities do not require limb (including haptic) or visual activity.
4. Tools
• SMALTO (Bernsen)
• Multimodal property flowchart (Williams et al., 2002)
SMALTO
• Addresses the “Speech Functionality Problem”
• SMALTO was created by taking a large number of claims and findings from the literature on designing speech or speech-centric interfaces and casting these claims into a structured representation expressing the Speech Functionality Problem
• [Combined speech input/output, speech output, or speech input modalities M1, M2 and/or M3 etc.] or [speech modality M1, M2 and/or M3 etc. in combination with non-speech modalities NSM1, NSM2 and/or NSM3 etc.]
• are [useful or not useful]
• for [generic task: GT]
• and/or [speech act type: SA]
• and/or [user group: UG]
• and/or [interaction mode: IM]
• and/or [work environment: WE]
• and/or [generic system: GS]
• and/or [performance parameter: PP]
• and/or [learning parameter: LP]
• and/or [cognitive property: CP]
• and/or [preferable or non-preferable] to [alternative modalities AM1, AM2 and/or AM3 etc.]
• and/or [useful on conditions C1, C2 and/or C3 etc.]
• SMALTO has been evaluated within the framework of projects involving its creators, and in the DISC project
• Informal evidence indicates that it is difficult for “linguistically naïve” designers to apply, because of the way the modality properties are formulated
• This was also the motivation for the Modality Property Flowchart (Williams et al., 2002)
Multimodal property flowchart
• Multimodal interfaces are a particular type of interface → the multimodal property flowchart needs to be combined with general usability heuristics for interface design (e.g. Nielsen)
Main points
• Multimodal interfaces match the natural expressivity of human beings
• Taxonomy of multimodal interaction
• Limitations of signal processing in one modality can be overcome by taking into consideration input from another modality (mutual disambiguation)
• Mapping of functionalities onto modalities is not always straightforward → support from guidelines and tools