An introduction to the multimodal corpus
HuComTech and its annotation
László Hunyadi, University of Debrecen
Nijmegen, MPI, 19 December 2012
Wednesday, April 24, 13
research modules involved:
• computational linguistics
• communication theory
• psychology
• digital image processing
• engineering (robotics)
Purpose of the corpus
to identify elements of human-human communication and structural relations between them that are
- relevant for HCI
- technologically implementable
furthermore, to
- learn the multimodal nature of human communication (both its verbal and nonverbal aspects)
- describe human communication in a multimodal, holistic model
The corpus is intended to represent sufficient data in proper arrangement for purposes of
- linguistics
- language technology (the training and testing of speech recognition software)
- behavioral psychology
- robotics, and more
Corpus:
• approx. 60 hours of video recordings of 111 speakers aged 18-29, including
• ≈ 450,000 word tokens
• 15 read sentences
• 10-minute guided dialogues (job interviews)
• 15-minute free dialogues
[Pie chart: distribution of subjects by sex — male vs. female, 45.5% / 54.5%]
[Bar chart: distribution of subjects by age (19-30), number of subjects per year]
Annotation
serves the study of multimodality through the study of unimodality and the fusion of aligning markers
Markers to be annotated are determined by a theoretical-technological model of communication
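The "fusion of aligning markers" can be pictured as pairing annotation intervals from two unimodal tiers that overlap in time. A minimal sketch in Python; the tier contents and the 0.1 s minimum-overlap threshold are illustrative assumptions, not the project's actual tooling or values:

```python
# Fuse two unimodal annotation tiers by temporal overlap.
# Tier contents and the min_overlap threshold are illustrative only.

def overlap(a, b):
    """Overlap duration (s) of two (start, end) intervals; 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def fuse(tier_a, tier_b, min_overlap=0.1):
    """Pair labels whose intervals overlap by at least min_overlap seconds."""
    pairs = []
    for (s1, e1, lab1) in tier_a:
        for (s2, e2, lab2) in tier_b:
            if overlap((s1, e1), (s2, e2)) >= min_overlap:
                pairs.append((lab1, lab2, max(s1, s2), min(e1, e2)))
    return pairs

audio = [(0.0, 1.2, "turn-take"), (1.2, 3.0, "speech")]
video = [(0.1, 0.9, "gaze:forward"), (2.5, 3.5, "nod")]
print(fuse(audio, video))
```

Each fused pair records the two aligning labels and the span they share, which is the unit a multimodal study can then count or correlate.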
We assume that human communication has a two-way mechanism: speakers and listeners rely on the same mechanism to communicate.
Therefore, in order to properly represent human communication for technology, we need a model to follow this two-way mechanism: to serve both synthesis and analysis.
We assume that the approach of a generative model proposed for syntax (especially Chomsky 1981) can be useful in building such a two-way model of communication for technology.
Theoretical considerations
• Each module has a characteristic finite set of primitives; by way of the Operational component, these primitives are combined into an infinite set of non-primitives and further structures
• The basic structure generates all and only those structures (configurations of primitives) that are formally possible (‘grammatical’) in any communicative event.
E.g. the ‘start’ of an event can be followed by the ‘end’ of an event, but the inverse order is not possible (‘ungrammatical’)
• The functional extension assigns to any given structure generated by the basic structure all possible communicative functions, and only those
E.g. the restart node in the basic structure can be associated with the continuity function, but the function turn-taking cannot be assigned to it.
Characteristics of the constituent modules
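The grammaticality constraints above can be sketched as a toy transition table: ‘start’ may be followed by ‘end’ but not the reverse, and the functional extension licenses ‘continuity’ but not ‘turn-taking’ on a restart node. The event and function inventories below are illustrative stand-ins, not the model's actual primitives:

```python
# Toy stand-in for the basic structure: allowed transitions between
# event primitives ('grammatical' configurations only).
ALLOWED = {
    ("start", "restart"),
    ("start", "end"),
    ("restart", "end"),
}

# Toy stand-in for the functional extension: functions assignable per node.
FUNCTIONS = {
    "start":   {"turn-taking"},
    "restart": {"continuity"},   # 'turn-taking' is NOT assignable here
    "end":     {"turn-giving"},
}

def grammatical(seq):
    """True iff every adjacent pair of events is a licensed transition."""
    return all(pair in ALLOWED for pair in zip(seq, seq[1:]))

print(grammatical(["start", "end"]))              # True
print(grammatical(["end", "start"]))              # False ('ungrammatical')
print("turn-taking" in FUNCTIONS["restart"])      # False
```

The point of the encoding is the "all and only" property: any sequence or function assignment outside the tables is rejected.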
• The pragmatic extension actualizes the input from the functional extension for the given actual communicative event by selecting the appropriate markers and their appropriate values based on the given scenario and ontology behind the event.
E.g. the function ‘happiness’ is expressed by the appropriate value of some modal marker(s): facial, gestural, audio, lexical, or some/all of them.
Characteristics of the constituent modules
Interface to technology: application of the model
The pragmatic extension is the interface to technology [diagram: Pragmatic extension → Technology]:
• Functions are translated into their technological counterparts as parameters through data fusion
• The pragmatic extension selects the modalities and their markers to represent the given function
• Actual occurrences of markers are represented by the corresponding parameter values
• Technology receives these parameter values as input and operates on them.
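The selection step can be sketched as a small mapping from a communicative function to modal markers, filtered by the scenario. All marker names and values here are hypothetical illustrations, not the corpus's actual parameter inventory:

```python
# Toy pragmatic extension: realize a communicative function as parameter
# values for the modalities available in the given scenario.
# Marker names/values are hypothetical, not the corpus schema.

MARKERS = {
    "happiness": {
        "facial":  "happy",
        "audio":   "pitch_range:wide",
        "lexical": "positive",
    },
}

def realize(function, scenario):
    """Return {modality: parameter value} for the scenario's modalities."""
    selected = MARKERS.get(function, {})
    return {m: v for m, v in selected.items() if m in scenario["modalities"]}

params = realize("happiness", {"modalities": {"facial", "audio"}})
print(params)  # {'facial': 'happy', 'audio': 'pitch_range:wide'}
```

The returned dictionary is what "technology receives as input": one parameter value per selected modality.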
Annotation
unimodal (video, audio) / multimodal (video + audio)
manual / automatic
description of physical properties (esp. video) / interpretative annotations
with focus on emotions and the multimodal alignment of video and audio
special features of annotation: pragmatic, syntactic, prosodic
Levels of annotation and attributes (labels)
Audio
IP-level: HC, SC, EM, IN, BC, HE, RE, IT, SL, V
discourse-level: TT, TK, BC, SL
emotion-level: neutral, sad, happy/laughing, surprised, recall, tensed (and degrees of: strong, moderate, reduced), other, silence
Levels of annotation and attributes (labels)
Video
comevent: start, end
deictic: addressee, self, measure, object, shape; left, right, both
emblems: attention, agree, block, disagree, doubt, doubt-shrug, refusal, surprise, more-or-less, number, finger-ring, hands up, one hand other hand, other
Levels of annotation and attributes (labels)
Video
headshift: lower, turn, raise, shake, nod; sideways, left, right
touchmotion: hair, leg, arm, face, eye, ear, chin, mouth, neck, bust, forehead, nose, glasses; tap, scratch; left, right
emotions: natural, happy, recall, sad, surprise, tense; (and degrees of: strong, moderate, reduced)
Levels of annotation and attributes (labels)
Video
posture: crossing arm, holding head, lean back, lean forward, lean left, lean right, rotate right, rotate left, shoulder up, upright
handshape: breaking, fist, crossing fingers, open flat, open spread, thumb out, index out; left, right, both
Levels of annotation and attributes (labels)
Video
facial expressions: natural, happy, recall, sad, surprise, tense (and degrees of: moderate, reduced, strong)
eyebrows: scowl, up; left, right, both
gaze: blink, up, down, left, right, forwards, left-up, left-down, right-up, right-down
Levels of annotation and attributes (labels)
Syntax
structural segmentation:
clause boundaries
hierarchical arrangement of clauses
internal structure of clauses (esp. missing elements)
Levels of annotation and attributes (labels)
Syntax vs. prosody (prosody of clauses)
pitch movement: rise, fall, stagnant + finer distinctions
intensity: increase, decrease, stagnant + finer distinctions
pause/duration: increase, decrease, stagnant + finer distinctions
Levels of annotation and attributes (labels)
Pragmatics - multimodal
Annotation: DiAMSL for text-based events, on several layers, esp. audio: turn management, discourse
Multimodality - the complex of audio + video multimodal communicative act (annotation based on Bach-Harnish)
Levels of annotation and attributes (labels)
Pragmatics - multimodal
communicative act types: constatives, directives, commissives, acknowledgements, none
supporting events of communicative acts: backchannel, politeness markers, corrections, no support
Levels of annotation and attributes (labels)
Pragmatics - multimodal
thematic control: topic initiation, elaboration, topic change (contextual, non-contextual)
information structure: given vs. new information
Levels of annotation and attributes (labels)
Pragmatics - unimodal
agreement: uninterested, disagree, block, uncertainty; full, partial
attention: calling, paying
deixis
Levels of annotation and attributes (labels)
Pragmatics - unimodal
information: received novelty
turn-management: intending to start speaking, start speaking successfully, end speaking, breaking in
Sample data for multimodal alignments
Turn management
turn-give: forward, blink, down, left-down, right-down
turn-take: forward, blink, down, left-down, right-down
break-in_turn-keep: forwards, blink, up, down, left-down, right-down
Sample data for multimodal alignments
Emotions vs. gestures
uncertainty is mostly found to be associated with the hand gesture open spread, less frequently with crossing fingers
agreement is also associated with open spread and crossing fingers
Sample data for multimodal alignments
Emotions vs. gestures
doubt is found to be associated with open spread, crossing fingers and sideways as well
Video annotation (manual vs. automatic)
annotation method | physical values | interpretative values
manual            | -               | +
automatic         | +               | -

Essential difference between the two: automatic annotation is ‘digital’, making framewise judgements across a predefined number of frames, whereas manual annotation is ‘analog’.
Video analysis state log - Face Model: General - Calibration: Continuous
Start time: 2012.06.15. 9:35:15
Filename: C:\Users\MACMINI\Documents\Noldus test\007_J_C2.trimmed.mov
Frame rate: 3.33333333333333

Video Time    Emotion
0:00:03.000   Unknown
0:00:19.800   Neutral
0:00:34.800   Unknown
0:00:35.699   Neutral
0:00:39.300   Sad
0:00:42.900   Neutral
0:00:49.500   Scared
0:00:51.000   Sad
0:00:51.900   Neutral
0:01:02.400   Sad
0:01:05.100   Disgusted
0:01:07.800   Neutral
0:01:13.500   Angry
[... log continues with Neutral/Sad/Angry/Scared/Surprised/Disgusted/Happy/Unknown entries ...]
0:10:14.400   Unknown
0:10:15.900   END

Values are assigned to single frames, hence only the begin time of each value is logged.
Settings and sample output
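Since only onset times are logged, each label's duration has to be recovered by pairing consecutive entries. A minimal sketch (times converted to seconds; values taken from the opening entries of the sample log; not the Noldus tooling itself):

```python
# Convert a frame-wise state log (onset times only) into labeled intervals,
# recovering each emotion's duration. Times are in seconds.

def log_to_intervals(events):
    """events: list of (time, label); the final entry marks the end of the log."""
    return [(t0, t1, lab) for (t0, lab), (t1, _) in zip(events, events[1:])]

# Opening entries of the sample log, converted to seconds:
log = [(3.0, "Unknown"), (19.8, "Neutral"), (34.8, "Unknown"),
       (35.699, "Neutral"), (39.3, "Sad"), (42.9, "Neutral"),
       (49.5, "Scared"), (51.0, "END")]

for start, end, label in log_to_intervals(log):
    print(f"{start:8.3f}-{end:8.3f}  {label}")
```

Note that this only works where labels are continuous; as discussed below, the offset of an emotion cannot be recovered this way if frames between two entries were left unlabeled.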
Comparison: automatic vs. manual emotion recognition
[Pie chart — manual annotation: happy, natural, recall, tense, surprise; two labels dominate at 45% and 42%, the rest at 7%, 4%, 3%]
Comparison: automatic vs. manual emotion recognition
Although both systems are based on the FACS model of emotions, they recognise different categories (emotions).
Whereas both systems assign interpretative values, manual annotation selects ‘more difficult’ ones.
Manual annotation offers subjectively observed degrees of emotions (strong, moderate, reduced); for automatic annotation, thresholds for being ‘happy’, ‘angry’, etc. are determined statistically, so smaller degrees are left out.
Comparison: automatic vs. manual emotion recognition
Occasional unrealistic values in automatic annotation are the result of the single-frame approach.
Duration is not marked, so the offset of an emotion cannot be determined in the case of non-continuous labels.
Most agreement between the two approaches: happy, natural
Annotation of spoken syntax and its relation to prosody in the HuComTech corpus
• aims: language technology (speech-to-text)
• communication studies (alignment of multimodal markers for communicative acts and emotions)
• linguistics (the syntax-prosody interface)
Syntactic data from our annotation
Spoken language vs. written language
Grammar: same or different?
Same underlying principles:
- grouping of elements
- hierarchical organisation of groups
Difference: two additional dimensions of spoken language:
- time
- grouping has language-specific means
[Bar chart: number of clauses per sentence (1-22), informal vs. formal dialogs]
Structural relations (hierarchy)
clause sequences with no structural relation:
- has no subordinate clause
- has no coordinate clause
- has neither subordinate, nor coordinate clause
embeddings, insertions, multiple subordination (recursion)
Type of missing element according to syntactic code

Type of missing element             Informal dialogs   %       Formal dialogs   %
1. nothing missing                  2664               35.59   758              34.6
2. main clause                      37                 0.49    15               0.69
3. preceding clause                 58                 0.77    6                0.27
4. relative pronoun                 89                 1.19    22               1.01
5. conjunction                      22                 0.29    4                0.18
6. subject (grammatical)            3178               42.37   1167             54.14
7. subject (logical)                274                3.66    113              5.17
8. predicate                        214                2.87    72               3.29
9. object                           102                1.36    45               2.06
10. adverb                          11                 0.15    4                0.18
11. attribute                       0                  0       0                0
12. verb                            10                 0.13    0                0
13. unfinished clause               728                9.7     167              13.1
14. missing element not relevant    3375               45.05   769              35.21
Sum:                                                   143.62                   149.9

Type of missing element by frequency: the same rows sorted by informal-dialog frequency — 14 (not relevant), 6 (grammatical subject), 1 (nothing missing), 13 (unfinished clause), 7 (logical subject), 8 (predicate), 9 (object), 4 (relative pronoun), 3 (preceding clause), 2 (main clause), 5 (conjunction), 10 (adverb), 12 (verb), 11 (attribute)
Syntactic types vs. gestures
Alignment of syntactic type and gestures can offer an insight into certain cognitive processes in communication:
- speech dynamics
- error detection
- gesturing the “untold”
Syntactic types vs. gestures
A very interesting finding: pause (silence) and gaze can be mutually supplementary:
We found very few instances of gazing at the right overlap of an unfinished clause followed by a pause, but there was frequent gazing if there was no pause in a similar position.
Also, looking up as gazing direction was specific to alignment with the end of an unfinished clause but was quite rare at the end of a finished/complete clause.
[Pie chart — informal dialogs: gaze aligned with unfinished clauses (type 13) with no SL; directions forwards, left-down, right-down, right-up, left-up, up, down, blink; one direction dominates at 49%, the rest at 14%, 11%, 10%, 6%, 5%, 2%, 2%]
• the “IP-level” (based on F0-contour and pause, manual)
• pitch movement (automatic)
• intensity change (automatic, in progress)
• accent/stress detection (automatic, in progress)
• aim: to generate data on pitch movement trends (actual movement type, tone range)
• capture F0-properties of syntactic types
• assign communicative functions (including emotions)
Detection of pitch movement
• based on Praat (algorithm and scripts)
• stylization on the syllable level (P. Mertens’ Prosogram: ‘perceived pitch’ in semitones, http://bach.arts.kuleuven.be/pmertens/prosogram/)
• trend of syllable based stylization (Szekrényes, 2012)
• classification
The calculation of the trend of pitch movement
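The classification step can be sketched as follows. The real procedure works on Prosogram's syllable-level stylization and on Szekrényes (2012)'s trend calculation; the 1.5-semitone threshold and the direct start/end comparison below are illustrative assumptions. The slope function reproduces the "Change across time (Hz/msec)" column of the sample table further down:

```python
import math

# Toy classifier for the trend of a pitch movement from start/end F0 (Hz).
# The 1.5-semitone threshold is an assumption, not the project's value.

def semitones(f_start, f_end):
    """Pitch interval between two F0 values, in semitones."""
    return 12.0 * math.log2(f_end / f_start)

def classify(f_start, f_end, threshold_st=1.5):
    st = semitones(f_start, f_end)
    if st > threshold_st:
        return "rise"
    if st < -threshold_st:
        return "fall"
    return "stagnant"

def slope_hz_per_ms(f_start, f_end, duration_s):
    """Absolute F0 change per millisecond ('Change across time' column)."""
    return abs(f_end - f_start) / (duration_s * 1000.0)

# First row of the sample table: 236.31 Hz -> 172.81 Hz over 0.57 s
print(classify(236.31, 172.81))                       # fall
print(round(slope_hz_per_ms(236.31, 172.81, 0.57), 6))
```

With this threshold the toy classifier matches the movement labels of all rows in the sample table (fall, rise, stagnant), though the corpus's actual classification criteria may differ.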
[Prosogram sample (v2.80, file 06mc22 F shure, 150 Hz): Hungarian utterance “{p} %o hát %o pályakezdő vagyok, úgyhogy legfeljebb most a tanulmányaimról tudok mesélni” (“well, I’m a fresh graduate, so at most I can talk about my studies for now”), syntactic codes 1.0.2.0.0.0.6. and 2.3.1.0.0.0.6.; syllable-level stylization labeled upward / stagnant / rise / fall / stagnant, all at pitch level L1, with F0 values between 108.28 and 155.21 Hz]
Calculations associated with syntactic type but not based on it
Filename | StartTime | EndTime | Duration | StartValue | EndValue | AbsoluteDifference | Change across time (Hz/msec) | Movement | ActualF0Range | Sentence # | Clause analysis
057fc30_I_shure 136.44 137.01 0.57 236.31 172.81 63.5 0.111411 fall MM s30 1.0.0.0.0.0.6.
057fc30_I_shure 137.01 137.06 0.05 172.81 221.4 48.59 1.079783 rise MM s30 1.0.0.0.0.0.6.
057fc30_I_shure 137.06 137.29 0.23 221.4 255.44 34.04 0.14798 rise MH2 s30 1.0.0.0.0.0.6.
057fc30_I_shure 137.29 137.78 0.49 255.44 179.93 75.5 0.153578 fall H1M s30 1.0.0.0.0.0.6.
057fc30_I_shure 137.78 138.32 0.54 179.93 186.13 6.2 0.011479 stagnant MM s31 1.2.0.0.0.0.4,6,9.
057fc30_I_shure 138.32 138.42 0.1 186.13 229.14 43.01 0.452767 rise MM s31 1.2.0.0.0.0.4,6,9.
057fc30_I_shure 138.42 139.03 0.61 229.14 168.1 61.05 0.10008 fall ML1 s31 1.2.0.0.0.0.4,6,9.
057fc30_I_shure 139.03 139.08 0.05 168.1 198.17 30.07 0.6014 rise L1M s31 2.0.0.1.0.0.6.
057fc30_I_shure 139.08 139.2 0.12 198.17 165.49 32.68 0.272326 fall ML1 s31 2.0.0.1.0.0.6.
057fc30_I_shure 139.2 139.56 0.36 165.49 169.56 4.07 0.011147 stagnant L1M s31 2.0.0.1.0.0.6.
057fc30_I_shure 139.56 139.63 0.07 169.56 201.24 31.68 0.452586 rise MM s31 2.0.0.1.0.0.6.
057fc30_I_shure 139.63 140.2 0.56 201.24 208.74 7.5 0.013276 stagnant MM s31 2.0.0.1.0.0.6.
5 pitch levels: L2, L1, M, H1, H2; slope in Hz/ms; absolute F0 values in Hz
Syntactic types vs. F0
[Pie chart — formal, type 13 (unfinished clause): fall / rise / stagnant, shares 70%, 23%, 6%]
[Pie chart — informal, type 13 (unfinished clause): fall / rise / stagnant, shares 61%, 25%, 14%]
Syntactic types vs. F0
[Pie chart — formal, type 2 (main clause missing): fall / rise / stagnant, shares 40%, 16%, 44%]
[Pie chart — informal, type 2 (main clause missing): fall / rise / stagnant, shares 56%, 29%, 16%]
Syntactic types vs. F0
[Pie chart — formal, type 3 (subord. clause missing): fall / rise / stagnant, shares 70%, 23%, 8%]
[Pie chart — informal, type 3 (subord. clause missing): fall / rise / stagnant, shares 56%, 29%, 16%]
methodology based on the calculation of the trend of pitch movement (currently being implemented)
Detection of intensity change
• based on Hunyadi 2002
• PET: pitch and energy over time
• accent/stress is the result of the interaction of pitch and intensity: relative prominence
• absolute PET-value + duration
Detection of accent/stress
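The idea of accent as relative prominence arising from the interaction of pitch and energy can be sketched as below. The combination formula (syllable pitch and energy normalized to their means, weighted by duration) is only an illustrative stand-in; Hunyadi (2002)'s actual PET definition is not reproduced here:

```python
# Hedged sketch of accent/stress detection as pitch-energy interaction:
# prominence of each syllable relative to the utterance's mean values.
# The scoring formula is an illustrative assumption, not the PET method.

def prominence(syllables):
    """syllables: list of (f0_hz, energy_db, duration_s) -> relative scores."""
    mean_f0 = sum(s[0] for s in syllables) / len(syllables)
    mean_en = sum(s[1] for s in syllables) / len(syllables)
    return [(f0 / mean_f0) * (en / mean_en) * dur
            for f0, en, dur in syllables]

# Hypothetical three-syllable word: the second syllable is higher, louder,
# and longer, so it should come out as the accented one.
sylls = [(180, 60, 0.12), (240, 70, 0.20), (170, 58, 0.10)]
scores = prominence(sylls)
print(scores.index(max(scores)))  # 1
```

The point of the sketch is the "relative prominence" idea: no absolute threshold decides accent; each syllable is scored against its neighbours.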
THANK YOU!
http://hucomtech.unideb.hu/hucomtech