Sound Systems of Language
• Phonetics – The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
• Phonology – Rules that govern how phones are realized differently in different contexts
• Technologies:
– Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
– Text-to-Speech (TTS) systems take text as input and produce speech
Letters and Sounds
• same spelling = different sounds
– o: comb, tomb, bomb
– oo: blood, food, good
– c: court, center, cheese
– s: reason, surreal, shy
• same sound = different spellings
– [i]: sea, see, scene, receive, thief
– [s]: cereal, same, miss
– [u]: true, few, choose, lieu, do
– [ay]: prime, buy, rhyme, lie
• combination of letters = single sound
– ch: child, beach
– th: that, bathe
– oo: good, foot
– gh: laugh
• single letter = combination of sounds
– x: exit, Texas
– u: use, music
• ‘silent’ letters
– k: knife, know
– p: psycho, pterodactyl
– e: moose, bone
– gh: through
Articulators
lips
teeth
alveolar ridge
velum
uvula
pharyngeal
vocal folds: glottis
larynx
trachea
palate
Articulators in action
“Why did Ken set the soggy net on top of his deck?”
(Sample from the Queen’s University / ATR Labs X-ray Film Database)
Vocal fold vibration
[UCLA Phonetics Lab demo]
Places of articulation
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
labial
dental
alveolar
post-alveolar/palatal
velar
uvular
pharyngeal
laryngeal/glottal
Articulatory parameters for English consonants (in ARPAbet)

MANNER OF       PLACE OF ARTICULATION
ARTICULATION    bilabial  labio-dental  inter-dental  alveolar  palatal  velar  glottal
stop            p b                                   t d                k g    q
fric.                     f v           th dh         s z       sh zh           h
affric.                                                         ch jh
nasal           m                                     n                  ng
approx          w                                     l/r       y
flap                                                  dx

VOICING: in each pair, the voiceless phone is listed first and the voiced phone second
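The place/manner/voicing chart can also be held as a small lookup table, which is roughly how a program would consult it. The sketch below is illustrative only: the names `Features`, `CONSONANTS`, and `voicing_pairs` are hypothetical, and the voicing values are the standard ones for English rather than something read off the slide's legend.

```python
from typing import NamedTuple


class Features(NamedTuple):
    place: str
    manner: str
    voiced: bool


# ARPAbet consonant -> articulatory parameters (hypothetical encoding of the chart).
CONSONANTS = {
    "p": Features("bilabial", "stop", False),
    "b": Features("bilabial", "stop", True),
    "t": Features("alveolar", "stop", False),
    "d": Features("alveolar", "stop", True),
    "k": Features("velar", "stop", False),
    "g": Features("velar", "stop", True),
    "q": Features("glottal", "stop", False),
    "f": Features("labio-dental", "fric.", False),
    "v": Features("labio-dental", "fric.", True),
    "th": Features("inter-dental", "fric.", False),
    "dh": Features("inter-dental", "fric.", True),
    "s": Features("alveolar", "fric.", False),
    "z": Features("alveolar", "fric.", True),
    "sh": Features("palatal", "fric.", False),
    "zh": Features("palatal", "fric.", True),
    "h": Features("glottal", "fric.", False),
    "ch": Features("palatal", "affric.", False),
    "jh": Features("palatal", "affric.", True),
    "m": Features("bilabial", "nasal", True),
    "n": Features("alveolar", "nasal", True),
    "ng": Features("velar", "nasal", True),
    "w": Features("bilabial", "approx", True),
    "l": Features("alveolar", "approx", True),
    "r": Features("alveolar", "approx", True),
    "y": Features("palatal", "approx", True),
    "dx": Features("alveolar", "flap", True),
}


def voicing_pairs():
    """Return (voiceless, voiced) pairs that share place and manner."""
    pairs = []
    for a, fa in CONSONANTS.items():
        for b, fb in CONSONANTS.items():
            same_slot = (fa.place, fa.manner) == (fb.place, fb.manner)
            if same_slot and not fa.voiced and fb.voiced:
                pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    for voiceless, voiced in voicing_pairs():
        print(voiceless, voiced)
```

Listing the pairs this way makes the chart's main regularity visible: stops, fricatives, and affricates come in voiceless/voiced pairs, while nasals, approximants, and the flap do not.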
Acoustic landmarks
“Patricia and Patsy and Sally”
[Figure: acoustic display of the utterance with its segments labeled, including [p], [t], [l], [sh], [s], [n], [ix], [ih], [ax], [ae], [iy]]
Syllables
• Syllabification is important for
– pronunciation: deny/denim
– speaking rate calculation: syllables per second
– word recognition in ASR
• (onset) + nucleus + (coda):
– c a t
– a
– a t
– t o
• Lexical stress: primary, secondary, tertiary
– telephone
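The (onset) + nucleus + (coda) template can be sketched in a few lines. This is a toy that works on spelling, not on phones: it assumes a single syllable, treats the first run of vowel letters as the nucleus, and uses the hypothetical names `VOWEL_LETTERS` and `split_syllable`.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: orthographic vowels stand in for vowel phones


def split_syllable(syllable):
    """Split one syllable into (onset, nucleus, coda); onset and coda may be empty."""
    start = 0
    while start < len(syllable) and syllable[start] not in VOWEL_LETTERS:
        start += 1
    if start == len(syllable):
        raise ValueError(f"no nucleus found in {syllable!r}")
    end = start
    while end < len(syllable) and syllable[end] in VOWEL_LETTERS:
        end += 1
    return syllable[:start], syllable[start:end], syllable[end:]


if __name__ == "__main__":
    for example in ["cat", "a", "at", "to"]:
        print(example, split_syllable(example))
```

The four slide examples cover every combination of optional parts: both onset and coda (cat), neither (a), coda only (at), and onset only (to).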
Phonological Rules
• Not all instances of a given phone [x] sound/look alike
• Phoneme /x/ may have many allophones
• Phonological rules map phonemes in context to allophones, e.g.
– simple rules: /{t,d}/ --> [dx] / V’ _ V
– FSAs, FSTs
– declarative constraints: t: V’ _ V
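As an illustration of a simple context-sensitive rule, the sketch below reads the example as flapping: /t/ or /d/ is realized as the flap [dx] when it sits between a stressed vowel and another vowel. It assumes lowercase ARPAbet phones in which vowels carry a trailing stress digit (0 = unstressed, 1 or 2 = stressed), a convention borrowed from pronouncing dictionaries and not stated on the slide; the function names are hypothetical.

```python
def is_vowel(phone):
    # Assumption: only vowels carry a trailing stress digit, e.g. "ah1", "er0".
    return phone[-1].isdigit()


def is_stressed_vowel(phone):
    return is_vowel(phone) and phone[-1] != "0"


def apply_flapping(phones):
    """Rewrite t/d as dx in the context: stressed vowel _ vowel."""
    output = list(phones)
    for i in range(1, len(phones) - 1):
        if (
            phones[i] in ("t", "d")
            and is_stressed_vowel(phones[i - 1])
            and is_vowel(phones[i + 1])
        ):
            output[i] = "dx"
    return output


if __name__ == "__main__":
    print(apply_flapping(["b", "ah1", "t", "er0"]))  # "butter"
    print(apply_flapping(["ax0", "t", "ae1", "k"]))  # "attack"
```

In "butter" the /t/ follows a stressed vowel and precedes a vowel, so the rule fires; in "attack" the preceding vowel is unstressed, so the /t/ is left alone. A finite-state transducer would encode the same context as states instead of an explicit index check.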
Allophones of /t/
• What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/:
Figure 4.8: Jurafsky & Martin (2000), page 104.
Application: Word Pronunciation for TTS
• Pronouncing dictionaries (the: [‘dhax], [‘dhiy])
• Problems:
– Homographs (bass/bass, wind/wind, desert/desert)
– Abbreviations (dr., st.)
– Numbers (2125551212)
– Acronyms (NAACL, IDIAP)
– Morphological variation (unrelentingly)
– Proper names and unknown words
• rules + dictionaries/dictionaries + rules
• Hybrid model:
– FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
– FSAs model morphology (e.g. reg-noun-stem + s)
– FSTs for pronunciation rules (e.g. s --> z)
– special rules to model name and acronym pronunciation
– default letter2sound rules for other words
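The hybrid model's fallback order can be sketched with plain dictionaries standing in for the transducers. Everything here is a toy assumption: the two-word `LEXICON`, the `VOICELESS` set, and the one-letter-one-phone `LETTER_TO_SOUND` table are hypothetical, the name/acronym step is omitted, and stems ending in a sibilant (which need an extra vowel before the plural) are not handled.

```python
# Toy lexicon: spelling -> ARPAbet phones (cf. the entry c:k a:ae t:t).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}

# Phones after which plural -s stays [s]; after anything else it becomes [z].
VOICELESS = {"p", "t", "k", "f", "th"}

# Crude default letter-to-sound table for words outside the lexicon.
LETTER_TO_SOUND = {"a": "ae", "b": "b", "c": "k", "d": "d", "g": "g", "o": "ao", "t": "t"}


def pronounce(word):
    word = word.lower()
    # 1. Whole-word lookup in the lexicon.
    if word in LEXICON:
        return list(LEXICON[word])
    # 2. Morphology: regular noun stem + s, then the s --> z rule.
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = list(LEXICON[word[:-1]])
        suffix = "s" if stem[-1] in VOICELESS else "z"
        return stem + [suffix]
    # 3. Default letter-to-sound rules for everything else.
    return [LETTER_TO_SOUND.get(letter, letter) for letter in word]


if __name__ == "__main__":
    for example in ["cat", "cats", "dogs", "tab"]:
        print(example, pronounce(example))
```

Trying the lexicon first and the letter-to-sound table last mirrors the ordering in the list above: the most specific knowledge wins, and the crude default only applies when nothing better is available.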
Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
• Rhyming analogy: varoom/room, todo/dodo
• Linguistic origin: Infiniti, vingt, Perez
• Abbreviation expansion:
– spacious living/dining rm w/frplc --> spacious living/dining room with fireplace
– pls?
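Abbreviation expansion of the real-estate-ad kind can be approximated with a lookup table applied token by token. The sketch below is a minimal illustration: the `ABBREVIATIONS` table holds only the two entries from the example, the handling of the "w/" prefix is an assumption, and ambiguous forms such as "pls" are deliberately left unexpanded.

```python
# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {
    "rm": "room",
    "frplc": "fireplace",
}


def expand_abbreviations(text):
    """Expand known abbreviations token by token; unknown tokens pass through."""
    expanded = []
    for token in text.split():
        # Assumption: a leading "w/" always means "with".
        if token.startswith("w/"):
            expanded.append("with")
            token = token[2:]
            if not token:
                continue
        expanded.append(ABBREVIATIONS.get(token, token))
    return " ".join(expanded)


if __name__ == "__main__":
    print(expand_abbreviations("spacious living/dining rm w/frplc"))
```

A table like this only works when each abbreviation has one reading; "pls" shows why context is needed, since nothing in the token itself says which expansion is intended.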
Summary
• Phones realize phonemes in different contexts
– Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
• Versatile FSTs can model phonological as well as morphological and spelling systems
• Many creative approaches toward pronunciation modeling for TTS
• Next time: Read Ch 5