Incorporating syllables into feature-based distributions describing phonotactic patterns
Cesar Koirala
(University of Delaware)
LSA 2013 Boston, Massachusetts
In this presentation…
• I will introduce a method of integrating syllables into a feature-based model of phonotactic learning (Heinz & Koirala 2010).
• I will demonstrate that incorporating syllables provides important contextual information to the model.
• I will discuss how this project exemplifies the incorporation of feature interactions into feature-based models.
Outline: Introduction · Feature Interaction · Heinz & Koirala Model · Syllables · Discussions
General schema of the presentation
• Feature interaction
• Broad picture: What is a feature-based model with syllables? Where does it fit in the broad categorization of learning models of phonotactics?
• A brief introduction to the Heinz & Koirala model and its baseline version
• A method of integrating syllables into the model
• An illustrative example
Feature Interaction
• Features F and G interact if there is a phonotactic constraint in a language that targets the co-occurrence of F and G in some context.
• In English, the features [dorsal] and [nasal] interact because of the existence of the constraint *[#ŋ], i.e. *#[+dorsal, +nasal], while there is no constraint *#[+dorsal] or *#[+nasal].
• Statistically speaking, the two features are not independent.
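To make "not independent" concrete, here is a minimal sketch with invented toy counts (the numbers are hypothetical, chosen only to mirror the English pattern above):

```python
# Toy sketch (hypothetical counts): feature interaction as statistical
# non-independence of [dorsal] and [nasal] in word-initial position.
# Counts of word-initial segments, cross-classified by the two features.
counts = {
    ("+dorsal", "+nasal"): 0,    # *#ŋ: velar nasals never occur word-initially
    ("+dorsal", "-nasal"): 40,   # e.g. word-initial k, g
    ("-dorsal", "+nasal"): 30,   # e.g. word-initial m, n
    ("-dorsal", "-nasal"): 130,  # everything else
}
total = sum(counts.values())

# Marginal probabilities of each feature value
p_dorsal = sum(v for (f, _), v in counts.items() if f == "+dorsal") / total
p_nasal = sum(v for (_, g), v in counts.items() if g == "+nasal") / total

# Under independence we would expect P(+dorsal, +nasal) = P(+dorsal) * P(+nasal)
p_joint = counts[("+dorsal", "+nasal")] / total
p_expected = p_dorsal * p_nasal

print(p_joint)     # 0.0: the constraint *#[+dorsal, +nasal]
print(p_expected)  # 0.03: an independence model wrongly predicts #ŋ is possible
```

The observed joint probability is zero while the product of the marginals is not, which is exactly the sense in which the two features fail to be independent.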
Theories of Feature Interaction
• The most restrictive theory of feature interaction:
  - No two (or more) features interact.
  - Statistically speaking, all features are independent.
• The least restrictive theory of feature interaction:
  - Any group of features may (or may not) interact.
Space of possibilities of interaction
No features interact (most restrictive): F1, F2, F3, F4, F5, F6, F7, F8, …
Two features interact: F1F2, F1F3, F1F4, …
…
All features interact (least restrictive): F1F2…Fn (this is the n-gram model)
In between lies a huge space of possible feature interactions.
Research Question
• What is the nature of feature interaction in the world's languages?
• General hypothesis: the nature of feature interaction is not arbitrary.
• Research strategy: the huge space of possibilities does not need to be searched. If we understand the nature of feature interaction, we can search a much smaller, more constrained space.
• My thesis examines this question typologically, experimentally, and with a computational model.
• In this talk I will focus only on the computational model.
Computational models of Phonotactic learning
Phonotactic learner: The task of the model (a computer program) is to observe data from a corpus and learn a phonotactic grammar that reflects the linguistic generalizations present in the data.
Broad Categorization of Phonotactic Models

Lexicalist learning models of phonotactics fall into three groups:

• Non supra-segmental models: assign well-formedness on the basis of the lexical frequency of non supra-segmental units.
  - Segment-based: n-gram models; Vitevitch & Luce (2004)
  - Feature-based: Hayes & Wilson (2008); Albright (2009); Heinz & Koirala (2010)
• Supra-segmental models: assign well-formedness on the basis of the lexical frequency of supra-segmental units, e.g. syllable-based models: Coleman & Pierrehumbert (1997)
• Models combining both: use both segmental and supra-segmental information: the Heinz & Koirala model with syllables; Daland et al. (2011)
Motivation for feature-based models with syllables
• Feature-based models (Hayes & Wilson 2008; Albright 2009) uniformly benefit from incorporating syllables when predicting sonority projection effects (Daland et al. 2011).
• Models based on syllables alone (e.g., Coleman & Pierrehumbert 1997) require the onset and coda constituents to be independently well-formed in order to impose any kind of phonotactics.
Heinz & Koirala Model
• Heinz & Koirala (2010) present a computational learning model of phonotactics that generalizes on the basis of phonological features. Its baseline version assumes statistical independence of the individual features (no feature interaction).
Heinz & Koirala Model
• The basic idea:
  1. Probability distributions over segments can be defined as a normalized product of simple distributions over features.
  2. These simple distributions can be distributions of individual features or distributions of combinations of features.
  3. Probabilistic Deterministic Finite Automata (PDFAs) can be used to represent the features and their distributions.
• Advantages:
  1. Fewer parameters than an n-gram model, so less training is necessary.
  2. Mathematically sound: a well-formed probability distribution.
  3. Captures intuitions transparently: which sounds count as "like sounds" (having like distributions in like contexts) is determined by the features that do NOT interact. In the baseline version, this means all like sounds have like distributions in like contexts.
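The first point of the basic idea can be sketched with the feature system used in the illustration below (∑ = {a, b, c}, features F and G); the per-feature probabilities here are hypothetical placeholders:

```python
# Sketch of the basic idea: a distribution over segments defined as the
# normalized product of independent distributions over feature values.
# Feature system from the illustration: sigma = {a, b, c}, features F and G.
features = {"a": ("+F", "-G"), "b": ("+F", "+G"), "c": ("-F", "+G")}

# Hypothetical per-feature distributions (in practice, estimated from a corpus)
p_F = {"+F": 0.7, "-F": 0.3}
p_G = {"+G": 0.6, "-G": 0.4}

# Unnormalized score of each segment = product of its feature probabilities
score = {s: p_F[f] * p_G[g] for s, (f, g) in features.items()}

# Z normalizes over the segments that actually exist in sigma, so the
# result is a well-formed probability distribution.
Z = sum(score.values())
p_segment = {s: v / Z for s, v in score.items()}

print(p_segment)
```

Note that the normalization Z is needed because not every combination of feature values corresponds to a segment in the alphabet (here, (-F, -G) names no segment).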
Illustration
(1) A binary feature system with ∑ = {a, b, c} and two features F and G:

   F  G
a  +  -
b  +  +
c  -  +

(2) PDFAs for features F and G (figure).

(3) A fragment of the product machine of the PDFAs for F and G (figure). Z is a normalizing term that ensures a well-formed probability distribution.
Illustration
• The maximum likelihood estimate of the model given the corpus is obtained by passing the corpus through the factor PDFAs (not the product machine).
• The parse of the sample through the factor machines is counted and normalized to obtain the distributions of the individual features.
• The normalized product of these PDFAs gives the actual probability distribution.
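A minimal sketch of this training step, using a hypothetical three-word toy corpus over the illustration's alphabet {a, b, c}; each factor machine is modeled here as a simple bigram over one feature's values:

```python
from collections import defaultdict

# Sketch of MLE training (hypothetical toy corpus): the corpus is passed
# through a factor PDFA for each feature; here the factor machine for a
# single feature is a bigram model over that feature's values.
features = {"a": ("+F", "-G"), "b": ("+F", "+G"), "c": ("-F", "+G")}
corpus = ["ab", "abc", "cab"]

def train_factor(corpus, idx):
    """Count and normalize transitions over the values of one feature."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        prev = "#"  # word-boundary state
        for seg in word:
            val = features[seg][idx]
            counts[prev][val] += 1
            prev = val
    # Normalize each state's outgoing counts into a distribution (MLE)
    return {state: {v: n / sum(out.values()) for v, n in out.items()}
            for state, out in counts.items()}

p_F = train_factor(corpus, 0)  # factor machine for feature F
p_G = train_factor(corpus, 1)  # factor machine for feature G
print(p_F["#"])  # two of the three words start with a (+F) segment
```

The actual model's factor machines and their product are richer than this bigram sketch, but the counting-then-normalizing step is the same in spirit.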
How restrictive is the baseline version in terms of feature interaction?
• As mentioned, the baseline version implements the most restrictive theory: no features interact!
• The baseline version is unable to learn feature interaction.
• For instance, since the baseline version observes nasals and velars word-initially in English, it incorrectly predicts that the velar nasal can occur there too.
• A non-baseline version that a priori allows two features to interact can learn that [+dorsal] and [+nasal] cannot co-occur word-initially in English.
• In general, the model can allow any fixed number of features to interact.
Allowing featural interaction in the model
• Generalizations can be made on the basis of more than one feature.
• If F is [dorsal] and G is [nasal], the model learns that [+dorsal] and [+nasal] cannot co-occur word-initially in English, while non-nasal velars and non-velar nasals can.

   F  G
a  +  -
b  +  +
c  -  +
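A toy sketch of the contrast (invented mini-corpus; "N" stands in for the velar nasal): estimating a joint distribution over (dorsal, nasal) value pairs captures the word-initial gap that independent marginals miss:

```python
from collections import Counter

# Toy sketch (hypothetical mini-corpus): letting [dorsal] and [nasal]
# interact means estimating one joint distribution over value *pairs*,
# rather than two independent distributions over single values.
feats = {"k": ("+dors", "-nas"), "n": ("-dors", "+nas"),
         "N": ("+dors", "+nas"), "t": ("-dors", "-nas")}  # N = velar nasal
initials = ["k", "n", "t", "k", "n", "t", "t"]  # word-initial segments; no "N"

# Independent model: product of marginal word-initial probabilities
dors = Counter(feats[s][0] for s in initials)
nas = Counter(feats[s][1] for s in initials)
n = len(initials)
p_indep = (dors["+dors"] / n) * (nas["+nas"] / n)

# Interacting model: joint distribution over (dorsal, nasal) pairs
joint = Counter(feats[s] for s in initials)
p_joint = joint[("+dors", "+nas")] / n

print(p_indep > 0)   # True: independence wrongly licenses word-initial ŋ
print(p_joint == 0)  # True: the joint model learns *#[+dorsal, +nasal]
```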
How restrictive is the baseline version in terms of feature interaction?
The baseline version sits at the most restrictive point of this space, where no features interact (F1, F2, F3, F4, …). The version presented here a priori allows two features to interact (F1F2, F1F3, F1F4, …). At the least restrictive end, all features interact (F1F2…Fn), and the space of possible interactions in between is huge.
Our strategy: we do not need to search the whole space, as it is a priori restricted in some fashion by UG.
Factoring syllables into the model
• We use a single multivalued feature SYLL with the values onset, coda, and nucleus.
• A consonant in the onset position of a syllable has the SYLL value onset; a consonant in the coda position has the SYLL value coda; syllabic sounds have the SYLL value nucleus.
• This representation enables us to obtain the distribution of syllable structure in the language, which can be represented by a PDFA.
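A minimal sketch of such a syllable-structure PDFA (all transition probabilities are hypothetical; states track the previous SYLL value):

```python
# Minimal sketch (hypothetical probabilities) of a PDFA over the SYLL
# values onset, nucleus, coda. Missing arcs (e.g. onset -> coda) get
# probability zero, encoding that an onset must be followed by a nucleus.
transitions = {
    "#":       {"onset": 0.7, "nucleus": 0.3},           # word start
    "onset":   {"onset": 0.2, "nucleus": 0.8},           # complex onsets
    "nucleus": {"coda": 0.4, "onset": 0.3, "end": 0.3},
    "coda":    {"coda": 0.1, "onset": 0.5, "end": 0.4},  # complex codas
}

def word_prob(syll_values):
    """Probability of a SYLL-value sequence under the PDFA."""
    p, state = 1.0, "#"
    for v in syll_values + ["end"]:
        p *= transitions[state].get(v, 0.0)
        state = v if v != "end" else state
    return p

print(word_prob(["onset", "nucleus", "coda"]))  # a CV C word: > 0
print(word_prob(["onset", "coda"]))             # ill-formed: 0.0
```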
The Sonority Sequencing Principle
• Sonority Sequencing Principle/Generalisation (Selkirk 1984:116): In any syllable, there is a segment constituting a sonority peak that is preceded and/or followed by a sequence of segments with progressively decreasing sonority values.
• This typological generalization is a statement of featural interaction. Features important to sonority interact with syllabic features.
Factoring syllables into the model
• We use a single multivalued feature Sonority with the values stop, fricative, affricate, nasal, liquid, glide, and vowel.
• A non-baseline version that a priori allows the syllabic feature and the sonority feature to interact is able to learn SSP-like behavior.
• This means training is done on the product of the syllabic PDFA and the sonority PDFA.
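A sketch of what training on the product machine amounts to (toy data, hypothetical annotation): each state is now a pair of a SYLL value and a sonority class, so co-occurrence restrictions between the two features become learnable:

```python
from collections import defaultdict

# Sketch (toy data) of training on the product of the syllabic and
# sonority machines: the state tracks a *pair* (SYLL value, sonority
# class), so the two features interact and SSP-like gaps can be learned.
word = [("onset", "stop"), ("onset", "liquid"),  # e.g. a "pr..." onset
        ("nucleus", "vowel"), ("coda", "nasal")]

counts = defaultdict(lambda: defaultdict(int))
prev = ("#", "#")
for pair in word:
    counts[prev][pair] += 1
    prev = pair

# An onset stop -> liquid transition was observed; its mirror image,
# onset liquid -> stop, was not, so it receives zero count (and hence
# zero probability after normalization), as the SSP predicts.
print(counts[("onset", "stop")][("onset", "liquid")])  # 1
print(counts[("onset", "liquid")][("onset", "stop")])  # 0
```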
(Figure: the sonority PDFA)
Factoring syllables into the model
Training data
• CELEX2 English lemma corpus with pronunciations taken from the CMU Pronouncing Dictionary (Chandlee 2012).
• Stress markings were removed.
• The training corpus consisted of 23,911 words.
• Training was done on the version of the corpus that reflected information about onsets and codas.
P R AH P OW Z AH L → P_ons R_ons AH P_ons OW Z_ons AH L_cod
P R AH P OW Z → P_ons R_ons AH P_ons OW Z_cod
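The annotation step can be sketched as follows; the helper below is hypothetical and assumes the syllabification is already given (real CMU transcriptions use a larger vowel set than this abbreviated one):

```python
# Sketch of the corpus annotation step (hypothetical helper): given a
# CMU-style transcription and a syllabification, label each consonant
# with its SYLL value, as in P R AH P OW Z -> P_ons R_ons AH P_ons OW Z_cod.
VOWELS = {"AH", "OW", "IY", "EH", "AE"}  # abbreviated for the sketch

def annotate(syllables):
    """syllables: list of phone lists, one list per syllable."""
    out = []
    for syl in syllables:
        nucleus = next(i for i, p in enumerate(syl) if p in VOWELS)
        for i, p in enumerate(syl):
            if p in VOWELS:
                out.append(p)            # nuclei keep their symbol
            elif i < nucleus:
                out.append(p + "_ons")   # before the nucleus: onset
            else:
                out.append(p + "_cod")   # after the nucleus: coda
    return out

# "propose": P R AH . P OW Z
print(annotate([["P", "R", "AH"], ["P", "OW", "Z"]]))
```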
Factoring syllables into the model
Illustrative example (English consonant clusters): the model assigns different probability values to the same clusters in onset and coda positions.
Factoring syllables into the model
Illustrative example (English consonant clusters): the results show that the model does not just assign different probabilities to the same consonant clusters in onset and coda positions; it has also learned SSP-like behavior. For example:
1. The model assigns higher probabilities to X-stop coda clusters than to X-stop onset clusters, as predicted by the SSP. The only X-stop onset clusters with a non-zero probability are fricative-stop clusters (probably due to s-stop onsets).
Factoring syllables into the model
Illustrative example (English consonant clusters), continued:
2. Similarly, the model assigns higher probabilities to X-fricative coda clusters than to X-fricative onset clusters, as predicted by the SSP. The only two onset clusters with non-zero probabilities are fricative-fricative and stop-fricative.
3. The only X-affricate clusters with non-zero values are nasal-affricate clusters ('bench') and liquid-affricate clusters ('arch') in coda position.
Conclusion
• We introduced a way of integrating syllables into the Heinz & Koirala model.
• Unlike the baseline version of the Heinz & Koirala model, this version a priori allowed certain features (from the syllable PDFA and the sonority PDFA) to interact.
• This approach directly incorporated the structure of the SSP into the model via this interaction.
• Consequently, the model finds different probability distributions for the same consonant cluster in different syllabic positions.
Thank you!
This research is supported by NSF Linguistics DDRIG grant #1226793. I thank Jeff Heinz for valuable discussion that set the foundation for this work. I would also like to thank the University of Delaware's phonology/phonetics group and the audiences at NECPhon 2012 for their comments and suggestions.