Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley

Statistical NLPSpring 2010

Lecture 25: Diachronics

Dan Klein – UC Berkeley

Evolution: Main Phenomena

Mutations of sequences

Speciation

Tree of Languages

Challenge: identify the phylogeny Much work in

biology, e.g. work by Warnow, Felsenstein, Steele…

Also in linguistics, e.g. Warnow et al., Gray and Atkinson…

http://andromeda.rutgers.edu/~jlynch/language.html

Statistical Inference Tasks

Modern TextAncestralWord Forms

Cognate Groups /Translations

GrammaticalInference

Inputs Outputs

Phylogeny

FR IT PT ES

fuego feu

feuhuevo

les faits sont très clairs

Outline

AncestralWord Forms

fuego feu

feuhuevo

Language Evolution: Sound Change

camera /kamera/Latin

chambre /ʃambʁ/French

Deletion: /e/

Change: /k/ .. /tʃ/ .. /ʃ/

Insertion: /b/

Eng. camera from Latin, “camera obscura”

Eng. chamber from Old Fr. before the initial /t/ dropped

Diachronic Evidence

tonitru non tonotrutonight not tonite

Yahoo! Answers [2009] Appendix Probi [ca 300]

Synchronic (Comparative) Evidence

Simple Model: Single Characters

CG CC CC GG

A Model for Words Forms

focus /fokus/

fuego /fweɣo/

/fogo/

fogo /fogo/

fuoco /fwɔko/

IT ES PT

LAµE S

µI T µP T

[BGK 07]

Contextual Changes

/fokus/

/fwɔko/

f# w ɔ

Changes are Systematic

/fokus/

/fweɣo/

/fogo/

/fogo//

fwɔko/

/fokus/

/fweɣo/

/fogo/

/fogo//

fwɔko/

/kentru

/sentro

/sentr

/sentro

/tʃɛntro

Experimental Setup

Data sets Small: Romance

French, Italian, Portuguese, Spanish

2344 words Complete cognate sets Target: (Vulgar) Latin

Large: Oceanic 661 languages 140K words Incomplete cognate sets Target: Proto-Oceanic [Blust,

FR IT PT ES

Data: Romance

Learning: Objective

P (µjw1 : : :wL )

/fokus/

/fweɣo/

/fogo/

/fogo//fwɔko/

Learning: EM

/fokus/

/fweɣo/

/fogo/

/fogo//fwɔko/

/fokus/

/fweɣo/

/fogo/

/fogo//fwɔko/

M-Step Find parameters which

fit (expected) sound change counts

Easy: gradient ascent on theta

E-Step Find (expected) change

counts given parameters

Hard: variables are string-valued

Computing Expectations

‘grass’

Standard approach, e.g. [Holmes 2001]: Gibbs sampling each sequence

[Holmes 01, BGK 07]

A Gibbs Sampler

‘grass’

A Gibbs Sampler

‘grass’

A Gibbs Sampler

‘grass’

Getting Stuck

How to jump to a state where the liquids /r/ and /l/ have a common

ancestor?

Getting Stuck

Solution: Vertical Slices

Single Sequence

Resampling

Ancestry Resampling

[BGK 08]

Details: Defining “Slices”

The sampling domains (kernels) are indexed by contiguous subsequences (anchors) of the observed leaf sequences

Correct construction section(G) is non-trivial but very efficient

anchor

Results: Alignment Efficiency

Is ancestry resampling faster than basic Gibbs?

Hypothesis: Larger gains for deeper trees

Depth of the phylogenetic tree

Setup: Fixed wall time

Synthetic data, same parameters

Results: Romance

Learned Rules / Mutations

Comparison to Other Methods

Evaluation metric: edit distance from a reconstruction made by a linguist (lower is better)

UsOakes

Comparison to system from [Oakes, 2000]

Uses exact inference and deterministic rules

Reconstruction of Proto-Malayo-Javanic cf [Nothefer, 1975]

Data: Oceanic

Proto-Oceanic

Data: Oceanic

http://language.psy.auckland.ac.nz/austronesian/research.php

Result: More Languages Help

Number of modern languages used

ceDistance from [Blust, 1993] Reconstructions

Results: Large Scale

Visualization: Learned universals

*The model did not have features encoding natural classes

Regularity and Functional Load

In a language, some pairs of sounds are more contrastive than others (higher functional load)

Example: English “p”/“b” versus “t”/”th”

“p”/“b”: pot/dot, pin/din, dress/press, pew/dew, ...

“t”/”th”: thin/tin

Functional Load: Timeline

Functional Load Hypothesis (FLH): sounds changes are less frequent when they merge phonemes with high functional load [Martinet, 55]

Previous research within linguistics: “FLH does not seem to be supported by the data” [King, 67]

Caveat: only four languages were used in King’s study [Hocket 67; Surandran et al., 06]

Our work: we reexamined the question with two orders of magnitude more data [BGK, under review]

Functional load as computed by [King, 67]

Data: only 4 languages from the Austronesian dataM

Each dot is a sound change identified by the system

Data: all 637 languages from the Austronesian data

Functional load as computed by [King, 67]

rior p

bility

Outline

AncestralWord Forms

fuego feu

feuhuevo

Cognate Groups

/fweɣo/

/fogo/

/fwɔko/

/berβo/

/vɛrbo/

/vɛrbo//

tʃɛntro/

/sentro/

/sɛntro/

‘fire’

Model: Cognate Survival

IT ES PT

Results: Grouping Accuracy

Fraction of Words Correctly Grouped

Method

[Hall and Klein, in submission]

Semantics: Matching Meanings

Occurs with:“night”“sun”

“week”

tag ENEN

Occurs with:“nacht”“sonne”“woche”

Occurs with:“name”“label”“along”

Outline

AncestralWord Forms

fuego feu

feuhuevo

Grammar Induction

congress narrowly passed the amended bill les faits sont très clairs

Task: Given sentences, infer grammar (and parse tree structures)

Shared Prior

P (µ̀ jµ) = N (µ;¾2I )

P (µ) = N (0;¾2I )

congress narrowly passed the amended bill

Results: Phylogenetic Prior

WG NGRMG

GLAvg rel gain: 29%

Conclusion Phylogeny-structured models can:

Accurately reconstruct ancestral words Give evidence to open linguistic debates Detect translations from form and context Improve language learning algorithms

Lots of questions still open: Can we get better phylogenies using these high-

res models? What do these models have to say about the very

earliest languages? Proto-world?

Thank you!

nlp.cs.berkeley.edu

Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley

Documents

Statistical NLP Winter 2008 Lecture 1: Introduction Roger Levy (with grateful borrowing from Dan Klein)

CS 188: Artificial Intelligence Spring 2006 Lecture 27: NLP 4/27/2006 Dan Klein – UC Berkeley

SP10 cs288 lecture 22 -- summarization.pptpeople.eecs.berkeley.edu/~klein/cs288/sp10/slides/SP10 cs288 lectu… · Lecture 22: Summarization Dan Klein –UC Berkeley Includes slides

Semantics and Pragmatics of NLP Overvie · SPNLP: Overview Lascarides & Klein Outline Meaning and NLP The In uence of Logic Computational Semantics Computational Pragmatics Semantics

Statistical NLP Spring 2011 Lecture 21: Semantic Roles Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint manual before you delete this

Statistical NLP Winter 2009 Language models, part II: smoothing Roger Levy thanks to Dan Klein and Jason Eisner

Truth-Conditional Semantics Statistical NLPklein/cs288/sp10/slides... · 2010-04-12 · 1 Statistical NLP Spring 2010 Lecture 20: Compositional Semantics Dan Klein –UC Berkeley

SP10 cs288 lecture 22 -- coreference.pptpeople.eecs.berkeley.edu/~klein/cs288/sp10/slides/SP10 cs288 lectu… · Lecture 21: CoreferenceResolution Dan Klein –UC Berkeley. 2. 3 Infinite

CS 188: Artificial Intelligence Fall 2007 Lecture 22: Viterbi 11/13/2007 Dan Klein – UC Berkeley

SP11 cs288 lecture 27 -- grammar induction (2PP)klein/cs288/sp11/slides/SP11 cs288 lecture 27...Lecture 27: Grammar Induction Dan Klein – UC Berkeley Supervised Learning Systems

Statistical NLP Spring 2010 Lecture 6: Parts-of-Speech Dan Klein – UC Berkeley

NLP in Scala with Breeze and Epic David Hall UC Berkeley

20100106 Tom Klein NLP & Capitalism 3.0

Statistical NLP Winter 2008 Lecture 5: word-sense disambiguation Roger Levy Thanks to Jason Eisner and Dan Klein

CS 188: Artificial Intelligence Fall 2007 Lecture 25: Kernels 11/27/2007 Dan Klein – UC Berkeley

SP07 cs294 lecture 7 -- word classes.ppt [Read-Only]klein/cs294-7/SP07... · 2007-02-07 · 1 Statistical NLP Spring 2007 Lecture 7: Word Classes Dan Klein – UC Berkeley What’s

An Empirical Investigation of Statistical Significance in NLP · An Empirical Investigation of Statistical Signicance in NLP Taylor Berg-Kirkpatrick David Burkett Dan Klein Computer

Statistical NLP Spring 2010 Lecture 22: Summarization Dan Klein – UC Berkeley Includes slides from Aria Haghighi, Dan Gillick

Statistical NLP Spring 2010 Lecture 1: Introduction Dan Klein – UC Berkeley

Algorithms for NLP - cs.cmu.edutbergkir/11711fa17/FA17 11-711 lecture 23... · Machine Translation IV Taylor Berg-Kirkpatrick –CMU Slides: Dan Klein, David Hall –UC Berkeley Algorithms