A Corpus Search Methodology for Focus Realization

Jonathan Howell and Mats Rooth

Linguistics and CIS

Cornell University

Goals

Study phonetic realization of focus in cases where formal-semantic theories make clear predictions.

Natural data from podcasts, radio, etc.

Find data using speech search engine based on speech recognition (Everyzing)

Automate all of the workflow

Today: preliminary data from pilot

he stayed longer than I did

-er [[ he stayed x long]2

than [ IF stayed x long ]~2]

[ y stayed x-long ] antecedent clause

[ speaker stayed x-long ] scope of focus

… I should have liked that song a lot more than I did.

[more

λx[[should λw[ I like that song x well in w]]

than [I like that song x well in w0]]]

I understand even less than I did before

even less [[ I prs understand x much]2

than [I understood x much beforeF] ]~2]

Focus in comparative clauses

• Coherent syntactic-semantic theory about where focus should go

• Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause

• On a theoretical basis, we often think we know the correct grammatical analysis of sentences people use

Result

Hundreds of instances of a minimal pair that varies in the position of focus

Speech files for short and 10-second intervals spanning “than I did”

Everyzing HTML contains time offsets for the beginnings of words. A program converts these into a Praat representation.

Alignments are not good enough to use without correction.
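The conversion from word onsets to a Praat representation could look roughly like the sketch below. It assumes the offsets have already been pulled out of the HTML as (word, onset) pairs, since the Everyzing markup is not shown here. The function name `onsets_to_textgrid` is hypothetical. Because only word beginnings are available, each word is assumed to end where the next one begins, which is one reason hand correction is still needed.

```python
def onsets_to_textgrid(words, total_duration, tier_name="words"):
    """Build a Praat TextGrid (long text format) from word onsets.

    words: list of (label, onset_seconds) pairs, sorted by onset.
    Assumption: only onsets are known, so each word ends where the
    next one begins; the last word ends at total_duration.
    """
    if not words:
        raise ValueError("need at least one word")
    intervals = []
    # Empty interval for any stretch before the first word.
    if words[0][1] > 0:
        intervals.append((0.0, words[0][1], ""))
    for i, (label, onset) in enumerate(words):
        end = words[i + 1][1] if i + 1 < len(words) else total_duration
        if end <= onset:
            raise ValueError(f"non-increasing onset at {label!r}")
        intervals.append((onset, end, label))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for n, (xmin, xmax, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside strings
        lines += [
            f"        intervals [{n}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up onsets, for illustration only.
    print(onsets_to_textgrid([("than", 0.5), ("I", 0.8), ("did", 1.0)], 1.4))
```

The resulting text can be saved with a `.TextGrid` extension and opened in Praat alongside the sound snippet, where the interval boundaries can be corrected by hand.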

Classification

Listen to sound snippet to determine if there is an actual token of “than I did”.

True in 56% of cases in a sample of 179 tokens.

Classify correct tokens into three grammatical-semantic classes

s Comparing than- and main clauses, reference varies in the position of “I”. This licenses focus on the subject “I”.

[ he looked younger than I did. ]

21/40 tokens

d Comparing than- and main clauses, reference is constant in the position of “I”, but varies in the possible-world or temporal index of “did”, and not in any following position.

Depending on details of the representation of modality and time, this could license a focus on “did”.

5/40 tokens

f Comparing than- and main clauses, reference in the position of “I” is constant, but varies in some position following “did”, often a temporal phrase.

I actually look younger now than I did 5 years ago

13/40 tokens
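As a quick check on the counts above, the short sketch below turns them into shares. It uses only the figures reported on these slides; the variable names are illustrative.

```python
# Counts reported for the three classes, out of 40 classified tokens.
counts = {"s": 21, "d": 5, "f": 13}
sample_size = 40

for label, n in counts.items():
    print(f"{label}: {n}/{sample_size} = {n / sample_size:.1%}")

# The three reported counts together cover this many of the 40 tokens.
classified = sum(counts.values())
print(f"covered by s, d, f: {classified}/{sample_size}")

# 56% of the 179 sampled hits contained a real token of the phrase.
real_tokens = round(0.56 * 179)
print(f"approximate real tokens in the sample of 179: {real_tokens}")
```

The three class counts sum to 39 of 40; the slides do not say how the remaining token was treated.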

Mark vowel intervals in “I” and “did” by hand.

Pitch in vowel region and duration of vowel region contribute positively to the area under the pitch curve (definite integral of pitch).

Number of glottal pulses in the vowel region.
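The measures above can be sketched as follows. This is a minimal illustration, assuming the pitch track inside a hand-marked vowel interval is available as parallel lists of sample times (seconds) and pitch values (Hz), and that glottal pulse times come from a separate analysis step. The function names are hypothetical. The area is computed with the trapezoid rule, so it grows with both pitch height and vowel duration.

```python
def pitch_area(times, f0):
    """Trapezoid-rule approximation of the integral of pitch over time (Hz * s)."""
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need matching sequences with at least two samples")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (f0[i] + f0[i - 1]) * dt
    return area


def vowel_measures(times, f0, pulse_times):
    """Duration, pitch area, mean pitch, and pulse count for one vowel interval.

    Assumption: times[0] and times[-1] coincide with the marked vowel boundaries.
    """
    start, end = times[0], times[-1]
    duration = end - start
    area = pitch_area(times, f0)
    return {
        "duration": duration,
        "pitch_area": area,
        "mean_pitch": area / duration,
        "pulse_count": sum(start <= t <= end for t in pulse_times),
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(vowel_measures([0.0, 0.05, 0.10], [200.0, 220.0, 210.0], [0.01, 0.03, 0.12]))
```

Since the area equals mean pitch times duration, a longer vowel or a higher pitch both raise it, which is why it serves as a single combined measure.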

NLP vs. Acoustic Phonetics

Classification based on signal

NLP classifier based on correct sentence (or speech recognition output), using parsing and machine learning on text features
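For orientation, a text-only baseline might start from two surface cues: whether “I” also appears before “than”, and whether anything follows “did”. The sketch below is a crude keyword heuristic over a single sentence, not the parsing and machine-learning classifier described above. The function name and the rules are assumptions made for illustration, and the same two cues could instead be passed as features to a learned model.

```python
import re


def classify_than_i_did(sentence):
    """Return 's', 'd', or 'f' for a sentence containing 'than I did', else None.

    Heuristic rules (assumptions, not the authors' classifier):
      s: no "I" before "than", so reference in the subject position varies
      f: "I" before "than" and some material after "did"
      d: "I" before "than" and nothing after "did"
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["than", "i", "did"]:
            before, after = tokens[:i], tokens[i + 3:]
            if "i" not in before:
                return "s"
            return "f" if after else "d"
    return None


if __name__ == "__main__":
    for example in [
        "he looked younger than I did.",
        "I actually look younger now than I did 5 years ago",
        "I should have liked that song a lot more than I did.",
    ]:
        print(classify_than_i_did(example), "|", example)
```

The heuristic ignores contractions such as “I'll” and any syntactic structure, so it should be read only as a starting point for the text features a real classifier would use.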

Multiple focus

Issues: marking of multiple foci with different scopes, and the prominence of focus relative to accents that do not mark focus.

You made a very small amount more than I did. Now I make muchF more than youF do.
