Designing an Elicitation Corpus with Semantic Representations Simon Fung Advisor: Lori Levin November 2006


Designing an Elicitation Corpus with Semantic Representations

Simon Fung
Advisor: Lori Levin
November 2006


Corpus example

Was there an apple?
Wasn't there an apple?
Will there be an apple?
Won't there be an apple?
There was an apple.
There was not an apple.
There will be an apple.
There will not be an apple.
...


那裡曾經有一個蘋果嗎?
那裡不是曾經有一個蘋果嗎?
那裡會有一個蘋果嗎?
那裡不是會有一個蘋果嗎?
那裡曾經有一個蘋果。
那裡曾經沒有一個蘋果。
那裡會有一個蘋果。
那裡不會有一個蘋果。


Uses for parallel corpus

• statistical MT training data
• learning about grammar of new language


Motivation

how do languages form various constructions (e.g. relative clauses)?

1. The student whom I saw
2. 我見過的學生。 (Chinese: the relative clause 我見過的 "whom I saw" precedes the noun 學生 "student")


Motivation

what semantic distinctions are important in different languages?

English                 Mandarin                French
He is talking.          Tā zài jiǎng huà.       Il parle.
They are talking.       Tāmen zài jiǎng huà.    Ils parlent.
He talks. {habitually}  Tā jiǎng huà.           Il parle.


The MILE (MInor Language Elicitation) Corpus

• sentences covering various semantic categories/constructions
  e.g. number, gender, relative clauses
• to be translated into language under study
• semantic representation for each sentence


The MILE (MInor Language Elicitation) Corpus

• 10,000-20,000 words
• translations done by one person
• 7 languages per year for next 5 years
  E.g., Thai, Bengali, Punjabi
  May have a lot of speakers, but fewer electronic resources


Constraints

maximize range of semantic categories and constructions

minimize corpus size


Constraints

• different languages complex in different areas
• only one corpus, for this project
• ultimate goal: dynamically navigate through features
  e.g. no sing./pl. distinction → no dual
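
The navigation idea above (no singular/plural distinction implies no dual) could be sketched roughly as follows. This is only an illustration under stated assumptions: the implication table, the feature labels, and the function names are hypothetical and do not come from the MILE project.

```python
# Hypothetical implication table (an assumption for illustration): if the
# key distinction is absent in a language, the listed distinctions are
# assumed absent too, so their sentences need not be elicited.
IMPLIES_ABSENT = {
    "singular/plural": ["dual"],
}

def features_to_skip(absent):
    """Return every distinction that can be skipped, following implications transitively."""
    skip = set()
    stack = list(absent)
    while stack:
        feature = stack.pop()
        if feature in skip:
            continue
        skip.add(feature)
        stack.extend(IMPLIES_ABSENT.get(feature, []))
    return skip

def plan_elicitation(all_features, absent):
    """Keep only the features still worth eliciting, in their original order."""
    skip = features_to_skip(absent)
    return [f for f in all_features if f not in skip]

features = ["singular/plural", "dual", "gender"]
remaining = plan_elicitation(features, absent=["singular/plural"])
print(remaining)
```

With a single fixed corpus, as in this project, such pruning would happen after the fact; a dynamic tool could apply it while eliciting.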


Method

1. create semantic representations first (instead of starting with English)

2. write English sentences based on them

3. translate sentences into various languages


Example: feature structure

srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker;

((actor ((np-function fn-actor)
         (np-general-type interrogative-type)
         (np-person person-third)
         (np-number num-dual)
         (np-biological-gender bio-gender-male)
         (np-animacy anim-human)
         (np-pronoun-antecedent antecedent-n/a)
         (np-specificity specificity-neutral)
         (np-identifiability identifiability-neutral)
         (np-distance distance-neutral)
         (np-pronoun-exclusivity inclusivity-n/a)))
 (undergoer ((np-person person-third)
             (np-identifiability unidentifiable)
             (np-number num-pl)
             (np-specificity non-specific)
             (np-animacy anim-inanimate)
             (np-biological-gender bio-gender-n/a)
             (np-function fn-undergoer)
             (np-general-type common-noun-type)
             (np-pronoun-exclusivity inclusivity-n/a)
             (np-pronoun-antecedent antecedent-n/a)
             (np-distance distance-neutral)))
 (c-polarity polarity-positive)
 (c-v-absolute-tense future)
 (c-general-type open-question)
 (c-question-gap gap-actor)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)
 (c-comparator-function comparator-n/a)
 (c-causee-control control-n/a)
 (c-our-situations situations-n/a)
 (c-comparand-type comparand-n/a)
 (c-causation-directness directness-n/a)
 (c-source source-neutral)
 (c-causee-volitionality volition-n/a)
 (c-assertiveness assertiveness-neutral)
 (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-adjunct-clause-type adjunct-clause-type-n/a)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-v-lexical-aspect activity-accomplishment)
 (c-secondary-type secondary-neutral)
 (c-event-modality event-modality-none)
 (c-function fn-main-clause)
 (c-minor-type minor-n/a)
 (c-copula-type copula-n/a)
 (c-power-relationship power-peer)
 (c-our-shared-subject shared-subject-n/a))
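
A feature structure in this parenthesized notation can be read into a nested dictionary with a small parser. The sketch below is an illustration only, not the project's own tooling: the function names are hypothetical, and `fs_text` is an abbreviated subset of the example above.

```python
import re

def tokenize(text):
    # Parentheses are their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    """Parse one parenthesized expression, consuming tokens left to right."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok  # an atom: a feature name or a value
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return items

def to_dict(pairs):
    """Turn a list of [name, value] pairs into a dict, recursing into nested lists."""
    out = {}
    for name, value in pairs:
        out[name] = to_dict(value) if isinstance(value, list) else value
    return out

# Abbreviated subset of the "Who will break windows?" structure.
fs_text = """
((actor ((np-function fn-actor) (np-number num-dual)
         (np-biological-gender bio-gender-male)))
 (undergoer ((np-number num-pl) (np-specificity non-specific)))
 (c-polarity polarity-positive) (c-v-absolute-tense future))
"""

fs = to_dict(parse(tokenize(fs_text)))
print(fs["actor"]["np-number"])
```

Once parsed, individual features can be looked up by name, for example `fs["undergoer"]["np-specificity"]`.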


Example: feature structure

srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker;

((ACTOR ((NP-FUNCTION FN-ACTOR)(NP-GENERAL-TYPE INTERROGATIVE-TYPE)(NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)(NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)(NP-IDENTIFIABILITY UNIDENTIFIABLE)(NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE))

In each pair, the first element is the feature name (e.g. NP-NUMBER) and the second is its value (e.g. NUM-DUAL).


Using semantic representation

Advantages:
• more precise
• more complete
• encode actual linguistic features to elicit


1. Naturalness

naturalness of sentences vs. holding lexical items constant
• minimal pairs ideal (A tree fell/The tree fell)
• but also want natural sentences
• natural → easier to translate → fewer mistakes
  She hurt herself. *It hurt itself.

sentences are hand-written vs. using natural language generators (GenKit)


2. Restrictions

• need to find restrictions on combinations of features

• some combinations invalid/unnatural

• e.g. inclusive and third-person
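
One way to apply such restrictions is to enumerate all combinations and filter out the invalid ones. The sketch below assumes a tiny feature inventory: `person-third` and `inclusivity-n/a` appear in the example feature structure, while the other value names and the `is_valid` rule are hypothetical stand-ins for the project's actual restrictions.

```python
from itertools import product

# Illustrative inventory only; most value names here are assumptions.
FEATURES = {
    "np-person": ["person-first", "person-second", "person-third"],
    "np-pronoun-exclusivity": ["inclusive", "exclusive", "inclusivity-n/a"],
}

def is_valid(combo):
    """Reject the combination the slide names as invalid: inclusive with third person."""
    if (combo["np-person"] == "person-third"
            and combo["np-pronoun-exclusivity"] == "inclusive"):
        return False
    return True

names = list(FEATURES)
combos = [dict(zip(names, values)) for values in product(*FEATURES.values())]
valid = [c for c in combos if is_valid(c)]
print(len(combos), len(valid))
```

Further restrictions would be added as extra checks in `is_valid`, so each invalid pairing is written down once and applied to every generated combination.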


3. Definition of values

use language-independent semantic categories
• precise
• e.g. specificity better than definiteness

agreement on definitions
• intercoder agreement (informal experiment)
• writers agreed on English forms to use


Avoiding language-specificity

many-to-many translations of determiners

I have a cat.                J’ai un chat.
The cat is fat.              Le chat est gros.
I like chocolate.            J’aime le chocolat.
I eat chocolates.            Je mange des chocolats.
Communism failed.            Le communisme a échoué.
He has (some) money.         Il a de l’argent.
I am a teacher.              Je suis professeur.
England                      L’Angleterre
I don’t have a/any cat(s).   Je n’ai pas de chat.


Avoiding language-specificity

Have to break it down by function:
• Indefinite quantity (some water)
• Generic (the moose is a noble animal)
• Predicate nominal (I am a doctor)
• Definite noun phrase (the dog is sick)
• Etc.


Definiteness

example of a problem in design of features and values

how to define definiteness, while avoiding using English definiteness categories?


Criteria for definiteness

Lyons (1999):
• uniqueness
• familiarity
• identifiability
• specificity
• inclusiveness


Criteria for definiteness

chose the most important criteria:
• identifiability
• specificity


Definiteness

You and I are in a room. I say

“The chair is on fire!”


Definiteness

Why did I say “the chair”?
• identifiability: I know that you know what chair I’m talking about
• specificity: I’m referring to a particular chair


Grammatical feature: specificity

John wants to marry a Norwegian.

Feature: np-specificity
Values:
• specific: John wants to marry a (specific) Norwegian.
• non-specific: John wants to marry some Norwegian.
• specificity-neutral: She is a Norwegian.


Grammatical feature: specificity

Turkish direct objects:

Ali bir kitap okudu.
Ali one book read
Ali read a book.

Ali bir kitab-ı okudu.
Ali one book-acc read
Ali read a (specific) book.


Layout of Corpus

1. Clause types, negation, and formality
2. Discourse setting/Speaker-hearer features
3. Basic NP features
4. Verbal Tense and Aspect
5. Evidentiality and Modality
6. Causatives
7. Comparatives
8. Modifiers
9. Conjunctions
10. Clause-combining


Layout of Corpus

combine feature values systematically

why combine?
• some features interact
  e.g. Will the woman be happy? (interrogative, future tense)

what to combine?
• some features known to interact
  e.g. person, number (I am, we are, he is)
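
Systematic combination of two interacting features amounts to taking their cross product. As a minimal sketch, the snippet below crosses person and number and pairs each cell with the English present-tense form of "be", extending the slide's three examples; the value labels and the table are illustrative assumptions, not the corpus's actual inventory.

```python
from itertools import product

PERSONS = ["first", "second", "third"]
NUMBERS = ["singular", "plural"]

# English present-tense forms of "be": the form depends on person and
# number jointly, which is why the two features are elicited together.
BE_FORMS = {
    ("first", "singular"): "I am",
    ("first", "plural"): "we are",
    ("second", "singular"): "you are",
    ("second", "plural"): "you are",
    ("third", "singular"): "he is",
    ("third", "plural"): "they are",
}

# Every person/number cell gets one elicitation sentence.
cells = list(product(PERSONS, NUMBERS))
for person, number in cells:
    print(f"{person:6} {number:8} -> {BE_FORMS[(person, number)]}")
```

Crossing every feature with every other would make the corpus grow multiplicatively, which is why only features known to interact are combined.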


Status

• delivered 21,133 words (sampled version)
• translated into Thai, Bengali
• Spanish -> Guarani