Generation

Generation. Aims of this talk: discuss MRS and LKB generation; describe a larger research programme, modular generation; mention some interactions with other work in progress: RMRS, SEM-I.


Page 1:

Generation

Page 2:

Aims of this talk
- Discuss MRS and LKB generation
- Describe larger research programme: modular generation
- Mention some interactions with other work in progress: RMRS, SEM-I

Page 3:

Outline of talk
- Towards modular generation
- Why MRS?
- MRS and chart generation
- Data-driven techniques
- SEM-I and documentation

Page 4:

Modular architecture

[diagram: language-independent component → meaning representation → language-dependent realization → string or speech output]

Page 5:

Desiderata for a portable realization module
- Application independent
- Any well-formed input should be accepted
- No grammar-specific/conventional information should be essential in the input
- Output should be idiomatic

Page 6:

Architecture (preview)

[diagram: external LF converted via the SEM-I to an internal LF, which the chart generator realizes as a string; control modules and specialization modules attach to the pipeline]

Page 7:

Why MRS?
- Flat structures
  - independence of syntax: conventional LFs partially mirror tree structure
  - manipulation of individual components: can ignore scope structure etc.
- Lexicalised generation: composition by accumulation of EPs: robust composition
- Underspecification

Page 8:

An excursion: Robust MRS
- Deep Thought: integration of deep and shallow processing via compatible semantics
- All components construct RMRSs
- Principled way of building robustness into deep processing
- Requirements for consistency etc. help human users too

Page 9:

Extreme flattening of deep output

[figure: two scoped logical-form trees for "every cat chases some dog" (the every > some and some > every readings), both flattened to the single RMRS below]

lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat_n(x), lb5:dog_n_1(y),
lb4:some_q(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase_v(e), ARG1(lb3,x),
ARG2(lb3,y), h9 qeq lb2, h8 qeq lb5
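The flat representation above can be sketched as a simple data structure: a bag of labelled elementary predications with separate argument and handle-constraint lists. The type names below (EP, Arg, Qeq) and the dict layout are illustrative assumptions, not the actual RMRS/LKB API.

```python
# Illustrative sketch only: type names and layout are assumptions,
# not the actual RMRS/LKB data structures.
from collections import namedtuple

EP = namedtuple("EP", "label pred args")     # e.g. lb3:chase_v(e)
Arg = namedtuple("Arg", "role label value")  # e.g. ARG1(lb3,x)
Qeq = namedtuple("Qeq", "hole label")        # e.g. h9 qeq lb2

rmrs = {
    "eps": [
        EP("lb1", "every_q", ("x",)),
        EP("lb2", "cat_n", ("x",)),
        EP("lb4", "some_q", ("y",)),
        EP("lb5", "dog_n_1", ("y",)),
        EP("lb3", "chase_v", ("e",)),
    ],
    "args": [
        Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
        Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
        Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y"),
    ],
    "hcons": [Qeq("h9", "lb2"), Qeq("h8", "lb5")],
}

# Flatness pays off: individual components can be manipulated while
# ignoring scope, e.g. collecting every predicate applied to x.
preds_on_x = sorted(ep.pred for ep in rmrs["eps"] if "x" in ep.args)
# preds_on_x == ['cat_n', 'every_q']
```

Because the EPs, arguments, and handle constraints are separate flat lists, each can be inspected or manipulated without traversing any tree structure.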

Page 10:

Extreme Underspecification
- Factorize deep representation to minimal units
- Only represent what you know

Robust MRS
- Separating relations
- Separate arguments
- Explicit equalities
- Conventions for predicate names and sense distinctions
- Hierarchy of sorts on variables

Page 11:

Chart generation with the LKB
1. Determine lexical signs from MRS
2. Determine possible rules contributing EPs (`construction semantics': compound rule etc.)
3. Instantiate signs (lexical and rule) according to variable equivalences
4. Apply lexical rules
5. Instantiate chart
6. Generate by parsing without string position
7. Check output against input

Page 12:

Lexical lookup for generation
- _like_v_1(e,x,y) – return lexical entry for sense 1 of verb like
- temp_loc_rel(e,x,y) – returns multiple lexical entries
- multiple relations in one lexical entry: e.g., who, where
- entries with null semantics: heuristics

Page 13:

Instantiation of entries
- _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy")
- find locations corresponding to `x's in all FSs
- replace all `x's with constant
- repeat for `y's etc.
- Also for rules contributing construction semantics
- `Skolemization' (misleading name ...)
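The instantiation step can be sketched as replacing every occurrence of a semantic variable with a unique constant, so that feature structures can later unify only where the input MRS says the variables are identical. The function and constant names here are hypothetical, not the LKB's.

```python
# Hypothetical sketch of the instantiation ('Skolemization') step:
# each variable is mapped to one fresh constant shared by all of its
# occurrences across the EPs.
def instantiate(eps, variables):
    """Replace each variable with a unique constant across all EPs."""
    constants = {v: f"SK_{i}" for i, v in enumerate(variables)}
    return [
        (pred, tuple(constants.get(a, a) for a in args))
        for pred, args in eps
    ]

eps = [("_like_v_1", ("e", "x", "y")),
       ("named", ("x", "Kim")),
       ("named", ("y", "Sandy"))]
inst = instantiate(eps, ["e", "x", "y"])
# inst[0] == ('_like_v_1', ('SK_0', 'SK_1', 'SK_2'))
```

After instantiation, the subject slot of `like` and the EP naming Kim carry the same constant, so they can only be realized together.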

Page 14:

Lexical rule application
- Lexical rules that contribute EPs only used if EP is in input
- Inflectional rules will only apply if variable has the correct sort
- Lexical rule application does morphological generation (e.g., liked, bought)

Page 15:

Chart generation proper
- Possible lexical signs added to a chart structure
- Currently no indexing of chart edges
  - chart generation can use semantic indices, but current results suggest this doesn't help
- Rules applied as for chart parsing: edges checked for compatibility with input semantics (bag of EPs)
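The compatibility check against the bag of EPs can be sketched with multisets: an edge is viable only if its accumulated EPs form a sub-multiset of the input, and a complete realization must consume exactly the input bag. A minimal sketch, assuming EPs are reduced to hashable predicate names:

```python
# Sketch only: real edges carry full EPs and feature structures; here an
# EP is reduced to its predicate name so Counter can treat the input
# semantics as a bag (multiset).
from collections import Counter

def compatible(edge_eps, input_eps):
    """Edge may still grow into a full realization: no EP overused."""
    return not (Counter(edge_eps) - Counter(input_eps))

def complete(edge_eps, input_eps):
    """Root condition: all input EPs consumed, none extra."""
    return Counter(edge_eps) == Counter(input_eps)

input_eps = ["every_q", "cat_n", "some_q", "dog_n_1", "chase_v"]
assert compatible(["every_q", "cat_n"], input_eps)
assert not compatible(["cat_n", "cat_n"], input_eps)  # EP consumed twice
assert complete(input_eps, input_eps)
```

Using a multiset rather than a set matters: an input may legitimately contain the same predication twice, and an edge must not consume any predication more often than the input supplies it.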

Page 16:

Root conditions
- Complete structures must consume all the EPs in the input MRS
- Should check for compatibility of scopes
  - precise qeq matching is (probably) too strict
  - requiring exactly the same scopes is (probably) unrealistic and too slow

Page 17:

Generation failures due to MRS issues
- Well-formedness check prior to input to generator (optional)
- Lexical lookup failure: predicate doesn't match entry, wrong arity, wrong variable types
- Unwanted instantiations of variables
- Missing EPs in input: syntax (e.g., no noun), lexical selection
- Too many EPs in input: e.g., two verbs and no coordination

Page 18:

Improving generation via corpus-based techniques
- CONTROL: e.g. intersective modifier order; logical representation does not determine order
  - wet(x) & weather(x) & cold(x)
- UNDERSPECIFIED INPUT: e.g.,
  - Determiners: none/a/the
  - Prepositions: in/on/at

Page 19:

Constraining generation for idiomatic output
- Intersective modifier order: e.g., adjectives, prepositional phrases
- Logical representation does not determine order
  - wet(x) & weather(x) & cold(x)

Page 20:

Adjective ordering
- Constraints / preferences
  - big red car / *red big car
  - cold wet weather / wet cold weather (OK, but dispreferred)
- Difficult to encode in symbolic grammar

Page 21:

Corpus-derived adjective ordering
- n-grams perform poorly
- Thater: direct evidence plus clustering, positional probability
- Malouf (2000): memory-based learning plus positional probability: 92% on BNC
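The positional-probability idea can be sketched as scoring each adjective by how often the corpus puts it before the others it co-occurs with, then sorting by that score. The pair counts below are invented for illustration; they stand in for real corpus statistics.

```python
# Invented illustration of ordering by positional probability; the
# counts are made up, not drawn from any real corpus.
PRECEDES = {  # PRECEDES[a][b]: times a appeared before b in training data
    "big":  {"red": 9},
    "red":  {"big": 1},
    "cold": {"wet": 7},
    "wet":  {"cold": 3},
}

def order_adjectives(adjs):
    """Sort adjectives by their estimated probability of coming first."""
    def score(a):
        before = sum(PRECEDES.get(a, {}).get(b, 0) for b in adjs if b != a)
        after = sum(PRECEDES.get(b, {}).get(a, 0) for b in adjs if b != a)
        return before / ((before + after) or 1)
    return sorted(adjs, key=score, reverse=True)

order_adjectives(["red", "big"])   # ['big', 'red']
order_adjectives(["wet", "cold"])  # ['cold', 'wet'] – the preferred order
```

Such preferences are exactly the kind of soft, gradient knowledge that is hard to state in a symbolic grammar but easy to apply as a ranking module after chart generation.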

Page 22:

Underspecified input to generation
"We bought a car on Friday"
Accept:
  pron(x) & a_quant(y,h1,h2) & car(y) & buy(e_past,x,y) & on(e,z) & named(z,Friday)
and:
  pron(x) & general_q(y,h1,h2) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)
And maybe:
  pron(x_1pl) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)

Page 23:

Guess the determiner
- We went climbing in _ Andes
- _ president of _ United States
- I tore _ pyjamas
- I tore _ duvet
- George doesn't like _ vegetables
- We bought _ new car yesterday

Page 24:

Determining determiners
- Determiners are partly conventionalized, often predictable from local context
- Translation from Japanese etc., speech prosthesis application
- More `meaning-rich' determiners assumed to be specified in the input
- Minnen et al.: 85% on WSJ (using TiMBL)
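A memory-based (TiMBL-style) determiner guesser can be sketched as nearest-neighbour lookup over stored instances. The features (head noun, number, whether postmodified) and the stored data below are invented for illustration and do not reflect the actual feature set used by Minnen et al.

```python
# Invented illustration of memory-based (k-NN style) classification;
# features and training instances are assumptions, not the real system.
MEMORY = [  # (head noun, number, has-postmodifier) -> determiner
    (("Andes", "pl", False), "the"),
    (("president", "sg", True), "the"),
    (("vegetables", "pl", False), ""),     # bare plural
    (("car", "sg", False), "a"),
]

def guess_determiner(features):
    """1-nearest-neighbour with a simple feature-overlap similarity."""
    def overlap(stored):
        return sum(a == b for a, b in zip(stored, features))
    return max(MEMORY, key=lambda m: overlap(m[0]))[1]

guess_determiner(("car", "sg", False))    # 'a'
guess_determiner(("Andes", "pl", False))  # 'the'
```

The memory-based approach fits this task well because determiner choice is heavily conventionalized: exact or near-exact matches to seen contexts (e.g. "the Andes") dominate.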

Page 25:

Preposition guessing
- Choice between temporal in/on/at:
  - in the morning, in July
  - on Wednesday, on Wednesday morning
  - at three o'clock, at New Year
- ERG uses hand-coded rules and lexical categories
- Machine learning approach gives very high precision and recall on WSJ, good results on balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
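Hand-coded rules of the kind the slide attributes to the ERG could look like the following sketch; the coarse category labels are assumptions for illustration, not the ERG's actual lexical categories.

```python
# Illustrative rule table for temporal in/on/at; category names are
# invented, not the ERG's lexical types.
def temporal_preposition(category):
    """Choose in/on/at from a coarse category of the time expression."""
    if category in {"clock_time", "holiday"}:        # at three o'clock, at New Year
        return "at"
    if category in {"day", "day_plus_part_of_day"}:  # on Wednesday (morning)
        return "on"
    if category in {"part_of_day", "month", "year"}: # in the morning, in July
        return "in"
    return None  # unknown category: leave underspecified

temporal_preposition("month")       # 'in'
temporal_preposition("day")         # 'on'
temporal_preposition("clock_time")  # 'at'
```

Note the interaction in the examples: "Wednesday morning" takes *on*, not *in*, so the rules must key on the whole time expression, not just its head noun.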

Page 26:

SEM-I: semantic interface
- Meta-level: manually specified `grammar' relations (constructions and closed-class)
- Object-level: linked to lexical database for deep grammars
- Definitional: e.g. lemma+POS+sense
- Linked test suites, examples, documentation

Page 27:

SEM-I development
- SEM-I eventually forms the `API': stable, changes negotiated
- SEM-I vs Verbmobil SEMDB
  - Technical limitations of SEMDB: too painful!
- `Munging' rules: external vs internal
- SEM-I development must be incremental

Page 28:

Role of SEM-I in architecture
- Offline
  - Definition of `correct' (R)MRS for developers
  - Documentation
  - Checking of test-suites
- Online
  - In unifier/selector: reject invalid RMRSs
  - Patching up input to generation

Page 29:

Goal: semi-automated documentation

[diagram: the Lex DB plus an [incr tsdb()] semantic test-suite feed the object-level SEM-I; the meta-level SEM-I feeds the documentation; ERG documentation is produced semi-automatically, with examples and an appendix autogenerated on demand from strings]

Page 30:

Robust generation
- SEM-I an important preliminary
  - check whether generator input is semantically compatible with grammars
- Eventually: hierarchy of relations outside grammars, allowing underspecification
  - `fill-in' of underspecified RMRS
  - exploit work on determiner guessing etc.

Page 31:

Architecture (again)

[diagram, as on Page 6: external LF converted via the SEM-I to an internal LF, which the chart generator realizes as a string; control modules and specialization modules attach to the pipeline]

Page 32:

Interface
- External representation: public, documented, reasonably stable
- Internal representation: syntax/semantics interface, convenient for analysis
- External/internal conversion via SEM-I

Page 33:

Guaranteed generation?
- Given a well-formed input MRS/RMRS, with elementary predications found in the SEM-I (and dependencies), can we generate a string?
  - with input fix-up? negotiation?
- Semantically bleached lexical items: which, one, piece, do, make
- Defective paradigms, negative polarity, anti-collocations etc.?

Page 34:

Next stages
- SEM-I development
- Documentation and test suite integration
- Generation from RMRSs produced by shallower parser (or deep/shallow combination)
- Partially fixed text in generation (cogeneration)
- Further statistical modules: e.g., locational prepositions, other modifiers
- More underspecification
- Gradually increase flexibility of interface to generation