
Lecture 10: Natural Language Generation, a highly complex task both for people and for machines. Slides borrowed from Michael Zock, LIMSI-CNRS, Orsay, France


Some preliminary issues: Warning

This is not a state-of-the-art talk. If you are interested in one, the following could be starting points:

Bateman & Zock (2003). Natural Language Generation. In R. Mitkov (Ed.), Handbook of Computational Linguistics, Oxford University Press, pp. 284-304

List of systems: http://www.fb10.uni-bremen.de/anglistik/langpro/NLG-table/NLG-table-root.htm

Anything related to NLG: http://www.siggen.org/

Some preliminary issues: Background material

Willem Levelt
• Speaking: From Intention to Articulation, MIT Press, 1989

E. Reiter & R. Dale
• Building Natural Language Generation Systems, Cambridge University Press, 2000

Overview of this talk

Part 1: General problems
• knowledge and constraints, architecture, process, etc.

Part 2: Deep generation
• message planning
• message ordering (text plan, outline)

Part 3: Surface generation
• lexical choice (access and synthesis)
• computation of syntactic structure

Different ways to look at text generation

NLG FOR people
• Fully automated text generation

NLG LIKE people
• Simulation of psychological processes
• Connectionism
• Online processing, incremental generation

NLG WITH people
• Semi-automated, machine-mediated generation
• Writer's workbench
• Foreign language learning

What is NLG? Ask Google

Fort méconnue du grand public, la génération de textes demeure une discipline sportive essentiellement universitaire, pratiquée par d'obscurs chercheurs dans des laboratoires tristes et exigus. Cette discipline pousse ses malheureux adeptes à des pratiques honteuses : la génération par ordinateur interposé de textes longs et soporifiques à partir d'une composition sémantique produite mécaniquement.

Hardly known by the great majority of people, text generation remains a sport basically practiced by people from academia. Those engaged in this activity usually work in sad and narrow places. The discipline induces strange kinds of behavior, like the generation of long and boring texts via computers on the basis of mechanically produced semantic representations.

What is NLG? In search of a definition

The focus and definition may depend on the domain (psychology, linguistics, computer science)

Mapping problem: translate meanings into linguistic form

Linguistically-mediated problem solving

Language as a search problem

What is NL generation? (I)

Generation as a mapping process

NLG viewed as a process of mapping a conceptual structure (meaning) onto a linguistic form

Input: concepts

Output: words

[Diagram: concepts C1, C2, C3 mapped onto words W1, W2, W3]

Catch me if you can

We tend to think faster than we can find the corresponding words and convert them into sounds

[Diagram: conceptualization (C1, C2, C3, C4) runs ahead of expression (W1, W2, W3)]

There is no one-to-one mapping between linguistic structures and conceptual structures

The same conceptual structure may map onto different linguistic structures (synonyms, paraphrase)

Possession
• This car belongs to the president. (verb)
• This is the car of the president. (preposition)
• This is the president's car. (genitive)
• This is his car. (possessive adjective)
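The one-to-many mapping above can be sketched as a small lookup table. This is only an illustration, not the model used in the slides: the names `REALIZATIONS`, `realize` and `paraphrases`, and the template strings, are assumptions made for this example.

```python
# Hypothetical table: one concept (POSSESSION) maps onto several linguistic
# resources. The resource labels follow the slide; the templates are illustrative.
REALIZATIONS = {
    "POSSESSION": {
        "verb": "This {thing} belongs to {owner}.",
        "preposition": "This is the {thing} of {owner}.",
        "genitive": "This is {owner}'s {thing}.",
    }
}


def realize(concept, resource, **slots):
    """Return one surface form for a concept, given the chosen resource."""
    return REALIZATIONS[concept][resource].format(**slots)


def paraphrases(concept, **slots):
    """Return every surface form the table offers for the concept."""
    return [template.format(**slots) for template in REALIZATIONS[concept].values()]
```

Calling `paraphrases("POSSESSION", thing="car", owner="the president")` returns three sentences for the same meaning; choosing among them is the generator's job. The possessive-adjective variant is left out because it would also require a pronoun choice.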

The same linguistic structure may map onto different conceptual structures

Linguistic resource: genitive
• Peter's car is broken. (possession)
• Peter's brother is sick. (family relationship)
• Peter's leg hurts. (inalienable possession, part of)

NLG as language-mediated problem solving

A simple generation model

Nature of choices

• pragmatic
• conceptual
• linguistic

Pragmatic choices

Languages are indirect means for achieving goals

• mediating devices

Different linguistic means serve different discourse purposes

• i.e. different forms are used in order to achieve different goals

Pragmatic choices: language as a resource

active vs. passive voice [topic, perspective]

main vs. subordinate clause [relative prominence]

Conceptual choices

Different meanings generally yield different forms

• NUMBER: he sings vs. they sing
• TENSE: he sings vs. he sang

Linguistic choices

The same meaning can be expressed by different words or syntactic forms (synonyms, paraphrases)

• GROWN-UP MALE PERSON: man, guy, chap
• HELP: help, give a hand, assist

What is NL generation? Tentative definition (III)

Generation as a search problem

Size of the mental lexicon: approx. 30,000 words

An abstract view

An example

Input: analysis

Input: synthesis

Different search spaces

Fundamental problems

Analysis : ambiguity

Generation : choice

Why bother about generation? (1)

Different kinds of motivation

• Theoretical
• Practical
• Industrial

Theoretical reasons: building and testing a theory

Testbed for a linguistic theory:
• coverage (over/undergeneration), correctness

Testbed for a psychological model:
• simulation of cognitive processes (on-line processing, language learning)

Practical reasons (industrial, full automation)

machine translation

text generation (business letters)

generation of summaries (stock market reports, weather forecasts, etc.)

help systems (audit trail, access to DB)

abstracting

Practical reasons (help systems, semi-automation)

Computer assisted language learning (tools)

Writer's workbench (pre/postediting: correction of grammar, style, spelling, text organization)

The decomposition of the task: NLG architectures

A two-stage model: division of labor

GOAL

Four components

Procedural know-how

Planning (determine the order of the different steps - textual organisation)

Searching (find the words; access)

Reasoning-inferencing (« see » possible links between ideas)

[Figure: memory stages; figure label: Rose]

• Sensory memory: 1 second
• STM (short-term memory): less than 30 seconds
• LTM (long-term memory): up to a lifetime

Basic memory processes

Number of choices (space + time constraints)

We have to make a great number of choices under severe space and time constraints

• space constraint: limitation of STM
• time constraint: speed
• speech is fast: 3-5 words / second
• average number of decisions / word = 4
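Taken together, the two figures above imply a decision rate. A small worked calculation, using only the numbers on the slide (the function and constant names are chosen for this example):

```python
# Figures from the slide: 3-5 words per second, about 4 decisions per word.
WORDS_PER_SECOND = (3, 5)
DECISIONS_PER_WORD = 4


def decisions_per_second(words_per_second, decisions_per_word=DECISIONS_PER_WORD):
    """Decisions a speaker must make each second at a given speech rate."""
    return words_per_second * decisions_per_word


low, high = (decisions_per_second(rate) for rate in WORDS_PER_SECOND)
print(f"{low}-{high} decisions per second")  # prints "12-20 decisions per second"
```

In other words, a fluent speaker makes on the order of 12 to 20 decisions every second while holding only a few items in short-term memory.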

Diversity of choices

• Conceptual choices
• Linguistic choices
• Pragmatic choices

The necessary information for synthesis is scattered all over

[Diagram: GIVE and its three arguments]
• LISTENER: subject, pronoun
• BOOK: direct object
• SPEAKER: indirect object, pronoun

How to express the notion of the speaker? What do the different forms depend upon?

SPEAKER
• je: I
• me: me
• moi: me
• nous: we / us

Number
• Tu me donnes le livre. (You give me the book.)
• Tu nous donnes le livre. (You give us the book.)

Person
• Tu me donnes le livre. (You give me the book.)
• Tu lui donnes le livre. (You give him/her the book.)

[Diagram: GIVE with LISTENER as subject, BOOK as direct object (DO), SPEAKER as indirect object (IO)]
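The dependence of the pronoun form on person and number can be sketched as a table lookup. This is a minimal illustration, not the author's system: the names `INDIRECT_OBJECT_CLITIC` and `give_sentence` are assumptions, and the subject and verb are fixed to the slide's "Tu ... donnes le livre" frame.

```python
# French indirect-object clitic pronouns, indexed by (person, number).
INDIRECT_OBJECT_CLITIC = {
    (1, "sg"): "me", (1, "pl"): "nous",
    (2, "sg"): "te", (2, "pl"): "vous",
    (3, "sg"): "lui", (3, "pl"): "leur",
}


def give_sentence(recipient_person, recipient_number):
    """Build 'Tu <clitic> donnes le livre.' for the given recipient."""
    clitic = INDIRECT_OBJECT_CLITIC[(recipient_person, recipient_number)]
    return f"Tu {clitic} donnes le livre."
```

For example, `give_sentence(1, "pl")` yields "Tu nous donnes le livre." The lookup covers person and number only; other factors, such as emphasis (moi) or grammatical function (je), would need further features.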

Polarity
• Donne-moi ce livre ! (Give me this book!)
• Ne me le donne pas ! (Don't give it to me!)

Tense
• Tu m'as donné le livre. (You have given me the book.)
• Tu me donnes le livre. (You give me the book.)

Speech act
• Donne-le-moi ! (Give it to me!)
• Tu me donnes le livre. (You give me the book.)

[Diagram: GIVE with LISTENER as subject, BOOK as direct object (DO), SPEAKER as indirect object (IO)]


Input: HELP (Agent: PAUL, Object: MARY), tense: present

PRAGMATIC CHOICE
• Paul = topic
• Marie = given
• aider = new

LEXICALIZATION
• HELP = aider
• PAUL = Paul
• MARY = Marie

PART OF SPEECH
• HELP = verb
• Paul = noun
• Mary = pronoun

SYNT. FUNCT. & VOICE
• voice = active
• Paul = subject
• Mary = direct object

WORD ORDER
• SUBJECT (noun)
• DIR. OBJECT (pronoun)
• VERB (verb)

MORPHOLOGY
• Verb: 3rd person, singular, present: aide
• Subject: noun: Paul
• Direct object: pronoun: la

PHONO-GRAPH. SYNTH.
• Paul l'aide.

Output: Paul l'aide. (Paul helps her.)
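The stages above can be strung together in a toy pipeline. This is a sketch under strong assumptions: it handles only the slide's single example and the active voice, and the message format, the tables `LEXICON`, `PRESENT_3SG` and `OBJECT_PRONOUN`, and the function `generate` are all invented for illustration.

```python
# Toy tables covering only the slide's example.
LEXICON = {"HELP": "aider", "PAUL": "Paul", "MARY": "Marie"}
PRESENT_3SG = {"aider": "aide"}      # morphology: 3rd person singular present
OBJECT_PRONOUN = {"Marie": "la"}     # pronoun used for a 'given' referent


def generate(message):
    """Map a conceptual message onto a French sentence, stage by stage."""
    # Lexicalization: concepts -> words.
    verb = LEXICON[message["predicate"]]
    subject = LEXICON[message["agent"]]
    obj = LEXICON[message["object"]]
    # Pragmatic choice: topic = agent gives active voice, agent as subject.
    if message["topic"] != message["agent"]:
        raise NotImplementedError("only the agent-as-topic (active) case is sketched")
    # Part of speech: a 'given' object is realized as a pronoun.
    is_pronoun = message["object"] in message["given"]
    if is_pronoun:
        obj = OBJECT_PRONOUN[obj]
    # Morphology: inflect the verb.
    verb_form = PRESENT_3SG[verb]
    # Word order: a full noun object follows the verb, a pronoun precedes it.
    if not is_pronoun:
        return f"{subject} {verb_form} {obj}."
    # Phono-graphemic synthesis: elide 'le'/'la' before a vowel.
    if obj in ("le", "la") and verb_form[0] in "aeiou":
        return f"{subject} {obj[0]}'{verb_form}."
    return f"{subject} {obj} {verb_form}."
```

With `{"predicate": "HELP", "agent": "PAUL", "object": "MARY", "topic": "PAUL", "given": {"MARY"}}` the function returns "Paul l'aide."; with an empty `given` set it returns "Paul aide Marie." A strictly sequential pipeline like this is a simplification, since the slides argue that the stages interact.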

Consequences for languages, architecture & processing

• languages are and need to be flexible
• information does not become available in a strict order: it may vary on every occasion (EVENT-TIME-PLACE vs. PLACE-EVENT-TIME, etc.)

Consequences (interaction and accommodation)
• Data: accommodation of the different data structures (interaction between words and syntax) in the different modules (conceptual, lexical, syntactic)
• Process: feedback to higher components

Example illustrating the consequences (i.e. functional dependencies) of the choices

Conceptual input

Let's consider the consequences of the following 2 choices

• Topicalisation: the concept to start the sentence with
• Lexical choice: synonyms

Topicalize Agent

Consequences:
• Agent --> Subject
• Voice --> active
• Patient --> Direct Object

Consequences of topicalisation

Topicalize Patient

Consequences:
• Agent --> PP
• Voice --> passive
• Patient --> grammatical Subject

Consequences of topicalisation

Summary of the consequences of the topicalization choice at the top level

            Strategy 1             Strategy 2
Topic       agent                  patient
Agent       grammatical subject    prepositional phrase
Patient     direct object          grammatical subject
Voice       active                 passive
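The table can be read as a function from the topic choice to its syntactic consequences. A minimal sketch, with the function name `assign_functions` and the dictionary keys chosen for this example:

```python
def assign_functions(topic):
    """Return the voice and syntactic functions that follow from the topic choice."""
    if topic == "agent":        # Strategy 1
        return {
            "voice": "active",
            "agent": "grammatical subject",
            "patient": "direct object",
        }
    if topic == "patient":      # Strategy 2
        return {
            "voice": "passive",
            "agent": "prepositional phrase",
            "patient": "grammatical subject",
        }
    raise ValueError(f"unsupported topic: {topic!r}")
```

A single high-level decision fixes three lower-level ones, which is what the slides mean by functional dependencies between choices.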

Assumptions - Conclusion

• no superexpert, but a set of cooperative agents
• competition and accommodation
• no algorithmic processing, but opportunistic planning
• various orders of processing
• various components need the same information
• the system is heterarchical rather than hierarchical