Lecture 10

Natural Language Generation: a highly complex task both for people and for machines

Slides borrowed from Michael Zock

LIMSI-CNRS, Orsay, France

Some preliminary issues: warning

This is not a state-of-the-art talk. If you are interested in one, the following could be starting points:

Bateman & Zock (2003). Natural Language Generation. In R. Mitkov (Ed.), Handbook of Computational Linguistics, Oxford University Press, pp. 284-304

List of systems: http://www.fb10.uni-bremen.de/anglistik/langpro/NLG-table/NLG-table-root.htm

Anything related to NLG: http://www.siggen.org/

Some preliminary issues: background material

Willem Levelt
• Speaking: From Intention to Articulation, MIT Press, 1989

E. Reiter & R. Dale
• Building Natural Language Generation Systems, Cambridge University Press, 2000

Overview of this talk

Part 1: General problems
• knowledge and constraints, architecture, process, etc.

Part 2: Deep generation
• message planning
• message ordering (text plan, outline)

Part 3: Surface generation
• lexical choice (access and synthesis)
• computation of syntactic structure

Different ways to look at text generation

NLG FOR people: fully automated text generation

NLG LIKE people: simulation of psychological processes
• connectionism
• online processing, incremental generation

NLG WITH people: semi-automated, machine-mediated generation
• writer's workbench
• foreign language learning

What is NLG? Ask Google

Fort méconnue du grand public, la génération de textes demeure une discipline sportive essentiellement universitaire, pratiquée par d'obscurs chercheurs dans des laboratoires tristes et exigus. Cette discipline pousse ses malheureux adeptes à des pratiques honteuses : la génération par ordinateur interposé de textes longs et soporifiques à partir d'une composition sémantique produite mécaniquement.

Hardly known to the great majority of people, text generation remains a sport practiced mainly by people from academia. Those engaged in this activity usually work in sad and narrow places. The discipline induces strange kinds of behavior, like the generation of long and boring texts via computers on the basis of mechanically produced semantic representations.

What is NLG? In search of a definition

The focus and definition may depend on the domain (psychology, linguistics, computer science)

Mapping problem: translate meanings into linguistic form

Linguistically-mediated problem solving

Language as a search problem

What is NL-Generation? (I)

Generation as a mapping process

NLG viewed as a process of mapping a conceptual structure (meaning) onto a linguistic form

Input: concepts (C1, C2, C3)

Output: words (W1, W2, W3)

Catch me if you can

We tend to think faster than we can find the corresponding words and convert them into sounds

Conceptualization: C1, C2, C3, C4

Expression: W1, W2, W3

There is no one-to-one mapping between linguistic structures and conceptual structures

The same conceptual structure may map onto different linguistic structures (synonyms, paraphrases)

Concept: possession

• This car belongs to the president. (verb)
• This is the car of the president. (preposition)
• This is the president's car. (genitive)
• This is his car. (possessive adjective)
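The possession example can be sketched in code: one conceptual structure goes in, and four different linguistic forms come out. This is only a minimal illustration under stated assumptions; the `Possession` class, the `realize` function and the sentence templates are hypothetical and not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Possession:
    """Conceptual structure: OWNER possesses POSSESSED (hypothetical representation)."""
    owner: str                  # e.g. "the president"
    possessed: str              # e.g. "car"
    owner_pronoun: str = "his"  # possessive adjective standing in for the owner


def realize(p: Possession) -> dict:
    """Return the four paraphrases from the slide, keyed by the linguistic resource used."""
    return {
        "verb": f"This {p.possessed} belongs to {p.owner}.",
        "preposition": f"This is the {p.possessed} of {p.owner}.",
        "genitive": f"This is {p.owner}'s {p.possessed}.",
        "possessive adjective": f"This is {p.owner_pronoun} {p.possessed}.",
    }


forms = realize(Possession(owner="the president", possessed="car"))
for resource, sentence in forms.items():
    print(f"{resource:<22}{sentence}")
```

A real generator would still have to choose among these forms, which is where the pragmatic, conceptual and linguistic choices come in.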

The same linguistic structure may map onto different conceptual structures

Linguistic resource: genitive

• Peter's car is broken. (possession)
• Peter's brother is sick. (family relationship)
• Peter's leg hurts. (inalienable possession, part of)

NLG as language-mediated problem solving

A simple generation model

Nature of choices

• pragmatic
• conceptual
• linguistic

Pragmatic choices

Languages are indirect means for achieving goals

• mediating devices

Different linguistic means serve different discourse purposes

• i.e. different forms are used in order to achieve different goals

Pragmatic choices: language as a resource

active vs. passive voice [topic, perspective]

main vs. subordinate clause [relative prominence]

Conceptual choices

Different meanings generally yield different forms

NUMBER: he sings vs. they sing

TENSE: he sings vs. he sang

Linguistic choices

The same meaning can be expressed by different words or syntactic forms (synonyms, paraphrases)

GROWN-UP MALE PERSON: man, guy, chap

HELP: help, give a hand, assist

What is NL-Generation? Tentative definition (III)

Generation as a search problem

Size of the mental lexicon: approx. 30,000 words

An abstract view

An example

Input: analysis

Input: synthesis

Different search spaces

Fundamental problems

Analysis: ambiguity

Generation: choice

Why bother about generation? (1)

Different kinds of motivation

• Theoretical
• Practical
• Industrial

Theoretical reasons: building and testing a theory

Testbed for a linguistic theory:
• coverage (over/undergeneration), correctness

Testbed for a psychological model:
• simulation of cognitive processes (on-line processing, language learning)

Practical reasons (industrial, full automation)

machine translation

text generation (business letters)

generation of summaries (stock market reports, weather forecasts, etc.)

help systems (audit trail, access to DB)

abstracting

Practical reasons (help systems, semi-automation)

Computer assisted language learning (tools)

Writer's workbench (pre/postediting: correction of grammar, style, spelling, text organization)

The decomposition of the task: NLG architectures

A two-stage model

Division of labor

GOAL

Four components

Procedural know-how

Planning (determine the order of the different steps - textual organisation)

Searching (find the words; access)

Reasoning-inferencing (« see » possible links between ideas)

Basic memory processes

• Sensory memory: 1 second
• STM (short-term memory): less than 30 seconds
• LTM (long-term memory): up to a lifetime

Number of choices (space + time constraints)

We have to make a great number of choices under severe space and time constraints

• space constraint (limitation of STM)
• time constraint (speed)

Speech is fast: 3-5 words per second

Average number of decisions per word: 4, i.e. roughly 12-20 decisions per second

Diversity of choices

• Conceptual choices
• Linguistic choices
• Pragmatic choices

The necessary information for synthesis is scattered all over

Predicate: GIVE

• LISTENER: subject
• BOOK: direct object
• SPEAKER: indirect object

Two of the arguments are labelled "Pronoun" in the figure.

How to express the notion of the speaker?

What do the different forms depend upon?

SPEAKER:
• me (me)
• moi (me)
• nous (we / us)
• je (I)

Number
• Tu me donnes le livre. (You give me the book.)
• Tu nous donnes le livre. (You give us the book.)

Person
• Tu ME donnes le livre. (You give me the book.)
• Tu lui donnes le livre. (You give him/her the book.)

GIVE: LISTENER (subject), BOOK (direct object), SPEAKER (indirect object)

Polarity
• Donne-moi ce livre ! (Give me this book!)
• Ne me le donne pas ! (Don't give me this book!)

Tense
• Tu m'as donné le livre. (You have given me the book.)
• Tu me donnes le livre. (You give me the book.)

Speech act
• Donne-le moi ! (Give it to me!)
• Tu me donnes le livre. (You give me the book.)

GIVE: LISTENER (subject), BOOK (direct object), SPEAKER (indirect object)
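The French examples show that the form chosen for SPEAKER depends on several features at once (syntactic function, number, speech act, polarity). A minimal sketch of that dependency, assuming a hypothetical `speaker_pronoun` function and hypothetical feature values; it ignores elision (me becoming m' before a vowel, as in "Tu m'as donné") and emphatic uses:

```python
def speaker_pronoun(function: str,
                    number: str = "singular",
                    speech_act: str = "statement",
                    polarity: str = "positive") -> str:
    """Pick the French pronoun that expresses the SPEAKER (toy rules, not a full grammar)."""
    if number == "plural":
        return "nous"   # Tu nous donnes le livre.
    if function == "subject":
        return "je"
    # Direct or indirect object: the form depends on speech act and polarity.
    if speech_act == "imperative" and polarity == "positive":
        return "moi"    # Donne-moi ce livre !
    return "me"         # Tu me donnes le livre. / Ne me le donne pas !


print(speaker_pronoun("indirect_object"))
print(speaker_pronoun("indirect_object", speech_act="imperative"))
print(speaker_pronoun("indirect_object", speech_act="imperative", polarity="negative"))
print(speaker_pronoun("indirect_object", number="plural"))
```

The point of the sketch is that no single feature determines the pronoun: the information needed to pick one word comes from several places in the message.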


Input: HELP (agent: PAUL, object: MARY), tense: present

PRAGMATIC CHOICE
• Paul = topic
• Marie = given
• Aider = new

SYNT. FUNCT. & VOICE
• voice = active
• Paul = subject
• Mary = direct object

PART OF SPEECH
• HELP = verb
• Paul = noun
• Mary = pronoun

LEXICALIZATION
• HELP = aider
• PAUL = Paul
• MARY = Marie

WORD ORDER
• SUBJECT (noun), DIR. OBJECT (pronoun), VERB (verb)

MORPHOLOGY
• Verb: 3rd person, singular, present: aide
• Subject: noun: Paul
• Direct object: pronoun: la

PHONO-GRAPH. SYNTH.
• Paul l'aide.

Output: Paul l'aide. (Paul helps her.)
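The stages of this example can be strung together in a toy program. The sketch below is hard-coded for this one sentence and makes several assumptions that are not in the slides: the message format, the names `LEXICON`, `GENDER`, `CLITIC`, `PRESENT_3SG` and `generate`, and the fixed order of the steps are all hypothetical.

```python
# Toy knowledge sources, hard-coded for the single example on the slide.
LEXICON = {"HELP": "aider", "PAUL": "Paul", "MARY": "Marie"}  # lexicalization
GENDER = {"PAUL": "m", "MARY": "f"}                           # needed to pick the clitic
CLITIC = {"m": "le", "f": "la"}                               # 3rd person direct object pronouns
PRESENT_3SG = {"aider": "aide"}                               # 3rd person singular present forms
VOWELS = "aeiouéèê"


def generate(message: dict) -> str:
    agent, patient = message["agent"], message["object"]

    # Pragmatic choice -> syntactic function and voice: topic = agent gives the active voice.
    if message["topic"] != agent or message["tense"] != "present":
        raise NotImplementedError("only the active, present-tense case is sketched")
    subject, direct_object = agent, patient

    # Lexicalization and morphology.
    subject_form = LEXICON[subject]
    verb_form = PRESENT_3SG[LEXICON[message["predicate"]]]

    # Part of speech and word order: a given referent becomes a clitic pronoun
    # placed before the verb; otherwise it is a noun placed after the verb.
    if direct_object in message["given"]:
        words = [subject_form, CLITIC[GENDER[direct_object]], verb_form]
    else:
        words = [subject_form, verb_form, LEXICON[direct_object]]

    # Phono-graphical synthesis: elide le/la before a vowel (la + aide -> l'aide).
    pieces = []
    for word, following in zip(words, words[1:] + [""]):
        if word in ("le", "la") and following and following[0].lower() in VOWELS:
            pieces.append("l'")
        else:
            pieces.append(word + " ")
    return "".join(pieces).strip() + "."


message = {"predicate": "HELP", "agent": "PAUL", "object": "MARY",
           "tense": "present", "topic": "PAUL", "given": {"MARY"}}
print(generate(message))                      # Paul l'aide.
print(generate({**message, "given": set()}))  # Paul aide Marie.
```

Changing one early decision (whether Mary is given) changes part of speech, word order and spelling further down, which is the kind of functional dependency the following slides discuss.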

Consequences for languages, architecture & processing

languages are and need to be flexible

information does not become available in a strict order: it may vary on every occasion

EVENT-TIME-PLACE vs. PLACE-EVENT-TIME, etc.

Consequences (interaction and accommodation)
• Data: accommodation of the different data structures (interaction between words and syntax) in the different modules (conceptual, lexical, syntactic)
• Process: feedback to higher components

Example illustrating the consequences (i.e. functional dependencies) of the choices

Conceptual input

Let's consider the consequences of the following 2 choices

• Topicalisation: the concept to start the sentence with
• Lexical choice: synonyms

Topicalize Agent

Consequences:
• Agent --> Subject
• Voice --> active
• Patient --> Direct Object

Consequences of topicalisation

Topicalize Patient

Consequences:
• Agent --> PP
• Voice --> passive
• Patient --> grammatical Subject

Consequences of topicalisation

Summary of the consequences of the topicalization choice at the top level

            Strategy 1             Strategy 2
Topic       agent                  patient
Agent       grammatical subject    prepositional phrase
Patient     direct object          grammatical subject
Voice       active                 passive
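The summary table can be read as a function from the topic choice to its grammatical consequences. A minimal sketch under stated assumptions: the function names `topicalize` and `realize`, and the English example words, are hypothetical and serve only to illustrate the table.

```python
def topicalize(topic: str) -> dict:
    """Map the topic choice onto voice and grammatical functions, as in the summary table."""
    if topic == "agent":      # Strategy 1
        return {"voice": "active",
                "agent": "grammatical subject",
                "patient": "direct object"}
    if topic == "patient":    # Strategy 2
        return {"voice": "passive",
                "agent": "prepositional phrase",
                "patient": "grammatical subject"}
    raise ValueError(f"unknown topic: {topic!r}")


def realize(topic: str, agent: str, patient: str,
            active_verb: str, passive_verb: str) -> str:
    """Linearize a toy English sentence according to the chosen strategy."""
    plan = topicalize(topic)
    if plan["voice"] == "active":
        return f"{agent} {active_verb} {patient}."
    return f"{patient} {passive_verb} by {agent}."


print(realize("agent", "Paul", "Mary", "helps", "is helped"))    # Paul helps Mary.
print(realize("patient", "Paul", "Mary", "helps", "is helped"))  # Mary is helped by Paul.
```

A single high-level decision (what to start the sentence with) fixes voice and the function of both participants, so later components cannot choose these freely.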

Assumptions - Conclusion

• No superexpert but a set of cooperative agents
• competition - accommodation
• no algorithmic processing but opportunistic planning
• various orders of processing
• various components need the same information
• system is heterarchical rather than hierarchical
