
[IEEE 2009 Proceedings of the 5-th Conference on Speech Technology and Human-Computer Dialogue (SpeD) - Constanta, Romania (2009.06.18-2009.06.21)] 2009 Proceedings of the 5-th Conference


Optimizing a Discourse Structuring Component for Utterance Generation in Human-Computer Dialogue

Vladimir Popescu 1, 3, Jean Caelen 2, Corneliu Burileanu 3

1 Laboratoire Informatique d’Avignon, University of Avignon, France, [email protected]

2 Laboratoire d’Informatique de Grenoble, Grenoble Institute of Technology, France, [email protected]

3 Faculty of Electronics, Telecommunications and Information Technology, University “Politehnica” of Bucharest, Romania, [email protected]

Abstract — In this paper we describe a series of optimizations applied to a discourse structuring module for answer generation in dialogue systems. Starting from a baseline rhetorical structure updating algorithm, based on a first-order logic simulation of SDRT (“Segmented Discourse Representation Theory”), several improved versions are obtained by applying a series of pragmatically-driven, psychologically-motivated or purely computational optimizations. These optimizations yield a reduction in the complexity of the baseline algorithm, as well as a better conformance of the rhetorical structures obtained to linguistic predictions.

Discourse structuring; human-computer dialogue; language generation; logic programming; runtime optimization

I. INTRODUCTION

In natural language generation (NLG) for human-computer dialogue (HCD), several main trends and approaches can be identified nowadays: (i) rule / grammar-based systems, which are generally either logic-based [17], template-based [18], or a combination of the two [16]; (ii) corpus-based systems, which are generally either purely stochastic [9], or data-oriented, grammar-based [13].

As for the NLG components involved in HCD, several options exist in currently-reported systems: (i) usage only of surface realization [8], [13], (ii) usage of surface realization combined with microplanning algorithms [17], and (iii) usage of surface realization, microplanning and discourse structuring: [16], [18].

This paper is concerned with the design and development of a rhetorical structuring component for NLG in HCD for an application concerning book reservation in a library; this module is based on a first-order logic (FOL) emulation of a fragment of SDRT [1], relying on two knowledge sources, a language and task-independent set of discourse predicates, and a language-independent, domain-specific task ontology, which specifies the relations between the relevant concepts in the domain of study [10].

This rhetorical structuring component has several roles in HCD systems: (i) rhetorical structuring of multi-sentential machine dialogue turns, (ii) connection of machine dialogue turns to the dialogue history, (iii) fine-tuning of surface generation decisions, such as anaphora or ellipsis.

This rhetorical structuring component is due to be used in an HCD system for several tasks, such as meeting room reservation [2] or book reservation [10]. Its baseline version, reported in [10], has the following characteristics and limitations: (i) approximation and formalization of 17 SDRT rhetorical relations in FOL; (ii) use of a generic set of rhetorical predicates for supporting the definitions of the discourse relations; (iii) a discourse structure updating algorithm which is quadratic in the number of current utterances in the dialogue; (iv) an excessive complexity of the computational process.

The high complexity of the discourse updating algorithm has led our team to investigate several ways of reducing the computational costs while enforcing the conformance of the rhetorical structures obtained to linguistic predictions and to SDRT specifications. The optimizations applied are of several types: pragmatically-driven, psychologically-motivated, or purely computational. The step-by-step development of these optimizations, as well as the evaluation of their effectiveness, is the subject of this paper.

All the research efforts regarding the speedup in language generation for human-computer dialogue applications are motivated by the fact that, unlike in most of the monologue generation situations [6], [14], the dialogue with the human user imposes real-time constraints on the computational processes involved; hence, a fast generation component is required for human-computer dialogue applications.

The paper is structured as follows: the next section describes the incremental development of the rhetorical structuring component, starting from a baseline algorithm (described in detail in [10]), to which speech act-related constraints are applied in a first step (in the manner described in detail in [11]), dialogue history limitation (based on psychological motivation [15]) is imposed in a second step, and the usage of a stack of computed discourse predicates is proposed in a third step; the third section describes a series of runtime evaluations of these optimizations on typical dialogues supported by the system; the fourth section concludes the paper, discusses current limitations and provides pointers to further developments.

The research reported in this paper was funded by the Romanian Government, under the National Research Authority CNCSIS grant “IDEI” no. 114/2007, code ID_930.

II. RHETORICAL STRUCTURING ALGORITHMS

A. Baseline Method

In the rhetorical structuring component described, two sets of rhetorical relations are formalized: dialogue-specific (ten relations) and monologue-specific (seven relations) [1], [10], each rhetorical relation being expressed as a conjunction or disjunction of (optionally negated) discourse predicates.

The baseline rhetorical structure updating algorithm works as follows: for the current machine speech turn (to be generated), the last utterance is rhetorically connected to all previous utterances in the same turn (via monologue relations) and to all previous utterances in previous speech turns (via monologue relations, for turns that came from the machine, or dialogue relations, for turns that came from the user). Thus, all possible rhetorical relations are checked, and those proved to be false are discarded, the rest of them being retained in the (updated) discourse structure. The algorithm has been described and illustrated in detail in [10].

Let T denote the time needed to prove a rhetorical relation between two utterances 1, N the number of existing utterances in the dialogue, M the number of current speech turns in the dialogue, and R the number of possible rhetorical relations between utterances (R equals Rd for utterances that came from different speakers and Rm for utterances that came from the same speaker, in the same speech turn). The time τLAST_UTT required for updating the discourse structure with one utterance is then bounded as follows [10]:

$$\tau_{LAST\_UTT} < T \cdot R_d \cdot (\alpha \cdot N^2 - \beta \cdot N \cdot M + M^2 + \gamma \cdot N - \delta),$$

where α, β, γ and δ are positive real constants.

The algorithm is therefore quadratic in the size of the dialogue.
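The pairwise checking step of the baseline method can be illustrated with a minimal Python sketch. It only counts how many candidate relations are tested when one new utterance is attached to every prior utterance; it does not reproduce the bound from [10], whose constants are not given here. The `Utterance` class and the function name are hypothetical, introduced for illustration only; the values 7 and 10 are the relation counts stated above.

```python
from dataclasses import dataclass

R_M = 7   # monologue-specific relations (same speaker), as stated in the text
R_D = 10  # dialogue-specific relations (different speakers), as stated in the text


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container: an utterance label and who produced it."""
    label: str    # e.g. "pi21"
    speaker: str  # "M" (machine) or "U" (user)


def baseline_candidate_checks(history, new):
    """Count the relation checks made when `new` is tested against every
    prior utterance, with all candidate relations tried for each pair."""
    checks = 0
    for prior in history:
        # Same speaker: monologue relations; different speakers: dialogue relations.
        checks += R_M if prior.speaker == new.speaker else R_D
    return checks


# Prior utterances of the example dialogue up to pi21 (speakers as in the paper).
history = [
    Utterance("pi11", "M"), Utterance("pi12", "M"), Utterance("pi13", "M"),
    Utterance("pi14", "M"), Utterance("pi15", "U"), Utterance("pi21", "M"),
]
new = Utterance("pi22", "M")
print(baseline_candidate_checks(history, new))  # 5 * 7 + 1 * 10 = 45
```

Each of these checks costs roughly T in the notation above, so the count gives a rough feel for how the work per update grows with the number of prior utterances.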

B. First Optimization: Usage of Speech Acts

The addition of speech act-related constraints to the baseline algorithm has been reported in detail in [11]; the main idea is that pairs of speech acts (e.g., FS – “MAKE-KNOW”, for an act of informing the user of something) determine the sets of possible rhetorical relations connecting the corresponding pair of utterances. For example, for the pair of acts (FFS – “MAKE-DO-KNOW”, for a request for information, and FS), the authorized relations, in a dialogue context, are: Question-Answer Pair (QAP), Indirect Question-Answer Pair (IQAP), and Plan-Elaboration (P-Elab). Thus, out of the 10 (for dialogue) or 7 (for monologue) candidate relations [10], usually far fewer (around 3 in both dialogue and monologue contexts [2]) remain to be checked for a given pair of utterances.

1 The time T is assumed constant, but, strictly speaking, this is not true, since it depends on the semantics of the rhetorical relations and on the logic formulas expressing the utterances.

However, this optimization does not reduce the complexity of the algorithm; only the computation time is reduced [11]. Denoting by τLAST_UTT^ACTS the total runtime for computing the rhetorical relations that connect the last utterance to the rhetorical context when speech acts are used, we obtain only an average reduction by a factor of six with respect to the previous computations:

$$\tau_{LAST\_UTT}^{ACTS} \approx \frac{1}{6} \cdot \tau_{LAST\_UTT}.$$
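The filtering of candidate relations by speech-act pairs can be sketched as a simple lookup table. The sketch below is an illustration under assumptions, not the implementation from [11]: the table holds only the two act pairs that this paper spells out, and falling back to the full candidate set for an unlisted pair is an assumption of this example. The list of dialogue relations is partial (only names that occur in this paper), since the full set of ten is defined in [10].

```python
# The seven monologue-specific relations listed in the Appendix.
MONOLOGUE_RELATIONS = [
    "Alternation", "Background", "Consequence", "Elaboration",
    "Narration", "Contrast", "Parallel",
]

# Partial list: only dialogue relation names that occur in this paper.
SOME_DIALOGUE_RELATIONS = ["QAP", "IQAP", "P-Elab", "P-Corr", "ACK"]

# (act of prior utterance, act of new utterance) -> authorized relations.
# Only the two pairs given in the paper are included.
ALLOWED = {
    ("FFS", "FS"): {"QAP", "IQAP", "P-Elab"},
    ("FS", "FP"): {"Elaboration"},
}


def candidate_relations(act_prior, act_new, all_relations):
    """Return the relations still to be checked for a pair of speech acts.
    Assumption of this sketch: unlisted pairs keep every candidate."""
    allowed = ALLOWED.get((act_prior, act_new))
    if allowed is None:
        return list(all_relations)
    return [r for r in all_relations if r in allowed]


print(candidate_relations("FFS", "FS", SOME_DIALOGUE_RELATIONS))
print(candidate_relations("FS", "FP", MONOLOGUE_RELATIONS))
```

The number of utterance pairs examined is unchanged; only the number of relations tested per pair shrinks, which is why the complexity class stays the same.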

C. Second Optimization: Limiting the Accessible Dialogue History

Research in psycholinguistics has shown that, in human communication, a given speech turn is connected only to turns that do not precede it by too much [15]. This constitutes the basis for the optimization described in this subsection, which consists in limiting to a (relatively small) constant the set of previous speech turns accessible in a discourse update. In this case, the complexity becomes linear in the size of the dialogue.

Thus, denoting by Q the maximum number of prior speech turns to be rhetorically connected to the current one and by τLAST_UTT^LIMITED_HISTORY the time needed to compute the rhetorical relations connecting the last utterance to the limited dialogue history 2, we have that:

$$\tau_{LAST\_UTT}^{LIMITED\_HISTORY} \leq T \cdot R_d \cdot \left[ -\alpha \cdot Q^2 + Q - \beta - \alpha \cdot Q \cdot M + \alpha \cdot Q \cdot N + \frac{M + N}{\alpha} \right],$$

where α and β are positive real constants. Hence, we have obtained a complexity linear in the size of the dialogue, but depending quadratically on the size of the available dialogue history.
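The history limitation amounts to selecting a sliding window of speech turns before the pairwise checks are run. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes, following the indexing used in the Appendix, that the window of Q turns includes the current turn.

```python
def accessible_history(turns, q):
    """Return the utterance labels that remain accessible for a discourse update.

    `turns` lists the speech turns oldest first; each turn is a list of
    utterance labels already present in the discourse structure, and the
    last turn is the current one. Assumption of this sketch: the window
    of q turns includes the current turn.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    window = turns[-q:]
    return [label for turn in window for label in turn]


# Utterances of the example dialogue up to pi41; pi42 is about to be added.
turns = [
    ["pi11", "pi12", "pi13", "pi14", "pi15"],
    ["pi21", "pi22", "pi23"],
    ["pi31", "pi32", "pi33", "pi34", "pi35"],
    ["pi41"],
]
print(accessible_history(turns, 3))  # turns 2 to 4 only: 9 utterances
```

With a fixed window, the number of utterances tested per update no longer grows with the full dialogue history, only with the content of the last Q turns.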

D. Third Optimization: Stack of Computed Predicates

The main idea for this third optimization is that discourse predicates taking as arguments previously realized utterances (either by the user or by the machine) are placed, once they are proved (i.e., once their truth values are computed), in a lookup table organized as a stack. The stack structure is motivated by the fact that a more recently computed predicate is more likely to be used for proving a rhetorical relation between the current utterance and previous ones, because the discourse updating process tries to connect the current utterance to the most recent previous utterance first, then to older ones, down to the first one.

The approach is described in detail below:

1. each predicate p of a rhetorical relation ρ, applied to the utterance labeled π, is computed (e.g. topic(π), enounce(π) [10]) and used;

2. p(π), along with its truth value, denoted by θ(p(π)), is stored in the stack as a pair p(π) ↔ θ(p(π));

3. for a subsequent utterance π′, when computing a certain rhetorical relation ρ(π, π′):

a. if predicate p(π) belongs to the semantics of ρ, then its truth value is read from the stack and is not computed again;

b. else, the stack is searched further, until either:

i. a predicate p′ belonging to ρ and applied to π′ is found: in that case p′(π′) ↔ θ(p′(π′)) is read from the stack, or

ii. a predicate p′ belonging to ρ and applied to π is found: in that case, p′(π) ↔ θ(p′(π)) is read from the stack, or

iii. nothing relevant is found in the stack, hence the predicates are computed.

2 We assume that speech acts are not used.

[Figures 1 and 2 depict the stack as a column of entries of the form p(·) ↔ θ(p(·)). Figure 1: pn(π | π′), ..., p1(π | π′). Figure 2: the same entries, extended with pu′(π | π′ | π′′), ..., p1′(π | π′ | π′′) and pv(π′′), ..., p1(π′′).]
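The lookup procedure in steps 1 to 3 can be sketched in Python as a most-recent-first cache of predicate truth values. This is a simplified illustration, not the authors' implementation: relations are treated as plain conjunctions of unary predicates, the `compute` callback stands in for the actual FOL proof of a predicate, and all class and function names are hypothetical.

```python
class PredicateStack:
    """Lookup table of proved predicates, searched from the most recent entry."""

    def __init__(self):
        # Each entry is (predicate name, utterance label, truth value);
        # the end of the list is the top of the stack.
        self._entries = []

    def lookup(self, name, arg):
        """Return the stored truth value, or None if the predicate is absent."""
        for entry_name, entry_arg, value in reversed(self._entries):
            if entry_name == name and entry_arg == arg:
                return value
        return None

    def push(self, name, arg, value):
        self._entries.append((name, arg, value))

    def __len__(self):
        return len(self._entries)


def evaluate_relation(predicates, stack, compute):
    """Evaluate a relation given as a conjunction of (name, arg) predicates.

    Returns (truth value, number of predicates actually computed).
    Every predicate is evaluated (no short-circuit), matching a cost
    model that charges one time unit per computed predicate.
    """
    truth = True
    computed = 0
    for name, arg in predicates:
        value = stack.lookup(name, arg)
        if value is None:
            value = compute(name, arg)   # stand-in for the real proof step
            stack.push(name, arg, value)
            computed += 1
        truth = truth and value
    return truth, computed


stack = PredicateStack()
always_true = lambda name, arg: True  # hypothetical placeholder prover

first = [("enounce", "pi21"), ("topic", "pi21")]
second = [("enounce", "pi21"), ("enounce", "pi22")]
print(evaluate_relation(first, stack, always_true))   # both predicates computed
print(evaluate_relation(second, stack, always_true))  # enounce(pi21) is reused
```

In the second call only enounce(pi22) has to be computed, which mirrors the saving of one time unit per predicate already present in the stack.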

In order to estimate the time needed for a discourse update, as compared to the baseline algorithm, we assume that, unlike in Sections II.A to II.C, the elementary time unit is τ, the time needed to compute the truth value of a predicate included in the definition of a rhetorical relation. We assume that the time for computing conjunctions, disjunctions or negations of predicates is negligible.

Thus, for a rhetorical relation ρ, we consider that its semantics is specified in terms of predicates p1, ..., pn in ρ = p1 ∧... ∧pn 3, and that an utterance is labeled by π.

For a subsequent utterance, π′, computing the truth value of ρ(π, π′) boils down to computing the truth value of: p1(π | π′) ∧... ∧ pn(π | π′); here, | separates possible options, i.e., the predicate pi is computed for either π or π′. After having computed ρ(π, π′), in the time n ⋅ τ, we build a stack as shown in Figure 1.

Figure 1. Predicate stack construction.

For a new rhetorical relation, ρ′ and a new utterance π′′ we have: ρ′ = p1′ ∧... ∧ pn′; then, we compute ρ ∩ ρ′ = {p: ∃ i ∈ {1, ..., n}: pi = p ∧ ∃ j ∈ {1, ..., n′}: pj′ = p}; let ρ ∩ ρ′ = {p1, ..., pt}, t ≤ min(n, n′). We assume that the computation of this set is performed in negligible time.

We compute (also in negligible time) ρ′\ ρ ∩ ρ′ = {p1′, .., pu′}, u = n′ − t ≤ n′. If p1, ..., pt′ take as arguments π or π′, and we need to compute ρ(π, π′′) or ρ(π′, π′′), respectively, then we retrieve the corresponding discourse predicates in ρ from the stack, where t′ ≤ t and {p1, ..., pt′} ⊂ {p1, ..., pn}. For computing ρ′(π′′, π | π′), we spend a time of (n′ − t′) ⋅ τ, and the remaining t′ predicates are read from the stack.

3 In fact, not all the connectors are ∧; there are also ¬, ∨ and ⇒, but this is irrelevant here.

Then, the predicates in ρ′ \ {p1, ..., pt′} are put in the stack: ρ′\ {p1, ..., pt′} = {p1′, ..., pu′, p1, ..., pv}, where v = n′ - t′ - u = t - t′. Thus, the stack is updated as in Figure 2. pv, ..., p1 take as arguments only π′′, since otherwise they would have already been in the stack.

As the dialogue progresses, more rhetorical relations will have been used, so t = |{ρ: ρ is used} ∩ ρnew| increases monotonically and u therefore decreases monotonically. Moreover, since more utterances will have been tested for connection via rhetorical relations, t′ → t; thus (n′ − t′) → (n′ − t) = u and v → 0. Hence, the stack grows more and more slowly, and its size tends asymptotically towards an upper bound roughly equal to

$$\sum_{\pi_{new}} \sum_{i=1}^{R_m + R_d} \frac{\nu(\rho_i)}{2},$$

where ρi is a rhetorical relation and ν(ρi) is the number of its predicates; hence, this upper bound is linearly dependent on the size of the dialogue.

Figure 2. Predicate stack update.

Thus, for a new utterance πnew and a set of prior utterances Πprior such that |Πprior| = N, and for Rm or Rd rhetorical relations (intra-speech turn or inter-speech turn, respectively), the computational cost is as follows (Ρprior denotes the set of previously used rhetorical relations, and arg(ρ) stands for the set of arguments of the rhetorical relation ρ):

• Without the predicate stack:

$$\tau_{LAST\_UTT} = \tau \cdot \sum_{i=1}^{R_d | R_m} \sum_{\pi \in \Pi_{prior}} \nu(\rho_i(\pi, \pi_{new}));$$

• With the predicate stack:

$$\tau_{LAST\_UTT}^{STACK} = \tau \cdot \sum_{i=1}^{R_d | R_m} \sum_{\pi \in \Pi_{prior}} \left( \nu(\rho_i(\pi, \pi_{new})) - \left| \rho_i \cap \bigcup_{\rho \in \mathrm{P}_{prior} \wedge \pi \in arg(\rho)} \rho \right| \right).$$

Since we have that:

$$\lim_{|\Pi_{prior}| \to \infty} \left| \rho_i \cap \bigcup_{\rho \in \mathrm{P}_{prior} \wedge \pi \in arg(\rho) \cap \Pi_{prior}} \rho \right| = \nu(\rho_i),$$

it follows that, for long enough and varied enough dialogues, the time needed for rhetorical predicate computations at each discourse structure update tends to decrease.

[Content of Figure 3, the discourse structure of the dialogue in Section III.A:] Background(π11, π12); Elaboration(π11, π13); Consequence(∧(π12, π13), π14); QAP(π14, π15); P-Elab(π15, π21); P-Elab(π15, π22); Elaboration(π21, π22); IQAP(π22, π23); P-Corr(π23, π31); Contrast(π31, π32); Consequence(π32, π33); IQAP(π33, π34); QAP(π33, Elaboration(π34, π35)); P-Corr(Elaboration(π34, π35), π41); Contrast(π41, π42); Consequence(π42, π43); IQAP(π43, π44); P-Elab(π44, π51); Consequence(π51, π52); Elaboration(π51, π61); QAP(π52, π53); ACK(π53, π61); Consequence(Elaboration(π61, π62), π63).

III. EVALUATIONS

A. A Typical Dialogue

For a quantitative evaluation of the algorithms and optimizations described above, we used a fragment of a typical dialogue for book reservation in a library (U is the user and M is the machine). This fragment, consisting of 6 speech turns and 23 utterances, is shown below (the labels attached to the utterances have the form πij, where i is the index of the speech turn containing the utterance, and j is the index of the utterance within that speech turn):

M1: Welcome to Groplan [π11]. This system allows you to find reference materials in the library of our University [π12]. At any time you can get help by saying general usage mode [π13]. What do you want [π14]?

U1: Theatre book ancient tragedy humm... ancient Greek theatre [π15]

M2: I have found more than 13 ancient Greek tragedies [π21]. You can give extra specifications, for instance on the author, or see the solutions or perform a new search [π22].

U2: Aristophanes [π23].

M3: There is no ancient Greek tragedy written by Aristophanes [π31]. Yet, there are written by Sophocles, Euripides or Aeschylus [π32]. Which proposal are you interested in [π33]?

U3: A tragedy with parricide I am looking for a tragedy where the son kills his father [π34]. By Aeschylus [π35].

M4: There is neither any tragedy with parricide where the son murders his father, written by Aeschylus [π41]. Instead, there are, written by Sophocles or Euripides [π42]. Which proposal are you interested in [π43]?

U4: I want to read about a son that kills his father [π44].

M5: I have two tragedies written by Sophocles [π51]. You can see the solutions or perform a new search [π52].

U5: See the solutions [π53].

M6: The first tragedy is “Oedipus the King” [π61]. It is about a prince who murders his father, becomes king and marries his mother, getting to realize and regret it in the end [π62]. Would you like to get more information, the next tragedy or to perform a new search [π63]?

The discourse structure for this dialogue, as computed with all the four algorithms, is shown in Figure 3; the rhetorical relations are denoted by several labels, which correspond to the semantics defined in SDRT [1]. For example, Elaboration(π11, π13) denotes the fact that utterance π13 elaborates on π11, and that both are non-interrogative utterances. In order to allow for the recursive representations licensed by SDRT, we nest, when appropriate, discourse structures, as in Consequence(Elaboration(π61, π62), π63), which means that utterance π63 is a consequence of the discourse constituent formed by utterances π61 and π62, such that the latter elaborates on the former.

Figure 3. Discourse structure for a typical dialogue.
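The nested notation used for discourse structures, such as Consequence(Elaboration(π61, π62), π63), maps naturally onto a small recursive data type. The Python sketch below is one possible representation, given for illustration only; the `Relation` class and the helper functions are hypothetical and not part of the system described here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """A rhetorical relation whose arguments are utterance labels
    or, recursively, other relations (discourse constituents)."""
    name: str
    args: Tuple[Union[str, "Relation"], ...]


def utterances(node):
    """List the utterance labels covered by a label or a relation, left to right."""
    if isinstance(node, str):
        return [node]
    labels = []
    for arg in node.args:
        labels.extend(utterances(arg))
    return labels


def render(node):
    """Print a structure in the functional notation used in the paper."""
    if isinstance(node, str):
        return node
    return f"{node.name}({', '.join(render(arg) for arg in node.args)})"


# Last relation of the example: pi63 is a consequence of the constituent
# in which pi62 elaborates on pi61.
last = Relation("Consequence", (Relation("Elaboration", ("pi61", "pi62")), "pi63"))
print(render(last))      # Consequence(Elaboration(pi61, pi62), pi63)
print(utterances(last))  # ['pi61', 'pi62', 'pi63']
```

Keeping constituents as nested values rather than flat pairs makes it straightforward to attach a new utterance to a whole constituent instead of to a single utterance.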

B. Discourse Structure Updating Times

In this subsection we present the computation of the discourse updating times for the dialogue presented above; these times are obtained under the following constraints: the number of utterances, N = 23; the number of speech turns, M = 6; the number of dialogue rhetorical relations, Rd = 10; the number of monologue rhetorical relations, Rm = 7; the number of dialogue rhetorical relations, when speech acts are used, Rd(i, j) ≈ 3 for all i, j from 1 to N; the number of monologue rhetorical relations, when speech acts are used, Rm(i, j) ≈ 3 for all i, j from 1 to N; the size of the dialogue history (expressed in speech turns), Q = 3.


Figure 4. Discourse structure updating times.

The resulting times are expressed in terms of τ, the average time needed for computing the truth value of one discourse predicate; this choice is preferred since it provides time estimates that are independent of the implementation platform and of hardware constraints (viz. processing power). Thus, in Figure 4 we show the discourse updating times for the last utterance in the current speech turn, for the dialogue presented above, for the incremental usage of the three optimizations presented in this paper, compared to the performance of the baseline algorithm.

In Figure 4 we can see that, whereas the discourse updating times with one utterance for the baseline and speech act-constrained algorithms are approximately quadratic in the size of the dialogue, the updating times for the further optimized discourse updating algorithms are linear in the size of the dialogue (expressed in speech turns). Due to the fact that the size of the dialogue is expressed in speech turns and not in utterances, the shapes of discourse update curves for the baseline and speech act-constrained algorithms are only slightly quadratic.

IV. CONCLUSIONS AND FURTHER WORK

The paper has described the incremental development of a rhetorical structuring component for NLG in HCD, following several steps: (i) first, a baseline algorithm was designed, relying on a FOL emulation of SDRT and on a set of discourse predicates; (ii) then, a first optimization through speech act usage was applied; this resulted in a reduction of the discourse updating time, and in the addition of supplemental pragmatic constraints [11]; (iii) then, a second optimization, based on psychological evidence, was applied, consisting in the limitation of the accessible dialogue history; this resulted in a reduction of the algorithm complexity; (iv) finally, a third optimization, of a computational nature, was applied; it consisted in using a stack storing a lookup table of previously computed discourse predicates; this resulted in a further reduction of the discourse updating time.

Cascading all three optimizations reduced the discourse updating time for one utterance by a factor of more than 100, for a typical dialogue comprising 23 utterances. Out of the optimizations, the first one does not reduce the computational complexity of the algorithm, whereas the latter two reduce it. Neither of the last two optimization methods alters the discourse structures obtained; qualitative evaluations have been made in this respect, for the task concerned (book reservation in a library).

In the near future, exhaustive evaluations on more substantial dialogue data should be performed; however, in order to do this, we have to either: (i) have dialogue data acquired through Wizard-of-Oz methods, annotate them with the semantics of the utterances and apply the rhetorical structuring algorithms on them, or (ii) couple this rhetorical structuring component to a suitable surface realizer integrated in a dialogue system and create evaluation data via system usage. Both processes are quite expensive in time, but the second one is under consideration, since a dialogue system has already been developed by our team [2], [8].

Moreover, the discourse structure updating component described in this paper could be further optimized, with respect to the relevance of the rhetorical relations computed, by considering each rhetorical structure computation as a logic program, that is further converted to an attribute grammar [4], evaluated in its turn [12].

APPENDIX: COMPUTATION OF DISCOURSE UPDATING TIMES

In this appendix we illustrate at a certain level of detail the way whereby the discourse updating times are computed, for one utterance and for all the versions of the algorithms presented in the paper. Thus, we consider the typical dialogue shown in the paper, emphasizing the discourse structure updating times for the second (and last) utterance of the second speech turn produced by the machine, i.e., of the utterance labeled π22. Furthermore, we will denote by K(π) the logical form expressing the meaning of utterance π, and by Σρ, the semantics of the rhetorical relation of type ρ.

Several predicates are used; giving their definition here would be very space consuming, therefore the reader is referred to [10] for details in this regard. In showing the computation of the discourse updating times, we assume that utterance π22 is due to be inserted in the discourse structure for the dialogue; the computations are performed separately for each version of the discourse updating algorithm. Thus:

• Baseline algorithm:

o first, utterance π22 is checked against utterance π21, i.e., the algorithm tries to connect these utterances via rhetorical relations; since both utterances have the same emitter, the candidate rhetorical relations are monologue-specific, namely ρ ∈ {Alternation, Background, Consequence, Elaboration, Narration, Contrast, Parallel};

o then, each rhetorical relation is checked for the two utterances, e.g.:


ΣAlternation(π21, π22) → enounce(π21) ∧ enounce(π22) ∧ ¬ equals(K(π21) ∨ K(π22), ∅) ∧ equals(K(π21) ∧ K(π22), ∅); this involves the computation of five discourse predicates, those separated by conjunctions, hence the time consumed for this check is 5 ⋅ τ;

ΣBackground(π21, π22) → enounce(π21) ∧ enounce(π22) ∧ (K(π21) ⇒ K(π22)) ∧ equals(∀ θ, θ′: smaller(θ, θ′) ∧ MemberOf(θ′\ θ, equals(⟨ϕi⟩i ∪ ⟨ωj⟩j, Ω) ∧ equals(⟨ϕi⟩i ∩ ⟨ωj⟩j, ∅) ∧ MemberOf(ϕi, K(π21)) ∧ MemberOf(ωj, Ω) ∧ ( k: equals(ϕk, ωj) ∧ MemberOf(ϕk, K(π21))))) ∧ MemberOf(θ, K(π21)) ∧ MemberOf(θ′, K(π21)), t); thus, 15 predicates are computed, in a time of approximately 15 ⋅ τ;

o finally, the total discourse updating time is obtained summing these times over all candidate rhetorical relations 4 and all previous utterances in dialogue;

• Speech act-constrained algorithm:

o since the speech acts conveyed by utterances π21 and π22 (as determined by the dialogue interpreter and the dialogue planner respectively [2]) are MAKE-KNOW (FS) and MAKE-CAN (FP, for offering the user several choices) respectively, it follows that the only possible candidate rhetorical relation is Elaboration: ρ ∈ {Elaboration};

o hence, ΣElaboration(π21, π22) is checked, in a time computed similarly to the one presented above;

o the previous step is repeated for all previous utterances (evaluating each time the set of candidate rhetorical relations, according to the pair of speech acts to be rhetorically connected [11]), and the total updating time is obtained summing these times over the set of previous utterances in dialogue;

• History limited algorithm:

o for Q = q ≥ 1, we check, for utterance π22, only the utterances π2,i, ..., π2−q+1,i, for 0 ≤ i < 2;

⁴ For the user’s utterances, the rhetorical relations are dialogue-specific [10].

o for the example given here, the updating time does not change (since q cannot be greater than 1 anyway);

• Predicate stack-enabled algorithm:

o since we are determining the set of rhetorical relations between utterances π21 and π22, and the latter is new (i.e., not yet used in rhetorical structure computation), we look in the stack for discourse predicates taking π21 as argument; indeed, utterance π21 has already been checked for connection against π15, via dialogue-specific rhetorical relations, and against π14 to π11, via monologue-specific relations, so the already computed discourse predicates taking π21 as argument are those occurring in the set of rhetorical relations P = {Acknowledgement, Plan-Correction, Elaboration, Background, Contrast};

o given the semantics of the rhetorical relations specified above [10], the following predicates in the stack take π21 as argument: enounce(), topic(), Δt(), bad_time(Δt());

o thus, the discourse structure updating times, for the relations shown above in detail, are:

ΣAlternation(π21, π22): 5 ⋅ τ − τ = 4 ⋅ τ;

ΣBackground(π21, π22): 15 ⋅ τ − 11 ⋅ τ − τ = 3 ⋅ τ;

o finally, the predicate enounce(π22) is added to the stack.
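The arithmetic of this worked example can be reproduced with a small cost model, given here only as a minimal sketch in Python and not as the implementation used in the system. Costs are counted in units of τ. The helper names (`candidate_relations`, `check_cost`, `placeholders`) and the placeholder predicate labels are hypothetical; memoizing every ground predicate on the stack, and falling back to the full monologue-specific set for speech act pairs other than (FS, FP), are simplifying assumptions.

```python
# Cost model in units of tau (one predicate evaluation); a sketch, not the authors' code.
MONOLOGUE_RELATIONS = ["Alternation", "Background", "Consequence", "Elaboration",
                       "Narration", "Contrast", "Parallel"]

# Only the (FS, FP) entry comes from the worked example; other pairs are not modelled.
SPEECH_ACT_FILTER = {("FS", "FP"): ["Elaboration"]}


def candidate_relations(act_prev, act_new):
    """Speech act-constrained candidates; assumed fallback: all monologue relations."""
    return SPEECH_ACT_FILTER.get((act_prev, act_new), MONOLOGUE_RELATIONS)


def check_cost(predicates, stack):
    """Count the predicates that must be evaluated, memoizing each one on the stack."""
    cost = 0
    for pred in predicates:
        if pred not in stack:
            stack.add(pred)
            cost += 1
    return cost


def placeholders(tag, count, *args):
    """Hypothetical stand-ins for predicates whose exact form is not needed here."""
    return [(f"{tag}_{i}",) + args for i in range(count)]


# Alternation: 5 predicates, of which only enounce(π21) involves π21 alone.
alternation = ([("enounce", "π21"), ("enounce", "π22")]
               + placeholders("alt", 3, "π21", "π22"))
# Background: 15 predicates, 12 of which (enounce(π21) plus 11 others) involve π21 alone.
background = ([("enounce", "π21"), ("enounce", "π22")]
              + placeholders("bg_old", 11, "π21")
              + placeholders("bg_new", 2, "π21", "π22"))

# Stack contents left over from connecting π21 to the earlier utterances.
stack = {("enounce", "π21")} | set(placeholders("bg_old", 11, "π21"))

# Each check gets its own copy of the stack, as in the per-relation figures above.
baseline = (check_cost(alternation, set()), check_cost(background, set()))
stacked = (check_cost(alternation, set(stack)), check_cost(background, set(stack)))
print(baseline, stacked)  # (5, 15) (4, 3)
```

Under these assumptions the sketch yields 5 ⋅ τ and 15 ⋅ τ for the baseline checks, and 4 ⋅ τ and 3 ⋅ τ once the stack is used, while `candidate_relations("FS", "FP")` reduces the seven monologue-specific candidates to Elaboration alone.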

REFERENCES

[1] N. Asher and A. Lascarides, Logics of Conversation. Cambridge University Press, 2003.

[2] J. Caelen and A. Xuereb, Interaction et Pragmatique - jeux de dialogue et de langage. Editions Hermès-Lavoisier, 2007.

[3] L. Danlos, B. Gaiffe, and L. Roussarie, “Document Structuring à la SDRT,” ACL European Workshop on Natural Language Generation, EWNLG'01 Proceedings, paper no. 3, 10 p, 2001.

[4] P. Deransart and J. Maluszynski, A Grammatical View of Logic Programming. MIT Press, 1993.

[5] B. J. Grosz and C. L. Sidner, “Attention, Intentions, and the Structure of Discourse,” Computational Linguistics, vol 12(3), pp. 175-204, 1986.

[6] E. Hovy, “Automated discourse generation using discourse structure relations,” Artificial Intelligence, vol 65, pp. 341-386, 1993.

[7] K. McKeown, “Discourse strategies for generating natural-language text,” Artificial Intelligence, vol 27, pp. 1-42, 1985.

[8] H. Nguyen, Dialogue homme-machine: modélisation de multi-session. Ph.D. Thesis, Joseph Fourier University, Grenoble, 2005.

[9] A. Oh and A. Rudnicky, “Stochastic Natural Language Generation for Spoken Dialog Systems,” Computer Speech and Language, vol 16(3-4), pp. 387-407, 2002.

[10] V. Popescu, J. Caelen, and C. Burileanu, “Logic-Based Rhetorical Structuring Component in Natural Language Generation for Human-Computer Dialogue,” Lecture Notes in Computer Science, vol 4629, pp. 309-317, 2007.


[11] V. Popescu, J. Caelen, and C. Burileanu, “Using Speech Acts in Logic-Based Rhetorical Structuring for Natural Language Generation in Human-Computer Dialogue,” ACL SIGDIAL Workshop on Discourse and Dialogue, SIGDIAL'07 Proceedings, pp. 243-246, 2007.

[12] V. Popescu and J. Caelen, “Contextual Filtering of Rhetorical Relations in Discourse Structuring for Language Generation in Human-Computer Dialogue,” ACL SIGGEN Workshop Constraints in Discourse, CiD'08 Proceedings, pp. 115-122, 2008.

[13] A. Ratnaparkhi, “Trainable Approaches to Surface Natural Language Generation and their Application to Conversational Dialog Systems,” Computer Speech and Language, vol 16(3-4), pp. 435-455, 2002.

[14] E. Reiter and R. Dale, Building Natural Language Generation Systems. Cambridge University Press, 2000.

[15] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2003.

[16] A. Stent, “A Conversation Acts Model for Generating Spoken Dialogue Contributions,” Computer Speech and Language, vol 16(3-4), pp. 313-352, 2002.

[17] M. Stone, C. Doran, B. Webber, and M. Palmer, “Microplanning with Communicative Intentions: The SPUD System,” Computational Intelligence, vol 19, pp. 311-381, 2003.

[18] M. Theune, From Data to Speech: Language Generation in Context. Ph.D. Thesis, University of Eindhoven, 2000.