Interpreting Information Requests in Context: A Collaborative Web Interface for Distance Learning

Autonomous Agents and Multi-Agent Systems, 5, 429–465, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.


CHARLES L. ORTIZ, JR.∗ International Artificial Intelligence Center, Menlo Park, CA 94025

BARBARA J. GROSZ Harvard University, Division of Engineering and Applied Sciences, Cambridge, MA 02138

Abstract. We describe the use of theories of agent collaboration and human dialogue processing in providing a principled basis for the design of web interfaces to multimedia information stores. The DIAL system, an implementation in the domain of information support for distance learning by students in an introductory programming class, is used to illustrate the efficacy of this approach. DIAL builds a representation of context that is based on the collaborative plans of the system and its user and uses this contextual information to reduce the communication burden. Context is represented by a structure of intentions that a user is attempting to satisfy. This structure is modified as tasks are completed or task descriptions are refined. DIAL interprets information requests relative to the prevailing context as it is represented by this structure. As a result, requests may be expressed more economically; contextual information is added by the system. Furthermore, DIAL uses information about the intentional context to respond and act collaboratively, rather than in the master-slave style typical of most current human-computer interfaces. DIAL and the access method it supports provide a unique support tool for distance learning environments as well as a demonstration of a general way in which agent models can be used to improve human-computer communication.

Keywords: collaboration, human-computer interaction

1. Introduction

The design of systems for human-computer interaction has largely adhered to an approach in which the computer is considered a servant that responds to exactly specified commands. The significant expansion of the purposes for which computers are used and the increased diversity of people using systems argue for a shift in perspective. Computer systems are increasingly used for obtaining information available electronically. They are being used by distributed communities comprising people with varying abilities and needs.

In such settings, the master-slave perspective on interface design is inappropriate. People should be able to communicate in terms of the work or effects they want to accomplish, rather than being required to tell a system the specific steps it must take to satisfy their needs. They should be able to ask for the information they want, rather than being required to tell a system where and how to find that information. For this to be possible, systems must become problem-solving partners, and "screen deep" interfaces [35] should be replaced by collaborative systems for human-computer communication that track what users are attempting to do and assist them in their tasks without detailed instructions. Computer systems that are problem-solving collaborators rather than simple servants will significantly increase the usefulness of large-scale information systems [13, 51].

A recent study of "every citizen interfaces" [35] argues that it is necessary to develop a principled basis on which to design "collaborative-ness" into the next generation of software. This foundation would not only make it possible to design systems that collaborate with their users, but would also improve the capabilities of systems to support users in collaborating with one another. Greif [12] has argued that "the next wave of innovation in work-group computing [will result in] products that encourage collaboration in the application domain."

In this paper, we describe a step in this direction, an implementation that is part of a research effort to build collaborative communication systems using agent technology derived from the SharedPlans theory of collaboration [14, 16]. The Distributed Information Access for Learning (DIAL) system provides for multimedia interactions with a complex information system. The work addresses the need for "intelligent interfaces" [25]. The system takes an active role, working with users to identify information relevant to their needs or tasks, rather than providing only a narrow input–output window. In building DIAL, we aimed to develop a system that enables users to obtain the information they need without having to specify the way in which the system should find that information. A central goal of this work has been to demonstrate the efficacy of using a model of collaboration to inform the design of systems that give users natural, flexible, and effective means of communicating with computers.

This paper provides an overview of the SharedPlans theory, describes the way in which its key elements influenced the design of the DIAL system, and then discusses the main features of the implementation. The next section describes the basic view of human-computer communication that underlies the system design, contrasting it with the master-servant view that is prevalent in most extant user interfaces.

∗This work was performed while the author was a postdoctoral fellow at Harvard University (1996–1998).

2. Motivation and background

A command from a user to a system, whether through some formal language (e.g., shell languages, query languages, menu-based systems) or a natural language, combines communication with action. For example, by clicking "print," a user not only communicates that printing is of interest, but also makes a request (a speech act [45]) that the system print particular material. By clicking a file name or the icon representing a file, a user indicates that she wants to do something with that file; in so doing she asks the system to shift its attention to the file. Subsequent actions will indicate what the computer is to do with it (perhaps send it to someone else or allow the user to edit it). In these examples, users' communications with systems are intended to get the system to act in some way that assists them in achieving their goals.

Just as communication through dialogue is essential when people work together, so too is dialogue essential for human-computer interaction. People enlist computer assistance for a variety of goals and activities, ranging from simple everyday tasks such as obtaining facts and producing simple documents to more complex tasks such as manipulating complex machinery, running a warehouse, and operating a financial system. In all these situations, users participate in dialogues with systems about their goals: the effects they want to achieve or the tasks they want done.

Most current human-computer interface systems are severely limited in their dialogue capabilities because they lack several fundamental properties of human dialogue processing, properties that are central to the success of dialogues in human collaborations. In particular, they do not adequately represent the intentional structure of the dialogue. Intentional structure [16] represents information about the actions being done, their relationships to one another, and the purposes for which they are being done. This information is crucial to dialogue understanding. Furthermore, current systems do not adequately track attentional state (information about what is salient) [16]. As a result, they lack important information about what the user is trying to achieve, and they cannot adequately model the dialogue context.

Although several systems maintain a history of user actions (e.g., Emacs's undo command; Web browsers' "back" buttons), they record only a linear history that is isomorphic to the list of user commands. Such history lists record user actions but do not capture the reasons for those actions; they do not reflect the structure of the user's work activities. As a result, if information must be deleted, the oldest information is removed. In contrast, human dialogue has a structure that depends on the structure of the participants' activity [16, 28]. Information that is linearly distant, and thus considered old and subject to deletion from the history list, may in fact remain relevant, whereas information that is linearly more recent may be irrelevant.

For example, someone might be exploring information about a university department, then realize she should check traffic conditions and alternative means of transportation for a short commuting trip she needs to take, and then, once this interrupting task is taken care of, go back to get more information about the department. In this situation, a web browser might construct the history list in Figure 1.

Figure 1. Sample linear history list of web sites visited.
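The oldest-first deletion policy of a linear history list can be sketched with a bounded queue. This is a minimal illustration, not any browser's actual implementation: the capacity of four entries is an assumption, and apart from Harvard DEAS and DEAS faculty, the page names are hypothetical placeholders for the scenario above.

```python
from collections import deque

# Hypothetical capacity of the linear history list.
HISTORY_CAPACITY = 4


def visit_all(pages, capacity=HISTORY_CAPACITY):
    """Return the history that remains after visiting `pages` in order.

    Entries are kept strictly in visit order; once the list is full,
    each new visit silently discards the oldest entry (deque maxlen).
    """
    history = deque(maxlen=capacity)
    for page in pages:
        history.append(page)
    return list(history)


visits = [
    "Harvard DEAS",
    "DEAS faculty",
    "Traffic conditions",      # hypothetical: interrupting travel task
    "Commuter rail schedule",  # hypothetical: interrupting travel task
    "DEAS courses",            # hypothetical: user resumes the department task
    "DEAS course catalog",     # hypothetical
]

print(visit_all(visits))
# ['Traffic conditions', 'Commuter rail schedule', 'DEAS courses', 'DEAS course catalog']
```

The two evicted entries are the department pages that belong to the task the user has resumed, while the interrupting travel pages survive; the list has no record of why each page was visited, so it cannot make a better choice.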


If it needs to delete entries, it will delete the two oldest elements (Harvard DEAS, DEAS faculty), even though those are more relevant to what the user is now doing than is the intervening travel information.

Hypertext systems derive context from structure, but the structure they employ is based on static properties of information rather than the dynamics of the user's task and the user-system interaction. Even those systems that allow users to refer to history lists do not themselves make use of the information those lists contain, nor do they allow users to provide any more general task information that might make querying or other interactions more efficient. Instead, they interpret each user input in relative isolation. For example, if a user interested in travel between Providence and Boston visited the Amtrak website to consider options and then decided to check Greyhound bus schedules, the user would have to resupply to the Greyhound website all the departure, destination, and time-of-day information already given at the Amtrak site.

Menu systems do carry context, but in an implicit way that gives the user very little control. For instance, users cannot dynamically select intermediate states in a menu chain as stable points from which to perform multiple operations. Typically, only the bottom-most and top-most elements of the chain are points at which the user can work. Furthermore, a user cannot flexibly move back and forth between these elements; most often, systems require that all decisions be made at the bottom-most level before additional operations can be carried out at a higher level. Thus, it becomes difficult to navigate within a context to pursue several options. For instance, in some word-processing systems, a user who wants to change both the size and the type of font must first navigate through a set of menus to the "font" entry, make the desired change in type, and then renavigate through the same set of menus back to "font" and choose the "size" entry. That is, the user cannot spontaneously decide, based on what he is trying to do, that "font" is an important state from which to make multiple changes.

As others have argued [50], a central concern that bridges both human-computer interface (HCI) and AI approaches to human-computer collaboration is to make information about the current problem and the dialogue context available in the interface and to integrate the use of contextual information into the system. In designing DIAL, we adapted computational models of dialogue and methods developed for dialogue participation in natural language [16, 17, 29, 31, 32] to human-computer communication in other modalities and media. In particular, we use the SharedPlans specification of collaboration in multiagent settings as the basis for modeling the intentional structure of DIAL's dialogue with the user. DIAL differs from the natural language dialogue work on which it is based in two ways. First, the system and its user communicate through displays and pointing rather than through natural language. Second, rather than directly implementing SharedPlans, we have used the specification to inform the overall system design and added components to the interface to handle collaborative activities.

Because DIAL represents intentional structure, users can take advantage of dialogue context to decrease the amount of information they must explicitly provide. For instance, queries may depend on contextually available information and so can be briefer. On the system side, the system can tailor its output to be appropriate in context. For instance, if a student asks to see a video, the system will use contextual information to determine the appropriate set of videos for the student to choose among, rather than presenting the full list of available videos.

Figure 2. DIAL screen showing (counterclockwise from top left): (i) action input, (ii) action context, (iii) text output, and (iv) video/chat window.

The application described in this paper aims to support distance learning¹ by students in an introductory programming class by providing access to files of course information that include lecture notes, exercises, exams, solutions, related resource texts, and video tapes of lectures. Anticipated users include students reviewing this material and faculty preparing it.

Figure 2 shows part of a typical DIAL session for a student. The video is on sorting, a topic relevant to the exam. As described in a later section, the hierarchical information shown to the left of the video represents both intentional structure and attentional state information. In DIAL, menus are context sensitive in a new way: they are constrained not only by explicit actions, but also by implicit connections represented in the intentional structure. Attentional state information helps determine which intentional connections matter. DIAL does not propose a new form of interface at the surface level, but rather provides a new methodology for developing collaborative human-computer information systems that carefully models both what the user is doing and why.
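The economy gained by interpreting a request in context can be illustrated with the Amtrak/Greyhound scenario above. In this minimal sketch, which is an illustrative assumption and not DIAL's code, the task context is a dictionary of constraints the user has already established; a terse new request supplies only what changes, and the remaining fields are filled in from the context. The function name, field names, and the time-of-day value are hypothetical.

```python
from typing import Dict


def complete_request(request: Dict[str, str], context: Dict[str, str]) -> Dict[str, str]:
    """Return a full request: fields the user states explicitly override the
    context, and any field the user omitted is supplied from the context."""
    completed = dict(context)   # start from what the task context already specifies
    completed.update(request)   # explicit user input takes precedence
    return completed


# Constraints established while the user explored the first (rail) site.
travel_context = {
    "departure": "Providence",
    "destination": "Boston",
    "time_of_day": "morning",   # hypothetical value
}

# At the second (bus) site the user states only what is new.
bus_query = complete_request({"carrier": "Greyhound"}, travel_context)
print(bus_query)
# {'departure': 'Providence', 'destination': 'Boston', 'time_of_day': 'morning', 'carrier': 'Greyhound'}
```

Letting explicit input override the context is one simple policy; it keeps the user in control when the contextual reading of a request is wrong.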

3. Overview of DIAL

In DIAL, the user-system dialogue is a collaborative activity in which the user and the system each undertake actions in service of a shared goal or joint activity. At the start, a user specifies the task that his interaction with DIAL is meant to support. This task is an action to which the user is committed; most often, it is a complex action with many constituent subactions. For instance, a student might access DIAL to prepare for an exam, or a lecturer might do so to prepare an assignment. To simplify the presentation, we will use "task" throughout the remainder of this paper to refer to a complex action of this sort to which the user is committed. Given the user's overall task, DIAL engages in a collaboration with the user to locate information relevant to that task. The user has an intention to do the task; the user and DIAL form a collaborative plan in service of that intention. Thus, the collaborative plan is subsidiary [32] to the user's individual plan. The manner in which DIAL interacts collaboratively is based on the specifications given in the SharedPlans theory [14, 15].

DIAL makes use of the notion that actions may be decomposed into constituent subactions. In the SharedPlans formalization, a recipe specifies the constituent subactions and any constraints that must hold among them so that doing the subactions under those constraints constitutes doing the action. Furthermore, the theory stipulates that agents need to know details only of the subactions in which they are participating. For example, the action of reviewing a topic requires selection of the topic (an action that might be done by the student alone or by the student and system together), identification of all material relevant to that topic (an action that the system and student do together), and display of this material (an action the system performs). The recipe for this way of reviewing a topic would include the constraint that the topic be selected first. Because the display action is performed by the system alone, the user does not need to know any details of the recipe for this subaction (e.g., whether ghostview or xdvi is being used).

In DIAL, dialogue-structure information is used both to constrain the system's interpretation of the user's "utterances" and to enable the system to produce responses appropriate to the task and the dialogue context. We use "utterances" rather than "input" and "responses" rather than "output" to emphasize that this is a dialogue. The user and system communicate with one another in a combination of graphical, textual, and gestural forms. (At present, the only gesture is pointing.) They interpret each other's utterances in context. As described in Section 5, DIAL represents both the intentional structure of the dialogue and its attentional state.

Figure 2 shows a typical student-system interaction. The screen is divided into four frames. Videos are displayed in the top right frame. These videos can supply a student with information not necessarily found in the lecture notes, such as the "demonstration" on sorting shown in the video frame in the figure. The bottom frame displays text material such as lecture notes, assignments, and examples. The small frame at the top left contains a form for entering actions that the user wants to perform. The entry mechanism is menu based: the menus guide the user by indicating the actions that are possible given the current context. The second frame from the top, on the left, displays information about both intentional structure and attentional state in a "context tree." This context tree has several roles: (1) it makes public, and hence mutually believed, the intentional structure; (2) it represents the attentional state; and (3) it provides a representation that the user can manipulate to change attentional state. By allowing the user to manipulate the attentional state, DIAL provides for flexible elaboration of the intentional structure. A user can navigate to a previous state and reestablish the intentional context of his task at that point.

The nodes of the context tree explicitly represent user tasks (e.g., study for exam, review assignment) that have led to user-system collaborations. The hierarchical relationships among nodes reflect the decomposition of the tasks. For each node of the tree, the user and system have a collaborative plan to identify and obtain information relevant to the task represented by that node. The collaborative context (the SharedPlans spawned by each user task and the relationships among them) is implicit. The relationship between tasks in the context tree, and between the intentions and plans corresponding to them, is a very general one, defined in terms of a notion of action contribution [1]. SharedPlans are partial for much of their life, as recipes evolve over time and subtasks are completed. As a result, the context tree is generally an incomplete representation of the collaborative activity. Furthermore, actions that are part of the decomposition of a task but are not themselves collaborative (i.e., they are done individually by either the user or the system) are not shown. DIAL marks with asterisks the tasks for which collaborative plans are still actively being pursued. The attentional state comprises the SharedPlans corresponding to these tasks, structured in a way that is mirrored in the hierarchical context-tree structure.

In this example, the student has pursued two high-level intentions: studying for the exam and reviewing random numbers. As part of the former, the student reviewed the assignment on linked lists as well as the topic of sorting. The intention to review the assignment was itself accomplished by first reviewing a particular code example and then reviewing the notes on pointers, while the intention to review the material on sorting was accomplished by watching the video on selection sort. In interactions with DIAL, when a video is requested, the video corresponding to the current context is launched. In the example, the user need not know the actual identity of the video; its name is appended to the user's request automatically. Finally, as part of the task of reviewing random numbers, the student sent a message to a teaching assistant with a question on random numbers. Note that this last action is a choice that is explicitly represented by the system through the use of recipe templates, as described in Section 5.1.

The structure of the context tree reflects the embedding structure among intentions: the context indicates that the student watched the video as a way of reviewing the topic of sorting, which, in turn, was part of a means of studying for the exam. The system uses these implicit, contextual "as a way" relationships to determine the relevant set of videos. As a result, it can produce a small, focused list from which the user makes a choice. Like a human collaborator, DIAL is more likely to make an error when it lacks information. For flexibility, and to limit user frustration, DIAL makes it simple for the user to request the full list through the menu system if the contextual reading of the request is incorrect. This option reflects a design strategy: we believe that a user of any software system that makes assumptions about user intentions may, from time to time, want or need to overturn the system's suggestion and pursue a more "exhaustive" strategy, and the system should make such overturning easy.
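The use of "as a way" relationships to narrow the video list can be sketched as follows. This is a minimal illustration under stated assumptions, not DIAL's implementation: each context-tree node is a hypothetical `TaskNode` carrying an action name and an optional topic, each video in a hypothetical catalog is tagged with one topic, and the candidate videos are those whose topic appears on the path from the focused node up to the root. Only the "3: Selection" video comes from the example; the other catalog entries are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskNode:
    """One user task in the context tree (hypothetical representation)."""
    action: str
    topic: Optional[str] = None
    parent: Optional["TaskNode"] = field(default=None, repr=False, compare=False)
    children: List["TaskNode"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, action: str, topic: Optional[str] = None) -> "TaskNode":
        """Record a subtask undertaken 'as a way' of doing this task."""
        child = TaskNode(action, topic, parent=self)
        self.children.append(child)
        return child

    def path_to_root(self) -> List["TaskNode"]:
        """This node followed by every task it serves, up to the root."""
        path: List["TaskNode"] = []
        node: Optional["TaskNode"] = self
        while node is not None:
            path.append(node)
            node = node.parent
        return path


def candidate_videos(focus: TaskNode, catalog: Dict[str, str],
                     full_list: bool = False) -> List[str]:
    """Videos whose topic appears on the path from `focus` to the root.

    `full_list=True` overturns the contextual reading and returns every video.
    """
    if full_list:
        return sorted(catalog)
    topics = {n.topic for n in focus.path_to_root() if n.topic is not None}
    return sorted(title for title, topic in catalog.items() if topic in topics)


# The branch of the Figure 2 session that leads to the video request.
exam = TaskNode("Study for Exam")
sorting = exam.add_child("Review Topic", topic="sorting")

# Hypothetical catalog mapping video titles to topic tags.
catalog = {
    "3: Selection": "sorting",
    "Insertion sort": "sorting",    # hypothetical
    "Pointers intro": "pointers",   # hypothetical
    "Random numbers": "random",     # hypothetical
}

print(candidate_videos(sorting, catalog))                       # ['3: Selection', 'Insertion sort']
print(len(candidate_videos(sorting, catalog, full_list=True)))  # 4
```

Collecting topics from every node on the path, rather than from the focused node alone, is one simple way to let higher-level tasks contribute context; the `full_list` flag stands in for the menu option that lets the user overturn the contextual reading.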


It is important to appreciate how much more informative the representation of context in DIAL's action history is than a strictly "flat" history of the sort that might be displayed by a web browser:

Study for Exam (Cs 50)
  Review Assignment (6: Linked Lists and Pointers)
    See Code Example ('6.3': Llist.C)
    Review Topic (Pointers)
  Review Topic (Sorting)
    See Video (3: Selection)
Review Topic (Random)
  Contact Tf (Eric Feigin)

The context tree includes only actions that the system and user mutually believe are being carried out jointly. In general, actions that either is taking alone, and for which no supporting collaborative activity is needed, are not included. We believe that this is a positive feature of the system: details of these actions are suppressed because they need not be communicated. For instance, in the Figure 2 example, the See Video action must be further decomposed by the system into actions such as xanim +f selection.mov before it is executable. Such detail, however, is of no concern to the user because the system alone is performing this action; it is therefore not shown in the action context window. Note, however, that although See Video is a single-agent action, it is recorded in the context because it represents a useful reference point that the user may want to revisit. Various actions that the user might be undertaking (e.g., taking handwritten notes or getting a cup of coffee to stay awake) also would not appear in the tree.

Each node in the context tree is selectable so that the student can return to previous contexts. If a user clicks a context-tree entry, attention shifts back to the state that was active at that point. However, only the attentional state changes; the intentional structure is not affected. After making such a shift, a user can then declare new intentions, which will be interpreted in that new attentional state. In the example in Figure 2, the student could click Review Topic (Pointers) and then ask to see a video. In this case, attentional state would shift back to Review Topic (Pointers), and a new node would be added below this entry in the tree. Video choices would then be limited to those having to do with pointers. The student could shift attention back to reviewing randomness by clicking Review Topic (Random). This feature of DIAL allows users to navigate among intentions without having to reproduce entire intentional paths.

This way of handling context shifts differs from what occurs in natural language dialogues. The differences result from the different kinds of information directly available in speech and in graphical displays. In natural language dialogues, intonation may be used to shift implicitly to some contexts: for example, to conclude a subdialogue and return to the embedding dialogue. However, shifting to other contexts, in particular those of completed subdialogues, may require long, explicit descriptions to identify the context. DIAL avoids this by exploiting the mutual belief that the system and the user have regarding intended actions, which is captured in the context tree and also in the system's graphical presentations. Since the intentional context is recorded explicitly and displayed graphically, it is available to a user as a roadmap of the dialogue. Unlike applications of the theory to natural language dialogue understanding, in which a complex process of plan recognition is needed to interpret utterances, DIAL interprets utterances relative to the explicit context. In DIAL, it is no more difficult to return to a prior subdialogue that is closed than to one that is open, but all such shifts must be made explicitly.

Several important characteristics distinguish DIAL from conventional menu-based interfaces, even those in which menus are chosen dynamically according to a user's last input.

— DIAL supports a collaborative and progressive refinement of an action context. The context is a richer entity than that found in systems in which the next set of permissible commands or menu is given only as a function of the last command.

— DIAL enables a user to navigate within the full context of what has already been done; for example, the user can restore a previous context and then explore a new alternative or branch.

— DIAL uses task knowledge to interpret communications from users.

— By structuring and recording the interaction in the way the user conceptualizes it, DIAL simplifies information access.

These features are discussed in more detail in the remainder of this paper.
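The node-selection behavior described in this section can be sketched as follows: clicking a context-tree entry moves attention but leaves the intentional structure intact, and a newly declared intention attaches below the selected entry. The `ContextTree` class and its `declare`, `select`, and `render` methods are hypothetical names chosen for illustration, not DIAL's API. Task labels are assumed to be unique, and the label of the new video node is invented.

```python
from typing import Dict, List, Optional


class ContextTree:
    """Hypothetical sketch: a tree of tasks (intentional structure) plus a
    single focus pointer (attentional state)."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.children: Dict[str, List[str]] = {}
        self.focus: Optional[str] = None

    def declare(self, task: str) -> None:
        """Interpret a new intention in the current attentional state: it
        becomes a child of the focused task and then receives the focus."""
        if task in self.parent:
            raise ValueError(f"duplicate task label: {task!r}")
        self.parent[task] = self.focus
        self.children[task] = []
        if self.focus is not None:
            self.children[self.focus].append(task)
        self.focus = task

    def select(self, task: str) -> None:
        """Clicking a context-tree entry changes only the attentional state."""
        if task not in self.parent:
            raise KeyError(task)
        self.focus = task

    def render(self) -> List[str]:
        """Indented outline of the whole tree, roots in creation order."""
        lines: List[str] = []

        def walk(task: str, depth: int) -> None:
            lines.append("  " * depth + task)
            for child in self.children[task]:
                walk(child, depth + 1)

        for task, par in self.parent.items():
            if par is None:
                walk(task, 0)
        return lines


tree = ContextTree()
tree.declare("Study for Exam (Cs 50)")
tree.declare("Review Assignment (6: Linked Lists and Pointers)")
tree.declare("Review Topic (Pointers)")
tree.select("Study for Exam (Cs 50)")
tree.declare("Review Topic (Sorting)")
tree.declare("See Video (3: Selection)")

before = tree.render()
tree.select("Review Topic (Pointers)")  # attention shifts back to an earlier context...
assert tree.render() == before          # ...while the intentional structure is unchanged
tree.declare("See Video (pointers)")    # hypothetical label for the new video request
print("\n".join(tree.render()))
```

In the final outline, the new See Video node appears under Review Topic (Pointers). Each step needed only a single selection, without retracing the path from Study for Exam.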

4. The SharedPlans theory of collaboration

The design of DIAL follows a well-developed theory of agent collaboration, called SharedPlans, that supports the design of natural language communication systems and multiagent planning systems [14, 24, 29, 38]. Here, we review SharedPlans; Section 10 compares our SharedPlans-based approach with other agent technologies.

The SharedPlans theory of collaboration is based on a mental-state view of plans. Rather than associating a plan for some action, α, with a group of actions that can achieve α, the theory treats a plan as a richer structure consisting of a collection of beliefs and intentions [40]. Intentions serve several functions [4]: (1) they constrain an agent's choices of what to do; (2) they constrain the adoption of new intentions, since an agent will not normally adopt new intentions that conflict with existing ones; and (3) they trigger monitoring of the success or failure of attempts to achieve an intention, and failures can engender replanning. Intentions come in two varieties: an intention-to perform some action represents an individual commitment on the part of an agent to perform that action, while an intention-that represents a commitment to some condition. For example, the user might have an intention-to see a particular video, while the system has an intention-that the user be able to see the video. In terms of the three functions described above, an intention-to α serves to (1) constrain deliberation by focusing means-end reasoning on discovering ways of achieving α; (2) trigger a conflict resolution process; and (3) activate a process to monitor the success or failure of α. In contrast, an intention-that φ serves to (1) engender activities that will maintain φ or help bring φ about; (2) trigger a conflict resolution process, as in the intention-to case; and (3) monitor conditions that would bring about ¬φ.

The SharedPlans theory considers both basic actions, which are executable at will by an individual and do not further decompose, and complex actions, which are decomposable into constituent subacts. When constructing a system, the designer identifies which actions are basic according to the domain and application. For example, one application might treat displaying a video as a basic action, while another might view it as a complex action that includes the subaction of invoking a particular video display process.

In this theory, a recipe for a complex action, α, is represented as a group of actions and constraints: the meaning of a recipe for α is, roughly, that by doing the indicated group of actions in a situation in which the recipe constraints hold, the agents will also perform α [1, 28, 36, 39]. The SharedPlans formalism supports the incremental and collaborative development of recipes for actions. Agents do not need to have complete recipes already stored or selected in order to begin a collaborative activity. A recipe that only partially describes some way of accomplishing an action can be extended by an agent through a process of elaboration. Each agent is presumed to have a set of recipes, or a "recipe library." Different agents' recipe libraries may differ; these differences might reflect each agent's unique capabilities and knowledge. Recipe libraries are also typically updated over time. Successful completion of a collaborative plan may require that recipes from several agents be integrated. This is a research problem in its own right and is not addressed in this paper.

Both individual and group plans, whether partial or complete, are defined in SharedPlans. For the purposes of this paper, the following specifies the basic elements of a group plan. A group, Gr, has a SharedPlan to do α if each member, Gi, of the group mutually believes that

1. Gi intends-that Gr do α;
2. Gr has a (partial) recipe for α; and
3. for each constituent act, β, in the recipe for α, either
   a) Gr have selected group members to perform β, that is,
      (i) some individual or subgroup, Gj, has a (partial) plan to do β;
      (ii) Gj believes it can do β;
      (iii) Gr mutually believe (i) and (ii); and
      (iv) Gi intends-that Gj be able to do β in the context of α; or
   b) Gr have a full plan to select group member(s) to perform β.
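Conditions 1–3 can be read as a checklist over a recipe. The following is a deliberately simplified sketch, not a formalization of SharedPlans: mutual belief is not modeled (the checklist is evaluated over one shared record), a member's own (partial) plan for a subact is not represented, "Gj believes it can do β" is reduced to a hypothetical capability table, and the names `Subact`, `Recipe`, `is_shared_plan`, and `respects_constraints` are invented for illustration. The example data follow the recipe for reviewing a topic described in Section 3.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Subact:
    """A constituent act (beta) of a recipe."""
    name: str
    # Selected performers, or None if the group has not yet chosen who will act.
    performers: Optional[FrozenSet[str]] = None


@dataclass
class Recipe:
    """A (possibly partial) recipe for a complex action (alpha)."""
    action: str
    subacts: List[Subact]
    # Ordering constraints: (a, b) means subact a must be done before subact b.
    before: List[Tuple[str, str]] = field(default_factory=list)


def is_shared_plan(group: Set[str], intends_that: Set[str],
                   recipe: Optional[Recipe], can_do: Dict[str, Set[str]],
                   selection_plans: Set[str]) -> bool:
    """Simplified check of conditions 1-3 (mutual belief not modeled)."""
    if not group <= intends_that:       # 1. every member intends-that Gr do alpha
        return False
    if recipe is None:                  # 2. Gr has a (partial) recipe for alpha
        return False
    for sub in recipe.subacts:          # 3. for each constituent act beta
        if sub.performers is not None:  # 3a. performers chosen: each must be able to act
            if not all(sub.name in can_do.get(m, set()) for m in sub.performers):
                return False
        elif sub.name not in selection_plans:  # 3b. else need a plan to choose them
            return False
    return True


def respects_constraints(recipe: Recipe, order: List[str]) -> bool:
    """True if an execution order satisfies the recipe's ordering constraints."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[a] < position[b] for a, b in recipe.before)


review_topic = Recipe(
    action="Review Topic",
    subacts=[
        Subact("select topic"),  # student alone, or student and system together
        Subact("identify material", frozenset({"student", "DIAL"})),
        Subact("display material", frozenset({"DIAL"})),
    ],
    before=[("select topic", "identify material"),
            ("select topic", "display material")],
)
can_do = {
    "student": {"select topic", "identify material"},
    "DIAL": {"select topic", "identify material", "display material"},
}
group = {"student", "DIAL"}

print(is_shared_plan(group, group, review_topic, can_do, {"select topic"}))  # True
print(is_shared_plan(group, group, review_topic, can_do, set()))             # False
print(respects_constraints(review_topic,
                           ["select topic", "identify material", "display material"]))  # True
```

The second call fails only because nobody has yet been chosen to select the topic and the group has no plan for choosing; this mirrors the distinction between clauses 3a and 3b.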

4.1. SharedPlans in dialogue understanding

SharedPlans had its origin in applications to dialogue understanding [16]. Studies of naturally occurring dialogues have established that dialogues exhibit structure [20]. This structure includes three interdependent components: a linguistic structure, an intentional structure, and an attentional state [16]. Figure 3 illustrates the structure of a simple dialogue. The linguistic structure is the set of utterances grouped into segments (labeled (1)–(3) in the figure) and the embedding relationships between segments. The intentional structure for this fragment has three elements: a plan to fix a connectivity problem, corresponding to segment (1); a plan to select a recipe for this, corresponding to segment (2); and a plan to identify the switch to be used, corresponding to segment (3). The embeddings between the discourse segments reflect intentional relationships. For example, the recipe identification activity discussed in segment (2) is part of the plan to fix the connectivity problem described in segment (1); it satisfies a knowledge precondition [31, 32]. Attentional state records the objects, properties, relations, and discourse intentions that are most salient in the given segment. The participants in the sample dialogue rely on attentional state to identify the switch currently under discussion in segment (3) as the referent of the pronoun in (9).

[Segment (1)]
T: We have a connectivity problem between node39 and node64 that we need to fix.
S: Okay.
...
S: It looks like we need to upgrade node39 then.
T: Yeah.
  [Segment (2)]
  (1) How shall we do that?
  (2) S: Well, first we need to divert the network traffic to another node.
  (3) T: Okay.
  (4) Then we can replace node39 with a higher capacity switch.
  (5) S: Right.
  (6) T: Okay, good.
  [Segment (3)]
  (7) S: What type of switch should we use?
  (8) T: How about an XYZ+?
  (9) S: Sounds good. We have two of them available?

Figure 3. Example dialogue between a network technician (T) and a network management system (S) engaged in the task of identifying a problem and repairing it.

Lochbaum [31] applied the theory of SharedPlans to dialogue participation, viewing two agents engaged in dialogue as participating in a collaborative activity. In this model, one participant interprets the other's utterance by explaining how that utterance contributes to the agents' partial SharedPlans for some action. A key element of the interpretation process is determining whether the new utterance signals the initiation of a subsidiary SharedPlan for some action, contributes to the current SharedPlan, or signals the completion of the current SharedPlan. If the utterance signals a new, subsidiary SharedPlan, the agent ascribes to the other agent an intention-that the two agents construct the plan, and uses its knowledge of SharedPlans and recipes to determine the relationship between this new SharedPlan and other plans in the intentional structure.

5. SharedPlans in DIAL

DIAL’s domain is information access in support of distance learning. One of our design criteria was to choose a domain and task situation for which a general level of modeling was appropriate, so that generic recipes could be developed. We wanted to demonstrate the utility of collaboration without necessitating a very large knowledge representation effort: this meant avoiding tasks that involved highly intricate recipes or a great deal of detailed knowledge. In our distance learning application, we have generic recipes related to various learning situations, but DIAL does not require encoding knowledge of course content (i.e., computer science knowledge). In addition, one of our goals was to show the benefit derived from a design that was informed by the SharedPlans theory without necessarily implementing the entire theory and without incurring the computational cost of plan recognition.

DIAL forms collaborative plans with a user to locate information relevant to the user’s task (e.g., studying for an exam, preparing a lecture). The SharedPlan of the user and system is an “information-locating” plan (ILP), one that is in service of the user’s individual intentions and plan to carry out the task [4, 39]. Typically, this initial shared ILP will spawn many subsidiary SharedPlans [31, 32], which will also be information-locating plans. For example, a user’s individual intention to study for a midterm exam might lead her to engage DIAL and indicate Study for Exam. The user and DIAL would then have a SharedPlan to find information relevant to the exam. DIAL would present a menu indicating the types of information available (e.g., assignments, lecture notes). If the user next indicates Review Assignment, the user and DIAL would embark on a subsidiary SharedPlan to locate assignments and assignment information relevant to the exam.

5.1. Dialogue structure and recipes

In DIAL, system utterances take the form of menu choices presented to the user, and system actions include activities such as displaying a video or displaying some text. User utterances specify actions (e.g., reviewing a topic) or parameters of actions (e.g., the topic to be reviewed). The system interprets an action specification as initiating a collaborative plan for locating information to support the user’s intention to do that action. For example, if the user says “study for exam,” the system interprets this as initiating a plan for locating such information as lecture notes, videos, and assignments relevant to the exam. Subsequent user requests are interpreted in this context. User utterances that specify parameters of actions are used to refine these collaborative information-locating plans. For example, if the user says “sorting” after having said “review topic,” the plan is taken to be a review of the topic of sorting.

The structure of the collaborative ILPs is the basis of the intentional structure of any dialogue with DIAL. The intentional structure is actually composed of discourse segment purposes [16]. The relationship of the SharedPlan structure to the intentional structure of discourse segments has been described by Lochbaum [31, 32]. In brief, each element of the intentional structure is an intention-that a particular SharedPlan be formed and, by virtue of the ensuing intentions-to do actions, executed. For the purposes of this paper, the differences between the SharedPlan structure and the intentional structure do not matter, and we will subsequently refer to them as though they were identical. A discourse segment comprises all utterances that correspond to a given SharedPlan and any SharedPlans subsidiary to that plan. The linguistic structure corresponds to the set of structured user and system utterances, where the embedding relationships between segments reflect the intentional structure.

The initial set of top-level actions on which users and DIAL can collaborate includes studying for an exam, reviewing an assignment, and reviewing a topic. Recipes specify the ways in which these top-level actions are expanded recursively down to such individual “basic” actions as displaying one of the lecture notes files, displaying a video, or displaying an example of code for an assignment. DIAL only stores recipes for actions that are performed jointly. Any further decomposition of individual actions performed by DIAL (e.g., the way in which to display a video) is implicit in code. DIAL does not represent decomposition information about a user’s individual actions, but rather treats them as though they were basic. It would need to do so, however, to handle the kinds of misconceptions Pollack [39] discusses. For example, a user might set out to review the topic of graph algorithms as a way of preparing for an exam, with the misconception that that topic was important for the part of the exam that covered sorting. Had the system been aware of the user’s misconception, it could have responded collaboratively by correcting the user. Most systems will perform better with more information; however, there is a continuum of design options that involve trade-offs of complexity, ease of use, and knowledge requirements (both task and information). DIAL’s design has been chosen to simplify use and reduce knowledge requirements.

Information-locating recipes are constructed by DIAL and the user jointly and incrementally. DIAL uses recipe templates to support this joint activity. Unlike a recipe, which specifies a set of actions that, if performed, would achieve some action, a recipe template specifies a disjunction of possible choices. Each act-type has an associated recipe template. The system uses these templates to circumscribe the set of possibilities from which a user can choose a next action to pursue; each choice of next action extends the recipe for the current collaborative activity. This approach provides more flexibility than would be possible with a fixed set of recipes. The jointly developed recipe is displayed in the action context window; recipe construction can also be interleaved with action execution.2 Incremental recipe construction by way of recipe templates reflects the sort of knowledge partiality that the SharedPlans theory was designed to address.
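To make the difference between a recipe and a recipe template concrete, the following minimal Python sketch (DIAL itself was written in Perl) represents each template as the list of disjunctive next actions for an act-type and extends a jointly built recipe one choice at a time. The template contents are only loosely modeled on Figure 4 and are assumptions, as are the names `RECIPE_TEMPLATES` and `Recipe`; it illustrates the idea rather than reproducing DIAL’s code.

```python
from dataclasses import dataclass, field

# Hypothetical template library loosely modeled on Figure 4: each act-type maps
# to the disjunction of next actions a user may choose in its context.
RECIPE_TEMPLATES: dict[str, list[str]] = {
    "Top Level": ["Study for Exam", "Review Assignment", "Review Topic",
                  "Review Lecture", "See Video", "Contact TA"],
    "Study for Exam": ["Review Assignment", "Review Topic", "Review Lecture",
                       "See Video", "Contact TA", "View Scope"],
    "Review Assignment": ["Review Topic", "Review Lecture", "See Video",
                          "See Code Example", "Contact TA", "View Scope"],
    "Review Topic": ["Review Lecture", "See Video", "Contact TA", "View Scope"],
}


@dataclass
class Recipe:
    """A jointly built recipe: the act being pursued and the steps chosen so far."""
    act: str
    steps: list[str] = field(default_factory=list)

    def options(self) -> list[str]:
        # The template circumscribes the possible next actions; an act with no
        # template (e.g., a basic display action) offers no further choices.
        return RECIPE_TEMPLATES.get(self.act, [])

    def extend(self, choice: str) -> None:
        # Each choice must be one of the template's disjuncts; accepting it
        # extends the recipe for the current collaborative activity.
        if choice not in self.options():
            raise ValueError(f"{choice!r} is not offered in the context of {self.act!r}")
        self.steps.append(choice)


recipe = Recipe("Study for Exam")
recipe.extend("Review Assignment")
recipe.extend("See Video")
print(recipe.steps)  # ['Review Assignment', 'See Video']
```

Rejecting any choice outside the template is what keeps the user’s options tractable; a fuller version would also record each accepted choice as a node in the action context tree.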


[Figure 4 panels: recipe templates for Top Level, Study for Exam, Review Assignment, and Review Topic, each listing next actions drawn from Study for Exam, Review Assignment, Review Topic, Review Lecture, See Video, See Code Example, Contact TA, and View Scope; arrows connect the templates.]

Figure 4. Recipe templates and inter-recipe connections.

Figure 4 gives examples of recipe templates describing the choices that a user might have, depending on the previous actions. The arrows among them indicate the interconnections: the figure indicates that if the user is currently reviewing the assignment, then one of the options in the bottom center column will be made available. For example, the student could choose to Study for Exam and then Review Assignment if the student had received a poor grade on an assignment related to material on the exam. On the other hand, if the student had picked Review Assignment at the top level, the system could expect the student to pick any assignment. This illustrates the way in which the context constrains the set of objects to which an expression could refer. Using those recipe templates, the system and the user might jointly develop the following recipe to locate information in service of the user’s studying for an exam:

Study For Exam (CS 50)
  Review Assignment (6: Linked Lists and Pointers)
    Review Topic (Pointers)
    See Video (On Linked Lists)

This context fragment indicates that the user is studying for the exam by reviewing the assignment on linked lists and pointers, and that the latter activity is being done (in part) by reviewing the topic of pointers and viewing a video on linked lists. Review Topic (Pointers) and See Video (On Linked Lists) each contribute to doing Review Assignment. The two most embedded actions (review topic and see video) each eventually decompose into a set of basic-level actions, some done individually by the user (e.g., choosing the topic) and some done individually by the system (e.g., all the actions needed to get the video on the display). The system actions assist the user in his task. As this example illustrates, each recipe is individually tailored to that particular user’s needs and preferences through a process of successive refinement, in which the system presents possible choices, drawn from the recipe template library, that are consistent with the current context. The user can then choose either to refine the information search or to initiate a new task context.

The action context displays the current state of the joint activity, thus providing a common ground of information about the actions that have been taken and a context in which to select subsequent actions. In natural language applications of the SharedPlans theory, the computational cost of plan recognition may be high. DIAL avoids this cost by using recipe templates to structure interactions in a tractable way, which limits the plan recognition task. Rather than having to infer the user’s possible intentions, the system suggests possible courses of action according to the library of recipe templates and then records the actual choice.
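The context fragment above can be held as a tree in which each node is a SharedPlan and each parent is the plan its children contribute to. The sketch below, built around a hypothetical `PlanNode` class, rebuilds that fragment, prints it as an indented outline like the action context window, and recovers the chain of plans that See Video (On Linked Lists) contributes to. It is one plausible representation under these assumptions, not the one DIAL used.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    """One SharedPlan in the action context tree (hypothetical representation)."""
    act: str
    param: Optional[str] = None
    # Back-pointer to the embedding plan; excluded from repr/eq to avoid cycles.
    parent: Optional[PlanNode] = field(default=None, repr=False, compare=False)
    children: list[PlanNode] = field(default_factory=list)

    def add(self, act: str, param: Optional[str] = None) -> PlanNode:
        """Embed a subsidiary plan that contributes to this one."""
        child = PlanNode(act, param, parent=self)
        self.children.append(child)
        return child

    def label(self) -> str:
        return f"{self.act} ({self.param})" if self.param else self.act

    def contributes_to(self) -> list[str]:
        """Labels of the embedding plans, innermost first."""
        chain: list[str] = []
        node = self.parent
        while node is not None:
            chain.append(node.label())
            node = node.parent
        return chain


def render(node: PlanNode, depth: int = 0) -> list[str]:
    """Indented outline of the tree, one plan per line."""
    lines = ["  " * depth + node.label()]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The context fragment from the text.
exam = PlanNode("Study For Exam", "CS 50")
assignment = exam.add("Review Assignment", "6: Linked Lists and Pointers")
assignment.add("Review Topic", "Pointers")
video = assignment.add("See Video", "On Linked Lists")

print("\n".join(render(exam)))
print(video.contributes_to())
```

Keeping a parent pointer makes the “contributes to” relation a simple upward walk, which is also the information needed later to turn context into action modifiers.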

5.2. Plan correspondences

The SharedPlans theory of collaboration may inform the design of systems in three ways: it can be used (1) as a logical theory directly implementable in a theorem prover,3 (2) as a specification of agent design that constrains planning processes [21, 24, 38, 42, 49], or (3) to guide design by identifying key elements of collaborative behavior. In contrast to type (2) implementations, type (3) implementations need not explicitly represent or reason about the beliefs, intentions, and other mental attitudes represented within the theory. Rather, the theory provides abstract guidelines for the design of systems whose behavior is such that one could ascribe collaboration to them.

In DIAL, we have taken the third approach: the SharedPlans theory informs the design of DIAL but is not directly implemented. Some of the components of a SharedPlan are realized in DIAL in the following way.

1. Gi’s intention-that the group do α. DIAL’s commitments to the collaborative information-locating activities it undertakes with a user are manifest primarily in three ways: (1) DIAL tracks the intentional structure; (2) it uses the intentional structure and attentional state to guide the interpretation of user utterances (i.e., queries, selections) and to constrain its replies; and (3) it uses the intentional structure to plan alternative courses of action.

2. The group, Gr, has a (partial) recipe for α. The user and DIAL incrementally compose the recipe for α over a session, using recipe templates.

3. For each constituent act β in the recipe, either

a) Gr has selected group members to perform β,

(i) Some individual or subgroup, Gj, has a (partial) plan to do β. DIAL and the user make use of a fixed assignment of responsibility.

(ii) Gj believes it can do β. This is currently a default belief.4

(iii) Gr mutually believe (i) and (ii). The visual display provides a common ground that supports mutual belief of (i).


(iv) Gi intends-that Gj be able to do β in the context in which α is being performed. DIAL’s commitment to users being able to do their actions takes three forms: (1) DIAL constrains user choices to a set of possible actions that can be pursued in the current context; (2) DIAL tracks context, allowing users to state more simply what they need; and (3) DIAL attempts failure recovery (for example, if contact with a teaching assistant/fellow is lost, DIAL would try to re-establish it).5

b) The group, Gr, has a full plan to select group member(s) to perform β. This functionality is not yet implemented. In fact, this illustrates our earlier claim that some measure of collaboration is possible through a partial implementation of the theory. We envision future versions of this system handling this case by having the user and the system collectively search for other helpful agents on the net: for example, a particular teaching assistant or professor who is currently online and able to supply the information requested.

5.3. Interpretation in context

5.3.1. DIAL’s use of intentional structure. One of the primary ways in which DIAL uses intentional structure is to contextually constrain its interpretation of user utterances. The relationships among elements of the intentional structure, in particular the subsidiary relationships among SharedPlans, as well as the plans themselves, play a role. For example, in the scenario in Figure 2, the plan for the action of viewing the video on selection sort contributes to the review of sorting that is being done as part of studying for the exam. Thus, the plan for viewing the video is interpreted not simply as a plan to view a video but, more specifically, as a plan for viewing a video as a way of reviewing sorting. In an analogous manner, the review of sorting is a way of studying. Thus, the simple action View Video is really the more complex action View Video in a manner suitable for review of sorting for the exam. The context provided by the intentional structure is a source of action modifiers.6 The action of reviewing pointers also contributes to the student’s preparing for the exam, and a similar set of by-way-of relationships or action modifiers holds.

DIAL uses its representation of attentional state to determine which portions of the intentional structure are relevant to interpreting each user utterance. As a result of this contextualized interpretation, DIAL would, in this example, present only that subset of the course material on pointers that appears on the exam.

To reflect embedded linguistic structure, DIAL places a new SharedPlan below the lowest SharedPlan in the context tree for which the following conditions hold: (1) the SharedPlan is in the current attentional state, that is, it is one of the highlighted nodes in the action context tree; (2) the new SharedPlan corresponds to an action in the recipe template for the embedding SharedPlan; and (3) conditions (1) and (2) do not hold for any more deeply embedded SharedPlan (i.e., for any node below this node in the action context tree). This activity of situating a task within an action context replaces plan recognition in natural language discourse processing, which attempts to infer relationships between intentions. The visual presentation of the context tree and the user’s capability to explicitly move nodes of the tree make the task more tractable.

This contextualized interpretation is illustrated by the different interpretations of the act description Review lecture from a DIAL interaction in each of the following two possible contexts:

Context 1:
Study for exam
  Review topic (loops)
    Review lecture

Context 2:
Study for exam
Review topic (loops)
  Review lecture

The sequence of utterances is identical in these two contexts, but the interpretation of the last action description is different: in the first context, Review lecture is interpreted as “review a lecture on the loops material covered by the exam,” while in the second it is interpreted as “review a lecture on loops” with no further restriction. In the second context, the request is thus interpreted as asking for broader coverage (all the material on loops, not just that on the exam) and will most likely yield a longer set of possible lectures.

As already discussed, intentional and attentional state information can combine to help disambiguate references to physical objects. For example, we have already seen that an intention to perform an action in DIAL, such as see assignment, is, by itself, vague. Fully specifying such a request requires the addition of information such as the identity of the assignment, the file name of the particular assignment, and perhaps also the method to be used in displaying the assignment to the user. DIAL relieves the user of the burden of specifying each such request in exact detail by interpreting information requests relative to the prevailing context.
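The placement rule stated in Section 5.3.1 can be sketched as follows, assuming that the highlighted nodes form a single path from the root to the most recently added plan. Scanning that path bottom-up and stopping at the first plan whose template admits the new act satisfies conditions (1) through (3). The template fragment, the `(act, parameter)` tuples, and the function names `place` and `interpret` are hypothetical. Applied to the two contexts above, the same request Review lecture ends up embedded under Study for exam only in Context 1, which is why only that reading is restricted to exam material.

```python
from typing import Optional

# Hypothetical template fragment (not DIAL's full library): the acts that may be
# embedded directly under each act.
TEMPLATES: dict[str, set[str]] = {
    "Top level": {"Study for exam", "Review topic", "Review assignment"},
    "Study for exam": {"Review topic", "Review assignment", "Review lecture"},
    "Review topic": {"Review lecture", "See video"},
}

Plan = tuple[str, Optional[str]]  # (act, parameter)


def place(focus_path: list[Plan], new_act: str) -> Optional[int]:
    """Index of the plan under which new_act should be embedded, or None.

    focus_path lists the highlighted plans from the root down to the most
    recent one (condition 1). Scanning bottom-up and returning the first plan
    whose template admits new_act (condition 2) yields the lowest such plan,
    so no more deeply embedded plan qualifies (condition 3).
    """
    for i in range(len(focus_path) - 1, -1, -1):
        if new_act in TEMPLATES.get(focus_path[i][0], set()):
            return i
    return None


def interpret(focus_path: list[Plan], new_act: str) -> list[Plan]:
    """The new act preceded by the chain of embedding plans that modify it."""
    i = place(focus_path, new_act)
    if i is None:
        raise ValueError(f"no plan in focus admits {new_act!r}")
    return focus_path[: i + 1] + [(new_act, None)]


top: Plan = ("Top level", None)
context1 = [top, ("Study for exam", None), ("Review topic", "loops")]
context2 = [top, ("Review topic", "loops")]  # Study for exam is on another branch

print(interpret(context1, "Review lecture"))
print(interpret(context2, "Review lecture"))
```

Because the rule only inspects plans already in focus, it needs no search over possible user intentions; a request that no focused template admits is rejected rather than guessed at.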

5.3.2. Theoretical extensions. DIAL currently uses intentional structure and attentional state information to constrain, and thus help identify, a reference to a physical object (for example, a particular lecture). Intentional structure and attentional state may also be combined to constrain reference to nonphysical objects (e.g., events). Because their use requires a richer structure of actions than is implemented in DIAL, this section describes a theoretical framework that enables such extensions. The ideas in the remainder of this subsection have not yet been implemented.

Imagine a student who is taking a class on modern culture, and consider two situations: in the first, the student is preparing for an exam, and in the second, the student is completing an assignment. As part of preparing for the exam, the student asks to see the list of movies covered on the exam and decides to view one of those movies, Casablanca. The student instructs DIAL to Load tape of Casablanca, followed by the command Play. In the second case, the student is to listen to Beethoven’s Ninth Symphony and issues the commands Load tape of Beethoven’s Ninth and then Play. In the first instance, the command Play should be interpreted as an action involving the TV set and video player, whereas the same command in the second instance should be interpreted as involving the tape player and stereo system.

This use of attentional state and intentional structure is related to work in plan inference. For example, Sidner [46] gives the following example: someone declares, “I’m going on a date tonight. Can you pick up something at the florist for me?” The action described in the second sentence is constrained to the act-types that contribute in some way to the intention or plan of going on a date. Work on interpreting events in the context of means clauses also addresses a related problem. For instance, in the sentence “Bill knocked the board off the roof by raising his arm,” the second action description is seen as a means of performing the first and can therefore be used to constrain the set of allowable interpretations of the action description “Bill knocked the board off the roof” [2, 10]. However, this latter body of work does not address the interpretation of an action description in a broader discourse context of several clauses.

We can be more precise about action interpretation in this broader context by specifying what is meant by an intention in such a way that the broader context can be accommodated. For this purpose, an intention-to constrains the set of allowable actions at the current state of processing [37]. Given the set of all possible futures that can result from all sequences of actions that could be performed in a particular state, an intention constrains that set to include only those futures in which the intended action occurs. Each new intention successively prunes the set of permissible futures.

Actions are then interpreted relative to such contexts. In the above example, the Play action in the first case is interpreted relative to the set of permissible futures specified by the parent intentions: Study for exam and Load tape of Casablanca. Only actions that are consistent with the prevailing context and also contribute to it are then admissible (i.e., listening to Beethoven’s Ninth would not be admissible).7
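The pruning of permissible futures can be illustrated with a toy model. In the sketch below, which is an assumption-laden illustration rather than an implementation of the framework in [37], a future is a two-step sequence (load a tape, then play it on a device), futures in which the device does not suit the medium are discarded as incoherent, and each intention keeps only the futures in which the intended action occurs. The medium-to-device table and all function names are hypothetical, and the parent intention Study for exam is omitted because it does not distinguish the two readings in this toy.

```python
from itertools import product

# Hypothetical world knowledge: the equipment each recording is played on.
DEVICE_FOR = {
    "Casablanca": "TV and video player",
    "Beethoven's Ninth": "tape player and stereo",
}

# Toy space of futures: load some tape, then play on some device.
ALL_FUTURES = [
    (("load", medium), ("play", device))
    for medium, device in product(DEVICE_FOR, DEVICE_FOR.values())
]


def coherent(future) -> bool:
    """A future makes sense only if the device suits the medium loaded."""
    (_, medium), (_, device) = future
    return DEVICE_FOR[medium] == device


def prune(futures, intended):
    """An intention keeps only the futures in which the intended action occurs."""
    return [f for f in futures if intended in f]


def readings_of_play(intentions) -> set[str]:
    """Devices that Play can involve, given the intentions adopted so far."""
    futures = [f for f in ALL_FUTURES if coherent(f)]
    for intention in intentions:
        futures = prune(futures, intention)  # each intention narrows the set
    return {device for (_, (_, device)) in futures}


print(readings_of_play([("load", "Casablanca")]))
print(readings_of_play([("load", "Beethoven's Ninth")]))
```

With no intentions adopted, Play stays ambiguous between both devices; a single Load intention is enough to leave one reading.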

6. System architecture

The functional architecture of DIAL is illustrated in Figure 5. The DIAL system is viewed as a rational agent; the architecture consists of several mental processes together with a body of supporting state information [37]. DIAL maintains state information on lectures, videos, and other data to which it has access; these correspond to the beliefs of SharedPlans. The architecture includes processes defined in the SharedPlans theory for updating a shared plan [15]. These include a select recipe process responsible for choosing a recipe template based on the current context (part of the state information). The reconcile process checks to see whether a new intention can be accommodated within the current context [3]. The Act module is responsible for translating an intention to perform some basic action into an actual command, for example, a Unix command.
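The division of labor among the select recipe, reconcile, and Act processes can be sketched as a single processing step. The state layout, the template table, and the command names in `BASIC_ACTIONS` are hypothetical (the text says only that Act produces an actual command, such as a Unix command); the sketch is meant to show the order in which the processes consult the context, not DIAL’s Perl code.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from basic actions to command names (not DIAL's).
BASIC_ACTIONS = {"See Video": ["play-video"], "See Code Example": ["show-text"]}


@dataclass
class State:
    """Hypothetical agent state: the recipe templates and the current context path."""
    templates: dict[str, list[str]]
    context: list[str] = field(default_factory=lambda: ["Top Level"])


def select_recipe(state: State) -> list[str]:
    """Choose the recipe template for the innermost act in the current context."""
    return state.templates.get(state.context[-1], [])


def reconcile(state: State, intention: str) -> bool:
    """Check whether a new intention can be accommodated in the current context."""
    return intention in select_recipe(state)


def act(intention: str, argument: str) -> list[str]:
    """Translate an intention to do a basic action into a concrete command."""
    return BASIC_ACTIONS[intention] + [argument]


def step(state: State, intention: str, argument: str = "") -> Optional[list[str]]:
    """One cycle: reconcile, then either act (basic) or deepen the context."""
    if not reconcile(state, intention):
        return None  # not licensed by the current context
    if intention in BASIC_ACTIONS:
        return act(intention, argument)
    state.context.append(intention)  # a non-basic act opens an embedded context
    return []


state = State(templates={
    "Top Level": ["Study for Exam"],
    "Study for Exam": ["Review Assignment", "See Video"],
})
print(step(state, "See Video", "sorting.mpg"))  # None: not offered at top level
print(step(state, "Study for Exam"))            # []: context deepened
print(step(state, "See Video", "sorting.mpg"))  # ['play-video', 'sorting.mpg']
```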


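[Figure 5: the DIAL agent, with processes (select recipe, reconcile, act) and state (context, recipe templates), linked through an input/output medium to the user’s display (action context and output, e.g., Lecture 17) and to lectures, assignments, bulletin boards, mail, text, video, and TAs.]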

Figure 5. DIAL functional architecture.

Correspondences between the DIAL design and elements of the SharedPlans theory were described in detail in Section 5.2. Rather than explicitly represent each of those elements in DIAL, we use the theory as a roadmap to inform our design. Since the representation language in which the theory is framed has a clear semantics, it can serve as a language through which designers can unambiguously discuss details of their designs, such as tradeoffs and levels of collaboration. The situation is not unlike that of an architect who uses a precise graphic language to describe detailed elements of a planned building; because the language is unambiguous, it can be interpreted by others, such as engineers or contractors, during the actual building phase. An ambiguous or semantically ungrounded language would make it difficult for the various developers building the structure to coordinate their efforts.

7. Sample queries and processing

To illustrate DIAL’s tracking and use of context, we will examine an example in which a student uses DIAL to study for an upcoming midterm in CS 50, the introductory programming course at Harvard. The student engages DIAL by choosing from among a list of high-level goals. In this case, she begins by choosing “Study for Exam” from the list of actions presented to her; that goal is recorded in the action context frame. The system interprets her utterance as asking that DIAL collaborate with her on locating information that will help her study for the exam. DIAL responds by displaying in the output frame a diagram of the scope of the midterm as a subset of the syllabus. By placing this information in the focus of attention, as well as making it visible and thus shared (or “mutually believed”), DIAL not only (1) indicates that it is able to work with the student (i.e., it is agreeing to participate in a SharedPlan to provide information for the student’s studying for the exam), but also (2) provides information that the student may use in deciding what information to review, and (3) makes public a content filter that it will use to constrain the information that it provides.


Figure 6. The user is in the process of choosing to see “Assignment 9: A Virtual Machine”.

In this example, the displayed scope information indicates that control flow, pointers, linked lists, and sorting are within the scope of the exam (partially shown at the bottom of Figure 6; the system actually highlights the scope in red). It also indicates that control flow consists of the subtopics conditionals and loops. The student decides she does not need to review conditionals or loops, but that she has had trouble with one of the sorting algorithms and wants to review some of the code. This particular student’s preferred method of reviewing that material involves reviewing the relevant assignments. She thus clicks the action “Review Assignment” from the list of possible next actions. The context so far is studying for the exam by way of reviewing an assignment. DIAL restricts the set of possible assignments to those related to exam topics. It thus asks the student to choose an assignment from a list consisting of assignments that are within the scope of the exam.8

The most recent assignment that includes sorting is on the implementation of sorting algorithms in MIPS. Although MIPS is not within the scope of the exam, this assignment is included in the list because it contains a sorting algorithm and sorting is on the exam.9 The student decides that she wants to review this assignment, “Assignment 9: A Virtual Machine.” The text is displayed in the Output Frame (see Figure 7).

Figure 7. The assignment is displayed.

The action context tree at this point would indicate two active SharedPlans: a high-level plan for getting information related to the exam and an embedded plan for reviewing assignment 9; the embedding indicates that the assignment review is part of a way of (i.e., a recipe for) studying for the exam. Both of these plans are in focus attentionally, as indicated by the asterisks in Figure 8. When the student asks to “View Scope,” she can receive helpful information describing the way in which these contextual factors constrain or filter the information the system takes to be relevant (Figure 8). In this case, the scope is the single topic, sorting. This is determined by considering first the scope of the assignment, which includes “Sorting, Accumulator, Register Set, MIPS Machine, and Assembly Language,” and then that of the exam; the latter adds further constraints, since the student is not only studying the last assignment but doing so in the context of studying for the exam. DIAL thus calculates the current Scope as the intersection of the embedding scope (i.e., the scope of the exam) and the scope of the assignment; “Sorting” is the only point of overlap.

The student begins review of the sorting assignment. She scrolls through the assignment until she finds the bubble sort algorithm and focuses her review on that particular part of the assignment. She decides to review lecture notes for lectures covering sorting and chooses “Review Lecture” from the list of next actions. Because the context has been narrowed to a single topic, sorting, DIAL restricts the lectures it considers to those that include material dealing with sorting, in this case Lecture Number 22. Because only one lecture is relevant in the current context, DIAL can complete its action description without prompting for a particular lecture (Figure 11 in Section 8).

Although this example appears simple, it emphasizes the benefit, in terms of collaborative support, of adopting even a partial implementation of the SharedPlans theory: adopting a collaborative perspective on the design of human-computer information systems from the start will result in a more flexible and adaptive system. This difference becomes clear when comparing this example to the web-based travel example of Section 2.

Figure 8. The user requests to view the scope of the material covered on the particular assignment. The in-scope material consists of the entries under “sorting.” These are actually highlighted in red by the system.
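The scope computations in this example reduce to set operations. The sketch below uses hypothetical scope declarations built from the topic names in the text (the declared scope of Assignment 6 and the third, recursion assignment are invented for illustration); it filters assignments by overlap with the exam scope and computes the embedded Scope as an intersection.

```python
# Hypothetical scope declarations; topic names follow the example in the text.
EXAM_SCOPE = {"Control Flow", "Conditionals", "Loops", "Pointers",
              "Linked Lists", "Sorting"}
ASSIGNMENT_SCOPES = {
    "Assignment 6: Linked Lists and Pointers": {"Linked Lists", "Pointers"},
    "Assignment 9: A Virtual Machine": {"Sorting", "Accumulator", "Register Set",
                                        "MIPS Machine", "Assembly Language"},
    "Hypothetical Assignment: Recursion": {"Recursion"},
}


def assignments_in_scope(scope: set[str]) -> list[str]:
    """Offer only assignments whose declared scope overlaps the current Scope."""
    return sorted(name for name, topics in ASSIGNMENT_SCOPES.items() if topics & scope)


def embedded_scope(embedding: set[str], own: set[str]) -> set[str]:
    """Scope of an embedded plan: its own scope intersected with the embedding one."""
    return embedding & own


print(assignments_in_scope(EXAM_SCOPE))
print(embedded_scope(EXAM_SCOPE, ASSIGNMENT_SCOPES["Assignment 9: A Virtual Machine"]))
# {'Sorting'}
```

Assignment 9 is offered even though most of its topics lie outside the exam, because one shared topic is enough for a non-empty intersection; the embedded Scope then keeps only that shared topic.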

8. Implementation

8.1. System components

The DIAL system consists of two main components: a surface communication system (DialSC) that processes user utterances and generates responses to the user by displaying information, and a server-side software agent (the DIAL agent, DialAG; see Figure 5) that manages task collaborations with the user. Numerous server-side databases provide DialAG with course-specific information. DialSC and DialAG work in conjunction: DialSC passes new information from the user to DialAG and receives output files in return (Figure 9).

The surface communication system is implemented through HTML forms accessible to users through a web browser. DialAG uses the information provided in this form to determine the appropriate response to the user and to update its intentional structure and attentional state information. DialAG then sends a new HTML form to DialSC reflecting this computation.

The DIAL agent (DialAG) is implemented as a CGI program written in Perl, which is launched each time the input form is submitted. When DialAG is launched, it reads the current state of the system, including the context, from a state variable carried in the CGI form data (which also carries the new information from the user and the intentional state), and obtains non-state-dependent information (such as recipes for actions, or lists of source files) from the databases on the server. DialAG then checks the state data for the action corresponding to the system’s interpretation of the user’s utterance. The process is summarized in Figure 10.

[Figure 9: DialSC runs in the user’s web browser, showing the action context and output (e.g., a video or Lecture 17); DialAG runs on the server as agent.cgi in the CGI-BIN directory, with access to lectures, assignments, bulletin boards, mail, text, video, and TAs. The two exchange state information, new information from the user, and output files over the Internet: (1) the user enters a request into an HTML form; (2) submitting the form launches the CGI program to process the data; (3) the user sees the output.]

Figure 9. DIAL system components. The DialSC surface communication system processes user utterances and generates responses, while the DialAG server-side agent manages task collaborations.
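The request cycle can be sketched as follows. Because each CGI invocation starts from scratch, the state has to round-trip through the form; here it is JSON encoded in URL-safe base64 and placed in a hidden field, which is an assumption, since the text does not say how DialAG serialized its Perl state variable. `handle_request` follows a simplified version of the flow in Figure 10: a chosen action runs on a copy of the state, success saves a snapshot in the action history, failure leaves the previous state in place, and backing up restores a saved snapshot. All names and the two action functions are hypothetical.

```python
import base64
import copy
import json


def encode_state(state: dict) -> str:
    """Serialize the state so it can travel in a hidden form field."""
    return base64.urlsafe_b64encode(json.dumps(state).encode("utf-8")).decode("ascii")


def decode_state(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def hidden_field(state: dict) -> str:
    # URL-safe base64 output contains no characters that need HTML escaping.
    return f'<input type="hidden" name="state" value="{encode_state(state)}">'


def handle_request(state, next_action, actions, back_to=None):
    """One CGI invocation, following a simplified version of Figure 10.

    state["history"] holds saved snapshots (the action history). actions maps
    an action name to a function that mutates a working copy of the state and
    returns (succeeded, output).
    """
    history = state["history"]
    if back_to is not None:
        # The user backs up: restore the state saved at the chosen point.
        restored = copy.deepcopy(history[back_to])
        restored["history"] = history[: back_to + 1]
        return restored, "restored"
    work = copy.deepcopy(state)
    succeeded, output = actions[next_action](work)
    if not succeeded:
        # Automatic backup: keep the previous state and report the failure.
        return state, f"failed: {next_action}"
    snapshot = {k: copy.deepcopy(v) for k, v in work.items() if k != "history"}
    work["history"] = history + [snapshot]
    return work, output


# Hypothetical actions standing in for DialAG's Perl functions.
def study_for_exam(st):
    st["context"].append("Study for Exam")
    st["scope"] = ["Pointers", "Sorting"]
    return True, "scope diagram"


def contact_ta(st):
    return False, None  # e.g., no teaching fellow could be reached


ACTIONS = {"Study for Exam": study_for_exam, "Contact TA": contact_ta}
initial = {"context": ["Top Level"], "scope": "ALL TOPICS",
           "history": [{"context": ["Top Level"], "scope": "ALL TOPICS"}]}

s1, out = handle_request(initial, "Study for Exam", ACTIONS)
s2, out2 = handle_request(s1, "Contact TA", ACTIONS)
print(out, "|", out2)
print(hidden_field(s1))
```

Running each action on a copy makes the automatic backup trivial: on failure the untouched previous state is simply sent back to the browser.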

8.2. System operation

The entry of each new user intention into the context has side effects on the Perl variable we use to represent state. Since most DialAG actions are centered around the search and retrieval of information sources, one of the most important roles of this context is to determine which sources to search. The key entries in the state variable that determine this are “Scope” and “Source Mapping.”

Figure 10. Description of the behavior of the DIAL Agent. [Flowchart. The agent gets STATE and the user’s current action or any new information from the User Interface. If the user continues the current goal by choosing a NEXT ACTION, the agent updates the ACTION HISTORY with it, calls the Perl function corresponding to it, and waits for the function to finish executing. If the action succeeded, the agent saves STATE in the current spot in the ACTION HISTORY, looks up the NEXT ACTIONS that follow the action just performed, and reports those choices to the user along with any output. If the action failed, the agent automatically backs up in the ACTION HISTORY, reports the failure to the user, and again reports the choices of NEXT ACTION from which the user last chose. If instead the user abandons the current goal and backs up to another point in the ACTION HISTORY TREE, the agent replaces the current STATE with the state saved at that point, reports the choices of NEXT ACTION (as specified in the restored STATE) that follow the action to which the user just returned, and sends the user any output.]

The first step in the development of context involves determining the scope, which corresponds to the user’s attentional state. It is represented by a list of topics (or, alternatively, a hierarchy of topics). Topics in which the user is interested are stored in the “Scope” variable; other topics are excluded. This topic list defines a subset of the course syllabus, which is itself defined as a hierarchy of topics in DialAG’s databases. The default value of Scope is, therefore, the entire syllabus, or “ALL TOPICS.” The value of Scope remains “ALL TOPICS” until an action such as “Study for Exam” or “Review Assignment” restricts it to the list of topics declared in the databases to be the scope of that exam or assignment. Next, a “Source Mapping” operation is performed. The Source Mapping is a list of actual sources (for example, a particular lecture notes file or a video file) with which the user is concerned; it is a “mapping” of topics onto sources. The Source Mapping is derived from the Scope by compiling a list of all of the available sources whose own scopes (as declared in the databases¹⁰) have non-empty intersections with the current Scope. When users choose, for example, to “Review Lecture,” they are presented with a list of all lectures in the current Source Mapping. That list might include all the lectures for the course, just the lectures on the scope of the exam, or just a single lecture file, depending on whether the system is operating at the top level, is in the context of studying for an exam, or is currently reviewing a topic that matches only one lecture file. The Source Mapping can also be modified directly by an action such as “Review Topic,” which can perform a keyword search on each source in the Source Mapping and then use the results to determine whether the source should stay in the Source Mapping.
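The Scope and Source Mapping computation just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not DIAL’s Perl implementation: the topic names, source file names, and the helpers `restrict_scope`, `source_mapping`, and `review_topic` are hypothetical, and the keyword filter is a plain substring test standing in for whatever search DIAL actually performs.

```python
# Minimal sketch of Scope and Source Mapping (hypothetical data and names).
ALL_TOPICS = frozenset({"functions", "recursion", "loops", "sorting", "searching", "mips"})

# Hypothetical declared scopes of the available sources.
SOURCE_SCOPES = {
    "lecture07.html": frozenset({"functions", "recursion"}),
    "lecture09.html": frozenset({"sorting", "mips"}),
    "lecture10.html": frozenset({"searching"}),
    "video09.mpg": frozenset({"sorting"}),
}


def restrict_scope(scope, declared_scope):
    """Narrow the Scope to the topics declared for an exam or assignment."""
    return frozenset(scope) & frozenset(declared_scope)


def source_mapping(scope):
    """Sources whose declared scope has a non-empty intersection with the Scope."""
    return sorted(src for src, topics in SOURCE_SCOPES.items() if topics & scope)


def review_topic(mapping, keyword, texts):
    """Keep only the mapped sources whose text mentions the keyword."""
    return [src for src in mapping if keyword.lower() in texts.get(src, "").lower()]


# Top level: the Scope is "ALL TOPICS", so every source is in the mapping.
scope = ALL_TOPICS
# "Study for Exam" and then "Review Assignment" each restrict the Scope further.
scope = restrict_scope(scope, {"recursion", "sorting", "mips"})  # hypothetical exam scope
scope = restrict_scope(scope, {"sorting", "mips", "loops"})      # hypothetical assignment scope
mapping = source_mapping(scope)
print(sorted(scope), mapping)  # prints: ['mips', 'sorting'] ['lecture09.html', 'video09.mpg']
```

Representing scopes as sets makes both operations in the text one-liners: restricting the Scope is an intersection, and building the Source Mapping is a non-empty-intersection test per source.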

8.2.1. Moving among contexts. DIAL assumes that successive actions stand in a relationship of successively deepening context. However, a shift in topic can occur in several ways. When the system cannot locate a new intention within the currently active branch of the context, it attempts to place the intention at a higher level; this detection of a shift is performed automatically. A shift of topic can also be signaled explicitly by a user: the user can navigate to another branch of the context and pursue a new task under a past intention. This latter mechanism allows a user to explicitly signal an interruption to the current flow; however, it requires that the user also explicitly signal the end of the interruption by navigating back to the point at which the interruption originated.

The example of Section 7 included a simple instance of DIAL automatically backing up out of a context, corresponding to the “Review Lecture” action that followed the “View Scope” action. When DIAL cannot locate a recipe template for some action, such as “View Scope,” it assumes that the action should be at the same level of context as the previous action rather than deeper in context. In fact, in order not to clutter the context tree, we do not record “View Scope” actions (since it would be meaningless to be in the context of such an action). For this reason, the “Review Lecture” action was interpreted as being in the context of “Review Assignment.” The list of actions from which the student chose “Review Lecture” was the list of next actions in the recipe for “Review Assignment.” Similarly, when the student chooses her next action after reviewing the lecture, it will again be from the list of next actions for “Review Assignment,” not from the empty list of next actions for “Review Lecture.” Her Scope and Source Mapping will accordingly be the intersection of the scopes of the exam and the assignment, before being further restricted by the “Review Lecture” action.

Automatic backup of the context can also occur when an action fails within its current context. Suppose that in the example the student, after reading over the assignment on the sorting algorithms in MIPS, decides to review the topic of search. She can choose “Review Topic” from the menu in the action input frame, as shown in Figure 11, and indicate search as the topic. Because “sorting” represents the intersection of the scopes of the exam and the assignment, the context would constrain interpretations of the “Review Topic” action to reviewing the lecture and videos on sorting, and searching falls outside that scope. Therefore, the Action History Frame would not place the “Review Topic (Searching)” action in the context of studying for the exam or reviewing the assignment. Instead, the “Review Topic (Searching)” action would be placed at the top level, parallel to “Study for Exam.”

Figure 11. Following a request to review the lecture on sorting, the user requests to review searching. DIAL realizes that this is a new topic and automatically backs up the context to a higher level.

Although DIAL will move forward or back up in context as necessary to perform the chosen actions, it is ultimately the user who is in control of navigating through the context. At any time, the user may click any action in the action history tree to return to a given point in context. If, after reading some of the lecture on MIPS assembly, the student wanted to return to what she was doing earlier, she could, for example, click the action named “Review Assignment (Assignment 9: A Virtual Machine)” in the action history tree; the assignment file would be brought back up in the Output Frame, and the Scope and Source Mapping values would be restored to their values at the time she executed the “Review Assignment” command. Notice that a user who wants to pursue a new task that does not conflict with the current context, but is intended to be a top-level task, would have to click the top-level task before entering the new task.

In summary, DIAL handles interruptions in either of two ways. If the user’s input cannot be accommodated at any position in the context, DIAL interprets that input as an interruption and positions it at the top level, indicating a new top-level topic. The user can subsequently revisit the previous topic by clicking the relevant branch of the context. Alternatively, the user can signal an interruption by manually clicking the top level of the context tree. In the future, we plan to extend DIAL so that it will return to the old context branch automatically as soon as an interruption is completed. In all these cases, we assume that the state of the old context will not be affected (for example, through the consumption of resources that might be needed in the old context) when an interruption is completed and the old context is resumed.
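The placement and revisiting behavior described in this section can be sketched with a small context tree. This is a hypothetical Python illustration, not DIAL’s implementation: it assumes that an intention can be accommodated in a context when its topics overlap that context’s saved Scope, it models “no recipe template” (as for “View Scope”) by passing no topics, and the class and method names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ContextNode:
    """One recorded intention, with the Scope saved when it was entered."""
    action: str
    scope: frozenset
    parent: Optional["ContextNode"] = field(default=None, repr=False)
    children: List["ContextNode"] = field(default_factory=list)


class ContextTree:
    def __init__(self, all_topics):
        self.root = ContextNode("TOP LEVEL", frozenset(all_topics))
        self.current = self.root

    def enter(self, action, topics=None):
        """Place a new intention in the context and make it current."""
        if topics is None:
            # Assumed stand-in for "no recipe template" (e.g. "View Scope"):
            # stay at the same level and do not record the action.
            return self.current
        wanted = frozenset(topics)
        node = self.current
        # Back up toward the top level until the intention can be accommodated,
        # i.e. until its topics overlap the candidate context's saved Scope.
        while node.parent is not None and not (wanted & node.scope):
            node = node.parent
        child = ContextNode(action, wanted & node.scope, parent=node)
        node.children.append(child)
        self.current = child
        return child

    def revisit(self, node):
        """User clicks an earlier action: restore the Scope saved there."""
        self.current = node
        return node.scope


tree = ContextTree({"recursion", "sorting", "searching", "mips", "loops"})
exam = tree.enter("Study for Exam", {"recursion", "sorting", "mips"})
assignment = tree.enter("Review Assignment", {"sorting", "loops"})
tree.enter("View Scope")                       # not recorded; context unchanged
search = tree.enter("Review Topic (Searching)", {"searching"})
print(search.parent.action)                    # prints: TOP LEVEL
print(sorted(tree.revisit(assignment)))        # prints: ['sorting']
```

Saving the Scope on each node is what makes clicking an earlier action cheap: revisiting is just a pointer move plus restoring the stored state, mirroring the “save STATE” and “restore STATE” steps of Figure 10.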

9. Increasing the flexibility of DIAL

To increase the flexibility of the system, we have developed a JavaScript program that enables a course manager to create and modify a syllabus tree. Because the course syllabus or syllabus tree (the topics and subtopics covered in the course) plays an important role in determining the embedding structure in the action context, we wanted to provide an easy way for a course manager to modify the syllabus tree interactively. The program displays a tree on the screen, as shown in Figure 12. The manager can click any topic in the tree and then choose one of the options DELETE, ADD CHILD, MOVE, COPY, or RENAME. For example, if the manager clicks FUNCTIONS followed by ADD CHILD and enters the new topic RECURSION, then RECURSION is immediately added as a child of FUNCTIONS. The response is immediate because the JavaScript program runs entirely on the client machine; CGI is used only for loading and saving trees on the server. This sort of transportability is an idea borrowed from work in natural language processing [19].

Figure 12. Topic-tree program for interactively modifying the syllabus tree.

An additional source of flexibility is failure recovery. The DIAL system is based on a theory of rational agency in which the behavior of an agent is decomposed into architectural modules. One module not yet included is a monitor [37]. A monitor module would play an important role in failure recovery: if an action performed by DIAL failed (say, an action to display some HTML file failed because the file was read-protected), the monitor module would recognize this and initiate some replanning to identify possible alternatives (perhaps displaying a video instead or giving the user some other options). We have implemented a planner to perform such low-level failure recovery within our domain but have not yet integrated it into the DIAL system.
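The five editing options of the topic-tree program described above can be sketched as operations on a simple tree. The original program is written in JavaScript and runs in the browser; this Python version only illustrates the operations, and the `Topic` class and its method names are hypothetical. As an assumed simplification not mentioned in the text, it rejects moving or copying a topic into its own subtree so that the tree stays well formed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Topic:
    """A node in a syllabus tree (hypothetical structure)."""
    name: str
    parent: Optional["Topic"] = field(default=None, repr=False)
    children: List["Topic"] = field(default_factory=list)

    def add_child(self, name):                     # ADD CHILD
        child = Topic(name, parent=self)
        self.children.append(child)
        return child

    def rename(self, new_name):                    # RENAME
        self.name = new_name

    def delete(self):                              # DELETE (the whole subtree)
        if self.parent is None:
            raise ValueError("cannot delete the root of the syllabus")
        self.parent.children.remove(self)
        self.parent = None

    def _contains(self, other):
        """True if other is this topic or one of its descendants."""
        return other is self or any(c._contains(other) for c in self.children)

    def move(self, new_parent):                    # MOVE
        if self._contains(new_parent):
            raise ValueError("cannot move a topic into its own subtree")
        self.delete()
        new_parent.children.append(self)
        self.parent = new_parent

    def copy_to(self, new_parent):                 # COPY (deep copy of the subtree)
        if self._contains(new_parent):
            raise ValueError("cannot copy a topic into its own subtree")
        clone = new_parent.add_child(self.name)
        for child in self.children:
            child.copy_to(clone)
        return clone

    def names(self):
        """Nested (name, [children]) view, e.g. for saving or display."""
        return (self.name, [c.names() for c in self.children])


syllabus = Topic("SYLLABUS")
functions = syllabus.add_child("FUNCTIONS")
loops = syllabus.add_child("LOOPS")
functions.add_child("RECURSION")   # the ADD CHILD example from the text
print(syllabus.names())
# prints: ('SYLLABUS', [('FUNCTIONS', [('RECURSION', [])]), ('LOOPS', [])])
```

Keeping a parent pointer on each node makes DELETE and MOVE constant-time list operations, which suits an interactive editor where every click should update the display immediately.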


9.1. Applying DIAL to general information access domains

In DIAL, we have replaced the master-slave metaphor, in which a user first (precisely) communicates a request that is then followed by a response, with one based on collaboration, in which a user and an application work jointly toward some goal. Such collaboration leads to a somewhat different view of communication, particularly in information-gathering tasks such as the ones with which DIAL has been concerned or those prevalent on the web. In such tasks, the fruit of collaboration is the establishment of what we will call an access perspective. An access perspective reflects the inherently multifaceted way in which information at one location may be accessed from another, depending on the intended use of that information and the required level of detail. Each such specialized method for accessing information leads to the establishment of a unique access perspective, in our case between a user and an information store. The same request executed from two distinct perspectives may result in the retrieval of different pieces of information. Access perspectives are dynamic entities that are informed by the structure of the user’s activity; conventional, nonadaptive user interfaces, by contrast, characteristically provide a static access method. In DIAL, the active branch of the context tree establishes an access perspective through which all information requests and responses pass. As a user explores a particular branch of a context tree, the context becomes more and more constrained, and the access perspective becomes correspondingly narrower. The software responsible for establishing an access perspective may comprise many processes; it can also be distributed, as is the case in DIAL.

One could imagine structuring information searches on the web in the same way. For example, suppose someone was trying to get information on buying a car. The context, and hence the search, might be developed incrementally in the following way (such a style of interaction could easily be supported by DIAL in its current form):

Find autos
    Sales
        Used
            Location=MA
                [interruption]
                Check maps
                Call payment calculator
                [end interruption]

Here, the user first looks at all the web sites involving automobiles, then focuses on sales of used cars, and finally on car sales in the Massachusetts area. As shown, the search might be interrupted by a call to a map database to locate the auto dealerships listed, or to payment calculator software to determine the financial impact. As mentioned earlier, DIAL does not currently provide a mechanism for automatically detecting such an interruption and then returning to the old context; however, the change in context could be made explicit, as in the current DIAL system.
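The car-buying example can be sketched as a search context built by progressive refinement, with an explicitly signaled interruption, in keeping with the current DIAL system’s explicit handling of context changes. This is a hypothetical Python illustration: the `SearchContext` class, its methods, and the conjunctive “AND” query string are assumptions for the example and do not correspond to DIAL’s code or to any particular search engine’s query syntax. It also assumes, as a simplification, that an interruption starts from an empty context.

```python
class SearchContext:
    """Hypothetical sketch: a search context as a stack of refinements."""

    def __init__(self):
        self.refinements = []   # active constraints, broadest first
        self.suspended = []     # contexts saved by explicit interruptions

    def refine(self, constraint):
        """Deepen the context; each refinement narrows the access perspective."""
        self.refinements.append(constraint)

    def back_up(self):
        """Drop the most specific refinement (move up one level of context)."""
        return self.refinements.pop()

    def query(self):
        """Combine all active refinements into one conjunctive query string."""
        return " AND ".join(self.refinements)

    def interrupt(self):
        """Explicitly start an interruption with a fresh, empty context."""
        self.suspended.append(self.refinements)
        self.refinements = []

    def end_interruption(self):
        """Explicitly end the interruption and restore the suspended context."""
        self.refinements = self.suspended.pop()


ctx = SearchContext()
for step in ["autos", "sales", "used", "location=MA"]:
    ctx.refine(step)
print(ctx.query())   # prints: autos AND sales AND used AND location=MA
ctx.interrupt()
ctx.refine("maps")
print(ctx.query())   # prints: maps
ctx.end_interruption()
print(ctx.query())   # prints: autos AND sales AND used AND location=MA
```

The user never types the full query; each step adds one constraint, and the accumulated context supplies the rest, which is the sense in which the access perspective narrows as the user moves down a branch.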


9.2. Applying DIAL to other domains

The distance learning application we have described is ideal for the ideas on collaboration put forward in this paper, not only because significant information in the domain was available electronically, but also because we identified and represented many uses to which that information could be applied. The theories of dialogue and discourse processing that underlie this work require that systems model the tasks being done.¹¹ We thus wanted an application that exhibited a richer task structure than the information retrieval applications investigated by others attempting to adapt dialogue techniques [47]. However, to demonstrate the general usefulness of the approach we have taken, we also wanted to minimize the need for extensive, detailed task modeling. The complex activities that need to be modeled for this application (exam preparation, review of missed lectures, topic-based study, creation of course materials) are more generic than the travel-planning task others have explored [42]; that is, they require less domain-specific knowledge. In particular, this application does not require detailed modeling of the content of the information (i.e., computer science) or of a user’s knowledge of it.

DIAL represents a prototypical application of the ideas of collaboration put forward in this paper. DIAL has the following characteristics, and the technology underlying DIAL can apply to any domain sharing them.

— Corresponding information is available in several forms (for example, textbooks, videos, and notes), and the correspondences among these forms can be represented.

— The information in the domain of application can be structured hierarchically. This hierarchical structure (in the case of DIAL, the topic hierarchy) is used to structure intentional context and to simplify information searches.

— The application has a naturally hierarchical task structure.

10. Related work

DIAL is related to three strands of prior research: alternative models of team behavior, applications of discourse theories to the development of natural-language interfaces, and alternative applications of SharedPlans to human-computer interaction. This section reviews work that addresses issues similar to those of the DIAL effort and compares it with DIAL.

We are not familiar with applications of either of the two main alternatives to the SharedPlans model of collaborative activity, the Joint Intentions Model [5] or the Planned Team Activity model, to the sort of user interface we have described in this paper. However, the Joint Intentions Model has been applied to natural language dialogue systems [7]. Some of the differences between these models and SharedPlans are relevant to the DIAL application.

In contrast to the SharedPlans model, the Joint Intentions Model of [5] makes use of the idea of a joint, irreducible intention; partiality of plan representation is not addressed in that model. The Planned Team Activity Model also assumes a fixed set of mutually known recipes. The DIAL application, and similar efforts in human-computer collaborative interface design, require an ability to model partial plan states during the course of joint problem solving. DIAL accomplishes this by recording intentional structure, which is dynamic and can be incomplete, and through the incremental and joint construction of recipes. DIAL’s use of intentional structure and recipes is derived from the application of SharedPlans to discourse understanding.

In both the Joint Intentions and the Planned Team Activity models, communication points are built in. In SharedPlans, when an intention is dropped, communication is only one of several options that an agent can take. We expect that this difference will be significant when we further consider failure recovery.

The Artimis system directly implements a logical theory of rational interaction to develop a natural language dialogue system [44]. It is thus an example of a type 1 use of an agent theory, as described in Section 5.2. DIAL and Artimis also differ in focus: DIAL does not attempt to solve the natural language interpretation problem; rather, it draws on theories of collaboration that have been applied to the development of natural language dialogue systems. One result of these two differences is a contrast in the kind of domain information each system requires. DIAL requires information about such tasks as studying for exams, doing assignments, and acquiring information; however, it does not need information about the specific application domain (e.g., about chemistry or computer science). In contrast, the Artimis system requires detailed knowledge of its domain in order to demonstrate its collaborative features.

The Shoptalk [6, 8], Choris [6], and ALFresco [48] systems were early efforts aimed at incorporating the attentional state and linguistic structure elements of discourse theory [16, 22] in human-computer interfaces that combined natural language and graphics.

Shoptalk allowed a user to query a simulation through a combination of natural language and graphics. It included an explicit display of the “context tree,” which the user was allowed to manipulate. Although similar in spirit to DIAL’s context tree, the tree in Shoptalk was built directly from past utterances rather than from intentional structure. Discourse context was used to help resolve certain kinds of referring expressions. The Choris system used a technique similar to that of Shoptalk.

ALFresco’s “topic module” incorporated elements of attentional state that were correlated with linguistic structure. It used this state information in the interpretation of certain referring expressions. One important contribution was the way in which linguistic and deictic information were combined to resolve ambiguities.

DIAL differs significantly from these systems in its use of intentional structure in addition to attentional state, an advance made possible by Lochbaum’s augmentation of discourse structure theory [16, 30, 31]. By structuring the context tree around intentions, DIAL enables a user to manipulate the purposes of the conversation rather than the utterances that were made. A DIAL user can manipulate and revisit intentions, not sentences. A context tree based on intentional structure allows more flexibility in choosing alternatives, determining knowledge preconditions (to ILPs, for example), and recovering from failures.


DIAL’s use of three other aspects of the SharedPlans theory of collaboration also distinguishes it from these earlier uses of discourse theory. The SharedPlans theory stipulates that agents need to know the details only of the subactions in which they participate. Furthermore, the theory explicitly supports collaboration among more than two individuals, not just two-party collaboration. One surface manifestation of this difference is that DIAL’s context tree represents joint activities, whereas the Shoptalk context tree reflects the user’s individual activities (in the form of questions posed to the system). Finally, although intentions-that are not represented explicitly in DIAL, they are captured implicitly in the DIAL architecture: the system derives helpful behavior when necessary.

Lochbaum [32], following Grosz and Sidner’s argument that discourse is a collaborative activity [18], showed that SharedPlans provides an appropriate basis on which to build the intentional structure of discourse. This use of SharedPlans for intentional structure addressed a major open question in discourse structure theory [16, 22]. In particular, Lochbaum showed a correspondence between (1) an intention-that relative to a shared plan and an intention-that a subsidiary shared plan succeed, and (2) the intentional structure of dialogue. One of her central contributions was her analysis of knowledge precondition dialogues (Section 4.1). Lochbaum demonstrated the use of SharedPlans in a natural-language dialogue system.

Rich and Sidner [42, 43], building directly on the SharedPlans formalization and Lochbaum’s use of it for the intentional structure of discourse, have designed a tool called COLLAGEN for developing multimedia interfaces to application software. COLLAGEN provides mixed-initiative assistance to a user and maintains a hierarchically structured interaction history similar to DIAL’s action context. COLLAGEN is designed to mediate interactions between a user and a software interface agent; it acts as a sort of discourse manager and also keeps track of the collaborative plans of the participants. COLLAGEN maintains a library of recipes, each represented as a partially ordered sequence of steps with associated constraints.

DIAL and COLLAGEN differ in the way in which they apply SharedPlans and discourse theory and in the implementation of particular elements of the theory.

First, COLLAGEN is a direct adaptation of Lochbaum’s work to dialogue in a different modality. Like Lochbaum’s work, it focuses on interpreting utterances (though in COLLAGEN’s case, utterances in a combination of formal language and deixis) and makes use of the SharedPlans theory explicitly (thus being a type 2 use of theory), whereas we use the theory implicitly to inform the design of DIAL (a type 3 use). Furthermore, in DIAL, recipes are constructed jointly and incrementally through recipe templates, whereas COLLAGEN makes use of predefined task recipes. These two differences mean that plan recognition is unnecessary for DIAL: the templates specify the action options at any choice point. In contrast, COLLAGEN requires methods of plan recognition, and the project has thus explored them [26].

Both COLLAGEN and DIAL allow a user to revisit a context. COLLAGEN requires that the user reference a previous act or subact, while in DIAL, context entries are active objects that can be revisited by clicking them. We chose the latter approach because it hides details of the underlying implementation (the action representation and recipes) from users. On the other hand, COLLAGEN has more sophisticated interruption mechanisms for explicitly starting and ending a new segment [41].

Unlike COLLAGEN’s segmented history, DIAL does not record in its context actions that are part of the decomposition of a task but are not themselves collaborative (that is, actions done individually by either the user or the system). As a result, DIAL contexts are more compact and focus on the joint activities.

Finally, DIAL has several features not in COLLAGEN. DIAL makes use of the hierarchical structure of domain information, through the topic hierarchy, to structure context, a feature useful for adaptability, as discussed in Section 9. DIAL also uses attentional state to infer referents to objects (e.g., identifying the assignment relevant to a user’s request), a particularly useful feature in the information access domain, where users may not know such information. In addition, DIAL’s use of recipe templates leads to a clean semantics for commands in context, based on constrained possible futures.

Other efforts that combine discourse theories or agent technologies in the design of human-computer interfaces include work using speech act theory, rhetorical structure theory, and models of attentional state.

The application of SharedPlans described in this paper resembles Winograd and Flores’s [52] use of speech act theory [45] for systems that support cooperative work by teams of people. However, SharedPlans provides a model for action at the level of whole segments of discourse comprising multiple utterances, not just actions at the single-utterance level. Furthermore, the SharedPlans theory enables modeling of a wider range of collaborative situations; in particular, it allows for the full range of organizational structures among participants.

Stein and Maier [47] combine speech act theory [45] and dialogue scripts based on Rhetorical Structure Theory [33] to structure information retrieval dialogues (e.g., for search and retrieval in databases). The major difference from our work is foundational: the SharedPlans theory obviates the need for many of the speech acts and discourse relations they use to structure dialogues, whereas their approach might require the definition of additional discourse relations or speech acts to accommodate new types of discourse. Just as SharedPlans have enabled simpler explanations of discourse structure for natural language dialogues [29, 31], the integrated treatment they allow of actions at the utterance and discourse-segment levels simplifies the discourse structures needed for dialogue-oriented multimedia interfaces.

Other information access systems that maintain a context of user activities include Letizia [27]. That system assists a user in locating information on the web by tracking the user’s behavior and attempting to anticipate items that might also be of interest. It accomplishes this by conducting a parallel web search based on heuristics used to infer user interests. As discussed in Section 9, we expect that the approach we have taken in DIAL will prove very useful in the development of systems that guide a user in applications like this. Our system differs from Letizia, however, in the way in which we represent context: by recording intentional and attentional state.

A different but somewhat related area of work is explanation generation. For example, Mittal and Moore [34] describe a method for creating useful menu systems that can serve as embedded help systems and that also allow users to ask follow-up or elaboration questions. The authors describe a system, developed using this method, that maintains a context of a request and also reasons about explanations. The system uses a plan-based model of text generation and records the reasoning behind the planning process in order to understand the user request.

11. Future work

We are currently exploring extensions that would allow a maintainer of the system to easily specify course-specific recipes. For example, other types of courses, such as chemistry or physics courses that include laboratories, might require area-specific recipes. Students might also want to customize the system with their own specialized recipes: for example, a student might specify a typical study sequence or a typical sequence for preparing an assignment (say, viewing a particular video, then reading some section of notes, and then answering a set of questions). We are also looking at ways to extend the software for transporting DIAL so that a user can modify such recipe knowledge. An important part of this work would involve systematic studies to incorporate user feedback on the design and use of DIAL.

We plan to investigate ways of adding interruption handling to DIAL while keeping the design and operation fairly intuitive. This may involve incorporating elements of the theory more explicitly into the design. Although we have tried to keep the recipe representation as simple as possible, there are applications in which a richer representation would be useful. For example, a representation of loops would be needed to handle a case in which a user wanted to look through each assignment until a particular piece of information was found. Finally, we have begun to investigate the challenging problem of learning user-specific recipes.

12. Contributions and lessons learned

We have shown how a theory of rational, collaborative behavior, SharedPlans, can inform the design of information access systems. To demonstrate and test our claims, we have developed DIAL, a collaborative tool for distributed information access. In DIAL, a set of user utterances is viewed as a discourse, in a formal language, in which contextual information is used to disambiguate inputs. Keeping track of contextual information simplifies users’ tasks. Users need not express information access requests in precise detail; instead, the intentional and attentional contexts serve to usefully constrain the set of possible interpretations of each utterance. These contexts can be constructed through knowledge of a set of fairly generic task recipes together with a commitment on the part of the system that users be able to succeed in their requests. A context also naturally reflects the goals currently pursued by the user, and interactions taking place within such an environment will therefore tend to be more focused and less tedious.

The use of context-sensitive menu systems is certainly not new in the field of user interfaces; our intention has not been to propose a new form of interface at the surface level. Indeed, the menu systems that we have made use of in DIAL are not difficult to implement. The central lesson to be learned from our work is a new methodology for developing collaborative human-computer information systems that models “what the agent is doing.” The SharedPlans theory can serve as a blueprint for designers wanting to add collaborative elements to their systems in a principled way. In the case of menu-based systems, for example, we have proposed a new semantics that is based on the intentional structure of the dialogue and, derivatively, on the underlying task structure.¹² This has an important consequence: the sort of collaborative behavior modeled in the theory need not be acquired at the cost of implementing the theory directly. As a guideline for designers, we have pointed out explicit correspondences between the theory and the design.

This has led to a new and more intuitive method of accessing web-resident information. DIAL supports information searches based on a progressive refinement of goal context rather than on the one-shot query mechanisms of standard web search engines, in which a user must input a query that exactly reflects the user’s goals and is also formed in a way that minimizes the number of hits. In particular, the use of both intentional structure and attentional state simplifies the user’s query task. Finally, DIAL and the access method it supports represent a unique tool for distance learning environments.

13. Acknowledgments

This work was supported by National Science Foundation Grants No. IRI 95-25915, 96-18848, and CDA 94-01024.

We thank Nathan Scales, who worked on the original implementation, and also the anonymous reviewers for their comments and suggestions.

Notes

1. Distance learning refers to learning that utilizes and deploys a distributed computational environment so that students who are geographically dispersed may learn from a common source.

2. Notice that the system or user may each individually use additional recipe information not explicitly represented or displayed: for example, the method chosen to display a video. Since these elements are part of individual plans, mutual belief is not required.

3. Grosz and Kraus [14] do not advocate this use, but only type (2) use.

4. Section 9 discusses a separately implemented subcomponent that performs some failure recovery but has not yet been integrated into DIAL. One use of such a capability is to support recovery if one of DIAL’s default beliefs is defeated.

5. Intentions-that can themselves lead to helpful actions [15, 37]. Furthermore, it is possible to define formally the notion of one agent helping another in some task [37].

6. Such additional descriptive information plays a role in references to events [9, 11].

7. Note that this is a purely semantic interpretation in terms of possible futures or worlds, not a suggested method of implementation.

8. More precisely, as will be discussed later, assignments with declared scopes that overlap with the scope of the exam.

9. DIAL’s ability to constrain lists is limited by the amount of structural information it has. It will perform better with more information, but can operate even with only a little.

10. As a first step toward simplifying the process of setting up such databases, we have developed software, described in Section 9, to record scope information.


11. We emphasize that task models are distinct from domain models. The financial domain, for example, includes such concepts as money, credit, and accounts. The bank-teller task operates in that domain, as do stock-broker and bank-presidency tasks. Bank-teller dialogues have a structure reflective of the bank teller’s task and thus quite different from that of bank-president dialogues, even though the (domain) concepts discussed may overlap considerably.

12. One way of viewing this is as placing constraints on communication.

References

1. C. T. Balkanski, “Modeling Act-type Relations in Collaborative Activity,” Technical report, Harvard University, 1990.

2. C. T. Balkanski, Actions, Beliefs, and Intentions in Multi-Action Utterances, Ph.D. dissertation, Harvard University, 1993.

3. M. E. Bratman, D. J. Israel, and M. E. Pollack, “Plans and resource-bounded practical reasoning,” Computational Intelligence, vol. 4, pp. 349–355, 1988.

4. M. Bratman, Intentions, Plans, and Practical Reason, Harvard University Press, 1987.

5. P. Cohen and H. Levesque, “Teamwork,” Nous, vol. 25, pp. 487–512, 1991.

6. P. R. Cohen, M. E. Dalrymple, D. B. Moran, F. C. N. Pereira, J. Sullivan, S. Tyler, J. Gargan, and J. Schlossberg, Synergistic Use of Direct Manipulation and Natural Language, Morgan Kaufmann Publishers, Inc., 1998.

7. P. R. Cohen, H. J. Levesque, J. H. T. Nunes, and S. L. Oviatt, “Task-oriented dialogue as a consequence of joint activity,” in Proceedings of the Pacific Rim International Conference on Artificial Intelligence, 1990, pp. 203–208.

8. P. R. Cohen, “Integrated interfaces for decision-support with simulation,” in Proceedings of the Winter Simulation Conference, Association for Computing Machinery, 1991.

9. D. Davidson, “The logical form of action sentences,” in Actions and Events. Clarendon Press, pp. 105–148. (Originally published in The Logic of Decision and Action, N. Rescher, ed.), University of Pittsburgh Press, 1967.

10. B. Di Eugenio, Understanding Natural Language Instructions: A Computational Approach to Purpose Clauses, University of Pennsylvania, 1993. Ph.D. Dissertation.

11. A. Goldman, A Theory of Human Action, Princeton University Press, 1970.

12. I. Greif, “Desktop agents in group-enabled products,” Communications of the ACM, vol. 37, no. 7, pp. 100–105, 1994.

13. B. Grosz and R. Davis, “A report to ARPA on twenty-first century intelligent systems,” AI Magazine pp. 10–20, 1994.

14. B. J. Grosz and S. Kraus, “Collaborative plans for complex group action,” Artificial Intelligence vol. 86, no. 1, pp. 269–357, 1996.

15. B. J. Grosz and S. Kraus, “The evolution of SharedPlans,” in A. Rao and M. Wooldridge (eds.), Foundations and Theories of Rational Agency, 1998.

16. B. J. Grosz and C. Sidner, “Attention, intentions, and the structure of discourse,” Computational Linguistics, vol. 12, no. 3, pp. 175–204, 1986.

17. B. Grosz and C. Sidner, “Plans for discourse,” in P. Cohen, J. Morgan, and M. Pollack (eds.), Intentions in Communication. Bradford Books/MIT Press: Cambridge, MA, 1990a, pp. 417–444.

18. B. J. Grosz and C. L. Sidner, Plans for Discourse. MIT Press: Cambridge, MA, 1990b, ch. 20, pp. 417–444.

19. B. J. Grosz, D. Appelt, P. Martin, and F. Pereira, “TEAM: An experiment in the design of transportable natural-language interfaces,” Artificial Intelligence vol. 32, pp. 173–244, 1987.

20. B. Grosz, M. Pollack, and C. Sidner, “Discourse,” in M. Posner (ed.), Foundations of Cognitive Science. MIT Press: Cambridge, MA, 1989.

21. B. J. Grosz, L. Hunsberger, and S. Kraus, “Planning and acting together,” AI Magazine pp. 23–34, 1999.

22. B. J. Grosz, “Discourse analysis,” in Understanding Spoken Language. Elsevier North-Holland, ch. IX, pp. 235–268, 1978.


23. B. Grosz, “Collaborative systems: 1994 AAAI presidential address,” AI Magazine vol. 17, no. 2, pp. 67–85, 1996.

24. M. Hadad, “Using SharedPlan model in electronic commerce environment,” Bar Ilan University, 1997. Master’s thesis. Technical report.

25. IITA Task Group, Information Infrastructure Technology and Applications. Office of Science and Technology Policy, 1994.

26. N. Lesh, C. Rich, and C. L. Sidner, “Using plan recognition in human-computer collaboration,” in Seventh International Conference on User Modeling, 1999.

27. H. Lieberman, “Letizia: An agent that assists web browsing,” in Proceedings of the International Joint Conference on Artificial Intelligence, 1995.

28. K. Lochbaum, B. Grosz, and C. Sidner, “Models of plans to support communication: An initial report,” in Proceedings of the 8th National Conference on Artificial Intelligence (AAAI-90), MIT Press: Cambridge, MA, pp. 485–490, 1990.

29. K. Lochbaum, “A model of plans to support inter-agent communication,” in AAAI-94 Workshop on Planning for Inter-Agent Communication, American Association for Artificial Intelligence, 1994a.

30. K. Lochbaum, “Using Collaborative Plans to Model the Intentional Structure of Discourse,” Harvard University, Ph.D. dissertation, 1994b. Available as Tech Report TR-25-94.

31. K. E. Lochbaum, “The use of knowledge preconditions in language processing,” in C. S. Mellish (ed.), Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), Morgan Kaufmann Publishers, Inc., San Mateo, CA, vol. 2, pp. 1260–1266, 1995.

32. K. E. Lochbaum, “A collaborative planning model of intentional structure,” Computational Linguistics vol. 24, no. 4, pp. 525–572, 1998.

33. W. Mann and S. Thompson, Rhetorical Structure Theory, Ablex, pp. 279–300, 1987.

34. B. O. Mittal and J. D. Moore, “Dynamic generation of follow up question menus: Facilitating interactive natural language dialogues,” in Proceedings of the ACM CHI95, 1995.

35. NRC-CSTB, More than Screen Deep: Toward Every-Citizen Interfaces to the Nation’s Information Infrastructure. National Academy Press, 1997.

36. C. L. Ortiz, “A commonsense language for reasoning about causation and rational action,” Artificial Intelligence vol. 111, pp. 73–130, 1999a.

37. C. L. Ortiz, “Introspective and elaborative processes in rational agents,” Annals of Mathematics and Artificial Intelligence vol. 25, no. 1–2, pp. 1–34, 1999b.

38. H. Pasula, “Design of a collaborative planning system,” Harvard University. Technical report, Senior Honors Thesis, 1996.

39. M. Pollack, Inferring Domain Plans in Question-Answering. University of Pennsylvania, Ph.D. dissertation, 1986.

40. M. Pollack, Plans as Complex Mental Attitudes. MIT Press, pp. 77–103, 1990.

41. C. Rich and C. L. Sidner, “Segmented interaction history in a collaborative interface agent,” in Proceedings of the Third International Conference on Intelligent User Interfaces, 1997a.

42. C. Rich and C. L. Sidner, “When agents collaborate with people,” in First International Conference on Autonomous Agents, pp. 284–291, 1997b.

43. C. Rich and C. L. Sidner, “COLLAGEN: A collaboration manager for software interface agents,” User Modeling and User-Adapted Interaction vol. 8, no. 3/4, pp. 315–350, 1998.

44. D. Sadek, “Design considerations on dialogue systems: From theory to technology (the case of Artimis),” in Proceedings of the ESCA Interactive Dialogue in Multi-Modal Systems Workshop, 1999.

45. J. R. Searle, Speech Acts: An Essay in the Philosophy of Language, Cambridge University Press, 1969.

46. C. L. Sidner, “What the speaker means: The recognition of speaker’s plans in discourse,” International Journal of Computers and Mathematics vol. 9, pp. 71–82, 1983.

47. A. Stein and E. Maier, “Structuring collaborative information-seeking dialogues,” Knowledge-Based Systems vol. 8, no. 2–3, pp. 82–93, 1995. Special Issue: Human-Computer Collaboration.

48. O. Stock, G. Carenini, F. Cecconi, E. Franconi, A. Lavelli, B. Magnini, F. Pianesi, M. Ponzi, V. Samek-Lodovici, and C. Strapparava, “Alfresco: Enjoying the combination of natural language processing and hypermedia for information exploration,” in M. T. Maybury (ed.), Intelligent Multimedia Interfaces. The MIT Press, pp. 197–224, 1993.

49. M. Tambe, “Towards flexible teamwork,” Journal of Artificial Intelligence Research vol. 7, pp. 83–124, 1997.


50. L. Terveen, “Overview of human-computer collaboration,” Knowledge-Based Systems vol. 8, no. 2–3, pp. 67–81, 1995. Special Issue: Human-Computer Collaboration.

51. D. Weld, “The role of intelligent systems in the national information infrastructure,” AI Magazine vol. 16, no. 3, 1995.

52. T. Winograd and F. Flores, Understanding Computers and Cognition: A New Foundation for Design. Ablex, 1986.
