Instructible agents: Software that just keeps getting better

by H. Lieberman and D. Maulsby


©Copyright 1996 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.


Agent software is a topic of growing interest to users and developers in the computer industry. Already, agents and wizards help users automate tasks such as editing and searching for information. But just as we expect human assistants to learn as we work with them, we will also come to expect our computer agents to learn from us. This paper explores the idea of an instructible agent that can learn both from examples and from advice. To understand design issues and languages for human-agent communication, we first describe an experiment that simulates the behavior of such an agent. Then we describe some implemented and ongoing instructible agent projects in text and graphic editing, World Wide Web browsing, and virtual reality. Finally, we analyze the trade-offs involved in agent software and argue that instructible agents represent a "sweet spot" in the trade-off between convenience and control.

The idea of intelligent agent software has recently captured the popular imagination. From Apple Computer's Knowledge Navigator** video to Microsoft Corporation's Bob**, to the recent explosion of personalized search services on the World Wide Web, the idea of intelligent software that performs the role of a human assistant is being explored in a wide range of applications.

Indeed, there is a growing realization that such software is not only desirable, but necessary. The complexity of computer interfaces has grown rapidly in recent years. The prevalent view has been that a computer interface is a set of tools, each tool performing one function represented by an icon or menu selection. If we continue to add to the set of tools, we will simply run out of screen space, not to mention user tolerance. We must move from a view of the computer as a crowded toolbox to a view of the computer as an agency—one made up of a group of agents, each with its own capabilities and expertise, yet capable of working together.

Conspicuously rare in today's agent software is a capability considered essential in human agents: they learn. To be truly helpful, an assistant must learn over time, through interacting with the user and its environment—otherwise, it will only repeat its mistakes. Learning is essential for a software agent; no software developer can anticipate the needs of all users. Moreover, each person's needs change from day to day, and knowledge learned from experience can lead to a better understanding of how to solve problems in the future.

Most current agent software and most of the proposed scenarios for future agent software have an agent acting to meet a user's goals, but ignore the problem of how the agent comes to learn those goals or actions. We are interested not only in how to build a smart agent from scratch, but also in how an agent can become just a little bit smarter every time it interacts with the user. The question is: "What is the best way for an agent to learn?"

We have an answer: We think agents should learn by example. The reason for this is simple. If we are striving to make a software agent act like a human assistant, we should make the communication between user and agent as human-like as we can—not necessarily personal, but intuitive and flexible. So the question of how an agent might learn becomes: "What are the best ways for people to teach and agents to learn?"

Agents might learn by being given explicit instructions, but giving instructions is much like conventional computer programming, and we know that programming is difficult for many people. However, people are extraordinarily good at learning from observing and understanding examples, especially examples in the context of real work. So we can take advantage of that ability and design user-agent interaction around the activity of demonstrating, recording, and generalizing examples. Do you want your agent to solve a new kind of problem? Show it how.

If an agent learns by watching the user work on concrete examples, by listening to the user's comments, and by asking questions, we call it an instructible agent. Work on instructible agents is a synthesis of many disparate fields of artificial intelligence and human-computer interaction: machine learning, knowledge acquisition, program synthesis, user modeling, direct-manipulation interfaces, and visual interface design. Specific techniques such as constraints, natural language processing, or statistical methods might come into play in different kinds of instructible interfaces.

Much of the work on instructible agents has appeared under an interdisciplinary topic called programming by example or programming by demonstration.1 The use of the word "programming" here may be misleading; while most of these systems do learn procedures, the process by which they learn and how it appears to the user is typically very different from what we currently think of as computer programming. It is more like the interaction between a teacher and a student, or a coach and a trainee—where the user is the coach and the computer is the trainee.

Instructible interface agents are embedded in direct-manipulation interfaces like those in common use today: text editors, graphical editors, spreadsheets, and the like. In the near term, instructible agents will work within the framework of familiar applications. A user can choose to ignore the agent and operate the application as usual. However, the agent can, either on its own or at the user's behest, make suggestions or automate actions.

The central problem for such agents is to infer what the user wants from observing the user's actions in the interface. There are two approaches, both explored in this paper.

The first approach is to design an interface in which actions express their intent. For example, the Emacs text editor provides many navigation commands, such as "forward word" and "forward line," that express movement and selection relative to text structures. But interfaces like Emacs based on typed commands are not for everyone. In several of the learning systems we describe, the user can point to an object and say, "This is an example." The user thereby directs the system to generalize operations involving the example object, so that an analogous procedure can be performed on another similar object in the future. The user may annotate the example, specifying the features to be adopted in the generalization.

The other, more ambitious approach is to let the system decide for itself which aspects of the demonstration are relevant. This reduces the tedium of explaining examples. As with a human agent, the computer is given the opportunity to make mistakes. The agent communicates with the user and learns by correcting its mistakes.

These two approaches—direct instruction and inductive inference—are complementary and have been used together in some systems, such as Cima.2

Figure 1 The Turvy experimental setup. The user instructs Turvy ("Turvy, I want you to take titles that are not bold, and put them in quotes"), and Turvy, played by a hidden researcher, replies ("Show me, please!").


Instruction has also been combined with deductive inference (i.e., plan recognition) in the Instructo-Soar system,3 which enables the user to supply missing knowledge about planning operators and objects whenever Soar fails to explain an action.

Instructible agents: Issues, experiments, and systems

In this paper, we explore a variety of scenarios and techniques for building instructible agents. We show examples from learning agents we have built for interactive applications such as graphical editors and text editors. We show a variety of interaction styles, from instructing through direct-manipulation menu selection and mouse clicks to natural language input, graphical annotation, and visual observation. We also discuss scenarios for future agents in virtual-reality worlds.

We begin by describing an experiment, Turvy, which simulates user-agent interaction in a text editor, with a human playing the role of the agent. The goal is to understand what kind of vocabulary could serve as a means of discourse between a user and an instructible agent. We use that experience to develop a set of general criteria for agent-user interaction. A system that implements some of this agent's learning ability, Cima, is then described.

We move on to Mondrian, an agent that learns new graphical procedures, both by demonstration and by graphical annotation of video. The agent describes what it learns through feedback in pictorial and natural language. It accepts the user's advice about generalization via speech input. Mondrian illustrates how a media-rich, multimodal interface can enhance the experience of teaching instructible agents. Mondrian represents an approach where the user understands that the purpose of the interface actions is explicitly to teach the agent.

Another approach, which relies less on explicit instruction and more on observing the user's behavior, is represented by Letizia,4 an agent that assists a user in browsing the Web. Letizia also illustrates continuous, incremental learning interleaved with performance.

Finally, we discuss a scenario of work in progress for an instructible agent, ACT, that learns new behavior for animated characters in a virtual reality environment. Here both the user and the learner are represented by characters in the virtual reality world ("avatars") and observation, advice, and explicit instruction play an integrated role.

How should the user and the agent communicate?

To investigate possibilities for how a user might communicate with an intelligent computer agent, a user study5 was conducted, in which a researcher, hidden behind a screen, pretended to be an agent that learns word-processing tasks. Figure 1 illustrates the setup: The researcher playing the role of the agent, seated at right, is known to the user as "Turvy."

Example: Formatting a bibliography. Figure 2 illustrates the example task: making citation headings for bibliography entries, using the primary author's surname and the last two digits of the year of publication.

Figure 2 Sample data from the "citation headings" task

Before editing:

Philip E. Agre: The dynamic structure of everyday life: PhD thesis: MIT: 1988.

J. H. Andreae, B. A. MacDonald: Robot programming for non-experts: Journal IEEE SMC: July 1990.

Kurt van Lehn: Mind bugs: the origins of procedural misconceptions: MIT Press: 1990.

After putting author and date into heading:

[Agre 88] Philip E. Agre: The dynamic structure of everyday life: PhD thesis: MIT: 1988.

[Andreae 90] J. H. Andreae, B. A. MacDonald: Robot programming for non-experts: Journal IEEE SMC: July 1990.

[van Lehn 90] Kurt van Lehn: Mind bugs: the origins of procedural misconceptions: MIT Press: 1990.


When first introduced to Turvy, users are told only that its goal is to help them perform repetitive tasks, that it watches what they do and understands "a bit of English." Users soon learn that Turvy knows nothing about bibliographies or proper names or dates. Instead, it must learn how to recognize them. They come in many varieties: a primary author's surname is the capitalized word before the first comma, unless there is only one author, in which case it precedes a colon. Sometimes it precedes "(ed)" and sometimes it has a baronial prefix like "van."
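To get a feel for what Turvy must induce, the rule can be written out by hand. The following Python sketch is ours, not part of the original study; the function name and the treatment of lowercase prefixes are illustrative assumptions.

    import re

    def primary_surname(entry: str) -> str:
        """Guess the primary author's surname in a bibliography entry:
        the word before the first comma (or the first colon, when there
        is a single author), keeping a lowercase prefix such as "van"."""
        match = re.search(r"[,:]", entry)
        author_field = entry[:match.start()] if match else entry
        words = author_field.replace("(ed)", "").split()
        if not words:
            return ""
        surname = words[-1]
        # Keep a baronial prefix if one immediately precedes the surname.
        if len(words) >= 2 and words[-2].islower():
            surname = words[-2] + " " + surname
        return surname

    print(primary_surname("Philip E. Agre: The dynamic structure of everyday life: PhD thesis: MIT: 1988."))  # Agre
    print(primary_surname("Kurt van Lehn: Mind bugs: MIT Press: 1990."))  # van Lehn

The point of Turvy, of course, is that users never write such a rule; the agent must induce an equivalent one from a few examples and hints.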

To perform this task manually on the 18 bibliography items, the user would have to execute more than 110 mouse clicks, 70 key presses, and 90 menu commands. Given the variety of special cases, creating a keyboard macro for this task would be impractical.

Writing a C or BASIC program for this task would take even longer. In the user tests, Turvy performed nearly all the edits on 14 of the 18 entries, the other 4 serving as examples. When predicting actions, Turvy averaged 4 mistakes, which users had to correct.

To really understand Turvy, it is helpful to see it in action. Figure 3 presents "Pam," a composite of two users from the study, working on the task described above.

Design issues

When tested by more than a dozen users on word-processing, file management, and drawing tasks, Turvy revealed important design issues:

Figure 3 Dialog between Turvy and the user while formatting the bibliography

Pam: Take the last name, before the colon. [Copies "Agre," pastes it to a new paragraph, types brackets, sets style.] This is a list of publications. Now we'll add the last two digits of the date. [Types "88."] Do you want another demo?

Turvy interprets Pam's question to mean the example is complete. Turvy has formed an initial generalization and now offers to edit the second entry (author "Andreae").

Turvy: I can try if you want.
Pam: OK.
Turvy: [Picks "Macdonald."] Looking for word before first colon in paragraph, OK?
Pam: No—stop! You want the first author.
Turvy: Show me, please.
Pam: [Picks "Andreae."]
Turvy: You picked word before comma. You want word before first comma or colon?
Pam: Yes! The first comma or colon.
Turvy: [While doing actions] Copy, go to start of paragraph, OK?
Pam: OK.
Turvy: Type open bracket, paste, close bracket, change style to "citation," OK?
Pam: OK.

Now Turvy makes a mistake with the date. When Pam typed "88," Turvy assumed the number was a constant.

Turvy: Type in blank space "88," OK?
Pam: No, the last two digits at the end.
Turvy: Show me, please.
Pam: [Picks "75."]
Turvy: Two digits before period at end of paragraph?
Pam: Yes.

… Later, Turvy encounters author "van Lehn," mistakenly selecting "Lehn." When the user corrects it, Turvy asks for an explanation.

Turvy: What's different about this case?
Pam: Hmm.
Turvy: You selected these two words, why?
Pam: Because these two words make up his last name.
Turvy: Is there some feature of this word [highlights "van"] I should look for? Lowercase word?
Pam: [Hesitant] In this case, yes.

Turvy does the rest of the task without error.


• The agent should learn from examples: Verbal specifications alone are too difficult for users to compose, and they are riddled with ambiguity and error. Though progress is being made in instructible natural language interfaces,6,7 we believe examples should play a primary role.
• The agent should accept verbal and gestural hints in conjunction with examples; this speeds learning by focusing attention on relevant features of the examples.
• The agent should teach users its language and its background knowledge through the feedback it gives; verbal feedback should include only terms the agent can recognize. Figure 4 illustrates some forms of feedback.
• When the agent needs a hint because it cannot explain some action, it should not ask "Why did you do this?" Instead, the agent should make a guess, to which the user can respond with a hint.
• Learning about special cases (e.g., "van Lehn") as they arise makes the agent much easier to teach; the agent should discourage users from giving long verbal instructions that try to anticipate special cases.
• The agent should watch what the user does not do; implicit negative examples (e.g., the user selected surnames but not initials) speed learning and reduce predicted negative examples (i.e., mistakes).

Although Turvy represents just one kind of instructible agent, it exemplifies characteristics we advocate for instructible agents in general. To summarize them:

• The agent should be able to be personalized; it should learn what the individual user is trying to do.
• The user should be able to control the agent by teaching it.
• The user should be able to develop a mental model of what the agent knows about the task.
• The user should not be required to interact; the agent can learn by watching the user work.
• The agent should learn from its mistakes.
• The agent should learn continually, integrating new cases with prior knowledge.

Figure 5 shows the flow of information between the user and the agent, while Figure 6 illustrates the flow of information between application and agent. In particular, the lower half of the figure shows interactions required for the agent to operate the application and to augment its output with feedback (as in Figure 4). Agents often add functionality to applications; for example, graphical aids such as Mondrian8 deal with interobject constraints that an ordinary drawing program might not represent.

Figure 4 Examples of visual and spoken feedback. Spoken feedback in the figure includes "Draw line from mid-right of first box to mid-left of second box. Contact with other line doesn't matter," "Looking for capital word at end of sentence before quote," and "I predict you will copy."


In this case, the application may be unable to execute all of the script directly, and must call back to the agent to compute some parameters.

Incorporating feedback and user-agent dialogs can be problematic, since most current systems provide no means for other applications to modify their displays. To skirt these problems, many existing agent systems such as Eager1 are tightly coupled to applications.

Machine learning requirements

Cima ("Chee-mah") is a symbolic learning algorithm that has been implemented to provide the kind of learning necessary to realize the Turvy scenario. Cima learns about data, actions, and the conditions under which actions are performed. Although a generic learning system, it is designed to be readily customized for particular applications, such as text editing.

Figure 5 Interactions between user and agent, including examples, hints ("It's just before the quotes," "Look here," "This is a special case"), questions ("Did you pick this one because…?"), and predictions.

Figure 6 Interactions between application and agent, including actions and data (Draw Line L1; touch(L1.left, B1.mid-right); touch(L1.right, B2.mid-left); objects(L1, L2, B1, B2, B3)), criteria and biases (Draw Line requires: specify Line.point1(x, y), Line.point2(x, y), Line.color(r, g, b); LinePoint spec prefers: minimize-degrees-free(x, y)), actions, feedback, and dialogs, and requests for parameters such as compute: find(B, a rectangle such that: color(B, blue) and aligned(B.left, B2.left)).


At present, it has been partially integrated with a text editor so that it can record and learn how to search for syntactic patterns in the text. Its design illustrates two important differences between learning in an instructible agent and traditional machine learning: reliance on task-oriented background knowledge, and the interpretation of ambiguous instructions.

Action microtheory. An agent must learn generalizations that map to executable operations within an application. We have identified some categories of actions and their operational criteria. This task knowledge, primitive though it is, biases the learner so that it forms plausible, operational hypotheses from very few examples.

Most user-interface actions apply some primitive operator (like copy or move) to selected data. Thus, the first key to learning about actions is to learn about data. Data descriptions1 specify criteria for selecting and modifying objects. For instance, suppose the user wants an agent to store electronic mail messages from Pattie Maes in the folder "Mail from pattie," as shown at the top of Figure 7. The data description sender's id begins "pattie" tells it which messages to select—those from Pattie, regardless of her current workstation. The data description folder named "Mail from <first word of sender's id>" tells it where to put them.
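As a concrete illustration, the two data descriptions in this example can be written as executable predicates. The sketch below is ours, not Cima's internal representation; the Message type and its field names are assumptions made for illustration.

    from dataclasses import dataclass

    @dataclass
    class Message:
        sender: str    # e.g., "pattie@media" (field names are assumptions)
        subject: str

    def matches(msg: Message) -> bool:
        # Data description: sender's id begins "pattie"
        return msg.sender.split("@")[0].startswith("pattie")

    def target_folder(msg: Message) -> str:
        # Data description: folder named "Mail from <first word of sender's id>"
        return "Mail from " + msg.sender.split("@")[0]

    msg = Message(sender="pattie@media", subject="Tuesday meeting")
    if matches(msg):
        print(target_folder(msg))    # -> Mail from pattie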

Conventional machine-learning algorithms learn to classify examples. But agents do things with data, and to be useful, data descriptions may require features in addition to those needed for classification. Figure 7 illustrates four types of actions: classify data, find data, generate new data, and modify properties. Cima encodes utility criteria that specify the characteristics of a well-formed data description for each type of action.

Classify actions have a single utility criterion: to discriminate between positive and negative examples. Features with the most discriminating power are therefore strongly preferred.

Find adds a second criterion: The description must delimit objects, and, in some domains, state the direction of search.

Figure 7 General types of action on data

• Classify: If the message is from "pattie," then put it in the folder "Mail from pattie."
• Find: Find the next telephone number preceded by the word "fax."
• Generate: Insert a calendar template of the form: [Next(DayName) Tab Next(DayNumber) Tab toDo].
• Modify: Move the circle to the point at which a red line intersects a blue line.

The figure illustrates each action with sample data, such as a message from pattie@media with subject "Tuesday meeting," a phone list (tel 243-6166, fax 284-4707), and calendar entries (Mon 21 toDo, Tue 22 toDo).


Thus a text search pattern specifies where the string begins and ends, and whether to scan forward or backward. Features that describe more delimiters or constraints are preferred.

Generate adds a third criterion: The description should specify all features of a new instance. If generating a graphic, the description must specify size, shape, color, etc.; for text, it must specify the actual string. Though user input is a valid feature value, the system strongly prefers value "generators"—constants, such as "toDo," or functions, such as Next(DayName).

Modify stipulates two criteria: The description should discriminate between positive and negative examples, and it should generate the property's new value. Features that determine the new value are preferred to those that merely constrain it. The graphics example in Figure 7 shows a conjunction of two features, touch(Circle.center, Line1) and touch(Circle.center, Line2), that together determine a property value, the circle's new (x,y) location. By itself, each intersection leaves one degree of freedom on the circle's location. The utility criteria for setting an object's location assume that the goal is to remove all degrees of freedom if possible. Hence, features that remove both degrees of freedom, e.g., touch(Circle.center, Line1.midpoint), are preferred over features that remove one degree of freedom. Cima continues adding touch(Circle.center, Line) features until zero degrees of freedom remain. If the user rejects an example in which the circle touches two blue lines, Cima adds a third feature—that one of the lines be red—to satisfy the classification criterion.
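The utility criterion for setting a location can be sketched schematically. The code below is our illustration, not Cima's implementation; the feature strings and the degrees-of-freedom bookkeeping are simplifying assumptions.

    def degrees_removed(feature: str) -> int:
        # touch(center, line) leaves one degree of freedom along the line;
        # touch(center, line.midpoint) pins both x and y.
        return 2 if feature.endswith(".midpoint)") else 1

    def choose_features(candidates: list[str], needed: int = 2) -> list[str]:
        chosen, remaining = [], needed
        # Prefer features that remove the most degrees of freedom.
        for feature in sorted(candidates, key=degrees_removed, reverse=True):
            if remaining <= 0:
                break
            chosen.append(feature)
            remaining -= degrees_removed(feature)
        return chosen

    print(choose_features(["touch(Circle.center, Line1)",
                           "touch(Circle.center, Line2)"]))
    # Both features are kept: each removes one degree of freedom,
    # so two are needed to pin the circle's new (x, y) location.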

Interpreting user instructions. Most machine learning systems generalize from positive and negative examples, using biases encoded in their algorithms and perhaps in domain knowledge. Cima learns from examples, but also from hints and partial specifications. Hints, which may be verbal or gestural, indicate whether features of data (such as the color of a line) are relevant. The salient characteristic of hints is ambiguity: Cima must choose among multiple interpretations. Partial specifications, on the other hand, are unambiguous, though incomplete, descriptions. They specify relevant features, and even rules, and may be input as textual program fragments or by editing property sheets. Cima's learning algorithm composes rules (which are conjunctions of features) that meet the utility criteria for describing an action. It combines the evidence of examples, hints, specifications, and background knowledge to select the most justifiable features for each rule. By combining knowledge sources, Cima is able to maximize information gained from a small set of examples and to choose the most justifiable interpretation of ambiguous hints.
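A toy sketch of this evidence combination follows; it is our illustration only, and the knowledge sources, weights, and candidate features shown are invented for the example rather than taken from Cima.

    EVIDENCE_WEIGHT = {
        "explicit specification": 3.0,    # unambiguous, so weighted most heavily
        "verbal or gestural hint": 1.5,   # ambiguous, weighted less
        "covers positive examples": 1.0,
        "excluded by negative example": -4.0,
    }

    def score(feature_evidence: dict) -> dict:
        # Sum the evidence weights attached to each candidate feature.
        return {feature: sum(EVIDENCE_WEIGHT[e] for e in evidence)
                for feature, evidence in feature_evidence.items()}

    evidence = {
        "word before first comma or colon": ["covers positive examples",
                                             "verbal or gestural hint"],
        "word before first colon": ["covers positive examples",
                                    "excluded by negative example"],
    }
    scores = score(evidence)
    print(max(scores, key=scores.get))   # -> word before first comma or colon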

Evaluating instructible agents. It is difficult to perform evaluation studies on instructible agents, because users tend not to use an instructible system "in the same way" as they do an ordinary noninstructible system. However, one can specify particular tasks with potential repetitive or semirepetitive subtasks, and test whether users are able to successfully teach the system how to do such tasks. Such a study was conducted at Apple Computer,9 which is a good example of how instructible agents can be evaluated. Potter introduces the notion of "just in time" programming,10 a criterion for evaluating instructibility. He argues that the benefit of automating the task should outweigh the cost of performing the instruction and the possible risk of the instruction being performed incorrectly.

Mondrian: User interface requirements for learning from explicit instruction

As another example of an instructible agent, we consider Mondrian, a prototype object-oriented graphical editor that can learn new graphical procedures demonstrated by example. Like Cima, Mondrian observes user actions and automates procedures in the context of a familiar interactive application—in Mondrian's case, a graphical editor. Mondrian implements a much simpler learning algorithm than Cima, because it relies more on explicit instruction, and it provides an interface for teaching that is carefully integrated into the underlying interactive application.


Procedures are demonstrated by user interface actions, and the results are put back into the graphical interface in the form of icons generated by the system, so they can take part in defining future operations.

Mondrian addresses the problem of teaching not only procedural, but also declarative knowledge. A teacher might need to teach not only a sequence of actions, but also new concepts in the course of the demonstration. Mondrian's approach to this is graphical annotation. The user can draw annotations on an image to introduce new objects and to specify relations between objects. These objects and relations become hints to the learning process, as expressed in the discussion of Turvy and Cima. Whereas Instructo-Soar3 lets the user describe the relevant features of data in a restricted form of natural language, in a graphical domain, drawn annotations can be more natural than verbal descriptions.

In the text domain in which Cima operates, the display is relatively straightforward. In Mondrian's graphic domain a rich set of media interfaces is used to give the user the feeling of interacting with virtual objects. Graphical objects can be drawn, moved, copied, etc. Video images can be used as sources for the demonstration and new objects and relations introduced from the video frames. Voice input can be used to give advice to the agent and voice output is used to communicate the agent's interpretation of the user's actions. This lets the user adopt a natural "show-and-tell" method for teaching the agent.

To make the discussion more concrete, we introduce one of Mondrian's application domains, automating and documenting operational and maintenance procedures for electrical devices. The MIT Media Laboratory is collaborating with the Italian electronics company Alenia Spazio S.p.A., which makes many specialized electronic devices for the aerospace industry. These devices are characterized by the following attributes: they are complex, have relatively small user communities, must be operated by relatively untrained personnel under time constraints, and the consequences of operational mistakes can be expensive. All these factors conspire to make the programming of automated equipment, or documentation for teaching the procedures to people, expensive, time-consuming, and error-prone.

A relatively untrained technician who is familiar with the operation of the device might "teach" the computer the steps of the procedure by interacting with an on-screen simulation of the device. Figure 8 shows the interface for teaching Mondrian how to disassemble a circuit board.

Figure 8 Teaching a technical procedure through a video example


The simulation is created by teaching the computer (1) about the objects involved, by annotating images captured from videotaped demonstrations of the procedure, and (2) the actions involved in the procedure, by demonstrating actions in the user interface. Mondrian records and generalizes the user-interface actions, and synthesizes a procedure that can be applied to other examples. Such a procedure could also be used to drive robotic assembly or test equipment on an assembly line.

Identifying objects in video frames. We would like to give the user the illusion of interacting with a simulation of a device. If we were to program such a simulation "from scratch," the programmer would produce renderings of each object, and operating on these objects would produce a corresponding change in the renderings to reflect the consequences of the action. But to do this, the underlying model of the device and its structure must already exist. Here the purpose of the interaction is to capture the object model of the device and to teach the system about it.

For each step that the user wishes to describe, he or she must select salient frames that represent the states before and after the relevant action. Subsets of the video image serve as visual representations of the objects. The next step is to label the images with a description of the objects and their structure. This is shown in Figure 9. The group structure of the graphical editor is used to represent the part/whole structure of the device.

A simple visual parser relates the text and lines to the images of the parts they annotate. A part/whole graph gives the user visual feedback, shown in Figure 10. The tree of labels shows the conceptual structure, and the leaves show specific images that represent each concept in the example.

As computer vision systems become more practical, the agent could perhaps automatically recognize some objects and relations. Some partial vision algorithms such as object tracking could streamline the annotation process. An alternative to choosing and annotating video frames might be to have the user sketch the parts and relations, then relate them to images through a technique like Query by Image Content.11

Descriptions of actions. After annotating the objects in the frame representing the initial state, the user selects a frame to represent the outcome of the operation. To construct a description of the procedure step, the user must use the operations of the graphical editor to transform the initial state to (an approximation of) the final state.

Figure 9 Parts annotated with text labels: plug panel, ribbon cable, ribbon connector, plug connector, circuit board

Figure 10 Part/whole graph: (a "circuit-board") with parts (a "ribbon-cable"), (a "ribbon-connector"), (a "plug-panel"), and (a "plug-connector")


The editor provides both generic operations (Cut, Copy, Paste, Move, …) and operations specific to the domain (for the repair domain: Assemble, Disassemble, Turn On and Turn Off switches and lights, Open and Close fasteners, etc.).

Once a procedure has been recorded, it appears in Mondrian's palette as a domino, with miniaturized frames of the initial and final states of the procedure. The newly defined operation can then be applied to a new example. Assume that we have recorded a two-step procedure that disassembles the ribbon cable assembly and the plug panel assembly. First, we choose a video frame, then we outline the parts and add annotation labels that represent the input to the procedure. Clicking on the domino for the Disassemble-Circuit-Board procedure results in performing the disassembly of both parts in the new example.

Documenting the procedure. We use a graphical storyboard to represent the procedure in the interface. The storyboard, shown in Figure 11, consists of a set of saved pictures of each step of the procedure, together with additional information that describes the step; the name of the operation chosen, a miniature icon for the operation, and the selected arguments are highlighted. The system also captions each frame of the storyboard with a text description of the step. A simple template-based natural language generator "reads the code" to the user by constructing sentences from templates associated with each function and each kind of object. The storyboard provides a form of automatically generated documentation, which, although it does not match the quality of documentation produced by a technical documentation staff, is inexpensive to produce and guaranteed not to omit crucial steps.
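The flavor of such a template-based generator can be conveyed with a small sketch. This is our illustration, not Mondrian's actual generator; the operation names and slot names are assumptions.

    TEMPLATES = {
        "unscrew": "Unscrew the {part} of the {target}.",
        "disassemble": "Disassemble the {part} from the {target}.",
    }

    def caption(operation: str, part: str, target: str) -> str:
        # Fill the operation's sentence template with argument descriptions.
        return TEMPLATES[operation].format(part=part, target=target)

    print(caption("unscrew", "right bottom screw", "first argument"))
    # -> Unscrew the right bottom screw of the first argument.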

Mondrian and design criteria for instructible agents

Mondrian operates broadly within the design criteria for instructible agents outlined in the Turvy experiment, but takes a somewhat different approach than Cima in responding to those criteria. The basic collaboration between the user and the agent is the same. The user demonstrates the task on a concrete example in the interactive application, and the system records the user's actions. A learning algorithm generalizes the sequence of actions so that they can be used in examples similar to, but not exactly the same as, the examples used to teach the agent. The user can also supply hints that control the generalization process interactively. Mondrian's graphical and verbal feedback communicates to the user what its interpretations of the user's actions are, and through this, the user becomes familiar with its biases and capabilities. Thus it directly meets the first three design criteria outlined for Turvy.

In Mondrian, the user is assumed to be presenting the examples for the explicit purpose of teaching the agent, so the instruction tends to be more explicit than in other systems, such as Eager, that rely more on unaided observation.

Figure 11 Storyboard for Disassemble Circuit Board, with step captions such as "Unscrew the right bottom screw of the first argument" and "Unscrew the right top screw of the first argument"


The user explicitly indicates which objects are the examples at the start of the demonstration, and all subsequent operations are generalized relative to those examples. Mondrian does not interrupt the user during the demonstration the way Turvy does, although it optionally can provide its graphical and verbal feedback at each step. Mondrian cannot dynamically select which features of the examples to pay attention to, as does Cima, but there is no reason why Cima's feature-selection algorithm could not be incorporated.

Turvy's last three design criteria—hints in response to explanation failures, special cases, and negative examples—are not covered in Mondrian, but represent possibilities for future work. Lieberman's earlier instructible agent for general-purpose programming, Tinker,12 demonstrated conditional and recursive procedures using multiple examples. This approach allows special cases and exceptional situations to be presented as supplementary examples after the most common, typical case is demonstrated. This technique might also work for Mondrian.

In terms of the action categories discussed for Cima, Mondrian, like most pure action learning systems, concentrates on the Generate and Modify actions. The system does not do pure Classification. It relies on the user to explicitly perform much of the Find actions. However, graphical annotations do allow the system to perform a kind of Find, since actions taken on a part described by an annotation in a teaching example will be performed on an analogous part in a new example that is selected by the relations specified in the annotation.

Letizia: An agent that learns from passive observation

Another kind of instructible agent is one that learns not so much from explicit instruction as from relatively passive observation. We illustrate this kind of agent with Letizia, an agent that operates in tandem with a conventional browser such as Netscape Communications Corp.'s Navigator** to assist a user in browsing the World Wide Web. The agent tracks the user's browsing behavior—following links, initiating searches, selecting specific pages—and tries to anticipate what items may be of interest to the user. It continuously displays a page containing its current recommendations, which the user can choose either to follow or to ignore and return to conventional browsing activity. It may also display a page summarizing recommendations to the user, as shown in Figure 12.

Here, there is no pretense that the user is explicitly teaching the system. The user is simply performing everyday activities, and it is up to the agent to make whatever inferences it can without relying on the user's explicit help or hints. The advantage of such agents is that they operate more autonomously and require less intervention on the part of the user. Of course, this limits the extent to which they can be controlled or adapted by the user.

Letizia is in the tradition of behavior-based interface agents,13 so called to distinguish them from knowledge-based agents. These have become popular lately because of the knowledge bottleneck that has plagued pure knowledge-based systems. It is so difficult to build up an effective knowledge base for nontrivial domains—as in conventional expert systems—that systems that simply track and perform simple processing on user actions can be relatively attractive. The availability of increased computing power that can be used to analyze user actions in real time is also a factor in making such systems practical. Rather than rely on a preprogrammed knowledge representation structure to make decisions, the knowledge about the domain in behavior-based systems is incrementally acquired as a result of inferences from the user's concrete actions.

In the model adopted by Letizia, the search for information is a cooperative venture between the human user and an intelligent software agent. Letizia and the user both browse the same search space of linked Web documents, looking for "interesting" ones.

Figure 12 Letizia helps the user browse the World Wide Web: while the user browses, Letizia explores search candidates and displays recommendations


No goals are predefined in advance. The user's search has a reliable static evaluation function, but Letizia can explore search alternatives faster than the user can. In parallel with the user's browsing, Letizia conducts a resource-limited search to anticipate what pages a user might like to see next. Letizia adopts a mostly breadth-first search strategy that complements the mostly depth-first search strategy encouraged by the control structure of Web browsers such as Netscape's Navigator. Letizia produces a recommendation based on the current state of the user's browsing and Letizia's own search. This recommendation is continuously displayed in an independent window, providing a kind of "channel surfing" interface.
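A resource-limited, breadth-first exploration of this kind can be sketched in a few lines. The sketch below is ours, not Letizia's code; fetch_links() and interest() stand in for the real page-fetching and scoring machinery, and the budget numbers are arbitrary.

    from collections import deque

    def recommend(start_url, fetch_links, interest, budget=50, top_n=5):
        """Explore pages reachable from start_url in breadth-first order,
        stopping when the fetch budget runs out, and return the top_n
        most interesting links found."""
        frontier = deque([start_url])
        seen, scored = {start_url}, []
        while frontier and budget > 0:
            page = frontier.popleft()              # breadth-first order
            for link in fetch_links(page):
                if link in seen:
                    continue
                seen.add(link)
                budget -= 1                        # each fetch costs resources
                scored.append((interest(link), link))
                frontier.append(link)
                if budget <= 0:
                    break
        return [url for _, url in sorted(scored, reverse=True)[:top_n]]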

User interest is inferred from two kinds of heuristics: those based on the content of the document and those based on user actions in the browser. Letizia reads each page and applies a simple keyword-based measure of similarity between the current document and the ones the user has explicitly chosen so far. This measure simply counts keywords and tries to weight them so as to find keywords that occur relatively frequently in the document but are rare otherwise. This is a standard content measure in information retrieval.
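This "frequent here, rare elsewhere" weighting is essentially the standard TF-IDF measure. A bare-bones version is sketched below; it is our illustration rather than Letizia's actual code.

    import math
    from collections import Counter

    def keyword_weights(doc_words, seen_docs):
        """doc_words: words of the current page; seen_docs: list of
        word-lists for pages the user has already chosen."""
        tf = Counter(doc_words)
        n_docs = len(seen_docs) + 1
        weights = {}
        for word, count in tf.items():
            # Words common in this page but rare elsewhere get high weight.
            docs_with_word = 1 + sum(word in doc for doc in seen_docs)
            weights[word] = count * math.log(n_docs / docs_with_word)
        return weights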

The second kind of heuristic is perhaps more interesting. It uses the sequence of user actions in the browser to infer interest. Each action that the user takes—selecting one link rather than another, saving links, or performing searches—can be taken as a "training example" for the agent trying to learn the user's browsing strategy. For example, if the user picks a page, then quickly returns to the containing page without choosing any of the links on the page, it might be an indication that the user found the page uninteresting. Choosing a high percentage of links off a particular page indicates interest in that page.

One characteristic of instructible agents is the method used to temporally relate the actions of the user and the actions of the agent. In Turvy, Cima, and Mondrian, the interaction is conversational: the user and the system "take turns" in performing actions or giving feedback. In Letizia, the agent is continuously active and accepting actions in parallel with the user's activity. This increases the sense of autonomy of the agent.

Because of this autonomy, Letizia does not satisfy many of the advisability criteria developed in the Turvy study. The user's influence on the agent is limited to browsing choices—but these choices are made not to influence the agent, but to directly accomplish the user's browsing goals.

However, each individual decision that Letizia makes is not so critical as it is in Cima or Mondrian. Because Letizia is only making suggestive recommendations to the user, it can be valuable even if it makes helpful suggestions only once in a while. Because it is continuously active, if it misses a possible suggestion at some time, it will likely get another chance to find that same choice in the future because of consistencies in the user's browsing behavior. There is a real-time trade-off between requiring more interaction from the user to make a better decision and exploring more possibilities. However, in the future we will probably back off from the pure unsupervised learning approach of Letizia and provide facilities for accepting hints and explicit instruction.

In Cima's discussion of categories of actions for instructible agents, Letizia's task is one of almost pure Classification—classifying a possible link as either interesting or not. Find, Generate, and Modify are not relevant.

ACT: Teaching virtual playmates and helpers

In the ALIVE project (Artificial Life Video Environment), developed in the Autonomous Agents Group at the MIT Media Laboratory,14 people experience an imaginative encounter with computer agents, as their own video images are mixed with three-dimensional computer graphics. ALIVE agents are designed with "believability" in mind: the goal is to have the user perceive the agent as having distinctive personal qualities.


Following ethological principles, their behavior arises from complex interactions among instinctual motivations, environmental releasing mechanisms, and the dampening effects of frustration and satiation. At present, all behaviors must be preprogrammed. While some unanticipated behavior emerges from the interaction of these programs, the agents are nonetheless noticeably limited, especially in their ability to play with users.

Perhaps the most interesting class of real agents includes those who interact with us adaptively, changing their behavior so that we cannot quite predict them. They learn to help us or frustrate us, becoming a more useful assistant or challenging opponent. The goal of the ALIVE Critter Training project (ACT) is to augment agents with the ability to learn from the user. ACT is an attempt to bring the kind of learning represented by Cima to a virtual reality world. By training agents, users can customize and enrich their ALIVE experiences. And dealing with the unpredictable results of training is in itself entertaining and instructive.

The initial version of the ACT system learns to associate predefined actions with new releasing mechanisms. In particular, actions may be chained together as a task. Conceptually, ACT is an agent's learning "shadow," watching the agent interact with the user and other ALIVE agents and learning to predict their actions. ACT has an implicit motivation—to please the user by learning new behaviors.

One scenario consists of an ALIVE world in which the user and a synthetic character named Jasper play with virtual LEGO** blocks. Figure 13 shows Jasper helping the user to build a slide; since he does not already know what the user wants to build or what a slide is, he watches the user's actions and pitches in to perform any action he can predict, whether it is to gather up the right sort of blocks or to do the actual building.

The assembly task here resembles the disassembly task that we taught to Mondrian earlier by graphical annotation of example video frames, and the learning strategy of recording and generalizing user actions is like that explored in Turvy and implemented in Cima. As in Cima, the agent looks for patterns in sequences of user actions and uses utility criteria to choose generalizations.

The spectrum of instructibility

Just as "direct manipulation" and "software tool" are metaphors describing the user's perception of operating the computer, so too is "instructible agent." Physical tools translate a person's actions in order to manipulate objects, and software tools translate physical actions (captured by a mouse, keyboard, etc.) into commands for manipulating data. In contrast, an agent interprets the user's actions, treating them as instructions whose meaning depends on both the context in which they are given and the context in which they are carried out. Whether its task is to help format text or search the World Wide Web, the agent, unlike a tool, must deal with novelty, uncertainty, and ambiguity. The relationship between user and agent is based on delegation and trust, rather than control.

Because of this, some researchers have objected to the notion of agents on the presumption that they rob users of control over their computers.15 At the heart of the antiagent argument lie two beliefs: first, that direct manipulation is good enough for the vast majority of tasks; and second, that "real" programming must be done in a formal language.

Figure 14 illustrates the spectrum of approaches to accomplishing tasks with a computer, comparing them in terms of their effectiveness and ease of use. The "English Butler" at the far right requires an introduction: this is a pre-educated, intelligent agent of the sort portrayed in Apple's Knowledge Navigator video. It understands a good deal of natural language and already knows how to carry out a variety of tasks; the user need only give a vague description and the Butler reasons its way to an executable program.

On the graph shown in Figure 14, brighter hue means more or better. The top line compares the approaches by the degree of artificial intelligence they require. Other than its skill at program optimization, a C++ compiler evinces little intelligence, as do direct manipulation interfaces, though some have "smart" modes: for instance, drawing programs that snap objects to one another (Intellidraw**) and text editors that predict input (Quicken**) or correct it automatically (Microsoft Word**).


Figure 13 Building a slide together. The user (shown in silhouette) wants to build a slide for the ALIVE character, Jasper, using virtual LEGO blocks. Jasper is naturally helpful: he tries to learn what the user is building so he can pitch in. Jasper pays attention to the way the blocks are connected; he watches for repeated actions. The user sets up the ladder frame and attaches two rungs. When the ladder gets too tall for his arms' reach, Jasper's "behavior system" adopts another method—climbing or jumping—to extend his reach. Jasper predicts that rods of various colors should be attached some distance apart along the frame. Soon he has placed rungs all along the frame. Meanwhile, the user has attached the slide. All done!


In contrast, the English Butler must be highly intelligent if it is to translate vague instructions into viable procedures; it must interpret natural language, maintain a model of discourse with the user, formulate plans, and change plans "on the fly" when they fail or when unexpected opportunities arise.

Instructible agents cover a range of intelligence. Some, like Mondrian, use only very simple rules of inference. This makes them highly efficient and effective for learning certain tasks. They rely on a well-designed user interface to harness the user's intelligence. Other systems, like Cima, use more sophisticated learning methods and background knowledge, trying to maximize the agent's contribution to the learning/teaching collaboration. What all instructible agents have in common is that they avoid relying on the agent's intelligence to plan the task: for this, they rely on the user.

The second scale on Figure 14, run-time adaptability, refers to a system's ability to change its program in response to changing conditions. Once a program is compiled, it is set in stone—unless written in a dynamic language such as LISP. Direct-manipulation user interfaces generally do not adapt their behavior, except to store default values and preferences. The English Butler must be highly adaptable, otherwise its plans would fail. Instructible agents range from macro recorders that fix on a procedure after a single example (Mondrian) or a few examples (Eager), to systems that learn continuously and try to integrate new concepts with old ones (Cima).
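As a rough illustration of what run-time adaptability means in practice (sketched here in Python rather than LISP, and not taken from any of the systems above), a dynamic language lets a running program rebind one of its own behaviors without being stopped and recompiled. The suggest() function and its replacement are hypothetical.

# Illustration only (Python standing in for a dynamic language such as LISP):
# a running program can replace one of its own behaviors by rebinding a name,
# with no stop, recompile, or restart.

def suggest(document):
    # the behavior the system started with
    return "No suggestions yet."

print(suggest("draft.txt"))     # prints: No suggestions yet.

# Later, while the system is still running, a better behavior is installed
# by rebinding the same name (here, one that uses observed user habits).
def suggest_from_history(document):
    return f"You usually print {document} next. Print it now?"

suggest = suggest_from_history
print(suggest("draft.txt"))     # prints the history-based suggestion

A compiled C++ program would have to be rebuilt and restarted to make an equivalent change, which is why the figure rates it low on this scale.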

On the third scale, task automation, traditional programming and the English Butler score highest, since they are able to execute procedures without user intervention. Of course, this maximizes the risk of doing something bad. Direct manipulation automates only certain types of tasks, such as updating all paragraphs of a given style to a new format. Instructible agents can be used to automate tasks completely, but in general they are intended for "mixed-initiative" collaboration with the user, as exemplified in all the systems we have discussed.
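The following sketch, our own illustration rather than code from any cited system, shows one simple shape a mixed-initiative collaboration can take: the agent proposes each step of a learned procedure and the user confirms, skips, or halts, so nothing is automated without an opportunity to intervene. The step descriptions and the confirm callback are hypothetical.

# Illustration of mixed-initiative automation (not from the cited systems):
# the agent proposes each learned step; the user answers y (do it), n (skip it),
# or q (stop automating altogether).

def run_mixed_initiative(steps, confirm):
    """Execute learned steps one at a time, asking the user before each."""
    for describe, execute in steps:
        answer = confirm(f"About to {describe}. Proceed? (y/n/q) ")
        if answer == "q":
            break        # the user halts automation entirely
        if answer == "y":
            execute()    # only confirmed steps are carried out

# Example usage with canned answers standing in for interactive input.
steps = [
    ("reformat heading 1", lambda: print("heading 1 reformatted")),
    ("reformat heading 2", lambda: print("heading 2 reformatted")),
]
answers = iter(["y", "n"])
run_mixed_initiative(steps, confirm=lambda prompt: next(answers))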

The next scale, programmability, concerns the degree to which programmers can determine a system's behavior. A "serious" programming system enables programmers to describe any computable task and to choose an optimal method for doing so.

Figure 14 Comparing a range of instructible systems. The approaches compared, from left to right: C++ programming, direct manipulation, instructible agents, and the "English Butler." The scales: intelligence, run-time adaptability, task automation, programmability, end-user control, effort of instruction, and effort of learning.



At the opposite extreme, direct-manipulation interfaces are inherently nonprogrammable, though as noted above, some offer restricted forms of task automation that the user may parameterize. Nor is the English Butler truly programmable, since it decides for itself how to accomplish a task, and it can perform only those tasks it is capable of planning. Instructible agents are programmable, and in theory could provide a "Turing-complete" instruction set, though in practice they tend to restrict the forms of program that users can create, either because they lack some primitives and control structures or because they cannot infer them from a given set of examples. The inevitable (and desirable) limit on an agent's exploration of hypotheses is a good argument for providing other means of instructing it: in other words, learning agents can be part of a programming system, not necessarily a replacement for one.

Another issue—a controversial one, as noted above—is the degree to which the end user can control the system's task behavior. A C++ program gives the end user no control, except through its user interface. A direct-manipulation interface provides a high degree of control, provided it is simple to understand and offers immediate feedback. Unfortunately, direct-manipulation interfaces are becoming more complex and feature-ridden, as software vendors respond to users' demands for more powerful features. There is no control without understanding, and such complexity makes understanding the interface practically impossible. The English Butler takes commands but can easily get out of control by misinterpreting them; we presume that this is the sort of agent that Shneiderman15 decries. Instructible agents are more easily controlled, because the user can define their behavior in detail, as exemplified in Mondrian. An agent that learns continuously from user instruction, such as Turvy, affords user control over more complex behavior. Systems such as Letizia, which learn by observation only, can avoid robbing the user of control by offering suggestions rather than executing actions; since they cannot be guaranteed to do the right thing, they learn to make it easier for the user to do the right thing.
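A hedged sketch of the "suggest rather than execute" design just described (greatly simplified, and not Letizia's actual code): the agent builds an interest profile from pages the user chooses to visit, ranks candidate links against it, and only recommends; the user remains the one who acts. The keyword lists and scoring function are our illustrative assumptions.

# Simplified sketch of a suggestion-only agent (not Letizia's actual code):
# build an interest profile from pages the user visits, score candidate links
# against it, and offer suggestions; the user performs every action.

from collections import Counter

interest_profile = Counter()

def observe_visit(page_keywords):
    """Update the profile from a page the user chose to visit."""
    interest_profile.update(page_keywords)

def suggest_links(candidates, top_n=2):
    """Rank candidate (url, keywords) pairs by overlap with the profile."""
    def score(link):
        _, keywords = link
        return sum(interest_profile[k] for k in keywords)
    return sorted(candidates, key=score, reverse=True)[:top_n]

observe_visit(["agents", "learning", "interfaces"])
observe_visit(["agents", "demonstration"])

candidates = [
    ("page-a", ["agents", "learning"]),
    ("page-b", ["spreadsheets"]),
    ("page-c", ["demonstration", "interfaces"]),
]
for url, _ in suggest_links(candidates):
    print("Suggested:", url)    # the user decides whether to follow the link

Because the agent never takes an action itself, a wrong guess costs the user nothing more than an ignored suggestion.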

The last two scales on Figure 14 have received the most attention from the designers of instructible agents. The effort of instruction involves the cognitive and physical operations needed to transmit knowledge from the user to the computer. Programming in a formal language is high-effort in both respects, whereas direct manipulation has demonstrated success in minimizing effort—until tasks become repetitive or require great precision or involve visually complex structures. Delegating tasks to the English Butler is simplicity itself—in theory—because the user does not even have to know precisely what he or she wants. In practice, describing a task in words is often more difficult than performing it. Although instructible agents vary somewhat in ease of use, the goal is generally to minimize the effort of instruction beyond that needed to demonstrate an example.

Finally, learning how to instruct the computer is the first and greatest hurdle users will encounter. Although direct-manipulation interfaces removed the "language barrier" between user and machine, they are now likely to come with an instructional video to guide the new owner through the maze of special features. Learning to instruct the English Butler requires little or no effort, unless it has "quirks" (such as unpredictable limitations) about which the user must learn. Similarly, the users of instructible agents must learn about their limitations (such as Turvy's restricted vocabulary) and about the tools of instruction (such as Mondrian's graphical annotations). As noted above, feedback is considered crucial for helping users learn how to teach.

We think instructible agents represent a "sweet spot" in the trade-offs represented by the spectrum shown in Figure 14. They provide a way for the user to easily delegate tasks to a semi-intelligent system without giving up the possibility of detailed control. While they take advantage of many AI (artificial intelligence) techniques, they do not require full solutions to AI problems in order to be useful, and they can enhance the usefulness of existing applications and interfaces. The best part is that, as you use them, they just keep getting better.

**Trademark or registered trademark of Apple Computer, Inc., Microsoft Corp., Netscape Communications Corp., LEGO Systems, Inc., Aldus Corporation, or Intuit.

Cited references

1. Watch What I Do: Programming by Demonstration, A. Cypher, Editor, MIT Press, Cambridge, MA (1993).

2. D. Maulsby and I. H. Witten, "Learning to Describe Data in Actions," Proceedings of the Programming by Demonstration Workshop, Twelfth International Conference on Machine Learning, Tahoe City, CA (July 1995), pp. 65–73.

3. S. Huffman and J. Laird, "Flexibly Instructible Agents," Journal of Artificial Intelligence Research 3, 271–324 (1995).

4. H. Lieberman, "Letizia: An Agent that Assists Web Browsing," International Joint Conference on Artificial Intelligence, Montreal, Canada (August 1995).

5. D. Maulsby, S. Greenberg, and R. Mander, "Prototyping an Intelligent Agent Through Wizard of Oz," Conference Proceedings on Human Factors in Computing Systems, Amsterdam, May 1993, ACM Press, New York (1993), pp. 277–285.

6. C. Crangle and P. Suppes, Language and Learning for Robots, CSLI Press, Palo Alto, CA (1994).

7. J. F. Lehman, Adaptive Parsing: Self-Extending Natural Language Interfaces, Kluwer Academic Publishers, Norwell, MA (1992).

8. H. Lieberman, "Mondrian: A Teachable Graphical Editor," Watch What I Do: Programming by Demonstration, A. Cypher, Editor, MIT Press, Cambridge, MA (1993), pp. 341–361.

9. K. Kvavik, S. Karimi, A. Cypher, and D. Mayhew, "User-Centered Processes and Evaluation in Product Development," Interactions 1, No. 3, 65–71, Association for Computing Machinery, New York (July 1994).

10. R. Potter, "Just in Time Programming," Watch What I Do: Programming by Demonstration, A. Cypher, Editor, MIT Press, Cambridge, MA (1993), pp. 513–526.

11. M. Flickner, H. Sawney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, "Query by Image and Video Content: The QBIC System," IEEE Computer 28, No. 9, 23–48 (September 1995).

12. H. Lieberman, "Tinker: A Programming by Demonstration System for Beginning Programmers," Watch What I Do: Programming by Demonstration, A. Cypher, Editor, MIT Press, Cambridge, MA (1993), pp. 48–64.

13. P. Maes and R. Kozierok, "Learning Interface Agents," Proceedings of the National Conference on Artificial Intelligence 1993, AAAI, Menlo Park, CA (1993), pp. 459–465.

14. P. Maes, "Artificial Life Meets Entertainment: Lifelike Autonomous Agents," Communications of the ACM 38, No. 11, 108–111 (November 1995).

15. B. Shneiderman, "Beyond Intelligent Machines: Just Do It," IEEE Software 10, No. 1, 100–103 (January 1993).

Accepted for publication April 10, 1996.

Henry Lieberman MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139-4307 (electronic mail: [email protected]). Dr. Lieberman has been a research scientist at MIT since 1975, first with the Artificial Intelligence Laboratory, then with the Media Laboratory since 1987. At the Media Lab he is a member of the Autonomous Agents Group, and his interests lie at the intersection of artificial intelligence, the human/computer interface, and graphics. He received a B.S. degree in mathematics from MIT in 1975. His graduate degree, called Habilitation à Diriger des Recherches, is the highest degree given by a French university. It was awarded in 1990 from Université Paris 6, Pierre et Marie Curie, where he was a visiting professor from 1989 to 1990. He has published over 40 research papers.

David Maulsby CAMIS Medical Informatics Section, Stanford University, Stanford, California 94305 (electronic mail: [email protected]). Dr. Maulsby was awarded the B.Sc., M.Sc., and Ph.D. degrees, all in computer science, from the University of Calgary, Canada. During 1995 he was a postdoctoral Fellow in the Autonomous Agents Group at the MIT Media Lab. He is spending the second year of his fellowship in the Medical Informatics Group at Stanford University.

Reprint Order No. G321-5623.