SUBMITTED FOR PAPER PUBLICATION;
DO NOT EXCERPT, CITE OR QUOTE WITHOUT AUTHOR PERMISSION

Issues for Software Agent UI

Alan Wexelblat, Pattie Maes
MIT Media Lab, Software Agents Group

Abstract

Agent user interfaces pose a number of special challenges for interface designers. These challenges can be formulated as a series of issues which must be addressed: understanding, trust, control, distraction, and personification. We examine each of these in turn and draw recommendations for designers in dealing with each of the issues as well as for the overall design of an agent interface based on our experiences with building such systems.

Keywords: agents, user interface, design, interaction

Introduction

This paper highlights some of the user interface issues which must be addressed by software designers who want to use agent technology as part of their system interfaces. It is not a comprehensive review -- that would be too large for any single article. Instead, it focuses on five issues which we have found to be the ones most often encountered by designers or most often raised by users of our systems, based on our six years of experience designing several prototype agents [12]. These issues deal not with hard technical formulae such as Fitts' Law nor even with well-explored interface considerations such as menu layout or dialog design. Instead these issues arise as a result of the attempt to create an interface with a different interaction paradigm. Specifically, these interface issues come about because agents encourage us as software designers to think in terms of delegation rather than manipulation.

Agent user interfaces also raise all the same questions as other interfaces; we should not think that we can escape the demands of design rigor nor throw away the past decades of cognitive science and interface research simply because we think about the interactions differently. This is the case for any new technology: nothing relieves the interaction designer of the need to understand the job the user is trying to accomplish, the context in which the system is to be used, the types of people who will be using the system, and so on. We assume our readers are well versed in that literature and practice and are reading this article to extend their knowledge into new areas; therefore, we will not cover standard HCI issues at all in this article, except to make the point that delegation-based user interfaces are not a "magic bullet" for HCI. Shneiderman [20] has pointed out that agent interfaces are sometimes promulgated as excuses for poor interface design. We agree that this is not acceptable and hope our readers will understand that we attempt to address here new issues raised by new styles of UI design without denigrating the importance of basic UI principles.

To begin with, we will define an agent as a program (or set of programs) with a specific, understandable competence to which a user can delegate some portion of a task. As shown in Figure 1, we talk about an agent being a conceptually separate part in a system. The agent observes the interaction between the user and the application and interacts with the user and with the application. The agent is complementary to the application's existing interface. At the same time, the user and the agent each form models of the other, including information about preferences, capabilities, and so on. We will talk more about these models later.

Agent observes user interaction
Figure 1: Preferred Model for Agent/User/Program Interaction

An alternative approach, shown in Figure 2, is to have the agent serve as an intermediary between the user and the application. In this design, the user interacts solely with the agent, which translates the user's input to the program. The agent is the interface, rather than serving as an adjunct to a more conventional interface. An example of this is the Peedy music-selection agent from Microsoft's Persona project [2]; there the agent, modeled as a talking parrot, serves as the interface to a collection of CD music. Our experiments have shown that this second approach can have problems because the user's inability to bypass the agent can cause her to feel out of control, an issue we address in depth below.

Agent controls user interaction
Figure 2: Alternative User/Agent/Program Interaction

In our discussion of agent interfaces, we should keep in mind that this is an assistive technology, not a replacement. We talk about agent systems in the context of helping people solve particular problems and not in terms of developing agents which will completely fulfill functions now performed by people. The goal of an agent ideally is to take over the repetitive, boring or impossibly complex portions of a task, leaving the person who is being assisted better able to manage her time and less overwhelmed.

For example, in October 1996 the Software Agents group hosted a demonstration of a marketplace system, where agents help people buy and sell goods, for a large number of our sponsors. In this demo, participants were given items such as books, watches and tote bags which they could buy and sell with fake money (called 'bits') which we also supplied. Participants used PC-based kiosks to interact with the system, creating selling agents to dispose of the goods which they didn't want to keep and buying agents to acquire items they wanted. In the course of the day it happened naturally that people who wanted to buy and sell the same item found themselves near each other and negotiated the trade directly. We not only permitted this activity but encouraged it; the point of the Agent Marketplace was not to replace people's ability to buy and sell on their own but rather to enhance it by providing a marketplace mechanism for people who didn't happen to be standing near people with compatible desires.

Neither of the basic interaction designs requires that we conceive of agents as separate shrink-wrappable products like spreadsheets or word processors. Instead, any program can be said to be more or less agent-like, depending on the degree to which it possesses certain properties of interest. These properties are personalization, autonomy, and learning.

Personalization is the ability of the system to know the individual it is assisting -- particularly her goals, interests, habits, preferences, and so forth -- and to change its goals and methods of operation to better meet the specific desires of the individual user. Software can be personalized to an individual, to a position (which may be filled by many people, for example shift workers), or to a company, which may wish its standard procedures and practices to be followed by all employees. By contrast, most current software works in a rote fashion, without even a simple user model to change its operation.

Autonomy is the degree to which the program can act on its own over potentially long periods of time, without detailed explicit instructions from the user, and the ability of the program to operate while the user does other things. In many direct-manipulation applications, the main loop of the application is an event-based loop which consists of waiting for an input from the user, processing the input, showing some result and then waiting for the next event; in an autonomous program, useful work can be done without waiting for every single user event. Users of an agent system should be able to describe their desired end result without needing to specify precise methods for achieving these results. By contrast, think of a database user; she cannot simply ask for items from the database but must specify detailed query strings, often in a precise language. Autonomous programs may continue to operate without input for some time; they may run passively for a time and initiate action in the future; the ability to initiate without specific user input is key to autonomy.

Finally, learning is the property of the software which allows it to observe, understand, and adapt to changes in its environment. Agent terminology derives from artificial intelligence and retains from its hardware origins the notion of sensors -- ways in which the agent can gather data about its environment -- and the notion of effectors -- ways in which the agent can manipulate its environment. In a software environment, the sensors and effectors may be code subroutines without any physical embodiments, but the idea of a program being able to sense its computational environment (users, other agents, other programs, data files, etc.) and manipulate that environment is still applicable. Most software affects its environment to some degree, but the important aspect of learning is that the learning software can sense and understand the changes it is making and react differently based on what it has observed.

The key application of this ability is the agent's capacity to detect and track changes in the personalization properties. It is one thing to set up a system in advance and tell it that you like science fiction television shows; it is a much more useful thing to have a system which detects that your taste for science fiction (or any other prespecified category) waxes and wanes and adapts its behavior to these changes. Likewise, in a business setting, the user's goals will change and the program which is able to sense and adapt to these goal changes is likely to be more useful than one which blindly continues on a set course.

Each of these properties gives us a dimension on which we can measure software systems. To the degree that a piece of software has more autonomy, more personalization and more learning capability, we say that the software is more agent-like and -- for our purposes -- more interesting. These scales are relative, rather than absolute. There is no fixed endpoint of ultimate autonomy, personalization or learning for which one would aim. As technologies such as machine learning and user modeling advance, the possibilities for building better agents also advance.

With this background in mind, the five agent interface issues which we address here are:

  1. understanding;
  2. trust;
  3. control;
  4. distraction;
  5. personification.

Each of these terms is a mnemonic more than a description, in the sense that it tries to capture a broad set of issues in a single word. In addition, the issues are far from orthogonal. In particular, trust and control heavily interrelate, as do distraction and personification. Nevertheless, we will try to look at each of the issues in turn.

Issue 1: Understanding

Part of the defining quality of an agent is that it possesses an understandable competence. In particular, that means that the user must be able to understand what the agent is capable of doing. It is important to distinguish what knowledge from how knowledge. The methods used by an agent to perform the tasks it is assigned are far less important for the user to know -- unlike, say, a compiler where the user must know precisely how files are to be linked together to produce the desired kind of executable. Generally users need only know what results can be obtained by what agent actions in order to understand agent capabilities. Information about how operations are performed can be important from trust or control points of view, which will be addressed next. For understanding purposes, however, information about what the agent is capable of doing is the most important.

From an interface point of view, this leads us to believe that agents must communicate to and take instructions from the user in task- and domain-specific vocabularies. For example, a user might instruct an agent to "create a chart" rather than "run a function," even though the function must be run in order to create the chart. Direct manipulation interfaces are often built assuming that the user has some goal(s) in mind and the interface's job is to present tools which allow the user to achieve her goals. The goal language, however, is often not part of the interface, in the sense that it is rarely stated. For example, the HomeFinder system [24] is designed to help people find homes in a particular area; however, the user is restricted to manipulating those parameters provided by the interface (cost, distance, etc.) and must explicitly give values for each of these parameters. For users who wish to use other selection criteria, or for whom the difference between a $140,000 cost and a $150,000 cost is irrelevant, the interface provides no help. Users must manipulate each interface widget one at a time. In effect, the system provides tools to solve a complex constraint problem and then steps back -- it is up to the user to operate the tools.

With agent interfaces, by contrast, we often want the agent to operate the tools for the user, in pursuit of which we need to be able to give the agent goal information. That is, we want to be able to express the desired result directly. Therefore, software agent interfaces often need to be able to communicate directly in goal language. An agent-like HomeFinder might take descriptions like "reasonably priced" and "near work" and translate them into ranges of values which would be fed into the constraint mechanisms to produce sets of solutions.
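
As a rough sketch of how such a translation might work, the agent could keep a small vocabulary mapping goal phrases to parameter ranges and then apply the resulting constraints to candidate listings. The vocabulary, field names, and value ranges below are invented for illustration; they are not part of the actual HomeFinder system.

    GOAL_VOCABULARY = {
        "reasonably priced": ("price", (100_000, 160_000)),
        "near work":         ("commute_minutes", (0, 30)),
        "lots of space":     ("square_feet", (1_800, None)),
    }

    def goals_to_constraints(goal_phrases):
        """Translate descriptive goal phrases into (field, low, high) constraints."""
        constraints = []
        for phrase in goal_phrases:
            field, (low, high) = GOAL_VOCABULARY[phrase]
            constraints.append((field, low, high))
        return constraints

    def matches(listing, constraints):
        """Check a candidate listing (a dict of fields) against every constraint."""
        for field, low, high in constraints:
            value = listing[field]
            if low is not None and value < low:
                return False
            if high is not None and value > high:
                return False
        return True

    listings = [
        {"price": 140_000, "commute_minutes": 25, "square_feet": 1_900},
        {"price": 210_000, "commute_minutes": 15, "square_feet": 2_400},
    ]
    wanted = goals_to_constraints(["reasonably priced", "near work"])
    print([l for l in listings if matches(l, wanted)])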

Often agents will function in the background or more or less continuously as the user does her work. In order to help users understand the capabilities of an operating agent, the interface designer may wish to show something of the state of the agent's processing; this is even more important in situations where the user may be waiting for the agent to return a result. In many cases, a simple vocabulary can serve to convey all the information necessary. For example, in one calendar-management agent system [7, 11] a simple graphical vocabulary of fewer than a dozen icon/word combinations was used to let users know all the different states and actions of the system. Two of these combinations are shown in Figure 3. In fact, the agent had a large number of possible states; however, the designer realized that most of these states were not relevant to helping users with the scheduling task. Had all of the states been revealed, users would likely have been confused by differences that did not, in fact, make a difference. By showing only the important state differences, users' understanding was enhanced. Understanding comes from a careful blend of hiding and revealing agent state and functioning.

working face; suggestion face
Figure 3: Two Agent appearances and meanings (from Kozierok/Maes)

Finally, there is the question of how a user comes to understand all that an agent can do. This is, of course, a similar problem to that faced by designers of conventional user interfaces. Even relatively simple programs can have hundreds or thousands of possible functions, actions, shortcuts and so on. Studies have repeatedly shown that even experienced users take advantage of only a small fraction of these choices. An increasingly popular way to address this situation is for an application to show a 'tip' or 'hint' on start-up. These tips, such as the one from Quicken shown in Figure 4, are often phrased as suggestions to the user -- e.g. "Did you know..." These suggestions introduce functionality -- such as keyboard shortcuts -- which may not be obvious from the interface.

quicken help tip
Figure 4: Quicken start-up tip

An agent user interface can use this technique and improve on it. Currently the tips shown by applications appear without context -- usually only at start-up and almost always without any relationship to the user's task. An agent, equipped with the ability to observe the user's interactions with the application, can interpret what the user is doing and suggest functionality appropriate to the current task and context. For example, many systems which use learning-by-example [5] styles of interaction observe repeated sequences of user actions and -- once an example sequence has been seen -- can offer to complete the sequence as soon as the user begins another iteration of the same steps. Of course the timing and method used in these offers is important; we will revisit this under Issue 4, Distraction. Another application of these techniques can be seen in agent systems such as Open Sesame from Charles River Analytics [3]. This system uses a hybrid neural network/expert system to detect patterns in user actions and make inferences about preferred user choices from these patterns. As soon as a pattern is detected, the agent offers to automate it.
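
The heart of this kind of learning-by-example behavior can be sketched quite compactly. The following is a minimal, assumed illustration (not the algorithm of any particular system mentioned here): the agent records the stream of user actions and, when the most recent actions match the beginning of a sequence it has already seen, offers to carry out the next few steps.

    class SequenceWatcher:
        def __init__(self, min_prefix=2):
            self.history = []            # every action observed so far
            self.min_prefix = min_prefix # actions that must repeat before offering

        def observe(self, action):
            """Record an action and return an offer to complete a known sequence, if any."""
            self.history.append(action)
            prefix = self.history[-self.min_prefix:]
            if len(prefix) < self.min_prefix:
                return None
            past = self.history[:-self.min_prefix]
            for i in range(len(past) - self.min_prefix + 1):
                if past[i:i + self.min_prefix] == prefix:
                    # Offer the next few steps that followed this prefix last time.
                    remainder = past[i + self.min_prefix:i + self.min_prefix + 3]
                    if remainder:
                        return remainder
            return None

    watcher = SequenceWatcher()
    for act in ["open", "copy", "paste", "save", "open", "copy"]:
        offer = watcher.observe(act)
        if offer:
            print("Agent offers to continue with:", offer)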

In addition to introducing new functionality, agent software must often help the user understand what actions it has taken or why a particular action was taken or recommended. For example, a calendar management system's operation may be quite clear if it does not schedule two meetings for overlapping times. However, if the calendar agent suggests a particular time for a meeting, the user may want to understand why that time was suggested rather than other open slots. If the agent can offer an explanation in domain-specific terms, the user will be able to understand better why things happened. In our calendar example, the agent should explain that it selected Wednesday morning for the meeting because it has observed that meetings held at this time are almost never postponed, whereas meetings scheduled for late Tuesday afternoon are often postponed. Internally, it may be the case that the agent derived this from a combination of its rules, but a reference to those rules is less relevant to the user than a reference to the likelihood of a meeting being postponed.

Often information which was key to the agent making a decision can be shown concisely in an interface. For example, a news-filtering agent can highlight key terms or phrases in the article when asked to explain why a particular item was selected for viewing.

In summary, we make the following recommendations for helping users understand agents' capabilities:

  1. use task-specific vocabulary;
  2. keep descriptions of the agents' states simple;
  3. allow the agent to make functionality suggestions at appropriate times and in a gradual way.

Issue 2: Trust

Agents are useful to the degree that users can trust them to perform an understandable task autonomously, without repeated, direct instruction and constant supervision. Trust is usually thought of as an interpersonal concept. That is, we as humans have greater or lesser degrees of trust in other people. In fact, though, we already extend great trust to our software systems. We must do this in any situation in which the actions of the software are opaque to us. For example, we trust that the input we type to forms will be passed to the application as we typed it, not in some altered form; we trust that spreadsheets will correctly calculate the formulas we create. This trust is so complete that when it is not met -- as in the case of the recent Intel chip mathematical flaw -- the event is often industry or national news. Agent software will, of course, require this form of trust. However, agents will go beyond this level, into areas where trust in software is far less universal. In this area, the actions of software have real-world consequences which are intimately personal to the users. Software of all kinds has been used for these sorts of applications for years and, of course, there have been failures [18].

Agents differ from this second class of software only in that the trust is more personally established. As noted in the introduction, one of the important features of agent interfaces is their degree of personalization to the user they are assisting. To the degree that the agent is able to better adapt to the user's pattern of work, the user will feel any errors made by the agent more personally and will have a harder time trusting the agent. This is a well-known psychological phenomenon: if a software error causes a train crash, then the average user is unlikely to be personally affected and so tends not to pay too much attention to the problem. If, however, agent software fails to record a meeting properly and a sales opportunity is lost, then the user of that agent is highly unlikely to trust the agent, even as he steps onto another train in the same subway system where the crash occurred. This difference in human reaction is constantly faced by designers of physical systems, such as high-voltage power lines, which are perceived to be risky. Agent interface designers must also address the perception of risk if agents are to be useful. A trusted agent may be permitted by users to make travel arrangements, filter incoming news and email streams, and in general take a number of actions with real-world consequences. An untrusted agent will likely not be permitted to take these actions and so may become less than useful.

With this understanding, we can see that trust must inevitably be a part of the agent interface, whether or not the designer addresses the issue directly. We feel it is better to tackle the issue directly. The designer should try to answer two key questions:

  1. How can users express evolving trust?
  2. How can users know what their trust translates into, in terms of agent actions?

Trust is an evolving state that grows gradually as the agent demonstrates itself more capable (just as with people). An agent can be set up to gradually increase the number of things that it attempts to do; this is one advantage of a learning agent: its initial capabilities are weak but as it gets more use it grows more useful. A good example of this is Lieberman's Letizia web-browsing assistant [10], which learns a model of its user's interests by watching what web pages are browsed and deriving relevant key terms from these pages. As the number of pages seen goes up, so does Letizia's ability to correctly suggest the most relevant next pages to be browsed.

Even without a sophisticated learning algorithm, it is possible to introduce agent capabilities gradually. For example, imagine an information-filtering agent which can scan thousands of Usenet newsgroups and produce a customized list of articles to be read by the user. One possible interface for such an agent would be to start up with a list of newsgroups to be filtered and have the user select a subset of them; the agent would then offer its list of selected articles for reading in place of these groups. This interaction requires that the user have a high level of trust in the agent initially, which may not be realistic. A better design, from a trust point of view, might be to have the agent initially observe the user's reading habits. It might -- at first -- offer to filter those newsgroups in which there are a large number of articles, of which the user only reads a few. This low-signal/high-noise situation maximizes the usefulness of the agent while minimizing the effects of missed articles. Here `missed' means an article which the agent erroneously filters out which the user would have preferred to read. By offering the most benefit for the least risk, the agent can promote a user's growing trust.

An even better solution would be to have the agent re-order the news articles, placing the ones it thought the user would be most interested in at the top of the list. The agent could add some kind of visual marker (such as a thick horizontal line) at the bottom of the articles it suggests reading. This way, the user is able to most quickly see which items have been selected while at the same time not being cut off from access to any of the other articles he might choose to read. Once again, this design takes advantage of the agent's ability to maximize benefit, while promoting trust by allowing the user to compare directly the agent's cut-off point with his own and reducing the risk of missed articles.
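
A minimal sketch of this re-ordering design follows; the scoring function, threshold value, and article titles are invented for illustration. The important point is that the divider only marks the agent's suggested cut-off -- every article remains visible and available.

    SUGGEST_THRESHOLD = 0.6   # illustrative value; any interest scorer could set it

    def present(articles, score):
        """Print articles in descending order of estimated interest, with a
        divider marking the agent's suggested cut-off point."""
        ranked = sorted(articles, key=score, reverse=True)
        for i, title in enumerate(ranked):
            print(title)
            nxt = ranked[i + 1] if i + 1 < len(ranked) else None
            last_suggested = score(title) >= SUGGEST_THRESHOLD and (
                nxt is None or score(nxt) < SUGGEST_THRESHOLD)
            if last_suggested:
                print("-" * 40)   # everything below is still shown and selectable

    scores = {"Comet spotted over Boston": 0.9,
              "New compiler release": 0.7,
              "Flame war continues": 0.2}
    present(list(scores), scores.get)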

Another approach to this kind of problem would be to have the agent suggest what it would do and allow the user to approve or disapprove the suggestion before any action is taken. For example, Cypher's Eager system [4] searched for the interface objects it expected the user to select next and pre-highlighted them. The user could accept the selection or override it and select something else if Eager had incorrectly anticipated his desires. The Open Sesame system also takes this approach. Although having suggestions which the user may or may not approve adds another step, and thus possibly an inefficiency, to the interface, the extra effort may be worthwhile if the interface designer expects her users to be unlikely to trust the agent. When users are able to see in advance what the agent would do, they are more likely to trust that agent to do the right thing.

One of the problems of delegating any task, whether it is to a human or to a computer agent, is that the task is likely not to get done in precisely the same way it would be done if the person did it herself. Methodological differences can have a large impact on trust; if I see someone doing a task a different way than I do it -- and do not understand that this different method will lead to the same (or better) results -- then I am unlikely to trust that the task will be done. Therefore, it may be advantageous if the agent performs the task initially in the same fashion as the user performs it, even if a more efficient method is available. Once the user has gained confidence that the job will be done, the agent may switch to the more efficient method.

Users' growing trust in an agent can be measured by the degree to which they accept the agent's suggestions. Untrusted agents' suggestions will not be adopted; trusted agents' ideas will more often be taken. It is also possible to allow users to express their trust in the agent directly. One simple way to do this is, as in the earlier situation of the learning-by-example agent, to have the agent offer to take over individual actions that the user is performing. However, in an application with many similar or nearly identical actions -- such as filtering newsgroups -- this could be excessively distracting or inefficient. A better approach might be to allow the user to state a general trust level and have the agent use that information directly. Many algorithms used to generate options for an agent's actions can also generate confidence measures (usually on a real scale from 0.0 to 1.0) indicating how likely the action is to be successful. Users may wish to disallow the agent from taking actions in which it is not highly confident; for example, setting a threshold of 0.8. Any actions for which the algorithm generates a confidence rating below the threshold are not taken. If the confidence ratings are meaningful to the users, this approach may be ideal. For less expert users -- who may not grasp the difference in the application between a 0.7 and a 0.8 confidence -- a better way to approach this might be to allow users to state their confidence level in real-world terms. Users might be able to express that they are 'unsure' of a new agent or 'highly confident' in software they have been using for some time.
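
For illustration, such real-world trust terms could be translated internally into confidence thresholds. The labels and threshold values below are assumptions made for this sketch, not settings taken from any of the systems described here.

    TRUST_LEVELS = {
        "unsure":           0.95,  # act only when the agent is nearly certain
        "fairly confident": 0.80,
        "highly confident": 0.60,
    }

    def agent_may_act(trust_label, action_confidence):
        """True if the agent's confidence clears the user's stated trust level."""
        return action_confidence >= TRUST_LEVELS[trust_label]

    print(agent_may_act("unsure", 0.85))            # False: suggest instead of acting
    print(agent_may_act("highly confident", 0.85))  # True: act autonomously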

Another possibility is that the application domain will offer natural divisions which can be used to demarcate increasing levels of confidence. For example, agents which deal in a marketplace application might be allowed to make deals autonomously with real-world values of less than $10 but required to get user approval for more valuable transactions. Alternatively, the agent might be permitted to handle the negotiations part of any transaction no matter what the value, but would be required to get user approval before any goods or money actually change hands.
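
A sketch of this kind of value-based limit might look as follows; the $10 figure comes from the example above, while the function names and behavior are illustrative assumptions.

    AUTONOMY_LIMIT = 10.00   # the agent may close deals below this value on its own

    def negotiate_and_close(item, price, user_approves):
        """Negotiation is always delegated; the exchange itself may need approval."""
        terms = f"{item} at ${price:.2f}"   # stand-in for a real negotiation step
        if price < AUTONOMY_LIMIT:
            return "completed autonomously: " + terms
        if user_approves(terms):
            return "completed with approval: " + terms
        return "held for the user: " + terms

    print(negotiate_and_close("tote bag", 4.50, user_approves=lambda t: True))
    print(negotiate_and_close("watch", 35.00, user_approves=lambda t: False))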

As noted in the Introduction, one of the important activities that goes on between an agent and a user is a mutual modeling effort. Understanding, as described above, is largely about the user's model of the agent: its capabilities, its methods of operation, and so on. The agent's model of the user is important for the Trust issue; specifically, if I am to trust the agent to do what I ask, I need to know that it has a good enough model of my tastes/preferences/desires to perform acceptably. An agent which can show (and possibly explain) its model of the user is more likely to be trusted than one which hides this model. Often the model is built up by direct user input, such as answering questions in a survey or selecting items from a list of options. In these cases it is a simple matter to make an interface screen which allows the user to review her inputs and possibly change them. The agent can then adapt its model of the user to the new input. In other cases -- for example the URL Minder agent offered by Netmind corporation [17] -- the user may have given preference-selecting input over a possibly long period of time. In these cases, a screen which summarizes the input accumulated over time and which allows the user to have the agent discard old input which is no longer valid can be a good way to expose the model.

In summary, we offer the following recommendations for enhancing users' trust in their software agents:

  1. allow trust to evolve;
  2. focus on high-benefit/low-risk activities for agents to begin with;
  3. consider ways for users to express trust directly;
  4. make the agent's model of the user open to examination/change.

Issue 3: Control

Control is, in many ways, the dual of trust. In extending trust to the agent, the user may give up some amount of control -- specifically, the level of control needed to execute the specific actions which the agent is trusted to perform -- but it is important to realize that this is a matter of degree. That is, in using any piece of software, users must give up some control; what is important is that control not be taken from users involuntarily.

Control issues are always at the heart of software agent design for two reasons. First, an agent must be able to operate to some degree autonomously. If the user is required to direct every operation that the agent performs (or pre-script the actions in great detail) then the agent is much less useful. Agent interfaces are inherently delegative -- the user turns things over to the agent to be done rather than doing them herself.

Second, agents must be able to be instructed, or have tasks delegated, at an appropriately high level of detail. Agents operate best when users can specify what the agent is to do, and not have to worry about how it gets done. An easy way to understand this is by analogy to a real-world example. Let's say I call a taxi and ask to be taken to the airport. What matters is that I get to the airport within some 'reasonableness' constraints: the trip should not take too long, nor cost too much, nor be too unsafe. But as long as these reasonableness constraints are obeyed, I am happy to give up control over the driving to the cabbie. In this way, the driver is able to act as my agent for this interaction.

More specifically, we can take advantage of work done on task analysis to address control issues. In any complex task environment there are inevitably points which are more important than others -- places where key decisions are made or where widely varying alternatives must be sorted out. One way to help address the control issue in agent interfaces is to allow the agent to proceed in execution of a task until one of these key points is reached. At these points, the human user is kept (or brought back into) the loop.

Being "in the loop" in this sense means more than just clicking an 'OK' button or answering yes or no in a pop-up window. The user may not realize where in the process the agent is, or how it has gotten to this point. Users must be given the information they need to understand what the agent is asking them to do and how it came to ask this question. Users who do not have meaningful decision-making capability cannot be considered to be in the loop. This implies that the agent should be able to suspend its operation at these points at least -- if not at any point -- since users may not be available or may not have the attention at the time to consider the decision. If this is not possible due to the nature of the task -- perhaps a transaction that, once begun, must be completed within a specified time -- the user should be warned in advance. In some cases the user will be able to give up the appropriate level of control and the agent can proceed to complete the task when the time comes. In other cases, however, this is not possible and the agent should not even begin the task. Nothing makes users feel more out of control than having to 'chase after' a software process which continues on despite their stated intention to the contrary!

One way to help users feel that their agents are not out of control is to allow agent and user actions to be interchangeable. As in the discussion of how tasks get done above, this may involve the agent using a less efficient method to accomplish the task. However, if the user has Pause and Resume buttons available as part of the agent interface, she is much more likely to feel in control. The agent designer must realize that while the agent is paused the user may take some of the steps herself; therefore, when the agent resumes operation it should check the state of the task and make use of any partial work which the user may have done. In some cases, actions taken by the user can be seen as a form of instruction from the user to the agent.
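
The following sketch illustrates this idea of interchangeable user and agent actions; the class, the task steps, and the state checks are invented for the example. On Resume (or on any run), the agent re-checks the task state and skips steps the user has already completed herself.

    class PausableAgent:
        def __init__(self, steps):
            self.steps = steps        # ordered list of (name, already_done, do)
            self.paused = False

        def pause(self):
            self.paused = True

        def resume(self, task_state):
            self.paused = False
            self.run(task_state)      # re-check the task state before continuing

        def run(self, task_state):
            for name, already_done, do in self.steps:
                if self.paused:
                    return            # stop cleanly; the user may take over here
                if already_done(task_state):
                    print("skipping step the user already did:", name)
                    continue
                do(task_state)
                print("agent performed:", name)

    state = {"drafted": True, "addressed": False}   # the user drafted the reply herself
    agent = PausableAgent([
        ("draft reply",   lambda s: s["drafted"],   lambda s: s.update(drafted=True)),
        ("address reply", lambda s: s["addressed"], lambda s: s.update(addressed=True)),
    ])
    agent.run(state)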

In general, instructing an agent can be made quite simple. If the agent needs to learn a procedure which the user performs, it can either observe the user's actions, detect patterns, and take over the repetitive activity, or it can be put into a 'learn' mode, where the user demonstrates a procedure and the agent records what has been done. The key to this form of learning is the ability of the agent to generalize from the specific examples to an appropriate class of actions. If the agent over-generalizes, the user may start to feel out of control; if it does not generalize enough it will not be useful. In other cases, the agent needs to learn which properties of situations or elements of input are relevant. To do this, the user can often point out the specific important items. For example, in the NEWT news-filtering agent [21], the user interface allowed readers to highlight words or phrases within articles and click a '+' or '-' button. These buttons provided direct positive and negative feedback to the agent on the usefulness of the highlighted items and caused the agent to improve its filtering. This feedback allowed users to have more direct control over the agent's training (and its model of the user's preferences) than simply approving or disapproving articles wholesale, although this latter option was available.
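
A minimal sketch of this kind of direct relevance feedback follows. The weight-update scheme shown is an assumption made for illustration, not NEWT's actual algorithm: '+' feedback raises the weight of the highlighted terms, '-' feedback lowers it, and articles are scored by summing the learned weights.

    from collections import defaultdict

    class TermProfile:
        def __init__(self, step=0.1):
            self.weights = defaultdict(float)   # term -> learned preference weight
            self.step = step

        def feedback(self, highlighted_terms, positive):
            """'+' raises the weight of each highlighted term, '-' lowers it."""
            delta = self.step if positive else -self.step
            for term in highlighted_terms:
                self.weights[term] += delta

        def score(self, article_terms):
            """Sum of learned weights over the terms appearing in an article."""
            return sum(self.weights[t] for t in article_terms)

    profile = TermProfile()
    profile.feedback(["agents", "interface"], positive=True)   # user clicked '+'
    profile.feedback(["flame", "spam"], positive=False)        # user clicked '-'
    print(profile.score(["agents", "interface", "design"]))    # positive score
    print(profile.score(["flame", "spam"]))                    # negative score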

In some cases, the users may be more expert and may wish to have a form of programmatic control over the agent. An agent interface can be augmented with a special-purpose scripting language to allow this, provided that the interface designers believe their users will be able to understand and use the language. For example the OVAL system [14] built by Malone and his students allows users to program agents with rule sets -- essentially IF... THEN... statements which fire off system actions such as invoking another agent, starting a notification program, or saving a message to a particular folder. The agents run these rules through an interpreter provided with the system to decide how to filter the user's incoming mail.
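
A rule interpreter of this general kind can be sketched briefly; the rule format and actions below are illustrative assumptions and are not the actual OVAL rule language.

    def make_rule(condition, action):
        return {"if": condition, "then": action}

    rules = [
        make_rule(lambda msg: "urgent" in msg["subject"].lower(),
                  lambda msg: print("Notify user:", msg["subject"])),
        make_rule(lambda msg: msg["from"].endswith("@lists.example.org"),
                  lambda msg: print("File under 'mailing lists':", msg["subject"])),
    ]

    def run_rules(message, rules):
        """Fire the action of every rule whose condition matches the message."""
        for rule in rules:
            if rule["if"](message):
                rule["then"](message)

    run_rules({"from": "info@lists.example.org", "subject": "Digest #42"}, rules)
    run_rules({"from": "boss@example.com", "subject": "URGENT: budget figures"}, rules)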

Unfortunately, experience with Beyond Mail, a commercial product based on these ideas, has shown that many users either do not use the rules at all, or do not use them appropriately. This seems to indicate that end-user programming of agents is likely to be useful only for the most experienced of users. This can have unintended side effects: the presence of a programming language which the user realizes he does not understand can make him feel even more powerless. The attempt to make the agent more open and controllable may in fact result in users feeling like they are less in control by reinforcing their lack of knowledge.

Another important aspect of control that interface designers must consider is the degree to which the agent is perceived to be asserting its importance over the user. To a large degree this depends on the kinds of actions taken by the agent, though as we will discuss under the final issue, personification, the way in which the agent presents its actions can also have an influence. For now, though, we will look at kinds of actions. In general, we recommend that agents initially begin not by taking actions but by suggesting actions or offering to do actions, as described above. This suggest/approve/act paradigm may seem inefficient (and may cause more disruption than desired, which we address below) but it gives the user the maximal feeling of control. Of course, as the user is willing to relinquish more control, the agent can take more direct actions, particularly in situations identical to ones where the user has previously given explicit approval.

An effective way to give users control is to open up some of the parameters of the agent to user input. For example, learning algorithms often have settings for how quickly old information is discarded as no longer relevant. An agent which uses keyword matching to track interests may be programmed to omit keywords which have not been used in more than 6 months. In cases where such parameters are simple it is possible to present them in the interface and allow the user direct control over tuning the agent -- its rate of learning or forgetting, the importance it places on unique or rare examples, and so on. In other cases, the parameters themselves may be too complex to allow manipulation but the algorithm may be `tuned' by a combination of settings. For example, information filtering algorithms often trade off the number of articles found in which the user is interested against the total number of articles shown. At one extreme, the agent can show all articles and be sure that all articles of interest are shown. However, this is usually undesirable. In order to reduce the number shown, the agent must risk missing some articles which would have been of interest to the user. Although the parameters of the algorithm making this tradeoff may not be intuitive, it would be easy to provide an interface which allowed users to control how cautiously an agent behaved. The interface could translate inputs into parameter settings for the algorithms to tune it for the desired behavior.
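
For illustration, a single 'caution' control might be translated into underlying parameter settings roughly as follows; the mapping, parameter names, and numeric ranges are assumptions made for this sketch.

    def caution_to_parameters(caution):
        """caution: 0.0 (filter aggressively) .. 1.0 (avoid missing anything)."""
        return {
            # A cautious setting keeps the score cutoff low, so fewer articles
            # are dropped, at the cost of showing more uninteresting ones.
            "score_cutoff": round(0.9 - 0.7 * caution, 2),
            # It also keeps old interest keywords in the user model longer.
            "keyword_max_age_days": int(90 + 275 * caution),
        }

    for setting in (0.0, 0.5, 1.0):
        print(setting, caution_to_parameters(setting))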

In closing out our discussion of the issues of trust and control, it is important to stress a respect for individual user differences. Some users rapidly learn to trust their agents and will want to turn over control to them quickly. Other users start out more cautious, preferring to verify the results of agent actions several times before coming to accept that control can be safely turned over. Still others are always cautious and always want to be asked for approval. Informal user testing by Maes and her graduate students with the MAXIMS [13] agent showed all three of these behavior patterns. Given this wide variation in user reaction to potential loss of control, we recommend that agent interfaces be built to allow users to gradually change their level of control, rather than having a specific level or method built in. In many situations, this can be as simple as allowing each user to run a separate copy of the agent; each agent models the user for whom it works and adapts directly to that user's preferences, including his preferred control level.

Trust and control are interrelated issues; as users increase their trust in their agents they will be willing to give up more control. Interface designers can take steps to help users continue to be in control to the degree that they desire, by modeling the user appropriately, and by ensuring that the user remains in the loop for critical decisions that must be made. Trust can be extended slowly, often by use of an explicit scale or limit provided in the agent's interface; agents can affect users' trust levels by concentrating first on those cases where the return is highest and the consequences of an agent error are lowest. As Shneiderman [20] points out, one of the key reasons why we must pay close attention to issues of trust and control is that agents may be responsible for actions they take on someone's behalf which have consequences, including potentially harmful consequences. In the American precedent-based legal system there are currently no guiding decisions on who is responsible for actions taken by software of any kind. For example, if an accountant uses a spreadsheet with a bug and therefore prepares an erroneous tax return, it is unclear to what degree the spreadsheet software company may be held responsible. Thus, most software is sold with explicit disclaimers of responsibility designed to shield the software manufacturers from consequences of what people do with their software. Still, in some court cases the software manufacturer has been held responsible -- or legally liable -- particularly when the software does not behave as expected. In other court cases, particularly when the software behaves as intended, only the user has been held to be responsible. If agent software, acting autonomously, appears to be out of control it introduces a further complication into the picture. By keeping users in control of their software agents, we help responsibility rest with the users, though no interface designer can predict how the contradictory court cases will play out in the end. As with the unstoppable actions discussed above, actions with potentially serious consequences require a higher level of attention in the interface.

Our control-related recommendations are:

  1. allow variable degrees of control;
  2. give users the highest-feasible level of control specification;
  3. help users understand the agent's model of the user;
  4. keep users meaningfully in the loop, particularly for key decision points;
  5. begin with suggestions and move to actions later.

Issue 4: Distraction

Distraction is a consequent issue, derived from the preceding three issues. In particular, distraction is that part of usability which is most often emphasized in agent interfaces. As with any computer software interface, designers must pay attention to standard usability issues such as screen layout, color and font choice, and so on. However, because agent software operates to some degree autonomously, it often needs to initiate communication with the user, unlike other interfaces which often communicate with the user only in response to input. This initiation by the program may be audio, visual, or a combination of both. In any form, though, a change in the user interface which is not the obvious result of user input will likely attract the user's attention. If that attention was focused on getting a job done, the communication from the agent will likely be seen as a distraction.

Distraction is most often a problem because it disrupts the user's normal work pattern. Sometimes disruption occurs without the agent ever being visible -- for example, if your agent is consuming all your system resources so that your other programs cannot run then that clearly disrupts your work. However, most of the time we are concerned with more overt disruptions. These disruptions take place at the user interface, most often in the form of a pop-up appearing or other change in the interface of the agent.

Often these changes come from an effort by the interface designer to handle one of the first three issues we outlined: Understanding, Trust, and Control. In designing for the challenges of each issue, the solution seems to be more communication with the user, more information given to the user, and more input from the user. Each of these seems to require more and more frequent interruptions of the user's tasks in order to make the communication happen. While we agree with the ideal of having agents communicate useful information to the user, for all the reasons outlined above, the issue here is how to approach this ideal while still minimizing disruption. As interface designers, we have to remember that the user's goal is not to interact with the agent, but to have the agent help in completing a task. If the time and effort saved by the software agents is not significantly greater than the time spent training, customizing and interacting with them, then the agents will not be useful and will not be used.

There are four factors which designers can use to minimize disruption by reducing the number and level of interruptions. The first is simply using the least obtrusive possible design for informational messages which are deemed necessary. In some cases, the agent may simply change appearance or place a small flag on the screen and wait for the user to click or otherwise accept the notification before displaying it. The most obvious way to display a notification may not be the best.

The second way to minimize disruption is to differentiate beforehand which notices are the most important. Many less-important communications from the agent can be delivered unobtrusively, usually by email. Interface designers sometimes make the mistake of thinking that -- because the agent is running at its own pace -- messages and interrupts should be delivered and acted upon as they occur; this ignores the fact that it is often easier (and better interface design) to have the agent wait until the user can be safely interrupted, even though this delays the agent's task completion.

If there is no easy way to differentiate important from unimportant events a priori, it may still be possible to have users directly control the level of interaction they want, either by designating certain classes of messages as important, or by specifying a level. This third factor requires the user to do some thinking beforehand and may require the user to become familiar with the kinds of information that the agent can present. As with any form of user learning, if this is likely to be too difficult, or the interface designer cannot expect the user to learn this before interacting with the system, then some other method must be provided for users to give this input. As we discussed under Control, it may be effective enough to simply provide the user with a widget and allow her to specifically set a notification level based on some aspect of the agent's state, such as its confidence in a suggestion.

For example, the widget of Figure 5 (adapted from the Meeting Scheduling Agent described by Maes in [11]) gives the user simultaneous Control and Distraction options via dual sliders. One slider sets a "tell me" level, by which the user asks the agent to give him suggestions or information whenever the agent's confidence in a pending proposal is higher than 0.5; the other slider allows the agent to act only when it is more than 90% confident in the action it has chosen. A setting of 1.0 for either slider would prevent the agent from sending notifications or taking actions, respectively, allowing the user to remain ultimately in control or completely uninterrupted if she so desires.
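
The policy behind such a dual slider can be sketched directly; the class and method names here are assumptions, but the threshold behavior follows the description above.

    class DualThresholdPolicy:
        def __init__(self, tell_me=0.5, do_it=0.9):
            self.tell_me = tell_me   # suggest when confidence is above this level
            self.do_it = do_it       # act autonomously above this level

        def decide(self, confidence):
            if confidence > self.do_it:
                return "act"
            if confidence > self.tell_me:
                return "suggest"
            return "stay quiet"

    policy = DualThresholdPolicy(tell_me=0.5, do_it=0.9)
    for confidence in (0.3, 0.6, 0.95):
        print(confidence, "->", policy.decide(confidence))
    # Setting either slider to 1.0 disables that behavior entirely, since
    # confidence values never exceed 1.0.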

slider with two controls
Figure 5: Dual-control slider

Finally, the fourth way to control the level of distraction comes from task analysis. If the agent is aware of the structure of the user's task then it can adapt its level of communication to the importance of the current operation. If the user is engaged in a more critical phase of work then the number of interruptions should be minimized; in other cases more disruption can be tolerated. Of course, this requires a good deal of up-front effort by the agent designers and programmers but in some cases giving the agent a large knowledge base from which to work is the only way to have it behave usefully in the target environment. For example, an agent which assists in selling goods would do well to know the difference between casual and time-critical goods offered for sale. An old PC may be sold casually over many days; a basket of fresh produce must be quickly dealt with.

As noted, the goal of an agent interface designer dealing with the Distraction issue is to minimize intrusion into the user's work environment. Humans are still evolving conversational and social protocols to handle these problems among themselves. For example, we learn which of our coworkers can be interrupted when they are typing or when their office doors are half-closed, but we learn it haltingly, with much trial and error. Agents will probably never be better than humans at this; however, since humans do achieve acceptable solutions to this problem, it does suggest that it might be worthwhile teaching our agents the human rules of convention for when interruption is permitted. Eventually, agents may learn rules of human conversation and turn-taking, such as nodding, using back-channels, and so on. The Ph.D. work of Kris Thorisson from the MIT Media Lab [22] points in this direction.

Distraction from, or disruption of, a user's work can be minimized by having the agent:

  1. learn when interruptions are appropriate;
  2. reduce or eliminate overt interruptions such as pop-ups;
  3. use unobtrusive notifications (e.g. communicate via email);
  4. allow users to initiate communication.

Issue 5: Personification

If agents are going to take advantage of human-like social paradigms, this raises issues of how person-like an agent needs to be. In most cases, the agent not only does not need to be person-like, but may not even need to be directly represented. In fact, most commercial agents today do not use any visible representation; however, as we discuss below, personification may offer some advantages and may be seen more often in the future.

For our work we separate the issues of anthropomorphism from personification. Anthropomorphism is the use of graphical and/or audio interface components to give the agent a human-like presence at the interface. Interesting work on anthropomorphic interfaces is being done by a number of researchers such as Walker and Sproull [23]. A presence at the interface, no matter how un-human-like, is often called "the agent" even though the programs which do the real work of assisting the user are quite separate from the graphical animation routines which control the screen appearance. Designers should be careful about misleading customers in these cases. It is also possible to develop interface agents beyond simple appearance and endow them with a form of personality or behavior. For example, work done by Brenda Laurel [9] and by Tim Oren [19] has explored ways in which people's expectations of social interactions can be made to work in favor of the interface. Their work has largely involved the use of human-like or human-appearing agents. Conversely, personification is the tendency to assign human-like characteristics, such as emotions and sophisticated planning capabilities, to non-human animals and objects. The root of these assignments is often called intentionality -- that is, we speak of things as though they could form intentions.

This intentional stance [6] is what allows us to say things like "the printer doesn't like me" or "this car doesn't want to go where I steer it." In neither case does the speaker truly believe that the object being spoken of has likes or wants in the sense that human beings have them. However, this language provides conversational shortcuts for describing problems or situations, with the expectation that our listeners will understand what we mean. People use models of things they know to explain the actions of things which seem novel or otherwise unexplainable. Human quirkiness is well known and serves as a good model for the unpredictable behavior of devices. We believe that these are natural tendencies not specific to software systems of any kind. We take it to be inevitable that people will personify agents to some degree; humans are notorious for personifying everything from their pets to their possessions. The issue for the interface designer is not whether or not the agent should be personified, but to what degree this natural tendency of users should be worked with and to what degree it must be fought. These are not simple issues, nor do we have space to explore them all here. People such as Clifford Nass at Stanford [16] have been looking at the questions raised by treating computer systems as partners in social interactions; their results appear to show that the rules we are used to applying to human-human interactions can be translated relatively unchanged to human-computer interactions, at least insofar as human attitudes and perceptions are concerned.

The tension over personification revolves around two opposing poles. On the one hand, we can use the naturalness of the intentional stance to improve our agent interfaces; on the other hand, we must take care that -- to the extent that we allow or encourage human-like interactions with users -- we do not mislead them into thinking that the system is more capable than it truly is. Often the differences can be quite subtle; personification can creep into interfaces without the designer's intention, and many designers can tell stories of users who were misled into overestimating even simple system capabilities.

For example, imagine that the user selects a Save command in a standard text editor. The message "File Saved" might appear on completion of the command. However, this raises the question of who is responsible for the file being saved. Did the computer save it? Did the editor program? Did the user? Critics of agent technology, such as Jaron Lanier [8], have argued that agents disempower users by obfuscating issues such as who is responsible for actions being taken in the software. However, we see these issues even in ordinary interfaces; agent software did not create the human tendency to personify, nor did it originate ambiguity in the interface. When we see actions take place, we naturally look for causes, but the causes may not be obvious. For example, the ultimate cause of the File Saved message may in fact be the person who installed the software and chose a setup preference which automatically saves the file every 15 minutes.

In agent software, it is important to consider what effect it may have on the user if we hide the causes for agent actions. Agents are more useful if they are able to act autonomously, based on the user's trust in turning over control of part of a task. To the extent that the user is able to think of the agent as an intentional entity we can encourage the delegation style of interface which characterizes good agent systems. However, the user must not think she is turning over this task to a human being, one who will be able to use all the common sense and intelligence we expect from people. We must personify agents in the way that cars or other familiar objects are personified, so that users are not misled. To this end, we recommend using an obviously non-realistic depiction for the agent, or none at all. In particular, there is a rich history in American culture of cartoon depictions of people that convey emotions, attitudes and intentions [15] without there being the slightest doubt that no real person is portrayed. Using these caricatures and abstract depictions we can help the user understand what functions the agent is able to perform and at the same time make our interfaces less hostile and more fun to interact with. Of course, it is possible to go overboard and end up with an interface that is silly or unintentionally ridiculous. The "Lifestyle Finder" experimental agent made by Andersen Consulting [1] plays with this by using the persona of "Waldo the Web Wizard." Here the joking and self-deprecating nature of the interface helps remind users that the system is a beta-test prototype; a production version of the same agent would probably require a completely different persona.

Waldo hunched over a crystal ball
Figure 6: Andersen Consulting's Lifestyle Finder Agent Image

dog waiting; dog excited
Figure 7: Idle and Notice Agent Images

The benefits of depicting an agent as a character in an interface revolve primarily around focus and helping users anticipate. If the agent software is making suggestions for actions and these suggestions are appearing more or less out of nowhere, it can be quite disconcerting for users; a ubiquitous but disembodied agent leaves them with no obvious place to direct their attention. Likewise, the agent may want to put up notifications, as discussed above. Unfortunately, the notifications may not be visible if they are not properly located, or if properly located may be too obtrusive. If there is a place on the screen where the user goes to interact with the agent, notifications can be placed there, perhaps by changing the appearance of the agent. For example, the two pictures in Figure 7 show different states of a cartoon character used in our interface personalization prototype (see below). The right-hand image is used when the agent has something it wants to communicate to the user. The difference is direct and obvious while at the same time the `notification' image takes up no more screen real estate than the other image. By keeping an eye on the agent icon, users can anticipate what the agent will do or say next, reducing the element of surprise.

In addition to providing a known place to check for notifications and suggestions, an on-screen representation can help users who are trying to control or direct an agent, as described above. It is a simple step to make the agent's on-screen appearance active, so that clicking on it with the mouse brings up help screens, instructional screens, and so on. From an interface design point of view this can be quite beneficial: the interface elements for controlling the agent are often orthogonal to the functioning of the application, and there is no good place to put them in an integrated interface without disrupting the application's design. If they are 'hidden' behind the agent's appearance they do not disrupt the interface design, yet they can be found by users who look for them in the natural place -- where the agent is. Even if the agent is not directly represented on-screen, the interface designer would do well to provide a central control point, such as a menu-bar item or button, where the user can look for all agent controls.
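
Such a central control point can be sketched just as simply; the menu entries and their handlers below are hypothetical stand-ins for the application's own help, instruction and control panels:

    # A sketch of a single agent control point. Clicking the agent's image
    # (or a dedicated menu-bar item) opens this menu instead of scattering
    # agent controls through the application's interface.
    AGENT_MENU = {
        "Explain last suggestion": lambda: print("Because you filed the last "
                                                 "three similar messages..."),
        "Adjust autonomy":         lambda: print("Opening autonomy controls..."),
        "Inspect user model":      lambda: print("Showing learned preferences..."),
        "Bypass agent":            lambda: print("Agent suspended for this task."),
    }

    def on_agent_clicked(choice: str) -> None:
        # Invoked by the GUI when the user clicks the agent and picks an item.
        AGENT_MENU.get(choice, lambda: print("No such control"))()

    on_agent_clicked("Explain last suggestion")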

If the interface is to take advantage of the intentional stance, it is often worthwhile for the designer to use deliberately intentional language. For example, an agent interface would do better to say "I will filter your email..." or "This program will filter your email..." rather than "Your email will be filtered..." It must be clear to the user that the 'I' refers to the agent and not to another user or person.
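
The contrast can be made concrete in the way status messages are composed; the function names below are ours and purely illustrative:

    # Intentional phrasing: the agent speaks in the first person about its
    # own actions, so the cause of the action is never in doubt.
    def agent_says(action: str, target: str) -> str:
        return f"I will {action} your {target}."       # 'I' is always the agent

    # Passive phrasing: grammatically fine, but it hides who is acting.
    def passive_says(action: str, target: str) -> str:
        return f"Your {target} will be {action}ed."

    print(agent_says("filter", "email"))    # I will filter your email.
    print(passive_says("filter", "email"))  # Your email will be filtered.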

In summary, the use of graphics and intentional language can allow the user to understand the program's capabilities as a coherent set of actions to be taken by a recognizable entity. This can encourage the user to engage in a more delegative style of interaction; however, designers must be careful not to mislead users into thinking a program is more capable than it really is. Even something as simple as using natural language sentences in interface dialogs may lead users to believe that a program can handle arbitrary natural language input.

Our personification recommendations are:

  1. direct representation of the agent is not a necessity;
  2. beware of misleading impressions; use representations appropriate to the agent's abilities;
  3. provide a focal point for users' attention and for their agent-related input.

Summary

One of the reasons we call our systems "agents" rather than simply "software" is to remind users that whenever they delegate something to be done by another, it may not be done the way they wanted. They have to understand what the other is capable of doing, trust that the task can be done within the applicable reasonableness constraints, give up a measure of control, and accept that sharing the workload with another necessarily imposes some communication overhead. When the `other' is a person, we can draw on thousands of years of social experimentation to help us cope with these difficulties. As software agents begin to be available to assist us, we find ourselves needing solutions to these old problems in new domains.

This paper has presented five of the most important issues in agent user interface design. They emerge from an ongoing research program, and there are no hard-and-fast answers to the problems they pose. Instead, we can only offer guidelines and suggestions about the choices that can be made and the factors that must be traded off in the search for a good answer. We believe these issues apply to all agent interfaces, though different systems will take different approaches to solving them.

These issues do not come in neatly separable packages; for example, personifying an agent on screen can have definite effects on its perceived trustworthiness, and some users find cartoon depictions, or any sort of graphical animation, excessively distracting. Solid principles of interface design can never be ignored; as we stated in the beginning, delegative interfaces are not a magic bullet for designers. There are also important issues in agent systems which affect the interface but are not directly part of it. For example, it is an open question whether a personalized agent should mimic the user's "bad" habits -- where bad may be defined by the user herself or by her environment -- or whether the agent should attempt to instill better habits. We hope that further experience with building agent systems will help us develop guidelines for these sorts of issues as well. In the meantime, based on our experience and research on agent interfaces, we offer the following Top 10 principles for agent interface designers:

  1. Make the agent's user model available for inspection and modification.
  2. Always allow the user to bypass the agent when desired.
  3. Allow the user to control the agent's level of autonomy.
  4. Use gradual approaches whenever possible (e.g. learning, scope of operation, severity of action); one such scheme is sketched after this list.
  5. Provide explanation facilities.
  6. Give concise, constant, non-intrusive feedback to the user about the agent's state, actions, learning, etc.
  7. Allow the user to program the agent without needing to be a programmer (e.g. manipulate levels, control learning/forgetting).
  8. Do not hide the agent's methods of operation from deliberate user inspection; conversely, do not force the user to understand them either.
  9. Communicate with the user in her language, not in the agent's language.
  10. Integrate the agent's suggestions and actions into the application interface to the greatest extent possible, rather than requiring the user to find separate windows or learn new controls.
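
As one illustration of principles 3 and 4, the sketch below (not taken from any of the systems described here) lets the user set two confidence thresholds so that the agent moves gradually from staying quiet, to suggesting, to acting on its own; it is reminiscent of the "tell-me" and "do-it" thresholds used in earlier learning-agent work [11]. All names and values are illustrative.

    # A sketch of user-adjustable autonomy: two thresholds, settable for
    # example by a pair of sliders, gate what the agent may do with a
    # prediction it has learned.
    from dataclasses import dataclass

    @dataclass
    class AutonomySettings:
        tell_me: float = 0.5   # above this, the agent offers a suggestion
        do_it: float = 0.9     # above this, the agent acts on its own

    def handle_prediction(confidence: float, action: str,
                          settings: AutonomySettings) -> str:
        if confidence >= settings.do_it:
            return f"ACT: {action} (and record it in the agent's activity log)"
        if confidence >= settings.tell_me:
            return f"SUGGEST: I could {action}. Accept?"
        return "STAY QUIET: keep learning from the user's own actions"

    settings = AutonomySettings()   # raised or lowered by the user at any time
    for confidence in (0.95, 0.70, 0.30):
        print(handle_prediction(confidence, "file this message under 'budget'",
                                settings))

Raising the do_it threshold makes the agent strictly less autonomous without retraining it, which is exactly the kind of gradual, user-controlled adjustment the principles call for.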

As users' computational environments become more complex, as computer products move into more consumer-oriented and more broad-based applications, and as people attempt to take on more new tasks, we find ourselves in desperate need of improvements to the interface, improvements which will enable users to accomplish what they want more efficiently, more pleasantly and with less attention to computer-imposed detail. Software agents are one way to, as Brenda Laurel [9] put it:

...mediate a relationship between the labyrinthine precision of computers and the fuzzy complexity of man.

In order for that to happen, we will have to find ways to build interfaces to software agents that maximize their potential while avoiding the worst of their risks. Precisely how we can do that is likely to remain a matter of debate for many years.

References

  1. Andersen Consulting makes a prototype of the Lifestyle Finder agent available at <URL: http://bf.cstar.ac.com/lifestyle/>
  2. Ball, Gene, et al. "Lifelike Computer Characters: The Persona project at Microsoft Research," Software Agents, Jeffrey Bradshaw, (ed.), MIT Press, 1996.
  3. Caglayan, Alper et al. "Lessons from Open Sesame!, a User Interface Learning Agent." First International Conference and Exhibition on The Practical Application of Intelligent Agents and Multi-Agent Technology, PAP, Blackpool, Lancashire, UK, 1996.
  4. Cypher, Allen. "Eager: Programming Repetitive Tasks by Example," Proceedings of CHI'91, ACM Press, New York, 1991.
  5. Cypher, Allen, Daniel Halbert, David Kurlander. Watch What I Do: Programming by Demonstration, MIT Press, 1993.
  6. Dennett, Daniel. The Intentional Stance, MIT Press, 1989.
  7. Kozierok, Robyn. A Learning Approach to Knowledge Acquisition for Intelligent Interface Agents, S.M. Thesis, MIT Department of Electrical Engineering and Computer Science, 1993.
  8. Lanier, Jaron. "Agents of Alienation," interactions, July 1995.
  9. Laurel, Brenda. "Interface Agents: Metaphors with Character," Software Agents, Jeffrey Bradshaw, (ed.), MIT Press, 1996.
  10. Lieberman, Henry. "Autonomous Interface Agents," Proceedings of CHI'97, Atlanta, GA, ACM Press, 1997.
  11. Maes, Pattie. "Agents that Reduce Work and Information Overload," Communications of the ACM, Vol. 37, No. 7, ACM Press, 1994.
  12. Maes, Pattie. "Intelligent Software," Scientific American, Vol. 273, No.3, pp. 84-86, September 1995.
  13. Maes, Pattie and Robyn Kozierok. "Learning Interface Agents," Proceedings of AAAI'93, AAAI Press, 1993.
  14. Malone, Thomas, Kum-Yew Lai and Christopher Fry. "Experiments with Oval: A Radically-Tailorable Tool for Cooperative Work," MIT Center for Coordination Science Technical Report #183, 1994.
  15. McCloud, Scott. Understanding Comics, Harperperennial Library, 1994.
  16. Nass, Clifford, Jonathan Steuer, Ellen Tauber. "Computers are Social Actors," Proceedings of CHI'94, Boston, MA, ACM Press, 1994.
  17. Netmind Corporation, "Your Own Personal Web Robot," available at <URL: http://www.netmind.com/URL-minder/URL-minder.html>
  18. Neumann, Peter G. Computer-Related Risks, ACM Press/Addison Wesley, 1995.
  19. Oren, Tim et al. "Guides: Characterizing the Interface," The Art of Human-Computer Interface Design, Brenda Laurel (ed.), Addison Wesley, 1990.
  20. Shneiderman, Ben. "Looking for the Bright Side of User Interface Agents," interactions, ACM Press, Jan 1995.
  21. Sheth, Beerud and Pattie Maes. "Evolving Agents for Personalized Information Filtering," Proceedings of the Ninth Conference on Artificial Intelligence for Applications, IEEE Computer Society Press, 1993.
  22. Thorisson, Kristinn. "Dialogue Control in Social Interface Agents," InterCHI Adjunct Proceedings, Amsterdam, Holland. ACM Press, 1993.
  23. Walker, Janet, Lee Sproull and R. Subramani. "Using a Human Face in an Interface," Proceedings of CHI'94, Boston, MA, ACM Press, 1994.
  24. Williamson, Christopher. "The Dynamic HomeFinder: evaluating Dynamic Queries in a real-estate information exploration system," in Sparks of Innovation in Human-Computer Interaction, Ben Shneiderman (ed.), Ablex Publishing Corp, 1993.

Copyright © 1997 Alan Wexelblat
