SUBMITTED FOR PAPER PUBLICATION;
DO NOT EXCERPT, CITE OR QUOTE
WITHOUT AUTHOR PERMISSION
Alan Wexelblat, Pattie Maes
MIT
Media Lab, Software Agents Group
Agent user interfaces pose a number of special challenges for
interface designers. These challenges can be formulated as a series of issues
which must be addressed: understanding, trust, control, distraction, and
personification. We examine each of these in turn and draw recommendations for
designers in dealing with each of the issues as well as for the overall design
of an agent interface based on our experiences with building such systems.
Keywords: agents, user interface, design,
interaction
This paper highlights some of the user interface issues which
must be addressed by software designers who want to use agent technology as part
of their system interfaces. It is not a comprehensive review -- that would be
too large for any single article. Instead, it focuses on five issues which we
have found to be the ones most often encountered by designers or most often
raised by users of our systems, based on our six years of experience designing
several prototype agents [12].
These issues deal not with hard technical formulae such as Fitts' Law nor even
with well-explored interface considerations such as menu layout or dialog
design. Instead these issues arise as a result of the attempt to create an
interface with a different interaction paradigm. Specifically, these interface
issues come about because agents encourage us as software designers to think in
terms of delegation rather than manipulation.
Agent user interfaces also raise all the same questions as
other interfaces; we should not think that we can escape the demands of design
rigor nor throw away the past decades of cognitive science and interface
research simply because we think about the interactions differently. In general,
this is the case for any new technology. Nothing alleviates the problem of the
interaction designer, who must still understand the job the user is trying to
accomplish, the context in which the system is to be used, the types of people
who will be using the system, and so on. We assume our readers are well versed
in that literature and practice and are reading this article to extend their
knowledge into new areas; therefore, we will not cover standard HCI issues at
all in this article, except to make the point that delegation-based user
interfaces are not a "magic bullet" for HCI. Shneiderman [20]
has pointed out that agent interfaces are sometimes promulgated as excuses for
poor interface design. We agree that this is not acceptable and hope our readers
will understand that we attempt to address here new issues raised by new styles
of UI design without denigrating the importance of basic UI principles.
To begin with, we will define an agent as a program (or set of
programs) with a specific, understandable competence to which a user can
delegate some portion of a task. As shown in Figure
1, we talk about an agent being a conceptually separate part in a system.
The agent observes the interaction between the user and the application and
interacts with the user and with the application. The agent is complementary to
the application's existing interface. At the same time, the user and the agent
each form models of the other, including information about preferences,
capabilities, and so on. We will talk more about these models later.

Figure 1: Preferred Model for Agent/User/Program
Interaction
An alternative approach, shown in Figure
2, is to have the agent serve as an intermediary between the user and the
application. In this design, the user interacts solely with the agent, which
translates the user's input to the program. The agent is the interface, rather
than serving as an adjunct to a more conventional interface. An example of this
is the Peedy music-selection agent from Microsoft's Persona project [2];
there the agent, modeled as a talking parrot, serves as the interface to a
collection of CD music. Our experiments have shown that this second approach can
have problems because the user's inability to bypass the agent can cause her to
feel out of control, an issue we address in depth below.

Figure 2: Alternative User/Agent/Program
Interaction
In our discussion of agent interfaces, we should keep in mind
that this is an assistive technology, not a replacement. We talk about agent
systems in the context of helping people solve particular problems and not in
terms of developing agents which will completely fulfill functions now performed
by people. The goal of an agent ideally is to take over the repetitive, boring
or impossibly complex portions of a task, leaving the person who is being
assisted better able to manage her time and less overwhelmed.
For example, in October 1996 the Software Agents group
hosted a demonstration for a large number of our sponsors of a
marketplace system, where agents help people buy and sell goods. In this demo,
participants were given items such as books, watches and tote bags which they
could buy and sell with fake money (called 'bits') which we also supplied.
Participants used PC-based kiosks to interact with the system, creating selling
agents to dispose of the goods which they didn't want to keep and buying agents
to acquire items they wanted. In the course of the day it happened naturally
that people who wanted to buy and sell the same item found themselves near each
other and negotiated the trade directly. We not only permitted this activity but
encouraged it; the point of the Agent Marketplace was not to replace people's
ability to buy and sell on their own but rather to enhance it by providing a
marketplace mechanism for people who didn't happen to be standing near people
with compatible desires.
Neither of the basic interaction designs requires that we
conceive of agents as separate shrink-wrapped products like spreadsheets or word
processors. Instead, any program can be said to be more or less agent-like,
depending on the degree to which it possesses certain properties of interest.
These properties are personalization, autonomy, and
learning.
Personalization is the ability of the system to know the
individual it is assisting -- particularly her goals, interests, habits,
preferences, and so forth -- and to change its goals and methods of operation to
better meet the specific desires of the individual user. Software can be
personalized to an individual, to a position (which may be filled by many
people, for example shift workers), or to a company, which may wish its standard
procedures and practices to be followed by all employees. By contrast, most
current software works in a rote fashion, without even a simple user model to
change its operation.
Autonomy is the degree to which the program can act on its own
over potentially long periods of time, without detailed explicit instructions
from the user, and the ability of the program to operate while the user does
other things. In many direct-manipulation applications, the main loop of the
application is an event-based loop which consists of waiting for an input from
the user, processing the input, showing some result and then waiting for the
next event; in an autonomous program, useful work can be done without waiting
for every single user event. Users of an agent system should be able to describe
their desired end result without needing to specify precise methods for
achieving these results. By contrast, think of a database user; she cannot
simply ask for items from the database but must specify detailed query strings,
often in a precise language. Autonomous programs may continue to operate without
input for some time; they may run passively for a time and initiate action in
the future; the ability to initiate without specific user input is key to
autonomy.
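To make the contrast concrete, the following sketch (our own illustration, with hypothetical function names such as handle_event and agent_background_step) shows a main loop that does agent work whenever no user event is pending, rather than blocking until one arrives.
```python
# Illustrative sketch only: contrasts an event-driven main loop with one that
# lets an agent do useful background work between user events. The names
# (handle_event, agent_background_step) are hypothetical.
import queue

events = queue.Queue()          # user events would arrive here from a GUI toolkit

def handle_event(event):        # ordinary direct-manipulation processing
    print("handled", event)

def agent_background_step():    # one small increment of autonomous agent work
    print("agent: filtering, learning, or watching for patterns...")

def main_loop(max_iterations=100):
    for _ in range(max_iterations):
        try:
            # In a pure event loop this get() would simply block on the user.
            event = events.get(timeout=0.1)
            handle_event(event)
        except queue.Empty:
            # An agent-like program uses idle time instead of waiting.
            agent_background_step()

main_loop(max_iterations=3)     # with no pending events, the agent works anyway
```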
Finally, learning is the property of the software which allows
it to observe, understand, and adapt to changes in its environment. Agent
terminology derives from artificial intelligence and retains from its hardware
origins the notion of sensors -- ways in which the agent can gather data about
its environment -- and the notion of effectors -- ways in which the agent can
manipulate its environment. In a software environment, the sensors and effectors
may be code subroutines without any physical embodiments, but the idea of a
program being able to sense its computational environment (users, other agents,
other programs, data files, etc.) and manipulate that environment is still
applicable. Most software affects its environment to some degree, but the
important aspect of learning is that the learning software can sense and
understand the changes it is making and react differently based on what it has
observed.
The key application of learning is the agent's ability to detect and track
changes in the personalization properties. It is one thing to set up a system
in advance and tell it that you like science fiction television shows; it is a
much more useful thing to have a system which detects that your taste for
science fiction (or any other prespecified category) waxes and wanes and adapts
its behavior to these changes. Likewise, in a business
setting, the user's goals will change and the program which is able to sense and
adapt to these goal changes is likely to be more useful than one which blindly
continues on a set course.
Each of these properties gives us a dimension on which we can
measure software systems. To the degree that a piece of software has more
autonomy, more personalization and more learning capability, we say that the
software is more agent-like and -- for our purposes -- more interesting. These
scales are relative, rather than absolute. There is no fixed endpoint of
ultimate autonomy, personalization or learning for which one would aim. As
technologies such as machine learning and user modeling advance, the
possibilities for building better agents also advance.
With this background in mind, the five agent interface issues
which we address here are:
- understanding;
- trust;
- control;
- distraction;
- personification.
Each of these terms is a mnemonic more than a description, in
the sense that it tries to capture a broad set of issues in a single word. In
addition, the issues are far from orthogonal. In particular, trust and control
heavily interrelate, as do distraction and personification. Nevertheless, we
will try to look at each of the issues in turn.
Part of the defining quality of an agent is that it possesses
an understandable competence. In particular, that means that the user must be
able to understand what the agent is capable of doing. It is important to
distinguish 'what' knowledge from 'how' knowledge. The methods used by an agent to
perform the tasks it is assigned are far less important for the user to know --
unlike, say, a compiler where the user must know precisely how files are to be
linked together to produce the desired kind of executable. Generally users need
only know what results can be obtained by what agent actions in order to
understand agent capabilities. Information about how operations are performed
can be important from trust or control points of view, which will be addressed
next. For understanding purposes, however, information about what the agent is
capable of doing is the most important.
From an interface point of view, this leads us to believe that
agents must communicate to and take instructions from the user in task- and
domain-specific vocabularies. For example, a user might instruct an agent to
"create a chart" rather than "run a function," even though the function must be
run in order to create the chart. Direct manipulation interfaces are often built
assuming that the user has some goal(s) in mind and the interface's job is to
present tools which allow the user to achieve her goals. The goal language,
however, is often not part of the interface, in the sense that it is rarely
stated. For example, the HomeFinder system [24]
is designed to help people find homes in a particular area; however, the user is
restricted to manipulating those parameters provided by the interface (cost,
distance, etc.) and must explicitly give values for each of these parameters. For
users who wish to use other selection criteria, or for whom the difference
between a $140,000 cost and a $150,000 cost is irrelevant, the interface provides
no help. Users must manipulate each interface widget one at a time. In effect,
the system provides tools to solve a complex constraint problem and then steps
back -- it is up to the user to operate the tools.
With agent interfaces, by contrast, we often want the agent to
operate the tools for the user, in pursuit of which we need to be able to give
the agent goal information. That is, we want to be able to express the desired
result directly. Therefore, software agent interfaces often need to be able to
communicate directly in goal language. An agent-like HomeFinder might take
descriptions like "reasonably priced" and "near work" and translate them into
ranges of values which would be fed into the constraint mechanisms to produce
sets of solutions.
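A minimal sketch of such a translation layer appears below; the goal vocabulary, the numeric ranges, and the find_homes function are all invented for illustration and are not part of the actual HomeFinder system.
```python
# Hypothetical sketch: translating goal-level vocabulary into parameter ranges
# for a HomeFinder-style constraint search. All numbers and names are invented.
GOAL_VOCABULARY = {
    "reasonably priced": {"cost": (100_000, 160_000)},   # dollars
    "near work":         {"distance": (0, 10)},          # miles from workplace
    "room to grow":      {"bedrooms": (3, 6)},
}

def goals_to_constraints(goal_phrases):
    """Merge the ranges implied by each goal phrase into one constraint set."""
    constraints = {}
    for phrase in goal_phrases:
        constraints.update(GOAL_VOCABULARY.get(phrase, {}))
    return constraints

def find_homes(listings, constraints):
    """Stand-in for the underlying constraint mechanism."""
    def satisfies(home):
        return all(lo <= home.get(attr, float("inf")) <= hi
                   for attr, (lo, hi) in constraints.items())
    return [home for home in listings if satisfies(home)]

listings = [{"cost": 140_000, "distance": 6}, {"cost": 250_000, "distance": 2}]
print(find_homes(listings, goals_to_constraints(["reasonably priced", "near work"])))
```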
Often agents will function in the background or more or less
continuously as the user does her work. In order to help users understand the
capabilities of an operating agent, the interface designer may wish to show
something of the state of the agent's processing; this is even more important in
situations where the user may be waiting for the agent to return a result. In
many cases, a simple vocabulary can serve to convey all the information
necessary. For example, in one calendar-management agent system [7,
11]
a simple graphical vocabulary of fewer than a dozen icon/word combinations was
used to let users know all the different states and actions of the system. Two
of these combinations are shown in Figure
3. In fact, the agent had a large number of possible states; however, the
designer realized that most of these states were not relevant to helping users
with the scheduling task. Had all of the states been revealed, users would
likely have been confused by differences that did not, in fact, make a
difference. By showing only the important state differences, users'
understanding was enhanced. Understanding comes from a careful blend of hiding
and revealing agent state and functioning.

Figure 3: Two Agent appearances and meanings (from
Kozierok/Maes)
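The display vocabulary itself can amount to little more than a mapping from the agent's many internal states onto the few icon/word pairs shown to the user. The sketch below illustrates the idea; the state names and icon files are our own inventions, not those of the calendar agent.
```python
# Illustrative sketch: a small display vocabulary for agent states. The agent
# may have many internal states, but only a handful of user-relevant ones are
# surfaced. State and icon names here are invented for the example.
from enum import Enum

class AgentDisplayState(Enum):
    ALERT = ("alert.gif", "Alert")              # agent needs the user's attention
    THINKING = ("thinking.gif", "Thinking")     # agent is working on a suggestion
    SUGGESTION = ("suggest.gif", "Suggestion")  # agent has a proposal ready
    IDLE = ("idle.gif", "Idle")                 # nothing relevant to report

def display_for(internal_state: str) -> AgentDisplayState:
    """Collapse many internal states onto the few shown to the user."""
    mapping = {
        "parsing_calendar": AgentDisplayState.THINKING,
        "scoring_timeslots": AgentDisplayState.THINKING,
        "proposal_ready": AgentDisplayState.SUGGESTION,
        "conflict_detected": AgentDisplayState.ALERT,
    }
    return mapping.get(internal_state, AgentDisplayState.IDLE)

icon, label = display_for("scoring_timeslots").value
print(icon, label)
```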
Finally, there is the question of how a user comes to
understand all that an agent can do. This is, of course, a similar problem to
that faced by designers of conventional user interfaces. Even relatively simple
programs can have hundreds or thousands of possible functions, actions,
shortcuts and so on. Studies have repeatedly shown that even experienced users
take advantage of a small fraction of these choices. An increasingly popular way
to address this situation is for an application to show a 'tip' or 'hint' on
start-up. These tips, such as the one from Quicken shown in Figure
4, are often phrased as suggestions to the user -- e.g. "Did you know..."
These suggestions introduce functionality -- such as keyboard shortcuts -- which
may not be obvious from the interface.

Figure 4: Quicken start-up tip
An agent user interface can use this technique and improve on
it. Currently the tips shown by applications appear without context -- usually
only at start-up and almost always without any relationship to the user's task.
An agent, equipped with the ability to observe the user's interactions with the
application, can interpret what the user is doing and suggest functionality
appropriate to the current task and context. For example, many systems which use
learning-by-example [5]
styles of interaction observe repeated sequences of user actions and -- once an
example sequence has been seen -- can offer to complete the sequence as soon as
the user begins another iteration of the same steps. Of course, the timing and
method used in these offers are important; we will revisit this under Issue
4, Distraction. Another application of these techniques can be seen in agent
systems such as Open Sesame from Charles River Analytics [3].
This system uses a hybrid neural network/expert system to detect patterns in
user actions and make inferences about preferred user choices from these
patterns. As soon as a pattern is detected, the agent offers to automate it.
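A bare-bones version of this kind of repetition detection might look like the sketch below; real learning-by-example systems such as Eager and Open Sesame are considerably more sophisticated, and the action names here are invented.
```python
# Minimal sketch of learning-by-example repetition detection: if the most
# recent actions repeat the immediately preceding sequence, offer to finish
# the next iteration. Real systems (Eager, Open Sesame) are far richer.
def detect_repetition(history, min_len=2, max_len=6):
    """Return the repeated action sequence if the tail of the history
    contains the same sequence twice in a row, else None."""
    for length in range(max_len, min_len - 1, -1):
        if len(history) >= 2 * length and \
           history[-length:] == history[-2 * length:-length]:
            return history[-length:]
    return None

history = ["select_row", "copy", "paste_into_report",
           "select_row", "copy", "paste_into_report",
           "select_row"]
pattern = detect_repetition(history[:-1])
if pattern and history[-1] == pattern[0]:
    print("Agent offers: shall I finish", pattern[1:], "for you?")
```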
In addition to introducing new functionality, agent software
must often help the user understand what actions it has taken or why a
particular action was taken or recommended. For example, a calendar management
system's operation may be quite clear if it does not schedule two meetings for
overlapping times. However, if the calendar agent suggests a particular time for
a meeting, the user may want to understand why that time was suggested rather
than other open slots. If the agent can offer an explanation in domain-specific
terms, the user will be able to understand better why things happened. In our
calendar example, the agent should explain that it selected Wednesday morning
for the meeting because it has observed that meetings held at this time are
almost never postponed, whereas meetings scheduled for late Tuesday afternoon
are often postponed. Internally, it may be the case that the agent derived this
from a combination of its rules, but a reference to those rules is less relevant
to the user than a reference to the likelihood of a meeting being postponed.
Often information which was key to the agent making a decision
can be shown concisely in an interface. For example, a news-filtering agent can
highlight key terms or phrases in the article when asked to explain why a
particular item was selected for viewing.
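For instance, an agent that keeps a weighted profile of interest terms could explain a selection simply by surfacing and marking the profile terms that matched. The sketch below is a hypothetical illustration of that idea; the profile and its weights are invented.
```python
# Hypothetical sketch: explaining why an article was selected by highlighting
# the profile terms that contributed to its score. Profile weights are invented.
import re

profile = {"agents": 0.9, "interface": 0.7, "calendar": 0.4}   # learned interest terms

def explain_selection(article_text, profile, top_n=3):
    """Return the profile terms found in the article, strongest first,
    and the article text with those terms marked for highlighting."""
    found = [t for t in profile if re.search(r"\b" + re.escape(t) + r"\b",
                                             article_text, re.IGNORECASE)]
    found.sort(key=lambda t: profile[t], reverse=True)
    highlighted = article_text
    for term in found[:top_n]:
        highlighted = re.sub(r"\b(" + re.escape(term) + r")\b",
                             r"*\1*", highlighted, flags=re.IGNORECASE)
    return found[:top_n], highlighted

terms, marked = explain_selection("New interface agents for calendar tasks", profile)
print("Selected because of:", terms)
print(marked)
```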
In summary, we make the following recommendations for helping
users understand agents' capabilities:
- use task-specific vocabulary;
- keep descriptions of the agents' states simple;
- allow the agent to make functionality suggestions at appropriate times and
in a gradual way.
Agents are useful to the degree that users can trust them to
perform an understandable task autonomously, without repeated, direct
instruction and constant supervision. Trust is usually thought of as an
interpersonal concept. That is, we as humans have greater or lesser degrees of
trust in other people. In fact, though, we already extend great trust to our
software systems. We must do this in any situation in which the actions of the
software are opaque to us. For example, we trust that the input we type to forms
will be passed to the application as we typed it, not in some altered form; we
trust that spreadsheets will correctly calculate the formulas we create. This
trust is so complete that when it is not met -- as in the case of the recent
Intel chip mathematical flaw -- the event is often industry or national news.
Agent software will, of course, require this form of trust. However, agents will
go beyond this level, into areas where trust in software is far less universal.
In these areas, the actions of software have real-world consequences which are
intimately personal to the users. Software of all kinds has been used for these
sorts of applications for years and, of course, there have been failures [18].
Agents differ from this second class of software only in that
the trust is more personally established. As noted in the introduction, one of
the important features of agent interfaces is their degree of personalization to
the user they are assisting. To the degree that the agent is able to better
adapt to the user's pattern of work, the user will feel any errors made by the
agent more personally and will have a harder time trusting the agent. This is a
well-known psychological phenomenon: if a software error causes a train crash,
then the average user is unlikely to be personally affected and so tends not to
pay too much attention to the problem. If, however, agent software fails to
record a meeting properly and a sales opportunity is lost, then the user of that
agent is highly unlikely to trust the agent, even as he steps onto another train
in the same subway system where the crash occurred. This difference in human
reaction is constantly faced by designers of physical systems, such as
high-voltage power lines, which are perceived to be risky. Agent interface
designers must also address the perception of risk if agents are to be useful. A
trusted agent may be permitted by users to make travel arrangements, filter
incoming news and email streams, and in general take a number of actions with
real-world consequences. An untrusted agent will likely not be permitted to take
these actions and so may become less than useful.
With this understanding, we can see that trust must inevitably
be a part of the agent interface, whether or not the designer addresses the
issue directly. We feel it is better to tackle the issue directly. The designer
should try to answer two key questions:
- How can users express evolving trust?
- How can users know what their trust translates into, in terms of agent
actions?
Trust is an evolving state that grows gradually as the agent
demonstrates itself more capable (just as with people). An agent can be set up
to gradually increase the number of things that it attempts to do; this is one
advantage of a learning agent: its initial capabilities are weak but as it gets
more use it grows more useful. A good example of this is Lieberman's Letizia
web-browsing assistant [10],
which learns a model of its user's interests by watching what web pages are
browsed and deriving relevant key terms from these pages. As the number of pages
seen goes up, so does Letizia's ability to correctly suggest the most relevant
next pages to be browsed.
Even without a sophisticated learning algorithm, it is
possible to introduce agent capabilities gradually. For example, imagine an
information-filtering agent which can scan thousands of Usenet newsgroups and
produce a customized list of articles to be read by the user. One possible
interface for such an agent would be to start up with a list of newsgroups to be
filtered and have the user select a subset of them; the agent would then offer
its list of selected articles for reading in place of these groups. This
interaction requires that the user have a high level of trust in the agent
initially, which may not be realistic. A better design, from a trust point of
view, might be to have the agent initially observe the user's reading habits. It
might -- at first -- offer to filter those newsgroups in which there are a large
number of articles, of which the user only reads a few. This
low-signal/high-noise situation maximizes the usefulness of the agent while
minimizing the effects of missed articles. Here `missed' means an article which
the agent erroneously filters out which the user would have preferred to read.
By offering the most benefit for the least risk, the agent can promote a user's
growing trust.
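One way to choose those starting newsgroups is sketched below: rank groups by how much traffic they carry relative to how much the user actually reads, and offer to filter only the worst offenders first. The observation data and thresholds are invented for the example.
```python
# Illustrative sketch: choose the newsgroups where filtering offers the most
# benefit at the least risk -- high article volume, low fraction actually read.
# All observation data and both thresholds are invented for the example.
observed = {
    "comp.ai.agents":         {"articles_per_day": 40,  "read_per_day": 12},
    "rec.arts.sf.tv":         {"articles_per_day": 300, "read_per_day": 5},
    "misc.forsale.computers": {"articles_per_day": 500, "read_per_day": 8},
}

def initial_filter_candidates(observed, min_volume=100, max_read_fraction=0.05):
    """Groups with lots of traffic of which the user reads very little."""
    candidates = []
    for group, stats in observed.items():
        fraction_read = stats["read_per_day"] / stats["articles_per_day"]
        if stats["articles_per_day"] >= min_volume and fraction_read <= max_read_fraction:
            candidates.append(group)
    return candidates

print("Agent offers to filter:", initial_filter_candidates(observed))
```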
An even better solution would be to have the agent re-order
the news articles, placing the ones it thought the user would be most interested
in at the top of the list. The agent could add some kind of visual marker (such
as a thick horizontal line) at the bottom of the articles it suggests reading.
This way, the user is able to most quickly see which items have been selected
while at the same time not being cut off from access to any of the other
articles he might choose to read. Once again, this design takes advantage of the
agent's ability to maximize benefit, while promoting trust by allowing the user
to compare directly the agent's cut-off point with his own and reducing the risk
of missed articles.
Another approach to this kind of problem would be to have the
agent suggest what it would do and allow the user to approve or disapprove the
suggestion before any action is taken. For example, Cypher's Eager system [4]
searched for the interface objects it expected the user to select next and
pre-highlighted them. The user could accept the selection or override it and
select something else if Eager had incorrectly anticipated his desires. The Open
Sesame system also takes this approach. Although having suggestions which the
user may or may not approve adds another step, and thus possibly an
inefficiency, to the interface, the extra effort may be worthwhile if the
interface designer expects her users to be unlikely to trust the agent. When
users are able to see in advance what the agent would do, they are more likely
to trust that agent to do the right thing.
One of the problems of delegating any task, whether it is to a
human or to a computer agent, is that the task is likely not to get done in
precisely the same way it would be done if the person did it herself.
Methodological differences can have a large impact on trust; if I see someone
doing a task a different way than I do it -- and do not understand that this
different method will lead to the same (or better) results -- then I am unlikely
to trust that the task will be done. Therefore, it may be advantageous if the
agent performs the task initially in the same fashion as the user performs it,
even if a more efficient method is available. Once the user has gained
confidence that the job will be done, the agent may switch to the more efficient
method.
Users' growing trust in an agent can be measured by the degree
to which they accept the agent's suggestions. Untrusted agents' suggestions will
not be adopted; trusted agents' ideas will more often be taken. It is also
possible to allow users to express their trust in the agent directly. One simple
way to do this is, as in the earlier situation of the learning-by-example agent,
to have the agent offer to take over individual actions that the user is
performing. However, in an application with many similar or nearly identical
actions -- such as filtering newsgroups -- this could be excessively distracting
or inefficient. A better approach might be to allow the user to state a general
trust level and have the agent use that information directly. Many algorithms
used to generate options for an agent's actions can also generate confidence
measures (usually on a real scale from 0.0 to 1.0) indicating how likely the
action is to be successful. Users may wish to disallow the agent from taking
actions in which it is not highly confident; for example, setting a threshold of
0.8. Any actions for which the algorithm generates a confidence rating below the
threshold are not taken. If the confidence ratings are meaningful to the users,
this approach may be ideal. For less expert users -- who may not grasp the
difference in the application between a 0.7 and a 0.8 confidence -- a better way
to approach this might be to allow users to state their confidence level in
real-world terms. Users might be able to express that they are 'unsure' of a new
agent or 'highly confident' in software they have been using for some time.
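Concretely, such a control can be little more than a mapping from plain-language trust levels onto a confidence threshold for autonomous action, as in the sketch below; the particular threshold values are arbitrary choices for illustration.
```python
# Illustrative sketch: letting users state trust in real-world terms and
# mapping that onto a confidence threshold for autonomous action. The
# threshold values are arbitrary choices for the example.
TRUST_LEVELS = {
    "unsure": 0.95,            # act only when nearly certain
    "somewhat confident": 0.8,
    "highly confident": 0.6,   # willing to accept more agent initiative
}

def agent_should_act(action_confidence, user_trust_level):
    """Take the action autonomously only if its confidence clears the
    threshold implied by the user's stated trust level."""
    threshold = TRUST_LEVELS.get(user_trust_level, 1.0)  # unknown level: never act alone
    return action_confidence >= threshold

for confidence in (0.7, 0.85):
    if agent_should_act(confidence, "somewhat confident"):
        print(f"confidence {confidence}: act autonomously")
    else:
        print(f"confidence {confidence}: suggest and wait for approval")
```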
Another possibility is that the application domain will offer
natural divisions which can be used to mark increasing levels of confidence.
For example, agents which deal in a marketplace application might be allowed to
make deals autonomously with real-world values of less than $10 but required to
get user approval for more valuable transactions. Alternatively, the agent might
be permitted to handle the negotiations part of any transaction no matter what
the value, but would be required to get user approval before any goods or money
actually change hands.
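In code, this kind of domain-based division can reduce to a single check before the agent commits to a transaction. The sketch below is a hypothetical illustration, using the $10 limit from the example above; the ask_user function stands in for whatever approval interface is provided.
```python
# Hypothetical sketch: a marketplace agent that negotiates freely but asks for
# user approval before committing to any deal above a value limit (here $10,
# as in the example in the text). The ask_user function is a stand-in.
AUTONOMY_LIMIT_DOLLARS = 10.00

def ask_user(prompt):
    """Stand-in for the real approval interface (dialog, email, etc.)."""
    return input(prompt + " [y/n] ").strip().lower() == "y"

def commit_deal(item, price_dollars):
    if price_dollars <= AUTONOMY_LIMIT_DOLLARS:
        print(f"Agent completes the sale of {item} at ${price_dollars:.2f}.")
        return True
    if ask_user(f"Sell {item} for ${price_dollars:.2f}?"):
        print("User approved; completing the sale.")
        return True
    print("User declined; the agent keeps negotiating or withdraws.")
    return False

commit_deal("used paperback", 3.50)   # below the limit: no approval needed
```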
As noted in the Introduction, one of the important activities
that goes on between an agent and a user is a mutual modeling effort.
Understanding, as described above, is largely about the user's model of the
agent: its capabilities, its methods of operation, and so on. The agent's model
of the user is important for the Trust issue; specifically, if I am to trust the
agent to do what I ask, I need to know that it has a good enough model of my
tastes/preferences/desires to perform acceptably. An agent which can show (and
possibly explain) its model of the user is more likely to be trusted than one
which hides this model. Often the model is built up by direct user input, such
as answering questions in a survey or selecting items from a list of options. In
these cases it is a simple matter to make an interface screen which allows the
user to review her inputs and possibly change them. The agent can then adapt its
model of the user to the new input. In other cases -- for example the URL Minder
agent offered by Netmind corporation [17]
-- the user may have given preference-selecting input over a possibly long
period of time. In these cases, a screen which summarizes the input accumulated
over time and which allows the user to have the agent discard old input which is
no longer valid can be a good way to expose the model.
In summary, we offer the following recommendations for
enhancing users' trust in their software agents:
- allow trust to evolve;
- focus on high-benefit/low-risk activities for agents to begin with;
- consider ways for users to express trust directly;
- make the agent's model of the user open to examination/change.
Control is, in many ways, the dual of trust. In extending
trust to the agent, the user may give up some amount of control -- specifically,
the level of control needed to execute the specific actions which the agent is
trusted to perform -- but it is important to realize that this is a matter of
degree. That is, in using any piece of software, users must give up some
control; what is important is that control not be taken from users
involuntarily.
Control issues are always at the heart of software agent
design for two reasons. First, an agent must be able to operate to some degree
autonomously. If the user is required to direct every operation that the agent
performs (or pre-script the actions in great detail) then the agent is much less
useful. Agent interfaces are inherently delegative -- the user turns things over
to the agent to be done rather than doing them herself.
Second, agents must be able to be instructed, or have tasks
delegated, at an appropriately high level of detail. Agents operate best when
users can specify what the agent is to do, and not have to worry about how it
gets done. An easy way to understand this is by analogy to a real-world example.
Let's say I call a taxi and ask to be taken to the airport. What matters is that
I get to the airport within some 'reasonableness' constraints: the trip should
not take too long, nor cost too much, nor be too unsafe. But as long as these
reasonableness constraints are obeyed, I am happy to give up control over the
driving to the cabbie. In this way, the driver is able to act as my agent for
this interaction.
More specifically, we can take advantage of work done on task
analysis to address control issues. In any complex task environment there are
inevitably points which are more important than others -- places where key
decisions are made or where widely varying alternatives must be sorted out. One
way to help address the control issue in agent interfaces is to allow the agent
to proceed in execution of a task until one of these key points is reached. At
these points, the human user is kept in (or brought back into) the loop.
Being "in the loop" in this sense means more than just
clicking an 'OK' button or answering yes or no in a pop-up window. The user may
not realize where in the process the agent is, or how it has gotten to this
point. Users must be given the information they need to understand what the
agent is asking them to do and how it came to ask this question. Users who do
not have meaningful decision-making capability cannot be considered to be in the
loop. This implies that the agent should be able to suspend its operation at
these points at least -- if not at any point -- since users may not be available
or may not have the attention at the time to consider the decision. If this is
not possible due to the nature of the task -- perhaps a transaction that, once
begun, must be completed within a specified time -- the user should be warned in
advance. In some cases the user will be able to give up the appropriate level of
control and the agent can proceed to complete the task when the time comes. In
other cases, however, this is not possible and the agent should not even begin
the task. Nothing makes users feel more out of control than having to 'chase
after' a software process which continues on despite their stated intention to
the contrary!
One way to help users feel that their agents are not out of
control is to allow agent and user actions to be interchangeable. As in the
discussion of how tasks get done above, this may involve the agent using a less
efficient method to accomplish the task. However, if the user has Pause
and Resume buttons available as part of the agent interface, she is much
more likely to feel in control. The agent designer must realize that while the
agent is paused the user may take some of the steps herself; therefore, when the
agent resumes operation it should check the state of the task and make use of
any partial work which the user may have done. In some cases, actions taken by
the user can be seen as a form of instruction from the user to the agent.
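The essential discipline is that the agent never assumes the task state is unchanged across a pause. A minimal sketch of that resume-time check, using an invented task representation, might look like this:
```python
# Minimal sketch: an agent that can be paused and, on resume, re-reads the
# task state so that steps the user completed by hand are not redone.
# The task representation (a dict of step names with done flags) is invented.
class PausableAgent:
    def __init__(self, steps):
        self.steps = steps          # e.g. {"collect_addresses": False, ...}
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        self.run()

    def user_completed(self, step):
        """Work the user did herself while the agent was paused counts too."""
        self.steps[step] = True

    def run(self):
        for step, done in self.steps.items():
            if self.paused:
                return
            if not done:                      # skip anything already finished
                print("agent performs:", step)
                self.steps[step] = True

agent = PausableAgent({"collect_addresses": False, "draft_message": False, "send": False})
agent.pause()
agent.user_completed("collect_addresses")     # user did this step by hand
agent.resume()                                # agent picks up with the rest
```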
In general, instructing an agent can be made quite simple. If
the agent needs to learn a procedure which the user performs, it can either
observe the user's actions, detect patterns, and take over the repetitive
activity, or it can be put into a 'learn' mode, where the user demonstrates a
procedure and the agent records what has been done. The key to this form of
learning is the ability of the agent to generalize from the specific examples to
an appropriate class of actions. If the agent over-generalizes, the user may
start to feel out of control; if it does not generalize enough it will not be
useful. In other cases, the agent needs to learn which properties of situations
or elements of input are relevant. To do this, the user can often point out the
specific important items. For example, in the NEWT news-filtering agent [21],
the user interface allowed readers to highlight words or phrases within articles
and click a '+' or '-' button. These buttons provided direct positive and
negative feedback to the agent on the usefulness of the highlighted items and
caused the agent to improve its filtering. This feedback allowed users to have
more direct control over the agent's training (and its model of the user's
preferences) than simply approving or disapproving articles wholesale, although
this latter option was available.
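A stripped-down rendering of that feedback loop appears below; NEWT's actual representation is richer, and the update rule and weights here are invented.
```python
# Stripped-down sketch of NEWT-style feedback: the user highlights a term and
# clicks '+' or '-', which nudges that term's weight in the filtering profile.
# The update rule and step size are invented; NEWT's actual model is richer.
profile = {}    # term -> weight used when scoring articles

def feedback(term, positive, step=0.25):
    """Apply one click of '+' (positive=True) or '-' feedback to a term."""
    term = term.lower()
    profile[term] = profile.get(term, 0.0) + (step if positive else -step)

def score(article_text):
    words = article_text.lower().split()
    return sum(profile.get(w, 0.0) for w in words)

feedback("windsurfing", positive=True)       # user highlighted this and clicked '+'
feedback("advertisement", positive=False)    # ...and this with '-'
print(score("windsurfing gear advertisement"))   # 0.25 - 0.25 = 0.0
```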
In some cases, the users may be more expert and may wish to
have a form of programmatic control over the agent. An agent interface can be
augmented with a special-purpose scripting language to allow this, provided that
the interface designers believe their users will be able to understand and use
the language. For example, the OVAL system [14]
built by Malone and his students allows users to program agents with rule sets
-- essentially IF... THEN... statements which fire off system actions such as
invoking another agent, starting a notification program, or saving a message to
a particular folder. The agents run these rules through an interpreter provided
with the system to decide how to filter the user's incoming mail.
Unfortunately, experience with Beyond Mail, a
commercial product based on these ideas, has shown that many users either do not
use the rules at all, or do not use them appropriately. This seems to indicate
that end-user programming of agents is likely to be useful only for the most
experienced of users. This can have unintended side effects: the presence of a
programming language which the user realizes he does not understand can make him
feel even more powerless. The attempt to make the agent more open and
controllable may in fact result in users feeling like they are less in control
by reinforcing their lack of knowledge.
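For readers unfamiliar with this style of agent programming, the sketch below shows roughly what such an IF... THEN... rule set and its interpreter reduce to; it is our own illustration, not OVAL's or Beyond Mail's actual rule language.
```python
# Our own illustration of an IF... THEN... mail-filtering rule set and a tiny
# interpreter, in the general spirit of OVAL-style rules; it is not OVAL's or
# Beyond Mail's actual language. A message is just a dict of header fields.
rules = [
    # (condition on the message, action name, action argument)
    (lambda m: "urgent" in m["subject"].lower(), "notify", "popup"),
    (lambda m: m["from"].endswith("@media.mit.edu"), "file", "Lab"),
    (lambda m: "unsubscribe" in m["body"].lower(), "file", "Junk"),
]

def run_rules(message):
    """Fire every rule whose condition matches; return the actions taken."""
    actions = []
    for condition, action, argument in rules:
        if condition(message):
            actions.append((action, argument))
    return actions

msg = {"from": "student@media.mit.edu", "subject": "Urgent: demo today", "body": "..."}
print(run_rules(msg))   # [('notify', 'popup'), ('file', 'Lab')]
```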
Another important aspect of control that interface designers
must consider is the degree to which the agent is perceived to be asserting its
importance over the user. To a large degree this depends on the kinds of actions
taken by the agent, though as we will discuss under the final issue,
personification, the way in which the agent presents its actions can also have
an influence. For now, though, we will look at kinds of actions. In general, we
recommend that agents initially begin not by taking actions but by suggesting
actions or offering to do actions, as described above. This suggest/approve/act
paradigm may seem inefficient (and may cause more disruption than desired, which
we address below) but it gives the user the maximal feeling of control. Of
course, as the user is willing to relinquish more control, the agent can take
more direct actions, particularly in situations identical to ones where the user
has previously given explicit approval.
An effective way to give users control is to open up some of
the parameters of the agent to user input. For example, learning algorithms
often have settings for how quickly old information is discarded as no longer
relevant. An agent which uses keyword matching to track interests may be
programmed to omit keywords which have not been used in more than 6 months. In
cases where such parameters are simple it is possible to present them in the
interface and allow the user direct control over tuning the agent -- its rate of
learning or forgetting, the importance it places on unique or rare examples, and
so on. In other cases, the parameters themselves may be too complex to allow
manipulation but the algorithm may be `tuned' by a combination of settings. For
example, information filtering algorithms often trade off the number of articles
found in which the user is interested against the total number of articles
shown. At one extreme, the agent can show all articles and be sure that all
articles of interest are shown. However, this is usually undesirable. In order
to reduce the number shown, the agent must risk missing some articles which
would have been of interest to the user. Although the parameters of the
algorithm making this tradeoff may not be intuitive, it would be easy to provide
an interface which allowed users to control how cautiously an agent behaved.
The interface could translate inputs into parameter settings for the algorithms
to tune it for the desired behavior.
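Such a translation layer can be sketched as follows; the 'cautiousness' scale, the article scores, and the way the setting is mapped onto a cutoff are all invented for illustration.
```python
# Invented illustration: a single 'cautiousness' control translated into a
# score cutoff for an information filter. A cautious agent shows more articles
# (fewer misses, more noise); a bold agent shows fewer (less noise, more risk).
def cutoff_for(cautiousness):
    """cautiousness in [0, 1]: 1.0 shows nearly everything, 0.0 very little."""
    return 1.0 - cautiousness       # cutoff applied to article scores in [0, 1]

def filter_articles(scored_articles, cautiousness):
    cutoff = cutoff_for(cautiousness)
    return [title for title, score in scored_articles if score >= cutoff]

scored = [("agent paper", 0.9), ("conference CFP", 0.55), ("flame war", 0.1)]
print(filter_articles(scored, cautiousness=0.9))   # cautious: shows most items
print(filter_articles(scored, cautiousness=0.2))   # bold: shows only the best
```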
In closing out our discussion of the issues of trust and
control, it is important to stress a respect for individual user differences.
Some users rapidly learn to trust their agents and will want to turn over
control to them quickly. Other users start out more cautious, preferring to
verify the results of agent actions several times before coming to accept that
control can be safely turned over. Still others are always cautious and always
want to be asked for approval. Informal user testing by Maes and her graduate
students with the MAXIMS [13]
agent showed all three of these behavior patterns. Given this wide variation in
user reaction to potential loss of control, we recommend that agent interfaces
be built to allow users to gradually change their level of control, rather than
having a specific level or method built in. In many situations, this can be as
simple as allowing each user to run a separate copy of the agent; each agent
models the user for whom it works and adapts directly to that user's
preferences, including his preferred control level.
Trust and control are interrelated issues; as users increase
their trust in their agents they will be willing to give up more control.
Interface designers can take steps to help users continue to be in control to
the degree that they desire, by modeling the user appropriately, and by ensuring
that the user remains in the loop for critical decisions that must be made.
Trust can be extended slowly, often by use of an explicit scale or limit
provided in the agent's interface; agents can affect users' trust levels by
concentrating first on those cases where the return is highest and the
consequences of an agent error are lowest. As Shneiderman [20]
points out, one of the key reasons why we must pay close attention to issues of
trust and control is that agents may be responsible for actions they take on
someone's behalf which have consequences, including potentially harmful
consequences. In the American precedent-based legal system there are currently
no guiding decisions on who is responsible for actions taken by software of any
kind. For example, if an accountant uses a spreadsheet with a bug and therefore
prepares an erroneous tax return, it is unclear to what degree the spreadsheet
software company may be held responsible. Thus, most software is sold with
explicit disclaimers of responsibility designed to shield the software
manufacturers from consequences of what people do with their software. Still, in
some court cases the software manufacturer has been held responsible -- or
legally liable -- particularly when the software does not behave as expected. In
other court cases, particularly when the software behaves as intended, only the
user has been held to be responsible. If agent software, acting autonomously,
appears to be out of control it introduces a further complication into the
picture. By keeping users in control of their software agents, we help
responsibility rest with the users, though no interface designer can predict how
the contradictory court cases will play out in the end. As with the unstoppable
actions discussed above, actions with potentially serious consequences require a
higher level of attention in the interface.
Our control-related recommendations are:
- allow variable degrees of control;
- give users the highest-feasible level of control specification;
- help users understand the agent's model of the user;
- keep users meaningfully in the loop, particularly for key decision points;
- begin with suggestions and move to actions later.
Distraction is a consequent issue, derived from the preceding
three. It is the aspect of usability most often raised in connection with agent
interfaces. As with any computer software interface,
designers must pay attention to standard usability issues such as screen layout,
color and font choice, and so on. However, because agent software operates to
some degree autonomously, it often needs to initiate communication with the
user, unlike other interfaces which often communicate with the user only in
response to input. This initiation by the program may be audio, visual, or a
combination of both. In any form, though, a change in the user interface which
is not the obvious result of user input will likely attract the user's
attention. If that attention was focused on getting a job done, the
communication from the agent will likely be seen as a distraction.
Distraction is most often a problem because it disrupts the
user's normal work pattern. Sometimes disruption occurs without the agent ever
being visible -- for example, if your agent is consuming all your system
resources so that your other programs cannot run then that clearly disrupts your
work. However, most of the time we are concerned with more overt disruptions.
These disruptions take place at the user interface, most often in the form of a
pop-up window appearing or some other change in the agent's interface.
Often these changes come from an effort by the interface
designer to handle one of the first three issues we outlined: Understanding,
Trust, and Control. In designing for the challenges of each issue, the solution
seems to be more communication with the user, more information given to the
user, and more input from the user. Each of these seems to require more and more
frequent interruptions of the user's tasks in order to make the communication
happen. While we agree with the ideal of having agents communicate useful
information to the user, for all the reasons outlined above, the issue here is
how to approach this ideal while still minimizing disruption. As interface
designers, we have to remember that the user's goal is not to interact with the
agent, but to have the agent help in completing a task. If the time and effort
saved by the software agents is not significantly greater than the time spent
training, customizing and interacting with them, then the agents will not be
useful and will not be used.
There are four factors which designers can use to minimize
disruption by reducing the number and level of interruptions. The first is
simply using the least obtrusive possible design for informational messages
which are deemed necessary. In some cases, the agent may simply change
appearance or place a small flag on the screen and wait for the user to click or
otherwise accept the notification before displaying it. The most obvious way to
display a notification may not be the best.
The second way to minimize disruption is to differentiate
beforehand which notices are the most important. Many less-important
communications from the agent can be delivered unobtrusively, usually by email.
Interface designers sometimes make the mistake of thinking that -- because the
agent is running at its own pace -- messages and interrupts should be delivered
and acted upon as they occur; this ignores the fact that it is often easier (and
better interface design) to have the agent wait until the user can be safely
interrupted, even though this delays the agent's task completion.
If there is no easy way to differentiate important from
unimportant events a priori, it may still be possible to have users directly
control the level of interaction they want, either by designating certain
classes of messages as important, or by specifying a level. This third factor
requires the user to do some thinking beforehand and may require the user to
become familiar with the kinds of information that the agent can present. As
with any form of user learning, if this is likely to be too difficult, or the
interface designer cannot expect the user to learn this before interacting with
the system, then some other method must be provided for users to give this
input. As we discussed under Control, it may be effective enough to simply
provide the user with a widget and allow her to specifically set a notification
level based on some aspect of the agent's state, such as its confidence in a
suggestion.
For example, the widget in Figure
5 (adapted from the Meeting Scheduling Agent described by Maes in [11])
gives the user simultaneous Control and Disruption options via dual sliders. One
slider sets a "tell me" level, by which the user asks the agent to give him
suggestions or information whenever the agent's confidence in a pending proposal
is higher than 0.5; the other slider allows the agent to act only when it is
more than 90% confident in the action it has chosen. A setting of 1.0 for either
slider would prevent the agent from sending notifications or taking actions,
respectively, allowing the user to remain ultimately in control or completely
uninterrupted if she so desires.

Figure 5: Dual-control slider
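The logic behind the dual sliders is simple enough to show directly. The sketch below is our own rendering, using the 0.5 and 0.9 settings from the example above.
```python
# Our own sketch of the dual-slider logic: one threshold governs when the
# agent tells the user about a pending proposal, a higher one governs when it
# may act on its own. The 0.5 / 0.9 settings match the example in the text;
# a setting of 1.0 for either slider disables that behavior entirely.
TELL_ME_THRESHOLD = 0.5     # suggest when confidence exceeds this
ACT_THRESHOLD = 0.9         # act autonomously only above this

def dispatch(proposal, confidence):
    if confidence > ACT_THRESHOLD:
        return f"agent acts: {proposal}"
    if confidence > TELL_ME_THRESHOLD:
        return f"agent suggests: {proposal} (confidence {confidence:.2f})"
    return "agent stays quiet and keeps learning"

for conf in (0.3, 0.7, 0.95):
    print(dispatch("schedule meeting Wednesday 10am", conf))
```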
Finally, the fourth way to control the level of distraction
comes from task analysis. If the agent is aware of the structure of the user's
task then it can adapt its level of communication to the importance of the
current operation. If the user is engaged in a more critical phase of work then
the number of interruptions should be minimized; in other cases more disruption
can be tolerated. Of course, this requires a good deal of up-front effort by the
agent designers and programmers but in some cases giving the agent a large
knowledge base from which to work is the only way to have it behave usefully in
the target environment. For example, an agent which assists in selling goods
would do well to know the difference between casual and time-critical goods
offered for sale. An old PC may be sold casually over many days; a basket of
fresh produce must be quickly dealt with.
As noted, the goal of an agent interface designer dealing with
the Distraction issue is to minimize intrusion into the user's work environment.
Humans are still evolving conversational and social protocols to handle these
problems among themselves. For example, we learn which of our coworkers can be
interrupted when they are typing or when their office doors are half-closed, but
we learn it haltingly, with much trial and error. Agents will probably never be
better than humans at this; however, since humans do achieve acceptable
solutions to this problem, it does suggest that it might be worthwhile teaching
our agents the human conventions for when interruption is permitted.
Eventually, agents may learn rules of human conversation and turn-taking, such
as nodding, using back-channels, and so on. The Ph.D. work of Kris Thorisson
from the MIT Media Lab [22]
points in this direction.
Distraction from, or disruption of, a user's work can be
minimized by having the agent:
- learn when interruptions are appropriate;
- reduce or eliminate overt interruptions such as pop-ups;
- use unobtrusive notifications (e.g. communicate via email);
- allow users to initiate communication.
If agents are going to take advantage of human-like social
paradigms, this raises issues of how person-like an agent needs to be. In most
cases, the agent not only does not need to be person-like, but may not even need
to be directly represented. In fact, most commercial agents today do not use any
visible representation; however, as we discuss below, personification may offer
some advantages and may be seen more often in the future.
For our work we separate the issues of anthropomorphism from
personification. Anthropomorphism is the use of graphical and/or audio interface
components to give the agent a human-like presence at the
interface. Interesting work on anthropomorphic interfaces is being done by a
number of researchers such as Walker and Sproull [23].
A presence at the interface, no matter how un-human-like, is often called "the
agent" even though the programs which do the real work of assisting the user are
quite separate from the graphical animation routines which control the screen
appearance. Designers should be careful about misleading customers in these
cases. It is also possible to develop interface agents beyond simple appearance
and endow them with a form of personality or behavior. For example, work done by
Brenda Laurel [9]
and by Tim Oren [19]
has explored ways in which peoples' expectations of social interactions can be
made to work in favor of the interface. Their work has largely involved the use
of human-like or human-appearing agents. Conversely, personification is the
tendency to assign human-like characteristics, such as emotions and
sophisticated planning capabilities, to non-human animals and objects. The root
of these assignments is often called intentionality -- that is, we speak of
things as though they could form intentions.
This intentional stance [6]
is what allows us to say things like "the printer doesn't like me" or "this car
doesn't want to go where I steer it." In neither case does the speaker truly
believe that the object being spoken of has likes or wants in the sense that
human beings have them. However, this language provides conversational shortcuts
for describing problems or situations, with the expectation that our listeners
will understand what we mean. People use models of things they know to explain
the actions of things which seem novel or otherwise unexplainable. Human
quirkiness is well known and serves as a good model for the unpredictable
behavior of devices. We believe that these are natural tendencies not specific
to software systems of any kind. We take it to be inevitable that people will
personify agents to some degree; humans are notorious for personifying
everything from their pets to their possessions. The issue for the interface
designer is not whether the agent will be personified, but to what degree that
natural tendency of users should be worked with and to what degree it must be
fought. These are not simple issues, nor do we have space to explore
them all here. People such as Clifford Nass at Stanford [16]
have been looking at the questions raised by treating computer systems as
partners in social interactions; their results appear to show that the rules we
are used to applying to human-human interactions can be translated relatively
unchanged to human-computer interactions, at least insofar as human attitudes
and perceptions are concerned.
The tension over personification revolves around two opposing
poles. On the one hand, we can use the naturalness of the intentional stance to
improve our agent interfaces; on the other hand, we must take care that -- to
the extent that we allow or encourage human-like interactions with users -- we
do not mislead them into thinking that the system is more capable than it truly
is. Often the differences can be quite subtle; personification can creep into
interfaces without the designer's intention, and many designers can tell stories
of users who were misled into overestimating even simple system
capabilities.
For example, imagine that the user selects a Save command in a
standard text editor. The message "File Saved" might appear on completion of the
command. However, this leaves open the question of who is responsible for the file
being saved. Did the computer save it? Did the editor program? Did the user?
Critics of agent technology, such as Jaron Lanier [8],
have argued that agents disempower users by obfuscating issues such as who is
responsible for actions being taken in the software. However, we see these
issues even in ordinary interfaces; agent software did not create the human
tendency to personify, nor did it originate ambiguity in the interface. When
we see actions take place, we naturally look for causes, but the causes may not
be obvious. For example, the ultimate cause of the File Saved message may in
fact be the person who installed the software and chose a setup preference which
automatically saves the file every 15 minutes.
In agent software, it is important to consider what effect it
may have on the user if we hide the causes for agent actions. Agents are more
useful if they are able to act autonomously, based on the user's trust in
turning over control of part of a task. To the extent that the user is able to
think of the agent as an intentional entity we can encourage the delegation
style of interface which characterizes good agent systems. However, the user
must not think she is turning over this task to a human being, one who will be
able to use all the common sense and intelligence we expect from people. We must
personify agents in the way that cars or other familiar objects are personified,
so that users are not misled. To this end, we recommend using an obviously
non-realistic depiction for the agent, or none at all. In particular, there is a
rich history in American culture of cartoon depictions of people that convey
emotions, attitudes and intentions [15]
without there being the slightest doubt that no real person is portrayed. Using
these caricatures and abstract depictions we can help the user understand what
functions the agent is able to perform and at the same time make our interfaces
less hostile and more fun to interact with. Of course, it is possible to go
overboard and end up with an interface that is silly or unintentionally
ridiculous. The "Lifestyle Finder" experimental agent made by Andersen
Consulting [1]
plays with this by using the persona of "Waldo the Web Wizard." Here the joking
and self-deprecating nature of the interface helps remind users that the system
is a beta-test prototype; a production version of the same agent would probably
require a completely different persona.

Figure 6: Andersen Consulting's Lifestyle Finder Agent
Image


Figure 7: Idle and Notice Agent
Images
The benefits of depicting an agent as a character in an
interface revolve primarily around focus and helping users anticipate. If the
agent software makes suggestions for actions and these suggestions appear more
or less out of nowhere, users can find the experience quite disconcerting; a
ubiquitous but disembodied agent can be unsettling.
Likewise, the agent may want to put up notifications, as discussed above.
Unfortunately, the notifications may not be visible if they are not properly
located, or if properly located may be too obtrusive. If there is a place on the
screen where the user goes to interact with the agent, notifications can be
placed there, perhaps by changing the appearance of the agent. For example, the
two pictures in Figure
7 show different states of a cartoon character used in our interface
personalization prototype (see below). The right-hand image is used when the
agent has something it wants to communicate to the user. The difference is
direct and obvious while at the same time the `notification' image takes up no
more screen real estate than the other image. By keeping an eye on the agent
icon, users can anticipate what the agent will do or say next, reducing surprise
factors
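To make the pattern concrete, the following is a minimal sketch -- not our
prototype's implementation -- of an idle/notice indicator, written in Python
with Tkinter. The text labels and colours stand in for the cartoon images, and
the three-second timer merely simulates the agent producing a suggestion.

    import tkinter as tk

    # Two appearances for the same indicator; stand-ins for the idle and
    # notice images used in the prototype.
    IDLE = {"text": "agent: idle", "bg": "lightgray"}
    NOTICE = {"text": "agent: notice!", "bg": "gold"}

    root = tk.Tk()
    indicator = tk.Label(root, width=16, relief="ridge", **IDLE)
    indicator.pack(padx=4, pady=4)

    def notify():
        # The agent signals that it has something to tell the user.
        indicator.config(**NOTICE)

    def acknowledge(event=None):
        # The user clicks the indicator; return to the idle appearance.
        indicator.config(**IDLE)

    indicator.bind("<Button-1>", acknowledge)
    root.after(3000, notify)  # simulate a suggestion arriving after 3 seconds
    root.mainloop()

The point of the sketch is only that the state change is visible at a known
location while occupying no additional screen space.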
In addition to providing a known place to check for notifications
and suggestions, an agent's on-screen appearance can help users who are trying
to control or direct the agent, as described above. It is a simple step to make
that appearance active, so that clicking on it with the mouse brings up help
screens, instructional screens, and so on. From an interface design point of
view, this can be quite beneficial; often the interface elements related to
controlling the agent are orthogonal to the functioning of the application, and
there is no good place to put them in an integrated interface without disrupting
the application's design. If they are 'hidden' behind the agent's appearance,
they do not disrupt that design but can still be found by users who look for
them in the natural place -- where the agent is. Even if the agent is not
directly represented on-screen, the interface designer would do well to provide
a central control point, such as a menu-bar item or button, where the user can
look for all agent controls.
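One way such a control point might be hung off the agent's representation is
sketched below, again in Python/Tkinter; the control names are invented for
illustration rather than taken from any of our systems.

    import tkinter as tk

    root = tk.Tk()
    agent_icon = tk.Label(root, text="[agent]", relief="raised", padx=6)
    agent_icon.pack(padx=4, pady=4)

    # All agent-related controls live behind the icon rather than in the
    # host application's menus; the labels below are invented examples.
    controls = tk.Menu(root, tearoff=0)
    controls.add_command(label="Explain last suggestion",
                         command=lambda: print("explanation..."))
    controls.add_command(label="Adjust autonomy level",
                         command=lambda: print("autonomy dialog..."))
    controls.add_command(label="Bypass agent for this task",
                         command=lambda: print("agent suspended"))

    def show_controls(event):
        # Pop the menu up at the mouse position when the icon is clicked.
        controls.tk_popup(event.x_root, event.y_root)

    agent_icon.bind("<Button-1>", show_controls)
    root.mainloop()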
If the interface is to take advantage of the intentional
stance, then it is often to the interface designer's advantage to use
deliberately intentional language. For example, an agent interface would do
better to say "I will filter your email..." or "This program will filter your
email..." instead of "Your email will be filtered..." It must be clear to the
user that the 'I' refers to the agent and not to another user or person.
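A trivial illustration, with an invented agent name and action, of keeping
messages in the agent's first-person voice rather than the passive:

    AGENT_NAME = "your mail agent"

    def agent_says(action: str) -> str:
        # Intentional phrasing: "I will ..." rather than the passive
        # "Your email will be ...", with the speaker identified.
        return f"I ({AGENT_NAME}) will {action}."

    print(agent_says("filter your email into the folders you have set up"))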
In summary, the use of graphics and intentional language can
allow the user to understand the program's capabilities as a coherent set of
actions to be taken by a recognizable entity. This can encourage the user to
engage in a more delegative style of interaction; however, designers must be
careful not to mislead users into thinking a program is more capable than it
really is. Even something as simple as using natural language sentences in
interface dialogs may lead users to believe that a program can handle arbitrary
natural language input.
Our personification recommendations are:
- direct representation of the agent is not a necessity;
- beware of misleading impressions; use representations appropriate to the
agent's abilities;
- provide a place where users can focus their attention and direct agent-related input.
One of the reasons we call our systems "agents" as opposed to
simply "software" is to remind users that whenever you delegate something to be
done by another, things may not be done the way you wanted. You have to
understand what the other is capable of doing, trust that the task can be done
within the applicable reasonableness constraints, give up a measure of control,
and accept that sharing the workload with another necessarily imposes some
amount of communication overhead. When the `other' is a person, we can draw on
thousands of years of social experimentation to help us cope with these
difficulties. As software agents begin to be available to assist us, we find
ourselves needing solutions for these old problems in new domains.
This paper has presented five of the most important issues in
agent user interface design. They emerge from an ongoing research program; there
are no hard-and-fast answers to the problems posed. Instead, we can only offer
guidelines and suggestions for what choices can be made and what the factors are
that must be traded off in the search for a good answer. We believe these issues
apply to all agent interfaces, though different systems will take different
approaches to solving them.
These issues do not always come in nice separable packages;
for example, the appearance of an agent personified on the screen can have
definite effects on its perceived trustworthiness. Some users may find cartoon
depictions, or any sort of graphical animation, excessively distracting. Solid
principles of interface design can never be ignored; as we stated in the
beginning, delegative interfaces are not a magic bullet for designers. There are
also important issues in agent systems which affect the interface but are not,
strictly speaking, interface issues. For example, it is an open question whether a
personalized agent should mimic the user's "bad" habits -- where bad may be
defined by the user herself or by her environment -- or whether the agent should
attempt to instill better habits. We hope that further experience with building
agent systems will help us develop guidelines for these sorts of issues as well.
However, based on our experience and research on agent interfaces, we offer the
following Top 10 principles for agent interface designers (a brief illustrative
sketch follows the list):
- Make the agent's user model available for inspection and modification.
- Always allow the user to bypass the agent when desired.
- Allow the user to control the agent's level of autonomy.
- Use gradual approaches whenever possible (e.g. learning, scope of
operation, severity of action).
- Provide explanation facilities.
- Give concise, constant, non-intrusive feedback to the user about the
agent's state, actions, learning, etc.
- Allow the user to program the agent without needing to be a programmer
(e.g. manipulate levels, control learning/forgetting).
- Do not hide the agent's methods of operation from deliberate user
inspection; conversely, do not force the user to understand them either.
- Communicate with the user in her language, not in the agent's language.
- Integrate the agent's suggestions and actions into the application
interface to the greatest extent possible, rather than requiring the user to
find separate windows or learn new controls.
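As a purely illustrative sketch of how a few of these principles -- an
inspectable user model, a user-set autonomy level, and gradual behaviour --
might be reflected in code, the following Python fragment uses invented field
names and thresholds rather than the design of any of our prototypes.

    from dataclasses import dataclass, field

    @dataclass
    class UserModel:
        # Plain data the user can inspect and edit directly (principle 1).
        preferences: dict = field(default_factory=dict)

    @dataclass
    class Agent:
        model: UserModel
        autonomy: float = 0.5  # 0 = only suggest, 1 = act freely (principle 3)

        def handle(self, confidence: float, action: str) -> str:
            # Gradual behaviour (principle 4): act only when confidence in the
            # suggestion exceeds the threshold implied by the autonomy setting.
            if confidence >= 1.0 - self.autonomy:
                return f"doing: {action}"
            return f"suggestion: {action}? (confidence {confidence:.2f})"

    agent = Agent(UserModel({"sort_newsletters": True}), autonomy=0.3)
    print(agent.handle(0.9, "file newsletter under 'News'"))
    print(agent.handle(0.4, "delete message from unknown sender"))

The specific mechanism matters less than the properties it exhibits: the user
can see and change what the agent believes about her, and can decide how far the
agent may go before asking.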
As users' computational environments become more complex, as
computer products move into more consumer-oriented and more broad-based
applications, and as people attempt to take on more new tasks, we find ourselves
in desperate need of improvements to the interface, improvements which will
enable users to accomplish what they want more efficiently, more pleasantly and
with less attention to computer-imposed detail. Software agents are one way to,
as Brenda Laurel [9]
put it:
...mediate a relationship between the labyrinthine precision of
computers and the fuzzy complexity of man.
In order for that to happen, we will have to find ways to build interfaces to
software agents that maximize their potential while avoiding the worst of their
risks. Precisely how we can do that is likely to remain a matter of debate for
many years.
References
- Andersen Consulting makes a prototype of the Lifestyle
Finder agent available at <URL: http://bf.cstar.ac.com/lifestyle/>
- Ball, Gene, et al. "Lifelike Computer Characters: The
Persona project at Microsoft Research," Software Agents, Jeffrey
Bradshaw, (ed.), MIT Press, 1996.
- Caglayan, Alper et al. "Lessons from Open Sesame!, a User
Interface Learning Agent." First International Conference and Exhibition
on The Practical Application of Intelligent Agents and Multi-Agent
Technology, PAP, Blackpool, Lancashire, UK, 1996.
- Cypher, Allen. "Eager: Programming Repetitive Tasks by
Example," Proceedings of CHI'91, ACM Press, New York, 1991.
- Cypher, Allen, Daniel Halbert, David Kurlander. Watch
What I Do: Programming by Demonstration, MIT Press, 1993.
- Dennett, Daniel. The Intentional Stance, MIT Press,
1989.
- Kozierok, Robyn. A Learning Approach to Knowledge
Acquisition for Intelligent Interface Agents, S.M. Thesis, MIT
Department of Electrical Engineering and Computer Science, 1993.
- Lanier, Jaron. "Agents of Alienation,"
interactions, July 1995.
- Laurel, Brenda. "Interface Agents: Metaphors with
Character," Software Agents, Jeffrey Bradshaw, (ed.), MIT Press,
1996.
- Lieberman, Henry. "Autonomous Interface Agents,"
Proceedings of CHI'97, Atlanta, GA, ACM Press, 1997.
- Maes, Pattie. "Agents that Reduce Work and Information
Overload," Communications of the ACM, Vol 37#7, ACM Press,
1994.
- Maes, Pattie. "Intelligent Software," Scientific
American, Vol. 273, No.3, pp. 84-86, September 1995.
- Maes, Pattie and Robyn Kozierok. "Learning Interface
Agents," Proceedings of AAAI'93, AAAI Press, 1993.
- Malone, Thomas, Kum-Yew Lai and Christopher Fry. "Experiments with Oval: A
Radically-Tailorable Tool for Cooperative Work," MIT Center for Coordination
Science Technical Report #183, 1994.
- McCloud, Scott. Understanding Comics,
HarperPerennial, 1994.
- Nass, Clifford, Jonathan Steuer, Ellen Tauber. "Computers
are Social Actors," Proceedings of CHI'94, Boston, MA, ACM Press,
1994.
- Netmind Corporation, "Your Own Personal Web Robot,"
available at <URL:
http://www.netmind.com/URL-minder/URL-minder.html>
- Neumann, Peter G. Computer-Related Risks, ACM
Press/Addison Wesley, 1995.
- Oren, Tim et al. "Guides: Characterizing the Interface,"
The Art of Human-Computer Interface Design, Brenda Laurel (ed.),
Addison Wesley, 1990.
- Shneiderman, Ben. "Looking for the Bright Side of User
Interface Agents," interactions, ACM Press, Jan 1995.
- Sheth, Beerud and Pattie Maes. "Evolving Agents for
Personalized Information Filtering," Proceedings of the Ninth Conference
on Artificial Intelligence for Applications, IEEE Computer Society
Press, 1993.
- Thorisson, Kristinn. "Dialogue Control in Social Interface
Agents," InterCHI Adjunct Proceedings, Amsterdam, Holland. ACM
Press, 1993.
- Walker, Janet, Lee Sproull and R. Subramani. "Using a Human
Face in an Interface," Proceedings of CHI'94, Boston, MA, ACM
Press, 1994.
- Williamson, Christopher. "The Dynamic HomeFinder:
Evaluating Dynamic Queries in a Real-Estate Information Exploration System,"
in Sparks of Innovation in Human-Computer Interaction, Ben
Shneiderman (ed.), Ablex Publishing Corp, 1993.
Copyright © 1997 Alan Wexelblat <URL: http://wex.www.media.mit.edu/people/wex/>, except where otherwise noted.