Psychological Review, October 2000, Vol. 107, No. 4, 914-942
© 2000 by the American Psychological Association. For personal use only, not for distribution.
Group impressions are dynamic configurations. The tensor product model (TPM), a connectionist model of memory and learning, is used to describe the process of group impression formation and change, emphasizing the structured and contextualized nature of group impressions and the dynamic evolution of group impressions over time. TPM is first shown to be consistent with algebraic models of social judgment (the weighted averaging model; N. Anderson, 1981 ) and exemplar-based social category learning (the context model; E. R. Smith & M. A. Zárate, 1992 ), providing a theoretical reduction of the algebraic models to the present connectionist framework. TPM is then shown to describe a common process that underlies both formation and change of group impressions despite the often-made assumption that they constitute different psychological processes. In particular, various time-dependent properties of both group impression formation (e.g., time variability, response dependency, and order effects in impression judgments) and change (e.g., stereotype change and group accentuation) are explained, demonstrating a hidden unity beneath the diverse array of empirical findings. Implications of the model for conceptualizing stereotype formation and change are discussed.
Ever since Asch's (1946) ground-breaking research, person impression formation has been a major topic of inquiry in social psychology for more than half a century. Despite Asch's (1952) interest, the topic of group impression formation and change began to attract empirical attention relatively recently (e.g., Hamilton & Gifford, 1976 ). By research on group impression, we mean a class of studies in which various information about individual members of social groups is presented, and the effects of the information on people's judgments and evaluations about the groups are examined. When participants have little prior information about a target group, this type of research examines the formation of group impressions. By contrast, a group impression change occurs when participants' impressions about a target group, about which participants have some prior expectancies (e.g., stereotypes), evolve as a result of the information given.
The research on group impression formation and change now constitutes a substantial literature in which a number of robust empirical phenomena have been identified (for reviews, see Hamilton & Sherman, 1994 ; Hamilton & Sherman, 1996 ; Hilton & von Hippel, 1996 ). However, theoretical understanding of the phenomena has been hampered by the lack of a coherent theoretical framework that describes the processing of information about social groups. Hilton and von Hippel (1996) lamented, "There has been little effort directed at specifying the details of various representational models" (p. 244). Many empirical phenomena point to the dynamic character of group impressions, that is, the ever-evolving and constantly changing nature of group impressions. Theories of group impressions, however, fall short of capturing this dynamism.
Our main objective here is to present an explicit theory of group impressions that can shed light on their dynamics. We propose a theory of group impression formation and change based on a distributed representational system called the tensor product model (TPM) ( Humphreys, Bain, & Pike, 1989 ; Kashima, 1999 ; Kashima & Kerekes, 1994 ; Kashima, Woolcock, & King, 1998 ; Pike, 1984 ). We then show that this theory can provide an integrative framework in which to explain diverse time-dependent properties of group impression formation and change. It is often assumed that the formation and change of impressions are two separate phenomena: That is, impressions, once formed, become a stable entity (e.g., schema), and change processes involve something different. Contrary to this, we show that the process underlying both formation and change of group impressions could be a single, learning process described by the TPM.
Another objective is a theoretical reduction of algebraic models of social judgment to the TPM. Connectionist models are often said to describe information processing at a microcognitive level. Just as the macrolevel thermodynamic description may be reduced to microlevel statistical mechanics ( Nagel, 1961 ), we seek to reduce macrolevel cognitive theories to a microlevel connectionist description. Smolensky (1988) suggested that connectionism would provide a theoretical reduction of symbol processing theories to a subsymbolic paradigm; we believe our model provides a theoretical reduction of algebraic models to a distributed representational system. Social psychologists often seek a theory replacement, in which an old theory is falsified and replaced by a new theory. However, in a theory reduction, a new theory integrates old theories with lesser generality within a more general framework. We believe there are advantages of theory reduction in social psychology.
Our impressions about a social group evolve over time. As we learn more about the group and its members, our impressions become more elaborate and complex. This intuition about the dynamic nature of group impression was expressed by Asch (1952, pp. 234-235) nearly half a century ago:
Our [initial] impressions of groups are often global, corresponding to particularly blunt central qualities. ... Simplified impressions are a first step toward understanding the surroundings and toward establishing clear, meaningful views. ... When conditions permit, initial impressions are corrected and become more articulated in the light of new experiences.
Asch's repudiation of elementarism and theoretical affiliation with the Gestalt tradition are apparent even in this short passage on group impression formation with his allusion to "meaning" and articulation. To his own question of "Is the impression of a group other than the sum of impressions of separate individuals?" (p. 222), Asch responded, "There are group properties that are the mode of interaction between the members. These are neither identical with properties of the individual members nor with properties that exist in some way behind individuals" (p. 226). Group impression was to be understood as an organized whole. To Asch, "impressions" were mental representations that are both dynamic and meaningfully structured or, put simply, dynamic configurations.
Linville, Salovey, and Fischer (1986; for similar views, see, e.g., Brewer, Dull, & Lui, 1981; Taylor, 1981; R. Weber & Crocker, 1983) gave a more contemporary expression of a similarly dynamic view of group impression formation.
Social categories evolve from relatively general, undifferentiated structures to more highly differentiated ones. Thus, new instances that do not fit the category are dealt with in part through increasing category differentiation. We assume that category differentiation tends to occur when the perceiver encounters numerous and varied instances of the category, and experiences incentives to distinguish among category members. (p. 166)
The concept of schema has often been used to refer to mental representations of social groups (Fiske & Neuberg, 1990; Fiske & Taylor, 1991; for a related formulation, see Stangor & Lange, 1994). In fact, Asch's contention that group impressions are structured (as in Gestalt) is well reflected in the notion of "group schema." Neisser (1976) defined the concept of "schemata" as what he called cognitive structures, which are "a nonspecific but organized representation of prior experiences" (p. 287). Fiske and Taylor (1991) similarly defined "schema" as "a cognitive structure that represents knowledge about a concept or type of stimulus, including its attributes and the relations among those attributes" (p. 98). Rumelhart (1980) defined a schema as "a data structure for representing the generic concepts stored in memory. ... Inasmuch as a schema underlying a concept stored in memory corresponds to the meaning of that concept, meanings are encoded in terms of the typical or normal situations or events that instantiate that concept" (p. 34).
However, the generally static notion of schema is not suitable for describing the dynamic evolution of impressions, despite some attempts at revising it (e.g., Crocker, Fiske, & Taylor, 1984, on schema change). Bartlett (1932), who is credited with having introduced the schema concept to psychology, most clearly expressed this concern.
I strongly dislike the term "schema." It is at once too definite and too sketchy. ... It suggests some persistent, but fragmentary, "form of arrangement," and it does not indicate what is very essential to the whole notion, that the organised mass results of past changes of position and posture are actively doing something all the time; are, so to speak, carried along with us, complete, though developing, from moment to moment. (pp. 200-201)
More recent theorizing about mental representations of social groups moved away from the static conception while retaining the structured, Gestalt-like property. Smith and Zárate (1990 , 1992 ; also see Linville & Fischer, 1993 ) postulated an exemplar theory of mental representations of social groups based on the context model of exemplar-based categorization (e.g., Medin & Schaffer, 1978 ; Nosofsky, 1984 ). Their basic premise is that people represent specific exemplars of a group, including an episode of encountering a member of the group, an inference made from any information given about the group, and hearsay about the group from others. Smith and Zárate assumed that exemplars may vary on multiple dimensions, and categorizations and judgments about exemplars are modeled by an algebraic function of similarities among the exemplars. Furthermore, the overall similarity between two exemplars is assumed to be a multiplicative function of the similarities on the dimensions (reviewed later). As noted by Medin and Schaffer (1978) , the multiplicative similarity function used in the context model embodies its assumption that a category is configurally represented (e.g., as opposed to Reed, 1972 ). An exemplar-based representation takes for granted a potential for change and development of group impressions; clearly, as new exemplars are cumulated, representations should change as well.
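The multiplicative similarity rule of the context model (Medin & Schaffer, 1978) can be illustrated with a minimal Python sketch. The function names, the binary feature coding, the exemplars, and the mismatch parameter values are all hypothetical and chosen only for illustration; they are not taken from Smith and Zárate's simulations.

```python
from math import prod

def similarity(probe, exemplar, mismatch):
    """Multiplicative similarity: a matching dimension contributes 1,
    a mismatching dimension contributes its parameter s_d (0 <= s_d <= 1)."""
    return prod(1.0 if p == e else s
                for p, e, s in zip(probe, exemplar, mismatch))

def prob_category(probe, exemplars, mismatch, category):
    """Summed similarity to the exemplars of one category,
    divided by summed similarity to all stored exemplars."""
    total = sum(similarity(probe, ex, mismatch) for ex, _ in exemplars)
    target = sum(similarity(probe, ex, mismatch)
                 for ex, label in exemplars if label == category)
    return target / total

# Hypothetical stored exemplars: (features on three dimensions, group label).
exemplars = [((1, 1, 1), "A"), ((1, 0, 1), "A"),
             ((0, 0, 0), "B"), ((0, 1, 0), "B")]
mismatch = (0.2, 0.5, 0.3)   # assumed mismatch parameters, one per dimension
probe = (1, 1, 0)

p_a = prob_category(probe, exemplars, mismatch, "A")
```

Because the similarities multiply, an exemplar that mismatches the probe on two dimensions (here, 0.5 × 0.3 = 0.15) contributes far less than two exemplars that each mismatch on one, which is the sense in which the representation is configural rather than additive.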
Although the exemplar model incorporates both dynamic and configural properties of group impressions, it falls short of explaining some quantitative properties of group impression formation. Smith and Zárate (1992) assumed that when multiple exemplars are retrieved from memory, features are averaged on a dimension. Although this averaging assumption is consistent with the well-known averaging phenomenon in person impression formation (e.g., Anderson, 1968, 1981; for a review, see Kashima & Kerekes, 1994), it does not specify the mechanism by which the computation may be accomplished. We explicate a model that explains the averaging phenomenon while retaining the configural nature of group representations postulated by the exemplar model. The weighted averaging model (Anderson, 1981, 1982) and the context model adopted by Smith and Zárate (1992) are shown to be derivable from a more general connectionist model of memory: TPM.
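For readers unfamiliar with the weighted averaging model, the following Python sketch shows its standard algebraic form: a response is the weight-normalized sum of the scale values of the information items, optionally including an initial impression. The function and parameter names and the numerical values are illustrative assumptions.

```python
def weighted_average(scale_values, weights, initial_value=0.0, initial_weight=0.0):
    """Weighted averaging: sum of weight * scale value, divided by the
    sum of weights, with an optional initial impression (value and weight)."""
    numerator = initial_weight * initial_value + sum(
        w * s for w, s in zip(weights, scale_values))
    denominator = initial_weight + sum(weights)
    return numerator / denominator

# Hypothetical example: one highly favorable item, then a mildly favorable one.
one_item = weighted_average([8.0], [1.0])
two_items = weighted_average([8.0, 6.0], [1.0, 1.0])
```

The second, mildly favorable item lowers the overall impression from 8 to 7 instead of raising it, which is the signature of averaging as opposed to adding.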
To locate TPM in the contemporary theoretical landscape, a brief sketch of connectionist applications may be useful (for reviews, see Read & Miller, 1998 ; Read, Vanman, & Miller, 1996 ; Smith, 1996 ). Currently, there are two general connectionist approaches. Localist connectionist models assume that each information-processing unit represents a meaningful concept and that the interconnected units collectively represent a network of concepts and ideas. In this framework, simultaneous activation of the connected units produces mutual facilitation and inhibition, enabling it to reproduce surprisingly complex psychological phenomena such as stereotyping ( Kunda & Thagard, 1996 ), causal explanation ( Read & Marcus-Newhall, 1993 ; Van Overwalle, 1998 ), and cognitive dissonance ( Schultz & Lepper, 1996 ). Its strength lies in its capacity to describe the dynamics involved in the use of a network of existing concepts. In contrast, distributed connectionist models (e.g., Kashima & Kerekes, 1994 ; Smith & DeCoster, 1998a , 1998b ) take the view that a meaningful concept is represented by a pattern of activation over multiple processing units and that learning occurs as the connections among the units are modified. In this framework, a central focus is learning. TPM extends the distributed connectionist approach.
A virtue of the TPM is its versatility and generality. TPM has been used to explain memory (e.g., Humphreys et al., 1989; Pike, 1984), natural language processing (e.g., Smolensky, 1990), and reasoning (Halford et al., 1994). We show that TPM can account for a wide range of findings on group impression formation and change: averaging phenomena in impression formation (e.g., Anderson, 1981), the learning of group categories from exemplars (Smith & Zárate, 1990), time-dependent phenomena in group impression formation (e.g., recency, response dependency; see Kashima & Kerekes, 1994), stereotype change (e.g., R. Weber & Crocker, 1983), and category accentuation phenomena (e.g., Krueger & Rothbart, 1990; Tajfel & Wilkes, 1963). In doing so, the model incorporates a variety of theoretical insights such as the variable perspective model (Upshaw, 1969), the notion of individuation (Brewer, 1988; Fiske & Neuberg, 1990), and relational information about interpersonal and intergroup relationships (Turner, 1987). We report the results of three major simulations and one major experiment to support the model.
In this section we offer an overview of the model, first explicating its basic assumptions and then mathematically describing the processes of encoding, storage, and output.
Basic Assumptions of the Model

Social perceivers acquire information about a social group mostly from their social environment. Through direct interaction with members of the group or indirect hearsay in interpersonal discourse (Asch, 1952; Linville & Fischer, 1993; Park & Hastie, 1987), the perceivers construct their impressions about the group. Like exemplar theories (e.g., Linville & Fischer, 1993; Smith & Zárate, 1992), TPM assumes that particular episodes of interaction and discourse are the basis of group impression formation and change. The episodic social information is culturally structured (e.g., Bruner, 1990; Triandis, 1995). Social events typically present themselves as meaningful actions that can be described by natural languages (i.e., action verbs in Semin & Fiedler's, 1988, 1991, linguistic category model; for instance, "helping an old lady crossing the street"). Conversants about a group use meaningful words and phrases to characterize a group (i.e., adjectives or state verbs in Semin & Fiedler; for instance, "helpful"). It is those culturally meaningful events that engage the perceivers' cognitive activities.
The episodic nature of social information makes it necessary for a model of group impressions to represent the context in which the cognitive episode occurred (Tulving, 1983). Group impressions not only are based on the information about the group but also include the information about the context in which the information was obtained. Contextual information may include the social situation in which the event was observed (e.g., at the party), temporal information such as before or after a landmark event (e.g., shortly after the landing on the moon), the person who told the perceiver about the group (e.g., "Joe told me this"), the affective state of the self, or even a simple indexical representation such as "this time" as opposed to "that time." Therefore, information is assumed to be packaged as a configuration of an event and the context in which the event occurred. A number of researchers presented evidence and arguments consistent with this assumption (e.g., McConnell, Liebold, & Sherman, 1997; Schaller, 1992; Shoda, Mischel, & Wright, 1989; Trafimow, 1998; Wright & Mischel, 1987).
It is assumed that an event-in-context is cognitively analyzed into various aspects and encoded into specific features. Aspects were called "respects" by Medin, Goldstone, and Gentner (1993) and "dimensions" by Turner, Oakes et al. ( Oakes, Haslam, & Turner, 1994 ; Turner, 1987 ). In this article, aspects are defined as culturally useful dimensions, and features are specific levels within those dimensions. Kashima et al. (1998) illustrated these concepts using the example of an American male viewer watching on television a documentary of an Australian Aboriginal family in the outback of Australia. From this episode, the viewer may extract the aspects of "skin color," "area of residence," and "context" and encode these aspects in terms of specific features such as "dark skin," "rural area," and "on television." The TPM, therefore, assumes that an event-in-context is interpreted into a set of features (with regard to aspects).
The TPM also assumes that thus analyzed features of an event-in-context are integrated into a coherent, configural representation, and this integration process can be mathematically modeled as the computation of a tensor product. The process of feature integration may be analogous to perceptual binding ( Crick, 1984 ; Treisman & Gelade, 1980 ), which is hypothesized to occur when a viewer's neural mechanisms rapidly bind a variety of visual features together to present themselves to the viewer as a coherent, meaningful object and event. Although the neural basis of the binding process is yet to be examined fully (for a review, see Schacter, Norman, & Koutstaal, 1998 ), the TPM may provide a computational solution to this problem ( Humphreys, Wiles, & Dennis, 1994 ).
Encoding, Storage, and Output Processes in the TPM

Figure 1 provides a schematic picture of the TPM architecture involving four aspects: group, person, event, and context. Each aspect is represented by a designated cluster of cognitive units, and a pattern of activation over a given cluster of units represents a feature (e.g., a specific group label, an individuated person). The operation of the TPM can be described in terms of encoding, storage, and output processes.
Encoding Process

The encoding process consists of two subprocesses: feature encoding and representation construction. In the feature-encoding subprocess, an event-in-context is analyzed into a set of features (e.g., group membership, personal identity, behavioral description) and represented in a distributed format. This subprocess transforms a feature into a distributed representation of the feature. In exemplar theories, an exemplar is usually understood as a configuration of features, whereas each feature takes a unitary representation. Within a distributed representational system, however, a meaningful, apparently unitary concept (e.g., feature) may be represented as a pattern of activation over a collection of cognitive units rather than the activation of a single semantic node (see Hinton & Anderson, 1989; Rumelhart et al., 1986). For ease of exposition, it is assumed that a feature is represented by a pattern of activation over $N$ cognitive units in a given cluster; a unit takes any value from negative infinity to positive infinity; and a unit in the resting state takes the activation level of $1/\sqrt{N}$.
Mathematically, a pattern of activation over $N$ units is described by a real-valued vector with $N$ elements. In other words, one feature of an event-in-context is represented by a vector,
The subprocess of representation construction integrates the distributed feature representations into a configural representation. That is, a representation of a relevant social episode is constructed as a configuration of the features with regard to various aspects of the experience. In Figure 1 , this may be understood as the spreading of activation from the clusters of units to their connection points, and as the computation of the amount by which each connection is strengthened. This mechanism, a generalization of Hebbian learning, is mathematically modeled as the computation of an outer product of the vectors. Recall that the vectors describe the patterns of activation over the units, which represent the features of the episode. The computation of the outer product results in a mathematical entity known as a tensor. A tensor is a generalization of a vector and a matrix. A vector can be thought of as a Rank 1 tensor. When the outer product of two vectors is computed, the result is a matrix, which is a Rank 2 tensor. The outer product of three or more vectors can also be computed, resulting in a tensor of Rank 3 or higher.
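The construction of a configural representation by outer products can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: tensors are stored as nested lists, the helper names (`outer`, `tensor_product`, `rank`) are hypothetical, and the four 4-unit feature vectors are made-up values, not patterns used in the reported simulations.

```python
from math import sqrt

def outer(tensor, vector):
    """Raise a nested-list tensor by one rank via the outer product with a vector."""
    if not isinstance(tensor, list):           # scalar leaf: pair it with every vector element
        return [tensor * v for v in vector]
    return [outer(sub, vector) for sub in tensor]

def tensor_product(*vectors):
    """Outer product of any number of vectors; the rank equals the number of vectors."""
    result = list(vectors[0])
    for v in vectors[1:]:
        result = outer(result, v)
    return result

def rank(tensor):
    """Nesting depth of the list representation (1 = vector, 2 = matrix, ...)."""
    depth = 0
    while isinstance(tensor, list):
        tensor = tensor[0]
        depth += 1
    return depth

N = 4
group   = [0.5, -0.5, 0.5, -0.5]     # hypothetical activation patterns over N units
person  = [0.5, 0.5, -0.5, -0.5]
event   = [0.5, 0.5, 0.5, 0.5]
context = [1 / sqrt(N)] * N          # every unit at the resting level 1/sqrt(N)

E1 = tensor_product(group, person, event, context)   # Rank 4 tensor with N**4 elements
```

Each element of the result is the product of one element from each vector, so `E1[i][j][k][l]` equals `group[i] * person[j] * event[k] * context[l]`; this is the amount by which the connection among those four units would be strengthened.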
Imagine an episode analyzed into four features represented by four $N$-element vectors:

Let $E_1[i, j, k, l]$ represent the element at the $(i, j, k, l)$ coordinate of the tensor,

The same notation can be generalized to a rank $m$ tensor:
The present article mostly deals with representations consisting of group label, person, behavior episode, and context. For example, an episode in which the observer witnessed George, a member of a soccer club, help an old lady crossing the busy street may be represented by a rank four tensor of the form,

where
The representation of the behavior episode may include any inferences made of the episode, such as traits (e.g., Sherman, 1996; cf. Uleman, Newman, & Moskowitz, 1996), agentic or communal orientations from role expectations (e.g., Eagly & Steffen, 1984; Hoffman & Hurst, 1990), and generalized expectations based on a group's performance and decision (e.g., Allison & Messick, 1985; for a review, see Allison, Mackie, & Messick, 1996). We also assume that the amount of attention directed to a given episode may vary. The amount of attention at time $t$ is indexed by the attentional parameter, $a_t$ ($0 \le a_t \le 1$), where 0 is no attention and 1 is full attention. The encoded event at time 1 is, therefore, represented as $a_1E_1$.
Once a mental representation of an event is constructed, it is stored in memory. The central assumption of TPM is that every new representation is superimposed on preexisting representations. With the passing of a unit time period, the memory trace is assumed to weaken as specified by the forgetting parameter, $b$ ($0 < b < 1$). The storage operation is then modeled as a tensor addition. For instance, the representation of a new episode, $E_1$, is superimposed on the preexisting memory, $M_0$, to give the updated memory, $M_1$, where $M_1[i, j, k, l] = bM_0[i, j, k, l] + a_1E_1[i, j, k, l]$. This is equivalent to the strengthening of the connections among the units in Figure 1.
More generally, assuming that the tensors are of the same rank and dimensionality, the memory representation at time $t - 1$, $M_{t-1}$, is updated by the episode encoded at time $t$, $E_t$, as $M_t = bM_{t-1} + a_tE_t$.
Note that the attention parameter is time dependent: That is, it may vary from time to time; however, the forgetting parameter is assumed to be a system constant.
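The storage operation can be sketched as follows. To keep the example short, it assumes Rank 2 tensors (for instance, a group vector paired with an event vector) stored as nested lists; the helper names and all numerical values are hypothetical. The update applied to each element is the one stated above for time 1, with the forgetting parameter held constant and the attention parameter supplied per episode.

```python
def outer(u, v):
    """Outer product of two vectors, giving a Rank 2 tensor (matrix)."""
    return [[ui * vj for vj in v] for ui in u]

def store(memory, episode, attention, forgetting):
    """Superimpose a new episode on memory: each old element is weakened by the
    forgetting parameter b, and the new element is scaled by the attention a_t."""
    return [[forgetting * m + attention * e for m, e in zip(m_row, e_row)]
            for m_row, e_row in zip(memory, episode)]

N = 2
memory = [[0.0] * N for _ in range(N)]     # empty memory before any episode
forgetting = 0.5                           # assumed value of b, constant over time

# Hypothetical episodes, each with its own attention value a_t.
episodes = [
    (outer([1.0, 0.0], [1.0, 0.0]), 1.0),
    (outer([1.0, 0.0], [0.0, 1.0]), 1.0),
]
for episode, attention in episodes:
    memory = store(memory, episode, attention, forgetting)
```

After both episodes, the trace of the first has been weakened once (0.5) while the most recent is at full strength (1.0). With a constant forgetting parameter below 1, later episodes therefore carry more weight in memory than earlier ones receiving equal attention.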
Two kinds of output processes have been examined in the group impression formation literature: judgment and classification. In judgment, an overall impression is reported on a rating scale (e.g., Hamilton & Gifford, 1976 ); in classification, exemplar information is used to classify the exemplar into an appropriate category. The difference lies in the use of cues. In judging a group, the label of the group (e.g., group A) is used as a cue to access memory, and whatever is remembered is reported on rating scales. In classification, it is a concrete example that acts as a cue, and an associated group label is retrieved from memory.
To model the two types of output processes, Kashima et al. (1998) used two operations, retrieval and matching, postulated by Humphreys et al. (1989) . According to Humphreys et al., the retrieval operation is involved in recall in which a piece of information is retrieved from memory. This is modeled within the tensor product framework as the accessing of a distributed memory representation by a lower ranked tensor. For instance, if the memory representation involves a Rank 3 tensor, a Rank 2 tensor is used as a cue. This operation results in the emergence of a distributed representation (i.e., a vector). This process is analogous to classification, in which memory is accessed by the representation of an episode without a group label (a tensor of a lower rank), and a distributed representation of a group label (a vector) is retrieved.
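The retrieval operation described above can be sketched in Python as a contraction of a Rank 3 memory with a Rank 2 cue. The sketch assumes that the cue is multiplied element by element into the memory and summed over the cued aspects, leaving a vector over the uncued aspect; forgetting and attention are omitted, and the helper names and orthogonal 2-unit vectors are hypothetical.

```python
def outer3(a, b, c):
    """Rank 3 tensor from three vectors: element [i][j][k] is a[i] * b[j] * c[k]."""
    return [[[x * y * z for z in c] for y in b] for x in a]

def add(t1, t2):
    """Element-by-element sum of two Rank 3 tensors of the same shape."""
    return [[[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
            for m1, m2 in zip(t1, t2)]

def retrieve(memory, cue):
    """Contract the last two aspects of a Rank 3 memory with a Rank 2 cue,
    returning a vector over the first (uncued) aspect."""
    return [sum(slice_i[j][k] * cue[j][k]
                for j in range(len(cue)) for k in range(len(cue[0])))
            for slice_i in memory]

# Hypothetical orthogonal patterns for two groups, two persons, and two behaviors.
group_a, group_b = [1, 0], [0, 1]
person_1, person_2 = [1, 0], [0, 1]
helping, ignoring = [1, 0], [0, 1]

memory = add(outer3(group_a, person_1, helping),
             outer3(group_b, person_2, ignoring))

# Classification cue: an episode without its group label (person and behavior only).
cue = [[p * e for e in helping] for p in person_1]
retrieved = retrieve(memory, cue)
```

With these orthogonal patterns the cue recovers the pattern for group A exactly; with correlated patterns the retrieved vector would be a blend of the group labels stored with similar episodes.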
More formally, let

where the vector,

The retrieved vector,
The matching operation was used by Humphreys et al. (1989) to model recognition memory. They postulated that recognition judgment is based on a sense of familiarity that people feel when they see an object. Matching involves the accessing of memory by a cue represented by a tensor of the same rank, returning a scalar, which indicates a general feeling of matching strength, or a feeling of knowing. People would take a greater matching strength as an indication that they have seen the object before. Kashima et al. (1998 ; also see Kashima & Kerekes, 1994 ) postulated that bipolar impression judgment involves a process analogous to recognition memory. They suggested that, in making a group impression judgment on a bipolar scale (e.g., likeability, trait, or attitude dimensions), people access memory by cues containing the group label, context information, and the high- and low-end anchors of the judgment scale.
More formally, let us designate by a tensor of Rank 4,
The matching operation is defined as

and

These operations return a scalar that approximates the similarity between the memory and the cue, which can be interpreted psychologically as the general feeling of familiarity.
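A minimal Python sketch of matching follows, assuming that the matching strength is the inner product of the memory tensor and a cue tensor of the same rank (the sum of the products of corresponding elements). The helper names and the 2-unit vectors, including the "helpful" and "unhelpful" anchors, are hypothetical.

```python
def scale(tensor, factor):
    """Multiply every element of a nested-list tensor by a scalar."""
    if not isinstance(tensor, list):
        return tensor * factor
    return [scale(sub, factor) for sub in tensor]

def tensor_product(*vectors):
    """Outer product of the given vectors as nested lists."""
    if len(vectors) == 1:
        return list(vectors[0])
    rest = tensor_product(*vectors[1:])
    return [scale(rest, x) for x in vectors[0]]

def match(memory, cue):
    """Inner product of two same-shaped tensors: a single scalar."""
    if not isinstance(memory, list):
        return memory * cue
    return sum(match(m, c) for m, c in zip(memory, cue))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

group, person, context = [1.0, 0.0], [0.6, 0.8], [0.0, 1.0]
observed = [0.8, 0.6]                          # encoded behavior, closer to "helpful"
helpful, unhelpful = [1.0, 0.0], [0.0, 1.0]    # high- and low-end scale anchors

memory = tensor_product(group, person, observed, context)
match_high = match(memory, tensor_product(group, person, helpful, context))
match_low = match(memory, tensor_product(group, person, unhelpful, context))
```

For a single stored episode, the inner product of two outer products equals the product of the aspect-by-aspect dot products, so here `match_high` is `dot(observed, helpful)` times three dot products that each equal 1. Matching strength is thus multiplicative across aspects, in the same way that similarity is multiplicative across dimensions in the context model.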
The judgment process is then modeled as follows:

This equation embodies the assumption that the judgment scale provides a frame of reference in which people place the target group. People are assumed to access the memory representation by the higher end (Equation 6) and the lower end (Equation 7) of the scale. They then evaluate the relative "closeness" of the target group to the higher end relative to the lower end. This evaluation is used to make a judgment on the bipolar scale (Equation 8). Note that this is a special case of the relative goodness rule (Massaro & Friedman, 1990; see also Luce, 1959).
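The judgment rule can be illustrated with a short Python sketch. It assumes that the judgment takes the two-alternative form of the relative goodness rule, that is, the matching strength for the high-end anchor divided by the sum of the matching strengths for the two anchors; this form is an assumption made here for illustration, as are the function name and the numerical values. The sketch further assumes nonnegative matching strengths.

```python
def judgment(match_high, match_low):
    """Two-alternative relative goodness rule (assumed form): the share of the
    total matching strength that belongs to the high-end anchor."""
    total = match_high + match_low
    if total == 0:
        # Anchors irrelevant to the memory content give no basis for a judgment.
        raise ValueError("both matching strengths are zero")
    return match_high / total

# Hypothetical matching strengths for the high-end and low-end anchors.
favorable = judgment(3.0, 1.0)    # memory is closer to the high-end anchor
neutral = judgment(1.0, 1.0)      # memory is equally close to both anchors
```

The result lies between 0 and 1 and would be mapped linearly onto the response scale. Because only the ratio matters, doubling both matching strengths leaves the judgment unchanged.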
The major characteristic of the TPM is its capacity to model group impressions as dynamic configurations. The model explicitly traces the dynamic development of the mental representations of a social group over time as new information is encountered. It also provides a way of describing a configural representation that Asch's (1946 , 1952) Gestalt approach postulated. As noted by Read et al. (1996) , Kashima and Kerekes's (1994) simple linear-associator does not handle a complex configural representation; however, the TPM rectifies this limitation and contributes to the configural research tradition that Read et al. advocated.
In modeling the output process for impression judgment, TPM incorporates the insight of Upshaw's (1969) variable perspective model. According to Equation 8 , a judgment is, generally, a function of how similar the memory is to the high-end anchor relative to its similarity to the low-end anchor (for detailed discussion, see Kashima & Kerekes, 1994 ). This implies that people interpret the adjectives and words that are used in judgment scales differently, and observed judgments can vary as a function of the mental representations of the scale anchors (e.g., Campbell, Lewis, & Hunt, 1958 ; Manis, 1967 ; Ostrom & Upshaw, 1968 ; Volkmann, 1951 ). More recently, Biernat et al. ( Biernat & Kobrynowicz, 1997 ; Biernat & Manis, 1994 ) convincingly demonstrated the importance of this insight in group-relevant judgments by showing that a social group membership of targets can alter the mental representations of end anchors of judgment scales.
Several points are noteworthy about the representations of judgment scale anchors. First, scale anchors must pertain to an aspect of the tensor representation (e.g., in the present case, to the event aspect). Second, we assume that the scale anchors are selected by the experimenter so that they are relevant to the expected memory content. If they are irrelevant to the event memories, both Match(
In this section, we contrast TPM with the three most relevant connectionist theories.
Kunda and Thagard's (1996) IMP Model

Kunda and Thagard's (1996) theory, which modeled person impression formation as a parallel constraint satisfaction process, differs from TPM in two respects. First, the Kunda-Thagard model treats an observer's "knowledge" about a social group as given and describes its use in forming impressions about a person. As Kunda and Thagard (1996) noted, their model does "not address the question of how incoming information may alter one's knowledge about stereotypes, behaviors, and their associations" (p. 304). It is this process of formation and change, or temporal dynamics, of group impressions that TPM is designed to address.
Second, cognitive architectures differ. The present model assumes a distributed representation, whereas the Kunda-Thagard model assumes a localist representation. In a distributed representational system, a vector is used to represent a concept, whereas a localist representational system uses a meaningfully interpretable "node" to represent a concept. An advantage of the distributed representational system in the present context is the ease with which it can explain the averaging phenomenon in impression formation ( Kashima & Kerekes, 1994 ).
Fiedler's (1996) BIAS Model

Fiedler (1996) proposed the BIAS (Brunswikian Induction Algorithm for Social Cognition) model to explain a number of judgmental biases found in social cognition. This model uses a distributed representational system in which a piece of information is represented as a vector. Biases are explained as a consequence of the process of aggregating a number of representations. In terms of its formal property, BIAS is a special case of the tensor product model. When a Rank 1 tensor (or a vector) is used to represent a concept, TPM reduces to BIAS. Alternatively, TPM may be thought of as an extension of BIAS along the dynamic configural line. BIAS uses a vector representation but does not construct a configural representation that conjunctively combines features of a stimulus object. BIAS is not a memory model, but TPM is grounded in the memory literature.
One major difference between BIAS and TPM lies in their metatheoretical interpretations of the mathematical formalism. TPM takes a cognitive perspective and assumes that the processing of distributed representations as characterized by the mathematical formalism is an algorithmic description of the cognitive processes ( Marr, 1982 ). By contrast, BIAS explicitly interprets a distributed representation as a set of multiple proximal cues that bears probabilistic relations with a distal object within Egon Brunswik's (1956) theory of perception, without adopting "the cognitive metaphor" ( Fiedler, 1996 , p. 200). Nevertheless, the metatheoretical difference may be more apparent than real in that Marr's algorithmic theory does not make a strong commitment to the way in which a process is implemented in a physical system. Processing units in TPM (or any distributed representational system for that matter) can be interpreted as "proximal cues."
Smith and DeCoster's (1998a, 1998b) Recurrent Network

Smith and DeCoster (1998a, 1998b) used an autoassociative network to model the process of person perception and memory. Like the TPM and Fiedler's BIAS, their model adopts a distributed representational system. However, its capacity for memory makes it different from BIAS. Further, the architecture and learning algorithm of their autoassociative network differ from those of the TPM. The processing units are all linked to each other (except with themselves) in the recurrent network, whereas the TPM's associative links are limited to the units that represent different aspects of a social event. The TPM uses a version of the Hebbian learning rule (Kashima et al., 1998), but the Smith-DeCoster model uses the delta rule, which is designed to minimize the network's error in reproducing an input vector.
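The contrast between the two learning rules can be made concrete with a generic Python sketch. These are textbook forms of a Hebbian update and a delta-rule update for a single layer of linear units, written here for illustration only; neither function reproduces the TPM or the Smith-DeCoster network, and the names, learning rates, and input pattern are assumptions.

```python
def hebbian_update(weights, x, y, rate=1.0):
    """Hebbian rule: strengthen each connection by the product of the
    activations it links (an outer product of y and x added to the weights)."""
    return [[w + rate * yi * xj for w, xj in zip(row, x)]
            for row, yi in zip(weights, y)]

def delta_update(weights, x, target, rate=0.5):
    """Delta rule: change each connection in proportion to the error between
    the target and the output that the current weights produce for x."""
    output = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    return [[w + rate * (t - o) * xj for w, xj in zip(row, x)]
            for row, t, o in zip(weights, target, output)]

x = [1.0, 0.0]                        # hypothetical input pattern, presented twice
hebb = [[0.0, 0.0], [0.0, 0.0]]
delta = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(2):
    hebb = hebbian_update(hebb, x, x)     # autoassociation: the pattern is paired with itself
    delta = delta_update(delta, x, x)     # the target is the input itself
```

After two presentations the Hebbian weight has grown by the same amount each time (2.0), whereas the delta-rule weight has moved by shrinking steps toward the value that reproduces the input (0.5, then 0.75). Only the delta rule depends on what the network already reproduces correctly, which is why it is described as error driven.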
Like Kunda and Thagard, Smith and DeCoster model the domain of person perception, although they sometimes reported simulations pertinent to group impressions. Smith and DeCoster's modeling attempt differs from ours in two respects. The first is the level of abstraction at which the research programs are pitched. Smith and DeCoster are generally concerned with describing stereotype learning and use at an abstract level, whereas we attempt to model empirical phenomena at a concrete level, much closer to data. Consequently, Smith and DeCoster did not model different types of output processes (i.e., impression judgment vs. classification learning), a variety of time-dependent properties of group impression formation (to be discussed later), and so on.
Second, Smith and DeCoster's model and TPM may also describe different types of learning processes. McClelland, McNaughton, and O'Reilly (1995) suggested that there are two types of learning processes: a slow-learning system that extracts general regularities, postulated to be implemented in the neocortex, and a fast-learning system that requires attentional resources and binds novel stimuli to construct an episodic representation, which is said to be localized in the hippocampal region. On the one hand, McClelland et al., as well as Smith and DeCoster (1997) , argued that a connectionist learning system that uses an error-driven algorithm (such as the delta rule) may be suitable for modeling the slow-learning process. On the other hand, Dennis and Humphreys (1997) postulated that a mechanism similar to the TPM may be able to describe the fast-learning system (also see Wiles & Humphreys, 1993 ).
This discussion suggests that Smith and DeCoster's model may be best understood as an attempt at modeling the slow-learning mechanism. An empirical inadequacy of Smith and DeCoster's model (e.g., Kashima & Kerekes, 1994; also Busemeyer & Myung, 1988) may be interpretable in this light. As we discuss more fully later, Busemeyer and Myung analytically proved that the learning mechanism involved in the distributed memory system developed by McClelland and Rumelhart (1985, 1986) and used by Smith and DeCoster (1998a, 1998b) predicts that the way in which people estimate the prototype of a category is time invariant. However, Busemeyer and Myung's data, as well as Kashima and Kerekes's (1994), contradicted this prediction. Later in this article we show that group impression data also contradict it. Although it is too early to tell, Smith and DeCoster's model may be more suitable for modeling the slow-learning system, whereas TPM may be better suited for modeling the fast, binding mechanism.
In this section, we first model group impression formation processes within the TPM framework and then report an experiment in which TPM predictions are tested.
Modeling Group Impression Formation
Two types of experimental paradigms have been used to examine the formation of group representations (Kashima, 1999; Kashima et al., 1998). One type is based on the classification learning paradigm (Medin & Schaffer, 1978). Exemplars that vary along multiple dimensions are classified into two novel groups. The participants' task is to learn the classification and to classify new stimuli into the two categories (Smith & Zárate, 1990). The other type is analogous to the person impression formation paradigm, in which novel groups are described by a series of stimuli, and experimental participants are told to make judgments about a group. A well-known example is the distinctiveness-based illusory correlation (Hamilton & Gifford, 1976). The classification and judgment paradigms have produced two different theories, which have not been integrated within a single theoretical framework.
Classification Learning
Kashima et al. (1998) showed that the TPM is consistent with the generalized context model of classification learning (Nosofsky, 1984, 1986; Smith & Zárate, 1992). The generalized context model assumes that when people learn to classify into $n$ groups exemplars that vary along multiple dimensions (e.g., artistic vs. scientific, sociable vs. unsociable), the classification decision for a new exemplar is a function of the similarities of the new exemplar with the learned exemplars. In a typical experiment, people learn to classify exemplars $E_{ij}$ into groups $G_i$ and are later tested for their classification of the old and new exemplars. The probability of classifying a test exemplar, $T$, into the $i$th group, $G_i$, is:
$$P(G_i \mid T) = \frac{\sum_{G_i \ni E_{ij}} s(E_{ij}, T)}{\sum_{i'} \sum_{G_{i'} \ni E_{i'j}} s(E_{i'j}, T)} \tag{9}$$
where $s(E_{ij}, T)$ is the similarity between a learned exemplar ($j$th exemplar of the $i$th group), $E_{ij}$, and the new exemplar, $T$. Note that $G_i \ni E_{ij}$ indicates that the summation is over all the exemplars that belong to the $i$th group, $G_i$.
The context model (Medin & Schaffer, 1978) postulates that the overall similarity between a learned exemplar and the test exemplar is a multiplicative function of the dimensional similarities:
$$s(E_{ij}, T) = \prod_{k=1}^{K} s(E_{ijk}, T_k) \tag{10}$$
where $s(E_{ijk}, T_k)$ is the similarity between the learned exemplar, $E_{ij}$, and the new exemplar, $T$, on the $k$th dimension (where $k = 1 \ldots K$).
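The context model just described can be illustrated with a minimal Python sketch. It assumes binary-valued dimensions, a dimensional similarity of 1 for a match and a free value below 1 for a mismatch, and made-up exemplars; the function names and all numbers are illustrative and are not taken from the original studies.

```python
from math import prod


def similarity(exemplar, test, mismatch):
    """Multiplicative similarity: 1 on matching dimensions, mismatch[k] otherwise."""
    return prod(1.0 if e == t else m for e, t, m in zip(exemplar, test, mismatch))


def classification_probs(groups, test, mismatch):
    """Summed similarity to each group's exemplars, normalized over all groups."""
    summed = {
        label: sum(similarity(e, test, mismatch) for e in exemplars)
        for label, exemplars in groups.items()
    }
    total = sum(summed.values())
    return {label: value / total for label, value in summed.items()}


# Hypothetical learned exemplars on three binary dimensions.
groups = {"A": [(1, 1, 1), (1, 0, 1)], "B": [(0, 0, 0), (0, 1, 0)]}
mismatch = (0.5, 0.5, 0.5)  # hypothetical similarity when a dimension differs
probs = classification_probs(groups, (1, 1, 1), mismatch)
print(probs)
```

With these values the summed similarities are 1.5 for group A and 0.375 for group B, so the test exemplar is assigned to A with probability 0.8.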
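From an analytical perspective, the classification choice predicted by TPM is consistent with the context model as characterized by Equation 9 and the multiplicative similarity function as in Equation 10 (see Kashima et al., 1998, for a general proof). We assume that the values that the exemplars $E_{ij}$ and $T$ take on the $k$th dimension are encoded as vectors, $\mathbf{e}_{ijk}$ and $\mathbf{t}_k$, and that each learned exemplar is stored with its group label vector, $\mathbf{g}_i$, as a tensor product, so that the memory is
$$\mathbf{M} = \sum_i \sum_{G_i \ni E_{ij}} \mathbf{g}_i \otimes \mathbf{e}_{ij1} \otimes \cdots \otimes \mathbf{e}_{ijK}$$
When this is accessed by the new exemplar, the output is a weighted sum of the group label vectors,
$$\sum_i \Bigl[ \sum_{G_i \ni E_{ij}} \prod_k (\mathbf{e}_{ijk} \cdot \mathbf{t}_k) \Bigr] \mathbf{g}_i$$
where the weight, $\sum_{G_i \ni E_{ij}} \prod_k (\mathbf{e}_{ijk} \cdot \mathbf{t}_k)$, sums over the learned exemplars of the $i$th group the product of their dimensional inner products with the new exemplar. Substitute the equality, $\prod_k (\mathbf{e}_{ijk} \cdot \mathbf{t}_k) = \prod_k s(E_{ijk}, T_k) = s(E_{ij}, T)$, and the weight for the $i$th group equals the numerator of Equation 9.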
Empirically too, Kashima et al. (1998) reported that Smith and Zárate's (1990) experimental results on classification learning were closely reproduced by a computer simulation of the TPM. In Smith and Zárate's experiment, human participants learned to classify nine exemplars into categories A and B (five to A and four to B). Later they were given these nine exemplars and seven new exemplars to classify into A and B. Figure 2 presents the probability of classifying the nine old and seven new exemplars to category A, observed in the experiment (dashed line) and obtained in the simulation. The simulation results closely followed the empirical results.
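The analytical correspondence between tensor product storage and the context model can be checked numerically. The following is a minimal sketch, not the simulation reported by Kashima et al. (1998): the feature codes, the orthonormal group labels, the exemplars, and the final normalization of the retrieved weights are all assumptions chosen for illustration. Tensors are stored in flattened form, which leaves inner products unchanged.

```python
from functools import reduce
from math import sqrt


def kron(u, v):
    """Flattened tensor (outer) product of two vectors."""
    return [a * b for a in u for b in v]


def tensor(vectors):
    """Flattened tensor product of a list of vectors."""
    return reduce(kron, vectors)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical codes: unit vectors whose inner product is 0.5 when values differ.
FEATURE = {1: [1.0, 0.0], 0: [0.5, sqrt(0.75)]}
LABEL = {"A": [1.0, 0.0], "B": [0.0, 1.0]}  # orthonormal group labels


def encode(groups):
    """Memory: sum of label-by-feature tensor products over all learned exemplars."""
    traces = [
        tensor([LABEL[g]] + [FEATURE[v] for v in exemplar])
        for g, exemplars in groups.items()
        for exemplar in exemplars
    ]
    return [sum(column) for column in zip(*traces)]


def retrieve(memory, test):
    """Contract the memory with the test exemplar, leaving a label-space vector."""
    cue = tensor([FEATURE[v] for v in test])
    n = len(cue)
    return [dot(memory[i:i + n], cue) for i in range(0, len(memory), n)]


groups = {"A": [(1, 1, 1), (1, 0, 1)], "B": [(0, 0, 0), (0, 1, 0)]}
memory = encode(groups)
output = retrieve(memory, (1, 1, 1))
weights = {g: dot(output, LABEL[g]) for g in LABEL}
p_a = weights["A"] / (weights["A"] + weights["B"])
print(weights, p_a)
```

Each retrieved weight equals the summed product of dimensional inner products for that group (1.5 for A and 0.375 for B here), which is the summed multiplicative similarity of the context model with a mismatch similarity of 0.5.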
Impression Judgment
Kashima et al. (1998) pointed out that TPM is compatible with the weighted averaging model as well. Again, a simplified version of the proof is provided here. Suppose that participants learn group labels, individual members, their behavior episodes, and the context in which the episode was observed. We designate the $i$th group's label $G_i$ ($i = 1 \ldots I$), the $j$th person in the $i$th group $P_{ij}$ ($j = 1 \ldots J$), the person's $k$th behavior episode $E_{ijk}$, and the context in which the episode is observed, $X_l$ ($l = 1 \ldots L$). Most researchers assume that the weighted averaging model can describe group impression judgments under this circumstance, so that the impression judgment for the $i$th group is described as follows:
$$\text{Judgment}(G_{i'}) = \frac{\sum_j \sum_k w_{i'jk}\, s_{i'jk}}{\sum_j \sum_k w_{i'jk}} \tag{14}$$
where $w_{i'jk}$ and $s_{i'jk}$ designate the weight and scale value of the stimulus, $S_{i'jk}$. The relative weight for the stimulus, $S_{i'jk}$, is defined as $w_{i'jk} / \sum_j \sum_k w_{i'jk}$.
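As a small worked illustration of the weighted averaging model and its relative weights, the sketch below computes a judgment for one group from a list of stimuli. The weights and scale values are made up, and the function names are illustrative only.

```python
def weighted_average(stimuli):
    """Judgment as the weight-normalized sum of scale values.

    stimuli: list of (weight, scale_value) pairs describing one group.
    """
    total_weight = sum(w for w, _ in stimuli)
    return sum(w * s for w, s in stimuli) / total_weight


def relative_weights(stimuli):
    """Each stimulus weight divided by the sum of all weights for the group."""
    total_weight = sum(w for w, _ in stimuli)
    return [w / total_weight for w, _ in stimuli]


stimuli = [(1.0, 8.0), (1.0, 2.0), (2.0, 6.0)]  # hypothetical (weight, scale value)
judgment = weighted_average(stimuli)
rel = relative_weights(stimuli)
print(judgment, rel)
```

Because the relative weights sum to 1, the judgment always lies between the smallest and largest scale values; here it is 5.5, with relative weights of .25, .25, and .50.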
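According to TPM, the event involving the $k$th episode of the $j$th person in the $i$th group observed in the $l$th context is encoded as a Rank 4 tensor,
$$\mathbf{g}_i \otimes \mathbf{p}_{ij} \otimes \mathbf{e}_{ijk} \otimes \mathbf{x}_l \tag{15}$$
where the four vectors represent the group label, the person, the behavior episode, and the context, respectively. Substituting Equation 15 into Equation 8 (judgment model) and simplifying it using Equations 6 and 7, we obtain the following: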

assuming that all group label representations,
Let us define

and

When Equations 17 and 18 are substituted into Equation 16, we obtain Equation 14, which represents the weighted averaging model.⁵
The assumption that the weighted averaging model holds in group impression formation has not been tested in the literature. Equation 17 suggests that the scale value should remain relatively constant regardless of the context and person vectors because the scale value, $s_{i'jk}$, is a function only of the similarities of the exemplar and the scale anchors. By contrast, Equation 18 suggests that the weight, $w_{i'jk}$, can vary as a function of the person and context representation. In particular, it is important to note that the weight for an exemplar varies as a function of (
The foregoing discussion implies that, according to TPM, group impressions bear a dynamic relationship with the information environment. Once an event-in-context is positively or negatively encoded for instance, it is stored in memory, and group impressions are constantly updated. The resultant representations about social groups can quite accurately reflect the probabilistic environment with which the connectionist learning system interacts. However, the relationship between group impressions and the probabilistic property of the information environment is rather complex. TPM predicts that group impressions vary as a function of both the probability of types of events encoded about a group and the total amount of information (or number of events) learned about the group.
To see this, we consider a simple case. Suppose that one event is learned about each group member and that there are $J$ members of the group. That is, the number of events observed about the group is $J$. Further suppose that, of those $J$ events, the probability that an event is positive rather than negative is $p$. Under some simplifying assumptions, the impression judgment about the group can be written as follows (see Appendix for proof):

where $s_0$ is the effect of the prior memory and $s_p$ and $s_n$ represent the scale values of the positive and negative events. Assuming that $s_n < s_0 < s_p$, Equation 19 implies that when $J$ is constant, impression judgment is more positive when $p$ is greater, and as $J$ becomes very large, judgment approaches $p s_p + (1 - p) s_n$, a value that is a function only of $p$. Therefore, impression judgments should reflect the probabilistic property of the information environment fairly accurately in the long run.⁶
Equation 19 also implies that impression judgment varies as a function of the total amount of information, $J$, when $J$ is relatively small, even if $p$ is constant. In particular, if $p > (s_0 - s_n)/(s_p - s_n)$, judgment increases (or becomes more positive) as $J$ increases; if $p < (s_0 - s_n)/(s_p - s_n)$, judgment decreases (or becomes more negative) as $J$ increases. The threshold is the value of $p$ at which the long-run judgment, $p s_p + (1 - p) s_n$, equals $s_0$. This implication obtains because of $s_0$, the effect of the prior memory (see Appendix). In other words, when the number of positive events learned about a group is large relative to that of negative events, a group about which social perceivers know a great deal is more positively evaluated than a group about which they know only a little. Conversely, when the number of positive events learned about a group is small relative to that of negative events, a group about which social perceivers know a great deal is more negatively evaluated than a group about which they know only a little.
Both of these implications of TPM are, in fact, consistent with the distinctiveness-based illusory correlation first identified by Hamilton and Gifford (1976; also see Hamilton, Dugan, & Trolier, 1985), arguably the first experiment to report on group impression formation. In their Experiment 1, Hamilton and Gifford presented 39 behavior episodes performed by individual members of two groups (A and B). The majority group exhibited 18 positive and 8 negative behaviors, whereas the minority performed 9 positive and 4 negative behaviors. Although the ratio of positive to negative behaviors remained constant across the two groups, the overall impression formed was more positive for the majority than for the minority group. In their Experiment 2, they showed the reverse tendency; that is, when more negative than positive behaviors were shown, the majority was evaluated more negatively than the minority. This finding has been replicated in a number of experiments (see Mullen & Johnson, 1990).
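The joint dependence of group impressions on the proportion of positive events and on the amount of information can be illustrated numerically. The sketch below does not reproduce Equation 19, which is derived in the Appendix. It assumes one simple form that has the properties stated in the text: the prior memory counts as a single event with scale value `s_prior` and is averaged with the observed events. The scale values are hypothetical, the event counts for the first comparison are those of Hamilton and Gifford's Experiment 1, and the counts for the second comparison are simply the first with valences swapped.

```python
def group_impression(n_events, p, s_pos, s_neg, s_prior):
    """Assumed form: prior memory averaged, as one event, with n_events observed events."""
    observed_mean = p * s_pos + (1 - p) * s_neg
    return (s_prior + n_events * observed_mean) / (1 + n_events)


s_pos, s_neg, s_prior = 1.0, -1.0, 0.0  # hypothetical scale values

# Mostly positive behaviors: 18 of 26 (majority) versus 9 of 13 (minority).
majority = group_impression(26, 18 / 26, s_pos, s_neg, s_prior)
minority = group_impression(13, 9 / 13, s_pos, s_neg, s_prior)

# Mostly negative behaviors (valences swapped; hypothetical counts).
majority_neg = group_impression(26, 8 / 26, s_pos, s_neg, s_prior)
minority_neg = group_impression(13, 4 / 13, s_pos, s_neg, s_prior)

print(majority, minority, majority_neg, minority_neg)
```

Under this assumed form, the larger group is judged more extremely in the direction of its prevailing behaviors even though the proportion of positive behaviors is identical in the two groups, and both judgments approach the same value as the number of events grows.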
This account of the phenomenon differs from that of Hamilton and Gifford (1976) . According to them, the combination of a minority status and infrequent negative behaviors makes this class of episodes distinctive. This distinctiveness is analogous to the pairing of exceptionally long words (e.g., blossoms-notebook) in Chapman's (1967 ; also see Chapman & Chapman, 1967 , 1969 ) experiments. Chapman's participants overestimated the frequency of the occurrence of distinctive pairings. Likewise, the participants in Hamilton and Gifford's experiment weighted the infrequent behaviors more than others, leading to the more negative impression of the minority group.
More recently, a number of researchers suggested alternative explanations. First, Fiedler (1991, 1996; Fiedler & Armbruster, 1994; also see Smith, 1991) suggested that information loss can explain the Hamilton-Gifford phenomenon within the BIAS model framework. Positivity of information was represented by a vector, and negativity by another vector perfectly negatively correlated with it. A judgment was modeled by a correlation between the positivity vector and the sum of all vectors representing the behavioral information of the majority and minority groups. Even if the minority and majority groups exhibit the same level of positivity (or negativity), the magnitude of the correlation between the vector sum and the positivity vector was greater for the majority than for the minority, but only when some errors were introduced to the vectors representing the behaviors. This amounts to a more extreme judgment (either negative or positive) for the majority than for the minority when errors are present or information is lost. Second, McGarty, Haslam, Turner, and Oakes (1993) suggested that the participants have a preconception about the relationship between two contrasted groups. The contrastive relationship may contribute to the differential evaluations of the groups.
The robustness of the Hamilton-Gifford phenomenon suggests that it may be multiply determined (e.g., Berndsen, Spears, McGarty, & van der Pligt, 1998 ; Mackie, Hamilton, Susskind, & Rosselli, 1996 ). The TPM framework may provide a possibility for incorporating a number of explanations of the Hamilton-Gifford phenomenon and evaluating relative contributions of these effects. To begin, the TPM is not inconsistent with the distinctiveness-based account and can incorporate it in terms of the attentional parameter postulated in Equation 8 . In addition, if random error vectors are added to the encoded behaviors in the TPM (as is routinely done in simulations), this can produce the condition simulated by Fiedler (1996) . Finally, as suggested by McGarty et al. (1993) and Berndsen et al. (1998) , it is possible that the process of differentiating the contrasting groups (discussed later in the Group Differentiation section) may be involved in the process.
Furthermore, research identified a number of limiting conditions of the illusory correlation phenomenon ( Berndsen et al., 1998 ; McConnell, Sherman, & Hamilton, 1994 ; McConnell, Sherman, & Hamilton, 1997 ; Pryor, 1986 ; Sanbonmatsu, Sherman, & Hamilton, 1987 ; Schaller, 1992 ; Schaller & Maass, 1989 ). As suggested by Hamilton and Sherman (1996 ; also Berndsen et al., 1998 ), a theory based on the concept of entitativity ( Campbell, 1958 ), the extent to which a group is perceived to be a coherent entity, may provide an integrative explanation of the complex processes involved in the illusory correlation phenomenon. In the meantime, it should be noted that the effect of the prior memory, as suggested by the TPM, may also play some role in generating this robust phenomenon.
Time-Dependent Properties of Group Impression Formation
Impression judgments are time dependent. When a series of stimuli is presented and a judgment is made, the weight given to a stimulus for the judgment depends on time. The TPM makes detailed and novel predictions about the time dependence of impression formation, which are explicated here and tested in an experiment reported later. These predictions are couched in terms of a serial position weight, which is the weight given to a stimulus that occupies a certain serial position for a given impression judgment. When a series of $J$ stimuli ($j = 1$ to $J$) is presented and a judgment is made after the $J$th stimulus, the weight given to the $j$th stimulus is written as SPW($j$, $J$). For example, SPW(1, 4), SPW(2, 4), SPW(3, 4), and SPW(4, 4) indicate the serial position weights for Stimuli 1 through 4 computed based on the judgment made after the fourth stimulus was presented.
Linearity and time invariance. Busemeyer and Myung (1988; Myung & Busemeyer, 1992) showed that a number of connectionist models of category learning can be tested by examining time dependence of impression formation. Those models include Metcalfe-Eich's (Eich, 1982) holographic memory model, Hintzman's (1986) multiple-trace memory model, Knapp and Anderson's (1984) distributed memory model, and McClelland and Rumelhart's (1985) connectionist model. Smith and DeCoster (1998a) also used McClelland and Rumelhart's model. In particular, Busemeyer and Myung showed that a prototype estimate, which is an experimental participant's estimate of the prototype of a category, can be predicted by these models and rewritten in the following form:

where
By contrast, TPM does not have the time-invariance property, although it implies linearity under a certain circumstance (see Kashima & Kerekes, 1994; Kashima et al., 1998). Busemeyer and Myung's (1988; Myung & Busemeyer, 1992) empirical studies showed that human prototype estimates are largely linear but not time invariant. Although Kashima and Kerekes (1994) showed that person impression judgments are time variable, it is yet to be examined whether group impression judgments are time variable. Further, the linearity assumption for group impressions has never been tested directly.
Response dependency. Response dependency means that the weight of a stimulus for a judgment depends on whether, and if so when, another judgment is made between the stimulus and the judgment. Kashima et al. (Kashima & Kerekes, 1994; Kashima et al., 1998) suggested that the TPM predicts a response dependency in impression judgments under some conditions. This prediction rests on two arguments. First, Equation 18 predicts that the weight for a given exemplar is a function of the similarity between the learning and judgment contexts, other things being equal. Therefore, under the condition in which people are expected to interpret the judgment context to be different from the learning context, the exemplar will be weighted differently than in the condition in which the judgment context is interpreted to be the same as the learning context. The greater the similarity between the learning and judgment contexts, the greater the weight should be for an exemplar.
More formally, according to TPM (Equation 18; i.e., ignoring the attentional and forgetting parameters), SPW($j$, $J$) can be described by the following equation under some simplifying assumptions:

where
Second, it is hypothesized that the act of making a judgment often prompts people to alter the context representation. Suppose that an experimental participant receives a first series of stimuli, makes a first judgment, receives a second set of stimuli, and then makes a second judgment. When one task using the first stream of stimuli is completed by making a judgment, the subsequent stream of stimuli may be differentiated from the first. This differentiation of the two sets can be represented as a change in context representation in TPM ( Kashima & Kerekes, 1994 , Footnote 5). Martin (1986) made a similar suggestion in his analysis of assimilation and contrast effects.
Based on this formulation, two predictions follow. First, the weight for the j th stimulus for the judgment made after the J th stimulus, SPW( j, J ), should vary depending on whether another judgment is made between the j th and J th stimuli. Suppose that a series of J stimuli are presented, and a first impression judgment is made after the J 0 th stimulus, and a second impression judgment is made after the J th stimulus is presented. The weights estimated from the second judgment, SPW( j, J ), should increase as a step function of j , where the increase occurs at the position of J 0 (Prediction 1). This is because the context representation for the stimuli before the first judgment is expected to be less similar to the context representation for the stimuli after it. Provided that the serial position weight is a function of the similarity between the learning and judgment contexts ( Equation 21 ), the first set of stimuli should be weighted less than the second set.
Second, the same mechanism predicts that the serial position weights before the first judgment estimated from that judgment should be greater than the weights for the same positions estimated on the basis of the second judgment (Prediction 2). That is, SPW( j, J 0 ) > SPW( j, J ), where J 0 > j . This is because the context representation for the first set of stimuli is expected to be less similar to the context for the second judgment than the context for the first judgment ( Equation 21 ). A test of these predictions is reported later.
Order effect. When response dependency is not an issue, an order effect is another instance of time dependency. If information encountered earlier has a greater effect on a judgment than recent information, it is called a primacy effect; a greater effect of recent information relative to earlier information is called a recency effect. Hamilton and Sherman's (1996) formulation suggests that a group impression may exhibit a primacy or recency effect depending on the target group's perceived entitativity (see Manis & Paskewitz, 1987, for some evidence). When a group is highly entitative (i.e., a group is perceived to be an entity), people would attempt to form an integrative impression in the same fashion as in person impression formation. In person impression formation, observers have been postulated to direct a decreasing amount of attention to later stimuli presumably because the later stimuli are assumed to be redundant (see Kashima & Kerekes, 1994, for a review). This implies a primacy effect resulting from attention decrement when a group is perceived to be an entity.
In contrast, when a group does not have a high level of entitativity, group impression formation may be conceptualized as a series of person impression formations. This suggests that the same attention decrement may occur for each individual because the person is perceived to be an entity. However, when a new individual member is encountered, attention is renewed; therefore, no systematic attention decrement should occur for the overall group impression. This implies that a recency effect may obtain for impressions of a low entitative group, because earlier information may be forgotten and more recent information may have a greater impact in the absence of attention decrement. This process is modeled in TPM by the attentional and forgetting parameters, $a$ and $b$ (Equation 4). As a function of the relative magnitude of the parameters, a primacy or recency effect could occur (see Strange, Schwei, & Geiselman, 1978, for results consistent with this reasoning). The experiment reported next tests the hypothesis that a recency effect is likely to occur when impressions are formed of groups that are not entitative.
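How the relative magnitude of an attentional and a forgetting parameter can tip the balance between primacy and recency may be seen in a small sketch. It does not implement Equation 4; it assumes a simple rule in which each stimulus is encoded with a strength given by its attention value and its trace is multiplied by the forgetting parameter every time a later stimulus is encoded. The parameter values are hypothetical.

```python
def serial_weights(attention, forgetting):
    """Normalized weights under the assumed rule.

    attention[j] is the encoding strength of stimulus j (0-indexed);
    its trace is multiplied by `forgetting` once per later stimulus.
    """
    n = len(attention)
    raw = [a * forgetting ** (n - 1 - j) for j, a in enumerate(attention)]
    total = sum(raw)
    return [w / total for w in raw]


# Attention decrement without forgetting (entity-like target).
primacy = serial_weights([1.0, 0.8, 0.64, 0.512], forgetting=1.0)

# Renewed attention with some forgetting (low-entitativity target).
recency = serial_weights([1.0, 1.0, 1.0, 1.0], forgetting=0.9)

print(primacy)
print(recency)
```

In the first case the weights decline across serial positions (a primacy pattern), and in the second they rise (a recency pattern), which mirrors the verbal argument without committing to the exact form of the TPM learning equation.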
Experiment: Time Dependency of Group Impression Formation
This experiment tested the time-dependent properties of group impression formation predicted by the TPM. A fictitious group of four friends served as a target group. Each individual was attributed two opinions on social issues relevant in Australia: republicanism (whether Australia should become a republic) and Aboriginal issues (whether Australian Aboriginals should receive a better treatment). This provides an experimental condition comparable to Dreben, Fiske, and Hastie's (1979) person impression formation experiment, in which a person was described by four sets of two behavior episodes. Participants were told to form an impression of the group and made their judgments on the group's opinion on Aboriginal issues in five different judgment conditions. In the final responding condition, all opinion statements were presented first, and a judgment was made. In the continuous-responding condition, a judgment was made after each individual. In the other three conditions, judgments were made twice. In the (1, 4), (2, 4), and (3, 4) conditions, a judgment was first made after the first, second, or third person, respectively, and also after the fourth person.
This design allowed us to test the TPM assumption that an implication of a behavior episode constitutes part of the representation of an encoded event. Each stimulus individual expressed an opinion about Aboriginal issues as well as an opinion on republicanism, which, according to a pretest, is an issue distinct from but related to Aboriginal issues. Republicanism is a stance about the constitutional status of Australia as a nation. Currently, the head of state of Australia is the British queen. However, republicans take the view that Australia should become a republic. Australian students tend to believe that a sympathetic stance to Aboriginals and a republican stance go together, presumably because they express a liberal attitude. One half of the stimulus groups consisted of individuals who all expressed pro-republican opinions, and the other half consisted of individuals who all expressed anti-republican opinions. If the implication of an opinion on republicanism is encoded, this should have an effect on the overall impression about the group's opinion on Aboriginal issues.
Method
Participants. Eighty (24 male, 56 female) undergraduate students at La Trobe University participated in this experiment for AUD5 per hour.
Design. Five response conditions were constructed. In the final responding condition, participants were given attitude statements purportedly made by four members of a group and made an impression judgment after the fourth stimulus person. In the continuous-responding condition, a participant made an impression judgment after each group member. In the other three conditions, two judgments were requested: after the first and fourth members in the (1, 4) condition; after the second and fourth members in the (2, 4) condition; and after the third and fourth members in the (3, 4) condition. Sixteen participants were randomly assigned to each condition.
Stimulus. To construct stimulus groups, 60 attitudinal statements (30 favoring and 30 opposing) on each of the republican and the Aboriginal issues were written. These issues were chosen because a pilot study (N = 20) showed that they were perceived to be related to each other in that those who are in favor of Australia becoming a republic were perceived to be more likely to be sympathetic to Aboriginals. In a separate pilot study, 40 participants drawn from the same pool as those who participated in the main experiment were asked to judge whether each statement "favors or opposes Australia becoming a republic" or "favors or opposes Aboriginal people receiving a better treatment" on an 11-point scale (0 = opposes, 10 = favors). The four statements with the highest (M > 7.5; e.g., "Anyone who denies the Aboriginal population the chance to retain some of their land is being racist and immoral") and the four statements with the lowest (M < 2.5; e.g., "Aborigines are basically unemployable because they are all lazy and disruptive") mean scale values were selected.
Each participant was shown 64 groups of stimulus individuals in total (presented in a random order for each participant), of which 32 groups were crucial to the experiment. The other 32 groups served as fillers to mask the repetitive nature of judgments about the crucial groups. Each group was said to consist of four friends whose opinions on two social issues were presented. One issue was a target issue (Aboriginal issue; whether Aboriginal people should receive a better treatment) and the other issue was a related issue (republican issue; whether Australia should become a republic). The 32 stimulus groups were constructed so that they embodied five within-participant factors (two levels in each): group's opinion on the related issue (high vs. low on republican issue), first, second, third, and fourth members' opinion on the target issue (high vs. low on the Aboriginal issue), where "high" means favorable to Australia becoming a republic and sympathetic to Aboriginals.
In half of the 32 stimulus groups, all members expressed "high" opinions on the republican issue, and in the other half, all members had "low" opinions. On the Aboriginals issue, the members' opinions varied in accordance with the factorial design (e.g., HHHH, HHHL, HHLH). However, to ensure that all four different high statements and low statements appear at each position equally frequently within an experimental condition, Anderson's (1973 ; also see Kashima & Kerekes, 1994 ) design was used. First, four high and low statements were each randomly numbered from 1 to 4. In Stimulus Set 1, statements were ordered from Number 1 to Number 4 in all groups. In other stimulus sets (Sets 2 through 4), the order of statements was varied according to the Latin square design (4, 1, 2, 3; 3, 4, 1, 2; and 2, 3, 4, 1). Four participants were randomly assigned to each stimulus set in each response condition. Each group member's opinion on the republican issue was always presented first, followed by an opinion on the target Aboriginal issue.
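The structure of the stimulus design can be made concrete with a short sketch that enumerates the 32 crucial groups and the four statement orders. This is an illustration of the design as described, not the authors' materials; the function names and the "H"/"L" coding are assumptions.

```python
from itertools import product

LEVELS = ("H", "L")  # high vs. low opinion


def stimulus_groups():
    """All 2**5 cells: related-issue level plus four members' target-issue levels."""
    return [
        {"related": related, "target": "".join(members)}
        for related in LEVELS
        for members in product(LEVELS, repeat=4)
    ]


def statement_orders(n=4):
    """Cyclic Latin square: each set shifts the statement numbers one position right."""
    base = list(range(1, n + 1))
    return [[base[(i - shift) % n] for i in range(n)] for shift in range(n)]


groups = stimulus_groups()
orders = statement_orders()
print(len(groups))
print(orders)
```

The four orders are 1, 2, 3, 4; 4, 1, 2, 3; 3, 4, 1, 2; and 2, 3, 4, 1, so every statement number appears once at every serial position across the four stimulus sets.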
This design also permitted an estimation of serial position weights, on which most TPM predictions were based. Recall the notation, SPW($j$, $J$), which refers to the weight for the $j$th stimulus for the judgment made after the $J$th stimulus. The estimation procedure was as follows. First, we summed the judgments after the $J$th stimulus for all the sequences of stimuli whose $j$th stimulus person expressed a positive attitude toward the issue. Second, we summed the judgments after the $J$th stimulus for all the sequences of stimuli whose $j$th stimulus person expressed a negative attitude toward the issue. Finally, the second sum was subtracted from the first sum. The difference score should be a linear function of SPW($j$, $J$). For the rationale of this method, see Anderson (1973) and Kashima and Kerekes (1994).
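The estimation procedure can be sketched in a few lines of Python. To show that the difference score recovers the weights up to a linear transformation, the example feeds the estimator judgments from a hypothetical noise-free judge who applies a weighted average with known serial position weights; the weights, scale values, and function names are all assumptions made for illustration.

```python
from itertools import product


def estimate_spw(judgments, position):
    """Difference score for one serial position (0-indexed).

    judgments maps a sequence such as 'HLHH' to the judgment made after it.
    Returns the sum over sequences with 'H' at `position` minus the sum
    over sequences with 'L' at `position`.
    """
    high = sum(r for seq, r in judgments.items() if seq[position] == "H")
    low = sum(r for seq, r in judgments.items() if seq[position] == "L")
    return high - low


# Hypothetical judge: weighted average with known weights and scale values.
true_weights = [0.1, 0.2, 0.3, 0.4]
scale = {"H": 8.0, "L": 2.0}
judgments = {
    "".join(seq): sum(w * scale[x] for w, x in zip(true_weights, seq))
    for seq in product("HL", repeat=4)
}

estimates = [estimate_spw(judgments, j) for j in range(4)]
print(estimates)
```

In this full factorial example each difference score equals the true weight multiplied by a constant (the number of sequences per level times the difference between the two scale values), so the pattern across serial positions is preserved.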
Procedure. Participants were greeted by a male experimenter and shown to a computer. After the experimenter ensured the participants' familiarity with the equipment, all the instructions were given on the screen. The instructions informed participants that the experiment was concerned with how people form impressions about groups and that they would be shown many groups of friends who express various opinions about social issues. First, they were asked to express their own opinions on various social issues, including the republican and Aboriginal issues, using 11-point scales (0 = opposes, 10 = favors). The mean scores were 7.8 and 7.4, indicating generally liberal attitudes. A practice session was then presented, in which the participants were shown a series of four statements attributed to four different individuals and asked to make judgments about the groups on the same 11-point scale regarding social issues such as the republicanism and Aboriginal issues. Each statement was presented for 7 s, and judgments were self-paced. The schedule by which judgments were requested was the same as in the main experiment. After the practice session, the participants were prompted for questions. No questions were asked. The main experiment then began. Judgments about the target Aboriginal issue were made on the 11-point scale (0 = opposes, 10 = favors). The experiment lasted 50 to 70 min. Participants were thanked and debriefed.
Simulation procedures. Identical experimental conditions were simulated using Mathematica on a Silicon Graphics Indy Workstation. The attention parameter $a$ was set at 1 for the first stimulus and .8 for a second stimulus for each person. The forgetting parameter $b$ was set at .95. For each condition, 16 simulations were run, and the estimates of serial position weights were computed based on these simulations (see Appendix for details).
Results
Additivity. To test the additivity of the impression judgments, a five-way factorial analysis of variance (ANOVA) was conducted on the overall impression judgment for the final responding condition. The five within-participants factors were related attitude (high vs. low), Position 1 (high vs. low), Position 2 (high vs. low), Position 3 (high vs. low), and Position 4 (high vs. low). The TPM predicts a significant main effect for each of the five within-participants factors but no interaction effects. As predicted, the five main effects were significant, F(1, 15) = 9.37, 88.59, 154.64, 345.19, and 141.13, all ps < .01, for related attitude, Position 1, Position 2, Position 3, and Position 4, respectively. No interaction effects were significant. The size of the additive effects was substantial (72% of the total variance), and the interaction effects were relatively minor (less than 2%).
Overall patterns of serial position weights. SPW(1, 4) through SPW(4, 4) were computed separately for the conditions in which related attitude statements were high or low. A Related Attitude (high vs. low) × Position (Position 1 to 4) ANOVA was conducted on these estimated weights for each response condition. None of the main and interaction effects involving related attitude was significant (F < 1.20). The serial position weights were then computed by averaging across the judgments for the high and low related attitude conditions. All subsequent analyses are based on this measure. The means from the simulation, which constitute theoretical predictions, are reported in Figure 3, and comparable means for the human judgments are reported in Figure 4. The human and simulation results were similar, as seen from the figures. The correlation between the human data and simulated results was .88 overall and .90, .87, .99, .96, and .79 for the final responding, continuous responding, (1, 4), (2, 4), and (3, 4) conditions, respectively.
Time variability. A test of time variability was conducted for the continuous-responding condition first. If the group impression judgments were time invariant, there should be no difference among SPW(1, 1), SPW(2, 2), SPW(3, 3), and SPW(4, 4). This is because these serial position weights are for the stimulus immediately preceding the judgment (i.e., there is no difference in time lag); time invariance predicts that the serial position weights should not differ. However, as seen in Panel B of Figure 3, TPM predicts some time variability. A repeated measures ANOVA on the serial position weights from the human data yielded a reliable linear trend, F(1, 45) = 5.71, p < .05, suggesting time variability of group impression judgments.
A comparable test of time variability was conducted using the serial position weights for the (1, 4), (2, 4), (3, 4), and final responding conditions. In particular, SPW(1, 1) from the (1, 4) condition, SPW(2, 2) from the (2, 4) condition, SPW(3, 3) from the (3, 4) condition, and SPW(4, 4) from the final responding condition were compared. Note that all these serial position weights indicate the weights given for the stimulus that immediately precedes the judgment. Given that there is no difference in time lag, time invariance predicts no difference among them, whereas TPM predicts a difference. A one-way ANOVA revealed a significant linear trend, F(1, 60) = 50.83, p < .001, supporting TPM.
Response dependency. To illustrate response dependency, it is most informative to examine the final and continuous-responding conditions first. In the final responding condition, in which only one judgment was made, there should be no effect of prior responses. Therefore, only effects of attention and forgetting (a and b in Equation 4) are expected. The serial position weights here are expected to be smooth, as seen in Panel A of Figure 3 (simulation). By contrast, in the continuous-responding condition, in which a judgment was made after each stimulus person, the serial position weights are expected to change radically. Examine the shape of connected points for Panel B in Figure 3. There is a strong upward swing in each line. This is expected because the stimulus associated with the context for a judgment (i.e., the most recent stimulus in the continuous condition) should have the greatest weight, other things being equal (Equation 21).
Generally, TPM suggests that the serial position weights should depend on the schedule of impression judgments. In particular, the estimated weight for a given serial position should change when a judgment has been made before that position relative to the condition in which no judgment has been made. This implies that the serial position weights should vary as a function of the response condition. A Position (1 to 4) × Response Condition (5 conditions) ANOVA on SPW(1, 4) through SPW(4, 4) yielded the predicted interaction of position and response condition, F(12, 225) = 2.82, p < .01.
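The degrees of freedom of this interaction test can be checked with a short calculation. The sketch assumes a mixed design with position as a within-participants factor and response condition as a between-participants factor, with 16 participants in each of the 5 conditions. The per-condition sample size is an assumption inferred from the F(1, 15) tests reported for the final responding condition; it is not stated in this section.

```latex
\begin{align*}
df_{\text{interaction}} &= (4 - 1)(5 - 1) = 12,\\
N &= 5 \times 16 = 80 \quad \text{(assumed)},\\
df_{\text{error}} &= (N - 5)(4 - 1) = 75 \times 3 = 225.
\end{align*}
```

Under the same assumption, the linear trend tests reported earlier also line up: 15 × 3 = 45 for the continuous-responding condition and 64 − 4 = 60 for the one-way comparison across four conditions.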
TPM makes more detailed predictions. First, when an impression judgment is made twice, the weights estimated from the second judgment should increase as a step function of serial position, where the increase occurs at the position of the first judgment (Prediction 1). An examination of SPW(1, 4) through SPW(4, 4) of the simulation results (see Figure 3) verifies this prediction. Note that in the (1, 4) condition (Panel C), SPW(1, 4) increases to SPW(2, 4), and the rest is relatively stable to SPW(4, 4); in the (2, 4) condition (Panel D), SPW(1, 4) and SPW(2, 4) are similar, and then the weight increases to SPW(3, 4) and SPW(4, 4); and in the (3, 4) condition (Panel E), SPW(1, 4) through SPW(3, 4) remain relatively stable, and then the weight jumps to SPW(4, 4). Table 1 summarizes expected patterns and the corresponding results from the human participants. Generally, the expectations were supported.
Second, when impression judgments are made twice, the weights for the serial positions before the first judgment, as estimated from that judgment, should be greater than the weights for the same serial positions estimated from the second judgment (Prediction 2). For instance, in the (1, 4) condition, the weight for the first serial position estimated from the first judgment, SPW(1, 1), was expected to be greater than the weight for the same position estimated from the second judgment, SPW(1, 4); similarly, for the (2, 4) condition, SPW(1, 2) and SPW(2, 2) were expected to be greater than SPW(1, 4) and SPW(2, 4); and for the (3, 4) condition, SPW(1, 3), SPW(2, 3), and SPW(3, 3) were expected to be greater than SPW(1, 4), SPW(2, 4), and SPW(3, 4) (see Figure 3). All expectations were borne out by the data (see Table 1).
Order effects. As discussed, recency effects are expected in the present experiment when the consideration of response dependency is not relevant. In particular, a weak to moderate recency effect is expected because of memory decay for the stimuli whose context is the same as the judgment context (i.e., when there is no intervening judgment). When the judgment and the stimulus contexts differ, however, the effect of the context difference may override that of the memory decay. Therefore, a recency effect should be observable when stimuli share the same context representation as the judgment but not when they have different context representations. A series of planned contrasts generally supported the expectation (Table 2).
Discussion
The human judgments were consistent with the TPM predictions. Although the changes in the human data (see Figure 4) seem somewhat sharper than those in the simulations (see Figure 3), this is because context vectors were randomly generated. It is possible to generate context vectors so that the psychological distinction between pre- and postjudgment is more pronounced. This would have produced sharper changes as observed in the human data. All in all, the high correlation between the human and simulation results (.88 overall) is impressive given that no parameter fitting was performed.
Furthermore, other TPM predictions were also supported. Human group impression judgments tend to be approximately additive under the final responding condition, as a number of researchers have reported for person impression judgments (e.g., Anderson, 1981). There is a variety of time-dependent properties in group impression judgments, as predicted by TPM. Temporal dynamics of impression formation have been classified into two general types, primacy and recency, but this simple classification masks their complexity. Small recency effects from memory decay as well as large ones from response dependency (i.e., change in context representation) may occur in impression formation.
Once formed, group impressions have often been assumed to persist. The static schema concept and Lippmann's (1922) "picture in the head" metaphor of stereotypes seem to suggest the durability of impressions about social groups. Although the process of impression change has been a long-standing concern (e.g., Allport's, 1954 , contact hypothesis; see Pettigrew, 1998 ), Rothbart was well justified in commenting in 1981 that "although an understanding of how beliefs can be disconfirmed is fundamental for the development of an adequate theory of beliefs, we know very little about this problem" (p. 176).
Since then, a number of studies have been conducted to examine how group impressions change and evolve as new information is encountered, mainly in two experimental paradigms. One is concerned with stereotype change (e.g., R. Weber & Crocker, 1983 ), in which changes in a stereotypic impression about a social group (typically culturally recognized as a social category) are examined when people are presented with stereotype-inconsistent information. The second paradigm may be called group differentiation, in which impressions of two contrasting groups are experimentally created, and subsequent change processes examined with additional information (e.g., Krueger & Rothbart, 1990 ).
Stereotype Change
Basic Findings of Stereotype Change
Weber and Crocker (1983) conducted seminal work on stereotype change. In Experiment 1, information inconsistent with the prior impression about an occupational group (corporate lawyers or librarians) was presented to participants by describing a large number of group members who were each ascribed three characteristics that are consistent, inconsistent, or irrelevant to the prior impression about the group (i.e., stereotype). The participants then evaluated the occupational group on various stereotype-relevant trait dimensions. In all conditions, one third of the information was inconsistent, one sixth was consistent, and one half was irrelevant. The amount, but not the proportion, of inconsistent information was manipulated by presenting either 6 members or 30 members. Within each condition, the pattern of distribution of stereotype-inconsistent information was also manipulated. In the concentrated condition, the stereotype-inconsistent information was concentrated in one third of the members; in the dispersed condition, it was dispersed across all members. They also included a control condition in which participants simply judged occupational groups without additional information.
Weber and Crocker found that, generally, stereotype-inconsistent information changes group impressions. First, the occupational groups were judged less stereotypically in all the experimental conditions in which inconsistent information was presented than in the control condition in which no information was given. Heit (1994) observed a similar effect of the amount of information in his Experiments 3 and 4. This clearly suggests that people are responsive to information that contradicts their prior group impression at least in the experimental context (Finding 1). Second, a greater amount of inconsistent information, nonetheless, tends to change the prior impression more (Finding 2). Although this effect was reliable only when the inconsistent information was dispersed across individuals in Experiment 1 of R. Weber and Crocker (1983) , it was found even in the concentrated condition in Experiment 2.
Third, not only the amount but also the pattern of stereotype-inconsistent information was found to influence the extent of stereotype change. A greater stereotype change was observed when stereotype-inconsistent information was dispersed across all group members than when it was concentrated in a minority (Finding 3). Although this tendency was reliably present when a large amount of information was given (30 members), it was weaker when only 6 group members were described. This finding, a greater stereotype change in the dispersed than in the concentrated condition, has been replicated by Johnston, Hewstone, et al. even with a relatively small amount of information (e.g., Johnston & Hewstone, 1992; Johnston, Hewstone, Pendry, & Frankish, 1994; for a review, see Hewstone, 1994; also Hantzi, 1995).
The effect of the dispersed versus concentrated manipulation is apparently mediated by the perceived typicality of stereotype-disconfirming group members. When the typicality rating of the stereotype-disconfirming members was statistically controlled, the effect of information pattern became nonsignificant (Experiment 1 of Johnston & Hewstone, 1992; Hantzi, 1995). This explanation has been corroborated by two other findings in the literature. First, both R. Weber and Crocker's (1983) Experiment 3 and Rothbart and Lewis's (1988) Experiment 3 showed that the effect of typical examples on group impressions was greater than that of atypical examples (black, medium-income lawyers vs. white, high-income lawyers in Weber and Crocker; high-, medium-, and low-typicality fraternity members in Rothbart and Lewis). In addition, Rothbart and Lewis's Experiments 1 and 2 showed that people tended to overestimate the frequency of pairing of prototypical examples of a category (e.g., typical triangles) with a feature that is irrelevant to the definition of the category (e.g., the color of a triangle) compared with the case in which atypical examples of the category were paired with the feature. This latter finding was interpreted as showing that typical examples are weighted more than atypical examples, an insight to which we return later.
Modeling Stereotype Change
These findings have been interpreted in terms of three different models of belief change in this literature (Hewstone, 1994; R. Weber & Crocker, 1983). The bookkeeping model (Rothbart, 1981) implies a gradual change of beliefs, indicating a step-by-step updating of group impressions as new information is encountered. By contrast, the conversion model (Rothbart, 1981) suggests a sudden alteration of a group impression based on the information about a group member that dramatically disconfirms the prior impression. Finally, the subtyping model postulates that information inconsistent with the prior impression tends to be "subtyped" as exceptions to the rule. This fencing off of inconsistent information, which Allport (1954) called the "re-fencing device" (p. 32), leads to a relatively conservative change of the prior impression, if any.
These models, however, cannot provide a comprehensive explanation of the findings. The bookkeeping model can explain the first finding, a change of stereotype when stereotype-inconsistent information is presented relative to when no additional information is given. R. Weber and Crocker (1983) argued that the bookkeeping model cannot account for the second finding, a greater change when there is a greater amount of inconsistent information while the proportion of inconsistent to consistent information remains the same. The subtyping model can explain a greater effect of dispersed, as opposed to concentrated, stereotype-inconsistent information on stereotype change. When inconsistent information is concentrated in a minority of group members, the minority is likely to be subtyped as an atypical subgroup within the stereotyped group. The fencing off of the subtype would reduce the impact of inconsistent information on the stereotype. It has been argued that this finding contradicts the bookkeeping and conversion models (e.g., R. Weber & Crocker, 1983 ).
TPM analysis of stereotype change. The TPM provides a general explanation of the findings, in which the person representation plays a central role. On the basis of Fiske and Neuberg's (1990) and Brewer's (1988) theories of stereotyping, it is assumed that when an event pertaining to a group member is consistent with the stereotype, the person is not individuated, so that the activation level of the units for the person aspect remains at the resting level (it remains at $1/\sqrt{N}$, or the vector for the person aspect is [...]
Put mathematically, when the $j$th person, $P_j$ (and event, $E_{ij1}$), is encountered about a group ($G_i$), we assume that
[...]
This ratio indicates the similarity of old exemplars to the new exemplar relative to the similarity of the old exemplars with any exemplar. The typicality index, t, is used to construct the person representation, [...]
In other words, a stereotype-consistent member's representation is constructed so that it does not deviate markedly from the resting state, [...]
[...]
where $s_0$ is the prior stereotype, $s_C$ and $s_I$ are the scale values for stereotype-consistent and stereotype-inconsistent information, and t = ([...]
Equation 23 makes several points clear. First, when there are sufficient stereotype-inconsistent members, the stereotype likely changes from the original level, $s_0$, closer toward the scale value of the stereotype-inconsistent information, $s_I$. This is because the weight for stereotype-inconsistent information becomes greater when $J_I$ becomes larger, a prediction consistent with Finding 1. Second, the amount of stereotype change can increase as the amount of stereotype-inconsistent information increases even if the ratio of $J_I$ to $J_C$ remains constant, provided that $J_C / J_I$ is sufficiently small (especially $J_C / J_I < t$). This can be seen by dividing both the denominator and numerator of the right-hand side of Equation 23 by $J_I$. When $J_I$ is very large, the effect of $s_0$ is negligible. This is consistent with Finding 2.
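Equation 23 itself is not reproduced in this text, so the following is only a sketch under a stated assumption. It takes the group judgment to be a weighted average in which the prior stereotype has unit weight, each stereotype-consistent member has weight 1, and each stereotype-inconsistent member has weight t. That form agrees with the relative weights 1/[1 + J_C + tJ_I] and t/[1 + J_C + tJ_I] given in the text, but it is not confirmed by the source. The function name, scale values, and member counts are hypothetical.

```python
def group_judgment(s0, s_c, s_i, j_c, j_i, t):
    """Weighted-average form ASSUMED for Equation 23 (not confirmed by the source).

    s0: prior stereotype; s_c, s_i: scale values of stereotype-consistent and
    stereotype-inconsistent information; j_c, j_i: numbers of consistent and
    inconsistent members; t: typicality index of inconsistent members.
    """
    numerator = s0 + j_c * s_c + t * j_i * s_i
    denominator = 1 + j_c + t * j_i
    return numerator / denominator

# Hypothetical scale: 1 = fully stereotypical, 0 = fully counter-stereotypical.
S0, S_C, S_I, T = 1.0, 1.0, 0.0, 0.5

control = group_judgment(S0, S_C, S_I, j_c=0, j_i=0, t=T)   # no new information
small = group_judgment(S0, S_C, S_I, j_c=1, j_i=2, t=T)     # J_I : J_C = 2 : 1
large = group_judgment(S0, S_C, S_I, j_c=5, j_i=10, t=T)    # same ratio, more members

print(control, small, large)
```

Under these assumed values the judgment drops below the control level once inconsistent members are added (Finding 1) and drops further when the amount of information grows at a constant ratio (Finding 2).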
In addition, Equation 23 suggests that the new information should be approximately additively combined with the prior impression. Heit's (1994) findings are consistent with this implication. His experiments examined the effect of category-consistent and category-inconsistent information on the meaning of a category such as "shyness." He systematically manipulated the probability that a "shy" person is described by behaviors such as "does not attend parties often (consistent)" and "attends parties often (inconsistent)" or a "not shy" person is characterized by these behaviors in his stimulus about people in city W. Participants were asked to estimate the probability of consistent pairing (i.e., a shy person not attending parties and a not-shy person attending parties) and inconsistent pairing (i.e., a shy person attending parties and a not-shy person not attending parties). The probability estimate for consistent pairings was always greater than that for inconsistent pairings, showing the effect of prior expectation. Further, the greater the probability of inconsistent pairing in the stimulus, the smaller was the estimated probability of consistent pairing, suggesting a change in category meaning. In addition, Heit showed that the effect of the prior expectation remained the same regardless of the probability of consistent and inconsistent pairing in the stimulus. This last finding implies that the effects of prior impressions and new information are additively combined. Hayes and Taplin (1992) also reported similar findings with children.
Equation 23 also implies that the amount of stereotype change is mediated by the extent to which a stereotype-inconsistent group member is individuated. Note that the relative weight for a stereotype-consistent and stereotype-inconsistent member is $1/[1 + J_C + tJ_I]$ and $t/[1 + J_C + tJ_I]$, respectively. This means that a stereotype-inconsistent member's information is weighted less than a stereotype-consistent member's information. Recall that t = ([...]
This implication of TPM is consistent with the theoretical insight expressed by Rothbart and John (1985) as well as Hewstone and Brown (1986 ; see, e.g., Scarberry, Ratcliff, Lord, Lanicek, & Desforges, 1997 , for evidence). It is also consistent with Rothbart, Sriram, and Davis-Stitt's (1996) finding that typical members are more likely retrieved by cuing memory with a group label than atypical members. Given that an atypical group member is more likely individuated than a typical group member, the model is consistent with the findings that typical group members are more likely to change stereotypes than atypical group members. Finally, it is consistent with the finding that a stereotype changes more when inconsistent information is dispersed across a number of individuals than when concentrated in a few individuals. This is because dramatically atypical individuals are likely individuated or subtyped.
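The relative weights stated earlier, 1/[1 + J_C + tJ_I] for a stereotype-consistent member and t/[1 + J_C + tJ_I] for a stereotype-inconsistent member, can be tabulated to make the role of typicality concrete. The sketch below assumes that t lies between 0 and 1 and that a lower t stands for a more atypical (more individuated) member; the function name and the member counts are hypothetical.

```python
def member_weights(j_c, j_i, t):
    """Relative weights of one consistent and one inconsistent member.

    Uses the expressions given in the text: 1 / (1 + J_C + t * J_I) and
    t / (1 + J_C + t * J_I).
    """
    denominator = 1 + j_c + t * j_i
    return 1 / denominator, t / denominator

# Hypothetical counts: 4 consistent and 4 inconsistent members.
for t in (0.9, 0.5, 0.1):
    w_consistent, w_inconsistent = member_weights(j_c=4, j_i=4, t=t)
    print(t, round(w_consistent, 3), round(w_inconsistent, 3))
```

For any t below 1 the inconsistent member's weight is t times the consistent member's weight, so a radically atypical member contributes less to the group judgment than a mildly atypical one.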
Gurwitz and Dodge (1977) finding and TPM. One puzzling finding was reported by Gurwitz and Dodge (1977), whose result appears to contradict the wealth of empirical research in stereotype change. Their experiment was probably the first to examine the effect of dispersed versus concentrated stereotype-inconsistent information on stereotypes. They presented information about three sorority women who were friends and shared a room together. In the concentrated condition, one of the three women had all stereotype-inconsistent information, whereas all three women had some stereotype-inconsistent information in the dispersed condition. In both conditions, however, the total amount of stereotype-inconsistent information remained the same. Their participants were then asked to make impression judgments about another woman who was described as a friend of the three women, who shared the room with them, and who also belonged to the same sorority. Their findings suggested that the target person was judged as less stereotypical in the concentrated than in the dispersed condition.
Although the Gurwitz-Dodge finding seems inconsistent with the other findings, TPM suggests that a friend of only mildly stereotype-inconsistent group members (dispersed condition) can be evaluated to be more stereotypical than a friend of a radically stereotype-inconsistent group member (concentrated condition), apparently showing less of a stereotype change in the dispersed condition. The judgment about an individual member of a group can be modeled by TPM (see Appendix):
[...]
where $t_T$ represents the typicality of the target person and $s(I, T)$ is the similarity between the target and stereotype-inconsistent members. Note that $J(G_i, P_T)$ is a judgment about a target person, who is a member of group $i$.
Equation 24 implies that if the target is similar to stereotype-inconsistent members, that is, $s(I, T)$ is large, then the judgment about the target is influenced more by the stereotype-inconsistent members. This implication of TPM can explain the Gurwitz-Dodge finding. Furthermore, to the extent that the typicality of the target person is low (i.e., $t_T$ is small), the effect of the stereotype should be relatively small. This latter implication is consistent with Fiske and Neuberg's (1990) continuum model.
Simulating the Stereotype Change Findings
Johnston and Hewstone's (1992) conditions were simulated using Mathematica on a Silicon Graphics Indy Workstation. In their experiment, participants were shown eight members of a stereotyped group, each of whom was described by six pieces of information. There were 12, 12, and 24 pieces of stereotype-consistent, stereotype-inconsistent, and stereotype-irrelevant information, respectively. In the concentrated condition, the stereotype-inconsistent information was concentrated in two members, but in the dispersed condition, it was distributed across six members (two pieces each). In the latter condition, two members were ascribed three pieces of stereotype-consistent information. Their third stimulus condition was excluded from the simulation for simplicity.
The attention parameter a declined geometrically (1, .8, .64, and so on; i.e., $.8^{k-1}$, where k = 1 to 8) from the first to the eighth stimulus for each person. The forgetting parameter b was set at .95. For each condition, 20 simulations were run. Impression judgments about the group and the individual person were collected after the learning phase, the first change phase, and the second change phase. The means are reported in Table 3 (see Appendix for details).
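The attention schedule described above (1, .8, .64, and so on) is a geometric sequence with ratio .8. A minimal sketch of how such a schedule could be generated follows; the function name and its parameters are hypothetical, and the snippet does not attempt to show how attention or the forgetting parameter b enters the TPM equations.

```python
def attention_schedule(n_stimuli=8, rate=0.8):
    """Attention values a_k = rate ** (k - 1) for k = 1, ..., n_stimuli."""
    return [rate ** (k - 1) for k in range(1, n_stimuli + 1)]

schedule = attention_schedule()
print([round(a, 3) for a in schedule])
```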
The means show that the stereotype was successfully learned by the TPM after the learning phase. The impression judgments using the group and person cues were both .81, indicating a high level of stereotyping (1 = perfectly stereotypical ). When information that is inconsistent with the group stereotype was presented, however, group impression judgments clearly changed. The group impression judgments after the first change phase were less stereotypical than those before it. The impression judgments were even less stereotypical after the second change phase than immediately after the first change phase. Clearly, a greater amount of stereotype-inconsistent information changes the stereotype.
The effects of concentrated versus dispersed stimulus configuration were successfully simulated in the present simulation. When the group was the target as in the typical experimental paradigm, the impression judgments were more stereotypical in the concentrated condition than in the dispersed condition, suggesting a greater stereotype change in the dispersed condition. By contrast, when the person was the target, as in Gurwitz and Dodge, there appears to be a greater stereotype change in the concentrated condition than in the dispersed condition.
Comments
The simulation results showed that TPM can reproduce both the basic findings and Gurwitz and Dodge's finding, showing its capacity to provide a unified account. Central in this is the process of individuation, a process with an ironic implication. On the one hand, as noted by Fiske et al. (e.g., Fiske & Neuberg, 1990), the individuated person is less likely to be stereotyped. On the other hand, as pointed out by Rothbart et al. (e.g., Rothbart & John, 1985), the individuated person is less likely to affect the group impression; that is, less stereotype change is likely. In other words, individuation may be good for the individual but not necessarily good for the group (see Yzerbyt, Coull, & Rocher, 1999). Nevertheless, this does not mean that individuation should be avoided to change an undesirable stereotype. As Rothbart and John (1985) pointed out and TPM suggests, although radically stereotype-inconsistent exemplars may have only small effects on stereotypes (as shown in the concentrated condition), they too could eventually effect a stereotype change if cumulated over time. It would just take more exemplars to attain the same amount of stereotype change when group members are individuated than when they are not.
Finally, in accounting for Gurwitz and Dodge's (1977) finding, we made use of relational information, that is, information about the interpersonal relationship between the stimulus persons and the person about whom impression judgments were required. We assumed that a friend of stimulus persons would be represented in a way that resembles the representations of the stimulus persons. The effect of information about interpersonal relationships on group impressions should be examined more systematically.
Group Differentiation
Tajfel and Wilkes's (1963) classical research on the accentuation phenomenon provided the original impetus to this line of research. Participants in their study were shown a series of lines and asked to estimate their lengths. Tajfel and Wilkes then compared estimated lengths of the lines that were adjacent to each other in length. In some conditions, shorter lines and longer lines were classified into different categories, whereas in other conditions there was no meaningful relation between classification and line length. In the former conditions, the difference between the estimated lengths of adjacent lines was exaggerated when the two lines were classified into two different categories, although this accentuation of interclass difference was not observed when the classification did not meaningfully correlate with line length. A number of studies successfully replicated this finding in the past (e.g., Eiser, 1971; McGarty & Penny, 1988; see the latter for a review).
Tajfel and Wilkes's (1963) original studies examined people's evaluation of individual stimuli that were classified into categories. As noted by Krueger, Rothbart, et al. ( Krueger, 1991 , 1992 ; Krueger & Rothbart, 1988 ; Krueger, Rothbart, & Sriram, 1989 ), this procedure cannot distinguish two sources of the interclass accentuation. One is a contrast effect, in which the perception of an individual stimulus is affected by its membership with one of the two contrasting categories. The other is an accentuation effect, in which a difference between the central tendencies of the differentiated categories is accentuated over and beyond what is expected only from the contrast effect. In this article, we are concerned with this latter phenomenon as it pertains to the judgments of central tendencies, or group impressions.
Basic Findings of Group Differentiation
Krueger and Rothbart's (1990) Experiment 2 provides a prototypical example. Participants were shown a series of personality trait adjectives (pretested to determine their favorability) that were classified into two contrasting groups (focal and context groups) and rated the favorability of each adjective as well as the overall mean favorability of the two groups. In the learning phase, the distributions of the stimulus traits in two groups did not overlap in their favorability. The mean favorability of the context group was, relative to that of the focal group, higher in one condition and lower in the other condition. In the change phase, additional trait adjectives were presented for the two groups. Although the actual means of the two distributions remained constant, the variance of the focal group was made greater than before, so that there was now some overlap between the focal and context groups. Krueger and Rothbart examined the estimated means of the focal and context groups before and after the change phase while controlling for the average of the favorability judgments for the individual traits. They found that the estimated mean of the focal group moved away from the mean of the context group, although there was no change in either the actual mean of the trait adjectives or the average of the rated favorability of individual traits. This finding was largely replicated in their Experiment 3. Krueger and Rothbart's (1990) Experiment 1 using traits, as well as Krueger et al. (1989) and Krueger (1991) using numbers as stimuli, showed a comparable effect when there was a real change in the central tendency of the distribution.
Modeling Group Differentiation
In line with the suggestion made by Krueger, Rothbart, et al., TPM accounts for the basic group differentiation phenomenon by extending the analysis for the stereotype change. In modeling the stereotype change phenomena, the typicality of a person was determined with regard to the single group for which the person was a member. In the group differentiation paradigm, in which two groups are contrasted, however, we assume that an exemplar's typicality is determined not only by the exemplar's similarity with its group's representation but also by its dissimilarity from the representation of the group to which its group is contrasted (Campbell, 1958; Turner, 1987; also see Ford & Stangor, 1992). As in the stereotype change paradigm, we suggest that the person representation is constructed as a function of the typicality, but it is defined within the frame of reference set by the two contrasting groups.
This can be modeled mathematically. Suppose that two groups, Group 1 ($G_1$) and Group 2 ($G_2$), are contrasted with each other, and an event pertaining to Group 1, $E^{1}_{jk}$, is encoded as

The typicality of $G_2$'s exemplar $E^{2}_{jk}$ (
Equation 25 is closely related to the concept of metacontrast (Campbell, 1958; Turner, 1987). Match (
To show that TPM with these additional assumptions can account for the group differentiation phenomena, Krueger and Rothbart's (1990) Experiment 2 was simulated. The results (Table 4) showed that the mean judgments of the focal group moved away from the context group mean in the change phase relative to the learning phase. In the condition in which the context group mean was lower than the focal group mean (Condition 1), the simulated judgment mean for the focal group became larger. Similarly, in the condition in which the context group mean was higher than the focal group mean (Condition 2), the simulated judgment mean for the focal group became smaller. For each condition, a two-way repeated measures ANOVA was conducted with the judgment as the dependent variable and phase (learning vs. change) and group (focal vs. context) as independent variables. As expected, the Phase × Group interaction effect was significant, F(1, 19) = 6.24, p = .022, and F(1, 19) = 26.67, p < .001, for Conditions 1 and 2, respectively.
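For readers who want to see how a Phase × Group interaction of this kind can be computed, the following is a minimal sketch. It relies on the standard result that, in a 2 × 2 fully repeated measures design, the interaction F with one numerator degree of freedom equals the squared one-sample t statistic on each run's difference of differences. The function name `interaction_f` and all data values are hypothetical; they are not the simulation output reported in Table 4.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_f(focal_learn, focal_change, context_learn, context_change):
    """Phase x Group interaction F for a 2 x 2 repeated measures design.

    With one numerator degree of freedom, F equals the squared one-sample
    t statistic on each run's difference of differences.
    """
    d = [(fc - fl) - (cc - cl)
         for fl, fc, cl, cc in zip(focal_learn, focal_change,
                                   context_learn, context_change)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t * t, (1, n - 1)


# Hypothetical judgments from five simulated runs (assumed values).
focal_learn = [5.0, 5.2, 4.9, 5.1, 5.0]
focal_change = [5.4, 5.5, 5.1, 5.6, 5.3]
context_learn = [3.0, 3.1, 2.9, 3.0, 3.2]
context_change = [3.0, 3.0, 3.0, 3.1, 3.1]

f_value, df = interaction_f(focal_learn, focal_change,
                            context_learn, context_change)
```

With 20 simulated runs per condition, the same computation would yield the (1, 19) degrees of freedom reported above.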
Discussion
In line with the current theories of stereotyping and group differentiation (e.g., Fiske & Neuberg, 1990; Rothbart & John, 1985), we postulated that the evaluation of typicality drives the encoding of the individual member. Central in this formulation was the importance of various relational information, that is, the assumption that social perceivers make use of not only the information about the relationship between the group and an individual exemplar (group-person relationship) but also the information about the relationship between two groups (intergroup relationship) in computing the typicality of an individual exemplar. This assumption enabled the TPM to explain the empirical phenomena of stereotype change and group differentiation.
Group impressions are dynamic configurations. They represent social perceivers' flexibly structured and constantly evolving understandings about social groups. The empirical findings reviewed and the experiment reported here underscore the dynamics of group impressions. Group impressions exhibit a number of time-dependent properties and evolve over time in interaction with the information environment. Despite the implicit assumption that group impression formation and change are two separate phenomena, the process underlying both formation and change of group impressions could be a single, learning process.
Group impressions are configural too. The configural use of features is clearly important in learning group categories. A variety of research on the use of context information corroborates the importance of the configural encoding of groups, group members, social events pertaining to them, and context in which the events are said to occur. The use of relational information about person-group and intergroup relationships also underlines the significance of configuration, that is, the structure of the social information based on which group impressions are formed. The TPM provides a unified framework for theorizing about group impressions as dynamic configurations.
Strengths and Weaknesses of the TPM as a Framework for Group Impression Formation and Change
The TPM not only provides a unified framework for the diverse array of empirical findings but also affords an insight about the interpretation of algebraic models postulated in social cognition. Algebraic models are often regarded literally as describing the psychological process in terms of algebraic operations and, therefore, as a description of the controlled, deliberative processing of information (e.g., Fazio, 1990; Fiske & Neuberg, 1990). However, as shown here, both the weighted averaging model and the context model can fall out of the current connectionist model as a natural consequence of the memory and judgment process. This implies, first, that algebraic models should be construed as computational models that simply describe input-output relations rather than algorithmic models that describe the psychological process (Kashima & Kerekes, 1994; E. U. Weber, Goldstein, & Busemeyer, 1991). In other words, the algebraic models should not be interpreted literally but may be seen as describing macrolevel regularities that emerge from microlevel psychological processes, which may not be effortful at all.
Despite these strengths, TPM has its weaknesses. In many social-cognitive theories (e.g., Wyer & Carlston, 1979), a central processing unit (CPU) has been implicitly or explicitly postulated, whose function is to execute procedural knowledge to manipulate declarative knowledge. In connectionist networks, a set of simple processing units, whether localist or distributed, collectively process information. This removes the necessity for the CPU, which smacks of a homunculus in the head. This feature of connectionism may be regarded as an advantage. However, TPM cannot do away with a control mechanism. For example, recall that order effects were explained partly by the attentional parameter and that the individuation process involved in stereotype change and group differentiation was explained in terms of the construction of a person representation. Some mechanism is needed to control the attentional parameter and the construction process for a person representation. This mechanism does not have to be a single CPU but may have a parallel distributed architecture.
A number of areas are yet to be incorporated into the present framework. For instance, more detailed discussion is necessary about the process of stereotyping and individuation in which an individual is the target of judgment (e.g., Fiske & Neuberg, 1990), the perception of an individual's behavior (e.g., Manis, Biernat, & Nelson, 1991), the judgment of group variability (e.g., Ostrom & Sedikides, 1992), memory about groups (e.g., Rothbart, Evans, & Fulero, 1979; Rothbart, Fulero, Jensen, Howard, & Birrell, 1978), and the relation between memory and judgment (e.g., Hastie & Park, 1986; Srull & Wyer, 1989). Because of this, we did not discuss some studies that examined the process of group impression formation when the target was a new group consisting of members of a stereotyped social category (e.g., Dijksterhuis & van Knippenberg, 1995; Dijksterhuis, van Knippenberg, Kruglanski, & Schaper, 1996). We have not addressed some of the nonlinear processes associated with group impression formation and change. People construct emergent properties when two pieces of contradictory information are integrated (e.g., Asch & Zukier, 1984; Hastie, Schroeder, & Weber, 1990; Kunda, Miller, & Claire, 1990). Although TPM can address this issue by adopting a strategy similar to Smith and DeCoster's (1998a, 1998b), full implications of this type of cognitive activity are still outside its scope.
Advantages of Theoretical Reduction
We attempted a theoretical reduction of algebraic models to the TPM, and there are clear advantages. Because old theories are not falsified, they can be regarded as simpler approximations to more complex descriptions provided by a new theory. Old theories can be retained as a useful tool for investigation and a practical approximation. Old theories can be interpreted in a new light, and new theoretical insights may be gained. The use of the weighted averaging model in the current article provides an illustration. The weighted averaging model was shown to be derivable from TPM. The weight and scale value concepts in the weighted averaging model allowed us to examine the time-dependent properties of group impression formation. Furthermore, we could use a model similar to the weighted averaging model (Equations 23 and 24) to shed light on the stereotype change literature.
A theoretical reduction shows a cumulative and dynamic nature of the scientific enterprise of social psychology. Echoing Massaro (1990; also Massaro & Cowan, 1993), we believe connectionist approaches underscore a continuity in psychological theorizing (also see Kashima & Kerekes, 1994; Read et al., 1996) rather than a radical departure. If Asch's (1946, 1952) foundational insight was to conceptualize impressions as dynamic configurations, the evolution of social-cognitive theories in the past two decades since Person Memory: The Cognitive Basis of Social Perception by Hastie et al. (1980) may be seen as a pursuit of an increasingly dynamic theory of social-cognitive processes. The upsurge of interest in connectionism may be a continuation of this trend. The current formulation attempted to show that at least one connectionist-type model, TPM, can describe both dynamic and Gestalt-like configural properties of group impression formation and change, making a contribution to the social-cognitive research tradition (Laudan, 1977; or research program as in Lakatos, 1970).
Connectionism as a Research Tradition
In providing support for TPM in particular, we showed the utility of connectionism in general. Connectionism too is a research tradition whose core consists of a set of theoretical principles. This way, connectionism provides a very general framework in which to develop more specific architectures. Just as a number of specific models of person representation can be generated within the framework of associative memory (e.g., Srull & Wyer, 1989), a specific connectionist architecture such as TPM can generate a number of more specific models. These models can be competitively tested and falsified (e.g., Equations 22 and 25). We showed that the current form of TPM can explain existing data and generate new, empirically supported predictions.
In so doing, we also showed that the current forms of other connectionist architectures have some difficulty explaining the data we provided. At one level, the current forms of those architectures were falsified. However, it does not mean that those architectures cannot be modified to account for the data. Just as impressions are dynamic configurations, so too are theories. They may very well evolve in the face of empirical challenges and develop some novel predictions and possibilities. Connectionism provides a set of conceptual tools with which to theorize about psychological phenomena. With new tools, new possibilities emerge. In this way, the research tradition of connectionism and its particular architectures coevolve with empirical investigations.
Implications for Stereotype Formation and Maintenance
One reason for the current interest in group impression formation and change is its potential implications for real-life stereotypes. Nevertheless, unprincipled generalizations of laboratory results based on hypothetical groups can always be challenged for their lack of ecological validity. However, a well-developed theory can provide a defensible basis for generalizing laboratory results to the sociocultural milieu. In our assessment, the TPM can provide just such a theoretical basis. The present modeling of group impression formation implies that existing stereotypes likely reflect the distribution of types of events in the probabilistic sociocultural environment, although some aspects of group impressions may have a genetic, modular basis (Hirschfeld, 1996). A group is likely perceived in a positive light when the preponderance of direct or indirect hearsay information is relatively positive or vice versa. However, when a relatively small amount of information is involved, stereotypes may not reflect the information distribution in the sociocultural environment. The "distinctiveness-based" illusory correlation may form the basis of some stereotypes.
The current formulation sheds light on the process of stereotype maintenance. The model suggests that group impressions can change in the long run insofar as stereotype-inconsistent information continues to be encoded and stored. Nevertheless, the current model assumes, but does not fully address, the encoding process (especially the feature encoding process). As von Hippel, Sekaquaptewa, and Vargas (1995) emphasized, the perceptual encoding process may, in fact, be responsible for the persistence of stereotypes. To this extent, the stability of stereotypes may stem only in part from the rigidity of the cognitive system despite the picture-in-the-head metaphor enshrined by Lippmann (1922). Potential sources of stereotype maintenance may be more affective and motivational (see Forgas, 1992; Kunda, 1990). For various reasons such as right-wing authoritarianism (Adorno, Frenkel-Brunswik, Levinson, & Sanford, 1950; see Altemeyer, 1998; Pratto, 1999), people may engage in motivated reasoning, so that they can maintain once-formed stereotypes. As Kunda and Oleson (1995) noted, such processes of justification of established impressions may be a significant source of stereotype maintenance. Hoffman and Hurst (1990) argued that gender stereotypes (and probably stereotypes in general) are based not only on observed covariation between group categories and role occupancy, as argued by Eagly and Steffen (1984), but also on justifications that people make about the group difference (see Jost & Banaji, 1994, for a related point).
Another source of the stability of stereotypes may be the information environment in which the cognitive system resides, that is, the sociocultural environment. In particular, a social stereotype may be sustained because a social observer's environment, from which stereotype-relevant information is learned, provides a steady flow of a similar mix of stereotype-consistent and stereotype-inconsistent information. As Oakes et al. (1994) argued, the intergroup relationships that actually exist between the perceiver's ingroups and outgroups may provide a strong basis for stereotypes. Alternatively, under some circumstances, as Jussim and Fleming (1996) noted, a stereotypical expectation may act as a self-fulfilling prophecy, bringing the social reality in line with the stereotype (e.g., Rosenthal & Jacobson, 1968). Mackie and Smith's (1998) review shows that stereotyping can be conceptualized within an integrative framework of intergroup relationships.
As Bartlett (1932) and Allport and Postman (1947) noted long ago, culturally shared stereotypes may persist as they are told and retold in informal communication. In keeping with this, Kashima (2000b) showed that stereotype-consistent information tends to be retained better than stereotype-inconsistent information as a story embedding stereotype-relevant information is transmitted from one person to the next. Culturally structured explanations and justifications are likely to play a significant role in stereotype maintenance in conjunction with the intergroup relationships (McGarty, 1999). What is required is a social psychology of cultural dynamics (Kashima, 2000a, 2000c), that is, a systematic investigation of the dynamics involved in the sociocultural embedding of stereotypes as dynamic configurations.
All in all, connectionist modeling of laboratory-based group impressions can provide insights into the social-psychological process involving existing social stereotypes. This is possible because strong theories can lay a solid foundation for laboratory phenomena, and the credibility of the theory can be used as a basis for principled generalization of theoretical insights to the "real-life" phenomena in sociocultural context.
Concluding Remarks
In 1952, Asch described the state of knowledge about group impression formation and change:
We know little today of the question at issue, mainly because of our failure to study directly the process of impression-forming. Therefore we are not in a position to answer certain first questions such as: What are the organizational properties of group impressions? In what respects do they differ among individuals? What conditions determine their rigidity and lability? (p. 235)
We have indeed made some headway but are now only beginning to answer these questions in a theoretically principled manner.
There are five simplifying assumptions. First, there is some influence of prior memory. When a novel group $i$ is judged without new information, the judgment, $J(G_i)_0$ = Match(



When $J$ is constant, $J(G_i)_J$ approaches $(s_0 + J s_p)/(1 + J)$ as $p$ approaches 1. When $J$ becomes very large, $J(G_i)_J$ approaches $p s_p + (1 - p) s_n$, that is, the average of the positive and negative information. When $J$ is relatively small and $p$ is constant, the change of $J(G_i)_J$ when $J$ increases by 1 is
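The limiting behavior described here can be checked numerically. The sketch below assumes one closed form that is consistent with both stated limits, namely $J(G_i)_J = (s_0 + J[p s_p + (1 - p) s_n])/(1 + J)$, with a unit weight on prior memory. This form is an assumption made for illustration and is not taken from the article's derivation; the function name `group_judgment` and all parameter values are hypothetical.

```python
def group_judgment(J, p, s0, sp, sn):
    """Assumed weighted-average form: prior scale value s0 with unit weight,
    plus J exemplars of which a proportion p is positive (scale value sp)
    and the rest negative (scale value sn)."""
    info_mean = p * sp + (1 - p) * sn
    return (s0 + J * info_mean) / (1 + J)


s0, sp, sn = 0.0, 1.0, -1.0  # hypothetical scale values

# Limit 1: as p approaches 1, the judgment approaches (s0 + J*sp) / (1 + J).
J = 4
near_one = group_judgment(J, 0.999999, s0, sp, sn)
limit_p = (s0 + J * sp) / (1 + J)

# Limit 2: as J grows, the judgment approaches p*sp + (1 - p)*sn.
p = 0.75
large_j = group_judgment(10**6, p, s0, sp, sn)
limit_j = p * sp + (1 - p) * sn

# Under this assumed form, one more exemplar shifts the judgment by
# (info_mean - s0) / ((1 + J) * (2 + J)), so early exemplars matter most.
step = group_judgment(J + 1, p, s0, sp, sn) - group_judgment(J, p, s0, sp, sn)
predicted_step = (limit_j - s0) / ((1 + J) * (2 + J))
```

Under the assumed form, the per-exemplar change shrinks roughly with the square of the number of exemplars already encoded, which is one way to read the claim that judgments stabilize as information accumulates.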

There are five simplifying assumptions. First, there is a stereotype about group $i$. That is, when group $i$ is judged in terms of
Under these assumptions, the memory representation after the $J$th group member is

When the target is a group, the representation,

When the target is an individual, the accessing cues include the target person representation,

In all simulations, relevant tensors and vectors were generated as follows.
To simulate the main experiment, high and low stimuli for a related topic were
For the stereotype change simulation, 80 stereotype-consistent, 10 stereotype-inconsistent, and 10 stereotype-irrelevant exemplars were presented in the learning phase. The specifications for stereotype-consistent, stereotype-inconsistent, and stereotype-irrelevant information were
For the group differentiation simulation, event representations,

It should be noted that Asch's sense of dynamics also implied that meaning of a stimulus changes as a function of preexisting memory and concurrent stimuli. This sense of dynamics as meaning change is not directly handled by the current model. To capture this, a model of encoding processes is necessary, and it falls outside the current scope of TPM.
Although another type of mental representation postulated for social groups is an associative network model (e.g., Stangor & Lange, 1994), that type of model is not suitable for modeling the averaging phenomenon. For a further discussion, see Kashima and Kerekes (1994).
In this article, for expository simplicity, attention is assumed not to differ for different aspects. However, it is possible to make the model more general, so that attention varies across aspects or even for each exemplar's different aspects.
In fact, in most impression formation experiments, stimuli are constructed or selected so that they clearly mark higher or lower ends of a given bipolar scale (e.g., likability). For example, personality trait words may be selected for their clear evaluative connotation; behavioral descriptions may be selected on the basis of their normative ratings in a pilot study so that some clearly indicate one trait and others, its opposite.
Hogarth and Einhorn (1992) proposed a model that makes predictions similar to the current model under some conditions. However, it should be noted that their model cannot account for the results of Smith and Zárate (1990). For a discussion about other inadequacies of their model, see Kashima and Kerekes (1994).
This statement holds provided that the endpoints of judgment scales and scaling parameters are the same and the information environment remains stable at least probabilistically. Interestingly, this implication of the TPM is broadly in agreement with Gigerenzer and Hoffrage's (1995) contention that people's probabilistic judgments are often consistent with the Bayesian normative criterion when information is presented in frequency rather than in probability. The TPM, like other connectionist models, retains information about the frequency of events.
Busemeyer and Myung (1988) also discussed noninterference, but we do not address it because it is not directly relevant to the current discussion.
Table 1. Tests of Response Dependency: Expected Patterns of the Serial Position Weights (SPW) for the (1, 4), (2, 4), and (3, 4) Responding Conditions
This research was supported by an Australian Research Council grant. We acknowledge Paul Polidori for his programming and Paul Clifford for conducting the experiment reported here. We thank Craig McGarty and Michael Platow for their comments on earlier versions of the article.
Correspondence may be addressed to
Electronic mail may be sent to y.kashima@psych.unimelb.edu.au
Received:
Revised: January 31, 2000
Accepted: February 23, 2000

Table 2. Recency Effects for the Stimuli That Share the Same Stimulus and Judgment Context and Those That Do Not Share the Same Context

Table 3. Mean Simulated Group Impression Judgments for the Stereotype Change Simulation

Table 4. Means of the Simulated Mean Judgments for the Group Differentiation

Table 5. Distribution of Events Used in the Simulation for the Group Differentiation

Figure 1. A schematic diagram of a tensor product net with four aspects representing group, person, event, and context.

Figure 2. The probability of classifying exemplars into category A, observed in Smith and Zárate (1990), and the simulation results of the tensor product model. Constructed from the data reported in Table 3.1 in Kashima et al. (1998, p. 84). The dashed line reports the observed probability in the exemplar-only condition of Smith and Zárate (1990) and the solid line represents the prototype + exemplar simulation results, the condition theorized to be analogous to the exemplar-only condition in the empirical study. The positions on the x-axis are different exemplars used in the experiment: a1 to a5 are exemplars to be classified into category A; b1 to b4 are exemplars to be classified into category B; and n1 to n7 are new exemplars.

Figure 3. The means of the serial position weights based on the simulation of the tensor product model in the final, sequential, (1, 4), (2, 4), and (3, 4) conditions.

Figure 4. The means of the serial position weights in the final, sequential, (1, 4), (2, 4), and (3, 4) conditions.