Psychological Review	© 2000 by the American Psychological Association
October 2000 Vol. 107, No. 4, 914-942

Group Impressions as Dynamic Configurations
The Tensor Product Model of Group Impression Formation and Change

Yoshihisa Kashima
School of Psychological Science La Trobe University
Jodie Woolcock
School of Mathematical Science La Trobe University
Emiko S. Kashima
Division of Psychology Swinburne University of Technology
ABSTRACT

Group impressions are dynamic configurations. The tensor product model (TPM), a connectionist model of memory and learning, is used to describe the process of group impression formation and change, emphasizing the structured and contextualized nature of group impressions and the dynamic evolution of group impressions over time. TPM is first shown to be consistent with algebraic models of social judgment (the weighted averaging model; N. Anderson, 1981) and exemplar-based social category learning (the context model; E. R. Smith & M. A. Zárate, 1992), providing a theoretical reduction of the algebraic models to the present connectionist framework. TPM is then shown to describe a common process that underlies both formation and change of group impressions despite the often-made assumption that they constitute different psychological processes. In particular, various time-dependent properties of both group impression formation (e.g., time variability, response dependency, and order effects in impression judgments) and change (e.g., stereotype change and group accentuation) are explained, demonstrating a hidden unity beneath the diverse array of empirical findings. Implications of the model for conceptualizing stereotype formation and change are discussed.

Since Asch's (1946) groundbreaking research, person impression formation has been a major topic of inquiry in social psychology for more than half a century. Despite Asch's (1952) interest, the topic of group impression formation and change began to attract empirical attention only relatively recently (e.g., Hamilton & Gifford, 1976). By research on group impression, we mean a class of studies in which various information about individual members of social groups is presented, and the effects of the information on people's judgments and evaluations about the groups are examined. When participants have little prior information about a target group, this type of research examines the formation of group impressions. By contrast, a group impression change occurs when participants' impressions about a target group, about which participants have some prior expectancies (e.g., stereotypes), evolve as a result of the information given.

The research on group impression formation and change now constitutes a substantial literature in which a number of robust empirical phenomena have been identified (for reviews, see Hamilton & Sherman, 1994; Hamilton & Sherman, 1996; Hilton & von Hippel, 1996). However, theoretical understanding of the phenomena has been hampered by the lack of a coherent theoretical framework that describes the processing of information about social groups. Hilton and von Hippel (1996) lamented, "There has been little effort directed at specifying the details of various representational models" (p. 244). Many empirical phenomena point to the dynamic character of group impressions, that is, the ever-evolving and constantly changing nature of group impressions. Theories of group impressions, however, fall short of capturing this dynamism.

Our main objective here is to present an explicit theory of group impressions that can shed light on their dynamics. We propose a theory of group impression formation and change based on a distributed representational system called the tensor product model (TPM; Humphreys, Bain, & Pike, 1989; Kashima, 1999; Kashima & Kerekes, 1994; Kashima, Woolcock, & King, 1998; Pike, 1984). We then show that this theory can provide an integrative framework in which to explain diverse time-dependent properties of group impression formation and change. It is often assumed that the formation and change of impressions are two separate phenomena: that is, impressions, once formed, become a stable entity (e.g., schema), and change processes involve something different. Contrary to this assumption, we show that the process underlying both formation and change of group impressions could be a single learning process described by the TPM.

Another objective is a theoretical reduction of algebraic models of social judgment to the TPM. Connectionist models are often said to describe information processing at a microcognitive level. Just as the macrolevel thermodynamic description may be reduced to microlevel statistical mechanics ( Nagel, 1961 ), we seek to reduce macrolevel cognitive theories to a microlevel connectionist description. Smolensky (1988) suggested that connectionism would provide a theoretical reduction of symbol processing theories to a subsymbolic paradigm; we believe our model provides a theoretical reduction of algebraic models to a distributed representational system. Social psychologists often seek a theory replacement, in which an old theory is falsified and replaced by a new theory. However, in a theory reduction, a new theory integrates old theories with lesser generality within a more general framework. We believe there are advantages of theory reduction in social psychology.

Group Impression as Dynamic Configuration

Our impressions about a social group evolve over time. As we learn more about the group and its members, our impressions become more elaborate and complex. This intuition about the dynamic nature of group impression was expressed by Asch (1952, pp. 234–235) nearly half a century ago:

Our [initial] impressions of groups are often global, corresponding to particularly blunt central qualities. ... Simplified impressions are a first step toward understanding the surroundings and toward establishing clear, meaningful views. ... When conditions permit, initial impressions are corrected and become more articulated in the light of new experiences.

Asch's emphasis on the dynamics of social psychological process permeates his 1952 textbook. The process of impression formation, whether regarding a group or a person, is no exception.¹

Asch's repudiation of elementarism and theoretical affiliation with the Gestalt tradition are apparent even in this short passage on group impression formation with his allusion to "meaning" and articulation. To his own question of "Is the impression of a group other than the sum of impressions of separate individuals?" (p. 222), Asch responded, "There are group properties that are the mode of interaction between the members. These are neither identical with properties of the individual members nor with properties that exist in some way behind individuals" (p. 226). Group impression was to be understood as an organized whole. To Asch, "impressions" were mental representations that are both dynamic and meaningfully structured or, put simply, dynamic configurations.

Linville, Salovey, and Fischer (1986; for similar views, see, e.g., Brewer, Dull, & Lui, 1981; Taylor, 1981; R. Weber & Crocker, 1983) gave a more contemporary expression of a similarly dynamic view of group impression formation.

Social categories evolve from relatively general, undifferentiated structures to more highly differentiated ones. Thus, new instances that do not fit the category are dealt with in part through increasing category differentiation. We assume that category differentiation tends to occur when the perceiver encounters numerous and varied instances of the category, and experiences incentives to distinguish among category members. (p. 166)

As reviewed later, this view of group impressions as evolving, dynamic configurations is well supported by the empirical literature. However, theories of mental representations about social groups have failed to give a comprehensive explanation of the empirical findings.

The concept of schema has often been used to refer to mental representations of social groups (Fiske & Neuberg, 1990; Fiske & Taylor, 1991; for a related formulation, see Stangor & Lange, 1994). In fact, Asch's contention that group impressions are structured (as in Gestalt) is well reflected in the notion of "group schema." Neisser (1976) defined the concept of "schemata" as what he called cognitive structures, which are "a nonspecific but organized representation of prior experiences" (p. 287). Fiske and Taylor (1991) similarly defined "schema" as "a cognitive structure that represents knowledge about a concept or type of stimulus, including its attributes and the relations among those attributes" (p. 98). Rumelhart (1980) defined a schema as "a data structure for representing the generic concepts stored in memory. ... Inasmuch as a schema underlying a concept stored in memory corresponds to the meaning of that concept, meanings are encoded in terms of the typical or normal situations or events that instantiate that concept" (p. 34).

However, the generally static notion of schema is not suitable for describing the dynamic evolution of impressions, despite some attempts at revising it (e.g., Crocker, Fiske, & Taylor, 1984 , on schema change). Bartlett (1932) , who is credited with having introduced the schema concept to psychology, most clearly expressed this concern.

I strongly dislike the term "schema." It is at once too definite and too sketchy. ... It suggests some persistent, but fragmentary, "form of arrangement," and it does not indicate what is very essential to the whole notion, that the organised mass results of past changes of position and posture are actively doing something all the time; are, so to speak, carried along with us, complete, though developing, from moment to moment. (pp. 200–201)

Interestingly, one schema theorist also reconceptualized the schema concept within a dynamic connectionist framework (Rumelhart, Smolensky, McClelland, & Hinton, 1986).

More recent theorizing about mental representations of social groups moved away from the static conception while retaining the structured, Gestalt-like property. Smith and Zárate (1990 , 1992 ; also see Linville & Fischer, 1993 ) postulated an exemplar theory of mental representations of social groups based on the context model of exemplar-based categorization (e.g., Medin & Schaffer, 1978 ; Nosofsky, 1984 ). Their basic premise is that people represent specific exemplars of a group, including an episode of encountering a member of the group, an inference made from any information given about the group, and hearsay about the group from others. Smith and Zárate assumed that exemplars may vary on multiple dimensions, and categorizations and judgments about exemplars are modeled by an algebraic function of similarities among the exemplars. Furthermore, the overall similarity between two exemplars is assumed to be a multiplicative function of the similarities on the dimensions (reviewed later). As noted by Medin and Schaffer (1978) , the multiplicative similarity function used in the context model embodies its assumption that a category is configurally represented (e.g., as opposed to Reed, 1972 ). An exemplar-based representation takes for granted a potential for change and development of group impressions; clearly, as new exemplars are cumulated, representations should change as well.

Although the exemplar model incorporates both dynamic and configural properties of group impressions, it falls short of explaining some quantitative properties of group impression formation. Smith and Zárate (1992) assumed that when multiple exemplars are retrieved from memory, features are averaged on a dimension. Although this averaging assumption is consistent with the well-known averaging phenomenon in person impression formation (e.g., Anderson, 1968, 1981; for a review, see Kashima & Kerekes, 1994), it does not specify the mechanism by which the computation may be accomplished. We explicate a model that explains the averaging phenomenon while retaining the configural nature of group representations postulated by the exemplar model. The weighted averaging model (Anderson, 1981, 1982) and the context model adopted by Smith and Zárate (1992) are shown to be derivable from a more general connectionist model of memory: TPM.²

To locate TPM in the contemporary theoretical landscape, a brief sketch of connectionist applications may be useful (for reviews, see Read & Miller, 1998 ; Read, Vanman, & Miller, 1996 ; Smith, 1996 ). Currently, there are two general connectionist approaches. Localist connectionist models assume that each information-processing unit represents a meaningful concept and that the interconnected units collectively represent a network of concepts and ideas. In this framework, simultaneous activation of the connected units produces mutual facilitation and inhibition, enabling it to reproduce surprisingly complex psychological phenomena such as stereotyping ( Kunda & Thagard, 1996 ), causal explanation ( Read & Marcus-Newhall, 1993 ; Van Overwalle, 1998 ), and cognitive dissonance ( Schultz & Lepper, 1996 ). Its strength lies in its capacity to describe the dynamics involved in the use of a network of existing concepts. In contrast, distributed connectionist models (e.g., Kashima & Kerekes, 1994 ; Smith & DeCoster, 1998a , 1998b ) take the view that a meaningful concept is represented by a pattern of activation over multiple processing units and that learning occurs as the connections among the units are modified. In this framework, a central focus is learning. TPM extends the distributed connectionist approach.

A virtue of the TPM is its versatility and generality. TPM has been used to explain memory (e.g., Humphreys et al., 1989; Pike, 1984), natural language processing (e.g., Smolensky, 1990), and reasoning (Halford et al., 1994). We show that TPM can account for a wide range of findings on group impression formation and change: averaging phenomena in impression formation (e.g., Anderson, 1981), the learning of group categories from exemplars (Smith & Zárate, 1990), time-dependent phenomena in group impression formation (e.g., recency, response dependency; see Kashima & Kerekes, 1994), stereotype change (e.g., R. Weber & Crocker, 1983), and category accentuation phenomena (e.g., Krueger & Rothbart, 1990; Tajfel & Wilkes, 1963). In doing so, the model incorporates a variety of theoretical insights such as the variable perspective model (Upshaw, 1969), the notion of individuation (Brewer, 1988; Fiske & Neuberg, 1990), and relational information about interpersonal and intergroup relationships (Turner, 1987). We report the results of three major simulations and one major experiment to support the model.

The Tensor Product Model of Group Impression Formation and Change

In this section we offer an overview of the model, first explicating its basic assumptions and then mathematically describing the processes of encoding, storage, and output.

Basic Assumptions of the Model

Social perceivers acquire information about a social group mostly from their social environment. Through direct interaction with members of the group or indirect hearsay in interpersonal discourse ( Asch, 1952 ; Linville & Fischer, 1993 ; Park & Hastie, 1987 ), the perceivers construct their impressions about the group. Like exemplar theories (e.g., Linville & Fischer, 1993 ; Smith & Zárate, 1992 ), TPM assumes that particular episodes of interaction and discourse are the basis of group impression formation and change. The episodic social information is culturally structured (e.g., Bruner, 1990 ; Triandis, 1995 ). Social events typically present themselves as meaningful actions that can be described by natural languages (i.e., action verbs in Semin & Fiedler's, 1988 , 1991 , linguistic category model; for instance, "helping an old lady crossing the street"). Conversants about a group use meaningful words and phrases to characterize a group (i.e., adjectives or state verbs in Semin & Fiedler; for instance, "helpful"). It is those culturally meaningful events that engage the perceivers' cognitive activities.

The episodic nature of social information makes it necessary for a model of group impressions to represent the context in which the cognitive episode occurred (Tulving, 1983). Group impressions not only are based on the information about the group but also include the information about the context in which the information was obtained. Contextual information may include the social situation in which the event was observed (e.g., at the party), temporal information such as before or after a landmark event (e.g., shortly after the landing on the moon), the person who told the perceiver about the group (e.g., "Joe told me this"), the affective state of the self, or even a simple indexical representation such as "this time" as opposed to "that time." Therefore, information is assumed to be packaged as a configuration of an event and the context in which the event occurred. A number of researchers presented evidence and arguments consistent with this assumption (e.g., McConnell, Liebold, & Sherman, 1997; Schaller, 1992; Shoda, Mischel, & Wright, 1989; Trafimow, 1998; Wright & Mischel, 1987).

It is assumed that an event-in-context is cognitively analyzed into various aspects and encoded into specific features. Aspects were called "respects" by Medin, Goldstone, and Gentner (1993) and "dimensions" by Turner, Oakes et al. ( Oakes, Haslam, & Turner, 1994 ; Turner, 1987 ). In this article, aspects are defined as culturally useful dimensions, and features are specific levels within those dimensions. Kashima et al. (1998) illustrated these concepts using the example of an American male viewer watching on television a documentary of an Australian Aboriginal family in the outback of Australia. From this episode, the viewer may extract the aspects of "skin color," "area of residence," and "context" and encode these aspects in terms of specific features such as "dark skin," "rural area," and "on television." The TPM, therefore, assumes that an event-in-context is interpreted into a set of features (with regard to aspects).

The TPM also assumes that the features of an event-in-context, thus analyzed, are integrated into a coherent, configural representation, and this integration process can be mathematically modeled as the computation of a tensor product. The process of feature integration may be analogous to perceptual binding (Crick, 1984; Treisman & Gelade, 1980), which is hypothesized to occur when a viewer's neural mechanisms rapidly bind a variety of visual features together to present themselves to the viewer as a coherent, meaningful object and event. Although the neural basis of the binding process is yet to be examined fully (for a review, see Schacter, Norman, & Koutstaal, 1998), the TPM may provide a computational solution to this problem (Humphreys, Wiles, & Dennis, 1994).

Encoding, Storage, and Output Processes in the TPM

Figure 1 provides a schematic picture of the TPM architecture involving four aspects: group, person, event, and context. Each aspect is represented by a designated cluster of cognitive units, and a pattern of activation over a given cluster of units represents a feature (e.g., a specific group label, an individuated person). The operation of the TPM can be described in terms of encoding, storage, and output processes.

Encoding Process

The encoding process consists of two subprocesses: feature encoding and representation construction. In the feature-encoding subprocess, an event-in-context is analyzed into a set of features (e.g., group membership, personal identity, behavioral description) and represented in a distributed format. This subprocess transforms a feature into a distributed representation of the feature. In exemplar theories, an exemplar is usually understood as a configuration of features, whereas each feature takes a unitary representation. Within a distributed representational system, however, a meaningful, apparently unitary concept (e.g., feature) may be represented as a pattern of activation over a collection of cognitive units rather than the activation of a single semantic node (see Hinton & Anderson, 1989; Rumelhart et al., 1986). For ease of exposition, it is assumed that a feature is represented by a pattern of activation over N cognitive units in a given cluster; a unit takes any value from negative infinity to positive infinity; and a unit in the resting state takes the activation level of 1/√N.

Mathematically, a pattern of activation over N units is described by a real-valued vector with N elements. In other words, one feature of an event-in-context is represented by a vector, f, whose ith element, f[i], represents the level of activation of the ith unit. If two features are extracted, two N-element vectors, such as f₁ and f₂, would represent the event-in-context. More generally, if m features are extracted, m N-element vectors are used. The length of a vector is defined as the square root of its inner product with itself, that is, the length of f = √(f · f), where the inner product (f · f) = ∑ f[i] f[i], summed over i = 1, ..., N. The vector r is used to represent a collection of units all in the resting state, that is, all N elements of r are 1/√N (see Humphreys et al., 1989). In this article, all the vectors are assumed to have the length of unity.
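The vector conventions just described can be illustrated with a minimal sketch. The cluster size N = 4 and the particular feature pattern are arbitrary choices made for illustration, and the helper names (inner, length, normalize, resting_vector) are hypothetical rather than part of the model.

```python
import math

def inner(u, v):
    """Inner product of two equal-length activation vectors."""
    return sum(a * b for a, b in zip(u, v))

def length(u):
    """Length of a vector: square root of its inner product with itself."""
    return math.sqrt(inner(u, u))

def normalize(u):
    """Rescale a pattern of activation to unit length."""
    n = length(u)
    return [a / n for a in u]

def resting_vector(n_units):
    """Resting-state vector r: every element equals 1 / sqrt(N)."""
    return [1 / math.sqrt(n_units)] * n_units

N = 4  # illustrative cluster size (an assumption, not a value from the article)
r = resting_vector(N)
f = normalize([1.0, -1.0, 1.0, -1.0])  # arbitrary illustrative feature pattern

print(length(r), length(f), inner(f, r))
```

Both vectors have unit length, and this particular feature pattern happens to be orthogonal to the resting vector.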

The subprocess of representation construction integrates the distributed feature representations into a configural representation. That is, a representation of a relevant social episode is constructed as a configuration of the features with regard to various aspects of the experience. In Figure 1 , this may be understood as the spreading of activation from the clusters of units to their connection points, and as the computation of the amount by which each connection is strengthened. This mechanism, a generalization of Hebbian learning, is mathematically modeled as the computation of an outer product of the vectors. Recall that the vectors describe the patterns of activation over the units, which represent the features of the episode. The computation of the outer product results in a mathematical entity known as a tensor. A tensor is a generalization of a vector and a matrix. A vector can be thought of as a Rank 1 tensor. When the outer product of two vectors is computed, the result is a matrix, which is a Rank 2 tensor. The outer product of three or more vectors can also be computed, resulting in a tensor of Rank 3 or higher.

Imagine an episode analyzed into four features represented by four N-element vectors: f₁, f₂, f₃, and f₄. The tensor product, E₁, of these vectors is written as

$$\mathbf{E}_1 = \mathbf{f}_1 \otimes \mathbf{f}_2 \otimes \mathbf{f}_3 \otimes \mathbf{f}_4$$

Let E₁[i, j, k, l] represent the element at the (i, j, k, l) coordinate of the tensor, E₁. Then, the tensor, E₁, is defined as follows:

$$E_1[i, j, k, l] = f_1[i] \, f_2[j] \, f_3[k] \, f_4[l]$$

The same notation can be generalized to a Rank m tensor: E₁ = f₁ ⊗ f₂ ⊗ f₃ ⊗ ... ⊗ f_m. Note that a bold lowercase letter is used to denote a vector and a bold uppercase letter is used to denote a tensor.
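As a minimal sketch of representation construction, the tensor product can be computed element by element as the product of one element from each feature vector. The dictionary-based storage, the function name tensor_product, and the four example vectors are illustrative assumptions, not part of the original formulation.

```python
from itertools import product

def tensor_product(*vectors):
    """Outer product of m vectors, stored as {index tuple: value}.

    The element at (i, j, k, ...) is the product of the i-th element of the
    first vector, the j-th element of the second vector, and so on.
    """
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        value = 1.0
        for vec, i in zip(vectors, idx):
            value *= vec[i]
        tensor[idx] = value
    return tensor

# Four arbitrary unit-length feature vectors with N = 4 units each.
f1 = [0.5, 0.5, 0.5, 0.5]
f2 = [0.5, -0.5, 0.5, -0.5]
f3 = [0.5, 0.5, -0.5, -0.5]
f4 = [0.5, -0.5, -0.5, 0.5]

E1 = tensor_product(f1, f2, f3, f4)  # Rank 4 tensor with 4**4 elements
print(len(E1), E1[(0, 1, 2, 3)])
```

Because every feature vector has unit length, the squared elements of the resulting tensor also sum to one.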

The present article mostly deals with representations consisting of group label, person, behavior episode, and context. For example, an episode in which the observer witnessed George, a member of a soccer club, help an old lady crossing the busy street may be represented by a Rank 4 tensor of the form,

$$\mathbf{E}_1 = \mathbf{g}_1 \otimes \mathbf{p}_1 \otimes \mathbf{e}_1 \otimes \mathbf{x}_1$$

where g₁ represents the group label, "soccer club"; p₁ represents "George"; e₁ represents "helping an old lady crossing the busy street"; and x₁ represents "near the busy street." If the individual member of the group is not individuated, the person aspect is assumed to remain at the resting level (i.e., r; all elements are 1/√N).

The representation of the behavior episode may include any inferences made of the episode, such as traits (e.g., Sherman, 1996; cf. Uleman, Newman, & Moskowitz, 1996), agentic or communal orientations from role expectations (e.g., Eagly & Steffen, 1984; Hoffman & Hurst, 1990), and generalized expectations based on a group's performance and decision (e.g., Allison & Messick, 1985; for a review, see Allison, Mackie, & Messick, 1996). We also assume that the amount of attention directed to a given episode may vary. The amount of attention at time t is indexed by the attentional parameter, a_t (0 ≤ a_t ≤ 1), where 0 is no attention and 1 is full attention. The encoded event at time 1 is, therefore, represented as a₁E₁.³

Storage Process

Once a mental representation of an event is constructed, it is stored in memory. The central assumption of TPM is that every new representation is superimposed on preexisting representations. With the passing of a unit time period, the memory trace is assumed to weaken as specified by the forgetting parameter, b (0 < b < 1). The storage operation is then modeled as a tensor addition. For instance, the representation of a new episode, E₁, is added to the preexisting memory, M₀, resulting in the memory, M₁, the sum of the two tensors:

$$\mathbf{M}_1 = b\,\mathbf{M}_0 + a_1 \mathbf{E}_1$$

where M ₁ [ i , j , k , l ] = bM ₀ [ i , j , k , l ] + a ₁ E ₁ [ i , j , k , l ]. This is equivalent to the strengthening of the connections among the units in Figure 1 .

More generally, assuming that the tensors are of the same rank and dimensionality, the memory representation at time t − 1, M_{t−1}, is updated by a new episode at time t, E_t, as follows:

$$\mathbf{M}_t = b\,\mathbf{M}_{t-1} + a_t \mathbf{E}_t$$

Note that the attention parameter is time dependent: That is, it may vary from time to time; however, the forgetting parameter is assumed to be a system constant.
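The storage process can be sketched as a running update in which the old memory is scaled by the forgetting parameter and the new episode by its attention weight. In this sketch the empty starting memory (all zeros), the parameter values, and the feature vectors are illustrative assumptions; the helper names are hypothetical.

```python
from itertools import product

def tensor_product(*vectors):
    """Outer product of vectors, stored as {index tuple: value}."""
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        value = 1.0
        for vec, i in zip(vectors, idx):
            value *= vec[i]
        tensor[idx] = value
    return tensor

def store(memory, episode, attention, forgetting):
    """Superimpose an episode on memory: b * M[t-1] + a_t * E_t, elementwise."""
    return {idx: forgetting * memory[idx] + attention * episode[idx]
            for idx in episode}

S = 2 ** -0.5  # 1 / sqrt(2), so that two-unit vectors have unit length
g = [S, S]              # group label (illustrative pattern)
p = [S, -S]             # individuated person
e_help = [1.0, 0.0]     # first behavior episode
e_rude = [0.0, 1.0]     # second behavior episode
x = [S, S]              # context

E1 = tensor_product(g, p, e_help, x)
E2 = tensor_product(g, p, e_rude, x)

b = 0.9                                 # forgetting parameter, a system constant
M0 = {idx: 0.0 for idx in E1}           # assumed empty starting memory
M1 = store(M0, E1, attention=1.0, forgetting=b)
M2 = store(M1, E2, attention=0.5, forgetting=b)  # attention varies over time
print(M2[(0, 0, 0, 0)], M2[(0, 0, 1, 0)])
```

After two episodes the older trace has been weakened once by b, while the newer trace enters at its own attention weight.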

Output Process

Two kinds of output processes have been examined in the group impression formation literature: judgment and classification. In judgment, an overall impression is reported on a rating scale (e.g., Hamilton & Gifford, 1976 ); in classification, exemplar information is used to classify the exemplar into an appropriate category. The difference lies in the use of cues. In judging a group, the label of the group (e.g., group A) is used as a cue to access memory, and whatever is remembered is reported on rating scales. In classification, it is a concrete example that acts as a cue, and an associated group label is retrieved from memory.

To model the two types of output processes, Kashima et al. (1998) used two operations, retrieval and matching, postulated by Humphreys et al. (1989) . According to Humphreys et al., the retrieval operation is involved in recall in which a piece of information is retrieved from memory. This is modeled within the tensor product framework as the accessing of a distributed memory representation by a lower ranked tensor. For instance, if the memory representation involves a Rank 3 tensor, a Rank 2 tensor is used as a cue. This operation results in the emergence of a distributed representation (i.e., a vector). This process is analogous to classification, in which memory is accessed by the representation of an episode without a group label (a tensor of a lower rank), and a distributed representation of a group label (a vector) is retrieved.

More formally, let M be a tensor of Rank 4, consisting of the aspects of group label, person, event, and context, and let C be a tensor of Rank 3 with the aspects of person, event, and context, that is, C = [ ] ⊗ p₁ ⊗ e₁ ⊗ x₁, where [ ] denotes a missing aspect. In other words, C does not contain group label information. The retrieval function is defined as follows:

$$\operatorname{Retrieve}(\mathbf{M}, \mathbf{C}) = \mathbf{v}$$

where the vector, v, is defined as below:

$$v[i] = \sum_{j} \sum_{k} \sum_{l} M[i, j, k, l] \, p_1[j] \, e_1[k] \, x_1[l]$$

The retrieved vector, v , is a noisy representation of the group label associated with the event-in-context.
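A minimal sketch of retrieval follows, assuming that the cue is applied by summing memory elements weighted by the cue's person, event, and context elements, which leaves a vector over the group-label units. The two stored episodes, their orthogonal person vectors, and the helper names are illustrative assumptions. With orthogonal vectors the retrieved label is clean; overlapping vectors would make it noisier.

```python
from itertools import product

def tensor_product(*vectors):
    """Outer product of vectors, stored as {index tuple: value}."""
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        value = 1.0
        for vec, i in zip(vectors, idx):
            value *= vec[i]
        tensor[idx] = value
    return tensor

def add(t1, t2):
    """Elementwise sum of two tensors of the same rank and dimensionality."""
    return {idx: t1[idx] + t2[idx] for idx in t1}

def retrieve(memory, *cue_vectors):
    """v[i] = sum over j, k, l of M[i, j, k, l] * p[j] * e[k] * x[l]."""
    n_units = 1 + max(idx[0] for idx in memory)
    v = [0.0] * n_units
    for idx, value in memory.items():
        weight = value
        for vec, j in zip(cue_vectors, idx[1:]):
            weight *= vec[j]
        v[idx[0]] += weight
    return v

S = 2 ** -0.5
g_a, g_b = [1.0, 0.0], [0.0, 1.0]   # two group labels
p1, p2 = [S, S], [S, -S]            # two persons (orthogonal patterns)
e1 = [1.0, 0.0]                     # shared behavior episode
x1 = [S, S]                         # shared context

# Memory holds person 1 as a member of group A and person 2 as a member of group B.
M = add(tensor_product(g_a, p1, e1, x1), tensor_product(g_b, p2, e1, x1))

v = retrieve(M, p1, e1, x1)  # cue lacks the group label
print(v)
```

Cueing with person 1 returns a vector close to the label for group A, because the trace involving person 2 is cancelled by the orthogonality of the two person patterns.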

The matching operation was used by Humphreys et al. (1989) to model recognition memory. They postulated that recognition judgment is based on a sense of familiarity that people feel when they see an object. Matching involves the accessing of memory by a cue represented by a tensor of the same rank, returning a scalar, which indicates a general feeling of matching strength, or a feeling of knowing. People would take a greater matching strength as an indication that they have seen the object before. Kashima et al. (1998 ; also see Kashima & Kerekes, 1994 ) postulated that bipolar impression judgment involves a process analogous to recognition memory. They suggested that, in making a group impression judgment on a bipolar scale (e.g., likeability, trait, or attitude dimensions), people access memory by cues containing the group label, context information, and the high- and low-end anchors of the judgment scale.

More formally, let us designate by a tensor of Rank 4, M , the memory representation based on which a bipolar judgment about a group is made. This tensor consists of the aspects of group label, person, behavior episode, and context. In judgment, two cues are used to access this memory. The cues are described by tensors of Rank 4, H and L , both including a particular group label and context, g and x . In addition, they contain representations of a person and the judgment scale. When a particular member of the group is specified (i.e., George the soccer player), the person aspect of the tensors involves a particular activation pattern, p , to represent this person. When an individuated person is not specified as a cue, the units for the person aspect stay at the resting state; therefore, the person aspect of these tensors is described by a vector r . In this section, we describe only the latter case, although the former will be relevant later. The behavior episode aspect of these tensors is either the mental representation of the higher or lower end anchor of a judgment scale (e.g., likable or unlikable), h , or l . Therefore, H = g ⊗ r ⊗ h ⊗ x , and L = g ⊗ r ⊗ l ⊗ x .

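The matching operation is defined as

$$\operatorname{Match}(\mathbf{M}, \mathbf{H}) = \sum_{i} \sum_{j} \sum_{k} \sum_{l} M[i, j, k, l] \, H[i, j, k, l] \tag{6}$$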

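and

$$\operatorname{Match}(\mathbf{M}, \mathbf{L}) = \sum_{i} \sum_{j} \sum_{k} \sum_{l} M[i, j, k, l] \, L[i, j, k, l] \tag{7}$$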

These operations return a scalar that approximates the similarity between the memory and the cue, which can be interpreted psychologically as the general feeling of familiarity.
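A worked expansion may clarify why the matching strength behaves like a similarity measure. It is a sketch that assumes memory starts empty (M₀ = 0), which is an illustrative simplification, and treats matching as the sum of elementwise products of the memory and cue tensors. Under the storage rule, the memory after T episodes is a weighted sum of episode tensors, and the match with the cue H = g ⊗ r ⊗ h ⊗ x factors into inner products, one per aspect:

```latex
\mathbf{M}_T = \sum_{t=1}^{T} b^{\,T-t} a_t \,
  \mathbf{g}_t \otimes \mathbf{p}_t \otimes \mathbf{e}_t \otimes \mathbf{x}_t ,
\qquad
\operatorname{Match}(\mathbf{M}_T, \mathbf{H})
  = \sum_{t=1}^{T} b^{\,T-t} a_t \,
    (\mathbf{g}_t \cdot \mathbf{g}) \,
    (\mathbf{p}_t \cdot \mathbf{r}) \,
    (\mathbf{e}_t \cdot \mathbf{h}) \,
    (\mathbf{x}_t \cdot \mathbf{x})
```

Each stored episode therefore contributes in proportion to its attention weight, its recency, and the overlap between its features and the cue on every aspect; an episode that is orthogonal to the cue on any one aspect contributes nothing.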

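The judgment process is then modeled as follows:

$$J = \frac{\operatorname{Match}(\mathbf{M}, \mathbf{H})}{\operatorname{Match}(\mathbf{M}, \mathbf{H}) + \operatorname{Match}(\mathbf{M}, \mathbf{L})} \tag{8}$$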

This equation embodies the assumption that the judgment scale provides a frame of reference in which people place the target group. People are assumed to access the memory representation by the higher end (Equation 6) and the lower end (Equation 7) of the scale. They then evaluate the "closeness" of the target group to the higher end relative to the lower end. This evaluation is used to make a judgment on the bipolar scale (Equation 8). Note that this is a special case of the relative goodness rule (Massaro & Friedman, 1990; see also Luce, 1959).
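The judgment process can be sketched end to end under stated assumptions: matching is taken to be the sum of elementwise products of memory and cue, and the judgment is taken to be the ratio Match(M, H) / [Match(M, H) + Match(M, L)], the usual form of the relative goodness rule. The anchors h and l are assumed orthogonal, each episode is assumed to coincide exactly with one anchor, and all parameter values are illustrative.

```python
from itertools import product

def tensor_product(*vectors):
    """Outer product of vectors, stored as {index tuple: value}."""
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        value = 1.0
        for vec, i in zip(vectors, idx):
            value *= vec[i]
        tensor[idx] = value
    return tensor

def match(memory, cue):
    """Matching strength: sum of elementwise products of memory and cue."""
    return sum(memory[idx] * cue[idx] for idx in memory)

def judge(memory, high_cue, low_cue):
    """Assumed ratio form: Match(M, H) / (Match(M, H) + Match(M, L))."""
    m_high = match(memory, high_cue)
    m_low = match(memory, low_cue)
    return m_high / (m_high + m_low)

S = 2 ** -0.5
g = [1.0, 0.0]                    # group label
r = [S, S]                        # resting person vector (no individuation)
x = [S, S]                        # context
h, l = [1.0, 0.0], [0.0, 1.0]     # high and low anchors, assumed orthogonal

b = 0.9                           # forgetting parameter
memory = {idx: 0.0 for idx in product(range(2), repeat=4)}  # assumed empty start
for event in (h, h, l, h):        # three positive episodes and one negative
    episode = tensor_product(g, r, event, x)
    memory = {idx: b * memory[idx] + 1.0 * episode[idx] for idx in memory}

H = tensor_product(g, r, h, x)
L = tensor_product(g, r, l, x)
J = judge(memory, H, L)
print(J)
```

With orthogonal anchors the judgment reduces to a recency-weighted proportion of positive episodes, which is one way to see how an averaging pattern can arise from superimposed memory traces.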

General Characteristics of the TPM

The major characteristic of the TPM is its capacity to model group impressions as dynamic configurations. The model explicitly traces the dynamic development of the mental representations of a social group over time as new information is encountered. It also provides a way of describing a configural representation that Asch's (1946 , 1952) Gestalt approach postulated. As noted by Read et al. (1996) , Kashima and Kerekes's (1994) simple linear-associator does not handle a complex configural representation; however, the TPM rectifies this limitation and contributes to the configural research tradition that Read et al. advocated.

In modeling the output process for impression judgment, TPM incorporates the insight of Upshaw's (1969) variable perspective model. According to Equation 8 , a judgment is, generally, a function of how similar the memory is to the high-end anchor relative to its similarity to the low-end anchor (for detailed discussion, see Kashima & Kerekes, 1994 ). This implies that people interpret the adjectives and words that are used in judgment scales differently, and observed judgments can vary as a function of the mental representations of the scale anchors (e.g., Campbell, Lewis, & Hunt, 1958 ; Manis, 1967 ; Ostrom & Upshaw, 1968 ; Volkmann, 1951 ). More recently, Biernat et al. ( Biernat & Kobrynowicz, 1997 ; Biernat & Manis, 1994 ) convincingly demonstrated the importance of this insight in group-relevant judgments by showing that a social group membership of targets can alter the mental representations of end anchors of judgment scales.

Several points are noteworthy about the representations of judgment scale anchors. First, scale anchors must pertain to an aspect of the tensor representation (e.g., in the present case, to the event aspect). Second, we assume that the scale anchors are selected by the experimenter so that they are relevant to the expected memory content. ⁴ If they are irrelevant to the event memories, both Match( M , H ) and Match( M , L ) would return relatively small values. In this case, a judgment following Equation 8 may fall around the scale midpoint because Match( M , H ) and Match( M , L ) may both be equally small (this would be the case of indifference as opposed to ambivalence). Third, the model accommodates the possibility that the trait words used as scale anchors may be relatively independent concepts. More formally, the mental representations of the high and low endpoints of a scale, h and l , may be such that ( h · l ) may be close to zero (or uncorrelated). Although this assumption is not strictly necessary, it is consistent with the finding by Skowronski and Shook (1997) , in which antonymous trait adjectives were shown to have relatively independent representations.
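The midpoint argument can be made concrete with a small sketch. It assumes the simplest relative goodness form, in which the judgment is Match( M , H ) divided by the sum of Match( M , H ) and Match( M , L ). That form, the use of plain vectors in place of the tensor memory, the dot product as the Match function, and all numerical values are illustrative assumptions, not the model's Equation 8.

```python
import numpy as np

def judgment(memory, high, low):
    """Assumed relative goodness rule; 0.5 corresponds to the scale midpoint."""
    match_high = float(memory @ high)
    match_low = float(memory @ low)
    return match_high / (match_high + match_low)

# Hypothetical anchor representations with (h . l) = 0, i.e., independent concepts.
high = np.array([1.0, 0.0, 0.0])
low = np.array([0.0, 1.0, 0.0])

# Hypothetical memory contents (simplified to vectors for illustration).
irrelevant = np.array([0.01, 0.01, 1.0])  # barely matches either anchor
ambivalent = np.array([1.0, 1.0, 0.0])    # strongly matches both anchors
favorable = np.array([0.9, 0.1, 0.0])     # mostly matches the high anchor

for name, memory in [("irrelevant", irrelevant),
                     ("ambivalent", ambivalent),
                     ("favorable", favorable)]:
    total_match = float(memory @ high + memory @ low)
    print(name, judgment(memory, high, low), total_match)
```

Under these assumptions the irrelevant and the ambivalent memories both yield a midpoint judgment, and only the total match strength separates indifference (small) from ambivalence (large).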

Comparisons With Other Relevant Theories

In this section, we contrast TPM with the three most relevant connectionist theories.

Kunda and Thagard's (1996) IMP Model

Kunda and Thagard's (1996) theory, which modeled person impression formation as a parallel constraint satisfaction process, differs from TPM in two respects. First, the Kunda-Thagard model treats an observer's "knowledge" about a social group as given and describes its use in forming impressions about a person. As Kunda and Thagard (1996) noted, their model does "not address the question of how incoming information may alter one's knowledge about stereotypes, behaviors, and their associations" (p. 304). It is this process of formation and change, or temporal dynamics, of group impressions that TPM is designed to address.

Second, cognitive architectures differ. The present model assumes a distributed representation, whereas the Kunda-Thagard model assumes a localist representation. In a distributed representational system, a vector is used to represent a concept, whereas a localist representational system uses a meaningfully interpretable "node" to represent a concept. An advantage of the distributed representational system in the present context is the ease with which it can explain the averaging phenomenon in impression formation ( Kashima & Kerekes, 1994 ).

Fiedler's (1996) BIAS Model

Fiedler (1996) proposed the BIAS (Brunswikian Induction Algorithm for Social Cognition) model to explain a number of judgmental biases found in social cognition. This model uses a distributed representational system in which a piece of information is represented as a vector. Biases are explained as a consequence of the process of aggregating a number of representations. In terms of its formal property, BIAS is a special case of the tensor product model. When a Rank 1 tensor (or a vector) is used to represent a concept, TPM reduces to BIAS. Alternatively, TPM may be thought of as an extension of BIAS along the dynamic configural line. BIAS uses a vector representation but does not construct a configural representation that conjunctively combines features of a stimulus object. BIAS is not a memory model, but TPM is grounded in the memory literature.

One major difference between BIAS and TPM lies in their metatheoretical interpretations of the mathematical formalism. TPM takes a cognitive perspective and assumes that the processing of distributed representations as characterized by the mathematical formalism is an algorithmic description of the cognitive processes ( Marr, 1982 ). By contrast, BIAS explicitly interprets a distributed representation as a set of multiple proximal cues that bears probabilistic relations with a distal object within Egon Brunswik's (1956) theory of perception, without adopting "the cognitive metaphor" ( Fiedler, 1996 , p. 200). Nevertheless, the metatheoretical difference may be more apparent than real in that Marr's algorithmic theory does not make a strong commitment to the way in which a process is implemented in a physical system. Processing units in TPM (or any distributed representational system for that matter) can be interpreted as "proximal cues."

Smith and DeCoster's (1998a , 1998b) Recurrent Network

Smith and DeCoster (1998a , 1998b) used an autoassociative network to model the process of person perception and memory. Like the TPM and Fiedler's BIAS, their model adopts a distributed representational system. However, its capacity for memory makes it different from BIAS. Further, the architecture and learning algorithm of their autoassociative network differ from those of the TPM. The processing units are all linked to each other (except with themselves) in the recurrent network, whereas the TPM's associative links are limited to the units that represent different aspects of a social event. The TPM uses a version of the Hebbian learning rule ( Kashima et al., 1998 ), but the Smith-DeCoster model uses the delta rule, which is designed to minimize the network's error in reproducing an input vector.

Like Kunda and Thagard, Smith and DeCoster model the domain of person perception, although they sometimes reported simulations pertinent to group impressions. Smith and DeCoster's modeling attempt differs from ours in two respects. First is the level of abstraction at which the research programs are pitched. Smith and DeCoster are generally concerned about describing the stereotype learning and use at an abstract level, whereas we attempt to model empirical phenomena at a concrete level, much closer to data. Therefore, Smith and DeCoster did not model different types of output processes (i.e., impression judgment vs. classification learning), a variety of time-dependent properties of group impression formation (to be discussed later), and so on.

Second, Smith and DeCoster's model and TPM may also describe different types of learning processes. McClelland, McNaughton, and O'Reilly (1995) suggested that there are two types of learning processes: a slow-learning system that extracts general regularities, postulated to be implemented in the neocortex, and a fast-learning system that requires attentional resources and binds novel stimuli to construct an episodic representation, which is said to be localized in the hippocampal region. On the one hand, McClelland et al., as well as Smith and DeCoster (1997) , argued that a connectionist learning system that uses an error-driven algorithm (such as the delta rule) may be suitable for modeling the slow-learning process. On the other hand, Dennis and Humphreys (1997) postulated that a mechanism similar to the TPM may be able to describe the fast-learning system (also see Wiles & Humphreys, 1993 ).

This discussion suggests that Smith and DeCoster's model may be best understood as an attempt at modeling the slow-learning mechanism. An empirical inadequacy of Smith and DeCoster's model (e.g., Kashima & Kerekes, 1994 ; also Busemeyer & Myung, 1988 ) may be interpretable in this light. As we discuss more fully later, Busemeyer and Myung analytically proved that the learning mechanism involved in the distributed memory system developed by McClelland and Rumelhart (1985 , 1986) and used by Smith and DeCoster (1998a , 1998b) predicts that the way in which people estimate the prototype of a category is time invariant. However, Busemeyer and Myung's as well as Kashima and Kerekes's (1994) data contradicted this prediction. Later in this article we show that group impression data also contradict it. Although it is too early to tell, Smith and DeCoster's model may be more suitable for modeling the slow-learning system, whereas TPM may be better suited for modeling the fast, binding mechanism.

Group Impression Formation

In this section, we first model group impression formation processes within the TPM framework and then report an experiment in which TPM predictions are tested.

Modeling Group Impression Formation

Two types of experimental paradigms have been used to examine the formation of group representations ( Kashima, 1999 ; Kashima et al., 1998 ). One type is based on the classification learning paradigm ( Medin & Schaffer, 1978 ). Exemplars that vary along multiple dimensions are classified into two novel groups. The participants' task is to learn the classification and to classify new stimuli into the two categories ( Smith & Zárate, 1990 ). The other type is analogous to the person impression formation paradigm, in which novel groups are described by a series of stimuli, and experimental participants are told to make judgments about a group. A well-known example is the distinctiveness-based illusory correlation ( Hamilton & Gifford, 1976 ). The classification and judgment paradigms have produced two different theories, which have not been integrated within a single theoretical framework.

Classification Learning

Kashima et al. (1998) showed that the TPM is consistent with the generalized context model of classification learning ( Nosofsky, 1984 , 1986 ; Smith & Zárate, 1992 ). The generalized context model assumes that when people learn to classify into n groups exemplars that vary along multiple dimensions (e.g., artistic vs. scientific, sociable vs. unsociable), the classification decision for a new exemplar is a function of the similarities of the new exemplar with the learned exemplars. In a typical experiment, people learn to classify exemplars E _ij into groups G _i and are later tested for their classification of the old and new exemplars. The probability of classifying a test exemplar, T , into the i th group, G _i , is:

where s ( E _ij , T ) is the similarity between a learned exemplar ( j th exemplar of the i th group), E _ij , and the new exemplar, T . Note that G _i ∋ E _ij indicates that the summation is over all the exemplars that belong to the i th group, G _i .

The context model ( Medin & Schaffer, 1978 ) postulates that the overall similarity between a learned exemplar and the test exemplar is a multiplicative function of the dimensional similarities:

where s ( E _ijk , T _k ) is the similarity between the learned exemplar, E _ij , and the new exemplar, T , on the k th dimension (where k = 1 ... K ).
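For reference, the two relations just described can be written out in standard notation. This is a sketch based only on the verbal definitions above: the first line is the ratio-form choice rule of the generalized context model and the second is the multiplicative similarity rule of the context model. The primed index i′ running over all groups in the denominator is an assumed notation.

```latex
P(G_i \mid T) =
  \frac{\sum_{G_i \ni E_{ij}} s(E_{ij}, T)}
       {\sum_{i'} \sum_{G_{i'} \ni E_{i'j}} s(E_{i'j}, T)}
\qquad
s(E_{ij}, T) = \prod_{k=1}^{K} s(E_{ijk}, T_k)
```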

From an analytical perspective, the classification choice predicted by TPM is consistent with the context model as characterized by Equation 9 and the multiplicative similarity function as in Equation 10 (see Kashima et al., 1998 , for a general proof). We assume that the values that the exemplars E _ij and T take on the k th dimension are encoded as e _ijk and t _k . Then these exemplars may be configurally represented as E _ij = g _i ⊗ e _ij1 ⊗ ... ⊗ e _ijK and T = [ ] ⊗ t ₁ ⊗ ... ⊗ t _K , where [ ] indicates that the group aspect of this representation is missing. Furthermore, assume that the memory representation after learning all the old exemplars is modeled as

When this is accessed by the new exemplar, T , the retrieved vector is a weighted sum of the vectors representing the group labels

where the weight, ∑ _Gi∋Eij ∏ _k ( e _ijk · t _k ), can be interpreted as the strength of activation of the i th group label. Note that the similarity between the k th feature of an exemplar, E _ijk , and the k th feature of the test exemplar, T _k , is s ( E _ijk , T _k ) = ( e _ijk · t _k ). Therefore, ∏ _k ( e _ijk · t _k ) = ∏ _k s ( E _ijk , T _k ) = s ( E _ij , T ), according to Equation 10 . This implies that the TPM is consistent with the multiplicative similarity function in Equation 10 , which embodies the configural representation assumed by the context model. If we assume that the probability of choosing the i th group label, G _i , for the test exemplar, T , follows Luce's (1959) choice rule, then the following equation obtains

Substitute the equality, ∏ _k ( e _ijk · t _k ) = ∏ _k s ( E _ijk , T _k ) = s ( E _ij , T ), into Equation 13 , and we obtain Equation 9 .
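The equivalence argued above can be checked numerically. The following is a minimal sketch, not the simulation reported by Kashima et al. (1998): it assumes two groups, two feature dimensions, orthonormal group-label vectors, non-negative feature vectors (so that all dot-product similarities are non-negative), and no attention or forgetting parameters. All exemplar values are hypothetical.

```python
import numpy as np

# Orthonormal group-label vectors (assumed), so that g_i . g_i' = 0 for i != i'.
g = np.eye(2)

# Hypothetical exemplars: (group index, feature vector on dim 1, feature vector on dim 2).
exemplars = [
    (0, np.array([1.0, 0.0]), np.array([0.8, 0.6])),
    (0, np.array([0.8, 0.6]), np.array([1.0, 0.0])),
    (1, np.array([0.0, 1.0]), np.array([0.6, 0.8])),
    (1, np.array([0.6, 0.8]), np.array([0.0, 1.0])),
]

# Memory: superposition of the rank-3 tensors g_i (x) e_ij1 (x) e_ij2.
memory = sum(np.einsum("g,a,b->gab", g[i], e1, e2) for i, e1, e2 in exemplars)

# Test exemplar T = [ ] (x) t_1 (x) t_2, with the group aspect missing.
t1, t2 = np.array([1.0, 0.0]), np.array([0.8, 0.6])

# Probing memory with the features retrieves a weighted sum of group labels.
retrieved = np.einsum("gab,a,b->g", memory, t1, t2)
activation = g @ retrieved               # activation strength of each group label
p_tpm = activation / activation.sum()    # Luce's choice rule

# Context model computed directly: summed multiplicative similarities per group.
summed = np.zeros(2)
for i, e1, e2 in exemplars:
    summed[i] += (e1 @ t1) * (e2 @ t2)
p_context = summed / summed.sum()

print(p_tpm, p_context)
```

The two probability vectors coincide because contracting the memory tensor with the test features produces, for each group label, exactly the sum over exemplars of the product of dimensional similarities.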

Empirically too, Kashima et al. (1998) reported that Smith and Zárate's (1990) experimental results on classification learning were closely reproduced by a computer simulation of the TPM. In Smith and Zárate's experiment, human participants learned to classify nine exemplars into categories A and B (five to A and four to B). Later they were given these nine exemplars and seven new exemplars to classify into A and B. Figure 2 presents the probability of classifying the nine old and seven new exemplars to category A, observed in the experiment (dashed line) and obtained in the simulation. The simulation results closely followed the empirical results.

Impression Judgment

Kashima et al. (1998) pointed out that TPM is compatible with the weighted averaging model as well. Again, a simplified version of the proof is provided here. Suppose that participants learn group labels, individual members, their behavior episodes, and the context in which the episode was observed. We designate the i th group's label G _i ( i = 1 ... I ), the j th person in the i th group P _ij ( j = 1 ... J ), the person's k th behavior episode E _ijk , and the context in which the episode is observed X _l ( l = 1 ... L ). Most researchers assume that the weighted averaging model can describe group impression judgments under this circumstance, so that the impression judgment for the i th group is described as follows:

where w _i'jk and s _i'jk designate the weight and scale value of the stimulus, S _i'jk . The relative weight for the stimulus, S _i'jk , is defined as w _i'jk /∑ _j ∑ _k w _i'jk .

According to TPM, the event involving the k th episode of the j th person in the i th group observed in the l th context is encoded as a Rank 4 tensor, E _ijkl = g _i ⊗ p _ij ⊗ e _ijk ⊗ x _l . For the ease of exposition, let us assume that the context remains the same. Ignoring the attention and forgetting parameters for the time being, we come to the following memory representation:

Substituting Equation 15 into Equation 8 (judgment model) and simplifying it using Equations 6 and 7 , we obtain the following:

assuming that all group label representations, g _i , are distinct, so that ( g _i · g _i' ) = 0 for all i ≠ i '.

Let us define

and

When Equations 17 and 18 are substituted into Equation 16 , we obtain Equation 14 , which represents the weighted averaging model. ⁵ The assumption that the weighted averaging model holds in group impression formation has not been tested in the literature.

Equation 17 suggests that the scale value should remain relatively constant regardless of the context and person vectors because the scale value, s _i'jk , is a function only of the similarities of the exemplar and the scale anchors. By contrast, Equation 18 suggests that the weight, w _i'jk , can vary as a function of the person and context representation. In particular, it is important to note that the weight for an exemplar varies as a function of ( x _l · x ); that is, the similarity between the context in which the exemplar was learned ( x _l ) and the context in which the judgment is made ( x ), as well as a function of ( p _i'j · r ), that is, the similarity between the person representation ( p _i'j ) and the resting state ( r ). This property is important in explaining a variety of phenomena as we see later.

Group Impression Formation and the Information Environment

The foregoing discussion implies that, according to TPM, group impressions bear a dynamic relationship with the information environment. Once an event-in-context is positively or negatively encoded for instance, it is stored in memory, and group impressions are constantly updated. The resultant representations about social groups can quite accurately reflect the probabilistic environment with which the connectionist learning system interacts. However, the relationship between group impressions and the probabilistic property of the information environment is rather complex. TPM predicts that group impressions vary as a function of both the probability of types of events encoded about a group and the total amount of information (or number of events) learned about the group.

To see this, we consider a simple case. Suppose that one event is learned about each group member and that there are J members of the group. That is, the number of events observed about the group is J . Further suppose that, of those J events, the probability that an event is positive rather than negative is p . Under some simplifying assumptions, the impression judgment about the group can be written as follows ( see Appendix for proof):

where s ₀ is the effect of the prior memory and s _p and s _n represent the scale values of the positive and negative events. Assuming that s _n < s ₀ < s _p , Equation 19 implies that when J is constant, impression judgment is more positive when p is greater, and as J becomes very large, judgment approaches ps _p + (1 - p ) s _n , a value that is a function only of p . Therefore, impression judgments should reflect the probabilistic property of the information environment fairly accurately in the long run. ⁶

Equation 19 also implies that impression judgment varies as a function of the total amount of information, J , when J is relatively small, even if p is constant. In particular, if p > ( s ₀ - s _n )/( s _p - s _n ), judgment increases (or becomes more positive) as J increases; if p < ( s ₀ - s _n )/( s _p - s _n ), judgment decreases (or becomes more negative) as J increases. This implication obtains because of s ₀ , the effect of the prior memory ( see Appendix ). In other words, when the number of positive events learned about a group is large relative to that of negative events, a group about which social perceivers know a great deal is more positively evaluated than a group about which they know only a little. Conversely, when the number of positive events learned about a group is small relative to that of negative events, a group about which social observers know a great deal is more negatively evaluated than a group about which they know only a little.

Both of these implications of TPM are, in fact, consistent with the distinctiveness-based illusory correlation first identified by Hamilton and Gifford (1976 ; also see Hamilton, Dugan, & Trolier, 1985 ), arguably the first reported experiment on group impression formation. In their Experiment 1, Hamilton and Gifford presented 39 behavior episodes performed by individual members of two groups (A and B). The majority group exhibited 18 positive and 8 negative behaviors, whereas the minority performed 9 positive and 4 negative behaviors. Although the ratio of positive to negative behaviors remained constant across the two groups, the overall impression formed was more positive for the majority than for the minority group. In their Experiment 2, they showed the reverse tendency; that is, when more negative than positive behaviors were shown, the majority was evaluated more negatively than the minority. This finding has been replicated in a number of experiments (see Mullen & Johnson, 1990 ).
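The role of the prior memory in this pattern can be illustrated with a short sketch. The exact form of Equation 19 is not reproduced here; the code assumes a simple weighted average in which the prior memory enters with a fixed weight and every learned event carries unit weight. The function name, the prior weight, and the scale values (+1, -1, and a neutral prior of 0) are hypothetical choices. Only the behavior frequencies come from Hamilton and Gifford's (1976) Experiment 1.

```python
def group_judgment(n_events, p_positive, s_pos=1.0, s_neg=-1.0,
                   s_prior=0.0, w_prior=1.0):
    """Assumed weighted average of prior memory and equally weighted events."""
    evidence = n_events * (p_positive * s_pos + (1.0 - p_positive) * s_neg)
    return (w_prior * s_prior + evidence) / (w_prior + n_events)

# Experiment 1 frequencies: mostly positive behaviors, same ratio in both groups.
majority = group_judgment(26, 18 / 26)
minority = group_judgment(13, 9 / 13)

# Reversed case: mostly negative behaviors, same ratio in both groups.
majority_rev = group_judgment(26, 8 / 26)
minority_rev = group_judgment(13, 4 / 13)

# With many events the judgment approaches p * s_pos + (1 - p) * s_neg.
asymptote = group_judgment(10**6, 18 / 26)

print(majority, minority, majority_rev, minority_rev, asymptote)
```

Under these assumptions the larger group is judged more positively when positive behaviors predominate and more negatively when negative behaviors predominate, even though the proportion of positive behaviors is identical in the two groups. The difference arises only because the prior carries relatively more weight when fewer events have been learned.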

This account of the phenomenon differs from that of Hamilton and Gifford (1976) . According to them, the combination of a minority status and infrequent negative behaviors makes this class of episodes distinctive. This distinctiveness is analogous to the pairing of exceptionally long words (e.g., blossoms-notebook) in Chapman's (1967 ; also see Chapman & Chapman, 1967 , 1969 ) experiments. Chapman's participants overestimated the frequency of the occurrence of distinctive pairings. Likewise, the participants in Hamilton and Gifford's experiment weighted the infrequent behaviors more than others, leading to the more negative impression of the minority group.

More recently, a number of researchers suggested alternative explanations. First, Fiedler (1991 , 1996 ; Fiedler & Armbruster, 1994 ; also see Smith, 1991 ) suggested that information loss can explain the Hamilton-Gifford phenomenon within the BIAS model framework. Positivity of information was represented by a vector, and another vector perfectly negatively correlated with it as representing negativity. A judgment was modeled by a correlation between the positivity vector and the sum of all vectors representing the behavioral information of the majority and minority groups. Even if the minority and majority groups exhibit the same level of positivity (or negativity), the magnitude of the correlation between the vector sum and the positivity vector was greater for the majority than that for the minority only when some errors were introduced to the vectors representing the behaviors. This amounts to a more extreme judgment (either negative or positive) for the majority than for the minority when errors are present or information is lost. Second, McGarty, Haslam, Turner, and Oakes (1993) suggested that the participants have a preconception about the relationship between two contrasted groups. The contrastive relationship may contribute to the differential evaluations of the groups.

The robustness of the Hamilton-Gifford phenomenon suggests that it may be multiply determined (e.g., Berndsen, Spears, McGarty, & van der Pligt, 1998 ; Mackie, Hamilton, Susskind, & Rosselli, 1996 ). The TPM framework may provide a possibility for incorporating a number of explanations of the Hamilton-Gifford phenomenon and evaluating relative contributions of these effects. To begin, the TPM is not inconsistent with the distinctiveness-based account and can incorporate it in terms of the attentional parameter postulated in Equation 8 . In addition, if random error vectors are added to the encoded behaviors in the TPM (as is routinely done in simulations), this can produce the condition simulated by Fiedler (1996) . Finally, as suggested by McGarty et al. (1993) and Berndsen et al. (1998) , it is possible that the process of differentiating the contrasting groups (discussed later in the Group Differentiation section) may be involved in the process.

Furthermore, research identified a number of limiting conditions of the illusory correlation phenomenon ( Berndsen et al., 1998 ; McConnell, Sherman, & Hamilton, 1994 ; McConnell, Sherman, & Hamilton, 1997 ; Pryor, 1986 ; Sanbonmatsu, Sherman, & Hamilton, 1987 ; Schaller, 1992 ; Schaller & Maass, 1989 ). As suggested by Hamilton and Sherman (1996 ; also Berndsen et al., 1998 ), a theory based on the concept of entitativity ( Campbell, 1958 ), the extent to which a group is perceived to be a coherent entity, may provide an integrative explanation of the complex processes involved in the illusory correlation phenomenon. In the meantime, it should be noted that the effect of the prior memory, as suggested by the TPM, may also play some role in generating this robust phenomenon.

Time-Dependent Properties of Group Impression Formation

Impression judgments are time dependent. When a series of stimuli is presented and a judgment is made, the weight given to a stimulus for the judgment depends on time. The TPM makes detailed and novel predictions about time dependence of impression formation, which are explicated here and tested in an experiment reported later. These predictions are couched in terms of a serial position weight, which is the weight given to a stimulus that occupies a certain serial position for a given impression judgment. When a series of J stimuli ( j = 1 to J ) is presented and a judgment is made after the J th stimulus, the weight given to the j th stimulus is written as SPW( j , J ). For example, SPW(1, 4), SPW(2, 4), SPW(3, 4), and SPW(4, 4) indicate the serial position weights for Stimulus 1 through 4 computed based on the judgment made after the fourth stimulus was presented.

Linearity and time invariance.

Busemeyer and Myung (1988 ; Myung & Busemeyer, 1992 ) showed that a number of connectionist models of category learning can be tested by examining time dependence of impression formation. Those models include Metcalfe-Eich's ( Eich, 1982 ) holographic memory model, Hintzman's (1986) multiple-trace memory model, Knapp and Anderson's (1984) distributed memory model, and McClelland and Rumelhart's (1985) connectionist model. Smith and DeCoster (1998a) also used McClelland and Rumelhart's model. In particular, Busemeyer and Myung showed that prototype estimate, which is an experimental participant's estimate of the prototype of a category, can be predicted by these models and rewritten in the following form:

where e _j is the j th exemplar ( j = 1 ... J ) and w _( J - j ) is a scalar weight for the j th exemplar when the prototype estimate is made after the J th exemplar is presented. This means that these models have common properties of linearity and time invariance. ⁷ Linearity implies that the prototype estimate is a linear additive function of the exemplars. Time invariance means that the weight for a given exemplar should be constant in so far as, ( J - j ), the time interval between the exemplar presentation and the judgment (operationalized as the number of intervening exemplars) remains the same. To use the present notation, this implies that SPW( j , J ) is constant when ( J - j ) is constant.
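Linearity and time invariance can be illustrated with a generic learner. The sketch below is not an implementation of any of the specific models named above; it assumes a simple running estimate that moves a fixed fraction of the way toward each new exemplar, which is one familiar example of a linear, time-invariant rule. The function names and the learning rate are hypothetical.

```python
def final_estimate(stimuli, rate=0.3):
    """Running estimate updated by a fixed fraction toward each new exemplar."""
    estimate = 0.0
    for value in stimuli:
        estimate = (1.0 - rate) * estimate + rate * value
    return estimate

def spw(j, J, rate=0.3):
    """Weight of the j-th exemplar (1-based) in the estimate made after exemplar J.

    Because the learner is linear, the weight equals the response to a unit
    impulse placed at position j in an otherwise zero sequence.
    """
    impulse = [1.0 if position == j else 0.0 for position in range(1, J + 1)]
    return final_estimate(impulse, rate)

# Time invariance: the weight depends only on the lag (J - j).
print(spw(1, 3), spw(2, 4))  # same lag of 2
print(spw(3, 3), spw(4, 4))  # same lag of 0

# Linearity: the estimate is the weighted sum of the exemplars.
exemplars = [2.0, -1.0, 4.0, 0.5]
weighted_sum = sum(spw(j, 4) * e for j, e in enumerate(exemplars, start=1))
print(final_estimate(exemplars), weighted_sum)
```

A model of this kind cannot produce serial position weights that differ at the same lag, which is the property that the time-variability findings discussed in the text contradict.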

By contrast, TPM does not have the time-invariance property, although it implies linearity under a certain circumstance (see Kashima & Kerekes, 1994 , Kashima et al., 1998 ). Busemeyer and Myung's (1988 ; Myung & Busemeyer, 1992 ) empirical studies showed that human prototype estimates are largely linear but not time invariant. Although Kashima and Kerekes (1994) showed that person impression judgments are time variable, it is yet to be examined whether group impression judgments are time variable or not. Further, a linearity assumption of group impressions has never been tested directly.

Response dependency.

This means that the weight of a stimulus for a judgment depends on whether, and if so when, another judgment is made between the stimulus and the judgment. Kashima et al. ( Kashima & Kerekes, 1994 ; Kashima et al., 1998 ) suggested that the TPM predicts a response dependency in impression judgments under some conditions. This prediction rests on two arguments. First, Equation 18 predicts that the weight for a given exemplar is a function of the similarity between the learning and judgment contexts, other things being equal. Therefore, under the condition in which people are expected to interpret the judgment context to be different from the learning context, the exemplar will be weighted differently from the condition in which the judgment context is interpreted to be the same as the learning context. The greater the similarity between the learning and judgment contexts, the greater the weight for an exemplar should be.

More formally, according to TPM ( Equation 18 ; i.e., ignoring the attentional and forgetting parameters), SPW( j , J ) can be described by the following equation under some simplifying assumptions:

where x _j is the context representation for the j th stimulus and x _J is the context representation for the J th stimulus. The assumptions made for this are that the scale value and the stimulus representation remain constant for all stimuli and that the context for a judgment after the J th stimulus is the same as the context for the J th stimulus.

Second, it is hypothesized that the act of making a judgment often prompts people to alter the context representation. Suppose that an experimental participant receives a first series of stimuli, makes a first judgment, receives a second set of stimuli, and then makes a second judgment. When one task using the first stream of stimuli is completed by making a judgment, the subsequent stream of stimuli may be differentiated from the first. This differentiation of the two sets can be represented as a change in context representation in TPM ( Kashima & Kerekes, 1994 , Footnote 5). Martin (1986) made a similar suggestion in his analysis of assimilation and contrast effects.

Based on this formulation, two predictions follow. First, the weight for the j th stimulus for the judgment made after the J th stimulus, SPW( j, J ), should vary depending on whether another judgment is made between the j th and J th stimuli. Suppose that a series of J stimuli are presented, and a first impression judgment is made after the J ₀ th stimulus, and a second impression judgment is made after the J th stimulus is presented. The weights estimated from the second judgment, SPW( j, J ), should increase as a step function of j , where the increase occurs at the position of J ₀ (Prediction 1). This is because the context representation for the stimuli before the first judgment is expected to be less similar to the context representation for the stimuli after it. Provided that the serial position weight is a function of the similarity between the learning and judgment contexts ( Equation 21 ), the first set of stimuli should be weighted less than the second set.

Second, the same mechanism predicts that the serial position weights before the first judgment estimated from that judgment should be greater than the weights for the same positions estimated on the basis of the second judgment (Prediction 2). That is, SPW( j, J ₀ ) > SPW( j, J ), where J ₀ > j . This is because the context representation for the first set of stimuli is expected to be less similar to the context for the second judgment than the context for the first judgment ( Equation 21 ). A test of these predictions is reported later.
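Both predictions can be illustrated numerically. The sketch below does not reproduce Equation 21; it assumes only that each serial position weight is proportional to the dot-product similarity between the stimulus context and the judgment context, normalized so that the weights sum to one. The context vectors, their similarity of 0.6, and the choice of J ₀ = 2 and J = 4 are hypothetical.

```python
import numpy as np

def serial_position_weights(contexts, judgment_context):
    """Assumed form: weights proportional to context similarity, summing to 1."""
    similarities = np.array([c @ judgment_context for c in contexts])
    return similarities / similarities.sum()

# Hypothetical unit-length context vectors; the context is assumed to shift
# once the first judgment has been made.
x_first = np.array([1.0, 0.0])
x_second = np.array([0.6, 0.8])   # similarity with x_first is 0.6

J0, J = 2, 4
contexts = [x_first] * J0 + [x_second] * (J - J0)

# Weights from the first judgment, made in the first context: SPW(j, J0).
spw_first = serial_position_weights(contexts[:J0], x_first)

# Weights from the second judgment, made in the second context: SPW(j, J).
spw_second = serial_position_weights(contexts, x_second)

print(spw_first)
print(spw_second)
```

Under these assumptions the weights estimated from the second judgment step up after position J ₀ (Prediction 1), and the weights of the first two stimuli are larger when estimated from the first judgment than from the second (Prediction 2).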

Order effect.

When the response dependency is not an issue, an order effect is another instance of time dependency. If information encountered earlier has a greater effect on a judgment than recent information, it is called a primacy effect; a greater effect of recent information relative to earlier information is called a recency effect. Hamilton and Sherman's (1996) formulation suggests that a group impression may exhibit a primacy or recency effect depending on the target group's perceived entitativity (see Manis & Paskewitz, 1987 , for some evidence). When a group is highly entitative (i.e., a group is perceived to be an entity), people would attempt to form an integrative impression in the same fashion as in person impression formation. In person impression formation, observers have been postulated to direct a decreasing amount of attention to later stimuli presumably because the later stimuli are assumed to be redundant (see Kashima & Kerekes, 1994 , for a review). This implies a primacy effect resulting from attention decrement when a group is perceived to be an entity.

In contrast, when a group does not have a high level of entitativity, group impression formation may be conceptualized as a series of person impression formations. This suggests that the same attention decrement may occur for each individual because the person is perceived to be an entity. However, when a new individual member is encountered, attention is renewed; therefore, no systematic attention decrement should occur for the overall group impression. This implies that a recency effect may obtain for impressions of a low entitative group, because earlier information may be forgotten, and more recent information may have a greater impact in the absence of attention decrement. This process is modeled in TPM by the attentional and forgetting parameters, a and b ( Equation 4 ). As a function of the relative magnitude of the parameters, a primacy or recency effect could occur (see Strange, Schwei, & Geiselman, 1978 , for results consistent with this reasoning). An experiment is reported that tests the hypothesis that a recency effect is likely to occur when impressions are formed of groups that are not entitative.

Experiment: Time Dependency of Group Impression Formation

This experiment tested the time-dependent properties of group impression formation predicted by the TPM. A fictitious group of four friends served as a target group. Each individual was attributed two opinions on social issues relevant in Australia: republicanism (whether Australia should become a republic) and Aboriginal issues (whether Australian Aboriginals should receive a better treatment). This provides an experimental condition comparable to Dreben, Fiske, and Hastie's (1979) person impression formation experiment, in which a person was described by four sets of two behavior episodes. Participants were told to form an impression of the group and made their judgments on the group's opinion on Aboriginal issues in five different judgment conditions. In the final responding condition, all opinion statements were presented first, and a judgment was made. In the continuous-responding condition, a judgment was made after each individual. In the other three conditions, judgments were made twice. In the (1, 4), (2, 4), and (3, 4) conditions, a judgment was first made after the first, second, or third person, respectively, and also after the fourth person.

This design allowed us to test the TPM assumption that an implication of a behavior episode constitutes part of the representation of an encoded event. Each stimulus individual expressed an opinion about Aboriginal issues as well as an opinion on republicanism, which is clearly an issue distinct from but related to Aboriginal issues according to a pretest. Republicanism is a stance about the constitutional status of Australia as a nation. Currently, the head of the state of Australia is the British queen. However, republicans take the view that Australia should become a republic. Australian students tend to believe that a sympathetic stance to Aboriginals and a republican stance go together, presumably because they express a liberal attitude. One half of the stimulus groups consisted of individuals who expressed pro-republican opinions, and the other half expressed all anti-republican opinions. If the implication of an opinion on republicanism is encoded, this should have an effect on the overall impression about the group's opinion on Aboriginal issues.

Method

Participants.

Eighty (24 male, 56 female) undergraduate students at La Trobe University participated in this experiment for AUD5 per hour.

Design.

Five response conditions were constructed. In the final responding condition, participants were given attitude statements purportedly made by four members of a group and made an impression judgment after the fourth stimulus person. In the continuous-responding condition, a participant made an impression judgment after each group member. In the other three conditions, two judgments were requested: after the first and fourth members in the (1, 4) condition; after the second and fourth members in the (2, 4) condition; and after the third and fourth members in the (3, 4) condition. Sixteen participants were randomly assigned to each condition.

Stimulus.

To construct stimulus groups, 60 attitudinal statements (30 favoring and 30 opposing) on each of the republican and the Aboriginal issues were written. These issues were chosen because a pilot study (N = 20) showed that they were perceived to be related to each other, in that those who are in favor of Australia becoming a republic were perceived as likely to be more sympathetic to Aboriginals. In a separate pilot study, 40 participants drawn from the same pool as those who participated in the main experiment were asked to judge whether each statement "favors or opposes Australia becoming a republic" or "favors or opposes Aboriginal people receiving a better treatment" on an 11-point scale (0 = opposes, 10 = favors). For each issue, the four statements with the highest mean scale values (M > 7.5; e.g., "Anyone who denies the Aboriginal population the chance to retain some of their land is being racist and immoral") and the four statements with the lowest mean scale values (M < 2.5; e.g., "Aborigines are basically unemployable because they are all lazy and disruptive") were selected.

Each participant was shown 64 groups of stimulus individuals in total (presented in a random order for each participant), of which 32 groups were crucial to the experiment. The other 32 groups served as fillers to mask the repetitive nature of judgments about the crucial groups. Each group was said to consist of four friends whose opinions on two social issues were presented. One issue was a target issue (Aboriginal issue; whether Aboriginal people should receive a better treatment) and the other issue was a related issue (republican issue; whether Australia should become a republic). The 32 stimulus groups were constructed so that they embodied five within-participant factors (two levels in each): group's opinion on the related issue (high vs. low on republican issue), first, second, third, and fourth members' opinion on the target issue (high vs. low on the Aboriginal issue), where "high" means favorable to Australia becoming a republic and sympathetic to Aboriginals.

In half of the 32 stimulus groups, all members expressed "high" opinions on the republican issue, and in the other half, all members had "low" opinions. On the Aboriginal issue, the members' opinions varied in accordance with the factorial design (e.g., HHHH, HHHL, HHLH). To ensure that all four high statements and all four low statements appeared at each position equally frequently within an experimental condition, Anderson's (1973; also see Kashima & Kerekes, 1994) design was used. First, the four high statements and the four low statements were each randomly numbered from 1 to 4. In Stimulus Set 1, statements were ordered from Number 1 to Number 4 in all groups. In the other stimulus sets (Sets 2 through 4), the order of statements was varied according to a Latin square design (4, 1, 2, 3; 3, 4, 1, 2; and 2, 3, 4, 1). Four participants were randomly assigned to each stimulus set in each response condition. Each group member's opinion on the republican issue was always presented first, followed by an opinion on the target Aboriginal issue.

This design also permitted an estimation of serial position weights, on which most of the TPM predictions were based. Recall the notation, SPW(j, J), which refers to the weight for the j th stimulus for the judgment made after the J th stimulus. The estimation procedure was as follows. First, we summed the judgments after the J th stimulus for all the sequences of stimuli whose j th stimulus person expressed a positive attitude toward the issue. Second, we summed the judgments after the J th stimulus for all the sequences of stimuli whose j th stimulus person expressed a negative attitude toward the issue. Finally, the second sum was subtracted from the first sum. The difference score should be a linear function of SPW(j, J). For the rationale of this method, see Anderson (1973) and Kashima and Kerekes (1994).
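The estimation procedure can be sketched in a few lines of Python. The judge below is hypothetical: it is assumed to combine the four members' scale values additively with known weights, so that the recovered difference scores can be checked against those weights. The scale values (8.0 for high, 2.0 for low), the weights, and the function name are illustrative assumptions, not values from the experiment.

```python
from itertools import product


def difference_scores(judgments):
    """Return one difference score per serial position.

    judgments maps a sequence such as ('H', 'L', 'H', 'H') to a rating.
    For position j, ratings of all sequences with 'H' at j are summed,
    and the sum for sequences with 'L' at j is subtracted.
    """
    n = len(next(iter(judgments)))
    scores = []
    for j in range(n):
        high = sum(r for seq, r in judgments.items() if seq[j] == "H")
        low = sum(r for seq, r in judgments.items() if seq[j] == "L")
        scores.append(high - low)
    return scores


# Hypothetical additive judge with known serial position weights.
true_weights = [0.1, 0.2, 0.3, 0.4]
scale = {"H": 8.0, "L": 2.0}

# All 16 sequences of the 2 x 2 x 2 x 2 factorial design.
judgments = {
    seq: sum(w * scale[s] for w, s in zip(true_weights, seq))
    for seq in product("HL", repeat=4)
}

scores = difference_scores(judgments)
print([round(s, 2) for s in scores])
```

Because the other positions are balanced across the high and low sequences, their contributions cancel, and each difference score equals the weight multiplied by a constant (here 8 sequences × 6 scale points = 48). This is the sense in which the difference score is a linear function of the serial position weight.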

Procedure.

Participants were greeted by a male experimenter and shown to a computer. After the experimenter ensured the participants' familiarity with the equipment, all the instructions were given on the screen. The instructions informed participants that the experiment was concerned with how people form impressions about groups and that they would be shown many groups of friends who express various opinions about social issues. First, they were asked to express their own opinions on various social issues, including the republican and Aboriginal issues, using 11-point scales (0 = opposes, 10 = favors). The mean scores were 7.8 and 7.4, indicating generally liberal attitudes. A practice session was then presented, in which the participants were shown a series of four statements attributed to four different individuals and asked to make judgments about the groups on the same 11-point scale regarding social issues such as the republican and Aboriginal issues. Each statement was presented for 7 s, and judgments were self-paced. The schedule by which judgments were requested was the same as in the main experiment. After the practice session, the participants were prompted for questions. No questions were asked. The main experiment then began. Judgments about the target Aboriginal issue were made on the 11-point scale (0 = opposes, 10 = favors). The experiment lasted 50 to 70 min. Participants were thanked and debriefed.

Simulation procedures.

Identical experimental conditions were simulated using Mathematica on a Silicon Graphics Indy Workstation. The attention parameter a was set at 1 for the first stimulus and .8 for the second stimulus for each person. The forgetting parameter b was set at .95. For each condition, 16 simulations were run, and the estimates of serial position weights were computed based on these simulations (see Appendix for details).

Results

Additivity.

To test the additivity of the impression judgments, a five-way factorial analysis of variance (ANOVA) was conducted on the overall impression judgment for the final responding condition. Five within-participants factors were related attitude (high vs. low), Position 1 (high vs. low), Position 2 (high vs. low), Position 3 (high vs. low), and Position 4 (high vs. low). The TPM predicts a significant main effect for each of the five within-participants factors but no interaction effects. As predicted, the five main effects were significant, F (1, 15) = 9.37, 88.59, 154.64, 345.19, and 141.13, all p s < .01, for related attitude, Position 1, Position 2, Position 3, and Position 4, respectively. No interaction effects were significant. The size of the additive effects was substantial (72% of the total variance), and the interaction effects were relatively minor (less than 2%).

Overall patterns of serial position weights.

SPW(1, 4) through SPW(4, 4) were computed separately for the conditions in which related attitude statements were high or low. A Related Attitude (high vs. low) × Position (Position 1 to 4) ANOVA was conducted on these estimated weights for each response condition. None of the main or interaction effects involving related attitude was significant (F < 1.20). The serial position weights were then computed by averaging across the judgments for the high and low related attitude conditions. All subsequent analyses are based on this measure. The means from the simulation, which constitute the theoretical predictions, are reported in Figure 3, and comparable means for the human judgments are reported in Figure 4. As the figures show, the human and simulation results were similar. The correlation between the human data and simulated results was .88 overall and .90, .87, .99, .96, and .79 for the final responding, continuous responding, (1, 4), (2, 4), and (3, 4) conditions, respectively.

Time variability.

A test of time variability was conducted for the continuous-responding condition first. If the group impression judgments were time invariant, there should be no difference among SPW(1, 1), SPW(2, 2), SPW(3, 3), and SPW(4, 4). This is because these serial position weights are for the stimulus immediately preceding the judgment (i.e., there is no difference in time lag); time invariance predicts that the serial position weights should not differ. However, as seen in Panel B, Figure 3 , TPM predicts some time variability. A repeated measures ANOVA on the serial position weights from the human data yielded a reliable linear trend, F (1, 45) = 5.71, p < .05, suggesting time variability of group impression judgments.

A comparable test of time variability was conducted using the serial position weights for the (1, 4), (2, 4), (3, 4), and final responding conditions. In particular, SPW(1, 1) from the (1, 4) condition, SPW(2, 2) from the (2, 4) condition, SPW(3, 3) from the (3, 4) condition, and SPW(4, 4) from the final responding condition were compared. Note that all these serial position weights are the weights given to the stimulus that immediately precedes the judgment. Given that there is no difference in time lag, time invariance predicts no difference among them, whereas TPM predicts a difference. A one-way ANOVA revealed a significant linear trend, F (1, 60) = 50.83, p < .001, supporting TPM.

Response dependency.

To illustrate response dependency, it is most informative to examine the final and continuous-responding conditions first. In the final responding condition, in which only one judgment was made, there should be no effect of prior responses. Therefore, only effects of attention and forgetting (a and b in Equation 4 ) are expected. The serial position weights here are expected to be smooth as seen in Panel A of Figure 3 (simulation). By contrast, in the continuous-responding condition, in which a judgment was made after each stimulus person, the serial position weights are expected to change radically. Examine the shape of connected points for Panel B in Figure 3 . There is a strong upward swing in each line. This is expected because the stimulus associated with the context for a judgment (i.e., the most recent stimulus in the continuous condition) should have the greatest weight, other things being equal ( Equation 21 ).

Generally, TPM suggests that the serial position weights should depend on the schedule of impression judgments. In particular, the estimated weight for a given serial position should change when a judgment has been made before that position relative to the condition in which no judgment has been made. This implies that the serial position weights should vary as a function of the response condition. A Position (1 to 4) × Response Condition (5 conditions) ANOVA on SPW(1, 4) through SPW(4, 4) yielded the predicted interaction of position and response condition, F (12, 225) = 2.82, p < .01.

TPM makes more detailed predictions. First, when an impression judgment is made twice, the weights estimated from the second judgment should increase as a step function of serial position, where the increase occurs at the position of the first judgment (Prediction 1). An examination of SPW(1, 4) through SPW(4, 4) of the simulation results (see Figure 3) verifies this prediction. Note that in the (1, 4) condition (Panel C), the weight increases from SPW(1, 4) to SPW(2, 4), and the rest is relatively stable to SPW(4, 4); in the (2, 4) condition (Panel D), SPW(1, 4) and SPW(2, 4) are similar, and then the weight increases to SPW(3, 4) and SPW(4, 4); and in the (3, 4) condition (Panel E), SPW(1, 4) through SPW(3, 4) remain relatively stable, and then the weight jumps to SPW(4, 4). Table 1 summarizes expected patterns and the corresponding results from the human participants. Generally, the expectations were supported.

Second, when impression judgments are made twice, the weights for the serial positions before the first judgment, as estimated from that judgment, should be greater than the weights for the same serial positions estimated from the second judgment (Prediction 2). For instance, in the (1, 4) condition, the weight for the first serial position estimated from the first judgment, SPW(1, 1), was expected to be greater than the weight for the same position estimated from the second judgment, SPW(1, 4); similarly, for the (2, 4) condition, SPW(1, 2) and SPW(2, 2) were expected to be greater than SPW(1, 4) and SPW(2, 4); and for the (3, 4) condition, SPW(1, 3), SPW(2, 3), and SPW(3, 3) were expected to be greater than SPW(1, 4), SPW(2, 4), and SPW(3, 4) (see Figure 3). All expectations were borne out by the data (see Table 1).

Order effects.

As discussed, recency effects are expected in the present experiment when the consideration of response dependency is not relevant. In particular, a weak to moderate recency effect is expected because of memory decay for the stimuli whose context is the same as the judgment context (i.e., when there is no intervening judgment). When the judgment and the stimulus contexts differ, however, the effect of the context difference may override that of the memory decay. Therefore, a recency effect should be observable among stimuli that share the same context representation as the judgment but not among stimuli that have different context representations. A series of planned contrasts generally supported the expectation (Table 2).

Discussion

The human judgments were consistent with the TPM predictions. The changes in the human data (see Figure 4) seem somewhat sharper than those in the simulations (see Figure 3), but this is because the context vectors were randomly generated. It is possible to generate context vectors so that the psychological distinction between pre- and postjudgment is more pronounced, which would have produced sharper changes like those observed in the human data. All in all, the high correlation between the human and simulation results (.88 overall) is impressive given that no parameter fitting was performed.

Furthermore, other TPM predictions were also supported. Human group impression judgments tend to be approximately additive under the final responding condition, as a number of researchers have reported for person impression judgments (e.g., Anderson, 1981). There is a variety of time-dependent properties in group impression judgments, as predicted by TPM. Temporal dynamics of impression formation have been classified into two general types, primacy and recency, but this simple classification masks their complexity. Small recency effects from memory decay as well as large ones from response dependency (i.e., change in context representation) may occur in impression formation.

Evolution of Group Impression

Once formed, group impressions have often been assumed to persist. The static schema concept and Lippmann's (1922) "picture in the head" metaphor of stereotypes seem to suggest the durability of impressions about social groups. Although the process of impression change has been a long-standing concern (e.g., Allport's, 1954 , contact hypothesis; see Pettigrew, 1998 ), Rothbart was well justified in commenting in 1981 that "although an understanding of how beliefs can be disconfirmed is fundamental for the development of an adequate theory of beliefs, we know very little about this problem" (p. 176).

Since then, a number of studies have been conducted to examine how group impressions change and evolve as new information is encountered, mainly in two experimental paradigms. One is concerned with stereotype change (e.g., R. Weber & Crocker, 1983 ), in which changes in a stereotypic impression about a social group (typically culturally recognized as a social category) are examined when people are presented with stereotype-inconsistent information. The second paradigm may be called group differentiation, in which impressions of two contrasting groups are experimentally created, and subsequent change processes examined with additional information (e.g., Krueger & Rothbart, 1990 ).

Stereotype Change

Basic Findings of Stereotype Change

Weber and Crocker (1983) conducted a seminal study of stereotype change. In Experiment 1, information inconsistent with the prior impression about an occupational group (corporate lawyers or librarians) was presented to participants by describing a large number of group members who were each ascribed three characteristics that were consistent with, inconsistent with, or irrelevant to the prior impression about the group (i.e., stereotype). The participants then evaluated the occupational group on various stereotype-relevant trait dimensions. In all conditions, one third of the information was inconsistent, one sixth was consistent, and one half was irrelevant. The amount, but not the proportion, of inconsistent information was manipulated by presenting either 6 members or 30 members. Within each condition, the pattern of distribution of stereotype-inconsistent information was also manipulated. In the concentrated condition, the stereotype-inconsistent information was concentrated in one third of the members; in the dispersed condition, it was dispersed across all members. They also included a control condition in which participants simply judged occupational groups without additional information.

Weber and Crocker found that, generally, stereotype-inconsistent information changes group impressions. First, the occupational groups were judged less stereotypically in all the experimental conditions in which inconsistent information was presented than in the control condition in which no information was given. Heit (1994) observed a similar effect of the amount of information in his Experiments 3 and 4. This clearly suggests that people are responsive to information that contradicts their prior group impression at least in the experimental context (Finding 1). Second, a greater amount of inconsistent information, nonetheless, tends to change the prior impression more (Finding 2). Although this effect was reliable only when the inconsistent information was dispersed across individuals in Experiment 1 of R. Weber and Crocker (1983) , it was found even in the concentrated condition in Experiment 2.

Third, not only the amount but also the pattern of stereotype-inconsistent information was found to influence the extent of stereotype change. A greater stereotype change was observed when stereotype-inconsistent information was dispersed across all group members than when it was concentrated in a minority (Finding 3). Although this tendency was reliably present when a large amount of information was given (30 members), it was weaker when only 6 group members were described. This finding, a greater stereotype change in the dispersed than in the concentrated condition, has been replicated by Johnston, Hewstone, and colleagues even with a relatively small amount of information (e.g., Johnston & Hewstone, 1992; Johnston, Hewstone, Pendry, & Frankish, 1994; for a review, see Hewstone, 1994; also Hantzi, 1995).

The effect of the dispersed versus concentrated manipulation is apparently mediated by the perceived typicality of stereotype-disconfirming group members. When the typicality rating of the stereotype-disconfirming members was statistically controlled, the effect of information pattern became nonsignificant (Experiment 1 of Johnston & Hewstone, 1992; Hantzi, 1995). This explanation has been corroborated by two other findings in the literature. First, both R. Weber and Crocker's (1983) Experiment 3 and Rothbart and Lewis's (1988) Experiment 3 showed that the effect of typical examples on group impressions was greater than that of atypical examples (black, medium-income lawyers vs. white, high-income lawyers in Weber and Crocker; high-, medium-, and low-typicality fraternity members in Rothbart and Lewis). In addition, Rothbart and Lewis's Experiments 1 and 2 showed that people tended to overestimate the frequency of pairing of prototypical examples of a category (e.g., typical triangles) with a feature that is irrelevant to the definition of the category (e.g., the color of a triangle) compared with the case in which atypical examples of the category were paired with the feature. This latter finding was interpreted as showing that typical examples are weighted more than atypical examples, an insight to which we return later.

Modeling Stereotype Change

These findings have been interpreted in terms of three different models of belief change in this literature ( Hewstone, 1994 ; R. Weber & Crocker, 1983 ). The bookkeeping model ( Rothbart, 1981 ) implies a gradual change of beliefs, indicating a step-by-step updating of group impressions as new information is encountered. By contrast, the conversion model ( Rothbart, 1981 ) suggests a sudden alteration of a group impression based on the information about a group member that dramatically disconfirms the prior impression. Finally, the subtyping model postulates that information inconsistent with the prior impression tends to be "subtyped" as exceptions to the rule. This fencing off of inconsistent information, which Allport (1954) called the "re-fencing device" (p. 32), leads to a relatively conservative change of the prior impression if any.

These models, however, cannot provide a comprehensive explanation of the findings. The bookkeeping model can explain the first finding, a change of stereotype when stereotype-inconsistent information is presented relative to when no additional information is given. R. Weber and Crocker (1983) argued that the bookkeeping model cannot account for the second finding, a greater change when there is a greater amount of inconsistent information while the proportion of inconsistent to consistent information remains the same. The subtyping model can explain a greater effect of dispersed, as opposed to concentrated, stereotype-inconsistent information on stereotype change. When inconsistent information is concentrated in a minority of group members, the minority is likely to be subtyped as an atypical subgroup within the stereotyped group. The fencing off of the subtype would reduce the impact of inconsistent information on the stereotype. It has been argued that this finding contradicts the bookkeeping and conversion models (e.g., R. Weber & Crocker, 1983 ).

TPM analysis of stereotype change.

The TPM provides a general explanation of the findings, in which the person representation plays a central role. On the basis of Fiske and Neuberg's (1990) and Brewer's (1988) theories of stereotyping, it is assumed that when an event pertaining to a group member is consistent with the stereotype, the person is not individuated, so that the activation level of the units for the person aspect remains at the resting level (it remains at 1/√ N , or the vector for the person aspect is r ). In other words, the representation for a stereotype consistent person, p _C , is assumed to be very similar to r . By contrast, when a group member exhibits a stereotype-inconsistent characteristic, the person representation is individuated, so that its representation, p _I , deviates from the resting state, r .

Put mathematically, when the j th person, P _j (and event, E _ij1 ), is encountered about a group ( G _i ), we assume that C ₁ = g _i ⊗ r ⊗ e _ij1 ⊗ x and C ₂ = g _i ⊗ r ⊗ r ⊗ x are used to access the memory representation. C ₁ is the representation of an exemplar of the group, G _i , involving the event, E _ij1 , when no particular group member is specified. C ₂ is a representation of the group when neither a particular group member nor an episode is specified. Match( M, C ₁ ) approximates the overall similarity of the new exemplar with all exemplars of G _i ; Match( M, C ₂ ) provides a measure of the baseline similarity of any event with all exemplars of G _i , which indicates the overall feeling of knowing about this group. We assume that the typicality of the event, t, is indexed as follows:

t = Match( M, C ₁ ) / Match( M, C ₂ )

This ratio indicates the similarity of old exemplars to the new exemplar relative to the similarity of the old exemplars with any exemplar. The typicality index, t, is used to construct the person representation, p _ij = p _C or p _I , so that 0 < ( p _I · r ) < ( p _C · r ) < 1.
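A minimal Python sketch can make the typicality ratio concrete. It assumes that Match is the inner product between the memory tensor and the cue tensor and that memory is the sum of the tensor products of the stored exemplars; both are assumptions about parts of the model defined outside this section. Because the inner product of two tensor products equals the product of the factor-wise dot products, no four-way tensor needs to be built. All vectors, the dimensionality N = 4, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def match(memory, cue):
    """Inner product of the summed exemplar tensors with a cue tensor.

    memory is a list of exemplars; each exemplar and the cue are tuples of
    four vectors (group, person, event, context). Assumed form of Match.
    """
    return sum(
        math.prod(dot(m, c) for m, c in zip(exemplar, cue))
        for exemplar in memory
    )


N = 4
r = [1 / math.sqrt(N)] * N        # resting state: every unit at 1/sqrt(N)
g = unit([1, 0, 1, 0])            # hypothetical group vector
x = unit([0, 1, 1, 0])            # hypothetical context vector
e_typical = unit([1, 1, 0, 0])    # hypothetical frequently stored event
e_atypical = unit([0, 0, 1, 1])   # hypothetical rarely stored event

# Prior memory: three typical exemplars and one atypical exemplar,
# all stored with a nonindividuated person vector (r).
memory = [(g, r, e_typical, x)] * 3 + [(g, r, e_atypical, x)]


def typicality(event):
    c1 = (g, r, event, x)   # group exemplar with this event, no person
    c2 = (g, r, r, x)       # group with neither person nor event specified
    return match(memory, c1) / match(memory, c2)


print(round(typicality(e_typical), 3), round(typicality(e_atypical), 3))
```

In this toy memory, the event that resembles most stored exemplars yields the larger ratio, so the ratio orders events by typicality as the text describes. The sketch does not show how t is then mapped onto the person vectors p _C and p _I .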

In other words, a stereotype-consistent member's representation is constructed so that it does not deviate markedly from the resting state, r (i.e., the person is not individuated). However, a stereotype-inconsistent member's representation differs markedly from the resting state (i.e., the person is individuated; see Fiske, Neuberg, Beattie, & Milberg, 1987, for consistent evidence). In fact, this process may lead to the production of a subtype (e.g., R. Weber & Crocker, 1983) by a deliberative process of justification (Kunda & Olson, 1995). More specifically, suppose that experimental participants come to the experiment with a prior memory representation and that a total of J stimuli about one social group (group i ) are presented, of which J _C are stereotype consistent and J _I are stereotype inconsistent ( J = J _C + J _I ). Under some simplifying assumptions, the judgment about group i when the J th member is encountered, J ( G _i ) _J , can be written as follows (see Appendix):

J ( G _i ) _J = [ s ₀ + J _C s _C + t J _I s _I ] / [1 + J _C + t J _I ] (23)

where s ₀ is the prior stereotype, s _C and s _I are the scale values for stereotype-consistent and stereotype-inconsistent information, and t = ( p _I · r ) is the typicality of a stereotype-inconsistent person. Note that Equation 23 is a special case of the weighted averaging model.

Empirical findings and the TPM.

Equation 23 makes several points clear. First, when there are sufficient stereotype-inconsistent members, the stereotype likely changes from the original level, s ₀ , closer toward the scale value of the stereotype-inconsistent information, s _I . This is because the weight for stereotype-inconsistent information becomes greater when J _I becomes larger, a prediction consistent with Finding 1. Second, the amount of stereotype change can increase as the amount of stereotype-inconsistent information increases even if the ratio of J _I to J _C remains constant provided that J _C / J _I is sufficiently small (especially J _C / J _I < t). This can be seen by dividing both the denominator and numerator of the right-hand side of Equation 23 by J _I . When J _I is very large, the effect of s ₀ is negligible. This is consistent with Finding 2.

In addition, Equation 23 suggests that the new information should be approximately additively combined with the prior impression. Heit's (1994) findings are consistent with this implication. His experiments examined the effect of category-consistent and category-inconsistent information on the meaning of a category such as "shyness." He systematically manipulated the probability that a "shy" person is described by behaviors such as "does not attend parties often (consistent)" and "attends parties often (inconsistent)" or a "not shy" person is characterized by these behaviors in his stimulus about people in city W. Participants were asked to estimate the probability of consistent pairing (i.e., a shy person not attending parties and a not-shy person attending parties) and inconsistent pairing (i.e., a shy person attending parties and a not-shy person not attending parties). The probability estimate for consistent pairings was always greater than that for inconsistent pairings, showing the effect of prior expectation. Further, the greater the probability of inconsistent pairing in the stimulus, the smaller was the estimated probability of consistent pairing, suggesting a change in category meaning. In addition, Heit showed that the effect of the prior expectation remained the same regardless of the probability of consistent and inconsistent pairing in the stimulus. This last finding implies that the effects of prior impressions and new information are additively combined. Hayes and Taplin (1992) also reported similar findings with children.

Equation 23 also implies that the amount of stereotype change is mediated by the extent to which a stereotype-inconsistent group member is individuated. Note that the relative weight for a stereotype-consistent and stereotype-inconsistent member is 1/[1 + J _C + t J _I ] and t/[1 + J _C + t J _I ], respectively. This means that a stereotype-inconsistent member's information is weighted less than a stereotype-consistent member's information. Recall that t = ( p _I · r ) < 1: t indicates the extent to which the stereotype-inconsistent member is individuated. If a person is individuated, the person representation deviates from the resting state, r . This implies that an individuated group member's inconsistent information does not affect the stereotype as much as a nonindividuated member's equally inconsistent information.
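The weights just stated can be turned into a small numerical sketch. It treats the judgment as a weighted average in which the prior counts as one unit, each stereotype-consistent item has weight 1, and each stereotype-inconsistent item has weight t; this form is inferred from the stated weights and the shared denominator 1 + J _C + t J _I . The scale values and the values of t are hypothetical. The item counts mirror the proportions described for Weber and Crocker's (1983) design (6 or 30 members with three characteristics each, one third inconsistent and one sixth consistent).

```python
def group_judgment(s0, s_c, s_i, j_c, j_i, t):
    """Weighted average of the prior and the new information.

    Weights: 1 for the prior s0, 1 per stereotype-consistent item, and
    t (< 1) per stereotype-inconsistent item, normalized by their sum.
    """
    return (s0 + j_c * s_c + t * j_i * s_i) / (1 + j_c + t * j_i)


# Hypothetical scale values: prior and consistent items are stereotypic
# (high), inconsistent items are counterstereotypic (low).
s0, s_c, s_i = 8.0, 8.0, 2.0

# Same ratio of consistent to inconsistent items, different amounts.
small = group_judgment(s0, s_c, s_i, j_c=3, j_i=6, t=0.7)
large = group_judgment(s0, s_c, s_i, j_c=15, j_i=30, t=0.7)

# Same amounts, different degrees of individuation (hypothetical t).
dispersed = group_judgment(s0, s_c, s_i, j_c=15, j_i=30, t=0.7)
concentrated = group_judgment(s0, s_c, s_i, j_c=15, j_i=30, t=0.3)

print(round(small, 2), round(large, 2))
print(round(dispersed, 2), round(concentrated, 2))
```

With these assumed values, both amounts move the judgment away from the prior, the larger amount moves it further even though the ratio is constant, and a smaller t (a more individuated inconsistent member) leaves the judgment closer to the prior. These are the qualitative patterns of Findings 1 through 3, not a fit to any data.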

This implication of TPM is consistent with the theoretical insight expressed by Rothbart and John (1985) as well as Hewstone and Brown (1986 ; see, e.g., Scarberry, Ratcliff, Lord, Lanicek, & Desforges, 1997 , for evidence). It is also consistent with Rothbart, Sriram, and Davis-Stitt's (1996) finding that typical members are more likely retrieved by cuing memory with a group label than atypical members. Given that an atypical group member is more likely individuated than a typical group member, the model is consistent with the findings that typical group members are more likely to change stereotypes than atypical group members. Finally, it is consistent with the finding that a stereotype changes more when inconsistent information is dispersed across a number of individuals than when concentrated in a few individuals. This is because dramatically atypical individuals are likely individuated or subtyped.

Gurwitz and Dodge's (1977) finding and TPM.

One puzzling finding was reported by Gurwitz and Dodge (1977) , whose result appears to contradict the wealth of empirical research in stereotype change. Their experiment was probably the first to examine the effect of dispersed versus concentrated stereotype-inconsistent information on stereotypes. They presented information about three sorority women who were friends and shared a room together. In the concentrated condition, one of the three women had all stereotype-inconsistent information, whereas all three women had some stereotype-inconsistent information in the dispersed condition. In both conditions, however, the total amount of stereotype-inconsistent information remained the same. Their participants were then asked to make impression judgments about another woman who was described as a friend of the three women, who shared the room with them, and who also belonged to the same sorority. Their findings suggested that the target person was judged as less stereotypical in the concentrated than in the dispersed condition.

Although the Gurwitz-Dodge finding seems inconsistent with the other findings, TPM suggests that a friend of only mildly stereotype-inconsistent group members (dispersed condition) can be evaluated to be more stereotypical than a friend of a radically stereotype-inconsistent group member (concentrated condition), apparently showing less of a stereotype change in the dispersed condition. The judgment about an individual member of a group can be modeled by TPM (see Appendix):

where t _T represents the typicality of the target person and s ( I, T ) is the similarity between the target and stereotype-inconsistent members. Note that J ( G _i , P _T ) is a judgment about a target person, who is a member of group i .

Equation 24 implies that if the target is similar to stereotype-inconsistent members, that is, s ( I, T ) is large, then the judgment about the target is influenced more by the stereotype-inconsistent members. This implication of TPM can explain the Gurwitz-Dodge finding. Furthermore, to the extent that the typicality of the target person is low (i.e., t _T is small), the effect of the stereotype should be relatively small. This latter implication is consistent with Fiske and Neuberg's continuum model (1990).

Simulating the Stereotype Change Findings

Johnston and Hewstone's (1992) conditions were simulated using Mathematica on a Silicon Graphics Indy Workstation. In their experiment, participants were shown eight members of a stereotyped group, each of whom was described by six pieces of information. There were 12, 12, and 24 pieces of stereotype-consistent, stereotype-inconsistent, and stereotype-irrelevant information, respectively. In the concentrated condition, the stereotype-inconsistent information was concentrated in two members, but in the dispersed condition, it was distributed across six members (two pieces each). In the latter condition, two members were ascribed three pieces of stereotype-consistent information. Their third stimulus condition was excluded from the simulation for simplicity.

The attention parameter a declined from 1, .8, .64, and so on (i.e., .8 ^(k - 1) , where k = 1 to 8) from the first to the eighth stimulus for each person. The forgetting parameter b was set at .95. For each condition, 20 simulations were run. Impression judgments about the group and the individual person were collected after the learning phase, the first change phase, and the second change phase. The means are reported in Table 3 (see Appendix for details).
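The parameter schedule can be sketched in a few lines of Python. This is only an illustration, not the Mathematica code used for the reported simulations: the function names are hypothetical, and the additive update rule (old memory scaled by the forgetting parameter b, each new event scaled by its attention weight a) is an assumption made here for concreteness.

```python
import numpy as np

def attention_schedule(n_items=8, decay=0.8):
    """Attention weights a_k = decay**(k - 1) for k = 1..n_items."""
    return decay ** np.arange(n_items)

def learn(events, memory, forgetting=0.95, decay=0.8):
    """Assumed update rule: M <- b * M + a_k * E_k for each event in turn."""
    for a_k, event in zip(attention_schedule(len(events), decay), events):
        memory = forgetting * memory + a_k * event
    return memory

# Weights for the first to the eighth stimulus: 1, .8, .64, ...
weights = attention_schedule()
```

Under this reading, later stimuli for a given person enter memory with progressively smaller weights, while everything already stored is scaled by .95 at each step.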

The means show that the stereotype was successfully learned by the TPM after the learning phase. The impression judgments using the group and person cues were both .81, indicating a high level of stereotyping (1 = perfectly stereotypical ). When information that is inconsistent with the group stereotype was presented, however, group impression judgments clearly changed. The group impression judgments after the first change phase were less stereotypical than those before it. The impression judgments were even less stereotypical after the second change phase than immediately after the first change phase. Clearly, a greater amount of stereotype-inconsistent information changes the stereotype.

The effects of concentrated versus dispersed stimulus configuration were successfully simulated in the present simulation. When the group was the target as in the typical experimental paradigm, the impression judgments were more stereotypical in the concentrated condition than in the dispersed condition, suggesting a greater stereotype change in the dispersed condition. By contrast, when the person was the target, as in Gurwitz and Dodge, there appears to be a greater stereotype change in the concentrated condition than in the dispersed condition.

Comments

The simulation results showed that TPM can reproduce both the basic findings and Gurwitz and Dodge's finding, showing its capacity to provide a unified account. Central in this is the process of individuation, a process with an ironic implication. On the one hand, as noted by Fiske et al. (e.g., Fiske & Neuberg, 1990 ), the individuated person is less likely stereotyped. On the other hand, as pointed out by Rothbart et al. (e.g., Rothbart & John, 1985 ), the individuated person less likely affects the group impression: That is, less stereotype change is likely. In other words, individuation may be good for the individual but not necessarily good for the group (see Yzerbyt, Coull, & Rocher, 1999 ). Nevertheless, this does not mean that individuation should be avoided to change an undesirable stereotype. As Rothbart and John (1985) pointed out and TPM suggests, although radically stereotype-inconsistent exemplars may have only small effects on stereotypes (as shown in the concentrated condition), they too could eventually effect a stereotype change if cumulated over time. It would just take more exemplars to attain the same amount of stereotype change when group members are individuated than when they are not.

Finally, in accounting for Gurwitz and Dodge's (1977) finding, we made use of relational information, that is, information about the interpersonal relationship between the stimulus persons and the person about whom impression judgments were required. We assumed that a friend of stimulus persons would be represented in a way that resembles the representations of the stimulus persons. The effect of information about interpersonal relationships on group impressions should be examined more systematically.

Group Differentiation

Tajfel and Wilkes's (1963) classical research on the accentuation phenomenon provided the original impetus to this line of research. Participants in their study were shown a series of lines and asked to estimate their lengths. Tajfel and Wilkes then compared estimated lengths of the lines that were adjacent to each other in length. In some conditions, shorter lines and longer lines were classified into different categories, whereas in other conditions there was no meaningful relation between classification and line length. In the former conditions, the difference between the estimated lengths of adjacent lines was exaggerated when the two lines were classified into two different categories, although this accentuation of interclass difference was not observed when the classification did not meaningfully correlate with line length. A number of studies successfully replicated this finding in the past (e.g., Eiser, 1971 ; McGarty & Penny, 1988 ; see the latter for a review).

Tajfel and Wilkes's (1963) original studies examined people's evaluation of individual stimuli that were classified into categories. As noted by Krueger, Rothbart, et al. ( Krueger, 1991 , 1992 ; Krueger & Rothbart, 1988 ; Krueger, Rothbart, & Sriram, 1989 ), this procedure cannot distinguish two sources of the interclass accentuation. One is a contrast effect, in which the perception of an individual stimulus is affected by its membership with one of the two contrasting categories. The other is an accentuation effect, in which a difference between the central tendencies of the differentiated categories is accentuated over and beyond what is expected only from the contrast effect. In this article, we are concerned with this latter phenomenon as it pertains to the judgments of central tendencies, or group impressions.

Basic Findings of Group Differentiation

Krueger and Rothbart's (1990) Experiment 2 provides a prototypical example. Participants were shown a series of personality trait adjectives (pretested to determine their favorability) that were classified into two contrasting groups (focal and context groups) and rated the favorability of each adjective as well as the overall mean favorability of the two groups. In the learning phase, the distributions of the stimulus traits in two groups did not overlap in their favorability. The mean favorability of the context group was, relative to that of the focal group, higher in one condition and lower in the other condition. In the change phase, additional trait adjectives were presented for the two groups. Although the actual means of the two distributions remained constant, the variance of the focal group was made greater than before, so that there was now some overlap between the focal and context groups. Krueger and Rothbart examined the estimated means of the focal and context groups before and after the change phase while controlling for the average of the favorability judgments for the individual traits. They found that the estimated mean of the focal group moved away from the mean of the context group, although there was no change in either the actual mean of the trait adjectives or the average of the rated favorability of individual traits. This finding was largely replicated in their Experiment 3. Krueger and Rothbart's (1990) Experiment 1 using traits, as well as Krueger et al. (1989) and Krueger (1991) using numbers as stimuli, showed a comparable effect when there was a real change in the central tendency of the distribution.

Modeling Group Differentiation

In line with the suggestion made by Krueger, Rothbart, et al., TPM accounts for the basic group differentiation phenomenon by extending the analysis for the stereotype change. In modeling the stereotype change phenomena, the typicality of a person was determined with regard to the single group for which the person was a member. In the group differentiation paradigm, in which two groups are contrasted, however, we assume that an exemplar's typicality is determined not only by the exemplar's similarity with its group's representation but also by its dissimilarity from the representation of the group to which its group is contrasted ( Campbell, 1958 ; Turner, 1987 ; also see Ford & Stangor, 1992 ). As in the stereotype change paradigm, we suggest that the person representation is constructed as a function of the typicality, but it is defined within the frame of reference set by the two contrasting groups.

This can be modeled mathematically. Suppose that two groups, Group 1 ( G ₁ ) and Group 2 ( G ₂ ), are contrasted with each other, and an event pertaining to Group 1, E _1jk , is encoded as e _1jk . Let C ₁₁ = g ₁ ⊗ r ⊗ e _1jk ⊗ x and C ₂₁ = g ₂ ⊗ r ⊗ e _1jk ⊗ x . C ₁₁ is the representation of an exemplar of Group 1 involving the event, and C ₂₁ is the representation of the same event counterfactually assuming that it pertained to Group 2. We then assume that the typicality of G ₁ 's exemplar, E _1jk , is determined by the following rule:

The typicality of G ₂ 's exemplar E _2jk ( e _2jk ), t _2jk , can be defined analogously.

Equation 25 is closely related to the concept of metacontrast ( Campbell, 1958 ; Turner, 1987 ). Match( M, C ₁₁ ) approximates the similarity of the exemplar, E _1jk , with all the other exemplars of Group 1 stored in memory (plus some error), and Match( M, C ₂₁ ) approximates the similarity of the same exemplar with all the exemplars of Group 2. As the similarity of the event with all the events associated with Group 1 increases and the similarity of the event with all the events associated with Group 2 decreases, the typicality of this event for Group 1 increases. Therefore, t _1jk increases as the metacontrast ratio, Match( M, C ₁₁ )/Match( M, C ₂₁ ), increases.
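The metacontrast ratio itself can be illustrated with a small numerical sketch. The code below computes only the ratio Match( M, C ₁₁ )/Match( M, C ₂₁ ); it does not implement the typicality rule of Equation 25, whose exact functional form is not assumed here. The helper names, the two-element vectors, and the use of the tensor inner product as the Match operation are illustrative assumptions.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def tensor4(g, r, e, x):
    """Rank-4 tensor product g (x) r (x) e (x) x."""
    return np.einsum("i,j,k,l->ijkl", g, r, e, x)

def match(memory, cue):
    """Match taken as the inner product of two tensors (an assumption)."""
    return float(np.sum(memory * cue))

def metacontrast_ratio(memory, g_own, g_other, r, e, x):
    """Match(M, C_own) / Match(M, C_other) for one encoded event e."""
    own = match(memory, tensor4(g_own, r, e, x))
    other = match(memory, tensor4(g_other, r, e, x))
    return own / other

# Toy setup: orthogonal group vectors, one generic person r, one context x.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
r = np.array([1.0, 0.0])
x = np.array([1.0, 0.0])
e_high = unit([0.2, 1.0])   # events stored for Group 1
e_low = unit([1.0, 0.2])    # events stored for Group 2
e_mid = unit([1.0, 1.0])    # event equally similar to both groups

# Memory holds three Group 1 events and three Group 2 events.
memory = 3 * tensor4(g1, r, e_high, x) + 3 * tensor4(g2, r, e_low, x)

ratio_typical = metacontrast_ratio(memory, g1, g2, r, e_high, x)   # about 2.6
ratio_boundary = metacontrast_ratio(memory, g1, g2, r, e_mid, x)   # about 1.0
```

In this toy case, an event resembling the stored Group 1 events yields a larger ratio than an event lying midway between the two groups, which is the qualitative pattern described above.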

Simulating the Group Differentiation Findings

To show that TPM with these additional assumptions can account for the group differentiation phenomena, Krueger and Rothbart's (1990) Experiment 2 was simulated. The results ( Table 4 ) showed that the mean judgments of the focal group moved away from the context group mean in the change phase relative to the learning phase. In the condition in which the context group mean was lower than the focal group mean (Condition 1), the simulated judgment mean for the focal group became larger. Similarly, in the condition in which the context group mean was higher than the focal group mean (Condition 2), the simulated judgment mean for the focal group became smaller. For each condition, a two-way repeated measures ANOVA was conducted with the judgment as the dependent variable and phase (learning vs. change) and group (focal vs. context) as independent variables. As expected, the Phase × Group interaction effect was significant, F (1, 19) = 6.24, p = .022, and F (1, 19) = 26.67, p < .001, for Conditions 1 and 2, respectively.

Discussion

In line with the current theories of stereotyping and group differentiation (e.g., Fiske & Neuberg, 1990 ; Rothbart & John, 1985 ), we postulated that the evaluation of typicality drives the encoding of the individual member. Central in this formulation was the importance of various relational information, that is, the assumption that social perceivers make use of not only the information about the relationship between the group and an individual exemplar (group-person relationship) but also the information about the relationship between two groups (intergroup relationship) in computing the typicality of an individual exemplar. This assumption enabled the TPM to explain the empirical phenomena of stereotype change and group differentiation.

General Discussion

Group impressions are dynamic configurations. They represent social perceivers' flexibly structured and constantly evolving understandings about social groups. The empirical findings reviewed and the experiment reported here underscore the dynamics of group impressions. Group impressions exhibit a number of time-dependent properties and evolve over time in interaction with the information environment. Despite the implicit assumption that group impression formation and change are two separate phenomena, the process underlying both formation and change of group impressions could be a single learning process.

Group impressions are configural too. The configural use of features is clearly important in learning group categories. A variety of research on the use of context information corroborates the importance of the configural encoding of groups, group members, social events pertaining to them, and context in which the events are said to occur. The use of relational information about person-group and intergroup relationships also underlines the significance of configuration, that is, the structure of the social information based on which group impressions are formed. The TPM provides a unified framework for theorizing about group impressions as dynamic configurations.

Strengths and Weaknesses of the TPM as a Framework for Group Impression Formation and Change

The TPM not only provides a unified framework for the diverse array of empirical findings but also affords an insight about the interpretation of algebraic models postulated in social cognition. Algebraic models are often regarded literally as describing the psychological process in terms of algebraic operations and, therefore, as a description of the controlled, deliberative processing of information (e.g., Fazio, 1990 ; Fiske & Neuberg, 1990 ). However, as shown here, both the weighted averaging model and the context model can fall out of the current connectionist model as a natural consequence of the memory and judgment process. This implies, first, that algebraic models should be construed as computational models that simply describe input-output relations rather than algorithmic models that describe the psychological process ( Kashima & Kerekes, 1994 ; E. U. Weber, Goldstein, & Busemeyer, 1991 ). In other words, the algebraic models should not be interpreted literally but may be seen as describing macrolevel regularities that emerge from microlevel psychological processes, which may not be effortful at all.

Despite these strengths, TPM has its weaknesses. In many social-cognitive theories (e.g., Wyer & Carlston, 1979 ), the central processing unit (CPU) has been implicitly or explicitly postulated, whose function is to execute procedural knowledge to manipulate declarative knowledge. In connectionist networks, a set of simple processing units, whether localist or distributed, collectively process information. This removed the necessity for the CPU, which smacks of a homunculus in the head. This feature of connectionism may be regarded as an advantage. However, TPM cannot do away with a control mechanism. For example, recall that order effects were explained partly by the attentional parameter and that the individuation process involved in stereotype change and group differentiation was explained in terms of the construction of a person representation. Some mechanism is needed to control the attentional parameter and the construction process for a person representation. This mechanism does not have to be a single CPU but may have a parallel distributed architecture.

A number of areas are yet to be incorporated into the present framework. For instance, more detailed discussion is necessary about the process of stereotyping and individuation in which an individual is the target of judgment (e.g., Fiske & Neuberg, 1990 ), the perception of an individual's behavior (e.g., Manis, Biernat, & Nelson, 1991 ), the judgment of group variability (e.g., Ostrom & Sedikides, 1992 ), memory about groups (e.g., Rothbart, Evans, & Fulero, 1979 ; Rothbart, Fulero, Jensen, Howard, & Birrell, 1978 ), and the relation between memory and judgment (e.g., Hastie & Park, 1986 ; Srull & Wyer, 1989 ). Because of this, we did not discuss some studies that examined the process of group impression formation when the target was a new group consisting of members of a stereotyped social category (e.g., Dijksterhuis & van Knippenberg, 1995 ; Dijksterhuis, van Knippenberg, Kruglanski, & Schaper, 1996 ). We have not addressed some of the nonlinear processes associated with group impression formation and change. People construct emergent properties when two pieces of contradictory information are integrated (e.g., Asch & Zukier, 1984 ; Hastie, Schroeder, & Weber, 1990 ; Kunda, Miller, & Claire, 1990 ). Although TPM can address this issue by adopting a strategy similar to Smith and DeCoster's (1998a , 1998b) , full implications of this type of cognitive activity are still outside its scope.

Advantages of Theoretical Reduction

We attempted a theoretical reduction of algebraic models to the TPM, and there are clear advantages. Because old theories are not falsified, they can be regarded as simpler approximations to more complex descriptions provided by a new theory. Old theories can be retained as a useful tool for investigation and a practical approximation. Old theories can be interpreted in a new light, and new theoretical insights may be gained. The use of the weighted averaging model in the current article provides an illustration. The weighted averaging model was shown to be derivable from TPM. The weight and scale value concepts in the weighted averaging model allowed us to examine the time-dependent properties of group impression formation. Furthermore, we could use a model similar to the weighted averaging model (Equations 23 and 24) to shed light on the stereotype change literature.

A theoretical reduction shows a cumulative and dynamic nature of the scientific enterprise of social psychology. Echoing Massaro (1990 ; also Massaro & Cowan, 1993 ), we believe connectionist approaches underscore a continuity in psychological theorizing (also see Kashima & Kerekes, 1994 ; Read et al., 1996 ) rather than a radical departure. If Asch's (1946 , 1952) foundational insight was to conceptualize impressions as dynamic configurations, the evolution of social-cognitive theories in the past two decades since Person Memory: The Cognitive Basis of Social Perception by Hastie et al. (1980) may be seen as a pursuit of an increasingly dynamic theory of social-cognitive processes. The upsurge of interest in connectionism may be a continuation of this trend. The current formulation attempted to show that at least one connectionist-type model, TPM, can describe both dynamic and Gestalt-like configural properties of group impression formation and change, making a contribution to the social-cognitive research tradition ( Laudan, 1977 ; or research program as in Lakatos, 1970 ).

Connectionism as a Research Tradition

In providing support for TPM in particular, we showed the utility of connectionism in general. Connectionism too is a research tradition whose core consists of a set of theoretical principles. This way, connectionism provides a very general framework in which to develop more specific architectures. Just as a number of specific models of person representation can be generated within the framework of associative memory (e.g., Srull & Wyer, 1989 ), a specific connectionist architecture such as TPM can generate a number of more specific models. These models can be competitively tested and falsified (e.g., Equations 22 and 25). We showed that the current form of TPM can explain existing data and generate new, empirically supported predictions.

In so doing, we also showed that the current forms of other connectionist architectures have some difficulty explaining the data we provided. At one level, the current forms of those architectures were falsified. However, it does not mean that those architectures cannot be modified to account for the data. Just as impressions are dynamic configurations, so too are theories. They may very well evolve in the face of empirical challenges and develop some novel predictions and possibilities. Connectionism provides a set of conceptual tools with which to theorize about psychological phenomena. With new tools, new possibilities emerge. In this way, the research tradition of connectionism and its particular architectures coevolve with empirical investigations.

Implications for Stereotype Formation and Maintenance

One reason for the current interest in group impression formation and change is its potential implications for real-life stereotypes. Nevertheless, unprincipled generalizations of laboratory results based on hypothetical groups can always be challenged for their lack of ecological validity. However, a well-developed theory can provide a defensible basis for generalizing laboratory results to the sociocultural milieu. In our assessment, the TPM can provide just such a theoretical basis. The present modeling of group impression formation implies that existing stereotypes likely reflect the distribution of types of events in the probabilistic sociocultural environment, although some aspects of group impressions may have a genetic, modular basis ( Hirschfeld, 1996 ). A group is likely perceived in a positive light when the preponderance of direct or indirect hearsay information is relatively positive or vice versa. However, when a relatively small amount of information is involved, stereotypes may not reflect the information distribution in the sociocultural environment. The "distinctiveness-based" illusory correlation may form the basis of some stereotypes.

The current formulation sheds light on the process of stereotype maintenance. The model suggests that group impressions can change in the long run insofar as stereotype-inconsistent information continues to be encoded and stored. Nevertheless, the current model assumes, but does not address, the encoding process fully (especially feature encoding process). As von Hippel, Sekaquaptewa, and Vargas (1995) emphasized, the perceptual encoding process may, in fact, be responsible for the persistence of stereotypes. To this extent, the stability of stereotypes may stem only in part from the rigidity of the cognitive system despite the picture-in-the-head metaphor enshrined by Lippmann (1922) . Potential sources of stereotype maintenance may be more affective and motivational (see Forgas, 1992 ; Kunda, 1990 ). For various reasons such as right-wing authoritarianism ( Adorno, Frenkel-Brunswik, Levinson, & Sanford, 1950 ; see Altemeyer, 1998 ; Pratto, 1999 ), people may engage in motivated reasoning, so that they can maintain once-formed stereotypes. As Kunda and Oleson (1995) noted, such processes of justification of established impressions may be a significant source of stereotype maintenance. Hoffman and Hurst (1990) argued that gender stereotypes (and probably stereotypes in general) are based not only on observed covariation between group categories and role occupancy, as argued by Eagly and Steffen (1984) , but also on justifications that people make about the group difference (see Jost & Banaji, 1994 , for a related point).

Another source of the stability of stereotypes may be the information environment in which the cognitive system resides, that is, the sociocultural environment. In particular, a social stereotype may be sustained because a social observer's environment, from which stereotype-relevant information is learned, provides a steady flow of a similar mix of stereotype-consistent and stereotype-inconsistent information. As Oakes et al. (1994) argued, the intergroup relationships that actually exist between the perceiver's ingroups and outgroups may provide a strong basis for stereotypes. Alternatively, under some circumstances, as Jussim and Fleming (1996) noted, a stereotypical expectation may act as a self-fulfilling prophecy, bringing the social reality in line with the stereotype (e.g., Rosenthal & Jacobson, 1968 ). Mackie and Smith's (1998) review shows that stereotyping can be conceptualized within an integrative framework of intergroup relationships.

As Bartlett (1932) and Allport and Postman (1947) noted long ago, culturally shared stereotypes may persist as they are told and retold in informal communication. In keeping with this, Kashima (2000b) showed that stereotype-consistent information tends to be retained better than stereotype-inconsistent information as a story embedding stereotype-relevant information is transmitted from one person to the next. Culturally structured explanations and justifications are likely to play a significant role in stereotype maintenance in conjunction with the intergroup relationships ( McGarty, 1999 ). What is required is a social psychology of cultural dynamics ( Kashima, 2000a , 2000c ), that is, a systematic investigation of the dynamics involved in the sociocultural embedding of stereotypes as dynamic configurations.

All in all, connectionist modeling of laboratory-based group impressions can provide insights into the social-psychological process involving existing social stereotypes. This is possible because strong theories can lay a solid foundation for laboratory phenomena, and the credibility of the theory can be used as a basis for principled generalization of theoretical insights to the "real-life" phenomena in sociocultural context.

Concluding Remarks

In 1952, Asch described the state of knowledge about group impression formation and change:

We know little today of the question at issue, mainly because of our failure to study directly the process of impression-forming. Therefore we are not in a position to answer certain first questions such as: What are the organizational properties of group impressions? In what respects do they differ among individuals? What conditions determine their rigidity and lability? (p. 235)

We have indeed made some headway but are now only beginning to answer these questions in a theoretically principled manner.

APPENDIX A

Group Impression Formation and Information Environment (Equation 19)

There are five simplifying assumptions. First, there is some influence of prior memory. When a novel group i is judged without new information, the judgment, J ( G _i ) ₀ = Match( M ₀ , H )/[Match( M ₀ , H ) + Match( M ₀ , L )], is assumed to be s ₀ , which is assumed to be close to the neutral point of the scale (.5). Second, all positive and negative pieces of information are represented by e _p and e _n . Assume s _p = ( e _p · h )/[( e _p · h ) + ( e _p · l )] and s _n = ( e _n · h )/[( e _n · h ) + ( e _n · l )] (.5 < s _p < 1 and 0 < s _n < .5; so that s _n < s ₀ < s _p ). Third, both the attention and forgetting parameters are assumed to be 1. Fourth, assume that every person is represented as r . Finally, assume that

Under the simplifying assumptions, the memory representation after the J th group member is

where E _p = g _i ⊗ r ⊗ e _p ⊗ x _l and E _n = g _i ⊗ r ⊗ e _n ⊗ x _l , and p is the probability of positive events and (1 - p ) is the probability of negative events. When this representation is accessed by H and L , the judgment after the J th group member is

Equation A2 is Equation 19 in the text.

When J is constant, J ( G _i ) _J approaches ( s ₀ + Js _p )/(1 + J ) as p approaches 1. When J becomes very large, J ( G _i ) _J approaches ps _p + (1 - p ) s _n , that is, the average of the positive and negative information. When J is relatively small and p is constant, the change of J ( G _i ) _J when J increases by 1 is

Note that Equation A3 shows that this is positive (i.e., judgment becomes more positive), when p - ( s ₀ - s _n )/( s _p - s _n ) > 0 ( s _p - s _n > 0; J > 0). It is negative otherwise.
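The limits and the sign condition stated above can be checked against a closed form. As a sketch, suppose the fifth simplifying assumption equates the normalizing sums, c = Match( M ₀ , H ) + Match( M ₀ , L ) = ( e _p · h ) + ( e _p · l ) = ( e _n · h ) + ( e _n · l ), by analogy with the corresponding assumption in the stereotype change derivation; this reading, the unit length of g _i , r , and x _l , and the cue H = g _i ⊗ r ⊗ h ⊗ x _l (with L analogous) are assumptions of the sketch. Each stored event then contributes its scale value times c to the match with H , and the memory, the judgment, and the one-step change in judgment take the following form:

```latex
% Sketch under the assumptions named in the lead-in; c cancels in the ratio.
\begin{align*}
M_J &= M_0 + J\,\bigl[\,p\,E_p + (1-p)\,E_n\,\bigr] \\[4pt]
J(G_i)_J &= \frac{\mathrm{Match}(M_J, H)}{\mathrm{Match}(M_J, H) + \mathrm{Match}(M_J, L)}
          = \frac{s_0 + J\,\bigl[\,p\,s_p + (1-p)\,s_n\,\bigr]}{1 + J} \\[4pt]
J(G_i)_{J+1} - J(G_i)_J
  &= \frac{p\,s_p + (1-p)\,s_n - s_0}{(J+1)(J+2)}
   = \frac{(s_p - s_n)\left[\,p - \dfrac{s_0 - s_n}{s_p - s_n}\,\right]}{(J+1)(J+2)}
\end{align*}
```

This form reproduces the stated properties: as p approaches 1 the judgment approaches ( s ₀ + Js _p )/(1 + J ), as J grows large it approaches ps _p + (1 - p ) s _n , and the change with one additional group member is positive exactly when p - ( s ₀ - s _n )/( s _p - s _n ) > 0.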

Stereotype Change (Equations 23 and 24)

There are five simplifying assumptions. First, there is a stereotype about group i . That is, when group i is judged in terms of H and L without new information, the judgment, J ( G _i ) ₀ = Match( M ₀ , H )/[Match( M ₀ , H ) + Match( M ₀ , L )], is assumed to be s ₀ where .5 < s ₀ < 1. Second, all stereotype-consistent and stereotype-inconsistent information are represented by e _C and e _I , and assume that s _C = ( e _C · h )/[( e _C · h ) + ( e _C · l )] and s _I = ( e _I · h )/[( e _I · h ) + ( e _I · l )] (.5 < s _C < 1 and 0 < s _I < .5). Third, both the attention and forgetting parameters are 1. Fourth, assume ( p _C · r ) = 1 and ( p _I · r ) = t < 1. Finally, assume [Match( M ₀ , H ) + Match( M ₀ , L )] = [( e _C · h ) + ( e _C · l )] = [( e _I · h ) + ( e _I · l )].

Under these assumptions, the memory representation after the J th group member is

where E _C = g _i ⊗ p _C ⊗ e _C ⊗ x _l and E _I = g _i ⊗ p _I ⊗ e _I ⊗ x _l , and J = J _C + J _I ( J _C = number of stereotype-consistent stimuli, and J _I = number of stereotype-inconsistent stimuli).

Group Target

When the target is a group, the representation, M _J , is accessed by H and L , and the judgment is

Equation A5 is Equation 23 .
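As a sketch of how a weighted-average form arises under the five assumptions, take the group cue to be H = g _i ⊗ r ⊗ h ⊗ x _l (with L analogous), as in the group impression formation derivation, and take the memory to be the sum of the prior memory and the stored exemplars. The choice of cue and the additive memory are assumptions of this sketch. Because ( p _C · r ) = 1 and ( p _I · r ) = t , each stereotype-consistent exemplar enters with weight 1 and each stereotype-inconsistent exemplar with weight t :

```latex
% Sketch under the assumptions named in the lead-in.
\begin{align*}
M_J &= M_0 + J_C\,E_C + J_I\,E_I \\[4pt]
J(G_i)_J &= \frac{\mathrm{Match}(M_J, H)}{\mathrm{Match}(M_J, H) + \mathrm{Match}(M_J, L)}
          = \frac{s_0 + J_C\,s_C + t\,J_I\,s_I}{1 + J_C + t\,J_I}
\end{align*}
```

On this reading, the less typical the stereotype-inconsistent members are (the smaller t is), the less their information moves the group judgment away from s ₀ and s _C .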

Person Target

When the target is an individual, the accessing cues include the target person representation, p _T . The mental representation M _J ( Equation A4 ) is accessed by H' = g _i ⊗ p _T ⊗ h ⊗ x _l and L' = g _i ⊗ p _T ⊗ l ⊗ x _l . By modifying Equation A5 , the judgment is

where t _T = ( r · p _T ) = ( p _C · p _T ), the typicality of the target person. Equation A6 is Equation 24 .
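A form consistent with these definitions can be sketched as follows. It assumes that the prior memory M ₀ stores the stereotype with the generic person vector r , so that its match with the person cue is scaled by t _T , and that s ( I, T ) = ( p _I · p _T ); both are assumptions of this sketch rather than statements taken from the text.

```latex
% Sketch: person-target judgment with cues H' and L'.
% Assumes Match(M_0, H') = t_T s_0 c and s(I,T) = (p_I . p_T).
\begin{align*}
J(G_i, P_T) &= \frac{\mathrm{Match}(M_J, H')}{\mathrm{Match}(M_J, H') + \mathrm{Match}(M_J, L')}
             = \frac{t_T\,s_0 + t_T\,J_C\,s_C + s(I,T)\,J_I\,s_I}{t_T + t_T\,J_C + s(I,T)\,J_I}
\end{align*}
```

In this form, a larger s ( I, T ) gives the stereotype-inconsistent members more weight in the judgment of the target, and a smaller t _T reduces the weight of the stereotype and of the stereotype-consistent members, in line with the two implications drawn in the text.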

Details of the Simulations

In all simulations, relevant tensors and vectors were generated as follows. M ₀ is a tensor whose element is a random number between 0 and 1. All vectors were constructed by first generating a 10-element vector with a random number between 0 and 1 for each element conforming to a specification, adding a small random vector (length = .1) and normalizing it (making its length unity). The representations for a group, a group member, and a context were random vectors, g , p , and x . A new random vector was computed for a new group, a new person, and a new context. The high and low ends of a judgment scale (e.g., attitude or stereotype consistency), h and l , were specified so that ( h · l ) = 0; stimuli e _h and e _l were so that ( e _h · h )/[( e _h · h ) + ( e _h · l )] = .9 and ( e _l · h )/[( e _l · h ) + ( e _l · l )] = .1.
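The vector construction can be sketched in Python as follows. This is a hedged illustration rather than the original Mathematica code: the function names, the seed, the uniform noise direction, and the Gram-Schmidt step used to make h and l orthogonal are assumptions (the Gram-Schmidt step can also produce negative elements in l , which departs from a strict 0-to-1 range).

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
DIM = 10                        # 10-element vectors, as described above

def unit(v):
    """Normalize a vector to unit length."""
    return v / np.linalg.norm(v)

def scale_ends():
    """High and low ends of a judgment scale, h and l, with (h . l) = 0."""
    h = unit(rng.uniform(0.0, 1.0, DIM))
    l = rng.uniform(0.0, 1.0, DIM)
    l = unit(l - (l @ h) * h)   # remove the component along h (assumed method)
    return h, l

def stimulus_spec(h, l, s):
    """Specification whose scale value (e.h)/[(e.h) + (e.l)] equals s."""
    return unit(s * h + (1.0 - s) * l)

def noisy_unit(spec, noise_length=0.1):
    """Add a small random vector of the given length, then normalize."""
    noise = noise_length * unit(rng.uniform(-1.0, 1.0, DIM))
    return unit(spec + noise)

def scale_value(e, h, l):
    """Scale value of a stimulus vector relative to h and l."""
    return (e @ h) / ((e @ h) + (e @ l))

h, l = scale_ends()
e_h = noisy_unit(stimulus_spec(h, l, 0.9))  # scale value near .9
e_l = noisy_unit(stimulus_spec(h, l, 0.1))  # scale value near .1
```

Before the noise is added, the specification vectors have scale values of exactly .9 and .1; the added noise makes each simulated stimulus deviate slightly from its specification.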

To simulate the main experiment, high and low stimuli for a related topic were e _rh and e _rl , so that ( e _rh · e _h ) = .7 and ( e _rl · e _l ) = .7.

For the stereotype change simulation, 80 stereotype-consistent, 10 stereotype-inconsistent, and 10 stereotype-irrelevant exemplars were presented in the learning phase. The specifications for stereotype-consistent, stereotype-inconsistent, and stereotype-irrelevant information were e _C , e _I , and e _R , where ( e _C · h )/[( e _C · h ) + ( e _C · l )] = .9, ( e _I · h )/[( e _I · h ) + ( e _I · l )] = .1, and ( e _R · h ) = ( e _R · l ) = 0. The target person representation, p _T , was made similar to the group members (dot product between .6 and .8).

For the group differentiation simulation, event representations, e _s , with different scale values were specified so that ( e _s · h )/[( e _s · h ) + ( e _s · l )] = s/10, where s took an integer from 1 to 9. The focal and context groups were g ₁ and g ₂ , so that ( g ₁ · g ₂ ) = 0 (focal group = Group 1 with scale values between .4 and .6, and context group = Group 2 with scale values ranging from .1 to .3 or .7 to .9). The distribution of events for the simulation is presented in Table A1 . The person representation was constructed for Group 1,

The person representation was constructed analogously for Group 2.
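The event specification for the group differentiation simulation can be sketched as follows. For readability, the scale endpoints and group vectors are one-hot vectors, which is an assumption that departs from the random vectors used in the simulations. The variable names are hypothetical, and the actual frequencies of events are those given in Table A1, which this sketch does not reproduce.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Assumed orthonormal scale endpoints (one-hot for readability).
h = [1.0, 0.0] + [0.0] * 8
l = [0.0, 1.0] + [0.0] * 8

def scale_value(e):
    return dot(e, h) / (dot(e, h) + dot(e, l))

# Event representations e_s with scale value s/10 for s = 1, ..., 9.
events = {
    s: normalize([(s / 10) * a + (1 - s / 10) * b for a, b in zip(h, l)])
    for s in range(1, 10)
}

focal = [s for s in events if 4 <= s <= 6]         # Group 1: .4 to .6
context_low = [s for s in events if 1 <= s <= 3]   # Group 2: .1 to .3
context_high = [s for s in events if 7 <= s <= 9]  # Group 2: .7 to .9

# Orthogonal group representations, so that (g1 . g2) = 0.
g1 = [0.0, 0.0, 1.0] + [0.0] * 7
g2 = [0.0, 0.0, 0.0, 1.0] + [0.0] * 6
```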

References

Adorno, T. W., Frenkel-Brunswik, E., Levinson, D. J. & Sanford, R. N. (1950). The authoritarian personality. (New York: Harper & Row)
Allison, S. T., Mackie, D. M. & Messick, D. M. (1996). Outcome biases in social perception: Implications for dispositional inference, attitude change, stereotyping, and social behavior. Advances in Experimental Social Psychology, 28, 53-93.
Allison, S. T. & Messick, D. M. (1985). The group attribution error. Journal of Experimental Social Psychology, 21, 563-579.

Allport, G. W. (1954). The nature of prejudice. (Reading, MA: Addison-Wesley)
Allport, G. W. & Postman, L. (1947). The psychology of rumor. (New York: Henry Holt)
Altemeyer, B. (1998). The other "authoritarian personality." Advances in Experimental Social Psychology, 30, 47-92.
Anderson, N. H. (1968). Application of a linear-serial model to a personality-impression task using serial presentation. Journal of Personality and Social Psychology, 10, 354-362.

Anderson, N. H. (1973). Serial position curves in impression formation. Journal of Experimental Psychology, 97, 8-12.
Anderson, N. H. (1981). Foundations of information integration theory. (New York: Academic Press)
Anderson, N. H. (1982). Methods of information integration theory. (New York: Academic Press)
Asch, S. E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology, 41, 258-290.

Asch, S. E. (1952). Social psychology. (Englewood Cliffs, NJ: Prentice Hall)
Asch, S. E. & Zukier, H. (1984). Thinking about persons. Journal of Personality and Social Psychology, 46, 1230-1240.

Bartlett, F. C. (1932). Remembering: A study in experimental psychology. (New York: Macmillan)
Berndsen, M., Spears, R., McGarty, C. & van der Pligt, J. (1998). Dynamics of differentiation: Similarity as the precursor and product of stereotype formation. Journal of Personality and Social Psychology, 74, 1451-1463.

Biernat, M. & Kobrynowicz, D. (1997). Gender- and race-based standards of competence: Lower minimum standards but higher ability standards for devalued groups. Journal of Personality and Social Psychology, 72, 544-557.

Biernat, M. & Manis, M. (1994). Shifting standards and stereotype-based judgments. Journal of Personality and Social Psychology, 66, 5-20.

Brewer, M. B. (1988). A dual process model of impression formation. Advances in Social Cognition, 1, 1-36.
Brewer, M. B., Dull, V. & Lui, L. (1981). Perceptions of the elderly: Stereotypes as prototypes. Journal of Personality and Social Psychology, 41, 656-670.

Bruner, J. (1990). Acts of meaning. (Cambridge, MA: Harvard University Press)
Brunswik, E. (1956). Perception and the representative design of experiments. (Berkeley: University of California Press)
Busemeyer, J. R. & Myung, I. J. (1988). A new method for investigating prototype learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 3-11.
Campbell, D. T. (1958). Common fate, similarity, and other indices of the status of aggregates of persons as social entities. Behavioral Science, 3, 14-25.

Campbell, D. T., Lewis, N. A. & Hunt, W. A. (1958). Context effects with judgmental language that is absolute, extensive, and extra-experimentally anchored. Journal of Experimental Psychology, 55, 220-228.

Chapman, L. J. (1967). Illusory correlation in observational report. Journal of Verbal Learning and Verbal Behavior, 6, 151-155.

Chapman, L. J. & Chapman, J. P. (1967). Genesis of popular but erroneous diagnostic observations. Journal of Abnormal Psychology, 72, 193-204.

Chapman, L. J. & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271-280.

Crick, F. (1984). Function of the thalamic reticular complex: The searchlight hypothesis. Proceedings of the National Academy of Sciences, 81, 4586-4590.
Crocker, J., Fiske, S. T. & Taylor, S. E. (1984). Schematic bases of belief change. (In J. R. Eiser (Ed.), Attitudinal judgment (pp. 197-226). New York: Springer.)
Dennis, S. & Humphreys, M. S. (1997). Integrating global matching and dual processing approaches to episodic recognition: The bind cue decide model of episodic memory. (Unpublished manuscript, Department of Psychology, University of Queensland)
Dijksterhuis, A. & van Knippenberg, A. (1995). Memory for stereotype-consistent and stereotype-inconsistent information as a function of processing pace. European Journal of Social Psychology, 25, 689-693.

Dijksterhuis, A., van Knippenberg, A., Kruglanski, A. W. & Schaper, C. (1996). Motivated social cognition: Need for closure effects on memory and judgment. Journal of Experimental Social Psychology, 32, 254-270.

Dreben, E. K., Fiske, S. T. & Hastie, R. (1979). The independence of evaluative and item information: Impression and recall order effects in behavior-based impression formation. Journal of Personality and Social Psychology, 37, 1758-1768.

Eagly, A. H. & Steffen, V. J. (1984). Gender stereotypes stem from the distribution of women and men into social roles. Journal of Personality and Social Psychology, 46, 735-754.

Eich, J. M. (1982). A composite holographic associative recall model. Psychological Review, 89, 627-661.

Eiser, J. R. (1971). Enhancement of contrast in the absolute judgment of attitude statements. Journal of Personality and Social Psychology, 17, 1-10.
Fazio, R. H. (1990). Multiple processes by which attitudes guide behavior: The MODE model as an integrative framework. Advances in Experimental Social Psychology, 23, 75-109.
Fiedler, K. (1991). The tricky nature of skewed frequency tables: An information loss account of distinctiveness-based illusory correlations. Journal of Personality and Social Psychology, 60, 24-36.

Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, 103, 193-214.

Fiedler, K. & Armbruster, T. (1994). Two halfs may be more than one whole: Category-split effects on frequency illusions. Journal of Personality and Social Psychology, 66, 633-645.

Fiske, S. T. & Neuberg, S. L. (1990). A continuum model of impression formation from category-based to individuating processes: Influence of information and motivation on attention and interpretation. Advances in Experimental Social Psychology, 23, 1-74.
Fiske, S. T., Neuberg, S. L., Beattie, A. E. & Milberg, S. J. (1987). Category-based and attribute-based reactions to others: Some informational conditions of stereotyping and individuating processes. Journal of Experimental Social Psychology, 23, 399-427.

Fiske, S. T. & Taylor, S. E. (1991). Social cognition (2nd ed.). (New York: McGraw-Hill)
Ford, T. E. & Stangor, C. (1992). The role of diagnosticity in stereotype formation: Perceiving group means and variances. Journal of Personality and Social Psychology, 63, 356-367.

Forgas, J. P. (1992). Affect in social judgments and decisions: A multiprocess model. Advances in Experimental Social Psychology, 25, 227-275.
Gigerenzer, G. & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684-704.

Gurwitz, S. B. & Dodge, K. A. (1977). Effects of confirmations and disconfirmations on stereotype-based attributions. Journal of Personality and Social Psychology, 35, 495-500.

Halford, G. S., Wilson, W. H., Guo, J., Gayler, R. W., Wiles, J. & Stewart, J. E. M. (1994). Connectionist implications for processing capacity limitations in analogies. (In K. J. Holyoak & J. A. Barnden (Eds.), Advances in connectionist and neural computation theory (pp. 363-415). Norwood, NJ: Ablex.)
Hamilton, D. L., Dugan, P. M. & Trolier, T. K. (1985). The formation of stereotypic beliefs: Further evidence for distinctiveness-based illusory correlations. Journal of Personality and Social Psychology, 48, 5-17.
Hamilton, D. L. & Gifford, R. K. (1976). Illusory correlation in interpersonal perception: A cognitive basis of stereotypic judgments. Journal of Experimental Social Psychology, 12, 392-407.

Hamilton, D. L. & Sherman, J. W. (1994). Stereotypes. (In R. S. Wyer, Jr., & T. K. Srull (Eds.), Handbook of social cognition (2nd ed., Vol. 2, pp. 1-68). Hillsdale, NJ: Erlbaum.)
Hamilton, D. L. & Sherman, S. J. (1996). Perceiving persons and groups. Psychological Review, 103, 336-355.

Hantzi, A. (1995). Change in stereotypic perceptions of familiar and unfamiliar groups: The pervasiveness of the subtyping model. British Journal of Social Psychology, 34, 463-477.

Hastie, R., Ostrom, T. M., Ebbesen, E. B., Wyer, R. S., Hamilton, D. L. & Carlston, D. E. (1980). Person memory: The cognitive basis of social perception. (Hillsdale, NJ: Erlbaum)
Hastie, R. & Park, B. (1986). The relationship between memory and judgment depends on whether the judgment task is memory-based or on-line. Psychological Review, 93, 258-268.

Hastie, R., Schroeder, C. & Weber, R. (1990). Creating complex social conjunction categories from simple categories. Bulletin of the Psychonomic Society, 28, 242-247.

Hayes, B. K. & Taplin, J. E. (1992). Developmental changes in categorization processes: Knowledge and similarity-based modes of categorization. Journal of Experimental Child Psychology, 54, 188-212.

Heit, E. (1994). Models of the effects of prior knowledge on category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1264-1282.

Hewstone, M. (1994). Revision and change of stereotypic beliefs: In search of the elusive subtyping model. European Review of Social Psychology, 5, 69-109.
Hewstone, M. & Brown, R. J. (1986). Contact is not enough: An intergroup perspective on the "contact hypothesis." (In M. Hewstone & R. Brown (Eds.), Contact and conflict in intergroup encounters (pp. 1-44). Oxford, England: Blackwell.)
Hilton, J. L. & von Hippel, W. (1996). Stereotypes. Annual Review of Psychology, 47, 237-271.

Hinton, G. E. & Anderson, J. A. (Eds.) (1989). Parallel models of associative memory (rev. ed.). (Hillsdale, NJ: Erlbaum)
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411-428.

Hirschfeld, L. A. (1996). Race in the making: Cognition, culture, and the child's construction of human kinds. (Cambridge, MA: MIT Press)
Hoffman, C. & Hurst, N. (1990). Gender stereotypes: Perception or rationalization? Journal of Personality and Social Psychology, 58, 197-208.

Hogarth, R. M. & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1-55.

Humphreys, M. S., Bain, J. D. & Pike, A. R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic, and procedural tasks. Psychological Review, 96, 208-233.
Humphreys, M. S., Wiles, J. & Dennis, S. (1994). Toward a theory of human memory: Data structures and access processes. Behavioral and Brain Sciences, 17, 655-692.

Johnston, L. & Hewstone, M. (1992). Cognitive models of stereotype change 3: Subtyping and the perceived typicality of disconfirming group members. Journal of Experimental Social Psychology, 28, 360-386.

Johnston, L., Hewstone, M., Pendry, L. & Frankish, C. (1994). Cognitive models of stereotype change 4: Motivational and cognitive influences. European Journal of Social Psychology, 24, 237-265.

Jost, J. T. & Banaji, M. R. (1994). The role of stereotyping in system-justification and the production of false consciousness. British Journal of Social Psychology, 33, 1-27.

Jussim, L. & Fleming, C. (1996). Self-fulfilling prophecies and the maintenance of social stereotypes: The role of dyadic interactions and social forces. (In C. N. Macrae, C. Stangor, & M. Hewstone (Eds.), Stereotypes and stereotyping (pp. 161-192). New York: Guilford Press.)
Kashima, Y. (1999). Tensor product model of exemplar-based category learning. (In J. Wiles & T. Dartnall (Eds.), Perspectives on cognitive science: Theories, experiments, and foundations, Vol. 2 (pp. 191-203). Stamford, CT: Ablex.)
Kashima, Y. (2000a). Conceptions of culture and person for psychology. Journal of Cross-Cultural Psychology, 31, 14-32.
Kashima, Y. (2000b). Maintaining cultural stereotypes in the serial reproduction of narratives. Personality and Social Psychology Bulletin, 26, 594-604.

Kashima, Y. (2000c). Recovering Bartlett's social psychology of cultural dynamics. European Journal of Social Psychology, 30, 383-403.

Kashima, Y. & Kerekes, A. R. Z. (1994). A distributed memory model of averaging phenomena in person impression formation. Journal of Experimental Social Psychology, 30, 407-455.

Kashima, Y., Woolcock, J. & King, D. (1998). The dynamics of group impression formation: The tensor product model of exemplar-based social category learning. (In S. J. Read & L. C. Miller (Eds.), Connectionist models of social reasoning and social behavior (pp. 71-109). Mahwah, NJ: Erlbaum.)
Knapp, A. G. & Anderson, J. A. (1984). Theory of categorization based on distributed memory storage. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 616-637.

Krueger, J. (1991). Accentuation effects and illusory change in exemplar-based category learning. European Journal of Social Psychology, 21, 37-48.

Krueger, J. (1992). On the overestimation of between-group differences. European Review of Social Psychology, 3, 31-56.
Krueger, J. & Rothbart, M. (1988). The use of categorical and individuating information in making inferences about personality. Journal of Personality and Social Psychology, 55, 187-195.

Krueger, J. & Rothbart, M. (1990). Contrast and accentuation effects in category learning. Journal of Personality and Social Psychology, 59, 651-663.

Krueger, J., Rothbart, M. & Sriram, N. (1989). Category learning and change: Differences in sensitivity to information that enhances or reduces intercategory distinctions. Journal of Personality and Social Psychology, 56, 866-875.

Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480-498.

Kunda, Z., Miller, D. T. & Claire, T. (1990). Combining social concepts: The role of causal reasoning. Cognitive Science, 14, 551-577.

Kunda, Z. & Oleson, K. C. (1995). Maintaining stereotypes in the face of disconfirmation: Constructing grounds for subtyping deviants. Journal of Personality and Social Psychology, 68, 565-580.

Kunda, Z. & Thagard, P. (1996). Forming impressions from stereotypes, traits, and behaviors: A parallel-constraint-satisfaction theory. Psychological Review, 103, 284-308.

Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. (In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge (pp. 91-195). Cambridge, England: Cambridge University Press.)
Laudan, L. (1977). Progress and its problems: Toward a theory of scientific growth. (Berkeley: University of California Press)
Linville, P. W. & Fischer, G. W. (1993). Exemplar and abstraction models of perceived group variability and stereotypicality. Social Cognition, 11, 91-125.
Linville, P. W., Salovey, P. & Fischer, G. W. (1986). Stereotyping and perceived distributions of social characteristics: An application to ingroup-outgroup perception. (In J. F. Dovidio & S. L. Gaertner (Eds.), Prejudice, discrimination, and racism (pp. 165-208). Orlando, FL: Academic Press.)
Lippmann, W. (1922). Public opinion. (New York: Harcourt Brace Jovanovich)
Luce, R. D. (1959). Individual choice behavior. (New York: Wiley)
Mackie, D. M., Hamilton, D. L., Susskind, J. & Rosselli, F. (1996). Social psychological foundations of stereotype formation. (In C. N. Macrae, C. Stangor, & M. Hewstone (Eds.), Stereotypes and stereotyping (pp. 41-78). New York: Guilford Press.)
Mackie, D. M. & Smith, E. R. (1998). Intergroup relations: Insights from a theoretically integrative approach. Psychological Review, 105, 499-529.

Manis, M. (1967). Context effects in communication. Journal of Personality and Social Psychology, 5, 325-334.
Manis, M., Biernat, M. & Nelson, T. F. (1991). Comparison and expectancy processes in human judgment. Journal of Personality and Social Psychology, 61, 203-211.

Manis, M. & Paskewitz, J. R. (1987). Assessing psychopathology in individuals and groups: Aggregating behavior samples to form overall impressions. Personality and Social Psychology Bulletin, 13, 83-94.

Marr, D. (1982). Vision. (San Francisco, CA: Freeman)
Martin, L. L. (1986). Set/reset: Use and disuse of concepts in impression formation. Journal of Personality and Social Psychology, 51, 493-504.
Massaro, D. W. (1990). The psychology of connectionism. Behavioral and Brain Sciences, 13, 403-406.

Massaro, D. W. & Cowan, N. (1993). Information processing models: Microscopes of the mind. Annual Review of Psychology, 44, 383-425.

Massaro, D. W. & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225-252.

McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419-457.

McClelland, J. L. & Rumelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159-188.

McClelland, J. L. & Rumelhart, D. E. (1986). A distributed model of human learning and memory. (In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2 (pp. 170-215). Cambridge, MA: MIT Press.)
McConnell, A. R., Liebold, J. M. & Sherman, S. J. (1997). Within-target illusory correlations and the formation of context-dependent attitudes. Journal of Personality and Social Psychology, 73, 675-686.

McConnell, A. R., Sherman, S. J. & Hamilton, D. L. (1994). Illusory correlation in the perception of groups: An extension of the distinctiveness-based account. Journal of Personality and Social Psychology, 67, 414-429.

McConnell, A. R., Sherman, S. J. & Hamilton, D. L. (1997). Target entitativity: Implications for information processing about individual and group targets. Journal of Personality and Social Psychology, 72, 750-762.

McGarty, C. (1999). Categorization in social psychology. (London: Sage)
McGarty, C., Haslam, S. A., Turner, J. C. & Oakes, P. J. (1993). Illusory correlation as accentuation of actual intercategory difference: Evidence for the effect with minimal stimulus information. European Journal of Social Psychology, 23, 390-410.
McGarty, C. & Penny, R. E. C. (1988). Categorization, accentuation and social judgment. British Journal of Social Psychology, 27, 147-157.

Medin, D. L., Goldstone, R. L. & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254-278.

Medin, D. L. & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

Mullen, B. & Johnson, C. (1990). Distinctiveness-based illusory correlations and stereotyping: A meta-analytic integration. British Journal of Social Psychology, 29, 11-27.

Myung, I. J. & Busemeyer, J. R. (1992). Measurement-free tests of a general state-space model of prototype learning. Journal of Mathematical Psychology, 36, 32-67.

Nagel, E. (1961). The structure of science. (New York: Harcourt Brace and World)
Neisser, U. (1976). Cognition and reality. (San Francisco, CA: Freeman)
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.

Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

Oakes, P. J., Haslam, S. A. & Turner, J. C. (1994). Stereotyping and social reality. (Oxford, England: Blackwell)
Ostrom, T. M. & Sedikides, C. (1992). Out-group homogeneity effects in natural and minimal groups. Psychological Bulletin, 112, 536-552.

Ostrom, T. M. & Upshaw, H. S. (1968). Psychological perspective and attitude change. (In A. G. Greenwald, T. C. Brock, & T. M. Ostrom (Eds.), Psychological foundations of attitudes (pp. 217-242). New York: Academic Press.)
Park, B. & Hastie, R. (1987). Perception of variability in category development: Instance versus abstraction-based stereotypes. Journal of Personality and Social Psychology, 53, 621-635.

Pettigrew, T. F. (1998). Intergroup contact theory. Annual Review of Psychology, 49, 65-85.

Pike, A. R. (1984). Comparison of convolution and matrix distributed memory systems for associative recall and recognition. Psychological Review, 91, 281-294.

Pratto, F. (1999). The puzzle of continuing group inequality: Piecing together psychological, social, and cultural forces in social dominance theory. Advances in Experimental Social Psychology, 31, 191-263.
Pryor, J. B. (1986). The influence of different encoding sets upon the formation of illusory correlations and group impressions. Personality and Social Psychology Bulletin, 12, 216-226.

Read, S. J. & Marcus-Newhall, A. (1993). Explanatory coherence in social explanations: A parallel distributed processing account. Journal of Personality and Social Psychology, 65, 429-447.

Read, S. J. & Miller, L. C. (1998). Connectionist models of social reasoning and social behavior. (Mahwah, NJ: Erlbaum)
Read, S. J., Vanman, E. J. & Miller, L. C. (1996). Connectionism, parallel constraint satisfaction processes, and gestalt principles: (Re)introducing cognitive dynamics to social psychology. Personality and Social Psychology Review, 1, 26-53.

Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382-407.

Rosenthal, R. & Jacobson, L. (1968). Pygmalion in the classroom: Teacher expectations and student intellectual development. (New York: Holt, Rinehart and Winston)
Rothbart, M. (1981). Memory processes and social beliefs. (In D. L. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior (pp. 145-181). Hillsdale, NJ: Erlbaum.)
Rothbart, M., Evans, M. & Fulero, S. (1979). Recall for confirming events: Memory processes and the maintenance of social stereotyping. Journal of Experimental Social Psychology, 15, 343-355.

Rothbart, M., Fulero, S., Jensen, C., Howard, J. & Birrell, B. (1978). From individual to group impressions: Availability heuristics in stereotype formation. Journal of Experimental Social Psychology, 14, 237-255.

Rothbart, M. & John, O. (1985). Social categorization and behavioral episodes: A cognitive analysis of the effects of intergroup contact. Journal of Social Issues, 41, 81-104.

Rothbart, M. & Lewis, S. (1988). Inferring category attributes from exemplar attributes: Geometric shapes and social categories. Journal of Personality and Social Psychology, 55, 861-872.

Rothbart, M., Sriram, N. & Davis-Stitt, C. (1996). The retrieval of typical and atypical category members. Journal of Experimental Social Psychology, 32, 309-336.

Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. (In R. J. Spiro, B. C. Bruce, & W. F. Brewer (Eds.), Theoretical issues in reading comprehension (pp. 33-58). Hillsdale, NJ: Erlbaum.)
Rumelhart, D. E., Smolensky, P., McClelland, J. L. & Hinton, G. E. (1986). Schemata and sequential thought processes in PDP models. (In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2 (pp. 7-57). Cambridge, MA: MIT Press.)
Sanbonmatsu, D. M., Sherman, S. J. & Hamilton, D. L. (1987). Illusory correlation in the perception of individuals and groups. Social Cognition, 5, 1-25.

Scarberry, N. C., Ratcliff, C. D., Lord, C. G., Lanicek, D. L. & Desforges, D. M. (1997). Effects of individuating information on the generalization part of Allport's contact hypothesis. Personality and Social Psychology Bulletin, 23, 1291-1299.

Schacter, D. L., Norman, K. A. & Koutstaal, W. (1998). The cognitive neuroscience of constructive memory. Annual Review of Psychology, 49, 289-318.

Schaller, M. (1992). In-group favoritism and statistical reasoning in social inference: Implications for formation and maintenance of group stereotypes. Journal of Personality and Social Psychology, 63, 61-74.
Schaller, M. & Maass, A. (1989). Illusory correlation and social categorization: Toward an integration of motivational and cognitive factors in stereotype formation. Journal of Personality and Social Psychology, 56, 709-721.

Shultz, T. R. & Lepper, M. R. (1996). Cognitive dissonance reduction as constraint satisfaction. Psychological Review, 103, 219-240.
Semin, G. & Fiedler, K. (1988). The cognitive functions of linguistic categories in describing persons: Social cognition and language. Journal of Personality and Social Psychology, 54, 558-568.

Semin, G. R. & Fiedler, K. (1991). The linguistic category model, its basis, applications, and range. (In W. Stroebe & M. Hewstone (Eds.), European review of social psychology (Vol. 2, pp. 1-31). Chichester, England: Wiley.)
Sherman, J. W. (1996). Development and mental representation of stereotypes. Journal of Personality and Social Psychology, 70, 1126-1141.

Shoda, Y., Mischel, W. & Wright, J. C. (1989). Intuitive interactionism in person perception: Effects of situation-behavior relations on dispositional judgments. Journal of Personality and Social Psychology, 56, 41-53.

Skowronski, J. J. & Shook, J. (1997). Facilitation in repeated trait judgments: Implications for the structure of trait concepts. Journal of Experimental Social Psychology, 33, 21-46.

Smith, E. R. (1991). Illusory correlation in a simulated exemplar-based memory. Journal of Experimental Social Psychology, 27, 107-123.

Smith, E. R. (1996). What do connectionism and social psychology offer each other? Journal of Personality and Social Psychology, 70, 893-912.

Smith, E. R. & DeCoster, J. (1997). Heuristic-systematic and other dual process models in social and cognitive psychology: An integration and connectionist interpretation. (Unpublished manuscript, Department of Psychology, Purdue University)
Smith, E. R. & DeCoster, J. (1998a). Knowledge acquisition, accessibility, and use in person perception and stereotyping: Simulation with a recurrent connectionist network. Journal of Personality and Social Psychology, 74, 21-35.

Smith, E. R. & DeCoster, J. (1998b). Person perception and stereotyping: Simulation using distributed representations in a recurrent connectionist network. (In S. J. Read & L. C. Miller (Eds.), Connectionist models of social reasoning and social behavior (pp. 111-140). Mahwah, NJ: Erlbaum.)
Smith, E. R. & Zárate, M. A. (1990). Exemplar and prototype use in social categorisation. Social Cognition, 8, 243-262.

Smith, E. R. & Zárate, M. A. (1992). Exemplar-based model of social judgment. Psychological Review, 99, 3-21.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46, 159-216.
Srull, T. K. & Wyer, R. S. (1989). Person memory and judgment. Psychological Review, 96, 58-83.

Stangor, C. & Lange, J. E. (1994). Mental representations of social groups: Advances in understanding stereotypes and stereotyping. Advances in Experimental Social Psychology, 26, 357-416.
Strange, K. R., Schwei, M. & Geiselman, R. E. (1978). Effects of the structure of descriptions on group impression formation. Bulletin of the Psychonomic Society, 12, 224-226.

Tajfel, H. & Wilkes, A. L. (1963). Classification and quantitative judgment. British Journal of Psychology, 54, 101-114.

Taylor, S. E. (1981). A categorization approach to stereotyping. (In D. L. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior (pp. 83-114). Hillsdale, NJ: Erlbaum.)
Trafimow, D. (1998). Situation-specific effects in person memory. Personality and Social Psychology Bulletin, 24, 314-321.

Treisman, A. M. & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.

Triandis, H. C. (1995). Culture and social behavior. (New York: McGraw-Hill)
Tulving, E. (1983). Elements of episodic memory. (New York: Oxford University Press)
Turner, J. C. (1987). Rediscovering the social group: A self-categorisation theory. (Oxford, England: Blackwell)
Uleman, J. S., Newman, L. S. & Moskowitz, G. B. (1996). People as flexible interpreters: Evidence and issues from spontaneous trait inference. Advances in Experimental Social Psychology, 28, 211-279.
Upshaw, H. S. (1969). The personal reference scale: An approach to social judgment. Advances in Experimental Social Psychology, 4, 315-371.
Van Overwalle, F. (1998). Causal explanation as constraint satisfaction: A critique and a feedforward connectionist alternative. Journal of Personality and Social Psychology, 74, 312-328.

Volkmann, J. (1951). Scales of judgment and their implications for social psychology. (In J. H. Rohrer & M. Sherif (Eds.), Social psychology at the crossroads (pp. 273-294). New York: Harper.)
Von Hippel, W., Sekaquaptewa, D. & Vargas, P. (1995). On the role of encoding processes in stereotype maintenance. Advances in Experimental Social Psychology, 27, 177-254.
Weber, E. U., Goldstein, W. M. & Busemeyer, J. R. (1991). Beyond strategies: Implications of memory representation and memory processes for models of judgment and decision making. (In W. E. Hockley & S. Lewandowsky (Eds.), Relating theory and data: Essays on human memory in honour of Bennet B. Murdock (pp. 75-101). Hillsdale, NJ: Erlbaum.)
Weber, R. & Crocker, J. (1983). Cognitive processes in the revision of stereotypic beliefs. Journal of Personality and Social Psychology, 45, 961-977.

Wiles, J. & Humphreys, M. S. (1993). Using artificial neural nets to model implicit and explicit memory test performance. (In P. Graf & M. E. J. Masson (Eds.), Implicit memory: New directions in cognition, development, and neuropsychology (pp. 141-165). Hillsdale, NJ: Erlbaum.)
Wright, J. C. & Mischel, W. (1987). A conditional approach to dispositional constructs: The local predictability of social behavior. Journal of Personality and Social Psychology, 53, 1159-1177.

Wyer, R. S. & Carlston, D. E. (1979). Social cognition, inference, and attribution. (Hillsdale, NJ: Erlbaum)
Yzerbyt, V. Y., Coull, A. & Rocher, S. J. (1999). Fencing off the deviant: The role of cognitive resources in the maintenance of stereotypes. Journal of Personality and Social Psychology, 77, 449-462.

It should be noted that Asch's sense of dynamics also implied that the meaning of a stimulus changes as a function of preexisting memory and concurrent stimuli. This sense of dynamics as meaning change is not directly handled by the current model. Capturing it would require a model of encoding processes, which falls outside the current scope of TPM.

Another type of mental representation postulated for social groups is the associative network model (e.g., Stangor & Lange, 1994 ); however, that type of model is not suitable for modeling the averaging phenomenon. For further discussion, see Kashima and Kerekes (1994) .

In this article, for expository simplicity, attention is assumed not to differ for different aspects. However, it is possible to make the model more general, so that attention varies across aspects or even for each exemplar's different aspects.

In fact, in most impression formation experiments, stimuli are constructed or selected so that they clearly mark higher or lower ends of a given bipolar scale (e.g., likability). For example, personality trait words may be selected for their clear evaluative connotation; behavioral descriptions may be selected on the basis of their normative ratings in a pilot study so that some clearly indicate one trait and others, its opposite.

Hogarth and Einhorn (1992) proposed a model that makes predictions similar to the current model under some conditions. However, it should be noted that their model cannot account for the results of Smith and Zárate (1990). For a discussion about other inadequacies of their model, see Kashima and Kerekes (1994).

This statement holds provided that the endpoints of judgment scales and scaling parameters are the same and the information environment remains stable at least probabilistically. Interestingly, this implication of the TPM is broadly in agreement with Gigerenzer and Hoffrage's (1995) contention that people's probabilistic judgments are often consistent with the Bayesian normative criterion when information is presented as frequencies rather than as probabilities. The TPM, like other connectionist models, retains information about the frequency of events.

Busemeyer and Myung (1988) also discussed noninterference, but we do not address it because it is not directly relevant to the current discussion.

This research was supported by an Australian Research Council grant. We acknowledge Paul Polidori for his programming and Paul Clifford for conducting the experiment reported here. We thank Craig McGarty and Michael Platow for their comments on earlier versions of the article.
Correspondence may be addressed to Yoshihisa Kashima, Department of Psychology, University of Melbourne, Victoria, Australia, 3010.
Electronic mail may be sent to y.kashima@psych.unimelb.edu.au
Received: March 5, 1999
Revised: January 31, 2000
Accepted: February 23, 2000

Table 1. Tests of Response Dependency: Expected Patterns of the Serial Position Weights (SPW) for the (1, 4), (2, 4), and (3, 4) Responding Conditions

Table 2. Recency Effects for the Stimuli That Share the Same Stimulus and Judgment Context and Those That Do Not Share the Same Context

Table 3. Mean Simulated Group Impression Judgments for the Stereotype Change Simulation

Table 4. Means of the Simulated Mean Judgments for the Group Differentiation

Table 5. Distribution of Events Used in the Simulation for the Group Differentiation

Figure 1. A schematic diagram of a tensor product net with four aspects representing group, person, event, and context.

Figure 2. The probability of classifying exemplars into category A, observed in Smith and Zárate (1990), and the simulation results of the tensor product model. Constructed from the data reported in Table 3.1 in Kashima et al. (1998, p. 84). The dashed line reports the observed probability in the exemplar-only condition of Smith and Zárate (1990) and the solid line represents the prototype + exemplar simulation results, the condition theorized to be analogous to the exemplar-only condition in the empirical study. The positions on the x-axis are different exemplars used in the experiment: a1 to a5 are exemplars to be classified into category A; b1 to b4 are exemplars to be classified into category B; and n1 to n7 are new exemplars.

Figure 3. The means of the serial position weights based on the simulation of the tensor product model in the final, sequential, (1, 4), (2, 4), and (3, 4) conditions.

Figure 4. The means of the serial position weights in the final, sequential, (1, 4), (2, 4), and (3, 4) conditions.