Review of General Psychology | © 1999 by the Educational Publishing Foundation |
March 2000 Vol. 4, No. 1, 25-58 | For personal use only--not for distribution. |
Subjective correlations that exaggerate objectively presented contingencies are usually referred to as illusory correlations. An empirical review reveals 3 major paradigms of illusory correlations, drawing on 2 prominent but conflicting gestalt principles, congruency and distinctiveness. Congruency accounts for expectancy-based illusory correlations, whereas distinctiveness is relevant to illusions resulting from the asymmetry of positive and negative attributes and from infrequency. The congruency principle implies a processing advantage for expected stimuli, whereas distinctiveness assumes enhanced processing of unexpected events. This apparent conflict is resolved, and an integrative account is offered within a simple connectionist framework (BIAS) of correlation assessment. The basic algorithm is outlined, empirical findings are simulated, new theoretical distinctions are introduced, and analogies to related paradigms are explained.
Illusory correlations have become a prominent research topic not only in modern social psychology ( Crocker, 1981 ; Fiedler, 1985 ; Hamilton, 1981 ) but also in personality research (Shweder, 1975 , 1977a ), diagnostics ( Berman & Kenny, 1976 ; L. J. Chapman & Chapman, 1967 ), learning and conditioning ( Alloy & Tabachnik, 1984 ; Orr & Lanzetta, 1980 ; Shanks & Dickinson, 1987 ), memory ( Arkes & Harkness, 1983 ; Kao & Wasserman, 1993 ), and applied psychology ( Bettman, John, & Scott, 1986 ; Jussim, 1986 ). There are (at least) two reasons for the prominence of this topic.
On the one hand, the detection and assessment of event correlations are among the most fundamental operations an adaptive organism must acquire. Human (or animal) intelligence requires learning of categorization, discrimination (i.e., differential categorization), and correlation (i.e., related discrimination in different dimensions). Note that all three mental operations form a logical hierarchy such that categorization enables discrimination, which then enables the understanding of correlation. Just as in Inhelder and Piaget's (1958) developmental theory of intelligence, the ability to detect, quantify, and use correlations is taken for granted as a basic module in almost every theory of learning, attribution, language comprehension, and inductive reasoning.
On the other hand, the popularity of illusory correlations is mainly due to the challenges and provocations that are explicit or implicit in the term illusory. The intriguing and often pessimistic message is that people's insensitivity to actual correlations can lead to serious biases and shortcomings of social judgment. And these biases, in turn, contribute to stereotyping ( Hamilton, 1981 ), erroneous decisions ( Einhorn & Hogarth, 1978 ), injustice ( Smither, Collins, & Buda, 1989 ), intergroup discrimination ( Hamilton & Gifford, 1976 ), and false clinical judgment ( Dawes, 1989 ).
The aim of the present article is to present a review and a comprehensive explanatory framework of research on illusory correlations and its intriguing implications in social and cognitive psychology. In several respects, this review is atypical in format and scope. After an examination of the empirical research, a computer-based model is presented that provides an explicit algorithm of the informational processes underlying correlation assessment. Through the use of this integrative framework, within which the diverse variants of illusory correlations can be located, the intention is to convey the following points.
First, research on illusory correlations has evolved in separate paradigms and fields of application, without an integrative conceptual framework. Second, as a result of this lack of a comprehensive framework, there is little theoretical exchange or cross-referencing between major paradigms. Some approaches to illusory correlations, covered under different paradigm labels, have gone largely unnoticed. Third, as a consequence, the theoretical scope and the domain of application have not been fully acknowledged. Fourth, analogous to the separately evolved paradigms, an implicit assumption has been that different and sometimes opposite versions of the illusion are bound to separate task conditions. In this way, fruitful theoretical conflicts could be avoided. Finally, a distinction is highlighted between two sources of bias in correlation assessment: one that is itself biased and originates in expectations, motives, and selective processing and one that is unbiased in nature and not mediated by selective forces or processes. It is argued that explaining biased correlations by biased processes runs the risk of becoming circular ( Wallach & Wallach, 1994 ).
The article is divided into two major parts. The first section is devoted to a review of the three bodies of literature: (a) expectancy-based illusory correlations, (b) differential weighting of positive and negative stimuli in covariation assessment, and (c) infrequency-based illusory correlations. Each subsection starts with an explication of the phenomenon and an empirical review that covers historical origins as well as recent findings and what they reveal about the underlying psychological processes. A final conclusion summarizes well-established findings, mentions open questions, delimits the domain, and attempts to identify the theory heuristic that has guided (and restricted) the research in each paradigm.
Although the review of empirical research yielded a compelling set of well-established findings, with intriguing and often provocative implications, a less satisfactory picture arises at the theoretical level. No integrative theoretical framework is available that connects and coordinates the findings from separate paradigms (for a notable exception, see Busemeyer, 1991 ). Moreover, the psychological assumptions proposed to explain the underlying cognitive processes in the three major paradigms have remained isolated and, in some respects, even contradictory. Thus, the second part of the article attempts to depict the outlines of a unitary framework based on a simple connectionist feedforward model (the Brunswikian induction algorithm for social inference [BIAS] model; Fiedler, 1996 ). Within this framework, different origins of illusory correlations can be distinguished analytically, and previously neglected factors can be identified. Apparent theoretical conflicts can be resolved, and the contribution of traditional paradigms can be located. The algorithm reveals particularly how unbiased processes can lead to biased outcomes and how different types of illusory correlations are not mutually exclusive.
The aim here is not to provide an exhaustive bibliography. Research on illusory correlations has led to so many publications spread over so many literatures and applied domains that almost no combination of keywords in a bibliographic database would yield a complete list of references. Excellent reviews can be found in earlier publications by Allan (1993) , Alloy and Tabachnik (1984) , Crocker (1981) , Nisbett and Ross (1980) , and Papini and Bitterman (1990) . Recent empirical findings that were not covered by these previous reviews did not alter the major findings and conclusions. Rather than trying to be exhaustive, the present review intends to explain the current state of the art in terms of the historical origins and early theoretical commitments that have restricted research on illusory correlations.
A number of different keywords and combinations thereof were used to gather the bibliographical basis of this review (as summarized in Tables 1–3 ). These keywords included illusory correlation, correlation, covariation, and contingency combined with assessment, judgment, and detection as well as in conjunction with distinctiveness, expectancy, and cognitive biases. However, the pooled list of references would be too large to be reviewed and would hardly justify the journal space, because the majority of articles refer to replications and applications that do not really increase theoretical understanding. Therefore, only a subset of references was included, guided by the goal of pointing out historical origins, uncovering seminal findings that have influenced further research, and doing justice to important theoretical issues. It is unlikely that any selectivity arising from this partly subjective procedure will interfere with the primary goal, namely, to elucidate unresolved issues within a unitary framework.
Defining the Domain of Inquiry
A potentially severe problem is whether the term illusory correlation refers to a unitary phenomenon at all or whether it is but a label that unsystematically connects some phenomena and excludes others that may be equally relevant to covariation assessment. This is, of course, a matter of definition, but definitions need not be arbitrary. The present approach assumes that illusory correlations do, in fact, represent a clearly definable class of phenomena with obvious face validity and a systematic reference to cognitive theory. An illusory correlation is simply defined as a subjective correlation assessment that deviates systematically from an objectively presented correlation. In the simplest, qualitative case, the illusion consists in perceiving a correlation that is actually not there. In the more general case, the definition also includes overestimates, underestimates, reversals, and other distortions of "real" correlations.
One might conjecture that normative statistical models may not provide an ultimate criterion of "real" correlations, but this problem can be ignored in the present context, because the experimental comparisons on which illusory correlations are based (e.g., the same stimulus series presented with different stimulus labels) rarely require a normative criterion. Operationally, the domain is restricted to inductive–statistical assessment or learning of correlations from a series of observations that may be provided by the experimenter or result from the participant's own information search. This definition excludes studies based on summary statistics presented in tables or texts. Thus, the phenomenon under focus is a clearly circumscribed, homogeneous class of cognitive operations: the extraction of a statistical rule from a bivariate (or multivariate) series of stimuli presented over time. Typical of this cognitive task is the interplay of bottom-up (stimulus-driven) and top-down (knowledge-driven) components (i.e., the competition of old knowledge or expectations with new stimulus data).
The three paradigms that are the focus of the next three subsections emerge naturally from this field of two complementary forces. The first paradigm, expectancy-based illusory correlations, highlights the top-down influence of expectations that may override stimulus data. A typical feature of this paradigm is that meaningful stimulus materials are used, and participants' prior knowledge constitutes an essential aspect of the experimental design. In contrast, in the second paradigm, expectations are ruled out so that covariation assessment can be studied as a pure function of stimulus properties (e.g., perceptual variables such as present vs. absent features). In the third paradigm, the emphasis is neither on prior expectations nor on perceptual aspects of individual stimuli; rather, the typical design involves manipulations of the distribution of different stimulus subsets while holding the overall correlation constant.
The very fact that research has mainly evolved in these three paradigms and that the present article focuses on these paradigms is not arbitrary but naturally reflects three facets of the dialectic interplay of old knowledge and new stimuli. Admittedly, one might point to several other paradigms that are also highly relevant to illusory correlations but not commonly associated with this label, such as work on causal and statistical reasoning, attribution errors, or in-group favoritism. That these paradigms are not treated explicitly does not mean that they are neglected or that they would lead to divergent conclusions. Many of these additional phenomena are easily assimilated by the three major paradigms (e.g., in-group favoritism and attribution errors as expectancy-based effects). Others that are of distinct theoretical value are mentioned in later sections.
Expectancy-Based Illusory Correlations
Many judgment and evaluation problems call for the fair and impartial assessment of empirical observations, uncontaminated by the observer's subjective beliefs or wishes. However, the notion of expectancy-based illusory correlations highlights people's inability to keep old expectations apart from new empirical data. Accordingly, the typical research design in this paradigm pits top-down influences of prior knowledge or expectations against bottom-up influences stemming from stimulus data. However, the explicit task is only to assess the correlation in the stimulus data and to disregard prior knowledge.
Review of historical origins and recent state of the art. Much earlier than in cognitive and social psychology, such illusory correlations were introduced as a challenge to standard procedures in diagnostics. L. J. Chapman and Chapman ( 1967 , 1969 ) were the first to demonstrate illusions in users of projective techniques such as the Rorschach inkblot test and the draw-a-person test. When presented with a series of test results (e.g., picture drawings) along with the patient's diagnosis, diagnosticians report an enhanced coincidence of particular symptoms (e.g., anomalous head) and associated categories (e.g., worry about intelligence), even though these expected pairings did not appear more frequently than unexpected pairings ( L. J. Chapman & Chapman, 1967 ). The diagnostic stereotypes that governed these (mis)perceptions were so superficial and obvious (cf. Shweder, 1977b ) that even laypeople could anticipate the rules used by the expert diagnosticians. After all, associating anomalous heads with intelligence or anal features with homosexuality does not require much professional expertise. Analogous illusions were demonstrated in subsequent years with reference to many other diagnostic procedures, such as the incomplete sentence blank ( Starr & Katkin, 1969 ) and observation techniques ( Berman & Kenny, 1976 ; D'Andrade, 1974 ), the common denominator being that diagnostic expectations mislead diagnosticians to perceive correlations that actually do not exist.
A similar point was made, and most strongly articulated, by Shweder ( 1975 , 1977a ) regarding correlations between personality traits. When trait-relevant behaviors are observed, the trait correlations that are later memorized closely follow the semantic similarity between trait terms. In fact, semantic similarity is a better predictor of reported correlations than the actually observed statistical relations ( Shweder, 1982 ). According to Shweder, the whole endeavor of personality research may thus be grounded on an illusory network that confounds likelihood with likeness.
It is worth noting that the same expectation effects that govern meaningful observation tasks in diagnostics and personality are also possible at lower levels of learning, such as classical conditioning. Garcia and Koelling (1966) demonstrated long ago that an unconditional stimulus such as sickness can be more easily associated with conditional stimuli in the olfactory modality, whereas electrical shock can be more effectively paired with distinct auditory signals. In a somewhat different vein, it is easier to associate negative than positive facial expressions with an aversive shock ( Orr & Lanzetta, 1980 ).
In recent times, expectancy-based illusory correlations have continued to attract research interest. However, whereas the challenging idea has been extended to many domains, the basic psychological principle has received little modification. As summarized in Table 1 , recent applications concern illusory correlations in clinical (e.g., de Jong & Merckelbach, 1993 ; DiBattista & Shepherd, 1993 ) and organizational (e.g., Camerer, 1988 ; Smither et al., 1989 ) contexts. Other studies have addressed potential moderator variables, showing enhanced illusions under arousal ( Kim & Baron, 1988 ), in older people ( Mutter & Poliske, 1994 ), and when judges' knowledge base is high ( Billman, Bornstein, & Richards, 1992 ). However, these moderator analyses are guided by pragmatic issues rather than distinct theoretical questions.
In social psychology, illusory correlations have reached a prominent status in stereotype research. A stereotype is commonly defined (cf. McCauley & Stitt, 1978 ; Rothbart & John, 1985 ; Stangor & McMillan, 1992 ) as an expected correlation (typically illusory but potentially veridical) between groups and some attribute (behavior, trait, or role). Given this analogous definition, stereotypes are special cases of expectancy-based illusory correlations.
In the social domain, expectations may not always originate in semantic or epistemic knowledge; rather, they may sometimes originate in affective goals or wishful thinking. Thus, perceivers who are themselves members of one group are typically biased to perceive correlations that assign more positive attributes to the in-group than to the out-group ( Brewer, 1979 ). A similar variant of wishful thinking can be found in self-perception. Alloy and Abramson (1979) presented their participants with a simple contingency game in which they could try to control the onset of a light by either pressing or not pressing a button. Normal participants (as opposed to depressives) typically overestimated the degree of control they exerted over the light onset, even when it was noncontingent. Note that the apparatus was free of any meaning, so epistemic expectations could hardly have influenced this variant of unrealistic optimism (cf. Weinstein, 1980 ).
Contribution to psychological theory. In any case, the literature on illusory correlations is replete with provocative and partly ingenious demonstrations of expectation effects in social and applied contexts. However, the contribution of this prominent research field remains mainly empiristic. A growing body of evidence has been accumulated for an empirical law, namely, that prior expectations can, and almost universally do, intrude into the subjective assessment of stimulus correlations. The fascination of this phenomenon mainly arises from its power and robustness and the irrational flavor of people's inability to keep old knowledge and new data apart.
At the level of psychological theory, the impact of prior knowledge, the Kantian notion that human intelligence does not start as a tabula rasa, is simply taken for granted as an axiom or theoretical primitive. No explicit attempt is made to explain how and why it is that events that are expected to go together or resemble each other appear to be more frequent than unexpected or dissimilar pairings. Note that quite heterogeneous sources of expectation were used in the studies reviewed here, ranging from stereotypical beliefs to semantic similarity, biological preparedness, and wishful thinking. Note, in particular, that no systematic distinction was made between similarity-based and expectancy-based illusory correlations. No explicit process theory seems to be required for such a fully normal expectation effect. Implicitly, the interplay of inductive and deductive influences is assumed to follow a simple compromise: When observing the contingency between events, the intelligent organism does not start from zero but is already prepared with prior expectations rooted in older knowledge. To the extent that empirical stimulus data are incomplete, impoverished, masked, or forgotten as a consequence of imperfect memory, the resulting uncertainty gap can be filled with epistemic expectations, which afford useful default knowledge.
Within this plausible and seemingly uncontestable compromise model, there has been little controversy between competing theories, and reflections on the underlying cognitive process have been largely confined to locating the stages in the cognitive process that are sensitive to expectancies. Most experiments have emphasized the role of expectancy-driven encoding and selective recall ( Hamilton, 1981 ; C. Hoffman & Hurst, 1990 ), but the early stages of information search ( Klayman & Ha, 1987 ; Pyszczynski & Greenberg, 1987 ) and perception ( Fiedler, Hemmeter, & Hofmann, 1984 ) may contribute as well. The considerable counterevidence that exists for enhanced processing of expectancy-incongruent information ( Hastie, 1980 ; Srull & Wyer, 1989 ) has largely been ignored in this paradigm (as described subsequently).
Summary and leading theory heuristic. Expectancy-based illusory correlations have been observed in hundreds of experiments, showing that correlation judgments under uncertainty (stimulus load and memory loss) reflect a compromise between actual observations and prior expectancies. Theoretical explanations have taken for granted that expectancy-congruent biases can occur at different stages of cognitive processing (information search, perception, encoding, recall, judgment, and communication), although stages may differ in sensitivity. Conflicting evidence for selective processing of incongruent information has rarely been related to illusory correlations (for a notable exception, see Garcia-Marques & Hamilton, 1996 ).
The notion of expectancy-congruent processing is so plausible and self-evident that it is presupposed as a theoretical primitive that need not itself be explained. Cacioppo, Gardner, and Berntson (1997) have recently identified this sense of obviousness as a major block in scientific progress. Note that the congruency concept, as a theory heuristic, is deeply rooted in a priori gestalt principles that hold not only for experimental participants but for researchers as well. Thus, if I expect that women are low in leadership ability, consistency theories ( Festinger, 1957 ; Read, Vanman, & Miller, 1997 ; Shultz & Lepper, 1996 ) will predict that pertinent observations should be interpreted in a manner congruent with this cognition. If the target of observation is female, the expectancy linking females to leadership inability will cause ambiguous behavioral observations to be interpreted accordingly (i.e., as weak, disorganized, hesitating, and low in charisma). Just as balanced structures can be learned more readily than unbalanced ones ( de Soto, 1960 ), the basic gestalt principle of congruency predicts that stimulus observations should be adjusted to preexisting expectancies.
Because the expectancy paradigm is mainly governed by the gestalt heuristic of congruency, it contributes little to cognitive theory. Neither the question of stimulus encoding and representation nor an algorithmic description of the assessment process is illuminated within this paradigm. It does not even afford an explicit model of the interaction between new stimuli and old expectancies.
Distinctiveness and Positive–Negative Asymmetry
Ironically, the obviousness of an expectancy bias may itself reflect an expectancy bias in the mind of researchers who continue to believe in congruency, although there is compelling counterevidence for an advantage of incongruent information in memory ( Hastie, 1980 ; Srull & Wyer, 1989 ; Stangor & McMillan, 1992 ). Incongruent, expectancy-deviant, surprising, or conflict-prone observations are particularly likely to be salient during perception, to be elaborated deeply at encoding, and therefore to be highly accessible in recall. Although such an incongruency effect is clearly at variance with a universal congruency principle, it is rooted in an equally common gestalt metaphor, the contrast of a figure against the ground. Salient, distinctive stimuli that deviate markedly from the background or baseline are likely to become the focus of attention and to give rise to illusory correlations distorted toward distinctive, salient, unexpected, and attention-grabbing stimuli.
Empirical evidence for this principle has developed in two paradigms. The first paradigm is concerned with the perceptual distinctiveness of positive as opposed to negative stimuli and their impact on learning and memory ( Shaklee & Mims, 1982 ; Shanks & Dickinson, 1987 ; Wasserman, Dorner, & Kao, 1990 ). In the other paradigm, distinctiveness refers to the contrast of outstanding stimuli against the remaining list. This has become a central issue in modern social psychology ( Fiedler, 1991 ; Hamilton & Gifford, 1976 ). It is addressed in the next section.
Review of historical origins and recent state of the art. In an early investigation conducted by Jenkins and Ward (1965) , the task was to figure out the extent to which an outcome (a lighted circle symbolizing "success") could be controlled by pressing one of two buttons. When the actual contingency was zero, in that both buttons had the same success rate, participants experienced more control when the constant success rate was high (e.g., 75%) rather than low. This control illusion was obviously due to the fact that subjective contingency is mainly sensitive to the number of successful trials and rather insensitive to complementary feedback on negative trials.
This bias to attend to present features more than to absent features was also highlighted by Nisbett and Ross (1980) and confirmed by numerous studies conducted since Jenkins and Ward's seminal article (e.g., Crocker, 1982 ). For example, when judging therapy success, people usually consider the number of patients recovered after psychotherapy and fail to consider the rate of spontaneous recovery without psychotherapy ( Eysenck, 1956 ). Or when judging correlations between a symptom and a disease, they usually assign the greatest weight to the number of cases in which both the symptom and the disease are present ( Smedslund, 1963 ).
In a similar vein, judgments of the observed impact of a cause (e.g., plants receiving fertilizer) on an effect (blooming) are more sensitive to observations of the consequences of present causes and the antecedents of present effects than to observations of absent causes or missing effects ( Kao & Wasserman, 1993 ). Thus, judgments of causation do not generally follow Cheng and Novick's (1990) contrast model or the underlying delta rule, stating that causal judgments reflect the difference between the two conditional probabilities, effect/cause present and effect/cause absent. Instead, the bias toward positive information is evident in the so-called density bias ( Allan, 1993 ), showing that, when delta is held constant, judged causality or control increases with the absolute occurrence rate of the effect (e.g., A. G. Baker, Berbrier, & Vallee-Tourangeau, 1989 ; Shanks, 1985 ). Thus, even when an effect occurs at the same rate in the presence as in the absence of a cause, the judged contingency is higher when the constant rate is high (e.g., 75%) rather than low (25%).
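As an illustration of the contrast rule described above, the following minimal Python sketch computes the difference between the two conditional probabilities from the four cell frequencies of a 2 × 2 table. The function name, the cell labels, and the example frequencies are assumptions chosen for illustration; they are not taken from the studies cited.

```python
def delta_p(a, b, c, d):
    """Difference between two conditional probabilities.

    a: cause present, effect present
    b: cause present, effect absent
    c: cause absent,  effect present
    d: cause absent,  effect absent
    """
    p_effect_given_cause = a / (a + b)
    p_effect_given_no_cause = c / (c + d)
    return p_effect_given_cause - p_effect_given_no_cause


# Two hypothetical noncontingent series that differ only in how often
# the effect occurs (75% vs. 25% of trials in both rows).
high_density = (15, 5, 15, 5)
low_density = (5, 15, 5, 15)

print(delta_p(*high_density))  # objective contingency, high effect rate
print(delta_p(*low_density))   # objective contingency, low effect rate
```

Both series yield an objective contingency of zero, so a judge who followed the contrast rule should rate them identically. The density bias refers to the finding that judged contingency is nevertheless higher for the first kind of series.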
On the basis of the empirical findings obtained in this paradigm (summarized in Table 2 ), a strong case can be made for positive–negative asymmetry (cf. Allan, 1993 ; Wasserman, Elek, Chatlosh, & Baker, 1993 ). Whenever one level of a dichotomous variable is more informative or diagnostic than the other level, contingency judgments will give more weight to observations representing the positive level.
Contribution to psychological theory. As mentioned earlier, the asymmetric impact of positive versus negative information reflects a general gestalt principle: Present features (e.g., a traffic sign or an observed symptom) provide the figure before the ground. Absent features (an absent traffic sign or a missing symptom) are less informative because they do not reveal the nature of what is missing ( Garner, 1978 ).
However, research in this area has not made a systematic attempt to explain why, how, and under what boundary conditions positive features are more informative than negative features. Just like the expectancy effect described earlier, positive–negative asymmetry has been taken for granted as a plausible account of biased contingency assessment in this paradigm, covering diverse, intuitively chosen operationalizations of present versus absent features.
Rather than further pursuing the origins of perceptual salience, the theoretical emphasis was on testing covariation learning models, formulated as algebraic functions of the four frequencies a , b , c , and d in a standard 2 × 2 contingency table. In this notation (see Figure 1 ), a is the number of observations in which both attributes (e.g., cause and effect) are present, b represents the present–absent case, c represents the absent–present case, and d represents the absent–absent case. In spite of some ongoing debate about the specific integration rule and its moderators, there is wide agreement that a receives a much higher weight than d (reflecting the positivity bias) and relative agreement in the ordering a > b > c > d ( Kao & Wasserman, 1993 ).
Note that the prominent approach to model correlation assessment as a function of stimulus frequencies, or probabilities a /( a + b ) and c /( c + d ), originates in a major concern with models of learning and conditioning (such as the delta rule) that are defined in terms of these statistics. Rather sophisticated methods and designs have been developed to test and quantify the impact of the four cell frequencies, based on the systematic variation of a , b , c , and d in the stimulus series ( Wasserman et al., 1990 ). However, whereas the empirically obtained weighting rules may afford "paramorphic" models ( P. J. Hoffman, 1960 ) of correlation judgments, they cannot be regarded as models of the cognitive process.
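One way to make the idea of differential cell weighting concrete is a weighted version of a simple sum-of-diagonals rule. The sketch below is a hypothetical illustration only: the functional form, the function name, and the weight values are assumptions that merely respect the ordering a > b > c > d, and they do not reproduce any fitted model from the literature.

```python
def weighted_judgment(a, b, c, d, weights=(1.0, 0.8, 0.6, 0.4)):
    """Paramorphic weighted-frequency rule (illustrative assumption).

    Confirming cells (a, d) add to the judgment and disconfirming
    cells (b, c) subtract from it; each cell frequency is scaled by
    its weight, and the result is normalized to lie in [-1, 1].
    """
    wa, wb, wc, wd = weights
    numerator = wa * a - wb * b - wc * c + wd * d
    denominator = wa * a + wb * b + wc * c + wd * d
    return numerator / denominator


# Two hypothetical series with zero objective contingency:
# the effect occurs in 75% or in 25% of trials in both rows.
high = weighted_judgment(15, 5, 15, 5)
low = weighted_judgment(5, 15, 5, 15)
unweighted = weighted_judgment(15, 5, 15, 5, weights=(1, 1, 1, 1))

print(high, low, unweighted)
```

With equal weights the rule returns zero for a noncontingent series, whereas the unequal weights produce a higher value for the high-rate series than for the low-rate series. Such a rule describes the pattern of judgments without saying anything about how the information is encoded or retrieved, which is the sense in which weighting rules remain paramorphic.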
Summary and leading theory heuristic. The extra distinctiveness of positive observations, which constitutes the major theory heuristic of this paradigm, can be considered a well-established law of inductive learning. However, aside from their implications for general learning models, these findings reveal little about the cognitive representation of positive and negative information or the underlying memory algorithm. Moreover, the impact of expectations and content specificity, which is central to the former paradigm, is largely excluded from this paradigm in which prior knowledge is typically ruled out as a factor to be controlled experimentally.
Distinctiveness and Infrequency
The power of distinctive, attention-grabbing events in regard to producing illusory correlations is not confined to the perceptual salience of present (positive) as opposed to absent (negative) features. Distinctiveness may also arise from the infrequency of outstanding stimuli within a list. This variant dates back to the famous von Restorff (1933) effect. The typical task used by Hedwig von Restorff, a student of Koehler in Berlin, consisted of a series of numbers in which, say, one letter string was inserted, or pairs of nonsense syllables with singular pairs of other materials in between. In subsequent memory tests, the outstanding stimuli were shown to have a clear memory advantage.
Review of historical origins and recent state of the art. Although the von Restorff effect is rarely cited explicitly, it had a huge impact on empirical approaches to illusory correlation. L. J. Chapman (1967) showed that the frequency of outstanding pairs of stimulus events (e.g., lion–tiger) was overestimated relative to less distinctive pairs (e.g., bacon–tiger). In a well-known experiment conducted by Taylor, Fiske, Etkoff, and Ruderman (1978) , participants observed a videotaped group discussion in which one Black (White) and five White (Black) individuals took part or one woman (man) and five men (women) took part. The relative contribution of the less frequent type of discussant was regularly overestimated. Likewise, McArthur (1980) reviewed research showing that salient stimulus persons are given more attention and are perceived to exert more "social causality" than less salient persons.
By far the most important elaboration of the von Restorff phenomenon took place in social psychology, following the seminal work of Hamilton and Gifford (1976). Their stimulus series consisted of 26 behavior descriptions pertaining to Group A (the majority) and 13 behavior descriptions pertaining to Group B (the minority). The correlation between group membership and the desirability (positivity) of behaviors was zero in that the same ratio of positive to negative behaviors held for both groups (i.e., 18+:8- for Group A and 9+:4- for Group B). As it turned out, however, the larger group was consistently associated with the predominant valence of behavior (i.e., Group A appeared more positive when positive behaviors were more frequent), whereas the smaller group was more associated with the valence of the less frequent behavior (i.e., negativity). This was apparent in different dependent measures, such as frequency estimates, trait impression ratings, and cued-recall tests of group–behavior associations. This noteworthy phenomenon has been replicated, validated, and extended in numerous studies (see Mullen & Johnson, 1990).
The enormous popularity of the Hamilton–Gifford paradigm is mainly due to the social psychological challenge it conveys. Minorities are, by definition, less numerous than majorities, and negative behavior is norm deviant and therefore less frequent than positive, norm-conforming behavior (Taylor, 1991). Thus, the Hamilton–Gifford paradigm provides an analog of the stimulus environment that characterizes real minorities. Given the same ratio of (prevailing) positivity in large and small groups, the impressions and cognitive representations of minorities will be relatively negative. This raises a pessimistic perspective on the problem of minority discrimination. A synopsis of relevant literature is provided in Table 3.
Contribution to psychological theory. Adopting von Restorff's gestalt notion of distinctiveness, Hamilton and colleagues (cf. Hamilton & Sherman, 1989) reasoned that the joint infrequency of small groups and rare behaviors renders negative minority behavior particularly distinctive and salient. As a consequence, stimulus items that belong to the most distinctive category should be encoded more deeply and should therefore have a memory advantage, which is assumed to mediate the resulting illusory correlation.
Over many years, no theoretical alternative was apparent to this account, which was regarded as empirically well established. Only in the last few years have new findings and computer simulations provided alternative explanations of the Hamilton–Gifford phenomenon without the assumption of a memory advantage for infrequent events (Fiedler, 1991; Smith, 1991). Crucial to these accounts is sample size; if positive behavior prevails by the same ratio in two groups, this prevalence will be more apparent in the larger group because of the larger number of observations or learning trials. Theoretically, this approach places illusory correlations in the context of a basic law of learning, namely, that a constant "reinforcement ratio" (i.e., ratio of positive behaviors of a group) is learned more effectively as the number of learning trials increases. It is not necessary to assume changes in the learning parameters (i.e., a memory advantage for particular events).
Summary and leading theory heuristic. Almost all research in this paradigm is related to judgments of majority versus minority groups. Applications of the major theory heuristic, distinctiveness, to formally analogous stimulus distributions in other content areas have been rare (see Table 3). In the absence of a comprehensive cognitive model, the latent conflict with the expectancy paradigm (which predicts a stronger influence of expected rather than distinctive information) has not been addressed explicitly. Nor has the relation to the second paradigm, the perceptual asymmetry of present versus absent features, been delineated systematically. With regard to the cognitive underpinnings, modern research tools of cognitive psychology have rarely been applied to substantiate the supposed memory advantage for infrequent observations (as described later). Cognitive boundary conditions are only crudely apparent in the general finding that infrequency-based illusory correlations are most pronounced under suboptimal encoding conditions. Thus, the illusion increases when memory load is high (McConnell, Sherman, & Hamilton, 1994b), learning is incidental rather than intentional (Pryor, 1986), and stimuli refer to groups rather than individuals (McConnell, Leibold, & Sherman, 1997; see meta-analysis by Mullen & Johnson, 1990).
Although the two gestalt notions of congruency and distinctiveness are among the most prominent building blocks of psychological theory formation ( Heider, 1958 ; Hunt, 1995 ; Kunda & Thagard, 1996 ; von Restorff, 1933 ), the theoretical foundation they provide for an explanation of illusory correlations is less than satisfactory. It entails the danger of being circular on the one hand and contradictory or incoherent on the other.
Circularity is present when a judgment bias toward expected correlations (e.g., strong male leaders) is explained by enhanced weight given to expected information, or when a bias in favor of salient, unexpected information (e.g., strong female leaders) is explained by a tendency for enhanced processing of unexpected observations. Moreover, the contradiction between the two gestalt notions of congruency and distinctiveness cannot be discarded as merely rhetorical. Congruency implies enhanced weight given to expected information, whereas distinctiveness gives superiority to unexpected, outstanding information. Thus, a priest's benevolent behavior should override his criminal behavior (congruency), but a priest's criminality should be particularly attention grabbing, leading to an opposite judgment bias.
From a metatheoretical view, it seems fair to characterize the situation as follows. Depending on what outcome is obtained in one particular context (i.e., a bias toward expected or unexpected behavior), theories rely on the congruency metaphor or on the distinctiveness metaphor. This post hoc feature, or hindsight reasoning, creates theoretical dissatisfaction. What is strongly needed is a comprehensive theoretical framework that allows for the forward prediction of illusory correlations, from antecedent conditions to observed consequences.
Stangor and McMillan (1992) offered an elegant solution to this problem in an elucidating meta-analysis. On the basis of signal-detection analyses, these authors drew a systematic distinction between genuine memory of original stimuli and guessing inferences involving prior knowledge. When the experimental task relies heavily on memory for original stimuli, the deeper encoding of incongruent information determines the outcome. However, when the task invites top-down inferences from prior knowledge structures, an advantage for expectancy-congruency information becomes apparent.
This solution relies on process dissociation. There might be no conflict or even contradiction at all between congruence and distinctiveness (see also Garcia-Marques & Hamilton, 1996 ) if both principles apply under completely different conditions: The congruency or expectancy bias might operate directly on covariation judgments, independently of the memory representation and recall of stimuli, which profit from the encoding advantage of incongruency. However, this peaceful solution of the conflict is more apparent than real. In fact, there is evidence that expectancies affect not only final judgments but also the perception, encoding, and disambiguation of stimuli from the beginning ( Fiedler et al., 1984 ). In the preceding example, the priest's behavior is more likely to be perceived, classified, and encoded as benevolent than mean (cf. Trope & Liberman, 1993 ). Moreover, these congruency effects on memory are correlated with later judgment biases. Likewise, the distinctiveness bias (at least in the infrequency paradigm) is not merely a matter of memory for original stimuli but reflects a substantial judgment bias ( Klauer & Meiser, 1998 ). Thus, the conflict does exist and cannot be discarded beforehand as belonging to two genuinely separate situations.
What are the properties that one would expect of a unifying theoretical framework? Three indispensable criteria are that the framework should indicate an algorithm that is valid, fertile, and noncircular. In other words, it should (a) account for the empirical evidence, (b) lead to new predictions, and (c) be precise in explicating the processes rather than relying solely on verbal paraphrases of the phenomena to be explained.
A simple learning model is now depicted that has the potential to meet all of the preceding criteria. It can account for expectancy-based and distinctiveness-based illusory correlations within the same framework. The model deviates from traditional accounts in several nontrivial predictions. In addition to illusory correlations reflecting biased cognitive processes, it predicts similar illusions when no biased processes are involved, thus overcoming the status of a theory that has to assume one bias to explain another. Moreover, the model helps to avoid conceptual confusion and to isolate variants of illusory correlations. And, finally, it offers the power and precision of a theory that is explicated as a transparent computer algorithm amenable to everyone rather than dependent on interpretations of a few privileged theorists ( Read et al., 1997 ; Smith, 1996 ).
The model to be proposed is derived from the BIAS framework ( Fiedler, 1996 ) that explains judgmental biases as a consequence of simple rules of associative learning in a probabilistic environment. The Brunswikian premise, or starting assumption, is that most meaningful correlation tasks refer to distal entities (e.g., health, leadership, danger, or femininity). This premise has crucial implications for the nature of the stimulus input and its cognitive representation. Because the distal variables cannot be perceived directly, the meaning of stimulus observations (e.g., a woman's leadership ability) has to be inferred from vectors of multiple probabilistic cues. For example, the cues that mediate the "perception" of leadership ability may include status symbols, formal dress, strong voice, and upright posture (see Figure 2 ).
It is typical of such distal perception that singular cues have rather modest diagnosticities; only the concert of multiple probabilistic cues warrants valid perception. Neither strong voice nor upright posture, nor any other cue alone, would afford a reliable index of leadership. However, misleading information in some cues can be compensated by other cues so that, over multiple cues, the distal variable can be inferred with reasonable validity. As a result of this imperfect, flexible relationship between the distal variable and its proximal cue indicators, the same value on a distal variable can appear in many different patterns or configurations. Thus, leadership behavior (even of the same person) does not always appear as the same pattern of voice, posture, facial expression, and so forth (just as lies, danger, attractiveness, and many other distal concepts are manifested in diverse cue patterns).
An essential implication of the Brunswikian approach is that the basic format of a stimulus is not a singular scalar (or scale value amenable to direct perception) but a distributive pattern of information (a vector of multiple cue values). Figure 2 illustrates how one can think of the generation of distributive stimulus patterns. Let the column vector
Inductive judgments involve, by definition, the collection of multiple observations; just as the basic format of a stimulus is a vector (not a scalar), the entire stimulus series yields a matrix of stimulus vectors (as in Figure 2 ). Forming an inductive judgment of the entire stimulus series can be conceived as an aggregate over all matrix columns; in the simplest case, the aggregate could be the horizontal, featurewise sum of all "+" and "-" values, as shown on the right of Figure 2 . Crucial to understanding the simulation model is that the aggregate pattern resembles the ideal pattern from which all stimuli were generated more strongly ( r = .78) than the individual stimuli (i.e., repeated encounters will clearly reveal the politician's leadership qualities).
This reflects a most essential and consequential property of distributive stimulus representations arising from noisy multicue environments: Through aggregation over an increasing number of stimuli, the noise or error variance in the stimulus matrix is canceled out, and the systematic variance component (reflecting the distal variable) becomes increasingly visible. A single observation of an individual high in leadership ability may not reveal this latent trait. However, as the sample of observations increases, the aggregation process will make the strong leadership style more and more apparent, just as with an image on film that is gradually illuminated. (In a similar vein, aggregation over several observations is necessary to extract someone's attitude from multiple remarks in a discussion, to determine someone's intelligence from multiple items of a test, or to figure out someone's dishonesty, humor, or other distal attributes.)
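The noise-canceling property of aggregation can be illustrated with a minimal sketch. It is not the original simulation program: the 12-cue ideal pattern, the function names, the inversion rate of .33, and the use of a Pearson correlation as the resemblance measure are all assumptions chosen for illustration.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

# Hypothetical ideal pattern over 12 binary cues (+1 = "+", -1 = "-").
IDEAL = [1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1]

def noisy_copy(ideal, i, rng):
    """Copy the ideal pattern, inverting each cue value with probability i."""
    return [-v if rng.random() < i else v for v in ideal]

def mean_resemblance(n_stimuli, i=0.33, runs=500, seed=0):
    """Average correlation between the featurewise sum of n noisy stimuli
    and the ideal pattern from which they were generated."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        stimuli = [noisy_copy(IDEAL, i, rng) for _ in range(n_stimuli)]
        aggregate = [sum(column) for column in zip(*stimuli)]  # featurewise sum
        total += pearson(aggregate, IDEAL)
    return total / runs
```

Calling `mean_resemblance(1)`, `mean_resemblance(5)`, and `mean_resemblance(20)` should show the resemblance to the ideal pattern rising with the number of aggregated observations, which is the property the later simulations rely on.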
Before the BIAS model is explained further and applied to illusory correlations, a few comments are in order regarding the problem of symbol grounding and parameter settings. First, the model is not restricted to binary cue values (plus vs. minus, as in Figure 2) but easily extends to cues that vary continuously on a quantitative scale. Second, the particular cues used for illustration were selected for convenience and can be substituted by many other cue sets. It is essential for Brunswik's (1955, 1956) cognitive–ecological approach that an adaptive organism changes and substitutes cues in a highly flexible fashion (e.g., depth perception during the day vs. at night). All of the implications derived from the model are purely structural and independent of the particular cues used for illustration. Third, the quantitative predictions will of course depend on several parametric decisions concerning the specific aggregation function (e.g., sum or weighted average), specific similarity measures (correlation or otherwise), number of cues, and so forth. However, the important qualitative results generalize over many reasonable aggregation functions and parameter values.
Finally, it is important to recognize that BIAS entails little restriction concerning the temporal course of the cognitive process. Thus, the multiple columns of the stimulus matrix do not necessarily imply a multitrace model ( Hintzman, 1988 ; Smith, 1991 ) in which individual stimulus exemplars are conserved until a final judgment is based on these original entries. Rather, aggregation may occur during encoding as well, or bunches of stimuli will be aggregated to higher order chunks during encoding, and the final judgment will transform these medium-sized aggregates into an overall aggregate. Whatever the temporal order and segmentation of these aggregation functions, the central implications remain the same.
Applying BIAS to Correlation Assessment Tasks

The same basic prompting and aggregation process that has been outlined so far can now be applied to illusory correlations. The BIAS algorithm can simulate illusory correlations in two fundamentally different ways. On the one hand, traditional interpretations of the illusion can be simulated by ad hoc parameter settings reflecting the selective processing of salient, expected, or desired categories. Simulation of these cases is clearly due to those additional assumptions (e.g., increased weight given to expected stimuli) rather than intrinsic properties of BIAS. On the other hand, and more intriguing, the BIAS algorithm alone produces at least one variant of each type of illusory correlation without ad hoc parameters. Whereas the former demonstrations refer to illusions due to biased processes, the latter simulations highlight that no biased processes are needed to explain biased outcomes.
BIAS can even predict variants that have not yet been discovered for empirical research. The reported simulations begin with the unbiased case of accurate correlation assessment, to further illustrate and validate the algorithm, and then address illusory correlations based on genuine expectancy biases, similarity of meaning, and various types of distinctiveness. Simulation studies are discussed in terms of their cognitive analogs, and relevant empirical evidence is mentioned.
Modeling the correlation between two variables (e.g., leadership and health) requires that the stimulus vectors include cues that speak to both distal entities. Two subsets of cues, or segments, are thus distinguished in the left part of Figure 3. The upper segment contains cue information about health, in this case, ideal patterns describing high (
One way to operationalize correlation judgment is to compare the differential leadership impressions of healthy and nonhealthy people (analogous to the so-called delta rule; Allan, 1993). Accordingly, BIAS computes a leadership impression (aggregate) associated with healthy people and a leadership aggregate associated with unhealthy people; the difference between these two impressions (i.e., of leadership given a high vs. low level of health) affords a measure of the simulated correlation. First consider the way in which BIAS arrives at a leadership impression judgment of healthy people. The task instruction to judge healthy people corresponds to using a prompt vector
As shown at the bottom of Figure 3, the aggregate
Having demonstrated how the model works and how it applies to judgments of actual correlations, I now address the crucial issue of illusory correlations. Cognitive process assumptions are translated into the BIAS algorithm. The transparency gained from explicating these assumptions within the BIAS framework is sufficient to make analytical distinctions between variants of illusory correlations that are normally confused. Simulations were run to demonstrate virtually all types of the phenomenon, some of which are the product of fully unbiased information processes.
Because the purpose was to simulate illusory rather than veridical correlations, the simulated frequency distributions were deliberately chosen to represent zero correlations. Within this restriction, both skewed and unskewed frequency distributions were used, on the basis of either equal cell frequencies, a = b = c = d = 10, or unequal cell frequencies, a = 20, b = 10, c = 10, d = 5 (see notation in Figure 1 ). In either case, the correlation was zero, because the same frequency ratio holds for both rows [a/(a + b) = c/(c + d)] and columns [a/(a + c) = b/(b + d)].
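The zero-contingency condition stated above is easy to verify by direct arithmetic. The following sketch does so with exact fractions; the function names are illustrative only, and the cell labels follow the a, b, c, d notation of the text.

```python
from fractions import Fraction

def delta_p(a, b, c, d):
    """Difference between the row-wise proportions a/(a+b) and c/(c+d)."""
    return Fraction(a, a + b) - Fraction(c, c + d)

def phi_numerator(a, b, c, d):
    """Numerator of the phi coefficient (ad - bc); it is zero exactly when
    the 2 x 2 table contains no contingency."""
    return a * d - b * c

# The two frequency distributions named in the text.
for cells in [(10, 10, 10, 10), (20, 10, 10, 5)]:
    print(cells, delta_p(*cells), phi_numerator(*cells))
```

For the skewed distribution, 20/30 and 10/15 both reduce to 2/3, so both indices vanish even though the absolute cell frequencies differ widely.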
Each simulation began from a randomly chosen ideal pattern representing positive levels on both variables, say,
Stimulus vectors were generated by copying the corresponding ideal type and randomly inverting a proportion i of all values. As already noted, this noise factor may reflect various sources of information loss, such as imperfect cue validity, unreliable perception, or memory decay. (Memory decay may be more adequately modeled by cue values set to zero; for simplicity, however, only inversions were used.) The number of stimuli generated from each ideal type corresponded to the cell frequencies a , b , c , and d , according to the simulated distribution.
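The generation step can be sketched as follows. This is an illustration under stated assumptions, not the original program: the nine-cue patterns are hypothetical, and the low level of each variable is assumed here to be the mirror image of the high-level pattern.

```python
import random

# Hypothetical ideal patterns for the high level of each variable (9 cues each).
HEALTH_HIGH = [1, -1, 1, 1, -1, 1, -1, -1, 1]
LEADER_HIGH = [1, 1, -1, 1, -1, -1, 1, -1, 1]

def mirror(pattern):
    """Assumed representation of the low level: every cue value inverted."""
    return [-v for v in pattern]

# One ideal type per cell: health segment followed by leadership segment.
IDEAL_TYPES = {
    "a": HEALTH_HIGH + LEADER_HIGH,                  # healthy, high leadership
    "b": HEALTH_HIGH + mirror(LEADER_HIGH),          # healthy, low leadership
    "c": mirror(HEALTH_HIGH) + LEADER_HIGH,          # unhealthy, high leadership
    "d": mirror(HEALTH_HIGH) + mirror(LEADER_HIGH),  # unhealthy, low leadership
}

def generate_series(frequencies, i, seed=0):
    """Copy each cell's ideal type as often as its cell frequency demands,
    randomly inverting a proportion i of all values (the noise factor)."""
    rng = random.Random(seed)
    series = []
    for cell, n in frequencies.items():
        for _ in range(n):
            ideal = IDEAL_TYPES[cell]
            stimulus = [-v if rng.random() < i else v for v in ideal]
            series.append((cell, stimulus))
    return series

series = generate_series({"a": 20, "b": 10, "c": 10, "d": 5}, i=0.33)
```

Setting `i` to 0 returns exact copies of the ideal types, and larger values progressively degrade each stimulus toward noise.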
High and low levels of one variable were chosen for convenience as prompts (
Within this general frame, the various sources of illusory correlations could be modeled. Table 4 provides a summary of distinct process assumptions on which the simulations were based. Implementing these cognitive process assumptions involved the following procedures.
Genuine expectancy effects. Given the expectation that high (low) levels of health and leadership coincide, postulating an expectancy bias amounts to assuming a processing advantage of Cells A and D (observations of high leadership ability in healthy people and low leadership ability in unhealthy people) over Cells B and C. A systematic attempt to explicate this standard assumption in the literature on so-called expectancy-based illusory correlations within the BIAS algorithm leads to the distinction of (at least) three different processes that can all explain the dominance of expected over unexpected information (see Table 4). All three types represent biased processes reflected in additional parameter settings.
The first possibility is simply that people confuse actual stimulus data with older knowledge, stemming from previous observations or secondhand information about expected correlations. Thus, the a priori cell frequencies, before stimulus presentation, would not be zero; rather, they would reflect the prior expectancy that a + d > b + c . This case simply refers to appending additional column vectors for Cells A and D, reflecting self-generated stimulus expectancies in addition to actual observations.
Alternatively, the bias may result from selective memory favoring expected information. Thus, even when judges clearly understand the task to judge the correlation in the stimulus data and they do not confuse the data with older knowledge, they may process information about A and D more efficiently than information about B and C. This case can be simulated by setting the noise parameter i lower for Cells A and D than for Cells B and C. This could reflect enhanced attention, perception, encoding, or storage or reduced forgetting of expected information.
A third possibility is that there is neither source confusion nor selective memory for Cells A, B, C, and D, but the cognitive integration rule assigns higher weight to expected than unexpected stimuli (e.g., as a result of perceived validity or confidence). This refers to amplifying the weight given to stimulus vectors for Cells A and D relative to Cells B and C in the aggregation process. Because all of these process assumptions have a similar effect (namely, to overrepresent expected information in Cells A and D), only the latter case was simulated (i.e., double weighting of expected items). However, the other variants of an expectancy-based process can also be simulated easily.
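The third mechanism, enhanced weighting of Cells A and D during integration, can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the original program: the nine-cue patterns are hypothetical, low levels are mirror images of high levels, a stimulus is weighted by the non-negative match between its health cues and the health prompt, and an impression is the Pearson correlation between the aggregated leadership cues and the ideal leader pattern. Its output is therefore not expected to reproduce the values in Table 5.

```python
import random

H_HIGH = [1, -1, 1, 1, -1, 1, -1, -1, 1]   # hypothetical ideal pattern, high health
L_HIGH = [1, 1, -1, 1, -1, -1, 1, -1, 1]   # hypothetical ideal pattern, high leadership

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def noisy(pattern, sign, i, rng):
    """Copy the pattern (sign=+1) or its mirror image (sign=-1), inverting
    each value with probability i."""
    return [-sign * v if rng.random() < i else sign * v for v in pattern]

def impression(stimuli, prompt_sign):
    """Leadership impression given a health prompt (+1 healthy, -1 unhealthy)."""
    aggregate = [0.0] * len(L_HIGH)
    for health, leader, cell_weight in stimuli:
        match = sum(prompt_sign * p * h for p, h in zip(H_HIGH, health)) / len(H_HIGH)
        w = cell_weight * max(0.0, match)   # assumed rule: mismatches get zero weight
        aggregate = [acc + w * v for acc, v in zip(aggregate, leader)]
    return pearson(aggregate, L_HIGH)

def simulate(expected_weight, n_per_cell=10, i=0.33, subjects=100, seed=0):
    """Mean impressions (given healthy, given unhealthy) for equal cell sizes."""
    rng = random.Random(seed)
    # (health sign, leadership sign, integration weight) for Cells A, B, C, D
    cells = [(1, 1, expected_weight), (1, -1, 1.0),
             (-1, 1, 1.0), (-1, -1, expected_weight)]
    given_healthy = given_unhealthy = 0.0
    for _ in range(subjects):
        stimuli = [(noisy(H_HIGH, hs, i, rng), noisy(L_HIGH, ls, i, rng), w)
                   for hs, ls, w in cells for _ in range(n_per_cell)]
        given_healthy += impression(stimuli, +1)
        given_unhealthy += impression(stimuli, -1)
    return given_healthy / subjects, given_unhealthy / subjects
```

Comparing `simulate(1.0)` with `simulate(2.0)` contrasts unbiased integration with double weighting of the expected cells. The other two mechanisms could be sketched in the same frame by adding extra stimuli to Cells A and D (source confusion) or by lowering `i` for those cells (selective memory).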
Semantic similarity. As shown in the empirical review section, illusory correlations based on semantic similarity are treated in a manner synonymous to expectation effects, the implicit assumption being that semantic meaning is a major source of expectations. Within the BIAS framework, it is apparent that both types of illusory correlation have to be distinguished for analytical reasons. An illusion based on semantic similarity may occur in the absence of any expectation about the sign and strength of the covariation and without any biased processes, merely as a reflection of the overlap in the semantic features that characterize correlated variables. Thus, even when observers have to learn a fully new correlation, with all prior expectancies eliminated, BIAS alone will produce a correlation if there is only some overlap in the cue patterns mediating the assessment of the two correlated variables.
BIAS offers a concrete explanation for this abstract theoretical statement. Consistent with leading approaches to similarity ( Tversky, 1977 ), BIAS defines similarity in terms of feature overlap. Accordingly, the similarity between two attributes (such as health and leadership) depends on the number of common perceptual features shared by these attributes. Within BIAS, cue overlap affords a straightforward way to operationalize this notion of similarity. For instance, some of the cues (e.g., strong voice, upright position, and no warm expression) that have been assumed to represent high leadership ability ( Figure 2 ) are also indicative of strong health. This meaning overlap, which reflects semiotic confusion rather than any expectancies about the distribution of events, was simulated by overlapping vector segments (see Figure 4 ). The segments for the two attributes are not clearly separated; rather, they overlap in the middle portion. In addition to nine pure health cues and nine pure leadership cues, the simulation to be reported assumed four overlap cues to represent the similar meaning of health and leadership. (Again, the chosen degree of overlap affects the strength but not the quality of the simulated results.) However, importantly, no ad hoc assumption on biased processing of particular events was added.
In summary, the analytical clarity of an algorithmic approach serves to refine and specify the notion of expectancy-based illusory correlations. Although commonly treated as a homogeneous phenomenon, expectancy effects can originate in such fundamentally different cognitive processes as source confusion (of observed and self-generated items), selective processing (of Cells A and D), and enhanced weighting of expected information during the final integration process. Even more important, from a theoretical point of view, is the insight that similarity-based and expectancy-based illusory correlations must not be confused, for analytical reasons. The semiotic mechanism underlying similarity effects can occur in the absence of any prior expectancy or belief, merely as a consequence of the confounding of the cues that mediate the assessment of distal attributes.
Distinctiveness through asymmetry. The pervasive tendency to give unequal weight to information in different cells of a contingency table (see evidence in Table 2) reflects the enhanced informativeness or distinctiveness of positive attribute levels (e.g., about the presence of symptoms) as compared with the rather pallid and less distinctive appearance of negative attribute levels (e.g., absent symptoms). This basic asymmetry in the learning of positive versus negative information, often referred to as a feature-positive effect (Jenkins & Sainsbury, 1970; Newman, Wolff, & Hearst, 1980), constitutes the crucial cognitive assumption explaining why Cell A (representing positive levels on both variables) normally receives the highest weight and why cell weights are ordered A > B > C > D.
This type of distinctiveness, based on the figure–ground asymmetry of positive and negative attribute levels, was simulated by different degrees of resemblance of stimuli to the respective ideal patterns. Let
A more radical way to simulate the asymmetry of present and absent attributes would be to start from a list of present features to represent positive variable levels and to add only a single cue, for negation, to represent a negative variable level. In this case, the asymmetry is even more apparent. However, in the present simulations, this case was ignored, and only the more subtle case was chosen in which positive information is encoded into slightly more distinctive cue patterns than negative information ( i + = .33 vs. i - = .44). Note that although this assumption does not in itself entail a correlation bias, the BIAS algorithm alone can simulate illusory correlations under this condition.
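The subtle asymmetry case can be sketched under the same kind of assumptions as any illustrative reading of the algorithm: hypothetical nine-cue patterns, mirror-image low levels, stimulus weights equal to the non-negative match with the health prompt, and Pearson correlation as the resemblance measure. The one substantive reading introduced here is that any segment representing a negative level is generated with the noisier rate, so that `i_pos` and `i_neg` stand for the two inversion rates named in the text. No cell receives a special weight.

```python
import random

H_HIGH = [1, -1, 1, 1, -1, 1, -1, -1, 1]   # hypothetical ideal pattern, high health
L_HIGH = [1, 1, -1, 1, -1, -1, 1, -1, 1]   # hypothetical ideal pattern, high leadership

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def noisy(pattern, sign, i_pos, i_neg, rng):
    """Positive levels (sign=+1) are inverted at rate i_pos, negative levels
    (sign=-1, the mirror-image pattern) at the higher rate i_neg."""
    i = i_pos if sign > 0 else i_neg
    return [-sign * v if rng.random() < i else sign * v for v in pattern]

def impression(stimuli, prompt_sign):
    """Leadership impression given a health prompt (+1 healthy, -1 unhealthy)."""
    aggregate = [0.0] * len(L_HIGH)
    for health, leader in stimuli:
        match = sum(prompt_sign * p * h for p, h in zip(H_HIGH, health)) / len(H_HIGH)
        w = max(0.0, match)                 # assumed rule: mismatches get zero weight
        aggregate = [acc + w * v for acc, v in zip(aggregate, leader)]
    return pearson(aggregate, L_HIGH)

def simulate(i_pos=0.33, i_neg=0.44, n_per_cell=10, subjects=100, seed=0):
    """Mean impressions (given healthy, given unhealthy) with equal cell sizes."""
    rng = random.Random(seed)
    cells = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # Cells A, B, C, D
    given_healthy = given_unhealthy = 0.0
    for _ in range(subjects):
        stimuli = [(noisy(H_HIGH, hs, i_pos, i_neg, rng),
                    noisy(L_HIGH, ls, i_pos, i_neg, rng))
                   for hs, ls in cells for _ in range(n_per_cell)]
        given_healthy += impression(stimuli, +1)
        given_unhealthy += impression(stimuli, -1)
    return given_healthy / subjects, given_unhealthy / subjects
```

With `i_neg` set equal to `i_pos`, the two impressions should coincide on average. With the asymmetric default, the impression given health should exceed the impression given poor health, although the size of the gap depends on the assumed matching rule and on the number of simulated subjects.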
Distinctiveness through infrequency. The Hamilton–Gifford or von Restorff type of distinctiveness-based illusory correlation (Table 3) rests on the cognitive assumption that the least frequent attribute combination is most salient and encoded at the deepest level, thus resulting in a memory advantage for rare events. This assumption is often used to explain illusory correlations obtained with skewed distributions in which absolute frequencies differ (a = 20, b = 10, c = 10, d = 5) but the correlation is zero [i.e., a/(a + b) = c/(c + d) = 20/(20 + 10) = 10/(10 + 5)]. If the extra distinctiveness of the five cases in Cell D (e.g., the five cases of unhealthy nonleaders) represents a memory advantage, the subjective correlation should rise above zero.
Within the BIAS model, the suggested memory advantage of Cell D can be specified to mean that little Cell D information is forgotten; this amounts to assuming a reduced noise parameter i for Cell D. Alternatively, one might assume that Cell D data receive an extra weight (as a result of increased confidence or deeper encoding). Because these two assumptions lead to similar effects, only the case of enhanced weights was simulated. In any case, this simulation represents a biased process, as evident in the ad hoc parameter set for Cell D.
Notably, the BIAS algorithm isolates one type of infrequency effect that is not mediated by distinctiveness at all. BIAS predicts an illusory correlation in the absence of any enhanced memory for Cell D, simply because a /( a + b ) = 20/(20 + 10) is psychologically more "significant" than c /( c + d ) = 10/(10 + 5). That is, a large sample of 20 leaders among 30 healthy people is worth more than 10 leaders among 15 unhealthy people, simply as a consequence of the unequal sample size. Simulations demonstrated that no increased weight or reduced i has to be assumed for Cell D; different sample sizes alone will produce illusory correlations. Thus, as for the other two classes of illusory correlations, at least one variant of infrequency-based illusions arises from the associative algorithm alone, in the absence of biased processes.
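The pure sample-size account lends itself to a compact demonstration. The sketch below is an assumption-laden illustration rather than the original simulation: the nine-cue patterns are hypothetical, low levels are mirror images of high levels, each stimulus is weighted by the non-negative match between its health cues and the health prompt, and an impression is the Pearson correlation between the aggregated leadership cues and the ideal leader pattern. All cells share the same noise rate and the same weight, so any difference between the two impressions stems from the cell frequencies alone.

```python
import random

H_HIGH = [1, -1, 1, 1, -1, 1, -1, -1, 1]   # hypothetical ideal pattern, high health
L_HIGH = [1, 1, -1, 1, -1, -1, 1, -1, 1]   # hypothetical ideal pattern, high leadership

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def noisy(pattern, sign, i, rng):
    """Copy the pattern (sign=+1) or its mirror image (sign=-1), inverting
    each value with probability i."""
    return [-sign * v if rng.random() < i else sign * v for v in pattern]

def impression(stimuli, prompt_sign):
    """Leadership impression given a health prompt (+1 healthy, -1 unhealthy)."""
    aggregate = [0.0] * len(L_HIGH)
    for health, leader in stimuli:
        match = sum(prompt_sign * p * h for p, h in zip(H_HIGH, health)) / len(H_HIGH)
        w = max(0.0, match)                 # assumed rule: mismatches get zero weight
        aggregate = [acc + w * v for acc, v in zip(aggregate, leader)]
    return pearson(aggregate, L_HIGH)

def simulate(a, b, c, d, i=0.33, subjects=100, seed=0):
    """Mean impressions (given healthy, given unhealthy) for cell frequencies a-d."""
    rng = random.Random(seed)
    cells = [(1, 1, a), (1, -1, b), (-1, 1, c), (-1, -1, d)]
    given_healthy = given_unhealthy = 0.0
    for _ in range(subjects):
        stimuli = [(noisy(H_HIGH, hs, i, rng), noisy(L_HIGH, ls, i, rng))
                   for hs, ls, n in cells for _ in range(n)]
        given_healthy += impression(stimuli, +1)
        given_unhealthy += impression(stimuli, -1)
    return given_healthy / subjects, given_unhealthy / subjects
```

Under these assumptions, `simulate(20, 10, 10, 5)` should yield a stronger leadership impression for healthy than for unhealthy people, because the same 2:1 ratio is aggregated over 30 rather than 15 observations, whereas `simulate(10, 10, 10, 10)` should yield no systematic difference. The exact values depend on the assumed parameters and are not expected to match Table 5.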
Simulation Results for Specific Illusory Correlation Effects

Simulation studies were not conducted for all combinations of the preceding assumptions (Frequency Distributions × Types of Expectancy Effects × Degree of Overlap × Distinctiveness Types); rather, they were conducted only for specific conditions corresponding to real research paradigms or particularly interesting cases. The simulated conditions and results (all based on 100 simulated "subjects") are given in Table 5.
Accurately assessed zero correlations. The first row of Table 5 shows that BIAS correctly predicts zero correlations when the actual correlation is zero and there is no skew, no asymmetry of attribute levels, no selective forgetting, and no selective weighting. In this case, the aggregates
When all other parameters remained unchanged and only the assumption of selective processing of Cells A and D was introduced (i.e., double weighting for A and D), the mean simulated correlation was substantial (.60 vs. .05). This case reflects a genuine expectancy effect. Of course, a very similar effect would result from reduced forgetting of information from Cells A and D (defined in terms of reduced i ).
Similarity based on cue overlap. Similarity-based illusory correlations may be independent of expectancies. In the absence of any selective processing, an artificial correlation may be perceived because the respective cue sets are confounded. Note that in this simulation, all items from all cells were processed with the same weight and accuracy, and no asymmetry or skewed distribution was involved. The extremely strong illusory correlation (.86 vs. .39) reflects the high overlap proportion (through four common cues, with nine specific cues for
As a means of simulating the pure asymmetry version of illusory correlations, all biases were avoided, and separable (nonoverlapping) nine-cue segments were used to represent both variables. Asymmetry was introduced merely by making all negative stimuli somewhat less diagnostic ( i - = .44) than positive stimuli ( i + = .33). This slight asymmetry in diagnosticity of positive and negative attribute levels caused illusory correlations ( r = .45 vs. r = .31).
Cell D distinctiveness. Illusory correlations based on skewed stimulus distributions are often explained in terms of an alleged memory advantage for the most infrequent information in Cell D, thought to be particularly distinctive. As Table 5 confirms, the joint operation of skewed distributions and a Cell D advantage (reduced i of .17) led to a strong effect (r = .58 vs. r = .18).
Pure aggregation effect. When the bias in favor of Cell D is removed, the skewed distribution can produce a significant correlation (.67 vs. .36) without any distinctiveness effect. This is merely due to aggregation from different sample sizes without any processing bias.
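The aggregation argument can be illustrated with a minimal sketch. It is not the BIAS implementation reported in Table 5; it assumes, for illustration only, that each stimulus is a vector of nine cues, that information loss replaces a cue by a random value with a fixed probability, and that a judgment is the correlation between the summed stimulus vectors and an ideal pattern. The names `IDEAL_POSITIVE`, `noisy_stimulus`, `judged_positivity`, and `mean_judgment`, as well as the loss value of .5, are hypothetical.

```python
import random

# Hypothetical ideal cue pattern for "positive" behavior (nine cues);
# the concrete signs are arbitrary and chosen only for illustration.
IDEAL_POSITIVE = [1, -1, 1, 1, -1, -1, 1, -1, 1]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def noisy_stimulus(sign, loss, rng):
    """One stimulus vector: each cue keeps its ideal value (times sign) or,
    with probability `loss`, is replaced by a random +1/-1 value."""
    return [
        rng.choice((-1, 1)) if rng.random() < loss else sign * cue
        for cue in IDEAL_POSITIVE
    ]


def judged_positivity(n_pos, n_neg, loss, rng):
    """Sum all stimulus vectors of one group and correlate the aggregate
    with the ideal positive pattern."""
    aggregate = [0] * len(IDEAL_POSITIVE)
    for sign, count in ((1, n_pos), (-1, n_neg)):
        for _ in range(count):
            stimulus = noisy_stimulus(sign, loss, rng)
            aggregate = [a + s for a, s in zip(aggregate, stimulus)]
    return pearson(aggregate, IDEAL_POSITIVE)


def mean_judgment(n_pos, n_neg, loss=0.5, replications=2000, seed=1):
    """Average judgment over many replications (no selective processing)."""
    rng = random.Random(seed)
    total = sum(
        judged_positivity(n_pos, n_neg, loss, rng) for _ in range(replications)
    )
    return total / replications


majority = mean_judgment(20, 10)  # large sample, two thirds positive
minority = mean_judgment(10, 5)   # small sample, same proportion positive
print(f"majority: {majority:.2f}  minority: {minority:.2f}")
```

Under these assumptions, all stimuli are treated identically, yet the prevalence of positive behavior is reproduced more clearly from the larger sample, because random cue errors cancel out more completely when more vectors are summed.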
Distinctiveness of frequent events. To highlight the conceptual independence of distinctiveness and skewed distributions, the final simulation in Table 5 maintained the different sample sizes (giving rise to positive correlation judgments) but increased the distinctiveness (double weighting) of Cell C information (implying negative correlation). The net effect was a negative correlation (.38 vs. .71). However, this effect was weaker than the impact of Cell D distinctiveness because of the opposing trend of a positive correlation due to the skewed distribution.
Summary. In conclusion, the simple associative rule underlying the BIAS algorithm can be used to model all "classical" types of illusory correlations within the same basic framework. Regardless of whether expectancy-based illusory correlations are simulated by selective weighting or forgetting, whether distinctiveness is introduced by asymmetric attribute levels or infrequent event classes, or whether distinctiveness is introduced as enhanced salience or enhanced memory, BIAS can simulate the illusions. Moreover, the potentially inverse influences of expectancies and distinctiveness can be located in different facets of the same cognitive process that becomes clear and transparent in the computer model.
Most important, the simulated illusions extend beyond the familiar notions of congruency and distinctiveness, showing that illusory correlations are obtained in the absence of any expectancies, merely as a result of cue overlap, and independently of distinctiveness, merely as a consequence of unequal sample sizes. All three traditional classes of illusory correlations (see Tables 1-3) could be simulated when biased-process assumptions were built into the algorithm via arbitrary parameters. However, more originally from a theoretical point of view, for each class of phenomena at least one simulation was successful without any biased processes, as a natural consequence of the BIAS algorithm. This pertains to similarity-based illusory correlations due to feature overlap, to the asymmetry of positive and negative attribute levels, and to infrequency-based illusory correlations due to sample size alone.
Implications and Insights Gained From the BIAS Simulations
To the extent that the multicue assumption of BIAS applies to real information processing, the present simulations have rather challenging implications. They reveal a number of ways in which common theoretical explanations of illusory correlations, based on the two gestalt notions of congruency and distinctiveness, must be refined, extended, corrected, and tested in future research.
The literature on expectancy-based illusory correlations, to begin with, has failed to distinguish between two fundamentally different cognitive processes, only one of which is driven by expectancies. Whenever participants' prior expectancies have not been controlled directly, but the crucial independent variable has relied on semantic similarity of attribute meanings, the alleged expectancy effect may actually reflect semiotic confusion effects. As the BIAS algorithm elucidates, the tendency to report correlations between semantically similar attributes (e.g., leadership and health) may be due not to any processing advantage or higher weight given to expectancy-congruent stimuli but to overlapping cue sets contaminating the assessment of similar attributes. Even when covariations are learned among completely new stimuli, ruling out any prior expectancies, the similarity of stimulus features can lead to illusory contingencies.
Of course, the BIAS model can assimilate genuine expectancy effects as well, giving a processing advantage to expected stimulus events (in Cells A and D). However, a serious problem regarding the interpretation and validity of the entire literature on so-called expectancy biases remains, because few attempts have been made to set expectancy effects apart from similarity effects. In operational terms, validity checks have not separated individual judges' prior expectancies and the cues mediating the similarity of attributes.
How likely and how plausible is an alternative account of expectancy effects in terms of semiotic diffusion? A vast body of evidence on learning, memory, and cognition converges in the conclusion that, if anything, unexpected, surprising, or script-inconsistent information is elaborated more deeply and recalled better than expected or schema-congruent information ( Cheng, 1997 ; Hastie, 1980 ; Rescorla & Wagner, 1972 ; Stangor & McMillan, 1992 ). Thus, the available evidence suggests that any processing bias (in terms of effective encoding or reduced information loss, as measured by i ) facilitates memory for unexpected rather than expected information.
One might conjecture that correlation judgments are driven not by memory for individuating stimulus information but by a top-down process that is mostly sensitive to prior knowledge. And, indeed, when the experimental task setting encourages guessing based on expectations rather than recall of individual stimuli, expected information may override unexpected data (see Garcia-Marques & Hamilton, 1996 ; Heit, 1993 ; Stangor & McMillan, 1992 ). In BIAS, this would correspond to a case in which initial stimulus processing is unbiased ( i constant) and an extra weight of expected information is introduced in the final integration process. However, then the unresolved question remains as to the origin of the expectations that influence final judgment or guessing. One possible answer is, again, semantic similarity.
Conversely, much can be said in favor of a cue-overlap account, in accordance with a feature approach to similarity ( Tversky, 1977 ). After all, the crucial independent variable in most experiments on expectancy-based illusory correlations is the semantic relatedness of attribute names ( L. J. Chapman & Chapman, 1967 ; Hamilton & Rose, 1980 ; Miller, 1971 ) rather than direct manipulations or measures of expectancies (i.e., subjective likelihood of anticipated stimulus events). A prominent task for future research is to differentiate between genuine expectancy effects and other aspects of similarity, such as cue overlap.
The notion of cue overlap opens a new semiotic perspective on cognition and social cognition. Just as two personality tests may exhibit an artificial correlation because a subset of items occurs in both tests, perception in a Brunswikian multiple-cue world may confound the meaning of distal concepts. Stereotypes linking masculinity and leadership, femininity and emotionality, or novelty and danger may arise from neither expectations nor motivational biases but may simply reflect semiotic confounding of the related concepts.
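The test analogy can be checked numerically. The following sketch is an illustration under stated assumptions, not part of the BIAS simulations: item responses are taken to be independent standard normal values, each test score is an unweighted sum, and the item counts (nine specific and four shared items per test) are borrowed from the cue-overlap simulation described earlier. The function name `simulate_test_scores` is hypothetical. With independent items of equal variance, the expected correlation of the two sum scores equals the share of common items, 4 / (4 + 9).

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_test_scores(n_people=20000, n_specific=9, n_shared=4, seed=7):
    """Two sum scores per person; the shared items enter both scores,
    whereas the specific items of the two tests are independent."""
    rng = random.Random(seed)
    scores_x, scores_y = [], []
    for _ in range(n_people):
        shared = sum(rng.gauss(0, 1) for _ in range(n_shared))
        scores_x.append(shared + sum(rng.gauss(0, 1) for _ in range(n_specific)))
        scores_y.append(shared + sum(rng.gauss(0, 1) for _ in range(n_specific)))
    return scores_x, scores_y


x, y = simulate_test_scores()
observed = pearson(x, y)
expected = 4 / (4 + 9)  # proportion of variance carried by the shared items
print(f"observed r = {observed:.2f}, expected r = {expected:.2f}")
```

Although the specific items of the two tests are unrelated, the scores correlate because part of each score is literally the same information, which is the sense in which overlapping cues can confound two distal concepts.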
With respect to the other area of illusory correlations based on distinctiveness, the simulations reveal a similar need to distinguish between fundamentally different cases. Most important, the model highlights the need to conceptually and operationally distinguish between the infrequency of Cell D and the enhanced encoding or memory of Cell D information. An experimental analog of the present simulations would involve a design in which the infrequency of Cell D is manipulated independently of the salience of stimuli. For instance, the stimulus distribution might be equal ( a = b = c = d = 10) or skewed ( a = 20, b = 10, c = 10, d = 5), whereas an orthogonal manipulation might give extra attention to either Cell D or Cell C events. The very crossing of both manipulations in one design shows that both factors are conceptually different, as explicated in BIAS. Whereas one factor (enhanced salience) presupposes a biased process, the other factor (infrequency) produces illusory correlations through a purely unbiased aggregation process. To my knowledge, such experiments have rarely been conducted (but see Fiedler & Stroehm, 1986 ).
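The logic of this crossed design can be made concrete with the phi coefficient of a 2 × 2 table, phi = (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d)). The sketch below is only an arithmetic illustration: it treats extra attention to a cell as a simple doubling of that cell's frequency, which is a simplifying assumption made here and not the BIAS algorithm itself. The names `phi` and `designs` are hypothetical.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient for the cell frequencies a, b, c, d of a 2 x 2 table."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Frequency distributions from the text; both are objectively uncorrelated.
designs = {
    "equal": (10, 10, 10, 10),
    "skewed": (20, 10, 10, 5),
}

for name, (a, b, c, d) in designs.items():
    unweighted = phi(a, b, c, d)
    cell_d_doubled = phi(a, b, c, 2 * d)  # assumed: salience doubles Cell D
    cell_c_doubled = phi(a, b, 2 * c, d)  # assumed: salience doubles Cell C
    print(f"{name}: none={unweighted:+.3f}  "
          f"D={cell_d_doubled:+.3f}  C={cell_c_doubled:+.3f}")
```

Under this simplification, both unweighted distributions yield a phi of zero, doubling Cell D yields a positive value, and doubling Cell C yields a negative value. Any illusory correlation obtained from the skewed distribution without extra weighting must therefore come from the aggregation process rather than from the objective contingency.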
Of course, the different sources of illusory correlations are not mutually exclusive. Although skewed frequency distributions alone can produce illusions ( Fiedler, 1991 ; Smith, 1991 ), this does not preclude that infrequent events can have a memory advantage, as suggested by Hamilton and colleagues ( Hamilton, Dugan, & Trolier, 1985 ; Hamilton & Sherman, 1989 ). However, as just mentioned, the two sources have not been separated in the same design; experimenters have confined themselves to manipulating the infrequency of one cell (e.g., negative behavior in a minority). Thus, the crucial theoretical question is whether there is cogent evidence to support the mediational assumption that infrequency effects observed in these experiments are actually due to a memory advantage for the most infrequent observations.
Proponents of such an account have pointed to occasionally obtained evidence for enhanced recall ( Hamilton et al., 1985 ; Hamilton & Sherman, 1989 ; Mullen & Johnson, 1990 ) or prolonged encoding latencies of infrequent stimuli ( Johnson & Mullen, 1994 ). However, a critical inspection of this research shows that evidence for enhanced memory is hard to replicate ( Fiedler, Russer, & Gramm, 1993 ; Klauer & Meiser, 1998 ) and that the latency data suffer from a failure to control for speed-accuracy trade-offs. With regard to accuracy, memory for infrequent events is worst in that judgments of this category exhibit the strongest bias. Thus, alleged evidence for enhanced memory has been typically confused with overestimation or exaggeration effects. When more refined methods are used to analyze recall performance, such as signal-detection analysis ( Fiedler, Freytag, Walther, & Nickel, 1997 ; Fiedler et al., 1993 ) or multinomial modeling ( Klauer & Meiser, 1998 ), there is no support for enhanced memory of infrequent stimuli. However, a strong response bias is apparent in that judges associate the frequent level of one variable with the frequent level of the other variable (e.g., positive behavior with the majority).
Apart from the empirical evidence, the distinctiveness account of infrequency effects suffers from a serious problem, as neatly delineated by Hunt (1995) . Distinctiveness is used as a theoretical construct to explain the impact of infrequency on correlation judgments, indicating that infrequent observations are salient and prominent in memory. However, at the operational level, distinctiveness (the theoretical construct) is either equated with infrequency (the independent variable) or inferred from the resulting bias (the dependent variable). Operationally independent measures of distinctiveness are extremely rare. Thus, the same variables appear in the explanation as in the phenomena to be explained.
The present review has identified several strands of research on illusory correlation that have received considerable attention in cognitive and social psychology. The systematic distortion ( Shweder, 1982 ) of correlation assessments provides a challenging research topic with serious implications for many applied areas such as diagnostics, stereotyping, marketing, and decision making.
However, from a theoretical perspective, the review revealed that explanations of illusory correlations have not been embedded in a comprehensive model and important distinctions have gone unnoticed. Almost all accounts have referred to biased cognitive processes and selective weighting as sources of illusory correlations. But the precise cognitive algorithms have not been spelled out clearly, and potential alternatives have been neglected. In different paradigms, theoretical accounts have emphasized either a processing advantage of expectancy-congruent information or an advantage of expectancy-discrepant, distinctive information. Both assumptions arise from two basic gestalt principles, congruency and distinctiveness, the plausibility of which may have hindered deeper, more critical analyses.
Within the framework of a Brunswikian, probabilistic multiple-cue model, BIAS, a simple associative algorithm, was proposed; this model provides a comprehensive framework for various types of illusory correlations. Within this framework, the reviewed phenomena could be integrated. The model also helped to elucidate the underlying processes, to distinguish analytically between qualitatively different variants of the illusion, and to point out new variants and formulate open questions for future research.
Simulations highlighted the possibility that so-called expectancy-based illusory correlations may, to an unknown degree, reflect an influence of similarity that is independent of expectations. An alternative, semiotic account suggests that cue overlap alone can cause illusory correlations between semantically similar attributes. It was also clarified that genuine expectancy effects can reflect different processes (cf. Hamilton, 1981 ) such as selective information search, biased encoding, forgetting, or differential judgment weights.
Within the other major domain, distinctiveness-based illusory correlations, the model also served to distinguish several variants of genuine distinctiveness effects. Illusory correlations may be due to the enhanced salience of selected stimuli during encoding, to enhanced memory for particular event combinations, or to an extra weight in the final judgment stage. However, without any such bias, an illusion can also arise from skewed frequency distributions alone. Because prior experimental research has failed to isolate these separate sources, many findings remain equivocal. Experimental research is strongly needed to manipulate the different types of distinctiveness in an orthogonal fashion. Skewed frequencies alone produce the illusion, without biased processing; thus, an unequivocal distinctiveness effect requires a stronger illusion than the pure infrequency bias.
With reference to an explicit and transparent algorithm such as BIAS, it is not only possible to clarify theoretical issues related to illusory correlations. The algorithm may also facilitate recognition of other paradigms, usually treated under different labels, as hidden cases of illusory correlations. Four such related paradigms are addressed briefly, and the BIAS framework is used to widen the scope of illusory correlation research.
Intergroup Discrimination
As has been shown, BIAS correctly predicts that the same prevalence of desirable over undesirable behaviors is more readily detected in a large (majority) than in a small (minority) sample or group ( Fiedler, 1991 ; Hamilton & Sherman, 1989 ). However, such a constellation is by no means confined to minority issues. Regardless of the actual group size, people may have more information on one group than another (as a result of familiarity, proximity, etc.). Because more observations are usually available on one's in-group than one's out-group, in the environment as well as in memory, the (normative) predominance of positive, desirable behavior should be more apparent for the in-group.
This sort of an illusory correlation is supported by hundreds of intergroup studies showing a relative in-group-serving bias ( Messick & Mackie, 1989 ; Tajfel, 1982 ). Moreover, if observations refer to different behavioral trait dimensions, BIAS predicts that more different traits should be detected from the larger sample of the in-group, leading to a more differentiated and less homogeneous impression of the in-group than the out-group ( Judd & Park, 1988 ; Linville, Fischer, & Salovey, 1989 ). Thus, when applied to the illusory correlation between group membership (in-group vs. out-group) and valence (positive vs. negative), BIAS explains two major intergroup phenomena at the same time, the in-group-serving bias and the out-group homogeneity effect ( Fiedler, Kemmelmeier, & Freytag, 1999 ).
As an aside, a curious conflict is created when the asymmetry of in-groups and out-groups is reframed as an illusory correlation. The out-group homogeneity effect has been explained by the assumption that the smaller samples of out-group information are represented as abstract prototypes, whereas the richer samples of in-group information are represented to a much greater extent by distinctive, detailed information about individual cases ( Judd & Park, 1988 ; Park & Hastie, 1987 ). This is in sharp contrast to the basic assumption in the illusory correlation paradigm that small groups (infrequent event categories) are more distinctive than large groups ( Hamilton & Sherman, 1989 ). The BIAS model clarifies that neither assumption is strictly necessary. Favorable and differentiated judgments of large groups can be simulated without different processing or representation assumptions.
Illusory Hypothesis Verification
Unequal sample sizes, as a source of illusory correlations, may be self-generated rather than provided by the experimenter or the environment. People may, for several reasons, think about, discuss, and search for more information on one category than another. For instance, in an election campaign, voters may expose themselves more to arguments of their own party than to arguments of an opponent party, even when the argument pools are equally large (or infinite). Similarly, unequal samples may result from lopsided discussions or memory search, producing the same kind of illusion as between in-groups and out-groups or majorities and minorities.
The crucial role of sample size, or aggregation, speaks to the well-known verification tendency in hypothesis testing, commonly referred to as confirmation bias ( Snyder, 1984 ) or self-fulfilling prophecies ( Jussim, 1986 ; Kukla, 1994 ). When trying to determine whether their interview partner is an extravert, people form a more extraverted impression of the partner than when the question focuses on the partner's introversion ( Snyder & Swann, 1978 ; Swann, Guiliano, & Wegner, 1982 ; Zuckerman, Knee, Hodgins, & Miyake, 1995 ). These and countless similar findings can be reframed as infrequency-based illusory correlations, for two reasons. First, hypothesis testers typically engage in positive testing ( Klayman & Ha, 1987 ); that is, they gather larger samples on the focused category (e.g., extraverted behaviors) than the unfocused category (introverted behaviors). Second, confirming answers are generally more likely in social communication than negative, disconfirming answers ( Zuckerman et al., 1995 ). Both tendencies together will produce the kind of skewed distribution that BIAS has shown to produce illusory correlations in the absence of any further bias. Experimental support for this was recently presented by Fiedler, Walther, and Nickel (1999) .
Note also that the BIAS algorithm predicts that another variant of illusory correlations may contribute to the verification bias. To the extent that confirming data (e.g., actually observed behaviors) are more diagnostic than disconfirming data (e.g., omitted, unobserved behaviors), judgments should be mainly determined by confirming evidence for the focused category. As shown in the simulations, enhanced diagnosticity alone can create illusory correlations, or verifications ( Trope & Bassok, 1983 ; Trope & Thompson, 1997 ).
Learning and Conditioning
The affinity of the BIAS model to learning processes is immediately evident. On the basis of Garcia and Koelling's (1966) pioneering work, Seligman (1970) introduced the notion of preparedness into the field of classical conditioning. Organisms are prepared or predisposed to learn the association of an unconditional stimulus (e.g., sickness) to certain conditional stimuli (olfactory sensations) more readily than to others (electrical shock). This phenomenon is often treated in a manner similar to another manifestation of an expectancy effect. Over a long evolutionary period, an organism seems to have learned that olfactory sensations most likely predict sickness. Such an interpretation is facilitated by the neighborhood of other conditioning experiments using more meaningful stimuli, showing, for instance, that frowning faces are more easily associated with an aversive unconditioned stimulus than friendly faces ( Orr & Lanzetta, 1980 ). According to the present approach, this may not be justified. Whereas the readiness to associate aversive shock with frowning (as opposed to smiling) faces may indeed reflect an expectancy effect, the preparedness to associate sickness and olfactory sensations may be due to confounded, overlapping cues in the area of odor and taste.
At the same time, there is ample evidence for an important role of distinctiveness in conditioning, especially with reference to the leading model of Rescorla and Wagner (1972) . Unexpected conditioned stimulus-unconditioned stimulus pairings that are not predicted from previous stimulus pairings (i.e., distinctive events) cause more learning progress than nonsurprising, anticipated events, as evident in the blocking effect ( Kamin, 1968 ; Sanbonmatsu, Akimoto, & Gibson, 1994 ).
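The role of surprise in the Rescorla and Wagner (1972) model can be sketched with its standard learning rule, in which the associative strength of every cue present on a trial changes in proportion to the prediction error, that is, the difference between the maximum strength supported by the outcome and the summed strength of all cues present. The sketch below is a minimal illustration; the single `rate` parameter (standing in for the product of the model's two learning-rate parameters), the trial counts, and the function name `rescorla_wagner` are assumptions chosen for illustration.

```python
def rescorla_wagner(trials, strengths=None, rate=0.2, lam=1.0):
    """Apply the Rescorla-Wagner rule to a sequence of reinforced trials.

    trials:    list of tuples naming the cues present on each trial
    strengths: optional starting associative strengths per cue
    rate:      assumed combined learning rate
    lam:       maximum associative strength supported by the outcome
    """
    v = dict(strengths or {})
    for cues in trials:
        # Prediction error: the outcome minus what all present cues predict.
        error = lam - sum(v.get(cue, 0.0) for cue in cues)
        for cue in cues:
            v[cue] = v.get(cue, 0.0) + rate * error
    return v


# Blocking group: cue A alone is trained first, then the compound AB.
pretrained = rescorla_wagner([("A",)] * 50)
blocked = rescorla_wagner([("A", "B")] * 50, strengths=pretrained)

# Control group: the compound AB is trained without pretraining.
control = rescorla_wagner([("A", "B")] * 50)

print(f"V(B) blocked: {blocked['B']:.3f}  V(B) control: {control['B']:.3f}")
```

In the blocking group, the outcome is already fully predicted by cue A when cue B is introduced, so the prediction error is close to zero and B acquires almost no strength. In the control group, the outcome is initially surprising, and B acquires about half of the available strength.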
Bayesian Inference
Countless experiments have been published on the so-called base-rate neglect in probabilistic inference. In a typical task ( Gigerenzer & Hoffrage, 1995 ), judges are asked to estimate the probability of breast cancer given a positive mammography based on statistical data on the contingency between these two variables. Judges normally exaggerate the conditional probability of breast cancer given a positive mammography, as if they were ignoring the low base rate of the criterion event. The same bias can be obtained with sequential observations, just as in illusory correlation experiments ( Fiedler, Brinkmann, Betsch, & Wild, in press ).
Again, the common explanation of base-rate neglect, in terms of the representativeness heuristic ( Kahneman & Tversky, 1972 ), highlights the similarity in meaning of breast cancer and positive mammography, providing another case for the congruence rule. Within the BIAS approach, viable alternatives are immediately apparent (see Fiedler et al., 1999 ). The same sort of overestimation of the co-occurrence of breast cancer and positive mammography may originate in the enhanced diagnosticity of positive information (concentrating on women with breast cancer and ignoring women without breast cancer) or an infrequency-based illusory correlation (yielding an overestimation of rare events, such as breast cancer).
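The normative answer in such a task follows from Bayes' theorem. The following sketch shows the computation; the base rate, hit rate, and false-alarm rate are illustrative values assumed here, not figures reported in the present article, and the function name `posterior` is hypothetical.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_alarm_rate
    return true_positives / (true_positives + false_positives)


# Assumed illustrative values: 1% base rate, 80% hit rate,
# 9.6% false-alarm rate.
p_cancer_given_positive = posterior(0.01, 0.80, 0.096)
print(f"P(cancer | positive) = {p_cancer_given_positive:.3f}")  # about 0.078
```

With these assumed values, the posterior probability is below 8%, far lower than the hit rate of 80%. A judge who attends mainly to the co-occurrence of cancer and positive test results, whether because positive information is more diagnostic or because the rare event is overestimated, will arrive at a much higher estimate.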
What Are the Cognitive Properties of BIAS?
Crucial to understanding the "cognitive properties" of the BIAS algorithm is that it is sensitive to event frequencies as well as similarities or extensional (statistical) as well as intensional (meaning-related) information ( Fiedler & Stroehm, 1986 ). BIAS integrates both sources of information within the same algorithm. The statistical relation between x and y is represented by the number of stimulus items making up the bivariate distribution. The semantic similarity of "x" and "y" enters as the cue overlap of respective segments used to represent x and y . Thus, whereas the matrix columns (see Figure 3 ) reflect the statistical stimulus distribution, the cue composition across rows contains the intensional (similarity) information. Both influences, event frequencies and similarity, can in principle compensate for each other. When the statistical distribution (of columns) does not support a correlation, a judgment prompt that activates many overlap cues will distort the resulting judgment in a way that exaggerates the similar, overlapping attributes. In this way, BIAS helps to bridge the categorical gap between two sources of seemingly incomparable information ( Shweder, 1977b ).
Closely related to the two sources of frequency-related and similarity-related information are the two major "cognitive properties" of BIAS that are responsible for its ability to simulate so many empirical phenomena (cf. Fiedler, 1996 ; Fiedler et al., in press ). These two key properties are the differential aggregation resulting from unequal statistical samples and the built-in similarity function resulting from overlapping cues. BIAS shares these properties with other connectionist approaches involving distributed representations of noisy data ( Kashima & Kerekes, 1994 ; Kruschke, 1992 ; McClelland & Rumelhart, 1985 ; Smith, 1991 , 1996 ). It is no wonder that these alternative models would also allow for simulations of illusory correlations. BIAS was only chosen as the simplest approach with a minimum of assumptions and parameters. What accounts for illusory correlations is not specific parameters or functions but the basic qualitative properties of aggregation and semiotic cue overlap.
Conclusion
The intended message of the present article, if it was conveyed successfully, can be summarized as follows. Illusory correlations have attracted considerable research interest in many fundamental as well as applied domains. However, their popular theoretical explanations in terms of two basic gestalt rules, congruency and distinctiveness, remained incomplete and did not reach the level of a clearly spelled out algorithm. A review of the empirical literature revealed three major research areas that developed in relative isolation, with little cross-referencing: illusory correlations based on expectancies, those based on the asymmetry of positive and negative attributes, and those based on stimulus infrequencies. Different theory heuristics have guided the research in these areas, pointing to separate aspects of the cognitive process. However, no comprehensive theoretical framework has been developed within which these different aspects of the cognitive process can be located and pitted against each other.
As a step toward such a framework, a simple distributive learning model was presented as a transparent computer algorithm. This model can account for virtually all qualitative variants of illusory correlations, with very few assumptions. In addition to its simplicity and explanatory value, the model gives rise to alternative explanations of old phenomena and original predictions of neglected types of illusory correlations. Moreover, the model helps to integrate research from other paradigms that are not commonly recognized as variants of illusory correlations.
However, perhaps the most important theoretical insight gained from the present approach is an understanding of how and why it is possible that biased correlation assessments need not originate in biased processes. Although it is hardly surprising that biased cognitive processes (due to expectancies, distinctiveness, or salience) can lead to biased judgments, a more intriguing theoretical issue is how aggregation effects and cue overlap can cause biased outcomes in a completely unbiased information-processing device.
Table 1. Overview of Research on Expectancy-Based Illusory Correlations
Table 2. Overview of Research on Illusory Correlations Originating in Unequal Weighting of Different Event Combinations
Table 3. Overview of Research on Illusory Correlations Originating in Distinctiveness or Infrequency
Table 4. Implementation of Various Cognitive Process Assumptions in BIAS to Explain Different Influences on Subjective Correlation Assessment
Table 5. Mean Simulated Correlation Judgments Across 100 Replications for Different Variants of Illusory Correlations
Figure 1. Conventional notation of the 2 × 2 contingency between the presence versus absence of a cause and the presence versus absence of an effect (or, in the generalized case, between the positive and negative levels of two attributes). Lowercase letters a, b, c, and d represent the statistical frequencies of observations pertaining to the four cells denoted in the text by uppercase letters A, B, C, and D.
Figure 2. Illustration of an aggregation effect within the BIAS framework. All stimulus vectors (matrix columns)
Figure 3. Modeling illusory correlations within the BIAS framework. The 36 vertical stimulus patterns are generated from all four combinations of the ideal patterns at left, representing high and low health (
Figure 4. Overlapping cue representations that create semiotic confusion between two distal concepts (e.g., leadership and health) in a distributive framework.