    Brit. J. Phil. Sci. 49 (1998), 31-47

    Probabilities and Epistemic Pluralism


    Eric Christian Barnes

    A B S T R A C T

    A pluralistic scientific method is one that incorporates a variety of points of view in

    scientific inquiry. This paper investigates one example of pluralistic method: the use of

    weighted averaging in probability estimation. I consider two methods of weight

    determination, one based on disjoint evidence possession and the other on track

    record. I argue that weighted averaging provides a rational procedure for probability

    estimation under certain conditions. I consider a strategy for calculating 'mixed

    weights' which incorporate mixed information about agent credibility. I address various

    objections to the weighted averaging technique and conclude that the technique is a

    promising one in various respects.

    1 Introduction

    2 Scenario 1: the mapmakers

    3 Scenario 2: the probabilistic jury

    4 Applications for the two scenarios

    5 Mixed weights

    6 Weighting gone wrong



    1 Introduction

    Peter is a bookie who posts odds on sporting events. In determining the

    probability that the Dallas Cowboys will defeat their next opponents, Peter

    takes into account the various strengths and weaknesses of both teams. Peter

    then declares that this probability is 0.5 (thus Peter posts even odds on this

    event). But at this point Peter learns that another bookie, Diane, has declared

    that the probability of a Cowboys' victory in this case is 0.01. Peter is

    surprised, for he holds Diane's opinions on such matters in high regard.

    Peter may feel that he should modify his own posted probability on the basis

    of knowing the probability Diane has posted. But how, exactly, should he go

    about assimilating this information? Peter's predicament raises important (and

    largely unexplored) questions about how rational agents do or should make use

    of other persons' posted probabilities as evidence. How, if at all, is Peter to

    combine the content of his own deliberations about the result of the Cowboys'

    game with the information about Diane's probability to compute an updated

    probability?


    The question as to what degree Peter takes Diane's probability seriously

    obviously depends on the extent to which Peter takes Diane to be a credible

    authority, and it is clearly important in this regard how (in Peter's view)

    Diane's credibility compares to his own. The epistemics of such a case can of

    course be made more complicated by the inclusion of other bookies and their

    probabilities as well.

    Philosophers have differed over whether rational agents ought to consider

    other persons' probabilities as epistemically significant. Earman argues


    p. 30), that 'It is fundamental to science that opinions be evidence-

    driven' rather than driven by reference to other persons' opinions. But Barrett

    [1996] argues that it certainly can be rational to use other persons' beliefs as

    evidence, and provides strong substantiation that the history of quantum

    mechanics was driven in part by such evidence. Philip Kitcher ([1993],

    Ch. 8) treats at some length the basic notion of the cognitive authority of

    individual scientists, and develops extensive apparatus for showing how

    assessments of cognitive authority can be crucial for explaining the strategies

    of individual scientists and optimizing science policy.

    It is not hard to understand why such controversy exists. In his essay

    'Epistemic Dependence' John Hardwig argues that there is a long-standing

    tradition in the Western World of 'epistemological individualism', which

    maintains that one may not rationally defer to the opinions of others in forming

    beliefs. Hardwig diagnoses the causes of this tradition as follows:

    The idea behind [epistemological individualism] lies at the heart of one

    model of what it means to be an intellectually responsible and rational

    person, a model which is nicely captured by Kant's statement that one of

    the three basic rules or maxims for avoiding error is to 'think for

    oneself'. This is,

    I think, an extremely pervasive model of rationality: it underlies

    Descartes's methodological doubt; it is implicit in most epistemologies; it

    colours the way we have thought about knowledge. On this view, the very

    core of rationality consists in preserving and adhering to one's own

    independent judgement



    Hardwig ultimately concludes, however, that such a model provides a 'romantic

    ideal' which ultimately results in less rational belief. Once we realize that the

    reliability of other persons as cognitive agents is something that can be scruti-

    nized and evaluated critically, there is no reason not to avail ourselves of the

    evidential significance of other persons' beliefs, including their probabilities.

    But we should be aware that this proposal strikes at the heart of much of what has

    long been taken for granted about reasonable belief.


    Kitcher briefly notes at one point that weighted averaging of the sort discussed here could be

    useful in some contexts ([1993], p. 336).


    The purpose of this paper is to assess one natural proposal relevant to these

    issues. This proposal runs as follows: each agent i of N-many posters of

    probabilities for theory T is assigned a 'weight' w_i which reflects that agent's

    competence relative to the other N - 1 agents (the weights sum to one). Where

    p_i is the probability i assigns to T, we compute the communal probability P(T)

    as the weighted sum of the various posted probabilities:

        P(T) = w_1 p_1 + w_2 p_2 + ... + w_N p_N





    This technique of weighted averaging of probabilities thus provides a

    straightforward method of accommodating the epistemic significance of a distribution

    of probabilities across a community.
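
    As a minimal sketch of the averaging rule just stated, the following Python
    fragment computes P(T) from a list of weights and posted probabilities. The
    function name weighted_average and the equal weights given to Peter and Diane
    are illustrative assumptions; only the two posted probabilities (0.5 and 0.01)
    come from the opening example.

```python
def weighted_average(weights, probabilities):
    """Return P(T) = w_1 p_1 + ... + w_N p_N for N agents."""
    if len(weights) != len(probabilities):
        raise ValueError("need exactly one weight per posted probability")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * p for w, p in zip(weights, probabilities))


# Peter posts 0.5 and Diane posts 0.01.
# The equal weights below are hypothetical; the text does not fix them.
communal = weighted_average([0.5, 0.5], [0.5, 0.01])
print(communal)
```

    With equal weights the communal probability lies midway between the two
    posted values; shifting weight toward Diane pulls it toward 0.01.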

    I find the weighted averaging method interesting because it is both prima

    facie plausible yet not obviously sound. Most importantly, it is far from clear

    how the relevant weights are to be determined. It is not intuitively clear to me

    that a scientist's competence is the kind of thing that can be measured by a

    simple numeral. Even assuming a quantitative comparison of competence, it is

    hardly obvious that the best way to take account of measured competence in

    assessing the probability of some theory is to compute the product of competence

    with the probability endorsed by the quantized agent and use the product

    as a component of an updated probability.

    The purpose of this paper is to assess the merit of the weighted averaging

    approach to probability updating. Our fundamental difficulty, as noted, is the

    problem of how weights are to be understood and calculated. Weights are

    somehow based on assessment of competence, but 'competence' is of course a

    complex and multifaceted notion. A knowledge seeker's competence consists of

    her overall relevant knowledge, logical acumen, and disposition to avoid various

    kinds of error, but presumably covers as well the seeker's skill and judgement in

    intuitively assessing complex bodies of evidence. Even the seeker's character

    could play a role in determining her competence, insofar as consistently resilient

    knowledge seekers are more likely to find the truth than those who give up easily.

    My strategy will be to consider two simplified examples in which we isolate a

    particular species of competence and consider how weights might reflect it. (I

    then consider how to calculate 'mixed weights' which are based on both species

    of competence.) The two scenarios differ in the following way (among others).

    In the first scenario, individuals begin with identical relevant background beliefs

    (represented here as identical probability functions) but independently explore a

    large and diffuse body of evidence; thus the various bodies of evidence they

    accumulate differ. In the second scenario, a group of individuals are presented

    with a single body of evidence, but begin deliberations with different back-

    ground beliefs (represented here as different probability functions). As we will

    see, weighted averaging turns out to be a tool that will deliver rational

    probability updates under various carefully described conditions.


    Before proceeding, we should note that Keith Lehrer has in a series of

    published works (Lehrer [1976a, 1976b, 1977, 1978]) made use of the

    weighted averaging method; the formal aspects of Lehrer's program have

    been further developed by Carl Wagner ([1978, 1981]). Their program is

    laid out in detail in Lehrer and Wagner's [1981] book Rational Consensus

    in Science and Society (referred to as 'L&W' hereafter). L&W's program

    focuses on the use of weighted averaging to explain the phenomenon of

    consensus in epistemic communities. Our concern, however, is not with the

    generation of consensus; the primary aim of this paper is to understand the

    task of weighting and to determine whether certain weighting procedures enable

    the averaging method to deliver rational probability updates. L&W devote only

    cursory attention to the problem of how weights are to be determined, and I take note of

    some of what they suggest on this subject at appropriate points below.

    2 Scenario 1: the mapmakers

    In this scenario we consider a method of fixing weights which evaluates an

    agent's competence in terms of the amount of relevant evidence that agent

    possesses; this is surely one species of 'competence'. Let us consider a group

    of N-many mapmakers who are simultaneously set loose in a large complex

    foreign city. We assume the mapmakers are equally competent and work

    equally hard. The city is divided into many subregions of equal area; it

    takes a mapmaker one hour to map one subregion, and we assume that mapmakers

    proceed by drawing complete maps of a subregion before moving on to

    the next subregion. Finally, we assume that each mapmaker i is given a whole

    number of hours H_i to work within, and H_i is not necessarily the same for each

    mapmaker. Mapmaker i will thus map precisely H_i-many subregions of the

    city; no mapmaker succeeds in mapping the whole city.

    At the end of the mapmaking period, the mapmakers are all shown a single

    map M, and asked to determine the probability that it is a true map of the city.

    Each mapmaker i posts a probability p_i that M is a true map based on just the

    evidence that i has acquired (i.e. the mapmakers do not collaborate). Our

    question is whether weighted averaging should be applied to the p_i, and if so,

    how? We assume, for the moment, that M is a true map of the city

    (unbeknownst to the mapmakers).



    It might be objected at this point that the failure of the various mapmakers to share their data

    amounts to a disanalogy with the actual practice of scientists, but this is not the case. First of all,

    data are often not shared between scientists for practical reasons (there is no easy way to transfer

    them) or social reasons (having to do with the competitive nature of science). Yet more relevant

    is the fact that when a scientist possesses a large amount of relevant evidence, evidence sharing

    may take prolonged time and effort on both sides, hence the premium on scientific expertise,

    which allows deference to another scientist's opinion without having to assimilate all that

    scientist's evidence.


    Let us pause to consider how individual mapmakers would determine their

    original posted probabilities p_i. One point of considering this scenario is to

    consider a case in which the various mapmakers start from the same epistemic

    starting point, i.e. with precisely the same relevant background information.

    In Bayesian terms, this amounts to their starting with the same probability

    functions. To make the task of computing each p_i concrete, let us stipulate

    some shared prior probabilities. Suppose that the number of subregions of the

    city is 100, and the mapmakers are informed that the prior probability that

    any particular subregion as drawn on M is accurate is 0.9. (This may be

    attributed to map M having been drawn by a mapmaker who is known to

    possess precisely this degree of reliability.) Assuming that the probability that

    any particular region is accurate is independent of that of any other region, the

    prior probability that M is a true map, p(M), is (0.9)^{100}, or 2.66 x 10^{-5}. Each

    mapmaker i should thus compute p_i by simply using Bayes' theorem. Thus

    where e_i refers to the total evidence garnered by i, then

        p_i = p(M) p(e_i/M) / p(e_i)

    Thus, noting that p(e_i/M) will of course be 1 (since e_i is an accurate picture of a

    set of subregions that are also truly pictured by M),

        p_i = (2.66 x 10^{-5}) / (0.9)^{H_i}

    So, if H_i = 50 hours, and i thus surveyed 50 regions, p_i = 0.005; for H_i = 75,

    p_i = 0.072; for H_i = 90, p_i = 0.35; for H_i = 99, p_i = 0.90. Clearly, p_i is closer to

    1 the more subregions observed by i, exactly what we would expect given

    that M is the true map.
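
    The arithmetic of this paragraph can be checked with a short Python sketch.
    It assumes, as the text stipulates, 100 independent subregions with prior
    accuracy 0.9 each and a true map M, so that the likelihood of a mapmaker's
    evidence given M is 1. The names N_SUBREGIONS, PRIOR and posterior_true_map
    are illustrative labels, not the author's.

```python
N_SUBREGIONS = 100   # subregions in the city (stipulated in the text)
PRIOR = 0.9          # prior probability that any one subregion on M is accurate


def posterior_true_map(hours):
    """Posterior that M is true after surveying `hours` subregions that all match M."""
    prior_m = PRIOR ** N_SUBREGIONS   # p(M), roughly 2.66e-5
    likelihood = 1.0                  # p(e_i/M): the evidence is certain if M is true
    evidence = PRIOR ** hours         # p(e_i): each surveyed subregion matches with prior 0.9
    return prior_m * likelihood / evidence


for h in (50, 75, 90, 99):
    print(h, round(posterior_true_map(h), 3))
```

    The posterior reduces to 0.9 raised to the number of unsurveyed subregions,
    which is why it climbs toward 1 as a mapmaker's hours approach 100.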

    Let us attempt to apply weighted averaging to this example. The first

    question is how the various weights are to be fixed. Our mapmakers are

    more or less identical to one another except for the number of subregions

    they have mapped, so it would seem to make sense to make each weight w_i

    proportional to the number of subregions i mapped vis-à-vis others (here, again,

    weights aim to measure competence in one particular sense: an agent is

    competent to assess the probability of T to the degree that the agent is in

    possession of relevant evidence). The sum H_1 + H_2 + ... + H_N gives the total

    number of acts of subregion-mappings performed by the community; thus let

    us declare w_i = H_i / (H_1 + H_2 + ... + H_N). The basic intuition in back of our

    weighting procedure is that weights reflect the amount of (reliably) garnered

    evidence by the weighted agent in comparison to the total amount of evidence

    garnered by all agents. Weighted averaging determines the probability that M

    is a true map P(M) as follows:

        P(M) = w_1 p_1 + w_2 p_2 + ... + w_N p_N





    The question for us is whether P(M) is a rational updated probability for M

    given all information we possess.


    Let us pause to pose a basic question: what does it mean for a probability p_2

    to count as a rational update of a probability p_1 for some theory T? One initially

    plausible answer is that, if T is true, then p_2 is a rational update over p_1 if

    p_2 > p_1 (conversely if T is false). But this answer is vulnerable to the objection that

    updates would then be rational independently of whether there exists suitable

    evidence for the update. More in line with normal intuitions is the claim that p_2

    is a rational update over p_1 in so far as the total available evidence favours p_2

    over p_1, and we assume this view of rational updating in what follows.

    Thus the question whether P(M) is a rational update given the various p_i and

    w_i amounts to the question whether P(M) is a probability better grounded on

    the total available evidence. The total available evidence in this case amounts

    to the totality of the city's subregions mapped by our mapmakers. Does P(M)

    better reflect the total evidence than, say, any particular p_i? Let us first consider

    a case in which a total of 10 mapmakers each map 10 subregions, but that, as

    luck would have it, they happen to sample precisely the same 10 subregions.

    Thus each mapmaker i is given the weight w_i = 0.1, and each mapmaker posts

    p_i = 0.0000762. The weighted average P(M) will thus turn out to be

    0.0000762, precisely the probability justified by the total evidence, since it

    is just the correspondence of 10 subregions to the map M. But now consider

    the following variation on this example: in this case, each of 10 mapmakers

    maps 10 subregions, but no subregion is mapped by more than one mapmaker.

    Thus all 100 subregions of the city are covered, and our total evidence raises

    the probability that M is true to one. Unfortunately, given that the weights and

    the various p_i are the same in this case as before, the weighted average

    determines the probability of M to be 0.0000762, and weighted averaging

    clearly goes badly wrong.
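
    The contrast between the two cases can be sketched in Python. The sketch
    assumes, with the text, a true map M, 100 subregions with prior accuracy 0.9,
    and weights proportional to the number of subregions surveyed. The function
    pooled_probability is an illustrative construction: it is the probability an
    observer would assign if she could see the union of everyone's evidence.

```python
N_SUBREGIONS = 100
PRIOR = 0.9


def posted_probability(surveyed):
    """p_i for a mapmaker whose surveyed subregions all match M."""
    return PRIOR ** (N_SUBREGIONS - len(surveyed))


def weighted_average(surveys):
    """P(M) with w_i = H_i / (H_1 + ... + H_N)."""
    total_hours = sum(len(s) for s in surveys)
    return sum(len(s) * posted_probability(s) for s in surveys) / total_hours


def pooled_probability(surveys):
    """Probability of M given the union of all surveyed subregions (M assumed true)."""
    covered = set().union(*surveys)
    return PRIOR ** (N_SUBREGIONS - len(covered))


# Case 1: all ten mapmakers survey the same ten subregions.
same = [set(range(10)) for _ in range(10)]
# Case 2: the ten mapmakers cover the whole city without overlap.
disjoint = [set(range(10 * k, 10 * k + 10)) for k in range(10)]

for name, surveys in (("same", same), ("disjoint", disjoint)):
    print(name, weighted_average(surveys), pooled_probability(surveys))
```

    The weighted average is identical in both cases because it sees only the
    posted p_i and the hours worked, whereas the pooled figure depends on which
    subregions were actually covered.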

    We are noting at this point a fundamental problem with weighted averaging

    as a technique for updating probabilities. If there is a large body of evidence

    supporting T, but the evidence is thinly dispersed over a large number of agents

    who thus each post a low probability for T, weighted averaging will deliver a

    low value. Let us consider a modified proposal: instead of requiring individual

    agents to post probabilities for the truth of the entire map M, we require only

    that they post the probability that the group of subregions they surveyed are

    truly represented on M; we deem this the 'alternative approach' to weighted

    averaging. Thus, assuming that any agent will correctly determine that any

    subregion they have mapped corresponds to M or not, every agent will post

    either 1 or 0 for this probability when applying the alternative approach. We

    compute weights as before, so as to be proportionate to the amount of garnered

    evidence for the weighted agent. We retain for the moment our assumption that

    M is a true map. Does weighted averaging deliver a rational update now on the

    alternative approach? Let us consider again the case in which each of 10

    mapmakers surveys 10 subregions, and no subregion is covered by more than

    one mapmaker, so that the entire city is covered. Now we have w_i = 0.1 for each agent as

    before, but each p_i = 1, so P(M) is determined by weighted averaging to be

    equal to 1, precisely the correct result. So far so good: our alternative

    approach allows the epistemic significance of disjoint bodies of evidence to

    be combined.

    But clearly the alternative approach is no less fallible. Consider again the

    case in which each of 10 mapmakers surveys precisely the same 10 subregions;

    in this case, the calculation goes through just as in the previous

    paragraph, delivering P(M) = 1, but this is hardly the correct probability

    given the existing evidence. One can imagine many other scenarios in which

    the mapped subregions of the various mapmakers partially overlap, producing

    a weighted average that effectively accords too much weight to any subregion

    which is mapped by more than one mapmaker. Thus we conclude that if

    weighted averaging is to work for cases like these, we must require that

    agents garner bodies of evidence which are disjoint, i.e. do not overlap,

    suggesting that weighted averaging will work best when the accumulation of

    evidence is organized into a cooperative effort in which agents do not duplicate

    each other's evidence.
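
    A small Python sketch may make the disjointness requirement concrete. It
    assumes a true map M, so every agent posts 1 on the alternative approach, and
    it compares the resulting average with the probability supported by the union
    of the evidence. All function names here are hypothetical; in particular
    are_disjoint is an illustrative check, not a procedure given in the text.

```python
N_SUBREGIONS = 100
PRIOR = 0.9


def are_disjoint(surveys):
    """True when no subregion was surveyed by more than one mapmaker."""
    covered = set().union(*surveys)
    return len(covered) == sum(len(s) for s in surveys)


def alternative_average(surveys):
    """Weighted average when each agent posts 1 for the subregions she surveyed."""
    total = sum(len(s) for s in surveys)
    posted = [1.0 for _ in surveys]   # M is assumed true, so every posting is 1
    return sum(len(s) * p for s, p in zip(surveys, posted)) / total


def evidence_supported_probability(surveys):
    """Probability of M given the union of surveyed subregions (M assumed true)."""
    covered = set().union(*surveys)
    return PRIOR ** (N_SUBREGIONS - len(covered))


disjoint = [set(range(10 * k, 10 * k + 10)) for k in range(10)]
identical = [set(range(10)) for _ in range(10)]

for surveys in (disjoint, identical):
    print(are_disjoint(surveys),
          alternative_average(surveys),
          evidence_supported_probability(surveys))
```

    Only in the disjoint case does the average of 1 agree with what the pooled
    evidence supports; with identical surveys the average overstates it.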


    But there is another snag. Heretofore we have been assuming that M is the

    true map; let us now suppose that M is false. In this case it is perfectly

    possible that even our alternative approach to weighted averaging will fail to

    deliver a rational update even if the various bodies of evidence are disjoint.

    Suppose that exactly one of the 100 subregions on M is false; the other 99 are

    true. Now providing that none of the mapmakers actually surveys the false

    subregion, each mapmaker will post probability 1 for the subregions

    surveyed, and the weighted averaging will proceed unhindered as above. But

    if any one mapmaker surveys the false subregion, she will post 0 for this


    probability. But in this event the technique of weighted averaging is undermined,

    for now the only rational update for the probability that M is a true map

    is 0 (not the weighted average of the various p_i). Here is yet another limitation

    on the alternative approach to weighted averaging in our current context: it

    fails in cases in which any agent posts probability 0 based on evidence that

    conclusively refutes the theory.


    The example as modified into the alternative approach has some structural resemblance to an

    example of weight assignments treated briefly by L&W ([1981], pp. 138-9): 'Suppose that n

    individuals are given a collection of N objects, each bearing a label from the set {1, 2, ..., k}, and

    must determine the fraction p_j of objects bearing the label j for each j = 1, 2, ..., k. Suppose further

    that the collection is partitioned into n disjoint subsets, and that individual i examines a subset

    with N_i objects and reports, for each j = 1, 2, ..., k, the fraction p_ij of objects in that set which bear

    the label j .... [T]he rational sequence of weights w_1, w_2, ..., w_n with which to average the

    [fractions] is immediately apparent. Individual i should receive a weight proportional to the size

    of the set which he examines, so that w_i = N_i/N. For assuming that individuals count correctly, it

    is easy to check that p_j = w_1 p_1j + w_2 p_2j + ... + w_n p_nj.' While L&W offer this as one example of

    how weights might actually be assigned, they do not make mention of the fact that what is going

    on in this example differs sharply from the use of weighted averaging to calculate the probability

    of some theory, which is of course the focus of their book; rather, what is calculated are various

    relative frequencies. (One might imagine that the agents are calculating the probability that a

    randomly selected object has label j.) Neither do L&W note that the technique of weighted

    averaging in this example differs from the technique applied in the rest of their book, insofar as

    the latter technique has various agents posting probabilities that a single theory T is true. Here,

    each agent merely reports information specifically about some proper segment of T's domain

    (and this information is combined to compute relative frequencies for the whole domain) with no

    reference to the probability of any particular theory.

    In this section we have considered a case in which the members of a

    community of knowledge seekers begin with a common probability function

    and independently investigate a large and diffuse body of evidence. We

    conclude at this point that the technique of weighted averaging may be of

    some use in cases like these, but only under some fairly stringent conditions.

    These include (1) that various p_i are based on disjoint bodies of evidence, (2)

    that all the garnered evidence effectively supports, rather than falsifies, T (this

    will occur in any scenario in which a community tests hypothesis h by testing

    for various e_j, each of which follows logically from h and is confirmed), and

    (3) we employ the alternative approach.

    The fundamental intuition about weighting with which we began this section

    was that weights should reflect the amount of evidence garnered by the

    weighted agents. But as matters have turned out, this is not quite the correct

    gloss on weighting in the mapmaking scenario. Weights should reflect not

    merely the amount of garnered evidence, but the extent to which the body of

    evidence extends the totality of evidence garnered by the entire community.

    Agents are thus weighted not on the basis of how much evidence they possess,

    but on the basis of how much evidence they have garnered that has not been

    garnered by others. (If agents have surveyed overlapping regions, weighted

    averaging could be applied only if their posted probabilities were re-calculated

    so as to be based on disjoint evidence.) Let us now turn to a quite different

    scenario.


    3 Scenario 2: the probabilistic jury

    In this scenario we develop a method for fixing weights which evaluates agents

    in terms of their track record as posters of probabilities. Let us consider a

    person charged with tax evasion; he undergoes a trial by N jurors. During the

    trial a carefully circumscribed body of evidence is presented to the jury

    consisting of documents, letters, testimony about the defendant's background,

    * There is of course another case: one in which evidence is produced by an agent that neither

    confirms nor refutes the theory at issue but merely disconfirms the theory to some degree. I leave

    aside this complication here, as in situations like our mapmaking scenario it is presumably easy

    to see whether any accumulated evidence accords with M or not: if it does, M is supported, if it

    does not, M is refuted (i.e. there is no likely occurrence of evidence that merely 'disconfirms' the

    theory to some degree). Theory testing tends to have this feature when the theory at stake is a

    description of some epistemically accessible domain, and the evidence likewise may be easily

    surveyed and requires no interpretative skill to understand. We discuss this sort of feature of

    theories in more detail below.



    The jurors all listen carefully to all the evidence. We assume, however, that

    different jurors bring somewhat different background beliefs to the trial (in

    Bayesian terms, they begin with different probability functions). Now let us

    suppose that the jurors are not required simply to declare the defendant either

    guilty or not guilty; rather, after the jurors collectively deliberate, each juror

    is required only to post a probability that the defendant is guilty. Our problem,

    of course, is whether and how weighted averaging might be used to provide a

    rational update once the N probabilities are posted.

    In the mapmaking scenario, we assumed on the part of our various agents an

    identical starting point (i.e. their initial probability functions are the same), and

    a subsequent independent search that produced various pools of evidence. In

    this case, we assume different initial probability functions, but only a single,

    shared body of evidence. In so far as it is at the level of their initial functions

    that our jurors differ (we assume no computational errors), in weighting the

    jurors we are somehow weighting their respective probability functions. But

    how might such weighting proceed? How would it be justified?

    There are various ways in which one might try to measure the 'quality' of a

    probability function. Consider a juror J with probability function p_J; this

    function will consist of a set of prior and conditional probabilities held by J

    at the trial's onset. We may define the verisimilitude of J's probability function

    p_J, Ver(p_J), in the following terms: for any prior probability p_J(x) in p_J, deem

    Ver(p_J(x)) to be equal to (1 - |Tr(x) - p_J(x)|), where Tr(x) is 1 if x is true, 0 if x

    is false. For any conditional probability p_J(y/z) in p_J, deem Ver(p_J(y/z)) to be equal

    to (1 - |Z(y/z) - p_J(y/z)|), where Z(y/z) is the 'objective probability' that y

    will be true on the assumption that z is true (where the objective probability

    is grounded on relative frequencies or objective propensities).


    With all of

    these determinations in hand, let us simply declare the verisimilitude of p_J to

    be equal to the average verisimilitude of all priors and conditionals in p_J.

    An agent J's weight, we may then propose, is equal to

        Ver(p_J) / (Ver(p_1) + Ver(p_2) + ... + Ver(p_N)).
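
    The verisimilitude proposal can be sketched in Python as follows. The sketch
    takes for granted that the truth values Tr(x) and objective probabilities
    Z(y/z) are simply handed to us. The data layout (lists of pairs), the
    function names and the two jurors' numbers are all hypothetical,
    introduced only for illustration.

```python
def verisimilitude(priors, conditionals):
    """Average verisimilitude of one juror's probability function.

    priors:       list of (p_J(x), Tr(x)) pairs, with Tr(x) equal to 1 or 0
    conditionals: list of (p_J(y/z), Z(y/z)) pairs
    """
    scores = [1 - abs(truth - p) for p, truth in priors]
    scores += [1 - abs(objective - p) for p, objective in conditionals]
    return sum(scores) / len(scores)


def verisimilitude_weights(jurors):
    """Weight each juror by Ver(p_J) divided by the sum of all jurors' Ver values."""
    values = [verisimilitude(priors, conditionals) for priors, conditionals in jurors]
    total = sum(values)
    return [v / total for v in values]


# Two hypothetical jurors; every number below is made up for illustration.
juror_a = ([(0.8, 1), (0.3, 0)], [(0.6, 0.5)])
juror_b = ([(0.5, 1), (0.5, 0)], [(0.9, 0.5)])
weights = verisimilitude_weights([juror_a, juror_b])
print(weights)
```

    Note that every proposition counts equally toward the average here, and that
    the inputs presuppose knowledge of truth values and objective probabilities.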

    While we have succeeded in measuring the relative quality of a juror's

    probability function in some clear sense, as a method of calculating weights

    this proposal is easy to disqualify. First of all, the only person who could

    actually perform the required weighting would have to be a veritably omniscient

    agent, equipped not only with profound knowledge of truth values and objective

    probabilities for the relevant propositions but also with profound knowledge of

    the jurors' probability functions. To avail ourselves of such a deus ex machina,

    even in an idealized context like our present one, is perfectly absurd. (If an

    omniscient agent is available, why bother with calculating probabilities?)


    We clearly make a strong assumption at this point in assuming the existence of objective

    probabilities for such conditionals. In that this proposal is shortly to be rejected on other grounds,

    I ignore this particular source of difficulty here.


    Equally troubling is the obvious fact that the various propositions that p_J maps

    to probabilities are almost certainly not of equal importance to the overall

    adequacy of the probability J posts for the defendant's guilt: how is the unequal

    importance of these propositions to be reflected in our ultimate assessment of the

    quality of J's function?

    Thus we find ourselves facing again the question how our jurors are to be

    weighted for the purposes of weighted averaging in our present example. L&W

    ([1981], p. 21) suggest that in fixing an agent's weight, his track record as an

    epistemically competent agent could be considered (along with 'more diffuse

    impressions of intellectual ability'). The suggestion that we appeal to agents'

    track records in the estimation of weights seems an eminently reasonable

    proposal that we have not yet considered. Track records may be supposed,

    at least in some cases, to be publicly available information. How might we

    define the quality of a probability function in terms of the track record of the

    agent who possesses that function?

    At this point I propose that a suggestion of L&W's might be developed to

    suit our current purposes. Toward the end of their 1981 book they quickly

    sketch several suggestions as to how weights might be calculated; see fn. 3

    for discussion of one of these. Here is another:

    Suppose that n individuals are attempting with unbiased devices of

    differing accuracy to measure a quantity μ. Assume further that their

    estimates a_1, a_2, ..., a_n of this number may be regarded as realizations

    of a sequence of independent random variables X_1, ..., X_n, with variances

    σ_i^2. Variances are often used as 'performance' measures in estimation

    problems, reliability varying inversely with variance magnitude. If this

    measure of reliability is adopted by the group, the group should adopt as

    their consensual estimate of μ the number w_1 a_1 + ... + w_n a_n where the

    weights w_i are chosen to minimize the variance σ^2 of the random variable

    w_1 X_1 + ... + w_n X_n. It is easy to check that

    σ^2 = w_1^2 σ_1^2 + ... + w_n^2 σ_n^2, and a

    little partial differentiation then shows that σ^2 is minimized when

    w_i = (1/σ_i^2) / (1/σ_1^2 + ... + 1/σ_n^2). In decision making contexts like the one under

    discussion, one might initially have been inclined to endorse the estimate

    of the individual with smallest variance as the rational group estimate. Yet

    such a policy is demonstrably inferior (with respect to variance minimiza-

    tion) to the use of the above weighted average, indicating the wisdom of

    collective deliberation on a single matter, even when some individuals are

    more expert at this matter than others ([1981], p. 139).
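
    The 'little partial differentiation' mentioned in the passage can be sketched as

    follows. The sketch assumes, as the passage implies but does not state, that the

    weights are constrained to sum to one; the Lagrange multiplier $\lambda$ is notation

    introduced here for illustration and is not L&W's.

```latex
\begin{align*}
\text{minimize}\quad & \sigma^2 = \sum_{i=1}^{n} w_i^2 \sigma_i^2
  \quad\text{subject to}\quad \sum_{i=1}^{n} w_i = 1, \\
\frac{\partial}{\partial w_i}\Bigl[\sum_{j=1}^{n} w_j^2 \sigma_j^2
  - \lambda\Bigl(\sum_{j=1}^{n} w_j - 1\Bigr)\Bigr]
  &= 2 w_i \sigma_i^2 - \lambda = 0
  \;\Longrightarrow\; w_i = \frac{\lambda}{2\sigma_i^2}, \\
\sum_{i=1}^{n} w_i = 1
  &\;\Longrightarrow\; \frac{\lambda}{2} = \frac{1}{\sum_{j=1}^{n} 1/\sigma_j^2}, \\
w_i &= \frac{1/\sigma_i^2}{1/\sigma_1^2 + \dots + 1/\sigma_n^2},
\qquad
\sigma^2_{\min} = \frac{1}{\sum_{j=1}^{n} 1/\sigma_j^2} \;\le\; \min_i \sigma_i^2 .
\end{align*}
```

    The final inequality holds because the sum of the inverse variances is at least as

    large as any single one of them, which is the sense in which deferring to the single

    most reliable individual is inferior with respect to variance.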

    (Readers interested in a proof of the claim that these weights indeed serve to

    minimize variance are directed to Hoel ([1971], pp. 128-9).) L&W have

    indeed cited an example in which weighted averaging may be rationally

    deployed, though they do not note the fact that this example seems not to be

    one in which what is estimated is a probability for a theory; rather, what is

    estimated by weighted averaging is the value for some magnitude. How might

    this proposal be developed so as to apply to probability estimation? We might


    deploy this apparatus with the proviso that the measured magnitude is the

    'correct epistemic probability' of some theory given existing evidence, and

    where such true probabilities exist (and become subsequently known) this may

    be a viable strategy. But the assumption that correct probabilities exist is a

    quite troubled one in many circumstances, including circumstances in which

    experts disagree sharply about probabilities (and such circumstances are

    precisely where we might hope weighted averaging to be most useful).


    Thus I propose that the estimated magnitude be regarded as the truth value

    of the theory T whose probability is being estimated, where the truth value is 1

    if T is true and 0 if T is false. The posting of probabilities now becomes a

    truth value 'measurement' of sorts: the closer a posted probability is to the truth

    value of T, the better that measurement.

    Let us now return to our jury story and enrich it with some additional

    assumptions. Let us suppose that each of our N-many jurors has in fact

    previously served on juries for many similar cases. We might imagine that

    there is a professional juror system in which jurors specialize in adjudicating a

    particular sort of trial: each of our N jurors, let us assume, has served as a

    juror for many similar tax evasion trials (including trials of a similar degree of

    'difficulty' with respect to how much deliberational skill is required to deliver

    a wise probability). For some significant subset of these trials, moreover, the

    eventual guilt or innocence of the defendant was determined beyond doubt (in

    each case, after the jury had posted its probabilities). These assumptions allow

    us to apply the minimal variance weighting technique to our probabilistic jury.

    Each juror $J_i$, we now suppose, has a variance $\sigma_i^2$ which is inversely proportional

    to that juror's reliability as a poster of probabilities. We thus calculate weights

    in the way sketched above and use weighted averaging to determine an

    updated probability that the defendant is guilty. We are assured thus that the

    variance of the weighted average will be less in the long run than that of any

    particular juror, and the rationality of weighted averaging for probability

    updating is, in this precise sense, assured.
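
    The procedure just described can be illustrated with a minimal sketch. It rests on

    an assumption the text does not spell out: each juror's variance is estimated as the

    mean squared gap between the probabilities that juror posted in earlier trials and

    the truth values later established (1 for guilty, 0 for innocent). The function

    names, juror labels and all numbers below are hypothetical.

```python
def estimate_variance(posted, outcomes):
    """Mean squared gap between past posted probabilities and truth values.

    posted   -- probabilities the juror posted in earlier trials
    outcomes -- truth values later established (1 = guilty, 0 = innocent)
    Assumption: this track-record score stands in for the juror's variance.
    """
    if not posted or len(posted) != len(outcomes):
        raise ValueError("need one known outcome per posted probability")
    return sum((p - t) ** 2 for p, t in zip(posted, outcomes)) / len(posted)


def min_variance_weights(variances):
    """Weights proportional to 1/variance, normalised to sum to one."""
    if any(v <= 0 for v in variances):
        raise ValueError("every variance must be positive")
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def weighted_average(weights, probabilities):
    """Weighted average of the probabilities posted in the present trial."""
    return sum(w * p for w, p in zip(weights, probabilities))


# Hypothetical data: five earlier trials with known verdicts, three jurors.
past_outcomes = [1, 0, 1, 1, 0]
past_postings = {
    "juror_1": [0.9, 0.1, 0.8, 0.9, 0.2],
    "juror_2": [0.6, 0.4, 0.7, 0.5, 0.5],
    "juror_3": [0.5, 0.5, 0.5, 0.5, 0.5],
}
current_postings = [0.8, 0.6, 0.5]  # probabilities of guilt in the present trial

variances = [estimate_variance(p, past_outcomes) for p in past_postings.values()]
weights = min_variance_weights(variances)
group_probability = weighted_average(weights, current_postings)
# Variance of the weighted average under the independence assumption.
group_variance = 1.0 / sum(1.0 / v for v in variances)

print("weights:", [round(w, 3) for w in weights])
print("group probability of guilt:", round(group_probability, 3))
print("group variance vs. best juror:", round(group_variance, 4), round(min(variances), 4))
```

    Two caveats apply to this sketch. A juror with a perfect record would have an

    estimated variance of zero, which the weight formula cannot handle, so the code

    rejects that case rather than guessing. And the variance comparison relies on the

    independence assumption in the quoted passage, which jurors deliberating on shared

    evidence may satisfy only approximately.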

    Whereas the weighting technique in the mapmaking scenario measured

    weights in terms of disjoint evidence possession, this scenario imagines that

    the evidence (i.e. the information presented in the trial) is shared between all

    participants; the weighting technique reflects our estimate of how likely it is

    that an agent will deliberate upon that evidence so as to post a probability close

    to the truth value of the proposition that the defendant is guilty. Thus I refer

    below to the track-record based weighting technique as a measurement of the

    'deliberational skill' of the agents.


    To expand on the notion of a correct epistemic probability: one will exist for T when there are

    codified and uncontroversial procedures for calculating the probability of T together with a well

    defined and uncontroversial body of evidence relevant to T. Consider, for example, the probability

    that an agent will win a fair lottery with n tickets sold; the correct epistemic probability for this

    claim is 1/n. The point here is that such correct probabilities will of course often fail to exist,

    either because there is no codified procedure for probability estimation or because the evidence

    relevant to T is in some way controversial. The non-existence of correct epistemic probabilities

    is a familiar point to Bayesians (and critics of Bayesianism); it is just the familiar point that

    there is no general procedure for fixing prior probabilities.

    4 Applications for the two scenarios

    The mapmaking scenario corresponds, as noted, to a case in which agents with

    identical starting information begin to assimilate a large and diffuse body of

    evidence. The evidence in this example has the quality of being easy to acquire

    for any mapmaker with the time and willingness to compile it. This ease,

    however, is counterbalanced by the great quantity of relevant data that must be

    acquired. It is easy to identify cases of knowledge pursuit that have this

    structure. Imagine an historian who presents a long and detailed narrative

    on, say, the history of Australia. The community of historians seeks to assess

    the probability that the narrative is correct. We may well suppose that the

    narrative will be read by various Australian historians with disjoint areas of

    expertise; their expertise could be divided along temporal, regional or other

    lines. We might then suppose that each reader is weighted with respect to the

    relative quantity of expert knowledge he (disjointly) possesses vis-à-vis others,

    and then enjoin each historian to declare that the account is accurate or not with

    respect to his area; thus weighted averaging (specifically, the alternative

    approach) could proceed more or less as in the case of our mapmakers, with

    the same conditions on its application.


    In so far as the mapmaking scenario was one in which we imagined the various mapmakers start

    with the same background beliefs (or probability function), we must tell this story as follows: the

    various readers begin their study of Australia with the same background beliefs (as they pertained

    to Australia's history), and thereafter acquire disjoint bodies of historical information prior to

    assessing the narrative in our example. (It is not essential to the example that their historical

    knowledge literally be disjoint, but only that each historian evaluates the narrative from the

    standpoint of their particular area of expertise, which we assume is different from the others'.)


    The probabilistic jury corresponds to a case in which agents begin with

    different probability functions but then reflect on a single, relatively small

    body of evidence. Thus this example is structurally similar to cases in which a

    well defined body of evidence is reflected upon by a group of agents with

    somewhat different backgrounds, all presumably competent by some minimum

    standard. Differences in assigned weights reflect differences in 'deliberational

    skill' rather than differences in the evidence possessed. Examples (other than

    real juries) would include scientific communities consisting of agents who each

    consider how likely a particular theory is based on a publicly available body of

    evidence, but also any group of experts who must collectively post a probability

    based on shared evidence (engineers declaring the probability that a certain

    project can be accomplished under a certain price, physicians declaring the

    probability that a patient will recover from his illness under a certain therapy,

    etc.). There can be little question that such agents are evaluated by their peers

    with respect to their relative authority, and this proposal shows how such

    evaluations may be rationally deployed in communal probability updating. Of

    course it may be objected that the mathematical technique used above to

    calculate minimal variance weights is not actually applied in real life examples,

    but this technique is simply a mathematical development of the basic idea that

    those agents are weighted more heavily who have delivered relatively more

    accurate probabilities in the past, and something like this basic procedure is

    surely part of real life examples.

    5 Mixed weights

    We have explored two strategies for fixing weights, and now must confront the

    following question: given some actual scenario involving multiple agents who

    have posted probabilities for some theory, how would we (or the community)

    choose which weighting method to use? An obvious way to answer this

    question would be to refer to the sort of information that is available in

    the actual scenario. If we possess information about the extent to which the

    various agents are (disjointly) in possession of the relevant evidence, we will

    use the evidence-based method used in the mapmaking scenario. If we possess

    information about agents' track records in posting probabilities for similar

    theories, we will presumably choose to use the track record-based method.

    Lacking either kind of evidence, we should probably refrain from assigning

    any weights at all.

    But suppose we have both sorts of information available. How would these

    two sorts of information be combined to compute a set of 'mixed weights'

    which would accommodate both pools of information? It might be tempting to

    simply assign two weights to each agent, one based on each weighting

    technique, and then declare the average of the agent's two weights to be the

    agent's actual weight (a community's set of weights thus computed will

    of course sum to one). This procedure, however, is too simple. It would

    effectively assume that in this actual scenario, the two types of information

    about the agent's competence are of equal importance. But this assumption may

    very well be false, as we will now see.

    There are cases in which what matters primarily (with respect to the

    credibility of an agent's probability) is not how much evidence the agent

    possesses that is not possessed by others but rather how much deliberational

    skill (as measured by the track-record technique) that agent possesses. There are

    other cases in which deliberational skill is less important than the possession of

    evidence that is not possessed by others. The primary differences between these

    two sorts of cases concern (1) how much deliberational skill is required to


    competently assess the significance of a given body of evidence and (2) how

    likely it is that the procurement of new evidence will improve the probability

    estimation process. Cases in which deliberational skill is more important than the

    possession of evidence not possessed by others include cases in which it is

    judged that additional evidence is unlikely to affect the probability estimation

    process very much. Consider a group of pharmacologists who are interested in

    the toxicity of a particular substance to human beings; they have surveyed the

    results of extensive and thorough toxicity testing of the substance on a variety of

    laboratory animals. Now a maverick pharmacologist arrives and declares that he

    alone is in possession of pertinent evidence that he has acquired by testing the

    substance on various animals that are not standardly used in laboratory testing:

    he has tested the substance on ostriches, panda bears, and moles. Even if he truly

    claims to be the sole possessor of a wealth of such information, he may not

    receive a weight proportionate to the sheer quantity of this evidence. Such

    evidence may well be judged to be, while not irrelevant, more or less superfluous

    given the totality of evidence on standard laboratory animals that already is

    publicly accessible. What matters in this context is the ability to competently

    assess the publicly known evidence, and pharmacologists should be weighted

    primarily (though perhaps not exclusively) on this basis.

    Cases in which the possession of evidence not possessed by others is more

    important than deliberational skill include cases with two features: first, little

    deliberational skill is required for the competent assessment of evidence, and

    second, the evidence uniquely possessed by the agent is believed to be of

    considerable importance to the estimation of the probability. The mapmaking

    scenario is clearly intended to be a case of this type, but it is easy to imagine

    others. Imagine a group of anthropologists who are interested in the culture of

    a particular native community; the anthropologists have differing degrees of

    deliberational skill. One of these anthropologists, however, has had extensive

    contact with the members of this community: she is fluent in their language and

    has lived among them for years. She has, for example, been extensively exposed

    to the folklore of the community. Her views about the folklore of that community

    should thus be heavily weighted without much reference to her deliberational

    skill qua anthropologist. This is because the content of a culture's folklore is

    something that can be thoroughly known without bringing to bear the sort

    of deliberational skill that anthropologists would value more highly in

    other contexts, and because such first hand acquaintance with folklore is

    indisputably likely to improve the estimated probability of some account of that

    folklore.


    My apologies to anthropological sophisticates who may, for all I know, justifiably believe this

    claim to be false. The example does not, I believe, depend on its being literally true in order to

    make its point.


    We thus require a method of calculating mixed weights which is sensitive to

    local distributions of emphasis between evidence possession and deliberational

    skill. Such a method will of course only be applicable in cases in which both sorts

    of information are available regarding each agent in the community. In such cases

    I propose that each agent $i$ receive two weights, $w_i^{e}$ and $w_i^{t}$, which are determined

    according to the evidence-based and track record-based methods respectively.

    Agent $i$'s mixed weight $m_i$ will thus be a weighted average of these two:

    $m_i = a w_i^{e} + (1 - a) w_i^{t}$

    Coefficient $a$ reflects the amount of emphasis accorded to disjoint evidence

    possession relative to deliberational skill.
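
    The mixed-weight proposal can be illustrated with a short sketch. The function name

    and the numbers are hypothetical, and the restriction of the coefficient a to the

    interval from 0 to 1 is an assumption made here so that the result is a genuine

    weighted average of the two weights.

```python
import math


def mixed_weights(evidence_weights, track_weights, a):
    """Blend two weight lists: a * evidence-based + (1 - a) * track-record-based.

    Assumption: 0 <= a <= 1, so each mixed weight lies between the agent's two weights.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie between 0 and 1")
    if len(evidence_weights) != len(track_weights):
        raise ValueError("need both weights for every agent")
    return [a * we + (1.0 - a) * wt
            for we, wt in zip(evidence_weights, track_weights)]


# Hypothetical weights for three agents; each list sums to one.
evidence_based = [0.6, 0.2, 0.2]
track_record_based = [0.1, 0.5, 0.4]

# a = 0.2: deliberational skill is judged to matter more than disjoint evidence.
skill_heavy = mixed_weights(evidence_based, track_record_based, a=0.2)
# a = 0.5 reproduces the simple average of the two weights.
simple_average = mixed_weights(evidence_based, track_record_based, a=0.5)

print("skill-heavy mixed weights:", [round(w, 2) for w in skill_heavy])
print("sums to one:", math.isclose(sum(skill_heavy), 1.0))
```

    Because each input list sums to one, the mixed weights also sum to one for any

    admissible a. Setting a to 0.5 recovers the simple averaging of the two weights that

    was set aside above as too simple, which shows that it is one special case of the

    more general scheme.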

    6 Weighting gone wrong

    I now propose to consider two basic objections to the weighted averaging

    project. These are: (1) it is fundamentally unscientific to estimate probabilities

    by weighted averaging because such a method will promote an unhealthy

    elitism in which the views of prestigious scientists are accepted merely on the

    basis of prestige-indicators like institutional affiliation, and (2) the weighted

    averaging technique misunderstands the way in which attributions of prestige

    function in scientific deliberations. We deal with each in turn.

    On objection (1): one might imagine that one way in which weights are

    distributed is on the basis of scientists' 'prestige', where this quality is reflected

    by factors like their institutional affiliation or educational pedigree. But to

    privilege scientists' opinions (including their probabilities) on the basis of factors

    like these is blatantly unscientific: it will lead to an uncritical acceptance of

    certain views merely because they are propounded by 'prestigious' people.


    Peters and Ceci ([1982, 1985]) present evidence that scientific papers in one discipline were

    accepted on the basis of the institutional affiliation of the authors rather than the intrinsic merit of

    the papers. Helen Longino ([1990], p. 68) comments that 'Presumably the reviewers . . . assume

    that someone would not get a job at X institution if that person were not a top-notch investigator,

    and so his/her experiments must be well done and the reasoning correct'. At first blush, such a

    case might seem an excellent datum with which to argue against the use of weighted averaging in

    general. 'See what happens', a critic might insist, 'when assessments of scientific claims are

    grounded not on a careful examination of the evidence on which they are based but on an

    assessment of the so-called credibility of the investigators'. As I argue below, however, the

    correct remedy for this problem is not to abandon weighted averaging, but to adopt a superior

    weighting procedure.


    However, such a criticism can easily be turned around. The cause of the

    problem we are describing, in fact, is not that the credibility of various agents

    with differing views would be taken into account in assessing the relevant

    scientific claims. It is, rather, that too few agents are weighted in the

    assessment: only 'prestigious' authors are weighted, and their weight is assessed

    all too simply on the grounds of prestige-indicators like institutional affiliation

    (a weighting procedure that corresponds to neither of those discussed in this

    paper). The solution to defective weighting procedures is not to abandon the

    use of weighting in general, I would argue, but to improve the calculation and

    distribution of weights. The remedy for such elitism is a more democratic

    weighting procedure that spreads cognitive authority across a greater variety of

    agents, recognizing the often legitimate claims to authority made by scientists

    at institutions not generally regarded as prestigious.


    On criticism (2): perhaps it is the case that although an agent's credibility, as

    assessed by others, clearly plays some role in scientific dialogue about claims

    that agent makes, that role is not accurately portrayed in terms of weighted

    averaging. Rather, an agent Y's assessment of another agent X's credibility

    serves to determine how much of a hearing Y accords to X's views on the

    matter at hand. If Y assigns a high weight to X, this means that Y will pay

    careful attention to X's views in determining Y's own views; conversely, if Y

    assigns X low weight, Y will not waste too much time in pondering the

    evidence and arguments presented by X. This sort of weighting is ultimately

    consistent with epistemological individualism, for while Y may rely on X to

    provide Y with evidence and arguments Y would not have considered on her

    own, ultimately Y's probability may reflect only Y's assessment of the

    issues (rather than a process of weighted averaging).

    No doubt this sort of 'weighting' does transpire, but in my view it cannot

    be the whole story. For ultimately rational agents must concede that there are

    some contexts in which, try as they might, they will never absorb all the

    knowledge, arguments and expertise all their colleagues possess, even on

    subjects on which the agents are themselves experts. At such points there is

    no choice but to concede that they must assign a non-zero weight to their

    colleagues' probabilities even when they disagree with the probabilities those

    colleagues post. One could only do otherwise by effectively assigning zero

    weight to colleagues despite the impressive credentials those colleagues may

    possess. While some scientists may choose this path, I would argue that the

    wisest ones usually do not.

    It should be noted in passing that one of the central theses of Longino's book Science as Social

    Knowledge squares neatly with the claims of this paper. Longino argues that the objectivity of

    science consists in large part in the fact that scientific dialogue occurs between a plurality of

    agents with differing perspectives. She writes that 'only if the products of inquiry are understood

    to be formed by the kind of critical discussion that is possible among a plurality of individuals

    about a commonly accessible phenomenon, can we see how they count as knowledge rather than

    opinion' ([1990], p. 74). She concludes that 'Scientific knowledge is, therefore, social

    knowledge. It is produced by processes that are intrinsically social' (ibid., p. 75). Although Longino

    does not discuss the use of weighted averaging per se (she focuses primarily on interpersonal

    dialogue about method), her analysis is entirely applicable to it. Her argument that the scientific

    community requires the incorporation of a variety of 'voices' is a call for a particular sort of

    weighting procedure. Longino's community-based conception of inquiry stands in sharp

    contrast to the conception based on 'epistemological individualism' which Hardwig claimed was at

    the heart of traditional epistemology (see Section 1). A related thesis of Longino's, that

    scientific objectivity requires that scientific communities incorporate the voices of women

    and minorities, can also clearly be understood as a call for a more democratic weighting

    procedure (ibid., pp. 78f.).


    7 Conclusion

    Philosophers who take seriously the hypothesis that scientific discourse is and

    ought to be pluralistic should not ignore the technique of weighted averaging in

    probability estimation. Though the technique is no panacea, it provides a

    rigorous tool for representing one way in which differing perspectives in

    inquiry clash and mesh with each other.

    Department of Philosophy

    Southern Methodist University

    Dallas, TX 75275





