Richard Carrier Proving History Review

Reading Dr. Carrier's Proving History

    A Review From a Bayesian Perspective

    Tim Hendrix

    May 4, 2013

    Introduction

Dr. Richard Carrier's new book, Proving History (Prometheus Books, ISBN 978-1-61614-560-6), is the first of two volumes in which Dr. Carrier investigates the question of whether Jesus existed or not. According to Dr. Carrier, the current state of Jesus studies is one of confusion and failure, in which all past attempts to recover the true Jesus have failed. The main problem, according to Dr. Carrier, is the methods employed: past studies have focused on developing historical criteria to determine which parts of a text (for instance the Gospel of Luke) can be trusted, but according to Dr. Carrier all the criteria and their use are flawed. This has led to many incompatible views of what Jesus said or did, and accordingly the question "Who was Jesus?" has many incompatible answers: a Cynic sage, a Rabbinical holy man, a Zealot activist, an apocalyptic prophet and so on.

Richard Carrier proposes that Bayes' theorem (see below) should be employed in all areas of historical study. Specifically, Dr. Carrier proposes that the problems plaguing the methods of criteria can be solved by applying Bayes' theorem, and that this will finally allow the field of Jesus studies to advance. What this progress will look like, and specifically how the question of whether Jesus existed should be answered, will be the subject of his second volume.

I was interested in Dr. Carrier's book both because I have a hobby interest in Jesus studies and found his other book on early Christianity, Not the Impossible Faith, very enjoyable and informative, but certainly also because Bayesian methods were the focus of my PhD and are my current research area. My main focus in writing this review will therefore be on the technical content relating to the use of Bayes' theorem and its applicability to historical questions as argued in the book.

The book is divided into six chapters. Chapter one contains an introduction which argues that historical Jesus studies in its present form is rife with problems; chapter two introduces the historical method as a set of 12 axioms and 12 rules; chapter three introduces Bayes' theorem; chapter four discusses historical methods and seeks to demonstrate with formal logic that all valid historical methods reduce to applications of Bayes' theorem; chapter five goes through historical criteria often used in Jesus studies and concludes that each is valid only insofar as it agrees with Bayes' theorem. Finally, chapter six, titled "the hard stuff", discusses a number of issues that arise in applying Bayes' theorem, as well as Richard Carrier's proposal for how the frequentist and Bayesian views of probability can be unified.

Tim Hendrix is not my real name. For family reasons I prefer not to have my name associated with my religious views online. All quotations are from Proving History.

In reviewing this book I wish to focus on what I believe are the book's main contributions. The first point is that Bayes' theorem not only applies to the historical method, but that it can be formally proven that all historical methods can be reduced to applications of Bayes' theorem, and, importantly, that thinking in this way will give tangible benefits compared to traditional historical methods.

The second point is how Dr. Carrier addresses several philosophical points that are raised throughout the book, for instance the unification of the frequentist and Bayesian views of probability. Since I am not a philosopher I will not be able to say much on the philosophical side, but I do think there are a number of points which fall squarely within my field that should be raised.

However, before I proceed I will first briefly touch upon the Bayesian view of probability and Bayes' theorem.

1 Bayes' Theorem

I wish to begin with a point that may seem pedantic at first, namely why we should think Bayes' theorem is true at all. Dr. Carrier introduces Bayes' theorem as follows:

In simple terms, Bayes's Theorem is a logical formula that deals with cases of empirical ambiguity, calculating how confident we can be in any particular conclusion, given what we know at the time. The theorem was discovered in the late eighteenth century and has since been formally proved, mathematically and logically, so we now know its conclusions are always necessarily true if its premises are true. (Chapter 3)

Unfortunately there are no references for this section, and so it is not explained what definitions Bayes' theorem makes use of, which assumptions it rests upon, and how it is proven. For reasons I will return to later, I think this omission is problematic. However, shortly after the above quotation, just before introducing the formula for Bayes' theorem, we are given a reference:

But if you do want to advance to more technical issues of the application and importance of Bayes's Theorem, there are several highly commendable texts[9]

Footnote 9 has as its first entry E.T. Jaynes' Probability Theory from 2003. I highly endorse this choice and I think most Bayesian statisticians would agree. E.T. Jaynes was not only an influential physicist, he was a great communicator, and his book is my preferred reference for students. In his book, Jaynes argues that Bayes' theorem is an extension of logic, and I will attempt to give the gist of Jaynes' treatment of Bayes' theorem below. Interested readers can find an almost complete draft of Jaynes' book freely available online[1]:

Suppose you want to program a robot that can reason in a sensible manner. You want the robot to reason quantitatively about true/false statements such as:

A = The next flip of the coin will be heads

B = There has been life on Mars

C = Jesus existed.

A basic problem is that neither we nor the robot have perfect knowledge, and so it must reason under uncertainty. Accordingly, we want the robot to have a notion of the "degree of plausibility" of some statements, given other statements that are accepted as true.

The most important thing to notice in the above is that I have not defined what the "degree of plausibility" is. Put in other words, the goal is to analyse what the degree of plausibility of some statement could possibly mean, and to derive results from that. Jaynes' treatment in Probability Theory is both thorough and entertaining[2], and at the end he arrives at the following three desiderata that a notion of degree of plausibility must fulfill:

• The degree of plausibility must be described by a real number
• It must agree with common sense (logic) in the limit of certainty
• It must be consistent

Consistency implies that if we have two ways to reason about the degree of plausibility of a statement, these two ways must give the same result. After some further analysis he arrives at the result that the degrees of plausibility of statements A, B, C, . . . can be described by a function P, and this function must behave as ordinary probabilities usually do, including obeying Bayes' theorem:

P(A|B) = P(B|A)P(A) / P(B)

where the notation P(A|B) means the degree of plausibility of A given B. The key point is that Bayes' theorem now (if we accept what goes into the derivation) applies not only to flips of coins, but to all assignments of degrees of plausibility to the true/false statements we may consider, and the interpretation that a probability is really a degree of plausibility is then called the Bayesian interpretation of probabilities. It is in this sense that Jaynes (as well as most others who call themselves Bayesians) considers Bayes' theorem an extension of logic.

[1] c.f. http://bayes.wustl.edu/etj/prob/book.pdf
[2] It should be noted that the argument is not original to E.T. Jaynes; see R.T. Cox's work from 1946 and 1961, or Jaynes' book, for a detailed discussion of the history.

These definitions may appear somewhat technical and irrelevant at this point; however, their importance will hopefully be apparent later. For now let us make a few key observations:

• Bayes' theorem does not tell us what any particular probability should be
• Bayes' theorem does not tell us how we should define the statements A, B, C, . . . in a particular situation

What Bayes' theorem does provide us with is a consistency requirement: if we know the probabilities on the right-hand side of the above equation, then we know what the probability on the left-hand side should be.
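To make the consistency requirement concrete, here is a toy sketch in Python; the scenario and all numbers are invented for illustration and are not from Proving History:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical statements: A = "the coin is double-headed", B = "we observe heads".
p_a = 0.01                                 # prior degree of plausibility of A
p_b_given_a = 1.0                          # a double-headed coin always lands heads
p_b = p_b_given_a * p_a + 0.5 * (1 - p_a)  # law of total probability: P(B)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 4))               # 0.0198: one head barely moves the prior
```

The point is exactly the one made above: Bayes' theorem converts the probabilities on the right-hand side into the one on the left, but it is silent about where those input numbers come from.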

2 Is all historical reasoning just Bayes' theorem?

First and foremost, I think it is entirely uncontroversial to say that Bayes' theorem has something important to say about reasoning in general, and so also about historical reasoning. For instance, by going through various toy examples, Bayes' theorem provides a powerful tool for weeding out biases and logical fallacies we are all prone to make.

However, I believe Dr. Carrier has a more general connection between BT and the historical method in mind. In chapter 3:

Since BT is formally valid and its premises (the probabilities we enter into it) constitute all that we can relevantly say about the likelihood of any historical claim being true, it should follow that all valid historical reasoning is described by Bayes's Theorem (whether historians are aware of this or not). That would mean any historical reasoning that cannot be validly described by Bayes's Theorem is itself invalid (all of which I'll demonstrate in the next chapter). There is no other theorem that can make this claim. But I shall take up the challenge of proving that in the next chapter.

    and later, just before the formal proof:

(...) we could simply conclude here and now that Bayes's Theorem models and describes all valid historical methods. No other method is needed, apart from the endless plethora of techniques that will be required to apply BT to specific cases, of which the AFE and ABE represent highly generalized examples, but examples at even lower levels of generalization could be explored as well (such as the methods of textual criticism, demographics, or stylometrics). All become logically valid only insofar as they conform to BT and thus are better informed when carried out with full awareness of their Bayesian underpinning. This should already be sufficiently clear by now, but there are always naysayers. For them, I shall establish this conclusion by formal logic

The crux of the logical argument seems to be this. Dr. Carrier defines variables C, D and E, of which only D and E will be of interest to us. The relevant part of the argument is as follows:

Formally, if C = a valid historical method that contradicts BT, D = a valid historical method fully modeled and described by (and thereby reducible to) BT, and E = a valid historical method that is consistent with but only partly modeled and described by BT, then:

P8 Either C, D, or E. (proper trichotomy)

P10 If P5 and P6, then ¬E.
P11 P5 and P6.

C4 Therefore, ¬E

To establish premises P5 and P6, we consider a historical claim h, a piece of evidence e and some background knowledge b. The premises are as follows:[3]

P5 Anything that can be said about any historical claim h that makes any valid difference to the probability that h is true will either (a) make h more or less likely on considerations of background knowledge alone or (b) make the evidence more or less likely on considerations of the deductive predictions of h given that same background knowledge or (c) make the evidence more or less likely on considerations of the deductive predictions of some other claim (a claim which entails h is false) given that same background knowledge.

P6 Making h more or less likely on considerations of background knowledge alone is the premise P(h|b) in BT; making the evidence more or less likely on considerations of the deductive predictions of h on that same background knowledge is the premise P(e|h.b) in BT; making the evidence more or less likely on considerations of the deductive predictions of some other claim that entails h is false is the premise P(e|¬h.b) in BT; any value for P(h|b) entails the value for the premise P(¬h|b) in BT; and these exhaust all the premises in BT.

[3] I have chosen to follow Dr. Carrier's typesetting, and accordingly, for propositions such as A = "It will rain tomorrow" and B = "It will be cold tomorrow", the notation ¬A means not-A (it will not rain tomorrow) and A.B means A and B (it will be rainy and cold tomorrow).


I think we can summarize the argument as follows: consider a valid historical method. Either the historical method is fully or only partly described by Bayes' theorem. We can rule out the latter possibility, E, for the following reason: anything that can be said about the probability a historical claim h is true given some background knowledge b and evidence e, denoted by P(h|e.b), will affect either P(h|b), P(e|h.b), P(e|¬h.b) or P(¬h|b). However, these values fully determine P(h|e.b) according to Bayes' theorem:

P(h|e.b) = P(e|h.b)P(h|b) / [P(e|h.b)P(h|b) + P(e|¬h.b)P(¬h|b)]

and so the method must be fully included in Bayes' theorem, proving the original claim.

I see two problems with the argument. The first is somewhat technical but needs to be raised: though it is not stated explicitly, Dr. Carrier tacitly assumes that the only thing we are interested in about a claim h is the probability that h is true. However, I see no reason why this should be the case. For instance, Dempster-Shafer theory establishes the support and plausibility (the latter term is used to a different effect than I did in the introduction) of a claim, and multi-valued logics attempt to define and analyze the graded truth of a proposition; all of these are concepts different from probability. It is not at all apparent why these concepts can be ruled out as being either not useful or as reducing to Bayes' theorem. For instance, suppose we define Jesus as "a highly inspirational prophet"; a great many in my field would say the modifier "highly" is not well analysed in terms of probabilities but requires other tools. More generally, it goes without saying that we do not have a general theory of cognition, and I would be very surprised if that theory turned out to reduce to probability theory in the case of history.

The second problem is more concrete and relates to the scope of what is being demonstrated. Let's assume we are only interested in the probability of a claim h being true. As noted in the previous section, Bayes' theorem only says something about how the quantity on the left-hand side of the above equation, P(h|e.b), must be related to those on the right-hand side, and Dr. Carrier is correct in pointing out that any change in P(h|e.b) must (this is pure algebra!) correspond to a change in at least one term on the right-hand side. The problem is that we do not know what those quantities on the right-hand side are numerically, and we cannot figure them out only by applying Bayes' theorem more times. For instance, applying Bayes' theorem to the term P(e|h.b) will require knowledge of P(h|e.b), exactly the term we set out to determine.
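To spell out the circularity, one can apply Bayes' theorem once more, this time to the term P(e|h.b) itself (same symbols as above):

```latex
P(e \mid h.b) = \frac{P(h \mid e.b)\, P(e \mid b)}{P(h \mid b)}
```

and the posterior P(h|e.b), the very quantity we set out to determine, reappears on the right-hand side.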

This, however, seems to severely undercut the importance of what is being demonstrated. Let me illustrate this with an example. Let's say I make a claim such as:

Basic algebra [Bayes's Theorem] models and describes all valid methods for reasoning about unemployment [historical methods]

My proof goes as follows: let X be the number of unemployed people, Y the number of people who are physically unable to work due to some disability, and Z the number of people who can work but have not found work. Now the algebra:

    X = Y + Z

(contrast this equation with Bayes' theorem). I can now make a claim equivalent to P5 and P6: all that can validly be said about X must imply a change in either Y or Z, and I can conclude: all that can validly be said about the number of unemployed people must therefore be described by algebra.

Clearly in some sense this is true; however, it misses nearly everything of economic interest, such as what actually affects the terms Y and Z and by how much. While it is clear that if X changes, at least one of the terms Y or Z has to change, algebra does not tell us which, just as Bayes' theorem does not tell us what the quantities P(e|h.b), P(h|b), . . . actually are, and it does not tell us how the propositions e, h, b should be defined.

Suppose we try to rescue the idea of a formal proof by accepting that the term "a valid historical method" simply means a system (or method) of inference which operates on the probabilities of propositions, without worrying about which propositions are relevant (which Bayes' theorem does not say) or how to obtain these probabilities (which Bayes' theorem does not say either). But if we accept this definition, I see no reason why we could not simply replace the argument in chapter 4 with the following:

Bayesian inference describes the relationship between probabilities of various propositions (c.f. Jaynes, 2003). In particular it applies when the propositions are related to historical events.

This claim would of course be hard to expand into about half a chapter.

It is of course true that Bayesian methods have found wide application in almost all sciences, but this is because Bayesian methods have shown themselves to work. I completely agree with Dr. Carrier that there are reasons to consider how Bayesian methods could be applied to history so as to give tangible results, but the main point is that this must be settled by giving examples of actual applications that offer tangible benefits, just as has been the case in all other scientific disciplines where Bayesian methods are presently applied. This is what I will focus on in the next sections.

3 Applications of Bayes' Theorem in Proving History

To my surprise, Proving History contains almost no applications of Bayes' theorem to historical problems. The purpose of most of the applications of Bayes' theorem in Proving History is to illustrate aspects of Bayes' theorem and show how it agrees with our common intuition. Take for instance the first example in the book, the analysis of the disappearing sun in chapter 3, which seems mainly intended to show how different degrees of evidence affect one's conclusion in Bayes' theorem. The example considers an ahistorical disappearing sun in 1989 with overwhelming observational evidence, and the claimed disappearing sun in the gospels with very little evidence, and shows that according to Bayes' theorem we should be more inclined to believe the disappearance with overwhelming evidence. This is certainly true; however, it is not telling us anything new.

The example which by far receives the most extensive treatment is the criterion of embarrassment, the discussion of which takes up about half of chapter five and ends with a computation of probabilities. I will therefore focus only on this example:

3.1 The criterion of embarrassment

The criterion of embarrassment (EC) is as follows:

The EC (or Embarrassment Criterion) is based on the folk belief that if an author says something that would embarrass him, it must be true, because he wouldn't embarrass himself with a lie. An EC argument (or Argument from Embarrassment) is an attempt to apply this principle to derive the conclusion that the embarrassing statement is historically true. For example, "the criterion of embarrassment states that material that would have been embarrassing to early Christians is more likely to be historical since it is unlikely that they would have made up material that would have placed them or Jesus in a bad light". (Chapter 5)

Dr. Carrier then offers an extended discussion of some of the problems with the criterion of embarrassment, which I found well written and interesting. The problems raised are: (1) the gospels are themselves very late, making it problematic to assume the authors had access to an embarrassing core tradition they felt compelled to write down; (2) we do not know what would be embarrassing to the early church; and (3) would the gospel authors pen something genuinely embarrassing at all?

There then follow treatments of several embarrassing stories in the gospels where Dr. Carrier argues (convincingly, in my opinion) that there can be little ground for an application of the EC. We then get to the application of Bayes' theorem:

Thus, for the Gospels, we're faced with the following logic. If N(T) = the number of true embarrassing stories there actually were in any friendly source, N(¬T) = the number of false embarrassing stories that were fabricated by friendly sources, N(T.M) = the number of true embarrassing stories coinciding with a motive for friendly sources to preserve them that was sufficient to cause them to be preserved, N(¬T.M) = the number of false embarrassing stories (fabricated by friendly sources) coinciding with a motive for friendly sources to preserve them that was sufficient to cause them to be preserved, and N(P) = the number of embarrassing stories that were preserved (both true and fabricated), then N(P) = N(T.M) + N(¬T.M), and P(T|P), the frequency of true stories among all embarrassing stories preserved, = N(T.M)/N(P), which entails P(T|P) = N(T.M)/(N(T.M) + N(¬T.M)). Since all we have are friendly sources that have no independently confirmed reliability, and no confirmed evidence of there ever being any reliable neutral or hostile sources, it further follows that N(T.M) = qN(T), where q < 1, and N(¬T.M) = 1·N(¬T): because all false stories created by friendly sources have motives sufficient to preserve them (since that same motive is what created them in the first place), whereas this is not the case for true stories that are embarrassing, for few such stories so conveniently come with sufficient motives to preserve them (as the entire logic of the EC argument requires). So the frequency of the former must be 1, and the frequency of the latter (i.e., q) must be < 1. Therefore: [Assuming N(T) = N(¬T) and with slight changes to the typesetting]

P(T|P) = N(T.M)/(N(T.M) + N(¬T.M)) = qN(T)/(qN(T) + 1·N(¬T)) = q/(q + 1)

So this is saying that the probability a story is true, given that it is embarrassing, will always be less than 0.5, so the EC actually works in reverse!
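The quoted counting argument is easy to check numerically. A minimal sketch (the values q = 0.3 and N(T) = N(¬T) = 1000 are arbitrary choices of mine, not from the book):

```python
# Numeric check of the counting argument quoted above, with invented numbers.
q = 0.3          # fraction of true embarrassing stories with a motive to preserve them
n_true = 1000    # N(T); the argument assumes N(T) = N(not-T)
n_false = 1000   # N(not-T)

n_tm = q * n_true      # N(T.M): true embarrassing stories that get preserved
n_fm = 1.0 * n_false   # N(not-T.M): every fabricated story gets preserved
p_t_given_p = n_tm / (n_tm + n_fm)

print(round(p_t_given_p, 4))   # 0.2308
print(round(q / (q + 1), 4))   # 0.2308, i.e. q/(q+1), below 0.5 for any q < 1
```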

    3.2 Reality Check

If you read a memoir and it said (1) the author severely bullied one of his classmates for a year, (2) the author once gave a large sum of money to a homeless man, then, all things being equal, which of the two would you be more inclined to believe the author had made up? If the memoir were a gospel, we should be more inclined to believe the story of the bullying was made up; however, this obviously goes against common sense! As Richard Carrier himself points out, sometimes the EC does work, and any computation must at the least be able to replicate this situation.

    3.3 What happened

I think the first observation is that the quoted argument in Proving History does not actually use Bayes' theorem (specifically, it avoids the use of probabilities), but relies on fractions of the sizes of appropriately defined sets. I can't tell why this choice was made, but it is a recurring theme throughout the book to argue for the application of Bayes' theorem and then carry out at least a part of the argumentation using non-standard arguments. Another thing I found confusing was how the sets are actually defined and why they are chosen the way they are. To translate the criterion into Bayes' theorem we first need to define the appropriate variables. As I understand the text, they are defined as follows:

T, F : The story is true (as opposed to fabricated)

Pres : The story was preserved

Em : The story is embarrassing

The discussion carried out in the text now amounts to the following assumptions:

P(Pres|¬T, Em) = 1
P(Pres|T, Em) = q < 1

The first assumption says that the only way someone would fabricate a seemingly embarrassing story is if it serves some purpose, and so it must be preserved; the second says that a true story which seems embarrassing might not serve a specific purpose, and we are not guaranteed it will be preserved. It should be clear now that we are really interested in computing P(T|Pres,Em), the probability a story is true given that it is preserved and seems embarrassing. Turning the Bayesian crank:

P(T|Pres,Em) = P(Pres|T,Em)P(T|Em) / [P(Pres|T,Em)P(T|Em) + P(Pres|¬T,Em)P(¬T|Em)]

= qP(T|Em) / [qP(T|Em) + P(¬T|Em)] = q/(q + 1)

where the last equality assumes P(T|Em) = P(¬T|Em), i.e. an equal number of true and fabricated seemingly embarrassing stories,

from which the result follows. We can try to translate the result into English: suppose the gospel writers started out with/made up an equal number of true and false stories that seem embarrassing today. However, all the seemingly embarrassing stories that are false were made up (by the gospel writers or whoever supplied them with their material) because they were significant, and were therefore preserved, whereas the true seemingly embarrassing stories were preserved/written down by the gospel writers at a low rate, q, and therefore almost all seemingly embarrassing stories that survive to this date are false.
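The same result can be reproduced by simulating the preservation story directly; a Monte Carlo sketch under the model's assumptions (equal priors, and q = 0.3 chosen arbitrarily by me):

```python
import random

# Monte Carlo sketch of the preservation model described above.
random.seed(0)
q = 0.3                      # chance a TRUE embarrassing story is preserved
preserved_true = 0
preserved_false = 0
for _ in range(100_000):
    is_true = random.random() < 0.5          # P(T|Em) = P(not-T|Em) = 1/2
    p_preserve = q if is_true else 1.0       # P(Pres|T,Em)=q, P(Pres|not-T,Em)=1
    if random.random() < p_preserve:
        if is_true:
            preserved_true += 1
        else:
            preserved_false += 1

estimate = preserved_true / (preserved_true + preserved_false)
print(round(estimate, 2), round(q / (1 + q), 2))  # both print as 0.23, up to noise
```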

A reader might notice I have used the phrase "seemingly embarrassing", by which I mean seemingly embarrassing to us. This is evidently required for the argument to work. Consider for instance the assumption P(Pres|¬T,Em) = 1. If Em meant that the story was truly embarrassing to the author, this would mean that false stories made up by friendly sources (are there any?) which were truly embarrassing would always be preserved, a highly dubious assumption and clearly contrary to Dr. Carrier's argument.

A basic problem in the above line of argument is that there is no way to encode the information that a story was actually embarrassing. We are, effectively, analysing the criterion of embarrassment without having any way to express that a story was embarrassing to the author!

"Embarrassing" therefore becomes effectively synonymous with "embarrassing with a deeper literary meaning" (the reader can try substituting this phrase in the previous sections and notice the argument becomes more natural), and the analysis boils down to saying that stories with a deeper literary meaning (that also happen to look embarrassing today) are for the most part made up, except a few that are true and happen to have a deeper meaning by accident.

3.4 Adding Embarrassment to the Criterion of Embarrassment

To call something an analysis of the criterion of embarrassment, we need to include enough expressiveness amongst our variables to capture the basic intuition behind the criterion. I believe the following is minimal:

T, F : The story is true or fabricated

Pres : The story was preserved

Em : The story is seemingly embarrassing (to us)

Tem : The story was truly embarrassing to the author

Lp : The story served a literary purpose (we assume ¬Tem = Lp)

Notice that Tem means something different than Em: Tem means the story was embarrassing to the one doing the preservation, Em means it seems embarrassing to us 2000 years later. To put the EC into words: a person would not preserve something that was actually embarrassing which he knew was false, or in symbols:

P(Pres|¬T, Tem) = 0

The following is always true:

P(Pres, T, Tem|Em) = P(Pres|T,Tem,Em) P(T|Tem,Em) P(Tem|Em)

where I have been rather sloppy in the notation and implicitly assume that variables such as T and Tem can also take the values ¬T and ¬Tem = Lp. The next step is to add simplifying assumptions. I am going to assume:

P(Pres|T,Tem,Em) = P(Pres|T,Tem)
P(T|Tem,Em) = P(T|Tem)

The assumptions here are that our (20th century) interpretation of whether a story is embarrassing or not is secondary to whether it was truly embarrassing. Next, let's look at the likelihood terms. I will assume:

P(Pres|F, Tem) = 0
P(Pres|F, Lp) = l
P(Pres|T, Tem) = c
P(Pres|T, Lp) = 1

The first and last specifications say that an author would never record something truly embarrassing he knew was false, and that he would always record something he knew was true and served a literary purpose. The second specification says the author will (with probability l) include stories that are false but nevertheless serve a literary purpose, and the third that he has a certain candor that makes him sometimes (with probability c) include embarrassing stories he knows are true. Turning the Bayesian crank now gives:

P(T|Pres,Em) = [P(Tem|Em)P(T|Tem)c + P(Lp|Em)P(T|Lp)] / [P(Tem|Em)P(T|Tem)c + P(Lp|Em)P(T|Lp) + P(F|Lp)P(Lp|Em)l]

This is a bit of a mess. Let's begin by assuming we are equally good at determining whether a story is truly embarrassing or serves a literary purpose, i.e. P(Tem|Em) = P(Lp|Em) = 0.5, and that we know nothing of the (conditional) probability a story is true/false, e.g. P(T|Tem) = P(T|Lp) = 0.5. In this case:

P(T|Pres,Em) = (c + 1)/(c + 1 + l)

We can now try to plug in some limits. Assuming the gospel authors have perfect candor and will always report true stories (c = 1), we get:

P(T|Pres,Em) = 2/(2 + l) ∈ [2/3; 1]

so in this case the criterion of embarrassment actually works. Another case might be where the gospel authors have no candor and will always suppress embarrassing stories, c = 0, in which case:

P(T|Pres,Em) = 1/(1 + l) ∈ [1/2; 1]

so the criterion of embarrassment actually works in this limit as well(!). To recover Dr. Carrier's analysis, we need something more. Inspecting the full expression reveals that the easiest thing is to assume something like:

P(T|Lp) = q < 1/2

which is saying that stories that serve a literary purpose are likely to be made up. I suppose the value you think q should have depends on how you view Jesus: do you expect him to have lived the sort of life where many of the things he did or said would have a deeper literary purpose afterwards? Your religious views may influence how you judge that question, to put it mildly. At any rate, this leads to the new expression:

P(T|Pres,Em) = (c + q)/(c + q + (1 − q)l).

It is difficult to directly relate this expression to Dr. Carrier's analysis; however, let's assume a story is preserved with probability 1 if it is fabricated but serves a literary purpose (l = 1), and that a story which is true but also truly embarrassing will never be preserved (c = 0). Then we simply obtain:

P(T|Pres,Em) = q < 1/2

which is qualitatively consistent with Dr. Carrier's result.
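The limiting cases above can be checked by evaluating the final expression directly (a sketch: the formula is the one derived in this section, while the specific numbers are my own choices):

```python
# Evaluate P(T|Pres,Em) = (c + q) / (c + q + (1 - q) * l)
# for the limiting cases discussed in this section.

def p_true(c, q, l):
    """Probability a preserved, seemingly embarrassing story is true."""
    return (c + q) / (c + q + (1 - q) * l)

# Perfect candor (c = 1), equal priors (q = 0.5): a preserved embarrassing
# story is more likely true than not, so the EC works.
print(round(p_true(c=1.0, q=0.5, l=1.0), 4))   # 0.75

# No candor (c = 0) and l = 1: recovers the limit P(T|Pres,Em) = q < 1/2.
print(round(p_true(c=0.0, q=0.3, l=1.0), 4))   # 0.3
```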


3.4.1 Some thorny issues

Dr. Carrier offered one analysis of the EC which indicates embarrassment lowers the probability a story is historical; I included a variable that actually allows for a story to be embarrassing and got the opposite result. My point is not to demonstrate that one of us is right or wrong, but to motivate some questions I think are problematic in terms of applying Bayes' theorem to history:

Do we actually model history: I think both Dr. Carrier's and my analysis contained a term like P(Pres|T, x) (x possibly meaning different things). The model this presumes is something akin to the following: the gospel authors are compiling (or preserving) a set of stories with knowledge of their truth-value and, at least in my case, knowledge of their literary purpose and whether they are embarrassing. However, I think it is uncontroversial to say this is a bad model of how the gospel authors worked. For instance, the gospel authors also made up a good deal of the gospels, changed stories to fit an agenda and so on, and this should also figure in the analysis.

When true is not true: Continuing the above line of thought: with some probability, which we need to estimate, the gospel authors did not know what was true or false per se, because they were writing about events that may have happened 40 years prior. This means that conditioning on a variable T (true) is problematic. "True" seems more likely to mean (with some probability at least) that the statement was something that was being told in the Christian community and believed to be true. This should be included in the analysis.

Where do the stories come from: Continuing the above line of thought, if the gospel authors had access to a set of stories about Jesus, we need to ask where they came from. This leads to a secondary application of the criterion of embarrassment, but with the subtle difference that we know even less about who the original compilers (or tellers) of these stories were, what they would find embarrassing, what they actually produced, and so on. This should also be included in the analysis.

Variable sprawl: A basic point is this: if we want to determine how well the criterion of embarrassment works in a Bayesian fashion, we need to model the underlying situation with some accuracy. Continuing the above line of thought would probably result in a good 10-20 (100?) variables that mean different things and are all relevant to determining whether a seemingly embarrassing story is historical or not. Basically, every time one has a noun and a "might" or "probably", there is a new variable for the analysis, and we must include these variables in our analysis. Determining what the variables actually mean, what their probabilities are, and how they (numerically) affect each other is a truly daunting task that scales exponentially in the number of variables. Is it possible to undertake this project and expect some accuracy at the end?


Toy models: An alternative view is to undertake the analysis using naive toy models and argue why large parts of the problem can either be ignored or approximated by these toy models. This is what both Dr. Carrier and I have done. This is probably a more fruitful way to approach the problem; however, since all toy models are going to be wrong (the fact that Dr. Carrier and I produced exactly opposite results is evidence of this), this raises some basic questions about how the numerical estimates we get out are connected to the historical truth of any given proposition under consideration.

In statistical modelling, or any other science for that matter, whenever one postulates a model, no matter how reasonable the assumptions that go into it may seem, there must be a step where the result is validated in some way, by predicting a feature of the data which can be checked. I hope the disagreement between Dr. Carrier's model of the criterion of embarrassment and my proposed model will convince the reader that such measures are required.

How such validations should be carried out is not discussed in Proving History, nor does one get the impression there would be much of a need in the first place. I will try to illustrate how Proving History treats this issue with two examples. The first is from chapter six, on resolving expert disagreement, in which it is discussed at some length how Bayes' theorem can be used to make two parties agree:

The most common disagreements are disagreements as to the contents of b (background knowledge) or its analysis (the derivation of estimated frequencies). Knowledge of the validity and mechanics of Bayes's Theorem, and of all the relevant evidence and scholarship, must of course be a component of b (hence the need for meeting those conditions before proceeding). This basic process of education can involve making a Bayesian argument, allowing opponents to critique it (by giving reasons for rejecting its conclusion), then resolving that critique, then iterating that process until they have no remaining objections (at which time they will realize and understand the validity and operation of Bayes's Theorem and the soundness of its application in the present case). So, too, for any other relevant knowledge, although they may also have their own information to impart to you, which might in fact change your estimates and results, but either way disagreements are thereby resolved as both parties become equally informed and negotiate a Bayesian calculation whose premises (the four basic probability numbers) neither can object to, and therefore whose conclusion both must accept.

A worrying aspect of the above quote is how Dr. Carrier discusses these problems as having to do with estimating the four basic probability numbers, by which I assume he really intends to say the three numbers P(e|h, b), P(h|b), P(e|~h, b). Just to take my toy example from above, there will very clearly be more than four numbers involved. In fact, the number of such quantities will grow exponentially in the number of different binary variables (such as T, Em, Tem, etc. in the


above) we attempt to treat in our analysis. I think the pressing issue is not whether or not two perfectly rational scholars should in principle end up agreeing, but how we ourselves would know that what we were doing had scientific value, and what two scholars should do in practice.

The second suggestion in Proving History is a-fortiori reasoning. This roughly means using the largest/smallest plausible values of the probabilities in the analysis to see which kinds of results one may obtain. I think there are ample reasons to suspect, based on the past example alone, that one can get divergent results this way. At any rate, such over- or underestimation would not fix the problem of having the wrong model to begin with, a point the toy example above should be sufficient to demonstrate.
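To make the divergence worry concrete, here is a small sketch of a-fortiori reasoning (my own illustration, with invented numbers, not an example from the book): feed the smallest and largest plausible values of the prior and the two likelihoods through Bayes' theorem and read off the extreme posteriors. Because the posterior is monotone in each input, the corners of the input box give the bounds.

```python
def posterior(prior, like_h, like_not_h):
    # P(h|e) = P(h)P(e|h) / (P(h)P(e|h) + P(~h)P(e|~h))
    num = prior * like_h
    return num / (num + (1 - prior) * like_not_h)

# Illustrative plausible ranges for the three inputs (made-up numbers):
prior_lo, prior_hi = 0.1, 0.4
lh_lo, lh_hi = 0.3, 0.9          # bounds on P(e|h)
lnh_lo, lnh_hi = 0.2, 0.8        # bounds on P(e|~h)

# The posterior increases in the prior and P(e|h) and decreases in P(e|~h),
# so the extreme corners give the a-fortiori bounds.
p_min = posterior(prior_lo, lh_lo, lnh_hi)
p_max = posterior(prior_hi, lh_hi, lnh_lo)
print(round(p_min, 3), round(p_max, 3))  # -> 0.04 0.75
```

Even these modest input ranges leave the posterior anywhere between roughly 4% and 75%, which is the sense in which a-fortiori bounds can diverge.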

    4 The re-interpretation of probability theory

In my reading of the book there were a number of times where I had problems following the discussion, for instance when discussing how to obtain prior probabilities from frequencies, or the suggestion of a-fortiori reasoning. I think chapter six, The technical stuff, explains much of this confusion, namely Dr. Carrier's suggestion for how one can combine the Bayesian and the frequentist views on probabilities, which is also a main theoretical contribution of the book. Before I return to some more practical considerations I wish to treat Dr. Carrier's suggestion in more detail.

One of the main purposes of chapter six is to address some philosophical issues of Bayesian theory. Dr. Carrier introduces the chapter with these words:

Six issues will be taken up here: a bit more on how to resolve expert disagreements with BT; an explanation of why BT still works when hypotheses are allowed to make generic rather than exact predictions; the technical question of determining a reference class for assigning prior probabilities in BT; a discussion of the need to attenuate probability estimates to the outcome of hypothetical models (or a hypothetically infinite series of runs), rather than deriving estimates solely from actual data sets (and how we can do either); and a resolution of the epistemological debate between so-called Bayesians and frequentists, where I'll show that since all Bayesians are in fact actually frequentists, there is no reason for frequentists not to be Bayesians as well. That last may strike those familiar with that debate as rather cheeky. But I doubt you'll be so skeptical after having read what I have to say on the matter. That discussion will end with a resolution of a sixth and final issue: a demonstration of the actual relationship between physical and epistemic probabilities, showing how the latter always derive from (and approximate) the former.

The emphasis marks the claims I will focus on in this review. In reviewing Dr. Carrier's suggestion, I will not focus so much on the debate between frequentists and Bayesians (in my experience it is not something one encounters very frequently), but rather on Dr. Carrier's proposed interpretation of Bayesian probabilities. I apologize in advance that the section will be somewhat technical in places; I have tried to structure it by providing what I consider a standard Bayesian answer (these sections will be marked with an *) to the questions Dr. Carrier attempts to answer, and then discussing Dr. Carrier's alternative suggestion.

But before I begin I think it is useful to review the standard Bayesian interpretation of the two central terms Dr. Carrier seeks to investigate, namely probabilities and frequencies. The following continues from the introduction of Bayes' theorem I outlined in the first section. I will refer readers to E.T. Jaynes' book, which discusses these issues with much more clarity.

4.1 Probabilities and frequencies: the mainstream view*

The mainstream Bayesian view on frequencies and probabilities can be summarized as follows:

Probabilities represent degrees of plausibility. Probabilities therefore refer to a state of knowledge of a rational agent and are either assigned based on (for instance) symmetry considerations (the chance a coin comes up heads is 50% because there are two sides) or derived from other probabilities according to the rules of probability theory (hereunder Bayes' theorem).

Frequencies are a factual property of the real world that we measure or estimate. For instance, if we count 10 cows in the field and notice 3 are red, the frequency of red cows is 3/10 = 0.3. This is not a probability. The two things simply refer to completely different things: probabilities change when our state of knowledge changes, frequencies do not.

With these things in mind, let us focus on Dr. Carrier's definitions of probabilities and frequencies.

4.2 Richard Carrier's proposal

A key point I found confusing is what Dr. Carrier actually means by the word probability. The word is used from the beginning to the end of the book; however, an attempt to clarify its meaning is only encountered in chapter 2, right after stating axiom 4: "Every claim has a nonzero probability of being true or false (unless its being true or false is logically impossible)".4 The following clarification is given:

...by probability here I mean epistemic probability, which is the probability that we are correct when affirming a claim is true. Setting aside for now what this means or how they're related, philosophers have recognized two different kinds of probabilities: physical and epistemic. A physical probability is the probability that an

4The axioms and rules are themselves somewhat. If the historical method reduces to an application of Bayes' theorem, shouldn't we rather be interested in the assumptions behind Bayes' theorem?


event x happened. An epistemic probability is the probability that our belief that x happened is true.

Notice that the definitions of probability, epistemic probability and physical probability all themselves rely on the word probability, which is of course circular. The definition is revisited in chapter 6 in the section The role of hypothetical data in determining probability. The definition (it is hard to tell if an actual definition is offered) introduces the auxiliary concepts of logical truths, empirical truths and hypothetical truths. I will confess I found the chapter very difficult to understand, and I will therefore provide quotations before giving my own impression of the various definitions and arguments, such that the reader can form his own opinion.

What are probabilities really probabilities of? Mathematicians and philosophers have long debated the question. Suppose we have a die with four sides (a tetrahedron), its geometry is perfect, and we toss it in a perfectly randomizing way. From the stated facts we can predict that it has a 1 in 4 chance of coming up a 4 based on the geometry of the die, the laws of physics, and the previously proven randomizing effects of the way it will be tossed (and where). This could even be demonstrated with a deductive syllogism (such that from the stated premises, the conclusion necessarily follows). Yet this is still a physical probability. So in principle we can connect logical truths with empirical truths. The difference is that empirically we don't always know what all the premises are, or when or whether they apply (e.g. no die's geometry is ever perfect; we don't know if the die-thrower may have arranged a scheme to cheat; and countless other things we might never think of). That's why we can't prove facts from the armchair.

From this, it seems the logical truth is the observation that a perfectly random throw of a perfect die with four sides will come up 4 exactly 1/4th of the time. Dr. Carrier notes this probability is connected to the physical probability, by which I believe is meant how a concrete die will behave. While it is clearly true the two things must be connected in some way, the entire point must be how the two are connected. In the following section Dr. Carrier (correctly) identifies this connection as having to do with our lack of knowledge. The text then continues:

Thus we go from logical truths to empirical truths. But we have to go even further, from empirical truths to hypothetical truths. The frequency with which that four-sided die turns up a 4 can be deduced logically when the premises can all be ascertained to be true, or near enough that the deviations don't matter (...), yet ascertained still means empirically, which means adducing a hypothesis and testing it against the evidence, admitting all the while that no test can leave us absolutely certain. And when these premises can't be thus ascertained, all we have left is to just empirically test the die: roll


it a bunch of times and see what the frequency of rolling 4 is. Yet that method is actually less accurate. We can prove mathematically that because of random fluctuations the observed frequency usually won't reflect the actual probability. For example, if we roll the die four times and it comes up 4 every time, we cannot conclude the probability that this die will roll a 4 on the next toss is 100% (or even 71%, which is roughly the probability that can be deduced if we don't assume the other facts in evidence). That's because if the probability really is 1 in 4, then there is roughly a 0.4% chance you'll see a straight run of four 4s (mathematically: 0.25^4 = 0.00390625)

I believe the above discussion can be summarized as follows: suppose we have an idealized die with four sides which we roll in an idealized way. The chance it will come up 4 is (exactly) 0.25. This is what Dr. Carrier calls a hypothetical truth. However, since the die has minute random imperfections, the real chance it will come up 4 is slightly different, perhaps 0.256. This is the physical probability. The reason why these two numbers are different is that we are unaware of the small imperfections in the die. Now, if we roll an actual die a number of times, say 4, and compute the ratio of times the die comes up 4 to the total number of rolls, we will get a third number which probably will not be any of the above. In fact, the fluctuations that are being discussed are exactly distributed according to the previously introduced expression, viz.:

P(n rolls | N rolls) = C(N, n) p^n (1 - p)^(N - n), and p = 0.25 (C(N, n) being the binomial coefficient).
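As a quick check (my own sketch, not the book's), the binomial expression above can be evaluated directly; for instance it reproduces the chance of a straight run of four 4s quoted earlier:

```python
from math import comb

def p_hits(n, N, p=0.25):
    # Probability of exactly n fours in N rolls of an ideal four-sided die:
    # C(N, n) * p**n * (1 - p)**(N - n)
    return comb(N, n) * p**n * (1 - p)**(N - n)

print(p_hits(4, 4))  # chance of four straight 4s: 0.25**4 = 0.00390625
```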

While there are a few minor points about the way the problem is laid out (for instance, the use of the word chance is problematic: how is that defined without reference to probabilities?) and the terminology, the problem raised above, namely how these three numbers are related, is the central one. We will now turn to Dr. Carrier's proposal; the discussion continues as follows:

Even a thousand tosses of an absolutely perfect four-sided die will not generate a perfect count of 250 4s (except but rarely). The equivalent of an absolutely perfect randomizer does exist in quantum mechanics. An experiment involving an electron apparatus could be constructed by a competent physicist that gave a perfect 1 in 4 decision every time. Yet even that would not always generate 250 hits every 1,000 runs. Random variation will frequently tilt the results slightly one way or another. Thus, you cannot derive the actual frequency from the data alone. For example, using the hypothetical electron experiment, we might get 256 hits after 1,000 runs. Yet we would be wrong if we concluded the probability of getting a hit the next time around was 0.256. That probability would still be 0.250.

We could show this by running the experiment several times again. Not only would we get a different result on some of those new runs (thus proving the first result should not have been so concretely trusted), but when we combined all these data sets, odds are


the result would converge even more closely on 0.250. In fact you can graph this like an approach vector over many experiments and see an inevitable curve, whose shape can be quantified by mathematical calculus, which deductively entails that that curve ends (when extended out to infinity) right at 0.250. Calculus was invented for exactly those kinds of tasks, summing up an infinite number of cases, and defining a curve that can be iterated indefinitely, so we can see where it goes without actually having to draw it (and thus we can count up infinite sums in finite time).

The last paragraph verges on gobbledygook, using technical terms in a manner that is both unclear and very hard to recognize. The proposal seems to be that if we carry the idealized experiment out for a sufficiently long time, the observed frequency will converge towards 0.25. A reader who is unfamiliar with this result should keep in mind that a formal statement of the result (from the setup I assume it is the weak law of large numbers Dr. Carrier has in mind) contains the somewhat technical statement "...will converge with probability one...", so if one is using such an argument to later define probability there is again an issue of circularity. Directly following the above paragraph is this:

Clearly, from established theory, when working with the imagined quantum tabletop experiment we should conclude the frequency of hits is 0.25, even though we will almost never have an actual data set that exhibits exactly that frequency. Hence we must conclude that that hypothetical frequency is more accurate than any actual frequency will be. After all, either the true frequency is the observed frequency or the hypothesized frequency; due to the deductive logic of random variation you know the observed frequency is almost never exactly the true frequency (the probability that it is is always < 0.5, and in fact approaches 0 as the odds deviate from even and the number of runs increases); given any well-founded hypothesis you will know the probability that the hypothesized frequency is the true frequency is > 0.5 (and often ≫ 0.5), and certainly not 0); therefore P(THE HYPOTHESIZED FREQUENCY IS THE TRUE FREQUENCY) > P(THE OBSERVED FREQUENCY IS THE TRUE FREQUENCY); in fact, quite often P(HYPOTHESIZED) ≫ P(OBSERVED). So the same is true in every case, including the four-sided die, and anything else we are measuring the frequency of. Deductive argument from empirically established premises thus produces more accurate estimates of probability.

The main philosophical charge (if you will) leveled by Bayesian statisticians against frequentists is that a frequentist view tends to require thought-experiments in idealized situations that are run to infinity, and I will just note that we now have an imagined quantum tabletop experiment where we assume we know the limit frequency is 0.25 (no concrete experiment I can think of would behave like that, and no experiment can be run to the limit of infinity). The typical


Bayesian objection is that while we are free to think of this idealized situation as a thought-experiment, it is quite different from, e.g., the situation where we consider the probability a corpse was stolen from a grave. Again I will refer to Jaynes' book for a deeper treatment of the problems that arise, and again only note that Dr. Carrier does not discuss them at all.

However, Dr. Carrier also introduces some novel problems in his discussion. Consider the statement: "After all, either the true frequency is the observed frequency or the hypothesized frequency". But clearly this is false. Suppose I hypothesize that the so-called hypothesized frequency of the die coming up 4 is 0.25. I then roll the die 10 times and get an observed frequency (in Bayesian terms, the frequency) of 3/10. However, both of these values are going to be wrong, because clearly the microscopic imperfections in the die mean it will have a different true frequency (in Dr. Carrier's language) than either 0.25 or 0.3, simply due to the fact that there are an infinite number of other candidate true frequencies. The statement is therefore, in any practical situation, a false dilemma. Regarding the inequalities, what would happen is that both sides would tend towards zero, in direct contradiction to what Dr. Carrier writes (because, again staying in the frequentist language, the true frequency is with probability 1 something else), and depending on the situation the inequality could go either way. The argument is simply false.
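The point can be illustrated with a simulation (my own sketch, with an invented imperfection, not an example from the book): a die whose true frequency of rolling a 4 is 0.256 will, in the long run, produce observed frequencies near 0.256, that is, near neither the hypothesized 0.25 nor whatever a short run happened to show.

```python
import random

# A slightly imperfect four-sided die: its true frequency of rolling a 4
# is taken to be 0.256 rather than the ideal 0.25 (an invented number).
random.seed(0)
TRUE_FREQ = 0.256

def observed_frequency(n_rolls):
    # Count how often the die comes up 4 in n_rolls simulated rolls.
    hits = sum(1 for _ in range(n_rolls) if random.random() < TRUE_FREQ)
    return hits / n_rolls

print(observed_frequency(10))         # a small sample fluctuates wildly
print(observed_frequency(1_000_000))  # a large sample settles near 0.256
```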

Finally, and this is a recurrent theme, it is very hard to tell what has actually been defined. I have carefully gone through the chapter, and the above quotation is the first time the term hypothetical frequency is used. But what exactly does it mean? The closest to a definition comes shortly later in chapter six: "Thus we must instead rely on hypothetical frequencies, that is, frequencies that are generated by hypothesis using the data available, which data includes not just the frequency data (from which we can project an observed trend to a limit of infinite runs), but also the physical data regarding the system that's generating that frequency (like the shape and weight distribution of a die)". What I think is intended here is that the hypothetical frequency represents our best guess at what will happen with the die (or quantum tabletop experiment) if we roll it in the future, given our knowledge of the geometry of the die and past rolls. In Bayesian terms, we would call this the probability.

Having introduced observed and hypothetical frequencies, we can now begin to make headway towards defining probabilities; unfortunately it is done in a very indirect manner:

...that hypothetical frequencies are more accurate than observed frequencies, should not surprise anyone. ... if we take care to manufacture a very good four-sided die and take pains to use methods of tossing it that have been proven to randomize well, we don't need to roll it even once to know that the hypothetical frequency of this die rolling 4s is as near to 0.25 as we need it to be. (...) Thus it's not valid to argue that because hypothetical frequencies are not actual data, and since all we have are actual data, we should only derive our frequencies from the latter. All probability estimates (even


of the very fuzzy kind historians must make, such as occasioned in chapters 3 through 5) are attempted approximations of the true frequencies (as I'll further explain in the next and last section of this chapter, starting on page 265). So that's what we're doing when we subjectively assign probabilities: attempting to predict and thus approximate the true frequencies, which we can only approximate from the finite data available, because those data do not reflect the true frequency of anything (...). Thus we must instead rely on hypothetical frequencies, that is, frequencies that are generated by hypothesis using the data available, which data includes not just the frequency data (from which we can project an observed trend to a limit of infinite runs), but also the physical data regarding the system that's generating that frequency (like the shape and weight distribution of a die). Of course, when we have a lot of good data, the observed and hypothetical frequencies will usually be close enough as to make no difference. [my italics]

The question I started out with was this: what is a probability in Proving History? To the best of my knowledge, probability is being equated with hypothetical frequencies; however, this suggestion is definitely non-Bayesian and is plagued by all the problems Bayesians have been raising for nearly a century, starting with Dr. Carrier's main technical reference for Bayes' theorem, namely Jaynes' book.

The first thing to notice is that the discussion above is entirely focused on dice and quantum tabletop experiments, that is, experiments which we can easily imagine being carried out over and over again. However, these setups are very different from the ones we are really interested in, namely probabilities of historical events that perhaps only happened once. To give a concrete example of this difficulty, consider the following proposition:

    A : I believe with probability 0.8 that the 8th digit of pi is a nine

In a Bayesian view, the term with probability 0.8 refers to a state of knowledge of pi, and thus requires no auxiliary considerations; it simply reflects me thinking the 8th digit is probably a nine while not being certain.

However, in the interpretation above, when we assign a probability of 0.8 to the statement, then (to quote): "what we're doing when we subjectively assign probabilities, [is] attempting to predict and thus approximate the true frequencies, which we can only approximate from the finite data available". But what is the true frequency of the 8th digit of pi being a 9? Why should we think there is such a thing? How would we set out to prove it exists? What is the true value of the true frequency? The basic reason why these questions are hard to answer is this: either it is or it is not a nine, and the fact that I am uncertain reflects only my lack of knowledge. A Bayesian treatment gives a direct analysis of this situation; an attempt to connect it to a quantum tabletop experiment does not.

The situation is analogous for history. Consider for instance the probability that Caesar crossed the Rubicon, or that a miracle was made up and attributed to a


first-century miracle worker. The notion of a true frequency in these situations becomes very hard to define; however, if we accept that probability simply refers to our degree of belief, there is no need for such thought experiments.

5 The connection between frequencies and probabilities

The last section of chapter six offers a main philosophical point of the book, namely a combination of the frequentist and Bayesian views of probabilities. This is done by re-interpreting what is meant by Bayesian probabilities. The section opens thus:

Probability is obviously a measure of frequency. If we say 20% of Americans smoke, we mean 1 in 5 Americans smoke, or in other words, if there are 300 million Americans, 60 million Americans smoke. When weathermen tell us there is a 20% chance of rain during the coming daylight hours, they mean either that it will rain over one-fifth of the region for which the prediction was made (i.e., if that region contains a thousand acres, rain will fall on a total of two hundred of those acres before nightfall) or that when comparing all past days for which the same meteorological indicators were present as are present for this current day we would find that rain occurred on one out of five of those days (i.e., if we find one hundred such days in the record books, twenty of them were days on which it rained).

Speaking of bold assertions, consider the first line: "Probability is obviously a measure of frequency". The basic problem is this: if this is obvious, how come Bayesians have failed to see the obvious for 50 years and insisted on probability being rational degrees of belief, i.e. a state of knowledge? If it is obvious, how come the main technical reference, Jaynes' book, dedicates entire chapters to arguing against this misconception?

What is of course obvious is that one can go from probabilities to frequencies, as I have already illustrated with the example of the coin, but in that case the implication goes the other way: if the probability is defined in a situation where there is a well-defined experiment, such as with a coin, one can make probabilistic predictions about its frequency using Bayesian methods.

What is frustrating is that Dr. Carrier's examples illustrate this well. For instance, if I am the weatherman and I say I believe it will rain tomorrow with probability 0.2, what I mean is most definitely not what Dr. Carrier says, that "it will rain over one-fifth of the region". Think of how variable the weather is and how nonsensical that statement is if you take it at face value! In fact, I would be almost certain that it might rain over either 1/10 or 1/2 or 1/3 or some other fraction of the region. What I am trying to convey is that I have a lack of knowledge of whether or not it will rain tomorrow, and my models and data (and possibly Bayes' theorem) allow me to quantify this as being 0.2, full stop, no further thought-experiments required!


The section continues directly:

Those are all physical probabilities. But what about epistemic probabilities? As it happens, those are physical probabilities, too. They just measure something else: the frequency with which beliefs are true. Hence all Bayesians are in fact frequentists (and as this book has suggested, all frequentists should be Bayesians). When Bayesians talk about probability as a degree of certainty that h is true, they are just talking about the frequency of a different thing than days of rain or number of smokers. They are talking about the frequency with which beliefs of a given type are true, where of a given type means backed by the kind of evidence and data that produces those kinds of prior and consequent probabilities. For example, if I say I am 95% certain h is true, I am saying that of all the things I believe that I believe on the same strength and type of evidence as I have for h, 1 in 20 of those beliefs will nevertheless still be false (...). Probability can be expressed in fractions or percentile notation, but either is still a ratio, and all ratios by definition entail a relation between two values, and those values must be meaningful for a probability to be meaningful. For Bayesians, those two values are beliefs that are true and all beliefs backed by a certain comparable quantity and quality of evidence, which values I'll call T and Q. T is always a subset of Q, and Bayesians are always in effect saying that when we gather together and examine every belief in Q, we'll find that n number of them are T, giving us a ratio, n_t/n_q, which is the epistemic probability that any belief selected randomly from Q will be true.

The good news about the proposal is that it is relatively clearly stated; the bad news is that it is both unnecessary and defective. That the definition is defective is probably best illustrated with a small puzzle: suppose I have a coin of which I know that if I flip it two times (independently), the chance it will come up heads both times is 1/2. What is the probability it will come up heads if I flip it once?

The problem is easy to solve: P(HH) = P(H)P(H) = 1/2, and so P(H) = 1/√2. Now, the problem is that 1/√2 cannot be represented as a fraction of two integers, so when Dr. Carrier writes "Probability can be expressed in fractions or percentile notation, but either is still a ratio, and all ratios by definition entail a relation between two values, and those values must be meaningful for a probability to be meaningful", and then goes on to define the probability in terms of fractions of integers (see the quotation above), he is exactly excluding the above case. It goes without saying that the coin should not and does not pose a problem from a Bayesian or frequentist perspective.
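The arithmetic of the puzzle can be verified in a couple of lines (my own check, not from the book):

```python
from math import sqrt, isclose

# If P(heads twice) = 1/2, then P(heads) = 1/sqrt(2), an irrational number,
# which therefore cannot be written as a ratio of two integer counts.
p_heads = sqrt(0.5)
print(isclose(p_heads**2, 0.5))  # -> True
print(p_heads)                   # an irrational value near 0.7071
```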

There are two ways to avoid the problem. One is to say we simply don't care about the coin because it's a stupid example; in my opinion that's just admitting the proposed definition does not work. The other is to say the above discussion only applies to epistemic probabilities and the coin's probability is


something else which we have not defined. The problem is this would create absurdities, because I could then change the type of probability from epistemic to that something else by considering a new system that involves the coin at some point.

I think this basic example is fatal in terms of obtaining a general and consistent theory out of Dr. Carrier's proposal, but to avoid charges of rejecting a good idea because of some mathematical trickery which can perhaps be fixed, I want to point out some other, more serious ailments of the proposal, of which the coin example is only a symptom.

Let's simply try to imagine how the proposal can be implemented. Suppose I consider the statement: "I will get an email between 16.00 and 17.00 today". Let's say that after thinking about this as carefully as I can, possibly using Bayes' theorem, I arrive at a probability of 0.843 of that statement being true. Now, to implement the above definition, I think very carefully about all I know and, though I cannot at the moment tell how I would arrive at this conclusion, I realize I know exactly 3 other things on the same type and strength of evidence as was the case for the email, giving nq = 4. I now need to compute nt, namely the number of these beliefs that are true. A basic problem is that I would not know how to do this, because I do not know which of them are true or not, so I suppose I should imagine I have access to an oracle that knows the real truth.

At any rate, even without the oracle, nt can take the values 0, 1, 2, 3 and 4. This gives 5 different possible epistemic probabilities, nt/nq = 0, 1/4, 1/2, 3/4, 1, none of which is 0.843. So does this mean I didn't really believe the statement at probability 0.843? In that case, with what probability do I believe the statement, then? Does it mean the probabilities we have available are limited by how many things we know? If taken at face value, the proposal seems entirely flawed.
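The enumeration can be spelled out in a few lines; a minimal sketch (nq = 4 and the target 0.843 are taken from the email example above):

```python
# With n_q = 4 beliefs in the reference class Q, the proposed definition
# only permits epistemic probabilities of the form n_t / n_q.
n_q = 4
achievable = [n_t / n_q for n_t in range(n_q + 1)]
print(achievable)           # [0.0, 0.25, 0.5, 0.75, 1.0]
print(0.843 in achievable)  # False: the stated degree of belief is unrepresentable
```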

To counter any claim that I am quoting Dr. Carrier out of context, the proposal is summarized later in the section as follows:

So when you say you are only about 75% sure you'll win a particular hand of poker, you are saying that of all the beliefs you have that are based on the same physical probabilities available to you in this case, 1 in 4 of them will be false without your knowing it, and since this particular belief could be one of those four, you will act accordingly. So when Bayesians argue that probabilities in BT represent estimates of personal confidence and not actual frequencies, they are simply wrong. Because an estimate of personal confidence is still a frequency: the frequency with which beliefs based on that kind of evidence turn out to be true (or false). As Faris says of Jaynes (who in life was a prominent Bayesian), "Jaynes considers the frequency interpretation of probability as far too limiting. Instead, probability should be interpreted as an indication of a state of knowledge or strength of evidence or amount of information within the context of inductive reasoning." But an indication of a state of knowledge is a frequency: the frequency with which beliefs in that state will actually be true, such that a 0.9 means 1 out of every 10 beliefs achieving that state of knowledge will actually be false (so of all the beliefs you have that are in that state, 1 in 10 are false, you just won't know which ones). This is true all the way down the line.

If anything, I think this write-up is even more muddled. To take the poker example: I do not know that 1 in 4 of the things I believe with probability 0.75 will be false. Why should they be? It might turn out that everything I believe with probability 0.75 is true.

Aside from suffering from the above flaw, it suffers from all the other flaws I previously discussed. Suppose I know exactly 4 things with probability 0.75: that the 6th digit of pi is 3, that the Brazilians speak Brazilian, that there are 52 states in the USA, and that Adam and Eve really lived; however, these things will all be false! For that reasoner, the frequency with which beliefs based on that type of evidence turn out to be false is 1. This is no problem if we use probability to refer to a state of knowledge, as Jaynes does, but it is a problem if we want to root it in what is actually the case, as Dr. Carrier suggests. Again, there is absolutely nothing novel about the points raised here; they can all be found in Jaynes' book.

One might attempt to rescue the proposal as follows. Suppose one says: "I did not intend to say a [probability of] 0.9 means 1 out of every 10 beliefs achieving that state of knowledge will actually be false. I merely meant: the average (or expected value) of nt/nq is 0.9." The problem with such a definition is that it will almost inherently be circular, since the average is computed using the probability, and so cannot be used to define probabilities. A source of confusion is that we can make probabilistic statements about nt, but doing so requires that we already have a theory of probabilities. What is that theory? If we are frequentists, we need to consider why it should apply to statements about, e.g., Jesus. If it is Bayesian, well, there is your theory. There is no reason to force an ad-hoc layer of interpretation on top of it.

    5.1 There is good weather at infinity

That the proposal is flawed simply by virtue of not allowing one to represent a probability of 0.843 if one only knows 3 other things at that confidence (and suppose how unfortunate we would be if we only knew 1 such thing...), or a probability of 1/√2, makes me suspect Dr. Carrier had intended some sort of limit statement, that is, using infinities in some way. A basic problem with using infinities is that the things we consider are not infinite.

If we have two interpretations of assigning probabilities in the case of 3 coins and the existence of 1 Jesus, and the first only requires us to consider 3 coins and one Jesus while the second requires us to consider an infinite number of coins and Jesuses, I think there is ample reason to suspect the first proposal is the more fundamental, for the sole reason that there was at most one Jesus.

Nevertheless, I will briefly mention 3 ways to attempt to fix the proposal above by appeal to infinities, and simply note there is no need for any similar ad-hockery in a Bayesian interpretation.

The first is to propose we always know an infinite number of things at any given probability. I think this proposal can be rejected on the grounds that it is blatantly false.

The second proposal is somewhat related to the first: to make sense of any given probability of (say) 0.8, we immediately imagine an infinite number of coin flips with biased coins that come up heads with probability 0.8, and define probability from this. I suspect it is hard to define this in a non-circular fashion (keep in mind "random" must be defined in this context without using probability), but a worse problem is that the chance of the event happening in the real world is irrelevant to the definition, since the limit will be entirely dominated by the infinite number of hypothetical coins. Thus, the proposal has no normative power. Finally, the proposal seems to simply be a fancy way of arriving at the number 0.8: does the proposal effectively differ from simply saying a probability of 0.8 is taking a cake and dividing it in such a way that one part is in a ratio of 0.8 to the total, and that is the definition of probability? Put briefly, I do not see how the proposal has a normative effect on how probabilities are used.
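The point that the limit is fixed by construction can be illustrated with a simulation; a sketch (the seed and sample size are arbitrary choices, and the finite run only approximates the imagined infinite sequence):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

# Imagine flips of a hypothetical coin biased to land heads with
# probability 0.8. The long-run frequency of heads tends to 0.8 purely
# by construction; no fact about the real-world event being modelled
# ever enters the calculation.
n_flips = 100_000
heads = sum(random.random() < 0.8 for _ in range(n_flips))
print(heads / n_flips)  # close to 0.8, whatever the real event's truth value
```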

The third proposal goes deeper into frequentist land: imagining an infinite number of worlds in which we believe things at a probability of 0.8, and defining the probability as how often things believed at a probability of 0.8 turn out to be true in these worlds. This is basically the frequentist definition of probabilities; it contains all the circularity and fancy reasoning Bayesians usually object to, and it has led frequentists themselves to object to the idea that we can assign probability to things like Jesus rising from the dead. For instance, what does the infinite number of worlds where the 6th digit of pi is 3 look like?

5.2 The Bayesian/frequentist divide is not only about probabilities

Finally, I am not sure how the division between frequentists and Bayesians would be resolved even if the proposal worked. The division involves things such as whether data is fixed and parameters are variable, or data is variable and parameters are fixed. It involves frequentists objecting to applying Bayes' theorem to things like those considered by Dr. Carrier, and it involves (at least some) Bayesians rejecting frequentist methods such as confidence intervals and t-tests as blatant ad-hockery that should go the way of the Dodo. I simply do not see how adding a layer of frequencies on top of Bayesian language affects the differences of opinion on these issues.

    5.3 The big picture

Why should we accept Bayes' theorem and its applications to questions like whether the book of Matthew was written by Matthew? If we do, it must be because of a rigorous argument. I believe that e.g. Cox and Jaynes provide such arguments, and it seems Dr. Carrier believes so as well; recall from chapter three:

The theorem was discovered in the late eighteenth century and has since been formally proved, mathematically and logically, so we now know its conclusions are always necessarily true if its premises are true

Though the claim is, surprisingly, not given a reference, Carrier himself suggests exactly Jaynes. But I think it is evident that Dr. Carrier is in opposition to most of Jaynes' philosophical points and assumptions from the first chapter to the last. For instance, Dr. Carrier advances several different notions of probability (probability, physical probability, epistemic probability, hypothesized probability) and of frequencies. Suppose all of these are equivalent to what Jaynes calls probability and frequency; in that case, why confuse the language and not simply talk about probability and frequency?

The most logical conclusion, which I repeat I think is very evident from simply noting the differences in opinion I have documented above, is that Dr. Carrier is in opposition to Jaynes and, by extension, Cox and most other Bayesian thinkers of the 20th century. In that case, why should we think Bayes' theorem holds? How do we set out to prove it? Simply pointing to the Kolmogorov axioms won't cut it: sure, that gives us a mathematical theory of probabilities, but why suppose it applies to historical questions any more than the theory of matrices does?

The alternative is that Dr. Carrier is in agreement with e.g. Jaynes and Cox and I have just been too sloppy to see it. For instance, the re-interpretation of epistemic probabilities as frequencies is really just something added on top of the Bayesian framework. Well, if it is just something we add and it has no normative effect in terms of our calculations, I think Laplace's reply is in order: "[No, Sire,] I had no need of that hypothesis."

    6 Priors

The problem with interpreting probabilities as frequencies is in my opinion reflected throughout the book, for instance when Dr. Carrier proposes how one should arrive at priors from frequencies. The problem can be summarized as follows: suppose you want to assign a prior probability to some event E, and you observe E happening n times out of N. What is the prior probability P(E)? For probability to have a quantitative applicability to history, it is crucial to arrive at objective ways of specifying prior probabilities. For instance, in the example of the Criterion of Embarrassment we must be able to estimate numbers such as P(P) (the probability a gospel is preserved) or P(Em) (the probability a story is embarrassing). Without such machinery, Bayes' theorem will just be a consistency requirement without the ability to provide quantitative results. To give a concrete example of how Dr. Carrier treats this problem, consider the following from chapter six, in the section on determining a reference class:

If our city is determined to have had that special patronage, and our data shows 67 out of 67 cities with their patronage have public libraries, then the prior probability our new city did as well will now be over 98%.

Laplace's rule of succession is invoked here to arrive at the figure 98%, as it often is throughout the book, but without any consideration of where it comes from or whether its specific assumptions are fulfilled. In fact, one would not get the impression from reading the book that Laplace's rule is a Bayesian method at all, but I digress.
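For reference, the figure follows from Laplace's rule of succession, which (under its assumptions: a uniform prior on the unknown frequency and independent trials given that frequency) estimates the probability of a further success after n successes in N trials as (n+1)/(N+2); a minimal sketch:

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: probability the next trial succeeds,
    given `successes` out of `trials`, under a uniform prior on the
    underlying frequency."""
    return Fraction(successes + 1, trials + 2)

# The library example: 67 out of 67 observed cities have public libraries.
p = rule_of_succession(67, 67)
print(float(p))  # 68/69 ≈ 0.9855, i.e. "over 98%"
```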

Now consider the following example of a more elaborate problem on libraries in two provinces:

To illustrate this, the libraries scenario can be represented with this Venn diagram [see figure 1].

Figure 1: Venn diagram from Proving History

In this example, P(LIBRARY|RC) = 0.80, P(LIBRARY|IT) = 0.90, and P(LIBRARY|NP) = 0.20. What's unknown is P(LIBRARY|C), the frequency of libraries at the conjunction of all three sets. If we use the shortcut of assigning P(LIBRARY|C) the value of P(LIBRARY|NP) < P(LIBRARY|C) ≤ P(LIBRARY|RC), but also because P(LIBRARY|RC) is already >> P(LIBRARY|NP), which to overcome requires something extremely unusual. Lacking evidence of such differences, we must assume there are none until we know otherwise, and even becoming aware of such differences, we must only allow those differences to have realistic effects (e.g., evidence of a small difference in conditions cannot normally warrant a huge difference in outcome; and if you propose something abnormal, you have to argue for it from pertinent evidence, which all constitutes attending to the contents of b and its conditional effect on probabilities in BT). However, we would have to say all the same for P(LIBRARY|C) > P(LIBRARY|NP), since we have no more evidence that P(LIBRARY|C) is anything other than exactly P(LIBRARY|NP). All we have is the fact that P(LIBRARY|IT) is higher than P(LIBRARY|RC), but that in itself does not even suggest an increase in P(LIBRARY|NP), and certainly not much of an increase. Thus P(LIBRARY|NP)