



CAN BAYES' RULE BE JUSTIFIED BY COGNITIVE RATIONALITY PRINCIPLES ?

by Bernard WALLISER (CERAS-ENPC and CREA -Ecole Polytechnique) and Denis ZWIRN (CREA -Ecole Polytechnique)

August 2002

Published in ‘Theory and Decision’, 53, 2002 Summary: The justification of Bayes' rule by cognitive rationality principles is undertaken by extending the propositional axiom systems usually proposed in two contexts of belief change: revising and updating. Probabilistic belief change axioms are introduced, either by direct transcription of the set-theoretic ones, or in a stronger way, nevertheless in the spirit of the underlying propositional principles. Weak revising axioms are shown to be satisfied by a General Conditioning rule, extending Bayes' rule but also compatible with others, and weak updating axioms by a General Imaging rule, extending Lewis' rule. Strong axioms (equivalent to the Popper-Miller axiom system) are necessary to justify Bayes' rule in a revising context, and in fact justify an extended Bayes' rule which applies even if the message has zero probability.


In a context of uncertainty, when an actor's belief is represented by a probability function on some set of possible worlds, the traditional approach to belief change is to use Bayes' rule. The principle underlying this rule is to reallocate the probability of the excluded worlds proportionally over the remaining worlds. This rule receives a statistical justification when subjective probabilities are identified with proportions or frequencies (populational argument), and a decision-theoretic justification when probabilities are identified with betting rates (Dutch Book argument). But Bayes' rule has not yet received a clear epistemic justification based on subjective probabilities as true degrees of belief, and it suffers from three drawbacks.
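In a finite possible-worlds setting, Bayes' rule and its proportional reallocation can be sketched in a few lines of Python (a minimal illustration; the world names and numbers are arbitrary):

```python
# Bayes' rule on a finite set of worlds: the probability mass of the worlds
# excluded by the message A is reallocated proportionally over the worlds of A.
def bayes(P, A):
    pA = sum(p for w, p in P.items() if w in A)
    if pA == 0:
        raise ValueError("Bayes' rule is undefined for a zero-probability message")
    return {w: (p / pA if w in A else 0.0) for w, p in P.items()}

P = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}   # prior belief
A = {'w1', 'w2'}                         # message: w3 is excluded
Q = bayes(P, A)
# w1 and w2 change homothetically: 0.5/0.8 and 0.3/0.8, i.e. 0.625 and 0.375
```

Note that the third drawback discussed below is already visible here: for a message of zero prior probability the rule is simply undefined.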

First, Bayes' rule is not grounded on axioms which express cognitive rationality principles followed by the actor in any type of belief change. For instance, Teller (1976) argues that Bayes' rule is the only one satisfying two conditions, stating in fact that the probabilities of worlds compatible with the message change homothetically. Williams (1980) shows that Bayes' rule minimizes Kullback's informational distance between initial and final belief when the message is recognized as certain. Heckerman (1988), following Cox (1946), proves that Bayes' rule is the only one ensuring the consistency of the Boolean structure of events and two algebraic axioms. But all these propositions have an arbitrary flavor, since they lack cognitive foundations.

Second, Bayes' rule is not related to a precise epistemic context where it is assumed to adequately apply. To be fair, Gärdenfors (1988) proposes such a foundation, by exhibiting a link between Bayes' rule and the propositional axioms of Alchourron-Gärdenfors-Makinson (1985), based on very general intuitions governing belief change and now a reference system in the AI literature. However, that approach faces two important limitations. First, the transcription of the qualitative revision axioms into probabilistic ones itself relies on a very demanding axiom ("linear mixing") which seems quite as arbitrary as the purely algebraic axioms above. Second, Gärdenfors recognizes that Bayes' rule has a challenger, the Lewis rule (or Imaging), which obeys different axioms, but he is silent about the contexts of change where they respectively should apply.

Third, Bayes' rule cannot be used in situations where the message contradicts the initial belief, i.e. has zero prior probability. One treatment of this problem, due to Miller-Popper (1994), directly introduces axioms for conditional probabilities, accepting ordinary probabilities as specific cases. But this axiom system does not provide any rule for effectively computing the conditional probability.

The present work tries to justify some families of change rules by a complete system of axioms reflecting principles of cognitive rationality, and thereby to overcome the preceding limitations. First, it translates the usual set-theoretic axioms for belief change into probabilistic ones, in a weak, strong or super-strong way, transforming qualitative into quantitative principles. Second, it distinguishes two contexts of belief change: revising, where the message completes or contradicts the initial belief about a static world (Alchourron-Gärdenfors-Makinson, 1985), and updating, where the message gives some recent information on an evolving world (Katsuno-Mendelzon, 1992). Third, it shows that an "extended Bayes' rule", which applies even for a zero-probability message, can only be obtained by the super-strong axiom system proposed in the revising context, shown to be equivalent to the Miller-Popper system.

The representation theorems obtained in the paper are summarized in the following table, where Extended Bayes' rule appears in just one cell:

axiomatic system \ context of change | Revising (Alchourron-Gärdenfors-Makinson) | Updating (Katsuno-Mendelzon)
weak | Weak General Conditioning (GCW) | Weak General Imaging (GIW)
strong | Strong General Conditioning (GCS) | Strong General Imaging (GIS)
super strong | Bayesian General Conditioning (GC-B) or Extended Bayes' rule (⇔ Miller-Popper) | P-independent General Imaging (GI-P)


In a first part, the paper considers change axioms for both contexts of change. Set-theoretic axioms are recalled (§1.1), from which weak probabilistic axioms are then derived (§1.2); finally, strong probabilistic axioms extending the spirit of the weak ones to a numerical framework are proposed and discussed (§1.3). In a second part, the paper considers change rules obtained by representation theorems for both contexts. Set-theoretic rules are recalled (§2.1), followed by probabilistic rules obtained with weak axioms (§2.2) and strong ones (§2.3).

1. CHANGE AXIOMS

1.1. SET-THEORETIC AXIOMS

The paper favors the semantical possible worlds approach over a syntactical approach, since it is better suited to an extension from a set-theoretic to a probabilistic framework. In this approach, each proposition is symbolized by an event denoted X, i.e. by the subset of worlds where the proposition is true. A contradictory proposition is represented by the empty set ∅ and a tautological proposition by the whole set of worlds W, considered as finite.

1.1.1. COMMON SET-THEORETIC AXIOMS

The change process considers that the agent has an initial belief K and receives a message A, both given as events. The message can be compatible with the initial belief (K∩A ≠ ∅) or contradictory with it (K∩A = ∅). The agent infers from them a revised belief K * A, which is explicitly assumed to be a unique event. Hence, the process studies the change of some belief, but not the origin of any belief. Formally, a change function * is a function from P(W) × P(W) to P(W) which associates to a couple of events (K, A) an event K * A. The context of change is assumed to be given by the message itself, which completes or contradicts a prior belief about a static world (revising), or gives some information about the modification of an evolving world (updating).
For instance, if somebody believes that he has money in his pocket and discovers that his pocket is empty, the message indicates either that he never had money in his pocket (revising) or that the money was stolen (updating). The question of whether one context can be reduced to the other is still open (Friedman-Halpern, 1994).

In general, a semantic representation has two well-known consequences. First, the Logical omniscience axiom is accepted: the agent's belief is always deductively closed and keeps no distinction between explicit and implicit (derived) belief. Second, the Extensionality axiom is always satisfied: if two propositions have the same truth values, they are symbolized by the same subset of worlds and hence are indistinguishable. Applied to belief change, the Extensionality axiom says that two initial beliefs (or two messages) represented by the same set of worlds lead to final beliefs represented by the same set of worlds, whatever their linguistic formulation. In a purely set-theoretic framework, this property becomes trivial, but it will be exploited in §1.2.1:

A0. Neutrality: If K = K' then K * A = K' * A

The two contexts of change moreover share the four following axioms:

A1. Consistency: If K ≠ ∅ and A ≠ ∅ then K * A ≠ ∅
A2. Success: K * A ⊆ A
A3. Conservation: If K ⊆ A then K * A = K


A4. Sub-expansion: (K * A) ∩ B ⊆ K * (A ∩ B)

Intuitively, Consistency states that when a non-contradictory initial belief K is revised by a non-contradictory message, the final belief is non-contradictory. Conservation states that if the message is already validated by the initial belief, the final belief is unchanged. Summarizing a priority principle, alternative to a possible symmetry principle between message and prior belief, Success states that the final belief validates the message considered as true (contrary to the initial belief). Sub-expansion is one side of a principle of minimal change of the prior belief; it states that the final belief resulting from two messages keeps at least the part of the belief revised by one message that is compatible with the other. From these axioms, one can derive the following properties:

A2'. Idempotence: (K * A) * A = K * A
A3'. Weak conservation: K * W = K
A4'. Inclusion (by A4 and A3'): K ∩ A ⊆ K * A

Idempotence states that when a message is repeated, it no longer modifies the final belief; this axiom is natural when the priority principle is stated (contrary to symmetric combination, where a repeated message is reinforced). Weak conservation is a restriction of Conservation to a tautological message. Inclusion is a restriction of Sub-expansion to the case of a single message: it states that the final belief keeps at least the part of the initial belief which is compatible with the message.

1.1.2. SET-THEORETIC REVISING AXIOMS

The revising axiom system consists in adding the following axiom to the previous common ones:

A5. Super-expansion: If (K * A) ∩ B ≠ ∅ then K * (A ∩ B) ⊆ (K * A) ∩ B

This axiom is the other side of the principle of minimal change of the prior belief: it states that the final belief resulting from two messages always keeps at most the part of the belief revised by one of them that is compatible with the other, if any. Call Ar = {A1, A2, A3, A4, A5} the revising axiom system.
From Ar, the following properties can be derived (see appendix 1 for the proof of A45):

A5'. Preservation (by A5 and A3'): If K ∩ A ≠ ∅ then K * A ⊆ K ∩ A
A45. Right-distributivity: K * (A ∪ B) = (K * A) ∪ (K * B), or K * A, or K * B
A45'. Commutativity: If (K * A) ∩ B ≠ ∅ and (K * B) ∩ A ≠ ∅, then K * (A ∩ B) = (K * A) * B = (K * B) * A

Preservation is a restriction of Super-expansion to the case of a single message: it states that the final belief keeps at most the part of the initial belief compatible with the message, if it exists. Right-distributivity states that the final belief resulting from the disjunction of two messages is equivalent either to the final belief resulting from one of these messages, or to the disjunction of the final beliefs resulting from each of these messages. Finally, Commutativity states that if no contradiction arises in the belief change, the final belief does not depend on the order in which two messages are dealt with; this property is quite natural in a revising context, where the messages give independent information on a same world.
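The revising system Ar can be checked mechanically on a small worlds set. The Python sketch below (events as frozensets) verifies A1-A5 by brute force for the simple rule K * A = K ∩ A when nonempty, else A; this "full-meet"-style operator is only one concrete instance satisfying Ar, not the general construction studied later in the paper, and A3 is checked only for consistent K, matching the restriction of Ar to consistent initial beliefs.

```python
from itertools import combinations

W = frozenset({0, 1, 2})

def events(W):
    """All subsets of W, as frozensets."""
    ws = list(W)
    return [frozenset(c) for r in range(len(ws) + 1) for c in combinations(ws, r)]

def star(K, A):
    # Keep the compatible part of K when it exists, else adopt the message wholesale.
    return K & A if K & A else A

def satisfies_Ar(star, W):
    for K in events(W):
        for A in events(W):
            KA = star(K, A)
            if K and A and not KA:
                return False            # A1 Consistency
            if not KA <= A:
                return False            # A2 Success
            if K and K <= A and KA != K:
                return False            # A3 Conservation (consistent K only)
            for B in events(W):
                if not (KA & B) <= star(K, A & B):
                    return False        # A4 Sub-expansion
                if KA & B and not star(K, A & B) <= (KA & B):
                    return False        # A5 Super-expansion
    return True

print(satisfies_Ar(star, W))  # → True
```

Brute force is feasible here because a three-world set has only 8 events, so the triple loop over (K, A, B) stays tiny.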


Considering that Extensionality is automatically satisfied in a possible worlds model, and assuming that K * A is always unique, the set of axioms Ar is a correct transcription in a set-theoretic framework of the Alchourron-Gärdenfors-Makinson (1985) system, however limited to the cases where the initial belief K is consistent. But the system can also be represented by {A1, A2, A3', A4, A5} or by {A1, A2, A3', A45}.

1.1.3. SET-THEORETIC UPDATING AXIOMS

The updating axiom system consists in adding the two following axioms to the previous common ones:

A6. Pointwise super-expansion: If ∃ w0 ∈ W : K = {w0} and (K * A) ∩ B ≠ ∅, then K * (A ∩ B) ⊆ (K * A) ∩ B
A7. Left-distributivity: (K ∪ K') * A = (K * A) ∪ (K' * A)

Pointwise super-expansion restricts the intuition of Super-expansion to the case where the initial belief is complete (or certain), i.e. reduced to one world; hence, the principle of minimal change survives only in a weaker form. It follows that the Commutativity axiom is no longer satisfied: when the real world evolves, the order of successive messages matters, the last being favored. Left-distributivity states that when balancing between two alternative initial beliefs, these are revised independently for any message. From the preceding axiom, it is possible to deduce the following property:

A7'. Left-monotonicity: If K ⊆ K' then K * A ⊆ K' * A

This axiom states that if the initial belief is weakly decreased, so is the final belief. Since Extensionality is necessarily satisfied, the set of axioms Au = {A1, A2, A3, A4, A6, A7} is a correct transcription in a set-theoretic framework of the system referred to as {U1-U5, U8, U9} by Katsuno-Mendelzon (1992), the stronger of the two versions of update axioms they propose.

Remark: When the initial belief is assumed to be always complete, Left-distributivity A7 is void, since K ∪ K' is reduced to a single world, requiring K and K' to be identical; hence Ar and Au are equivalent systems.
When the initial belief is not complete, Au is incompatible with Ar, since A5' and A7 are contradictory (proof in appendix 1).

1.2. WEAK PROBABILISTIC AXIOMS

The set-theoretic axioms can easily be transcribed into probabilistic ones through natural conventions which reduce a probability distribution to its support. A probability distribution is naturally stated in a possible worlds framework. A contradictory probabilistic belief is symbolized by a function P∅.

1.2.1. TRANSCRIPTION CONVENTIONS AND COMMON AXIOMS

In a probabilistic framework, the initial belief is associated with a prior probability distribution P on the worlds. The message is again an event A, associated in the model with a subset of worlds. The final belief is assumed to be a posterior probability distribution denoted P*A. Let ∆(W) be the set of all probability distributions on the set of possible worlds W. A probabilistic change rule is then a function from ∆(W) × P(W) to ∆(W). Let Sup(P) = {w ∈ W : P(w) > 0} be the support of P. If K, K', K'' are the respective supports of the probability distributions P, P', P'', the following table enumerates useful rewritings of set-theoretic formulas in a probabilistic framework:


Set-theoretic formula | Probabilistic transcription
K ⊆ A | P(A) = 1
K ⊆ K' | P'(X) = 1 ⇒ P(X) = 1, or: P(X) > 0 ⇒ P'(X) > 0
K' = K ∩ A | P'(X) = 1 ⇔ P(X∩A) = P(A) ⇔ P(A→X) = 1, or: P'(X) > 0 ⇔ P(X∩A) > 0
K'' = K ∩ K' | P''(X) > 0 ⇔ P(X) > 0 and P'(X) > 0
K'' = K ∪ K' | P''(X) = 1 ⇔ P(X) = 1 and P'(X) = 1, or: P''(X) > 0 ⇔ P(X) > 0 or P'(X) > 0
K ∩ A ≠ ∅ | P(A) > 0

For instance, the second line of the table means that if the probability distribution with the larger support assigns probability one to some event X, then any probability distribution with a smaller support must do so as well. In order to set up the weakest probabilistic axioms discussed in this paper, one needs to rely on the following definition and postulate, which can be thought of as a minimal transcription device for set-theoretic axioms (endorsed by Gärdenfors, 1988):

Definition: A probabilistic change rule, mapping P to P*A, and a set-theoretic change rule * are said to be associated if and only if ∀P, ∀A : Sup(P*A) = Sup(P) * A.

TP. Transcription Postulate: A set-theoretic change rule * associated to any given probabilistic change rule always exists.

Corollary: The set-theoretic change rule * associated to a probabilistic change rule is always unique.

Proof: Suppose that * and ** are both associated to the same probabilistic change rule. For any K, consider a probability distribution P (which always exists) such that K = Sup(P). Then Sup(P*A) = Sup(P) * A = Sup(P) ** A, ∀K, ∀A. Hence * and ** are identical.

The previous definition and postulate can now be used to transcribe the set-theoretic axioms into probabilistic ones:

B0. Neutrality: If [P(X) > 0 ⇔ P'(X) > 0] then [P*A(X) > 0 ⇔ P'*A(X) > 0]
B1. Consistency: If A ≠ ∅ and P ≠ P∅ then P*A ≠ P∅
B2. Success: P*A(A) = 1
B3. Conservation: If P(A) = 1 then [P*A(X) > 0 ⇔ P(X) > 0]


B4. Sub-expansion: If P*A(X∩B) > 0 then P*A∩B(X) > 0

Neutrality, derived from A0 and TP, is no longer a trivial property, since it requires that two posterior probability distributions have the same support whenever the prior distributions have the same support. This principle, labeled the principle of Top Equivalence in Lindström-Rabinowicz (1989), even if debatable in extreme cases (when the weight of some worlds within the support becomes infinitesimal), will hereafter be accepted as a minimal assumption of transcription, backed by a quite intuitive property as can be seen in the generic case. Once TP is accepted, the other axioms can be considered as the simplest translation of the corresponding set-theoretic axioms into a probabilistic framework.

1.2.2. WEAK PROBABILISTIC REVISING AXIOMS

The weak probabilistic axioms for revising are defined by adding the following transcribed axiom to the previous common ones:

B5. Super-expansion: If P*A(B) > 0 then [P*A∩B(X) > 0 ⇒ P*A(X∩B) > 0]

Further properties can of course be derived from the preceding ones or directly transcribed:

B4'. Inclusion: If P(X∩A) > 0 then P*A(X) > 0
B5'. Preservation: If P(A) > 0 then [P*A(X) > 0 ⇒ P(X∩A) > 0]
B45. Right-distributivity: P*A∪B(X) > 0 ⇔ P*A(X) > 0 and/or P*B(X) > 0

The system Br = {B0, B1, B2, B3, B4, B5} is the weak transcription of the revising axiom system Ar, as the following lemma states:

Lemma 1: A probabilistic change rule mapping any prior probability distribution P into a posterior probability distribution P*A satisfies the axiom system Br if and only if the associated set-theoretic change rule * satisfies the axiom system Ar.

1.2.3. WEAK PROBABILISTIC UPDATING AXIOMS

The weak probabilistic axioms for updating are defined by adding the following transcribed axioms to the previous common ones:

B6. Pointwise super-expansion: If [∃w0 : P(w0) = 1 and P*A(B) > 0] then [P*A∩B(X) > 0 ⇒ P*A(X∩B) > 0]
B7. Left-distributivity: If [∀X, Q(X) > 0 ⇔ P(X) > 0 or P'(X) > 0] then [∀X, Q*A(X) > 0 ⇔ P*A(X) > 0 or P'*A(X) > 0]

The system Bu = {B0, B1, B2, B3, B4, B6, B7} is the weak transcription of the updating axiom system Au, as the following lemma again states:

Lemma 2: A probabilistic change rule mapping any prior probability distribution P into a posterior probability distribution P*A satisfies the axiom system Bu if and only if the associated set-theoretic change rule * satisfies the axiom system Au.
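The association between a probabilistic rule and its set-theoretic counterpart can be made concrete with Bayes' rule: on its domain of definition (P(A) > 0), its associated set-theoretic rule is K * A = K ∩ A. A small Python check with arbitrary illustrative numbers:

```python
# Support of a probability distribution, Sup(P) = {w : P(w) > 0}.
def support(P):
    return frozenset(w for w, p in P.items() if p > 0)

def bayes(P, A):
    pA = sum(p for w, p in P.items() if w in A)
    assert pA > 0, "Bayes' rule requires a message of positive prior probability"
    return {w: (p / pA if w in A else 0.0) for w, p in P.items()}

P = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2, 'w4': 0.0}
A = frozenset({'w1', 'w2', 'w4'})
Q = bayes(P, A)
# Sup(P*A) = Sup(P) ∩ A : the associated set-theoretic rule on supports
print(support(Q) == support(P) & A)  # → True
```

This is exactly the sense of the Definition and TP above: the probabilistic rule determines, through supports, a unique set-theoretic rule.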


1.3. STRONG PROBABILISTIC AXIOMS

Probabilities are semantically richer than propositions: they attribute to events a real number, and not only a binary value. So a more ambitious transcription of the set-theoretic axioms requires stronger axioms, extending to a numerical framework the same generic intuitions that lie behind the set-theoretic axioms. In fact, one even distinguishes between strong and super-strong axioms.

1.3.1. STRONG PROBABILISTIC REVISING AXIOMS

The strong system of axioms for probabilistic revising, B#r, is defined by replacing B3 and B4 by the two following axioms:

B#3. Strong conservation: If P(A) = 1 then P*A(X) = P(X)
B#4. Strong sub-expansion: ∀X, P*A∩B(X) ≥ P*A(X∩B)

Intuitively, Strong conservation states that if a message is already certain according to the prior probability, the posterior probability is the same. Strong sub-expansion expresses that the conjunction of two messages never decreases the probability of the part of an event selected by one message and compatible with the other. These two intuitions can be thought of as natural applications, in a quantitative setting, of the intuitions underlying Conservation and Sub-expansion in a qualitative setting. From the system B#r, the following principle can be derived:

B#4'. Strong inclusion: ∀X, P*A(X) ≥ P(X∩A), or equivalently: ∀X, P*A(X) ≤ P(A→X)

This axiom asserts that a message never decreases the probability of the part of an event compatible with the message. This extends to a numerical framework the generic intuition that lies behind the set-theoretic principle of Inclusion: a message never weakens the belief in its own worlds. Jointly with Success (B2), Strong inclusion means that the only way to weaken the probability of a world is to eliminate it: if P*A(w) > 0 then P*A(w) ≥ P(w). The super-strong system of axioms for probabilistic revising, B##r, is defined by adding to B#r the following axiom:

B##45. Linear mixing: If A ∩ A' = ∅ then ∃ a ∈ [0,1] such that ∀X, P*A∪A'(X) = a P*A(X) + (1-a) P*A'(X)

This axiom is a very demanding numerical extension of the Right-distributivity axiom (B45). It is endorsed by Gärdenfors (1988) in order to characterize probabilistic change when the message has a positive prior probability (axiom P+1), but with a more specific form associated with the constraint a = P(A) / P(A ∪ A'). In fact, this constraint will here be shown to result from the fact that Linear mixing is represented only by Bayes' rule (in the presence of the other revising axioms). Linear mixing implies Right-distributivity only for two messages A and A' such that A ∩ A' = ∅.

1.3.2. STRONG PROBABILISTIC UPDATING AXIOMS

The strong system of axioms for probabilistic updating, B#u, is defined by replacing B3, B4 and B7 by the following axioms:

B#3. Strong conservation: If P(A) = 1 then P*A(X) = P(X)


B#4. Strong sub-expansion: ∀X, P*A∩B(X) ≥ P*A(X∩B)
B#7. Separability: If [∀X, Q'(X) = a P(X) + (1-a) P'(X) and Q''(X) = a P(X) + (1-a) P''(X), a ∈ ]0,1] ] then [∀X, P'*A(X) = P''*A(X) = 0 ⇒ Q'*A(X) = Q''*A(X)]

Separability states that when a probability distribution is combined linearly with a second one, the revised probability of an event is not affected by a substitution of the second prior probability distribution if their respective posteriors are null for this event. It constitutes a first possible transcription of Left-distributivity to a quantitative setting, even if it does not imply it. The super-strong system of axioms for probabilistic updating, B##u, is defined from B#u by replacing B#7 by B7 and the following super-strong axiom:

B##7. Homomorphism: If [∀X, Q(X) = a P(X) + (1-a) P'(X)] then [∀X, Q*A(X) = a P*A(X) + (1-a) P'*A(X)], ∀a ∈ ]0,1[

Homomorphism (Gärdenfors, 1988) again looks like a numerical extension of Left-distributivity to linear mixtures of probability distributions, a natural way to combine them. Homomorphism implies Separability, but not Left-distributivity. However, Left-distributivity, Separability and Homomorphism appear as stronger and stronger expressions of the same generic principle: a message which revises a combination of two initial beliefs leads to a similar combination of the corresponding final beliefs.

1.4. SYNTHETIC TABLE

The eight axiom systems that have been introduced can be summarized in the following table:

Axioms \ Change context | Revising (in a static world) | Updating (in an evolving world)
Set-theoretic (referring to propositional truth values) | Ar | Au
Weak probabilistic (referring to probability supports) | Br | Bu
Strong probabilistic (referring to probabilistic ordinal values) | B#r | B#u
Super-strong probabilistic (referring to probabilistic cardinal values) | B##r | B##u
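The super-strong axioms of the two columns can be probed numerically. With illustrative distributions (arbitrary numbers), the Python sketch below checks that Bayes' rule satisfies Linear mixing (B##45) for disjoint messages with precisely the coefficient a = P(A) / P(A ∪ A'), while it fails Homomorphism (B##7) as soon as the two priors assign the message different probabilities, consistent with its placement in the revising rather than the updating column.

```python
def bayes(P, A):
    pA = sum(p for w, p in P.items() if w in A)
    return {w: (p / pA if w in A else 0.0) for w, p in P.items()}

close = lambda P, Q: all(abs(P[w] - Q[w]) < 1e-9 for w in P)

# B##45 Linear mixing for disjoint messages, with a = P(A) / P(A U A')
P = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
A, A2 = {1, 2}, {3}
a = 0.3 / 0.6                      # P(A) / P(A U A')
mix = {w: a * bayes(P, A)[w] + (1 - a) * bayes(P, A2)[w] for w in P}
print(close(bayes(P, A | A2), mix))   # → True

# B##7 Homomorphism fails for Bayes' rule when P(B) differs across priors
P1 = {1: 0.6, 2: 0.2, 3: 0.2}      # P1(B) = 0.8
P2 = {1: 0.1, 2: 0.1, 3: 0.8}      # P2(B) = 0.2
B = {1, 2}
Q = {w: 0.5 * P1[w] + 0.5 * P2[w] for w in P1}
mix2 = {w: 0.5 * bayes(P1, B)[w] + 0.5 * bayes(P2, B)[w] for w in P1}
print(close(bayes(Q, B), mix2))       # → False
```

In the second check, conditioning the mixture reweights the components by their likelihoods of the message, so the result differs from the mixture of the conditioned components: Q*B gives world 1 probability 0.7, while the mixture of posteriors gives it 0.625.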

2. CHANGE RULES

2.1. SET-THEORETIC RULES

In the axiom systems, the economy principle involves a qualitative notion of minimal change from initial to final belief. In the worlds space, this principle is embodied by a notion of distance which acts as a mental characteristic of each agent, but has a different flavor in each context.


2.1.1. SET-THEORETIC REVISING RULES

For revising, it seems natural to adopt a global distance between sets of worlds, applied between the initial and the final belief. Indeed, what has to be changed is not the real world itself, but only our belief about it, that is the set of possible worlds candidate to represent the real world. So, one has to look for a new set of worlds compatible with the message, but as near as possible to the old set. In particular, as long as it is logically possible, it is assumed that the message only makes the initial belief more precise, hence justifying the Preservation axiom. This intuitive relation between the minimal change principle and the distance notion was made precise in a representation theorem due to Katsuno-Mendelzon (1991, 1992):

Theorem KM1: A change function * satisfies the set of revising axioms Ar if and only if there exists a total preorder ≤K on the worlds set W such that:
(i) w' ∈ K and w'' ∈ K ⇒ w' =K w''
(ii) w' ∈ K and w'' ∉ K ⇒ w' <K w''
(iii) K * A = Min(A, ≤K) = {w' ∈ A : ∀w'' ∈ A, w' ≤K w''}

According to (i) and (ii), the preorder ≤K can be thought of as defining a set of nested spheres, whose inner sphere is K. According to (iii), the event K*A is nothing else than the set of worlds of A with minimal distance to K, i.e. the worlds which belong simultaneously to A and to the first sphere intersecting A. Moreover, the preorder defines an « epistemic entrenchment » order on all events: an event E is less entrenched than an event E' if the first sphere intersecting -E is nearer to K than the first sphere intersecting -E'. Finally, the system of spheres not only allows to define K * A, but also a new preorder ≤K*A when message A is obtained, at least on A: the new system is just the intersection of the previous system of spheres with A, and can be used for dealing with a new message B.
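Theorem KM1's construction can be sketched directly: a total preorder is encoded as integer ranks (sphere indices), with rank 0 exactly on K by conditions (i)-(ii), and K * A selects the minimal-rank worlds of A. The worlds and ranks below are arbitrary illustrations.

```python
def revise(A, rank):
    # Theorem KM1 (iii): K * A = Min(A, <=K), the worlds of A lying in the
    # first sphere that A intersects.
    if not A:
        return frozenset()
    m = min(rank[w] for w in A)
    return frozenset(w for w in A if rank[w] == m)

# Nested spheres around K = {'w1'}: rank 0 on K, larger ranks further out.
rank = {'w1': 0, 'w2': 1, 'w3': 1, 'w4': 2}

print(revise({'w2', 'w4'}, rank))   # {'w2'} : first sphere hit by the message
print(revise({'w1', 'w3'}, rank))   # {'w1'} : Preservation, K ∩ A kept
```

A tie in rank, as between w2 and w3, would simply return both worlds, which is why * yields a set rather than a single world.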

Theorem KM2: A change function * satisfies the set of updating axioms Au if and only if for each world w there exists a total preorder ≤w on the worlds set W such that:
(j) w' ≠ w ⇒ w <w w'
(jj) K * A = ∪w∈K Min(A, ≤w) = {w' ∈ A : ∃w ∈ K, ∀w'' ∈ A, w' ≤w w''}

According to (j), the preorder ≤w again defines a set of nested spheres, whose inner sphere is {w}. According to (jj), the event K*A is the union, over the worlds w of K, of the sets of worlds of A nearest to w, i.e. of the worlds which belong simultaneously to A and to the first sphere intersecting A in the system of spheres around w. Moreover, the initial system of spheres around each world does not have to be changed when receiving successive messages.

Remark: If K is complete, K = {w}, a revising and an updating rule lead from the same initial belief to the same final belief for any message when the preorders coincide: ∀w' ∈ W, ∀w'' ∈ W, w' ≤w w'' iff w' ≤K w''; this may be possible for any K. If K is not complete, the nearest worlds of each w ∈ K would have to be all the worlds of K, but this correspondence condition cannot hold stably for every K; hence a revising and an updating rule act differently.
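The contrast between KM1-revision (one global preorder) and KM2-update (one preorder per world) can be sketched on the fruit-basket example discussed in §2.1.3 below, with worlds encoded as (apple, banana) boolean pairs; the two distances used here are illustrative assumptions, not the paper's definitions.

```python
# Worlds as (apple, banana) pairs; K: at least one fruit; A: no banana.
K = {(1, 1), (0, 1), (1, 0)}
A = {(1, 0), (0, 0)}
W = [(a, b) for a in (0, 1) for b in (0, 1)]

def revise(K, A, rank):
    # KM1: minimal worlds of A for one global preorder around K.
    m = min(rank[w] for w in A)
    return {w for w in A if rank[w] == m}

def update(K, A, dist):
    # KM2: union, over each w in K, of the worlds of A nearest to w.
    out = set()
    for w in K:
        m = min(dist(w, x) for x in A)
        out |= {x for x in A if dist(w, x) == m}
    return out

# Revising: global preorder = "in K or not" (K is the inner sphere).
rank = {w: (0 if w in K else 1) for w in W}
# Updating: local distance = number of changed fruits.
hamming = lambda w, x: (w[0] != x[0]) + (w[1] != x[1])

print(revise(K, A, rank))     # {(1, 0)} : Preservation holds
print(update(K, A, hamming))  # {(1, 0), (0, 0)} : Preservation violated
```

Updating keeps (0, 0) because the world (0, 1), possible in K, evolves to its nearest bananaless counterpart; revising discards it because the K-world (1, 0) already satisfies the message.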


2.1.3. CANONICAL EXAMPLE

The distinction between revising and updating axioms and rules can be illustrated by the canonical basket example (Dubois-Prade, 1994). A basket may be described by four worlds, according to whether it contains an apple (a) or not (-a), together with a banana (b) or not (-b). The agent believes at time t that the basket contains at least one fruit: K = (a ∧ b) v (-a ∧ b) v (a ∧ -b). A revising message from a direct witness says that there is no banana: A = (a ∧ -b) v (-a ∧ -b). Hence he concludes that only the world with one apple and no banana remains: K*A = (a ∧ -b), the change process satisfying Preservation. An updating message says at time t+1 that there is no more banana in the basket, if there were any: A = (a ∧ -b) v (-a ∧ -b). Considering the evolution of the basket from each initially possible world, the agent now considers as possible the nearest worlds of the message, according to a natural distance reflecting the physical operation of withdrawing the bananas, if any: K * A = (a ∧ -b) v (-a ∧ -b); hence, the change process violates Preservation and ensures Left-distributivity.

2.2. WEAK PROBABILISTIC RULES

Probabilistic change rules generalizing Bayesian conditioning and Lewisian imaging can be constructed. Since generalized methods are searched for, they must apply to all situations, especially to the case where P(A) = 0. Moreover, they must apply in the same way for any initial belief and message.

2.2.1. GENERAL FEATURES

Each rule associates two operations:
- an operation of selection of the worlds which belong to the support of the revised probability distribution. This operation is summarized by an extended Kronecker function (the selection function) representing the distance notion: δ(w; A, .) = 1 iff w is a nearest world in A;
- an operation of allocation of the prior probabilities to the selected worlds.
This operation is summarized by the differential weight given to the selected worlds (the allocation function): φ(w; A, P, .) is the weight of the world w. In both cases, the final dot will hereafter be replaced by a precise but changing argument (K, w' or nothing). These two operations are not independent. Indeed, the allocation operation is designed to complete the selection operation when more than one new world is selected. Hence, it is supposed to be compatible with the chosen selection function. More precisely, the allocation operation must give a strictly positive amount of probability to each selected world: δ(w; A, .) = 1 ⇒ φ(w; A, P, .) > 0

Remark: The dependence of the allocation function on A and P could be interpreted as meaning that it differs for each message A and prior P. But this is not in the spirit of a rule, which has to be given in a general form and is afterwards simply applied to the precise knowledge of A and P. Hence, the dependence of the allocation function on A and P must be interpreted as the dependence of φ on P(w) and P(A). A change rule is completely described by the value of the revised probability of each world P*A(w), the probability of any event X being then naturally computed as:

(Ĝ) P*A(X) = ∑w∈X P*A(w)
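The two operations can be sketched numerically; the following is a minimal illustrative sketch (the function and variable names are ours, not the paper's formal apparatus), in which `delta` selects the worlds and `phi` weights them, the weights being normalized over the selected worlds as in the "weak" rules of the next section:

```python
def change(P, A, delta, phi):
    """Probabilistic change as selection (delta) followed by allocation (phi).

    P is a dict mapping worlds to prior probabilities; A is the message
    (a set of worlds).  The total weight is reallocated to the selected
    worlds, so the rule applies even when P(A) = 0.
    """
    selected = {w for w in P if delta(w, A)}
    total = sum(phi(w, A, P) for w in selected)
    return {w: phi(w, A, P) / total if w in selected else 0.0 for w in P}

# Bayes' rule as a special case: select the worlds of A and allocate
# proportionally to the prior.
P = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}
posterior = change(P, {'w1', 'w2'},
                   lambda w, A: w in A,
                   lambda w, A, P: P[w])
# posterior ≈ {'w1': 0.625, 'w2': 0.375, 'w3': 0.0}
```

Other choices of `delta` and `phi` yield the non-Bayesian rules discussed below.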


2.2.2. WEAK PROBABILISTIC REVISING RULES

The General Conditioning rule (GC) is defined as follows:
- Let ≤K be a total preorder on W, summarized by an extended Kronecker function (the selection function) such that:
(γ1) δ(w; A, K) = 1 iff δ(w, A) = 1 and ∀w', δ(w', A) = 1 ⇒ w ≤K w',
where the usual Kronecker function δ(w, A) equals one iff w belongs to A.
- Let φ(w; A, P) be a positive function (the allocation function) compatible with the selection function, i.e. such that:
(γ2) δ(w; A, K) = 1 ⇒ φ(w; A, P) > 0

Weak General Conditioning (GCW) specifies the allocation process by the following formula:

(GCW) P*A(w) = P1(w/A) = δ(w; A, K) φ1(w; A, P) / ∑w'∈W δ(w'; A, K) φ1(w'; A, P)

This method allocates the total weight of all worlds between all selected worlds. It applies even for a message such that P(A) = 0 and defines a probability distribution according to (γ1) and (γ2). The representation theorem for weak revising is (proof in appendix 2):

Theorem 1: The weak probabilistic revising axiom set Br is satisfied if and only if the change rule belongs to the Weak General Conditioning method (GCW).

A precise General Conditioning rule is generated by (GCW) only when the functions δ(w; A, K) and φ1(w; A, P) are well specified. For instance, when P(A) > 0, an α-rule is defined by:
- δ(w; A, K) = 1 iff w ∈ K ∩ A.
- φ1(w; A, P) = P(w)^α, α being a finite real number.
When α = 1, it reduces to the standard Bayes' rule of conditioning, where the probability of all worlds is allocated to the selected worlds proportionally to their prior probability. When α > 1, it reduces to a « strengthened Bayes' rule », which favors the worlds with highest probability. When α < 1, it reduces symmetrically to a « weakened Bayes' rule ». Especially, when α = 0, it leads to the « Egalitarian rule », which gives equal probability to all selected worlds and can be applied even if P(A) = 0.

2.2.3. WEAK PROBABILISTIC UPDATING RULES

The General Imaging method (GI) is defined by a set of selection functions as follows:
- Let {≤w} be a set of total preorders on W, each summarized by an extended Kronecker function (the selection functions) such that:
(η1) δ(w; A, w') = 1 iff δ(w, A) = 1 and ∀w'', δ(w'', A) = 1 ⇒ w ≤w' w''
- Let φ1(w; A, P) be a positive function (the allocation function) compatible with the previous selection function, i.e. such that:
(η2) maxw'∈W δ(w; A, w') = 1 ⇒ φ1(w; A, P) > 0.

Weak General Imaging (GIW) specifies the allocation process by the following formula :


(GIW) P*A(w) = P1(w//A) = [maxw'∈W δ(w; A, w')] φ1(w; A, P) / ∑w''∈W [maxw'∈W δ(w''; A, w')] φ1(w''; A, P)

This method allocates the total weight of all worlds between all selected worlds. It defines a probability distribution according to (η1) and (η2) and applies again even if P(A) = 0. The representation theorem for weak updating is the following (proof in appendix 2):

Theorem 2: The weak probabilistic updating axiom system Bu is satisfied if and only if the change rule belongs to the Weak General Imaging method (GIW).

More precisely, α-rules can again be defined by stating: φ1(w; A, P) = [P(w)]^α, α being a finite real number.

2.3. STRONG PROBABILISTIC RULES

Strong probabilistic rules associated with the strong axioms can now be obtained by specification of the allocation function translating the weights from initial worlds to final ones.

2.3.1. STRONG PROBABILISTIC REVISING RULES

Strong General Conditioning (GCS) specifies the allocation process by the following formula:

(GCS) P*A(w) = P2(w/A) = P(w∩A) + P(-A) . δ(w; A, K) φ2(w; A, P) / ∑w'∈W δ(w'; A, K) φ2(w'; A, P)

This method allocates the total weight of all excluded worlds between all selected worlds, while keeping at least the prior probability of the selected worlds. It again defines a probability distribution according to (γ1) and (γ2). Remark : For each rule, normalized allocation functions can be defined by :

Ψ(w; A, P) = φ(w; A, P) / ∑w'∈W δ(w'; A, K) φ(w'; A, P)

Hence, noticing that P(w∩A) = δ(w; A, K) P(w), (GCS) is a special case of (GCW), obtained by defining a “strong” normalized allocation function Ψ2(w; A, P) from a “weak” one Ψ1(w; A, P) through the following constraint:

Ψ1(w; A, P) = P(w) + P(-A) Ψ2(w; A, P)

The representation theorem for strong revising is the following (proof in appendix 2):

Theorem 3: The strong probabilistic revising axiom set B#r is satisfied if and only if the change rule belongs to the Strong General Conditioning method (GCS).

A precise General Conditioning rule is generated by (GCS) only when the functions δ(w; A, K) and φ2(w; A, P) are well specified. For instance, when P(A) > 0, an α-rule is defined by:
- δ(w; A, K) = 1 iff w ∈ K ∩ A.
- φ2(w; A, P) = P(w)^α, α being a finite or infinite real number.
When α = 1, it reduces again to the standard Bayes' rule. When α = 0, it leads to a « Distorted egalitarian rule ». When α = ∞, it leads to a « Lexicographic rule », which allocates the total probability of the eliminated worlds to the selected world with highest probability (each selected world nevertheless keeping a positive probability). When P(A) > 0, the Bayes' rule is the only one which is simultaneously an α-rule for (GCW) and for (GCS), the allocation function being moreover the same. Bayesian General Conditioning (GC-B) is defined as the usual Bayes' rule when P(A) > 0 and by any allocation rule such that φ1(w; A, P) = φ2(w; A, P) = φ(w) when P(A) = 0. The representation theorem for super strong revising is (proof in appendix 2):

Theorem 4: The super strong revising axiom system B##r is satisfied if and only if the change rule belongs to the Bayesian General Conditioning method (GC-B).

Remark: This result differs from that of Gärdenfors (1988), which does not make clear the distinction between the different levels of transcription: the representation theorem justifying the Bayes' rule requires Linear Mixing, which is a demanding transcription of Right distributivity, though still a weaker one than the Gärdenfors axiom (P+1).

2.3.2. STRONG PROBABILISTIC UPDATING RULES

Let now {φ2(w; A, P, w')} be a set of positive functions (the allocation functions) compatible with the previous selection function, i.e. such that:
(η2)' δ(w; A, w') = 1 ⇒ φ2(w; A, P, w') > 0, ∀w'.

Strong General Imaging (GIS) specifies the allocation process by the following formula:

(GIS) P*A(w) = P2(w//A) = ∑w'∈W P(w') . δ(w; A, w') φ2(w; A, P, w') / ∑w''∈W δ(w''; A, w') φ2(w''; A, P, w')

This method allocates the probability of each world to the corresponding selected worlds. It defines a probability distribution according to (η1) and (η2)’ and applies again to any message even if P(A) = 0. It generalizes Imaging, first introduced by Lewis (1976) in the special case where each world has only one nearest world (hereafter “Lewisian Imaging”). Remark : Starting from a set {φ1 (w; A, P, w’)} of allocation functions satisfying (η2)’, (GIW) can be written equivalently :

(GIW) P*A(w) = P1(w//A) = ∑w'∈W P(w') . [maxw'''∈W δ(w; A, w''')] φ1(w; A, P, w') / ∑w''∈W [maxw'''∈W δ(w''; A, w''')] φ1(w''; A, P, w')

It follows obviously that (GIS) is a special case of (GIW) since it corresponds to : φ1(w;A,P,w’) = φ2(w;A,P,w’) if δ(w;A,w’) = 1 φ1(w;A,P,w’) = 0 if δ(w;A,w’) = 0 The representation theorem for strong updating is (proof in appendix 2):

Theorem 5: The strong probabilistic axiom system B#u is satisfied if the change rule belongs to the Strong General Imaging method (GIS).

More precisely, α-rules can again be defined by: φ2(w; A, P, w') = [P(w)]^α, α being a finite real number. When α = 0, the Egalitarian rule φ2(w; A, P, w') = 1 is always applicable, the prior probability of each initial world being allocated equally between its nearest A-worlds (Lepage, 1991).
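The (GIS) allocation can be sketched numerically. This is a minimal sketch, assuming the canonical basket priors (1/2, 1/3, 1/6, 0 on the four worlds) and a nearest-world map in which (a ∧ b) has two nearest worlds under the message "one fruit removed if there were two"; the world labels ('ab', '-ab', ...) are ad hoc:

```python
from fractions import Fraction as F

# Assumed priors of the canonical example ('ab' stands for a ∧ b, etc.).
prior = {'ab': F(1, 2), '-ab': F(1, 3), 'a-b': F(1, 6), '-a-b': F(0)}
# Nearest message-worlds of each world: (a ∧ b) has two, the others keep themselves.
nearest = {'ab': ['-ab', 'a-b'], '-ab': ['-ab'], 'a-b': ['a-b'], '-a-b': ['-a-b']}

def gis(prior, nearest, phi):
    """Strong General Imaging: the mass of each initial world is split
    among its nearest message-worlds proportionally to phi."""
    post = {w: F(0) for w in prior}
    for w, mass in prior.items():
        if mass == 0:
            continue          # nothing to transfer
        weights = [phi(v) for v in nearest[w]]
        total = sum(weights)
        for v, x in zip(nearest[w], weights):
            post[v] += mass * x / total
    return post

one_rule = gis(prior, nearest, lambda v: prior[v])  # alpha = 1
egalitarian = gis(prior, nearest, lambda v: F(1))   # alpha = 0 (Lepage's rule)

print(one_rule)     # -ab gets 2/3, a-b gets 1/3
print(egalitarian)  # -ab gets 7/12, a-b gets 5/12
```

Exact rational arithmetic makes the comparison with the conditioning rules of the next subsection straightforward.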


When α ≠ 0, the α-rules can be applied only when all the nearest worlds from w have a positive prior probability, not for technical reasons but in order to satisfy (η2)'. However, mixed rules defined by φ2(w; A, P, w') = c + (1-c)[P(w)]^α (0 < c < 1) are always applicable. P-independent General Imaging (GI-P) is defined by adding to (GIS) the following constraint: ∀w, φ2(w; A, P, w') = φ2(w). The Egalitarian rule is in fact a special case of a P-independent rule. The representation theorem for super-strong updating is (proof in appendix 2):

Theorem 6: The super-strong probabilistic updating axiom system B##u is satisfied if and only if the change rule belongs to the P-independent method (GI-P).

2.3.3. CANONICAL EXAMPLE

The probabilistic representation theorems can be illustrated by extending the canonical example to a numerical framework: consider prior probabilities assigned to each possible world of K, say 1/2 to (a ∧ b), 1/3 to (-a ∧ b), 1/6 to (a ∧ -b) and 0 to (-a ∧ -b). Revising leads to posterior probability 1 assigned to the world (a ∧ -b). Updating leads to probability 2/3 to (a ∧ -b), nearest world from (a ∧ b) and from itself, and 1/3 to (-a ∧ -b), nearest world from (-a ∧ b). This example points to the different selection functions used in the two contexts, but leaves aside the problem of the allocation function; hence, a second basket example can be considered: the initial belief is kept, but the message becomes A' = -(a ∧ b) = (-a ∧ b) v (a ∧ -b) v (-a ∧ -b). Interpreting the message as « there is at most one fruit at t », revising leads to posterior probability 1 associated to (-a ∧ b) and (a ∧ -b) conjointly. Hence, it leaves open the problem of the allocation of the total weight among the two worlds of the intersection:
- using (GCW) α-rules, the Bayes' rule leads to posterior probability 2/3 for (-a ∧ b) and 1/3 for (a ∧ -b). The Egalitarian rule leads to 1/2 for (-a ∧ b) and 1/2 for (a ∧ -b). But the 3-rule leads to 8/9 for (-a ∧ b) and 1/9 for (a ∧ -b). In this last case, the posterior probability of (a ∧ -b) is less than its prior probability. This effect does not happen with the second method of General Conditioning (GCS).
- using (GCS) α-rules, the Distorted egalitarian rule leads to posterior probability 7/12 for (-a ∧ b) and 5/12 for (a ∧ -b). The Bayes' rule leads again to 2/3 and 1/3 respectively. The 3-rule leads to 7/9 and 2/9, the posterior probability of (a ∧ -b) now being more than its prior probability. The Lexicographic rule leads finally to 5/6 and 1/6.
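These α-rule figures can be checked mechanically. A minimal sketch using exact rational arithmetic, representing only the two selected worlds of the message A' (priors 1/3 and 1/6, hence P(A') = 1/2):

```python
from fractions import Fraction as F

# Selected worlds of A' = -(a ∧ b): (-a ∧ b) and (a ∧ -b).
p = [F(1, 3), F(1, 6)]
pA = sum(p)                       # P(A') = 1/2

def gcw(alpha):
    """Weak General Conditioning alpha-rule: the whole unit mass is
    reallocated proportionally to P(w)^alpha."""
    w = [q ** alpha for q in p]
    return [x / sum(w) for x in w]

def gcs(alpha):
    """Strong General Conditioning alpha-rule: each selected world keeps
    its prior mass and receives a share of P(-A') = 1/2."""
    w = [q ** alpha for q in p]
    return [q + (1 - pA) * x / sum(w) for q, x in zip(p, w)]

print(gcw(1))  # Bayes' rule: 2/3 and 1/3
print(gcw(0))  # Egalitarian rule: 1/2 and 1/2
print(gcw(3))  # 3-rule: 8/9 and 1/9
print(gcs(0))  # Distorted egalitarian rule: 7/12 and 5/12
print(gcs(1))  # Bayes' rule again: 2/3 and 1/3
print(gcs(3))  # 3-rule: 7/9 and 2/9
```

Note how the (GCW) 3-rule drives (a ∧ -b) below its prior 1/6, while the (GCS) 3-rule keeps it above.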
Interpreting the message as « if there were two fruits, one has been removed at t+1 », updating leads to leaving the probabilities 1/3 on (-a ∧ b) and 1/6 on (a ∧ -b) and to transferring probability 1/2 conjointly to (-a ∧ b) and (a ∧ -b) (nearest worlds from (a ∧ b)). Hence, it goes beyond Lewisian Imaging by considering that several worlds can be nearest from one world, an allocation function being again necessary. Using (GIS) α-rules, the 1-rule leads to the same result as the Bayes' rule, namely 2/3 to (-a ∧ b) and 1/3 to (a ∧ -b). The Egalitarian rule leads to the same result as the General Conditioning Distorted egalitarian rule, namely 7/12 and 5/12 respectively.

Remark 1: The fact that P-dependent rules do not necessarily satisfy Homomorphism can be illustrated by considering the mixing of two probability distributions addressed by the same message. Within the previous example, consider another probability distribution P' giving 1/2 to (a ∧ b), 1/2 to (a ∧ -b) and 0 to the other worlds, the agent believing indeed at t that there is an apple in the basket. When applying the updating message A', the posterior probabilities resulting from the 1-rule are 1/4 to (-a ∧ b) and 3/4 to (a ∧ -b). With the same 1-rule, the previous prior P - giving 1/2 to (a ∧ b), 1/3 to (-a ∧ b) and 1/6 to (a ∧ -b) - leads to 2/3 to (-a ∧ b) and 1/3 to (a ∧ -b). An average prior probability Q = 1/2 (P + P') gives 1/2 to (a ∧ b), 1/6 to (-a ∧ b) and 1/3 to (a ∧ -b); the posterior probability resulting from the 1-rule gives 1/3 to (-a ∧ b) and 2/3 to (a ∧ -b). It differs from the average of the posterior probabilities, which gives 11/24 to (-a ∧ b) and 13/24 to (a ∧ -b).

Remark 2: The fact that Bayes' rule does not satisfy Homomorphism can be considered as an explanation of the well-known Simpson paradox (Simpson, 1951), illustrated by the following example. Consider a town T with 12000 inhabitants, i.e. 8000 men among whom 5000 are sick, and 4000 women among whom 3000 are sick.
Consider a town T' with 8000 inhabitants, i.e. 2000 men among whom none are sick, and 6000 women among whom 1000 are sick. If X is the event « to be sick » and A the event « to be a man », one has respectively in both towns:


P(A) = 2/3    P(X) = 2/3    PA(X) = 5/8 < P-A(X) = 3/4
P'(A) = 1/4   P'(X) = 1/8   P'A(X) = 0 < P'-A(X) = 1/6

In each town, women are proportionally more sick than men. By gathering both towns (according to their populations), one has 20000 inhabitants, 10000 men among whom 5000 are sick and 10000 women among whom 4000 are sick:

Q(A) = 1/2    Q(X) = 9/20   QA(X) = 1/2 > Q-A(X) = 2/5

Now men are sicker than women. This is a violation of Homomorphism with a = 3/5.

2.3.4. EQUIVALENCE WITH THE MILLER-POPPER AXIOM SYSTEM

Miller and Popper (1994) have proposed a probabilistic change axiom system which relies on conditional probability as a basic object, hence addressing the zero-probability puzzle, but without proposing a constructive rule for solving it. Several sets of axioms of different strength were proposed by Miller and Popper. System B, the strongest one, ensures that its models are reducible to a Boolean algebra (see Bradley, 1997). The probability distributions satisfying system B can be precisely characterized (Spohn, 1986). This system will be written hereafter not as originally with propositions x, y, z, but with events X, Y, Z, the probability of X conditionally on Y being denoted P(X/Y). The six axioms are followed by a convention linking ordinary probability to conditional probability:
PM1: 0 ≤ P(X/Y) ≤ P(Z/Z)
PM2: ∃ X, Y s.t. P(X/Y) ≠ 0
PM3: P(X∩Y/Z) ≤ P(X/Z)
PM4: P(X∩Y/Z) = P(X/Y∩Z) P(Y/Z)
PM5: P(XUY/Z) + P(X∩Y/Z) = P(X/Z) + P(Y/Z)
PM6: if ∃ Y s.t. P(Y/Z) ≠ P(Z/Z) then ∀X: P(X/Z) + P(-X/Z) = P(Z/Z)
Convention: P(X) = P(X/W)
The following theorem (proof in appendix 3) shows that the Miller-Popper system is equivalent to the super-strong axiomatic system for revising B##r. It follows that it can be represented by the Bayesian General Conditioning method (GC-B), which offers a family of constructive rules for the zero-probability case:

Theorem 7 : A revised probability distribution satisfies the Miller-Popper axiom system B if and only if it satisfies the super-strong axiom system B##r.

APPENDIX 1: SET-THEORETIC DERIVED AXIOMS

Under Success (A2), Sub-expansion (A4) and Super-expansion (A5) are equivalent to Right distributivity (A45).

Let Dual sub-expansion be the following axiom : K * (A U B) ∩ A ⊆ (K * A) Then : under A2, A4 ⇔ Dual sub-expansion


Indeed: A4 ⇒ Dual sub-expansion is obvious since (A∪B) ∩ A = A.
Dual sub-expansion ⇒ A4: for any A and B, take C = A ∩ B and D = A ∩ -B; since A = C∪D, Dual sub-expansion gives (K * A) ∩ (A ∩ B) ⊆ K * (A ∩ B). Hence, by A2, (K * A) ∩ B ⊆ K * (A ∩ B).

Let Dual super-expansion be the following axiom: if K * (A∪B) ∩ A ≠ ∅ then K * A ⊆ K * (A∪B) ∩ A. Then: under A2, A5 ⇔ Dual super-expansion.
Indeed: A5 ⇒ Dual super-expansion is obvious since (A∪B) ∩ A = A.
Dual super-expansion ⇒ A5: for any A and B, take C = A ∩ B and D = A ∩ -B; since A = C∪D, Dual super-expansion gives: if (K * A) ∩ (A ∩ B) ≠ ∅ then K * (A ∩ B) ⊆ (K * A) ∩ (A ∩ B). Hence, by A2, K * (A ∩ B) ⊆ (K * A) ∩ B.

It follows: Dual sub-expansion and Dual super-expansion ⇒ A45.
Indeed: Dual sub-expansion implies by disjunction: [K * (A∪B) ∩ A] U [K * (A∪B) ∩ B] ⊆ (K * A) U (K * B). Hence, by A2, K * (A U B) ⊆ (K * A) U (K * B). By A2 again, this implies moreover: if K * (A∪B) ∩ A = ∅ then K * (A U B) ⊆ K * B. Dual super-expansion implies straightforwardly: if K * (A∪B) ∩ A ≠ ∅ then K * A ⊆ K * (A U B). Hence, the following table can be established:

K*(A∪B)  | ∩ B ≠ ∅           | ∩ B = ∅
∩ A ≠ ∅  | (K * A) U (K * B) | K * A
∩ A = ∅  | K * B             | Impossible

A45 ⇒ Dual sub-expansion and Dual super-expansion is obvious when considering the previous table and the equivalent formulation of Dual sub-expansion: K * (A∪B) ∩ -B ⊆ K * A.
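The A45 conclusion can also be checked by brute force on a small model: with any total preorder on worlds, minimal-world revision makes K*(A∪B) equal to K*A, K*B or their union, for every pair of messages. A sketch with an arbitrary four-world preorder (the ranking is ours, chosen for illustration):

```python
from itertools import combinations

W = [0, 1, 2, 3]
rank = {0: 0, 1: 1, 2: 1, 3: 2}   # an arbitrary total preorder <=K on worlds

def revise(A):
    """K*A = the <=K-minimal worlds of the message A."""
    m = min(rank[w] for w in A)
    return frozenset(w for w in A if rank[w] == m)

messages = [frozenset(s) for n in range(1, 5) for s in combinations(W, n)]
for A in messages:
    for B in messages:
        # Right distributivity (A45): K*(A U B) is K*A, K*B or their union.
        assert revise(A | B) in (revise(A), revise(B), revise(A) | revise(B))
print("Right distributivity holds for every pair of messages")
```

The check succeeds because the minimal rank over A∪B is the smaller of the minimal ranks over A and over B, which is exactly the case analysis of the table above.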

Preservation (A5)’ and Left-distributivity (A7) are contradictory

Consider K such that K ∩ A = ∅, and H = A\(K*A) ≠ ∅; let K' = K U H.
From A5' and K' ∩ A ≠ ∅: K' * A ⊆ K' ∩ A = H.
But from A7', K ⊆ K' ⇒ (K * A) ⊆ (K' * A), hence K * A ⊆ H; since K * A is non-empty and disjoint from H = A\(K*A) by construction, this is impossible. Thus A5' and A7' are contradictory. Consequently A5' and A7 are contradictory since A7 ⇒ A7'.
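The contradiction can be instantiated on three worlds; the following sketch (with an assumed revision K*A = {2}) enumerates the candidate revisions of K' by A and finds none satisfying both requirements:

```python
# Appendix 1's contradiction, instantiated on three worlds.
K, A = {1}, {2, 3}
K_star_A = {2}            # some revision of K by A (here K ∩ A = ∅)
H = A - K_star_A          # H = {3}, non-empty
K2 = K | H                # K' = K ∪ H = {1, 3}

# Preservation (A5') forces K' * A ⊆ K' ∩ A = {3}, while monotony (A7')
# forces K * A = {2} ⊆ K' * A.  No candidate revision satisfies both:
candidates = [S for S in ({2}, {3}, {2, 3})
              if S <= (K2 & A) and K_star_A <= S]
print(candidates)  # [] : the two requirements are incompatible
```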

APPENDIX 2 : REPRESENTATION THEOREMS

Theorem 1 : The weak probabilistic revising axiom set Br is satisfied if and only if the change rule belongs to the Weak General Conditioning method (GCW).

If sense : From (γ1) and (γ2) it is easy to compute that : Sup(P*A ) = {w, P*A(w) > 0} = {w, δ (w,A) = 1 and ∀w’, δ (w’; A) = 1 ⇒ w ≤K w’} Consequently, assuming the conditions (i) and (ii) on ≤K, the theorem KM1 implies that the change rule producing Sup(P*A) from Sup(P) satisfies the set of revising axioms Ar. The conclusion follows by considering lemma 1. Only if sense : Lemma 1 asserts that if P*A satisfies Br, Sup(P*A ) satisfies Ar. Theorem KM1 asserts then the existence of the selection function δ(w; A, K). Since no further function is assumed, the allocation function is completely general, under the only restriction of (γ1) and (γ2).

Theorem 3 :


The strong probabilistic revising axiom set B#r is satisfied if and only if the change rule belongs to the Strong General Conditioning method (GCS).

If sense: Since (GCS) is a special case of (GCW), theorem 1 ensures that it satisfies the set of axioms Br. It satisfies the additional axioms too.
(GCS) obviously satisfies B#3 (contrary to GCW): if P(A) = 1 then P(-A) = 0 and P(w∩A) = P(w), ∀w ∈ A. Hence P2(w/A) = P(w), ∀w ∈ A (i.e. ∀w s.t. P(w) > 0), and by (Ĝ), P2(X/A) = P(X), ∀X.
(GCS) also satisfies B#4 (contrary to GCW). According to (Ĝ), it is enough to show that the axiom is satisfied for each world:
(1) P2(w/A∩B) ≥ P2(w∩B/A), ∀w ∈ W
Using (GCS):

P2(w/A∩B) = P(w∩A∩B) + P(-A∪-B) . δ(w; A∩B, K) φ2(w; A, P) / ∑w'∈W δ(w'; A∩B, K) φ2(w'; A, P)

P2(w∩B/A) = P(w∩B∩A) + P(-A) . δ(w∩B; A, K) φ2(w; A, P) / ∑w'∈W δ(w'; A, K) φ2(w'; A, P)

(with the convention δ(∅; A, K) = 0).
The Sub-expansion axiom A4 writes, using the Kronecker functions: if δ(w, B) = 1 and δ(w; A, K) = 1 then δ(w; A∩B, K) = 1, or: δ(w; A∩B, K) ≥ δ(w; A, K) δ(w, B), ∀w.
The Super-expansion axiom A5 writes similarly: if ∃w s.t. δ(w, B) = 1 and δ(w; A, K) = 1, then δ(w; A∩B, K) = 1 ⇒ δ(w; A, K) = 1 and δ(w, B) = 1, ∀w. Or: if ∃w s.t. δ(w, B) = 1 and δ(w; A, K) = 1, then δ(w; A∩B, K) ≤ δ(w; A, K) δ(w, B), ∀w.
Hence, if there exists a world of B selected in A:
∀w ∈ W, δ(w; A∩B, K) ≥ δ(w∩B; A, K)
∀w ∈ W, δ(w; A∩B, K) ≤ δ(w; A, K)
Moreover: P(-A∪-B) ≥ P(-A). The inequality (1) follows by combination. If there exists no world of B selected in A, P2(w∩B/A) = 0 ∀w and the inequality (1) is automatically satisfied.
Only if sense: In (GCW), call Ψ1(w; A, P) the normalized allocation function:

Ψ1(w; A, P) = φ1(w; A, P) / ∑w'∈W δ(w'; A, K) φ1(w'; A, P)

The revised probability given by (GCW) writes: P*A(w) = δ(w; A, K) Ψ1(w; A, P). Hence:
P*A(w) - P(w∩A) = δ(w; A, K) (Ψ1(w; A, P) - P(w)) = δ(w; A, K) Ψ'2(w; A, P), with Ψ'2(w; A, P) = Ψ1(w; A, P) - P(w).
Taking the weighted average on both sides:
(2) ∑w∈W δ(w; A, K) Ψ'2(w; A, P) = ∑w∈W δ(w; A, K) Ψ1(w; A, P) - ∑w∈W δ(w; A, K) P(w) = 1 - P(A) = P(-A)
Consider now: Ψ'2(w; A, P) = P(-A) Ψ2(w; A, P).
According to B#4, Ψ2(w; A, P) is positive; according to (2), it is normalized. Hence, Ψ2(w; A, P) is a normalized allocation function related to Ψ1(w; A, P) by:


Ψ1(w; A, P) = P(w) + P(-A) Ψ2(w; A, P)
The revised probability is thus given by (GCS) with:

Ψ2(w; A, P) = φ2(w; A, P) / ∑w'∈W δ(w'; A, K) φ2(w'; A, P)

This theorem shows that the strong axiomatic system B#r does not single out the Bayes' rule, but the more general class of rules (GCS). In order to single out the Bayes' rule, the super-strong axiom system, which relies on Linear mixing, is needed :

Theorem 4 : The super-strong revising axiom system B##r is satisfied if and only if the change rule belongs to the Bayesian General Conditioning method (GC-B)

If sense: (GC-B) is trivially a special case of (GCS). Hence, theorem 3 ensures that it satisfies the system B#r. It satisfies also the additional axiom of Linear Mixing B##45. Indeed, it satisfies the condition: If A ∩ A’ = ∅ then P*AUA’ (X) = a P*A(X) + (1-a) P*A’(X) • If P(A) P(A’) ≠ 0:

P*AUA'(X) = P(X∩(AUA')) / P(AUA')
= [P(X∩A) + P(X∩A')] / [P(A) + P(A')]
= [P(A) / (P(A) + P(A'))] . [P(X∩A) / P(A)] + [P(A') / (P(A) + P(A'))] . [P(X∩A') / P(A')]
i.e. P*AUA'(X) = a P*A(X) + (1-a) P*A'(X) with a = P(A) / (P(A) + P(A')) = P*AUA'(A)

• If P(A) = 0 and P(A’) ≠ 0 (or conversely):

P*AUA'(X) = P*A'(X) = a P*A'(X) + (1-a) P*A(X) with a = 1

• If P(A) = 0 and P(A’) = 0:

P*AUA'(X) = ∑w∈X δ(w; AUA', K) φ(w) / ∑w'∈W δ(w'; AUA', K) φ(w'), because in this case φ1(w; A, P) = φ(w)

P*A(X) and P*A'(X) can be written in the same way, by substituting A and A' respectively for AUA'. Since A ∩ A' = ∅, three cases have to be considered:
a. δ(w; AUA', K) = δ(w; A, K) ∀w ∈ W, hence Linear mixing is satisfied with a = 1
b. δ(w; AUA', K) = δ(w; A', K) ∀w ∈ W, hence Linear mixing is satisfied with a = 0
c. δ(w; AUA', K) = δ(w; A, K) + δ(w; A', K) ∀w ∈ W, hence Linear mixing is again satisfied with:

a = ∑w'∈W δ(w'; A, K) φ(w') / ∑w'∈W δ(w'; AUA', K) φ(w') = P*AUA'(A)

Only if sense: Consider A' = {w'} ⊆ K and A = K \ A'. If P*A(X) is computed along a (GCS) rule, then for any w ∈ A:

P*AUA'(w) = P(w)


P*A(w) = P(w) + P(w') . φ2(w) / ∑w''∈A φ2(w'')

P*A'(w) = 0
Consequently, by applying Linear mixing:

(1-a) P(w) = a P(w') . φ2(w) / ∑w''∈A φ2(w'')

Hence, if there are at least two worlds w1 and w2 in A:

P(w1) / P(w2) = φ2(w1) / φ2(w2)

The same holds when taking another w’ ∈ K. Finally, ∀ w ∈ K, φ2(w) = kP(w), an allocation function characterizing the Bayes' rule.

Theorem 2 : The weak probabilistic updating axiom system Bu is satisfied if and only if the change rule belongs to the Weak General Imaging method (GIW).

If sense: Considering (η1) and (η2), it is easy to check that:
Sup(P*A) = {w, P*A(w) > 0} = {w, δ(w, A) = 1 and ∃w', ∀w'', δ(w'', A) = 1 ⇒ w ≤w' w''}
Consequently, assuming the condition (j) on ≤w', theorem KM2 implies that the change rule that maps Sup(P) to Sup(P*A) satisfies the set of updating axioms Au. Considering lemma 2, this means that (GIW) satisfies the set Bu.
Only if sense: Lemma 2 asserts that if P*A satisfies Bu, its support K*A satisfies Au. Theorem KM2 asserts then the existence of a selection function. Since no further constraint is assumed, the allocation function is completely general, under the only restriction of (η1) and (η2).

Theorem 5 : The strong probabilistic axiom system B#u is satisfied if the change rule belongs to the Strong General Imaging method (GIS)

If sense: Since (GIS) is a special case of (GIW), theorem 2 ensures that it satisfies system Bu. It satisfies the additional axioms too.
(GIS) obviously satisfies B#3: if P(A) = 1 then δ(w; A, w) = 1, ∀w ∈ A, and δ(w; A, w') = 0, ∀w ∈ A, ∀w' ≠ w. Hence P(w//A) = P(w), ∀w ∈ A (i.e. ∀w s.t. P(w) > 0), and by (Ĝ), P(X//A) = P(X), ∀X.
(GIS) satisfies B#4. According to (Ĝ), it is enough to show that the axiom is satisfied for each world:
(1) P(w//A∩B) ≥ P(w∩B//A), ∀w ∈ W
Using (GIS):

P2(w//A∩B) = ∑w'∈W P(w') . δ(w; A∩B, w') φ2(w; A, P, w') / ∑w''∈W δ(w''; A∩B, w') φ2(w''; A, P, w')

P2(w∩B//A) = ∑w'∈W P(w') . δ(w∩B; A, w') φ2(w; A, P, w') / ∑w''∈W δ(w''; A, w') φ2(w''; A, P, w')

Here again, the Sub-expansion axiom A4 as well as the Pointwise Super-expansion axiom A6 apply to K = {w’} and can be stated with the Kronecker functions: δ (w; A ∩ B, w’) ≥ δ(w; A, w’) δ(w; B), ∀ w ∈ W


If ∃w s.t. δ(w, B) δ(w; A, w') = 1, then δ(w; A∩B, w') ≤ δ(w; A, w') δ(w, B), ∀w ∈ W. In fact, these inequalities are straightforward consequences of any system of nested spheres. Hence, if there exists a world of B among the nearest worlds of w' in A:
∀w ∈ W, δ(w; A∩B, w') ≥ δ(w∩B; A, w')
∀w ∈ W, δ(w; A∩B, w') ≤ δ(w; A, w')
By combination, for each w' s.t. there exists a world of B among its nearest worlds in A:

(1)' δ(w; A∩B, w') φ2(w; A, P, w') / ∑w''∈W δ(w''; A∩B, w') φ2(w''; A, P, w') ≥ δ(w∩B; A, w') φ2(w; A, P, w') / ∑w''∈W δ(w''; A, w') φ2(w''; A, P, w')

If there exists no world of B among the nearest worlds of w' in A, (1)' still holds because δ(w∩B; A, w') = 0. By summing on the w', (1)' implies (1).
(GIS) also satisfies B#7: consider a probability distribution Q(w) = a P(w) + (1-a) P'(w), a ∈ ]0,1[, ∀w ∈ W. Hence:

P'2(w//A) = ∑w'∈W P'(w') . δ(w; A, w') φ2(w; A, P', w') / ∑w''∈W δ(w''; A, w') φ2(w''; A, P', w')

Q2(w//A) = ∑w'∈W (a P(w') + (1-a) P'(w')) . δ(w; A, w') φ2(w; A, Q, w') / ∑w''∈W δ(w''; A, w') φ2(w''; A, Q, w')

The fact that P'2(w//A) = 0 implies that P'(w') δ(w; A, w') = 0, ∀w'. Hence Q2(w//A) does not depend on the distribution P'.
Only if sense: Any distribution P(.) can be written:

P(.) = ∑w'∈W P(w') Π(w'; .)

where Π(w'; .) is the probability distribution concentrated in w'. It is easily computed that:

Π*A(w'; w) = δ(w; A, w') φ2(w; A, P, w') / ∑w''∈W δ(w''; A, w') φ2(w''; A, P, w')

If δ(w; A, w') = 0, then Π*A(w'; w) = 0; hence, according to axiom B#7, P*A(w) does not depend on P(w'). The rule (GIW) then reduces to the rule (GIS), where the weight P(w') contributes to P*A(w) only if δ(w; A, w') = 1.

Theorem 6: The super-strong probabilistic updating axiom system B##u is satisfied if and only if the change rule belongs to the P-independent method (GI-P).

If sense: (GI-P) is trivially a special case of (GIS). Hence theorem 5 ensures that it satisfies system B#u. It also satisfies the additional axiom of Homomorphism B##7. Consider two prior probability distributions P and P' and a message A. Then, for every world w, Strong General Imaging (GIS) with φ2(w; A, P, w') = φ(w) leads to:

(1) P2(w//A) = ∑w'∈W P(w') . δ(w; A, w') φ(w) / ∑w''∈W δ(w''; A, w') φ(w'') = ∑w'∈W P(w') δ(w; A, w') Ψ(w)


(2) P'2(w//A) = ∑w'∈W P'(w') . δ(w; A, w') φ(w) / ∑w''∈W δ(w''; A, w') φ(w'') = ∑w'∈W P'(w') δ(w; A, w') Ψ(w)

Consider now the average distribution Q such that for each w':
(3) Q(w') = a P(w') + (1-a) P'(w')
For each w, the change of Q leads to:

(4) Q2(w//A) = ∑w'∈W Q(w') . δ(w; A, w') φ(w) / ∑w''∈W δ(w''; A, w') φ(w'') = ∑w'∈W Q(w') δ(w; A, w') Ψ(w)

But the averaging of P2(w//A) and P'2(w//A) leads to:
(5) a P2(w//A) + (1-a) P'2(w//A) = a ∑w'∈W P(w') δ(w; A, w') Ψ(w) + (1-a) ∑w'∈W P'(w') δ(w; A, w') Ψ(w)

Considering (3), the equality of (4) and (5) is straightforward. But if this is true for any world w, it is also true for any event X, according to (Ĝ).
Only if sense: A (GIS) rule using a P-dependent allocation function cannot satisfy Homomorphism. Indeed, consider a world w' with two nearest worlds in A, say w1 and w2. The part of the posterior probability of w1 due to w' writes:

P(w') . φ2(w1; A, P, w') / [φ2(w1; A, P, w') + φ2(w2; A, P, w')]

The same holds for the prior probability P'. Hence, Homomorphism implies that φ2(w1; A, P, w') / φ2(w2; A, P, w') be constant. In other terms, it implies that φ2(w1; A, P, w') = φ(w1), i.e. does not depend on the prior probability distribution.

APPENDIX 3 : EQUIVALENCE WITH MILLER-POPPER AXIOMS

Theorem 7 : A revised probability distribution satisfies the Miller-Popper axiom system B if and only if it satisfies the super-strong axiom system B##r.

Only if sense:
B1 (Consistency): obvious.
B2 (Success): in PM4, take X = Y = Z: P(X/X) = P(X/X) P(X/X), hence P(X/X) = 0 or 1, ∀X. By PM1, P(X/X) = 0 implies P(X/Y) = 0, ∀X, Y, contradicting PM2; hence P(X/X) = 1, ∀X.
Three consequences concerning the probability of W and ∅ can be derived:
- in PM4, take X = Z, Y = W: P(Z/Z) = P(Z/Z) P(W/Z), hence P(W/Z) = 1
- in PM6, take X = W: if Z ≠ ∅, P(W/Z) + P(∅/Z) = P(Z/Z), hence P(∅/Z) = 0, ∀Z ≠ ∅
- in PM4, take X = Z = ∅: P(∅/∅) = P(∅/∅) P(Y/∅)


hence P(Y/∅) = 1, ∀Y.
B#3 (Strong conservation): in PM5, take X = X'∩Y' and Y = X'∩-Y':
P(X'/Z) = P(X'∩Y'/Z) + P(X'∩-Y'/Z) (1)
By applying PM4 to the last two terms: P(X'/Z) = P(X'/Y'∩Z) P(Y'/Z) + P(X'/-Y'∩Z) P(-Y'/Z). Assuming that P(Y'/Z) = 1, which implies by PM6 and B2 that P(-Y'/Z) = 0, one gets: P(X'/Z) = P(X'/Y'∩Z). Take now Z = W, i.e. P(Y') = 1 under the previous assumption; one gets: P(X') = P(X'/Y').
B#4 (Strong sub-expansion): by PM4: P(X∩Y/Z) = P(X/Y∩Z) P(Y/Z); by PM1: P(Y/Z) ≤ P(Z/Z) = 1; hence P(X∩Y/Z) ≤ P(X/Y∩Z).
B5 (Super-expansion): by PM4: P(X∩Y/Z) = P(X/Y∩Z) P(Y/Z); hence if P(Y/Z) > 0 and P(X/Y∩Z) > 0 then P(X∩Y/Z) > 0.
B##45 (Linear mixing): in (1), take Y' = Y''UZ'' with Y''∩Z'' = ∅:
P(X'/Z) = P((X'∩Y'') U (X'∩Z'')/Z) + P(X'∩-(Y''UZ'')/Z)
Applying again PM5 to the first term, since (X'∩Y'') ∩ (X'∩Z'') = ∅:
P(X'/Z) = P(X'∩Y''/Z) + P(X'∩Z''/Z) + P(X'∩-(Y''UZ'')/Z)
Applying now PM4 to the three terms:
P(X'/Z) = P(X'/Y''∩Z) P(Y''/Z) + P(X'/Z''∩Z) P(Z''/Z) + P(X'/-(Y''UZ'')∩Z) P(-(Y''UZ'')/Z)
Let Z = Y''UZ''; then the third term is null and one gets finally:
P(X'/Y''UZ'') = P(X'/Y'') P(Y''/Y''UZ'') + P(X'/Z'') P(Z''/Y''UZ'')
hence: P(X'/Y''UZ'') = a P(X'/Y'') + (1-a) P(X'/Z'')
If sense: It must just be proved that the axioms of system B do not constrain the change rules more than the Bayesian generalized method allows. If P(Z/W) > 0, the change rule is just the Bayes' rule, as proved by taking Y = Z' and Z = W in PM4: P(X/Z') = P(X∩Z') / P(Z'). If P(Z/W) = 0, the conditional probability P(./Z), which necessarily obeys PM1, PM2, PM3, PM5 and PM6 since it is a probability distribution, is just constrained by PM4.
In fact, P(./Z) can be linked to P(./Z') in two ways :
• Z ⊂ Z' : in PM4, take Z = Z' and Y such that Y∩Z' = Z :
P(X∩Y/Z') = P(X/Z) P(Y/Z') = P(X/Z) P(Z/Z')
For instance, if X = {w} and w ∈ Z, one has : P(w/Z') = P(w/Z) P(Z/Z')
The constraint on P(w/Z) is acting only if P(Z/Z') ≠ 0 (which is not the case when Z' = W).
• Z' ⊂ Z : in PM4, take Y = Z' : P(X∩Z'/Z) = P(X/Z') P(Z'/Z)
For instance, if X = {w} and w ∈ Z, one has :
- if w ∈ Z' : P(w/Z) = P(w/Z') P(Z'/Z)
- if w ∉ Z' : 0 = P(w/Z') P(Z'/Z)
The constraint on P(w/Z) is acting only in the first case.
In both cases, when the constraint is acting, one can write for two worlds w and w' belonging to Z :

P(w/Z) / P(w'/Z) = P(w/Z') / P(w'/Z')

This is just the condition that the weights are allocated proportionally to the allocation functions φ(w) and φ(w’).
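This proportionality condition can be illustrated in the simplest case Z' = W with a strictly positive prior : conditioning on Z rescales the weight of every world of Z by the common factor 1/P(Z), so ratios of weights inside Z are preserved. A minimal sketch of our own (the three-world space and the prior are hypothetical choices) :

```python
from fractions import Fraction

# Hypothetical three-world space; here Z' = W, so P(./Z') is the prior itself.
prior = {0: Fraction(1, 6), 1: Fraction(2, 6), 2: Fraction(3, 6)}

def P(X):
    """Unconditional probability of an event X."""
    return sum(prior[w] for w in X)

def Pc(X, Z):
    """Bayesian conditional probability P(X/Z), defined when P(Z) > 0."""
    return P(X & Z) / P(Z)

Z = frozenset({0, 1})                    # message with P(Z) > 0
w1, w2 = frozenset({0}), frozenset({1})  # two worlds belonging to Z

# Ratios inside Z are preserved: P(w/Z) / P(w'/Z) = P(w/W) / P(w'/W)
assert Pc(w1, Z) / Pc(w2, Z) == P(w1) / P(w2)
```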


ACKNOWLEDGEMENTS

The authors want to thank R. Bradley, D. Lehman, I. Levi and H. Zwirn for helpful comments.

REFERENCES

ALCHOURRON, C.E. - GÄRDENFORS, P. - MAKINSON, D. (1985): On the logic of theory change: partial meet contraction and revision functions, Journal of Symbolic Logic, 50, 510-530.
BILLOT, A. - WALLISER, B. (1999): A mixed knowledge hierarchy, Journal of Mathematical Economics, 32, 185-205.
BRADLEY, R. (1997): More triviality, Journal of Philosophical Logic, Vol. 28, N°2, 12-139.
COX, R.T. (1946): Probability, frequency and reasonable expectation, American Journal of Physics, 14, 1-13.
DARWICHE, A. - PEARL, J. (1997): On the logic of iterated belief revision, Artificial Intelligence, 89, 1-29.
FRIEDMAN, N. - HALPERN, J.Y. (1994): A knowledge-based framework for belief change, Part II: revision and update, in J. Doyle, E. Sandewall, P. Torasso, eds, Principles of Knowledge Representation and Reasoning: Proc. Fourth International Conference (KR'94), 190-201.
GÄRDENFORS, P. (1982): Imaging and conditionalization, The Journal of Philosophy, 79, 747-760.
GÄRDENFORS, P. (1988): Knowledge in Flux, MIT Press.
GÄRDENFORS, P. - SAHLIN, N.E., eds (1988): Decision, Probability and Utility: Selected Readings, Cambridge University Press.
GÄRDENFORS, P. (1992): Belief revision: an introduction, in P. Gärdenfors, ed.: Belief Revision, Cambridge University Press, 1-28.
GRAHNE, G. (1991): Updates and counterfactuals, in J. Allen, R. Fikes and E. Sandewall, eds, Proc. of the 2nd Inter. Conf. on Principles of Knowledge Representation and Reasoning (KR'91), Cambridge, Mass., April 22-25, 269-276.
HECKERMAN, D.E. (1988): An axiomatic framework for belief updates, in J.F. Lemmer, L.N. Kanal, eds, Uncertainty in Artificial Intelligence 2, North Holland, Amsterdam, 11-22.
KATSUNO, H. - MENDELZON, A. (1991): Propositional knowledge base revision and minimal change, Artificial Intelligence, 52, 263-294.
KATSUNO, H. - MENDELZON, A. (1992): On the difference between updating a knowledge base and revising it, in P. Gärdenfors, ed.: Belief Revision, Cambridge University Press, 183-203.
LAVENDHOMME, T. (1997): For a modal approach of knowledge revision and nonmonotonicity, mimeo, SMASH, Facultés universitaires Saint-Louis, Bruxelles.
LEPAGE, F. (1991): Conditionals and revision of probability by imaging, Cahiers du département de philosophie, N° 94-02, Université de Montréal.
LEPAGE, F. (1997): Revision of probability and conditional logic, mimeo.
LEWIS, D.K. (1976): Probabilities of conditionals and conditional probabilities, Philosophical Review, 85, 297-315.
LINDSTRÖM, S. - RABINOWICZ, W. (1989): On probabilistic representation of non-probabilistic belief revision, Journal of Philosophical Logic, 18, 69-101.


MAKINSON, D. (1993): Five Faces of Minimality, Studia Logica, 52, 339-379.
MILLER, D. - POPPER, K. (1994): Contribution to the formal theory of probability, in P. Humphreys, ed., Patrick Suppes: Scientific Philosopher, Vol. 1, Kluwer, Dordrecht, 3-23.
SIMPSON, E.H. (1951): The interpretation of interaction in contingency tables, Journal of the Royal Statistical Society, Ser. B, 13, 238-241.
SPOHN, W. (1986): The representation of Popper measures, Topoi, 5, 69-74.
TELLER, P. (1976): Conditionalization, observation and change of preference, in Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, D. Reidel, Dordrecht, Vol. 1, 205-259.
WILLIAMS, P. (1980): Bayesian conditionalization and the principle of minimum information, British Journal for the Philosophy of Science, 31, 131-144.