Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix
A Subjective Foundation of Objective Probability
Luciano De Castro, UIUC &
Nabil I. Al-Najjar, MEDS
RUD 2009
To obtain a copy of this paper and other related work, visit: http://www.kellogg.northwestern.edu/faculty/alnajjar/htm/index.htm
Similarity as basis for probability judgements
In his classic 1937 article, de Finetti wondered: “How should an insurance company evaluate the probability that an individual dies in a given year?”
De Finetti’s view was:
First choose a class of “similar” events, e.g., “death in a given year of an individual of the same age [...] and living in the same country.”
Then use the frequency within that class as a baseline estimate of the probability.
The choice of a class of “similar” events is subjective
... maybe “not individuals of the same age and country, but those of the same profession and town, . . . etc, where one can find a sense of ‘similarity’ that is also plausible.”
Three fundamental ideas
1 Similarity is not optional: probability judgements are founded on similarity.
2 Similarity is exchangeability: exchangeability formalizes the notion of similarity.
3 Similarity is subjective: similarity is the decision maker’s model or theory of the world.
“we cannot repeat an experiment and look for a covering theory; we must have at least a partial theory before we know whether we have a repetition of the experiment.”
Alternative title:
Similarity First!
De Finetti’s ideas in action
Sequence of experiments, each with an outcome in S
S is Polish (complete, separable metric space)
σ-algebra 𝒮 on S
State space: Ω = S × S × · · ·
Generic element $\omega = (s_1, s_2, \ldots)$
Borel σ-algebra Σ on Ω
The decision problem is static, but an inter-temporal interpretation may be suggestive.
Acts and permutations
An act is any bounded, measurable function f : Ω → ℝ
Interpret values of acts as utils
F is the set of all acts
Π is the set of finite permutations π
Given an act f and a permutation π, define $f^\pi$ by
$$f^\pi(s_1, s_2, \ldots) = f(s_{\pi(1)}, s_{\pi(2)}, \ldots)$$
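The permuted act can be sketched in Python. This is only an illustration under stated assumptions: a state ω is truncated to a finite prefix, a finite permutation is a dict on 1-based indices (missing indices are left fixed), and the names `permute_act`, `heads_in_first_toss` and `swap_1_3` are hypothetical, not from the slides.

```python
from typing import Callable, Dict, Sequence

Act = Callable[[Sequence[str]], float]


def permute_act(f: Act, pi: Dict[int, int]) -> Act:
    """Return f^pi, where f^pi(s_1, s_2, ...) = f(s_pi(1), s_pi(2), ...).

    Assumption: omega is a finite prefix that is long enough to contain
    every index moved by pi.
    """
    def f_pi(omega: Sequence[str]) -> float:
        # Coordinate i of the permuted state is coordinate pi(i) of omega.
        permuted = [omega[pi.get(i, i) - 1] for i in range(1, len(omega) + 1)]
        return f(permuted)
    return f_pi


def heads_in_first_toss(omega: Sequence[str]) -> float:
    """Bet paying 1 util if the first toss is Heads."""
    return 1.0 if omega[0] == "H" else 0.0


swap_1_3 = {1: 3, 3: 1}  # finite permutation exchanging tosses 1 and 3
heads_in_third_toss = permute_act(heads_in_first_toss, swap_1_3)

omega = ["T", "T", "H", "H"]
print(heads_in_first_toss(omega), heads_in_third_toss(omega))
```

Permuting the bet on toss 1 by the swap of coordinates 1 and 3 yields the bet on toss 3, which is how relabeling experiments acts on bets.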
Stylized example: coin tosses
S = {H, T}; generic ω = (H, T, T, H, . . .)
You might be interested in betting on the i-th coin turning Heads
Typical assumption:
“the experiment is repeated under identical conditions”
“Absurd!” says de Finetti!
If the experiments were really identical, then the coin should either always turn up H or always T
Exchangeability as model of similarity
The fact that the outcome varies means that there are important factors that I do not understand or care to model
But I judge the experiments as “similar”
... which means: these factors affect experiments symmetrically
De Finetti’s exchangeability: for all f ∈ F, π ∈ Π
$$f \sim f^\pi.$$
In the stylized coin example, H is equally likely in all experiments
Heads in toss i ∼ Heads in toss j
De Finetti’s Theorem
De Finetti’s theorem: a preference that is
1 exchangeable and
2 subjective expected utility
... has the representation
$$P(A) = \int_{\theta \in \Theta} P_\theta(A) \, d\mu(\theta).$$
1 Parameters: i.i.d. distributions $P_\theta$ indexed by θ ∈ Θ ≡ Δ(S)
2 Bayesian belief μ over the parameters Θ
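A small numerical sketch of the representation for the coin example, assuming a hypothetical prior μ with finite support (two possible biases, equally likely), so that the integral becomes a finite sum. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical prior mu: the bias theta = P_theta(H) is 1/5 or 4/5,
# each with weight 1/2.
mu = {Fraction(1, 5): Fraction(1, 2), Fraction(4, 5): Fraction(1, 2)}


def iid_prob(theta, seq):
    """P_theta of observing the finite sequence seq under i.i.d. tosses."""
    prob = Fraction(1)
    for s in seq:
        prob *= theta if s == "H" else 1 - theta
    return prob


def mixture_prob(seq):
    """P(seq) = sum over theta of mu(theta) * P_theta(seq)."""
    return sum((w * iid_prob(theta, seq) for theta, w in mu.items()), Fraction(0))


seq = ("H", "H", "T")
# Exchangeability: every reordering of seq has the same probability.
probs = {mixture_prob(p) for p in permutations(seq)}

p_h = mixture_prob(("H",))
p_hh = mixture_prob(("H", "H"))
print(probs, p_h, p_hh)
```

The mixture is exchangeable but not i.i.d.: here P(HH) differs from P(H) squared, because the first toss is informative about θ.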
De Finetti’s Theorem and its discontents
Contentment: Parameters formally capture the extent to which experiments are similar
experiments are different enough to have different outcomes, but similar enough to share a common distribution $P_\theta$
The discontents: de Finetti’s theorem confounds similarity with the decision criterion. It simultaneously identifies
the parameters Θ
and that the decision maker has a prior about them:
$$P(A) = \int_{\theta \in \Theta} P_\theta(A) \, d\mu(\theta).$$
... but how about classical statisticians, ambiguity, etc.?
Similarity is an overarching principle for all decision making and inference; Bayesianism is not.
1 Introduction
2 Structure
3 Subjective ergodic theory
4 Sufficient statistics
5 Ergodicity
6 Ambiguity
7 Conclusions
8 Appendix
Preorder and Monotonicity
Assumption 1: The preference ≽ is reflexive and transitive (it may be incomplete).
Assumption 2: If f(ω) ≥ g(ω) for all ω ∈ Ω, then f ≽ g.
Assumption 3: For every x, y ∈ ℝ,
x > y ⟹ x ≻ y.
Old Assumption: For any α ∈ ℝ, f ≽ g implies
f + α ≽ g + α.
Null sets
An event E is ≽-null if any pair of acts that differ only on E are indifferent.
Assumption 4:
$1_E \sim 0$ ⟹ E is null;
$\alpha 1_E \succcurlyeq \beta 1_E$ for some β > α ⟹ E is null.
Continuity
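We require continuity with respect to the following notion of convergence:
$$f^n \to f \iff \{\omega \in \Omega : \lim_n f^n(\omega) \neq f(\omega)\} \text{ is } \succcurlyeq\text{-null}$$
Assumption 5: Suppose that
$f^n \to f$ and $g^n \to g$;
$f^n \succcurlyeq g^n$ for all n;
$|f^n(\omega)| \le b(\omega)$ and $|g^n(\omega)| \le b(\omega)$, for all ω and some b ∈ F.
Then f ≽ g.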
Exchangeability
De Finetti’s condition: $f \sim f^\pi$
has little force without expected utility. In our more general setting, we need a stronger condition.
Definition
A preference ≽ is exchangeable if for every act f ∈ F and permutations $\pi_1, \ldots, \pi_n$
$$f \sim \frac{f^{\pi_1} + \cdots + f^{\pi_n}}{n}.$$
Stylized coin example:
Heads in toss i ∼ average of Heads in tosses 1, . . . , n
To simplify the exposition, for the remainder of the paper:
restrict to the set of acts $F_1$ that depend only on the first coordinate
Shifting states
Shifting acts
Subjective Ergodic Theorem
If ≽ is exchangeable then for every act f, the act
$$f^\star(\omega) \equiv \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j \omega)$$
is well-defined, except on a ≽-null event, and
$$f^\star \sim f.$$
Note that $f^\star$, when it exists, is objective. Its existence is a consequence of a subjective assessment of similarity.
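A minimal Python sketch of the time average in the theorem. It rests on two assumptions that are not spelled out in the slide text: T is taken to be the standard shift, T(s_1, s_2, s_3, …) = (s_2, s_3, …), and the infinite state ω is replaced by a finite prefix, so the code computes the average for a fixed n rather than the limit. The function names are hypothetical.

```python
def shift(omega, j=1):
    """Apply the (assumed) shift T j times: drop the first j coordinates."""
    return omega[j:]


def ergodic_average(f, omega, n):
    """(1/n) * sum_{j=0}^{n-1} f(T^j omega), computed on a finite prefix.

    Assumption: omega is long enough that f can be evaluated on every
    shifted prefix T^j omega for j < n.
    """
    return sum(f(shift(omega, j)) for j in range(n)) / n


def heads_in_first_toss(omega):
    """Act in F_1: pays 1 util if the first coordinate is Heads."""
    return 1.0 if omega[0] == "H" else 0.0


alternating = ["H", "T"] * 500  # prefix of the state H, T, H, T, ...
avg = ergodic_average(heads_in_first_toss, alternating, 1000)
print(avg)
```

For an act that depends only on the first coordinate, f(T^j ω) is simply f evaluated at the (j+1)-th outcome, so the average is the empirical frequency of Heads along the prefix.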
Ergodic Decomposition
$E_{\theta(\omega)} \equiv (f^\star)^{-1}(f^\star(\omega))$ ≡ all states ω′ such that $f^\star(\omega') = f^\star(\omega)$
Starting from ω, T defines an orbit that remains in $E_{\theta(\omega)}$
Subjective Ergodic Decomposition
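There is a parametrization $(E_\theta, P_\theta)_{\theta \in \Theta}$ such that:
the $E_\theta$’s partition Ω
$P_\theta$ is the unique exchangeable distribution supported by $E_\theta$: $P_\theta(E_\theta) = 1$
For every exchangeable preference ≽ and parameter θ, for $P_\theta$-almost all ω ∈ $E_\theta$:
$$f^\star(\omega) = \int_\Omega f \, dP_{\theta(\omega)}.$$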
Ergodic Decomposition
Starting with ω, we obtain the same answer if we:
compute the ‘objective’ frequentist limit: $f^\star(\omega)$
calculate the expected utility: $\int_\Omega f \, dP_{\theta(\omega)}$
Quick note on ‘learning’
Infinite data =⇒ ω is observed =⇒ θ can be learned
But the model does not admit a mechanism to change the exchangeability relationship
$(E_\theta, P_\theta)$ is the decision maker’s window to the world! His way to organize information.
The decision maker interprets the sequence
H, T, H, T, H, . . .
as confirming that θ = 0.5
Sufficient statistics: Goal and summary
We will show that the i.i.d. parametrization Θ is sufficient, in the sense of sufficient statistics, for the class of exchangeable preferences.
Sufficient statistics
In statistics, a parameter is sufficient for a family of distributions if it contains all “relevant information” about that family
Formally, a σ-algebra Z is sufficient for a family of distributions 𝒫 if
P(· | Z)
does not depend on P ∈ 𝒫.
Here we have preferences, not distributions, so this definition does not apply as stated
Subjective Sufficient Statistics
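Define $F_\Theta \subset F$ to be the set of acts measurable with respect to $\{E_\theta\}_{\theta \in \Theta}$.
Define the mapping
$$\Phi : F \to F_\Theta \quad \text{by} \quad \Phi(f)(\omega) = \int_\Omega f \, dP_{\theta(\omega)},$$
where θ(ω) denotes the parameter whose cell $E_\theta$ contains ω.
We allow ambiguity about parameters, of course.
However, a parameter means that, once its value is known, acts are evaluated using expected utility.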
Subjective Sufficient Statistic Theorem
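The i.i.d. parametrization Θ is sufficient for the set of exchangeable preferences:
For every f and g
$$f \succcurlyeq g \iff \Phi(f) \succcurlyeq \Phi(g).$$
The parameter-based act associated with f:
$$F(\theta) \equiv \Phi(f)(\theta) \equiv \int_\Omega f \, dP_\theta.$$
≽ induces a preference $\boldsymbol{\succcurlyeq}$ on parameter-based acts:
$$f \succcurlyeq g \iff F \boldsymbol{\succcurlyeq} G.$$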
The Classic de Finetti’s Theorem
An expected utility preference ≽ is exchangeable if and only if there is a belief μ on Θ such that:
$$F \boldsymbol{\succcurlyeq} G \iff \int_\Theta F \, d\mu \ge \int_\Theta G \, d\mu.$$
This confounds:
similarity-based, statistical constructs: the $P_\theta$’s
a Bayesian criterion to resolve uncertainty about parameters: μ, with no statistical or similarity interpretation
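The two ingredients can be kept apart in a short Python sketch. It assumes acts that depend only on the first coordinate (so integrating against the i.i.d. distribution reduces to a finite sum over outcomes), a finite outcome space {H, T}, and a hypothetical prior μ with two support points. All names and numbers are illustrative assumptions.

```python
from fractions import Fraction


def parameter_based_act(f, theta):
    """Similarity-based step: F(theta) = integral of f dP_theta.

    Assumption: f depends only on the first coordinate, so the integral
    is the finite sum of f(s) * theta(s) over outcomes s.
    """
    return sum((theta[s] * f[s] for s in theta), Fraction(0))


def bayesian_value(f, mu):
    """Decision-criterion step: integral of F d(mu) for a finite-support mu."""
    return sum((w * parameter_based_act(f, theta) for theta, w in mu), Fraction(0))


# Hypothetical prior: two possible parameters, each with weight 1/2.
mu = [
    ({"H": Fraction(1, 5), "T": Fraction(4, 5)}, Fraction(1, 2)),
    ({"H": Fraction(4, 5), "T": Fraction(1, 5)}, Fraction(1, 2)),
]

f = {"H": Fraction(1), "T": Fraction(0)}        # bet on Heads
g = {"H": Fraction(2, 5), "T": Fraction(2, 5)}  # constant act

F_values = [parameter_based_act(f, theta) for theta, _ in mu]
f_preferred = bayesian_value(f, mu) >= bayesian_value(g, mu)
print(F_values, bayesian_value(f, mu), bayesian_value(g, mu), f_preferred)
```

Only `bayesian_value` uses μ. A decision maker with a different criterion over parameters could keep `parameter_based_act` unchanged and replace the averaging step.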
Inter-subjective consensus
Corollary
Let ≽ be any exchangeable preference.
If all exchangeable Bayesians prefer f to g, then so does ≽.
Ergodicity: Goals
We introduce learning-style conditions that imply subjective expected utility.
Ergodicity
Subjective Probabilities as Frequencies
E is invariant if T(E) = E;
≽ is ergodic if
1 it is exchangeable, and
2 E is invariant ⟹ either E or $E^c$ is ≽-null.
Empirical distributions
The empirical distribution at ω is the set-function which assigns to each event A the value
$$\nu(A, \omega) \equiv 1_A^\star(\omega),$$
if this value exists, and is not defined otherwise.
This is just the empirical frequency of A along ω.
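As a rough illustration, the empirical distribution can be approximated on a finite prefix of ω. The sketch below assumes events A that are subsets of the outcome space S (events about a single experiment) and uses hypothetical helper names. The true ν(A, ω) is a limit along the whole sequence and may fail to exist.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(prefix):
    """Relative frequency of each outcome along a finite prefix of omega."""
    counts = Counter(prefix)
    n = len(prefix)
    return {s: Fraction(c, n) for s, c in counts.items()}


def empirical_frequency(event, prefix):
    """Finite-sample stand-in for nu(A, omega), for an event A that is a subset of S."""
    hits = sum(1 for s in prefix if s in event)
    return Fraction(hits, len(prefix))


prefix = ["H", "T", "T", "H", "T"]
nu = empirical_distribution(prefix)
print(nu, empirical_frequency({"H"}, prefix))
```

On any finite prefix the frequencies are additive over disjoint events and sum to one, which is the finite-sample counterpart of ν being a probability distribution on S.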
Subjective Probability from Frequencies
If ≽ is ergodic, then:
There is an event Ω′ with ≽-null complement, such that the empirical distribution ν(·) = ν(·, ω) is a well-defined probability distribution on S that is constant in ω ∈ Ω′; and
≽ restricted to $F_1$ is an expected utility preference with subjective probability ν.
Different derivations
Savage’s theory:
normative axioms (P1, P2, P4, P6) + exchangeability
⇓
exchangeable subjective probability P
Our result:
Exchangeability & Ergodicity
⇓
Expected utility: subjective belief = empirical measure ν
thus, P1, P2, P4, P6 !!
Statistical ambiguity: Goal
We define and characterize unambiguous events purely based on learning foundations.
Statistical ambiguity: Summary
Assume:
our weak assumptions
exchangeability (but no substantive ambiguity-flavored assumptions, e.g., incompleteness or ambiguity aversion)
⇓
(a) Subjective set of priors driven by learning conditions
(b) Expected utility on statistically unambiguous events
(c) No implications about the attitude towards ambiguity (no pessimism, no fear of a malevolent nature, ...)
Statistical ambiguity
Assume:
finite outcome space S
focus on the set of acts $F_1$ that depend only on the first coordinate.
Definition
Given a family of subsets $C \subset 2^S$, a partial probability ν on C is a function ν : C → [0, 1] such that there is a probability distribution on S that agrees with ν on C.
Definition
An event A ⊂ S is ≽-statistically unambiguous if there is Ω′ ∈ Σ with ≽-null complement, such that the frequency of A, ν(A) = ν(A, ω), is constant in ω ∈ Ω′.
The crucial part of the definition is the requirement that ν(A, ω) is independent of ω off a ≽-null set.
Characterization
Theorem (Statistical Ambiguity)
Assume S is finite. For any exchangeable preference ≽:
The set of ≽-statistically unambiguous events $C \subset 2^S$ is a λ-system, i.e., a family of sets closed under complements and disjoint unions;
The empirical measure ν(·) is a partial probability on C; and
For all C-measurable acts f, g ∈ $F_1$:
$$f \succcurlyeq g \iff \int f \, d\nu \ge \int g \, d\nu.$$
C is only a λ-system: it need not be closed under intersections.
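Why a λ-system and not an algebra can be seen in a toy example. The Python sketch below is an illustration under assumptions, not the construction in the talk: it takes S = {1, 2, 3, 4} and two hypothetical candidate distributions p and q, and treats an event as unambiguous when p and q assign it the same probability.

```python
from fractions import Fraction
from itertools import combinations

S = (1, 2, 3, 4)
# Two hypothetical candidate distributions on S.
p = {s: Fraction(1, 4) for s in S}
q = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(0), 4: Fraction(1, 2)}


def prob(measure, event):
    """Probability of an event (a set of outcomes) under a measure on S."""
    return sum((measure[s] for s in event), Fraction(0))


def all_events(space):
    """All subsets of the finite outcome space."""
    return [frozenset(c) for r in range(len(space) + 1)
            for c in combinations(space, r)]


# Events on which the two candidate distributions agree.
C = {A for A in all_events(S) if prob(p, A) == prob(q, A)}

closed_under_complement = all(frozenset(S) - A in C for A in C)
closed_under_disjoint_union = all(
    A | B in C for A in C for B in C if not (A & B)
)
closed_under_intersection = all(A & B in C for A in C for B in C)
print(sorted(map(sorted, C)),
      closed_under_complement,
      closed_under_disjoint_union,
      closed_under_intersection)
```

Here {1, 2} and {1, 3} both have probability 1/2 under p and q, but their intersection {1} does not, so the family is closed under complements and disjoint unions yet not under intersections.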
Concluding remarks
De Finetti understood the distinction between
1 Defining subjective probability, and
2 Formulating probability judgements
The current conception of subjective probability emphasizes the first, and largely ignores the second
Exchangeability, as a formal model of similarity, is the part of the preference on which probability judgements are based
This paper does what de Finetti’s theorem cannot do:
separate similarity judgements from the decision criterion
Moving forward
An exchangeability relationship is the decision maker’s theory of what constitutes similar experiments
... but so far we have nothing to say about theory choice, theory change, ...
Given our results, we can potentially talk about
Normative criteria that guide theory or model choice
Model uncertainty
Integrating classical and Bayesian methods
The End!!
Epstein-Seo, Part 1
Epstein-Seo, Part 1, strengthen
$$f \sim f^\pi$$
to:
$$\alpha f + (1 - \alpha) f^\pi \sim f \quad (*)$$
Under our general assumptions, our condition
$$f \sim \frac{f^{\pi_1} + \cdots + f^{\pi_n}}{n}$$
neither implies nor is implied by (∗)
More important is the difference between our results
Assume our weak assumptions plus exchangeability
Epstein-Seo, Part 1:
Gilboa-Schmeidler axioms (completeness, C-independence, ambiguity aversion)
⇓
(a) the Gilboa-Schmeidler set of priors is exchangeable;
(b) pessimistic MEU criterion given this set
Our approach:
Various learning conditions
⇓
(a) Subjective set of priors driven by learning conditions;
(b) no implications about the attitude towards statistical ambiguity
Epstein-Seo, Part 2
Epstein-Seo’s second model maintains:
$$f \sim f^\pi$$
but allows:
$$\alpha f + (1 - \alpha) f^\pi \succ f \quad (*)$$
Now, parameters are sets of measures; the representation is a probability distribution over sets of measures
Examples of parameters:
the set of all Dirac measures on Ω is a possible parameter
so is the set of all independent distributions
the set of all independent distributions with marginals belonging to {.2, .6}, or to {.43, .91}, etc.
Important contribution because it shows the sort of anomalies that might arise when exchangeability is weakened.
A parametrization where individual parameters are sets, including the set of all sample paths, seems far removed from the intuitive idea of parameters as useful devices to summarize information.
It is difficult to imagine how statistical inference can proceed on this basis.