Forecasting
• Every day a forecaster announces a probability p of rain.
• We want to understand if and how these predictions can be refuted empirically.
Probabilistic predictions:
• Weather and climate (Gneiting and Raftery, 2005), aggregate output and inflation (Diebold, Tay and Wallis, 1997), epidemics (Alkema, Raftery and Clark, 2007), seismic hazard (Jordan et al., 2011), financial risk (Timmermann, 2000), demographic variables (Raftery et al., 2012), elections (Tetlock, 2005), etc.
Fundamental property
• Let P be the true law governing the data.
• P is unrestricted and unknown.
Dawid (1982)
A forecaster who predicts according to P passes the calibration test P-a.s.
Hence:
• Type-I error free: No risk of rejecting the correct predictions of an expert who knows the true law.
• The tester is not required to have any preconceived theory about the problem at hand. The forecaster can be evaluated on purely empirical grounds.
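The slides do not spell out the calibration test. A common informal reading is that, among the days on which the forecaster announced probability p, rain should occur with empirical frequency close to p. The sketch below only tabulates those frequencies; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each announced probability p to the empirical frequency of rain
    on the days where p was announced (outcomes are 0/1)."""
    counts = defaultdict(lambda: [0, 0])  # p -> [days, rainy days]
    for p, rained in zip(forecasts, outcomes):
        counts[p][0] += 1
        counts[p][1] += rained
    return {p: rainy / days for p, (days, rainy) in counts.items()}

# Hypothetical record: four days with forecast 0.25, four with forecast 0.75.
forecasts = [0.25] * 4 + [0.75] * 4
outcomes = [1, 0, 0, 0, 1, 1, 1, 0]
table = calibration_table(forecasts, outcomes)
print(table)
```

In this toy record the frequencies match the announced probabilities exactly, so the forecaster would look calibrated on these eight days.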
Tests and incentive problems
Two main approaches:
1 Contract theory: forecasters as agents advising a principal about the best course of action.
2 Statistical tests: alternative to standard contracts. Used when:
• Forecasts lack an easily identifiable user (e.g. National Weather Service, macroeconomics).
• Contracts are impractical.
• The decision problem is not well defined (e.g. testing of scientific theories).
Key issue: Forecasters may be concerned about their reputation.
Adverse selection
Consider:
• An expert informed about the true probabilistic law governing the data.
• A forecaster who is ignorant about the data generating process but is interested in passing the test.
The calibration test cannot discriminate between the two.
Adverse selection
Foster and Vohra (1998)
There exists a randomized forecasting algorithm that requires no knowledge about the data generating process and makes the forecaster calibrated with high probability, no matter what data is realized.
Sandroni (2003)
The result extends to any test that is Type-I error free and operates in finite time.
This paper
• I consider the problem of testing in the presence of a theory about the data generating process.
• Theory ≈ a restriction over the domain of possible laws.
• Forecasters are required to make predictions that belong to such a domain.
• Q1: What domains allow for tests that cannot be manipulated?
• Q2: What tests are non-manipulable?
• Q3: What does it mean to be an expert?
Literature
• Negative results: Sandroni (2003), Al-Najjar and Weinstein (2008), Shmaya (2008), Olszewski and Sandroni (2008), etc.
• Positive results for non-finite tests: Dekel and Feinberg (2006), Olszewski and Sandroni (2008-2009), Feinberg and Stewart (2008), Feinberg and Lambert (2015), etc.
• Non-manipulable paradigms: Al-Najjar, Smorodinsky, Sandroni, Weinstein (2010), Olszewski and Sandroni (2009), Stewart (2009).
• Scoring rules: Babaioff, Blumrosen, Lambert, Reingold (2011).
Model: Basic Ingredients
• In each period an outcome from a finite set X is publicly observed.
• Ω = X^∞ : set of all paths.
• ∆(Ω) : set of all Borel probability measures on (Ω,B).
Model: Empirical Tests
• A forecaster claims to know the law P ∈ ∆(Ω) governing the data.
• A tester is interested in evaluating this claim.
Timing:
1 The tester designs a test
T : Ω × ∆(Ω) → {0, 1}
2 The forecaster observes T and reports a prediction P.
3 Nature produces a path ω ∈ Ω.
4 T (ω,P) determines acceptance or rejection.
Adverse Selection
The forecaster is either:
• A true expert who knows the true law P governing the data, and reports it truthfully.
• A strategic forecaster, uninformed but interested in passing the test.
A (mixed) strategy is a randomization ζ ∈ ∆(∆(Ω))
Example: Likelihood-Ratio Test
1 Fix a benchmark measure P∗ with full support and a time n.
2 The forecaster announces P ∈ ∆(Ω).
3 T(ω,P) = 1 if and only if
P(ω^n) / P∗(ω^n) > 1
There exists a strategy ζ ∈ ∆(∆(Ω)) such that
ζ({P : T(ω,P) = 1}) ≥ 1 − 1/2^n
for every ω ∈ Ω.
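One way to see why such a strategy can exist is the following illustrative construction (an assumption of this sketch, not necessarily the one behind the slide): draw a length-n history h uniformly at random and announce P∗ conditioned on avoiding h. The likelihood ratio then exceeds 1 on every history except h. The benchmark below (i.i.d. with P∗(1) = 1/3 on binary outcomes) and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def pass_probability(p_star, realized):
    """Probability, over the random choice of h, that the announced law
    P = P*( . | history != h) passes the test on the realized history.

    p_star maps each length-n history to its (positive) benchmark probability.
    """
    histories = list(p_star)
    passes = 0
    for h in histories:
        # Announced probability of the realized history under P*( . | not h).
        announced = Fraction(0) if realized == h else p_star[realized] / (1 - p_star[h])
        if announced > p_star[realized]:  # likelihood ratio strictly above 1
            passes += 1
    return Fraction(passes, len(histories))

n = 3
p_star = {h: Fraction(1, 3) ** sum(h) * Fraction(2, 3) ** (n - sum(h))
          for h in product((0, 1), repeat=n)}
worst_case = min(pass_probability(p_star, h) for h in p_star)
print(worst_case)  # 7/8
```

The forecaster fails only when the realized history coincides with the randomly excluded one, which happens with probability 1/2^n for binary outcomes, whatever path nature produces.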
Paradigms
• A paradigm is a subset Λ ⊆ ∆(Ω). It represents a theory about the data generating process.
• Forecasts outside Λ are rejected a priori.
We want to understand: What paradigms allow for tests that do not reject true experts and cannot be manipulated?
Desiderata
I : The test does not reject a true expert.
Definition
Given a paradigm Λ, a test T passes the truth with probability 1 − ε if for all P ∈ Λ
P({ω : T(ω,P) = 1}) ≥ 1 − ε
Desiderata
II : Rejecting strategic forecasters is feasible.
Definition
Given a paradigm Λ, a test T is ε-nonmanipulable if for every strategy ζ there is a law P_ζ ∈ Λ such that
(P_ζ ⊗ ζ)({(ω,P) : T(ω,P) = 1}) ≤ ε
Desiderata
Payoffs:
• 0 outside option
• w > 0 if T = 1
• l < 0 if T = 0
Maxmin expected payoff:
inf_{P∈Λ} E_{P⊗ζ}[wT + l(1 − T)] < 0
whenever T is ε-nonmanipulable for ε small enough.
So, the test can screen between informed and uninformed forecasters.
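A short derivation makes "ε small enough" explicit. It uses only the payoffs w > 0 > l and the definition of ε-nonmanipulability; the explicit threshold is a consequence worked out here, not a formula stated on the slides.

```latex
% For a strategy \zeta, let P_\zeta \in \Lambda be the law given by \varepsilon-nonmanipulability.
\begin{aligned}
\inf_{P \in \Lambda} E_{P \otimes \zeta}\bigl[wT + l(1 - T)\bigr]
  &\le E_{P_\zeta \otimes \zeta}\bigl[wT + l(1 - T)\bigr] \\
  &= l + (w - l)\,(P_\zeta \otimes \zeta)\bigl(\{T = 1\}\bigr) \\
  &\le l + (w - l)\,\varepsilon .
\end{aligned}
% Since w - l > 0, the bound is negative whenever
%   \varepsilon < \frac{-l}{w - l} = \frac{|l|}{w + |l|}.
```

Under this condition an uninformed forecaster who evaluates strategies by their worst case over Λ prefers the outside option, which pays 0.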
Desiderata
III : The test decides in finite time.
Definition
A test T is finite if for every P ∈ ∆(Ω) there exists a time N_P such that T(·,P) is measurable with respect to F_{N_P}.
Testable Paradigms
Definition
A paradigm Λ is testable if for every ε > 0 there exists a test T such that:
1 T passes the truth with probability 1− ε;
2 T is ε-nonmanipulable;
3 T is finite.
Q: What paradigms are testable? Using what tests?
A Subjectivist Perspective
Consider an outside observer (a voter, an analyst or a consumer) who is uncertain about the data generating process, as expressed by a prior belief
µ ∈ ∆(∆(Ω))
• The observer and the tester have compatible views if µ(Λ) = 1
• His predictions are given by
Q_µ(E) = ∫_{∆(Ω)} P(E) dµ(P) for every event E
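As a minimal sketch of the mixture Q_µ, assume binary outcomes and a prior with finite support on i.i.d. laws, so the integral reduces to a weighted sum. The two-point prior and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction

def iid_prob(theta, history):
    """Probability of a finite 0/1 history under the i.i.d. law with P(1) = theta."""
    ones = sum(history)
    return theta ** ones * (1 - theta) ** (len(history) - ones)

def mixture_prob(prior, history):
    """Q_mu(history) = sum over theta of mu(theta) * P_theta(history)."""
    return sum(weight * iid_prob(theta, history) for theta, weight in prior.items())

# Hypothetical prior: equal weight on P(1) = 1/5 and P(1) = 4/5.
prior = {Fraction(1, 5): Fraction(1, 2), Fraction(4, 5): Fraction(1, 2)}
q_one = mixture_prob(prior, (1,))
q_two_ones = mixture_prob(prior, (1, 1))
print(q_one, q_two_ones)  # 1/2 17/50
```

Here Q_µ assigns probability 17/50 to two consecutive ones, not (1/2)² = 1/4, so the observer's predictions are not themselves i.i.d. even though the prior is concentrated on i.i.d. laws.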
Characterization
Theorem
A paradigm Λ is testable if and only if for every ε > 0 there exists a prior µ ∈ ∆(Λ) such that
sup_{E⊆Ω} |Q_µ(E) − P(E)| ≥ 1 − ε for all P ∈ Λ.
• Consider a belief µ. Two polar cases:
• Q_µ ∈ Λ : the observer predicts as a potential expert.
• ‖Q_µ − P‖ ≥ 1 − ε for all P ∈ Λ : the predictions of the observer are far from the true law, whatever that is.
• The fact that Λ is taken to be true should not exhaust all possible opinions that a rational agent can entertain.
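The distance condition can be explored numerically in a toy case. The sketch below assumes a hypothetical finite paradigm of nine i.i.d. coin laws with a uniform prior, and computes sup_E |Q_µ(E) − P(E)| over events determined by the first n outcomes. In this toy case the distance can never exceed 1 − 1/9, because Q_µ puts weight 1/9 on P itself.

```python
from math import comb

def tv_to_uniform_mixture(thetas, target, n):
    """sup over events on the first n outcomes of |Q_mu(E) - P_target(E)|,
    where Q_mu is the uniform mixture of the i.i.d. laws in `thetas`."""
    total = 0.0
    for k in range(n + 1):
        # Both laws depend on a history only through its number of ones k.
        p = target ** k * (1 - target) ** (n - k)
        q = sum(t ** k * (1 - t) ** (n - k) for t in thetas) / len(thetas)
        total += comb(n, k) * abs(q - p)
    return total / 2

thetas = [i / 10 for i in range(1, 10)]  # hypothetical 9-point paradigm
distances = {n: min(tv_to_uniform_mixture(thetas, t, n) for t in thetas)
             for n in (50, 400)}
print(distances)
```

The smallest distance over the paradigm grows with the horizon and approaches, without exceeding, the bound 1 − 1/9, which suggests why richer paradigms and more diffuse priors are needed to reach 1 − ε for every ε.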
Geometric Characterization
Given Λ ⊆ ∆(Ω), let
I(Λ) = sup_{Q ∈ co^{w∗}(Λ)} inf_{P∈Λ} ‖Q − P‖
(Shapley-Folkman-Starr)
• 0 ≤ I(Λ) ≤ 1
• I(Λ) = 0 ⟹ back to impossibility results
• I(Λ) = 1 ⟺ Λ is testable
Non-Manipulable Tests
Theorem
Let Λ be testable. Let µ ∈ ∆(Λ) satisfy ‖Q_µ − P‖ ≥ 1 − ε for all P ∈ Λ.
There exist integers (n_P) such that the test
T(ω,P) = 1 if P ∈ Λ and P(ω^{n_P}) > Q_µ(ω^{n_P}),
T(ω,P) = 0 otherwise,
does not reject the truth with probability 1 − ε and is ε-nonmanipulable.
• Equivalent to a Neyman-Pearson hypothesis test, where P is the null and Q_µ is the alternative.
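The test in the theorem can be sketched for a toy setting. The paradigm (nine i.i.d. coin laws), the uniform prior and the testing time n_P = 100 are hypothetical choices made for this sketch. The theorem only asserts that suitable integers (n_P) exist and does not say how to pick them.

```python
def iid_prob(theta, history):
    """Probability of a finite 0/1 history under the i.i.d. law with P(1) = theta."""
    ones = sum(history)
    return theta ** ones * (1 - theta) ** (len(history) - ones)

def lr_test(path, theta, paradigm, n_p):
    """T(omega, P): 1 iff P is in the paradigm and P beats the uniform
    mixture Q_mu on the first n_p outcomes; 0 otherwise."""
    if theta not in paradigm:
        return 0  # forecasts outside the paradigm are rejected a priori
    history = path[:n_p]
    q_mu = sum(iid_prob(t, history) for t in paradigm) / len(paradigm)
    return 1 if iid_prob(theta, history) > q_mu else 0

paradigm = [i / 10 for i in range(1, 10)]
path = (1, 1, 1, 1, 0) * 20  # hypothetical data with 80% ones
results = [lr_test(path, theta, paradigm, 100) for theta in (0.8, 0.5, 0.55)]
print(results)  # [1, 0, 0]
```

The forecast matching the data passes, the in-paradigm forecast 0.5 is rejected by the likelihood comparison, and 0.55 is rejected for lying outside the paradigm.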
Ranking Tests
• The result leaves open the possibility that likelihood-ratio tests are inefficient in the number of observations they require.
Q: Do there exist tests that for a fixed sample size are more efficient than likelihood-ratio tests?
• In the theory of hypothesis testing, a foundation for the use of likelihood-ratio tests is provided by the Neyman-Pearson lemma.
Neyman-Pearson lemma: Given two hypotheses P_0 and P_1, a sample size n and an upper bound α on the probability of Type-I error, there exists a likelihood-ratio test that minimizes the probability of Type-II error.
Ranking Tests
Definition
Fix a paradigm Λ. Test T_1 is less manipulable than T_2 if
sup_{ζ∈∆(Λ)} inf_{P∈Λ} E_{P⊗ζ}[T_1] ≤ sup_{ζ∈∆(Λ)} inf_{P∈Λ} E_{P⊗ζ}[T_2]
• Any uninformed forecaster who is screened out under T_2 is also screened out under T_1.
Comparisons are more informative if we fix:
1 A bound α for the probability of not rejecting the truth.
2 Testing times (n_P).
A Neyman-Pearson Lemma
Theorem
Fix a paradigm Λ, testing times (n_P) and a probability α ∈ [0, 1].
There exist a prior µ ∈ ∆(Λ), thresholds (λ_P), and a test T∗ such that:
1 T∗(ω,P) = 1 if P ∈ Λ and P(ω^{n_P}) > λ_P Q_µ(ω^{n_P}),
2 T∗(ω,P) = 0 if P ∉ Λ or P(ω^{n_P}) < λ_P Q_µ(ω^{n_P}),
3 T∗ is less manipulable than any other test that (i) is bounded by (n_P) and (ii) does not reject the truth with probability α.
Further Results
1 Inferring the truth vs. testing predictions.
2 Sequential tests.
3 Maximal testable paradigms.
Testing Forecasts and Inferring the Truth
Definition (Doob, 1949)
A paradigm Λ is identifiable if there exists a measurable map
f : Ω → Λ
such that P({ω : f(ω) = P}) = 1 for every P ∈ Λ.
Examples:
• i.i.d.
• Markov irreducible
• stationary ergodic
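For the i.i.d. case, the identifying map can be written down explicitly. The following is a standard construction based on the strong law of large numbers, given here as an illustration and not taken from the slides.

```latex
% Paradigm: \Lambda = \{ P_p : p \in \Delta(X) \}, where P_p is the i.i.d. law with marginal p.
% Empirical frequency of outcome x along the path \omega = (\omega_1, \omega_2, \dots):
\hat p_\omega(x) \;=\; \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbf{1}\{\omega_m = x\},
\qquad x \in X .
% Define f(\omega) = P_{\hat p_\omega} when the limit exists for every x,
% and let f(\omega) be a fixed element of \Lambda otherwise.
% By the strong law of large numbers, \hat p_\omega = p holds P_p-almost surely, hence
P_p\bigl(\{\omega : f(\omega) = P_p\}\bigr) = 1 \quad \text{for every } p \in \Delta(X).
```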
Remarks:
• Experts are more relevant when the paradigm is not identifiable.
• Many examples of testable paradigms are identifiable.
• What is the gap between the two properties?
Testability and Inferring the Truth
Fact
Every infinite, identifiable paradigm is testable.
Fact
Not all testable paradigms are identifiable. E.g. Markov.
A procedure for creating testable domains: enlarge a rich identifiable paradigm.
Testability and Inferring the Truth
Theorem
If there exists a prior µ ∈ ∆(Λ) such that ‖Q_µ − P‖ = 1 for all P ∈ Λ, then there exists a subset Λ′ ⊆ Λ that is identifiable and uncountable.
• Any “well-behaved” testable paradigm is obtained by enlarging a rich identifiable paradigm.
• (relies on a result by Burgess and Mauldin, 1981)
On and Off-Path Predictions
Key property: The forecaster reports a fully specified law P at time 0.
• Unconventional.
• Under the tests considered so far, an expert cannot prove his knowledge without revealing it.
As a result, even informed experts might not be willing to participate in the test.
Test
Fix a prior µ and ε > 0
Test:
• At time 0, the forecaster announces a deadline d ∈ N.
• In each period n = 0, ..., d − 1, given history ω^n = (ω_1, ..., ω_n), the forecaster provides a prediction p_n ∈ ∆(X).
• At time d we obtain a history ω^d = (ω_1, ..., ω_d) and a sequence of predictions (p_0, ..., p_{d−1}). The forecaster passes the test iff
∏_{n=0}^{d−1} p_n(ω_{n+1}) > (1/ε) Q_µ(ω^d)
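The passing condition can be checked mechanically once Q_µ is computable. The sketch below assumes binary outcomes and a uniform prior over all i.i.d. coin laws, for which Q_µ(ω^d) = 1 / ((d + 1) · C(d, k)) when the history contains k ones. The prior, the data and the function names are hypothetical.

```python
from math import comb

def q_uniform_iid(history):
    """Q_mu(history) when mu is uniform over i.i.d. coin laws (Beta integral)."""
    d, k = len(history), sum(history)
    return 1 / ((d + 1) * comb(d, k))

def passes_forecast_test(predictions, outcomes, eps):
    """True iff the product of the predicted probabilities of the realized
    outcomes exceeds (1/eps) * Q_mu(history)."""
    likelihood = 1.0
    for p_n, x in zip(predictions, outcomes):
        likelihood *= p_n[x]  # p_n maps each outcome to its predicted probability
    return likelihood > q_uniform_iid(outcomes) / eps

outcomes = (1, 1, 1, 1, 0) * 100            # deadline d = 500, 80% ones
informed = [{1: 0.8, 0: 0.2}] * 500         # always predicts 0.8
uninformed = [{1: 0.5, 0: 0.5}] * 500       # always predicts 0.5
verdicts = (passes_forecast_test(informed, outcomes, 0.1),
            passes_forecast_test(uninformed, outcomes, 0.1))
print(verdicts)  # (True, False)
```

Only the predictions along the realized history enter the test, so the forecaster never has to disclose what would have been predicted off-path.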
Remark
How a forecaster will predict at each history is described by a forecasting rule:
f : ⋃_{n=0}^{∞} H_n → ∆(X)
• H_n : set of histories of length n
• f(ω^n)(x) : prob. of observing x at time n + 1 conditional on ω^n
• Each f identifies a law P through the identity:
P(ω^n) = ∏_{m=0}^{n−1} f(ω^m)(ω_{m+1})
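The identity translates directly into a product over one-step-ahead predictions. As a hypothetical example of a forecasting rule, the sketch uses Laplace's rule of succession on binary outcomes. The function names are illustrative.

```python
from fractions import Fraction

def laplace_rule(history):
    """Hypothetical forecasting rule f: predicts 1 with probability (ones + 1) / (n + 2)."""
    p_one = Fraction(sum(history) + 1, len(history) + 2)
    return {1: p_one, 0: 1 - p_one}

def law_from_rule(rule, history):
    """P(omega^n) = product over m = 0..n-1 of f(omega^m)(omega_{m+1})."""
    prob = Fraction(1)
    for m in range(len(history)):
        prob *= rule(history[:m])[history[m]]  # history[m] is the (m+1)-th outcome
    return prob

p = law_from_rule(laplace_rule, (1, 1, 0))
print(p)  # 1/12
```

The three factors are 1/2, 2/3 and 1/4. The resulting law coincides with the mixture Q_µ for a uniform prior over i.i.d. coin laws, which assigns probability 1 / ((n + 1) · C(n, k)) to a history of length n with k ones.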
Test
Definition
Let µ ∈ ∆(Λ) be a prior. Given ε, the forecasts-based likelihood-ratio test is defined as
T_{µ,ε}(d, ω, P) = 1 if P(ω^d) > (1/ε) Q_µ(ω^d),
T_{µ,ε}(d, ω, P) = 0 otherwise,
for all d ∈ N, ω ∈ Ω and P ∈ ∆(Ω).
• A strategy is a randomization ζ ∈ ∆(N×∆(Ω)).
Properties
Theorem
For every testable paradigm Λ and every ε there is a prior µ ∈ ∆(Λ) such that:
1 For every P in Λ there is a deadline d_P such that for every d ≥ d_P
P({ω : T_{µ,ε}(d, ω, P) = 1}) ≥ 1 − ε
2 For every strategy ζ ∈ ∆(N × ∆(Ω)) there is a law P_ζ ∈ Λ such that
(P_ζ ⊗ ζ)({(ω, (d, P)) : T_{µ,ε}(d, ω, P) = 1}) ≤ ε
Large Paradigms
• Any theory, if incorrect, exposes the tester to the risk of rejecting informed experts.
• Several papers study tests which do not reject the truth except for a small set of distributions: Feinberg and Stewart (2008), Olszewski and Sandroni (2009), Stewart (2011), Feinberg and Lambert (2015), etc.
Maximal Paradigms
Theorem
Let ε ∈ (0, 1). Given a law Q ∈ ∆(Ω), the paradigm
Λ^ε_Q = {P ∈ ∆(Ω) : ‖Q − P‖ > 1 − ε}
is ε-testable and it is not included in any testable paradigm.
Two effects:
• Maximal paradigms reduce the risk of rejecting true experts.
• They make non-manipulability a weaker concept: the assumption that strategic forecasters consider the worst-case scenario w.r.t. the paradigm becomes stronger as the paradigm gets larger.
Maximal Paradigms
This tension disappears when maximal paradigms are obtained by enlarging a paradigm that is already testable.
Theorem
Let Λ be a testable paradigm. Then, for every ε > 0 there exist a prior µ ∈ ∆(Λ) such that Λ ⊆ Λ^ε_{Q_µ} and a test T such that:
1 For every law P ∈ Λ^ε_{Q_µ}, E_P[T(·,P)] ≥ 1 − ε; and
2 For every strategy ζ there exists a law P_ζ ∈ Λ such that E_{P_ζ⊗ζ}[T] ≤ ε.