Testable Forecasts

Luciano Pomatto, Caltech - Becker Friedman Institute

Forecasting

• Every day a forecaster announces a probability p of rain.

• We want to understand if and how these predictions can be refuted empirically.

Probabilistic predictions:

• Weather and climate (Gneiting and Raftery, 2005), aggregate output and inflation (Diebold, Tay and Wallis, 1997), epidemics (Alkema, Raftery and Clark, 2007), seismic hazard (Jordan et al., 2011), financial risk (Timmermann, 2000), demographic variables (Raftery et al., 2012), elections (Tetlock, 2005), etc.

Calibration Test

Many variations across fields: density forecasts, Value-at-Risk, etc.
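As a rough illustration of what a calibration check computes, the sketch below groups days by the announced probability and compares each announced value with the empirical frequency of rain on those days. The function names and the weighted-deviation score are illustrative assumptions, not the specific test used in the talk.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each announced probability to (empirical frequency, number of days)."""
    days_by_forecast = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        days_by_forecast[p].append(rained)
    return {p: (sum(days) / len(days), len(days))
            for p, days in days_by_forecast.items()}


def calibration_score(forecasts, outcomes):
    """Average absolute gap between announced and realized frequencies.

    A score near 0 means the forecaster looks calibrated on this sample.
    """
    total_days = len(forecasts)
    table = calibration_table(forecasts, outcomes)
    return sum(count / total_days * abs(freq - p)
               for p, (freq, count) in table.items())
```

A forecaster who says 0.5 on days where it rains half the time, and 1.0 on days where it always rains, gets a score of 0 under this sketch.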

Fundamental property

• Let P be the true law governing the data.

• P is unrestricted and unknown.

Dawid (1982)

A forecaster who predicts according to P passes the calibration test P-a.s.

Hence:

• Type-I error free: No risk of rejecting the correct predictions of an expert who knows the true law.

• The tester is not required to have any preconceived theory about the problem at hand. The forecaster can be evaluated on purely empirical grounds.

Tests and incentive problems

Two main approaches:

1 Contract theory: forecasters as agents advising a principal about the best course of action.

2 Statistical tests: alternative to standard contracts. Used when:

• Forecasts lack an easily identifiable user (e.g. National Weather Service, macroeconomics).

• Contracts are impractical.

• The decision problem is not well defined (e.g. testing of scientific theories).

Key issue: Forecasters may be concerned about their reputation.

Adverse selection

Consider:

• An expert informed about the true probabilistic law governing the data.

• A forecaster who is ignorant about the data generating process but is interested in passing the test.

The calibration test cannot discriminate between the two.

Adverse selection

Foster and Vohra (1998)

There exists a randomized forecasting algorithm that requires no knowledge about the data generating process and makes the forecaster calibrated with high probability, no matter what data is realized.

Sandroni (2003)

The result extends to any test that is Type-I error free and operates in finite time.

This paper

• I consider the problem of testing in the presence of a theory about the data generating process.

• Theory ≈ a restriction over the domain of possible laws.

• Forecasters are required to make predictions that belong to this domain.

• Q1: What domains allow for tests that cannot be manipulated?

• Q2: What tests are non-manipulable?

• Q3: What does it mean to be an expert?

Literature

• Negative results: Sandroni (2003), Al-Najjar and Weinstein (2008), Shmaya (2008), Olszewski and Sandroni (2008), etc.

• Positive results for non-finite tests: Dekel and Feinberg (2006), Olszewski and Sandroni (2008-2009), Feinberg and Stewart (2008), Feinberg and Lambert (2015), etc.

• Non-manipulable paradigms: Al-Najjar, Smorodinsky, Sandroni, Weinstein (2010), Olszewski and Sandroni (2009), Stewart (2009).

• Scoring rules: Babaioff, Blumrosen, Lambert, Reingold (2011).

Model: Basic Ingredients

• In each period an outcome from a finite set X is publicly observed.

• Ω = X^∞ : set of all paths.

• ∆(Ω) : set of all Borel probability measures on (Ω,B).

Model: Empirical Tests

• A forecaster claims to know the law P ∈ ∆(Ω) governing the data.

• A tester is interested in evaluating this claim.

Timing:

1 The tester designs a test

T : Ω × ∆(Ω) → {0, 1}

2 The forecaster observes T and reports a prediction P.

3 Nature produces a path ω ∈ Ω.

4 T (ω,P) determines acceptance or rejection.

Adverse Selection

The forecaster is either:

• A true expert who knows the true law P governing the data, and reports it truthfully.

• A strategic forecaster who is uninformed but interested in passing the test.

A (mixed) strategy is a randomization ζ ∈ ∆(∆(Ω))

Example: Likelihood-Ratio Test

1 Fix a benchmark measure P∗ with full support and a time n.

2 The forecaster announces P ∈ ∆(Ω).

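3 T(ω, P) = 1 if and only if

$$\frac{P(\omega^n)}{P^*(\omega^n)} > 1,$$

where ω^n denotes the first n outcomes of the path ω.

There exists a strategy ζ ∈ ∆(∆(Ω)) such that

$$\zeta(\{P : T(\omega, P) = 1\}) \ge 1 - \frac{1}{2^n}$$

for every ω ∈ Ω.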
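One way to see why this test is manipulable is to work through a special case. The sketch below assumes X = {0, 1} and a uniform benchmark P* (both are assumptions made for illustration, not the general setting). The strategy draws one length-n path uniformly at random, then announces a law that is uniform on all the other length-n paths; how that law continues after time n is irrelevant to the test. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def announced_probability(excluded, path, n):
    """P(path) when P is uniform on all length-n paths except `excluded`."""
    if path == excluded:
        return Fraction(0)
    return Fraction(1, 2**n - 1)


def passing_probability(path, n):
    """Chance that the randomized strategy passes the test on `path`.

    The strategy draws the excluded path uniformly at random, so we average
    the test outcome over all possible draws.
    """
    benchmark = Fraction(1, 2**n)  # P*(path) under the uniform benchmark
    all_paths = list(product((0, 1), repeat=n))
    passes = sum(1 for excluded in all_paths
                 if announced_probability(excluded, path, n) > benchmark)
    return Fraction(passes, len(all_paths))
```

In this special case the forecaster fails only when the realized path happens to be the excluded one, so the passing probability is the same on every path and no knowledge of the data generating process is used.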

Paradigms

• A paradigm is a subset Λ ⊆ ∆(Ω). It represents a theory about the data generating process.

• Forecasts outside Λ are rejected a priori.

We want to understand: What paradigms allow for tests that do not reject true experts and cannot be manipulated?

Desiderata

I : The test does not reject a true expert.

Definition

Given a paradigm Λ, a test T passes the truth with probability 1 − ε if for all P ∈ Λ

$$P(\{\omega : T(\omega, P) = 1\}) \ge 1 - \varepsilon$$

Desiderata

II : Rejecting strategic forecasters is feasible.

Definition

Given a paradigm Λ, a test T is ε-nonmanipulable if for every strategy ζ there is a law P_ζ ∈ Λ such that

$$(P_\zeta \otimes \zeta)(\{(\omega, P) : T(\omega, P) = 1\}) \le \varepsilon$$

Desiderata

Payoffs:

• 0 outside option

• w > 0 if T = 1

• l < 0 if T = 0

Maxmin expected payoff:

$$\inf_{P \in \Lambda} E_{P \otimes \zeta}\bigl[wT + l(1 - T)\bigr] < 0$$

whenever T is ε-nonmanipulable for ε small enough.

So, the test can screen between informed and uninformed forecasters.
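The threshold behind "ε small enough" can be made explicit. A short derivation, using only the payoffs w > 0 > l above and the law P_ζ from the definition of ε-nonmanipulability, with (P ⊗ ζ)(T = 1) as shorthand for the probability of passing:

```latex
\begin{align*}
E_{P \otimes \zeta}\bigl[wT + l(1 - T)\bigr]
  &= l + (w - l)\,(P \otimes \zeta)(T = 1), \\
\inf_{P \in \Lambda} E_{P \otimes \zeta}\bigl[wT + l(1 - T)\bigr]
  &\le l + (w - l)\,(P_\zeta \otimes \zeta)(T = 1)
   \le l + (w - l)\,\varepsilon, \\
l + (w - l)\,\varepsilon < 0
  &\iff \varepsilon < \frac{-l}{w - l}.
\end{align*}
```

So any ε below −l / (w − l) makes the uninformed forecaster's maxmin payoff strictly worse than the outside option of 0.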

Desiderata

III : The test decides in finite time.

Definition

A test T is finite if for every P ∈ ∆(Ω) there exists a time N_P such that T(·, P) is measurable with respect to F_{N_P}, that is, the verdict on P depends only on the first N_P outcomes.

Testable Paradigms

Definition

A paradigm Λ is testable if for every ε > 0 there exists a test T such that:

1 T passes the truth with probability 1− ε;

2 T is ε-nonmanipulable;

3 T is finite.

Q: What paradigms are testable? Using what tests?

A Subjectivist Perspective

Consider an outside observer (a voter, an analyst or a consumer) who is uncertain about the data generating process, as expressed by a prior belief

µ ∈ ∆(∆(Ω))

• The observer and the tester have compatible views if µ(Λ) = 1

• The observer's predictions are given by

$$Q_\mu(E) = \int_{\Delta(\Omega)} P(E)\, d\mu(P) \quad \text{for every event } E$$

Characterization

Theorem

A paradigm Λ is testable if and only if for every ε > 0 there exists a prior µ ∈ ∆(Λ) such that

$$\sup_{E \subseteq \Omega} |Q_\mu(E) - P(E)| \ge 1 - \varepsilon \quad \text{for all } P \in \Lambda.$$

• Consider a belief µ. Two polar cases:

• Q_µ ∈ Λ : the observer predicts as a potential expert.

• ‖Q_µ − P‖ ≥ 1 − ε for all P ∈ Λ, where ‖Q_µ − P‖ is the supremum in the theorem: the predictions of the observer are far from the true law, whatever that is.

• The fact that Λ is taken to be true should not exhaust all possible opinions that a rational agent can entertain.
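To get a feel for the condition in the theorem, the sketch below evaluates the distance between Q_µ and a candidate law on events decided by the first n outcomes. It assumes the i.i.d. paradigm on X = {0, 1} and a uniform prior µ over the bias; both choices, and the function names, are illustrative assumptions rather than part of the talk.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, theta):
    """Probability of k ones in n i.i.d. draws with bias theta (0 < theta < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(theta) + (n - k) * log(1 - theta))
    return exp(log_pmf)


def distance_at_horizon(theta, n):
    """sup_E |Q_mu(E) - P_theta(E)| over events decided by the first n outcomes.

    Under a uniform prior on the bias, Q_mu makes the number of ones uniform
    on {0, ..., n}. Both laws depend on a path only through that count, so the
    distance between paths equals the distance between count distributions.
    """
    uniform_mass = 1.0 / (n + 1)
    return 0.5 * sum(abs(uniform_mass - binom_pmf(k, n, theta))
                     for k in range(n + 1))
```

Calling `distance_at_horizon(0.5, n)` for increasing n gives values that rise toward 1, which is the sense in which the observer's predictions end up far from every law in this paradigm.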

Geometric Characterization

Given Λ ⊆ ∆(Ω), let

$$I(\Lambda) = \sup_{Q \in \mathrm{co}^{w^*}(\Lambda)} \; \inf_{P \in \Lambda} \|Q - P\|$$

(Shapley-Folkman-Starr)

• 0 ≤ I(Λ) ≤ 1

• I(Λ) = 0 ⟹ back to impossibility results

• I(Λ) = 1 ⟺ Λ is testable

Non-Manipulable Tests

Theorem

Let Λ be testable. Let µ ∈ ∆(Λ) satisfy ‖Q_µ − P‖ ≥ 1 − ε for all P ∈ Λ.

There exist integers (n_P) such that the test

$$T(\omega, P) = \begin{cases} 1 & \text{if } P \in \Lambda \text{ and } P(\omega^{n_P}) > Q_\mu(\omega^{n_P}) \\ 0 & \text{otherwise} \end{cases}$$

does not reject the truth with probability 1 − ε and is ε-nonmanipulable.

• Equivalent to a Neyman-Pearson hypothesis test, where P is the null and Q_µ is the alternative.
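The test in the theorem can be sketched for a concrete case. The code below assumes the i.i.d. paradigm on X = {0, 1} with a uniform prior µ over the bias, so that Q_µ has a closed form; the testing time n is picked by hand here, whereas the theorem only asserts that suitable n_P exist. All function names are hypothetical.

```python
from math import exp, lgamma, log


def log_lik_expert(path, theta):
    """log P_theta(path) for an i.i.d. law with bias theta (0 < theta < 1)."""
    ones = sum(path)
    return ones * log(theta) + (len(path) - ones) * log(1 - theta)


def log_lik_mixture(path):
    """log Q_mu(path) under a uniform prior on the bias.

    Integrating theta^k (1 - theta)^(n - k) over [0, 1] gives k!(n-k)!/(n+1)!.
    """
    ones, n = sum(path), len(path)
    return lgamma(ones + 1) + lgamma(n - ones + 1) - lgamma(n + 2)


def likelihood_ratio_test(path, theta):
    """Return 1 if the reported law is more likely than Q_mu on this path."""
    return 1 if log_lik_expert(path, theta) > log_lik_mixture(path) else 0


def truth_pass_probability(theta, n):
    """Exact probability that an expert reporting the true theta passes."""
    total = 0.0
    for ones in range(n + 1):
        path = [1] * ones + [0] * (n - ones)  # any path with this count
        if likelihood_ratio_test(path, theta):
            log_count = lgamma(n + 1) - lgamma(ones + 1) - lgamma(n - ones + 1)
            total += exp(log_count + log_lik_expert(path, theta))
    return total
```

On a path with 160 ones out of 200, a report of θ = 0.8 passes while a report of θ = 0.5 is rejected, and `truth_pass_probability` shows how the chance of passing a truthful expert grows with n.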

Ranking Tests

• The result leaves open the possibility that likelihood-ratio tests are inefficient in the number of observations they require.

Q: Do there exist tests that for a fixed sample size are more efficient than likelihood-ratio tests?

• In the theory of hypothesis testing, a foundation for the use of likelihood-ratio tests is provided by the Neyman-Pearson lemma.

Neyman-Pearson lemma: Given two hypotheses P0 and P1, a sample size n and an upper bound α on the probability of Type-I error, there exists a likelihood-ratio test that minimizes the probability of Type-II error.
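For readers who want the classical lemma in concrete form, here is a minimal sketch for two i.i.d. coin hypotheses. It is restricted to non-randomized threshold tests, so it illustrates the shape of a likelihood-ratio test rather than the exact optimum at every α (which in general may require randomizing at the cutoff). The parameter names theta0 and theta1 are illustrative.

```python
from math import comb


def binom_pmf(k, n, theta):
    """Probability of k ones in n i.i.d. draws with bias theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)


def threshold_test(theta0, theta1, n, alpha):
    """Likelihood-ratio test of H0: theta0 against H1: theta1 > theta0.

    The likelihood ratio is increasing in the number of ones, so the test
    rejects H0 when that count is at least a cutoff. Returns the smallest
    cutoff whose Type-I error is at most alpha, with both error probabilities.
    """
    if not theta0 < theta1:
        raise ValueError("this sketch assumes theta0 < theta1")
    for cutoff in range(n + 2):
        type_one = sum(binom_pmf(k, n, theta0) for k in range(cutoff, n + 1))
        if type_one <= alpha:
            type_two = sum(binom_pmf(k, n, theta1) for k in range(cutoff))
            return cutoff, type_one, type_two
```

With theta0 = 0.5, theta1 = 0.7, n = 20 and α = 0.05, the sketch rejects H0 when at least 15 of the 20 outcomes are ones.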

Ranking Tests

Definition

Fix a paradigm Λ. Test T_1 is less manipulable than T_2 if

$$\sup_{\zeta \in \Delta(\Lambda)} \inf_{P \in \Lambda} E_{P \otimes \zeta}[T_1] \le \sup_{\zeta \in \Delta(\Lambda)} \inf_{P \in \Lambda} E_{P \otimes \zeta}[T_2]$$

• Any uninformed forecaster who is screened out under T_2 is also screened out under T_1.

Comparisons are more informative if we fix:

1 A bound α for the probability of not rejecting the truth.

2 Testing times (n_P)

A Neyman-Pearson Lemma

Theorem

Fix a paradigm Λ, testing times (n_P) and a probability α ∈ [0, 1].

There exist a prior µ ∈ ∆(Λ), thresholds (λ_P), and a test T* such that:

1 T*(ω, P) = 1 if P ∈ Λ and P(ω^{n_P}) > λ_P Q_µ(ω^{n_P}),

2 T*(ω, P) = 0 if P ∉ Λ or P(ω^{n_P}) < λ_P Q_µ(ω^{n_P}),

3 T* is less manipulable than any other test that (i) is bounded by (n_P) and (ii) does not reject the truth with probability α.

Further Results

1 Inferring the truth vs. testing predictions.

2 Sequential tests.

3 Maximal testable paradigms.

Testing Forecasts and Inferring the Truth

Definition (Doob, 1949)

A paradigm Λ is identifiable if there exists a measurable map

f : Ω → Λ

such that

$$P(\{\omega : f(\omega) = P\}) = 1 \quad \text{for every } P \in \Lambda.$$

Examples:

• i.i.d.

• Markov irreducible

• stationary ergodic
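For the i.i.d. example on X = {0, 1}, the identifying map can be taken to be the limiting frequency of ones, which pins down the law almost surely by the law of large numbers. The sketch below is a finite-sample stand-in for that map; the function names are hypothetical and the simulation is only illustrative.

```python
import random


def sample_path(theta, n, rng):
    """Draw n i.i.d. outcomes in {0, 1} with probability theta of a 1."""
    return [1 if rng.random() < theta else 0 for _ in range(n)]


def identify_bias(path):
    """Finite-sample stand-in for the map f: the empirical frequency of ones."""
    return sum(path) / len(path)
```

For example, `identify_bias(sample_path(0.3, 20000, random.Random(0)))` returns an estimate close to 0.3, and the estimate tightens as the path gets longer.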

Remarks:

• Experts are more relevant when the paradigm is not identifiable.

• Many examples of testable paradigms are identifiable.

• What is the gap between the two properties?

Testability and Inferring the Truth

Fact

Every infinite, identifiable paradigm is testable.

Fact

Not all testable paradigms are identifiable. E.g. Markov.

A procedure for creating testable domains: enlarge a rich identifiable paradigm.

Testability and Inferring the Truth

Theorem

If there exists a prior µ ∈ ∆(Λ) such that ‖Q_µ − P‖ = 1 for all P ∈ Λ, then there exists a subset Λ′ ⊆ Λ that is identifiable and uncountable.

• Any “well-behaved” testable paradigm is obtained by enlarging a rich identifiable paradigm.

• (relies on a result by Burgess and Mauldin, 1981)

On and Off-Path Predictions

Key property: The forecaster reports a fully specified law P at time 0.

• Unconventional.

• Under the tests considered so far, an expert cannot prove his knowledge without revealing it.

As a result, even informed experts might not be willing to participate in the test.

Test

Fix a prior µ and ε > 0

Test:

• At time 0, the forecaster announces a deadline d ∈ N.

• In each period n = 0, ..., d − 1, given the history ω^n = (ω_1, ..., ω_n), the forecaster provides a prediction p_n ∈ ∆(X).

• At time d we obtain a history ω^d = (ω_1, ..., ω_d) and a sequence of predictions (p_0, ..., p_{d−1}). The forecaster passes the test iff

$$\prod_{n=0}^{d-1} p_n(\omega_{n+1}) > \frac{1}{\varepsilon}\, Q_\mu(\omega^d)$$

Remark

How a forecaster will predict at each history is described by a forecasting rule:

$$f : \bigcup_{n=0}^{\infty} H_n \to \Delta(X)$$

• H_n : set of histories of length n

• f(ω^n)(x) : probability of observing x at time n + 1 conditional on ω^n

• Each f can be identified with a law P through the identity:

$$P(\omega^n) = \prod_{m=0}^{n-1} f(\omega^m)(\omega_{m+1})$$
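The identity above, together with the sequential test it feeds into, can be sketched in a few lines. The code assumes X = {0, 1} and a uniform prior µ over i.i.d. biases, in which case the one-step predictions of Q_µ are given by Laplace's rule of succession; these choices and all function names are illustrative assumptions.

```python
from math import lgamma, log


def iid_rule(theta):
    """Forecasting rule that predicts bias theta after every history."""
    def rule(history):
        return {1: theta, 0: 1 - theta}
    return rule


def laplace_rule(history):
    """Rule of succession: the one-step predictions of Q_mu for a uniform prior."""
    ones, n = sum(history), len(history)
    p_one = (ones + 1) / (n + 2)
    return {1: p_one, 0: 1 - p_one}


def log_law(rule, path):
    """log P(path), where P is the product of the rule's one-step predictions."""
    return sum(log(rule(path[:m])[outcome]) for m, outcome in enumerate(path))


def log_q_mu(path):
    """log Q_mu(path) in closed form: log of k!(n-k)!/(n+1)!."""
    ones, n = sum(path), len(path)
    return lgamma(ones + 1) + lgamma(n - ones + 1) - lgamma(n + 2)


def sequential_test(rule, path, deadline, eps):
    """Pass iff the product of predictions exceeds Q_mu(history) / eps."""
    history = path[:deadline]
    return 1 if log_law(rule, history) > log_q_mu(history) - log(eps) else 0
```

Only the predictions along the realized history enter `sequential_test`, so the forecaster never has to disclose what the rule would have predicted off the realized path.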

Test

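Definition

Let µ ∈ ∆(Λ) be a prior. Given ε, the forecasts-based likelihood-ratio test is defined as

$$T_{\mu,\varepsilon}(d, \omega, P) = \begin{cases} 1 & \text{if } P(\omega^d) > \frac{1}{\varepsilon}\, Q_\mu(\omega^d) \\ 0 & \text{otherwise} \end{cases}$$

for all d ∈ N, ω ∈ Ω and P ∈ ∆(Ω).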

• A strategy is a randomization ζ ∈ ∆(N×∆(Ω)).

Properties

Theorem

For every testable paradigm Λ and every ε there is a prior µ ∈ ∆(Λ) such that:

1 For every P in Λ there is a deadline d_P such that for every d ≥ d_P

$$P(\{\omega : T_{\mu,\varepsilon}(d, \omega, P) = 1\}) \ge 1 - \varepsilon$$

2 For every strategy ζ ∈ ∆(N × ∆(Ω)) there is a law P_ζ ∈ Λ such that

$$(P_\zeta \otimes \zeta)(\{(\omega, (d, P)) : T_{\mu,\varepsilon}(d, \omega, P) = 1\}) \le \varepsilon$$

Large Paradigms

• Any theory, if incorrect, exposes the tester to the risk of rejecting informed experts.

• Several papers study tests which do not reject the truth except for a small set of distributions: Feinberg and Stewart (2008), Olszewski and Sandroni (2009), Stewart (2011), Feinberg and Lambert (2015), etc.

Maximal Paradigms

Theorem

Let ε ∈ (0, 1). Given a law Q ∈ ∆(Ω), the paradigm

$$\Lambda^\varepsilon_Q = \{P \in \Delta(\Omega) : \|Q - P\| > 1 - \varepsilon\}$$

is ε-testable and it is not included in any testable paradigm.

Two effects:

• Maximal paradigms reduce the risk of rejecting true experts.

• They make non-manipulability a weaker concept: the assumption that strategic forecasters consider the worst-case scenario w.r.t. the paradigm becomes stronger as the paradigm gets larger.

Maximal Paradigms

This tension disappears when maximal paradigms are obtained by enlarging a paradigm that is already testable.

Theorem

Let Λ be a testable paradigm. Then, for every ε > 0 there exists a prior µ ∈ ∆(Λ) such that Λ ⊆ Λ^ε_{Q_µ} and a test T such that:

1 For every law P ∈ Λ^ε_{Q_µ}, E_P[T(·, P)] ≥ 1 − ε; and

2 For every strategy ζ there exists a law P_ζ ∈ Λ such that E_{P_ζ ⊗ ζ}[T] ≤ ε.