Bayesian Methods and Subjective Probability Daniel Thorburn Stockholm University 2011-01-10


Bayesian Methods and Subjective Probability

Daniel Thorburn

Stockholm University

2011-01-10


Outline
1. Background to Bayesian statistics
2. Two simple rules
3. Why not design-based?
4. Bayes, public statistics and sampling
5. De Finetti's theorem, Bayesian bootstrap
6. Comparisons between paradigms
7. Preposterior analysis
8. Statistics in science
9. Complementary Bayesian methods


1. Background

• Mathematically:

– Probability is a positive, finite, normed, additive measure defined on a σ-algebra

• But what does that correspond to in real life?


What is the probability of heads in the following sequence?

Does it change? And when?

– This is a fair coin
– I am now going to toss it in the corner
– I have tossed it but no one has seen the result
– I have got a glimpse of it but you have not
– I know the result but you don't
– I tell you the result


• Laplace's definition: ”All outcomes are equally probable if there is no information to the contrary”. (number of favourable elementary events / number of possible elementary events)

• Choose heads and bet on it with your neighbour. You get one krona if you are right and lose one if you are wrong. When should you change from indifference?

• Frequency interpretation (LLN). If there is an infinite sequence of independent experiments, then the relative frequency converges a.s. towards the true value. Cannot be used as a definition for two reasons
– It is a vicious circle: independence is defined in terms of probability
– It is logically impossible to define over-countably many different quantities by a countable procedure.


Probabilities do not exist (de Finetti)

• They only describe your lack of knowledge
• If there is a God almighty, he knows everything now, in the past and in the future. (”God does not play dice”, Einstein)
• But lack of knowledge is personal, thus probability is subjective
• Kolmogorov's axioms alone do not say anything about the relation to reality


• Probability is the language which describes uncertainty

• If you do not know a quantity you should describe your opinion in terms of probability

• Probability is subjective and varies between persons and over time, depending on the background information.


Rational behaviour – one person

• Axiomatic foundation of probability. Type:
– For any two events A and B exactly one of the following must hold: A < B, A > B or A ~ B (read: A less likely than B, A more likely than B, A and B equally likely)
– If A1, A2, B1 and B2 are four events such that A1∩A2 = B1∩B2 = ∅ and A1 ≥ B1 and A2 ≥ B2, then A1∪A2 ≥ B1∪B2. If further either A1 > B1 or A2 > B2, then A1∪A2 > B1∪B2

– …

• If these axioms hold, all events can be assigned probabilities which obey Kolmogorov's axioms (Villegas, Annals Math Stat, 1964).

• Axioms for behaviour. Type …

– If you prefer A to B, and B to C then you must also prefer A to C– …

• If you want to behave rationally, then you must behave as if all events were assigned probabilities (Anscombe and Aumann, Annals Math Stat, 1963)


• Axioms for probability (these six are enough to prove that a probability following Kolmogorov's axioms can be defined, plus the definition of conditional probability)
– For any two events A and B exactly one of the following must hold: A < B, A > B or A ~ B (read: A less likely than B, A more likely than B, A and B equally likely)
– If A1, A2, B1 and B2 are four events such that A1∩A2 = B1∩B2 = ∅ and A1 ≥ B1 and A2 ≥ B2, then A1∪A2 ≥ B1∪B2. If further either A1 > B1 or A2 > B2, then A1∪A2 > B1∪B2
– If A is any event then A ≥ (the impossible (empty) event)
– If Ai is a strictly decreasing sequence of events and B a fixed event such that Ai ≥ B for all i, then (the intersection of all Ai) ≥ B
– There exists one random variable which has a uniform distribution
– For any events A, B and D, (A|D) < (B|D) if and only if A∩D < B∩D

• Then one needs some axioms about comparing outcomes (utilities) in order to be able to prove rationality…


• Further one needs some axioms about comparing outcomes (utilities) in order to be able to prove rationality
– For any two outcomes A and B, one either prefers A to B or B to A, or is indifferent
– If you prefer A to B, and B to C, then you must also prefer A to C
– If P1 and P2 are two distributions over outcomes, they may be compared, and you are indifferent between A and the distribution with P(A) = 1
– Two measurability axioms, like: if A is any outcome and P a distribution, then the event that P gives an outcome preferred to A can be compared to other events (more likely …)
– If P1 is preferred to P2 and A is an event, A > 0, then the game giving P1 if A occurs is preferred to the game giving P2 under A, if the results under not-A are the same.
– If you prefer P1 to P and P to P2, then there exist numbers a > 0 and b > 0 such that P1 with probability 1−a and P2 with probability a is preferred to P, which is preferred to P1 with probability b and P2 with probability 1−b.


There is only one type of numbers, which may be known or unknown.

• Classical inference has a mess of different types of numbers, e.g.
– Parameters
– Latent variables, like in factor analysis
– Random variables
– Observations
– Independent (explaining) variables
– Dependent variables
– Constants
– and so on

• Superstition!


2. Two simple requirements for rational inference



Rule 1

• What you know/believe in advance + The information in the data = What you know/believe afterwards

• This is described by Bayes’ Formula:

• P(θ | X, K) ∝ P(X | θ, K)·P(θ | K)

• or in terms of the likelihood

• P(θ | X, K) ∝ L(θ | X)·P(θ | K)
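Bayes' formula can be sketched numerically. A minimal illustration with a made-up three-point prior for a coin's heads-probability (the numbers are hypothetical, not from the talk):

```python
# Minimal numerical sketch of Bayes' formula: posterior ∝ likelihood × prior.
priors = {0.25: 1/3, 0.50: 1/3, 0.75: 1/3}   # P(theta | K): uniform over 3 values
heads, tosses = 7, 10                        # the observed data X

def likelihood(theta, heads, tosses):
    # binomial likelihood up to a constant (the binomial coefficient cancels)
    return theta ** heads * (1 - theta) ** (tosses - heads)

unnorm = {t: likelihood(t, heads, tosses) * p for t, p in priors.items()}
z = sum(unnorm.values())                     # normalising constant P(X | K)
posterior = {t: u / z for t, u in unnorm.items()}
# after 7 heads in 10 tosses the posterior concentrates on theta = 0.75
```

The posterior is again a distribution over θ, so it can serve directly as the prior of the next study, as the corollary below the formula requires.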



Rule 1 corollary
• What you believe afterwards + the information in a new study = What you believe after both studies
• The result of the inference should be possible to use as an input to the next study
• It should thus be of the same form!
• Note that hypothesis testing and confidence intervals can never appear on the left hand side, so they do not follow rule 1



Rule 2

• Your knowledge must be given in a form that can be used. (At least in a well-formulated problem with well-defined losses/utility.)

• If you are rational, you must use the rule which minimizes expected ”losses” (maximizes utility)

• D_opt = argmin_D E(Loss(D, θ) | X, K) = argmin_D ∫ Loss(D, θ)·P(θ | X, K) dθ
• Note that classical design-based inference has no interface with decisions.
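Rule 2 can be sketched in a few lines: with a discrete posterior and a loss table (both made up for illustration), the optimal decision is the one minimising posterior expected loss:

```python
# Sketch of Rule 2: choose the decision minimising posterior expected loss.
# The posterior and the loss table are hypothetical illustration numbers.
posterior = {0: 0.2, 1: 0.5, 2: 0.3}                  # P(theta | X, K)
loss = {("low", 0): 0, ("low", 1): 1, ("low", 2): 4,  # Loss(D, theta)
        ("high", 0): 3, ("high", 1): 1, ("high", 2): 0}

def expected_loss(d):
    # E(Loss(D, theta) | X, K) for decision d
    return sum(loss[(d, t)] * p for t, p in posterior.items())

d_opt = min(("low", "high"), key=expected_loss)       # the Bayes decision
```

A full posterior distribution plugs straight into this computation; a test result or confidence interval does not.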


Statistical tests are useless

• They cannot be used to combine with new data.

• They cannot be used even in simple decision problems.

• They can be compared to the blunt plastic knife given to a three year old child– He cannot do much sensible with it– But he cannot harm himself either


3. An example of the stupidity of frequency-based (design-based) methods

N=4, n=2, SRS. Dichotomous data, black or white. The variable is known to come in pairs, i.e. the total is T=0, 2 or 4.

Probabilities:

Population \ outcome    0 white   1 white   2 white
No white  (T=0)            1         –         –
2 white   (T=2)           1/6       2/3       1/6
All white (T=4)            –         –         1

If you observe 1 white you know for sure that the population contains 2 white. If you observe 0 or 2 white, the only unbiased estimate is T* = 0 resp. 4.

The variance of this estimate is 4/3 if T=2 (= 1/6·4 + 4/6·0 + 1/6·4) and 0 if T=0 or 4.

So if you know the true value the design-based variance is 4/3, and if you are uncertain the design-based variance is 0. (Standard unbiased variance estimates are 2 resp. 0.)


Bayesian analysis works OK

– We saw the Bayesian analysis when t=1, (T*=2).

– If all possibilities are equally likely à priori, the posterior estimates of T when t = 0 (2) is T* = 2/7 (26/7) and the posterior variance is 24/49.
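The quoted posterior estimates can be checked directly with Bayes' formula, using exact rational arithmetic:

```python
from fractions import Fraction as F

# Check of the Bayesian analysis in the N=4, n=2 example: T in {0, 2, 4}
# equally likely a priori; sampling probabilities taken from the table above.
prior = {0: F(1, 3), 2: F(1, 3), 4: F(1, 3)}
lik = {0: {0: F(1)},                            # T=0: the sample is always 0 white
       2: {0: F(1, 6), 1: F(2, 3), 2: F(1, 6)}, # T=2
       4: {2: F(1)}}                            # T=4: the sample is always 2 white

def posterior(t_obs):
    unnorm = {T: prior[T] * lik[T].get(t_obs, F(0)) for T in prior}
    z = sum(unnorm.values())
    return {T: u / z for T, u in unnorm.items()}

post = posterior(0)                             # observe 0 white
mean = sum(T * p for T, p in post.items())      # posterior estimate of T: 2/7
var = sum((T - mean) ** 2 * p for T, p in post.items())   # 24/49
```

Observing 1 white gives posterior mass 1 on T = 2, reproducing the certain case discussed above.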


Always stupid?

• It is always stupid to believe that the variance of an estimator is a measure of precision in one particular case. (It is defined as a long run property for many repetitions)

• But it is not always so obvious and so stupid as in this example.

• Is this a consequence of the unusual prior that T must be even?


Example without the prior info
Still stupid, but not quite as much

Population \ outcome     0      1      2     Var(T*|T) = Var(2t|T)
T=0                     6/6     –      –          0
T=1                     3/6    3/6     –          1
T=2                     1/6    4/6    1/6        4/3
T=3                      –     3/6    3/6         1
T=4                      –      –     6/6         0
Var(T|X)                9/20   6/10   9/20

If you observe 1, the true error is never larger than 1, but the standard deviation is at least 1 for all possible parameter values.


Always stupid?
• It is always stupid to assume that the variance of an estimator is a measure of precision in one particular case. (It is defined as a long-run property for many repetitions)
• But it is not always so obvious and stupid as in these examples.
• Under suitable regularity conditions design-based methods are asymptotically as efficient as Bayesian methods:

Var(θ* | θ) / Var(θ | X) → 1 a.s. as n → ∞



• Many people say that one should choose the approach that is best for the problem at hand. Classical or Bayesian.

• So do Bayesians.

• But they also draw the conclusion:

• Always use Bayesian methods!

• Classical methods can sometimes be seen as quick and dirty approximations to Bayesian methods.

• Then you may use them.


4. What is special for many statistical surveys, e.g. public statistics?


• Answer 1: The producer of the survey is not the user.
– Often many readers and many users.
– The producer has no interest in the figures per se
• P(θ | K_user) is not known to the producer and not unique:
P(θ | X, K_user) ∝ L(θ | X)·P(θ | K_user)
• Publish L(θ | X), so that any reader can plug in his own prior
• Usually given in the form of the posterior with a vague, uninformative (often ∝ constant) prior K0:
L(θ | X) ∝ P(θ | X, K0), since P(θ | X, K0) ∝ L(θ | X)·P(θ | K0)


Describing the likelihood

• Estimates are often asymptotically normal. Then it is enough to give the posterior mean and variance or a (symmetric) 95% prediction interval (for large samples)

• When the maximum likelihood estimator is approximately efficient and normal, the ML estimate and the inverse Fisher information are enough (giving the standard confidence interval)
• Asymptotically efficient, i.e. for large samples almost as good as Bayesian estimates, which are known to be admissible also for finite samples



What is special for many statistical surveys, e.g. public statistics?

• Answer 2: There is no parameter, or more exactly: the parameter consists of all the N values of all the units in the population.
• Use this vector (θ1, …, θN) as the parameter in Bayes' formula.
• If you are interested in a certain function, e.g. the total Y_T, integrate out all nuisance parameters in the posterior to get the marginal of interest:

P(Y_T | X, K) = ∫…∫_{Σθi = Y_T} P(θ1, …, θN | X, K) dθ1 … dθN−1


5. De Finetti’s theorem

• Random variables are said to be exchangeable if there is no information in the ordering. This is for instance the case with SRS
• If a sequence of random variables is infinitely exchangeable, then they can be described as independent variables given θ, where θ is a latent random variable. (The proof is simple but needs some knowledge of random processes. Formally θ is defined on the tail σ-algebra)
• Latent means in this case that it does not exist but can be useful when describing the distribution.


• This imaginary random variable can take the place of a parameter

• But note that it does not exist (is not defined) until the full infinite sequence has been defined and the full sequence will never be observed.

• Note also that most sequences in the real world are not independent but only exchangeable. If you toss a coin 1000 times and get 800 heads it is more likely that the next toss will be heads (compared to the case with 200 heads).

• So obviously there is a dependence between the first 1000 tosses and the 1001st


Dichotomous variables or The Polya Urn scheme

– In an urn there is one white and one black ball.
– Draw one ball at random. Note its colour.
– Put it back together with one more ball of the same colour.
– Draw one at random …

• This sequence can be shown to be exchangeable and it can, by de Finetti's theorem, be described as
– Take θ ~ U(0,1) = Beta(1,1)
– Draw balls independently with this probability of being white

• There is no way to see the difference between a Bernoulli sequence (binomial distribution) with an unknown p and a Polya urn scheme. Since the outcomes follow the same distribution there cannot exist any test to differentiate between them.
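A quick simulation (a sketch, not from the talk) illustrates de Finetti's description of the urn: the long-run fraction of white draws behaves like a draw from U(0,1) = Beta(1,1):

```python
import random

# Polya urn sketch: start with one white and one black ball; draw at random
# and return the ball plus one more of the same colour. By de Finetti's
# theorem this is a mixture: p ~ Beta(1,1) = U(0,1), then iid draws given p.
def polya_fraction(n_draws, rng):
    white, black = 1, 1
    for _ in range(n_draws):
        if rng.random() < white / (white + black):
            white += 1
        else:
            black += 1
    return (white - 1) / n_draws          # long-run fraction of white draws

rng = random.Random(1)
fractions = [polya_fraction(1000, rng) for _ in range(2000)]
# the fractions look roughly uniform on (0,1): mean near 1/2, variance near 1/12
```

Each run looks like a biased coin, but the bias itself is random across runs, which is exactly why no test can separate the urn from a binomial model with unknown p.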


Dichotomous variables or The Polya Urn scheme

• We could have started with another number of balls. This had given other parameters in the prior Beta-distribution

• Beta(α, β): α white balls and β black balls
• E(Beta(α, β)) = α/(α + β)
• Var(Beta(α, β)) = αβ/((α + β)²(α + β + 1))


Dichotomous variables or The Polya Urn scheme

• This can be used to derive the posterior distribution of the number (yT) of balls/persons with a certain property (white) in a population, given an observed SRS-sample of size n with yS white balls/persons.

• Use a prior with parameters so that the expected value is your best guess of the unknown proportion and the standard deviation describes your uncertainty about it.


Properties
• The posterior distribution can be shown to be (with prior Beta(α, β); C(·,·) the binomial coefficient, B(·,·) the beta function)

Pr(Y_T = y_T | y_S) = C(N−n, y_T−y_S) · B(y_T+α, N−y_T+β) / B(y_S+α, n−y_S+β)

E(Y_T | y_S) = y_S + (N−n)(y_S+α)/(n+α+β)

Var(Y_T | y_S) = (N−n)(N+α+β)(y_S+α)(n−y_S+β) / ((n+α+β)²(n+α+β+1))

• With both parameters set to 0, the expected value is Np* and the variance p*(1−p*)N(N−n)/(n+1).
• The design-based estimate and variance estimator are good approximations to this (equal apart from n in place of n+1)


Simulation

• It is often easier to simulate from the posterior than to give its exact form.

• In this case the urn scheme gives a simple way to simulate the full population. Just continue with the Polya sampling scheme starting from the sample

• If you repeat this 1000 times, say, and plot the 1000 simulated population totals in a histogram, you will get a good description of the distribution of the unknown quantity
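A sketch of this simulation, for a hypothetical SRS of n = 10 from N = 100 with 4 white: continue the Polya scheme from the sample and compare the simulated mean and variance with the stated values Np* and p*(1−p*)N(N−n)/(n+1) for both prior parameters 0:

```python
import random

# Sketch: simulate the posterior of the number of white units in the
# population by continuing the Polya scheme from the sample (both prior
# parameters 0): each remaining unit is white with probability
# (white seen so far) / (units seen so far).
def simulate_total(N, n, y_s, rng):
    white, seen = y_s, n
    for _ in range(N - n):
        if rng.random() < white / seen:
            white += 1
        seen += 1
    return white

N, n, y_s = 100, 10, 4                   # hypothetical sample: 4 white of 10
rng = random.Random(7)
draws = [simulate_total(N, n, y_s, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# stated values: E = N * 0.4 = 40, Var = 0.4*0.6*100*90/11, about 196.4
```

Plotting `draws` in a histogram gives the posterior distribution of the population total directly, without deriving its closed form.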


Dirichlet-multinomial
• If the distribution is discrete with a finite number of categories, a similar procedure is possible
• Just draw from the set of all observations and put it back together with a similar observation. Continue until N
• Repeat, and you get a number of populations which are drawn from the posterior distribution.
• For each population compute the parameter of interest, e.g. the mean or median, and plot the values in a histogram
• If this is described as in de Finetti's theorem, the parameter comes from a Dirichlet distribution and the observations are conditionally independent multinomial.


The Bayesian Bootstrap
• This procedure is called the Bayesian bootstrap (if an uninformative prior, i.e. all parameters = 0, is used)
• This can be generalised to variables measured on a continuous scale
• The design-based estimate gives the same mean estimate as this (for polytomous populations).
• The design-based variance estimator is also close to the true variance, apart from a factor n/(n+1)
• Note that if the distribution is skewed, this method does not work well, since it does not use the prior information of skewness (nor do the design-based methods)
• Note also that with many categories it may be better to use even smaller parameters, e.g. −0.9.
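A minimal sketch of the Bayesian bootstrap for a continuous variable, using Dirichlet(1,…,1) weights generated as the gaps between sorted uniforms (the data values are made up):

```python
import random

# Bayesian bootstrap sketch (Rubin 1981): a posterior draw of the population
# mean reweights the n observations with Dirichlet(1,...,1) weights,
# generated here as the gaps between n-1 sorted uniform numbers.
def bb_mean(data, rng):
    cuts = sorted(rng.random() for _ in range(len(data) - 1))
    gaps = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    return sum(w * x for w, x in zip(gaps, data))

data = [2.1, 3.5, 4.0, 5.2, 6.3, 7.1, 8.4, 9.0]   # made-up measurements
rng = random.Random(3)
posterior_means = [bb_mean(data, rng) for _ in range(5000)]
# a histogram of posterior_means describes the uncertainty about the mean;
# their average is close to the sample mean 5.7
```

Because the weights never put mass outside the observed values, the sketch also shows the limitation noted above: prior information about skewness is not used.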


Other Bayesian models

• There are many other models/methods within Bayesian survey sampling than the Bayesian bootstrap
• Another approach starts with a normal-gamma model
– Given μ and σ², data come from an iid normal(μ, σ²) model
– The variance σ² follows a priori an inverse gamma
– The mean μ follows a priori a normal model with mean m and variance kσ²
• and later relaxes the normality assumption
• but I have not enough time here.


6. Properties of some different paradigms within survey sampling

• Uncertainty, randomness – Design-based: home-made; Model-based: given by nature, frequency-based; Bayesian: subjective, rationality axioms
• Main focus – Design-based: population; Model-based: parameters; Bayesian: population
• Parameters – Design-based: population values; Model-based: unknown, unobservable; Bayesian: do not exist, but useful (de Finetti)
• Inference – Design-based: based on long-run properties; Model-based: based on long-run properties; Bayesian: this case, probability based
• Output – Design-based: point estimates, intervals; Model-based: point estimates, confidence intervals; Bayesian: full posterior distributions, means, variances
• Possible use – Design-based: not my problem; Model-based: not my problem; Bayesian: interface with decisions


7. Preposterior analysis
Study/experimental design

• In the design of a survey one must take into account the posterior distribution.
• You may e.g. want to
– Get a small posterior variance
– Get a short 95 % prediction interval
– Make a good decision
• This analysis of the possible posterior consequences, before the experiment is carried out, is called preposterior analysis


Preposterior analysis with public statistics

• Usually, when you make a survey for your own benefit, you should use your own prior both in the preposterior and the posterior analysis
• With public statistics you should have a flat prior in the posterior analysis,
– e.g. the posterior variance is Var(θ | X, K0).
• But the design decision is yours and you should use all your information for that decision
– e.g. find the design which minimizes E(Var(θ | X, K0) | K_You)


Example: Neyman allocation; Dichotomous data

• M strata with Nm elements. Unknown proportion in stratum m is pm.

• How many elements should be drawn from each stratum in order to estimate the average proportion best?

• Neyman: Choose n_m ∝ N_m(p_m(1−p_m))^½.
• But p_m is unknown. Classical people: Use your best subjective guess p_m0.
• Neyman: Choose n_m ∝ N_m(p_m0(1−p_m0))^½.


Bayesians: Do not use a one-point prior. It is too subjective! Take also your prior uncertainty into account!

• Choose e.g. the prior Beta(α_m, β_m), m = 1, …, M
– where p_m0 = α_m/(α_m + β_m)
– and Var(p_m) = α_mβ_m/((α_m + β_m)²(α_m + β_m + 1))


Example: Optimal allocation
• M strata, size N_m, dichotomous data, independent priors Beta(α_m, β_m) (as we saw above). Posterior variance, with p_m* = (y_Sm + α_m)/(n_m + α_m + β_m):

Var(Y_tot | y_S1, …, y_SM) = Σ_m p_m*(1 − p_m*)·(N_m − n_m)(N_m + α_m + β_m)/(n_m + α_m + β_m + 1)

• The expected value of this is

Σ_m [(N_m − n_m)(N_m + α_m + β_m)/(n_m + α_m + β_m)] · α_mβ_m/((α_m + β_m)(α_m + β_m + 1))

• Minimising this, for costs c_m per unit, gives approximately the sample sizes

n_m + α_m + β_m ∝ [(N_m + α_m + β_m)/√c_m] · √(α_mβ_m/((α_m + β_m)(α_m + β_m + 1)))

• The terms (α_m + β_m) on the left hand side should not be there in the case of public statistics
• This differs from Neyman allocation since it takes the prior uncertainty of the proportions into account
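As a hedged sketch (the Bayesian allocation formula used here is my reading of the slide, so treat it as an assumption rather than the author's exact expression), the comparison with Neyman allocation can be computed for two equal-sized strata with the same prior guess p_m0 = 0.5 but different prior certainty:

```python
import math

# Hedged sketch of the two allocation rules. The "Bayesian" rule assumed here:
#   n_m + a_m + b_m proportional to
#   (N_m + a_m + b_m) / sqrt(c_m) * sqrt(a_m*b_m / ((a_m+b_m)*(a_m+b_m+1)))
def neyman(N_list, p0_list, n_total):
    raw = [N * math.sqrt(p * (1 - p)) for N, p in zip(N_list, p0_list)]
    return [n_total * r / sum(raw) for r in raw]

def bayes_alloc(N_list, ab_list, n_total, costs=None):
    costs = costs or [1.0] * len(N_list)
    raw = [(N + a + b) / math.sqrt(c)
           * math.sqrt(a * b / ((a + b) * (a + b + 1)))
           for N, (a, b), c in zip(N_list, ab_list, costs)]
    # scale so the n_m sum to n_total, then remove the prior offsets a_m + b_m
    scale = (n_total + sum(a + b for a, b in ab_list)) / sum(raw)
    return [r * scale - (a + b) for r, (a, b) in zip(raw, ab_list)]

N_list = [1000, 1000]                    # two equal strata
n_ney = neyman(N_list, [0.5, 0.5], 100)  # same point guess p_m0 = 0.5
n_bay = bayes_alloc(N_list, [(50, 50), (1, 1)], 100)  # confident vs vague prior
```

Neyman allocation cannot distinguish the strata (identical p_m0 gives 50/50), while the sketched Bayesian rule moves sample towards the stratum with the vague prior.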


8. Statistics in science
Science is more complicated

• One may divide it into (at least) three phases
1. Exploratory
2. Trying to get a good picture, convincing yourself
3. Proving the fact, convincing others

• These phases may require different approaches and priors
1. Your own prior, but critical
2. During work: your own prior, often informative, based on theory, arguments, experience or the exploration. In the presentation: usually a vague prior.
3. Other possible priors (but use also vague)


8.1 Exploratory

• Sometimes called hypothesis generating
• Most theories are false. Most substances are useless against cancer.
• Use your own priors, which most often say that all facts are most unlikely. Some examples
– Screening: all substances have a probability of 0.001 of having some effect
– Regression situations with an abundance (M) of explaining variables. If all variables are ordered after importance, all M! orderings are equally likely. Given the ordering, the m-th regression coefficient is N(0, 1/(m−1)²) (after standardisation of X). (Another possibility is that the reduction in unexplained variance 1−R² is Beta(1, m²))
• When there is support from theory or previous experiences, other priors may be used


8.2 Getting a good picture yourself(Assuming that you are the scientist)

• In classical terms this is the phase when you can formulate the hypotheses that you want to test. (Your prior is strong enough to formulate hypotheses)
• (In classical theory there is no description of the first phase. Mostly one said that if you can formulate a hypothesis you may test it, as long as you do not formulate too many)
• Your priors should still be your own.


• The reporting in this phase is quite similar to what was said about official statistics. I.e. try to give a good picture of the likelihood function

• But, contrary to public statistics, in the design of experiments it is your posterior precision that should be maximised. (It is assumed that you are an expert in the field, and there is no reason to believe that your opinion is far from the present state of knowledge)


8.3 Proving scientific facts

• It is very easy to convince people who believe in the fact from the beginning.

• It is often fairly simple to convince yourself even if you are broadminded

• But to prove a scientific fact you must convince also those that have reasonable doubts.


Proving scientific facts
• P(θ | X, K) ∝ P(X | θ)·P(θ | K)
• A person is convinced of a fact when his posterior probability for the fact is close to one.
• But to prove the fact scientifically, this must hold for all reasonable priors, including those describing reasonable doubt.
• Even if there is no such person, this must hold also for that prior as long as it is reasonable
• I.e. a result is ”proved” if

inf {P(θ | X, K); K reasonable} > 1 − ε for some small ε.


• Reporting: Use vague priors, but also show what the consequences are for some priors with (un-)reasonable doubt.
• When you prove something, all available data should be used. Type: meta-analysis. In some fields one study is usually not enough to convince people
• Designing experiments: Design your experiments so that you maximise E(inf {P(θ | X, K); K reasonable} | K_YOU) (if you are convinced).


What is reasonable doubt? Convincing others

• You have to contemplate what is meant by reasonable doubt.

• Depends on the importance of the subject.

• It can be just putting very small prior probability on the fact to be proven

• But you must also try to find the possible flaws in your theory and design your experiments to disprove them.


Priors with reasonable doubt
• Use priors with reasonable doubt
– In an experiment to prove telepathic effects you could e.g. use priors like P(log-odds ratio = 0) = 0.9999. If the log-odds ratio is different from 0 it may be modelled as N(0, σ²), where σ² may be moderate or large.
– If the posterior e.g. says that P(p > 0.5) > 0.95, one may consider the fact as proved. (Roughly this means that you need about 20 successes in a row, where a random guess has probability ½ to be correct).
– Never use the prior P = 1, since you must always be open-minded (only fundamentalists do so. They will never change their opinion whatever the evidence).

• In more standard situations you will probably not need quite so negative priors
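The "20 successes in a row" remark can be checked numerically. A sketch under a simplifying assumption of my own: the telepathic alternative is taken as success probability near 1, instead of the N(0, σ²) log-odds model above:

```python
# Sketch of the telepathy example: prior probability 0.9999 on pure chance
# (success probability 1/2) and 0.0001 on a real effect. As a simplifying
# assumption of mine, the effect is taken as success probability ~1 instead
# of the N(0, sigma^2) log-odds model mentioned above.
def posterior_effect(k, prior_effect=0.0001):
    like_chance = 0.5 ** k          # P(k successes in a row | chance)
    like_effect = 1.0               # P(k successes in a row | effect), assumed
    num = prior_effect * like_effect
    return num / (num + (1 - prior_effect) * like_chance)

p10, p20 = posterior_effect(10), posterior_effect(20)
# around 20 successes in a row are needed before the posterior probability
# of a real effect clears 0.95, matching the slide's rough count
```

Ten successes leave the sceptical prior essentially unmoved; twenty overwhelm it, because the Bayes factor 2^20 is far larger than the prior odds of 9999 to 1.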


Modelling flaws in the theory/study – Example: Several studies needed

• An argument often met in medical studies is that no effect is proven until it is corroborated by at least three independent studies.

• This means that different conclusions must be drawn with different prior knowledge (no, one or two previous studies).
– People arguing like that violate the Neyman-Pearson theory

• How can this be modelled in Bayesian terms?


Some type of multilevel model

• Where m is the worldwide mean, ai is the unknown methodology bias of study number i, bi is the site-specific bias and i is the precision of the experiment (usually with a known (posterior) distribution)

• m has an uninformative prior

• bi can often be assumed to follow normal distributions with common mean and variance following Normal--2-distributions.

• ai probably has prior distributions with much longer tails.

• The prior distributions for ai and bi might be estimated from other studies.

xi = m + ai + bi + εi;  i = 1, …, k
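A minimal generative sketch of this multilevel model, reading the slide's formula as x_i = m + a_i + b_i + e_i (all numeric values and distribution choices below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# x_i = m + a_i + b_i + e_i for k studies; numbers are assumed.
k = 3
m = 0.5                                  # worldwide mean (would get an uninformative prior)
a = 0.3 * rng.standard_t(df=3, size=k)   # methodology bias: long-tailed prior (Student-t)
b = rng.normal(0.0, 0.2, size=k)         # site-specific bias: normal with common mean/variance
tau = np.array([25.0, 16.0, 36.0])       # study precisions tau_i
e = rng.normal(0.0, 1.0 / np.sqrt(tau))  # measurement errors
x = m + a + b + e                        # observed study results
print(np.round(x, 2))
```

The long-tailed prior on a_i is what later makes a single outlying study insufficient evidence on its own.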

Page 65

• If the prior for ai is chosen so that two similar outliers out of two trials are not impossible but three similar outliers are unlikely, we end up requiring three independent trials with similar results.

• A selection bias should probably also be included. This can be done, but it is too complicated for this short talk.

• In the same way the distribution of m may depend on how strict the inclusion criteria are.

• One may argue that the k trials are not independent. Studies following the same protocol may get the same bias.

Page 66

Some situations which design-based sampling cannot handle

• Many people say that one should choose the approach that is best for the problem at hand. Some problems are more difficult than others to handle design-based

• For instance:
– Missing data
– Multiple imputation
– Small area estimation
– Outlier detection
– Editing
– Meta-analysis
– Synthetic estimation
– Coding and classification
– Total survey design
– …

Page 67

Missing data

• A model for the missingness property is needed. The following Bayesian notions are commonly used, but not everyone realises that they are Bayesian.

• Missing completely at random (MCAR):

F(x, y, z; θxyz) = F(x, y; θxy) F(z; θz)

• Missing at random (MAR): ”Given what you know, the response mechanism is independent of the other variables”

F(x, y, z; θxyz) = F(y; θy | X = x) F(z; θz | X = x) F(x; θx) (where x is known; for unit non-response)

(or F(y, z; θyz | y1, y1 ∈ R) = F(y; θy | y1 ∈ R) F(z; θz | y1 ∈ R), with item non-response, where y1 is the observed part of y)

• Not missing at random (NMAR)

• X auxiliary variables, Y study variables, Z missingness indicator; θ denotes a parameter indexed with the variable(s) for which it contains information.
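The practical content of MAR can be checked in a small simulation (a sketch; the data-generating numbers are assumptions): when the response mechanism depends only on the always-observed x, a model for y given x fitted on respondents remains valid.

```python
import numpy as np

rng = np.random.default_rng(7)

# MAR illustration: the probability that y is missing depends only on
# the observed auxiliary variable x, never on y itself.
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)    # true relation: slope 2
p_miss = 1.0 / (1.0 + np.exp(-x))   # response mechanism driven by x only
z = rng.random(n) < p_miss          # missingness indicator
y_obs = np.where(z, np.nan, y)

# Under MAR, y | x has the same distribution among respondents and
# non-respondents, so a fit on respondents recovers the true slope.
resp = ~z
slope = np.polyfit(x[resp], y_obs[resp], 1)[0]
print(round(slope, 2))
```

Under NMAR (p_miss depending on y itself) the same respondent-only fit would be biased, which is why the missingness model matters.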

Page 68

Multiple imputation

• Many different situations. We only look at one situation with two y-variables (but use an MCMC technique).

• For some respondents one of the y-variables is missing, but which one differs between respondents

• We assume MAR!

• We also assume for now that (y1i, y2i) comes from a normal super-population with unknown mean μ and unknown covariance matrix Σ.

Page 69

Multiple imputation – MCMC procedure

1. Impute starting values for all missing values
2. Put b = 1
3. Find the true posterior of μ and Σ (assuming a vague normal-inverse-gamma prior)
4. Draw possible μb and Σb from this distribution
5. Find the distribution of the missing values assuming these parameter values
6. Draw new random numbers from this distribution and impute them
7. Draw a random value from the conditional distribution of the sum of the non-sampled units (given parameters and imputed values)
8. Add all values to get estimates of the totals YT1b and YT2b
9. Save them
10. If b < B0 + B, set b = b + 1 and go to 3; else stop
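The steps above can be sketched for the bivariate-normal case. This is a hypothetical simulation, not the talk's code: for simplicity it uses a normal-inverse-Wishart draw for (μ, Σ) and tracks the completed-sample totals, skipping the non-sampled units of step 7.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n units, two y-variables, some units missing one of them (MAR).
n = 200
y = rng.multivariate_normal([1.0, 2.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
miss = rng.random(n) < 0.3                  # which units have a missing value
which = rng.integers(0, 2, size=n)          # which y-variable is missing
y_obs = y.copy()
y_obs[miss, which[miss]] = np.nan

# Step 1: impute starting values (column means of the observed data)
y_imp = np.where(np.isnan(y_obs), np.nanmean(y_obs, axis=0), y_obs)

B0, B = 200, 500
totals = []
for b in range(B0 + B):                     # steps 2 and 10
    # Steps 3-4: draw Sigma_b ~ inverse-Wishart, then mu_b | Sigma_b ~ normal
    ybar = y_imp.mean(axis=0)
    S = (y_imp - ybar).T @ (y_imp - ybar)
    G = rng.multivariate_normal(np.zeros(2), np.linalg.inv(S), size=n - 1)
    cov_b = np.linalg.inv(G.T @ G)          # inverse-Wishart(n-1, S) draw
    mu_b = rng.multivariate_normal(ybar, cov_b / n)
    # Steps 5-6: redraw each missing value from its conditional normal
    for i in np.where(miss)[0]:
        j = which[i]
        k = 1 - j
        cmean = mu_b[j] + cov_b[j, k] / cov_b[k, k] * (y_imp[i, k] - mu_b[k])
        cvar = cov_b[j, j] - cov_b[j, k] ** 2 / cov_b[k, k]
        y_imp[i, j] = rng.normal(cmean, np.sqrt(cvar))
    # Steps 8-9 (after burn-in): save the completed-sample totals
    if b >= B0:
        totals.append(y_imp.sum(axis=0))

totals = np.array(totals)
print("posterior mean of the totals:", np.round(totals.mean(axis=0), 1))
```

The saved draws approximate the posterior of the totals and can be plotted directly, as the next slide notes.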

Page 70

Multiple imputation cont.

• This is an ergodic Markov chain. The distribution converges to the true joint posterior of T, μ and Σ.

• Choose a burn-in period B0 so large that convergence is reached.

• The remaining B observations are thus drawn from the true posterior

• This distribution may be plotted

Page 71

Multiple imputation – not fully Bayesian

1–6. As before, to get imputed data set number b
7. Estimate the total Tb (or whatever) from the sample with standard methods, and its variance Sb
8. Save them
9. If b < B0 + B, set b = b + 1 and go to 3; else stop

• Compute the mean of the last B estimates (T̄). Use this as the estimate of the total
• Compute the mean of the last B variances (S̄)
• Compute the variance of the last B estimates (var(T))
• Use the sum of these two values as a variance estimator of the estimate of the total

• Note that this is a mixture of Bayesian and design-based variances, but it works as a classical estimate.
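The combination step can be written out directly. The per-imputation numbers below are made up for illustration; note that the slide uses a simple sum, whereas Rubin's combining rule would inflate the between-imputation part by (1 + 1/B).

```python
import numpy as np

# Hypothetical per-imputation results (T_b, S_b) from B = 5 completed data sets
T = np.array([1010.0, 995.0, 1020.0, 988.0, 1005.0])   # estimated totals
S = np.array([400.0, 420.0, 390.0, 410.0, 405.0])      # their design-based variances

T_hat = T.mean()             # point estimate: mean of the B estimates
within = S.mean()            # mean of the B variance estimates
between = T.var(ddof=1)      # variance of the B estimates
var_hat = within + between   # combined variance as on the slide
print(T_hat, round(var_hat, 1))
```

The within term captures ordinary sampling variance; the between term captures the extra uncertainty caused by the missing data.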

Page 72

Multiple imputation cont.

• Posterior under normality

• But what if the distribution is not normal?

Page 73

Multiple imputation cont.

• Posterior under normality
• But what if the distribution is not normal?
• The means are still BLUE estimators of the parameter and the total, and the variance estimator is consistent

• But if the distribution is skew, it will not be particularly good. This is not a defect of multiple imputation, but a problem with skew distributions in general.

Page 74

Conclusions

• Always use Bayesian methods
– You will get new tools (e.g. full posterior distributions)
– You will produce something useful
– You will be logically consistent
– You will be able to tackle many more problems within the theory

• You may use design-based methods as ”quick and dirty methods” when you know that the result will be almost equivalent to the Bayesian approach.

Page 75

Thank you for your attention!