Probability models- the Normal especially

Page 1: Probability models- the Normal especially

• probability models- the Normal especially

• checking distributional assumptions

[Figure: SEPA location code 4556. Histogram of FS (x-axis: FS/100ml, y-axis: density) with its Normal Q-Q plot (theoretical quantiles vs sample quantiles), and histogram of log10(FS) (x-axis: log10(FS)/100ml, y-axis: density) with its Normal Q-Q plot.]

FS: Site 9320

[Figure: empirical distribution function Fn(x) against x on the log10 scale, with the theoretical percentile (1.47) and the empirical percentile (1.75) marked.]

                  Theoretical Percentile   Empirical Percentile
Log scale         1.47                     1.75
Directive Scale   29.5                     56.2
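
The log-scale and Directive-scale values above are linked by undoing the log10 transform. The short sketch below reproduces that arithmetic; the dictionary names are illustrative and not part of the original slides.

```python
# Percentiles read off the log10 scale (values taken from the slide).
log10_percentiles = {"theoretical": 1.47, "empirical": 1.75}

# Undo the log10 transform to return to the original (Directive) scale.
directive_scale = {name: 10 ** value for name, value in log10_percentiles.items()}

for name, value in directive_scale.items():
    print(f"{name} percentile on the Directive scale: {value:.1f}")
```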

Modelling Continuous Variables: checking normality

• Normal probability plot

• Should show a straight line

• p-value of test is also reported (null: data are Normally distributed)

[Figure: Probability Plot of C1 (Normal); percent against C1. Legend: Mean 0.1211, StDev 1.015, N 100, AD 0.361, P-Value 0.439.]
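
A Normal probability (Q-Q) plot pairs the sorted data with the quantiles a Normal distribution would give. The sketch below computes those pairs with the Python standard library and uses the correlation of the plotted points as a rough measure of straightness. This is not the Anderson-Darling test reported in the plot legend; the simulated samples, the plotting positions (i - 0.5)/n and the function names are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean

def qq_points(sample):
    """Return (theoretical, observed) quantiles for a Normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # simulated Normal data
skewed_sample = [10 ** v for v in normal_sample]           # right-skewed; Normal after log10

r_normal = pearson_r(*qq_points(normal_sample))
r_skewed = pearson_r(*qq_points(skewed_sample))
print(f"Q-Q straightness, Normal data: {r_normal:.3f}")
print(f"Q-Q straightness, skewed data: {r_skewed:.3f}")
```

A value close to 1 corresponds to points lying near a straight line; the skewed sample bends away from the line, as the untransformed FS data do.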

• another statistic- the estimated standard error

Statistical inference

• Confidence intervals

• Hypothesis testing and the p-value

• Statistical significance vs real-world importance

• a formal statistical procedure- confidence intervals

Confidence intervals- an alternative to hypothesis testing

• A confidence interval is a range of credible values for the population parameter. The confidence coefficient is the percentage of times that the method will in the long run capture the true population parameter.  

• A common form is: sample estimate ± 2 × estimated standard error (an approximate 95% interval)
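
The "estimate ± 2 standard errors" recipe can be sketched for a sample mean as follows. The data values and the function name are hypothetical, and the multiplier 2 is a large-sample approximation; for a sample this small a t-based multiplier would give a somewhat wider interval.

```python
from statistics import mean, stdev

def approx_95_ci(sample):
    """Approximate 95% CI for the mean: estimate ± 2 × estimated standard error."""
    n = len(sample)
    estimate = mean(sample)
    std_error = stdev(sample) / n ** 0.5  # estimated standard error of the mean
    return estimate - 2 * std_error, estimate + 2 * std_error

# Hypothetical measurements, for illustration only.
data = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9, 5.1, 4.6]
lower, upper = approx_95_ci(data)
print(f"approximate 95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```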

• another formal inferential procedure- hypothesis testing

Hypothesis Testing

• Null hypothesis: usually ‘no effect’

• Alternative hypothesis: ‘effect’

• Make a decision based on the evidence (the data)

• There is a risk of getting it wrong!

• Two types of error:

– reject null when we shouldn’t - Type I

– don’t reject null when we should - Type II

Significance Levels

• We cannot reduce probabilities of both Type I and Type II errors to zero.

• So we control the probability of a Type I error.

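• This is referred to as the significance level. The p-value calculated from the data is compared with it.

• Generally a significance level of 0.05 is considered a reasonable risk of a Type I error, so a p-value of <0.05 is taken as evidence against the null hypothesis (beyond reasonable doubt).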
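
A small simulation can illustrate what controlling the Type I error means: when the null hypothesis is true and we reject whenever the p-value falls below 0.05, about 5% of experiments reject wrongly. The sketch assumes a two-sided z-test with known standard deviation; the sample size, number of repeats and function name are hypothetical choices.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, null_mean, sigma):
    """Two-sided p-value for a z-test of the mean with known sigma."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)
alpha = 0.05          # chosen significance level
n_experiments = 2000  # hypothetical number of repeated experiments
rejections = 0
for _ in range(n_experiments):
    # The null hypothesis is true here: the data really have mean 0.
    sample = [rng.gauss(0.0, 1.0) for _ in range(30)]
    if z_test_p_value(sample, null_mean=0.0, sigma=1.0) < alpha:
        rejections += 1  # a Type I error

type_i_rate = rejections / n_experiments
print(f"observed Type I error rate: {type_i_rate:.3f}")
```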

Statistical Significance vs. Practical Importance

• Statistical significance is concerned with the ability to discriminate between treatments given the background variation.

• Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.
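
The difference between the two ideas can be seen numerically: the same tiny difference may be far from significant in a small study and overwhelmingly significant in a very large one, without becoming any more important in practice. The sketch assumes a z-test with known standard deviation, and all numbers are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(diff, sigma, n):
    """p-value of a z-test for a mean difference `diff`, known sigma, sample size n."""
    z = diff / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: the same observed difference of 0.02 units, background sd of 1.
p_small_n = two_sided_p(0.02, 1.0, 100)
p_large_n = two_sided_p(0.02, 1.0, 1_000_000)
print(f"n = 100:       p = {p_small_n:.3f}")
print(f"n = 1,000,000: p = {p_large_n:.3g}")
```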

Power

Power is related to Type II error:

power = 1 - probability of making a Type II error

Aim: to keep power as high as possible
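
As a sketch of how power behaves, the function below computes the power of a two-sided z-test for a given true effect, assuming a known standard deviation. The effect size, sample sizes and function name are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided z-test to detect a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / n ** 0.5)  # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (10, 30, 100):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n={n:3d}  power={power:.2f}  P(Type II error)={1 - power:.2f}")
```

Under these assumptions, power rises with sample size, which is the usual lever for keeping it high.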

Statistical models

• Outcomes or Responses: these are the results of the practical work and are sometimes referred to as ‘dependent variables’.

• Causes or Explanations: these are the conditions or environment within which the outcomes or responses have been observed and are sometimes referred to as ‘independent variables’, but more commonly known as covariates.

• relationships- linear or otherwise

Correlations and linear relationships

• Pearson correlation

• Strength of linear relationship

• Simple indicator lying between –1 and +1

• Check your plots for linearity

gene correlations

[Figure: four scatterplots with their correlations: RAG1Spl vs mBadSpl (corr 0.9); mBadSpl vs mBcl2Sp (corr 0.5); RAG1Spl vs mBclxLN (corr 0.03); RAG1Spl vs mBadLN (corr -0.56).]

Interpreting correlations

• The correlation coefficient is a measure of the strength of the linear association between two variables.

• If the relationship is non-linear, the coefficient can still be evaluated and may appear sensible, so beware: plot the data first.
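
The warning can be made concrete with constructed data. In the sketch below (all values are made up for illustration), an exact straight line gives a correlation of 1, an exact quadratic relationship gives a correlation of about 0, and an exponential curve gives a correlation that looks respectable even though the relationship is not linear.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [i / 10 for i in range(-20, 21)]     # evenly spaced, symmetric about zero
y_linear = [2 * v + 1 for v in x]        # exact straight line
y_curved = [v ** 2 for v in x]           # exact but non-linear (U-shaped)
y_exponential = [math.exp(v) for v in x] # curved, but steadily increasing

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
r_exponential = pearson_r(x, y_exponential)
print(f"linear: {r_linear:.2f}  quadratic: {r_curved:.2f}  exponential: {r_exponential:.2f}")
```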

A matrix plot

[Figure: Matrix Plot of pH (pH units), O2 -%sat (%), BOD (ATU) (mg/L), Ammonia as N (mg/L), o-Phos as P (mg/L) and Fe (mg/L).]

[Figure: three scatterplots: Ammonia as N (mg/L) vs Fe (mg/L); o-Phos as P (mg/L) vs Fe (mg/L); o-Phos as P (mg/L) vs Ammonia as N (mg/L).]

Correlations

• P and N, 0.228 (p-value 0.001)

• Fe and N, 0.174 (p-value 0.008)

• Fe and P, 0.605 (p-value <0.001)

• all highly significant, but do the scatterplots support this interpretation?

• points tend to be clustered in bottom left corner of plot,

• there are one or two observations well separated from the cluster

• both might suggest a transformation (try logs)
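
The effect described in these points can be sketched with made-up numbers: a cluster of small values plus one well-separated observation yields a very high correlation on the raw scale, and a more moderate one after taking logs. The concentrations below are hypothetical and are not the data behind the slides; the base of the logarithm does not affect the correlation.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical concentrations (mg/L): most values are small, one is far larger.
fe = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.2, 16.0]
p = [0.05, 0.02, 0.06, 0.03, 0.07, 0.02, 0.05, 0.04, 0.45]

r_raw = pearson_r(fe, p)
r_log = pearson_r([math.log(v) for v in fe], [math.log(v) for v in p])
print(f"raw scale: r = {r_raw:.2f}")
print(f"log scale: r = {r_log:.2f}")
```

On the raw scale the single separated point dominates the coefficient; on the log scale it has much less influence.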

[Figure: three scatterplots on the log scale: log P vs log Fe; log N vs log Fe; log P vs log N.]

Correlations

• log P and log N, 0.167 (p-value 0.012)

• log Fe and log N, 0.134 (p-value 0.043)

• log Fe and log P, 0.380 (p-value <0.001)

• what is a statistical model?

Statistical models

• In experiments, many of the covariates are set by the experimenter, but some may be aspects that the experimenter cannot control and that are still relevant to the outcomes or responses.

• In observational studies, the covariates are usually not under the control of the experimenter but are recorded as possible explanations of the outcomes or responses.

Specifying a statistical model

• Models specify the way in which outcomes and causes link together, e.g.

• Metabolite = Temperature

• The = sign does not indicate equality in a mathematical sense, and there should be an additional item on the right-hand side, giving the formula:

• Metabolite = Temperature + Error

statistical model interpretation

• Metabolite = Temperature + Error

• The outcome Metabolite is explained by Temperature and other things that we have not recorded which we call Error.

• The task in data analysis is then to find out whether the effect of Temperature is ‘large’ in comparison with that of Error, so that we can say whether or not the Metabolite that we observe is explained by Temperature.
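
One way to make "Metabolite = Temperature + Error" concrete is a straight-line least-squares fit, which splits the variation in Metabolite into a part explained by Temperature and an Error part. The straight-line form is an assumption (the slides do not specify the shape of the relationship), and the data and variable names below are hypothetical.

```python
from statistics import mean

# Hypothetical data, for illustration only.
temperature = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
metabolite = [2.1, 2.6, 2.4, 3.1, 3.4, 3.3, 4.0, 4.2]

t_bar, m_bar = mean(temperature), mean(metabolite)
sxx = sum((t - t_bar) ** 2 for t in temperature)
sxy = sum((t - t_bar) * (m - m_bar) for t, m in zip(temperature, metabolite))

# Least-squares fit of: metabolite = intercept + slope * temperature + error
slope = sxy / sxx
intercept = m_bar - slope * t_bar
fitted = [intercept + slope * t for t in temperature]

# Split the total variation into a Temperature part and an Error part.
ss_total = sum((m - m_bar) ** 2 for m in metabolite)
ss_temperature = sum((f - m_bar) ** 2 for f in fitted)
ss_error = sum((m - f) ** 2 for m, f in zip(metabolite, fitted))

n = len(metabolite)
# Ratio of Temperature variation (1 df) to Error variation (n - 2 df).
f_ratio = ss_temperature / (ss_error / (n - 2))
r_squared = ss_temperature / ss_total
print(f"slope = {slope:.3f}, R^2 = {r_squared:.2f}, F ratio = {f_ratio:.1f}")
```

A large F ratio indicates that the Temperature effect is large relative to Error; judging how large is large enough requires the F distribution, which is not computed here.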

summary

• hypothesis tests and confidence intervals are used to make inferences

• we build statistical models to explore relationships and explain variation

• the modelling framework is a general one – general linear models, generalised additive models

• assumptions should be checked.