84
G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents Summary Computational Statistics A Proposal for a Basic Course Statistical Computing 2009 28.6.-1.7.2009, Schloss Reisensburg unther Sawitzki <[email protected]> StatLab Heidelberg 30. 6.2009 Typeset: July 26, 2009

Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1

BackgroundContentsSummary

Computational StatisticsA Proposal for a Basic Course

Statistical Computing 200928.6.-1.7.2009, Schloss Reisensburg

Gunther Sawitzki<[email protected]>

StatLab Heidelberg

30. 6.2009Typeset: July 26, 2009

Page 2: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 2

BackgroundContentsSummary

Note

Page references refer to

Computational Statistics:An Introduction to R

Chapman & Hall/CRC Press,Boca Raton (FL), 2009.

ISBN: 978-1-4200-8678-2

See http://sintro.r-forge.r-project.org/.

Page 3: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 3

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

Background

BackgroundPredecessorsAudienceTopicsStructure

Contents

Summary

Page 4: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 4

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundPredecessors

Aim

A concise course in computational statistics

Predecessors

One week post-graduate course ”Biometry in Medicine”

One week course: R programming

Linear Models

Statistical Data Analysis

. . .

Page 5: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 5

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundPredecessors

Aim

A concise course in computational statistics

Predecessors

One week post-graduate course ”Biometry in Medicine”

One week course: R programming

Linear Models

Statistical Data Analysis

. . .

Page 6: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 6

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundPredecessors

Aim

A concise course in computational statistics

Predecessors

One week post-graduate course ”Biometry in Medicine”

One week course: R programming

Linear Models

Statistical Data Analysis

. . .

Page 7: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 7

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundPredecessors

Aim

A concise course in computational statistics

Predecessors

One week post-graduate course ”Biometry in Medicine”

One week course: R programming

Linear Models

Statistical Data Analysis

. . .

Page 8: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 8

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundPredecessors

Aim

A concise course in computational statistics

Predecessors

One week post-graduate course ”Biometry in Medicine”

One week course: R programming

Linear Models

Statistical Data Analysis

. . .

Page 9: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 9

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundAudience

Computational Statistics: An Introduction to R

Designed for a mixed audience

researchers and post-graduates from applied areas (in particularfrom clinical departments and from the DKFZ, the Germancancer research center), with some working knowledge instatistical methods and with considerable laboratory experience

students from mathematics or computer science, with a basicknowledge in (mathematical) stochastics

As one of the participants from the applied field said ”We can look upthe methods ourselves. What we need is a guide to the underlyingconcepts.”

Page 10: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 10

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundAudience

Computational Statistics: An Introduction to R

Designed for a mixed audience

researchers and post-graduates from applied areas (in particularfrom clinical departments and from the DKFZ, the Germancancer research center), with some working knowledge instatistical methods and with considerable laboratory experience

students from mathematics or computer science, with a basicknowledge in (mathematical) stochastics

As one of the participants from the applied field said ”We can look upthe methods ourselves. What we need is a guide to the underlyingconcepts.”

Page 11: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 11

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

What do we need?

Try to illustrate/demonstrate:

What are the statistical concepts and methods that are essential forcomputational statistics on a scientific level?

What is not needed?

How to survive bolognese?

Page 12: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 12

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

What do we need?

Try to illustrate/demonstrate:

What are the statistical concepts and methods that are essential forcomputational statistics on a scientific level?

What is not needed?

How to survive bolognese?

Page 13: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 13

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

What do we need?

Try to illustrate/demonstrate:

What are the statistical concepts and methods that are essential forcomputational statistics on a scientific level?

What is not needed?

How to survive bolognese?

Page 14: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 14

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

Statistical Topics

Idea: Select a small set of fairly general statistical topics.

Page 15: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 15

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

Statistical Topics

distribution diagnosticsgiven Xi i = 1, . . . , n, infer on L (Xi )

regression models and regression diagnosticsY = m(X ) + ε

non-parametric comparisons

multivariate analysis

Page 16: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 16

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

Statistical Topics

distribution diagnosticsgiven Xi i = 1, . . . , n, infer on L (Xi )

regression models and regression diagnosticsY = m(X ) + ε

non-parametric comparisons

multivariate analysis

Page 17: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 17

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

Statistical Topics

distribution diagnosticsgiven Xi i = 1, . . . , n, infer on L (Xi )

regression models and regression diagnosticsY = m(X ) + ε

non-parametric comparisons

multivariate analysis

Page 18: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 18

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

Statistical Topics

distribution diagnosticsgiven Xi i = 1, . . . , n, infer on L (Xi )

regression models and regression diagnosticsY = m(X ) + ε

non-parametric comparisons

multivariate analysis

Page 19: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 19

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

Note

Topics refer to statistical problem classes,

not specifically to heuristics such as least square, maximum likelihoodetc.,

not to specific models.

They try to mark a broad range of topics.

Topics may be used as self-contained teaching modules, with onlylimited cross-import. They can be taught as separate units.

Page 20: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 20

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundTopics

The course may be presented as an introduction to R. But actually itis an invitation to statistical data analysis.

Time Table

Compact course (5 days).

or

One term, 2h lectures plus 2h exercises per week.

Details to come.

Page 21: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 21

BackgroundContentsSummary

PredecessorsAudienceTopicsStructure

BackgroundStructure

Chapter Structure

Content chapters - used as course material.

Core contentR supplementStatistical summary

Course Material Structure

Four chapters, by statistical topic - used as course material.

Appendix: R Reference sections by programming topic - used assupplement or for look up

Page 22: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 22

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Contents

Background

ContentsCh. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Summary

Page 23: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 23

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

R Programming Conventions

Generation of Random Numbers and Patterns

Case Study: Distribution Diagnostics

Distribution FunctionsHistogramsBarchartsStatistics of Distribution Functions; Kolmogorov-Smirnov TestsMonte Carlo Confidence BandsStatistics of Histograms and Related Plots; χ2-Tests

Moments and Quantiles

Page 24: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 24

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Example 1.1: A Simple Plot (p.7)

Inputx <- runif(100)

plot(x)

0 20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

Index

x

Page 25: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 25

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Exercise 1.1(p.7)

Try experimenting with these plots and runif(). Do the plots show imagesof random numbers?To be more precise: do you accept these plots as images of 100 independentrealisations of random numbers, distributed uniformly on (0, 1)?Repeat your experiments and try to note as precisely as possible thearguments you have for or against (uniform) randomness. What is yourconclusion?

Walk through your arguments and try to draft a test strategy toanalyse a sequence of numbers for (uniform) randomness. Try toformulate your strategy as clearly as possible.

Page 26: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 26

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Exercise 1.2 (p.10)

Use

plot(sin(1:100))

to generate a plot of a discretisedsine function. Use your strategyfrom Exercise 1.1.

Does your strategy detect that thesine function is not a randomsequence?

Page 27: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 27

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Exercise 1.2 (p.10)

Use

plot(sin(1:100))

to generate a plot of a discretisedsine function. Use your strategyfrom Exercise 1.1.

Does your strategy detect that thesine function is not a randomsequence?

0 20 40 60 80 100

-1.0

-0.5

0.0

0.5

1.0

Index

sin(1:100)

Page 28: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 28

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Exercise 1.2 (p.10)

Use

plot(sin(1:100))

to generate a plot of a discretisedsine function. Use your strategyfrom Exercise 1.1.

Does your strategy detect that thesine function is not a randomsequence?

0 20 40 60 80 100

-1.0

-0.5

0.0

0.5

1.0

Index

sin(1:100)

Page 29: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 29

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Exercise 1.2 (p.10)

Use

plot(sin(1:100))

to generate a plot of a discretisedsine function. Use your strategyfrom Exercise 1.1.

Does your strategy detect that thesine function is not a randomsequence?

0 20 40 60 80 100

-1.0

-0.5

0.0

0.5

1.0

Index

sin(1:100)

Page 30: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 30

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Note

Try to put a challenge.This is not something to solve on the fly.It is something to come back to.

Page 31: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 31

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Running Exercise in Ch. 1:

Can you tell a uniform from a Gaussian distribution, based on asample?Various methods are discussed.

What is the minimum sample size at which the distribution is barelyrecognizable?

What is the sample size needed for a clear impression?

Page 32: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 32

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Case Study: Distribution Diagnostics (p.10)

Distribution function

Histogram

Smoothing (kernel densityestimation)

Exercise 1.4 (p.16)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Empirical distribution function

(X uniform)

x

Fn((x

))

[

[

[

[

[

[

[

[

[

[

Page 33: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 33

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Note

We start with possibly competing approaches. There is more than oneway.For each approach, the theory (and pragmatics) is developed in steps.After each step, the question is addressed how these approachescompare, and, ultimately, whether there is one which is to bepreferred.

Page 34: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 34

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Case Study: Distribution Function (p.10)

start with simple prototypes

refine software, e.g. graphics

make mathematics correct.

This may need sometheorems, e.g.

Theorem: F (X(i)) has a betadistribution β(i , n − i + 1).Corollary:E (F (X(i))) = i/(n + 1). 0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Empirical distribution function

(X uniform)

x

Fn((x

))

[

[

[

[

[

[

[

[

[

[

Page 35: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 35

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Example 1.11 Monte Carlo Confidence Bands (p.23)

Simulation can also help us to getan impression of the typicalfluctuation. We use randomnumbers to generate a smallnumber of samples, and compareour sample in question with thesesimulations. For comparison, wegenerate envelopes of thesesimulations and check whether oursample lies within the areadelimited by the envelopes. 0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Monte Carlo Band: 19 Monte Carlo Samples

x

Fn

Page 36: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 36

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

From: Statistical Computing 1993

We claim: a diagnostic plot is only as good as the hardstatistical theory that is supporting it.

G. Sawitzki in:Computational Statistics. Papers collected on the Occasion of the 25thConference on Statistical Computing at Schloss Reisensburg.Edited by P.Dirschedl & R.Ostermann for the Working Groups . . .Physica/Springer: Heidelberg, 1994, isbn 3-7908-0813-x, p. 237-258.

Plots and their statistical counterparts

Plot Statistics/Testhistogram χ2 tests

distribution function Kolmogorov - Smirnov test

Page 37: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 37

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Statistics for the probability plot (p.27)

Theorem: (Kolmogorov, Smirnov) For a continuous distribution function F ,the distribution of supx |Fn − F |(x) is independent of F (in general, it willdepend on n).

Theorem: (Kolmogorov) For a continuous distribution function F andn→∞ the statistic

√n sup |Fn − F | has asymptotically the distribution

function FKolmogorov−Smirnov (y) =∑

m∈Z(−1)me−2m2y2

for y > 0.

Theorem: (Massart 1990) For all integer n and any positive λ, we have

P(√

n sup |Fn − F | > λ) ≤ 2e−2λ2

.

Page 38: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 38

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Note

With all respect: asymptotics should be put in its place.

Learning the difference between asymptotic statements (such asKolmogorov) and finite sample bounds (like the Dvoretzky - Kiefer -Wolfowitz inequality studied by Massart) should start early.

Page 39: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 39

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Distribution Diagnostics

Exercise 1.25 Sample Size (p. 41)

Generate a PP plot of the t(ν) distribution against the standard normaldistribution in the range 0.01 ≤ p ≤ 0.99 for ν = 1, 2, 3, . . ..

Generate a QQ plot of the t(ν) distribution against the standard normaldistribution in the range −3 ≤ x ≤ 3 for ν = 1, 2, 3, . . ..

How large must ν be so that the t distribution is barely different from thenormal distribution in these plots?How large must ν be so that the t distribution is barely different from thenormal distribution if you compare the graphs of the distribution functions?

See also (p. 42 - p. 45).

Page 40: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 40

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Note

Where possible, we try to complement theoretical results bysimulations.

At this step, we avoid concepts like power. Instead we draw theattention to the question: what is the sample size we need to solve acertain task?

At this early point of the course, power differences are discussed interms of required sample size.

We avoid to introduce the term “relative efficiency”, not to overloadthe chapter.

Page 41: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 41

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Linear Models and Regression Diagnostics

General Regression Model

Linear Model

Variance Decomposition by Orthogonal Complements,and Analysis of Variance

Simultaneous Inference

Beyond Linear Regression

Page 42: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 42

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Linear Models and Regression Diagnostics

General Regression Model

Linear Model

Least Squares EstimationRegression Diagnostics (see p. 69 ff)Model FormulaeGauss-Markov Estimator and Residual

Variance Decomposition by Orthogonal Complements,and Analysis of Variance

Simultaneous Inference

Beyond Linear Regression

Page 43: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 43

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Linear Models and Regression Diagnostics

General Regression Model

Linear Model

Variance Decomposition by Orthogonal Complements,and Analysis of Variance

Simultaneous Inference

Scheffe’s Confidence Bands (see p. 85 ff)Tukey’s Confidence Intervals (see p. 87)Case Study: Titre Plates (see p. 88ff)

Beyond Linear Regression

Page 44: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 44

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Linear Models and Regression Diagnostics

General Regression Model

Linear Model

Variance Decomposition by Orthogonal Complements,and Analysis of Variance

Simultaneous Inference

Beyond Linear Regression

Just mentioned:

TransformationsGeneralised Linear ModelsLocal Regression

Page 45: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 45

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 2: Linear Models and Regression Diagnostics

Note

This chapter: mainly textbook material by now.With some extensions for a data analytical point of view . . .

Still needed: point out the special role of the one dimensionalresponse situation, e.g. as expressed by the Gauss-Markov theorem.

Page 46: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 46

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Shift/Scale Families, and Stochastic Order

QQ Plot, PP Plot, and Comparison of Distributions

Kolmogorov-Smirnov Tests

Tests for Shift Alternatives

A Road Map

Power and Confidence

Theoretical Power and ConfidenceSimulated Power and ConfidenceNon-Parametric Quantile Estimation

Qualitative Features of Distributions

Page 47: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 47

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Exercise 3.2: Click Comparison (p. 109)

Try clicking on a random point,with left and then with righthand.

Please click on the circle

Page 48: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 48

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Exercise 3.2: Click Comparison (p. 109)

Try clicking on a random point,with left and then with righthand.

Immediate impression:”feels different”

One hand is more responsive.

Please click on the circle

Page 49: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 49

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Stochastic Order

Notation: A distribution with distribution function F1 is stochasticallysmaller than a distribution with distribution function F2 (in symbols,F1 ≺ F2), if a variable distributed as F1 takes rather smaller values than avariable distributed as F2. This means that F1 increases sooner:F1(x) ≥ F2(x) ∀x and F1(x) > F2(x) for at least one x .

Shift/Scale Families

Notation: For a distribution with distribution function F the familyFa(x) = F (x − a) is called the shift family for F . The parameter a is calledthe shift or location parameter.

Define shift/scale family, and relate to stochastic order.

Page 50: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 50

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

Note

Stochastic order and stochastic monotonicity are the core conceptsthat explain why in some situations statistical problems can bereduced to optimization problems.

It is at the core of much of theoretical statistics, e.g.Neyman-Pearson theory.

Recognizing stochastic order relation and stochastic monotonicity area key competence in statistics.

“Monotone likelihood ratios” etc. only obscure the core argument.

Page 51: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 51

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

0 1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Click Time

time [s]

Fn

[[

[[[

[[

[[[[[[[[[[

[[[[

[

[[[[[[

[[

[[[

[[[[

[[

[[[

[

right hand

left hand

left hand

right hand

Distribution functions for theright/left click time

Page 52: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 52

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

Note: this is not a shiftalternative.

0 1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

Click Time

time [s]

Fn

[[

[[[

[[

[[[[[[[[[[

[[[[

[

[[[[[[

[[

[[[

[[[[

[[

[[[

[

right hand

left hand

left hand

right hand

Distribution functions for theright/left click time

Page 53: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 53

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

t−test for normal shift families -recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot & Kolmogorov-Smirnov

Page 54: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 54

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

t−test for normal shift families -recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot & Kolmogorov-Smirnov

Page 55: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 55

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

t−test for normal shift families -recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot & Kolmogorov-Smirnov

Page 56: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 56

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

t−test for normal shift families -recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot & Kolmogorov-Smirnov

Page 57: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 57

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

t−test for normal shift families -recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot & Kolmogorov-Smirnov

Page 58: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 58

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison

Challenge: compare twosamples.

t−test for normal shift families -recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot & Kolmogorov-Smirnov

Page 59: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 59

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Lessons / Issues to point out

Statistical methods / tests may have different assumptions on thedata to make them valid.

Statistical methods / tests may have different targets. Which of thesemethods targets shift alternatives? Which have a more general target?

Unsatisfied assumptions or failed targets do not necessarily imply thata method is not usable. For example, you can use a test targeted atshift alternatives to detect differences which are not covered by shiftalternatives.

A thorough discussion is needed here to prepare for the next question:how to compare tests, or methods in general.

Page 60: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 60

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison: Comparison of Methods

Two sample comparisons

t−test for normal shiftfamilies - recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot &Kolmogorov-Smirnov

Page 61: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 61

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison: Comparison of Methods

Two sample comparisons

t−test for normal shiftfamilies - recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot &Kolmogorov-Smirnov

Comparison of methods

sample size comparisons(relative efficiency)

theoretical power

power comparison bysimulation

“test beds” and scenarios

Page 62: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 62

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison: Comparison of Methods

Two sample comparisons

t−test for normal shiftfamilies - recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot &Kolmogorov-Smirnov

Comparison of methods

sample size comparisons(relative efficiency)

theoretical power

power comparison bysimulation

“test beds” and scenarios

Page 63: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 63

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Two Sample Comparison: Comparison of Methods

Two sample comparisons

t−test for normal shiftfamilies - recall from Ch. 2

rank tests (Wilcoxon)

permutation tests

bootstrap

QQ plot

PP plot &Kolmogorov-Smirnov

Comparison of methods

sample size comparisons(relative efficiency)

theoretical power

power comparison bysimulation

“test beds” and scenarios

Page 64: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 64

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Note

Comparison of methods, other from the sample size point of view, hasbeen postponed until there is a sufficient collection of methods incompetition.

There is no discussion of optimality in this course, except for marginalremarks.

“Optimality” is helpful if there is a one dimensional optimalitycriterion. It may be a misleading focus, if there is more than oneaspect to cover.

Page 65: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 65

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 3: Non-parametric Comparisons

Open Question

What is the state of the art information we should give about twosample comparison, keeping in mind that there are more possibilitiesfor differences than what is covered by shift alternatives?

Page 66: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 66

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 4: Multivariate Analysis

Dimensions

Selections

Projections

Sections, Conditional Distributions and Coplots

Transformations and Dimension Reduction

Higher Dimensions

High Dimensions

Page 67: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 67

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 4: Multivariate Analysis

Dimensions

Selections

Projections

Marginal Distributions and Scatter Plot MatricesProjection PursuitProjections for Dimensions 1, 2, 3, . . . 7Parallel Coordinates

Sections, Conditional Distributions and Coplots

Transformations and Dimension Reduction

Higher Dimensions

High Dimensions

Page 68: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 68

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 4: Multivariate Analysis

Dimensions

Selections

Projections

Sections, Conditional Distributions and Coplots

Transformations and Dimension ReductionHigher Dimensions

Linear CasePartial Residuals and Added Variable PlotsNon-Linear CaseExample: Cusp Non-LinearityCase Study: Melbourne Temperature DataCurse of DimensionalityCase Study: Body Fat

High Dimensions

Page 69: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 69

BackgroundContentsSummary

Ch. 2: Distribution DiagnosticsCh. 2: Linear Models and Regression DiagnosticsCh. 3: Non-parametric ComparisonsCh. 4: Multivariate Analysis

ContentsCh. 4: Multivariate Analysis

Open Questions

This chapter needs a revision.

What are the minimal concepts which we should teach aboutmultivariate statistics?

What are the basic methods which should at least be mentioned?

Page 70: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 70

BackgroundContentsSummary

Time TableOpen IssuesReferences

Summary

Background

Contents

SummaryTime TableOpen IssuesReferences

Page 71: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 71

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryTime Table

Monday Tuesday Wednesday Thursday Friday

Basic Data Analysis(Ch. 1)

Regression (Ch. 2) Regression Regression: Discus-sion

Excerpts from Multi-variate (Ch. 4)

Basic DataAnalysis

Regression Regression Comparison (Ch. 3) Excerpts from Multi-variate

Lunch Break

Basic DataAnalysis

Exercises Unsupervised Exer-cises

Comparison Exercises &Discussion

Exercises Regression Unsupervised Exer-cises

Exercises Exercises &Discussion

Afternoon Break

Basic DataAnalysis

Exercises Unsupervised Exer-cises

Discussion Discussion

Exercises& Discussion

Exercises& Discussion

Supplements fromCh. 03

Issues to check:QQ-Plot, PP-Plot,Monte Carlo Bands

Issues to Check:Basic Diagnosticsfor Linear Regression

No Checks Today Issues to Check:Stochastic Order;Shift Alternatives vs.General Differences

Issues to Check:Univariate vs. Mul-tivariate Comparison

Page 72: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 72

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 73: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 73

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 74: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 74

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 75: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 75

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 76: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 76

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 77: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 77

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 78: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 78

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 79: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 79

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 80: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 80

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 81: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 81

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 82: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 82

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryOpen Issues

Ch. 1 Distribution Diagnostic

Essentially stable.

Ch. 2 Regression

Clarify Gauss-Markov theorem and role of dimension.Can the chapter be cleaned up ?

Ch. 3 Comparison (Still a placeholder)

What is an up-to-date discussion of the (non-shift) two sample case?Clean up.

Ch. 4 Multivariate

Given the time limitations, is the current list of concepts sufficient ?Discuss scaling issues, e.g. with respect to PCA.

Page 83: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 83

BackgroundContentsSummary

Time TableOpen IssuesReferences

SummaryReferences

See http://sintro.r-forge.r-project.org/.

Page 84: Computational Statistics A Proposal for a Basic …sintro.r-forge.r-project.org/CompStatBasic.pdfG. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 1 Background Contents

G. Sawitzki: Computational Statistics Basic Reisensburg, 30.6.2009 84

BackgroundContentsSummary

Time TableOpen IssuesReferences

Private Note

Document identification:

$Source: /u/math/j40/cvsroot/lectures/src/SIntro/Reisensburg_2009/CompStatBasic.tex,v $

$Revision: 1.16 $

$Date: 2009/07/25 20:09:50 $

$Name: $

$Author: j40 $

LATEX typeset: July 26, 2009