Stats Workshop 2010

Paul Garthwaite's Presentation on Statistics

MCT Mathematics & Statistics

Paul Garthwaite

stats-advisory@open.ac.uk

http://statistics.open.ac.uk/advisory.html

Introduction to Statistical Analysis

The Scientific Method

• Deductive reasoning:

– from the general to the specific ("top-down" approach)

Theory: In a pig’s digestive system, all phosphate ions are the same, regardless of what they were bound with.

Theory: If you are a diabetic, losing weight will help you live longer.

Study Design (deductive reasoning)

Hypothesis testing is like a court of law: You aim to disprove the null hypothesis.

The hypothesis of a court: The person in the dock is innocent.

The aim is to gather evidence that is inconsistent with this hypothesis. We reject the hypothesis (and decide the person is guilty) if the evidence makes the hypothesis unlikely (beyond all reasonable doubt).
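
The court-of-law analogy can be made concrete with a permutation test. The sketch below is only an illustration, not a method named in these slides: the bone-strength numbers are invented and `permutation_p_value` is a hypothetical helper. Under the null hypothesis (the "innocent" position) the group labels are interchangeable, so we shuffle them many times and count how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Invented bone-strength measurements for pigs fed two phosphate compounds.
compound_1 = [41.2, 39.8, 43.1, 40.5, 42.0, 41.7]
compound_2 = [45.9, 44.3, 46.8, 45.1, 47.2, 44.8]

p_value = permutation_p_value(compound_1, compound_2)
print(f"p-value: {p_value:.4f}")
```

A small p-value means the evidence is hard to reconcile with the null hypothesis; a large one means only that the evidence is insufficient, not that the null hypothesis has been shown to be true.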

Inductive Reasoning

• From a set of specific observations to broader generalizations and theories ("bottom-up" approach)

Observational Study (inductive reasoning)

Observational studies could feed into inductive reasoning.

Pilot studies have a place in forming hypotheses.

Some disciplines (e.g. psychology) seem to disapprove of observational studies. Presumably such studies are written up as if the hypotheses were decided before gathering the data. (A dangerous practice!)

Statistical Design

• Study can be:

– Observational: analyse existing data (inductive)

– Experimental: produce new data (deductive)

• Relies on random sampling

– Obtain information about the whole from analysing the part (inferential statistics)

• Experimental design:

– randomly allocates conditions/treatments to subjects to observe their response
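
Random allocation is easy to do by computer. The following is a minimal sketch, assuming twelve pigs and three phosphate compounds; the subject IDs, treatment names and the helper `allocate_treatments` are all hypothetical. It shuffles the subjects and then deals them out to the treatments in turn, so group sizes are as equal as possible.

```python
import random

def allocate_treatments(subjects, treatments, seed=None):
    """Randomly allocate treatments to subjects in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the treatments in turn, like dealing cards.
    return {subject: treatments[i % len(treatments)]
            for i, subject in enumerate(shuffled)}

pigs = [f"pig_{i:02d}" for i in range(1, 13)]            # hypothetical subject IDs
compounds = ["compound_A", "compound_B", "compound_C"]   # hypothetical treatments

allocation = allocate_treatments(pigs, compounds, seed=42)
for pig in pigs:
    print(pig, "->", allocation[pig])
```

Recording the seed alongside the allocation lets the assignment be reproduced later if anyone asks how it was made.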

Warning

Poor designs can lead to:

• Inefficient use of collected data

• Difficult statistical analysis

• Inability to draw meaningful conclusions

Use Common Sense

• Think about questions your research might answer.

• Can you gather data related to those questions?

• Using common sense, would the data answer those questions?

Pigs and phosphates: feed pigs different phosphate compounds and see if their bone strengths differ?

Diabetes and diet: use patient notes to get age at death, age at diagnosis, and weight loss in first year after diagnosis.

• In many ways, statistics just makes common sense rigorous.

• Think about what covariates may be relevant and try to measure them (gender and age in many social contexts; smoking in medical studies; etc.)

• Try to reduce random variation.

Gather lots of data

• A decent experiment will generally form about a quarter of a PhD (perhaps more) – four papers are enough for a PhD in most disciplines.

• Designing an experiment, collecting data, analysing it, writing a paper, revising the paper, and so on, will take several months.

• People typically do not spend enough time gathering data. The data drives the conclusions you can reach.

More data = Firmer conclusions

How much data? (My rules of thumb.)

• In a controlled experiment where the quantity of interest is a measurement, forty or so independent observations will typically enable modest-sized differences to be identified.

• With observational data and questionnaire data, gathering 150 observations or more should typically be the aim: you want 25 observations in each category of interest.

• More data is needed with counts than measurements.

• More data is needed with binary quantities (yes/no; cured/not cured; success/failure) than with Likert scores.
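
To see how sample size limits what can be detected, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means. It is not the derivation behind the rules of thumb above, which are the presenter's own; the function names and the defaults of a 5% significance level and 80% power are assumptions made for illustration. The second function gives the standard error of an estimated proportion, which shows why binary outcomes carry less information per observation.

```python
from math import sqrt
from statistics import NormalDist

def detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest mean difference, in standard-deviation units, that a two-sided
    two-sample z-test would detect with the given power (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)

def proportion_standard_error(p, n):
    """Standard error of a sample proportion based on n independent observations."""
    return sqrt(p * (1 - p) / n)

for n in (10, 20, 40, 80):
    print(f"n per group = {n:3d}: detectable difference = {detectable_effect(n):.2f} SD")

print(f"SE of a proportion near 0.5 with 25 observations: "
      f"{proportion_standard_error(0.5, 25):.2f}")
```

Because the detectable difference shrinks only with the square root of the sample size, halving it requires roughly four times as much data.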

Questionnaires

Likert scales are good:

strongly agree / weakly agree / indifferent / disagree / strongly disagree.

Having five points on a Likert scale is often about right. Code the values as 1, 2, 3, 4, 5 and it is usually OK to treat them as measurements.
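
Coding the responses takes only a few lines. This is a minimal sketch with invented responses; the direction of the coding (1 for the most negative answer, 5 for the most positive) and the name `LIKERT_CODES` are assumptions, and either direction works as long as it is applied consistently.

```python
from statistics import mean, stdev

# Assumed coding: 1 = most negative response, 5 = most positive response.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "indifferent": 3,
    "weakly agree": 4,
    "strongly agree": 5,
}

# Invented answers to a single questionnaire item.
responses = ["strongly agree", "weakly agree", "indifferent",
             "weakly agree", "disagree", "strongly agree"]

codes = [LIKERT_CODES[answer] for answer in responses]
mean_score = mean(codes)
sd_score = stdev(codes)
print(f"codes = {codes}, mean = {mean_score:.2f}, sd = {sd_score:.2f}")
```

Looking up each answer in a dictionary raises a `KeyError` on any misspelt or unexpected response, which is a useful check on data entry.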

Open-ended questions are hard to analyse.

Statistical Data Analysis

• Turning data into information: First produce summary statistics (means, percentages, standard deviations), graphs, bar-charts, cross-tabulations.

• Try to get a feel for your data – what does it tell you? (If you feel you are non-numerate, work at becoming numerate.)

• Try to form quantitative hypotheses that you think the data will refute. (e.g. “The proportions in the ‘strongly agree’ category are the same in these two sub-populations” or “As this quantity changes, the average value of this other quantity does not change”.)
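
The first example hypothesis, that the proportion in the ‘strongly agree’ category is the same in two sub-populations, can be examined with a two-proportion z-test. The sketch below uses the usual large-sample normal approximation; the counts are invented and `two_proportion_z_test` is a hypothetical helper, not something taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test of the null hypothesis that two proportions are equal."""
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    # Under the null hypothesis both groups share one proportion, so pool them.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    if pooled in (0.0, 1.0):
        raise ValueError("no variation in the outcome; the test is undefined")
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Invented counts: 30 of 80 'strongly agree' in one group, 15 of 75 in the other.
z, p_value = two_proportion_z_test(30, 80, 15, 75)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The approximation is reasonable when each group has a fair number of responses in both categories; with very small counts an exact test would be a safer choice.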

Common fundamental statistical methods

• t-tests

• Comparison of proportions

• Contingency tables

• Regression

• Analysis of variance

It is worth knowing when these are useful.

Regression

• In many ways regression is the most useful statistical method.

• It lets you test whether one variable affects another (while controlling for other covariates if necessary).

• It also describes the relationship.

• Stepwise methods help you find/test which variables are important.

• Generalised linear models add flexibility.

$$\text{survival time} = a + b\,(\text{weight change}) + c\,\text{age} + d\,\text{gender} + e\,\text{BMI} + f\,\text{IHD} + g\,(\text{blood pressure})$$
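
A model of this form can be fitted by ordinary least squares. The sketch below is only an illustration: it uses two of the covariates (weight change and age), the five data rows are invented so that the true coefficients are known, and `fit_linear_model` is a hypothetical helper that solves the normal equations directly. For real data a statistical package would be the sensible choice, since it also reports standard errors and diagnostics.

```python
def fit_linear_model(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    rows: one list of covariate values per subject (no intercept column).
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear; the model cannot be fitted")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Invented data: [weight change (kg), age at diagnosis (years)] per patient.
covariates = [[-5, 50], [0, 60], [-2, 55], [-8, 65], [-1, 70]]
# Survival times generated exactly from 30 - 0.5*(weight change) - 0.2*age.
survival = [22.5, 18.0, 20.0, 21.0, 16.5]

a, b, c = fit_linear_model(covariates, survival)
print(f"survival time = {a:.2f} + {b:.2f}*(weight change) + {c:.2f}*age")
```

Because the invented survival times contain no noise, the fit should recover the generating coefficients, which makes the sketch easy to check before trying it on real data.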

• There is an advisory service that can help with:

– Designing an experiment

– How to approach the analysis of data

– Choosing appropriate techniques

– Interpreting results

– Understanding outputs from statistical packages

• Too few people ask for advice before gathering data.

Statistical Software

• Packages are only tools (‘number crunchers’)

Most important is to choose an adequate method for your problem

Remember:

Garbage in, garbage out

Some Statistical Packages

• General software (e.g. spreadsheets)

• Specialised:

– Genstat, Minitab, SAS, Statistica

– SPSS

• wide range of statistical procedures

• good graphical capability

• fairly easy to use (menu-driven option)

• good help facility with case studies

Statistics Courses

• M248: Analysing Data

– Exploratory data analysis. Models for data. Estimation. Confidence intervals. Hypothesis testing. Regression and two-variable problems. (Minitab)

• M249: Practical Modern Statistics

– Medical statistics. Time series analysis. Multivariate statistics. Bayesian methods.

– Focus on applications: SPSS and WinBUGS.

• M343: Applications of Probability

– Models to describe patterns in time and space. Epidemiological models. Genetics and stock market price applications.

• M346: Linear Statistical Modelling

– ANOVA. Design of experiments. Linear regression. Generalized linear models. Diagnostic checking. Log-linear models. (GenStat)

The Stats-Advisory Service

• Drop-in sessions

– Mondays: 2:00 – 4:00 (M216)

– Thursdays: 10:30 – 12:20 (M214)

(Both in Maths and Computing Building)

• Web:

– http://statistics.open.ac.uk/advisory.html

• E-mail:

– Stats-Advisory@open.ac.uk