Statistics and Research methods

Wiskunde voor HMI (Mathematics for HMI)

Betsy van Dijk


Introduction

Statistics is about:
– Systematically studying phenomena in which we are interested
– Quantifying variables in order to use mathematical techniques
– Summarizing these quantities in order to describe and make inferences
– Using these descriptions and inferences to make decisions or to understand the phenomena

The Two Branches of Statistical Methods

Descriptive statistics (Dutch: beschrijvende statistiek)
– Used to summarize, organize and simplify data

Inferential statistics (Dutch: toetsende statistiek)
– Draw conclusions or make inferences that go beyond the numbers from a research study
– Techniques that allow us to study samples and then make generalizations about the populations from which they were selected

Descriptive Statistics

Numbers that describe the characteristics of a particular data set
– “The average age in the class is 27 years”
– “The range of ages in the class is 22 years, from a minimum of 20 to a maximum of 42”
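As a minimal sketch (in Python, which the slides themselves do not use), both descriptive statements can be computed from a list of ages. The ages below are hypothetical, chosen only so that the results reproduce the numbers in the two example sentences.

```python
# Hypothetical class ages, chosen so the results match the example sentences.
ages = [20, 24, 24, 25, 42]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def value_range(values):
    """Range: the maximum minus the minimum observed value."""
    return max(values) - min(values)

print(f"Average age: {mean(ages):.1f} years")                          # Average age: 27.0 years
print(f"Range: {value_range(ages)} years, from {min(ages)} to {max(ages)}")  # Range: 22 years, from 20 to 42
```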

Inferential Statistics

Descriptive statistics from a sample that are used to make inferences about the characteristics of a population
– “The average age of people taking Research Statistics is 27 years.”

[Figure: the population of people taking Research Statistics and a sample of people taking Research Statistics drawn from it; the population value being inferred is labelled a “parameter”.]

Basic Concepts - Variables

Things that change
– Environmental events or conditions
– Personal characteristics or attributes
– Behaviors

Anything that takes on different values in different situations (even just over time)

Basic Concepts

Value
– A possible number or category that a score can have

Score
– A particular person’s value on a variable

Data
– Scores or measurements of phenomena, behaviors, characteristics, etc.

A statistic
– A number that summarizes a set of data in some way

Populations and Samples

Population
– The set of all the individuals of interest in a particular study

Sample
– A set of individuals selected from the population

Sampling error
– The discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter
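The sketch below illustrates sampling error under stated assumptions: the population of ages is simulated (hypothetical, generated with a fixed random seed so the run is reproducible), the sample is a simple random sample of 25 people, and the error is taken as sample mean minus population mean (some texts report its absolute value instead).

```python
import random
import statistics

# Hypothetical population of 1,000 ages (simulated, not data from the slides).
rng = random.Random(42)  # fixed seed for reproducibility
population = [rng.randint(18, 60) for _ in range(1000)]

# Simple random sample of 25 individuals, drawn without replacement.
sample = rng.sample(population, 25)

parameter = statistics.mean(population)  # population mean: a parameter
statistic = statistics.mean(sample)      # sample mean: a statistic
sampling_error = statistic - parameter   # discrepancy between the two

print(f"population mean = {parameter:.2f}")
print(f"sample mean     = {statistic:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

Running it with other seeds or sample sizes shows that the error changes from sample to sample, which is why inferential statistics has to take it into account.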

Measurement

Measurement is the process of assigning numbers to variables following a set of rules

There are different levels of measurement
– Nominal
– Ordinal
– Interval
– Ratio

Nominal Measurement

Places data in categories
Non-quantitative (i.e. qualitative), even though there might be numbers involved
Nominal (categorical) variables
Examples
– Male/Female: M, F (0, 1)
– Voting precinct: Alachua, Dade, Palm Beach (023, 095, 167)

Ordinal Measurement

Places data in order
Quantitative as far as ranking goes
Rank-order (ordinal) variables
Distance between values varies
Examples
– First, second, third (1, 2, 3), even when the underlying values (e.g. 2.7, 2.8, 7.6) are not evenly spaced
– Young, Middle Age, Old
– Very Good, Good, Intermediate, Bad, Very Bad (1, 2, 3, 4, 5)

Interval Measurement

Has all the characteristics of ordinal data
Additionally, the differences between values represent a specific amount of whatever is being measured (equal intervals represent equal amounts)
Examples
– Temperature (the difference between 20 °C and 40 °C is the same as between 60 °C and 80 °C, but 0 °C is not the absence of temperature)

Note: Many rating scales are treated like interval measurements
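A small worked check of the temperature example, assuming the standard conversion K = °C + 273.15. The Kelvin scale has a true zero (absolute zero), so it behaves as a ratio scale; Celsius preserves differences but not ratios.

```python
def celsius_to_kelvin(c):
    """Convert °C to kelvin; 0 K is absolute zero, a true zero point."""
    return c + 273.15

# Equal intervals: 20 °C -> 40 °C is the same difference as 60 °C -> 80 °C,
# and the difference is unchanged after converting to kelvin.
print(40 - 20, 80 - 60)                                          # 20 20
print(round(celsius_to_kelvin(40) - celsius_to_kelvin(20), 2))   # 20.0

# Ratios are not meaningful on an interval scale: 40 °C is not "twice" 20 °C.
print(40 / 20)                                                   # 2.0 on the Celsius numbers
print(round(celsius_to_kelvin(40) / celsius_to_kelvin(20), 3))   # 1.068 on the Kelvin scale
```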

Ratio Measurement

Has all the characteristics of interval data
Additionally, has a true zero, which represents the absence of whatever is being measured
Examples
– Time (e.g. reaction time)
– Distance

The zero point allows you to make statements about ratios (e.g. 100 feet is twice as far as 50 feet)

A Few More Things

Continuous variables
– Can take on an infinite number of values between any two measured levels (e.g. time measurements)

Discrete variables
– Have no intermediate values (e.g. number of people in class)

Math Warm-Up

Order of operations
– Parentheses, exponents, multiplication/division, addition/subtraction
– PEMDAS, or “Please Excuse My Dear Aunt Sally”
– Carry out summation (the summation sign Σ) before any other addition/subtraction

Proportion
– Some portion of some total amount
– Expressed as a fraction or a decimal
– To calculate, divide the portion by the total amount

Percentage
– A proportion that is scaled to be out of 100 (instead of some other total amount)
– To calculate, first calculate the proportion, then multiply by 100

Mathematical operators
– Exponents, square roots, parentheses, summation, indexing
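A short sketch of these rules in Python. The scores X are hypothetical; the proportion example uses the 31 out of 151 people who gave a stress rating of 7 in the stress-rating frequency table further on (151 is the total of that table's frequencies).

```python
# Hypothetical scores, for illustration only.
X = [2, 3, 5]

# Summation comes before other addition: ΣX + 1 means (2 + 3 + 5) + 1 ...
print(sum(X) + 1)              # 11
# ... which differs from Σ(X + 1), where the parentheses are evaluated first.
print(sum(x + 1 for x in X))   # 13

# Exponents come before summation: ΣX² squares each score first, (ΣX)² squares the total.
print(sum(x ** 2 for x in X))  # 4 + 9 + 25 = 38
print(sum(X) ** 2)             # 10 ** 2 = 100

def proportion(part, total):
    """Portion divided by the total amount."""
    return part / total

def percentage(part, total):
    """Proportion scaled to be out of 100."""
    return proportion(part, total) * 100

# 31 of the 151 stress ratings are a 7.
print(round(proportion(31, 151), 3))  # 0.205
print(round(percentage(31, 151), 1))  # 20.5
```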

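Math Warm-Up

Practice problems

[The practice formulas on this slide did not survive extraction; the recoverable fragments involve constants a and b, variables x and y, a squared term, and a summation over i = 1 to N.]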


Frequency Tables

Used to summarize data

Steps in making a frequency table

1. Make a list of each possible value

2. Count up the number of scores with each value

3. Make a table

Frequency table shows how often each value occurs

A Frequency Table

Stress rating   Frequency   Percent
10              14          9.3
9               15          9.9
8               26          17.2
7               31          20.5
6               13          8.6
5               18          11.9
4               16          10.6
3               12          7.9
2               3           2.0
1               1           0.7
0               2           1.3
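A sketch of the three steps in Python: the frequencies of the stress-rating table are expanded back into a list of raw scores (this recovers the data only up to ordering), and the table is rebuilt with `collections.Counter`. The function name `frequency_table` is illustrative, not from the slides. The frequencies add up to N = 151.

```python
from collections import Counter

# Frequencies from the stress-rating table, expanded into raw scores.
table_counts = {10: 14, 9: 15, 8: 26, 7: 31, 6: 13, 5: 18, 4: 16, 3: 12, 2: 3, 1: 1, 0: 2}
scores = [value for value, count in table_counts.items() for _ in range(count)]

def frequency_table(scores, possible_values):
    """Return (value, frequency, percent) rows, highest value first."""
    counts = Counter(scores)  # step 2: count the scores with each value
    n = len(scores)
    return [
        (value, counts[value], round(100 * counts[value] / n, 1))  # missing values count as 0
        for value in sorted(possible_values, reverse=True)         # step 1: every possible value
    ]

# Step 3: print the table.
print(f"N = {len(scores)}")
for value, freq, pct in frequency_table(scores, range(0, 11)):
    print(f"{value:>2}  {freq:>3}  {pct:>5.1f}")
```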

Histogram: Stress-Rating Data

[Figure: histogram with Stress Rating (0 to 10) on the x-axis and Frequency (0 to 35) on the y-axis. Frequencies: 0: 2, 1: 1, 2: 3, 3: 12, 4: 16, 5: 18, 6: 13, 7: 31, 8: 26, 9: 15, 10: 14.]

Grouped Frequency Table

A frequency table that uses intervals

Stress rating interval   Frequency   Percent
10-11                    14          9
8-9                      41          27
6-7                      44          29
4-5                      34          23
2-3                      15          10
0-1                      3           2
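The grouped table can be derived from the same stress-rating scores by assigning each score to an interval of width 2. A minimal sketch; `grouped_frequency_table` is a hypothetical helper, and the percentages are rounded to whole numbers as on the slide.

```python
from collections import Counter

# Stress-rating frequencies, expanded into raw scores (N = 151).
table_counts = {10: 14, 9: 15, 8: 26, 7: 31, 6: 13, 5: 18, 4: 16, 3: 12, 2: 3, 1: 1, 0: 2}
scores = [value for value, count in table_counts.items() for _ in range(count)]

def grouped_frequency_table(scores, low, high, width):
    """Rows of (interval label, frequency, whole-number percent), highest interval first.

    Intervals are low..low+width-1, low+width..low+2*width-1, ..., up to the one containing `high`.
    """
    n = len(scores)
    counts = Counter((s - low) // width for s in scores)  # interval index of each score
    rows = []
    for i in range((high - low) // width + 1):
        start = low + i * width
        freq = counts[i]
        rows.append((f"{start}-{start + width - 1}", freq, round(100 * freq / n)))
    return rows[::-1]

for label, freq, pct in grouped_frequency_table(scores, low=0, high=10, width=2):
    print(f"{label:>5}  {freq:>3}  {pct:>3}")
```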


Frequency Graphs

Histogram


Frequency Graphs

Frequency polygon


Shapes of Frequency Distributions

Unimodal, bimodal, and rectangular


Shapes of Frequency Distributions

Unimodal – there is a single most frequent value or “peak”

Bimodal – there are two most-frequent values or peaks

Rectangular – there is no peak; all values are about equally frequent


Shapes of Frequency Distributions

Symmetrical and skewed distributions

Shapes of Frequency Distributions

Symmetrical – left and right halves of the distribution have approximately the same shape

Skewed – left and right halves of the distribution do not have the same shape

The skew is toward the side with fewer cases:
– Right (or positive) skew: a few cases with large scores
– Left (or negative) skew: a few cases with small scores


Skewed distributions may be caused by:

“Ceiling effects” – limitation in the high end of the scale

“Floor effects” – limitation in the low end of the scale

Sometimes skewed distributions occur because of the nature of the variable itself…

[Figure: bar chart of Number of Children (x-axis: 0, 1, 2, ...) against Millions of Families (y-axis: 0 to 35).]


Shapes of Frequency Distributions

Normal and kurtotic distributions


Measures of Central Tendency

Median– The value in the middle

Mode– The most common value

Mean– The average value

The Mean

$M = \frac{\sum X}{N}$

where M = the mean, X = the scores, N = the number of scores, and $\sum$ means “the sum of”
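Applied to the stress-rating frequency table, ΣX is the sum of each rating multiplied by its frequency, and N is the sum of the frequencies. A minimal sketch; the helper name `mean_from_frequencies` is illustrative.

```python
# Stress-rating frequencies: {rating: frequency}.
table_counts = {10: 14, 9: 15, 8: 26, 7: 31, 6: 13, 5: 18, 4: 16, 3: 12, 2: 3, 1: 1, 0: 2}

def mean_from_frequencies(counts):
    """M = ΣX / N, where ΣX = Σ(value * frequency) and N = Σ frequency."""
    sum_x = sum(value * freq for value, freq in counts.items())
    n = sum(counts.values())
    return sum_x / n

print(sum(v * f for v, f in table_counts.items()), sum(table_counts.values()))  # 975 151
print(round(mean_from_frequencies(table_counts), 2))                           # 6.46
```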

The Median

Rank the scores from lowest to highest; the median is the score in the middle
– If there is an even number of scores, by convention take the average of the two middle ones

The median is not as sensitive to extreme values as the mean
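A sketch of the median rule, including the even-N convention, using hypothetical scores. The last line contrasts the median and the mean when one extreme value is present. Python's standard-library `statistics.median` follows the same even-N convention.

```python
def median(values):
    """Middle score after ranking; the average of the two middle scores if N is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

odd = [3, 1, 2, 5, 4]          # hypothetical scores, N = 5
even = [3, 1, 2, 5, 4, 6]      # hypothetical scores, N = 6
outlier = [1, 2, 3, 4, 100]    # one extreme value

print(median(odd))                                   # 3
print(median(even))                                  # 3.5
print(median(outlier), sum(outlier) / len(outlier))  # 3 22.0
```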

The Mode

The most frequent score
To compute the mode: look at a frequency table and find the most frequent score
In a symmetrical, unimodal distribution, the mean, median and mode are all the same
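A sketch with a small hypothetical symmetrical, unimodal data set, in which the three measures coincide. Ties for the most frequent score are resolved here by taking the first value encountered; a data set with several tied modes would need a different treatment.

```python
from collections import Counter
import statistics

def mode(values):
    """Most frequent score (the first one encountered if several are tied)."""
    return Counter(values).most_common(1)[0][0]

symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # hypothetical symmetrical, unimodal scores

# All three measures of central tendency are 6 here.
print(statistics.mean(symmetric), statistics.median(symmetric), mode(symmetric))
```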

Symmetrical Distribution

[Figure: symmetrical, unimodal frequency distribution over scores 4 to 8 (y-axis: Frequency, 0 to 3.5), with the mean, median and mode marked at the same point.]

Question: Negative Skew

[Figure: negatively skewed frequency distribution over scores 4 to 8 (y-axis: Frequency, 0 to 4.5).]

Where (approximately) will the mean, median and mode be situated?


Problem with the Mean

The mean can be strongly influenced by outliers– This distorts the mean as a measure of central tendency

The median and mode are less affected by outliers

Measures of Variance

– A single number that tells you how spread out a distribution is

[Figure: three histograms of # of Chews (x-axis) against Frequency (y-axis), all with M = 15.0 but with increasingly wide spreads: scores over 12 to 18, over 9 to 21, and over 2.5 to 27.5.]


Measures of Variance

Range: difference between the maximum and minimum observed values

Variance: a measure of the amount that values differ from the mean of their distribution

Standard deviation: the average amount (approximately) that values differ from the mean of their distribution

Variance

Formula for the sample variance:

$SD^2 = \frac{\sum (X - M)^2}{N}$

Estimate of the population variance (an unbiased estimate), using the degrees of freedom df = N − 1:

$S^2 = \frac{\sum (X - M)^2}{N - 1}$
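A sketch of the sample variance (dividing by N) and the unbiased estimate of the population variance (dividing by df = N − 1), using hypothetical scores; the standard deviation is taken as the square root of the variance. Python's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N − 1) should agree with the two functions.

```python
import math
import statistics

X = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, M = 5

def variance_sample(values):
    """Σ(X - M)^2 / N: describes the spread of the sample itself."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def variance_estimate(values):
    """Σ(X - M)^2 / (N - 1): unbiased estimate of the population variance, df = N - 1."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

sd2 = variance_sample(X)
s2 = variance_estimate(X)
print(sd2, math.sqrt(sd2))  # 4.0 2.0
print(round(s2, 3))         # 4.571
```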

Describing Individual Values

Sometimes observations have values that people are familiar with
– Rating 1 to 10, age, temperature, SAT

But sometimes values are on an unfamiliar scale
– Score on the Wisconsin Card Sorting Task
– APGAR score

How can you communicate the relative value of a given observation?
– Is that a very high value? Very low? Somewhere in the middle?

Z Scores

Characterize a score in relation to the distribution
The number of standard deviations a score is above or below the mean is called the Z score

Formula for the Z score:

$Z = \frac{X - M}{SD}$
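A direct translation of the Z-score formula; the test scale with M = 50 and SD = 10 is hypothetical.

```python
def z_score(x, m, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean m."""
    return (x - m) / sd

# Hypothetical test with M = 50 and SD = 10.
print(z_score(65, 50, 10))  # 1.5  (one and a half SDs above the mean)
print(z_score(40, 50, 10))  # -1.0 (one SD below the mean)
```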


Standard and Raw Scores

Z scores are also called “standard scores”

The original scores are called “raw scores”

For a distribution of Z scores, always M = 0

... and always SD = 1
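A quick numerical check of the last two statements, assuming SD is computed with N in the denominator both when standardizing and when describing the Z scores (using N − 1 consistently in both places also gives SD = 1). The raw scores are hypothetical.

```python
import math

def standardize(values):
    """Convert raw scores to Z scores, using the SD with N in the denominator."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    return [(x - m) / sd for x in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical raw scores (M = 5, SD = 2)
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(round(z_mean, 10), round(z_sd, 10))  # 0.0 1.0
```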