R. G. Bias | School of Information | SZB 562BB | Phone: 512 471 7046 | [email protected]

INF 397C Introduction to Research in Information Studies Fall, 2005 Day 2



Standard Deviation

$\sigma = \sqrt{\sum (X - \mu)^2 / N}$

(Does that give you a headache?)

• USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. – David Letterman

• Statistics: The only science that enables different experts using the same figures to draw different conclusions. – Evan Esar (1899 - 1995), US humorist

Scales (last week)

Nominal             Ordinal              Interval             Ratio
Name                =                    =                    =
Mutually-exclusive  =                    =                    =
                    Ordered              =                    =
                                         Equal interval       =
                                                              + abs. 0
Gender, Yes/No      Class rank, ratings  Days of wk., temp.   Inches, dollars

Critical Skepticism

• Remember the Rabbit Pie example from last week?

• The “critical consumer” of statistics asked, “What do you mean by ‘50/50’?”

Remember . . .

• I do NOT want you to become cynical.

• Not all “media bias” (nor bad research) is intentional.

• Just be sensible, critical, skeptical.

• As you “consume” statistics, ask some questions . . .

Ask yourself . . .

• Who says so? (A Zest commercial is unlikely to tell you that Irish Spring is best.)

• How does he/she know? (That Zest is “the best soap for you.”)

• What’s missing? (One year, 33% of female grad students at Johns Hopkins married faculty.)

• Did somebody change the subject? (“Camrys are bigger than Accords.” “Accords are bigger than Camrys.”)

• Does it make sense? (“Study in NYC: Working woman with family needed $40.13/week for adequate support.”)

What were . . .

• . . . some claims you all heard this week?

Last week . . .

• We learned about frequency distributions.

• I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data.

• And I asserted there’s another, even shorter-hand way.

Measures of Central Tendency

• Mode
  – Most frequent score (or scores – a distribution can have multiple modes)

• Median
  – “Middle score”
  – 50th percentile

• Mean - µ (“mu”)
  – “Arithmetic average”
  – $\sum X / N$
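The three measures above can be sketched in a few lines of Python. This is only an illustration: the scores are invented (they are not the class data), and it assumes Python 3.8 or later for `statistics.multimode`.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration (not the class data).
scores = [1, 2, 2, 3, 4, 7, 9]

modes = multimode(scores)    # most frequent score(s), always returned as a list
middle = median(scores)      # the middle score, i.e. the 50th percentile
average = mean(scores)       # arithmetic average: sum of X divided by N

# A distribution can have more than one mode.
two_modes = multimode([1, 1, 2, 2, 3])

print("mode(s):", modes)
print("median:", middle)
print("mean:", average)
print("two modes:", two_modes)
```

Note that `multimode` returns a list precisely because a distribution can have multiple modes.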

OK, so which do we use?

• Means allow further arithmetic/statistical manipulation. But . . .

• It depends on:
  – The type of scale of your data
    • Can’t use means with nominal or ordinal scale data
    • With nominal data, must use mode
  – The distribution of your data
    • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). (Look at our “Number of MLB games” distribution.)
  – The question you want to answer
    • “Most popular score” vs. “middle score” vs. “middle of the see-saw”
    • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
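To see why medians tend to be preferred for a distribution bounded at one end (salary is the example above), here is a small sketch. The salary figures are made up for illustration; only the pattern matters.

```python
from statistics import mean, median

# Hypothetical salaries in dollars. Salaries cannot go below zero,
# but nothing stops one very high score at the other end.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 500_000]

salary_mean = mean(salaries)      # pulled toward the one extreme score
salary_median = median(salaries)  # average of the two middle scores

print("mean:", salary_mean)
print("median:", salary_median)
```

Five of the six hypothetical salaries are at or below 40,000, yet the mean lands above 100,000. The median stays with the bulk of the scores.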

Scales (which measure of CT?)

Nominal             Ordinal              Interval             Ratio
(mode)              (mode, median)       (any)                (any)
Name                =                    =                    =
Mutually-exclusive  =                    =                    =
                    Ordered              =                    =
                                         Equal interval       =
                                                              + abs. 0
Gender, Yes/No      Class rank, ratings  Days of wk., temp.   Inches, dollars

Mean – “see saw” (from Tal, 2001)

Have sidled up to SHAPES of distributions

• Symmetrical

• Skewed – positive and negative

• Flat

“Pulling up the mean”

Why . . .

• . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff?

• “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).

Note . . .

• We started with a bunch of specific scores.

• We put them in order.

• We drew their distribution.

• Now we can report their central tendency.

• So, we’ve moved AWAY from specifics, to a summary. But with Central Tendency, alone, we’ve ignored the specifics altogether.
  – Note MANY distributions could have a particular central tendency!

• If we went back to ALL the specifics, we’d be back at square one.

Measures of Dispersion

• Range

• Semi-interquartile range

• Standard deviation
  – σ (sigma)

Range

• Highest score minus the lowest score.

• Like the mode . . .
  – Easy to calculate
  – Potentially misleading
  – Doesn’t take EVERY score into account.

• What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of Central Tendency.
  – ’Cause MANY different distributions of scores can have the same central tendency!
  – “Standard Deviation”
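A quick sketch of both points: different distributions can share a central tendency, and the range looks only at the two end scores. The three sets of scores and the helper name `score_range` are made up for illustration.

```python
from statistics import mean

def score_range(scores):
    """Range: highest score minus the lowest score."""
    return max(scores) - min(scores)

# Three hypothetical distributions, all with the same mean.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]
clustered = [1, 5, 5, 5, 9]   # mostly bunched up, with two stray end scores

for name, scores in [("tight", tight), ("wide", wide), ("clustered", clustered)]:
    print(name, "mean:", mean(scores), "range:", score_range(scores))
```

All three have a mean of 5. The range separates `tight` from `wide`, but it gives `wide` and `clustered` the same value even though their middle scores are spread very differently, because it ignores everything except the highest and lowest score.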

Back to our data – MLB games

• Let’s take just the men in this class

• xls spreadsheet.

• Measures of central tendency.

• Go with mean.

• So, how much do the actual scores deviate from the mean?

So . . .

• Add up all the deviations and we should have a feel for how dispersed, how spread, how deviant, our distribution is.

• Let’s calculate the Standard Deviation.

• As always, start inside the parentheses.

• $\sum (X - \mu)$

Damn!

• OK, let’s try it on a smaller set of numbers.

X
2
3
5
6

Damn! (cont’d.)

• OK, let’s try it on a smaller set of numbers.

X         X - µ
2         -2
3         -1
5          1
6          2
Σ = 16    Σ = 0
µ = 4     Hmm.
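The “Hmm” in that table is easy to reproduce. A minimal sketch using the same four scores:

```python
# The four scores from the slide.
scores = [2, 3, 5, 6]

mu = sum(scores) / len(scores)          # mean: 16 / 4
deviations = [x - mu for x in scores]   # each score minus the mean
total = sum(deviations)                 # negative and positive deviations cancel

print("mu:", mu)
print("deviations:", deviations)
print("sum of deviations:", total)
```

The deviations from the mean cancel out for any set of scores, not just this one, since the mean is the balance point of the see-saw. That is why the raw sum tells us nothing about spread.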

OK . . .

• . . . so mathematicians at this point do one of two things.

• Take the absolute value or square ’em.

• We square ’em. $\sum (X - \mu)^2$

Standard Deviation (cont’d.)

• Then take the average of the squared deviations. $\sum (X - \mu)^2 / N$

• But this number is so BIG!

OK . . .

• . . . take the square root (to make up for squaring the deviations earlier).

• $\sigma = \sqrt{\sum (X - \mu)^2 / N}$

• Now this doesn’t give you a headache, right?

• I said “right”?
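If it still gives you a headache, here is the whole recipe as a step-by-step sketch on the small set of scores used earlier (2, 3, 5, 6). The variable names are just for illustration; the last line checks the result against Python's built-in population standard deviation, `statistics.pstdev`, which also divides by N.

```python
import math
from statistics import pstdev

scores = [2, 3, 5, 6]
n = len(scores)

mu = sum(scores) / n                       # mean
squared = [(x - mu) ** 2 for x in scores]  # square each deviation
variance = sum(squared) / n                # average of the squared deviations
sigma = math.sqrt(variance)                # square root undoes the squaring

print("squared deviations:", squared)
print("average squared deviation:", variance)
print("sigma:", round(sigma, 4))
print("matches pstdev:", math.isclose(sigma, pstdev(scores)))
```

For these scores the squared deviations are 4, 1, 1, and 4, their average is 2.5, and σ comes out to roughly 1.58.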

Hmmm . . .

Mode Range

Median ?????

Mean Standard Deviation

We need . . .

• A measure of spread that is NOT sensitive to every little score, just as median is not.

• SIQR: Semi-interquartile range.

• (Q3 – Q1)/2
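A rough sketch of the SIQR in Python. Two assumptions to flag: the scores are invented, and textbooks differ on exactly how to locate Q1 and Q3. This version uses `statistics.quantiles` with `method="inclusive"`, which may not match the convention in your reading. The helper name `siqr` is hypothetical.

```python
from statistics import quantiles

def siqr(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

# Hypothetical scores, with and without one extreme score.
plain = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

plain_siqr = siqr(plain)
outlier_siqr = siqr(with_outlier)
plain_range = max(plain) - min(plain)
outlier_range = max(with_outlier) - min(with_outlier)

print("SIQR:", plain_siqr, "vs.", outlier_siqr)
print("range:", plain_range, "vs.", outlier_range)
```

Swapping the 9 for a 90 blows up the range but leaves the SIQR alone, which is the sense in which it, like the median, is not sensitive to every little score.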

To summarize

Mode       Range    Easy to calculate. May be misleading.
Median     SIQR     Capture the center. Not influenced by extreme scores.
Mean (µ)   SD (σ)   Take every score into account. Allow later manipulations.

Practice Problems

Who wants to guess . . .

• . . . what I think is the most important sentence in S, Z, & Z (2003), Chapter 2?

p. 19

• Penultimate paragraph, first sentence:

• “If differences in the dependent variable are to be interpreted unambiguously as a result of the different independent variable conditions, proper control techniques must be used.”

• http://highered.mcgraw-hill.com/sites/0072494468/student_view0/statistics_primer.html

• Click on Statistics Primer.

Homework

• LOTS of reading. See syllabus.

• Send a table/graph/chart that you’ve read this past week. Send email by noon, Friday, 9/16/2005.

See you next week.