Deception in Statisticsusers.metropolia.fi/~pervil/ipw/Pahor/IPW_DeceptionInS… ·  · 2013-05-11Deception in Statistics Introduction Statistics . ... Mathematical Illiteracy and

5/10/13  

Deception in Statistics

Introduction

Statistics

What will we do?

•  Learn about numeracy and statistics
•  Tell good statistical reporting apart from bad
•  Identify the sources of bad statistics
    •  Inappropriate samples
    •  “Cherry-picking”
    •  Confusing absolute and relative numbers
    •  Misrepresentation
•  Learn about graphical presentations
    •  What to do and what not to do

Rough plan

•  Day 1
    •  About mathematical literacy and its importance
    •  Bad statistics and their sources
•  Day 2
    •  Choosing the right average
    •  Inappropriate comparisons
    •  Sampling
•  Day 3
    •  Sample size and interval estimates
    •  Introduction to graphical presentations
•  Day 4
    •  “Lying with graphs”

Numeracy and innumeracy

•  Numeracy is mathematical literacy:
    •  "The knowledge and skills required to apply arithmetic operations, either alone or sequentially, using numbers embedded in printed material (e.g., balancing a checkbook, completing an order form)."
•  Lack of mathematical literacy is called "innumeracy"
    •  John Allen Paulos (1989): Innumeracy: Mathematical Illiteracy and its Consequences.

Potential consequences of innumeracy

•  Inaccurate reporting of news stories and insufficient skepticism in assessing these stories
•  Financial mismanagement and accumulation of consumer debt
•  Loss of money on gambling
•  Belief in pseudoscience
•  Poor assessment of risk
•  Limited job prospects

Inaccurate reporting of news stories

•  Joel Best (2001) reports a quote from a PhD dissertation:
    •  "Every year since 1950, the number of American children gunned down has doubled."

•  What is wrong with this quote?
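The quote can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that only one child was shot in 1950 (the smallest possible starting value, not a figure from the source) and that the number then doubled every year, as the quote claims.

```python
# Hypothetical check of the "doubled every year since 1950" claim.
# Assumption: only ONE child was gunned down in 1950 (the smallest possible start).

def doubled_count(start_count: int, start_year: int, year: int) -> int:
    """Count reached after doubling once per year from start_year to year."""
    return start_count * 2 ** (year - start_year)

for year in (1960, 1970, 1980, 1990, 2000):
    print(year, f"{doubled_count(1, 1950, year):,}")
```

Even from a start of one, the count passes a billion (1,073,741,824) by 1980, which is more people than live in the United States, so the sentence cannot be literally true.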

Financial mismanagement and accumulation of consumer debt

•  Think, for example, of stock-market bubbles: why do they happen?

•  Say I lend you 100 euros at a 1 percent weekly interest rate. How much will you owe me next year (after 52 weeks), when I come back to collect?
    •  101 euros?
    •  152 euros?
    •  167.77 euros?
    •  520 euros?
    •  10104 euros?
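The answer depends on whether interest is charged on interest. The sketch below compares simple interest with weekly compounding; that the loan compounds once a week with no repayments in between is an assumption, since the slide does not say.

```python
# Compare simple and compound interest for a 100-euro loan at 1% per week.
# Assumption: interest compounds once per week and nothing is repaid early.

principal = 100.0     # euros lent
weekly_rate = 0.01    # 1 percent per week
weeks = 52            # one year

# Simple interest: 1% of the ORIGINAL 100 euros is added each week.
simple = principal * (1 + weekly_rate * weeks)

# Compound interest: each week's 1% is charged on the growing balance.
compound = principal * (1 + weekly_rate) ** weeks

print(f"simple:   {simple:.2f} euros")
print(f"compound: {compound:.2f} euros")
```

The difference between 152 and 167.77 euros is entirely the effect of compounding, and it grows quickly over longer periods.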

Loss of money on gambling

•  Rationally, can you expect to earn money gambling?
•  When you place 10 euros on a single number in American roulette, how much do you expect to get back on average?
    •  0
    •  9.73
    •  10
    •  10.27
    •  100
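The expected return can be computed directly. This sketch assumes the usual single-number payout of 35 to 1 (the stake is returned on a win), equally likely pockets, and a wheel with either 38 pockets (American: 0, 00 and 1 to 36) or 37 pockets (single zero). Under these assumptions the American wheel returns about 9.47 euros per 10 staked, while the 9.73 option corresponds to a 37-pocket wheel.

```python
from fractions import Fraction

def expected_return(stake: Fraction, pockets: int, payout: int = 35) -> Fraction:
    """Expected amount received back for a single-number bet.

    On a win you get your stake back plus payout * stake; on a loss you get 0.
    Assumption: every pocket is equally likely.
    """
    p_win = Fraction(1, pockets)
    return p_win * stake * (payout + 1)

stake = Fraction(10)
american = expected_return(stake, pockets=38)     # 0, 00 and 1-36
single_zero = expected_return(stake, pockets=37)  # 0 and 1-36

print(f"American (38 pockets):    {float(american):.2f}")
print(f"Single zero (37 pockets): {float(single_zero):.2f}")
```

Exact fractions avoid rounding until the final print. Either way the expected return is below the 10-euro stake, so rationally you should expect to lose money.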

Belief in pseudoscience

•  Paulos (1989): "Innumeracy and pseudoscience are often associated, in part because of the ease with which mathematical certainty can be invoked, to bludgeon the innumerate into a dumb acquiescence."

•  Homeopathy and the memory of water
    •  Bad Science blog, http://www.badscience.net/2007/09/528/

Poor assessment of risk

•  According to the NHTSA, in 2004 road transport saw:
    •  6.1M accidents
        •  1.9M involved injuries
        •  38253 involved fatalities
    •  42636 fatalities in motor vehicle accidents, breaking down to:
        •  33134 "occupants" (car/truck drivers and passengers)
        •  4008 "motorcycle riders"
        •  5494 "non-occupants" (e.g. pedestrians)
    •  an estimated 2.9T vehicle miles traveled
    •  an estimated 10.0B motorcycle vehicle miles traveled
•  According to the NTSB, commercial flights in the 1991 to 2000 period had:
    •  938 fatalities in 31 fatal accidents
    •  145 million flight hours
    •  59.7 billion miles flown

What is more dangerous, flying or driving?
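A first, naive answer divides fatalities by distance. The sketch below uses only the figures above and computes fatalities per billion miles. It assumes that all 42636 road deaths (including pedestrians) are set against the 2.9T vehicle miles, and that aviation deaths are set against aircraft miles, not passenger miles. Because an airliner carries many passengers, a per-passenger-mile rate for flying would be lower than the figure computed here.

```python
# Naive fatality rates per billion miles, using only the figures on the slide.
# Assumptions: all road deaths vs. all vehicle miles; aviation deaths vs.
# aircraft miles (not passenger miles).

BILLION = 1e9

road_deaths = 42636
road_miles = 2.9e12            # vehicle miles, 2004

motorcycle_deaths = 4008
motorcycle_miles = 10.0e9      # motorcycle vehicle miles, 2004

air_deaths = 938
air_miles = 59.7e9             # aircraft miles, 1991 to 2000

def per_billion_miles(deaths: float, miles: float) -> float:
    """Fatalities per billion miles traveled."""
    return deaths / miles * BILLION

road = per_billion_miles(road_deaths, road_miles)
motorcycle = per_billion_miles(motorcycle_deaths, motorcycle_miles)
air = per_billion_miles(air_deaths, air_miles)

print(f"road (all):     {road:6.1f} per billion vehicle miles")
print(f"motorcycle:     {motorcycle:6.1f} per billion vehicle miles")
print(f"commercial air: {air:6.1f} per billion aircraft miles")
```

Per vehicle mile the two modes come out similar (about 14.7 versus 15.7), while motorcycles stand out at roughly 400. The conclusion depends heavily on the chosen denominator (vehicle miles, passenger miles, hours or trips).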

Job prospects

•  Numeracy (mathematical literacy) tests are standard when applying for a job
•  Test yourself at, e.g.:
    •  http://www.kent.ac.uk/careers/tests/mathstest.htm
    •  http://www.proprofs.com/quiz-school/story.php?title=numeracy-test

Innumeracy and bad statistics

•  The general audience is, to a large extent, innumerate
•  This creates an opportunity for bad statistics
•  What are bad statistics?
    •  Soft facts (guessing)
    •  Badly defined or undefined numbers
    •  Inadequate measurement
    •  Biased samples

Group work

•  Can you think of other cases where misunderstanding numbers can lead to trouble?

•  Think of a situation in which you, or someone you know, suffered because of a poor understanding of numbers.

•  What would need to be done to avoid such a situation?

Soft facts

•  The "dark figure"
    •  The extent of the phenomenon left unmeasured
    •  For example, the underground economy

Guessing and big numbers

•  When guessing, advocates of a claim like big, round numbers
•  An innumerate audience doesn't understand big numbers
•  For example, the Megapenny Project
    •  One penny is:
        •  value 1¢ (one cent)
        •  width 19.05 mm
        •  height 19.05 mm
        •  thickness 1.59 mm
        •  weight 2.83 g
        •  area 363 square mm
•  So, how much is a megapenny (one million pennies)?
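Scaling the per-penny figures above by one million gives a feel for the number. The sketch simply multiplies each quantity. Stacking all pennies in one column and laying them out in a single flat layer are illustrative arrangements chosen here, not necessarily how the Megapenny Project displays them.

```python
# Scale the single-penny figures from the slide up to one million pennies.
N = 1_000_000

value_cents = 1
thickness_mm = 1.59
weight_g = 2.83
area_mm2 = 363     # about 19.05 mm x 19.05 mm, the square each penny occupies

value_dollars = N * value_cents / 100
weight_kg = N * weight_g / 1000
stack_height_m = N * thickness_mm / 1000      # all pennies in one column
flat_area_m2 = N * area_mm2 / 1_000_000       # all pennies side by side, one layer

print(f"value:        ${value_dollars:,.0f}")
print(f"weight:       {weight_kg:,.0f} kg")
print(f"stack height: {stack_height_m:,.0f} m")
print(f"flat area:    {flat_area_m2:,.0f} square m")
```

A million pennies is worth only $10,000, yet it weighs about 2.8 tonnes and would stack roughly 1.6 km high, which is why big numbers are hard to picture.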

Definitions

•  "The use of pornography is increasing!"

•  But, wait, what is pornography?

According to Family Safe Media, every second:
•  $3,075.64 is being spent on pornography
•  28,258 Internet users are viewing pornography
•  372 Internet users are typing adult search terms into search engines

In addition, every 39 minutes a new pornographic video is being created in the United States.
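Per-second figures are hard to judge. Converting two of them to yearly totals shows the size of the claim being made. The 365-day year is an assumption, and the results inherit whatever definition of "pornography" the source used.

```python
# Convert the "every second" and "every 39 minutes" claims to yearly totals.
# Assumption: a 365-day year; the definitions behind the figures are unknown.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000
MINUTES_PER_YEAR = 365 * 24 * 60        # 525,600

spending_per_second = 3075.64           # dollars
minutes_per_new_video = 39

annual_spending = spending_per_second * SECONDS_PER_YEAR
videos_per_year = MINUTES_PER_YEAR / minutes_per_new_video

print(f"spending per year: ${annual_spending / 1e9:.1f} billion")
print(f"new videos per year: about {videos_per_year:,.0f}")
```

The "28,258 users viewing every second" figure reads as a count of simultaneous viewers rather than a rate, so it cannot be annualized the same way; mixing such quantities is itself a definitional problem.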

Measurement

•  Poverty in Finland (Veli-Matti Ritakallio, 2010)
•  How do you measure poverty?
    •  RELINC: Relative income poverty: self-reported disposable income (DPI), equivalised with the current OECD scale, is less than 50% of the national median income.
    •  CONCE: Cumulative or consensual deprivation: all those who lack at least three commodities regarded as necessary by the majority of the whole population are poor.
    •  SCARCITY: Economic hardship: the respondent's subjective evaluation of problems in making ends meet (feeling that it is highly difficult to make ends meet) and continuous trouble paying bills (rent etc.).
    •  RELIABLE: Poor according to at least two of the three indicators above.
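The four definitions can be written as simple rules. The sketch below is hypothetical: the household fields and example numbers are invented, the "current OECD" scale is assumed to be the OECD-modified equivalence scale (1.0 for the first adult, 0.5 for each other person aged 14 or over, 0.3 for each child under 14), and SCARCITY is reduced to two yes/no answers.

```python
from dataclasses import dataclass

@dataclass
class Household:
    # Hypothetical survey record; field names are illustrative only.
    income: float                 # self-reported disposable income (DPI) per year
    adults: int                   # persons aged 14 or over
    children: int                 # persons under 14
    necessities_lacked: int       # number of "necessary" commodities lacked
    hard_to_make_ends_meet: bool
    trouble_paying_bills: bool

def equivalised_income(h: Household) -> float:
    """Assumed OECD-modified scale: 1.0 + 0.5 per extra person 14+ + 0.3 per child."""
    units = 1.0 + 0.5 * (h.adults - 1) + 0.3 * h.children
    return h.income / units

def relinc(h: Household, median_equivalised: float) -> bool:
    return equivalised_income(h) < 0.5 * median_equivalised

def conce(h: Household) -> bool:
    return h.necessities_lacked >= 3

def scarcity(h: Household) -> bool:
    return h.hard_to_make_ends_meet and h.trouble_paying_bills

def reliable(h: Household, median_equivalised: float) -> bool:
    """Poor on at least two of the three indicators."""
    hits = [relinc(h, median_equivalised), conce(h), scarcity(h)]
    return sum(hits) >= 2

median = 20_000.0  # hypothetical national median equivalised income
a = Household(16_000, adults=2, children=1, necessities_lacked=1,
              hard_to_make_ends_meet=True, trouble_paying_bills=True)
b = Household(16_000, adults=2, children=1, necessities_lacked=0,
              hard_to_make_ends_meet=False, trouble_paying_bills=False)
print(relinc(a, median), conce(a), scarcity(a), reliable(a, median))  # True False True True
print(relinc(b, median), conce(b), scarcity(b), reliable(b, median))  # True False False False
```

Both hypothetical households are poor under RELINC, but only the first is poor under RELIABLE: the measured poverty rate depends on which definition is chosen.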

Poverty in Finland

Source: Veli-Matti Ritakallio, 2010

Sampling

•  Bad statistics because of samples:
    •  Samples that are too small
    •  Non-representative samples
    •  Non-random samples
        •  E.g. convenience samples
    •  Unjustified generalizations
•  More on sampling tomorrow
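A small simulation shows why a non-random sample misleads. The population below is artificial (ages 18 to 80, evenly spread), and the convenience sample assumes, hypothetically, that only people under 30 are easy to reach, as might happen when surveying on a campus.

```python
import random
import statistics

# Artificial population: 100 people of each age from 18 to 80 (6,300 people).
population = [age for age in range(18, 81) for _ in range(100)]
true_mean = statistics.mean(population)  # the population value we want to know

rng = random.Random(42)  # fixed seed so the run is repeatable

# Random sample: every person has the same chance of being picked.
random_sample = rng.sample(population, 500)

# Convenience sample (hypothetical): only people under 30 are reachable.
reachable = [age for age in population if age < 30]
convenience_sample = rng.sample(reachable, 500)

print(f"population mean:         {true_mean:.1f}")
print(f"random sample mean:      {statistics.mean(random_sample):.1f}")
print(f"convenience sample mean: {statistics.mean(convenience_sample):.1f}")
```

Enlarging the convenience sample would not help: its mean stays below 30 however many people are surveyed, because the bias comes from who can be reached, not from the sample size.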

A miracle drug

"The researchers found that in the group taking the drug, heart attack risk was down by 54% and stroke by 48%. The combined risk of heart attack, stroke and heart-related death fell by 47%, as did the odds of undergoing surgical procedures. "

The Guardian, November 10, 2008

Somewhat less than a miracle

•  The JUPITER trial
    •  Relative Risk Reduction
    •  Absolute Risk Reduction

•  On placebo, your risk of a heart attack in the trial was 0.37 events per 100 person-years

•  If you were taking the drug, it fell to 0.17 events per 100 person-years.

•  0.37 to 0.17. WOW!
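The two ways of reporting the same result can be computed side by side. The sketch uses only the heart-attack rates quoted above. The number needed to treat (NNT) is a standard companion measure added here for illustration, not a figure from the article.

```python
# Relative vs absolute risk reduction for the heart-attack rates quoted above.
# Rates are events per 100 person-years.
placebo_rate = 0.37
drug_rate = 0.17

arr = placebo_rate - drug_rate     # absolute risk reduction, per 100 person-years
rrr = arr / placebo_rate           # relative risk reduction, as a fraction
nnt = 100 / arr                    # person-years of treatment per heart attack avoided

print(f"relative risk reduction: {rrr:.0%}")
print(f"absolute risk reduction: {arr:.2f} per 100 person-years")
print(f"number needed to treat:  about {nnt:.0f} person-years per heart attack avoided")
```

The headline "54%" and the modest 0.20 events per 100 person-years describe the same data; put differently, roughly 500 person-years of treatment are needed to avoid one heart attack.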

What is statistics?

Two foundations of statistics:
•  Description of the state and population (official statistics)
•  Part of mathematics (probability, distribution, sampling, …)

Two different branches of statistics are used in business

Statistics transforms data into useful information for decision makers.

•  Descriptive Statistics: collecting, summarizing, presenting and analyzing data
•  Inferential Statistics: using data collected from a small group to draw conclusions about a larger group
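The difference between the two branches can be shown on one small data set. In this sketch the data are invented; the descriptive part only summarizes them, and the inferential part estimates the mean of the larger group with an approximate 95% interval based on the normal distribution, which is a simplification for a sample this small.

```python
import math
import statistics

# Invented data: weekly hours of exercise reported by 12 surveyed employees.
sample = [2, 0, 5, 3, 4, 1, 6, 2, 3, 0, 4, 3]

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)      # sample standard deviation (n - 1)

# Inferential statistics: estimate the mean of the whole (hypothetical) company.
# Assumption: normal approximation with z of about 1.96; rough for n = 12
# (a t-based interval would be somewhat wider).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * stdev / math.sqrt(len(sample))
low, high = mean - margin, mean + margin

print(f"descriptive: mean {mean:.2f}, median {median}, sd {stdev:.2f}")
print(f"inferential: company mean roughly between {low:.2f} and {high:.2f}")
```

The descriptive numbers are exact facts about the 12 answers; the interval is a claim about people who were never asked, and it is only as good as the sampling behind it.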

Basic vocabulary of statistics

VARIABLES
•  Variables are characteristics of any unit (e.g. an individual, a company, a household, a product, an event, a transaction etc.) and are what you analyze when you use a statistical method.

DATA
•  Data are the different values associated with a variable.

OPERATIONAL DEFINITIONS
•  Data values are meaningless unless their variables have operational definitions: universally accepted meanings that are clear to all associated with an analysis.

Basic vocabulary of statistics

POPULATION
•  A population consists of all units about which you want to draw a conclusion. The population is the “large group”.

SAMPLE
•  A sample is the portion of a population selected for analysis. The sample is the “small group”.

PARAMETER
•  A parameter is a numerical measure that describes a characteristic of a population.

STATISTIC
•  A statistic is a numerical measure that describes a characteristic of a sample.
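The vocabulary fits together as follows. In this hypothetical sketch the variable is monthly spending, the population is every customer of an imaginary shop, the parameter is the population mean, and each statistic is the mean of one random sample; all numbers are generated, not real.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: monthly spending (the VARIABLE) of 10,000 customers.
population = [round(rng.uniform(0, 200), 2) for _ in range(10_000)]

# PARAMETER: describes the population; there is exactly one value.
parameter = statistics.mean(population)

# STATISTIC: describes a sample; it changes from sample to sample.
statistics_from_samples = [
    statistics.mean(rng.sample(population, 100)) for _ in range(5)
]

print(f"parameter (population mean): {parameter:.2f}")
for i, value in enumerate(statistics_from_samples, start=1):
    print(f"statistic from sample {i}:    {value:.2f}")
```

The parameter is fixed but usually unknown in practice; the statistics are what we can compute, and they scatter around the parameter.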

Population vs. Sample

[Figure: Population, Sample]

Measures used to describe the population are called parameters

Measures used to describe the sample are called statistics

Daily assignment

•  Find (newspaper) articles containing bad statistics, and comment on:
    •  why the statistics are bad
    •  where they are deceptive
    •  the ethics of this deception