The Art of Deceptive Statistics Using statistics as a dishonest tool to achieve desired results, and how to determine the validity of statistical results

Page 1: The Art of Deceptive Statistics Using statistics as a dishonest tool to achieve desired results, and how to determine the validity of statistical results

Page 2: Ways to use statistics for your own good

Some ways that people use statistics for their own purposes are:

Using a nonrandom sample that supports their conclusion

Using different measures of center to describe similar data sets

Reporting only some findings instead of all findings

Changing definitions to include more or fewer people

Page 3: Not using a random sample

Suppose a city council wanted to request federal assistance for their city.

They would then want to show that their community is poor. One measure of this would be the value of its houses.

If they decided to take only the prices of houses from a very poor neighborhood, the average home price in the sample would be much lower than the true average home price for the city.

It’s not that they calculated the sample mean incorrectly, but that they took a sample that wasn’t representative of the entire city.

This is a prime example of purposely using misleading statistics.

Page 4: What should have been done

Houses from all over the city should have been sampled at random, to get an accurate representation of the city’s true housing climate.

The city could get in trouble if someone found out that it deliberately used a nonrandom sample to report its findings.
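
The difference between the two sampling approaches can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the neighborhood labels, the price distributions, and the sample size of 50 are invented and do not come from any real city.

```python
import random
import statistics

# Hypothetical city of 1,000 homes. All prices are invented for illustration.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
city = (
    [("poor", rng.gauss(80_000, 10_000)) for _ in range(300)]
    + [("middle", rng.gauss(200_000, 30_000)) for _ in range(500)]
    + [("wealthy", rng.gauss(450_000, 60_000)) for _ in range(200)]
)

all_prices = [price for _, price in city]
true_mean = statistics.mean(all_prices)

# Deceptive approach: sample only from the poorest neighborhood.
poor_prices = [price for hood, price in city if hood == "poor"]
biased_mean = statistics.mean(rng.sample(poor_prices, 50))

# Honest approach: simple random sample drawn from the entire city.
random_mean = statistics.mean(rng.sample(all_prices, 50))

print(f"True mean:          {true_mean:,.0f}")
print(f"Biased sample mean: {biased_mean:,.0f}")
print(f"Random sample mean: {random_mean:,.0f}")
```

Both sample means are computed correctly. Only the way the sample was chosen differs, and that alone is enough to pull the biased estimate far below the true average.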

Page 5: Using different measures of center

Another situation in which statistics could be used to deceive people is the choice of a measure of center.

Remember that the three measures of center are mean (or average), median, and mode.

Depending on which one you use, the center of a data set could look drastically different.

Page 6: Example

We have two datasets, A and B

A is the set {2, 3, 4, 4, 5, 6, 90}

B is the set {1, 2, 2, 4, 6, 8, 11}

What does it look like if we use the mean, median, or mode to describe these sets?

Page 7: Example continued

Mean of A: (2+3+4+4+5+6+90)/7 = 114/7 ≈ 16.29

Mean of B: (1+2+2+4+6+8+11)/7 = 34/7 ≈ 4.86

The means paint the picture that these two data sets are drastically different, even though the single large observation in A (90) accounts for most of the difference.

Median of A: 4

Median of B: 4

If we use the median, then the two data sets look like they have the same center.

Page 8: Example continued again...

Mode for A: 4 occurs twice, so our mode is 4

Mode for B: 2 occurs twice, so our mode is 2

Using mode, A seems to be bigger than B.

All three of these measures of center paint different pictures of the data sets. None of them is wrong, but if you want the data to look a certain way, you could purposely choose the one that backs up your conclusion.
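
The three measures for A and B can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import statistics

# The two data sets from the example.
A = [2, 3, 4, 4, 5, 6, 90]
B = [1, 2, 2, 4, 6, 8, 11]

# Compute each measure of center for each data set.
summary = {
    name: {
        "mean": round(statistics.mean(data), 2),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
    }
    for name, data in (("A", A), ("B", B))
}

for name, measures in summary.items():
    print(name, measures)
```

Printing all three measures side by side makes it harder to cherry-pick the one that favors a particular conclusion.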

Page 9: Underreporting your findings

If you fail to report all of your statistical results from a study, this is also a form of deceptive statistics.

One result could make your company or yourself look good, whereas another would make you look bad.

Obviously, you would want to report only the results that help you out.

Page 10: Underreporting Example

A company that makes helmets releases a report that their new helmets are making the army safer, citing that deaths from head wounds are down 50%.

What they don’t report is that people are still getting injured in the head, but since the helmets are better, they do not die. There are still plenty of people getting hurt, but it is just in the form of injury, not death.
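
A small arithmetic sketch shows how a reported 50% drop in deaths can sit alongside an unchanged number of head wounds. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; the source gives no actual figures beyond the 50%.

```python
# Hypothetical head-wound counts before and after the new helmet (invented numbers).
before = {"deaths": 100, "injuries": 200}
after = {"deaths": 50, "injuries": 250}

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

death_change = pct_change(before["deaths"], after["deaths"])
injury_change = pct_change(before["injuries"], after["injuries"])
total_change = pct_change(sum(before.values()), sum(after.values()))

print(f"Deaths:           {death_change:+.0f}%")   # the figure the company reports
print(f"Injuries:         {injury_change:+.0f}%")  # the figure it leaves out
print(f"Total head cases: {total_change:+.0f}%")
```

Under these assumed numbers the headline figure is true, yet the total number of people wounded in the head has not changed at all.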

Page 11: Changing Definitions

Companies may also choose to include or remove certain demographics in a study when those groups would contradict the conclusions they wish to draw.

Once again, this is not incorrect, but the purposeful removal of these groups makes the act deceptive.

Page 12: Changed Definitions Example

How do you define a car accident as caused by alcohol?

You could say if there was any alcohol consumed by anyone involved in the crash, it is alcohol-related.

Or, you could say that if the driver was legally intoxicated, then it is caused by alcohol.

Page 13: Alcohol Example Continued

Using these two definitions, we will get very different results if we want to determine the proportion of car accidents that are alcohol-related.

If an alcohol producer were making this report, they would clearly choose the definition that led to the lowest proportion of alcohol-related accidents.
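
The effect of the two definitions can be sketched on a small set of crash records. The records, the `Crash` fields, and the 0.08 threshold are all assumptions made for illustration; legal limits vary by jurisdiction and none of this is real crash data.

```python
from dataclasses import dataclass

LEGAL_LIMIT = 0.08  # assumed threshold for "legally intoxicated"; varies by jurisdiction

@dataclass
class Crash:
    driver_bac: float    # driver's blood alcohol concentration
    anyone_drank: bool   # True if anyone involved had consumed any alcohol

# Ten hypothetical crash records (invented for illustration).
crashes = [
    Crash(0.00, False),
    Crash(0.00, True),   # sober driver, but a passenger had been drinking
    Crash(0.03, True),
    Crash(0.10, True),
    Crash(0.00, False),
    Crash(0.00, False),
    Crash(0.12, True),
    Crash(0.00, True),
    Crash(0.00, False),
    Crash(0.05, True),
]

def proportion(records, predicate):
    """Fraction of records for which predicate is true."""
    return sum(1 for r in records if predicate(r)) / len(records)

# Broad definition: any alcohol consumed by anyone involved.
broad = proportion(crashes, lambda c: c.anyone_drank)

# Narrow definition: the driver was at or above the assumed legal limit.
narrow = proportion(crashes, lambda c: c.driver_bac >= LEGAL_LIMIT)

print(f"Broad definition:  {broad:.0%} of crashes are alcohol-related")
print(f"Narrow definition: {narrow:.0%} of crashes are alcohol-related")
```

The same records produce two very different proportions, so a report should always state which definition it used.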

Page 14: Ways to identify misleading stats

You always want to look at the sample. How large is it? Who was in it? Is it representative?

Investigate the methods used. Did they use the same measure of center for all datasets? Are all assumptions met for whatever method they used?

Be aware of any excluded results that should be there.