1. Data Presentation 1/39 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics

Statistics and Data Analysis

Page 1: Statistics and Data Analysis

Statistics and Data Analysis

Professor William Greene
Stern School of Business

IOMS Department
Department of Economics

Page 2: Statistics and Data Analysis

Statistics and Data Analysis

Part 1 – Data Presentation
Telling the story statistically

Page 3: Statistics and Data Analysis

Samples are surprisingly small

> 1010 Observations
> Telephone sample
> Sampling error

Page 4: Statistics and Data Analysis

What Does it Mean?

Slightly more than one-third of Americans have a favorable opinion of the Democratic-led Congress, a poll said Wednesday. The Pew Research Center for the People & the Press said the 37% expressing a positive opinion represents a decline of 13 points since April. The favorable percentage is one of the lowest in more than two decades of Pew surveys – if not the lowest, the poll said. The previous low was 40% in January, but the result is not statistically significant because of the margin of error. (USA Today)

We will develop the idea of the “margin of error” and how it is computed.
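As a preview, here is a minimal sketch of one common way a margin of error is computed: the approximate 95% margin for a sample proportion under simple random sampling, z * sqrt(p(1 - p) / n). The slides do not say which formula the pollster used. The multiplier 1.96, the worst-case value p = 0.5, and the function name `margin_of_error` are assumptions made for illustration. The sample size of 1010 is borrowed from the earlier slide and may not be the size of this particular poll.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion p based on n observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)

# Worst case (p = 0.5) with an assumed sample of 1010 observations.
moe_worst = margin_of_error(0.5, 1010)
# At the reported 37% favorable share, same assumed sample size.
moe_37 = margin_of_error(0.37, 1010)

print(f"worst case: +/- {100 * moe_worst:.1f} points")
print(f"at 37%:     +/- {100 * moe_37:.1f} points")
```

Under these assumptions the margin is roughly 3 percentage points, which is about the size of the gap between 37% and 40% quoted above.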

Page 5: Statistics and Data Analysis

The following was taken from http://www.msnbc.msn.com/id/27339545/

An msnbc.com guide to presidential polls
Why results, samples and methodology vary from survey to survey

WASHINGTON - A poll is a small sample of some larger number, an estimate of something about that larger number. For instance, what percentage of people reports that they will cast their ballots for a particular candidate in an election? A sample reflects the larger number from which it is drawn. Let’s say you had a perfectly mixed barrel of 1,000 tennis balls, of which 700 are white and 300 orange. You do your sample by scooping up just 50 of those tennis balls. If your barrel was perfectly mixed, you wouldn’t need to count all 1,000 tennis balls — your sample would tell you that 30 percent of the balls were orange.

Really?

Your sample might tell you that approximately 30 percent of the balls were orange.
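The word "approximately" can be checked by simulation. The sketch below repeats the scoop of 50 balls many times from a barrel of 700 white and 300 orange balls and records how many orange balls turn up. The number of repetitions (1000), the fixed seed, and the function name `orange_count` are assumptions chosen for illustration.

```python
import random

def orange_count(sample_size: int, rng: random.Random) -> int:
    """Scoop sample_size balls without replacement and count the orange ones."""
    barrel = ["white"] * 700 + ["orange"] * 300
    return rng.sample(barrel, sample_size).count("orange")

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = [orange_count(50, rng) for _ in range(1000)]

# Share of scoops that hit exactly 30 percent orange (15 of 50 balls).
exactly_30_pct = sum(c == 15 for c in counts) / len(counts)
# Average orange share across all scoops.
mean_share = sum(counts) / (50 * len(counts))

print(f"smallest and largest orange share: {min(counts) / 50:.2f}, {max(counts) / 50:.2f}")
print(f"scoops with exactly 30% orange:    {exactly_30_pct:.1%}")
print(f"average orange share:              {mean_share:.3f}")
```

Individual scoops scatter around 30 percent and only a minority land on it exactly, while the average across scoops stays close to 30 percent.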

Page 6: Statistics and Data Analysis

The Visual Data Do Tell the Story: Napoleon’s March to and from Moscow

Page 7: Statistics and Data Analysis

Life Expectancy: Highest 15 Countries, 2010 Disability Adjusted Life Expectancy

Informative Data Table

Page 8: Statistics and Data Analysis

A Dynamic Picture

Page 9: Statistics and Data Analysis

Bar Charts vs. Data Tables

Page 10: Statistics and Data Analysis

Probability of Survival to Age 50, Female at Birth
U.S. and 20 Other Wealthy Countries

It is possible to be misled by a presentation such as this one. Note the vertical axis.

What does this graph tell you? What do the probabilities mean? Are the differences meaningful?

Page 11: Statistics and Data Analysis

Page 12: Statistics and Data Analysis

Does living longer make people happier? Or do people live longer because they are happier?

Page 13: Statistics and Data Analysis

Does the Picture Tell the Story?

New York Times, Page RE1, July 24, 2014

This is the only graphic in the article. The article compares default rates on VA vs. FHA mortgages. Is there anything wrong with this picture? The very technical looking graph/table is unrelated to the article.

Page 14: Statistics and Data Analysis

Data Presentation Agenda

  • Data Types: Cross Section and Time Series
  • Summarizing Data Graphically
      • Pie chart, bar chart
      • Box plot, histogram
  • Summarizing Data with Descriptive Statistics
      • Central tendency
      • Spread
      • Distribution (shape)

Page 15: Statistics and Data Analysis

Data = A Set of Facts
A picture of some aspect of the world

Pizza Sales by Type
  • What do the data tell you?
  • How can you use the information?
  • What additional information would make these data (more) informative?

Page 16: Statistics and Data Analysis

Data Types and Measurement

Quantitative
  • Discrete = count: Number of car accidents by city by time
  • Continuous = quantitative measurement: Housing prices

Qualitative
  • Categorical: Shopping mall, car brand, trip mode
  • Ordinal: Survey data on attitudes; “How do you feel about…?” Strongly disagree, Disagree, Neutral, Agree, Strongly agree
      Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.

Frameworks
  • Cross section
  • Time series

Page 17: Statistics and Data Analysis

Discrete, Count Data, Time Series

Page 18: Statistics and Data Analysis

Continuous Quantitative Data
Housing Prices and Incomes

Page 19: Statistics and Data Analysis

Unordered Qualitative Data
Travel Mode Between Sydney and Melbourne by 210 Travelers

Page 20: Statistics and Data Analysis

Ordered Qualitative Data
German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?

Page 21: Statistics and Data Analysis

Aggregated Data May Be Easier to Understand

Bad (0-3)    Fair (4-6)    Good (7-8)    Excellent (9-10)
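The aggregation can be sketched as a small recoding step: each 0 to 10 response is mapped to one of the four labels and the labels are then counted. The function name `health_category` and the ten sample responses are hypothetical; they are not taken from the German survey.

```python
from collections import Counter

def health_category(score: int) -> str:
    """Map a 0-10 health satisfaction score to the four aggregated labels."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 3:
        return "Bad"
    if score <= 6:
        return "Fair"
    if score <= 8:
        return "Good"
    return "Excellent"

# Hypothetical responses for illustration only (not the survey data).
responses = [0, 3, 4, 5, 6, 7, 7, 8, 9, 10]
table = Counter(health_category(s) for s in responses)

for label in ("Bad", "Fair", "Good", "Excellent"):
    print(f"{label:<10}{table[label]}")
```

Eleven possible answers collapse to four rows, which is easier to read at the cost of some detail.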

Page 22: Statistics and Data Analysis

Ordered Qualitative Outcomes

[Figure panels: Bond Ratings, Movie Ratings]

Arithmetic Mean may not be meaningful.
(a) Ordinal measure – rankings
(b) Look at that distribution!

Page 23: Statistics and Data Analysis

A Problem with Ordered Survey Response Data

61 Stern Students’ Ranking of Subway Safety (1994)*

Safety   Count   Percent   Cum Pct
1        17      27.87      27.87
2        15      24.59      52.46
3        17      27.87      80.33
4        10      16.39      96.72
5         2       3.28     100.00

Response scale: Very Unsatisfactory, Unsatisfactory, OK, Satisfactory, Very Satisfactory

There is no objective meaning to “3” on some standard scale.
Does everyone’s “1” or “2” or “3” … mean the same thing?

* Jeff Simonoff: Data Presentation and Summary, pp. 3-4
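The Percent and Cum Pct columns follow directly from the counts. The sketch below rebuilds them and also computes the arithmetic mean of the codes next to the middle (median) response, to show why the mean is hard to interpret for ordinal data. The variable and function names are assumptions; the counts are the ones in the table.

```python
from itertools import accumulate

# Counts of the 61 responses for safety codes 1 to 5, as given in the table.
counts = {1: 17, 2: 15, 3: 17, 4: 10, 5: 2}
total = sum(counts.values())

percents = [100 * n / total for n in counts.values()]
cum_pcts = list(accumulate(percents))

for (code, n), pct, cum in zip(counts.items(), percents, cum_pcts):
    print(f"{code:>6}{n:>7}{pct:>9.2f}{cum:>9.2f}")

def median_code(freq: dict) -> int:
    """Code of the middle response; intended for an odd number of responses."""
    middle = (sum(freq.values()) + 1) / 2
    running = 0
    for code in sorted(freq):
        running += freq[code]
        if running >= middle:
            return code
    raise ValueError("no responses")

# The mean treats the codes as if the steps between them were equal.
mean_code = sum(code * n for code, n in counts.items()) / total
print(f"mean code = {mean_code:.2f}, median code = {median_code(counts)}")
```

The mean of about 2.43 sits between two labels and depends on the arbitrary 1 to 5 coding, while the median code of 2 corresponds to an actual response.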

Page 24: Statistics and Data Analysis

Cross Section Data
Housing Prices and Incomes

Page 25: Statistics and Data Analysis

Time Series Data: Oil Price

Graph is much more useful and informative than a table for time series data.

Page 26: Statistics and Data Analysis

Representing Data

  • In raw form
  • Transformed to a visual form
  • Summarized graphically
  • Summarized statistically

Page 27: Statistics and Data Analysis

Pie Chart vs. Frequency Table

Pizza Pies Sold, by Type

Type                  Percent
Pepperoni             21.8%
Plain                 32.5%
Mushroom              16.2%
Sausage                5.8%
Pepper and Onion       7.3%
Mushroom and Onion     9.2%
Garlic                 2.3%
Meatball               5.0%

[Figure: Pie Chart of Percent vs Type]

Same Information. Which is more useful for your audience?

Page 28: Statistics and Data Analysis

Data Representation: Bar Chart vs. Pie Chart

[Figure: BAR CHART. Chart of Number vs Type. Vertical axis: Number (0 to 4000); horizontal axis: Type.]

[Figure: PIE CHART. Pie Chart of Percent vs Type. Pepperoni 21.8%, Plain 32.5%, Mushroom 16.2%, Sausage 5.8%, Pepper and Onion 7.3%, Mushroom and Onion 9.2%, Garlic 2.3%, Meatball 5.0%.]

Same data. Which is easier to understand?
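For comparison, a rough text version of a bar chart can be built from the pie chart percentages. The sketch sorts the categories from largest to smallest and scales each bar to the largest share. The function name `text_bar_chart`, the bar width of 40 characters, and the `#` symbol are assumptions made for illustration; the slide's own bar chart plots the number sold, which is not recoverable here.

```python
# Percent of pizza pies sold by type, as labeled on the pie chart.
shares = {
    "Pepperoni": 21.8,
    "Plain": 32.5,
    "Mushroom": 16.2,
    "Sausage": 5.8,
    "Pepper and Onion": 7.3,
    "Mushroom and Onion": 9.2,
    "Garlic": 2.3,
    "Meatball": 5.0,
}

def text_bar_chart(data: dict, width: int = 40) -> list:
    """Return the lines of a horizontal bar chart, largest category first."""
    largest = max(data.values())
    label_width = max(len(name) for name in data)
    lines = []
    for name, pct in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * pct / largest)
        lines.append(f"{name:<{label_width}} {bar} {pct:.1f}%")
    return lines

for line in text_bar_chart(shares):
    print(line)
```

Sorting the bars makes the ranking of the types visible at a glance, which is harder to read off the slices of a pie.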

Page 29: Statistics and Data Analysis

2013 data. Source: Bloomberg

Table vs. Bar Chart (or both)

Page 30: Statistics and Data Analysis

2013 Valuation of U.S. Sports Teams

[Figure panels: Football, Baseball]

These figures reveal a league strategy.

Page 31: Statistics and Data Analysis

A Box Plot Describes the Distribution of Values in a Set of Data

[Figure: Box and Whisker Plot for House Price Listings. Title: Average House Listing Price by State. Vertical axis: Listing (100000 to 900000). Labeled point: Hawaii.]

Page 32: Statistics and Data Analysis

Raw Data on Housing Prices and Incomes

Page 33: Statistics and Data Analysis

Making a Box Plot for Per Capita Income

Maximum = 31136
3rd Quartile = 24933
Median = 22610
1st Quartile = 21677
Minimum = 17043

Interquartile Range = IQR = 24933 - 21677 = 3256

Page 34: Statistics and Data Analysis

Box and Whisker Plot

  • Median
  • 75th Percentile
  • 25th Percentile
  • Interquartile range = IQR
  • Lower whisker: Larger of (Minimum, 25th Percentile - 1.5 IQR)
  • Upper whisker: Smaller of (Maximum, 75th Percentile + 1.5 IQR)

Outliers = extreme observations
What is an outlier?
Why do we believe a particular point is an outlier?
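A worked example with the per capita income figures from the previous slide may help. The sketch assumes the widely used convention in which the outlier fences sit 1.5 IQR beyond the quartiles; the function name `box_plot_summary`, the parameter `k`, and the dictionary keys are assumptions made for illustration.

```python
def box_plot_summary(minimum, q1, median, q3, maximum, k=1.5):
    """Compute the IQR and outlier fences from a five-number summary."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return {
        "median": median,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # True when the most extreme value lies beyond the fence.
        "minimum_is_outlier": minimum < lower_fence,
        "maximum_is_outlier": maximum > upper_fence,
    }

# Five-number summary for per capita income, from the slide.
summary = box_plot_summary(17043, 21677, 22610, 24933, 31136)

for name, value in summary.items():
    print(f"{name:<20}{value}")
```

Under this convention the fences are 16793 and 29817. The minimum of 17043 lies inside the lower fence, so the lower whisker would reach it, while the maximum of 31136 lies beyond the upper fence and would be drawn as an outlier. Where the upper whisker ends depends on the raw data, which this summary does not contain.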

Page 35: Statistics and Data Analysis

Histogram for House Price Listings

[Figure: Histogram of Listing. Horizontal axis: Listing (200000 to 900000); vertical axis: Frequency (0 to 14).]

A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.

Page 36: Statistics and Data Analysis

Distribution of House Price Listings

[Figure: Histogram of Listing. Horizontal axis: Listing (200000 to 900000); vertical axis: Frequency (0 to 14).]

[Figure: Box plot, Average House Listing Price by State. Vertical axis: Listing (100000 to 900000).]

Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.

Page 37: Statistics and Data Analysis

How to describe/summarize them.

How to explain the variation across states

How to determine if there is any correlation between the two variables.

Regression and Correlation. Are these two variables correlated?

House Price Listings and Per Capita Incomes. States.

r = .48
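A value such as r can be computed as the sample (Pearson) correlation coefficient, sketched below. The state-level listing and income data are not reproduced in this transcript, so the inputs are toy numbers chosen only to show the calculation; they do not reproduce r = .48. The function name `pearson_r` is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Toy inputs for illustration only (not the housing data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_line = [2.0 * v + 1.0 for v in x]   # lies exactly on a line
y_loose = [2.0, 1.0, 4.0, 3.0, 5.0]   # rises with x, but not exactly

print(f"exact line:  r = {pearson_r(x, y_line):.2f}")
print(f"looser link: r = {pearson_r(x, y_loose):.2f}")
```

Values of r near 1 or -1 indicate a nearly linear relationship; a value near .48 indicates a positive but much looser one.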

Page 38: Statistics and Data Analysis

Big Data: Netflix Cinematch Rating/Recommendation System

Page 39: Statistics and Data Analysis

Summary

What story does the data presentation tell?
  • Data in raw form tell no story.
  • Visual representation of data tells something about the data.
  • The representation of the data may reveal something about the underlying process that the data measure.

What tool is most informative?
  • Reduction to a small number of features
  • Visual displays of data
      • Data Table – Organizing the data is often a good start.
      • Pie chart
      • Box and whisker plots
      • Bar charts
      • Histograms
      • Time series plots

“There are lies, damned lies and statistics.” (Benjamin Disraeli)