Statistics: Unlocking the Power of Data Lock5
STAT 250
Nathaniel Cannon
Describing Data: Categorical Variables
SECTION 2.1
• One categorical variable
• Two categorical variables
Vaccinations in California
What proportion of children in California are vaccinated?
California law requires students to provide proof of immunization for school, unless they have an approved exception:
• Medical exception
• Personal belief exception
Let’s look at the data!
Frequency Table
Category                     Count
Vaccines up to date          480014
Medical Exception              1009
Personal Belief Exception     13229
Other                         36391
TOTAL                        530643
Data from the California Department of Public Health
All kindergartens in California that reported data (required), 2014–2015
Do you think schools that reported may differ from schools that didn’t report? Does sampling bias exist?
• A frequency table shows the number of cases that fall in each category.
Minitab: Stat -> Tables -> Tally Individual Variables -> Counts
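For readers working outside Minitab, the same tally can be sketched in Python. This is a minimal illustration, not the course's method: the `cases` list is hypothetical raw data (one label per case), while `slide_counts` copies the counts from the frequency table above.

```python
from collections import Counter

# Hypothetical raw data: one category label per case (not the real records).
cases = ["Up to date", "Up to date", "Personal belief", "Up to date",
         "Other", "Medical", "Up to date", "Other"]

# A frequency table is a tally of how many cases fall in each category.
freq = Counter(cases)
for category, count in freq.most_common():
    print(f"{category:<16}{count:>3}")
print(f"{'TOTAL':<16}{sum(freq.values()):>3}")

# Counts copied from the slide; they add up to the stated total.
slide_counts = {
    "Vaccines up to date": 480014,
    "Medical Exception": 1009,
    "Personal Belief Exception": 13229,
    "Other": 36391,
}
slide_total = sum(slide_counts.values())
print("Slide total:", slide_total)
```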
Bar Chart/Plot/Graph
In a bar chart, the height of each bar is the number of cases falling in that category
Minitab: Graph -> Bar chart
Histogram vs Bar Chart
This is a
a) Histogram
b) Bar chart
c) Other
d) I have no idea
Histogram vs Bar Chart
A bar chart is for categorical data, and the x-axis has no numeric scale
A histogram is for quantitative data, and the x-axis is numeric
For a categorical variable, the number of bars equals the number of categories, and the number in each category is fixed
For a quantitative variable, the number of bars in a histogram is up to you (or your software), and the appearance can differ with different numbers of bars
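To see how the choice of bins changes a histogram, here is a small Python sketch. The `scores` values are made up for illustration, and `histogram_counts` is a hypothetical helper, not part of any library.

```python
# Hypothetical quantitative data (e.g. exam scores); the values are made up.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 84, 88, 91, 97]

def histogram_counts(values, low, high, n_bins):
    """Count values in n_bins equal-width bins covering [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so a value equal to `high` would land in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Same data, different numbers of bars: the picture changes each time.
for n_bins in (2, 5, 10):
    print(n_bins, "bins:", histogram_counts(scores, 50, 100, n_bins))
```

A bar chart has no such choice: with four categories there are always four bars.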
Proportion
The proportion in a category is found by
proportion = (number in that category) / (total number)
Proportion for a sample: p̂ (“p-hat”)
Proportion for a population: p
Proportion
What proportion of children in the sample have their vaccinations up to date?
480014/530643 = 0.9046
A proportion of 0.90 is the same as 90%
Category                     Count
Vaccines up to date          480014
Medical Exception              1009
Personal Belief Exception     13229
Other                         36391
TOTAL                        530643
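The calculation above can be checked with a few lines of Python. This is a sketch using the counts from the table; the variable names are illustrative.

```python
# Counts copied from the frequency table.
counts = {
    "Vaccines up to date": 480014,
    "Medical Exception": 1009,
    "Personal Belief Exception": 13229,
    "Other": 36391,
}

total = sum(counts.values())

# Sample proportion (p-hat): number in the category divided by the total.
p_hat = counts["Vaccines up to date"] / total
print(f"p-hat = {p_hat:.4f}, or {p_hat:.1%}")
```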
Relative Frequency Table
A relative frequency table shows the proportion of cases that fall in each category
All the numbers in a relative frequency table sum to 1
Category                     Proportion
Vaccines up to date          0.905
Medical Exception            0.002
Personal Belief Exception    0.025
Other                        0.068
TOTAL                        1
Minitab: Stat -> Tables -> Tally Individual Variables -> Percents
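A relative frequency table can also be built by dividing each count by the total, as in this Python sketch (variable names are illustrative). Note that the unrounded proportions sum to 1, but proportions rounded one at a time need not: rounding each entry independently gives 0.069 for Other, so the 0.068 shown above appears to be adjusted so the displayed values sum to exactly 1.

```python
# Counts copied from the frequency table.
counts = {
    "Vaccines up to date": 480014,
    "Medical Exception": 1009,
    "Personal Belief Exception": 13229,
    "Other": 36391,
}

total = sum(counts.values())

# Relative frequency = count in the category / total number of cases.
rel_freq = {category: n / total for category, n in counts.items()}

for category, p in rel_freq.items():
    print(f"{category:<28}{p:.3f}")
print(f"{'TOTAL':<28}{sum(rel_freq.values()):.3f}")
```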
Pie Chart
In a pie chart, the relative area of each slice of the pie corresponds to the proportion in each category
Minitab: Graph -> Pie Chart
Summary: One Categorical Variable
Summary Statistics
• Proportion
• Frequency table
• Relative frequency table
Visualization
• Bar chart
• Pie chart
Two Categorical Variables
Look at the relationship between two categorical variables
1. Relationship status
2. Gender
Two-Way Table
Female Male Total
In a Relationship 32 10 42
It’s Complicated 12 7 19
Single 63 45 108
Total 107 62 169
It doesn’t matter which variable is displayed in the rows and which in the columns
Minitab: Stat -> Tables -> Tally Individual Variables -> Counts
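As a rough Python equivalent, the two-way table can be stored as nested dictionaries and the row, column, and grand totals recomputed from the cell counts. The data structure here is an illustrative choice, not something prescribed by the course.

```python
# Cell counts copied from the two-way table (rows: status, columns: gender).
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

# Row totals: sum across genders for each relationship status.
row_totals = {status: sum(cells.values()) for status, cells in table.items()}

# Column totals: sum across statuses for each gender.
col_totals = {
    gender: sum(cells[gender] for cells in table.values())
    for gender in ("Female", "Male")
}

grand_total = sum(row_totals.values())
print(row_totals, col_totals, grand_total)
```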
Two-Way Table
What proportion of students in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
Female Male Total
In a Relationship 32 10 42
It’s Complicated 12 7 19
Single 63 45 108
Total 107 62 169
Two-Way Table
What proportion of females in this sample are in a relationship?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
Female Male Total
In a Relationship 32 10 42
It’s Complicated 12 7 19
Single 63 45 108
Total 107 62 169
Male and Female Proportions
30% of females in the sample say they are in a relationship
16% of males in the sample say they are in a relationship
Why the difference???
Difference in Proportions
A difference in proportions is a difference in proportions for one categorical variable calculated for different levels of the other categorical variable
Example: proportion of females in a relationship – proportion of males in a relationship
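The example can be worked out numerically with the counts from the two-way table. A minimal Python sketch, with illustrative variable names:

```python
# Counts copied from the two-way table.
females_in_relationship, females_total = 32, 107
males_in_relationship, males_total = 10, 62

# Proportion in a relationship within each gender.
p_female = females_in_relationship / females_total
p_male = males_in_relationship / males_total

# Difference in proportions: females minus males.
difference = p_female - p_male
print(f"{p_female:.3f} - {p_male:.3f} = {difference:.3f}")
```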
Two-Way Table
What proportion of people in a relationship in this sample are female?
a) 42/169 ≈ 25%
b) 32/107 ≈ 30%
c) 10/62 ≈ 16%
d) 32/42 ≈ 76%
Female Male Total
In a Relationship 32 10 42
It’s Complicated 12 7 19
Single 63 45 108
Total 107 62 169
Two-Way Table
CAUTION: The proportion of females in a relationship is NOT THE SAME AS the proportion of people in a relationship who are female!
30% ≠ 76%!
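The two proportions share a numerator (the 32 females in a relationship) but use different denominators, which is why they differ. A short Python sketch with illustrative names:

```python
# Counts copied from the two-way table.
females_in_relationship = 32
females_total = 107          # column total: all females in the sample
in_relationship_total = 42   # row total: everyone in a relationship

# Proportion of females who are in a relationship (denominator: females).
p_relationship_among_females = females_in_relationship / females_total

# Proportion of people in a relationship who are female
# (denominator: people in a relationship).
p_female_among_relationship = females_in_relationship / in_relationship_total

print(f"{p_relationship_among_females:.0%} vs {p_female_among_relationship:.0%}")
```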
Side-by-Side Bar Chart
Minitab: Graph -> Bar Chart -> Cluster
The height of each bar is the count in the corresponding cell of the two-way table
Segmented Bar Chart
A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side
Minitab: Graph -> Bar Chart -> Stack
Vitamin D Injections
Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol. The records of 67,000 dialysis patients were examined, and half received one drug; the other half the other drug. After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived.
Construct an approximate two-way table of the data (due to rounding of the percentages we can’t recover the exact counts – round to whole numbers).
Source: Teng, M., et al., “Survival of Patients Undergoing Hemodialysis with Paricalcitol or Calcitriol Therapy,” New England Journal of Medicine, July 31, 2003; 349(5): 446-456.
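One way to build the approximate table is sketched below in Python. It assumes exactly 33,500 patients per drug (half of 67,000) and rounds halves up; both survivor counts land exactly on a half (for example 0.587 × 33,500 = 19,664.5), so rounding down would be equally defensible. The helper `approx_count` is hypothetical.

```python
TOTAL_PATIENTS = 67_000
per_drug = TOTAL_PATIENTS // 2  # assumption: an exact half on each drug

# Survival percentages from the problem, stored in tenths of a percent
# so the arithmetic stays in whole numbers.
survival_per_thousand = {"Paricalcitol": 587, "Calcitriol": 515}

def approx_count(n, per_thousand):
    """Approximate count, rounding halves up (a choice, not a requirement)."""
    return (n * per_thousand + 500) // 1000

two_way = {}
for drug, rate in survival_per_thousand.items():
    survived = approx_count(per_drug, rate)
    two_way[drug] = {"Survived": survived, "Died": per_drug - survived}

print(f"{'':<14}{'Survived':>10}{'Died':>10}{'Total':>10}")
for drug, cells in two_way.items():
    print(f"{drug:<14}{cells['Survived']:>10}{cells['Died']:>10}{per_drug:>10}")
```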
Getting dataset from table
If you were to write the data from the two-way table out as an entire data set, what would it look like?
How many columns would there be? What would they represent?
How many rows would there be? Give an example of one of the rows.
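To make the idea concrete, the Python sketch below expands a two-way table back into one row per case. It uses the relationship status by gender table as an illustration, which is an assumption; the same approach applied to the vitamin D table would give one row per patient.

```python
# Cell counts from the relationship status by gender two-way table.
table = {
    ("In a Relationship", "Female"): 32,
    ("In a Relationship", "Male"): 10,
    ("It's Complicated", "Female"): 12,
    ("It's Complicated", "Male"): 7,
    ("Single", "Female"): 63,
    ("Single", "Male"): 45,
}

# One row per case; each row records both categorical variables.
rows = []
for (status, gender), count in table.items():
    rows.extend([(status, gender)] * count)

n_columns = len(rows[0])  # one column per variable
print("rows:", len(rows), "columns:", n_columns)
print("example row:", rows[0])
```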
Kidney Stones
C. R. Charig, D. R. Webb, S. R. Payne, J. E. Wickham (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879–882
Success Failure
Treatment A 273 77
Treatment B 289 61
Which treatment is better at removing kidney stones?
a) Treatment A
b) Treatment B
Kidney Stones
SMALL STONES Success Failure
Treatment A 81 6
Treatment B 234 36
Which treatment is better at removing small kidney stones?
a) Treatment A
b) Treatment B
Kidney Stones
LARGE STONES Success Failure
Treatment A 192 71
Treatment B 55 25
Which treatment is better at removing large kidney stones?
a) Treatment A
b) Treatment B
Kidney Stones
• Treatment A is more effective for both small and large kidney stones, but the combined data show Treatment B to be more effective overall!
• How is this possible!?!?
Kidney Stones – Simpson’s Paradox
Large Stones Success Failure Success Rate
Treatment A 192 71 73%
Treatment B 55 25 69%
Small Stones Success Failure Success Rate
Treatment A 81 6 93%
Treatment B 234 36 87%
ALL STONES Success Failure Success Rate
Treatment A 273 77 78%
Treatment B 289 61 83%
Kidney Stones
• Treatment A is used more often on large stones, which are harder to treat.
• This is an example of Simpson’s Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered.
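The reversal can be verified directly from the counts in the kidney stone tables. The Python sketch below (illustrative names, with `success_rate` as a hypothetical helper) computes the success rate within each stone size and for the combined data, along with how often each treatment was used on large stones.

```python
# (successes, failures) copied from the kidney stone tables.
stones = {
    "Small": {"A": (81, 6), "B": (234, 36)},
    "Large": {"A": (192, 71), "B": (55, 25)},
}

def success_rate(successes, failures):
    return successes / (successes + failures)

# Success rate within each stone size.
rates = {
    size: {t: success_rate(*cells) for t, cells in by_treatment.items()}
    for size, by_treatment in stones.items()
}

# Success rate after combining both stone sizes.
combined = {}
large_share = {}
for t in ("A", "B"):
    successes = sum(stones[size][t][0] for size in stones)
    failures = sum(stones[size][t][1] for size in stones)
    combined[t] = success_rate(successes, failures)
    # Share of this treatment's cases that were large (harder) stones.
    large_share[t] = sum(stones["Large"][t]) / (successes + failures)

for size in stones:
    print(size, {t: f"{r:.0%}" for t, r in rates[size].items()})
print("All", {t: f"{r:.0%}" for t, r in combined.items()})
print("Share of large stones", {t: f"{s:.0%}" for t, s in large_share.items()})
```

Treatment A wins within each stone size but loses in the combined data because most of its cases were large stones, while most of Treatment B's cases were small ones.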
Kidney Stones
Combined        Treatment A    Treatment B
Successful      273 (78%)      289 (83%)
Unsuccessful    77             61
Summary: Two Categorical Variables
Summary Statistics
• Two-way table
• Difference in proportions
Visualization
• Side-by-side bar chart
• Segmented bar chart
Variable(s)                     Visualization                                  Summary Statistics
Categorical                     bar chart, pie chart                           frequency table, relative frequency table, proportion, odds
Quantitative                    dotplot, histogram, boxplot                    mean, median, max, min, standard deviation, range, IQR, five number summary
Categorical vs Categorical      side-by-side bar chart, segmented bar chart    two-way table, difference in proportions, odds ratio
Quantitative vs Categorical     side-by-side boxplots                          statistics by group, difference in means
Quantitative vs Quantitative    scatterplot                                    correlation
Descriptive Statistics
Think of a topic or question you would like to use data to help you answer.
What would the cases be?
What would the variables be?
(Limit to one or two variables)
Descriptive Statistics
How would you visualize and summarize the variable or relationship between variables?
a) bar chart/pie chart, proportions, frequency table/relative frequency table
b) dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary
c) side-by-side or segmented bar charts, difference in proportions, two-way table
d) side-by-side boxplot, difference in means
e) scatterplot, correlation
To Do
• Read Section 2.1
• Do HW 2.1 (due Friday, 2/13)
• Study for Exam 1 (Friday, 2/13)