SEEING IS BELIEVING: Telling stories with statistics – in pictures
We’re failing
Do you see the same thing here?
                Gender
Military      Male    Female
No             943     1,222
Yes            227        72
This is your brain on statistics
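The total sample is (roughly) evenly divided by gender, and 299 people served in the military, so one would expect about 150 of them to be women.
Subtracting the 72 observed from the 150 expected gives a difference of about 80, which squared is 6,400; dividing by 150 gives roughly 43 for that one cell alone.
The critical value of chi-square with 1 degree of freedom at p = .05 is 3.84, so it is already obvious this is significant.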
Just for closure ... (e = expected count, o = observed count)
   e       o    (e-o)^2    (e-o)^2/e
 157      72      7,225      46.0191
 142     227      7,225      50.8803
1028     943      7,225       7.0282
1137   1,222      7,225       6.3544
        Chi-square (1 df) = 110.2820
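In SAS, PROC FREQ does the same arithmetic. A minimal sketch, assuming the four cell counts are typed in as a small dataset; the dataset name MILITARY and the variables gender, military and count are hypothetical names, not from the original study. CHISQ requests the test and EXPECTED prints the expected counts that correspond to the e column above.

```sas
/* Cell counts from the gender-by-military table; names are hypothetical */
data military ;
  input gender $ military $ count ;
  datalines ;
Male No 943
Female No 1222
Male Yes 227
Female Yes 72
;
run ;

proc freq data = military ;
  /* WEIGHT tells PROC FREQ that each row is a cell count, not one person */
  weight count ;
  tables military * gender / chisq expected ;
run ;
```

Because PROC FREQ keeps the unrounded expected counts, its Pearson chi-square should come out close to, but not exactly equal to, the hand-rounded 110.28.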
Seeing is a learned skill
Statisticians may see things in a picture others don’t
My points
(surprisingly, I do have some)
Data Visualization
Graphics do not necessarily stand alone
Data visualization is all around us.
Visual representation in one context is often misapplied to another.
Atomic numbers on your socks?
Data visualization needs to ADD information
Basic Assumptions
• Our audience needs to be taught to read visual data just as they were taught to read numeric data, and we need to discuss more than the choice between line graphs and pie charts
YOU NEED TO LEARN TO WRITE PICTURES
You learned to read numbers
Or, to be more specific, you need to explain to others what you see in pictures
Question + Data > Picture = Story
Bad visualization for one question can be good for another
• Who will win the election?
• Which regions support the Democrats?
Poll dataset did not include Hawaii or Alaska
DATA VISUALIZATION BY EXAMPLE
AN EXAMPLE OF PROGRAM EVALUATION
The government is smarter than you think
(No, I’m serious)
Was the program implemented as planned?
Did the program work?
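GOPTIONS HBY = 2 ;
SYMBOL1 I = RL ;   /* assumed: draws regression lines and writes their equations to the log */
PROC GPLOT DATA = wussexample UNIFORM ;   /* data must be sorted by group */
  PLOT z_total_post * z_total_pre / VREF = 0 ;
  BY group ;
RUN ; QUIT ;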
EQUATIONS IN THE SAS LOG FOR THE STATISTICIAN IN YOU
NOTE: Regression equation : z_total_post = 0.13379 + 0.776552*z_total_pre.
NOTE: The above message was for the following BY group: group=CONTROL
NOTE: Regression equation : z_total_post = 1.233616 + 0.578418*z_total_pre.
NOTE: The above message was for the following BY group: group=EXPERIMENTAL
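The two equations differ in both intercept (0.13 vs. 1.23) and slope (0.78 vs. 0.58). One way to ask whether the slope difference is more than chance is an analysis of covariance with a group-by-pretest interaction. This is a sketch, not the analysis from the talk; it assumes WUSSEXAMPLE holds group, z_total_pre and z_total_post as in the plot code.

```sas
/* Sketch: test whether the pre-to-post slope differs by group (ANCOVA)    */
/* Assumes WUSSEXAMPLE has group, z_total_pre and z_total_post             */
proc glm data = wussexample ;
  class group ;
  /* group*z_total_pre tests for unequal slopes across CONTROL/EXPERIMENTAL */
  model z_total_post = group z_total_pre group*z_total_pre / solution ;
run ;
quit ;
```

If the interaction is not significant, dropping it gives the usual common-slope ANCOVA, where the group effect is the pretest-adjusted difference between CONTROL and EXPERIMENTAL.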
Same plot in JMP
Is the intervention successful under all conditions?
TRAINING WAS ADMINISTERED TO FOUR COHORTS
Admittedly, we did not train people while flying on a trapeze
Creating the interaction graph
First, in the RESULTS window, type
sgedit on
Ods listing sge = on ;
Ods graphics on ;
proc glm data = plots ;
  class TestType cohort ;
  model z_total = TestType cohort TestType*cohort ;
  where group = "EXPERIMENTAL" ;
run ;
quit ;
Click on the sge plot to edit it
ODDLY, THE MOST TIME-CONSUMING PART OF THIS IS MAKING THE LINES THICKER
Of course, that is kind of like being the smaller midget
Using SGEDIT to, well, edit
1. Double-click on the .sge file in the RESULTS window
2. Right-click in the plot area & select PLOT PROPERTIES
3. Select desired line thickness
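If line thickness is the slow part, it can also be set in code so the plot never needs hand editing. A sketch, assuming the PLOTS dataset and the variables z_total, TestType, cohort and group from the PROC GLM step; the output dataset CELL_MEANS and the variable mean_z are hypothetical names. For the full two-way model the fitted cell means equal the observed cell means, so these lines should trace the same pattern as the GLM interaction plot.

```sas
/* Sketch: cell means of z_total by cohort and test occasion */
proc means data = plots noprint nway ;
  where group = "EXPERIMENTAL" ;
  class cohort TestType ;
  var z_total ;
  output out = cell_means mean = mean_z ;
run ;

/* One thick line per cohort, so no manual editing in the graphics editor */
proc sgplot data = cell_means ;
  series x = TestType y = mean_z / group = cohort markers
                                   lineattrs = (thickness = 3) ;
  yaxis label = "Mean z_total" ;
run ;
```

TestType values are placed on the axis in sorted order; if the test occasions come out in the wrong order, recode or format them so they sort correctly.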
THANKS FOR ASKING!
Yes, the TestType*Cohort*Group interaction (F = 5.84, p < .0001) AND the TestType*Group interaction (F = 22.92, p < .0001) in the other repeated measures ANOVA were significant.
LOOKING AT THE LITTLE PICTURE
(Especially true for small samples)
How does our screening test work?
R-square = .05
Don’t be too hasty
Look!
Another example
• Years of Education as predictor of gain score
• R-square = .46
• Correlation = .68
• p < .01
Now looky here …
Is it a real relationship?
What should we do?
Throw the score out?
Keep the score in?
Something else?
Ignoring my partner …
Compare your answers with the people next to you
Sometimes outliers are the most interesting part of your study
PROC CORR
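A sketch of checking whether one or two cases carry a correlation like the r = .68 above, assuming a hypothetical dataset GAINS with variables educ (years of education) and gain (gain score). As a rule of thumb, not a formal test, a large gap between the Pearson and rank-based Spearman correlations hints that extreme values are driving the result; the PROC REG diagnostics show which observations they are.

```sas
ods graphics on ;

/* Pearson vs. Spearman, plus a scatter plot of the two variables */
proc corr data = gains pearson spearman plots = scatter ;
  var educ gain ;
run ;

/* INFLUENCE and R print per-observation diagnostics (e.g., RStudent, Cook's D) */
proc reg data = gains ;
  model gain = educ / influence r ;
run ;
quit ;
```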
One last example on knowing your data
Not just telling a story,
having a conversation
PROC FREQ
Custom Map-making
How to plot the largest category in a frequency distribution
1, 2, 3
1. PROC TABULATE -> output dataset
2. PROC FORMAT
3. Proc GMAP
DATA VISUALIZATION BY EXAMPLE
WHERE IS DEMOCRATIC SUPPORT BASED?
DATA VISUALIZATION IN POLITICAL SURVEYS
PROC TABULATE DATA = in.VOTE2008 OUT = SummaryVOTE2008 ;
  CLASS question3 state ;
  TABLE state, question3*RowPctN ;
RUN ;
proc format ;
  value vote
    50 <- 100 = "Obama"    /* "<-" excludes 50, so no gap is left below 50.01 */
    0 - 50    = "McCain" ;
run ;
PATTERN1 C = red ;
PATTERN2 C = blue ;
LEGEND1 LABEL = NONE ;   /* hypothetical: the original LEGEND1 definition is not shown */

PROC GMAP DATA = SummaryVOTE2008 MAP = maps.us ;
  ID state ;
  CHORO PctN_01 / DISCRETE LEGEND = LEGEND1 ;
  FORMAT PctN_01 vote. ;
RUN ;
QUIT ;
When a state has more than one observation in the response dataset, the CHORO statement uses the first observation for that state and ignores the others.
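Because CHORO uses only the first observation per state, it helps to control which row comes first. A sketch that keeps, for each state, the row with the highest row percentage, which is one way to map the largest category in a frequency distribution. It assumes SummaryVOTE2008 from the PROC TABULATE step, with state, question3 and PctN_01; the dataset names SORTED and TOPCATEGORY are hypothetical.

```sas
/* Put the highest percentage first within each state */
proc sort data = SummaryVOTE2008 out = sorted ;
  by state descending PctN_01 ;
run ;

data topcategory ;
  set sorted ;
  by state ;
  if first.state ;   /* first row per state = category with the highest percentage */
run ;

/* Map the winning category itself rather than its percentage */
proc gmap data = topcategory map = maps.us ;
  id state ;
  choro question3 / discrete ;
run ;
quit ;
```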
Does Race Matter?
PROC GMAP
DATA = wuss map=maps.us ;
ID state ;
area vote2008 / discrete statistic = mean ;
block pctmin / discrete statistic = mean ;
format pctmin rangep. vote2008 voten. ;
The mean minority percentage in districts where Obama voters live is 21%, versus 13% for McCain voters
(t = 5.73, p < .0001)
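A t-test like this one can be sketched with PROC TTEST, assuming the WUSS dataset has pctmin and a two-level vote2008 variable as in the map code. This is an illustration, not necessarily how the reported t was computed.

```sas
/* Compare mean minority percentage between the two vote2008 groups */
proc ttest data = wuss ;
  class vote2008 ;   /* must have exactly two levels */
  var pctmin ;
run ;
```

PROC TTEST prints both the pooled and the Satterthwaite versions of the test, which matters if the two groups have very different variances.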
The usefulness of visual data
With one statement, I can change the percentage of minority & re-run the chart
proc format ;
  value rangep
    0 - 10    = "0-10%"
    10 <- 100 = "> 10%" ;
run ;
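Taking the same idea one step further, the cutpoint can live in a single macro variable. A sketch; CUT is a hypothetical macro variable name, and the PROC GMAP step above is re-run after the format is rebuilt.

```sas
/* Change this one value to redraw the map with a different cutpoint */
%let cut = 10 ;

proc format ;
  value rangep
    0 - &cut     = "0-&cut.%"
    &cut <- 100  = "> &cut.%" ;
run ;
```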
DATA VISUALIZATION BY EXAMPLE
Decision Trees, ROC & Lift Curves to Predict Military Service
Speaking of easy, interactive graphics
JMP
How to get a SAS .xpt file into JMP, Step 1
File > Open
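If the data start out in SAS, one way to make the transport file is the XPORT engine with PROC COPY. A sketch that writes SASHELP.CLASS to the WORK directory purely as a stand-in; swap in your own library, dataset and path. Version 5 transport files limit dataset and variable names to 8 characters.

```sas
/* Point a libref at a transport file; the path here is only an example */
libname xptout xport "%sysfunc(pathname(work))/class.xpt" ;

/* Copy one dataset into the transport file */
proc copy in = sashelp out = xptout memtype = data ;
  select class ;
run ;

libname xptout clear ;
```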
DECISION TREE
• ANALYZE > MODELING > PARTITION
• SELECT Y
• SELECT X VARIABLES
• Click on the SPLIT button
In JMP, use of training and testing datasets is REALLY easy
EXCLUDE 25% or 50% of the data and then re-run your analyses with the excluded sample
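The same holdout idea in SAS, for readers not using JMP: a minimal sketch that randomly sends about 25% of rows to a holdout dataset. SASHELP.CLASS stands in for the real data, and the dataset names TRAIN and HOLDOUT and the seed are hypothetical.

```sas
/* Randomly send about 25% of rows to a holdout sample */
data train holdout ;
  set sashelp.class ;
  if _n_ = 1 then call streaminit(12345) ;   /* fixed seed: same split every run */
  if rand("uniform") < 0.25 then output holdout ;
  else output train ;
run ;
```

Fit the model on TRAIN, then score HOLDOUT and compare how well it predicts in each.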
Receiver Operating Characteristic
Click on the red arrow at the top left of the partition window for pull-down options, which include ROC and Lift curves.
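An ROC curve can also be produced in SAS. Note that this swaps in logistic regression for the partition tree; it is a sketch assuming a hypothetical dataset MYDATA with a Yes/No variable military, a character variable gender, and a numeric predictor educ.

```sas
ods graphics on ;

/* Logistic model for military service; PLOTS requests the ROC curve */
proc logistic data = mydata plots(only) = roc ;
  class gender ;
  model military(event = "Yes") = gender educ ;
run ;
```

The area under the ROC curve is reported as "c" in the Association of Predicted Probabilities and Observed Responses table.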
Comparing models
A statistician is a person who was good at math but didn't have enough personality to be an accountant ?
It is important that people believe you
And that’s my story
AnnMaria De Mars
The Julia Group
2111 7th St #8
Santa Monica, CA 90405
(310) 717-9089