Computing for Research I, Spring 2012
Primary Instructor: Elizabeth Garrett-Mayer
Stata Graphics, February 16



Basic syntax for commands
• prefix: command varlist, options

• Examples:
  – regress y x, level(90)
  – by race: sum y x, detail (by requires the data to be sorted by race; bysort race: sorts first)
  – ttest y, by(x) unequal
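Here is a minimal sketch of the same three patterns. It runs on Stata's bundled auto dataset rather than the course data: price, mpg and foreign stand in for y, x and the grouping variable, and that substitution is only for illustration.

```stata
* Illustration only: Stata's bundled auto data stands in for the course data
sysuse auto, clear

* command varlist, options
regress price mpg, level(90)

* prefix: command varlist, options
* (bysort sorts on foreign first; plain "by foreign:" needs sorted data)
bysort foreign: summarize price mpg, detail

* two-sample t test of price across foreign, allowing unequal variances
ttest price, by(foreign) unequal
```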


Stata Graphics

• Maybe we can just end class now!
• Check out these links:
  – http://www.ats.ucla.edu/stat/stata/library/GraphExamples/default.htm
  – http://www.ats.ucla.edu/stat/stata/topics/graphics.htm
  – http://data.princeton.edu/stata/graphics.html
  – http://www.stata.com/capabilities/graphics.html


Basic univariate displays

• Boxplots
• Stem and leaf
• Histograms
• Density plots
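Each display in the list above is a single command. The sketch below runs all four on the bundled auto data, with mpg as a stand-in variable. The name() option simply keeps each graph open in its own window.

```stata
sysuse auto, clear
graph box mpg, name(box_mpg, replace)      // boxplot
stem mpg                                   // stem-and-leaf (text output)
histogram mpg, name(hist_mpg, replace)     // histogram
kdensity mpg, name(dens_mpg, replace)      // kernel density plot
```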


Ceramide Data

• Let’s look at the ceramide markers
• What are their distributions?
• Are there outliers?
• Should we consider taking logs, or using % change?

Results of a phase II trial of gemcitabine plus doxorubicin in patients with recurrent head and neck cancers: serum C₁₈-ceramide as a novel biomarker for monitoring response.Saddoughi SA, Garrett-Mayer E, Chaudhary U, O'Brien PE, Afrin LB, Day TA, Gillespie MB, Sharma AK, Wilhoit CS, Bostick R, Senkal CE, Hannun YA, Bielawski J, Simon GR, Shirai K, Ogretmen B. Clin Cancer Res. 2011 Sep 15;17(18):6097-105. Epub 2011 Jul 26.


Histogram

• hist c18

[Figure: histogram of C18 ceramide; y axis: Density; x axis: C18 ceramide]


Let’s make it prettier

* prettier histograms
hist c18, freq xaxis(1 2) ylabel(0(2)24) xlabel(20 "Twenty" 40 "Forty")

hist c18, title("Histogram of C18 Ceramide") subtitle("PI: K. Shirai")

hist c18, ytitle("number of patients") freq yline(0(10)20)

hist c18, xaxis(1 2) xlabel(19.6 "mean" 11.9 "median", axis(2) grid)

Finding help on these options can sometimes be tricky! e.g., help axis_choice_options
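In the last histogram, the mean (19.6) and median (11.9) positions are typed in by hand. The sketch below takes them from summarize, detail instead, so the labels stay correct if the data change. It runs on the bundled auto data, with mpg standing in for c18.

```stata
sysuse auto, clear
* r(mean) and r(p50) are left behind by summarize, detail
quietly summarize mpg, detail
local mn  = r(mean)
local med = r(p50)
* put "mean" and "median" ticks on a second x axis, with grid lines
histogram mpg, xaxis(1 2) ///
    xlabel(`mn' "mean" `med' "median", axis(2) grid)
```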


[Figure: four histograms of C18 ceramide from the commands above: frequency scale with custom x labels ("Twenty", "Forty"); titled "Histogram of C18 Ceramide" with subtitle "PI: K. Shirai"; mean and median marked on a second x axis; y axis titled "number of patients"]


Boxplots

• graph box c18

[Figure: box plot of C18 ceramide]


Boxplots
graph box c18, by(cycle)
graph box c18, over(cycle)

tab cycle
graph box c18 if cycle<7, over(cycle)

sort patient cycle
merge m:1 patient using "Ptdata.GemDox.dta"
graph box c18 if cycle<7, over(cycle) over(gender)

graph hbox c18, over(initial) capsize(5)


[Figure: box plots of C18 ceramide from the commands above: one panel per cycle (1, 3, 5, 7, 9, 11, 15, 19; "Graphs by Cycle"); all cycles on one axis; cycles 1, 3, 5 only; cycles 1, 3, 5 within gender (f, m)]
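The sketch below shows the by() versus over() distinction on the bundled auto data, where mpg, foreign and rep78 stand in for c18, gender and cycle.

```stata
sysuse auto, clear
* by(): a separate panel for each group
graph box mpg, by(foreign)
* over(): all groups side by side in one plot
graph box mpg, over(rep78)
* two over() options: rep78 groups nested within foreign
graph box mpg, over(rep78) over(foreign)
* horizontal boxes with capped whiskers
graph hbox mpg, over(rep78) capsize(5)
```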


[Figure: horizontal box plots of C18 ceramide by initial response (SD, PR, PD, CR), shown in two versions]

graph hbox c18, over(initial) capsize(5)

graph hbox c18, over(initial) medtype(marker) medmarker(msymbol(+) msize(large))

graph hbox c18, over(initial) ytitle("C18")


Labels

• Sometimes xlabel() cannot be applied (e.g., box plots)

• Instead, label the values of the variable
• Example: cycle for box plots
  – label define cycle 1 "cycle 1" 3 "cycle 3" 5 "cycle 5" 7 "cycle 7"
  – label values cycle cycle
  – graph box c18 if cycle<7, over(cycle)

• (Hint: use this on the homework!)
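The sketch below applies the same label workflow to the bundled auto data. It uses rep78 (repair record, coded 1 to 5) and a label set named replab, which is a made-up name for illustration.

```stata
sysuse auto, clear
* define the value-to-text mapping (replab is a hypothetical name)
label define replab 1 "rep 1" 2 "rep 2" 3 "rep 3" 4 "rep 4" 5 "rep 5"
* attach it to the variable
label values rep78 replab
* check the mapping
label list replab
* the category axis now shows "rep 1" ... "rep 5" instead of 1 ... 5
graph box mpg, over(rep78)
```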


Stem and Leaf
• stem c18

Stem-and-leaf plot for c18ceramide (C18 ceramide)

c18ceramide rounded to nearest multiple of .1
plot in units of .1

  0** | 42,43,44,46
  0** | 57,57,67,81,89,90,96,98,99,99
  1** | 01,06,08,08,14,15,19,20,35,44
  1** | 62
  2** | 03,15,16,18,19,19,22
  2** | 82
  3** | 17
  3** |
  4** | 23,49
  4** | 58,68,68
  5** |
  5** |
  6** | 37
  6** | 86
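To read the display above: because the plot is in units of .1, the entry 0** | 42 means 4.2 and the last entry, 6** | 86, means 68.6. Each stem also appears on two lines, the first holding leaves 00 to 49 and the second leaves 50 to 99. The sketch below shows the stem options that control this layout, on the bundled auto data.

```stata
sysuse auto, clear
* let Stata choose the layout
stem mpg
* two lines per stem: leaves 0-4 on the first, 5-9 on the second
stem mpg, lines(2)
* round price to the nearest 100 before splitting it into stems and leaves
stem price, round(100)
```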


Dotplot

• Excellent way to show data across groups when you have a relatively small dataset

• dotplot y, over(group)

dotplot c18, over(cycle)
dotplot c18, over(gender)
dotplot c18, over(gender) nogroup
dotplot c18, over(gender) nogroup jitter(3)
dotplot c18, over(gender) nogroup median center


Dotplot, by gender

[Figure: dot plot of C18 ceramide by gender (f, m)]


Scatterplots
• Two-way graph
• Syntax:
  – graph twoway scatter y1 y2 x (the last variable is always x; each variable before it is plotted against x)
  – graph twoway scatter y x

• Example:
  – graph twoway scatter c18 totalceramide

[Figure: scatter plot of C18 ceramide vs. total ceramide levels]
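The sketch below shows the variable-order rule on the bundled auto data: the last variable listed becomes the x axis, and every variable before it is plotted against it.

```stata
sysuse auto, clear
* one y (mpg) against one x (weight)
twoway scatter mpg weight
* two y variables (mpg and turn) against the same x (weight)
twoway scatter mpg turn weight
```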


Regression example

• Scatterplot
• Residual plots
• Leverage
• Fitted line with raw data


Code
graph twoway scatter c18 totalcer
regress c18 totalcer

* residual plot (residual vs. fitted)
rvfplot

* the long way
* 1. generate a new variable from the regression: residuals
predict resid, residuals
* 2. generate a new variable from the regression: fitted values
predict fit
scatter resid fit, yline(0)

* leverage vs. residual plot
lvr2plot

* take a transform of C18?
gladder c18
boxcox c18

* generate new variable
gen logc18 = log(c18)
scatter logc18 totalcer
scatter logc18 totalcer, mlabel(gender)
scatter logc18 totalcer, mlabel(gender) msymbol(i)
scatter logc18 totalcer, msymbol(Oh)

* redo regression
regress logc18 totalcer
rvfplot, yline(0)
lvr2plot
predict logfit

* make plot of fitted model and raw data
* (sort on x first so the fitted line is drawn left to right)
sort totalcer
scatter logfit logc18 totalcer
scatter logfit logc18 totalcer, msymbol(i o) connect(l .)
graph twoway scatter logfit totalcer, msymbol(i) connect(l) || scatter logc18 totalcer, msymbol(o)
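Here is a minimal sketch of the same workflow on Stata's bundled auto data. Price, a right-skewed variable, stands in for c18, and weight stands in for totalcer; these substitutions are only for illustration.

```stata
sysuse auto, clear
regress price weight
* residuals vs. fitted values, with a reference line at 0
rvfplot, yline(0)
* leverage vs. squared normalized residuals
lvr2plot
* check candidate transformations of the outcome
gladder price
* refit on the log scale
gen logprice = log(price)
regress logprice weight
predict logfit                 // linear prediction (xb) is the default
* sort on x so the fitted line is drawn left to right, not zig-zagging
sort weight
twoway (scatter logprice weight) (line logfit weight)
```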


The next graph to create


Fancier way to put regression lines
infile str14 country setting effort change ///
    using http://data.princeton.edu/wws509/datasets/effort.raw

graph twoway scatter change setting
graph twoway (scatter change setting) (lfit change setting)
graph twoway (scatter change setting) (qfit change setting)
graph twoway (scatter change setting) (lfitci change setting)

• scatter makes a scatterplot of the two variables

• lfit plots the regression line of y on x

• qfit plots a fitted quadratic model of y on x

• lfitci plots the line AND a confidence interval!
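The sketch below draws the same overlays on the bundled auto data, with mpg and weight standing in for change and setting. Listing lfitci first draws the shaded band underneath, so the points stay visible on top of it.

```stata
sysuse auto, clear
* confidence band first, raw points on top
twoway (lfitci mpg weight) (scatter mpg weight)
* linear and quadratic fits on the same scatter
twoway (scatter mpg weight) (lfit mpg weight) (qfit mpg weight)
```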


[Figure: change vs. setting in two panels: "Plot using qfit" (change, fitted values) and "Plot using lfitci" (change, fitted values, 95% CI)]


[Figure: change vs. setting with the lfitci fit and 95% CI, each point labeled by country (Bolivia through Venezuela)]

• One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua).

• We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option.

• We create a variable to hold the position set by default to 3 o'clock and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock).

graph twoway (lfitci change setting) (scatter change setting, mlabel(country) )


gen pos = 3
replace pos = 11 if country == "TrinidadTobago"
replace pos = 9 if country == "CostaRica"
replace pos = 2 if country == "Panama" | country == "Nicaragua"

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos))

[Figure: the same labeled plot, with label positions set by pos so that the overlapping labels are separated]
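The sketch below applies the clock-position idea to the bundled auto data. The rules for pos are arbitrary and only demonstrate the mechanism. mlabvposition() is the full name of the mlabv() option used above.

```stata
sysuse auto, clear
* one label position per observation, as clock hours
gen pos = 3                      // default: right of the marker
replace pos = 9 if foreign       // foreign cars: left of the marker
replace pos = 12 if mpg > 25     // high-mpg cars: above the marker
* label only the pricier cars so the plot stays readable
twoway scatter mpg weight if price > 10000, ///
    mlabel(make) mlabvposition(pos)
```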


Legends

[Figure: two versions of the labeled plot, titled "Fertility Decline by Social Setting" with y axis "Fertility Decline": one with a legend inside the plot region (linear fit, 95% CI), one with no legend]

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos)) ///
    , title("Fertility Decline by Social Setting") ///
    ytitle("Fertility Decline") ///
    legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos)) ///
    , title("Fertility Decline by Social Setting") ///
    ytitle("Fertility Decline") ///
    legend(off)
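The sketch below builds a similar legend on the bundled auto data. In lfitci, legend key 1 is the CI band and key 2 is the fitted line, so a scatter added after it becomes key 3. The title and the "observed" key label are made up for the example.

```stata
sysuse auto, clear
* lfitci contributes keys 1 (CI band) and 2 (fitted line); scatter is key 3
twoway (lfitci mpg weight) (scatter mpg weight), ///
    title("Mileage by Weight") ytitle("Mileage (mpg)") ///
    legend(ring(0) pos(1) cols(1) ///
        order(2 "linear fit" 1 "95% CI" 3 "observed"))
```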


Spaghetti plots
Command available from UCLA: spagplot

* spaghetti plots
clear
insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv"
findit spagplot
spagplot c18 cycle, id(patient)
spagplot c18 cycle, id(patient) nofit

* exclude patients who only have cycle 1 (a single visit)
sort patient cycle
by patient: gen visit = _n
egen maxvis = max(visit), by(patient)
spagplot c18 cycle if maxvis>1, id(patient) nofit

* or, use c(L)
graph twoway scatter c18 cycle if maxvis>1, c(L)
help connectstyle
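The sketch below shows the c(L) approach on Stata's bundled bplong data, which holds fictional blood-pressure readings; patient, when and bp stand in for patient, cycle and c18. Sorting by patient and time matters because c(L) only connects a point to the next one when x increases. The line therefore breaks each time a new patient starts.

```stata
sysuse bplong, clear
* number each patient's visits in time order
bysort patient (when): gen visit = _n
* largest visit number per patient
egen maxvis = max(visit), by(patient)
* restore patient/time order before drawing the connected lines
sort patient when
* one connected line per patient, skipping single-visit patients
twoway scatter bp when if maxvis > 1, c(L)
```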


Other neat stuff

• graph matrix
• saving graphs: click and save as desired format
• saving and combining (see Princeton site, section 3.3)
  – http://data.princeton.edu/stata/graphics.html

• See GraphExamples on UCLA site:
  – http://www.ats.ucla.edu/stat/stata/library/GraphExamples/
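The sketch below covers graph matrix and the save-and-combine workflow on the bundled auto data. The graph names and file names are placeholders.

```stata
sysuse auto, clear
* scatterplot matrix: every pair of the listed variables
graph matrix price mpg weight
* name() keeps each graph in memory so it can be combined later
histogram mpg, name(hist_mpg, replace)
graph box mpg, name(box_mpg, replace)
graph combine hist_mpg box_mpg, name(both, replace)
* save in Stata's .gph format (reopen later with graph use)
graph save both "mpg_panels.gph", replace
* export a copy for documents; the format follows the file extension
graph export "mpg_panels.png", replace
```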