Introduction to STATA for Clinical Researchers
Jay Bhattacharya
August 2007

What is STATA?
A general-purpose statistical analysis package used by
– epidemiologists, demographers, clinical researchers, social scientists, and many others
A tool for displaying data graphically
– Good for data exploration
– Also good for publishing in journals

Why STATA?
Easy to learn
Powerful
It will help you produce papers

Anatomy of a Clinical Research Project
Collect (the data)
Clean
Explore
Analyze
Submit (for publication)
Revise

Collect the Data
STATA is good for analyzing
– large secondary databases
– smaller home-grown data
Store the data as a relational database (or maybe as a spreadsheet)
– It's easy to convert to STATA format from SAS and other formats

Clean the Data
Merge in other sources of data
– STATA does merges of all types, including match merge, table lookup, and more complicated merging
Recode variables
Hunt for outliers
Apply inclusion/exclusion criteria
Treat missing values consistently
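
The slide lists the cleaning steps without showing any syntax. As a rough illustration only, here is a minimal Python sketch (not STATA code) of the same sequence: a table-lookup merge, a recode, a consistent missing-value rule, an inclusion criterion, and an outlier check. The records, field names, age cutoff, and plausible height range are all made up for the example.

```python
# Hypothetical raw records; None marks a missing measurement.
raw = [
    {"id": 1, "sex": "M", "age": 34, "height_cm": 178.0, "weight_kg": 82.0},
    {"id": 2, "sex": "F", "age": 15, "height_cm": 160.0, "weight_kg": 51.0},
    {"id": 3, "sex": "F", "age": 47, "height_cm": None, "weight_kg": 66.0},
    {"id": 4, "sex": "M", "age": 52, "height_cm": 17.5, "weight_kg": 90.0},
    {"id": 5, "sex": "F", "age": 29, "height_cm": 165.0, "weight_kg": 58.0},
]

# Hypothetical second data source, keyed on the same id (table lookup).
site_lookup = {1: "Clinic A", 2: "Clinic B", 3: "Clinic A", 5: "Clinic B"}


def clean(records, sites, min_age=18, height_range=(100.0, 230.0)):
    """Return (kept_rows, dropped) where dropped maps id -> reason."""
    kept, dropped = [], {}
    for rec in records:
        row = dict(rec)                                 # leave the raw data untouched
        row["site"] = sites.get(row["id"])              # merge; None when unmatched
        row["female"] = 1 if row["sex"] == "F" else 0   # recode sex as a 0/1 indicator
        if row["height_cm"] is None or row["weight_kg"] is None:
            dropped[row["id"]] = "missing measurement"  # one rule for all missing values
        elif row["age"] < min_age:
            dropped[row["id"]] = "excluded: under minimum age"
        elif not (height_range[0] <= row["height_cm"] <= height_range[1]):
            dropped[row["id"]] = "outlier: implausible height"
        else:
            kept.append(row)
    return kept, dropped


kept, dropped = clean(raw, site_lookup)
print([row["id"] for row in kept])
print(dropped)
```

Recording a reason for every dropped record, rather than silently filtering, makes it easier to report how the inclusion/exclusion criteria shaped the final sample.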

Explore the Data
Make a data codebook
Examine univariate statistics
– mean, standard deviation, percentiles
Explore bivariate relationships
– correlations, conditional means, etc.
Examine the data graphically
– STATA has powerful graphics capabilities (with a simple GUI)
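
To make the univariate and bivariate steps concrete, here is a small Python sketch (again not STATA code) that computes a mean, standard deviation, quartiles, a Pearson correlation, and conditional means by group. The five records and the field names are invented for illustration, and the `pearson` helper is written out by hand rather than taken from any library.

```python
import statistics
from collections import defaultdict

# Hypothetical cleaned records (made-up values).
rows = [
    {"sex": "M", "height_cm": 180.0, "weight_kg": 85.0},
    {"sex": "M", "height_cm": 170.0, "weight_kg": 70.0},
    {"sex": "F", "height_cm": 160.0, "weight_kg": 55.0},
    {"sex": "F", "height_cm": 165.0, "weight_kg": 62.0},
    {"sex": "F", "height_cm": 155.0, "weight_kg": 50.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


heights = [r["height_cm"] for r in rows]
weights = [r["weight_kg"] for r in rows]

# Univariate statistics: mean, sample standard deviation, quartile cut points.
mean_height = statistics.mean(heights)
sd_height = statistics.stdev(heights)
quartiles = statistics.quantiles(heights, n=4)

# Bivariate relationships: correlation and conditional means by sex.
r_height_weight = pearson(heights, weights)
by_sex = defaultdict(list)
for r in rows:
    by_sex[r["sex"]].append(r["weight_kg"])
mean_weight_by_sex = {sex: statistics.mean(ws) for sex, ws in by_sex.items()}

print(mean_height, sd_height, quartiles)
print(r_height_weight, mean_weight_by_sex)
```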

Analyze the Data
STATA is a powerful all-purpose statistical package with most common statistical computations built in
STATA is extensible for uncommon statistical computations
– You can share the tools you develop with the rest of the STATA community
– Built-in and user-written commands have a common interface
– The STATA community is vibrant and helpful

Built-In Commands
Linear models (ANOVA, regressions)
Nonlinear models (logit, Poisson regression)
Failure time models (KM curves, Cox models)
Time-series models
R-like matrix processing tools
Bootstrap
Robust statistics
– Standard error corrections for clustering
– Accounting for complex survey design
Powerful and easy-to-use macro language to automate commands

Submit for Publication
With STATA, you can make a wide variety of publishable-quality graphs
You can automatically generate tables of results that are easy to edit in your favorite word processor
– These are commands added to STATA by the user community
– LaTeX support

Revise
STATA has a nice, intuitive GUI for interactive data exploration
– Don't use it too much!
STATA commands can be stored in a text (.do) file, edited, and re-run

An Example
Body mass index is weight (kg) divided by height (m) squared
Why squared?
– Presumably to make BMI independent of height: BMI should mean the same thing for a short man and a tall woman
But does it?
– And is the triceps skinfold test height independent?
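
One way to make the "why squared?" question concrete is to suppose, purely as an illustrative assumption that is not stated in the slides, that typical weight $W$ scales with height $H$ as a power law with some constant $c$ and exponent $p$. Under that assumption:

```latex
\[
  \mathrm{BMI} \;=\; \frac{W}{H^{2}},
  \qquad
  W = c\,H^{p}
  \;\;\Longrightarrow\;\;
  \mathrm{BMI} \;=\; c\,H^{\,p-2}.
\]
```

So BMI would be independent of height only if $p = 2$. If $p > 2$, BMI would rise with height; if $p < 2$, it would fall. Which case holds is an empirical question about the data.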

NHANES data
National Health and Nutrition Examination Survey (NHANES)
– 2001-2002 edition
Publicly available version can be downloaded from the National Center for Health Statistics
– Includes anthropometric measurements
– Plus lots of other covariates
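
With height and weight measurements in hand, one possible check of height independence is to regress log weight on log height: if weight is roughly proportional to a power of height, the slope estimates that power, and a slope near 2 would be consistent with dividing by height squared. The Python sketch below (not STATA code) shows the calculation on synthetic values constructed to follow weight = 22 × height² exactly. The numbers are not NHANES data, and the helper names `bmi` and `loglog_slope` are hypothetical.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def loglog_slope(heights_m, weights_kg):
    """Least-squares slope of log(weight) on log(height)."""
    xs = [math.log(h) for h in heights_m]
    ys = [math.log(w) for w in weights_kg]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data (an assumption for illustration): weight = 22 * height^2.
heights = [1.50, 1.60, 1.70, 1.80, 1.90]
weights = [22.0 * h ** 2 for h in heights]

slope = loglog_slope(heights, weights)
bmis = [bmi(w, h) for w, h in zip(weights, heights)]

print(round(slope, 6))
print([round(b, 6) for b in bmis])
```

Because the synthetic weights were built with an exponent of exactly 2, BMI comes out the same at every height here; real survey data would not be this tidy, and a proper analysis would also account for the survey design.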

Comparing SAS and STATA
Pro:
– STATA is easier to learn and at least as powerful
– STATA is substantially cheaper
– STATA tends to be faster
– STATA has better help facilities
Con:
– "Live" data management and report generation are easier with SAS
– Simple analyses with datasets larger than memory are possible with SAS