Introduction to STATA for Clinical Researchers Jay Bhattacharya August 2007

What is STATA?

A general-purpose statistical analysis package used by
  – epidemiologists, demographers, clinical researchers, social scientists, many others
Tool to graphically display data
  – Good for data exploration
  – Also good for publishing in journals

Why STATA?

Easy to learn
Powerful
It will help you produce papers

Anatomy of a Clinical Research Project

Collect (the data)
Clean
Explore
Analyze
Submit (for publication)
Revise

Collect the Data

STATA is good for analyzing
  – large secondary databases
  – smaller home-grown data
Store the data as a relational database (or maybe as a spreadsheet)
  – It’s easy to convert to STATA format from SAS and other formats

Clean the Data

Merge in other sources of data
  – STATA does merges of all types, including match merge, table lookup, and more complicated merging
Recode variables
Hunt for outliers
Apply inclusion/exclusion criteria
Treat missing values consistently
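
The cleaning steps on this slide might look like the following in a .do file. This is a minimal sketch, not the author's code. It uses the auto dataset that ships with STATA as a stand-in for clinical data. The variable name good_repair and the 1-3 versus 4-5 grouping are made up for illustration, and the merge 1:1 syntax assumes STATA 11 or later, which is newer than this 2007 deck.

```stata
* Stand-in data: the auto dataset shipped with STATA (not clinical data)
sysuse auto, clear

* Build a second "source" to merge in: car make plus price
preserve
keep make price
tempfile prices
save `prices'
restore
drop price

* Match merge on the key variable make (1:1 syntax assumes STATA 11+)
merge 1:1 make using `prices'
* Check how many records matched, then drop the bookkeeping variable
tabulate _merge
drop _merge

* Recode the 1-5 repair record into two levels (illustrative grouping)
recode rep78 (1/3 = 0) (4/5 = 1), generate(good_repair)

* Hunt for outliers: list cars priced above the 99th percentile
summarize price, detail
list make price if price > r(p99) & !missing(price)

* Apply an exclusion criterion that handles missing values explicitly
drop if missing(rep78)
```

STATA sorts a numeric missing value above every number, so a filter such as price > r(p99) would also pick up missing prices. That is why the sketch excludes them explicitly.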

Explore the Data

Make a data codebook
Examine univariate statistics
  – mean, standard deviation, percentiles
Explore bivariate relationships
  – correlations, conditional means, etc.
Examine the data graphically
  – STATA has powerful graphics capabilities (with a simple GUI)
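
Each exploration step on this slide maps onto a short command. The sketch below is illustrative only and assumes the auto dataset shipped with STATA rather than any data from this talk.

```stata
* Stand-in data: the auto dataset shipped with STATA
sysuse auto, clear

* Data codebook: type, range, missing counts, and labels per variable
codebook price mpg foreign

* Univariate statistics: mean, standard deviation, percentiles
summarize price mpg, detail

* Bivariate relationships: correlations and conditional means
correlate price mpg weight
tabulate foreign, summarize(price)

* Graphical examination
scatter price mpg
histogram price
```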

Analyze the Data

STATA is a powerful all-purpose statistical package with most common statistical computations built in
STATA is extensible for uncommon statistical computations
  – You can share the tools you develop with the rest of the STATA community
  – Built-in and user-written commands have a common interface
  – The STATA community is vibrant and helpful

Built-In Commands

Linear models (ANOVA, regressions)
Nonlinear models (logit, Poisson regression)
Failure-time models (KM curves, Cox models)
Time-series models
R-like matrix processing tools
Bootstrap
Robust statistics
  – Standard error corrections for clustering
  – Accounting for complex survey design
Powerful and easy-to-use macro language to automate commands
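
A few of the built-in commands listed on this slide share one calling pattern: command, dependent variable, covariates, then options. The sketch below assumes the auto dataset shipped with STATA. The model specifications are chosen only to show syntax, and clustering on rep78 has no substantive meaning here. The vce(cluster ...) option assumes STATA 10 or later.

```stata
* Stand-in data: the auto dataset shipped with STATA
sysuse auto, clear

* Linear model
regress price mpg weight

* Nonlinear model: logit for a binary outcome
logit foreign mpg weight

* Standard errors corrected for clustering (syntax demo only)
regress price mpg weight, vce(cluster rep78)

* Bootstrap standard errors for the same regression
bootstrap, reps(200) seed(12345): regress price mpg weight

* Macro language: loop one command over several variables
foreach v of varlist price mpg weight {
    summarize `v'
}
```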

Submit for Publication

With STATA, you can make a wide variety of publishable-quality graphs
You can automatically generate tables of results that are easy to edit in your favorite word processor
  – These are commands added to STATA by the user community
  – LaTeX support

Revise

STATA has a nice, intuitive GUI for interactive data exploration
  – Don’t use it too much!
STATA commands can be stored in a text (.do) file, edited, and re-run
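
A .do file is plain text holding the same commands you would type interactively. The sketch below shows one possible layout. The file names analysis.do and analysis.log are hypothetical, and the auto dataset shipped with STATA stands in for real study data.

```stata
* analysis.do: hypothetical file name for this sketch

* Close any log left open by an earlier run, then start a fresh text log
capture log close
log using analysis.log, replace text

* Stand-in data: the auto dataset shipped with STATA
sysuse auto, clear

* The analysis itself; edit these lines and re-run when a reviewer asks
summarize price mpg
regress price mpg weight

log close
```

Typing do analysis.do in the command window re-runs every step, so a revision means editing one line rather than repeating a sequence of menu clicks.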

An Example

Body mass index is weight (kg) divided by height (m) squared
Why squared?
  – Presumably to make BMI independent of height: BMI should mean the same thing for a short man and a tall woman
But does it?
  – And is the triceps skinfold test height independent?
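
One way to put the "But does it?" question to data: if weight scales as height to some power p, then BMI scales as height to the power p - 2, so BMI is height independent only when p = 2. The sketch below estimates p by regressing log weight on log height. It is an illustration under stated assumptions, not the analysis from this talk. It assumes STATA's downloadable example dataset nhanes2, which comes from the earlier NHANES II survey rather than the 2001-2 edition, and it assumes that dataset records height in cm and weight in kg. The names bmi_check, ln_weight, and ln_height are made up here. The triceps skinfold question is left out because the sketch does not assume that measurement is in the stand-in data.

```stata
* Stand-in data: STATA's example extract from NHANES II (not 2001-2)
webuse nhanes2, clear

* BMI = weight (kg) / height (m)^2; height is assumed to be in cm
generate bmi_check = weight / (height/100)^2

* If BMI were independent of height, this correlation would be near zero
correlate bmi_check height

* If weight scales as height^p, the slope on log height estimates p
generate ln_weight = ln(weight)
generate ln_height = ln(height)
regress ln_weight ln_height

* BMI's squared denominator corresponds to p = 2
test ln_height = 2
```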

NHANES data

National Health and Nutrition Examination Survey (NHANES)
  – 2001-2 edition
Publicly available version can be downloaded from the National Center for Health Statistics
  – Includes anthropometric measurements
  – Plus lots of other covariates

Comparing SAS and STATA

Pro:
  – STATA is easier to learn and at least as powerful
  – STATA is substantially cheaper
  – STATA tends to be faster
  – STATA has better help facilities
Con:
  – “Live” data management and report generation are easier with SAS
  – Simple analyses with datasets larger than memory are possible with SAS