Using AMRS Data in Research September 15, 2008 Beverly Musick Indiana University, Division of Biostatistics


Using AMRS Data in Research

September 15, 2008

Beverly Musick

Indiana University, Division of Biostatistics

Using AMRS Data in Research

• Extracting data from MySQL database
• Conversion to SAS datasets
• Processing and cleaning of data for specific research question
• Preliminary analysis
• Example

Extracting data from AMRS (MySQL database)

• Mysqldump command
– C:\>"c:\Program Files\Mysql\Mysql Server 5.0\bin\mysqldump" -u root -p amrs concept > database-dump.sql

• OpenMRS reporting tool
– Typically generates Excel spreadsheets

• ODBC connection to SAS
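When only a handful of concepts are needed, one alternative to copying whole tables over ODBC is SQL pass-through, which lets MySQL do the filtering before any rows reach SAS. The sketch below is an illustration rather than the method used in these slides. It assumes the same ODBC data source name (amrsEldoret) that appears in this deck's LIBNAME statement, two hypothetical macro variables (dbuser, dbpass) for the credentials, and that voided observations should be excluded.

```sas
/* Hypothetical placeholders: replace with real credentials. */
%let dbuser = your_user ;
%let dbpass = your_password ;

proc sql ;
  /* Open a pass-through connection to the ODBC data source. */
  connect to odbc as amrs
    (datasrc=amrsEldoret user=&dbuser password=&dbpass) ;

  /* The inner query runs on the MySQL server, so only the
     weight observations (concept_id 5089) are transferred.
     Excluding voided rows is an assumption of this sketch. */
  create table work.weight_obs as
    select *
      from connection to amrs
        ( select obs_id, person_id, encounter_id,
                 obs_datetime, value_numeric
            from obs
           where concept_id = 5089
             and voided = 0 ) ;

  disconnect from amrs ;
quit ;
```

The trade-off is that each research question needs its own query, whereas the full-table copy shown in this deck produces permanent datasets that can be reused.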

SAS Code for Creating SAS Datasets from AMRS

** permamrsKE.SAS ***
* This program creates permanent SAS datasets from AMRS SQL database
* 6/28/06 BSM
******************************************************************** ;
libname k 'c:\Kenya\HIV\DataMigration' ;
footnote 'c:\kenya\hiv\SASCODE\permamrsKE.sas' ;

libname m odbc user=jsmith password=B*24c9dl database=amrsEldoret ;

data k.AMRSusers ; set m.users ; run ;
data k.AMRSpatient ; set m.patient ; run ;
data k.AMRSpatient_identifier ; set m.patient_identifier ;
data k.AMRSpatient_identifier_type ; set m.patient_identifier_type ;
data k.AMRSpatient_name ; set m.person_name ;
data k.AMRSencounter ; set m.encounter ; run ;
data k.AMRSencounter_type ; set m.encounter_type ;
data k.AMRSlocation ; set m.location ;

Extracting the OBS Table

data k.obs1a ;
  set m.obs(keep=obs_id person_id concept_id encounter_id order_id obs_datetime
                 location_id obs_group_id accession_number value_group_id
                 value_boolean value_coded value_drug value_datetime
                 value_numeric value_modifier
            obs=3000000) ;
run ;
data k.obs1b ;
  set m.obs(keep=obs_id person_id concept_id encounter_id order_id obs_datetime
                 location_id obs_group_id accession_number value_group_id
                 value_boolean value_coded value_drug value_datetime
                 value_numeric value_modifier
            firstobs=3000001 obs=6100000) ;
run ;
…
data k.obs1y ;
  set m.obs(keep=obs_id person_id concept_id encounter_id order_id obs_datetime
                 location_id obs_group_id accession_number value_group_id
                 value_boolean value_coded value_drug value_datetime
                 value_numeric value_modifier
            firstobs=14500001 obs=15000000) ;
run ;

data k.obs2a ;
  set m.obs(keep=obs_id value_text date_started date_stopped comments creator
                 date_created voided voided_by date_voided void_reason
            obs=3000000) ;
run ;
data k.obs2b ;
  set m.obs(keep=obs_id value_text date_started date_stopped comments creator
                 date_created voided voided_by date_voided void_reason
            firstobs=3000001 obs=6000000) ;
run ;
…
data k.obs2y ;
  set m.obs(keep=obs_id value_text date_started date_stopped comments creator
                 date_created voided voided_by date_voided void_reason
            firstobs=14500001 obs=15000000) ;
run ;
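The repeated data steps above differ only in their FIRSTOBS= and OBS= values, so the same pattern could be generated with a macro loop. The following is a minimal sketch under stated assumptions, not the program used for AMRS: the macro name extract_obs, its parameters, the fixed chunk size, and the numbered output names (k.obs1_1, k.obs1_2, ...) are all hypothetical, and the librefs k and m are assumed to be assigned as in permamrsKE.sas. The original chunk boundaries were not all the same size, so this is not an exact reproduction.

```sas
%macro extract_obs(chunk=3000000, total=15000000) ;
  %local i first last ;
  %let i = 1 ;
  /* Step through the table in blocks of &chunk rows. */
  %do first = 1 %to &total %by &chunk ;
    /* Last row of this block, capped at the table size. */
    %let last = %sysfunc(min(%eval(&first + &chunk - 1), &total)) ;
    data k.obs1_&i ;
      set m.obs(keep=obs_id person_id concept_id encounter_id order_id
                     obs_datetime location_id obs_group_id accession_number
                     value_group_id value_boolean value_coded value_drug
                     value_datetime value_numeric value_modifier
                firstobs=&first obs=&last) ;
    run ;
    %let i = %eval(&i + 1) ;
  %end ;
%mend extract_obs ;

/* With the defaults this writes k.obs1_1 through k.obs1_5. */
%extract_obs()
```

The pieces could then be concatenated with a name-prefix list such as `set k.obs1_: ;` instead of naming every dataset.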

Extracting the OBS Table (cont.)

data obs1 ;
  set k.obs1a k.obs1b k.obs1c k.obs1d k.obs1e k.obs1f k.obs1g k.obs1h k.obs1i
      k.obs1j k.obs1k k.obs1l k.obs1m k.obs1n k.obs1o k.obs1p k.obs1q k.obs1r
      k.obs1s k.obs1t k.obs1u k.obs1v k.obs1w k.obs1x k.obs1y ;
run ;
proc sort data=obs1 ;
  by obs_id person_id ;
data obs1a ;
  set obs1 ;
  by obs_id person_id ;
  if last.obs_id ;
run ;

data obs2 ;
  set k.obs2a k.obs2b k.obs2c k.obs2d k.obs2e k.obs2f k.obs2g k.obs2h k.obs2i
      k.obs2j k.obs2k k.obs2l k.obs2m k.obs2n k.obs2o k.obs2p k.obs2q k.obs2r
      k.obs2s k.obs2t k.obs2u k.obs2v k.obs2w k.obs2x k.obs2y ;
run ;
proc sort data=obs2 ;
  by obs_id ;
data obs2a ;
  set obs2 ;
  by obs_id ;
  if last.obs_id ;
run ;

proc sort data=obs1a ;
  by obs_id ;
proc sort data=obs2a ;
  by obs_id ;

data k.AMRSobs ;
  merge obs1a(in=o) obs2a ;
  by obs_id ;
  if o ;
run ;
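The sort followed by `if last.obs_id` keeps one record per obs_id, which guards against rows that land in two extraction chunks. A shorter way to get a similar result is PROC SORT with the NODUPKEY option, sketched below on a small made-up dataset (obs_dup and obs_nodup are hypothetical names; the rows reuse values from this deck's sample OBS listing). This is an alternative technique, not the one used in the slides.

```sas
/* Tiny illustrative input: obs_id 2005028 appears twice. */
data obs_dup ;
  input obs_id person_id concept_id ;
  datalines ;
2005027 1826 5282
2005028 1826 6096
2005028 1826 6096
;
run ;

/* NODUPKEY writes only the first record of each BY group. */
proc sort data=obs_dup out=obs_nodup nodupkey ;
  by obs_id ;
run ;

proc print data=obs_nodup ;
run ;
```

Note the difference in which record survives: NODUPKEY keeps the first record for each obs_id, while `if last.obs_id` keeps the last. When the duplicates are identical rows, as they would be for overlapping chunks, the two approaches give the same dataset.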

Conversion to Master SAS datasets

• Once the data have been extracted from AMRS, we use HIVcombine.sas to merge them with data from the old ACCESS DB and create the Master SAS datasets
– HIVDEMOG2.sas7bdat (stores the cross-sectional and demographic data)
– HIVVISIT2.sas7bdat (stores the longitudinal data, which comes mostly from the follow-up visits)

obs_id   person_id  encounter_id  obs_datetime          location_id  concept_id  value_coded  value_numeric  value_datetime        creator
2005027  1826       53028         23JUN2006:00:00:00.0  14           5282        1139         .              .                     150
2005028  1826       53028         23JUN2006:00:00:00.0  14           6096        1066         .              .                     150
2005031  1826       53028         23JUN2006:00:00:00.0  14           1154        .            1.0            .                     150
2005032  1826       53028         23JUN2006:00:00:00.0  14           1192        .            0.0            .                     150
2005033  1826       53028         23JUN2006:00:00:00.0  14           1156        1066         .              .                     150
2005034  1826       53028         23JUN2006:00:00:00.0  14           5092        .            94.0           .                     150
2005038  1826       53028         23JUN2006:00:00:00.0  14           5088        .            36.3           .                     150
2005039  1826       53028         23JUN2006:00:00:00.0  14           5089        .            57.5           .                     150
2005040  1826       53028         23JUN2006:00:00:00.0  14           5356        1204         .              .                     150
2005041  1826       53028         23JUN2006:00:00:00.0  14           1248        .            0.0            .                     150
2005042  1826       53028         23JUN2006:00:00:00.0  14           5096        .            .              28JUN2006:00:00:00.0  150
2005043  1826       53028         23JUN2006:00:00:00.0  14           6042        197          .              .                     150
2005044  1826       53028         23JUN2006:00:00:00.0  14           1112        1107         .              .                     150
2005054  28411      53029         19JUN2006:00:00:00.0  2            5282        1139         .              .                     152
2005055  28411      53029         19JUN2006:00:00:00.0  2            5096        .            .              04JUL2006:00:00:00.0  152
2005056  28411      53029         19JUN2006:00:00:00.0  2            5605        1052         .              .                     152
2005057  28411      53029         19JUN2006:00:00:00.0  2            5606        1065         .              .                     152

Conversion to Master SAS datasets

• The process involves selecting the concept of interest (concept_id=5089)
• Creating a specific variable for that concept (weight)
• Assigning the appropriate value (value_coded, value_numeric, etc.) to the newly created variable (weight=value_numeric)
• Merging the records that have the same obs_datetime (or encounter_id)

Example of Converting a Concept into a SAS Variable

• Start with record from OBS table

obs_id   person_id  encounter_id  obs_datetime          location_id  concept_id  value_coded  value_numeric  value_datetime  creator
2005039  1826       53028         23JUN2006:00:00:00.0  14           5089        .            57.5           .               150

• Convert the concept_id to a SAS variable

person_id  encounter_id  apptdate   clinic         weight
1826       53028         23JUN2006  MTRH Module 3  57.5

SAS code to create WEIGHT and merge with other longitudinal data

data wght ;
  set amrsobscon(where=(concept_id = 5089) rename=(obsdate=apptdate)) ;
  wght=value_numeric ;
  ** delete bogus weights ** ;
  if wght gt 120 then delete ;
  keep person_id encounter_id apptdate wght ;
run ;
…
data visit ;
  merge wght hght tbtx… ;
  by person_id apptdate ;
run ;
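The code above builds one dataset per concept and then merges them. For numeric concepts, a similar long-to-wide conversion can be sketched with PROC TRANSPOSE, which turns each concept_id into its own column in a single pass. This is an alternative illustration, not the approach taken in HIVcombine.sas. The dataset names obs_sample and obs_wide and the column prefix c are hypothetical; the three input rows are copied from the sample OBS listing in this deck, and only concept 5089 is identified in the slides (as weight).

```sas
/* Three numeric observations from one encounter. */
data obs_sample ;
  input obs_id person_id encounter_id
        obs_datetime :datetime20. concept_id value_numeric ;
  format obs_datetime datetime20. ;
  datalines ;
2005034 1826 53028 23JUN2006:00:00:00 5092 94.0
2005038 1826 53028 23JUN2006:00:00:00 5088 36.3
2005039 1826 53028 23JUN2006:00:00:00 5089 57.5
;
run ;

/* BY-group processing requires sorted input. */
proc sort data=obs_sample ;
  by person_id encounter_id ;
run ;

/* One output row per encounter; each concept_id becomes a
   column named c<concept_id>. Concept 5089 is renamed to
   weight, matching the slide's example. */
proc transpose data=obs_sample
               out=obs_wide(drop=_name_ rename=(c5089=weight))
               prefix=c ;
  by person_id encounter_id ;
  id concept_id ;
  var value_numeric ;
run ;

proc print data=obs_wide ;
run ;
```

One data step per concept remains the better fit when each concept needs its own cleaning rule, such as the weight cutoff shown above. PROC TRANSPOSE will also stop with an error if the same concept occurs twice within one BY group, so duplicates would need to be resolved first.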

Preliminary Analysis

• Generate frequency tables (PROC FREQ) for
– All categorical data such as gender, WHO stage, yes/no/unknown questions, clinic location, etc.
– Limited-response continuous data such as number of people in household

• Generate means/medians (PROC MEANS) for
– All continuous data such as age, weight, CD4
– Ordinal data such as ‘1=strongly disagree 2=disagree 3=agree 4=strongly agree’

Proc Freq

proc freq data=h.hivdemog2 ;
  title 'AMPATH Demographics' ;
  tables male married traveltime ;
  tables male*married / missing ;
run ;

male  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0     46227      65.45    46227                 65.45
1     24403      34.55    70630                 100.00
Frequency Missing = 77

Married

Married  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0        31029      49.82    31029                 49.82
1        31253      50.18    62282                 100.00
Frequency Missing = 8425

travel time to clinic

TravelTime     Frequency  Percent  Cumulative Frequency  Cumulative Percent
< 30 minutes   15390      27.99    15390                 27.99
30-60 minutes  17214      31.30    32604                 59.29
1-2 hours      13163      23.94    45767                 83.23
> 2 hours      9223       16.77    54990                 100.00
Frequency Missing = 15717

Proc Means

proc means data=h.hivvisit2 n mean std min max median ;
  title 'AMPATH Visit Data' ;
  var age weight cd4 ;
run ;

AMPATH Visit Data

The MEANS Procedure

Variable  Label        N       Mean         Std Dev     Minimum     Maximum      Median
age                    918651  32.6244700   15.2162848  -4.8240931  311.2470910  34.5078713
Weight    weight (kg)  796412  51.9878071   19.1298253  0           120.0000000  55.8000000
cd4                    152475  375.0323837  1672.22     0           536580.00    301.000000
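The PROC MEANS output above contains values that cannot be real measurements: a negative minimum age, a maximum age of about 311, and a maximum CD4 of 536580. A simple range check, in the same spirit as the "delete bogus weights" rule shown earlier, can list such records for review. The sketch below runs on a small made-up dataset (visit_sample) whose extreme values echo the output above. The cutoffs (age 0 to 110, CD4 up to 5000) are illustrative assumptions, not AMPATH rules, and should be set by the investigators.

```sas
/* Illustrative rows; the last two mimic the extremes
   seen in the PROC MEANS output. */
data visit_sample ;
  input age weight cd4 ;
  datalines ;
34.5 55.8 301
-4.8 51.9 375
311.2 120 536580
;
run ;

data flagged ;
  set visit_sample ;
  /* In SAS a missing value compares as less than any number,
     so missing values are excluded explicitly to avoid
     flagging them as out of range. */
  bad_age = (not missing(age)) and (age lt 0 or age gt 110) ;
  bad_cd4 = (not missing(cd4)) and (cd4 gt 5000) ;
  /* Keep only the records that fail at least one check. */
  if bad_age or bad_cd4 ;
run ;

proc print data=flagged ;
run ;
```

Flagging rather than deleting keeps the suspect records available for checking against the source encounter forms before a decision is made.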

Processing and cleaning of data for specific research question

• NVP Toxicity datasets