Using AMRS Data in Research
September 15, 2008
Beverly Musick
Indiana University, Division of Biostatistics
Using AMRS Data in Research
• Extracting data from MySQL database
• Conversion to SAS datasets
• Processing and cleaning of data for specific research question
• Preliminary analysis
• Example
Extracting data from AMRS (MySQL database)
• Mysqldump command
  – C:\>"c:\Program Files\Mysql\Mysql Server 5.0\bin\mysqldump" -u root -p amrs concept > database-dump.sql
• OpenMRS reporting tool
  – Typically generates Excel Spreadsheets
• ODBC connection to SAS
SAS Code for Creating SAS Datasets from AMRS
** permamrsKE.SAS **
* This program creates permanent SAS datasets from AMRS SQL database
* 6/28/06 BSM
******************************************************************** ;
libname k 'c:\Kenya\HIV\DataMigration' ;
footnote 'c:\kenya\hiv\SASCODE\permamrsKE.sas' ;

libname m odbc user=jsmith password=B*24c9dl database=amrsEldoret ;

data k.AMRSusers ; set m.users ; run ;
data k.AMRSpatient ; set m.patient ; run ;
data k.AMRSpatient_identifier ; set m.patient_identifier ; run ;
data k.AMRSpatient_identifier_type ; set m.patient_identifier_type ; run ;
data k.AMRSpatient_name ; set m.person_name ; run ;
data k.AMRSencounter ; set m.encounter ; run ;
data k.AMRSencounter_type ; set m.encounter_type ; run ;
data k.AMRSlocation ; set m.location ; run ;
Extracting the OBS Table
data k.obs1a ;
  set m.obs(keep=obs_id person_id concept_id encounter_id order_id obs_datetime
                 location_id obs_group_id accession_number value_group_id
                 value_boolean value_coded value_drug value_datetime
                 value_numeric value_modifier
            obs=3000000) ;
run ;
data k.obs1b ;
  set m.obs(keep=obs_id person_id concept_id encounter_id order_id obs_datetime
                 location_id obs_group_id accession_number value_group_id
                 value_boolean value_coded value_drug value_datetime
                 value_numeric value_modifier
            firstobs=3000001 obs=6100000) ;
run ;
…
data k.obs1y ;
  set m.obs(keep=obs_id person_id concept_id encounter_id order_id obs_datetime
                 location_id obs_group_id accession_number value_group_id
                 value_boolean value_coded value_drug value_datetime
                 value_numeric value_modifier
            firstobs=14500001 obs=15000000) ;
run ;

data k.obs2a ;
  set m.obs(keep=obs_id value_text date_started date_stopped comments creator
                 date_created voided voided_by date_voided void_reason
            obs=3000000) ;
run ;
data k.obs2b ;
  set m.obs(keep=obs_id value_text date_started date_stopped comments creator
                 date_created voided voided_by date_voided void_reason
            firstobs=3000001 obs=6000000) ;
run ;
…
data k.obs2y ;
  set m.obs(keep=obs_id value_text date_started date_stopped comments creator
                 date_created voided voided_by date_voided void_reason
            firstobs=14500001 obs=15000000) ;
run ;
Extracting the OBS Table (cont.)
* Stack the first-half pieces and keep one record per obs_id ;
data obs1 ;
  set k.obs1a k.obs1b k.obs1c k.obs1d k.obs1e k.obs1f k.obs1g k.obs1h k.obs1i
      k.obs1j k.obs1k k.obs1l k.obs1m k.obs1n k.obs1o k.obs1p k.obs1q k.obs1r
      k.obs1s k.obs1t k.obs1u k.obs1v k.obs1w k.obs1x k.obs1y ;
run ;
proc sort data=obs1 ; by obs_id person_id ; run ;
data obs1a ; set obs1 ; by obs_id person_id ; if last.obs_id ; run ;

* Same for the second-half pieces ;
data obs2 ;
  set k.obs2a k.obs2b k.obs2c k.obs2d k.obs2e k.obs2f k.obs2g k.obs2h k.obs2i
      k.obs2j k.obs2k k.obs2l k.obs2m k.obs2n k.obs2o k.obs2p k.obs2q k.obs2r
      k.obs2s k.obs2t k.obs2u k.obs2v k.obs2w k.obs2x k.obs2y ;
run ;
proc sort data=obs2 ; by obs_id ; run ;
data obs2a ; set obs2 ; by obs_id ; if last.obs_id ; run ;

* Rejoin the two halves of each observation on obs_id ;
proc sort data=obs1a ; by obs_id ; run ;
proc sort data=obs2a ; by obs_id ; run ;

data k.AMRSobs ; merge obs1a(in=o) obs2a ; by obs_id ; if o ; run ;
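The repeated DATA steps used to pull the OBS table in pieces could also be generated by a macro loop. The following is only a sketch under stated assumptions: the macro name pull_obs, its parameters chunk and nchunks, the output names k.obs1_1, k.obs1_2, and so on, and the uniform chunk size are all hypothetical and differ from the piece boundaries used in the original program. It relies on the m and k librefs defined earlier.

```sas
/* Hypothetical helper: read m.obs in equal-sized row ranges.            */
/* chunk   = number of rows per piece (assumed value, not from AMRS)     */
/* nchunks = number of pieces to create                                  */
%macro pull_obs(chunk=, nchunks=) ;
  %local i first last ;
  %do i = 1 %to &nchunks ;
    /* Row range covered by piece number &i */
    %let first = %eval((&i - 1) * &chunk + 1) ;
    %let last  = %eval(&i * &chunk) ;
    data k.obs1_&i ;
      set m.obs(keep=obs_id person_id concept_id encounter_id order_id
                     obs_datetime location_id obs_group_id accession_number
                     value_group_id value_boolean value_coded value_drug
                     value_datetime value_numeric value_modifier
                firstobs=&first obs=&last) ;
    run ;
  %end ;
%mend pull_obs ;

/* 15 pieces of 1,000,000 rows cover rows 1 through 15,000,000 */
%pull_obs(chunk=1000000, nchunks=15)
```

In SAS 9.2 and later the pieces can then be stacked with a name prefix list, for example set k.obs1_: ; in place of typing every dataset name.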
Conversion to Master SAS datasets
• Once the data have been extracted from AMRS, we use HIVcombine.sas to merge them with data from the old ACCESS DB and create the Master SAS datasets
  – HIVDEMOG2.sas7bdat (stores the cross-sectional and demographic data)
  – HIVVISIT2.sas7bdat (stores the longitudinal data, which come mostly from the follow-up visits)
obs_id   person_id  encounter_id  obs_datetime          location_id  concept_id  value_coded  value_numeric  value_datetime        creator
2005027  1826       53028         23JUN2006:00:00:00.0  14           5282        1139         .              .                     150
2005028  1826       53028         23JUN2006:00:00:00.0  14           6096        1066         .              .                     150
2005031  1826       53028         23JUN2006:00:00:00.0  14           1154        .            1.0            .                     150
2005032  1826       53028         23JUN2006:00:00:00.0  14           1192        .            0.0            .                     150
2005033  1826       53028         23JUN2006:00:00:00.0  14           1156        1066         .              .                     150
2005034  1826       53028         23JUN2006:00:00:00.0  14           5092        .            94.0           .                     150
2005038  1826       53028         23JUN2006:00:00:00.0  14           5088        .            36.3           .                     150
2005039  1826       53028         23JUN2006:00:00:00.0  14           5089        .            57.5           .                     150
2005040  1826       53028         23JUN2006:00:00:00.0  14           5356        1204         .              .                     150
2005041  1826       53028         23JUN2006:00:00:00.0  14           1248        .            0.0            .                     150
2005042  1826       53028         23JUN2006:00:00:00.0  14           5096        .            .              28JUN2006:00:00:00.0  150
2005043  1826       53028         23JUN2006:00:00:00.0  14           6042        197          .              .                     150
2005044  1826       53028         23JUN2006:00:00:00.0  14           1112        1107         .              .                     150
2005054  28411      53029         19JUN2006:00:00:00.0  2            5282        1139         .              .                     152
2005055  28411      53029         19JUN2006:00:00:00.0  2            5096        .            .              04JUL2006:00:00:00.0  152
2005056  28411      53029         19JUN2006:00:00:00.0  2            5605        1052         .              .                     152
2005057  28411      53029         19JUN2006:00:00:00.0  2            5606        1065         .              .                     152
Conversion to Master SAS datasets
• The process involves selecting the concept of interest (concept_id=5089)
• Creating a specific variable for that concept (weight)
• Assigning the appropriate value (value_coded, value_numeric, etc.) to the newly created variable (weight=value_numeric)
• Merging the records that have the same obs_datetime (or encounter_id)
Example of Converting a Concept into a SAS Variable
• Start with record from OBS table
obs_id   person_id  encounter_id  obs_datetime          location_id  concept_id  value_coded  value_numeric  value_datetime  creator
2005039  1826       53028         23JUN2006:00:00:00.0  14           5089        .            57.5           .               150
• Convert the concept_id to a SAS variable
person_id  encounter_id  apptdate   clinic         weight
1826       53028         23JUN2006  MTRH Module 3  57.5
SAS code to create WEIGHT and merge with other longitudinal data
data wght ;
  set amrsobscon(where=(concept_id = 5089) rename=(obsdate=apptdate)) ;
  wght=value_numeric ;
  ** delete bogus weights ** ;
  if wght gt 120 then delete ;
  keep person_id encounter_id apptdate wght ;
run ;
…
data visit ;
  merge wght hght tbtx… ;
  by person_id apptdate ;
run ;
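When several numeric concepts are needed at once, the one-DATA-step-per-concept pattern above can be replaced by a single PROC TRANSPOSE. The following is a minimal sketch, not the approach used in HIVcombine.sas. It reads three numeric records for encounter 53028 from the sample OBS listing as inline data; the dataset names obs_sample and wide and the c_ prefix are hypothetical, and only concept 5089 (weight) is given a meaningful name because that is the only mapping stated here.

```sas
/* Three numeric observations copied from the sample OBS listing */
data obs_sample ;
  input obs_id person_id encounter_id concept_id value_numeric ;
  datalines ;
2005034 1826 53028 5092 94.0
2005038 1826 53028 5088 36.3
2005039 1826 53028 5089 57.5
;
run ;

/* PROC TRANSPOSE with a BY statement needs sorted input */
proc sort data=obs_sample ;
  by person_id encounter_id ;
run ;

/* One output row per encounter; one column per concept_id (c_5092, c_5088, c_5089) */
proc transpose data=obs_sample out=wide(drop=_name_) prefix=c_ ;
  by person_id encounter_id ;
  id concept_id ;
  var value_numeric ;
run ;

/* Give the weight concept its analysis name */
data wide ;
  set wide(rename=(c_5089=weight)) ;
run ;
```

PROC TRANSPOSE stops with an error if the same concept_id occurs twice within one encounter, so duplicate observations would have to be resolved first. Range checks such as the weight limit in the DATA step above would still need to be applied afterwards.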
Preliminary Analysis
• Generate frequency tables (PROC FREQ) for
  – All categorical data such as gender, WHO stage, yes/no/unknown questions, clinic location, etc.
  – Limited-response continuous data such as number of people in household
• Generate means/medians (PROC MEANS) for
  – All continuous data such as age, weight, CD4
  – Ordinal data such as ‘1=strongly disagree 2=disagree 3=agree 4=strongly agree’
Proc Freq
proc freq data=h.hivdemog2 ;
  title 'AMPATH Demographics' ;
  tables male married traveltime ;
  tables male*married / missing ;
run ;
male  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0     46227      65.45    46227                 65.45
1     24403      34.55    70630                 100.00
Frequency Missing = 77

Married
Married  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0        31029      49.82    31029                 49.82
1        31253      50.18    62282                 100.00
Frequency Missing = 8425

travel time to clinic
TravelTime     Frequency  Percent  Cumulative Frequency  Cumulative Percent
< 30 minutes   15390      27.99    15390                 27.99
30-60 minutes  17214      31.30    32604                 59.29
1-2 hours      13163      23.94    45767                 83.23
> 2 hours      9223       16.77    54990                 100.00
Frequency Missing = 15717
Proc Means
proc means data=h.hivvisit2 n mean std min max median ;
  title 'AMPATH Visit Data' ;
  var age weight cd4 ;
run ;
AMPATH Visit Data
The MEANS Procedure
Variable  Label        N       Mean         Std Dev     Minimum     Maximum      Median
age                    918651  32.6244700   15.2162848  -4.8240931  311.2470910  34.5078713
Weight    weight (kg)  796412  51.9878071   19.1298253  0           120.0000000  55.8000000
cd4                    152475  375.0323837  1672.22     0           536580.00    301.000000
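The Minimum and Maximum values in the PROC MEANS output include entries that cannot be real, such as a negative age, an age above 300, and a CD4 count above 500,000. One way to list such records for review before analysis is sketched below. The limits age_max and cd4_max, the dataset name flagged, and the variable problem are hypothetical placeholders chosen for illustration; they are not AMPATH cleaning rules.

```sas
/* Sketch only: hypothetical plausibility limits, to be replaced by study-specific ones */
%let age_max = 110 ;
%let cd4_max = 5000 ;

data flagged ;
  set h.hivvisit2 ;
  length problem $ 8 ;
  /* Missing values compare lower than any number in SAS, so exclude them explicitly */
  if not missing(age) and (age < 0 or age > &age_max) then do ;
    problem = 'age' ;
    output ;
  end ;
  /* A missing cd4 is never greater than the limit, so no extra check is needed */
  if cd4 > &cd4_max then do ;
    problem = 'cd4' ;
    output ;
  end ;
run ;

/* Count flagged records by type of problem */
proc freq data=flagged ;
  tables problem ;
run ;
```

A visit that fails both checks is written to flagged twice, once per problem, so the PROC FREQ counts are counts of problems rather than of visits.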