Modeling the Effect of Lead on Bone Health
Identification Number 8806
Biostatistics 699 – Project 3
03.04.2008
Appendix contains supplemental results
Abstract:
The present study aims to model how lead exposure can affect bone remodeling and turnover in recently
pregnant women living in Mexico City. The primary outcome is urinary N-telopeptide (NTx), the units of
which are nM BCE/mM creatinine. NTx is a biomarker of bone turnover, and is typically 5 – 65 for non-
pregnant, pre-menopausal women.
The data for this study were collected from women living in Mexico City who were recruited during
pregnancy. Data collection began after the birth of the child, with measurements taken at 1, 3, 7, and 12
months postpartum. Bone lead measurements were taken via in vivo K-x-ray fluorescence (KXRF), and
are in units of micrograms per gram of bone mineral. Potentially relevant explanatory variables include
calcium intake, vitamin D intake, caffeine consumption, patella and tibia bone lead concentrations,
exposure to pottery containing lead, and minimal demographic variables.
I used the ordinary, non-parametric bootstrap to correct the standard errors for the correlation caused by
repeated measures. I found that BMI, age, and education were all associated with decreases in NTx, whereas
lactation and bone lead were associated with increases.
Introduction:
Lead exposure has been associated with many negative health outcomes, including mental
retardation in children and fertility and kidney problems in adults. The severe health
consequences of lead exposure are undoubtedly the motivation for the copious research and policy
initiatives that have been enacted to reduce or restrict the industrial uses of lead. Despite the recent influx
of resources into lead research, little is understood about how lead exposure affects human bone health.
The present study aims to model how lead exposure can affect bone remodeling and turnover in
recently pregnant women living in Mexico City. The primary outcome is urinary N-telopeptide (NTx), the
units of which are nM BCE/mM creatinine. For simplicity, I will drop the units when referring to NTx. NTx
is a biomarker of bone turnover, and is typically 5 – 65 for non-pregnant, pre-menopausal women.
This study is somewhat exploratory in nature because NTx is a fairly new biomarker that
has not been studied extensively in pregnant women.
Methods:
The data were collected from women living in Mexico City who were recruited during pregnancy.
Data collection began after the birth of the child, with measurements taken at 1, 3, 7, and 12 months
postpartum. Bone lead measurements were taken via in vivo K-x-ray fluorescence (KXRF), and are in units of
micrograms per gram of bone mineral. Two types of bone lead measurements were taken. The first was
from the patella, which is a soft, spongy bone, and the second was from the tibia, which is a harder, less
porous bone. I am interested in modeling urinary N-telopeptide (NTx), which is a biomarker of bone
turnover. Potentially relevant explanatory variables include calcium intake, vitamin D intake, caffeine
consumption, patella and tibia bone lead concentrations, exposure to pottery containing lead, and minimal
demographic variables. In this dataset, NTx is right-skewed (see appendix figures 3 and 4), which could
violate the assumption of normally distributed errors in the linear regression framework. To correct the
skewness, I log-transformed NTx. I also generated an additional time variable, defined as the difference
in days between the observation date and the child's date of birth. This time variable, which I will refer to
as the continuous time variable, is more precise than the nominal follow-up time (1 mo, 3 mo, etc.) because the
follow-ups were not done at exact intervals.
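The two derived variables described above can be sketched as follows. This is an illustration in Python rather than the R code used for the analysis; the function names, the dates, and the NTx value are hypothetical.

```python
import math
from datetime import date

def days_since_birth(visit_date, birth_date):
    """Continuous time: days elapsed between the child's birth and the visit."""
    return (visit_date - birth_date).days

def log_ntx(ntx):
    """Natural log of NTx, used to reduce right skew in the outcome."""
    return math.log(ntx)

# Hypothetical subject whose nominal "1 month" visit did not fall on an exact interval.
birth = date(2000, 1, 10)
visit = date(2000, 2, 13)
t = days_since_birth(visit, birth)   # continuous time in days
y = log_ntx(104.5)                   # outcome on the log scale
```

Using the day count instead of the nominal visit label keeps the information about how far each visit drifted from its scheduled time.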
Statistical Methods:
Special care must be taken when analyzing longitudinal data because observations within subjects
will be correlated. Correlated observations can be dealt with in several ways, including modeling the
covariance structure, using mixed-effects models, or ignoring the correlation and using the bootstrap
procedure to correct the model standard errors. Mixed-effects models are difficult to interpret for a medical
audience and may be too elaborate for the limited data and exploratory nature of the present study.
Likewise, I am not specifically interested in the correlation structure of the observations. For these reasons
I opted to initially ignore the repeated measures and later correct the standard errors and p-values using
the bootstrap procedure. I used the ordinary, non-parametric bootstrap with 10,000 bootstrap replicates,
which is ample to estimate the empirical variances.
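A minimal sketch of a non-parametric bootstrap for a regression slope is given below. It is written in Python rather than R, uses synthetic data, and assumes that individual observations (rows) are resampled with replacement; the text does not say whether rows or whole subjects were resampled, and resampling whole subjects is a common variant for repeated measures. All names here are hypothetical.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single covariate."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope(xs, ys, n_boot=10_000, seed=0):
    """Resample rows with replacement, refit, and summarize the slopes."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with no variation in x
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    se = statistics.stdev(slopes)              # bootstrap standard error
    lo = slopes[int(0.025 * len(slopes))]      # percentile 95% interval
    hi = slopes[int(0.975 * len(slopes)) - 1]
    return se, (lo, hi)

# Synthetic data: a log-scale outcome that declines with a covariate.
data_rng = random.Random(1)
xs = [20 + 0.5 * i for i in range(40)]
ys = [4.5 - 0.03 * x + data_rng.gauss(0, 0.3) for x in xs]

slope_hat = ols_slope(xs, ys)
se, (lo, hi) = bootstrap_slope(xs, ys, n_boot=2000)  # fewer replicates, for speed
```

The spread of the refitted slopes stands in for the model-based standard error, which is what allows the initial fit to ignore the correlation structure.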
To create a model basis for comparison, I first fit a so-called loaded model which contained all
available covariates and many potential interactions (coefficients and p-values are reported in appendix
table 2).
The loaded model is, by design, unparsimonious. To pare down the number of covariates, I used
best subset selection, which searches the possible models subject to constraints such as forcing
covariates in or out, or capping the maximum number of covariates in the model. I used the adjusted R²
as a gauge of the performance of the model, but did not use it exclusively for model selection.
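The idea of a constrained subset search scored by adjusted R² can be sketched conceptually as below. This Python snippet only enumerates candidate subsets and computes the standard adjusted R² formula; it is not how the 'leaps' package performs its search, and the covariate names and constraint values are hypothetical.

```python
from itertools import combinations

def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p covariates (plus an intercept) and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def candidate_subsets(covariates, forced_in=(), max_size=None):
    """Yield each covariate subset that contains forced_in and has at most max_size members."""
    forced = list(forced_in)
    optional = [c for c in covariates if c not in forced]
    if max_size is None:
        max_size = len(covariates)
    for k in range(0, max_size - len(forced) + 1):
        for extra in combinations(optional, k):
            yield forced + list(extra)

# Hypothetical search: BMI is forced in and models are capped at two covariates.
subsets = list(candidate_subsets(
    ["calcium", "vitamin_d", "caffeine", "bmi"], forced_in=["bmi"], max_size=2))

# Adding covariates is penalized: the adjusted value falls below the raw R-squared.
penalized = adjusted_r2(0.30, n=100, p=10)
```

Because adjusted R² penalizes each added covariate, it can rank models of different sizes, though as noted above it is only one input to the final choice.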
Lastly, to illustrate the final model, I generated plots of predicted NTx given many different
conditions.
All analyses were performed in the statistical program R version 2.5.1 on Mac OS X version 10.4.11.
The packages used include ‘leaps,’ ‘gplots,’ and ‘boot.’
Results:
This data set contained many missing values in both the covariates and the outcome. However,
according to the investigator, many values of the outcome are missing due to financial constraints and can
thus be considered missing at random.
Table 1 summarizes population characteristics at baseline. Some notable features are the
extremely elevated NTx levels and the high percentage of lactating/breastfeeding women. While not
reported here, the percentage of breastfeeding women decreases sharply over time, and caffeine
consumption increases over time (appendix figure 5).
The results from the univariate regressions are in appendix table 1. Univariate regressions
can often provide insight into which covariates may be predictive in the final model. These univariate
regressions do not suggest that the lead measurements are predictive at all, but that the
demographic variables (age and education) are. This result could mean that the association is too
complex to be captured by simple linear regressions, or that the lead measurements are too noisy to be
useful.
The loaded model recapitulates the findings of the univariate models, namely that the demographic
variables are significant predictors, but lead is not. Interestingly, lactation status is highly significant
(p-value = 8.11e-05), which makes good sense biologically. Lactation and, more specifically, breastfeeding
are very taxing on the body, especially in terms of calcium, so we might expect bone turnover to be higher.
Table 1. Baseline Subject Characteristics
Characteristic               Mean ± SD
Calcium (mg)                 999.23 ± 350.73
Vitamin D (IU)               236.01 ± 112.74
Caffeine (mg)                30 ± 86.20
NTx                          104.50 ± 65.48
Patella Bone Lead            13.81 ± 11.33
Tibia Bone Lead              11.05 ± 11.88
Percent Lactating*           95%
Ceramic Use (Days/Week)      0.14 ± 0.73
% Life in MX                 0.91 ± 0.21
Age (Years)                  27.70 ± 5.56
Education (Years)            10.58 ± 3.17
BMI                          25.57 ± 3.58
* 95% of subjects who responded
Calcium and vitamin D are both marginally significant; however, they are highly correlated (a correlation of
about 0.8 at baseline) and should not both be included in a final model because of the ensuing problem of
multicollinearity.
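To give a rough sense of the cost of this correlation, the variance inflation factor (VIF) can be computed directly from the correlation. The sketch below assumes a model containing only these two covariates; with more covariates, the VIF would instead be based on the R² from regressing one covariate on all the others. The function name is hypothetical.

```python
def vif_from_correlation(r):
    """Variance inflation factor when a model has exactly two covariates with correlation r."""
    return 1.0 / (1.0 - r ** 2)

# With a correlation of about 0.8, coefficient variances are inflated
# by a factor of roughly 2.78 relative to uncorrelated covariates.
vif = vif_from_correlation(0.8)
```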
Table 2 below contains the results of the final model when all covariates are centered except for
time and lactation. Centering allows for a meaningful interpretation of the model when the covariates are
zero, which would otherwise not be the case (you can't have a BMI of zero, for example). The coefficients
are reported as e^β for ease of interpretation (recall that NTx was modeled on the log scale). When we go
from the log scale back to the original scale, the exponentiated coefficients can be interpreted as
multiplicative effects. The bootstrap-corrected 95% confidence intervals and p-values are the ones that
should be used for inference; the uncorrected values are presented for comparison only, although the two do
not differ a great deal.
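The back-transformation can be illustrated with a short Python sketch. The helper names and the example coefficient are hypothetical; the values 1.56 and 0.85 are the reported exponentiated coefficients for lactation and age in the final model.

```python
import math

def multiplicative_effect(beta, sd=1.0):
    """Factor by which NTx is multiplied when the covariate increases by `sd`,
    given a coefficient `beta` from a model of log NTx."""
    return math.exp(beta * sd)

def percent_change(factor):
    """Express a multiplicative factor as a percent change in NTx."""
    return (factor - 1.0) * 100.0

# Hypothetical covariate: log-scale coefficient -0.02 per unit, standard deviation 5.
factor = multiplicative_effect(-0.02, sd=5)   # exp(-0.1), a decrease of roughly 10%

# Reported factors from the final model, read as percent changes.
lactation_change = percent_change(1.56)       # about a 56% increase
age_change = percent_change(0.85)             # about a 15% decrease per SD of age
```

A factor above 1 therefore corresponds to higher NTx and a factor below 1 to lower NTx, with 1 meaning no association.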
Table 2. Results of Final Model. The coefficients are reported as e to the beta, with beta multiplied by the standard deviation of the
covariate to yield a more relevant change in units, except for time and lactation, which are in days and coded 0 = no, 1 = yes, respectively.
Variable             Final Model                              Bootstrap Corrected
                     Coefficient  CI               p-Value    CI               p-Value
Intercept            60.05        [52.42, 68.80]   < 2e-16    [52.42, 68.80]   < 2e-16
Vitamin D (IU)       0.94         [0.90, 0.99]     0.04       [0.90, 0.99]     0.03
Patella Bone Lead    0.94         [0.85, 1.05]     0.004      [0.82, 0.96]     0.006
Tibia Bone Lead      1.01         [0.96, 1.07]     0.06       [0.94, 1.04]     0.06
Lactation            1.56         [1.40, 1.75]     1.20e-14   [1.40, 1.74]     2.22e-15
Age (Years)          0.85         [0.81, 0.89]     1.03e-09   [0.81, 0.89]     1.54e-09
Education (Years)    0.92         [0.87, 0.96]     0.001      [0.87, 0.96]     0.001
Body Mass Index      0.92         [0.88, 0.96]     0.0002     [0.88, 0.96]     0.0002
Continuous Time      0.99         [0.99, 1.00]     0.10       [0.99, 1.00]     0.09
Patella * Tibia      1.07         [0.95, 1.04]     0.0005     [0.95, 1.05]     0.002
N                    535
Adjusted R2          0.24
In my final model, I find that vitamin D, age, education, and BMI have protective effects. That is,
each of these covariates reduces NTx. Furthermore, NTx decreases over time, but this effect is only
marginally statistically significant (p-value = 0.09). All the coefficients reported in table 2, save for
lactation (which is binary) and time, have been multiplied by their respective standard deviations to yield
a more appropriate unit of increase (a one-unit increase in vitamin D is trivial, but a one-standard-deviation
increase is a meaningful difference). The coefficients are therefore interpreted as the effect of a 1 standard
deviation increase, rather than a 1 unit increase. The interaction of the tibia and patella bone lead
concentrations is difficult to interpret, but is easier to understand using figure 1.
Figure 1. Plots predicting NTx over patella/tibia values for the 10th, 50th and 90th percentiles of patella/tibia at baseline and end
point.
The plot in the top left corner shows that patella has a protective effect when the tibia value is
at the 10th percentile, but that the effect is overwhelmed for higher tiers of tibia bone lead
concentration. A similar trend can be seen at the end point as well (bottom left), except that it is shifted
down, reflecting the reduction in NTx over time. In the top right corner, lower percentiles of patella bone
lead concentration start at lower NTx levels, but increase more steeply as the tibia concentrations
increase. The analogous end point plot is again shifted down, reflecting the reduction in NTx over time.
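The way an interaction makes the effect of one lead measure depend on the other can be sketched as follows. The coefficients below are hypothetical values on the log scale for standardized covariates, chosen only to share the signs of the reported estimates; they are not the fitted model.

```python
import math

# Hypothetical log-scale coefficients (assumed, not taken from the fitted model).
B0 = math.log(60.0)     # log NTx when both lead measures are at their means
B_PATELLA = -0.06
B_TIBIA = 0.01
B_INTERACTION = 0.07

def predicted_ntx(patella, tibia):
    """Predicted NTx for standardized patella and tibia lead values."""
    log_ntx = (B0 + B_PATELLA * patella + B_TIBIA * tibia
               + B_INTERACTION * patella * tibia)
    return math.exp(log_ntx)

def patella_slope(tibia):
    """Change in log NTx per unit of patella lead, at a fixed tibia value."""
    return B_PATELLA + B_INTERACTION * tibia

low_tibia_slope = patella_slope(-1.0)    # negative: patella appears protective
high_tibia_slope = patella_slope(1.0)    # positive: the protective effect is overwhelmed
```

Under these assumed values, the sign of the patella slope flips as tibia lead rises, which is the kind of pattern the panels describe.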
Similar plots can be made to depict changes in NTx across lead values when the other covariates
change. These plots are located in appendix figures 8 and 9. The trends for both sets are similar;
however, the level of NTx shifts up or down depending on whether or not the covariate is protective.
Conclusions:
Some covariates in the final model in this study seem fairly intuitive; however, the associations
may not be so straightforward. According to Hu et al., there are two paradigms for thinking about the
relationship between bone lead and bone health. The first paradigm is the framework discussed here, with
lead concentrations (x) affecting bone turnover (y). The second, which is at least equally reasonable, is
that bone turnover actually affects lead concentrations. That is, if NTx is high, more lead is being released,
which influences the bone lead measurements, and the direction of the association is opposite to the one
modeled here. This would invalidate the model presented here. Unfortunately, this
shortcoming is not a statistical one, but a limitation of the study design.
The most damning limitation of this study is the complexity of the research question combined with
the relative dearth of data. The female body changes dramatically during pregnancy, likely in ways that
are inadequately measured in this sample. More comprehensive laboratory measures are needed, as well
as concrete normal values for comparison. More importantly, bone growth and destruction are slow,
gradual processes, and a mere 12 months of observation time is simply not enough to observe the body's
restabilization following pregnancy.
Though this analysis is limited in scope, it will hopefully serve as a good starting point for future
modeling and experimental design.
References:
Crawley, Michael J. “The R Book.” John Wiley and Sons Ltd: 2007.
Fitzmaurice, Garrett M., Nan M. Laird, and James H. Ware. “Applied Longitudinal Analysis.” John Wiley
and Sons Ltd: 2004.
Hu, Howard, et al. “Bone Lead as a Biological Marker in Epidemiologic Studies of Chronic Toxicity:
Conceptual Paradigms.” Environmental Health Perspectives, Vol. 106, No. 1. Jan. 1998.
Appendix:
Figure 1. Scatterplot matrix of NTx and all continuous covariates at baseline.
Figure 2. Scatterplot matrix of NTx and all continuous covariates at end point.
Figure 3. Histogram matrix of NTx and all continuous covariates at baseline.
Figure 4. Histogram matrix of NTx and all continuous covariates at end point.
Figure 5. Spaghetti plot matrix of log NTx and relevant covariates over time. For clarity, the plots
contain a randomly selected subset of subjects.
Figure 6. Best subsets selection plot. Each notch on the y-axis represents a model. This plot is for
exploratory use only, because it uses all four time points without correction for within-subject
correlation.
Figure 7. Best subsets selection plots. Each notch on the y-axis represents a model. Each time point is
done separately. The differences in the models selected at each time point may suggest instability in the
aggregate model, or differences in the importance of some covariates at different time points.
Figure 8. Predicted NTx over patella values given certain conditions. Unless otherwise noted, all the covariate values are at their mean levels, zero. The trends are all similar, but the time effect is apparent in the shift down for protective covariates and up for
lactation.
Figure 9. Predicted NTx over tibia values given certain conditions. Unless otherwise noted, all the covariate values are at their
mean levels. The trends are all similar, but the time effect is apparent in the shift down for protective covariates and up for lactation.
Table 1. Univariate Regression Models. Each row is a separate model.
Covariate                              Intercept         Coefficient
Calcium (mg)                           4.22*** (0.07)    0.00014* (0.00007)
Vitamin D (IU)                         4.35*** (0.06)    -0.000009 (0.0002)
Caffeine (mg)                          4.37*** (0.03)    -0.0005 (0.0003)
Patella Lead                           4.39*** (0.04)    -0.004 (0.003)
Tibia Lead                             4.34*** (0.03)    -0.00009 (0.002)
Ceramic Use (Days Per Week)            4.34*** (0.03)    0.09* (0.04)
Percent of Life in MC                  4.43*** (0.10)    -0.10 (0.11)
Mother's Age                           5.12*** (0.14)    -0.03 (0.005)
Mother's Education (Years Attained)    4.62*** (0.09)    -0.03** (-2.71)
Height                                 7.18*** (0.62)    -0.05*** (0.01)
Weight                                 5.29*** (0.16)    -0.007*** (0.001)
BMI                                    5.09*** (0.18)    -0.03*** (0.007)
Time                                   4.55*** (0.05)    -0.001*** (0.0002)
Significance Codes: '***' 0.001, '**' 0.01, '*' 0.05
Table 2. Loaded Model. Coefficients and p-values for the loaded model. These coefficients are not multiplied by relevant
unit changes and not centered.
Variable                Coefficient    p-Value     Bootstrap Corrected p-Value
Intercept               331.31         < 2e-16     < 2e-16
Calcium (mg)            1.00           0.15        0.13
Vitamin D (IU)          0.99           0.08        1.13e-08
Caffeine (mg)           0.99           0.79        0.77
Machine Used (Old)      1.04           0.71        0.69
Patella Bone Lead       0.99           0.52        0.55
Tibia Bone Lead         1.0008         0.93        0.93
Lactation               1.62           8.11e-05    0.0001
Lead Ceramic Use        1.04           0.21        0.34
% Life in Mexico Cit    1.009          0.93        0.92
Age (Years)             0.97           1.55e-09    4.86e-09
Education (Years)       0.97           0.00222     0.001
Body Mass Index         0.97           0.01        0.02
Continuous Time         1.0001         0.89        0.89
Machine * Patella       1.0002         0.97        0.97
Machine * Tibia         0.99           0.78        0.78
Patella * Tibia         1.0002         0.39        0.39
Machine*Patella*Ti      1.00003        0.92        0.92
ContTime * Lactatio     0.99           0.84        0.83
ContTime * Calcium      0.99           0.21        0.17
ContTime * Vit D        1.000003       0.34        0.30
ContTime * Caffeine     1.0000005      0.81        0.78
ContTime * Patella      0.99           0.62        0.31
ContTime * Tibia        0.99           0.50        0.52
ContTime * BMI          1.00001        0.71        0.70
N                       530
Adjusted R2             0.25