
    LOGIT MODEL TO PREDICT DIABETES MELLITUS IN EMPLOYEE

    Nurita Andayani 1) and Moordiani 2)

    1) Statistics, Faculty of Pharmacy, Pancasila University
    2) Pharmacology, Faculty of Pharmacy, Pancasila University

    e-mail: [email protected]

    Abstract. Diabetes mellitus is a metabolic disorder characterized by chronic hyperglycemia with disturbances of carbohydrate, fat and protein metabolism resulting from defects in insulin secretion, insulin action, or both. According to WHO data, more than 220 million people worldwide suffer from diabetes mellitus, of whom 17 million are Indonesian (8.6% of Indonesia's population), which places Indonesia fourth in the world after India, China, and the USA. Patients with diabetes mellitus commonly experience fatigue, itchy skin, paresthesia, blurred vision, and other symptoms. As the disease develops, it leads to complications such as an increased risk of blindness, stroke, and death. The condition can also reduce the working productivity of company employees. It is therefore important to identify the significant indicators for predicting the probability of this disease using logistic regression (the logit model). From the data analysis, we obtained the logit model:

    $$\ln\frac{\pi(x)}{1-\pi(x)} = -3.925 + 0.718\,X_{sex(1)} - 2.991\,X_{age(1)} - 1.814\,X_{age(2)} - 0.719\,X_{age(3)} + 0.004\,X_{trigl} + 0.057\,X_{BMI}$$

    This model shows that diabetes in employees depends significantly on sex, age, triglyceride level, and body mass index (BMI; IMT in Indonesian).

    Keywords: Regression, logit, diabetes

    1 Introduction

    Diabetes mellitus is a metabolic disorder characterized by chronic hyperglycemia with disturbances of carbohydrate, fat and protein metabolism resulting from defects in insulin secretion, insulin action, or both. The classification of diabetes is based on aetiological types. Type 1 indicates the processes of beta-cell destruction that may ultimately lead to diabetes in which insulin is required for survival. Type 2 diabetes is characterized by disorders of insulin action and/or insulin secretion. The third category, "other specific types of diabetes," includes diabetes caused by a specific and identified underlying defect, such as genetic defects or diseases of the exocrine pancreas. The latest WHO Global Burden of Disease estimates the worldwide burden of diabetes in adults to be around 173 million in the year 2002 (World Health Organization, 1999).

    According to WHO data, more than 220 million people worldwide suffer from diabetes mellitus, of whom 17 million are Indonesian (8.6% of Indonesia's population), which places Indonesia fourth in the world after India, China, and the USA. The diabetes epidemic is accelerating in the developing world, with an increasing proportion of affected people in younger age groups. We therefore need to prevent and control the increase of this disease. The aim of this study is to predict the probability of this disease in company employees. However, this study should serve only as a prediction of the occurrence of diabetes based on the data we have.

    Proceedings of the Third International Conference on Mathematics and Natural Sciences (ICMNS 2010)

    Diabetes patients commonly experience fatigue, itchy skin, paresthesia, blurred vision, and other symptoms. As the disease develops, it leads to complications such as an increased risk of blindness, stroke, and death. Diabetes in employees can reduce their working productivity, and the burden falls on the company if this continues. Generally, a company asks employee candidates to undergo a medical check-up covering body weight and height, cholesterol, body mass index, and other measures. In this study, we want to predict the probability that someone will have diabetes in the future based on data such as sex, age, cholesterol (low-density lipoprotein or LDL, high-density lipoprotein or HDL, and triglyceride), and body mass index (BMI). These data were processed using logistic regression.

    Logistic regression (sometimes called the logistic model or logit model) is a statistical method for describing the relationship between a response variable and explanatory variables. The response variable has only two values, for instance success or failure, live or die, acceptable or not. The explanatory variables can be either categorical or quantitative (Hosmer and Lemeshow). Logistic regression serves the same purpose as multiple regression analysis, but the response variable in logistic regression is a dummy variable (0 or 1). It is a modeling procedure for describing the relationship between a categorical response variable (Y) and one or more predictor variables (X), which may be categorical or continuous. For example, if the response variable has two categories, with Y = 1 for success and Y = 0 for failure, binary logistic regression can be applied. For a single research object, a response with two categories means that y follows a Bernoulli distribution. The probability function of y with parameter $\pi$ is

    $$P(Y = y) = \pi^{y}(1 - \pi)^{1 - y}$$

    where y = 0 or 1. The probability of each category is then $P(Y = 1) = \pi$ and $P(Y = 0) = 1 - \pi$, where $E(y) = \pi$ and $0 \le \pi \le 1$. In general, the logistic regression model for the probability, involving several predictor variables (x), can be formulated as:

    $$E(y \mid x) = \pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}} \qquad (1)$$

    $\pi(x)$ is a nonlinear function. Hence, we use the logit transformation to obtain a linear function, so that the relationship between the response (dependent) variable (y) and the predictor (independent) variables (x) can be seen. The logit form of $\pi(x)$, denoted $g(x)$, is:

    $$g(x) = \ln\frac{\pi(x)}{1 - \pi(x)} \qquad (2)$$


    After equation (1) is substituted into equation (2):

    $$\ln\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \qquad (3)$$
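The relationship between equations (1) to (3) can be illustrated with a minimal Python sketch. The function names and the coefficient values below are hypothetical and chosen only for illustration; they are not taken from the study. The sketch shows that applying the logit to the logistic probability recovers the linear predictor.

```python
import math

def linear_predictor(beta0, betas, xs):
    """Right-hand side of equation (3): beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def logistic(eta):
    """Equation (1): e^eta / (1 + e^eta), written in the equivalent form 1 / (1 + e^-eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Equation (2): the log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and predictor values (illustration only)
beta0, betas, xs = -1.0, [0.5, -0.25], [2.0, 4.0]

eta = linear_predictor(beta0, betas, xs)  # -1.0 + 1.0 - 1.0 = -1.0
p = logistic(eta)                         # probability between 0 and 1
print(round(p, 4))
print(abs(logit(p) - eta) < 1e-9)         # the logit undoes the logistic transform
```

Because the logit is the inverse of the logistic function, the coefficients act linearly on the log-odds scale even though the probability itself is nonlinear in x.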

    2 Data

    This study uses 2051 records obtained from a private laboratory in Surabaya, Indonesia. The respondents were selected from people employed by companies. The response variable is diabetes status (DM = 1 for diabetes and DM = 0 for non-diabetes). The predictor variables are sex (female = 1 and male = 0), age range (less than or equal to 30 years old = 1, 31-40 years old = 2, 41-50 years old = 3, and more than 50 years old = 4), total cholesterol, LDL, triglyceride, and body mass index (BMI). The results of the binary logistic regression obtained with SPSS are given below:

    Table 1. Number of cases in model

    All 2051 cases were included in the analysis, with no missing cases (Table 1). Table 2 shows the dependent variable encoding, where the internal value is 0 for non DM (non-diabetes) and 1 for DM (diabetes). The Nagelkerke R Square shows that about 23.2% of the variation in the outcome variable (DM) is explained by this logistic model (Table 4).
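As a rough consistency check on the reported pseudo R-square values, the Cox & Snell and Nagelkerke statistics can be recomputed from the -2 log likelihood of the fitted model and the observed class counts. The sketch below assumes the standard definitions of both statistics and an intercept-only null model; the counts 181 (DM) and 1870 (non DM) are the row totals of the classification table (Table 5), and 998.098 is the -2 log likelihood in Table 4. The variable names are illustrative.

```python
import math

n_dm, n_non_dm = 181, 1870    # observed row totals: 174 + 7 and 1865 + 5 (Table 5)
n = n_dm + n_non_dm           # 2051 cases
deviance_model = 998.098      # -2 log likelihood of the fitted model (Table 4)

# -2 log likelihood of the intercept-only model (assumption: ungrouped binary data)
p0 = n_dm / n
deviance_null = -2.0 * (n_dm * math.log(p0) + n_non_dm * math.log(1.0 - p0))

# Cox & Snell R-square and its maximum attainable value
cox_snell = 1.0 - math.exp(-(deviance_null - deviance_model) / n)
max_cox_snell = 1.0 - math.exp(-deviance_null / n)

# Nagelkerke rescales Cox & Snell so that the maximum is 1
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Under these assumptions the two values should land close to the .104 and .232 reported by SPSS.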

    Table 2. Predicted outcome coding

    Table 3. Categorical variables coding

    Case Processing Summary (Table 1)

    Unweighted Cases (a)                          N    Percent
    Selected Cases     Included in Analysis    2051      100.0
                       Missing Cases              0         .0
                       Total                   2051      100.0
    Unselected Cases                              0         .0
    Total                                      2051      100.0

    a. If weight is in effect, see classification table for the total number of cases.

    Dependent Variable Encoding (Table 2)

    Original Value    Internal Value
    non DM                         0
    DM                             1


    Table 4. Amount of variation explained by the model

    Table 5. Model discrimination

    The Wald statistics in Table 6 indicate the importance of each variable's contribution to the model: the higher the value, the more important the variable. For prediction, sex, age, triglyceride, and BMI are important risk factors for diabetes (DM), with p-values of 0.018, 0.000, 0.000, and 0.013, all below the 0.05 significance level. Because LDL and HDL do not significantly predict diabetes in this model (p = 0.543 and 0.472), they are omitted from the model, although no multicollinearity is shown (Table 7).
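The Wald test behind these p-values can be sketched as follows. For a single coefficient, the Wald statistic is (B / S.E.) squared, compared with a chi-square distribution with 1 degree of freedom. The B and S.E. values are the rounded figures from Table 6, so the results only approximate the SPSS output, especially for small coefficients such as LDL and TRIGL. The function and variable names are illustrative.

```python
import math

# (B, S.E.) pairs as printed in Table 6 for the single-df terms (rounded values)
estimates = {
    "SEX(1)": (0.628, 0.265),
    "BMI": (0.054, 0.022),
    "LDL": (0.001, 0.002),
    "HDL": (-0.008, 0.011),
    "TRIGL": (0.004, 0.001),
}

def wald_test(b, se):
    """Return the Wald chi-square statistic (1 df) and its p-value."""
    wald = (b / se) ** 2
    # Upper tail of chi-square(1): P(Z^2 > wald) = erfc(sqrt(wald / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

ALPHA = 0.05
significant = []
for name, (b, se) in estimates.items():
    wald, p_value = wald_test(b, se)
    if p_value < ALPHA:
        significant.append(name)
    print(f"{name:7s} Wald = {wald:7.3f}  p = {p_value:.3f}")

print(significant)
```

Even with rounded inputs, the same conclusion follows: sex, BMI, and triglyceride fall below the 0.05 level, while LDL and HDL do not.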

    Categorical Variables Codings (Table 3)

                                             Parameter coding
                               Frequency     (1)      (2)      (3)
    Age range    [...]               572   1.000     .000     .000
                 [...]               592    .000    1.000     .000
                 [...]               630    .000     .000    1.000
                 [...] 50 years      257    .000     .000     .000
    Sex          male               1258   1.000
                 female              793    .000

    Model Summary (Table 4)

    Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
    1                 998.098                    .104                   .232

    Classification Table (a) (Table 5)

                                         Predicted
                                    DM              Percentage
    Step 1   Observed         non DM      DM           Correct
             DM   non DM        1865       5              99.7
                  DM             174       7               3.9
             Overall Percentage                           91.3

    a. The cut value is .500


    Table 6. Estimates of the logistic regression model

    Table 7. Correlation matrix for Diabetes model

    The new model (Table 10) contains no non-significant variables; all p-values (Sig.) are less than 5%. Exp(B) gives the odds ratios. Since triglyceride is a quantitative variable, a one-unit increase in triglyceride gives a 0.4% increase in the odds of having diabetes. This 0.4% is obtained by taking Exp(B) for triglyceride minus 1. Males are 2.051 (95% CI 1.283 to 3.278) times more likely than females to have diabetes. For age range, the odds ratio for 31-40 years old compared with 30 years old or less is 0.05 (95% CI 0.021 to 0.119), for 41-50 years old compared with 31-40 years old and 30 years old or less it is 0.163 (95% CI 0.099 to 0.268), and for more than 50 years old compared with the other ages it is 0.487 (95% CI 0.334 to 0.710); in each case the odds of having diabetes are lower. A one-unit increase in body mass index (BMI) gives a 5.8% increase in the odds of having diabetes. This result differs somewhat from general research, which shows that increasing age increases the risk of developing diabetes, but other factors such as diet, lifestyle, or genetic makeup also need to be considered. In addition, the data were taken from different persons, which might produce a diabetes probability model that differs from the general theory.
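The odds ratios and confidence intervals quoted above follow from the coefficients by exponentiation: Exp(B) is the odds ratio, and exp(B ± 1.96 × S.E.) gives the 95% interval. A minimal sketch using the rounded B and S.E. values of Table 10 is given below; the helper name `odds_ratio_ci` is illustrative, and small differences from the SPSS output are expected because the inputs are rounded to three decimals.

```python
import math

# (B, S.E.) pairs as printed in Table 10 (rounded values)
table10 = {
    "SEX(1)": (0.718, 0.239),
    "AGE(1)": (-2.991, 0.441),
    "AGE(2)": (-1.814, 0.254),
    "AGE(3)": (-0.719, 0.192),
    "BMI": (0.057, 0.022),
    "TRIGL": (0.004, 0.001),
}

Z95 = 1.96  # normal quantile for a two-sided 95% interval

def odds_ratio_ci(b, se, z=Z95):
    """Return (odds ratio, lower limit, upper limit) for one coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

for name, (b, se) in table10.items():
    odds_ratio, lower, upper = odds_ratio_ci(b, se)
    change = (odds_ratio - 1.0) * 100.0  # percent change in odds per one-unit increase
    print(f"{name:7s} OR = {odds_ratio:.3f}  95% CI {lower:.3f} to {upper:.3f}  ({change:+.1f}%)")
```

For a quantitative predictor the last column is the percent change in odds per unit, which is how the 0.4% for triglyceride is obtained (Exp(B) minus 1).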

    Table 8. Amount of variation explained by the model after HDL and LDL omitted

    Variables in the Equation (Table 6)

                                                              95.0% C.I. for EXP(B)
    Step 1 (a)        B    S.E.     Wald   df   Sig.   Exp(B)     Lower     Upper
    SEX(1)         .628    .265    5.645    1   .018    1.875     1.116     3.148
    AGE                           78.465    3   .000
    AGE(1)       -2.972    .443   44.932    1   .000     .051      .021      .122
    AGE(2)       -1.809    .256   49.994    1   .000     .164      .099      .271
    AGE(3)        -.715    .193   13.751    1   .000     .489      .335      .714
    BMI            .054    .022    6.168    1   .013    1.056     1.012     1.102
    LDL            .001    .002     .370    1   .543    1.001      .997     1.006
    HDL           -.008    .011     .518    1   .472     .992      .972     1.013
    TRIGL          .004    .001   25.166    1   .000    1.004     1.002     1.005
    Constant     -3.625    .936   14.983    1   .000     .027

    a. Variable(s) entered on step 1: SEX, AGE, BMI, LDL, HDL, TRIGL.

    Correlation Matrix (Table 7)

    Step 1     Constant   SEX(1)   AGE(1)   AGE(2)   AGE(3)      BMI      LDL      HDL    TRIGL
    Constant      1.000    -.430    -.127    -.154    -.105    -.636    -.258    -.685    -.358
    SEX(1)        -.430    1.000     .090     .093     .040     .024    -.102     .423    -.020
    AGE(1)        -.127     .090    1.000     .215     .269    -.008     .106     .023     .041
    AGE(2)        -.154     .093     .215    1.000     .463    -.044     .102     .057     .001
    AGE(3)        -.105     .040     .269     .463    1.000    -.090     .055     .012     .003
    BMI           -.636     .024    -.008    -.044    -.090    1.000    -.055     .113    -.041
    LDL           -.258    -.102     .106     .102     .055    -.055    1.000    -.116     .076
    HDL           -.685     .423     .023     .057     .012     .113    -.116    1.000     .403
    TRIGL         -.358    -.020     .041     .001     .003    -.041     .076     .403    1.000

    Model Summary (Table 8)

    Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
    1                 998.903                    .104                   .232


    The correlation values (Table 11) among sex, age, triglyceride, and BMI are low, but the correlation between BMI and the constant is rather high (r = -0.894), which indicates some multicollinearity. Our recommendation is to keep the constant term in the model, as it acts as a garbage bin, collecting all unexplained variance in the model (recall from Table 8 that the variables explain only 23.2% of the variation).

    Table 9. Model discrimination after LDL and HDL omitted

    Table 10. Estimates of the logistic regression model after LDL and HDL omitted

    Table 11. Correlation matrix for Diabetes model after LDL and HDL omitted

    Table 12. Hosmer-Lemeshow test

    Classification Table (a) (Table 9)

                                         Predicted
                                    DM              Percentage
    Step 1   Observed         non DM      DM           Correct
             DM   non DM        1864       6              99.7
                  DM             174       7               3.9
             Overall Percentage                           91.2

    a. The cut value is .500

    Variables in the Equation (Table 10)

                                                              95.0% C.I. for EXP(B)
    Step 1 (a)        B    S.E.     Wald   df   Sig.   Exp(B)     Lower     Upper
    SEX(1)         .718    .239    9.015    1   .003    2.051     1.283     3.278
    AGE                           80.973    3   .000
    AGE(1)       -2.991    .441   46.093    1   .000     .050      .021      .119
    AGE(2)       -1.814    .254   51.014    1   .000     .163      .099      .268
    AGE(3)        -.719    .192   13.976    1   .000     .487      .334      .710
    BMI            .057    .022    6.881    1   .009    1.058     1.014     1.104
    TRIGL          .004    .001   33.504    1   .000    1.004     1.002     1.005
    Constant     -3.925    .600   42.821    1   .000     .020

    a. Variable(s) entered on step 1: SEX, AGE, BMI, TRIGL.

    Correlation Matrix (Table 11)

    Step 1     Constant   SEX(1)   AGE(1)   AGE(2)   AGE(3)      BMI    TRIGL
    Constant      1.000    -.271    -.119    -.125    -.124    -.894    -.063
    SEX(1)        -.271    1.000     .096     .084     .042    -.031    -.226
    AGE(1)        -.119     .096    1.000     .204     .265    -.003     .021
    AGE(2)        -.125     .084     .204    1.000     .460    -.043    -.041
    AGE(3)        -.124     .042     .265     .460    1.000    -.087    -.009
    BMI           -.894    -.031    -.003    -.043    -.087    1.000    -.094
    TRIGL         -.063    -.226     .021    -.041    -.009    -.094    1.000

    Hosmer and Lemeshow Test (Table 12)

    Step    Chi-square    df    Sig.
    1            5.139     8    .743
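The significance value of the Hosmer and Lemeshow test is the upper-tail probability of a chi-square distribution, here with chi-square = 5.139 and 8 degrees of freedom. The sketch below uses the closed-form upper tail that holds for an even number of degrees of freedom, so that only the Python standard library is needed; the function name is illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term
    for k in range(1, df // 2):     # k = 1 .. df/2 - 1
        term *= half / k
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(5.139, 8)
print(round(p_value, 3))
print("model fit not rejected" if p_value > 0.05 else "model fit rejected")
```

A large p-value here means the null hypothesis that the model fits is not rejected.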


    Hosmer-Lemeshow goodness of fit tells us how closely the observed and predicted probabilities match. The null hypothesis is that the model fits, so a p-value > 0.05 is expected; Table 12 gives p = 0.743. The overall accuracy of this model in predicting subjects having diabetes (with a predicted probability of 0.5 or greater) is 91.2% for the final model (Table 9), against 91.3% for the initial model (Table 5). The sensitivity is 3.9% (7/181) and the specificity is 99.7% (1864/1870). Positive predictive value (PPV) = 7/13 = 53.8% and negative predictive value (NPV) = 1864/2038 = 91.5%.

    For example, a 41-year-old male with a triglyceride level of 167 and a BMI of 30.4 has Probability (diabetes) = 0.068; it is very unlikely that this subject has diabetes, and the NPV tells us that we are 91.5% confident. As another example, a 30-year-old male with a triglyceride level of 500 and a BMI of 33.7 has Probability (diabetes) = 0.68; it is very likely that this subject has diabetes, and the PPV gives 53.8% confidence.
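The accuracy measures discussed here can be recomputed directly from the four cells of the classification table of the final model (Table 9). The sketch below uses the standard definitions; the variable names are illustrative. With these counts, 7/13 evaluates to about 53.8% and 1864/2038 to about 91.5%.

```python
# Cells of the classification table for the final model (Table 9), cut value 0.5
tn, fp = 1864, 6    # observed non DM: predicted non DM, predicted DM
fn, tp = 174, 7     # observed DM:     predicted non DM, predicted DM

total = tn + fp + fn + tp

accuracy = (tp + tn) / total     # share of all cases classified correctly
sensitivity = tp / (tp + fn)     # share of observed DM cases predicted as DM
specificity = tn / (tn + fp)     # share of observed non DM cases predicted as non DM
ppv = tp / (tp + fp)             # share of predicted DM cases that are observed DM
npv = tn / (tn + fn)             # share of predicted non DM cases that are observed non DM

for name, value in [("accuracy", accuracy), ("sensitivity", sensitivity),
                    ("specificity", specificity), ("PPV", ppv), ("NPV", npv)]:
    print(f"{name:12s} {100.0 * value:5.1f}%")
```

The very low sensitivity alongside the high specificity reflects the class imbalance: at a 0.5 cut value the model predicts DM for only 13 of the 2051 cases.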

    From the data analysis, we achieved logit model:

    $$\ln\frac{\pi(x)}{1-\pi(x)} = -3.925 + 0.718\,X_{sex(1)} - 2.991\,X_{age(1)} - 1.814\,X_{age(2)} - 0.719\,X_{age(3)} + 0.004\,X_{trigl} + 0.057\,X_{BMI}$$

    This model shows that diabetes in employees depends significantly on sex, age, triglyceride level, and body mass index (BMI; IMT in Indonesian).
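A predicted probability follows from the logit model by inverting the logit. The sketch below uses the rounded coefficients of Table 10. The mapping from age in years to the AGE(1) to AGE(3) terms is an assumption taken from the way the paper describes the age effects and from its two worked examples (30 years or less as the reference, AGE(1) for 31-40, AGE(2) for 41-50, AGE(3) for over 50), and SEX(1) is taken to indicate male, following the parameter coding in Table 3. The function names are illustrative.

```python
import math

# Rounded coefficients of the final model (Table 10)
COEF = {
    "const": -3.925, "sex1": 0.718,
    "age1": -2.991, "age2": -1.814, "age3": -0.719,
    "trigl": 0.004, "bmi": 0.057,
}

def age_term(age):
    """Assumed mapping (from the paper's text and examples): <= 30 is the reference."""
    if age <= 30:
        return 0.0
    if age <= 40:
        return COEF["age1"]
    if age <= 50:
        return COEF["age2"]
    return COEF["age3"]

def diabetes_probability(male, age, trigl, bmi):
    """Invert the logit: probability = 1 / (1 + exp(-linear predictor))."""
    eta = (COEF["const"]
           + COEF["sex1"] * (1 if male else 0)   # SEX(1) assumed to indicate male
           + age_term(age)
           + COEF["trigl"] * trigl
           + COEF["bmi"] * bmi)
    return 1.0 / (1.0 + math.exp(-eta))

p1 = diabetes_probability(male=True, age=41, trigl=167, bmi=30.4)
p2 = diabetes_probability(male=True, age=30, trigl=500, bmi=33.7)
print(round(p1, 3), round(p2, 3))
```

With these rounded coefficients the first case reproduces the reported 0.068; the second comes out near 0.67, slightly below the reported 0.68, presumably because the published figure was computed from unrounded coefficients.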

    References

    Agresti, A., 1990, Categorical Data Analysis, John Wiley and Sons, Inc., New York.

    Al-khazrajy, LA., Raheem, YA. & Hanoon, YK., 2010, Sex Differences in the Impact of Body Mass Index (BMI) and Waist/Hip (W/H) Ratio on Patients with Metabolic Risk Factors in Baghdad. Global Journal of Health Science, Vol. 2, No. 2.

    Brunham, LR., Kruit, JK., Verchere, CB., and Hayden, MR., 2008, Cholesterol in Islet Dysfunction and Type 2 Diabetes, The Journal of Clinical Investigation, Volume 118, Number 2.

    Chan, YH., 2004, Biostatistics 202: Logistic regression analysis, Singapore Med Journal, Vol. 45(4): 149.

    Federal Bureau of Prisons, 2009, Management of Diabetes, Clinical Practice Guideline.

    Friel, C.M., 1998, Probit/Logit Analysis, Criminal Justice Center, Sam Houston State University.

    Hao, M., Head, WS., Gunawardana, SC., Hasty, AH., and Piston, DW., 2007, Direct Effect of Cholesterol on Insulin Secretion, A Novel Mechanism for Pancreatic β-Cell Dysfunction. Diabetes, Vol. 56.


    Hosmer, DW. and Lemeshow, S., Applied Logistic Regression. [online] Available at: [Accessed 23 October 2010].

    Laakso, M., Sarlund, H., Ehnholm, C., Voutilainen, E., Aro, A., and Pyörälä, K., 1987, Relationship between postheparin plasma lipases and high-density lipoprotein cholesterol in different types of diabetes. Diabetologia, 30: 703-706.

    Scheffer, PG., Teerlink, T., and Heine, RJ., 2005, Clinical significance of the physicochemical properties of LDL in type 2 diabetes, Diabetologia, 48: 808-816.

    Poedjiati, SA., 2010, Perbandingan Ketepatan Model Logit dan Probit Untuk Memprediksi Munculnya Penyakit Hipertensi pada Karyawan Perusahaan. Prosiding Seminar Nasional Basic Science VII, vol. 4: 212.

    Vasisht, A.K., 2000, Logit and Probit Analysis, I.A.S.R.I., Library Avenue, New Delhi.

    Wild, S., Roglic, G., Sicree, R., Green, A., and King, H., 2003, Global burden of diabetes mellitus in the year 2000. Global Burden of Disease, Geneva: WHO.

    World Health Organization, 1999, Definition, Diagnosis and Classification of Diabetes Mellitus and its Complications. Report of a WHO Consultation. Geneva: World Health Organization.

    World Health Organization, 2003, Screening for Type 2 Diabetes. Report of a World Health Organization and International Diabetes Federation meeting. Geneva: World Health Organization.

    http://www.indiana.edu/~lceiub/PY206F05/Logistic.pdf