Upload
trancong
View
222
Download
3
Embed Size (px)
Citation preview
UNIVERSITI PUTRA MALAYSIA
AHAMEFULE HAPPY JOHN
FS 2013 42
ROBUST RANDOM REGRESSION IMPUTATION METHOD FOR MISSING DATA IN THE PRESENCE OF OUTLIERS
© COPYRIG
HT UPM
ROBUST RANDOM REGRESSION IMPUTATION
METHOD FOR MISSING
DATA IN THE PRESENCE OF OUTLIERS
By
AHAMEFULE HAPPY JOHN
Thesis Submitted to the School of Graduate Studies, Universiti
Putra Malaysia, in the Fulfilment of the Requirements
for the Master of Applied Statistics
December 2013
© COPYRIG
HT UPM
ii
DEDICATION
To God who made it all possible
To my family who wanted to see me reach my goals
To my beloved supervisor who inspired and guided me all the
way
To my friend who was there for me
© COPYRIG
HT UPM
iii
Abstract of thesis presented to the Senate of Universiti Putra Malaysia in
fulfilment of the requirements for the degree of Master of Science
ROBUST RANDOM REGRESSION IMPUTATION
METHOD FOR MISSING
DATA IN THE PRESENCE OF OUTLIERS
By
AHAMEFULE HAPPY JOHN
December 2013
Chairman: Md. Sohel Rana, PhD
Faculty: Science
The Ordinary Least Square (OLS) estimator is the best regression
estimator if all the assumptions are met. However, the presence of
missing data and outliers can distort the Ordinary Least Squares
estimation and increase the variability of the parameters estimates. The
main focus of this research is to take remedial measure in missing data
in regression in the presence of outliers. In regression analysis, the
dependent variable (Y) is a function of the independent variable X. Thus, in
regression, outliers and missing values can come in both X and Y directions.
It is very common to use the OLS base Random Regression Imputation (RRI)
when missing values are in Y direction. This RRI seems to be a good method
if there are no outliers in the data. Unfortunately, this estimate performs poorly
in the presence of outliers. It is because the RRI is OLS base imputation
© COPYRIG
HT UPM
iv
method and OLS is largely affected by outliers. As such, we modified an
OLS base Random Regression Imputation (RRRI) methods by
incorporating the robust MM estimate which is less affected by outliers. The
proposed method is compared with some well-known methods of estimating
missing data. The results of the study signify that the RRRI method
outperforms the existing methods in the presence of outliers. Since in
regression, outliers and missing data can come in both directions, we also
considered a situation in which observations are missing in the X
explanatory variable. In this respect, the Dummy Variable (DV) approach is
one of the best approaches to predict the missing data model. However, this
approach also becomes poor in the presence of outliers. As an alternative,
Robust Inverse Regression Technique is proposed to get the better
estimate. By examining the real data and Monte Carlo Simulation
studies, it revealed that our proposed robust methods perform better than
the classical methods.
© COPYRIG
HT UPM
v
Abstrak tesis dikemukakan kepada Senat Universiti Putra Malaysia sebagai
memenuhi syarat bagi Sarjana Statistik Gunaan
KAEDAH IMPUTASI REGRESI RAWAK TEGUH BAGI DATA
HILANG DENGAN KEHADIRAN TITIK TERPENCIL
Oleh
AHAMEFULE HAPPY JOHN
Disember 2013
Pengerusi: Md. Sohel Rana, PhD
Fakulti: Sains
Penganggar Kuasa Dua Terkecil Biasa (KDTB) adalah penganggar regresi
yang paling baik sekiranya semua andaian dipenuhi. Walau bagaimanapun
kehadiran data hilang dan titik terpencil boleh memesongkan anggaran Kuasa
Dua Terkecil Biasa dan meningkatkan kebolehubahan penganggaran
parameter. Fokus utama kajian ini ialah untuk mengambil langkah pembetulan
terhadap data hilang dalam regresi dengan kehadiran titik terpencil. Dalam
analisis regresi, pemboleh ubah bersandar (Y) adalah fungsi kepada
pembolehubah tak bersandar X. Oleh itu, dalam data regresi, titik terpencil
dan nilai hilang boleh wujud pada arah X dan Y. Penggunaan Imputasi
Regresi Rawak (IRR) yang berasaskan Kaedah Kuasadua Terkecil Biasa
(KKTB) sering digunakan apabila nilai hilang pada arah Y. Kaedah IRR
merupakan kaedah yang baik jika titik terpencil tiada dalam data. Malangnya,
penganggar ini lemah sekiranya wujud titik terpencil . Ini kerana IRR adalah
kaedah imputas yang berasaskan KDTB dan KDTB sangat dipengaruhi oleh
titik terpencil. Oleh itu kami telah mengubahsuai kaedah Imputasi Regresi
Rawak (IRR) yang berasaskan KDTB dengan menggabungkan penganggar
MM teguh yang kurang dipengaruhi oleh titik terpencil. Kaedah yang
© COPYRIG
HT UPM
vi
dicadangkan ini dibandingkan dengan kaedah penganggar data hilang yang
terkenal. Hasil kajian menunjukkan kaedah IRR mengatasi kaedah sedia ada
bagi data yang mengandungi titik terpencil. Oleh keruna dalam regresi, titik
terpencil dan data hilang boleh berada di kedua-dua arah, kami turut
mengambil kira situasi dengan cerapan hilang dalam pemboleh ubah tak
bersandar X. Dalam hal ini, pendekatan Pemboleh ubah Dami (PD)
merupakan suatu pendekatan yang paling baik bagi meramal model data
hilang. Walau bagaimanapun, pendekatan ini menjadi lemah dengan kehadiran
titik terperncil. Sebagai alternatif, Teknik Regresi Teguh Songsang
dicadangkan bagi mendapatkan anggaran yang lebih baik. Dengan mengkaji
data sebenar dan kajian Simulasi Monte Carlo, kaedah teguh yang kami
cadangkan didapati lebih baik berbanding kaedah-kaedah yang lama.
© COPYRIG
HT UPM
vii
ACKNOWLEDGEMENTS
I wish to take this opportunity to express my sincere gratitude to all
who have helped me in diverse ways to complete this thesis successfully.
First and foremost, I want to thank God for making it all possible.
Special thanks to my supervisor, Dr. Md. Sohel Rana for his excellent
guidance, patience and constructive ideas throughout the course of this
thesis. I am grateful to have a patient and motivational supervisor to
support me at all times during this study. Many thanks also go to my co-
supervisor, Prof. Dr. Habshah Midi for her valuable guidance. Without her
great support, it was impossible to complete this thesis.
I would also like to extend my deepest appreciation to Dr. John
Nwabueze Igabari for his role as a mentor and role model in my
studies and also all my other lecturers who taught and shared their
valuable knowledge with me.
Last but not least, I would like convey my gratitude to my parents for
their inspiration and to my beloved family, my wife and friends too for
their support and spiritual encouragement throughout my studies in
Universiti Putra Malaysia.
© COPYRIG
HT UPM
viii
I certify that a thesis Examination Committee has met on 13/ 12/13 to
conduct the final examination of Ahamefule Happy John on his thesis
entitled “Robust Random Regression Imputation Method for Missing Data
in the Presence of Outliers” in accordance with Universities and
University Colleges Act 1971 and the Constitution of the Universiti
Putra Malaysia [P.U.(A) 106] 15 March 1998. The Committee
recommends that the student be awarded the Master of Science of
Applied Statistics.
Members of the Examination Committee are as follows:
Y. Bhg. Prof. Dr. Noor Akma bt Ibrahim
Mathematics Department
Science Faculty
Universiti Putra Malaysia
43400 UPM Serdang
(Chairman)
Prof. Madya Dr. Mohd Rizam b Abu Bakar
Mathematics Department
Science Faculty
Universiti Putra Malaysia
43400 UPM Serdang
(Internal Examiner)
Dr. Jayathi a/p Arasan
Mathematics Department
Science Faculty
Universiti Putra Malaysia
43400 UPM Serdang
(Internal Examiner)
Prof. Madya Dr. Yong Zulina Zubairi
Pusat Asasi Sains
University of Malaya
50603 Kuala Lumpur
(External Examiner)
________________________________
SEOW HENG FONG, PhD
Professor and Deputy Dean
School of Graduate Studies
Universiti Putra Malaysia
Date:
© COPYRIG
HT UPM
ix
This thesis was submitted to the Senate of Universiti Putra Malaysia
and has been accepted as fulfilment of the requirement for the degree
of Master of Science. The members of the Supervisory Committee were
as follows:
Md. Sohel Rana, PhD
Senior Lecturer
Faculty of Science
Universiti Putra Malaysia
43400 UPM Serdang
(Chairman)
Habshah Midi, PhD
Professor
Faculty of Science
Universiti Putra Malaysia
43400 UPM Serdang
(Member)
_______________________________
BUJANG BIN KIM HUAT, PhD
Professor and Deputy Dean
School of Graduate Studies
Universiti Putra Malaysia
Date:
© COPYRIG
HT UPM
x
DECLARATION
I declare that the thesis is my original work except for quotations and
citations which have been duly acknowledged. I also declare that it has
not been previously, and is not concurrently, submitted for any other
degree at Universiti Putra Malaysia or any other institutions.
_____________________________
AHAMEFULE HAPPY JOHN
Date: 13/12/2013
© COPYRIG
HT UPM
xi
TABLE OF CONTENTS
page
ABSTRACT iii
ABSTRAK v
ACKNOWLEDGEMENTS vii
APPROVAL viii
DECLARATION x
LIST OF TABLES xiii
LIST OF FIGURES xiv
LIST OF APPENDICES xv
LIST OF ABBREVIATIONS xvi
1 INTRODUCTION
1.1 Research Background 1
1.2 Importance and Motivation of the Study 1
1.3 Research Objectives 2
1.4 Plan of the study 3
2 LITERATURE REVIEW
2.1 Introduction 4
2.2 Missing Data Mechanisms 4
2.3 Imputation Concepts of Missing Data 5
2.4 Single Imputation 6
2.5.1 Mean Imputation 6
2.5.2 Regression Imputation 6
2.5.3 Stochastic Regression Imputation 7
2.5.4 Dummy Variable Regression Imputation 7
2.5.5 Hot Deck Imputation 8
2.6 Other Imputation Methods 8
2.6.1 Summary of Advantages and Disadvantages of Imputation 9
2.7 Outliers in Linear Regression and Missing Data 9
2.8 Identification of Outliers in Linear Regression 10
2.9 Introduction to Robust Estimators 11
2.9.1 Basic Concepts of Robust Estimators 11
2.9.2 LMS Estimator 11
2.9.3 LTS Estimator 11
2.9.4 M Estimator 13
2.9.5 S Estimator 13
2.9.6 MM Estimator 14
2.10 Conclusion 15
© COPYRIG
HT UPM
xii
3 ROBUST REGRESSION IMPUTATION IN THE
PRESENCE OF OUTLIERS HAVING MISSING
DATA IN THE DEPENDENT VARIABLE
3.1 Introduction 16
3.2 Methodology 16
3.3 Numerical Example: Apple Crop Data 17
3.3.2 Toluca Company Refrigeration Manufacturing Data 21
3.4 Simulation Study 39
3.4.1 One Independent Variable 43
3.4.2 Two Independent Variables 43
3.5 Conclusion 50
4 INVERSE REGRESSION ADJUSTMENT ON MISSING
IN X DIRECTION IN THE PRESENCE OF OUTLIERS
4.1 Introduction 51
4.2 Methodology 51
4.2.1 Dummy Variable Approach 52
4.2.2 INV OLS 52
4.2.3 INV MM 53
4.2.4 INV LTS 53
4.3 Numerical Example: Pilot-Plant Data 53
4.3.2 Artificial Data 55
4.4 Simulation Study 61
4.5 Conclusion 78
5 CONCLUSIONS AND RECOMMENDATIONS
5.1 Introduction 79
5.2 Robust Regression Imputation 79
5.3 Inverse Regression Adjustment on Missing in X Direction
in the Presence of Outliers 79
5.4 Conclusion 79
5.5 Areas of Further Research 80
REFERENCES 81
APPENDICES 84
BIODATA OF STUDENT 105
LIST OF PUBLICATIONS 106
LIST OF TABLES
© COPYRIG
HT UPM
xiii
Table Page
3.1 Comparison of the correlation coefficients of the variables of
the Apple Crop original and modified data. 17
3.2 Regression output from different methods. 19
3.3 Correlation of 5% missing data. 21
3.4 Correlation of 10% missing data. 22
3.5 Correlation of 15% missing data. 23
3.6 Correlation of 20% missing data. 24
3.7 Correlation of 25% missing data. 25
3.8 Correlation of 30% missing data. 26
3.9 Regression output of the 5% missing data. 27
3.10 R2 and adjusted R2 (�̅�2) from the Regression line plot
for 5% missing data 29
3.11 Regression output of the 10% missing data. 30
3.12 R2 and adjusted R
2 (�̅�𝟐) from the Regression line plot
for 10% missing data 30
3.13 Regression output of the 15% missing data. 31
3.14 R2 and adjusted R
2 (�̅�𝟐) from the Regression line plot
for 15% missing data 31
3.15 Regression output of the 20% missing data. 32
3.16 R2 and adjusted R
2 (�̅�𝟐) from the Regression line plot
for 20% missing data 38
3.17 Regression output of the 25% missing data. 38
3.18 R2 and adjusted R
2 (�̅�𝟐) from the Regression line plot
for 25% missing data 38
3.19 Regression output of the 30% missing data. 39
3.20 R2 and adjusted R
2 (�̅�𝟐) from the Regression line plot
for 30% missing data 39
3.21 Bias, SE, MSE and RSE of the regression estimators with 10%
missing data with one independent variable. 40
3.22 Bias, SE, MSE and RSE of the regression estimators with 20%
missing data with one independent variable. 41
3.23 Bias, SE, MSE and RSE of the regression estimators with 30%
missing data with one independent variable 42
3.24 Bias, SE, MSE and RSE of the regression estimators with 10%
missing data with two independent variables. 44
3.25 Bias, SE, MSE and RSE of the regression estimators with 20%
missing data with two independent variables. 46
3.26 Bias, SE, MSE and RSE of the regression estimators with 30%
missing data with two independent variables. 48
4.1 The regression output of 10% missing data with 0%, 10%
and 20% outliers. 54
4.2 Regression output of the 5% missing data with Outliers in Y
direction. 56
4.3 Regression output of the 10% missing data. 57
4.4 Regression output of the 15% missing data. 58
© COPYRIG
HT UPM
xiiii
4.5 Regression output of the 20% missing data. 59
4.6 Regression output of the 25% missing data. 60
4.7 Regression output of the 30% missing data. 61
4.8 Summary output of 5% missing with Sample size 20 62
4.9 Summary output of 10% missing with Sample size 20 63
4.10 Summary output of 15% missing with Sample size 20 64
4.11 Summary output of 5% missing with Sample size 20 65
4.12 Summary output of 5% missing with Sample size 40 66
4.13 Summary output of 10% missing with Sample size 40 67
4.14 Summary output of 15% missing with Sample size 40 68
4.15 Summary output of 20% missing with Sample size 40 69
4.16 Summary output of 5% missing with Sample size 60 70
4.17 Summary output of 10% missing with Sample size 60 71
4.18 Summary output of 15% missing with Sample size 60 72
4.19 Summary output of 20% missing with Sample size 60 73
4.20 Summary output of 5% missing with Sample size 100 74
4.21 Summary output of 10% missing with Sample size 100 75
4.22 Summary output of 15% missing with Sample size 100 76
4.23 Summary output of 20% missing with Sample size 100 77
LIST OF FIGURES
© COPYRIG
HT UPM
xivi
Figure Page
3.1 Regression line plot for original data (without outliers). 20
3.2 Regression line plot for modified data (with outliers). 20
3.3 Regression line plot for 5% missing data. 28
3.4 Regression line plot for 10% missing data. 33
3.5 Regression line plot for 15% missing data. 34
3.6 Regression line plot for 20% missing data. 35
3.7 Regression line of the 25% missing data. 36
3.8 Regression line of the 30% missing data. 37
© COPYRIG
HT UPM
xvi
LIST OF APPENDICES
Appendix Page
A1 Apple Crop Data (Original and Modified data). 84
A2 Original and Modified data with 5% outliers. 85
A3 Original and Modified data with 10% outliers. 86
A4 Original and Modified data with 15% outliers. 87
A5 Original and Modified data with 20% outliers. 88
B1 Pilot-Plant data set. 89
B2 The 10% missing values with 0% outliers. 90
B3 The 10% missing values with 10% outliers. 91
B4 The 10% missing values with 20% outliers. 92
C1 Robust Regression Imputation for analyzing Missing Data. 93
C2 Robust Regression Imputation for analyzing Missing Data
- Simulation with one independent variable. 96
C3 Robust Regression Imputation for analyzing Missing Data 99
C4 Inverse Regression Adjustment on Missing in X
direction in the presence of outliers. 102
LIST OF ABBREVIATIONS
© COPYRIG
HT UPM
xvii
DV – Dummy Variable Approach
INV OLS – Inverse Ordinary Least Square
INV LTS – Inverse Least Trimmed Square
LTS Estimator – Least Trimmed Square Estimator
LMS Estimator – Least Median Square Estimator
MSE – Mean Square Error
OLS- Ordinary Least Squares
RSE – Residual Standard Error
SE – Standard Error
© COPYRIG
HT UPM
1
CHAPTER 1
INTRODUCTION
1.1 Research Background
Missing data occurs frequently in many fields; for example, surveys often
contain missing observation caused by the person being interviewed not
understanding the question correctly. The presence of outliers and missing
data make a huge interpretative problem in data analysis. In the content of
the regression model, missing variables occur in both dependent and
independent variables. Let us suppose a model
𝑌𝑖 = 𝛽0 + 𝛽1𝑥𝑖 + 𝜀𝑖 , 𝑖 = 1, … , 𝑛 with 𝜀𝑖 ∼iidN(0, 𝜎2) (1.1)
If missing observation is present in the independent variables (X) or
dependent variable (Y), we will get the biased estimate of the parameters
𝛽0 𝑎𝑛𝑑 𝛽1 and also the biased standard error of the parameters. However,
the problem becomes more serious if there exists some unusual observations that
we often call outlier. It makes a huge interpretative problem in the model
as the parameters estimate of the model is affected by outliers.
For dealing with the outliers in the context of regression model, we have some
robust estimators such as M-Estimator, MM-Estimator, 𝐿1 - norm Estimator,
LTS-Estimator, LMS-Estimator, etc. Unfortunately, to the best of our
knowledge, no efficient method is available when both outliers and
missing data come together in a data set. In regression analysis, the
missing value is estimated based on random regression or regression
imputation. However, these imputation methods are not reliable in the
presence of outliers since these imputation methods are based on Ordinary
Least Squares (OLS).
In this research, the major question that comes to our mind is, ‘Are the
OLS based imputation estimator robust in the presence of outliers’? The
answer is probably, ‘No’. The reason is that the classical OLS is greatly affected
by outliers (Marrona, 2006; Rousseuw, 2003). We proposed to estimate the
missing data by using robust regression approaches. The performance of
our proposed methods would be evaluated by Monte Carlo simulation
approach and real data analysis. We would be inclined to propose our
robust methods based on the existing robust estimator such as MM-
Estimator. Our proposed methods would estimate the missing data in the
presence of outliers.
1.2 Importance and Motivation of the Study
Missing data is a common problem in many researches. Information is usually
missing in some variables for some cases in a typical data set and most
© COPYRIG
HT UPM
2
data analysis procedures are designed for corrected samples. However,
standard methods are not directly applicable if there are missing data.
It is important to know why missing data is a problem. The most serious
concern is that missing data can bring bias into estimates derived from a
statistical model (Schafer,1997; Alison, 2002). In addition, missing data
can lead to a loss of information and statistical power (Little and Rubin,
2002). Also, another problem is that missing data makes common
statistical methods difficult to apply. Finally, missing data causes a waste in
valuable resources. Due to all these reasons, missing data gives problems in
various fields of research.
Many times, difficulties arise when practitioners try to apply Ordinary
Least Squares regression estimation to real world data which have some
missing observation and outliers. Model adequacy diagnostics show poor
Ordinary Least Squares fit because of missing observation and outliers.
The need to determine the performance of OLS is demanding due to the
quick development of alternative robust estimators. Opportunities are also
available for the development of improved robust estimators. Even then,
progress continues to be made in making algorithms for robust methods
ready for implementation when Ordinary Least Squares fail. It is ideal
that improved methods be computationally practical and available in many
software such as R, S-Plus, etc.
In regression analysis, when missing is in the response variable Y, the
Random Regression Imputation (RRI) is usually used (Little and Rubin,
1987). Unfortunately, this RRI is affected by outliers. This problem has
inspired us to develop a new robust method which is resistant to outliers.
We incorporate the robust estimators in the missing value imputation
methods. We proposed to use the Robust Random Regression Imputation
(RRRI) instead of RRI.
In this thesis, we also consider the situation when data missing is on X
direction and there exists some outliers in the data. It is very common to
use the dummy variable regression approach when data is missing in X
direction. However, this dummy variable regression approach also gives
biased estimate of parameters and biased standard error of parameters in
the presence of outliers. Thus, we proposed a new alternative robust method.
Instead of dummy variable approach, in this situation, we proposed robust
inverse regression to get more reliable estimates of parameters.
1.3 Research Objectives
In regression analysis, missing data comes in both X and Y directions together
with outliers. It is worth mentioning that these types of problems are very
common in the regression data. Hence, in this thesis, we have two main aims
which are as follows.
1. To propose a robust imputation approach when missing values and outliers
present in the Y direction of the regression data. Since in the presence of
© COPYRIG
HT UPM
3
outliers, the classical random Regression Imputation (RRI) is affected by
outliers, we attempt to robustify RRI by using robust MM-estimator in
order to get the robust estimate of missing data in the presence of outliers.
2. To propose a robust imputation approach when missing values are present
in X direction and outliers are present in Y direction. It is seen that the
classical Dummy variable Regression approach is affected by outliers.
Thus, based on the robust inverse regression methods, we aim to estimate
the missing values in X direction in the presence of outliers.
3. To get the benefits from our proposed robust imputation methods
compared to the classical methods, the real data example will be used. We
also conduct simulation studies and applied both classical and robust
methods to compare their performance.
1.4 Plan of the Study
This research is organized into five chapters.
Chapter One: This chapter serves as an introduction to this research. It gives
the background of the research, the problem statement, the research
objectives and plan of the study.
Chapter Two: This chapter provides a brief review of the literature of
missing data, the consequences of missing data, and the techniques for its
estimation. We have also discussed some robust estimators which can help
to estimate the missing data in the presence of outliers.
Chapter Three: In this chapter, a modification of the Random Regression
Imputation (RRI) approach is proposed which we call Robust Random
Regression Imputation (RRRI). A set of numerical results proved that the
Robust Random Regression Imputation approach performs well compared to
the Random Regression Imputation (RRI) in the presence of outliers when
missing data is present in the dependent variable.
Chapter Four: In this chapter, an alternative method of dummy variable
regression approach to estimate the missing data in the independent
variables is proposed. As an alternative, the inverse regression is used. Also,
we proposed the robust version of inverse regression which performs well
in the presence of outliers.
Chapter Five: This last chapter provides the conclusion and significance of the
research findings. I thereby also recommend a number of areas for further
research which could have essential roles in future studies.
© COPYRIG
HT UPM
81
REFERENCES
Allison, P.D. (2002). Missing Data, Sage Series Quantitative Applications in the
social Science. Unpublished PhD thesis, University of Pennsylvania,
Philadelphia.
Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data, 3rd
edition. New York:
Willy.
Buck ,S.F (1960).’ A method of estimation of missing values of multivariate data
suitable for use with an electronic computer. Journal of the Royal Statistical
Society Series. 22: 302-306.
Chatterjee, S. and Hadi, A.S. (2006). Regression Analysis by Examples, 4th edition.
New York: Wiley.
Cohen, J. and Cohen, P. (1985). Applied Multiple Regression and Correlation
Analysis for the Behavioral Sciences, 2nd
edition. Mahwah. N.J : Lawrence
Erlbaum Associates.
Daniel, C., and Wood, F. F. (1971). Fitting Equations to Data, John & Sons. New
York: Willy.
Dempster, A.P, Laird, N.M, Rubin, D.B. (1977). Maximum Likelihood from
Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society.
Series. B (Methodological). 39: 1-38.
Ellenberg, J.H. (1976). Testing for Single outlier from a general regression.
Biometrics. 32: 637-645.
Gujarati, D. (2003). Basic Econometrics, 4th edition. New York: McGraw-Hill.
Habshah, M. and Lau, U. H. (2009). The Performance of Latent Root-M based
Regression. Journal of Mathematics and Statistics. 5: 1-9.
Hampel, F.R. (1974). The influence curve and its role in robust estimation.
Journal of the American Statistical Association. 69 : 383-393.
Hampel, F.R. (1979). Discussion of the meeting on robustness. Proceedings of the
42nd
Session of the ISI. Manila. 100-102.
Hawkins, D.M. Bradu, D. and Kass, G.V. (1984). Location of Several Outliers in
multiple regression data using elemental sets. Technometrics. 26: 197-208.
Heckman, J. J. (1976). The common structure of statistical models of the truncated,
sample selection and limited dependent variables, and a simple estimator of
such models. Annals of Economic and Social Measurement. 5, 475-492.
Huber, P.J. (1981). Robust Statistics. New York: Wiley.
Imon, A.H.M.R.(2005). Identifying multiple influential observations in linear
regression. Journal of Applied Statistics. 32: 929-946
Imon, AHMR. (2009). Deletion residuals in the detection of heterogeneity of
variances in linear regression. Journal of Applied Statistics. 36: 347-358.
© COPYRIG
HT UPM
82
Jones, M.P. (1996). Indicator and Stratification methods for missing explanatory
variables in Multiple linear regression. Journal of the American Statistical
Association. 91: 222-230
Kutner, M. H., Nachtsheim, C. J. and Neter, J. (2008). Applied Linear Regression
Models, 4th
edition. New York: McGraw-Hill.
Little, R.J.A. and Rubin, D.B ,(1987). Statistical Analysis with Missing Data. New
York: Wiley.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with missing Data, 2nd
edition, New York: Wiley.
Maronna ,R.A, Martin, R.D and Yohai V.J, (2006). Robust Statistics-Theory and
Methods. New York: Wiley.
McKnight, Patrick E., McKnight. Katherine M; Sidani , Souraya, and Aurelio Jose
Figueredo. (2007) Missing Data: A Gentle Introduction. Guilford Press. New
York: Willy.
Midi, H,. Rana,.S. and A.H.M.R. Imon. (2009). The Performance of robust
weighted least squares in the presence of outliers and heteroscedastic errors.
WSEAS, Transactions on Mathematics. 8: 351-361.
Montgomery, D. C., Peck, E. A. and Vining, G. G. (2001). Introduction to Linear
Regression Analysis. New York: Willy.
Rana S..Midi, H and A.H.M.R Imon.(2012). Robust with wild bootstrap for
stabilizing the variance of parameter estimates in Heteroscedastic regression
models in the presence of outliers. Mathematical Problems in Engineering, vol.
2012, Article ID 730328, 14 Pages.
Rousseeuw P.J. (1984). Least median of squares regression. Journal of American
Statistics Association. 79: 871–880.
Rousseeuw, P. J. and Yohai, V. J. (1984). Robust regression by means of S
estimators. Lecture Notes in Statistics. 26: 256-272.
Rousseuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection.
New York: Willy.
Rousseuw, P. J. and Leroy, A. M. (2003). Robust Regression and Outlier Detection.
John & Sons. New York: Willy.
Rubin, D.B..(1976). Inference and Missing Data. Biometrika. 63: 581-592.
Ryan, T.P. (1997). Modern Regression Methods, New York: Wiley.
Schafer, J. (1997). Analysis of Incomplete Multivariate Data. New York: Chapman
& Hall.
Schafer, J.L. (2002). Missing Data; Our view of the state of the Art. Psychological
Method. 2: 147-177.
Simpson, J.R. (1995). New Methods and Comparative Evaluations for Robust and
Biased-Robust Regression Estimate. Unpublished Phd Thesis. Arizona State
University.
Snedecor, G.W and Cochran ,W.G (1967). Statistical Methods, 6th
edition, Ames:
Iowa State University Press.
© COPYRIG
HT UPM
83
Srikantan K.S. (1961). Testing for the single outlier in a regression model. Shankhya.
series A. 23: 251-260
Stromberg, A.J. Hossjer, O. and Hawkins, D.M. (2000). The least trimmed and
difference regression estimator and alternatives. Journal of American Statistical
Association. 95: 853-864.
Tukey, J.W. (1977). Exploratory Data Analysis. Addison –Wesley Publisher, Cy.
Yohai, V. J. (1987). High breakdown point and high efficiency robust estimates for
regression. The Annals of Statistics, 15: 642-656.