Missing data occur when some variables have no recorded value for some observations. Missing values appear regularly in the social sciences, health research, and other scientific studies for many reasons, e.g., respondents forgetting to answer questions, the experimental design, data collection conditions, or an overly long experiment. In practice, missing data in regression models such as logistic regression are inevitable and can threaten the validity of inference and decisions. To deal with missing values, researchers have used deletion methods, namely the complete-case (CC), semiparametric inverse probability weighting (SIPW) ([1] and [2]), and validation likelihood (VL) ([3]) methods, as well as multiple imputation (MI) approaches ([4] and [5]). However, the literature shows that methods based on deleting unobserved values may reduce the efficiency of estimation, and that the variance estimator from Rubin's MI method ([4]) may underestimate the true variance ([6] and [7]). Wang et al. ([3]) proposed the joint conditional likelihood (JCL) method, a semiparametric approach that uses both the validation and non-validation data, to estimate the parameters of a logistic regression model with a missing covariate. Lee et al. ([8]) presented a semiparametric estimation method and used the VL and JCL methods for logistic regression with both outcome and covariates missing at random. Hsieh et al. ([9]) applied the methods of Lee et al. ([8]) to logistic regression with outcome and covariates missing separately or simultaneously. Jiang et al. ([10]) proposed a stochastic approximation version of the expectation-maximization (EM) algorithm, based on the Metropolis-Hastings algorithm, to perform statistical inference for logistic regression with missing covariates.
In this study, the JCL method is proposed for estimating the parameters of a logistic regression model when two covariate vectors are missing separately or simultaneously; it uses one validation data set and three non-validation data sets to improve estimation. The asymptotic properties of the JCL estimators are established under the assumption that all observable covariates, including surrogates, are categorical. Simulation results show that the proposed method is more efficient than the CC, SIPW, and VL methods. The proposed methodology is illustrated with a real data example.
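To make the contrast between two of the baseline methods concrete, the following is a minimal sketch, not the JCL method of this study. It assumes a single binary covariate X that is missing at random, a binary surrogate W, and selection probabilities that depend only on the always-observed pair (Y, W); every name, parameter value, and selection probability below is hypothetical and chosen only for illustration. The CC fit uses only rows where X is observed, while the IPW fit reweights those rows by the inverse of the empirical selection probability in each (Y, W) cell, which is possible here because the observed variables are categorical.

```python
import math
import random

TRUE_B0, TRUE_B1 = -1.0, 1.0  # hypothetical true parameters


def simulate(n, seed=0):
    """Simulate (x_or_None, w, y, delta); X is missing at random given (Y, W)."""
    rng = random.Random(seed)
    # Hypothetical selection probabilities P(delta = 1 | Y = y, W = w).
    select = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.8}
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        w = x if rng.random() < 0.8 else 1 - x  # surrogate agrees with X 80% of the time
        p = 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * x)))
        y = 1 if rng.random() < p else 0
        delta = 1 if rng.random() < select[(y, w)] else 0
        rows.append((x if delta else None, w, y, delta))
    return rows


def cell_selection_probs(rows):
    """Empirical P(delta = 1 | Y, W), estimated by cell frequencies."""
    total, seen = {}, {}
    for _, w, y, d in rows:
        total[(y, w)] = total.get((y, w), 0) + 1
        seen[(y, w)] = seen.get((y, w), 0) + d
    return {cell: seen[cell] / total[cell] for cell in total}


def fit_weighted_logistic(x, y, weights, iters=15):
    """Newton-Raphson for logit P(Y=1|X) = b0 + b1*X with case weights."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, weights):
            if wi == 0.0:
                continue  # rows with X missing contribute nothing
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


rows = simulate(20000)
pi_hat = cell_selection_probs(rows)
x = [r[0] for r in rows]
y = [r[2] for r in rows]
cc_weights = [float(r[3]) for r in rows]                     # complete cases only
ipw_weights = [r[3] / pi_hat[(r[2], r[1])] for r in rows]    # inverse probability weights

cc_fit = fit_weighted_logistic(x, y, cc_weights)
ipw_fit = fit_weighted_logistic(x, y, ipw_weights)
print("CC  estimate (b0, b1):", cc_fit)
print("IPW estimate (b0, b1):", ipw_fit)
```

Under these assumed selection probabilities, missingness depends on the outcome, so the CC estimates can drift away from the true values while the IPW estimates should stay close to them. Neither fit uses the incomplete rows beyond estimating the selection probabilities, which is the information the JCL approach described above aims to exploit through the non-validation data.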
Journal: Workshop "Information Systems in Economics - WISE15", Room B201, University of Economics Ho Chi Minh City, 279 Nguyễn Tri Phương, District 10; date: 19 December 2015
Can Tho University Journal of Science