Comparing the Performance of Different High Dimensional Variable Selection Techniques on the Low Dimensional HIV/AIDS Data set

  • Anurag Sharma, Department of Statistics, University of Delhi, Delhi, India. https://orcid.org/0000-0002-3482-0774
  • Vajala Ravi, Department of Statistics, University of Delhi, Delhi, India.
  • Vijay Kumar Sehgal, Department of Mathematical Sciences & Computer Applications, Bundelkhand University, Jhansi, Uttar Pradesh, India.
  • Gurprit Grover, Department of Statistics, University of Delhi, Delhi, India.
Keywords: Elastic Net, HIV, AIDS, AFT, Censoring

Abstract

Heavy censoring and high dimensionality make model fitting and variable selection difficult. This paper examines the performance of four variable selection techniques proposed by Khan and Shaw (2013) in selecting variables to estimate the survival time of patients in a low dimensional HIV/AIDS data set. The techniques are the adaptive elastic net, the weighted elastic net, the adaptive elastic net with censoring constraints and the weighted elastic net with censoring constraints. Their performance is compared with one another and with the full model (the model containing all the predictors). The adaptive elastic net with censoring constraints performed best among all the methods. Moreover, these four techniques can also be used for future prediction of survival time under the accelerated failure time (AFT) model.
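
The abstract does not state how censoring enters the elastic net fits. As an illustration only, the sketch below assumes the Kaplan-Meier (Stute) weighting of the kind described in the Stute (1996) reference listed below, in which each observation's log survival time receives a weight in a weighted least squares loss and censored observations receive weight zero. The function name `stute_weights` and the toy data are hypothetical and are not taken from the paper.

```python
def stute_weights(times, events):
    """Kaplan-Meier (Stute) weights for right-censored data (illustrative sketch).

    times  : observed times (event time or censoring time)
    events : 1 if the event was observed, 0 if the observation is censored
    Returns the weights in the original order of the inputs.
    """
    n = len(times)
    # Sort by time; at tied times, events are placed before censored observations.
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    weights = [0.0] * n
    surv = 1.0  # running product of ((n - j) / (n - j + 1)) ** delta_(j)
    for rank, i in enumerate(order, start=1):
        at_risk = n - rank + 1
        # Censored observations (events[i] == 0) get weight zero.
        weights[i] = events[i] * surv / at_risk
        if events[i]:
            surv *= (n - rank) / at_risk
    return weights


# Toy data (assumed): three patients, the second one censored.
toy_weights = stute_weights([1.0, 2.0, 3.0], [1, 0, 1])
print(toy_weights)
```

In a weighted least squares AFT fit, weights of this kind would multiply the squared residuals of the log survival times before an elastic net penalty is added; the penalty itself is not shown here.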

How to cite this article:
Sharma A, Ravi V, Sehgal VK, Grover G. Comparing the Performance of Different High Dimensional Variable Selection Techniques on the Low Dimensional HIV/AIDS Data set. J Commun Dis 2020; 52(1): 14-21.

DOI: https://doi.org/10.24321/0019.5138.202003

References

Peduzzi PN, Hardy RJ, Holford TR. A stepwise variable selection procedure for nonlinear regression models. Biometrics 1980; 36: 511-6.

Huang J, Ma S. Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis 2010; 16: 176-95.

Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Stat Assoc; 96: 1348-1360.

Tibshirani R. Regression shrinkage and selection via the Lasso. J Roy Statist Soc Ser B 1996; 58: 267-288.

Hunter DR, Li R. Variable selection using MM algorithms. The Annals of Statistics 2005; 33: 1617-1642.

Efron B, Hastie T, Johnstone I et al. Least angle regression. Ann Statist 2004; 32: 407-499.

Candes E, Tao T. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics 2007; 35: 2313-2351.

Zou H, Hastie T. Regularization and variable selection via the elastic net. J Roy Statist Soc Ser B 2005; 67: -320.

Akaike H. Information theory as an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Petrov BN, Csaki F, eds.). Akademiai Kiado, Budapest. 1973, 267-281.

Meinshausen N, Buhlmann P. Stability selection. J Roy Statist Soc Ser B 2010; 72: 417-473.

Cox DR. Regression models and life-tables. J Roy Statist Soc Ser B 1972; 34: 187-220.

Tibshirani R. The lasso method for variable selection in the Cox model. Statistics in Medicine 1997; 16: 385-395.

Faraggi D, Simon R. Bayesian variable selection method for censored survival data. Biometrics 1998; 54: 1475-

Gui J, Li H. Penalized Cox regression analysis in the high dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 2005; 21: 3001-3008.

Antoniadis A, Fryzlewicz P, Letue F. The Dantzig selector in Cox’s proportional hazards model. Scandinavian Journal of Statistics 2010; 37: 531-552.

Huang J, Harrington D. Iterative partial least squares with right censored data analysis: A comparison to other dimension reduction techniques. Biometrics 2005; 61: 17-24.

Datta S, Le-Rademacher J, Datta S. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics 2007; 63: 259-271.

Sha N, Tadesse MG, Vannucci M. Bayesian variable selection for the analysis of microarray data with censored outcome. Bioinformatics 2006; 22: 2262-2268.

Wang S, Nan B, Zhu J et al. Doubly penalized Buckley-James method for survival data with high-dimensional covariates. Biometrics 2008; 64: 132-140.

Engler D, Li Y. Survival Analysis with High-Dimensional Covariates: An Application in Microarray Studies. Statistical Applications in Genetics and Molecular Biology 8 Article 14. 2009.

Cai T, Huang J, Tian L. Regularized estimation for the accelerated failure time model. Biometrics 2006; 65: -404.

Hu S, Rao JS. Sparse penalization with censoring constraints for estimating high dimensional AFT models with applications to microarray data analysis. Technical Reports, University of Miami. 2010.

Khan MHR, Shaw JEH. AdapEnetClass: A class of adaptive elastic net methods for censored data. R package version 1.0. 2013a.

Khan MHR, Shaw JEH. On Dealing with Censored Largest Observations under Weighted Least Squares. CRiSM Working Paper, No. 13-07, Department of Statistics, University of Warwick. 2013b.

Zou H, Zhang HH. On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics; 37: 1733-1751.

Stute W. Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics 1996; 23: 461-471.

Ying Z. A large sample study of rank estimation for censored regression data. Annals of Statistics 1993; : 76-99.

Buckley J, James I. Linear regression with censored data. Biometrika 1979; 66: 429-436.

Khan MHR, Shaw JEH. Variable Selection with The Modified Buckley-James Method and The Dantzig Selector for High-dimensional Survival Data. 2013c.

Jin Z, Lin D, Wei LJ et al. Rank-based inference for the accelerated failure time model. Biometrika 2003; 90: -353.

UNAIDS data (2018): a GAP data report 2018.

UNAIDS (2017): a GAP report 2017.

Published
2020-04-30