A sparse linear regression model for incomplete datasets

Marcelo B.A. Veras, Diego P.P. Mesquita, Cesar L.C. Mattos, João P.P. Gomes*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

1 Citation (Scopus)


Incomplete data are often neglected when designing machine learning methods. A popular workaround among practitioners is a preprocessing step that fills in (imputes) the missing components. These imputation algorithms are designed independently of the machine learning method applied afterwards, which may lead to sub-optimal results. An alternative is to redesign classical machine learning methods so that they handle missing data directly. In this paper, we propose a variant of the forward stagewise regression (FSR) algorithm for incomplete data. The original FSR is an iterative procedure for estimating the parameters of sparse linear models. The proposed method, named forward stagewise regression for incomplete datasets with GMM (FSIG), models the missing components as random variables following a Gaussian mixture distribution. In FSIG, the main steps of FSR are adapted to deal with the intrinsic uncertainty of incomplete samples. FSIG was evaluated in an extensive set of experiments and outperformed classical methods in most of the tested cases.
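For readers unfamiliar with the baseline, the sketch below shows classical (incremental) forward stagewise regression on complete data. It is not FSIG: the Gaussian-mixture treatment of missing components is not shown. The function name `forward_stagewise` and the parameters `eps` (step size) and `n_steps` (iteration budget) are hypothetical choices for illustration. The sketch assumes standardized predictor columns and a centered response. At each iteration it finds the predictor most correlated with the current residual and nudges that coefficient by a small step in the direction of the correlation. Many coefficients never leave zero, which is why this procedure yields sparse linear models.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Classical incremental forward stagewise regression (illustrative sketch).

    Assumes the columns of X are standardized and y is centered.
    Hypothetical parameters: eps is the step size, n_steps the iteration budget.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    for _ in range(n_steps):
        corr = X.T @ residual              # correlation of each predictor with the residual
        j = int(np.argmax(np.abs(corr)))   # predictor most correlated with the residual
        if abs(corr[j]) < 1e-10:           # residual is (numerically) uncorrelated: stop
            break
        delta = eps * np.sign(corr[j])     # small step toward that predictor
        beta[j] += delta
        residual -= delta * X[:, j]        # keep the residual in sync with beta
    return beta
```

Usage: with synthetic, noise-free data generated as `y = 3*X[:, 0] - 2*X[:, 2]` from five standardized columns, the returned coefficients should land within roughly a few multiples of `eps` of `[3, 0, -2, 0, 0]`. A smaller `eps` traces the coefficient path more finely at the cost of more iterations.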

Original language: English
Journal: Pattern Analysis and Applications
Early online date: 2019
Publication status: Published (4 Dec 2019)
MoE publication type: A1 Journal article-refereed


  • Forward stagewise regression
  • Gaussian mixtures
  • Missing data


