Variable selection plays a fundamental role in the analysis of data containing several variables that are redundant or irrelevant to the problem of interest. Identifying and discarding these variables can improve predictive performance and data interpretation while reducing costs and computational time. Although many methods have been proposed for feature selection, some fields are more interested in selecting groups of variables, because adjacent data are continuous in nature and covary. This is the case for near-infrared spectroscopy, where several methods, mainly based on partial least squares regression, have been proposed for interval selection. In this article, we consider some of these methods and propose an additional solution based on a variable clustering procedure (Cov/VSURF), Lasso regression and permutation tests. We compare their performance on four public datasets and discuss the impact of interval selection on the predictive performance of the models considered.
Interval selection: A case-study-based approach
Arboretti R.; Ceccato R.; Pegoraro L.; Salmaso L.
2021