PLS for classification

Stocchero M.; Scarpa B.
2021

Abstract

Partial Least Squares regression (PLS) is a multivariate technique developed to perform regression in the case of multivariate responses when multicollinearity, redundancy and noise affect the predictors. Although several efforts have been made to extend PLS to classification problems, this remains an active field of research. In the present study, a new technique called PLS for classification is introduced to solve the general G-class problem. It is developed within a self-consistent framework based on linear algebra and on the theory of compositional data. After the notion of probability-data vector is introduced, the space of the predictors and that of the conditional probabilities are linked, and a well-defined least squares problem, whose solution specifies the relationship between probabilities and predictors, is solved by a suitable reformulation of PLS2. The method directly estimates the conditional probability of class membership given the predictors. The score vectors are introduced only in a second step to improve model interpretation. The main properties of PLS for classification and its relationship with PLS-DA are discussed. One simulated and one real data set are investigated to show how the method works in practice.
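
The abstract does not spell out how the probability space is linked to the predictor space, so the following is only a generic illustration and not the authors' construction. It sketches the centered log-ratio (clr) transform, a standard tool of compositional data analysis, which maps a vector of strictly positive class probabilities to unconstrained real coordinates and back. The function names `clr` and `clr_inverse` and the example vector are hypothetical choices made for this sketch.

```python
import math


def clr(p):
    """Centered log-ratio transform of a strictly positive probability vector.

    Illustrative assumption: this is a textbook compositional-data tool,
    not necessarily the mapping used in the paper.
    """
    logs = [math.log(x) for x in p]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log makes the coordinates sum to zero.
    return [v - mean_log for v in logs]


def clr_inverse(z):
    """Map real coordinates back to probabilities (a softmax)."""
    m = max(z)  # shift by the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical conditional probabilities for a 3-class problem.
p = [0.7, 0.2, 0.1]
z = clr(p)              # unconstrained coordinates, suitable for least squares
p_back = clr_inverse(z)  # recovers the original probabilities
```

Working in such log-ratio coordinates lets a linear least squares model operate on unconstrained values, while the inverse map guarantees that the predictions are positive and sum to one.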

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3414357