Assessment of Machine Learning Pipelines for Prediction of Behavioral Deficits from Brain Disconnectomes

Zorzi M.; De Filippo De Grazia M.; Blini E.; Testolin A.
2021

Abstract

Recent studies have shown that brain lesions following stroke can be probabilistically mapped onto disconnections of white matter tracts, and that the resulting “disconnectome” is predictive of the patient’s behavioral deficits. Disconnectome maps are sparse, high-dimensional 3D matrices that require unsupervised dimensionality reduction followed by supervised learning to predict the associated behavioral data. However, the optimal machine learning pipeline for disconnectome data has not yet been identified. We examined four dimensionality reduction methods at varying levels of compression and used the extracted features as input for cross-validated regularized regression to predict the associated language and motor deficits. Features extracted by Principal Component Analysis and Non-Negative Matrix Factorization were the best predictors, followed by those from Independent Component Analysis and Dictionary Learning. Optimizing the number of extracted features improved predictive accuracy and greatly reduced model complexity. Moreover, each dimensionality reduction technique was found to combine best with a specific type of regularized regression (ridge vs. LASSO). Overall, our findings represent an important step towards an optimal pipeline that yields high prediction accuracy with a small number of features, which can also improve model interpretability.
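The two-stage design described in the abstract (unsupervised dimensionality reduction, then cross-validated regularized regression, with the number of features treated as a tunable quantity) can be illustrated with a minimal sketch. This is not the authors' code: it covers only the PCA plus ridge combination, it uses NumPy alone, the data are synthetic rather than real disconnectome maps, and every function name (`pca_fit`, `ridge_fit`, `cv_r2`, `make_synthetic`) and parameter value is a hypothetical choice made for illustration.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def ridge_fit(Z, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    A = Zc.T @ Zc + alpha * np.eye(Z.shape[1])
    w = np.linalg.solve(A, Zc.T @ (y - y_mean))
    return w, y_mean - z_mean @ w


def cv_r2(X, y, n_components, alpha, n_folds=5, seed=0):
    """K-fold cross-validated R^2 of the PCA + ridge pipeline."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Fit PCA and ridge on the training fold only, to avoid leakage.
        mean, axes = pca_fit(X[train], n_components)
        w, b = ridge_fit((X[train] - mean) @ axes.T, y[train], alpha)
        pred[test_idx] = (X[test_idx] - mean) @ axes.T @ w + b
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def make_synthetic(n_subjects=120, n_voxels=500, n_latent=5, seed=0):
    """Synthetic stand-in (assumption): a few latent factors drive X and y."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_subjects, n_latent))
    loadings = rng.normal(size=(n_latent, n_voxels))
    X = latent @ loadings + 0.5 * rng.normal(size=(n_subjects, n_voxels))
    beta = np.linspace(1.0, 2.0, n_latent)
    y = latent @ beta + 0.1 * rng.normal(size=n_subjects)
    return X, y


X, y = make_synthetic()
# Treat the number of extracted features as a hyperparameter.
scores = {k: cv_r2(X, y, n_components=k, alpha=1.0) for k in (1, 5, 20)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

In this sketch the reduction step is refit inside every fold, so the held-out subjects never influence the extracted features. Replacing the closed-form ridge solver with an L1-penalized one would give the LASSO variant mentioned in the abstract.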
Lecture Notes in Artificial Intelligence
978-3-030-86992-2
978-3-030-86993-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3419022