Multi-aspect permutation tests for model selection

Elena Barzizza;Nicolò Biasetton;Riccardo Ceccato
2023

Abstract

In typical machine learning frameworks, model selection is of fundamental importance: commonly, multiple models have to be trained and compared in order to identify the one with the best predictive performance. The aim of this study is to provide a new tool to improve the model selection process, allowing the user to identify the algorithm which significantly outperforms the other candidates. It proposes a robust model selection procedure based on a multi-aspect permutation test which makes it possible to detect differences in both location and variability between two paired samples of prediction errors. A new extension of the nonparametric combination (NPC) methodology is therefore introduced and is integrated with an appropriate ranking procedure in order to deal with the comparison of C ≥ 2 candidate models. A simulation study is conducted to evaluate the performance of this testing procedure in 2-sample and C-sample problems, by generating data from various well-known distributions and simulating several possible null and alternative scenarios. The adoption of the proposed technique in machine learning model selection problems is then discussed by means of multiple real data applications. These applications confirm what emerges from the simulation study: the introduced NPC-based approach performs well under several different scenarios and represents a valuable tool for robust machine learning model selection.
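
To make the idea of a multi-aspect test on paired prediction errors more concrete, the following is a minimal sketch in Python, not the authors' procedure. It rests on several assumptions that the abstract does not state: the two aspects are taken to be the first and second moments of the errors, the permutations are random within-pair swaps (sign flips of the paired contrasts), and the partial tests are combined with Fisher's combining function. The function name `multi_aspect_paired_test` and its parameters are hypothetical.

```python
import numpy as np

def multi_aspect_paired_test(err_a, err_b, n_perm=2000, seed=0):
    """Sketch of an NPC-style two-aspect permutation test for paired errors.

    Assumptions (not taken from the paper): aspects are the first and second
    moments, and partial p-values are combined with Fisher's function.
    """
    rng = np.random.default_rng(seed)
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)

    # Per-pair contrasts. Swapping the two errors of a pair flips the sign
    # of both columns, so one sign vector permutes both aspects jointly.
    contrasts = np.column_stack([err_a - err_b, err_a**2 - err_b**2])
    n = contrasts.shape[0]

    # Row 0 holds the observed labelling; the other rows are random swaps.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, n))
    signs = np.vstack([np.ones(n), signs])

    # Two-sided partial statistics, shape (n_perm + 1, 2).
    stats = np.abs(signs @ contrasts) / n

    # Partial p-value of every row: share of rows whose statistic is at
    # least as large. Each row counts itself, so p-values are never zero.
    total = stats.shape[0]
    pvals = np.empty_like(stats)
    for k in range(stats.shape[1]):
        ordered = np.sort(stats[:, k])
        below = np.searchsorted(ordered, stats[:, k], side="left")
        pvals[:, k] = (total - below) / total

    # Fisher's combining function, applied to every row.
    combined = -2.0 * np.log(pvals).sum(axis=1)
    global_p = float(np.mean(combined >= combined[0]))

    return {
        "p_location": float(pvals[0, 0]),
        "p_second_moment": float(pvals[0, 1]),
        "p_global": global_p,
    }

if __name__ == "__main__":
    # Simulated absolute errors of two hypothetical models on 200 test cases.
    demo_rng = np.random.default_rng(1)
    errors_a = np.abs(demo_rng.normal(0.0, 1.0, size=200))
    errors_b = np.abs(demo_rng.normal(0.0, 3.0, size=200))
    print(multi_aspect_paired_test(errors_a, errors_b))
```

Reusing the same sign vector for both aspects keeps their dependence intact in the permutation distribution, which is what allows the partial tests to be combined without modelling that dependence explicitly. Extending the sketch to C ≥ 2 models would require running it on each pair of models and adding a ranking step and a multiplicity adjustment, neither of which is shown here.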


Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3499706