Technical note: Improving the accuracy of mid-infrared prediction models by selecting the most informative wavelengths

Paolo Gottardo, Massimo De Marchi, Martino Cassandro, Mauro Penasa
2015

Abstract

Mid-infrared spectroscopy (MIRS) is widely used to collect milk phenotypes at the population level. The aim of this study was to test the ability of the uninformative variable elimination (UVE) method to identify and remove uninformative wavelength variables before partial least squares (PLS) regression. Milk titratable acidity (TA) and Ca content were used as examples to illustrate the procedure. Reference values for TA and Ca, together with the corresponding MIRS spectra (n = 208), were retrieved from an existing database. The data set was randomly divided into a calibration set (70% of the data) and a validation set (30% of the data), and PLS regression was carried out before and after the UVE procedure. Of the 1,060 wavelengths available, UVE retained 244 as informative for TA and 113 for Ca. Eliminating uninformative variables before PLS regression increased the accuracy of the MIRS prediction models and substantially reduced computation time. Working with fewer variables is expected to make MIRS models more efficient for predicting phenotypes at the population level.
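The abstract describes the UVE-then-PLS workflow only at a high level. The Python sketch below illustrates one common way to implement it, assuming NumPy is available. Every specific choice here is an assumption and is not taken from the paper: the hypothetical helpers `pls1_fit` (a NIPALS PLS1 regression) and `uve_select`, leave-one-out resampling to estimate coefficient stability, uniform artificial noise scaled by 1e-10, a cutoff equal to the largest absolute reliability among the noise columns, 10 latent variables, and the synthetic data standing in for MIRS spectra. UVE appends random noise columns to the spectra, fits PLS on many resampled calibration sets, computes each variable's reliability as the mean of its regression coefficients divided by their standard deviation, and keeps only real wavelengths that are more reliable than any noise column.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Hypothetical helper: single-response PLS via NIPALS.

    Returns coefficients on the original X scale and the intercept,
    so predictions are X @ coef + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centred copies that get deflated
    p = X.shape[1]
    W, P, q = np.zeros((p, n_comp)), np.zeros((p, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)             # unit-length weight vector
        t = E @ w                          # scores of component a
        tt = t @ t
        P[:, a] = E.T @ t / tt             # X loadings
        q[a] = f @ t / tt                  # y loading
        W[:, a] = w
        E = E - np.outer(t, P[:, a])       # deflate X
        f = f - q[a] * t                   # deflate y
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


def uve_select(X, y, n_comp, noise_scale=1e-10, seed=0):
    """Hypothetical helper: indices of X columns judged informative by UVE.

    Assumed settings: uniform noise scaled by noise_scale, leave-one-out
    resampling, cutoff = max |reliability| over the noise columns.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    X_aug = np.hstack([X, noise_scale * rng.uniform(size=(n, p))])
    coefs = np.empty((n, 2 * p))
    for i in range(n):                     # one PLS model per left-out sample
        keep = np.arange(n) != i
        coefs[i], _ = pls1_fit(X_aug[keep], y[keep], n_comp)
    reliability = coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)
    cutoff = np.abs(reliability[p:]).max()
    return np.flatnonzero(np.abs(reliability[:p]) > cutoff)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy data (NOT milk spectra): 150 samples x 30 "wavelengths",
# of which only the first 4 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 30))
y = X[:, :4] @ np.ones(4) + 0.1 * rng.normal(size=150)

# Random 70/30 split into calibration and validation sets.
idx = rng.permutation(150)
cal, val = idx[:105], idx[105:]
n_comp = 10

# PLS on all variables.
coef, b0 = pls1_fit(X[cal], y[cal], n_comp)
rmse_full = rmse(y[val], X[val] @ coef + b0)

# UVE on the calibration set only, then PLS on the retained variables.
selected = uve_select(X[cal], y[cal], n_comp)
coef_s, b0_s = pls1_fit(X[cal][:, selected], y[cal], min(n_comp, len(selected)))
rmse_uve = rmse(y[val], X[val][:, selected] @ coef_s + b0_s)

print(f"{len(selected)} of {X.shape[1]} variables kept")
print(f"validation RMSE: all = {rmse_full:.3f}, UVE = {rmse_uve:.3f}")
```

UVE is run on the calibration set only, so the validation set stays untouched for the before/after comparison. Because the cutoff is set by the most reliable noise column, a few truly uninformative wavelengths can still pass by chance; implementations often trade this off by using a quantile of the noise reliabilities instead of the maximum.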
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3182955