Towards rigorous dataset quality standards for deep learning tasks in precision agriculture: A case study exploration

Carraro Alberto; Marinello Francesco
2025

Abstract

Deep Learning (DL) with Convolutional Neural Networks (CNNs) has become a key approach to classifying plant diseases from images. Its prominence has intensified the demand for large volumes of annotated training data. However, acquiring such data is costly and intricate, and it is fraught with subtle challenges. In the plant domain, where data collection can be even more complex, this study scrutinises how one dataset was gathered. Specifically, it examines the collection of images of grapevine leaves in an open field for a binary classification task: detecting the presence or absence of Esca disease. Adherence to rigorous dataset quality standards during image collection is paramount in precision agriculture, because errors made in this phase can compromise all subsequent work. For instance, a collection of photos may exhibit a consistent difference in background characteristics between images belonging to different classes. This systematic difference can lead a deep-learning algorithm to learn undesired correlations, and the problem can go unnoticed: the algorithm's measured performance appears excellent because the training and test sets share the same disparity.
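
The background disparity described in the abstract can be illustrated with a simple sanity check: if a classifier that sees only the image border (and never the leaf) already separates the classes, the dataset probably carries a background shortcut. The sketch below is not the authors' method. It assumes the leaf sits near the image centre, uses a nearest-centroid classifier chosen here purely for simplicity, and runs on synthetic images; the function names `border_mean_color`, `background_only_accuracy`, and `make_synthetic_set`, and the `border` fraction, are hypothetical.

```python
import numpy as np


def border_mean_color(image, border=0.15):
    """Mean colour of the outer frame of an (H, W, 3) image, ignoring the centre."""
    h, w, _ = image.shape
    bh, bw = max(1, int(h * border)), max(1, int(w * border))
    mask = np.ones((h, w), dtype=bool)
    mask[bh:h - bh, bw:w - bw] = False  # assumption: the leaf occupies the centre
    return image[mask].mean(axis=0)


def background_only_accuracy(images, labels, border=0.15):
    """Leave-one-out accuracy of a nearest-centroid classifier fed border colour only."""
    feats = np.array([border_mean_color(img, border) for img in images])
    labels = np.asarray(labels)
    correct = 0
    for i in range(len(feats)):
        keep = np.arange(len(feats)) != i  # hold out image i
        centroids = {
            c: feats[keep & (labels == c)].mean(axis=0)
            for c in np.unique(labels[keep])
        }
        pred = min(centroids, key=lambda c: np.linalg.norm(feats[i] - centroids[c]))
        correct += int(pred == labels[i])
    return correct / len(feats)


def make_synthetic_set(n_per_class, biased, seed=0):
    """Synthetic stand-in for a two-class photo collection (not real field data)."""
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            # Biased sets give each class its own background brightness.
            level = 0.3 + 0.4 * label if biased else 0.5
            img = level + rng.normal(0.0, 0.05, size=(32, 32, 3))
            # The centre ("leaf") is random and carries no class information.
            img[8:24, 8:24] = rng.uniform(0.0, 1.0, size=(16, 16, 3))
            images.append(img)
            labels.append(label)
    return images, labels


if __name__ == "__main__":
    for biased in (True, False):
        imgs, labs = make_synthetic_set(20, biased)
        print(f"biased={biased}: background-only accuracy = "
              f"{background_only_accuracy(imgs, labs):.2f}")
```

On the biased synthetic set the border alone is enough to recover the label, whereas on the unbiased set it should carry no usable signal. A high background-only score on a real collection would suggest, under the assumptions above, that a CNN trained on it could reach excellent test accuracy without learning anything about the disease.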
Files in this item:
File: 1-s2.0-S2772375524003253-main_compressed.pdf
Access: open access
Description: Article
Type: Published (Publisher's Version of Record)
License: Creative Commons
Size: 1.02 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3544955