Towards rigorous dataset quality standards for deep learning tasks in precision agriculture: A case study exploration

Deep Learning (DL) with Convolutional Neural Networks (CNNs) has become a key tool for classifying plant diseases from images, which has intensified the demand for large volumes of annotated training data. Acquiring such data is costly and complex, with subtle pitfalls. Data collection can be even harder in the plant domain, and this study examines how one dataset was gathered: images of grapevine leaves collected in an open field for a binary classification task, the presence or absence of Esca disease. Adherence to rigorous dataset quality standards during image collection is paramount in precision agriculture, because errors made in this phase can have devastating repercussions on all subsequent work. For instance, a collection of photos may exhibit a consistent difference in background characteristics between images belonging to different classes. This persistent difference can lead a deep-learning algorithm to learn undesired correlations, yet the algorithm's measured performance can still appear excellent because the training and test sets share the same disparity.
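The background shortcut described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' data or method: it assumes each photo is reduced to a single hypothetical "background brightness" value, that diseased leaves in the biased collection were photographed against brighter backgrounds, and that the classifier is a simple threshold rule rather than a CNN. All names and numeric values are illustrative assumptions.

```python
import random


def make_samples(n, biased, rng):
    """Return (background_brightness, label) pairs for n synthetic photos.

    Hypothetical setup: label 1 = diseased leaf, label 0 = healthy leaf.
    If `biased` is True, the background brightness depends on the label
    (the collection flaw); otherwise it is independent of the label.
    """
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        if biased:
            mean = 0.7 if label == 1 else 0.3
        else:
            mean = 0.5
        samples.append((rng.gauss(mean, 0.1), label))
    return samples


def accuracy(samples, threshold):
    """Accuracy of the rule 'brightness > threshold means diseased'."""
    correct = sum((b > threshold) == (y == 1) for b, y in samples)
    return correct / len(samples)


def fit_threshold(samples):
    """Pick the observed brightness value that maximises training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for candidate, _ in samples:
        acc = accuracy(samples, candidate)
        if acc > best_acc:
            best_t, best_acc = candidate, acc
    return best_t


rng = random.Random(0)  # fixed seed so the run is reproducible
train = make_samples(500, biased=True, rng=rng)
test_biased = make_samples(500, biased=True, rng=rng)
test_unbiased = make_samples(500, biased=False, rng=rng)

# The "model" never sees the leaf, only the background.
threshold = fit_threshold(train)
acc_biased = accuracy(test_biased, threshold)
acc_unbiased = accuracy(test_unbiased, threshold)

print(f"accuracy on test set with the same bias: {acc_biased:.2f}")
print(f"accuracy on test set without the bias:   {acc_unbiased:.2f}")
```

Under these assumptions the background-only rule scores well on a test set that shares the collection bias and falls to roughly chance level once the background no longer tracks the class. A held-out set drawn from the same flawed collection therefore cannot reveal the problem.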

Carraro Alberto; Saurio Gaetano; Marinello Francesco

2025

Year: 2025
Journal: SMART AGRICULTURAL TECHNOLOGY
DOI: https://dx.doi.org/10.1016/j.atech.2024.100721
WOS code: WOS:001391892600001
Scopus code: 2-s2.0-85212317991
OpenAlex code: W4405404922
Project title: Agritech National Research Centre
Acronym: Next-GenerationEU
Funder: European Union
Funding: Piano Nazionale di Ripresa e Resilienza (PNRR, National Recovery and Resilience Plan), Mission 4 Component 2, Investment 1.4, D.D. 1032 of 17/06/2022
Contract no.: CN00000022
Publication type: 01.01 - Journal article

Files in this item:

File	Size	Format
1-s2.0-S2772375524003253-main_compressed.pdf (open access; Description: Article; Type: Published (Publisher's Version of Record); License: Creative Commons)	1.02 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3544955