A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation

Barbato F.; Zanuttigh P.
2024

Abstract

Performance degradation caused by corrupted multimedia samples is a critical challenge for machine learning models. Three groups of approaches have previously been proposed to tackle this issue: i) enhancer and denoiser modules that improve the quality of the noisy data, ii) data augmentation approaches, and iii) domain adaptation strategies. All have drawbacks that limit their applicability: the first requires paired clean-corrupted data for training and has a high computational cost, while the others can only be used on the same task they were trained on. In this paper, we propose SyMPIE to address these shortcomings: a small, modular, and efficient system that enhances input data for robust downstream multimedia understanding at minimal computational cost. SyMPIE is pre-trained on an upstream task/network that need not match the downstream ones, and it does not need paired clean-corrupted samples. Our key insight is that most input corruptions found in real-world tasks can be modeled through global operations on the color channels of images or through spatial filters with small kernels. We validate our approach on multiple datasets and tasks, such as image classification (on ImageNetC, ImageNetC-Bar, VizWiz, and a newly proposed mixed-corruption benchmark named ImageNetC-mixed) and semantic segmentation (on Cityscapes, ACDC, and DarkZurich), with consistent relative accuracy gains of about 5% across the board.
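To make the key insight concrete, the two families of operations named in the abstract (a global operation on the color channels and a spatial filter with a small kernel) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the SyMPIE implementation: the function names, the affine form of the color operation (a 3x3 matrix plus a bias), the edge-replicated padding, and the order in which the two steps are chained are all assumptions made for the example. How the parameters would be estimated is not described here and is left out.

```python
import numpy as np


def apply_color_transform(image, matrix, bias):
    """Apply one global affine map to the color vector of every pixel.

    image:  float array of shape (H, W, 3), values in [0, 1]
    matrix: (3, 3) color-mixing matrix (hypothetical parameter)
    bias:   (3,) per-channel offset (hypothetical parameter)
    """
    return image @ matrix.T + bias


def apply_small_kernel(image, kernel):
    """Filter every channel with the same small k x k kernel (k odd).

    Computes a cross-correlation with edge-replicated borders, so the
    output keeps the height and width of the input.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    height, width = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel tap.
    for dy in range(k):
        for dx in range(k):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width, :]
    return out


def enhance(image, matrix, bias, kernel):
    """Chain the two operations (assumed order) and clip to [0, 1]."""
    recolored = apply_color_transform(image, matrix, bias)
    return np.clip(apply_small_kernel(recolored, kernel), 0.0, 1.0)
```

With an identity matrix, a zero bias, and a kernel whose only non-zero tap is a 1 at the center, `enhance` returns the input unchanged. Other parameter values give, for example, a contrast or color-cast change (matrix and bias) or a blur or sharpening effect (kernel).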
MMSys 2024 - Proceedings of the 2024 ACM Multimedia Systems Conference
15th ACM Multimedia Systems Conference, MMSys 2024
Files in this item:
File: 3625468.3647623.pdf
Access: open access
Type: Published (publisher's version)
License: Creative Commons
Size: 2.89 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3513546