Tackling the Distribution Shift in Visual Understanding Applications / Toldo, Marco. - (2023 Feb 17).

Tackling the Distribution Shift in Visual Understanding Applications

TOLDO, MARCO
2023

Abstract

The rise of deep learning enabled a significant leap forward in computer vision in terms of the capabilities of predictive algorithms. While traditional techniques struggled to offer satisfactory performance, deep neural networks have been shown to perform strikingly well across a wide range of tasks, paving the way for application-oriented solutions based on such algorithms. Yet, real-world applications typically involve learning settings that depart from the standard static, single-stage supervised setup in which deep learning algorithms have proven to excel. On the one hand, data collection settings seldom match the target environments in which the learning algorithm is intended to operate. This environmental discrepancy calls for domain adaptation, with the final objective of developing predictive models that are robust to changes in input distribution between supervised and target setups. On the other hand, in an ever-changing world, it is of primary importance to continually acquire new knowledge to keep up with novel tasks, without losing what has been laboriously achieved so far. The presence of dynamically evolving tasks demands solutions capable of continual learning with only new-task training data at their disposal, without erasing previously learned information. Both problems are generally encountered simultaneously: novel tasks call for new learning phases to master them, which, in turn, inevitably introduce changing acquisition settings and, with them, shifts in the data distribution. In this thesis we investigate both problems, that is, learning under domain and task shift. We first address them individually, and then face them together. In the first part of the thesis, we investigate the domain adaptation problem in its unsupervised form. Unsupervised domain adaptation comes in handy when relying on label-abundant domains for label-costly tasks, such as semantic segmentation.
We first provide a broad and detailed overview of the different ways in which this problem can be approached, and then propose new techniques to perform adaptation at multiple levels of data representation. In particular, image-level distribution alignment is achieved by image-to-image translation, whereas model-produced representations are adapted at the output level by domain-adversarial schemes and self-training, and at the intermediate level by a clustering-based class-conditional regularization. In the second part, we shift our focus to continual task learning. Class-incremental learning is crucial to sustain the dynamic expansion of the pool of semantic classes when past training data is no longer accessible. We show that regaining access to and replaying former data distributions, while still following an exemplar-free setup, is a successful strategy, especially for long training progressions. We first devise an incremental framework that models the representation drift undergone by old classes and leverages the up-to-date estimated feature distribution to reproduce samples of no-longer-available categories in the latent space. Then, we propose to replay image-level information by retrieving coarsely labeled samples, via generative models and web crawling, and pseudo-labeling them. Still, while domain adaptation and continual task learning have been extensively studied as separate problems, methods that target only one of them perform poorly in a general setting comprising both scenarios. Therefore, in the third part of the thesis we develop a framework able to deal with the general continual learning problem that encompasses both domain and task shift. We propose a domain stylization scheme to cope with incremental domain shift and spread task-related information across all encountered domains, joined with a robust distillation mechanism to preserve previously acquired knowledge and adapt it to new environments.
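The self-training and pseudo-labeling ideas mentioned in the abstract boil down to keeping only the target-domain predictions a source-trained model is confident about, and reusing them as training labels. A minimal illustrative sketch of confidence-thresholded pseudo-labeling follows (the function name and the 0.9 threshold are assumptions for illustration, not details taken from the thesis):

```python
def pseudo_label(probs, threshold=0.9):
    """Keep only predictions whose maximum softmax score exceeds
    the confidence threshold; return (sample index, class) pairs."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected


# Softmax outputs of a source-trained model on unlabeled target samples.
target_probs = [
    [0.95, 0.03, 0.02],  # confident -> pseudo-labeled as class 0
    [0.40, 0.35, 0.25],  # ambiguous -> discarded
    [0.05, 0.92, 0.03],  # confident -> pseudo-labeled as class 1
]
print(pseudo_label(target_probs))  # [(0, 0), (2, 1)]
```

In an actual self-training loop, the retained pairs would be fed back as supervision for further training on the target domain.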
Files in this record:
File: tesi_definitiva_Marco_Toldo.pdf
Description: tesi_definitiva_Marco_Toldo
Type: Doctoral thesis
Size: 30.32 MB
Format: Adobe PDF
Embargo until 18/08/2024
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3471254