Prediction of Activities and Visual Concepts Under Complex and Changing Conditions

Camporese, Guglielmo

In recent years, computer vision has changed dramatically: deep learning systems have surpassed earlier models on most tasks and established a new way of thinking about vision problems. Despite this success on traditional computer vision tasks, our systems are still a long way from the general visual intelligence of people. In this dissertation, I discuss my findings on several problems related to the visual prediction of activities and visual concepts under complex and changing conditions. A core problem of visual intelligence is anticipating future events in videos given the current state of knowledge; achieving such predictive capabilities requires vision systems that encode current representations and form hypotheses about future scenarios. I discuss the directions I proposed toward this goal: semantic label smoothing of future actions, representing videos at slow and fast temporal scales, predicting latent goals, and prototyping future action representations. Another challenge of visual intelligence is recognizing visual concepts that the system has never seen before. In this context, I discuss my work on open-set recognition, where the vision model has to detect unknown classes not seen during training while maintaining its recognition capability on previously seen categories. A further core task related to the prediction of visual concepts is representation learning, where the model has to learn good representations from visual input without any supervision. Here I discuss how vision transformers can efficiently learn good representations on small datasets through self-supervised tasks based on the spatial relations of input patches.

Prediction of Activities and Visual Concepts Under Complex and Changing Conditions / Camporese, Guglielmo. - (2023 Feb 27).

	Date of defense
	
				27 Feb 2023
			
	Appears in the categories:
	
				08.01 - Doctoral Thesis UNIPD (Legal Deposit)

Files in this item:

File	Size	Format
tesi_definitiva_Guglielmo_Camporese.pdf (Description: Thesis; Type: Doctoral thesis; License: Other; Open Access from 28/02/2024)	21.38 MB	Adobe PDF