Prediction of Activities and Visual Concepts Under Complex and Changing Conditions / Camporese, Guglielmo. - (2023 Feb 27).
Prediction of Activities and Visual Concepts Under Complex and Changing Conditions
CAMPORESE, GUGLIELMO
2023
Abstract
In recent years, computer vision has changed dramatically because of deep learning systems, which have surpassed the performance of previous models on most tasks and established a new way of thinking about vision problems. Despite their success on traditional computer vision tasks, our systems are still a long way from the general visual intelligence of humans. In this dissertation, I discuss my findings on several problems related to the visual prediction of activities and visual concepts under complex and changing conditions. A core problem of visual intelligence is the ability to anticipate future events in videos given the current state of knowledge. Achieving such predictive capabilities requires vision systems designed specifically to encode current representations and to generate hypotheses about future scenarios. I discuss the different directions I proposed for giving vision systems these predictive capabilities: semantic label smoothing of future actions, representing videos at slow and fast temporal scales, predicting latent goals, and prototyping future action representations. Another challenge of visual intelligence is recognizing unknown visual concepts that the visual system has never seen before. In this context, I discuss my work on open-set recognition, where the vision model has to detect unknown classes not seen during training while maintaining its recognition capability on previously seen categories. Another core task related to the prediction of visual concepts is representation learning, where the model has to learn good representations from visual input without any supervision. In this context, I discuss how vision transformers can efficiently learn good representations on small datasets through self-supervised tasks based on the spatial relations of input patches.
File: tesi_definitiva_Guglielmo_Camporese.pdf (description: Thesis; type: PhD thesis; format: Adobe PDF; size: 21.38 MB). Open access from 28/02/2024.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.