Prediction of Activities and Visual Concepts Under Complex and Changing Conditions / Camporese, Guglielmo. - (2023 Feb 27).

Prediction of Activities and Visual Concepts Under Complex and Changing Conditions

CAMPORESE, GUGLIELMO
2023

Abstract

In recent years, computer vision has changed dramatically because of deep learning systems, which have surpassed the performance of previous models on most tasks and established a new way of thinking about vision problems. Despite their success on traditional computer vision tasks, our systems are still a long way from the general visual intelligence of humans. In this dissertation, I discuss my findings on several problems related to the visual prediction of activities and visual concepts under complex and changing conditions. A core problem of visual intelligence is the ability to anticipate future events in videos given the current state of knowledge. Achieving such predictive capabilities requires dedicated vision systems that encode representations of what is currently observed and generate hypotheses about future scenarios. I discuss the different directions I proposed for giving vision systems these predictive capabilities: semantic label smoothing of future actions, representing videos at slow and fast temporal scales, predicting latent goals, and prototyping future action representations. Another challenge of visual intelligence is recognizing unknown visual concepts that the system has never seen before. In this context, I discuss my work on open-set recognition, where the vision model has to detect unknown classes not seen during training while maintaining its recognition capability on the previously seen categories. A further core task related to the prediction of visual concepts is representation learning, where the model has to learn good representations from visual input without any supervision. Here, I discuss how vision transformers can efficiently learn good representations on small datasets by designing self-supervised tasks based on the spatial relations between input patches.
Files in this item:
tesi_definitiva_Guglielmo_Camporese.pdf
Open access from 28/02/2024
Description: Thesis
Type: PhD thesis
Size: 21.38 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3473495