Data Driven Approaches for Depth Data Denoising

Agresti, Gianluca

Scene depth is an important piece of information that can be used to recover the scene geometry, which is missing from standard color images. For this reason, depth information is employed in many applications, such as 3D reconstruction, autonomous driving and robotics. The last decade has seen the spread of various commercial devices able to sense scene depth. Among these, Time-of-Flight (ToF) cameras are becoming popular because they are relatively cheap and can be miniaturized and integrated into portable devices. Stereo vision systems are the most widespread 3D sensors and consist simply of two standard color cameras. However, they are not free from flaws; in particular, they fail when the scene has no texture. Active stereo and structured light systems have been developed to overcome this issue by using external light projectors. This thesis collects the findings of my Ph.D. research, which is mainly devoted to the denoising of depth data. First, some of the most widespread commercial 3D sensors are introduced together with their strengths and limitations. Then, some techniques for enhancing the quality of ToF depth acquisitions are presented and compared with other state-of-the-art methods. The first proposed method is based on a hardware modification of the standard ToF projector. The second approach instead uses multi-frequency ToF recordings as input to a deep learning network to improve the depth estimation. Particular attention is given to how the denoising performance degrades when the network is trained on synthetic data and tested on real data, and a method to reduce this performance gap is proposed. Since ToF and stereo vision systems have complementary characteristics, the possibility of fusing the information coming from these sensors is analysed, and a method based on a locally consistent fusion, guided by a learning-based reliability measure for the two sensors, is proposed. 
Part of this thesis is dedicated to describing the data acquisition procedures and the related labeling required to collect the datasets used for training and evaluating the proposed methods.

Scene depth is an important piece of information that can be used to recover the geometry of the scene itself, an element missing from plain color images. For this reason, depth data are often used in many applications such as 3D reconstruction, autonomous driving and robotics. The last decade has seen the spread of various devices capable of estimating the depth of a scene. Among these, Time-of-Flight (ToF) cameras are becoming increasingly popular because they are relatively inexpensive and can be miniaturized and integrated into portable devices. Stereo vision systems are the most widespread 3D sensors and consist of two ordinary color cameras. These sensors are not free from flaws, however; in particular, they cannot correctly estimate the depth of scenes without texture. Active stereo and structured light systems have been developed to solve this problem by using an external projector. This thesis presents the results I obtained during my Ph.D. at the University of Padova (Università degli Studi di Padova). The main goal of my work was to present methods for improving 3D data acquired with commercial sensors. In the first part of the thesis, the most widespread 3D sensors are presented together with their strengths and weaknesses. Methods for improving the quality of depth data acquired with ToF cameras are then described. The first method exploits a hardware modification of the ToF projector. The second uses a convolutional neural network (CNN) that exploits data acquired by a ToF camera to estimate an accurate depth map of the scene. My work paid attention to how the performance of this method degrades when the CNN is trained on synthetic data and tested on real data. 
Consequently, a method to reduce this loss of performance is presented. Since depth maps acquired with ToF sensors and stereo systems have complementary properties, the possibility of fusing these two sources of information was investigated. In particular, a fusion method is presented that enforces the local consistency of the data and uses an estimate of the accuracy of the two sensors, computed with a CNN, to guide the fusion process. Part of the thesis is devoted to describing the acquisition procedures for the data used to train and evaluate the presented methods.

Data Driven Approaches for Depth Data Denoising / Agresti, Gianluca. - (2019 Dec 02).