
3D feature representations for visual perception and geometric shape understanding / Zhao, Yongheng. - (2019 Dec 02).

3D feature representations for visual perception and geometric shape understanding

Zhao, Yongheng
2019

Abstract

In this thesis, we first present a unified look at several well-known 3D feature representations, ranging from hand-crafted designs to learning-based ones. We then propose three kinds of feature representations, built from RGB-D data and from point clouds, each addressing a different problem and aiming at a different functionality. With RGB-D data, we address the shortcomings of 2D feature representations in visual perception by integrating 3D information. We propose an RGB-D feature representation that fuses an object's statistical color model with depth information in a probabilistic manner. The depth information not only enhances the discriminative power of the model against clutter at a different range, but also serves as a constraint to properly update the model and reduce model drift. The proposed representation is evaluated within our object tracking algorithm, MS3D, on a public RGB-D object tracking dataset; it runs in real time and outperforms the other state-of-the-art RGB-D trackers. Furthermore, we integrate the MS3D tracker into an RGB-D camera network to handle long-term and full occlusions. The accuracy and robustness of our algorithm are evaluated on our presented dataset, and the results show that it tracks multiple objects accurately and continuously over the long term. For 3D point clouds, current deep-learning-based feature representations often discard the spatial arrangement of the data, thus failing to respect the parts-to-whole relationship that is critical to explaining and describing 3D shapes. To address this problem, we propose 3D point-capsule networks, an autoencoder designed for unsupervised learning of feature representations from sparse 3D point clouds while preserving the spatial arrangement of the input data across distinct feature attentions.
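As a rough illustration of the probabilistic color-depth fusion described above, one could combine a color-histogram likelihood with a Gaussian depth likelihood under a conditional-independence assumption. The function and parameter names below are hypothetical sketches, not the implementation from the thesis:

```python
import numpy as np

def fused_likelihood(hue_bin, depth, color_hist, mu_d, sigma_d):
    """Per-pixel object likelihood fusing a statistical color model
    (histogram look-up) with a Gaussian depth model, assuming the two
    cues are conditionally independent given the object:
        p(obj | color, depth) ~ p(color | obj) * p(depth | obj)
    """
    p_color = color_hist[hue_bin]                              # color cue
    p_depth = np.exp(-0.5 * ((depth - mu_d) / sigma_d) ** 2)   # depth cue
    return p_color * p_depth

# Toy example: the object occupies hue bin 3 and sits at ~1.2 m.
hist = np.zeros(16)
hist[3] = 1.0
fg = fused_likelihood(3, 1.25, hist, mu_d=1.2, sigma_d=0.1)  # on-object pixel
bg = fused_likelihood(3, 3.00, hist, mu_d=1.2, sigma_d=0.1)  # clutter at 3 m
```

A same-colored clutter pixel at a different range (`bg`) receives a near-zero score, illustrating how the depth cue suppresses range-distinct clutter and can gate model updates to limit drift.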
3D point-capsule networks arise as a direct consequence of our unified formulation of common 3D autoencoders. The dynamic routing scheme and the distinctive 2D latent feature representation deployed by our capsule networks improve several common point-cloud tasks, such as object classification, object reconstruction, and part segmentation, as substantiated by our extensive evaluations. Moreover, they enable new applications such as part interpolation and replacement. Finally, towards rotation equivariance of 3D feature representations, we present a 3D capsule architecture for processing point clouds that is equivariant with respect to the SO(3) rotation group, translation, and permutation of the unordered input sets. The network operates on a sparse set of local reference frames computed from the input point cloud and establishes end-to-end equivariance through a novel 3D quaternion group capsule layer, including an equivariant dynamic routing procedure. The capsule layer enables us to disentangle geometry from pose, paving the way for more informative descriptions and a structured latent space. In the process, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, an iteratively re-weighted least squares (IRLS) scheme with provable convergence properties, enabling robust pose estimation between capsule layers. Thanks to the sparse equivariant quaternion capsules, our architecture allows joint object classification and orientation estimation, which we validate empirically on common benchmark datasets.
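The connection to the Weiszfeld algorithm can be sketched in a few lines: the geometric median minimizes the sum of distances to the input points, and each Weiszfeld iteration is an IRLS step that re-weights every point by the inverse of its distance to the current estimate. This is a generic sketch of the classical algorithm, not the routing code from the thesis:

```python
import numpy as np

def weiszfeld_median(points, n_iters=100, eps=1e-9):
    """Geometric median via the Weiszfeld algorithm: iteratively
    re-weighted least squares with weights 1 / distance, so far-away
    points (outliers) contribute less at every iteration."""
    x = points.mean(axis=0)                      # start at the centroid
    for _ in range(n_iters):
        d = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(d, eps)             # IRLS weights
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

# Three clustered points plus one gross outlier: the mean is dragged
# far away, while the geometric median stays near the cluster.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
median = weiszfeld_median(pts)
```

In the equivariant routing described above, an analogous robust averaging is performed over candidate poses rather than 2D points, which is what lends the pose estimation between capsule layers its robustness to outlier votes.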
2 Dec 2019
Keywords: feature representation, deep learning, 3D perception, point cloud, shape understanding
Files in this item:

PhD_Thesis_(8)_(1).pdf
Access: open access
Type: Doctoral thesis
License: Not specified
Size: 12.04 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3424787