Robust perception of humans for mobile robots: RGB-depth algorithms for people tracking, re-identification and action recognition

Munaro, Matteo

Human perception is one of the most important skills for a mobile robot sharing its workspace with humans. This is true not only for navigation, since people must be avoided differently from other obstacles, but also because mobile robots must be able to truly interact with humans. In the near future, we can imagine that robots will be increasingly present in every house and will perform services useful to the well-being of humans. For this purpose, robust people tracking algorithms are required, and person re-identification techniques play an important role in allowing robots to recognize a person after a full occlusion or after long periods of time. Moreover, robots must be able to recognize what humans are doing in order to react accordingly, helping them if needed or even learning from them. This thesis tackles these problems by proposing approaches that combine algorithms based on both RGB and depth information, which can be obtained with recently introduced consumer RGB-D sensors. Our key contribution to people detection and tracking research is a depth-clustering method that makes it possible to apply a robust image-based people detector only to a small subset of the possible detection windows, thus decreasing the number of false detections while achieving high computational efficiency. We also advance person re-identification research by proposing two techniques that exploit depth-based skeletal tracking algorithms: one is targeted at short-term re-identification and creates a compact, yet discriminative, signature of people by computing features at skeleton keypoints, which are highly repeatable and semantically meaningful; the other extracts long-term features, such as 3D shape, to compare people by matching the corresponding 3D point clouds acquired with an RGB-D sensor.
To account for the fact that people are articulated rather than rigid objects, the long-term technique exploits 3D skeleton information to warp people's point clouds to a standard pose, thus making them directly comparable by means of least-squares fitting. Finally, we describe an extension of flow-based action recognition methods to the RGB-D domain, which computes the motion over time of a person's 3D points by jointly exploiting color and depth information, and recognizes human actions by classifying gridded descriptors of the 3D flow. A further contribution of this thesis is the creation of a number of new RGB-D datasets that make it possible to compare different algorithms on data acquired by consumer RGB-D sensors. All these datasets have been publicly released in order to foster research in these fields.
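The depth-clustering idea behind the people detector can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the ground plane has already been removed from the point cloud, that the y axis measures height above the ground, and it uses a naive Euclidean clustering (points closer than a fixed radius are linked) as a stand-in for whatever clustering the thesis actually uses. The function names `euclidean_clusters` and `candidate_boxes`, the 0.3 m radius and the 1.0 to 2.3 m height range are hypothetical choices made for illustration.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

# Hypothetical types: a point is (x, y, z) in metres, with y = height above the
# (already removed) ground plane.
Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner)


def euclidean_clusters(points: Sequence[Point], radius: float) -> List[List[int]]:
    """Naive O(n^2) Euclidean clustering: points closer than `radius`
    are linked, and each connected component becomes one cluster."""
    unvisited = set(range(len(points)))
    clusters: List[List[int]] = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Unvisited neighbours of point i within the clustering radius.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters


def candidate_boxes(points: Sequence[Point], radius: float = 0.3,
                    min_height: float = 1.0, max_height: float = 2.3,
                    min_points: int = 3) -> List[Box]:
    """Keep only clusters whose top lies at a plausible human height;
    their 3D boxes are the candidate regions for the image-based detector."""
    boxes: List[Box] = []
    for cluster in euclidean_clusters(points, radius):
        if len(cluster) < min_points:
            continue  # too few points to be a person
        xs = [points[i][0] for i in cluster]
        ys = [points[i][1] for i in cluster]
        zs = [points[i][2] for i in cluster]
        top = max(ys)  # height of the highest point above the ground
        if min_height <= top <= max_height:
            boxes.append(((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))))
    return boxes


# Example: a person-sized column, a low object and an over-tall pillar.
person = [(0.0, 0.1 * k, 2.0) for k in range(18)]   # top at about 1.7 m
low_obj = [(3.0, 0.1 * k, 2.0) for k in range(5)]   # top at about 0.4 m
pillar = [(-3.0, 0.1 * k, 2.0) for k in range(30)]  # top at about 2.9 m
boxes = candidate_boxes(person + low_obj + pillar)
print(boxes)  # only the person column survives the height test
```

In a full pipeline, each surviving box would be projected into the RGB image so that the image-based detector is evaluated only on those few windows instead of on a dense sliding-window scan; the projection and the detector itself are outside the scope of this sketch.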

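The action-recognition step, classifying gridded descriptors of 3D flow, can be sketched as follows. This is a hypothetical illustration rather than the descriptor defined in the thesis: it assumes that per-point 3D flow vectors (displacements between consecutive frames) have already been computed for a person's points, splits the person's 3D bounding box into a regular grid (2×2×2 by default, an arbitrary choice), averages the flow in each cell and L2-normalizes the concatenation. The name `gridded_flow_descriptor` and all parameters are ours.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def gridded_flow_descriptor(points: Sequence[Vec3], flows: Sequence[Vec3],
                            box_min: Vec3, box_max: Vec3,
                            grid: Tuple[int, int, int] = (2, 2, 2)) -> List[float]:
    """Average the 3D flow vectors falling in each cell of a regular grid laid
    over the person's bounding box, concatenate the cell means and L2-normalise."""
    nx, ny, nz = grid
    n_cells = nx * ny * nz
    sums = [[0.0, 0.0, 0.0] for _ in range(n_cells)]
    counts = [0] * n_cells
    for p, f in zip(points, flows):
        idx = []
        for axis, n in enumerate(grid):
            extent = box_max[axis] - box_min[axis]
            # Position along this axis normalised to [0, 1] inside the box.
            t = (p[axis] - box_min[axis]) / extent if extent > 0 else 0.0
            # Clamp so points on (or slightly outside) the box border fall in a valid cell.
            idx.append(min(max(int(t * n), 0), n - 1))
        cell = (idx[0] * ny + idx[1]) * nz + idx[2]
        for k in range(3):
            sums[cell][k] += f[k]
        counts[cell] += 1
    descriptor: List[float] = []
    for s, c in zip(sums, counts):
        # Mean flow of the cell; empty cells contribute zeros.
        descriptor.extend(v / c if c else 0.0 for v in s)
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor


# Example: one point moving sideways low in the box, one moving up high in the box.
pts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
flw = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
d = gridded_flow_descriptor(pts, flw, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(len(d))  # 24 = 8 cells x 3 flow components
```

Because the grid is attached to the person's bounding box, the descriptor has a fixed length regardless of how many points the person has; such vectors could be accumulated over several frames and passed to a standard classifier, but which classifier the thesis uses is not stated in this abstract.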

Robust perception of humans for mobile robots: RGB-depth algorithms for people tracking, re-identification and action recognition / Munaro, Matteo. - (2014 Jan 28).