Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration

Terreran, Matteo; Margherita, Lazzaretto; Ghidoni, Stefano

doi:10.1007/978-3-031-22216-0_3

Human action recognition plays a major role in enabling an effective and safe collaboration between humans and robots. Considering for example a collaborative assembly task, the human worker can use gestures to communicate with the robot while the robot can exploit the recognized actions to anticipate the next steps in the assembly process, improving safety and the overall productivity. In this work, we propose a novel framework for human action recognition based on 3D pose estimation and ensemble techniques. In such framework, we first estimate the 3D coordinates of the human hands and body joints by means of OpenPose and RGB-D data. The estimated joints are then fed to a set of graph convolutional networks derived from Shift-GCN, one network for each set of joints (i.e., body, left hand and right hand). Finally, using an ensemble approach we average the output scores of all the networks to predict the final human action. The proposed framework was evaluated on a dedicated dataset, named IAS-Lab Collaborative HAR dataset, which includes both actions and gestures commonly used in human-robot collaboration tasks. The experimental results demonstrated how the ensemble of the different action recognition models helps improving the accuracy and the robustness of the overall system.