METRIC—Multi-Eye to Robot Indoor Calibration Dataset

Allegro, Davide; Terreran, Matteo; Ghidoni, Stefano

doi:10.3390/info14060314

Multi-camera systems are an effective solution for perceiving large areas or complex scenarios with many occlusions. In such a setup, accurate camera network calibration is crucial for localizing scene elements with respect to a single reference frame shared by all the viewpoints of the network. This is particularly important in applications such as object detection and people tracking. Multi-camera calibration is also a critical requirement in several robotics scenarios, particularly those involving a robotic workcell equipped with a manipulator surrounded by multiple sensors. In this scenario, robot-world hand-eye calibration is an additional crucial element: it determines the exact position of each camera with respect to the robot, so that information about the surrounding workspace can be provided directly to the manipulator. Despite the importance of the calibration process in the two scenarios outlined above, namely (i) a camera network and (ii) a camera network with a robot, there is a lack of standard datasets in the literature for evaluating and comparing calibration methods. Moreover, the two scenarios are usually treated separately and tested on dedicated setups. In this paper, we propose a general standard dataset acquired in a robotic workcell, on which calibration methods can be evaluated in two use cases: camera network calibration and robot-world hand-eye calibration. The Multi-Eye To Robot Indoor Calibration (METRIC) dataset consists of over 10,000 synthetic and real images of ChArUco and checkerboard patterns, each rigidly attached to the robot end-effector, which was moved in front of four cameras surrounding the manipulator from different viewpoints during image acquisition.
The real images in the dataset include several multi-view image sets captured by three different types of sensor networks (Microsoft Kinect V2, Intel RealSense Depth D455 and Intel RealSense LiDAR L515), in order to evaluate their advantages and disadvantages for calibration. Furthermore, to accurately analyze the effect of camera-robot distance on calibration, we acquired a comprehensive synthetic dataset, with related ground truth, using three different camera network setups that correspond to three levels of calibration difficulty depending on the cell size. An additional contribution of this work is a comprehensive evaluation of state-of-the-art calibration methods on our dataset, highlighting their strengths and weaknesses, in order to outline two benchmarks for the two aforementioned use cases.
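
As an illustration of the robot-world hand-eye problem for a fixed camera observing a pattern mounted on the end-effector, the following is a minimal sketch of the consistency constraint that a calibration must satisfy. It is not the evaluation code of this work. All names and numeric values are hypothetical, and the frame convention is an assumption: X is the camera pose in the robot base frame, Z is the pattern pose in the end-effector frame, B is an end-effector pose in the base frame (from robot kinematics), and A is the pattern pose in the camera frame (from pattern detection), so that B Z = X A for every robot pose.

```python
import math

def matmul(a, b):
    """Product of two 4x4 homogeneous transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(yaw, tx, ty, tz):
    """Rigid transform: rotation by yaw about the z axis, then a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def invert(t):
    """Inverse of a rigid transform [R t; 0 1], i.e. [R^T, -R^T t; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def residual(x, z, pose_pairs):
    """Largest entry-wise violation of B Z = X A over all (A, B) pose pairs."""
    worst = 0.0
    for a, b in pose_pairs:
        left, right = matmul(b, z), matmul(x, a)
        for i in range(4):
            for j in range(4):
                worst = max(worst, abs(left[i][j] - right[i][j]))
    return worst

# Hypothetical ground truth (assumed values, for illustration only).
x_true = transform(0.3, 1.0, 0.0, 0.5)    # camera pose in the robot base frame
z_true = transform(-0.1, 0.0, 0.05, 0.1)  # pattern pose in the end-effector frame

# Simulated end-effector poses B and the pattern poses A the camera would see.
robot_poses = [transform(0.2 * i, 0.4, 0.1 * i, 0.3) for i in range(5)]
pose_pairs = [(matmul(invert(x_true), matmul(b, z_true)), b) for b in robot_poses]

# A camera pose estimate that is off by 5 cm along the x axis.
x_wrong = transform(0.3, 1.05, 0.0, 0.5)

residual_true = residual(x_true, z_true, pose_pairs)
residual_wrong = residual(x_wrong, z_true, pose_pairs)
print(residual_true, residual_wrong)
```

In this sketch the residual is close to zero for the true transforms and grows with the error in the estimated camera pose. A calibration method would instead estimate X and Z from the measured pairs, and the synthetic ground truth then allows the estimate to be compared directly with the true transforms.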