Semi-automatic training of an object recognition system in scene camera data using gaze tracking and accelerometers

Bassetto F.; Atzori M.
2017

Abstract

Object detection and recognition algorithms usually require large annotated training sets, and creating such datasets requires expensive manual annotation. Eye tracking can help in the annotation procedure: humans use vision constantly to explore the environment and plan motor actions, such as grasping an object. In this paper we investigate the possibility of semi-automatically training object recognition on scene camera data using eye tracking and accelerometer data, learning from the natural hand-eye coordination of humans. Our approach involves three steps. First, sensor data are recorded using eye tracking glasses in combination with accelerometers and surface electromyography sensors of the kind usually applied when controlling prosthetic hands. Second, a set of patches is extracted automatically from the scene camera data while the subject grasps an object. Third, a convolutional neural network is trained and tested using the extracted patches. Results show that the parameters of eye-hand coordination can be used to train an object recognition system semi-automatically. With proper sensors, these parameters can be exploited to fine-tune a convolutional neural network for object detection and recognition. This approach opens interesting options for training computer vision and multi-modal data integration systems and lays the foundations for future applications in robotics. In particular, this work targets the improvement of prosthetic hands by recognizing the objects that a person may wish to use. However, the approach can easily be generalized.
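The second step of the approach (automatic patch extraction during a grasp) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how grasps are detected, how the sensor streams are synchronized, or what patch size is used. The sketch assumes one gaze point and one accelerometer magnitude per scene camera frame, and uses a simple magnitude threshold as a hypothetical grasp cue. All function names and parameters are illustrative.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale image as rows of pixel values (illustrative)


def crop_patch(frame: Frame, gaze_xy: Tuple[int, int], size: int) -> Frame:
    """Crop a size x size patch centered on the gaze point, clamped to the frame."""
    height, width = len(frame), len(frame[0])
    if size > width or size > height:
        raise ValueError("patch size exceeds frame dimensions")
    half = size // 2
    x, y = gaze_xy
    # Shift the window so that it stays fully inside the image borders.
    left = min(max(x - half, 0), width - size)
    top = min(max(y - half, 0), height - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def grasp_frames(accel_magnitude: Sequence[float], threshold: float) -> List[int]:
    """Indices of frames whose accelerometer magnitude exceeds the threshold.

    Assumption: a plain threshold stands in for whatever grasp detection
    the original work uses.
    """
    return [i for i, a in enumerate(accel_magnitude) if a > threshold]


def extract_patches(
    frames: Sequence[Frame],
    gaze: Sequence[Tuple[int, int]],
    accel_magnitude: Sequence[float],
    threshold: float,
    size: int,
) -> List[Frame]:
    """Collect gaze-centered patches from the frames flagged as grasp frames."""
    return [
        crop_patch(frames[i], gaze[i], size)
        for i in grasp_frames(accel_magnitude, threshold)
    ]
```

Patches collected this way could then be labeled with the grasped object and used as training input for the convolutional neural network in the third step.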
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
978-3-319-68344-7
978-3-319-68345-4

Use this identifier to cite or link to this document: http://hdl.handle.net/11577/3394641
Citations
  • Scopus 5