A docker-based replicability study of a neural information retrieval model

Ferro N.; Marchesin S.; Purpura A.; Silvello G.
2019

Abstract

In this work, we propose a Docker image architecture for the replicability of Neural IR (NeuIR) models. We also share two self-contained Docker images to run the Neural Vector Space Model (NVSM) [22], an unsupervised NeuIR model. The first image (nvsm-cpu) can run on most machines and relies only on the CPU to perform the required computations. The second image (nvsm-GPU) relies instead on the Graphics Processing Unit (GPU) of the host machine, when available, to perform computationally intensive tasks such as training the NVSM model. Furthermore, we discuss the engineering challenges we encountered in obtaining deterministic and consistent results from NeuIR models that rely on TensorFlow within Docker. We also provide an in-depth evaluation of the differences between the runs obtained with the shared images. These differences are due to the use within Docker of the TensorFlow and CUDA libraries, whose inherent randomness alters, under certain circumstances, the relative order of documents in rankings.
2019
Proc. of the Open-Source IR Replicability Challenge (OSIRRC 2019)

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3326095