End-to-end learning for joint depth and image reconstruction from diffracted rotation

Mel M.; Zanuttigh P.
2023

Abstract

Monocular depth estimation is an open challenge due to the ill-posed nature of the problem. Deep learning techniques have proved capable of producing acceptable depth estimation accuracy, but the lack of robust depth cues within RGB images severely limits their performance. Coded-aperture methods using phase and amplitude masks encode strong depth cues within 2D images by means of depth-dependent Point Spread Functions (PSFs), at the price of reduced image quality. In this paper, we propose a novel end-to-end learning approach for depth from diffracted rotation. A phase mask that produces a Rotating Point Spread Function (RPSF) as a function of defocus is jointly optimized with the weights of a depth estimation neural network. To this end, we introduce a differentiable physical model of the aperture mask and exploit an accurate simulation of the camera imaging pipeline. Our approach requires a significantly less complex model and less training data, yet it outperforms existing methods for monocular depth estimation on indoor benchmarks. In addition, we address the image degradation problem by incorporating a non-blind and non-uniform image deblurring module to recover the sharp all-in-focus image from its blurred counterpart.
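
The abstract describes an end-to-end pipeline in which a differentiable camera model (pupil phase mask producing a defocus-dependent PSF) is optimized jointly with a depth estimation network. Below is a minimal PyTorch sketch of that idea, assuming a free-form phase parameterization, a quadratic (paraxial) defocus term, and a single constant-depth plane; the names (PhaseMaskCamera, DepthNet, N) and the toy depth network are illustrative assumptions, not the authors' implementation, which instead shapes the mask to produce a Rotating PSF and handles spatially varying depth.

    # Sketch only: joint optimization of a pupil phase mask and a depth network.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N = 64  # pupil sampling grid (assumption)

    class PhaseMaskCamera(nn.Module):
        """Differentiable camera model: pupil phase mask -> defocus-dependent PSF."""
        def __init__(self):
            super().__init__()
            # Learnable height map of the phase mask (free-form parameterization;
            # the paper may use a structured basis instead).
            self.phase = nn.Parameter(torch.zeros(N, N))
            y, x = torch.meshgrid(torch.linspace(-1, 1, N),
                                  torch.linspace(-1, 1, N), indexing="ij")
            r2 = x**2 + y**2
            self.register_buffer("aperture", (r2 <= 1.0).float())
            self.register_buffer("r2", r2)

        def psf(self, defocus):
            # Quadratic defocus phase (paraxial approximation); `defocus` encodes
            # the depth-dependent wavefront error in waves.
            phi = self.phase + defocus * self.r2
            pupil = self.aperture * torch.exp(1j * phi)
            field = torch.fft.fftshift(torch.fft.fft2(pupil))
            psf = field.abs() ** 2
            return psf / psf.sum()

        def forward(self, img, defocus):
            # Blur a (1, 1, H, W) image with the PSF of one defocus value;
            # a realistic simulator would blend PSFs per depth layer.
            k = self.psf(defocus)[None, None]
            return F.conv2d(img, k, padding=N // 2)

    class DepthNet(nn.Module):
        """Toy depth estimator standing in for the paper's network."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 3, padding=1))
        def forward(self, x):
            return self.net(x)

    camera, depth_net = PhaseMaskCamera(), DepthNet()
    opt = torch.optim.Adam(list(camera.parameters()) +
                           list(depth_net.parameters()), lr=1e-3)

    img = torch.rand(1, 1, 128, 128)   # stand-in all-in-focus image
    gt_defocus = 2.0                   # stand-in constant-depth target

    opt.zero_grad()
    coded = camera(img, gt_defocus)    # simulate the coded acquisition
    pred = depth_net(coded)            # estimate depth from the coded image
    loss = F.l1_loss(pred, torch.full_like(pred, gt_defocus))
    loss.backward()                    # gradients reach the mask parameters too
    opt.step()                         # joint update of optics and network

Because the Fourier-optics forward model is differentiable, the same backward pass that trains the depth network also shapes the physical mask, which is the core of the end-to-end design the abstract refers to.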

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3505172