Road scenes segmentation across different domains by disentangling latent representations

Barbato, F.; Michieli, U.; Toldo, M.; Zanuttigh, P.

doi:10.1007/s00371-023-02818-w

Deep learning models achieve impressive accuracy in road scene understanding; however, they need a large number of labeled samples for training. Additionally, such models do not generalize well to environments where the statistical properties of the data do not perfectly match those of the training scenes, which can be a significant problem for intelligent vehicles. Hence, domain adaptation approaches have been introduced to transfer knowledge acquired on a label-abundant source domain to a related label-scarce target domain. In this work, we design and carefully analyze multiple latent space-shaping regularization strategies that work together to reduce the domain shift. In more detail, we devise a feature clustering strategy to increase domain alignment, a feature perpendicularity constraint to push apart features belonging to different semantic classes, including those not present in the current batch, and a feature norm alignment strategy to separate active and inactive channels. In addition, we propose a novel evaluation metric to capture the relative performance of an adapted model with respect to supervised training. We validate our framework in driving scenarios, considering both synthetic-to-real and real-to-real adaptation, outperforming previous feature-level state-of-the-art methods on multiple road scene benchmarks.
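
To make the idea of a perpendicularity constraint more concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes that each class is summarized by a prototype (the mean feature vector of that class in the batch) and that the penalty is the mean absolute cosine similarity between prototypes of different classes. The function names `class_prototypes` and `perpendicularity_penalty` are hypothetical, and, as a simplification, classes absent from the batch are skipped rather than tracked across batches as the abstract describes.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Return one mean feature vector per class (hypothetical helper).

    features: (N, D) array of feature vectors, labels: (N,) integer classes.
    Classes with no samples in the batch are left as all-zero rows.
    """
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos


def perpendicularity_penalty(protos, eps=1e-8):
    """Mean absolute cosine similarity between distinct class prototypes.

    Assumed form of the penalty: 0 when all prototypes are mutually
    orthogonal, 1 when they are all collinear. Zero prototypes (classes
    missing from the batch) are ignored in this simplified sketch.
    """
    norms = np.linalg.norm(protos, axis=1)
    valid = norms > eps
    unit = protos[valid] / norms[valid, None]
    if unit.shape[0] < 2:
        return 0.0
    cos = unit @ unit.T
    off_diagonal = ~np.eye(unit.shape[0], dtype=bool)
    return float(np.abs(cos[off_diagonal]).mean())


# Toy batch: class 0 lies along the first axis, class 1 along the second.
feats = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
labs = np.array([0, 0, 1])
penalty = perpendicularity_penalty(class_prototypes(feats, labs, num_classes=3))
```

In a training setup such a term would typically be added to the segmentation loss with a weighting factor, so that minimizing it drives prototypes of different classes toward orthogonal directions in the latent space.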