Improving the classification of veterinary thoracic radiographs through inter-species and inter-pathology self-supervised pre-training of deep learning models.

Celniac, W.; Wodzinsky, M.; Jurgas, A.; Burti, S.; Zotti, A.; Atzori, M.; Muller, H.; Banzato, T.

doi:10.1038/s41598-023-46345-z

The analysis of veterinary radiographic imaging data is an essential step in the diagnosis of many thoracic lesions. Given the limited time that physicians can devote to a single patient, it would be valuable to implement an automated system to help clinicians make faster but still accurate diagnoses. Currently, most of such systems are based on supervised deep learning approaches. However, the problem with these solutions is that they need a large database of labeled data. Access to such data is often limited, as it requires a great investment of both time and money. Therefore, in this work we present a solution that allows higher classification scores to be obtained using knowledge transfer from inter-species and inter-pathology self-supervised learning methods. Before training the network for classification, pretraining of the model was performed using self-supervised learning approaches on publicly available unlabeled radiographic data of human and dog images, which allowed substantially increasing the number of images for this phase. The self-supervised learning approaches included the Beta Variational Autoencoder, the Soft-Introspective Variational Autoencoder, and a Simple Framework for Contrastive Learning of Visual Representations. After the initial pretraining, fine-tuning was performed for the collected veterinary dataset using 20% of the available data. Next, a latent space exploration was performed for each model after which the encoding part of the model was fine-tuned again, this time in a supervised manner for classification. Simple Framework for Contrastive Learning of Visual Representations proved to be the most beneficial pretraining method. Therefore, it was for this method that experiments with various fine-tuning methods were carried out. We achieved a mean ROC AUC score of 0.77 and 0.66, respectively, for the laterolateral and dorsoventral projection datasets. The results show significant improvement compared to using the model without any pretraining approach.