Deep Learning Staging of Liver Iron Content From Multiecho MR Images

Positano, Vincenzo; Meloni, Antonella; Santarelli, Maria Filomena; Pistoia, Laura; Spasiano, Anna; Cuccia, Liana; Casini, Tommaso; Gamberini, Maria Rita; Allò, Massimo; Bitti, Pier Paolo; Pepe, Alessia; Cademartiri, Filippo

doi:10.1002/jmri.28300

Background: MRI, through estimation of the liver T2* value, is the most established approach for evaluating liver iron content (LIC), but the measurement depends on the choice of the measurement region and on the software used for image analysis.

Purpose: To develop a deep-learning method for unsupervised classification of LIC from magnitude T2* multiecho MR images.

Study Type: Retrospective.

Population/Subjects: A total of 1069 thalassemia major patients enrolled in the core laboratory of the Myocardial Iron Overload in Thalassemia (MIOT) network, split into training (80%) and test (20%) sets. An external test set included 20 patients scanned on MRI systems from different vendors.

Field Strength/Sequence: 1.5 T; T2* multiecho magnitude images.

Assessment: Four deep-learning convolutional neural networks (HippoNet-2D, HippoNet-3D, HippoNet-LSTM, and an ensemble network, HippoNet-Ensemble) were used to achieve unsupervised staging of LIC into five classes (normal, borderline, mild, moderate, severe). The deep-learning models were built on the training set, and LIC staging performance was evaluated on the test set and on the external test set. Performance was assessed by accuracy, sensitivity, and specificity with respect to ground-truth labels obtained from T2* measurements, and by comparison with the operator-induced variability arising from different region of interest (ROI) placements.

Statistical Tests: Network performance was evaluated by single-class accuracy, specificity, and sensitivity, and the networks were compared by one-way repeated measures analysis of variance (ANOVA) and one-way ANOVA.

Results: HippoNet-Ensemble reached significantly higher accuracy than the other networks, and higher sensitivity and specificity than HippoNet-LSTM. Accuracy/sensitivity/specificity for each LIC stage were: normal 0.96/0.93/0.97; borderline 0.95/0.85/0.98; mild 0.96/0.88/0.98; moderate 0.95/0.89/0.97; severe 0.97/0.95/0.98.
The proportion of correctly staged cases ranged from 85% to 95%, depending on the LIC class. Multiclass accuracy was 0.90, compared with 0.92 for interobserver variability.

Data Conclusion: The proposed HippoNet-Ensemble network can perform unsupervised LIC staging and achieves good prognostic performance.

Evidence Level: 4

Technical Efficacy: Stage 2
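The per-stage accuracy, sensitivity, and specificity reported above are naturally read as one-vs-rest metrics, where each LIC stage in turn is treated as "positive" and the other four as "negative", while multiclass accuracy is the fraction of all cases assigned to the correct stage. The sketch below shows how such figures can be derived from a five-class confusion matrix. It assumes the standard one-vs-rest definitions (the abstract does not spell out the exact computation) and integer stage labels 0 to 4 in the order normal, borderline, mild, moderate, severe. The function names are hypothetical, and the labels in the example are toy values, not study data.

```python
import numpy as np

# Assumed label order: the five LIC stages named in the abstract.
STAGES = ["normal", "borderline", "mild", "moderate", "severe"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count cases: rows = ground-truth stage, columns = predicted stage."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def one_vs_rest_metrics(cm):
    """Per-stage accuracy, sensitivity, and specificity, treating one stage
    as positive and all remaining stages as negative."""
    total = cm.sum()
    metrics = {}
    for k in range(cm.shape[0]):
        tp = cm[k, k]                 # stage k predicted as stage k
        fn = cm[k, :].sum() - tp      # stage k predicted as another stage
        fp = cm[:, k].sum() - tp      # other stages predicted as stage k
        tn = total - tp - fn - fp     # everything else
        metrics[k] = {
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return metrics


def multiclass_accuracy(cm):
    """Fraction of all cases assigned to the correct stage."""
    return np.trace(cm) / cm.sum()


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the study).
    y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
    y_pred = [0, 0, 1, 2, 2, 2, 3, 4, 4, 4]
    cm = confusion_matrix(y_true, y_pred, len(STAGES))
    for k, m in one_vs_rest_metrics(cm).items():
        print(f"{STAGES[k]:>10}: acc={m['accuracy']:.2f} "
              f"sens={m['sensitivity']:.2f} spec={m['specificity']:.2f}")
    print(f"multiclass accuracy: {multiclass_accuracy(cm):.2f}")
```

In this toy example, multiclass accuracy is 0.80 while every per-stage accuracy is at least 0.90. Because true negatives dominate each one-vs-rest count, per-stage accuracy is typically higher than multiclass accuracy. This is consistent with the abstract reporting per-stage accuracies of 0.95 to 0.97 alongside a multiclass accuracy of 0.90.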