Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data

Lee, Y.; Baruzzo, G.; Kim, J.; Seo, J.; Di Camillo, B.

doi:10.1109/ACCESS.2025.3618851

In tabular data analysis, high model accuracy is often regarded as a prerequisite for discussing feature importance. This assumption stems from the expectation that the validity of feature importance correlates with model performance. In this work, we challenge this belief by demonstrating that even low-performing models can provide reliable feature importance on biomedical datasets. We conduct experiments to observe how feature importance rankings change as model performance progressively degrades. Using three synthetic datasets and four real-world biomedical datasets, we compare the feature rankings obtained from the full datasets with those obtained after reducing either the number of samples (sample removal) or the number of features (feature removal), and we quantify the agreement with several feature stability indices. Our results show that, in both synthetic and real datasets, feature rankings remain stable when performance degrades because of feature removal. In contrast, sample removal produces discrepancies in feature importance rankings that grow as performance deteriorates. By analyzing the distribution of feature importance values and theoretically examining the probability that the model fails to distinguish the importance of two features, we show that models can still reliably identify feature importance despite performance degradation caused by feature removal. We conclude that the validity of feature importance can be preserved even at suboptimal model performance levels, as long as the degradation stems from insufficient features rather than insufficient samples. This finding is relevant to biomedical research, where feature importance analysis plays a pivotal role in clinical decision support and translational bioinformatics.
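The comparison described above, a ranking from the full dataset set against a ranking from a reduced dataset, can be sketched in a few lines. This is a minimal illustration only: the abstract does not name the stability indices used, so Spearman's rank correlation and a top-k Jaccard overlap are assumed here as two common choices. The function names and the importance values are hypothetical, made up to exercise the code, and are not results from the paper.

```python
from typing import Dict, List


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Return feature names ordered from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def spearman_rho(full: Dict[str, float], reduced: Dict[str, float]) -> float:
    """Spearman rank correlation over the features present in both runs.

    Uses the simple 1 - 6*sum(d^2)/(n(n^2-1)) formula, which assumes that
    no two features have tied importance values.
    """
    shared = [f for f in rank_features(full) if f in reduced]
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared features")
    # Rank positions are recomputed on the shared features only, so a
    # feature dropped by feature removal does not shift the comparison.
    rank_full = {f: i for i, f in enumerate(shared)}
    rank_reduced = {
        f: i
        for i, f in enumerate(g for g in rank_features(reduced) if g in full)
    }
    d2 = sum((rank_full[f] - rank_reduced[f]) ** 2 for f in shared)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def top_k_jaccard(full: Dict[str, float], reduced: Dict[str, float], k: int) -> float:
    """Jaccard overlap of the top-k shared features of the two rankings."""
    if k < 1:
        raise ValueError("k must be at least 1")
    shared = set(full) & set(reduced)
    top_full = set([f for f in rank_features(full) if f in shared][:k])
    top_reduced = set([f for f in rank_features(reduced) if f in shared][:k])
    return len(top_full & top_reduced) / len(top_full | top_reduced)


# Hypothetical importance scores (illustrative values, not from the paper).
full_run = {"f1": 0.40, "f2": 0.25, "f3": 0.20, "f4": 0.10, "f5": 0.05}
# Feature removal: f2 is dropped, the remaining order is unchanged.
feature_removed_run = {"f1": 0.55, "f3": 0.28, "f4": 0.12, "f5": 0.05}
# Sample removal: all features kept, but the order is scrambled.
sample_removed_run = {"f1": 0.22, "f2": 0.18, "f3": 0.30, "f4": 0.05, "f5": 0.25}

rho_feature = spearman_rho(full_run, feature_removed_run)
rho_sample = spearman_rho(full_run, sample_removed_run)
jaccard_sample = top_k_jaccard(full_run, sample_removed_run, k=3)
```

With these toy inputs, `rho_feature` is 1.0 because the surviving features keep their relative order, while `rho_sample` is about -0.1 and `jaccard_sample` is 0.5. In practice the importance scores would come from a fitted model, and a tie-aware rank correlation would be preferable when importance values can coincide.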