Self-Supervised Representation Learning for EEG-Based Detection of Neurodegenerative Diseases
Del Pup F.;Tshimanga L. F.;Zanola A.;Taffarello L.;Atzori M.
2025
Abstract
Electroencephalography (EEG) is an important noninvasive diagnostic tool for detecting neurodegenerative disorders. In this context, EEG-based deep learning models show promise due to their ability to capture nonlinear brain dynamics but often suffer from poor generalizability caused by high inter-subject variability. Self-supervised learning (SSL) offers a promising solution by enabling models to learn robust representations from large unlabeled datasets. This study introduces a double-masking representation learning framework for EEG analysis. Using data aggregated from eight multi-center datasets (3156 subjects; 439 h of EEG recordings), a hybrid convolutional-transformer model (TransformEEG) is pretrained to enhance generalization in neurodegenerative disease classification, focusing on Parkinson’s and Alzheimer’s diseases. This approach combines phase-swap data augmentation, designed to facilitate the learning of EEG phase-amplitude coupling, with a double-masking function that operates at both the signal and the transformer’s token levels. The pretrained model was evaluated against a fully supervised baseline using Monte Carlo cross-validation with 100 splits across three public pathology detection datasets. Pretraining led to consistent improvements in both median balanced accuracy and Inter-Quartile Range (IQR) across all fine-tuning datasets. Compared to the fully supervised baseline, the proposed approach increases median balanced accuracy by 2.8% for Parkinson’s disease detection and by 4.2% for Alzheimer’s disease detection, while also reducing performance variability across 100 Monte Carlo splits. These results demonstrate that SSL can enhance EEG deep learning performance, though achieving robust generalization in clinical applications remains an open research challenge.
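The abstract names phase-swap data augmentation as a core ingredient but does not detail it. As commonly described in the EEG literature, phase swap builds a surrogate window by combining the amplitude spectrum of one signal with the phase spectrum of another. The sketch below is a minimal illustrative implementation under that assumption (the function name `phase_swap`, the channel-by-time layout, and the use of NumPy FFTs are choices made here, not details taken from the paper):

```python
import numpy as np

def phase_swap(x_amp, x_phase):
    """Surrogate EEG window: amplitude spectrum of `x_amp` combined with the
    phase spectrum of `x_phase`, channel-wise along the last (time) axis.

    Both inputs are arrays of shape (n_channels, n_samples)."""
    A = np.fft.rfft(x_amp, axis=-1)    # spectrum providing magnitudes
    P = np.fft.rfft(x_phase, axis=-1)  # spectrum providing phases
    swapped = np.abs(A) * np.exp(1j * np.angle(P))
    # Invert back to the time domain with the original window length.
    return np.fft.irfft(swapped, n=x_amp.shape[-1], axis=-1)

# Toy usage: two random 8-channel, 512-sample windows.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 512))
b = rng.standard_normal((8, 512))
aug = phase_swap(a, b)  # same shape as the inputs
```

The augmented window preserves each channel's power spectrum from the first input while carrying the temporal (phase) structure of the second, which is consistent with the abstract's stated goal of encouraging the model to learn phase-amplitude relationships.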