Minimal Set of Glucose Variability Indices by Sparse Principal Component Analysis

Fabris, Chiara; Facchinetti, Andrea; Zanon, Mattia; Sparacino, Giovanni; Maran, Alberto; Cobelli, Claudio

Objective: Dozens of indices have been proposed to quantify glycaemic variability (GV), a risk factor for diabetes complications. While there is general agreement that no single index is sufficient, it is also clear that most of them are correlated. To minimize redundancy, we use Sparse Principal Component Analysis (SPCA) to select a subset of indices sufficient for a comprehensive description of GV. Method: N=25 commonly used GV indices (e.g. SD, MAGE, ICG, ADRR, …) are computed on three different datasets (16 subjects each) of continuous glucose monitoring (CGM) time series collected with the Dexcom SEVEN Plus sensor in subjects with type 1 diabetes during the EU FP7 project “Diadvisor” (2008-2012). SPCA first determines a reduced data dimension P through traditional PCA and then decreases the number of variables from N=25 to M via LASSO estimation of sparse loadings. Result: In all three datasets, SPCA selected P=2 principal components and M=5 indices for each PC, preserving 65%, 81% and 68% of the variance of the whole collection of GV indices, respectively. Interestingly, the subset of 5 selected indices, MAGE, MAGE_DESC (the MAGE index accounting only for downward excursions), HBGI, CV and ADRR, is the same for all the analyzed datasets. Conclusion: SPCA can be used to determine, from a broader pool of redundant parameters, the minimal set of indices needed to describe most of the GV in CGM data. The fact that the same subset of indices, which can be considered the most informative for describing GV, is selected independently of the particular dataset confirms the robustness of the approach.
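
Illustration: the variable-selection step described in the Method can be sketched in a few lines of Python. This is not the authors' implementation. It replaces the LASSO estimation of sparse loadings with a simpler stand-in, a soft-thresholded power iteration that yields a single sparse loading vector, and it runs on a synthetic 3x3 correlation matrix rather than on the 25 GV indices. The names `index_A`, `index_B`, `index_C`, the matrix values and the penalty `lam` are all hypothetical and chosen only to make the behaviour visible.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values in [-lam, lam] become exactly zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_loading(corr, lam, iters=200):
    """Leading sparse loading vector via soft-thresholded power iteration.

    A simplified stand-in for LASSO-based sparse loadings: each power
    iteration step is followed by an L1-style shrinkage, so variables that
    contribute little to the component end up with a loading of exactly zero.
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p  # start from equal loadings
    for _ in range(iters):
        u = [
            soft_threshold(sum(corr[i][j] * v[j] for j in range(p)), lam)
            for i in range(p)
        ]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("lam is too large: every loading was shrunk to zero")
        v = [x / norm for x in u]  # renormalize to unit length
    return v


# Synthetic correlation matrix (hypothetical values): the first two indices
# are strongly correlated, the third is nearly independent of both.
names = ["index_A", "index_B", "index_C"]
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

loading = sparse_loading(corr, lam=0.5)

# Indices with a non-zero loading are the ones retained for this component.
selected = [name for name, w in zip(names, loading) if w != 0.0]
print(selected)
```

In this toy case the weakly correlated third index receives a zero loading and is dropped, while the two correlated indices share the component equally. Larger values of `lam` zero out more loadings, which is the role the sparsity penalty plays when reducing N indices to M.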