Modal Clustering for Categorical Data
Corsini, Noemi; Menardi, Giovanna
2026
Abstract
Despite the inherent lack of a ground truth in clustering, there is broad consensus on how to define a cluster in the continuous setting. Conversely, the definition remains controversial for categorical data. We propose a novel notion of cluster based on the dual concepts of high frequency and variable association. We show how this notion aligns with the cluster notion of modal clustering in the continuous setting, and how it allows us to borrow and adapt existing operational tools to develop a novel procedure that automatically determines the number of clusters. The method is illustrated on real data and tested via simulations.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| unpaywall-bitstream--509175962.pdf | Open access | Published (Publisher's Version of Record) | Creative Commons | 1.56 MB | Adobe PDF |
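The abstract relates the proposed notion to modal clustering, where clusters are the domains of attraction of the modes of a density. As a rough, hypothetical illustration of that general idea transposed to categorical data (not the procedure proposed in the paper, which also builds on variable association and is not reproduced here), the Python sketch below assumes that the empirical frequencies of the observed category profiles play the role of the density and that two profiles are neighbours when they differ in exactly one variable. Each profile is assigned to the local mode reached by steepest ascent, and the number of distinct modes is read as the number of clusters. The names `neighbours` and `modal_clusters` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sketch: frequency-based mode seeking with a Hamming-1
# neighbourhood. This is NOT the method proposed in the paper.
Profile = Tuple[str, ...]  # one observation: a category for each variable


def neighbours(p: Profile, levels: List[List[str]]) -> List[Profile]:
    """All profiles at Hamming distance 1 from p (one variable changed)."""
    out = []
    for j, lev in enumerate(levels):
        for v in lev:
            if v != p[j]:
                out.append(p[:j] + (v,) + p[j + 1:])
    return out


def modal_clusters(data: List[Profile]) -> Dict[Profile, Profile]:
    """Map each distinct observed profile to the local mode its ascent reaches."""
    if not data:
        return {}
    freq = Counter(data)  # empirical frequencies; unseen profiles count as 0
    levels = [sorted({row[j] for row in data}) for j in range(len(data[0]))]

    def climb(p: Profile) -> Profile:
        while True:
            # steepest ascent; among equally frequent neighbours, max() picks
            # the lexicographically largest profile
            best = max(neighbours(p, levels), key=lambda q: (freq[q], q), default=p)
            if freq[best] <= freq[p]:  # no strictly more frequent neighbour
                return p
            p = best

    return {p: climb(p) for p in freq}


if __name__ == "__main__":
    data = ([("a", "x")] * 5 + [("a", "y")] * 2 + [("b", "y")]
            + [("b", "z")] * 4 + [("c", "z")])
    modes = modal_clusters(data)
    print(len(set(modes.values())))  # prints 2
```

Because a move requires a strictly higher frequency, the ascent always terminates, but a neighbour with equal frequency stops it, so plateaus yield separate modes; a principled procedure would need a more careful treatment of ties and sparse tables.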