Over the last decades, the exuberant development of next-generation sequencing has revolutionized gene discovery. These technologies have boosted the mapping of single nucleotide polymorphisms (SNPs) across the human genome, providing a complex universe of heterogeneity characterizing individuals worldwide. Fractal dimension (FD) measures the degree of geometric irregularity, quantifying how "complex" a self-similar natural phenomenon is. We compared two FD algorithms, box-counting dimension (BCD) and Higuchi's fractal dimension (HFD), to characterize genome-wide patterns of SNPs extracted from the HapMap data set, which includes data from 1184 healthy subjects of eleven populations. In addition, we have used cluster and classification analysis to relate the genetic distances within chromosomes based on FD similarities to the geographical distances among the 11 global populations. We found that HFD outperformed BCD at both grand average clusterization analysis by the cophenetic correlation coefficient, in which the closest value to 1 represents the most accurate clustering solution (0.981 for the HFD and 0.956 for the BCD) and classification (79.0% accuracy, 61.7% sensitivity, and 96.4% specificity for the HFD with respect to 69.1% accuracy, 43.2% sensitivity, and 94.9% specificity for the BCD) of the 11 populations present in the HapMap data set. These results support the evidence that HFD is a reliable measure helpful in representing individual variations within all chromosomes and categorizing individuals and global populations.

Characterizing Fractal Genetic Variation in the Human Genome from the Hapmap Project

Porcaro, Camillo
2022

Abstract

Over the last decades, the exuberant development of next-generation sequencing has revolutionized gene discovery. These technologies have boosted the mapping of single nucleotide polymorphisms (SNPs) across the human genome, providing a complex universe of heterogeneity characterizing individuals worldwide. Fractal dimension (FD) measures the degree of geometric irregularity, quantifying how "complex" a self-similar natural phenomenon is. We compared two FD algorithms, box-counting dimension (BCD) and Higuchi's fractal dimension (HFD), to characterize genome-wide patterns of SNPs extracted from the HapMap data set, which includes data from 1184 healthy subjects of eleven populations. In addition, we have used cluster and classification analysis to relate the genetic distances within chromosomes based on FD similarities to the geographical distances among the 11 global populations. We found that HFD outperformed BCD at both grand average clusterization analysis by the cophenetic correlation coefficient, in which the closest value to 1 represents the most accurate clustering solution (0.981 for the HFD and 0.956 for the BCD) and classification (79.0% accuracy, 61.7% sensitivity, and 96.4% specificity for the HFD with respect to 69.1% accuracy, 43.2% sensitivity, and 94.9% specificity for the BCD) of the 11 populations present in the HapMap data set. These results support the evidence that HFD is a reliable measure helpful in representing individual variations within all chromosomes and categorizing individuals and global populations.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3455857
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 4
social impact