Towards a Computational Approach to Quantification of Allele Specific Expression at Population Level
Gabelli G.;
2024
Abstract
Allele-specific expression (ASE) analysis assesses the difference in expression between the two alleles of the same gene in a heterozygous individual. Through RNA sequencing (RNA-seq), next-generation sequencing (NGS) technologies make it possible to measure ASE for all the genes of a species across a population of individuals. ASE is usually reported only as a relative measure for individual genotypes: the ratio between the numbers of reads aligned to the two haplotypes indicates the allelic expression imbalance (AEI) between the two alleles. To understand the cis-regulatory variation existing within a population, a crucial goal is to infer the expression of each allele from these ratios. In real case studies, however, only a small subset of these AEI values can be obtained from bulk RNA-seq data. This leads to the “Population Level Allelic Imbalance Problem”. In this work, we present a computational workflow that addresses this problem. First, a graph-theoretical approach is applied to reconstruct the entire data set of allele ratios. Then, Non-negative Matrix Factorization is employed to compute the allele values. A running example demonstrates how the workflow operates, and its soundness is evaluated by inferring the allele values for chromosome 1 genes in leaves of 98 cultivars representative of the variability present in Vitis vinifera.
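The two steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the allele names, the observed ratios, the helper names `complete_ratios` and `rank1_nmf`, and the simplifications (noise-free ratios, a connected ratio graph, a rank-1 factorization with multiplicative updates) are all assumptions made for illustration only. Under these assumptions, missing ratios are filled in by multiplying known ratios along paths of the graph, and the completed ratio matrix is factorized to obtain allele values up to a common scale.

```python
import math
from collections import deque

# Hypothetical toy data: observed[(i, j)] is the read ratio v_i / v_j
# measured in an individual heterozygous for alleles i and j.
alleles = ["A", "B", "C", "D"]
observed = {("A", "B"): 2.0, ("B", "C"): 2.0, ("D", "C"): 3.0}


def complete_ratios(alleles, observed):
    """Fill in all pairwise ratios by propagating log-ratios along graph paths."""
    adj = {a: [] for a in alleles}
    for (i, j), r in observed.items():
        adj[i].append((j, math.log(r)))    # edge i -> j carries log(v_i / v_j)
        adj[j].append((i, -math.log(r)))   # reverse edge carries log(v_j / v_i)
    ref = alleles[0]
    logv = {ref: 0.0}                      # log-value relative to the reference
    queue = deque([ref])
    while queue:                           # breadth-first traversal
        i = queue.popleft()
        for j, log_r in adj[i]:
            if j not in logv:
                logv[j] = logv[i] - log_r  # log v_j = log v_i - log(v_i / v_j)
                queue.append(j)
    if len(logv) != len(alleles):
        raise ValueError("ratio graph is not connected")
    return [[math.exp(logv[i] - logv[j]) for j in alleles] for i in alleles]


def rank1_nmf(R, iters=50):
    """Rank-1 non-negative factorization R ~ w h^T via multiplicative updates."""
    n = len(R)
    w, h = [1.0] * n, [1.0] * n
    for _ in range(iters):
        # For rank 1, the multiplicative update w_i *= (R h)_i / (w_i * h.h)
        # simplifies to w_i = (R h)_i / h.h; entries stay non-negative.
        hh = sum(x * x for x in h)
        w = [sum(R[i][j] * h[j] for j in range(n)) / hh for i in range(n)]
        ww = sum(x * x for x in w)
        h = [sum(R[i][j] * w[i] for i in range(n)) / ww for j in range(n)]
    return w, h


R = complete_ratios(alleles, observed)
w, _ = rank1_nmf(R)
# Only relative values are identifiable from ratios: scale so the reference is 1.
relative = {a: w[k] / w[0] for k, a in enumerate(alleles)}
print(relative)
```

In this toy setting the completed matrix is exactly rank 1, so the factorization recovers the allele values relative to the reference allele. Real AEI data are noisy and the ratio graph may have cycles with inconsistent products, which this sketch does not handle.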




