GBRAP: A Comprehensive Database and Tool for Exploring Genomic Diversity Across All Domains of Life
Yaddehige, Sachithra Kalhari;Vischioni, Chiara;Berselli, Michele;Alberghini, Leonardo;Mezzavilla, Massimo;Bobbo, Tania;Taccioli, Cristian
2025
Abstract
Evolutionary studies require extensive examination of genomic information across all domains of life. Although a large number of genomes are available through GenBank, effectively visualizing or comparing the information they contain remains challenging for many reasons, including their size. We introduce the genome-based retrieval and analysis parser (GBRAP), a comprehensive software tool for analyzing genome files, together with an online database housing an extensive collection of carefully curated, high-quality genome statistics for all the organisms available in the RefSeq database of the National Center for Biotechnology Information. Users can either search directly for the organisms of their choice or select them from precategorized groups, and then retrieve the data. The output is generated as tables containing more than 200 columns of useful genomic information (base counts, GC content, Shannon entropy, codon usage, etc.), calculated separately for different genomic elements (e.g. coding sequences, introns, transfer RNA, ribosomal RNA, noncoding RNA, etc.). Where applicable, the data are displayed independently for each chromosomal, mitochondrial, plastid, or plasmid sequence. All the data can be visualized on the database or downloaded as comma-separated value or Excel files. The genome-based retrieval and analysis parser database is free to access without any registration and is publicly available at http://tacclab.org/gbrap/
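To make the statistics named in the abstract concrete, the following is a minimal Python sketch of how base counts, GC content, and Shannon entropy could be computed for a single nucleotide sequence. It is an illustration only and not the GBRAP implementation: the function names are hypothetical, and it assumes that ambiguous bases (such as N) are ignored and that entropy is measured in bits over the four nucleotides A, C, G, and T. The abstract does not state the exact definitions the tool uses.

```python
import math
from collections import Counter


def base_counts(seq):
    """Count A, C, G and T in a sequence (assumption: other symbols are ignored)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(seq):
    """Fraction of counted bases that are G or C; 0.0 if no bases are counted."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def shannon_entropy(seq):
    """Shannon entropy in bits of the A/C/G/T frequency distribution."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        if n:  # skip absent bases, since 0 * log2(0) is taken as 0
            p = n / total
            entropy -= p * math.log2(p)
    return entropy
```

Under these assumptions the entropy ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent). Applying the same functions separately to each genomic element, such as coding sequences or introns, would give per-element values of the kind the abstract describes.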