GTDB Forum

Number of HQ Genomes in R95 discrepancy

https://gtdb.ecogenomic.org/stats page says there are 13,563 HQ genomes
https://data.ace.uq.edu.au/public/gtdb/data/releases/release95/95.0/auxillary_files/hq_mimag_genomes_r95.tsv Subsetting this to the species representatives gives 15,773 HQ genomes

Hi,

Thank you for bringing this to our attention. The discrepancy is due to how an HQ genome is defined. On the stats page, we required identified bacterial 5S, 16S, and 23S rRNA genes to meet a minimum length before they were considered present (16S = 900 bp for Archaea, 1200 bp for Bacteria; 23S = 1900 bp; 5S = 80 bp). This length testing was not done for the hq_mimag_genomes_r95.tsv file. We believe the intent of the MIMAG HQ criteria is that rRNA sequences should be full length, so we have issued an updated file which does this length testing:
https://data.ace.uq.edu.au/public/gtdb/data/releases/release95/95.0/auxillary_files/hq_mimag_genomes_r95.20201006_updated.tsv
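
For anyone who wants to apply the same kind of length test to their own genomes, here is a rough sketch in Python. It is only an illustration of the thresholds quoted above, not the actual GTDB code. The function name and its arguments are made up for this example, and treating a gene at exactly the minimum length as present is an assumption.

```python
# Minimum rRNA gene lengths (bp) quoted in this thread.
MIN_16S_BP = {"Archaea": 900, "Bacteria": 1200}
MIN_23S_BP = 1900
MIN_5S_BP = 80


def passes_rrna_length_test(domain, len_5s, len_16s, len_23s):
    """Return True if the 5S, 16S and 23S genes all meet the minimum lengths.

    Hypothetical helper: lengths are in bp, and 0 or None means the gene
    was not identified. A length equal to the minimum counts as present
    (an assumption, the thread does not say whether the bound is inclusive).
    """
    if domain not in MIN_16S_BP:
        raise ValueError(f"unknown domain: {domain!r}")
    return (
        (len_5s or 0) >= MIN_5S_BP
        and (len_16s or 0) >= MIN_16S_BP[domain]
        and (len_23s or 0) >= MIN_23S_BP
    )


# Made-up gene lengths, for illustration only.
print(passes_rrna_length_test("Bacteria", 110, 1500, 2900))  # True
print(passes_rrna_length_test("Bacteria", 110, 1000, 2900))  # False: 16S < 1200 bp
print(passes_rrna_length_test("Archaea", 110, 1000, 2900))   # True: 16S >= 900 bp
```

Note that this covers only the rRNA length part of the HQ definition discussed here.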

Cheers,
Donovan