Discrepancy between "type_species_of_genus" online and metadata file

Hi there,

I recently downloaded the bac120_metadata_r214.tsv.gz file to map GTDB taxonomy to genomes from the BV-BRC (previously PATRIC) database.

I am filtering for genomes that are both “gtdb_representatives” and the “type_species_of_genus”. In the bac120_metadata_r214.tsv.gz file, I noticed that multiple species representatives within the same genus are listed as “type_species_of_genus”, which looks like a mistake (see below).

When I look up this genus (g__Bulleidia) in the GTDB tree browser, it shows only 1 species representative that is also a “type_species_of_genus”.

What is the reason for this discrepancy? I’m only interested in having 1 genus-level representative at the end of my filtering. How might I get this?

Thanks in advance for your help!



The data in the ‘gtdb_type_species_of_genus’ column of the bac120_metadata_r214.tsv file corresponds to metadata retrieved for the clustering and curation of GTDB. Initially, the genomes GCF_009696165.1, GCF_000177375.1, GCF_900240265.1, GCF_000425005.1, and GCF_900343155.1 were assigned in NCBI to 5 distinct species across 5 different genera (Stecheria intestinalis, Bulleidia extructa, Galactobacillus timonensis, Solobacterium moorei, and Lactimicrobium massiliense, respectively). Each of these genomes was identified as the type species of its respective genus, which is what the bac120 metadata file records.

Following the GTDB curation process, all these genomes were consolidated under a single genus, g__Bulleidia. This name was chosen because Bulleidia has precedence as the oldest named of these genera. Consequently, only s__Bulleidia extructa is now recognised as the type species of the genus on the website.
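To end up with 1 genus-level representative from the metadata file, one possible approach is to keep only the flagged genomes whose NCBI genus name matches their GTDB genus name. The sketch below is a heuristic, not an official GTDB method. It assumes the metadata has columns named accession, gtdb_representative, gtdb_type_species_of_genus, gtdb_taxonomy and ncbi_taxonomy, that the flag columns hold 't' or 'f', and that taxonomy strings use ';'-separated ranks with a 'g__' prefix for the genus. The function names are made up for illustration.

```python
import csv
import gzip


def genus_of(taxonomy):
    """Return the genus name from a ';'-separated taxonomy string, or None."""
    for rank in taxonomy.split(";"):
        rank = rank.strip()
        if rank.startswith("g__"):
            # An empty genus ("g__") is treated as missing.
            return rank[3:] or None
    return None


def genus_type_representatives(rows):
    """Map each GTDB genus to the accessions that pass the heuristic.

    A row is kept when it is a GTDB species representative, is flagged as
    type species of a genus, and its NCBI genus equals its GTDB genus.
    Column names and the 't' flag value are assumptions about the file.
    """
    picked = {}
    for row in rows:
        if row["gtdb_representative"] != "t":
            continue
        if row["gtdb_type_species_of_genus"] != "t":
            continue
        gtdb_genus = genus_of(row["gtdb_taxonomy"])
        ncbi_genus = genus_of(row["ncbi_taxonomy"])
        # Genomes merged into another genus by GTDB have a different
        # NCBI genus name, so they are dropped here.
        if gtdb_genus is not None and gtdb_genus == ncbi_genus:
            picked.setdefault(gtdb_genus, []).append(row["accession"])
    return picked


def read_metadata(path):
    """Yield one dict per row from a gzipped, tab-separated metadata file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")
```

Calling genus_type_representatives(read_metadata("bac120_metadata_r214.tsv.gz")) would then return a dictionary keyed by GTDB genus. For g__Bulleidia this should leave only the genome that NCBI also places in Bulleidia. Genera that GTDB and NCBI name differently will be missing from the result, so it is worth checking any genus with zero or several accessions by hand.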


