Misleading taxonomy assignments

Certain strains are listed as “representative” of a species even though they do not belong to that species. For example, assembly GCF_001719045.1 is listed as a species representative of Paenibacillus polymyxa (B), yet it has only 89.8% ANI to the type strain assembly GCF_000217775.1 (ATCC_842), so it is clearly not that species. Similarly, assembly GCF_001709075.1 is listed as a species representative of Paenibacillus polymyxa (D), yet it has only 90.0% ANI to the type strain assembly. The output of GTDB-Tk lists both assemblies as deriving from P. polymyxa. I believe this is both misleading and incorrect, particularly if no attention is paid to the letter qualifier (and many users will not pay attention to it). The two strains appear to be an unidentified species of Paenibacillus, and very likely the same species as each other (95.13% ANI). It seems to me that if a type strain assembly exists for a species, that strain and only that strain should be used as the “representative” of the species, and any strain that does not have at least 95% ANI to the type strain should never be given that species designation.
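To make the arithmetic behind this concrete, here is a minimal Python sketch of the 95% ANI rule proposed above. It uses only the ANI values quoted in this post, hard-coded rather than computed; the function name `same_species` and the constant `ANI_SPECIES_THRESHOLD` are hypothetical names chosen for illustration, not part of GTDB-Tk.

```python
# Illustrative only: applies the 95% ANI cutoff proposed in this post
# to the ANI values quoted in it. Nothing here calls GTDB-Tk.

ANI_SPECIES_THRESHOLD = 95.0  # percent; assumed cutoff from the post


def same_species(ani_percent: float,
                 threshold: float = ANI_SPECIES_THRESHOLD) -> bool:
    """Return True if the ANI value meets the species threshold."""
    return ani_percent >= threshold


# ANI of each assembly to the type strain assembly GCF_000217775.1
ani_to_type_strain = {
    "GCF_001719045.1": 89.8,
    "GCF_001709075.1": 90.0,
}

# ANI between the two assemblies themselves
ani_between_pair = 95.13

for accession, ani in ani_to_type_strain.items():
    verdict = "same species" if same_species(ani) else "different species"
    print(f"{accession} vs type strain: {ani}% ANI, {verdict}")

pair_verdict = "same species" if same_species(ani_between_pair) else "different species"
print(f"The two assemblies vs each other: {ani_between_pair}% ANI, {pair_verdict}")
```

Under this cutoff, both assemblies fall below the threshold relative to the type strain, while the pair falls just above it relative to each other.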

Hi,

Alphabetic suffixes are used in GTDB to explicitly indicate that a genome belongs to a different species according to the GTDB, but that the genome has historically been associated with the unsuffixed species name. For example, GCF_001719045.1 is classified as Paenibacillus polymyxa at NCBI. We agree with your assessment that this genome does not belong to this species under an ANI-based species definition, so we classify it as Paenibacillus polymyxa_B:
https://gtdb.ecogenomic.org/genomes?gid=GCF_001719045.1

Some more details on the use of alphabetic suffixes in GTDB can be found in this FAQ:
https://gtdb.ecogenomic.org/faq#genera_and_species_end_with_alpha_suffix
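For anyone post-processing classifications, the following is a minimal Python sketch of how a suffixed species name could be split so that the suffix is not silently ignored. It assumes the suffix is an underscore followed by one or more uppercase letters at the end of the name, as in Paenibacillus polymyxa_B; the helper `split_gtdb_suffix` is a hypothetical name and not part of GTDB-Tk, and the sketch does not handle suffixes on the genus.

```python
import re
from typing import Optional, Tuple

# Assumption: a suffix is "_" plus uppercase letters at the end of the name.
SUFFIX_RE = re.compile(r"_[A-Z]+$")


def split_gtdb_suffix(name: str) -> Tuple[str, Optional[str]]:
    """Split a species name into (unsuffixed name, suffix or None)."""
    match = SUFFIX_RE.search(name)
    if match is None:
        return name, None
    # Drop the leading underscore from the matched suffix.
    return name[:match.start()], match.group()[1:]


for species in ("Paenibacillus polymyxa", "Paenibacillus polymyxa_B"):
    base, suffix = split_gtdb_suffix(species)
    if suffix is None:
        print(f"{species}: no suffix")
    else:
        print(f"{species}: distinct GTDB species, historically named {base}")
```

A check like this makes it harder to mistake a suffixed name for the unsuffixed species when summarising results.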

Cheers,
Donovan