Creating "new taxa" from the de_novo_wf tree

Hi!
Thank you for making available and maintaining this wonderful and super useful resource!

I had a set of genomes for which I wanted the GTDB taxonomy, so I ran classify_wf, which worked great. In the output, some genomes were not assigned to any known species (or higher rank), so I ran de_novo_wf on only those unclassified genomes to get a tree and assign them to new taxa.
But how can I create new taxa for these unclassified genomes from that tree? Do you have an existing script that does this?
Or can this be done just by parsing the tree with its internal node labels?

For example, instead of:
d__Bacteria; p__Chloroflexota; c__Limnocylindria; o__Limnocylindrales; f__CSP1-4; g__; s__

I want to have:

d__Bacteria; p__Chloroflexota; c__Limnocylindria; o__Limnocylindrales; f__CSP1-4; g__NEW-GENUS-1; s__NEW-SPECIES-1

That is, in case these unclassified genomes can be grouped at some rank into a new taxon.

I understand that this could be tricky because some GTDB taxa will not be monophyletic in my new tree, but I just want an idea of where my unclassified genomes fall and whether they cluster into new groups.

Thanks in advance!
Eric

Hi Eric,

This is on the GTDB-Tk development roadmap. You can use ANI (say, calculated with skani) to establish which genomes are highly similar to each other and thus should be considered part of the same species cluster. You can use PhyloRank to determine the relative evolutionary distance (RED) values of the nodes in a tree and thus which nodes should be considered new genera, new families, etc. Unfortunately, this is all a bit of a manual exercise at the moment, and one we hope to address in a future GTDB-Tk release.
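
As a rough illustration of the ANI step, here is a minimal Python sketch that groups genomes into candidate species clusters once pairwise ANI values are available. It assumes the ANI results have already been parsed into (genome_a, genome_b, ani) tuples; the function name, the input layout, and the 95% default threshold are assumptions for illustration, and the single-linkage grouping is a simplification rather than how GTDB itself forms species clusters.

```python
from collections import defaultdict


def cluster_by_ani(genomes, ani_pairs, threshold=95.0):
    """Group genomes into candidate species clusters by single linkage.

    genomes   : iterable of genome IDs
    ani_pairs : iterable of (genome_a, genome_b, ani_percent) tuples
                (hypothetical layout, parsed beforehand from the ANI tool)
    threshold : ANI percentage at or above which two genomes are linked
                (assumed default, adjust to your own criterion)
    """
    genomes = list(genomes)
    parent = {g: g for g in genomes}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, ani in ani_pairs:
        if ani >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for g in genomes:
        clusters[find(g)].append(g)
    # Sort for a stable, readable result.
    return sorted(sorted(members) for members in clusters.values())


# Made-up example values, not real results.
example_genomes = ["g1", "g2", "g3", "g4"]
example_pairs = [("g1", "g2", 97.2), ("g2", "g3", 95.5), ("g3", "g4", 88.0)]
example_clusters = cluster_by_ani(example_genomes, example_pairs)
```

Each resulting cluster could then be given a placeholder species name. Note that single linkage can chain genomes together that are below the threshold with respect to each other, so the clusters are worth checking by hand.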

Cheers,
Donovan

Thanks for your answer! No problem, I understand that it is not easy.
Do you have any guidelines for doing it, or is it completely manual?

I’m wondering if I could obtain the distribution of RED values within each taxonomic rank from the reference tree, and use it to define new clades in the de novo tree.
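
A minimal Python sketch of that idea, assuming the RED values of named reference nodes have already been extracted per rank and that each candidate node in the de novo tree has its own RED value. The function names, the input layout, and the "closest median" rule are assumptions for illustration only, not an established GTDB-Tk or PhyloRank procedure, and the numbers are made up.

```python
from statistics import median


def rank_medians(red_by_rank):
    """Median RED per rank from reference nodes.

    red_by_rank : dict mapping rank name -> list of RED values of the
                  reference nodes named at that rank (hypothetical input).
    """
    return {rank: median(values) for rank, values in red_by_rank.items() if values}


def closest_rank(red_value, medians):
    """Return the rank whose median RED is nearest to red_value."""
    return min(medians, key=lambda rank: abs(medians[rank] - red_value))


# Made-up RED values for illustration, not taken from a real reference tree.
reference_red = {
    "phylum": [0.30, 0.34, 0.32],
    "class": [0.45, 0.50, 0.47],
    "order": [0.60, 0.64, 0.62],
    "family": [0.75, 0.78, 0.77],
    "genus": [0.90, 0.93, 0.92],
}
medians = rank_medians(reference_red)

# Suggested rank for two hypothetical internal nodes of the de novo tree.
suggested_a = closest_rank(0.91, medians)
suggested_b = closest_rank(0.74, medians)
```

A nearest-median rule ignores the spread of each distribution, so nodes that fall between two ranks would still need manual inspection.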

Best,
Eric