Hi!
Thank you for making this wonderful and super useful resource available and for maintaining it!
I had a set of genomes for which I wanted the GTDB taxonomy, so I ran classify_wf,
which worked great. In the output, several genomes were not assigned to any known species (or to a higher rank), so I ran de_novo_wf
on those unclassified genomes only, to get a tree and assign them to new taxa.
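Picking out "those unclassified genomes" from the classify_wf output can be scripted. A minimal sketch, assuming the GTDB-Tk summary file (e.g. `gtdbtk.bac120.summary.tsv`) is tab-separated with `user_genome` and `classification` columns, and that genomes GTDB-Tk could not place at all are reported with a classification such as `Unclassified Bacteria`; the function name `unresolved_genomes` is hypothetical.

```python
import csv
import io


def unresolved_genomes(tsv_text, rank="s__"):
    """Return user_genome IDs whose classification leaves `rank` empty.

    Assumes a GTDB-Tk-style summary: tab-separated, with a header row that
    contains 'user_genome' and 'classification' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        ranks = [r.strip() for r in row["classification"].split(";")]
        # Genomes that could not be placed at all (assumed to look like
        # 'Unclassified Bacteria') have no 'd__' field: always keep them.
        if not ranks[0].startswith("d__"):
            hits.append(row["user_genome"])
            continue
        # Map each rank prefix ('g__', 's__', ...) to its name ('' if empty).
        named = {r[:3]: r[3:] for r in ranks if len(r) >= 3}
        if not named.get(rank):
            hits.append(row["user_genome"])
    return hits
```

Usage would be something like `unresolved_genomes(open("gtdbtk.bac120.summary.tsv").read())`; passing `rank="g__"` instead keeps only genomes that also lack a genus.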
But how can I create new taxa for these unclassified genomes from that tree? Is there an existing script that does this?
Or can this be done simply by parsing the tree and its internal node labels?
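Parsing the tree could look roughly like the sketch below: a minimal Newick reader plus a function that reports the largest clades made up only of the unclassified genomes. It assumes the tree is rooted (e.g. on an outgroup) and that internal labels containing `:` are single-quoted, as Newick requires; `Node`, `parse_newick`, and `new_clades` are hypothetical names, not part of GTDB-Tk.

```python
class Node:
    """A tree node: tip or internal label plus child nodes."""

    def __init__(self, label="", children=None):
        self.label = label
        self.children = children or []


def parse_newick(text):
    """Minimal Newick parser (nesting, single-quoted labels, branch lengths).

    Does not handle escaped quotes ('') inside labels or [...] comments.
    """
    text = text.strip()
    pos = 0

    def read_label():
        nonlocal pos
        if pos < len(text) and text[pos] == "'":
            end = text.index("'", pos + 1)
            label = text[pos + 1:end]
            pos = end + 1
        else:
            start = pos
            while pos < len(text) and text[pos] not in ",():;":
                pos += 1
            label = text[start:pos].strip()
        if pos < len(text) and text[pos] == ":":  # skip the branch length
            pos += 1
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
        while pos < len(text) and text[pos].isspace():
            pos += 1
        return label

    def read_node():
        nonlocal pos
        while text[pos].isspace():
            pos += 1
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(read_node())
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1  # skip the closing ')'
        return Node(read_label(), children)

    return read_node()


def new_clades(root, new_ids):
    """Return the maximal clades whose tips are all in new_ids (tips sorted)."""
    groups = []

    def visit(node):
        # Returns (tip labels below node, True if every one of them is new).
        if not node.children:
            return [node.label], node.label in new_ids
        results = [visit(child) for child in node.children]
        leaves = [tip for child_tips, _ in results for tip in child_tips]
        all_new = all(flag for _, flag in results)
        if not all_new:
            # Mixed clade: each purely-new child is a maximal new group.
            groups.extend(sorted(tips) for tips, flag in results if flag)
        return leaves, all_new

    leaves, all_new = visit(root)
    if all_new and leaves:
        groups.append(sorted(leaves))
    return groups
```

Each returned group is a set of unclassified genomes with no reference genome among them; the internal label of the node just above it (e.g. a `f__CSP1-4` decoration) hints at the lineage it sits in. Topology alone does not say whether a group is a new genus or a new species, which would still need rank evidence such as RED or ANI. Very deep trees may exceed Python's default recursion limit; `sys.setrecursionlimit` is a quick workaround for a sketch like this.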
For example, instead of:
d__Bacteria; p__Chloroflexota; c__Limnocylindria; o__Limnocylindrales; f__CSP1-4; g__; s__
I want to have:
d__Bacteria; p__Chloroflexota; c__Limnocylindria; o__Limnocylindrales; f__CSP1-4; g__NEW-GENUS-1; s__NEW-SPECIES-1
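The string rewrite in the example above could be sketched as follows, assuming you have already decided which genomes belong together at each rank (for instance, by using the sorted tuple of genome IDs in a tree clade as the group key). `PlaceholderNamer`, `fill_empty_ranks`, and the `NEW-<RANK>-<n>` scheme are hypothetical and only mirror the example; already-named ranks are left untouched.

```python
from collections import defaultdict
from itertools import count

# Hypothetical mapping from GTDB rank prefixes to placeholder words.
RANK_NAMES = {"d__": "DOMAIN", "p__": "PHYLUM", "c__": "CLASS", "o__": "ORDER",
              "f__": "FAMILY", "g__": "GENUS", "s__": "SPECIES"}


class PlaceholderNamer:
    """Hands out NEW-<RANK>-<n>; the same (rank, group key) reuses its name."""

    def __init__(self):
        self._counters = defaultdict(lambda: count(1))
        self._names = {}

    def name(self, rank, group_key):
        key = (rank, group_key)
        if key not in self._names:
            n = next(self._counters[rank])
            self._names[key] = f"NEW-{RANK_NAMES[rank]}-{n}"
        return self._names[key]


def fill_empty_ranks(classification, group_keys, namer):
    """Fill empty ranks (e.g. 'g__') listed in group_keys with placeholders.

    group_keys maps a rank prefix to a hashable key naming the new group
    this genome belongs to at that rank; how you choose groups is up to you.
    """
    out = []
    for field in classification.split(";"):
        field = field.strip()
        prefix, name = field[:3], field[3:]
        if not name and prefix in group_keys:
            name = namer.name(prefix, group_keys[prefix])
        out.append(prefix + name)
    return "; ".join(out)
```

Two genomes given the same genus key but different species keys end up as `g__NEW-GENUS-1; s__NEW-SPECIES-1` and `g__NEW-GENUS-1; s__NEW-SPECIES-2`.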
This applies only if these unclassified genomes group together into a new taxon at some rank.
I understand this could be tricky, because some GTDB taxa will not be monophyletic in my new tree, but I just want an idea of where my unclassified genomes fall and whether they cluster into new groups.
Thanks in advance!
Eric