[de_novo_wf] taxa with multiple placements of equal quality


I used the de_novo_wf to classify a single new bacterial species after classify_wf only reported a “class assignment” and got the following warning messages:

[2023-03-10 09:57:23] WARNING: There are 1 taxa with multiple placements of equal quality.
[2023-03-10 09:57:23] WARNING: These were resolved by placing the label at the most terminal position.
[2023-03-10 09:57:23] WARNING: Ideally, taxonomic assignment of all genomes should be established before tree decoration.

I don’t understand what the takeaway message is. Should I have done anything differently, and how should I interpret these warnings?

I’m also not sure how the tree I got from de_novo_wf differs from the one made by classify_wf: the topology and the subtree around my new genome appear to be identical.



These are the commands I used, if needed:

gtdbtk classify_wf --genome_dir . --out_dir gtdbtk --mash_db .../gtdbtk-2.2.3-mash_db/ -x fasta --cpus 30 --write_single_copy_genes --keep_intermediates
gtdbtk de_novo_wf --genome_dir . --bacteria --outgroup_taxon p__Chloroflexota --out_dir gtdbtk-denovo -x fasta --cpus 30

PS. FastANI and pplacer reported classifying 0/1 genomes. I read that GTDB-Tk might occasionally fail at this step, but that the problem was fixed by reporting unclassified genomes in identify/gtdbtk.failed_genomes.tsv, which in my case is empty. I therefore assume this is really due to the novelty of the genome and not a bug that would require rerunning the command. Is that right?

[2023-03-09 21:49:54] INFO: 0 genome(s) have been classified using FastANI and pplacer.


Hi Leon,

The classify_wf places your genome into a fixed reference tree using pplacer. The de novo workflow infers a new tree using FastTree. It is great that your genome is placed in the same position, but this is not always the case. Your genome was classified as expected in both workflows. You can also ignore the warnings. These do not pertain to your genome or its classification.
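
Regarding the PS, the empty identify/gtdbtk.failed_genomes.tsv can be double-checked with a few lines of Python. This is only a minimal sketch: the path below is an assumption built from the --out_dir gtdbtk in the classify_wf command plus the file name mentioned in the PS, and the helper name failed_genomes is made up for illustration. It simply counts non-blank lines, so a header row, if the file has one, would be counted too.

```python
from pathlib import Path


def failed_genomes(path):
    """Return the non-blank lines of a failed-genomes report (empty list if none)."""
    report = Path(path)
    if not report.exists():
        raise FileNotFoundError(f"{report} not found; check the --out_dir that was used")
    lines = report.read_text().splitlines()
    # Keep only lines that contain something other than whitespace.
    return [line for line in lines if line.strip()]


if __name__ == "__main__":
    # Assumed location: --out_dir "gtdbtk" plus the identify/ file named in the PS.
    report = "gtdbtk/identify/gtdbtk.failed_genomes.tsv"
    try:
        failed = failed_genomes(report)
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"{len(failed)} genome(s) listed as failed")
```

If the script reports 0 genomes, the file is empty, as described in the PS.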