Sorry for the dumb question, this is the first time I have used GTDB-Tk. I ran the gtdbtk classify_wf command and got multiple tree files in the output, and I could not figure out why. In particular, what is “gtdbtk.high.bac120.classify.tree”?
Thanks in advance for the explanation.
Starting with version 2.0.0, the gtdbtk classify_wf pipeline uses a divide-and-conquer approach by default.
With this approach, the placement of genomes is divided into 2 steps:
- First, the user genomes are placed onto a backbone tree with one genome per family in GTDB (gtdbtk.high.bac120.classify.tree).
- Genomes classified to class level (order level for GTDB-Tk v2.0.0) on the backbone tree are then placed in a second tree (a class-level subtree). Each of these trees covers a subset of classes, and they are indexed from 1 to n (gtdbtk.bac120.classify.tree..tree).
- This second placement finalises the taxonomy of the user genome down to species level (if possible).
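If it helps to see which files belong to which step, here is a minimal Python sketch that sorts the tree files in an output directory into the backbone tree and everything else. It is not part of GTDB-Tk: the function names are made up for illustration, and it assumes the v2.0.0 naming quoted above, where the backbone file name contains ".high.". Any other tree file is simply reported as "other" (for bac120 these would be the class-level subtrees).

```python
from pathlib import Path


def group_tree_files(filenames):
    """Split classify_wf tree file names into the backbone tree and the rest.

    Assumption: the backbone tree is recognised by the ".high." marker,
    as in the v2.0.0 file name quoted in this thread.
    """
    groups = {"backbone": [], "other": []}
    for name in sorted(filenames):
        key = "backbone" if ".high." in name else "other"
        groups[key].append(name)
    return groups


def list_tree_files(out_dir):
    """Collect names of files containing 'classify.tree' anywhere under out_dir."""
    return [p.name for p in Path(out_dir).rglob("*classify.tree*") if p.is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python group_trees.py <classify_wf output directory>
    if len(sys.argv) > 1:
        groups = group_tree_files(list_tree_files(sys.argv[1]))
        for kind, names in groups.items():
            print(f"{kind}: {len(names)} file(s)")
            for name in names:
                print(f"  {name}")
```

Run it with the path to your classify_wf output directory as the only argument. With a release that names the backbone file differently, the ".high." marker would need to be adjusted.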
We are preparing a manuscript with the details.
Please note, from version 2.1.0,
gtdbtk.high.bac120.classify.tree has been renamed to