MAGs for GTDB-Tk classification annotation

Hi everyone, a follow-up question on this post.
I’m using GTDB-Tk (R226) to classify my assembled MAGs, but the results look odd and differ substantially from my expectations. For example, one MAG (completeness=96.9, contamination=0.57) is classified as d__Bacteria;p__Bacillota;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__COE1;s__COE1 sp910585765 (closest reference genome: GCA_910585765.1). I suspect this could be related to MAG fragmentation.

Rather than relying only on the final taxonomy, I’d like to identify potentially close relatives (closest strains/reference genomes) for this MAG, ideally starting from single-copy marker genes (or the GTDB marker gene set) to perform more direct comparisons (e.g., marker extraction → alignment → phylogeny or distance-based analyses).
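As a rough illustration of the distance-based option, here is a minimal sketch that ranks reference genomes by uncorrected p-distance to the MAG. It assumes the marker protein sequences have already been extracted and aligned by another tool, so all sequences have equal length. The function names (`p_distance`, `rank_relatives`) and the toy sequences are hypothetical and are not part of GTDB-Tk.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Fraction of mismatching columns, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # ignore columns where either genome lacks data
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return mismatches / compared


def rank_relatives(query: str, references: dict[str, str]) -> list[tuple[str, float]]:
    """Return (reference name, p-distance) pairs sorted from closest to most distant."""
    distances = [(name, p_distance(query, seq)) for name, seq in references.items()]
    return sorted(distances, key=lambda item: item[1])


# Toy aligned sequences, invented purely to show the call pattern.
query_alignment = "MKV-LA"
reference_alignments = {"ref_A": "MKVELA", "ref_B": "MRV-LG"}
ranking = rank_relatives(query_alignment, reference_alignments)
```

Uncorrected p-distance is only a quick screen; a phylogeny inferred from the same alignment would usually be the more defensible comparison.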

My question is:

  1. What community-recommended workflow/tools would you use to identify and compare close relatives based on single-copy marker genes?

Thank you!

Hi,

GTDB-Tk assigns genomes to species using average nucleotide identity (ANI). This approach is robust to genome quality, including cases where the genome is fragmented into many contigs. If you wish to infer a tree using the GTDB marker genes, you can use the de novo workflow of GTDB-Tk: de_novo_wf — GTDB-Tk 2.6.1 documentation.

Cheers,
Donovan