The GTDB-Tk results and mash distance

Hi, gtdbtk team!
I used GTDB-Tk to classify my MAG, and it was assigned to f__Lachnospiraceae; g__UBA3282; s__UBA3282 sp947243845. However, I am more interested in finding the known, culturable relatives of this strain. To do so, I tried searching for them with Mash distances. Despite building a large NCBI reference database with Mash (sketch size -s 10000), I still did not obtain satisfactory results.
(1)Does this imply that there are no known culturable relatives of this strain?
Here are the Mash distances between my MAG and the GTDB-Tk reference database:
image
Here are the Mash distances between my MAG and the NCBI Clostridia reference:
image
(2) It seems odd that even the closest genome in the class Clostridia has a Mash distance of 0.220489, which I find quite confusing.

Hi,

There are lots of GTDB species clusters defined by metagenome-assembled genomes (MAGs), and many of these do not contain cultured representatives. As such, it is not surprising that your MAG has no similarity to cultured representatives (isolate genomes) at NCBI.

Cheers,
Donovan

Thank you for your explanation, @donovan.parks! I understand that many GTDB species clusters are defined by metagenome-assembled genomes (MAGs), and not all of them contain cultured representatives. Given this, do you have any suggestions on how to find the closest species in the NCBI reference database, especially for MAGs with no similarity to cultured isolates? Any guidance would be greatly appreciated!

You can use a program like skani to calculate the ANI between MAGs and all genomes in the NCBI Assembly database.

Thank you for the suggestion. I’ve already tried that method, but the result was unusual. However, I’ll give it another try.

It could be that your MAGs are too genomically distinct for ANI to be calculated. If this is the case, you could try FastAAI: https://academic.oup.com/nar/article/53/8/gkaf348/8120557.
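
To put the Mash distance of 0.220489 reported earlier in the thread into perspective, here is a minimal sketch that inverts the published Mash distance formula, D = -(1/k) * ln(2j / (1 + j)), to recover the Jaccard index j that the distance implies. It assumes the default Mash k-mer size of 21 (the thread does not state k) and the sketch size of 10000 mentioned in the question; the function names are hypothetical and not part of Mash itself.

```python
import math


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Mash distance for a Jaccard index j: D = -(1/k) * ln(2j / (1 + j))."""
    if not 0.0 < jaccard <= 1.0:
        raise ValueError("jaccard must be in (0, 1]")
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k


def jaccard_from_mash_distance(distance: float, k: int = 21) -> float:
    """Invert the formula above: x = exp(-k * D), then j = x / (2 - x)."""
    if distance < 0.0:
        raise ValueError("distance must be non-negative")
    x = math.exp(-k * distance)
    return x / (2.0 - x)


if __name__ == "__main__":
    sketch_size = 10_000  # from the question (-s 10000)
    k = 21                # assumption: Mash default k-mer size
    d = 0.220489          # closest Clostridia hit reported in the thread

    j = jaccard_from_mash_distance(d, k)
    print(f"Implied Jaccard index: {j:.6f}")
    print(f"Implied shared hashes: about {j * sketch_size:.0f} of {sketch_size}")
```

Under these assumptions, the distance corresponds to roughly 49 shared hashes out of 10,000, so the estimate rests on very little shared sequence. That is consistent with the MAG being too divergent from the available isolate genomes for nucleotide-level comparisons to be informative.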