GTDB-Tk to confirm known species

Dear all,

I have around 20k genomes of known bacterial strains from public and private databases. I collected them from the respective databases, taking the species-level classification from the metadata of the studies that reported them. Most were classified with earlier GTDB versions (R89 or so); the others were classified via NCBI or from the authors' strain isolation reports.

My question is:
Is there any way to use only a subset of GTDB (only the bacterial phyla in my dataset) to reconfirm the species names and report any change in species classification as a simple yes or no, without placing the genomes in the complete reference tree?

In short, are there options to inform GTDB-Tk about the existing classification, and would that help minimize computation?

Note: I have RAM limitations, so I am using the scratch directory with 1 CPU for pplacer and 24 CPUs in general, but it is very slow. I am therefore trying to find ways to minimize computation.

Hi Naren,

No, unfortunately there is no way to limit GTDB-Tk to specific taxonomic groups. However, the latest version does assign all genomes to GTDB species clusters using ANI before placing genomes in the reference tree. Assuming all your genomes can be assigned via ANI (i.e., to existing species clusters), no genomes will be placed in the reference tree.
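
For the yes/no comparison itself, here is a minimal sketch, assuming the GTDB-Tk summary file is tab-separated with `user_genome` and `classification` columns and that the species appears as the `s__` rank of the lineage string. The function names, genome IDs, and example rows are hypothetical, for illustration only. Note that an exact string match will also report "no" when a name differs only because of GTDB renaming (e.g. NCBI-derived names), so the "no" cases would still need a manual look.

```python
import csv
import io


def species_from_lineage(lineage: str) -> str:
    """Return the species name from a GTDB-style lineage string, or '' if unassigned."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("s__"):
            return rank[3:]
    return ""


def confirm_species(summary_tsv, expected: dict) -> dict:
    """Map each genome with a known species to 'yes' (unchanged) or 'no' (changed).

    summary_tsv: an open text file (or file-like object) holding the summary table.
    expected:    genome ID -> species name from your existing metadata.
    """
    results = {}
    for row in csv.DictReader(summary_tsv, delimiter="\t"):
        genome = row["user_genome"]
        if genome in expected:
            assigned = species_from_lineage(row["classification"])
            results[genome] = "yes" if assigned == expected[genome] else "no"
    return results


# Hypothetical two-row summary table standing in for the real output file.
example_tsv = io.StringIO(
    "user_genome\tclassification\n"
    "genome_1\td__Bacteria;g__Escherichia;s__Escherichia coli\n"
    "genome_2\td__Bacteria;g__Bacillus;s__\n"
)
# Hypothetical existing classifications taken from study metadata.
expected = {"genome_1": "Escherichia coli", "genome_2": "Bacillus subtilis"}

results = confirm_species(example_tsv, expected)
for genome, verdict in results.items():
    print(genome, verdict)
```

With a real run, the in-memory example would be replaced by opening the summary file written to the output directory and passing the file handle to `confirm_species`.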

Cheers,
Donovan
