Dear all,
I have around 20k genomes of known bacterial strains from public and private databases. I collected them from the respective databases, together with their species-level classification as given in the metadata of the studies that reported them. Most were classified with earlier GTDB releases (R89 or so); the others were classified via NCBI or in the authors' strain isolation reports.
My question is:
Is there any way to use only a subset of GTDB (just the bacterial phyla present in my dataset) to reconfirm the species names and flag any change in species-level classification as a simple yes/no, without placing every genome in the complete reference tree?
In short, are there options to tell GTDB-Tk about the existing classification, and would that help reduce the computation?
Note: I have limited RAM, so I am using the scratch directory and 1 CPU for pplacer, with 24 CPUs for everything else, but it is very slow. I am therefore looking for ways to reduce the computation.
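
To make the yes/no logic concrete, here is a minimal sketch of the comparison I have in mind once a classification is available. It assumes a tab-separated summary table with `user_genome` and `classification` columns, where `classification` is a semicolon-separated lineage ending in an `s__` species field. The function names, genome IDs, and toy data are hypothetical, and the match is a plain string comparison that does not handle renamed or suffixed species.

```python
import csv
import io


def species_from_lineage(lineage):
    """Return the species name from a lineage string, or '' if unassigned."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("s__"):
            return rank[3:]
    return ""


def compare_species(existing, summary_tsv):
    """Map each genome ID to 'yes' (species unchanged) or 'no' (changed or unassigned).

    existing    -- dict of genome ID -> species name from the original metadata
    summary_tsv -- open file-like object holding the tab-separated summary table
    """
    result = {}
    reader = csv.DictReader(summary_tsv, delimiter="\t")
    for row in reader:
        genome = row["user_genome"]
        new_species = species_from_lineage(row["classification"])
        old_species = existing.get(genome, "")
        unchanged = bool(new_species) and new_species == old_species
        result[genome] = "yes" if unchanged else "no"
    return result


# Toy data (hypothetical genome IDs), standing in for the real metadata and summary file.
existing = {
    "genome_1": "Escherichia coli",
    "genome_2": "Bacillus cereus",
}
summary = io.StringIO(
    "user_genome\tclassification\n"
    "genome_1\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n"
    "genome_2\td__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;"
    "f__Bacillaceae;g__Bacillus;s__\n"
)
print(compare_species(existing, summary))
```

With real data, the `io.StringIO` object would be replaced by an opened summary file and `existing` would be built from the collected metadata.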