Resource usage optimization question

KJLambert · October 27, 2020, 9:56pm

I’m building a pipeline that will process a single bacterial genome at a time and will use GTDB-Tk in one step. The pipeline is cloud-based, so it’s cost-effective to use only the resources needed. The recommended compute hardware to run GTDB-Tk is 64 CPUs/100 GB of memory for 1000 genomes.
My question is: if I’m only running a single genome per execution of GTDB-Tk, how many CPUs are needed? Are 1 or 2 CPUs enough, or are there parts of GTDB-Tk that will use more CPUs with a single genome?

donovan.parks · October 27, 2020, 11:06pm

Hi. One CPU will be optimal if you are only processing a single genome. However, the memory requirement will still be ~100GB as this is the requirement to load the GTDB-Tk reference tree used by pplacer.
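
For a pipeline step, the advice above could be wired in roughly as follows. This is only a sketch: it assumes the `classify_wf` workflow with its `--genome_dir`, `--out_dir`, `--cpus` and `--pplacer_cpus` options, and the helper name `build_gtdbtk_command` and the directory names are made up for illustration. Newer GTDB-Tk releases may require additional arguments, so check `gtdbtk classify_wf -h` for your version.

```python
import shlex


def build_gtdbtk_command(genome_dir, out_dir, cpus=1, pplacer_cpus=1):
    """Build the argument list for a single-genome GTDB-Tk run.

    Hypothetical helper: the flag names are assumed to match the
    installed GTDB-Tk version. Memory is not set here; the instance
    itself still needs roughly 100 GB for the pplacer reference tree.
    """
    return [
        "gtdbtk", "classify_wf",
        "--genome_dir", str(genome_dir),
        "--out_dir", str(out_dir),
        "--cpus", str(cpus),                  # one CPU suffices for one genome
        "--pplacer_cpus", str(pplacer_cpus),  # keep pplacer single-threaded too
    ]


if __name__ == "__main__":
    # Placeholder directories; print the command instead of running it.
    cmd = build_gtdbtk_command("genomes/", "gtdbtk_out/")
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. On the cloud side, this points towards a memory-optimized instance with few cores rather than a CPU-heavy one.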