How long does classify_wf usually take?

lyisrae1 · February 26, 2025, 9:14pm

Hello,
May I please have some help with getting gtdbtk to go faster?

I have been trying to use gtdbtk classify_wf, and I’ve run into an unexpected problem.
I ran:
cd GTDB
start-mamba
mamba activate gtdbtk-2.1.1
gtdbtk classify_wf --genome_dir 50_percent_complete_and_10_percent_contamination --out_dir GTDB --extension .fa --skip_ani_screen --cpus 40
conda deactivate
conda deactivate
echo "Finished classifying bins"

echo = `date` job $JOB_NAME done

This ran for 6 days on my organization’s Linux cluster, and my organization’s system had to stop the program because it went past the time limit. Is this the usual amount of time it takes to run?

For clarification: the conda environment is named gtdbtk-2.1.1, but the installed version is 2.4.0. I am also using database release 220, and the database path is correct.

This is the message that I get when I submit a job: mamba (miniforge3) v24.1.2 loaded, to start type ‘start-mamba’

Wed Feb 19 14:07:47 EST 2025 job GTDB_Megahit1 started in uThC.q with jobID=5992203 on compute-76-13

And then it just stays there. Is there anything that I need to change?

Thanks,
lyisrae1

donovan.parks · July 2, 2025, 10:39pm

Hi,

GTDB-Tk can take a while to run. It depends on the number of genomes being classified. It should run in hours (not days!) if you are only classifying a couple thousand genomes.

Cheers,
Donovan
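
Since run time depends on the number of genomes, it can help to count the input files before submitting the job. Below is a minimal sketch, assuming the bins sit directly inside the folder passed to --genome_dir and all share the extension passed to --extension. The name count_genomes is a hypothetical helper for illustration and is not part of GTDB-Tk.

```shell
#!/usr/bin/env bash
# count_genomes: hypothetical helper, not part of GTDB-Tk.
# Prints how many files directly inside a directory end with a given extension.
# -L follows symbolic links, so symlinked genome files are counted as well.
count_genomes() {
    local dir="$1" ext="$2"
    find -L "$dir" -maxdepth 1 -type f -name "*${ext}" | wc -l | tr -d ' '
}

# Example call, using the directory and extension from the command in the question:
# count_genomes 50_percent_complete_and_10_percent_contamination .fa
```

If the count is in the low thousands or below, the reply above suggests a run time measured in hours, so a job that sits for days likely points to a problem other than the size of the input.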