GTDB-Tk error: pplacer requires 55 GB of RAM

I am currently using GTDB-Tk to classify metagenome-assembled genomes (MAGs), but I am encountering a memory-related issue during the pplacer step.

Specifically, the pipeline reports that approximately 55 GB of RAM is required to run pplacer, which exceeds the available memory on my system (32 GB RAM). As a result, the classification step fails or cannot proceed.

My setup:

· GTDB-Tk

· Database version: release 214

· System: Linux (Ubuntu)

· RAM: 32 GB

· CPU: Core i7

My questions are:

1. Is there a way to run GTDB-Tk (especially pplacer) with lower memory requirements?

2. Are there recommended parameters or flags to reduce RAM usage?

3. Is it acceptable to skip the pplacer step (e.g., using --skip_pplacer), and how would this affect taxonomic accuracy?

4. Are there alternative workflows or tools recommended for systems with limited RAM?

Any suggestions or best practices would be greatly appreciated.

Thank you.

Hi,

Unfortunately, GTDB-Tk does have high memory requirements due to pplacer. I believe the current version has a peak memory requirement of 110 GB, see: https://ecogenomics.github.io/GTDBTk/installing/index.html

You can try using the --scratch_dir flag, which reduces pplacer memory usage by writing to disk (at the cost of speed). I’m not sure what the exact memory requirements for GTDB-Tk are when using this flag, but they may still be >32 GB. I don’t believe there is a --skip_pplacer flag. Running pplacer is required for taxonomic classification above the rank of species.

You can also run GTDB-Tk at KBase, if that is a workable solution for you, though it uses an older version of the GTDB database: Classify Microbes with GTDB-Tk - v2.3.2 (KBase App).
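For reference, a minimal sketch of how a run with --scratch_dir might be assembled. The directory names (`mags/`, `gtdbtk_out/`, `/tmp/gtdbtk_scratch`) and the helper `build_classify_cmd` are hypothetical placeholders, and limiting pplacer to one thread with --pplacer_cpus is an extra assumption on my part rather than something confirmed to fit in 32 GB. Some GTDB-Tk releases also require further arguments, so check `gtdbtk classify_wf --help` for your installed version.

```python
import shlex


def build_classify_cmd(genome_dir, out_dir, scratch_dir, cpus=4):
    """Assemble a gtdbtk classify_wf command line (hypothetical helper).

    --scratch_dir asks pplacer to write to disk instead of holding
    everything in RAM; --pplacer_cpus 1 keeps pplacer single-threaded.
    All paths passed in are placeholders chosen for illustration.
    """
    return [
        "gtdbtk", "classify_wf",
        "--genome_dir", genome_dir,
        "--out_dir", out_dir,
        "--cpus", str(cpus),
        "--pplacer_cpus", "1",
        "--scratch_dir", scratch_dir,
    ]


cmd = build_classify_cmd("mags/", "gtdbtk_out/", "/tmp/gtdbtk_scratch")

# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(cmd))
```

The scratch directory should sit on a fast local disk with plenty of free space, since pplacer will be noticeably slower when it works from disk.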

As for alternatives, you could see if GSearch provides sufficient information for your use case: "GSearch: ultra-fast and scalable genome search by combining K-mer hashing with hierarchical navigable small world graphs" (available on PMC).

Cheers,
Donovan

Hi,

You might also take a look at KMetaShot (https://academic.oup.com/bib/article/26/1/bbae680/7941744). This should work well if your genomes are from genera that are well represented in GTDB. I’m not sure how it will perform if your genomes are more taxonomically novel.

Cheers,
Donovan

Another alternative is to run GTDB-Tk via Galaxy: https://galaxy-main.usegalaxy.org/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fgtdbtk_classify_wf%2Fgtdbtk_classify_wf%2F2.6.1%2Bgalaxy0&version=latest