Kraken2 compatible GTDB database

Hello there!
Thank you so much for this database! I was trying to build a Kraken2-compatible 16S GTDB database. I was told that I can only make a database for the 16S GTDB sequences if I format the taxonomy into the correct nodes.dmp/names.dmp files. Do you have any additional information about this, or about how I could build the database? Thank you for your time!

Hi,

You can find third-party, pre-built Kraken2 16S GTDB databases in the third-party tools section of the GTDB website:
https://gtdb.ecogenomic.org/tools

If these existing databases don’t fit your requirements, that page also links to a script for converting the GTDB taxonomy to NCBI-style taxonomy files.
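
For orientation only, here is a minimal sketch of what "NCBI-style taxonomy files" means in practice. It is not the script linked from the tools page. It assumes GTDB lineage strings of the form `d__...;p__...;...;s__...`, assigns arbitrary integer taxids, and writes only the first three columns of nodes.dmp (taxid, parent taxid, rank). The function names, the rank labels (including using "superkingdom" for the GTDB domain), and the idea that three columns are enough for your tool are all assumptions to check against your Kraken2 version.

```python
# Hypothetical helper names; rank labels are an assumption, not taken from GTDB.
RANKS = {
    "d": "superkingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def build_taxonomy(lineages):
    """Assign an integer taxid to every taxon seen in GTDB lineage strings.

    Returns (tax_ids, nodes, names) where nodes holds
    (taxid, parent_taxid, rank) and names holds (taxid, name).
    """
    tax_ids = {"root": 1}
    nodes = [(1, 1, "no rank")]  # the root is its own parent, as in NCBI dumps
    names = [(1, "root")]
    for lineage in lineages:
        parent = 1
        for taxon in lineage.split(";"):
            taxon = taxon.strip()
            prefix = taxon.split("__", 1)[0]
            if taxon not in tax_ids:
                tax_id = len(tax_ids) + 1  # arbitrary, sequential taxids
                tax_ids[taxon] = tax_id
                nodes.append((tax_id, parent, RANKS[prefix]))
                names.append((tax_id, taxon))
            parent = tax_ids[taxon]
    return tax_ids, nodes, names


def format_nodes(nodes):
    """Render nodes as NCBI-style lines: fields split by tab-pipe-tab."""
    return "".join(f"{t}\t|\t{p}\t|\t{r}\t|\n" for t, p, r in nodes)


def format_names(names):
    """Render names with an empty 'unique name' column and a name class."""
    return "".join(f"{t}\t|\t{n}\t|\t\t|\tscientific name\t|\n" for t, n in names)


if __name__ == "__main__":
    example = [
        "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
        "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
        "s__Escherichia coli",
    ]
    _, nodes, names = build_taxonomy(example)
    with open("nodes.dmp", "w") as fh:
        fh.write(format_nodes(nodes))
    with open("names.dmp", "w") as fh:
        fh.write(format_names(names))
```

Each 16S sequence would then still need to be tied to one of these taxids before a database can be built, which this sketch does not cover.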

Cheers,
Donovan

So, I went through all the resources and I'm still pretty confused. The third-party site with the 16S database only has 16S FASTA sequences, and the headers of these sequences have NCBI-format taxonomies. I can convert these to GTDB-style taxonomies to use the linked script, but that script apparently needs the whole genomes to build a Kraken2 database. So I'm not exactly sure how to proceed. Any suggestions from your end are appreciated! I was able to build a Kraken2-compatible GTDB database, though!

Hi. I don’t have any direct experience with building a 16S Kraken database. If you don’t need to build your own database, you can use IDTAXA to classify 16S sequences according to the GTDB R95 taxonomy:
http://www2.decipher.codes/Classification.html

Hello. Slight tangent, but there is another source of GTDB databases for Kraken2 (and other metagenomic classifiers) here

The one currently linked to via the third party tools page (from Lee Katz and Henk den Bakker) is not working (I’ve raised an issue), nor is the Figshare one from Guillaume Meric and Ryan Wick (I’ve told them).

Thanks!
