Announcing GTDB R08-RS214

GTDB R08-RS214 has been released and consists of 402,709 genomes organized into 85,205 species clusters. Additional statistics for this release are available on the GTDB Statistics page.

This release introduces the following changes:

  • We thank Jan Mares for his help in curating the Cyanobacteria

  • Phylum names have been updated following the valid publication of 42 names in IJSEM
    (Valid publication of the names of forty-two phyla of prokaryotes), including Bacillota and Pseudomonadota

  • Fixed an issue with the SSU files where sequences started 2 bp after the correct start and stopped
    1 bp after the correct end of the sequence. Thanks to CX for bringing this issue to our attention
    (forum thread: 16S, 23S and ssu_all_r207)

  • SSU files now provide sequences in their 5’ to 3’ orientation

  • Changed the QC criterion for the number of contigs from 1000 to 2000 to better align the
    GTDB criteria with RefSeq (see Assembly Anomalies and Other Reasons a Genome Assembly may be Excluded from RefSeq)

  • Changed QC criterion to use the ar53 instead of the ar122 marker set. The impact of this change was
    evaluated on the 353,569 genomes (~6,100 archaeal) considered for GTDB R207:
    – only 1 additional genome passed QC
    – only 21 additional genomes failed QC, which included the following species representatives:
        – s__Methanoregula sp002497485
        – s__Methanobrevibacter_A sp017634055
        – s__Methanosphaera sp003266165
        – s__MGIIa-L1 sp002688825
        – s__MGIIb-N2 sp002503665
        – s__MGIIa-L2 sp002692685
        – s__MGIIb-O3 sp002730445
        – s__DTDI01 sp011334935
        – s__Methanosphaera sp017652595
        – s__Nitrosopelagicus sp902606945
        – s__Methanolinea sp002501965
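
To illustrate the relaxed contig criterion listed above, here is a minimal Python sketch of which genomes change status when the limit moves from 1000 to 2000. The genome names and contig counts are made up, the helper names `passes_contig_qc` and `newly_passing` are hypothetical, and treating the limit as inclusive is an assumption. The GTDB methods remain the authority on the exact criteria.

```python
from typing import Dict, List

OLD_MAX_CONTIGS = 1000  # limit used before this release
NEW_MAX_CONTIGS = 2000  # limit introduced in this release


def passes_contig_qc(contig_count: int, max_contigs: int) -> bool:
    """Return True if the assembly is within the contig limit.

    Assumption: the limit is treated as inclusive in this sketch.
    """
    return contig_count <= max_contigs


def newly_passing(contig_counts: Dict[str, int]) -> List[str]:
    """List genomes that fail the old limit but pass the new one."""
    return sorted(
        gid
        for gid, n in contig_counts.items()
        if not passes_contig_qc(n, OLD_MAX_CONTIGS)
        and passes_contig_qc(n, NEW_MAX_CONTIGS)
    )


# Made-up contig counts, for illustration only.
example_counts = {"genome_A": 350, "genome_B": 1500, "genome_C": 2600}
print(newly_passing(example_counts))
```

Only the contig criterion is modelled here; a genome that newly passes it would still have to meet every other QC criterion.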

Does this include the Mash DB? I’d like to utilize the ANI screening but I’m not sure exactly how to use or get the Mash DB for the current build.

Updated reference files for GTDB-Tk are in the works and should be ready in the next week.

Congratulations Donovan and team! Looking forward to pulling my genome collection through the new release.

GTDB-Tk produces the Mash DB itself when you give the --mash_db option a path to a file that does not yet exist.
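
As a rough sketch of that workflow, the snippet below assembles a `gtdbtk classify_wf` command line in Python. It assumes a GTDB-Tk version whose `classify_wf` accepts `--mash_db` alongside `--genome_dir` and `--out_dir`. The directory and file names are placeholders and the helper `build_classify_cmd` is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_classify_cmd(genome_dir: str, out_dir: str, mash_db: str) -> List[str]:
    """Assemble a classify_wf call that points --mash_db at the given path."""
    return [
        "gtdbtk", "classify_wf",
        "--genome_dir", genome_dir,
        "--out_dir", out_dir,
        "--mash_db", mash_db,
    ]


if __name__ == "__main__":
    mash_db = Path("gtdb_ref_sketch.msh")  # placeholder file name
    if mash_db.exists():
        print(f"{mash_db} already exists")
    else:
        print(f"{mash_db} does not exist yet; it should be created on the first run")

    cmd = build_classify_cmd("genomes/", "classify_out/", str(mash_db))
    print(" ".join(cmd))
    # Uncomment to actually run GTDB-Tk (requires an installed release
    # together with its reference data):
    # subprocess.run(cmd, check=True)
```

Keeping the path in one variable makes it easy to point later runs at the same file.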