Announcing GTDB R10-RS226

GTDB release R10-RS226 comprises 732,475 genomes (22% increase) organized into 143,614 species clusters (37% increase). Additional statistics for this release are available on the GTDB Statistics page.

Release notes

  • After the curation cycle, we identified an updated spelling for one taxon and a valid name for one placeholder:

    • g__Prometheoarchaeum (updated name: Promethearchaeum)
    • f__MK-D1 (updated name: Promethearchaeaceae)
      Note that the LPSN linkouts point to the correct updated names. We encourage users to use the updated names as these will appear in the next release.
  • QC criteria for GTDB were modified to consider CheckM v1 and v2 completeness
    and contamination estimates. In order to pass QC, a genome must have completeness
    >=50%, contamination <5%, and quality (completeness - 5*contamination)
    >=50% using both the CheckM v1 and v2 estimates. The exception is that a genome
    comprising <10 contigs passes QC if these criteria are met by either CheckM v1 or v2.

  • Mash is no longer used as a prefilter for establishing GTDB species clusters
    as this was found to be unnecessary with the prefiltering provided internally
    by skani (Shaw et al., Nat Methods, 2023).

  • The 20% most heterogeneous sites were removed from the archaeal MSA using alignment_pruner.pl (from the novigit/broCode repository on GitHub).

  • The GTDB taxonomy tree now links to Sandpiper (https://sandpiper.qut.edu.au) results, which provide information about the geographic and environmental distribution of a taxon.

  • We thank Jan Mares for his assistance in curating the class Cyanobacteriia,
    Peter Golyshin for bringing Ferroplasma acidiphilum strain Y (GCF_002078355.1) to our attention, and Brian Kemish for providing IT support to the project.
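
For readers who want to apply the QC rule from the release notes to their own genomes, here is a minimal Python sketch of that logic. It is an illustration, not GTDB's actual pipeline code: the names `CheckMEstimate`, `meets_criteria`, and `passes_qc` are hypothetical, and the thresholds are simply copied from the note above. It assumes you already have completeness and contamination percentages from both CheckM versions.

```python
from dataclasses import dataclass

# Thresholds copied from the release note above (assumed here, not taken from GTDB code).
MIN_COMPLETENESS = 50.0   # completeness must be >= this (percent)
MAX_CONTAMINATION = 5.0   # contamination must be < this (percent)
MIN_QUALITY = 50.0        # completeness - 5*contamination must be >= this
CONTIG_EXCEPTION = 10     # genomes with fewer contigs than this need only one passing estimate


@dataclass(frozen=True)
class CheckMEstimate:
    """Hypothetical container for one CheckM result, in percent."""
    completeness: float
    contamination: float


def meets_criteria(est: CheckMEstimate) -> bool:
    """Check one CheckM estimate against the three thresholds."""
    quality = est.completeness - 5 * est.contamination
    return (
        est.completeness >= MIN_COMPLETENESS
        and est.contamination < MAX_CONTAMINATION
        and quality >= MIN_QUALITY
    )


def passes_qc(v1: CheckMEstimate, v2: CheckMEstimate, n_contigs: int) -> bool:
    """Require both estimates to pass, unless the genome has <10 contigs."""
    ok_v1 = meets_criteria(v1)
    ok_v2 = meets_criteria(v2)
    if n_contigs < CONTIG_EXCEPTION:
        return ok_v1 or ok_v2
    return ok_v1 and ok_v2


# Example: the v2 estimate fails on quality (60 - 5*3 = 45).
v1 = CheckMEstimate(completeness=95.0, contamination=2.0)
v2 = CheckMEstimate(completeness=60.0, contamination=3.0)
fragmented = passes_qc(v1, v2, n_contigs=50)   # both estimates must pass
near_complete = passes_qc(v1, v2, n_contigs=3)  # either estimate may pass
```

In this example the genome with 50 contigs fails because only the v1 estimate meets the criteria, while the same estimates on a 3-contig genome pass under the exception.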
