GTDB release R10-RS226 comprises 732,475 genomes (22% increase) organized into 143,614 species clusters (37% increase). Additional statistics for this release are available on the GTDB Statistics page.
Release notes
-
After the curation cycle, we identified an updated spelling for one taxon and a valid name for a placeholder:
- g__Prometheoarchaeum (updated name: Promethearchaeum)
- f__MK-D1 (updated name: Promethearchaeaceae)
Note that the LPSN linkouts point to the correct updated names. We encourage users to use the updated names as these will appear in the next release.
-
QC criteria for GTDB were modified to consider both CheckM v1 and v2 completeness and contamination estimates. To pass QC, a genome must have completeness >=50%, contamination <5%, and quality (completeness - 5*contamination) >=50% using both the CheckM v1 and v2 estimates. The exception is that a genome consisting of <10 contigs passes QC if these criteria are met by either CheckM v1 or v2.
-
Mash is no longer used as a prefilter for establishing GTDB species clusters, as this was found to be unnecessary given the prefiltering provided internally by skani (Shaw et al., Nat Methods, 2023).
-
The 20% most heterogeneous sites were removed from the archaeal MSA using alignment_pruner.pl (from the novigit/broCode repository on GitHub).
-
The GTDB taxonomy tree now provides links to Sandpiper (https://sandpiper.qut.edu.au) results, which provide information about the geographic and environmental distribution of a taxon.
-
We thank Jan Mares for his assistance in curating the class Cyanobacteriia,
Peter Golyshin for bringing Ferroplasma acidiphilum strain Y (GCF_002078355.1) to our attention, and Brian Kemish for providing IT support to the project.
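
The CheckM-based QC rule described in the notes above can be sketched in Python as follows. This is a minimal illustration, not GTDB's actual implementation: the names `CheckMEstimate`, `meets_thresholds`, and `passes_qc` are hypothetical, values are assumed to be percentages, and only the completeness, contamination, and quality thresholds stated above are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckMEstimate:
    """Completeness and contamination (in percent) from one CheckM version.

    Hypothetical container used only for this sketch.
    """
    completeness: float
    contamination: float

    @property
    def quality(self) -> float:
        # Quality as defined in the release notes: completeness - 5 * contamination.
        return self.completeness - 5 * self.contamination

    def meets_thresholds(self) -> bool:
        # Completeness >= 50%, contamination < 5%, quality >= 50%.
        return (
            self.completeness >= 50
            and self.contamination < 5
            and self.quality >= 50
        )


def passes_qc(v1: CheckMEstimate, v2: CheckMEstimate, num_contigs: int) -> bool:
    """Apply the CheckM part of the QC rule to one genome."""
    if num_contigs < 10:
        # Exception: genomes with fewer than 10 contigs pass if either estimate qualifies.
        return v1.meets_thresholds() or v2.meets_thresholds()
    # Otherwise both the CheckM v1 and v2 estimates must qualify.
    return v1.meets_thresholds() and v2.meets_thresholds()
```

For example, a genome with a CheckM v1 estimate of 95% completeness and 1% contamination but a CheckM v2 estimate of 60% completeness and 4% contamination (quality 40%) would fail under this sketch if it had 50 contigs, and pass if it had 5 contigs.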