Hi Liping Qu,
Thank you for your message. You can find the release notes for R226 on both the GTDB FTP site and this forum:
- https://data.ace.uq.edu.au/public/gtdb/data/releases/release226/226.0/RELEASE_NOTES.txt
- Announcing GTDB R10-RS226
The R226 release notes state that we now use both CheckM v1 and CheckM v2 to determine which genomes should be included in GTDB.
We do assess any methodological change before implementing it in GTDB, and these assessments are described either in the release notes or in manuscripts. The incorporation of CheckM v2 will be discussed in a NAR database manuscript we are currently preparing. In short, adopting CheckM v2 quality estimates, together with the exception for genomes with <10 contigs, resulted in 12,214 additional genomes failing QC (an 11.1% increase) and only 178 additional genomes passing QC (a 0.023% increase). These changes therefore largely make the QC more stringent.
We appreciate that changes to QC standards reduce the stability of GTDB. However, we have to balance this against reflecting best practices in the community. CheckM v2 was published nearly two years ago and appears to have been widely adopted, as evidenced by its citation count. For these reasons, we determined that it was appropriate to start using CheckM v2 genome quality estimates as part of the GTDB QC.
We plan to use the latest versions of CheckM v2 as they become available, including any bug fixes and updates to its ML models.
I hope this clarifies our decision to incorporate CheckM v2 into the GTDB QC process.
Regards,
Donovan