Question about genome QC methods

Dear GTDB team,

I have some genomes of interest that I would like to quality check using the same procedure that GTDB applies for its own quality control. From the methods section, it looks like most of the statistics behind the criteria can be obtained with your CheckM program. However, for criterion iv (contain >40% of the bac120 or ar53 marker genes), I am not sure how the value is calculated, or whether it can be obtained with CheckM or requires another program. Likewise, for criterion vi (have an N50 >5 kb), do you use the contig N50 or the scaffold N50?

Could you explain in detail how you obtain these two statistics?

Thank you very much in advance for any help you can give me.
Best regards, Sam

Hi Sam,

We used the contig N50. Criterion iv is specific to GTDB and is used because we require a sufficient number of these marker genes for phylogenetic inference. For general MAG QC, I would not recommend this criterion. If you really want to use it, let me know and I can see if GTDB-Tk can produce this number for you.
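To make the two statistics concrete, here is a minimal Python sketch. It is an illustration only, not the code used by GTDB, CheckM, or GTDB-Tk. The function names are hypothetical, the rule of splitting scaffolds into contigs at runs of 10 or more `N` characters is an assumption, and so is the idea that the marker percentage is simply the number of marker genes identified divided by the set size (120 for bac120, 53 for ar53).

```python
import re


def split_into_contigs(scaffold, min_gap=10):
    """Split a scaffold into contigs at runs of >= min_gap 'N' bases.

    The min_gap threshold is an assumption made for this sketch.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in pattern.split(scaffold) if piece]


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def marker_percent(markers_found, marker_set):
    """Percentage of a marker set identified in a genome.

    Assumes a plain count over the set size; how a marker is counted
    as 'found' is not specified here.
    """
    set_size = {"bac120": 120, "ar53": 53}[marker_set]
    return 100.0 * markers_found / set_size
```

For the contig N50, you would pass the lengths of the pieces returned by `split_into_contigs` for every scaffold; for the scaffold N50, you would pass the scaffold lengths directly. A genome would then meet criterion vi if the contig N50 exceeds 5000 and criterion iv if `marker_percent` exceeds 40.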

Cheers,
Donovan