I’ve recently come across a few entries with “Undefined (Failed Quality Check)” in the GTDB taxonomy field (for example, GTDB - GCA_003648795.1), and I’m somewhat confused about their status.
Are entries like this one included in the GTDB set, or have they failed one or more of the criteria for inclusion? (My example appears to fail the N50 criterion.)
As I formulate the question, I think I understand what’s going on, but it would still be good to confirm. Are all assemblies in RefSeq trawled, with a web entry generated for each, but only those that pass all quality criteria included in the non-redundant species representative set? In other words, are there more web entries than genomes in GTDB?