I’m trying to figure out good defaults for forming genome clusters from skani. I see the default ANI is 95% and min_af is 50%. However, does this require that both the reference → query and query → reference alignment fractions be above 50%, or just one of them?
Hi. We always consider the maximum AF, so only one of the two directions needs to be above the threshold. The rationale is that GTDB contains many MAGs. It is entirely possible you are comparing a near-complete genome to a partial MAG, which will result in an artificially low AF. In contrast, comparing a partial MAG to a near-complete genome should give a robust AF estimate. This also helps mitigate the issue of comparing two partial genomes/MAGs, though it doesn’t completely resolve this challenge.
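As a rough illustration of the rule described above (a sketch only, not the actual skani or GTDB code), the check could look like this in Python. The function name `passes_thresholds` and its parameter names are hypothetical; the 95% ANI and 50% AF defaults are the values mentioned in the question.

```python
def passes_thresholds(ani, af_ref, af_query, min_ani=95.0, min_af=50.0):
    """Return True if ANI meets the cutoff and the larger of the two AFs meets min_af.

    ani, af_ref and af_query are percentages (0-100). All names here are
    illustrative assumptions, not part of any real tool's API.
    """
    # Only the maximum AF is tested, so a low AF in one direction
    # (e.g. near-complete genome vs. partial MAG) does not fail the pair.
    return ani >= min_ani and max(af_ref, af_query) >= min_af


# Near-complete genome vs. partial MAG: AF is low in one direction only.
print(passes_thresholds(97.2, af_ref=35.0, af_query=92.0))  # True

# Both directions below min_af: the pair is rejected.
print(passes_thresholds(97.2, af_ref=35.0, af_query=40.0))  # False
```

If you wanted the stricter behaviour (both directions above the cutoff), you would swap `max` for `min` in the sketch.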
Thanks for explaining the logic here! It makes a lot of sense, especially when dealing with the fragmented genomes that are common in metagenomic datasets.