In my analysis, I ran into a problem with two bacterial species, ‘s__CSP1-5 sp012974305’ and ‘s__CSP1-5 sp027293415’. The two reference genomes share 95.35% ANI, yet they are assigned to different species in the GTDB database. I have a set of MAGs; gtdb-tk v2.4.0 identified some of them as sp012974305 and the remaining ones as sp027293415 (in both cases the classification was based on ANI only, and each MAG shared more than 98% ANI with its assigned reference genome). However, all of these MAGs share more than 96.5% ANI with each other and were clustered into a single species by dRep (the MAGs formed two sub-clusters, but the two sub-clusters share more than 96.5% ANI; no reference genomes were included in the dRep analysis, and skani is version 0.2.1 in both dRep and gtdb-tk). Do these two lines of evidence suggest that the two species should be considered a single species?
Here is the skani matrix; only two representative MAGs with >99% completeness and <2% contamination are shown (n=4, where n is the number of input genomes):
4
GCA_012974305.1.fa
GCA_027293415.1.fa 95.35
bin.GT1.1.fa 96.43 98.66
bin.SY14.2.fa 98.39 96.49 96.76
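To make the matrix easier to reason about, here is a minimal Python sketch that parses the lower-triangular matrix above and groups the genomes by single-linkage at a chosen ANI threshold. The function names (`parse_lower_triangle`, `single_linkage`) are my own, and single-linkage is only a simplification for illustration; it is not the procedure used by dRep or by GTDB to delineate species.

```python
from itertools import combinations

# Lower-triangular ANI matrix copied from the skani output above.
MATRIX = """\
4
GCA_012974305.1.fa
GCA_027293415.1.fa 95.35
bin.GT1.1.fa 96.43 98.66
bin.SY14.2.fa 98.39 96.49 96.76
"""


def parse_lower_triangle(text):
    """Return (names, ani) where ani maps frozenset({a, b}) -> ANI in percent."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = int(rows[0][0])
    names, ani = [], {}
    for row in rows[1:n + 1]:
        name, values = row[0], row[1:]
        # The i-th value on a row is the ANI to the i-th genome listed before it.
        for other, value in zip(names, values):
            ani[frozenset((name, other))] = float(value)
        names.append(name)
    return names, ani


def single_linkage(names, ani, threshold):
    """Group genomes connected by at least one pair with ANI >= threshold."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if ani[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


names, ani = parse_lower_triangle(MATRIX)
for threshold in (95.0, 98.0):
    print(threshold, single_linkage(names, ani, threshold))
```

With these four genomes, a 95% threshold gives one group, while a 98% threshold separates them into the two groups that match the gtdb-tk assignments below.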
Here are the results of gtdb-tk v2.4.0 (n=1199):
bin.GT1.1 d__Bacteria;p__Methylomirabilota;c__Methylomirabilia;o__Methylomirabilales;f__CSP1-5;g__CSP1-5;s__CSP1-5 sp027293415 GCA_027293415.1 95.0 d__Bacteria;p__Methylomirabilota;c__Methylomirabilia;o__Methylomirabilales;f__CSP1-5;g__CSP1-5;s__CSP1-5 sp027293415 98.61 0.743 N/A N/A N/A N/A N/A N/A ani_screen classification based on ANI only GCA_012974305.1, s__CSP1-5 sp012974305, 95.0, 96.21, 0.747 N/A N/A N/A N/A
bin.SY14.2 d__Bacteria;p__Methylomirabilota;c__Methylomirabilia;o__Methylomirabilales;f__CSP1-5;g__CSP1-5;s__CSP1-5 sp012974305 GCA_012974305.1 95.0 d__Bacteria;p__Methylomirabilota;c__Methylomirabilia;o__Methylomirabilales;f__CSP1-5;g__CSP1-5;s__CSP1-5 sp012974305 98.35 0.882 N/A N/A N/A N/A N/A N/A ani_screen classification based on ANI only GCA_027293415.1, s__CSP1-5 sp027293415, 95.0, 96.12, 0.771 N/A N/A N/A N/A
By the way, the ANI values differ between my skani run and gtdb-tk, even though I used the same skani from the gtdb-tk conda environment. The only difference is the number of input genomes (n=4 versus n=1199). For the same genome pair, the ANI value is slightly lower in the run with more input genomes. So I guess the two species might share less than 95% ANI in the large data set used for the GTDB analysis but more than 95% ANI in my own small data set? I’m not sure whether to assign these MAGs to a single species or to two different species.
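For reference, here is a small Python sketch that puts the two sets of values side by side. The numbers are copied by hand from the skani matrix (n=4) and from the gtdb-tk rows (n=1199) above; the names `PAIRS` and `differences` are just for illustration. The reference-versus-reference pair is left out because the gtdb-tk rows do not report it.

```python
# (MAG, reference genome) -> (ANI from skani with n=4, ANI from gtdb-tk with n=1199)
PAIRS = {
    ("bin.GT1.1", "GCA_027293415.1"): (98.66, 98.61),
    ("bin.GT1.1", "GCA_012974305.1"): (96.43, 96.21),
    ("bin.SY14.2", "GCA_012974305.1"): (98.39, 98.35),
    ("bin.SY14.2", "GCA_027293415.1"): (96.49, 96.12),
}


def differences(pairs):
    """Return skani ANI minus gtdb-tk ANI for each pair, rounded to 2 decimals."""
    return {key: round(small - large, 2) for key, (small, large) in pairs.items()}


for (mag, ref), delta in differences(PAIRS).items():
    print(f"{mag} vs {ref}: {delta:+.2f}")

# Lowest MAG-to-reference ANI reported in the larger (gtdb-tk) run.
print("min gtdb-tk ANI:", min(large for _, large in PAIRS.values()))
```

For these four pairs the gtdb-tk value is always the lower one, and every MAG-to-reference value in both runs is still above 95%.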