I’m wondering if there are GFF or similar annotation files for the genomes in GTDB. Beyond the raw sequences in the FASTA files, I am interested in the positions of the genomic elements relative to one another. I suspect that GFF files were output by the protein prediction pipeline, but they don’t seem to be provided in the data repository. They would be very useful for downstream analysis and visualisation.

Ah, I’ve been informed that a lot of this information is actually in the FASTA headers. I will see if I can write a script to convert them to .gff or .gbk.
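
As a starting point, here is a minimal Python sketch of the header-to-GFF conversion. It is not tested against the actual GTDB files: it assumes the headers follow the Prodigal-style layout `>contig_N # start # end # strand # key=value;...`, that the contig name is the sequence ID with the trailing `_N` removed, and that writing `Prodigal` in the source column is appropriate. The function names are my own, and the phase is simply set to 0.

```python
def prodigal_header_to_gff(header):
    """Convert one Prodigal-style FASTA header into a GFF3 feature line.

    Assumed layout (not confirmed for every GTDB file):
    >contig_N # start # end # strand # ID=1_N;partial=00;...
    """
    fields = [f.strip() for f in header.strip().lstrip(">").split(" # ")]
    if len(fields) < 4:
        raise ValueError(f"Unexpected header layout: {header!r}")

    gene_id, start, end, strand = fields[:4]
    extra = fields[4] if len(fields) > 4 else ""

    # Assumption: the contig name is the gene ID minus its trailing "_N".
    contig = gene_id.rsplit("_", 1)[0]
    strand_symbol = "+" if int(strand) == 1 else "-"

    # Use the full gene ID as the GFF ID and drop the short ID from the
    # header so the attribute column does not contain two ID keys.
    attributes = [f"ID={gene_id}"]
    attributes += [kv for kv in extra.split(";") if kv and not kv.startswith("ID=")]

    # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    # Phase is set to 0 here, which may be wrong for partial genes.
    return "\t".join(
        [contig, "Prodigal", "CDS", start, end, ".", strand_symbol, "0",
         ";".join(attributes)]
    )


def fasta_to_gff(lines):
    """Yield GFF3 lines for every header found in an iterable of FASTA lines."""
    yield "##gff-version 3"
    for line in lines:
        if line.startswith(">"):
            yield prodigal_header_to_gff(line)
```

To use it, open a protein FASTA file and pass the file handle to `fasta_to_gff`, writing each yielded line to a `.gff` file. Producing .gbk would additionally need the nucleotide sequence, so it is not covered here.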