Hello GTDB team,
First-time user here.
Is there a straightforward way to retrieve full-length 16S and 23S rRNA sequences, including introns, for specific taxa in GTDB? Some taxa appear to have 23S rRNAs identified, but it is unclear how to access them.
I am interested in rRNA sequences from Archaea. What is the difference between sequences in ssu_all_r207.tar.gz and those in ar53_ssu_reps.tar.gz?
I can extract the sequences from the ssu_all_r207.tar.gz file without problems, but the annotated positions (and lengths) of the sequences in this file differ from those of the original sequences on NCBI (example below).
From ssu_all_r207, the sequence length is 202, which is 1 short of the 203 expected for positions 11802-12004:
>GB_GCA_019058055.1~JAHLWG010000214.1 d__Archaea;p__Asgardarchaeota;c__Lokiarchaeia;o__CR-4;f__SOKP01;g__Loki-b32;s__Loki-b32 sp019057865 [location=11802..12004] [ssu_len=202] [contig_len=12006]
GAGGTGATCCAGCCGCAGGTTCCCCTACGGCTACCTTGTTACGACTTCTCCCTCCTCGCATACTAGAAACTCGATATGACCAGTCTGACCATACCTCATTTTTAGCACACTCGGATGGAGCGACGGGCGGTGTGTGCAAGGAGCAGAGACGTATTCACCGTGCGATGATGACACACGATTACTAGGGATTCCACGTTCATGT
From NCBI GenBank, the same region extracted using the annotated positions (MAG: Candidatus Lokiarchaeota archaeon isolate 3H5_20 k141_57026) has length 203, as expected:
>JAHLWG010000214.1:11802-12004 MAG: Candidatus Lokiarchaeota archaeon isolate 3H5_20 k141_57026, whole genome shotgun sequence
AGGAGGTGATCCAGCCGCAGGTTCCCCTACGGCTACCTTGTTACGACTTCTCCCTCCTCGCATACTAGAA
ACTCGATATGACCAGTCTGACCATACCTCATTTTTAGCACACTCGGATGGAGCGACGGGCGGTGTGTGCA
AGGAGCAGAGACGTATTCACCGTGCGATGATGACACACGATTACTAGGGATTCCACGTTCATG
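For anyone who wants to reproduce the comparison, here is a minimal Python sketch. The helper names `span_length` and `find_shift` are hypothetical and are not part of any GTDB or NCBI tool. The first helper shows the length implied by a coordinate pair under two common conventions (1-based inclusive and 0-based half-open). The second looks for an ungapped shift at which one sequence matches the other over their whole overlap. The demo strings are only the first 12 and 10 bases of the two records above. This illustrates the arithmetic only; it does not establish which coordinate convention GTDB actually uses.

```python
from typing import Optional, Tuple


def span_length(start: int, end: int, inclusive: bool = True) -> int:
    """Length implied by a coordinate pair.

    inclusive=True  -> both ends counted (1-based inclusive style)
    inclusive=False -> end excluded (0-based half-open style)
    """
    return end - start + 1 if inclusive else end - start


def find_shift(ref: str, query: str, max_shift: int = 10) -> Optional[Tuple[int, int]]:
    """Find an ungapped shift s such that query[i] == ref[i + s] everywhere
    the two sequences overlap. Shifts are tried in order of increasing
    magnitude. Returns (shift, overlap_length), or None if no shift within
    +/- max_shift gives a mismatch-free overlap.
    """
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        pairs = [
            (query[i], ref[i + s])
            for i in range(len(query))
            if 0 <= i + s < len(ref)
        ]
        if pairs and all(q == r for q, r in pairs):
            return s, len(pairs)
    return None


# Lengths implied by the annotated positions 11802 and 12004.
print(span_length(11802, 12004))                   # 203
print(span_length(11802, 12004, inclusive=False))  # 202

# First bases of the two records quoted above (prefixes only, for brevity).
ncbi_prefix = "AGGAGGTGATCC"
gtdb_prefix = "GAGGTGATCC"
print(find_shift(ncbi_prefix, gtdb_prefix))        # (2, 10)
```

On these prefixes, the GTDB sequence lines up with the NCBI slice starting two bases in. Pasting the full sequences into `find_shift` would show whether the same shift holds over the entire region.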
I greatly appreciate your input and advice.