GTDB Forum

Reverse translating MSA marker gene alignment

Hi everyone

GTDB is an amazing resource. I am looking at designing primers for a protein-coding gene to look at strain-level diversity of a clade, and thought I could use GTDB to check across >100 marker genes for good candidates for primer design.

From what I can work out from the current download page, bac120_msa_marker_genes_reps contains the trimmed and aligned marker genes for the GTDB representatives. From the method outlined, HMMER uses amino acid sequences instead of nucleotides, so I was wondering if I could reverse translate these back into their nucleotide sequences?

Has anyone tried this before? Could this be done by feeding bac120_msa_marker_genes_reps and the nucleotide version of bac120_marker_genes_reps into a tool such as pal2nal?

Any tips appreciated.
Many thanks

Hi Dan. I think this would be fairly complicated since the aligned marker genes are trimmed. Aligning with HMMER also means that only columns fitting the HMM are retained. I’d recommend directly performing an MSA on the nucleotide sequences in bac120_marker_genes_reps_<release>.tar.gz.
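
To illustrate why trimming gets in the way, here is a minimal Python sketch of the codon-threading idea that tools such as pal2nal are built around. It is not pal2nal's actual code, and the function name thread_codons is hypothetical. It assumes each aligned protein row still contains every residue encoded by its nucleotide sequence, and it only checks lengths, not that each codon actually translates to the aligned residue. A row that has lost columns no longer matches its nucleotide sequence, so the sketch raises an error instead of guessing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Back-translate one aligned protein row by threading in its codons.

    Assumes the row is untrimmed: every residue encoded by `cds` must
    still be present in `aligned_protein`.
    """
    cds = cds.upper()
    n_residues = sum(1 for c in aligned_protein if c != gap)

    # Drop a single trailing stop codon, which has no residue in the protein.
    if len(cds) == 3 * (n_residues + 1) and cds[-3:] in STOP_CODONS:
        cds = cds[:-3]

    # A trimmed row has fewer residues than the CDS has codons.
    if len(cds) != 3 * n_residues:
        raise ValueError(
            f"{n_residues} residues but {len(cds)} nucleotides: "
            "row may be trimmed or the sequences do not match"
        )

    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Each gap column becomes three gap characters; each residue takes
    # the next codon in order.
    return "".join(gap * 3 if c == gap else next(codons) for c in aligned_protein)


if __name__ == "__main__":
    print(thread_codons("M-K", "ATGAAATAA"))  # untrimmed row threads cleanly
    try:
        thread_codons("MK", "ATGCCCAAA")  # middle residue trimmed away
    except ValueError as err:
        print("failed:", err)
```

With an untrimmed protein alignment this kind of threading is straightforward. With the trimmed GTDB alignment you would first have to work out which residues were removed from each row, which is the complicated part.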

Hi Donovan.

Thanks for your reply. This is what I have tried. It has not worked overly well yet, but I will get there. If there is a desire for it, my scripts all use OSS and I would be happy to try and make them reproducible and useful to the community.

Btw I have found your documentation and release files very easy to use and navigate. :slight_smile:


Hi Dan,

Thanks for the kind words regarding the documentation and release files. We are sharing GTDB-related scripts with the community via the Tools->Third party page:

More than happy to link to your scripts if you wish to share them.