Where can I find the universal gene/marker gene set for each database genome?

Dear GTDB team,

I am wondering where I can find the universal gene set extracted from each database genome (in amino acid/protein format) for the newest release. I can only find the MSA in the database directory. Presumably you extracted these genes right before building the MSA?

Thanks,

Jianshu

Hi Jianshu,

You can find a description of all files provided by the GTDB at:
https://data.gtdb.ecogenomic.org/releases/release202/202.0/FILE_DESCRIPTIONS

I believe you are after the following files:
./genomic_files_all/ar122_marker_genes_all_r202.tar.gz
./genomic_files_all/bac120_marker_genes_all_r202.tar.gz

Cheers,
Donovan

Thanks Donovan,

I thought you first find the entire marker gene set for each genome and then split it up and organize the sequences by marker gene. Is there anywhere that provides each genome with its own marker set, rather than files organized by marker gene name? It should be very easy to regroup them myself, but I am asking just in case you have already done it.

Thanks,

Jianshu

Hi Jianshu,

Sorry - we don’t have the data organized as you are describing. Internally, we do produce something like this when inferring the tree, but it is immediately aligned and trimmed to form the MSA.

Cheers,
Donovan
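
For anyone who wants the per-genome layout discussed above, here is a minimal sketch of regrouping the per-marker files into one file per genome. It rests on assumptions that are not confirmed in this thread: that the extracted marker archive contains one protein FASTA file per marker gene (matched here with a hypothetical `*.faa` pattern), and that the first whitespace-delimited token of each FASTA header is the genome accession. Check both against FILE_DESCRIPTIONS and the actual files before relying on it. The names `read_fasta` and `regroup_by_genome` are illustrative only.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def regroup_by_genome(marker_dir, out_dir, pattern="*.faa"):
    """Write one FASTA per genome from a directory of per-marker FASTA files.

    Assumption: the first token of each header is the genome accession,
    and the file stem is the marker gene name.
    """
    marker_dir, out_dir = Path(marker_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # genome accession -> list of (marker name, protein sequence)
    per_genome = defaultdict(list)
    for marker_file in sorted(marker_dir.glob(pattern)):
        marker = marker_file.stem
        for header, seq in read_fasta(marker_file):
            genome_id = header.split()[0]
            per_genome[genome_id].append((marker, seq))

    # Each output record is named after the marker it came from.
    for genome_id, records in per_genome.items():
        with open(out_dir / f"{genome_id}.faa", "w") as out:
            for marker, seq in records:
                out.write(f">{marker}\n{seq}\n")
    return per_genome
```

This version holds every sequence in memory before writing, which keeps the code short; for the full set of genomes it may be preferable to append to the per-genome files while reading each marker file instead.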