I would like to narrow down a phylogenetic tree to type material only

Hi, GTDB team. Thank you for developing and maintaining such great software.
I have some questions.
I am preparing to submit a paper about novel isolates to IJSEM, and I am using GTDB to create a phylogenetic tree. When I output the tree in GTDB-Tk, all GTDB accession numbers are displayed at once. Is there an argument for ‘classify_wf’ that narrows the search to only the type strains recognized by LPSN? On your web page, I can narrow the search to only NCBI and/or GTDB type strains.
If there is no such argument, I would like to write a Python script using your API. Could you please tell me how to use the API for this purpose?
Thank you in advance.
Yours sincerely,
Hamaguchi

Hi Hamaguchi,

Thank you!
We are happy to hear that you find our resources valuable for your research.

Please note that you can use all GTDB-Tk commands individually, meaning that you can narrow down your dataset as desired. See here: Example — GTDB-Tk 2.3.0 documentation
You can create your own ‘Genome directory’ containing the genomes of the type strains of interest and exclude the GTDB reference genomes from tree inference (if this is what you want to do). The list of type strain genomes for a specific lineage can be generated from the Advanced Search, as you mentioned, and their accessions can then be used to get the corresponding FASTA files (you can now download the selected genomes directly from NCBI in our Advanced Search menu).
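
As an illustration of preparing such a genome set, here is a minimal Python sketch. It assumes the type strain FASTA files and your own isolates have already been downloaded into one directory, and it writes the tab-separated list (FASTA path, genome ID) that GTDB-Tk accepts through its --batchfile option as an alternative to --genome_dir. The function name, the ".fna" extension, and the file names are hypothetical and only for illustration.

```python
from pathlib import Path


def write_batchfile(genome_dir, batchfile, extension=".fna"):
    """Write a two-column, tab-separated file: FASTA path, genome ID.

    The genome ID is taken from the file name without its extension
    (an assumption of this sketch; any unique ID would do).
    """
    genome_dir = Path(genome_dir)
    rows = []
    for fasta in sorted(genome_dir.glob(f"*{extension}")):
        genome_id = fasta.name[: -len(extension)]
        rows.append(f"{fasta.resolve()}\t{genome_id}")
    # One genome per line, each line terminated by a newline.
    Path(batchfile).write_text("".join(f"{row}\n" for row in rows))
    return len(rows)
```

For example, write_batchfile("type_strain_genomes", "genomes.tsv") would list every ".fna" file in the hypothetical "type_strain_genomes" directory and return the number of genomes written.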

If I understand you correctly, you want to include only genomes from the type strains (?) or those that are annotated with the validly published species names (?). Is this correct?

Could you please clarify your request?

Thank you!
Masha

Hi Masha,

“If I understand you correctly, you want to include only genomes from the type strains (?) or those that are annotated with the validly published species names (?).”

I would like to include in my phylogenetic tree only genomes of species recognized by LPSN, that is, validly published species.

I used “Advanced Search” to download multiple FASTA files of the type strains of interest. I downloaded a .sh file named “gtdb-adv-search-genomes.sh”. I think this is the “Genome directory” that you mentioned.
I have another question.
How can I use my own “Genome directory” with GTDB-Tk?
I looked inside the “release 214” directory and found several directories such as “fastani”, “markers”, “mash”, etc.
Should I replace one of these directories with my “Genome directory”? Or is there an argument for this at the alignment step?

Thank you as always.
Hamaguchi

Hi Masha,

I think I might have a solution.
I create my own “Genome directory” containing the genomes of the type strains of interest and add the FASTA files of my isolates to it. Then I run the identify step.
Next, I create a multiple sequence alignment with the argument “--skip_gtdb_refs”.
I hope this workflow fits my need to create a phylogenetic tree containing only strains of validly published species.
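
To make the steps above concrete, here is a rough Python sketch that only assembles the three GTDB-Tk commands (identify, align with --skip_gtdb_refs, infer) and prints them, without running anything. The directory names, the "fna" extension, and the CPU count are placeholders, and the MSA file name is an assumption for a bacterial data set that should be checked against the actual output of the align step.

```python
import shlex
from pathlib import Path


def build_commands(genome_dir, out_dir, extension="fna", cpus=4):
    """Return the identify, align and infer commands as argument lists."""
    out_dir = Path(out_dir)
    identify_dir = out_dir / "identify"
    align_dir = out_dir / "align"
    infer_dir = out_dir / "infer"
    # Assumed name and location of the user-only bacterial MSA;
    # verify this path in the align output before running infer.
    msa_file = align_dir / "align" / "gtdbtk.bac120.user_msa.fasta.gz"
    return [
        ["gtdbtk", "identify", "--genome_dir", str(genome_dir),
         "--out_dir", str(identify_dir), "--extension", extension,
         "--cpus", str(cpus)],
        # --skip_gtdb_refs keeps the GTDB reference genomes out of the MSA.
        ["gtdbtk", "align", "--identify_dir", str(identify_dir),
         "--out_dir", str(align_dir), "--skip_gtdb_refs",
         "--cpus", str(cpus)],
        ["gtdbtk", "infer", "--msa_file", str(msa_file),
         "--out_dir", str(infer_dir), "--cpus", str(cpus)],
    ]


# Print the commands for inspection; nothing is executed here.
for cmd in build_commands("type_strain_genomes", "gtdbtk_out"):
    print(shlex.join(cmd))
```

Once the printed commands look right, each list could be passed to subprocess.run(cmd, check=True) to execute the steps in order.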

Could you give me some advice?

Thank you!
Hamaguchi

Hi Hamaguchi,

Thank you and sorry for my slow response!
Do you still need any advice from me?

Please let me know!

Best wishes,
Masha