multi-gene_phylogeny_pipeline

Differences

This shows you the differences between two versions of the page.

multi-gene_phylogeny_pipeline [2018/02/05 12:04] 129.173.88.82
multi-gene_phylogeny_pipeline [2018/03/10 11:07] (current) 173.212.69.201
Line 4:
 Documentation by Kate Glennon, Sarah Shah, Shelby Williams, and Tommy Harding.
    
-The **Bordor** dataset is a set of 351 housekeeping genes that are well-conserved across all eukaryotes. This pipeline uses the gene sequences from //Arabidopsis thaliana// as queries to fish for homologues during the BLAST step.
+The **Bordor** dataset is a set of 351 housekeeping genes that are well-conserved across all eukaryotes. This pipeline uses the gene sequences from //Arabidopsis thaliana// as queries to fish for homologues during the BLAST step. Credit goes to Matt Brown & co.: [[https://doi.org/10.1093/gbe/evy014]], [[https://doi.org/10.1093/molbev/msx162]]
  
 All the original transcriptome/proteome files that are in the Bordor alignment are in **/scratch2/mbrown/PhylogenomicDatabases**
Line 131:
 </code>
 This will sequentially add the appropriate sequences for all the organisms of interest to the Bordor dataset. Trimming will not occur until the last taxon is added.

+NOTE2: If you have alignment files from someone else and want to add your own transcriptomes to them, move those alignment files into the "old_aln" folder inside your START folder.
 Step 4: If everything went as expected, there will be a folder named “bmge_trimmed_old” in the “END*” folder. Download several *.faa (aligned, untrimmed sequences) and *.bmge.fas (trimmed, aligned sequences) files to your computer and examine them with a sequence viewer such as AliView. The last line(s) contain the sequence from your transcriptome/protein data that was aligned to the other sequences of that particular gene. Make sure they look aligned; for instance, if all other sequences have a “GGG” in a specific location, then you should expect your sequence to have the same. The .bmge.fas files are .faa files from which the badly aligned positions were trimmed away.
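Before opening files in a viewer, a quick scripted check can flag obviously broken alignments. The sketch below is not part of the pipeline; it assumes the *.faa and *.bmge.fas files are plain FASTA and that your own sequence is the last record, as described in Step 4. The function names (read_fasta, check_alignment) are hypothetical helpers introduced here for illustration. It reports whether all sequences in a file have the same aligned length and how gappy the last record is.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples, in file order."""
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_alignment(path):
    """Summarise one alignment file (hypothetical helper, not a pipeline script).

    Assumes the newly added taxon is the last record in the file.
    """
    records = read_fasta(path)
    if not records:
        raise ValueError(f"no sequences found in {path}")
    # In a valid alignment every row has the same number of columns.
    lengths = {len(seq) for _, seq in records}
    last_header, last_seq = records[-1]
    gap_fraction = last_seq.count("-") / len(last_seq) if last_seq else 1.0
    return {
        "n_sequences": len(records),
        "aligned": len(lengths) == 1,
        "last_header": last_header,
        "last_gap_fraction": gap_fraction,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_alignment.py gene1.faa gene1.bmge.fas ...
    for name in sys.argv[1:]:
        print(name, check_alignment(name))
```

A file where "aligned" is False, or where the last record is almost entirely gaps, is worth inspecting first in AliView. This check does not replace the visual inspection described above; it only helps decide which files to look at.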
    