multi-gene_phylogeny_pipeline
Differences
This shows you the differences between two versions of the page.
| multi-gene_phylogeny_pipeline [2017/12/06 13:56] – 129.173.88.84 | multi-gene_phylogeny_pipeline [2018/03/10 11:07] (current) – 173.212.69.201 | ||
|---|---|---|---|
| Line 4: | Line 4: | ||
| Documentation by Kate Glennon, Sarah Shah, Shelby Williams, and Tommy Harding. | Documentation by Kate Glennon, Sarah Shah, Shelby Williams, and Tommy Harding. | ||
| - | The **Bordor** dataset is a set of 351 housekeeping genes that are well-conserved across all eukaryotes. This pipeline uses the gene sequences from // | + | The **Bordor** dataset is a set of 351 housekeeping genes that are well-conserved across all eukaryotes. This pipeline uses the gene sequences from // |
| + | |||
| + | All the original transcriptom/ | ||
| **The Pipeline Overview** | **The Pipeline Overview** | ||
| Line 103: | Line 105: | ||
| #$ -pe threaded 8 | #$ -pe threaded 8 | ||
| - | python AddPipeline3.0a.py <short name> <**complete species name with only one “_” in between the genus and species: Genus_species**> | + | python AddPipeline3.0a.py <short name> < |
| </ | </ | ||
| - | The number “1” refers to the standard genetic code. Use “NUC” if your FASTA file contains nucleotide sequences, or “AA” if it contains protein sequences. Set the last flag on the line to “yes” if you want your alignments to be trimmed by BMGE. Edit the date attached to “END*” to match today’s date. Make sure the AddPipeline3.X.py script in your “START*” folder matches the one named in this shell script. As of now, AddPipeline3.0a.py is the latest version. Then qsub Bordor.sh | + | The number “1” refers to the standard genetic code. Use “NUC” if your FASTA file contains nucleotide sequences, or “AA” if it contains protein sequences. Set the last flag on the line to “yes” if you want your alignments to be trimmed by BMGE. Edit the date attached to “END*” to match today’s date. Make sure the AddPipeline3.X.py script in your “START*” folder matches the one named in this shell script. As of now, AddPipeline3.0a.py is the latest version. **Make sure your "short name" and "long name" are correct, i.e. the former must be 8 characters, and the latter must have one " |
| NOTE: If you need to add sequences from several taxa: in step 1, instead of renaming the “END*” folder “START*”, | NOTE: If you need to add sequences from several taxa: in step 1, instead of renaming the “END*” folder “START*”, | ||
| Line 129: | Line 131: | ||
| </ | </ | ||
| This will sequentially add the appropriate sequences for all the organisms of interest to the Bordor dataset. Trimming will not occur until the last taxon is added. | This will sequentially add the appropriate sequences for all the organisms of interest to the Bordor dataset. Trimming will not occur until the last taxon is added. | ||
| - | + | ||
| + | NOTE2: If you have alignment files from someone else, and you want to add your own transcriptomes to them, move the alignment files in the folder " | ||
| Step 4: If everything went as expected, there will be a folder named “bmge_trimmed_old” in the “END*” folder. Download several *.faa (aligned, untrimmed sequences) and *.bmge.fas (aligned, trimmed sequences) files to your computer and examine them with a sequence viewer such as AliView. The last line(s) contain the sequence from your transcriptome/ | Step 4: If everything went as expected, there will be a folder named “bmge_trimmed_old” in the “END*” folder. Download several *.faa (aligned, untrimmed sequences) and *.bmge.fas (aligned, trimmed sequences) files to your computer and examine them with a sequence viewer such as AliView. The last line(s) contain the sequence from your transcriptome/ | ||
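
The naming rules mentioned in the revision above (an 8-character "short name", and a "long name" of the form Genus_species with only one "_") can be checked before the job is submitted. The following is a minimal sketch and is not part of AddPipeline3.0a.py: the function name `check_names` is hypothetical, and it assumes that "8 characters" means exactly 8 and that both the genus and species parts must be non-empty.

```python
def check_names(short_name, long_name):
    """Return a list of problems; an empty list means both names look valid.

    Assumptions (not confirmed by the pipeline documentation):
    - the short name must be exactly 8 characters long
    - the long name must contain exactly one "_" with text on both sides
    """
    problems = []

    # Rule 1: the short name is expected to be 8 characters.
    if len(short_name) != 8:
        problems.append(
            "short name {!r} has {} characters, expected 8".format(
                short_name, len(short_name)
            )
        )

    # Rule 2: the long name is expected to look like Genus_species.
    if long_name.count("_") != 1:
        problems.append(
            "long name {!r} must contain exactly one '_'".format(long_name)
        )
    else:
        genus, species = long_name.split("_")
        if not genus or not species:
            problems.append(
                "long name {!r} needs text on both sides of '_'".format(long_name)
            )

    return problems
```

For example, `check_names("Genuspec", "Genus_species")` returns an empty list, while a long name such as `"Genus_species_strain"` is reported because it contains more than one underscore. The placeholder names here are illustrative only.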
multi-gene_phylogeny_pipeline.1512582986.txt.gz · Last modified: by 129.173.88.84
