phylogeny_protocol
Differences
This shows you the differences between two versions of the page.
| phylogeny_protocol [2021/09/03 15:06] – 134.190.232.139 | phylogeny_protocol [2021/09/29 12:53] (current) – 134.190.232.139 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | **Indexing sequences** | + | **1. Indexing sequences** |
| What if you have a dataset of gene names/IDs of interest, and you want to figure out their homologous and paralogous genes? The easiest way is to BLAST against NCBI, but wait! Where do you find the protein or nucleotide sequences of your genes of interest? Sure, an NCBI name search could work, but what if your gene IDs are not from NCBI and you have thousands of genes of interest, e.g., At2G01130, AT4G00010, etc., which come from the TAIR10 database (A. thaliana)? You are going to need this resource guide. Note: //not all excellent resources are listed.// | What if you have a dataset of gene names/IDs of interest, and you want to figure out their homologous and paralogous genes? The easiest way is to BLAST against NCBI, but wait! Where do you find the protein or nucleotide sequences of your genes of interest? Sure, an NCBI name search could work, but what if your gene IDs are not from NCBI and you have thousands of genes of interest, e.g., At2G01130, AT4G00010, etc., which come from the TAIR10 database (A. thaliana)? You are going to need this resource guide. Note: //not all excellent resources are listed.// | ||
| Line 20: | Line 20: | ||
| As mentioned at the very beginning, if your gene name or ID of interest has nothing to do with NCBI and you need the FASTA sequence, there is a simple way to get it via a custom script | As mentioned at the very beginning, if your gene name or ID of interest has nothing to do with NCBI and you need the FASTA sequence, there is a simple way to get it via a custom script | ||
| - | __index_header_to_seq.py__ (https:// | + | __index_header_to_seq.py__ (https:// |
| < | < | ||
| Line 33: | Line 33: | ||
| - | Now, feel free to explore your interested genes via the BLAST+ and v5 database user guide (please refer http:// | + | Now, feel free to explore your interested genes via the BLAST+ and v5 database user guide (please refer to http:// |
| - | **Creating | + | **2. Creating |
| Software resource: | Software resource: | ||
| - | Clustal Omega 1.2.3 | + | - Clustal Omega 1.2.3 |
| - | trimAl v1.2 | + | |
| + | - FastTree 2.1 | ||
| + | Clustal Omega 1.2.3 (http:// | ||
| - | Clustal Omega 1.2.3 (http://www.clustal.org/omega/) | + | < |
| - | trimAl v1.2 (http:// | + | # For Ubuntu systems, simply run this to install |
| + | sudo apt install clustalo | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | |||
| + | ./ | ||
| + | </ | ||
| + | |||
| + | Note: //For protein alignments we recommend Clustal Omega. For DNA alignments we recommend trying MUSCLE or MAFFT.// https://www.ebi.ac.uk/Tools/msa/ | ||
| + | |||
| + | trimAl v1.2 (http:// | ||
| (http:// | (http:// | ||
| - | building | + | A very common way of using trimAl v1.2 to trim an alignment is to use just a gap threshold |
| + | (the minimum fraction of sequences without a gap that you require to consider a column of “enough quality”). Note: | ||
| + | < | ||
| + | trimal -in example1 -out output1 -htmlout output1.html -gt 1 | ||
| + | </ | ||
| + | |||
| + | Sometimes one does not know which alignment algorithm will perform best (or which parameters, e.g., gap penalties). One way out is to produce alignments with several different algorithms and then choose the alignment that contains the most consistent residue-pairings, | ||
| + | < | ||
| + | trimal -compareset fileset1 -out output4 | ||
| + | trimal -compareset fileset1 -out output5 -htmlout output5.html -ct 0.5 | ||
| + | </ | ||
| + | |||
| + | FastTree infers approximately-maximum-likelihood phylogenetic | ||
| + | |||
| + | < | ||
| + | FastTree < alignment_file > tree_file | ||
| + | </ | ||
| + | |||
| + | **3. dN/dS analysis** | ||
| + | |||
| + | Software requirements: | ||
| + | - PAML package | ||
| + | - pal2nal | ||
| + | - Clustal Omega | ||
| + | - FastTree | ||
| + | |||
| + | The calculation of synonymous (dS) and non-synonymous (dN) substitution rates is important to infer the evolutionary driving force: positive selection (dN/ | ||
| + | |||
| + | |||
| + | PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. http:// | ||
| + | |||
| + | PAL2NAL is a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. http:// | ||
| + | |||
| + | This is an example of a batch script for preparing dN/dS input across thousands of genes. | ||
| + | |||
| + | < | ||
| + | # | ||
| + | for i in *.txt | ||
| + | do | ||
| + | perl pal2nal.pl amino_acid.fa nucleotide.fa -output paml -nogap > folder/$i | ||
| + | done | ||
| + | </ | ||
| - | Calculating dN/dS | + | Shell script: codeml and configuration file: codeml.ctl |
| - | Note: Please refer to the guide for the most updated information. | + | Note: Please refer to the latest version of software |
| - | {{: | ||
| <Last updated by Xi Zhang on Sep 3rd, | <Last updated by Xi Zhang on Sep 3rd, | ||
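For step 1 (Indexing sequences), the ID-to-FASTA lookup can be sketched in plain shell and awk. This is not the wiki's index_header_to_seq.py, whose internals are not shown on this page. The helper name `extract_by_id` and the file names are hypothetical. It assumes the first word of each FASTA header is exactly the gene ID, compared case-insensitively so that At2G01130 and AT2G01130 match.

```shell
# Sketch only: print the FASTA records whose header ID appears in an ID list.
# extract_by_id is a hypothetical helper, not the wiki's index_header_to_seq.py.
# Assumptions: one ID per line in the ID file (which must not be empty), and
# the first word after ">" in each header is the ID.
extract_by_id() {
    local ids_file="$1"
    local fasta_file="$2"
    awk '
        # First file (ID list): remember every requested ID, upper-cased.
        NR == FNR { if ($1 != "") wanted[toupper($1)] = 1; next }
        # Header line: take the first word without ">" and decide whether to keep.
        /^>/ { id = substr($1, 2); keep = (toupper(id) in wanted) }
        # Print the header and all sequence lines of a kept record.
        keep { print }
    ' "$ids_file" "$fasta_file"
}

# Example call (hypothetical file names):
#   extract_by_id my_gene_ids.txt TAIR10_proteins.fa > my_genes.fa
```

Headers that carry an isoform suffix (for example an ID followed by `.1`) would not match a bare gene ID with this sketch; the ID list would need to use the same form as the headers.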
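For step 2, the three tools can be chained: Clustal Omega aligns, trimAl trims by gap threshold, FastTree builds the tree. The sketch below is one possible wrapper, not the author's workflow. The function name `build_tree`, the output naming scheme, and the `DRY_RUN` switch are all hypothetical. The default gap threshold of 1 mirrors the `-gt 1` example in the trimAl section, and the FastTree call assumes a protein alignment.

```shell
# Sketch: protein FASTA -> Clustal Omega -> trimAl -> FastTree.
# build_tree, the output file names, and DRY_RUN are hypothetical.
build_tree() {
    local fasta="$1"
    local prefix="$2"
    local gap_threshold="${3:-1}"
    # With DRY_RUN=1 every command is printed instead of executed.
    local run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi

    # 1. Multiple sequence alignment, written as FASTA (overwrite if present).
    $run clustalo -i "$fasta" -o "$prefix.aln.fa" --outfmt=fasta --force || return 1
    # 2. Keep only columns whose gap-free fraction reaches the threshold.
    $run trimal -in "$prefix.aln.fa" -out "$prefix.trimmed.fa" -gt "$gap_threshold" || return 1
    # 3. Approximately-maximum-likelihood tree in Newick format.
    $run FastTree -out "$prefix.tree" "$prefix.trimmed.fa" || return 1
}

# Example call (hypothetical file names):
#   DRY_RUN=1 build_tree my_genes.fa run1       # show the commands only
#   build_tree my_genes.fa run1 0.8             # run with a looser threshold
```

Stopping at the first failing step keeps a broken alignment from silently producing a tree.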
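For step 3, a batch run across thousands of genes would normally use a different pair of input files on each iteration. The following is a hedged sketch of such a loop, not the script from this page. The function name `run_pal2nal_batch`, the directory layout (one protein alignment `GENE.aln` and one CDS file `GENE.fa` per gene), and the `PAL2NAL` variable are all assumptions. It assumes pal2nal.pl is executable and on the PATH, and uses PAL2NAL's `-output paml` and `-nogap` options.

```shell
# Sketch: run PAL2NAL once per gene and write one PAML-format codon
# alignment per gene. Layout and names below are hypothetical:
#   aa_dir/GENE.aln   protein alignment
#   cds_dir/GENE.fa   matching nucleotide (CDS) sequences
run_pal2nal_batch() {
    local aa_dir="$1"
    local cds_dir="$2"
    local out_dir="$3"
    # PAL2NAL may point to another location of the script.
    local pal2nal="${PAL2NAL:-pal2nal.pl}"
    local aln gene
    mkdir -p "$out_dir"
    for aln in "$aa_dir"/*.aln; do
        [ -e "$aln" ] || continue            # no alignments found
        gene=$(basename "$aln" .aln)
        if [ ! -f "$cds_dir/$gene.fa" ]; then
            echo "skipping $gene: no CDS file" >&2
            continue
        fi
        "$pal2nal" "$aln" "$cds_dir/$gene.fa" -output paml -nogap \
            > "$out_dir/$gene.paml"
    done
}

# Example call (hypothetical directories):
#   run_pal2nal_batch aa_alignments cds_sequences codon_alignments
```

Each resulting `.paml` file can then be named as the sequence file in a codeml.ctl configuration for the codeml run.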
phylogeny_protocol.1630692398.txt.gz · Last modified: by 134.190.232.139
