bioinformatics_tools2
Differences between the previous revision (2021/10/08 16:45, 134.190.232.9) and the current revision (2022/02/28 11:53, 134.190.232.106). Only the changed parts are shown, as they read in the current revision.
**Approach One: submit a for-loop shell script**

<code>
#script: shell.sh

#
# SGE directives: run the job with bash, in the current working directory,
# write standard output to 'logfile' and request 20 slots in the 'threaded'
# parallel environment (matching mafft --thread 20)
#$ -S /bin/bash
. /
#$ -cwd
#$ -o logfile
#$ -pe threaded 20
#export PATH=/

# Loop over the list file passed as the first argument, one sequence ID per line
while read line
do

mafft --auto --thread 20 /

/

FastTree /

done < "$1"
</code>
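A dry-run version of the same loop can help check file names before anything is submitted. Instead of calling mafft and FastTree, it only prints the command each ID would run. The directories seqs/, aln/ and trees/ and the _aln.fasta suffix are hypothetical placeholders, because the real paths are not shown here.

The sketch also shows why some variable names need braces. $line.ko.txt works because "." cannot be part of a variable name. In $line_ko.txt, however, bash looks for a variable called line_ko, so write ${line}_ko.txt instead.

FastTree treats its input as protein by default. Add -nt for nucleotide alignments.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the per-gene loop: print the commands instead of running them.
# ASSUMPTION: seqs/, aln/, trees/ and the file suffixes are hypothetical paths.
print_gene_commands() {
    local list_file=$1 line
    while IFS= read -r line; do
        [ -z "$line" ] && continue   # skip empty lines in the list
        # Braces are required before "_": "$line_aln" would expand the
        # (unset) variable line_aln instead of line.
        echo "mafft --auto --thread 20 seqs/${line}.fasta > aln/${line}_aln.fasta"
        echo "FastTree aln/${line}_aln.fasta > trees/${line}.tree"
    done < "$list_file"
}

# Example: print_gene_commands name_list.txt
```

Once the printed commands look right, the real loop can use the same file-name pattern.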

This script needs a list of sequence names, one per line, containing only the IDs (matching is case-sensitive). Run the script like this:

Note: $line.ko.txt vs $line_ko.txt,

<code>
# Plain list of the sequence names in your FASTA file (name_list.txt), one ID per line, e.g.
Gene1
Gene2
Gene3

# This list can easily be generated with:
grep '>'

# If your FASTA headers include gene descriptions (e.g. sequences downloaded directly from NCBI):
>gen1 hypothetical protein balabalala
TAGTTAGTCGATCGTACGTA

# simply run: awk '

# Then run the shell script:
chmod +x shell.sh
./shell.sh name_list.txt
# (the #$ lines only take effect when the script is submitted to SGE: qsub shell.sh name_list.txt)
</code>
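The following awk sketch covers both steps. It assumes the ID is the first word of each header line, and it also tolerates a space after ">". It is one possible way to do this, not necessarily the command used originally, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the ID is the first word of each FASTA header line.

# Print one ID per line (header without ">" and without the description).
make_name_list() {
    awk '/^>/ { sub(/^>[ \t]*/, ""); print $1 }' "$1"
}

# Print a copy of the FASTA whose headers keep only the ID,
# e.g. ">gen1 hypothetical protein balabalala" becomes ">gen1".
strip_descriptions() {
    awk '/^>/ { sub(/^>[ \t]*/, ""); print ">" $1; next } { print }' "$1"
}

# Example:
# make_name_list genes.fasta > name_list.txt
# strip_descriptions genes.fasta > genes.ids_only.fasta
```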

Note: the list file must end with a line break after the last ID; otherwise the while read loop stops before the last line and that ID is not processed.
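The line break matters because read returns a non-zero status when it reaches end-of-file before a newline. A plain while read loop therefore drops an unterminated last line, even though $line was filled.

A common bash idiom that also accepts such a line is to add || [ -n "$line" ] to the loop condition. The hypothetical function below counts the IDs that such a loop sees.

```shell
#!/usr/bin/env bash
# Count the IDs in a list file, including a last line without a trailing newline.
count_ids() {
    local n=0 line
    # read fails on an unterminated last line but still sets $line,
    # so the extra test lets that line through once.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# Example: count_ids name_list.txt
```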

**Approach Two: submit array jobs**
Below is a real case: BLASTing thousands of genes against the NCBI nr database. Running the whole gene set against nr as a single BLAST job could take weeks.
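One common way to shorten the wall-clock time is to split the query FASTA into chunks and BLAST each chunk as a separate job. Below is a minimal sketch of the splitting step. It assumes chunks named chunk_1.fasta, chunk_2.fasta, and so on. This naming scheme is hypothetical, chosen so that chunk k maps to array task k. The script starts a new output file every N sequences and never splits a record across files.

```shell
#!/usr/bin/env bash
# Split a FASTA file into chunks of at most N sequences:
# <prefix>1.fasta, <prefix>2.fasta, ...
# ASSUMPTION: the chunk naming scheme is hypothetical.
split_fasta() {
    local fasta=$1 per_chunk=$2 prefix=$3
    awk -v n="$per_chunk" -v p="$prefix" '
        /^>/ {
            if (count % n == 0) {           # every n-th header opens a new chunk
                if (out) close(out)
                out = p (count / n + 1) ".fasta"
            }
            count++
        }
        out { print > out }                 # lines before the first header are ignored
    ' "$fasta"
}

# Example: split_fasta genes.fasta 500 chunk_
```

The number of chunks written (e.g. ls chunk_*.fasta | wc -l) is then the number of array tasks to request.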
If you are familiar with ${SGE_TASK_ID},
  * Method one: using '
<code>
# So in this case: -query /
# will be renamed to
# -query /
# Technically,
</code>
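The following sketch shows an SGE array job built on ${SGE_TASK_ID}. SGE runs one task per index given to -t and sets SGE_TASK_ID for each task, so every task can pick its own query chunk. Several details are assumptions for illustration:

  * the chunk names
  * the nr database name and -outfmt 6
  * the task range 1-100
  * the BLAST override variable (set BLAST=echo to print the commands instead of running them)

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -t 1-100
# ASSUMPTIONS: 100 chunks named chunk_<task>.fasta, database "nr",
# tabular output (-outfmt 6); BLAST is a hypothetical override for dry runs.

run_task() {
    local task_id=${1:?task id required}
    local query="chunk_${task_id}.fasta"
    local out="chunk_${task_id}.blastp.tsv"
    ${BLAST:-blastp} -query "$query" -db nr -outfmt 6 -out "$out"
}

# SGE sets SGE_TASK_ID for each array task ("undefined" for non-array jobs);
# outside SGE nothing runs.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    run_task "$SGE_TASK_ID"
fi
```

Submit it once with qsub (e.g. qsub blast_array.sh, a hypothetical file name). SGE then starts one task per chunk, so the query is split without one script per file.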
# Run index_header_to_seq.py on every file whose name starts with 0, writing the result to <file>.fa
for f in 0*
do
python3 index_header_to_seq.py ####.fasta "$f" "$f.fa"
done
</code>
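index_header_to_seq.py itself is not shown here. Judging by its name, it maps lists of headers back to sequences. The awk sketch below is a stand-in for that kind of step, not the author's script. It assumes the first file lists one ID per line and that an ID matches the first word of a FASTA header without the ">". The NR == FNR trick also assumes the ID file is not empty.

```shell
#!/usr/bin/env bash
# Print the FASTA records of $2 whose ID (first header word, without ">")
# appears in $1 (one ID per line). Hypothetical helper, not index_header_to_seq.py.
extract_by_ids() {
    awk 'NR == FNR { ids[$1]; next }                       # first file: collect IDs
         /^>/ { id = substr($1, 2); keep = (id in ids) }   # header: decide on record
         keep' "$1" "$2"
}

# Example (genes.fasta is a hypothetical file name):
# for f in 0*; do extract_by_ids "$f" genes.fasta > "$f.fa"; done
```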
  * Method two: run the shell script split.sh
<code>
<Last updated by Xi Zhang on Oct 8th,
bioinformatics_tools2.1633722324.txt.gz · Last modified: by 134.190.232.9
