Differences

This shows you the differences between two versions of the page.

--- bioinformatics_tools2 [2021/10/08 16:53] – 134.190.232.9
+++ bioinformatics_tools2 [2022/02/28 11:53] (current) – 134.190.232.106
@@ Line 1: / Line 1: @@
 Jobs are usually submitted to Perun with: qsub -q 768G-batch script.sh or qsub -q 256-batch script.sh. But what if you have thousands of scripts waiting to run? Submitting each one by hand is impractical. This page introduces two approaches for submitting batch scripts/tasks on Perun; use whichever you prefer.
-**Approach One: submit array jobs**
+**Approach One: submit a shell script that loops over a list**
+<code>
+#!/bin/bash
+# script: shell.sh (the #! line must be the first line of the file)
+#$ -S /bin/bash
+. /etc/profile
+#$ -cwd
+#$ -o logfile
+#$ -pe threaded 20
+#export PATH=/scratch2/software/anaconda/bin:$PATH
+# $1 is a text file with one sequence ID per line
+while read -r line
+do
+# 1) align each FASTA file with MAFFT
+mafft --auto --thread 20 "/misc/scratch2/####/$line.fasta" > "/misc/scratch2/####/aligned/$line.aligned.fasta"
+# 2) trim the alignment with BMGE (written to the directory that FastTree reads from)
+/scratch2/software/anaconda/envs/bmge/bin/bmge -i "/misc/scratch2/####/aligned/$line.aligned.fasta" -t AA -m BLOSUM30 -of "/misc/scratch2/####/trimmed/$line.aligned.trimmed.fasta"
+# 3) build a tree from the trimmed alignment with FastTree
+FastTree "/misc/scratch2/####/trimmed/$line.aligned.trimmed.fasta" > "/misc/scratch2/####/fasttree/$line.aligned.trimmed.newick"
+done < "$1"
+</code>
+This script requires a list of sequence names that contains only the sequence IDs.
+Note: when building file names, use $line.ko.txt rather than $line_ko.txt. The shell reads $line_ko as a variable named line_ko, so the latter is not recognized; avoid "_" directly after the variable name, or write ${line}_ko.txt.
+Run the script like this:
+<code>
+# A plain name list for your FASTA file (name_list.txt), e.g.
+Gene1
+Gene2
+Gene3
+# This can be easily generated with:
+grep '>' ###.fasta | sed 's/>//g' > name_list.txt
+# If your FASTA headers include gene descriptions (e.g. sequences retrieved directly from NCBI):
+# >gen1 hypothetical protein balabalala
+# TAGTTAGTCGATCGTACGTA
+# keep only the ID on each header line with:
+awk '{print $1}' seq.fasta > clean_name_id.fasta
+# Then run the shell script.
+chmod +x shell.sh
+./shell.sh name_list.txt
+</code>
+Note: the list file (name_list.txt) must end with a line break; otherwise the last line will not be processed.
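The two pitfalls noted for Approach One (an underscore directly after a variable name, and a list file with no final line break) can both be handled inside the loop itself. The following is a minimal sketch, not part of the original script: the temporary list file and the `_ko.txt` suffix are used only for illustration.

```shell
#!/bin/bash
# Hypothetical demo: the list file below is a throwaway, not a file from this page.
tmp_list=$(mktemp)
# printf without a final \n leaves the last line unterminated on purpose
printf 'Gene1\nGene2\nGene3' > "$tmp_list"

names=()
# `|| [ -n "$line" ]` keeps the final line even when it lacks a line break
while IFS= read -r line || [ -n "$line" ]
do
    # ${line}_ko.txt: the braces end the variable name before the underscore
    names+=("${line}_ko.txt")
done < "$tmp_list"
rm -f "$tmp_list"

# Print one generated file name per line
printf '%s\n' "${names[@]}"
```

With this loop condition, all three IDs are read even though the list has no trailing line break, and each file name keeps its underscore.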
+**Approach Two: submit array jobs**
 Below is a real case: BLASTing thousands of genes against the NCBI nr database. Running the whole gene set against nr in a single job could take weeks.
@@ Line 43: / Line 96: @@
 If you are familiar with ${SGE_TASK_ID}, you will know that the real difficulty is preparing FASTA files named by number, e.g. 1.fa, 2.fa, 3.fa, 4.fa. I have collected some small but efficient scripts to do that.
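To make the goal concrete before the methods themselves, here is a minimal sketch of how one multi-FASTA file could be turned into numbered files that an array task then picks up by its task ID. It uses awk and is not one of the methods on this page; the file name all.fasta, the temporary directory, and the hard-coded SGE_TASK_ID are assumptions for illustration (inside a real array job, the scheduler sets SGE_TASK_ID).

```shell
#!/bin/bash
# Hypothetical demo in a temporary directory; the file names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny multi-FASTA file with three records
printf '>Gene1\nACGT\n>Gene2\nTTGA\nCCAT\n>Gene3\nGGCA\n' > all.fasta

# Start a new output file (1.fa, 2.fa, ...) at every header line
awk '/^>/ { if (out) close(out); n++; out = n ".fa" } { print > out }' all.fasta

# Stand-in for the value the scheduler sets inside an array task
SGE_TASK_ID=2
query="${SGE_TASK_ID}.fa"

# Show the record that task 2 would use as its query
cat "$query"
```

Each record lands in its own file, so task N only needs to read N.fa.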
-  - Method one: using 'csplit' function
+    * Method one: using 'csplit' function
   <code>
@@ Line 51: / Line 104: @@
   # So in this case: -query /misc/scratch2/####/${SGE_TASK_ID}.fa
   #  will be renamed to
-  # -query /misc/scratch2/####/**0**{SGE_TASK_ID}
+  # -query /misc/scratch2/####/0${SGE_TASK_ID}
   # Technically, you can change 0 to whatever you want; it is just a file name prefix.
   </code>
@@ Line 88: / Line 141: @@
 </code>
-   - Method two: Run shell script split.sh
+    * Method two: Run shell script split.sh
 <code>