unix_shell

Commonly used scripts collected from a previous GitHub page and course materials.

1. How to set a timer so that a script runs at a specific time?

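#!/bin/bash
# Twosday.sh: run a command when the clock reads 22:22:22

while true
do
    # Ask date for HH:MM:SS directly; cutting field 4 of the default output
    # breaks when the day of the month is padded with an extra space
    DATE=$(date +%H:%M:%S)
    #echo $DATE
    if [[ $DATE == "22:22:22" ]]
    then
            echo "this is a test program" >> xyz.log
            qrsh
            # Wait out the matching second so the block does not fire twice
            sleep 1s
    fi
    # Poll a few times per second instead of spinning the CPU
    sleep 0.2
done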
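
As an alternative to polling the clock, the wait can be computed once and handed to sleep. This is a minimal sketch, assuming GNU date (the -d option); seconds_until is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# seconds_until HH:MM:SS
# Print the number of seconds from now until the next occurrence of that
# clock time. Assumes GNU date, which understands "today" and "tomorrow".
seconds_until() {
    local now target
    now=$(date +%s)
    target=$(date -d "today $1" +%s)
    # If that time has already passed today, aim for tomorrow instead
    if (( target <= now )); then
        target=$(date -d "tomorrow $1" +%s)
    fi
    echo $(( target - now ))
}

# Possible usage (not executed here):
#   sleep "$(seconds_until 22:22:22)" && echo "this is a test program" >> xyz.log
```

The script sleeps once rather than waking up several times per second. For recurring jobs, cron or at may be a better fit.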

# stopwatch (live clock in the terminal)
# https://superuser.com/questions/611538/is-there-a-way-to-display-a-countdown-or-stopwatch-timer-in-a-terminal
while true; do echo -ne "$(date)\r"; done
# with nanoseconds:
while true; do echo -ne "$(date +%H:%M:%S:%N)\r"; done
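
The loops above show the current time. A stopwatch that counts elapsed time could be sketched as follows, using bash's built-in SECONDS counter. The function names format_elapsed and stopwatch are made up for this example.

```shell
#!/bin/bash
# format_elapsed N: print N seconds as HH:MM:SS
format_elapsed() {
    local s=$1
    printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# stopwatch: redraw the elapsed time once per second until Ctrl-C
stopwatch() {
    local start=$SECONDS
    while true; do
        printf '\r%s' "$(format_elapsed $(( SECONDS - start )))"
        sleep 1
    done
}

# Possible usage (not executed here, it runs until interrupted):
#   stopwatch
```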

2. Here's a list of commands to retrieve deleted text in vi:

u	Undoes the last change
U	Undoes changes to current line
p	"Puts" last delete after cursor
P	"Puts" last delete before cursor
"1p	"Puts" the most recent line-wise delete after cursor (numbered register 1)
"2p	"Puts" the next-to-last line-wise delete after cursor (numbered register 2)

Convert multi-line FASTA sequences into one line per sequence https://www.biostars.org/p/9262/

awk '/^>/ {printf("%s%s\n",(N>0?"\n":""),$0);N++;next; } { printf("%s",$0);}  END {printf("\n");}' < file.fa

Split a mega FASTA file into one file per sequence, using each sequence ID as the file name (the input must be linearised FASTA) https://www.biostars.org/p/455915/

#!/bin/bash
# split.sh: write each record of a linearised FASTA file to <sequence ID>.fasta

numseqs=$(grep -c "^>" "$1")
numlines=$(wc -l < "$1")
if (( numlines > 2 * numseqs )); then
    echo "The fasta file needs to be linearised before this script will work." >&2
    # exit, not return: this file is run as a script, not called as a function
    exit 1
fi

while IFS= read -r line; do
    if [ "${line:0:1}" == ">" ]; then
        header="$line"
        # File name = header without ">" and without anything after the first space
        filename=$(echo "${line#>}" | sed 's/ .*//')
        echo "$header" >> "${filename}.fasta"
    else
        echo "$line" >> "${filename}.fasta"
    fi
done < "$1"

#usage: ./split.sh test.fasta
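
The same split can also be done in a single awk pass, which avoids starting echo and sed for every line and does not require linearised input. This is an alternative sketch, not the method from the linked thread. The function name split_fasta and the optional output directory argument are assumptions made for the example, and IDs containing "/" are not handled.

```shell
#!/bin/bash
# split_fasta FASTA [OUTDIR]
# Write every record of FASTA to OUTDIR/<sequence ID>.fasta, where the ID is
# the first word of the header without the leading ">".
split_fasta() {
    local fasta=$1 outdir=${2:-.}
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            # New record: close the previous file and switch to the new one
            if (out != "") close(out)
            out = dir "/" substr($1, 2) ".fasta"
        }
        # Append header and sequence lines to the current record file
        out != "" { print >> out }
    ' "$fasta"
}

# Possible usage (not executed here):
#   split_fasta test.fasta split_out
```

Appending with >> mirrors the original loop: records that share an ID end up in the same file.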

Speed up a run by chopping the input into small files and processing them one by one (applicable to any script)

#!/bin/bash
# interproscan.sh: run InterProScan once per gene ID listed in the input file

while IFS= read -r line
do
    /scratch2/software/interproscan-5.52-86.0/interproscan.sh -i "data/$line.fasta" -f tsv -dp -goterms -pa -o "results/$line.tsv"
done < "$1"

# usage: ./interproscan.sh gene_id.txt
# note: gene_id.txt must end with a line break, otherwise read skips the last ID
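
The loop above handles one ID at a time. If the machine has spare cores, several IDs could be processed at once with xargs -P (available in GNU and BSD xargs, not in plain POSIX). This is a hedged sketch: run_for_each_id is a hypothetical helper, and whether several InterProScan jobs can safely share one machine is not covered by the original notes.

```shell
#!/bin/bash
# run_for_each_id IDS_FILE JOBS COMMAND [ARGS...]
# Run COMMAND once per non-empty line of IDS_FILE, with up to JOBS commands
# running at the same time. Every "{}" in ARGS is replaced by the ID.
run_for_each_id() {
    local ids=$1 jobs=$2
    shift 2
    # grep . drops blank lines and also copes with a missing final line break
    grep . "$ids" | xargs -P "$jobs" -I{} "$@"
}

# Possible usage (not executed here):
#   run_for_each_id gene_id.txt 4 \
#       /scratch2/software/interproscan-5.52-86.0/interproscan.sh \
#       -i data/{}.fasta -f tsv -dp -goterms -pa -o results/{}.tsv
```

Output from parallel jobs arrives in no particular order, so each job should write to its own file, as the -o results/{}.tsv argument does.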

Remove duplicate sequences from a FASTA file based on ID

https://www.biostars.org/p/321641/

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' 1.fa|sort -t $'\t' -k1,1 -u |tr "\t" "\n"

https://www.biostars.org/p/143617/

awk '/^>/{f=!d[$1];d[$1]=1}f' 1.fa
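
To see what the short awk filter does, it can be wrapped in a function and fed a tiny made-up input. The name dedup_fasta and the sample records are assumptions for illustration only.

```shell
#!/bin/bash
# dedup_fasta: read FASTA on stdin and keep only the first record of each ID.
# On a header line, "keep" becomes true only if the ID has not been seen yet;
# the bare pattern "keep" then prints the header and its sequence lines.
dedup_fasta() {
    awk '/^>/ { keep = !seen[$1]; seen[$1] = 1 } keep'
}

# Made-up input in which the ID "a" appears twice
printf '>a\nAC\n>b\nGG\n>a\nTT\n' | dedup_fasta
# prints the records >a/AC and >b/GG; the second >a record is dropped
```

The ID is the first whitespace-separated word of the header, so two headers with the same ID but different descriptions count as duplicates.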

https://github.com/brentp/pyfasta
# To install python2 version
pip install pyfasta
# To install python3 version
pip3 install pyfasta

#split a fasta file into 6 new files of relatively even size:
pyfasta split -n 6 original.fasta

Rename multiple files with a loop https://linuxgazette.net/18/bash.html https://unix.stackexchange.com/questions/580506/renaming-multiple-files-using-a-loop

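# change the .fasta extension to .fa
for i in *.fasta; do mv -- "$i" "${i/%.fasta/.fa}"; done
# strip everything up to and including the first underscore
for f in *_*; do mv -- "$f" "${f#*_}"; done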

for i in chr*
do
  mv -- "$i" "${i/%.fasta/.fa}"
done
or

for i in chr*
do
  NEWNAME="${i/%.fasta/.fa}"
  mv -- "$i" "$NEWNAME"
done
The "${var/%pat/replacement}" expansion looks for pat only at the end of the variable's value and replaces it with replacement.
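
The expansions can be tried on plain strings before any file is moved. A small sketch, assuming bash; swap_ext and strip_prefix are hypothetical helper names.

```shell
#!/bin/bash
# swap_ext NAME OLD NEW: replace OLD with NEW only if NAME ends in OLD.
# OLD is treated as a glob pattern by the expansion.
swap_ext() {
    local name=$1 old=$2 new=$3
    printf '%s\n' "${name/%$old/$new}"
}

# strip_prefix NAME: drop everything up to and including the first underscore
strip_prefix() {
    printf '%s\n' "${1#*_}"
}

swap_ext chr1.fasta .fasta .fa      # prints chr1.fa
swap_ext fasta.notes .fasta .fa     # prints fasta.notes (no match at the end)
strip_prefix run1_chr2.fa           # prints chr2.fa
```

Prefixing mv with echo inside the loop gives a dry run that shows the new names without renaming anything.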

Usage:
    rename [ -h|-m|-V ] [ -v ] [ -n ] [ -f ] [ -e|-E perlexpr]*|perlexpr
    [ files ]

How to use sort in Linux, especially to sort by a specific column while keeping the header line on top?

(head -n 1 all.txt && tail -n +2 all.txt| sort -n -r -k2) > 123.txt

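# sort -n: compare values numerically
# sort -r: reverse the order (largest first)
# sort -k2: use the text starting at column 2 as the sort key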
https://shapeshed.com/unix-sort/; https://stackoverflow.com/questions/14562423/is-there-a-way-to-ignore-header-lines-in-a-unix-sort
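
The header-preserving sort can be wrapped in a small function so that the file and column are arguments. A minimal sketch; sort_keep_header is a hypothetical name, and -k is given as "col,col" so that only that one column is used as the key.

```shell
#!/bin/bash
# sort_keep_header FILE COL
# Print the first line of FILE unchanged, then the remaining lines sorted
# numerically in descending order by column COL.
sort_keep_header() {
    local file=$1 col=$2
    head -n 1 "$file"
    tail -n +2 "$file" | sort -n -r -k "${col},${col}"
}

# Possible usage (not executed here):
#   sort_keep_header all.txt 2 > 123.txt
```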