cgeb2001's DokuWiki!


upcoming

Here's a list of commands to retrieve deleted text in vi:

u	Undoes the last change
U	Undoes changes to current line
p	"Puts" last delete after cursor
P	"Puts" last delete before cursor
"1p	"Puts" most recent line-or-larger delete after cursor (numbered register 1)
"2p	"Puts" the delete before that one after cursor (numbered register 2)

Put each multi-line FASTA sequence on a single line (linearise the FASTA) https://www.biostars.org/p/9262/

awk '/^>/ { if (n++) printf("\n"); print; next } { printf("%s", $0) } END { printf("\n") }' < file.fa
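
A small worked example may help show what linearising does. This is only a sketch: the file names demo.fa and demo.linear.fa and the sequences are made up for illustration. It uses a form of the one-liner that prints the separating newline only from the second header onwards, so the output has no leading blank line.

```shell
# Build a tiny two-record FASTA file with wrapped sequence lines (made-up data).
printf '>seq1 first\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > demo.fa

# Header lines are printed as-is; sequence lines are printed without a newline,
# so the wrapped pieces of each record end up joined on one line.
awk '/^>/ { if (n++) printf("\n"); print; next } { printf("%s", $0) } END { printf("\n") }' demo.fa > demo.linear.fa

# Show the result: one header line followed by one sequence line per record.
cat demo.linear.fa
```

After this, demo.linear.fa has exactly two lines per record, which is the layout split.sh checks for.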

Split a multi-FASTA file into one file per sequence, using each sequence ID as the file name (the input must be linearised FASTA) https://www.biostars.org/p/455915/

#!/usr/bin/env bash
# split.sh

# Count header lines and total lines; a linearised file has two lines per record.
numseqs=$(grep -c "^>" "$1")
numlines=$(wc -l < "$1")
if (( numlines > 2 * numseqs )); then
    echo "The fasta file needs to be linearised before this script will work." >&2
    exit 1
fi

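# Each header line selects the output file; the sequence line after it goes to the same file.
# Lines are appended, so remove old *.fasta outputs before re-running.
while IFS= read -r line || [ -n "$line" ]; do
    if [ "${line:0:1}" == ">" ]; then
        filename=${line#>}          # drop the leading ">"
        filename=${filename%% *}    # keep the ID: everything before the first space
        echo "$line" >> "${filename}.fasta"
    else
        echo "$line" >> "${filename}.fasta"
    fi
done < "$1"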

#usage: ./split.sh test.fasta
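
As an alternative sketch (not taken from the linked thread), a single awk pass can do a similar split without requiring linearised input. The file name multi_demo.fa and its records are made up for illustration, and the ID is assumed to be the first whitespace-delimited field of the header with the ">" removed.

```shell
# Made-up sample: two records, the first with a wrapped sequence.
printf '>seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n' > multi_demo.fa

# On each header: close the previous output file, take the ID (first field minus ">"),
# and switch output to ID.fasta. Every line, header included, goes to the current file.
awk '/^>/ { if (out) close(out); out = substr($1, 2) ".fasta" } out { print > out }' multi_demo.fa

# Show one of the resulting per-sequence files.
cat seq1.fasta
```

Two things to keep in mind with this variant: wrapped sequence lines stay wrapped in the output files, and if two records share an ID the later one overwrites the earlier file.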

Speed up a run by splitting the input into per-sequence files first, then running the script on each file in a loop (works for any script)

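#!/usr/bin/env bash
# interproscan.sh

# Run InterProScan once per gene ID listed in the file given as $1 (one ID per line).
while IFS= read -r line || [ -n "$line" ]
do
    [ -z "$line" ] && continue    # skip blank lines
    /scratch2/software/interproscan-5.52-86.0/interproscan.sh -i "data/$line.fasta" -f tsv -dp -goterms -pa -o "results/$line.tsv"
done < "$1"

# usage: ./interproscan.sh gene_id.txt
# note: the "|| [ -n "$line" ]" test makes the loop also read a final line that has no trailing newline, so gene_id.txt does not need an extra line break at the end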
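
If the per-ID runs are independent of each other, one possible way to go further is to run several of them at once. The following is a minimal sketch using `xargs -P` (available in GNU and BSD xargs, not in strict POSIX). The `echo` is only a stand-in for the real command, and ids_demo.txt, results_demo and the gene names are made up for illustration. Whether running several InterProScan jobs side by side actually helps depends on the machine's cores and memory, so treat the job count as an assumption to tune.

```shell
# Made-up ID list and output directory for the demonstration.
mkdir -p results_demo
printf 'geneA\ngeneB\ngeneC\n' > ids_demo.txt

# -P 2 runs up to two jobs at a time; -I {} substitutes one ID per invocation.
# The ID is passed to sh as $1 instead of being pasted into the command string.
# Replace the echo with the real per-ID command.
xargs -P 2 -I {} sh -c 'echo "processed $1" > "results_demo/$1.txt"' _ {} < ids_demo.txt

# List what was produced.
ls results_demo
```

Each job writes to its own output file, so the jobs do not interfere with each other even though they finish in no particular order.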