unix_shell
upcoming
Here's a list of commands to retrieve deleted text in vi:
u    Undoes the last change
U    Undoes all changes to the current line
p    "Puts" the last delete after the cursor
P    "Puts" the last delete before the cursor
"1p  "Puts" the most recent line deletion (numbered register 1) after the cursor
"2p  "Puts" the second most recent line deletion (numbered register 2) after the cursor
Join multi-line FASTA sequences so that each sequence sits on a single line (linearise): https://www.biostars.org/p/9262/
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < file.fa
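As a small worked example of the command above, the sketch below builds a two-record FASTA file and linearises it. The file names `demo.fa` and `demo.linear.fa` and the sequences are made up for illustration. This variant also skips the blank first line that the one-liner above prints; that tweak is an assumption about what is wanted, not part of the linked Biostars answer.

```shell
# demo.fa is a hypothetical two-record FASTA file created only for this example
printf '>seq1 first\nACGT\nGGCC\n>seq2\nTTAA\nCC\n' > demo.fa

# Linearise: join the sequence lines of each record onto one line.
# "NR > 1" suppresses the newline before the first header, so the output
# does not start with a blank line.
awk '/^>/ { if (NR > 1) printf("\n"); printf("%s\n", $0); next }
     { printf("%s", $0) }
     END { printf("\n") }' demo.fa > demo.linear.fa

cat demo.linear.fa
```

Each record now takes exactly two lines (header, then sequence), which is the layout the splitting script further down expects.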
Split a multi-sequence FASTA file into one file per sequence, using each sequence ID as the file name (the input must be linearised FASTA): https://www.biostars.org/p/455915/
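#!/bin/bash
# split.sh: write each record of a linearised FASTA file to <sequence ID>.fasta
numseqs=$(grep -c "^>" "$1")
numlines=$(wc -l < "$1")
# A linearised file has at most two lines per record (header + sequence).
if (( numlines > 2 * numseqs )); then
    echo "The fasta file needs to be linearised before this script will work." >&2
    exit 1    # "return" only works inside a function or a sourced script
fi
while IFS= read -r line; do
    if [ "${line:0:1}" == ">" ]; then
        # File name: the header without the leading ">" and without anything after the first space
        header="$line"
        filename=$(echo "${line#>}" | sed 's/ .*//')
        echo "$header" >> "${filename}.fasta"
    else
        seq="$line"
        echo "$seq" >> "${filename}.fasta"
    fi
done < "$1"
# usage: ./split.sh test.fasta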
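To make the file naming in the script above easier to follow, here is a minimal sketch of the same derivation using shell parameter expansion instead of `echo | sed`. The header string and the variable name `id` are hypothetical, chosen only for this illustration.

```shell
header='>seq1 some description'   # hypothetical FASTA header line

id="${header#>}"    # drop the leading ">"
id="${id%% *}"      # drop everything from the first space onward

echo "${id}.fasta"
```

With this header the record would be written to `seq1.fasta`. Two records that share an ID would end up in the same file, so IDs are assumed to be unique.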
Speed up a slow script by running it on each of the split input files (applicable to any script)
#!/bin/bash
# interproscan.sh: run InterProScan once for every ID listed in the file given as $1
while IFS= read -r line
do
    /scratch2/software/interproscan-5.52-86.0/interproscan.sh -i "data/$line.fasta" -f tsv -dp -goterms -pa -o "results/$line.tsv"
done < "$1"
# usage: ./interproscan.sh gene_id.txt
# note: gene_id.txt must end with a line break, otherwise the last ID is skipped
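The loop above handles one ID at a time. If the machine has spare cores, one possible way to run several IDs at once is `xargs -P`. The sketch below is an illustration under assumptions, not the method used on this page: the IDs are made up, the `echo` command is a stand-in for the real per-file call, and `-P` is an extension found in GNU and BSD `xargs` rather than a POSIX option. Whether several InterProScan runs fit on one machine at the same time is not covered here and would need checking.

```shell
# gene_id.txt: one sequence ID per line (made-up IDs for this example)
printf 'geneA\ngeneB\ngeneC\n' > gene_id.txt
mkdir -p results

# -P 2 runs up to two jobs at a time; -I{} passes one ID per job.
# The echo is a hypothetical stand-in: replace it with the real command,
# using "$1" wherever the loop above uses $line.
xargs -P 2 -I{} sh -c 'echo "processed $1" > "results/$1.tsv"' _ {} < gene_id.txt

ls results
```

Each job writes to its own output file, so the jobs do not interfere with one another. The number after `-P` is best kept at or below the number of available cores.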
<Last updated by Xi Zhang on Oct 6th,2021>
