The awk command is very useful for editing or parsing tab-delimited files (for example, BLAST output in tab-separated columns, -outfmt 6, or GFF files).
By default, awk reads a file line by line, where each line ends with a newline character (\n), and splits each line into fields. The default field separator is any run of spaces or tabs; other separators, such as a single tab ("\t"), can be set with the -F option.
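The effect of the field separator can be checked with a minimal sketch like the one below. The file name fs_demo.tsv and its content are made up for illustration: one line with two tab-separated fields, the second of which contains a space.

```shell
# Hypothetical one-line input: two tab-separated fields, the second contains a space.
printf 'gene1\thypothetical protein\n' > fs_demo.tsv

# Default separator (runs of spaces or tabs): awk counts 3 fields.
awk '{ print NF }' fs_demo.tsv

# Tab as the only separator: awk counts 2 fields.
awk -F "\t" '{ print NF }' fs_demo.tsv
```

NF is the built-in awk variable holding the number of fields of the current line. Setting -F "\t" explicitly is the safer choice whenever a column may contain spaces.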
The examples below use awk on a BLAST output file (-outfmt 6):
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
user$ head blast.output
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
BUSSELTON_g29060.t1 Seq_35_pilon_pilon 32.67 150 83 6 531 678 24051 23650 1e-07 57.4
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 68.89 90 27 1 594 683 498684 498950 8e-44 137
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 77.14 35 8 0 684 718 498950 499054 8e-44 62.0
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 37.75 151 93 1 1381 1531 499664 500113 6e-23 108
How to swap 2 columns (fields)
ex: swapping the query (column 1) and the target (column 2)
awk -F "\t" '{print $2"\t"$1"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12}' file
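The command above writes out all twelve fields by hand. A shorter variant, sketched here under the assumption that the input is strictly tab-separated, swaps the two fields in place and lets awk rebuild the line using the output field separator (OFS). The file name blast_sample.tsv and the variable tmp are illustrative; the three rows are copied from the BLAST output shown earlier.

```shell
# Build a small tab-separated sample (three rows copied from the BLAST output above).
# printf reuses the format string for each group of 12 arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4 \
  BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7 \
  BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193 \
  > blast_sample.tsv

# FS is the input field separator, OFS the output field separator.
# Assigning to $1 and $2 makes awk rebuild the line with OFS between fields.
awk 'BEGIN { FS = OFS = "\t" } { tmp = $1; $1 = $2; $2 = tmp; print }' blast_sample.tsv
```

This form does not depend on the number of columns, so it should also work on files with more or fewer than twelve fields.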
How to use the if statement
ex 1: printing a line if the name of the query (first column) contains “”
ex 2: printing a line if the start of the hit in the target sequence (column X) is greater than XXX
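Since the pattern, the column number and the threshold are left open above, the sketch below picks arbitrary values purely for illustration: the pattern g29, column 9 (s. start) and the threshold 100000. None of these values come from the original examples, and blast_sample.tsv is a made-up file holding three rows of the BLAST output shown earlier.

```shell
# Build a small tab-separated sample (three rows copied from the BLAST output above).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4 \
  BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7 \
  BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193 \
  > blast_sample.tsv

# ex 1: print the line if the query name (column 1) matches the pattern g29.
# The ~ operator tests a field against a regular expression.
awk -F "\t" '{ if ($1 ~ /g29/) print }' blast_sample.tsv

# ex 2: print the line if s. start (column 9) is greater than 100000.
awk -F "\t" '{ if ($9 > 100000) print }' blast_sample.tsv
```

On this sample, the first command keeps the g29060 and g29223 rows, and the second keeps the g28320 and g29223 rows. The same filters can also be written without an explicit if, as `awk -F "\t" '$1 ~ /g29/' blast_sample.tsv`, because awk prints a line by default when a bare condition is true.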
How to use the if statement with 2 conditions
ex 1: printing a line if the name of the query contains “” AND if the start of the hit in the target sequence (column X) is greater than XXX
ex 2: printing a line if the name of the query contains “” OR if the start of the hit in the target sequence (column X) is greater than XXX
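Two conditions are combined with && (AND) or || (OR). The sketch below again uses arbitrary, illustrative values that are not part of the original examples: the pattern g29, column 9 (s. start) and the threshold 100000, applied to a made-up file blast_sample.tsv containing three rows of the BLAST output shown earlier.

```shell
# Build a small tab-separated sample (three rows copied from the BLAST output above).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4 \
  BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7 \
  BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193 \
  > blast_sample.tsv

# AND: both conditions must be true (query matches g29 and s. start > 100000).
awk -F "\t" '{ if ($1 ~ /g29/ && $9 > 100000) print }' blast_sample.tsv

# OR: at least one condition must be true.
awk -F "\t" '{ if ($1 ~ /g29/ || $9 > 100000) print }' blast_sample.tsv
```

On this sample, the AND version keeps only the g29223 row, while the OR version keeps all three rows.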
How to use the if and else statements
ex: printing the first column of a line if the query (first column) contains “”, else printing the full line
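A minimal sketch of the if/else form follows. The pattern g29 is an arbitrary choice for illustration (the original leaves the pattern open), and blast_sample.tsv is a made-up file holding three rows of the BLAST output shown earlier.

```shell
# Build a small tab-separated sample (three rows copied from the BLAST output above).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4 \
  BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7 \
  BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193 \
  > blast_sample.tsv

# If the query name matches g29, print only column 1; otherwise print the whole line ($0).
awk -F "\t" '{ if ($1 ~ /g29/) print $1; else print $0 }' blast_sample.tsv
```

On this sample, the first output line is the complete g28320 row, followed by the two bare query names BUSSELTON_g29060.t1 and BUSSELTON_g29223.t1.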
How to perform numeric operations
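No example is given here, so the following is only a sketch of what numeric operations on a BLAST table might look like. The chosen calculations (hit length on the target, total alignment length, mean identity), the variable names and the file blast_sample.tsv are all assumptions made for illustration; the three rows are copied from the BLAST output shown earlier.

```shell
# Build a small tab-separated sample (three rows copied from the BLAST output above).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4 \
  BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7 \
  BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193 \
  > blast_sample.tsv

# Per-line arithmetic: length of the hit on the target (columns 9 and 10).
# The sign is flipped when s. start > s. end (hit on the reverse strand).
awk -F "\t" '{ len = $10 - $9; if (len < 0) len = -len; print $1 "\t" (len + 1) }' blast_sample.tsv

# Accumulating over all lines: sum of column 4 and mean of column 3.
# The END block runs once after the last line; NR is the number of lines read.
awk -F "\t" '{ total += $4; ident += $3 }
     END { if (NR > 0) printf "total alignment length: %d\nmean identity: %.2f\n", total, ident / NR }' blast_sample.tsv
```

On this sample, the first command reports hit lengths of 192, 738 and 585, and the second reports a total alignment length of 540 and a mean identity of 38.71.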
