awk_for_tabulated_files

The awk command is very useful for editing or parsing tab-delimited files, for example BLAST output in tab-separated columns (-outfmt 6) or GFF files.

By default, awk reads a file line by line, where each line ends with a newline character (\n), and splits each line into fields. The default field separator is any run of whitespace (spaces or tabs); for tabulated files it is safer to set it explicitly to a tab with -F "\t". Other field separators can be defined in the same way.
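As a small illustration of field splitting, the sketch below builds a two-line tab-separated file and prints the number of fields (NF) followed by the first two fields. The file name sample.tsv and its contents are made up for this example.

```shell
# Build a small tab-separated sample (hypothetical data, not real BLAST output)
printf 'geneA\tcontig 1\t98.5\n' >  sample.tsv
printf 'geneB\tcontig 2\t75.0\n' >> sample.tsv

# -F "\t" sets the field separator to a tab, so "contig 1" stays one field
awk -F "\t" '{print NF, $1, $2}' sample.tsv

# Without -F, awk also splits on the space, so "contig 1" becomes two fields
awk '{print NF, $1, $2}' sample.tsv
```

With the tab separator each line has 3 fields; without it, the space inside the second column produces 4 fields, which is why setting -F "\t" explicitly is a reasonable habit for tabulated files.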

We will see how to use awk on a BLAST output file (-outfmt 6).

# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
user$ head blast.output
BUSSELTON_g28320.t1	Seq_26_pilon_pilon	45.45	66	34	2	27	92	266496	266305	2e-07	57.4
BUSSELTON_g29060.t1	Seq_133_pilon_pilon	24.01	279	171	9	398	668	26316	27053	6e-13	74.7
BUSSELTON_g29060.t1	Seq_35_pilon_pilon	32.67	150	83	6	531	678	24051	23650	1e-07	57.4
BUSSELTON_g29223.t1	Seq_17_pilon_pilon	46.67	195	79	1	1103	1272	499049	499633	9e-49	 193
BUSSELTON_g29223.t1	Seq_17_pilon_pilon	68.89	90	27	1	594	683	498684	498950	8e-44	 137
BUSSELTON_g29223.t1	Seq_17_pilon_pilon	77.14	35	8	0	684	718	498950	499054	8e-44	62.0
BUSSELTON_g29223.t1	Seq_17_pilon_pilon	37.75	151	93	1	1381	1531	499664	500113	6e-23	 108

How to swap 2 columns (fields)

ex: swapping the query (column 1) and the target (column 2)

awk -F "\t" '{print $2"\t"$1"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12}' file
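Listing every field by hand becomes tedious for files with many columns. A possible shorter variant, sketched here on a made-up three-column file (swap_sample.tsv is a hypothetical name), swaps the two fields through a temporary variable and lets awk rebuild the line using OFS as the output separator.

```shell
# Small hypothetical sample with three tab-separated columns
printf 'query1\ttarget1\t45.45\n' >  swap_sample.tsv
printf 'query2\ttarget2\t24.01\n' >> swap_sample.tsv

# FS and OFS are both set to a tab in the BEGIN block.
# Assigning to a field makes awk rebuild the line with OFS;
# the trailing 1 is an always-true pattern that prints each line.
awk 'BEGIN {FS = OFS = "\t"} {tmp = $1; $1 = $2; $2 = tmp} 1' swap_sample.tsv
```

All columns after the second are kept unchanged, whatever their number.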

How to use the if statement

ex 1: printing a line if the name of the query (first column) contains “”
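A minimal sketch of this kind of test, run on three lines copied from the BLAST output shown above. The search string g29060 and the file name blast_sample.tsv are chosen only for illustration; the page itself does not specify them.

```shell
# Three lines from the BLAST output above, written to a hypothetical file name
printf 'BUSSELTON_g28320.t1\tSeq_26_pilon_pilon\t45.45\t66\t34\t2\t27\t92\t266496\t266305\t2e-07\t57.4\n' >  blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_133_pilon_pilon\t24.01\t279\t171\t9\t398\t668\t26316\t27053\t6e-13\t74.7\n' >> blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_35_pilon_pilon\t32.67\t150\t83\t6\t531\t678\t24051\t23650\t1e-07\t57.4\n' >> blast_sample.tsv

# $1 ~ /g29060/ is true when column 1 matches the regular expression g29060
# (the pattern is an assumption made for this example)
awk -F "\t" '{if ($1 ~ /g29060/) print $0}' blast_sample.tsv
```

Only the two BUSSELTON_g29060.t1 lines are printed. The same filter can also be written without an explicit if, as awk -F "\t" '$1 ~ /g29060/' blast_sample.tsv, because awk prints the line by default when a pattern is true.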

ex 2: printing a line if the start of the hit in the target sequence (column X) is greater than XXX
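A possible version of this numeric test, again on three lines copied from the BLAST output above. It assumes that the start of the hit in the target is column 9 (s. start in the header line) and uses 25000 as an arbitrary threshold; both are assumptions made for the example, as is the file name blast_sample.tsv.

```shell
# Three lines from the BLAST output above, written to a hypothetical file name
printf 'BUSSELTON_g28320.t1\tSeq_26_pilon_pilon\t45.45\t66\t34\t2\t27\t92\t266496\t266305\t2e-07\t57.4\n' >  blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_133_pilon_pilon\t24.01\t279\t171\t9\t398\t668\t26316\t27053\t6e-13\t74.7\n' >> blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_35_pilon_pilon\t32.67\t150\t83\t6\t531\t678\t24051\t23650\t1e-07\t57.4\n' >> blast_sample.tsv

# Fields that look like numbers are compared numerically;
# column 9 and the threshold 25000 are illustrative choices
awk -F "\t" '{if ($9 > 25000) print $0}' blast_sample.tsv
```

The first two lines are printed (s. start 266496 and 26316); the third (24051) is below the threshold.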

How to use the if statement with 2 conditions

printing a line if the name of the query contains “” AND if the hit in the target sequence (column X) is greater than XXX
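In awk, two conditions are combined with && when both must be true. The sketch below uses three lines copied from the BLAST output above; the search string g29060, column 9 (s. start) and the threshold 25000 are assumptions chosen for the example, as is the file name blast_sample.tsv.

```shell
# Three lines from the BLAST output above, written to a hypothetical file name
printf 'BUSSELTON_g28320.t1\tSeq_26_pilon_pilon\t45.45\t66\t34\t2\t27\t92\t266496\t266305\t2e-07\t57.4\n' >  blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_133_pilon_pilon\t24.01\t279\t171\t9\t398\t668\t26316\t27053\t6e-13\t74.7\n' >> blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_35_pilon_pilon\t32.67\t150\t83\t6\t531\t678\t24051\t23650\t1e-07\t57.4\n' >> blast_sample.tsv

# && means AND: the line is printed only if both tests are true
awk -F "\t" '{if ($1 ~ /g29060/ && $9 > 25000) print $0}' blast_sample.tsv
```

Only the Seq_133_pilon_pilon line is printed: the first line fails the name test and the third fails the numeric test.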

printing a line if the name of the query contains “” OR if the hit in the target sequence (column X) is greater than XXX
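For an OR test, awk uses ||, which is true when at least one of the conditions holds. In this sketch on three lines copied from the BLAST output above, the search string g28320, column 9 (s. start) and the threshold 25000 are illustrative assumptions, as is the file name blast_sample.tsv.

```shell
# Three lines from the BLAST output above, written to a hypothetical file name
printf 'BUSSELTON_g28320.t1\tSeq_26_pilon_pilon\t45.45\t66\t34\t2\t27\t92\t266496\t266305\t2e-07\t57.4\n' >  blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_133_pilon_pilon\t24.01\t279\t171\t9\t398\t668\t26316\t27053\t6e-13\t74.7\n' >> blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_35_pilon_pilon\t32.67\t150\t83\t6\t531\t678\t24051\t23650\t1e-07\t57.4\n' >> blast_sample.tsv

# || means OR: the line is printed if at least one test is true
awk -F "\t" '{if ($1 ~ /g28320/ || $9 > 25000) print $0}' blast_sample.tsv
```

The first line satisfies both tests and the second only the numeric one, so both are printed; the third line satisfies neither and is skipped.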

How to use the if and else statements

printing the first column of a line if the query (first column) contains “”, else print the full line
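A short sketch of an if / else construct on three lines copied from the BLAST output above. The search string g29060 and the file name blast_sample.tsv are assumptions made for the example.

```shell
# Three lines from the BLAST output above, written to a hypothetical file name
printf 'BUSSELTON_g28320.t1\tSeq_26_pilon_pilon\t45.45\t66\t34\t2\t27\t92\t266496\t266305\t2e-07\t57.4\n' >  blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_133_pilon_pilon\t24.01\t279\t171\t9\t398\t668\t26316\t27053\t6e-13\t74.7\n' >> blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_35_pilon_pilon\t32.67\t150\t83\t6\t531\t678\t24051\t23650\t1e-07\t57.4\n' >> blast_sample.tsv

# If column 1 matches, print only column 1; otherwise print the whole line.
# The semicolon before else is needed when both branches are on one line.
awk -F "\t" '{if ($1 ~ /g29060/) print $1; else print $0}' blast_sample.tsv
```

The first line is printed in full, while the two matching lines are reduced to their query name.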

How to perform numeric operations
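As one possible illustration, the sketch below computes, for each hit, the length covered on the target from columns 9 and 10 (s. start and s. end), and sums the alignment lengths (column 4) over the whole file. The choice of these calculations, the variable names span and total, and the file name blast_sample.tsv are assumptions made for the example; the input is three lines copied from the BLAST output above.

```shell
# Three lines from the BLAST output above, written to a hypothetical file name
printf 'BUSSELTON_g28320.t1\tSeq_26_pilon_pilon\t45.45\t66\t34\t2\t27\t92\t266496\t266305\t2e-07\t57.4\n' >  blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_133_pilon_pilon\t24.01\t279\t171\t9\t398\t668\t26316\t27053\t6e-13\t74.7\n' >> blast_sample.tsv
printf 'BUSSELTON_g29060.t1\tSeq_35_pilon_pilon\t32.67\t150\t83\t6\t531\t678\t24051\t23650\t1e-07\t57.4\n' >> blast_sample.tsv

awk -F "\t" '{
    # length of the hit on the target, whichever strand it is on
    span = ($10 > $9 ? $10 - $9 : $9 - $10) + 1
    # running sum of the alignment lengths (column 4)
    total += $4
    print $1 "\t" span
}
END {
    # the END block runs once, after the last line has been read
    printf "total alignment length: %d\n", total
}' blast_sample.tsv
```

The per-line values are 192, 738 and 402, and the final line reports a total alignment length of 495. Variables such as total do not need to be declared; awk initialises them to 0 on first use.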
