awk_for_tabulated_files
Current revision: 2021/07/06 12:42 by 156.34.16.174 (previous revision: 2021/07/06 12:07)
The awk command can be very useful for editing or parsing tabulated files (for example, BLAST outputs, …).
By default, awk reads a file line by line (a line ends with a newline character, \n) and splits each line into fields separated by whitespace, that is, any run of spaces or tabs. For tab-separated files such as BLAST tabular output, set the field separator explicitly with -F "\t".
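A minimal sketch of why the separator matters, using an invented line whose two tab-separated fields each contain a space (the data are made up for illustration): with the default separator awk sees four fields, with -F '\t' it sees two. The default separator also merges consecutive tabs, so an empty column would shift the field numbers.

```shell
#!/bin/sh
# Invented example line: two tab-separated fields, each containing a space.
line=$(printf 'query 1\ttarget A')

# Default separator (runs of spaces or tabs): 4 fields, so $2 is "1".
printf '%s\n' "$line" | awk '{ print NF, $2 }'

# Tab as the only separator: 2 fields, so $2 is "target A".
printf '%s\n' "$line" | awk -F '\t' '{ print NF, $2 }'

# Two tabs in a row (an empty middle column): 2 fields by default, 3 with a tab separator.
printf 'a\t\tb\n' | awk '{ print NF }'
printf 'a\t\tb\n' | awk -F '\t' '{ print NF }'
```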

…

How to invert 2 columns (fields) \\
ex: inverting the query (column 1) and the target (column 2)
<code>
…
</code>
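The command used for this example is hidden in this view. One common way to swap two columns is sketched below; it assumes a tab-separated BLAST file with the hypothetical name hits.tsv (filled here with two lines from this page), and it is not necessarily the command the page uses.

```shell
#!/bin/sh
# Hypothetical input file: two BLAST tabular lines from this page
# (written with spaces for readability, turned into tabs by tr).
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
EOF

# Swap column 1 (query) and column 2 (target) through a temporary variable.
# OFS = "\t" keeps the output tab-separated.
awk -F '\t' 'BEGIN { OFS = "\t" } { tmp = $1; $1 = $2; $2 = tmp; print }' hits.tsv
```

Setting OFS matters here: as soon as a field is assigned, awk rebuilds the whole line with OFS, which is a single space by default.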

How to use the **if** statement \\
ex 1: printing a line if the name of the query (first column) contains "
<code>
…
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 37.75 151 93 1 1381 1531 499664 500113 6e-23 108
#the query in the first line, BUSSELTON_g28320.t1, is the only one that does not contain "
</code>
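The search string in ex 1 is cut off. As a sketch only, assuming the substring g29 (the one used in the if/else example further down) and the hypothetical file hits.tsv, the filter can be written either with an explicit if or as a bare pattern:

```shell
#!/bin/sh
# Hypothetical input file built from three BLAST lines on this page
# (spaces turned into tabs by tr).
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 37.75 151 93 1 1381 1531 499664 500113 6e-23 108
EOF

# Explicit if: print the whole line ($0) when column 1 matches /g29/.
awk -F '\t' '{ if ($1 ~ /g29/) print $0 }' hits.tsv

# Equivalent short form: a pattern with no action prints the matching line.
awk -F '\t' '$1 ~ /g29/' hits.tsv
```

The ~ operator matches a regular expression, so a dot in the search string would match any character; index($1, "g29") > 0 tests for a literal substring instead.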

ex 2: printing a line if the start of the hit in the target sequence (column 9, s. start) is greater than 499000
<code>
user$ awk -F "
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 37.75 151 93 1 1381 1531 499664 500113 6e-23 108
</code>
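A sketch of ex 2 on the hypothetical file hits.tsv, built from three of the lines shown on this page. Because fields read from the file look like numbers, awk compares column 9 with 499000 numerically rather than as text:

```shell
#!/bin/sh
# Hypothetical input file: three BLAST lines from this page, tab-separated.
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 68.89 90 27 1 594 683 498684 498950 8e-44 137
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 37.75 151 93 1 1381 1531 499664 500113 6e-23 108
EOF

# Keep the lines whose column 9 (s. start) is above 499000.
awk -F '\t' '{ if ($9 > 499000) print $0 }' hits.tsv
```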

How to use the **if** statement with 2 conditions \\
ex: printing a line if the name of the query contains "
<code>
user$ awk -F "
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
BUSSELTON_g29060.t1 Seq_35_pilon_pilon 32.67 150 83 6 531 678 24051 23650 1e-07 57.4
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 68.89 90 27 1 594 683 498684 498950 8e-44 137
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 77.14 35 8 0 684 718 498950 499054 8e-44 62.0
# && means AND: both conditions must be met
</code>
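The two conditions of the example above are cut off. The sketch below uses two conditions chosen only for illustration (column 1 contains g29 AND the percent identity in column 3 is above 40) on the hypothetical file hits.tsv; only a line passing both tests is printed:

```shell
#!/bin/sh
# Hypothetical input file: three BLAST lines from this page, tab-separated.
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 68.89 90 27 1 594 683 498684 498950 8e-44 137
EOF

# The g28320 line fails the name test, the Seq_133 line fails the identity
# test (24.01), so only the last line is printed.
awk -F '\t' '{ if ($1 ~ /g29/ && $3 > 40) print $0 }' hits.tsv
```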

ex: printing a line if the name of the query contains "
<code>
user$ awk -F "\t"
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
BUSSELTON_g29060.t1 Seq_35_pilon_pilon 32.67 150 83 6 531 678 24051 23650 1e-07 57.4
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 68.89 90 27 1 594 683 498684 498950 8e-44 137
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 77.14 35 8 0 684 718 498950 499054 8e-44 62.0
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 37.75 151 93 1 1381 1531 499664 500113 6e-23 108
# the two pipe characters || mean OR: at least one of the conditions must be met
</code>
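Similarly, a sketch of || with two illustrative conditions (not the ones used above): keep a line if column 9 is above 499000 OR column 3 is above 70. The file name hits.tsv is hypothetical:

```shell
#!/bin/sh
# Hypothetical input file: three BLAST lines from this page, tab-separated.
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 46.67 195 79 1 1103 1272 499049 499633 9e-49 193
BUSSELTON_g29223.t1 Seq_17_pilon_pilon 77.14 35 8 0 684 718 498950 499054 8e-44 62.0
EOF

# Line 2 passes the first test (499049), line 3 the second (77.14);
# line 1 passes neither and is dropped.
awk -F '\t' '{ if ($9 > 499000 || $3 > 70) print $0 }' hits.tsv
```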

How to use the **if** and **else** statements \\
ex: printing the first column of a line if the query (first column) contains "g29", else printing the full line
<code>
user$ awk -F "
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1
BUSSELTON_g29060.t1
BUSSELTON_g29223.t1
BUSSELTON_g29223.t1
BUSSELTON_g29223.t1
BUSSELTON_g29223.t1
</code>
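A sketch of the if/else example on the hypothetical file hits.tsv (two lines from this page), followed by the equivalent one-liner using awk's conditional operator ?: :

```shell
#!/bin/sh
# Hypothetical input file: two BLAST lines from this page, tab-separated.
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
EOF

# if/else: only column 1 for g29 queries, the full line otherwise.
awk -F '\t' '{ if ($1 ~ /g29/) print $1; else print $0 }' hits.tsv

# Same logic with the conditional operator.
awk -F '\t' '{ print ($1 ~ /g29/ ? $1 : $0) }' hits.tsv
```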

How to perform numeric operations on certain fields \\
ex 1: printing column 2, column 10, and column 10 minus 100
<code>
user$ awk -F "
Seq_26_pilon_pilon 266305 266205
Seq_133_pilon_pilon 27053 26953
Seq_35_pilon_pilon 23650 23550
Seq_17_pilon_pilon 499633 499533
Seq_17_pilon_pilon 498950 498850
Seq_17_pilon_pilon 499054 498954
Seq_17_pilon_pilon 500113 500013
</code>
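A sketch of ex 1 on the hypothetical file hits.tsv, holding two of the BLAST lines from this page. Setting OFS to a tab keeps the result tab-separated; the default output separator is a space:

```shell
#!/bin/sh
# Hypothetical input file: two BLAST lines from this page, tab-separated.
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
EOF

# Column 2, column 10, and column 10 minus 100.
awk -F '\t' 'BEGIN { OFS = "\t" } { print $2, $10, $10 - 100 }' hits.tsv
```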

ex 2: if column 10 is greater than column 9, printing column 2, column 10, and column 10 plus 100; otherwise (column 9 is greater than column 10), printing column 2, column 9 minus 100, and column 9
<code>
user$ awk -F "
Seq_26_pilon_pilon 266396 266496
Seq_133_pilon_pilon 27053 27153
Seq_35_pilon_pilon 23951 24051
Seq_17_pilon_pilon 499633 499733
Seq_17_pilon_pilon 498950 499050
Seq_17_pilon_pilon 499054 499154
Seq_17_pilon_pilon 500113 500213
</code>
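A sketch of ex 2 on the hypothetical file hits.tsv (two lines from this page); the printed values match the first two lines of the output above:

```shell
#!/bin/sh
# Hypothetical input file: two BLAST lines from this page, tab-separated.
tr ' ' '\t' > hits.tsv <<'EOF'
BUSSELTON_g28320.t1 Seq_26_pilon_pilon 45.45 66 34 2 27 92 266496 266305 2e-07 57.4
BUSSELTON_g29060.t1 Seq_133_pilon_pilon 24.01 279 171 9 398 668 26316 27053 6e-13 74.7
EOF

awk -F '\t' 'BEGIN { OFS = "\t" }
{
    if ($10 > $9)
        print $2, $10, $10 + 100    # column 10 larger: column 10 and 100 positions past it
    else
        print $2, $9 - 100, $9      # otherwise: 100 positions before column 9, and column 9
}' hits.tsv
```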
awk_for_tabulated_files.1625584024.txt.gz · Last modified: by 156.34.16.174
