
Monday 11 January 2021

Linux Awk scripting cheatsheet

What is awk?

It’s a full scripting language, as well as a complete text manipulation toolkit for the command line.

Awk is used to transform data files and produce formatted reports.

The way it works:
  • Scans a file line by line
  • Splits each input line into fields
  • Compares the input line/fields against patterns
  • Performs actions on matching lines
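Those four steps correspond to awk's `pattern { action }` rules. Here is a minimal sketch using made-up input piped in with printf instead of a file. It assumes a POSIX-compatible awk such as gawk:

```shell
# For each input line awk splits it into whitespace-separated fields ($1, $2, ...),
# tests the pattern ($2 > 4), and runs the action ({ print $1 }) only on matching lines.
printf 'apple 3\nbanana 7\ncherry 5\n' | awk '$2 > 4 { print $1 }'
# prints:
# banana
# cherry
```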
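In the terminal, if you type awk and press Enter with no arguments, you should see output like the one below. It lists the options awk accepts and the form of the command. This output comes from GNU awk (gawk); other implementations print a different usage message.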

/$ awk
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options: (standard)
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
Short options:          GNU long options: (extensions)
        -b                      --characters-as-bytes
        -c                      --traditional
        -C                      --copyright
        -d[file]                --dump-variables[=file]
        -e 'program-text'       --source='program-text'
        -E file                 --exec=file
        -g                      --gen-pot
        -h                      --help
        -L [fatal]              --lint[=fatal]
        -n                      --non-decimal-data
        -N                      --use-lc-numeric
        -O                      --optimize
        -p[file]                --profile[=file]
        -P                      --posix
        -r                      --re-interval
        -S                      --sandbox
        -t                      --lint-old
        -V                      --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
        gawk '{ sum += $1 }; END { print sum }' file
        gawk -F: '{ print $1 }' /etc/passwd

Create a file named file in any directory you choose, with the following contents:
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
D,CD,BCD,98
F,GH,ABC,XYZ,LF
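One way to create that file from the shell is a here-document. This sketch assumes a POSIX shell and writes to a file literally named `file` in the current directory, which is the name the commands below use:

```shell
# Write the sample data to ./file.
# Quoting the 'EOF' delimiter stops the shell from expanding anything inside.
cat > file <<'EOF'
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
D,CD,BCD,98
F,GH,ABC,XYZ,LF
EOF
```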

$awk -F, '{ print }' file    # -F, sets the field separator; here it is a comma
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
D,CD,BCD,98
F,GH,ABC,XYZ,LF

$awk -F',' '{ print $1}' file
A
B
C
D
F

$0: Represents the entire line of text.
$1: Represents the first field.
$2: Represents the second field.
$7: Represents the seventh field.
$45: Represents the 45th field.
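The following sketch shows the field variables on a single made-up sample line. Asking for a field past the last one, such as $5 on a four-field line, is not an error. Awk simply returns an empty string:

```shell
# $0 is the whole line, $2 the second comma-separated field,
# and $5 does not exist on this line, so it prints as an empty string.
printf 'A,AB,ABC,ABCD\n' | awk -F, '{ print "line:", $0; print "second:", $2; print "fifth:[" $5 "]" }'
# prints:
# line: A,AB,ABC,ABCD
# second: AB
# fifth:[]
```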

$awk -F',' '{ print $1, $3}' file
A ABC
B CBA
C ACB
D BCD
F ABC
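`print` separates its values with a single space. For aligned, report-style columns, awk also has `printf`, which takes a C-style format string. The column width of 3 in this sketch is an arbitrary choice, and the two input rows are taken from the sample file:

```shell
# %-3s left-aligns $1 in a 3-character column; printf adds no newline, so \n is explicit.
printf 'A,AB,ABC,ABCD\nD,CD,BCD,98\n' | awk -F, '{ printf "%-3s|%s\n", $1, $3 }'
# prints:
# A  |ABC
# D  |BCD
```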

Use the OFS (output field separator) variable to put a separator between the printed fields. Setting it in a BEGIN rule makes it apply from the very first line:
$awk -F',' 'BEGIN { OFS="/" } { print $1, $3 }' file
A/ABC
B/CBA
C/ACB
D/BCD
F/ABC
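OFS can also be set on the command line with `-v`, which assigns the variable before any input is read. OFS only appears where `print` has a comma between its arguments. Values written side by side are concatenated with nothing in between. Here is a sketch with one sample line:

```shell
# "print $1, $3" joins with OFS ("/"); "print $1 $3" is plain string concatenation.
printf 'A,AB,ABC,ABCD\n' | awk -F, -v OFS=/ '{ print $1, $3; print $1 $3 }'
# prints:
# A/ABC
# AABC
```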

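Replace every value in column 2. Assigning to a field makes awk rebuild the line using OFS (a single space by default), so the commas become spaces:
$awk -F',' '{$2="1";print }' file
A 1 ABC ABCD
B 1 CBA C200
C 1 ACB 100b
D 1 BCD 98
F 1 ABC XYZ LF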
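If the edited rows should stay comma-separated, set OFS to a comma as well. Here is a minimal sketch on one sample line:

```shell
# With OFS="," the rebuilt line keeps the same separator as the input.
printf 'A,AB,ABC,ABCD\n' | awk -F, -v OFS=, '{ $2 = "1"; print }'
# prints:
# A,1,ABC,ABCD
```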

Replace every value in column 2 and wrap it in double quotes:
$awk -F, '{$2="\"1\"";print }' file
A "1" ABC ABCD
B "1" CBA C200
C "1" ACB 100b
D "1" BCD 98
F "1" ABC XYZ LF
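To quote the existing value instead of replacing it, concatenate a double quote on each side of $2. This sketch also sets OFS to a comma so the output stays CSV-like. That choice is purely illustrative:

```shell
# Inside the shell's single quotes, "\"" reaches awk unchanged: a one-character string containing ".
printf 'A,AB,ABC,ABCD\n' | awk -F, -v OFS=, '{ $2 = "\"" $2 "\""; print }'
# prints:
# A,"AB",ABC,ABCD
```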

Number of fields (NF) in each row when splitting on commas:
$awk -F, '{ print NF }' file
4
4
4
4
5
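Because NF holds the field count, `$NF` refers to the last field of each line, however many fields the line has. Here is a sketch on two of the sample rows:

```shell
# Print the field count and the last field of each line.
printf 'D,CD,BCD,98\nF,GH,ABC,XYZ,LF\n' | awk -F, '{ print NF, $NF }'
# prints:
# 4 98
# 5 LF
```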

A BEGIN rule is executed once before any text processing starts. In fact, it’s executed before awk even reads any text. An END rule is executed after all processing has completed. You can have multiple BEGIN and END rules, and they’ll execute in order.
$awk  -F',' 'BEGIN {print "Hello world"} { print $0}' file
Hello world
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
D,CD,BCD,98
F,GH,ABC,XYZ,LF
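Since BEGIN runs before any input is read, it is a common place to set the field separator FS. This is equivalent to passing -F, on the command line. Here is a sketch with two sample rows:

```shell
# FS is set before the first line is split, so $2 is already the second comma field.
printf 'A,AB,ABC,ABCD\nB,BA,CBA,C200\n' | awk 'BEGIN { FS = "," } { print $2 }'
# prints:
# AB
# BA
```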


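An END rule runs only after the last line has been read, even when it is written first in the program. NR (the number of records read so far) then equals the total number of lines:
$awk 'END { print NR } { print }' file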
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
D,CD,BCD,98
F,GH,ABC,XYZ,LF
5
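The statement that several BEGIN and END rules run in order can be checked with a short sketch. It uses two made-up input lines:

```shell
# BEGIN rules run in order before input, the main rule runs per line,
# and END rules run in order after the last line.
printf 'x\ny\n' | awk '
  BEGIN { print "begin 1" }
  BEGIN { print "begin 2" }
  { print "line", NR }
  END { print "end 1" }
  END { print "end 2" }'
# prints:
# begin 1
# begin 2
# line 1
# line 2
# end 1
# end 2
```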

To print each line prefixed with its row number (NR):
$awk -F, '{ print NR ", " $0 }' file
1, A,AB,ABC,ABCD
2, B,BA,CBA,C200
3, C,AC,ACB,100b
4, D,CD,BCD,98
5, F,GH,ABC,XYZ,LF
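NR also works as a pattern. You can use it to pick out a single line by position, or write `NR > 1` to skip a header line. Here is a sketch with three sample rows:

```shell
# A pattern with no action prints the whole matching line; only record 3 matches.
printf 'A,AB,ABC,ABCD\nB,BA,CBA,C200\nC,AC,ACB,100b\n' | awk 'NR == 3'
# prints:
# C,AC,ACB,100b
```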

Conditions and regular expressions

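Keep the rows whose fourth field is a number greater than 90. The regex ensures that only purely numeric fields reach the comparison:
$awk -F, '$4 ~ /^[0-9]+$/ && $4 > 90 { print }' file
D,CD,BCD,98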
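Be careful with `>` on fields that are not always numeric. A field that looks like a number (98) is compared numerically. A field that does not (ABCD, 100b) is compared as a string against "90", and in the usual character order digits sort before letters. Here is a sketch of a bare `$4 > 90` on three sample rows:

```shell
# "ABCD" > "90" as strings ("A" sorts after "9") -> printed
# "100b" > "90" as strings ("1" sorts before "9") -> not printed
# 98 > 90 numerically -> printed
printf 'A,AB,ABC,ABCD\nC,AC,ACB,100b\nD,CD,BCD,98\n' | awk -F, '$4 > 90 { print $1 }'
# prints:
# A
# D
```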

$awk -F, '$3 ~ /A/ { print $0 }' file
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
F,GH,ABC,XYZ,LF

$awk -F, '$3 ~ /^A/ { print $0 }' file
A,AB,ABC,ABCD
C,AC,ACB,100b
F,GH,ABC,XYZ,LF
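Two related forms are shown in this sketch with three sample rows. A bare /regex/ pattern is matched against the whole line ($0), and `!~` is the negation of `~`:

```shell
# Lines containing "CD" anywhere.
printf 'A,AB,ABC,ABCD\nB,BA,CBA,C200\nD,CD,BCD,98\n' | awk '/CD/'
# prints:
# A,AB,ABC,ABCD
# D,CD,BCD,98

# Lines whose third field does NOT start with A.
printf 'A,AB,ABC,ABCD\nB,BA,CBA,C200\nD,CD,BCD,98\n' | awk -F, '$3 !~ /^A/'
# prints:
# B,BA,CBA,C200
# D,CD,BCD,98
```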

For loops in awk (this one runs entirely in a BEGIN rule, so no input file is needed):
$awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is",i*i; }'
square of 1 is 1
square of 2 is 4
square of 3 is 9
square of 4 is 16
square of 5 is 25
square of 6 is 36
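A for loop is also the usual way to visit every field of a line, with NF as the upper bound. Here is a sketch on one made-up line:

```shell
# $i with a variable index selects the i-th field.
printf 'A,AB,ABC\n' | awk -F, '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# prints:
# 1: A
# 2: AB
# 3: ABC
```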

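Print the rows whose fourth field is longer than three characters (a pattern with no action prints the whole line):
$awk -F, 'length($4) > 3' file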
A,AB,ABC,ABCD
B,BA,CBA,C200
C,AC,ACB,100b
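Printing the lengths shows why the D and F rows were left out. This sketch uses three of the sample rows:

```shell
# length(s) returns the number of characters in s.
printf 'A,AB,ABC,ABCD\nD,CD,BCD,98\nF,GH,ABC,XYZ,LF\n' | awk -F, '{ print $4, length($4) }'
# prints:
# ABCD 4
# 98 2
# XYZ 3
```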

If conditions in awk:
$awk -F, '{ if($4 == "ABCD") print $0;}' file
A,AB,ABC,ABCD
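An `if` can take an `else` branch, so that every line produces some output. Here is a sketch with two sample rows:

```shell
# The ";" before else ends the first print statement; awk accepts this form.
printf 'A,AB,ABC,ABCD\nB,BA,CBA,C200\n' | awk -F, '{ if ($4 == "ABCD") print $1, "matches"; else print $1, "does not match" }'
# prints:
# A matches
# B does not match
```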