
Mastering gawk: Powerful Text Processing on Unix/Linux

This article introduces gawk, the GNU version of awk, explaining its programming capabilities, command syntax, field handling, script execution methods, and how to use BEGIN and END blocks for pre‑ and post‑processing of data streams on Unix/Linux systems.

Raymond Ops

gawk is the GNU implementation of the original awk program in Unix, offering a full programming language for stream editing. It allows defining variables, using arithmetic and string operators, employing structured programming constructs, and extracting and reformatting data from files, such as generating formatted reports from log files.

1 gawk command syntax

<code>gawk options program file
    -F fs        specify the input field separator
    -f file      read the program from a file
    -v var=value define a variable and assign its initial value
    -mf N        set the maximum number of fields to process
    -mr N        set the maximum record size
    -W keyword   set the compatibility mode or warning level
</code>

Command-line options customize gawk’s behavior. A script is applied to each input line in turn: gawk reads the line, processes it, and writes whatever output the script specifies.

2 Reading a program script from the command line

On the command line, the program text is enclosed in single quotes so the shell passes it to gawk unchanged, and the action statements go inside braces {}. Example:

<code># gawk '{print "Hello World!"}'</code>

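Without a file name, gawk reads from STDIN and waits for input. Because this script ignores the contents of each line, every line you type prints Hello World!. Press Ctrl-D to send EOF and end the program.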

3 Using field variables

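gawk automatically assigns a variable to each data field in a line: $1 holds the first field, $2 the second, and so on up to $n, while $0 holds the entire line. By default, fields are separated by whitespace (spaces or tabs). The -F option sets a different separator, e.g. ':' for /etc/passwd: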

<code># gawk -F : '{print $1}' /etc/passwd
root
bin
daemon
…</code>

4 Multiple commands in a script

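To run more than one statement in a script, separate them with semicolons. Here the first statement replaces the fourth field and the second prints the modified line: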

<code>echo "My name is centos" | gawk '{$4="hahaha"; print $0}'
My name is hahaha</code>

5 Storing scripts in files

Scripts can be saved in a file and passed to gawk with -f. The example file script2.gawk prints each user’s home directory:

<code>{print $1 "'s home directory is " $6}</code>

Running

<code>gawk -F: -f script2.gawk /etc/passwd</code>

prints one line per account in /etc/passwd, naming its home directory.

6 Running code before processing data

The BEGIN block executes before any input is read, which makes it useful for printing headers or setting variables such as the field separator FS.

<code>gawk 'BEGIN{print "The data3 File contents:"}{print $0}' data3.txt</code>

7 Running code after processing data

The END block runs after all input is processed, ideal for footers.

<code>gawk '{print $0} END{print "End of file"}' data3.txt</code>