Introduction to AWK: Syntax, Script Structure, and Practical Use Cases
This article introduces the AWK scripting language, covering its basic syntax, command‑line options, script components such as BEGIN and END blocks, and demonstrates common text‑filtering, data‑analysis, and formatting tasks with concrete examples.
This article introduces AWK, a simple scripting language that excels at processing structured text and is often combined with the shell for log handling and statistical work. Its syntax is concise, it runs fast, and it offers built-in features such as arrays and functions with a C-like feel, which makes it approachable for beginners.
Syntax
i. awk [options] 'script' var=value file(s)
ii. awk [options] -f scriptfile var=value file(s)
Common options include:
-F fs Specify input field separator (string or regex, e.g., -F "\t").
-v var=value Pass external variables to AWK.
-f scriptfile Read AWK commands from a script file.
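As a small illustration of these options, the sketch below combines -F and -v. The sample data, the function name print_high_scores, and the variable name min are made up for the example.

```shell
# Hypothetical sample input: name:score pairs separated by colons.
print_high_scores() {
  printf 'alice:90\nbob:55\ncarol:72\n' |
    # -F sets the field separator; -v defines the AWK variable min
    # before the script starts running.
    awk -F ':' -v min=60 '$2 >= min { print $1 }'
}

print_high_scores
```

Only the names whose second field is at least min are printed.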
Script Structure and Working Principle
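An AWK script typically consists of an optional BEGIN block, one or more pattern/action rules, and an optional END block. On the command line the whole script is usually enclosed in single quotes so that the shell does not expand $ or other special characters.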
awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file
BEGIN runs before any input is read (e.g., variable initialization, header printing).
END runs after all input has been processed (e.g., summary output).
A pattern/action rule is tested against each input line. If the pattern is omitted, the action runs for every line; if the action is omitted, the default action is { print }, which prints the matching line.
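The three parts can be seen in a minimal sketch like the one below. The three-line input and the function names count_lines and second_line are assumptions made for the illustration.

```shell
# BEGIN prints a header, the main rule runs once per input line,
# and END prints a summary after the last line.
count_lines() {
  printf 'a\nb\nc\n' |
    awk 'BEGIN { print "start" } { n++ } END { print "lines: " n }'
}

# A pattern with no action: the default action prints the matching line.
second_line() {
  printf 'a\nb\nc\n' | awk 'NR == 2'
}

count_lines
second_line
```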
Application Scenarios
Assume IP_file contains visitor data in the format "area,IP,date".
Text Filtering
Goal: select lines whose area begins with "北京" (Beijing).
awk '/^北京/{print $0}' IP_file
Here ^ anchors the match to the start of the line, which is also the start of the area field, and $0 prints the entire line.
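The pattern above is anchored to the start of the line. If the area should merely contain a given string, one possible variant tests the first field instead. This is a sketch with hypothetical ASCII sample data and a hypothetical function name filter_area, not the command from the original example.

```shell
# Hypothetical sample in the "area,IP,date" format.
filter_area() {
  printf '%s\n' \
    'Beijing,1.1.1.1,2016-11-11' \
    'Shanghai,2.2.2.2,2016-11-11' \
    'North Beijing,3.3.3.3,2016-11-12' |
    # index() returns the position of a literal substring (0 if absent),
    # so no regex metacharacters in the search term are interpreted.
    awk -F ',' -v area="$1" 'index($1, area) > 0'
}

filter_area Beijing
```

Using index() rather than a regex match keeps the search term literal, which is usually what is wanted for a plain substring filter.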
Data Statistics
Goal: find the top 100 IPs by access count on "2016-11-11".
awk -F"," '$3 == "2016-11-11" { sum[$2]++ } END{ for(i in sum) print i"\t"sum[i] }' IP_file | sort -nrk2,2 | head -n 100
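To see the counting step on a small scale, the sketch below runs the same idea on four made-up records. The sample data and the function name top_ips are assumptions.

```shell
# Count accesses per IP for one date, then sort by count, highest first.
top_ips() {
  printf '%s\n' \
    'Beijing,1.1.1.1,2016-11-11' \
    'Beijing,2.2.2.2,2016-11-11' \
    'Shanghai,1.1.1.1,2016-11-11' \
    'Shanghai,1.1.1.1,2016-11-10' |
    awk -F ',' '$3 == "2016-11-11" { sum[$2]++ }
                END { for (ip in sum) print ip "\t" sum[ip] }' |
    sort -k2,2nr | head -n 100
}

top_ips
```

The associative array sum is keyed by IP, so each distinct IP gets its own counter; the order of for (ip in sum) is unspecified, which is why the result is sorted afterwards.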
Formatted Output
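Use printf (similar to C's) for aligned results. The command below reads tab-separated IP and count pairs, such as the output of the previous pipeline, from standard input: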
awk -F"\t" '{printf("The number of IP %-15s occurrences is %d times\n", $1, $2)}'
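A short sketch of how the width specifiers line up columns, using made-up input and a hypothetical function name format_counts:

```shell
# %-15s left-aligns the IP in a 15-character column;
# %5d right-aligns the count in a 5-character column.
format_counts() {
  printf '1.1.1.1\t2\n10.20.30.40\t15\n' |
    awk -F '\t' '{ printf("%-15s %5d\n", $1, $2) }'
}

format_counts
```

Because both columns have fixed widths, every output line has the same length regardless of how long the IP or the count is.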
Finding Duplicate Records
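The file is read twice: the first pass (NR==FNR) counts each record, and the second pass prints records seen more than once. Because no -F is given, $1 is the whole line as long as the records contain no whitespace. sort -u then collapses repeated output lines; uniq alone only removes adjacent duplicates.
awk 'NR==FNR{a[$1]++} NR>FNR && a[$1]>1' IP_file IP_file | sort -u
Alternative solution, a single pass that prints each duplicated record once, in unspecified order:
awk '{a[$1]++} END{ for(i in a) if(a[i]>1) print i }' IP_file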
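Another possible single-pass variant, shown here as a sketch on made-up data (the function name dup_lines is hypothetical), prints each duplicated line once and keeps the order in which duplicates are first detected:

```shell
# seen[$0]++ evaluates to the count *before* incrementing, so the
# condition is true exactly once per line: on its second occurrence.
dup_lines() {
  printf '%s\n' 'a,1' 'b,2' 'a,1' 'c,3' 'b,2' 'a,1' |
    awk 'seen[$0]++ == 1'
}

dup_lines
```

This avoids both the second read of the file and the external sort, at the cost of holding every distinct line in memory.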
Integration with Shell
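AWK can sit on either side of a shell pipeline. Shell variables can be passed in by splicing them into the script with quoting ('$var' or "$var"), through the -v option, or by exporting them and reading ENVIRON["var"]. In bash, shell functions can be called from AWK after exporting them with export -f, and shell commands can be run from AWK with system() or by piping to a shell, as in print cmd | "/bin/bash".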
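The sketch below illustrates three of these mechanisms: -v, the environment, and system(). The function names, the variable names, and the environment variable AWK_DEMO_USER are all hypothetical.

```shell
# Pass a shell variable (with a space in it) through -v.
greet() {
  name='Ada Lovelace'
  awk -v who="$name" 'BEGIN { print "hello, " who }'
}

# Read an exported/environment variable through the ENVIRON array.
env_demo() {
  AWK_DEMO_USER='ada' awk 'BEGIN { print ENVIRON["AWK_DEMO_USER"] }'
}

# Run a shell command from AWK; system() returns its exit status.
run_cmd() {
  awk 'BEGIN { status = system("echo from-shell"); print "exit status: " status }'
}

greet
env_demo
run_cmd
```

Passing values with -v or ENVIRON keeps the AWK script itself inside single quotes, which tends to be less error-prone than splicing shell variables into the script text.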
Important Tips
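Handle single quotes correctly: string literals inside an AWK script use double quotes, e.g., gsub(/A/,"a"), not gsub(/A/,'a').
Large or non-integer numbers may be printed in scientific notation by print; use printf with an explicit format such as %d or %.2f to control the output.
When splicing a shell variable that contains spaces into a script, use '"$var"'.
Make sure both operands have the intended type in comparisons: AWK compares as strings unless both sides are numeric, and adding 0 (x+0) forces a numeric comparison.
When using FILENAME and ARGIND (a gawk extension), keep file order consistent between the command line and the script.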
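Two of these tips, comparison types and scientific notation, can be reproduced with a short sketch. The function names and values are made up for the illustration.

```shell
# String constants compare as strings; adding 0 forces numeric comparison.
compare_demo() {
  awk 'BEGIN {
    a = "10"; b = "9"
    print (a < b)            # string comparison: "10" sorts before "9"
    print (a + 0 < b + 0)    # numeric comparison: 10 is not less than 9
  }'
}

# print formats non-integer values with OFMT (default "%.6g");
# printf with an explicit format avoids the scientific notation.
big_number_demo() {
  awk 'BEGIN { x = 1234567.5; print x; printf "%.1f\n", x }'
}

compare_demo
big_number_demo
```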
Conclusion
AWK provides a rich set of built‑in variables, functions, flow control statements, standard output formatting, arrays, arithmetic and date functions, meeting many text‑processing needs. This article covered the basics to help you get started quickly; further exploration will reveal AWK as a powerful, handy assistant.
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.