
Master Essential Linux Shell Tools: find, grep, awk, and More

This guide presents a comprehensive overview of the most frequently used Linux shell utilities for text processing—such as find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk—offering practical examples, key options, and best‑practice recommendations for efficient command‑line workflows.


The Linux shell is a fundamental skill. Its quirky syntax and low readability mean it is often passed over in favor of scripting languages such as Python, but mastering it is still worth the effort: working with shell scripts exposes many aspects of how a Linux system actually works.

The most commonly used tools for text processing in Linux are: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, and awk.

1. find – File Search

Search for txt and pdf files:

<code>find . \( -name "*.txt" -o -name "*.pdf" \) -print</code>

Regex search for .txt and .pdf:

<code>find . -regex ".*\(\.txt\|\.pdf\)$"</code>
Note: find's default regex dialect (emacs) writes alternation as \|.

Case‑insensitive regex:

<code>find . -iregex ".*\.txt$"</code>

Find all non‑txt files:

<code>find . ! -name "*.txt" -print</code>

Limit search depth (depth 1):

<code>find . -maxdepth 1 -type f</code>

Search by type (directories only):

<code>find . -type d -print</code>

Search by time:

‑atime: access time (days)

‑mtime: modification time

‑ctime: change time (metadata)

Files accessed within the last 7 days (a bare N matches exactly N days ago, -N means fewer than N days, +N means more than N):

<code>find . -atime -7 -type f -print</code>

Search by size (greater than 2 kB):

<code>find . -type f -size +2k</code>

Search by permission (e.g., 644):

<code>find . -type f -perm 644 -print</code>

Search by user:

<code>find . -type f -user weber -print</code>

Delete all *.swp files in the current directory:

<code>find . -type f -name "*.swp" -delete</code>

Execute a command on each matched file (change ownership to user weber):

<code>find . -type f -user root -exec chown weber {} \;</code>
Note: {} is a placeholder that is replaced by the current file name for each match.

Copy found files to another directory:

<code>find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;</code>

Combine multiple commands by invoking a script with -exec:

<code>find . -type f -print -exec ./commands.sh {} \;</code>
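These predicates are easy to try out safely in a throwaway directory; the sketch below assumes mktemp is available (as on any modern GNU or BSD system) and uses invented file names:

```shell
# Create a scratch tree so experiments cannot touch real files.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.pdf" "$dir/c.log"
mkdir "$dir/sub"
touch "$dir/sub/d.txt"

# All .txt and .pdf files, at any depth (3 matches here).
find "$dir" \( -name "*.txt" -o -name "*.pdf" \) -print

# Regular files directly under the top level only.
find "$dir" -maxdepth 1 -type f

# Clean up.
rm -r "$dir"
```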

2. grep – Text Search

Basic usage prints matching lines:

<code>grep "pattern" file</code>

Common options:

-o: output only the matching part

-v: invert match (output non‑matching lines)

-c: count matching lines

-n: show line numbers

-i: ignore case

-l: list only file names
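A small hypothetical file makes the interplay of these flags concrete (fruits.txt and its contents are invented for illustration):

```shell
printf 'Apple pie\napple juice\nbanana\n' > fruits.txt

grep -c "apple" fruits.txt      # count: 1 (case-sensitive)
grep -ci "apple" fruits.txt     # count: 2 (ignore case)
grep -n "banana" fruits.txt     # with line number: 3:banana
grep -o "juice" fruits.txt      # only the matching part: juice

rm fruits.txt
```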

Recursive search through a directory tree (a favorite for searching code):

<code>grep "class" . -R -n</code>

Match multiple patterns:

<code>grep -e "class" -e "virtual" file</code>

Use null-terminated file names (‑Z, combined with ‑l) for safe piping into xargs -0:

<code>grep "test" file* -lZ | xargs -0 rm</code>

3. xargs – Build Command Lines from Input

xargs converts input data into command‑line arguments, allowing combination with other commands such as grep or find.

Convert multiline output to a single line:

<code>cat file.txt | xargs</code>

Convert a single line to multiple lines (‑n sets the number of arguments per line):

<code>cat single.txt | xargs -n 3</code>

Key options:

-d: set the input delimiter (by default whitespace; use -d '\n' to split only on newlines)

-n: specify number of arguments per command line

-I {}: replace {} with the input item

-0: use null character as delimiter
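A quick sketch of -d, -n, and -I on inline input (note that -d is a GNU xargs option and may be missing on macOS/BSD):

```shell
# -d sets the delimiter, -n caps arguments per invocation:
# each comma-separated item becomes its own echo call.
printf 'a,b,c' | xargs -d ',' -n 1 echo

# -I names a placeholder that is substituted into the command.
printf 'one\ntwo\n' | xargs -I {} echo "item: {}"
```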

Example – run a script for each line:

<code>cat file.txt | xargs -I {} ./command.sh -p {} -1</code>

Example – count lines of C++ source files:

<code>find source_dir/ -type f -name "*.cpp" -print0 | xargs -0 wc -l</code>

4. sort – Sorting

Options:

-n: numeric sort (vs. -d dictionary order)

-r: reverse order

-k N: sort by the N‑th column

Examples:

<code>sort -nrk 1 data.txt</code>
<code>sort -bd data   # ignore leading blanks</code>
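On an invented two-column file, -n, -r, and -k combine like this:

```shell
# scores.txt is a made-up example file.
printf 'alice 42\nbob 7\ncarol 100\n' > scores.txt

# Numeric sort, reversed, keyed on column 2: highest score first.
sort -nrk 2 scores.txt
# carol 100
# alice 42
# bob 7

rm scores.txt
```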

5. uniq – Remove Duplicate Lines

Basic usage:

<code>sort unsort.txt | uniq</code>

Count occurrences:

<code>sort unsort.txt | uniq -c</code>

Show only duplicate lines:

<code>sort unsort.txt | uniq -d</code>

Skip parts of each line when comparing duplicates (‑f N skips the first N fields, ‑s N skips the first N characters, ‑w N compares at most N characters):

<code>sort unsort.txt | uniq -f 2 -s 5 -w 10</code>
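Because uniq only collapses adjacent duplicates, the preceding sort matters; a small sketch on inline input:

```shell
# Three a's and two b's, deliberately interleaved.
printf 'b\na\nb\na\na\n' | sort | uniq -c
# (counts are right-padded by uniq)
#   3 a
#   2 b
```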

6. tr – Translate or Delete Characters

General usage:

<code>echo 12345 | tr '0-9' '9876543210'   # simple substitution</code>
<code>cat text | tr '\t' ' '   # tab to space</code>

Delete characters:

<code>cat file | tr -d '0-9'   # remove all digits</code>

Complement set (‑c):

<code>echo "abc123" | tr -c '0-9' ' '   # replace every non-digit with a space</code>
<code>cat file | tr -d -c '0-9 \n'   # delete non‑digits</code>

Compress repeated characters (‑s):

<code>cat file | tr -s ' '</code>

Character classes (e.g., [:lower:], [:upper:]):

<code>tr '[:lower:]' '[:upper:]'</code>
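Two quick demonstrations, combining -d with -c and using a character class:

```shell
# Keep only the digits (the newline is deleted too,
# since it is not in the kept set).
echo "abc123def45" | tr -d -c '0-9'
# prints: 12345

echo "shout" | tr '[:lower:]' '[:upper:]'
# prints: SHOUT
```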

7. cut – Extract Columns

Extract columns 2 and 4:

<code>cut -f2,4 filename</code>

Exclude column 3:

<code>cut -f3 --complement filename</code>

Specify delimiter:

<code>cut -d ";" -f2 filename</code>

Field ranges:

N‑: from field N to end

M‑N: fields M through N

Units:

-b: bytes

-c: characters

-f: fields (delimiter‑based)

Examples:

<code>cut -c1-5 file   # first five characters</code>
<code>cut -c-2 file    # first two characters</code>
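The range syntax works the same way for fields; on inline input with a custom delimiter:

```shell
echo "a:b:c:d:e" | cut -d: -f2,4    # b:d
echo "a:b:c:d:e" | cut -d: -f3-     # c:d:e
echo "a:b:c:d:e" | cut -d: -f2-4    # b:c:d
```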

8. paste – Merge Files Linewise

Combine two files column‑wise (default delimiter is tab):

<code>paste file1 file2</code>

Specify a different delimiter (e.g., comma):

<code>paste file1 file2 -d ","</code>
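For instance, pairing a column of numbers with a column of words (both file names are invented):

```shell
printf '1\n2\n3\n' > nums.txt
printf 'one\ntwo\nthree\n' > words.txt

paste nums.txt words.txt -d ","
# 1,one
# 2,two
# 3,three

rm nums.txt words.txt
```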

9. wc – Word, Line, and Byte Count

Count lines:

<code>wc -l file</code>

Count words:

<code>wc -w file</code>

Count bytes:

<code>wc -c file</code>

10. sed – Stream Editor for Text Substitution

Replace first occurrence on each line:

<code>sed 's/text/replace_text/' file</code>

Global replacement:

<code>sed 's/text/replace_text/g' file</code>

Edit file in place (‑i):

<code>sed -i 's/text/replace_text/g' file</code>

Delete empty lines:

<code>sed '/^$/d' file</code>

Use captured groups: \(...\) captures a sub-match and \1 refers back to it (here the literal hello is dropped when a digit follows):

<code>sed 's/hello\([0-9]\)/\1/'</code>
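For instance, this pattern strips the word hello whenever a digit follows it:

```shell
echo "hello7 world" | sed 's/hello\([0-9]\)/\1/'
# prints: 7 world
```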

Variable substitution with double quotes:

<code>p=pattern; r=replace; echo "a line with pattern" | sed "s/$p/$r/g"</code>

11. awk – Powerful Text Processing Language

Basic script structure:

<code>awk 'BEGIN{print "start"} {print} END{print "end"}' file</code>

Key built‑in variables:

NR – record number (line number)

NF – number of fields

$0 – entire line

$1, $2 … – individual fields
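A two-line input makes these variables concrete:

```shell
# Print the record number, field count, and first field per line.
printf 'alpha beta\ngamma delta epsilon\n' | awk '{print NR, NF, $1}'
# 1 2 alpha
# 2 3 gamma
```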

Print specific fields:

<code>awk '{print $2, $3}' file</code>

Count lines:

<code>awk 'END{print NR}' file</code>

Sum first column:

<code>awk '{sum+=$1} END{print sum}' file</code>

Filter by line number:

<code>awk 'NR<5' file</code>

Filter by pattern:

<code>awk '/linux/' file</code>

Set field delimiter (‑F):

<code>awk -F: '{print $NF}' /etc/passwd</code>

Read command output with getline:

<code>echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout}'</code>

Implement head (first 10 lines):

<code>awk 'NR<=10{print}' filename</code>

Implement tail (last 10 lines) with a ring buffer; printing must begin at the oldest stored line rather than at index 0:

<code>awk '{buf[NR%10]=$0} END{for(i=NR-9;i<=NR;i++) if(i>0) print buf[i%10]}' filename</code>
Source: 大CC, http://www.cnblogs.com/me115/p/3427319.html (originally from the public account “民工哥技术之路”).
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles, focusing on operations transformation and accompanying readers throughout their operations careers.
