awk for Log Parsing: 5 Patterns You'll Actually Use

awk Is a Mini-Language

Most people treat awk as a line processor. It is—but it’s also a full programming language with variables, loops, and functions. For logs, you barely need that. Five patterns cover everything.

The basic structure:

awk 'pattern { action }' file.log

If the pattern matches, the action runs. If you omit the pattern, the action runs for every line.

1. Filter Lines by Condition

Extract 404 errors from an Apache log:

awk '$9 == 404' access.log

$9 is the 9th field (the HTTP status code). This prints every line where status is 404.

More complex: status code AND a path:

awk '$9 == 500 && $7 ~ /api/' access.log

$7 is the path. ~ means “matches regex”. This gets 500 errors from /api/* paths.

Need the opposite? Use !~:

awk '$7 !~ /favicon|static/' access.log

Exclude favicon and static requests.

2. Count Occurrences

How many 404s are in your log?

awk '$9 == 404' access.log | wc -l

But awk is faster:

awk '$9 == 404 { count++ } END { print count }' access.log

{ count++ } runs for each matching line. END runs after all lines. print count outputs the total.

Count by status code:

awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' access.log

status[$9] is an associative array keyed by HTTP status. After processing all lines, loop through and print counts.

Output:

200 15432
404 234
500 12

3. Sum a Field

Your app logs request latency in milliseconds. What’s the total? The average?

awk -F: '{ total += $3; count++ } END { print "Total:", total, "ms | Average:", total/count, "ms" }' latency.log

-F: sets the field separator to : (useful if your log format uses colons). $3 is the latency field. Accumulate in total, count lines, then divide for average at the end.

Input log line:

request:user123:145
request:user456:89
request:user789:201

Output:

Total: 435 ms | Average: 145 ms

4. Extract and Reformat

You have tab-separated logs. Extract name and email, reformat as CSV:

awk -F'\t' '{print $2 "," $3}' users.log

Input (tab-separated):

ID  Name  Email
1  alice  alice@example.com
2  bob  bob@example.com

Output:

Name,Email
alice,alice@example.com
bob,bob@example.com

More advanced: extract a date range:

awk -F'[: ]' '$4 >= "09:00" && $4 < "17:00"' access.log

-F'[: ]' uses multiple delimiters (colon or space). $4 is the hour. This gets logs between 9 AM and 5 PM.

5. Conditional Formatting

Print lines longer than 1000 characters with line numbers:

awk 'length > 1000 { print NR": " $0 }' largefile.log

length is the line length. NR is the line number. $0 is the entire line.

Another common one: print lines matching a pattern with context (2 lines before and after):

awk '/ERROR/ { for (i = 1; i <= 2; i++) if (NR - i in a) print a[NR - i]; print NR": " $0; next } { a[NR] = $0 }' app.log

Actually, that’s getting gnarly. For context, use grep:

grep -B 2 -A 2 'ERROR' app.log

But within awk? Mark important lines:

awk '/ERROR|FATAL/ { print "*** " $0 " ***" } !/ERROR|FATAL/ { print $0 }' app.log

Real Example: Analyzing a Web Server Log

You have an Apache log. You want:

Count requests per HTTP status
Find the slowest requests
Show only non-2xx/3xx responses

# 1. Counts by status
awk '{ status[$9]++ } END { for (s in status) print s ": " status[s] }' access.log | sort -t: -k2 -rn

# 2. Top 10 slowest requests
awk '{ print $10, $7 }' access.log | sort -rn | head -10

# 3. Filter to error codes
awk '$9 ~ /^[45]/' access.log

Line 1: count by $9 (status), sort by count descending. Line 2: print response time ($10), then path ($7). Sort by time descending. Line 3: regex match—if status starts with 4 or 5, print it.

When to Stop Using awk

If your log parsing needs:

Complex regex with named groups
Multiple passes over the data
JSON parsing

Then use jq (JSON) or switch to Python. awk is fast and scriptable, but it has limits.

For everything else—filtering, summing, reformatting—awk is the 20-year-old tool that still outperforms the new hotness.

awk for Log Parsing: 5 Patterns You'll Actually Use

awk Is a Mini-Language

1. Filter Lines by Condition

2. Count Occurrences

3. Sum a Field

4. Extract and Reformat

5. Conditional Formatting

Real Example: Analyzing a Web Server Log

When to Stop Using awk

Responses from around the web

Discussion

Related Posts

SumGuy’s Guide to Linux Log Analysis

Logrotate & Compression

Bulk File Renaming on Linux: rename, vidir, fd

Named Pipes (FIFOs) in Shell Scripts