Classic Linux Commands for Production Log Analysis
Original Post: https://dzone.com/articles/replacing-modern-tools-with-retro-linux-commands
When troubleshooting production systems, we often encounter legacy infrastructure that lacks modern monitoring tools. Recently, I faced a situation where understanding system behavior from an isolated server required going back to basics with classic Unix commands.
The scenario: An aging production system generating log data, but disconnected from our centralized logging infrastructure. Rather than treating this as a blocker, I recognized an opportunity to demonstrate how foundational Linux tools solve real production problems.
Note on Command Complexity: While I’ll demonstrate some elaborate command chains for educational purposes, production systems benefit from proper scripting. These examples showcase the building blocks that can be combined into robust data extraction and analysis pipelines.
First, it's time to see where this esoteric system puts its log files. We'll use locate as our first pass.
locate .log
Technical insight: locate leverages a pre-built database maintained by updatedb. Its results are only as fresh as the last database update, but it's invaluable for a rapid first pass across a large filesystem.
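If the database is stale, it can be refreshed before searching; a quick sketch, assuming an mlocate-style setup where updatedb needs root privileges:

sudo updatedb       # rebuild the file-name database; may take a while on a large filesystem
locate -i .log      # re-run the search case-insensitively against the fresh database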
For a more exhaustive search, we can reach for find.
find / -name \*.log\*
Technical insight: find performs real-time filesystem traversal, ensuring completeness at the cost of speed. This exhaustive search is crucial when you need to be certain you’ve found every relevant file. The search pattern uses wildcards to capture all log variations.
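When a full traversal is too slow or too noisy, find's own tests can narrow things down; a sketch, assuming GNU find (the time and size thresholds here are arbitrary examples):

# only files modified in the last 7 days and larger than 1 MiB; discard permission errors
find / -name "*.log*" -mtime -7 -size +1M 2>/dev/null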
With thousands of results, we need intelligent filtering:
locate .log | grep -v kern.log | grep -v other.log
Pipeline optimization: Using grep’s inverse match (-v) to exclude noise is fundamental when working with large result sets. System logs often contain repetitive patterns you don’t care about. This filtering technique scales well when automated.
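The same exclusions can be expressed in a single grep invocation, or kept in a pattern file once the list grows; a sketch, where exclude_patterns.txt is a hypothetical file with one pattern per line:

locate .log | grep -v -e kern.log -e other.log    # several -e patterns, one pass
locate .log | grep -v -f exclude_patterns.txt     # read the exclusion patterns from a file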
Success: the esoteric system's log files turn up under a random home directory, so it must run under its own user.
/export/home/random/random.log
/export/home/random/random.log.1.gz
/export/home/random/random.log.2.gz
/export/home/random/random.log.3.gz
/export/home/random/random.log.4.gz
Compressed logs are common in production systems to manage storage costs. The z* family of commands (zcat, zgrep, zdiff) processes compressed data directly - a crucial efficiency when dealing with large volumes of historical log data.
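For example, zgrep can tell us up front which rotations even contain errors, sparing a manual zcat per file; a quick sketch:

zgrep -l -i error /export/home/random/random.log.*.gz    # list only the archives that contain a match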
Let’s extract error patterns:
zcat /export/home/random/random.log.2.gz | grep -i error
Pattern extraction: Case-insensitive matching (-i) captures all error variations - critical when you don’t know the exact case of error messages in the logs.
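If the system mixes severity labels, an extended regular expression lets one pass catch several of them; a sketch, where the extra keywords are assumptions about this particular log format:

zcat /export/home/random/random.log.2.gz | grep -iE 'error|fail|fatal'    # -E enables alternation, so any of the keywords matches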
When dealing with high-volume error data, quantification becomes essential:
zcat /export/home/random/random.log.2.gz |
grep -i error |
wc -l
What is going on?: We are extending our last example with wc -l to count the number of lines (rather than words) in the output.
10033 … that is a lot of errors! Time to find out how many different types of errors we are getting. It is also worth investigating the error format by looking at a few entries from the start of the file with head, which returns 10 lines by default, though we only need 5. We could also have used tail to grab lines from the end.
zcat /export/home/random/random.log.2.gz |
head -5
What is going on?: We are changing our last example: instead of filtering and counting, we simply return the first 5 lines of the raw log using head so we can inspect the format.
Note: A common use for tail is monitoring files as they are changed using tail -f file_name.
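With GNU tail, -F is often the better choice for log files, since it keeps following even after the file is rotated:

tail -F /export/home/random/random.log    # like -f, but reopens the file if it is rotated or recreated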
The output of the above command is:
Apr 28 17:06:20 ip-172-31-11-241 random[12545]: reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net failed - POSSIBLE BREAK-IN ATTEMPT! [preauth]
Apr 28 17:06:21 ip-172-31-11-241 random[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 29 17:06:22 ip-172-31-11-241 random[12547]: Invalid user admin from 216.19.2.8 [auth]
Apr 29 17:06:23 ip-172-31-11-241 random[12547]: input_userauth_request: invalid user admin [preauth]
Apr 29 17:06:24 ip-172-31-11-241 random[12547]: questionable user request: delete everything [usage]
Hmph, that usage one is interesting; let's see how many of those there are. We don't want to use grep here, because it could match the word in the wrong place, so we need to isolate the specific field we want. cut to the rescue.
Note: This could also have been done with either sed or awk.
zcat /export/home/random/random.log.2.gz |
cut -d "[" -f 3 |
cut -d "]" -f 1
What is going on?: We have now added the cut command and given it a delimiter to split each line on. The first cut splits on the “[” character, which turns
Apr 28 17:06:21 ip-172-31-11-241 random[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
into
Apr 28 17:06:21 ip-172-31-11-241 random
12545]: Received disconnect from 216.19.2.8: 11: Bye Bye
preauth]
Then we take the 3rd field, “preauth]”, and split it again with cut on “]”, which leaves us with just:
preauth
Note: There are a lot of tools we could have used to trim the trailing “]”; sed or awk would work here too.
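For comparison, here is roughly what those alternatives might look like, assuming the bracketed tag is always the last token on the line:

zcat /export/home/random/random.log.2.gz | awk '{ gsub(/[][]/, "", $NF); print $NF }'    # strip the brackets from the last whitespace-separated field
zcat /export/home/random/random.log.2.gz | sed -n 's/.*\[\([^]]*\)\]$/\1/p'              # capture whatever sits inside the final [...] pair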
Perfect. For our five-line sample above, the cut pipeline gives us the following output:
preauth
preauth
auth
preauth
usage
Terrific. Now we want counts of each one, so we add in a bit of sort and uniq magic.
zcat /export/home/random/random.log.2.gz |
cut -d "[" -f 3 |
cut -d "]" -f 1 |
sort |
uniq -c |
sort -bnr
What is going on?: We pipe the output of our last command to sort, giving us the output:
auth
preauth
preauth
preauth
usage
Then we use uniq to collapse adjacent duplicate lines (which is why we sorted first), and by adding -c we prefix each line with its count.
1 auth
3 preauth
1 usage
We pipe it back into sort with -bnr which will (-b) ignore leading blanks, (-n) order using numeric value, and (-r) reverse the order putting highest values first.
3 preauth
1 usage
1 auth
From this we extract some interesting numbers and clues to the problem, but we really need it grouped by day to find patterns. This one introduces a few new things: awk, the first appearance of xargs, and zgrep.
zcat /export/home/random/random.log.2.gz |
awk '{print $1" "$2}' |
uniq |
xargs -I '{}' sh -c '
echo "{}";
zgrep "{}" /export/home/random/random.log.2.gz |
cut -d "[" -f 3 |
cut -d "]" -f 1 |
sort |
uniq -c |
sort -bnr'
What is going on?: We start by piping the decompressed log into awk, which keeps the first two fields (the month and day) and discards the rest. Then we remove duplicate days using uniq and are left with a series of days like “Apr 28” and “Apr 29.” We pipe those days into xargs, which runs the given command once for each line passed to it, in this case each day. We output the day with echo, then zgrep the file for that day and pipe the matches into the same series of steps we came up with before, including the final sort -bnr so each day's counts come out highest first. (A single-pass alternative using only awk is sketched after the output below.)
That outputs:
Apr 28
2 preauth
Apr 29
1 auth
1 preauth
1 usage
...
May 5
5152 usage
4 auth
2 preauth
...
Identifying temporal patterns in system behavior is crucial for troubleshooting. The spike on May 5th represents exactly the kind of anomaly worth investigating further.
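As an aside, the same per-day breakdown can be computed in a single pass with awk, avoiding the need to re-read the compressed file once per day; a sketch, assuming every line ends with a bracketed tag:

zcat /export/home/random/random.log.2.gz |
awk '{
  day = $1 " " $2                # month and day, e.g. "Apr 28"
  tag = $NF                      # last field, e.g. "[preauth]"
  gsub(/[][]/, "", tag)          # strip the surrounding brackets
  count[day ", " tag]++          # tally per day and type
}
END { for (key in count) print count[key], key }'

The END loop prints in arbitrary order, so in practice the result would be piped through sort; the xargs version above keeps the per-day grouping without that extra step.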
Now, let’s transform this data into a structured format suitable for reporting or further analysis:
echo "date, time, ip, process, pid, message, type" > boss_output.csv && zcat /export/home/random/random.log.2.gz | sed -n 's/\(\w\+\) \(\w\+\) \(\w\+:\w\+:\w\+\) \(.[^ ]\+\) \(\w\+\)\[\(\w\+\)\]: \(.[^\[]*\) \[\(\w\+\).*/\1 \2, \3, \4, \5, \6, \7, \8/p' >> boss_output.csv
Data structuring: This sed transformation converts unstructured logs into CSV format - easily imported into spreadsheets or analysis tools. The regex captures key fields: temporal data for time-series analysis, process information for tracking behavior, and message content for categorization.
The resulting structured data:
date, time, ip, process, pid, message, type
Apr 28, 17:06:20, ip-172-31-11-241, random, 12545, reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net failed - POSSIBLE BREAK-IN ATTEMPT!, preauth
Apr 28, 17:06:21, ip-172-31-11-241, random, 12545, Received disconnect from 216.19.2.8: 11: Bye Bye, preauth
Apr 29, 17:06:22, ip-172-31-11-241, random, 12547, Invalid user admin from 216.19.2.8, auth
Apr 29, 17:06:23, ip-172-31-11-241, random, 12547, input_userauth_request: invalid user admin, preauth
Apr 29, 17:06:24, ip-172-31-11-241, random, 12547, questionable user request: delete everything, usage
... (continues with thousands of structured records)
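Once the data is in CSV form, the same command-line tools can answer quick questions before the file ever reaches a spreadsheet; a small sketch against the boss_output.csv produced above, assuming none of the message fields contain commas:

cut -d, -f7 boss_output.csv | tail -n +2 | sort | uniq -c | sort -bnr    # records per type, highest first, skipping the header row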
Legacy systems often contain years of valuable log data, and these foundational tools scale to terabytes of it when properly orchestrated. Converting unstructured logs to CSV or JSON opens the data up to analysis in any tool, the same techniques can be automated for continuous monitoring, and leaning on built-in commands reduces infrastructure costs while maintaining flexibility.
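And, as noted at the top, once a pipeline has proven itself it belongs in a script rather than in shell history; a minimal sketch, with hypothetical paths and no error handling:

#!/bin/sh
# error_summary.sh - hypothetical wrapper around the pipeline built up above
LOG_DIR=/export/home/random                      # assumed location from the examples
OUT=/var/tmp/error_summary_$(date +%Y-%m-%d).txt

# count bracketed message types across all rotated archives, highest first
zcat "$LOG_DIR"/random.log.*.gz |
  cut -d "[" -f 3 |
  cut -d "]" -f 1 |
  sort | uniq -c | sort -bnr > "$OUT"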
For a modern take on Unix philosophy applied to monitoring, see graph-handles, which uses the same composable approach for file handle tracking. For deeper terminal mastery, Terminal Reloaded covers essential skills for production AI systems.