L100519-Linux Files - Parsing #1


Here is a simple demonstration of how to use the command line to filter text files. This is my usual method for working with log files. I'm going to illustrate it using the three commands below:

cat - concatenate files and print on the standard output
grep - search the named input FILEs for lines containing a match to the given PATTERN
wc - print the number of newlines, words, and bytes in files

For this example I'll be using the sample log file below:
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:46 mailgate sendmail[26300]: o42CIZD9026300: lost input channel from [58.237.175.156] to MTA after data
May 2 08:18:46 mailgate sendmail[26300]: o42CIZD9026300: from=, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[58.237.175.156]
May 2 08:18:51 mailgate sendmail[26303]: o42CIkX1026303: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 09:13:46 mailgate sendmail[27264]: o42DDaPN027264: lost input channel from [58.237.175.156] to MTA after data
May 2 09:13:46 mailgate sendmail[27264]: o42DDaPN027264: from=, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[58.237.175.156]
May 2 09:15:56 mailgate sendmail[27285]: o42DFshC027285: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 09:15:56 mailgate sendmail[27285]: o42DFshC027285: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]

I am using sample.log as the filename, and I want to see how many times the sender IP address 58.237.175.156 tried to relay mail during the minute 08:18 in the morning. The command line I would use is provided below:
# cat sample.log | grep '58.237.175.156' | grep '08:18' | grep -v lost | wc -l
(sample.1)

Here is how the command line is interpreted:
  1. The command cat reads the entire file and sends it down the pipe
  2. The first grep isolates the IP address I am looking for
  3. The second grep narrows the result down to the time in question
  4. The third grep uses the -v switch, which inverts the match, so lines containing "lost" are ignored
  5. Finally, wc uses the -l switch, which asks for only the line-count output
The answer is 5
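
To see where that number comes from, here is a small sketch that counts the lines left after each stage of the pipeline. It assumes every log entry sits on a single line, and it writes the sample entries to a temporary file (the variable names log, stage1, stage2 and stage3 are my own, chosen for illustration).

```shell
#!/bin/sh
# Recreate the sample log in a temporary file, one entry per line (assumed layout).
log=$(mktemp)
cat > "$log" <<'EOF'
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:46 mailgate sendmail[26300]: o42CIZD9026300: lost input channel from [58.237.175.156] to MTA after data
May 2 08:18:46 mailgate sendmail[26300]: o42CIZD9026300: from=, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[58.237.175.156]
May 2 08:18:51 mailgate sendmail[26303]: o42CIkX1026303: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 09:13:46 mailgate sendmail[27264]: o42DDaPN027264: lost input channel from [58.237.175.156] to MTA after data
May 2 09:13:46 mailgate sendmail[27264]: o42DDaPN027264: from=, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[58.237.175.156]
May 2 09:15:56 mailgate sendmail[27285]: o42DFshC027285: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 09:15:56 mailgate sendmail[27285]: o42DFshC027285: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
EOF

# Count the lines that survive each stage of the pipeline.
stage1=$(grep '58.237.175.156' "$log" | wc -l)
stage2=$(grep '58.237.175.156' "$log" | grep '08:18' | wc -l)
stage3=$(grep '58.237.175.156' "$log" | grep '08:18' | grep -v lost | wc -l)

echo "lines with the IP address: $stage1"
echo "...also at 08:18:          $stage2"
echo "...and not 'lost':         $stage3"

# Clean up the temporary file.
rm -f "$log"
```

Building a pipeline one stage at a time like this is a handy way to check that each filter does what you expect before adding the next one.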

The grep command accepts a filename, so we can skip the cat command and shorten what we need to type. The command sample below produces exactly the same result as sample.1
# grep '58.237.175.156' sample.log | grep '08:18' | grep -v lost | wc -l
(sample.2)
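
If you want to shorten things a little further, grep can also count matching lines itself with the -c switch, which makes the final wc unnecessary. The sketch below is only an illustration: it uses a four-entry excerpt of the sample log written to a temporary file, and the variable names (log, with_wc, with_c) are my own.

```shell
#!/bin/sh
# A four-entry excerpt of the sample log, one entry per line (assumed layout).
log=$(mktemp)
cat > "$log" <<'EOF'
May 2 08:18:44 mailgate sendmail[26300]: o42CIZD9026300: ruleset=check_rcpt, arg1=, relay=[58.237.175.156], reject=550 5.7.1 ... Relaying denied. IP name lookup failed [58.237.175.156]
May 2 08:18:46 mailgate sendmail[26300]: o42CIZD9026300: lost input channel from [58.237.175.156] to MTA after data
May 2 08:18:46 mailgate sendmail[26300]: o42CIZD9026300: from=, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[58.237.175.156]
May 2 09:13:46 mailgate sendmail[27264]: o42DDaPN027264: lost input channel from [58.237.175.156] to MTA after data
EOF

# The pipeline as written in sample.2, ending in wc -l.
with_wc=$(grep '58.237.175.156' "$log" | grep '08:18' | grep -v lost | wc -l)

# The same pipeline, letting the last grep do the counting with -c.
with_c=$(grep '58.237.175.156' "$log" | grep '08:18' | grep -vc lost)

echo "wc -l count: $with_wc"
echo "grep -c count: $with_c"

# Clean up the temporary file.
rm -f "$log"
```

Both forms report the same number here. I still tend to finish with wc -l because it works after any command, not just grep.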

One of the powers of Linux is the implementation of regular expressions, more commonly referred to as regex. Similar syntax is found in almost all of the Linux command line search and text manipulation tools.

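For example, the command string above, sample.2, can be shortened once again as shown below

# grep -E '08:18.*58\.237\.175\.156' sample.log | grep -v lost | wc -l
(sample.3)

Note the '-E', it tells grep to interpret PATTERN as an extended regular expression. The '.*' matches any run of characters, so the pattern matches only lines where the time appears before the IP address, which is the case for every entry in this log. A pattern of the form 'A|B' would not work here: '|' means 'or', so it would match lines containing either the IP address or the time rather than both. The dots in the IP address are escaped with a backslash because an unescaped dot matches any single character. Regular expressions are very good at matching but are awkward to use for 'ignore', which is why I have left the grep -v in after the first pipe.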
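
One regex detail worth seeing in action is the dot. In a regular expression an unescaped '.' matches any single character, so an IP address typed as a plain pattern is looser than it looks. The sketch below uses two hypothetical lines of my own (the dashed hostname is invented for illustration and is not from the sample log) and compares an unescaped pattern, an escaped one, and grep's -F switch, which treats the pattern as a fixed string.

```shell
#!/bin/sh
# Two hypothetical lines: a dotted IP address and a dashed look-alike hostname.
tmp=$(mktemp)
printf '%s\n' \
  'relay=[58.237.175.156]' \
  'relay=host-58-237-175-156.example.net' > "$tmp"

# Unescaped: each dot matches any character, so both lines match.
loose=$(grep -c '58.237.175.156' "$tmp")

# Escaped: each \. matches only a literal dot, so only the first line matches.
strict=$(grep -c '58\.237\.175\.156' "$tmp")

# -F: the pattern is a fixed string, no regex interpretation at all.
fixed=$(grep -cF '58.237.175.156' "$tmp")

echo "unescaped: $loose  escaped: $strict  fixed-string: $fixed"

# Clean up the temporary file.
rm -f "$tmp"
```

For a log like the sample above the difference rarely matters, but on larger files the stricter forms avoid surprise matches.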



