Advanced Log Processing by Dr. Anton Chuvakin

Advanced Log Processing
Dr. Anton Chuvakin
Written in 2002
DISCLAIMER:
Security is a rapidly changing field of human endeavor. The threats we face literally
change every day; moreover, many security professionals consider the rate of change
to be accelerating. On top of that, to stay in touch with such an ever-changing
reality, one has to evolve with the space as well. Thus, even though I hope that this
document will be useful for security at the time when you are reading it, please keep in
mind that it was possibly written years ago.
Introduction
"Only look for those problems that you know how to solve," says one of Murphy's
Laws. In security, it means detecting only what you plan to respond to. It is well known
that any intrusion detection system is only as good as the analyst watching its output.
Thus, having nobody watch the IDS is just as good as having no IDS at all. But how and
where should one look when drowning in an ocean of alerts, logs, messages and other
attention grabbers?
This paper deals with log collection and analysis, both extremely important parts of
the information security game. We will touch upon using logs in incident response and
handling logs in the day-to-day routine. Further, we will look at three fundamental
problems: log transmission, log collection and log analysis. We will also briefly touch
upon log storage and archival.
UNIX system administrators have always had a habit of looking at /var/log/messages
(or /var/adm/messages) in case of problems, be it a failed hardware component or a
malicious hacker attack. In the Windows world, the NT Event Log provides a similar
source of information. Logs serve both to assure that everything is OK and to help
figure out what happened in an emergency. Having logs from multiple machines collected
in one place simplifies both day-to-day maintenance and incident response. More effective
audit, secure storage and possibilities for analysis across multiple computing platforms
are some of the advantages. In addition, secure and uniform log storage might be
helpful if an intruder is prosecuted based on log evidence. In this case, careful
documentation of the log handling procedure might be needed.
I.
First, let's look at log transmission. The traditional mechanism for UNIX log transfers
is UDP, port 514 (see RFC 3164 for more details). Log messages are sent and received by
a syslog daemon. Network security devices (not only UNIX-based ones) also often use UDP
for logging. What are the evident problems with this approach? Messages can be
injected, quietly dropped (or replaced) or delayed in transit. There is no delivery
confirmation and no encryption. But how important are all those properties, considering
that syslog has been in wide use for many years? In light of the above, logging over UDP
is unsuitable for high-security environments, unless a separate LAN is used for
collecting the logging information.
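To make the fire-and-forget nature of UDP syslog concrete, here is a minimal Python sketch of a sender. It relies on the RFC 3164 convention that the priority value is facility * 8 + severity. The function names and the sample message are made up for illustration, and the sketch leaves out the timestamp and hostname header fields for brevity.

```python
import socket

# Facility and severity codes as numbered in RFC 3164.
FACILITY_AUTH = 4
SEVERITY_WARNING = 4


def build_syslog_packet(facility, severity, tag, text):
    """Build a minimal datagram of the form <PRI>TAG: TEXT."""
    pri = facility * 8 + severity
    return f"<{pri}>{tag}: {text}".encode("ascii", errors="replace")


def send_udp(packet, host="127.0.0.1", port=514):
    """Send one datagram; nothing comes back, so loss goes unnoticed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


packet = build_syslog_packet(FACILITY_AUTH, SEVERITY_WARNING,
                             "sshd", "failed login for root")
# send_udp(packet)  # uncomment to send to a collector on localhost
```

Note that send_udp() returning without an error says nothing about whether the message arrived, which is exactly the weakness described above.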
What are the alternative channels for log transmission? First, there is a standard for
reliable syslog transmission (RFC 3195), but it is not implemented by any major
vendor. The simplest approach is to accumulate logs locally and periodically copy
them to an aggregation server via SSH-based secure copy (scp). Several scripts are
available to automate the process, which consists of the following steps:
1. rotate the logs - logrotate
2. compress them - gzip
3. apply the checksum algorithm (e.g. md5sum) - md5sum
4. copy the logs from host to aggregation server - scp
5. run the md5sum again and compare
6. store the log files and the checksums in a secure place (maybe encrypted)
(see more details on this transfer method at
http://online.securityfocus.com/infocus/1394)
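The compress, checksum and compare steps of this procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, log rotation and the scp transfer itself are left out, and a local file copy stands in for the network transfer in the usage note.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compress_log(path):
    """Gzip a rotated log file next to the original; return the new path."""
    path = Path(path)
    gz_path = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def verify_copy(original_md5, copied_path):
    """Recompute the checksum on the aggregation server and compare."""
    return md5_of(copied_path) == original_md5
```

On the log-generating host one would call compress_log() and md5_of() before the copy, and on the aggregation server verify_copy() after it, storing the checksum alongside the archived file.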
However, the evident flaw of this method is its time-delayed nature. Unlike UDP-based
syslog, this log copying method introduces a lag between log
generation and safe storage/analysis. Sometimes it is critical to see the log files
immediately.
The first idea that comes to mind is tunneling. Some say it is inelegant, but it works.
Using netcat, one can tunnel UDP over Secure Shell (SSH) by redirecting the syslog
traffic to a TCP tunnel protected by SSH (see
http://www.patoche.org/LTT/security/00000118.html for directions). Make absolutely sure
that the syslog daemon is not receiving messages from other hosts, or message looping
will occur.
In fact, by replacing netcat with cryptcat, one can eliminate SSH from the equation. In
this case the setup is as follows
(cryptcat is available at http://farm9.com/content/Free_Tools/Cryptcat):
On the log-generating host:
1. edit /etc/syslog.conf to have:
*.* @localhost
2. run command:
# nc -l -u -p 514 | cryptcat 10.2.1.1 9999
On the log-collecting host:
1. run syslogd with the remote reception (-r) flag (on Linux)
2. run command:
# cryptcat -l -p 9999 | nc -u localhost 514
The stunnel SSL wrapper can be used in place of cryptcat (see, for example, the guide at
http://www.campin.net/newlogcheck.html).
If one is not satisfied with makeshift tunneling solutions or needs even more security
(such as a stronger delivery guarantee or cryptographic log integrity verification), it
is time to look at syslog replacements.
A SANS article (http://rr.sans.org/unix/syslog.php) provides a good (if outdated)
high-level comparison of syslog replacements. We will look in more detail at syslog-ng
(http://www.balabit.hu/en/downloads/syslog-ng/) by BalabIT and msyslog by CORE SDI
(http://www.corest.com). The third well-known replacement (nsyslog by Darren Reed,
http://coombs.anu.edu.au/~avalon/nsyslog.html) does not appear to be actively updated
anymore.
Their common features include TCP communication, more filtering options (in addition
to SEVERITY and FACILITY of standard syslog) and log file integrity support.
Let us look at configuring msyslog for a production
environment. Msyslog-1.08a-1 is installed from RPM packages on the client
machine (which produces logs) and the server machine (which collects logs). Both
client and server run RedHat Linux 7.2.
Making msyslog work, while easy, was considerably more difficult than
setting up a regular syslog daemon. The software documentation appears to be
contradictory at times.
The TCP-mode setup that worked involves the following changes on the hosts:
On the client:
1. Modify /etc/syslog.conf to have
*.* %tcp -a -h loghost -p 514 -m 30 -s 8192
IN PLACE OF
*.* @loghost
2. Run msyslog as "msyslogd -i linux -i unix". Just running "/etc/init.d/msyslog start"
does the trick.
On the server:
1. Run msyslog as "msyslogd -i linux -i unix -i 'tcp -a -p 514'". This
can be accomplished by modifying /etc/sysconfig/msyslog to have:
-----------
IM_LINUX="-i linux" # example: "-i linux"
IM_TCP="-i tcp -a -p 514" # example: "-i tcp accepted.host.com 514"
IM_UNIX="-i unix" # example: "-i unix"
-----------
The result is an unencrypted TCP connection (this can be verified with
'tcpdump -X tcp and port 514', which shows the log messages in clear text).
Let's add hashing to the mix. On the syslog aggregation server, where
the messages are distributed between various log files, one needs to:
1. Add a line to syslog.conf:
*.info;mail.none;authpriv.none;cron.none %peo -l -k /etc/.var.log.authlog.key %classic /var/log/messages
2. Stop the syslog daemon and rotate or erase the log files
3. Run the program to create the initial checksum (using included
'peochk' program):
# peochk -g -k /etc/.var.log.authlog.key
4. Start the syslog daemon: '/etc/init.d/msyslog start'
To test the functionality run:
peochk -f /var/log/messages -k /etc/.var.log.authlog.key
to see:
(0) /var/log/messages file is ok
on the intact log file.
If one edits the 'messages' file (say, by removing a line) and then retests, the output
is:
(1) /var/log/messages corrupted
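The idea behind this kind of integrity check can be illustrated with a small Python sketch. It is a simplified chained keyed hash, not the actual algorithm used by msyslog's peo module or by peochk; the function names and the choice of HMAC-SHA256 are assumptions made purely for illustration.

```python
import hashlib
import hmac


def seal(lines, initial_key):
    """Chain a keyed hash over all lines, evolving the key after each one."""
    key = initial_key
    tag = b""
    for line in lines:
        # Each tag covers the previous tag plus the current line.
        tag = hmac.new(key, tag + line.encode("utf-8"), hashlib.sha256).digest()
        # One-way key evolution: the old key cannot be derived from the new one.
        key = hashlib.sha256(key).digest()
    return tag


def is_intact(lines, initial_key, expected_tag):
    """Recompute the chain from the initial key and compare the final tag."""
    return hmac.compare_digest(seal(lines, initial_key), expected_tag)
```

Because every tag depends on all earlier lines, removing or editing a single line changes the final tag, which is the behavior the peochk test above demonstrates.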
The advantages of msyslog include: it uses the same /etc/syslog.conf file
as regular syslog, offers full syslog interoperability in UDP mode (as client
and server) and has extensive regular expression support that allows
matching messages on various source, time and content fields. The
version that was evaluated also used an easy-to-use RedHat-style
/etc/sysconfig file to control daemon startup options. The options
include various modules that handle I/O, such as receive TCP/UDP,
send TCP/UDP, send to database, record cryptographic hash and
others. Msyslog also features a debug mode that was very helpful
during the setup.
The software is not without minor problems. Authentication is weak (by
host only, thus any user on an allowed host can connect via telnet
to the TCP port and send spoofed messages). Encryption has to be
implemented via third-party tools (the easiest way is SSH port
forwarding or an SSL wrapper such as stunnel; however, the performance
impact of either is unclear). The most secure setup will involve binding
msyslog to the localhost address (127.0.0.1) and using SSH RSA/DSA
authentication for access control, together with hash integrity
checking on the logging server.
Another important issue is buffering. Since TCP offers reliable
delivery (unlike UDP), some measures should be taken to keep the log
messages in case the log server goes down. Msyslog offers a configurable
buffering option. In the above configuration, '-m 30 -s 8192' stands
for the retry limit (in seconds) and the buffer size (in message
lines). Buffers are also important for dealing with message bursts,
which do happen, for example, when programs are starting up or when a
"noisy" firewall is getting port scanned.
Syslog-ng (version 1.5.17-1) was installed from RPM packages on the
same test systems. Syslog-ng supports TCP connections, filtering based
on message contents, logging of the complete chain of forwarding loghosts
(unlike regular syslog, which records only the name of the last hop),
etc. Extensive documentation is available.
To make syslog-ng work, one has to use the included conversion tool to
convert /etc/syslog.conf to a syslog-ng format file. The command:
# /usr/share/doc/syslog-ng-1.5.17/syslog2ng < /etc/syslog.conf > syslog-ng.conf
does the trick. An excerpt of the resulting file is shown below:
----------------------
# global options
options { use_dns(yes);
use_fqdn(no);
use_time_recvd(no);
chain_hostnames(no);
mark(0);
sync(0);
};
source s_local { internal();
unix-stream("/dev/log" keep-alive(yes) max-connections(10));
file("/proc/kmsg");
};
# *.* @anton
destination d_2 {
tcp("anton" port(514));
};
filter f_5 {
level(debug...emerg);
};
log { source(s_local); filter(f_5); destination(d_2); };
---------------------
It is easy to follow the logic, even though the file is different from
regular syslog. However, writing files by hand and using advanced
options will take some learning. For example, to enable TCP logging
one has to replace udp with tcp in the configuration file (done in the
above example: see 'tcp("anton" port(514));'). Again, by default the
communication is not encrypted.
Syslog-ng also features more granular access control and can use TCP
wrappers to limit network access. The program can also redirect
messages to custom programs for real-time processing. For example, to
send every log message to the STDIN of the "correlate.sh" script, add
to the config file:
------
log { source(s_local); destination(d_prg); };
destination d_prg { program("/home/bin/correlate.sh -a"); };
------
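What a program on the receiving end of such a destination might do can be sketched in Python. This is a hypothetical stand-in for a script like correlate.sh, not something shipped with syslog-ng, and it assumes that each line arrives in the classic "Mon DD HH:MM:SS host message" layout; the actual layout depends on how the destination is configured.

```python
def parse_lines(stream):
    """Yield (host, message) pairs one at a time as lines arrive."""
    for line in stream:
        # Split into month, day, time, host and the rest of the message.
        fields = line.rstrip("\n").split(None, 4)
        if len(fields) == 5:
            yield fields[3], fields[4]
        # Lines that do not fit the assumed layout are silently skipped.


# Typical use in a standalone script (reads until syslog-ng closes the pipe):
#
#     import sys
#     for host, message in parse_lines(sys.stdin):
#         print(host, message, flush=True)
```

Processing the input line by line, rather than reading it all at once, is what keeps the handling close to real time.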
The first test performed was an interoperability test: a syslog-ng
client successfully sent messages over UDP and TCP to an msyslog server.
Syslog-ng comes with a stress test tool (which calls the '/usr/bin/logger'
command in a large loop). However, it is apparent that TCP transfers
will be slower than UDP, even with no encryption. The syslog
replacements should be stress tested (at well above normal message
rates) before enterprise deployment. It should be noted that for
conventional syslog UDP transmission the failure mode is losing
messages; it is not clear how the TCP-enabled daemons will behave.
Overall, msyslog and syslog-ng are viable options where extra security
is desired. However, detailed testing is required before deployment.
A few words on covert logging. If one is running a honeypot (like we
do) and is experiencing intense paranoia about attackers detecting
the system and keyboard log transfers, some covert options are
available. An encrypted, spoofed UDP transfer mechanism has been proposed
for this purpose. However, that discussion goes beyond the scope of
this paper.
II.
The typical method of log collection is a dedicated
logging host, holding the log records from many machines in a single
mammoth file, rotated and compressed periodically. This method has been used
since the early days of UNIX and there are few disadvantages to it.
Logging to a database brings us to the next level of log
aggregation. Msyslog has native support for logging to a database
(MySQL and PostgreSQL). To configure, do the following on the log
collecting server:
0. Install and start mysql
# /etc/init.d/mysql start
1. Create a database instance:
# echo "CREATE DATABASE msyslog;" | mysql -u root -p
2. Define tables for log storage:
# cat syslog-sql.sql | mysql msyslog
The file is shown below:
-------------------
CREATE TABLE syslogTB (
facility char(10),
priority char(10),
date date,
time time,
host varchar(128),
message text,
seq int unsigned auto_increment primary key
);
------------------------
3. Edit syslog.conf to enable database-logging module:
*.* %mysql -s localhost -u snort -d msyslog -t syslogTB
4. Grant access privileges for message insertion:
# echo "grant INSERT,SELECT on msyslog.* to snort@localhost;" | mysql -u root -p
5. Restart msyslog
# /etc/init.d/msyslog restart
This is how the result looks in phpMyAdmin:
[picture 1]
In this setup, message queuing is enabled on the client (which uses the
configuration shown above). In some cases, it was necessary to restart
the client after a server restart when using TCP mode.
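Client-side message queuing of this kind (the '-m' and '-s' options of the msyslog tcp module) can be pictured with a short Python sketch. It does not reproduce msyslog's actual behavior: the class name, the drop-oldest policy and the use of OSError to signal an unreachable server are all assumptions chosen for illustration.

```python
from collections import deque


class BufferedForwarder:
    """Hold undelivered lines in a bounded queue until the server accepts them."""

    def __init__(self, send, max_lines=8192):
        self._send = send                      # callable that delivers one line
        self._pending = deque(maxlen=max_lines)
        self.dropped = 0                       # lines lost to buffer overflow

    def log(self, line):
        if len(self._pending) == self._pending.maxlen:
            self.dropped += 1                  # the oldest line is about to be evicted
        self._pending.append(line)
        self.flush()

    def flush(self):
        """Deliver queued lines in order; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                return                         # server unreachable, keep the queue
            self._pending.popleft()
```

A line is removed from the queue only after the send call succeeds, so a server outage shorter than the buffer capacity loses nothing, while a longer one is at least counted in the dropped attribute.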
Other tools to collect syslog messages include SQLSyslogd
(http://www.frasunek.com/sources/security/sqlsyslogd/), which can be
used with regular syslog daemon.
Collecting logs in a database presents several important advantages
over plain text storage. Databases can be set up to accept messages at
much higher rates. In our tests, for simple messages the msyslog-MySQL
combination received and archived about 240 messages per
second. A sustained rate of thousands of messages per second is not
unheard of for commercial log aggregation software, which can also
analyze the resulting massive datasets. If proper analysis software is
available, a log database can be used to analyze the data more
effectively. For example, fine-grained searches can be performed (as
will be shown below).
III.
Now let's turn to the analysis part. Log analysis is often defined as
extracting meaningful intrusion data and historical trends from log
files.
In fact, a lot of good software has been written to analyze plain text log
files (a large resource list is available at
http://www.counterpane.com/log-analysis.html). Since there is no space here to
review all the log analysis scripts, it makes sense to review the
approaches and then formulate suggestions on log analysis.
Log analysis can be split into real-time and periodic. Tools like
"swatch" or "logsurfer" provide real-time log processing, while
"logcheck", "logwatch" and many others use the periodic approach.
Both approaches work, provided you know what to look for and (usually)
can write some sophisticated regular expressions to separate the wheat
from the chaff. Some of the analysis scripts come with an extensive
set of default regexes (logcheck), while others make you create your
own (swatch). The techniques are well covered in their documentation.
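The regex-driven approach shared by these tools can be shown with a minimal Python sketch. The two rules and their labels are invented for illustration and are not taken from the default rule sets of swatch or logcheck.

```python
import re

# Ordered list of (pattern, label) rules; the first match wins.
RULES = [
    (re.compile(r"Failed password for"), "failed-login"),
    (re.compile(r"syslogd: restart"), "syslog-restart"),
]


def classify(line):
    """Return the label of the first matching rule, or None for noise."""
    for pattern, label in RULES:
        if pattern.search(line):
            return label
    return None
```

A real-time tool would call classify() on each line as it arrives, while a periodic tool would run it over the lines added since the last report; the rule table stays the same in both cases.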
One tool deserves special mention for having a much more sophisticated
real-time analysis and correlation engine. SEC (Simple Event
Correlator) by Risto Vaarandi (http://www.estpak.ee/~risto/sec/)
offers more than regex matching on a line-by-line basis. The program
offers an extensive list of sophisticated multi-event matches, such as:
match an event A and wait X seconds for an event B to arrive,
then execute an action; or match an event A, then count the same events for X
seconds and execute an action if a threshold is exceeded. In
addition, events can be matched across multiple lines. Provided that
you know what to look for, the program can be an extremely powerful
tool for log analysis.
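The "count the same events for X seconds and act if a threshold is exceeded" idea can be sketched in a few lines of Python. This is a simplified sliding-window counter written for illustration; it is not SEC's implementation or rule syntax, and the class and method names are hypothetical.

```python
from collections import deque


class ThresholdRule:
    """Fire when `threshold` matching events fall within `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self._times = deque()

    def observe(self, timestamp):
        """Record one matching event; return True if the action should run."""
        self._times.append(timestamp)
        # Forget events that are older than the correlation window.
        while timestamp - self._times[0] > self.window:
            self._times.popleft()
        if len(self._times) >= self.threshold:
            self._times.clear()   # start counting afresh after firing
            return True
        return False
```

For example, ThresholdRule(60, 3) fed with failed-login events at seconds 0, 10 and 20 fires on the third event, whereas the same three events spread over several minutes never trigger the action.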
But what if the logs are stored in a database? What
techniques can one use to analyze those? Instead of running a script,
one can write a simple (or not-so-simple, if desired) SQL file to
look through events and establish relationships.
To analyze logs, one might want to run various SELECT queries, such as
(the msyslog database described above is used for tests):
--------------
A. High-level overview queries:
number of events:
select count(*) from syslogTB;
number of hosts that sent messages:
select count(distinct host) from syslogTB;
number of messages per host:
select host, count(host) from syslogTB group by host order by host desc;
Sample output from MySQL:
+------------------+-------------+
| host             | count(host) |
+------------------+-------------+
| box1             |          20 |
| box2.example.com |        3147 |
+------------------+-------------+
B. Drill-down detailed reports:
search by hostname:
select * from syslogTB where host like "%box1%";
search by message text:
select * from syslogTB where message like "%restart%";
search by combination:
select * from syslogTB where message like "%restart%" and host like "box1";
select * from syslogTB where message like "%restart%" and time like "10:51%";
Sample output from MySQL:
+----------+----------+------------+----------+------------------+------------------+-----+
| facility | priority | date       | time     | host             | message          | seq |
+----------+----------+------------+----------+------------------+------------------+-----+
| NULL     | NULL     | 2002-05-17 | 10:51:19 | box2.example.com | syslogd: restart |   1 |
| NULL     | NULL     | 2002-05-17 | 10:51:41 | box2.example.com | syslogd: restart |   3 |
+----------+----------+------------+----------+------------------+------------------+-----+
count the number of events:
select count(*) from syslogTB where message like "%restart%";
see all unique message types:
select distinct message from syslogTB;
--------------
The limit is one's creativity, since SQL syntax is very flexible and
allows extremely complicated queries to be built. Just keep in mind
that with a loaded database the multi-message queries might take a
while (however, still much less than doing a 'grep' on a mammoth
plain text file).
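For readers who want to try such queries without a MySQL server, the following Python sketch runs two of them from a script. It swaps in an in-memory SQLite database purely for convenience, so the column types differ from the MySQL table definition given earlier; the helper function names are hypothetical, and the three sample rows are invented (two of them mirror the restart messages in the sample output above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE syslogTB (
        facility TEXT, priority TEXT, date TEXT, time TEXT,
        host TEXT, message TEXT,
        seq INTEGER PRIMARY KEY AUTOINCREMENT)"""
)
sample_rows = [
    ("2002-05-17", "10:51:19", "box2.example.com", "syslogd: restart"),
    ("2002-05-17", "10:51:30", "box1", "sshd: session opened"),
    ("2002-05-17", "10:51:41", "box2.example.com", "syslogd: restart"),
]
conn.executemany(
    "INSERT INTO syslogTB (date, time, host, message) VALUES (?, ?, ?, ?)",
    sample_rows,
)


def per_host_counts(db):
    """Number of messages per host, as in the overview queries."""
    return db.execute(
        "SELECT host, COUNT(host) FROM syslogTB GROUP BY host ORDER BY host DESC"
    ).fetchall()


def search(db, text, time_prefix):
    """Drill-down by message text and time prefix, using bound parameters."""
    return db.execute(
        "SELECT seq, host, message FROM syslogTB "
        "WHERE message LIKE ? AND time LIKE ? ORDER BY seq",
        (f"%{text}%", f"{time_prefix}%"),
    ).fetchall()
```

Passing the search terms as bound parameters rather than pasting them into the SQL string is a deliberate choice: log messages are attacker-influenced data, and an analysis script should not be open to SQL injection.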
In conclusion, several best practices for system logging are
provided:
• Use a dedicated logging host
• Make sure that tight access controls are enabled on all logging servers
• Encrypt logs if a business case for this exists
• Try to log to more than one box to increase reliability
• Watch for overflowing log partitions/storage
• For more security, store logs on WORM media (if a business case exists) or transfer
them to a non-networked computer
Clearly, not all of the above have to be implemented in all
environments. The list provides some of the things that will ensure
that one will be able to track an incident when it occurs.
ABOUT AUTHOR:
This is an updated author bio, added to the paper at the time of reposting in 2009.
Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the
field of log management and PCI DSS compliance. He is an author of the books "Security
Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy II", "Information
Security Management Handbook" and others. Anton has published dozens of papers
on log management, correlation, data analysis, PCI DSS and security management (see the
list at www.info-secure.org). His blog http://www.securitywarrior.org is one of the most
popular in the industry.
In addition, Anton teaches classes and presents at many security conferences across
the world; he recently addressed audiences in the United States, UK, Singapore, Spain,
Russia and other countries. He works on emerging security standards and serves on
the advisory boards of several security start-ups.
Currently, Anton is developing his security consulting practice, focusing on logging and
PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton
Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously,
Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the
world about the importance of logging for security, compliance and operations. Before
LogLogic, Anton was employed by a security vendor in a strategic product
management role. Anton earned his Ph.D. degree from Stony Brook University.
