Advanced Log Processing by Dr. Anton Chuvakin
Written in 2002
DISCLAIMER:
Introduction
"Only look for those problems that you know how to solve," says one of Murphy's
Laws. In security, this means detecting only what you plan to respond to. It is well known
that any intrusion detection system is only as good as the analyst watching its output.
Thus, having nobody watch the IDS is just as good as having no IDS at all. But how and
where should one look when drowning in an ocean of alerts, logs, messages and other
attention grabbers?
This paper deals with log collection and analysis, both extremely important parts of
the information security game. We will touch upon using logs in incident response and
handling logs in the day-to-day routine. Further, we will look at three fundamental
problems: log transmission, log collection and log analysis. We will also briefly touch
upon log storage and archival.
I.
First, let's look at log transmission. The traditional mechanism for UNIX log transfers is
UDP, port 514 (see RFC 3164 for more details). Log messages are sent and received by
a syslog daemon. Network security devices (not only UNIX-based ones) also often use UDP
for logging. What are the evident problems with this approach? Messages can be
injected, quietly dropped (or replaced) or delayed in transit. There is no delivery
confirmation and no encryption. But how important are all those properties, considering
that syslog has been in wide use for many years? In light of the above, logging over UDP is
unsuitable for high-security environments unless a separate LAN is used for collecting
the logging information.
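To make the weaknesses above concrete, here is a minimal Python sketch of what a UDP syslog sender does. The priority calculation (facility * 8 + severity) and the general line layout follow RFC 3164; the function names, the default address and the sample values are illustrative assumptions, not part of any syslog daemon.

```python
import socket
import time

# Facility and severity codes as defined in RFC 3164.
FACILITY_AUTH = 4
SEVERITY_WARNING = 4


def format_syslog(facility, severity, hostname, tag, text, when=None):
    """Build a BSD-style syslog line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    pri = facility * 8 + severity
    t = time.localtime(when)
    # RFC 3164 pads a single-digit day with a space, not a zero.
    stamp = "%s %2d %s" % (time.strftime("%b", t), t.tm_mday,
                           time.strftime("%H:%M:%S", t))
    return "<%d>%s %s %s: %s" % (pri, stamp, hostname, tag, text)


def send_udp(message, host="127.0.0.1", port=514):
    """Fire and forget: no acknowledgement, no retry, no encryption."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```

Note that send_udp() returns normally whether or not anything is listening on the other end, and nothing in the datagram proves who sent it. That is exactly the lack of delivery confirmation and the ease of injection described above.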
What are the alternative channels for log transmission? First, there is a standard for
reliable syslog transmission (RFC 3195), but it is not implemented by any major
vendor. The simplest approach is to accumulate logs locally and periodically copy
them to an aggregation server via SSH-based secure copy (scp). There are several
scripts available to automate the process:
6. store the log files and the checksums in a secure place (maybe encrypted)
However, the evident flaw of this method is its time-delayed nature. Unlike
UDP-based syslog, this log copying method introduces a lag between log
generation and safe storage/analysis. Sometimes it is critical to see the log files
immediately.
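The checksum part of the copy-based approach can be sketched as follows. This is an illustration only, not one of the scripts referred to above: the function names, the choice of SHA-256 and the manifest format are assumptions. The manifest should be kept outside the log directory, ideally on the aggregation server.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large logs do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(log_dir, manifest_path):
    """Record 'HEXDIGEST  filename' for every regular file in log_dir."""
    names = sorted(name for name in os.listdir(log_dir)
                   if os.path.isfile(os.path.join(log_dir, name)))
    with open(manifest_path, "w") as out:
        for name in names:
            digest = sha256_of_file(os.path.join(log_dir, name))
            out.write("%s  %s\n" % (digest, name))
    return names


def verify_manifest(log_dir, manifest_path):
    """Return the names of files that are missing or no longer match."""
    changed = []
    with open(manifest_path) as manifest:
        for line in manifest:
            recorded, name = line.rstrip("\n").split("  ", 1)
            try:
                current = sha256_of_file(os.path.join(log_dir, name))
            except OSError:
                current = None  # file removed or unreadable
            if current != recorded:
                changed.append(name)
    return changed
```

A typical use would be to call write_manifest() right after the copy completes and verify_manifest() before any later analysis. An empty result means the stored copies still match the recorded checksums.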
The first idea that comes to mind is tunneling. Some say it's inelegant, but it works.
Using netcat, one can tunnel UDP over Secure Shell (SSH) by redirecting the syslog
traffic to a TCP tunnel protected by SSH. The directions can be found at
http://www.patoche.org/LTT/security/00000118.html. Make absolutely sure that the
syslog daemon is not receiving messages from other hosts, or message looping will occur.
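The relaying step that netcat performs in such a tunnel can be sketched in Python as below. This is a plain relay under stated assumptions: it does no encryption itself (in the setup described above that is the job of the SSH tunnel carrying the TCP stream), and the function names and the one-message-per-line framing are illustrative choices.

```python
import socket


def open_relay_sockets(listen_port, tcp_host, tcp_port):
    """Bind a local-only UDP socket and connect the TCP side of the tunnel."""
    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Listening on loopback only avoids picking up messages from other hosts.
    udp_sock.bind(("127.0.0.1", listen_port))
    tcp_sock = socket.create_connection((tcp_host, tcp_port))
    return udp_sock, tcp_sock


def relay(udp_sock, tcp_sock, max_messages=None):
    """Forward each UDP datagram as one newline-terminated line over TCP."""
    forwarded = 0
    while max_messages is None or forwarded < max_messages:
        data, _sender = udp_sock.recvfrom(65535)
        tcp_sock.sendall(data.rstrip(b"\n") + b"\n")
        forwarded += 1
    return forwarded
```

With max_messages left at None the relay runs until it is interrupted or the TCP connection fails. A production relay would also need reconnect logic, which is omitted here.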
In fact, by replacing netcat with cryptcat, one can eliminate SSH from the equation. In
this case the setup is as follows:
To get cryptcat go to: http://farm9.com/content/Free_Tools/Cryptcat
on log-generating host:
*.* @localhost
2. run command:
2. run command:
The Stunnel SSL wrapper can be used in place of cryptcat (see, for example, the guide at
http://www.campin.net/newlogcheck.html).
If one is not satisfied with makeshift tunneling solutions or needs even more security
(such as a stronger delivery guarantee or cryptographic log integrity verification), it's
time to look at syslog replacements.
This SANS article (http://rr.sans.org/unix/syslog.php) provides a good (if outdated)
high-level comparison of syslog replacements. We will look in more detail at syslog-ng
(http://www.balabit.hu/en/downloads/syslog-ng/) by BalabIT and msyslog by CORE SDI
(http://www.corest.com). The third well-known replacement (nsyslog by Darren Reed,
http://coombs.anu.edu.au/~avalon/nsyslog.html) does not appear to be actively updated
anymore.
Their common features include TCP communication, more filtering options (in addition
to SEVERITY and FACILITY of standard syslog) and log file integrity support.
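The idea behind log file integrity support can be illustrated with a simple hash chain, in which every log line is folded into a running digest. This is a generic sketch and not the algorithm used by any of the tools named here; the function names and the seed value are assumptions made for the example.

```python
import hashlib
import hmac


def chain_step(previous_hex, line):
    """Fold one log line into the running digest."""
    return hashlib.sha256((previous_hex + line).encode("utf-8")).hexdigest()


def seal(lines, seed="initial-secret"):
    """Return the final chain value for a sequence of log lines."""
    current = hashlib.sha256(seed.encode("utf-8")).hexdigest()
    for line in lines:
        current = chain_step(current, line)
    return current


def is_intact(lines, expected, seed="initial-secret"):
    """True if the lines still produce the recorded chain value."""
    return hmac.compare_digest(seal(lines, seed), expected)
```

Because each step depends on all previous lines, removing, editing or reordering any line changes the final value. The recorded value and the seed must be stored where an intruder on the log host cannot rewrite them.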
logs) and server machine (collects logs) from RPM packages. Both
Making msyslog work, while easy, was considerably more difficult than
contradictory at times.
The TCP-mode setup that worked involves the following changes on the hosts:
On the client:
IN PLACE OF
*.* @loghost
2. Run msyslog as "msyslogd -i linux -i unix". Just running "/etc/init.d/msyslog start"
does the trick.
On the server:
-----------
-----------
proto TCP and port 514', one would see the log messages)
Let's add hashing to the mix. On the syslog aggregation server, where
the messages are distributed between various log files, one needs to:
2. Stop the syslog daemon and rotate or erase the log files
3. Run the program to create the initial checksum (using included
'peochk' program):
# peochk -g -k /etc/.var.log.authlog.key
to see:
If one edits the 'messages' file (say, by removing a line) and then retests, the output
is:
matching messages with various source, time and content fields. The
others. Msyslog also features a debug mode that was very helpful
host only, thus any user from the allowed host can connect via telnet
delivery (unlike UDP), some measures should be taken to keep the log
files in case the log server goes down. Msyslog offers configurable
for retry limit (in seconds) and buffer size (in message
lines). Buffers are also important for dealing with message bursts,
that do happen, for example, when programs are starting up or when the
(unlike regular syslog which will only record the name of last step),
----------------------
# global options
options { use_dns(yes);
use_fqdn(no);
use_time_recvd(no);
chain_hostnames(no);
mark(0);
sync(0);
};
file("/proc/kmsg");
};
# *.* @anton
destination d_2 {
tcp("anton" port(514));
};
filter f_5 {
level(debug..emerg);
};
---------------------
It is easy to follow the logic, even though the file is different from
options will take some learning. For example, to enable TCP logging
one has to replace udp with tcp in the configuration file (done in the
Syslog-ng also features more granular access control and can use TCP
send every log message to the STDIN on the "correlate.sh" script, add
------
log { source(s_local); destination(d_prg); };
------
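A program attached to such a destination receives one log message per line on its standard input. As a hedged illustration of what the receiving side might look like, here is a small Python sketch; the function name and the default keywords are hypothetical, and the actual contents of "correlate.sh" are not shown in this paper.

```python
import sys


def watch(stream, needles=("Failed password", "restart"), out=sys.stdout):
    """Read log lines from a stream and echo those that contain a keyword."""
    hits = 0
    for line in stream:
        line = line.rstrip("\n")
        if any(needle in line for needle in needles):
            hits += 1
            out.write("ALERT: %s\n" % line)
    return hits
```

When started by the logging daemon, the script would simply call watch(sys.stdin) and keep running for as long as the daemon holds the pipe open.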
client successfully sent messages over UDP and TCP to msyslog server.
Overall, msyslog and syslog-ng are viable options where extra security
your system and keyboard log transfers, some covert options are
for this purpose. However, the discussion goes beyond the scope of
this paper.
II.
logging host, holding the log records from many machines in a single
since the early days of UNIX and there are few disadvantages with it.
collecting server:
# /etc/init.d/mysql start
-------------------
priority char(10),
date date,
time time,
host varchar(128),
message text,
);
------------------------
5. Restart msyslog
# /etc/init.d/msyslogd start
[Picture 1]
In this setup, message queuing is enabled on the client (which uses the
the client after the server restart if using the TCP mode.
much higher rates. In our tests, for simple messages msyslog-MySQL
available, log file database can be used to analyze the data more
III.
Now let's turn to the analysis part. Log analysis is often defined as
getting meaningful intrusion data and some historical trends from log
files.
In fact, a lot of good software is written to analyze plain text log
review all the log analysis scripts, it makes sense to review the
Log analysis can be split into real-time and periodic. Tools like
Both approaches work, provided you know what to look for and (usually)
from the chaff. Some of the analysis scripts come with an extensive
set of default regexes (logcheck), while others make you create your
One tool deserves special mention for having a much more sophisticated
offers not just regex matching on a line by line basis. The programs
match an event A and wait for X seconds for the event B to arrive,
then execute an action, match an event A, then count same events for X
you know what to look for, the program can be an extremely powerful
tool for log analysis.
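The two rule types described above can be sketched in Python as follows. This is an illustration of the technique only, not the rule syntax of the tool in question. The class names are hypothetical, and the sketch assumes that the pair rule fires when event B arrives within the window and that the counting rule fires once a threshold is reached inside the window.

```python
import re
from collections import deque


class PairRule:
    """Match event A, then wait up to `window` seconds for event B."""

    def __init__(self, pattern_a, pattern_b, window):
        self.pattern_a = re.compile(pattern_a)
        self.pattern_b = re.compile(pattern_b)
        self.window = window
        self.armed_at = None  # time event A was last seen, if still pending

    def feed(self, timestamp, line):
        """Return True when B is seen within the window after A."""
        if self.armed_at is not None and timestamp - self.armed_at > self.window:
            self.armed_at = None  # waited too long, forget A
        if self.armed_at is not None and self.pattern_b.search(line):
            self.armed_at = None
            return True
        if self.pattern_a.search(line):
            self.armed_at = timestamp
        return False


class CountRule:
    """Fire when `threshold` matching events fall within `window` seconds."""

    def __init__(self, pattern, window, threshold):
        self.pattern = re.compile(pattern)
        self.window = window
        self.threshold = threshold
        self.seen = deque()  # timestamps of recent matching events

    def feed(self, timestamp, line):
        if not self.pattern.search(line):
            return False
        self.seen.append(timestamp)
        while timestamp - self.seen[0] > self.window:
            self.seen.popleft()
        return len(self.seen) >= self.threshold
```

Each log line is passed to feed() together with its timestamp in seconds, and the caller runs whatever action it likes when the call returns True.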
But what if the logs are stored into a database? What are the
To analyze logs, one might want to run various SELECT queries, such as
--------------
number of hosts that sent messages:
    select count(distinct host) from syslogTB;
number of messages per host:
    select host, count(host) from syslogTB group by host order by host desc;
+------------------------+-------------+
| host | count(host) |
+------------------------+-------------+
| box1 | 20 |
| box2.example.com | 3147 |
+------------------------+-------------+
B. Drill-down detailed reports:
search by message text:
    select * from syslogTB where message like "%restart%";
count the number of events:
    select count(*) from syslogTB where message like "%restart%";
see all unique message types:
    select distinct message from syslogTB;
--------------
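Queries of this kind can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in; the table has only the columns visible in the listing earlier in this paper, and the sample rows and function names are made up for the example.

```python
import sqlite3

# Invented sample data: (priority, date, time, host, message).
SAMPLE_ROWS = [
    ("info", "2002-05-01", "10:00:01", "box1", "syslogd restart"),
    ("info", "2002-05-01", "10:00:07", "box2.example.com", "sshd session opened"),
    ("notice", "2002-05-01", "10:02:30", "box2.example.com", "named restart"),
]


def build_database(rows=SAMPLE_ROWS):
    """Create an in-memory syslogTB table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table syslogTB (priority char(10), date date, time time, "
        "host varchar(128), message text)"
    )
    conn.executemany("insert into syslogTB values (?, ?, ?, ?, ?)", rows)
    return conn


def host_count(conn):
    """Number of distinct hosts that sent messages."""
    return conn.execute("select count(distinct host) from syslogTB").fetchone()[0]


def messages_per_host(conn):
    """Message count for each host, ordered by host name descending."""
    return conn.execute(
        "select host, count(host) from syslogTB group by host order by host desc"
    ).fetchall()


def count_matching(conn, text):
    """Number of messages containing the given text."""
    return conn.execute(
        "select count(*) from syslogTB where message like ?", ("%" + text + "%",)
    ).fetchone()[0]
```

Passing the search text as a bound parameter rather than pasting it into the SQL string is a design choice that keeps the sketch safe if the search term ever comes from user input.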
The limit is one's creativity since SQL syntax is very flexible and
provided:
environments. The list provides some of the things that will ensure
ABOUT AUTHOR:
This is an updated author bio, added to the paper at the time of reposting in 2009.
In addition, Anton teaches classes and presents at many security conferences across
the world; he recently addressed audiences in the United States, UK, Singapore, Spain,
Russia and other countries. He works on emerging security standards and serves on
the advisory boards of several security start-ups.
Currently, Anton is developing his security consulting practice, focusing on logging and
PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton
Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously,
Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the
world about the importance of logging for security, compliance and operations. Before
LogLogic, Anton was employed by a security vendor in a strategic product
management role. Anton earned his Ph.D. degree from Stony Brook University.