Highly Compressed Aho-Corasick Automata For
Efficient Intrusion Detection
Xinyan Zha & Sartaj Sahni
Computer and Information Science and Engineering
University of Florida
Gainesville, FL 32611
{xzha, sahni}@cise.ufl.edu
Abstract—We develop a method to compress the unoptimized
Aho-Corasick automaton that is widely used in intrusion detection systems. Our method uses bitmaps with multiple levels
of summaries as well as aggressive path compaction. By using
multiple levels of summaries, we are able to determine a popcount
with as few as 1 addition. On Snort string databases, our
compressed automata take 24% to 31% less memory than the
compressed automata of Tuck et al. [23], and the number
of additions required to compute popcounts is reduced by about
90%.
Keywords: Intrusion detection, Aho-Corasick trees, compression, efficient popcount computation, performance.
I. INTRODUCTION
Network intrusion detection systems (NIDS) examine network traffic (both in- and out-bound packets) looking for traffic
patterns that indicate attempts to break into a target computer,
port scans, denial of service attacks, and other malicious
behavior. Bro [16], [6], [9], [19], [5] and Snort [17] are two
of the more popular public-domain NIDSs. Both maintain a
database of signatures (or rules) that include a string as a
component. These intrusion detection systems examine the
payload of each packet that is matched by a rule and report
all occurrences of the string associated with that rule. It is
estimated that about 70% of the time it takes Snort, for
example, to process packets is spent in its string matching
code and that this code accounts for about 80% of the instructions
executed [2]. Consequently, much research has been done
recently to improve the efficiency of string matching ([4], [11],
[23], for example). The focus of this paper is to improve the
storage and search cost of NIDS string matching using Aho-Corasick trees [1].
In Section II, we review related work. The Aho-Corasick
automaton, which is central to our work, is described in
Section III. The compression method of Tuck et al. [23] is
described in Section IV. In Section V we propose three designs
to compute popcounts efficiently in 256-bit bitmaps. These
designs make it possible to use popcounts efficiently without
any hardware support whatsoever! Our method to compress
the Aho-Corasick automaton is described in Section VI and
experimental results comparing our method with that of Tuck
et al. [23] are presented in Section VII.
II. RELATED WORK
Snort [17] and Bro [16], [6], [9], [19], [5] are two of
the more popular public domain NIDSs. Both are software
solutions to intrusion detection. The current implementation
of Snort uses the optimized version of the Aho-Corasick
automaton [1]. Snort also uses SFK search and the Wu-Manber [24] multi-string search algorithm. To reduce the
memory requirement of the Aho-Corasick automaton, Tuck
et al. [23] have proposed starting with the unoptimized Aho-Corasick automaton and using bitmaps and path compression.
In the network algorithms area, bitmaps have been used
also in the tree bitmap scheme [7] and in shape shifting
and hybrid shape shifting tries [20], [12]; path compression
has been used in several IP lookup structures including tree
bitmap [7] and hybrid shape shifting tries [12]. With these
compression methods, the memory required by the compressed
unoptimized Aho-Corasick automaton becomes about 1/50 to
1/30 of that required by the optimized automaton and the Wu-Manber structure and is slightly less than that required by SFK
search [23]. However, a search requires us to perform a large
number of additions at each node and so requires hardware
support for efficient implementation.
Hardware and hardware assisted solutions have been proposed [22], [8], [26], [25], [4], [1], [21], [11], [23], [14],
[13].
III. THE AHO-CORASICK AUTOMATON
The Aho-Corasick finite state automaton [1] for multi-string
matching is widely used in IDSs. In the unoptimized version,
which we use in this paper, there is a failure pointer for each
state and each state has success pointers; each success pointer
has a label, which is a character from the string alphabet,
associated with it. Also, each state has a list of strings/rules
(from the string database) that are matched when that state
is reached by following a success pointer. This is the list of
matched rules. The search starts with the automaton start state
designated as the current state and the first character in the
text string, S, that is being searched designated as the current
character. At each step, a state transition is made by examining
the current character of S. If the current state has a success
pointer labeled by the current character, a transition to the
state pointed at by this success pointer is made and the next
character of S becomes the current character. When there is no
corresponding success pointer, a transition to the state pointed
at by the failure pointer is made and the current character is not
changed. Whenever a state is reached by following a success
pointer, the rules in the list of matched rules for the reached
state are output along with the position in S of the current
character. This output is sufficient to identify all occurrences,
in S, of all database strings. Aho and Corasick [1] have shown
that when their unoptimized automaton is used, the number of
state transitions is 2n, where n is the length of S.
Fig. 1. Type I summaries (the 256-bit bitmap is partitioned into four 64-bit
blocks B0-B3; each block into sixteen 4-bit subblocks SB0-SB15; and each
subblock into two 2-bit subsubblocks SSB0 and SSB1)

degree              number of nodes   percentage
0                   1964              7.75
1                   22453             88.6
2                   591               2.33
3                   149               0.58
4                   43                0.17
5                   35                0.14
6                   14                0.055
7                   23                0.090
8                   14                0.055
9                   8                 0.031
10,11,12,13,14,15   6,3,4,5,3,2       < 0.03 each
17,18,21,51,78      1 each            < 0.03 each

Fig. 2. Distribution of states in a 3000 string Snort database

IV. THE METHOD OF TUCK ET AL. [23] TO COMPRESS NON-OPTIMIZED AUTOMATON

Assume that the alphabet size is 256 (e.g., ASCII characters). A natural way
to store the Aho-Corasick automaton, for a given database D of strings, is to
represent each state of the unoptimized automaton by a node that has 256
success pointers, a failure pointer, and a list of rules that are matched
when this state is reached via a success pointer. Assuming that a pointer
takes 4 bytes and the rule list is simply pointed at by the node, each state
node is 1032 bytes. Using bitmap and path compression, we may use nodes whose
size is 52 bytes [23].

V. POPCOUNTS WITH FEWER ADDITIONS

A serious deficiency of the compression method of [23] is the need to perform
up to 31 additions at each bitmap node. This seriously degrades worst-case
performance and increases the clamor for hardware support for a popcount in
network processors [23]. Since popcounts are used in a variety of network
algorithms ([3], [7], [12], [20], for example) in addition to those for
intrusion detection, we consider, in this section, the problem of determining
the popcount independent of the application. This problem has been studied
extensively by the algorithms community ([10], [15], for example). Motivated
by the work of Munro [15], we propose 3 designs for summaries for a 256-bit
bitmap. The first two of these use 3 levels of summaries and the third uses
2 levels.
1) Type I Summaries
   • Level 1 Summaries For the level 1 summaries, the 256-bit bitmap is
     partitioned into 4 blocks of 64 bits each. S1(i) is the number of 1s
     in blocks 0 through i − 1, 1 ≤ i ≤ 3.
   • Level 2 Summaries For each block j of 64 bits, we keep a collection
     of level 2 summaries. For this purpose, the 64-bit block is partitioned
     into 16 4-bit subblocks. S2(j, i) is the number of 1s in subblocks 0
     through i − 1 of block j, 0 ≤ j ≤ 3, 1 ≤ i ≤ 15.
   • Level 3 Summaries Each 4-bit subblock is partitioned into 2 2-bit
     subsubblocks. S3(j, i, 1) is the number of 1s in subsubblock 0 of the
     ith 4-bit subblock of the jth 64-bit block, 0 ≤ j ≤ 3, 0 ≤ i ≤ 15.
   Figure 1 shows the setup for Type I summaries. When
   Type I summaries are used, the popcount for position q
(i.e., the number of 1s preceding position q), 0 ≤ q <
256, of the bitmap is obtained as follows:
a) Position q is in subblock sb = ⌊(q mod 64)/4⌋ of
block b = ⌊q/64⌋. The subsubblock ssb is 0 when
q mod 4 < 2 and 1 otherwise.
b) The popcount for position q is S1(b) + S2(b, sb) +
S3(b, sb, ssb) + bit(q − 1), where bit(q − 1) is 0
if q mod 2 = 0 and is bit q − 1 of the bitmap
otherwise; S1(0), S2(b, 0), and S3(b, sb, 0) are all 0.
So, using Type I summaries, we can determine a popcount with at most 3 additions whereas using only 1 level
of summaries as in [23], up to 31 additions are required.
This reduction in the number of additions comes at the
expense of memory. An S1(∗) value lies between 0 and
192 and so requires 8 bits; an S2 value requires 6 bits
and an S3 value requires 2 bits. So, we need 8 ∗ 3 = 24
bits for the level-1 summaries, 6 ∗ 15 ∗ 4 = 360 bits for
the level-2 summaries, and 2∗1∗16∗4 = 128 bits for the
level-3 summaries. Therefore, 512 bits (or 64 bytes) are
needed for the summaries. In contrast, the summaries
of the 1-level scheme of [23] require only 56 bits (or 7
bytes).
2) Type II Summaries These are exactly what is prescribed
by Munro [15], [?]. S1 and S2 are as for Type I
summaries. However, the S3 summaries are replaced by
a summary table T 4(0 : 15, 0 : 3) such that T 4(i, j) is
the number of 1s in positions 0 through j − 1 of the
binary representation of i. The popcount for position q
of a bitmap is S1(b) + S2(b, sb) + T 4(d, e), where d
is the integer whose binary representation is the bits in
subblock sb of block b of the bitmap and e is the position
of q within this subblock; S1 and SB are for the current
state/bitmap.
Since T 4(i, j) ≤ 3, we need 2 bits for each entry of T 4
for a total of 128 bits for the entire table. Recognizing
that rows 2j and 2j + 1 are the same for every j,
we may store only the even rows and reduce storage
cost to 64 bits. A further reduction in storage cost for
T 4 is possible by noticing that all values in column
0 of this array are 0 and so we need not store this
column explicitly. Actually, since only 1 copy of this
table is needed, there seems to be little value (for our
intrusion detection system application) to the suggested
optimizations and we may store the entire table at a
storage cost of 128 bits.
The memory required for the level 1 and 2 summaries
is 24 + 360 = 384 bits (48 bytes), a reduction of 16
bytes compared to Type I summaries. When Type II
summaries are used, a popcount is determined with 2
additions rather than 3 using Type I summaries and 31
using the 1-level summaries of [23].
3) Type III Summaries These are 2 level summaries and
using these, the number of additions needed to compute
a popcount is reduced to 1. Level-1 summaries are kept
for the bitmap and a lookup table is used for the second
level. For the level-1 summaries, we partition the bitmap
into 16 blocks of 16 bits each. S1(i) is the number of
1s in blocks 0 through i − 1, 1 ≤ i ≤ 15. The lookup
table T 16(i, j) gives the number of 1s in positions 0
through j − 1 of the binary representation of i, 0 ≤ i <
65, 536 = 216 , 0 ≤ j < 16. The popcount for position
q of the bitmap is S1(⌊q/16⌋) + T 16(d, e), where d
is the integer whose binary representation is the bits in
block ⌊q/16⌋ of the bitmap and e is the position of q
within this subblock; S1 and SB are for the current
state/bitmap.
8 ∗ 15 = 120 bits (or 15 bytes) of memory are required
for the level-1 summaries of a bitmap compared to 7
bytes in [23]. The lookup table T 16 requires 216 ∗ 16 ∗ 4
bits as each table entry lies between 0 and 15 and so
requires 4 bits. The total memory for T 16 is 512KB.
For a table of this size, it is worth considering the
optimizations mentioned earlier in connection with T 4.
Since rows 2j and 2j + 1 are the same for all j, we may
reduce table size to 256KB by storing explicitly only the
even rows of T 16. Another 16KB may be saved by not
storing column 0 explicitly. Yet another 16KB reduction
is achieved by splitting the optimized table into 2. Now,
column 0 of one of them is all 0 and is all 1 in the
other. So, column 0 may be eliminated. We note that
optimization below 256KB may not be of much value as
the increased complexity of using the table will outweigh
the small reduction in storage.
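The three summary schemes can be sketched directly from the definitions above. The following Python sketch is illustrative only: bitmaps are lists of 0/1 values with position 0 leftmost, and in the real structure the summaries would of course be precomputed and stored in each node while one shared copy of each lookup table serves the whole automaton:

```python
def build_type1(bits):
    # S1[b]: 1s in 64-bit blocks 0..b-1; S2[b][sb]: 1s in 4-bit subblocks
    # 0..sb-1 of block b; S3[b][sb]: 1s in subsubblock 0 of subblock sb.
    S1 = [sum(bits[:64 * b]) for b in range(4)]
    S2 = [[sum(bits[64*b : 64*b + 4*sb]) for sb in range(16)] for b in range(4)]
    S3 = [[bits[64*b + 4*sb] + bits[64*b + 4*sb + 1] for sb in range(16)]
          for b in range(4)]
    return S1, S2, S3

def popcount_type1(bits, S1, S2, S3, q):
    # At most 3 additions: S1 + S2 (+ S3 in the second subsubblock,
    # + bit q-1 when q is odd).
    b, sb = q // 64, (q % 64) // 4
    n = S1[b] + S2[b][sb]
    if q % 4 >= 2:
        n += S3[b][sb]
    if q % 2 == 1:
        n += bits[q - 1]
    return n

# T4[i][j] / T16[i][j]: number of 1s among the first j bits (MSB first)
# of i; a single shared copy suffices for the whole automaton.
T4 = [[bin(i >> (4 - j)).count("1") for j in range(4)] for i in range(16)]
T16 = [[bin(i >> (16 - j)).count("1") for j in range(16)]
       for i in range(1 << 16)]

def popcount_type2(bits, S1, S2, q):
    # Exactly 2 additions; d is the current 4-bit subblock as an integer,
    # e the position of q within that subblock.
    b, sb, e = q // 64, (q % 64) // 4, q % 4
    d = int("".join(map(str, bits[64*b + 4*sb : 64*b + 4*sb + 4])), 2)
    return S1[b] + S2[b][sb] + T4[d][e]

def popcount_type3(bits, q):
    # 2-level scheme: 16 blocks of 16 bits; a single addition per popcount.
    S1 = [sum(bits[:16 * b]) for b in range(16)]  # precomputed per bitmap
    blk, e = q // 16, q % 16
    d = int("".join(map(str, bits[16*blk : 16*blk + 16])), 2)
    return S1[blk] + T16[d][e]
```

All three agree with a direct scan of the bitmap; they differ only in the number of additions performed per query and in the summary/table storage they require.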
Fig. 3. Our bitmap node (node type 3 bits; first child type 3 bits; bitmap
256 bits; L1(B0..B2) 8 bits × 3 = 24 bits; failptr offset 8 bits;
L2(SB0..SB14) 6 bits × 15 × 4 = 360 bits; L3(SSB0) 2 bits × 16 × 4 = 128
bits; failptr 32 bits; ruleptr 32 bits; firstchildptr 32 bits)
VI. OUR METHOD TO COMPRESS THE NON-OPTIMIZED AHO-CORASICK AUTOMATON
A. Classification of Automaton States
The Snort database had 3,578 strings in April, 2006. Figure 2 profiles the states in the corresponding unoptimized
Aho-Corasick automaton by degree (i.e., number of non-null
success pointers in a state). This profile motivated us to
classify the states into 3 categories: B (states whose degree is
more than 8), L (states whose degree is between 2 and 8), and
O (all other states). B states are those that will be represented
using a bitmap, L states are low degree states, and O states
are states whose degree is one or zero. In case the distribution
of states in future string databases changes significantly, we
can use a different classification of states.
Next, a finer (2 letter) state classification is done as below
and in the stated order.
1) BB All B states are reclassified as BB states.
2) BL All L states that have a sibling BB state are
reclassified as a BL states.
3) BO All O states that have a BB sibling are reclassified
as BO states.
4) LL All remaining L states are reclassified as LL states.
5) LO All remaining O states that have an LL sibling are
reclassified as LO states.
6) OO All remaining O states are reclassified as OO states.
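The two-letter classification above can be sketched as a small routine. This is an illustrative sketch, not the paper's code; it assumes states are given with their out-degrees and grouped into sibling sets (the start state, which has no siblings, is not classified here):

```python
def classify(children, degree):
    # children: state -> list of its child states (each list is one
    # sibling group); degree: state -> number of success transitions.
    # Coarse classes: B (degree > 8), L (degree 2..8), O (degree 0 or 1).
    coarse = {s: ("B" if d > 8 else "L" if d >= 2 else "O")
              for s, d in degree.items()}
    cls = {}
    for sibs in children.values():
        has_b = any(coarse[s] == "B" for s in sibs)
        has_l = any(coarse[s] == "L" for s in sibs)
        for s in sibs:
            c = coarse[s]
            if c == "B":
                cls[s] = "BB"
            elif has_b:
                cls[s] = "B" + c    # BL or BO: sibling of a BB state
            elif c == "L":
                cls[s] = "LL"
            elif has_l:
                cls[s] = "LO"       # O state with an LL sibling
            else:
                cls[s] = "OO"
    return cls
```

For brevity the sketch takes degrees as a separate map rather than deriving them from a full automaton.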
B. Node Types
Our compressed representation uses three node types–
bitmap, low degree, and path compressed. These are described
below.
1) Bitmap: A bitmap node has a 256-bit bitmap together
with summaries; any of the three summary types described in
Section V may be used. We note that when Type II or Type
III summaries are used, only one copy of the lookup table (T 4
or T 16) is needed for the entire automaton. All bitmap nodes
may share this single copy of the lookup table. When Type II
summaries are used, the 128 bits needed by the unoptimized
T 4 are insignificant compared to the storage required by the
remainder of the automaton. For Type III summaries, however,
using a 512KB unoptimized T 16 is quite wasteful of memory
and it is desirable to go down to at least the 256KB version.
When Type I summaries are used, each bitmap node (Figure 3) is 110 bytes (we need 57 extra bytes compared to the 52-byte nodes of [23] for the larger summaries and an additional
extra byte because we use larger failure pointer offsets). When
Type II summaries are used, each bitmap node is 94 bytes and
the node size is 61 bytes when Type III summaries are used.
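These node sizes follow mechanically from the field widths of Figure 3. A small sketch (field widths taken from the text; the breakdown of summary bits per type is as derived in Section V):

```python
def bitmap_node_bytes(summary_type):
    # Fixed fields: node type (3) + first child type (3) + failptr
    # offset (8) + bitmap (256) + failptr, ruleptr, firstchildptr (32 each).
    fixed = 3 + 3 + 8 + 256 + 3 * 32
    summary_bits = {
        "I": 24 + 360 + 128,   # L1 + L2 + L3 summaries
        "II": 24 + 360,        # L3 replaced by the shared T4 table
        "III": 8 * 15,         # fifteen 8-bit level-1 summaries
    }[summary_type]
    return (fixed + summary_bits + 7) // 8   # round up to whole bytes
```

Evaluating the three cases reproduces the 110-, 94-, and 61-byte node sizes quoted above.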
Fig. 4. Our path compressed node (per state: char 8 bits, failptr 32 bits,
failptr offset 8 bits, ruleptr 32 bits; plus node type, capacity, and next
node fields)

Fig. 5. Our low degree node (node type 3 bits; first child type 3 bits;
size 3 bits; char_1 ... char_8, 8 bits each; failptr offset 8 bits; failptr
32 bits; ruleptr 32 bits; firstchildptr 32 bits)

2) Low Degree Node: Low degree nodes are used for states
that have between 2 and 8 success transitions. Figure 5 shows
the format of such a node. In addition to fields for the node
type, failure pointer, failure pointer offset, rule list pointer, and
first child pointer, a low degree node has the fields char1, ...,
char8 for the up to 8 characters for which the state has a non-null
success transition and size, which gives us the number
of these characters stored in the node. Since this number
is between 2 and 8, 3 bits are sufficient for the size field.
Although it is sufficient to allocate 22 bytes to a low degree
node, we allocate 25 bytes as this allows us to pack a path
compressed node with up to 2 characters (i.e., an O2 node as
described later) into a low degree node.
3) Path Compressed Node: Unlike [23], we do not limit
path compression to end-node sequences. Instead, we path
compress any sequence of states whose degree is either 1 or
0. Further, we use variable-size path compressed nodes so that
both short and long sequences may be compressed into a single
node with no waste. In the path compression scheme of [23],
an end-node sequence with 31 states will use 7 nodes and in
one of these the capacity utilization is only 20% (only one
of the available 5 slots is used). Additionally, the overhead
of the type, next node, and size fields is incurred for each
of the path compressed nodes. By using variable-size path
compressed nodes, all the space in such a node is utilized and
the node overhead is paid just once. In our implementation,
we limit the capacity of a path compressed node to 256 states.
This requires that the failure pointer offsets in all nodes be
at least 8 bits. A path compressed node whose capacity is
c, c ≤ 256, has c character fields, c failure pointers, c failure
pointer offsets, c rule list pointers, 1 type field, 1 size field, and
1 next node field (Figure 4). We refer to the path compressed
node of Figure 4 as an O node. Five special types of O nodes,
O1 through O5, also are used by us. An Ol node, 1 ≤ l ≤ 5,
is simply an O node whose capacity is exactly l characters.
For these special O-node types, we may dispense with the
capacity field as the capacity may be inferred from the node
type.
The type fields (node type and first child type) are 3 bits.
We use Type = 000 for a bitmap node, Type = 111 for a low
degree node and Type = 110 for an O node. The remaining 5
values for Type are assigned to Ol nodes. Since the capacity
of an O node must be at least 6, we actually store the node's
true capacity minus 6 in its capacity field. As a result, an
8-bit capacity field suffices for capacities up to 261. However,
since failure pointer offsets are 8 bits, using an O node with
capacity between 257 and 261 isn't possible. So, the limit on
O node capacity is 256. The total size of a path compressed
node O is 10c + 6 bytes, where c is the capacity of the O
node. The size of an Ol node is 10l + 5 bytes as we do not need
the capacity field in such a node.

C. Memory Accesses

The number of memory accesses needed to process a node
depends on the memory bandwidth W, how the node's fields
are mapped to memory, and whether or not we get a match
at the node. We provide the access analysis primarily for the
case W = 32 bits.
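The path compressed node sizes given above (10c + 6 bytes for an O node, 10l + 5 bytes for an Ol node) follow from the per-state fields; a minimal check, with hypothetical helper names:

```python
def o_node_bytes(c):
    # Per state: char (1) + failptr (4) + failptr offset (1) + ruleptr (4)
    # = 10 bytes; plus about 6 bytes of fixed overhead (type fields,
    # capacity, next node field).
    assert 1 <= c <= 256
    return 10 * c + 6

def ol_node_bytes(l):
    # Ol nodes (1 <= l <= 5) encode their capacity in the node type,
    # saving the capacity byte.
    assert 1 <= l <= 5
    return 10 * l + 5
```

Note that `ol_node_bytes(2)` gives 25 bytes, which is why an O2 node fits exactly into the 25 bytes allocated to a low degree node.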
D. Bitmap Node With Type I Summaries, W = 32
We map our bitmap node into memory by packing the node
type, first child type, failure pointer offset fields as well as 2
of the 3 L1 summaries into a 32-bit block; 2 bits of this block
are unused. The remaining L1 summary (S1(3)) together with
S2(0, ∗) are placed into another 32-bit block. The remaining
L2 summaries are packed into 32-bit blocks; 5 summaries per
block; 2 bits per block are unused. The L3 summaries occupy
4 memory blocks; the bitmap takes 8 blocks; and each of the
3 pointers takes a block.
When a bitmap node is reached, the memory block with type
fields is accessed to determine the node’s actual type. The rule
pointer is accessed so we can list all matching rules. A bitmap
block is accessed to determine whether we have a match with
the input string character. If the examined bit is 0, the failure
pointer is accessed and we proceed to the node pointed at by
this pointer; the failure pointer offset, which was retrieved
from memory when the block with type fields was accessed,
is used to position us at the proper place in the node pointed
at by the failure pointer in case this node is a path compressed
node. So, the total number of memory accesses when we do
not have a match is 4. When the examined bit of the bitmap
is 1, we compute a popcount. This may require between 0 and
3 memory accesses (for example, 0 are needed when bit 0 of
the bitmap is examined or when the only summary required
is S1(1) or S1(2)). Using the computed popcount, the first
child pointer (another memory access) and the first child type
(cannot be that of an O node), we move to the next node in our
data structure. A total of 4 to 7 memory accesses are made.
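The first memory block of this layout can be sketched as a bit-packing routine. The ordering of fields within the word below is an illustrative assumption; only the widths (3 + 3 + 8 + 8 + 8 = 30 bits, 2 unused) come from the text:

```python
def pack_header_word(node_type, first_child_type, fail_off, s1_1, s1_2):
    # Pack node type, first child type, failptr offset, and two L1
    # summaries into one 32-bit word; the low 2 bits stay unused.
    for v, w in ((node_type, 3), (first_child_type, 3),
                 (fail_off, 8), (s1_1, 8), (s1_2, 8)):
        assert 0 <= v < (1 << w)
    return ((node_type << 29) | (first_child_type << 26)
            | (fail_off << 18) | (s1_1 << 10) | (s1_2 << 2))

def unpack_header_word(word):
    # Recover the five fields from the packed 32-bit word.
    return (word >> 29, (word >> 26) & 7, (word >> 18) & 0xFF,
            (word >> 10) & 0xFF, (word >> 2) & 0xFF)
```

A single access to this word therefore yields the type fields, the failure pointer offset, and two of the three level-1 summaries.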
E. Low Degree Node, W = 32
Next consider the case of a low degree node. We pack the
type fields, size field, failure pointer offset field, and the char 1
field into a memory block; 7 bits are unused. The remaining 7
char fields are packed into 2 blocks leaving 8 bits unused. Each
of the pointer fields occupies a memory block. When a low
degree node is reached, we must access the memory block with
type fields as well as the rule pointer. To determine whether we
have a match at this node, we do an ordered sequential search
of the up to 8 characters stored in the node. Let i denote
the number of characters examined. For i = 1, no additional
memory access is required, one additional access is required
when 2 ≤ i ≤ 5, and 2 accesses are required when 6 ≤ i ≤ 8.
In case of no match we need to access also the failure pointer;
the first child pointer is retrieved in case of a match. The total
number of memory accesses to process a low degree node is
3 to 5 regardless of whether there is a match.

            W = 32                               W = 1024
            match             mismatch           match              mismatch
B(I)        4 to 7            4                  1                  1
B(II)       4 to 6            4                  1                  1
B(III)      4 to 5            4                  1                  1
L           3 to 5            3 to 5             1                  1
O1          3                 3                  1                  1
O2          4                 3 or 5             1                  1
O3          6                 3, 5, or 6         1                  1
O4          7                 3 or 5 to 7        1                  1
O5          8                 3, 5, 6, 8, or 9   1                  1
O           3, ⌈(2+5i)/4⌉+1   3, ⌈(2+5i)/4⌉+2    1, ⌈(2+5i)/128⌉+1  1, ⌈(2+5i)/128⌉+1
TB([23])    4                 4                  1                  1
TO([23])    1 + i, 6 or 8     3, 3 + i           1                  1

Fig. 6. Memory accesses to process a node

Node Type      B    L    Ol   O    TB    TO
DataSet 1284   133  595  850  454  1057  2955
DataSet 2430   100  769  938  576  1527  3310

Fig. 7. Number of nodes of each type; Ol and O counts are for Type I
summaries
F. Summary
Using a similar analysis, we can derive the memory access
counts for different values of the memory bandwidth W , other
summary types, and other node types. Figure 6 gives the access
counts for the different node and summary types for a few
sample values of W . The rows labeled B (bitmap), L (low
degree), Ol (O1 through O5), and O refer to node types for
our structure while those labeled T B (bitmap) and T O (one
degree) refer to node types in the structure of Tuck et al. [23].
G. Mapping States to Nodes
We map states to nodes as follows and in the stated order.
1) Category BX, X ∈ {B, L, O}, states are mapped to 1
bitmap node each; sibling states are mapped to nodes
that are contiguous in memory. Note that in the case of
BL and BO states, only a portion of a bitmap node is
used.
2) Maximal sets of LX, X ∈ {L, O}, states that are
siblings are packed into unused space in a bitmap node
created in (1) using 25 bytes per LX state and the low
degree structure of Figure 5. The packing of sibling LX
nodes is done in non-increasing order of the number of
siblings.
3) The remaining LX states are mapped into low degree
nodes (LL states) or O2 nodes (LO states). LL states
are mapped one state per low degree node. As before,
when an LO state whose child is an OO state is mapped
in this way, it is mapped together with its lone OO-state
child into a single 25-byte O2 node. Sibling states are
mapped to nodes that are contiguous in memory.
4) The chains of remaining OO states are handled in groups
where a group is comprised of chains whose first nodes
are siblings. In each group, we find the length, l, of the
shortest chain. If l > 5, set l = 5. Each chain is mapped
to an Ol node followed by an O node. The Ol nodes
for the group are in contiguous memory. Note that an O
node can only be the child of an Ol node or another O
node.

Data set 2430
Methods          [23]     Type I   Type II   Type III
Memory (bytes)   251524   177061   175511    172523
Normalized       1        0.70     0.70      0.69

Fig. 8. Memory requirement for data set 2430 (*Excludes memory for T4
and T16)
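Step (4) of the mapping can be sketched as a short planning routine. This is an illustrative sketch with hypothetical names, not the paper's code:

```python
def plan_oo_chains(chain_lengths):
    # chain_lengths: lengths of the OO chains in one sibling group.
    # l = min(5, length of the shortest chain); each chain maps to an
    # Ol node of capacity l followed by an O node holding its remaining
    # states (the O node is omitted when nothing remains).
    l = min(5, min(chain_lengths))
    return [(l, n - l) for n in chain_lengths]
```

For example, a group of chains of lengths 3, 7, and 9 yields an O3 node for each chain, with O nodes of capacity 4 and 6 for the two longer chains.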
VII. EXPERIMENTAL RESULTS
We benchmarked our compression method of Section VI
against that proposed by Tuck et al. [23] using two data sets
of strings extracted from Snort [18] rule sets. The first data set
has 1284 strings and the second has 2430 strings. We name
each data set by the number of strings in the data set.
A. Number of Nodes
Figure 7 gives the number of nodes of each type in our compressed
Aho-Corasick structure (with Type I summaries) and in that of
Tuck et al. [23] for each of
our string sets. The maximum capacity of an allocated O node
was 141 for data set 1284 and 256 for data set 2430.
B. Memory Requirement
Although the total number of nodes used by us is less than
that used by Tuck et al. [23], our nodes are larger and so the
potential remains that we actually use more memory than used
by the structure of Tuck et al. [23]. Figure 8 gives the number
of bytes of memory used by the structure of [23] as well as
that used by our structure for each of the different summary
types of Section V. The row labeled Normalized gives the
memory required normalized by that required by the structure
of Tuck et al. [23]. As can be seen, our structures take between
24% and 31% less memory than is required by the structure
of [23]. With the 256KB required by T 16 added in for Type
III summaries, the Type III representation takes twice as much
memory as does [23] for the 1284 data set and 75% more for
the 2430 data set. As the size of the data set increases, we
expect Type III summaries to be more competitive than [23]
on total memory required.
C. Popcount
Figure 9 gives the total number of additions required to
compute popcounts when using each of the data structures.
For this experiment, we used 3 query strings obtained by
concatenating a differing number of real emails that were
classified as spam by our spam filter. The string lengths varied
from 1MB to 3MB and we counted the number of additions
needed to report all occurrences of all strings in the Snort data
sets (1284 or 2430) in each of the query strings. The last row
of the figure gives the total number of additions for all 3 query
strings normalized by the total for the structure of [23]. When
Type III summaries are used, the number of popcount additions
is only 7% of that used by the structure of [23]. Type I and Type
II summaries require about 13% and 12%, respectively, of the
number of additions required by [23].

Methods            [23]     Type I   Type II   Type III
strlen=1002832     11.54M   1.46M    1.33M     0.79M
strlen=2032131     34.97M   4.43M    4.02M     2.42M
strlen=3002665     69.54M   8.78M    7.96M     4.80M
Normalized         1        0.127    0.114     0.069

Fig. 9. Number of popcount additions, data set 2430
VIII. CONCLUSION
We have proposed the use of 2- and 3-level summaries for
efficient popcount computation and have suggested ways to
minimize the size of the lookup table associated with the popcount scheme of Munro [15]. Using the summaries proposed
here, the number of additions required to compute popcounts
is between 7% and 13% of that required by the scheme
of [23]. We also have proposed an aggressive compression
scheme. When this scheme is used on our test sets, the memory
required by the search structure is between 24% and 31% less
than that required when the compression scheme of [23] is
used.
REFERENCES
[1] A. Aho and M. Corasick, Efficient string matching: An aid to bibliographic search, CACM, 18, 6, 1975, 333-340.
[2] S. Antonatos, K. Anagnostakis and E. Markatos, Generating realistic
workloads for network intrusion detection systems, ACM Workshop on
Software and Performance, 2004.
[3] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, Small forwarding
tables for fast routing lookups, ACM SIGCOMM, 1997, 3-14.
[4] S. Dharamapurikar and J. Lockwood, Fast and scalable pattern matching
for content filtering, ANCS, 2005.
[5] H. Dreger, A. Feldmann, M. Mai, V. Paxson and R. Sommer, Dynamic
application-layer protocol analysis for network intrusion detection,
USENIX Security Symposium, 2006.
[6] H. Dreger, C. Kreibach, V. Paxson, and R. Sommer, Enhancing the
accuracy of network-based intrusion detection with host-based context,
DIMVA, 2005.
[7] W. Eatherton, G. Varghese, Z. Dittia, Tree bitmap: hardware/software
IP lookups with incremental updates, Computer Communication Review,
34(2): 97-122, 2004.
[8] Y. Fang, R. Katz and T. Lakshman, Gigabit rate packet pattern-matching
using TCAM, ICNP, 2004.
[9] J. Gonzalez and V. Paxson, Enhancing network intrusion detection with
integrated sampling and filtering, RAID, 2006.
[10] G. Jacobson, Succinct static data structures, Carnegie Mellon University
Ph.D. thesis, 1988.
[11] J. Lockwood, C. Neely, and C. Zuver, An extensible system-on-programmable-chip, content-aware Internet firewall.
[12] W. Lu and S. Sahni, Succinct representation of static packet classifiers,
IEEE Symposium on Computers and Communications, 2007.
[13] J. van Lunteren and A. Engbersen, Fast and scalable packet classification,
IEEE JSAC, 21, 4, 2003, 560-571.
[14] J. Lunteren, High-performance pattern-matching for intrusion detection,
INFOCOM, 2006
[15] J. Munro, Tables, Foundations of Software Technology and Theoretical
Computer Science, LNCS, 1180, 37–42, 1996.
[16] V. Paxson, Bro: A system for detecting network intruders in real-time,
Computer Networks, 31, 1999, 2435–2463.
[17] Snort users manual 2.6.0, 2006.
[18] http://www.snort.org/dl.
[19] R. Sommer and V. Paxson, Exploiting independent state for network
intrusion detection, ACSAC, 2005.
[20] H. Song, J. Turner, and J. Lockwood, Shape shifting tries for faster IP
route lookup, ICNP, 2005.
[21] H. Song, et al. Snort offloader: A reconfigurable hardware NIDS filter,
FPL 2005.
[22] H. Song and J. Lockwood, Efficient packet classification for network
intrusion detection, FPGA, 2005.
[23] N. Tuck, T. Sherwood, B. Calder and G. Varghese, Deterministic
memory-efficient string matching algorithms for intrusion detection,
INFOCOM, 2004.
[24] S. Wu and U. Manber, Agrep–a fast algorithm for multi-pattern searching, Technical Report, Department of Computer Science, University of
Arizona, 1994.
[25] M. Yazdani, W. Fraczak, F. Welfeld, and I. Lambadaris, Two level state
machine architecture for content inspection engines, INFOCOM 2006.
[26] F. Yu and R. Katz, Efficient multi-match packet classification with
TCAM.