Deep Packet Inspection
Using Parallel Bloom Filters
Abstract. There
is a class of packet processing applications that inspect
packets deeper than the protocol headers to analyze content.
For instance, network security applications must drop
packets containing certain malicious Internet worms or
computer viruses carried in a packet payload. Content-based
billing systems analyze media files and bill the receiver
based on the material transferred over the network. Content
forwarding applications look at the hypertext transport
protocol headers and distribute the requests among the
servers for load balancing.
Most payload scanning applications
have a common requirement for string matching. For example,
the presence of a string of bytes (or a signature) can identify
the presence of a media file. Well-known Internet worms such
as Nimda, Code Red, and Slammer propagate by sending malicious
executable programs identifiable by certain byte sequences
in packet payloads. Because the location (or offset) of such
strings in the packet payload and their length is unknown,
such applications must be able to detect strings of different
lengths starting at arbitrary locations in the packet payload.
Packet inspection applications,
when deployed at router ports, must operate at wire speeds.
With networking speeds doubling every year, it is becoming
increasingly difficult for software-based packet monitors
to keep up with the line rates. These changes have underscored
the need for specialized hardware-based solutions that are
portable and operate at wire speeds.
We describe a hardware-based
technique using Bloom filters, which can detect strings in
streaming data without degrading network throughput.1 A Bloom
filter is a data structure that stores a set of signatures
compactly by computing multiple hash functions on each member
of the set. This technique queries a database of strings
to check for the membership of a particular string. The answer
to this query can be false positive but never a false negative.
An important property of this data structure is that the
computation time involved in performing the query is independent
of the number of strings in the database provided the memory
used by the data structure scales linearly with the number
of strings stored in it. Furthermore, the amount of storage
required by the Bloom filter for each string is independent
of its length.
Our hardware implementation
groups signatures according to their length (in bytes) and
stores each group of strings in a unique Bloom filter. Each
Bloom filter scans the streaming data and checks the strings
of corresponding length. Whenever a Bloom filter detects
a suspicious string, an analyzer probes this string to decide
whether it indeed belongs to the given set of strings or
is a false positive.
Here, we present the architecture
of a hardware-based Bloom filter implementation and analyze
its performance. The analysis shows that an implementation
with field-programmable gate arrays (FPGAs) can support multigigabit
per second line speeds while scanning for more than 10,000
strings.
|