What About Scan Data?
Analyzing scan data as part of a defense in depth solution to the high bandwidth intrusion detection problem
i. Abstract
-- Guarding a large, high-bandwidth environment from attackers is very difficult
-- Analyzing scan data to help determine what alarms are worth looking at reduces the workload for the analyst
-- These ideas have been modeled in a university setting and initial indications are positive
I. Introduction
A. Difficulty of operating a NIDS in a large scale environment
1. Amount of data vs. small # of qualified people
(a) large number of hosts on a class A or class B network space
or even in a CIDR block
(b) amount of data that can pass through an OC3 or several T3s
in a day (analogy to relate amount of data passing to something
comprehensible)
(c) Types of organizations that have large pipes, what they could
lose if their networks aren’t protected
(d) few qualified security people in general, and an even smaller
number of network intrusion detection people (analogy of small # of
policemen in a city the size of Baltimore and the number of
crimes that are committed)
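As a rough aid for items (a) and (b) above, the address-space sizes and daily byte volumes can be worked out directly. The sketch below assumes the nominal line rates (OC3 at 155.52 Mbit/s, T3 at 44.736 Mbit/s) and a fully saturated link, which overstates real traffic. The function names are illustrative only.

```python
import ipaddress

# Nominal line rates in bits per second (assumption: link fully saturated).
OC3_BPS = 155_520_000
T3_BPS = 44_736_000
SECONDS_PER_DAY = 86_400


def addresses_in(cidr):
    """Number of addresses in a CIDR block (a former class A is a /8)."""
    return ipaddress.ip_network(cidr).num_addresses


def bytes_per_day(bits_per_second):
    """Upper bound on bytes carried in one day at the given line rate."""
    return bits_per_second * SECONDS_PER_DAY // 8


if __name__ == "__main__":
    print("class A (/8) addresses: ", addresses_in("10.0.0.0/8"))
    print("class B (/16) addresses:", addresses_in("172.16.0.0/16"))
    print("OC3 bytes per day:      ", bytes_per_day(OC3_BPS))
    print("T3 bytes per day:       ", bytes_per_day(T3_BPS))
```

Under these assumptions a single OC3 tops out at roughly 1.68 terabytes per day, and a class A block holds about 16.8 million addresses.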
2. Ancillary data like P2P traffic, chat traffic, etc. (difficult key words)
(a) network bandwidth is eaten up by non-mission-related traffic;
even more grey for the ISPs, where really everything is mission
related
(b) difficult to sort the traffic at high speed, especially when
you have to re-assemble the TCP sessions before you can
figure out what’s inside them
(c) people are always trying to get around local security policy
by changing the default ports for their applications
3. In general, poor tools: most were designed for small networks
(a) IDS’s are really the only way to analyze the incoming
traffic at the packet level
(b) merely examining the header information (5-tuple/Cisco
NetFlow) is not enough to determine malicious intent
(c) firewalls that auto-respond to block traffic aren’t sufficient.
Firewalls that maintain state can barely keep up with the kind
of bandwidth that a large installation may control
(d) most of the IDS solutions deployed today are designed for
small networks and rated to process data at the T1-T3 rate
(list bandwidth capabilities of SNORT, Dragon, NetRanger,
RealSecure, & Intrusion.com)
(e) tying a bunch of them together causes correlation problems
B. Break problem down with a defense in depth strategy.
1. Router ACLs -> Firewalls -> NIDS -> Host based stuff
(a) Defense in depth is the only way to handle the high-bandwidth
problem. Don’t put all eggs in one basket
(b) Routers can control network flow based on a very granular
filtering mechanism at the port, protocol, or IP level
(c) constant fight between those trying to secure the network
and those trying to provide services to the customers of
the network
(d) compromise is usually the firewall. Basically another filtering
device that often can provide some form of state maintenance
(e) NIDS is usually the next step. A NIDS can do anomaly or
signature based detection. Both kinds can be tricked, flooded
with false alarms, DoS’d, or hacked (mention SNORT bugs). It is
a full time job for an alarm analyst to keep up with the
constant new threats arriving. False positives are the greatest
problem with NIDS. They can generate tons of useless data.
NIDS is good because it looks at all (err.. most) traffic while
that traffic is on its way to doing something nefarious. A quick
analyst could take action based on a NIDS alert/alarm and possibly
prevent excessive damage from occurring against his/her system
(f) Host based strategies like log analysis, Tripwire, etc. are
old-fashioned. They only catch things that careless hackers forget
to clear out of the logs before they leave. Easier to thwart or
fool. Similar to NIDS in that logs can contain reams of
benign data, making it difficult for the tools and the analyst to
sort through looking for the real problems
C. Target the NIDS part and try to make it better by focusing on the most
important alerts
1. Best part about NIDS is that it can be completely hidden (passive) to the
intruder. In some respects it has the best chance of catching a bad guy,
because he won’t know it’s there until he gets caught
2. NIDS analyst doesn’t have to fight with the network operations staff
3. Biggest problem with NIDS is false alarms
4. Use Scans to predict what alerts to watch for
5. Scans can be easily discovered for the most part. Everything except
low-and-slow scans is easy to pick out, along with who the perp is and
who the victim is. Also a good IDS should be able to check for responses
from the victim hosts, allowing the NIDS analyst to see exactly what the
perp sees as the result of his/her scanning.
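One simple way to pick out the easy scans described in item 5 is to count how many distinct host/port targets each source touches inside a short time window. The sketch below is only an illustration of that idea, not the method used by any particular IDS. The event tuple layout, the 60-second window, and the threshold of 20 targets are all assumptions. A low-and-slow scan stays under the threshold by design, which is why it is the hard case.

```python
from collections import defaultdict, deque


def find_scanners(events, window=60.0, threshold=20):
    """Flag sources that probe many distinct targets in a short time.

    events: iterable of (timestamp, src_ip, dst_ip, dst_port) tuples,
            sorted by timestamp (hypothetical layout).
    Returns {src_ip: set of victim dst_ips} for every source that
    touched at least `threshold` distinct (dst_ip, dst_port) pairs
    within `window` seconds.
    """
    recent = defaultdict(deque)   # src_ip -> probes still inside the window
    scanners = defaultdict(set)   # src_ip -> victims seen while over threshold
    for ts, src, dst, dport in events:
        probes = recent[src]
        probes.append((ts, dst, dport))
        # Drop probes that have aged out of the window.
        while probes and ts - probes[0][0] > window:
            probes.popleft()
        targets = {(d, p) for _, d, p in probes}
        if len(targets) >= threshold:
            scanners[src].update(d for d, _ in targets)
    return dict(scanners)


if __name__ == "__main__":
    fast = [(float(i), "10.0.0.9", "192.168.1.1", i) for i in range(1, 26)]
    print(find_scanners(fast))
```

The returned mapping gives the analyst both the perp (the key) and the victims (the value) in one pass over the data.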
D. Explain Hacker methodology – all attacks take the following form, even
DoS
1. Scan – can’t rob a house until you know where it is
2. Initial compromise – need to find an initial access to the house i.e. break
a window
3. Privilege escalation – need to find a way to get into the room where the
safe is.
4. Nefarious activity – break open the safe and start to carry the stock
certificates away
5. Attack relay – look out the window at other houses on the street or look
through the address book found in the kitchen drawer for other people
to rob. Or impersonate the house owners to trick the neighbors into
letting you into their house
E. Signpost for rest of paper
1. Statement of problem – the problem I’m trying to solve is to reduce the
workload of the NIDS analyst and possibly provide some predictive
element as to what could be attacked, and what it could be a victim
of, and possibly a categorization of the skill of the attacker
2. Abbreviated, innovative solution – Solution is to capture as much
data about scanning as possible to allow a determination to be made
as laid out above
(a) UMBC network used as an example. – SNORT logs were
analyzed and patterns were noticed that alert the analyst
of impending doom.
(b) Perl scripts were used to analyze the data, and the results
were graphed to give the analyst a picture of what might happen soon
II. Previous related work – brief summary & why their papers fall short
A. Seminal papers
1. On a difficulty of Intrusion Detection – Stefan Axelsson
2. Computer Security Threat Monitoring and Surveillance – James
Anderson
3. Data mining for Network Intrusion Detection – Kumar, Lazarevic,
Srivastava
4. ?? Teresa Lunt paper ??
5. ?? Dorothy Denning paper ??
B. Closely related papers
1. Analysis of Low & Slow Network Scans – Keith Scott
2. Analyzing the past… predicting the future – HoneyNet Project
3. Base-rate fallacy – Stefan Axelsson
III. Describe Scanning in depth
i. brief review of attack methodology – i.e. just remind the reader of the 5 steps
ii. legal status of scanning
A. Brief Network tutorial explaining how TCP/IP lends itself to scanning from a
protocol level
1. All of the TCP/IP protocols are designed to transfer data between nodes;
some of them are designed to do this reliably (TCP), some unreliably
(UDP, ICMP)
2. Even though there are standards and RFCs that clearly lay out what all
the actors are supposed to do in every case, individual vendors often
take quite a bit of liberty with the way they respond to network
events, allowing perps to do OS detection
(a) TCP – has the built in 3 way handshake. All computers
must respond to it or the reliable connection cannot be
established. This allows a perp to initiate action that the
victim host must respond to if it is able.
(b) UDP – Even though UDP is a connection-less protocol it can
still be probed for responses because certain services respond
immediately when a packet is sent to them (echo/chargen etc.).
Additionally, ICMP is designed to report error messages (e.g. port
unreachable for a closed UDP port), allowing attackers to do a
sort of inverse scan
(c) ICMP – many ICMP messages are responded to automatically.
Classic example of course is the ping message
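The protocol behavior in (a) through (c) amounts to a lookup from "what was sent and what came back" to "what the perp learns". The sketch below writes that lookup down for the common cases on a standards-following host. The probe and reply labels are made-up names for illustration, and real stacks deviate from this table, which is exactly what OS fingerprinting exploits.

```python
# Keys are (probe sent, reply observed); None means no reply at all.
# All labels are hypothetical names chosen for this sketch.
INFERENCES = {
    ("tcp_syn", "syn_ack"): "port open",
    ("tcp_syn", "rst"): "port closed",
    ("tcp_syn", None): "filtered or host down",
    ("udp", "udp_data"): "port open",
    ("udp", "icmp_port_unreachable"): "port closed",
    ("udp", None): "open or filtered",
    ("icmp_echo", "echo_reply"): "host up",
    ("icmp_echo", None): "host down or ICMP blocked",
}


def classify_reply(probe, reply):
    """Return what a scanner can infer from one probe/reply pair."""
    return INFERENCES.get((probe, reply), "unexpected reply")


if __name__ == "__main__":
    for (probe, reply), meaning in INFERENCES.items():
        print(f"{probe:10} {str(reply):22} -> {meaning}")
```

The UDP rows show the inverse scan: the closed ports announce themselves through ICMP errors, so the silent ports are the interesting ones.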
B. Types of Scans – explain how scan is perpetrated (i.e. what part of the TCP/IP
protocol is being abused to allow scan to happen), give an example of the type
of information that the attacker is looking for with this scan, explain what can
be done with this type of information, point out how sophisticated the attacker
has to be to use this type of scan
1. Half – Open SYN scan
2. Null Host scan
3. OS fingerprint scan
4. Grim’s ping
5. etc…
C. Common tools used to scan
1. Commonality of all scanning tools – ease of use (availability, gui etc.)
2. Custom built tools, easy to write a quick Perl script that can do some
scanning
3. Nessus, SATAN, Queso, Nmap, etc.
IV. Making use of NIDS Scanning Alerts / Logs
A. How Alerts are generated in NIDS – 2 main types of NIDS: signature vs.
anomaly. If a set of inputs matches a particular rule, or fails to match
one, depending on the context, then fire off an alert/alarm
1. Generic NIDS description
(a) Alert level – usually 1-5 allows the NIDS analyst to sort which
alerts he will pay attention to. Usually the lower level alerts get
totally ignored. Scan alerts are usually considered lower level
alerts
(b) False positives – when an alert fires on a particular input
even though that input is not actually associated with malicious
activity. This is really one of the most annoying parts of
analyzing NIDS alerts. NIDS have lots of false positives
because unlike a human they don’t usually have any context
within which to judge a particular event
(c) False negatives – These are very dangerous to the network
being defended because these are malicious events that
don’t get alarmed on. Most frequently happens when a brand
new attack comes out that there is no signature for yet.
(d) Anomaly vs. Signature –
Anomaly is good for some things / bad for others. It is good
for finding large scans or DoS’s, but it’s not very good for
finding virus or worm payloads coming across the wire, or
for activity that is malicious in nature but not out of the
ordinary for a particular protocol or service. E.g. if someone
already knows the password for a particular server,
then they won’t look out of the ordinary when they log in. Not
until you look at exactly what they’re doing (like executing a
buffer overflow on the remote system or changing the logs) will
you realize that something bad is going on.
Signature is good for some things / bad for others.
Signature is very good for finding things like viruses or worms
coming across the wire. Very good for tracking down company
policy violators (like surfing for kiddie-porn). Great for
catching someone in the act of doing something bad before they
do something worse. Signature based systems are real bad when
there’s tons of traffic flying around. They’re also horrible for
new, never-before-seen attacks.
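The false-positive burden from item (b) above can be made concrete with Bayes' rule, which is the heart of the base-rate fallacy argument. The sketch below is a worked example under purely hypothetical numbers (1 malicious event per 100,000, a 99% detection rate, a 1% false alarm rate). None of these figures come from UMBC data.

```python
def prob_attack_given_alert(base_rate, detection_rate, false_alarm_rate):
    """Bayes' rule: P(attack | alert).

    base_rate        -- P(attack), fraction of events that are malicious
    detection_rate   -- P(alert | attack)
    false_alarm_rate -- P(alert | no attack)
    """
    true_alerts = detection_rate * base_rate
    false_alerts = false_alarm_rate * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the effect.
    p = prob_attack_given_alert(base_rate=1e-5,
                                detection_rate=0.99,
                                false_alarm_rate=0.01)
    print(f"P(attack | alert) = {p:.6f}")
```

With these assumed inputs only about one alert in a thousand corresponds to a real attack, which is why an extra signal such as prior scanning is useful for deciding which alerts to read first.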
2. SNORT specific description – SNORT is described here because that’s
what UMBC uses to protect their network
(a) SNORT has several preprocessors that look through the
incoming traffic and decide what is going on in the packets
to try to determine maliciousness. These preprocessors are
designed to handle things like TCP reassembly and scan
detection, and they are able to put out alarms of their own
(b) SNORT has a scan preprocessor that creates alerts when
certain thresholds are reached. It then puts all of these alerts
in a particular file so that the analyst can look through them
later, which he/she rarely does.
(c) The rest of the SNORT engine can publish alerts/alarms to
a repository that is checked much more frequently by the
analyst. Especially when he/she is looking out for a particular
thing.
B. Parsing logs and Comparing scan logs to other ‘attack’ logs
1. Possible in Real time? – to be predictive one must be keeping track
of everything going on right now! (or as close to now as possible)
The problem is that the more processing you do to a packet before
you decide whether it is malicious or not, the more you slow down
your ability to see it and react to it.
(a) what kind of delay is acceptable – manpower,
dollars, criticality of what you’re protecting
2. Tools for parsing – what you want to do with an IDS like SNORT (or
really any IDS that publishes both scan alerts and regular alerts) is look
through the scan alerts as fast as possible and try to figure out where
they match up to any incoming alerts right now. If you want to predict
what may or may not happen in the future you need to figure out who
the perps are and who the victims are as soon as possible so that you
know what to look out for in the coming days, weeks, or months
(a) Perl is a great language for parsing text because it’s designed for
that.
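The matching step described in item 2 can be sketched in a few lines. The outline's own scripts were written in Perl; the sketch below shows the same idea in Python. The one-line log layout ("epoch-seconds src -> dst message") is a hypothetical simplification, not SNORT's actual output format, and the function names are illustrative. It keeps only the alerts whose source/victim pair was already seen in the scan log at an earlier or equal time.

```python
import re

# Hypothetical log line: "<epoch seconds> <src ip> -> <dst ip> <message>"
LINE = re.compile(
    r"^(?P<ts>\d+)\s+(?P<src>\d+\.\d+\.\d+\.\d+)\s+->\s+"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\s*(?P<msg>.*)$"
)


def parse(lines):
    """Yield (timestamp, src, dst, message) for each well-formed line."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            yield int(m["ts"]), m["src"], m["dst"], m["msg"]


def prioritize(scan_lines, alert_lines):
    """Return alerts whose (src, dst) pair was scanned beforehand."""
    first_scan = {}
    for ts, src, dst, _ in parse(scan_lines):
        key = (src, dst)
        first_scan[key] = min(ts, first_scan.get(key, ts))
    hits = []
    for ts, src, dst, msg in parse(alert_lines):
        scanned_at = first_scan.get((src, dst))
        if scanned_at is not None and scanned_at <= ts:
            hits.append((ts, src, dst, msg))
    return hits


if __name__ == "__main__":
    scans = ["100 1.2.3.4 -> 10.0.0.5 SYN"]
    alerts = ["200 1.2.3.4 -> 10.0.0.5 example alert",
              "210 9.9.9.9 -> 10.0.0.5 unrelated alert"]
    for hit in prioritize(scans, alerts):
        print(hit)
```

Keeping the earliest scan time per pair makes the lookup a single dictionary access per alert, which matters if the comparison is to run close to real time.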