What About Scan Data?

Analyzing scan data as part of a defense in depth solution to the high bandwidth intrusion detection problem


i. Abstract

-- Guarding a large, high-bandwidth environment from attackers is very difficult

-- Analyzing scan data to help determine what alarms are worth looking at reduces the workload for the analyst

-- These ideas have been modeled in a university setting and initial indications are positive

I. Introduction

A. Difficulty of operating a NIDS in a large scale environment

1. Amount of data vs. small # of qualified people

(a) large number of hosts on a class A or class B network space

or even in a CIDR block

(b) amount of data that can pass through an OC3 or several T3’s

in a day. (analogy to relate amount of data passing to something

comprehensible)

(c) Types of organizations that have large pipes, what they could

lose if their networks aren’t protected

(d) few qualified security people in general, even smaller number of

network intrusion detection people. (analogy of small # of

policemen in a city the size of Baltimore and the number of

crimes that are committed)
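For item (b), a rough sense of scale can be worked out from nominal line rates. The sketch below assumes a fully saturated link, with OC-3 taken as 155.52 Mbit/s and T3 as 44.736 Mbit/s. Real links carry framing overhead and rarely run at 100% utilization, so the results are upper bounds; the function name and parameters are illustrative only.

```python
# Nominal line rates in bits per second. Usable payload rates are somewhat lower.
LINE_RATE_BPS = {"T3": 44_736_000, "OC-3": 155_520_000}
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(link, utilization=1.0, count=1):
    """Upper bound on bytes carried per day by `count` links at `utilization`."""
    return LINE_RATE_BPS[link] * utilization * count * SECONDS_PER_DAY / 8


if __name__ == "__main__":
    for link in LINE_RATE_BPS:
        print(f"{link}: {bytes_per_day(link) / 1e12:.2f} TB/day at full load")
```

Even at a fraction of full load, a single OC-3 hands the analyst far more data per day than a handful of people can read, which is the point of the policeman analogy in (d).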

2. Ancillary data like P2P traffic, chat traffic, etc. (difficult key words)

(a) network bandwidth is eaten up by non-mission-related traffic;

even more grey for the ISP’s where really everything is mission

related

(b) difficult to sort the traffic in a high speed manner, esp when

you have to re-assemble the TCP sessions before you can

figure out what’s inside them

(c) people are always trying to get around local security policy

by changing the default ports for their applications

3. In general poor tools designed for small networks

(a) IDS’s are really the only way to analyze the incoming

traffic at the packet level

(b) merely examining the header information (5-tuple/Cisco

NetFlow) is not enough to determine malicious intent

(c) firewalls that auto-respond to block traffic aren’t sufficient.

Firewalls that maintain state can barely keep up with the kind

of bandwidth that a large installation may control

(d) most of the IDS solutions deployed today are designed for

small networks. Rated to process data at the T1-T3 rate

(list bandwidth capabilities of snort, dragon, netranger,

real-secure, & intrusion.com)

(e) tying a bunch together causes correlation problems

B. Break problem down with a defense in depth strategy.

1. Router ACLs -> Firewalls -> NIDS -> Host based stuff

(a) Defense in depth is the only way to handle the high-bandwidth

problem. Don’t put all eggs in one basket

(b) Routers can control network flow based on a very granular

filtering mechanism at the port, protocol, or IP level

(c) constant fight between those trying to secure the network

and those trying to provide services to the customers of

the network

(d) compromise is usually the firewall. Basically another filtering

device that often can provide some form of state maintenance

(e) NIDS is usually the next step. A NIDS can do anomaly or

signature based detection. Sigs and anomaly can be tricked,

flooded with false alarms, DoS’d, hacked (mention SNORT

bugs). Full time job of an alarm analyst to keep up with the

constant new threats arriving. False positives are the greatest

problem with NIDS. They can generate tons of useless data.

NIDS is good because it looks at all (err.. most) traffic as it’s

on its way to doing something nefarious. A quick analyst could

take action based on a NIDS alert/alarm and possibly prevent

excessive damage from occurring against his/her system

(f) Host based strategies like log analysis, e.g. Tripwire etc., are old-

fashioned. They only catch things that the stupid hackers forget

to clear out of the logs before they leave. Easier to thwart, or

fool. Similar to NIDS in that logs can contain reams of

benign data, making it difficult for the tools and the analyst to

sort through looking for the useful problems

C. Target the NIDS part and try to make it better by focusing on the most

important alerts

1. Best part about NIDS is that it can be completely hidden (passive) from the

intruder. In some respects it has the best chance of catching a bad guy, because he won’t know it’s there until he gets caught

2. NIDS analyst doesn’t have to fight with network dorks

3. Biggest problem with NIDS is false alarms

4. Use Scans to predict what alerts to watch for

5. Scans can be easily discovered for the most part. Everything except low-

and-slow is easy to pick out and figure out who the perp is and who the

victim is. Also a good IDS should be able to check for responses from

the victim hosts allowing the NIDS analyst to see exactly what the perp

sees as the result of his/her scanning.
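The claim in item 5 can be made concrete with a simple threshold detector. This is a minimal sketch, assuming connection attempts are already available as `(timestamp, src_ip, dst_ip, dst_port)` tuples; the function name, the 60-second window, and the threshold of 20 targets are illustrative assumptions, not values taken from any particular NIDS.

```python
from collections import defaultdict


def find_scanners(events, window=60.0, threshold=20):
    """Return {source: set of targets} for sources that touch at least
    `threshold` distinct (dst_ip, dst_port) pairs within `window` seconds.

    `events` is an iterable of (timestamp, src_ip, dst_ip, dst_port) tuples.
    """
    by_src = defaultdict(list)
    for ts, src, dst, port in sorted(events):
        by_src[src].append((ts, (dst, port)))

    scanners = {}
    for src, hits in by_src.items():
        start = 0
        for end in range(len(hits)):
            # Slide the left edge forward until the window spans <= `window` seconds.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            targets = {target for _, target in hits[start:end + 1]}
            if len(targets) >= threshold:
                # Report everything this source touched: the list of victims.
                scanners[src] = {target for _, target in hits}
                break
    return scanners
```

A low-and-slow scanner that spaces its probes further apart than `window` never accumulates enough targets inside one window, which is exactly why that class of scan is the exception noted above.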

D. Explain Hacker methodology – all attacking takes the following form even

DoS

1. Scan – can’t rob a house until you know where it is

2. Initial compromise – need to find an initial access to the house i.e. break

a window

3. Privilege escalation – need to find a way to get into the room where the

safe is.

4. Nefarious activity – break open the safe and start to carry the stock

certificates away

5. Attack relay – look out the window at other houses on the street or look

through the address book found in the kitchen drawer for other people

to rob. Or impersonate the house owners to trick the neighbors into

letting you into their house

E. Signpost for rest of paper

1. Statement of problem – the problem I’m trying to solve is to reduce the

workload of the NIDS analyst and possibly provide some predictive

element as to what could be attacked, and what it could be a victim

of, and possibly a categorization of the skill of the attacker

2. Abbreviated, innovative solution – Solution is to capture as much

data about scanning as possible to allow a determination to be made

as laid out above

(a) UMBC network used as an example. – SNORT logs were

analyzed and patterns were noticed that alert the analyst

of impending doom.

(b) perl scripts were used to analyze data and were graphed

to give the analyst a picture of what might happen soon

II. Previous related work – brief summary & why their papers fall short

A. Seminal papers

1. On a difficulty of Intrusion Detection – Stefan Axelsson

2. Computer Security Threat Monitoring and Surveillance – James

Anderson

3. Data mining for Network Intrusion Detection – Kumar, Lazarevic,

Srivastava

4. ??Teresa Hunt paper ??

5. ?? Dorothy Denning paper ??

B. Closely related papers

1. Analysis of Low & Slow Network Scans – Keith Scott

2. Analyzing the past… predicting the future – HoneyNet Project

3. Base-rate fallacy – Stefan Axelsson

III. Describe Scanning in depth

i. brief review of attack methodology – i.e. just remind the 5 steps

ii. legal status of scanning

A. Brief Network tutorial explaining how TCP/IP lends itself to scanning from a

protocol level

1. All of the TCP/IP protocols are designed to transfer data between nodes

some of them are designed to do this reliably (TCP) some unreliably

(UDP, ICMP)

2. Even though there are standards and RFCs that clearly lay out what all

the actors are supposed to do in every case, individual vendors often

take quite a bit of liberty with the way they respond to network

events, allowing perps to do OS detection

(a). TCP – has the built in 3 way handshake. All computers

must respond to it or the reliable connection cannot be

established. This allows a perp to initiate action that the

victim host must respond to if it is able.

(b). UDP – Even though UDP is a connection-less protocol it can

still be probed for responses because certain services respond

immediately when a packet is sent to them (echo/chargen etc.)

Additionally, ICMP is designed to report error messages, allowing

attackers to do a sort of inverse scan

(c). ICMP – many ICMP messages are auto responded to.

Classic example of course is the ping message
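Items (a) through (c) boil down to a small lookup: each probe type draws a characteristic reply, and the reply (or silence) tells the scanner something. The table below is a sketch of that reasoning for a standards-following host; the string labels and the function name are hypothetical, and real stacks deviate from this in ways that OS fingerprinting exploits.

```python
def interpret_response(probe, reply):
    """Map a (probe, reply) pair to what a scanner can infer from it.

    probe: "tcp_syn", "udp", or "icmp_echo"
    reply: a short label for what came back, or None for no response
    """
    table = {
        # TCP: the three-way handshake obliges a listening port to answer.
        ("tcp_syn", "syn_ack"): "port open",
        ("tcp_syn", "rst"): "port closed, host up",
        ("tcp_syn", None): "filtered or host down",
        # UDP: no handshake, so a closed port is revealed by the ICMP error.
        ("udp", "udp_data"): "port open",
        ("udp", "icmp_port_unreachable"): "port closed, host up",
        ("udp", None): "open or filtered",
        # ICMP: echo request / echo reply is the classic ping.
        ("icmp_echo", "echo_reply"): "host up",
        ("icmp_echo", None): "host down or ICMP filtered",
    }
    return table.get((probe, reply), "unknown")
```

The UDP rows show the inverse scan idea: the scanner learns which ports are closed from the ICMP errors and infers the rest.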

B. Types of Scans – explain how scan is perpetrated (i.e. what part of the TCP/IP

protocol is being abused to allow scan to happen), give an example of the type

of information that the attacker is looking for with this scan, explain what can

be done with this type of information, point out how sophisticated the attacker

has to be to use this type of scan

1. Half-open SYN scan

2. Null Host scan

3. OS fingerprint scan

4. Grim’s ping

5. etc…
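Several TCP scan types differ only in which flag bits the probe sets, so a first-pass classifier can work from the flags byte alone. The sketch below uses the standard TCP flag bit values; the scan names follow common Nmap usage, which is an assumption about what the list above intends, and the function name is illustrative.

```python
# Standard TCP flag bits (low six bits of the flags byte).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def classify_probe(flags):
    """Guess the scan technique from the TCP flags of an unsolicited packet."""
    flags &= 0x3F  # ignore ECN bits
    if flags == 0:
        return "null"   # no flags at all: never valid in normal traffic
    if flags == SYN:
        return "syn"    # half-open scan, or simply a normal connection attempt
    if flags == FIN:
        return "fin"    # FIN without an established session
    if flags == (FIN | PSH | URG):
        return "xmas"   # "lit up like a Christmas tree"
    if flags == ACK:
        return "ack"    # often used to map firewall rules
    return "other"
```

A lone SYN is ambiguous on its own; it only looks like a half-open scan when the sender never completes the handshake, so this classifier is a starting point rather than a verdict.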

C. Common tools used to scan

1. Commonality of all scanning tools – ease of use (availability, gui etc.)

2. Custom built tools, easy to write a quick perl script that can do some

scanning

3. Nessus, SATAN, QueSO, Nmap, etc.

IV. Making use of NIDS Scanning Alerts / Logs

A. How Alerts are generated in NIDS – 2 main types of NIDS signature vs.

anomaly. If a set of inputs matches a particular rule or doesn’t match a rule

depending on the context, then fire off an alert/alarm

1. Generic NIDS description

(a) Alert level – usually 1-5 allows the NIDS analyst to sort which

alerts he will pay attention to. Usually the lower level alerts get

totally ignored. Scan alerts are usually considered lower level

alerts

(b) False positives – are when an alert fires on a particular input

even though that input is not actually associated with malicious

activity. this is really one of the most annoying parts of

analyzing NIDS alerts. NIDS have lots of false positives

because unlike a human they don’t usually have any context

within which to judge a particular event

(c) False negatives – These are very dangerous to the network

being defended because these are malicious events that

don’t get alarmed on. Most frequently happens when a brand

new attack comes out that there is no signature for yet.

(d) Anomaly vs. Signature –

Anomaly is good for some things / bad for others. It is good for things

like finding large scans or DoS’s but it’s not very good for

finding virus or worm payloads coming across the wire, or

for activity that is malicious in nature, but not out of the

ordinary for a particular protocol or service. i.e. if someone

already knows what a password is for a particular server

then they won’t look out of the ordinary when they log in, not

until you look at exactly what they’re doing (like executing a

buffer overflow on the remote system or changing the logs) will

you realize that something bad is going on.

Signature is good for some things / bad for others.

Signature is very good for finding things like viruses or worms

coming across the wire. Very good for tracking down company

policy violators (like surfing for kiddie-porn). Great for

catching someone in the act of doing something bad before they

do something worse. Signature based systems are really bad when

there’s tons of traffic flying around. They’re also horrible for

new never-before-seen attacks.
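The false positive problem in (b) has a simple arithmetic core, which is the base-rate argument cited in Section II: when malicious events are rare, even a small false alarm rate swamps the real alerts. The sketch below applies Bayes' rule; the function name is illustrative and the numbers in the usage example are made up for demonstration, not measurements from any network.

```python
def alert_precision(base_rate, detection_rate, false_alarm_rate):
    """Probability that an alert corresponds to a real intrusion.

    base_rate:        fraction of events that are actually malicious
    detection_rate:   P(alert | malicious)
    false_alarm_rate: P(alert | benign)
    """
    true_alerts = detection_rate * base_rate
    false_alerts = false_alarm_rate * (1.0 - base_rate)
    total = true_alerts + false_alerts
    if total == 0:
        raise ValueError("no alerts would ever fire with these rates")
    return true_alerts / total


if __name__ == "__main__":
    # Hypothetical rates: 1 malicious event in 100,000, 99% detection,
    # 0.1% false alarms.
    p = alert_precision(1e-5, 0.99, 0.001)
    print(f"share of alerts that are real: {p:.2%}")
```

With rates like these, roughly one alert in a hundred is real, which is why narrowing the alert stream before it reaches the analyst matters so much.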

2. SNORT specific description – SNORT is described here because that’s

what UMBC uses to protect their network

(a) SNORT has several preprocessors that look through the

incoming traffic and decide what is going on in the packets

to try to determine maliciousness. These preprocessors are

designed to handle things like TCP reassembly and

scan detection. These preprocessors are then able to put out alarms

(b) SNORT has the scan preprocessor that creates alerts when

certain thresholds are reached. It then puts all of these alerts

in a particular file so that the analyst can look through them

later. Which he/she rarely does

(c) The rest of the SNORT engine can publish alerts/alarms to

a repository that is checked much more frequently by the

analyst. Especially when he/she is looking out for a particular

thing.
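Since the scan file in (b) rarely gets read by a person, a script has to read it instead. The sketch below assumes each line looks like `Mar 16 23:54:27 192.168.1.10:2345 -> 10.1.1.1:21 SYN ******S*` (date, source, destination, scan type, flags). That layout is an assumption modeled on a portscan-style log and the sample line is invented, so the regular expression should be adjusted to whatever the deployed SNORT version actually writes.

```python
import re
from collections import defaultdict

# Assumed line layout: "Mon DD HH:MM:SS src:sport -> dst:dport KIND FLAGS"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<src>\d{1,3}(?:\.\d{1,3}){3}):(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>\d{1,3}(?:\.\d{1,3}){3}):(?P<dport>\d+)\s+(?P<kind>\S+)"
)


def parse_scan_line(line):
    """Return a dict of fields for one scan log line, or None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["sport"] = int(record["sport"])
    record["dport"] = int(record["dport"])
    return record


def targets_by_source(lines):
    """Group parsed records into {src_ip: set of (dst_ip, dst_port)}."""
    targets = defaultdict(set)
    for line in lines:
        record = parse_scan_line(line)
        if record is not None:
            targets[record["src"]].add((record["dst"], record["dport"]))
    return dict(targets)
```

The result names each perp and the hosts and ports it probed, which is the raw material for deciding which of the regular alerts deserve attention.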

B. Parsing logs and Comparing scan logs to other ‘attack’ logs

1. Possible in Real time? – to be predictive one must be keeping track

of everything going on right now! (or as close to now as possible)

the problem is that the more processing you do to a packet before

you decide whether it is malicious or not, the more you slow down your

ability to see it and react to it.

(a) what kind of delay is acceptable – manpower,

dollars, criticality of what you’re protecting

2. Tools for parsing – what you want to do with an IDS like SNORT (or

really any IDS that publishes both scan alerts and regular alerts ;) is look

through the scan alerts as fast as possible and try to figure out where

they match up to any incoming alerts right now. If you want to predict

what may or may not happen in the future you need to figure out who

the perps are and who the victims are as soon as possible so that you

know what to look out for in the coming days, weeks, or months

(a) Perl is a great language for parsing text because it’s designed for

that.
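The matching step described in item 2 can be sketched as a join between two record sets: an alert moves to the front of the queue if its source scanned its destination beforehand. Python is used here only for illustration. The tuple layouts, the function name, and the seven-day lookback are assumptions for the sketch, not a description of the scripts used at UMBC.

```python
from collections import defaultdict


def prioritize(alerts, scans, lookback=7 * 24 * 3600):
    """Split alerts into (preceded_by_scan, others).

    scans:  iterable of (timestamp, src_ip, dst_ip)
    alerts: iterable of (timestamp, src_ip, dst_ip, message)
    An alert is flagged when the same source scanned the same destination
    at most `lookback` seconds before the alert fired.
    """
    scan_times = defaultdict(list)
    for ts, src, dst in scans:
        scan_times[(src, dst)].append(ts)

    flagged, others = [], []
    for alert in alerts:
        ts, src, dst = alert[0], alert[1], alert[2]
        earlier = scan_times.get((src, dst), ())
        if any(0 <= ts - scan_ts <= lookback for scan_ts in earlier):
            flagged.append(alert)
        else:
            others.append(alert)
    return flagged, others
```

Run in batch over yesterday's logs this is cheap; run continuously it raises the real-time question from item 1, since every alert now waits on a lookup against the scan history.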