APPROVAL SHEET
Title of Thesis: What About Scan Data? Analyzing Scan Data as part of a “Defense in Depth” Solution to the High Bandwidth Intrusion Detection Problem
Name of Candidate: Douglas Cress
Master of Science, 2003
Thesis and Abstract Approved: ______________________________
Dr. Charles Nicholas
Computer Science
Date Approved:___________
Curriculum Vitae
Name: Douglas Grover Cress.
Permanent Address: 1517 Wyncote Circle, Severn, MD 21144.
Degree and date to be conferred: M.S., 2003.
Date of Birth: May 29, 1977.
Place of Birth: Ft. Hood, Texas
Secondary Education:
Robinson Secondary School, Fairfax, Virginia, June 1995
Collegiate Institutions Attended:
September 2002 – August 2003, University of Maryland Baltimore County, M.S., Aug. 2003.
September 1995 – May 1999, James Madison University, B.S., May 1999.
Major: Computer Science.
Minor: Mathematics.
Professional publications:
None.
Professional Positions Held:
Computer Scientist, Department of Defense,
9800 Savage Rd., Ft. Meade, MD 20755.
ABSTRACT
Title of Thesis: What About Scan Data?
Analyzing Scan Data as Part of a “Defense in Depth” Solution to the High Bandwidth Intrusion Detection Problem
Douglas Grover Cress, Master of Science, 2003
Thesis directed by: Dr. Charles Nicholas, Computer Science
People the world over need the ability to share ideas and information. Various enclaves of people have fulfilled this need by connecting themselves to each other through an ever-expanding computer network architecture, known as the Global Internet.
While the majority of the populace benefits from this new information utopia, there is a black underbelly of the new cyber-society that seeks to exert its control over any and every part of the network. These so-called ‘hackers’ endeavor to bring digital anarchy to all parts of the Internet for reasons running the gamut from self-fulfillment, to greed, to a simple desire to exert power over others. These malefactors are able to take advantage of the expanding bandwidth deployed throughout the Internet to hide their nefarious activities within the benign traffic of the masses, like a pickpocket on the crowded streets of New York City.
One of the techniques employed by hackers is “scanning.” They can use a computer under their control to test potential access points on other remote computers – as fast as hundreds of tests per second. These scans for weak access points can be detected. This thesis extends the state of the art in high-bandwidth intrusion detection through the analysis of scan data that is already collected and ignored by a plethora of Network Intrusion Detection Systems (NIDS) deployed throughout the Internet. By uncovering the scan perpetrators and victims from within the NIDS published scan alerts and only spending time analyzing NIDS published attack alerts related to those uncovered IP addresses, the intrusion analyst can save thousands of hours and possibly catch hundreds more actual intrusions. This analysis method is shown to be effective for analysis of both long-term trend data and more real-time intrusion analysis by examining a major university’s NIDS logs and pointing out more reliable avenues of intrusion sleuthing.
What About Scan Data?
Analyzing Scan Data as Part of a “Defense in Depth” Solution to the High Bandwidth Intrusion Detection Problem
by
Douglas Grover Cress
Thesis submitted to the Faculty of the Graduate School
of the University of Maryland in partial fulfillment
of the requirements for the degree of
Master of Science
2003
Dedication
This work is dedicated to
my parents for raising me
my wife who supports me
my God who sustains me
Acknowledgement
Thanks to Dr. Nicholas for his help and guidance
TABLE OF CONTENTS
Chapter
1.0 Introduction and Motivation
1.1 High Bandwidth Intrusion Analysis Challenges
1.2 High Bandwidth Intrusion Analysis Solutions
1.3 Hacker Methodology
1.4 Thesis Synopsis
2.0 A Description of Scanning
2.1 Background TCP/IP
2.2 Types of Scans
2.3 Scan Tools
3.0 Analyzing NIDS Alerts
3.1 Generic NIDS Description
3.2 Brief Description of Snort
3.3 Parsing Logs and Comparing Scan Logs to Alert Logs
3.4 Predictive Analysis / Attack Forecasting
4.0 Experiment Description
4.1 UMBC’s Fitness as a Testing Ground
4.2 Long Term / Trend Analysis
4.3 A More Real-Time Illustration
4.4 Tools Created for Analysis
5.0 Conclusions and Future Work
6.0 Bibliography
7.0 Appendix A
scanTop10.pl
checkAlerts2.pl
fit_checkAlerts2_to_excel.pl
1.0 Introduction and Motivation
1.1 High Bandwidth Intrusion Analysis Challenges
Defeating malicious attempts to attack any network is difficult. The attacker has all the advantages of stealth, surprise, tenacity, and often even skill. Defending a network becomes even more difficult as the scale of the network increases. Today’s fast-paced, information-intensive society requires that every member of an organization have a connection to the Internet. This is true from small families, to medium-sized businesses of a few hundred employees, to the largest government organizations employing millions of people. Each of these connected computers provides an opportunity for an attacker to sneak into a network to wreak havoc or steal vital proprietary or classified security information.
Historically, IP addresses were doled out by the InterNIC in class-sized blocks. The entire IPv4 address space is broken down into five classes, including a Class A network assignment of approximately 16 million host addresses for a particular network, a Class B network of 65,536 addresses, and a Class C network of 256 addresses. Classes D and E are special-case classes typically not doled out to organizations for normal use. Protecting even a small Class C network can be a daunting task for a single intrusion analyst. Sadly, in today’s economy, few Class B sized organizations can afford to pay more than one intrusion analyst. Such analysts are typically overworked. Even Class A sized organizations may have only a small team of analysts dedicated to protecting their network from threats both foreign (outside the network) and domestic (inside the network).
The large number of hosts on a network is only half the problem in the large-organization intrusion detection realm. The other major challenge is the high bandwidth (volume of data transmission) that a large set of hosts requires. Even a small Class C sized network can command a bandwidth in the T1-T3 range (1.54 – 45 Mb/s or approximately 16.6 – 486 Gigabytes/day) depending on the applications desired by the network residents. Larger organizations, such as Class B and A size networks, typically require multiple T3s and even OC-3s (155 Mb/s, approximately 1.67 Terabytes/day) or OC-12s (622 Mb/s, approximately 6.72 Terabytes/day) to service their customers. It is nearly impossible to examine such incredibly large bandwidths at line rate (the speed at which data can be passed across the line) to determine whether there are intrusion efforts resident within the data stream. Few modern devices can meaningfully process such large amounts of data. Devices such as routers that can handle the data rate are typically performing only a single function, such as examining the header of a packet to determine its next hop. Inspecting the contents of all the inbound and outbound traffic with a Network Intrusion Detection System (NIDS) at OC-3 or OC-12 rates is just about impossible at current common processor speeds [5].
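The daily volumes quoted above follow directly from the line rates. The short sketch below reproduces that arithmetic, assuming decimal units (1 Mb = 10^6 bits, 1 Gigabyte = 10^9 bytes) and a link that stays saturated for the full 24 hours. The function and table names are illustrative only.

```python
# Convert a sustained line rate in megabits per second into data volume per day.
# Assumptions: decimal units and a link that is fully saturated all day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Line rates as quoted in the text, in Mb/s.
LINE_RATES_MBPS = {
    "T1": 1.54,
    "T3": 45.0,
    "OC-3": 155.0,
    "OC-12": 622.0,
}


def bytes_per_day(rate_mbps: float) -> float:
    """Bytes carried in one day at the given rate (megabits per second)."""
    bits_per_day = rate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8


def gigabytes_per_day(rate_mbps: float) -> float:
    """Same volume expressed in decimal gigabytes (10**9 bytes)."""
    return bytes_per_day(rate_mbps) / 1e9


if __name__ == "__main__":
    for name, rate in LINE_RATES_MBPS.items():
        print(f"{name:6s} {rate:7.2f} Mb/s  ~{gigabytes_per_day(rate):9.1f} GB/day")
```

Under these assumptions the results agree with the figures in the text: about 16.6 and 486 Gigabytes/day for T1 and T3, and about 1.67 and 6.72 Terabytes/day for OC-3 and OC-12.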
Organizations that employ large bandwidth and large numbers of hosts usually wish to take active measures to protect their assets. Large global businesses like telecoms and financial institutions stand to lose billions of dollars if their networks are penetrated and proprietary or financial information is extracted from them. Other examples include large universities, which need not only to protect research and development data, but also to provide a safe environment for students to explore and communicate. Lastly, various world militaries rely on the Internet for transmission of logistics information and command and control. If these assets aren’t protected, soldiers and sailors could perish in combat.
The number of qualified intrusion analysts is fairly small. For example, the leading training certification, offered by the Global Information Assurance Certification (GIAC), has certified only 643 people since Feb. 2000 [19]! This small body of individuals is expected to constantly keep up with all the newly reported vulnerabilities as well as analyze the never-ending stream of audit data produced by their organizations’ network intrusion detection systems, firewalls, or host-based intrusion detection systems.
Large organizations typically employ only a small number of network intrusion analysts to help defend their network. In fact, more often than not, the network administrators, who have more than enough on their plate as it is, are conscripted into this activity. The impact of this condition is easily illustrated by looking at any large metropolitan city. Typically there are only a limited number of policemen and prosecuting attorneys who have the time and energy to catch an ever-changing body of criminals intent on committing greater and sneakier crimes. They are usually overworked and habitually miss crimes that would have been preventable had they had more assistance.
Not only are the number of hosts and the amount of bandwidth major deterrents to good intrusion analysis, but the traffic payload transiting the protected networks unnecessarily complicates an intrusion analyst’s task. Frequently the traffic transiting an organization’s network is not the type of traffic that the network architects designed the network to handle. Most organizations architect their network to handle web, mail and some organization-specific applications. Software like chat (AOL™ Instant Messenger), games (Battlefield 1942™), and, worst of all, P2P (Kazaa, Morpheus) can easily ‘clog’ a network [29], depriving the members of the organization of the bandwidth they need to perform their mission-critical functions.
A network intrusion analyst’s job is complicated by these non-mission related applications in several ways. Most NIDS perform their job by searching the incoming traffic for key words or signatures. These key words frequently come up in applications like chat, resulting in an inordinate number of false positives (detection events resulting in false intrusion alerts). It is difficult, at the high bandwidth rates mentioned above, to parse and prioritize the data streaming across one’s network in order to help the analyst sort out his workload. Lastly, these applications are typically forbidden by the organization. The users of such prohibited applications will frequently do whatever it takes to hide their use of them. For example, illicit users will typically set up their P2P software to communicate over port 80 so as to hide their traffic in the glut of web traffic that normally traverses port 80. Network defenders must often spend much of their protection time tracking down these non-standard port assignments of P2P in order to ferret out the offending applications and remove them from the network.
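The false positive problem described above can be illustrated with a deliberately naive keyword matcher. This is a minimal sketch, not how any particular NIDS is implemented: the signature list, the sample payloads, and the function name are all hypothetical and chosen only to show why plain substring matching fires on harmless chat text.

```python
# Naive payload inspection: flag any payload containing a known keyword.
# The signatures below are hypothetical examples, not rules from a real NIDS.
SIGNATURES = [b"/etc/passwd", b"cmd.exe", b"root"]


def match_signatures(payload: bytes, signatures=SIGNATURES) -> list:
    """Return every signature that appears anywhere in the payload."""
    lowered = payload.lower()
    return [sig for sig in signatures if sig in lowered]


# An HTTP request of the kind a signature is meant to catch.
attack = b"GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0"
# Harmless chat text that happens to contain a keyword.
chat = b"did you get root access on your own lab box yet? lol"
# Ordinary web traffic that matches nothing.
web = b"GET /index.html HTTP/1.0"

if __name__ == "__main__":
    for label, payload in (("attack", attack), ("chat", chat), ("web", web)):
        print(label, match_signatures(payload))
```

The chat line raises an alert even though nothing malicious occurred, and the matcher has no way to tell that alert apart from the genuine one without further context.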
But the straw that most often breaks the network defender’s back is the lack of good tools to help them with their job. Because of the massive amount of even official data for which an intrusion analyst is responsible, finding good tools is often a major challenge. Network defense is a relatively new field [2]. Software used by the intrusion analyst has often been developed for other purposes and then adapted to the network defense arena. Visualization tools in particular are rarely up to the task of displaying the thousands or millions of hosts and associated events that a medium to large network may include. Even common network appliances designed with built-in audit software often fall short. For example, Cisco has recently provided a feature on their routers designed for traffic accounting. This software is called Cisco NetFlow [10], and it reports the 5-tuple (Source IP, Destination IP, Source Port, Destination Port, and Protocol) and time of any session that has passed through the router. Meta-data such as this is often touted as a data-reduction solution that provides an analyst all he or she needs to detect intrusions. Unfortunately, NetFlow records do not provide any of the payload for any of the sessions that have traversed the router. Without this payload, an analyst is hard pressed to prove that any of their conclusions about a particular intrusion event are valid.
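To make the limits of such meta-data concrete, the sketch below models a flow record holding only the 5-tuple and a timestamp, then counts how many distinct destination address and port pairs each source contacted. The record layout, field names, and sample addresses are assumptions made for illustration and do not reflect the actual NetFlow export format.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical flow summary: the 5-tuple plus a start time, no payload."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    start_time: float  # seconds since some reference point


def distinct_targets_per_source(flows):
    """Count the distinct (destination IP, destination port) pairs per source."""
    targets = defaultdict(set)
    for flow in flows:
        targets[flow.src_ip].add((flow.dst_ip, flow.dst_port))
    return {src: len(pairs) for src, pairs in targets.items()}


# Sample data using documentation-reserved address ranges.
flows = [
    FlowRecord("192.0.2.10", "198.51.100.5", 40000, 22, "tcp", 0.0),
    FlowRecord("192.0.2.10", "198.51.100.5", 40001, 23, "tcp", 0.1),
    FlowRecord("192.0.2.10", "198.51.100.5", 40002, 80, "tcp", 0.2),
    FlowRecord("203.0.113.7", "198.51.100.5", 51000, 80, "tcp", 5.0),
]

if __name__ == "__main__":
    print(distinct_targets_per_source(flows))
```

A summary like this can show that one source touched several ports in quick succession, but because the record carries no payload it cannot show what was actually sent, which is the limitation noted above.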
Even NIDS in their default configuration, which are designed from the ground up to help an analyst discover intrusions in the network data, often create more frustration than help for a network defender. Most NIDS are not designed to handle extremely high data rates like those used by today’s Class A and Class B enterprises. Many of today’s commercial grade NIDS claim to be Gigabit capable, but are in truth merely Gigabit enabled, meaning that they can process data at Gigabit line rates, but frequently bog down when particularly large traffic spikes crash against the NIDS. The solution most often suggested to mitigate this problem is to deploy multiple NIDS on a network at different strategic locations in order to spread out the burden on the NIDS [5]. Unfortunately, this approach often creates more problems than it solves, as the number of alerts, both true positive and false positive, increases to the point of overwhelming the analyst’s ability to track down malicious activity.
1.2 High Bandwidth Intrusion Analysis Solutions
The best solution developed so far to help mitigate all the difficulties related to high bandwidth intrusion detection is the defense in depth model. The idea is similar to that of a medieval castle [28]. Such a castle had several defense mechanisms that individually wouldn’t have kept out invaders, but when combined provided a substantial defense. Defensive structures like a moat, a drawbridge, a portcullis, inner and outer walls, sentries and an inner and outer bailey all provided layers of defense which protected the castle and the tenants within as they went about their daily lives.
Today’s network castles are best protected by a layered combination of router access control lists, firewalls, NIDS, host-based IDS, and account passwords. Deploying all of these devices allows a network defender to rely on several avenues of protection instead of placing all of his defensive eggs in one basket. Routers act as the drawbridge in the above illustration. They can provide a single point of entry that is controlled by the network administrator, allowing traffic and communication over only a restricted set of ports and protocols. An example is the Juniper™ line of routers, which provide a host of filtering options [24]. Firewalls are often used in conjunction with a router to more finely control access to a network. Where a router may be too busy routing traffic to keep track of connection state across the network boundary, some firewalls can determine the validity of certain types of traffic based on whether or not a host within the network initiated the connection in question. The CheckPoint firewall is an example of this type of firewall [9]. The Network Intrusion Detection System (NIDS) is similar in function to the sentry watching the travelers coming in and out of the castle, inspecting the contents of their carts and packs for contraband. Depending on the bandwidth of a particular site, a NIDS can examine all of the incoming and outgoing packet traffic and determine whether a particular packet contains malicious or benign code. The ISS RealSecure™ is one of the world’s most popular IDS solutions [23]. The last line of defense is a host-based IDS. Unlike the previous three devices, this defensive element is not a separate device monitoring the network. Instead it is a piece of software that runs on a host, constantly monitoring its audit logs or operating system (OS) execution stack, seeking out malicious actions and alerting the network defender that illicit activity is taking place. Tripwire, from Tripwire.org, is the classic example of this technology [35].
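The connection-state check attributed to firewalls in this section, admitting inbound traffic only when an internal host initiated the conversation, can be sketched as follows. This is a simplified illustration under stated assumptions rather than a description of any vendor's product: the class name, the internal address range, and the decision to ignore TCP flags and state timeouts are all choices made for brevity.

```python
import ipaddress


class StatefulFilter:
    """Toy connection tracker: inbound packets pass only as replies."""

    def __init__(self, internal_network: str):
        # Address block considered "inside" the protected network (assumed).
        self.internal = ipaddress.ip_network(internal_network)
        # Connections initiated from inside, keyed by the outbound 5-tuple.
        self.established = set()

    def _is_internal(self, ip: str) -> bool:
        return ipaddress.ip_address(ip) in self.internal

    def allow(self, src_ip, src_port, dst_ip, dst_port, protocol="tcp") -> bool:
        if self._is_internal(src_ip):
            # Traffic from an internal host: remember it and let it through.
            self.established.add((src_ip, src_port, dst_ip, dst_port, protocol))
            return True
        # Traffic from outside: allow only if it mirrors a recorded connection.
        return (dst_ip, dst_port, src_ip, src_port, protocol) in self.established


if __name__ == "__main__":
    fw = StatefulFilter("10.0.0.0/8")
    print(fw.allow("10.1.2.3", 40000, "198.51.100.5", 80))  # internal host opens a connection
    print(fw.allow("198.51.100.5", 80, "10.1.2.3", 40000))  # reply on that connection
    print(fw.allow("203.0.113.9", 4444, "10.1.2.3", 22))    # unsolicited inbound attempt
```

A router that only inspects headers to pick a next hop keeps no such table, which is why the text treats the firewall as a separate, finer-grained layer of the defense.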