A Large Network Malicious Code Detection System: VDS

PING WANG, XIAO-CHUN YUN, BIN-XING FANG

Research Center of Computer Network and Information Security Technology

Harbin Institute of Technology, Harbin 150001

CHINA

Abstract: Dealing with malicious code attacks has become the greatest challenge for intrusion detection systems, and at present there is no good approach to detecting malicious code in a large network. Drawing on other IDSs, we developed a network-based malicious code detection system, VDS. To keep pace with trends in malicious code, we define a new rule description and use event combination to obtain clearer, higher-level information. VDS detects malicious code efficiently in a large network and responds in time to protect computer users and the network.

Keywords: intrusion detection; malicious code; rule description; event combination; response

1. Introduction

Intrusion detection systems (IDSs) are popular largely because of their early-warning capability. Current IDSs identify intrusions mainly by behavior pattern, but they are ineffective at identifying content-based intrusions such as the many kinds of malicious code. Meanwhile, malicious code is becoming one of the most serious threats. To address this problem, we developed a malicious code detection system named VDS (virus detection system), which aims to detect content-based intrusions.

In this paper we propose our own rule description of malicious code and use a new user-oriented technique, which we call event combination, to abstract the detection results, because improving the usability of a practical system is very important. The paper therefore has three main parts:

Rule Description

Event Combination

Architecture and Realization of VDS

This paper is organized as follows. Section 2 describes related research: a survey of malicious code in subsection 2.1 and current detection approaches in subsection 2.2. Section 3 constructs our rule description of malicious code after discussing the rule description of Snort. Section 4 lays out three main types of event combination. Section 5 describes the architecture of VDS and its realization in detail. Finally, Section 6 gives a short conclusion.

2. Related works

2.1 The survey of malicious codes

What is the precise definition of malicious code? The question is difficult to answer, but in general the term extends the definition of computer virus (a "generalized virus", so to speak) to include not only traditional computer viruses but also worms, Trojans, destructive scripts, and any other kind of malicious executable code authored specifically for its ability to disrupt the normal or intended operation of computing infrastructure and services [1], especially on the Windows operating system [2]. Such code is not easy to classify [3].

In the past, computer viruses overran personal computers and caused great trouble for computer users. Even several years ago there was legitimate concern that, within the next few years, the Internet would provide a fertile medium for new breeds of computer viruses capable of spreading orders of magnitude faster than that day's viruses [4]. However, with the development of networks and their wide use, worms now cause deeper damage than computer viruses, and the worm is currently the most dramatic nuisance. Code Red broke out in 2001, SQL Slammer in January 2003, and the Sasser worm is spreading at the time of writing. Take Slammer as an example [5]. Slammer (sometimes called Sapphire) was the fastest computer worm in history. As it began spreading throughout the Internet, the worm infected more than 90 percent of vulnerable hosts within 10 minutes, causing significant disruption to financial, transportation, and government institutions and precluding any human-based response.

2.2 The detection of malicious code

To address the problem of computer viruses, anti-virus software has been produced over the years, because a signature [6] of a computer virus can almost always be extracted. Anti-virus software is installed on the host, scans memory and disk for computer viruses, and deletes any virus it finds from the system. To prevent infection, a monitor thread continuously watches the operating system. This approach works against computer viruses, but it is not sensitive to worm attacks. Because anti-virus software lacks the ability to find and locate malicious code at network scope, it can do nothing about the outbreak and propagation of malicious code in a large network.

IDSs (intrusion detection systems) were then applied as well. However, IDS detection is based on the behavior of computers and networks: it can find some worm attacks, but it can do nothing about malicious code carried in content, and, worse, it cannot deal with the many kinds of computer viruses and Trojans. No such IDS can detect malicious code that uses a normal communication mechanism, e.g., the transmission of e-mail worms, which can be found only by content detection. Installing anti-virus software and a HIDS (host-based IDS) on every computer in the network may seem a good way to detect malicious code, but the work of installing, managing, and updating them is daunting. Faced with the large impact of malicious code, and because a massive number of Internet nodes are not under control, HIDS cannot solve the problem; early warning, location, and protection must operate at network scope.

To deal with malicious code, a NIDS (network-based IDS) using content detection is needed, so we built a brand-new product to solve the problem. Over the past two years we implemented AVG (Anti-Virus Gateway) on the Linux platform; it can detect more than 10,000 kinds of computer viruses at the entrance of a large network. Based on AVG, VDS (Virus Detection System) makes up for the shortcomings of common anti-virus software and IDS tools. VDS provides the ability to find malicious code at large-network scope and to locate it quickly.

The approach presented in VDS employs a combination of techniques such as zero-copy packet capture, parallel protocol revert, real-time response, and online updates. The most important new technologies we adopted, however, are the integrative rule description and event combination. Furthermore, VDS is easily extensible to accommodate detection and reactive measures against new kinds of malicious code when they become known. This paper first discusses the rule description and the event combination mechanism, and then, based on this work, introduces the architecture of VDS.

3. Model of VDS rule description

Intrusion detection systems generally use two classical models: misuse detection and anomaly detection [7]. Misuse detection, also called signature detection, treats patterns that match the characteristics of known intrusions as intrusions. First, known intrusions are analyzed and their characteristics (signatures) extracted; each is then described as a rule in an agreed standard format. As more and more rules are collected, the rule database is created. The data to be analyzed is compared with every rule in the database; when a match is found, the intrusion described by that rule has been detected.

IDSs based on misuse detection work well at detecting intrusions whose characteristics are stored in the rule database, and their error rate is very low. Because there are thousands of computer viruses and virus detection is a very mature technology, our design uses signature detection. The basic step in misuse detection is to collect the rules and create the rule database. The rule database is the key, so the model of rule description must be constructed first.

There are now some behavior-based IDSs and malicious code detection systems, for example LAWS [7], but their rule descriptions have never been expressed clearly and completely. In this paper the rule description of malicious code is constructed from the characteristics of malicious code, with reference to the rule description of an existing IDS, namely Snort. The rule description of VDS is redundant; this is a tradeoff for flexibility and extensibility.

3.1 The rule description of Snort

There are many IDS products, but the best known is Snort [9], so we examine its rules here. A Snort rule is composed of a rule header and rule options. The rule header includes the action, protocol, source and destination addresses, netmask, port, etc. The rule options include the alert message, pattern information, and index information. The components of one rule must all be true simultaneously, as in an AND operation, whereas the relation between rules in the same rule database is like an OR operation.
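The AND/OR relation described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Snort's implementation: the packet is modelled as a plain dictionary, and the field names and the two sample rules are invented for the example.

```python
from typing import Any, Callable, Dict, List

Packet = Dict[str, Any]
Condition = Callable[[Packet], bool]
Rule = List[Condition]  # one rule = a list of conditions


def rule_matches(rule: Rule, packet: Packet) -> bool:
    """A rule fires only if all of its conditions hold (AND)."""
    return all(condition(packet) for condition in rule)


def database_matches(rules: List[Rule], packet: Packet) -> bool:
    """The rule database fires if at least one rule fires (OR)."""
    return any(rule_matches(rule, packet) for rule in rules)


# Two hypothetical rules; the fields and values are made up for illustration.
web_rule: Rule = [
    lambda p: p["protocol"] == "tcp",
    lambda p: p["dst_port"] == 80,
    lambda p: b"cmd.exe" in p["payload"],
]
icmp_rule: Rule = [
    lambda p: p["protocol"] == "icmp",
    lambda p: len(p["payload"]) > 1000,
]
rules = [web_rule, icmp_rule]

suspicious = {"protocol": "tcp", "dst_port": 80, "payload": b"GET /scripts/cmd.exe"}
harmless = {"protocol": "udp", "dst_port": 53, "payload": b"query"}
print(database_matches(rules, suspicious))
print(database_matches(rules, harmless))
```

Because every condition in a rule must hold, a rule becomes more specific as conditions are added, while adding rules to the database can only widen what is detected.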

Snort is a lightweight IDS; it works very well on a host or in a small network. However, Snort cannot work in a large network, and its rule description has obvious flaws:

1, The keywords of Snort rules define some protocol variables: primary protocols such as IP, TCP, UDP, and ICMP, and other protocol variables such as URL. However, only the frequently appearing protocol data fields that are used in attack signatures are defined. If a new attack approach uses a field for which Snort defines no keyword, users cannot write a Snort rule themselves unless the Snort developers update the program. It is therefore ill-suited to flexible attack approaches.

2, “content” is one of the most important Snort keywords; it is used for content matching. But its definition is not precise: there is no keyword to anchor the exact position of the signature data.

3, The IP address field is redundant and useless.

4, There is no keyword to describe the frequency or intensity of an attack. Nowadays worms are the backbone of malicious code, so such a keyword is necessary. Moreover, some accesses are normal in themselves but become an attack when their frequency is too high.

In short, the definition of Snort rules is narrow and difficult to extend. A rule database, however, presupposes an efficient, usable, and easily extensible rule description approach, one flexible enough to describe all kinds of intrusion incidents.

3.2 The network appearance of malicious code

To build the rule database, we must inspect malicious codes and describe their signatures, and then discuss them in turn. We might try to classify malicious code into the three most important kinds: virus, Trojan, and worm. But malicious codes are very difficult to classify, because there is no obvious border between the kinds and they may even include one another. Considerable work [10] has been done on taxonomy, which helps us survey the great variety of malicious code.

However, our objective is not to perfect the taxonomy of malicious code but to detect it. We therefore change our line of thinking to avoid the obscure classifications of taxonomy. Focusing on network appearance in this section, we build the rule description of VDS in the next section. Let us discuss the appearance of most kinds of malicious code.

First, in the TCP/IP [11] network architecture, different applications use different protocols in different layers, as shown in Figure 1. A higher layer may put additional information in a header, or even encode the information. To obtain the real higher-layer data from the lower-layer data, the headers must be removed and the information decoded; this work is called protocol revert.
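As a minimal sketch of what removing headers means in practice, the Python fragment below peels an IPv4 header and then a UDP header off a hand-built packet to recover the application data. It is an illustration only and not the VDS implementation: the function names are hypothetical, checksums are not verified, fragmentation and TCP reassembly are ignored, and the decoding step (for example of an encoded mail attachment) is left out.

```python
import struct
from typing import Tuple

UDP_PROTOCOL = 17  # IANA protocol number for UDP


def revert_ipv4(packet: bytes) -> Tuple[int, bytes]:
    """Strip the IPv4 header; return (protocol number, payload)."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4      # IHL is counted in 32-bit words
    if header_len < 20 or len(packet) < header_len:
        raise ValueError("bad IPv4 header length")
    total_len = struct.unpack("!H", packet[2:4])[0]
    return packet[9], packet[header_len:total_len]


def revert_udp(segment: bytes) -> bytes:
    """Strip the fixed 8-byte UDP header; return the application data."""
    if len(segment) < 8:
        raise ValueError("truncated UDP header")
    return segment[8:]


# Build a toy packet by hand; addresses and ports are arbitrary,
# and both checksum fields are left at zero because nothing checks them here.
payload = b"application data"
udp = struct.pack("!HHHH", 40000, 9999, 8 + len(payload), 0) + payload
ip = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20 + len(udp), 0, 0, 64, UDP_PROTOCOL, 0,
    bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
) + udp

protocol, segment = revert_ipv4(ip)
if protocol == UDP_PROTOCOL:
    print(revert_udp(segment))
```

Each function hands the next layer only the bytes that layer is meant to see, so a content signature can then be searched in the application data rather than in the raw frame.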

Malicious codes are tied to applications. For example, the Slammer worm can exploit a vulnerable Microsoft SQL Server service, whereas if the same data is sent to an HTTP service nothing happens, let alone an attack. We capture this relation between malicious codes and applications with the term “protocol”. As applications increase very rapidly, a great many protocols are in use. From Figure 1 we can conclude that, apart from ICMP, the protocols used by malicious code are usually those above the network layer.

So, the rule description must have the keyword “protocol”. This field can be HTTP, SMTP, or ICMP.


For the detection of malicious code, the lowest usable layer is the IP protocol. Setting the protocol aside, the network appearances of malicious code are as follows:

1, content signature without fixed offset

The signatures of most computer viruses, Trojans, destructive scripts, and URLs can be found in their body. For example, when Win32.CIH is transmitted, a signature like the following can be found:

55908D4424F833DB648703E8000000005B

8D4B445150500F014C24FE5B83C32CFA8

B2B668B6BFC8D711256668973FCC1EE10

668973025ECD05568BF08B48FCF3A483E

8088B300BF67402EBF05ECD05FB33DBE

B0733DB648B038B20648F03585D68

but its starting position in the body is not fixed, because the insertion point is decided when the original user file is infected.

2, content signature with fixed offset

Some worms transmit packets whose content signature begins at a fixed offset. For example, when Worm.Welchia scans for vulnerable hosts, it sends ICMP packets like the following:

08007FC603001FE4AAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAA2410E03

FDB4508006000

Starting at offset 8, the packet is filled with 54 consecutive 0xAA bytes.

3, behavior signature with regular content

Some worm scans and DDoS attacks always send regular content, but send it at so high a frequency that it amounts to an attack. For example, Worm.DVLDR32 sends SYN packets to destination port 445 of the victim host more than 10 times in a single second.
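The first two appearances differ only in where the signature is looked for. The sketch below shows both cases on the Worm.Welchia probe printed above; the function name and the use of `None` for "no fixed offset" are assumptions made for this example.

```python
from typing import Optional


def has_signature(content: bytes, signature: bytes,
                  offset: Optional[int] = None) -> bool:
    """Check for a signature anywhere in content, or only at a fixed offset."""
    if offset is None:
        # Appearance 1: the signature may start at any position.
        return signature in content
    # Appearance 2: the signature must start exactly at `offset`.
    return content[offset:offset + len(signature)] == signature


# The Welchia ICMP message, using the bytes as printed in the example above.
welchia_probe = (
    bytes.fromhex("08007FC603001FE4")
    + b"\xAA" * 54
    + bytes.fromhex("2410E03FDB4508006000")
)
welchia_signature = b"\xAA" * 54

print(has_signature(welchia_probe, welchia_signature, offset=8))
# The same bytes shifted by four positions no longer match at offset 8,
# but a search without a fixed offset still finds them.
shifted = b"junk" + welchia_probe
print(has_signature(shifted, welchia_signature, offset=8))
print(has_signature(shifted, welchia_signature))
```

A fixed offset needs a single comparison per packet, whereas the search without an offset has to examine the whole content, which is one reason to record the offset whenever it is known.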

Considering the network appearances of different malicious codes, we give the rule description of VDS in the next subsection.

3.3 The rule description of VDS

From the discussion of the network appearance of malicious code, we can draw some conclusions, described as follows:

DEFINITION 1 (Signature). A Signature is a special binary sequence present in files or packets that can be used to identify malicious code.

DEFINITION 2 (Rule). A Rule is a pattern in a misuse-detection-based IDS; when data matches this pattern, a certain malicious code has been detected. A rule always includes a signature, but the signature alone cannot confirm a detection: it must appear under a certain regulation and format.

DEFINITION 3 (Rule Database). The rule database is the set $C = \{C_i \mid C_i \in \mathrm{rule}\},\ i = 1, \dots, N$, where $C_i$ is a rule that can be used to detect some malicious code and $N$ is the number of rules in the database.

DEFINITION 4 (Protocol). The term Protocol is defined as $p = \{\mathrm{HTTP}, \mathrm{SMTP}, \mathrm{POP3}, \mathrm{ICMP}, \dots\}$, which indicates the application.

DEFINITION 5 (Offset). The Offset is defined as $\mathit{offset} = \{n \mid -1 \le n < \mathrm{MTU}\}$, the position at which the signature begins. If the position is not fixed, we use -1 to express this situation. The offset must be less than the MTU of the network.

DEFINITION 6 (Rule Description). Every rule is concrete, whereas the rule description is abstracted from the rules in the database. Once obtained, however, the rule description must be able to express all of the rules. The rule description is defined as $P = (\mathit{name}, \mathit{protocol}, \mathit{offset}, \mathit{signature}, \mathit{count})$.

The rule description of VDS is given below and then discussed in detail:

vds_rule::name×protocol×offset×signature×count

1, The keyword “name” identifies the rule in a human-readable form. This field is optional and can be set to NULL if no later processing, e.g. statistics, is wanted.

name:: string | NULL;

2, The keyword “protocol” is not an actual string naming the protocol but a numeric macro definition that indicates the protocol type, which gives us great flexibility.

protocol:: positive_integer;

3, The keyword “offset” is a number giving the offset at which the signature occurs in the content. If the offset is variable and the whole content should be scanned, the number can be set to -1.

offset:: positive_integer | -1;

4, The keyword “signature” is a binary sequence, always shown in hexadecimal format; the signature is the mark of one kind of intrusion approach.

signature:: binary_string;

5, The keyword “count” could be replaced by “frequency”, which is more intuitive. But once a time range is defined, a frequency can be converted to a count within that fixed time range; in our tests a time range of 30 seconds proved suitable. If frequency is not important for the intrusion approach, the count can be set to 1, meaning that the intrusion is detected as soon as the signature is matched once.

count:: positive_integer;

For example, the rules to detect WIN32.CIH, the scan of Worm.Welchia, and Worm.DVLDR32 are shown in Table 1.

Compared with Snort, the rule description of VDS is more flexible, more extensible, and better suited to a network intrusion detection system on a large network.
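To make the five fields concrete, the following Python sketch models one rule and applies the count threshold within the 30-second time range mentioned above. It is a sketch under stated assumptions rather than the VDS engine: the class names, the numeric protocol code, the sample signature, and the choice to count hits per source address are all hypothetical, and timestamps are passed in explicitly so that the behavior is easy to follow.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

WINDOW_SECONDS = 30.0  # time range reported as suitable in the text


@dataclass(frozen=True)
class VdsRule:
    name: Optional[str]  # None plays the role of NULL
    protocol: int        # numeric protocol code (the value used below is made up)
    offset: int          # -1 means "scan the whole content"
    signature: bytes
    count: int = 1       # matches required within WINDOW_SECONDS

    def matches(self, protocol: int, content: bytes) -> bool:
        """True if this single piece of content matches the rule."""
        if protocol != self.protocol:
            return False
        if self.offset == -1:
            return self.signature in content
        end = self.offset + len(self.signature)
        return content[self.offset:end] == self.signature


class Detector:
    """Counts matches per source address inside a sliding time window."""

    def __init__(self, rule: VdsRule) -> None:
        self.rule = rule
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def feed(self, src: str, protocol: int, content: bytes, now: float) -> bool:
        """Return True once the rule's count is reached for this source."""
        if not self.rule.matches(protocol, content):
            return False
        hits = self._hits[src]
        hits.append(now)
        # Drop matches that have fallen out of the time window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) >= self.rule.count


# A hypothetical rule: protocol code 1, signature anywhere, three hits needed.
demo_rule = VdsRule("demo", 1, -1, bytes.fromhex("DEADBEEF"), count=3)
detector = Detector(demo_rule)
packet = b"xx" + bytes.fromhex("DEADBEEF") + b"yy"
for t in (0.0, 1.0, 2.0):
    print(t, detector.feed("10.0.0.1", 1, packet, now=t))
```

With count set to 1 the detector reports on the first match, which corresponds to the pure content signatures; larger values express the behavior signatures, where a single packet is harmless and only repetition indicates an attack.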


4. The model of event combination

Many security products work hard to reduce false negatives and false positives, but the problem of alerting at too high a frequency remains. Although intrusions can be confirmed by more accurate definitions, attackers often deliberately craft traffic to make the IDS raise false positives, so that the real intrusion information is submerged in a mass of junk alerts and it becomes harder for the administrator to identify intrusions.

The output of the VDS detection engine describes ‘when’, ‘where’, and ‘which’ kind of malicious code appeared; we call this simple output a ‘result’. The simple format of a result is:

VDS_result:: SRC×DST×Time×Name

In this format, SRC is the source address of the malicious code, DST is its destination address, Time is the time of detection, and Name is the label of the malicious code. In practice we replace the Name field with r_id, the serial number of the malicious code. A small example of a result is given below: