Virus and Anti Viruses
1. INTRODUCTION
In the mid-eighties, so legend has it, the Alvi brothers (Basit and Amjad Farooq Alvi) of Lahore, Pakistan ran a computer store. Frustrated by software piracy, they wrote the first PC virus, a boot sector virus called Brain. From those simple beginnings, an entire counter-culture industry of virus creation and distribution emerged, leaving us today with several tens of thousands of viruses. In just over a decade, most of us have become familiar with the term computer virus.
A large part of modern computing life is securing the information that we create and process. There are many aspects of information security, ranging from physical access to ensuring that the information has not been changed in any way. One of the most high-profile threats to information integrity is the computer virus. Surprisingly, PC viruses have been around for two-thirds of the IBM PC’s lifetime, appearing in 1986. With global computing on the rise, computer viruses have had more visibility in the past two years.
Despite our awareness of computer viruses, how many of us can define what one is, or how it infects computers? This seminar aims to demystify the basics of computer viruses, summarizing what they are, how they attack and what we can do to protect ourselves against them.
2. VIRUSES
2.1 THE BASICS OF COMPUTER VIRUSES
Computer viruses are not inherently destructive. The essential feature of a computer program that causes it to be classified as a virus is not its ability to destroy data, but its ability to gain control of the computer and make a fully functional copy of itself. It can reproduce. When it is executed, it makes one or more copies of itself. Those copies may later be executed, to create still more copies, ad infinitum. Not all destructive computer programs are classified as viruses, because they do not all reproduce, and not all viruses are destructive, because reproduction is not in itself destructive. However, all viruses do reproduce. The computer virus overcomes the roadblock of operator control by hiding itself in other programs. Thus it gains access to the CPU simply because people run programs that it has attached itself to without their knowledge. The fact that a computer virus attaches itself to other programs earned it the name “virus.” However, the analogy is imperfect, since the programs it attaches to are not in any sense alive.
Virus: What exactly is a Virus?
A virus is basically an executable program designed to do three things: infect other files (programs or documents), survive by replicating itself, and avoid detection. To avoid detection, a virus often disguises itself as a legitimate program which the user would not normally suspect. Although destruction is not what makes a program a virus, many viruses are also designed to corrupt or delete data on the hard disk, for example by damaging the FAT (File Allocation Table).
2.2 TYPES OF VIRUSES
Computer viruses can be classified into several different types.
- File or program viruses
Some programs are viruses in disguise: when executed, they load the virus into memory along with the program, perform their predefined steps and infect the system. They infect program files, such as files with the extensions .EXE, .COM, .BIN, .DRV and .SYS. Some file viruses just replicate, while others destroy the program being used at that time.
- Boot sector viruses (MBR or Master Boot Record)
Boot sector viruses can be created without much difficulty. They infect either the Master Boot Record of the hard disk or the boot sector of a floppy disk.
- Multipartite viruses
Multipartite viruses are the hybrid variety; they can best be described as a cross between boot sector viruses and file viruses. They infect not only files but also the boot sector.
- Stealth viruses
These viruses are stealthy in nature and use various methods to hide themselves and avoid detection.
- Polymorphic viruses
These are among the most difficult viruses to detect. They have the ability to mutate, which means that they change their viral code, the byte pattern known as the signature, each time they spread or infect.
- Macro viruses
In essence, a macro is an executable program embedded in a word processing document or other type of file. Typically, users employ macros to automate repetitive tasks and thereby save keystrokes. A macro virus is a macro written to copy itself into other documents.
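To illustrate why a changing signature makes polymorphic viruses hard to detect, here is a minimal sketch of signature scanning in Python. Everything in it is an assumption made for illustration: the signature database holds a harmless marker string rather than real malware, and the XOR re-encoding is only a toy stand-in for the far more elaborate mutation real polymorphic viruses use.

```python
def find_signatures(data: bytes, signatures: dict) -> list:
    """Return (name, offset) for every known byte pattern found in data."""
    hits = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)
        if offset != -1:
            hits.append((name, offset))
    return hits


def xor_encode(payload: bytes, key: int) -> bytes:
    """Toy stand-in for polymorphic re-encoding: XOR every byte with a key."""
    return bytes(b ^ key for b in payload)


# Hypothetical signature database: a harmless marker string, not real malware.
SIGNATURES = {"Demo.Marker": b"HARMLESS-DEMO-MARKER"}

# The same marker stored once in plain form and once re-encoded.
plain = b"\x90" * 8 + b"HARMLESS-DEMO-MARKER" + b"\x00" * 4
mutated = b"\x90" * 8 + xor_encode(b"HARMLESS-DEMO-MARKER", 0x5A) + b"\x00" * 4

print(find_signatures(plain, SIGNATURES))    # [('Demo.Marker', 8)]
print(find_signatures(mutated, SIGNATURES))  # []
```

The second scan finds nothing even though the underlying content is the same, which is why a fixed byte pattern alone is not enough against code that re-encodes itself on every infection.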
2.3 THE FUNCTIONAL ELEMENTS OF A VIRUS
Every viable computer virus must have at least two basic parts, or subroutines, if it is even to be called a virus. Firstly, it must contain a search routine, which locates new files or new areas on disk which are worthwhile targets for infection. This routine will determine how well the virus reproduces, e.g., whether it does so quickly or slowly, whether it can infect multiple disks or a single disk, and whether it can infect every portion of a disk or just certain specific areas. As with all programs, there is a size versus functionality tradeoff here. The more sophisticated the search routine is, the more space it will take up. So although an efficient search routine may help a virus to spread faster, it will make the virus bigger, and that is not always so good.
Secondly, every computer virus must contain a routine to copy itself into the area which the search routine locates. The copy routine will only be sophisticated enough to do its job without getting caught. The smaller it is, the better. How small it can be will depend on how complex a virus it must copy. For example, a virus which infects only COM files can get by with a much smaller copy routine than a virus which infects EXE files. This is because the EXE file structure is much more complex, so the virus simply needs to do more to attach itself to an EXE file.
While the virus only needs to be able to locate suitable hosts and attach itself to them, it is usually helpful to incorporate some additional features into the virus to avoid detection, either by the computer user, or by commercial virus detection software. Anti-detection routines can either be a part of the search or copy routines, or functionally separate from them. For example, the search routine may be severely limited in scope to avoid detection. A routine which checked every file on every disk drive, without limit, would take a long time and cause enough unusual disk activity that an alert user might become suspicious. Alternatively, an anti-detection routine might cause the virus to activate under certain special conditions. For example, it might activate only after a certain date has passed (so the virus could lie dormant for a time). Alternatively, it might activate only if a key has not been pressed for five minutes (suggesting that the user was not there watching his computer).
Figure 1. Functional diagram of a virus.
Search, copy, and anti-detection routines are the only necessary components of a computer virus, and they are the components which we will concentrate on in this seminar. Of course, many computer viruses have other routines added in on top of the basic three to stop normal computer operation, to cause destruction, or to play practical jokes. Such routines may give the virus character, but they are not essential to its existence. In fact, such routines are usually very detrimental to the virus’ goal of survival and self-reproduction, because they make the fact of the virus’ existence known to everybody. If there is just a little more disk activity than expected, probably no one will notice, and the virus will go on its merry way. On the other hand, if the screen of one’s favorite program comes up saying “Ha! Gotcha!” and then the whole computer locks up, with everything on it ruined, almost anyone can figure out that they’ve been the victim of a destructive program. And if they’re smart, they’ll get expert help to eradicate it right away. The result is that the viruses on that particular system are killed off, either by themselves or by the clean-up crew.
2.4 TOOLS NEEDED FOR WRITING VIRUSES
The DOS viruses described here are written in assembly language. High level languages like Basic, C, and Pascal have been designed to generate stand-alone programs, and the assumptions made by these languages make them poorly suited to writing this kind of virus. They do not readily perform the acrobatics required for a virus to jump from one host program to another. Thus, to create such viruses, assembly language is the usual choice: it gives exacting control over all the computer system’s resources, so that they can be used exactly as the author intends rather than the way a language designer thinks they should be.
3. VIRUSES IN DETAIL
3.1 FILE OR PROGRAM VIRUSES
Some programs are viruses in disguise: when executed, they load the virus into memory along with the program, perform their predefined steps and infect the system. They infect program files, such as files with the extensions .EXE, .COM, .BIN, .DRV and .SYS. Some file viruses just replicate, while others destroy the program being used at that time. Such viruses start replicating as soon as they are loaded into memory. Because a file virus may also destroy the program currently being used, the program that it corrupted has to be repaired or reinstalled after the virus has been removed and the system disinfected.
3.1.1 A Simple COM File Infector
Some DOS Basics
EXE and COM files are directly executable by the Central Processing Unit. To execute a COM file, DOS must do some preparatory work before giving that program control. Most importantly, DOS controls and allocates memory usage in the computer. So first it checks to see if there is enough room in memory to load the program. If there is, DOS then allocates the memory required for the program. DOS simply records how much space it is making available for the program, so it won’t try to load another program on top of it later.
Next, DOS builds a block of memory 256 bytes long known as the Program Segment Prefix, or PSP.
Once the PSP is built, DOS takes the COM file stored on disk and loads it into memory just above the PSP, starting at offset 100H. Once this is done, DOS is almost ready to pass control to the program. Before it does, though, it must set up the registers in the CPU to certain predetermined values. First, the segment registers must be set properly, or a COM program cannot run.
COM files are designed to operate with a very simple, but limited segment structure. Namely, they have one segment, cs=ds=es=ss. All data is stored in the same segment as the program code itself, and the stack shares this segment.
Figure 2. Memory map just before executing a COM file.
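The memory layout described above can be worked through numerically. The following Python sketch is illustrative only: it uses the standard real-mode rule that a linear address is the segment times 16 plus the offset, and the load segment 1234H and image size 0400H are hypothetical values chosen for the example. The size check is a simplification, since a real COM program also needs room for its stack in the same segment.

```python
PSP_SIZE = 0x100          # the Program Segment Prefix is 256 bytes long
SEGMENT_SIZE = 0x10000    # one real-mode segment spans 64 KB


def linear_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset."""
    return (segment << 4) + offset


def com_layout(load_segment: int, image_size: int) -> dict:
    """Describe where a COM image of image_size bytes sits in its segment."""
    # Simplified check: ignores the stack space that shares the segment.
    if PSP_SIZE + image_size > SEGMENT_SIZE:
        raise ValueError("image does not fit in a single 64 KB segment")
    return {
        "psp_start": linear_address(load_segment, 0),
        "entry_point": linear_address(load_segment, PSP_SIZE),
        "image_end": linear_address(load_segment, PSP_SIZE + image_size),
    }


# Hypothetical values: DOS picked segment 1234H for a 1 KB COM image.
layout = com_layout(load_segment=0x1234, image_size=0x0400)
for name, addr in layout.items():
    print(f"{name}: {addr:05X}H")
# psp_start: 12340H
# entry_point: 12440H
# image_end: 12840H
```

Because cs=ds=es=ss, all three addresses are computed from the same segment value; only the offset changes.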
An Outline for a Virus
In order for a virus to reside in a COM file, it must get control passed to its code at some point during the execution of the program. The easiest point to take control is right at the very beginning, when DOS jumps to the start of the program.
At this time, the virus is completely free to use any space above the image of the COM file which was loaded into memory by DOS. Since the program itself has not yet executed, it cannot have set up data anywhere in memory, or moved the stack, so this is a very safe time for the virus to operate. To gain control at startup time, a virus infecting a COM file must replace the first few bytes in the COM file with a jump to the virus code, which can be appended at the end of the COM file.
Then, when the COM file is executed, it jumps to the virus, which goes about looking for more files to infect, and infecting them. When the virus is ready, it can return control to the host program. The problem in doing this is that the virus already replaced the first few bytes of the host program with its own code. Thus it must restore those bytes, and then jump back to offset 100H, where the original program begins.
Step by step, it might work like this:
- An infected COM file is loaded into memory and executed. The viral code gets control first.
- The virus in memory searches the disk to find a suitable COM file to infect.
- If a suitable file is found, the virus appends its own code to the end of the file.
- Next, it reads the first few bytes of the file into memory, and writes them back out to the file in a special data area within the virus’ code. The new virus will need these bytes when it executes.
- Next, the virus in memory writes a jump instruction to the beginning of the file it is infecting, which will pass control to the new virus when its host program is executed.
- Then the virus in memory takes the bytes which were originally the first bytes in its host, and puts them back (at offset 100H).
- Finally, the viral code jumps to offset 100H and allows its host program to execute.
A real virus built to these specifications needs both a search mechanism and a copy mechanism.
Figure 3. Replacing the first bytes in a COM file.
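From the defender's side, the layout described above (a jump at the start of the file that lands in code appended at the end) suggests a simple scanning heuristic. The Python sketch below is an assumption-laden illustration, not a description of how any particular antivirus product works: the 2048-byte tail window is a hypothetical threshold, the test images are harmless synthetic byte strings, and many legitimate COM programs also begin with a jump, so a hit is only a reason to look closer.

```python
import struct

NEAR_JMP = 0xE9      # 8086 opcode for JMP rel16 (a 3-byte instruction)
TAIL_WINDOW = 2048   # hypothetical threshold for "close to the end of the file"


def jump_target(image: bytes):
    """File offset targeted by a leading near JMP, or None if there is none."""
    if len(image) < 3 or image[0] != NEAR_JMP:
        return None
    (rel16,) = struct.unpack_from("<h", image, 1)
    # The displacement is relative to the instruction following the JMP.
    return (3 + rel16) & 0xFFFF


def jumps_into_tail(image: bytes, tail_window: int = TAIL_WINDOW) -> bool:
    """True if the file starts with a JMP that lands near the end of the file."""
    target = jump_target(image)
    if target is None or target >= len(image):
        return False
    return len(image) - target <= tail_window


def make_image(size: int, target: int) -> bytes:
    """Build a harmless synthetic image: a near JMP to target, padded with NOPs."""
    return bytes([NEAR_JMP]) + struct.pack("<h", target - 3) + b"\x90" * (size - 3)


print(jumps_into_tail(make_image(5000, 4500)))        # True
print(jumps_into_tail(make_image(5000, 100)))         # False
print(jumps_into_tail(b"\xB4\x09" + b"\x90" * 4998))  # False
```

A file offset maps to a memory offset by adding 100H, since DOS loads the COM image just above the PSP.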
3.1.2 AN EXECUTABLE VIRUS
The simple COM file infector described above has serious limitations: because it only attacks COM files in the current directory, it will have a hard time proliferating. This section describes a more sophisticated virus that overcomes these limitations: a virus that can infect EXE files and jump from directory to directory and from drive to drive. Such improvements make the virus much more complex, and also much more dangerous.
The Structure of an EXE File
The EXE file is designed to allow DOS to execute programs that require more than 64 kilobytes of code, data and stack. The information DOS needs to load such a program is stored in the EXE file itself, in the EXE Header at the beginning of the file. This header has two parts to it: a fixed-length portion, and a variable-length table of pointers to segment references in the Load Module, called the Relocation Pointer Table. Any virus which attacks EXE files must be able to manipulate the data in the EXE Header.
Figure 4. The layout of an EXE file.
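As an illustration of the header layout, the following Python sketch reads the fixed-length portion of a DOS EXE header and its Relocation Pointer Table. The field names are my own labels for the fourteen 16-bit words of the standard DOS "MZ" header, and the sample bytes are a synthetic header built for the example, not a real program.

```python
import struct
from collections import namedtuple

# Fixed-length part of the DOS EXE header: a 2-byte signature followed by
# thirteen 16-bit little-endian words (28 bytes in total).
ExeHeader = namedtuple("ExeHeader", [
    "signature", "last_page_bytes", "page_count", "reloc_count",
    "header_paragraphs", "min_alloc", "max_alloc", "initial_ss",
    "initial_sp", "checksum", "initial_ip", "initial_cs",
    "reloc_table_offset", "overlay_number",
])
HEADER_FORMAT = "<2s13H"


def parse_exe_header(data: bytes) -> ExeHeader:
    """Decode the fixed-length EXE header at the start of data."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("file too short for an EXE header")
    header = ExeHeader._make(struct.unpack_from(HEADER_FORMAT, data))
    if header.signature not in (b"MZ", b"ZM"):
        raise ValueError("not a DOS EXE file")
    return header


def relocation_entries(data: bytes, header: ExeHeader) -> list:
    """Return the relocation pointers as (offset, segment) pairs."""
    return [
        struct.unpack_from("<HH", data, header.reloc_table_offset + 4 * i)
        for i in range(header.reloc_count)
    ]


# Synthetic 32-byte header with one relocation pointer (illustrative values).
sample = struct.pack(HEADER_FORMAT, b"MZ", 0x50, 3, 1, 2, 0, 0xFFFF,
                     0x40, 0x100, 0, 0, 0, 0x1C, 0)
sample += struct.pack("<HH", 0x0010, 0x0000)

header = parse_exe_header(sample)
print(header.reloc_count, header.overlay_number)  # 1 0
print(relocation_entries(sample, header))         # [(16, 0)]
```

A scanner can compare these fields against a known-good copy of the same program; a changed entry point or a longer relocation table is a sign that the file has been modified.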
Infecting an EXE File
A virus that is going to infect an EXE file will have to modify the EXE Header and the Relocation Pointer Table, as well as adding its own code to the Load Module. The EXE file virus will attach itself to the end of an EXE program and gain control when the program first starts. This will require a routine similar to that in the COM file infector, which copies program code from memory to a file on disk, and then adjusts the file.
To set up segments for the virus, new initial segment values for cs and ss must be placed in the EXE file header. All the initial segment values must be calculated from the size of the load module which is being infected. Also, the old initial segments must be stored somewhere in the virus, so it can pass control back to the host program when it is finished executing. We will have to put two pointers to these segment references in the relocation pointer table, since they are relocatable references inside the virus code segment.
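The arithmetic relating header fields to the size of the load module can be sketched as follows. This is a read-only illustration in Python under stated assumptions: it uses the standard DOS conventions of 512-byte pages and 16-byte paragraphs, the parameter names are my own, the sample values are hypothetical, and it assumes the initial cs:ip does not wrap around the address space. A scanner might use the same numbers to notice an entry point sitting unusually close to the end of the file.

```python
PAGE = 512        # the EXE header counts file size in 512-byte pages
PARAGRAPH = 16    # header size and segment values are in 16-byte paragraphs


def image_size(page_count: int, last_page_bytes: int) -> int:
    """Bytes of the file covered by the header: header plus load module."""
    if last_page_bytes == 0:
        return page_count * PAGE          # the last page is completely full
    return (page_count - 1) * PAGE + last_page_bytes


def load_module_size(page_count: int, last_page_bytes: int,
                     header_paragraphs: int) -> int:
    """Size of the load module alone, without the header."""
    return image_size(page_count, last_page_bytes) - header_paragraphs * PARAGRAPH


def entry_point_file_offset(header_paragraphs: int, initial_cs: int,
                            initial_ip: int) -> int:
    """File offset of the first instruction (assumes cs:ip does not wrap)."""
    return header_paragraphs * PARAGRAPH + initial_cs * PARAGRAPH + initial_ip


# Hypothetical header values for illustration.
total = image_size(page_count=3, last_page_bytes=0x50)
module = load_module_size(3, 0x50, header_paragraphs=2)
entry = entry_point_file_offset(header_paragraphs=2, initial_cs=0x40, initial_ip=0)

print(total)          # 1104
print(module)         # 1072
print(entry)          # 1056
print(total - entry)  # 48 bytes between the entry point and the end of the image
```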
A Persistent File Search Mechanism
As in the TIMID virus, the simple COM file infector described above, the search mechanism must locate a candidate file, determine whether it can be infected, and make sure it has not already been infected. The only two criteria for determining whether an EXE file can be infected are whether the Overlay Number is zero, and whether it has enough room in its relocation pointer table for two more pointers. To determine whether the virus has already infected a file, we put an ID word with a pre-assigned value in the code segment at a fixed offset (say 0).
The procedure in the COM file virus could only search for files in the current directory to attack. A good virus should be able to leap from directory to directory, and even from drive to drive. To search more than one directory, we need a tree search routine. For each subdirectory found, the search routine will recursively call itself, using the new subdirectory as the directory to perform a search on.
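An antivirus scanner needs the same kind of recursive tree search in order to examine every program file on a disk. The sketch below shows the recursion in Python using the standard `os.scandir` call; the function name `find_files` and the default extension list are assumptions made for the example, and the routine only lists files without opening or changing them.

```python
import os


def find_files(root, extensions=(".com", ".exe")):
    """Recursively yield paths under root whose name ends in one of extensions."""
    try:
        entries = list(os.scandir(root))
    except OSError:
        return  # unreadable directory: skip it rather than abort the scan
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Recurse, using the subdirectory as the new directory to search.
            yield from find_files(entry.path, extensions)
        elif entry.is_file(follow_symlinks=False) and \
                entry.name.lower().endswith(extensions):
            yield entry.path


if __name__ == "__main__":
    # List candidate program files below the current directory.
    for path in find_files("."):
        print(path)
```

Not following symbolic links keeps the recursion from looping forever on a directory that links back to one of its own parents.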
Passing Control to the Host
The final step the virus must take is to pass control to the host program. To do that, all the registers should be set up the same as they would be if the host program were being executed without the virus. Apart from the segment registers discussed above, only the ax register is set to a specific value by DOS, to indicate the validity of the drive ID in the FCBs (File Control Blocks) in the PSP. The DTA (Disk Transfer Area) must also be moved when the virus is first fired up, and then restored when control is passed to the host.