Project in Computer Security

236349

Michael Kuritzky 306741638, Guy Cepelevich 304045305

Honeypot data analysis

Final Report

Supervisor: Amichai Shulman

Overview

From Wikipedia: In computer terminology, a honeypot is a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. Generally it consists of a computer, data, or a network site that appears to be part of a network, but is actually isolated, (un)protected, and monitored, and which seems to contain information or a resource of value to attackers.

In our case, the honeypot is a remote machine running an anonymizing proxy service, which keeps logs of all the requests passed through it.

Project goals and development order:

  1. Set up and activate the honeypot.
  2. Create a database, parse the honeypot’s logs and inject them into the DB.
  3. Create a system that provides an easy-to-use interface to view and classify the log entries.
  4. Add a rule system for automatic classification of both new (future) and existing entries.
  5. Add a rule management system for rule activation, deactivation and deletion.

Implementation details:

  • We used Privoxy (open-source) to provide the anonymizing proxy service.
  • The proxy runs on a remote Amazon EC2 (Elastic Compute Cloud) Instance.
  • The logs are stored on an Amazon EBS (Elastic Block Storage) drive connected to the Instance.
  • The logs are downloaded from the EBS drive, parsed and stored in a MySQL database. We store the database locally because the EC2 instance tends to be very slow.
  • We chose to use MySQL to store the database for the following reasons:
    • Very common
    • Free
    • Easy to access
    • Existing management tools
    • Easy to write rules on the entries
    • Developers’ experience
  • The system consists of two parts – the PrivoxyLogParser and PrivoxyLogViewer.
  • Both were written in Java, for the following reasons:
    • Free
    • Developers’ experience
    • Extensive integrated and third-party library support (JDBC for database connections, Swing for GUI development)
  • In order to avoid memory problems (most logs are around 10MB), the parser goes over the logs line-by-line, parses them and calls its listeners each time it finishes parsing a log entry.
  • A listener connected to the parser inserts the new log entries into the database.
  • We use two methods to classify the entries:
    • Interest:
      • True indicates that the entries are interesting (e.g. pertaining to some known attack pattern).
      • False indicates that the entries are not interesting, i.e. they represent regular Internet usage.
      • Null indicates that the entries have not been classified yet. The user can either classify them manually from the Entries Panel (see below), or create a Rule to classify them automatically (see below).
    • Tags:
      • Each entry can be associated with one or more tags (e.g. PORN, SQL INJECTION, CLICK FORGERY) for easier browsing, grouping and filtering.
  • The viewer consists of three panels (more detailed documentation below):
    • Entries Panel:
      • Provides convenient display of entries from the DB.
      • Allows on-the-spot manipulation of the entries.
      • Allows the user to conveniently load more logs into the DB.
    • Rule Editing Panel:
      • Provides an interface for creating rules for automatic entry manipulation.
    • Rule Management Panel:
      • Provides an interface for activating, deactivating or deleting existing rules.
      • Allows the user to view all entries matching a rule.
      • Allows the user to “load” a rule for editing, or use it as a template for a new (perhaps more refined) rule.
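
The line-by-line parsing with listeners described above can be illustrated with a minimal Java sketch. It is not the project's actual parser: the names LogEntryListener and StreamingLogParser are hypothetical, and the sketch assumes that every non-blank line is one complete entry, whereas real Privoxy entries would still have to be split into their fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface; the project's real listener type may differ. */
interface LogEntryListener {
    void entryParsed(String entry);
}

/** Streams a log one line at a time, so only the current line is held in memory. */
class StreamingLogParser {
    private final List<LogEntryListener> listeners = new ArrayList<>();

    void addListener(LogEntryListener listener) {
        listeners.add(listener);
    }

    /** Notifies every listener once per non-blank line and returns the entry count. */
    int parse(Reader source) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumption: one line is one entry. A real parser would extract
                // the IP, URL, headers and so on before calling the listeners.
                for (LogEntryListener listener : listeners) {
                    listener.entryParsed(line);
                }
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        StreamingLogParser parser = new StreamingLogParser();
        // Stand-in for the database listener: it only prints each entry.
        parser.addListener(entry -> System.out.println("would insert: " + entry));
        int n = parser.parse(new StringReader("first entry\n\nsecond entry\n"));
        System.out.println(n + " entries parsed");
    }
}
```

Because listeners are called as soon as an entry is complete, memory use stays independent of the log size, and further listeners (for example, one that applies the active rules) could be attached without changing the parser.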

Setting up the system

  • If you wish to set up a new “log-producing” instance:
    • Download and install Privoxy.
    • Modify Privoxy’s config file to write logs into your preferred directory.
    • Add the backupLog.bat script in Appendix B (also attached to the submission) to the Windows Task Scheduler, and set it to run at regular intervals (preferably once an hour; by our observation, approximately 10MB of log data are generated each hour). Don’t forget to modify the script to suit the file and folder locations on your instance!
    • Set up a way to copy the log files to another machine (ClipBook service, rdpclip.exe, FTP server, etc.).
  • If you don’t have it yet, download, install and run MySQL Server.
  • Run the schema creation script in Appendix A (also attached to the submission).
    NOTE: if, for some reason, you need to run the script again, make sure you delete the "hagana_proj" database from your MySQL server (if it’s there) before running the script.
  • If you don’t have it yet, download and install the Java Runtime Environment (or the Java Development Kit, which contains the JRE).
  • If you don’t have it yet, download and install the latest version of Eclipse.
  • In Eclipse, import the two projects, PrivoxyLogParser and PrivoxyLogViewer.
  • In the PrivoxyLogParser project, go to the file messages.properties (in the res folder), and change the values there to fit your MySQL server properties (server address, username and password).
  • Download some logs from the EC2 Instance.
  • Run the PrivoxyLogViewer (the file to run there is MainFrame.java) and use the Load Logs button in the Entries panel (see detailed documentation below) to load the Privoxy logs into the system.
  • That’s it – you’re ready to view the logs!
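
As a rough illustration of how connection settings such as those in messages.properties could be turned into a JDBC connection string, here is a small Java sketch. The key names db.host, db.user and db.password are assumptions made for this example; the actual keys in the project's file may differ. Only the database name hagana_proj is taken from Appendix A.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Holds the values needed to open a MySQL connection through JDBC. */
class DbSettings {
    final String url;
    final String user;
    final String password;

    DbSettings(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Reads settings in .properties format; the key names are hypothetical. */
    static DbSettings load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String host = props.getProperty("db.host", "localhost");
        // hagana_proj is the database created by the script in Appendix A.
        String url = "jdbc:mysql://" + host + "/hagana_proj";
        return new DbSettings(url, props.getProperty("db.user"), props.getProperty("db.password"));
    }

    public static void main(String[] args) {
        DbSettings s = load(new StringReader(
                "db.host=localhost\ndb.user=viewer\ndb.password=secret\n"));
        System.out.println(s.url + " as " + s.user);
        // With the MySQL JDBC driver on the classpath, a connection could be opened with
        // java.sql.DriverManager.getConnection(s.url, s.user, s.password).
    }
}
```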

GUI Documentation

  • Entries Panel
    • Screenshot:
    • Load Logs – opens a file chooser which allows the user to choose Privoxy log files to parse (multiple selection is supported). Once the logs are parsed, all the active rules will be applied to the entries.
    • Reset – resets the panel to its initial state (i.e. view all entries).
    • Create Rule From Selection – uses the currently selected cells as a template for a new rule (OR between all the values; see Rule Editing Panel documentation below).
    • Prev Page, Next Page – move between pages of the current view (entries are loaded 200 at a time, because loading all of them at once would use too much memory).
    • Entry table – displays the main columns of the entries. Click on a single cell to view its content in the Selected cell field, and view the Request Headers and Response Headers associated with that entry. Select a cell or range of cells to view and/or modify the tags associated with them, and/or modify the interest associated with them.
    • Filter entries by interest – the user can choose to view only entries with a certain Interest value.
    • Change interest of selected rows to – the user can change the Interest level of all selected rows at once.
    • Tags – the user can assign one or more tags to the selected entries (use the provided buttons to move tags between the Applied and Unapplied lists). The user can also add new tags to the system (using the + button) or delete existing ones (using the Delete tag button).
  • Rule Editing Panel
    • Screenshot:
    • Rule name – must be unique and at most 50 characters long. If a rule with the same name already exists, a dialog appears upon clicking Save, asking the user whether to overwrite it.
    • Rule description – a short textual description of the purpose of the rule.
    • SQL Statement – the main part of the rule. Here, the user writes an SQL statement that catches the entries of interest and returns their entry_id (note the static “header” of the query – the statement must return a list of entry_ids!). Use the DB Schema button below to show a reminder of the relevant part of the database schema:
    • Actions – the actions to take upon the entries matching the rule. Note that these actions (and especially the Delete entries one) CANNOT be undone!
      The actions are:
      • Delete Entries
      • Change interest level
      • Apply tags
      • Unapply tags
    • Upon clicking the Save button, the user will be asked whether to apply the rule now, or just save it for future entries. Note that applying rules can take a while!
  • Rule Management Panel
    • Screenshot:
    • Selected Rule Description – click on a rule to view its description in this field.
    • Active rules – these rules will be applied to any newly parsed logs.
    • Inactive rules – these rules remain saved but are not applied until the user decides to make them active again.
    • Use the provided buttons to move rules between the Active and Inactive lists.
    • Delete selected rule – does just that.
    • Use selected rule as template to a new rule – opens the rule in the Rule Editing Panel, where the user can either overwrite it or use it to create a new rule.
    • View entries related to the selected rule – opens the entries (if there are any) “captured” by the rule in the Entries Panel.
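
The paging and interest filtering of the Entries Panel can be pictured as a query builder along the following lines. This is only a sketch under assumptions: the class EntryPageQuery and its methods are hypothetical, and the viewer's real queries may be written differently. The page size of 200 and the master table with its IS_INTERESTING and ENTRY_ID columns come from this report.

```java
/** Builds the SQL text for one page of the entries view. */
class EntryPageQuery {
    static final int PAGE_SIZE = 200; // entries shown per page, as stated above

    /** Maps the three Interest values (TRUE, FALSE, null = unclassified) to a SQL condition. */
    static String interestCondition(Boolean interest) {
        if (interest == null) {
            return "IS_INTERESTING IS NULL"; // NULL must be tested with IS NULL, not "="
        }
        return "IS_INTERESTING = " + (interest ? 1 : 0);
    }

    /** Returns the query for a zero-based page; pass a null condition to view all entries. */
    static String page(String condition, int page) {
        String where = (condition == null) ? "" : " WHERE " + condition;
        return "SELECT * FROM master" + where
                + " ORDER BY ENTRY_ID LIMIT " + PAGE_SIZE
                + " OFFSET " + (page * PAGE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(page(null, 0));                    // first page, no filter
        System.out.println(page(interestCondition(null), 2)); // third page of unclassified entries
    }
}
```

Ordering by ENTRY_ID keeps the pages stable between Prev Page and Next Page clicks, since without an ORDER BY the database is free to return rows in any order.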

Class Diagram


Sample Results

These sample results come from analyzing about 90,000 log entries, which are about 1.5 days of traffic on the honeypot.

  • A (very) simple rule to catch porn:
  • The rule itself:

  • And the results (about 1000 entries; more can probably be caught with more sophisticated queries):

    Interestingly enough, most requests come from the 78.159.125.0 subnet…
  • Suspicious user-agents
  • This rule catches requests that supposedly come from computers with the MS Windows 98 operating system. Since we’re in 2010 now, this seems quite suspicious…
  • The rule itself:

  • And the results – about 9000(!) entries!

  • A quick look at the results of the Win98 rule revealed a bunch of requests to the site rd.apmebf.com, all of which (in addition to the domain itself looking suspicious) have &host=media.fastclick.net at the end of the URL.
    This seemed suspicious to us, so we wrote a trivial rule (“WHERE m.host LIKE '%apmebf%'”) to catch all the apmebf requests and assign a tag to them. In a smaller database (about 19,000 entries), this rule caught about 600 entries, coming in batches of several dozen requests, often asking for the same page.

    A web search for the word apmebf suggests that it is a malicious site, blocked by many firewalls.
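
To make the shape of such rules concrete, the following Java sketch composes a full rule query from a fixed header and a user-written clause. The exact header text is an assumption (the alias m is inferred from the m.host reference above), and the Windows 98 clause is a hypothetical reconstruction, not the rule we actually used; it also assumes that the header name is stored as User-Agent.

```java
/** Composes a rule query from the fixed header and the user-supplied clause. */
class RuleQuery {
    // Assumed form of the static header; every rule must return entry IDs.
    static final String HEADER = "SELECT m.ENTRY_ID FROM master m ";

    static String compose(String userClause) {
        return HEADER + userClause.trim();
    }

    public static void main(String[] args) {
        // The apmebf rule quoted in the text.
        System.out.println(compose("WHERE m.host LIKE '%apmebf%'"));

        // Hypothetical rule matching user-agents that claim to be Windows 98.
        System.out.println(compose(
                "JOIN request_headers h ON h.ENTRY_ID = m.ENTRY_ID "
                + "WHERE h.REQ_HEADER_NAME = 'User-Agent' "
                + "AND h.REQ_HEADER_CONTENT LIKE '%Windows 98%'"));
    }
}
```

Note that a LIKE pattern starting with % cannot use an ordinary index, so rules of this kind scan the whole table, which is one reason why applying rules can take a while.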

Future work

  • Make the SQL queries more efficient (currently we have a problem dealing with databases with a large number of entries).
  • Make the user defined queries more structured and guided.
  • Support for creating automatic queries from multiple selection from the entries table.
  • Support for reconstruction and “replay” of requests.

Appendix A – Database creation script

CREATE DATABASE `hagana_proj` /*!40100 DEFAULT CHARACTER SET latin1 */;

DROP TABLE IF EXISTS `hagana_proj`.`active_rules`;

CREATE TABLE `hagana_proj`.`active_rules` (

`RULE_NAME` char(50) NOT NULL DEFAULT 'NEW_RULE',

PRIMARY KEY (`RULE_NAME`) USING BTREE

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`associated_tags`;

CREATE TABLE `hagana_proj`.`associated_tags` (

`ENTRY_ID` bigint(20) unsigned NOT NULL,

`TAG` varchar(50) NOT NULL,

PRIMARY KEY (`ENTRY_ID`,`TAG`) USING BTREE,

KEY `Tag_Index` (`TAG`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`master`;

CREATE TABLE `hagana_proj`.`master` (

`ENTRY_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,

`SOURCE_IP` char(15) NOT NULL,

`COUNTRY` varchar(15) DEFAULT NULL,

`DESTINATION_URL` varchar(3000) DEFAULT NULL,

`HTTP_METHOD` char(5) NOT NULL,

`REQUESTED_PAGE` varchar(3000) DEFAULT NULL,

`PROTOCOL` char(15) DEFAULT NULL,

`PROTOCOL_VERSION` double DEFAULT NULL,

`DECISION` char(5) NOT NULL,

`RESPONSE_CODE` int(10) unsigned DEFAULT NULL,

`PRIVOXY_COMMENT_CODE` int(10) unsigned DEFAULT NULL,

`TIMESTAMP` datetime NOT NULL,

`IS_INTERESTING` tinyint(1) DEFAULT NULL,

`HOST` varchar(3000) DEFAULT NULL,

PRIMARY KEY (`ENTRY_ID`) USING BTREE,

KEY `IP_index` (`SOURCE_IP`),

KEY `Interest_Index` (`IS_INTERESTING`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`request_headers`;

CREATE TABLE `hagana_proj`.`request_headers` (

`ENTRY_ID` bigint(20) unsigned NOT NULL,

`REQ_HEADER_NAME` varchar(100) NOT NULL,

`REQ_HEADER_CONTENT` varchar(3000) DEFAULT NULL,

KEY `Header_Id_Index` (`ENTRY_ID`) USING BTREE,

KEY `Header_Name_Index` (`REQ_HEADER_NAME`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`response_headers`;

CREATE TABLE `hagana_proj`.`response_headers` (

`ENTRY_ID` bigint(20) unsigned NOT NULL,

`RES_HEADER_NAME` varchar(100) NOT NULL,

`RES_HEADER_CONTENT` varchar(3000) DEFAULT NULL,

KEY `Header_Id_Index` (`ENTRY_ID`) USING BTREE,

KEY `Header_Name_Index` (`RES_HEADER_NAME`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`rules_list`;

CREATE TABLE `hagana_proj`.`rules_list` (

`RULE_NAME` char(50) NOT NULL DEFAULT 'NEW_RULE',

`DESCRIPTION` varchar(3000) DEFAULT NULL,

`SQL_QUERY` varchar(5000) NOT NULL,

`TO_DELETE_ENTRIES` smallint(1) unsigned NOT NULL DEFAULT '0',

`TO_CHANGE_INTEREST` smallint(1) unsigned NOT NULL DEFAULT '0',

`CHANGE_INTEREST_TO` smallint(1) unsigned DEFAULT NULL,

PRIMARY KEY (`RULE_NAME`) USING BTREE

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`tag_list`;

CREATE TABLE `hagana_proj`.`tag_list` (

`TAG` varchar(50) NOT NULL,

PRIMARY KEY (`TAG`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`tags_associated_in_rules`;

CREATE TABLE `hagana_proj`.`tags_associated_in_rules` (

`RULE_NAME` char(50) NOT NULL,

`TAG` varchar(50) NOT NULL,

PRIMARY KEY (`RULE_NAME`,`TAG`) USING BTREE

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `hagana_proj`.`tags_removed_in_rules`;

CREATE TABLE `hagana_proj`.`tags_removed_in_rules` (

`RULE_NAME` char(50) NOT NULL,

`TAG` varchar(50) NOT NULL,

PRIMARY KEY (`RULE_NAME`,`TAG`) USING BTREE

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Appendix B – log backup script

@echo off

:: Variables

set copyCmd=copy /y

set delCmd=del /f

set startCmd=start

set killCmd=taskkill /f /im

set day=%date:~4,2%

set month=%date:~7,2%

set year=%date:~10,4%

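set hour=%time:~0,2%

:: Replace the leading space of single-digit hours with a zero, so the file name contains no spaces

set hour=%hour: =0%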

set minute=%time:~3,2%

set timestamp=%day%_%month%_%year%_%hour%_%minute%

set srcDir=E:\PrivoxyLogs

set srcFile=%srcDir%\privoxy.log

set dstDir=%srcDir%\LogBackups

set dstFile=%dstDir%\privoxy_%timestamp%.log

set privoxyDir=D:\Privoxy-students

set privoxyExe=%privoxyDir%\privoxy.exe

::

:: Kill privoxy so that it won't bother the backup process

%killCmd% privoxy.exe

:: Copy the old log file

%copyCmd% %srcFile% %dstFile%

:: Delete the old log file

%delCmd% %srcFile%

:: Create a new empty file instead

echo. 2>%srcFile%

:: Start privoxy back up

%startCmd% %privoxyExe%

::

echo Backup done!
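
The %date% and %time% substrings used by the script depend on the regional settings of the Windows instance. For comparison, here is a minimal Java sketch that produces the same privoxy_day_month_year_hour_minute.log naming scheme independently of the locale. The class name BackupName is hypothetical and this code is not part of the submitted project.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Builds a backup file name in the same scheme as backupLog.bat. */
class BackupName {
    // Day, month, year, hour and minute, zero-padded and separated by underscores.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd_MM_yyyy_HH_mm");

    static String forTime(LocalDateTime when) {
        return "privoxy_" + when.format(STAMP) + ".log";
    }

    public static void main(String[] args) {
        System.out.println(forTime(LocalDateTime.now()));
    }
}
```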