Baby steps with a honeypot

Mark Cooper

April 2002

Outline

This document describes how I built and ran my first honeypot. It was based heavily on the work done by Lance Spitzner and his colleagues in the Honeynet Project (http://project.honeynet.org).

The aim of my first deployment was to start gaining some experience in the handling of honeypot technologies, rather than concentrate on actual hacker activity.

Contents

Outline
Layout
Data Control
    First attempt at router ACLs
Data Logging
The Honeypot
Event Analysis
    Using sed to determine port numbers
    Anonymous FTP
    CISCO IOS Context Based Access Control Lists (CBAC)
Fast Forward
Summary
Bibliography
Links

Layout

This was the layout of my system:


Figure 1 - Network layout

Data Control

One of the key issues with a honeypot is preventing the compromised host from being used to attack other hosts on the Internet. Having had initial problems limiting the number of outbound connections from my honeypot, I opted for a slightly simpler setup: blocking all outbound connections from the honeypot. This was accomplished using an extended ACL on a CISCO router.

First attempt at router ACLs

The following extended access list was applied to the honeypot side of my router:

access-list 100 permit tcp host honeypot any established
access-list 100 permit icmp host honeypot host router
access-list 100 deny ip any any log
CISCO ACLs version 1

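The above list allowed TCP connections from the Internet to be established to the honeypot, but not from the honeypot. The "established" keyword matches only TCP segments with the ACK or RST bit set, so the honeypot could answer inbound connections but could not send the initial SYN of a new one. Note also that neither UDP nor ICMP was permitted. The only exception was ICMP from the honeypot to the router itself, enabled purely to allow ping packets from the honeypot to the router.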
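As a sketch, the test that "established" applies (ACK or RST bit set in the TCP flags) can be modelled with a little shell arithmetic. The function name acl_established and the sample flag values are illustrative assumptions; this models only that single test, not the router's full ACL processing.

```shell
#!/bin/sh
# Model of the "established" test: a TCP segment matches when its
# ACK (0x10) or RST (0x04) bit is set.
# acl_established is a hypothetical helper, for illustration only.
acl_established() {
    if [ $(( $1 & 0x14 )) -ne 0 ]; then
        echo permit
    else
        echo deny
    fi
}

acl_established 0x12   # SYN+ACK: honeypot answering an inbound connection
acl_established 0x10   # ACK: traffic on an existing connection
acl_established 0x02   # bare SYN: honeypot trying to open a new connection
```

The first two calls print "permit" and the last prints "deny", which is why the honeypot could reply to attackers but could not originate connections of its own.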

Data Logging

All activity to the honeypot was logged using Snort. The snort.conf file used was the one provided in the original documentation on the Honeynet Project web site. This has now been shown to be less than ideal, as it explicitly logged TCP, UDP and ICMP traffic, and thus missed any other types of IP traffic.

The Honeypot

The honeypot was running HP-UX 11.00. Prior to being assigned as a honeypot, it was cleanly installed and comprehensively patched. A number of services were disabled, then later re-enabled to provide a more attractive target.

The level of security on a honeypot is a matter of continual debate. One school of thought is that there is no point deploying a honeypot with known weaknesses. One expects that a hacker will use the easiest method to compromise a system, and so it is argued that a honeypot with known vulnerabilities has little value if the aim is to detect and analyse new vulnerabilities.

As this was my first honeypot I tried to hit a middle ground. As already explained, the system was fairly well patched, but ran some services that, from a security point of view, it shouldn't. Maybe that would get me the best of both worlds, or maybe the worst. Only time would tell, and this was, after all, a learning exercise!

Event Analysis

Using the Snort configuration file from the Honeynet Project, all network activity was logged in libpcap-formatted files. Packets that matched Snort signatures generated alerts in the normal way, and were logged in both short ('fast') and detailed ('full') ASCII formats.

Note that examining only the alert files misses the point of a research honeypot. If there is a new, unknown exploit 'out there', then by definition there will not yet be a Snort signature for it, and so there would be no entry in the snort_full or snort_fast alert files. However, there would be data in the binary output file, as all TCP, UDP and ICMP traffic was logged there. So it was necessary to develop a quick way to spot any interesting traffic in the binary files. As only TCP traffic was being permitted by the router ACLs, that was all I needed to look for.

Using sed to determine port numbers

A quick way to filter out the noise from the binary file was to examine it using tcpdump, specifying a suitable BPF filter. For instance, the following filter will ignore all packets in which the TCP SYN flag is set:

    tcp and tcp[12:2] & 0x02 == 0

Similarly, to ignore all RST packets:

    tcp and tcp[12:2] & 0x04 == 0

Naturally, the above could be combined:

    tcp and tcp[12:2] & 0x06 == 0
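The mask arithmetic behind these filters can be checked with a small shell sketch. It assumes a 16-bit value taken from bytes 12 and 13 of the TCP header, where the low byte holds the flags (SYN is 0x02, RST is 0x04). The helper name keep_packet and the sample values are invented for illustration.

```shell
#!/bin/sh
# Emulate "tcp[12:2] & 0x06 == 0" for a given 16-bit header value.
# The high byte (0x50 here) is the data offset; the low byte is the flags.
# keep_packet is a hypothetical helper used only in this sketch.
keep_packet() {
    if [ $(( $1 & 0x06 )) -eq 0 ]; then
        echo keep
    else
        echo drop
    fi
}

keep_packet 0x5002   # SYN only
keep_packet 0x5012   # SYN+ACK
keep_packet 0x5014   # RST+ACK
keep_packet 0x5010   # ACK only
keep_packet 0x5018   # PSH+ACK (data)
```

The first three calls print "drop" and the last two print "keep". Note that a SYN+ACK reply is dropped along with a bare SYN, because the mask tests the SYN bit regardless of ACK.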

Careful use of standard UNIX command line text processing tools, such as sed, can be used to extract useful information. For instance, to determine which TCP ports were successfully connected to on my honeypot I used the following incantation:

    tcpdump -nr snort.log 'tcp and tcp[12:2] & 0x06 == 0' |
    sed -ne 's/^[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*\.[0-9][0-9]* W\.X\.Y\.Z\.\([1-9][0-9]*\) .*$/\1/p' |
    sort -n | uniq

The first line is a tcpdump invocation to show all TCP packets without a SYN or RST flag.

The second line shows sed being used to extract the port number used on the honeypot. The last line uses the sort and uniq commands to produce a compact sorted list of the unique port numbers connected to.

To help understand what's happening, look at a typical tcpdump output:

03:17:28.820874 at.ta.ck.er.1026 > W.X.Y.Z.80: S 15045320:15045320(0) win 8192 <mss 512,nop,nop,sackOK> (DF)
03:17:28.830874 W.X.Y.Z.80 > at.ta.ck.er.1026: R 0:11(11) ack 15045321 win 0 (DF)

Briefly, this trace of two packets starts with an inbound TCP SYN packet from the client to the honeypot. SYN packets are used as the first step in the three-way TCP handshake required when establishing a TCP connection. The honeypot responds with a reset (RST) packet to the client. This indicates that the honeypot is not listening on this port. As I wasn't interested in unsuccessful connection attempts, I ignored all packets including a SYN or RST flag.

A more detailed breakdown now follows.

Each packet trace starts with a timestamp, in hours:minutes:seconds.fraction format. This is matched in sed with the following regular expression[1]:

    ^[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*\.[0-9][0-9]*

The first segment, "^[0-9][0-9]*:", is explained below.

The initial caret symbol (^) tells sed to start pattern matching at the start of the line. The "[0-9]" means match any single digit in the range 0-9. The "[0-9]*" that follows also matches digits, but the asterisk modifier "*" means "match zero or more of the previous character". Together they match one or more digits, which covers an hour value that is one or two digits in length.[2]

We then follow this with a literal colon ":" character.

We repeat this segment twice, to cater for the minutes and then seconds. We then follow it with a literal period "." character (instead of a colon), and then a similar segment to match the 1 or more digits that represent the fraction of a second [3].
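The timestamp portion of the pattern can be tried on its own. The sketch below is an illustration only: match_timestamp is a made-up helper name and the sample lines are invented.

```shell
#!/bin/sh
# Print only the input lines that begin with an
# hours:minutes:seconds.fraction timestamp, using the same basic
# regular expression as the timestamp segment described above.
# match_timestamp is a hypothetical helper name.
match_timestamp() {
    sed -n '/^[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*\.[0-9][0-9]*/p'
}

printf '%s\n' \
    '03:17:28.820874 first sample' \
    '3:07:02.5 short fields still match' \
    'no timestamp here' \
    '03:17:28 missing fraction' | match_timestamp
```

Only the first two sample lines are printed. The third has no timestamp at all, and the fourth fails because the pattern requires a literal period and at least one digit after the seconds.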

After the timestamp there is a single space character before the start of the source IP address and source port. In the line below, this segment is "W.X.Y.Z.80".

03:17:28.830874 W.X.Y.Z.80 > at.ta.ck.er.1026: R 0:11(11) ack 15045321 win 0 (DF)

The IP address, obscured here, is displayed in normal "dotted quad" notation, i.e. 4 sets of digits each separated by a single period "." character. For TCP and UDP packets, another period and then the source port number immediately follow the IP address. In the above case, the port is 80.

In the above example I have replaced my honeypot IP address with "W.X.Y.Z". The corresponding portion of the regular expression that matches this is shown below. Note that it starts with a space character.

     W\.X\.Y\.Z

As the period character has a special meaning in regular expressions[4], it must be preceded by a back slash to indicate that we require a literal "." character at this position.

The following regular expression segment matches the port number:

    \.\([1-9][0-9]*\)

First, we match the final period character after the IP address. Next, we introduce a new regexp feature denoted by the \( ... \) construct. This causes sed to remember whatever is matched between the two boundary markers. We'll see why a little later.

To match the actual port number we use "[1-9][0-9]*". The first digit in the port number must be in the range 1 to 9. After that, we can have any number of further digits in the range 0-9; a valid port number has at most five digits in total [5].

OK. We now have the piece of information we wanted. Let's tell sed to skip the rest of the line.

     .*$

There is always a single space character after the port number, so we start with that. Then we use the common period-asterisk combination ".*" to match any remaining characters. The dollar symbol "$" is used to denote the end of line. So we have now matched everything after the single space up to the end of the line. Finished!

Well, not quite. We have found our port number, but not actually done anything with it. The regular expression that we've just looked at was used within a search-and-replace sed command, which can be summarised as

    s/match this/replace with this/options

The regular expression we've just examined constitutes the "match this" part. However, what do we replace it with? Indeed, why do we want to replace it at all?

To understand this you need to remember how sed works. It is fed textual input, manipulates the text, and then prints something as output[6]. By replacing the input text with just the port number, and then printing the result, we get a list of all the port numbers found.

Remember the \( ... \) construct we used in the regexp? It appeared as "\([1-9][0-9]*\)", bordering the part of the expression that matched the actual port number.

We can now refer to the matched text using the notation "\1". sed can remember up to 9 chunks of matched text. As this is the first (and only) chunk that we asked it to remember, it is number 1. It appears as the replacement text in the complete command below.

    sed -n 's/^[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*\.[0-9][0-9]* W\.X\.Y\.Z\.\([1-9][0-9]*\) .*$/\1/p'

To actually print out the resulting text we use the "p" flag, shown at the very end of the sed incantation. It prints the line only when a substitution has been made, so lines that do not match the pattern produce no output. The sort and uniq commands are simply used to remove any duplicates, thus producing a neat list of the honeypot ports to which TCP connections were successfully established[7].
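The whole sed stage can be exercised without a capture file by feeding it text in the same layout as the tcpdump output shown earlier. This is a sketch under stated assumptions: extract_ports is a hypothetical helper name, the sample lines are invented, and W.X.Y.Z stands in for the honeypot address as it does throughout this article. Newer tcpdump releases may print a protocol label such as "IP" after the timestamp, in which case the pattern would need adjusting.

```shell
#!/bin/sh
# extract_ports reads tcpdump-style text on stdin and lists the unique
# source ports used by the honeypot (shown here as W.X.Y.Z).
# extract_ports is a hypothetical helper name.
extract_ports() {
    sed -n 's/^[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*\.[0-9][0-9]* W\.X\.Y\.Z\.\([1-9][0-9]*\) .*$/\1/p' |
    sort -n | uniq
}

# Invented sample lines, standing in for tcpdump output that has
# already been through the SYN/RST filter.
cat <<'EOF' | extract_ports
03:20:01.100000 at.ta.ck.er.1030 > W.X.Y.Z.21: . ack 1 win 8760 (DF)
03:20:01.110000 W.X.Y.Z.21 > at.ta.ck.er.1030: P 1:50(49) ack 1 win 32768 (DF)
03:20:02.200000 W.X.Y.Z.80 > at.ta.ck.er.1031: . ack 120 win 32768 (DF)
03:20:02.300000 W.X.Y.Z.21 > at.ta.ck.er.1030: . ack 15 win 32768 (DF)
EOF
```

The first sample line has the honeypot as destination rather than source, so it does not match and is not printed. The remaining lines yield 21, 80 and 21, which sort and uniq reduce to 21 and 80.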

See, it's easy! Who said regular expressions were hard to understand? :)

Anonymous FTP

Since I noted a large number of probes looking for an anonymous FTP server, I decided to configure one on my honeypot. The process was quite straightforward, and was described (a little incorrectly!) in the manual page for ftpd. I also edited the inetd.conf file to enable all the logging options for the FTP daemon. Time now to sit back and wait...

Well, only a few hours passed and I already had some interest. It looked like an automated scan for anonymous FTP servers.

Testing from a remote host showed a flaw in my CISCO ACL. Whilst remote hosts could successfully connect to my FTP server, they could not perform any of the usual operations, such as ls, get or put. The reason was quite simple.

If they used normal "active" mode FTP, the server would attempt to open up a data connection back to the requesting client using a source port of 20/tcp. However, as you can see from my initial ACL shown above, this outbound connection was blocked at my router. Passive mode FTP wouldn't work either, as the honeypot, sitting on a private LAN behind a NATing firewall, replied with a non-routable RFC 1918 address. A quick fix was to add the second line of the following ACL.

access-list 100 permit tcp host honeypot any established
access-list 100 permit tcp host honeypot eq 20 any
access-list 100 permit icmp host honeypot host router
access-list 100 deny ip any any log
CISCO ACLs version 2

This was not ideal. Whilst it permitted the functionality I required, it also introduced a potential liability. Should the system be compromised, it was now able to initiate outbound connections to other hosts using source port 20.

A better approach was to use the context-based access control mechanism (CBAC) that my router supports.

CISCO IOS Context Based Access Control Lists (CBAC)

CBAC allows the router to dynamically add and remove ACLs as required by the higher-level protocol. In my case, the CBAC mechanism looks at the commands passed over the FTP command channel (TCP port 21), and dynamically creates an ACL to cater for the data connection specified in the FTP "PORT" command.
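As a rough illustration of what has to be read from the control channel, the argument of the FTP "PORT" command is six comma-separated decimal numbers: four for the client's address and two for the port, which is encoded as p1 * 256 + p2. The sketch below decodes such an argument. The helper name parse_port_cmd and the sample values are assumptions made for the example, and it says nothing about how the router implements the inspection internally.

```shell
#!/bin/sh
# Decode the argument of an FTP PORT command (h1,h2,h3,h4,p1,p2) into
# the address and TCP port the server is being asked to connect to.
# parse_port_cmd is a hypothetical helper; 192.0.2.10 is a sample
# address from a range reserved for documentation.
parse_port_cmd() {
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the argument on commas
    IFS=$old_ifs
    echo "$1.$2.$3.$4 $(( $5 * 256 + $6 ))"
}

parse_port_cmd 192,0,2,10,4,2
```

For the sample argument this prints "192.0.2.10 1026", since 4 * 256 + 2 = 1026. A temporary opening for the data connection only needs to cover that one address and port pair.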

With CBAC in place I no longer needed the dangerous port 20 entry added in "CISCO ACLs version 2". Instead, the following lines enabled CBAC. First, a global configuration command to enable CBAC inspection of FTP:

ip inspect name anyname ftp timeout 600

Next, the inspect list identified by anyname had to be applied to an interface. It was applied, in interface configuration mode, to the interface that was not directly connected to the honeypot.

ip inspect anyname in

My ACLs were now back to those shown in "CISCO ACLs version 1" above.

Fast Forward

I received lots of "warez" scans, and some interesting uploads. One was a new game that, according to the manufacturer's website, wasn't due for release until 24 hours later! Naturally, I removed all uploads as soon as I spotted them.

Since this article is simply to record some of my learning experiences gained during the configuration and deployment of my first honeypot, I'll not include any analysis of the anonymous FTP scans.

Summary

Did my first honeypot achieve my objectives? Yes, I believe it did. The consensus of experience says that the first couple of honeypots one deploys will teach more about honeypots than actual hacker activity.

My next honeypot will involve redesigning my perimeter security, to allow all protocols through to the honeypot whilst still controlling outbound connectivity. It will also use a more common operating system, so that I can gain some first-hand experience of post-incident analysis!

Bibliography

The following books and articles provided extremely useful information.

1 "SED and AWK"
            Dale Dougherty
            O'Reilly & Associates, Inc.

2 "Mastering Regular Expressions"
            Jeffrey E.F. Friedl
            O'Reilly & Associates, Inc.

3 "TCP/IP Illustrated, Volume 1 - The Protocols"
            W. Richard Stevens
            Addison-Wesley

4 "Cisco IOS Access Lists"
            Jeff Sedayao
            O'Reilly & Associates, Inc.

5 "Know Your Enemy: Honeynets"
            The Honeynet Project
            http://project.honeynet.org

6 "Honeypots - Definition and value of honeypots"
            Lance Spitzner
            http://www.enteract.com/~lspitz

7 Cisco IOS router benchmark and audit tool
            The Centre for Internet Security
            http://www.cisecurity.org

Links

The Honeynet Project

http://project.honeynet.org

The Distributed Honeypot Project

http://www.lucidic.net

The Centre for Internet Security

http://www.cisecurity.org



[1] To learn more about regular expressions, check the regex manual page on a Unix system. Alternatively, refer to the bibliography at the end of this article for details of books covering sed and regular expressions.

[2] There are other ways of specifying this. For instance, one could replace the complete segment by "[0-9]\{1,2\}" (the interval braces must be escaped in the basic regular expressions used by sed), as we only ever expect 1 or 2 digits in the hour field of the timestamp.

[3] Note that here we really do need to use the "*" symbol, as different platforms record timestamps at different resolutions and so print a different number of fractional digits.

[4] A period means "match any single character".

[5] Technically, a port number is a 16-bit value and so cannot exceed 65535, which means that not every five-digit number is a valid port. However, we trust that the input to sed is valid, and so do not need a complex regexp to cater for this exact specification.

[6] By default, sed outputs all input lines. The "-n" option used in the command inhibits that action.

[7] This command sequence does have one small flaw in it. It does not show connections that were "half-opened" and then reset by the client. This is a common technique used when port scanning a system for open tcp ports. To correct this we would need to modify the BPF filter to allow packets that had the SYN and ACK flags set, but not those with just a SYN or RST.
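One possible form of the modified filter is sketched below. It is a suggestion, not the filter used for this article: it keeps packets that have no RST flag and rejects only those in which SYN is set without ACK. The names FILTER and keep_flags are hypothetical, and the expression has not been run against the original capture. The shell function repeats the same test so that it can be checked against individual flag values (SYN 0x02, RST 0x04, ACK 0x10).

```shell
#!/bin/sh
# Candidate BPF expression: no RST, and not a bare SYN.
# tcp[13] is the TCP flags byte.
FILTER='tcp and tcp[13] & 0x04 == 0 and tcp[13] & 0x12 != 0x02'
printf '%s\n' "$FILTER"

# The same predicate in shell arithmetic, for checking flag values.
# keep_flags is a hypothetical helper name.
keep_flags() {
    if [ $(( $1 & 0x04 )) -eq 0 ] && [ $(( $1 & 0x12 )) -ne 2 ]; then
        echo keep
    else
        echo drop
    fi
}

keep_flags 0x02   # bare SYN from the scanner
keep_flags 0x12   # SYN+ACK from an open port on the honeypot
keep_flags 0x14   # RST+ACK from a closed port
keep_flags 0x10   # ACK on an established connection

# With a capture file the filter would be used as:
#   tcpdump -nr snort.log "$FILTER"
```

The calls print drop, keep, drop and keep in that order, so the SYN+ACK reply that reveals an open port to a half-open scan is no longer filtered out.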