DITL 2009 Data Collection

A Day in the Life of the Internet (DITL) is a large-scale data collection project undertaken by CAIDA and OARC every year since 2006. This year's DITL collection took place March 30-April 1, 2009. If you would like to participate by collecting and contributing DNS packet captures, please subscribe to the DITL mailing list.

Participation Requirements

There are no strict participation requirements. OARC is happy to accept data from members and non-members alike. If you are a non-member, you may want to sign a Proprietary Data Agreement with us, but this is not required.

In terms of data sources, we are especially interested in broad coverage of DNS root servers, TLD servers, AS112 nodes, and "client-side" iterative/caching resolvers.

Types of DNS Data

Most of the data that we collect for DITL will be pcap files (e.g., from dnscap or tcpdump). We are also happy to accept other data formats such as BIND query logs, text files, SQL database dumps, and so on. We have an established system for receiving compressed pcap files from contributors. If you want to contribute data in a different format, please contact us to make transfer arrangements.

Pre-collection Checklist

  • Please make sure that your collection hosts are time-synchronized with NTP. Do not simply use date to check a clock since you might be confused by time zone offsets. Instead use ntpdate like this:
    $ ntpdate -q clock.isc.org
    server 204.152.184.72, stratum 1, offset 0.002891, delay 0.02713
    

    The reported offset should normally be very small (less than one second). If not, your clock is probably not synchronized with NTP.

  • Be sure to do some "dry runs" before the actual collection time. This will test your procedures and give you a sense of how much data you'll be collecting.
  • Carefully consider your local storage options. Do you have enough local space to store all the DITL data? Or will you need to upload it as it is being collected? If you have enough space, perhaps you'll find it easier to collect first and upload after, rather than trying to manage both at the same time.
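As a rough illustration of the storage question, a short dry run can be extrapolated to the full collection window. The sample size, duration, and run length below are purely illustrative assumptions; substitute your own dry-run figures:

```shell
#!/bin/sh
# Extrapolate dry-run capture volume to the full DITL window.
# SAMPLE_BYTES and SAMPLE_SECS are illustrative; replace with your numbers.
SAMPLE_BYTES=120000000   # bytes captured during the dry run
SAMPLE_SECS=600          # dry-run length in seconds (10 minutes)
RUN_HOURS=48             # approximate length of the collection window

NEEDED_MB=$(( SAMPLE_BYTES / SAMPLE_SECS * RUN_HOURS * 3600 / 1000000 ))
echo "Estimated storage needed: ${NEEDED_MB} MB"
```

With the illustrative numbers above this prints "Estimated storage needed: 34560 MB", i.e. roughly 35 GB; your own rate will of course differ.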

Collecting Data with dnscap

If you don't already have your own system for capturing DNS traffic, we recommend using dnscap with some shell scripts that we provide specifically for DITL collection.

  1. Download the most recent ditl-tools tarball. This includes a copy of the dnscap sources.
  2. Run 'make' from the top-level ditl-tools directory.
  3. Note that dnscap depends on libbind, so you may need to install that first.
  4. Run 'make install' as root. This installs dnscap to /usr/local/bin.

Next we'll be working with some scripts in the scripts directory. By default these will store pcap files in the current directory.
You may want to copy these scripts to a different directory where you have plenty of free disk space.

We provide scripts for using either dnscap or tcpdump (with tcpdump-split). In most cases dnscap should be easier. The tcpdump method is included for sites that prefer it or cannot use dnscap for some reason. Note that the settings.sh configuration file described below includes variables for both dnscap and tcpdump. Some variables are common to both, while others are unique to each.

  1. Copy settings.sh.default to settings.sh.
  2. Open settings.sh in a text editor.
  3. Set the IFACES variable to the names of your network interfaces carrying DNS data.
  4. Set the NODENAME variable (or leave it commented out to use the output of `hostname` as the NODENAME). Please make sure that each instance of dnscap that you run has a unique NODENAME!
  5. Set the OARC_MEMBER variable to your OARC-assigned name. Note that the scripts automatically prepend "oarc-" to the login name, so just give the short version here.
  6. Note that the scripts assume your OARC ssh upload key is at /root/.ssh/oarc_id_dsa.
  7. Look over the remaining variables in settings.sh. Read the comments in capture-dnscap.sh to understand what all the variables mean.

Here is an example customized settings.sh file:

# Settings that you should customize
#
IFACES="fxp0"
NODENAME="lgh"
OARC_MEMBER="test"

When you're done customizing the settings, run capture-dnscap.sh as root:

$ sudo sh capture-dnscap.sh

When it's time to do the actual DITL data collection, please uncomment the START_T and STOP_T variables in settings.sh and run the scripts from within a screen session.
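For example, a collection session under screen might look like this (the session name "ditl" is just an illustration):

```shell
$ screen -S ditl
$ sudo sh capture-dnscap.sh
```

Detach from the session with Ctrl-a d; the capture keeps running, and you can reattach later with `screen -r ditl` to check on it.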

Collecting Data with tcpdump and tcpdump-split

Another collection option is to use tcpdump and our tcpdump-split program. The instructions are similar to the above.

  1. Download and install the ditl-tools package (see link above).
  2. Copy settings.sh.default to settings.sh and open it in a text editor.
  3. Set the IFACES variable to the single network interface to collect DNS data from.
  4. Set NODENAME.
  5. Set OARC_MEMBER.
  6. Tweak the BPF_FILTER variable as necessary.

Note that tcpdump won't capture IP fragments unless you add "or ip[6:2] & 0x1fff != 0" to your BPF_FILTER.
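For example, a BPF_FILTER that matches DNS traffic plus IP fragments might look like this in settings.sh (the base "port 53" expression is an assumption; adjust it to match your existing filter):

```shell
# match DNS traffic, plus IP fragments (which carry no port numbers)
BPF_FILTER="port 53 or (ip[6:2] & 0x1fff != 0)"
```

The fragment test checks the fragment-offset field of the IP header, so non-initial fragments of large DNS responses are captured too.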

Start the capture with:

$ sudo sh capture-tcpdump.sh

Uncomment the START_T and STOP_T variables and use screen when it's time for the real deal.

Data Available

Data collection and pre-processing is now complete. Please visit the DITL 2009 Data page for information on the data and how to access it.

Contact

Contact the OARC Admin with any questions about DITL 2009.

Submitted by admin on Mon, 2009-03-02 23:33.