DITL 2009 Data Collection
A Day in the Life of the Internet is a large-scale data collection project undertaken by CAIDA and OARC every year since 2006. This year, the DITL collection
There are no strict participation requirements. OARC is happy to accept data from members and non-members alike. If you are a non-member, you may want to sign a Proprietary Data Agreement with us, but this is not required.
In terms of data sources, we are always interested in getting a lot of coverage from DNS Root servers, TLD servers, AS112 nodes, and "client-side" iterative/caching resolvers.
Types of DNS Data
Most of the data that we collect for DITL will be pcap files (e.g., from dnscap or tcpdump). We are also happy to accept other data formats such as BIND query logs, text files, SQL database dumps, and so on. We have an established system for receiving compressed pcap files from contributors. If you want to contribute data in a different format, please contact us to make transfer arrangements.
Collecting Data with dnscap
If you don't already have your own system for capturing DNS traffic, we recommend using dnscap with some shell scripts that we provide specifically for DITL collection.
Next we'll be working with some scripts in the scripts directory. By default these will store pcap files in the current directory.
We provide scripts for two capture methods: dnscap alone, or tcpdump together with tcpdump-split. In most cases dnscap should be easier. The tcpdump method is included for sites that would prefer it or cannot use dnscap for some reason. Note that the settings.sh configuration file described below includes variables for both methods. Some variables are common to both, while others apply to only one.
Here is an example customized settings.sh file:
#
# Settings that you should customize
#
IFACES="fxp0"
NODENAME="lgh"
OARC_MEMBER="test"
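Before starting a capture, it can help to confirm that the variables from the example above are actually set. The following is only a sketch: check_settings is a hypothetical helper, not part of the DITL scripts, and it assumes that these three variables are the ones that must be non-empty.

```shell
# check_settings: hypothetical helper, NOT part of the DITL scripts.
# Prints "ok" if every listed variable is non-empty; otherwise lists the
# empty ones and returns a non-zero status.
check_settings() {
    missing=""
    for name in IFACES NODENAME OARC_MEMBER; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "ok"
}

# Example values copied from the settings.sh sample above.
IFACES="fxp0"
NODENAME="lgh"
OARC_MEMBER="test"
check_settings
```

In practice you would source your own settings.sh in place of the three assignments before calling the helper.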
When you're done customizing the settings, run capture-dnscap.sh as root:
$ sudo sh capture-dnscap.sh
When it's time to do the actual DITL data collection, please uncomment the START_T and STOP_T variables in settings.sh and run the scripts from within a screen session.
Collecting Data with tcpdump and tcpdump-split
Another collection option is to use tcpdump and our tcpdump-split program. The instructions are similar to the above.
Note that tcpdump won't capture IP fragments unless you add "or ip[6:2] & 0x1fff != 0" to your BPF_FILTER.
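The fragment clause works because bytes 6 and 7 of the IPv4 header hold three flag bits followed by the 13-bit fragment offset. Masking with 0x1fff keeps only the offset, so the clause matches every fragment after the first one. Those later fragments carry no UDP or TCP header and would otherwise be missed by a port-based filter. The sketch below shows one way to combine the clause with a base filter; the base filter "udp port 53" and the BASE_FILTER and FRAG_CLAUSE names are assumptions for illustration, and the default BPF_FILTER in settings.sh may differ.

```shell
# Assumed base filter for illustration; the default in settings.sh may differ.
BASE_FILTER="udp port 53"

# Clause from the note above: match packets with a non-zero fragment offset.
FRAG_CLAUSE="ip[6:2] & 0x1fff != 0"

# Parenthesize both halves so the "or" applies to the whole base filter.
BPF_FILTER="($BASE_FILTER) or ($FRAG_CLAUSE)"
echo "$BPF_FILTER"

# The same mask applied to two sample values of header bytes 6-7:
first=$(( 0x2000 & 0x1fff ))   # More Fragments flag set, offset 0
later=$(( 0x00b9 & 0x1fff ))   # flags clear, offset 185 (in 8-byte units)
echo "first=$first later=$later"
```

The first fragment has offset 0, so it is not matched by the clause itself; it is still captured because it contains the transport header that the base filter matches.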
Start the capture with:
$ sudo sh capture-tcpdump.sh
When it's time for the real collection, uncomment the START_T and STOP_T variables in settings.sh and run the script from within a screen session.
Data collection and pre-processing are now complete. Please visit the DITL 2009 Data page for information on the data and how to access it.
Contact the OARC Admin with any questions about DITL 2009.
Submitted by admin on Mon, 2009-03-02 23:33