DNS-OARC Public Comment on ICANN's DNS-CERT Business Case

On February 12th, 2010, ICANN issued an invitation for Public Comment on a proposed DNS-CERT. The original invitation can be found at http://www.icann.org/en/announcements/announcement-2-12feb10-en.htm and the list of current public comments may be found at http://forum.icann.org/lists/dns-cert-proposal/

DNS-OARC's submission follows:


DNS Operations, Analysis and Research Center

DNS-OARC Public Comment on ICANN's DNS-CERT Business Case

Introduction


DNS-OARC (http://www.dns-oarc.net) was established in 2004 to address a foreseen and growing need to improve the stability, security and understanding of the Internet's DNS infrastructure. Initially funded by the Internet Systems Consortium and a National Science Foundation grant, it is now an established nonprofit with over 60 DNS registries, registrars, operators, researchers and vendors comprising its membership.


DNS-OARC's mission is: to build trust among its members through forums where information can be shared in confidence; to enable knowledge transfer by organizing semiannual workshops; to promote research through data collection, analysis, and simulation; and to increase awareness with publicly available tools and services.


DNS-OARC has been following the DNS-CERT proposals closely, as there is much in common between its existing mission and that proposed for DNS-CERT. To date DNS-OARC has worked closely with ICANN in these areas, including collaboration on running the two Global DNS Security, Stability and Resiliency symposia, and participation in the recent DNS-CERT gap analysis workshop.


We are grateful for the co-operation we have received from ICANN in this collaboration, and for ICANN's support both as a member and in funding our data gathering and analysis related to the DNSSEC signing of the root zone.


In general, DNS-OARC welcomes the additional attention that ICANN has brought to the subject area of its mission, and we agree with ICANN's position that additional funding and resources for this area would be beneficial. We do, however, have some concerns about potential overlap, and about some aspects of the approach proposed, which are detailed below.


The views represented in this paper are those of DNS-OARC in its own right, and not necessarily those of OARC's individual members.

DNS-OARC's Capabilities


In the 6 years since its inception, DNS-OARC has developed a number of capabilities with which to fulfill its mission. Many of these are in established widespread use by the DNS operator community, and have contributed both to the response to specific large-scale incidents the global DNS infrastructure has faced, and to the standards of knowledge and best practice within the wider community. Many of these capabilities are compatible with the mission of DNS-CERT, and should be considered as resources that DNS-OARC can "bring to the table" of DNS-CERT's mission.


These capabilities include:

  • Regular open and member-only meetings.
  • A website which places substantial key knowledge, resources and information in the public domain, together with a private section which includes a member operational contact directory.
  • Data gathering, monitoring and analysis infrastructure.
  • Archive Data Sets, amounting to over 40 terabytes and dating back 5-10 years.
  • Open, member-only, closed-user and vetted-only mailing lists for knowledge and information sharing, together with a PGP encryption key repository for secure e-mail exchange.
  • Secure real-time Jabber/XMPP-based instant messaging service for members to get in touch and co-ordinate responses during incidents.
  • Publicly available tools for testing, verifying, probing and scanning DNS infrastructure for appropriate behavior and vulnerabilities. Use of many of these tools generates further data sets.
  • Relationships with researchers which allow for analysis and characterization of data, incidents and attacks.
  • Formal relationships with our members which establish DNS-OARC as a self-governing, self-funding neutral membership organization, and place legally-binding confidentiality requirements on all participants.
  • Governance structures including a Delaware legal entity with 501(c)3 nonprofit status and a Board of Directors elected by its members.
  • Full-time staff dedicated to fulfilling DNS-OARC's mission.

Partnership with Existing Initiatives


There are various global initiatives that resemble a CERT-like organization. DNS-OARC wishes to draw attention to their existing activities and emphasize the importance of working collectively with them.
  • Mailing lists: there are several which require vetting and on which global incidents are shared (OARC's vetted list, NXD, NSP-SEC, etc.).
  • Meetings: There are several technical colloquia that allow for operationally-relevant DNS meetings, such as DNS-OARC, CENTR-TECH, ICANN's ccNSO-TECH/DNSSECWG, RISG.
  • Active Incident Handling: what might not be shared on the aforementioned lists and meetings is the actual coordination and mitigation of an attack or threat. There are, however, established CERT organizations that handle those incidents. Some ccTLD registries already operate a CERT at a national level. Lastly, some operate a helpdesk, publish abuse contact addresses, provide various online tools, and monitor lists to handle incidents with actively participating staff, though they do not call this a CERT.


DNS-OARC encourages all organizations to develop a CSIRT-like internal organization or structure. The subsequent step would then be to join FIRST (a vetted membership organization) and/or DNS-OARC in order to share incidents and mitigated threats in a bilateral and trusted way.

DNS-OARC Position on a Global DNS-CERT

It is DNS-OARC's view that for DNS-CERT to be effective as a CERT, there needs to be a constituency for it to operate in, including some form of jurisdiction over that constituency. A CERT needs to gain trust from outside constituencies and other CERTs in order to share incidents and effectively mitigate threats. This cannot be dictated top-down, or by unilateral proclamation.


It is also DNS-OARC's experience that sharing of information, whether through data gathering or incident notification, is both essential to DNS infrastructure protection and something that cannot be compelled. Trust is a necessary pre-condition for such sharing, but even with trust we have seen that information is not always shared, and some valuable lessons have been learned about the practical limits that exist on information sharing.


While ICANN might foresee that extending its jurisdiction could serve as a stick to make the new gTLDs participate in DNS-CERT, there is no comparable carrot for the ccTLDs. Additionally, it is unclear what incentives registrars might have to co-operate voluntarily, as co-operation will inevitably lead to loss of income when domains have to be canceled, on top of the labor costs involved in monitoring and canceling them.


For an organization such as DNS-CERT to generate the trust it requires to be effective, ICANN needs to allow this entity to be governed and operated outside of the realm and reach of ICANN, working with already established and trusted organizations within the Registry/Registrar/ISP world.


It is projected that the DNS-CERT operating budget will be about US$4.2M annually. ICANN proposes that it be operated by an independent, free-standing entity, governed by a sponsor-based board. At present, however, there is only one sponsor: ICANN. This carries a risk of 'capture' - DNS-CERT can only be truly independent if funded by diverse sponsors and not just ICANN.

Conclusion


DNS-OARC encourages education and awareness in mitigating threats and handling incidents, and welcomes ICANN's raising the profile of the need for this. We feel it is necessary that this awareness be developed inside organizations that already have a responsibility for parts of the DNS community (such as TLD registries). Subsequently, these organizations can join already established vetted communities like FIRST or DNS-OARC.


DNS-OARC strongly encourages ICANN to work in co-operation with it to build upon DNS-OARC's existing established capabilities within the context of wider DNS security co-operation and initiatives. DNS-OARC would not support any DNS-CERT activity to the extent that it would unnecessarily duplicate DNS-OARC's own capabilities.


We remain to be persuaded of the requirement for a single, global DNS-CERT. ICANN has correctly identified some gaps between the capabilities of the existing established DNS security organizations and activities, and DNS-OARC welcomes this analysis. Given the large size of the total potential problem and solution space, we think it would be overly ambitious to attempt to establish a single over-arching new organization to address all of it. Rather, it would be far more effective to fully recognize established activities and dedicate a more modest budget to addressing the gaps, through support and funding to establish, assist and enhance trust and co-operation between these existing organizations.


Building education and awareness of incident handling and mitigation of threats, including development of response teams within, and further consensual co-operation between, existing ICANN constituency organizations will lead to a decentralized global cooperative. We believe that this will be far more effective, with greater ultimate reach and legitimacy than a single central DNS-CERT.



14-Apr-2010

Submitted by wayne@dns-oarc.net on Wed, 2010-04-14 14:14

DITL 2010 Data Collection

A Day in the Life of the Internet is a large-scale data collection project undertaken by CAIDA and OARC every year since 2006. This year, the DITL collection will take place in April. If you would like to participate by collecting and contributing DNS packet captures, please subscribe to the DITL mailing list.

Participation Requirements

There are no strict participation requirements. OARC is happy to accept data from members and non-members alike. If you are a non-member, you may want to sign a Proprietary Data Agreement with us, but this is not required.

In terms of data sources, we are always interested in getting a lot of coverage from DNS Root servers, TLD servers, AS112 nodes, and "client-side" iterative/caching resolvers.

Types of DNS Data

Most of the data that we collect for DITL will be pcap files (e.g., from dnscap or tcpdump). We are also happy to accept other data formats such as BIND query logs, text files, SQL database dumps, and so on. We have an established system for receiving compressed pcap files from contributors. If you want to contribute data in a different format, please contact us to make transfer arrangements.

Pre-collection Checklist

  • Please make sure that your collection hosts are time-synchronized with NTP. Do not simply use the date command to check a clock, since you might be confused by time zone offsets. Instead, use ntpdate like this:
    $ ntpdate -q clock.isc.org
    server 204.152.184.72, stratum 1, offset 0.002891, delay 0.02713
    

    The reported offset should normally be very small (less than one second). If not, your clock is probably not synchronized with NTP.

  • Be sure to do some "dry runs" before the actual collection time. This will test your procedures and give you a sense of how much data you'll be collecting (see the sketch after this list).
  • Carefully consider your local storage options. Do you have enough local space to store all the DITL data? Or will you need to upload it as it is being collected? If you have enough space, perhaps you'll find it easier to collect first and upload after, rather than trying to manage both at the same time.
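
For instance, a rough one-minute sample like this can provide a baseline (a sketch, assuming your system has the GNU timeout utility and that eth0 is the interface carrying your DNS traffic):

$ sudo timeout 60 tcpdump -i eth0 -w dryrun.pcap port 53
$ ls -l dryrun.pcap

Multiply the resulting file size by the number of minutes in the DITL collection window to approximate your total storage need.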

Collecting Data with dnscap

If you don't already have your own system for capturing DNS traffic, we recommend using dnscap with some shell scripts that we provide specifically for DITL collection.

  1. Download the most recent ditl-tools tarball. This includes a copy of the dnscap sources.
  2. You might need to edit dnscap/Makefile. Then run 'make' from the top-level ditl-tools directory.
  3. Note that dnscap no longer depends on libbind.
  4. Run 'make install' as root. This installs dnscap to /usr/local/bin.

Next we'll be working with some scripts in the scripts directory. By default these will store pcap files in the current directory.
You may want to copy these scripts to a different directory where you have plenty of free disk space.

We provide scripts for using either dnscap or tcpdump (with tcpdump-split). In most cases dnscap should be easier. The tcpdump method is included for sites that prefer it or cannot use dnscap for some reason. Note that the settings.sh configuration file described below includes variables for both dnscap and tcpdump. Some variables are common to both, while some are unique to each.

  1. Copy settings.sh.default to settings.sh.
  2. Open settings.sh in a text editor.
  3. Set the IFACES variable to the names of your network interfaces carrying DNS data.
  4. Set the NODENAME variable (or leave it commented to use the output of `hostname` as the NODENAME). Please make sure that each instance of dnscap that you run has a unique NODENAME!
  5. Set the OARC_MEMBER variable to your OARC-assigned name. Note that the scripts automatically prepend "oarc-" to the login name so just give the short version here.
  6. Note that the scripts assume your OARC ssh upload key is at /root/.ssh/oarc_id_dsa.
  7. Look over the remaining variables in settings.sh. Read the comments in capture-dnscap.sh to understand what all the variables mean.

Here is an example customized settings.sh file:

# Settings that you should customize
#
IFACES="fxp0"
NODENAME="lgh"
OARC_MEMBER="test"

When you're done customizing the settings, run capture-dnscap.sh as root:

$ sudo sh capture-dnscap.sh

When it's time to do the actual DITL data collection, please uncomment the START_T and STOP_T variables in settings.sh and run the scripts from within a screen session.
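
For example (a minimal sketch; the session name "ditl" is arbitrary):

$ sudo screen -dmS ditl sh capture-dnscap.sh

The first command starts the capture in a detached screen session; "sudo screen -r ditl" reattaches later so you can check on it.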

Collecting Data with tcpdump and tcpdump-split

Another collection option is to use tcpdump and our tcpdump-split program. The instructions are similar to the above.

  1. Download and install the ditl-tools package (see link above).
  2. Copy settings.sh.default to settings.sh and bring it up in a text editor.
  3. Set the IFACES variable to the single network interface to collect DNS data from.
  4. Set NODENAME.
  5. Set OARC_MEMBER.
  6. Tweak the BPF_FILTER variable as necessary.

Note that tcpdump won't capture IP fragments unless you add "or ip[6:2] & 0x1fff != 0" to your BPF_FILTER.
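
For example (the base "port 53" expression here is just an assumption about your existing filter; adjust to your own):

BPF_FILTER="port 53 or (ip[6:2] & 0x1fff != 0)"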

Start the capture with:

$ sudo sh capture-tcpdump.sh

Uncomment the START_T and STOP_T variables and use screen when it's time for the real deal.

Contact

Contact the OARC Admin with any questions about DITL 2010.

Submitted by admin on Tue, 2010-04-06 20:03

Presentations and Notes from the 2nd Global Annual Symposium on DNS Security, Stability and Resiliency

On Monday, February 1, 2010, DNS-OARC organized a few presentations at the start of the 2nd Global Annual Symposium on DNS Security, Stability and Resiliency.

Presentations and Notes

13:15: Introduction to DNS-OARC
Roy Arends / DNS-OARC
 
13:25: Investigating anomalous DNS traffic: A proposal for an address reputation system
Sebastian Castro / .NZ Registry Services

Q&A
Willingness of operators to cooperate? There's a need among the TLD operators to handle these kinds of events; to get traction, there are plans to address a wider audience.

Building a reputation system is one thing, but using it is another. What's the risk and impact of false positives (in social and technical domains)?
Blacklists frequently don't use such complicated schemes - we haven't considered highly strict rules for how identities are added. But this will require more thought as experience grows.

Once you have a blacklist, how would you envisage its use? Blocking services? Other? Filter at the edge? Haven't thought fully about this - future study topic.

Comment: During the event, several AS's were shown. This kind of data shows that a larger-scale study might be needed. So, before we get to "how does the blacklist work?", it might be good to do a larger study.

Comment: There needs to be some back-pressure to deal with trying to get the source to pull their weight to resolve the problem (spammers, for instance). Otherwise, the blacklist's size is going to monotonically increase.
Experientially, it is harder to work the problem back to the source.

Comment: As an operator, this data (or a blacklist) by itself is not particularly interesting; however, in combination with other data -- through a kind of intelligence fusion -- we can see some really useful trends. In other words, it's the combining of the data that is interesting, in terms of characterizing the current situation.

Comment: How far down the chain of "operators" do you go to collect (and analyze) this data? Different players have different ideas about what constitutes an anomaly, or what constitutes a threat. Avoiding subjective language, such as good or bad, might be helpful.

Comment: One idea put forth was that ADSL customers who are characterized as 'bad' should be blocked from using (a particular country's) network resolution services, because it could lead to an attack. (If someone can clarify this comment, please chime in.)

Comment: Seeing an anomaly might not necessarily indicate that a problem exists in the indicated domain. Instead, it is possible that you are measuring a problem in your own infrastructure.

How much resource (time & money) did it cost .NZ to do this investigation? And how are we going to see sharing of this data? Comment: Smaller organizations are not going to have the resources to be able to focus on all this data.
Two days of work. Tools used were largely already done (and known to the principal investigator).

14:01: APNIC DNS Measurement & Perspectives on 'DNS Health'
George Michaelson / APNIC

Interesting observation that, with DNSSEC enabled, we can see average UDP packet size increasing, including a fair body of packets with sizes in excess of 800 bytes, and that this could have a significant impact given network designers' assumptions that valid UDP traffic should nominally not exceed 512 bytes.

Q&A
If we divide our measurement data into two categories: one caused by human behavior and one by machines, can we explain the diurnal pattern shown in the NXDOMAIN from DSC slide?
Yes, it seems natural that there would be two different categories, and your observation is likely correct. However, there is still work to perform to characterize the collected data in a meaningful way.

Comment: Talked about DSC, what it is doing and what it should be doing. DNS-OARC was started with an NSF grant. A problem was "how to spend $ in a meaningful, useful way". Now gathering requirements for next-generation work. I want NSTAT - a place where data can be aggregated en masse. Now a framework seems achievable. (Next need is funding.)

Admin note: Break moved to 15:00 and shortened to 1/4-hour

14:41: Measurement for ascertaining health of the DNS
James Galvin / Afilias

Note-taker comment: Taking longer notes on this talk, because the slides are highly summarized.

When we collect data in this kind of context, we have to think about massive amounts of data, the size of which is going to grow continually. Are we going to collect measurements in the raw and keep that? Or are we going to create summaries and then aggregate the summaries?

The next question is, how do we sample the data we collect? Do we start looking outside our own infrastructure (at the entry point, for instance) or are we going to look inside our infrastructure? There are arguments for and against each possibility, largely because sampling at one point and not at another will gain or lose statistically important data. This question is deserving of additional study.

We need to think about things like creating a technical advisory board of people who are knowledgeable about analyzing the data, so that we can make sense of the information that is collected. This analysis needs to take into account ideas from a wide variety of points of view.

What will the introduction of DNSSEC do to the collection and analysis of data? Will it create a new vector to analyze? Or will it accentuate the negative aspects of existing data analysis?

w.r.t. DNSSEC widespread implementation: As the amount of data being moved increases, and as we see more signed zones being transferred, then we have to think about whether instantaneous propagation is the right model.

The last point, for discussion, is DNS views ("views" is intended to have a generic usage, with apologies to BIND). The hypothesis is that views and filtering are going to become mainstream -- and perhaps even mandated in some jurisdictions. In sum, entire zones will not be delivered. Filtering will be required in some circumstances.

Q&A
No questions.

Admin note: Break until 15:15.

15:18: Characterizing DNS Client Behavior using Hierarchical Aggregate
Keisuke Ishibashi / NTT

Q&A
(Referring to slide 11, "Experimental results":) You claim that your methodology increases accuracy by 10-20%. What's your ground truth to be able to make such an assertion?
Investigator made criteria but this was rough guidance only.

Is the mathematics intended to achieve on-the-fly results? Second question: this seems simple and cheap to calculate, but it might lead to high misclassification rates. What is the intended use of the result of the calculation? Comments?
Yes, it is an easy calculation and a valid comment. (nfi)

15:49: JPRS activities on monitoring and measurement of JP DNS and the registry system
Shinta Sato, JPRS

Q&A
Do you develop these criteria for yourself and then discuss it in the community? Is there a reflection of others' needs in these criteria?
We haven't asked external communities -- these are very internal thoughts, which we have not opened up to the public.

What drove those numbers (you picked 15m, 1h) -- is there some goal you're aiming at?
These values were set merely from our own thoughts, not based on some particular objective standard. These values would need to be revisited from time to time to ensure validity.

You chose a 50% change in the size of the zone. Do you also check the number of changes to the zone?
No, we only check the file size of the zone. We don't account for changes in the resource records.

In Japan, when you're referring to [medical] health, there's "public health" and "private health". Are your zone files handled exclusively within your domain? Or do you allow transfers to areas outside your own domain (where you are not fully in control of the health of the environment)?
We don't transfer our entire dataset outside.

We've heard about how to become healthy or how to stay healthy, but it didn't really address what to do when we've become sick.
When the zone file changes are too large, we keep using the existing zone file and alert the operators to see what is wrong. After detecting an unhealthy state, we have other operational procedures that are beyond the scope of this talk, so I did not cover these topics here.

16:19: L-Root Update
Joe Abley / ICANN

This is an operational update on L-Root, not a talk about the signing of L-Root.

Long-Term Query Capture (LTQC) is a tool used by several other root servers. The data from it are stored at OARC. It has the distinct advantage of being targeted and the resultant datasets are small.

Beyond what's shown on the slide (#1), we also have other ongoing tasks, such as graphically displaying trends and data.

On 2010-01-27, we made a transition from the unsigned root to the DURZ.

http://root-dnssec.org/

Questions slide: (1) What else should we measure? (2) What analysis could be done on what we are measuring to identify problems?

Comment: How many half-open TCP connections should be allowed before shutting them down? How much of that (measurement) do you keep?
We are not keeping that data, and it's a good point.

Comment: You don't know how many queries went dark (since DNSSEC went live).
Other person's comment: Trying to distinguish between requests and other data.
Other person's comment: It's also impossible to know what you don't know.

If people have thoughts about what triggers should cause alarm, we would be very interested in capturing that data.

16:41: January 12 Baidu's Attack - What Happened and What Shall We Do?
Wang Zheng / CNNIC

Many efforts are underway to enhance the security of the DNS, DNSSEC being one instance. The January 12 attack against Baidu is just a reminder to us to keep an eye on the security of the registry system.

Prior to the January 12 attack, Baidu.com had only been attacked significantly on one occasion, in December 2006. Baidu.com's registrar is Register.com, based in New York.

Chain of events ("at first sight"):
At 0740 on January 12, Baidu went offline and traffic was redirected to a website in the Netherlands.

DNS records were modified, causing the redirection
It is believed that Register.com was breached, allowing access to, and modification of, Baidu.com's records.

At 0901, dig showed baidu.com's NS records pointing to yns{1,2}.yahoo.com
At 0936, baidu.com -> ns230{3,4}.hostgator.com

Similarly, the registry information was clearly changed at various points in time during the day.

Registrar: Rollback done by Register.com at the request of Baidu
Direct correction of the record was declined by the registrar, due to a claimed lack of authority.

[Outline of Registry->Registrar->Registrant chain]

Points to consider: Do we need special security protections? Do we need enhanced communications between registrant and registrar?

Q&A
What is the status of the lawsuit against Register.com by Baidu.com?
We have no information about this. However, it is likely that the aim of this action is to try to get a clear explanation from Register.com as to why this was so problematic. In a larger sense, this kind of issue needs to be more clearly resolved in order to enhance the stability of the entire Internet.

What TTLs were set? How long did it take for Register.com to figure out that Baidu was making a legitimate request?
Baidu asked Register.com to reset the records. However, Register.com refused to directly correct the DNS records. The only possible mechanism was a rollback. We do not have all of the operational details of what happened, but the resolution was not handled immediately, and there was a long (measured in hours) time before the records were fully corrected.

16:58: DNSCheck and DNS2db
Patrik Wallstrom / .SE

Q&A
No questions.

Tuesday Morning: Symposium Keynote Address
Andrew Sullivan / Shinkuro
Submitted by admin on Fri, 2010-02-26 16:45

L-Root now serving "DURZ" signed responses

In case you haven't heard, L.ROOT-SERVERS.NET began serving a DNSSEC-signed root zone today. DNS-OARC has been collecting data during the signed root rollout. The graph below shows how L-root's priming response size has increased in the hour since it first began serving signed responses:

We're also watching the data below to see if there are noticeable increases in priming query rates after signing:

So far so good.
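
If you'd like to observe this yourself, dig reports the size of the reply it receives; an illustrative priming query to L-root:

$ dig @l.root-servers.net . NS +norec +dnssec | grep 'MSG SIZE'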

Update

And here's something I should have thought to include initially. The following graph shows the increase in TCP queries for L-root. This is over all queries, not just priming queries:

Submitted by wessels@ on Wed, 2010-01-27 19:15

DITL 2009 Data: Query rates to TLDs with wildcards

Last week someone asked me if the DITL 2009 data could shed any light on the number of queries sent to TLDs with wildcards. While we do have data from a few TLD operators, it wouldn't really help to answer this question. However, I think we can get a "first-order approximation" by looking at the queries to root nameservers. Note that by looking at queries to the roots, we have no knowledge of client queries that are cache hits, nor of those that are sent to the TLD nameservers due to cached referrals. We could perhaps turn to a passive DNS collection, such as SIE, for a second opinion.
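
For the curious, here is one crude way to tally queries per TLD from a single pcap file (not the DITL analysis pipeline itself; assumes tshark is available and the file name is a placeholder):

$ tshark -r root-sample.pcap -T fields -e dns.qry.name 2>/dev/null \
    | awk -F. 'NF { print tolower($NF) }' | sort | uniq -c | sort -rn | head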

The long, tall graph below shows the query rate to each TLD seen in the DITL 2009 data. Those TLDs known to have wildcards are shown in blue. Note, however, that some TLDs (such as CN and KR) implement wildcards only for IDN names.

Submitted by wessels@ on Mon, 2009-11-30 18:34

Another Look at Reply Size Test Data

A couple months ago I posted some data from the OARC reply size test service. Recently some folks have been wondering if the situation is getting better or staying the same. Today I created a graph that shows the monthly trend:

The data probably does not contain enough samples to ascertain any trends. The number of samples in each month is shown at the top of the bars. Also keep in mind that this data has a self-selection bias: we're only measuring the resolvers of people who choose to use this service.
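
If you'd like to add a data point, the test is run by sending a TXT query through the resolver you want to measure:

$ dig +short rs.dns-oarc.net txt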

Submitted by wessels@ on Tue, 2009-11-10 19:18

Signed Root Zone Rollout and Schedule Announced

Here at the RIPE 59 meeting in Lisbon, Joe Abley from ICANN and Matt Larson from VeriSign announced a plan and schedule for signing the Root Zone. A number of interesting tidbits:

  1. The root zone will technically be signed by December 1, 2009 although ICANN and VeriSign will keep it to themselves for internal testing.
  2. Between January and July 2010, the root servers will begin serving the signed zone one "letter" (server) at a time.
  3. Also during this rollout period, actual DNSSEC keys will be replaced with "dummy" keys so that validation CANNOT occur. In other words, the public components of the signing keys will not be published, which makes it impossible to configure a trust anchor for the root zone.
  4. During the rollout period, the traffic on both signed and unsigned roots will be monitored for impacts and effects.
  5. By July 1, 2010 the KSK will be rolled and published to achieve a fully signed root zone.

The RIPE presentation contains additional details such as key sizes, algorithms and rolling intervals.

Submitted by wessels@ on Wed, 2009-10-07 11:17

Report on the Impacts of Changes to the DNS Root Zone

Earlier this year, ICANN contracted with DNS-OARC to study the impacts of potential changes facing the DNS root zone. These changes include: (1) a significant increase in the number of gTLDs, (2) signing the zone with DNSSEC, and (3) continued increase in IPv6 glue. In our study we explore how these changes affect:

  • The size of the zone (e.g., on disk and in memory)
  • How much time is required to load, or reload, the zone
  • Latency and performance of serving the zone
  • Time and bandwidth necessary for zone transfers
  • Truncated responses and retries over TCP

Executive Summary

This excerpt of the executive summary is taken from the full report:

Our analysis of zone size focuses on memory usage. As expected, we find that memory requirements increase linearly with zone size. We also find that, for a given number of TLDs, signing the zone increases the memory requirement by a factor of 1.5–2. Additionally, we find that 32 GB of memory is insufficient for serving a very large root zone (e.g., a signed zone with 10 million TLDs), particularly when using NSD.

The response latency measurements find negligible increases (typically less than one millisecond) with NSD. For BIND (9.6.0-P1), however, we find some response time degradation with a large signed root zone (e.g., greater than 100,000 TLDs). With a 100,000 TLD signed zone, BIND drops nearly 30% of all queries sent at a rate of 5000 queries per second. With a one million TLD signed zone, BIND drops over 80%. NSD also begins to show some signs of stress with a very large (4.5 million TLD) zone where over 40% of queries are dropped.
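
The full report documents the test harness; purely as an illustrative sketch, a fixed-rate load of this kind can be generated with a tool such as dnsperf (the server address, query file and duration here are placeholders, not the report's actual setup):

$ dnsperf -s 127.0.0.1 -d queryfile.txt -Q 5000 -l 60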

The reload and restart time measurements are relatively straightforward and contain no real surprises. Loading and reloading times are generally proportional to zone size. Loading a one million TLD signed zone takes 190 seconds with BIND and 227 seconds with NSD.

To measure inter-nameserver bandwidth we performed a number of zone transfers between master and slave nameservers. We tested both standard (AXFR) and incremental (IXFR) zone transfer mechanisms. One interesting result of the AXFR test is that an NSD master utilizes 20–30% less bandwidth than a BIND master to send a given zone. To assess the duration of a zone transfer under wide-area network conditions, we introduced simulated packet loss and delays. A zone transfer experiencing 1% packet loss takes more than 2.5 times longer than with no packet loss for any given tested latency.
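
On Linux, WAN conditions like these can be emulated with netem; a sketch, with the interface name and figures purely illustrative of what the report describes:

# tc qdisc add dev eth0 root netem delay 100ms loss 1%
(run the zone transfer, then remove the emulation)
# tc qdisc del dev eth0 root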

To explore increased TCP at root servers, we replayed real query streams to servers with signed zones. We found that between 0.3% and 0.8% of responses to UDP queries would be truncated, likely causing most of these clients to fall back to TCP. This means that root servers can expect to see at least an order of magnitude increase (e.g., from 5 to 50 per second) in queries over TCP when the root zone is signed. Additionally, we found that a large (e.g., one million TLD) signed root zone will likely result in a slightly higher proportion of TCP queries than a signed version of the current one. Finally, we examined data for the .org TLD from before and after DNSSEC was deployed and found evidence suggesting that the actual increase in TCP-based queries could be significantly higher than can be forecast by evaluating current DNS traffic patterns.

Submitted by admin on Thu, 2009-09-17 18:51

Statistics on UDP checksums in 2009 DNS DITL data

I was recently asked if OARC had any data on the percentage of DNS queries with bad or disabled UDP checksums. After a few days of crunching through the DITL 2009 data, I have the following results:

Data Provider    Matched (%)  Mismatch (%)  Disabled (%)
afilias 99.02 0.85 0.12
apnic 99.89 0.01 0.10
arin 99.92 0.02 0.07
arl 99.28 0.62 0.10
as112-gf 99.91 0.00 0.09
cogent 52.30 47.66 0.04
cznic 73.50 26.40 0.10
icann 99.52 0.36 0.12
iis 97.58 2.36 0.06
isc 96.73 3.14 0.13
lacnic 99.79 0.01 0.19
namex 100.00 0.00 0.00
nasa 99.34 0.57 0.09
nethelp 99.90 0.06 0.05
niccl 53.94 46.01 0.04
nixcz 99.82 0.18 0.00
nominet 99.80 0.04 0.15
pktpush 69.29 26.04 4.67
regbr 98.30 1.48 0.22
ripe 98.81 1.07 0.12
switch 99.91 0.01 0.08
tix-or-tz 100.00 0.00 0.00
uninett 99.91 0.01 0.08
verisign 99.03 0.92 0.05
wide 99.58 0.36 0.06

Obviously, it's interesting that most of the traces show around 99% matching checksums, but a few have close to 25% or 50% mismatches. I suspect some kind of "middle boxes" (load balancers?) at play here, but have not yet investigated further.
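
If you want to check your own traces, tcpdump flags checksum problems in verbose mode; a rough tally (not the script used to produce the table above; the trace file name is a placeholder):

$ tcpdump -nn -v -r trace.pcap udp port 53 2>/dev/null | grep -c 'bad udp cksum'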

Update

Mauricio from NIC Chile reports that most of their bad UDP checksums are from replies they send out. They have some Dell hardware running Linux. The Linux installation doesn't support certain NIC hardware features, such as checksum calculations. Hardware checksumming can be disabled with this command:

# ethtool --offload ethXX tx off
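
You can confirm the new offload settings afterwards (assuming the same ethtool version):

# ethtool --show-offload ethXX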
Submitted by wessels@ on Wed, 2009-08-19 05:19

Data from the Reply Size Test server

Over on the IETF namedroppers mailing list there is a discussion about DNSSEC and UDP fragmentation. See this thread, for example. Since the OARC Reply Size Test has been going for a couple of months now, I thought maybe it would have enough data for a decent analysis. Here's what I found:

The sample isn't quite as large as I'd like. Only about 1650 unique client addresses tested so far.

The bar plots show histograms of the number of clients (on y-axis) receiving certain maximum reply sizes (x-axis). Each plot is for a different advertised EDNS receive buffer size. Note that the 512 data also includes clients that didn't send any EDNS information.

Normally the Reply Size Test utility won't return responses larger than the advertised buffer size. However, in the 512 data you can see a few counts around 4000. This can happen for one of two reasons. First, a clever user can "trick" the utility by sending queries with certain values in the query name. Second, there is a special mode for Nominum CNS, which won't send EDNS information unless it first receives a truncated response.

The most interesting data is in the 4096-bytes plot. Most of the clients can receive 4K responses. However, about 20% are limited to 2K or less. The bar just left of 1500 on the x-axis represents clients that cannot receive fragmented DNS responses.

Submitted by wessels@ on Tue, 2009-08-18 23:34