OARC's TLDmon uses Nagios to monitor operational characteristics of authoritative nameservers for the Root Zone and all Top Level Domains. TLDmon checks for authoritative answers, EDNS support, lame delegations, consistent NS RR sets, open resolvers, expired RRSIGs, matching serial numbers, and TCP support. As the Domain Name System continues its evolution, it becomes increasingly important that these critical nameservers are configured correctly.
TLDmon is available to the public. OARC members can receive notifications (via email) about zone problems directly from Nagios. Members can also request monitoring of additional (non-TLD) zones. Please contact the OARC Admin for more information.
Nagios is really designed to monitor hosts and the services that run on those hosts. We've configured Nagios such that each DNS zone is a "host" and each characteristic to be monitored is a "service." TLDmon checks the following operational characteristics of each zone:
The AA service checks that all nameservers set the Authoritative Answer (AA) bit in responses to SOA queries for the zone. When all nameservers set the AA bit, the AA service is marked OK and shown in green. If one or more do not set the AA bit, the service is marked WARNING and shown in yellow. A nameserver that does not set the AA bit may be configured as a caching resolver, rather than an authoritative server. Caching resolvers are susceptible to DNS cache poisoning.
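As a rough illustration of what the AA check looks at, the sketch below reads the AA flag from the 12-octet header of a raw DNS response. The function names and the sample headers are hypothetical and are not taken from the TLDmon plugins.

```python
import struct

AA_MASK = 0x0400  # Authoritative Answer bit in the 16-bit DNS flags field

def has_aa_bit(message: bytes) -> bool:
    """Return True if the AA bit is set in a raw DNS message header."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 octets; message is too short")
    # Header layout: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & AA_MASK)

def aa_status(responses: dict) -> str:
    """Map {nameserver: raw SOA response} to the OK/WARNING state."""
    return "OK" if all(has_aa_bit(m) for m in responses.values()) else "WARNING"

# Hypothetical headers: a response with AA set (0x8400) and one without (0x8000).
authoritative = struct.pack("!HHHHHH", 0x1234, 0x8400, 1, 1, 0, 0)
non_authoritative = struct.pack("!HHHHHH", 0x1234, 0x8000, 1, 1, 0, 0)
```

A single response without the AA bit is enough to move the whole zone to WARNING, which matches the "one or more" wording above.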
The CLKSKEW service checks that the clocks of all nameservers are reasonably in sync with the current time. The check uses a TSIG query to learn the server's time. When all nameservers are within 15 seconds of the correct time, the service is marked OK and shown in green. If any nameserver's clock skew is greater than 15 seconds, the service is marked WARNING and shown in yellow. In this case the nameserver with the largest skew will be shown in the status information field.
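The threshold logic can be sketched as follows. This assumes the per-server skew (in seconds) has already been measured; the TSIG exchange itself is not shown, and the function name and input format are hypothetical.

```python
def clkskew_status(skews, threshold=15.0):
    """Classify clock skew for a zone.

    skews: dict mapping nameserver name -> signed skew in seconds.
    Returns (state, worst_server); worst_server is None when the state is OK.
    """
    if not skews:
        raise ValueError("no nameservers to check")
    # The nameserver with the largest absolute skew is the one reported.
    worst = max(skews, key=lambda ns: abs(skews[ns]))
    if abs(skews[worst]) > threshold:
        return "WARNING", worst
    return "OK", None
```

For example, `clkskew_status({"a.example": 2.0, "b.example": -31.5})` reports a WARNING and names `b.example`.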
The EDNS service checks that all nameservers support EDNS0 extensions. When all nameservers support EDNS0, the EDNS service is marked OK and shown in green. If one or more do not, the service is marked WARNING and shown in yellow. The EDNS0 protocol extension (written in 1999) is necessary for the transmission of UDP DNS messages larger than 512 octets. It is also used to signal that a client wants DNSSEC records included in responses.
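To make the EDNS0 mechanics concrete, here is a minimal sketch that builds an SOA query carrying an OPT pseudo-record in the additional section. The function names and default buffer size are our own choices for illustration; this is not the TLDmon plugin code.

```python
import struct

TYPE_SOA, TYPE_OPT, CLASS_IN = 6, 41, 1

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero octet."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"

def build_edns_soa_query(zone: str, msg_id: int, bufsize: int = 4096,
                         do_bit: bool = False) -> bytes:
    """Build a non-recursive SOA query that advertises an EDNS0 buffer size."""
    # Header: ID, flags=0 (standard query), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1.
    header = struct.pack("!HHHHHH", msg_id, 0, 1, 0, 0, 1)
    question = encode_name(zone) + struct.pack("!HH", TYPE_SOA, CLASS_IN)
    # OPT pseudo-record: root name, TYPE=41, CLASS carries the UDP buffer size,
    # TTL carries extended RCODE, version and flags (DO is the top flag bit),
    # and RDLENGTH is 0 because no options are attached.
    ttl = 0x00008000 if do_bit else 0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, bufsize, ttl, 0)
    return header + question + opt
```

A server that supports EDNS0 is expected to include an OPT record of its own in the reply; parsing the reply is left out of this sketch.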
The IPV6 service checks that the zone's IPv6-enabled nameservers are working. Only those zones with at least one IPv6-enabled nameserver are checked. When all IPv6-enabled nameservers are working, the service is marked OK and shown in green. If one or more are not working, the service is marked WARNING and shown in yellow. IPv6 is increasingly important as the IPv4 free pool shrinks in size.
The LAME service checks that no nameservers are lame for the zone. A nameserver is considered lame when the response to an SOA query for the zone contains no records in the answer section. When no nameservers are lame, the LAME service is marked OK and shown in green. If one or more are lame, the service is marked WARNING and shown in yellow. A lame nameserver results in wasted queries and additional latency for end users.
The NSSET service checks that all nameservers report the same set of NS records for the zone and that they match the delegations from the parent zone. Note that lame nameservers are excluded from this check. If all (non-lame) nameservers report the same NS set, the service is marked OK and shown in green. If there is at least one inconsistency, the service is marked WARNING and shown in yellow.
A nameserver that is known by different names appears as an inconsistency when the delegation name does not match the name listed in the zone. Some people may not consider this a real problem. The NSSET service only checks the nameserver names, not their IP addresses.
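The comparison amounts to a set check on nameserver names. The sketch below assumes the NS names from the parent delegation and from each nameserver have already been collected; the function names, input format, and case/trailing-dot normalization are our own assumptions.

```python
def normalize(name: str) -> str:
    """DNS names compare case-insensitively; also drop any trailing dot."""
    return name.rstrip(".").lower()

def nsset_status(parent_ns, server_ns, lame=()):
    """Compare each server's NS set against the parent delegation.

    parent_ns: iterable of NS names from the parent zone.
    server_ns: dict mapping nameserver -> iterable of NS names it reports.
    lame:      nameservers to exclude from the check.
    Returns (state, mismatches), where mismatches maps a nameserver to the
    sorted names present on only one side of the comparison.
    """
    expected = frozenset(normalize(n) for n in parent_ns)
    lame_set = {normalize(n) for n in lame}
    mismatches = {}
    for server, names in server_ns.items():
        if normalize(server) in lame_set:
            continue  # lame nameservers are excluded from this check
        reported = frozenset(normalize(n) for n in names)
        if reported != expected:
            mismatches[server] = sorted(reported ^ expected)
    return ("OK" if not mismatches else "WARNING"), mismatches
```

Because only names are compared, a server listed as `ns2.other.example` in the zone but delegated under a different name shows up as a mismatch even if both names resolve to the same address.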
The OPENRES service checks that none of the nameservers are open resolvers (i.e., providing recursive resolution to any client). If no nameservers are open resolvers, the OPENRES service is marked OK and shown in green. If one or more are open, the service is marked WARNING and shown in yellow. It is usually a bad idea to mix recursive and authoritative DNS services in a single process, and especially so for top-level zones in the DNS infrastructure. Open resolvers have an increased vulnerability to cache poisoning and denial of service.
The RCODE service checks that all nameservers return response code zero ("NOERROR") in response to an SOA query for the zone. If all nameservers return NOERROR, the RCODE service is marked OK and shown in green. If one or more return an error code (such as SERVFAIL, REFUSED) or no response at all, the service is marked WARNING and shown in yellow. Nameservers that return errors or cause timeouts lead to wasted queries and increased latency for end users.
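A minimal sketch of this check, assuming each nameserver's raw reply has already been collected and that a timeout is represented as `None`. The names and the input format are hypothetical.

```python
import struct

RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def rcode_of(message: bytes) -> int:
    """Extract the 4-bit RCODE from the flags word of a DNS header."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 octets; message is too short")
    (flags,) = struct.unpack("!H", message[2:4])
    return flags & 0x000F

def rcode_status(responses: dict):
    """responses maps nameserver -> raw reply bytes, or None for no response."""
    problems = {}
    for ns, message in responses.items():
        if message is None:
            problems[ns] = "no response"
            continue
        rcode = rcode_of(message)
        if rcode != 0:
            problems[ns] = RCODE_NAMES.get(rcode, f"RCODE{rcode}")
    return ("OK" if not problems else "WARNING"), problems
```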
The RRSIG service checks the expiration time of DNSSEC RRSIG records for the zone itself. Zones that do not implement DNSSEC are excluded from this check. If the expiration time for RRSIG records is greater than 3 days, the service is marked OK and shown in green. If one or more RRSIG records is already expired, the service is marked CRITICAL and shown in red. If records expire in less than 3 days, the service is marked WARNING and shown in yellow.
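The three states can be sketched as below. The function names are hypothetical, and the handling of a signature that expires in exactly 3 days (treated here as OK) is our assumption, since the description above does not cover that boundary.

```python
from datetime import datetime, timedelta, timezone

def parse_expiration(text: str) -> datetime:
    """Parse an RRSIG expiration in its YYYYMMDDHHmmSS presentation form (UTC)."""
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def rrsig_status(expirations, now, warn_window=timedelta(days=3)):
    """Classify a zone from the expiration times of its RRSIG records."""
    expirations = list(expirations)
    if any(exp <= now for exp in expirations):
        return "CRITICAL"  # at least one signature has already expired
    if any(exp - now < warn_window for exp in expirations):
        return "WARNING"   # at least one signature expires within the window
    return "OK"
```

With `now` set to 2008-11-12 00:00 UTC, an expiration of `20081120000000` is OK, `20081113000000` is WARNING, and `20081111000000` is CRITICAL.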
The SERIAL service checks that all nameservers report the same serial number for the zone. When all nameservers report the same serial number, the SERIAL service is marked OK and shown in green. If one or more nameservers has a different serial number, the service is marked WARNING and shown in yellow.
Serial number checking is prone to false alarms due to latencies involved in master/slave synchronization, and in the time that it takes to query multiple nameservers. To reduce false alarms, we tolerate two exceptions to the requirement that serial numbers must match:
The TCP service checks that all nameservers respond to queries over TCP, rather than UDP. When all nameservers support TCP, the TCP service is marked OK and shown in green. If one or more do not support TCP, the service is marked WARNING and shown in yellow. TCP support is becoming increasingly important with the deployment of DNSSEC and IPv6.
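One practical difference from UDP is that each DNS message sent over TCP is preceded by a two-octet length field. The sketch below shows only that framing; the function names are hypothetical and no network connection is made.

```python
import struct

def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 16-bit length, as required over TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for a 16-bit length prefix")
    return struct.pack("!H", len(message)) + message

def unframe_from_tcp(stream: bytes):
    """Split one message off the front of a TCP byte stream.

    Returns (message, remaining_bytes), or (None, stream) if the stream
    does not yet hold a complete message.
    """
    if len(stream) < 2:
        return None, stream
    (length,) = struct.unpack("!H", stream[:2])
    if len(stream) < 2 + length:
        return None, stream
    return stream[2:2 + length], stream[2 + length:]
```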
Long Term Trends
The Trends Graph page shows the number of zones with services in OK, WARNING, and CRITICAL states since the project began. These graphs will help us understand whether things are getting better, worse, or not changing over time.
Please visit the TLDmon code page if you'd like to see the Nagios plugins that we use.
Submitted by admin on Wed, 2008-11-12 18:23.
Eric Osterweil gave an interesting talk at the NANOG44 DNSSEC BOF. His data (shown below) from SecSpider indicates a significant increase in signed zones right around the time that CERT VU #800113 was released:
The blue line shows a large increase in the number of signed zones automatically discovered by the SecSpider crawlers in August 2008. A message to the dnssec-deployment list confirms this as well.
Submitted by firstname.lastname@example.org on Mon, 2008-10-13 22:02.
Last week David Conrad of ICANN asked if I could make DSC show the EDNS buffer sizes advertised by clients. This is now available in DSC versions dated after 2008-08-22. Furthermore, the new version is running on the F-root collector nodes.
The breakdown of buffer sizes looked different than I remembered for recent DITL data, so I generated some graphs for the 2006 through 2008 DITL data (F-root nodes) using the same size ranges.
The trend is good news. Back in Jan 2007, 50% of queries did not indicate any EDNS support. 20% had bufsiz=2048, and 30% had bufsiz=4096. Now we have about 65% of queries with bufsiz=4096, while 35% still don't support EDNS.
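For readers who want to produce a similar breakdown from their own data, here is a small sketch that turns a list of advertised buffer sizes into percentage shares. It groups by exact value rather than by the size ranges used in the graphs, and the sample list is made up to mirror the Jan 2007 proportions quoted above; it is not DSC code.

```python
from collections import Counter

def bufsize_shares(sizes):
    """Return {label: percentage} for advertised EDNS buffer sizes.

    sizes: iterable of ints, with None meaning the query carried no EDNS.
    """
    counts = Counter("no EDNS" if s is None else f"bufsiz={s}" for s in sizes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical sample of 20 queries: 10 without EDNS, 4 at 2048, 6 at 4096.
sample = [None] * 10 + [2048] * 4 + [4096] * 6
```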
Submitted by wessels on Mon, 2008-08-25 20:17.
Date: Mon, 4 Aug 2008 18:22:46 -0400
ISC SIE has developed a tool for detecting cache poisoning attempts. it consists of two parts: ncaptool, the part which performs packet gathering, reassembly, and dns filtering; and mod_urstate, a message processing module which attempts to statefully detect unsolicited responses that may be indicative of cache poisoning attempts. specifically, the tool is designed to listen at the network layer of a recursive dns server, auditing the query-response stream between recursive and authoritative dns servers. when a potential cache poisoning attempt is detected, both the offending and original dns responses will be emitted into the output stream.
ncaptool and mod_urstate may be obtained via ftp:
the defaults are tuned for a dedicated IDS-style setup; e.g. fairly fast machines with >= 1 GB of memory and aggregated taps of dns traffic between recursive and authoritative nameservers. it ought to be possible to run it directly on a machine running a recursive nameserver, however.
we would like people to use it, and if possible provide feedback or contribute the data it generates to SIE.
mod_urstate is an ncaptool dns message parsing plugin that attempts to detect unsolicited dns responses that may be indicative of cache poisoning attempts. it does this by statefully tracking the application layer state of the dns transactions between recursive and authoritative dns servers. it gracefully handles query retransmissions due to client timeouts and byte identical responses from dns authorities.
two data collections are employed by mod_urstate:
when a query for a previously unknown tuple arrives, the query payload is cached so that subsequent byte identical queries may be discarded. when the first response for this query arrives, its payload is also cached. the vast majority of dns queries will only elicit a single response, so most cache entries will be quietly expired in fifo order.
however, in the usual case that a domain name's authoritative nameservers are reachable and functioning, a potentially successful cache poisoning attempt has these properties:
(it should be noted that certain types of benign dns responses unfortunately also match these criteria. it will be possible to filter these out in the post-processing / analysis phase.)
the mod_urstate module will output dns responses matching the above criteria. this output can then be analyzed to help answer the following questions:
it should be noted that an out of band method of distinguishing malicious from legitimate responses will be needed by the analyst.
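The caching behaviour described above can be sketched roughly as follows. This is not mod_urstate's code: the class and method names are hypothetical, the lookup key is left opaque because the tuple is not spelled out here, and flagging "a later response that differs from the first cached one" is our simplifying assumption rather than the module's actual criteria.

```python
from collections import OrderedDict

class TransactionCache:
    """FIFO-expiring cache of query payloads and their first responses."""

    def __init__(self, max_entries=100000):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # key -> [query_payload, first_response]

    def on_query(self, key, payload):
        """Return True for a new query, False for a byte-identical repeat."""
        entry = self.entries.get(key)
        if entry is not None and entry[0] == payload:
            return False  # retransmission; discard
        if entry is not None:
            del self.entries[key]  # different payload: start a fresh entry
        elif len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # expire the oldest entry (FIFO)
        self.entries[key] = [payload, None]
        return True

    def on_response(self, key, payload):
        """Return (original, offending) for a differing extra response, else None."""
        entry = self.entries.get(key)
        if entry is None:
            return None  # no cached query; not handled in this sketch
        if entry[1] is None:
            entry[1] = payload  # first response for this query
            return None
        if entry[1] == payload:
            return None  # byte-identical duplicate from the authority
        return (entry[1], payload)
```

Returning both payloads mirrors the statement that the offending and original responses are emitted together; telling malicious from benign pairs is left to later analysis.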
Submitted by wessels on Fri, 2008-08-08 22:30.
Timeline of Events
Tools for Testing and Monitoring
Data, Documentation, Papers, and Presentations
Submitted by wessels on Thu, 2008-07-31 22:36.
A number of people have been asking for a way to check transaction ID randomness, in addition to source port randomness. OARC's porttest tool has now been expanded to also report on transaction IDs. To use it, issue a TXT query for the name txidtest.dns-oarc.net. For example, with dig:
Also note that in conjunction with this enhancement, the scoring criteria for porttest and txidtest have been changed to match the web-based port test. The scoring is as follows:
Submitted by wessels on Thu, 2008-07-10 06:10.
CERT and numerous vendors are making a major announcement today regarding a DNS protocol vulnerability that may enable cache poisoning of recursive resolvers. From the CERT page:
We can expect patches from most vendors that will implement randomization of query source ports. According to ISC, source port randomization only increases the difficulty of the attack, but does not entirely prevent it. The best prevention, they say, is to implement DNSSEC.
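A back-of-the-envelope sketch shows why randomizing the source port raises the bar without removing the risk. The transaction ID is a 16-bit field; the port pool size used in the test values (64512, i.e. ports 1024 through 65535) and the model of independent, uniformly random guesses are our assumptions for illustration only.

```python
def search_space(txid_bits: int = 16, port_pool: int = 1) -> int:
    """Number of (transaction ID, source port) combinations an attacker must guess."""
    return (2 ** txid_bits) * port_pool

def hit_probability(space: int, forged: int) -> float:
    """Chance that at least one of `forged` independent uniform guesses matches."""
    return 1.0 - (1.0 - 1.0 / space) ** forged

# Fixed source port: only the transaction ID has to be guessed.
fixed_port = search_space()
# Hypothetical pool of 64512 randomized source ports.
random_port = search_space(port_pool=64512)
```

Under this simplified model the probability of a successful guess drops sharply with a larger pool, but it never reaches zero, which is consistent with ISC's point that randomization only makes the attack harder.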
Here are some vendor announcements:
The vulnerability was discovered by Dan Kaminsky of IOActive.
Submitted by wessels on Tue, 2008-07-08 18:47.
Within a day of ICANN's gTLD announcement, ZDNet reports that a Turkish hacking group has hijacked domain names belonging to IANA and ICANN. Interestingly, only their "alternative" names were hijacked. For example, ICANN.COM and ICANN.NET were, but ICANN.ORG was not. Similarly, IANA.COM was, but IANA.ORG was not. The same group is apparently responsible for other recent high profile domain hijinks as well.
One thing all of the hijacked names have in common is their registrar, Register.com, which was apparently able to fix the problem within about 20 minutes. Let's hope the parties involved are up-front enough to explain what happened.
Submitted by wessels on Sun, 2008-06-29 20:38.