# hello-dns
Hello and welcome to DNS!

This document attempts to provide a correct introduction to the Domain Name
System as of 2018. The original RFCs remain the authoritative source of
normative text, but this document tries to be in full alignment with all
relevant and useful RFCs.

Although we start from relatively basic principles, the reader is expected
to know what IP addresses are, what a (stub) resolver is and what an
authoritative server is supposed to do. When in doubt: authoritative servers
'host' DNS, 'resolvers' look up things over at authoritative servers and
clients run 'stub resolvers' to look things up over at resolvers.

DNS was originally written down in August 1979 in 'IEN 116', part of a
parallel series of documents describing the internet. IEN 116 era DNS is not
compatible with today's DNS. In 1983, RFC 882 was released, and stunningly
enough, an implementation of this 35 year old document would still function
on the internet and be interoperable.

DNS attained its modern form in 1987 when RFC 1034 and 1035 were published.
Although most of 1034/1035 remains valid, these standards are not that easy
to read because they were written in a very different time.

The main goal of this document is not to contradict 1034 and 1035 but to
provide an easier entrypoint into DNS.

## Layout
We start off with a general introduction of DNS basics: what is a resource
record, what is an RRSET, what is a zone, what is a zone-cut, how are packets
laid out. This part is required reading for anyone ever wanting to query a
nameserver or emit a valid response.

We then specialize into what applications can expect when they send
questions to a resolver, or what a stub-resolver can expect.

The next part is about what an authoritative server is supposed to do. On
top of this, we describe in slightly less detail how a resolver could
operate. Finally, there is a section on DNSSEC.

Note that this file, which describes DNS basics, absolutely must be read from
beginning to end in order for the rest of the documents (or DNS) to make
sense.

## DNS Basics
In this section we will initially ignore optional extensions that were added
to DNS later, specifically EDNS and DNSSEC, which requires EDNS to function.

DNS is mostly used to serve IP addresses and mailserver details, but it can
contain arbitrary data. DNS is all about names. Every name can have data
of several *types*. The most well known externally useful types are *A* for
IPv4 addresses, *AAAA* for IPv6 addresses and *MX* for mailserver details.
DNS also has types that have meaning for its own use, like *NS*, *CNAME* and
*SOA*.

When we ask a DNS question we call this a *query*. We call the reply the
*response*. These queries and responses are contained in DNS messages. When
UDP is used, the message is also the packet.

A DNS message has:

 * A header
 * A query name and query type
 * An answer section
 * An authority section
 * An additional section

The header has the following fields that are useful for queries and
responses:

 * ID: a 16 bit identifier used as part of the process of matching queries to responses
 * QR: Set to 0 to identify a message as a query, 1 for a response
 * OPCODE: 0 for a standard query, other opcodes also exist
 * RD: Set to indicate that this question wants *recursion*

Relevant for responses:

 * AA: This answer has Authoritative Answers
 * RA: Recursive service was available
 * TC: Not all the required parts of the answer fit in the message

In basic DNS, query messages should have no answer, authority or additional
sections. DNS queries are mostly sent over UDP, and UDP packets can easily
be spoofed. To recognize the authentic response to a query it is important
that the ID field is random or at least unpredictable. This is however not
enough protection, so the source port of a UDP DNS query must also be
unpredictable.

The header of a question for the IPv6 address of www.ietf.org looks like
this:

```
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|              ID = random 16 bits              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
| 0|     0     | 0| 0| 0| 0|   0    |     0     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  QDCOUNT = 1                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  ANCOUNT = 0                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  NSCOUNT = 0                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  ARCOUNT = 0                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```

Note that we did not spend time on field Z; this is because it is defined to
be 0 at all times. This packet does not request recursion, since RD is 0.
QDCOUNT = 1 means there is 1 question. In theory DNS supported several
questions in one message, but this has not been implemented. ANCOUNT,
NSCOUNT and ARCOUNT are all zero, indicating there are no answers in this
question packet.
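
The header layout above maps directly onto six unsigned 16 bit integers in
network byte order. What follows is a minimal Python sketch of packing such a
query header. The name `build_header` and its parameters are made up for this
illustration and do not come from any DNS library.

```python
import secrets
import struct

def build_header(msg_id: int, rd: bool = False, qdcount: int = 1) -> bytes:
    """Pack the 12 octet DNS header of a standard query."""
    flags = 0              # QR=0 (query), OPCODE=0, AA=0, TC=0, RA=0, Z=0, RCODE=0
    if rd:
        flags |= 0x0100    # RD sits at bit position 7 in the diagram
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT as big-endian 16 bit values
    return struct.pack("!6H", msg_id, flags, qdcount, 0, 0, 0)

# The ID should be unpredictable, so draw it from a cryptographic source.
header = build_header(secrets.randbits(16))
```

With the default arguments this produces 12 octets in which only the ID and
QDCOUNT fields are non-zero.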

Here is the actual question:

```
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           3                       w           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           w                       w           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           4                       i           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           e                       t           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           f                       3           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           o                       r           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           g                       0           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           0                       28          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           0                       1           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```

This consists of the name 'www.ietf.org' encoded in DNS wire format (for
which see below), followed by a 16 bit type field. For AAAA, which denotes
the IPv6 address, this is 28. This is then followed by the 'class' of the
question. It was originally intended that DNS records would exist in
different 'classes', but the semantics of this were not specified completely
and it was not really implemented. For now, always set class to 1.

Of specific note is the somewhat unusual way the name 'www.ietf.org' is
serialized in DNS. 'www.ietf.org' consists of 3 labels of lengths 3, 4 and
3. In DNS messages, this is encoded as the value 3, then www, then the
value 4, then ietf, then 3 followed by org. Then there is a trailing 0
which denotes this is the end.
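
The encoding just described can be sketched in a few lines of Python. The
helper names `encode_name` and `build_question` are hypothetical. For brevity
the sketch takes a dotted text name as input and therefore assumes that no
label itself contains a dot.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels plus a trailing 0."""
    if name in ("", "."):
        return b"\x00"                      # the root name is just the terminator
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:         # a label is at most 63 octets long
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))                # length octet
        out += raw                          # label contents
    out.append(0)                           # trailing 0 ends the name
    return bytes(out)

def build_question(name: str, qtype: int = 28, qclass: int = 1) -> bytes:
    """Name in wire format, followed by 16 bit type and 16 bit class."""
    return encode_name(name) + struct.pack("!HH", qtype, qclass)

question = build_question("www.ietf.org")   # AAAA question, class 1
```

The resulting 18 octets are the same ones shown in the question diagram: 14
for the name, 2 for the type and 2 for the class.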

This format is unusual, but has several highly attractive properties. For
example, it is binary safe and it needs no escaping. When writing DNS
software, it may be tempting to pass DNS names around as "ASCII". This then
leads to escaping and unescaping code in lots of places. It is highly
recommended to use the native DNS encoding to store DNS names. This will
save a lot of pain when processing DNS names with spaces or dots in them.

Finally, DNS queries are case-insensitive. This however is defined rather
mechanically: only the ASCII letters are affected. Operators do not need to
know that in some character encodings a Ü is equivalent to ü when compared
case-insensitively. For DNS purposes, the 0x20 bit is ignored when comparing
octets within a-z and A-Z.
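
As an illustration, this mechanical comparison could look as follows in
Python. The function names `fold` and `labels_equal` are invented for this
sketch, which works on the raw octets of a single label.

```python
def fold(octet: int) -> int:
    """Map a-z onto A-Z by clearing the 0x20 bit; leave all other octets alone."""
    if 0x61 <= octet <= 0x7A:       # 'a' .. 'z'
        return octet & ~0x20
    return octet

def labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels the DNS way: octet by octet, ignoring ASCII case only."""
    return len(a) == len(b) and all(fold(x) == fold(y) for x, y in zip(a, b))
```

Note that octets outside the two letter ranges are compared exactly, even when
they differ only in the 0x20 bit, as is the case for `@` (0x40) and a backtick
(0x60).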

Note that individual labels of a name may be at most 63 octets long.

Next up, a DNS response. Note that this again is a DNS message, and it looks
a lot like the original DNS query. Here is the beginning of a response:

```
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           ID = same random 16 bits            |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
| 1|     0     | 1| 0| 0| 0|   0    |     0     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  QDCOUNT = 1                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  ANCOUNT = 1                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  NSCOUNT = 0                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  ARCOUNT = 0                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           3                       w           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           w                       w           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           4                       i           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           e                       t           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           f                       3           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           o                       r           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           g                       0           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           0                       28 (0x1c)   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           0                       1           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```

Note that QR is now set to 1 to denote a response. The 'AA' bit was set
because this answer came from a server authoritative for this name.

In addition, ANCOUNT is now set to '1', indicating a single answer is to be
found in the message, immediately after the original question, which has been
repeated from the query message.

To recognize the right response, check that the ID field is the same as the
query, make sure the answer arrives on the port the query was sent from and
that the query name and type match up with the original query. In addition,
make sure not to send out more than one equivalent query while still waiting
for the response, as doing so opens a security hole.
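
The message-level part of these checks can be sketched as below. This is an
illustration under stated assumptions, not a complete validator: the names
`question_end` and `response_matches` are hypothetical, the question name is
assumed to be uncompressed, and the port check is left to the socket code
because the port is not part of the DNS message.

```python
def question_end(msg: bytes) -> int:
    """Return the offset just past the first question (name, type, class)."""
    pos = 12                                  # the question follows the header
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1 + 4                # terminator, then type and class
        if length > 63:
            raise ValueError("compressed or invalid label in question")
        pos += 1 + length

def response_matches(query: bytes, response: bytes) -> bool:
    """Check ID, QR bit, query name, type and class of a candidate response."""
    try:
        q_end = question_end(query)
        r_end = question_end(response)
    except (IndexError, ValueError):
        return False
    if q_end > len(query) or r_end > len(response):
        return False                          # truncated type/class fields
    same_id = query[:2] == response[:2]
    is_response = bool(response[2] & 0x80)    # QR is the top bit of octet 2
    # Names compare case-insensitively; length octets (<= 63) are never letters.
    same_name = query[12:q_end - 4].lower() == response[12:r_end - 4].lower()
    same_type_class = query[q_end - 4:q_end] == response[r_end - 4:r_end]
    return same_id and is_response and same_name and same_type_class
```

Anything that fails these checks should be dropped silently, while the query
keeps waiting for the real response.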

After the header and the original question we find the answer:

```
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           0xc0                    0x0c        |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           00                      28          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           00                      01          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  TTL = 3600                   |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                 RDLENGTH = 16                 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           24                      00          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           cb                      00          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           20                      48          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           00                      01          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           00                      00          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           00                      00          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           68                      14          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           00                      55          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```

The first two bytes (0xc0 0x0c) look rather mysterious. When DNS was
created, 512 octets was considered the maximum size of a UDP datagram and
thus the maximum size of a DNS message transported without using the (then
slow) TCP protocol.

In order to squeeze as much information as possible into the 512 bytes, DNS
names can (and often MUST) be compressed. The details of this compression
are arcane and easy to get wrong, leading to infinite loops or buffer
overflows. So tread very carefully. If you remember one thing, make sure
that a pointer always goes to a lower position in the packet. Also
beware of signed/unsigned arithmetic.

In this case, the DNS name of the answer is encoded as '0xc0 0x0c'. The c0
part has the two most significant bits set, indicating that the following
6+8 bits are a pointer to somewhere earlier in the message. In this case,
this points to position 12 (= 0x0c) within the packet, which is immediately
after the DNS header. There we find 'www.ietf.org'.
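
To make the pointer rule concrete, here is a hedged Python sketch of reading
a possibly compressed name. The function name `decode_name` is an assumption
of this example. It rejects any pointer that does not go strictly backwards,
which is one simple way to rule out loops. Other limits a real implementation
needs are left out.

```python
def decode_name(msg: bytes, pos: int) -> tuple[list[bytes], int]:
    """Return (labels, offset just past the name as stored at pos)."""
    labels = []
    end = None        # where the caller continues; fixed by the first pointer
    limit = pos       # every pointer must target an offset below this
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past the end of the message")
        length = msg[pos]
        if length & 0xC0 == 0xC0:                    # two top bits set: pointer
            if pos + 1 >= len(msg):
                raise ValueError("truncated compression pointer")
            target = ((length & 0x3F) << 8) | msg[pos + 1]   # 6 + 8 bits
            if target >= limit:
                raise ValueError("pointer does not go to a lower position")
            if end is None:
                end = pos + 2
            pos = limit = target                     # limit strictly decreases
        elif length & 0xC0:
            raise ValueError("unsupported label type")
        elif length == 0:                            # trailing 0: end of name
            return labels, (end if end is not None else pos + 1)
        else:
            if pos + 1 + length > len(msg):
                raise ValueError("label runs past the end of the message")
            labels.append(msg[pos + 1:pos + 1 + length])
            pos += 1 + length
```

Because `limit` shrinks on every jump, the loop must terminate. The labels are
returned as raw octets, in line with the advice to avoid passing names around
as text.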

So what this means is that the answer about the DNS name 'www.ietf.org' is
also called 'www.ietf.org'.

This is then followed in the packet by '28', which denotes AAAA (IPv6), and
the usual 'class' of 1. Then a whole 32 bits are devoted to the Time To Live
of this record, followed by a 16 bit length field. Since this is an IPv6
address, the actual answer payload length is 16 bytes (or 128 bits).

This is then followed by the binary representation of the current IPv6
address of www.ietf.org, 2400:cb00:2048:1::6814:55.
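
The fixed fields after the name can be unpacked in one step. The sketch below
uses the octets of the answer shown earlier, with type 28 written as hex 1c
and the TTL of 3600 as hex 00000e10. The helper name `parse_fixed_fields` is
hypothetical, and the name is simply assumed to occupy the first two octets.

```python
import ipaddress
import struct

answer = bytes.fromhex(
    "c00c"                                # name: pointer to offset 12
    "001c"                                # type 28 (AAAA)
    "0001"                                # class 1
    "00000e10"                            # TTL = 3600
    "0010"                                # RDLENGTH = 16
    "2400cb00204800010000000068140055"    # the 16 address octets
)

def parse_fixed_fields(record: bytes, after_name: int):
    """Unpack type, class, TTL and payload that follow a record's name."""
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", record, after_name)
    start = after_name + 10               # 2 + 2 + 4 + 2 octets of fixed fields
    rdata = record[start:start + rdlength]
    if len(rdata) != rdlength:
        raise ValueError("record payload is truncated")
    return rtype, rclass, ttl, rdata

rtype, rclass, ttl, rdata = parse_fixed_fields(answer, 2)
address = ipaddress.IPv6Address(rdata)
print(rtype, rclass, ttl, address)        # 28 1 3600 2400:cb00:2048:1::6814:55
```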

## RRSETs
In the example above, the question for the AAAA record of 'www.ietf.org' had
exactly one corresponding resource record. In a human readable 'zone file',
this would be stored as:

```
www.ietf.org    3600    IN    AAAA    2400:cb00:2048:1::6814:55
```

It is however possible to have multiple AAAA records for the same name. Even
if there is only one record, the DNS specifications talk about 'Resource
Record Sets', or RRSETs. These operate as a unit. So even though the encoding
in the DNS packet allows different TTL values within a single RRSET, this
should never happen.

## Zone files
Zone files are one way of storing DNS data, but these are not integral to
the operation of a nameserver. The zone file format is standardised, but it
is highly non-trivial to parse. It is entirely possible to write useful
nameservers that do not read or write DNS zone files. When embarking on
parsing zonefiles, do not do so lightly. As an example, various fields
within a single line can appear in many orders. Most fields are optional,
and some will then be copied from the previous line. But not all.

Of specific note, many people have attempted to write a grammar (parser) for
zonefiles and have found it almost impossible.

## Zones
The concept of a DNS zone is non-trivial and frequently misunderstood.
Despite writing 'www.ietf.org' from left to right, within DNS it is fairer
to describe it as 'org' below the root node, with below the 'org' node a
node called 'ietf'. Finally to the 'ietf' node is attached a node called
'www'.

Or in graphical form:

```
+-----+
|  .  |
+-----+
   |
+-----+
| ORG |
+-----+
   |
+------+
| IETF |
+------+
   |
+-----+
| WWW |
+-----+
```

To make life confusing, 'www.ietf.org' could be defined in four different
places. It could be in the 'root' zone itself, fully written out:

```
www.ietf.org    3600    IN    AAAA    2400:cb00:2048:1::6814:55
```

Or it could be in the org zone, where it might look like this:

```
$origin ORG
www.ietf    3600    IN    AAAA    2400:cb00:2048:1::6814:55
```

Or, (as is actually the case), this name could live in the 'ietf.org' zone:

```
$origin ietf.org
www    3600    IN    AAAA    2400:cb00:2048:1::6814:55
```

And finally, it is even possible that there is a zone called 'www.ietf.org',
where the record lives like this:

```
$origin www.ietf.org
@    3600    IN    AAAA    2400:cb00:2048:1::6814:55
```

For each of these four scenarios, the 'tree' of nodes as shown above is
real. This for example means that if there is a name called
'www.fr.ietf.org' and a query comes in for 'fr.ietf.org', that name
exists - even though no records may be assigned to it.

NOTE: This means that any implementation that sees DNS as a simple
'key/value' store, where only records that exist can match, is headed for
trouble down the line.
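
One way to keep such intermediate names alive is to store records in a tree of
labels rather than in a flat dictionary. The following Python sketch is only
an illustration: the `Node` class and the `add` and `lookup` helpers are
invented here, and for brevity they take dotted text names, which assumes
labels without dots.

```python
class Node:
    """One label in the DNS tree; it may exist without holding any records."""
    def __init__(self):
        self.children = {}    # lower-cased label -> Node
        self.rrsets = {}      # type name -> list of record data

def add(root: Node, name: str, rtype: str, data: str) -> None:
    node = root
    for label in reversed(name.lower().rstrip(".").split(".")):
        node = node.children.setdefault(label, Node())   # creates parents too
    node.rrsets.setdefault(rtype, []).append(data)

def lookup(root: Node, name: str, rtype: str):
    """Return (name_exists, records); a name can exist with no records."""
    node = root
    for label in reversed(name.lower().rstrip(".").split(".")):
        node = node.children.get(label)
        if node is None:
            return False, []
    return True, node.rrsets.get(rtype, [])

root = Node()
add(root, "www.fr.ietf.org", "AAAA", "2400:cb00:2048:1::6814:55")
print(lookup(root, "fr.ietf.org", "AAAA"))   # (True, [])
print(lookup(root, "de.ietf.org", "AAAA"))   # (False, [])
```

A flat key/value store would give the same empty result for both queries,
which loses the distinction between a name that exists without data and a name
that does not exist at all.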

## Start of Authority, zone cuts, delegations
As noted above, 'www.ietf.org' can live in four places: the root zone, the
org zone, the ietf.org zone and even in a zone that is itself called
'www.ietf.org'.

A zone always starts with a SOA or Start Of Authority record. A SOA record
is DNS metadata. It stores various things that may be of interest about a
zone, like the email address of the maintainer and the name of the most
authoritative server. It also has time intervals that describe how a zone
needs to be replicated. Finally, the SOA record has a number that influences
TTL values for names that do or do not exist.

There is only one SOA that is guaranteed to exist and that is the one for
the root zone (called '.'). As of 2018, it looks like this:

```
. 86400 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2018032802 1800 900 604800 86400
```

This says: the authoritative server for the root zone is called
'a.root-servers.net'. This name is however only used for diagnostics.
Secondly, nstld@verisign-grs.com is the email address of the zone
maintainer. Note that the '@' is replaced by a dot. Specifically, if the
email address had been 'nstld.maintainer@verisign-grs.com', this would have
been stored as `nstld\.maintainer.verisign-grs.com`. This name would then
still be 3 labels long, but the first one has a dot in it.
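
Going the other way, from the zone file form back to an email address, means
splitting at the first dot that is not escaped. The helper below is a small
sketch with a made-up name, `rname_to_email`. It only handles the backslash
escape shown above and ignores other escape forms that zone files allow.

```python
def rname_to_email(rname: str) -> str:
    """Turn 'user.example.org.' (zone file form) into 'user@example.org'."""
    local = []
    i = 0
    while i < len(rname):
        ch = rname[i]
        if ch == "\\" and i + 1 < len(rname):
            local.append(rname[i + 1])      # escaped character stays in label 1
            i += 2
        elif ch == ".":                     # first unescaped dot stands for '@'
            domain = rname[i + 1:].rstrip(".")
            return "".join(local) + "@" + domain
        else:
            local.append(ch)
            i += 1
    raise ValueError("no unescaped dot found in name")

print(rname_to_email("nstld.verisign-grs.com."))
print(rname_to_email(r"nstld\.maintainer.verisign-grs.com."))
```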

The following field, 2018032802, is a serial number. Quite often, but by no
means always, this is a date in proper order (YYYYMMDD), followed by a two
digit sequence number. This serial number is used for replication
purposes, as are the following 3 numbers.

Zones are hosted on 'masters'. Meanwhile, 'slave' servers poll the master
for updates, and pull down a new zone if they see new contents, as noted by
an increase in serial number.

The numbers 1800 and 900 describe how often a zone should be checked for
updates (twice an hour), and that if an update check fails it should be
repeated after 900 seconds. Finally, 604800 says that if a master server
was unreachable for over a week, the zone should be deleted from the slave.
This is not a popular feature.

The final number, 86400, denotes that if a response says a name or RRSET
does not exist, it will continue to not exist for the next day, and that
this knowledge may be cached.
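
To tie the seven SOA fields together, here is a small sketch that splits the
text form of the record data into named values. The field names follow the
ones RFC 1035 uses (MNAME, RNAME, SERIAL, REFRESH, RETRY, EXPIRE, MINIMUM).
The `SOA` tuple and `parse_soa_rdata` are illustrative helpers, and the sketch
assumes a single line with plain whitespace between fields.

```python
from typing import NamedTuple

class SOA(NamedTuple):
    mname: str      # name of the primary server, used for diagnostics
    rname: str      # maintainer email address with '@' written as a dot
    serial: int     # zone version, used for replication
    refresh: int    # seconds between update checks
    retry: int      # seconds to wait after a failed update check
    expire: int     # seconds after which an unreachable zone is dropped
    minimum: int    # seconds a 'does not exist' answer may be cached

def parse_soa_rdata(text: str) -> SOA:
    mname, rname, *numbers = text.split()
    if len(numbers) != 5:
        raise ValueError("expected five numeric SOA fields")
    return SOA(mname, rname, *map(int, numbers))

soa = parse_soa_rdata(
    "a.root-servers.net. nstld.verisign-grs.com. 2018032802 1800 900 604800 86400"
)
print(3600 // soa.refresh, "checks per hour")        # 2 checks per hour
print(soa.expire // 86400, "days until expiry")      # 7 days until expiry
```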