5 - Recursive Resolve
=====================
Our server is working, but relying on another server to actually perform
the lookup is annoying and less than useful. Now is a good time to delve into
the details of how a name is really resolved.

Assuming that nothing is known beforehand, the question is first
issued to one of the Internet's 13 root servers. Why 13? Because that's how
many fit into a 512 byte DNS packet (strictly speaking, there's room for
14, but some margin was left). You might think that 13 seems a bit on the low
side for handling all of the internet, and you'd be right -- there are 13
logical servers, but in reality many more. You can read more about it
[here](http://www.root-servers.org/). Any resolver will need to know of these
13 servers beforehand. A file containing all of them, in BIND format, is
available and called [named.root](https://www.internic.net/domain/named.root).

These servers all contain the same information, and to get started we can pick
one of them at random. Looking at `named.root` we see that the IP address of
*a.root-servers.net* is 198.41.0.4, so we'll go ahead and use that to perform
our initial query for *www.google.com*.

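
As an aside, a resolver that wants more than one hard-coded root address could read them from `named.root`. The following is only a minimal sketch, not part of the server built in this guide: `parse_root_hints` and `SAMPLE` are hypothetical names, and the sample text imitates the file's layout (comment lines starting with `;`, then name, TTL, record type and value) rather than reproducing the whole file.

```rust
use std::net::Ipv4Addr;

/// A few lines in the layout used by `named.root` (sample data for this sketch).
const SAMPLE: &str = "\
; comment lines start with a semicolon
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
";

/// Extract (name, IPv4 address) pairs from the A records in the text.
/// Comment lines, malformed lines and all other record types are skipped.
fn parse_root_hints(text: &str) -> Vec<(String, Ipv4Addr)> {
    text.lines()
        .filter(|line| !line.trim_start().starts_with(';'))
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let name = fields.next()?;
            let _ttl = fields.next()?;
            let rtype = fields.next()?;
            let addr = fields.next()?;
            if rtype != "A" {
                return None;
            }
            let addr: Ipv4Addr = addr.parse().ok()?;
            // Normalize: drop the trailing dot and lowercase the name.
            Some((name.trim_end_matches('.').to_ascii_lowercase(), addr))
        })
        .collect()
}

fn main() {
    for (name, addr) in parse_root_hints(SAMPLE) {
        println!("{} -> {}", name, addr);
    }
}
```

In this guide we keep things simple and hard-code a single root server instead.
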
```text
# dig +norecurse @198.41.0.4 www.google.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> +norecurse @198.41.0.4 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64866
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 16
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com. IN A
;; AUTHORITY SECTION:
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
;; ADDITIONAL SECTION:
e.gtld-servers.net. 172800 IN A 192.12.94.30
b.gtld-servers.net. 172800 IN A 192.33.14.30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
j.gtld-servers.net. 172800 IN A 192.48.79.30
m.gtld-servers.net. 172800 IN A 192.55.83.30
i.gtld-servers.net. 172800 IN A 192.43.172.30
f.gtld-servers.net. 172800 IN A 192.35.51.30
a.gtld-servers.net. 172800 IN A 192.5.6.30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
g.gtld-servers.net. 172800 IN A 192.42.93.30
h.gtld-servers.net. 172800 IN A 192.54.112.30
l.gtld-servers.net. 172800 IN A 192.41.162.30
k.gtld-servers.net. 172800 IN A 192.52.178.30
c.gtld-servers.net. 172800 IN A 192.26.92.30
d.gtld-servers.net. 172800 IN A 192.31.80.30
;; Query time: 24 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Fri Jul 08 14:09:20 CEST 2016
;; MSG SIZE rcvd: 531
```
The root servers don't know about *www.google.com*, but they do know about
*com*, so the reply tells us where to go next. There are a few things to take
note of:

* We are provided with a set of NS records in the authority section. NS
  records tell us *the name* of the name server handling a domain.
* The server is being helpful by passing along A records corresponding to the
  NS records, so we don't have to perform a second lookup.
* We didn't actually perform a query for *com*, but rather *www.google.com*.
  However, the NS records all refer to *com*.

Let's pick a server from the result and move on. *192.5.6.30* for
*a.gtld-servers.net* seems as good as any.

```text
# dig +norecurse @192.5.6.30 www.google.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> +norecurse @192.5.6.30 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16229
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com. IN A
;; AUTHORITY SECTION:
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.
;; ADDITIONAL SECTION:
ns2.google.com. 172800 IN A 216.239.34.10
ns1.google.com. 172800 IN A 216.239.32.10
ns3.google.com. 172800 IN A 216.239.36.10
ns4.google.com. 172800 IN A 216.239.38.10
;; Query time: 114 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Fri Jul 08 14:13:26 CEST 2016
;; MSG SIZE rcvd: 179
```
We're still not at *www.google.com*, but at least we have a set of servers that
handle the *google.com* domain now. Let's give it another shot by sending our
query to *216.239.32.10*.
```text
# dig +norecurse @216.239.32.10 www.google.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> +norecurse @216.239.32.10 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20432
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 300 IN A 216.58.211.132
;; Query time: 10 msec
;; SERVER: 216.239.32.10#53(216.239.32.10)
;; WHEN: Fri Jul 08 14:15:11 CEST 2016
;; MSG SIZE rcvd: 48
```
And here we go! The IP of *www.google.com*, as we desired. Let's recap:

* *a.root-servers.net* tells us to check *a.gtld-servers.net*, which handles *com*
* *a.gtld-servers.net* tells us to check *ns1.google.com*, which handles *google.com*
* *ns1.google.com* tells us the IP of *www.google.com*

This is rather typical, and most lookups will only ever require three steps,
even without caching. It's still possible to have name servers for subdomains,
and further ones for sub-subdomains, though. In practice, a DNS server will
maintain a cache, and most TLDs will already be known. That means that
most queries will only ever require two lookups by the server, and commonly one
or zero.

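
To make the caching idea concrete, here is a minimal sketch of a cache that honours record TTLs. It is not part of the server built in this guide: `DnsCache` and `CacheEntry` are hypothetical names, and it stores only a single IPv4 address per name, whereas a real cache would keep full record sets, including the NS records that let a resolver skip the root and TLD steps. The current time is passed in explicitly so that expiry is easy to reason about.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;
use std::time::{Duration, Instant};

/// One cached address together with the moment it stops being valid.
struct CacheEntry {
    addr: Ipv4Addr,
    expires: Instant,
}

/// A tiny name -> address cache (hypothetical, for illustration only).
#[derive(Default)]
struct DnsCache {
    entries: HashMap<String, CacheEntry>,
}

impl DnsCache {
    /// Store an address that is valid for `ttl` seconds counted from `now`.
    fn insert(&mut self, name: &str, addr: Ipv4Addr, ttl: u32, now: Instant) {
        let expires = now + Duration::from_secs(u64::from(ttl));
        self.entries
            .insert(name.to_ascii_lowercase(), CacheEntry { addr, expires });
    }

    /// Return the cached address unless the entry is missing or has expired.
    fn get(&self, name: &str, now: Instant) -> Option<Ipv4Addr> {
        self.entries
            .get(&name.to_ascii_lowercase())
            .filter(|entry| now < entry.expires)
            .map(|entry| entry.addr)
    }
}

fn main() {
    let now = Instant::now();
    let mut cache = DnsCache::default();

    // The answer from the last dig output above had a TTL of 300 seconds.
    cache.insert("www.google.com", Ipv4Addr::new(216, 58, 211, 132), 300, now);

    println!("fresh:   {:?}", cache.get("www.google.com", now));
    println!(
        "expired: {:?}",
        cache.get("www.google.com", now + Duration::from_secs(301))
    );
}
```

A resolver would consult such a cache before sending a query and fall back to the full lookup on a miss.
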
### Extending DnsPacket for recursive lookups
Before we can move on, we'll need a few utility functions on `DnsPacket`.
```rust
impl DnsPacket {

    - snip -

    /// It's useful to be able to pick an A record from a packet. When we get
    /// multiple IPs for a single name, it doesn't matter which one we choose,
    /// so despite its name this method simply returns the first one.
    pub fn get_random_a(&self) -> Option<String> {
        self.answers
            .iter()
            .filter_map(|record| match record {
                DnsRecord::A { addr, .. } => Some(addr.to_string()),
                _ => None,
            })
            .next()
    }

    /// A helper function which returns an iterator over all name servers in
    /// the authorities section, represented as (domain, host) tuples
    fn get_ns<'a>(&'a self, qname: &'a str) -> impl Iterator<Item = (&'a str, &'a str)> {
        self.authorities
            .iter()
            // In practice, these are always NS records in well formed packets.
            // Convert the NS records to a tuple which has only the data we need
            // to make it easy to work with.
            .filter_map(|record| match record {
                DnsRecord::NS { domain, host, .. } => Some((domain.as_str(), host.as_str())),
                _ => None,
            })
            // Discard servers which aren't authoritative to our query
            .filter(move |(domain, _)| qname.ends_with(*domain))
    }

    /// When there is an NS record in the authorities section, there may also
    /// be a matching A record in the additional section. This saves us
    /// from doing a separate query to resolve the IP of the name server.
    pub fn get_resolved_ns(&self, qname: &str) -> Option<String> {
        // Get an iterator over the nameservers in the authorities section
        self.get_ns(qname)
            // Now we need to look for a matching A record in the additional
            // section. Since we just want the first valid record, we can just
            // build a stream of matching records.
            .flat_map(|(_, host)| {
                self.resources
                    .iter()
                    // Filter for A records where the domain matches the host
                    // of the NS record that we are currently processing
                    .filter_map(move |record| match record {
                        DnsRecord::A { domain, addr, .. } if domain == host => Some(addr),
                        _ => None,
                    })
            })
            .map(|addr| addr.to_string())
            // Finally, pick the first valid entry
            .next()
    }

    /// However, not all name servers are that nice. In certain cases there won't
    /// be any A records in the additional section, and we'll have to perform *another*
    /// lookup in the midst of our first. For this, we introduce a method for
    /// returning the hostname of an appropriate name server.
    pub fn get_unresolved_ns(&self, qname: &str) -> Option<String> {
        // Get an iterator over the nameservers in the authorities section
        self.get_ns(qname)
            .map(|(_, host)| host.to_string())
            // Finally, pick the first valid entry
            .next()
    }

} // End of DnsPacket
```
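
One detail worth noting about `get_ns`: `ends_with` compares raw strings, so a query for *notgoogle.com* would also match NS records for *google.com*. Below is a sketch of a stricter check that only matches on whole labels. The helper `is_in_zone` is a hypothetical name, not something the guide's code defines, and it assumes plain ASCII names.

```rust
/// Returns true if `qname` equals `zone` or is a subdomain of it, comparing
/// whole labels rather than raw string suffixes. Trailing dots and letter
/// case are ignored.
fn is_in_zone(qname: &str, zone: &str) -> bool {
    let qname = qname.trim_end_matches('.').to_ascii_lowercase();
    let zone = zone.trim_end_matches('.').to_ascii_lowercase();

    // The root zone (empty name) contains every name.
    if zone.is_empty() {
        return true;
    }

    // Either an exact match, or the suffix must start at a label boundary.
    qname == zone || qname.ends_with(&format!(".{}", zone))
}

fn main() {
    // A plain suffix comparison accepts an unrelated domain...
    println!("ends_with:  {}", "notgoogle.com".ends_with("google.com"));
    // ...while the label-aware check rejects it.
    println!("is_in_zone: {}", is_in_zone("notgoogle.com", "google.com"));
    println!("is_in_zone: {}", is_in_zone("www.google.com", "google.com"));
}
```

Such a helper could stand in for the `ends_with` call in the final filter of `get_ns`.
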
### Implementing recursive lookup
We move swiftly on to our new `recursive_lookup` function:
```rust
fn recursive_lookup(qname: &str, qtype: QueryType) -> Result<DnsPacket> {
    // For now we're always starting with *a.root-servers.net*.
    let mut ns = "198.41.0.4".to_string();

    // Since it might take an arbitrary number of steps, we enter an unbounded loop.
    loop {
        println!("attempting lookup of {:?} {} with ns {}", qtype, qname, ns);

        // The next step is to send the query to the active server.
        let ns_copy = ns.clone();

        let server = (ns_copy.as_str(), 53);
        let response = lookup(qname, qtype.clone(), server)?;

        // If there are entries in the answer section, and no errors, we are done!
        if !response.answers.is_empty() && response.header.rescode == ResultCode::NOERROR {
            return Ok(response);
        }

        // We might also get a `NXDOMAIN` reply, which is the authoritative name server's
        // way of telling us that the name doesn't exist.
        if response.header.rescode == ResultCode::NXDOMAIN {
            return Ok(response);
        }

        // Otherwise, we'll try to find a new nameserver based on NS and a corresponding A
        // record in the additional section. If this succeeds, we can switch name server
        // and retry the loop.
        if let Some(new_ns) = response.get_resolved_ns(qname) {
            ns = new_ns;

            continue;
        }

        // If not, we'll have to resolve the IP of an NS record. If no NS records exist,
        // we'll go with what the last server told us.
        let new_ns_name = match response.get_unresolved_ns(qname) {
            Some(x) => x,
            None => return Ok(response),
        };

        // Here we go down the rabbit hole by starting _another_ lookup sequence in the
        // midst of our current one. Hopefully, this will give us the IP of an appropriate
        // name server.
        let recursive_response = recursive_lookup(&new_ns_name, QueryType::A)?;

        // Finally, we pick an IP from the result, and restart the loop. If no such
        // record is available, we again return the last result we got.
        if let Some(new_ns) = recursive_response.get_random_a() {
            ns = new_ns;
        } else {
            return Ok(response);
        }
    }
}
```
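
Because the loop above is unbounded, a broken delegation in which two servers keep referring to each other would keep it spinning forever. One possible safeguard is a hop limit. The sketch below shows the idea against an in-memory stand-in for the network instead of real DNS queries. Everything in it (`Reply`, `resolve_with_limit`, `sample_network`, `looping_network`) is hypothetical; the addresses in `sample_network` are the ones from the manual walk earlier in this chapter, and the ones in `looping_network` are made up.

```rust
use std::collections::HashMap;

/// What a mock server says when asked about the name we are resolving.
#[derive(Clone, Debug, PartialEq)]
enum Reply {
    /// Final answer: the address of the name.
    Answer(String),
    /// Referral: ask the server with this address instead.
    Referral(String),
}

/// Follow referrals from `start`, sending at most `max_hops` queries.
fn resolve_with_limit(
    network: &HashMap<&str, Reply>,
    start: &str,
    max_hops: usize,
) -> Result<String, String> {
    let mut ns = start.to_string();

    for _ in 0..max_hops {
        // Stand-in for sending the query to `ns` over the network.
        let reply = network.get(ns.as_str()).cloned();

        match reply {
            Some(Reply::Answer(addr)) => return Ok(addr),
            Some(Reply::Referral(next)) => ns = next,
            None => return Err(format!("no reply from {}", ns)),
        }
    }

    Err(format!("gave up after {} hops", max_hops))
}

/// The three servers from the manual walk for www.google.com.
fn sample_network() -> HashMap<&'static str, Reply> {
    let mut network = HashMap::new();
    network.insert("198.41.0.4", Reply::Referral("192.5.6.30".to_string()));
    network.insert("192.5.6.30", Reply::Referral("216.239.32.10".to_string()));
    network.insert("216.239.32.10", Reply::Answer("216.58.211.132".to_string()));
    network
}

/// Two made-up servers that refer to each other forever.
fn looping_network() -> HashMap<&'static str, Reply> {
    let mut network = HashMap::new();
    network.insert("10.0.0.1", Reply::Referral("10.0.0.2".to_string()));
    network.insert("10.0.0.2", Reply::Referral("10.0.0.1".to_string()));
    network
}

fn main() {
    println!("{:?}", resolve_with_limit(&sample_network(), "198.41.0.4", 8));
    println!("{:?}", resolve_with_limit(&looping_network(), "10.0.0.1", 8));
}
```

In `recursive_lookup`, the same effect could be had by replacing `loop` with a bounded `for` and returning an error once the limit is reached. The nested call used to resolve name server hostnames would need its own depth limit as well.
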
### Trying out recursive lookup
The only thing remaining is to change our `handle_query` function to use
`recursive_lookup`:
```rust
fn handle_query(socket: &UdpSocket) -> Result<()> {

    - snip -

        println!("Received query: {:?}", question);

        if let Ok(result) = recursive_lookup(&question.name, question.qtype) {
            packet.questions.push(question.clone());
            packet.header.rescode = result.header.rescode;

    - snip -

}
```
Let's try it!
```text
# dig @127.0.0.1 -p 2053 www.google.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 -p 2053 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41892
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 300 IN A 216.58.211.132
;; Query time: 76 msec
;; SERVER: 127.0.0.1#2053(127.0.0.1)
;; WHEN: Fri Jul 08 14:31:39 CEST 2016
;; MSG SIZE rcvd: 62
```
Looking at our server window, we see:
```text
Received query: DnsQuestion { name: "www.google.com", qtype: A }
attempting lookup of A www.google.com with ns 198.41.0.4
attempting lookup of A www.google.com with ns 192.12.94.30
attempting lookup of A www.google.com with ns 216.239.34.10
Answer: A { domain: "www.google.com", addr: 216.58.211.132, ttl: 300 }
```
This mirrors our manual process earlier. We can now successfully resolve
a domain starting from the list of root servers. We've now got a fully
functional, albeit suboptimal, DNS server.

There are many things that we could do better. For instance, there is no true
concurrency in this server. We can neither send nor receive queries over TCP.
We cannot use it to host our own zones or let it act as an authoritative
server. The lack of support for DNSSEC leaves us open to DNS poisoning attacks
where a malicious server can return records relating to somebody else's domain.

Many of these problems have been fixed in my own project
[hermes](https://github.com/EmilHernvall/hermes), so you can head over there to
investigate how I did it, or continue on your own from here. Or maybe you've
had enough of DNS for now... :) Regardless, I hope you've gained some new insight
into how DNS works.