Motivation
• DNS: part of the Internet core infrastructure
  – Applications: web, e-mail, E.164, CDNs …
• DNS: considered a very reliable system
  – Works almost always
• Question: is DNS a robust system?
  – User-perceived robustness vs. system robustness: are they the same?
Motivation: Short Answer
“Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration”
-- Wired News, Jan 2001
• Thousands or even millions of users affected
• All due to a single DNS configuration error
Related Work
• Traffic & implementation error studies:
  – Danzig et al. [SIGCOMM92]: bugs
  – CAIDA: traffic & bugs
• Performance studies:
  – Jung et al. [IMW01]: caching
  – Cohen et al. [SAINT01]: proactive caching
  – Liston et al. [IMW02]: diversity
• Server availability: in [OSDI04, IMC04]
Study DNS Robustness
• Classify DNS operational errors:
  – Study known errors
  – Identify new types of errors
• Measure their pervasiveness
• Quantify their impact on DNS:
  – availability
  – performance
Outline
• DNS Overview
• Measurement Methodology
• DNS Configuration Errors
  – Example Cases
  – Measurement Results
• Discussion & Summary
Background
[Figure: DNS name tree with the top-level domains net, com, uk, ca, jp; foo below com; buz and bar below foo; bar1, bar2, bar3 below bar.]
Zone:
• Occupies a continuous subspace
• Served by the same name servers
Resource records of the bar zone:
bar.foo.com. NS ns1.bar.foo.com.
bar.foo.com. NS ns3.bar.foo.com.
bar.foo.com. NS ns2.bar.foo.com.
bar.foo.com. MX mail.bar.foo.com.
www.bar.foo.com. A 10.10.10.10
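[Figure: the client asks its local DNS server for www.bar.foo.com; the local DNS server queries the root, com, foo, and bar zones in turn.]
• client → local DNS server: asking for www.bar.foo.com
• root zone → referral: com NS RRs, com A RRs
• com zone → referral: foo NS RRs, foo A RRs
• foo zone → referral: bar NS RRs, bar A RRs
• bar zone → answer: www.bar.foo.com A 10.10.10.10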
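The referral chain above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `ZONES` dictionary, its `delegations`/`records` layout, and the `resolve` helper are hypothetical names invented for illustration, and no real DNS traffic is involved.

```python
# Toy model of the four zones in the figure (hypothetical data layout).
ZONES = {
    ".":            {"delegations": ["com."], "records": {}},
    "com.":         {"delegations": ["foo.com."], "records": {}},
    "foo.com.":     {"delegations": ["bar.foo.com."], "records": {}},
    "bar.foo.com.": {"delegations": [],
                     "records": {"www.bar.foo.com.": "10.10.10.10"}},
}


def resolve(name, zones=ZONES):
    """Follow referrals from the root zone until some zone holds the answer.

    Returns the address and the list of zones that were queried.
    """
    zone, path = ".", []
    while True:
        path.append(zone)
        data = zones[zone]
        if name in data["records"]:
            return data["records"][name], path  # answer
        # Referral: pick the delegated child zone that contains the name.
        child = next(
            (d for d in data["delegations"]
             if name == d or name.endswith("." + d)),
            None,
        )
        if child is None:
            raise LookupError(f"{name} not found below {zone}")
        zone = child


address, queried = resolve("www.bar.foo.com.")
print(address, queried)
```

Each loop iteration corresponds to one referral in the figure; the last zone in `queried` is the one that returns the answer.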
Infrastructure RRs
At the com zone:
foo.com. NS ns1.foo.com.
foo.com. NS ns2.foo.com.
foo.com. NS ns3.foo.com.
ns1.foo.com. A 1.1.1.1
ns2.foo.com. A 2.2.2.2
ns3.foo.com. A 3.3.3.3
At the foo.com zone:
foo.com. NS ns1.foo.com.
foo.com. NS ns2.foo.com.
foo.com. NS ns3.foo.com.
ns1.foo.com. A 1.1.1.1
ns2.foo.com. A 2.2.2.2
ns3.foo.com. A 3.3.3.3
• NS Resource Record:
  – Provides the names of a zone’s authoritative servers
  – Stored both at the parent and at the child zone
• A Resource Record:
  – Associated with an NS resource record
  – Stored at the parent zone (glue A record)
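Because the NS records are stored both at the parent and at the child zone, the two copies can drift apart. A minimal sketch of comparing them, assuming records are given as plain `owner TYPE value` lines like the ones on this slide; the helper names `ns_names` and `compare_delegation` and the `ns4.foo.com.` entry are hypothetical:

```python
def ns_names(records, zone):
    """Collect the NS targets listed for `zone` in 'owner TYPE value' lines."""
    names = set()
    for line in records:
        owner, rtype, value = line.split()
        if rtype == "NS" and owner.lower() == zone.lower():
            names.add(value.lower())
    return names


def compare_delegation(parent_records, child_records, zone):
    """Report NS names that appear on only one side of the delegation."""
    parent = ns_names(parent_records, zone)
    child = ns_names(child_records, zone)
    return {"only_at_parent": parent - child, "only_at_child": child - parent}


parent_side = [
    "foo.com. NS ns1.foo.com.",
    "foo.com. NS ns2.foo.com.",
    "foo.com. NS ns3.foo.com.",
    "ns1.foo.com. A 1.1.1.1",
]
child_side = [
    "foo.com. NS ns1.foo.com.",
    "foo.com. NS ns2.foo.com.",
    "foo.com. NS ns4.foo.com.",  # hypothetical server unknown to the parent
]
print(compare_delegation(parent_side, child_side, "foo.com."))
```

Two empty sets mean the parent and the child agree on the set of authoritative servers.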
What Affects DNS Availability
• Name Servers:
  – Software failures
  – Network failures
  – Scheduled maintenance tasks
• Infrastructure Resource Records (focus of our work):
  – Availability of these records
  – Configuration errors
Classification of Measured Errors
• Inconsistency: the configuration of infrastructure RRs does not correspond to the actual authoritative name servers.
  – Lame Delegation
  – Delegation Inconsistency
• Dependency: more than one name server shares a common point of failure.
  – Diminished Redundancy
  – Cyclic Dependency
What is Measured?
• Frequency of configuration errors:
  – System parameters: TLDs, DNS level, zone size (i.e. the number of delegations)
• Impact on availability:
  – Number of servers: lost due to these errors
  – Zone’s availability: probability of resolving a name
• Impact on performance:
  – Total time to resolve a query
  – Starting from the query issuing time
  – Finishing at the query final answer time
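To make "probability of resolving a name" concrete, here is a small sketch. It assumes that servers fail independently and that one reachable server is enough; both are simplifying assumptions made for illustration, not the measurement method of this study, and the 0.9 per-server figure is invented.

```python
def zone_availability(server_availabilities):
    """Probability that at least one server answers.

    Assumes independent failures, which dependency errors violate.
    """
    p_all_down = 1.0
    for p in server_availabilities:
        p_all_down *= 1.0 - p
    return 1.0 - p_all_down


# Three working servers, each reachable 90% of the time (invented figure).
print(zone_availability([0.9, 0.9, 0.9]))

# Same zone after two of the three servers are lost to configuration errors.
print(zone_availability([0.9]))
```

Under these assumptions, every server lost to a configuration error removes one factor from the product, so the zone's availability can only drop.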
Measurement Methodology
• Error frequency and availability impact: 3 sets of active measurements
  – Random set of 50K zones
  – 20K zones that allow zone transfers
  – 500 popular zones
• Performance impact: 2 sets of passive measurements
  – 1-week DNS packet traces
Lame Delegation
[Figure: the com zone delegates foo.com to the servers A.foo.com and B.foo.com.]
foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2
A query sent to a lame server can run into:
1) Non-existing server -- 3 seconds perf. penalty
2) DNS error code -- 1 RTT perf. penalty
3) Useless referral -- 1 RTT perf. penalty
4) Non-authoritative answer (cached)
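A minimal sketch of how the four numbered cases on this slide could be told apart when probing a listed server. The response is modelled as a plain dictionary; the field names `rcode`, `authoritative`, and `answers`, and the use of `None` for a timeout, are assumptions made for this example rather than any real resolver API.

```python
from typing import Optional


def classify_response(resp: Optional[dict]) -> str:
    """Map a (simulated) reply from a listed name server to one of the cases."""
    if resp is None:
        return "non-existing server"  # no reply at all, query timed out
    if resp["rcode"] != "NOERROR":
        return "DNS error code"  # e.g. SERVFAIL or REFUSED
    if resp["authoritative"]:
        return "ok"  # the server really is authoritative for the zone
    if resp["answers"]:
        return "non-authoritative answer (cached)"
    return "useless referral"  # no answer, only a pointer elsewhere


probes = {
    "timeout": None,
    "servfail": {"rcode": "SERVFAIL", "authoritative": False, "answers": []},
    "referral": {"rcode": "NOERROR", "authoritative": False, "answers": []},
    "cached": {"rcode": "NOERROR", "authoritative": False,
               "answers": ["1.1.1.1"]},
    "healthy": {"rcode": "NOERROR", "authoritative": True,
                "answers": ["1.1.1.1"]},
}
for label, resp in probes.items():
    print(label, "->", classify_response(resp))
```

Only the last probe describes a correctly configured server; the other four correspond to the lame cases listed on the slide.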
Lame Delegation Results
• Error Frequency:
  – 15% of the zones
  – 8% for the 500 most popular zones
  – independent of the zone’s location (level), varies a lot per TLD
• Impact:
  – 70% of the zones with errors lose half or more of the authoritative servers
  – 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation
Diminished Server Redundancy
[Figure: the com zone delegates foo.com to the servers A.foo.com and B.foo.com.]
foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2
The servers can share a point of failure at three levels:
A) Network level: belong to the same subnet
B) Autonomous system level: belong to the same AS
C) Geographic location level: belong to the same city
Diminished Server Redundancy Results
• Error Frequency:
  – 45% of all zones have all servers in the same /24 subnet
  – 75% of all zones have servers in the same AS
  – large & popular zones: better AS and geo diversity
• Impact:
  – less than 99.9% availability: all servers in the same /24 subnet
  – more than 99.99% availability: 3 servers at different ASs or different cities
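The network-level check (all servers in the same /24 subnet) is easy to sketch with Python's standard `ipaddress` module. The helper names and the sample addresses are hypothetical; AS-level and city-level checks would need external data and are not shown.

```python
import ipaddress


def covering_prefixes(addresses, prefix_len=24):
    """Return the set of /prefix_len networks that contain the addresses."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }


def all_in_same_subnet(addresses, prefix_len=24):
    """True when every server address falls into one and the same prefix."""
    return len(covering_prefixes(addresses, prefix_len)) == 1


# Two servers in one /24: a single network failure can take out both.
print(all_in_same_subnet(["192.0.2.10", "192.0.2.200"]))

# The addresses used on the slides sit in different /24 prefixes.
print(all_in_same_subnet(["1.1.1.1", "2.2.2.2"]))
```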
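Cyclic Zone Dependency (1)
[Figure: the com zone delegates foo.com to A.foo.com and B.foo.com, but holds a glue A record only for A.foo.com.]
At the com zone:
foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
At the foo.com zone:
B.foo.com. A 2.2.2.2
• The A glue RR for B.foo.com is missing
• B.foo.com depends on A.foo.com
• If A.foo.com is unavailable then B.foo.com is too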
Cyclic Zone Dependency (2)
[Figure: the com zone delegates foo.com to A.foo.com and B.bar.com, and bar.com to A.bar.com and B.foo.com.]
Delegation of foo.com:
foo.com. NS A.foo.com.
foo.com. NS B.bar.com.
A.foo.com. A 1.1.1.1
Delegation of bar.com:
bar.com. NS A.bar.com.
bar.com. NS B.foo.com.
A.bar.com. A 2.2.2.2
• The foo.com zone seems correctly configured
• The combination of the foo.com and bar.com zones is wrongly configured
• The B servers depend on the A servers
• If A.foo and A.bar are unavailable, the B addresses are unresolvable
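A hedged sketch of how such a dependency could be found automatically. It builds a graph in which an edge X → Y means that at least one server of zone X has no glue record and can only be located by resolving a name inside zone Y, then searches for a cycle with a depth-first traversal. The data layout and the function names are hypothetical, and this is not claimed to be the detection method used in the study.

```python
def zone_of(server_name, zones):
    """Return the longest zone name in `zones` that contains `server_name`."""
    matches = [z for z in zones
               if server_name == z or server_name.endswith("." + z)]
    return max(matches, key=len) if matches else None


def dependency_graph(delegations, glue):
    """delegations: zone -> NS names; glue: NS names with a glue A record."""
    graph = {}
    for zone, servers in delegations.items():
        deps = set()
        for ns in servers:
            if ns in glue:
                continue  # address is already known from the parent zone
            owner = zone_of(ns, delegations)
            if owner is not None:
                deps.add(owner)
        graph[zone] = deps
    return graph


def find_cycle(graph):
    """Return one dependency cycle as a list of zones, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {zone: WHITE for zone in graph}
    stack = []

    def visit(zone):
        color[zone] = GREY
        stack.append(zone)
        for nxt in sorted(graph.get(zone, ())):
            if color.get(nxt, WHITE) == GREY:  # back edge closes a cycle
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[zone] = BLACK
        return None

    for zone in sorted(graph):
        if color[zone] == WHITE:
            found = visit(zone)
            if found:
                return found
    return None


# The configuration from this slide (names lower-cased).
delegations = {
    "foo.com.": ["a.foo.com.", "b.bar.com."],
    "bar.com.": ["a.bar.com.", "b.foo.com."],
}
glue = {"a.foo.com.", "a.bar.com."}
print(find_cycle(dependency_graph(delegations, glue)))
```

The single-zone case from the previous slide shows up as a self-loop: foo.com depends on itself because B.foo.com has no glue record.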
Cyclic Zone Dependency Results
• Error Frequency:
  – 2% of the zones
  – None of the 500 most popular zones
• Impact:
  – 90% of the zones with cyclic dependency errors lose 25% (or even more) of their servers
  – 2 or 4 zones are involved in most errors
Discussion: User-Perceived != System Robustness
• User-perceived robustness:
  – Data replication: only one server is needed
  – Data caching: temporarily masks infrastructure failures
  – Popular zones: fewer configuration errors
• System robustness:
  – Fewer available servers: due to inconsistency errors
  – Fewer redundant servers: due to dependency errors
Discussion: Why so many errors?
• Superficially: due to operators:
  – Unaware of these errors
  – Lack of coordination: parent-child zone, secondary servers hosting
• Fundamentally: due to protocol design:
  – Lack of mechanisms to handle these errors proactively or reactively
  – Design choices that embrace some of them:
    Name servers are identified by names
    Glue NS & A records are necessary to set up the DNS tree
Summary
• DNS operational errors are widespread
• DNS operational errors affect availability:
  – 50% of the servers lost
  – less than 99.9% availability
• DNS operational errors affect performance:
  – 1 or even 2 orders of magnitude
• DNS system robustness is lower than user perception
  – Due to protocol design, not just due to operator errors