Impact of Configuration Errors on DNS Robustness CSCI 780, Fall 2005


Motivation
• DNS: part of the Internet core infrastructure
– Applications: web, e-mail, e164, CDNs …
• DNS: considered a very reliable system
– Works almost always
• Question: is DNS a robust system?
– User-perceived robustness
– System robustness
– Are they the same?

Motivation
Short Answer:
“Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration”
-- Wired News, Jan 2001
• Thousands or even millions of users affected
• All due to a single DNS configuration error

Related Work
• Traffic & implementation error studies:
– Danzig et al. [SIGCOMM92]: bugs
– CAIDA: traffic & bugs
• Performance studies:
– Jung et al. [IMW01]: caching
– Cohen et al. [SAINT01]: proactive caching
– Liston et al. [IMW02]: diversity
• Server availability: in [OSDI04, IMC04]

Study DNS Robustness
• Classify DNS operational errors:
– Study known errors
– Identify new types of errors
• Measure their pervasiveness
• Quantify their impact on DNS:
– availability
– performance

Outline
• DNS Overview
• Measurement Methodology
• DNS Configuration Errors
– Example Cases
– Measurement Results
• Discussion & Summary

[Figure: DNS name-space tree. Levels as shown: (net, com, uk, ca, jp), foo, (buz, bar), (bar1, bar2, bar3). The bar zone is bar.foo.com.]

Zone:
• Occupies a contiguous subspace of the name space
• Served by the same name servers

The bar zone, name servers:
bar.foo.com. NS ns1.bar.foo.com.
bar.foo.com. NS ns2.bar.foo.com.
bar.foo.com. NS ns3.bar.foo.com.

The bar zone, resource records:
bar.foo.com. MX mail.bar.foo.com.
www.bar.foo.com. A 10.10.10.10

Background

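[Figure: iterative resolution. The client asks its local DNS server for www.bar.foo.com; the local DNS server queries the root, com, foo and bar zones in turn.]

• root zone: referral (com NS RRs, com A RRs)
• com zone: referral (foo NS RRs, foo A RRs)
• foo zone: referral (bar NS RRs, bar A RRs)
• bar zone: answer (www.bar.foo.com A 10.10.10.10)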
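The referral chain above can be sketched as a toy iterative resolver. This is a minimal sketch over an in-memory table, not real DNS: the `ZONES` table and the `resolve` function are hypothetical names introduced for illustration, and only the `www.bar.foo.com A 10.10.10.10` record comes from the slides.

```python
# Toy model of iterative resolution: each zone either answers the query
# or refers the resolver to a child zone. All structures are illustrative.
ZONES = {
    ".": {"referrals": {"com."}, "answers": {}},
    "com.": {"referrals": {"foo.com."}, "answers": {}},
    "foo.com.": {"referrals": {"bar.foo.com."}, "answers": {}},
    "bar.foo.com.": {
        "referrals": set(),
        "answers": {"www.bar.foo.com.": "10.10.10.10"},
    },
}


def resolve(name, zones=ZONES):
    """Follow referrals from the root until some zone answers for `name`.

    Returns (address, path), where path lists the zones queried in order.
    The address is None if no zone on the path can answer or refer further.
    """
    zone, path = ".", []
    while True:
        path.append(zone)
        data = zones[zone]
        if name in data["answers"]:
            return data["answers"][name], path
        # Keep only delegations whose zone name is a suffix of the query name.
        candidates = [
            child for child in data["referrals"]
            if name == child or name.endswith("." + child)
        ]
        if not candidates:
            return None, path
        zone = max(candidates, key=len)  # follow the most specific delegation


address, path = resolve("www.bar.foo.com.")
print(address, path)
```

Each hop in `path` corresponds to one referral in the figure, which is why a broken delegation at any level affects every name below it.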

Infrastructure RRs

NS records (held at both the parent zone com and the child zone foo.com):
foo.com. NS ns1.foo.com.
foo.com. NS ns2.foo.com.
foo.com. NS ns3.foo.com.

A records for the name servers (the copy at the parent zone is the glue):
ns1.foo.com. A 1.1.1.1
ns2.foo.com. A 2.2.2.2
ns3.foo.com. A 3.3.3.3

• NS Resource Record:
– Provides the names of a zone’s authoritative servers
– Stored both at the parent and at the child zone
• A Resource Record:
– Associated with a NS resource record
– Stored at the parent zone (glue A record)

What Affects DNS Availability

• Name Servers:
– Software failures
– Network failures
– Scheduled maintenance tasks
• Infrastructure Resource Records:
– Availability of these records
– Configuration errors (focus of our work)

Classification of Measured Errors

• Inconsistency: the configuration of infrastructure RRs does not correspond to the actual authoritative name servers.
– Lame Delegation
– Delegation Inconsistency
• Dependency: two or more name servers share a common point of failure.
– Diminished Redundancy
– Cyclic Dependency
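One way to picture an inconsistency error is to compare the NS names held at the parent with those held at the child. The sketch below assumes that a mismatch between the two sets is what gets flagged; the slides do not spell out the exact test, and `compare_delegation` is a hypothetical helper, not a tool used in the study.

```python
def compare_delegation(parent_ns, child_ns):
    """Compare the NS names stored at the parent zone with those at the child.

    Names are compared case-insensitively and without the trailing dot.
    """
    parent = {name.lower().rstrip(".") for name in parent_ns}
    child = {name.lower().rstrip(".") for name in child_ns}
    return {
        "consistent": parent == child,
        "only_at_parent": sorted(parent - child),
        "only_at_child": sorted(child - parent),
    }


# Illustrative input: the child zone no longer lists ns3.
report = compare_delegation(
    ["ns1.foo.com.", "ns2.foo.com.", "ns3.foo.com."],
    ["NS1.foo.com.", "ns2.foo.com."],
)
print(report)
```

A server that appears only at the parent is a candidate for a lame delegation, since the parent keeps referring resolvers to it.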

What is Measured?
• Frequency of configuration errors:
– System parameters: TLDs, DNS level, zone size (i.e. the number of delegations)
• Impact on availability:
– Number of servers: lost due to these errors
– Zone’s availability: probability of resolving a name
• Impact on performance:
– Total time to resolve a query
– Starting from the query issuing time
– Finishing at the query final answer time
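To make "probability of resolving a name" concrete, here is a minimal sketch that treats a zone as available when at least one of its servers answers. It assumes servers fail independently, which is an assumption of this sketch and not a statement about how the study computed availability; `zone_availability` and the 0.99 figures are illustrative.

```python
from math import prod


def zone_availability(server_availabilities):
    """P(at least one server answers), assuming independent server failures."""
    return 1.0 - prod(1.0 - a for a in server_availabilities)


# Hypothetical per-server availability of 99%.
three_servers = zone_availability([0.99, 0.99, 0.99])
one_server = zone_availability([0.99])
print(three_servers, one_server)
```

The independence assumption is exactly what a shared point of failure breaks: servers that fail together behave like the single-server case, however many are listed.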

Measurement Methodology

• Error frequency and availability impact: 3 sets of active measurements
– Random set of 50K zones
– 20K zones that allow zone transfers
– 500 popular zones
• Performance impact: 2 sets of passive measurements
– 1-week DNS packet traces

Lame Delegation

[Figure: the com zone delegates foo.com to two name servers, A.foo.com and B.foo.com.]

foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2

Types of lame delegation:
1) Non-existing server -- 3 seconds perf. penalty
2) DNS error code -- 1 RTT perf. penalty
3) Useless referral -- 1 RTT perf. penalty
4) Non-authoritative answer (cached)
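The penalties listed above can be added up for a resolver that tries lame servers before reaching a working one. This is a rough sketch: `lame_penalty` is a hypothetical helper, the 0.06 s round-trip time is an illustrative value, and type 4 is left out because the slide gives no penalty for it.

```python
TIMEOUT_SECONDS = 3.0  # penalty for a non-existing server, as stated on the slide


def lame_penalty(outcomes, rtt_seconds):
    """Sum the delay added by lame servers tried before a working one.

    `outcomes` lists what happened at each lame server, in query order:
    "no_server", "error_code", or "useless_referral".
    """
    per_outcome = {
        "no_server": TIMEOUT_SECONDS,
        "error_code": rtt_seconds,        # 1 RTT
        "useless_referral": rtt_seconds,  # 1 RTT
    }
    return sum(per_outcome[outcome] for outcome in outcomes)


# One timeout followed by one error response, with an assumed 60 ms RTT.
penalty = lame_penalty(["no_server", "error_code"], rtt_seconds=0.06)
print(penalty)
```

With these numbers the timeout dominates the total: a non-existing server costs far more than one that answers with an error.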

Lame Delegation Results

[Figure: lame delegation measurement plots. Surviving labels: 0.06 sec, 0.4 sec, 3 sec, 50%.]

• Error Frequency:
– 15% of the zones
– 8% for the 500 most popular zones
– independent of the zone’s location (level)
– varies a lot per TLD
• Impact:
– 70% of the zones with errors lose half or more of the authoritative servers
– 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation

Diminished Server Redundancy

[Figure: the com zone delegates foo.com to two name servers, A.foo.com and B.foo.com.]

foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2

A) Network level: belong to the same subnet
B) Autonomous system level: belong to the same AS
C) Geographic location level: belong to the same city
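The network-level check (servers in the same subnet) can be sketched with Python's standard `ipaddress` module. The `share_prefix` helper is hypothetical; the /24 default is an assumption of this sketch, and the first pair of addresses is illustrative.

```python
import ipaddress


def share_prefix(addresses, prefix_len=24):
    """Return True if every IPv4 address falls in the same /prefix_len network."""
    networks = {
        # strict=False lets a host address stand in for its enclosing network.
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    return len(networks) == 1


same_subnet = share_prefix(["192.0.2.10", "192.0.2.200"])
different_subnets = share_prefix(["1.1.1.1", "2.2.2.2"])
print(same_subnet, different_subnets)
```

The AS-level and city-level checks need external data (routing tables, geolocation) and are not covered by this sketch.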

Diminished Server Redundancy Results
• Error Frequency:
– 45% of all zones have all servers in the same /24 subnet
– 75% of all zones have servers in the same AS
– large & popular zones: better AS and geo diversity
• Impact:
– less than 99.9% availability: all servers in the same /24 subnet
– more than 99.99% availability: 3 servers at different ASs or different cities

Cyclic Zone Dependency (1)

[Figure: the com zone delegates foo.com to two name servers, A.foo.com and B.foo.com.]

foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1

• The A glue RR for B.foo.com (B.foo.com. A 2.2.2.2) is missing
• B.foo.com depends on A.foo.com
• If A.foo.com is unavailable, then B.foo.com is too

Cyclic Zone Dependency (2)

[Figure: the com zone delegates foo.com and bar.com. foo.com is served by A.foo.com and B.bar.com; bar.com is served by A.bar.com and B.foo.com.]

foo.com. NS A.foo.com.
foo.com. NS B.bar.com.
A.foo.com. A 1.1.1.1

bar.com. NS A.bar.com.
bar.com. NS B.foo.com.
A.bar.com. A 2.2.2.2

• The foo.com zone seems correctly configured
• The combination of the foo.com and bar.com zones is wrongly configured
• The B servers depend on the A servers
• If A.foo and A.bar are unavailable, the B addresses are unresolvable
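The effect of the cycle can be checked with a small fixed-point computation: a zone is resolvable if one of its servers is up and that server's address is known, either from glue or from another zone that is already resolvable. This is a minimal sketch of the foo.com/bar.com example above; `reachable_zones` and its parameters are hypothetical names, not part of the study's tooling.

```python
def reachable_zones(zone_ns, glue, server_zone, down=frozenset()):
    """Compute which zones can still be resolved.

    zone_ns:     zone -> list of its name-server names
    glue:        servers whose address is available as glue at the parent
    server_zone: server -> zone that must be resolved to learn its address
    down:        servers that are currently unavailable
    """
    reachable = set()
    changed = True
    while changed:  # repeat until no new zone becomes reachable
        changed = False
        for zone, servers in zone_ns.items():
            if zone in reachable:
                continue
            for server in servers:
                if server in down:
                    continue
                # Usable if it has glue, or its address zone is already reachable.
                if server in glue or server_zone[server] in reachable:
                    reachable.add(zone)
                    changed = True
                    break
    return reachable


zone_ns = {
    "foo.com": ["A.foo.com", "B.bar.com"],
    "bar.com": ["A.bar.com", "B.foo.com"],
}
glue = {"A.foo.com", "A.bar.com"}
server_zone = {
    "A.foo.com": "foo.com", "B.foo.com": "foo.com",
    "A.bar.com": "bar.com", "B.bar.com": "bar.com",
}

all_up = reachable_zones(zone_ns, glue, server_zone)
one_a_down = reachable_zones(zone_ns, glue, server_zone, down={"A.foo.com"})
both_a_down = reachable_zones(
    zone_ns, glue, server_zone, down={"A.foo.com", "A.bar.com"}
)
print(all_up, one_a_down, both_a_down)
```

With both A servers down, neither zone is reachable even though both B servers are still running, which is the lost redundancy the slide describes.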

Cyclic Zone Dependency Results

• Error Frequency:
– 2% of the zones
– None of the 500 most popular zones
• Impact:
– 90% of the zones with cyclic dependency errors lose 25% (or even more) of their servers
– 2 or 4 zones are involved in most errors

Discussion: User-Perceived != System Robustness
• User-perceived robustness:
– Data replication: only one server is needed
– Data caching: temporarily masks infrastructure failures
– Popular zones: fewer configuration errors
• System robustness:
– Fewer available servers: due to inconsistency errors
– Fewer redundant servers: due to dependency errors

Discussion: Why so many errors?

• Superficially: due to operators:
– Unaware of these errors
– Lack of coordination: parent-child zone, secondary servers hosting
• Fundamentally: due to protocol design:
– Lack of mechanisms to handle these errors proactively or reactively
– Design choices that embrace some of them:
  – Name servers are identified by names
  – Glue NS & A records are necessary to set up the DNS tree

Summary
• DNS operational errors are widespread
• DNS operational errors affect availability:
– 50% of the servers lost
– less than 99.9% availability
• DNS operational errors affect performance:
– 1 or even 2 orders of magnitude
• DNS system robustness is lower than user perception
– Due to protocol design, not just due to operator errors