Improving Internet Availability

Availability of Other Services

• Carrier Airlines (2002 FAA Fact Book)
  – 41 accidents, 6.7M departures
  – 99.9993% availability

• 911 Phone service (1993 NRIC report +)
  – 29 minutes per year per line
  – 99.994% availability

• Std. Phone service (various sources)
  – 53+ minutes per line per year
  – 99.99+% availability

Credit: David Andersen job talk

Internet Availability

• Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines”

• More “critical” (or at least availability-centric) applications on the Internet

• At the same time, the Internet is getting more difficult to debug
  – Increasing scale, complexity, disconnection, etc.

Is it possible to get to “5 nines” of availability? If so, how? What role should the network play?
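The "nines" figures on these slides can be related to downtime with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, a 365-day year is assumed, and fractional nines (such as 2.5) are taken to mean -log10(1 - availability), which is one common convention rather than something the slides define.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def availability_from_downtime(minutes_down_per_year):
    """Fraction of the year a service is up, given its yearly downtime in minutes."""
    return 1.0 - minutes_down_per_year / MINUTES_PER_YEAR


def nines(availability):
    """Number of 'nines' under the assumed convention -log10(1 - availability)."""
    return -math.log10(1.0 - availability)


def downtime_minutes_per_year(n_nines):
    """Yearly downtime budget, in minutes, for a given number of nines."""
    return MINUTES_PER_YEAR * 10 ** (-n_nines)


if __name__ == "__main__":
    # 911 service figure from the slide: 29 minutes per year per line.
    print(f"{availability_from_downtime(29):.5%}")
    # Downtime budgets for the Internet today vs. the "5 nines" target.
    print(f"{downtime_minutes_per_year(2.5):.0f} min/year at 2.5 nines")
    print(f"{downtime_minutes_per_year(5):.2f} min/year at 5 nines")
```

Under these assumptions, 2.5 nines corresponds to roughly 1,660 minutes (about 28 hours) of downtime per year, while 5 nines allows a little over 5 minutes.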

Inherent Availability vs. Reactive Diagnosis

• What happens when a failure occurs?

• (At least) three options
  – Nothing
  – Automatic masking/recovery
  – Diagnosis + Semi-manual intervention

• (Augustin, Renata)

• When is “automatic” recovery appropriate?

• What features for diagnosis should the network provide?

(How) should the network provide inherent availability?

• Idea: compute backup in advance
  – No dynamic routing, just dynamic forwarding
  – End systems (routers, hosts, proxies) detect failures and send hints to deflect packets
  – Kind of like fast reroute…but a bit more extreme

• Various proposals in this space
  – Multi-router configurations, e.g.

Path Splicing: Main Idea

• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration

• Step 2 (Parallelization): Allow traffic to switch between instances at any node in the protocol


Compute multiple forwarding trees per destination. Allow packets to switch slices midstream.
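The two steps can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the protocol itself: the graph is an undirected dict of link weights, perturbation multiplies each weight by a random factor (the `spread` parameter is hypothetical), and a node switches slices only when its current next hop crosses a failed link. The slides do not specify how or when packets switch, so that retry policy is purely illustrative.

```python
import heapq
import random


def next_hops(graph, dest):
    """Dijkstra rooted at dest; returns {node: next hop toward dest}."""
    dist = {dest: 0.0}
    hop = {}
    heap = [(0.0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                hop[v] = u  # v forwards to u to reach dest
                heapq.heappush(heap, (nd, v))
    return hop


def perturb(graph, rng, spread=0.5):
    """Step 1: copy the graph with each link weight inflated by up to `spread`."""
    out = {u: {} for u in graph}
    for u in graph:
        for v, w in graph[u].items():
            if v not in out[u]:  # perturb each undirected link once
                pw = w * (1.0 + rng.uniform(0.0, spread))
                out[u][v] = pw
                out[v][u] = pw
    return out


def build_slices(graph, dest, k, seed=0):
    """One forwarding tree per slice; slice 0 uses the unperturbed weights."""
    rng = random.Random(seed)
    slices = [next_hops(graph, dest)]
    for _ in range(k - 1):
        slices.append(next_hops(perturb(graph, rng), dest))
    return slices


def forward(slices, src, dest, failed_links, max_hops=32):
    """Step 2: forward hop by hop, switching slices when the next hop is down."""
    path, node, current = [src], src, 0
    while node != dest and len(path) <= max_hops:
        for offset in range(len(slices)):
            cand = (current + offset) % len(slices)
            nh = slices[cand].get(node)
            if nh is not None and frozenset((node, nh)) not in failed_links:
                current = cand
                break
        else:
            return None  # every slice points across a failed link
        node = nh
        path.append(node)
    return path if node == dest else None


if __name__ == "__main__":
    g = {
        "A": {"B": 1.0, "C": 1.2},
        "B": {"A": 1.0, "D": 1.0},
        "C": {"A": 1.2, "D": 1.2},
        "D": {"B": 1.0, "C": 1.2},
    }
    slices = build_slices(g, "D", k=4)
    print(forward(slices, "A", "D", failed_links={frozenset(("A", "B"))}))
```

Whether a failure can be routed around depends on how different the perturbed trees turn out to be, which is why the amount of perturbation and the number of slices matter. The `max_hops` bound is there because switching slices midstream can create forwarding loops.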

Availability: Paths vs. Content

• What definitions of availability are appropriate?
  – Downtime
    • Fraction of time that path exists between endpoints
    • Fraction of time that endpoints can communicate on any path
  – Transfer time
    • How long must I wait to get content?
    • (Perhaps this makes more sense in delay-tolerant networks, bittorrent-style protocols, etc.)

• Some applications depend more on availability of content, rather than uptime/availability of any particular Internet path or host
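The two downtime-based definitions can give quite different numbers for the same network. A minimal sketch, assuming a hypothetical measurement format in which each sample is the set of paths between two endpoints that were up during one interval:

```python
def path_availability(samples, path):
    """Fraction of intervals in which one specific path was up."""
    return sum(1 for up in samples if path in up) / len(samples)


def any_path_availability(samples):
    """Fraction of intervals in which the endpoints could communicate on any path."""
    return sum(1 for up in samples if up) / len(samples)


if __name__ == "__main__":
    # Four measurement intervals; "p1" and "p2" are made-up path names.
    samples = [{"p1", "p2"}, {"p2"}, set(), {"p1"}]
    print(path_availability(samples, "p1"))
    print(any_path_availability(samples))
```

With these made-up samples, the specific path is up half the time while the endpoints can communicate three quarters of the time, which illustrates why the choice of definition matters.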

Diagnosis

• User or operator takes over when the network doesn’t “fix things automatically”

• Diagnosis will never be fully automatic
  – Task: put functions in place to make network (mal)functions as intuitive as possible
  – Make the operators (or users) more efficient…

(How) should the network support diagnosis?

• More network support means potentially more information to users and operators
  – …potentially at the cost of performance
  – Forwarding performance, filters, or measurement/monitoring?

• What functions should the router (or other on-path elements) provide?

Data-Plane Accountability

• Problem: Network elements drop packets, fail, and otherwise give rise to poor performance

• One Solution: In-Band Path Diagnosis

• Routers keep track of number of packets seen per flow

• Each router stamps each packet with current flow counter value

• If current counter value does not equal router’s expected packet count for that flow, router marks packet

Packet layout (figure): IP header | new shim header | transport header
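The counter-stamping idea can be simulated in a few lines. This sketch is one reading of the bullets above, not the actual design: the shim header is modeled as a single `counter` field plus a `marked_by` field, a router compares the counter stamped by the previous hop with its own per-flow count, and on a mismatch it marks the packet and resynchronizes so that one loss is reported once. The class names, the resynchronization step, and the `send` helper are all assumptions; reordering and counter wraparound are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    flow_id: str
    counter: Optional[int] = None    # shim header: stamp from the previous hop
    marked_by: Optional[str] = None  # first router that saw a mismatch


class Router:
    def __init__(self, name):
        self.name = name
        self.seen = defaultdict(int)  # packets seen per flow

    def process(self, pkt):
        self.seen[pkt.flow_id] += 1
        expected = self.seen[pkt.flow_id]
        if pkt.counter is not None and pkt.counter != expected:
            # Upstream has seen a different number of packets than we have,
            # so something was lost (or injected) on the way here.
            if pkt.marked_by is None:
                pkt.marked_by = self.name
            self.seen[pkt.flow_id] = pkt.counter  # resync (assumption)
        pkt.counter = self.seen[pkt.flow_id]      # stamp for the next hop
        return pkt


def send(path, pkt, drop_before=None):
    """Push pkt along path; drop it just before the router named drop_before."""
    for router in path:
        if router.name == drop_before:
            return None
        router.process(pkt)
    return pkt


if __name__ == "__main__":
    path = [Router("R1"), Router("R2"), Router("R3")]
    for i in range(1, 6):
        # The third packet is lost on the link between R1 and R2.
        out = send(path, Packet("flow-a"), drop_before="R2" if i == 3 else None)
        print(i, out)
```

In this model the packet that follows a loss arrives marked by the first router downstream of the faulty link, which is the kind of localization hint an operator or end host could act on.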

High-level Overview

Scalability vs. Reactivity

• Various ways to get more data
  – More frequent monitoring
  – More data types
  – More vantage points

• Advantages
  – More paths, links, services, etc.
  – Potentially faster reaction

• But…data reduction is key
  – Operators/users are not at a loss for data about the network. They need ways to process it.
  – More monitoring data means more overhead (storage, bandwidth, etc.)

Active vs. Passive Monitoring

• Active monitoring can provide more direct indicators of path quality, service availability, etc.
  – But…can’t monitor all possible paths

• What combination of active and passive monitoring is appropriate?

What role should end systems/cooperation play?

• Various previous work in “peer-to-peer” troubleshooting
  – Tomography
  – NetProfiler / CoopNet (Padmanabhan)
  – Cooperative troubleshooting (Wang)
  – Sharing IDS logs

• In what contexts do these make sense?
  – Internet
  – Wireless settings

Medium-Sized Challenges

Some Problems

• Competing business interests threaten
  – Stability
  – Connectivity

• Malicious hosts and network entities threaten
  – Trust
  – Resource allocation

• Growing scale threatens
  – Robust, secure, efficient network operations

• Governments threaten
  – Free speech
  – Privacy
  – Efficiency

Problem: Insecurity

• Can’t trust the control plane
  – BGP: Route hijacks (intentional and unintentional)
  – DNS: Insecure name resolution

• Can’t trust the data plane
  – No guarantee for where packets will go

• No accountability or auditing capabilities

• No strong forms of identity

Security: To-Do

• Data plane security
  – No assurances about where traffic will actually go
  – Monitoring/stemming unwanted traffic is hard

• Control plane security
  – Defense against route hijacks, etc.

• Accountability (spoofing prevention, auditing, etc.)
  – For data-plane performance
  – For unwanted traffic

Problem: Manageability

• Too easy to misconfigure the network

• Correct operation depends on correct configuration
  – Can future networks be intrinsically robust?

Management: To-Do

• Automated provisioning

• Configuration, management, and maintenance at a higher layer of abstraction

• Fast, distributed fault detection

• Where possible, eliminate “knobs” without eliminating flexibility

Problem: Scale

• Increasing number of users, end hosts, etc.

• Network connectivity has become a commodity
  – At the same time, the network is becoming more difficult to manage
  – Network providers must keep adding customers
  – Cost of bandwidth, equipment is plummeting
  – Management costs begin to dominate

Scale: To-Do

• Scalable addressing that permits multihoming
  – Traffic engineering, fast updates, etc.
  – Related topic: mobility

• Scalable mechanisms for path diversity (path selection, etc.)

Designing for Selfishness: Goals

• Providers, producers and consumers must benefit from participating
  – Without “eyeballs”, content has no value
  – Without content, the “eyeballs” will bail out
  – Without a network, eyeballs can’t meet content
  – Without content or eyeballs, no need for a network

Internet Wish-List

• Availability

• Accountability

• Mobility

• Manageability/Intrinsic Correctness

• Support for monitoring

• Assurances about traffic

What Has Worked?

• Packet switching

• Layering

• Congestion control

What Might We Revisit?

• Single-path routing

• Monitoring support
  – Better traffic sampling algorithms to cope with evolving requirements (it’s no longer just about billing)

• Naming
  – Poor support for mobility
  – Poor support for naming content

• Addressing
  – Very poor correspondence to identity

• Business models/selfishness

Possible Outcome: Many Internets

• Run many different networks simultaneously on the same infrastructure
  – No clear distinction between architecture and services
  – Develop specialized “architectures” for specialized applications

• Application or topology-specific routing protocols

• Virtualization of physical resources as a tool for building new networks
  – Virtual link establishment and virtual routers
  – Substrate for deploying overlays is new “waist”
  – This substrate is the new Internet
