Improving Internet Availability

Availability of Other Services

• Carrier Airlines (2002 FAA Fact Book)
  – 41 accidents, 6.7M departures
  – 99.9993% availability

• 911 Phone service (1993 NRIC report +)
  – 29 minutes per year per line
  – 99.994% availability

• Std. Phone service (various sources)
  – 53+ minutes per line per year
  – 99.99+% availability

Credit: David Andersen job talk
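
The percentages on this slide follow from simple ratios. A minimal sketch, assuming a 365-day year and treating each airline accident as one failed departure (the slide's own framing); the function names are illustrative and not from the talk:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored (assumption)


def availability_from_events(failures: int, attempts: int) -> float:
    """Fraction of attempts that did not fail."""
    return 1.0 - failures / attempts


def availability_from_downtime(minutes_down_per_year: float) -> float:
    """Fraction of the year during which the service was up."""
    return 1.0 - minutes_down_per_year / MINUTES_PER_YEAR


# Figures taken from the slide.
airline = availability_from_events(41, 6_700_000)
e911 = availability_from_downtime(29)
phone = availability_from_downtime(53)

for name, a in [("airline", airline), ("911", e911), ("std. phone", phone)]:
    print(f"{name}: {100 * a:.4f}%")
```

With exactly 53 minutes of downtime the phone figure comes out just under 99.99%, so the slide's "99.99+%" is best read as a rounded value.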

Internet Availability

• Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines”

• More “critical” (or at least availability-centric) applications on the Internet

• At the same time, the Internet is getting more difficult to debug
  – Increasing scale, complexity, disconnection, etc.

Is it possible to get to “5 nines” of availability? If so, how? What role should the network play?
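
To make the gap between 2.5 and 5 "nines" concrete, the sketch below converts a number of nines into yearly downtime. It assumes the usual convention that n nines means an unavailability of 10^-n and a 365-day year; the helper names are illustrative:

```python
import math

HOURS_PER_YEAR = 365 * 24  # leap years ignored (assumption)


def nines(availability: float) -> float:
    """Number of nines for an availability given as a fraction in [0, 1)."""
    return -math.log10(1.0 - availability)


def availability_for(n: float) -> float:
    """Availability (as a fraction) that corresponds to n nines."""
    return 1.0 - 10.0 ** (-n)


def downtime_hours_per_year(n: float) -> float:
    """Expected yearly downtime, in hours, at n nines."""
    return HOURS_PER_YEAR * 10.0 ** (-n)


for n in (2.5, 5):
    print(f"{n} nines: {100 * availability_for(n):.4f}% available, "
          f"{downtime_hours_per_year(n):.2f} hours down per year")
```

Under these assumptions, 2.5 nines is a bit over a day of downtime per year, while 5 nines is a little over five minutes.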

Inherent Availability vs. Reactive Diagnosis

• What happens when a failure occurs?

• (At least) three options
  – Nothing
  – Automatic masking/recovery
  – Diagnosis + Semi-manual intervention

• (Augustin, Renata)

• When is “automatic” recovery appropriate?

• What features for diagnosis should the network provide?

(How) should the network provide inherent availability?

• Idea: compute backup in advance
  – No dynamic routing, just dynamic forwarding
  – End systems (routers, hosts, proxies) detect failures and send hints to deflect packets
  – Kind of like fast reroute…but a bit more extreme

• Various proposals in this space
  – Multi-router configurations, e.g.

Path Splicing: Main Idea

• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration

• Step 2 (Parallelization): Allow traffic to switch between instances at any node in the protocol

Compute multiple forwarding trees per destination.
Allow packets to switch slices midstream.
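
The two steps can be sketched as follows. This is a simplified illustration, not the protocol from the talk: it assumes link weights are perturbed by a random multiplicative factor, that each slice is a shortest-path tree toward one destination, and that a node switches slices only when its current next-hop link is down. All names (`build_slices`, `forward`, the toy graph) are hypothetical.

```python
import heapq
import random


def next_hops(graph, dest):
    """Dijkstra from dest over an undirected graph; maps node -> next hop toward dest."""
    dist, hops, heap = {dest: 0.0}, {}, [(0.0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                hops[v] = u  # v reaches dest through u
                heapq.heappush(heap, (d + w, v))
    return hops


def perturb(graph, rng, spread):
    """Copy of graph with each link weight scaled by a factor in [1, 1 + spread]."""
    out = {u: {} for u in graph}
    for u in graph:
        for v, w in graph[u].items():
            if v not in out[u]:  # perturb each undirected link once
                out[u][v] = out[v][u] = w * (1.0 + spread * rng.random())
    return out


def build_slices(graph, dest, k, seed=0, spread=3.0):
    """Step 1: slice 0 uses the original weights, the others use perturbed copies."""
    rng = random.Random(seed)
    slices = [next_hops(graph, dest)]
    slices += [next_hops(perturb(graph, rng, spread), dest) for _ in range(k - 1)]
    return slices


def forward(slices, src, dest, failed, ttl=16):
    """Step 2: follow the current slice; at a failed link, try the other slices."""
    node, cur, path = src, 0, [src]
    while node != dest and ttl > 0:
        order = [cur] + [i for i in range(len(slices)) if i != cur]
        for i in order:
            hop = slices[i].get(node)
            if hop is not None and frozenset((node, hop)) not in failed:
                cur = i
                break
        else:
            return None  # every slice's next hop is down at this node
        node = hop
        path.append(node)
        ttl -= 1  # hop limit guards against loops caused by slice switching
    return path if node == dest else None


graph = {
    "s": {"a": 1.0, "b": 1.1},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.1, "t": 1.1},
    "t": {"a": 1.0, "b": 1.1},
}
slices = build_slices(graph, "t", k=4)
normal = forward(slices, "s", "t", failed=set())
recovered = forward(slices, "s", "t", failed={frozenset(("s", "a"))})
print(normal, recovered)
```

Without failures the packet stays on slice 0 and takes `s, a, t`. When link `s-a` is down, recovery succeeds only if at least one perturbed slice happens to route `s` through `b`, which is why more slices (or larger perturbations) would be expected to improve reachability.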

Availability: Paths vs. Content

• What definitions of availability are appropriate?
  – Downtime
    • Fraction of time that path exists between endpoints
    • Fraction of time that endpoints can communicate on any path
  – Transfer time
    • How long must I wait to get content?
    • (Perhaps this makes more sense in delay-tolerant networks, bittorrent-style protocols, etc.)

• Some applications depend more on the availability of content than on the uptime/availability of any particular Internet path or host
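
The two downtime definitions can give different answers for the same measurements. A small sketch, using a made-up probe trace in which each sample records whether a default path and one alternate path were up (the trace and all names are hypothetical):

```python
def fraction_up(samples):
    """Fraction of samples in which the observed condition held."""
    return sum(samples) / len(samples)


# Hypothetical trace: one entry per probe interval, True means the path was usable.
trace = [
    {"default": True, "alt": True},
    {"default": False, "alt": True},
    {"default": False, "alt": False},
    {"default": True, "alt": True},
]

# Definition 1: availability of one particular path between the endpoints.
default_path = fraction_up([s["default"] for s in trace])

# Definition 2: the endpoints can communicate on any path.
any_path = fraction_up([any(s.values()) for s in trace])

print(default_path, any_path)  # 0.5 0.75
```

The second number can never be lower than the first, and the gap between them is the availability that path diversity could recover in principle.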

Diagnosis

• User or operator takes over when the network doesn’t “fix things automatically”

• Diagnosis will never be fully automatic
  – Task: put functions in place to make network (mal)functions as intuitive as possible
  – Make the operators (or users) more efficient…

(How) should the network support diagnosis?

• More network support means potentially more information to users and operators
  – …potentially at the cost of performance
  – Forwarding performance, filters, or measurement/monitoring?

• What functions should the router (or other on-path elements) provide?

Data-Plane Accountability

• Problem: Network elements drop packets, fail, and otherwise give rise to poor performance

• One Solution: In-Band Path Diagnosis

• Routers keep track of number of packets seen per flow

• Each router stamps each packet with current flow counter value

• If current counter value does not equal router’s expected packet count for that flow, router marks packet

[Figure: High-level overview of the packet layout: IP header, new shim header, transport header]
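
The counter-and-mark idea can be sketched as a small simulation. This is one possible reading of the bullets above, not the actual design: it assumes the sender stamps its own per-flow send count, each router compares the stamped value with its own count plus any loss it has already reported, and packets are never reordered. The class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Packet:
    flow: str
    counter: int  # shim-header field: previous hop's packet count for this flow
    marked_by: list = field(default_factory=list)


class Router:
    def __init__(self, name):
        self.name = name
        self.seen = defaultdict(int)  # packets seen per flow
        self.gap = defaultdict(int)   # upstream losses already reported per flow

    def process(self, pkt):
        self.seen[pkt.flow] += 1
        expected = self.seen[pkt.flow] + self.gap[pkt.flow]
        if pkt.counter != expected:
            # Loss between the previous hop and this router: mark once, remember the gap.
            pkt.marked_by.append(self.name)
            self.gap[pkt.flow] = pkt.counter - self.seen[pkt.flow]
        pkt.counter = self.seen[pkt.flow]  # stamp own count for the next hop


def send(routers, flow, n, drops):
    """Send n packets; drops holds (router_index, seq) pairs lost on the link into that router."""
    delivered = []
    for seq in range(1, n + 1):
        pkt = Packet(flow, counter=seq)  # sender stamps its own send count
        for i, router in enumerate(routers):
            if (i, seq) in drops:
                break  # packet lost before reaching routers[i]
            router.process(pkt)
        else:
            delivered.append((seq, pkt.marked_by))
    return delivered


path = [Router("R1"), Router("R2"), Router("R3")]
result = send(path, "flow-1", n=5, drops={(1, 3)})  # packet 3 lost between R1 and R2
print(result)
```

In this run the first packet to arrive after the loss (packet 4) is marked by R2 only, which localizes the drop to the link into R2; R3 sees counters that agree with its own and marks nothing.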

Scalability vs. Reactivity

• Various ways to get more data
  – More frequent monitoring
  – More data types
  – More vantage points

• Advantages
  – More paths, links, services, etc.
  – Potentially faster reaction

• But…data reduction is key
  – Operators/users are not at a loss for data about the network. They need ways to process it.
  – More monitoring data means more overhead (storage, bandwidth, etc.)

Active vs. Passive Monitoring

• Active monitoring can provide more direct indicators of path quality, service availability, etc.
  – But…can’t monitor all possible paths

• What combination of active and passive monitoring is appropriate?

What role should end systems/cooperation play?

• Various previous work in “peer-to-peer” troubleshooting
  – Tomography
  – NetProfiler / CoopNet (Padmanabhan)
  – Cooperative troubleshooting (Wang)
  – Sharing IDS logs

• In what contexts do these make sense?
  – Internet
  – Wireless settings

Medium-Sized Challenges

Some Problems

• Competing business interests threaten
  – Stability
  – Connectivity

• Malicious hosts and network entities threaten
  – Trust
  – Resource allocation

• Growing scale threatens
  – Robust, secure, efficient network operations

• Governments threaten
  – Free speech
  – Privacy
  – Efficiency

Problem: Insecurity

• Can’t trust the control plane
  – BGP: Route hijacks (intentional and unintentional)
  – DNS: Insecure name resolution

• Can’t trust the data plane
  – No guarantee for where packets will go

• No accountability or auditing capabilities

• No strong forms of identity

Security: To-Do

• Data plane security
  – No assurances about where traffic will actually go
  – Monitoring/stemming unwanted traffic is hard

• Control plane security
  – Defense against route hijacks, etc.

• Accountability (spoofing prevention, auditing, etc.)
  – For data-plane performance
  – For unwanted traffic

Problem: Manageability

• Too easy to misconfigure the network

• Correct operation depends on correct configuration
  – Can future networks be intrinsically robust?

Management: To-Do

• Automated provisioning

• Configuration, management, and maintenance at a higher layer of abstraction

• Fast, distributed fault detection

• Where possible, eliminate “knobs” without eliminating flexibility

Problem: Scale

• Increasing number of users, end hosts, etc.

• Network connectivity has become a commodity
  – At the same time, the network is becoming more difficult to manage
  – Network providers must keep adding customers
  – Cost of bandwidth, equipment is plummeting
  – Management costs begin to dominate

Scale: To-Do

• Scalable addressing that permits multihoming
  – Traffic engineering, fast updates, etc.
  – Related topic: mobility

• Scalable mechanisms for path diversity (path selection, etc.)

Designing for Selfishness: Goals

• Providers, producers and consumers must benefit from participating
  – Without “eyeballs”, content has no value
  – Without content, the “eyeballs” will bail out
  – Without a network, eyeballs can’t meet content
  – Without content or eyeballs, no need for a network

Internet Wish-List

• Availability
• Accountability
• Mobility
• Manageability/Intrinsic Correctness
• Support for monitoring
• Assurances about traffic

What Has Worked?

• Packet switching
• Layering
• Congestion control

What Might We Revisit?

• Single-path routing

• Monitoring support
  – Better traffic sampling algorithms to cope with evolving requirements (it’s no longer just about billing)

• Naming
  – Poor support for mobility
  – Poor support for naming content

• Addressing
  – Very poor correspondence to identity

• Business models/selfishness

Possible Outcome: Many Internets

• Run many different networks simultaneously on the same infrastructure
  – No clear distinction between architecture and services
  – Develop specialized “architectures” for specialized applications

• Application or topology-specific routing protocols

• Virtualization of physical resources as a tool for building new networks
  – Virtual link establishment and virtual routers
  – Substrate for deploying overlays is new “waist”
  – This substrate is the new Internet