46
Digital Communications II, CUCL 23/06/22 23/06/22 IP Network Management IP Network Management Richard Mortier Richard Mortier

Guest Lecture by Dr Richard Mortier, MSR

Embed Size (px)

Citation preview

Page 1: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

14/04/2314/04/23

IP Network ManagementIP Network Management

Richard MortierRichard Mortier

Page 2: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction

• AbstractionsAbstractions

• IP network componentsIP network components

• IP network management protocolsIP network management protocols

• Pulling it all togetherPulling it all together

• An alternative approachAn alternative approach

Page 3: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction– What’s it all about then?What’s it all about then?

• AbstractionsAbstractions

• IP network componentsIP network components

• IP network management protocolsIP network management protocols

• Pulling it all togetherPulling it all together

• An alternative approachAn alternative approach

Page 4: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

What is What is network managementnetwork management??

• One point-of-view: a large field full of acronymsOne point-of-view: a large field full of acronyms– EMS, TMN, NE, CMIP, CMISE, OSS, AN.1, TL1, EML, EMS, TMN, NE, CMIP, CMISE, OSS, AN.1, TL1, EML,

FCAPS, ITU, ...FCAPS, ITU, ...

– (Don’t ask me what all of those mean, I don’t care!)(Don’t ask me what all of those mean, I don’t care!)

• From question.com:From question.com:– In 1989, a random of the journalistic persuasion asked In 1989, a random of the journalistic persuasion asked

hacker Paul Boutin “What do you think will be the hacker Paul Boutin “What do you think will be the biggest problem in computing in the 90s?” Paul’s biggest problem in computing in the 90s?” Paul’s straight-faced response: “There are only 17,000 three-straight-faced response: “There are only 17,000 three-letter acronyms.”letter acronyms.”

• We will ignore most of them We will ignore most of them

Page 5: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

What is What is network managementnetwork management??

• Computer networks are considered to have Computer networks are considered to have three operating timescalesthree operating timescales– DataData: packet forwarding [μs, ms]: packet forwarding [μs, ms]

– ControlControl: flows/connections [ secs, mins ]: flows/connections [ secs, mins ]

– ManagementManagement: aggregates, networks : aggregates, networks [ hours, days ][ hours, days ]

• ……so we’re concerned with “the network” so we’re concerned with “the network” rather than particular devices or protocolsrather than particular devices or protocols

• Standardization is key!Standardization is key!

Page 6: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction

• AbstractionsAbstractions– ISO FCAPS, TMN EMS, ATMISO FCAPS, TMN EMS, ATM

• IP network componentsIP network components

• IP network management protocolsIP network management protocols

• Pulling it all togetherPulling it all together

• An alternative approachAn alternative approach

Page 7: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

ISO FCAPS: ISO FCAPS: functionalfunctional separation separation

• FFaultault– Recognize, isolate, correct, log faultsRecognize, isolate, correct, log faults

• CConfigurationonfiguration– Collect, store, track configurationsCollect, store, track configurations

• AAccountingccounting– Collect statistics, bill users, enforce quotasCollect statistics, bill users, enforce quotas

• PPerformanceerformance– Monitor trends, set thresholds, trigger alarmsMonitor trends, set thresholds, trigger alarms

• SSecurityecurity– Identify, secure, manage risksIdentify, secure, manage risks

Page 8: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

TMN EMS: TMN EMS: administrativeadministrative separation separation

• Telecommunications Management NetworkTelecommunications Management Network• Element Management SystemElement Management System

• ““...simple ...simple butbut elegant...” (!) elegant...” (!)– (my emphasis)(my emphasis)– (often the two go together)(often the two go together)

• NEL: network elements (switches, transmission systems)NEL: network elements (switches, transmission systems)• EML: element management (devices, links)EML: element management (devices, links)• NML: network management (capacity, congestion)NML: network management (capacity, congestion)• SML: service management (SLAs, time-to-market)SML: service management (SLAs, time-to-market)• BML: business management (RoI, market share, blah)BML: business management (RoI, market share, blah)

Page 9: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

The B-ISDN reference modelThe B-ISDN reference model

• Asynchronous Transfer Mode Asynchronous Transfer Mode “cube”“cube”– See IAP lectures, maybe See IAP lectures, maybe

• Plane Plane management…management…– The whole networkThe whole network

• ……vs vs layer layer management management – Specific layersSpecific layers

• TopologyTopology• ConfigurationConfiguration• FaultFault• OperationsOperations• AccountingAccounting• PerformancePerformance

management plane

user planecontrol plane

higher layers

ATM layer

physical layer

plane m

anagem

ent

layer managem

ent

ATM adaptation layer

higher layers

Page 10: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Network managementNetwork management

• Models of general communication networksModels of general communication networks– Tend to be quite abstract and Tend to be quite abstract and exceedingly exceedingly tedious!tedious!

– Many practitioners still seem excited about OO Many practitioners still seem excited about OO programming, WIMP interfaces, etcprogramming, WIMP interfaces, etc

– ……probably because implementation is hard due to so probably because implementation is hard due to so many excessively long and complex standards!many excessively long and complex standards!

• My view: basic “need-to-know” requirements areMy view: basic “need-to-know” requirements are1.1.What should be happening? [ c ]What should be happening? [ c ]

2.2.What is happening? [ f, p, a ]What is happening? [ f, p, a ]

3.3.What shouldn’t be happening? [ f, s ]What shouldn’t be happening? [ f, s ]

4.4.What will be happening? [ p, a ]What will be happening? [ p, a ]

Page 11: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Network managementNetwork management

• We’ll concentrate on IP networksWe’ll concentrate on IP networks– Still acronym city: ICMP, SNMP, MIB, RFC Still acronym city: ICMP, SNMP, MIB, RFC

– Sample size: 10Sample size: 1022 routers, 10 routers, 1055 hosts hosts

• We’ll concentrate on the network coreWe’ll concentrate on the network core– Routers, not hostsRouters, not hosts

• We’ll ignore We’ll ignore “service management”“service management”– DNS, AD, file stores, etcDNS, AD, file stores, etc

Page 12: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction

• AbstractionsAbstractions

• IP network componentsIP network components– IP, networks, routers IP, networks, routers

• IP network management protocolsIP network management protocols

• Pulling it all togetherPulling it all together

• An alternative approachAn alternative approach

Page 13: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

IP primer (you probably know all this)IP primer (you probably know all this)

• Destination-routed packets – no connectionsDestination-routed packets – no connections– Time-to-live field: allow removal of looping packetsTime-to-live field: allow removal of looping packets

• Routers forward packets based on routeing tablesRouters forward packets based on routeing tables– Tables populated by routeing protocolsTables populated by routeing protocols

• Routers and protocols operate independently Routers and protocols operate independently – ……although protocols aim to build consistent statealthough protocols aim to build consistent state

• RFCs ~= standardsRFCs ~= standards– Often much looser semantics than e.g. ISO, ITU Often much looser semantics than e.g. ISO, ITU

standardsstandards

– Compare for example OSPF [RFC2327] and IS-IS Compare for example OSPF [RFC2327] and IS-IS [RFC1142, RFC1195], two link-state routeing protocols[RFC1142, RFC1195], two link-state routeing protocols

Page 14: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

So, how do you build an IP network?So, how do you build an IP network?

1.1. Buy (lease) routersBuy (lease) routers

2.2. Buy (lease) fibreBuy (lease) fibre

3.3. Connect them all togetherConnect them all together

4.4. Configure routersConfigure routers

5.5. Configure end-systemsConfigure end-systems

$1m? $2m? for a new, populated, backbone router!

Wayleaves = $$$Be a landowner!

Correctly.For now.

Mwuhahaha.

Someone else’s can of worms.

Page 15: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Multiple router flavoursMultiple router flavours

• CoreCore– OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps)OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps)– Big, fat, fast, expensiveBig, fat, fast, expensive– E.g. Cisco HFR, Juniper T-640E.g. Cisco HFR, Juniper T-640

• HFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at $450kHFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at $450k

• Transit/Peering-facingTransit/Peering-facing– OC-3 and up, good GigE densityOC-3 and up, good GigE density– ACLs, full-on BGP, uRPF, accountingACLs, full-on BGP, uRPF, accounting

• Customer-facingCustomer-facing– FR/ATM/…FR/ATM/…– Feature set as above, plus fancy queues, etcFeature set as above, plus fancy queues, etc

• Broadband aggregatorBroadband aggregator– High scalability: sessions, ports, reconnectionsHigh scalability: sessions, ports, reconnections– Feature set as aboveFeature set as above

• Customer-premises (CPE)Customer-premises (CPE)– 100Mbps, maybe100Mbps, maybe– NAT, DHCP, firewall, wireless, VoIP, …NAT, DHCP, firewall, wireless, VoIP, …– Low cost, low-end, perhaps just software on a PCLow cost, low-end, perhaps just software on a PC

A sample taxonomy

Cisco CRS-1Multi-shelf system

Page 16: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Network designNetwork design

• Whose network?Whose network?– ISPs, IXs, enterprise, campusISPs, IXs, enterprise, campus– POPs, DCs POPs, DCs

• Many designs: flat, hierarchical, hybrids, multiple scalesMany designs: flat, hierarchical, hybrids, multiple scales• Many constraintsMany constraints

– BusinessBusiness• Backwards compatibility. Who to connect. Peering. Backwards compatibility. Who to connect. Peering.

– TechnologyTechnology• Power – directly (24x7 operation) and indirectly (cooling)Power – directly (24x7 operation) and indirectly (cooling)• Port density Port density vs.vs. raw bandwidth raw bandwidth• Software reliabilitySoftware reliability• Hardware/software capabilityHardware/software capability

– Addressing schemes for scalability, summarizationAddressing schemes for scalability, summarization– Can’t run feature X with feature Y on vendor C in network size NCan’t run feature X with feature Y on vendor C in network size N

– Connectivity/resiliencyConnectivity/resiliency• ““All core routers connect to at least 2 other core routers”All core routers connect to at least 2 other core routers”• ““All edge routers connect to at least 2 core routers”All edge routers connect to at least 2 core routers”

Page 17: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Router configurationRouter configuration

• InitializationInitialization– Name the router, setup boot options, setup Name the router, setup boot options, setup

authentication optionsauthentication options• Configure interfacesConfigure interfaces

– Loopback, ethernet, fibre, ATMLoopback, ethernet, fibre, ATM– Subnet/mask, filters, static routesSubnet/mask, filters, static routes– Shutdown (or not), queueing options, full/half duplexShutdown (or not), queueing options, full/half duplex

• Configure routeing protocols (OSPF, BGP, IS-IS, …)Configure routeing protocols (OSPF, BGP, IS-IS, …)• Process number, addresses to accept routes from, networks to Process number, addresses to accept routes from, networks to

advertiseadvertise

• Access lists, filters, ...Access lists, filters, ...– Numeric id, permit/deny, subnet/mask, protocol, portNumeric id, permit/deny, subnet/mask, protocol, port

• Route-maps, matching Route-maps, matching routesroutes rather than data traffic rather than data traffic• Other configuration aspects: traps, syslog, etcOther configuration aspects: traps, syslog, etc• (Oh, and switch configuration is about as painful)(Oh, and switch configuration is about as painful)

Page 18: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Router configuration fragmentsRouter configuration fragments

hostname FOOBAR!boot system flash slot0:a-boot-image.binboot system flash bootflash:logging buffered 100000 debugginglogging console informationalaaa new-model aaa authentication login default tacacs local aaa authentication login consoleport none aaa authentication ppp default if-needed tacacs aaa authorization network tacacs !ip tftp source-interface Loopback0no ip domain-lookupip name-server 10.34.56.78!ip multicast-routingip dvmrp route-limit 7000ip cef distributed

interface Loopback0 description router-1.network.corp.com ip address 10.65.21.43 255.255.255.255!interface FastEthernet0/0/0 description Link to New York ip address 10.65.43.21 255.255.255.128 ip access-group 175 in ip helper-address 10.65.12.34 ip pim sparse-mode ip cgmp ip dvmrp accept-filter 98 neighbor-list 99 full-duplex!interface FastEthernet4/0/0 no ip address ip access-group 183 in ip pim sparse-mode ip cgmp shutdown full-duplex

router ospf 2 log-adjacency-changes passive-interface FastEthernet0/0/0 passive-interface FastEthernet0/1/0 passive-interface FastEthernet1/0/0 passive-interface FastEthernet1/1/0 passive-interface FastEthernet2/0/0 passive-interface FastEthernet2/1/0 passive-interface FastEthernet3/0/0 network 10.65.23.45 0.0.0.255 area 1.0.0.0 network 10.65.34.56 0.0.0.255 area 1.0.0.0 network 10.65.43.0 0.0.0.127 area 1.0.0.0

access-list 24 remark Mcast ACLaccess-list 24 permit 239.255.255.254access-list 24 permit 224.0.1.111access-list 24 permit 239.192.0.0 0.3.255.255access-list 24 permit 232.192.0.0 0.3.255.255access-list 24 permit 224.0.0.0 0.0.0.255access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff

tftp-server slot1:some-other-image.bintacacs-server host 10.65.0.2tacacs-server key xxxxxxxxrmon event 1 trap Trap1 description "CPU Utilization>75%" owner configrmon event 2 trap Trap2 description "CPU Utilization>95%" owner config

Page 19: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

• Lots of large, fragile text files Lots of large, fragile text files – 00s/000s routers, 00s/000s lines per config00s/000s routers, 00s/000s lines per config– Errors are hard to find and have non-obvious resultsErrors are hard to find and have non-obvious results– Router configuration also editable on-lineRouter configuration also editable on-line– Order matters!Order matters!

• How to keep track of them all?How to keep track of them all?– Naming schemes, directory trees, CVS, ssh upload and Naming schemes, directory trees, CVS, ssh upload and

atomic commit to routeratomic commit to router– Perhaps even a proper databasePerhaps even a proper database

• State of the art is pretty basicState of the art is pretty basic– Few tools to check consistency, design goalsFew tools to check consistency, design goals– Generally generate configurations from templates and Generally generate configurations from templates and

have human-intensive process to control access to have human-intensive process to control access to running configsrunning configs

• Topic of current research [Feamster et al]Topic of current research [Feamster et al]

Router configurationRouter configuration

This counts as advanced!

Page 20: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction

• AbstractionsAbstractions

• IP network componentsIP network components

• IP network management protocolsIP network management protocols– ICMP, SNMP, NetFlowICMP, SNMP, NetFlow

• Pulling it all togetherPulling it all together

• An alternative approachAn alternative approach

Page 21: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

ICMPICMP

• Internet Control Message Protocol [RFC792]Internet Control Message Protocol [RFC792]– IP protocol #1IP protocol #1

– In-band “control”In-band “control”

• Variety of message typesVariety of message types– echo/echo reply echo/echo reply

[ [ PING (packet internet groper) PING (packet internet groper) ]]

– time exceeded [ time exceeded [ TRACEROUTETRACEROUTE ] ]

– destination unreachable, redirectdestination unreachable, redirect

– source quenchsource quench

Page 22: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Ping (Packet INternet Groper)Ping (Packet INternet Groper)

• Test for livenessTest for liveness– ……also used to measure (round-trip) latencyalso used to measure (round-trip) latency

• Send ICMP echo Send ICMP echo

• Valid IP host [RFC1122, RFC1123] Valid IP host [RFC1122, RFC1123] mustmust reply with ICMP echo responsereply with ICMP echo response

• Subnet PING?Subnet PING?– Useful but often not available/deprecatedUseful but often not available/deprecated

– ““ACK” implosion could be a problemACK” implosion could be a problem

– RFCs ~= standardsRFCs ~= standards

Page 23: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

TracerouteTraceroute

• Which route do my packets take to their destination?Which route do my packets take to their destination?– Send UDP packets with increasing time-to-live valuesSend UDP packets with increasing time-to-live values– Compliant IP host must respond with ICMP “time exceeded”Compliant IP host must respond with ICMP “time exceeded”– Triggers each host along path to so respondTriggers each host along path to so respond

• Not quite that simpleNot quite that simple– One router, many IP addresses: which source address?One router, many IP addresses: which source address?– Router control processor, inbound or outbound interfaceRouter control processor, inbound or outbound interface– Asymmetric routes (return path != outbound path)Asymmetric routes (return path != outbound path)– Routes changeRoutes change

• Do we want full-mesh host-host routes anyway?!Do we want full-mesh host-host routes anyway?!– Size of data set, amount of probe trafficSize of data set, amount of probe traffic– This is topology, what about load on links?This is topology, what about load on links?

Page 24: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

SNMPSNMP

• Protocol to manage information tables at devicesProtocol to manage information tables at devices• Provides get, set, trap, notify operationsProvides get, set, trap, notify operations

– get, set: read, write valuesget, set: read, write values– trap: signal a condition (e.g. threshold exceeded)trap: signal a condition (e.g. threshold exceeded)– notify: reliable trapnotify: reliable trap

• Complexity mostly in the MIB designComplexity mostly in the MIB design– Some standard tables, but many vendor specificSome standard tables, but many vendor specific– Non-critical, so often tables populated incorrectlyNon-critical, so often tables populated incorrectly– Many tens of MIBs (thousands of lines) per deviceMany tens of MIBs (thousands of lines) per device– Different versions, different data, different semantics Different versions, different data, different semantics

• Yet another configuration tracking problemYet another configuration tracking problem– Inter-relationships between MIBsInter-relationships between MIBs

Page 25: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

IPFIXIPFIX

• IETF working groupIETF working group– Export of flow based data out of IP network devicesExport of flow based data out of IP network devices

– Developing suitable protocol from Cisco NetFlow™ v9Developing suitable protocol from Cisco NetFlow™ v9

– [RFC3954, RFC3955][RFC3954, RFC3955]

• Statistics reportingStatistics reporting– Setup templateSetup template

– Send data records matching templateSend data records matching template

• Many variables Many variables – Packet/flow counters, rule matches, quite flexiblePacket/flow counters, rule matches, quite flexible

Page 26: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction

• AbstractionsAbstractions

• IP network componentsIP network components

• IP network management protocolsIP network management protocols

• Pulling it all togetherPulling it all together– Network mapping, statistics gathering, Network mapping, statistics gathering,

controlcontrol

• An alternative approachAn alternative approach

Page 27: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

An hypothetical NMSAn hypothetical NMS

• GUI around ICMP (ping, traceroute), SNMP, etcGUI around ICMP (ping, traceroute), SNMP, etc– Recursive host discoveryRecursive host discovery– Broadcast ping, ARP, default gateway: start somewhereBroadcast ping, ARP, default gateway: start somewhere– Recursively SNMP query for known hosts/connected networksRecursively SNMP query for known hosts/connected networks– Ping known hosts to test livenessPing known hosts to test liveness– IterateIterate

• Display topology: allow “drill-down” to particular devicesDisplay topology: allow “drill-down” to particular devices

• Configure and monitor known devicesConfigure and monitor known devices– Trap, Netflow™, syslog message destinationsTrap, Netflow™, syslog message destinations– Counter thresholds, CPU utilization threshold, fault reportingCounter thresholds, CPU utilization threshold, fault reporting– Particular faults or fault patternsParticular faults or fault patterns

• Interface statistics and graphs (MRTG)Interface statistics and graphs (MRTG)

Page 28: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

NOC, NOC. Calling AT&T…NOC, NOC. Calling AT&T…

Page 29: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

What are they all looking at?What are they all looking at? http://www.stat.ee.ethz.ch/mrtg/

Page 30: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

An hypothetical NMSAn hypothetical NMS

• All very straightforward? No, not reallyAll very straightforward? No, not really– A lot of software engineering: corner cases, traceroute A lot of software engineering: corner cases, traceroute

interpretation, NATs, etcinterpretation, NATs, etc• CorrectnessCorrectness

– MIBs may contain rubbishMIBs may contain rubbish– Can only view inside your network anywayCan only view inside your network anyway– Tunnelled, encrypted protocols becoming prevalentTunnelled, encrypted protocols becoming prevalent

• EfficiencyEfficiency– Rate pacing discovery traffic: ping implosion/explosionRate pacing discovery traffic: ping implosion/explosion– SNMP overloading router CPUsSNMP overloading router CPUs

• Using NMSs also not straightforwardUsing NMSs also not straightforward– How to setup “correct” thresholds?How to setup “correct” thresholds?– How to decide when something “bad” has happened?How to decide when something “bad” has happened?– How to present (or even interpret) reams and reams of data?How to present (or even interpret) reams and reams of data?

Page 31: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OverviewOverview

• IntroductionIntroduction

• AbstractionsAbstractions

• IP network componentsIP network components

• IP network management protocolsIP network management protocols

• Pulling it all togetherPulling it all together

• An alternative approachAn alternative approach– From the edges…From the edges…

Page 32: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

AnemoneAnemone

• An endsystem network management platformAn endsystem network management platform– Collect Collect flow informationflow information from endsystems, and from endsystems, and– Combine with Combine with topology informationtopology information from routeing from routeing

protocolsprotocols

• Endsystems have more information about their Endsystems have more information about their traffic network devicestraffic network devices

– No No router support requiredrouter support required

• A platform to support many applicationsA platform to support many applications

• Currently concentrating on “managed networks”Currently concentrating on “managed networks”– E.g. governments, enterprises, etcE.g. governments, enterprises, etc– High complexity, high valueHigh complexity, high value– High degree of endsystem controlHigh degree of endsystem control

Page 33: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

ApplicationsApplications

• Real-time and historical analysisReal-time and historical analysis– Current topology + ingress, egress flows gives global Current topology + ingress, egress flows gives global

picture of network behaviourpicture of network behaviour– Capacity planning, anomaly detectionCapacity planning, anomaly detection

• Modelling “what if” scenariosModelling “what if” scenarios– Plug into a simulator back-endPlug into a simulator back-end– ““What happens to the network if we move all our What happens to the network if we move all our

Exchange servers to a single data centre?”Exchange servers to a single data centre?”

• Automatic configurationAutomatic configuration– Close the loop: enable network to meet dynamic SLAsClose the loop: enable network to meet dynamic SLAs– Reconfigure network to track e.g. time of day load Reconfigure network to track e.g. time of day load

changeschanges

Page 34: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

ChallengesChallenges

1.1. How do you instrument an entire How do you instrument an entire network? Do you need to network? Do you need to instrument all endsystems?instrument all endsystems?

2.2. What network information should be What network information should be captured and stored?captured and stored?

3.3. How do you access data stores How do you access data stores distributed across a large network distributed across a large network (ca. 300,000 nodes)?(ca. 300,000 nodes)?

1% coverage gives 99.999% bytes and flows

Flow data augmented with application and user context

Use distributed query system

Page 35: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

SummarySummary

• IntroductionIntroduction– What is network management? What is network management?

• AbstractionsAbstractions– ISO FCAPS, TMN EMS, ATMISO FCAPS, TMN EMS, ATM

• IP network componentsIP network components– IP, networks, routersIP, networks, routers

• IP network management protocolsIP network management protocols– ICMP, SNMP, etcICMP, SNMP, etc

• Pulling it all togetherPulling it all together– Outline of a network management systemOutline of a network management system

• An alternative approach: from the edgesAn alternative approach: from the edges

Page 36: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

The endThe end

• QuestionsQuestions

• Answers?Answers?

• http://http://www.cisco.comwww.cisco.com//

• http://www.routergod.com/http://www.routergod.com/

• http://www.ietf.org/http://www.ietf.org/

• http://http://ipmon.sprintlabs.com/pyrtipmon.sprintlabs.com/pyrt//

• http://http://www.nanog.orgwww.nanog.org//

Page 37: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Backup slidesBackup slides

• Internet routeingInternet routeing

• OSPFOSPF

• BGPBGP

Page 38: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Internet routeingInternet routeing

• Q: how to get a packet from node to destination?Q: how to get a packet from node to destination?

• A1: advertise all reachable destinations and apply A1: advertise all reachable destinations and apply a consistent cost function (a consistent cost function (distance vector)distance vector)

• A2: learn network topology and compute A2: learn network topology and compute consistent shortest paths (consistent shortest paths (link statelink state))

– Each node (1) discovers and advertises Each node (1) discovers and advertises adjacenciesadjacencies; ; (2) builds (2) builds link state databaselink state database; (3) computes ; (3) computes shortest shortest pathspaths

• A1, A2: Forward to A1, A2: Forward to next-hopnext-hop using using longest-prefix-longest-prefix-matchmatch

Page 39: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

OSPF (~link state routeing)OSPF (~link state routeing)

• Q: how to route given packet from any node to Q: how to route given packet from any node to destination?destination?

• A: learn network topology; compute shortest pathsA: learn network topology; compute shortest paths

• For each nodeFor each node– Discover Discover adjacencies adjacencies (~(~immediate neighboursimmediate neighbours)); ;

advertiseadvertise

– Build Build link state databaselink state database (~ (~network topologynetwork topology))

– Compute Compute shortest pathsshortest paths to all destination prefixes to all destination prefixes

– Forward to Forward to next-hopnext-hop using using longest-prefix-match longest-prefix-match (~(~most most specific routespecific route))

Page 40: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

BGP (~path vector routeing)BGP (~path vector routeing)

• Q: how to route given packet from any node to destination?Q: how to route given packet from any node to destination?• A: neighbours tell you destinations they can reach; pick A: neighbours tell you destinations they can reach; pick

cheapest optioncheapest option

• For each nodeFor each node– Receive Receive (destination, cost, next-hop)(destination, cost, next-hop) for all destinations for all destinations

known to neighbourknown to neighbour– Longest-prefix-match among next-hops for given Longest-prefix-match among next-hops for given

destinationdestination– Advertise selected Advertise selected (destination, cost+(destination, cost+, next-hop, next-hop'')) for all for all

known destinationsknown destinations• Selection process is complicatedSelection process is complicated• Routes can be modified/hidden at all three stagesRoutes can be modified/hidden at all three stages

– General mechanism for application of policyGeneral mechanism for application of policy

Page 41: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Page 42: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Anemone: where are we now?Anemone: where are we now?

• Three major componentsThree major components– Flow collectionFlow collection

– Route collectionRoute collection

– Distributed databaseDistributed database

• Building prototypes, simulating Building prototypes, simulating systemsystem

Page 43: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

Data collectionData collection

• Flow collectionFlow collection– Hosts track active flows Hosts track active flows

• Using low overhead event posting infrastructure, Using low overhead event posting infrastructure, ETWETW

• Built prototype device driver provider & user-space Built prototype device driver provider & user-space consumerconsumer

– Used packet traces for feasibility study on (client, server)Used packet traces for feasibility study on (client, server)• Peaks at (165, 5667) live and (39, 567) active flows Peaks at (165, 5667) live and (39, 567) active flows

per secper sec

• Route collectionRoute collection– OSPF is link-state: passively collect link state advertsOSPF is link-state: passively collect link state adverts– Extension of my work at Sprint (for IS-IS and BGP); also Extension of my work at Sprint (for IS-IS and BGP); also

been done at AT&T (NSDI’04 paper)been done at AT&T (NSDI’04 paper)

Page 44: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

The distributed databaseThe distributed database

• Logically contains Logically contains 1.1. Traffic flow matrix (bandwidths), Traffic flow matrix (bandwidths), {srcs}{srcs} ×× {dsts}{dsts}2.2. ……each entry annotated with current route from each entry annotated with current route from src src to to dstdst

• N.B. src/dst might be e.g. (IP end-point, application)N.B. src/dst might be e.g. (IP end-point, application)• Large dynamic data set suggests aggregationLarge dynamic data set suggests aggregation

• Related workRelated work– { distributed, continuous query, temporal } databases{ distributed, continuous query, temporal } databases– Sensor networksSensor networks

• Potential starting points: Astrolabe or SDIMS (SIGCOMM’04)Potential starting points: Astrolabe or SDIMS (SIGCOMM’04)– Where/what/how much to aggregate? Where/what/how much to aggregate?

• Is data read- or write-dominated?Is data read- or write-dominated?• Which is more dynamic, flow or topology data?Which is more dynamic, flow or topology data?• Can the system successfully self-tune?Can the system successfully self-tune?

Page 45: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

The distributed databaseThe distributed database

• Construct traffic matrix from flow monitoringConstruct traffic matrix from flow monitoring– Hosts can supply flows they source and sinkHosts can supply flows they source and sink– Only need a subset of this data to get complete traffic Only need a subset of this data to get complete traffic

matrixmatrix• Construct topology from route collectionConstruct topology from route collection

– OSPF supplies topology → routesOSPF supplies topology → routes

• Wish to be able to answer queries likeWish to be able to answer queries like– ““Who are the top-10 traffic generators?”Who are the top-10 traffic generators?”

• Easy to aggregate, don’t care about topologyEasy to aggregate, don’t care about topology

– ““What is the load on link What is the load on link l l ?”?”• Can aggregate from hosts, but need to know routesCan aggregate from hosts, but need to know routes

– ““What happens if we remove links What happens if we remove links {l…m} {l…m} ?”?”• Interaction between traffic matrix, topology, even flow controlInteraction between traffic matrix, topology, even flow control

Page 46: Guest Lecture by Dr Richard Mortier, MSR

Digital Communications II, CUCL

The distributed databaseThe distributed database

• Building simulation modelBuilding simulation model– OSPF data gives topology, event list, routesOSPF data gives topology, event list, routes– Simple load model to start with (load ~ # subnets)Simple load model to start with (load ~ # subnets)– Precedence matrix (from SPF) reduces flow-data query Precedence matrix (from SPF) reduces flow-data query

setset

• Can we do as well/better than e.g. NetFlow?Can we do as well/better than e.g. NetFlow?– Accuracy/coverage trade-offAccuracy/coverage trade-off

• How should we distribute the DB? How should we distribute the DB? – Just OSPF data? Just flow data? A mixture?Just OSPF data? Just flow data? A mixture?

• How many levels of aggregation?How many levels of aggregation?– How many nodes do queries touch?How many nodes do queries touch?

• What sort of API is suitable?What sort of API is suitable?– Example queries for sample applicationsExample queries for sample applications