Creating an Experiment
• Done with `batchexp` for both batch and interactive experiments
  – “batch” is a historical name
• Can bring the experiment to three states
  – swapped – pre-run only
  – posted – queued experiment ready to run
  – active – experiment swapped in
Swapping An Experiment
• Done with `swapexp`
• Can effect several transitions
  – swapped to active (swap in experiment)
  – active to swapped (swap out experiment)
  – active to active (modify experiment)
  – posted to swapped (dequeue batch experiment)
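The legal transitions above form a small state machine. A minimal sketch in Python (the state and action names come from the slides; the table and helper are purely illustrative, not actual Emulab code):

```python
# Legal experiment state transitions, as driven by swapexp.
# Keys are (current state, target state); values name the action.
TRANSITIONS = {
    ("swapped", "active"): "swap in experiment",
    ("active", "swapped"): "swap out experiment",
    ("active", "active"): "modify experiment",
    ("posted", "swapped"): "dequeue batch experiment",
}

def transition(state, target):
    """Return the action for a legal transition, or raise ValueError."""
    action = TRANSITIONS.get((state, target))
    if action is None:
        raise ValueError("illegal transition %s -> %s" % (state, target))
    return action
```

Note that there is no posted-to-active entry: a posted experiment must be dequeued to swapped before it can be swapped in.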
Pre-run (tbprerun)
• Parse NS file (parse-ns and parse.tcl)
  – Put virtual state in database (xmlconvert)
• Do visualization layout (prerender)
• Compute static routes (staticroutes)
swapped to active (tbswap in)
• Mapping: find nodes for the experimenter
  – assign_wrapper
  – assign
• Allocate nodes (nalloc)
  – Set up serial console access (console_setup)
• Set up NFS exports (exports_setup)
• Set up DNS names (named_setup)
• Reboot nodes and wait for them (os_setup)
  – Load disks if necessary (os_load)
swapped to active (contd.)
• Start event system (eventsys_control)
• Create VLANs (snmpit)
• Set up mailing lists (genelists)
• Failure at any step results in swapout
active to swapped (tbswap out)
• Stop the event system (eventsys_control)
• Tear down VLANs (snmpit)
• Free nodes (nfree)
  – Scheduled reservations (sched_reserve)
  – Place in reloadpending experiment
  – Revoke console access (console_setup)
• Reset DNS (named_setup)
• Reset NFS exports (exports_setup)
• Reset mailing lists (genelists)
active to active (tbswap modify)
• Purpose: experiment modification
  – Get new virtual state (re-parse NS file)
  – Bring physical mapping into sync with new state
• Leaves alone nodes whose physical mapping matches the new virtual state
Important Daemons
• batch_daemon
  – Picks up posted experiments
  – Attempts a swapin
  – One experiment at a time for each user
  – Swaps out finished batch experiments
• reload_daemon
  – Picks up nodes from the reloadpending experiment
  – Frees them when done reloading
Next, in More Depth
• Parsing
• Resource allocation
  – Setup for the action: assign_wrapper
  – The real brains: assign
• Serial console management
• Link shaping
• IP routing support
• Traffic generation
• Inter-node synchronization
• Event system
Experiment Configuration Language
• General-purpose OTcl scripting language based on NS
• Exports an API nearly identical to that of NS, albeit a subset
• Testbed-specific actions via the tb-* procedures
  – We provide a compatibility script to include when running under an NS simulation
• Define your own procedures / classes / methods
Making sense out of others’ code
• The parser is also written in OTcl
• It mirrors a subset of NS classes
• The implemented methods for these classes capture the user-specified experiment attributes
• Convert experiment attributes to an intermediate XML format
  – A generic format makes it easy to add support for other configuration languages
• Store the configuration in the virt_* tables such as virt_nodes, virt_lans etc.
Implementation Quirks
• Capture top-level resource names for later use
  – E.g., use 'n0' to name the physical node when the user asks for: set n0 [$ns node]
• Rename resources to work around restrictions such as in DNS
  – E.g., node 'n(0)' becomes 'n-0'
• Parser runs on ops for security reasons
  – Mixing trusted and untrusted OTcl code on the main server (boss) is dangerous
• Read tbsetup/ns2ir/README in the source tree for details
Assign Wrapper
• Perl frontend to assign
• Converts the virtual DB representation to a more neutral “top” file format (its input)
• Converts results from plain text format into the physical DB representation
• assign_wrapper is extremely testbed-aware
• Moves information from virtual tables to physical tables
Virtual Representation
• An experiment is really a set of tables in the database
• Includes “virt_nodes” and “virt_lans” which describe the nodes and the network topology
• Other tables include routes, program agents, traffic generators, virtual types, etc.
Virtual Representation Cont.
• Example:
  set n1 [$ns node]
  set n2 [$ns node]
  set link0 [$ns duplex-link $n1 $n2 100MB 10ms]
  tb-set-hardware $n2 pc600
• Is stored in database tables:
  virt_node ('n1', '10.1.1.1', 'pc850', 'FBSD-STD', ...)
  virt_node ('n2', '10.1.1.2', 'pc600', 'RHL-STD', ...)
  virt_lan ('link0', 'n1', '100MB', '5ms', ...)
  virt_lan ('link0', 'n2', '100MB', '5ms', ...)
What’s a top file?
• Stands for "topology" file, but that's too many syllables.
• Input file to assign specifying nodes, links, desires.
• Conversion of DB format to:
  node n1 pc850
  node n2 pc600
  link link0/n1:0,n2:0 n1 n2 100000 0 0
• Combine with current (free) physical resources to come up with a solution.
Assign Results
• Assign maps n1 and n2 to pc1 and pc41 based on types and bandwidth.
Nodes
n1 pc1
n2 pc41
End Nodes
Edges
link0/n1:0,n2:0 intraswitch pc1/eth3 pc41/eth1
End Edges
• The above is a “simplified” version of actual results. Gory details available elsewhere.
Assign Wrapper Continues
• Allocate physical resources (nodes) as specified by assign
• Allocate virtual resources (vnodes) on physical nodes (local and remote)
• If some nodes already allocated (someone else got them before you), try again
• Keep trying until the maximum number of tries is exceeded; assign might fail to find a solution on the first N tries
Assign Wrapper Keeps Going …
• Insert set of “vlans” into database
  – pc1/eth3 connected to pc41/eth1
• Update “interfaces” table with IP addresses assigned by the parser
• Update “nodes” table with user-specified values from virt_nodes
  – Osids, rpms, tarballs, etc.
• Update “linkdelays” table with end node traffic shaping configuration (from virt_lans)
And Going and Going
• Update “delays” table with delay node traffic shaping configuration
• Update “tunnels” table with tunnel configuration (widearea nodes)
• Update “agents” table with location of where events should be sent to control traffic shaping
• Call exit(0) and rest!
assign’s job
• Maps virtual resources to local nodes and VLANs
• General combinatorial optimization approach to an NP-hard problem
• Uses simulated annealing
• Minimizes inter-switch links, number of switches, and other cost metrics
• Takes seconds for most experiments
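The simulated-annealing core can be sketched generically. This is a toy sketch only, not assign's actual code (which scores types, bandwidth, and many other constraints in C++): accept any improvement, and accept a worse candidate with probability exp(-delta/T) so the search can escape local minima as the temperature cools.

```python
import math
import random

def anneal(initial, neighbor, cost, steps=5000, t0=1.0, cooling=0.999):
    """Generic simulated annealing over an arbitrary solution space.

    neighbor(s) proposes a small mutation of s; cost(s) is the value
    to minimize. Worse moves are accepted with probability
    exp(-delta / T), where T decays geometrically each step.
    """
    cur = initial
    cur_cost = cost(cur)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur)
        delta = cost(cand) - cur_cost
        if delta <= 0 or random.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t *= cooling
    return best, best_cost
```

For example, minimizing (x - 3)**2 over the integers with ±1 moves walks the start point down toward x = 3; in assign, the "solution" is instead a virtual-to-physical node mapping and the cost counts inter-switch links and similar penalties.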
What’s Hard About It?
• Satisfy constraints
  – Requested types
  – Can’t go over inter-switch bandwidth
  – Domain-specific constraints
    • LAN placement for virtual nodes
    • Subnodes
• Maximize opportunity for future mappings
  – Minimize inter-switch bandwidth
  – Avoid scarce nodes
What It Can Do
• Handle multiple types of nodes on multiple switches
• Allow users to ask for classes of nodes
• Prefer/discourage use of certain nodes
• Map multiple virtual nodes to one physical node
• Handle nodes that are 'hosted' in some other node
• Partial solutions
What It Doesn't Do
• Map based on observed end-to-end network characteristics
  – Applicable to wide-area and wireless
  – But we have another program, wanassign, that can
• Satisfy requests for specific link types
  – But we could approximate with subnodes
• Full node resource description
Issues
• Complicated
  – Several authors
  – Subject of a paper evaluating many configurations
  – Nature of the randomized algorithm makes debugging hard
  – Evolved over time to keep up with features
• Scaling
  – Particularly with virtual and simulated nodes
    • Not just scale (1000’s), it’s the type of node
  – Pre-passes may help
• The good: it’s coped with a lot of new demands!
Executive Summary
• Allow user access to consoles via serial line
• Console proxy enables remote access
• Authentication and encryption
• All console output logged
• Requires OS support for serial consoles
• Utah Emulab: all nodes have serial lines
  – Not required, but handy
Serial Consoles
• Can redirect the console in three places
  – BIOS: on most “server” motherboards
  – Boot loader: easy on BSD and Linux
  – OS: easy on BSD and Linux
• Boot loaders and OSes must be configured
  – Generally via boot loader configuration
The serial line proxy (capture)
• Original purpose was to log console output
  – Read/write the serial line, log data, present a tty interface
  – Use “tip” to access the pty
• Enhanced to “remote” the console
  – Presents a socket interface
  – Can be accessed from anywhere on the network
• One capture process per serial line
Authentication (capserver)
• Only users in an experiment can access
• Use a one-time key
  – capture running on the serial line host generates a new key for every “session”
• Sends the key to capserver on the boss node
  – capserver records the key in the DB, returns ownership info
  – capture uses the info to protect ACL and log files
Clients (console, tiptunnel)
• console is the replacement for tip
  – Runs on ops; obtains access info via an ACL file created by capture
  – File permissions restrict user access
• tiptunnel is the remote version
  – Binaries for Linux, BSD, Windows
  – Run as a helper app from a browser
  – Access info passed via a secure web connection
  – All communication via SSL
Executive Summary
• Emulab allows setting and modification of bandwidth, latency, and loss rate on a per-link basis
• Interface through NS script or command
• Implemented either by dedicated “delay” nodes or on end nodes
• Delay nodes work with any end node OS
• End node shaping for FreeBSD or Linux
Delay nodes
• Run FreeBSD + dummynet + bridging
• FreeBSD kernel:
  – Runs at 10000Hz to improve accuracy
  – Uses polling device drivers to reduce overhead
• Nodes are dedicated to an experiment
• One node can shape multiple links
• Transparent to end nodes
• Not transparent to switch fabric
End node shaping (“link delays”)
• Handle link shaping at both ends of the link
• Requires OS support on the end nodes
  – FreeBSD: dummynet
  – Linux: “tc” with modifications
• Conserves Emulab resources at potential expense of emulation fidelity
• Works in environments where delay nodes are not practical or possible
Dynamic control
• Link settings can be modified at “run time”
  – at commands in the NS file
  – tevc command
• Run a control agent (delay_agent) on all nodes implementing shaping
• Listens for events, interacts with kernel to effect changes
• OS specific
Executive Summary
• Emulab offers three options for IP routing in a topology: none, manual, or automatic
• Specified via the NS file
• Routes are set up automatically at boot time
• There is no agent for dynamic modification of routes
User-specified routing
• “None”
  – No experimental network routes will be set up
  – Used for LANs and routing experiments
• “Manual”
  – Explicit specification of routes in the NS file
  – Routes become part of the experiment’s DB state
  – Passed to a node at boot as part of self-configuration
  – Implies IP forwarding is enabled
Emulab-provided routing
• “Static”
  – Emulab calculates routes at experiment creation (routecalc, staticroutes)
  – Shortest-path calculation between all pairs
  – Optimized to coalesce into network routes
• “Session”
  – Dynamic routing: runs gated/OSPF on all nodes
  – Auto-generated config file uses only active experimental interfaces
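The shortest-path step above amounts to a first-hop computation between all pairs. A sketch via per-source BFS on an unweighted topology (routecalc itself is a separate program with further optimizations, such as coalescing host routes into network routes):

```python
from collections import deque

def static_routes(adj):
    """Compute first-hop routes for all (src, dst) pairs.

    adj maps each node to its list of neighbors. For each source we
    run a BFS recording predecessors, then walk each destination's
    predecessor chain back to find the hop adjacent to the source.
    """
    routes = {}
    for src in adj:
        prev = {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in prev:
                    prev[v] = u
                    q.append(v)
        for dst in prev:
            if dst == src:
                continue
            hop = dst
            while prev[hop] != src:     # walk back toward src
                hop = prev[hop]
            routes[(src, dst)] = hop
    return routes
```

On a line topology a–b–c–d this yields, e.g., first hop "b" for traffic from a to d, which matches the scaling caveat later: storage is O(N²) pairs.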
Routing Gotchas
• Node default route uses the control net
  – Missing manual routes result in lost traffic
• Control net is visible to routing daemons
  – Makes their job easy (one hop to anyone)
• NxN "Static" route computation and storage do not scale as N increases, such as in multiplexed virtual nodes
Executive Summary
• Emulab allows experiments to run and control background traffic generators
• Interface through NS script or command line tool
• Constant Bit Rate traffic only right now
• UDP or TCP only right now
Implementation details
• Based on TG (http://www.postel.org/tg/)
  – UDP or TCP, one-way, various distributions of interarrival time and packet length
• Modified to be an event agent
  – Start and stop, change packet rate and size
• Interface:
  – NS: standard syntax for traffic sources/sinks
  – tevc command line tool
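A constant-bit-rate stream is simple arithmetic: with packet size S bytes and rate R bits/s, packets depart every 8S/R seconds. A sketch of such a schedule (TG itself also supports other interarrival and length distributions):

```python
def cbr_schedule(rate_bps, pkt_bytes, duration_s):
    """Departure times for a constant-bit-rate stream.

    Fixed-size packets leave at a fixed interval of
    pkt_bytes * 8 / rate_bps seconds, so the average rate
    over the duration is rate_bps.
    """
    interval = pkt_bytes * 8.0 / rate_bps
    # Count packets up front rather than accumulating floats,
    # so rounding error cannot add or drop a packet.
    count = int(rate_bps * duration_s // (pkt_bytes * 8))
    return [i * interval for i in range(count)]
```

For example, 8000 bits/s with 100-byte packets gives one packet every 0.1 s; a "change packet rate" event simply recomputes the interval.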
Executive Summary
• Provides a simple inter-node barrier synchronization mechanism for experiments
• Example: wait for all nodes to finish running a test before starting the next one
• Not a centralized service (per-experiment infrastructure), so it scales well
• Easy to use: can be scripted
History
• Originally implemented a single-barrier, single-use “ready” mechanism:
  – Allowed users to know when all nodes were “up”
  – Used the centralized TMCC to report/query status
  – Network/server unfriendly: constant polling
• Users wanted a more general mechanism
  – Multiple barriers, reusable barriers
• Tended to roll their own
  – Often network unfriendly as well
Enter the Sync Server
• In the NS file, declare a node as the server:
  – set node1 [$ns node]
  – tb-set-sync-server $node1
• When node boots, it starts up the sync server automatically
• Nodes requiring synchronization use emulab-sync application
• Its use can be scripted using the program agent
Example client use
• One node acts as barrier master, initializing the barrier and waiting for a number of clients:
  – /usr/testbed/bin/emulab-sync -i 4
• All other client nodes contact the barrier:
  – /usr/testbed/bin/emulab-sync
• emulab-sync blocks until the barrier count is reached
Implementation
• Simple TCP-based server and client program
  – UDP version in the works
• Client:
  – Gets server info from a config file written at boot
  – Connects to the server and writes a small record
  – Blocks until a reply is read
• Server:
  – Accepts connections, reads records from clients
  – Writes a reply when all clients have connected
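The accept-then-release protocol above can be sketched with plain sockets. This is a toy model of the sync server and emulab-sync, not the actual implementation (function names and the one-byte release message are invented for illustration):

```python
import socket
import threading

def make_barrier_server(count):
    """Bind a listener; return (port, run).

    run() accepts `count` connections, then sends every waiter a
    release byte, mirroring the server's accept-all-then-reply loop.
    """
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))          # ephemeral port for the demo
    srv.listen(count)
    port = srv.getsockname()[1]
    def run():
        conns = [srv.accept()[0] for _ in range(count)]
        for c in conns:                  # barrier count reached: release
            c.sendall(b"1")
            c.close()
        srv.close()
    return port, run

def barrier_wait(port):
    """Client side: connect, then block until the server replies."""
    s = socket.create_connection(("127.0.0.1", port))
    released = s.recv(1)
    s.close()
    return released == b"1"
```

A master run with -i 4 corresponds to a server created with count = 4; every client blocks in recv() until the last one connects.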
Issues
• Why not use the event system for synchronization?
  – The event system is a centralized service
  – As we move toward decentralization, we may reconsider
• Authentication: none
  – Local: uses the shared control net, so this is a problem; it won’t be with control net VLANs
  – Wide-area: wide open; add HMAC a la events or just use the event system
Emulab Control Plane
• Many of Emulab’s features are dynamically controllable:
  – Traffic generators: can be started, stopped, and their parameters altered
  – Link shaping: links can be brought up and down, characteristics can be modified
• Control is via the NS file, the web interface, or a command line tool.
Example: A Link
• NS: create a shaped link:
  – set link0 [$ns duplex-link $n1 $n2 50Mb 10ms DropTail]
• NS: control the link:
  – $ns at 100 "$link0 modify DELAY=20 BANDWIDTH=25"
  – $ns at 200 "$link0 down"
• Command line: control the link:
  – tevc -e tutorial/linktest +10 link0 down
What's really happening?
• A link “agent” runs on each (delay) node to control all of the links for that node.
• The agent listens for “events” from the server telling it what to do.
• A per-experiment scheduler doles out the events at the proper time, sending them to the agents.
• Other agents include the traffic generators, program objects, link tester.
63
Come on, what's really happening?!
• Use Elvin (http://elvin.dstc.edu.au/) – off-the-shelf publish-subscribe system
• Agents "listen" for events by "subscribing" to those they care about.
• The per-experiment scheduler "publishes" events as they come due.
• Events flow from the scheduler through the Elvin daemon to the nodes, and ultimately to the agents that wanted them.
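At its core the per-experiment scheduler is a priority queue ordered by firing time. A sketch of that doling-out step (illustrative only; the real scheduler also sleeps until each event is due and publishes via Elvin rather than a callback):

```python
import heapq

def run_schedule(events, publish):
    """Dole out events in firing-time order.

    events is an iterable of (time, agent, command) tuples; publish
    is called once per event as it comes due. Tuples compare by
    their first element, so the heap orders them by time.
    """
    heap = list(events)
    heapq.heapify(heap)
    while heap:
        t, agent, cmd = heapq.heappop(heap)
        publish(t, agent, cmd)
```

A dynamic "tevc … +10" event is just an insertion into this queue with time = now + 10.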
Issues: Time
• What happens to “event time” when an experiment is swapped?
  – Run in real time: events could be lost
  – Suspend time: dilation of experiment time
  – Restart time: replay the static event stream
• Timing for dynamic events
  – tevc … +10 link0 down; tevc … +10 link1 up
  – What is the latency between events?
• What latency do we need to guarantee?
Issues: Security
• The Elvin mechanism is too heavyweight
  – Requires encryption to protect authentication keys
  – We have no reason to encrypt our events
• Don't want to tie ourselves to Elvin
  – In principle
  – Elvin has gone closed source
• Emulab past: no authentication, no wide-area
• Emulab current: use end-to-end HMAC
  – Key transferred via TMCC
  – Wide-area nodes supported; they cannot inject events
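The end-to-end check amounts to appending a keyed digest to each event, so receivers can verify the sender holds the shared key without any encryption. A sketch assuming HMAC-SHA1 (the actual digest choice and wire format are Emulab-specific and not given in the slides):

```python
import hashlib
import hmac

def sign_event(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA1 tag; the key itself is never sent."""
    return payload + hmac.new(key, payload, hashlib.sha1).digest()

def verify_event(key: bytes, message: bytes):
    """Return the payload if the 20-byte trailing tag checks out,
    else None. compare_digest avoids timing side channels."""
    payload, tag = message[:-20], message[-20:]
    expect = hmac.new(key, payload, hashlib.sha1).digest()
    return payload if hmac.compare_digest(tag, expect) else None
```

A node without the key (e.g., a wide-area node that only received a listen capability) can read events but cannot forge a tag that verifies, which matches the "cannot inject events" property above.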
Issues: Scaling
• An open Elvin TCP connection for every agent
  – Use a per-node proxy
  – But agents still send events directly to boss
  – And there are still a lot of nodes
• Use UDP?
  – What about lost events?
• Deliver static events to nodes early?
  – Doesn’t help dynamic (“now”) events
• Multicast, someday (not the current usage model)
• You’d think we could just find a better pub/sub system, but we haven’t.