Real-World Techniques for Automating Configuration of Network Devices
@ NANOG 24
Mark Epstein, CTO, Ponte
February 11th, 2002
Slide 2
www.ponte.com © 2000-2002 Ponte Communications, Inc.
The Challenge: Large Scale
Epstein’s rule of large numbers:
Responsibility for large numbers of anything that must be individually managed is a real pain
• Large firms have large numbers
– Specific business initiatives and functions
– Vendors, models, and instances of devices
– Employees and Operators
– Security breaches and breach attempts
• Additional Challenges
– High employee turnover
– More operators than device-savvy staff
Slide 3
[Figure: Service Provider Network. Elements: NOC1, NOC2, a Core WAN, several regional POPs (RPOPs), the Internet, and many Customers attached to the RPOPs.]
Slide 4
Many Devices Working in Concert
[Figure: a Regional POP attached to the Core WAN, containing core routers, intermediary devices, and access routers, each access router serving customers.]
Slide 5
Network Security Control: Ponte nsControl™ Architecture
[Figure: a Network Operations Center with a Control Server (Delivery Drivers, Assembly Templates, Security Service Modules), Business Systems, and Operations Support Systems; Network Control Points at the Home Office, Branch Office, and Headquarters Office and a Client Control Point, reached over Secured Channels across the Internet/intranet.]
Slide 6
Many Interrelated Problems
Tcl/Expect
• Issues
– Buffer skew and device prompts
– Timing and reset behavior
– Terminal servers
– Firmware revisions and delivery
– High Availability and Fail-over
– Control channel problems
– Using existing configurations
– Differential configuration
Slide 7
Prompts
Tcl/Expect Issues
• Buffer skew
– Your code isn’t looking at what you expect
– Example: a typical enable prompt ends with “#”, but the banner “### authorized users only ###” also contains “#”, so expecting “#” can match inside the banner instead of at the enable prompt “router23#”
• Canonical vs. custom prompts
– Can cause buffer skew
– Know the prompts or be very flexible
• Strategies
– Resync with unique text
– Use time of output as additional sync?
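The banner example can be reproduced without a device. The sketch below is plain Tcl (no Expect) and assumes a captured buffer holding the banner followed by the enable prompt; it contrasts a naive search for “#” with a pattern anchored on the shape of the prompt. The helper names `naivePromptIndex` and `findEnablePrompt` and the host-name character class are illustrative assumptions, not code from the original scripts.

```tcl
# Illustration of buffer skew using plain Tcl string matching (no Expect).
# naivePromptIndex and findEnablePrompt are hypothetical helpers.

# Naive approach: position of the first "#" anywhere in the buffer.
proc naivePromptIndex {buffer} {
    return [string first "#" $buffer]
}

# Anchored approach: a host name followed by "#", starting a line and
# ending the buffer (optionally followed by whitespace). Returns the host
# name, or "" if the prompt has not arrived yet.
proc findEnablePrompt {buffer} {
    if {[regexp {(?:^|[\r\n])([A-Za-z0-9._-]+)#\s*$} $buffer -> host]} {
        return $host
    }
    return ""
}

set buffer "\r\n### authorized users only ###\r\nrouter23#"
puts [naivePromptIndex $buffer]
puts [findEnablePrompt $buffer]
```

The naive search reports index 2, which is inside the banner, while the anchored pattern returns `router23`. In an Expect script the same idea means preferring a `-re` pattern shaped like the real prompt over a bare `-ex {#}`, or resynchronizing on unique text as BufferResync does.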
Slide 8
# resynchronize the buffer with text unlikely
# to occur elsewhere in the input
# PIX example
proc BufferResync {} {
    set buffer_data ""
    send "who ?\r"
    expect {
        -ex {usage: who [ip]} {
            set buffer_data $expect_out(buffer)
        }
        timeout { error "BufferResync: timed out waiting for resync text" }
    }
    # now safe to expect prompt
    expect {
        -ex {#} {}
        timeout { error "BufferResync: timed out waiting for prompt" }
    }
    # empty the input buffer
    expect {
        -re {.*} {}
    }
    # error out if anything arrived after
    # prompt
    expect -timeout 1 {
        -re {.} { error "BufferResync: unexpected output after prompt" }
        timeout {}
    }
    # erase any partial input line (Ctrl-U)
    send "\x15"
    return $buffer_data
}
Prompts
Tcl/Expect Code
Slide 9
• Device reset behavior
– IOS devices disconnect the control terminal on a “reload”
– But still accept new connections
– And leave other active connections up until later in the reload process
– Thus it is difficult to detect when the device has completed its reset
• Typing speed: some devices are command-speed limited
– Device communication over slow serial lines
– Minimum-cost (i.e., slow) processors
• Inter-command speed can be naturally limited
– Throttle inter-command speed by processing intervening prompts
– But you cannot always depend on prompts
– Ex: when connecting to a device through a terminal server, do not send the initial [CR] too quickly or the device may drop it
Speed & Timing
Tcl/Expect Issues
Slide 10
# ask for the reload, then wait 5min before
# attempting to reconnect
ExpectReload
sleep 300
...
# slow our "typing" speed for a slow device
set sendRate [JobVar DeviceBitRate]
# device can only accept data at 25% of its bit rate
set loadFactor 25
# send -s reads send_slow as {bytes-per-send seconds-between-sends}
set send_slow [deviceSpeed $sendRate $loadFactor]
...
send -s "long data string\r"
• Measure the actual device reset time and encode it into scripts
• Reset time is different for every device type
• More sophisticated reset detection makes sense, but it is still device-specific
• Slow command entry (“typing”) may be critical for reliable behavior
Speed Issues
Tcl/Expect Code
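The slide calls a `deviceSpeed` helper without showing it. Expect’s `send -s` reads `send_slow` as a two-element list: the number of bytes to send atomically and the number of seconds separating those sends. A minimal sketch of such a helper, assuming an asynchronous serial line with 10 bits per character (start bit, 8 data bits, stop bit) and a one-byte burst size, might look like this; the formula is an assumption, not the original implementation.

```tcl
# Hypothetical deviceSpeed: turn a line bit rate and a load factor (percent)
# into the {bytes seconds} list that Expect's "send -s" reads from send_slow.
# Assumes 10 bits per character on the wire (8N1 asynchronous framing).
proc deviceSpeed {bitRate loadFactor} {
    if {$bitRate <= 0 || $loadFactor <= 0 || $loadFactor > 100} {
        error "deviceSpeed: bad bit rate $bitRate or load factor $loadFactor"
    }
    # Characters per second we are willing to send.
    set charsPerSec [expr {$bitRate / 10.0 * $loadFactor / 100.0}]
    # Send one byte at a time, spaced so the average rate matches.
    return [list 1 [expr {1.0 / $charsPerSec}]]
}

# Example: a 9600 bit/s console at 25% load allows 240 characters per second.
set send_slow [deviceSpeed 9600 25]
puts $send_slow
```

Because Expect looks up variables such as `send_slow` in the local scope before the global one, setting it inside the procedure that calls `send -s` should be sufficient.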
Slide 11
Device Control via Terminal Servers 1-0
Tcl/Expect Issues
• Unpredictable prompt at connection
– Serial vs. virtual-terminal TCP connection
– Device may be in any state at all
– Get the device into a known state
• Terminal server port resets
– Terminal server ports get wedged
– Good configuration reduces this problem
– Need to be terminal-server-aware
– Pay careful attention to timeouts
– Rebooting the terminal server may cause device reboots!
Slide 12
proc ExpectLogin {access} {
    set timeout 10
    set retries 3
    set passwordfailed 0
    expect {
        -ex {>} {}
        -ex {#} {
            warning "device was left in enable mode"
            send "disable\r"
            # wait for the unprivileged ">" prompt before returning
            exp_continue
        }
        -ex {sername:} {
            send "[getCSUserName $access]\r"
            exp_continue
        }
        -ex {assword:} {
            if {$passwordfailed == 0} {
                send "[getSystemPasswd $access]\r"
                set passwordfailed 1
            } else {
                error "System password was rejected"
            }
            exp_continue
        }
        -ex {Enter Selection:} {
            # for c1900, enterprise edition
            send "K"
            exp_continue
        }
Device Control via Terminal Servers 1-1
Tcl/Expect Code
Slide 13
        -ex {Press any key to continue.} {
            send "\r"
            exp_continue
        }
        -ex {Password required, but none set} {
            error "Connection closed by foreign host. Possible cause: no password on device"
        }
        eof {
            retry "Telnet connection to device closed unexpectedly"
        }
        timeout {
            # allow more time on subsequent attempts
            set timeout 120
            if {$retries > 0} {
                incr retries -1
                send "\r"
                exp_continue
            } else {
                retry "Login timed out waiting for \"Password:\""
            }
        }
    }
}
Device Control via Terminal Servers 1-2
Tcl/Expect Code
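One way to make this login logic testable offline is to separate the decision (what is the device showing?) from the I/O. The pure-Tcl sketch below classifies the last line of a captured buffer using the same patterns as ExpectLogin, but requires the enable and user prompts to look like a host name followed by “#” or “>”, so a banner full of “#” characters is not mistaken for a prompt. The proc name, the state names, and the host-name character class are hypothetical.

```tcl
# Hypothetical classifier mirroring ExpectLogin's patterns: given the text
# received so far, return what the device appears to be asking for.
proc classifyLogin {buffer} {
    # Find the last non-blank line; prompts arrive at the end of the buffer.
    set last ""
    foreach line [split $buffer "\r\n"] {
        set line [string trim $line]
        if {$line ne ""} {
            set last $line
        }
    }
    # Specific prompts are checked first, bare ">"/"#" prompts last.
    switch -regexp -- $last {
        {sername:$}                       { return username }
        {assword:$}                       { return password }
        {Enter Selection:$}               { return menu }
        {Press any key to continue\.}     { return anykey }
        {Password required, but none set} { return nopassword }
        {^[A-Za-z0-9._-]+#$}              { return enable }
        {^[A-Za-z0-9._-]+>$}              { return user }
        default                           { return unknown }
    }
}

puts [classifyLogin "\r\nUser Access Verification\r\n\r\nUsername: "]
puts [classifyLogin "### authorized users only ###\r\nrouter23#"]
```

Keeping the patterns in one side-effect-free procedure lets them be checked against captured session logs before they are trusted inside a live `expect` block.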
Slide 14
Device Control via Terminal Servers 2-0
Tcl/Expect Issues
• “Console” output
– Usually the console (serial) port is the true “console”
– Terminal page length may be fixed over the serial port
– Asynchronous, unrelated output increases the need for resynchronization and fault tolerance
Slide 15
Suppress console output…
# suppress line width editing
sendCmd "terminal width 0"
# suppress console monitor messages
sendCmd "terminal no monitor"
... [do stuff] ...
sendCmd "terminal monitor"
# suppress "More" prompts
sendCmd "no pager"
...
sendCmd "pager"
Try, try again… (What to do when you can’t suppress console output)
for {set retries 0} {$retries < 3} {incr retries} {
    sendCmd "show version"
    # everything read up to the unique resync text
    set buffer_data [BufferResync]
    if {[regexp {VERSION: (\S+)} $buffer_data -> version]} {
        break
    }
}
if {![info exists version]} {
    error "could not read version after $retries attempts"
}
Device Control via Terminal Servers 2-1
Tcl/Expect Code
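The retry loop above can be factored into a reusable helper and exercised without a device. In this sketch, `retryParse` is hypothetical, and `fakeShowVersion` stands in for “sendCmd show version; BufferResync” by returning made-up replies: the first is an unrelated console message, the second contains text matching the slide’s `VERSION:` pattern. It uses `{*}`, so it assumes Tcl 8.5 or later.

```tcl
# Hypothetical "try, try again" helper: run a command prefix that returns
# captured output and retry until a regexp with one capture group matches.
proc retryParse {producer pattern {maxTries 3}} {
    for {set try 1} {$try <= $maxTries} {incr try} {
        set buffer [{*}$producer]
        if {[regexp -- $pattern $buffer -> value]} {
            return $value
        }
    }
    error "retryParse: no match for {$pattern} after $maxTries tries"
}

# Made-up replies: an unrelated console message first, then a clean reply.
set ::replies [list \
    "%SYS-5-CONFIG_I: Configured from console by vty0\r\n" \
    "Example firmware VERSION: 6.1(1)\r\n"]
proc fakeShowVersion {} {
    set reply [lindex $::replies 0]
    set ::replies [lrange $::replies 1 end]
    return $reply
}

set version [retryParse fakeShowVersion {VERSION: (\S+)}]
puts $version
```

Passing the producer as a command prefix keeps the retry policy separate from how output is collected, so the same helper could wrap any command whose output may be interrupted by console messages.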
Slide 16
Special Concerns RE Firmware
Tcl/Expect Issues
• Configuration file issues
– Commands may be added or removed
– Differences in meaning between versions
– Often must reconfigure to support firmware
– Wholesale firmware change (e.g., CatOS to IOS)
• Transfer concerns
– Distance vs. reliability
– Some devices require local access
– Pilot error
– TFTP
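Because commands and their meanings change between firmware releases, scripts often need to branch on the running version. A minimal sketch using Tcl’s built-in `package vcompare`, assuming Cisco-style version strings such as `6.1(1)` that can be normalized to dotted form: `normalizeVersion`, `firmwareAtLeast`, and `pickCommand` are hypothetical, and the `6.2` threshold and the two command strings are placeholders, not real version facts.

```tcl
# Hypothetical helper: normalize "6.1(1)" to "6.1.1" so that Tcl's
# [package vcompare] can order versions. Trailing text such as a "T"
# train suffix is ignored.
proc normalizeVersion {v} {
    if {![regexp {^(\d+(?:\.\d+)*)(?:\((\d+)\))?} $v -> base build]} {
        error "normalizeVersion: cannot parse $v"
    }
    if {$build ne ""} {
        return "$base.$build"
    }
    return $base
}

# Returns 1 if the running version is at least the required one.
proc firmwareAtLeast {running required} {
    set cmp [package vcompare [normalizeVersion $running] [normalizeVersion $required]]
    return [expr {$cmp >= 0}]
}

# Placeholder example: choose command syntax by firmware version.
proc pickCommand {running} {
    if {[firmwareAtLeast $running "6.2"]} {
        return "new-style-command"
    }
    return "old-style-command"
}

puts [pickCommand "6.1(4)"]
puts [pickCommand "6.2(1)"]
```

Keeping version checks in one place makes it easier to audit which commands a script sends to which firmware when a release changes the meaning of a command.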
Slide 17
Fail-over Devices
Tcl/Expect Issues
• Active/standby and primary/secondary
– IP address vs. terminal server “mismatch”
– “Two men say they’re Jesus, one of them must be wrong”
• Change volume limits (PIX example: 200 lines of conduit changes per “commit”)
• New and expanded commands
Slide 18
proc PIXActive {} {
set cablestatus {NOT FOUND}
set iam {}
set state {}
send "\r"
expect {
{# $} {
send "sho fai\r"
}
timeout {
sendAbort
error "PIXActive: timed out \
waiting for first prompt"
}
}
expect {
-re "Cable status: (\[^\r\n]*)" {
set cablestatus \
$expect_out(1,string)
}
-re "(\n|\r)<--- More --->" {
send " \r"
exp_continue
}
timeout {
sendAbort
error "PIXActive: timed out \
searching for `Cable \
status:.*'"
}
}
Fail-over Devices [detection] (1)
Tcl/Expect Code
Slide 19
expect {
-re "This host: (\[^ ]*) - (\[^ \r\n]*)" {
set iam $expect_out(1,string)
set state $expect_out(2,string)
}
-re "(\n|\r)<--- More --->" {
send " \r"
exp_continue
}
timeout {
sendAbort
error "PIXActive: timed out \
searching for `This host:.*'"
}
}
expect {
-re "(\n|\r)<--- More --->" {
send "q\r"
exp_continue
}
{# $} {}
timeout {
sendAbort
error "PIXActive: timed out \
waiting for final prompt"
}
}
Fail-over Devices [detection] (2)
Tcl/Expect Code
Slide 20
if {$iam == "Secondary" && \
$state == "Active"} {
JobRetAdd -append Warning \
failover_secondary_active \
"Secondary PIX is Active, \
cable status: $cablestatus\n"
}
if {$cablestatus != "Normal"} {
sendAbort
error "PIXActive: cable status \
failure: $iam Cable status: \
$cablestatus"
}
switch -- $state {
{Standby} {return 0}
{Active} {return 1}
default {
sendAbort
error "PIXActive: failed to \
determine if this host \
active, host $iam, state \
$state, \
cable status $cablestatus"
}
}
}
Fail-over Devices [detection] (3)
Tcl/Expect Code
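The decision logic in PIXActive can also be exercised against captured `show failover` text rather than a live session. The sketch below is a pure-Tcl variant using the same two patterns as the slides, written on single lines; `parseFailover` is a hypothetical name, and the sample captures are made up and contain only the lines the slides search for.

```tcl
# Hypothetical offline variant of PIXActive: parse captured "show failover"
# text and return 1 (active) or 0 (standby), following the slides' rules.
proc parseFailover {text} {
    set cablestatus {NOT FOUND}
    set iam {}
    set state {}
    regexp {Cable status: ([^\r\n]*)} $text -> cablestatus
    regexp {This host: ([^ ]*) - ([^ \r\n]*)} $text -> iam state
    if {$cablestatus ne "Normal"} {
        error "cable status failure: $iam cable status: $cablestatus"
    }
    switch -- $state {
        Standby { return 0 }
        Active  { return 1 }
        default {
            error "cannot determine state: host $iam, state $state"
        }
    }
}

set sample "Failover On\r\nCable status: Normal\r\nThis host: Primary - Active\r\n"
puts [parseFailover $sample]
```

Testing the parser against saved captures from each firmware release is a cheap way to notice when the output format of `show failover` changes.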
Slide 21
proc ExpectConfigure {cmds} {
    ExpectConfigMode
    set count 0
    foreach cmd $cmds {
        send -s $cmd
        send "\r"
        expect {
            -ex {Type help or '?' for a list of available commands.} {
                sendAbort
                error "ExpectConfigure: invalid configuration command detected, check session log"
            }
            {(config)#} { }
            timeout {
                sendAbort
                error "ExpectConfigure: timed out waiting for (config)# after $cmd"
            }
        }
        # write ("commit") before exceeding the per-commit change limit
        if {[incr count] >= [JobVar MaxConfigurationLines]} {
            set count 0
            ExpectWriteConfig
            sleep 30
            ExpectConfigMode
        }
    }
    ExpectWriteConfig
}
Fail-over Devices [change volume]
Tcl/Expect Code
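ExpectConfigure interleaves batching with sending by keeping a counter. An alternative, sketched below under the assumption that the limit is a simple line count (such as the 200-line PIX conduit limit), is to split the command list before sending; `batchCommands` is a hypothetical name.

```tcl
# Hypothetical helper: split configuration commands into batches of at most
# maxLines, so each batch can be entered and written ("committed") before
# the next one starts.
proc batchCommands {cmds maxLines} {
    if {$maxLines < 1} {
        error "batchCommands: maxLines must be at least 1"
    }
    set batches {}
    for {set i 0} {$i < [llength $cmds]} {incr i $maxLines} {
        lappend batches [lrange $cmds $i [expr {$i + $maxLines - 1}]]
    }
    return $batches
}

# Example: 450 placeholder commands against a 200-line limit.
set cmds {}
for {set n 1} {$n <= 450} {incr n} {
    lappend cmds "placeholder command $n"
}
foreach batch [batchCommands $cmds 200] {
    puts "batch of [llength $batch] commands"
}
```

This yields batches of 200, 200, and 50 commands. Each batch could then be handed to ExpectConfigure, which already writes the configuration when it finishes, making every commit point explicit.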
Slide 22
Control Channel Problems
Tcl/Expect Issues/Code
• Loss of connection triggers Expect EOF
• Many scripts consider this retryable
– Often caused by transient network failure
– But what state was the device in, anyway?
• Distribute control to reduce risk
– Place control close to devices
– The greater the distance between control and controlled device, the greater the risk of network failure
expect {
....
eof { retry "lost connection, retry request" }
}
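The open question on this slide (what state was the device in?) suggests recording how far a job got before the connection dropped. A minimal sketch, assuming a job records a phase name as it proceeds; the phase names, `phaseIndex`, and the `safeToRetry` rule are hypothetical, and a real policy would be device- and job-specific.

```tcl
# Hypothetical job phases, in execution order. Phases before "configuring"
# have not changed the device, so losing the connection there is treated
# as safe to retry.
set ::phases {connecting login reading configuring writing done}

proc phaseIndex {phase} {
    set i [lsearch -exact $::phases $phase]
    if {$i < 0} {
        error "phaseIndex: unknown phase $phase"
    }
    return $i
}

# Return 1 if an EOF during $phase can be retried automatically, 0 if the
# device state must be checked (for example, by re-reading its configuration)
# before anything else is sent.
proc safeToRetry {phase} {
    return [expr {[phaseIndex $phase] < [phaseIndex configuring]}]
}

# Inside an expect block this could look like:
#   eof {
#       if {[safeToRetry $phase]} {
#           retry "lost connection, retry request"
#       } else {
#           error "lost connection during $phase; check device state"
#       }
#   }
foreach p {login reading configuring writing} {
    puts "$p -> [safeToRetry $p]"
}
```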
Slide 23
Turning Found Configurations into Data
Tcl/Expect Issues
• Retrieve configuration from a PIX:
# roam-request -a pixdevice -- req_class=AuditConfig \
action=import-pixconfig
• After the configuration is retrieved, import-pixconfig uses roam-pixload to parse it:
# roam-pixload -r $requestId
• roam-pixload extracts the relevant data from the configuration
– interface name, security level, MTU, speed/options, IP address, netmask, fail-over configuration, access and enable passwords
• Then pushes the found data back into the device profile:
# roam-device pixdevice -- iface.inside.speed=auto
# ...
• Device profile is used in combination with template to create configuration file
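roam-pixload itself is not shown. As an illustration only, the sketch below pulls a few of the listed fields out of PIX 6.x-style `interface`, `nameif`, `mtu`, and `ip address` lines and emits them as `key=value` settings in the style of the roam-device example. Only `iface.inside.speed` appears in the original; the other key names (`security`, `mtu`, `ip`, `netmask`) and the proc name `parsePixInterfaces` are hypothetical. It uses `dict`, so it assumes Tcl 8.5 or later.

```tcl
# Illustrative parser, not roam-pixload: extract interface settings from
# PIX 6.x-style configuration text into a flat dict of profile keys.
proc parsePixInterfaces {config} {
    set hwToName [dict create]
    set hwSpeed  [dict create]
    set profile  [dict create]
    foreach line [split $config "\n"] {
        set line [string trim $line]
        if {[regexp {^nameif (\S+) (\S+) security(\d+)$} $line -> hw name level]} {
            dict set hwToName $hw $name
            dict set profile "iface.$name.security" $level
        } elseif {[regexp {^interface (\S+) (\S+)} $line -> hw speed]} {
            dict set hwSpeed $hw $speed
        } elseif {[regexp {^mtu (\S+) (\d+)$} $line -> name mtu]} {
            dict set profile "iface.$name.mtu" $mtu
        } elseif {[regexp {^ip address (\S+) (\S+) (\S+)$} $line -> name ip mask]} {
            dict set profile "iface.$name.ip" $ip
            dict set profile "iface.$name.netmask" $mask
        }
    }
    # "interface" lines name hardware ports; map them to logical names.
    dict for {hw speed} $hwSpeed {
        if {[dict exists $hwToName $hw]} {
            dict set profile "iface.[dict get $hwToName $hw].speed" $speed
        }
    }
    return $profile
}

set config {
interface ethernet0 auto
interface ethernet1 100full
nameif ethernet0 outside security0
nameif ethernet1 inside security100
mtu inside 1500
ip address inside 10.1.1.1 255.255.255.0
}
set profile [parsePixInterfaces $config]
# Print settings in the key=value form used by roam-device.
dict for {key value} $profile {
    puts "$key=$value"
}
```

The two-pass mapping reflects that `interface` lines refer to hardware ports while the profile keys use the logical `nameif` names.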
Slide 24
• The only way to update device configurations without losing connections on most devices
• Often not possible: many device commands are not invertible
• Must take care to maintain control connectivity
• Cannot do it for firmware update
• Often difficult
– Often cannot just add onto the end of the existing configuration
– Can cause serious security issues
– Order-dependent configuration changes often cannot be made at all
• Much more difficult to do reliably than just replacing startup configuration and reloading
Differential Configuration
Tcl/Expect Issues
Slide 25
proc ConfigUpdate {} {
global spawn_id timeout
set system [getSystemPasswd \
[JobGetVar Access]]
set enable [getEnablePasswd \
[JobGetVar Access]]
if {[JobVarExists Failover] && \
[JobGetVar Failover]} {
ConnectFailover $system $enable \
[JobGetVar RemoteAddrList]
} else {
Connect [lindex [JobGetVar \
RemoteAddrList] 0] \
$system $enable
}
set conffile [ExpectGetConfig]
VerifyTarget $conffile
set oldconf [prepare_config $conffile]
set newconf [prepare_config \
[JobGetFile [JobGetVar ConfigFile]]]
ExpectConfigure [ComputeDeltaConfig \
$oldconf $newconf]
send "exit\r"
ExpectClose
}
Differential Configuration
Tcl/Expect Code
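ConfigUpdate calls `ComputeDeltaConfig`, which is not shown. The sketch below is a deliberately naive stand-in, `naiveDeltaConfig` (a hypothetical name), assuming each configuration is a list of whole-line commands and that removing a line means prefixing it with `no `. As the previous slide warns, many commands are not invertible this way and order matters, so a real implementation would need per-command knowledge; this only illustrates the set-difference core.

```tcl
# Naive, hypothetical delta: lines only in oldconf are negated with "no ",
# lines only in newconf are added. Removals come first so old settings are
# cleared before new ones are applied. Requires Tcl 8.5+ for dict.
proc naiveDeltaConfig {oldconf newconf} {
    set oldSet [dict create]
    foreach line $oldconf { dict set oldSet $line 1 }
    set newSet [dict create]
    foreach line $newconf { dict set newSet $line 1 }

    set delta {}
    foreach line $oldconf {
        if {![dict exists $newSet $line]} {
            lappend delta "no $line"
        }
    }
    foreach line $newconf {
        if {![dict exists $oldSet $line]} {
            lappend delta $line
        }
    }
    return $delta
}

set oldconf {{hostname pix1} {logging on} {mtu inside 1500}}
set newconf {{hostname pix1} {mtu inside 1400} {logging host inside 10.1.1.5}}
foreach cmd [naiveDeltaConfig $oldconf $newconf] {
    puts $cmd
}
```

The example output also shows why differential configuration is hard: a generated removal such as `no mtu inside 1500` may be unnecessary, or rejected outright, on a device where the new `mtu` line by itself would replace the value.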