Real-World Techniques for Automating Configuration of Network Devices @NANOG 24, Mark Epstein, CTO, Ponte, February 11th, 2002




Real-World Techniques for Automating Configuration of Network Devices @NANOG 24

Mark Epstein, CTO, Ponte

February 11th, 2002

Slide 2

www.ponte.com © 2000-2002 Ponte Communications, Inc.

The Challenge: Large Scale

Epstein’s rule of large numbers: responsibility for large numbers of anything that must be individually managed is a real pain

• Large firms have large numbers

– Specific business initiatives and functions

– Vendors, models, and instances of devices

– Employees and Operators

– Security breaches and breach attempts

• Additional Challenges

– High employee turnover

– More operators than device-savvy staff

Slide 3


[Figure: Service Provider Network diagram: NOC1, NOC2, Core WAN, regional POPs (RPOPs), the Internet, and many customers]

Slide 4


Many Devices working in Concert

[Figure: diagram of the Core WAN and a Regional POP: core routers, intermediary devices, and access routers, each access router serving customers]

Slide 5


Network Security Control: Ponte nsControl™ Architecture

[Figure: architecture diagram: Network Operations Center with CONTROL SERVER (delivery drivers, assembly templates, security service modules), business systems, operations support systems, network control points, a client control point, secured channels over the Internet/intranet, and home, branch, and headquarters offices]

Slide 6


Many Interrelated Problems

Tcl/Expect

• Issues

– Buffer skew and device prompts

– Timing and reset behavior

– Terminal servers

– Firmware revisions and delivery

– High Availability and Fail-over

– Control channel problems

– Using existing configurations

– Differential configuration

Slide 7


Prompts

Tcl/Expect Issues

• Buffer skew
– Your code isn’t looking at what you expect
– A typical enable prompt ends with “#”
– banner = “### authorized users only ###”
– enable prompt = “router23#”
– Expecting a bare “#” matches the banner, not the prompt

• Canonical vs. custom prompts
– Can cause buffer skew
– Know the prompts or be very flexible

• Strategies
– Resync with unique text
– Use time of output as additional sync?
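A bare “#” pattern matches the banner as readily as the prompt. A minimal sketch, assuming the device hostname is known in advance (`IsDevicePrompt` is a hypothetical helper, not part of the original scripts), that accepts only a final line consisting of the hostname followed by `>`, `#`, or `(config...)#`:

```tcl
# Hypothetical helper: return 1 only if the last line of an Expect buffer
# is a genuine prompt for the given hostname, 0 otherwise.
proc IsDevicePrompt {buffer hostname} {
    # Drop carriage returns and keep only the final line; banners and
    # command output always precede the prompt.
    set lines [split [string map [list "\r" ""] $buffer] "\n"]
    set last [string trimright [lindex $lines end]]
    # The line must begin with the exact hostname...
    set n [string length $hostname]
    if {$n == 0 || ![string equal -length $n $last $hostname]} {
        return 0
    }
    # ...followed only by ">" (user EXEC), "#" (enable), or "(config...)#".
    set rest [string range $last $n end]
    return [regexp {^(\([^)]*\))?[>#]$} $rest]
}
```

Inside an Expect action, a loose pattern such as `-re {[>#] ?$}` could call this on `$expect_out(buffer)` and `exp_continue` whenever it returns 0, so a “#” in a banner no longer ends the wait early.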

Slide 8


Prompts

Tcl/Expect Code

# resynchronize the buffer with text unlikely
# to occur in the input buffer
# PIX example: "who ?" prints a usage line
proc BufferResync {} {
    set buffer_data ""
    send "who ?\r"
    expect {
        -ex {usage: who [ip]} { set buffer_data $expect_out(buffer) }
        timeout { error "BufferResync: timed out waiting for usage text" }
    }
    # now safe to expect prompt
    expect {
        -ex {#} {}
        timeout { error "BufferResync: timed out waiting for prompt" }
    }
    # empty the input buffer
    expect {
        -re {.*} {}
    }
    # error out if anything arrived after the prompt
    expect -timeout 1 {
        -re {.} { error "BufferResync: unexpected output after prompt" }
        timeout {}
    }
    # erase any partially typed input (Ctrl-U)
    send "\x15"
    return $buffer_data
}

Slide 9


Speed & Timing

Tcl/Expect Issues

• Device reset behavior
– IOS devices disconnect the control terminal on a ‘reload’
– But still accept new connections
– And leave other active connections up until later in the reload process
– Thus it is difficult to detect when the device has completed its reset

• Typing speed: some devices are command-speed limited
– Device communication over slow serial lines
– Minimum-cost (i.e., slow) processors

• Inter-command speed can be naturally limited
– Throttle inter-command speed by processing intervening prompts
– But you cannot always depend on prompts
– Ex: when connecting through a terminal server to a device, do not send an initial [CR] too quickly or the device may drop it
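Because an IOS device keeps accepting connections partway through a reload, a successful connect proves little. One alternative to a single fixed sleep, sketched here under assumptions (the `WaitForDevice` name and the probe convention are hypothetical), is to wait at least the measured reset time and then poll with a probe that runs a real command, giving up after a deadline:

```tcl
# Hypothetical helper: after a reload, wait at least minMs (the measured
# reset time), then call a caller-supplied probe every intervalMs until it
# returns 1 or maxMs (measured from the start) has elapsed.
# The probe should run a real command on the device, not merely connect,
# since IOS still accepts new connections during part of the reload.
proc WaitForDevice {probe minMs maxMs intervalMs} {
    set start [clock milliseconds]
    # never poll before the known minimum reset time
    after $minMs
    while {[clock milliseconds] - $start < $maxMs} {
        if {[{*}$probe]} {
            return 1
        }
        after $intervalMs
    }
    return 0
}
```

The minimum wait still has to be measured per device type; polling only trims the slack above it and turns a hung reload into an explicit failure instead of a silent one.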

Slide 10


Speed Issues

Tcl/Expect Code

# ask for the reload, then wait 5 min before
# attempting to reconnect
ExpectReload
sleep 300
...
# slow our "typing" speed for a slow device
set sendRate [JobVar DeviceBitRate]
# device can only accept data at 25% of bit rate
set loadFactor 25
set send_slow [deviceSpeed $sendRate $loadFactor]
...
send -s "long data string\r"

• Measure actual device reset time, encode into scripts

• Different for every device type

• Sophistication makes sense, but still device-specific

• Slow command entry (“typing”) may be critical for reliable behavior
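The `deviceSpeed` helper above is not shown. A minimal sketch of what it might compute, assuming 8N1 serial framing (10 bits on the wire per character) and that the result is meant for Expect’s `send_slow` variable, a two-element list of characters per burst and seconds between bursts:

```tcl
# Hypothetical implementation of the deviceSpeed helper named above.
# Returns {charsPerBurst secondsBetweenBursts} for Expect's send_slow.
# Assumes 10 bits per character (start bit + 8 data bits + stop bit).
proc deviceSpeed {bitRate loadFactor} {
    if {$bitRate <= 0 || $loadFactor <= 0 || $loadFactor > 100} {
        error "deviceSpeed: bad bitRate $bitRate or loadFactor $loadFactor"
    }
    # characters per second the device can actually absorb
    set cps [expr {$bitRate / 10.0 * $loadFactor / 100.0}]
    # send one character per burst, spaced 1/cps seconds apart
    return [list 1 [format %.4f [expr {1.0 / $cps}]]]
}
```

With this sketch, `deviceSpeed 9600 25` returns `1 0.0042`: one character roughly every 4.2 ms, about 240 characters per second, which `send -s` then honors.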

Slide 11


Device Control via Terminal Servers 1-0

Tcl/Expect Issues

• Unpredictable prompt at connection
– Serial vs. virtual-terminal TCP connection
– Device may be in any state at all
– Get device into known state

• Terminal server port resets
– Terminal server ports get wedged
– Good configuration reduces this problem
– Need to be terminal-server-aware
– Pay careful attention to timeouts
– Rebooting the terminal server may cause device reboots!!
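Getting the device into a known state starts with recognizing the state it is in. A minimal sketch (the `PromptState` helper and its state names are hypothetical; the prompt shapes are the usual Cisco ones plus the c1900 menu handled by the login code) that classifies the last line seen on a fresh terminal-server session:

```tcl
# Hypothetical helper: classify the last line seen on a freshly opened
# terminal-server session so the script can drive the device to a known
# state. Order matters: config prompts must be tested before "#".
proc PromptState {lastLine} {
    set line [string trimright $lastLine]
    switch -regexp -- $line {
        {[Uu]sername:$}      { return login_user }
        {[Pp]assword:$}      { return login_password }
        {Enter Selection:$}  { return menu }
        {^rommon \d+ >$}     { return rommon }
        {\(config[^)]*\)#$}  { return config }
        {#$}                 { return enable }
        {>$}                 { return user_exec }
        default              { return unknown }
    }
}
```

A caller could then map each state to an action: log in for the `login_*` states, send `end` from `config`, send `K` at the c1900 `menu`, and send a carriage return and re-check for `unknown`.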

Slide 12


Device Control via Terminal Servers 1-1

Tcl/Expect Code

proc ExpectLogin {access} {
    set timeout 10
    set retries 3
    set passwordfailed 0
    expect {
        -ex {>} {}
        -ex {#} {
            warning "device was left in enable mode"
            send "disable\r"
        }
        -ex {sername:} {
            send "[getCSUserName $access]\r"
            exp_continue
        }
        -ex {assword:} {
            if {$passwordfailed == 0} {
                send "[getSystemPasswd $access]\r"
                set passwordfailed 1
            } else {
                error "System password was rejected"
            }
            exp_continue
        }
        -ex {Enter Selection:} {
            ;# for c1900, enterprise edition
            send "K"
            exp_continue
        }

Slide 13


Device Control via Terminal Servers 1-2

Tcl/Expect Code

        -ex {Press any key to continue.} {
            send "\r"
            exp_continue
        }
        -ex {Password required, but none set} {
            error "Connection closed by foreign host. Possible cause: no password on device"
        }
        eof {
            retry "Telnet connection to device closed unexpectedly"
        }
        timeout {
            set timeout 120
            if {$retries > 0} {
                incr retries -1
                send "\r"
                exp_continue
            } else {
                retry "Login timed out waiting for \"Password:\""
            }
        }
    }
}

Slide 14


Device Control via Terminal Servers 2-0

Tcl/Expect Issues

• “Console” output
– Usually the console (serial) port is the true “console”
– Terminal page length may be fixed over the serial port
– Asynchronous, unrelated output increases the need for resynchronization and fault tolerance
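When the page length cannot be changed on a serial console, pager markers end up inside captured output. A minimal sketch, assuming the PIX `<--- More --->` marker matched elsewhere in these scripts and an IOS-style ` --More-- ` marker erased with backspace, blank, backspace runs (the exact erase sequence varies by device and firmware, so check a session log), that cleans a captured buffer before parsing; `StripPagerArtifacts` is a hypothetical name:

```tcl
# Hypothetical helper: remove pager residue from captured device output.
proc StripPagerArtifacts {text} {
    # Normalize line endings first ("\r\n" is tried before lone "\r").
    set text [string map [list "\r\n" "\n" "\r" ""] $text]
    # Drop the pager markers themselves (PIX and IOS styles).
    regsub -all {<--- More --->| ?--More-- ?} $text "" text
    # Drop erase sequences: backspaces, blanks, backspaces.
    # The pattern is double-quoted so Tcl inserts literal backspace chars.
    regsub -all "\b+ *\b+" $text "" text
    # Drop any stray backspaces that remain.
    return [string map [list "\b" ""] $text]
}
```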

Slide 15


Device Control via Terminal Servers 2-1

Tcl/Expect Code

Suppress console output…

# suppress line width editing
sendCmd "terminal width 0"
# suppress console monitor messages
sendCmd "terminal no monitor"
... [do stuff] ...
sendCmd "terminal monitor"
# suppress "More" prompts
sendCmd "no pager"
...
sendCmd "pager"

Try, try again… (what to do when you can’t suppress console output)

for {set retries 0} {$retries < 3} {incr retries} {
    sendCmd "show version"
    set buffer_data [BufferResync]
    if {[regexp {VERSION: (\S+)} $buffer_data junk version]} {
        break
    }
}
if {![info exists version]} { error "could not read version after 3 attempts" }

Slide 16


Special Concerns RE Firmware

Tcl/Expect Issues

• Configuration file issues
– Commands may be added or removed
– Differences in meaning between versions
– Often must reconfigure to support firmware
– Wholesale firmware change (e.g., CatOS to IOS)

• Transfer concerns
– Distance vs. reliability
– Some devices require local access
– Pilot error
– TFTP
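Because commands and their meanings change between firmware versions, scripts often need to branch on the running version. A minimal sketch (hypothetical `VersionFields` and `VersionCompare` helpers) that compares Cisco-style version strings by their numeric fields only; train letters such as the `T` in `12.2(8)T` are ignored, an assumption that may be too coarse for real release trains:

```tcl
# Hypothetical helper: numeric fields of a version string as integers.
proc VersionFields {v} {
    set fields {}
    foreach f [regexp -all -inline {\d+} $v] {
        # scan %d reads "08" as decimal 8; Tcl 8.x expr would treat a
        # leading zero as octal and reject it
        lappend fields [scan $f %d]
    }
    return $fields
}

# Hypothetical helper: returns -1, 0, or 1, like a string comparison.
proc VersionCompare {a b} {
    set fa [VersionFields $a]
    set fb [VersionFields $b]
    set n [expr {max([llength $fa], [llength $fb])}]
    for {set i 0} {$i < $n} {incr i} {
        # a missing trailing field counts as 0, so "6.1" equals "6.1(0)"
        set x [expr {$i < [llength $fa] ? [lindex $fa $i] : 0}]
        set y [expr {$i < [llength $fb] ? [lindex $fb $i] : 0}]
        if {$x < $y} { return -1 }
        if {$x > $y} { return 1 }
    }
    return 0
}
```

A script could then pick a template or command set with a test such as `[VersionCompare $version $cutover] >= 0`, where the cut-over version comes from the vendor’s release notes rather than from this sketch.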

Slide 17


Fail-over Devices

Tcl/Expect Issues

• Active/standby and primary/secondary
– IP address vs. terminal server “mismatch”
– “Two men say they’re Jesus, one of them must be wrong”

• Change volume limits (PIX example: 200 lines of conduit changes per “commit”)

• New and expanded commands
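One way to stay under a per-commit change-volume limit such as the PIX’s 200 lines of conduit changes is to split the command list into fixed-size batches and write the configuration after each. A minimal sketch with a hypothetical `ChunkCommands` helper; the batch size is a parameter, not a vendor-documented constant:

```tcl
# Hypothetical helper: split a list of configuration commands into
# batches of at most maxLines, so each batch can be followed by a
# write/commit step.
proc ChunkCommands {cmds maxLines} {
    if {$maxLines < 1} {
        error "ChunkCommands: maxLines must be >= 1"
    }
    set batches {}
    for {set i 0} {$i < [llength $cmds]} {incr i $maxLines} {
        lappend batches [lrange $cmds $i [expr {$i + $maxLines - 1}]]
    }
    return $batches
}
```

For example, `foreach batch [ChunkCommands $cmds 200] {...}` would send each batch in configuration mode and write the configuration before starting the next; the final batch may be shorter than the limit.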

Slide 18


Fail-over Devices [detection] (1)

Tcl/Expect Code

proc PIXActive {} {
    set cablestatus {NOT FOUND}
    set iam {}
    set state {}
    send "\r"
    expect {
        {# $} {
            send "sho fai\r"
        }
        timeout {
            sendAbort
            error "PIXActive: timed out waiting for first prompt"
        }
    }
    expect {
        -re "Cable status: (\[^\r\n]*)" {
            set cablestatus $expect_out(1,string)
        }
        -re "(\n|\r)<--- More --->" {
            send " \r"
            exp_continue
        }
        timeout {
            sendAbort
            error "PIXActive: timed out searching for `Cable status:.*'"
        }
    }

Slide 19


Fail-over Devices [detection] (2)

Tcl/Expect Code

    expect {
        -re "This host: (\[^ ]*) - (\[^ \r\n]*)" {
            set iam $expect_out(1,string)
            set state $expect_out(2,string)
        }
        -re "(\n|\r)<--- More --->" {
            send " \r"
            exp_continue
        }
        timeout {
            sendAbort
            error "PIXActive: timed out searching for `This host:.*'"
        }
    }
    expect {
        -re "(\n|\r)<--- More --->" {
            send "q\r"
            exp_continue
        }
        {# $} {}
        timeout {
            sendAbort
            error "PIXActive: timed out waiting for final prompt"
        }
    }

Slide 20


Fail-over Devices [detection] (3)

Tcl/Expect Code

    if {$iam eq "Secondary" && $state eq "Active"} {
        JobRetAdd -append Warning failover_secondary_active \
            "Secondary PIX is Active, cable status: $cablestatus\n"
    }
    if {$cablestatus ne "Normal"} {
        sendAbort
        error "PIXActive: cable status failure: $iam Cable status: $cablestatus"
    }
    switch -- $state {
        Standby { return 0 }
        Active  { return 1 }
        default {
            sendAbort
            error "PIXActive: failed to determine if this host is active,\
                host $iam, state $state, cable status $cablestatus"
        }
    }
}
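Matching `show failover` output live makes the detection logic hard to test. A minimal sketch, assuming the output contains lines shaped like `Cable status: Normal` and `This host: Primary - Active` (the same shapes PIXActive matches), of a hypothetical pure-Tcl parser that can be exercised against a saved capture; any sample text used with it is illustrative, not a verbatim device capture:

```tcl
# Hypothetical parser for captured "show failover" text. Returns a dict
# with keys cable, host, and state; fields not found are left empty.
proc ParseFailover {text} {
    set result [dict create cable {} host {} state {}]
    if {[regexp {Cable status: ([^\r\n]*)} $text -> cable]} {
        dict set result cable [string trim $cable]
    }
    # first "This host:" line only; the "Other host:" line is ignored
    if {[regexp {This host: (\S+) - (\S+)} $text -> host state]} {
        dict set result host $host
        dict set result state $state
    }
    return $result
}
```

The same decisions PIXActive makes (warn when the secondary is active, fail unless the cable status is `Normal`, return 1 for `Active` and 0 for `Standby`) can then be tested against the returned dict without a device attached.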

Slide 21


Fail-over Devices [change volume]

Tcl/Expect Code

proc ExpectConfigure {cmds} {
    ExpectConfigMode
    set count 0
    foreach cmd $cmds {
        send -s $cmd
        send "\r"
        expect {
            -ex {Type help or '?' for a list of available commands.} {
                sendAbort
                error "ExpectConfigure: invalid configuration command detected, check session log"
            }
            {(config)#} { }
            timeout {
                sendAbort
                error "ExpectConfigure: timed out waiting for (config)# after $cmd"
            }
        }
        # write the configuration every MaxConfigurationLines commands
        # to stay within the per-commit change-volume limit
        if {[incr count] >= [JobVar MaxConfigurationLines]} {
            set count 0
            ExpectWriteConfig
            sleep 30
            ExpectConfigMode
        }
    }
    ExpectWriteConfig
}

Slide 22


Control Channel Problems

Tcl/Expect Issues/Code

• Loss of connection triggers Expect EOF

• Many scripts consider this retry-able
– Often caused by transient network failure
– But what state was the device in, anyway?

• Distribute control to reduce risk
– Place control close to devices
– Distance between control and controlled device == risk of network failure

expect {
    ....
    eof { retry "lost connection, retry request" }
}
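The open question on this slide is what state the device was in when the connection dropped. A minimal sketch, assuming the script records its current phase in a variable (the `sessionPhase` variable, the phase names, and `EofAction` are all hypothetical), that retries only when no change has been sent yet and otherwise asks for an audit:

```tcl
# Hypothetical: the script updates this as it moves through a session.
set ::sessionPhase connecting

# Decide what an eof handler should do, given the phase it happened in.
proc EofAction {phase} {
    switch -- $phase {
        connecting - login - reading { return retry }
        configuring - writing        { return audit }
        default                      { return abort }
    }
}

# Example use inside an expect statement (retry is the original helper):
#   eof {
#       switch -- [EofAction $::sessionPhase] {
#           retry   { retry "lost connection, retry request" }
#           audit   { error "lost connection mid-change; audit device first" }
#           default { error "lost connection" }
#       }
#   }
```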

Slide 23


Turning Found Configurations into Data

Tcl/Expect Issues

• Retrieve configuration from a PIX
#roam-request -a pixdevice -- req_class=AuditConfig \
    action=import-pixconfig

• After the configuration is retrieved, import-pixconfig uses roam-pixload to parse the configuration
#roam-pixload -r $requestId

• roam-pixload gets relevant data from the configuration
– interface name, security level, MTU, speed/options, IP address, netmask, fail-over configuration, access and enable passwords

• Then pushes the found data back into the device profile
#roam-device pixdevice -- iface.inside.speed=auto
#...

• Device profile is used in combination with a template to create the configuration file
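roam-pixload itself is not shown. As a rough illustration only, the following hypothetical `ParsePixConfig` extracts a few interface settings from PIX 6.x-style configuration text (`nameif`, `interface`, `ip address`, and `mtu` lines) into keys shaped like the slide’s `iface.inside.speed=auto`; every key name except `iface.<name>.speed` is an assumption, and fail-over and password data are not handled:

```tcl
# Hypothetical sketch: turn PIX-style configuration text into a dict of
# device-profile keys such as iface.inside.speed.
proc ParsePixConfig {text} {
    set profile [dict create]
    set hwToName [dict create]
    set lines [split [string map [list "\r" ""] $text] "\n"]
    # pass 1: "nameif <hardware-id> <name> security<N>" maps hardware ids
    # (e.g. ethernet1) to interface names (e.g. inside)
    foreach line $lines {
        if {[regexp {^nameif (\S+) (\S+) security(\d+)} $line -> hw name level]} {
            dict set hwToName $hw $name
            dict set profile iface.$name.security $level
        }
    }
    # pass 2: per-interface settings
    foreach line $lines {
        if {[regexp {^interface (\S+) (\S+)} $line -> hw speed]
                && [dict exists $hwToName $hw]} {
            dict set profile iface.[dict get $hwToName $hw].speed $speed
        } elseif {[regexp {^ip address (\S+) (\S+) (\S+)} $line -> name ip mask]} {
            dict set profile iface.$name.ip $ip
            dict set profile iface.$name.netmask $mask
        } elseif {[regexp {^mtu (\S+) (\d+)} $line -> name mtu]} {
            dict set profile iface.$name.mtu $mtu
        }
    }
    return $profile
}
```

Each key/value pair could then be rendered as a `key=value` argument in the form shown on the slide.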

Slide 24


Differential Configuration

Tcl/Expect Issues

• The only way to update device configurations without losing connections on most devices

• Often not possible: many device commands are not invertible

• Must take care to maintain control connectivity

• Cannot do it for firmware update

• Often difficult
– Often cannot just add onto end of existing configuration
– Can cause serious security issues
– Order-dependent configuration changes often cannot be made at all

• Much more difficult to do reliably than just replacing startup configuration and reloading

Slide 25


Differential Configuration

Tcl/Expect Code

proc ConfigUpdate {} {
    global spawn_id timeout
    set system [getSystemPasswd [JobGetVar Access]]
    set enable [getEnablePasswd [JobGetVar Access]]
    if {[JobVarExists Failover] && [JobGetVar Failover]} {
        ConnectFailover $system $enable [JobGetVar RemoteAddrList]
    } else {
        Connect [lindex [JobGetVar RemoteAddrList] 0] $system $enable
    }
    set conffile [ExpectGetConfig]
    VerifyTarget $conffile
    set oldconf [prepare_config $conffile]
    set newconf [prepare_config [JobGetFile [JobGetVar ConfigFile]]]
    ExpectConfigure [ComputeDeltaConfig $oldconf $newconf]
    send "exit\r"
    ExpectClose
}
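ComputeDeltaConfig is not shown. A deliberately naive, hypothetical sketch, assuming `prepare_config` returns a list of top-level configuration lines: it emits `no <line>` for each removed line and the line itself for each added one. It ignores sub-mode context, ordering, and commands that cannot be inverted, which are exactly the difficulties listed under Differential Configuration, so treat it as a starting point rather than a safe implementation:

```tcl
# Hypothetical, naive delta: removals first ("no ..."), then additions.
# Does not handle sub-modes, ordering, or non-invertible commands.
proc ComputeDeltaConfig {oldconf newconf} {
    set delta {}
    foreach line $oldconf {
        if {$line ni $newconf} {
            lappend delta "no $line"
        }
    }
    foreach line $newconf {
        if {$line ni $oldconf} {
            lappend delta $line
        }
    }
    return $delta
}
```

Emitting removals before additions is itself a policy choice: for access rules it can briefly leave traffic unfiltered, one of the security issues the slide warns about.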