DSM Scalability Considerations for Unicenter NSM r11 - Last Updated June 5 2006

Page 1: DSM Scalability Considerations for Unicenter NSM r11

DSM Scalability Considerations for Unicenter NSM r11

- Last Updated June 5 2006

Page 2: DSM Scalability Considerations for Unicenter NSM r11

© 2005 Computer Associates International, Inc. (CA). All trademarks, trade names, services marks and logos referenced herein belong to their respective companies.

Best Practice Summary – see notes

- 50k local objects polled in one DSM is fine for r11

- Manage polling to not exceed 600 polls per second

- Must configure the -m parameter (aws_snmp) to allow this load

- Manage the poll cycle so that average utilization stays above 20% and below 50% of the poll time window

- More than 100 DSMs can report to one MDB

Page 3: DSM Scalability Considerations for Unicenter NSM r11

Detailed DSM Performance

Page 4: DSM Scalability Considerations for Unicenter NSM r11

Objectives

- Understand issues affecting DSM performance

- Understand issues affecting scalability

- Consider architectural options

- Recommendations

Page 5: DSM Scalability Considerations for Unicenter NSM r11

Issues affecting DSM performance

Page 6: DSM Scalability Considerations for Unicenter NSM r11

Understand issues affecting DSM performance

- Hardware

- Local vs remote DSM(s)

- Cold start vs. warm start

- Electronic proximity to hosts

- Network configuration and congestion

- Number of hosts

- Number of managed objects

- Polling configuration

Page 7: DSM Scalability Considerations for Unicenter NSM r11

Hardware

- See Hardware Requirements in NSM r11 Implementation Guide for latest guidance

Page 8: DSM Scalability Considerations for Unicenter NSM r11

Hardware

- Does hardware matter?

- 30,000 objects ~= 2 subnets with 50 objects per host

[Chart: Hardware Comparison. Objects (0 to 35,000) vs. elapsed time (:00 to :40). Series: 2x3.0 GHz HT, 500 MHz]

Page 9: DSM Scalability Considerations for Unicenter NSM r11

Local vs remote DSM(s)

- For smaller implementations a local DSM on the MDB machine is OK

- For larger implementations, remote DSM(s) should be strongly considered

- DSM should be electronically close to what it polls and may connect to a remote MDB

Page 10: DSM Scalability Considerations for Unicenter NSM r11

Local vs remote DSM(s)

[Chart: Local vs Remote DSM. DSM objects (0 to 70,000) vs. elapsed time (:00 to :60). Series: Local DSM, Remote DSM (60k), Remote DSM (30k)]

Page 11: DSM Scalability Considerations for Unicenter NSM r11

Multiple Remote DSMs

- Multiple remote DSMs have a synergistic effect

[Chart: 2 Remote DSMs Startup. Objects (0 to 90,000) vs. elapsed time (:00 to :60). Series: Remote DSM 1, Remote DSM 2, Combined]

Page 12: DSM Scalability Considerations for Unicenter NSM r11

Local vs remote DSM(s)

- Combining a local DSM with a remote DSM is not as effective as using multiple remote DSMs

[Chart: Local & Remote DSM Startup. Objects (0 to 90,000) vs. elapsed time (:00 to :60). Series: Local DSM, Remote DSM, Combined]

Page 13: DSM Scalability Considerations for Unicenter NSM r11

Cold start vs. warm start

- Set “WarmStart=yes” option in %AGENTWORKS_DIR%\services\config\atmanager.ini

- Warm start uses previously discovered objects

- Reduces MDB access time

- Reduces discovery process time

- Must still confirm status
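A quick way to confirm the setting is to scan the file for it. The following is a minimal sketch, not a CA-supplied tool: the function name `warm_start_enabled` is hypothetical, and it assumes the file uses plain `key=value` lines, that `;` or `#` start comment lines, and that the key and value can be compared case-insensitively. The slide does not say which section the option belongs to, so the sketch ignores section headers.

```python
def warm_start_enabled(ini_text: str) -> bool:
    """Return True if the first WarmStart setting found is 'yes'.

    Assumptions (not confirmed by the slides): key=value syntax,
    ';' or '#' comment lines, case-insensitive key and value.
    """
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines that look like comments.
        if not line or line[0] in ";#":
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "warmstart":
            return value.strip().lower() == "yes"
    # No WarmStart line at all: treat as not enabled.
    return False
```

To use it, read the contents of %AGENTWORKS_DIR%\services\config\atmanager.ini into a string and pass that string to the function.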

Page 14: DSM Scalability Considerations for Unicenter NSM r11

Cold start vs. warm start

- Startup measured as time to DSM settling

DSM start complete

Page 15: DSM Scalability Considerations for Unicenter NSM r11

Cold start vs. warm start

- Startup elapsed times

[Chart: Cold Start vs. Warm Start. Elapsed time (0:00 to 1:26) by startup type. Bars: Cold Start, Warm Start]

Page 16: DSM Scalability Considerations for Unicenter NSM r11

Electronic proximity to hosts

- Standard best practice: no more than 3 hops

- High performance LAN access to hosts and MDB

- Avoid WAN links

- Given a choice, put a DSM close to what it polls, instead of close to its MDB

- Missed traps are an indication of excessive load or a busy network; reduce the distance that polls and traps must travel

Page 17: DSM Scalability Considerations for Unicenter NSM r11

LAN Polling

[Chart: Polls Per Second, 27,475 DSM objects, aws_snmp -m 600. Polls per second (0 to 4,500) vs. time of day (18:47:14 to 19:51:03). Series: Polls Per Second, Average, +2 Std Dev]

Page 18: DSM Scalability Considerations for Unicenter NSM r11

Network configuration and congestion

- DSM should usually handle whole subnets

- Fast/stable path to MDB

- Network utilization

- Errors, timeouts, and retries

- Missed traps must be addressed

- Poll cycle must have free time for load peaking

- Size counts

Page 19: DSM Scalability Considerations for Unicenter NSM r11

WAN Polling

[Chart: Polls Per Second, 35,178 DSM objects, aws_snmp -m 1000. Polls per second (0 to 4,000) vs. time of day (21:38:47 to 22:44:39). Series: Polls Per Second, Average, +2 Std Dev]

Page 20: DSM Scalability Considerations for Unicenter NSM r11

Number of hosts

- Affects startup and first stage discovery

- Affects total DSM object population

- Affects DSM host configuration

Page 21: DSM Scalability Considerations for Unicenter NSM r11

Number of objects

- Each managed host may spawn dozens of objects

- Agents

- Watchers

- Split DSMs to keep number of objects constrained

- Split DSMs to keep electronically close

- Obrowser and query with no argument display objects; the number of objects actually polled is usually smaller
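For rough planning, the object count and the number of DSMs it implies can be estimated from the host count. This is a minimal sketch with hypothetical function names. It uses the 50k objects per DSM figure from the best practice summary as the default limit, and the objects-per-host value is an input you must measure for your own environment. It sizes by object count only; it does not account for electronic proximity, which the deck treats as the more important reason to split DSMs.

```python
import math


def estimated_objects(hosts: int, objects_per_host: int) -> int:
    """Estimate the total DSM object population for a set of hosts."""
    return hosts * objects_per_host


def dsms_needed(total_objects: int, per_dsm_limit: int = 50_000) -> int:
    """Minimum number of DSMs so that none exceeds per_dsm_limit objects.

    The 50,000 default follows the deck's best practice summary;
    at least one DSM is always returned.
    """
    return max(1, math.ceil(total_objects / per_dsm_limit))
```

For example, 600 hosts at 50 objects per host gives 30,000 objects, which fits in one DSM under the 50k guideline, while 120,000 objects would call for at least three.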

Page 22: DSM Scalability Considerations for Unicenter NSM r11

Polling configuration – see notes

- Polling interval

- Polling rate for r11 DSM sustained at up to 1,000 polls/second (laboratory only – do not exceed 600)

- Speeds discovery (?)

- Not needed for status polling

- A 10 to 20 minute polling interval is still best practice

- 50,000 poll-able objects at 10 minute polling interval is about 80 polls/second

- Timeouts are critical

- Assume timeout 10, retry 2 = 30 second delay (one initial attempt plus two retries, 10 seconds each)

- DSM thread waits for reply or timeout on SNMPGET

- IP policy makes extensive use of SNMPGET
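The arithmetic behind the polling-rate and timeout bullets can be checked with a short sketch. The function names are hypothetical. The timeout model assumes that the initial request and each retry wait one full timeout before giving up, which is consistent with the 30 second figure on the slide but is not stated there.

```python
def polls_per_second(pollable_objects: int, interval_seconds: float) -> float:
    """Average poll rate if every object is polled once per interval."""
    return pollable_objects / interval_seconds


def worst_case_wait(timeout_seconds: float, retries: int) -> float:
    """Time a DSM thread may block on one unresponsive SNMP GET.

    Assumes the first attempt and each retry wait a full timeout.
    """
    return timeout_seconds * (1 + retries)
```

With 50,000 objects and a 10 minute (600 second) interval, `polls_per_second(50_000, 600)` comes to a little over 83, which the slide rounds to about 80. `worst_case_wait(10, 2)` gives the 30 second delay.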

Page 23: DSM Scalability Considerations for Unicenter NSM r11

Polling configuration

- Calculating polling rates

- Target no more than 50% MaxPollRate utilization and no less than 20% MaxPollRate utilization

- Example at 200 polls/sec: a five minute interval is 300 seconds, so do not attempt more than 30k polls in that interval (300 seconds x 0.50 x 200 polls per second = 30k objects polled every 5 minutes)

- Configure [aws_snmp] MaxPollRate in atservices.ini
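The utilization rule can be expressed as a small calculation. This is a sketch with hypothetical function names; the 20% and 50% bounds and the 200 polls/sec example come from the slide, and `max_poll_rate` stands for whatever value is configured as MaxPollRate.

```python
def max_polls_per_interval(max_poll_rate: float, interval_seconds: float,
                           utilization: float = 0.50) -> int:
    """Largest number of polls to schedule in one interval at the given utilization."""
    return round(max_poll_rate * interval_seconds * utilization)


def poll_utilization(objects_polled: int, max_poll_rate: float,
                     interval_seconds: float) -> float:
    """Fraction of the interval's polling capacity that is actually used."""
    return objects_polled / (max_poll_rate * interval_seconds)


def within_target(objects_polled: int, max_poll_rate: float,
                  interval_seconds: float,
                  low: float = 0.20, high: float = 0.50) -> bool:
    """True if utilization falls inside the recommended 20% to 50% band."""
    u = poll_utilization(objects_polled, max_poll_rate, interval_seconds)
    return low <= u <= high
```

`max_polls_per_interval(200, 300)` reproduces the 30k figure from the slide. `within_target` flags both overload (above 50%) and a rate set much higher than the load needs (below 20%).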

Page 24: DSM Scalability Considerations for Unicenter NSM r11

Issues affecting scalability

Page 25: DSM Scalability Considerations for Unicenter NSM r11

Issues affecting scalability

- Hardware

- What hardware is available?

- Can it support MDB + DSM?

- Network

- How electronically close are managed objects?

- Is there capacity to handle polling and trap traffic?

- How reliable is the network?

- Geographic proximity

- Do managed objects exist on other side of WAN?

- Polling

- What are the polling requirements?

Page 26: DSM Scalability Considerations for Unicenter NSM r11

Issues affecting scalability

- Type of host activity

- Web server

- Application server

- Database server

- Batch server

Page 27: DSM Scalability Considerations for Unicenter NSM r11

Architectural options

Page 28: DSM Scalability Considerations for Unicenter NSM r11

Architectural Options

- Local DSM

- Fine for smaller shops

- Add remote DSMs as necessary

- Add remote DSMs to improve performance

- Use several smaller DSMs

- Closer to managed objects (most important tuning choice!)

- Faster startup

- More robust (not single point of failure)

- Reduces effect of an outage

- Bridged MDBs

- Distribute MDBs for better DSM access – not critical unless bandwidth to MDB limited and high update activity

Page 29: DSM Scalability Considerations for Unicenter NSM r11

Questions?