Fixing Domino Server Sickness

From Engage 2014 - Breda, NL Updated presentation on working with Domino tools to analyse and fix problems

#engageug

Fixing Server Sickness

Gabriella Davis Technical Director

The Turtle Partnership


Fixing Your Server
• What causes server sickness
• Tools to spot sickness
• Getting Your Server Back to Full Health


Server Sickness
• The problem with Domino
• How does a server get sick?
  • Vulnerabilities
  • Aging Configurations
  • Bad Habits
  • Developers Gone Wild


The Problem With Domino
• “My Server Is Running Fine”
• Server Stability
  • Often despite our best efforts
• Tasks that just run
  • even without being properly configured


Vulnerabilities
• Start with the OS
  • patch levels
  • unnecessary processes with exposed ports
  • disk and data security
• Then the hardware
  • It’s all about disk performance
  • Using a SAN? Is the SAN configured for Domino?
  • Transaction logs configured?


Vulnerabilities
• Security
  • ACLs
  • -Default- and Anonymous
  • LocalDomainServers
• HTTP vs HTTPS
• LDAP
• DIIOP
• Sametime


Aging Configurations
• What can give you problems over time
  • Database sizes
  • More users
  • More tasks and features


Bad Habits
• What are your users doing?
  • what features are they using
  • how are they using them
    • are they creating repeating 10yr appointments, for instance
  • are they copying themselves on emails
• Password quality for HTTP passwords


Giving Developers Power
• Allowing development to dictate replication and agent scheduling
• The curse of XPages code that has not been production tested
• Demands for “LDAP” or “DIIOP” for an application to work


Tools to Spot Sickness
• Understanding Priorities
• DDM Probes and Event Analysis
• Statistics
• Catalog.nsf
• QoS - new with Domino 9
• Enhanced Fault Reporting - new with Domino 9


Understanding Priorities
• Server role
  • What do you want from your server
• What are statistics telling you
  • Warning Levels
  • Is it safe to ignore ‘Warning (Low)’ and focus on ‘Fatal’ or ‘Failure’


Bringing Problems to You
• Event Handlers, Event Generators, Statistics, Fault Reports and DDM Probes - where to start
• Setting Statistic Thresholds
• Choosing and configuring probes
• Reviewing Faults
• Setting up QoS behaviour


Bringing Problems To You
• Why we set up collection hierarchies for DDM
  • and how
• Daily and Weekly DDM reviews
  • What to look out for


Probes for Mail Servers
• Security - Weekly
• Directory Performance
• Critical mail routes
• Mail ‘Slack’


Probes for Application Servers
• Agent run times
  • agent cpu usage
• Security and Web Configuration


Probes for Struggling Servers
• OS level
  • disk performance (beware of reported SAN problems)
  • memory
  • network


What to look for
• Fatal problems
• Persistent Warnings
• Peak activity behaviour
  • uptick in problems at 9am, 1pm etc.
• Repetitive low level ‘annoyances’
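
One way to check for the peak-activity pattern is to bucket event timestamps by hour of day. The following is only a sketch in Python: the event times are made-up sample data and `busiest_hours` is a hypothetical helper, not part of DDM or any Domino tool.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(timestamps, top=2):
    """Return the `top` hours of day with the most events, busiest first."""
    counts = Counter(ts.hour for ts in timestamps)
    # Sort by count descending, then by hour so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical event times, for example copied out of an event or log view.
events = [
    datetime(2014, 3, 17, 9, 2),
    datetime(2014, 3, 17, 9, 15),
    datetime(2014, 3, 17, 9, 40),
    datetime(2014, 3, 17, 13, 5),
    datetime(2014, 3, 17, 13, 20),
    datetime(2014, 3, 17, 16, 30),
]

for hour, count in busiest_hours(events):
    print(f"{hour:02d}:00 - {count} events")
```

If the same hours keep coming out on top day after day, that points at load (logins, lunchtime mail) rather than a one-off fault.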


Catalog.nsf
• Not every database is immediately visible but they are all there (just hidden with selection formulae)
• It’s a good place to start looking for multiple replicas
• It’s a good place to find ACL issues
• Replicates around your domain and updates overnight
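
Looking for multiple replicas comes down to grouping catalog entries by replica ID. A minimal Python sketch, assuming the entries have already been exported as (replica ID, server, path) rows; the rows, server names and `find_multiple_replicas` helper are all hypothetical.

```python
from collections import defaultdict


def find_multiple_replicas(entries):
    """Group (replica_id, server, path) rows and keep IDs that appear more than once."""
    by_replica = defaultdict(list)
    for replica_id, server, path in entries:
        by_replica[replica_id].append((server, path))
    return {rid: locs for rid, locs in by_replica.items() if len(locs) > 1}


# Hypothetical rows; real data would come from an export of catalog.nsf.
rows = [
    ("85257C0A:0051F3B2", "Mail01/Acme", "mail/jsmith.nsf"),
    ("85257C0A:0051F3B2", "Mail02/Acme", "mail/jsmith.nsf"),
    ("85257C0A:0051F3B2", "Mail02/Acme", "archive/jsmith_old.nsf"),
    ("85257C11:0042AA10", "App01/Acme", "apps/crm.nsf"),
]

dupes = find_multiple_replicas(rows)
for replica_id, locations in dupes.items():
    print(replica_id, locations)
```

Two replicas of the same database on one server (as with Mail02 in the sample) are usually the ones worth investigating first.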


QoS - Quality of Service
• Monitors server health and performance
• Monitors application behavior, stability and hangs
• Restarts Domino if it thinks there are memory issues or an application is hung
• Shuts down Domino if a clean shutdown doesn’t happen and the server hangs
• Controlled via notes.ini settings and dcontroller.ini
• Requires Domino to be running under the Java Controller
  • nserver -jc


QoS Configuration
• Starting Domino under Java Controller should create a dcontroller.ini file
  • QOS_Enable=1
• In notes.ini
  • QOS_ProbeInterval (defaults to 1 min)
  • QOS_ProbeTimeout (defaults to 5 mins)
  • QOS_Shutdown_Timeout
  • QOS_Apps_Timeout
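
A quick way to see which of these settings a server actually has is to scan the ini text for them. The Python sketch below is illustrative only: the setting names are copied from the slide (check the exact spelling against your Domino documentation), the sample ini text is made up, and `report_qos` is a hypothetical helper.

```python
# Setting names as written on the slide; verify them against the product docs.
SLIDE_QOS_SETTINGS = [
    "QOS_ProbeInterval",
    "QOS_ProbeTimeout",
    "QOS_Shutdown_Timeout",
    "QOS_Apps_Timeout",
]


def parse_ini(text):
    """Parse KEY=VALUE lines into a dict keyed by lower-cased setting name."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and lines such as "[Notes]" that have no "=".
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def report_qos(text):
    """Return {setting: value or None} for each QoS setting named on the slide."""
    found = parse_ini(text)
    return {name: found.get(name.lower()) for name in SLIDE_QOS_SETTINGS}


# Made-up notes.ini fragment for illustration.
sample = """[Notes]
ServerTasks=Replica,Router,Update,AMgr,AdminP,HTTP
QOS_ProbeInterval=1
QOS_ProbeTimeout=5
"""

for name, value in report_qos(sample).items():
    print(name, "=", value if value is not None else "(not set)")
```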


QoS - Potential Problems
• QoS doesn’t support passwords on server IDs; the restart will pause at the password entry screen
• QoS timeouts being too low
• Don’t enable QoS on servers without transaction logging


Enhanced Fault Reporting
• Fault Reporting Database - lndfr.nsf
• Expanded to include a By Disposition view
  • all faults, when analyzed, have a disposition value that categorises them as
    • Problem
    • Possible Problem (possibly actionable)
    • Possible Problem (likely NOT actionable)
    • Informational
    • Unknown (investigate)


Possible Problem - Actionable
• Out Of Memory: Represents a crash in which the Java virtual machine (JVM) ran out of a memory resource such as heap space.
• Launched Notes multiple times: Indicates that the user quickly launched multiple instances of the Notes client.
• Possible hang: Indicates that the Notes client was manually terminated while it appeared to be doing useful work.
• User Kill: Indicates that the user manually terminated the client while it appeared to be waiting for input or a network timeout.


Back to Full Health
• Getting Control
  • Mail, Databases and ECLs
  • SMTP
  • Agent Scheduling
  • Directories
  • Adminp
  • LDAP
  • Tasks and Internet Site Documents
• Domino Configuration Tuner


Getting Control - Mail and Databases
• Setting ACLs at directory level (Editor)
• Lock down ECLs via Policies
• Introducing quotas alongside server based archiving
• Consider archiving files to a dedicated server
• Upgrade to 8 and enable OOO router instead of agents
• Disable forwarding rules set up by users
• Use message tracking and mail rules very sparingly
• Disable on the fly searching of non indexed databases


Database Management Tools
• DBMT Server Command
  • runs copy-style compact operations
  • purges deletion stubs
  • expires soft deleted entries
  • updates views
  • reorganizes folders
  • merges full-text indexes
  • updates unread lists
  • ensures that critical views are created for failover
• Replaces Updall
• Load updall -nodbmt tells updall to run but not perform the functions that DBMT already does


DBMT Parameters
• -compactThreads
• -updallThreads
• -ftiThreads
• -timeLimit refers to compact timeout for DBMT
• -range starttime stoptime
  • compactNdays (run Compact every x days)
  • ftiNdays (run FT Index every x days)
  • force d (day Sunday =1) fixup if compact fails for consecutive day
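
To see how these parameters fit together on one console line, here is a small Python sketch that assembles a command string from the dashed flags listed on the slide. The `load dbmt` prefix is assumed by analogy with `Load updall`, the thread counts, time limit and time format are invented example values, and `build_dbmt_command` is a hypothetical helper; check the exact syntax against the DBMT documentation for your release before using it.

```python
def build_dbmt_command(compact_threads=None, updall_threads=None,
                       fti_threads=None, time_limit=None, time_range=None):
    """Assemble a console command string from the flags named on the slide."""
    parts = ["load", "dbmt"]
    options = [
        ("-compactThreads", compact_threads),
        ("-updallThreads", updall_threads),
        ("-ftiThreads", fti_threads),
        ("-timeLimit", time_limit),
    ]
    for flag, value in options:
        if value is not None:  # only emit flags that were actually supplied
            parts += [flag, str(value)]
    if time_range is not None:
        start, stop = time_range
        parts += ["-range", start, stop]
    return " ".join(parts)


# Example values are illustrative only, not recommendations.
command = build_dbmt_command(
    compact_threads=4,
    updall_threads=4,
    time_limit=120,
    time_range=("1:00AM", "5:00AM"),
)
print(command)
```

Building the line from named arguments makes it easier to keep one reviewed definition per server role instead of hand-typed variations.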


Getting Control - SMTP
• Restrict relaying to specific IP addresses, not network ranges
• Beware of allowing authenticated relaying and opening up to dictionary attacks
• Restrict rights to send to internal groups from internet addresses
• Don’t accept mail for local part matches
• Configure your server for HTML mail not plain text


Getting Control - SMTP (more)
• Don’t allow all connecting hosts to deliver mail inbound; if you use a service, restrict to those hosts
• Use services / tools to spot attacks such as
  • persistent attempts to mass deliver within a time period
  • continual failures by a host to deliver to a correct address
• Move responsibility for that first line of defense away from native Domino
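
The first attack pattern, mass delivery attempts within a time period, is essentially a sliding-window count per connecting host. The Python sketch below shows the idea on made-up data; the threshold, window and `flag_mass_senders` helper are assumptions for illustration, and in practice this job belongs to a dedicated gateway or service rather than a script.

```python
from collections import defaultdict, deque


def flag_mass_senders(attempts, max_attempts, window_seconds):
    """Return hosts with more than `max_attempts` deliveries inside any window.

    `attempts` is an iterable of (epoch_seconds, host) pairs in time order.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, host in attempts:
        queue = recent[host]
        queue.append(ts)
        # Discard attempts that are older than the window ending at `ts`.
        while ts - queue[0] > window_seconds:
            queue.popleft()
        if len(queue) > max_attempts:
            flagged.add(host)
    return flagged


# Made-up connection log using documentation-range IP addresses.
log = [
    (0, "203.0.113.5"),
    (10, "203.0.113.5"),
    (15, "198.51.100.7"),
    (20, "203.0.113.5"),
    (25, "203.0.113.5"),
    (400, "198.51.100.7"),
]

print(flag_mass_senders(log, max_attempts=3, window_seconds=60))
```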


Getting Control - Agent Scheduling
• When are agents set to run
  • amgr_newmaileventdelay
  • amgr_newmailagentmininterval
• If you’re using OOO agents how often are they scheduled
• Do users have private agents running
  • Sh Agents [DBName]
  • All shared and private agents in a database
• Who has rights to run agents


Getting Control - Directories
• Avoid adding additional views to the Domino Directory
• The risk of allowing local replicas with Author rights
• Directory Assistance
  • Sh xdir


Getting Control - Adminp
• Purge old documents
• Requests awaiting approval
• Tell adminp process NEW not ALL


Getting Control - LDAP
• Allowing anonymous access to query LDAP
• Authenticating LDAP queries
• Extended Directory Catalog used by LDAP
• Relying on DNS
• Not configuring the LDAP task correctly to allow large searches with no timeouts
• Maintaining schema.nsf


Getting Control - Tasks and Program Documents
• Disable tasks you don’t need
• Schedule overnight tasks so they don’t overlap
  • and don’t conflict with backups
• Use program documents so you can review and manage easily
  • sh config servertasksat*
• Keeping templates on every server
• Using compact -B
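
Checking that overnight tasks and backups don't overlap is a simple interval comparison once the run windows are written down. A minimal Python sketch, assuming each task has a known start and end time on the same night (no windows crossing midnight); the schedule and `find_overlaps` helper are hypothetical.

```python
from itertools import combinations


def to_minutes(hhmm):
    """Convert 'HH:MM' (24-hour) to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def find_overlaps(windows):
    """Return pairs of task names whose [start, end) windows overlap."""
    clashes = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(windows.items()), 2
    ):
        # Two windows overlap when each one starts before the other ends.
        if to_minutes(a_start) < to_minutes(b_end) and to_minutes(b_start) < to_minutes(a_end):
            clashes.append((a, b))
    return clashes


# Made-up overnight schedule; real times would come from program documents.
schedule = {
    "backup": ("01:00", "03:00"),
    "compact": ("02:30", "04:00"),
    "updall": ("04:00", "05:00"),
}

print(find_overlaps(schedule))
```

The end times are the hard part in practice, so base them on observed run times rather than the scheduled start alone.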


Getting Control - Internet Site Documents
• Web Configuration means TCPIP tasks are configured in the server document and are server wide
  • often enabled by default
• Internet site documents require you to opt in for TCPIP services
  • configured by hostname


Domino Configuration Tuner
• Domino Configuration Tuner is an analysis tool based on a set of pre-configured best practice/worst practice rules
• The rules are shipped by IBM with the Lotus installs and are updated via a public update site
• Makes recommendations on configuration changes to enhance performance and security and reduce TCO


How does it work?
• Run and installed via the Domino Configuration Tuner database
• Updated by online template updates and rule updates
• DCT rules and results are held in a local database and will require a restart of the client for changes to take effect
• Scans
  • Server documents
  • notes.ini settings
  • advanced database properties
• Intended to scan servers in a single domain


How does it work?
• Creates reports on each scanned server based on the rules you select
• Each report contains
  • Issues
  • recommendations for adjustments
  • links to supporting documentation


Pre-requisites
• v8 Notes client (standard or basic) or administrator
• dct.nsf database and dct.ntf template
• servers 7.x or higher


Setup
• DCT.NSF
• StdDominoConfigTuner Template (dct.ntf)
• ID must have reader access to names.nsf
• ID must have ‘View Administrator’ rights
• Requires no server or domain changes


View Administrator Rights
• Server Document
• Security Tab
• View Administrator is a subset of ‘Administrator’ rights
• Think of it as ‘Show’ not ‘Tell’ rights
  • Sh users - YES
  • tell http refresh - NO


DCT Preferences
• List of all rules
• Review rule, description and supporting documentation
• All rules are enabled by default for all scans
• Enable and Disable rules


DCT Updates
• Connects to the IBM site to download
  • must have outbound connectivity


DCT Updates
• Click ‘check for updates’
• Connects to an external IBM site to identify any template or rule updates


DCT Updates
• Accept license and updates download
• It’s not possible to selectively download


DCT Updates - Finished
• “Successful” screen will notify you to restart your client
• You may need to do 2 client restarts before DCT can be used


Running the tuner
• First select the servers in your current domain you want to run against
• The list of servers is retrieved from the domain of the home server identified in your location document
• Change locations to scan a different domain


Running the tuner
• You can manually type in the full hierarchical names of any other servers you want to scan as part of this analysis
• Separate multiple server names with commas, semicolons or new lines
• You can only scan servers you can reach so you need a connection document to any you list
  • or the server needs to be available via your passthru server in your location
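
If you keep the list of extra servers in a text file, the separator rule (commas, semicolons or new lines) is easy to mirror when checking the list beforehand. A small Python sketch; the server names are invented and `split_server_names` is a hypothetical helper that only illustrates the separator rule, not how DCT itself parses the field.

```python
import re


def split_server_names(raw):
    """Split a server list on commas, semicolons or new lines and trim blanks."""
    parts = re.split(r"[,;\n]+", raw)
    return [name.strip() for name in parts if name.strip()]


# Invented hierarchical server names using a mix of all three separators.
raw = "Mail01/Acme, Mail02/Acme;App01/Acme\nHub01/Acme\n"

for name in split_server_names(raw):
    print(name)
```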


Understanding the Results
• Summary results
• Issues by criticality


Understanding the Results
• Summary results
• Servers that failed to scan
  • reason why scan failed


Understanding the Results
• Summary results
• Detailed list of rules evaluated


Understanding the Results
• View the current report
• Select ‘change’ to view a different report


Understanding the Results
• Filter results to make analysis easier
  • by server
  • by specific rules
  • by severity


Understanding the results
• Categorised results of recommendations
• Sorted by criticality and then by server name


Understanding the results
• Each recommendation comes with an explanation so you can evaluate on a result by result basis if you want to make the change


Understanding the results
• Each recommendation is provided with a link to supporting best / worst practice documentation


Working with Rules
• Disabling and enabling rules can be done through the ‘Preferences’


Working with Rules
• Selecting a rule shows the description and links to the best / worst practice documentation


Making Changes
• Advanced Database Properties
  • assigned en masse via Domino Admin
• notes.ini settings
  • assigned via the command set config xxx = x
  • shown via the command sh config xxx
• Many recommendations refer to ‘some databases’ but don’t specify which ones - check which ones will be affected


Resources
• Domino Configuration Tuner blog
  • http://www.bleedyellow.com/blogs/DCT/
  • details and explanations of new rules published each month


Summary
• No matter how well your servers are configured they will continue to degrade in performance over time unless you pro-actively monitor and fix
• Many of the server performance issues will be seen first by your users before they filter down to you
• Make reviewing your server configuration using DDM probes followed by a DCT analysis part of every server upgrade
• Enable probes that are specific to the server role: Mail and Directory probes on Mail servers and Agent probes on Application servers
• Use Security and Database probes configured in DDM to stay on top of any low level warnings that could cause larger problems in the future
• Don’t over configure your servers to monitor everything or you’ll be looking for a needle in a haystack. Ask your servers to tell you only what you need to be aware of immediately
• Use the built in tools (DCT, Statistics, DDM, Catalog, Activity Trends) to monitor your servers and gain a good understanding of what is their ‘normal’ behaviour so you can more easily spot when something goes wrong.


Questions

How to contact me: Gabriella Davis [email protected] Twitter: gabturtle