Monitoring and Alerting
Ledion Bitincka, Search and Alerting Team
© Copyright Splunk 2011. The 2nd Annual Splunk Worldwide Users’ Conference
Intro … Ledion Bitincka (aka Splunk Albanian)
Search and Reporting Team @ Splunk for 4+ years - since 3.0
Things I’ve worked on:
- Key-value extractions
- Transactions, Eventtyping, Typeahead, Summary Indexing
- Monitoring and alerting framework
- Other random @#$%
Agenda
Why use Splunk for monitoring and alerting?
Basic alerting
Advanced alerts and config options
Real-time alerting and throttling (new in 4.2)
Alert Manager (new in 4.2)
Sneak peek into new features …
Feel free to interrupt when you don’t follow!!!
Life Without Splunk
Service Desk: Log call. The console says everything is green.
Application Support: App monitoring tools don’t show anything either. Call the developer.
Application Developer: Stop working on new code to troubleshoot. Need production logs!
Systems Administrator: Stop what they’re doing to identify and gather production logs for the developer.
Application Developer: Manual investigation establishes that it is not an application problem.
Database Administrator: DBA analyzes the logs, which point to corrupted database files.
Escalate. Escalate. Escalate. Respond. Escalate. Now what?
Life With Splunk
Service Desk: Trouble Ticket
Search on IP address shows related Web session and User ID
- Search: “192.168.169.100”
- Time range: Last 60 minutes
Search for failure or error across entire IT; a search at the same time reveals a database error due to corrupted files
- Search: failure OR error
- Time range: Last 2 minutes
Search for corruption in the db logs shows that an index file has been corrupted
- Search: host=db.domain.com source=*db.log corrupt*
- Time range: Last 1 minute
Set up monitoring and alerting for db file corruption
- Search: host=db.domain.com source=*db.log corrupt*
- Time range: Last hour
One Splunk. Many Uses.
Monitor and Alert in Real Time
1. Get data
- Scheduled search
- Real-time search
2. Evaluate alerting condition
- Alert Condition: Yes / No
3. Execute actions
- Yes: RSS, SNMP, Script
- No: Noop
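The three steps above can be sketched as a small Python model. This is only an illustration of the flow on the slide, not Splunk code: `Alert`, `run_once`, and the dict-based `Event` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Event = dict  # hypothetical: one search result as a dict of fields


@dataclass
class Alert:
    """Hypothetical model of one alert: a search, a condition, and actions."""
    get_data: Callable[[], List[Event]]               # step 1: scheduled or real-time search
    condition: Callable[[List[Event]], bool]          # step 2: alert condition
    actions: Iterable[Callable[[List[Event]], str]]   # step 3: e.g. RSS, SNMP, script


def run_once(alert: Alert) -> List[str]:
    """Run the three steps once and return what each action reported."""
    results = alert.get_data()
    if not alert.condition(results):
        return []  # "No" branch: no-op
    return [action(results) for action in alert.actions]


if __name__ == "__main__":
    events = [{"host": "db.domain.com", "_raw": "index file corrupt"}]
    alert = Alert(
        get_data=lambda: events,
        condition=lambda results: len(results) > 0,
        actions=[lambda results: f"script: {len(results)} event(s)"],
    )
    print(run_once(alert))
```

Keeping the condition separate from the actions mirrors the slide: the same search can feed different conditions, and any number of actions can hang off a single "Yes".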
Demo… (5-10 min)
Create simple alert using wizard …
Available alert actions …
Configure email settings (MTA, link hostname) ...
Advanced Alerting Options
Specify an advanced schedule using cron notation
Use custom alert conditions
Invoke scripts to perform custom actions
- Integrate with other tools
‣ file trouble ticket
- Other custom processing
‣ restart a faulty service
‣ update a firewall rule
‣ temporarily disable a user account
‣ etc …
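As a sketch of the "file trouble ticket" case, a script action might look roughly like the Python below. The positional-argument mapping follows Splunk's scripted-alert documentation as best recalled here, so treat it as an assumption and verify it against your version. `ARG_NAMES`, `parse_alert_args`, and `build_ticket` are hypothetical names, and the ticket is only formatted, not submitted anywhere.

```python
import sys
from typing import Dict, List

# Assumed positions of the arguments Splunk passes to an alert script.
# Position 7 is skipped because it is documented as deprecated.
ARG_NAMES = {
    1: "event_count",
    2: "search_terms",
    3: "query_string",
    4: "saved_search_name",
    5: "trigger_reason",
    6: "results_url",
    8: "results_file",
}


def parse_alert_args(argv: List[str]) -> Dict[str, str]:
    """Map positional arguments to names; positions that are absent are skipped."""
    return {name: argv[pos] for pos, name in ARG_NAMES.items() if pos < len(argv)}


def build_ticket(args: Dict[str, str]) -> str:
    """Hypothetical ticket text; a real script would call the ticketing tool here."""
    name = args.get("saved_search_name", "unknown alert")
    reason = args.get("trigger_reason", "no reason given")
    return f"[Splunk alert] {name}: {reason}"


if __name__ == "__main__":
    print(build_ticket(parse_alert_args(sys.argv)))
```

The same skeleton would fit the other examples on the slide (restart a service, update a firewall rule): only the body of the final step changes.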
Real-time Search Primer
Searches forward in time
Never completes (unless stopped)
- Constantly updating result set
- Only generates results preview
All search commands supported
Scheduled Search Alerts
[Diagram: Splunkd/Scheduler and Search Process over time]
Splunkd/Scheduler: Start historical search
Search Process: Search, then Search done and Notify splunkd
Condition met on Results?
- N: Done
- Y: check Suppress?
Suppress?
- Y: Done
- N: Execute actions, Update artifact TTL, Suppression update, Alert manager
Logging: audit.log, search.log, splunkd_access.log, scheduler.log
Real-time Alerts
[Diagram: Splunkd/Scheduler and Search Process over time]
Splunkd/Scheduler: Start RT search
Search Process: RT Search, Notify splunkd with each results preview (ResPrev)
Condition met on ResPrev?
- N: wait for the next results preview
- Y: check Suppress?
Suppress?
- Y: Done
- N: Execute actions, Update artifact TTL, Suppression update, Alert manager
Results Snapshot
Logging: audit.log, search.log, splunkd_access.log, scheduler.log
Real-time Alerts
Reduce response time
Continuously monitor a condition
Scheduler ensures real-time search is always running
Throttling is almost always necessary
Compatible with all alert actions
Visible through Alerts Manager
Alert Throttling
Natively support alert action throttling
Useful in:
- Alert when database server is down, but don’t alert me about this condition for one hour
Available for both standard and real-time alerts
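The "down, but don't alert me again for one hour" behavior can be sketched in Python as below. This is a minimal model of the idea, not Splunk's implementation: the `Throttle` class is hypothetical, and it assumes the suppression window is measured from the last action that actually fired rather than being extended by suppressed triggers.

```python
from typing import Optional


class Throttle:
    """Suppress repeat alert actions for `period_s` seconds after one fires."""

    def __init__(self, period_s: float) -> None:
        self.period_s = period_s
        self.last_fired: Optional[float] = None  # time of the last fired action

    def should_fire(self, now: float) -> bool:
        """Return True and start a new window, or False while suppressed."""
        if self.last_fired is not None and now - self.last_fired < self.period_s:
            return False
        self.last_fired = now
        return True


if __name__ == "__main__":
    throttle = Throttle(period_s=3600)  # one hour
    # The condition is met at these times (seconds); only some triggers fire.
    for t in [0, 600, 3599, 3600, 4000]:
        print(t, throttle.should_fire(t))
```

With a real-time alert the condition can be met on every results preview, which is why the deck calls throttling "almost always necessary" there.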
Alerts Manager
System-wide view of all triggered alerts
Basic alert management features
Ability to drill down and view why the alert was triggered
Real-time alert results are snapshots in time when triggered
Demo… (5-10 min)
Show custom alert conditions, when to use them
Demo real-time alerts:
- Throttling
- Alert manager
Managing Search Load
System-wide concurrent searches are limited to
- Total: 4 + 4 x number of cores
- Limit used for ad-hoc and scheduled searches
Scheduler queues searches that exceed the limit
Scheduler allocation is configurable in limits.conf
- [scheduler]
- max_searches_perc = 25 // percentage of system wide concurrent searches to use
- …
Use the Scheduler Activity dashboards
Search App >> Status >> Scheduler Activity >> Overview
Search allocation
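The arithmetic behind these limits can be worked through with a short Python sketch. The formula and the 25% default come from the slide; the function names are hypothetical, and rounding the scheduler's share down is an assumption.

```python
def max_concurrent_searches(cores: int, base: int = 4, per_core: int = 4) -> int:
    """System-wide limit from the slide: 4 + 4 x number of cores."""
    return base + per_core * cores


def scheduler_search_limit(cores: int, max_searches_perc: int = 25) -> int:
    """Scheduler's share of the system-wide limit.

    Assumption: a fractional result is rounded down.
    """
    return max_concurrent_searches(cores) * max_searches_perc // 100


if __name__ == "__main__":
    for cores in (2, 4, 8):
        total = max_concurrent_searches(cores)
        sched = scheduler_search_limit(cores)
        print(f"{cores} cores: {total} total, {sched} for the scheduler")
```

Under these assumptions an 8-core machine allows 36 concurrent searches, 9 of them scheduled; further scheduled searches wait in the scheduler's queue.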
.conf & .log File Summary
savedsearches.conf
- Search string, schedule, alert condition, actions etc…
alert_actions.conf
- Alert action options such as: email server, format, subject line, ttl etc…
limits.conf
- Scheduler’s concurrent search limit
- Action execution related limits
scheduler.log
Look in $SPLUNK_HOME/etc/system/README/<filename>.conf.spec for more detailed info
Sneak Peek Into New Features
Per result alerting and throttling
More alert actions to enable more complex alerting conditions
- Once five failed login attempts occur, enable a monitoring search that alerts on suspicious user activity
Now You Should Know …
How scheduled and real-time alerts work
Create simple and advanced real-time alerts
Enable alert throttling and check for throttled alerts
Check fired alerts using Alerts Manager
Change scheduler limit defaults
Be an IT hero
August 15, 2011
Questions?
Ledion Bitincka, Search and Alerting Team