Monitoring and Alerting

Ledion Bitincka, Search and Alerting Team

Splunk .conf2011: Real Time Alerting and Monitoring




© Copyright Splunk 2011. The 2nd Annual Splunk Worldwide Users’ Conference

Intro … Ledion Bitincka (aka Splunk Albanian)

Search and Reporting Team @ Splunk for 4+ years, since 3.0
Things I’ve worked on:
- Key-value extractions
- Transactions, Eventtyping, Typeahead, Summary Indexing
- Monitoring and alerting framework
- Other random @#$%


Agenda

- Why use Splunk for monitoring and alerting?
- Basic alerting
- Advanced alerts and config options
- Real-time alerting and throttling (new in 4.2)
- Alert Manager (new in 4.2)
- Sneak peek into new features …

Feel free to interrupt if you don’t follow!


Life Without Splunk

Roles involved: Service Desk, Application Support, Systems Administrator, Application Developer, Database Administrator

1. Log call. The console says everything is green.
2. App monitoring tools don’t show anything either.
3. Call the developer.
4. Stop working on new code to troubleshoot. Need production logs!
5. Stop what they’re doing to identify and gather production logs for the developer.
6. Manual investigation establishes that it is not an application problem.
7. The DBA analyzes the logs, which point to corrupted database files.

Escalate. Escalate. Escalate. Respond. Escalate. Now what?


Life With Splunk

Service Desk: Trouble Ticket

1. Search on the IP address shows the related Web session and User ID.
   Search: "192.168.169.100"   Time range: Last 60 minutes
2. Search for failure or error across the entire IT environment. The search at the same time reveals a database error due to corrupted files.
   Search: failure OR error   Time range: Last 2 minutes
3. Search for corruption in the db logs. It shows that an index file has been corrupted.
   Search: host=db.domain.com source=*db.log corrupt*   Time range: Last 1 minute
4. Set up monitoring and alerting for db file corruption.
   Search: host=db.domain.com source=*db.log corrupt*   Time range: Last hour


One Splunk. Many Uses.


Monitor and Alert in Real Time

1. Get data: scheduled search or real-time search
2. Evaluate alerting condition: if the alert condition is met (Yes), go on to step 3; otherwise (No), do nothing (Noop)
3. Execute actions: RSS, Email, SNMP, Script
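The following minimal Python sketch illustrates steps 2 and 3. It models search results as a list of dicts. The function names and the two action stand-ins are hypothetical. They are not Splunk APIs, and they send no real email or RSS updates.

```python
from typing import Callable, Dict, List

# A search result is modeled as a dict of field -> value (an assumption of this sketch).
Result = Dict[str, str]
Condition = Callable[[List[Result]], bool]
Action = Callable[[List[Result]], str]


def number_of_events_greater_than(n: int) -> Condition:
    """Basic alert condition: trigger when the result count exceeds n."""
    return lambda results: len(results) > n


def evaluate_and_act(results: List[Result],
                     condition: Condition,
                     actions: List[Action]) -> List[str]:
    """Step 2 and step 3: run every action if the condition holds, else do nothing."""
    if not condition(results):
        return []  # No: noop
    return [action(results) for action in actions]  # Yes: execute actions


# Hypothetical stand-ins for real alert actions; they only describe what they would do.
def email_action(results: List[Result]) -> str:
    return f"email: {len(results)} result(s)"


def rss_action(results: List[Result]) -> str:
    return f"rss: {len(results)} result(s)"


if __name__ == "__main__":
    # Step 1 (get data) is simulated with a fixed result set.
    results = [{"host": "db.domain.com", "_raw": "index file corrupt"}]
    print(evaluate_and_act(results, number_of_events_greater_than(0),
                           [email_action, rss_action]))
```

In this example, one result is more than zero, so both stand-in actions run. With an empty result set, the function returns an empty list, which corresponds to the Noop branch.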


Demo… (5-10 min)

- Create a simple alert using the wizard …
- Available alert actions …
- Configure email settings (MTA, link hostname) …


Advanced Alerting Options

Specify an advanced schedule using cron notation
Use custom alert conditions
Invoke scripts to perform custom actions
- Integrate with other tools
  ‣ file a trouble ticket
- Other custom processing
  ‣ restart a faulty service
  ‣ update a firewall rule
  ‣ temporarily disable a user account
  ‣ etc …
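The Python sketch below shows one possible shape for a custom script action. It assumes the legacy scripted-alert convention, in which Splunk passes the path of a gzipped CSV file of the alert's results as the eighth argument. Verify the argument list in the alerting documentation for your version. The function names are hypothetical. The "restart" is only printed, never performed.

```python
import csv
import gzip
import sys
from typing import List


def hosts_from_results(results_path: str) -> List[str]:
    """Read the gzipped CSV of alert results and return distinct hosts in the order seen."""
    hosts: List[str] = []
    with gzip.open(results_path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            host = row.get("host")
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def main(argv: List[str]) -> int:
    # Assumption: argv[8] is the results file path (legacy scripted-alert convention).
    if len(argv) < 9:
        print("expected the results file path as the 8th argument", file=sys.stderr)
        return 1
    for host in hosts_from_results(argv[8]):
        # Hypothetical custom action: a real script might file a ticket or restart a service.
        print(f"would restart faulty service on {host}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Keeping the result parsing separate from the action makes the script easy to test with a hand-made results file before wiring it to a real alert.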


Real-time Search Primer

Searches forward in time
Never completes (unless stopped)
- Constantly updating result set
- Only generates results preview
All search commands supported
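The Python sketch below illustrates "searches forward in time" and "constantly updating result set". It models a windowed real-time search as a generator that yields a new preview after every arriving event. It makes two simplifications: events arrive in timestamp order, and the window is measured against the newest event rather than the wall clock. The names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

# (timestamp in seconds, raw event text): a simplified event model for this sketch
Event = Tuple[float, str]


def realtime_preview(events: Iterable[Event], window: float) -> Iterator[List[str]]:
    """Yield an updated results preview after each arriving event.

    Events older than `window` seconds, measured from the newest event, drop out of
    the preview. The generator ends only when its input ends; on a live stream it
    keeps running until stopped, like a real-time search.
    """
    buf: Deque[Event] = deque()
    for ts, raw in events:
        buf.append((ts, raw))
        while buf and buf[0][0] < ts - window:
            buf.popleft()
        yield [text for _, text in buf]


if __name__ == "__main__":
    stream = [(0.0, "login ok"), (100.0, "error"), (400.0, "error")]
    for preview in realtime_preview(stream, window=300.0):
        print(preview)
```

Because the function is a generator, the caller sees each preview as soon as it is produced, which mirrors a search that only ever returns previews and never a final result.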


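Scheduled Search Alerts

Actors: Splunkd/Scheduler and the search process, along a time axis.
1. The scheduler starts a historical search (audit.log, search.log).
2. When the search is done, the search process notifies splunkd (splunkd_access.log, audit.log).
3. The scheduler evaluates the alert condition on the results. If the condition is not met (N), it is done.
4. If the condition is met (Y), the scheduler checks whether the alert is suppressed. If suppressed (Y), it is done. If not (N), it executes the actions, updates the artifact TTL, updates the suppression state, and records the alert in the Alert Manager.
Logging: scheduler.log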


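Real-time Alerts

Actors: Splunkd/Scheduler and the search process, along a time axis.
1. The scheduler starts a real-time search (audit.log, search.log).
2. As results previews are produced, the search process notifies splunkd (splunkd_access.log).
3. The scheduler evaluates the alert condition on each results preview. If the condition is not met (N), it waits for the next preview.
4. If the condition is met (Y), the scheduler checks whether the alert is suppressed. If suppressed (Y), nothing is done for this preview. If not (N), it executes the actions, updates the artifact TTL, updates the suppression state, and records the alert in the Alert Manager together with a results snapshot. Evaluation then continues with later previews (…).
Logging: scheduler.log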
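The per-preview evaluation in this flow can be modeled in a few lines of Python. This is a toy sketch, not splunkd's implementation. The condition is an arbitrary predicate, the names are hypothetical, and each trigger stores a copy of the preview to mimic the results snapshot.

```python
from typing import Callable, Iterable, List

Preview = List[str]  # a results preview, simplified to a list of raw events


def realtime_alert(previews: Iterable[Preview],
                   condition: Callable[[Preview], bool]) -> List[Preview]:
    """Evaluate the alert condition on every results preview.

    Each time the condition holds, a copy of that preview is kept as the results
    snapshot, so later changes to the live result set do not alter it.
    """
    snapshots: List[Preview] = []
    for preview in previews:
        if condition(preview):
            snapshots.append(list(preview))  # snapshot at trigger time
    return snapshots


if __name__ == "__main__":
    previews = [["ok"], ["ok", "error"], ["error", "error"]]
    print(realtime_alert(previews, lambda p: "error" in p))
```

Without suppression, every matching preview produces another alert. This is why throttling is almost always necessary for real-time alerts.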


Real-time Alerts

- Reduce response time
- Continuously monitor a condition
- Scheduler ensures the real-time search is always running
- Throttling is almost always necessary
- Compatible with all alert actions
- Visible through the Alerts Manager


Alert Throttling

Native support for alert action throttling
Useful, for example, to:
- Alert when the database server is down, but not alert me about this condition again for one hour
Available for both standard and real-time alerts
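The Python sketch below models a suppression window like the one-hour database example. The class and method names are hypothetical. The corresponding savedsearches.conf settings are believed to be alert.suppress and alert.suppress.period; confirm them in savedsearches.conf.spec for your version.

```python
from typing import Dict, Optional


class Throttle:
    """Suppress repeat alerts with the same name for `period` seconds after one fires."""

    def __init__(self, period: float) -> None:
        self.period = period
        self._last_fired: Dict[str, float] = {}

    def should_fire(self, alert_name: str, now: float) -> bool:
        last: Optional[float] = self._last_fired.get(alert_name)
        if last is not None and now - last < self.period:
            # Inside the suppression window: throttled. A throttled call does not extend the window.
            return False
        self._last_fired[alert_name] = now
        return True


if __name__ == "__main__":
    throttle = Throttle(period=3600.0)  # one hour, as in the database-down example
    for t in (0.0, 1800.0, 3600.0):
        print(t, throttle.should_fire("db_down", t))
```

Keying the window by alert name means that one throttled alert does not silence others. For example, a "web_down" alert can still fire while "db_down" is suppressed.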


Alerts Manager

- System-wide view of all triggered alerts
- Basic alert management features
- Ability to drill down and view why the alert was triggered
- Real-time alert results are snapshots in time, taken when the alert triggered


Demo… (5-10 min)

- Show custom alert conditions and when to use them
- Demo real-time alerts:
  - Throttling
  - Alert manager


Managing Search Load

System-wide concurrent searches are limited to:
- Total: 4 + 4 x number of cores
- This limit applies to both ad-hoc and scheduled searches
The scheduler queues searches that would exceed the limit
Scheduler allocation is configurable in limits.conf:
  [scheduler]
  # percentage of system-wide concurrent searches the scheduler may use
  max_searches_perc = 25
  …
Use the Scheduler Activity dashboards:
  Search App >> Status >> Scheduler Activity >> Overview

[Figure: Search allocation]
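The arithmetic on this slide can be written out in Python. The 4 + 4 x cores formula and the max_searches_perc value of 25 come from the slide. Two parts are assumptions of this sketch: the scheduler's share is rounded down, and queueing is simple first-come. The function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def total_concurrent_searches(cores: int) -> int:
    """System-wide concurrent search limit quoted on the slide: 4 + 4 x number of cores."""
    return 4 + 4 * cores


def scheduler_share(cores: int, max_searches_perc: int = 25) -> int:
    """Concurrent searches available to the scheduler (rounded down: an assumption)."""
    return total_concurrent_searches(cores) * max_searches_perc // 100


def dispatch(pending: List[str], running: int, limit: int) -> Tuple[List[str], Deque[str]]:
    """Start pending searches while slots are free; queue the rest in arrival order."""
    free = max(limit - running, 0)
    return pending[:free], deque(pending[free:])


if __name__ == "__main__":
    limit = scheduler_share(cores=4)
    started, queued = dispatch(["alert_a", "alert_b", "alert_c"], running=3, limit=limit)
    print(limit, started, list(queued))
```

For example, a 4-core machine allows 4 + 4 x 4 = 20 concurrent searches. Of those, 25 percent (5 searches) are available to the scheduler. With 3 scheduled searches already running, two more can start and the third is queued.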


.conf & .log File Summary

savedsearches.conf
- Search string, schedule, alert condition, actions, etc.
alert_actions.conf
- Alert action options such as email server, format, subject line, TTL, etc.
limits.conf
- Scheduler’s concurrent search limit
- Action execution related limits
scheduler.log

Look in $SPLUNK_HOME/etc/system/README/<filename>.conf.spec for more detailed info
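The following Python sketch renders a savedsearches.conf stanza for the db corruption alert from the "Life With Splunk" example. The stanza name, cron schedule, time range and email address are hypothetical. The setting names follow savedsearches.conf.spec as best understood and should be checked against the .spec file for your version.

```python
import configparser
import io


def alert_stanza(name: str, search: str, cron: str, earliest: str,
                 email_to: str, suppress_period: str) -> str:
    """Render a savedsearches.conf stanza for a scheduled alert that emails when any event matches."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.optionxform = str  # keep key case (e.g. enableSched); ConfigParser lowercases by default
    conf[name] = {
        "search": search,
        "enableSched": "1",
        "cron_schedule": cron,
        "dispatch.earliest_time": earliest,
        "dispatch.latest_time": "now",
        "counttype": "number of events",  # alert condition: number of events ...
        "relation": "greater than",       # ... greater than ...
        "quantity": "0",                  # ... zero
        "action.email": "1",
        "action.email.to": email_to,
        "alert.suppress": "1",            # throttling
        "alert.suppress.period": suppress_period,
    }
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(alert_stanza("db_corruption", "host=db.domain.com source=*db.log corrupt*",
                       "*/5 * * * *", "-1h", "oncall@example.com", "1h"))
```

Generating stanzas from code is optional. The point of the sketch is to show how the search string, schedule, alert condition, actions and throttling all live in one savedsearches.conf stanza.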


Sneak Peek Into New Features

- Per-result alerting and throttling
- More alert actions to enable more complex alerting conditions
  - Example: once five failed login attempts occur, enable a monitoring search that alerts on suspicious user activity


Now You Should Know …

- How scheduled and real-time alerts work
- How to create simple and advanced real-time alerts
- How to enable alert throttling and check for throttled alerts
- How to check fired alerts using the Alerts Manager
- How to change scheduler limit defaults
- How to be an IT hero


August 15, 2011

Questions?

Ledion Bitincka, Search and Alerting Team