Architec;ngSplunkforHigh AvailabilityandDisaster ...€¦ · Clustered Indexing! Originangpeernodestreamscopiesofdatatootherclusteredpeers – They store those copies

Copyright © 2013 Splunk Inc.

Dritan Bi;ncka Professional Services

Architec;ng Splunk for High Availability and Disaster Recovery

#splunkconf

Legal No;ces During the course of this presenta;on, we may make forward-‐looking statements regarding future events or the expected performance of the company. We cau;on you that such statements reflect our current expecta;ons and es;mates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-‐looking statements, please review our filings with the SEC. The forward-‐looking statements made in this presenta;on are being made as of the ;me and date of its live presenta;on. If reviewed aUer its live presenta;on, this presenta;on may not contain current or accurate informa;on. We do not assume any obliga;on to update any forward-‐looking statements we may make. In addi;on, any informa;on about our roadmap outlines our general product direc;on and is subject to change at any ;me without no;ce. It is for informa;onal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obliga;on either to develop the features or func;onality described or to include any such feature or func;onality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respecCve

owners.

©2013 Splunk Inc. All rights reserved.

2

About Me

!   Member of Professional Services !   Several mul;-‐TB deployments !   50+ projects/customers !   Third .conf

3

Agenda !  Disaster recovery !  High availability

–  Data collection –  Searching –  Indexing

!   Top takeaways

Problems Recover in the event of a disaster

Maintain an acceptable level of continuous service

4

Disaster Recovery (DR)

What is Disaster Recovery?

Set of processes necessary to ensure recovery of service after a disaster

6!

Disaster Recovery Steps

7!

1 Backup necessary data

Backup to a medium at least as resilient as source Local backup vs. remote

2 Restore

Ensure this works Backup is worthless without restore

Backup

8!

1 a Configurations $SPLUNK_HOME/etc/*

b Indexes Buckets: Hot*, warm, cold, frozen

Backup Configurations

9

$SPLUNK_HOME/etc/*

Splunk instance

Backup: Bucket Lifecycle

10!

Events

[Too many warms] [Hot bucket is full]

[Out of space or bucket is old]

[Explicit user action]

$ Thawed path

$ Home path $ Cold path [Cheaper storage]

$ Frozen path or deleted

Backup Indexes

11

Bucket type State Can backup?

Read + write No*

Read only Yes

Read only Yes

What if your backup fails?

12

13

Restore Configurations

14

$SPLUNK_HOME/etc/*

New Splunk instance

$SPLUNK_HOME/etc/*

Splunk instance

Restore Data

15

New Splunk instance Splunk instance

$Indexes_Location ($SPLUNK_HOME/var/lib/splunk)

$Indexes_Location ($SPLUNK_HOME/var/lib/splunk)

Putting Restore Together

16!

2 a New Splunk instance

b Configurations

C Indexes

Things to Think About: !   Recovery )me vs. complexity and cost

–  Ex. Backup to tape vs. warm standby

!   Tolerable loss vs. complexity and cost –  Ex. Backup warm/cold buckets vs. mirror hot bucket writes

17!

High Availability (HA)

What is High Availability?

A design methodology whereby a system is continuously operational, bounded by a set of

predetermined tolerances Note: “high availability” !=“complete availability”

19!

Splunk High Availability

20!

1 Data collection

2 Searching

3 Indexing

Data Collection

21!

Forwarder

Indexers

Forwarder Forwarder . . .

. . .

A B

Data Collection

22!

Forwarder

Indexers

Forwarder Forwarder . . .

. . . outputs.conf: ! ![tcpout] !defaultGroup = mygroup !![tcpout:mygroup] !server = A:9997, B:9997 !autoLB = true!

A B

Searching

23!

2 a Multiple, Independent Search Heads

b Splunk Native Search Head Pooling

Searching

24!

Indexer A Indexer B Indexer N . . . !

Typical search hierarchy

Searching: Independent SH

25!

Indexer A Indexer B Indexer N . . . !

Typical search hierarchy

Searching: Independent SH

26!

Indexer A Indexer B Indexer N

Search head A

. . . !

Search head B

Sync configs Sync job artifacts

Independent search heads

Search Head Pooling

27!

Searching: Pooling

28!


Search head A

. . . !

Search head B

NFS

Use NFS to share - Configurations - Job artifacts

Searching: Pooling

29!


Search head A

. . . !

Search head B

NFS

Use NFS to share - Configurations - Job artifacts

Steps to SHP

30!

1 Configure the shared location (NFS/SMB)

2 Enable pooling on each (stopped) search head ./splunk pooling enable <path_to_share>

3 Copy $SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users to $path_to_share/etc/

4 Start all search heads

Indexing

31!

3 a Reliable storage (SAN)

b Mirrored indexers

C Index replication

Indexing: Reliable Storage

32!

Use the underlying SAN infrastructure to provide HA

Indexer A

SAN

Indexing: Reliable Storage

33!

SAN

Indexer B

When indexer A fails, mount SAN to another indexer instance

Indexer A

Indexing: Mirrored Indexers

34!

Indexer A.0

Search head

Indexer B.0

Indexer A.1 Indexer B.1

Primary indexers X.0 index (locally) and forward to their respective secondaries X.1

Indexing: Mirrored Indexers

35!

Indexer A.0

Search head

Indexer B.0

Indexer A.1 Indexer B.1

Point search head to mirrored indexer B.1

Index Replication

Indexing: Index Replication !   Introduced with Splunk 5 !   Cluster = a group of search peers (indexers) that replicate each others' data

!   Mul;ple copies of all data exist throughout the cluster !   The process that achieves this is known as index replica;on !   Bucket based replica;on

–  i.e. the unit of replication is a bucket

37!

Index Replication Benefits !   Data availability. An indexer is always available to handle incoming data, and the indexed data is always available for searching

!   Disaster tolerance. System can tolerate downed indexers without losing data or losing access to data

38!

Index Replication Trade Offs

!  Extra storage !  To a lesser degree, increased processing load

–  More disaster tolerance (i.e., more copies of data) requires higher storage requirements

–  Governed the replication factor (RF) attribute

39!

40

3 node Cluster Replication factor of 3

41

Ok guys, pair up in threes! – Yogi Berra

Cluster Components !  Master node

–  Orchestrates replication process –  Informs the SH where to find data –  Helps manage peer configurations

!   Peer nodes –  Receive and index data (normal peer behavior) –  Replicate data to/from other peers –  Peer nodes number ≥ RF

42!

Cluster Components !   At least one search head

–  Must use one to manage searches across the cluster –  Can’t run searches directly on an indexer, as you can with non-

clustered indexers

!   Auto load-‐balanced forwarders –  Capable of indexer acknowledgement

ê  Ensuring data gets reliably indexed –  Note that other input methods can be used as well, but no indexer

acknowledgement is available

43!

Cluster Concepts: RF !   Replica;on factor :: RF defaults to 3

–  Number of copies of data in the cluster –  Cluster can tolerate RF-1 node failures

44!

!   Cluster can tolerate 2 node failures

Cluster Concepts: SF !   Searchable factor :: SF defaults to 2

–  Number of searchable copies the cluster maintains –  SF ≤ RF –  Requires more storage and more processing than replicated copies –  Replicated copy vs. searchable copy –  SF = 1 vs. SF = 2

45!

Clustered Indexing !   Origina;ng peer node streams copies of data to other clustered peers

–  They store those copies in their own buckets –  Some peers may also index the copy

!   Master determines which peer nodes get replicated data –  Currently not configurable

!   Master manages all peer-‐to-‐peer interac;ons !   Master keeps track of which peers have searchable data

–  Ensures that there are always SF copies of searchable data available –  When a peer goes down, the master coordinates remedial activities

46!

Clustered Searching !   Search head coordinates all searches in the cluster !   SH relies on master to tell it who its peers are !   Only one replicated bucket is searchable (primary)

–  Searches occur over primary buckets, only

!   Primary buckets (searchable) may change over ;me –  Peers know their status and therefore know where to search

47!

Putting HA All Together

48!

NFS!

………..! ………..!

Search head pooling HA

Index replication or

SAN based mirrorring

Forwarders – autoLB HA

Top Takeways

49!

!   DR – Process of backing-up and restoring service in case of disaster –  Configuration files – Copy of $SPLUNK_HOME/etc/ folder –  Indexed data – Backup and restore buckets

ê  Hot, warm, cold, frozen ê  Can’t backup hot but can safely backup warm and cold

!   HA – Continuously operational system bounded by a set of tolerances –  Data collection

ê  Autolb from forwarders to multiple indexers ê  Use indexer acknowledgement to protect in flight data

–  Searching ê  Search head pooling (SHP)

–  Indexing ê  Use index replication ê  Highly reliable storage, mirrored set of indexers

Questions?

Next Steps

51

Download the .conf2013 Mobile App If not iPhone, iPad or Android, use the Web App

Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags! Go to the sessions listed on the next slide

1

2

3

Other Sessions You May Like: !   Architec)ng and Sizing Your Splunk Deployments Brera 2&3, Level 3 Today, 3-‐4pm

!   Planning and Execu)on for Successful Deployments Brera 2&3, Level 3 October 3rd, 10:15-‐11:15am

More Info

52

Documents

Architec;ng*Splunk*for*High* Availability*and*Disaster ...€¦ · Clustered Indexing! Originang*peer*node*streams*copies*of*datato*other*clustered*peers* – They store those copies

Architec;ngSplunkforHigh AvailabilityandDisaster ...€¦ · Clustered Indexing! Originangpeernodestreamscopiesofdatatootherclusteredpeers – They store those copies