Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Copyright © 2013 Splunk Inc.
Dritan Bi;ncka Professional Services
Architec;ng Splunk for High Availability and Disaster Recovery
#splunkconf
Legal No;ces During the course of this presenta;on, we may make forward-‐looking statements regarding future events or the expected performance of the company. We cau;on you that such statements reflect our current expecta;ons and es;mates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-‐looking statements, please review our filings with the SEC. The forward-‐looking statements made in this presenta;on are being made as of the ;me and date of its live presenta;on. If reviewed aUer its live presenta;on, this presenta;on may not contain current or accurate informa;on. We do not assume any obliga;on to update any forward-‐looking statements we may make. In addi;on, any informa;on about our roadmap outlines our general product direc;on and is subject to change at any ;me without no;ce. It is for informa;onal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obliga;on either to develop the features or func;onality described or to include any such feature or func;onality in a future release.
Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respecCve
owners.
©2013 Splunk Inc. All rights reserved.
2
About Me
! Member of Professional Services ! Several mul;-‐TB deployments ! 50+ projects/customers ! Third .conf
3
Agenda ! Disaster recovery ! High availability
– Data collection – Searching – Indexing
! Top takeaways
Problems Recover in the event of a disaster
Maintain an acceptable level of continuous service
4
Disaster Recovery (DR)
What is Disaster Recovery?
Set of processes necessary to ensure recovery of service after a disaster
6!
Disaster Recovery Steps
7!
1 Backup necessary data
Backup to a medium at least as resilient as source Local backup vs. remote
2 Restore
Ensure this works Backup is worthless without restore
Backup
8!
1 a Configurations $SPLUNK_HOME/etc/*
b Indexes Buckets: Hot*, warm, cold, frozen
Backup Configurations
9
$SPLUNK_HOME/etc/*
Splunk instance
Backup: Bucket Lifecycle
10!
Events
[Too many warms] [Hot bucket is full]
[Out of space or bucket is old]
[Explicit user action]
$ Thawed path
$ Home path $ Cold path [Cheaper storage]
$ Frozen path or deleted
Backup Indexes
11
Bucket type State Can backup?
Read + write No*
Read only Yes
Read only Yes
What if your backup fails?
12
13
Restore Configurations
14
$SPLUNK_HOME/etc/*
New Splunk instance
$SPLUNK_HOME/etc/*
Splunk instance
Restore Data
15
New Splunk instance Splunk instance
$Indexes_Location ($SPLUNK_HOME/var/lib/splunk)
$Indexes_Location ($SPLUNK_HOME/var/lib/splunk)
Putting Restore Together
16!
2 a New Splunk instance
b Configurations
C Indexes
Things to Think About: ! Recovery )me vs. complexity and cost
– Ex. Backup to tape vs. warm standby
! Tolerable loss vs. complexity and cost – Ex. Backup warm/cold buckets vs. mirror hot bucket writes
17!
High Availability (HA)
What is High Availability?
A design methodology whereby a system is continuously operational, bounded by a set of
predetermined tolerances Note: “high availability” !=“complete availability”
19!
Splunk High Availability
20!
1 Data collection
2 Searching
3 Indexing
Data Collection
21!
Forwarder
Indexers
Forwarder Forwarder . . .
. . .
A B
Data Collection
22!
Forwarder
Indexers
Forwarder Forwarder . . .
. . . outputs.conf: ! ![tcpout] !defaultGroup = mygroup !![tcpout:mygroup] !server = A:9997, B:9997 !autoLB = true!
A B
Searching
23!
2 a Multiple, Independent Search Heads
b Splunk Native Search Head Pooling
Searching
24!
Indexer A Indexer B Indexer N . . . !
Typical search hierarchy
Searching: Independent SH
25!
Indexer A Indexer B Indexer N . . . !
Typical search hierarchy
Searching: Independent SH
26!
Indexer A Indexer B Indexer N
Search head A
. . . !
Search head B
Sync configs Sync job artifacts
Independent search heads
Search Head Pooling
27!
Searching: Pooling
28!
Indexer A Indexer B Indexer N
Search head A
. . . !
Search head B
NFS
Use NFS to share - Configurations - Job artifacts
Searching: Pooling
29!
Indexer A Indexer B Indexer N
Search head A
. . . !
Search head B
NFS
Use NFS to share - Configurations - Job artifacts
Steps to SHP
30!
1 Configure the shared location (NFS/SMB)
2 Enable pooling on each (stopped) search head ./splunk pooling enable <path_to_share>
3 Copy $SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users to $path_to_share/etc/
4 Start all search heads
Indexing
31!
3 a Reliable storage (SAN)
b Mirrored indexers
C Index replication
Indexing: Reliable Storage
32!
Use the underlying SAN infrastructure to provide HA
Indexer A
SAN
Indexing: Reliable Storage
33!
SAN
Indexer B
When indexer A fails, mount SAN to another indexer instance
Indexer A
Indexing: Mirrored Indexers
34!
Indexer A.0
Search head
Indexer B.0
Indexer A.1 Indexer B.1
Primary indexers X.0 index (locally) and forward to their respective secondaries X.1
Indexing: Mirrored Indexers
35!
Indexer A.0
Search head
Indexer B.0
Indexer A.1 Indexer B.1
Point search head to mirrored indexer B.1
Index Replication
Indexing: Index Replication ! Introduced with Splunk 5 ! Cluster = a group of search peers (indexers) that replicate each others' data
! Mul;ple copies of all data exist throughout the cluster ! The process that achieves this is known as index replica;on ! Bucket based replica;on
– i.e. the unit of replication is a bucket
37!
Index Replication Benefits ! Data availability. An indexer is always available to handle incoming data, and the indexed data is always available for searching
! Disaster tolerance. System can tolerate downed indexers without losing data or losing access to data
38!
Index Replication Trade Offs
! Extra storage ! To a lesser degree, increased processing load
– More disaster tolerance (i.e., more copies of data) requires higher storage requirements
– Governed the replication factor (RF) attribute
39!
40
3 node Cluster Replication factor of 3
41
Ok guys, pair up in threes! – Yogi Berra
Cluster Components ! Master node
– Orchestrates replication process – Informs the SH where to find data – Helps manage peer configurations
! Peer nodes – Receive and index data (normal peer behavior) – Replicate data to/from other peers – Peer nodes number ≥ RF
42!
Cluster Components ! At least one search head
– Must use one to manage searches across the cluster – Can’t run searches directly on an indexer, as you can with non-
clustered indexers
! Auto load-‐balanced forwarders – Capable of indexer acknowledgement
ê Ensuring data gets reliably indexed – Note that other input methods can be used as well, but no indexer
acknowledgement is available
43!
Cluster Concepts: RF ! Replica;on factor :: RF defaults to 3
– Number of copies of data in the cluster – Cluster can tolerate RF-1 node failures
44!
! Cluster can tolerate 2 node failures
Cluster Concepts: SF ! Searchable factor :: SF defaults to 2
– Number of searchable copies the cluster maintains – SF ≤ RF – Requires more storage and more processing than replicated copies – Replicated copy vs. searchable copy – SF = 1 vs. SF = 2
45!
Clustered Indexing ! Origina;ng peer node streams copies of data to other clustered peers
– They store those copies in their own buckets – Some peers may also index the copy
! Master determines which peer nodes get replicated data – Currently not configurable
! Master manages all peer-‐to-‐peer interac;ons ! Master keeps track of which peers have searchable data
– Ensures that there are always SF copies of searchable data available – When a peer goes down, the master coordinates remedial activities
46!
Clustered Searching ! Search head coordinates all searches in the cluster ! SH relies on master to tell it who its peers are ! Only one replicated bucket is searchable (primary)
– Searches occur over primary buckets, only
! Primary buckets (searchable) may change over ;me – Peers know their status and therefore know where to search
47!
Putting HA All Together
48!
NFS!
………..! ………..!
Search head pooling HA
Index replication or
SAN based mirrorring
Forwarders – autoLB HA
Top Takeways
49!
! DR – Process of backing-up and restoring service in case of disaster – Configuration files – Copy of $SPLUNK_HOME/etc/ folder – Indexed data – Backup and restore buckets
ê Hot, warm, cold, frozen ê Can’t backup hot but can safely backup warm and cold
! HA – Continuously operational system bounded by a set of tolerances – Data collection
ê Autolb from forwarders to multiple indexers ê Use indexer acknowledgement to protect in flight data
– Searching ê Search head pooling (SHP)
– Indexing ê Use index replication ê Highly reliable storage, mirrored set of indexers
Questions?
Next Steps
51
Download the .conf2013 Mobile App If not iPhone, iPad or Android, use the Web App
Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags! Go to the sessions listed on the next slide
1
2
3
Other Sessions You May Like: ! Architec)ng and Sizing Your Splunk Deployments Brera 2&3, Level 3 Today, 3-‐4pm
! Planning and Execu)on for Successful Deployments Brera 2&3, Level 3 October 3rd, 10:15-‐11:15am
More Info
52