29
Splunk Enterprise 6.2.2 Capacity Planning Manual Generated: 3/29/2015 12:57 am Copyright (c) 2015 Splunk Inc. All Rights Reserved

Splunk 6.2.2 Capacity

Embed Size (px)

DESCRIPTION

Splunk 6.2.2 Capacity

Citation preview

  • Splunk Enterprise 6.2.2Capacity Planning ManualGenerated: 3/29/2015 12:57 am

    Copyright (c) 2015 Splunk Inc. All Rights Reserved

  • Table of ContentsIntroduction..........................................................................................................1

    Introduction to capacity planning for Splunk Enterprise............................1

    Hardware capacity planning...............................................................................2 Components of a Splunk Enterprise deployment......................................2 Dimensions of a Splunk Enterprise deployment.......................................4 How incoming data affects Splunk Enterprise performance.....................5 How indexed data affects Splunk Enterprise performance.......................6 How concurrent users affect Splunk Enterprise performance...................6 How saved searches / reports affect Splunk Enterprise performance......7 How search types affect Splunk Enterprise performance.........................7 How Splunk apps affect Splunk Enterprise performance..........................9 How Splunk Enterprise calculates disk storage........................................9 Estimate your storage requirements.......................................................10

    Scale your Splunk Enterprise Deployment.....................................................12 Distribute indexing and searching...........................................................12 Accommodate many simultaneous searches..........................................13

    Performance Reference.....................................................................................18 Reference hardware................................................................................18 Performance checklist.............................................................................23 Summary of performance recommendations..........................................25 Forwarder-to-indexer ratios.....................................................................26

    i

  • Introduction

    Introduction to capacity planning for SplunkEnterpriseYou can expand Splunk Enterprise to meet almost any capacity requirement. Totake advantage of this scaling capability requires planning. This manualdiscusses high-level hardware guidance for Splunk Enterprise deployments anddescribes how Splunk Enterprise uses hardware resources in different situations.

    New for Splunk Enterprise version 6.2 and later, this manual supersedesguidance about capacity planning in the Installation and Distributed Deploymentmanuals. It provides information about reference hardware and a performancechecklist to determine when and how you should scale your deployment basedon your needs.

    Before you decide on hardware for Splunk Enterprise, see the followinginformation:

    Review "Components of a Splunk Enterprise Deployment" in this manualfor a description of the elements in a Splunk Enterprise installation.

    Read about the dimensions of a Splunk Enterprise deployment, how thosedimensions impact performance, and how to maximize performance.

    Learn about the basic building block of a Splunk Enterprise deployment in"Reference hardware" in this manual.

    1

  • Hardware capacity planning

    Components of a Splunk Enterprise deploymentBy using a single software component and easy to understand configurations,Splunk Enterprise can coexist with existing infrastructure or be deployed as auniversal platform for accessing IT data.

    The simplest deployment is the one you get by default when you install SplunkEnterprise: indexing and searching on the same server. You log into Splunk Webor the CLI on the server and configure data inputs to collect machine data. Youthen use the same server to search, monitor, alert, and report on the incomingdata.

    You can also deploy components of Splunk Enterprise on different servers toaddress your load and availability requirements. This section introduces thetypes of components. See the Distributed Deployment manual, particularly thetopic, "Scale your deployment: Splunk components."

    Indexer

    Splunk indexers provide data processing and storage for local and remote dataand host the primary Splunk data store. See "How indexing works" in theManaging Indexers and Clusters manual for more information.

    Search head

    A search head is a Splunk Enterprise instance that distributes searches toindexers (referred to as "search peers" in this context). Search heads can beeither dedicated or not, depending on whether they also perform indexing.Dedicated search heads don't have any indexes of their own, other than theusual internal indexes. Instead, they consolidate and display results that originatefrom remote search peers.

    See "What is distributed search" in the Distributed Search Manual to configure asearch head to search across a pool of indexers.

    2

  • Forwarder

    Forwarders are Splunk instances that forward data to remote indexers for dataprocessing and storage. In most cases, they do not index data themselves. Seethe "About forwarding and receiving" topic in the Forwarding Data manual.

    Deployment server

    A Splunk Enterprise instance can also serve as a deployment server. Thedeployment server is a tool for distributing configurations, apps, and contentupdates to groups of Splunk Enterprise instances. You can use it to distributeupdates to most types of Splunk components: forwarders, non-clusteredindexers, and non-clustered search heads. See "About deployment server andforwarder management" in the Updating Splunk Enterprise Instances manual.

    Functions at a glance

    Functions Indexer Search head Forwarder Deploymentserver

    Indexing xWeb xDirect search xForward to indexer xDeployconfigurations x x x

    Index replication and indexer clusters

    An indexer cluster is a group of indexers configured to replicate each others'data, so that the system keeps multiple copies of all data. This process is knownas index replication. By maintaining multiple, identical copies of data, indexerclusters prevent data loss while promoting data availability for searching.

    Splunk Enterprise clusters feature automatic failover from one indexer to thenext. This means that, if one or more indexers fail, incoming data continues toget indexed and indexed data continues to be searchable.

    In addition to enhancing data availability, clusters have other features that youshould consider when you are scaling a deployment, for example, a capability tocoordinate configuration updates easily across all indexers in the cluster.Clusters also include a built-in distributed search capability. See "About clusters

    3

  • and index replication" in the Managing Indexers and Clusters of Indexers manual.

    Dimensions of a Splunk Enterprise deploymentA Splunk Enterprise deployment has many dimensions. These scenariosdetermine whether a single reference machine can handle indexing and searchload.

    In some cases, a single reference machine can collect, store, and search dataefficiently. In other cases, consider adding machines to your Splunk Enterprisedeployment to increase performance. Below is a list of items that can have asignificant impact on Splunk Enterprise performance.

    Amount of incoming data. The more data you send to SplunkEnterprise, the more time it needs to process the data into events that youcan search, report, and generate alerts on.

    Amount of indexed data. As the amount of data stored in a SplunkEnterprise index increases, so does the I/O bandwidth needed to storedata and provide results for searches.

    Number of concurrent users. If more than one person at a time uses aninstance of Splunk Enterprise, that instance requires more resources forthose users to perform searches and create reports and dashboards.

    Number of saved searches. If you plan to invoke a lot of savedsearches, Splunk Enterprise needs capacity to perform those searchespromptly and efficiently. A higher search count over a given period of timerequires more resources.

    Types of search you use. Almost as important as the number of savedsearches is the types of search that you run against a Splunk Enterpriseinstance. There are several types of search, each of which affects how theindexer responds to search requests.

    Whether or not you run Splunk apps. Splunk apps and solutions canhave unique performance, deployment, and configuration considerations.If you plan to run apps, consider the resource requirements of the appsthe you are using. See the documentation for the app for moreinformation.

    4

  • How do these dimensions impact overall performance?

    While these factors have an impact on the basic sizing requirements of yourSplunk Enterprise deployment, addressing each of them individually does notguarantee peak performance gain for the deployment. You must discoverthrough trial how these factors correlate with one another in your specificapplication.

    For example, if your Splunk Enterprise deployment calls for a low amount ofindexing but has a high number of concurrent users, it has significantly differentresource needs than a setup with a low number of concurrent users and a highamount of daily indexing volume. Additionally, as both user count and amount ofindexed data rise, you must distribute the environment across multiple servers tomaintain a similar performance level. Search types complicate matters, becausesome searches strain available CPU resources, while others depend on thespeed of the disk subsystem.

    When should I scale my Splunk Enterprise deployment?

    You must understand how the deployment dimensions described in this topicapply to your specific use case. Answer the following questions, and then refer tothe performance checklist in this manual to determine when you should add morehardware resources:

    How much data do you expect to index daily? How much data do you need to retain and for how long? How many users do you expect to search through the data at any onetime?

    Do you plan to use certain specific searches more than once? Do you want or need to use a Splunk app to present or manipulate yourdata?

    The key to a well-performing installation is to develop a plan early in thedeployment cycle to account for both your initial outlay of hardware resourcesand the addition of resources when the deployment scales up.

    How incoming data affects Splunk EnterpriseperformanceA reference Splunk Enterprise indexer can index a significant amount of data in ashort period of time: over 20 megabytes of data per second or over 1.7 terabytes

    5

  • per day. This level of indexing occurs if the server is doing nothing else butconsuming data.

    Because Splunk Enterprise instances do more than index, consider this figurethe maximum throughput for an indexer. Performance changes depending on thesize and amount of incoming data. Larger events slow down indexingperformance. As events increase in size, the indexer uses more system memoryto process and index them.

    If you need more indexing capacity than a single indexer can provide, addindexers into the deployment to account for the increased demand.

    See the topics in this chapter to learn how other factors impact this performancefigure.

    How indexed data affects Splunk EnterpriseperformanceAfter Splunk Enterprise consumes data and places it into indexes, those indexesgrow and take up disk space. As the indexes grow and available disk spacedecreases, Splunk Enterprise takes more time to index incoming data becausethe indexer's disk subsystem takes more time to find space to store the data.

    This growth has an impact on search, as well. On a single indexer, diskthroughput splits between indexing, which is ongoing, and search requests,which are interrupts based on requests scheduled by users. As indexes grow,search slows down because the disk subsystem needs to account for searchrequests, and it also needs to handle increasingly longer requests to storeincoming data. Depending on the type of search, those kinds of requests can beI/O-intensive.

    How concurrent users affect Splunk EnterpriseperformanceA reference indexer needs to dedicate one of its available CPU cores for everysearch that a user invokes for as long as the search is active. If multiple usersare logged in and running searches, the number of available CPU cores can beexhausted quickly.

    6

  • These figures assume that CPUs are idle when they receive a login or searchrequest. This does not account for other system requests or CPU cores used bySplunk Enterprise to index data. If they are processing any other systemrequests, then the load splits across other available CPUs.

    As CPU cores are used up, all activities on an indexer slow down as thecomputer splits processing time between indexing, search, and handling on-lineusers. Only additional indexers can increase capacity for all three functions ofSplunk Enterprise operation.

    How saved searches / reports affect SplunkEnterprise performanceOn a reference indexer, a saved search or report consumes about 1 CPU coreand a specified amount of memory while it executes. It behaves like an ad-hocsearch. A saved search also increases the amount of disk I/O temporarily as thedisk subsystem looks through the indexes to fetch data.

    Each additional saved search that executes at the same time consumes anadditional CPU core. This consumption is separate from CPU usage from theoperating system and Splunk Enterprise indexing and storage processes.

    If more saved searches execute than can be accepted for processing, they waitin a queue until they can be serviced. Splunk Enterprise also warns you when thesystem reaches the maximum number of queued saved searches. Whensearches queue up, search results return more slowly.

    Adding search heads provides additional CPU cores to run more concurrentsearches. Adding indexers helps scale with the increased search load andconcurrency that comes from adding search heads. Adding RAM to your existingmachines helps with concurrent searches but does not give you additional searchcapacity.

    How search types affect Splunk EnterpriseperformanceYou can invoke four types of searches against data stored in a Splunk Enterpriseindex. Each search type impacts the indexer in a different way.

    7

  • The following table summarizes the different search types. For dense and sparsesearches, Splunk Enterprise measures performance based on number ofmatching events. With super-sparse and rare searches, performance ismeasured based on total indexed volume.

    Search type Description Ref. indexerthroughput

    Performanceimpact

    Dense

    Returns a large percentage(10% or more) of matchingresults for a given set of data ina given period of time. Densesearches usually tax a server'sCPU first, because of theoverhead required todecompress the raw data storedin a Splunk Enterprise index.

    Up to 50,000matchingevents persecond.

    CPU-bound

    Sparse

    Returns a smaller amount ofresults for a given set of data ina given period of time (anywherefrom .01 to 1%) than do densesearches.

    Up to 5,000matchingevents persecond.

    CPU-bound

    Super-sparse

    Returns a small number ofresults from each index bucketthat matches the search. Asuper-sparse search is I/Ointensive because the indexermust look through all of thebuckets of an index to find theresults. If you have a largeamount of data stored on yourindexer, there are a lot ofbuckets, and a super-sparsesearch can take a long time tofinish.

    Up to 2seconds perindex bucket.

    I/O bound

    Rare Similar to a super-sparsesearch, but receives assistancefrom bloom filters, which helpeliminate index buckets that donot match the search request.Rare searches return resultsanywhere from 20 to 100 times

    From 10 to 50index bucketsper second.

    I/O bound

    8

  • faster than does a super-sparsesearch.

    How Splunk apps affect Splunk EnterpriseperformanceA single Splunk Enterprise indexer can run multiple apps simultaneously. SplunkEnterprise includes several apps which it runs at the same time.

    However, the more complex apps offer advanced views that require the use ofsummarizing and accelerating searches that run in the background. The morebackground processing an app needs, the more likely you must distribute theprocessing load across multiple machines.

    Many apps require a distributed Splunk Enterprise deployment by design.Whether it is a case of universal forwarders fetching data and sending it to asingle central instance, or many indexers and search heads connected togetherand serving up reports, dashboards, or alerts, Splunk apps often need more thanone server to realize both maximum performance and potential in the enterprise.

    How Splunk apps affect resource requirements

    If you use a Splunk app or solution that gets knowledge by executing a largenumber of saved searches, then you can overwhelm a single-server SplunkEnterprise instance. Multiple searches quickly exhaust available CPU resourceson an indexer. See "Accommodate many simultaneous searches" in this manual.

    When you install an app or solution, read the system requirements outlined inthat app or solution's documentation. If the information is not available, contactthe authors of the app or solution to get information about what you need to runthe app properly.

    How Splunk Enterprise calculates disk storageAt a high level, Splunk calculates total disk storage as follows:

    ( Daily average indexing rate ) x ( retention policy ) x 1/2

    Splunk Enterprise stores raw data at up to approximately half its original size withcompression. On a volume that contains 500GB of usable disk space, you can

    9

  • store nearly six months' worth of data at an indexing rate of 5GB/day or ten days'worth at a rate of 100GB/day.

    If you need additional storage, you can opt for either more local disks, which isrequired for frequent searching, or you can use attached or network storage,which is acceptable for occasional searching. Low-latency connections over NFSor SMB/CIFS (Server Message Block/Common Internet File System) areacceptable for searches over long time periods where instant search returns canbe compromised to lower cost per GB.

    Important: Shares mounted over a Wide Area Network (WAN) connection or onstandby storage such as tape are never suitable storage choices for SplunkEnterprise operations.

    Estimate your storage requirementsThis topic describes how to estimate the size of your Splunk Enterprise index, sothat you can plan your storage capacity requirements.

    When Splunk Enterprise indexes your data, it creates two main types of files: the"rawdata" file that contains the original data in compressed form and the indexfiles that point to this data. (It also creates a few metadata files, which don'tconsume much space.) With a little experimentation, you can estimate how muchindex disk space you will need for a given amount of incoming data.

    Typically, the compressed rawdata file is 10% the size of the incoming,pre-indexed raw data. The associated index files range in size fromapproximately 10% to 110% of the rawdata file. The number of unique terms inthe data affect this value.

    Depending on the data's characteristics, you might want to tune yoursegmentation settings, as described in "About segmentation" in the Getting DataIn Manual.

    The best way to get an idea of your space needs is to experiment by indexing arepresentative sample of your data, and then checking the sizes of the resultingdirectories in $SPLUNK_HOME/var/lib/splunk/defaultdb.

    On *nix systems, follow these steps

    Once you've indexed your data sample:

    10

  • 1. Go to $SPLUNK_HOME/var/lib/splunk/defaultdb/db.

    2. Run du -ch hot_v* and look at the last total line to see the size of the index.

    On Windows systems, follow these steps

    1. Download the du utility from Microsoft TechNet.

    2. Extract du.exe from the downloaded ZIP file and place it into your%SYSTEMROOT% or %WINDIR% folder.

    Note: You can also place it anywhere in your %PATH%.

    3. Open a command prompt.

    4. Once there, go to %SPLUNK_HOME%\var\lib\splunk\defaultdb\db.

    5. Run del %TEMP%\du.txt & for /d %i in (hot_v*) do du -q -u %i\rawdata| findstr /b "Size:" >> %TEMP%\du.txt.

    6. Open the %TEMP%\du.txt file. You will see Size: n, which is the size of eachrawdata directory found.

    7. Add these numbers together to find out how large the compressed persistedraw data is.

    8. Next, run for /d %i in (hot_v*) do dir /s %i, the summary of which is thesize of the index.

    9. Add this number to the total persistent raw data number.

    This is the total size of the index and associated data for the sample you haveindexed. You can now use this to extrapolate the size requirements of yourSplunk Enterprise index and rawdata directories over time.

    Answers

    Have questions? Visit Splunk Answers to see what questions and answers otherSplunk users had about data sizing.

    11

  • Scale your Splunk Enterprise Deployment

    Distribute indexing and searchingThis topic discusses the concepts and hardware requirements for distributing theindexing and searching components of your Splunk Enterprise deployment.

    Concepts of distributed indexing and searching

    Scale your Splunk Enterprise deployment by distributing searching and indexingacross multiple machines. Indexers bring in, store, and search the data. Searchheads manage search requests and present results.

    Because indexers require much more disk I/O throughput than search heads do,give your environment more indexing capacity by reducing the overhead requiredfor searching. The key points are as follows:

    When you add indexers to the deployment, the aggregate rate of dataconsumption and total available storage increases, which providesadditional capacity to manage increased search loads.

    When you add search heads, you increase the number of concurrentusers that the deployment can handle, thus providing capacity forincreased search load.

    The more search heads you add to the deployment, the faster you can find thedata that you indexed.

    Considerations for search performance versus indexingperformance

    While the two points in the previous section are best practice for improvingindexing speed, some caveats apply, particularly when it comes to search speed.

    As your indexers consume data, they store it in buckets, which are the individualelements of an index. As more data comes in, the number of buckets increases.A higher number of buckets can impact search speed because of the throughputrequired to navigate through those buckets for the data you want. This impact isnoticeable for index buckets that hold smaller amounts of data.

    12

  • As the number of buckets increases, the indexer must manage the buckets,which it does by "rolling" them to make room for new incoming data. Thisprocedure takes up I/O cycles as well, which reduces the number of I/O cyclesavailable to fetch events for search requests.

    The key points to remember are:

    Adding search heads to your distributed deployment does not guaranteeimproved search performance. A mix of search heads and indexers is vitalfor performance increases.

    The number and types of search also impact indexer performance. Somesearch types tax an indexer's CPU, while others apply pressure to the disksubsystem.

    See "Accommodate concurrent users and searches" in this manual for details onsimultaneous searches.

    Accommodate many simultaneous searchesThis topic provides high-level guidance on how to accommodate manysimultaneous searches, search types, and concurrent users with your distributedSplunk Enterprise deployment.

    Primary search performance impacts

    The biggest performance impacts on a Splunk Enterprise deployment are:

    Number of concurrent users. Number of concurrent searches. Types of searches you use during the course of operation.

    Each of these elements affects the deployment in different ways.

    How concurrent users and searches affect performance

    When a user submits a search request, the search request can take up to oneCPU core on each indexer to process while it runs, by default. Any additionalsearches that the user submits also account for one CPU core. You can adjustthe number of global concurrent searches that a machine can run. See"Expected performance and known limitations of real-time searches and reports"in the Search Manual.

    13

  • The type of search the user invokes also impacts resource usage. The additionalamount of CPU or disk usage varies depending on the search type. See "Howsearch types affect Splunk Enterprise performance" in this manual.

    How to maximize search performance

    The best way to address the resource overhead for many concurrent searches isto configure the deployment to handle search requests completely withinavailable physical machine memory. Do this by adding as much memory tosystems as is feasible. While this is important for the machines that youdesignate as search heads, it is particularly important for indexers in yourdeployment, because indexers must index incoming data and search existingdata.

    For example, in a deployment that has up to 48 concurrent searches happeningat a time, a single search that uses 200MB of memory translates to nearly 10GBof memory required to satisfy the 48 concurrent search requests. The amount ofavailable memory is an important statistic. While performance on an indexerdeclines gradually with increased CPU usage from concurrent search jobs, itdrops dramatically when the machine exhausts all available physical memory. Asingle reference indexer cannot handle 48 concurrent searches. At the very least,the machine needs additional memory for satisfactory performance.

    Search performance: The ideal

    Despite the memory constraints, the run time of a search increases proportionallyas the number of free CPU cores on a machine decreases. For example, on anidle system with 8 available cores, the cores on the machine service the first 8searches to arrive and finish within a short period of time, for this example, 10seconds.

    If 48 searches run concurrently, then completion time increases significantly asshown in the calculation.

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofsearches per

    core

    x No. of sec. perindividual

    search

    = Approx.time (sec.) per

    search48 8 6 10 60Because indexers do the bulk of the work in search operations (reading data offdisk, decompressing it, extracting knowledge and reporting), it is best practice toadd indexers to decrease the amount of time per search. In the next example, toreturn to the performance level of 10-second searches, deploy 6 indexers with 8

    14

  • cores each.

    8 consecutive searches - 6 indexers with 8 cores per indexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofcores per

    search

    x No. of sec. perindividual

    search

    = Approx. time(sec.) per

    search8 48 6 10 1.648 consecutive searches - 6 indexers with 8 cores per indexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofcores per

    search

    x No. of sec. perindividual

    search

    = Approx. time(sec.) per

    search48 48 1 10 10In the previous example, one search head to service search requests isacceptable. It might be appropriate to set aside a second search head to createsummary indexes.

    Search performance: The reality

    In many cases, the system is not idle before searches arrive. For example, if anindexer handles 150GB/day of data at peak times, then it can use up to 4 of the 8available cores for indexing that data. In this case, search times increasesignificantly.

    4 concurrent searches - one indexer with 4 of 8 cores available per indexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofcores per

    search

    x No. of sec. perindividual

    search

    = Approx. time(sec.) per

    search4 4 1 10 1048 concurrent searches - one indexer with 4 of 8 cores available perindexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofsearches per

    core

    x No. of sec. perindividual

    search

    = Approx.time (sec.) per

    search48 4 12 10 120Increasing the number of cores per machine decreases the amount of time takenper search, but is not the most effective way to streamline search operations.

    15

  • One system with 16 cores has the following performance.

    16 concurrent searches - one indexer with 12 of 16 cores available:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofsearches per

    core

    x No. of sec. perindividual

    search

    = Approx.time (sec.) per

    search16 12 1.33 10 13.348 concurrent searches - one indexer with 12 of 16 cores available:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofsearches per

    core

    x No. of sec.per individual

    search

    = Approx. Totaltime (sec.) per

    search48 12 4 10 40Two 8-core machines have the following performance profile.

    8 concurrent searches - two indexers with 4 of 8 cores available perindexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofcores per

    search

    x No. of sec. perindividual

    search

    = Approx. time(sec.) per

    search8 8 1 10 1016 concurrent searches - two indexers with 4 of 8 cores available perindexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofsearches per

    core

    x No. of sec. perindividual

    search

    = Approx.time (sec.) per

    search16 8 2 10 2048 concurrent searches - two indexers with 4 of 8 cores available perindexer:

    No. ofconcurrentsearches

    / No. ofavail.cores

    = No. ofsearches per

    core

    x No. of sec. perindividual

    search

    = Approx.time (sec.) per

    search48 8 6 10 60Two 8-core machines cost only slightly more than one 16-core machine. Inaddition, two 8-core machines provide significantly more available disk

    16

  • throughput than one 16-core machine does, based on hard disk spindle count.This is important for indexers because of the high disk bandwidth that theyrequire.

    Adding indexers reduces the indexing load on any system and frees CPU coresfor searching. Also, because the performance of almost all types of search scaleswith the number of indexers, searches will be faster, which mitigates the effect ofreduced performance from sharing resources amongst both indexing andsearching. Increasing search speed reduces the chance of concurrent searcheswith concurrent users.

    In real-world situations with hundreds of users, each user runs a search everyfew minutes, although not at the exact same time as other users. By addingindexers and reducing the search time, you reduce the concurrency factor andlower the resultant I/O and memory contention.

    17

  • Performance Reference

    Reference hardwareA reference machine helps you understand when to scale and distribute a SplunkEnterprise deployment. Refer to this configuration as the standard for theremainder of this chapter and whenever you need to size the hardware needs ofyour environment.

    Reference machine for single-instance deployments

    The following machine requirements represent the basic building block of aSplunk Enterprise deployment.

    Intel x86 64-bit chip architecture 2 CPUs, 6 cores per CPU (12 CPU cores total), at least 2Ghz per core 12GB RAM Standard 1Gb Ethernet NIC, optional second NIC for a managementnetwork

    Standard 64-bit Linux or Windows distribution

    Disk subsystem

    The disk subsystem for a reference machine should be capable of handling ahigh number of averaged Input/Output Operations Per Second (IOPS).

    IOPS are a measurement of how much data throughput a hard drive canproduce. Because a hard drive reads and writes at different speeds, there areIOPS numbers for disk reads and writes. The average IOPS is the blend betweenthose two figures.

    The more average IOPS a hard drive can produce, the more data it can indexand search in a given period of time. While many variable items factor into theamount of IOPS that a hard drive can produce, the following are three mostimportant elements:

    Its rotational speed in revolutions per minute. Its average latency, which is the amount of time it takes to spin its plattershalf a rotation.

    Its average seek time, which is the amount of time it takes to retrieve arequested block of data.

    18

  • To get the most IOPS out of a hard drive, choose drives that have high rotationalspeeds and low average latency and seek times. Every drive manufacturerprovides this information, and some provide much more.

    For information on IOPS and how to calculate them, see the following articles:

    "Getting the hang of IOPS(http://www.symantec.com/connect/articles/getting-hang-iops-v13) onSymantec's Connect Community.

    "Analyzing I/O performance in Linux(http://www.cmdln.org/2010/04/22/analyzing-io-performance-in-linux) onCMDLN.ORG (A sysadmin blog).

    For this application, we use eight 146-gigabyte, 15,000 RPM serial-attachedSCSI (SAS) HDs in a Redundant Array of Independent Disks (RAID) 1+0 faulttolerance scheme as the disk subsystem. Each hard drive is capable of about200 average IOPS. The combined array produces a little over 800 IOPS.

    Important: Disk I/O often constrains Splunk Enterprise first, so always considerdisk infrastructure first when specifying your hardware.

    Performance capability

    The reference machine produces the following index and search performancemetrics for a given sample of data.

    Indexing performance

    Up to 20MB per second (1700GB per day) of raw indexing performance,as long as no other Splunk Enterprise related activity occurs.

    Search performance

    Up to 50,000 events per second for dense searches. Up to 5,000 events per second for sparse searches. Up to 2 seconds per index bucket for super-sparse searches. From 10 to 50 buckets per second for rare searches with bloom filters.

    To find out more about the types of searches and how they affect SplunkEnterprise performance, see "How search types affect Splunk Enterpriseperformance" in this manual.

    19

  • Reference machine for distributed deployments

    At higher data indexing rates and user counts, consider the different needs ofindexers and search heads. Dedicated search heads do not need extremely fastdisk throughput, nor do they need much local storage. They require far moreCPU resources than do indexers.

    The following recommendations are for both search heads and indexers in adistributed deployment.

    Dedicated search head

    Intel 64-bit chip architecture 4 CPUs, 4 cores per CPU (16 CPU cores total), at least 2Ghz per core 12GB RAM 2 x 300GB, 10,000 RPM SAS hard disks, configured in RAID 1 Standard 1Gb Ethernet NIC, optional 2nd NIC for a management network Standard 64-bit Linux or Windows distribution

    Because search heads require raw computing power and are likely to becomeCPU-bound, add additional CPU cores to a search head for faster performanceper server. If search workload dictates, a search head might require 4 CPUs with6 cores per CPU. Regardless, the guideline of 1 core per active user applies.Additionally:

    Account for scheduled searches in your CPU allowance. A typical searchrequest requires 1 CPU core.

    Depending on the type of search you use, you might need to add CPUcores to account for the increased load those search types cause.

    Indexer

    Indexers in a distributed deployment configuration have equivalent disk I/Obandwidth requirements to indexers in a single-instance deployment. Tocompensate for the increase in search requests and higher data indexing volumein a distributed environment, distribute the work across many indexers.

    Intel 64-bit chip architecture. 2 CPUs, 6 cores per CPU (12 CPU cores total), at least 2Ghz per core. 12GB RAM. Disk subsystem capable of 800 average IOPS. See "Disk subsystem." Standard 1Gb Ethernet NIC, with optional second NIC for a managementnetwork.

    20

  • Standard 64-bit Linux or Windows distribution.

    At higher daily volumes, local disk does not provide cost-effective storage for thetime frames where you want a fast search. In these cases, deploy fast attachedstorage or networked storage, such as storage area networks (SAN) over fiber,that can provide the required IOPS per indexer. While there are too many typesof storage to recommend, consider these guidelines when you plan your storageinfrastructure:

    Indexers do many bulk reads. Indexers do many disk seeks.

    Therefore:

    More disks (specifically, more spindles) are better for indexingperformance.

    Total throughput of the entire system is important. The ratio of disks to disk controllers in a particular system should behigher, similar to how you configure a database server.

    Ratio of indexers to search heads

    Technically, there is no practical limitation on the number of search heads anindexer can support or on the number of indexers a search head can searchagainst. However, systems limitations suggest a ratio of approximately 8indexers to 1 search head as a rough guideline. If you have many searcherscompared to your total data volume, then more search heads will increasesearch efficiency. In some cases, the best use of a separate search head is topopulate summary indexes. This search head acts like an indexer to theprimary search head that users log into.

    Virtual hardware

    Splunk Enterprise performs fastest when deployed directly on to bare-metalhardware. However, Splunk Enterprise can deliver on virtual equipment, andSplunk supports deploying it that way.

    Using the bare-metal hardware as a baseline, Splunk Enterprise generallyindexes data about 10-15% slower on a virtual machine than it does on astandard bare-metal machine. Search performance is on par with the real-worldhardware.

    21

  • This is a best-case scenario that does not account for resource contention withother active virtual machines on the same physical server. It also does notaccount for certain vendor-specific I/O enhancement techniques, such as DirectI/O or Raw Device Mapping.

    See this Tech Brief for recommendations for running Splunk on VMware.

    Splunk Enterprise, self-managed in the cloud

    Running Splunk Enterprise in the cloud is a great alternative to running it onbare-metal hardware.

    Given the hardware resources described earlier in this topic, Splunk Enterpriseshould deliver the same levels of performance on cloud infrastructure as it doeson bare-metal hardware. That said, you should keep a close eye on what yourcloud infrastructure actually delivers.

    For example, if you run Splunk Enterprise on an Amazon Web Services (AWS)instance, note that:

    AWS measures CPU power on Elastic Compute Cloud (EC2) instances invirtual CPUs (vCPUs), not real CPUs.

    Each vCPU is a hyper thread of an Intel Xeon core on most AWS instancetypes. See "AWS | Amazon EC2 | Instance Types"(http://aws.amazon.com/ec2/instance-types/) on the AWS site.

    As a hyper thread of a core, a vCPU acts as a core, but the physical coremust schedule its workload among other workloads of other vCPUs thatthe physical core handles.

    For indexing and data storage, note that:

    If you choose to use Elastic Block Storage (EBS), the type of EBS volumeyou choose determines the amount of performance you get.

    Not all EBS volume types have the necessary IOPS to handle SplunkEnterprise operations.

    The "Provisioned IOPS" and "Magnetic" EBS volume types offer the bestopportunity to get the required IOPS necessary for indexing andsearching. See "EBS - Product Details"(http://aws.amazon.com/ebs/details/) on the AWS site.

    Not every EC2 instance type offers the network throughput to the EBSvolume that you need. To ensure that bandwidth you must either launchthe instance4 as "EBS-optimized" or choose an instance type thatprovides a minimum of 10Gb of bandwidth. See "Amazon EC2 Instance

    22

  • Configuration"(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-ec2-config.html)on the AWS site.

    For forwarding, note that the proximity of your cloud infrastructure to yourforwarders can have a major impact on performance of the whole environment.

    Splunk Cloud

    Splunk offers its machine data platform and licensed software as a subscriptionservice called Splunk Cloud. When you subscribe to the service, you purchase acapacity to index, store, and search your machine data. Splunk Cloud abstractsthe infrastructure specification from you and delivers high performance on thecapacity you have purchased.

    Performance checklistThese guidelines help you decide when to distribute your Splunk Enterprisedeployment.

    This questionnaire assumes that you have a single-instance Splunk Enterprisedeployment based on the reference architecture described in "Referencehardware."

    Determine when to scale your Splunk Enterprise deployment

    Before you consider when to scale, estimate how much data you need to indexand whether you need more than one concurrent Splunk Enterprise user tosearch that data.

    Depending on how much data you index and how many concurrent users yourequire, you might need to scale your environment to multiple machines. Even ifyour indexing amount and user count falls within the capabilities of a singleserver, you might have to distribute your deployment based on the types ofsearches you use and whether you use summary indexes.

    To run a Splunk app or solution in your Splunk environment, or to createelements that generate a large number of saved searches, you might have todistribute Splunk Enterprise components across a number of machines.

    23

  • Question 1: Do you want to create or run a Splunk app, alert, or solutionthat executes a large number (more than 8 concurrently) of savedsearches?

    A saved search is a search that a user saves in Splunk Enterprise to make itavailable for later use. The number of saved searches, especially those that runconcurrently, has a direct impact on a Splunk server's performance. If youanswer No, then go to Question 2. You don't yet need to consider scaling yourSplunk Enterprise deployment to multiple machines.

    If you answer Yes, then scale your Splunk Enterprise deployment to multiplemachines.

    Question 2: Do you need to index more than 2GB of data per day?

    Question 3: Do you need more than two users signed in at one time?

    If you answer No to questions 2 and 3, then your Splunk Enterprise instance canshare one of the reference servers with other services, with the caveat thatSplunk Enterprise must have sufficient disk I/O bandwidth on the sharedmachine.

    If you answer Yes to question 2 or 3, then proceed to Question 4.

    Caution To deploy Splunk Enterprise on Windows, do not share full SplunkEnterprise services on servers that run Microsoft Exchange, Active Directorydomain services, or machine virtualization software. Those services are oftendisk I/O intensive and can reduce indexing and search performance. Additionally,make sure that antivirus software installed on the server does not scan theSplunk Enterprise installation directory.

    Question 4: Do you need to index more than 250GB per day?

    Question 5: Do you need more than four concurrent users?

    If you answer No to questions 4 and 5, then a single dedicated Splunk Enterpriseinstance running on a reference machine should be able to handle indexing andsearch workload.

    If you answer Yes to question 4 or 5, then go to Question 6.

    24

  • Question 6: Do you need more than 500GB of total storage?

    See "How Splunk Enterprise calculates disk storage."

    If you answer No, then a single dedicated reference machine should be able tohandle indexing and search workload, but you might need to add fast storage tothe system to account for the increased disk usage.

    If you answer Yes, then consider scaling your deployment to additional indexersto handle the increased demand of indexing and searching.

    Question 7: Do you need to search large quantities of data for a small set(less than 1 per cent) of results?

    Searches that cover large quantities of data and return small sets of results arecalled super-sparse searches. These searches require lots of disk I/O, becausethe indexer must search a number of buckets to find the data you're looking for.

    If you answer No, then you do not need to scale your deployment. However,adding additional indexers improves both indexing and search performance.

    If you answer Yes, then consider scaling your deployment.

    Summary of performance recommendationsThe Daily Indexing Volume table summarizes the performance recommendationsthat were given in the performance checklist. The table shows the number ofreference machines that you need to index and search data in Splunk Enterprise,depending on the number of concurrent users and the amounts of data that theinstance indexes.

    See "Reference hardware" in this manual for a reminder of what the currentreference machine is.

    The table shows approximate guidelines only. Modify these figures based onyour use case. If you need help, contact Splunk. You might want to engage amember of Professional Services depending on the deployment's initial size.

    Daily Indexing Volume

    25