Copyright © 2014 Splunk Inc.
Damien Dallimore, Dev Evangelist, CSO Office @ Splunk
Getting the Message
Nimish Doshi, Principal Systems Engineer @ Splunk
2
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
3
Agenda
Damien’s Section
What is messaging
JMS + Demo
AMQP + Demo
Kafka + Demo
Custom message handling
Architecting for scale
Nimish’s Section
Using ZeroMQ
Using JMS for underutilized computers
Question time
Damien’s Section
5
From Middle Earth
Make Splunk Apps & Add-ons
Messaging background
6
7
apps.splunk.com
github.com/damiendallimore
8
What is messaging?
Messaging infrastructures facilitate the sending/receiving of messages between distributed systems
Messages can be encoded in one of many available protocols
A common paradigm involves producers and consumers exchanging via topics or queues
Topics (publish subscribe)
Queues (point to point)
QUEUE
TOPIC
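The difference between the two delivery styles can be sketched with a toy in-memory model. This is only an illustration: `PointToPointQueue` and `Topic` are hypothetical classes standing in for a real broker, and no actual messaging product is involved.

```python
from collections import deque


class PointToPointQueue:
    """Toy queue: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Toy topic: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        inbox = deque()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, message):
        for inbox in self._subscribers:
            inbox.append(message)


# Queue: two consumers compete, so the messages are split between them.
orders = PointToPointQueue()
orders.send("order-1")
orders.send("order-2")
consumer_a = orders.receive()
consumer_b = orders.receive()

# Topic: two subscribers both see the same message.
prices = Topic()
inbox_a = prices.subscribe()
inbox_b = prices.subscribe()
prices.publish("price-update")
```

With a queue, a message taken by one consumer is gone for everyone else; with a topic, each subscriber holds its own copy.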
9
Why are messaging architectures used?
Integrating Legacy Systems
Integrating Heterogeneous Systems
Distributed Applications
Cluster Communication
High Performance Streaming
10
There’s a lot of information in the pipes
11
The data opportunity
Easily tap into a massive source of valuable in-flight data flowing around the veins
Don’t need to access the application directly; pull data off the messaging bus
I cannot think of a single industry vertical that does not use messaging
12
Getting this data into Splunk
Many different messaging platforms and protocols
JMS (Java Message Service)
AMQP (Advanced Message Queuing Protocol)
Kafka
Nimish will also cover some more use cases
13
JMS
DEMO
Not a messaging protocol, but a programming interface to many different underlying message providers
WebSphere MQ, TIBCO EMS, ActiveMQ, HornetQ, SonicMQ, etc.
Very prevalent in the enterprise software landscape
14
AMQP
DEMO
RabbitMQ
Supports AMQP 0.9.1, 0.9, 0.8
Common in financial services and environments that need high performance and low latency
15
Kafka
DEMO
Cluster-centric design = strong durability and fault tolerance
Scales elastically
Producers and Consumers communicate via topics in a Kafka node cluster
Very popular with open source big data / streaming analytics solutions
16
Custom message handling
These Modular Inputs can be used in a multitude of scenarios
Message bodies can be anything: JSON, XML, CSV, unstructured text, binary
Need to give the end user the ability to customize message processing
So you can plug in your own custom handlers
Need to write code, but it is really easy, and there are examples on GitHub
I’m a big data pre-processing fan
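The idea of a pluggable handler that pre-processes each message body can be sketched as follows. This is a minimal illustration only: `handle_message` is a hypothetical function name, not the interface of any of the Modular Inputs, whose real handler examples are on GitHub.

```python
import json


def handle_message(body):
    """Hypothetical handler: turn a JSON message body into a key="value" event.

    Bodies that are not a JSON object are passed through unchanged so
    that no data is lost.
    """
    try:
        fields = json.loads(body)
    except ValueError:
        return body
    if not isinstance(fields, dict):
        return body
    # Sort the keys so the output is stable from one event to the next.
    return " ".join('{}="{}"'.format(key, fields[key]) for key in sorted(fields))


event = handle_message('{"symbol": "SPLK", "price": 61.5}')
raw = handle_message("plain text message")
```

Here `event` becomes a flat key="value" string, while the unstructured text in `raw` is left as-is.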
17
Cut the code
18
Compile, bundle into jar file, copy to Splunk
19
Declaratively apply it
Let’s see if it works
20
Achieving desired scale
AMQP Queue
AMQP Mod Input
Single Splunk Instance
With 1 Modular Input instance, only so much performance / throughput can be achieved
You’ll hit limits with JVM heap, CPU, OS STDIN/STDOUT buffer, Splunk indexing pipeline
21
So go Horizontal
AMQP Queue
Universal Forwarders
Splunk Indexer Cluster
AMQP Broker
AMQP Mod Input AMQP Mod Input
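Going horizontal relies on several consumers reading from the same queue, with each message handled by exactly one of them. A rough sketch of that competing-consumers pattern, using Python threads and an in-memory queue as stand-ins for the Modular Input instances and the AMQP queue (all names here are assumptions for illustration):

```python
import queue
import threading

NUM_CONSUMERS = 3  # assumed number of modular input instances

# Stand-in for the shared AMQP queue, preloaded with 30 messages.
work = queue.Queue()
for i in range(30):
    work.put("message-{}".format(i))

consumed = {n: [] for n in range(NUM_CONSUMERS)}


def consume(consumer_id):
    # Each consumer drains the shared queue until it is empty.
    while True:
        try:
            message = work.get_nowait()
        except queue.Empty:
            return
        consumed[consumer_id].append(message)


threads = [threading.Thread(target=consume, args=(n,)) for n in range(NUM_CONSUMERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_messages = [m for batch in consumed.values() for m in batch]
```

Every message ends up in exactly one consumer's list. How the messages divide between the consumers depends on timing, so an even split is not guaranteed.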
Nimish’s Section
23
About Me
• Principal Systems Engineer at Splunk in the Northeast
• Session Speaker at all past Splunk .conf user conferences
• Catch me on the Splunk Blogs
24
Problem with Getting Business Data from JMS
The goal is to index the business message contents into Splunk
Message Uncertainty Principle:
If you de-queue the message to look at it, you have affected the transaction (TXN)
If you use various browse APIs for content, you may miss it
– Message may have already been consumed by TXN
Suggestion: Use a parallel queue to log the message
– Suggestion: Try ZeroMQ
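The parallel-queue suggestion can be sketched with two in-memory queues. This is a simplified illustration under assumed names (`business_queue`, `logging_queue`, `send`); a real setup would use a messaging library such as ZeroMQ for the logging side.

```python
from collections import deque

business_queue = deque()  # consumed by the real transaction
logging_queue = deque()   # parallel copy, read only for indexing


def send(message):
    # The producer writes each message to both queues.
    business_queue.append(message)
    logging_queue.append(message)


send("txn-1 amount=100")
send("txn-2 amount=250")

# The indexing side drains its own copy of the messages.
indexed = [logging_queue.popleft() for _ in range(len(logging_queue))]
# The business queue still holds every message for the transaction.
remaining = list(business_queue)
```

Reading the logging copy never removes anything from the business queue, so the transaction is not affected and no message is missed.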
25
Why use ZeroMQ
Lightweight
Multiple client language support (Python, C++, Java, etc.)
Multiple design patterns (Pub/Sub, Pipeline, Request/Reply, etc.)
Open source with community support
26
Application Queue and ZeroMQ Example
Auto Load Balance
1
2
27
Example Python Sender
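import datetime
import random
from time import sleep

import zmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.connect('tcp://127.0.0.1:5000')
sleeptime = 0.5

while True:
    # Generate a random temperature reading with a timestamp
    num = random.randint(50, 100)
    now = str(datetime.datetime.now())
    sleep(sleeptime)
    payload = now + " Temperature=" + str(num)
    socket.send_string(payload)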
28
Python Receiver (Scripted Input)
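import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
# Change address and port to match your environment
socket.bind("tcp://127.0.0.1:5000")

try:
    while True:
        msg = socket.recv_string()
        # Flush so each event is written to stdout immediately
        print(msg, flush=True)
except Exception as exc:
    print("exception: %s" % exc)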
29
Python Subscriber (Scripted Input)
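import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)

socket.connect("tcp://localhost:5556")

# Subscribe to direction
direction_filter = "east"
socket.setsockopt_string(zmq.SUBSCRIBE, direction_filter)

while True:
    string = socket.recv_string()
    # Flush so each event is written to stdout immediately
    print(string, flush=True)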
30
Parallel Pipeline Example
31
Getting Events out of Splunk
Splunk SDK
Use Cases:
– In-depth processing of Splunk events in a queued manner
– Use as pivot point to drop off events into a Complex Event Processor
– Batch processing of Splunk events outside of Splunk
Divide and Conquer Approach as seen in last slide
32
Java Example using SDK to load ZeroMQ
String query = search;
Job job = service.getJobs().create(query, queryArgs);
while (!job.isDone()) {
    Thread.sleep(100);
    job.refresh();
}
// Get Query Results and store in String str… (Code Omitted)
// Assuming single line events
StringTokenizer st = new StringTokenizer(str, "\n");
while (st.hasMoreTokens()) {
    String temp = st.nextToken();
    sock.send(temp.getBytes(), 0);
    byte[] response = sock.recv(0);
}
33
Idle Computers at a Corporation
…
34
Idea: Use Ideas from SETI@home
35
Idle Computers Put to Work Using JMS
…
36
Applications for Distributing Work
Application Server would free up computing resources
Work could be pushed to underutilized computers
Examples:
– Massive Mortgage Calculation Scenarios
– Linear Optimization Problems
– Matrix Multiplication
– Compute all possible paths for combinatorics
37
Architecture
Optional
38
Algorithm
Application servers push requests to queues; each request object, called a Unit of Work, may include data
JMS client implements doWork() interface to work with data
Message Driven Bean receives finished work and implements doStore() interface
What does this have to do with Splunk?
– Time Series results can be stored in Splunk for further or historical analytics
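The flow above can be sketched in a few lines. This is a loose Python rendering of the idea, not the Java code from the referenced project: `UnitOfWork`, `ResultStore`, `MatrixMultiply` and `ListStore` are hypothetical names, a plain list stands in for the JMS queue, and `do_work()` / `do_store()` mirror the doWork() and doStore() interfaces named on the slide.

```python
from abc import ABC, abstractmethod


class UnitOfWork(ABC):
    """Hypothetical request object: carries data plus the work to do on it."""

    @abstractmethod
    def do_work(self):
        """Run on the idle client and return the finished result."""


class ResultStore(ABC):
    """Hypothetical receiver of finished work."""

    @abstractmethod
    def do_store(self, result):
        """Persist one finished result."""


class MatrixMultiply(UnitOfWork):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def do_work(self):
        # Plain row-by-column multiplication of two nested lists.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*self.b)]
                for row in self.a]


class ListStore(ResultStore):
    def __init__(self):
        self.results = []

    def do_store(self, result):
        self.results.append(result)


# A list stands in for the request queue filled by the application server.
request_queue = [MatrixMultiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])]
store = ListStore()
for unit in request_queue:
    # The client does the work; the receiver stores the finished result.
    store.do_store(unit.do_work())
```

In the architecture described here, the stored results would be sent on to Splunk as time series events instead of being kept in a list.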
39
Matrix Example High Level Architecture
40
Search Language Against Matrix Result
List Column Values of Each Stored Multiplied Matrix using Multikv
Screenshot here
41
Search Language Against Matrix Result
Visualize the Average for Columns 2 to 5
Screenshot here
42
Search Language Against Matrix Result
Perform arbitrary math on aggregate columns
Screenshot here
43
Reference
ZeroMQ
– http://apps.splunk.com/app/1000/
– Blog: http://blogs.splunk.com/2012/06/08/zeromq-as-a-splunk-input/
Using JMS for Underutilized Computers
– GitHub Reference: https://github.com/nimishdoshi/JMSClientApp/
– Blog: http://blogs.splunk.com/2014/04/11/splunk-as-a-recipient-on-the-jms-grid/
– Article: http://www.oracle.com/technetwork/articles/entarch/jms-distributed-work-082249.html
Questions?