
Jan 2014, HAPPIEST MINDS TECHNOLOGIES

Innovation @Work Log Management with Logstash and ElasticSearch

Rishav Rohit

SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY.


Copyright Information

This document is exclusive property of Happiest Minds Technologies Pvt. Ltd. It is intended for limited circulation.

© 2013 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved


Contents

Copyright Information
Abstract
Introduction
Problem Definition
High Level Solution
Solution Details
Solution Benefits
Solution extend-ability
Deliverables
Conclusion
References
Happiest Mind Innovators


Abstract

Gathering logs from a wide array of servers and applications so that they can be collected, searched, and analyzed centrally, in real time, is a challenging task. Once we overcome this challenge, we can draw a wealth of insights from these logs, identify problems, and arrive at solutions or corrective measures much more quickly. In this paper, we build a highly scalable, real-time log collection, search, visualization and analysis application using Logstash, ElasticSearch and Kibana.

Introduction

Recent compliance mandates require not only that organizations collect all logs, but also that the logs be reviewed regularly, be searchable, and be stored in their original, unaltered, raw form for mandate-specific timeframes. Log management solutions address these data collection and retention needs in a way that allows organizations to inexpensively collect, store and manage large amounts of log data. To solve this problem we can build a highly scalable solution with real-time analysis using Logstash, ElasticSearch and Kibana.

Logstash: Logstash is a lightweight, easily integrated tool for managing events and logs. It can collect logs, parse them and store them in a central location. It is free and open source under the Apache license.

ElasticSearch: ElasticSearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. ElasticSearch is free and open source under the Apache license.

Kibana: Kibana is a web-based, highly scalable dashboard solution that integrates seamlessly with ElasticSearch and provides real-time analysis of streaming data. It is also a free and open source product.

Problem Definition

Logs are extremely useful in identifying security incidents, policy violations, fraudulent activity, and operational problems. They are also valuable when performing audits, forensic analysis and internal investigations, and when identifying operational trends and long-term problems. However, the sheer variety of log data formats makes it impossible to use the data without normalization.

As organizations grow, the variety of log data sources and the volume of data will increase. Compounding this challenge is the variability of data formats and the distributed nature of these sources; in addition, every network infrastructure is in a constant state of change, with new systems, applications, users, and devices being added every day of the year.

All these challenges can be handled in a cost-effective and efficient manner by a log management solution which can offer these features:

- Centralized
- Highly reliable
- Searchable
- Scalable
- Secure

High Level Solution

Given below is a brief overview of the different technologies used for the Log Management solution.

Logstash is a tool for managing events and logs. It is capable of filtering, modifying and shipping out events and logs. Logstash natively offers plugins for a variety of sources such as ElasticSearch, RabbitMQ, Redis, S3, Twitter and ZeroMQ. Apart from single-line logs, it can also handle JSON and multi-line logs. It offers a wide range of filters such as grok, csv, date, geoip and kv, and it can ship the parsed logs to ElasticSearch, S3, Redis, ZeroMQ, MongoDB, etc. A complete list of Logstash input, output and filter plugins is available at http://logstash.net/docs/latest/.

The alternatives to Logstash include Splunk, Chukwa, Flume and Graylog, but none of these offers the same combination of features: free and open source, high flexibility, low memory consumption, and native plugins for a range of inputs, codecs, filters and outputs.

ElasticSearch is a rapidly growing open source search solution, used by thousands of enterprises in virtually every industry. It is used in production at companies like Mozilla, StackOverflow, GitHub, Clout, McGraw-Hill, etc.

ElasticSearch provides features such as faceted search, auto-complete, routing and sharding, and it scales easily. It returns search results in near real time (on the order of milliseconds).

Kibana is a lightweight, web-based dashboard and analysis application capable of real-time analysis of streaming data. It provides dashboard components such as maps, histograms, trends and many other basic components.

The high level architecture for this solution is given in the diagram below:


Diagram – HLD of Log Management Solution

In the above architecture we have three components:

- Logstash Agent
- ElasticSearch Cluster
- Kibana UI

The Logstash agent is a lightweight Java application running on the server(s) producing the logs. It filters and parses the logs and then ships each event as a JSON document to the ElasticSearch cluster.

The ElasticSearch cluster acts as a persistent store for logs and offers real-time search capabilities. Thanks to its distributed architecture, ElasticSearch can scale massively without compromising on performance.

Kibana is a UI dashboard and analysis tool. It offers both pre-configured and on-demand dashboards. Kibana uses REST APIs to interact with ElasticSearch.
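To make the REST interaction a little more concrete, here is a minimal sketch of how any HTTP client might prepare a full-text search request against ElasticSearch. It is not Kibana's own code. The helper name build_search is hypothetical, and the host, port (9200 is the ElasticSearch default HTTP port) and the daily index name are assumptions for illustration; the request is only constructed, never sent.

```python
import json
from urllib.request import Request

# Assumption: ElasticSearch is reachable on its default HTTP port on localhost.
ES_URL = "http://127.0.0.1:9200"


def build_search(index, field, text, size=10):
    """Build (but do not send) a POST request carrying a match query."""
    body = {"query": {"match": {field: text}}, "size": size}
    return Request(
        url="%s/%s/_search" % (ES_URL, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example: look for events of type "weblog" in one daily index.
req = build_search("logstash-2004.02.01", "type", "weblog")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would execute the search against a running cluster; the response is a JSON document containing the matching events.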

Solution Details

For the purpose of demonstrating this solution, I have used clickstream logs from the ECML/PKDD 2005 Discovery Challenge. Some sample log lines are shown below:

12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/

These log lines are delimited by semicolons (;) and have the following fields, in order:

- shop_id
- unixtime
- client ip
- session
- visited page
- referrer
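As a quick illustration of this record layout, the following sketch splits one sample line into named fields using plain Python, independent of Logstash. The helper name parse_line is hypothetical, and the field identifiers are shorthand for the six fields listed above.

```python
import csv
import io

# The six fields, in the order in which they appear in each log line.
FIELDS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def parse_line(line):
    """Split one semicolon-delimited clickstream line into a dict of named fields."""
    values = next(csv.reader(io.StringIO(line), delimiter=";"))
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


sample = ("12;1075658406;195.146.109.248;"
          "05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/")
record = parse_line(sample)
print(record["client_ip"], record["page"])
```

All values are kept as strings at this stage; type conversion is a separate step.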

To build the demo we need to create a Logstash configuration file (clickstream.conf) that specifies inputs, filters and outputs.

The clickstream.conf file looks like:

input {
  file {
    # path for clickstream log
    path => "/path/to/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}

filter {
  csv {
    # define columns present in weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get visited page and page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # as we are getting unixtime field in epoch seconds we will convert it to normal timestamp
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # this will convert ip to longitude-latitude using GeoLiteCity database from Maxmind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # this will convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}

output {
  # store output in local elasticsearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}

In the above Logstash configuration file we have defined the input to be a log file and given the absolute path to the log. In the filter section we parse the different fields, convert epoch seconds to a date-time format, and convert the IP address to a latitude-longitude combination for plotting on a map. Finally, we store the parsed logs in a local ElasticSearch cluster.
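To show what the grok and date steps do to a single event, here is a rough Python equivalent. It is a sketch under stated assumptions, not Logstash's implementation: the helper name enrich is hypothetical, urlsplit stands in for the URIPATH and URIPARAM grok patterns, and the geoip lookup is omitted because it needs the MaxMind database.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit


def enrich(record):
    """Mimic the grok and date filter steps on a parsed clickstream record."""
    event = dict(record)
    # grok step: split the visited page into path and query parameters,
    # then drop the original field, as remove_field => ["page"] does.
    parts = urlsplit(event.pop("page"))
    event["page_visited"] = parts.path
    if parts.query:
        event["page_params"] = "?" + parts.query
    # date step: interpret unixtime as epoch seconds and store it as a UTC
    # timestamp on the event.
    ts = datetime.fromtimestamp(int(event["unixtime"]), tz=timezone.utc)
    event["@timestamp"] = ts.isoformat()
    return event


event = enrich({"unixtime": "1075658406", "page": "/ct/?c=153"})
print(event["page_visited"], event["page_params"], event["@timestamp"])
```

The sample timestamp falls on 1 February 2004, which matches the date embedded in the log file name used in the configuration.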

To start the Logstash agent on the server, run the command below:

java -jar logstash-1.3.2-flatjar.jar agent -f clickstream.conf --web

This command starts a Logstash JVM process which parses the logs, indexes them into ElasticSearch and also starts the Kibana UI at http://localhost:9292/. By creating some simple dashboards in the Kibana UI we can visualize the logs.

Some sample screenshots from the Kibana UI are given below:

Screenshot 1 - Histogram showing page landing count for different time intervals.

Screenshot 2 - Map showing geographical distribution of users.

Screenshot 3 - Table showing different fields of logs.

Solution Benefits

The benefits offered by this solution are listed below:

1. All the tools used in this solution are free and open source, which makes it very cost-effective.

2. The development effort required is very low: the only coding needed is the Logstash configuration file, and for the UI only the Kibana dashboards need to be designed.

3. This solution is highly scalable. Logstash has been tested at around 25,000 events per second per node, and ElasticSearch is used in production by many web-scale companies.

4. All the tools are open source and are actively contributed to by a large developer community.

5. Logstash consumes very little memory, around 150 MB.
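For a feel of what the throughput figure in point 3 implies, the arithmetic can be sketched as below. The helper names and the 60,000 events per second peak rate are hypothetical, and the calculation assumes throughput scales linearly with the number of nodes, which a real deployment would need to verify.

```python
import math

# Figure quoted in the benefits list: events per second per Logstash node.
EVENTS_PER_NODE_PER_SECOND = 25000
SECONDS_PER_DAY = 86400


def events_per_day(nodes=1):
    """Upper bound on daily events, assuming linear scaling across nodes."""
    return nodes * EVENTS_PER_NODE_PER_SECOND * SECONDS_PER_DAY


def nodes_needed(peak_events_per_second):
    """Smallest number of nodes whose combined rate covers the given peak."""
    return math.ceil(peak_events_per_second / EVENTS_PER_NODE_PER_SECOND)


print(events_per_day())     # one node running at the quoted rate for a day
print(nodes_needed(60000))  # hypothetical peak of 60,000 events per second
```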

Solution extend-ability

Logstash not only manages logs but is also capable of handling different types of events such as JSON, ActiveMQ, RabbitMQ, ZeroMQ and Twitter feeds. It can also output aggregated counts of different events, and it can ship events to a variety of tools like Riak, Redis, S3, Graphite, etc.

Apart from being used as a search engine, ElasticSearch can be used as a NoSQL database, a historical archive and a real-time analytics tool.


These features of Logstash and ElasticSearch make the solution applicable to many practical business problems.

Deliverables

Presentation of the solution with a focus on architecture, design and use cases.

Conclusion

The Log Management solution proposed using Logstash, ElasticSearch and Kibana is a cost-effective, efficient, reliable and highly scalable solution.

These products are backed by an active user community which keeps adding value and new functionality to them. They are also backed and supported by the ElasticSearch company.

References

Logstash - http://www.elasticsearch.org/overview/logstash/
ElasticSearch - http://www.elasticsearch.org/overview/
Kibana - http://www.elasticsearch.org/overview/kibana/
ElasticSearch Users - http://www.elasticsearch.com/case-studies/
Logstash Performance Test - https://gist.github.com/paulczar/4513552
Logstash Memory Consumption - http://blog.sematext.com/2013/11/05/logstash-performance-monitoring/
ECML/PKDD 2005 Discovery Challenge - http://lisp.vse.cz/challenge/ecmlpkdd2005/

Happiest Mind Innovators

Number of contributors - 1
Names of the contributors - Rishav Rohit
Role of the contributor - Solution design and development
