
Machine Learning Techniques for Security Information and Event Management

Jordan A. Caraballo-Vega (1,2), George Rumney (2), John Jasen (2), Jasaun Neff (2)

(1) University of Puerto Rico at Humacao, Department of Mathematics, Humacao, PR
(2) High Performance Computing Branch, NASA Center for Climate Simulation (NCCS), NASA Goddard Space Flight Center, Mail Code 606.2, Greenbelt, MD, USA

Motivation & Objectives

The deployment and maintenance of a High Performance Computing facility such as the NASA Center for Climate Simulation requires services able to monitor and report live results of hardware and software operational statistics. With more than 4,000 computing nodes and more than 90,000 processor cores, it is crucial for the NCCS to implement techniques that advance, improve, and speed up the way we analyze failures in order to fix and prevent future downtime.

One important technique long used to supervise information and events is to automatically store time-stamped documentation of relevant procedures in log files. This technique helps organizations, businesses, and networks proactively mitigate risk. Although this information is very useful, as the volume of fast-moving data increases it becomes nearly impossible for humans to detect error causes or incoming threats.

Figure 1. We receive ~115 to 120 million log messages daily from ~3,000 servers.

Therefore, our aims are to:
- Enhance and improve our ability to view, analyze, and monitor log files.
- Upgrade our existing ELK + Graylog infrastructure.
- Prove that machine learning (ML) techniques are useful for log analysis.
- Implement and develop ML jobs to automate the detection of common and security events.
- Produce a recipe for future production upgrade procedures.

Figure 2. Diagram of some of the services currently stored in our log infrastructure (Proxy, CPU, SSHD, Firewall, DNS, Software, LDAP). They are all centralized and monitored in an Elasticsearch + Logstash + Graylog environment.

SIEM Components

Security Information and Event Management combines SIM (Security Information Management) and SEM (Security Event Management) for a consolidated analysis of your logs from multiple perspectives. While SEM centralizes the storage of logs and allows real-time analysis, SIM collects the data and provides automated trend analysis that leads to a fully compliant and centralized service report.


The combination of these two techniques gives SIEM its powerful and effective ability to detect incoming threats in real time and to perform forensic analysis on log data. By storing events historically, SIEM gives the administrator the flexibility to correlate events and to recognize unusual behavior more easily by using baselines.

Log Analysis Demo Cluster

In order to implement, configure, and test these forensic techniques, an Elasticsearch + Logstash + Graylog + Kibana demo system was built. Three Dell C6100 servers running Debian 8 were initially assembled with version 2.4 of Elasticsearch, Logstash, and Graylog. This demo infrastructure reproduces the current NCCS log cluster's functionality and provides a realistic rehearsal of the procedures required to deploy the upgraded infrastructure in production. The software elements that make up the cluster are described below.

Elasticsearch – a highly scalable, distributed open-source search and analytics engine that lets the user store, search, and analyze large volumes of data in near real time with low latency. Its indexed storage provides fast data retrieval.
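To make the query workflow of Figure 5 concrete, below is a minimal Python sketch that sends a JSON query to Elasticsearch and prints the matching documents. The node address, index name, and field names are illustrative assumptions, not the actual NCCS configuration.

```python
# Minimal sketch: send a JSON query to Elasticsearch and print matches,
# as in Figure 5. Assumes a demo node at localhost:9200 and a hypothetical
# index "logstash-2017.07.21"; adjust both for a real cluster.
import json
import requests

query = {
    "query": {
        "bool": {
            "must": [{"match": {"program": "sshd"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}],
        }
    },
    "size": 10,
}

resp = requests.get(
    "http://localhost:9200/logstash-2017.07.21/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
# Elasticsearch returns the matching documents as JSON "hits".
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("message"))
```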

Logstash – a server-side data collection and processing pipeline that ingests, transforms, normalizes, and sends data from a multitude of sources to a wide range of destinations. It comes with over 200 pre-built plugins that ease the process of filtering unstructured data. Messages are parsed and filtered through configuration files that match patterns in the logs.
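Those pattern-matching (grok) filters live in the Logstash pipeline configuration; as a rough, hypothetical illustration of what such a filter does, the Python sketch below extracts named fields from an unstructured SSHD-style log line with a regular expression. The log format and field names are assumptions for illustration only.

```python
# Rough Python illustration of what a Logstash grok filter does:
# extract named fields from an unstructured log line. The SSHD log
# format and field names here are illustrative.
import re

GROK_LIKE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s"
    r"(?P<host>\S+)\ssshd\[(?P<pid>\d+)\]:\s"
    r"(?P<event>Accepted|Failed)\s(?P<method>\S+)\sfor\s(?P<user>\S+)\s"
    r"from\s(?P<src_ip>\S+)"
)

line = ("Jul 21 20:01:02 ldap2 sshd[4242]: "
        "Accepted publickey for jdoe from 10.0.0.5 port 52100 ssh2")
match = GROK_LIKE.match(line)
if match:
    # Structured fields ready to be shipped to Elasticsearch.
    print(match.groupdict())
```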

Graylog – a powerful log management and analysis tool that parses and enriches logs while providing centralized configuration management for third-party collectors. Its REST API and web interface let the user forward and pull data to and from multiple systems, with the ability to integrate LDAP user directories.

Kibana – an analytics and visualization platform built on top of Elasticsearch that lets the user search, view, and interact with data stored in indices. Its browser-based interface enables quick creation and sharing of dynamic dashboards that can include graphs, histograms, charts, and many other representations. Kibana plays an enormous role in visualizing and identifying the anomalies encountered in the systems.

Figure 5. Diagram representing the Elasticsearch workflow. A JSON query is executed to match documents, and Elasticsearch returns the matches as JSON output.

Figure 3. Diagram illustrating the workflow of our SIEM infrastructure (stages: information/event sources, forwarding and storage, analysis & visualization, security alerts). It begins with system inputs (A) (CPUs, firewalls, switches, servers, etc.), which are parsed and stored in a log cluster (B). After messages are indexed, they are analyzed through ML models, and results can be visualized in dashboards (C) that include real-time graphs and specific anomaly information.

Figure 6. Diagram representing the Logstash workflow. It receives data as input, parses logs through its filter plugins, and sends messages to the declared destinations.

Beats – single-purpose lightweight agents that ship data to multiple Logstash and Elasticsearch instances. Beats are great for gathering and centralizing data and can forward data from the network (Packetbeat), log files (Filebeat), the operating system (Metricbeat), and many other sources.

Machine Learning Overview

The Elasticsearch X-Pack 5.4 release brings unsupervised machine learning techniques into play. X-Pack ML automatically models the behavior of your data to streamline root-cause identification and to detect issues faster while reducing false positives. It uses statistical models that calculate baselines over time for what is normal in your logs or messages. After baselines are calculated for a set of points, it searches the data sets for deviations, which are then identified as anomalies. X-Pack ML models are adaptive: as more data enters the system, the models update automatically, which suits SIEM objectives well.

By applying multiple probability distribution functions, the analysis gains the flexibility to determine which model is most effective for your data set. Examples of anomalies a system can encounter include an entity whose behavior changes suddenly and an entity that is drastically different from others in a population. Once the model is determined, analysis functions such as mean and sum are calculated over the data to identify deviations from the baseline values and their influencers.
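X-Pack ML's models are probabilistic and adapt over time; as a deliberately simplified illustration of the baseline-and-deviation idea described above, the Python sketch below flags points that stray too far from a trailing baseline. This is not X-Pack's actual algorithm.

```python
# Simplified illustration of baseline-plus-deviation anomaly detection:
# flag points far from a trailing-window baseline. X-Pack ML's real models
# are probabilistic and far more sophisticated than this sketch.
from statistics import mean, stdev

def flag_anomalies(series, window=24, threshold=3.0):
    """Return indices whose value deviates from the trailing-window
    baseline by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(series[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# Hourly event counts with a sudden spike at the end (cf. Figure 4).
counts = [100, 104, 98, 101, 99, 103, 97, 102, 100, 96,
          101, 99, 105, 98, 102, 100, 97, 103, 99, 101,
          100, 98, 102, 99, 2200]
print(flag_anomalies(counts))  # -> [24], the spike
```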

Methods


Data from multiple services was collected, filtered, ingested, and analyzed. The example below reflects a job created from sample data acquired from the internet. Fields such as detectors, data types, detection period, influencers, and analysis functions are key pieces for detecting anomalies. The analysis functions are responsible for detecting what is abnormal in the data range and typically include mean, sum, metric, time, and many others.
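As a hedged sketch of how such a job can be defined programmatically, the snippet below creates an anomaly detection job through the 5.x-era X-Pack ML REST endpoint, with detectors, influencers, and analysis functions like those discussed above. The job id, field names, and bucket span are illustrative (they mirror the response-request-by-app job described later); the exact configuration keys should be checked against the deployed X-Pack version.

```python
# Hedged sketch: create an X-Pack ML anomaly detection job via the 5.x-era
# `_xpack/ml` REST endpoint. Job id, field names, and bucket span are
# illustrative; verify the config keys against the deployed version.
import json
import requests

job = {
    "description": "Sum of requests and mean response, split by country",
    "analysis_config": {
        "bucket_span": "15m",  # some 5.x releases expect seconds instead
        "detectors": [
            {"function": "sum", "field_name": "requests",
             "partition_field_name": "geo.country_name"},
            {"function": "mean", "field_name": "response"},
        ],
        "influencers": ["clientip", "geo.country_name"],
    },
    "data_description": {"time_field": "@timestamp"},
}

resp = requests.put(
    "http://localhost:9200/_xpack/ml/anomaly_detectors/response-request-by-app",
    headers={"Content-Type": "application/json"},
    data=json.dumps(job),
)
print(resp.status_code, resp.json())
# A datafeed must then be created and the job opened before data flows in.
```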

Figure 4. Representation of events as a function of time across days. The initial behavior is relatively stable compared to the end, where several peaks arise. This is an example of a drastically changed entity behavior. Possible anomalies are colored in red.


Machine Learning Jobs and Findings

X-Pack – an Elastic Stack extension that bundles security, alerting, monitoring, reporting, graph, and machine learning features designed to work together seamlessly. Its reporting capabilities can generate and email dashboards as PDF reports, and it can send automatic alerts about your system.
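As an example of the alerting side, below is a hedged Python sketch that registers an X-Pack Watcher watch to email administrators when failed SSH logins appear. The watch id, index pattern, query fields, and recipient are illustrative, and email actions additionally require an SMTP account configured on the cluster.

```python
# Hedged sketch: register an X-Pack alerting (Watcher) watch that emails
# when failed SSH logins appear in the last 10 minutes. Watch id, index
# pattern, query fields, and recipient are illustrative assumptions.
import json
import requests

watch = {
    "trigger": {"schedule": {"interval": "10m"}},
    "input": {
        "search": {
            "request": {
                "indices": ["logstash-*"],
                "body": {
                    "query": {
                        "bool": {
                            "must": [{"match": {"message": "Failed password"}}],
                            "filter": [{"range": {"@timestamp": {"gte": "now-10m"}}}],
                        }
                    }
                },
            }
        }
    },
    # Fire only when the search returned at least one hit.
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
    "actions": {
        "notify_admins": {
            "email": {
                "to": "secops@example.com",
                "subject": "Failed SSH logins detected",
            }
        }
    },
}

resp = requests.put(
    "http://localhost:9200/_xpack/watcher/watch/failed-ssh-logins",
    headers={"Content-Type": "application/json"},
    data=json.dumps(watch),
)
print(resp.status_code)
```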

Figure 7. Representation of the cluster workflow with X-Pack integrated (log importers: Logstash, Beats; database: Elasticsearch; visualization: Kibana, Graylog). Logs are imported into the database with security features enabled, then analyzed and visualized with ML.



Response Request by Application

Data Description – Stats taken from Apache log data.
Job Details – A multi-metric job monitors the total sum of requests and the mean of the response values, taking the country of origin into account.
Aim of the Job – Detect whether an IP address is issuing high volumes of requests over time, and whether those requests are legitimate.
Graph Descriptions – Figure 8A shows a high peak in the number of requests made (100x higher than baseline); red circles mark the largest anomalous events. The map in Figure 8B illustrates the origin and intensity of the requests based on their IP addresses and total sum.

Figure 8. Anomaly ML Kibana dashboard representation of the response-request-by-app job. There are high volumes of requests from unauthorized countries.

The jobs described below cover connection event rates, user authentication, and real-time system usage stats. The data was taken from multiple NCCS production systems and was filtered and ingested with Logstash and Metricbeat. These representations are examples of the ML advanced job option and are designed to monitor multiple events with a variety of detectors and influencers.

LDAP Event Rate

Data Description – Stats taken from a week's worth of logs from four NCCS LDAP servers. Data was parsed with the Logstash grok and kv plugins.
Job Details – A multi-metric job was created using log sources as influencers and the high mean of operations per event as the detector.
Aim of the Job – Monitor whether a server is issuing high volumes of operations over time, taking the log source and operation type into account.
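For intuition about the kv stage of that parsing, the Python sketch below splits a key=value log payload into fields, roughly as the Logstash kv filter does. The LDAP-style line is illustrative, not the exact NCCS log format.

```python
# Rough Python illustration of what the Logstash kv filter does: split a
# log payload of key=value pairs into fields. The LDAP-style log line is
# illustrative, not the exact NCCS format.
def kv_parse(payload: str, field_split: str = " ", value_split: str = "=") -> dict:
    fields = {}
    for token in payload.split(field_split):
        if value_split in token:
            # Split on the first separator only; strip surrounding quotes.
            key, _, value = token.partition(value_split)
            fields[key] = value.strip('"')
    return fields

line = 'conn=1012 op=3 SRCH base="dc=nccs,dc=nasa,dc=gov" scope=2 filter="(uid=jdoe)"'
print(kv_parse(line))
# {'conn': '1012', 'op': '3', 'base': 'dc=nccs,dc=nasa,dc=gov',
#  'scope': '2', 'filter': '(uid=jdoe)'}
```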

SSHD Event Rate

Data Description – Three weeks' worth of logs from NCCS SSHD servers.
Job Details – An advanced job calculates the event rate received from multiple input servers, categorized by hostname.
Aim of the Job – Detect whether servers are sending uncommon volumes of events over time. This builds a wider picture of the cluster's baseline behavior.

System Metric Change

Data Description – Six hours of CPU data together with a high-CPU signature produced by a stress tool.
Job Details – An advanced job monitors the mean of the system CPU usage and detects anomalously high values as events.
Aim of the Job – Detect unusual behavior in CPU consumption. The system performs tasks while it is being monitored.

Figure 9. The LDAP2 event rate representation shows a peak 22 times higher than baseline, identified on July 21st at ~8:00 pm.

Figure 10. Red squares reveal possible anomalies found. On June 24th, seven of the ten analyzed servers exhibited anomalous behavior based on the volume of events sent.

Figure 11. A high CPU consumption event that dropped drastically was detected at ~13:30. This may indicate that the system rebooted or was down momentarily.

Conclusions

- The Elastic Stack upgrade from version 2.4 to 5.5 brought significant changes that required analysis, problem resolution, and documentation.
- Beats tools are convenient, easy-to-use packages that can play an important role in ingesting logs in a production environment.
- Machine learning techniques were effective and extremely useful for analyzing real-time and archived data. The implemented jobs detected subtle anomalies that would have required a great deal of effort and time to find manually.
- The X-Pack release brings security and alerting features that will be very useful for detecting incoming threats, and it does not require huge CPU power.
- Machine learning is emerging as a powerful analytics technique for log analysis and proved to be a great engine for SIEM purposes.

Future Work

- Continue monitoring the X-Pack beta release for new updates and improvements.
- Combine results from different machine learning jobs to detect incoming threats that may harm multiple services.
- Enable new features to analyze a wider variety of logs with ML modules.
- Deploy machine learning techniques in a production environment. This may include the development of our own models and UI features.
- The final result will be an environment with the capacity to analyze, monitor, and visualize logs through machine learning models.

Acknowledgements

- NASA Minority University Research and Education Scholarship
- Thanks to George Rumney (mentor), John E. Jasen (mentor), and Jasaun Neff (mentor) for their continued advice and help during this project.
- Special thanks to Bennett Samowich and Maximiliano Guillen for their technical support and assistance.
- Thanks to Melissa Canon and Mablelene Burrell for their organization and support throughout the internship experience.

References

- Elasticsearch. Retrieved June 10, 2017, from https://www.elastic.co
- X-Pack ML. Retrieved June 15, 2017, from https://www.elastic.co/products/x-pack/machine-learning
- Logstash. Retrieved June 16, 2017, from https://www.elastic.co/products/logstash
- Kibana. Retrieved June 17, 2017, from https://www.elastic.co/products/kibana