Regular Expressions and XML Parsing

Objectives

After this session you should be able to:

Understand and write Regular Expressions

Create XML code that will use Regular Expressions to parse data from providers into parameters

Regular Expressions

Regular Expression Parsing In Application Log/Syslog Provider

Regular Expressions

Used to parse and analyze fields

Designed for matching text items

Requires extremely precise syntax

Regular Expression Overview

Can be used in:

− Rule criteria
− View criteria
− Computer Group formulas
− XML / parameter parsing

Popular Boost.org regular expression parser

Perl-like regular expression syntax

Features include:

− Advanced text pattern matching
− Timestamp conversion
− UI support
− Syslog IP filtering

Note: The regex that is used in Rules, Views and Computer Groups is not the same syntax as parsers

Regular Expression Example

Regular Expression = ^World\s+.*

“^” means “Start of Line”

“^World” means the text line must begin with “World”

“\s+” means one or more whitespace characters

“.*” is a wildcard that matches anything else on the line

Matches:

− “World Wide Web Publishing Service”
− “World with lots of space”
− “World Class”
− “World War”

Does not match:

− “WorldWide”
− “Wide World of Sports”
− “Wayne’s World”
− “War of the Worlds”
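The match lists for this first example can be checked with a short script. This is a minimal sketch using Python's `re` module rather than the Boost parser the product uses; for the simple constructs in this expression the two behave the same, but that equivalence is an assumption for anything more advanced. The helper name `matches` is made up for illustration.

```python
import re

# Pattern from the slide: the line must start with "World" followed by whitespace.
PATTERN = re.compile(r"^World\s+.*")

def matches(line: str) -> bool:
    """Return True if the pattern is found in the line."""
    return PATTERN.search(line) is not None

should_match = [
    "World Wide Web Publishing Service",
    "World    with lots of space",
    "World Class",
    "World War",
]
should_not_match = [
    "WorldWide",
    "Wide World of Sports",
    "Wayne's World",
    "War of the Worlds",
]

for line in should_match + should_not_match:
    print(f"{line!r}: {matches(line)}")
```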

Regular Expression Example #2

Regular Expression = ^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+:?(\d+)?

“^” means “Start of Line”

“\s+” means one or more whitespace characters, so the line must begin with whitespace

“(TCP|ICMP)” means the literal word “TCP” or “ICMP” must be present; the parentheses keep the “|” from splitting the whole expression into two alternatives

“\d+\.\d+\.\d+\.\d+:?(\d+)?” means a field with up to 5 numeric parts: four separated by periods, then an optional colon and an optional 5th part

Each period is escaped as “\.” because an unescaped “.” matches any character

Since each numeric component has a “+”, it can be one or more consecutive digits

Matches:

− “  TCP 192.168.1.1:80”
− “  ICMP 192.168.1.1”
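Two details of this expression are easy to get wrong: an ungrouped “|” splits the entire expression into two alternatives, and “\d+?” is a lazy “one or more” rather than an optional part. The sketch below compares an ungrouped form with a grouped one, using Python's `re` module as a stand-in for the Boost parser. The sample lines and the variable names are hypothetical.

```python
import re

# Ungrouped: "|" separates "^\s+TCP" from "ICMP\s+...", and "\d+?" still needs a digit.
ungrouped = re.compile(r"^\s+TCP|ICMP\s+\d+.\d+.\d+.\d+:?\d+?")
# Grouped alternation, literal dots, and a genuinely optional port.
grouped = re.compile(r"^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+:?(\d+)?")

samples = [
    "  TCP    192.168.1.1:80",
    "  ICMP   192.168.1.1",
    "  TCP garbage",  # no address at all
]

for line in samples:
    print(repr(line), bool(ungrouped.search(line)), bool(grouped.search(line)))
```

The ungrouped form accepts the “TCP garbage” line because its first alternative only requires leading whitespace and “TCP”, and it rejects the ICMP line because no digit is left over for the trailing “\d+?”.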

Regular Expression Example #3

Regular Expression = [^\,]*

Matches every character up to, but not including, the next comma. Any other delimiter character can be used in place of the comma.

Useful for matching data within a given sub-expression that can vary greatly

Matches: applied at the start of the line below, the expression matches 250606001E05 and stops at the first comma:

250606001E05,25,2,8,HOUAV03
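A minimal sketch of the “everything up to the delimiter” idiom, again using Python's `re` module as a stand-in for the Boost parser. The variable names are illustrative only.

```python
import re

line = "250606001E05,25,2,8,HOUAV03"

# Anchored at the start of the line, [^\,]* stops at the first comma.
first_field = re.match(r"[^\,]*", line).group(0)

# With "+" instead of "*", repeated searching yields every comma-separated field.
all_fields = re.findall(r"[^\,]+", line)

print(first_field)
print(all_fields)
```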

Regular Expression Operators and Their Definitions

Menu Item                 Character   Definition
Any Character             .           Matches any single character
Character in Range        [ ]         Matches any single character from within the bracketed list
Character Not in Range    [^ ]        Matches any single character not in the bracketed list
Beginning of Line         ^           Matches the beginning of a line
End of Line               $           Matches the end of a line

Special Characters

Special Characters include \ ^ $ * . [ ] | + ? ( ) { }

Any time you want to use a special character as a literal, it must be escaped with a backslash

− Example: The path c:\myfile.txt would need to be entered as c:\\myfile\.txt
− Example: The User-ID $ExchangeService would need to be entered as \$ExchangeService
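The effect of escaping can be seen by trying both escaped and unescaped forms. This sketch uses Python's `re` module, not the Boost parser; `re.escape` is a Python convenience and is not something the product is known to offer.

```python
import re

path = r"c:\myfile.txt"
user = "$ExchangeService"

# Hand-escaped patterns, written the way the slide gives them.
path_re = re.compile(r"c:\\myfile\.txt")
user_re = re.compile(r"\$ExchangeService")

escaped_path_ok = path_re.search(path) is not None
escaped_user_ok = user_re.search(user) is not None

# Unescaped "$" is an end-of-line anchor, so nothing can follow it.
unescaped_user = re.search(r"$ExchangeService", user)

# Unescaped "." matches any character, so a wrong name is accepted.
dot_too_loose = re.search(r"myfile.txt", "myfileXtxt") is not None

# Python can also generate the escaped form automatically.
auto_ok = re.search(re.escape(path), path) is not None

print(escaped_path_ok, escaped_user_ok, unescaped_user, dot_too_loose, auto_ok)
```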

Taking Apart the Regular Expression

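Expression piece   Matches in “  TCP 192.168.106.134:80”
^\s+               leading whitespace at the start of the line
TCP                TCP
\s+                whitespace
\d+                192
\.                 .
\d+                168
\.                 .
\d+                106
\.                 .
\d+                134
:?                 :
(\d+)?             80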

Syntax Must Be Precise

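Regular Expression = ^\d\.\d\.\d\.\d:?\d?

Without a “+”, each “\d” matches exactly one digit, so any address with a multi-digit part fails.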

Matches:

1.2.3.4:5

1.2.3.5

Does Not Match:

192.168.1.20:25

192.168.1.20
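The difference a single “+” makes can be sketched as follows, with Python's `re` module standing in for the Boost parser. Both patterns escape the periods so that they match only literal dots; the names `strict` and `flexible` are made up for illustration.

```python
import re

# One digit per part: only single-digit addresses match.
strict = re.compile(r"^\d\.\d\.\d\.\d:?\d?")
# One or more digits per part, with an optional port.
flexible = re.compile(r"^\d+\.\d+\.\d+\.\d+:?\d*")

addresses = ["1.2.3.4:5", "1.2.3.5", "192.168.1.20:25", "192.168.1.20"]

results = {
    addr: (bool(strict.search(addr)), bool(flexible.search(addr)))
    for addr in addresses
}

for addr, (is_strict, is_flexible) in results.items():
    print(f"{addr:18} strict={is_strict} flexible={is_flexible}")
```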

Examples of Regular Expressions and Matches

Example        Matches                          Does Not Match
st.n           Austin and Houston               Webster
st[io]n        Austin and Houston               Stanton
st[^io]n       Stanton                          Houston or Austin
^houston       Houston                          Sam Houston
ston$          Houston and Galveston            Stonewall
dall|hart      Dallas, Dalhart and Lockhart     Dale
dal(l|h)art    Dalhart                          Dallas or Lockhart
il?e$          Etoile                           Beeville
il*e$          Etoile and Beeville              Bellaire
il+e$          Etoile and Beeville              Wylie
ad{2}          Addison and Caddo                Adkins

Note: These examples assume case-insensitive matching; otherwise “^houston” would not match “Houston”.
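Every row of the table can be checked mechanically. The sketch below uses Python's `re` module with case-insensitive searching, which is an assumption about how the product evaluates these criteria; the extraction space in the seventh pattern is removed, and the function names are hypothetical.

```python
import re

# (pattern, names that should match, names that should not), one tuple per table row.
CASES = [
    (r"st.n", ["Austin", "Houston"], ["Webster"]),
    (r"st[io]n", ["Austin", "Houston"], ["Stanton"]),
    (r"st[^io]n", ["Stanton"], ["Houston", "Austin"]),
    (r"^houston", ["Houston"], ["Sam Houston"]),
    (r"ston$", ["Houston", "Galveston"], ["Stonewall"]),
    (r"dall|hart", ["Dallas", "Dalhart", "Lockhart"], ["Dale"]),
    (r"dal(l|h)art", ["Dalhart"], ["Dallas", "Lockhart"]),
    (r"il?e$", ["Etoile"], ["Beeville"]),
    (r"il*e$", ["Etoile", "Beeville"], ["Bellaire"]),
    (r"il+e$", ["Etoile", "Beeville"], ["Wylie"]),
    (r"ad{2}", ["Addison", "Caddo"], ["Adkins"]),
]

def found(pattern, text):
    """Case-insensitive search anywhere in the text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

def failures():
    """Return every (pattern, text) pair that disagrees with the table."""
    bad = []
    for pattern, yes, no in CASES:
        bad += [(pattern, t) for t in yes if not found(pattern, t)]
        bad += [(pattern, t) for t in no if found(pattern, t)]
    return bad

print(failures() or "all rows behave as the table says")
```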

Regular Expressions with XML

Regular Expression Tools & Links

Expresso

http://www.ultrapico.com/Expresso.htm

− Helps with the actual writing of regular expressions

Regular expression syntax help

http://www.boost.org/libs/regex/doc/syntax_perl.html

Timestamp format

http://icu.sourceforge.net/userguide/formatDateTime.html

Sample XML File

Three Major Sections

Date

Filters

Events

Date Section

<DateTimeMap>
  <TimeStamp>
    <TimeStampSample>2005-9-11T14:18:11 GMT</TimeStampSample>
    <TimeStampFormat>yyyy-MM-dd'T'HH:mm:ss z</TimeStampFormat>
    <TimeStampRE>\d+-\d+-\d+T\d+:\d+:\d+\w+[^|]*</TimeStampRE>
  </TimeStamp>
</DateTimeMap>

<DateTimeFormat>yyyy-MM-dd'T'HH:mm:ss z</DateTimeFormat>

When using a DateTimeMap, your regex code should include the following comment tags:

<!--TimeStampStartTag-->
<!--TimeStampEndTag-->
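The two halves of the Date section, a regular expression that locates the timestamp and a format that converts it, can be sketched in Python. This is an illustration only: the product uses ICU-style format patterns, and the `strptime` format below is a hand-translated Python equivalent of `yyyy-MM-dd'T'HH:mm:ss z`, not something the product accepts. The pipe-delimited log line is hypothetical.

```python
import re
from datetime import datetime

# TimeStampRE from the sample XML.
TIMESTAMP_RE = re.compile(r"\d+-\d+-\d+T\d+:\d+:\d+\w+[^|]*")

# Hypothetical log line; "[^|]*" stops the match at the first pipe.
line = "2005-9-11T14:18:11 GMT|myhost|service started"

# Step 1: locate the timestamp text.
stamp = TIMESTAMP_RE.match(line).group(0)

# Step 2: convert it (Python analogue of the ICU format in the XML).
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S %Z")

print(stamp)
print(parsed.isoformat())
```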

Filter Section

Used to pre-filter high volume Events or unwanted Events

Used to improve Provider performance

Should be as efficient and specific as possible

Sample filter section:

<Filters>
  <RegEx>.*last message repeated\s+\w+\s+times.*</RegEx>
</Filters>

This particular Filter drops UNIX Syslog messages that only report that the previous message was repeated X times.
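Pre-filtering amounts to discarding any line that matches a filter expression before the Event nodes are tried. A minimal sketch in Python, assuming that behavior; the sample Syslog lines and the name `keep` are made up.

```python
import re

# Filter expression from the sample <Filters> section.
FILTER_RE = re.compile(r".*last message repeated\s+\w+\s+times.*")

def keep(line: str) -> bool:
    """Return True if the line survives the filter."""
    return FILTER_RE.match(line) is None

lines = [
    "Sep 11 14:18:11 host1 sshd[411]: Accepted password for admin",
    "Sep 11 14:18:12 host1 last message repeated 3 times",
    "Sep 11 14:18:13 host1 kernel: eth0 link up",
]

kept = [line for line in lines if keep(line)]
for line in kept:
    print(line)
```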

Event Section

Contains one or more Event matching nodes

An Event node is used to match a particular message and format it in a specific way

Each Event node contains 3 sections:

− Regular Expression section – the RegEx itself
− Instruction section – parameter mapping
− Message section – SM description definition

Event Node Mapping – RegEx Section

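<RegEx>^\s+TCP\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\w+)</RegEx>

Each parenthesized sub-expression is captured as a numbered field, from left to right:

− (F 0) the first (\d+\.\d+\.\d+\.\d+)
− (F 1) the first (\d+)
− (F 2) the second (\d+\.\d+\.\d+\.\d+)
− (F 3) the second (\d+)
− (F 4) the final (\w+)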
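The numbering of sub-expressions can be sketched with Python capture groups. Python numbers its groups from 1, so the sketch re-indexes them from 0 to mirror the (F 0) to (F 4) labels; the sample line is hypothetical, and the periods are escaped here so that they match only literal dots.

```python
import re

EVENT_RE = re.compile(
    r"^\s+TCP\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\w+)"
)

# Hypothetical input line with leading whitespace.
line = "  TCP    192.168.106.134:80    10.0.0.5:3389    ESTABLISHED"

match = EVENT_RE.match(line)
# Zero-based field numbers, as in the (F n) labels.
fields = dict(enumerate(match.groups()))

for number, value in fields.items():
    print(f"F {number}: {value}")
```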

Event Node Mapping – Instruction Section

<Instructions>
  <Field name="$EventSource" source="MYEVTSRC" />
  <Field name="1" source="%0%" />
  <Field name="2" source="%1%" />
  <Field name="3" source="%2%" />
  <Field name="4" source="%4%" />
  <Field name="5" source="" />
  <Field name="6" source="%3%" />
</Instructions>

Event Node Mapping – Message Section

<Message><![CDATA[

Protocol: TCP

Local Address: %0%

Local Port: %1%

Foreign Address: %2%

Foreign Port: %3%

Status: %4%

]]></Message>

Note: <![CDATA[ ]]> tags tell the code that interprets the XML to ignore the enclosed contents from an XML syntax standpoint.
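The Message section is a template in which each %n% placeholder is replaced by sub-expression n. A minimal sketch of that substitution in Python; the `render` function is hypothetical, and turning a missing or empty field into an empty string is an assumption, not documented product behavior.

```python
import re

TEMPLATE = """Protocol: TCP
Local Address: %0%
Local Port: %1%
Foreign Address: %2%
Foreign Port: %3%
Status: %4%"""

def render(template, fields):
    """Replace each %n% placeholder with fields[n]; missing values become ""."""
    def lookup(match):
        index = int(match.group(1))
        value = fields[index] if index < len(fields) else None
        return value or ""
    return re.sub(r"%(\d+)%", lookup, template)

# Hypothetical captured values, in sub-expression order.
fields = ["192.168.106.134", "80", "10.0.0.5", "3389", "ESTABLISHED"]
message = render(TEMPLATE, fields)
print(message)
```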

Message Example

This is an acceptable way to break down the event into details, but is not necessary. A better way will be explained shortly.

<Message><![CDATA[

Protocol: TCP

Local Address: %0%

Local Port: %1%

Foreign Address: %2%

Foreign Port: %3%

Status: %4%

]]></Message>

Where are the Parameters?

Parameters are not stored separately unless SM is specifically instructed to do so.

Preventing Data Loss

Adding “catch-all” parsers allows you to collect anything that slipped through the cracks.

Examples:

<Event id="">
  <RegEx>.*snort.*:.*</RegEx>
  <Instructions>
    <Field name="$EventSource" source="Snort IDS" />
    <Field name="$EventSeverity" source="1" />
  </Instructions>
  <Message></Message>
</Event>
<Event id="">
  <RegEx>.*</RegEx>
  <Instructions>
    <Field name="$EventSource" source="Syslog" />
    <Field name="$EventSeverity" source="1" />
  </Instructions>
  <Message></Message>
</Event>
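The catch-all idea can be sketched as an ordered list of parsers in which the first match wins and the broadest expression comes last. That evaluation order is an assumption made for illustration; the slides do not state how the product orders Event nodes. The function name and sample lines are hypothetical.

```python
import re

# Most specific first; ".*" matches anything, so it must come last.
PARSERS = [
    ("Snort IDS", re.compile(r".*snort.*:.*")),
    ("Syslog", re.compile(r".*")),
]

def classify(line: str):
    """Return the event source of the first parser whose RegEx matches."""
    for source, regex in PARSERS:
        if regex.match(line):
            return source
    return None

snort_source = classify("Sep 11 14:18:11 ids1 snort[123]: [1:2003:8] port scan")
other_source = classify("Sep 11 14:18:12 host1 cron[77]: job finished")
print(snort_source, other_source)
```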

Putting it All Together

Change the Provider to XML

Click on the Configure XML button

Cut and Paste XML code from Editor

Custom Alert Descriptions – The right way to create alert messages!

The default is to use $Description$ for the Alert Description

This causes the alert to look like this:

By creating a descriptive alert description, you can make the alert look like this:

This is accomplished by modifying the event processing rule that generates the alert so that it has a more detailed alert description

Limitations of Regular Expression Parsing

It is a lexical parser and it works only for sequence-based regular expression parsing

Does not support XML format messages, e.g., IDMEF messages

Sub-expressions are limited to 25 per expression, numbered 0–24
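An expression can be checked against the sub-expression limit before it is pasted into the XML. The sketch below counts capturing groups with Python's `re` module, which is an approximation of how the Boost parser counts them; the constant and function names are made up.

```python
import re

MAX_SUBEXPRESSIONS = 25  # fields 0 through 24, per the limit stated above

def subexpression_count(pattern: str) -> int:
    """Number of capturing groups in the pattern."""
    return re.compile(pattern).groups

def within_limit(pattern: str) -> bool:
    return subexpression_count(pattern) <= MAX_SUBEXPRESSIONS

event_pattern = (
    r"^\s+TCP\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\w+)"
)
too_many = r"(\w+)," * 26  # 26 groups, one over the limit

print(subexpression_count(event_pattern), within_limit(event_pattern))
print(subexpression_count(too_many), within_limit(too_many))
```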

XML Tools & Links

SCiTe

− http://scintilla.sourceforge.net/ScintillaDownload.html
− Small fast text editor with color coding for XML

Notepad++

− http://notepad-plus.sourceforge.net/uk/site.htm
− Slightly larger text editor, but more robust than SCiTe

Module Review

In this session you learned how to:

Understand and write Regular Expressions

Create XML code that will use Regular Expressions to parse data from providers into parameters