Regular Expressions and XML Parsing


Objectives

After this session you should be able to:

Understand and write Regular Expressions

Create XML code that will use Regular Expressions to parse data from providers into parameters

Regular Expressions

Regular Expression Parsing In Syslog Provider

Regular Expressions

Used to parse and analyze fields

Designed for matching text items

Requires extremely precise syntax

Regular Expression Overview

Can be used in:

− Rule criteria
− View criteria
− Computer Group formulas
− XML / parameter parsing

Popular Boost.org regular expression parser

Perl-like regular expression syntax

Features include:

− Advanced text pattern matching
− Timestamp conversion
− UI support
− Syslog IP filtering

Note: The regex syntax used in Rules, Views and Computer Groups is not the same as the syntax used in parsers

Regular Expression Example

Regular Expression = ^World\s+.*

“^” means “Start of Line”
“^World” means the text line must begin with “World”
“\s+” means one or more whitespace characters
“.*” is a wildcard that will match anything else

Matches:
“World Wide Web Publishing Service”
“World with lots of space”
“World Class”
“World War”

Does not match:
“WorldWide”
“Wide World of Sports”
“Wayne’s World”
“War of the Worlds”
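The example above can be checked with a short script. This is a minimal sketch that uses Python's `re` module as a stand-in; the product itself uses the Boost regular expression parser, so treat the Python calls as an illustration of the pattern, not of the product's API.

```python
import re

# Pattern from the slide: line starts with "World", then whitespace, then anything.
PATTERN = re.compile(r"^World\s+.*")

def matches(line: str) -> bool:
    """Return True when the line begins with 'World' followed by whitespace."""
    return PATTERN.match(line) is not None

should_match = [
    "World Wide Web Publishing Service",
    "World with lots of space",
    "World Class",
    "World War",
]
should_not_match = [
    "WorldWide",
    "Wide World of Sports",
    "Wayne's World",
    "War of the Worlds",
]

for line in should_match + should_not_match:
    print(f"{line!r}: {matches(line)}")
```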

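Regular Expression Example #2

Regular Expression = ^\s+TCP|ICMP\s+\d+.\d+.\d+.\d+:?\d+?

“^” means “Start of Line”

“\s+” means “one or more whitespace characters”

“TCP|ICMP” means the literal word “TCP” or “ICMP” must be present. Because “|” has the lowest precedence, it splits the whole expression in two unless the alternatives are grouped: (TCP|ICMP)

\d+.\d+.\d+.\d+:?\d+? is intended to match an IP address (four sets of one or more digits separated by periods), optionally followed by a colon and a port number.

Since each digit component has a “+”, it requires at least one digit. Note that “\d+?” is a non-greedy “one or more digits”, not an optional field, and an unescaped “.” matches any character. A stricter form of the same expression is ^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+(:\d+)?

Intended matches (each line must begin with whitespace because of the leading “^\s+”):

  TCP 192.168.1.1:80

  ICMP 192.168.1.1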
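The sketch below compares the expression as written with a grouped and escaped variant. It uses Python's `re` module for illustration only (the product uses Boost's Perl-style syntax, which treats these constructs the same way); the `STRICT` pattern and the sample lines are assumptions added for the comparison.

```python
import re

# Expression exactly as shown on the slide.
ORIGINAL = re.compile(r"^\s+TCP|ICMP\s+\d+.\d+.\d+.\d+:?\d+?")

# Hypothetical stricter variant: grouped alternation, escaped dots,
# and a truly optional ":port" suffix.
STRICT = re.compile(r"^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+(:\d+)?")

samples = [
    "  TCP    192.168.1.1:80",
    "  ICMP   192.168.1.1",
    "  TCP nothing useful here",
]

for line in samples:
    # search() reports whether the pattern matches anywhere in the line.
    print(repr(line), bool(ORIGINAL.search(line)), bool(STRICT.search(line)))
```

With the original expression, the third line is accepted because `^\s+TCP` is a complete alternative on its own, and the ICMP line without a port is rejected because `\d+?` still demands one more digit. The stricter variant accepts the first two lines and rejects the third.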

Regular Expression Example #3

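Regular Expression = [^\,]*

Matches every character up to, but not including, the next comma. Any other delimiter character can be used in place of the comma.

Useful for matching data within a given sub-expression that can vary greatly

Matches: each comma-delimited field in the line below (the matched text was shown in red on the original slide). Applied at the start of the line, it matches 250606001E05:

250606001E05,25,2,8,HOUAV03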
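As a rough illustration, the negated character class can be used on its own or repeated inside capture groups to pull out each field. The sketch uses Python's `re` module, and the five-group pattern is an assumption built from the sample line, not something taken from a shipped parse map.

```python
import re

line = "250606001E05,25,2,8,HOUAV03"

# On its own, the class matches everything before the first comma.
first_field = re.match(r"[^\,]*", line).group(0)

# Repeated in capture groups, it extracts every comma-delimited field.
five_fields = re.match(
    r"([^\,]*),([^\,]*),([^\,]*),([^\,]*),([^\,]*)", line
)
fields = five_fields.groups()

print(first_field)
print(fields)
```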

Regular Expression Operators and Their Definitions

Menu Item                Character   Definition
Any Character            .           Matches any single character
Character in Range       [ ]         Matches any single character from within the bracketed list
Character Not in Range   [^]         Specifies a set of characters not to be matched
Beginning of Line        ^           Matches the beginning of a line
End of Line              $           Matches the end of a line

Special Characters

Special Characters include \ ^ $ * . [ ] | + ( ) ? { }

Any time you want to use a special character as a literal, it must be escaped with a backslash
− Example: The path c:\myfile.txt would need to be entered as c:\\myfile\.txt
− Example: The User-ID $ExchangeService would need to be entered as \$ExchangeService
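A small helper can apply the escaping rule mechanically. This is a sketch only: `escape_literal` is a hypothetical helper written for this example, and the set of special characters is the list from the slide plus `?`, `{` and `}`.

```python
import re

# Characters that must be escaped to be treated as literals.
SPECIALS = set("\\^$*.[]|+()?{}")

def escape_literal(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIALS else ch for ch in text)

path = r"c:\myfile.txt"
user = "$ExchangeService"

escaped_path = escape_literal(path)   # c:\\myfile\.txt
escaped_user = escape_literal(user)   # \$ExchangeService

print(escaped_path)
print(escaped_user)

# The escaped pattern matches the literal text and nothing looser.
print(bool(re.fullmatch(escaped_path, path)))
print(bool(re.fullmatch(escaped_path, r"c:\myfileXtxt")))
```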

Taking Apart the Regular Expression

Regex piece    Sample text
^\s+           (leading whitespace)
TCP            TCP
\s+            (whitespace)
\d+            192
.              .
\d+            168
.              .
\d+            106
.              .
\d+            134
:?             :
\d+?           80

Syntax Must Be Precise

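Regular Expression = ^\d.\d.\d.\d:?\d?

Matches:
1.2.3.4:5
1.2.3.5
1A225g5 *see note

Does Not Match (as a whole line, because each “\d” matches exactly one digit):
192.168.1.20:25
192.168.1.20

Because the expression has no “$”, a search that accepts partial matches would still match the leading “192.168” of these lines.

*note: since the ‘.’ matches any character, escaping it with a ‘\’ ensures that only IP address formats are matched, i.e. ^\d\.\d\.\d\.\d:?\d?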
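The difference between the unescaped and escaped forms can be seen by testing both against the same inputs. This sketch uses Python's `re` module as an approximation of the Boost behaviour; `fullmatch` is used so that the whole line has to match.

```python
import re

LOOSE = re.compile(r"^\d.\d.\d.\d:?\d?")        # "." matches any character
ESCAPED = re.compile(r"^\d\.\d\.\d\.\d:?\d?")   # "\." matches only a period

samples = [
    "1.2.3.4:5",
    "1.2.3.5",
    "1A225g5",
    "192.168.1.20:25",
    "192.168.1.20",
]

for text in samples:
    loose_ok = LOOSE.fullmatch(text) is not None
    escaped_ok = ESCAPED.fullmatch(text) is not None
    print(f"{text:18} loose={loose_ok} escaped={escaped_ok}")

# A prefix-only match still succeeds on a real address with the loose form.
print(LOOSE.match("192.168.1.20").group(0))
```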

Examples of Regular Expressions and Matches

Example        Matches                            Does Not Match
st.n           Austin and Houston                 Webster
st[io]n        Austin and Houston                 Stanton
st[^io]n       Stanton                            Houston or Austin
^houston       Houston                            Sam Houston
ston$          Houston and Galveston              Stonewall
dall|hart      Dallas and Dalhart and Lockhart    Dale
dal(l|h)art    Dalhart                            Dallas or Lockhart
il?e$          Etoile                             Beeville
il*e$          Etoile and Beeville                Bellaire
il+e$          Etoile and Beeville                Wylie
ad{2}          Addison and Caddo                  Adkins
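The table can be verified mechanically. The sketch below assumes case-insensitive, match-anywhere evaluation (which is what the table's mix of upper and lower case implies) and uses Python's `re` module in place of the Boost parser.

```python
import re

# (pattern, names that should match, names that should not match)
EXAMPLES = [
    ("st.n", ["Austin", "Houston"], ["Webster"]),
    ("st[io]n", ["Austin", "Houston"], ["Stanton"]),
    ("st[^io]n", ["Stanton"], ["Houston", "Austin"]),
    ("^houston", ["Houston"], ["Sam Houston"]),
    ("ston$", ["Houston", "Galveston"], ["Stonewall"]),
    ("dall|hart", ["Dallas", "Dalhart", "Lockhart"], ["Dale"]),
    ("dal(l|h)art", ["Dalhart"], ["Dallas", "Lockhart"]),
    ("il?e$", ["Etoile"], ["Beeville"]),
    ("il*e$", ["Etoile", "Beeville"], ["Bellaire"]),
    ("il+e$", ["Etoile", "Beeville"], ["Wylie"]),
    ("ad{2}", ["Addison", "Caddo"], ["Adkins"]),
]

def found(pattern: str, text: str) -> bool:
    """Case-insensitive search anywhere in the text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

def table_is_consistent() -> bool:
    """Check every row of the table against the regex engine."""
    for pattern, hits, misses in EXAMPLES:
        if not all(found(pattern, name) for name in hits):
            return False
        if any(found(pattern, name) for name in misses):
            return False
    return True

print(table_is_consistent())
```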

Regular Expressions with XML

Regular Expression Tools & Links

Expresso
− http://www.ultrapico.com/Expresso.htm
− Helps with the actual writing of regular expressions

Regular expression syntax help
− http://www.boost.org/libs/regex/doc/syntax_perl.html

Sample XML File

Three Major Sections

<event-timestamp-format/> − (Date)

<filter-rule/> − (Filtering)

<parse-rule/> − (Event Parsing)

Date Section

Optional element that must be specified before the filters and parse rules

Determines the timestamp format expected from a syslog message
− If not defined, the event timestamp will be based on the time the provider received the syslog message

Applied to the detecttime field of log archive formatters and EventTime of processing event formatters

The first format that parses successfully is used for the event time string
− If they all fail, the time the syslog message was received will be used

Timestamp formatting is expensive
− In highly optimized parse maps, consider not defining timestamps
− Expect up to a 10% drop in overall EPS

Example TimeStamp Format

<event-timestamp-format>
  <timestamp-format>yyyy-MM-dd HH:mm:ss</timestamp-format>
  <timestamp-format>MMM dd yyyy HH:mm:ss Z</timestamp-format>
  <timestamp-format>MMM dd yyyy HH:mm:ss.SSS Z</timestamp-format>
</event-timestamp-format>
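The "first successful format wins, otherwise fall back to the receive time" behaviour can be sketched as follows. This is an illustration in Python, not the provider's implementation: the `strptime` format strings are hand-translated equivalents of the three patterns above, and `parse_event_time` is a hypothetical helper. Month abbreviations such as "Mar" assume an English locale.

```python
from datetime import datetime, timedelta

# Hand-translated equivalents of the three timestamp formats, in the same order.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",        # yyyy-MM-dd HH:mm:ss
    "%b %d %Y %H:%M:%S %z",     # MMM dd yyyy HH:mm:ss Z
    "%b %d %Y %H:%M:%S.%f %z",  # MMM dd yyyy HH:mm:ss.SSS Z
]

def parse_event_time(text: str, received: datetime) -> datetime:
    """Try each format in order; fall back to the receive time if all fail."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return received

received_at = datetime(2008, 3, 14, 9, 16, 0)
print(parse_event_time("2008-03-14 09:15:00", received_at))
print(parse_event_time("Mar 14 2008 09:15:00 -0800", received_at))
print(parse_event_time("not a timestamp", received_at))
```

Every failed attempt costs a parse, which is one way to picture why the slide calls timestamp formatting expensive.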

Date/Time Reserved Characters

Symbol   Meaning                      Example
G        era designator               AD
y        year                         1996
M        month in year                July or 07
d        day in month                 10
h        hour in am/pm (1~12)         12
H        hour in day (0~23)           0
m        minute in hour               30
s        second in minute             55
S        fractional second            978
E        day of week                  Tuesday
e        day of week (local 1~7)      2
D        day in year                  189
F        day of week in month         2 (2nd Wed in July)
w        week in year                 27
W        week in month                2
a        am/pm marker                 PM
k        hour in day (1~24)           24
K        hour in am/pm (0~11)         0
z        time zone                    Pacific Standard Time
Z        time zone (RFC 822)          -0800
v        time zone (generic)          Pacific Time

Filter Section

Used to pre-filter unwanted Events

Should be as efficient and specific as possible

Sample filter section:

<filter-rule name="Filter rule example 1" enabled="true">
  <match-regex>
    <property name="regex"><![CDATA[.*last message repeated\s+\w+\s+times.*]]></property>
    <syslog-message/>
  </match-regex>
  <filter-action/>
</filter-rule>

This particular Filter is used to filter out UNIX Syslog Messages that list the previous message being repeated X times.
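To see what this filter expression removes, the sketch below applies the same regex to two sample syslog lines. It uses Python's `re` module for illustration; the sample messages are invented for the example.

```python
import re

# Regex taken from the filter rule above.
FILTER = re.compile(r".*last message repeated\s+\w+\s+times.*")

messages = [
    "Mar  1 10:00:01 host last message repeated 3 times",
    "Mar  1 10:00:02 host sshd[42]: Accepted password for root",
]

# Messages that match the filter are dropped before parsing.
kept = [msg for msg in messages if FILTER.match(msg) is None]

print(kept)
```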

Important Items for Filter Rules

A filter rule “name” has to be unique to distinguish it from other parsing/filter rules
− If the name is not unique, the last rule with that name will be used

If the “enabled” attribute is false, the filter rule will not be considered for evaluation

The “name” of the match-regex property element MUST be “regex”

Performance counters are available to track filtered messages

If multiple provider instances have filter expressions, the combined set is applied to the parse rules

Event Section

Contains one or more parse rules

A single parse rule is used to match a message and execute an action

Actions are formatters that determine how data will be processed

<parse-rule name="Parse Rule Example 1" enabled="true">

<match-regex>

<property name="regex"><![CDATA[Regex-Expression]]></property>

<syslog-message/>

</match-regex>

<format-log-archive-event/>

<format-processing-event/>

</parse-rule>

Important Items for Parse Rules

A parse rule “name” has to be unique to distinguish it from other parsing/filter rules
− If the name is not unique, the last rule with that name will be used

If the “enabled” attribute is false, the parse rule will not be considered for evaluation

The “name” of the match-regex property element MUST be “regex”

A single parse rule can create a Log Archive Event and a Processing Event
− Event processing is a much more expensive operation than archiving and should only be used when the specific event needs to be alerted on

Parse rule regex evaluations are mutually exclusive
− Once a regex is matched, no other parse rules will be evaluated

The order of parse rules is important: they are evaluated in a top-down manner
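The top-down, first-match-wins evaluation can be pictured with a short sketch. The rule names, patterns and the `first_matching_rule` helper are all hypothetical; the point is only that disabled rules are skipped and evaluation stops at the first match, so specific rules belong above general ones.

```python
import re
from typing import Optional

# (name, enabled, compiled regex), listed in parse-map order.
PARSE_RULES = [
    ("Specific login failure", True, re.compile(r"Login failed for user (\w+)")),
    ("Disabled rule", False, re.compile(r".*")),
    ("Generic login", True, re.compile(r"Login")),
]

def first_matching_rule(message: str) -> Optional[str]:
    """Return the name of the first enabled rule whose regex matches."""
    for name, enabled, pattern in PARSE_RULES:
        if not enabled:
            continue  # disabled rules are never evaluated
        if pattern.search(message):
            return name  # stop here: later rules are not tried
    return None

print(first_matching_rule("Login failed for user bob"))
print(first_matching_rule("Login succeeded"))
print(first_matching_rule("disk full"))
```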

Parse Rule - <match-regex>

<match-regex>

<property name="regex"><![CDATA[

^\s+TCP\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\w+)

]]></property>

<syslog-message/>

</match-regex>

(F 0) (F 1) (F 2) (F 3) (F 4)
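Each parenthesised sub-expression in the regex captures one field. The sketch below runs the same expression in Python's `re` module against an invented sample line (the addresses, ports and state are assumptions) to show which text lands in which capture group.

```python
import re

# Regex from the match-regex element above, unchanged.
PATTERN = re.compile(
    r"^\s+TCP\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\w+)"
)

# Hypothetical sample line in the same shape as the slide's example.
line = "  TCP    192.168.106.134:80    10.0.0.5:3389    ESTABLISHED"

match = PATTERN.match(line)
captures = match.groups()

# Group 0 is the full matched text; groups 1..5 are the five fields.
print(match.group(0))
for index, value in enumerate(captures, start=1):
    print(index, value)
```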


Parse Rule - <format-log-archive-event>

<format-log-archive-event>

<property-list/>

<mapped-property-list/>

<lookup-property-list/>

</format-log-archive-event>

Parse Rule - <format-processing-event>

<format-processing-event>

<property-list/>

<mapped-property-list/>

<lookup-property-list/>

</format-processing-event>

Formatter Considerations

One or both types of formatters can be defined in a single parse rule

Field/Property names used in the log archive formatters MUST be defined in the field map

Field/Property names used in the processing event formatters MUST be Intrinsic Properties of SM Events

Log Archive Formatters prevent events from entering the workflow
− No archive collection or filter rules will be evaluated against this data

If no Log Archive Rules or Real Time rules exist for a provider, the related formatters are disabled
− High Performance Log Archive Mode disables all event processing formatters

Intrinsic SM Event Properties

EventNumber: The Event ID / Event Number of the generated SM event. Default: event number 25256.

EventType: The event type of the generated SM event. Default: an information event type.

EventTime: The time stamp of the generated SM event. Default: the time stamp when the provider received the syslog message.

Category: The event category of the generated SM event.

ProviderName: The provider name associated with the generated SM event. This value is automatically set by the provider but can be overwritten if specified in the parse map. Default: the provider instance name.

Message: The message/description field of the generated SM event. Default: the syslog message.

UserName: The user name associated with the generated SM event.

UserDomainName: The domain of the user name associated with the generated SM event.

Computer: The name of the computer that is the source of the syslog messages. Default: a reverse lookup of the IP address of the computer that sent the syslog message.

Domain: The domain of the computer that is the source of the syslog messages.

SourceName: The event source field of the generated SM event.

Parameter #: Where # is between 1 and 99, representing SM event parameters 1 to 99.

Log Archive Formatter – Property List

Fields are defined in the field map (metadata)

<property-list>
  <property name="classification.origin" value="Cisco IOS"/>
  <property name="detecttime" value="%1%"/>
  <property name="analyzer.model" value="Cisco IOS"/>
  <property name="userfield_string_001" value="%2%"/>
  <property name="assessment.impact.severity" value="%3%"/>
  <property name="source.service.port" value="%8%"/>
  <property name="source.node.name" value="%7%"/>
  <property name="target.service.port" value="%10%"/>
  <property name="target.node.name" value="%9%"/>
  <property name="target.service.protocol" value="%6%"/>
  <property name="action" value="%5%"/>
</property-list>

Processing Event Formatter – Property List

Fields are intrinsic properties of SM Events

<property-list>
  <property name="EventTime" value="%1%"/>
  <property name="SourceName" value="Cisco IOS"/>
  <property name="EventNumber" value="2001"/>
  <property name="EventType" value="4"/>
  <property name="Parameter 1" value="%2%"/>
  <property name="Parameter 2" value="%3%"/>
  <property name="Parameter 3" value="%4%"/>
  <property name="Parameter 4" value="%7%"/>
  <property name="Parameter 5" value="%8%"/>
  <property name="Parameter 6" value="%7%"/>
  <property name="Parameter 7" value="%9%"/>
</property-list>

Property Values Definitions:

%#% represents the regex capture string or value whose index is #, starting with 1 for the first captured value
− (e.g. name="UserDomainName" value="%3%" means that the third captured string of the regex evaluation will be used as the user domain name)

Literals are specified without surrounding % signs
− (e.g. name="Parameter 7" value="literal value" results in the full string specified being used as the value of the Parameter 7 field)

The special capture index 0 (zero) indicates the original data message
− (e.g. name="Message" value="%0%" places the full original data message into the message field of the event)
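The substitution rules above can be sketched in a few lines. `resolve_value` is a hypothetical helper, not part of the product; it replaces each `%n%` token with capture group n of a Python `re` match and leaves literal text untouched. Here `%0%` is taken to be the full matched text, which equals the original message when the regex spans the whole line.

```python
import re

TOKEN = re.compile(r"%(\d+)%")

def resolve_value(value: str, match: re.Match) -> str:
    """Replace every %n% token with capture group n; keep literals as-is."""
    def substitute(token: re.Match) -> str:
        captured = match.group(int(token.group(1)))
        return captured if captured is not None else ""
    return TOKEN.sub(substitute, value)

# Invented message with three captures: user, action, domain.
message = "alice login CORP"
match = re.match(r"(\w+) (\w+) (\w+)", message)

print(resolve_value("%3%", match))            # third capture
print(resolve_value("literal value", match))  # literal, unchanged
print(resolve_value("%0%", match))            # full matched text
```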

Formatting: Concatenation

Used when a field value needs to be built from concatenated values and strings

Can only be used on fields that support strings
− Use in a non-string field will result in an error while loading the parse map
− Concatenation is not supported for Mapped or Lookup properties

Applies to both Log Archive and Event Processing Formatters

The concatenation format consists of:
− Operator: the “+” (plus sign) is the concatenation operator between the string literals and parsed tokens
− Operands: each operand can be either
  − A string literal wrapped in single quotes, such as 'abc' and 'xzy'
  − A parsed token index, such as %4%
− Spaces around the plus operators are ignored
− Spaces within a string literal are preserved

Formatting: Concatenation - Examples

<property-list>

<property name="classification.name" value="%2% + '-' + %3% + '-' + %4%" />

</property-list>

This results in classification.name being built from the captured values at indexes 2, 3 and 4

If the concatenation format is: “%2%+'-'+%3%+'-'+%4%”

and the parsed captured values are “10”, “High” and “Connection Accepted”,

the formatted value will be: “10-High-Connection Accepted”
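A minimal evaluator for this concatenation format might look like the sketch below. It is an assumption-laden illustration, not the product's parser: `format_concatenation` is a hypothetical helper that accepts single-quoted literals and `%n%` tokens joined by `+`, ignores spaces around operands, and preserves spaces inside literals.

```python
import re

# One operand: either a 'quoted literal' or a %n% token, with optional spaces.
OPERAND = re.compile(r"\s*(?:'([^']*)'|%(\d+)%)\s*")

def format_concatenation(fmt: str, captures: list) -> str:
    """captures[0] is the whole message; captures[n] is capture n."""
    pieces = []
    pos = 0
    while True:
        operand = OPERAND.match(fmt, pos)
        if operand is None:
            raise ValueError(f"expected an operand at offset {pos}")
        literal, index = operand.group(1), operand.group(2)
        pieces.append(literal if literal is not None else captures[int(index)])
        pos = operand.end()
        if pos == len(fmt):
            return "".join(pieces)
        if fmt[pos] != "+":
            raise ValueError(f"expected '+' at offset {pos}")
        pos += 1  # skip the '+' operator

# Index 0 and 1 are placeholders; 2..4 are the values from the example.
captures = ["<full message>", "unused", "10", "High", "Connection Accepted"]
print(format_concatenation("%2% + '-' + %3% + '-' + %4%", captures))
```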

Formatting: Mapped Properties

Both the log archive and real time formatters support mapped properties

The “mapped-property-list” is a container of one or more mapped properties

“mapped-property” defines the mapping scheme. It must contain the following required attributes:
− “name”: the name of the field to be inserted into the event after the mapping expression has been evaluated
− “value”: the value to be used during the lookup. It must be a regex index

If the value does not match any mapping key and no default is provided, the field will not be inserted

Formatting: Mapped Properties

The key/value pairs of the map are represented with “key-value” child elements
− The key-value element has two attributes: “key” is the mapping key and “value” is the value of the mapped key

The map lookup is case-insensitive

The value of a key-value element can be a literal string or the index of a regex evaluation result

Formatting: Mapped Properties – Example

The mapping properties are defined within a formatting node as follows:

<mapped-property-list>

<mapped-property name="classification.name" value="%7%" >

<key-value key="tcp" value="reliable"/>

<key-value key="udp" value="unreliable"/>

<key-value key="nttp" value="async"/>

<default-mapping value="unknown"/>

</mapped-property>

</mapped-property-list>

If the captured value from index 7 was “TCP” then the mapped property for classification.name would equal “reliable”
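The mapping behaviour amounts to a case-insensitive dictionary lookup with an optional default. The sketch below mirrors the example map; `map_value` is a hypothetical helper, and returning `None` stands in for "the field is not inserted".

```python
from typing import Optional

# Keys and values copied from the mapped-property example.
PROTOCOL_MAP = {"tcp": "reliable", "udp": "unreliable", "nttp": "async"}

def map_value(captured: str, key_values: dict,
              default: Optional[str] = None) -> Optional[str]:
    """Case-insensitive lookup; None means the field would not be inserted."""
    lookup = {key.lower(): value for key, value in key_values.items()}
    return lookup.get(captured.lower(), default)

print(map_value("TCP", PROTOCOL_MAP, default="unknown"))
print(map_value("icmp", PROTOCOL_MAP, default="unknown"))
print(map_value("icmp", PROTOCOL_MAP))
```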

Formatting: Lookup Properties

Both the log archive and real time formatters support lookup properties

The “lookup-property-list” is a container of one or more lookup properties

“lookup-property” defines the lookup scheme. It must contain the following required attributes:
− “name”: the name of the field to be inserted as a looked-up field
− “lookup-value”: the value to be used during the lookup. It must be a regex index
− “lookup-type”: the type of the lookup

Currently, lookup properties support only an IP lookup from a hostname

Lookups can be turned on/off in the provider parsing properties

Formatting: Lookup Properties – Example

The lookup properties are defined within a formatting node as follows:

<lookup-property-list>

<lookup-property name="source.node.address.address" lookup-value="%7%" lookup-type="IP_LOOKUP" />

<lookup-property name="target.node.address.address" lookup-value="%9%" lookup-type="IP_LOOKUP" />

</lookup-property-list>

If the captured value from index 7 was “houselab01”, that name would be looked up in DNS and the resultant IP Address would be recorded as the lookup-property for “source.node.address.address”
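Conceptually, an IP_LOOKUP resolves the captured hostname through DNS and stores the resulting address. The sketch below shows that idea with Python's standard `socket.gethostbyname`; `ip_lookup` is a hypothetical helper, the resolver is injectable so the example runs without network access, and the address 10.1.2.3 is invented.

```python
import socket
from typing import Callable, Optional

def ip_lookup(hostname: str,
              resolver: Callable[[str], str] = socket.gethostbyname
              ) -> Optional[str]:
    """Resolve a hostname to an IP address; None if resolution fails."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None

# Stand-in resolver so the example does not depend on real DNS.
def fake_resolver(hostname: str) -> str:
    known = {"houselab01": "10.1.2.3"}
    if hostname not in known:
        raise socket.gaierror("name not known")
    return known[hostname]

print(ip_lookup("houselab01", resolver=fake_resolver))
print(ip_lookup("missing-host", resolver=fake_resolver))
```

DNS lookups add latency per event, which is presumably why the feature can be switched off in the provider parsing properties.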

Preventing Data Loss – Catch All

Catch All can be enabled to allow any event that did not match any parse rule to be gathered as a log archive event
− The event must also not have matched a filter rule

Catch All is enabled by default and can be disabled in the provider instance UI

Catch All captures the syslog message and sends it directly to log archive; NO Real Time event is generated by catch-all logic

Catch All will only function if:
− A log archive rule exists for that provider instance or for any provider instance sharing the same port
− The user did not disable catch-all in the UI

Catch All – Captured Field

Detect time: The time the provider received the syslog data.

Time zone offset: The time zone offset of the agent hosting the provider.

Analyzer model: Will be “Generic Syslog”

Classification origin: Will be “Generic Syslog”

Severity: Will always be “low”

Message: Will contain the syslog message
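The captured fields listed above can be pictured as a simple record. This is only a sketch: `catch_all_record` and its dictionary keys are hypothetical names chosen for the example, not field names from the product's field map.

```python
from datetime import datetime

def catch_all_record(message: str, received: datetime,
                     tz_offset_minutes: int) -> dict:
    """Build the fields a catch-all log archive event would carry."""
    return {
        "detect_time": received,                # time the provider received the data
        "time_zone_offset": tz_offset_minutes,  # offset of the agent hosting the provider
        "analyzer_model": "Generic Syslog",
        "classification_origin": "Generic Syslog",
        "severity": "low",                      # always low for catch-all
        "message": message,                     # the raw syslog message
    }

record = catch_all_record(
    "Mar  1 10:00:02 host unknown: something unparsed",
    datetime(2008, 3, 1, 10, 0, 3),
    -480,
)
print(record["severity"], record["analyzer_model"])
```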

Putting it All Together

Select Enable Parsing

Click on the Configure XML button

Cut and Paste XML code from Editor

Custom Alert Descriptions

By writing a descriptive alert description, you can make the alert more informative (the original slide shows an example alert here).

This is accomplished by modifying the event processing rule that generates the alert to have a more detailed alert description

XML Tools & Links

SciTE
− http://scintilla.sourceforge.net/ScintillaDownload.html
− Small, fast text editor with color coding for XML

Notepad++
− http://notepad-plus.sourceforge.net/uk/site.htm
− Slightly larger text editor, but more robust than SciTE

Module Review

In this session you learned how to:

Understand and write Regular Expressions

Create XML code that will use Regular Expressions to parse data from providers into parameters