Regular Expressions in SAS® Enterprise Guide®

Mark Tabladillo Ph.D.

http://www.marktab.com

@marktabnet

Abstract

Version 9 of the SAS® System introduced Perl regular expressions (often called PRX, after the first three letters of the corresponding functions and CALL routines). Earlier versions of SAS® already provided regular expressions (called RX, after the first two letters of their functions and CALL routines). This presentation describes specific functional and performance differences between these two mutually exclusive regular expression strategies and offers recommendations on when to use each. The technologies are compared using SAS Enterprise Guide® 4.3.

Introduction

Regular expressions are the foundation of character pattern matching

Textual data is increasingly important in predictive analytics

SAS Enterprise Guide® offers regular expression processing

Outline

Guide for Migrating from SAS (RX) expressions to Perl (PRX) regular expressions

Best practices for Perl Regular Expressions

Advanced Perl Regular Expression Capabilities

Demo

How to Migrate from SAS (RX) to Perl (PRX) Regular Expressions

Upgrade to Perl (PRX) Regular Expressions

SAS (RX)              | Perl (PRX)                                  | Description
----------------------|---------------------------------------------|------------
RXPARSE Function      | PRXPARSE Function                           | Compiles a regular expression (RX or PRX) that can be used for pattern matching of a character value
RXMATCH Function      | PRXMATCH Function                           | Searches for a pattern match and returns the position at which the pattern is found
CALL RXSUBSTR Routine | CALL PRXSUBSTR Routine                      | Returns the position and length of a substring that matches a pattern (RX includes score)
CALL RXCHANGE Routine | CALL PRXCHANGE Routine, PRXCHANGE Function  | Performs a pattern-matching replacement
CALL RXFREE Routine   | CALL PRXFREE Routine                        | Frees unneeded memory allocated for a regular expression (either RX or PRX)
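As a rough illustration of the PRX column, the sketch below runs one string through PRXPARSE, PRXMATCH, CALL PRXSUBSTR, the PRXCHANGE function, and CALL PRXFREE. The sample text, the phone-number pattern, and the variable names (text, masked, re, pos, start, len) are assumptions made for this example and do not come from the slides.

```sas
data _null_;
   length text masked $ 40;
   text = 'call 555-1234 today';            /* hypothetical sample value */

   /* PRXPARSE: compile the pattern once and keep its ID */
   re = prxparse('/\d{3}-\d{4}/');

   /* PRXMATCH: position of the first match, 0 when there is none */
   pos = prxmatch(re, text);

   /* CALL PRXSUBSTR: start position and length of the first match */
   call prxsubstr(re, text, start, len);

   /* PRXCHANGE function: -1 replaces every match in the string */
   masked = prxchange('s/\d{3}-\d{4}/XXX-XXXX/', -1, text);

   /* CALL PRXFREE: release the memory held by the compiled pattern */
   call prxfree(re);

   put pos= start= len= masked=;
run;
```

The RX routines follow the same compile, match, change, free sequence, but they use the SAS pattern language rather than Perl syntax, so patterns generally need to be rewritten rather than copied across.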

How to Best Use Perl Regular Expressions

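Categories of Regular Expression Commands

Category                        | Single-Line Command | PROC SQL | DATA Step | Macro
--------------------------------|---------------------|----------|-----------|------
Accepts Perl Regular Expression | YES                 | YES      | YES       | YES
Accepts Regular Expression ID   | no                  | YES      | YES       | YES
Has CALL Routine Variant        | no                  | no       | YES       | YES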
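To make two of these contexts concrete, here is a small sketch that passes a Perl regular expression directly inside PROC SQL and then uses a regular expression ID from the macro language. The SASHELP.CLASS sample table ships with SAS; the patterns and the macro variable names (text, re, pos) are assumptions chosen for illustration.

```sas
/* PROC SQL: the Perl regular expression is passed as a string literal */
proc sql;
   select name
      from sashelp.class
      where prxmatch('/^J/', name) > 0;
quit;

/* Macro language: compile to an ID, use the ID, then free it */
%let text = the cat sat;
%let re   = %sysfunc(prxparse(/cat/));
%let pos  = %sysfunc(prxmatch(&re, &text));
%syscall prxfree(re);                /* %SYSCALL takes the variable name, no ampersand */
%put NOTE: match position is &pos;
```

Note that macro arguments to %SYSFUNC are written without quotation marks, which is why the pattern appears as /cat/ rather than '/cat/'.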

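Regular Expression Commands (1 of 3)

Command   | Description                                                                              | Accepts Perl Regular Expression | Accepts Regular Expression ID | Has a Call Routine Variant
----------|------------------------------------------------------------------------------------------|---------------------------------|-------------------------------|---------------------------
PRXCHANGE | Performs a pattern-matching replacement.                                                 | YES                             | YES                           | YES
PRXDEBUG  | Enables Perl regular expressions in a DATA step to send debugging output to the SAS log. | no                              | no                            | YES
PRXFREE   | Frees memory that was allocated for a Perl regular expression.                           | no                              | no                            | YES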
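The three commands above can be combined in one short DATA step. This is only a sketch: the sample text and the substitution pattern are assumptions, and the debug output written to the log will vary by SAS release. The CALL routines shown here take a regular expression ID from PRXPARSE as their first argument.

```sas
data _null_;
   length text $ 40;
   text = 'colour and flavour';      /* hypothetical sample value */

   call prxdebug(1);                 /* send regex debugging output to the SAS log */
   re = prxparse('s/our\b/or/');     /* substitution pattern, compiled to an ID */

   /* CALL PRXCHANGE with three arguments rewrites text in place */
   call prxchange(re, -1, text);

   call prxdebug(0);                 /* switch debugging output off again */
   call prxfree(re);                 /* release the compiled pattern */

   put text=;
run;
```

Using the CALL routine avoids a second variable when the original value is not needed; the PRXCHANGE function is the better fit when the source string must stay unchanged.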

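Regular Expression Commands (2 of 3)

Command  | Description                                                                                                                     | Accepts Perl Regular Expression | Accepts Regular Expression ID | Has a Call Routine Variant
---------|---------------------------------------------------------------------------------------------------------------------------------|---------------------------------|-------------------------------|---------------------------
PRXMATCH | Searches for a pattern match and returns the position at which the pattern is found.                                            | YES                             | YES                           | no
PRXNEXT  | Returns the position and length of a substring that matches a pattern, and iterates over multiple matches within one string.    | no                              | no                            | YES
PRXPAREN | Returns the last bracket match for which there is a match in a pattern.                                                         | no                              | YES                           | no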
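A brief sketch of how these commands might be used together: CALL PRXNEXT steps through every match in a string, and PRXPAREN reports which parenthesized alternative matched after a PRXMATCH call. The sample sentences, patterns, and variable names are assumptions for illustration.

```sas
data _null_;
   length text $ 60 found $ 10;
   text = 'The woods have a bat, a cat, and a rat.';

   /* CALL PRXNEXT: iterate over every match within one string */
   re    = prxparse('/[bcr]at/');
   start = 1;
   stop  = length(text);
   call prxnext(re, start, stop, text, pos, len);
   do while (pos > 0);
      found = substr(text, pos, len);
      put found= pos= len=;
      /* start is advanced by the routine, so the next call resumes after this match */
      call prxnext(re, start, stop, text, pos, len);
   end;
   call prxfree(re);

   /* PRXPAREN: number of the last capture buffer that took part in the match */
   re2 = prxparse('/(magazine)|(book)|(newspaper)/');
   if prxmatch(re2, 'find book here') then do;
      paren = prxparen(re2);
      put paren=;
   end;
   call prxfree(re2);
run;
```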

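Regular Expression Commands (3 of 3)

Command   | Description                                                                                         | Accepts Perl Regular Expression | Accepts Regular Expression ID | Has a Call Routine Variant
----------|-----------------------------------------------------------------------------------------------------|---------------------------------|-------------------------------|---------------------------
PRXPARSE  | Compiles a Perl regular expression (PRX) that can be used for pattern matching of a character value. | YES                             | no                            | no
PRXPOSN   | Returns a character string that contains the value for a capture buffer.                           | no                              | YES                           | YES
PRXSUBSTR | Returns the position and length of a substring that matches a pattern.                             | no                              | no                            | YES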

Advanced Regular Expression Commands

Commands

Perl (PRX)            | Description
----------------------|------------
CALL PRXPOSN Routine  | Returns the start position and length for a capture buffer
PRXPOSN Function      | Returns the value for a capture buffer
PRXPAREN Function     | Returns the last bracket match for which there is a match in a pattern
CALL PRXNEXT Routine  | Returns the position and length of a substring that matches a pattern and iterates over multiple matches within one string
CALL PRXDEBUG Routine | Enables Perl regular expressions in a DATA step to send debug output to the SAS log
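The difference between the PRXPOSN function and the CALL PRXPOSN routine is easiest to see side by side. In the sketch below, the phone-number text, the pattern, and the variable names (area, number, start, len) are assumptions made for the example.

```sas
data _null_;
   length text $ 40 area number $ 10;
   text = 'Phone: (404) 555-1234';   /* hypothetical sample value */

   /* Two capture buffers: the area code and the local number */
   re = prxparse('/\((\d{3})\)\s*(\d{3}-\d{4})/');

   if prxmatch(re, text) then do;
      /* PRXPOSN function: returns the text held in a capture buffer */
      area   = prxposn(re, 1, text);
      number = prxposn(re, 2, text);

      /* CALL PRXPOSN: returns where a capture buffer starts and how long it is */
      call prxposn(re, 2, start, len);

      put area= number= start= len=;
   end;

   call prxfree(re);
run;
```

Both forms rely on a preceding successful match, which is why they sit inside the PRXMATCH test.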

Demos

Conclusion

See the Paper for Details

Guide for Migrating from SAS (RX) expressions to Perl (PRX) regular expressions

Best practices for Perl Regular Expressions

Advanced Perl Regular Expression Capabilities

Demo

Contact

http://www.marktab.com

http://www.marktab.net

@marktabnet

Mark Tabladillo / MarkTab Consulting