1
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Performance per STE (*10000) Baseline AP/SpAP 0 1 2 3 4 5 6 Speedup 0.1% 1% Architectural Support for Efficient Large-Scale Automata Processing Hongyuan Liu* Mohamed Ibrahim* Onur Kayiran† Sreepathi Pai‡ Adwait Jog* *College of William & Mary †AMD ‡University of Rochester 1- Automata Processing 2- Challenges & Opportunities 5- Summary 4- Efficient Automata Processing on AP 3- Potential Benefits & Research Questions Observation: Repeated configurations and executions on AP which causes inefficiency Goal: Accelerate large-scale NFA processing on AP + Demonstrate that a large number of NFA states are Cold during execution but still configured to AP + Predict if a state is Cold or Hot @ compile time using a small profiling input + Propose topological-order based NFA partitioning into Predicted Cold and Predicted Hot states + Develop SparseAP to handle mispredictions efficiently using our proposed Enable and Jump operations Results + 2.1x Speedup (up to 47x) Potential Solution Remove Cold states from the NFAs Configure ONLY the Hot states to AP Used widely in different areas Von Neumann architectures are not efficient at FSM processing - Irregular memory accesses - Limited Parallelism Solution: Use Automata Processor (AP) - Enables in-memory processing - Exploits state parallelism of NFAs Applications are getting Bigger AP capacity is Limited Challenge: Repeated Executions! Time Opportunity: Underutilization of AP Pattern mismatch Many unused states are configured to AP 0% 20% 40% 60% 80% 100% CAV4k DS CAV DS03 DS06 Snort_L DS09 Snort HM1500 HM500 HM1000 HM PEN TCP Rg1 EM ER Rg05 Fermi Pro Brill LV Bro217 SPM RF1 RF2 Percentage of States Hot (Enabled) Cold (Never-enabled) Time Oracular knowledge of input Arbitrary states partitioning Question#1: How to predict Cold states? Question#2: How to partition NFAs? Mispredictions Question#3: How to handle mispredictions efficiently? Decrease Batches We acknowledge the support of the National Science Foundation (NSF) grants (#1657336, #1717532, #1750667) MICRO 2018 Q1: How to predict Cold states? 47x 1.8x 2.1x 32.1% Q2: How to partition NFAs? Q3: How to handle mispredictions efficiently? Training Testing Solution: Use a small profiling input to predict the Hot/Cold states % from Input % from Training Accuracy 50% 100% 97% 10% 20% 93% 1% 2% 90% 0.1% 0.2% 87% Oracular knowledge of input Time Input Predicted Hot Set Predicted Cold Set u v vGenerated Intermediate Report(s) Problem: Input stream execution on the predicted Cold set is too Expensive v; Cycle x Solution: SparseAP Execution Mode SpAP = Jump Op + Enable Op Generated Intermediate Report(s) Arbitrary states partitioning Solution: Partition using Topological Order Correlates with Cold and Hot states Makes transition unidirectional 0% 20% 40% 60% 80% 100% CAV4k DS CAV DS03 DS06 Snort_L DS09 Snort HM1500 HM500 HM1000 HM PEN TCP Rg1 EM ER Rg05 Fermi Pro Brill LV Bro217 SPM RF1 RF2 Percentage of States Shallow Medium Deep

Hongyuan Liu* Mohamed Ibrahim* Onur Kayiran Sreepathi Pai ...massemibrahim.github.io/Documents/PDFs/poster_hongyuan_liu_micro2018.pdf · Hongyuan Liu* Mohamed Ibrahim* Onur Kayiran†Sreepathi

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hongyuan Liu* Mohamed Ibrahim* Onur Kayiran Sreepathi Pai ...massemibrahim.github.io/Documents/PDFs/poster_hongyuan_liu_micro2018.pdf · Hongyuan Liu* Mohamed Ibrahim* Onur Kayiran†Sreepathi

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Perf

orm

an

ce p

er

ST

E(*

10

00

0)

Baseline AP/SpAP

0

1

2

3

4

5

6

Sp

eed

up

0.1% 1%

Architectural Support for Efficient Large-Scale Automata ProcessingHongyuan Liu* Mohamed Ibrahim* Onur Kayiran† Sreepathi Pai‡ Adwait Jog*

*College of William & Mary †AMD ‡University of Rochester

1- Automata Processing

2- Challenges & Opportunities

5- Summary

4- Efficient Automata Processing on AP

3- Potential Benefits & Research Questions

▪ Observation: Repeated configurations and executions on AP whichcauses inefficiency

▪ Goal: Accelerate large-scale NFA processing on AP+ Demonstrate that a large number of NFA states are Cold during

execution but still configured to AP+ Predict if a state is Cold or Hot @ compile time using a small

profiling input+ Propose topological-order based NFA partitioning into Predicted

Cold and Predicted Hot states+ Develop SparseAP to handle mispredictions efficiently using our

proposed Enable and Jump operations

▪ Results+ 2.1x Speedup (up to 47x)

Potential SolutionRemove Cold states from the NFAs

Configure ONLY the Hot states to AP

Used widely in different areas

Von Neumann architectures are not efficient at FSM processing- Irregular memory accesses- Limited Parallelism

Solution: Use Automata Processor (AP)

- Enables in-memory processing

- Exploits state parallelism of NFAs

Applications are getting Bigger

AP capacity is Limited

Challenge: Repeated Executions!

Time

Opportunity: Underutilization of APPattern mismatch Many unused states are configured to AP

0%20%40%60%80%

100%

CA

V4

k

DS

CA

V

DS0

3

DS0

6

Sno

rt_L

DS0

9

Sno

rt

HM

15

00

HM

50

0

HM

10

00

HM

PEN TC

P

Rg1 EM ER

Rg0

5

Ferm

i

Pro

Bri

ll

LV

Bro

21

7

SPM

RF1

RF2P

erc

en

tage

of

Stat

es Hot (Enabled) Cold (Never-enabled)

Time

Oracular knowledge of input Arbitrary states partitioning

Question#1:How to predict Cold states?

Question#2:How to partition NFAs?

Mispredictions

Question#3:How to handle mispredictions efficiently?

Decrease Batches

We acknowledge the support of the National Science Foundation (NSF) grants (#1657336, #1717532, #1750667)

MICRO 2018

Q1: How to predict Cold states?

47x

1.8x2.1x

32.1%

Q2: How to partition NFAs?

Q3: How to handle mispredictions efficiently?

Tra

inin

gTe

sting

Solution: Use a small profiling input to predict

the Hot/Cold states

% from Input

% from Training

Accuracy

50% 100% 97%

10% 20% 93%

1% 2% 90%

0.1% 0.2% 87%

Oracular knowledge of input

Time

Input

Predicted

Hot Set

Predicted

Cold Set

u

v

v’

Generated Intermediate

Report(s)

Problem: Input stream execution on the predicted Cold set is too Expensive

v; Cycle x

Solution: SparseAP Execution ModeSpAP = Jump Op + Enable Op

Generated Intermediate Report(s)

Arbitrary states partitioning

Solution: Partition using Topological Order

Correlates with Cold and Hot states

Makes transition unidirectional 0%20%40%60%80%

100%

CA

V4

k

DS

CA

V

DS0

3

DS0

6

Sno

rt_L

DS0

9

Sno

rt

HM

15

00

HM

50

0

HM

10

00

HM

PEN TC

P

Rg1 EM ER

Rg0

5

Ferm

i

Pro

Bri

ll

LV

Bro

21

7

SPM

RF1

RF2P

erc

en

tage

of

Stat

es

Shallow Medium Deep