S. Ryan Quick @phaedo, Providentia Worldwide. April 2020 HPC Impact EDA Telemetry Neural Networks

HPCAI 202004 EDA · Providentia Worldwide

Systems Intelligence

Ecosystem Management

Systems Intelligence Principles

Methodology for leveraging multiple data domains through complex data processing

Disparate / Unlike Domains

Messaging Middleware

Insight

• Aggregation

• Event Statistics

• Atomic Pattern Recognition

• Simple example shown as “waterfalling” for illustration — the operations are parallel and stateless

• Pattern is an example of the type and method of telemetry we use for EDA environmental and in-workload collection to feed AI and neural networks inline

• There are literally thousands of metrics for a single operation, millions per job

Multiple-Domain Simple Data Access

Metrics Calculator

Event sources: CPU Event Source; App Login Event Source; DB Access Event Source

Available Source Fields:

• app login r/sec; app successful login r/sec; app failed login r/sec

• cpu 1m load avg; cpu 5m load avg; cpu 15m load avg; cpu blocked proc cnt; cpu running proc cnt; cpu waiting proc cnt; cpu user %; cpu idle %; cpu system %; cpu io wait %

• db active queries; db slow queries; db selects; db updates; db deletes; db rows fetched; db table locks held; db row locks held

Checks:

• app failed login / app success login * 100 > 3?

• AVG(cpu waiting / cpu running) / cpu 1M load avg * 100 > 0.5?

• DB Slow Queries > 4?

yes, yes, yes: Anomaly Detected: Potential Login Attack
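
The flowchart on this slide can be read as three threshold checks over fields drawn from different event sources. Below is a minimal Python sketch of that reading. It assumes the thresholds pair with the expressions in the order they appear on the slide (> 3, > 0.5, > 4) and that all three checks must pass. The function name and dictionary keys are hypothetical, chosen only for illustration.

```python
from statistics import mean

def login_attack_suspected(app, cpu_samples, db):
    """Stateless check over app, CPU, and DB fields (all names are hypothetical)."""
    # Failed logins as a percentage of successful logins.
    if app["success_login_rps"] > 0:
        failed_pct = app["failed_login_rps"] / app["success_login_rps"] * 100
    else:
        failed_pct = float("inf") if app["failed_login_rps"] > 0 else 0.0

    # Average waiting/running process ratio, scaled by the latest 1m load average.
    ratios = [s["waiting"] / s["running"] for s in cpu_samples if s["running"] > 0]
    load_1m = cpu_samples[-1]["load_1m"] if cpu_samples else 0.0
    cpu_pressure = mean(ratios) / load_1m * 100 if ratios and load_1m > 0 else 0.0

    # Assumed pairing of thresholds with expressions, in slide order.
    return failed_pct > 3 and cpu_pressure > 0.5 and db["slow_queries"] > 4
```

Because the function keeps no state between calls, many instances could evaluate different events in parallel, which matches the "parallel and stateless" note on the slide.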

• Affinity + Simple Case

• Stream + Augmented Datasource

• Parallel Stream

• Frequency-Shifted Stream

• “Correlative/Normalized View”: Similar to a SQL “join” concept, we relate data fields in disparate stream sources

• Many examples — for other talks :)

• This illustrates the mechanisms by which we can combine and augment data types for complex events in AI/neural networks and utilize inline training and active models.

• Also allows us to introduce the notion of insight, which is crucial to an incremental improvement model, especially for “slight touch ecosystems” like coral reefs

Multiple-Domain Complex Event Processing Approaches

Complex Event Processor

Sources: CPU Source; Zookeeper Source; RabbitMQ Source; Application Event Source

Stages: Parallel Source; Disparate Normalization; Correlative/Normalized View

Zookeeper fields: approx-data-sz; avg-latency; ephemeral-count; followers; max-fd-cnt; max-latency; min-latency; open-fd-cnt; num-alive-connections; outstanding-requests; packets-received; packets-sent; pending-syncs; synced-followers; watch-cnt; znode-cnt

RabbitMQ fields: message total; message ready; message unacked; rate.publish; rate.deliver; rate.redeliver; rate.confirm; rate.ack; connection.total; connection.idle; channel.total; channel.publisher; channel.consumer; channel.duplex; channel.inactive; exchange.rate.phaedo; q.total; q.idle; q.messages.phaedo; q.consumers.phaedo; q.memory.phaedo; q.ingress.phaedo; q.egress.phaedo; binding.total
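
The “Correlative/Normalized View” is described as similar to a SQL join across disparate stream sources. One way to picture it is a join keyed on host and time bucket, as in the Python sketch below. The event shape (`host`, `ts`, `fields`), the bucket size, and the function name are assumptions made for illustration; the slides do not specify how the real processor keys or windows its streams.

```python
from collections import defaultdict

def correlate(streams, bucket_seconds=10):
    """Join events from several sources on (host, time bucket).

    `streams` maps a source name (no dots) to a list of events shaped like
    {"host": str, "ts": float, "fields": {name: value}} (hypothetical shape).
    """
    view = defaultdict(dict)
    for source, events in streams.items():
        for event in events:
            key = (event["host"], int(event["ts"] // bucket_seconds))
            for field, value in event["fields"].items():
                # Prefix with the source name so fields never collide;
                # a later event in the same bucket overwrites an earlier one.
                view[key][f"{source}.{field}"] = value

    # Inner-join semantics: keep rows to which every source contributed.
    sources = set(streams)
    return {
        key: row
        for key, row in view.items()
        if {name.split(".", 1)[0] for name in row} == sources
    }
```

A row in the result then holds, for example, `zookeeper.avg-latency` next to `rabbitmq.rate.publish` for the same host and time bucket, which is the kind of combined record a downstream model could consume.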

Semiconductor EDA Designing the Digital Future

HPC HTC

• “High Throughput Computing”

• Very predictable, common engineering pipeline

• Toolset is geared to repeat the steps in the pattern hundreds or thousands of times per iteration, per engineer, constantly. Each adjustment cascades into hundreds or thousands of small jobs.

• Jobs are very short-lived: average time on a single core is under 3s. The job scheduler itself is often a bottleneck on large, shared systems.

• EDA requires multiple phases of HDL synthesizers and HLL compilers, so different computational bottlenecks can arise at different phases of the pipeline, as well as from different design choices in the engineering decisions.

EDA Characteristics
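
A back-of-the-envelope calculation suggests why the scheduler becomes a bottleneck with jobs this short: keeping N cores busy with jobs that average t seconds requires roughly N / t job starts per second. The sketch below uses the 3s figure from the slide as an upper bound; the core counts and the function name are hypothetical.

```python
def required_dispatch_rate(cores, avg_job_seconds):
    """Jobs per second the scheduler must start to keep every core busy."""
    if avg_job_seconds <= 0:
        raise ValueError("avg_job_seconds must be positive")
    return cores / avg_job_seconds

# Hypothetical cluster sizes; 3.0s is the slide's upper bound on average job time.
for cores in (1_000, 10_000, 100_000):
    rate = required_dispatch_rate(cores, avg_job_seconds=3.0)
    print(f"{cores:>7} cores -> {rate:,.0f} dispatches/s")
```

Since the real average is under 3s, the required rate is at least this high, before counting any per-job licensing or file-system work.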

Well-established Sector

• Traditional enterprise storage (NFS3)

• 10-100M small files (<=1M files/dir)

• user and group based access controls

• POSIX, locking not required

• OS scheduler is often sufficient. Sometimes, job submission separated by login node.

• License model is well understood, and generally core-based or time-based. Codes are generally proprietary.

• Turnkey deployment is up and running in minutes on nearly any sized system. Very little motivation to alter the status quo.

EDA Characteristics

What Would It Take to Try Something New?

• All on-prem, with cloud tests successful but not adopted:

  • too costly

  • intellectual property concerns

  • ROI delayed

  • data management difficulties

• Storage enhancements show improvements, and large shops adopt those, but NFS3 performs well for most small-medium practitioners.

EDA Environments

What Would It Take to Try Something New?

• The EDA process is well known, easy to hire for, and well understood in the industry. Why rock the boat?

• Any perturbations to the system would need to overcome the cost of change, which in semiconductor fabrication can be immense.

• Even where bottlenecks are known (storage, compute, scheduling), they are understood and manageable. New is new and unpredictable with unknown value…

EDA Pipelines at Scale?

For valuable and motivational change in semiconductor EDA, we need disruption both in behavior and environment simultaneously.

External focus for HTC/Systems Intelligence

• Two primary mechanisms for augmenting the EDA process:

  • Internally (inside the EDA pipeline).

  • Externally (augmenting and enhancing the pipelining environment).

• We are focusing on the external mechanism for this project, but the usual neural network caveats apply.

Neural Networks for EDA Pipelines

Diagram components:

• Semiconductor Electronic Design Automation («precondition» API to workflow data): Chip Specification; Design entry/Functional verification; RTL synthesis; Partitioning of chip; Design for test (DFT) insertion; Floor planning; Placement stage; Clock tree synthesis (CTS); Routing stage; Final verification; GDS II

• Infrastructure Automation («precondition» API to all components; «precondition» API backwards compatible): Systems Provisioning; Network Provisioning; Application Deployment; Configuration Management; Platform Management; Change Orchestration

• capabilities: User/group file CRUD; Workflow scheduling; Job management; License management

• sd Systems Intelligence: EDA Messaging Substrate: CEP; Ingest; Data Analytics; inline models; offline models; Atomic Pattern Recognition; Parallel Stream; Command & Control; Stream Augmentation; Frequency-Shifted Streams; Affinity Streams; Aggregation/Statistics

• Flow labels: data/scores/metrics; decisioning; orchestration; validation; feedback

• Connectors X and Y; regions labeled Internal and External

Semiconductor EDA Designing the Digital Future

“When we think of sensing technologies as devices that order the world, rather than devices that describe it, then alternative relationships between the social and the technical are strikingly brought to light.”

— Genevieve Bell (Intel) @feraldata

EDA Workflow and Supporting Infrastructure SI Messaging

External Capabilities and Infrastructure

EDA SI Messaging Substrate

Insight

EDA Workflow and AI/NN Frameworks

Diagram components:

• sd Neural Networks

• sd Messaging-Based Machine Learning / AI / Neural Networks Workflow

• Data Analytics and Normalization Reactive Systems

• Flow labels: scoring/metrics; decisioning; orchestration; validation; feedback

• inline learning models

• Clustering, Classification, Decision Trees

• Offline / replay learning models

• Model Run; Model Training

• CEP/INGEST from Existing Datasources

• Insight Consumers

• Ecosystem Insight and KPI Enhancements

• Ecosystem Messaging Platform Pattern Enhancements

• External Capabilities and Infrastructure

• EDA ML / AI / NN Workflow

• SI Messaging Substrate

• Insight

• Connectors X and Y

Unique position for AI and NN

Why Artificial Intelligence/Neural Networks for this Problem?

• Small, incremental human-driven changes are not cost-effective in today’s DevOps systems

• Continuous observation for “minority report” style changes makes it difficult to design sprints and test efficacy, and even harder to measure ROI

• Command and control systems can be designed to allow incremental change directly from NNs based on deployments — e.g. allow each “reef” to tune itself based on its own ecosystem

• The “show your work”/“show your rationale” problems carry less weight in EDA, relative to delivering results, than in other domains

Insight: “looking inward”

Insight provides a mechanism for self-tuning behavior of the running system at all levels:

• algorithms, models, data access, expert systems, KPIs, behaviors, reports, accuracy, efficiency, even insight itself

• In-built feedback mechanism for capturing behavior and performance

• Mechanism to ensure that changes over time are accounted for and noticed if not understood

• Allows for inline and ongoing training without having to maintain offline (and outdated) training datasets

• Allows for locale-specific NN training (the NN-locale problem).
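
One way to picture inline, ongoing training without a stored offline dataset is an estimator that updates itself from each observation as it arrives. The sketch below uses Welford's online algorithm for mean and variance as a deliberately simple stand-in. It is not the neural network approach the project is considering, and the class name and z-score threshold are hypothetical.

```python
import math

class OnlineBaseline:
    """Running mean and variance (Welford's algorithm); stores no history."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        """Sample standard deviation, or 0.0 with fewer than two observations."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_unusual(self, x, z=3.0):
        """True if x lies more than z standard deviations from the running mean."""
        sd = self.stddev
        return sd > 0 and abs(x - self.mean) > z * sd
```

Running one such baseline per deployment would let each site learn its own notion of normal from local telemetry, which is the locale-specific idea in the last bullet.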

Program Status

Where are we now?

• Telemetry data from workload systems feeding messaging platform

• Synthetic workload (provided by a partner benchmarking suite) being modified for user-emulation

• NN-specific topology choice and models are under discussion with the wider team, considering that we will need to utilize simultaneous learning, model promotion, results propagation, etc.

• Insight mechanisms are developed in the messaging substrate automatically, with common APIs available to higher level structures. Common reporting in dashboards etc.

• Always looking for helpers to take things farther — will report more later as we (un)shelter…