© 2017 Arm Limited Arm Tech Symposia 2017 Hardware & software performance analysis using cycle models Feng Niu| Senior Technical Specialist | Arm


© 2017 Arm Limited Arm Tech Symposia 2017

Hardware & software performance analysis using cycle models

Feng Niu | Senior Technical Specialist | Arm


Introduction

How can system designers evaluate configuration choices without silicon?

• Caches, buffers, number of CPUs – so many choices!

• You want to build just fast enough: too slow won’t work, and too fast is a waste of silicon

We’ll show you how you can use Arm Cycle Models to help evaluate your design choices


Design problem overview


Arm CPUs

Arm CPUs can have a number of configuration options.

• L1 cache sizes

• L2 cache size (and in some cases, whether there is even an L2 cache included)

• Number of CPUs in a multi-processor cluster

• Other options – cache memory latency, buffer sizes, and so on

How can you evaluate those design choices?

• More CPUs with bigger caches will run the fastest, but do you really need all that performance?
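To see why cache size and latency options matter, here is a minimal sketch of the textbook average memory access time (AMAT) estimate. It is a first-order approximation only, not what a cycle model computes, and every number in it (hit times, miss rates, memory latency) is a made-up assumption for illustration.

```python
def amat(l1_hit_cycles, l1_miss_rate, l2_hit_cycles, l2_miss_rate,
         mem_cycles, has_l2=True):
    """Average memory access time in cycles (textbook first-order model)."""
    if has_l2:
        # An L1 miss pays the L2 hit time, plus main memory on an L2 miss.
        l1_miss_penalty = l2_hit_cycles + l2_miss_rate * mem_cycles
    else:
        # Without an L2 cache, every L1 miss goes straight to main memory.
        l1_miss_penalty = mem_cycles
    return l1_hit_cycles + l1_miss_rate * l1_miss_penalty


# Hypothetical inputs: 2-cycle L1 hit, 5% L1 miss rate, 12-cycle L2 hit,
# 20% L2 miss rate, 150-cycle main memory access.
with_l2 = amat(2, 0.05, 12, 0.20, 150)
without_l2 = amat(2, 0.05, 12, 0.20, 150, has_l2=False)
print(f"AMAT with L2:    {with_l2:.2f} cycles")
print(f"AMAT without L2: {without_l2:.2f} cycles")
```

With these assumed inputs the estimate is about 4.1 cycles with an L2 cache and 9.5 cycles without. Real miss rates depend on your software, which is why measuring your own code matters.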


Physical Implementation

Bigger caches can give better performance, but…

Bigger caches mean more area, and more power usage.

More area can mean a lower yield, and increased cost per part.

And, getting it wrong means another expensive spin.

It (literally) pays to do just enough, and get it right the first time.
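The link between area, yield, and cost can be sketched with the simple Poisson yield model, in which yield falls exponentially with die area. This is a rough illustration, not an Arm or foundry cost model: the wafer cost, wafer diameter, defect density, and die areas below are all hypothetical, and edge loss is ignored.

```python
import math


def cost_per_good_die(die_area_mm2, wafer_cost=5000.0,
                      wafer_diameter_mm=300.0, defects_per_mm2=0.001):
    """Estimate cost per good die using the Poisson yield model."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Gross dies per wafer, ignoring edge loss and scribe lines.
    gross_dies = wafer_area / die_area_mm2
    # Poisson model: yield = exp(-defect_density * die_area).
    yield_fraction = math.exp(-defects_per_mm2 * die_area_mm2)
    return wafer_cost / (gross_dies * yield_fraction)


small = cost_per_good_die(50.0)   # hypothetical die with smaller caches
large = cost_per_good_die(60.0)   # hypothetical die with bigger caches
print(f"50 mm^2 die: {small:.2f} per good die")
print(f"60 mm^2 die: {large:.2f} per good die")
print(f"cost ratio:  {large / small:.3f} (area ratio is 1.200)")
```

Under this model the cost per good part grows faster than the area itself, because the larger die both reduces the number of dies per wafer and lowers the yield.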


Benchmarks

Arm provides some benchmark numbers for our IP.

• But they aren’t always comprehensive, and may not match your intended application

• It may not be easy to extrapolate from them to the best configuration for you

Running your code on your system would be the best way to evaluate your design.

• Your code will need porting first

• Your code will (likely) need debugging

• Your code will likely need some deeper analysis to optimize, and to find performance problems


So how do you address those problems?

Problem summary: debugging and running your own code, and getting accurate performance measurements, in a variety of IP configurations.

You have some options, each with advantages and disadvantages:

RTL simulation

• Advantages: cycle accurate; easy to reconfigure

• Disadvantages: slow clock speeds; limited source-level debug

FPGA

• Advantages: cycle accurate; good clock speeds

• Disadvantages: difficult to reconfigure; no internal debug visibility

Emulator

• Advantages: cycle accurate

• Disadvantages: expensive; difficult to configure; not great clock speeds

Third-party silicon

• Advantages: cycle accurate; real clock speeds

• Disadvantages: no custom IP configuration; difficult to add custom IP; no internal debug visibility

Code models/simulators

• Advantages: good debug visibility; good clock speeds

• Disadvantages: not cycle accurate


Arm Cycle Models overview


Arm Cycle Models

Faster SoC architectural definition and fine-grained performance optimization

Choose the right Arm IP and get the most out of it.

• 100% cycle accurate models indicate true system performance

• Use accurate virtual prototypes to determine best system architecture trade-offs

• Facilitates IP selection – simply replace an IP block, re-run the simulation and observe the performance delta

Simplify system design and performance analysis.

• Ideal for system architecture design, IP selection, firmware development and benchmarking

• Removes the risk of making the wrong design decision due to inaccurate results

• Get started in minutes with our reference systems of CPU, interconnect and memory models

Achieve faster time to result compared to RTL simulation/emulation.

• Reconfigure models in minutes not days, for quick turnaround what-if analysis

• Debug and performance analysis features help pinpoint performance bottlenecks

• Scales out easily to large development teams

Optimize critical code

100% Cycle Accurate

Quickly evaluate SoC configurations


System-level analysis of virtual platforms

Unique set of capabilities.

• Automatic creation of cycle accurate models from RTL

• Development environment for system-level performance analysis

• Wide portfolio of Arm processor and system models with debug and profiling interfaces

[Figure: generation of cycle accurate models. Labels: Custom RTL, Cycle Model Studio, custom cycle models, model library, Arm SoC Designer.]


Cycle Accurate Simulation is Important to Arm Partners

• Understand the interaction of SW and HW for critical algorithms

• Models must expose performance data and profiling interfaces

• Software optimization requires system visibility

• Accuracy is critical for system validation and certification

• Easily try different IP configurations and software loads

• Generate benchmarking data for design decisions and customer wins

• Easily understand the behaviour and performance of the IP

• Models need to be accurate and secure

Use cases: IP evaluation, system architecture, performance optimization, software development


Build any valid configuration.

Only valid configurations are buildable.

100% accurate model available for download.

User is emailed when models are ready.

Existing models can be re-downloaded or reconfigured as needed.

Arm Model Creation on


Arm IP models are delivered on IP Exchange.

Models are configurable and instrumented.

• DS-5, profiling, cache statistics, semi-hosting, etc.

Models are available in both SoC Designer and SystemC.

www.armipexchange.com

Arm Cycle Accurate Models on IP Exchange

Cortex-A processors: Cortex-A75, Cortex-A72, Cortex-A57, Cortex-A55, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A7, Cortex-A5

CoreLink IP: CCN-504, CCN-502, NIC-400, NIC-301, PL301, CCI-550, CCI-500, CCI-400, DMC-400, MMU-400, GIC-600, PL34x, PL35x

Cortex-R processors: Cortex-R7, Cortex-R5, Cortex-R4, Cortex-R8, Cortex-R52

Cortex-M processors: Cortex-M33, Cortex-M23, Cortex-M7, Cortex-M4, Cortex-M3, Cortex-M0, Cortex-M0+

And many more


Using Cycle Models for tradeoff analysis

SoC Designer includes substantial analysis capabilities to enable architectural tradeoffs.

Leverage benchmarks or directed stimulus to create realistic system loads.

Modify configuration to gauge impact of architectural decisions.

Refine system to achieve desired results.

Analyze a single run or multiple.

Latency, throughput, branches, caching, instrumented power, software processes and much more.
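The modify, measure, and refine loop described above can be sketched as picking the cheapest configuration that still meets a performance target. This is an illustration only: the `Result` record, the cycle counts, and the relative area figures are hypothetical placeholders for numbers you would collect from your own simulation runs, and none of it is an SoC Designer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    cpus: int
    l2_kb: int
    cycles: int    # hypothetical cycle count from one simulation run
    area: float    # hypothetical relative area cost


def cheapest_meeting_target(results, max_cycles):
    """Return the lowest-area configuration that meets the cycle budget."""
    candidates = [r for r in results if r.cycles <= max_cycles]
    return min(candidates, key=lambda r: r.area) if candidates else None


# Made-up results for five configurations of the same workload.
results = [
    Result(cpus=1, l2_kb=0, cycles=9_000_000, area=1.0),
    Result(cpus=1, l2_kb=512, cycles=6_500_000, area=1.4),
    Result(cpus=2, l2_kb=512, cycles=4_200_000, area=2.3),
    Result(cpus=2, l2_kb=1024, cycles=3_900_000, area=2.9),
    Result(cpus=4, l2_kb=1024, cycles=2_500_000, area=5.0),
]

best = cheapest_meeting_target(results, max_cycles=4_500_000)
print(best)
```

With these made-up numbers, the two-CPU, 512 KB configuration is selected: the larger configurations are faster, but they spend area on performance the target does not require.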


Arm Cycle Accurate Virtual Prototype Solutions

[Figure: virtual prototype flow. Stage labels: Model Creation; Model and System Libraries; Model and System Deployment; Architectural Analysis and Firmware Development. Block labels: RTL, System Level Model, LT Model, Arm® Fast Model™, Arm® Cycle Model™, IP Models, CPAK.]


Fast forwarding with Swap & Play

Accuracy is not needed while running the bootcode and setting up the benchmark.

[Figure: timeline of phases: Bootcode, Benchmark setup, Benchmark execution, Process results. Fast model execution comes first, followed by a swap to cycle model execution.]

For SPECint, fast model system runs this part in 2 seconds while the same cycle model system takes 26 minutes.
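As a quick worked check of the SPECint figures quoted on this slide (2 seconds on the fast model versus 26 minutes on the cycle model), the arithmetic is sketched below. The variable names are illustrative only.

```python
# Figures quoted on the slide for the part run before the swap.
fast_model_seconds = 2
cycle_model_seconds = 26 * 60  # 26 minutes expressed in seconds

speedup = cycle_model_seconds / fast_model_seconds
seconds_saved = cycle_model_seconds - fast_model_seconds

print(f"Speedup for this part: {speedup:.0f}x")
print(f"Wall-clock time saved per run: {seconds_saved} s")
```

That is a 780x speedup for this part, saving 1558 seconds (just under 26 minutes) on every run.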


Over 100 pre-built, extensible virtual prototypes.

• Arm® Cortex®-A72, Cortex-A57, Cortex-A53, Cortex-A15, Cortex-A9, Cortex-A7, big.LITTLE™ and more

Reconfigurable memory and fabric.

• CCN-50x, NIC-400, NIC-301, CCI-400, DMC-400, etc.

Pre-built bare-metal software.

Pre-built OS ports.

• Linaro Linux with VE memory mapping

Swap & Play enabled.

• Execute at 10s to 100s of MIPS

• Debug with 100% accuracy

Source code for all software components.

Downloadable 24/7 from.

CPAKs enable fast productivity and minimize support requirements.

Virtual Prototypes for Performance Analysis

Performance Analysis Kits (CPAKs)


Cycle Models enable accurate performance analysis with enhanced profiling capabilities for many classes of software, including benchmarks.

Cycle Models provide software and system debugging features to quickly identify issues during software execution.

System assembly, IP configuration, and modifications are easier with Cycle Models allowing more comparisons to be performed.

Arm Cycle Models Summary

Simplifying pre-silicon performance analysis


What next?

If you are interested in licensing Arm Cycle Models, please contact:

[email protected] -or- [email protected] -or- Your Arm Partner team

For existing Cycle Model licensees, please send your support questions to:

[email protected]

You can read about Cycle models on the web:

• https://developer.arm.com/products/system-design/cycle-models

Arm IP Exchange:

• http://armipexchange.com/


Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos! 감사합니다 धन्यवाद
