© 2017 Arm Limited Arm Tech Symposia 2017
Hardware & software performance analysis
using cycle models
Feng Niu | Senior Technical Specialist | Arm
Introduction
How can system designers evaluate configuration choices without silicon?
• Caches, buffers, number of CPUs – so many choices!
• You want to build fast enough, because too slow won’t work, and too fast is a waste of silicon
We’ll show you how you can use Arm Cycle Models to help evaluate your design choices
Design problem overview
Arm CPUs
Arm CPUs can have a number of configuration options.
• L1 cache sizes
• L2 cache size (and in some cases, whether there is even an L2 cache included)
• Number of CPUs in a multi-processor cluster
• Other options – cache memory latency, buffer sizes, and so on
How can you evaluate those design choices?
• More CPUs with bigger caches will run the fastest, but do you really need all that performance?
Physical Implementation
Bigger caches can give better performance, but….
Bigger caches mean more area, and more power usage.
More area can mean a lower yield, and increased cost per part.
And, getting it wrong means another expensive spin.
It (literally) pays to do just enough, and get it right the first time.
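The link between area, yield, and cost per part can be made concrete with a rough calculation. The sketch below is illustrative only: it assumes the simple Poisson yield model (yield = exp(-defect density × die area)), ignores wafer edge loss, and uses made-up values for die area, defect density, and wafer cost. None of these numbers come from Arm.

```python
import math

def cost_per_good_die(die_area_mm2, defects_per_mm2, wafer_cost, wafer_area_mm2):
    """Estimate the cost of one working die using the Poisson yield model."""
    # Fraction of dies expected to be defect-free.
    yield_fraction = math.exp(-defects_per_mm2 * die_area_mm2)
    # Whole dies that fit on the wafer (edge loss ignored for simplicity).
    dies_per_wafer = wafer_area_mm2 // die_area_mm2
    return wafer_cost / (dies_per_wafer * yield_fraction)

# Hypothetical inputs, chosen only to show the trend.
WAFER_AREA_MM2 = math.pi * 150 ** 2   # 300 mm diameter wafer
DEFECTS_PER_MM2 = 0.001               # assumed defect density
WAFER_COST = 5000.0                   # assumed cost per wafer

small = cost_per_good_die(50.0, DEFECTS_PER_MM2, WAFER_COST, WAFER_AREA_MM2)
large = cost_per_good_die(60.0, DEFECTS_PER_MM2, WAFER_COST, WAFER_AREA_MM2)

print(f"50 mm2 die: {small:.2f} per good die")
print(f"60 mm2 die: {large:.2f} per good die")
```

The larger die loses twice: fewer dies fit on each wafer, and a smaller fraction of them work.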
Benchmarks
Arm provides some benchmark numbers for our IP.
• But they aren’t always comprehensive, and may not match your intended application
• May not be easy to extrapolate to the best configuration for you
Running your code on your system would be the best way to evaluate your design.
• Your code will need porting first
• Your code will (likely) need debugging
• Your code will likely need some deeper analysis to optimize, and to find performance problems
So how do you address those problems?
Problem summary: debugging and running your own code, and getting accurate performance measurements, in a variety of IP configurations.
You have some options:
RTL simulation
• Advantages: cycle accurate; easy to reconfigure
• Disadvantages: slow clock speeds; limited source-level debug
FPGA
• Advantages: cycle accurate; good clock speeds
• Disadvantages: difficult to reconfigure; no internal debug visibility
Emulator
• Advantages: cycle accurate
• Disadvantages: expensive; difficult to configure; not great clock speeds
Third-party silicon
• Advantages: cycle accurate; real clock speeds
• Disadvantages: no custom IP configuration; difficult to add custom IP; no internal debug visibility
Code models/simulators
• Advantages: good debug visibility; good clock speeds
• Disadvantages: not cycle accurate
Arm Cycle Models overview
Arm Cycle Models
Faster SoC architectural definition and fine-grained performance optimization
Choose the right Arm IP and get the most out of it.
• 100% cycle accurate models indicate true system performance
• Use accurate virtual prototypes to determine best system architecture trade-offs
• Facilitates IP selection – simply replace an IP block, re-run the simulation and observe the performance delta
Simplify system design and performance analysis.
• Ideal for system architecture design, IP selection, firmware development and benchmarking
• Removes the risk of making the wrong design decision due to inaccurate results
• Get started in minutes with our reference systems of CPU, interconnect and memory models
Achieve faster time to result compared to RTL simulation/emulation.
• Reconfigure models in minutes not days, for quick turnaround what-if analysis
• Debug and performance analysis features help pinpoint performance bottlenecks
• Scales out easily to large development teams
Figure labels: Optimize critical code; 100% cycle accurate; Quickly evaluate SoC configurations
System-level analysis of virtual platforms
Unique set of capabilities.
• Automatic creation of cycle accurate models from RTL
• Development environment for system-level performance analysis
• Wide portfolio of Arm processor and system models with debug and profiling interfaces
Diagram labels: Cycle Model Studio; Arm SoC Designer; custom RTL; custom cycle models; model library; generation of cycle accurate models
Cycle Accurate Simulation is Important to Arm Partners
Areas: IP evaluation, system architecture, performance optimization, software development
• Understand the interaction of SW and HW for critical algorithms
• Models must expose performance data and profiling interfaces
• Software optimization requires system visibility
• Accuracy is critical for system validation and certification
• Easily try different IP configurations and software loads
• Generate benchmarking data for design decisions and customer wins
• Easily understand the behaviour and performance of the IP
• Models need to be accurate and secure
Build any valid configuration.
Only valid configurations are buildable.
100% accurate model available for download.
User is emailed when models are ready.
Existing models can be re-downloaded or reconfigured as needed.
Arm Model Creation on
Arm Cycle Accurate Models on IP Exchange
Arm IP models are delivered on IP Exchange.
Models are configurable and instrumented.
• DS-5, profiling, cache statistics, semi-hosting, etc.
Models are available in both SoC Designer and SystemC.
www.armipexchange.com
Cortex-A processors: Cortex-A75, Cortex-A72, Cortex-A57, Cortex-A55, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A7, Cortex-A5
Cortex-R processors: Cortex-R7, Cortex-R5, Cortex-R4, Cortex-R8, Cortex-R52
Cortex-M processors: Cortex-M33, Cortex-M23, Cortex-M7, Cortex-M4, Cortex-M3, Cortex-M0, Cortex-M0+
CoreLink IP: CCN-504, CCN-502, NIC-400, NIC-301, PL301, CCI-550, CCI-500, CCI-400, DMC-400, MMU-400, GIC-600, PL34x, PL35x
And many more
Using Cycle Models for tradeoff analysis
SoC Designer includes substantial analysis capabilities to enable architectural tradeoffs.
Leverage benchmarks or directed stimulus to create realistic system loads.
Modify configuration to gauge impact of architectural decisions.
Refine system to achieve desired results.
Analyze a single run or multiple runs.
Latency, throughput, branches, caching, instrumented power, software processes and much more.
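The modify-and-refine loop above amounts to a small search: run the same workload on several configurations, then keep the least costly one that still meets the target. The sketch below illustrates that selection step only. The `Run` record, the cycle counts, and the area estimates are hypothetical placeholders, not SoC Designer output or an Arm API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Run:
    l1_kb: int        # L1 cache size in KB
    l2_kb: int        # L2 cache size in KB (0 means no L2)
    cpus: int         # number of CPUs in the cluster
    cycles: int       # benchmark cycle count from one simulation run
    area_mm2: float   # estimated area of this configuration

def cheapest_config_meeting_target(runs, max_cycles):
    """Return the smallest-area run whose cycle count meets the budget."""
    candidates = [r for r in runs if r.cycles <= max_cycles]
    if not candidates:
        return None  # no configuration is fast enough
    return min(candidates, key=lambda r: r.area_mm2)

# Hypothetical results; real numbers would come from simulation runs.
RUNS = [
    Run(16, 0, 1, 1_900_000, 1.0),
    Run(32, 0, 1, 1_600_000, 1.2),
    Run(32, 256, 1, 1_150_000, 1.9),
    Run(32, 512, 2, 700_000, 3.6),
]

best = cheapest_config_meeting_target(RUNS, max_cycles=1_200_000)
print(best)
```

With these placeholder numbers the dual-CPU configuration is the fastest, but the single-CPU configuration with a 256 KB L2 already meets the budget at roughly half the area, which is the "just enough" choice.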
Arm Cycle Accurate Virtual Prototype Solutions
Architectural Analysis and Firmware Development
Diagram labels: Model Creation; RTL; System Level Model; LT Model; Arm® Fast Model™; Model and System Libraries; IP Models; CPAK; Arm® Cycle Model™; Model and System Deployment
Fast forwarding with Swap & Play
Accuracy is not needed while running the boot code and setting up the benchmark.
Timeline: Bootcode, Benchmark setup, Benchmark execution, Process results
• Fast model execution first, then swap to cycle model execution
• For SPECint, the fast model system runs this part (the portion executed on the fast model) in 2 seconds, while the same cycle model system takes 26 minutes
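The SPECint figures quoted above translate into a speedup as follows. This is plain arithmetic on the two numbers from the slide (2 seconds and 26 minutes); the variable names are illustrative.

```python
# Times quoted for the fast-forwarded portion of the SPECint run.
fast_model_seconds = 2
cycle_model_seconds = 26 * 60   # 26 minutes expressed in seconds

# How many times faster the fast model gets through this portion.
speedup = cycle_model_seconds / fast_model_seconds

# Wall-clock time saved per run by swapping in only after setup.
seconds_saved = cycle_model_seconds - fast_model_seconds

print(f"Speedup: {speedup:.0f}x")
print(f"Time saved per run: {seconds_saved} seconds")
```

The saving applies to every run, so it compounds when many configurations are compared.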
Virtual Prototypes for Performance Analysis
Performance Analysis Kits (CPAKs)
Over 100 pre-built, extensible virtual prototypes.
• Arm® Cortex®-A72, Cortex-A57, Cortex-A53, Cortex-A15, Cortex-A9, Cortex-A7, big.LITTLE™ and more
Reconfigurable memory and fabric.
• CCN-50x, NIC-400, NIC-301, CCI-400, DMC-400, etc.
Pre-built bare-metal software.
Pre-built OS ports.
• Linaro Linux with VE memory mapping
Swap & Play enabled.
• Execute at 10s to 100s of MIPS
• Debug with 100% accuracy
Source code for all software components.
Downloadable 24/7 from.
CPAKs enable fast productivity and minimize support requirements.
Arm Cycle Models Summary
Simplifying pre-silicon performance analysis
Cycle Models enable accurate performance analysis with enhanced profiling capabilities for many classes of software, including benchmarks.
Cycle Models provide software and system debugging features to quickly identify issues during software execution.
System assembly, IP configuration, and modifications are easier with Cycle Models, allowing more comparisons to be performed.
What next?
If you are interested in licensing Arm Cycle Models, please contact:
• [email protected] -or- [email protected] -or- Your Arm Partner team
For existing Cycle Model licensees, please send your support questions to:
You can read about Cycle models on the web:
• https://developer.arm.com/products/system-design/cycle-models
Arm IP Exchange:
• http://armipexchange.com/
Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos! 감사합니다 धन्यवाद