Introduction to the ARM Architecture or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Manual
Glance into the past
• Initial ARM processor developed by Acorn Computers, 1985
• ARM means: Acorn RISC Machine
• Architecture was influenced by UC Berkeley's RISC project
• RISC means: Reduced Instruction Set Computing
• RISC vs. CISC: remember the GRA lecture?
• // TODO: Show OLD advertisement with fancy 80s jingle
Why?
• Architectural simplicity can be beneficial:
  → small implementations
  → possibly low power consumption
• Key features: keeping implementation size small while maintaining reasonable performance and low power consumption.
• Example: Jetson TK1 board does not exceed 15 W, even under heavy load
• ARM architecture is suitable for embedded applications, and nowadays even for HPC: ARM cluster in Spain.
About the ARM Cortex-A15 MPCore
• Implements the ARMv7-A architecture
• 32-bit processor core, licensed by ARM
• Can access 40-bit physical addresses (thus up to 1 TB RAM)
• 15-stage integer, 17-25-stage FP pipeline
• NEON extension (ARM's way of doing SIMD)
• Out-of-order, speculative-issue, 3-way superscalar execution pipeline
• 32 KB data + 32 KB instruction L1 cache per core
• Integrated low-latency L2 cache controller, up to 4 MB per cluster
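The 1 TB figure above follows directly from the address width: 40 address bits select 2^40 distinct bytes. A minimal C sketch of that arithmetic (the helper name `addressable_bytes` and the macro `ONE_TIB` are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* One tebibyte: 2^40 bytes. */
#define ONE_TIB ((uint64_t)1024 * 1024 * 1024 * 1024)

/* Number of bytes addressable with `bits` physical address bits.
 * Hypothetical helper, for illustration only. */
static uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);            /* shifting by 64 or more is undefined */
    return (uint64_t)1 << bits;
}
```

For comparison, `addressable_bytes(32)` gives the 4 GB limit of a plain 32-bit physical address.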
About the Cortex-A15 (ARMv7-A)
Cortex-A15 multiprocessor functionality
• L2 cache with Snoop Control Unit for cache coherency
Stuff to know about the Cortex-A15
• Better memory system performance than former models
• Enhanced floating-point performance
• Multicore functionality for scalability
• Wider pipelines for higher instruction throughput
About the Architecture (ARMv7-A)
• 32-bit ARM architecture
• Offers a hardware floating-point unit and various RISC features
• Most often used architecture in mobile devices
• Three profiles, described in more detail later:
  • A = application, R = real time, M = microcontroller
• Fixed instruction width of 32 bits in the ARM instruction set
• Almost single clock-cycle execution of most instructions
ARMv7 Variants
• ARMv7-A: Traditional ARM architecture with multiple modes; supports the ARM and Thumb instruction sets (Thumb: 16-bit instruction set with a subset of the ARM instruction set's functionality → better code density). Supports a virtual memory system based on an MMU.
• ARMv7-R: Real-time profile with multiple modes; supports the ARM and Thumb instruction sets. Supports a protected memory system based on a memory protection unit (MPU).
• ARMv7-M: Microcontroller profile, designed for low-latency interrupt processing; implements a variant of the protected memory system.
Core data types
• Data types in memory:
  • Byte: 8 bits
  • Halfword: 16 bits
  • Word: 32 bits
  • Doubleword: 64 bits
• Data types in registers, supported by the instruction set:
  • 32-bit pointers
  • unsigned or signed 32-bit integers
  • unsigned 16-bit or 8-bit integers
  • signed 16-bit or 8-bit integers
  • two 16-bit integers packed into a register
  • four 8-bit integers packed into a register
  • unsigned or signed 64-bit integers held in two registers
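The packed and two-register data types listed above can be pictured with plain C integer arithmetic. This is only a sketch of the bit layout, not of any particular ARM instruction; all function and type names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit integers packed into one 32-bit register-sized value. */
static uint32_t pack_halfwords(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Byte lane n (0 = least significant) of four packed 8-bit integers. */
static uint8_t byte_lane(uint32_t reg, unsigned n)
{
    assert(n < 4);
    return (uint8_t)(reg >> (8 * n));
}

/* A 64-bit integer held in two 32-bit registers. */
struct reg_pair {
    uint32_t lo;
    uint32_t hi;
};

static struct reg_pair split64(uint64_t v)
{
    struct reg_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join64(struct reg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```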
Core data types
• Load and store operations transfer bytes, halfwords or words to and from memory.
• The instruction set also supports instructions that transfer two or more words to and from memory.
About the Architecture (ARMv7-A)
• ARM implements typical RISC features:
  • Large and uniform register file (some ARM processors had over 60 64-bit registers)
  • Load/store architecture → data-processing operations operate only on register contents, not on memory → more uniform non-functional behavior of instructions ( //TODO: ask students why? )
  • Simple addressing modes: load/store addresses are computed from register contents and instruction fields only
About the Architecture (ARMv7-A)
• Other ARM features:
  • Instructions that combine a shift with an arithmetic or logical operation
  • Load and store multiple instructions → maximizing data throughput
    • Multiple registers can be loaded from a block of consecutive memory
  • Conditional execution of almost all instructions: used to be ARM's substitute for a branch predictor. Code is executed depending on the condition flags in the Application Program Status Register, which keeps the number of branches small and speeds up execution while saving the silicon a branch predictor would need.
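The combined shift-and-arithmetic feature from the list above is easiest to see in array indexing. The C sketch below uses invented function names, and the assembly in the comments is what a compiler targeting the ARM instruction set can typically emit; that mapping is an assumption, not a guarantee:

```c
#include <stdint.h>

/* base + index * 4 as one data-processing instruction with a shifted
 * operand, for example:  ADD r0, r0, r1, LSL #2 */
static uint32_t scaled_add(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}

/* Loading a 32-bit array element: the address comes from two registers
 * and a shift encoded in the instruction, for example:
 *   LDR r0, [r0, r1, LSL #2] */
static int32_t load_element(const int32_t *table, uint32_t index)
{
    return table[index];
}
```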
About the Architecture (ARMv7-A)
• Conditional execution example: gcd algorithm in C:
• Normal way with branches
• Better way for ARM with conditional execution feature
• BUT: Modern ARM processors DO actually have branch prediction units
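The code from this slide did not survive extraction. What follows is the commonly cited form of the gcd example, offered as a sketch and not necessarily the exact listing the slide showed. The register assignment (a in r0, b in r1) is an assumption, and the assembly appears only as comments:

```c
#include <assert.h>

/* Subtraction-based gcd for positive integers. */
int gcd(int a, int b)
{
    assert(a > 0 && b > 0);
    while (a != b) {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
}

/*
 * Normal way with branches (a = r0, b = r1):
 *
 *   gcd:   CMP   r0, r1
 *          BEQ   end
 *          BLT   less
 *          SUB   r0, r0, r1
 *          B     gcd
 *   less:  SUB   r1, r1, r0
 *          B     gcd
 *   end:
 *
 * With conditional execution:
 *
 *   gcd:   CMP   r0, r1
 *          SUBGT r0, r0, r1    ; executed only if a > b
 *          SUBLT r1, r1, r0    ; executed only if a < b
 *          BNE   gcd           ; loop until a == b
 */
```

The conditional version needs four instructions and a single branch, where the branching version needs seven instructions and four branches.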
About the Architecture (ARMv7-A)
• Core registers
  • Thirteen general-purpose 32-bit registers, R0-R12
  • Three 32-bit registers for special use: SP (stack pointer), LR (link register) and PC (program counter)
About the Architecture (ARMv7-A)
• SP: Stack pointer, points to the topmost stack element. In the ARM instruction set it could be used for things other than holding a stack pointer, but according to the manual that is likely to break stuff.
• LR: Link register, holds the address a called function should return to when it completes. More efficient than popping the return address from the stack in memory. Nice when calling a leaf routine, for example.
• PC: Program counter, reads as the address of the current instruction plus 8 bytes (in ARM state). → Legacy thing, from when the pipeline was only three stages deep.
About the Architecture (ARMv7-A)
• Application Program Status Register (APSR)
  • 32-bit register
  • Reports program status
  • Contains condition flags such as negative (N), zero (Z) and carry (C)
  • Contains an overflow flag (V)
  • Contains greater-than-or-equal (GE) flags
  • Its condition flags drive conditional execution (as explained earlier)
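To make the link between the flags and conditional execution concrete, here is a small C model. The bit positions (N = 31, Z = 30, C = 29, V = 28, GE = 19:16) are those of the ARMv7 APSR. The function names are invented, and `flags_after_cmp` only imitates what a `CMP a, b` would leave in the flags:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the APSR fields. */
#define APSR_N        (1u << 31)   /* negative */
#define APSR_Z        (1u << 30)   /* zero */
#define APSR_C        (1u << 29)   /* carry */
#define APSR_V        (1u << 28)   /* signed overflow */
#define APSR_GE_SHIFT 16           /* GE[3:0] occupy bits 19:16 */

/* Flags that CMP a, b (which computes a - b) would produce. */
static uint32_t flags_after_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    uint32_t f = 0;

    if (r & 0x80000000u)
        f |= APSR_N;
    if (r == 0)
        f |= APSR_Z;
    if (a >= b)
        f |= APSR_C;               /* carry set means no borrow */
    if (((a ^ b) & (a ^ r)) & 0x80000000u)
        f |= APSR_V;               /* operands differ in sign and the
                                      result's sign differs from a's */
    return f;
}

/* GT (signed greater than): Z clear and N equal to V. */
static bool cond_gt(uint32_t apsr)
{
    bool n = (apsr & APSR_N) != 0;
    bool z = (apsr & APSR_Z) != 0;
    bool v = (apsr & APSR_V) != 0;
    return !z && n == v;
}

/* LT (signed less than): N differs from V. */
static bool cond_lt(uint32_t apsr)
{
    bool n = (apsr & APSR_N) != 0;
    bool v = (apsr & APSR_V) != 0;
    return n != v;
}
```

An instruction with the GT suffix executes only when `cond_gt` would return true for the current flags; otherwise it behaves like a no-op.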
About the Architecture (ARMv7-A)
• Execution state registers: ISETSTATE, ITSTATE, ENDIANSTATE
• Modify execution of instructions
  → whether an instruction is interpreted as a Thumb or ARM instruction: ISETSTATE
  → whether data is interpreted as big-endian or little-endian: ENDIANSTATE
• No direct access to these registers from application-level instructions
• But they can change as a side effect of these instructions
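What ENDIANSTATE decides can be sketched in portable C: the same four bytes in memory yield different 32-bit values depending on the byte order used to interpret them. The two function names are invented for illustration:

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian word:
 * the byte at the lowest address is the least significant. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret four bytes as a big-endian word:
 * the byte at the lowest address is the most significant. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

For the bytes 0x11, 0x22, 0x33, 0x44 at increasing addresses, the little-endian view is 0x44332211 and the big-endian view is 0x11223344.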
About the Architecture (ARMv7-A)
• Execution state registers: ISETSTATE, ITSTATE, ENDIANSTATE
• ITSTATE is a register used for execution of the IT instruction; it applies to the block of up to four instructions following an IT instruction
• The IT (If-Then) instruction makes up to four following instructions conditional, each on a condition that can be true or not. IT instructions are normally generated by the assembler, because most Thumb instructions have no condition field of their own for conditional execution on the C, N, V, Z flags, so IT instructions are used instead.
• ITSTATE is divided into two subfields
  • IT[7:5]: Holds the base condition for the If-Then block: the top 3 bits of the condition code from the firstcond field of the IT instruction
  • IT[4:0]: Encodes the size of the IT block and the value of the LSB of the condition code for each instruction in the block
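How the two subfields work together can be sketched in C. The model below follows the behavior the Architecture Reference Manual describes for advancing ITSTATE, as far as that behavior is understood here: the current instruction's condition is IT[7:4], and after each instruction IT[4:0] shifts left by one until only the terminating bit has been consumed. The function names are invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Still inside an IT block while IT[3:0] is non-zero. */
static bool in_it_block(uint8_t itstate)
{
    return (itstate & 0x0Fu) != 0;
}

/* Condition code of the current instruction: IT[7:4], that is the
 * base condition IT[7:5] plus the current LSB held in IT[4]. */
static unsigned current_cond(uint8_t itstate)
{
    return itstate >> 4;
}

/* Advance to the next instruction of the block. */
static uint8_t it_advance(uint8_t itstate)
{
    if ((itstate & 0x07u) == 0)
        return 0;                          /* last instruction done */
    return (uint8_t)((itstate & 0xE0u)     /* keep base condition */
                   | ((itstate << 1) & 0x1Fu));  /* shift IT[4:0] left */
}
```

As a worked case, `ITTE EQ` (EQ = 0b0000, NE = 0b0001) starts with ITSTATE = 0x06. The three instructions then see the conditions EQ, EQ and NE, and ITSTATE returns to 0 after the third.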