ARM
The first encounter
Authors: Nemanja Perovic, [email protected]
Prof. Dr. Veljko Milutinovic, [email protected]

What Is ARM?
Advanced RISC Machine
First RISC microprocessor for commercial use
Market leader for low-power and cost-sensitive embedded applications

Features
Architectural simplicity
which allows
Very small implementations
which result in
Very low power consumption

The History of ARM
Developed at Acorn Computers Limited,
of Cambridge, England,
between 1983 and 1985
Problems with CISC:
Slower than contemporary memory parts
Many clock cycles per instruction

The History of ARM (2)
Solution – the Berkeley RISC I:
Competitive
Easy to develop (less than a year)
Cheap
Pointing the way to the future

ARM Architecture
Typical RISC architecture:
Large uniform register file
Load/store architecture
Simple addressing modes
Uniform and fixed-length instruction fields

ARM Architecture (2)
Enhancements:
Each instruction controls the ALU and shifter
Auto-increment and auto-decrement addressing modes
Multiple Load/Store
Conditional execution

ARM Architecture (3)
Results:
High performance
Low code size
Low power consumption
Low silicon area

Pipeline Organization
Increases speed – most instructions execute in a single cycle
Versions:
3-stage (ARM7TDMI and earlier)
5-stage (ARM8, ARM9TDMI)
6-stage (ARM10TDMI)

Pipeline Organization (2)
3-stage pipeline: Fetch – Decode – Execute
Three-cycle latency, one instruction per cycle throughput

instruction  cycle:  t       t+1     t+2      t+3      t+4
i                    Fetch   Decode  Execute
i+1                          Fetch   Decode   Execute
i+2                                  Fetch    Decode   Execute
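
The timing diagram above can be reproduced with a few lines of Python. This is only an illustrative sketch of an ideal, stall-free pipeline; the names `pipeline_schedule` and `total_cycles` are made up for this example and do not come from the slides.

```python
STAGES = ["Fetch", "Decode", "Execute"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction index: {cycle: stage}} for an ideal pipeline."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction i enters the first stage in cycle t+i
        # and advances by one stage per cycle.
        schedule[i] = {i + s: stage for s, stage in enumerate(stages)}
    return schedule

def total_cycles(num_instructions, stages=STAGES):
    """Cycles needed to finish all instructions when nothing stalls."""
    if num_instructions == 0:
        return 0
    # Latency of the first instruction, then one more cycle per instruction.
    return len(stages) + num_instructions - 1

if __name__ == "__main__":
    for i, row in pipeline_schedule(3).items():
        print(f"i+{i}", row)
    print("cycles for 3 instructions:", total_cycles(3))
```

With three stages, the first result appears after three cycles and every later instruction completes one cycle after its predecessor, which is the "three-cycle latency, one instruction per cycle throughput" stated on the slide.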

Pipeline Organization (3)
5-stage pipeline:
Reduces work per cycle => allows higher clock frequency
Separates data and instruction memory => reduction of CPI (average number of clock Cycles Per Instruction)
Stages: Fetch, Decode, Execute, Buffer/data, Write-back

Pipeline Organization (4)
Pipeline flushed and refilled on branch, causing execution to slow down
Special features in instruction set eliminate small jumps in code to obtain the best flow through pipeline

Operating Modes
Seven operating modes:
User
Privileged:
  System (version 4 and above)
  Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor

Operating Modes (2)
User mode:
Normal program execution mode
System resources unavailable
Mode changed by exception only
Exception modes:
Entered upon exception
Full access to system resources
Mode changed freely

Exceptions
Table 1 - Exception types, sorted by Interrupt Vector addresses

Exception              Mode        Priority  IV Address
Reset                  Supervisor  1         0x00000000
Undefined instruction  Undefined   6         0x00000004
Software interrupt     Supervisor  6         0x00000008
Prefetch Abort         Abort       5         0x0000000C
Data Abort             Abort       2         0x00000010
Interrupt              IRQ         4         0x00000018
Fast interrupt         FIQ         3         0x0000001C

ARM Registers
31 general-purpose 32-bit registers
16 visible, R0 – R15
Others speed up the exception process

ARM Registers (2)
Special roles:
Hardware
R14 – Link Register (LR): optionally holds return address for branch instructions
R15 – Program Counter (PC)
Software
R13 – Stack Pointer (SP)

ARM Registers (3)
Current Program Status Register (CPSR)
Saved Program Status Register (SPSR)
On exception, entering mod mode:
(PC + 4) → LR
CPSR → SPSR_mod
PC ← IV address
R13, R14 replaced by R13_mod, R14_mod
In case of FIQ mode R8 – R12 also replaced
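
The exception entry sequence above can be sketched as a small Python model. It is a deliberate simplification: the CPU is a plain dictionary, the mode is tracked in a separate `mode` field instead of in the CPSR mode bits, and the function name `enter_exception` is invented for this example. The vector addresses and target modes are taken from Table 1.

```python
# (vector address, banked-register suffix) per exception type, from Table 1.
VECTORS = {
    "reset":          (0x00000000, "svc"),
    "undefined":      (0x00000004, "und"),
    "swi":            (0x00000008, "svc"),
    "prefetch_abort": (0x0000000C, "abt"),
    "data_abort":     (0x00000010, "abt"),
    "irq":            (0x00000018, "irq"),
    "fiq":            (0x0000001C, "fiq"),
}

def enter_exception(cpu, kind):
    """Apply the three transfers listed on the slide to a dict-based CPU."""
    vector, mode = VECTORS[kind]
    # (PC + 4) -> LR of the new mode (R14_mod), as stated on the slide.
    cpu[f"R14_{mode}"] = (cpu["PC"] + 4) & 0xFFFFFFFF
    # CPSR -> SPSR_mod, so the old status can be restored on return.
    cpu[f"SPSR_{mode}"] = cpu["CPSR"]
    # Simplification: the mode is a separate field, not CPSR bits.
    cpu["mode"] = mode
    # PC <- IV address.
    cpu["PC"] = vector
    return cpu

if __name__ == "__main__":
    cpu = {"PC": 0x8000, "CPSR": 0x10, "mode": "usr"}
    print(enter_exception(cpu, "irq"))
```

Because the return address and the old CPSR land in registers that only the new mode sees, the handler can start without first saving them to memory, which is how the banked registers speed up exception processing.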

ARM Registers (4)

Mode           Visible registers
System & User  R0 – R12, R13, R14, R15 (PC), CPSR
FIQ            R0 – R7, R8_fiq – R12_fiq, R13_fiq, R14_fiq, R15 (PC), CPSR, SPSR_fiq
IRQ            R0 – R12, R13_irq, R14_irq, R15 (PC), CPSR, SPSR_irq
Supervisor     R0 – R12, R13_svc, R14_svc, R15 (PC), CPSR, SPSR_svc
Abort          R0 – R12, R13_abt, R14_abt, R15 (PC), CPSR, SPSR_abt
Undefined      R0 – R12, R13_und, R14_und, R15 (PC), CPSR, SPSR_und

Instruction Set
Two instruction sets:
ARM
Standard 32-bit instruction set
THUMB
16-bit compressed form
Code density better than most CISC
Dynamic decompression in pipeline

ARM Instruction Set
Features:
Load/Store architecture
3-address data processing instructions
Conditional execution
Load/Store multiple registers
Shift & ALU operation in single clock cycle

ARM Instruction Set (2)
Conditional execution:
Each data processing instruction prefixed by condition code
Result – smooth flow of instructions through pipeline
16 condition codes:

EQ  equal                    MI  negative          HI  unsigned higher               GT  signed greater than
NE  not equal                PL  positive or zero  LS  unsigned lower or same        LE  signed less than or equal
CS  unsigned higher or same  VS  overflow          GE  signed greater than or equal  AL  always
CC  unsigned lower           VC  no overflow       LT  signed less than              NV  special purpose
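
The condition codes in the table above are evaluated against the N (negative), Z (zero), C (carry) and V (overflow) flags in the CPSR. The sketch below shows the standard flag test behind each mnemonic; the function name `condition_passed` and the boolean-argument interface are assumptions made for illustration, and NV is left out because the slide only marks it as special purpose.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if condition `cond` holds for the given N, Z, C, V flags."""
    checks = {
        "EQ": z,                     # equal
        "NE": not z,                 # not equal
        "CS": c,                     # unsigned higher or same
        "CC": not c,                 # unsigned lower
        "MI": n,                     # negative
        "PL": not n,                 # positive or zero
        "VS": v,                     # overflow
        "VC": not v,                 # no overflow
        "HI": c and not z,           # unsigned higher
        "LS": (not c) or z,          # unsigned lower or same
        "GE": n == v,                # signed greater than or equal
        "LT": n != v,                # signed less than
        "GT": (not z) and n == v,    # signed greater than
        "LE": z or n != v,           # signed less than or equal
        "AL": True,                  # always
    }
    return bool(checks[cond])

if __name__ == "__main__":
    # Flags as left by a comparison of two equal values: Z and C set.
    print(condition_passed("EQ", n=False, z=True, c=True, v=False))
    print(condition_passed("HI", n=False, z=True, c=True, v=False))
```

An instruction whose condition fails simply does nothing for that cycle, so short if/else bodies can be written without branches and without flushing the pipeline.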

ARM Instruction Set (3)
ARM instruction set
Data processing instructions
Data transfer instructions
Software interrupt instructions
Block transfer instructions
Multiply instructions
Branching instructions

Data Processing Instructions
Arithmetic and logical operations
3-address format:
Two 32-bit operands (op1 is register, op2 is register or immediate)
32-bit result placed in a register
Barrel shifter for op2 allows full 32-bit shift within instruction cycle

Data Processing Instructions (2)
Arithmetic operations: ADD, ADC, SUB, SBC, RSB, RSC
Bit-wise logical operations: AND, EOR, ORR, BIC
Register movement operations: MOV, MVN
Comparison operations: TST, TEQ, CMP, CMN

Data Processing Instructions (3)
Condition codes
+ Data processing instructions
+ Barrel shifter
= Powerful tools for efficiently coded programs

Data Processing Instructions (4)
e.g.:
if (z==1) R1=R2+(R3*4)
compiles to
ADDEQS R1, R2, R3, LSL #2
( SINGLE INSTRUCTION ! )
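
To make the semantics of that single instruction concrete, here is a rough Python model of a conditional add with a shifted second operand. It is a sketch only: registers live in a dictionary, `add_eq_lsl` is a made-up helper name, and the flag update requested by the S suffix is left out.

```python
MASK32 = 0xFFFFFFFF

def add_eq_lsl(regs, z_flag, rd, rn, rm, shift):
    """Model 'ADD rd, rn, rm, LSL #shift' executed only when Z is set."""
    if z_flag:
        # The barrel shifter produces op2 = rm << shift before the ALU adds it.
        op2 = (regs[rm] << shift) & MASK32
        regs[rd] = (regs[rn] + op2) & MASK32
    # When the condition fails, no register is modified.
    return regs

if __name__ == "__main__":
    regs = {"R1": 0, "R2": 10, "R3": 3}
    print(add_eq_lsl(regs, True, "R1", "R2", "R3", 2))
```

The shift by 2 is the multiplication by 4 from the C-like source line, so the test, the multiply and the add all fit in one instruction.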

Data Transfer Instructions
Load/store instructions
Used to move signed and unsigned Word, Half Word and Byte to and from registers
Can be used to load PC (if target address is beyond branch instruction range)

LDR    Load Word              STR   Store Word
LDRH   Load Half Word         STRH  Store Half Word
LDRSH  Load Signed Half Word  (no signed store form; STRH is used)
LDRB   Load Byte              STRB  Store Byte
LDRSB  Load Signed Byte       (no signed store form; STRB is used)

Block Transfer Instructions
Load/Store Multiple instructions (LDM/STM)
Whole register bank or a subset copied to memory or restored with single instruction
Figure: STM copies registers R0 ... R15 to memory words Mi ... Mi+15; LDM restores them.
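
A minimal sketch of the block transfer idea, assuming an increment-after addressing order without base writeback. Memory is modelled as a dictionary keyed by byte address, and the helper names `stm_ia` and `ldm_ia` are chosen for this example only.

```python
def _ordered(reg_list):
    # The lowest-numbered register always uses the lowest address.
    return sorted(reg_list, key=lambda name: int(name[1:]))

def stm_ia(regs, memory, base, reg_list):
    """Store the listed registers to consecutive words starting at base."""
    addr = base
    for name in _ordered(reg_list):
        memory[addr] = regs[name]
        addr += 4  # one 32-bit word per register
    return addr    # address just past the last stored word

def ldm_ia(regs, memory, base, reg_list):
    """Load the listed registers from consecutive words starting at base."""
    addr = base
    for name in _ordered(reg_list):
        regs[name] = memory[addr]
        addr += 4
    return addr

if __name__ == "__main__":
    regs = {"R0": 1, "R1": 2, "R2": 3}
    memory = {}
    stm_ia(regs, memory, 0x1000, ["R0", "R1", "R2"])
    print(memory)
```

Saving and restoring a register subset this way is the usual pattern for function entry and exit, where one instruction replaces a sequence of single loads or stores.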

Swap Instruction
Exchanges a word between a register and memory
Two cycles but single atomic action
Support for real-time (RT) semaphores

Modifying the Status Registers
Only indirectly
MRS moves contents from CPSR/SPSR to selected GPR
MSR moves contents from selected GPR to CPSR/SPSR
Only in privileged modes
Figure: MRS and MSR transfers between the general-purpose registers (R0 ... R15) and CPSR/SPSR.

Multiply Instructions
Integer multiplication (32-bit result)
Long integer multiplication (64-bit result)
Built-in Multiply Accumulate Unit (MAC)
Multiply and accumulate instructions add product to running total

Multiply Instructions (2)
Instructions:

MUL    Multiply                      32-bit result
MLA    Multiply accumulate           32-bit result
UMULL  Unsigned multiply             64-bit result
UMLAL  Unsigned multiply accumulate  64-bit result
SMULL  Signed multiply               64-bit result
SMLAL  Signed multiply accumulate    64-bit result
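
The difference between the 32-bit and 64-bit results, and between the signed and unsigned forms, can be illustrated with a small arithmetic model. This is a sketch of the results only, not of timing or flags; the Python function names mirror the mnemonics for readability, and returning the 64-bit result as a `(low word, high word)` pair is an assumption of this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mla(rm, rs, rn):
    """32-bit multiply accumulate: low 32 bits of rm * rs + rn."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """Unsigned 32 x 32 -> 64 multiply, returned as (low, high) words."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def smull(rm, rs):
    """Signed 32 x 32 -> 64 multiply, returned as (low, high) words."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, (product >> 32) & MASK32

def umlal(lo, hi, rm, rs):
    """Add the unsigned product to the 64-bit running total hi:lo."""
    total = (((hi << 32) | lo) + (rm & MASK32) * (rs & MASK32)) & MASK64
    return total & MASK32, (total >> 32) & MASK32

if __name__ == "__main__":
    print([hex(w) for w in umull(0xFFFFFFFF, 2)])
    print([hex(w) for w in smull(0xFFFFFFFF, 2)])
```

The same bit pattern 0xFFFFFFFF is 4294967295 for UMULL but -1 for SMULL, so only the high word of the two results differs.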

Software Interrupt
SWI instruction
Forces CPU into supervisor mode
Usage: SWI #n
Maximum 2^24 calls
Suitable for running privileged code and making OS calls

Bits 31–28: Cond   Bits 27–24: Opcode   Bits 23–0: Ordinal
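
The field layout above can be packed and unpacked with plain bit operations. The sketch below assumes the standard ARM values for two details the slide does not spell out: the SWI opcode field is 1111 and the "always" condition is 1110. The helper names `encode_swi` and `decode_swi` are invented for this example.

```python
COND_AL = 0xE      # "always" condition (assumed standard encoding)
SWI_OPCODE = 0xF   # bits 27-24 of a SWI instruction (assumed standard encoding)

def encode_swi(ordinal, cond=COND_AL):
    """Build the 32-bit word for SWI #ordinal."""
    if not 0 <= ordinal < (1 << 24):
        raise ValueError("ordinal must fit in 24 bits")
    if not 0 <= cond <= 0xF:
        raise ValueError("cond must fit in 4 bits")
    return (cond << 28) | (SWI_OPCODE << 24) | ordinal

def decode_swi(word):
    """Return the 24-bit ordinal, as a handler would after reading the word."""
    return word & 0x00FFFFFF

if __name__ == "__main__":
    word = encode_swi(0x123456)
    print(hex(word), hex(decode_swi(word)))
```

The processor ignores the ordinal; a supervisor-mode handler typically reads the instruction word back and masks out the low 24 bits to decide which service was requested.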

Branching Instructions
Branch (B): jumps forwards/backwards up to 32 MB
Branch link (BL): same + saves (PC+4) in LR
Suitable for function call/return
Condition codes for conditional branches
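
The 32 MB reach follows from the branch encoding: a signed 24-bit word offset, i.e. 2^23 words of 4 bytes in each direction. The sketch below assumes the standard ARM encoding (bits 27–25 = 101, bit 24 = link, offset relative to the instruction address plus 8 because of prefetching); none of these details appear on the slide, and `encode_branch` is a made-up name.

```python
COND_AL = 0xE  # "always" condition (assumed standard encoding)

def encode_branch(instr_addr, target, link=False, cond=COND_AL):
    """Build the 32-bit word for B/BL from instr_addr to target."""
    # In ARM state the PC reads as the instruction address + 8.
    offset = target - (instr_addr + 8)
    if offset % 4 != 0:
        raise ValueError("target must be word aligned")
    words = offset // 4
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target out of range (about +/-32 MB)")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (words & 0xFFFFFF)

if __name__ == "__main__":
    # A branch to itself encodes an offset of -2 words.
    print(hex(encode_branch(0x0, 0x0)))
    print(hex(encode_branch(0x0, 0x8, link=True)))
```

Targets outside this window cannot be reached by B or BL, which is why loading the PC from memory is the fallback for longer jumps.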

Branching Instructions (2)
Branch exchange (BX) and Branch link exchange (BLX):
same as B/BL + exchange instruction set (ARM ↔ THUMB)
Only way to swap sets

Thumb Instruction Set
Compressed form of ARM
Instructions stored as 16-bit, decompressed into ARM instructions and executed
Lower performance (ARM 40% faster)
Higher density (THUMB saves 30% space)
Optimal – "interworking" (combining two sets) – compiler supported

THUMB Instruction Set (2)
More traditional:
No conditional execution (conditional branches only)
Two-address data processing instructions
Access to R8 – R15 restricted to MOV, ADD, CMP
PUSH/POP for stack manipulation
Descending stack (SP hardwired to R13)

THUMB Instruction Set (3)
No MSR and MRS, must change to ARM to modify CPSR (change using BX or BLX)
ARM entered automatically after RESET or entering exception mode
Maximum 256 SWI calls (8-bit ordinal)

The Next Step
New ARM Cortex family of processors
New NEON™ media and signal processing extensions
Thumb®-2 blended 16/32-bit instruction set for performance and low power
Improved interrupt handling

Summary
Adoption of ARM technology has increased efficiency and lowered costs
ARM is the world's leading architecture today
3 billion ARM Powered chips and counting