
Arm Architecture

by bharat

Page 1: Arm Architecture

ARM ARCHITECTURE

Page 2: Arm Architecture

ARM Architecture

Typical RISC architecture:
  Large uniform register file
  Load/store architecture
  Simple addressing modes
  Uniform and fixed-length instruction fields

Page 3: Arm Architecture

ARM Architecture (2)

Enhancements:
  Each instruction controls the ALU and shifter
  Auto-increment and auto-decrement addressing modes
  Multiple Load/Store
  Conditional execution

Page 4: Arm Architecture

ARM Architecture (3)

Results:
  High performance
  Low code size
  Low power consumption
  Low silicon area

Page 5: Arm Architecture

Operating Modes

Seven operating modes:
  User
  Privileged:
    System (version 4 and above)
    Exception modes: FIQ, IRQ, Abort, Undefined, Supervisor

Page 6: Arm Architecture

Operating Modes (2)

User mode:
  Normal program execution mode
  System resources unavailable
  Mode changed by exception only

Exception modes:
  Entered upon exception
  Full access to system resources
  Mode changed freely

Page 7: Arm Architecture

Exceptions

Table 1 - Exception types, sorted by Interrupt Vector addresses

Exception              Mode        Priority  IV Address
Reset                  Supervisor  1         0x00000000
Undefined instruction  Undefined   6         0x00000004
Software interrupt     Supervisor  6         0x00000008
Prefetch Abort         Abort       5         0x0000000C
Data Abort             Abort       2         0x00000010
Interrupt              IRQ         4         0x00000018
Fast interrupt         FIQ         3         0x0000001C
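Table 1 can be written down as a small lookup structure. The following is a minimal Python sketch, not ARM tooling: the names `EXCEPTIONS` and `highest_priority` are made up for illustration, and the values are copied from the table (a lower priority number wins).

```python
# Table 1 as data: exception name -> (mode entered, priority, vector address).
# A lower priority number means a higher priority.
EXCEPTIONS = {
    "Reset":                 ("Supervisor", 1, 0x00000000),
    "Undefined instruction": ("Undefined",  6, 0x00000004),
    "Software interrupt":    ("Supervisor", 6, 0x00000008),
    "Prefetch Abort":        ("Abort",      5, 0x0000000C),
    "Data Abort":            ("Abort",      2, 0x00000010),
    "Interrupt":             ("IRQ",        4, 0x00000018),
    "Fast interrupt":        ("FIQ",        3, 0x0000001C),
}


def highest_priority(pending):
    """Return the pending exception that is serviced first (hypothetical helper)."""
    return min(pending, key=lambda name: EXCEPTIONS[name][1])


if __name__ == "__main__":
    winner = highest_priority(["Interrupt", "Fast interrupt", "Data Abort"])
    mode, priority, vector = EXCEPTIONS[winner]
    print(winner, mode, hex(vector))
```

Each vector slot is one 32-bit word wide, which is why the addresses advance in steps of 4.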

Page 8: Arm Architecture

ARM Registers

31 general-purpose 32-bit registers
16 visible, R0 – R15
Others speed up the exception process

Page 9: Arm Architecture

ARM Registers (2)

Special roles:
  Hardware
    R14 – Link Register (LR): optionally holds return address for branch instructions
    R15 – Program Counter (PC)
  Software
    R13 – Stack Pointer (SP)

Page 10: Arm Architecture

ARM Registers (3)

Current Program Status Register (CPSR)
Saved Program Status Register (SPSR)

On exception, entering mod mode:
  (PC + 4) → LR
  CPSR → SPSR_mod
  PC ← IV address
  R13, R14 replaced by R13_mod, R14_mod
  In case of FIQ mode R8 – R12 also replaced
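The entry sequence above can be modelled in a few lines. This is a simplified Python sketch that follows the slide's description only: the state dictionary, the `enter_exception` name and the register keys are assumptions for illustration. Real hardware also updates the mode and interrupt-mask bits in the CPSR, and the exact value saved in LR depends on the exception type.

```python
def enter_exception(state, mod, vector):
    """Return the register state after entering exception mode `mod`.

    `state` is a dict with at least "PC" and "CPSR"; banked registers are
    stored under keys such as "R14_irq" and "SPSR_irq" (hypothetical layout).
    """
    new = dict(state)                                    # leave the caller's state untouched
    new["R14_" + mod] = (state["PC"] + 4) & 0xFFFFFFFF   # (PC + 4) -> LR of the new mode
    new["SPSR_" + mod] = state["CPSR"]                   # CPSR -> SPSR_mod
    new["PC"] = vector                                   # PC <- IV address
    new["mode"] = mod                                    # R13/R14 now mean R13_mod/R14_mod
    return new


if __name__ == "__main__":
    user = {"PC": 0x8000, "CPSR": 0x10, "mode": "usr"}
    irq = enter_exception(user, "irq", 0x18)
    print(hex(irq["R14_irq"]), hex(irq["SPSR_irq"]), hex(irq["PC"]))
```

Because the banked LR and SPSR are separate storage, the interrupted program's own R14 and CPSR survive the exception without any explicit save.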

Page 11: Arm Architecture

ARM Registers (4)

Registers visible in each mode (banked registers carry the mode suffix):

System & User: R0 – R12, R13, R14, R15 (PC), CPSR
FIQ: R0 – R7, R8_fiq – R12_fiq, R13_fiq, R14_fiq, R15 (PC), CPSR, SPSR_fiq
IRQ: R0 – R12, R13_irq, R14_irq, R15 (PC), CPSR, SPSR_irq
Supervisor: R0 – R12, R13_svc, R14_svc, R15 (PC), CPSR, SPSR_svc
Abort: R0 – R12, R13_abt, R14_abt, R15 (PC), CPSR, SPSR_abt
Undefined: R0 – R12, R13_und, R14_und, R15 (PC), CPSR, SPSR_und

Page 12: Arm Architecture

Pipeline Organization

Increases speed – most instructions executed in single cycle

Versions:
  3-stage (ARM7TDMI and earlier)
  5-stage (ARMS, ARM9TDMI)
  6-stage (ARM10TDMI)

Page 13: Arm Architecture

Pipeline Organization (2)

3-stage pipeline: Fetch – Decode – Execute
Three-cycle latency, one instruction per cycle throughput

instruction  cycle: t      t+1     t+2      t+3      t+4
i                   Fetch  Decode  Execute
i+1                        Fetch   Decode   Execute
i+2                                Fetch    Decode   Execute
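The timing diagram follows a simple rule: instruction i occupies stage s in cycle t + i + s. A short Python sketch of an ideal, stall-free pipeline makes the latency and throughput figures concrete; the function names are illustrative only.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def schedule(n_instructions):
    """Map each instruction index to {cycle offset from t: stage name}."""
    return {
        i: {i + s: name for s, name in enumerate(STAGES)}
        for i in range(n_instructions)
    }


def total_cycles(n_instructions):
    """Cycles needed to finish n instructions with no stalls or branches."""
    if n_instructions == 0:
        return 0
    # Three cycles to fill the pipeline, then one more per extra instruction.
    return len(STAGES) + n_instructions - 1


if __name__ == "__main__":
    for i, row in schedule(3).items():
        print(i, row)
    print(total_cycles(3))
```

For the three instructions in the diagram this gives five cycles (t to t+4); for long instruction runs the cost approaches one cycle per instruction.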

Page 14: Arm Architecture

Pipeline Organization (3)

5-stage pipeline:
  Reduces work per cycle => allows higher clock frequency
  Separates data and instruction memory => reduction of CPI (average number of clock Cycles Per Instruction)

Stages:
  Fetch
  Decode
  Execute
  Buffer/data
  Write-back

Page 15: Arm Architecture

Instruction Set

Two instruction sets:
  ARM
    Standard 32-bit instruction set
  THUMB
    16-bit compressed form
    Code density better than most CISC
    Dynamic decompression in pipeline

Page 16: Arm Architecture

ARM Instruction Set

Features:
  Load/Store architecture
  3-address data processing instructions
  Conditional execution
  Load/Store multiple registers
  Shift & ALU operation in single clock cycle

Page 17: Arm Architecture

ARM Instruction Set (2)

Conditional execution:
  Each data processing instruction prefixed by condition code
  Result – smooth flow of instructions through pipeline

16 condition codes:

EQ  equal
NE  not equal
CS  unsigned higher or same
CC  unsigned lower
MI  negative
PL  positive or zero
VS  overflow
VC  no overflow
HI  unsigned higher
LS  unsigned lower or same
GE  signed greater than or equal
LT  signed less than
GT  signed greater than
LE  signed less than or equal
AL  always
NV  special purpose
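Each condition code is a test on the N, Z, C and V flags held in the CPSR. The Python sketch below shows how the 4-bit condition field (bits 31 to 28 of an instruction word) selects one of those tests. The table and function names are illustrative, and the special-purpose NV code (0xF) is deliberately left out.

```python
# Condition field value -> (mnemonic, predicate over the N, Z, C, V flags).
CONDITIONS = {
    0x0: ("EQ", lambda n, z, c, v: z),
    0x1: ("NE", lambda n, z, c, v: not z),
    0x2: ("CS", lambda n, z, c, v: c),
    0x3: ("CC", lambda n, z, c, v: not c),
    0x4: ("MI", lambda n, z, c, v: n),
    0x5: ("PL", lambda n, z, c, v: not n),
    0x6: ("VS", lambda n, z, c, v: v),
    0x7: ("VC", lambda n, z, c, v: not v),
    0x8: ("HI", lambda n, z, c, v: c and not z),
    0x9: ("LS", lambda n, z, c, v: (not c) or z),
    0xA: ("GE", lambda n, z, c, v: n == v),
    0xB: ("LT", lambda n, z, c, v: n != v),
    0xC: ("GT", lambda n, z, c, v: (not z) and n == v),
    0xD: ("LE", lambda n, z, c, v: z or n != v),
    0xE: ("AL", lambda n, z, c, v: True),
}


def condition_passes(instruction, n, z, c, v):
    """Decide whether a 32-bit instruction word executes under the given flags."""
    cond = (instruction >> 28) & 0xF
    _name, predicate = CONDITIONS[cond]   # raises KeyError for NV (0xF)
    return bool(predicate(n, z, c, v))


if __name__ == "__main__":
    # 0xE....... is "always"; 0x0....... executes only when Z is set.
    print(condition_passes(0xE0000000, False, False, False, False))
    print(condition_passes(0x00000000, False, False, False, False))
```

An instruction whose condition fails still flows through the pipeline but has no effect, which is what lets short conditional sequences avoid branches.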

Page 18: Arm Architecture

ARM Instruction Set (3)

ARM instruction set:
  Data processing instructions
  Data transfer instructions
  Software interrupt instructions
  Block transfer instructions
  Multiply instructions
  Branching instructions

Page 19: Arm Architecture

Data Processing Instructions

Arithmetic and logical operations

3-address format:
  Two 32-bit operands (op1 is register, op2 is register or immediate)
  32-bit result placed in a register

Barrel shifter for op2 allows full 32-bit shift within instruction cycle

Page 20: Arm Architecture

Data Processing Instructions (2)

Arithmetic operations: ADD, ADC, SUB, SBC, RSB, RSC
Bit-wise logical operations: AND, EOR, ORR, BIC
Register movement operations: MOV, MVN
Comparison operations: TST, TEQ, CMP, CMN

Page 21: Arm Architecture

Data Processing Instructions (3)

Conditional codes
+ Data processing instructions
+ Barrel shifter
= Powerful tools for efficiently coded programs

Page 22: Arm Architecture

Data Processing Instructions (4)

e.g.:

if (Z==1) R1=R2+(R3*4)

compiles to

ADDEQS R1, R2, R3, LSL #2

( SINGLE INSTRUCTION ! )
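The single instruction above combines three things: a condition test on the Z flag, a left shift of the second operand by the barrel shifter, and an add that updates the flags. A minimal Python model, assuming registers live in a dict and flags in a second dict (both layouts and the function name are illustrative):

```python
MASK = 0xFFFFFFFF


def add_eq_s(regs, flags, rd, rn, rm, shift):
    """Model ADDEQS Rd, Rn, Rm, LSL #shift on 32-bit registers."""
    if not flags["Z"]:                 # EQ condition fails: instruction does nothing
        return
    a = regs[rn]
    op2 = (regs[rm] << shift) & MASK   # barrel shifter: Rm LSL #shift
    total = a + op2
    result = total & MASK
    regs[rd] = result
    # The S suffix updates the flags from the result of the addition.
    flags["N"] = bool(result >> 31)
    flags["Z"] = result == 0
    flags["C"] = total > MASK
    flags["V"] = bool(((a ^ result) & (op2 ^ result)) >> 31)


if __name__ == "__main__":
    regs = {"R1": 0, "R2": 10, "R3": 3}
    flags = {"N": False, "Z": True, "C": False, "V": False}
    add_eq_s(regs, flags, "R1", "R2", "R3", 2)
    print(regs["R1"], flags)
```

With R2 = 10 and R3 = 3 the result is 10 + (3 << 2) = 22. Since the S suffix then clears Z, a second ADDEQS issued straight afterwards would be skipped.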

Page 23: Arm Architecture

Data Transfer Instructions

Load/store instructions
Used to move signed and unsigned Word, Half Word and Byte to and from registers
Can be used to load PC (if target address is beyond branch instruction range)

Load                            Store
LDR    Load Word                STR   Store Word
LDRH   Load Half Word           STRH  Store Half Word
LDRSH  Load Signed Half Word    (no signed store; use STRH)
LDRB   Load Byte                STRB  Store Byte
LDRSB  Load Signed Byte         (no signed store; use STRB)

Page 24: Arm Architecture

Block Transfer Instructions

Load/Store Multiple instructions (LDM/STM)
Whole register bank or a subset copied to memory or restored with single instruction

Diagram: registers R0, R1, R2, ..., R14, R15 and memory words Mi, Mi+1, Mi+2, ..., Mi+14, Mi+15; STM copies registers to memory, LDM loads them back.
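A block transfer can be modelled as a loop over a register list. The Python sketch below assumes the increment-after addressing variant (STMIA/LDMIA), a 16-entry register list and a dict as word-addressed memory. These choices and the function names are illustrative, and base-register write-back is not modelled.

```python
def stm(regs, memory, base, reg_list):
    """Store the listed registers to consecutive words starting at `base`.

    The lowest-numbered register goes to the lowest address, whatever
    order the list is written in.
    """
    for offset, r in enumerate(sorted(reg_list)):
        memory[base + 4 * offset] = regs[r]


def ldm(regs, memory, base, reg_list):
    """Load the listed registers from consecutive words starting at `base`."""
    for offset, r in enumerate(sorted(reg_list)):
        regs[r] = memory[base + 4 * offset]


if __name__ == "__main__":
    regs = list(range(100, 116))        # R0..R15 hold 100..115
    memory = {}
    stm(regs, memory, 0x1000, [0, 1, 2, 14])
    restored = [0] * 16
    ldm(restored, memory, 0x1000, [0, 1, 2, 14])
    print(memory, restored)
```

Saving and restoring a working set on procedure entry and exit is the typical use: one STM on the way in, one LDM on the way out.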

Page 25: Arm Architecture

Swap Instruction

Exchanges a word between a register and memory
Two cycles but single atomic action
Support for RT semaphores

Diagram: register file R0, R1, R2, ..., R7, R8, ..., R15.
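The swap instruction (SWP Rd, Rm, [Rn]) reads a memory word into one register and writes another register to the same address as one indivisible operation, which is enough to build a binary semaphore. A minimal single-threaded Python model; the helper names, the register assignments and the 0 = free / 1 = taken convention are assumptions for illustration.

```python
def swp(regs, memory, rd, rm, rn):
    """Model SWP Rd, Rm, [Rn]: Rd receives the old word, memory receives Rm.

    On hardware the read and the write cannot be separated by another bus
    master; in this single-threaded model that holds trivially.
    """
    address = regs[rn]
    old = memory[address]
    memory[address] = regs[rm]
    regs[rd] = old


def try_lock(regs, memory, lock_address):
    """Try to take a binary semaphore (0 = free, 1 = taken)."""
    regs[1] = 1                      # value to write: "taken"
    regs[2] = lock_address
    swp(regs, memory, rd=0, rm=1, rn=2)
    return regs[0] == 0              # old value 0 means we acquired it


if __name__ == "__main__":
    regs = [0] * 16
    memory = {0x2000: 0}
    print(try_lock(regs, memory, 0x2000))   # first attempt
    print(try_lock(regs, memory, 0x2000))   # second attempt, lock already held
```

Whoever sees the old value 0 owns the semaphore; every later caller reads back 1 and knows it must retry.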

Page 26: Arm Architecture

Modifying the Status Registers

Only indirectly
MRS moves contents from CPSR/SPSR to selected GPR
MSR moves contents from selected GPR to CPSR/SPSR
Only in privileged modes

Diagram: general-purpose registers (R0, R1, ..., R7, R8, ..., R14, R15) exchange data with CPSR and SPSR through MRS and MSR.
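Because the status registers are reached only through a general-purpose register, changing a field is a read-modify-write sequence: MRS to read, ordinary data processing to edit, MSR to write back. The Python sketch below mirrors that sequence for the 5-bit mode field of the CPSR. The function name is illustrative; the mode encodings are the architectural values.

```python
# CPSR mode field (bits 4..0) for the seven operating modes.
MODE_BITS = {
    "User": 0x10, "FIQ": 0x11, "IRQ": 0x12, "Supervisor": 0x13,
    "Abort": 0x17, "Undefined": 0x1B, "System": 0x1F,
}
MODE_MASK = 0x1F


def set_mode(cpsr, mode):
    """Return a CPSR value with the mode field replaced and all other bits kept."""
    gpr = cpsr                         # MRS Rn, CPSR       (status register -> GPR)
    gpr &= ~MODE_MASK & 0xFFFFFFFF     # BIC Rn, Rn, #0x1F  (clear the mode field)
    gpr |= MODE_BITS[mode]             # ORR Rn, Rn, #mode  (insert the new mode)
    return gpr                         # MSR CPSR_c, Rn     (GPR -> status register)


if __name__ == "__main__":
    print(hex(set_mode(0x600000D3, "IRQ")))
```

The flag bits at the top of the word and the interrupt-mask bits pass through unchanged, which is the point of editing a copy rather than writing a constant.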

Page 27: Arm Architecture

Multiply Instructions

Integer multiplication (32-bit result)
Long integer multiplication (64-bit result)
Built in Multiply Accumulate Unit (MAC)
Multiply and accumulate instructions add product to running total

Page 28: Arm Architecture

Multiply Instructions

Instructions:

MUL    Multiply                      32-bit result
MLA    Multiply accumulate           32-bit result
UMULL  Unsigned multiply             64-bit result
UMLAL  Unsigned multiply accumulate  64-bit result
SMULL  Signed multiply               64-bit result
SMLAL  Signed multiply accumulate    64-bit result
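The difference between the 32-bit and 64-bit forms, and between signed and unsigned, is easiest to see on concrete values. The Python sketch below models the arithmetic results only (no flags, no timing). The function names reuse the mnemonics for readability, and the 64-bit results are returned as a (low word, high word) pair, mirroring the two destination registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):
    """MUL: low 32 bits of the product."""
    return (a * b) & MASK32


def mla(a, b, acc):
    """MLA: low 32 bits of product plus accumulator."""
    return (a * b + acc) & MASK32


def umull(a, b):
    """UMULL: full 64-bit unsigned product as (lo, hi)."""
    p = (a & MASK32) * (b & MASK32)
    return p & MASK32, p >> 32


def smull(a, b):
    """SMULL: full 64-bit signed product as (lo, hi)."""
    p = (to_signed32(a) * to_signed32(b)) & MASK64
    return p & MASK32, p >> 32


def umlal(lo, hi, a, b):
    """UMLAL: add the unsigned product to the 64-bit total held in (lo, hi)."""
    total = (((hi << 32) | lo) + (a & MASK32) * (b & MASK32)) & MASK64
    return total & MASK32, total >> 32


if __name__ == "__main__":
    print(hex(mul(0xFFFFFFFF, 2)))
    print([hex(w) for w in umull(0xFFFFFFFF, 0xFFFFFFFF)])
    print([hex(w) for w in smull(0xFFFFFFFF, 2)])
```

The same bit pattern 0xFFFFFFFF is 4294967295 for UMULL but -1 for SMULL, so only the high word of the result differs between the two.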

Page 29: Arm Architecture

Software Interrupt

SWI instruction
  Forces CPU into supervisor mode
  Usage: SWI #n
  Maximum 2^24 calls
  Suitable for running privileged code and making OS calls

Instruction format:
  Bits 31 – 28: Cond
  Bits 27 – 24: Opcode
  Bits 23 – 0: Ordinal
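The 24-bit ordinal field is where the limit of 2^24 calls comes from, and a supervisor-mode handler typically recovers the call number by masking the instruction word. A minimal Python sketch of the field layout; the function names are illustrative, and the opcode value 0xF in bits 27 to 24 is the ARM encoding for SWI.

```python
def encode_swi(ordinal, cond=0xE):
    """Build a 32-bit ARM SWI word (cond defaults to AL, "always")."""
    if not 0 <= ordinal < (1 << 24):
        raise ValueError("SWI ordinal must fit in 24 bits")
    return (cond << 28) | (0xF << 24) | ordinal


def decode_swi(instruction):
    """Split a 32-bit SWI word into (cond, opcode, ordinal)."""
    cond = (instruction >> 28) & 0xF      # bits 31..28
    opcode = (instruction >> 24) & 0xF    # bits 27..24
    ordinal = instruction & 0x00FFFFFF    # bits 23..0
    return cond, opcode, ordinal


if __name__ == "__main__":
    word = encode_swi(0x123456)
    print(hex(word), decode_swi(word))
```

The processor itself ignores the ordinal; it is purely a parameter for the operating system's SWI handler.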

Page 30: Arm Architecture

Branching Instructions

Branch (B): jumps forwards/backwards up to 32 MB
Branch link (BL): same + saves (PC+4) in LR
Suitable for function call/return
Condition codes for conditional branches
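The 32 MB range follows from the encoding: B and BL carry a signed 24-bit word offset, which is shifted left by two bits and added to the program counter. The Python sketch below computes the target from an instruction word. The function names are illustrative, and `instruction_address` denotes the address of the branch itself, so the link value matches the slide's (PC+4).

```python
def branch_target(instruction_address, instruction):
    """Target address of an ARM B/BL word located at `instruction_address`."""
    offset = instruction & 0x00FFFFFF      # signed 24-bit offset, counted in words
    if offset & 0x00800000:                # sign-extend negative offsets
        offset -= 1 << 24
    # During execution the PC reads as the instruction's own address + 8,
    # because two further instructions have already been fetched.
    return (instruction_address + 8 + (offset << 2)) & 0xFFFFFFFF


def link_value(instruction_address):
    """Return address saved in LR by BL: the instruction after the branch."""
    return (instruction_address + 4) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(branch_target(0x8000, 0xEAFFFFFE)))   # a branch to itself
    print(hex(link_value(0x8000)))
```

A 24-bit signed word offset spans 2^23 words in either direction, i.e. 2^25 bytes = 32 MB forwards or backwards.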

Page 31: Arm Architecture

Branching Instructions (2)

Branch exchange (BX) and Branch link exchange (BLX):
  same as B/BL + exchange instruction set (ARM ↔ THUMB)
  Only way to swap sets

Page 32: Arm Architecture

Thumb Instruction Set

Compressed form of ARM
  Instructions stored as 16-bit, decompressed into ARM instructions and executed
Lower performance (ARM 40% faster)
Higher density (THUMB saves 30% space)
Optimal – "interworking" (combining two sets) – compiler supported

Page 33: Arm Architecture

THUMB Instruction Set (2)

More traditional:
  No condition codes
  Two-address data processing instructions
  Access to R8 – R15 restricted to MOV, ADD, CMP
  PUSH/POP for stack manipulation
  Descending stack (SP hardwired to R13)

Page 34: Arm Architecture

THUMB Instruction Set (3)

No MSR and MRS, must change to ARM to modify CPSR (change using BX or BLX)
ARM entered automatically after RESET or entering exception mode
Maximum 256 SWI calls (8-bit SWI number, 0 – 255)

Page 35: Arm Architecture

The Next Step

New ARM Cortex family of processors
New NEON™ media and signal processing extensions
Thumb®-2 blended 16/32-bit instruction set for performance and low power
Improved interrupt handling

Page 36: Arm Architecture

Summary

Adoption of ARM technology has increased efficiency and lowered costs
ARM is the world’s leading architecture today
3 billion ARM Powered chips and counting

Page 38: Arm Architecture

The End

Authors: Nemanja Perovic, [email protected]. Dr. Veljko Milutinovic, [email protected]