ARM Memory - University of New Brunswickowen/courses/2253-2017/slides/04-arm-memory.pdf · ARM...

ARM Memory

Owen Kaser, CS2253

Mostly corresponds to book Chapter 5.

Overview

● Loads and Stores

● Memory Maps

● Register-Indirect Addressing

● Post- and Pre-indexed Addressing

16 Registers is Not Enough

● So far, the only places discussed for data are the ARM's CPU registers

● Most interesting programs need more data.

● We need memory outside the CPU for our bulk data storage.

● Also, memory can contain pre-computed tables (e.g., of trig functions) that are never altered.

● For your toaster's software, the machine code can be set at the factory. Fancy toaster: you can “flash” your toaster with improved software.

Loads and Stores

● Recall that ARM is a “load/store” architecture. Cannot directly do calculations on values in memory. Have to load them into a CPU register to use them as inputs.

● Similarly, calculations put results into registers. Then you can use a store instruction to put them into memory.

● Loads and stores need to specify where in memory things should go. This will be a numeric “memory address”.

● (Memory) addressing modes are small built-in calculations the CPU can do, to compute the memory address.

● Simple case: value in, say, R3 is to be used as the address.

System Memory Maps

● A system built around an ARM7TDMI processor uses 32-bit values as memory addresses. Each address would correspond to a byte (oops, octet).

● The overall “memory address space” ranges from 0 to 0xFFFFFFFF.

● But the overall memory address space is further subdivided (boundaries are often small multiples of powers of 2)

● RAM, ROM, flash, and I/O devices can be given their own subdivisions.

● More on I/O devices later in the course. For now, just realize that some memory addresses accept stores, and some ignore them.

Ex. Memory Map (extracts from book Table 5.1)

Start        End          Description
0x00000000   0x0003FFFF   On-chip flash
0x00040000   0x00FFFFFF   reserved
0x01000000   0x1FFFFFFF   ROM
0x20000000   0x20007FFF   (Static) RAM
0x4000C000   0x4000CFFF   UART 0 (a “serial port”) device
0xE0001000   0xE0001FFF   “data watchpoint and trace” (DWT) facility
0xE0004000   0xFFFFFFFF   reserved
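As a rough illustration of how such a map is used, the Python sketch below looks up which region an address falls in. The rows are copied from the extract above; the names MEMORY_MAP and region_of are hypothetical helpers, and addresses that fall between the extracted rows are simply reported as not listed.

```python
# Rows copied from the memory-map extract; the extract skips some ranges.
MEMORY_MAP = [
    (0x00000000, 0x0003FFFF, "On-chip flash"),
    (0x00040000, 0x00FFFFFF, "reserved"),
    (0x01000000, 0x1FFFFFFF, "ROM"),
    (0x20000000, 0x20007FFF, "(Static) RAM"),
    (0x4000C000, 0x4000CFFF, "UART 0"),
    (0xE0001000, 0xE0001FFF, "DWT"),
    (0xE0004000, 0xFFFFFFFF, "reserved"),
]


def region_of(address):
    """Return the description of the region containing address (hypothetical helper)."""
    for start, end, description in MEMORY_MAP:
        if start <= address <= end:
            return description
    return "not listed in extract"


# Size of the RAM region in bytes: end - start + 1.
ram_size = 0x20007FFF - 0x20000000 + 1

print(region_of(0x20000010))   # an address inside the RAM range
print(ram_size)                # 32768 bytes, i.e. 32 KiB
```

Note how the RAM boundary 0x20007FFF is one less than a power-of-2 multiple, which is what makes the region size come out as a round 32 KiB.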

For Simplicity....

● Let's only mess with addresses in a range that corresponds to RAM.

● Then, loads and stores both make sense.

Register-Indirect Addressing Mode

● Let's suppose you want to load the byte at address 0x00005000 into register R3.

● 8 bit value into a 32-bit container. If we want the 8-bit value to be zero-extended, use LDRB instruction.

● If you want it sign-extended, use LDRSB.

● Simplest case: a register stores the address of some data you care about. Let's go for R1.

● Assembler:

        MOV R1, #0x00005000 ; address to R1
        LDRB R3, [R1]       ; memory value to R3
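The difference between zero-extension and sign-extension can be sketched in Python. This is only a model: the bytearray stands in for memory, the sample byte 0x9C is an arbitrary assumption, and the helper names ldrb and ldrsb merely mirror the instruction mnemonics.

```python
memory = bytearray(0x6000)   # toy memory; the size is an assumption
memory[0x5000] = 0x9C        # sample byte with its top bit set (assumed value)


def ldrb(mem, address):
    """Model LDRB: the byte is zero-extended, so the upper 24 bits are 0."""
    return mem[address]


def ldrsb(mem, address):
    """Model LDRSB: bit 7 of the byte is copied into the upper 24 bits."""
    value = mem[address]
    if value & 0x80:
        value |= 0xFFFFFF00
    return value


r1 = 0x5000                        # MOV R1, #0x00005000
r3_zero = ldrb(memory, r1)         # LDRB  R3, [R1]
r3_sign = ldrsb(memory, r1)        # LDRSB R3, [R1]

print(hex(r3_zero))   # 0x9c
print(hex(r3_sign))   # 0xffffff9c
```

For a byte whose top bit is clear (0x00 to 0x7F), both loads give the same 32-bit result.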

Looping Through Memory

● Let's suppose you want to wipe clear (to 0) the contents of all memory locations from 0x00005000 to 0x00005FFF.

● A loop will work nicely.

        MOV R1, #0x00005000 ; starting location
        MOV R2, #0x00006000 ; when to stop
        MOV R3, #0
LP      STRB R3, [R1]       ; wipe clear current location's value
        ADD R1, R1, #1      ; advance to next location
        TEQ R1, R2          ; has R1 hit the stopping location?
        BNE LP

Speeding It Up

● If the area to be cleared is properly aligned (starts on a multiple of 4) and is the right size (a multiple of 4) we can clear out 4 consecutive addresses with one STR (store word) instruction.

● Recall that a 32-bit word is stored across 4 addresses: A, A+1, A+2, A+3.

Faster Code

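        MOV R1, #0x00005000 ; starting location
        MOV R2, #0x00006000 ; when to stop
        MOV R3, #0          ; 4 bytes of zeros
LP      STR R3, [R1]        ; wipe clear current location's value AND the next 3 locations' values
        ADD R1, R1, #4      ; advance to location of next group of 4 bytes
        TEQ R1, R2          ; has R1 hit the stopping location?
        BNE LP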

● Loop runs only ¼ as many times now.
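The ¼ claim can be checked with a small Python model of the two loops. This is a sketch under assumptions: the bytearray stands in for RAM, the 0xAA fill pattern is arbitrary, and clear_bytes and clear_words are hypothetical names mirroring the STRB and STR loops.

```python
def clear_bytes(mem, start, stop):
    """Model the STRB loop: one byte per iteration. Returns the iteration count."""
    r1, r2, r3 = start, stop, 0
    iterations = 0
    while True:
        mem[r1] = r3              # STRB R3, [R1]
        r1 += 1                   # ADD R1, R1, #1
        iterations += 1
        if r1 == r2:              # TEQ R1, R2 / BNE LP
            break
    return iterations


def clear_words(mem, start, stop):
    """Model the STR loop: four bytes per iteration. Needs word alignment."""
    assert start % 4 == 0 and (stop - start) % 4 == 0
    r1, r2 = start, stop
    iterations = 0
    while True:
        mem[r1:r1 + 4] = bytes(4)  # STR R3, [R1] with R3 = 0
        r1 += 4                    # ADD R1, R1, #4
        iterations += 1
        if r1 == r2:               # TEQ R1, R2 / BNE LP
            break
    return iterations


ram_a = bytearray(b"\xAA" * 0x7000)   # arbitrary non-zero fill
ram_b = bytearray(ram_a)

byte_iterations = clear_bytes(ram_a, 0x5000, 0x6000)
word_iterations = clear_words(ram_b, 0x5000, 0x6000)

print(byte_iterations, word_iterations)   # 4096 1024
```

Both versions leave the bytes just outside 0x5000..0x5FFF untouched, which is worth checking whenever a loop's stopping address is "one past the end".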

Even Faster

● The pattern of “use a register to provide a memory address, then update the register in preparation for the next loop” is extremely common.

● ARM designers created an addressing mode that does BOTH of these operations in a single instruction: “post-indexed” addressing.

● STR R3, [R1], #4 is equivalent to

STR R3, [R1]

ADD R1, R1, #4
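The equivalence can be modelled in Python. This is a sketch only: registers are a plain list, memory is a bytearray, and little-endian byte order is an assumption. The function names str_post and str_then_add are hypothetical.

```python
import struct


def str_post(mem, regs, rt, rn, imm):
    """Model STR Rt, [Rn], #imm: store at the OLD Rn, then Rn = Rn + imm."""
    address = regs[rn]
    struct.pack_into("<I", mem, address, regs[rt])   # little-endian assumed
    regs[rn] = (address + imm) & 0xFFFFFFFF


def str_then_add(mem, regs, rt, rn, imm):
    """Model the two-instruction version."""
    struct.pack_into("<I", mem, regs[rn], regs[rt])  # STR R3, [R1]
    regs[rn] = (regs[rn] + imm) & 0xFFFFFFFF         # ADD R1, R1, #4


mem_a = bytearray(b"\xFF" * 32)
mem_b = bytearray(mem_a)
regs_a = [0] * 16
regs_a[1] = 8            # R1 = 8 (assumed starting address); R3 = 0
regs_b = list(regs_a)

str_post(mem_a, regs_a, 3, 1, 4)
str_then_add(mem_b, regs_b, 3, 1, 4)

print(mem_a == mem_b, regs_a == regs_b, hex(regs_a[1]))   # True True 0xc
```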

Textbook Figure 5.2

Even Faster Code

        MOV R3, #0          ; 4 bytes of zeros (R1, R2 set up as before)
LP      STR R3, [R1], #4    ; wipe 4, then advance “pointer” R1
        TEQ R1, R2          ; has R1 hit the stopping location?
        BNE LP

Java Pre- vs Post-Increment

● Can draw a parallel to Java's ++ operators.

● Recall v = M[p++] in Java:

– it uses the current value of p to index M

– then it increments p (post-increment).

● Versus v = M[++p] in Java:

– it first increments p (pre-increment)

– then the new value of p is used to index into M

Post-Indexed Addressing

● In ARM, post-indexed addressing takes a base register. (Should not be R15.)

● Uses that base register's value to go to memory.

● Then updates the base register's value by a little computation:

– adding/subtracting a constant (earlier example)

– adding/subtracting a register

  ● which is allowed to be modified by the barrel shifter

  ● can be shifted/rotated by a constant amount

  ● (unlike data-processing instructions, LDR/STR cannot shift/rotate the offset register by a register amount)

● Usefulness of the fanciest of these seems doubtful.

● LDR R1, [R2], R3, ROR #8 ; is this useful???

Useful? Example

● Java, for an int array M, variable x:

        j = 0;
        while (….) {
            sum += M[j];
            j += x;
        }

● ARM: suppose x in R2, start of M in R1.

● In loop body: LDR R3, [R1], R2, LSL #2
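A Python model of this strided traversal may make the post-indexed register form concrete. It is a sketch under assumptions: the array contents, the stride x = 3, and the choice of three loop iterations are all made up for illustration (the Java loop condition is elided in the slide), and little-endian byte order is assumed.

```python
import struct


def ldr_post_scaled(mem, regs, rt, rn, rm, shift):
    """Model LDR Rt, [Rn], Rm, LSL #shift: load from old Rn, then Rn += Rm << shift."""
    address = regs[rn]
    regs[rt] = struct.unpack_from("<I", mem, address)[0]
    regs[rn] = (address + (regs[rm] << shift)) & 0xFFFFFFFF


M = [10, 20, 30, 40, 50, 60, 70, 80]        # assumed sample data
mem = bytearray(struct.pack("<8I", *M))     # M laid out as 4-byte ints from address 0

regs = [0] * 16
regs[1] = 0     # R1: start of M
regs[2] = 3     # R2: x, the stride in elements

total = 0
for _ in range(3):                          # assumed trip count; visits M[0], M[3], M[6]
    ldr_post_scaled(mem, regs, 3, 1, 2, 2)  # LDR R3, [R1], R2, LSL #2
    total += regs[3]                        # sum += M[j]

print(total, regs[1])   # 120 36
```

The LSL #2 converts the element stride x into a byte stride 4*x, which is why R1 ends at byte offset 36 after three steps of 3 elements each.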

Pre-Indexed Addressing

● There are two flavours of pre-indexed addressing. Both do a little computation and use the computed effective address to go to memory. In one, the base register is updated. Other flavour does not update.

● In assembly language, the ! symbol means to update the base register. Don't use R15 as the base register with !

● OK to use R15 as the base register without the !. The value of R15 is 8 bytes beyond the start of the current instruction. [Details of why are a bit advanced.]

Rationale for the “little computations”

● PC-relative addressing for constants

● Getting a field of an object, given the start of the object.

● Indexing into array of objects, selecting a field (if the object size is a power of two)

● (Selected largely by analyzing what compilers for HLLs would find useful, I think... rather than focussing on assembly language programmers)

Pre-indexed Figure (Textbook)

● Instruction is STR r0, [r1, #12]

● Add ! to update r1 when finished:

        STR r0, [r1, #12]! ; r1 ← 0x20c
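The two flavours of pre-indexed addressing can be modelled in Python. This is a sketch: the starting values r1 = 0x200 and r0 = 5 are assumptions chosen so that the effective address comes out as 0x20C, little-endian byte order is assumed, and str_pre is a hypothetical helper name.

```python
import struct


def str_pre(mem, regs, rt, rn, imm, writeback=False):
    """Model STR Rt, [Rn, #imm] and, with writeback=True, STR Rt, [Rn, #imm]!"""
    address = (regs[rn] + imm) & 0xFFFFFFFF        # effective address computed first
    struct.pack_into("<I", mem, address, regs[rt])
    if writeback:                                  # the ! flavour updates the base
        regs[rn] = address
    return address


mem = bytearray(0x300)
regs = [0] * 16
regs[0] = 5          # r0: value to store (assumed)
regs[1] = 0x200      # r1: base address (assumed)

ea_plain = str_pre(mem, regs, 0, 1, 12)            # STR r0, [r1, #12]
r1_after_plain = regs[1]

ea_bang = str_pre(mem, regs, 0, 1, 12, writeback=True)   # STR r0, [r1, #12]!
r1_after_bang = regs[1]

print(hex(ea_plain), hex(r1_after_plain), hex(r1_after_bang))   # 0x20c 0x200 0x20c
```

In both flavours the store goes to the same address; only the final value of the base register differs.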

Some Pre-indexed Examples

● MOV R1, #0x12345678 fails. Constant is not a rotation of an 8-bit value.

● Instead, initialize a memory location with your constant. Then use PC-relative addressing to load it.

● LDR R1, myConst ; pseudo-op

… 1000 bytes later...

myConst DCD 0x12345678

● The LDR instruction is actually something like

        LDR R1, [PC, #996] ; PC was already 8 ahead

● 996 is close enough to PC. Must be within 4 KiB.
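Both checks in this example can be sketched in Python: whether a constant fits MOV's immediate form (an 8-bit value rotated right by an even amount), and what PC-relative offset reaches the literal. The helper names are hypothetical. The LDR address 0x8000 is an assumption, as is the reading that 1000 bytes of other material sit between the end of the LDR and myConst, which is what makes the offset 996.

```python
def is_rotated_imm8(value):
    """True if value is an 8-bit number rotated right by an even amount (0..30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 0x100:
            return True
    return False


def pc_relative_offset(ldr_address, literal_address):
    """Offset for LDR Rx, [PC, #offset]; PC reads as the LDR's address + 8."""
    offset = literal_address - (ldr_address + 8)
    if not -4095 <= offset <= 4095:        # 12-bit offset magnitude
        raise ValueError("literal too far from the LDR")
    return offset


ldr_address = 0x8000                        # assumed location of the LDR
my_const_address = ldr_address + 4 + 1000   # 1000 bytes after the LDR ends (assumed reading)
offset = pc_relative_offset(ldr_address, my_const_address)

print(is_rotated_imm8(0x12345678))   # False: needs the memory-constant route
print(is_rotated_imm8(0x00005000))   # True: MOV R1, #0x00005000 is fine
print(offset)                        # 996
```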

Ex: Field Access for an Object

● In HLLs, the fields of an object occupy consecutive memory addresses (possibly with padding)

● Let's suppose that an object starts at 1000. There are two 32-bit fields, then a 16-bit halfword field that we want to load into R2.

● Let's suppose that R1 contains the starting address of the object.

● Use LDRH R2, [R1, #8] ; immediate offset is 8

(Desired field starts 8 bytes later: gotta skip over first two words.)

● (Minor point: LDRH's immediate offset is only 8 bits, so it must lie between -255 and +255.)
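The offset arithmetic can be checked with a Python model of the object layout. This is a sketch: the field values are made up, little-endian byte order and a layout with no padding are assumptions, and ldrh is a hypothetical helper mirroring the instruction.

```python
import struct

OBJECT_START = 1000                 # object's starting address, as in the example
mem = bytearray(2048)               # toy memory

# Two 32-bit fields followed by a 16-bit halfword field (values are made up).
struct.pack_into("<IIH", mem, OBJECT_START, 0x11111111, 0x22222222, 0xBEEF)


def ldrh(mem, base, offset):
    """Model LDRH Rt, [Rn, #offset]: load a zero-extended 16-bit halfword."""
    return struct.unpack_from("<H", mem, base + offset)[0]


field_offset = struct.calcsize("<II")   # bytes occupied by the two word fields
r1 = OBJECT_START
r2 = ldrh(mem, r1, field_offset)        # LDRH R2, [R1, #8]

print(field_offset, hex(r2))            # 8 0xbeef
```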

Ex: Array Access

● Suppose R1 contains the starting address of an array.

● Suppose the array's elements are 4 bytes each.

● To load the wth array element, we want address R1 + 4*w.

● Suppose value w is in R2.

● LDR R5, [R1, R2, LSL #2] loads desired value.
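The address calculation R1 + 4*w can be modelled in Python. This is a sketch: the array contents and base address 0x100 are assumptions, little-endian byte order is assumed, and ldr_scaled is a hypothetical helper name.

```python
import struct


def ldr_scaled(mem, base, index, shift):
    """Model LDR Rt, [Rn, Rm, LSL #shift]: address = Rn + (Rm << shift); Rn unchanged."""
    address = base + (index << shift)
    return address, struct.unpack_from("<I", mem, address)[0]


ARRAY_BASE = 0x100                          # assumed start of the array
values = [5, 6, 7, 8, 9]                    # assumed contents
mem = bytearray(0x200)
struct.pack_into("<5I", mem, ARRAY_BASE, *values)

r1 = ARRAY_BASE
r2 = 3                                      # w
address, r5 = ldr_scaled(mem, r1, r2, 2)    # LDR R5, [R1, R2, LSL #2]

print(hex(address), r5)                     # 0x10c 8
```

Shifting left by 2 is the multiplication by 4; this only works directly because the element size is a power of two.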

No ADR Pseudo-op

● The Crossware assembler does not seem to support ADR, which is used to put an address into a register (that you will then use as a base register). For instance, summing values in array…

MOV R0, #0 ; accumulate answer

ADR R1, MyArr ; Keil pseudo-op

ADR R2, AfterMyArr ; past last valid address

LP LDR R3, [R1], #4

ADD R0, R0, R3

TEQ R1, R2

BNE LP

MyArr DCD 34, 23, 56, 78, 12345566, ……...

AfterMyArr DCB 0

Instead of ADR

● Instead of ADR, you should be able to do the following:

MOV R0, #0 ; accumulate answer

LDR R1, =MyArr

LDR R2, =AfterMyArr ; past last valid address

LP LDR R3, [R1], #4

ADD R0, R0, R3

TEQ R1, R2

BNE LP

MyArr DCD 34, 23, 56, 78, 12345566, ……...

AfterMyArr DCD 0 ; wasted word, could avoid...

LDR As Pseudoinstruction

● LDR Rx, =value works for any 32-bit value (address or constant).

● It sets aside space in a “constant pool”, preinitialized to value. This constant pool is (by default) at the end of the current AREA.

● Then it generates machine code for a PC-relative LDR into Rx from this preinitialized location.

● Like a convenient DCD and LDR Rx, [PC, #something]

● See textbook Chapter 6.

Machine-Code Formats: LDR/STR/LDRB/STRB

● From reference manual:

Meaning of Some Bits (Ref Man)

Exercise/Example

● Determine machine code for

LDR R3, [R1], #4

and also

STRB R3, [R1, R2, LSR #5]!
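One way to check a hand-worked answer is a small Python encoder. This is a sketch based on the ARM single-data-transfer layout as I understand it from the ARM architecture reference: cond in bits 31-28, 01 in bits 27-26, then I, P, U, B, W, L in bits 25-20, Rn in 19-16, Rd in 15-12, and the offset in 11-0. The function name and its parameters are hypothetical, and the always (AL) condition code is assumed.

```python
LSL, LSR, ASR, ROR = 0, 1, 2, 3     # shift-type field values


def encode_single_transfer(load, byte, rd, rn, pre, writeback,
                           imm=None, rm=None, shift_type=LSL, shift_imm=0,
                           up=True, cond=0xE):
    """Encode LDR/STR/LDRB/STRB (hypothetical helper; cond 0xE = always)."""
    word = (cond << 28) | (0b01 << 26)
    word |= (1 if pre else 0) << 24          # P: 1 = pre-indexed, 0 = post-indexed
    word |= (1 if up else 0) << 23           # U: 1 = add offset, 0 = subtract
    word |= (1 if byte else 0) << 22         # B: 1 = byte, 0 = word
    word |= (1 if writeback else 0) << 21    # W: the ! (0 for post-indexed)
    word |= (1 if load else 0) << 20         # L: 1 = load, 0 = store
    word |= (rn << 16) | (rd << 12)
    if rm is None:
        word |= imm & 0xFFF                  # I = 0: 12-bit immediate offset
    else:
        word |= 1 << 25                      # I = 1: (shifted) register offset
        word |= (shift_imm << 7) | (shift_type << 5) | rm
    return word


# LDR R3, [R1], #4
ldr_code = encode_single_transfer(load=True, byte=False, rd=3, rn=1,
                                  pre=False, writeback=False, imm=4)

# STRB R3, [R1, R2, LSR #5]!
strb_code = encode_single_transfer(load=False, byte=True, rd=3, rn=1,
                                   pre=True, writeback=True,
                                   rm=2, shift_type=LSR, shift_imm=5)

print(hex(ldr_code))    # 0xe4913004
print(hex(strb_code))   # 0xe7e132a2
```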

Load and Store Multiple

● There are instructions LDM and STM that load or store a number of registers.

● With LDM, a bit vector in the machine code indicates which registers to load. They are loaded from consecutive word addresses.

● STM works similarly.

● They are especially useful in storing things on the runtime stack, and will be looked at when we cover that topic.
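The bit-vector idea can be sketched in Python. The model below assumes the "increment after" variant (LDMIA), in which the lowest-numbered selected register is loaded from the lowest address; the sample data, the register choices and little-endian byte order are also assumptions, and ldmia is a hypothetical helper name.

```python
import struct


def ldmia(mem, regs, rn, register_mask, writeback=False):
    """Model LDMIA Rn{!}, {reglist}: bit i of register_mask selects register Ri."""
    address = regs[rn]
    for reg in range(16):                       # lowest register gets the lowest address
        if register_mask & (1 << reg):
            regs[reg] = struct.unpack_from("<I", mem, address)[0]
            address += 4                        # consecutive word addresses
    if writeback:
        regs[rn] = address


mem = bytearray(struct.pack("<3I", 0xA, 0xB, 0xC))   # three words at addresses 0, 4, 8
regs = [0] * 16
regs[0] = 0                                          # R0: base address (assumed)

mask = (1 << 1) | (1 << 4) | (1 << 5)                # register list {R1, R4, R5}
ldmia(mem, regs, 0, mask, writeback=True)            # LDMIA R0!, {R1, R4, R5}

print(hex(regs[1]), hex(regs[4]), hex(regs[5]), regs[0])   # 0xa 0xb 0xc 12
```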
