STM32 Seminar: Introduction to Cortex-M Core
COMPEL/STM Seminar, November 2010
Seminar Agenda
• Overview of ST Microcontroller Portfolio
• Introduction to Cortex-M Core
• STM32 General Purpose Lines
  ♦ Product-Line Overview (F100/F101/F103)
  ♦ Walk through the main peripherals
  ♦ ST Standard Peripheral Library
  ♦ Live demonstration of the STM32 Value Discovery Kit
• STM32 Low-Power Line
  ♦ Product-Line Overview (L15x)
  ♦ Low-Power modes and consumption
  ♦ Specific Peripherals
• STM32 Connectivity Line
  ♦ Product-Line Overview (F105/7 & next)
  ♦ Ethernet & USB Host Peripherals
  ♦ Third Party Stacks
  ♦ Audio Support
• STM32 Wireless
  ♦ Product-Line Overview (W108)
  ♦ RF Performance
  ♦ Wireless Stacks (ZigBee, RF4CE, proprietary)
• STM32 Tools
  ♦ Third Party Compiler & IDE
  ♦ Boards and Debugger
  ♦ ST Libraries
Cortex-M processors
Forget traditional 8/16/32-bit classifications
• Seamless architecture across all applications
• Every product optimised for ultra low power and ease of use
  Cortex-M0: “8/16-bit” applications
  Cortex-M3: “16/32-bit” applications
  Cortex-M4: “32-bit/DSC” applications
• Binary and tool compatible: code built for Cortex-M0 also runs on Cortex-M3 and Cortex-M4
Cortex-M3 Training
Cortex-M3 Core Presentation
Cortex-M3 Processor
Hierarchical processor integrating core and advanced system peripherals
• Cortex-M3 core
  ♦ Harvard architecture
  ♦ 3-stage pipeline with branch speculation
  ♦ Thumb®-2 and traditional Thumb
  ♦ ALU with hardware divide and single-cycle multiply
• Cortex-M3 Processor
  ♦ Cortex-M3 core
  ♦ Configurable interrupt controller
  ♦ Bus matrix
  ♦ Advanced debug components
  ♦ Optional MPU & ETM (not available in STM32F10x)
Cortex-M3 Processor Overview (1/2)
• ARMv7-M Architecture
• Thumb-2 Instruction Set Architecture: mix of 16- and 32-bit instructions for very high code density
• Harvard architecture: separate I & D buses allow parallel instruction fetching & data storage
• Integrated Nested Vectored Interrupt Controller (NVIC) for low-latency interrupt processing
• Vector Table contains addresses, not instructions
• Designed to be fully programmed in C, even reset, interrupts and exceptions
• Integrated Bus Matrix and Bus Arbiter
• Bit Banding – atomic bit manipulation
• Write Buffer
• Memory Interface (I & D) plus System Interface & Private Peripheral Bus
• Integrated System Timer (SysTick) for Real Time OS or other scheduled tasks
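Because the core loads the initial stack pointer and the reset vector itself, even the reset handler can be an ordinary C function. The sketch below shows the usual C-level startup work done before main(): copying initialised data from its Flash image into SRAM and zeroing uninitialised data. It is a host-testable sketch under stated assumptions: the function name `startup_init` and its parameters are hypothetical, and on a real target the addresses and lengths would come from linker-script symbols whose names depend on the toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C-level startup step, as a reset handler would run it
 * before calling main().
 *   data_run  : run-time location of initialised globals (.data) in SRAM
 *   data_load : their initial values, stored in Flash
 *   bss       : zero-initialised globals (.bss) in SRAM
 * On a target these come from linker-script symbols (names vary by toolchain). */
void startup_init(uint32_t *data_run, const uint32_t *data_load, size_t data_words,
                  uint32_t *bss, size_t bss_words)
{
    /* Copy initialised globals from their Flash image into SRAM. */
    for (size_t i = 0; i < data_words; i++) {
        data_run[i] = data_load[i];
    }
    /* Clear .bss so that C's "static storage starts at zero" rule holds. */
    for (size_t i = 0; i < bss_words; i++) {
        bss[i] = 0u;
    }
}
```

No assembler is needed for these steps, which is what the "fully programmed in C" point refers to.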
Cortex-M3 Processor Overview (2/2)
3-Stage Pipeline: Fetch, Decode & Execute

Single Cycle Multiply
Source       Destination   Cycles
16b x 16b    32b           1
32b x 16b    32b           1
32b x 32b    32b           1
32b x 32b    64b           3-7*

*UMULL, SMULL, UMLAL and SMLAL are interruptible and can also complete early depending on source values.
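In C, the rows of the multiply table correspond to the width of the product you ask for. The helpers below are a host-testable sketch; the claim that GCC and similar compilers usually turn the cast-then-multiply pattern into a single UMULL/SMULL (and the accumulate form into UMLAL) is an assumption about typical compiler output, not a guarantee. The function names are hypothetical.

```c
#include <stdint.h>

/* 32b x 32b -> 32b: only the low 32 bits are kept (the single-cycle MUL row). */
uint32_t mul_low32(uint32_t a, uint32_t b)
{
    return a * b;
}

/* 32b x 32b -> 64b unsigned: typically compiled to UMULL on Cortex-M3. */
uint64_t mul_u64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 32b x 32b -> 64b signed: typically compiled to SMULL on Cortex-M3. */
int64_t mul_s64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* 64-bit multiply-accumulate: the UMLAL pattern. */
uint64_t mac_u64(uint64_t acc, uint32_t a, uint32_t b)
{
    return acc + (uint64_t)a * b;
}
```

Casting one operand before multiplying is what makes the product 64 bits wide; `(uint64_t)(a * b)` would multiply in 32 bits first and lose the high word.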
Hardware Division: UDIV & SDIV (unsigned or signed divide)
• Instruction takes between 2 & 12 cycles depending on dividend and divisor
• The closer the dividend and divisor are in magnitude, the faster the instruction completes
• Instruction is interruptible (abandoned/restarted)
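A division written in C maps to UDIV or SDIV. Two details from the ARMv7-M architecture (not stated on the slide) matter when porting code: the result is rounded towards zero, and a divide by zero returns 0 unless the DIV_0_TRP bit of the Configuration and Control Register is set, in which case it raises a UsageFault. The host-side model below mirrors the reset-default behaviour; `udiv_m3` and `sdiv_m3` are hypothetical names.

```c
#include <stdint.h>

/* Host model of UDIV with CCR.DIV_0_TRP = 0 (reset default):
 * a zero divisor yields 0 instead of trapping. */
uint32_t udiv_m3(uint32_t dividend, uint32_t divisor)
{
    if (divisor == 0u) {
        return 0u;
    }
    return dividend / divisor;
}

/* Host model of SDIV: rounds towards zero like C's '/', returns 0 on a zero
 * divisor, and wraps INT32_MIN / -1 to INT32_MIN. C leaves that last case
 * undefined, so it is handled explicitly instead of being computed. */
int32_t sdiv_m3(int32_t dividend, int32_t divisor)
{
    if (divisor == 0) {
        return 0;
    }
    if (dividend == INT32_MIN && divisor == -1) {
        return INT32_MIN;
    }
    return dividend / divisor;
}
```

Code that must detect division by zero on the target either checks the divisor itself or sets DIV_0_TRP so the fault handler sees it.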
Cortex-M3 & ARM7: Comparison
                    ARM7TDMI-S                        Cortex-M3
Architecture        v4T                               v7-M
ISA Support         ARM (32-bit) & Thumb (16-bit)     Thumb-2 (merged 32/16-bit)
DMIPS/MHz           0.74 Thumb / 0.93 ARM             1.25 Thumb-2
Pipeline            3-Stage                           3-Stage + Branch Speculation
Interrupts          FIQ / IRQ                         NMI, SysTick and up to 240 interrupts; integrated NVIC interrupt controller with up to 256 priority levels
Interrupt Latency   24-42 cycles (depending on LSM)   12 cycles (6 when tail-chaining)
Memory Map          Undefined                         Architecture defined
System Status       PSR, 6 modes, 20 banked regs      xPSR, 2 modes, stacked regs (1 bank)
Sleep Modes         No                                Three
LSM: a load/store multiple instruction in progress when the interrupt arrives.
Additional Features of the Cortex-M3
• Reduced-pin debug & trace interfaces reduce pin overhead from 9 pins to 2 or 3 pins
• Hardware interrupt handling removes the need for assembler code in interrupts
• Integrated atomic bit manipulation for improved data storage
• Extended Data Watchpoints & Flash Patch technology
• Embedded sleep control and power-down modes
• Optional very small Memory Protection Unit (MPU) & Embedded Trace Macrocell (ETM)
High Performance CPU and Buses
ARMv7-M Architecture: Harvard benefits with Von Neumann single memory space
• ARM7, Von Neumann “bottleneck”: single 32-bit bus for
  ♦ code execution,
  ♦ data transfer (core/DMA),
  ♦ peripheral control
• Cortex-M3: three 32-bit buses for parallel
  ♦ code execution,
  ♦ data transfer (core/DMA),
  ♦ peripheral control
[Figure: bus diagrams contrasting a single shared bus (CORE, DMA, FLASH, RAM, PERIPH) with the Cortex-M3 (CM3) separate CODE and DATA buses, and a DMIPS vs. fCPU chart comparing CORTEX-M3, ARM966 (ARM), ARM7TDMI (ARM) and ARM7TDMI (THUMB)]
Outstanding efficiency of 1.25 DMIPS/MHz and 1.2 CPI

THUMB-2
The THUMB-2 instruction set provides 32-bit performance with 16-bit code density:
• THUMB 16-bit instruction set: full THUMB compatibility
• ARM 32-bit instruction subset, for better performance
• New 16/32-bit instructions: 1-cycle MAC, hardware divide, bit handling
♦ Single powerful instruction set: no more mode switching
♦ Two 16-bit instruction fetches per Flash access
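The “bit handling” instructions added by Thumb-2 include bit-field extract and insert (UBFX, SBFX, BFI, BFC) and bit reversal (RBIT). C code usually reaches them through ordinary shifts and masks. The helpers below are a host-testable sketch with hypothetical names; whether a given compiler emits UBFX/BFI/RBIT for them is an assumption that depends on the compiler and optimisation level.

```c
#include <stdint.h>

/* Extract 'width' bits starting at bit 'lsb' (the UBFX pattern).
 * Assumes 1 <= width <= 31 and lsb + width <= 32. */
uint32_t bitfield_extract(uint32_t word, unsigned lsb, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;
    return (word >> lsb) & mask;
}

/* Replace 'width' bits at 'lsb' with the low bits of 'value' (the BFI pattern).
 * Same range assumptions as above. */
uint32_t bitfield_insert(uint32_t word, uint32_t value, unsigned lsb, unsigned width)
{
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (word & ~mask) | ((value << lsb) & mask);
}

/* Portable bit reversal: what RBIT does in a single instruction. */
uint32_t bit_reverse(uint32_t word)
{
    uint32_t result = 0u;
    for (int i = 0; i < 32; i++) {
        result = (result << 1) | (word & 1u);
        word >>= 1;
    }
    return result;
}
```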
Cortex-M3 Memory Map
• Vendor Specific (0.5GB): set aside to enable vendors to implement peripheral compatibility with previous systems
• Private Peripheral Bus (1MB): address space for system components (CoreSight, NVIC, etc.)
• External Device (1GB): intended for external devices and/or shared memory that needs ordered, non-buffered accesses
• External RAM (1GB): intended for off-chip memory
• Peripheral (0.5GB): intended for normal peripherals. The bottom 1MB of the peripheral address space (0x40000000 – 0x400FFFFF) is the bit-band region. Accesses to the 32MB peripheral bit-band alias region (0x42000000 – 0x43FFFFFF) are remapped to this 1MB.
• SRAM (0.5GB): intended for on-chip SRAM. The bottom 1MB of the SRAM address space (0x20000000 – 0x200FFFFF) is the bit-band region. Accesses to the 32MB SRAM bit-band alias region (0x22000000 – 0x23FFFFFF) are remapped to this 1MB address space.
• Code (0.5GB): reserved for code memory (Flash, SRAM). This region is accessed via the Cortex-M3 ICode and DCode buses.
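Each bit of a 1MB bit-band region maps to one 32-bit word of its 32MB alias region (1MB × 8 bits × 4 bytes = 32MB). The alias address is alias_base + byte_offset × 32 + bit_number × 4, and writing 1 or 0 to that word sets or clears the single bit atomically, with no read-modify-write sequence in software. The sketch below uses the base addresses from the memory map above; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Region bases from the Cortex-M3 memory map (names are illustrative). */
#define SRAM_BB_BASE    0x20000000u /* 1MB SRAM bit-band region        */
#define SRAM_ALIAS_BASE 0x22000000u /* 32MB SRAM bit-band alias        */
#define PERI_BB_BASE    0x40000000u /* 1MB peripheral bit-band region  */
#define PERI_ALIAS_BASE 0x42000000u /* 32MB peripheral bit-band alias  */
#define BB_REGION_SIZE  0x00100000u /* 1MB */

/* Alias word address for bit 'bit' of the location at 'addr'.
 * Returns 0 if 'addr' is outside both bit-band regions. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    if (addr >= SRAM_BB_BASE && addr < SRAM_BB_BASE + BB_REGION_SIZE) {
        return SRAM_ALIAS_BASE + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
    }
    if (addr >= PERI_BB_BASE && addr < PERI_BB_BASE + BB_REGION_SIZE) {
        return PERI_ALIAS_BASE + (addr - PERI_BB_BASE) * 32u + bit * 4u;
    }
    return 0u; /* not bit-band addressable */
}

/* On a target, setting bit 5 of a flag word at SRAM address a would be:
 *   *(volatile uint32_t *)bitband_alias(a, 5) = 1u;
 * and the bus performs the read-modify-write of that single bit atomically. */
```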
Optimized use of the RAM
• Bit banding allows optimized code and gives the highest-density use of SRAM
• Unaligned data access is supported to improve utilization of both constant data and RAM
[Figure: memory management example. A structure mixing char (8), int (16) and long (32) members laid out on a 32-bit machine which does not support unaligned data (data aligned, unused wasted space) and on Cortex-M3 (data packed, leaving free space for the rest of the application)]
Reduces SRAM Memory Requirements By Over 25%
Less Memory, Lower Cost Devices
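The structure in the example above can be reproduced in C. Below, the same four fields take 12 bytes with natural alignment and 8 bytes when packed (about 33% less for this particular struct). This is a sketch under stated assumptions: `__attribute__((packed))` is a GCC/Clang extension (other compilers use `__packed` or `#pragma pack`), the struct names are hypothetical, and Cortex-M3 tolerates unaligned addresses only for single loads/stores (LDR/STR/LDRH/STRH), not for LDM/STM/LDRD, and such accesses take extra cycles.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler pads so every member is aligned. */
struct sample_aligned {
    uint8_t  flags;   /* offset 0, followed by 3 padding bytes */
    uint32_t counter; /* offset 4 */
    uint16_t id;      /* offset 8 */
    uint8_t  status;  /* offset 10, followed by 1 trailing padding byte */
};

/* Packed layout (GCC/Clang extension): no padding, so 'counter' sits at
 * offset 1, an unaligned word that Cortex-M3 can still load with LDR. */
struct __attribute__((packed)) sample_packed {
    uint8_t  flags;
    uint32_t counter;
    uint16_t id;
    uint8_t  status;
};
```

Reordering members from largest to smallest is the portable alternative when unaligned access is not wanted.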
Debug Capabilities
• Serial Wire Debugging for optimized device pin-out: SWD needs fewer pins than JTAG, leaving more pins available for the application
• Embedded break/watch capabilities for easy debugging of flashed applications
  ♦ 8 hardware breakpoints (vs. 2 on ARM7)
  ♦ 2 hardware watchpoints
• Serial Wire Viewer for targeted low-bandwidth data trace
  ♦ Using the serial wire interface, or a dedicated bus (CKout + D[3..0]) for better bandwidth
  ♦ Triggered by embedded break and watch points
• ETM capability for better real-time debugging
  ♦ Instruction trace only
  ♦ External signal triggering capability
  ♦ Can be used in parallel with data watchpoints
• Debugging features are kept while the core is in a low-power mode
Privilege, Modes and Stacks
• Privileged/Non-privileged operation: same as ARM7 Supervisor/User
• Thread mode and Handler mode
  ♦ Handler mode: an exception or interrupt is being processed
  ♦ Thread mode: normal application code is running
• Main stack – Process stack
  ♦ Exceptions use the main stack, in privileged mode
  ♦ Applications (Thread mode) can use the process stack
Execution Modes
Cortex-M3 has 2 execution modes and 2 privilege levels (privileged and user):
• Handler mode: an exception is being processed
  ♦ Always privileged execution
• Thread mode: no exception is being processed, normal code is executing
  ♦ Could be privileged or user
When Thread mode has been changed to user, it cannot change itself back to privileged. Only a handler can change the privilege of Thread mode.
This model is a simplification of the modes from other ARM processors.
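The one-way privilege drop can be modelled with bit 0 (nPRIV) of the CONTROL register: an `MSR CONTROL` executed unprivileged is ignored, so only a handler (always privileged) can clear nPRIV again. The host model below is a sketch that covers nPRIV only and leaves out the SPSEL stack-select bit; `cpu_state`, `is_privileged` and `write_npriv` are hypothetical names. On a target, CMSIS provides `__get_CONTROL()` and `__set_CONTROL()` for the real register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV 0x1u /* CONTROL bit 0: 1 = Thread mode runs unprivileged */

typedef struct {
    bool     handler_mode; /* true while an exception is being processed */
    uint32_t control;      /* CONTROL register, 0 after reset */
} cpu_state;

bool is_privileged(const cpu_state *cpu)
{
    /* Handler mode is always privileged; Thread mode depends on nPRIV. */
    return cpu->handler_mode || (cpu->control & CONTROL_NPRIV) == 0u;
}

/* Models "MSR CONTROL, Rn" for the nPRIV bit: the write is ignored when
 * executed unprivileged, so user Thread code cannot regain privilege. */
void write_npriv(cpu_state *cpu, bool unprivileged)
{
    if (is_privileged(cpu)) {
        cpu->control = unprivileged ? (cpu->control | CONTROL_NPRIV)
                                    : (cpu->control & ~CONTROL_NPRIV);
    }
}
```

In an RTOS the usual route back is a system call: user code executes SVC, and the privileged SVCall handler decides whether to clear nPRIV.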
Stacks
Cortex-M3 supports two stacks:
• Main Stack (initialised after reset by hardware)
• Process Stack
Exceptions use the main stack.
Thread mode uses either the main or the process stack (firmware selectable).
The intended usage model is:
• OS and exceptions use the main stack
• Threads (user processes) use the process stack
This is intended to prevent user processes from modifying the main stack.
The core can be configured to use just one stack (the reset default).
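On exception entry the core puts an EXC_RETURN value in LR, and that value tells a handler which mode and which stack the interrupted code was using (for example, a fault handler that wants to inspect the stacked registers). The three values below are those defined for a Cortex-M3 without a floating-point unit; the macro, enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN values loaded into LR on exception entry (no FPU). */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u /* back to Handler mode, main stack   */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u /* back to Thread mode, main stack    */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu /* back to Thread mode, process stack */

typedef enum { STACK_MAIN, STACK_PROCESS, STACK_INVALID } stack_kind;

/* Which stack holds the interrupted context, judging from the handler's LR. */
stack_kind stacked_frame_location(uint32_t exc_return)
{
    switch (exc_return) {
    case EXC_RETURN_HANDLER_MSP:
    case EXC_RETURN_THREAD_MSP:
        return STACK_MAIN;
    case EXC_RETURN_THREAD_PSP:
        return STACK_PROCESS;
    default:
        return STACK_INVALID;
    }
}

/* True if returning from this exception resumes Thread mode. */
bool returns_to_thread(uint32_t exc_return)
{
    return exc_return == EXC_RETURN_THREAD_MSP || exc_return == EXC_RETURN_THREAD_PSP;
}
```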
Exception/Interrupt Handling
Very low latency interrupt processing:
• Exceptions processed in privileged operation
• Interruptible LDM/STM for low interrupt latency
• Automatic processor state save and restore provides low-latency ISR entry and exit
• Allows handlers to be written entirely in C
The Cortex-M3 processor integrates an advanced Nested Vectored Interrupt Controller (NVIC):
• 43 maskable interrupt channels in STM32F10x (not including the 16 Cortex-M3 system exception lines)
• 16 programmable priority levels in STM32F10x
• Allows early processing of interrupts
• Supports advanced features for next-generation real-time applications: tail-chaining of pending interrupts, late-arrival interrupt handling, and priority boosting/inversion
Exceptional Control Capabilities Through Integrated Interrupt Handling
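The automatic state save can be pictured as a fixed eight-word frame that the core pushes before the first handler instruction runs. R0-R3, R12, LR, PC and xPSR are exactly the registers a C function may clobber under the ARM procedure call standard, so a handler compiled as a normal C function preserves the rest (R4-R11) itself. The struct below is a host-testable sketch of that frame with hypothetical names; it ignores the optional padding word the core may insert to keep the stack 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Registers pushed automatically on exception entry, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* address execution resumes at on return */
    uint32_t xpsr;
} exception_frame;

/* Stacked PC, e.g. for reporting where a fault occurred. */
uint32_t stacked_pc(const exception_frame *frame)
{
    return frame->pc;
}
```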
Interrupt Response – Tail Chaining
[Figure: timing diagrams for IRQ1 (highest priority) and IRQ2 pending together. ARM7, interrupt handling in assembler code: PUSH → ISR 1 → POP → PUSH → ISR 2 → POP. Cortex-M3, interrupt handling in hardware: PUSH → ISR 1 → tail-chaining → ISR 2 → POP.]
ARM7
• 26 cycles from IRQ1 to ISR1 entered
• Up to 42 cycles if LSM (a load/store multiple instruction is executing)
• 42 cycles from ISR1 exit to ISR2 entry
• 16 cycles to return from ISR2
Cortex-M3
• 12 cycles from IRQ1 to ISR1 entered
• 12 cycles if LSM
• 6 cycles from ISR1 exit to ISR2 entry
• 12 cycles to return from ISR2
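Summing the overhead figures listed above for the two back-to-back interrupts (entry to ISR 1, switch from ISR 1 to ISR 2, return from ISR 2) gives the total time spent outside the handlers. The conversion to time assumes, for illustration only, a 72 MHz core clock (the STM32F103 maximum; the slide itself does not fix a frequency) and ignores the LSM worst case on ARM7:

```latex
\begin{aligned}
\text{ARM7:}\quad      & 26 + 42 + 16 = 84 \text{ cycles} \;\Rightarrow\; \frac{84}{72\,\text{MHz}} \approx 1.17\,\mu\text{s} \\
\text{Cortex-M3:}\quad & 12 + 6 + 12 = 30 \text{ cycles}  \;\Rightarrow\; \frac{30}{72\,\text{MHz}} \approx 0.42\,\mu\text{s}
\end{aligned}
```

Tail-chaining accounts for most of the 54-cycle difference: 42 cycles of POP plus PUSH on ARM7 become 6 cycles on Cortex-M3.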
Interrupt Response – Preemption
[Figure: timing diagrams for IRQ1 (highest priority) and IRQ2, with IRQ2 arriving while ISR 1 is returning. ARM7: ISR 1 → POP (16) → PUSH (26) → ISR 2 → POP (16), 42 cycles between the two handlers. Cortex-M3: ISR 1 → POP abandoned after 1-12 cycles → 6 cycles → ISR 2 → POP (12), 7-18 cycles between the two handlers.]
ARM7
• Load Multiple is uninterruptible, so the core must complete the POP and then the full stack PUSH
Cortex-M3
• The POP may be abandoned early if another interrupt arrives
• If the POP is interrupted, it only takes 6 cycles to enter ISR2 (equivalent to tail-chaining)
Interrupt Response – Late Arriving
[Figure: timing diagrams for ARM7 and Cortex-M3 when IRQ1 (highest priority) arrives while the stack push for IRQ2 is in progress]
ARM7
• 26 cycles to ISR2 entered
• Immediately pre-empted by IRQ1, and takes a further 26 cycles to enter ISR 1
• ISR 1 completes and then takes 16 cycles to return to ISR 2
Cortex-M3
• Stack push to ISR 2 is interrupted
• Stacking continues, but the new vector address is fetched in parallel
• 6 cycles from late arrival to ISR 1 entry
• Tail-chain into ISR 2
Interrupt Prioritization
• Each interrupt source has a 4-bit interrupt priority value
• The 4 bits are divided into pre-empting priority levels and non-pre-empting “sub-priority” levels
• The software-programmable PRIGROUP register field of the NVIC chooses how many of the 4 bits are used for “group priority” and how many for “sub-priority”
  ♦ Sub-priority levels only have an effect if the pre-empting priority levels are the same
  ♦ Group priority is the pre-empting priority
• Lower numbers are higher priority
• The hardware interrupt number is the lowest level of prioritization
  ♦ IRQ3 has higher priority than IRQ4 if their priority registers are programmed the same

PRIGROUP    Binary point     Pre-empting priority    Sub-priority
(3 bits)    (group.sub)      (group priority)
                             Bits    Levels          Bits    Levels
011         4.0  gggg        4       16              0       none
100         3.1  gggs        3       8               1       2
101         2.2  ggss        2       4               2       4
110         1.3  gsss        1       2               3       8
111         0.4  ssss        0       none            4       16

In STM32F10x, 16 levels (4 bits) of priority are implemented.
Interrupt Priority Settings Examples
PRIGROUP = 011 “gggg”: 16 groups (0-15), all with pre-emption over lower groups
PRIGROUP = 101 “ggss”: 4 groups (0-3) with 4 sub-groups each (0-3); pre-emption only across groups
PRIGROUP = 111 “ssss”: 16 sub-groups (0-15) without pre-emption over lower sub-groups
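The table and examples above can be turned into a small encoder. This host-side sketch (hypothetical names `group_bits`, `encode_priority`, `can_preempt`) builds the raw 8-bit priority register value for the 4 bits implemented in STM32F10x, which occupy the upper nibble, and checks whether one interrupt can pre-empt another. On a target, ST's Standard Peripheral Library offers `NVIC_PriorityGroupConfig()` for the grouping, and CMSIS offers `NVIC_SetPriorityGrouping()` and `NVIC_EncodePriority()` for similar work.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u /* priority bits implemented in STM32F10x */

/* Group (pre-emption) bits selected by PRIGROUP (0..7), as in the table:
 * 011 -> 4, 100 -> 3, 101 -> 2, 110 -> 1, 111 -> 0.
 * Values below 011 behave like 011 because only 4 bits are implemented. */
unsigned group_bits(unsigned prigroup)
{
    unsigned bits = 7u - (prigroup & 7u);
    return bits > PRIO_BITS ? PRIO_BITS : bits;
}

/* Raw 8-bit priority register value: group field, then sub-priority field,
 * both placed in the upper PRIO_BITS bits of the byte. */
uint8_t encode_priority(unsigned prigroup, unsigned group, unsigned sub)
{
    unsigned gbits = group_bits(prigroup);
    unsigned sbits = PRIO_BITS - gbits;
    unsigned value = ((group & ((1u << gbits) - 1u)) << sbits)
                   | (sub & ((1u << sbits) - 1u));
    return (uint8_t)(value << (8u - PRIO_BITS));
}

/* True if a pending interrupt can pre-empt a running handler: only the group
 * field is compared, and a lower number means a higher priority. */
bool can_preempt(unsigned prigroup, uint8_t pending, uint8_t running)
{
    unsigned shift = 8u - group_bits(prigroup);
    return ((unsigned)pending >> shift) < ((unsigned)running >> shift);
}
```

With PRIGROUP = 101, for instance, group 0/sub 3 pre-empts group 1/sub 0, while two group-1 interrupts never pre-empt each other; their sub-priorities only decide which pending one is served first.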
Cortex-M3 Exception Types
No.     Exception Type     Priority       Type of Priority   Description
1       Reset              -3 (highest)   fixed              Reset
2       NMI                -2             fixed              Non-Maskable Interrupt
3       Hard Fault         -1             fixed              Default fault if other handler not implemented
4       MemManage Fault    0              settable           MPU violation or access to illegal locations
5       Bus Fault          1              settable           Fault if AHB interface receives error
6       Usage Fault        2              settable           Exceptions due to program errors
7-10    Reserved           N.A.           N.A.
11      SVCall             3              settable           System Service call
12      Debug Monitor      4              settable           Break points, watch points, external debug
13      Reserved           N.A.           N.A.
14      PendSV             5              settable           Pendable request for System Service
15      SysTick            6              settable           System Tick Timer
16      Interrupt #0       7              settable           External Interrupt #0
…       …                  …              settable           …
255     Interrupt #239     246            settable           External Interrupt #239

In STM32F10x, 43 interrupts are implemented (59 vector table entries in total, including the 16 Cortex-M3 system entries).
Vector Table
• The Vector Table starts at location 0, in the code section of the memory map
• It contains addresses (vectors) of exception handlers and ISRs, not instructions like other ARM processors
• Table size (in words) = number of IRQ inputs + 16
  ♦ Minimum size (1 IRQ): 17 words
  ♦ Maximum size (240 IRQs): 256 words
• Main stack pointer initial value in location 0, set up by hardware during reset
• The Vector Table can be relocated (to SRAM), software configurable through a dedicated register in the SCB (VTOR)

Address      Vector
0x00         Initial Main SP
0x04         Reset
0x08         NMI
0x0C         Hard Fault
0x10         Memory Manage
0x14         Bus Fault
0x18         Usage Fault
0x1C-0x28    Reserved
0x2C         SVCall
0x30         Debug Monitor
0x34         Reserved
0x38         PendSV
0x3C         SysTick
0x40         IRQ0
…            More IRQs

In STM32F10x the Vector Table size is 236 bytes (59 × 4 bytes).
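The layout rules above reduce to simple arithmetic: entry n sits at n × 4, IRQn at 0x40 + 4 × IRQn, and the table holds IRQ inputs + 16 words. The sketch below (hypothetical function names) also computes the alignment a relocated table needs, assuming the ARMv7-M rule that VTOR must be aligned to the table size rounded up to a power of two, with a minimum of 128 bytes (32 words). On a target the new base is written to `SCB->VTOR` (CMSIS name).

```c
#include <stdint.h>

/* Byte offset of a vector-table entry by exception number
 * (0 = initial main SP, 1 = Reset, ..., 15 = SysTick, 16 = IRQ0). */
uint32_t vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}

/* Byte offset of the vector for external interrupt IRQn (IRQ0 is at 0x40). */
uint32_t irq_vector_offset(uint32_t irqn)
{
    return vector_offset(16u + irqn);
}

/* Table size in bytes: (number of IRQ inputs + 16) words. */
uint32_t vector_table_bytes(uint32_t num_irqs)
{
    return (num_irqs + 16u) * 4u;
}

/* Assumed VTOR alignment: table size rounded up to a power of two,
 * never less than 128 bytes. */
uint32_t vector_table_alignment(uint32_t num_irqs)
{
    uint32_t size = vector_table_bytes(num_irqs);
    uint32_t align = 128u;
    while (align < size) {
        align <<= 1;
    }
    return align;
}
```

For the 43 interrupts of STM32F10x this gives the 236-byte table quoted above, which a relocated copy would place on a 256-byte boundary.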
Power Management
“8-bit microcontroller like” power mode management
• SLEEP NOW
  ♦ “Wait for Interrupt” (WFI) instruction to enter low-power mode: no more dedicated control register setting sequence
  ♦ “Wait for Event” (WFE) instruction to enter low-power mode: no need for an interrupt to wake up from sleep
  ♦ Rapid resume from sleep
• SLEEP on EXIT
  ♦ Sleep request done in the interrupt routine
  ♦ Low-power mode entered on interrupt return
  ♦ Very fast wake-up time without context saving (6 cycles)
• DEEP SLEEP
  ♦ Long-duration sleep
  ♦ On the product side: the PLL can be stopped, or the power to digital parts of the system shut down
  ♦ Enables low power consumption
Optimized RUN mode core power consumption: 3 times less than ARM7TDMI
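The three modes are selected with two bits of the Cortex-M3 System Control Register (SCR) before executing WFI or WFE: SLEEPONEXIT (bit 1) and SLEEPDEEP (bit 2). The host-side sketch below only computes the SCR value; the enum and function names are hypothetical. On a target the value would be written to `SCB->SCR` followed by `__WFI()` (CMSIS names), and on STM32F10x further settings in ST's power controller decide what deep sleep actually switches off.

```c
#include <stdint.h>

/* SCR bits used by the modes above. */
#define SCR_SLEEPONEXIT (1u << 1) /* sleep again when the last ISR returns    */
#define SCR_SLEEPDEEP   (1u << 2) /* WFI/WFE enter deep sleep, not plain sleep */

typedef enum { SLEEP_NOW, SLEEP_ON_EXIT, DEEP_SLEEP } low_power_mode;

/* SCR value for a mode (all other SCR bits left at 0). */
uint32_t scr_for_mode(low_power_mode mode)
{
    switch (mode) {
    case SLEEP_ON_EXIT:
        return SCR_SLEEPONEXIT;
    case DEEP_SLEEP:
        return SCR_SLEEPDEEP;
    case SLEEP_NOW:
    default:
        return 0u;
    }
}
```

With SLEEPONEXIT set, an interrupt-driven application sleeps between interrupts without returning to Thread mode, which is why no context needs to be restored on wake-up.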
System Timer (SysTick)
Flexible system timer:
• 24-bit self-reloading down counter with end-of-count interrupt generation
• 2 configurable clock sources
• Suitable for Real Time OS or other scheduled tasks
In STM32F10x, the SysTick clock can be the CPU clock or the CPU clock/8 (provided externally by the Reset and Clock Control, RCC).
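Choosing the reload value is the main configuration step. Because the counter runs from the reload value down to 0, a period of N clock cycles needs a reload of N - 1, and N - 1 must fit in 24 bits. The sketch below assumes a 72 MHz CPU clock (9 MHz with the /8 option) for its examples; `systick_reload` is a hypothetical name. On a target the result goes into `SysTick->LOAD`, and CMSIS's `SysTick_Config()` performs a similar computation from a cycle count.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* largest value a 24-bit counter holds */

/* Reload value for a periodic SysTick interrupt at 'tick_hz'.
 * A period of N clock cycles needs RELOAD = N - 1.
 * Returns false if the period is too short or does not fit in 24 bits. */
bool systick_reload(uint32_t clock_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0u) {
        return false;
    }
    uint32_t period = clock_hz / tick_hz;
    if (period < 2u || period - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = period - 1u;
    return true;
}
```

A 1 ms RTOS tick fits easily from either clock source (71999 at 72 MHz, 8999 at 9 MHz), whereas a 1 s tick only fits with the CPU clock/8 source.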
Thank You !