View
9
Download
0
Category
Preview:
Citation preview
SOC Consortium Course Material
Lab2Lab2Debugging and EvaluationDebugging and Evaluation
Speaker: SunSpeaker: Sun--Rise WuRise WuDirected by Prof. Directed by Prof. TienTien--Fu ChenFu Chen
October 23, 2003National Chung Cheng University
Adopted from NCTU & NTU SOC Course Material
2SOC Consortium Course Material
Goal of This Lab
ARM Debug TargetARM Debug Target– Usage of different ARM debug architecture
Debug skillsDebug skills to be used to debug both software of processor and memory-mapped hardware design running at the target platform.Software cost estimationSoftware cost estimation– Estimation of code sizes and performance of benchmark
Profiling utilityProfiling utility– Can be used to estimate percentage time of each
function in an application
Memory configurationMemory configuration– For performance/cost trade-off
3SOC Consortium Course Material
Outline
IntroductionIntroductionARM Debug TargetARM Debug TargetDebugging SkillsDebugging SkillsSoftware Quality Measurement (Evaluation)Software Quality Measurement (Evaluation)AppendixAppendixReferenceReference
4SOC Consortium Course Material
Introduction (1/2)Introduction (1/2)
A debugger is software that enables you to make use of a debug agent in order to examine and control the execution of software running on a debug target.
The debugger issues instructions that can:1. Load software into memory on the target2. Start and stop execution of that software 3. Display contents of memory, registers, variables 4. Enable you to change stored values.5. Software Quality Measurement ( code size, performance,
profiling.. )
5SOC Consortium Course Material
Introduction (2/2)Introduction (2/2)
ARM support two methods to do debugging. GUI : ARM eXtended Debugger (AXD).DOS : ARM Symbolic Debugger (armsd).
AXD armsd
6SOC Consortium Course Material
ARM Debug Target
AXD can debug design through:– ARMulator (software)– Multi-ICE (hardware)
• use JTAG
– Angel (hardware)• Use COM port
7SOC Consortium Course Material
Multi-ICE Arch. (1/3)
Multi-ICE connection
8SOC Consortium Course Material
Multi-ICE Arch. (2/3)
Debugging software can be run on different computer through Network.
9SOC Consortium Course Material
Multi-ICE Arch. (3/3)
To support network connections, an additional application must be running on the windows workstation that runs the The multi-ICE server.
vthe portmapperallows software on other computers on the network to locate the The multi-ICE server.
10SOC Consortium Course Material
Angel (1/5)
Angel system– Debugger: Running on the host computer, giving
instructions to Angel and displaying the results obtained from it.
– Angel debug monitor: Running alongside the application being debugged on the target platform.
– Armsd: The command line must be of the form:armsd –adp –port s=1 –linespeed 38400 image.axf
11SOC Consortium Course Material
Angel (2/5)
Debug support– Reporting and modifying memory and processor status– Downloading applications to the target system– Setting breakpoints
C library semihosting support– Enabling applications linked the ARM C and C++ libraries
to make semihosting requests by SWI
Communications support– Using ADP for communicates– Providing an error-correcting communications protocol.
12SOC Consortium Course Material
Angel (3/5)
Angel’s communications diagram
13SOC Consortium Course Material
Angel (4/5)
Task management– Ensuring that only a single operation is carried out at any
time– Assigning task priorities and schedules tasks accordingly– Controlling the Angel environment processor mode
14SOC Consortium Course Material
Angel (5/5)
Enabling Angel communications to run off, or both types of interrupt.
FIQ,IRQ
Reporting the exception to the debugger, suspend the application, and pass control back to the debug
Data, PrefetchAbort
Using 3 undefined instructions to set breakpoints in code
Undefined
Installing it to support semihostingrequests , to allow applications and Angel to enter Supervisor mode
SWI
Exception handling
15SOC Consortium Course Material
Debugging SkillsDebugging Skills
Control of program execution– set breakpoints on interesting instructions– set watchpoints on interesting data accesses– single step through code
Examine and change processor state– read and write register values
Examine and change system state– access to system memory
Interleaving source code– show C/C++ code and assemble code together
16SOC Consortium Course Material
Watch / break point
Watchpoints are taken when the data being watchpointed has changed.
Breakpoints are taken when the instruction being breakpointed reaches the execution stage. the program counter is not updated, and retains the address of the breakpointed instruction.
17SOC Consortium Course Material
Software Quality MeasurementSoftware Quality Measurement
Memory requirement of the program– Data type: Volatile (RAM), non-volatile (ROM)– Memory performance: access speed, data width, size and range
Profiling– build up a picture of the percentage of time spent in each
procedure.
Performance benchmarking– Evaluate software performance prior to implement on hardware
Writing efficient C for ARM cores– ARM/Thumb interworking– Coding styles
18SOC Consortium Course Material
Application Code and Data Size
armlink offers two options to provide the relevant information:-info sizes (sizes of all objects)-info totals (summary only)============================================================Image component sizes
Code RO Data RW Data ZI Data Debug 25840 3444 0 0 104344 Object Totals22680 762 0 300 9104 Library Totals
=============================================================Code RO Data RW Data ZI Data Debug 48520 4206 0 300 113448 Grand Totals
=============================================================Total RO Size(Code + RO Data) 52726 ( 51.49kB)Total RW Size(RW Data + ZI Data) 300 ( 0.29kB)Total ROM Size(Code + RO Data + RW Data) 52726 ( 51.49kB)
=============================================================
• The size of code/data in – an ELF image can be viewed using fromelf –z– a library can be viewed using armar –sizes
19SOC Consortium Course Material
ARM and Thumb Code Size
Simple C routineif (x>=0)
return x;else
return -x;
The equivalent ARM assemblyIabs CMP r0,#0 ;Compare r0 to zero
RSBLT r0,r0,#0 ;If r0<0 (less than=LT) then do r0= 0-r0MOV pc,lr ;Move Link Register to PC (Return)
The equivalent Thumb assemblyCODE16 ;Directive specifying 16-bit (Thumb) instructions
labs CMP r0,#0 ;Compare r0 to zeroBGE return ;Jump to Return if greater or
;equal to zeroNEG r0,r0 ;If not, negate r0
return MOV pc,lr ;Move Link register to PC (Return)
20SOC Consortium Course Material
Memory Map and Size Considerations
The linker calculates the ROM and RAM requirements for code and data as follows:– ROM: Code size + RO data + RW data– RAM: RW Data + ZI data.
You may wish to copy code from ROM into faster RAM, which will also increase the RAMrequirementsPlacing the stacks in zero-wait state, 32-bit memory on-chip will significantly improve over 8 or 16-bit off-chip memory
Default memory map
ROM
RAM
21SOC Consortium Course Material
Profiling (1/3)
About Profiling:– Profiler samples the program counter and computes the
percentage time of each function spent.– Flat Profiling:
• If only pc-sampling info. is present. It can only display the time percentage spent in each function excluding the time in its children.
• Flat profiling accumulates limited information without altering the image
– Call graph Profiling: • If function call count info. is present. It can show the
approximations of the time spent in each function including the time in its children.
• Extra code is added to the image
22SOC Consortium Course Material
Profiling (2/3)
Profiling Limitations:– Profiling is NOT available for code in ROM, or for scatter
loaded images.– No data is gathered for programs that are too small.
Call graph ProfilingFlat Profiling
23SOC Consortium Course Material
Profiling (3/3)
The Profiler command syntax is as follows:armprof [-parent|-noparent] [-child|-nochild] [-sort options] prf_file
Call graph Profiling Sample OutputName cum%self%desc% calls---------------------------------------------------------------------main 17.69%60.06%1insert_sort77.76%17.69%60.06%1strcmp 60.06%0.00%243432---------------------------------------------------------------------qs_string_compare3.21%0.00%13021shell_sort3.46%0.00%14059insert_sort60.06%0.00%243432strcmp66.75%66.75%0.00%270512---------------------------------------------------------------------
cumulative
selfdescendants
calls
24SOC Consortium Course Material
Performance benchmarking (1/4)
Execution time ( real-time vs. emulated )– $sys_clock
– Execution time = Total Cycle count / Cycle Frequency
25SOC Consortium Course Material
Performance benchmarking (2/4)
When ARM processor executes program, it will change these clock types according to demand of operating.– increase performance of data access– efficient mechanism of lower power
•N-cycles (Non-sequential cycle)The ARM core requests a transfer to or from an address which is unrelated to the address used in the preceding cycle.
•S-cycles (Sequential cycle)The ARM core requests a transfer to or from an address which is either the same, or one word or one-half-word greater than the preceding address.
• I-cycles (Internal cycle or Idle cycle) The ARM core does not require a transfer, as it is performing an internal function.
•C-cycles (Coprocessor register transfer cycle)Total clock cycle = (N + S + I + C)Total clock cycle = (N + S + I + C)--cyclescycles
26SOC Consortium Course Material
Performance benchmarking (3/4)
Estimation using different Memory modelIf no map file is specified:– ARMulator will use a 4GB bank of ‘ideal’memory, i.e., no
wait states.The map file defines regions of memory, and, for each region:– The address range to which that region is mapped.– The data bus width (in bytes).– The access times for the memory region (in ns)
mapfile typically contains something like:00000000 00020000 ROM 2 R 150/100 150/10010000000 00008000 RAM 4 RW 100/65 100/65
start address, length, label, width, access, read time, write time type non-s/seq non-s/seq
27SOC Consortium Course Material
Performance benchmarking (4/4)
Benchmarking cached coresCache efficiency – Avg. memory access time = hit time +Miss rate x Miss Penalty– Cache Efficiency = Core-Cycles / Total Bus Cycles
28SOC Consortium Course Material
Writing efficient C for ARM cores
ARM/Thumb interworking– ARM : Bottleneck, interrupt handle– Thumb: others
Compiler optimization:– Space or speed (e.g, -Ospace or -Otime)– Debug or release version (e.g., -O0 ,-O1 or -O2)
– Instruction scheduling
Coding style– Variable type and size– Parameter passing– Loop termination– Division operation and modulo arithmetic
29SOC Consortium Course Material
Data Layout
Defaultchar a;short b;char c;int d;
Optimizedchar a;char c;short b;int d;
a pad b
c pad
d
a bc
d
occupies 12 bytes, with 4 bytes of padding
occupies 8 bytes, without any padding
Group variables of the same type together. This is the best way to ensure that as little padding data as possible is added by the compiler.
30SOC Consortium Course Material
Variable Types – Size Examples
wordincADD a1,a1,#1MOV pc,lr
char charinc (chara){ return a + 1;}
int wordinc(int a){ return a + 1;}
short shortinc(short a){ return a + 1;}
shortincADD a1,a1,#1MOV a1,a1,LSL #16MOV a1,a1,ASR #16MOV pc,lr
charincADD a1,a1,#1AND a1,a1,#&ffMOV pc,lr
31SOC Consortium Course Material
Stack Usage
C/C++ code uses the stack intensively. The stack is used to hold:– Return addresses for subroutines– Local arrays & structures
To minimize stack usage:– Keep functions small (few variables, less spills)minimize
the number of ‘live’variables (I.e., those which contain useful data at each point in the function)
– Avoid using large local structures or arrays (use malloc/free instead)
– Avoid recursion
32SOC Consortium Course Material
Global Data Issues
When declaring global variables in source code to be compiled with ARM Software, three things are affected by the way you structure your code:– How much space the variables occupy at run time. This
determines the size of RAM required for a program to run. The ARM compilers may insert padding bytes between variables, to ensure that they are properly aligned.
– How much space the variables occupy in the image. This is one of the factors determining the size of ROM needed to hold a program. Some global variables which are not explicitly initialized in your program may nevertheless have their initial value (of zero, as defined by the C standard) stored in the image.
– The size of the code needed to access the variables. Some data organizations require more code to access the data. As an extreme example, the smallest data size would be achieved if all variables were stored in suitably sized bitfields, but the code required to access them would be much larger.
33SOC Consortium Course Material
Loop termination
…int acc(int n) {int i; //loop indexint sum=0;
for (i=1; i<=n ;i++)for (i=1; i<=n ;i++)sum+=i;
return sum;
}…
…int acc(int n) {int i; //loop indexint sum=0;
for (i=n; i!=0 ;ifor (i=n; i!=0 ;i----))sum+=i;
return sum;
}…
loop.c loop_opt.c
34SOC Consortium Course Material
Division operation and modulo arithmetic
The remainder operator ‘%’is commonly used in modulo arithmetic.– This will be expensive if the modulo value is not a power of two– This can be avoid by rewriting C code to use if () statement heck
unsigned counter1 (unsigned counter){ return (++counter % 60);}=============================counter1
STMFB sp!, {lr}ADD r1, r0, #1MOV r0, #0x3CBL __rt_udivMOV r0, r1LDMIA sp!, {pc}
unsigned counter2 (unsigned counter){ if (++counter >= 60)
counter=0;return counter
}=============================counter2
ADD r0, r0, #1CMP r0, #0x3CMOVCS r0, #0MOV pc, lr
modulo.c modulo_opt.c
35SOC Consortium Course Material
AppendixAppendix
Content of JTAGContent of Embedded ICE
36SOC Consortium Course Material
Content of JTAG (I)
JTAG Arch.– Serial scan path from one cell to another – Controlled by TAP controller
37SOC Consortium Course Material
Content of JTAG (II)
Core logic
Device ID reg
Instruction reg
Bypass reg
TAP controller
out
I/O
TDOTDI
TMS
TCK
TRST
enable
in
In enable
38SOC Consortium Course Material
Content of JTAG (III)
System ground reference (All VSS pins should be con-nected
VSS2, 4, 6, 8,10,14
System powered up, pin connected to Vdd through a 33 ohm resistor
SPU13
Target System Reset (sometimes referred to nSYSRST or nRSTOUT)
nICERST12
Test data outTDO11
Test clockTCK9
Test mode selectTMS7
Test data inTDI5
Test reset, active lownTRST3
System powered up, pin connected to Vdd through a 33 ohm resistor
SPU1
FunctionNamePin
•Boldface represents that these pins are JTAG Signals
39SOC Consortium Course Material
Content of Embedded ICE (I)
Debug extensions to the ARM core– The extensions consist of a number of scan chains
around the processor core and some additional signals that are used to control the behavior of the core for debug purposes :
• BREAKPT: enables external hardware to halt processor execution for debug purposes.active high
• DBGRQ: is a level-sensitive input that causes the CPU to enter debug state when the current instruction has completed.
• DBGACK: is an output from the CPU that goes high when the core is in debug state
40SOC Consortium Course Material
Content of Embedded ICE (II)
The EmbeddedICE logic– This logic is the integrated onchip logic that provides
JTAG debug support for ARM core.
– This logic is accessed through the TAP controller on the ARM core using the JTAG interface. Consists of:
• Two watchpoint units• A control register• A status register• A set of registers implementing the Debug Communications
Channel link
41SOC Consortium Course Material
Reference
Profiling: “Application Note 93: Benchmarking with ARMulator”Efficient C programming: “Application Note 34: Writing Efficient C for ARM”Multi-ICE.pdfADS_DebuggersGuide.pdfADS_GettingStarted.pdfAFS_Referece_Guide.pdfUsing EmbeddedICE.pdf
Recommended