8/13/2019 Intel Optimization of PXA27x
1/14
White Paper
Optimization Technology for the Intel PXA27x Processor Family
Performance and Power Savings for Wireless System and
Application Development
Table of Contents
1. Introduction
2. Optimizing the Intel PXA27x Processor's Performance via the BSP
2.1 Optimizing for On-chip SRAM
2.2 Optimizing for the Enhanced Memory Subsystem
2.3 Optimizing for the Bus Transaction Arbiter
3. Intel Wireless MMX Technology
3.1 Enabling Intel Wireless MMX Technology
3.2 Writing Intel Wireless MMX Technology Code
4. Intel Quick Capture Technology
5. Wireless Intel SpeedStep Technology
6. Intel Integrated Performance Primitives (Intel IPP)
7. Intel Software Development Tools
7.1 Compilers
7.2 Debuggers
8. Intel VTune Performance Analyzer
9. Intel PCA Developer Network
10. Summary
Appendix A. Overview of Key System Optimizations
1. Introduction
The Intel Personal Internet Client Architecture (Intel PCA)
PXA27x processor family offers developers a new generation of
ultra-low power and industry-leading multimedia performance
on silicon. Intel has integrated a host of new features in the
Intel PXA27x processor family to enable this level of power
and performance, including:
Intel Wireless MMX technology
Wireless Intel SpeedStep technology
Intel Quick Capture technology
Up to 624MHz core speed
Enhanced memory subsystem
Intelligent bus transaction arbiter
256K of on-chip SRAM
Intel's complete suite of development components is designed
to help customers take full advantage of cutting-edge
technologies and get the best power and performance from
mobile devices. These technologies also allow Independent
Software Vendors (ISVs) to fully tune their applications.
This paper describes a typical development cycle and how to
take advantage of the optimization technologies available from
Intel. The key features this paper addresses are:
Operating System Board Support Packages (BSPs): Intel provides
BSPs for a variety of operating systems including Linux*,
Microsoft Windows* Mobile for PocketPCs and Smartphones,
Microsoft Windows* CE .NET, Palm* OS, and Symbian* OS. The BSPs
include the latest optimizations and drivers for the Intel
PXA27x processor family and make it easy for customers to
create a BSP customized for their own mobile device.
Intel Wireless MMX Technology: an advanced set of
multimedia instructions that brings desktop-like multimedia
performance to Intel PXA27x processor-based clients, while
minimizing the power needed to run rich applications.
Wireless Intel SpeedStep Technology: allows customers
to dynamically adjust the power and performance of the
processor based on CPU demand. This can significantly
decrease power consumption in wireless handheld devices.
Intel Quick Capture Technology: provides the ability to
get live video and high-quality still images from a wide range
of camera sensors in current and future camera-enabled
mobile handsets and PDAs.
Intel Integrated Performance Primitives (Intel IPP):
a cross-platform software library that allows users to write
optimized applications that utilize Intel Wireless MMX technology
to maximize performance on the Intel PXA27x processor.
Intel Software Development Tools (Intel SDT): provides both
an optimizing compiler and a set of sophisticated, high-level
language debuggers to help software run at top speed.
Intel VTune Analyzer: lets users profile applications for
hotspots of activity. A tuning assistant provides support to
optimize C/C++ code and/or assembler sequences.
Intel PCA Developer Network: provides information on
third-party software applications that are already optimized,
as well as optimization labs and support to answer
questions. With over 1,000 companies and over 3,000
different software and hardware solutions, the Intel PCA
Developer Network can help customers find value-add
solutions for mobile devices.
This paper introduces Intel optimization technologies and
addresses how each fits into a typical development cycle consisting
of iterations of coding, optimizing, and profiling. Devices that take
advantage of these optimizations achieve significant performance
improvements and power savings over those that do not.
Applications that take advantage of these optimizations will run
faster and more efficiently on the Intel PXA27x processor-based
devices. Pointers to additional resources that provide more
detailed information on each technology are provided at the
end of this paper.
2. Optimizing the Intel PXA27x Processor's Performance via the BSP
To ensure that designs and applications take full advantage of
the technology in the Intel PXA27x processor, OEMs and ODMs
should make sure that the device BSP supports these features.
Intel provides BSPs for a variety of operating systems including
Linux, Microsoft Windows Mobile for PocketPCs and
Smartphones, Microsoft Windows CE .NET, Palm OS, and
Symbian OS. The BSPs include the latest optimizations and
drivers for the Intel PXA27x processor and make it easy for
customers to create a customized BSP.
The BSPs for the Intel PXA27x processor contain an extensive
number of optimizations, the latest versions of which can be
obtained from Intel field sales representatives. While a full list
of the available optimized drivers is beyond the scope of this
paper, several key optimizations for the Intel PXA27x processor
are described here, including:
Optimizing for 256K of on-chip SRAM
Enabling and utilizing the enhanced memory subsystem
Taking advantage of the bus-transaction arbiter
2.1 Optimizing for On-chip SRAM
The internal SRAM can be used for frame buffers as well as
storage of variables or data to be processed. The SRAM has a
fast access time, and is powered from the VCC_SRAM domain,
offering both lower power and higher performance than using
external memory.
Example: a 320x240x16 bit-per-pixel frame buffer consumes
154K of memory, allowing the rest to be used for temporary
storage of MPEG-4* video buffers, a Java* virtual machine
heap, incoming data from the Intel Quick Capture camera
interface, executable code, streamed data, or other variables
that need to be accessed quickly.
The SRAM consists of four independently controllable
64K banks. When entering sleep or deep-sleep mode, one
or more banks can remain powered on. This retains the OS
state so context can be restored quickly upon wakeup from
those modes.
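The frame-buffer arithmetic in the example above can be checked with a short calculation. This is only a sketch: it assumes "256K" and "64K" mean binary kilobytes (1,024 bytes) and that the frame buffer is placed contiguously from the start of the first bank. The names `SRAM_BYTES`, `BANK_BYTES`, and `frame_buffer_bytes` are illustrative and do not come from any Intel BSP.

```python
import math

SRAM_BYTES = 256 * 1024   # on-chip SRAM size (assumed binary kilobytes)
BANK_BYTES = 64 * 1024    # one of four independently controllable banks


def frame_buffer_bytes(width, height, bits_per_pixel):
    """Return the size in bytes of an uncompressed frame buffer."""
    return width * height * bits_per_pixel // 8


fb = frame_buffer_bytes(320, 240, 16)        # 320x240 at 16 bits per pixel
remaining = SRAM_BYTES - fb                  # SRAM left for other data
banks_touched = math.ceil(fb / BANK_BYTES)   # banks the buffer spans

print(f"frame buffer: {fb} bytes")
print(f"remaining SRAM: {remaining} bytes")
print(f"banks spanned: {banks_touched}")
```

The 153,600-byte result matches the "154K" figure quoted above when K is read as 1,000 bytes. Under these assumptions the buffer spans three of the four banks, which matters when deciding which banks to keep powered in sleep modes.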
2.2 Optimizing for the Enhanced Memory
Subsystem
The Intel PXA27x processor family enhances and adds flexibility
to the bus settings of the Intel PXA255 processor family, which
supported a 200-MHz system bus at a core speed of 400MHz.
With fast-bus mode enabled (CLKCFG[B] bit set to one), the
internal system bus in the Intel PXA27x processor can run at
up to 208MHz, and this mode is available at many other product
points, including 312 and 208MHz. As a result, applications on
the Intel PXA27x processor can offer better performance at the
lower frequency settings.
The memory controller can also run at greater speeds
than before when the CCCR[A] bit is set to one. This helps
reduce latency and increase bandwidth to memory, offering
better system performance.
2.3 Optimizing for the Bus Transaction Arbiter
The bus arbiter in the Intel PXA27x processor performs the
arbitration for internal-bus-access transactions, which is
programmable through the ARB_CNTRL register. The Intel
PXA27x processor system bus supports six clients: the core,
the DMA controller, the LCD controller, the USB host controller,
and both an internal and external memory controller. Customers
can program priority weights for each of these clients via the
arbiter-control register, which enables fine-tuning of device
performance based on the typical usage model for that device.
Example: a device is designed to encode MPEG-4 video using the
Intel Quick Capture interface in the Intel PXA27x processor
and stream it over USB Host to an attached USB Client.
Assigning higher priority weights to the core and the USB host
controller improves the performance of processing the MPEG-4
video stream (CPU intensive) and of transmitting it via USB Host.
Customers are encouraged to test the performance benefits of
applying different priority weights to these clients based on the
supported usage model.
Further performance can be gained by speculatively parking
a specific client on the arbiter. This means that the arbiter will
always start with that client when internal-bus-access
transactions are performed. Setting the arbiter-control register
to park on the core often results in the best performance.
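To build intuition for priority weights, the toy model below treats each client's weight as its proportional share of bus grants when every client is requesting at once. This is an assumption made purely for illustration: the actual arbitration algorithm and the ARB_CNTRL field layout are defined in the Intel PXA27x Processor Developer Manual and are not reproduced here. The client names and weight values are hypothetical.

```python
# The six system-bus clients named in the text (labels are illustrative).
CLIENTS = ("core", "dma", "lcd", "usb_host", "internal_mem", "external_mem")


def grant_shares(weights):
    """Return each client's fraction of grants under full contention.

    Assumes a simple proportional-share model, which may differ from
    the real hardware arbiter.
    """
    total = sum(weights.values())
    return {client: weight / total for client, weight in weights.items()}


# Baseline: every client has the same (hypothetical) weight.
equal = grant_shares({client: 1 for client in CLIENTS})

# MPEG-4 encode and USB streaming use case: favor the core and USB host.
boosted_weights = {client: 1 for client in CLIENTS}
boosted_weights["core"] = 3
boosted_weights["usb_host"] = 3
boosted = grant_shares(boosted_weights)

for client in CLIENTS:
    print(f"{client:13s} equal={equal[client]:.2f} boosted={boosted[client]:.2f}")
```

In this model, raising two weights from 1 to 3 moves the core and USB host from one sixth of the grants each to three tenths each. Measured results on real hardware should take precedence over any such estimate.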
This is only an overview of the key features that should be in
your BSP or in your application. Other features are listed in
the other sections of this paper.
For more information, consult the following documentation:
The Intel PXA27x Processor Optimization Guide
The Intel PXA27x Processor Developer Manual, Volume I of III
The Intel PXA27x Processor Developer Manual, Volume II of III
The Intel PXA27x Processor Developer Manual, Volume III of III
3. Intel Wireless MMX Technology
Introduced in 2003, Intel Wireless MMX technology is an advanced
set of multimedia instructions that help bring desktop-like
multimedia performance to Intel PXA27x processor-based clients,
while minimizing the power needed to run rich applications.
3.2 Writing Intel Wireless MMX Technology Code
After Intel Wireless MMX technology is enabled, there are
several usage options to consider:
Write directly in assembly language, or use inline assembler:
this offers the most flexibility, but requires more effort and
maintenance than other options.
Use the Intel C/C++ Compiler, either through intrinsics that
support Intel Wireless MMX technology or through its
vectorizer feature.
Link to Intel Integrated Performance Primitives, which
are pre-optimized libraries that provide a high level of
abstraction to jump-start multimedia and signal
processing-based applications.
Figure 2 shows the relative performance benefits
of Intel Wireless MMX technology over scalar code. This test
decodes an MPEG-4 CIF resolution video clip and an MP3
file simultaneously.
Actual benchmarks were run on a Mainstone I system (main
board rev 1.1 ECO B, Rev 2 daughtercard, ECO D with 2.5 volt
VCC_MEM) with the Intel PXA27x processor A1 stepping running
at the speeds indicated in the graph. The 208-MHz measurements
were made in the processor Run mode, and measurements at all
other frequencies were made in Turbo mode. The system bus was
104MHz for 208, 312, 416 and 520MHz core frequencies. This
platform represents a bare metal system with no operating
system installed. The MPEG-4 decoder is implemented with the
Intel IPP library optimized for Intel Wireless MMX technology, and
the MPEG-4 content is the CIF resolution video clip Coastguard
in portrait mode.
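The relative benefit can be quantified from the frame rates legible in Figure 2. The sketch below assumes that the higher value in each pair is the Intel Wireless MMX result and the lower value is the ARM v5TE-only result, which is how the legend order reads. The dictionary name `FIGURE2_FPS` is illustrative.

```python
# Frame rates (fps) read from Figure 2, keyed by core frequency in MHz.
# Each pair is (Intel Wireless MMX, ARM v5TE instructions only); the
# pairing is an assumption based on the legend order.
FIGURE2_FPS = {
    208: (51.0, 32.0),
    312: (66.0, 42.0),
    416: (75.0, 50.0),
}


def speedup(wmmx_fps, scalar_fps):
    """Return the ratio of the SIMD frame rate to the scalar frame rate."""
    return wmmx_fps / scalar_fps


speedups = {mhz: round(speedup(*pair), 2) for mhz, pair in FIGURE2_FPS.items()}

for mhz, ratio in speedups.items():
    print(f"{mhz} MHz: {ratio:.2f}x")
```

Under that reading, the speedup is between about 1.5x and 1.6x at all three frequencies for this combined MPEG-4 and MP3 decode workload.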
For more details on optimizing your application for Intel Wireless
MMX technology, reference the following documentation:
The Intel PXA27x Processor Optimization Guide
The book Programming with Intel Wireless MMX
Technology, available from Amazon.com (www.amazon.com),
is a comprehensive programming guide for Intel Wireless
MMX technology and an invaluable resource for this exciting
technology. Pre-order at
http://www.amazon.com/exec/obidos/tg/detail/-/0974364916
4. Intel Quick Capture Technology
With a growing number of PDAs and cell phones that include
digital cameras, the Intel PXA27x processor introduces Intel
Quick Capture technology. This provides high-performance
and low-power solutions for still and video image capture
and playback.
Intel Quick Capture Technology includes the following key
components:
Highly flexible Intel Quick Capture Interface
Hardware color-space conversion
The Intel Quick Capture interface eases the connection between
the Intel PXA27x processor and CMOS and some CCD sensors. It is
a 4-, 5-, 8-, 9-, or 10-bit wide bus with control and clock lines
that can be used in master and slave modes. The maximum
programmable resolution supported is 2048x2048 pixels, or
about 4.2 megapixels.
The LCD controller in the Intel PXA27x processor supports a
hardware cursor and three image planes: one base plane and two
overlays. The software interface that controls camera applications
typically resides on the base plane, and a preview window and/or
a decompressed image is displayed on the overlays.
Figure 2: Intel Wireless MMX benefits over scalar code. Frame rate (fps), higher is better, listed as Intel Wireless MMX / ARM* v5TE instructions only: 51.0 / 32.0 at 208 MHz, 66.0 / 42.0 at 312 MHz, 75.0 / 50.0 at 416 MHz.
Figure 3: Programming with Intel Wireless MMX technology
Overlay 2 has built-in hardware color-conversion from various
luminance-chrominance (YCbCr) formats to red/green/blue (RGB)
output. Here are two examples of where this can be used:
Camera sensors often output data in YCbCr 4:2:2 formats.
When performing camera preview for still-image or video
capture, the output of the camera sensor can be sent directly
to Overlay 2, which converts the YCbCr to RGB for viewing
on the LCD panel.
The output of an MPEG-4 decoder in YCbCr 4:2:0 format
can be sent directly to Overlay 2, allowing the video
sequence to be displayed on the LCD panel.
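The per-pixel arithmetic behind a YCbCr-to-RGB conversion can be sketched in software. The example below uses the widely published full-range ITU-R BT.601 (JFIF) coefficients and packs the result as 16-bit RGB 5:6:5. This is an assumption for illustration: the coefficients, range handling, and rounding that Overlay 2 applies in hardware are not stated in this paper. The function names are illustrative.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range BT.601 YCbCr sample to 8-bit R, G, B."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B into a 16-bit RGB 5:6:5 value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


# Mid-gray has neutral chroma (Cb = Cr = 128), so R = G = B = Y.
gray = ycbcr_to_rgb(128, 128, 128)
white = pack_rgb565(*ycbcr_to_rgb(255, 128, 128))
print(gray, hex(white))
```

Doing this conversion in the LCD controller means that neither the core nor Intel Wireless MMX code needs to run this per-pixel loop for content shown on Overlay 2.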
These sample scenarios can be used in other advanced
applications such as MPEG-4 video conferencing. The application
is best understood in Figure 4, which shows all key features of the
Intel Quick Capture technology, including three main data paths:
Self-preview video stream: data from the sensor in YCbCr
4:2:2 format is down sampled and converted to RGB 5:6:5
format using Intel Wireless MMX technology. These functions
are already in the Intel IPP. The live video preview in RGB
format is displayed directly on the LCD in Overlay 1.
Outgoing video encode stream: the sensor data is also
converted using Intel IPP to YCbCr 4:2:0 format, which is
accepted as input by the MPEG-4 video encoder. This
generates a compressed bit stream that is sent to a remote
recipient over the base band (802.11) network interface.
Incoming video stream: occurring simultaneously with the
other streams, the encoded stream from the remote recipient
is received and decoded by an MPEG-4 video decoder. The
output, in YCbCr 4:2:0 format, is sent directly to the LCD in
Overlay 2, which performs color conversion and displays the
received video.
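The 4:2:2 to 4:2:0 step in the outgoing stream halves the vertical chroma resolution while leaving luma untouched. A minimal sketch is shown below, assuming a planar chroma layout and a simple rounded average of each vertical pair of rows. The Intel IPP routine may use a different filter or data layout, and the function name is illustrative.

```python
def chroma_422_to_420(chroma_rows):
    """Halve the vertical resolution of one chroma plane (Cb or Cr).

    chroma_rows is a list of equal-length rows of 8-bit samples taken
    from a 4:2:2 image, where chroma is already subsampled horizontally.
    Each output row is the rounded average of two adjacent input rows.
    """
    if len(chroma_rows) % 2 != 0:
        raise ValueError("expected an even number of chroma rows")
    out = []
    for top, bottom in zip(chroma_rows[0::2], chroma_rows[1::2]):
        out.append([(a + b + 1) // 2 for a, b in zip(top, bottom)])
    return out


cb_422 = [
    [100, 200],
    [102, 204],
    [50, 60],
    [50, 61],
]
cb_420 = chroma_422_to_420(cb_422)
print(cb_420)
```

The same function would be applied to the Cb and Cr planes independently, and the luma plane passes through unchanged.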
More information on utilizing Intel Quick Capture technology
is available in the application note titled Intel Quick Capture
Technology for the Intel PXA27x Processor Application Note.
5. Wireless Intel SpeedStep Technology
To help maintain system battery-life with increased performance,
the Intel PXA27x processor includes Wireless Intel SpeedStep
Technology, which dynamically optimizes application performance
and power usage to extend battery life for phones and PDAs.
This technology includes:
Five low-power states
Ability to change voltage and frequency dynamically
Wireless Intel SpeedStep power manager software
When running code, the processor operates in normal mode.
Additional low-power modes and their uses are summarized in
the following table:
POWER STATE   USAGE
Idle          Processor clocks stopped with near-instantaneous
              resumption. LCD may continue to be refreshed via DMA.
Deep Idle     Core and (optionally) peripheral PLLs disabled.
              LCD may continue to be refreshed via DMA.
Standby       Processor and peripheral state retained.
Sleep         No state retained except for general-purpose
              IO (GPIO). SRAM contents may be retained.
Deep Sleep    No state retained. SRAM contents may be retained.

Figure 4: Example of Intel Quick Capture technology data streams (self-preview video stream, outgoing video encode stream, and decode of incoming baseband video stream; Intel Wireless MMX technology routines are tentative and subject to change without notice)
The processor supports dynamic runtime scaling of internal
core and bus frequencies, as well as core voltage. Commands
to change the voltage are sent to an external power-management
IC (PMIC) via a dedicated I2C interface. Other low-power
features of the processor include internal SRAM
powered at 1.1V, and support for SDRAM down to 1.8V.
Intel provides the Wireless Intel SpeedStep power manager in
all BSPs. This software solution optimizes the usage of low-
power capabilities listed above to help maximize battery life for
phone standby time, talk time, and when running applications.
To include this solution, OEMs/ODMs should make sure
hardware systems support the power manager, include the
power manager in BSPs, and modify the platform-specific layer
for phones and PDAs. The power manager will adapt the
power policy to workloads and usage scenarios automatically,
without user intervention. OS vendors are not required to
modify operating systems, and ISVs do not need to modify
applications.
However, the power manager provides an Applications API
so that ISVs can enhance or fine-tune applications to achieve
power savings. The power manager also provides a User API
so end users can control the power policy.
The Wireless Intel SpeedStep Power Manager consists of five
software components or modules as shown in Figure 5.
The policy manager determines the system power policy
using several inputs, then uses OS services or its own services
to dynamically scale power and performance. The policy
includes defining the new operating system states and
desired frequency and voltages for the processor based on
the workload.
The policy manager also determines the operating system
power state. If this state is not supported by the specific
operating system then the power manager will create the
state and use the driver interface to transition the operating
system into the new state.
The idle profiler monitors the OS idle activity within the
operating system for a given workload and provides input
to the policy manager.
The performance profiler monitors the CPU percent utilization
and memory usage through the performance-monitoring
unit (PMU), a feature in Intel XScale microarchitecture. This
profiler determines if the workload is CPU-bound and/or
memory-bound, then provides input to the policy maker in the
policy manager.
The user interface allows users to tune the parameters
used to determine the power policy.
The operating system mapping layer allows the power
manager to be ported across multiple operating systems.
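The interaction between the profilers and the policy manager can be illustrated with a toy frequency-selection rule. Everything in this sketch is an assumption made for illustration: the list of operating points, the 80 percent utilization target, and the rule for memory-bound workloads are not taken from the Wireless Intel SpeedStep Power Manager, whose actual policy is described in its application note.

```python
# Hypothetical core operating points in MHz, using frequencies that
# this paper mentions for the Intel PXA27x processor family.
CORE_MHZ = (208, 312, 416, 520, 624)


def choose_frequency(current_mhz, cpu_utilization, memory_bound, target=0.8):
    """Pick the lowest frequency that keeps utilization at or below target.

    cpu_utilization is the busy fraction (0.0 to 1.0) reported by a
    performance profiler at current_mhz. If the workload is memory-bound,
    a faster core clock is assumed not to help, so the frequency is
    never raised above the current one.
    """
    required_mhz = current_mhz * cpu_utilization / target
    candidate = CORE_MHZ[-1]
    for mhz in CORE_MHZ:
        if mhz >= required_mhz:
            candidate = mhz
            break
    if memory_bound:
        candidate = min(candidate, current_mhz)
    return candidate


print(choose_frequency(416, 0.30, memory_bound=False))  # light load
print(choose_frequency(208, 0.95, memory_bound=False))  # CPU-bound load
print(choose_frequency(208, 0.95, memory_bound=True))   # memory-bound load
```

A real policy would also weigh idle-profiler input, voltage transition cost, and user settings; the point here is only that profiler outputs become inputs to a single decision function.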
Figure 5: Wireless Intel SpeedStep Power Manager architecture
The Wireless Intel SpeedStep Technology Power Manager provides
a device driver interface and an application interface so that the
power policy is optimized, comprehensive and robust. All
device drivers must register with the power manager through
the device driver APIs to be notified of all power
management events, such as state transitions, frequency
changes and voltage changes.
If the state is supported by the operating system, then the
power manager uses the operating system interface to notify
the device drivers. Otherwise, the device driver interface
created by the power manager is used.
Upon receiving a callback for a power management state
transition or event, the device driver needs to transition to the
new state and prepare the device for the next state transition.
Device drivers have the flexibility to request a state change.
For example, the battery device driver provides input into the
policy manager for state changes based on thresholds for the
battery status.
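The register-and-notify flow described above follows a familiar observer pattern, sketched below. The class names, method names, event names, and the 10 percent battery threshold are all hypothetical; the real device driver API is documented in the Wireless Intel SpeedStep Technology Power Manager application note and the Intel BSPs.

```python
class PowerManager:
    """Toy stand-in for a power manager that notifies registered drivers."""

    def __init__(self):
        self._callbacks = {}
        self.state = "normal"

    def register(self, driver_name, callback):
        """Register a driver callback for power management events."""
        self._callbacks[driver_name] = callback

    def notify(self, event, value):
        """Deliver an event to every registered driver."""
        for callback in self._callbacks.values():
            callback(event, value)

    def request_state(self, new_state):
        """Accept a state-change request and notify all drivers."""
        self.state = new_state
        self.notify("state_transition", new_state)


class BatteryDriver:
    """Toy battery driver that requests standby below a threshold."""

    LOW_THRESHOLD_PERCENT = 10  # hypothetical threshold

    def __init__(self, power_manager):
        self.power_manager = power_manager
        self.events = []
        power_manager.register("battery", self.on_power_event)

    def on_power_event(self, event, value):
        # A real driver would reconfigure its device here.
        self.events.append((event, value))

    def report_level(self, percent):
        if percent < self.LOW_THRESHOLD_PERCENT:
            self.power_manager.request_state("standby")


pm = PowerManager()
lcd_events = []
pm.register("lcd", lambda event, value: lcd_events.append((event, value)))
battery = BatteryDriver(pm)

battery.report_level(50)   # above threshold: no state change
battery.report_level(5)    # below threshold: standby requested
print(pm.state, lcd_events)
```

In this sketch the request is granted unconditionally, whereas the text describes driver input as one of several inputs that the policy manager weighs.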
The software components of the communications subsystem
are shown in Figure 6, illustrating the link between the
Application Subsystem and the Communications Subsystem,
which uses the serial or SSP Intel Mobile Scalable Link
(Intel MSL) in the Intel PXA27x processor.
The power management component of the communications
software (including the L1, L2 and L3 protocol layers) is
responsible for managing the power of the communications
subsystem in each state for GSM and GPRS. The
communications device driver running on the applications
subsystem is a client of the Wireless Intel SpeedStep Technology
Power Manager and the OS power management, and receives
notifications from each on appropriate state transitions.
Example: When the OS goes into the standby state, the
communications driver is notified of this state change and
signals the communications power management component
about the state change. The communications subsystem is
then put into the low-power standby state and prepares to
wake up when the new state requires processing on the
Applications Subsystem.
Similarly, for dynamic performance and power scaling, the
communications device driver is notified about frequency or
voltage change, which causes the appropriate communications
software to be notified.
Details about the Intel PXA27x processor power states can
be found in the Intel PXA27x Processor Developer Manuals.
Detailed information about the power manager can be found
in a Wireless Intel SpeedStep Technology Power Manager
application note, as well as in Intel BSPs.
6. Intel Integrated Performance Primitives (Intel IPP)
Intel IPP is a performance library of optimized algorithms
created to ease development of optimized applications. Intel
IPP includes general signal and image processing primitives, as
well as primitives that can be used to construct internationally
standardized audio, video, image, and speech encoder/
decoders (codecs) optimized for the Intel PXA27x processor
and Intel Wireless MMX technology.
Figure 6: Communications subsystem
Intel IPP also supports application porting across certain
Intel platforms. Intel IPP supports a consistent Application
Programming Interface (API), so applications can be ported
easily across Intel server, desktop, and handheld processors
without sacrificing performance.
The following primitives are available for general one-
dimensional (1D) signal processing:
Vector initialization, arithmetic, statistics, thresholding,
and measure
Deterministic and random signal generation
Convolution, filtering, windowing, and transforms
Primitives for general two-dimensional (2D) image processing
include the following:
Vector initialization, arithmetic, statistics, thresholding,
and measure
Color space conversions
Morphological operations
Convolution, filtering, windowing, and transforms
Cryptography
Primitives for image capture:
Color format conversions: YCbCr 4:2:2 to 4:2:0
Gamma correction
Image scaling
Frame stabilization
Additional primitives are available that allow construction of
the following multimedia codecs:
Video: ITU H.263 decoder, ISO/IEC 14496-2 MPEG-4
decoder
Audio: ISO/IEC 11172-3 and 13818-3 (MPEG-1, -2) Layer 3
(MP3) decoder
Speech: ITU-T G.723.1 codec and ETSI GSM-AMR codec
Image: ISO/IEC JPEG codec.
A companion library, Intel Graphics Performance Primitives
(Intel GPP), includes primitives for 2D and 3D graphics. Intel
GPP includes the following optimized primitives:
Data-type conversion
Fixed-point arithmetic, including trigonometric functions
Vector and matrix operations
Rasterization
For more information about Intel IPP, visit
http://www.intel.com/software/products/ipp/.
Figure 7: Intel Software Development tools
7. Intel Software Development Tools
Intel Software Development Tools consist of compiler
and debugger tools for building and debugging system
platforms and applications. The tools include:
Compiler system, including compiler, assembler, linker, and
optimized libraries
Debuggers for all stages of system software and application
development including simulators, JTAG and OS-aware
debuggers
7.1 Compilers
Intel compilers are highly optimized for Intel PCA processors.
The compiler systems generate code for Microsoft Windows
CE .NET, Microsoft Windows Mobile 2003 software for Pocket
PCs and Smartphones, Palm OS, Nucleus* OS, and
OS-independent systems. Important features of the compiler
systems include:
Support for Intel Wireless MMX technology using assembly
language and inline assembly, intrinsics, and a vectorizer
Linker with optimized C-runtime and floating-point libraries
Large set of optimization switches
PNO (Pace Native Objects or ARMlets) support for
Palm OS v5.x
Support for Intel Wireless MMX technology instructions includes
assembly language and inline assembly of those instructions,
as well as intrinsics and vectorization.
Intrinsics allow Intel Wireless MMX technology instructions
to be used in C/C++ code without the use of inline assembly
language. These macro-like functions help the compiler
keep track of registers for optimization and ease code
maintenance.
Vectorization is a feature that uses Intel Wireless MMX
technology to vectorize code that is normally scalar. The
vectorizer identifies code that can be run as SIMD operations
and attempts to use Intel Wireless MMX technology
instructions to implement those areas of code. Another way
to use Intel Wireless MMX technology is to link in code
developed using the Intel IPP.
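The idea behind SIMD operations can be illustrated by emulating one packed addition in software. The sketch below models a 64-bit value that holds four independent 16-bit lanes, so that one packed add replaces four scalar adds. It is a conceptual illustration only: it does not show compiler output, intrinsic names, or actual Intel Wireless MMX instruction encodings, and the helper names are made up.

```python
LANES = 4            # four lanes in one 64-bit value
LANE_BITS = 16       # each lane is an unsigned 16-bit integer
LANE_MASK = 0xFFFF


def pack(values):
    """Pack four 16-bit values into one 64-bit integer, lane 0 lowest."""
    word = 0
    for index, value in enumerate(values):
        word |= (value & LANE_MASK) << (index * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer into its four 16-bit lanes."""
    return [(word >> (index * LANE_BITS)) & LANE_MASK for index in range(LANES)]


def packed_add_u16(a, b):
    """Add lane by lane with wraparound; no carry crosses between lanes."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])


total = unpack(packed_add_u16(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))
wrapped = unpack(packed_add_u16(pack([0xFFFF, 1, 0, 0]), pack([1, 1, 0, 0])))
print(total, wrapped)
```

A vectorizer looks for scalar loops whose iterations are independent in this way, such as element-wise adds over arrays of 16-bit samples, and groups several iterations into one packed operation.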
7.2 Debuggers
The Intel Software Development Tools also provide a
comprehensive set of debuggers to support all phases
of development.
A simulator debugger contains the silicon model with simulated
peripheral and device registers, eliminating the need for real
hardware in this phase of the development cycle.
A JTAG debugger version allows access to the hardware
through JTAG so that hardware prototypes can be fully tested
without having to debug client software running on the target.
The XDB debuggers provide full Flash memory support to
burn an OS image into board memory.
The ROM Monitor Debugger is a software solution and is
mainly used for ISV application debugging.
Appropriate OS-awareness plug-ins can be loaded either into
the JTAG debugger or into the ROM Monitor Debugger, so that
developers can follow the kernel with its task switches, queue
and semaphore tables, and other OS constructs. A JTAG
hardware connector on the target system is not required.
All debuggers share a common GUI and have the same basic
functionalities, including the following features:
First class C/C++ debugging support, including breakpoints,
local variable and memory display, and stepping through code
by using script language.
Direct access to registers that control Intel XScale
technology, Intel Wireless MMX technology and other
coprocessor registers, and on-chip peripherals. A bit field
editor and description of all bit fields for each register is
particularly useful for low-level driver development.
OS awareness plug-ins allow inspection of tasks, threads,
and other OS activities.
Support for scripting for batch files and for automated
validation allows overnight debugging.
Execution Trace Support displays the code execution history
that occurred before a breakpoint was encountered.
System and application developers who use the Windows CE
.NET environments can use the Intel debugging extensions,
which are seamlessly integrated into those tools with an
additional debugger window.
OEMs developing on Microsoft Windows CE .NET and Windows
Mobile 2003 typically use Microsoft Platform Builder* to build and
debug their target platform, and communicate directly to the
target via JTAG or TCP/IP. The Intel debuggers also provide eXDI
debugger extensions for access with Macraigor Raven*
and EPI JTAG tools (available through EPI*).
ISVs developing applications for Microsoft Windows CE .NET
and Windows Mobile 2003 can use Microsoft eMbedded Visual
C++* to build and debug their applications on an OEM's or
ODM's device. eMbedded Visual C++ provides debugger
extensions to allow direct access of the Intel XScale technology
peripheral registers on that device.
The extensions only work, however, if the OEM/ODM builds
three specific ioctl functions into their system ROM or Flash that
are part of the Intel BSPs for CE .NET and for Windows Mobile
2003. System developers need to keep these functions in their
device BSPs to enable ISVs to develop and showcase
applications onto those devices.
The Intel XDB Debugger also supports communication over a
ROM monitor for ISVs developing on off-the-shelf target
devices. This is currently available for Palm OS only. For Palm
OS, system developers need to implement the ROM monitor to
ensure that ISVs can debug applications on these devices.
Intel Software Development Tools are available in the following
packages:
Intel C++ Compiler, For Platform Builder For Microsoft
Windows CE .NET
Intel C++ Compiler, For Microsoft eMbedded Visual C++
Intel C++ Software Development Tool Suite, For Palm OS,
Symbian OS, Nucleus OS, and OS independent systems
For more information on the Intel Software Development Tools,
visit http://www.intel.com/software/products/compilers.
8. Intel VTune Performance Analyzer
The Intel VTune Performance Analyzer helps developers to
optimize software on Intel processors, including Intel XScale
technology-based processors like the Intel PXA27x processor.
The Intel VTune analyzer remote collector component identifies
potential performance issues and provides recommendations
for improving software utilization of the processor hardware
and design. This tool works in a host/target development
environment, with performance data gathered by a data
collector running on a target development system with Intel
XScale technology. Software running on Intel applications
processors can be profiled remotely using the Intel VTune
performance analyzer GUI on the host system.
The Intel VTune performance analyzer provides performance
data from the system level down to the source level. Sampling
provides developers with the most accurate representation of
actual software performance, with negligible overhead. Using
the Performance Monitoring Unit (PMU) in the processor, all
program activity can be profiled at the microarchitecture level.
With intimate knowledge of the processor, the Intel VTune
performance analyzer provides additional analysis of potential
stalls and latency issues, and suggests techniques for improving
code performance. The Intel VTune performance analyzer is
available in beta for Microsoft Windows Mobile 2003 software
and Linux targets using the Intel PXA27x processor.
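The sampling approach described above can be illustrated with a minimal sketch in C. It is not the Intel VTune implementation: the `symbol_t` type, the function names, and the addresses in the usage note are hypothetical and exist only to show how sampled program-counter values might be attributed to functions. It assumes the symbol table is sorted by ascending start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: a function name and its start address. */
typedef struct {
    const char *name;
    uint32_t    start;   /* first address occupied by the function */
} symbol_t;

/*
 * Return the index of the last symbol whose start address is <= pc,
 * or -1 if pc lies below the first symbol.
 * Assumes syms[] is sorted by ascending start address.
 */
static int find_symbol(const symbol_t *syms, size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;               /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= pc)
            lo = mid + 1;                /* candidate is at mid or later */
        else
            hi = mid;                    /* candidate is before mid */
    }
    return (int)lo - 1;
}

/*
 * Add one hit to the counter of the function containing each sampled PC.
 * Returns the number of samples that matched no symbol.
 */
static size_t attribute_samples(const symbol_t *syms, size_t n_syms,
                                const uint32_t *pcs, size_t n_pcs,
                                unsigned *hits)
{
    size_t unmatched = 0;
    assert(syms != NULL && pcs != NULL && hits != NULL);
    for (size_t i = 0; i < n_pcs; i++) {
        int idx = find_symbol(syms, n_syms, pcs[i]);
        if (idx < 0)
            unmatched++;
        else
            hits[idx]++;
    }
    return unmatched;
}
```

Functions with the highest hit counts are where the program spent most of its sampled time, so they are the first candidates for optimization. A real profiler would also record which PMU event triggered each sample; that part is omitted here.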
Intel provides BSPs for Windows Mobile 2003 software
(Smartphone and Pocket PC) for Intel PCA processors, and
Intel VTune support is part of these BSPs. To make Intel
VTune support available to ISVs on handheld and cell phone
devices, OEMs need to keep Intel VTune support in specific
BSPs by following these steps:
Windows Mobile 2003 software-based devices:
Keep the Intel VTune (PMUDLL.dll) feature in Intel's BSPs
based on Microsoft Windows Mobile 2003 software (Pocket
PC and Smartphone).
Linux-based devices:
Rebuild the Intel VTune device driver sources once and
provide the binary as a loadable module to ISVs. No
modifications to the BSP are required.
9. Intel PCA Developer Network
With more than 6,000 individual members, 3,500 companies,
and over 750 available solutions, the Intel PCA Developer
Network is a global community of hardware and software
developers working to accelerate the delivery of next-
generation wireless Internet applications and client devices.
The Network supports wireless device and equipment
manufacturers, application developers and service providers
in these key areas:
Software optimization for processors based on Intel XScale
technology
Development support
Tools and technical support for Intel PCA building blocks
Marketing support and co-marketing opportunities
Solutions for wireless carriers and operators
Solutions for the wireless enterprise
Visit http://developer.intel.com/pca/developernetwork/index.htm
and see what additional resources and support await you today.
Performance tests and ratings contained within this document are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/procs/perf/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
*Other names and brands may be claimed as the property of others.
Intel, the Intel logo, Pentium, Intel Centrino, Intel XScale, VTune, Personal Internet Client Architecture, Intel SpeedStep, MMX, and Wireless MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Copyright 2004 Intel Corporation. All rights reserved. 0304/MS/MD/PDF Please Recycle 300869-001
Appendix A. Overview of Key System Optimizations
Refer to the Intel PXA27x Processor Developer
Manual and the Intel PXA27x Processor
Optimization Guide for detailed tips on
optimizations.
Use internal SRAM to help reduce power and
increase performance.
Set CCCR[A] and CLKCFG[B]=1 to maximize
memory performance.
Make sure to set the bus arbiter and parking
settings properly.
Make sure the device BSP and OS allow the
presence of the Intel PXA27x processor to be
detected.
Make sure the Intel Wireless MMX technology
coprocessor is enabled in the device BSP.
Make sure the registers specific to Intel
Wireless MMX technology are preserved across
context switches and changes in power state.
Use the Intel Integrated Performance Primitives
to develop codecs with optimized performance.
Use the Intel Graphics Performance Primitives
to develop 3D pipelines with optimized
performance.
Use the Intel Quick Capture Interface to ease
interfacing with cameras.
Use Overlay 2 to perform hardware color
conversion.
Port the Wireless Intel SpeedStep Power
Manager software into the device BSP.
Use the Wireless Intel SpeedStep Power
Manager application API in applications to tune
power consumption.
Use the Intel Software Development Tools (Intel
C/C++ compiler) to build applications enabled
with Intel Wireless MMX technology.
Use the Intel XDB Debugger to debug
applications.
For OEMs building devices based on Microsoft
Windows CE .NET, Microsoft Windows Mobile
2003* software for Pocket PC and for
Smartphone: build the specific ioctl() functions
compatible with the Intel XDB Debugger into the
device system to allow direct debugging using
Microsoft eMbedded Visual C++.
For OEMs building devices based on Palm OS:
build the ROM monitor compatible with the
Intel XDB Debugger into your device's system
to allow debugging.
Use the Intel VTune Performance Analyzer to
help profile and optimize applications.
For OEMs building devices based on Microsoft
Windows CE .NET, Microsoft Windows Mobile
2003 software for Pocket PC and for
Smartphone: keep the Intel VTune library
(PMUDLL.dll) in your device to enable ISVs to
use VTune to tune applications for your device.
For OEMs building devices based on Linux:
rebuild the Intel VTune device driver sources
once and provide the binary as a loadable module
to ISVs. No modifications to BSPs are required.
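As an illustration of the CCCR[A] and CLKCFG[B] tip in the list above, the following C sketch sets both bits in shadow copies of the two registers. The bit positions (bit 25 for CCCR[A], bit 3 for CLKCFG[B]) and the function names are assumptions that do not come from this paper; confirm them against the Intel PXA27x Processor Developer Manual before use. The actual register writes and the frequency-change sequence belong in the BSP and are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, not taken from this paper: verify before use. */
#define CCCR_A_BIT   (1u << 25)   /* CCCR[A]   */
#define CLKCFG_B_BIT (1u << 3)    /* CLKCFG[B] */

/*
 * Set CCCR[A] and CLKCFG[B] in caller-supplied shadow copies of the
 * registers. All other bits are left unchanged.
 */
static void apply_memory_clock_tip(uint32_t *cccr, uint32_t *clkcfg)
{
    assert(cccr != NULL && clkcfg != NULL);
    *cccr   |= CCCR_A_BIT;
    *clkcfg |= CLKCFG_B_BIT;
}

/* Return 1 when both bits are set, 0 otherwise. */
static int memory_clock_tip_applied(uint32_t cccr, uint32_t clkcfg)
{
    return (cccr & CCCR_A_BIT) != 0 && (clkcfg & CLKCFG_B_BIT) != 0;
}
```

Working on shadow copies keeps the bit manipulation testable on a host system; the BSP then applies the resulting values in whatever sequence the Developer Manual prescribes.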
For more information, visit the Intel Web site at: developer.intel.com