Intel Optimization of PXA27x


  • 8/13/2019 Intel Optimization of PXA27x


    White Paper

    Optimization Technology for the

    Intel PXA27x Processor Family

    Performance and Power Savings for Wireless System and

    Application Development


    Table of Contents

    1. Introduction
    2. Optimizing the Intel PXA27x Processor's Performance via the BSP
    2.1 Optimizing for On-chip SRAM
    2.2 Optimizing for the Enhanced Memory Subsystem
    2.3 Optimizing for the Bus Transaction Arbiter
    3. Intel Wireless MMX Technology
    3.1 Enabling Intel Wireless MMX Technology
    3.2 Writing Intel Wireless MMX Technology Code
    4. Intel Quick Capture Technology
    5. Wireless Intel SpeedStep Technology
    6. Intel Integrated Performance Primitives (Intel IPP)
    7. Intel Software Development Tools
    7.1 Compilers
    7.2 Debuggers
    8. Intel VTune Performance Analyzer
    9. Intel PCA Developer Network
    10. Summary
    Appendix A. Overview of Key System Optimizations


    1. Introduction

    The Intel Personal Internet Client Architecture (Intel PCA)

    PXA27x processor family offers developers a new generation of

    ultra-low power and industry-leading multimedia performance

    on silicon. Intel has integrated a host of new features in the

    Intel PXA27x processor family to enable this level of power

    and performance, including:

    Intel Wireless MMX technology

    Wireless Intel SpeedStep technology

    Intel Quick Capture technology

    Up to 624MHz core speed

    Enhanced memory subsystem

    Intelligent bus transaction arbiter

    256K of on-chip SRAM

    Intel's complete suite of development components is designed

    to help customers to take full advantage of cutting-edge

    technologies and get the best power and performance from

    mobile devices. These technologies also allow Independent

    Software Vendors (ISVs) to fully tune their applications.

    This paper describes a typical development cycle and how to

    take advantage of the optimization technologies available from

    Intel. The key features this paper addresses are:

    Operating System Board Support Packages (BSPs):

    Intel provides BSPs for a variety of operating systems

    including Linux*, Microsoft Windows* Mobile for PocketPCs

    and Smartphones, Microsoft Windows* CE .NET, Palm* OS,

    and Symbian* OS. The BSPs include the latest optimizations

    and drivers for the Intel PXA27x processor family and make

    it easy for customers to create a BSP customized for their

    own mobile device.

    Intel Wireless MMX Technology: an advanced set of

    multimedia instructions that brings desktop-like multimedia

    performance to Intel PXA27x processor-based clients, while

    minimizing the power needed to run rich applications.

    Wireless Intel SpeedStep Technology: allows customers

    to dynamically adjust the power and performance of the

    processor based on CPU demand. This can significantly

    decrease power consumption in wireless handheld devices.

    Intel Quick Capture Technology: provides the ability to

    get live video and high-quality still images from a wide range

    of camera sensors in current and future camera-enabled

    mobile handsets and PDAs.

    Intel Integrated Performance Primitives (Intel IPP):

    a cross-platform software library that allows users to write

    optimized applications that utilize Intel Wireless MMX technology

    to maximize performance on the Intel PXA27x processor.

    Intel Software Development Tools (Intel SDT): provides both an

    optimizing compiler and a set of

    sophisticated, high-level language debuggers to help

    software run at top speed.

    Intel VTune Analyzer: this tool lets users profile

    applications for hotspots of activity. A tuning assistant

    provides support to optimize C/C++ code and/or assembler

    sequences.

    Intel PCA Developer Network: provides information on

    third-party software applications that are already optimized,

    as well as optimization labs and support to answer

    questions. With over 1,000 companies and over 3,000

    different software and hardware solutions, the Intel PCA

    Developer Network can help customers find value-add

    solutions for mobile devices.

    This paper introduces Intel optimization technologies and

    addresses how each fits into a typical development cycle consisting

    of iterations of coding, optimizing, and profiling. Devices that take

    advantage of these optimizations achieve significant performance

    improvements and power savings over those that do not.

    Applications that take advantage of these optimizations will run

    faster and more efficiently on the Intel PXA27x processor-based

    devices. Pointers to additional resources that provide more

    detailed information on each technology are provided at the

    end of this paper.

    2. Optimizing the Intel PXA27x Processor's Performance via the BSP

    To ensure that designs and applications take full advantage of

    the technology in the Intel PXA27x processor, OEMs and ODMs

    should make sure that the device BSP supports these features.

    Intel provides BSPs for a variety of operating systems including

    Linux, Microsoft Windows Mobile for PocketPCs and

    Smartphones, Microsoft Windows CE .NET, Palm OS, and

    Symbian OS. The BSPs include the latest optimizations and

    drivers for the Intel PXA27x processor and make it easy for

    customers to create a customized BSP.

    The BSPs for the Intel PXA27x processor contain an extensive

    number of optimizations, the latest versions of which can be

    obtained from Intel field sales representatives. While a full list


    of the available optimized drivers is beyond the scope of this

    paper, several key optimizations for the Intel PXA27x processor

    are described here, including:

    Optimizing for 256K of on-chip SRAM

    Enabling and utilizing the enhanced memory subsystem

    Taking advantage of the bus-transaction arbiter

    2.1 Optimizing for On-chip SRAM

    The internal SRAM can be used for frame buffers as well as

    storage of variables or data to be processed. The SRAM has a

    fast access time, and is powered from the VCC_SRAM domain,

    offering both lower power and higher performance than using

    external memory.

    Example: a 320x240x16 bit-per-pixel frame buffer consumes

    154K of memory, allowing the rest to be used for temporary

    storage of MPEG-4* video buffers, a Java* virtual machine

    heap, incoming data from the Intel Quick Capture camera

    interface, executable code, streamed data, or other variables

    that need to be accessed quickly.

    The SRAM consists of four independently controllable

    64K banks. When entering sleep or deep-sleep mode, one

    or more banks can remain powered on. This retains the OS

    state so context can be restored quickly upon wakeup from

    those modes.
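
    The arithmetic behind the frame-buffer example above can be checked with a short sketch. It assumes an uncompressed buffer placed contiguously from the start of the SRAM; the constant and function names are illustrative and are not taken from any Intel BSP.

```python
import math

SRAM_BYTES = 256 * 1024   # 256K of on-chip SRAM
BANK_BYTES = 64 * 1024    # four independently controllable 64K banks


def frame_buffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of an uncompressed frame buffer."""
    return width * height * bits_per_pixel // 8


fb_bytes = frame_buffer_bytes(320, 240, 16)
remaining_bytes = SRAM_BYTES - fb_bytes
# Assumption: the buffer starts at the beginning of bank 0 and is contiguous.
banks_spanned = math.ceil(fb_bytes / BANK_BYTES)

print(f"frame buffer: {fb_bytes} bytes")
print(f"left for other data: {remaining_bytes} bytes")
print(f"64K banks spanned by the frame buffer: {banks_spanned}")
```

    The buffer works out to 153,600 bytes (roughly 154K in decimal thousands), which leaves 108,544 bytes and spans three of the four banks under the placement assumed here.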

    2.2 Optimizing for the Enhanced Memory Subsystem

    The Intel PXA27x processor family enhances and adds flexibility

    to the bus settings of the Intel PXA255 processor family, which

    supported a 200-MHz system bus at a core speed of 400MHz.

    With fast bus mode enabled by setting the CLKCFG[B] bit to one,

    the internal system bus in the Intel PXA27x processor can run

    at up to 208MHz at many other product points, including 312

    and 208MHz. As a result, applications on the Intel PXA27x

    processor can offer better performance at the lower

    frequency settings.

    The memory controller offers flexibility to run at greater speeds

    than before by setting the CCCR[A] bit to one. This helps

    reduce latency and increases bandwidth to memory, offering

    better system performance.
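
    The effect of fast bus mode can be illustrated with a simplified model. The sketch below assumes that the system bus runs at the run-mode frequency when CLKCFG[B] is one and at half of it otherwise. That relationship is an assumption inferred from the 104MHz and 208MHz figures quoted in this paper, not a statement of the register's full behavior; the Developer Manual remains the reference.

```python
def system_bus_mhz(run_mode_mhz, fast_bus):
    """Simplified model of the internal system bus frequency.

    Assumption: bus = run-mode frequency when CLKCFG[B] is 1,
    and half the run-mode frequency when CLKCFG[B] is 0.
    """
    return run_mode_mhz if fast_bus else run_mode_mhz / 2


# A 208MHz run-mode frequency with and without fast bus mode.
normal_bus = system_bus_mhz(208, fast_bus=False)
fast_bus = system_bus_mhz(208, fast_bus=True)

print(f"CLKCFG[B]=0: {normal_bus:.0f}MHz bus")
print(f"CLKCFG[B]=1: {fast_bus:.0f}MHz bus")
```

    Under this model, a design whose core runs from a 208MHz run-mode frequency would double its bus bandwidth by enabling fast bus mode, which is the benefit at lower frequency settings that the text describes.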

    2.3 Optimizing for the Bus Transaction Arbiter

    The bus arbiter in the Intel PXA27x processor performs the

    arbitration for internal-bus-access transactions, which is

    programmable through the ARB_CNTRL register. The Intel

    PXA27x processor system bus supports six clients: the core,

    the DMA controller, the LCD controller, the USB host controller,

    and both an internal and external memory controller. Customers

    can program priority weights for each of these clients via the

    arbiter-control register, which enables fine-tuning of device

    performance based on the typical usage model for that device.

    Example: a device is designed to encode MPEG-4 video using

    the Intel Quick Capture interface in the Intel PXA27x processor

    and stream it over USB Host to an attached USB Client.

    Assigning higher priority weights to the core and the USB host

    controller improves the performance of processing the MPEG-4

    video stream (CPU intensive) and transmitting it via USB Host.

    Customers are encouraged to test the performance benefits of

    applying different priority weights to these clients based on the

    supported usage model.

    Further performance can be gained by speculatively parking

    a specific client on the arbiter. This means that the arbiter will

    always start with that client when internal-bus-access

    transactions are performed. Setting the arbiter-control register

    to park on the core often results in the best performance.
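
    To reason about candidate weight settings before testing them on hardware, a rough model can help. The sketch below assumes that, when every client always has a request pending, each client receives bus grants in proportion to its weight. This is an illustrative approximation only: it does not reproduce the actual ARB_CNTRL encoding or arbitration algorithm, it ignores parking, and the client names and weight values are hypothetical.

```python
def bus_shares(weights):
    """Fraction of bus grants per client under a proportional-share model.

    Assumption: all clients are permanently requesting the bus and grants
    are handed out in proportion to the programmed weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {client: weight / total for client, weight in weights.items()}


# Hypothetical weights favoring the core and the USB host controller,
# as in the MPEG-4 streaming example above.
weights = {
    "core": 4,
    "usb_host": 4,
    "dma": 2,
    "lcd": 2,
    "internal_memory": 2,
    "external_memory": 2,
}
shares = bus_shares(weights)

for client, share in shares.items():
    print(f"{client}: {share:.1%} of grants")
```

    A model like this only suggests a starting point; measured results on the target device should decide the final weights.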

    This is only an overview of the key features that should be in

    your BSP or in your application. Other features are listed in

    the other sections of this paper.

    For more information, consult the following documentation:

    The Intel PXA27x Processor Optimization Guide

    The Intel PXA27x Processor Developer Manual, Volume I of III

    The Intel PXA27x Processor Developer Manual, Volume II of III

    The Intel PXA27x Processor Developer Manual, Volume III of III

    3. Intel Wireless MMX Technology

    Introduced in 2003, Intel Wireless MMX technology is an advanced

    set of multimedia instructions that help bring desktop-like

    multimedia performance to Intel PXA27x processor-based clients,

    while minimizing the power needed to run rich applications.


    3.2 Writing Intel Wireless MMX Technology Code

    After Intel Wireless MMX technology is enabled, there are

    several usage options to consider:

    Write directly in assembly language, or use inline assembler:

    this offers the most flexibility, but requires more effort and

    maintenance than other options.

    Use the Intel C/C++ Compiler, which provides intrinsics that

    support Intel Wireless MMX technology as well as a

    vectorizer feature.

    Link to Intel Integrated Performance Primitives, which

    are pre-optimized libraries that provide a high level of

    abstraction to jump-start multimedia and signal

    processing-based applications.

    Figure 2 shows the relative performance benefits

    for Intel Wireless MMX technology over scalar code. This test

    decodes an MPEG-4 CIF resolution video clip and an MP3

    file simultaneously.

    Actual benchmarks were run on a Mainstone I system (main

    board rev 1.1 ECO B, Rev 2 daughtercard, ECO D with 2.5 volt

    VCC_MEM) with the Intel PXA27x processor A1 stepping running

    at the speeds indicated in the graph. The 208-MHz measurements

    were made in the processor's run mode, and measurements at all

    other frequencies were made in turbo mode. The system bus was

    104MHz for the 208, 312, 416 and 520MHz core frequencies. This

    platform represents a bare-metal system with no operating

    system installed. The MPEG-4 decoder is implemented with the

    Intel IPP library optimized for Intel Wireless MMX technology,

    and the MPEG-4 content is the CIF resolution video clip

    Coastguard in portrait mode.
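
    The relative benefit can be expressed as a speedup ratio. The short sketch below uses the frame rates reported in Figure 2 (Intel Wireless MMX versus ARM v5TE instructions only); the variable names are illustrative.

```python
# Frame rates in fps from Figure 2: (Intel Wireless MMX, ARM v5TE only).
frame_rates = {
    208: (51.0, 32.0),
    312: (66.0, 42.0),
    416: (75.0, 50.0),
}

# Speedup of the Wireless MMX build over the scalar build per frequency.
speedups = {
    mhz: mmx_fps / scalar_fps
    for mhz, (mmx_fps, scalar_fps) in frame_rates.items()
}

for mhz, ratio in speedups.items():
    print(f"{mhz} MHz: {ratio:.2f}x")
```

    On these figures the Intel Wireless MMX build runs roughly 1.5 to 1.6 times faster than the scalar build at every measured frequency.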

    For more details on optimizing your application for Intel Wireless

    MMX technology, reference the following documentation:

    The Intel PXA27x Processor Optimization Guide

    The book Programming with Intel Wireless MMX

    Technology, available from Amazon.com (www.amazon.com),

    is a comprehensive programming guide for Intel Wireless

    MMX technology and an invaluable resource for this exciting

    technology. Pre-order at

    http://www.amazon.com/exec/obidos/tg/detail/-/0974364916

    4. Intel Quick Capture Technology

    With a growing number of PDAs and cell phones that include

    digital cameras, the Intel PXA27x processor introduces Intel

    Quick Capture technology. This provides high-performance

    and low-power solutions for still and video image capture

    and playback.

    Intel Quick Capture Technology includes the following key

    components:

    Highly flexible Intel Quick Capture Interface

    Hardware color-space conversion

    The Intel Quick Capture interface eases the connection between

    the Intel PXA27x processor and CMOS and some CCD sensors. It is

    a 4-, 5-, 8-, 9-, or 10-bit wide bus with control and clock lines

    that can be used in master and slave modes. The maximum

    programmable resolution supported is 2048x2048 pixels, or

    4.1 megapixels.

    The LCD controller in the Intel PXA27x processor supports a

    hardware cursor and three image planes: one base plane and two

    overlays. The software interface that controls camera applications

    typically resides on the base plane, and a preview window and/or

    a decompressed image is displayed on the overlays.

    Figure 2: Intel Wireless MMX benefits over Scalar code
    (frame rate in fps; higher is better)

    CORE FREQUENCY   INTEL WIRELESS MMX   ARM* v5TE INSTRUCTIONS ONLY
    208 MHz          51.0                 32.0
    312 MHz          66.0                 42.0
    416 MHz          75.0                 50.0

    Figure 3: Programming with Intel Wireless MMX technology

    Overlay 2 has built-in hardware color-conversion from various

    luminance-chrominance (YCbCr) formats to red/green/blue (RGB)

    output. Here are two examples of where this can be used:

    Camera sensors often output data in YCbCr 4:2:2 formats.

    When performing camera preview for still-image or video

    capture, the output of the camera sensor can be sent directly

    to Overlay 2, which converts the YCbCr to RGB for viewing

    on the LCD panel.

    The output of an MPEG-4 decoder in YCbCr 4:2:0 format

    can be sent directly to Overlay 2, allowing the video

    sequence to be displayed on the LCD panel.
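
    For readers unfamiliar with the conversion that Overlay 2 performs in hardware, the following sketch shows a YCbCr-to-RGB conversion in software. It assumes 8-bit full-range samples and the widely used ITU-R BT.601 (JFIF) coefficients; the exact coefficients and rounding used by the Overlay 2 hardware are not given in this paper, so the numbers here are illustrative.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range 8-bit YCbCr sample to 8-bit RGB.

    Assumption: BT.601 (JFIF) coefficients with chroma centered on 128.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


# Neutral chroma (Cb = Cr = 128) yields a gray level equal to Y.
print(ycbcr_to_rgb(128, 128, 128))
print(ycbcr_to_rgb(255, 128, 128))
```

    Doing this per pixel in software costs several multiplies and clamps, which is the work that sending YCbCr data directly to Overlay 2 avoids.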

    These sample scenarios can be used in other advanced

    applications such as MPEG-4 video conferencing. The application

    is best understood in Figure 4, which shows all key features of the

    Intel Quick Capture technology, including three main data paths:

    Self-preview video stream: data from the sensor in YCbCr

    4:2:2 format is downsampled and converted to RGB 5:6:5

    format using Intel Wireless MMX technology. These functions

    are already in the Intel IPP. The live video preview in RGB

    format is displayed directly to the LCD in Overlay 1.

    Outgoing video encode stream: the sensor data is also

    converted using Intel IPP to YCbCr 4:2:0 format, which is

    accepted as input by the MPEG-4 video encoder. This

    generates a compressed bit stream that is sent to a remote

    recipient over the baseband (802.11) network interface.

    Incoming video stream: occurring simultaneously with the

    other streams, the encoded stream from the remote recipient

    is received and decoded by an MPEG-4 video decoder. The

    output, in YCbCr 4:2:0 format, is sent directly to the LCD in

    Overlay 2, which performs color conversion and displays the

    received video.
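
    The format conversion on the outgoing path can be sketched as follows. YCbCr 4:2:2 carries one chroma row per luma row, while 4:2:0 carries one chroma row per two luma rows, so one common approach is to average vertically adjacent chroma rows. This is an assumption for illustration; the method used by the Intel IPP routines is not described in this paper, and the function names are hypothetical.

```python
def chroma_422_to_420(chroma_rows):
    """Halve the vertical chroma resolution by averaging row pairs.

    `chroma_rows` is a list of rows of 8-bit samples for one chroma plane.
    """
    if len(chroma_rows) % 2:
        raise ValueError("expected an even number of chroma rows")
    converted = []
    for top, bottom in zip(chroma_rows[0::2], chroma_rows[1::2]):
        # Add 1 before halving so the average rounds to nearest.
        converted.append([(a + b + 1) // 2 for a, b in zip(top, bottom)])
    return converted


def frame_bytes(width, height, fmt):
    """Bytes per frame for the formats named in the data paths above."""
    bits_per_pixel = {"ycbcr422": 16, "ycbcr420": 12, "rgb565": 16}[fmt]
    return width * height * bits_per_pixel // 8


print(chroma_422_to_420([[100, 200], [102, 201]]))
print(frame_bytes(352, 288, "ycbcr422"), frame_bytes(352, 288, "ycbcr420"))
```

    For a CIF (352x288) frame, the conversion reduces the data handed to the encoder from 202,752 to 152,064 bytes per frame.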

    More information on utilizing Intel Quick Capture technology

    is available in the application note titled Intel Quick Capture

    Technology for the Intel PXA27x Processor Application Note.

    5. Wireless Intel SpeedStep Technology

    To help maintain system battery-life with increased performance,

    the Intel PXA27x processor includes Wireless Intel SpeedStep

    Technology, which dynamically optimizes application performance

    and power usage to extend battery life for phones and PDAs.

    This technology includes:

    Five low-power states

    Ability to change voltage and frequency dynamically

    Wireless Intel SpeedStep power manager software

    When running code, the processor operates in normal mode.

    Additional low-power modes and their uses are summarized in

    the following table:

    POWER STATE   USAGE
    Idle          Processor clocks stopped with near-instantaneous
                  resumption. LCD may continue to be refreshed via DMA.
    Deep Idle     Core and (optionally) peripheral PLLs disabled.
                  LCD may continue to be refreshed via DMA.
    Standby       Processor and peripheral state retained.
    Sleep         No state retained except for general-purpose
                  IO (GPIO). SRAM contents may be retained.
    Deep Sleep    No state retained. SRAM contents may be retained.

    Figure 4: Example of Intel Quick Capture technology data streams
    (self-preview video stream, outgoing video encode stream, and decode
    of incoming baseband video stream; the Intel Wireless MMX technology
    routines shown are tentative and subject to change without notice)


    The processor supports dynamic runtime scaling of internal

    core and bus frequencies, as well as core voltage. Commands

    to change the voltage to an external power-management IC

    (PMIC) chip are sent via a dedicated I2C interface. Other

    low-power features of the processor include internal SRAM

    powered at 1.1V, and support for SDRAM down to 1.8V.

    Intel provides the Wireless Intel SpeedStep power manager in

    all BSPs. This software solution optimizes the usage of low-

    power capabilities listed above to help maximize battery life for

    phone standby time, talk time, and when running applications.

    To include this solution, OEMs/ODMs should make sure

    hardware systems support the power manager, include the

    power manager in BSPs, and modify the platform-specific layer

    for phones and PDAs. The power manager will adapt the

    power policy to workloads and usage scenarios automatically,

    without user intervention. OS vendors are not required to

    modify operating systems, and ISVs do not need to modify

    applications.

    However, the power manager provides an Applications API

    so that ISVs can enhance or fine-tune applications to achieve

    power savings. The power manager also provides a User API

    so end users can control the power policy.

    The Wireless Intel SpeedStep Power Manager consists of five

    software components or modules as shown in Figure 5.

    The policy manager determines the system power policy

    using several inputs, then uses OS services or its own services

    to dynamically scale power and performance. The policy

    includes defining the new operating system states and

    desired frequency and voltages for the processor based on

    the workload.

    The policy manager also determines the operating system

    power state. If this state is not supported by the specific

    operating system then the power manager will create the

    state and use the driver interface to transition the operating

    system into the new state.

    The idle profiler monitors operating system idle activity

    for a given workload and provides input

    to the policy manager.

    The performance profiler monitors the CPU percent utilization

    and memory usage through the performance-monitoring

    unit (PMU), a feature in Intel XScale microarchitecture. This

    profiler determines if the workload is CPU-bound and/or

    memory-bound, then provides input to the policy maker in the

    policy manager.

    The user interface allows users to tune the parameters

    used to determine the power policy.

    The operating system mapping layer allows the power

    manager to be ported across multiple operating systems.
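
    The interaction between the profilers and the policy manager can be sketched in a few lines. The example below is a hypothetical policy, not the algorithm used by the Wireless Intel SpeedStep power manager: it picks the lowest core frequency that keeps estimated utilization at or below a target, and it does not scale up when the workload is memory-bound. The operating points are the core speeds mentioned in this paper; the target value and all names are assumptions.

```python
OPERATING_POINTS_MHZ = [208, 312, 416, 520, 624]  # core speeds named in this paper
TARGET_UTILIZATION_PERCENT = 80                   # hypothetical policy target


def choose_frequency(current_mhz, utilization_percent, memory_bound=False):
    """Pick a core frequency from profiler inputs (illustrative policy only)."""
    if not 0 <= utilization_percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    # Frequency at which the same work would hit the target utilization.
    demand_mhz = current_mhz * utilization_percent / TARGET_UTILIZATION_PERCENT
    if memory_bound:
        # A faster core does not help a memory-bound workload.
        demand_mhz = min(demand_mhz, current_mhz)
    for point in OPERATING_POINTS_MHZ:
        if point >= demand_mhz:
            return point
    return OPERATING_POINTS_MHZ[-1]


print(choose_frequency(416, 20))                      # light load: scale down
print(choose_frequency(208, 100))                     # CPU-bound: scale up
print(choose_frequency(208, 100, memory_bound=True))  # memory-bound: hold
```

    A real policy would also weigh idle-profiler input, user settings, and the cost of each frequency and voltage transition.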

    Figure 5: Wireless Intel SpeedStep Power Manager architecture
    (applications; policy manager; idle profiler; performance profiler;
    user settings; OS mapping; operating system power management,
    services, and scheduler; hardware including keypad, audio, display,
    comm, USB, battery, and PMU)

    Wireless Intel SpeedStep Technology Power Manager provides

    a device driver interface and an application interface so that the

    power policy is optimized, comprehensive and robust. All the

    device drivers must register with the power manager through

    the device driver APIs to get notification on all of the power

    management events, such as state transitions, frequency

    change and voltage change.

    If the state is supported by the operating system, then the

    power manager uses the operating system interface to notify

    the device drivers. Otherwise, the device driver interface

    created by the power manager is used.

    Upon receiving a callback for a power management state

    transition or event, the device driver needs to transition to the

    new state and prepare the device for the next state transition.

    Device drivers have the flexibility to request a state change.

    For example, the battery device driver provides input into the

    policy manager for state changes based on thresholds for the

    battery status.
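
    The registration and callback flow described above can be sketched as follows. The class, method, and event names are hypothetical and stand in for the power manager's device driver APIs, whose actual signatures are documented in the Intel BSPs and the power manager application note.

```python
class PowerManager:
    """Hypothetical stand-in for the power manager's device driver interface."""

    def __init__(self):
        self._callbacks = {}

    def register_driver(self, name, callback):
        """Register a driver so that it receives power management events."""
        self._callbacks[name] = callback

    def notify(self, event, value):
        """Deliver an event to every registered driver."""
        for callback in self._callbacks.values():
            callback(event, value)

    def request_state_change(self, requester, new_state):
        """Let a registered driver ask for a state transition."""
        if requester not in self._callbacks:
            raise KeyError(f"{requester} is not registered")
        self.notify("state_transition", new_state)


def make_driver(log):
    """Build a callback that records the events a driver receives."""
    def on_power_event(event, value):
        log.append((event, value))
    return on_power_event


pm = PowerManager()
lcd_log, battery_log = [], []
pm.register_driver("lcd", make_driver(lcd_log))
pm.register_driver("battery", make_driver(battery_log))

# The battery driver requests standby, then a frequency change is broadcast.
pm.request_state_change("battery", "standby")
pm.notify("frequency_change", 208)
print(lcd_log)
```

    In this sketch every registered driver sees every event, mirroring the requirement that all device drivers register to be notified of state transitions and of frequency and voltage changes.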

    The software components of the communications subsystem

    are shown in Figure 6, illustrating the link between the

    Application Subsystem and the Communications Subsystem,

    which uses the serial or SSP Intel Mobile Scalable Link

    (Intel MSL) in the Intel PXA27x processor.

    The power management component of the communications

    software (including the L1, L2 and L3 protocol layers) is

    responsible for managing the power for the communications

    subsystem for each state for GSM and GPRS. The

    communications device driver running on the applications

    subsystem is a client of Wireless Intel SpeedStep Technology

    Power Manager and the OS power management, and receives

    notifications from each on appropriate state transitions.

    Example: When the OS goes into the standby state, the

    communications driver is notified of this state change and

    signals the communications power management component

    about the state change. The communications subsystem is

    then put into the low-power standby state and prepares to

    wake up when a new state requires processing on the Applications

    Subsystem.

    Similarly, for dynamic performance and power scaling, the

    communications device driver is notified about frequency or

    voltage change, which causes the appropriate communications

    software to be notified.

    Details about the Intel PXA27x processor power states can

    be found in the Intel PXA27x Processor Developer Manuals.

    Detailed information about the power manager can be found

    in a Wireless Intel SpeedStep Technology Power Manager

    application note, as well as in Intel BSPs.

    6. Intel Integrated Performance Primitives (Intel IPP)

    Intel IPP is a performance library of optimized algorithms

    created to ease development of optimized applications. Intel

    IPP includes general signal and image processing primitives, as

    well as primitives that can be used to construct internationally

    standardized audio, video, image, and speech encoder/

    decoders (codecs) optimized for the Intel PXA27x processor

    and Intel Wireless MMX technology.

    Figure 6: Communications subsystem

    Intel IPP also supports application porting across certain

    Intel platforms. Intel IPP supports a consistent Application

    Programming Interface (API), so applications can be ported

    easily across Intel server, desktop, and handheld processors

    without sacrificing performance.

    The following primitives are available for general one-

    dimensional (1D) signal processing:

    Vector initialization, arithmetic, statistics, thresholding,

    and measure

    Deterministic and random signal generation

    Convolution, filtering, windowing, and transforms

    Primitives for general two-dimensional (2D) image processing

    include the following:

    Vector initialization, arithmetic, statistics, thresholding,

    and measure

    Color space conversions

    Morphological operations

    Convolution, filtering, windowing, and transforms

    Cryptography

    Primitives for image capture:

    Color format conversions: YCbCr 4:2:2 to 4:2:0

    Gamma correction

    Image scaling

    Frame stabilization

    Additional primitives are available that allow construction of

    the following multimedia codecs:

    Video: ITU H.263 decoder, ISO/IEC 14496-2 MPEG-4 decoder

    Audio: ISO/IEC 11172-3 and 13818-3 (MPEG-1, -2) Layer 3 (MP3) decoder

    Speech: ITU-T G.723.1 codec and ETSI GSM-AMR codec

    Image: ISO/IEC JPEG codec

    A companion library, Intel Graphics Performance Primitives

    (Intel GPP), includes primitives for 2D and 3D graphics. Intel

    GPP includes the following optimized primitives:

    Data-type conversion

    Fixed-point arithmetic, including trigonometric functions

    Vector and matrix operations

    Rasterization

    For more information about Intel IPP, visit

    http://www.intel.com/software/products/ipp/.

    Figure 7: Intel Software Development tools
    (compiler system: C/C++ compiler, assembler, linker, C and C++
    libraries, FPU library, and intrinsic functions; XDB debugger
    platform: simulator, JTAG, ROM monitor, and OS debuggers)

    7. Intel Software Development Tools

    Intel Software Development Tools consist of compiler

    and debugger tools for building and debugging system

    platforms and applications. The tools include:

    Compiler system, including compiler, assembler, linker, and

    optimized libraries

    Debuggers for all stages of system software and application

    development including simulators, JTAG and OS-aware

    debuggers

    7.1 Compilers

    Intel compilers are highly optimized for Intel PCA processors.

    The compiler systems generate code for Microsoft Windows

    CE .NET; Microsoft Windows Mobile 2003 software for Pocket

    PCs and Smartphones; Palm OS; Nucleus* OS; and OS-independent

    systems. Important features of the compiler

    systems include:

    Support for Intel Wireless MMX technology using assembly

    language and inline assembly, intrinsics, and a vectorizer

    Linker with optimized C-runtime and floating-point libraries

    Large set of optimization switches

    PNO (Pace Native Objects or ARMlets) support for

    Palm OS v5.x

    Support for Intel Wireless MMX technology instructions includes

    assembly language and inline assembly of those instructions,

    as well as intrinsics and vectorization.

    Intrinsics allow Intel Wireless MMX technology instructions

    to be used in C/C++ code without the use of inline assembly

    language. These macro-like functions help the compiler

    keep track of registers for optimization and ease code

    maintenance.

    Vectorization is a feature that uses Intel Wireless MMX

    technology to vectorize code that is normally scalar. The

    vectorizer identifies code that can be run as SIMD operations

    and attempts to use Intel Wireless MMX technology

    instructions to implement those areas of code. Another way

    to use Intel Wireless MMX technology is to link in code

    developed using the Intel IPP.
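
    The idea behind vectorization can be illustrated with a small model. The sketch below treats a 64-bit word as four signed 16-bit lanes and adds two such words with saturation in one call, next to the equivalent element-by-element scalar loop. It is a software model for illustration only; it does not name or reproduce any specific Intel Wireless MMX technology instruction, and all function names are hypothetical.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Pack four signed 16-bit values into one 64-bit word (lane 0 lowest)."""
    word = 0
    for index, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (index * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into four signed 16-bit values."""
    lanes = []
    for index in range(LANES):
        raw = (word >> (index * LANE_BITS)) & LANE_MASK
        lanes.append(raw - (1 << LANE_BITS) if raw & 0x8000 else raw)
    return lanes


def saturate16(value):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(-32768, min(32767, value))


def scalar_add(a, b):
    """Scalar form: one saturating addition per element."""
    return [saturate16(x + y) for x, y in zip(a, b)]


def packed_add(word_a, word_b):
    """Models a single SIMD operation acting on all four lanes at once."""
    sums = [saturate16(x + y) for x, y in zip(unpack(word_a), unpack(word_b))]
    return pack(sums)


a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(scalar_add(a, b))
print(unpack(packed_add(pack(a), pack(b))))
```

    Both forms give the same result; the difference on real hardware is that the packed form handles four elements per operation, which is the transformation a vectorizer tries to apply to suitable scalar loops.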

    7.2 Debuggers

    The Intel Software Development Tools also provide a

    comprehensive set of debuggers to support all phases

    of development.

    A simulator debugger contains the silicon model with simulated

    peripheral and device registers, eliminating the need for real

    hardware in this phase of the development cycle.

    A JTAG debugger version allows access to the hardware

    through JTAG so that hardware prototypes can be fully tested

    without having to debug client software running on the target.

    The XDB debuggers provide full Flash memory support to

    burn an OS image into board memory.

    The ROM Monitor Debugger is a software solution and is

    mainly used for ISV application debugging.

    Appropriate OS-awareness plug-ins can be loaded either into

    the JTAG debugger or into the ROM Monitor Debugger, so that

    developers can follow the kernel with its task switches, queue

    and semaphore tables, and other OS constructs. A JTAG

    hardware connector on the target system is not required.

    All debuggers share a common GUI and have the same basic

    functionalities, including the following features:

    First class C/C++ debugging support, including breakpoints,

    local variable and memory display, and stepping through code

    by using script language.

    Direct access to registers that control Intel XScale

    technology, Intel Wireless MMX technology and other

    coprocessor registers, and on-chip peripherals. A bit field

    editor and description of all bit fields for each register is

    particularly useful for low-level driver development.

    OS awareness plug-ins allow inspection of tasks, threads,

    and other OS activities.

    Support for scripting for batch files and for automated

    validation allows overnight debugging.

    Execution Trace Support displays code execution history

    that occurred before a breakpoint was encountered.

    System and application developers who use the Windows CE

    .NET environments can use the Intel debugging extensions,

    which are seamlessly integrated into those tools with an

    additional debugger window.

    OEMs developing on Microsoft Windows CE .NET and Windows

    Mobile 2003 typically use Microsoft Platform Builder* to build and

    debug their target platform, and communicate directly to the

    target via JTAG or TCP/IP. The Intel debuggers also provide eXDI

    debugger extensions for access with Macraigor Raven*

    and EPI JTAG tools (available through EPI*).

    ISVs developing applications for Microsoft Windows CE .NET

    and Windows Mobile 2003 can use Microsoft eMbedded Visual

    C++* to build and debug their applications on an OEM or

    ODM's device. eMbedded Visual C++ provides debugger

    extensions to allow direct access of the Intel XScale technology

    peripheral registers on that device.

    The extensions only work, however, if the OEM/ODM builds

    three specific ioctl functions into their system ROM or Flash that

    are part of the Intel BSPs for CE .NET and for Windows Mobile

    2003. System developers need to keep these functions in their

    device BSPs to enable ISVs to develop and showcase

    applications on those devices.

    The Intel XDB Debugger also supports communication over a

    ROM monitor for ISVs developing on off-the-shelf target

    devices. This is currently available for Palm OS only. For Palm

    OS, system developers need to implement the ROM monitor to

    ensure that ISVs can debug applications on these devices.

    Intel Software Development Tools are available in the following

    packages:

    Intel C++ Compiler, For Platform Builder For Microsoft

    Windows CE .NET

    Intel C++ Compiler, For Microsoft eMbedded Visual C++

    Intel C++ Software Development Tool Suite, For Palm OS,

    Symbian OS, Nucleus OS, and OS independent systems

    For more information on the Intel Software Development Tools,

    visit http://www.intel.com/software/products/compilers.

    8. Intel VTune Performance Analyzer

    The Intel VTune Performance Analyzer helps developers to

    optimize software on Intel processors, including Intel XScale

    technology-based processors like the Intel PXA27x processor.

    The Intel VTune analyzer remote collector component identifies

    potential performance issues and provides recommendations

    for improving software utilization of the processor hardware

    and design. This tool works in a host/target development

    environment, with performance data gathered by a data

    collector running on a target development system with Intel

    XScale technology. Software running on Intel applications

    processors can be profiled remotely using the Intel VTune

    performance analyzer GUI on the host system.

    The Intel VTune performance analyzer provides performance

    data from the system level down to the source level. Sampling

    provides developers with the most accurate representation of

    actual software performance, with negligible overhead. Using

    the Performance Monitoring Unit (PMU) in the processor, all

    program activity can be profiled at the microarchitecture level.

    With intimate knowledge of the processor, the Intel VTune

    performance analyzer provides additional analysis of potential

    stalls and latency issues, and suggests techniques for improving

    code performance. The Intel VTune performance analyzer is

    available in beta for Microsoft Windows Mobile 2003 software

    and Linux targets using the Intel PXA27x processor.
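
    The sampling approach described above can be illustrated with a minimal sketch in C. It is not the Intel VTune implementation: it only shows the general idea of attributing sampled program-counter values to functions so that hot spots stand out. The type symbol_t, the functions find_symbol and attribute_samples, and the symbol-table layout are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: a function name and its start address. */
typedef struct {
    const char *name;
    uint32_t    start;
} symbol_t;

/*
 * Return the index of the last symbol whose start address is <= pc,
 * or -1 if pc lies below the first symbol.
 * The table must be sorted by ascending start address.
 */
static int find_symbol(const symbol_t *syms, size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    assert(syms != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= pc)
            lo = mid + 1;               /* pc is at or above this symbol */
        else
            hi = mid;                   /* pc is below this symbol */
    }
    return (int)lo - 1;
}

/*
 * Attribute every sampled program counter to a function.
 * hits[i] receives the number of samples that fell into syms[i].
 * Returns the number of samples that matched no symbol.
 */
static size_t attribute_samples(const symbol_t *syms, size_t n,
                                const uint32_t *pcs, size_t npcs,
                                unsigned *hits)
{
    size_t unmatched = 0;

    for (size_t i = 0; i < n; i++)
        hits[i] = 0;

    for (size_t i = 0; i < npcs; i++) {
        int idx = find_symbol(syms, n, pcs[i]);
        if (idx < 0)
            unmatched++;
        else
            hits[idx]++;
    }
    return unmatched;
}
```

    A function that collects a large share of the samples is a likely hot spot. In a real profiler the samples would come from the target (for example, triggered by PMU events) rather than from an array prepared on the host.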

    Intel provides BSPs for Windows Mobile 2003 software

    (Smartphone and Pocket PC) for Intel PCA processors, and

    the Intel VTune feature is part of the BSPs. To provide Intel

    VTune support within handheld and cell phone devices to ISVs,

    OEMs need to keep Intel VTune support in specific BSPs by

    following these steps:

    Windows Mobile 2003 software-based devices:

    Keep the Intel VTune (PMUDLL.dll) feature in Intel's BSPs

    based on Microsoft Windows Mobile 2003 software (Pocket

    PC and Smartphone).

    Linux-based devices:

    Rebuild the Intel VTune device driver sources once and

    provide the binary as a loadable module to ISVs. No modifications

    to the BSP are required.

    9. Intel PCA Developer Network

    With more than 6,000 individual members, 3,500 companies,

    and over 750 available solutions, the Intel PCA Developer

    Network is a global community of hardware and software

    developers working to accelerate the delivery of next-

    generation wireless Internet applications and client devices.

    The Network supports wireless device and equipment

    manufacturers, application developers and service providers

    in these key areas:

    Software optimization for processors based on Intel XScale

    technology

    Development support

    Tools and technical support for Intel PCA building blocks

    Marketing support and co-marketing opportunities

    Solutions for wireless carriers and operators

    Solutions for the wireless enterprise

    Visit http://developer.intel.com/pca/developernetwork/index.htm

    and see what additional resources and support await you today.

    Performance tests and ratings contained within this document are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/procs/perf/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.

    *Other names and brands may be claimed as the property of others.

    Intel, the Intel logo, Pentium, Intel Centrino, Intel XScale, VTune, Personal Internet Client Architecture, Intel SpeedStep, MMX, and Wireless MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

    Copyright 2004 Intel Corporation. All rights reserved. 0304/MS/MD/PDF Please Recycle 300869-001

    Appendix A.

    Overview of Key System Optimizations

    Refer to the Intel PXA27x Processor Developer

    Manual and the Intel PXA27x Processor

    Optimization Guide for detailed tips on

    optimizations.

    Use internal SRAM to help reduce power and

    increase performance.

    Set CCCR[A] and CLKCFG[B]=1 to maximize

    memory performance.

    Make sure to set the bus arbiter and parking

    settings properly.

    Make sure the device BSP and OS allow the

    presence of the Intel PXA27x processor to be

    detected.

    Make sure the Intel Wireless MMX technology

    coprocessor is enabled in the device BSP.

    Make sure the registers specific to Intel

    Wireless MMX technology are preserved across

    context switches and changes in power state.

    Use the Intel Integrated Performance Primitives

    to develop codecs with optimized performance.

    Use the Intel Graphics Performance Primitives to develop 3D pipelines with optimized

    performance.

    Use the Intel Quick Capture Interface to ease

    interfacing with cameras.

    Use Overlay 2 to perform hardware color

    conversion.

    Port the Wireless Intel SpeedStep Power

    Manager software into the device BSP.

    Use the Wireless Intel SpeedStep Power

    Manager application API in applications to tune

    power consumption.

    Use the Intel Software Development Tools (Intel

    C/C++ compiler) to build applications enabled with Intel Wireless MMX technology.

    Use the Intel XDB Debugger to debug

    applications.

    For OEMs building devices based on Microsoft

    Windows CE .NET, Microsoft Windows Mobile

    2003* software for Pocket PC and for

    Smartphone: build in the specific ioctl() functions

    compatible with Intel XDB Debugger into the

    device system to allow direct debugging using

    Microsoft eMbedded Visual C++.

    For OEMs building devices based on Palm OS:

    build in the ROM monitor compatible with the

    Intel XDB Debugger into your device's system

    to allow debugging.

    Use the Intel VTune Performance Analyzer to

    help profile and optimize applications.

    For OEMs building devices based on Microsoft

    Windows CE .NET, Microsoft Windows Mobile

    2003 software for Pocket PC and for

    Smartphone: keep the Intel VTune library

    (PMUDLL.dll) in your device to enable ISVs to use VTune to tune applications for your device.

    For OEMs building devices based on Linux:

    rebuild the Intel VTune device driver sources

    once and provide the binary as a loadable module to

    ISVs. No modifications on BSPs are required.
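
    The clock-configuration item in the list above (setting CCCR[A] and CLKCFG[B] to 1) can be sketched in C as plain bit manipulation. The bit positions used here (bit 25 for CCCR[A], bit 3 for CLKCFG[B]) are assumptions that should be checked against the Intel PXA27x Processor Developer Manual, and the names clock_regs_t, memory_clocks_maximized, and maximize_memory_clocks are hypothetical. The sketch only computes the new register values; how the registers are actually read and written on the target, and any frequency-change sequence that must accompany the write, are left to the Developer Manual.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; verify against the Developer Manual. */
#define CCCR_A_BIT    (1u << 25)   /* assumption: CCCR[A]   */
#define CLKCFG_B_BIT  (1u << 3)    /* assumption: CLKCFG[B] */

/* Hypothetical container for the two register values involved. */
typedef struct {
    uint32_t cccr;
    uint32_t clkcfg;
} clock_regs_t;

/* Nonzero when both CCCR[A] and CLKCFG[B] are set. */
static int memory_clocks_maximized(clock_regs_t r)
{
    return (r.cccr & CCCR_A_BIT) != 0 && (r.clkcfg & CLKCFG_B_BIT) != 0;
}

/*
 * Return a copy of the register values with CCCR[A] and CLKCFG[B] set,
 * leaving every other bit untouched (read-modify-write style).
 */
static clock_regs_t maximize_memory_clocks(clock_regs_t r)
{
    r.cccr   |= CCCR_A_BIT;
    r.clkcfg |= CLKCFG_B_BIT;
    assert(memory_clocks_maximized(r));
    return r;
}
```

    Keeping the computation separate from the hardware access makes the bit logic easy to check on a host machine before it is placed in the BSP.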

    For more information, visit the Intel Web site at: developer.intel.com