Cell Broadband Engine Programming Handbook Version 1.1 April 24, 2007 Title Page

Title Page

Cell Broadband Engine Programming Handbook

Version 1.1

April 24, 2007

Copyright and Disclaimer

Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2006, 2007.

All Rights Reserved
Printed in the United States of America April 2007

The following are trademarks of International Business Machines Corporation in the United States or other countries, or both.

IBM
ibm.com
IBM Logo
eServer
PowerPC
PowerPC Architecture

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN AS IS BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

IBM Systems and Technology Group
2070 Route 52, Bldg. 330
Hopewell Junction, NY 12533-6351

The IBM home page can be found at ibm.com
The IBM Semiconductor solutions home page can be found at ibm.com/chips


Contents

List of Figures
List of Tables
Preface
  Related Publications
  Conventions and Notation
  Referencing Registers, Fields, and Bit Ranges
  Terminology
  Reserved Regions of Memory and Registers

Revision Log

1. Overview of the Cell Broadband Engine Processor
  1.1 Background
    1.1.1 Motivation
    1.1.2 Power, Memory, and Frequency
    1.1.3 Scope of this Handbook
  1.2 Hardware Environment
    1.2.1 The Processor Elements
    1.2.2 Element Interconnect Bus
    1.2.3 Memory Interface Controller
    1.2.4 Cell Broadband Engine Interface Unit
  1.3 Programming Environment
    1.3.1 Instruction Sets
    1.3.2 Storage Domains and Interfaces
    1.3.3 Byte Ordering and Bit Numbering
    1.3.4 Runtime Environment

2. PowerPC Processor Element
  2.1 PowerPC Processor Unit
  2.2 PowerPC Processor Storage Subsystem
  2.3 PPE Registers
  2.4 PowerPC Instructions
    2.4.1 Data Types
    2.4.2 Addressing Modes
    2.4.3 Instructions
  2.5 Vector/SIMD Multimedia Extension Instructions
    2.5.1 SIMD Vectorization
    2.5.2 Data Types
    2.5.3 Addressing Modes
    2.5.4 Instruction Types
    2.5.5 Instructions
    2.5.6 Graphics Rounding Mode


  2.6 Vector/SIMD Multimedia Extension C/C++ Language Intrinsics
    2.6.1 Vector Data Types
    2.6.2 Vector Literals
    2.6.3 Intrinsics

3. Synergistic Processor Elements
  3.1 Synergistic Processor Unit
    3.1.1 Local Store
    3.1.2 Register File
    3.1.3 Execution Units
    3.1.4 Floating-Point Support
  3.2 Memory Flow Controller
    3.2.1 Channels
    3.2.2 Mailboxes and Signalling
    3.2.3 MFC Commands and Command Queues
    3.2.4 Direct Memory Access Controller
    3.2.5 Synergistic Memory Management Unit
  3.3 SPU Instruction Set
    3.3.1 Data Types
    3.3.2 Instructions
  3.4 SPU C/C++ Language Intrinsics
    3.4.1 Vector Data Types
    3.4.2 Vector Literals
    3.4.3 Intrinsics

4. Virtual Storage Environment
  4.1 Introduction
  4.2 PPE Memory Management
    4.2.1 Memory Management Unit
    4.2.2 Address-Translation Sequence
    4.2.3 Enabling Address Translation
    4.2.4 Effective-to-Real-Address Translation
    4.2.5 Segmentation
    4.2.6 Paging
    4.2.7 Translation Lookaside Buffer
    4.2.8 Real Addressing Mode
    4.2.9 Effective Addresses in 32-Bit Mode
  4.3 SPE Memory Management
    4.3.1 Synergistic Memory Management Unit
    4.3.2 Enabling Address Translation
    4.3.3 Segmentation
    4.3.4 Paging
    4.3.5 Translation Lookaside Buffer
    4.3.6 Real Addressing Mode
    4.3.7 Exception Handling and Storage Protection

5. Memory Map
  5.1 Introduction
    5.1.1 Configuration-Ring Initialization
    5.1.2 Allocated Regions of Memory
    5.1.3 Reserved Regions of Memory
    5.1.4 The Guarded Attribute
  5.2 PPE Memory Map
    5.2.1 PPE Memory-Mapped Registers
    5.2.2 Predefined Real-Address Locations
  5.3 SPE Memory Map
    5.3.1 SPE Local-Store Memory Map
    5.3.2 SPE Memory-Mapped Registers
  5.4 BEI Memory-Mapped Registers
    5.4.1 I/O

6. Cache Management
  6.1 PPE Caches
    6.1.1 Configuration
    6.1.2 Overview of PPE Cache
    6.1.3 L1 Caches
    6.1.4 Branch History Table and Link Stack
    6.1.5 L2 Cache
    6.1.6 Instructions for Managing the L1 and L2 Caches
    6.1.7 Effective-to-Real-Address Translation Arrays
    6.1.8 Translation Lookaside Buffer
    6.1.9 Instruction-Prefetch Queue Management
    6.1.10 Load Subunit Management
  6.2 SPE Caches
    6.2.1 Translation Lookaside Buffer
    6.2.2 Atomic Unit and Cache
  6.3 Replacement Management Tables
    6.3.1 PPE TLB Replacement Management Table
    6.3.2 PPE L2 Replacement Management Table
    6.3.3 SPE TLB Replacement Management Table
  6.4 I/O Address-Translation Caches

7. I/O Architecture
  7.1 Overview
    7.1.1 I/O Interfaces
    7.1.2 System Configurations
    7.1.3 I/O Addressing
  7.2 Data and Access Types
    7.2.1 Data Lengths and Alignments
    7.2.2 Atomic Accesses
  7.3 Registers and Data Structures
    7.3.1 IOCmd Configuration Register
    7.3.2 I/O Segment Table Origin Register
    7.3.3 I/O Segment Table
    7.3.4 I/O Page Table
    7.3.5 IOC Base Address Registers
    7.3.6 I/O Exception Status Register


  7.4 I/O Address Translation
    7.4.1 Translation Overview
    7.4.2 Translation Steps
  7.5 I/O Exceptions
    7.5.1 I/O Exception Causes
    7.5.2 I/O Exception Status Register
    7.5.3 I/O Exception Mask Register
    7.5.4 I/O-Exception Response
  7.6 I/O Address-Translation Caches
    7.6.1 IOST Cache
    7.6.2 IOPT Cache
  7.7 I/O Storage Model
    7.7.1 Memory Coherence
    7.7.2 Storage-Access Ordering
    7.7.3 I/O Accesses to Other I/O Units through an IOIF
    7.7.4 Examples

8. Resource Allocation Management
  8.1 Introduction
  8.2 Requesters
    8.2.1 PPE and SPEs
    8.2.2 I/O
  8.3 Managed Resources
  8.4 Tokens
    8.4.1 Tokens Required for Single-CBE-Processor Systems
    8.4.2 Operations Requiring No Token
    8.4.3 Tokens Required for Multi-CBE-Processor Systems
  8.5 Token Manager
    8.5.1 Request Tracking
    8.5.2 Token Granting
    8.5.3 Unallocated RAG
    8.5.4 High-Priority Token Requests
    8.5.5 Memory Tokens
    8.5.6 I/O Tokens
    8.5.7 Unused Tokens
    8.5.8 Memory Banks, IOIF Allocation Rates, and Unused Tokens
    8.5.9 Token Request and Grant Example
    8.5.10 Allocation Percentages
    8.5.11 Efficient Determination of TKM Priority Register Values
    8.5.12 Feedback from Resources to Token Manager
  8.6 Configuration of PPE, SPEs, MIC, and IOC
    8.6.1 Configuration Register Summary
    8.6.2 SPE Address-Range Checking
  8.7 Changing Resource-Management Registers with MMIO Stores
    8.7.1 Changes to the RAID
    8.7.2 Changing a Requester's Token-Request Enable
    8.7.3 Changing a Requester's Address Map
    8.7.4 Changing a Requester's Use of Multiple Tokens per Access


    8.7.5 Changing Feedback to the TKM
    8.7.6 Changing TKM Registers
  8.8 Latency Between Token Requests and Token Grants
  8.9 Hypervisor Interfaces

9. PPE Interrupts
  9.1 Introduction
  9.2 Summary of Interrupt Architecture
  9.3 Interrupt Registers
  9.4 Interrupt Handling
  9.5 Interrupt Vectors and Definitions
    9.5.1 System Reset Interrupt (Selectable or x'00..00000100')
    9.5.2 Machine Check Interrupt (x'00..00000200')
    9.5.3 Data Storage Interrupt (x'00..00000300')
    9.5.4 Data Segment Interrupt (x'00..00000380')
    9.5.5 Instruction Storage Interrupt (x'00..00000400')
    9.5.6 Instruction Segment Interrupt (x'00..00000480')
    9.5.7 External Interrupt (x'00..00000500')
    9.5.8 Alignment Interrupt (x'00..00000600')
    9.5.9 Program Interrupt (x'00..00000700')
    9.5.10 Floating-Point Unavailable Interrupt (x'00..00000800')
    9.5.11 Decrementer Interrupt (x'00..00000900')
    9.5.12 Hypervisor Decrementer Interrupt (x'00..00000980')
    9.5.13 System Call Interrupt (x'00..00000C00')
    9.5.14 Trace Interrupt (x'00..00000D00')
    9.5.15 VXU Unavailable Interrupt (x'00..00000F20')
    9.5.16 System Error Interrupt (x'00..00001200')
    9.5.17 Maintenance Interrupt (x'00..00001600')
    9.5.18 Thermal Management Interrupt (x'00..00001800')
  9.6 Direct External Interrupts
    9.6.1 Interrupt Presentation
    9.6.2 IIC Interrupt Registers
    9.6.3 SPU and MFC Interrupts
    9.6.4 Other External Interrupts
  9.7 Mediated External Interrupts
    9.7.1 Mediated External Interrupt Architecture
    9.7.2 Mediated External Interrupt Implementation
  9.8 SPU and MFC Interrupts Routed to the PPE
    9.8.1 Interrupt Types and Classes
    9.8.2 Interrupt Registers
    9.8.3 Interrupt Definitions
    9.8.4 Handling SPU and MFC Interrupts
  9.9 Thread Targets for Interrupts
  9.10 Interrupt Priorities
  9.11 Interrupt Latencies
  9.12 Machine State Register Settings Due to Interrupts
  9.13 Interrupts and Hypervisor
  9.14 Interrupts and Multithreading


9.15 Checkstop
9.16 Use of an External Interrupt Controller
9.17 Relationship Between CBE and PowerPC Interrupts

10. PPE Multithreading
10.1 Multithreading Guidelines
10.2 Thread Resources
10.2.1 Registers
10.2.2 Arrays, Queues, and Other Structures
10.2.3 Pipeline Sharing and Support for Multithreading
10.3 Thread States
10.3.1 Privilege States
10.3.2 Suspended or Enabled State
10.3.3 Blocked or Stalled State
10.4 Thread Control and Status Registers
10.4.1 Machine State Register (MSR)
10.4.2 Hardware Implementation Register 0 (HID0)
10.4.3 Logical Partition Control Register (LPCR)
10.4.4 Control Register (CTRL)
10.4.5 Thread Status Register Local and Remote (TSRL and TSRR)
10.4.6 Thread Switch Control Register (TSCR)
10.4.7 Thread Switch Time-Out Register (TTR)
10.5 Thread Priority
10.5.1 Thread-Priority Combinations
10.5.2 Choosing Useful Thread Priorities
10.5.3 Examples of Priority Combinations on Instruction Scheduling
10.6 Thread Control and Configuration
10.6.1 Resuming and Suspending Threads
10.6.2 Setting the Instruction-Dispatch Policy: Thread Priority and Temporary Stalling
10.6.3 Preventing Starvation: Forward-Progress Monitoring
10.6.4 Multithreading Operating-State Switch
10.7 Pipeline Events and Instruction Dispatch
10.7.1 Instruction-Dispatch Rules
10.7.2 Pipeline Events that Stall Instruction Dispatch
10.8 Suspending and Resuming Threads
10.8.1 Suspending a Thread
10.8.2 Resuming a Thread
10.8.3 Exception and Interrupt Interactions With a Suspended Thread
10.8.4 Thread Targets and Behavior for Interrupts

11. Logical Partitions and a Hypervisor
11.1 Introduction
11.1.1 The Hypervisor and the Operating Systems
11.1.2 Partitioning Resources
11.1.3 An Example Flowchart
11.2 PPE Logical-Partitioning Facilities
11.2.1 Enabling Hypervisor State
11.2.2 Hypervisor-State Registers


11.2.3 Controlling Real Memory
11.2.4 Controlling Interrupts and Environment
11.3 SPE Logical-Partitioning Facilities
11.3.1 Access Privilege
11.3.2 Memory-Management Facilities
11.3.3 Controlling Interrupts
11.3.4 Other SPE Management Facilities
11.4 I/O-Address Translation
11.4.1 IOC Memory Management Units
11.4.2 I/O Segment and Page Tables
11.5 Resource Allocation Management
11.5.1 Combining Logical Partitions with Resource Allocation
11.5.2 Resource Allocation Groups and the Token Manager
11.6 Power Management
11.6.1 Entering Low-Power States
11.6.2 Thread State Suspension and Resumption
11.7 Fault Isolation
11.8 Code Sample
11.8.1 Error Codes and Hypervisor-Call (hcall) Tokens
11.8.2 C Functions for PowerPC 64-bit ELF Hypervisor Call

12. SPE Context Switching
12.1 Introduction
12.2 Data Structures
12.2.1 Local Store Context Save Area
12.2.2 Context Save Area
12.3 Overview of SPE Context-Switch Sequence
12.3.1 Save SPE Context
12.3.2 Restore SPE Context
12.4 Implementation Considerations
12.4.1 Locking
12.4.2 Watchdog Timers
12.4.3 Waiting for Events
12.4.4 PPE's SPU Channel Access Facility
12.4.5 SPE Interrupts
12.4.6 Suspending the MFC DMA Queue
12.4.7 SPE Context-Save Sequence and Context-Restore Sequence Code
12.4.8 SPE Parameter Passing
12.4.9 Storage for SPE Context-Save Sequence and Context-Restore Sequence Code
12.4.10 Harvesting an SPE
12.4.11 Scheduling
12.4.12 Light-Weight SPE Context Save
12.5 Detailed Steps for SPE Context Switch
12.5.1 Context-Save Sequence
12.5.2 Context-Restore Sequence
12.6 Considerations for Hypervisors


13. Time Base and Decrementers
13.1 Introduction
13.2 Time-Base Facility
13.2.1 Clock Domains
13.2.2 Time-Base Registers
13.2.3 Time-Base Frequency
13.2.4 Time-Base Sync Mode Controls
13.2.5 Reading and Writing the TB Register
13.2.6 Computing Time-of-Day
13.3 Decrementers
13.3.1 PPE Decrementers
13.3.2 SPE Decrementers
13.3.3 Using an SPU Decrementer to Monitor SPU Code Performance

14. Objects, Executables, and SPE Loading
14.1 Introduction
14.2 ELF Overview and Extensions
14.2.1 Overview
14.2.2 SPE-ELF Extensions
14.3 Runtime Initializations and Requirements
14.3.1 PPE Initial Machine State
14.3.2 SPE Initial Machine State for Linux
14.4 Linker Requirements
14.4.1 SPE Linker Requirements
14.4.2 PPE Linker Requirements
14.5 The CESOF Format
14.5.1 CESOF Overview
14.5.2 CESOF Use Convention of ELF
14.5.3 Embedding an SPE-ELF Executable in a PPE-ELF Object: The .spu.elf Section
14.5.4 The spe_program_handle Data Structure
14.5.5 The TOE: Accessing Symbol Values Defined in EA Space
14.5.6 Future Software Tool Chain Enhancements for CESOF
14.6 SPE Runtime Loader
14.6.1 Runtime Loader Overview
14.6.2 SPE Runtime Loader Requirements
14.6.3 Example SPE Runtime Loader Framework Definition
14.7 SPE Execution Environment
14.7.1 Signal Types for the SPE Stop-and-Signal Instruction

15. Power and Thermal Management
15.1 Power Management
15.1.1 CBE Slow State
15.1.2 PPE Pause (0) State
15.1.3 SPU Pause State
15.1.4 MFC Pause State
15.2 Thermal Management
15.2.1 Thermal-Management Operation
15.2.2 Configuration-Ring Settings
15.2.3 Thermal Registers
15.2.4 Thermal Sensor Status Registers
15.2.5 Thermal Sensor Interrupt Registers
15.2.6 Dynamic Thermal-Management Registers

16. Performance Monitoring
16.1 How It Works
16.2 Events (Signals)
16.3 Performance Counters
16.4 Trace Array

17. SPE Channel and Related MMIO Interface
17.1 Introduction
17.1.1 An SPE's Use of its Own Channels
17.1.2 Access to Channel Functions by the PPE and other SPEs
17.1.3 Channel Characteristics
17.1.4 Channel Summary
17.1.5 Channel Instructions
17.1.6 Channel Capacity and Blocking
17.2 SPU Event-Management Channels
17.3 SPU Signal-Notification Channels
17.4 SPU Decrementer
17.4.1 SPU Write Decrementer Channel
17.4.2 SPU Read Decrementer Channel
17.5 MFC Write Multisource Synchronization Request Channel
17.6 SPU Read Machine Status Channel
17.7 SPU Write State Save-and-Restore Channel
17.8 SPU Read State Save-and-Restore Channel
17.9 MFC Command Parameter Channels
17.9.1 MFC Local Storage Address Channel
17.9.2 MFC Effective Address High Channel
17.9.3 MFC Effective Address Low or List Address Channel
17.9.4 MFC Transfer Size or List Size Channel
17.9.5 MFC Command Tag Identification Channel
17.9.6 MFC Class ID and MFC Command Opcode Channel
17.10 MFC Tag-Group Management Channels
17.10.1 MFC Write Tag-Group Query Mask Channel
17.10.2 MFC Read Tag-Group Query Mask Channel
17.10.3 MFC Write Tag Status Update Request Channel
17.10.4 MFC Read Tag-Group Status Channel
17.10.5 MFC Read List Stall-and-Notify Tag Status Channel
17.10.6 MFC Write List Stall-and-Notify Tag Acknowledgment Channel
17.11 MFC Read Atomic Command Status Channel
17.12 SPU Mailbox Channels

18. SPE Events
18.1 Introduction
18.2 Events and Event-Management Channels
18.2.1 Event Conditions and Bit Definitions for Event-Management Channels
18.2.2 Pending Event Register (Internal, SPE-Hidden)
18.2.3 SPU Read Event Status
18.2.4 SPU Write Event Mask
18.2.5 SPU Write Event Acknowledgment
18.2.6 SPU Read Event Mask
18.3 SPU Interrupt Facility
18.4 Interrupt Address Save-and-Restore Channels
18.4.1 SPU Read State Save-and-Restore
18.4.2 SPU Write State Save-and-Restore
18.4.3 Nested Interrupts Using SPU Write State Save-and-Restore
18.5 Event-Handling Protocols
18.5.1 Synchronous Event Handling Using Polling or Stalling
18.5.2 Asynchronous Event Handling Using Interrupts
18.5.3 Protecting Critical Sections from Interruption
18.6 Event-Specific Handling Guidelines
18.6.1 Protocol with Multiple Events Enabled
18.6.2 Procedure for Handling the Multisource Synchronization Event
18.6.3 Procedure for Handling the Privileged Attention Event
18.6.4 Procedure for Handling the Lock-Line Reservation Lost Event
18.6.5 Procedure for Handling the Signal-Notification 1 Available Event
18.6.6 Procedure for Handling the Signal-Notification 2 Available Event
18.6.7 Procedure for Handling the SPU Write Outbound Mailbox Available Event
18.6.8 Procedure for Handling the SPU Write Outbound Interrupt Mailbox Available Event
18.6.9 Procedure for Handling the SPU Decrementer Event
18.6.10 Procedure for Handling the SPU Read Inbound Mailbox Available Event
18.6.11 Procedure for Handling the MFC SPU Command Queue Available Event
18.6.12 Procedure for Handling the DMA List Command Stall-and-Notify Event
18.6.13 Procedure for Handling the Tag-Group Status Update Event
18.7 Developing a Basic Interrupt Handler
18.7.1 Basic Interrupt Protocol Features and Design
18.7.2 FLIH Design
18.7.3 SLIH Design and Registering SLIH Functions
18.7.4 Example Application Code
18.8 Nested Interrupt Handling
18.8.1 Nested Handler Design
18.8.2 FLIH Design for Nested Interrupts
18.9 Using a Dedicated Interrupt Stack
18.10 Sample Applications
18.10.1 SPU Decrementer Event
18.10.2 Tag-Group Status Update Event
18.10.3 DMA List Command Stall-and-Notify Event
18.10.4 MFC SPU Command Queue Available Event
18.10.5 SPU Read Inbound Mailbox Available Event
18.10.6 SPU Signal-Notification Available Event
18.10.7 Lock-Line Reservation Lost Event
18.10.8 Privileged Attention Event


19. DMA Transfers and Interprocessor Communication
19.1 Introduction
19.2 MFC Commands
19.2.1 DMA Commands
19.2.2 DMA List Commands
19.2.3 Synchronization Commands
19.2.4 Command Modifiers
19.2.5 Tag Groups
19.2.6 MFC Command Issue
19.2.7 Replacement Class ID and Transfer Class ID
19.2.8 DMA-Command Completion
19.3 PPE-Initiated DMA Transfers
19.3.1 MFC Command Issue
19.3.2 MFC Command-Queue Control Registers
19.3.3 DMA-Command Issue Status and Errors
19.4 SPE-Initiated DMA Transfers
19.4.1 MFC Command Issue
19.4.2 MFC Command-Queue Monitoring Channels
19.4.3 DMA Command Issue Status and Errors
19.4.4 DMA List Command Example
19.5 Performance Guidelines for MFC Commands
19.6 Mailboxes
19.6.1 Reading and Writing Mailboxes
19.6.2 Mailbox Blocking
19.6.3 Dealing with Anticipated Messages
19.6.4 Uses of Mailboxes
19.6.5 SPU Outbound Mailboxes
19.6.6 SPU Inbound Mailbox
19.7 Signal Notification
19.7.1 SPU Signalling Channels
19.7.2 Uses of Signaling
19.7.3 Mode Configuration
19.7.4 SPU Signal Notification 1 Channel
19.7.5 SPU Signal Notification 2 Channel
19.7.6 Sending Signals
19.7.7 Receiving Signals
19.7.8 Differences Between Mailboxes and Signal Notification

20. Shared-Storage Synchronization
20.1 Shared-Storage Ordering
20.1.1 Storage Model
20.1.2 PPE Ordering Instructions
20.1.3 SPU Ordering Instructions
20.1.4 MFC Ordering Mechanisms
20.1.5 MFC Multisource Synchronization Facility
20.1.6 Scenarios for Using Ordering Mechanisms
20.2 PPE Atomic Synchronization
20.2.1 Atomic Synchronization Instructions
20.2.2 PPE Synchronization Primitives
20.2.3 SPE Synchronization Primitives
20.3 SPE Atomic Synchronization
20.3.1 MFC Commands for Atomic Updates
20.3.2 The MFC Read Atomic Command Status Channel
20.3.3 Avoiding Livelocks
20.3.4 Synchronization Primitives

21. Parallel Programming
21.1 Challenges
21.2 Patterns of Parallel Programming
21.2.1 Terminology
21.2.2 Finding Parallelism
21.2.3 Strategies for Parallel Programming
21.3 Steps for Parallelizing a Program
21.3.1 Step 1: Understand the Problem
21.3.2 Step 2: Choose Programming Tools and Technology
21.3.3 Step 3: Develop High-Level Parallelization Strategy
21.3.4 Step 4: Develop Low-Level Parallelization Strategy
21.3.5 Step 5: Design Data Structures for Efficient Processing
21.3.6 Step 6: Iterate and Refine
21.3.7 Step 7: Fine-Tune
21.4 Levels of Parallelism in the CBE Processor
21.4.1 SIMD Parallelization
21.4.2 Superscalar Parallelization
21.4.3 Hardware Multithreading
21.4.4 Multiple Execution Units
21.4.5 Multiple CBE Processors
21.5 Tools for Parallelization
21.5.1 Language Extensions: Intrinsics and Directives
21.5.2 Compiler Support for Single Shared-Memory Abstraction
21.5.3 OpenMP Directives
21.5.4 Compiler-Controlled Software Cache
21.5.5 Compiler and Runtime Support for Code Partitioning
21.5.6 Thread Library

22. SIMD Programming
22.1 SIMD Basics
22.1.1 Converting Scalar Data to SIMD Data
22.1.2 Approaching SIMD Coding Methodically
22.1.3 Coding for Effective Auto-SIMDization
22.2 Auto-SIMDizing Compilers
22.2.1 Motivation and Challenges
22.2.2 Examples of Invalid and Valid SIMDization
22.3 SIMDization Framework for a Compiler
22.3.1 Phase 1: Basic-Block Aggregation
22.3.2 Phase 2: Short-Loop Aggregation
22.3.3 Phase 3: Loop-Level Aggregation
22.3.4 Phase 4: Alignment Devirtualization
22.3.5 Phase 5: Length Devirtualization
22.3.6 Phase 6: SIMD Code Generation and Instruction Scheduling
22.3.7 SIMDization Example: Multiple Sources of SIMD Parallelism
22.3.8 SIMDization Example: Multiple Data Lengths
22.3.9 Vector Operations and Mixed-Mode SIMDization
22.4 Other Compiler Optimizations
22.4.1 OpenMP
22.4.2 Subword Data Types
22.4.3 Backend Scheduling for SPEs
22.4.4 Interacting with Typical Optimizations

23. Vector/SIMD Multimedia Extension and SPU Programming
23.1 Architectural Differences
23.1.1 Registers
23.1.2 Data Types
23.1.3 Instruction-Set Differences
23.2 Porting SIMD Code from the PPE to the SPEs
23.2.1 Code-Mapping Considerations
23.2.2 Simple Macro Translation
23.2.3 Full Functional Mapping
23.2.4 Code-Portability Typedefs
23.2.5 Compiler-Target Definition

24. SPE Programming Tips
24.1 DMA Transfers
24.1.1 Initiating DMA Transfers from SPEs
24.1.2 Overlapping DMA Transfers and Computation
24.1.3 DMA Transfers and LS Accesses
24.2 SPU Pipelines and Dual-Issue Rules
24.3 Eliminating and Predicting Branches
24.3.1 Function-Inlining and Loop-Unrolling
24.3.2 Predication Using Select-Bits Instruction
24.3.3 Branch Hints
24.3.4 Program-Based Branch Prediction
24.3.5 Profile or Linguistic Branch-Prediction
24.3.6 Software Branch-Target Address Cache
24.3.7 Using Control Flow to Record Branch History
24.4 Loop Unrolling and Pipelining
24.5 Offset Pointers
24.6 Transformations and Table Lookups
24.6.1 The Shuffle-Bytes Instruction
24.6.2 Fast SIMD 8-Bit Table Lookups
24.7 Integer Multiplies
24.8 Scalar Code
24.8.1 Scalar Loads and Stores
24.8.2 Promoting Scalar Data Types to Vector Data Types
24.9 Unaligned Loads


Appendix A. PPE Instruction Set and Intrinsics
A.1 PowerPC Instruction Set
A.1.1 Data Types
A.1.2 PPE Instructions
A.1.3 Microcoded Instructions
A.2 PowerPC Extensions in the PPE
A.2.1 New PowerPC Instructions
A.2.2 Implementation-Dependent Interpretation of PowerPC Instructions
A.2.3 Optional PowerPC Instructions Implemented
A.2.4 PowerPC Instructions Not Implemented
A.2.5 Endian Support
A.3 Vector/SIMD Multimedia Extension Instructions
A.3.1 Data Types
A.3.2 Vector/SIMD Multimedia Extension Instructions
A.3.3 Graphics Rounding Mode
A.4 C/C++ Language Extensions (Intrinsics) for Vector/SIMD Multimedia Extensions
A.4.1 Vector Data Types
A.4.2 Vector Literals
A.4.3 Intrinsics
A.5 Issue Rules
A.6 Pipeline Stages
A.6.1 Instruction-Unit Pipeline
A.6.2 Vector/Scalar Unit Issue Queue
A.6.3 Stall and Flush Points
A.7 Compiler Optimizations
A.7.1 Instruction Arrangement
A.7.2 Avoiding Slow Instructions and Processor Modes
A.7.3 Avoiding Dependency Stalls and Flushes
A.7.4 General Recommendations

Appendix B. SPU Instruction Set and Intrinsics
B.1 SPU Instruction Set
B.1.1 Data Types
B.1.2 Instructions
B.1.3 Fetch and Issue Rules
B.1.4 Inline Prefetch and Instruction Runout
B.2 C/C++ Language Extensions (Intrinsics) for SPU Instructions
B.2.1 Vector Data Types
B.2.2 Vector Literals
B.2.3 Intrinsics
B.2.4 Inline Assembly
B.2.5 Compiler Directives

Appendix C. Performance Monitor Signals
C.1 Selecting Performance Monitor Signals on the CBE Debug Bus
C.1.1 An Example of Setting up the Performance Monitor in PPSS L2 Mode A
C.2 PowerPC Processor Unit (PPU) Signal Selection
C.2.1 PPU Instruction Unit
C.2.2 PPU Execution Unit (NClk)
C.3 PowerPC Storage Subsystem (PPSS) Signal Selection
C.3.1 PPSS Bus Interface Unit (NClk/2)
C.3.2 PPSS L2 Cache Controller - Group 1 (NClk/2)
C.3.3 PPSS L2 Cache Controller - Group 2 (NClk/2)
C.3.4 PPSS L2 Cache Controller - Group 3 (NClk/2)
C.3.5 PPSS Noncacheable Unit (NClk/2)
C.4 Synergistic Processor Unit (SPU)