This document contains data derived from functional simulations and performance estimates. LSI Logic has not verified either the functional descriptions, or the electrical and mechanical specifications using production parts.
Document DB14-000009-01, Second Edition (October 1996)
This document describes Revision B of LSI Logic Corporation’s MiniRISC™ CW400x Microprocessor Cores and will remain the official reference source for all revisions/releases of this product until rescinded by an update.
To receive product literature, call us at 1-800-574-4286 (or 415-940-6877 outside the U.S. and Canada) and ask for Department JDS; or visit us at http://www.lsilogic.com.
LSI Logic Corporation reserves the right to make changes to any products herein at any time without notice. LSI Logic does not assume any responsibility or liability arising out of the application or use of any product described herein, except as expressly agreed to in writing by LSI Logic; nor does the purchase or use of a product from LSI Logic convey a license under any patent rights, copyrights, trademark rights, or any other of the intellectual property rights of LSI Logic or third parties.
Copyright © 1995, 1996 by LSI Logic Corporation. All rights reserved.
TRADEMARK ACKNOWLEDGMENT
LSI Logic logo design, MDE, Modular Design Environment, and CoreWare are registered trademarks and C-MDE, MiniRISC, MiniSIM, Right-First-Time, and Self-Embedding are trademarks of LSI Logic Corporation. MIPS is a trademark of MIPS Technologies, Inc. SPARC is a registered trademark of SPARC International, Inc. UNIX is a registered trademark of X/Open Company Limited. Verilog is a registered trademark of Cadence Design Systems, Inc. All other brand and product names may be trademarks of their respective companies.
Preface
This book is the primary reference and Technical Manual for the MiniRISC CW400x Microprocessor Core. It contains a complete functional description for the core and includes complete physical and electrical specifications for the core.
Audience
This document assumes that you have some familiarity with microprocessors and related support devices. This book is written for:
♦ Engineers and managers who are evaluating the processor for possible use in a system
♦ Engineers who are designing the processor into a system
Organization
This document has the following chapters and appendixes:
♦ Chapter 1, Introduction
♦ Chapter 2, Function
♦ Chapter 3, Signals
♦ Chapter 4, Instructions
♦ Chapter 5, Exception Processing (CP0)
♦ Chapter 6, Required External Modules
♦ Chapter 7, Interfaces
♦ Chapter 8, Methodologies and Layout Guidelines
Related Publications
MiniRISC™ Building Blocks Technical Manual, Doc No. DB14-000022-00, Order Number C14031
CW33300 Enhanced Self-Embedding™ Processor Core User’s Manual, Order No. C14014
LR33000 Family Instruction Set Guide, Doc No. MT72-000102-99, Order Number J14029
Conventions Used in This Manual
The first time a word or phrase is defined in this manual, it is italicized.
The following signal naming conventions are used throughout this manual:
♦ A level-significant signal that is true or valid when the signal is LOW always has an overbar over its name and ends with an “N.”
♦ An edge-significant signal that initiates actions on a HIGH-to-LOW transition always has an overbar over its name and ends with an “N.”
♦ A level-significant signal that is true or valid when the signal is HIGH always ends with a “P.”
♦ An edge-significant signal that initiates actions on a LOW-to-HIGH transition always ends with a “P.”
The word assert means to drive a signal true or active. The word deassert means to drive a signal false or inactive.
Hexadecimal numbers are indicated by the prefix “0x” before the number (for example, 0x32CF). Binary numbers are indicated by a subscripted “2” following the number (for example, 0011.0010.1100.1111₂).
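As a quick check of the two notations, the following Python sketch converts the hexadecimal example above into the dotted binary form. The function name to_dotted_binary and the 16-bit default width are illustrative assumptions and are not part of this manual.

```python
def to_dotted_binary(value: int, bits: int = 16) -> str:
    """Format value as a binary string in 4-bit groups separated by periods."""
    digits = format(value, "0{}b".format(bits))
    return ".".join(digits[i:i + 4] for i in range(0, bits, 4))


print(to_dotted_binary(0x32CF))  # 0011.0010.1100.1111
```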
Contents
Chapter 1 Introduction
  1.1 System Overview
  1.2 Features Summary
  1.3 MiniRISC Product Family
  1.4 CoreWare Program
    1.4.1 CoreWare Building Blocks
    1.4.2 Design Environment
    1.4.3 Expert Support
Chapter 2 Function
  2.1 Microprocessor Overview
  2.2 Functional Differences from the R3000 and the R4000 CPUs
  2.3 Pipeline Architecture
  2.4 Load Delay Slot
  2.5 Branch Delay Slot
  2.6 Load Scheduling Support
  2.7 WAITI Instruction: Power Saving Feature
Chapter 3 Signals
Chapter 4 Instructions
  4.1 Instruction Formats
  4.2 CW400x Opcode Bit Encoding
  4.3 Instruction Summary
  4.4 Load and Store Instructions
  4.5 Computational Instructions
  4.6 Jump and Branch Instructions
  4.7 Branch Likely Instructions
  4.8 Special Control Instructions
  4.9 Trap Instructions
  4.10 Coprocessor Instructions
  4.11 System Control Coprocessor (CP0) Instructions
Chapter 5 Exception Processing (CP0)
  5.1 Exception Handling Registers
    5.1.1 Status Register (R12)
    5.1.2 Cause Register (R13)
    5.1.3 Exception Program Counter (EPC) Register (R14)
    5.1.4 Processor Revision Identifier (PRId) Register (R15)
  5.2 Exception Processing
    5.2.1 Exception Vector Locations
    5.2.2 Status Register Mode Bits and Exception Processing
    5.2.3 System Control Coprocessor (CP0) Function
    5.2.4 Register Accesses
    5.2.5 Exception Handling
  5.3 Exception Description Details
    5.3.1 Address Error Exception
    5.3.2 Breakpoint Exception
    5.3.3 Bus Error Exception
    5.3.4 Coprocessor Unusable Exception
    5.3.5 Interrupt Exception
    5.3.6 Overflow Exception
    5.3.7 Reserved Instruction Exception
    5.3.8 Reset Exception
    5.3.9 System Call Exception
    5.3.10 Trap Exception
Chapter 6 Required External Modules
  6.1 Global Output Enable Module (GOE)
    6.1.1 Function
    6.1.2 Signals
    6.1.3 Connecting to the CW400x and Building Blocks
  6.2 MMU Stub
    6.2.1 Function
    6.2.2 Signals
    6.2.3 Connecting to the CW400x
Chapter 7 Interfaces
  7.1 CBus Interface
    7.1.1 Bus Stealing
    7.1.2 Interface Signals
    7.1.3 Operation and Functional Waveforms
  7.2 FlexLink Interface
    7.2.1 Interface Signals
    7.2.2 Computational Unit Instructions
    7.2.3 Operation and Functional Waveforms
Chapter 8 Methodologies and Layout Guidelines
  8.1 Clocking Methodology
    8.1.1 Duty Cycle
    8.1.2 Local Clock Buffers
    8.1.3 Gated Clocks
    8.1.4 Delayed Clocks
    8.1.5 Hold Time Margin
  8.2 Scan Methodology
    8.2.1 Methodology
    8.2.2 Regeneration (Recommended Methodology)
    8.2.3 Core ATPG Shell
    8.2.4 CW400x ATPG Guidelines
    8.2.5 MMU ATPG Guidelines
    8.2.6 MDU ATPG Guidelines
  8.3 Layout Guidelines
    8.3.1 Hardmac I/O Placement
    8.3.2 Data Bus
    8.3.3 CW400x Placement
    8.3.4 BBCC Placement
    8.3.5 Computational Unit Placement
    8.3.6 MMU Placement
    8.3.7 Coprocessor Placement
    8.3.8 Global Output Enable (GOE) Placement
    8.3.9 Cache RAMs Placement
    8.3.10 Tagmatch Placement
    8.3.11 Write Buffer Placement
    8.3.12 B-Bus Device Placement
Appendix A Structural ALU Improper Unknown Value (X) Handling
Customer Feedback
Figures
  1.1 CW400x in a Typical System
  2.1 CW400x Internal Block Diagram
  2.2 CW400x Pipeline
  2.3 CW400x Pipeline with X2 Stall Cycle
  2.4 Three Consecutive Non-Load/Store Instructions
  2.5 Load/Store Instruction
  2.6 Two Consecutive Load/Store Instructions
  2.7 WB to X1 Stage Bypass (No Load Delay Slot Necessary)
  2.8 Branch Taken
  2.9 Branch Not Taken
  2.10 Branch Likely Taken
  2.11 Branch Likely Not Taken
  2.12 Scheduled Load Instruction
  2.13 Scheduled Load Followed by a Second Load
  4.1 I-Type (Immediate) Instruction
  4.2 J-Type (Jump) Instruction
  4.3 R-Type (Register) Instruction
  4.4 Byte Specifications for Loads/Stores
  4.5 WAITI Instruction Waveforms
  5.1 Status Register
  5.2 Cause Register
  5.3 EPC Register
  5.4 PRId Register
  5.5 Status Register Changes During Exception Recognition
  5.6 Restoring Control from Exceptions (RFE Instruction)
  5.7 Typical Pipeline Flow
  5.8 Branch Likely, Branch Not Taken (X1 Stage)
  5.9 X1 Stage Exception (System Call)
  5.10 WB Stage Exception (Overflow)
  5.11 IF Stage Exception (TLB Miss, Instruction)
  5.12 Reset Exception (Special Case)
  5.13 X2 Stage Exception (TLB Miss, Data Load)
  5.14 External Interrupt Signalled During X2 Stage
  5.15 Instruction Bus Error, (X1 Stage)
  5.16 Data Bus Error, (WB Stage)
  5.17 Multiple CKILLMEMP Assertion
  5.18 External Coprocessor (FPU) Interrupt (Interrupt Not Taken)
  5.19 External Coprocessor (FPU) Interrupt (Interrupt Taken)
  5.20 Branch Likely Delay Slot Invalidation
  5.21 Branch Target Address Calculation
  6.1 Basic Functional GOE Design Logic
  6.2 Improved Timing GOE Design Logic
  6.3 Final GOE Design Logic
  6.4 Creation of RUN_INN
  6.5 Creation of CPIPE_RUNN
  6.6 GOE Module Attachments
  6.7 MMU Stub Hard Address Mapping (Hard Map)
  6.8 MMU Stub Attachments
  7.1 Instruction Fetch Examples 1
  7.2 Instruction Fetch Example 2
  7.3 Data Load Example 1
  7.4 Data Load Example 2
  7.5 Data Load Example 3
  7.6 Data Load Example 4
  7.7 Data Store Example 1
  7.8 Data Store Examples 2
  7.9 Opcodes
  7.10 R-Type Arithmetic (Extended) Instruction
  7.11 I-Type Arithmetic (Extended) Instruction
  7.12 Computational Unit Write to CW400x CPU Register
  7.13 Computational Unit Single-Cycle Killed by CKILLXP
  7.14 Computational Unit Operation, Stalled and Killed
  7.15 Two-Cycle Computational Unit Operation (Example 1)
  7.16 Two-Cycle Computational Unit Operation (Example 2)
  7.17 Three-Cycle Computational Unit Operation
  7.18 Stalled Two-Cycle Computational Unit Operation
  7.19 Two-Cycle CU Operation with Writeback (Example 1)
  7.20 Two-Cycle CU Operation with Writeback (Example 2)
  8.1 Two-level Clock Distribution Network
  8.2 Gated Clock Logic
  8.3 Methodology Flowchart
  8.4 Input Pin Schematic for ATPG Shell
  8.5 Output Pin Schematic for ATPG Shell
  8.6 Bidirectional Pin Schematic for ATPG Shell
  8.7 CW400x Hardmac
  8.8 BBCC Hardmac
  8.9 MDU Hardmac
  8.10 CW400x Placement Example
  8.11 BBCC Suggested Placement
  8.12 Computational Unit Suggested Placement
  8.13 MMU (with no CU) Suggested Placement
  8.14 MMU (with CU) Suggested Placement
  8.15 Coprocessor Placement Example 1
  8.16 Coprocessor Placement Example 2
  8.17 Coprocessor Placement Example 3
  8.18 Global Output Enable Suggested Placement
  8.19 Cache RAMs Placement Example
  8.20 Tagmatch Placement
  8.21 Write Buffer Placement Example
  8.22 B-Bus Device Placement Example
Tables
  3.1 Signal Summary
  4.1 Shading Key for Tables 4.2 through 4.6
  4.2 Major Opcode (op) Bit Encoding
  4.3 SPECIAL Minor Opcode funct Bit Encoding
  4.4 REGIMM Minor Opcode rt Bit Encoding
  4.5 COPz (z = 0, 1, 2, 3) rs Minor Opcode Bit Encoding
  4.6 COPz (z = 0, 1, 2, 3) rt Minor Opcode Bit Encoding
  4.7 COP0 Minor Opcode funct Bit Encoding (Bits[25:24] = 1x₂)
  4.8 COPz (z = 1, 2, 3) Minor Opcode funct Bit Encoding (Bits[25:24] = 1x₂)
  4.9 CW400x Instructions
  4.10 Load and Store Instruction Summary
  4.11 ALU Immediate Arithmetic Instruction Summary
  4.12 Three-Operand, Register-Type Arithmetic Instruction Summary
  4.13 Shift Instruction Summary
  4.14 Jump and Branch Instruction Summary
  4.15 Branch Likely Instruction Summary
  4.16 Special Control Instruction Summary
  4.17 Trap Instruction Summary
  4.18 Coprocessor Instruction Summary
  4.19 CP0 Instruction Summary
  5.1 Exception-Processing Register Addresses
  5.2 CW400x Exceptions
  5.3 Exception Vector Locations
  5.4 CP0 Register Addresses
  5.5 Exception Priority
  6.1 Output Enable Decoding
  7.1 CW400x CBus Interface Signals
  7.2 CW400x FlexLink Interface Signals
  7.3 System Logic FlexLink Interface Signals
  8.1 Driver Type and Module Name
  8.2 Hold Time Margin
Chapter 1
Introduction
This chapter introduces the LSI Logic MiniRISC™ CW400x Microprocessor Core.
This chapter contains the following sections:
♦ Section 1.1, “System Overview”
♦ Section 1.2, “Features Summary”
♦ Section 1.3, “MiniRISC Product Family”
♦ Section 1.4, “CoreWare Program”
1.1 System Overview
The MiniRISC CW400x Microprocessor Core family, components of the LSI Logic CoreWare® Library, are exceptionally compact, high-performance microprocessors compatible with the MIPS R4000, including all of the MIPS-I and most of the MIPS-II Instruction Set (for details see Chapter 4). The CW400x can be easily designed into a wide range of products. The CW400x can be combined with industry standard cores and proprietary functional building blocks to create a completely customized embedded system on a chip. LSI Logic currently provides the following optional building blocks:
♦ Multiply/Divide Unit (MDU)
♦ Memory Management Unit (MMU)
♦ Basic Bus Interface Unit and Cache Controller (BBCC)
♦ Timer
These building blocks are described in the MiniRISC Building Blocks Technical Manual. System designers can use these building blocks (unmodified or modified) and/or add their own customized logic to the CW400x Core.
LSI Logic also provides the following external modules (for more information, see Chapter 6):
♦ Global Output Enable Module (GOE)
♦ MMU Stub (to be used if there is no MMU present)
The CW400x has been optimized for low-power and cost-sensitive applications such as portable telecommunications, games, and consumer multimedia systems.
The CW400x FlexLink Interface allows customer-specific microprocessor instructions. The core implements a simple three-stage pipeline and provides a single cache/memory interface for both instructions and data. With a system clock of 60 MHz, the performance of the CW400x is estimated at 45 MIPS sustained. The core implements full scan to achieve greater than 99% fault coverage.
Figure 1.1 shows the CW400x Microprocessor Core and how it interfaces with system logic in a typical customer design.
Figure 1.1 CW400x in a Typical System
[Block diagram. Blocks shown: CW400x (with CBus Interface and FlexLink Interface), MMU or MMU Stub, Coprocessor, MDU, GOE, BIU and Cache Controller (BBCC), Write Buffer, Cache, RAM/ROM, DRAM, DMA, Timer, CBus, and BBus.]
1.2 Features Summary
The CW400x has the following features:
♦ Fully compatible with the MIPS-I and most of the MIPS-II Instruction Set
♦ CW400x-specific Instructions
♦ Configurable, compact, modular design and unified bus architecture
♦ Eliminates the need for a load delay slot
♦ Simple three-stage pipeline: Fetch, Execute, and Writeback
♦ Load/Store Instructions, MFCz, MTCz, CFCz, and CTCz execute in two cycles
♦ All other instructions execute in one cycle
♦ WAITI (Wait for Interrupt) Instruction for power savings
♦ Powerful FlexLink Interface allows customer-specific microprocessor instructions
♦ High-performance Coprocessor Interface for user-definable coprocessors and high-performance hardware FPU
♦ 32-bit memory and cache interfaces
♦ Optional building blocks: Timer, MMU, MDU, BBCC
♦ 3.3-volt operation
♦ Implementation of full scan to achieve 99% fault coverage
♦ 60-MHz worst case commercial maximum clock rate using high-performance 0.5-micron process
♦ 60 MIPS peak, 45 MIPS sustained with standard compiled MIPS code at 60 MHz
♦ Models available: performance and software development, VHDL, Verilog, and gate-level, timing-accurate models
♦ Compatible with the full range of MIPS, third party software development, and System Verification Environment tools
♦ Fully testable in embedded ASIC designs
♦ MR4001 Lead Vehicle chip available with cache, MMU, and MDU
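The peak and sustained figures in the list above imply an average cycles-per-instruction (CPI) value, which the following Python sketch works out. The function names are illustrative. The last step assumes, as a simplification that this manual does not state, that every instruction takes either one or two cycles and that no other stalls occur.

```python
from fractions import Fraction


def average_cpi(clock_mhz: int, sustained_mips: int) -> Fraction:
    """Average cycles per instruction implied by a clock rate and a MIPS rating."""
    return Fraction(clock_mhz, sustained_mips)


def two_cycle_fraction(cpi: Fraction) -> Fraction:
    """Assumed model: a fraction f of instructions takes 2 cycles and the
    rest take 1 cycle, so CPI = 1 + f and therefore f = CPI - 1."""
    return cpi - 1


cpi = average_cpi(60, 45)
print(cpi)                      # 4/3
print(two_cycle_fraction(cpi))  # 1/3
```

Under that assumption, roughly one instruction in three would be a two-cycle instruction in the code mix behind the 45 MIPS estimate.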
1.3 MiniRISC Product Family
The MiniRISC product family has all the necessary tools to develop a system on a chip, including LSI Logic’s MiniSIM™ architectural simulator, Verilog and VHDL models, a System Verification Environment, a PROM monitor, third party software support, and a core bond-out chip for emulation.
1.4 CoreWare Program
The CoreWare program offers a new approach to system design. Through the CoreWare program, LSI Logic gives customers the ability to combine the CW400x Microprocessor Core with other cores on a single chip to create products uniquely suited to the customer’s applications. This approach, which combines high-performance building blocks, sophisticated design software, and expert support, provides unparalleled design flexibility and allows designers to create high-quality, leading-edge products for a wide range of markets.
The CoreWare program consists of three main elements: a library of cores, a design development and simulation package, and expert applications support. The CoreWare library contains a wide range of complex cores based on accepted and emerging industry standards from high-speed interconnect, digital video, DSP, and others. LSI Logic provides a complete framework for device and system development and simulation. LSI Logic’s advanced ASIC technologies consistently produce Right-First-Time™ silicon. LSI Logic’s in-house experts provide design support from system architecture definition through chip layout and test vector generation.
1.4.1 CoreWare Building Blocks
The CoreWare building blocks include elements based on the LSI Logic high-performance standard products as well as other, industry-standard products. The CoreWare building blocks, which include embedded MIPS and SPARC processors, bus interface controllers, and a family of floating-point processors, are fully supported library elements for use in the LSI Logic hardware development environment. Note that the building blocks include gate-level simulation models with timing information, so designers can accurately simulate device performance and trade off various implementation options. In addition to gate-level simulation models, the building blocks also include behavioral simulation models.
1.4.2 Design Environment
LSI Logic’s C-MDE™ (Concurrent-Modular Design Environment®) design system and LSI ToolKit provide a complete framework for device and system development. The LSI ToolKit provides front-end support, while the C-MDE provides backend support.
The new ASIC families are supported by LSI Logic’s comprehensive system-on-a-chip design methodology. This design methodology uses both internally developed and industry-standard tools integrated with the LSI ToolKit. LSI ToolKit is a system of software and libraries that allow engineers to use third-party software to access LSI Logic's technology. Designers can select from a suite of industry-standard simulators, synthesizers, timing analyzers and test tools seamlessly integrated into a common environment for verification and sign-off.
1.4.3 Expert Support
LSI Logic’s in-house experts support the CoreWare program with high-level design and market experience in a wide variety of application areas. These experts provide design support from system architecture definition through chip layout and test vector generation. They help determine how many functions to integrate on a single chip, trading off functionality versus cost to find the most cost-effective solution. When the trade-offs are complete, the designer and LSI Logic’s applications engineers implement and test the design using C-MDE and the CoreWare building blocks.
Chapter 2
Function
This chapter describes the function of the MiniRISC CW400x Microprocessor Core. It contains the following sections:
♦ Section 2.1, “Microprocessor Overview”
♦ Section 2.2, “Functional Differences from the R3000 and the R4000 CPUs”
♦ Section 2.3, “Pipeline Architecture”
♦ Section 2.4, “Load Delay Slot”
♦ Section 2.5, “Branch Delay Slot”
♦ Section 2.6, “Load Scheduling Support”
♦ Section 2.7, “WAITI Instruction: Power Saving Feature”
For an introduction to Memory Space, see Section 6.2, “MMU Stub.”
2.1 Microprocessor Overview
The MiniRISC CW400x Microprocessor Core is an exceptionally compact, high-performance microprocessor compatible with the MIPS R4000 (all of the MIPS-I and most of the MIPS-II Instruction Set). Figure 2.1 is an internal block diagram of the MiniRISC CW400x Microprocessor Core. Descriptions of the internal blocks follow the figure.
Figure 2.1 CW400x Internal Block Diagram
[Block diagram. Blocks shown: Register File, CP0, ALU, Shifter, FlexLink Interface, and CBus Interface.]
The Register File contains the general-purpose registers. It supplies source operands to the execution units and handles the storage of results to target registers. The System Control Coprocessor (CP0) processes exceptions (which include interrupts). The Arithmetic Logical Unit (ALU) performs arithmetic and logical operations, as well as address calculations. The Shifter performs shift operations.
The CBus Interface passes data to and from the core. It allows the attachment of up to three tightly coupled special-purpose coprocessors that enhance the microprocessor’s general purpose computational power. Using this approach, high-performance, application-specific hardware can be made directly accessible to a programmer at the instruction-set level. For example, a coprocessor might offer accelerated bit-mapped graphics operations or real-time video decompression. The interface also allows the attachment of a Memory Management Unit (MMU) and a Bus Interface Unit (BIU).
The FlexLink Interface allows the logic designer to insert specialized arithmetic instructions into the Microprocessor Core. Adding a Computational Unit (for instance, LSI Logic’s Multiply/Divide Unit) to the FlexLink Interface allows the logic designer to insert, for example, a DSP-type instruction. This interface can handle one-cycle operations or multicycle operations.
2.2 Functional Differences from the R3000 and the R4000 CPUs
1. The CW400x is not a Harvard architecture. The R3000 and R4000 Microprocessors are Harvard architectures. The CW400x provides a single cache/memory interface instead of interfaces for both instruction cache and data cache, cutting the I/O count almost in half. Overhead outside of the CW400x associated with address buses, data buses and RAMs is dramatically reduced.
2. The CW400x uses a three-stage pipeline (Fetch, Execute, and Writeback) instead of the R3000 five-stage pipeline or the R4000 seven-stage pipeline. The R3000 RD and ALU Stages are merged into a single Execute Stage. Since it is not a Harvard architecture, the CW400x does not need a MEM Stage like the R3000 CPU. Instead, the CW400x stalls internally in the Execute Stage and does the memory access in a second Execute Cycle.
3. The CW400x is a 32-bit architecture like the R3000. The R4000 is a 64-bit machine with 32-bit programmability.
4. The CW400x CP0 is similar to the R3000 CP0. In particular, the fields within the CP0 Registers that are related to exception handling are like the R3000, and the CW400x implements only the kernel and user operating modes (no supervisor mode).
5. The CW400x implements the MIPS-I Instruction Set and the MIPS-II Branch Likely and Trap Instructions. Other MIPS-II Instructions (Load Linked, Store Conditional, Sync, Load and Store Double Coprocessor Instructions) cause Reserved Instruction Exceptions.
6. The CW400x contains no multiply or divide circuitry. Multiply and divide circuitry would significantly increase the area of the CW400x. Since many applications do not require high performance multiply and divide, the CW400x’s FlexLink Interface is designed to support optional multiply/divide units with differing performance. Refer to Section 7.2, “FlexLink Interface” for more details.
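The split described in item 5 can be sketched as a lookup. The mnemonic sets below are assumptions taken from the general MIPS-II definition, not from Chapter 4 of this manual, and the function name classify_mips2 is illustrative; Chapter 4 remains the reference for what the core actually decodes.

```python
# Assumed MIPS-II mnemonics (general MIPS-II definition, not this manual).
BRANCH_LIKELY = {"BEQL", "BNEL", "BLEZL", "BGTZL",
                 "BLTZL", "BGEZL", "BLTZALL", "BGEZALL"}
TRAPS = {"TGE", "TGEU", "TLT", "TLTU", "TEQ", "TNE",
         "TGEI", "TGEIU", "TLTI", "TLTIU", "TEQI", "TNEI"}
# Load Linked, Store Conditional, Sync, and Load/Store Double Coprocessor.
RESERVED = {"LL", "SC", "SYNC"} | {
    "{}{}".format(op, z) for op in ("LDC", "SDC") for z in (1, 2, 3)
}


def classify_mips2(mnemonic: str) -> str:
    """Say how the text above treats a MIPS-II addition."""
    name = mnemonic.upper()
    if name in BRANCH_LIKELY or name in TRAPS:
        return "implemented"
    if name in RESERVED:
        return "Reserved Instruction Exception"
    return "not a MIPS-II addition"


print(classify_mips2("BGEZL"))  # implemented
print(classify_mips2("SYNC"))   # Reserved Instruction Exception
```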
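2.3 Pipeline Architecture
The CW400x implements a three-stage pipeline (Instruction Fetch, Execute, and Writeback). Figures 2.2 and 2.3 show the two forms of the CW400x three-stage pipeline.
Figure 2.2 CW400x Pipeline
[Pipeline diagram: IF (Instruction Fetch), X1 (Execute), WB (Writeback)]
Figure 2.3 CW400x Pipeline with X2 Stall Cycle
[Pipeline diagram: IF (Instruction Fetch), X1 and X2 (Execute), WB (Writeback)]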
The execution of a single CW400x instruction consists of the followingpipeline stages:
1. Instruction Fetch – The core fetches the instruction (IF).
2. Execute – The core executes all ALU instructions, resolves conditional branches, and calculates Load and Store addresses (X1). The core transfers Load and Store data from external memory or cache (performs memory accesses) in a second Execute (Stall) Cycle (X2).
3. Writeback – The core writes the results into the Register File (WB).
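The stage sequence above can be turned into a small timing sketch. This is an idealized model under stated assumptions, not a description of the actual hardware: every memory access completes in one cycle, the next instruction is fetched while the current one is in X1, and an instruction enters X1 as soon as the Execute Stage is free. The function name schedule and the kind strings are illustrative.

```python
def schedule(kinds):
    """Return one dict per instruction mapping stage name -> cycle number.

    kinds: list of strings; "load/store" adds an X2 cycle, anything
    else is treated as a one-cycle (non-load/store) instruction.
    """
    rows = []
    fetch, x1 = 1, 2
    for kind in kinds:
        row = {"IF": fetch, "X1": x1}
        last_execute = x1
        if kind == "load/store":
            row["X2"] = x1 + 1          # memory access in a second Execute cycle
            last_execute = x1 + 1
        row["WB"] = last_execute + 1
        rows.append(row)
        fetch = x1                      # next fetch overlaps this instruction's X1
        x1 = last_execute + 1           # next X1 starts when Execute is free
    return rows


for row in schedule(["alu", "load/store", "alu"]):
    print(row)
```

In this model three non-load/store instructions finish in five cycles, and placing one load/store in the middle adds one cycle for its X2 access.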
Figures 2.4 through 2.6 show instruction pipeline examples.
Figure 2.4 Three Consecutive Non-Load/Store Instructions
[Pipeline diagram: 1. Non-Load/Store Instruction (IF, X1, WB); 2. Non-Load/Store Instruction (IF, X1, WB); 3. Non-Load/Store Instruction (IF, X1, WB)]
Figure 2.5 Load/Store Instruction
[Pipeline diagram: 1. Non-Load/Store Instruction; 2. Load/Store Instruction (IF, X1, X2, WB); 3. Non-Load/Store Instruction]
Figure 2.6 Two Consecutive Load/Store Instructions
[Pipeline diagram: 1. Non-Load/Store Instruction; 2. Load/Store Instruction; 3. Load/Store Instruction; 4. Non-Load/Store Instruction]
2.4 Load Delay Slot
The CW400x does not require a load delay slot.
In the five-stage R3000 architecture, the load delay slot refers to the instruction following any load. The instruction following a load cannot use the data from that load. Software must ensure that the instruction in the load delay slot does not depend on the data of the load.
Since the CW400x is a three-stage pipeline, its architecture does not require this restriction. All instructions, including instructions that depend on the load data, may follow a load.
Figure 2.7 shows an example of a load followed by an instruction dependent on the load data. A load delay slot is unnecessary because data from the load is valid in its WB Stage and can be bypassed to the following instruction’s X1 Stage.
Figure 2.7 WB to X1 Stage Bypass (No Load Delay Slot Necessary)
[Pipeline diagram: 1. NOP; 2. LOAD $10, ($0); 3. ADD $20, $10, $10; 4. NOP]
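The bypass idea can be illustrated with a short Python sketch. It models only the operand selection in the X1 Stage: if the instruction in the WB Stage is writing the register being read, the value in flight is used instead of the stale Register File contents. The function name, the register values, and the loaded value 7 are hypothetical, and the sketch is not a description of the actual bypass logic.

```python
def read_operand(reg, regfile, wb_dest, wb_value):
    """Operand read in X1: prefer the value being written back this cycle."""
    if reg != 0 and wb_dest == reg:   # register $0 always reads as zero
        return wb_value
    return regfile[reg]


regfile = [0] * 32                    # $10 still holds its old value (0)
wb_dest, wb_value = 10, 7             # LOAD $10, ($0) is in WB with data 7

# ADD $20, $10, $10 is in X1 during the same cycle.
result = (read_operand(10, regfile, wb_dest, wb_value)
          + read_operand(10, regfile, wb_dest, wb_value))
print(result)  # 14
```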
2.5 Branch Delay Slot
Because they are pipelined architectures, the CW400x, the R3000, and the R4000 have a branch delay slot.
The branch delay slot refers to the instruction following any jump or branch instruction. The branch delay slot prevents excess stalls and increases performance by performing branch evaluation and address generation at the same time as the instruction fetch of the instruction in the branch delay slot. This causes a one-instruction delay with the possibilities shown in Figures 2.8 through 2.11.
All jumps and all branch instructions, when the branch is taken, execute the instruction in the branch delay slot before executing the jump/branch target instruction. Non-likely branch instructions, when the branch is not taken, execute the instruction in the branch delay slot, like any other instruction, and continue the instruction flow. Likely branch instructions, when the branch is not taken, kill the instruction in the branch delay slot and continue the instruction flow.
Figure 2.8 shows the instruction flow for the following code:
        J    target
        ADD  $0, $0
        OR   $0, $0
target: AND  $0, $0
Figure 2.8 Branch Taken
[Pipeline diagram: 1. Jump Instruction (J); 2. Add Instruction (Delay Slot); 3. And Instruction]
Figure 2.9 shows the instruction flow for the following code:
        BNE  $0, $0, target
        ADD  $0, $0
        OR   $0, $0
target: AND  $0, $0
Figure 2.9 Branch Not Taken
[Pipeline diagram: 1. Branch Instruction (BNE); 2. Add Instruction (Delay Slot); 3. Or Instruction]
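Figure 2.10 shows the instruction flow for the following code:
        BGEZL $0, target
        ADD   $0, $0
        OR    $0, $0
target: AND   $0, $0
Figure 2.10 Branch Likely Taken
[Pipeline diagram: 1. Branch Instruction (BGEZL); 2. Add Instruction (Delay Slot); 3. And Instruction]
Figure 2.11 shows the instruction flow for the following code (note that the branch is not taken):
        BLTZL $0, target
        ADD   $0, $0
        OR    $0, $0
target: AND   $0, $0
Figure 2.11 Branch Likely Not Taken
[Pipeline diagram: 1. Branch Instruction (BLTZL); 2. Add Instruction (Delay Slot, Cancelled or Killed); 3. Or Instruction]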
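The four cases in Figures 2.8 through 2.11 can be summarized in a short Python sketch. It uses the four-instruction example from the text (a jump or branch, ADD in the delay slot, OR as the fall-through instruction, and AND at the target). The function name instruction_flow and the kind strings are illustrative assumptions.

```python
def instruction_flow(kind, taken):
    """List the instructions that execute in the four-instruction example.

    kind: "jump", "branch" (non-likely), or "likely".
    taken: whether the branch is taken (jumps are always taken).
    """
    if kind == "jump":
        taken = True
    flow = [kind]
    if not (kind == "likely" and not taken):
        flow.append("ADD")   # delay slot executes unless a likely branch falls through
    if not taken:
        flow.append("OR")    # fall-through instruction
    flow.append("AND")       # target, also reached sequentially after OR
    return flow


print(instruction_flow("jump", True))     # ['jump', 'ADD', 'AND']
print(instruction_flow("branch", False))  # ['branch', 'ADD', 'OR', 'AND']
print(instruction_flow("likely", True))   # ['likely', 'ADD', 'AND']
print(instruction_flow("likely", False))  # ['likely', 'OR', 'AND']
```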
2.6 Load Scheduling Support
The CW400x supports load scheduling for data loads. The CW400x releases the stall in the X2 Stage of a missed fetch and the pipeline continues as if the data had been fetched. When the data from the load request is ready, the CW400x writes the data back to the Register File.
The CW400x stalls the pipeline to allow the scheduled load’s WB Stage to coexist with the current instruction’s WB Stage. Upon a data dependency condition, the CW400x stalls until the data is available.
Figure 2.12 shows an example of the instruction flow for a scheduled load instruction.
Figure 2.12 Scheduled Load Instruction
[Pipeline diagram: 1. Scheduled Load Instruction; 2. Non-Load/Store Instruction; 3. Non-Load/Store Instruction; 4. Non-Load/Store Instruction]
Note that Instruction 1’s WB Stage and Instruction 3’s WB Stage coexist and that there will be at least one Stall Cycle during that pipeline stage.
The CW400x supports a single scheduled load. If a second load instruction enters the X1 Stage, the CW400x stalls until the first load is fetched. The CW400x will not allow the second load to reach its X2 Stage until the outstanding scheduled load is resolved.
Figure 2.13 shows an example of the instruction flow for a scheduled load instruction followed by a second load.
Figure 2.13 Scheduled Load Followed by a Second Load
[Pipeline diagram: 1. Scheduled Load Instruction; 2. Load Instruction; 3. Non-Load/Store Instruction]
The CW400x supports scheduling for the LB, LH, LW, LWCz, LBU, and LHU Instructions, but not the LWL and LWR Instructions. The CW400x stalls in the X2 Stage of the LWL and LWR Instructions until the data is fetched.
The coprocessor may implement load scheduling support for the LWCX Instruction. The coprocessor must stall for data dependencies. To disable load scheduling support for the LWCX Instruction, the coprocessor must stall the CW400x until the data is ready.
If the Bus Interface Unit (BIU) does not implement load scheduling, it must stall the CW400x for all loads in their X2 Stage until the data is available. The BIU must also handle write-after-read (WAR) and read-after-write (RAW) data hazards. Once scheduled (past the X2 Stage), loads cannot be cancelled, so the BIU must return the required data to the CW400x or coprocessor.
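The stall rules in this section can be condensed into a small decision function. The sketch below is an assumption-laden model rather than the core's control logic: it tracks only the destination register of the single outstanding scheduled load and ignores timing, stores, and coprocessor destinations. The names Instr and stall_reason are illustrative.

```python
from typing import NamedTuple, Optional, Tuple

LOADS = {"LB", "LH", "LW", "LBU", "LHU", "LWCz", "LWL", "LWR"}


class Instr(NamedTuple):
    op: str
    sources: Tuple[int, ...] = ()     # source register numbers


def stall_reason(instr: Instr, pending_dest: Optional[int]) -> Optional[str]:
    """Why an instruction entering X1 must stall, or None if it may proceed.

    pending_dest: destination register of the outstanding scheduled
    load, or None when no scheduled load is outstanding.
    """
    if pending_dest is None:
        return None
    if instr.op in LOADS:
        return "second load waits for the outstanding scheduled load"
    if pending_dest in instr.sources:
        return "data dependency on the scheduled load"
    return None


print(stall_reason(Instr("ADD", (10, 10)), pending_dest=10))
print(stall_reason(Instr("LW", (4,)), pending_dest=10))
print(stall_reason(Instr("OR", (1, 2)), pending_dest=10))
```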
2.7 WAITI Instruction: Power Saving Feature
LSI Logic added the WAITI Instruction to the CW400x so that the CW400x can be put into an idle state to save power. The CW400x idles when the WAITI Instruction enters its WB Stage. When any interrupt is asserted, the CW400x exits the idle state and jumps to the Exception Vector. The EPC Register contains the address of the instruction that follows the WAITI Instruction (the target of the branch if WAITI is in the branch delay slot).
For more information on the WAITI Instruction, see Section 4.11, “System Control Coprocessor (CP0) Instructions.”
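The EPC rule above can be written as a one-line calculation. In this sketch the function name and the example addresses are hypothetical. The caller passes branch_target only when WAITI sits in a branch delay slot; the text above does not distinguish taken from not-taken branches, so that decision is left to the caller.

```python
INSTRUCTION_BYTES = 4  # MIPS instructions are 32 bits wide


def epc_after_waiti(waiti_address, branch_target=None):
    """Address the EPC Register holds when an interrupt ends the idle state."""
    if branch_target is not None:       # WAITI is in a branch delay slot
        return branch_target
    return waiti_address + INSTRUCTION_BYTES


print(hex(epc_after_waiti(0x1000)))           # 0x1004
print(hex(epc_after_waiti(0x1000, 0x2000)))   # 0x2000
```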
Chapter 3
Signals
This chapter describes the signals that comprise the bit-level interface of the CW400x. Table 3.1 summarizes the signals.
The signals are described in alphabetical order by mnemonic. Each signal definition contains the mnemonic and the full signal name. The mnemonics for signals that are active LOW end in an “N” and have an overbar over their names, and the mnemonics for signals that are active HIGH end in a “P.”
In the descriptions that follow, the verb assert means to drive TRUE or active. The verb deassert means to drive FALSE or inactive.
Computational Unit refers to any arithmetic/computational unit that is attached to the FlexLink Interface (which could be LSI Logic’s MDU). Bus Interface Unit (BIU) refers to either the BIU in LSI Logic’s BBCC Building Block or a system designer-defined BIU if the BBCC is not present.
Table 3.1Signal Summary Signal Description I/O
ADDRP[31:0] Address Bus Output
ASELP Computational Unit Select Input
ASTALLP Computational Unit Stall Request Input
AXBUSP[31:0] Computational Unit Result Bus Input
BBEP Bus Interface Unit (BIU) Bus Error Input
BBIG_ENDIANP Big Endian Select Input
BBUS_STEALN BIU Bus Steal Input
BCPCONDP[3:0] Coprocessor Condition Input
BCPU_RESETN CW400x Reset Input
BDRDYP BIU Load Data Ready Input
BINTP[5:0] Interrupts Input
BIRDYP BIU Instruction Data Ready Input
CADDR_ERRORP Memory Address Error Output
CBYTEP[3:0] Byte Enables Output
CINTGRP Interrupt Grant Output
CIP_DN CW400x Instruction/Data Indication Output
CIR_BOTP[5:0] Instruction Register Bottom Six Bits Output
CIR_TOPP[5:0] Instruction Register Top Six Bits Output
CKILLMEMP Kill Memory Transaction Output
CKILLWP Kill Instruction in Writeback Stage Output
CKILLXP Kill Instruction in Execute Stage Output
CLOIDP[3:0] Microprocessor Implementation Input
CLOPRP[3:0] Microprocessor Revision Input
CMEM_FETCHP CW400x Memory Fetch Request Output
COEN CW400x Output Enable Input
COP_DRIVEP Coprocessor Drives Data Bus Indicator Output
COPP[1:0] Coprocessor Number Output
CRSP[31:0] CW400x Source Register (rs) Bus Output
CRTP[31:0] CW400x Source Register (rt) Bus Output
CRUN_INN CW400x Run Enable Input
CRUN_OUTP CW400x Run Request Output
CRX_VALIDN Register Buses Valid Output
CSTOREP CW400x Store to Memory Request Output
CTEST_RFWEP Test Mode Register File Write Enable Input
CWAITIP Wait for Interrupt Output
DATAP[31:0] CW400x Data Bus Bidirectional
GSCAN_ENABLEP Scan Test Mode Enable Input
GSCAN_INP Scan Test Input Input
GSCAN_OUTP Scan Test Output Output
GTEST_ENABLEP Test Enable Input
MTLBMISSEXCP TLB(1) Miss Exception Input
MTLBMODEXCP TLB Modified Exception Input
MTLBSHUTP TLB Shutdown Input
MUTLBMISSEXCP User TLB Miss Exception Input
PCLKP System Clock Input
1. Translation Lookaside Buffer.
ADDRP[31:0]  Address Bus  Output
    The core drives these signals with the memory address.

ASELP  Computational Unit Select  Input
    A computational unit asserts this signal HIGH to inform the core that the current instruction is a user-defined computational unit instruction.

ASTALLP  Computational Unit Stall Request  Input
    A computational unit asserts this signal HIGH to stall the pipeline.

AXBUSP[31:0]  Computational Unit Result Bus  Input
    A computational unit puts the result of the arithmetic operation onto this bus.

BBEP  BIU Bus Error  Input
    Asserting this signal HIGH causes the core to take a Bus Error Exception.

BBIG_ENDIANP  Big Endian Select  Input
    Driving this signal HIGH causes the core to operate with big-endian byte ordering. Driving this signal LOW causes the core to operate with little-endian byte ordering.

BBUS_STEALN  BIU Bus Steal  Input
    The BIU asserts this signal LOW to inform the CW400x that the BIU will become the Data Bus Master starting at the rising edge of the next clock cycle.

BCPCONDP[3:0]  Coprocessor Condition  Input
    The core tests these signals during the Execute Stage of BCzF, BCzFL, BCzT, and BCzTL instructions. These signals indicate the corresponding Coprocessor Condition. BCPCONDP[3:0] correspond to Coprocessors 3, 2, 1, 0.

BCPU_RESETN  CW400x Reset  Input
    Asserting this signal LOW resets the core.

BDRDYP  BIU Load Data Ready  Input
    Asserting this signal HIGH informs the core that DATAP[31:0] contains valid data for a data fetch.
BINTP[5:0]  Interrupts  Input
    Asserting any of these signals causes the core to take an Interrupt Exception when interrupts are enabled. BINTP[5:0] correspond to Interrupts 5, 4, 3, 2, 1, 0.

BIRDYP  BIU Instruction Data Ready  Input
    Asserting this signal HIGH informs the core that DATAP[31:0] contains valid data for an instruction fetch.

CADDR_ERRORP  Memory Address Error  Output
    The core asserts this signal HIGH to indicate a memory transaction address error has occurred.

CBYTEP[3:0]  Byte Enables  Output
    These signals indicate (when asserted HIGH) which corresponding bytes are valid on DATAP[31:0].
    The following table shows the correspondence between byte enables and the data bus bytes.

    Byte Enable    Corresponding DATAP[31:0] Byte
    CBYTEP3        [31:24]
    CBYTEP2        [23:16]
    CBYTEP1        [15:8]
    CBYTEP0        [7:0]

CINTGRP  Interrupt Grant  Output
    The core asserts this signal HIGH to indicate an exception was taken due to an interrupt.

CIP_DN  CW400x Instruction/Data Indication  Output
    This signal qualifies the type of memory fetch when a memory fetch is indicated by CMEM_FETCHP. The core drives this signal HIGH to indicate that it is performing an instruction fetch. The core drives this signal LOW to indicate that it is performing a data fetch.

CIR_BOTP[5:0]  Bottom Six Bits of Instruction Register  Output
    These signals contain the bottom six bits of the Instruction Register. These signals allow a computational unit to decode its own instructions.
CIR_TOPP[5:0]  Top Six Bits of Instruction Register  Output
    These signals contain the top six bits of the Instruction Register. These signals allow a computational unit to decode its own instructions.

CKILLMEMP  Memory Transfers Killed  Output
    The core asserts this signal HIGH to indicate that the current memory access is cancelled due to an exception.

CKILLWP  Instruction Killed in Writeback Stage  Output
    The core asserts this signal HIGH to indicate that the instruction in the Writeback Stage is killed.

CKILLXP  Instruction Killed in Execute Stage  Output
    The core asserts this signal HIGH to indicate that the instruction in the Execute Stage is killed.

CLOIDP[3:0]  Microprocessor Implementation Number  Input
    These signals contain Bits [11:8] of the PRId Register.

CLOPRP[3:0]  Microprocessor Revision Number  Input
    These signals contain Bits [3:0] of the PRId Register.

CMEM_FETCHP  CW400x Memory Fetch Request  Output
    The core asserts this signal HIGH to indicate that it is performing a memory fetch.

COEN  CW400x Output Enable  Input
    The Global Output Enable Module (GOE) asserts this signal to enable the core to drive data onto DATAP[31:0].

COP_DRIVEP  Coprocessor Drives Data Bus Indicator  Output
    The core asserts this signal HIGH to inform the GOE that a coprocessor should drive DATAP[31:0].

COPP[1:0]  Coprocessor Number  Output
    These signals indicate which coprocessor should drive DATAP[31:0].

CRSP[31:0]  CW400x Source Register (rs) Bus  Output
    These signals contain the rs Operand of the current instruction.
CRTP[31:0]  CW400x Source Register (rt) Bus  Output
    These signals contain the rt Operand of the current instruction.

CRUN_INN  CW400x Run Enable  Input
    Asserting this signal LOW causes the core to go on to the next bus run cycle (a clock cycle in which the bus is running). Deasserting this signal HIGH stalls the core.

CRUN_OUTP  CW400x Run Request  Output
    The core asserts this signal HIGH to request to external control logic that it go on to the next bus run cycle. The core deasserts this signal LOW to request stalling the pipeline.

CRX_VALIDN  Register Buses Valid  Output
    The core asserts this signal LOW to indicate to a computational unit that the Source Register Buses are valid.

CSTOREP  CW400x Store to Memory Request  Output
    The core asserts this signal HIGH to request a write to memory.

CTEST_RFWEP  Test Mode Register File Write Enable  Input
    Asserting this signal HIGH allows the core to write data to the Register File. Deasserting this signal LOW disallows writing to the Register File.

CWAITIP  Wait for Interrupt  Output
    The core asserts this signal HIGH to indicate that a WAITI Instruction has caused it to go into a low power mode. The core deasserts this signal when it receives an interrupt on BINTP[5:0].

DATAP[31:0]  CW400x Data Bus  Bidirectional
    These signals transfer data to and from the core.

GSCAN_ENABLEP  Scan Test Mode Enable  Input
    Asserting this signal enables scan testing. (For more information on scan testing, see Section 8.2, “Scan Methodology.”)

GSCAN_INP  Scan Test Input  Input
    The tester drives this signal with the scan test input.
GSCAN_OUTP  Scan Test Output  Output
    The core drives this signal with the scan test output.

GTEST_ENABLEP  Test Enable  Input
    Asserting this signal HIGH enables scan testing of the chip’s system logic. Note that this signal must always be asserted during a scan test. Note also that this signal is used raw (not latched at all). (For more information on scan testing, see Section 8.2, “Scan Methodology.”)

MTLBMISSEXCP  TLB Miss Exception  Input
    Asserting this signal HIGH causes the core to take a Translation Lookaside Buffer (TLB) Load or a TLB Store Exception.

MTLBMODEXCP  TLB Modified Exception  Input
    Asserting this signal HIGH causes the core to take a TLB Modified Exception.

MTLBSHUTP  TLB Shutdown  Input
    Driving this signal HIGH sets Bit 21 of the CW400x Status Register (TLB Shutdown Bit). Driving this signal LOW clears Bit 21 of the CW400x Status Register.

MUTLBMISSEXCP  User TLB Miss Exception  Input
    Asserting this signal HIGH causes the core to take a TLB Load or a TLB Store Exception.

PCLKP  System Clock  Input
    This signal is the global clock input. All peripheral logic should gate this clock with only one gate.
Chapter 4
Instructions

This chapter describes the format and use of the CW400x Instructions. This chapter contains the following sections:
♦ Section 4.1, “Instruction Formats”
♦ Section 4.2, “CW400x Opcode Bit Encoding”
♦ Section 4.3, “Instruction Summary”
♦ Section 4.4, “Load and Store Instructions”
♦ Section 4.5, “Computational Instructions”
♦ Section 4.6, “Jump and Branch Instructions”
♦ Section 4.7, “Branch Likely Instructions”
♦ Section 4.8, “Special Control Instructions”
♦ Section 4.9, “Trap Instructions”
♦ Section 4.10, “Coprocessor Instructions”
♦ Section 4.11, “System Control Coprocessor (CP0) Instructions”
4.1 Instruction Formats

Every instruction consists of a single word (32 bits) aligned on a word boundary. Figures 4.1 through 4.3 show the three instruction formats: I-type (immediate), J-type (jump), and R-type (register). This restricted format approach simplifies instruction decoding. All variable subfields in an instruction format (such as rs, rt, and immediate) are shown in lowercase.
The two instruction subfields op and funct have constant six-bit values for specific instructions. These values are given uppercase mnemonic names. For example, op is LB in the Load Byte instruction and op is SPECIAL and funct is ADD in the Add instruction.
Figure 4.1 I-Type (Immediate) Instruction

    | 31    26 | 25    21 | 20    16 | 15                     0 |
    | op       | rs       | rt       | immediate                |

Figure 4.2 J-Type (Jump) Instruction

    | 31    26 | 25                                           0 |
    | op       | target                                         |

Figure 4.3 R-Type (Register) Instruction

    | 31    26 | 25    21 | 20    16 | 15    11 | 10     6 | 5      0 |
    | op       | rs       | rt       | rd       | shamt    | funct    |

    op          Six-Bit Major Operation Code
    rs          Five-Bit Source Register Specifier
    rt          Five-Bit Target (Source/Destination Register)
    immediate   16-Bit Immediate, Branch Displacement, or Address Displacement
    target      26-Bit Jump Target Address
    rd          Five-Bit Destination Register Specifier
    shamt       Five-Bit Shift Amount
    funct       Six-Bit Function Field

A single field may have both fixed and variable subfields, such that the name contains both uppercase and lowercase characters. For example, MFCz (Move from Coprocessor) represents four different six-bit operation codes (opcodes), which designate one of three coprocessor classes (1 through 3), concatenated with the fixed five-bit subfield MF.
For the sake of clarity, an alias is sometimes used for a variable subfield for specific instruction formats. For example, base is used in place of rs in the format for load and store instructions. Such an alias is always lower case, since it refers to a variable subfield.
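As an illustration of the three formats, the following Python sketch extracts every subfield from a 32-bit instruction word using the bit positions shown in Figures 4.1 through 4.3. The function name `decode_fields` is hypothetical. The sketch returns all fields at once; which ones are meaningful depends on the instruction format.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the subfields of Figures 4.1-4.3."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,      # bits 31:26, all formats
        "rs": (word >> 21) & 0x1F,      # bits 25:21, I-type and R-type
        "rt": (word >> 16) & 0x1F,      # bits 20:16, I-type and R-type
        "rd": (word >> 11) & 0x1F,      # bits 15:11, R-type
        "shamt": (word >> 6) & 0x1F,    # bits 10:6,  R-type
        "funct": word & 0x3F,           # bits 5:0,   R-type
        "immediate": word & 0xFFFF,     # bits 15:0,  I-type
        "target": word & 0x03FFFFFF,    # bits 25:0,  J-type
    }


if __name__ == "__main__":
    # R-type word with op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x20
    print(decode_fields(0x00221820))
    # I-type word with op = 0x23, rs = 1, rt = 2, immediate = 4
    print(decode_fields(0x8C220004))
```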
4.2 CW400x Opcode Bit Encoding

This section lists the major and minor opcodes with their respective bit encodings in tabular form. Table 4.2 lists the bit encoding for the CW400x major opcodes. Tables 4.3 through 4.7 list the bit encoding for the minor opcodes. Table 4.1 shows a shading key that defines the availability of unused opcodes in Tables 4.2 through 4.7. Note that system designers can assign their own opcodes from those available.

Table 4.1 Shading Key for Tables 4.2 through 4.6

    Available for Computational Unit-supported instructions. (The CW400x causes an RI Exception which can be overridden by the Computational Unit.)
    Available for Coprocessor-supported instructions (CW400x treats as NOP).
    Not available to Computational Unit or Coprocessor (CW400x causes RI Exception).
Table 4.2 Major Opcode (op) Bit Encoding

Rows are Bits [31:29]; columns are Bits [28:26]. All values are binary.

[31:29] \ [28:26]  000         001         010      011      100    101    110    111
000                SPECIAL(1)  REGIMM(2)   J        JAL      BEQ    BNE    BLEZ   BGTZ
001                ADDI        ADDIU       SLTI     SLTIU    ANDI   ORI    XORI   LUI
010                COP0(3)     COP1(3)     COP2(3)  COP3(3)  BEQL   BNEL   BLEZL  BGTZL
011                -           -           -        -        -      -      -      -
100                LB          LH          LWL      LW       LBU    LHU    LWR    -
101                SB          SH          SWL      SW       -      -      SWR    -
110                -           LWC1        LWC2     LWC3     -      -      -      -
111                -           SWC1        SWC2     SWC3     -      -      -      -

1. See Table 4.3 for the funct bit encodings of the SPECIAL minor opcodes.
2. See Table 4.4 for the encoding requirements of REGIMM Instruction Bits.
3. See Tables 4.5 through 4.7 for encoding requirements of Coprocessor Instruction Bits.
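The row and column organization of Table 4.2 maps directly onto a two-level lookup: Bits [31:29] select the row and Bits [28:26] select the column. The Python sketch below is one way to express that. The names `MAJOR_OPCODES` and `major_opcode` are hypothetical, and the positions of the entries in the partly filled rows are an assumption taken from the standard MIPS-I encodings rather than from this table's extracted text. Unassigned cells are represented as `None`.

```python
# Rows indexed by op[5:3] (instruction bits 31:29), columns by op[2:0] (bits 28:26).
MAJOR_OPCODES = [
    ["SPECIAL", "REGIMM", "J", "JAL", "BEQ", "BNE", "BLEZ", "BGTZ"],
    ["ADDI", "ADDIU", "SLTI", "SLTIU", "ANDI", "ORI", "XORI", "LUI"],
    ["COP0", "COP1", "COP2", "COP3", "BEQL", "BNEL", "BLEZL", "BGTZL"],
    [None] * 8,
    ["LB", "LH", "LWL", "LW", "LBU", "LHU", "LWR", None],
    ["SB", "SH", "SWL", "SW", None, None, "SWR", None],
    [None, "LWC1", "LWC2", "LWC3", None, None, None, None],
    [None, "SWC1", "SWC2", "SWC3", None, None, None, None],
]


def major_opcode(word: int):
    """Return the major opcode mnemonic for an instruction word, or None."""
    op = (word >> 26) & 0x3F
    return MAJOR_OPCODES[op >> 3][op & 0b111]


if __name__ == "__main__":
    for w in (0x00221820, 0x8C220004, 0xAC220004, 0x0C000000):
        print(hex(w), major_opcode(w))
```

A `None` result corresponds to a cell that the shading key in Table 4.1 classifies; the sketch does not model which of the three shading categories applies.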
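Table 4.3 SPECIAL Minor Opcode funct Bit Encoding

Rows are Bits [5:3]; columns are Bits [2:0]. All values are binary.

[5:3] \ [2:0]  000    001    010    011    100      101    110    111
000            SLL    -      SRL    SRA    SLLV     -      SRLV   SRAV
001            JR     JALR   -      -      SYSCALL  BREAK  -      -
010            -      -      -      -      -        -      -      -
011            -      -      -      -      -        -      -      -
100            ADD    ADDU   SUB    SUBU   AND      OR     XOR    NOR
101            -      -      SLT    SLTU   -        -      -      -
110            TGE    TGEU   TLT    TLTU   TEQ      -      TNE    -
111            -      -      -      -      -        -      -      -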
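Table 4.4 REGIMM Minor Opcode rt Bit Encoding

Rows are Bits [20:19]; columns are Bits [18:16]. All values are binary.

[20:19] \ [18:16]  000     001     010      011      100   101  110   111
00                 BLTZ    BGEZ    BLTZL    BGEZL    -     -    -     -
01                 TGEI    TGEIU   TLTI     TLTIU    TEQI  -    TNEI  -
10                 BLTZAL  BGEZAL  BLTZALL  BGEZALL  -     -    -     -
11                 -       -       -        -        -     -    -     -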
Table 4.5 COPz (z = 0, 1, 2, 3) rs Minor Opcode Bit Encoding

Rows are Bits [25:24]; columns are Bits [23:21]. All values are binary.

[25:24] \ [23:21]  000    001  010   011  100   101  110   111
00                 MFCz   -    CFCz  -    MTCz  -    CTCz  -
01                 BC(1)  -    -     -    -     -    -     -
10                 -      -    -     -    -     -    -     -
11                 -      -    -     -    -     -    -     -

1. Branch on Coprocessor. See Table 4.6 for further encoding requirements of BC Instruction Bits.
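Table 4.6 COPz (z = 0, 1, 2, 3) rt Minor Opcode Bit Encoding

Rows are Bits [20:19]; columns are Bits [18:16]. All values are binary.

[20:19] \ [18:16]  000   001   010    011    100  101  110  111
00                 BCzF  BCzT  BCzFL  BCzTL  -    -    -    -
01                 -     -     -      -      -    -    -    -
10                 -     -     -      -      -    -    -    -
11                 -     -     -      -      -    -    -    -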
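Tables 4.5 and 4.6 describe a two-level decode for coprocessor instructions: the rs field selects a move or the BC group, and within BC the rt field selects the branch variant. A minimal Python sketch of that decode follows. The function name `decode_copz` and the returned strings are hypothetical, and the bit positions of the individual entries are assumed to follow the standard MIPS-I encodings. Encodings with Bit 25 set (Bits [25:24] = 1x) are reported only as a generic coprocessor operation.

```python
def decode_copz(word: int):
    """Classify a COPz instruction word per Tables 4.5 and 4.6 (sketch)."""
    op = (word >> 26) & 0x3F
    if (op >> 2) != 0b0100:          # major opcodes COP0..COP3 are 0100zz
        return None
    z = op & 0b11                    # coprocessor number
    rs = (word >> 21) & 0x1F         # bits 25:21
    rt = (word >> 16) & 0x1F         # bits 20:16
    if rs & 0b10000:                 # Bits [25:24] = 1x
        return f"COP{z} operation"
    moves = {0b00000: "MFC", 0b00010: "CFC", 0b00100: "MTC", 0b00110: "CTC"}
    if rs in moves:
        return f"{moves[rs]}{z}"
    if rs == 0b01000:                # BC group, variant selected by rt
        variant = {0b00000: "F", 0b00001: "T", 0b00010: "FL", 0b00011: "TL"}.get(rt)
        return f"BC{z}{variant}" if variant else None
    return None


if __name__ == "__main__":
    for w in (0x40000000, 0x40800000, 0x45010000, 0x42000010):
        print(hex(w), decode_copz(w))
```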
Table 4.7 COP0 Minor Opcode funct Bit Encoding (Bits [25:24] = 1x, binary)

Rows are Bits [5:3]; columns are Bits [2:0]. All values are binary.

[5:3] \ [2:0]  000  001  010  011  100  101  110  111
000
001
010            RFE
011
100            WAITI
101
110
111
Table 4.8 COPz (z = 1, 2, 3) Minor Opcode funct Bit Encoding (Bits [25:24] = 1x, binary)

Rows are Bits [5:3]; columns are Bits [2:0]. All values are binary.

[5:3] \ [2:0]  000  001  010  011  100  101  110  111
000
001
010
011
100
101
110
111
4.3 Instruction Summary

Table 4.9 summarizes the CW400x Instruction Set. The CW400x supports both the MIPS-I Instruction Set and a subset of the MIPS-II Instruction Set (all the Branch Likely and Trap Instructions), and also implements some additional CW400x-specific Instructions. The CW400x handles TLB-related instructions as NOPs, letting the MMU handle them.
All instructions are 32 bits long. In Table 4.9, the MIPS-II and CW400x-specific Instructions are flagged to distinguish them from the MIPS-I Instructions.
Sections 4.4 through 4.11 provide more detail on the instructions. For even more detailed instruction descriptions, see the LR33000 Family Instruction Set Guide.
Table 4.9 CW400x Instructions

Op            Description

Load/Store Instructions
LB            Load Byte
LBU           Load Byte Unsigned
LH            Load Halfword
LHU           Load Halfword Unsigned
LW            Load Word
LWL           Load Word Left
LWR           Load Word Right
SB            Store Byte
SH            Store Halfword
SW            Store Word
SWL           Store Word Left
SWR           Store Word Right

Immediate Arithmetic Instructions
ADDI          Add Immediate
ADDIU         Add Immediate Unsigned
ANDI          AND Immediate
LUI           Load Upper Immediate
ORI           OR Immediate
SLTI          Set on Less Than Immediate
SLTIU         Set on Less Than Immediate Unsigned
XORI          Exclusive OR Immediate

Coprocessor Instructions (1)
BCzF          Branch on Coprocessor z False
BCzT          Branch on Coprocessor z True
CFCz          Move Control from Coprocessor z
COPz          Coprocessor Operation
CTCz          Move Control to Coprocessor z
LWCz          Load Word to Coprocessor z (z ≠ 0)
MTCz          Move to Coprocessor z
MFCz          Move from Coprocessor z
SWCz          Store Word from Coprocessor z (z ≠ 0)

Branch Likely Instructions
BCzFL (2)     Branch on Coprocessor z False Likely
BCzTL (2)     Branch on Coprocessor z True Likely
BEQL (2)      Branch on Equal Likely
BGEZALL (2)   Branch on Greater Than or Equal to Zero and Link Likely
BGEZL (2)     Branch on Greater Than or Equal to Zero Likely
BGTZL (2)     Branch on Greater Than Zero Likely
BLEZL (2)     Branch on Less Than or Equal to Zero Likely
BLTZALL (2)   Branch on Less Than Zero and Link Likely
BLTZL (2)     Branch on Less Than Zero Likely
BNEL (2)      Branch on Not Equal Likely

System Control Coprocessor (CP0) Instructions
MFC0          Move from CP0
MTC0          Move to CP0
RFE           Restore from Exception
WAITI (3)     Wait for Interrupt

Jump and Branch Instructions
BCzF          Branch on Coprocessor z False
BCzT          Branch on Coprocessor z True
BEQ           Branch on Equal
BGEZ          Branch on Greater Than or Equal to Zero
BGEZAL        Branch on Greater Than or Equal to Zero and Link
BGTZ          Branch on Greater Than Zero
BLEZ          Branch on Less Than or Equal to Zero
BLTZ          Branch on Less Than Zero
BLTZAL        Branch on Less Than Zero and Link
BNE           Branch on Not Equal
J             Jump
JAL           Jump and Link
JALR          Jump and Link Register
JR            Jump Register

Three-Operand, Register-Type Arithmetic Instructions
ADD           Add
ADDU          Add Unsigned
AND           Logical And
NOR           Logical Nor
OR            Logical Or
SLT           Set on Less Than
SLTU          Set on Less Than Unsigned
SUB           Subtract
SUBU          Subtract Unsigned
XOR           Exclusive Logical Or

Trap Instructions
TEQ (2)       Trap on Equal
TEQI (2)      Trap on Equal Immediate
TGE (2)       Trap on Greater Than or Equal
TGEI (2)      Trap on Greater Than or Equal Immediate
TGEIU (2)     Trap on Greater Than or Equal Immediate Unsigned
TGEU (2)      Trap on Greater Than or Equal Unsigned
TLT (2)       Trap on Less Than
TLTI (2)      Trap on Less Than Immediate
TLTIU (2)     Trap on Less Than Immediate Unsigned
TLTU (2)      Trap on Less Than Unsigned

Shift Instructions
SLL           Shift Left Logical
SLLV          Shift Left Logical Variable
SRA           Shift Right Arithmetic
SRAV          Shift Right Arithmetic Variable
SRL           Shift Right Logical
SRLV          Shift Right Logical Variable

Special Control Instructions
BREAK         Breakpoint
SYSCALL       System Call

1. Also see first two Branch Likely Instructions.
2. MIPS-II instruction.
3. MR4001-specific instruction.

4.4 Load and Store Instructions

Load and Store Instructions move data between memory and general registers. They are all I-type Instructions. The only addressing mode directly supported is base register plus 16-bit signed immediate offset.
The Load/Store Instruction operation code (opcode) determines the access type, which in turn indicates the size of the data item to be loaded or stored. Regardless of access type or byte-numbering order (endianness), the address specifies the byte that has the smallest byte address of all the bytes in the addressed field. For a big-endian machine, this is the most significant byte; for a little-endian machine, this is the least significant byte.
The bytes that are used within the addressed word can be determined directly from the access type and the two low-order bits of the address, as shown in Figure 4.4. Note that certain combinations of access type and low-order address bits can never occur; only the combinations shown in Figure 4.4 are permissible.
Figure 4.4 Byte Specifications for Loads/Stores

                 Low-Order
                 Address Bits   Bytes Accessed (bit 31 ... bit 0)
Access Type      A1  A0         Big-Endian      Little-Endian
Word             0   0          0 1 2 3         3 2 1 0
Tribyte          0   0          0 1 2 -         - 2 1 0
Tribyte          0   1          - 1 2 3         3 2 1 -
Halfword         0   0          0 1 - -         - - 1 0
Halfword         1   0          - - 2 3         3 2 - -
Byte             0   0          0 - - -         - - - 0
Byte             0   1          - 1 - -         - - 1 -
Byte             1   0          - - 2 -         - 2 - -
Byte             1   1          - - - 3         3 - - -
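The rule behind Figure 4.4 can be sketched as a small Python model: the access type fixes the size, the two low-order address bits give the lowest byte number, and the endianness decides where each byte number sits in the 32-bit word. The names `bytes_accessed` and `data_bus_bits` are hypothetical. The set of permissible combinations is taken from the figure, and the mapping of byte numbers to bit ranges assumes that big-endian byte 0 occupies bits [31:24] and little-endian byte 0 occupies bits [7:0].

```python
SIZES = {"byte": 1, "halfword": 2, "tribyte": 3, "word": 4}

# Low-order address bits (A1 A0 as an integer) permitted for each access size.
PERMITTED = {1: (0, 1, 2, 3), 2: (0, 2), 3: (0, 1), 4: (0,)}


def bytes_accessed(access_type: str, low_bits: int) -> list:
    """Byte numbers (0..3) touched within the addressed word."""
    size = SIZES[access_type]
    if low_bits not in PERMITTED[size]:
        raise ValueError(f"{access_type} access with A1A0={low_bits:02b} cannot occur")
    return list(range(low_bits, low_bits + size))


def data_bus_bits(byte_number: int, big_endian: bool) -> tuple:
    """Bit range (msb, lsb) that holds a given byte number in the word."""
    lane = 3 - byte_number if big_endian else byte_number
    return (8 * lane + 7, 8 * lane)


if __name__ == "__main__":
    for kind, low in (("word", 0), ("tribyte", 1), ("halfword", 2), ("byte", 3)):
        touched = bytes_accessed(kind, low)
        print(kind, low, touched, [data_bus_bits(b, True) for b in touched])
```

Under the same assumption, each returned bit range lines up with one of the four byte lanes of DATAP[31:0].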
Table 4.10 summarizes the CW400x Load and Store Instructions.
Table 4.10 Load and Store Instruction Summary

Instruction / Format and Description

Load Byte
    LB rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Sign-extends the content of the addressed byte and loads this value into Register rt.

Load Byte Unsigned
    LBU rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Zero-extends the content of the addressed byte and loads this value into Register rt.

Load Halfword
    LH rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Sign-extends the content of the addressed halfword and loads this value into Register rt.

Load Halfword Unsigned
    LHU rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Zero-extends the content of the addressed halfword and loads this value into Register rt.

Load Word
    LW rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Loads the addressed word into Register rt.

Load Word Left
    LWL rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create a byte address. Loads the addressed word. Shifts this word left so that the addressed byte is the leftmost byte of the word. Merges the bytes from this word with the contents of Register rt and loads the result into Register rt.

Load Word Right
    LWR rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create a byte address. Shifts the addressed word right so that the addressed byte is the rightmost byte of a word. Merges the bytes from memory with the contents of Register rt and loads the result into Register rt.
Store Byte
    SB rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Stores the least-significant byte of Register rt into the addressed location.

Store Halfword
    SH rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Stores the least-significant halfword of Register rt into the addressed location.

Store Word
    SW rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Stores the content of Register rt into the addressed location.

Store Word Left
    SWL rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create a byte address. Shifts the contents of Register rt right so that what was the leftmost byte of the register word is now aligned to the same offset as the addressed byte. Stores the bytes in the register into the corresponding bytes at the addressed byte.

Store Word Right
    SWR rt, offset(base)
    Sign-extends the 16-bit offset and adds this value to the content of Register base to create a byte address. Shifts the contents of Register rt left so that what was the rightmost byte of the register word is now aligned to the same offset as the addressed byte. Stores the bytes in the register into the corresponding bytes at the addressed byte.
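The load descriptions in Table 4.10 can be illustrated with a short Python model. It is a sketch under stated assumptions, not a description of the core's datapath: all function names are hypothetical, registers and memory words are modelled as 32-bit unsigned integers, and the byte placement within a word assumes that big-endian byte 0 is the leftmost byte and little-endian byte 0 is the rightmost byte.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def effective_address(base: int, offset16: int) -> int:
    """Base register plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend(offset16, 16)) & MASK32


def load_byte(byte: int, signed: bool) -> int:
    """LB (signed=True) sign-extends the byte; LBU zero-extends it."""
    return (sign_extend(byte, 8) & MASK32) if signed else (byte & 0xFF)


def _lsb_of_addressed_byte(addr: int, big_endian: bool) -> int:
    k = addr & 3
    return 8 * (3 - k) if big_endian else 8 * k


def lwl(reg: int, word: int, addr: int, big_endian: bool) -> int:
    """Shift word left until the addressed byte is leftmost, then merge into reg."""
    shift = 24 - _lsb_of_addressed_byte(addr, big_endian)
    keep = (1 << shift) - 1                     # low register bits left unchanged
    return ((word << shift) & MASK32) | (reg & keep)


def lwr(reg: int, word: int, addr: int, big_endian: bool) -> int:
    """Shift word right until the addressed byte is rightmost, then merge into reg."""
    shift = _lsb_of_addressed_byte(addr, big_endian)
    keep = (MASK32 << (32 - shift)) & MASK32    # high register bits left unchanged
    return (word >> shift) | (reg & keep)


if __name__ == "__main__":
    # Big-endian memory bytes 11 22 33 44 | 55 66 77 88; unaligned word at address 1.
    r = lwl(0, 0x11223344, 1, True)
    r = lwr(r, 0x55667788, 4, True)
    print(hex(r))
```

In this model an unaligned word is assembled by pairing LWL on the address of its first byte with LWR on the address of its last byte for big-endian ordering, and the reverse pairing for little-endian ordering.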
4.5 Computational Instructions

Computational Instructions perform arithmetic, logical, and shift operations on values in registers. They occur in both R-type (both operands are registers) and I-type (one operand is a 16-bit immediate) formats. There are four categories of Computational Instructions:
♦ Table 4.11 summarizes ALU Immediate Instructions.
♦ Table 4.12 summarizes Three-Operand, Register-Type Instructions.
♦ Table 4.13 summarizes Shift Instructions.
Table 4.11 ALU Immediate Arithmetic Instruction Summary

Instruction / Format and Description

Add Immediate
    ADDI rt, rs, immediate
    Adds the 16-bit, sign-extended immediate to the content of Register rs and stores the 32-bit result into Register rt. Traps on two’s complement overflow.

Add Immediate Unsigned
    ADDIU rt, rs, immediate
    Adds the 16-bit, sign-extended immediate to the content of Register rs and stores the 32-bit result into Register rt. Does not trap on overflow.

Set on Less Than Immediate
    SLTI rt, rs, immediate
    Compares the 16-bit, sign-extended immediate with the content of Register rs as signed 32-bit integers. If the content of Register rs is less than the immediate, stores a one into Register rt; otherwise stores a zero into Register rt.

Set on Less Than Immediate Unsigned
    SLTIU rt, rs, immediate
    Compares the 16-bit, sign-extended immediate with the content of Register rs as unsigned 32-bit integers. If the content of Register rs is less than the immediate, stores a one into Register rt; otherwise stores a zero into Register rt.

AND Immediate
    ANDI rt, rs, immediate
    Zero-extends the 16-bit immediate, and ANDs this value with the content of Register rs. Stores the result into Register rt.

OR Immediate
    ORI rt, rs, immediate
    Zero-extends the 16-bit immediate, and ORs this value with the content of Register rs. Stores the result into Register rt.

Exclusive OR Immediate
    XORI rt, rs, immediate
    Zero-extends the 16-bit immediate, and exclusive ORs this value with the content of Register rs. Stores the result into Register rt.

Load Upper Immediate
    LUI rt, immediate
    Shifts the 16-bit immediate left 16 bits. Sets the least-significant 16 bits of the word to zeros. Stores the result into Register rt.
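The difference between sign-extended and zero-extended immediates in Table 4.11 is easy to get wrong, so a small Python sketch may help. The function names are hypothetical, registers are modelled as 32-bit unsigned integers, and the overflow trap of ADDI is deliberately left out of this sketch.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed32(value: int) -> int:
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def addiu(rs: int, imm: int) -> int:
    return (rs + sign_extend16(imm)) & MASK32          # no trap on overflow


def slti(rs: int, imm: int) -> int:
    return int(to_signed32(rs) < sign_extend16(imm))   # signed compare


def sltiu(rs: int, imm: int) -> int:
    # The immediate is still sign-extended; only the comparison is unsigned.
    return int((rs & MASK32) < (sign_extend16(imm) & MASK32))


def andi(rs: int, imm: int) -> int:
    return rs & (imm & 0xFFFF)                         # zero-extended immediate


def ori(rs: int, imm: int) -> int:
    return (rs | (imm & 0xFFFF)) & MASK32


def xori(rs: int, imm: int) -> int:
    return (rs ^ (imm & 0xFFFF)) & MASK32


def lui(imm: int) -> int:
    return (imm & 0xFFFF) << 16                        # low 16 bits are zero


if __name__ == "__main__":
    print(hex(ori(lui(0x1234), 0x5678)))   # LUI followed by ORI builds a constant
    print(slti(5, 0xFFFF), sltiu(5, 0xFFFF))
```

Because ORI zero-extends its immediate, a LUI followed by an ORI can build an arbitrary 32-bit constant without disturbing the upper halfword.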
Table 4.12 Three-Operand, Register-Type Arithmetic Instruction Summary

Instruction / Format and Description

Add
    ADD rd, rs, rt
    Adds the contents of Registers rs and rt and stores the 32-bit result into Register rd. Traps on two’s complement overflow.

Add Unsigned
    ADDU rd, rs, rt
    Adds the contents of Registers rs and rt and stores the 32-bit result into Register rd. Does not trap on overflow.

Subtract
    SUB rd, rs, rt
    Subtracts the content of Register rt from the content of Register rs and stores the 32-bit result into Register rd. Traps on two’s complement overflow.

Subtract Unsigned
    SUBU rd, rs, rt
    Subtracts the content of Register rt from the content of Register rs and stores the 32-bit result into Register rd. Does not trap on overflow.

Set on Less Than
    SLT rd, rs, rt
    Compares the content of Register rt to the content of Register rs as signed, 32-bit integers. If the content of Register rs is less than the content of Register rt, stores a one into Register rd; otherwise stores a zero into Register rd.

Set on Less Than Unsigned
    SLTU rd, rs, rt
    Compares the content of Register rt to the content of Register rs as unsigned, 32-bit integers. If the content of Register rs is less than the content of Register rt, stores a one into Register rd; otherwise stores a zero into Register rd.

AND
    AND rd, rs, rt
    Bitwise ANDs the contents of Registers rs and rt and stores the result into Register rd.

OR
    OR rd, rs, rt
    Bitwise ORs the contents of Registers rs and rt and stores the result into Register rd.

Exclusive OR
    XOR rd, rs, rt
    Bitwise exclusive ORs the contents of Registers rs and rt and stores the result into Register rd.

NOR
    NOR rd, rs, rt
    Bitwise NORs the contents of Registers rs and rt and stores the result into Register rd.
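The distinction in Table 4.12 between the trapping and non-trapping arithmetic instructions, and between signed and unsigned comparison, can be sketched in Python as follows. The exception class `OverflowTrap` and the function names are hypothetical stand-ins; the sketch raises a Python exception where the table says the instruction traps and does not model the core's actual exception handling.

```python
MASK32 = 0xFFFFFFFF


class OverflowTrap(Exception):
    """Stand-in for the trap taken on two's complement overflow."""


def to_signed32(value: int) -> int:
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def _checked(result: int) -> int:
    if not -(1 << 31) <= result < (1 << 31):
        raise OverflowTrap(result)
    return result & MASK32


def add(rs: int, rt: int) -> int:
    return _checked(to_signed32(rs) + to_signed32(rt))


def sub(rs: int, rt: int) -> int:
    return _checked(to_signed32(rs) - to_signed32(rt))


def addu(rs: int, rt: int) -> int:
    return (rs + rt) & MASK32      # same bits as ADD, but never traps


def subu(rs: int, rt: int) -> int:
    return (rs - rt) & MASK32


def slt(rs: int, rt: int) -> int:
    return int(to_signed32(rs) < to_signed32(rt))


def sltu(rs: int, rt: int) -> int:
    return int((rs & MASK32) < (rt & MASK32))


if __name__ == "__main__":
    print(hex(addu(0x7FFFFFFF, 1)))
    print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))
    try:
        add(0x7FFFFFFF, 1)
    except OverflowTrap:
        print("ADD trapped on overflow")
```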
Table 4.13 Shift Instruction Summary

Instruction / Format and Description

Shift Left Logical
    SLL rd, rt, shamt
    Shifts the bits of Register rt left by shamt bits, and inserts zeros into the low-order bits. Stores the 32-bit result into Register rd.

Shift Right Logical
    SRL rd, rt, shamt
    Shifts the bits of Register rt right by shamt bits, and inserts zeros into the high-order bits. Stores the 32-bit result into Register rd.

Shift Right Arithmetic
    SRA rd, rt, shamt
    Shifts the bits of Register rt right by shamt bits, and sign-extends the high-order bits. Stores the 32-bit result into Register rd.

Shift Left Logical Variable
    SLLV rd, rt, rs
    Shifts the bits of Register rt left by the value contained in the low-order 5 bits of Register rs. Inserts zeros into the low-order bits of Register rt and stores the 32-bit result into Register rd.

Shift Right Logical Variable
    SRLV rd, rt, rs
    Shifts the bits of Register rt right by the value contained in the low-order 5 bits of Register rs. Inserts zeros into the high-order bits of Register rt and stores the 32-bit result into Register rd.

Shift Right Arithmetic Variable
    SRAV rd, rt, rs
    Shifts the bits of Register rt right by the value contained in the low-order 5 bits of Register rs. Sign-extends the high-order bits of Register rt and stores the 32-bit result into Register rd.

4.6 Jump and Branch Instructions

Jump and Branch Instructions change the control flow of a program. All Jump and Branch Instructions occur with a one-instruction delay. That is, the instruction immediately following the jump or branch is always executed while the target instruction is being fetched from storage. Refer to Section 2.5, “Branch Delay Slot,” for a detailed discussion of the Delayed Jump and Branch Instructions.
The J-type instruction format is used for both jump and jump-and-link instructions for subroutine calls. In this format, the 26-bit target address is shifted left two bits and combined with the 4 high-order bits of the current program counter to create a 32-bit absolute address.
The R-type instruction format, which takes a 32-bit byte address contained in a register, is used for returns, dispatches, and cross-page jumps.
Branches have 16-bit signed offsets relative to the program counter (I-type). Jump-and-link and Branch-and-link Instructions save a return address in Register 31.
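The address arithmetic described in this section can be sketched in Python. The function names are hypothetical. The sketch assumes 4-byte instructions, so the delay slot sits at the branch address plus 4 and the instruction following the delay slot at the branch address plus 8. For branches it uses the delay-slot address as the base, and for jumps it takes whatever program counter value the caller supplies as the source of the four high-order bits.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def jump_target(pc: int, target26: int) -> int:
    """J/JAL: 26-bit target shifted left two bits, combined with PC bits 31:28."""
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_target(branch_addr: int, offset16: int) -> int:
    """Branch: delay-slot address plus the sign-extended offset shifted left two bits."""
    delay_slot = branch_addr + 4
    return (delay_slot + (sign_extend16(offset16) << 2)) & MASK32


def link_address(branch_addr: int) -> int:
    """Return address saved by the link instructions: the instruction after the delay slot."""
    return (branch_addr + 8) & MASK32


if __name__ == "__main__":
    print(hex(jump_target(0x80001000, 0x100)))
    print(hex(branch_target(0x1000, 0xFFFF)))   # offset of -1 word branches to itself
    print(hex(link_address(0x1000)))
```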
Table 4.14 summarizes the CW400x Jump and Branch Instructions.
Table 4.14 Jump and Branch Instruction Summary

Instruction / Format and Description

Jump
    J target
    Shifts the 26-bit target address left two bits, combines this value with the four high-order bits of the program counter, and jumps to the address with a one-instruction delay.

Jump and Link
    JAL target
    Shifts the 26-bit target address left two bits, combines this value with the four high-order bits of the program counter, and jumps to the address with a one-instruction delay. Stores the address of the instruction following the delay slot into Register r31 (the Link Register).

Jump Register
    JR rs
    Jumps to the address contained in Register rs with a one-instruction delay.

Jump and Link Register
    JALR rs, rd
    Jumps to the address contained in Register rs with a one-instruction delay. Stores the address of the instruction following the delay slot into Register rd.

Branch on Equal
    BEQ rs, rt, offset
    Branches to the target address(1) if the content of Register rs is equal to the content of Register rt.

Branch on Not Equal
    BNE rs, rt, offset
    Branches to the target address if the content of Register rs does not equal the content of Register rt.

Branch on Less Than or Equal to Zero
    BLEZ rs, offset
    Branches to the target address if the content of Register rs is less than or equal to zero.

Branch on Greater Than Zero
    BGTZ rs, offset
    Branches to the target address if the content of Register rs is greater than zero.

Branch on Less Than Zero
    BLTZ rs, offset
    Branches to the target address if the content of Register rs is less than zero.

Branch on Greater Than or Equal to Zero
    BGEZ rs, offset
    Branches to the target address if the content of Register rs is greater than or equal to zero.
Branch on Less Than Zero and Link
  BLTZAL rs, offset
  Stores the address of the instruction following the delay slot into Register r31 (the Link Register). Branches to the target address if the content of Register rs is less than zero.

Branch on Greater Than or Equal to Zero and Link
  BGEZAL rs, offset
  Stores the address of the instruction following the delay slot into Register r31 (the Link Register). Branches to the target address if the content of Register rs is greater than or equal to zero.

Branch on Coprocessor z True
  BCzT offset
  Computes a branch target address by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if Coprocessor z’s condition line (BCPCONDPz signal) is true.

Branch on Coprocessor z False
  BCzF offset
  Computes a branch target address by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if Coprocessor z’s condition line (BCPCONDPz signal) is false.

1. All Branch Instruction target addresses are computed as follows: add the address of the instruction in the delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.

4.7 Branch Likely Instructions

Branch Likely Instructions change the control flow of a program. All Branch Likely Instructions occur with a one-instruction delay (the instruction immediately following the branch is normally executed while the target instruction is being fetched from storage). However, if the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Refer to Section 2.5, “Branch Delay Slot,” for a detailed discussion of the delayed branch instructions.

Branches have 16-bit signed offsets relative to the program counter (I-type). Branch-and-link Instructions save a return address in Register 31.
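The target-address rule and the delay-slot behavior described above can be sketched as follows. This is a minimal illustration rather than a model of the core: the function names are hypothetical, and it assumes 4-byte instructions, so the delay slot sits at the branch address plus 4.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_addr, offset16):
    """Target = delay-slot address + (sign-extended offset shifted left two bits).

    Assumption: instructions are 4 bytes, so the delay slot is at branch_addr + 4.
    """
    delay_slot_addr = (branch_addr + 4) & 0xFFFFFFFF
    return (delay_slot_addr + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def delay_slot_executes(taken, likely):
    """An ordinary branch always executes its delay slot; a Branch Likely
    instruction nullifies the delay slot when the branch is not taken."""
    return taken or not likely
```

For example, a branch at 0x80001000 with an offset of 0xFFFF (that is, -1) targets 0x80001000, because the offset is applied to the delay-slot address 0x80001004.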
Table 4.15 summarizes the CW400x Branch Likely Instructions. These instructions are MIPS-II Instructions.

Table 4.15 Branch Likely Instruction Summary

Instruction
  Format
  Description

Branch on Equal Likely
  BEQL rs, rt, offset
  Branches to the target address (Note 1) if the content of Register rs is equal to the content of Register rt. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Not Equal Likely
  BNEL rs, rt, offset
  Branches to the target address if the content of Register rs does not equal the content of Register rt. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Less Than or Equal to Zero Likely
  BLEZL rs, offset
  Branches to the target address if the content of Register rs is less than or equal to zero. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Greater Than Zero Likely
  BGTZL rs, offset
  Branches to the target address if the content of Register rs is greater than zero. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Less Than Zero Likely
  BLTZL rs, offset
  Branches to the target address if the content of Register rs is less than zero. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Greater Than or Equal to Zero Likely
  BGEZL rs, offset
  Branches to the target address if the content of Register rs is greater than or equal to zero. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Less Than Zero and Link Likely
  BLTZALL rs, offset
  Stores the address of the instruction following the delay slot into Register r31 (the Link Register). Branches to the target address if the content of Register rs is less than zero. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Greater Than or Equal to Zero and Link Likely
  BGEZALL rs, offset
  Stores the address of the instruction following the delay slot into Register r31 (the Link Register). Branches to the target address if the content of Register rs is greater than or equal to zero. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.
Branch on Coprocessor z True Likely
  BCzTL offset
  Computes a branch target address by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if Coprocessor z’s condition line (BCPCONDPz signal) is true. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Coprocessor z False Likely
  BCzFL offset
  Computes a branch target address by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if Coprocessor z’s condition line (BCPCONDPz signal) is false. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

1. All branch instruction target addresses are computed as follows: add the address of the instruction in the delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.

4.8 Special Control Instructions

Special Control Instructions cause an unconditional branch to the general exception-handling vector. Special Control Instructions are always R-type. Table 4.16 summarizes these instructions. These instructions are MIPS-II Instructions.
Table 4.16 Special Control Instruction Summary

Instruction
  Format
  Description

System Call
  SYSCALL
  Initiates a system call trap and immediately transfers control to the Exception Handler.

Breakpoint
  BREAK
  Initiates a breakpoint trap and immediately transfers control to the Exception Handler.
4.9 Trap Instructions

Trap Instructions cause the CW400x to trap to the Exception Handler if certain test conditions are true. Table 4.17 summarizes the CW400x Trap Instructions.

Table 4.17 Trap Instruction Summary

Instruction
  Format
  Description

Trap on Equal
  TEQ rs, rt
  Compares the contents of Registers rs and rt. Traps if the content of Register rs is equal to the content of Register rt.

Trap on Equal Immediate
  TEQI rs, immediate
  Compares the 16-bit, sign-extended immediate with the content of Register rs as signed 32-bit integers. Traps if the content of Register rs is equal to the sign-extended immediate.

Trap on Greater Than or Equal
  TGE rs, rt
  Compares the contents of Registers rs and rt as signed integers. Traps if the content of Register rs is greater than or equal to the content of Register rt.

Trap on Greater Than or Equal Immediate
  TGEI rs, immediate
  Compares the 16-bit, sign-extended immediate with the content of Register rs as signed 32-bit integers. Traps if the content of Register rs is greater than or equal to the sign-extended immediate.

Trap on Greater Than or Equal Immediate Unsigned
  TGEIU rs, immediate
  Compares the 16-bit, sign-extended immediate with the content of Register rs as unsigned 32-bit integers. Traps if the content of Register rs is greater than or equal to the sign-extended immediate.

Trap on Greater Than or Equal Unsigned
  TGEU rs, rt
  Compares the contents of Registers rs and rt as unsigned integers. Traps if the content of Register rs is greater than or equal to the content of Register rt.

Trap on Less Than
  TLT rs, rt
  Compares the contents of Registers rs and rt as signed integers. Traps if the content of Register rs is less than the content of Register rt.

Trap on Less Than Immediate
  TLTI rs, immediate
  Compares the 16-bit, sign-extended immediate with the content of Register rs as signed 32-bit integers. Traps if the content of Register rs is less than the sign-extended immediate.

Trap on Less Than Immediate Unsigned
  TLTIU rs, immediate
  Compares the 16-bit, sign-extended immediate with the content of Register rs as unsigned 32-bit integers. Traps if the content of Register rs is less than the sign-extended immediate.

Trap on Less Than Unsigned
  TLTU rs, rt
  Compares the contents of Registers rs and rt as unsigned integers. Traps if the content of Register rs is less than the content of Register rt.
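The difference between the signed and unsigned immediate forms is easy to misread, because the immediate is sign-extended in both cases and only the comparison changes. The sketch below illustrates the conditions in Table 4.17 for the immediate forms; the function names are hypothetical and the code is not a model of the hardware.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def trap_taken(op, rs_value, imm16):
    """Return True if the immediate-form trap instruction `op` would trap."""
    imm = sign_extend16(imm16)
    if op in ("TGEIU", "TLTIU"):
        # Unsigned forms: both operands are viewed as unsigned 32-bit values.
        a, b = rs_value & 0xFFFFFFFF, imm & 0xFFFFFFFF
    else:
        a, b = to_signed32(rs_value), imm
    if op == "TEQI":
        return a == b
    if op in ("TGEI", "TGEIU"):
        return a >= b
    if op in ("TLTI", "TLTIU"):
        return a < b
    raise ValueError("unknown trap instruction: " + op)
```

With rs = 5 and an immediate of 0xFFFF, TGEI traps (5 >= -1), while TGEIU does not, because the sign-extended immediate is treated as 0xFFFFFFFF.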
4.10 Coprocessor Instructions

For Coprocessor 1 through 3 Instructions, the corresponding Coprocessor Usable Bits, Cu[3:1], in the Status Register must be set. If a coprocessor is not enabled, its coprocessor instructions cause a Coprocessor Unusable (CpU) Exception. The same applies to Coprocessor 0, except that when the processor is in Kernel Mode, the Cu0 Bit does not matter. Also note that the LWC0 and SWC0 Instructions cause a Reserved Instruction (RI) Exception.

Coprocessor Branch Instructions are I-type (they carry a 16-bit offset). Table 4.18 summarizes the different Coprocessor Instructions.

Table 4.18 Coprocessor Instruction Summary

Instruction
  Format
  Description

Load Word to Coprocessor
  LWCz rt, offset(base)
  Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Loads the content of the addressed word into Register rt of Coprocessor z.

Store Word from Coprocessor
  SWCz rt, offset(base)
  Sign-extends the 16-bit offset and adds this value to the content of Register base to create an address. Stores the content of Register rt from Coprocessor z to the addressed word.

Move to Coprocessor
  MTCz rt, rd
  Moves the content of CW400x Register rt into Register rd of Coprocessor z.

Move from Coprocessor
  MFCz rt, rd
  Moves the content of Register rd of Coprocessor z into CW400x Register rt.

Move Control to Coprocessor
  CTCz rt, rd
  Moves the content of CW400x Register rt into Control Register rd of Coprocessor z.

Move Control from Coprocessor
  CFCz rt, rd
  Moves the content of Control Register rd of Coprocessor z into CW400x Register rt.

Coprocessor Operation
  COPz cofun
  Coprocessor z performs the user-defined coprocessor function cofun. The CW400x’s state is not modified.

Branch on Coprocessor z True
  BCzT offset
  Computes a branch target address by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if Coprocessor z’s condition line (BCPCONDPz signal) is true.
Branch on Coprocessor z False
  BCzF offset
  Computes a branch target address by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left two bits and sign-extended to 32 bits). Branches to the target address (with a delay of one instruction) if Coprocessor z’s condition line (BCPCONDPz signal) is false.

Branch on Coprocessor z True Likely
  BCzTL offset
  See Table 4.15, “Branch Likely Instruction Summary.”

Branch on Coprocessor z False Likely
  BCzFL offset
  See Table 4.15, “Branch Likely Instruction Summary.”

4.11 System Control Coprocessor (CP0) Instructions

Coprocessor 0 Instructions perform operations on the System Control Coprocessor (CP0) Registers to manipulate the memory management and exception-handling facilities of the processor. Table 4.19 summarizes the CP0 Instructions.

The CW400x and the MR400x treat TLB Access Instructions as NOPs.

When in User Mode, if the Cu0 Bit in the Status Register is set to zero, the CW400x takes a Coprocessor Unusable Exception if it decodes an RFE, MTC0, or MFC0 Instruction.
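The coprocessor usability rules stated in Sections 4.10 and 4.11 can be summarized as a small check. This is only a sketch: the function name is hypothetical, and the bit positions (Cu[3:0] at Bits [31:28], KUc at Bit 1, with zero meaning kernel) are taken from the Status Register description in Section 5.1.1.

```python
def coprocessor_usable(status, cop):
    """Return True if an instruction for coprocessor `cop` (0..3) may execute.

    Otherwise the instruction would raise a Coprocessor Unusable (CpU) Exception.
    """
    if not 0 <= cop <= 3:
        raise ValueError("coprocessor number must be 0..3")
    kernel_mode = ((status >> 1) & 1) == 0  # KUc: 0 = kernel, 1 = user
    if cop == 0 and kernel_mode:
        return True  # CP0 is always usable in Kernel Mode, whatever Cu0 holds
    return bool((status >> (28 + cop)) & 1)  # Cu[cop] must be set
```

Note that the check does not cover LWC0 and SWC0, which the text says cause a Reserved Instruction Exception rather than a CpU Exception.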
Table 4.19 CP0 Instruction Summary

Instruction
  Format
  Description

Move to CP0
  MTC0 rt, rd
  Loads the content of CW400x Register rt into CP0 Register rd.

Move from CP0
  MFC0 rt, rd
  Loads the content of CP0 Register rd into CW400x Register rt.

Restore from Exception
  RFE
  Restores the previous interrupt mask and mode bits of the Status Register into the current status bits. Restores the old status bits into the previous status bits.

Wait for Interrupt
  WAITI
  Stops execution of instructions and places the processor into a power-save condition until a hardware interrupt or reset is received.
Figure 4.5 shows the waveforms for the WAITI Instruction.

Figure 4.5 WAITI Instruction Waveforms

[Waveform diagram not reproduced. Signals shown: PCLK, ADDRP[31:0], CWAITIP, CRUN_INN, CRUN_OUTP, and BINTP0. Labels shown: WAITI - IF, WAITI - X1, WAITI - WB, and Exception Vector.]

The opcode encoding for the WAITI Instruction is shown in Tables 4.2 and 4.7.
Chapter 5
Exception Processing (CP0)

This chapter describes exception processing in the MiniRISC CW400x Microprocessor Core. The System Control Coprocessor (CP0) processes exceptions.

This chapter contains the following sections:

♦ Section 5.1, “Exception Handling Registers”
♦ Section 5.2, “Exception Processing”
♦ Section 5.3, “Exception Description Details”

5.1 Exception Handling Registers

Table 5.1 shows the registers that the CW400x uses to handle exceptions. During exception processing, software can examine these registers to determine the cause of an exception and the state of the CW400x.

The Translation Lookaside Buffer (TLB) Registers (EntryHi, EntryLo, Index, Random, BadVA, and Context) are used only when an MMU is attached to the CW400x. These registers are not implemented as part of the basic CW400x Microprocessor Core. However, the CW400x CBus Interface does include hooks to attach a TLB as an external part of Coprocessor 0. The user accesses the TLB Registers as if they were in the CW400x. The CW400x maps these accesses to the registers in the MMU.

The CW400x registers are described in detail in the following subsections. The TLB Registers are described in detail in the MiniRISC Building Blocks Technical Manual.
Table 5.1 Exception-Processing Register Addresses

Register Address   Register Name
0                  Index (Note 1)
1                  Random (Note 1)
2                  EntryLo (Note 1)
4                  Context (Note 1)
8                  Bad Virtual Address (Note 1)
10                 EntryHi (Note 1)
12                 Status
13                 Cause
14                 Exception Program Counter
15                 Processor Revision Identifier

1. Used only when an MMU is attached; described in the MiniRISC Building Blocks Technical Manual.

5.1.1 Status Register (R12)

The format of the CW400x Status Register is similar to the R3000 Status Register, except the CW400x Status Register does not contain the PE (Parity Error), CM (Cache Miss), PZ (Parity Zero), SwC (Swap Caches), or the IsC (Isolate Cache) Bits (Bits 16-20). These fields are defined as unusable (value: 0). However, the functionality of the register and the remaining fields is the same as the R3000.

The Status Register contains all major status bits for exception conditions. All defined bits in the Status Register, with the exception of the TS (TLB Shutdown) Bit, are readable and writable; the TS Bit is read-only. Additional details on the function of each Status Register bit are provided in the paragraphs that follow.

Figure 5.1 shows the format of the Status Register. Upon reset, BEV = 1, KUc = 0, and IEc = 0; all other bits of this register are undefined.
Figure 5.1 Status Register

Bits      Field
[31:28]   Cu[3:0]
[27:23]   R
22        BEV
21        TS
[20:16]   R
[15:10]   Intr[5:0]
[9:8]     Sw[1:0]
[7:6]     R
5         KUo
4         IEo
3         KUp
2         IEp
1         KUc
0         IEc

Cu[3:0]  Coprocessor Usability Bits  [31:28]
Software sets the corresponding bits of Cu[3:0] to one to indicate that the associated coprocessor is usable. Bit 31 corresponds to Coprocessor 3 and Bit 28 corresponds to Coprocessor 0. When a coprocessor instruction references a disabled coprocessor, it causes a Coprocessor Unusable Exception (CpU). Note that the System Control Coprocessor (CP0) is always considered usable when the CW400x is operating in kernel mode, regardless of the setting of the Cu0 Bit.

R  Reserved  [27:23], [20:16], [7:6]
These bits are reserved and read as zero. The CW400x ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of hardware.

BEV  Bootstrap Exception Vector  22
This bit selects between two destination addresses for exceptions.

The BEV Bit controls the location of Exception Vectors during bootstrap (immediately following reset). When this bit is set to zero, the Normal Exception Vector locations are used; when the bit is set to one, Bootstrap Exception Vector locations are used.

BEV set to Zero: The UTLB Miss Exception Vector is located at 0x80000000, and the General Exception Vector is located at 0x80000080.

BEV set to One: The UTLB Miss Exception Vector is relocated to an address of 0xBFC00100, and the General Exception Vector is relocated to 0xBFC00180. This alternate set of vectors can be used when diagnostic tests cause exceptions to occur prior to verification of proper operation of the cache and main memory system.

The CW400x sets this bit to one upon deassertion of the Reset Signal.
TS  TLB Shutdown  21
This bit indicates that the TLB has shut down due to an attempt to access several TLB entries at the same time. This bit is read-only.

Intr[5:0]  Hardware Interrupt Enables Mask  [15:10]
Software sets these six bits to one to enable the corresponding hardware interrupts. Bit 15 corresponds to INT5, and Bit 10 corresponds to INT0. All interrupts can be disabled by clearing the Interrupt Enable Bit (IEc) described below.

Sw[1:0]  Software Interrupt Enables Mask  [9:8]
Software sets these two bits to one to enable the corresponding software interrupts. All interrupts can be disabled by clearing the Interrupt Enable Bit (IEc) described below.

KUo, p, c  Kernel/User Mode, Old/Previous/Current  5, 3, 1
The KUo, KUp, and KUc Bits comprise a three-level stack showing the old/previous/current mode (zero means kernel; one means user). The occurrence of an exception automatically puts the system in Kernel Mode. Manipulation and use of these bits during exception processing is described in Section 5.2.2, “Status Register Mode Bits and Exception Processing.”

IEo, p, c  Interrupt Enable, Old/Previous/Current  4, 2, 0
The IEo, IEp, and IEc Bits comprise a three-level stack showing the old/previous/current interrupt enable settings (zero means disabled; one means enabled). Manipulation and use of these bits during exception processing is described in Section 5.2.2, “Status Register Mode Bits and Exception Processing.”
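As an illustration of the field layout just described, the following sketch splits a 32-bit Status Register value into its named fields. The function name and the dictionary representation are assumptions made for the example; only the bit positions come from the text.

```python
def decode_status(status):
    """Split a 32-bit Status Register value into its documented fields."""
    def bit(n):
        return (status >> n) & 1

    return {
        "Cu": (status >> 28) & 0xF,     # Cu[3:0], Bits [31:28]
        "BEV": bit(22),
        "TS": bit(21),
        "Intr": (status >> 10) & 0x3F,  # Intr[5:0], Bits [15:10]
        "Sw": (status >> 8) & 0x3,      # Sw[1:0], Bits [9:8]
        "KUo": bit(5), "IEo": bit(4),
        "KUp": bit(3), "IEp": bit(2),
        "KUc": bit(1), "IEc": bit(0),
    }
```

For instance, a value with only Bit 22 set (0x00400000) decodes to BEV = 1 with KUc = 0 and IEc = 0, which matches the defined reset state of those three fields.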
5.1.2 Cause Register (R13)

The format of the Cause Register is the same in the CW400x as in the R3000. The only difference is the way the CW400x sets the BD (Branch Delay) Bit. The CW400x sets the BD Bit only when an exception occurs during the execution of the instruction in the branch delay slot and the branch is taken. If the branch is not taken, the CW400x does not set the BD Bit, even if an exception occurs during the delay slot.

The contents of the Cause Register describe the last occurring exception. A four-bit exception code (ExcCode) indicates the cause of the exception. The remaining bit fields contain detailed information specific to certain exceptions. With the exception of the SI[1:0] Bits, all bits in the register are read-only. Writes to the SI[1:0] Bits set or reset software interrupts. The description also lists and briefly describes all possible exception causes.

Figure 5.2 shows the format of the Cause Register. Upon reset, the content of this register is undefined.
Figure 5.2 Cause Register

Bits      Field
31        BD
30        R
[29:28]   CE
[27:16]   Reserved
[15:10]   IP[5:0]
[9:8]     SI[1:0]
[7:6]     R
[5:2]     ExcCode
[1:0]     R

BD  Branch Delay  31
The CW400x sets this bit to one to indicate that the last exception was taken while executing in a branch delay slot and the branch was taken. (Differs from the R3000 and R4000.)

R  Reserved  30, [27:16], [7:6], [1:0]
These bits are reserved and read as zero. The CW400x ignores attempts to set these bits; however, software should write these bits as zero to ensure compatibility with future versions of hardware.

CE  Coprocessor Error  [29:28]
When taking a Coprocessor Unusable Exception, the CW400x writes the referenced coprocessor number in this field. This field is otherwise undefined.

IP[5:0]  Interrupt Pending  [15:10]
The CW400x sets these bits to indicate that an external interrupt is pending. Bit 15 corresponds to Interrupt 5 and Bit 10 corresponds to Interrupt 0. For MIPS compatibility, the Interrupt Pending Bits should be attached to Coprocessors as follows:

Bit   Attachment
15    Coprocessor 3, Interrupt 5
14    Coprocessor 2, Interrupt 4
13    Coprocessor 1, Interrupt 3, FPU
12    Coprocessor 0, Interrupt 2

The system designer can attach Interrupts 0 and 1 (Bits 10 and 11) in any way desired.

SI[1:0]  Software Interrupts  [9:8]
By setting either of these bits to one, software causes the CW400x to transfer control to the general exception routine. The exception routine can tell which software interrupt bit is set (pending) by reading this field. The exception routine must reset the SI[1:0] Bits to zero before returning control to the interrupting software.

ExcCode  Exception Code  [5:2]
The CW400x sets this field to indicate the type of event that caused the last general exception. The four bits are encoded as described in the table below. For more detail see Table 5.2.
[5:2]   Mnemonic   Description
0x0     Int        Interrupt
0x1     TLBMOD     TLB Modification Exception
0x2     TLBL       TLB Miss Exception, Load or Instruction
0x3     TLBS       TLB Miss Exception, Store
0x4     AdEL       Address Error Exception, Load or Instruction
0x5     AdES       Address Error Exception, Store
0x6     IBE        Bus Error Exception, Instruction Fetch
0x7     DBE        Bus Error Exception, Data Load or Store
0x8     Sys        System Call Exception (SYSCALL Instr.)
0x9     Bp         Breakpoint Exception
0xA     RI         Reserved Instruction Exception
0xB     CpU        Coprocessor Unusable Exception
0xC     Ovf        Arithmetic Overflow Exception
0xD     Tr         Trap Exception
0xE                Reserved
0xF                Reserved
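An exception handler typically begins by decoding the Cause Register. The sketch below shows one way to do so using the field positions and ExcCode mnemonics listed above; the function and table names are hypothetical.

```python
# ExcCode value -> mnemonic, following the encoding table above.
EXC_CODES = {
    0x0: "Int", 0x1: "TLBMOD", 0x2: "TLBL", 0x3: "TLBS",
    0x4: "AdEL", 0x5: "AdES", 0x6: "IBE", 0x7: "DBE",
    0x8: "Sys", 0x9: "Bp", 0xA: "RI", 0xB: "CpU",
    0xC: "Ovf", 0xD: "Tr",
}


def decode_cause(cause):
    """Split a 32-bit Cause Register value into its documented fields."""
    return {
        "BD": (cause >> 31) & 1,
        "CE": (cause >> 28) & 0x3,    # valid only for CpU exceptions
        "IP": (cause >> 10) & 0x3F,   # IP[5:0], Bits [15:10]
        "SI": (cause >> 8) & 0x3,     # SI[1:0], Bits [9:8]
        "ExcCode": EXC_CODES.get((cause >> 2) & 0xF, "Reserved"),
    }
```

A value of 0x80000024, for example, decodes to a Breakpoint Exception (Bp) with the BD Bit set.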
5.1.3 Exception Program Counter (EPC) Register (R14)

The 32-bit, read-only Exception Program Counter (EPC) Register contains the address of the instruction that caused the exception. However, when the exception instruction resides in a branch delay slot and the branch is taken, the CW400x sets the Cause Register BD Bit and places the address of the immediately preceding branch or jump instruction into the EPC Register.

The EPC Register behaves like the R3000 EPC Register except when an exception occurs in the branch delay slot and the branch is not taken. In this case, the EPC Register points to the instruction causing the exception, even if it is in the delay slot. The R3000 EPC Register always reflects the branch instruction address when the delay slot contains the exception-causing instruction, whether or not the branch was taken.
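The EPC and BD rule described above reduces to a single condition. The following sketch assumes 4-byte instructions (so the branch sits 4 bytes before its delay slot); the function name and arguments are hypothetical.

```python
def record_exception_pc(fault_addr, in_delay_slot, branch_taken):
    """Return (EPC, BD) for an exception at fault_addr, per the CW400x rule.

    Assumption: instructions are 4 bytes, so the branch is at fault_addr - 4
    when the faulting instruction is in a delay slot.
    """
    if in_delay_slot and branch_taken:
        # Restart at the branch so that it is re-executed together with its slot.
        return (fault_addr - 4) & 0xFFFFFFFF, 1
    # Not in a delay slot, or the branch was not taken: EPC points at the
    # faulting instruction itself and BD stays clear.
    return fault_addr & 0xFFFFFFFF, 0
```

A handler written for the R3000 that relies on EPC pointing at the branch whenever the fault is in a delay slot would therefore need to be checked against the not-taken case.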
Figure 5.3 shows the format of the EPC Register. Upon reset, the content of this register is undefined.

Figure 5.3 EPC Register

Bits     Field
[31:0]   EPC

EPC  Virtual Address  [31:0]
This register contains the Virtual Address of the exception-causing instruction or the address of the immediately preceding branch or jump instruction.

5.1.4 Processor Revision Identifier (PRId) Register (R15)

The Processor Revision Identifier (PRId) Register contains information that identifies the implementation and revision level of the processor and System Control Coprocessor. The format is the same as the R3000. Software should not depend on this register to identify the revision of any MiniRISC microprocessor.

The PRId Register is read-only. The lowest four bits of each field are inputs into the CW400x (CLOIDP[3:0] and CLOPRP[3:0]) and are hardwired to a defined value.
The revision number distinguishes some chip revisions. However, LSI Logic is free to change this register at any time and does not guarantee that changes to its chips will necessarily change the revision number or that changes to the revision number necessarily reflect real chip changes. For this reason, software should not rely on the revision number to characterize the chip.

Figure 5.4 shows the format of the PRId Register. Upon reset, the content of this register is 0x00001000.

Figure 5.4 PRId Register

Bits      Field
[31:16]   R
[15:8]    IMP
[7:0]     REV

R  Reserved  [31:16]
These bits are reserved and read as zero. The CW400x ignores attempts to set these bits.

IMP  Implementation  [15:8]
This eight-bit field contains the CW400x’s implementation number. Bits [15:12] are hardwired to 0001 (binary). The CLOIDP[3:0] inputs drive Bits [11:8].

REV  Revision  [7:0]
This eight-bit field contains the CW400x’s revision number. Bits [7:4] are hardwired to 0000 (binary). The CLOPRP[3:0] inputs drive Bits [3:0].
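To make the field composition concrete, the sketch below assembles a PRId value from the two 4-bit inputs. The function name is hypothetical, and treating the inputs as plain integers is an assumption made for illustration.

```python
def prid_value(cloidp, cloprp):
    """Compose the PRId Register value from the CLOIDP[3:0] and CLOPRP[3:0] inputs."""
    imp = (0b0001 << 4) | (cloidp & 0xF)  # Bits [15:12] hardwired to 0001
    rev = (0b0000 << 4) | (cloprp & 0xF)  # Bits [7:4] hardwired to 0000
    return (imp << 8) | rev               # Bits [31:16] read as zero
```

With both inputs tied to zero, the result is 0x00001000, which agrees with the reset value quoted above.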
5.2 Exception Processing

Table 5.2 lists and describes CW400x supported exceptions.

Table 5.2 CW400x Exceptions

Exception
  Description

Reset
  Assertion of the Reset Signal causes an exception that transfers control to the Special Vector at virtual address 0xBFC00000.

User TLB Miss
  A reference is made to a page in kuseg that has no matching TLB entry.

TLB Miss
  A referenced TLB entry’s Valid Bit is not set or a reference is made to a kseg2 page that has no matching TLB Entry.

TLB Modified
  During a store, the Valid Bit is set but the Dirty Bit is not set in the referenced TLB Entry.

Bus Error
  Assertion of the Bus Error Signal.

Address Error
  Attempt to load, fetch, or store an unaligned word, or reference to a virtual address with the most significant bit set while in User Mode.

Overflow
  Two’s complement overflow during add or subtract.

System Call
  Execution of the SYSCALL Instruction.

Breakpoint
  Execution of the BREAK Instruction.

Reserved Instruction
  Execution of an instruction with undefined opcode fields.

Coprocessor Unusable
  Execution of a coprocessor instruction when the appropriate Cu Bit is not set.

Interrupt
  Assertion of one of the six hardware interrupt inputs or setting one of the two software interrupt bits in the Cause Register.

Trap
  Execution of a Trap Instruction with a true condition.

When an exception occurs, the CW400x aborts the current instruction and all instructions following in the pipeline that have already begun execution. The exception puts the system in kernel mode. The CW400x sets the ExcCode in the Cause Register (see Section 5.1.2, “Cause Register (R13)”). The CW400x jumps directly into a designated exception handler routine. The CW400x loads the Exception Program Counter (EPC) Register with an appropriate restart location where execution may resume after the exception is serviced. The restart location in the EPC is the address of the instruction that caused the exception or, if the instruction was executing in a branch delay slot and the branch is taken, the address of the branch instruction immediately preceding the delay slot.

Even though the processor is pipelined, exceptions are reported in the order they occur, so all exceptions for the current instruction are reported prior to exceptions for successive instructions. The characteristics of the machine’s pipeline staging, however, cannot guarantee that all processor and associated system states will remain completely unchanged as a result of the (possibly incomplete) execution of the instruction immediately following an instruction that causes an exception. Examples of these state changes include:
♦ Instructions may have been read from memory and loaded into the I-Cache.
♦ The cache may have been updated in response to a bus error on a cacheable memory write operation.

The above events can normally be ignored because enough of the machine’s state is restored so that execution always resumes properly after servicing the exception.

This subsection describes the CW400x’s exception handling mechanisms, the System Control Coprocessor (CP0) Registers, and all events that cause exceptions.

The CW400x is always in one of two operating modes: normal or exception. In the normal operating mode, the CW400x executes the program-specified sequence of instructions. In the exception mode, the normal sequence of instruction execution is suspended to allow the CW400x to respond to abnormal or asynchronous events. The CW400x’s exception-handling system efficiently manages machine exceptions, including arithmetic overflows, I/O interrupts, and system calls.

Exception causes are the same for the CW400x and the R4000, but the CW400x implements the Exception Registers differently than the R4000. The CW400x has all the same registers as the R4000 but not all the same register fields.

The only functional difference in exception handling is the implementation of the BD Bit in the Cause Register and the behavior of the EPC Register. The CW400x sets the BD Bit only if the branch is taken and an exception occurs in the delay slot. The EPC Register will then contain the address of the branch, not the exception-causing instruction’s address. Otherwise, the CW400x does not set the BD Bit and the EPC Register contains the address of the exception-causing instruction.

Each MiniRISC exception (its cause, handling, and servicing) is identical to that of the R4000, except for the special case of an exception occurring in the branch delay slot (see Section 2.5, “Branch Delay Slot”).

Each exception is classified by the stage in which the exception is acknowledged. For all IF Exceptions, the Instruction Fetch is invalidated and in the next run cycle (a clock cycle in which the CW400x is running) the exception is taken. For X1 and X2 Exceptions, the CW400x takes the exception in the same cycle the exception is signaled (the Internal Exception Taken Signal is asserted). For WB Exceptions, the CW400x takes the exception in the next run cycle.
5.2.1 Exception Vector Locations

Table 5.3 shows the three different addresses the CW400x uses for exception vectors.

If the BEV (Bootstrap Exception Vector) Bit in the Status Register is set to one, the UTLB Miss Exception Vector address is changed to 0xBFC00100, and the General Exception Vector is changed to 0xBFC00180, while the Reset Vector remains unchanged.
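The vector selection can be expressed as a lookup keyed on the BEV Bit. The sketch below uses the addresses given in Table 5.3 and the BEV position (Bit 22) from the Status Register description; the table and function names are hypothetical.

```python
# (normal location, bootstrap location) for each vector, per Table 5.3.
VECTORS = {
    "Reset":     (0xBFC00000, 0xBFC00000),
    "UTLB Miss": (0x80000000, 0xBFC00100),
    "General":   (0x80000080, 0xBFC00180),
}


def exception_vector(kind, status):
    """Return the vector address for `kind` given the Status Register value."""
    bev = (status >> 22) & 1  # BEV is Bit 22 of the Status Register
    return VECTORS[kind][bev]
```

Because BEV is set to one after reset, the bootstrap locations are the ones in effect until software clears the bit.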
Table 5.3 Exception Vector Locations

Exception Vector   Normal Location   Bootstrap Location
Reset              0xBFC00000        0xBFC00000
UTLB Miss          0x80000000        0xBFC00100
General            0x80000080        0xBFC00180

5.2.2 Status Register Mode Bits and Exception Processing

When the CW400x responds to an exception, it saves the current Kernel/User Mode (KUc) and current Interrupt Enable Mode (IEc) Bits of the Status Register into the previous Mode Bits (KUp and IEp). It saves the previous Mode Bits (KUp and IEp) into the old Mode Bits (KUo and IEo). It clears the current Mode Bits (KUc and IEc) to cause the processor to enter the kernel operating mode and to disable all interrupts.

This three-level set of mode bits lets the CW400x respond to two levels of exceptions before software must save the contents of the Status Register. Figure 5.5 shows how the CW400x manipulates the Status Register during exception recognition.
Figure 5.5 Status Register Changes During Exception Recognition

[Diagram not reproduced: Status Register Bits 5 through 0 (KUo, IEo, KUp, IEp, KUc, IEc) before and after exception recognition, with KUc and IEc cleared to 0.]

After an exception handler has completed execution, the CW400x must return to the system context that existed prior to the exception (if possible). The Restore From Exception (RFE) Instruction provides the mechanism for this return.

The RFE Instruction restores control to a process that was preempted by an exception. When the RFE Instruction is executed, it restores the previous Interrupt Enable (IEp) Bit and Kernel/User Mode (KUp) Bit in the Status Register into the corresponding current Status Bits (IEc and KUc). It also restores the old Status Bits (IEo and KUo) into the corresponding previous Status Bits (IEp and KUp). The old Status Bits (IEo and KUo) remain unchanged. Figure 5.6 illustrates the actions of the RFE Instruction.

Figure 5.6 Restoring Control from Exceptions (RFE Instruction)

[Diagram not reproduced: Status Register Bits 5 through 0 (KUo, IEo, KUp, IEp, KUc, IEc) before and after the return from exception.]
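The push performed on exception recognition and the pop performed by RFE both act on the low six bits of the Status Register. A minimal sketch follows, assuming the bit layout from Section 5.1.1 (KUo/IEo at Bits 5 and 4, KUp/IEp at Bits 3 and 2, KUc/IEc at Bits 1 and 0); the function names are hypothetical.

```python
STACK_MASK = 0x3F  # KUo, IEo, KUp, IEp, KUc, IEc occupy Bits [5:0]


def enter_exception(status):
    """Push the mode stack: current -> previous, previous -> old, current = 0."""
    stack = status & STACK_MASK
    new_stack = (stack << 2) & STACK_MASK  # the old pair is shifted out and lost
    return (status & ~STACK_MASK) | new_stack


def rfe(status):
    """Pop the mode stack: previous -> current, old -> previous, old unchanged."""
    stack = status & STACK_MASK
    new_stack = (stack & 0x30) | (stack >> 2)
    return (status & ~STACK_MASK) | new_stack
```

Starting from user mode with interrupts enabled (low bits 000011), one exception yields 001100 and a nested one yields 110000; a third exception would overwrite the old pair, which is why software must save the Status Register before allowing a third level.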
5.2.3 System Control Coprocessor (CP0) Function

The CP0 generates the Kill Signals needed by the CW400x and peripherals for instruction cancellation in the case of exceptions. The CP0 processes the exceptions detected by the CW400x and peripherals by updating the Exception Handling Registers to reflect the state of the exception. The CP0 contains the four registers that are important in exception processing: the Status Register, the Cause Register, the EPC Register, and the Processor Revision Identifier Register. After reset, in the Reset Exception Handler, the software should initialize these registers since most of the fields come up undefined.
5.2.4 Register Accesses

The only way to access the CP0 Registers is by using the Move From and Move To Coprocessor Zero Instructions, MFC0 and MTC0. Table 5.4 shows the register numbers for the CP0 Registers.

Table 5.4 CP0 Register Addresses

Register Number   Register Name
R12               Status
R13               Cause
R14               Exception Program Counter
R15               Processor Revision Identifier

The transaction protocols initiated by these commands do not resemble the MFC/MTCs for external coprocessors, because the CP0 is not an external coprocessor. The CP0 is integrated into the CW400x, allowing direct access to the internal data flow.

5.2.5 Exception Handling

The conditions that cause the instruction flow to deviate from the normal flow of execution are called exceptions. If two exceptions occur simultaneously, the one with the higher priority takes precedence and is serviced. The Cause and the EPC Registers will reflect the exception with the higher priority. Table 5.5 lists the specific exception conditions in hierarchical order from highest to lowest priority.

Exceptions cause the CW400x to update the Status, Cause, and EPC Registers and jump instruction flow to an Exception Vector. The CW400x also generates the appropriate Kill (Instruction Invalidate) Signals, CKILLMEMP, CKILLXP, and CKILLWP. CKILLMEMP is used to kill external memory transactions, CKILLXP is used to kill the instruction in the Execute (X) Stage, and CKILLWP is used to kill the instruction in the Writeback Stage.
5-14 Exception Processing (CP0)
Table 5.5 Exception Priority
Priority   Exception                                   Stage Serviced
Highest    Reset                                       –
           Trap, Overflow, Data Bus Error              WB
           Data Address Error                          X2
           Data TLB Miss/TLB Miss User                 X2
           TLB Modify                                  X2
           Instruction Bus Error                       X1
           SYSCALL/BREAK/TRAP/Reserved Instruction     X1
           Coprocessor Unusable                        X1
           Interrupt                                   X1
           Instruction Address Error                   IF
Lowest     Instruction TLB Miss/TLB Miss User          IF
Some exceptions have priority over others when simultaneous exceptions occur. For example, suppose the instruction in the X Stage is a BREAK Instruction and, in the same run cycle, an external interrupt is signalled. The BREAK Exception will be serviced before the interrupt, since it is a higher priority exception.
Figure 5.7 shows typical pipeline flow.
Figure 5.7 Typical Pipeline Flow
[Figure: pipeline stage sequences (IF, X1, X2, WB) for successive instructions.]
5.2.5.1 Kill (Instruction Invalidate) Signals
Asserting the CKILLXP Signal invalidates the X Stage, and asserting the CKILLWP Signal invalidates the WB Stage. Except for interrupts and Branch Likely Instructions whose branch is not taken, the CW400x asserts CKILLMEMP to kill the current memory transaction (invalidate the current instruction). The CW400x does not assert CKILLMEMP during interrupts and not-taken Branch Likely Instructions because of possible data dependencies caused by load scheduling. The Kill Signals are only valid on the rising edge of the clock. In the following figures, the Run Signal is sometimes shown to be continuously LOW, which is seldom true. When the processor stalls, the CW400x signals are extended until the next run cycle (the Kill Signals are asserted for a defined number of run cycles). Depending on the stall mix, the total number of cycles the signals are asserted will vary with the number of stall cycles.
Figure 5.8 shows the appropriate Kill Signals and their timing with respect to the detection of a Branch Likely Instruction that was not taken, which occurs in the X1 Stage.
Figure 5.8 Branch Likely, Branch Not Taken (X1 Stage)
[Waveform: PCLKP, CRUN_INN, CKILLXP, CKILLWP, CKILLMEMP; a BLTZL Instruction followed by an Add Instruction that is killed.]
5.2.5.2 General Exceptions
Figures 5.9 through 5.11 show examples of the Kill Signals associated with exceptions occurring in different stages.
Figure 5.9, the waveform for the System Call Exception (SYSCALL Instruction), shows the instruction invalidate sequence for any exception in the X1 Stage except an Instruction Bus Error (see Figure 5.15). (See also Section 5.3.9, “System Call Exception.”)
Figure 5.9 X1 Stage Exception (System Call)
[Waveform: PCLKP, CRUN_INN, EXCEPT_DETECT (internal), CKILLXP, CKILLWP, CKILLMEMP.]
Note: EXCEPT_DETECT is an exception detected by the MMU, the ALU, a decode, or another external device.
Figure 5.10, the waveform for the Overflow Exception, is a general waveform for all exceptions that are signalled in the WB Stage, except for a Data Bus Error (see Figure 5.16). (See also Section 5.3.6, “Overflow Exception.”)
Figure 5.10 WB Stage Exception (Overflow)
[Waveform: PCLKP, CRUN_INN, EXCEPT_DETECT (internal), CKILLXP, CKILLWP, CKILLMEMP.]
Figure 5.11 shows the Kill Waveform for an exception signalled in the IF Stage. Even though an exception is signalled in the IF Stage, the CW400x does not assert any Kill Signal or Exception Taken Signal until the following X Stage, except for CKILLMEMP, which is used to kill the Instruction Fetch that causes the exception.
Figure 5.11 IF Stage Exception (TLB Miss, Instruction)
[Waveform: PCLKP, CRUN_INN, EXCEPT_DETECT (internal), CKILLXP, CKILLWP, CKILLMEMP.]
Figure 5.12 shows the Kill Waveform for a Reset Exception. The Reset is special in that the Kill Signal protocols do not fit into the other three categories (IF, X, and WB).
Figure 5.12 Reset Exception (Special Case)
[Waveform: PCLKP, CRUN_INN, BCPU_RESETN, CKILLXP, CKILLWP, CKILLMEMP.]
Figure 5.13 shows the instruction invalidation protocol for an exception signalled in the X2 Stage. The Kill Signals, CKILLMEMP and CKILLXP, are valid on the rising edge at the end of the X2 Stage. These signals are intended to kill the X2 Stage of the exception-causing instruction.
Figure 5.13 X2 Stage Exception (TLB Miss, Data Load)
[Waveform: PCLKP, CRUN_INN, EXCEPT_DETECT (internal), CKILLXP, CKILLWP, CKILLMEMP.]
Figure 5.14 shows an interrupt exception signalled in the X2 Stage of an instruction. The CW400x defers the handling of the interrupt exception to the next instruction’s Execute Stage, and consequently the EPC Register reflects the address of the instruction in the following Execute Stage. Interrupts are never serviced in the X2 Stage. Even when the interrupt is asserted, the exception is not serviced until the following X Stage (X1).
The Interrupt Exception is discussed further in Section 5.2.5.3, “Interrupt Processing.”
Figure 5.14 External Interrupt Signalled During X2 Stage
[Waveform: PCLKP, CRUN_INN, BINTP[5:0], CKILLXP, CKILLWP, CKILLMEMP.]
There are two exceptions to the general waveforms mentioned in Figures 5.13 and 5.14. Unlike the other exception signals, bus errors need not be held to the next run cycle in order to be acknowledged. As long as the Bus Error Signal is held asserted during the rising edge of the clock, the CW400x will acknowledge it (assuming no higher priority simultaneous exception). Like the IF Exceptions, the CW400x forwards Bus Error Exceptions into the stage following the one in which they were asserted (see Figures 5.15 and 5.16).
Figure 5.15 Instruction Bus Error (X1 Stage)
[Waveform: PCLKP, CRUN_INN, BBEP, CKILLXP, CKILLWP, CKILLMEMP.]
Figure 5.16 Data Bus Error (WB Stage)
[Waveform: PCLKP, CRUN_INN, BBEP, CKILLXP, CKILLWP, CKILLMEMP.]
An interesting case occurs when an IF Exception is signalled and CKILLMEMP has to be asserted twice. This situation happens when the previous instruction is an external memory transaction (LOAD, STORE, MTC, MFC). The CW400x deasserts CKILLMEMP during the memory transaction’s X2 Stage to allow the read or write to take place. In the next run cycle, the CW400x asserts the Kill Signals as usual (see Figure 5.17).
Figure 5.17 Multiple CKILLMEMP Assertion
[Waveform: PCLKP, CRUN_INN, EXCEPT_DETECT (internal; in this case an Instruction TLB Miss), CKILLXP, CKILLWP, CKILLMEMP; a Load or Store followed by the exception-causing instruction.]
5.2.5.3 Interrupt Processing
The CW400x has eight interrupt inputs (six external hardware pins, and two software bits in the Cause Register). When the CW400x detects an interrupt, the CW400x generates an exception and asserts the appropriate Kill (Instruction Invalidate) Signals. The CW400x always grants the interrupt except when the specific interrupt is disabled or when a higher priority exception occurs simultaneously.
In case of a simultaneous interrupt and a non-interrupt exception, the interrupt has priority over Instruction Address Error and Instruction TLB Miss. The other exceptions have priority over the interrupt during simultaneous exception signalling.
Even though an Address Error and an Interrupt can happen simultaneously, the interrupt has precedence. For interrupts, the CW400x asserts the Kill Signals and asserts the Interrupt Grant Signal, CINTGRP.
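The arbitration between simultaneous exceptions can be pictured as a lookup in the Table 5.5 ordering. The sketch below is a software model for illustration only, not a description of the hardware; the list simply copies the table rows, and the function name is hypothetical.

```python
# Rows of Table 5.5, highest priority first.
PRIORITY = [
    "Reset",
    "Trap, Overflow, Data Bus Error",
    "Data Address Error",
    "Data TLB Miss/TLB Miss User",
    "TLB Modify",
    "Instruction Bus Error",
    "SYSCALL/BREAK/TRAP/Reserved Instruction",
    "Coprocessor Unusable",
    "Interrupt",
    "Instruction Address Error",
    "Instruction TLB Miss/TLB Miss User",
]

def highest_priority(pending):
    """Return the exception that is serviced when several are signalled at once."""
    return min(pending, key=PRIORITY.index)
```

With this ordering an interrupt wins over an Instruction Address Error but loses to an overflow signalled in the same run cycle.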
Figures 5.18 and 5.19 show two cases of an external coprocessor, in this case a Floating Point Unit, asserting an interrupt. In Figure 5.18 the CW400x does not grant the interrupt, because a simultaneous exception (overflow) occurred in the Writeback Stage of the previous instruction. Since the overflow occurs in an instruction further along in the pipeline, it takes precedence over the external interrupt and is serviced accordingly.
Figure 5.18 External Coprocessor (FPU) Interrupt (Interrupt Not Taken)
[Waveform: PCLKP, CRUN_INN, EXCEPT_DETECT (internal; in this case an Arithmetic Overflow), BINTP3 (the FPU Interrupt), CINTGRP, CKILLXP, CKILLWP, CKILLMEMP.]
In Figure 5.19, two simultaneous exceptions occur: an Address Error in the IF Stage of a following instruction and an interrupt that occurs in the X Stage of the instruction. The interrupt takes precedence, since it occurs in an instruction that is further in the pipeline. The CW400x asserts CINTGRP to acknowledge the interrupt, and the appropriate signals are asserted and values written to the Exception Handling Registers to reflect the taken interrupt. Notice that CKILLMEMP is not asserted for interrupts. This is the only exception that does not cause CKILLMEMP to be asserted.
Figure 5.19 External Coprocessor (FPU) Interrupt (Interrupt Taken)
[Waveform: PCLKP, CRUN_INN, CADDR_ERRORP (address error during an Instruction Fetch), BINTP3 (the FPU Interrupt), CINTGRP, CKILLXP, CKILLWP, CKILLMEMP.]
Since interrupts can occur at any time, memory transactions may be erroneously killed. In the case of load scheduling, the load data can be serviced in any stage and is not killed by an interrupt since the load occurred many instructions before the interrupt was generated. Interrupts are not acknowledged during an instruction’s X2 Stage to prevent erroneous memory transaction invalidation (see Figure 5.14).
Figure 5.20 shows an interrupt being signalled during the Branch Likely Delay Slot Invalidation Cycle. The CW400x invalidates the instruction after a Branch Likely, if the branch conditions were not met. The Interrupt Exception will not be serviced until the X Stage of a valid instruction (the next instruction following the invalidated one in the delay slot).
Figure 5.20 Branch Likely Delay Slot Invalidation
[Waveform: PCLKP, CRUN_INN, BINTPx (x = 0, 1, 2, 3, 4, or 5), CINTGRP, CKILLXP, CKILLWP, CKILLMEMP; a BNEL Instruction followed by the invalidated instruction.]
5.3 Exception Description Details
This section describes in detail each CW400x exception and how software should handle it. TLB Exceptions are described in the MiniRISC Building Blocks Technical Manual.
5.3.1 Address Error Exception
5.3.1.1 Cause
The Address Error Exception occurs when the CW400x attempts to load, fetch, or store a word that is not aligned on a word boundary or attempts to load or store a halfword that is not aligned on a halfword boundary. The exception also occurs in user mode if a reference is made to an address whose most-significant bit is set, indicating a kernel mode address. This exception is not maskable.
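The two conditions above (misalignment, and a user-mode reference to an address with bit 31 set) can be expressed as a small predicate. This is a sketch for illustration; the function name and the convention of passing the access size in bytes are assumptions.

```python
def address_error(addr, size, user_mode):
    """Return True if a reference of `size` bytes (4 = word, 2 = halfword,
    1 = byte) at `addr` would raise an Address Error Exception."""
    misaligned = addr % size != 0
    # In user mode, an address with its most-significant bit set is a kernel address.
    kernel_address_from_user = user_mode and bool(addr & 0x80000000)
    return misaligned or kernel_address_from_user
```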
5.3.1.2 Handling
When an Address Error Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180). The CW400x sets the AdEL or AdES Exception Code in the Cause Register ExcCode Field to indicate whether the address error occurred during an instruction fetch or a load operation (AdEL) or a store operation (AdES). The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the instruction that caused the exception, unless the instruction is in a branch delay slot and the branch was taken. In that case, the EPC Register points to the branch instruction preceding the exception-causing instruction and the CW400x sets the BD Bit of the Cause Register.
If the system includes an MMU when this exception occurs, the BadVA Register contains the address that was either improperly aligned or that improperly addressed kernel data while in user mode.
5.3.1.3 Servicing
Kernel software should indicate a segmentation violation to the executing process. Such an error is usually fatal, although an alignment error might be handled by simulating the instruction that caused the error.
5.3.2 Breakpoint Exception
5.3.2.1 Cause
The Breakpoint Exception occurs when the CW400x executes the BREAK Instruction. This exception is not maskable.
5.3.2.2 Handling
When the Breakpoint Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the BP Code in the Cause Register ExcCode Field. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the BREAK Instruction that caused the exception, unless the instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction preceding the BREAK and the CW400x sets the BD Bit of the Cause Register.
5.3.2.3 Servicing
Kernel software should transfer control to the applicable system routine. Unused bits of the BREAK Instruction (Bits [25:6]) can be used to pass additional information. These bits can be examined by loading the contents of the instruction pointed at by the EPC Register. If the BD Bit is set, a value of four must be added to the contents of the EPC Register to locate the instruction.
To resume execution, the EPC Register must be changed so that the CW400x does not execute the BREAK Instruction again. A value of four must be added to the contents of the EPC Register before returning. If the BD Bit is set, the branch instruction must be interpreted in order to resume execution.
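A handler might locate the BREAK Instruction and extract its code field as sketched below. The function names are hypothetical; the bit positions are those given in the text (Bits [25:6]).

```python
def faulting_instruction_address(epc, bd):
    """Address of the exception-causing instruction: EPC, or EPC + 4 when the
    BD Bit is set (EPC then points at the preceding branch)."""
    return epc + 4 if bd else epc

def break_code(instr):
    """Extract the software-defined field, Bits [25:6], of a BREAK Instruction."""
    return (instr >> 6) & 0xFFFFF
```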
5.3.3 Bus Error Exception
5.3.3.1 Cause
The Bus Error Exception occurs when the external logic asserts the Bus Error Input, BBEP, to end an external memory transaction such as an instruction fetch or store operation. Events such as a bus time-out, backplane bus parity errors, and invalid physical memory addresses or access types should cause external logic to signal this exception. This exception is not maskable.
For store transactions, the delay caused by the write buffer prevents the exception from being synchronous with the instruction stream. When an error occurs for a scheduled load, the bus error is an asynchronous event.
The following information is BBCC specific.
The CW400x can handle bus errors precisely (immediate response), but the write buffer in the BBCC and load-scheduling support prevent it.
Except for stores and scheduled loads, the Bus Error Exception is considered synchronous. Stores are considered asynchronous because the store does not occur in its instruction's X2 Stage (since the store data goes through the write buffer). Scheduled loads are also considered asynchronous since they do not occur in the instruction's appropriate (X2) pipeline stage.
Bus errors for unscheduled loads and instruction fetches are both considered synchronous, so Data Bus Error (DBE) and Instruction Bus Error (IBE) Codes are assigned to the respective bus errors. For asynchronous bus errors, the CW400x may assign either the DBE Code or the IBE Code, since the scheduled load or buffered write can occur in any pipeline stage. If the scheduled load or buffered write occurs in another instruction's X2 Stage, the CW400x writes the DBE Code into the Cause Register; otherwise it writes the IBE Code.
5.3.3.2 Handling
When a Bus Error Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180). The CW400x sets the IBE or DBE Code in the Cause Register ExcCode Field to indicate whether the error occurred during an instruction fetch reference (IBE) or during a data load or store reference (DBE). The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the instruction that was executing when the Bus Error occurred, unless the instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction preceding the exception-causing instruction and the CW400x sets the BD Bit of the Cause Register.
5.3.3.3 Servicing
The physical address where the fault occurred can be computed from the information in the CP0 Registers:
♦ If the Cause Register Exception Code is set to IBE (showing an instruction fetch), the address is contained in the EPC Register.
♦ If the Cause Register Exception Code is set to DBE, a load or store instruction caused the exception. For load instructions, the address of the instruction that caused the exception is contained in the EPC Register (if the BD Bit of the Cause Register is set, add four to the contents of the EPC Register). The address of the load reference can then be obtained by interpreting the instruction.
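The two cases above can be sketched as follows for a synchronous bus error. The sketch assumes the standard MIPS load format (base register in Bits [25:21], signed 16-bit offset in Bits [15:0]) and computes the effective address before any address translation; `fetch_word` and `regs` are hypothetical stand-ins for the handler's access to memory and to the saved general registers.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    return value - 0x10000 if value & 0x8000 else value

def load_effective_address(instr, regs):
    """Effective address of a load: base register plus signed offset (assumed format)."""
    base = (instr >> 21) & 0x1F
    return (regs[base] + sign_extend16(instr & 0xFFFF)) & 0xFFFFFFFF

def bus_error_address(exc_code, epc, bd, fetch_word, regs):
    """Address that faulted, for exc_code 'IBE' or 'DBE' (load case only)."""
    if exc_code == "IBE":
        return epc                       # the instruction fetch itself faulted
    instr_addr = epc + 4 if bd else epc  # skip the branch when BD is set
    return load_effective_address(fetch_word(instr_addr), regs)
```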
5.3.4 Coprocessor Unusable Exception
5.3.4.1 Cause
The Coprocessor Unusable Exception occurs when an attempt is made to execute a coprocessor instruction in a corresponding coprocessor unit that has not been marked usable (the appropriate Cu Bit in the Status Register has not been set). For CP0 Instructions, this exception occurs when the unit has not been marked usable, and the process is executing in user mode. CP0 is always usable from kernel mode regardless of the setting of the Cu0 Bit in the Status Register. This exception is not maskable.
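The usability rule, including the kernel-mode special case for CP0, can be summarized in a short predicate. This is a sketch; the function name is hypothetical, and `cu` is assumed to be the four Cu Bits already extracted from the Status Register with bit n corresponding to Cun.

```python
def coprocessor_usable(cp, cu, user_mode):
    """Return True if an instruction for coprocessor `cp` (0..3) may execute.
    `cu` is the 4-bit Cu field, bit n = Cun (extraction from Status not shown)."""
    if cp == 0 and not user_mode:
        return True  # CP0 is always usable from kernel mode
    return bool((cu >> cp) & 1)
```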
5.3.4.2 Handling
When a Coprocessor Unusable Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the CpU Code in the Cause Register ExcCode Field. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
Only one coprocessor can fail at a time. The contents of the Cause Register CE (Coprocessor Error) Field show which of the four coprocessors (0, 1, 2, or 3) the CW400x referenced when the exception occurred.
The EPC Register points to the coprocessor instruction that caused the exception unless the instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction that preceded the coprocessor instruction and the CW400x sets the BD Bit of the Cause Register.
5.3.4.3 Servicing
Software can identify the coprocessor unit that was referenced by examining the contents of the Cause Register CE Field. If the process is entitled to access the coprocessor, kernel software marks the coprocessor usable and restores the corresponding user state to the coprocessor.
If the process is entitled to access the coprocessor, but the coprocessor is known not to exist or to have failed, the system could interpret the coprocessor instruction. If the BD Bit is set in the Cause Register, the branch instruction must be interpreted; then the coprocessor instruction could be emulated with the EPC Register advanced past the coprocessor instruction.
If the process is not entitled to access the coprocessor, the process executing at the time should be handed an illegal instruction/privileged instruction fault signal. Such an error is usually fatal.
5.3.5 Interrupt Exception
5.3.5.1 Cause
The Interrupt Exception occurs when one of eight interrupt conditions (software generates two, hardware generates six) is asserted. The significance of these interrupts is implementation dependent.
Each of the eight interrupts can be individually masked by clearing the corresponding bit in the Intr[5:0] or Sw[1:0] Field of the Status Register. All eight of the interrupts can be masked at once by clearing the IEc Bit in the Status Register.
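The combined effect of the individual mask bits and the global IEc Bit can be sketched as below. The function name and the packing of the eight conditions into one byte (hardware interrupts in bits 7:2, software interrupts in bits 1:0) are assumptions made for the example.

```python
def interrupt_enabled_and_pending(pending, mask, iec):
    """Return True if at least one pending interrupt is unmasked and IEc is set.
    `pending` and `mask` are 8-bit values: bits 7:2 hardware, bits 1:0 software
    (assumed packing)."""
    return bool(iec) and (pending & mask & 0xFF) != 0
```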
5.3.5.2 Handling
When an Interrupt Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the Int Code in the Cause Register ExcCode Field. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The IP Field in the Cause Register shows which of the six external interrupts are pending; the Sw[1:0] Field in the Cause Register shows which of the two software interrupts are pending. More than one interrupt can be pending at a time.
5.3.5.3 Servicing
If software generated the interrupt, it can clear the interrupt condition by setting the corresponding Cause Register Sw[1:0] Bit to zero.
If external hardware generated the interrupt, the interrupt condition is cleared by alleviating the condition that caused the assertion of the interrupt signal.
5.3.6 Overflow Exception
5.3.6.1 Cause
The Overflow Exception occurs when an ADD, ADDI, SUB, or SUBI Instruction results in a two’s complement overflow. This exception is not maskable.
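Two's complement overflow on 32-bit operands can be detected from the sign bits alone. The sketch below shows the usual test; the function names are hypothetical, and it models only the overflow condition, not the instruction's other effects.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

def add_overflows(a, b):
    """True if a + b overflows: operands share a sign and the result's sign differs."""
    a, b = a & MASK32, b & MASK32
    result = (a + b) & MASK32
    return bool(~(a ^ b) & (a ^ result) & SIGN)

def sub_overflows(a, b):
    """True if a - b overflows: operands differ in sign and the result's sign differs from a."""
    a, b = a & MASK32, b & MASK32
    result = (a - b) & MASK32
    return bool((a ^ b) & (a ^ result) & SIGN)
```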
5.3.6.2 Handling
When an Overflow Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the ExcCode of the Cause Register to Ovf. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the instruction that caused the exception, unless the instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction that preceded the exception-causing instruction and the CW400x sets the BD Bit of the Cause Register.
5.3.6.3 Servicing
Kernel software should indicate a floating-point exception or integer overflow error to the executing process. Such an error is usually fatal.
5.3.7 Reserved Instruction Exception
5.3.7.1 Cause
The Reserved Instruction Exception occurs when the CW400x executes an instruction whose major opcode (Bits [31:26]) is undefined or a Special Instruction whose minor opcode (Bits [5:0]) is undefined.
This exception provides a way to interpret instructions that might be added to or removed from the processor architecture. This exception is not maskable.
5.3.7.2 Handling
When a Reserved Instruction Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the RI Code of the Cause Register ExcCode Field. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the reserved instruction that caused the exception, unless the instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction that preceded the reserved instruction and the CW400x sets the BD Bit of the Cause Register.
5.3.7.3 Servicing
If instruction interpretation is not implemented, kernel software should indicate an illegal instruction/reserved operand fault to the executing process. Such an error is usually fatal.
An operating system can interpret the undefined instruction and pass control to a routine that implements the instruction in software. If the undefined instruction is in the branch delay slot, the routine that implements the instruction is responsible for simulating the branch instruction after the undefined instruction has been executed. Simulation of the Branch Instruction includes determining whether the conditions of the branch were met (which is determined by checking the BD Bit in the Cause Register) and then transferring control to the Branch Target Address (if required) or to the instruction following the delay slot if the branch is not taken. If the branch is not taken, the next instruction’s address is [EPC] + 4. If the branch is taken, the branch target address is calculated as shown in Figure 5.21.
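The resume-address rule described above can be sketched as follows. The sketch assumes the branch offset is the signed 16-bit field in Bits [15:0] of the branch instruction (the standard MIPS branch format); the function name is hypothetical.

```python
def resume_address(epc, bd, branch_instr=0):
    """Address at which execution continues after the emulated instruction.
    bd clear: next instruction is [EPC] + 4.
    bd set:   EPC points at the taken branch; target = ([EPC] + 4) + (offset * 4)."""
    if not bd:
        return (epc + 4) & 0xFFFFFFFF
    offset = branch_instr & 0xFFFF          # assumed: offset in Bits [15:0]
    if offset & 0x8000:
        offset -= 0x10000                   # sign-extend the 16-bit field
    return (epc + 4 + offset * 4) & 0xFFFFFFFF
```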
Figure 5.21 Branch Target Address Calculation
[Figure: the branch instruction is at [EPC], the delay slot at [EPC] + 4, and the next instruction at [EPC] + 8; the Branch Offset is applied from the delay slot.]
Target Address = ([EPC] + 4) + (offset * 4)
Note that the target address is relative to the address of the instruction in the delay slot, not the address of the branch instruction. Refer to the branch instruction descriptions for details on how branch target addresses are calculated.
5.3.8 Reset Exception
5.3.8.1 Cause
The Reset Exception occurs upon deassertion of the CW400x Reset Signal, BCPU_RESETN. This exception is not maskable.
5.3.8.2 Handling
When a Reset Exception occurs, the CW400x provides a Reset Exception Vector (0xBFC00000). The vector resides in the CW400x’s non-cacheable address space; therefore the hardware does not need to initialize the cache to handle this exception. The processor can fetch and execute instructions while the caches are in an undefined state.
The contents of all registers in the CW400x are undefined when the Reset Exception occurs, except that the Status Register KUc and IEc Bits are cleared to zero and the BEV Bit is set to one.
5.3.8.3 Servicing
The Reset Exception is serviced by initializing all processor registers, coprocessor registers, and the memory system. Typically, diagnostics would then be executed, and the operating system bootstrapped. The Reset Exception Vector is selected to appear within the non-cacheable, unmapped memory space of the machine so that instructions can be fetched and executed while the cache and the memory system are still in an undefined state.
5.3.9 System Call Exception
5.3.9.1 Cause
The System Call Exception occurs when the CW400x executes a SYSCALL Instruction. This exception is not maskable.
5.3.9.2 Handling
When the System Call Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the Sys Code in the Cause Register ExcCode Field. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the SYSCALL Instruction that caused the exception, unless the SYSCALL Instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction that preceded the SYSCALL Instruction and the CW400x sets the BD Bit of the Cause Register.
5.3.9.3 Servicing
The operating system transfers control to the applicable system routine. To resume execution, the EPC Register must be altered so that the SYSCALL Instruction does not execute again: a value of four is added to the EPC Register before returning. If the BD Bit in the Cause Register is set, the branch must be interpreted.
5.3.10 Trap Exception
5.3.10.1 Cause
The Trap Exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI, TGEUI, TLTI, TLTUI, TEQI, or TNEI Instruction results in a true condition. This exception is not maskable.
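The register forms of the trap conditions can be modelled as simple comparisons. The sketch assumes the conventional meanings of the mnemonics (signed comparison unless the name ends in U); the immediate forms would apply the same comparisons against a sign-extended immediate. All names other than the mnemonics are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x

TRAP_CONDITIONS = {
    "TGE":  lambda a, b: to_signed(a) >= to_signed(b),
    "TGEU": lambda a, b: (a & MASK32) >= (b & MASK32),
    "TLT":  lambda a, b: to_signed(a) < to_signed(b),
    "TLTU": lambda a, b: (a & MASK32) < (b & MASK32),
    "TEQ":  lambda a, b: (a & MASK32) == (b & MASK32),
    "TNE":  lambda a, b: (a & MASK32) != (b & MASK32),
}

def trap_taken(mnemonic, rs, rt):
    """Return True if the trap condition is met and a Trap Exception would occur."""
    return TRAP_CONDITIONS[mnemonic](rs, rt)
```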
5.3.10.2 Handling
When a Trap Exception occurs, the CW400x branches to the General Exception Vector (0x80000080 or 0xBFC00180) and sets the Tr Code in the Cause Register ExcCode Field. The CW400x saves the KUp, IEp, KUc, and IEc Bits of the Status Register into the KUo, IEo, KUp, and IEp Bits, respectively, and clears the KUc and IEc Bits.
The EPC Register points to the address of the Trap Instruction that caused the exception, unless the Trap Instruction is in a branch delay slot and the branch is taken. In that case, the EPC Register points to the branch instruction that preceded the Trap Instruction and the CW400x sets the BD Bit of the Cause Register.
5.3.10.3 Servicing
Kernel software should transfer control to the applicable system routine. To resume execution, the EPC Register must be altered so that the Trap does not execute again.
Chapter 6 Required External Modules
This chapter describes required external modules for the CW400x Microprocessor. Note that the MMU Stub is only required if there is no MMU attached to the CW400x Microprocessor Core.
Note that LSI Logic’s BBCC contains a BIU. In this document, references to the BIU usually also refer to the BBCC. References to the BBCC are usually specific to LSI Logic’s implementation of the BIU in the BBCC. (See the MiniRISC Building Blocks Technical Manual for more information about the BBCC.)
This chapter contains the following sections:
♦ Section 6.1, “Global Output Enable Module (GOE)”
♦ Section 6.2, “MMU Stub”
6.1 Global Output Enable Module (GOE)
Note: This section discusses Data Bus Methodology.
The CW400x needs the GOE external module because it does not internally arbitrate which module drives the DATAP[31:0] signals.
The GOE is an external module that customers should use to control the Data Bus, DATAP[31:0]. LSI Logic has made this an external module so that customers can easily customize the logic. Most customers should use the GOE as it is defined. However, this section contains a complete description of the module for those who choose to alter it.
6.1.1 Function
The GOE has three main functions:
1. The GOE provides the Output Enable Signals for all drivers (modules) on the Data Bus, DATAP[31:0]. A single Data Bus output enable decoder module is necessary because multiple decoders cause bus contention during scan (ATPG). An external, configurable GOE, properly designed, guarantees one Data Bus driver at all times.
2. The GOE provides the Run Enable Signals (CRUN_INN, BRUN_INN, MRUN_INN) for the CW400x and all other peripherals. The GOE combines all the Run Request Signals (BRUN_OUTP, CRUN_OUTP, MRUN_OUTP, GRUN_OUT1P, and GRUN_OUT2P) to create the Global Run Enable Signal, RUN_INN (see Figure 6.6). Since the RUN_INN Signal is so important to the system level critical path, it is important that extra logic is not implemented for nonexistent peripheral modules. As an external configurable module, the GOE can be optimized for this path.
3. The GOE provides the CW400x Pipeline Run Indicator Signal, CPIPE_RUNN. The GOE asserts CPIPE_RUNN during Pipeline Run Cycles and deasserts CPIPE_RUNN during Pipeline Stalls. The difference between CPIPE_RUNN and RUN_INN is that CPIPE_RUNN is deasserted during an X2 Cycle, since the pipeline is stalled. RUN_INN will be asserted during an X2 Cycle, since the X2 Stage is a bus cycle.
6.1.1.1 Output Enables
During scan (when GTEST_ENABLEP is asserted), the BIU deasserts the Cache Signals (BZ_IDDOEP, BZ_I1DOEP, BZ_IDT_OEN, and BZ_I1T_OEN) and the OCM Signal (BOCMOEN) so they do not cause a 3-state contention problem.
The GOE latches its inputs and every cycle performs a one-and-only-one decode to choose which module drives DATAP[31:0].
Figures 6.1 through 6.3 illustrate the development of the GOE.
Figure 6.1 shows a block diagram of a basic functional GOE design. All flip-flops are clocked by an ungated clock.
Figure 6.1 Basic Functional GOE Design Logic
[Block diagram labels: Class A Signal, Class B Signal, Class C Signal, RUN_INN, GSCAN_ENABLEP, Scan In, Scan Out, FD1 flip-flops, Decode, Output Enables.]
Table 6.1 shows the truth table that is implemented by the decode logic. The truth table must be a one-and-only-one decode for any arbitrary combination of inputs, since while scanning in data through these flip-flops, the flip-flops contain undefined values. This method guarantees that DATAP[31:0] will never be floating and will never have contention.
Table 6.1 Output Enable Decoding
Input columns, left to right:
   1  BBUS_STEALN [1]
   2  GTEST_ENABLEP [2]
   3  BB_SLVDOEN [1]
   4  CMEM_FETCHP [3]
   5  COP_DRIVEP [3]
   6  COPP[1:0] [3]
   7  COPEXISTP0 [3]
   8  COPEXISTP1 [3]
   9  COPEXISTP2 [3]
  10  COPEXISTP3 [3]
  11  MEARLYKS1P [3]
  12  BOCMEXISTP [3]
  13  CIP_DN [3]
  14  BS_DCENP [3]
  15  BS_ICENP [3]
Output columns, left to right:
   A  BOEN (BIU)
   B  Cache [4]
   C  COEN (CW400x)
   D  MOEN (MMU)
   E  COP1OEN (Cop1)
   F  COP2OEN (Cop2)
   G  COP3OEN (Cop3)
   H  BOCMOEN (OCM)
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 |  A  B  C  D  E  F  G  H | Condition
  0  1  X  X  X  X  X  X  X  X  X  X  X  X  X |  1  0  0  0  0  0  0  0 | No Cache During Test
  0  0  1  X  X  X  X  X  X  X  X  X  X  X  X |  1  0  0  0  0  0  0  0 | BIU Data Access
  0  0  0  X  X  X  X  X  X  X  X  X  X  X  X |  0  1  0  0  0  0  0  0 | Delayed Cache Access
  1  X  X  0  0  X  X  X  X  X  X  X  X  X  X |  0  0  1  0  0  0  0  0 | CW400x Store Access
  1  X  X  0  1  0  0  X  X  X  X  X  X  X  X |  0  0  1  0  0  0  0  0 | No Coprocessor 0
  1  X  X  0  1  0  1  X  X  X  X  X  X  X  X |  0  0  0  1  0  0  0  0 | Coprocessor 0 Read Access
  1  X  X  0  1  1  X  0  X  X  X  X  X  X  X |  0  0  1  0  0  0  0  0 | No Coprocessor 1
  1  X  X  0  1  1  X  1  X  X  X  X  X  X  X |  0  0  0  0  1  0  0  0 | Coprocessor 1 Read Access
  1  X  X  0  1  2  X  X  0  X  X  X  X  X  X |  0  0  1  0  0  0  0  0 | No Coprocessor 2
  1  X  X  0  1  2  X  X  1  X  X  X  X  X  X |  0  0  0  0  0  1  0  0 | Coprocessor 2 Read Access
  1  X  X  0  1  3  X  X  X  0  X  X  X  X  X |  0  0  1  0  0  0  0  0 | No Coprocessor 3
  1  X  X  0  1  3  X  X  X  1  X  X  X  X  X |  0  0  0  0  0  0  1  0 | Coprocessor 3 Read Access
  1  1  X  1  X  X  X  X  X  X  X  X  X  X  X |  1  0  0  0  0  0  0  0 | No OCM During Test
  1  0  X  1  X  X  X  X  X  X  1  1  X  X  X |  0  0  0  0  0  0  0  1 | OCM Read Access
  1  0  X  1  X  X  X  X  X  X  1  0  X  X  X |  1  0  0  0  0  0  0  0 | No OCM Present
  1  0  X  1  X  X  X  X  X  X  0  X  0  0  X |  1  0  0  0  0  0  0  0 | No Data Cache Present
  1  0  X  1  X  X  X  X  X  X  0  X  0  1  X |  0  1  0  0  0  0  0  0 | Data Cache Read Access
  1  0  X  1  X  X  X  X  X  X  0  X  1  X  0 |  1  0  0  0  0  0  0  0 | No Instruction Cache Present
  1  0  X  1  X  X  X  X  X  X  0  X  1  X  1 |  0  1  0  0  0  0  0  0 | Instruction Cache Read Access
1. Class B Signal
2. Class C Signal
3. Class A Signal
4. One and only one of BZ_IDDOEP, BZ_I1DOEP, BZ_IDT_OEN, or BZ_I1T_OEN.
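The decode in Table 6.1 can be sketched as a small software model. The following Python is only an illustration under stated assumptions, not LSI Logic's implementation: the function and argument names are hypothetical, a 1 in an output column is taken to mean "this module is enabled", and the column label BOEN is used as it appears in the table header. Because the function returns exactly one name, the model is one-and-only-one by construction for any input combination.

```python
# Hypothetical software model of the Table 6.1 decode (illustration only).
OUTPUTS = ("BOEN", "CACHE", "COEN", "MOEN",
           "COP1OEN", "COP2OEN", "COP3OEN", "BOCMOEN")


def decode_output_enable(bbus_stealn, gtest_enablep, bb_slvdoen, cmem_fetchp,
                         cop_drivep, copp, copexistp, mearlyks1p, bocmexistp,
                         cip_dn, bs_dcenp, bs_icenp):
    """Return the name of the single module allowed to drive DATAP[31:0].

    copp is the coprocessor number (0 to 3); copexistp is a sequence of
    four 0/1 values indexed by coprocessor number.
    """
    if bbus_stealn == 0:                      # BIU has stolen the bus
        if gtest_enablep:
            return "BOEN"                     # No Cache During Test
        # BIU Data Access, or Delayed Cache Access when BB_SLVDOEN is LOW
        return "BOEN" if bb_slvdoen else "CACHE"
    if not cmem_fetchp:                       # store or coprocessor transfer
        if not cop_drivep:
            return "COEN"                     # CW400x Store Access
        if not copexistp[copp]:
            return "COEN"                     # selected coprocessor absent
        return ("MOEN", "COP1OEN", "COP2OEN", "COP3OEN")[copp]
    if gtest_enablep:
        return "BOEN"                         # No OCM During Test
    if mearlyks1p:
        return "BOCMOEN" if bocmexistp else "BOEN"
    if cip_dn == 0:                           # data fetch
        return "CACHE" if bs_dcenp else "BOEN"
    return "CACHE" if bs_icenp else "BOEN"    # instruction fetch


def one_hot(selected):
    """Expand a module name into the eight output columns A to H."""
    return tuple(int(name == selected) for name in OUTPUTS)
```

For example, the "Coprocessor 1 Read Access" row corresponds to `decode_output_enable(1, 0, 0, 0, 1, 1, (0, 1, 0, 0), 0, 0, 0, 0, 0)`, which selects COP1OEN in this model.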
The only problem with the design in Figure 6.1 is that the output enables must have a very fast delay from the clock signals. Therefore, the logical thing to do is to move the decode logic in front of the flip-flops. This design latches all of the output enables, but presents a problem. If we move the logic in front of the scan chain muxes, the flip-flops will contain random values and so one-and-only-one decode cannot be guaranteed during scan.
To solve this problem, LSI Logic implemented the scheme in Figure 6.2. This design is functionally equivalent to Figure 6.1, but with improved timing.
Figure 6.2 Improved Timing GOE Design Logic
[Block diagram not reproduced. Labels: Class A Signal, Class B Signal, Class C Signal, RUN_INN, GSCAN_ENABLEP, Scan In, Scan Out, FD1 flip-flops, Decode, Output Enables.]
This design would solve all of these problems if not for RUN_INN. RUN_INN is a very late signal, so the design must be optimized to allow RUN_INN to be as late as possible. Therefore, Figure 6.3 shows the final design solution.
Figure 6.3 Final GOE Design Logic
This final version of the GOE has duplicated the decode logic as well as the scan mux. This allows us to pre-compute the output enables for both cases of RUN_INN LOW (0) and RUN_INN HIGH (1). Then we select the correct output enable, using a mux, when RUN_INN becomes valid. This circuit is functionally equivalent in every way to the original functional circuit (Figure 6.1).
To use ATPG software, the logic should be replaced with the original circuit (Figure 6.1), since the flip-flops are not strictly scannable (so ATPG would treat them as non-scanned flip-flops and lower the coverage).
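The idea of pre-computing both cases and selecting late can be illustrated with a short sketch. This is a generic model of the technique, not the circuit in Figure 6.3: `toy_decode` is a hypothetical stand-in for the real decode, and the assumption that RUN_INN LOW selects the newly arriving inputs (and RUN_INN HIGH the held ones) is ours, since the figure itself is not reproduced here.

```python
def toy_decode(inputs):
    """Hypothetical stand-in for the decode: any pure function of its inputs."""
    a, b = inputs
    return ("COEN", "BOEN", "CACHE", "MOEN")[2 * a + b]


def select_then_decode(held, new, run_inn):
    """Late signal first: choose the inputs with RUN_INN, then decode.

    The decode delay sits after RUN_INN, so RUN_INN must arrive early.
    """
    chosen = new if run_inn == 0 else held
    return toy_decode(chosen)


def decode_then_select(held, new, run_inn):
    """Duplicate the decode, compute both results, then mux on RUN_INN.

    Only the final mux depends on RUN_INN, so RUN_INN may arrive late.
    """
    oe_if_run = toy_decode(new)       # result if RUN_INN is LOW
    oe_if_stall = toy_decode(held)    # result if RUN_INN is HIGH
    return oe_if_run if run_inn == 0 else oe_if_stall
```

Both functions return the same value for every input combination, which mirrors the statement that the final circuit is functionally equivalent to the original one; the cost is a second copy of the decode logic.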
[Figure 6.3 block diagram not reproduced. Labels: Class A Signal, Class B Signal, Class C Signal, Decode (two copies), RUN_INN, GSCAN_ENABLEP, Scan In, Scan Out, FD1 flip-flops, Output Enables.]
6.1.1.2 Run Enables
All of the module Run Request Signals (BRUN_OUTP, CRUN_OUTP, MRUN_OUTP, GRUN_OUT1P, and GRUN_OUT2P) must be combined to form a Global Run Enable Signal (RUN_INN). The GOE generates three copies of this signal (BRUN_INN, CRUN_INN, and MRUN_INN) for the CW400x and all other peripherals (see Figure 6.4).
Figure 6.4 Creation of RUN_INN
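As a rough illustration of how the run requests combine, the sketch below models RUN_INN as a NAND of the active-HIGH Run Request Signals, which is how Section 6.2.3 describes the MRUN_INN source. The Python function names are hypothetical, and the model ignores timing and buffering.

```python
def run_inn(*run_requests):
    """NAND of active-HIGH run requests; the result is active LOW (0 = run)."""
    return 0 if all(run_requests) else 1


def goe_run_enables(brun_outp, crun_outp, mrun_outp, grun_out1p, grun_out2p):
    """Return the three copies of RUN_INN that the GOE distributes."""
    enable = run_inn(brun_outp, crun_outp, mrun_outp, grun_out1p, grun_out2p)
    return {"BRUN_INN": enable, "CRUN_INN": enable, "MRUN_INN": enable}
```

In this model a single module deasserting its request (driving it LOW) stalls every module. For a peripheral that does not exist, the corresponding input is simply left out of the NAND, in line with the advice not to build logic for nonexistent modules.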
6.1.1.3 Pipeline Run Indicator
All of the module Run Request Signals (BRUN_OUTP, CRUN_OUTP, MRUN_OUTP, GRUN_OUT1P, and GRUN_OUT2P) and CIP_DN must be combined to form the Pipeline Run Indicator Signal, CPIPE_RUNN (see Figure 6.5).
Figure 6.5 Creation of CPIPE_RUNN
6.1.2 Signals
This section describes the signals that comprise the bit-level interface of the GOE.
The signals are described in alphabetical order by mnemonic. Each signal definition contains the mnemonic and the full signal name. The mnemonics for signals that are active LOW end in an “N” and have an overbar, and the mnemonics for signals that are active HIGH end in a “P.”
In the descriptions that follow, the verb assert means to drive TRUE or active. The verb deassert means to drive FALSE or inactive.
6.1.2.1 Class A Signals
These signals are valid only on Bus Run Cycles (for a definition of Bus Run Cycles see Section 7.1.3, “Operation and Functional Waveforms”).
BOCMEXISTP  On-Chip Memory (OCM) Memory Present  Input
Asserting this signal indicates that the OCM is present. This signal is also an input to the OCM. The system designer ties this to power (HIGH) to indicate OCM present, and to ground (LOW) to indicate OCM not present.
BS_DCENP  Data Cache Enabled  Input
This signal is accessed through a bit in the BBCC Configuration Register, BS_CONFIGP0. Asserting this signal informs the GOE that the data cache is enabled.
BS_ICENP  Instruction Cache Enabled  Input
This signal is accessed through a bit in the BBCC Configuration Register, BS_CONFIGP4. Asserting this signal informs the GOE that the instruction cache is enabled.
CIP_DN  CW400x Instruction/Data Indication  Input
This signal qualifies the type of memory fetch when a memory fetch is indicated by CMEM_FETCHP. The CW400x drives this signal HIGH to indicate that it is performing an instruction fetch. The CW400x drives this signal LOW to indicate that it is performing a data fetch.
CMEM_FETCHP  CW400x Memory Fetch Request  Input
The CW400x asserts this signal HIGH to indicate that it is performing a memory fetch.
COP_DRIVEP  Coprocessor Drives Data Bus Indicator  Input
The CW400x asserts this signal HIGH to inform the GOE that a coprocessor should drive DATAP[31:0].
COPEXISTP[3:0]  Coprocessors Exist  Input
The coprocessors assert these signals to indicate to the GOE which coprocessors are present.
COPP[1:0]  Coprocessor Number  Input
Output from the CW400x. These signals from the core indicate to the GOE which coprocessor should drive DATAP[31:0].
MEARLYKS1P  Stub Early kseg1 Signal  Input
The MMU Stub asserts this signal HIGH to indicate that the virtual address is in kseg1. MEARLYKS1P is a combinational feed-through path based on the ADDRP[31:0] inputs. This signal is for devices that may require an early indication of the virtual memory area for a pending memory cycle. It provides access information before the rising edge of the clock beginning the bus cycle. This signal is an input to the Bus Interface Unit (BIU).
6.1.2.2 Class B Signals
These signals are valid at every clock cycle. Note that LSI Logic’s BBCC contains a BIU.
BB_SLVDOEN  BIU Bus Slave Drive Request  Input
The BIU asserts this signal LOW to inform the GOE that the BIU is a bus slave and that the external device is requesting a read access to the caches. This signal indicates that one of the cache RAMs will drive the bus starting at the rising edge of the next clock cycle.
BBUS_STEALN  BIU Bus Steal  Input
The BIU asserts this signal LOW to inform the GOE that the BIU will become the Data Bus Master starting at the rising edge of the next clock cycle.
6.1.2.3 Class C Signals
These signals do not need to be latched. They are static for the purposes of decode.
GTEST_ENABLEP  Test Enable  Input
Asserting this signal HIGH enables scan testing of the chip’s system logic. Note that this signal must always be asserted during a scan test. Note also that this signal is used raw (not latched at all). (For more information on scan testing see Section 8.2, “Scan Methodology”.)
GSCAN_ENABLEP  Scan Test Mode Enable  Input
Asserting this signal enables loading of the scan chain.
6.1.2.4 Run/Stall Signals
These signals control and indicate the run/stall state of the system.
CPIPE_RUNN  CW400x Pipeline Run Indicator  Output
The GOE asserts this signal LOW to inform the FlexLink Computational Unit that the core is in a pipeline run cycle. The GOE deasserts this signal HIGH to inform the Computational Unit that the core is in a pipeline stall cycle.
CRUN_INN  CW400x Run Enable  Output
The GOE asserts this signal LOW to enable the core to go on to the next run cycle. The GOE deasserts this signal HIGH to stall the core.
CRUN_OUTP  CW400x Run Request  Input
The core asserts this signal HIGH to request to the GOE that it go on to the next run cycle. The core deasserts this signal LOW to request stalling the pipeline.
BRUN_INN  BIU Run Enable  Output
The GOE asserts this signal LOW to enable the BIU to go on to the next run cycle. The GOE deasserts this signal HIGH to stall the BIU.
BRUN_OUTP  BIU Run Request  Input
The BIU asserts this signal HIGH to request to the GOE that it go on to the next run cycle. The BIU deasserts this signal LOW to request stalling the pipeline.
GRUN_OUT1P  General Device Run Request 1  Input
General Device 1 asserts this signal HIGH to request to the GOE that it go on to the next run cycle. General Device 1 deasserts this signal LOW to request stalling the pipeline.
GRUN_OUT2P  General Device Run Request 2  Input
General Device 2 asserts this signal HIGH to request to the GOE that it go on to the next run cycle. General Device 2 deasserts this signal LOW to request stalling the pipeline.
MRUN_INN  External Device Run Enable  Output
The GOE asserts this signal LOW to enable the MMU to go on to the next run cycle. The GOE deasserts this signal HIGH to stall the MMU.
MRUN_OUTP  MMU Run Request  Input
The MMU asserts this signal HIGH to request to the GOE that it go on to the next run cycle. The MMU deasserts this signal LOW to request stalling the pipeline.
6.1.2.5 GOE Output Enables
These signals are all valid every cycle and are designed to be hooked straight into the output enables (after a buffer) for various modules. Only one of these signals (including also the BBCC Output Enables) can be asserted (enabling the device) at a time.
COEN  CW400x Output Enable  Output
Input to the CW400x. The GOE asserts this signal to enable the core to drive data onto DATAP[31:0].
BIUOEN  BIU Output Enable  Output
Input to the BIU. The GOE asserts this signal to enable the BIU to drive data onto DATAP[31:0].
MOEN  MMU (COP0) Output Enable  Output
Input to the MMU. The GOE asserts this signal to enable the MMU to drive data onto DATAP[31:0].
COP1OEN  Coprocessor 1 Output Enable  Output
Input to Coprocessor 1 (FPU). The GOE asserts this signal to enable Coprocessor 1 (FPU) to drive data onto DATAP[31:0].
COP2OEN  Coprocessor 2 Output Enable  Output
Input to Coprocessor 2. The GOE asserts this signal to enable Coprocessor 2 to drive data onto DATAP[31:0].
COP3OEN  Coprocessor 3 Output Enable  Output
Input to Coprocessor 3. The GOE asserts this signal to enable Coprocessor 3 to drive data onto DATAP[31:0].
6.1.2.6 BBCC Output Enables
These signals are outputs from the BBCC, not the GOE (see Figure 6.6). Their operation is described here anyway, since they are part of the GOE function. The user must remember that the decodes for these signals are fixed and cannot be altered. These signals are all valid every cycle and are designed to be hooked straight into the input enables (after a buffer) for various modules. Only one of these signals (including also the GOE Output Enables) can be asserted (enabling the device) at a time. (See the MiniRISC Building Blocks Technical Manual for more information about the BBCC, the 3-state gates, and the Cache System.)
BZ_IDDOEP  I-Cache Set 0/D-Cache Data RAM Output Enable  Output
Input to the I-Cache Set 0/D-Cache Data RAM from the BBCC. The BBCC asserts this signal to enable the I-Cache Set 0/D-Cache Data RAM to drive data onto DATAP[31:0].
BZ_I1DOEP  I-Cache Set 1 Data RAM Output Enable  Output
Input to the I-Cache Set 1 Data RAM from the BBCC. The BBCC asserts this signal to enable the I-Cache Set 1 Data RAM to drive data onto DATAP[31:0].
BZ_IDT_OEN  I-Cache Set 0/D-Cache Tag RAM Output Enable  Output
Input to a set of 3-state gates from the BBCC. The BBCC asserts this signal to enable a set of 3-state gates to drive data from the I-Cache Set 0/D-Cache Tag RAM onto DATAP[31:0].
BZ_I1T_OEN  I-Cache Set 1 Tag RAM Output Enable  Output
Input to a set of 3-state gates from the BBCC. The BBCC asserts this signal to enable a set of 3-state gates to drive data from the I-Cache Set 1 Tag RAM onto DATAP[31:0].
BOCMOEN  On-Chip Memory (OCM) Output Enable  Output
Input to the OCM from the BBCC. The BBCC asserts this signal to enable the OCM to drive data onto DATAP[31:0].
6.1.3 Connecting to the CW400x and Building Blocks
Figure 6.6 shows how to attach the GOE to the CW400x and building blocks.
Figure 6.6 GOE Module Attachments
[Block diagram not reproduced. Blocks shown: GOE, CW400x, Coprocessor 1, Coprocessor 2, Coprocessor 3, MMU, BIU (BBCC), OCM, and Cache System. Signal labels shown: BRUN_INN, BRUN_OUTP, BIUOEN, BB_SLVDOEN, BBUS_STEALN, BS_CONFIGP0 (BS_DCENP), BS_CONFIGP4 (BS_ICENP), COEN, COP_DRIVEP, COPP[1:0], CRUN_OUTP, CRUN_INN, CMEM_FETCHP, CIP_DN, CPIPE_RUNN, COP1OEN, COP2OEN, COP3OEN, COPEXISTP0, COPEXISTP1, COPEXISTP2, COPEXISTP3, MOEN, MRUN_INN, MRUN_OUTP, MEARLYKS1P, BOCMOEN, BOCMEXISTP, BZ_IDDOEP, BZ_I1DOEP, BZ_IDT_OEN, BZ_I1T_OEN, GRUN_OUT1P, GRUN_OUT2P, GSCAN_ENABLEP, and GTEST_ENABLEP. The figure marks some inputs as tied to ground or power.]
6.2 MMU Stub
The MMU Stub is required as an external module if there is no MMU attached to the CW400x. Both the MMU and MMU Stub latch and hold the address bus through stalls and also direct-map the kseg0 and kseg1 (kernel segments 0 and 1) virtual address space onto the first 512 Mbytes of physical address space.
The MiniRISC CW400x drives addresses onto the Address Bus, ADDRP[31:0]. Although CW400x-based systems do not require a full MMU, in most cases, some of the functions of the MMU are required for the system to maintain both MIPS compatibility and ease of design. LSI Logic provides the MMU Stub to perform these tasks.
6.2.1 Function
The MMU Stub takes addresses from the MiniRISC CW400x Core and registers them. This address registration is useful because the CW400x does not hold the address valid for an entire bus cycle, but rather, holds it only around the rising clock edge beginning the bus cycle. The MMU Stub registers the address for the entire bus cycle. In addition, it translates the addresses in kseg0 or kseg1 to the lower 512 Mbytes of Physical Memory, as in the MIPS standard memory map. This translation is the only address translation performed by the MMU Stub, and as such, is referred to as a hard map. Figure 6.7 shows the MMU Stub hard map.
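The hard map can be sketched in a few lines of Python. This is a combinational illustration only, under stated assumptions: it ignores the address registration and the MRUN_INN gating, the function name is hypothetical, and the returned keys borrow the names of the Stub outputs REG_ADDRP, KSEGCHECKP, and KSEG1_NOCACHEP described in Section 6.2.2. The segment boundaries are those of the MIPS standard memory map that the text refers to.

```python
PHYS_MASK = 0x1FFF_FFFF   # keeps the low 29 bits: the first 512 Mbytes


def mmu_stub_map(addr):
    """Model the MMU Stub hard map for one 32-bit address."""
    addr &= 0xFFFF_FFFF
    top3 = (addr >> 29) & 0x7          # address bits 31:29
    in_kseg0 = top3 == 0b100           # 0x8000 0000 up to 0xA000 0000
    in_kseg1 = top3 == 0b101           # 0xA000 0000 up to 0xC000 0000
    if in_kseg0 or in_kseg1:
        translated = addr & PHYS_MASK  # hard map onto low physical memory
    else:
        translated = addr              # kuseg and kseg2 pass through
    return {
        "REG_ADDRP": translated,
        "KSEGCHECKP": (addr >> 31) & 1,    # address is in kseg0, kseg1, or kseg2
        "KSEG1_NOCACHEP": int(in_kseg1),   # kseg1: do not cache
    }
```

For instance, `mmu_stub_map(0xBFC0_0000)` returns a translated address of 0x1FC0 0000 with both indicators set, while a kuseg address such as 0x0040 0000 passes through unchanged with both indicators clear.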
Figure 6.7 MMU Stub Hard Address Mapping (Hard Map)
[Figure not reproduced. It shows the 32-bit microprocessor address space (4 GB memory) and its mapping onto real memory:
  kseg2  starts at 0xC000 0000 (top of memory is 0xFFFF FFFF)  Kernel, Cached
  kseg1  starts at 0xA000 0000                                 Kernel, Uncached
  kseg0  starts at 0x8000 0000                                 Kernel, Cached
  kuseg  starts at 0x0000 0000                                 User, Cached
kseg0 and kseg1 both map onto the 512 Mbytes of real memory between 0x0000 0000 and 0x2000 0000.]
Since the output address is transformed when in kseg0 and kseg1, the MMU Stub generates signals indicating that the address from the CW400x was in kseg0/1/2 (KSEGCHECKP), and more precisely, if it was in kseg1 (KSEG1_NOCACHEP).
6.2.2 Signals
This section describes the signals that comprise the bit-level interface of the MMU Stub.
The signals are described in alphabetical order by mnemonic. Each signal definition contains the mnemonic and the full signal name. The mnemonics for signals that are active LOW end in an “N” and have an overbar, and the mnemonics for signals that are active HIGH end in a “P” (except LVIRADDR_31, LVIRADDR_30, and LVIRADDR_29).
In the descriptions that follow, the verb assert means to drive TRUE or active. The verb deassert means to drive FALSE or inactive.
ADDRP[31:0]  CW400x Address Bus  Input
The core drives these signals with the memory address.
GSCAN_ENABLEP  Scan Test Mode Enable  Input
System logic asserts this signal HIGH to enable scan testing.
GSCAN_INP  Scan Test Input  Input
This signal is the input to the internal scan chain.
GSCAN_OUTP  Scan Test Output  Output
This signal is the output from the internal scan chain.
KSEG1_NOCACHEP  kseg1 Indicator  Output
The MMU Stub asserts this signal HIGH to indicate that the Stub has detected an address from the CW400x in kseg1 space, indicating that this data transaction should not be cached.
KSEGCHECKP  kseg0/1/2 Indicator  Output
The MMU Stub asserts this signal HIGH to indicate that the MMU Stub has detected an address from the CW400x in kernel space. This signal is an input to the Bus Interface Unit.
LVIRADDR_29  CW400x Address Bit 29  Output
This signal is the registered version of the CW400x Address Bit 29, unmapped. This signal is an input to the Bus Interface Unit.
LVIRADDR_30  CW400x Address Bit 30  Output
This signal is the registered version of the CW400x Address Bit 30, unmapped. This signal is an input to the Bus Interface Unit.
LVIRADDR_31  CW400x Address Bit 31  Output
This signal is the registered version of the CW400x Address Bit 31, unmapped. This signal is an input to the Bus Interface Unit.
MEARLYKS1P  Stub Early kseg1 Signal  Output
The MMU Stub asserts this signal HIGH to indicate that the virtual address is in kseg1. MEARLYKS1P is a combinational feed-through path based on the ADDRP[31:0] inputs. This signal is for devices that may require an early indication of the virtual memory area for a pending memory cycle. It provides access information before the rising edge of the clock beginning the bus cycle. This signal is an input to the Bus Interface Unit.
MRUN_INN  External Device Run Signal  Input
Deasserting this signal HIGH indicates that some other module is stalling the CBus. The MMU Stub only clocks in new addresses during Bus Run Cycles.
PCLKP  System Clock  Input
This signal is the global clock input. It is used to clock elements in the MMU Stub.
REG_ADDRP[31:0]  CW400x Address Bus  Output
These signals are the registered, translated CW400x Address Bus. These signals are inputs to the Bus Interface Unit.
6.2.3 Connecting to the CW400x
To connect the MMU Stub to the CW400x correctly, connect the ADDRP[31:0] Inputs to the ADDRP[31:0] Outputs of the CW400x. Connect the MRUN_INN input to a gate that logically NANDs all Run Indication Signals in the system, so that the MRUN_INN signal is active only if all the Run Indication Signals are indicating run (this logic is found in the GOE Module). The MMU Scan Inputs (GSCAN_ENABLEP and GSCAN_INP) should be connected to the Global Scan Enable and the Scan Out of another module’s scan chain. The other signals connect to a BIU.
Figure 6.8 shows a block diagram of the logical I/O connections for the MMU Stub.
Figure 6.8 MMU Stub Attachments
[Block diagram not reproduced. It shows the MMU Stub with the following connections:
  From the CW400x: ADDRP[31:0]
  To the BIU: KSEG1_NOCACHEP, KSEGCHECKP, MEARLYKS1P, LVIRADDR_29, LVIRADDR_30, LVIRADDR_31, REG_ADDRP[31:0]
  MRUN_INN: NAND of all Run Indication Signals, from the GOE
  PCLKP: System Clock
  GSCAN_ENABLEP: Global Scan Enable
  GSCAN_INP: Scan Chain Test Output from another module
  GSCAN_OUTP: Scan Chain Test Input to another module]
Chapter 7
Interfaces
This chapter describes the interfaces for the CW400x Microprocessor. It contains the following sections:
♦ Section 7.1, “CBus Interface”
♦ Section 7.2, “FlexLink Interface”
7.1 CBus Interface
The CBus Interface is the main link between the CW400x Microprocessor and logic, such as an MMU, BIU (Bus Interface Unit), Cache, and Coprocessors. The BIU is external to the core (see Figure 1.1). The user must either create a BIU according to the information in this section, or use LSI Logic’s BBCC.
7.1.1 Bus Stealing
To allow the BIU to implement instruction streaming and load scheduling efficiently, the BIU can assert BBUS_STEALN to steal Data Bus (DATAP[31:0]) cycles away from the CW400x. When BBUS_STEALN is asserted, any module which is driving DATAP[31:0] must release it. The BIU will then be guaranteed that it can drive DATAP[31:0] without contention. The BIU can then do an operation such as block refill, insert data for a load, DMA transfers, or cache snooping.
If a cycle is stolen in an X2 Stage, the CW400x stalls to guarantee that the last X2 Cycle will not be stolen for Stores, MTCz, MFCz, CTCz, and CFCz Instructions. This simplifies the coprocessor’s interface design.
7.1.2 Interface Signals
The CW400x CBus Interface consists of the signals shown in Table 7.1. Signal direction is relative to the CW400x. For more detail on these signals see Chapter 3.
Table 7.1 CW400x CBus Interface Signals
  Signal          Definition                           I/O
  ADDRP[31:0]     Address Bus                          Output
  BBEP            BIU Bus Error                        Input
  BBIG_ENDIANP    Big Endian Select                    Input
  BDRDYP          BIU Load Data Ready                  Input
  BIRDYP          BIU Instruction Data Ready           Input
  CADDR_ERRORP    Memory Address Error                 Output
  CBYTEP[3:0]     Byte Enables                         Output
  CIP_DN          CW400x Instruction/Data Indication   Output
  CKILLMEMP       Kill Memory Transaction              Output
  CMEM_FETCHP     CW400x Memory Fetch Request          Output
  COEN            CW400x Output Enable                 Input
  CRUN_INN        CW400x Run Enable                    Input
  CRUN_OUTP       CW400x Run Request                   Output
  CSTOREP         CW400x Store to Memory Request       Output
  DATAP[31:0]     CW400x Data Bus                      Bidirectional
7.1.3 Operation and Functional Waveforms
CW400x Microprocessor transactions occur during Bus Run Cycles. Asserting CRUN_INN causes the following clock cycle to be a Bus Run Cycle. The states of CIP_DN, CMEM_FETCHP, and CSTOREP in the cycle before the Bus Run Cycle specify what type of transaction will occur during the Bus Run Cycle.
7.1.3.1 Instruction Fetches
Instruction Fetch Protocol Rules:
1. The CW400x asserts CMEM_FETCHP before the rising edge of the Bus Run Cycle to initiate a fetch request from memory. CMEM_FETCHP is only valid during Bus Run Cycles.
2. The CW400x drives CIP_DN HIGH before the rising edge of the Bus Run Cycle to initiate an instruction transfer. CIP_DN is only valid during Bus Run Cycles.
3. The CW400x drives the address of the instruction to be fetched on ADDRP[31:0] before the rising edge of the Bus Run Cycle. ADDRP[31:0] is only valid during the Bus Run Cycle. It is not held during stall cycles. If the system designer needs to store the address externally during stalls, he or she must either use the MMU Stub or attach external flip-flops that are clocked during the beginning of the Bus Run Cycle.
4. The BIU must drive DATAP[31:0] with the requested instruction and assert BIRDYP to tell the CW400x that valid data is on the data bus. If the instruction cannot be provided by the end of the Bus Run Cycle, the CW400x deasserts CRUN_OUTP to stall the pipe. Once the data is valid on the bus, the BIU must assert BIRDYP.
5. The CW400x asserts CKILLMEMP if the outstanding instruction request must be killed (a TLB miss or an address error for example). The BIU may assert BIRDYP in the same cycle as CKILLMEMP but must not assert BIRDYP in the following cycles of the instruction request. CKILLMEMP will only be asserted during the Bus Run Cycles.
6. Upon a bus error, the BIU must assert BBEP and BIRDYP.
Figure 7.1 shows four instruction fetches.
Figure 7.1 Instruction Fetch Examples 1
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CRUN_OUTP, CIP_DN, ADDRP[31:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BIRDYP, BBUS_STEALN, and BBEP, across Bus Run and Bus Stall cycles for fetches 1-IF through 4-IF.]
1. Instruction fetch with an instruction cache hit.
2. Instruction fetch with an instruction cache miss.
3. Instruction fetch with an instruction cache miss and some other external stall.
4. Instruction fetch with an instruction bus error.
Figure 7.2 shows four more instruction fetches.
Figure 7.2 Instruction Fetch Example 2
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CRUN_OUTP, CIP_DN, ADDRP[31:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BIRDYP, BBUS_STEALN, and CKILLMEMP, across Bus Run and Bus Stall cycles for fetches 1-IF through 4-IF.]
1. Instruction fetch with an instruction cache hit followed by a bus steal.
2. Instruction fetch which is killed (TLB miss or address error).
3. Instruction fetch with an instruction cache miss and a bus steal.
4. Instruction fetch with an instruction cache miss, instruction on bus during steal cycle and external stall. Note that the last cycle of 4-IF is a stall cycle even though CRUN_OUTP is HIGH because there is an external stall request present (CRUN_INN HIGH).
7.1.3.2 Data Loads
Data Load Protocol Rules:
1. The CW400x asserts CMEM_FETCHP before the rising edge of the X2 Stage Bus Run Cycle to initiate a fetch request from memory. CMEM_FETCHP is only valid during Bus Run Cycles.
2. The CW400x drives CIP_DN LOW before the rising edge of the X2 Stage Bus Run Cycle to indicate a data transfer. CIP_DN is only valid during Bus Run Cycles.
3. During the first cycle of the X2 Stage, there is a Bus Run Cycle, but internally, the CW400x stalls the pipe.
4. The CW400x drives the address of the data to be fetched on ADDRP[31:0] before the rising edge of the Bus Run Cycle. ADDRP[31:0] is only valid during the Bus Run Cycle. It is not held during stall cycles.
5. The CW400x asserts CBYTEP[3:0] before the rising edge of the X2 Stage Bus Run Cycle to distinguish which bytes are to be fetched. CBYTEP[3:0] remains asserted until the end of the X2 Stage. Note that CBYTEP[3:0] is an X Stage signal, so it must not be used in the BIU instruction fetch logic.
6. The BIU drives DATAP[31:0] with the requested data and asserts BDRDYP.
Non-scheduleable Loads. If the BIU cannot provide the requested data by the end of the Bus Run Cycle and the load is not scheduleable (LWL and LWR Instructions), the CW400x continues to stall the pipe until BDRDYP is asserted. Note that the BIU is not required to use BBUS_STEALN to provide the requested data.
Scheduleable Loads. If the BIU cannot provide the requested data by the end of the Bus Run Cycle and the load is scheduleable, the CW400x releases the stall and continues the pipe. Once the data is ready, the BIU must assert BBUS_STEALN, drive the requested data onto DATAP[31:0], and assert BDRDYP. If the scheduled load has a data dependency or another load enters the X1 Stage, the CW400x stalls the pipe in the X1 Stage and waits for the BIU to provide the scheduled load data.
No Scheduling for Scheduleable Loads. If the BIU cannot provide the requested data by the end of the Bus Run Cycle and the load is scheduleable, the user may choose to not implement load scheduling by deasserting CRUN_INN (stalling the core) until BDRDYP is asserted. Not implementing load scheduling simplifies the BIU design.
7. The CW400x asserts CKILLMEMP if the outstanding data request must be killed (such as when a TLB miss or address error occurs). The BIU may assert BDRDYP in the same cycle as CKILLMEMP but must NOT assert BDRDYP in the following cycles of the data request. CKILLMEMP will only be asserted in the Bus Run Cycles.
8. Upon a bus error, the BIU must assert BBEP and BDRDYP.
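The three cases under rule 6 amount to a small decision table for what happens when the BIU cannot supply load data by the end of the Bus Run Cycle. The sketch below restates them in Python; the enum, the function, and the `biu_schedules_loads` flag are hypothetical names introduced for illustration, not part of the CW400x interface.

```python
from enum import Enum


class LoadMissAction(Enum):
    CORE_STALLS = "CW400x stalls the pipe until BDRDYP is asserted"
    SCHEDULED = ("CW400x continues the pipe; the BIU later asserts "
                 "BBUS_STEALN, drives DATAP[31:0], and asserts BDRDYP")
    BIU_STALLS_CORE = "BIU deasserts CRUN_INN until BDRDYP is asserted"


def load_miss_action(scheduleable, biu_schedules_loads=True):
    """Pick the behavior for a load whose data is not ready in time.

    scheduleable: False for non-scheduleable loads (LWL and LWR per rule 6).
    biu_schedules_loads: False if the BIU designer chose not to implement
    load scheduling.
    """
    if not scheduleable:
        return LoadMissAction.CORE_STALLS
    if biu_schedules_loads:
        return LoadMissAction.SCHEDULED
    return LoadMissAction.BIU_STALLS_CORE
```

A BIU that sets `biu_schedules_loads` to False trades some performance for the simpler design that the last case of rule 6 describes.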
Figure 7.3 shows three data loads.
Figure 7.3 Data Load Example 1
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CIP_DN, ADDRP[31:0], CBYTEP[3:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BDRDYP, BIRDYP, BBUS_STEALN, and CKILLMEMP, across the X1 and X2 Stages of loads 1 through 3.]
1. Data fetch with a data cache hit.
2. Data fetch with a data cache miss. The CW400x stalls because it is a non-scheduleable load.
3. Data fetch that is killed (TLB miss or address error). Note that BDRDYP is not asserted.
Figure 7.4 shows two more data loads.
Figure 7.4 Data Load Example 2
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CIP_DN, ADDRP[31:0], CBYTEP[3:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BDRDYP, BIRDYP, BBUS_STEALN, and CKILLMEMP, across the X1 and X2 Stages of loads 1 and 2.]
1a. Data fetch with a data cache miss and scheduled load.
1b. Scheduled load data fetched.
2. Data fetch (non-scheduleable) with a bus error. Note that BDRDYP is asserted for bus error.
Figure 7.5 shows a non-scheduleable data load with a data cache miss interrupted by a bus steal. Note that BBUS_STEALN was not asserted for a scheduled load or instruction refill.
Figure 7.5 Data Load Example 3
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CIP_DN, ADDRP[31:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BDRDYP, BIRDYP, BBUS_STEALN, and CBYTEP[3:0], across the X1 and X2 Stages of load 1.]
Figure 7.6 shows a previously scheduled data load forcing a stall in the X1 Stage of another load that has a data cache hit.
Figure 7.6 Data Load Example 4
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CIP_DN, ADDRP[31:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BDRDYP, BIRDYP, BBUS_STEALN, and CBYTEP[3:0], across the X1 and X2 Stages of load 2.]
7.1.3.3 Data Stores
Data Store Protocol Rules:
1. The CW400x asserts CSTOREP before the rising edge of the X2 Stage Bus Run Cycle to indicate a store request. CSTOREP is only valid during Bus Run Cycles. The CW400x will never assert CMEM_FETCHP and CSTOREP in the same Bus Run Cycle.
2. The CW400x drives CIP_DN LOW before the rising edge of the X2 Stage Bus Run Cycle to indicate a data transfer. CIP_DN is only valid during Bus Run Cycles.
3. The CW400x asserts the address of the data to be stored on ADDRP[31:0] before the rising edge of the X2 Stage Bus Run Cycle. The address is only valid during the Bus Run X2 Cycle. It is not held during stalls.
4. The CW400x asserts CBYTEP[3:0] before the rising edge of the X2 Stage Bus Run Cycle to indicate which bytes are to be stored. CBYTEP[3:0] continues to be asserted until the end of the X2 Stage.
5. During the first cycle of the X2 Stage, there is a Bus Run Cycle, but internally, the CW400x stalls the pipe.
6. The BIU must assert COEN during the X2 Stage. The CW400x drives the requested data onto DATAP[31:0] in the X2 Stage Bus Run Cycle and following stall cycles.
7. When the BIU asserts BBUS_STEALN, it must also deassert COEN during the following cycle to 3-state DATAP[31:0]. After BBUS_STEALN is deasserted, the CW400x stalls the pipe for one more cycle to guarantee that the CW400x will drive DATAP[31:0] with the store data in the last X2 Cycle. If BBUS_STEALN is asserted during the X1 Stage, the CW400x continues into the X2 Stage but does not drive DATAP[31:0]. To simplify the design process, designers may choose to use the existing external module, the GOE, which contains the logic for controlling COEN as described above.
8. The BIU must deassert CRUN_INN to stall the pipe if one non-stolen X2 Cycle is not sufficient (such as a store miss) and register the address if needed (the MMU or MMU Stub may already do this).
9. The CW400x asserts CKILLMEMP if the outstanding data store must be killed (due to a TLB miss or address error for example). CKILLMEMP will only be asserted in the Bus Run Cycles.
10. During stalls, the CW400x continues to drive DATAP[31:0] until the end of the X2 Stage.
11. An external stall in the X1 Stage prevents the CW400x from entering the X2 Stage.
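Taken together, the fetch, load, and store rules in Section 7.1.3 imply that the transaction type can be read from CIP_DN, CMEM_FETCHP, and CSTOREP. The sketch below is one way to express that in Python. The function name and the returned strings are hypothetical, and the treatment of combinations the rules do not describe (raising an error) is an assumption made for the example.

```python
def classify_transaction(cip_dn, cmem_fetchp, cstorep):
    """Classify the upcoming Bus Run Cycle from three CBus signals.

    All three signals are sampled in the cycle before the Bus Run Cycle.
    """
    if cmem_fetchp and cstorep:
        # Store rule 1: the CW400x never asserts both in the same cycle.
        raise ValueError("CMEM_FETCHP and CSTOREP asserted together")
    if cmem_fetchp:
        # CIP_DN HIGH qualifies the fetch as an instruction fetch,
        # CIP_DN LOW as a data fetch.
        return "instruction fetch" if cip_dn else "data load"
    if cstorep:
        if cip_dn:
            # Store rule 2 says CIP_DN is LOW for stores; anything else
            # is outside what the rules describe.
            raise ValueError("store with CIP_DN HIGH is not described")
        return "data store"
    return "no memory transaction"
```

A BIU model could call this once per cycle in which CRUN_INN is asserted and dispatch on the result.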
Figure 7.7 shows two examples of data stores.
Figure 7.7 Data Store Example 1
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CIP_DN, ADDRP[31:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BBUS_STEALN, COEN, and CBYTEP[3:0], across the X1 and X2 Stages of stores 1 and 2.]
1. Store data cache hit (no external stall).
2. Store data cache miss (external stall).
Figure 7.8 shows two more examples of data stores.
Figure 7.8 Data Store Examples 2
[Timing diagram not reproduced. Signals traced: PCLKP, CRUN_INN, CIP_DN, ADDRP[31:0], CMEM_FETCHP, CSTOREP, DATAP[31:0], BBUS_STEALN, COEN, and CBYTEP[3:0], across the X1 and X2 Stages of stores 1 and 2.]
1. Store data cache hit interrupted by BBUS_STEALN (the CW400x stalls the pipe to guarantee last X2 Cycle).
2. Store data cache miss interrupted by BBUS_STEALN.
7.2 FlexLink Interface
The FlexLink Interface allows users to implement extended instructions and insert extra hardware to speed up existing arithmetic functions. This flexibility enables system designers to optimize system performance and minimize silicon area. The hardware that is attached to the FlexLink Interface is referred to as a Computational Unit in the rest of this document.
Example applications: users can use the FlexLink Interface to connect a high-performance multiply-accumulate unit, such as LSI Logic’s MDU (see the MiniRISC Building Blocks Technical Manual), a Fast Fourier Transform (FFT) engine, or a leading-one detector in order to accelerate certain computational routines for DSP applications.
7-14 Interfaces
A Computational Unit (CU) defines and decodes its own instructions. It may obtain its source operands from either its own register file, the CW400x Register File, or the instruction’s immediate field. At the end of the operation, it writes the result back to the CW400x Register File. Alternatively, it can write the result back to its own register file. This is particularly useful in multicycle operations since the CW400x does not need to be stalled to wait for the result.
7.2.1 Interface Signals
The CW400x FlexLink Interface consists of the signals shown in Tables 7.2 and 7.3. Table 7.2 shows the signals that interface with the CW400x Core (I/O direction relative to the CW400x Core). Table 7.3 shows additional signals that interface with the CU (I/O direction relative to the system logic). For more detail on these signals see Chapter 3.
Table 7.2 CW400x FlexLink Interface Signals
Signal            Definition                               I/O (1)
ASELP             Computational Unit Select                Input
ASTALLP           Computational Unit Stall Request         Input
AXBUSP[31:0]      Computational Unit Result Bus            Input
CIR_BOTP[5:0]     Instruction Register Bottom Six Bits     Output
CIR_TOPP[5:0]     Instruction Register Top Six Bits        Output
CKILLXP           Kill Instruction in Execute Stage        Output
CRSP[31:0]        CW400x Source Register (rs) Bus          Output
CRTP[31:0]        CW400x Source Register (rt) Bus          Output
CRX_VALIDN        Register Buses Valid                     Output
1. Input to CW400x from CU. Output from CW400x to CU.
Table 7.3 System Logic FlexLink Interface Signals
Signal            Definition                I/O (1)
BCPU_RESETN       CW400x Reset              Output
CRUN_INN (2)      CW400x Run Enable         Output
GSCAN_ENABLEP     Scan Test Mode Enable     Output
GSCAN_INP         Scan Test Input           Output
GSCAN_OUTP        Scan Test Output          Input
PCLKP             System Clock              Output
1. Input to system logic from CU. Output from system logic to CU.
2. This signal is used by the FlexLink module to determine when the system is stalling. If, for some reason, the FlexLink module needs to differentiate between pipeline stall cycles and bus stall cycles, the CPIPE_RUNN signal may be substituted for this signal. For most systems these two signals could be used interchangeably.
7.2.2 Computational Unit Instructions
The Computational Unit (CU) Instructions can use any of the available opcodes shown in Figure 7.9. The CU can support up to a maximum of 60 additional instructions: 22 I-type and 38 R-type.
Figure 7.9 Opcodes
(Two 8 x 8 opcode maps. The I-Type map is indexed by CIRP[31:29] and CIRP[28:26]; the R-Type map is indexed by CIRP[5:3] and CIRP[2:0]. Each index runs from 0 to 7.)
KEY
- Required for R-Type instructions.
- Reserved MIPS-II CW400x Instruction.
- Available to be used by the Computational Unit (CU).
- Unimplemented MIPS2 instruction (available to be used by the CU; however, if the CU uses it to implement a user-defined instruction, users should make sure that the unimplemented MIPS2 instruction is not in the instruction stream, otherwise the CU will mishandle that MIPS2 Instruction as another user-defined instruction).
- Unimplemented MIPS1 MULT, DIV, MFHI/LO, MTHI/LO Instructions (available to the CU, but recommended to be used for MULT, DIV, MFHI/LO, MTHI/LO Instructions; same reason as above).
7.2.2.1 R-Type CU Instructions
Figure 7.10 shows the format of R-Type Instructions. The CW400x passes the instruction bits that contain the opcode (Bits [31:26] and Bits [5:0] of the instruction) to the CU on CIR_TOPP[5:0] and CIR_BOTP[5:0]. At the same time, the CW400x also delivers the rs and rt Source Registers on CRSP[31:0] and CRTP[31:0]. At the end of the operation, the CW400x gets the result from the CU through AXBUSP[31:0], and writes it back to the rd Destination Register.
If the instruction’s rd Field (Bits [15:11]) = 00000 (binary), the CW400x will not write the CU’s result back into the CW400x register.
Figure 7.10 R-Type Arithmetic (Extended) Instruction
Bits     31-26   25-21   20-16   15-11   10-6   5-0
Field    0       rs      rt      rd      0      op
0    Zeroes [31:26]. All six bits must be zero.
rs   Register File Operand Address [25:21]. Five-bit source register specifier.
rt   Register File Operand Address [20:16]. Five-bit source register.
rd   Register File Destination Address [15:11]. Five-bit destination register specifier.
0    Zeroes [10:6]. All five bits must be zero.
op   Instruction Code [5:0]. Six-bit opcode.
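The R-Type field layout can be illustrated with a short decoder. This is a minimal sketch for illustration only: the function name decode_r_type and the dictionary keys are hypothetical, and the opcode value used in the example is arbitrary rather than a claim about which opcodes Figure 7.9 marks as available.

```python
def decode_r_type(word):
    """Split a 32-bit R-Type instruction word into the Figure 7.10 fields."""
    word &= 0xFFFFFFFF
    fields = {
        "top": (word >> 26) & 0x3F,   # Bits [31:26], presented on CIR_TOPP[5:0]
        "rs": (word >> 21) & 0x1F,    # Bits [25:21]
        "rt": (word >> 16) & 0x1F,    # Bits [20:16]
        "rd": (word >> 11) & 0x1F,    # Bits [15:11]
        "zero": (word >> 6) & 0x1F,   # Bits [10:6]
        "op": word & 0x3F,            # Bits [5:0], presented on CIR_BOTP[5:0]
    }
    # Both zero fields must be zero for a well-formed R-Type instruction.
    fields["well_formed"] = fields["top"] == 0 and fields["zero"] == 0
    # An rd field of 00000 means no writeback to the CW400x register.
    fields["writes_cw400x_register"] = fields["rd"] != 0
    return fields


# Example word: rs = 3, rt = 4, rd = 5, op = 0x1C (arbitrary illustrative opcode).
example = decode_r_type((3 << 21) | (4 << 16) | (5 << 11) | 0x1C)
```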
7.2.2.2 I-Type CU Instruction
Figure 7.11 shows the format of I-Type Instructions. The CW400x passes the instruction bits that contain the opcode (Bits [31:26] of the instruction) to the CU on CIR_TOPP[5:0]. At the same time, the CW400x also delivers the rs Source Register and the sign-extended immediate on CRSP[31:0] and CRTP[31:0] respectively. At the end of the operation, the CW400x gets the result from the CU through AXBUSP[31:0], and writes it back to the rt Destination Register. If the CU wants to have the result written back to its own register instead of the CW400x’s, the instruction should have Bits [20:16] = 0.
If the instruction’s rt Field (Bits [20:16]) = 00000 (binary), the CW400x will not write the CU’s result back into the CW400x Register.
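The I-Type operand delivery, including the sign extension of the immediate, can be sketched as follows. The function name decode_i_type and the dictionary keys are hypothetical, and the opcode in the example is an arbitrary value chosen only to exercise the code.

```python
def decode_i_type(word):
    """Split a 32-bit I-Type instruction word and sign-extend its immediate."""
    word &= 0xFFFFFFFF
    imm16 = word & 0xFFFF
    # Sign-extend the 16-bit immediate to 32 bits, as delivered on CRTP[31:0].
    imm32 = (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16
    rt = (word >> 16) & 0x1F
    return {
        "op": (word >> 26) & 0x3F,    # Bits [31:26], presented on CIR_TOPP[5:0]
        "rs": (word >> 21) & 0x1F,    # Bits [25:21]
        "rt": rt,                     # Bits [20:16], destination register
        "crtp": imm32,                # sign-extended immediate
        # An rt field of 00000 means no writeback to the CW400x Register.
        "writes_cw400x_register": rt != 0,
    }


# Example word: op = 0x1E (arbitrary), rs = 2, rt = 0, immediate = 0xFFFE (-2).
example_i = decode_i_type((0x1E << 26) | (2 << 21) | 0xFFFE)
```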
Figure 7.11 I-Type Arithmetic (Extended) Instruction
Bits     31-26   25-21   20-16   15-0
Field    op      rs      rt      immediate
op          Instruction Code [31:26]. Six-bit opcode.
rs          Register File Operand Address [25:21]. Five-bit source register specifier.
rt          Register File Destination Address [20:16]. Five-bit destination register.
immediate   16-bit Immediate [15:0]. The CW400x sign-extends this value to 32 bits and passes it to CRTP[31:0].
7.2.3 Operation and Functional Waveforms
A CU can implement single or multicycle instructions (instructions which write results back to the CU Registers or the CW400x Registers). The following text describes a general mechanism for instruction handling.
As soon as the CU decodes a valid instruction, it must assert ASELP to prevent the CW400x from signaling a Reserved Instruction Exception. (The CU must continue to assert ASELP during stalls.) It can then start the operation by using the operands from CRSP[31:0] and CRTP[31:0]. The CU has to make sure that CRX_VALIDN is LOW by the end of that cycle. If CRX_VALIDN is HIGH, it means that the operands it has just obtained are not valid, and it should obtain the operands again in the next cycle, and restart the operation. The CW400x guarantees at least one X Cycle where CRX_VALIDN is LOW. If saving power matters more than performance, the CU can check that CRX_VALIDN is LOW before loading the operands off CRSP[31:0] and CRTP[31:0] to start the operation. Depending on where the writeback is, one of the following things will happen:
♦ If the instruction is one which writes the result back to the CW400x’s Register, the CU has to assert ASTALLP to stall the CW400x until the operation is done. The CU deasserts ASTALLP in the same cycle the CU puts the result onto AXBUSP[31:0]. If, for any reason, the CW400x is still stalling after ASTALLP is deasserted (indicated by a HIGH CRUN_INN), the CU can still keep ASTALLP deasserted, but has to make sure that a valid result is on AXBUSP[31:0] in the last cycle before the CW400x goes on to the next run cycle. In order to achieve this, the CU can drive AXBUSP[31:0] the whole time the CW400x is stalling. This whole period, from when the CU decodes a valid instruction until the CU finishes driving AXBUSP[31:0], is considered to be an Extended X Stage of the instruction. If CKILLXP is asserted during this extended X Stage, the CU should kill the instruction, and the CW400x will make sure that no writeback happens in the end.
♦ For the case where the CU writes the result back to its own registers, the CU does not assert ASTALLP, and the CW400x can continue with the next instruction. When the operation is finished, the CU writes the result back to its own registers. Note that in this case, since the CU does not stall the CW400x until the operation is complete, the next run cycle after the CU decodes the instruction is no longer considered to be the X Stage of the instruction. It is the X Stage of the next instruction. The CU can ignore any assertion of CKILLXP or CRUN_INN after it passes the X Stage of an instruction. On the other hand, if the CW400x tries to read the result from the CU Register before it is ready, the CU must stall the Read Instruction by asserting ASTALLP until the result is ready.
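The first case above (writeback to the CW400x Register) can be traced with a small behavioral model. This is a sketch under stated assumptions rather than a description of any real CU: the function name run_cu is hypothetical, the model assumes the CU keeps ASTALLP asserted in a cycle where CRX_VALIDN is HIGH because the operation is not yet done, and it ignores extra CW400x stalls signaled on CRUN_INN.

```python
def run_cu(latency, crx_validn, kill_at=None):
    """Trace a CU operation that writes its result to the CW400x Register.

    latency    -- cycles the operation needs once valid operands are seen
    crx_validn -- per-cycle CRX_VALIDN levels (0 means operands are valid)
    kill_at    -- cycle in which CKILLXP is asserted, or None
    Returns (trace, killed); each trace entry is
    (cycle, ASTALLP level, result valid on AXBUSP[31:0]).
    """
    trace = []
    remaining = latency
    for cycle, validn in enumerate(crx_validn):
        if cycle == kill_at:
            # CKILLXP in the Extended X Stage: the CU abandons the instruction.
            return trace, True
        if validn == 1:
            # Operands were not valid: restart the operation next cycle.
            remaining = latency
            trace.append((cycle, 1, False))
            continue
        remaining -= 1
        done = remaining == 0
        # ASTALLP is deasserted in the same cycle the result is driven.
        trace.append((cycle, 0 if done else 1, done))
        if done:
            break
    return trace, False


# Two-cycle operation whose first X cycle has CRX_VALIDN HIGH.
two_cycle_trace, two_cycle_killed = run_cu(2, [1, 0, 0])
```

In this model a single-cycle operation with valid operands never asserts ASTALLP, which matches the single-cycle case described in the text.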
7.2.3.1 Single-cycle Operations
Figure 7.12 shows a typical Computational Unit single-cycle operation that writes its result into the CW400x CPU Register.
Figure 7.12 Computational Unit Write to CW400x CPU Register
(Timing diagram MD95.165. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Bus traces are labeled Instruction and Data.)
Figure 7.13 shows a Computational Unit single-cycle operation that is killed by CKILLXP.
Figure 7.13 Computational Unit Single-Cycle Killed by CKILLXP
(Timing diagram MD95.166. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; CKILLXP; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Bus traces are labeled Instruction and Data.)
Figure 7.14 shows a Computational Unit single-cycle operation that is stalled by the CW400x in its X Stage, and then killed by CKILLXP in its Extended X Stage.
Figure 7.14 Computational Unit Operation, Stalled and Killed
(Timing diagram MD95.167. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; CKILLXP; CRUN_INN; AXBUSP[31:0]; ASELP; ASTALLP. Bus traces are labeled Instruction and Data.)
7.2.3.2 Multicycle Operation - Result to CW400x Register File
Figure 7.15 shows a two-cycle Computational Unit operation that writes the result back to the CW400x Register File.
Figure 7.15 Two-Cycle Computational Unit Operation (Example 1)
(Timing diagram MD95.168. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Bus traces are labeled Instruction and Data.)
Figure 7.16 shows a two-cycle Computational Unit operation that writes the result back to the CW400x Register File.
Note that in the first cycle of the X Stage, CRX_VALIDN is HIGH (deasserted), which indicates that data on CRSP[31:0] and CRTP[31:0] may not be valid. The CU may have started its two-cycle operation, but should not proceed. In the next cycle, the CU reads CRSP[31:0]/CRTP[31:0] again, and restarts its two-cycle operation. This time CRX_VALIDN goes LOW, which means that CRSP[31:0]/CRTP[31:0] are valid; therefore, the CU can let its two-cycle operation proceed. The CU can count on the fact that after CRX_VALIDN becomes LOW, it will stay LOW until the end of the current X Stage. Therefore, the CU can latch in the operands from CRSP[31:0] and CRTP[31:0] when it sees CRX_VALIDN is LOW at the rising clock edge, and use the latched operands for the rest of the multicycle operation. Alternatively, the CU can also choose not to latch in the operands, and depend on CRSP[31:0] and CRTP[31:0] being held for the whole X Stage.
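The operand-latching option described above can be sketched as a scan over per-cycle samples taken at the rising clock edge. The function name latch_operands and the sample tuple layout are hypothetical; the values are illustrative only.

```python
def latch_operands(samples):
    """Return (cycle, rs_value, rt_value) for the first sample in which
    CRX_VALIDN is LOW, or None if no valid cycle is seen.

    samples -- iterable of (CRX_VALIDN level, CRSP value, CRTP value),
               one entry per rising clock edge in the X Stage
    """
    for cycle, (crx_validn, crsp, crtp) in enumerate(samples):
        if crx_validn == 0:
            # Operands are valid: latch them and use the latched copies
            # for the remainder of the multicycle operation.
            return cycle, crsp, crtp
    return None


# The first cycle carries stale data; the second cycle is valid.
latched = latch_operands([(1, 0xDEAD, 0xBEEF), (0, 0x10, 0x20), (0, 0x10, 0x20)])
```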
Figure 7.16 Two-Cycle Computational Unit Operation (Example 2)
(Timing diagram MD95.169. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Bus traces are labeled Instruction and Data; the interval labeled "Two-cycle CU Operation" is marked.)
Figure 7.17 shows a 3-cycle Computational Unit operation which attempts to write the result back to the CW400x Register File, but is killed by CKILLXP.
Note that during Cycle 3, although ASTALLP is asserted, CRUN_INN is deasserted. This is because the CW400x ignores the CU data dependency stall request when it sees CKILLXP asserted. If the CU wants to stall the CW400x regardless of CKILLXP, it can output a separate stall signal to the CW400x GOE Module (refer to Section 6.1, “Global Output Enable Module (GOE)”) which ORs all modules’ stall signals to generate a global stall signal.
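The two stall paths in the note can be contrasted in a short sketch. Both function names are hypothetical, signals are modeled as booleans meaning "asserted", and the GOE is reduced to the single OR operation the text describes; its other logic is not modeled.

```python
def astallp_honored(astallp, ckillxp):
    """The CW400x ignores the CU stall request while CKILLXP is asserted."""
    return astallp and not ckillxp


def goe_global_stall(module_stalls):
    """OR together the separate stall signals routed to the GOE."""
    return any(module_stalls)


# Cycle 3 of Figure 7.17: ASTALLP asserted, but CKILLXP overrides it.
cycle3_stalls = astallp_honored(True, True)
# A separate CU stall signal sent through the GOE still stalls the system.
goe_stalls = goe_global_stall([False, True, False])
```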
Figure 7.17 Three-Cycle Computational Unit Operation
(Timing diagram MD95.170, cycles numbered 1 to 3. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; CKILLXP; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Bus traces are labeled Instruction and Data.)
Figure 7.18 shows a two-cycle Computational Unit operation which attempts to write the result back to the CW400x Register File, but is stalled by the CW400x for an extra cycle, and then killed by CKILLXP.
Figure 7.18 Stalled Two-Cycle Computational Unit Operation
(Timing diagram MD95.171. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; CKILLXP; CRUN_INN; AXBUSP[31:0]; ASELP; ASTALLP. Marked intervals: Instruction's X Stage, CU Stall Cycle, CPU Extra Stall Cycle. Bus traces are labeled Instruction and Data.)
7.2.3.3 Multicycle Operation - Result Back to Own Register File
Figure 7.19 shows a two-cycle Computational Unit operation which writes the result back to its own register file.
Figure 7.19 Two-Cycle CU Operation with Writeback (Example 1)
(Timing diagram MD95.172. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Instruction trace labels: CU Instruction, Move from CU Instruction. Marked event: CU Writes Result Back to its Own Registers.)
Figure 7.20 shows a two-cycle Computational Unit operation which writes the result back to its own register file.
CKILLXP can no longer kill the operation after it passes its X Stage.
Figure 7.20 Two-Cycle CU Operation with Writeback (Example 2)
(Timing diagram MD95.173. Signals shown: PCLKP; CIR_TOPP[5:0], CIR_BOTP[5:0]; CRSP[31:0], CRTP[31:0]; CRX_VALIDN; CKILLXP; ASELP; AXBUSP[31:0]; ASTALLP; CRUN_INN. Marked intervals and events: Instruction's X Stage; CU Writes Result Back to its Own Registers.)
Chapter 8
Methodologies and Layout Guidelines
This chapter describes methodologies and layout guidelines for the CW400x Microprocessor. It contains the following sections:
♦ Section 8.1, “Clocking Methodology”
♦ Section 8.2, “Scan Methodology”
♦ Section 8.3, “Layout Guidelines”
For Data Bus Methodology see Section 6.1, “Global Output Enable Module (GOE).”
8.1 Clocking Methodology
This section describes the clocking methodology used in the CW400x Core, the MMU and the MDU building blocks. Users may consider this a guideline for how to handle the clock when designing their own building blocks.
LSI Logic recommends a two-level clock distribution network for chips which use the CW400x (see Figure 8.1).
Figure 8.1 Two-level Clock Distribution Network
8.1.1 Duty Cycle
The duty cycle of clock oscillators varies in different IC and board designs, and when the global clock passes through the clock tree to local registers, it goes through cells which have different rising and falling delays. As a result, the clock duty cycle may vary from 30% to 70%. In order to make them easier to use, the CW400x, MMU, and MDU are designed to work with such a varying clock duty cycle by using only one edge of the clock.
8.1.2 Local Clock Buffers
Inside each core or building block, the global clock is buffered locally before being used, and the clock buffers are in separate modules (cw400x_gck* modules described in Section 8.1.3, “Gated Clocks”) within the building blocks. This methodology allows the designer to specify that the clock buffer modules not be touched by the synthesis tool, and thereby minimize clock skews and ramp times.
(Figure 8.1 labels: Board Level Clock; Chip That Includes the CW400x Core; CW400x; Building Block 1; Building Block 2. Figure note: Use wire length to control the clock skew between different blocks on the die; a balanced clock tree can give a clock skew of less than 0.2 ns between building blocks in a nominal environment.)
Table 8.1 shows the driver types and module names LSI Logic uses for different loadings.
Table 8.1 Driver Type and Module Name
In a nominal environment, the CW400x and MDU have a typical clock driver delay of 0.7 ns (from the clock input through the local buffers to individual flip-flops or latches, including the wire delay) while the MMU has a typical clock driver delay of 0.6 ns. Clock skews are also limited to 0.1 ns inside the CW400x, MMU, and MDU.
8.1.3 Gated Clocks
The CW400x, the MMU, and the MDU use gated clocks to save power. These building blocks use the cw400x_gckand2x (x = l, a, b, c) gated clock buffer modules which contain the logic shown in Figure 8.2. This logic guarantees that the signal which is to be gated with the clock is stable over the high phase of the clock. Having the logic in a separate module ensures that the synthesis tool does not improperly optimize the logic.
Figure 8.2 Gated Clock Logic
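The behavior described for the gated clock logic can be modeled in a few lines. This sketch is inferred from the labels in Figure 8.2 (an OR of the gate signal with GSCAN_ENABLEP, a latch, and an AND with the clock) and assumes the latch is transparent while the clock is low; it is not taken from the cw400x_gckand2x netlist, and the function name gated_clock is hypothetical.

```python
def gated_clock(clock, gate, scan_enable):
    """Return the gated clock for per-step clock, gate, and GSCAN_ENABLEP levels."""
    q = 0  # latch output, assumed to start low
    out = []
    for clk, g, se in zip(clock, gate, scan_enable):
        if clk == 0:
            # Latch is transparent during the low phase of the clock.
            q = g | se
        # During the high phase q holds its value, so the AND gate sees a
        # stable enable for the whole high phase.
        out.append(clk & q)
    return out


clock = [0, 1, 0, 1, 0, 1]
gate = [1, 0, 0, 1, 0, 0]
# The gate changes while the clock is high, but only the value sampled in
# the preceding low phase reaches the AND gate.
functional = gated_clock(clock, gate, [0] * 6)
# With GSCAN_ENABLEP asserted the clock always passes through.
scan_mode = gated_clock(clock, gate, [1] * 6)
```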
8.1.4 Delayed Clocks
Delay cells are used to delay the clocks for the CW400x’s Register File and, in the MDU, for latching operands. LSI Logic has manually checked the delayed clocks after layout to make sure that the clocks have the correct timing.
Driver Type   Module Name        Load Range (Standard Loads)
and2l         cw400x_gckand2l    2.3 - 4.8
and2a         cw400x_gckand2a    4.5 - 9.8
and2b         cw400x_gckand2b    9.0 - 19.7
and2c         cw400x_gckand2c    14.4 - 30.8
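The load ranges of adjacent drivers overlap, so a given load can sometimes be served by two driver types. The lookup below is a minimal sketch that picks the smallest driver whose range covers the load; that tie-breaking policy and the function name pick_driver are assumptions for illustration, not an LSI Logic recommendation.

```python
# (module name, minimum load, maximum load) in standard loads, from Table 8.1
DRIVERS = [
    ("cw400x_gckand2l", 2.3, 4.8),
    ("cw400x_gckand2a", 4.5, 9.8),
    ("cw400x_gckand2b", 9.0, 19.7),
    ("cw400x_gckand2c", 14.4, 30.8),
]


def pick_driver(load):
    """Return the first (smallest) driver module whose range covers the load,
    or None if the load falls outside every range in the table."""
    for name, low, high in DRIVERS:
        if low <= load <= high:
            return name
    return None


small = pick_driver(4.6)    # covered by both and2l and and2a
medium = pick_driver(12.0)
large = pick_driver(25.0)
```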
(Figure 8.2 labels: Clock; Gate Signal; GSCAN_ENABLEP; OR; Latch with pins D, GN, and Q; AND2x; Gated Clock.)
8.1.5 Hold Time Margin
To ensure that the core and building blocks are free of hold time problems, additional hold time margin (see Table 8.2) is guaranteed for all internal flip-flops. This additional hold time margin ensures a robust design that is immune from bad clock skew.
Table 8.2 Hold Time Margin
8.2 Scan Methodology
Users of the CW400x Core have two options for production testing. They can use the core as part of their chip-level full-scan chain or they can use the core with an Automatic Test Pattern Generation (ATPG) shell around it and LSI Logic guaranteed patterns.
This section describes how to perform tests for the CW400x using each of these methods. It is important that the customer follow all LSI Logic methodology recommendations for scan testing.
For this core, LSI Logic used Mentor Graphics DFT Tools: DFTAdvisor for scan insertion and rules checking and FastScan for ATPG. For most customers, it would be best to use the same toolset if possible, in order to avoid any roadblocks associated with using other tools.
Environment   Hold Time Margin
BCCOM         0.3 ns
NOM           0.5 ns
WCCOM         0.7 ns
8.2.1 Methodology
Figure 8.3 shows the generic flow which should be used for scan insertion and ATPG for the core. Each step is numbered in the diagram and explained in the text that follows it.
Figure 8.3 Methodology Flowchart
1. Start.
2. Design - Design includes all aspects of design, including RTL design, Synthesis, Timing Analysis, and so on. Layout is not included in this flow; it should be done after scan insertion.
3. Synthesize Netlist - Use a preliminary netlist in order to start scan insertion and ATPG. Remember that any resynthesis will require redoing the scan insertion and ATPG.
Note that inserting scan will probably increase the area and path delays, so layout and final timing analysis must be done after scan insertion.
4. Rules Checking - Rules checking includes all Design-for-Test rules checking. This may include checks from LSI Logic and/or from the ATPG Tool itself. LSI Logic supplies a dummy core netlist for running all rules checks. This netlist should prevent violations from occurring inside the core. This dummy netlist contains no gates, just IO pins, and a simple connection from scan test input to scan test output.
5. Pass Checks? - If rules checking passes, then insert scan. If rules are violated, fix them by changing the design.
6. Insert Scan - Perform scan chain synthesis.
7. Hook up Core - Once scan is inserted, manually hook up the core as needed to the scan chain. The core and all building blocks have full-scan inserted already.
8. Run ATPG - Either the customer or LSI Logic can run ATPG. (ATPG is described more fully in Sections 8.2.2 through 8.2.6.) LSI Logic grants access to all building block netlists. LSI Logic runs through ATPG for all modules to make sure that 99% coverage is achievable. LSI Logic does not guarantee fault coverage for building blocks; the customer must generate patterns for them.
9. Good Coverage? - If coverage is not good, change the way the ATPG tool is being utilized, change the scan-insertion scheme, or add control and/or observation points. If any of the parts of the design are untestable, change the design.
10. End - Once all of these aspects are fixed, fault coverage will reach an acceptable level. This level is usually 99% single stuck-at fault coverage. The design then proceeds to layout.
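The ten steps above form a loop with two decision points, which can be sketched as plain control flow. This is a simplified illustration: the function name run_scan_flow and its callback parameters are hypothetical stand-ins for the real tool runs, and every failed check is modeled as a return to the design step, although step 9 also allows changes that do not require a redesign.

```python
def run_scan_flow(rules_pass, run_atpg, target=0.99, max_passes=10):
    """Walk the scan-insertion and ATPG flow and return the steps taken.

    rules_pass -- callable returning True when rules checking passes (step 5)
    run_atpg   -- callable returning the achieved fault coverage, 0.0 to 1.0
    target     -- acceptable coverage (the text cites 99% as usual)
    """
    steps = []
    for _ in range(max_passes):
        steps += ["design", "synthesize netlist", "rules checking"]
        if not rules_pass():
            continue  # step 5, NO: fix the design and check again
        steps += ["insert scan", "hook up core", "run ATPG"]
        if run_atpg() >= target:
            steps.append("end (to layout)")  # step 9, YES
            return steps
        # step 9, NO: adjust ATPG usage, the scan scheme, or the design
    raise RuntimeError("flow did not reach the coverage target")


# Rules checking fails once, then passes; ATPG reaches 99.5% coverage.
rule_results = iter([False, True])
steps_taken = run_scan_flow(lambda: next(rule_results), lambda: 0.995)
```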
8.2.2 Regeneration (Recommended Methodology)
8.2.2.1 Overview
This method uses the scan chain inside of the core as part of the overall scan chain. ATPG patterns are completely regenerated for the core. The advantage of this flow is that it saves area and performance associated with an ATPG shell which otherwise would need to be placed around the core. The disadvantage is that the ATPG vectors must be regenerated for each customer design. Because of this, LSI Logic cannot guarantee fault coverage inside of the core with this methodology.
With this core, Mentor FastScan must be used for ATPG. This does not prevent the customer from doing preliminary ATPG with a different tool. But, in order to get the highest coverage, the advanced features of FastScan are needed. Note that we did not use any scan-sequential patterns.
8.2.2.2 Methodology
In order to use this flow, the customer must be using full-scan in the logic outside of the core. When hooking up the core to the scan chain, the customer can choose to hook everything up into a single chain or multiple chains. As long as the LSI Logic scan methodology is followed, there should be no problem.
In order to run preliminary ATPG, the customer must use the dummy core netlist. The customer will not have access to the internal core netlist, and so cannot do ATPG for this block. LSI Logic Field Coreware Engineers (FCEs) will generate ATPG patterns using Mentor FastScan.
8.2.3 Core ATPG Shell
8.2.3.1 Overview
In this method, the customer uses pre-generated core patterns with an ATPG shell placed around the core. The advantage of this method is that the customer does not need to regenerate core patterns and is guaranteed greater than 99% coverage for the core. The disadvantages are the added area and delays associated with the ATPG shell. Also, the core scan test input and scan test output pins must be IO pins of the chip, and so may need special attention.
8.2.3.2 ATPG Shell
The ATPG shell enables all core inputs to be controllable and all outputs to be observable. During normal operation, it does not affect the functionality of the I/Os, but, when GTEST_ENABLEP is asserted, the inputs are driven by scannable flip-flops. The outputs are clocked into scannable flip-flops as well (Figure 8.4 and Figure 8.5). Note that the flip-flop for inputs can be shared with the one for outputs, since inputs only use the Q Pin and outputs only use the D Pin.
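The input and output wrappers of the shell can be modeled per pin. This is a behavioral sketch based on the description above and on the labels of Figures 8.4 and 8.5, not on the cw400x_ccpu_scan_shell netlist; the class name ShellPin and its methods are hypothetical, and scan shifting through the flip-flop is not modeled.

```python
class ShellPin:
    """One scannable flip-flop of the ATPG shell, usable for an input or an output."""

    def __init__(self):
        self.q = 0  # flip-flop state (Q Pin)

    def core_input(self, pin_value, gtest_enablep):
        """Input path: a mux selects the flip-flop's Q when GTEST_ENABLEP is
        asserted, and the chip-level pin value otherwise."""
        return self.q if gtest_enablep else pin_value

    def capture_output(self, core_output):
        """Output path: on a PCLKP edge the core output is clocked into D."""
        self.q = core_output


pin = ShellPin()
pin.q = 1                                   # value previously scanned in
normal_mode = pin.core_input(0, gtest_enablep=False)
test_mode = pin.core_input(0, gtest_enablep=True)
pin.capture_output(0)                       # observe a core output
```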
Figure 8.4 Input Pin Schematic for ATPG Shell
(Schematic labels: cw400x_ccpu_scan_shell Module; cw400x_ccpu Module; Input Pin; two-input mux with inputs 0 and 1 selected by GTEST_ENABLEP; flip-flop with D and Q pins clocked by PCLKP.)
Figure 8.5 Output Pin Schematic for ATPG Shell
(Schematic labels: cw400x_ccpu_scan_shell Module; cw400x_ccpu Module; Output Pin; flip-flop with D and Q pins clocked by PCLKP.)
Bidirectional I/Os are inherently observable and controllable without additional logic, since they are both inputs and outputs (Figure 8.6). COEN should be asserted during scan testing of the CW400x.
Figure 8.6 Bidirectional Pin Schematic for ATPG Shell
To use the ATPG Shell, the customer instantiates it in place of the CW400x Core. It has exactly the same I/O pins as the core itself. The shell module instantiates the core inside of it.
8.2.3.3 Methodology
In order to use this flow, the customer must bring the core scan pins (GSCAN_INP, GSCAN_OUTP, GSCAN_ENABLEP, and GTEST_ENABLEP) out to the chip level. The customer can use any testing methodology outside of the core.
The customer can run preliminary ATPG without the core, since it should not impact the outside logic anyway. In this scenario, the customer will have some test patterns and the core will also have test patterns.
8.2.4 CW400x ATPG Guidelines
The CW400x is a special case because it contains a RAM (the Register File). This RAM is isolated by scannable flip-flops, so it should not give too much trouble. It does require that a functional pattern be run in order to test the register file itself. Furthermore, this register file test sequence is needed to fully test the paths into and out of the register file. The CW400x logic needs these patterns to get above 99% coverage.
The datapath has scan inserted manually. It uses an optimized scan structure that takes advantage of existing routes. This saves area, but the control must be such that certain signals are held a certain way during scan. The control modules have scan inserted by DFTAdvisor.
(Figure 8.6 labels: cw400x_ccpu_scan_shell Module; cw400x_ccpu Module; Bidirectional Pin.)
8.2.5 MMU ATPG Guidelines
The MMU must have scan inserted by DFTAdvisor. Be careful to add proper buffering to GSCAN_INP, GSCAN_OUTP, and GSCAN_ENABLEP.
The MMU is connected to the Data Bus, so care must be taken since this is a 3-stateable bus. Furthermore, the MMU contains a RAM, RRTLB1. This RAM is not isolated. Since the customer will have to run certain patterns through this RAM in order to test it, we have counted these simple patterns in the logic coverage. They are needed to test all paths to and from the MMU. By doing this, we believe the MMU will achieve greater than 99% coverage. The customer should be able to run ATPG on the MMU, although it will just show decreased coverage due to the RAM being unknown.
8.2.6 MDU ATPG Guidelines
In the MDU, the datapath has scan inserted manually in the structural design. The cw400x_amdu_ctrl module must have scan inserted by DFTAdvisor. Then, these two chains must be stitched together.
You must convert the flip-flop that is clocked by the delayed clock to a buffer for the purposes of ATPG. It cannot be scanned because of the delayed clock, but it will always clock in the value just calculated so this is functionally equivalent for ATPG.
Be careful to add proper buffering to GSCAN_INP, GSCAN_OUTP, and GSCAN_ENABLEP after scan insertion.
Other than these things, the MDU is straightforward: no RAMs, no 3-states. It should get very high coverage.
8.3 Layout Guidelines
The performance of the CW400x Microprocessor Core, and the ease with which it is laid out, is dependent on the placement of the CW400x and the associated building blocks on the chip. This chapter discusses the connections between these modules, and gives suggestions as to how to lay out the CW400x Microprocessor Core and its associated building blocks.
8.3.1 Hardmac I/O Placement
In order to understand how the modules should be placed relative to each other, it is important to know the locations of the interfaces on the hardmacs. Three modules are provided as hardmacs: the CW400x Microprocessor, the BBCC, and the MDU. Although the orientation of these hardmacs can be rotated and flipped, this chapter refers to the hardmacs in the orientation shown in Figures 8.7 through 8.9.
8.3.1.1 CW400x
Figure 8.7 shows a diagram of the CW400x Microprocessor Hardmac. Notice that the Data Bus can be accessed from both the left and right sides of the hardmac. This layout helps avoid routing the Data Bus around the hardmac. In most cases, only the left Data Bus pins will be used, since this is the side with the control pins.
Figure 8.7 CW400x Hardmac
(Diagram MD95.155. Labeled pin groups: Interrupts; Coprocessor Condition Bits; Data Output Enable; MMU Exceptions; CBus Controls; CU Instruction Bus; Data Bus (DATAP[31:0]); Address (ADDRP[31:0])/CU Register File Buses/CU Result Bus.)
8.3.1.2 BBCC
Figure 8.8 shows a diagram of the BBCC Hardmac. This hardmac also has pins for the Data Bus on both sides of the hardmac in order to improve the routing of the Data Bus.
Figure 8.8 BBCC Hardmac
(Diagram MD95.156. Labeled pin groups: B-Bus Address; B-Bus Data; B-Bus Controls; CBus Controls; Configuration Register; Cache RAM Controls; Tag; Tag for Matching; Index; Data Bus (DATAP[31:0]); Data Output Enable; Mapped Address (MADDROUTP[31:2]); Address; ADDRP[14:2]; WB Control; WB Address; WB Data.)
8.3.1.3 MDU
Figure 8.9 shows a diagram of the MDU Hardmac.
Figure 8.9 MDU Hardmac
(Diagram MD95.157. Labeled pin groups: Instruction Bus; Register File Buses/Result Bus.)
8.3.2 Data Bus
The routing of the Data Bus on the chip is very important. The Data Bus goes to many modules: the CW400x, the BBCC, the MMU, the coprocessors, the cache RAMs, and the write buffer. Because of this, the loading of the Data Bus can become quite high. Having excessive loading on this bus can cause problems since it is a 3-state bus, and the 3-state drivers may be slow in driving it. In the layout of the chip, the Data Bus should be kept as short as possible.
8.3.3 CW400x Placement
The CW400x Microprocessor Hardmac is designed with almost all of its pins on the left and bottom sides (the exception is the Data Bus, which is on both the left and right sides). Since the right side of the hardmac carries no pins other than the duplicate Data Bus pins, the CW400x can be easily placed with its right side against the edge of the chip (shown as (a) in Figure 8.10). The top of the CW400x can also be placed against an edge of the chip (placing the CW400x in a corner, which is shown as (b)). The bottom of the CW400x can be placed near an edge of the chip if the design does not contain an MMU or Computational Unit (CU) (shown as (c)).
Figure 8.10 CW400x Placement Example
(Diagram MD95.158. Three panels showing the CW400x within a chip outline: (a) Chip, (b) Chip, (c) Chip with no MMU or CU.)
8.3.4 BBCC Placement
The BBCC is designed to be placed very close to the CW400x, on its left side, as shown in Figure 8.11. The Data Bus pins of the CW400x and the BBCC should be exactly aligned in order to obtain the best routing of the Data Bus. Aligning the Data Bus pins causes the power buses in the CW400x and BBCC to also align.
While the CW400x and BBCC should be placed close together, enough room should be left for the signals that need to be routed between the CW400x and the BBCC. These signals include the MMU Exception Signals (if an MMU is in the design), and the CU Instruction Bus (if a CU is in the design). The Data Bus may also need to be routed between the CW400x and the BBCC in some instances. Figure 8.11 shows the suggested BBCC placement.
Figure 8.11 BBCC Suggested Placement
(Diagram MD95.159. Blocks: BBCC, CW400x. Labeled connections: Data Bus; Power; CBus Controls on each block.)
8.3.5 Computational Unit Placement
The Computational Unit module (for example, the MDU Building Block) should be placed below the CW400x, as shown in Figure 8.12.
Figure 8.12 Computational Unit Suggested Placement
(Diagram MD95.160. Blocks: CW400x above MDU. CW400x pin groups: CU Instruction Bus; CU Register File Buses/CU Result Bus. MDU pin groups: Instruction Bus; Register File Buses/Result Bus.)
8.3.6 MMU Placement
The MMU Building Block may consist of an MMU or the MMU Stub. It is not a hardmac. The MMU should also be placed below the CW400x (as shown in Figure 8.13). If both a CU and a real MMU exist in the design, then the layout should accommodate both as well as possible. Most likely, placing the MMU to the left of the CU would result in better routing (as shown in Figure 8.14).
Figure 8.13 MMU (with no CU) Suggested Placement
(Diagram MD95.161-1. Blocks: BBCC, CW400x, MMU. Labeled connections: MMU Exceptions; ADDRP[31:0]; MADDROUTP[31:2]; ADDRP[14:2].)
Figure 8.14 MMU (with CU) Suggested Placement
(Diagram MD95.161-2. Blocks: BBCC, CW400x, MMU, CU. Labeled connections: MMU Exceptions; MADDROUTP[31:2]; ADDRP[14:2]; Address; ADDRP[31:0]/CU Register File Buses/CU Results; CU Instruction Bus. CU pin groups: Instruction Bus; Register File Buses/Result Bus.)
8.3.7 Coprocessor Placement
The interface between a coprocessor and the CW400x consists of the Data Bus and the CBus controls. The coprocessor can be placed to the left of the CW400x if no BBCC is present in the design. If a BBCC does exist in the design, the coprocessor can be placed either above or below the BBCC. Some examples of coprocessor placement are shown in Figures 8.15 through 8.17.
Figure 8.15 Coprocessor Placement Example 1
[Figure: Coprocessor and CW400x blocks; connections: CBus Controls, Data Bus.]
Figure 8.16 Coprocessor Placement Example 2
[Figure: Coprocessor, BBCC, and CW400x blocks; connections: CBus Controls, Data Bus.]
Figure 8.17 Coprocessor Placement Example 3
[Figure: Coprocessor, BBCC, and CW400x blocks; connections: CBus Controls, Data Bus.]
8.3.8 Global Output Enable (GOE) Placement
The Global Output Enable (GOE) is a small module, and is not provided as a hardmac. It generates the Run Signals, and also the output enables for the Data Bus. Both are time-critical, so the placement of the GOE is important. It should be close to the CW400x, BBCC, MMU, CU, and coprocessors. A suggested placement for the GOE is shown in Figure 8.18. The GOE is described in more detail in Chapter 6.
Figure 8.18 Global Output Enable Suggested Placement
[Figure: GOE with the BBCC, the CW400x, and the MMU, CU, or Coprocessor; signals: Run Signals to each block, Output Enable (MOEN), Output Enable (COEN), Output Enable (BIUOEN).]
8.3.9 Cache RAMs Placement
The Cache RAMs are controlled by the BBCC. The control pins are on the left side of the BBCC. It is more important to have the Tag RAMs close to the BBCC control pins than the Data RAMs, since the tagmatch (the match logic between the tag in the Tag RAMs and the tag from the BBCC) is a critical path and should be optimized. An example placement of the cache RAMs is shown in Figure 8.19.
Figure 8.19 Cache RAMs Placement Example
[Figure: BBCC and CW400x with the D-Cache/I-Cache Set 0 Tag RAM, I-Cache Set 1 Tag RAM, I-Cache Set 1 Data RAM, and D-Cache/I-Cache Set 0 Data RAM; connections: Cache RAM Controls, Tag, Data Bus.]
8.3.10 Tagmatch Placement
The logic to compare the tags in the tag RAMs to the transaction tag should be close to both the tag RAMs and the BBCC, as this logic is time-critical. The tag RAMs’ output is also connected to the Data Bus through 3-state gates. Figure 8.20 shows the connections to the tagmatch logic and the connections from the tag RAMs to the Data Bus.
Figure 8.20 Tagmatch Placement
[Figure: BBCC and CW400x with the D-Cache/I-Cache Set 0 Tag RAM, I-Cache Set 1 Tag RAM, I-Cache Set 1 Data RAM, D-Cache/I-Cache Set 0 Data RAM, and the Tag Match/3-State Gates block; connections: Match, Tag for Matching, Data Bus.]
8.3.11 Write Buffer Placement
The Write Buffer has many connections to the bottom of the BBCC. It also receives data from the Data Bus and the address from the MMU. Figure 8.21 shows an example placement.
Figure 8.21 Write Buffer Placement Example
[Figure: CW400x, BBCC, Write Buffer, and MMU blocks; connections: WB Address, WB Data, MADDROUTP[31:2] Address, Data Bus.]
8.3.12 B-Bus Device Placement
B-Bus Devices have many connections to the top of the BBCC. An example placement is shown in Figure 8.22.
Figure 8.22 B-Bus Device Placement Example
[Figure: CW400x, BBCC, and B-Bus Device blocks; connections: B-Bus Address, B-Bus Data, B-Bus Controls, Data Bus.]
Appendix A
Structural ALU Improper Unknown Value (X) Handling
The structural simulation model for the CW400x does not handle unknown values (Xs) properly in four instructions. The actual silicon works correctly, but the simulation model is incorrect. It is incorrect because of LSI Logic’s fast implementation of the ALU, and cannot be fixed while keeping accurate gate-level modeling of the design.
The four instructions are:
1. AND rd, rs, rt
rd is the destination register
rs is a source register which contains an X at bit b (b is any bit)
rt is a source register which contains a zero at bit b
After execution of the AND Instruction, the destination register will incorrectly contain an X instead of a zero at bit b.
example: rs = 0x0000.000X; rt = 0x0000.0000; rd = 0x0000.000X
2. ANDI rt, rs, immed
rt is the destination register
rs is a source register which contains an X at bit b (b is any bit)
immed is the immediate field which contains a zero at bit b
After execution of the ANDI Instruction, the destination register will incorrectly contain an X instead of a zero at bit b.
example: rs = 0x0000.00X0; immed = 0x0000.0000; rt = 0x0000.00X0
3. OR rd, rs, rt
rd is the destination register
rs is a source register which contains an X at bit b (b is any bit)
rt is a source register which contains a one at bit b
After execution of the OR Instruction, the destination register will incorrectly contain an X instead of a one at bit b.
Switching rs and rt will produce an identical incorrect X result.
example: rs = 0x0000.000X; rt = 0x0000.0001; rd = 0x0000.000X
example: rs = 0x0000.0001; rt = 0x0000.000X; rd = 0x0000.000X
4. ORI rt, rs, immed
rt is the destination register
rs is a source register which contains an X at bit b (b is any bit)
immed is the immediate field which contains a one at bit b
After execution of the ORI Instruction, the destination register will incorrectly contain an X instead of a one at bit b.
Switching rs and immed will produce an identical incorrect X result.
example: rs = 0x0000.00X0; immed = 0x0000.0010; rt = 0x0000.00X0
example: rs = 0x0000.0010; immed = 0x0000.00X0; rt = 0x0000.00X0
All remaining cases, including X cases, are handled correctly.
The incorrect X handling may create problems when trying to mask registers that have not been fully initialized. Specifically, the CP0 Cause and Status Registers should be initialized in the software reset handler to prevent this problem.
Note that AND rd, r0, X and ANDI rt, r0, X properly produce a 0 and may be used to mask uninitialized registers.
The Register Transfer Level (RTL) ALU handles all cases, including the four listed above, correctly.
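The four cases can be illustrated with a small behavioural sketch in Python. It is not LSI Logic's simulation model: it only reproduces the bit-level outcomes listed in this appendix, and it assumes that every case not listed follows ordinary three-valued logic. The function names and the bit-string representation ('0', '1', and 'X' characters, most significant bit first) are hypothetical and chosen only for illustration.

```python
# Behavioural sketch of the X-handling cases described in this appendix.
# Values are strings of '0', '1', and 'X', most significant bit first.
# All names here are illustrative; none come from the CW400x deliverables.

def ideal_and_bit(a, b):
    """Three-valued AND: a known 0 on either input forces a 0."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "X"

def ideal_or_bit(a, b):
    """Three-valued OR: a known 1 on either input forces a 1."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "X"

def structural_and_bit(rs, rt):
    """AND/ANDI as described: X in rs with 0 in rt (or immed) yields X."""
    if rs == "X" and rt == "0":
        return "X"
    return ideal_and_bit(rs, rt)

def structural_or_bit(rs, rt):
    """OR/ORI as described: X paired with 1, in either order, yields X."""
    if "X" in (rs, rt) and "1" in (rs, rt):
        return "X"
    return ideal_or_bit(rs, rt)

def bitwise(op, rs, rt):
    """Apply a per-bit operator across two equal-length bit strings."""
    if len(rs) != len(rt):
        raise ValueError("operands must have the same width")
    return "".join(op(a, b) for a, b in zip(rs, rt))

unknown = "0000XXXX"   # low four bits unknown
zero = "00000000"      # stands in for r0 or an all-zero mask
one = "00000001"

print(bitwise(ideal_and_bit, unknown, zero))       # 00000000
print(bitwise(structural_and_bit, unknown, zero))  # 0000XXXX
print(bitwise(structural_and_bit, zero, unknown))  # 00000000
print(bitwise(ideal_or_bit, unknown, one))         # 0000XXX1
print(bitwise(structural_or_bit, unknown, one))    # 0000XXXX
```

The third call corresponds to AND rd, r0, X: with the zero operand in the rs position, the sketch returns all zeros, which matches the note that this form may be used to mask uninitialized registers.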