Software and Hardware Considerations for FPU Exception ...cdn.preterhuman.net/texts/computing/intel_cpu... · 2/21/97 12:57 PM 24329102.DOC INTEL CONFIDENTIAL (until publication date)

2/21/97 12:57 PM 24329102.DOC

INTEL CONFIDENTIAL(until publication date)

APPLICATIONNOTE

AP-578

Software andHardwareConsiderations forFPU ExceptionHandlers for IntelArchitectureProcessors

Order Number: 243291-002

February 1997

AP-578

3/11/97 10:25 AM 24329102.DOC


2

Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel orotherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions ofSale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating tosale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, orinfringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, lifesaving, or life sustaining applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

The Pentium® and Pentium Pro processor may contain design defects or errors known as errata which may cause the productto deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may beobtained from:

Intel Corporation P.O. Box 7641 Mt. Prospect IL 60056-7641

or call 1-800-879-4683or visit Intel’s website at http:\\www.intel.com

Copyright © Intel Corporation 1996, 1997.

* Third-party brands and names are the property of their respective owners.

AP-578

2/21/97 12:57 PM 24329102.DOC


3

CONTENTS

PAGE PAGE

1.0 INTRODUCTION AND READING GUIDE .3

2.0 MS-DOS* COMPATIBLE HANDLERS ANDTHEIR ISSUES OVER GENERATIONS ...5

2.1 Origin of MS-DOS* Mode: 8088 and8087 .....................................................5

2.2 Development of MS-DOS* Mode with 80286and 80287; Intel386 Processor andIntel387 Math Coprocessor ...................5

2.2.1 SPECIAL HARDWARE FOR THE80287 INTERFACE ........................6

2.2.2 SPECIAL HARDWARE FOR THEINTEL387 MATH COPROCESSORINTERFACE ..................................6

2.3 FERR# & IGNNE# with Intel486™ andPentium Processors with CR0.NE=0 ..7

2.3.1 BASIC RULES: WHEN FERR# ISGENERATED ................................7

2.3.2 RECOMMENDED EXTERNALHARDWARE TO SUPPORT MS-DOS*COMPATIBILITY ...........................8

2.3.3 “NO-WAIT” FPU INSTRUCTIONS CANGET FPU INTERRUPT INWINDOW ..................................... 10

2.4 Pentium Pro Processor withCR0.NE=0 .......................................... 13

3.0 RECOMMENDED PROTOCOL FORMS-DOS™ AND WINDOWS* 95COMPATIBLE HANDLERS ................... 14

3.1 Numeric Exceptions and their Defaults 143.1.1 TWO OPTIONS FOR HANDLING

NUMERIC EXCEPTIONS ............ 143.1.2 AUTOMATIC EXCEPTION HANDLING :

USING MASKED EXCEPTIONS ..15

3.2 Software Exception Handling ...............163.3 Synchronization Required for Use of FPU

Exception Handlers .............................173.3.1 EXCEPTION SYNCHRONIZATION:

WHAT, WHY AND WHEN ............173.3.2 EXCEPTION SYNCHRONIZATION

EXAMPLES..................................173.3.3 PROPER EXCEPTION

SYNCHRONIZATION IN GENERAL 183.4 FPU Exception Handling Examples .....183.5 Need for Preserving the State of IGNNE#

Circuit if Use FPU and SMM ...............223.6 Considerations When FPU Shared

Between Tasks ...................................223.6.1 SPECULATIVELY DEFERRING FPU

SAVES, GENERAL OVERVIEW ..233.6.2 TRACKING FPU OWNERSHIP .....243.6.3 INTERACTION OF FPU STATE

SAVES AND FP EXCEPTIONASSOCIATION ............................24

3.6.4 INTERRUPT ROUTING FROM THEKERNEL.......................................26

4.0 DIFFERENCES FOR HANDLERS USINGNATIVE MODE .......................................27

4.1 Origin with 80286 and 80287; Intel386™Processorand Intel387 Math Coprocessor ..........27

4.2 Changes with Intel486 , Pentium andPentiumPro Processors with CR0.NE=1 ..........27

4.3 Considerations When FPU Shared BetweenTasks Using Native Mode ...................27

AP-578

2/21/97 12:57 PM 24329102.DOC


4

1.0 INTRODUCTION AND READINGGUIDE

The primary purpose of this application note is toprovide information to help software engineerswrite the most robust Floating-Point Unit (FPU)exception handlers possible. This note alsoprovides the basic hardware information needed todesign the MS-DOS* compatible interface1 for themost recent generations of Intel Architectureprocessors, starting with the Intel486™ processor.(Because of the small amount of new designactivity, the hardware interfaces for the 8086through the Intel386™ processors are treated onlybriefly.) The third purpose is to provide acompendium of the history of the development andvariations of the Intel Architecture Floating-PointUnits (FPUs) as relevant to their exceptionhandling. Following is a list of Intel Architectureprocessors and math coprocessors inchronological order.

• 8086 processor

• 8087 math coprocessor

• 80286 processor

• 80287 math coprocessor

• Intel386™ processor

• Intel387 math coprocessor

• Intel486™ DX processor(with integrated FPU)

• Intel486 SX processor

• Intel487 math coprocessor

• Pentium® processor (with integrated FPU)

• Pentium Pro processor (with integrated FPU)

Much of this material is in various sections of thePentium Processor Family Developer’s Manual,Volume 3. There is also some material in thisapplication note that is not published elsewhere.On the other hand, there is much additional

Footnotes1 WINDOWS* 95 and WINDOWS 3.1 (and earlier

versions) use almost the same interface as MS-DOS*, and the recommendations herein for anMS-DOS compatable system apply to all threeoperating systems.

material on the FPU from the Pentium ProcessorFamily Developer’s Manual, Volume 3 which hasnot been reproduced here, including the details oneach of its specific exceptions. Much of this will beuseful in writing FPU exception handlers, soVolume 3 should be used as an essentialreference along with this appliction note.

NOTE

The following manuals referenced in thisdocument are archived and are available onIntel’s web site at http://www.intel.com:Pentium® Processor Family Developer’sManual, Volume 1: Pentium Processor(Order Number 241428-004) and thePentium® Processor Family Developer’sManual, Volume 3: Architecture andProgramming Manual (Order Number241430-004).

The materials are presented in a mostlychronological order, which supports the historypreservation purpose, and also minimizes forwardreferences. Thus the main body of this applicationnote begins with Section 2 which covers the sixpresently available generations of IntelArchitecture FPUs in chronological order startingwith the 8087. The history of the FPU exceptionhandling has been complicated both by Intel’ssuccessful efforts to improve the performance andflexibility of the FPU through the generations, andby the decision to support upward compatibility fora large customer base which was implementingFPU exception handling in a way compatible withthe first 8088 Personal Computers (PCs) andmajor Operating Systems (OSs). This secondcomplication has resulted in two different systemsor modes for FPU exception handling starting withthe 80286 and 80287.

Beginning with the 80286 and 80287, Intelprovided a dedicated input pin (ERROR#) on the80286, to be connected to the ERROR# output pinon the 80287, for the FPU exceptions. Whenasserted, the ERROR# input triggers interrupt 16.The use of this dedicated interrupt for the FPUexception handler is referred to as the “nativemode”, and is recommended by Intel. However, forreasons explained in Sections 2.1 and 2.2, themajority of the Intel Architecture (IA) customerbase has not been using the native mode, butrather the “MS-DOS compatible mode” for FPUexception handling. Since the MS-DOS compatiblemode has the largest customer base, is the morecomplicated mode, and has changed the mostbetween generations, it is the main focus ofSection 2. In addition to the history of the

AP-578

2/21/97 12:57 PM 24329102.DOC


5

architecture and interfaces for FPU exceptionhandling, Section 2 provides the basic hardwareinformation needed to design the MS-DOScompatible interface for the most recentgenerations of IA processors, and discusses indetail several important system implications.

Section 3 describes the recommended protocol forwriting MS-DOS compatible FPU exceptionhandlers, with various options, along withdiscussions of several problems and how to avoidthem. Most of the material is also applicable tonative mode handlers.

Although the native mode of FPU exceptionhandling was available from the second generationof the six presently available generations of theIntel Architecture FPUs (and brief discussions of itare provided in Section 2), we give the mainpresentation of it last, in Section 4. This is morechronologically consistent than it would seembecause it has not become widely used untilrecently.

A software engineer who needs to write an MS-DOS compatible FPU exception handler but doesnot want to review the FPU history (or read anymore about hardware than necessary) may skipSection 2 and begin reading Section 3. Then somesubsections of Section 2 should be read asneeded when referenced in Section 3. Someonewriting a native mode exception handler that wantsto read only what’s necessary should start withSection 4, but then should also read Section 3, asmost of the recommended protocol for FPUexception handling is the same for MS-DOScompatible and native modes and is not repeatedin Section 4. Studying Section 4 first will allow thisreader more easily to skip references back intoSection 2 which are not relevant to the nativemode.

A note on TERMINOLOGY: There are manyvariations of the words which are used to label an(unmasked) FPU error condition, and also thecode which handles it. “Error”, “exception” and“fault” are used to refer to the condition. Such acondition results in an interrupt, if no mask orblock is in effect along the interrupt pathway. Thecode which handles the interrupt can be referredto as an error or exception or fault handler, or aninterrupt or exception service routine, etc. Thephrase “exception handler” has been usedconsistently (as much as possible) in thisapplication note, for several reasons: “Exception”is less general than interrupt (which includesexternal hardware interrupts and software

interrupts, as well as the processor problemconditions called exceptions or faults), butcorrectly more general than error or fault (becausee.g. a precision exception caused by the fact thatthe number 1/3 cannot be exactly represented inthe 80 bit FPU format is not really due to anymistake or error!). However, the reader should beaware that a number of the variations given abovecan be found in the literature, and that whenapplied to the FPU, they all mean the same thing.

2.0 MS-DOS* COMPATIBLEHANDLERS AND THEIR ISSUESOVER GENERATIONS

2.1 Origin of MS-DOS* Mode: 8088and 8087

The 8087 has an output pin, INT, which it assertswhen an unmasked exception occurs. There is nodedicated pin or interrupt vector number in the8088 or 8086 specific for an FPU error assertion.Intel recommended that the FPU INT be routed tothe 8088 or 8086 INTR pin through an 8259AProgrammable Interrupt Controller (PIC), and notto the NMI input. However, the original PC designattached INT to NMI anyway, because by the timethe 8087 was available, the original PC hadalready assigned other functions to the 8 inputs ofthe single PIC used in that design.

2.2 Development of MS-DOS*Mode with 80286 and 80287;Intel386 Processor andIntel387 Math Coprocessor

The 80286 and 80287 and Intel386 processor andIntel387 math coprocessor pairs are each providedwith ERROR# pins that are recommended to beconnected between the processor and FPU. If thisis done, when an unmasked FPU exceptionoccurs, the FPU records the exception, andasserts its ERROR# pin. The processorrecognizes this active condition of the ERROR#status line at the next WAIT or ESC instruction inits instruction stream, and branches to the FPUexception handler at interrupt vector 16. This is thenative mode.

However, it was important to maintain maximumcompatibility with the already significant 8088 and8086 PC software base, where the NMI vector (#2)was used for FPU exceptions and vector 16 was

AP-578

2/21/97 12:57 PM 24329102.DOC


6

used for the BIOS video software interrupt. So theoriginal IBM PC-AT* design for the 80286 and80287 maintained Vector #16 for the BIOS video,and vector 2 was shared between the FPUexception and the new parity checking feature. Aparity error detected by external hardware directlytriggered vector 2 through the NMI pin. The FPUexception was handled by tying the 80286 RROR#input permanently high, and the 80287 ERROR#output was tied to the IRQ13 interrupt input on thesecond (cascaded) PIC in the PC-AT design. ThePIC was programmed to issue vector 75H whenIRQ13 was triggered.2 But to maintaincompatibility with older PC software that expectedto access its own FPU exception handler bychanging vector 2, the BIOS routine activated byINT 75H branches to INT 2. The standard INT 2routine tests to see if the signal is due to the NMIpin (in which case it branches to the Parity Errorhandler) or an FPU exception.

2.2.1 SPECIAL HARDWARE FOR THE80287 INTERFACE

It is necessary to guarantee, in the case of an80287 exception, that the exception will behandled through the external loop using IRQ13 inthe cascaded PIC before other 80287 instructionsare sent over from the 80286. This is done byasserting BUSY# to the 80286, which normallymeans that the 80287 is still busy with a previousinstruction, and so blocks the 80286 from sendinganother until BUSY# is de-asserted. Thisadditional use of BUSY# is implemented by anedge triggered flip-flop which latches BUSY# usingERROR# from the 80287 as a clock. The output ofthis latch is OR’ed with the BUSY# output of the80287 and drives the BUSY# input of the 80286.This PC-AT scheme effectively delays

deactivation of BUSY# at the 80286 whenever an80287 ERROR# is signaled.

Since the BUSY# signal to the 80286 remainsactive after an exception, the IRQ13 interrupt

Footnotes2 WINDOWS 95 and WINDOWS 3.1 (and earlier

versions) use interrupt 5DH instead of 75H, butthe recommendations herein apply to systemsusing these WINDOWS operating systems, aswell as MS-DOS.

(exception) handler (accessed through interruptvector 75H) is guaranteed to execute before anyother 80287 instruction can begin (except forsome special control instructions).The IRQ13handler clears the BUSY# latch (by writing to aspecial I/O port defined at 0F0H), thus allowingexecution of 80287 instructions to proceed. Thehandler then branches to the NMI handler(interrupt vector 2), where the user definednumeric exception handler resides in PCcompatible systems. Thus the PC-AT schemeapproximates the exception reporting schemebetween the 8087 and 8088 in the original PC.

2.2.2 SPECIAL HARDWARE FOR THEINTEL387™ MATH COPROCESSORINTERFACE

The Intel386 processor can use a PC-ATcompatible interface to communicate with anIntel387 math coprocessor, that is similar to theone in the 80286 and 80287 system above. Aswith the 80286, the Intel386 processor ERROR#pin should be tied permanently inactive (high), andthe Intel387 ERROR# output used both to driveIRQ13, and to latch BUSY# in a flip-flop. TheIRQ13 handler (vector 75H) should clear theBUSY# latch and branch to the NMI handler, as inthe 80286 case.

However, an additional hardware feature isneeded to manage the PEREQ signal to theIntel386 processor. After the Intel387 mathcoprocessor asserts ERROR#, and then itsBUSY# signal has gone inactive, externalhardware must re-assert the PEREQ signal to theIntel386 processor. This is needed for storeinstructions (for example, FST mem ) because theIntel387 math coprocessor drops PEREQ once it

AP-578

2/21/97 12:57 PM 24329102.DOC


7

signals an exception. While the Intel386 processorhas not yet recognized the occurrence of theexception, it still expects the data transfers tocomplete via PEREQ re-activation. It ispermissible for the Intel386 processor to receiveundefined data during such I/O read cycles.Disabling the Intel387 math coprocessor is notnecessary, because the dummy data transfercycles directed to the Intel387 math co processorwhen PEREQ is externally reactivated for theIntel386 processor will not disturb the operation ofthe Intel387math coprocessor. The IRQ13interrupt handler should remove the extension ofBUSY# and also the re-activation of PEREQ via awrite to PC/AT compatible hardware at I/O port0F0H.

An Intel387 math coprocessoroffers significantperformance improvements over the 80287, butbecause the Intel386 processor was ready forproduction before the Intel387math coprocessor,the Intel386 processor was designed to work witheither the 80287 or Intel387 math coprcoessors.The Intel386 processor automatically configuresitself for the attached FPU on reset by testing theERROR# pin, and setting or clearing bit 4 in CR0(see Section 10.1.3 in the Pentium ProcessorFamily Developer’s Manual, Volume 3). This bit isthe ET (Extension Type) bit, and it will be set ifERROR# is low (meaning an Intel387 is attached)and cleared if ERROR# is high (meaning there isan 80287 or no FPU attached). The MS-DOScompatible hardware interface is similar to that forthe Intel386 processor and Intel387 mathcoprocessor combination.

2.3 FERR# & IGNNE# withIntel486™ and PentiumProcessors with CR0.NE=0

In the Intel486 and Pentium processors, moreenhancements and speedup features have beenadded to the corresponding FPUs. Also, the FPUis built into the same chip as the processor, whichallows further increases in speed. MS-DOScompatibility for exception handling has also beenbuilt in, with the NE bit in control register CR0selecting the MS-DOS compatible mode if madezero. (NE=1 selects the native or internal mode,which generates Interrupt 16, which is the sameas the native version of exception handling for the80286 and 80287 and the Intel386 processors andIntel 387 math coprocessor.)

In MS-DOS compatible mode, the FERR#(Floating-point ERRor) output replaces the

ERROR# signal from the previous generations,and is connected to a PIC. A new input signal,IGNNE# ( IGNore Numeric Error), is provided toallow the FPU exception handler to execute FPUinstructions, if desired, without first clearing theerror condition and without triggering the interrupta second time. This IGNNE# feature is needed toreplicate the capability that was provided on MS-DOS compatible Intel 80286 and 80287 and theIntel386 processors and INtel 387 mathcoprocessor-based systems by turning off theBUSY# signal, when inside the FPU exceptionhandler, before clearing the error condition.

Note that Intel, in order to provide Intel486processors for market segments which had noneed for an FPU, created the “SX” versions.These Intel486 SX processors did not contain thefloating-point unit. Intel also produced Intel487 SXmath coprocessors for end users who laterdecided to upgrade to a system with an FPU.These Intel487 SX math coprocessors are similarto standard Intel486 processors with a workingFPU on board. Thus the external circuitrynecessary to support the MS-DOS compatiblemode for Intel487 SX math coprocessors is thesame as for standard Intel486 DX processors.

Note that the special DP (Dual Processing) modefor Pentium processors, and also the more generalIntel MultiProcessor Specification for systems withmultiple Pentium or Pentium Pro processors,support FPU exception handling only in the nativemode. Intel does not recommend using the MS-DOS compatible FPU mode for systems usingmore than one processor.

2.3.1 BASIC RULES: WHEN FERR# ISGENERATED

• Assume the following conditions: NE=0, theIGNNE# input is de-asserted, and then anFPU instruction causes an unmasked FPUexception. Then in most cases, deferred errorreporting occurs. This means that theprocessor does not respond immediately, butrather freezes just before executing the nextWAIT or FPU instruction (except for “No-Wait” instructions, which the FPU executesregardless of an error condition).

• At the same time that the processor freezes,it also asserts the FERR# output.

AP-578

2/21/97 12:57 PM 24329102.DOC


8

• The frozen processor waits for an externalinterrupt, which must be supplied by externalhardware in response to the FERR#assertion.

• In MS-DOS compatible systems, FERR# isfed to the IRQ13 input in the cascaded PIC,which generates interrupt 75H, which thenbranches to interrupt 2, as described abovefor the 80286and 80287 and Intel386processor and Intel387 processor-basedsystems.

These cases in which FERR# is not asserted atthe time of the error, but rather at the next FPU orWAIT instruction, include all exceptions caused bythe basic arithmetic instructions (including FADD,FSUB, FMUL, FDIV, FSQRT, FCOM andFUCOM), precision exceptions caused by all typesof FPU instructions, and numeric underflow andoverflow on all types of FPU instructions exceptstores to memory. We will refer to these cases asdeferred (error reporting).

On the other hand, there are some exceptions,which when caused by some instructions, driveFERR# at the time that the exception occurs.These include FPU stack fault, invalid operationand denormal exceptions caused by alltranscendental instructions, FSCALE, FXTRACT,FPREM and others, and all exceptions (exceptprecision) when caused by FPU store instructions.These cases are called immediate (errorreporting). (These cases will, like the deferred,cause the processor to freeze just beforeexecuting the next WAIT or FPU instruction if theerror condition has not been cleared by that time.)Note that in general, whether an FPU exceptioncase is deferred or immediate depends both onwhich exception occurred, and which instructioncaused that exception. A complete specification ofthese cases, which applies also to the Intel486, isgiven in Section 5.1.21 in the Pentium ProcessorFamily Developer’s Manual, Volume 1.

If NE=0 but the IGNNE# input is active while anunmasked FPU exception is in effect, theprocessor disregards the exception, does notassert FERR#, and continues. If IGNNE# is thende-asserted and the FPU exception has not beencleared, the processor will respond as described

above. (That is, an immediate exception case willassert FERR# immediately. A deferred exceptioncase will assert FERR# and freeze just before the

next FPU or WAIT instruction.) The assertion ofIGNNE# is intended for use only inside the FPUexception handler, where it is needed if one wantsto execute non-control FPU instructions fordiagnosis, before clearing the exception condition.When IGNNE# is asserted inside the exception

handler, a preceding FPU exception has alreadycaused FERR# to be asserted, and the externalinterrupt hardware has responded, but IGNNE#assertion still prevents the freeze at FPUinstructions. Note that if IGNNE# is left activeoutside of the FPU exception handler, additionalFPU instructions may be executed after a giveninstruction has caused an FPU exception. In thiscase, if the FPU exception handler ever did getinvoked, it could not determine which instructioncaused the exception.

To properly manage the interface between theprocessor’s FERR# output, its IGNNE# input, andthe IRQ13 input of the PIC, additional externalhardware is needed. A recommendedconfiguration is described below.

2.3.2 RECOMMENDED EXTERNALHARDWARE TO SUPPORT MS-DOS*COMPATIBILITY

Figure 1 below provides an external circuit whichwill assure proper handling of FERR# and IGNNE#when an FPU exception occurs. In particular, itassures that IGNNE# will be active only inside theFPU exception handler without depending on theorder of actions by the exception handler. Somehardware implementations have been less robustbecause they have depended on the exceptionhandler to clear the FPU exception interruptrequest to the PIC (FP_IRQ signal) before thehandler causes FERR# to be de-asserted byclearing the exception from the FPU itself. Figure2 below shows the details of how IGNNE# willbehave when the circuit in Figure 1 isimplemented. The temporal regions within the FPUexception handler activity are described asfollows:

1. The FERR# signal is activated by an FPUexception and sends an interrupt requestthrough the PIC to the processor’s INTR pin.

2. During the FPU interrupt service routine(exception handler) the processor will need toclear the interrupt request latch (Flip Flop #1).It may also want to execute non-control FPUinstructions before the exception is cleared

AP-578

2/21/97 12:57 PM 24329102.DOC


9

from the FPU. For this purpose the IGNNE#must be driven low. Typically in the PCenvironment an I/O access to Port 0F0Hclears the external FPU exception interruptrequest (FP_IRQ). In the recommendedcircuit, this access also is used to activateIGNNE#. With IGNNE# active the FPUexception handler may execute any FPUinstruction without being blocked by an activeFPU exception.

3. Clearing the exception within the FPU willcause the FERR# signal to be deactivatedand then there is no further need for IGNNE#to be active. In the recommended circuit, the

deactivation of FERR# is used to deactivateIGNNE#. If another circuit is used, thesoftware and circuit together must assure thatIGNNE# is deactivated no later than the exitfrom the FPU exception handler.

4. In the circuit in Figure 1 when the FPUexception handler accesses I/O port 0F0H itclears the IRQ13 interrupt request outputfrom Flip Flop #1 and also clocks out theIGNNE# signal (active) from Flip Flop #2. Sothe handler can activate IGNNE#, if needed,by doing this 0F0H access before clearing

AP-578

2/21/97 12:57 PM 24329102.DOC


10

Intel486 ,Pentium® , orPentium Proprocessor

FF #1

FF #2

FP_IRQ

LEGENDFF #n: Flip Flop #nCLR: Clear or reset

Figure 1. Recommended Circuit for MS-DOS* Compatible FPU Exception Handling

the FPU exception condition (which de-assertsFERR#). However, the circuit does not depend onthe order of actions by the FPU exception handlerto guarantee the correct hardware state upon exitfrom the handler. The flip flop which drives IGNNE#to the processor has its CLEAR input attached tothe inverted FERR#. This ensures that IGNNE# cannever be active when FERR# is inactive. So if thehandler clears the FPU exception condition beforethe 0F0H access, IGNNE# does not get activatedand left on after exit from the handler.

2.3.3 “NO-WAIT” FPU INSTRUCTIONS CANGET FPU INTERRUPT IN WINDOW

The Pentium and the Intel486 processorsimplement the “No-Wait” Floating-Point instructions(FNINIT, FNCLEX, FNSTENV, FNSAVE, FNSTSW,FNSTCW, FNENI, FNDISI or FNSETPM - SeeSection 6.3.7 in the Pentium® Processor FamilyDeveloper's Manual, Volume 3) in the MS-DOSCompatibility mode (CR0.NE = 0) in the followingmanner:

AP-578

2/21/97 12:57 PM 24329102.DOC


11

If an unmasked numeric exception is pending froma preceding FPU instruction, a member of the “No-Wait” class of instructions will, at the beginning ofits execution, assert the FERR# pin in response tothat exception just like other FPU instructions, butthen, unlike the other FPU instructions, FERR# willbe de-asserted. This de-assertion was implementedto allow the “No-Wait” class of instructions toproceed without an interrupt due to any pendingnumeric exception. However, the brief assertion ofFERR# is sufficient to latch the FPU exceptionrequest into most hardware interfaceimplementations (including Intel’s recommendedcircuit).

All the FPU instructions are implemented such thatduring their execution, there is a window in whichthe processor will sample and accept externalinterrupts. If there is a pending interrupt, theprocessor services the interrupt first beforeresuming the execution of the instruction.Consequently, it is possible that the “No-Wait”Floating-Point instruction may accept the externalinterrupt caused by it’s own assertion of the FERR#pin in the event of a pending unmasked numericexception, which is not an explicitly documentedbehavior of a “No-Wait” instruction. This process isillustrated by Figure 3, which is followed by adetailed description of the several cases possible.

0F0H Address Decode

Figure 2. Behavior of Signals During FPU Exception Handling

AP-578

2/21/97 12:57 PM 24329102.DOC


12

E x c e p t io n g e n e ra t in gF P i n s t r u c t i o n

S t a r t o f th e “N o - W a it” F P

i n s t r u c t i o n

A s s e r t io n O f F E R R #b y t h e p r o c e s s o r

S y s t e m D e p e n d e n tD e l a y

E x t e r n a l I n t e r r u p t S a m p lin g W i n d o w

W i n d o w C L O S E D

C a s e I

C a s e I I

A s s e r t io n O f IN T Rp i n b y t h e s y s t e m

Figure 3: Timing of Receipt of External Interrupt

Figure 3 assumes that a floating-point instructionwhich generates a “deferred” error (as definedabove in the Section 2.3.1), which asserts theFERR# pin only on encountering the next floating-point instruction, causes an unmasked numericexception. Assume that the next floating-pointinstruction following this instruction is one of the“No-Wait” floating-point instructions. The FERR#pin is asserted by the processor to indicate thepending exception on encountering the “No-Wait”floating-point instruction. After the assertion of theFERR# pin the “No-Wait” floating-point instructionopens a window where the pending externalinterrupts are sampled.

Then there are two cases possible depending onthe timing of the receipt of the interrupt via the INTRpin (asserted by the system in response to theFERR# pin) by the processor.

Case 1If the system responds to the assertion of FERR#pin by the “No-Wait” floating-point instruction via theINTR pin during this window then the interrupt isserviced first, before resuming the execution of the“No-Wait” floating-point instruction.

Case 2If the system responds via the INTR pin after thewindow has closed then the interrupt is recognizedonly at the next instruction boundary.

There are two other ways, in addition to Case Iabove, in which a “No-Wait” floating-pointinstruction can service a numeric exception insideits interrupt window. First, the first floating-pointerror condition could be of the “immediate” category(as defined in Section 2.3.1) that assert FERR#

AP-578

2/21/97 12:57 PM 24329102.DOC


13

immediately. If the system delay before assertingINTR is long enough, relative to the time elapsedbefore the “No-Wait” floating-point instruction, INTRcan be asserted inside the interrupt window for thelatter. Second, consider two “No-Wait” FPUinstructions in close sequence, and assume that aprevious FPU instruction has caused an unmaskednumeric exception. Then if the INTR timing is toolong for an FERR# signal triggered by the first “No-Wait” instruction to hit the first instruction’s interruptwindow, it could catch the interrupt window of thesecond.

The possible malfunction of a “No-Wait” FPUinstruction explained above cannot happen if theinstruction is being used in the manner for whichIntel originally designed it. The “No-Waitinstructions were intended to be used inside theFPU exception handler, to allow manipulation of theFPU before the error condition is cleared, withouthanging the processor because of the FPU errorcondition, and without the need to assert IGNNE#.They will perform this function correctly, sincebefore the error condition is cleared, the assertionof FERR# that caused the FPU error handler to beinvoked is still active. Thus the logic that wouldassert FERR# briefly at a “No-Wait” instructioncauses no change since FERR# is alreadyasserted. The “No-Wait” instructions may also beused without problem in the handler after the errorcondition is cleared, since now they will not causeFERR# to be asserted at all.

If a “No-Wait” instruction is used outside of the FPUexception handler, it may malfunction as explainedabove, depending on the details of the hardwareinterface implementation and which particularprocessor is involved. The actual interrupt insidethe window in the “No-Wait” instruction may beblocked by surrounding it with the instructions:PUSHFD, CLI, “No-Wait”, then POPFD. (CLI blocksinterrupts, and the push and pop of flags preservesand restores the original value of the interrupt flag.)However, if FERR# was triggered by the “No-Wait”,its latched value and the PIC response will still be ineffect. Further code can be used to check for andcorrect such a condition, if needed. Section 3.6(Considerations When FPU Shared BetweenTasks) discusses an important example of this typeof problem and gives a solution.

2.4 Pentium Pro Processor withCR0.NE=0

When bit NE=0 in CR0, the MS-DOS* compatiblemode of the Pentium Pro processor providesFERR# and IGNNE# functionality that is almostidentical to the Intel486 and Pentium processors.The same external hardware, as described inSection 2.3.2 above, is recommended for thePentium Pro processor as well as the two previousgenerations. The only change to MS-DOScompatible FPU exception handling with thePentium Pro processor is that all exceptions for allFPU instructions cause immediate error reporting.That is, FERR# is asserted as soon as the FPUdetects an unmasked exception; there are no casesin which error reporting is deferred to the next FPUor WAIT instruction. (As is discussed in Section2.3.1, most exception cases in the Intel486 andPentium processors are of the deferred type.)

Although FERR# is asserted immediately upondetection of an unmasked FPU error, this certainlydoes not mean that therequested interrupt willalways be serviced before the next instruction in thecode sequence is executed. To begin with, thePentium Pro processor executes severalinstructions simultaneously. There also will be adelay, which depends on the external hardwareimplementation, between the FERR# assertion fromthe processor and the responding INTR assertion tothe processor. Further, the interrupt request to thePICs (IRQ13) may be temporarily blocked by theOS, or delayed by higher priority interrupts, andprocessor response to INTR itself is blocked if theOS has cleared the IF bit in EFLAGS.

However, just as with the Intel486 and Pentiumprocessors, if the IGNNE# input is inactive, afloating-point exception which occurred in theprevious FPU instruction and is unmasked causesthe processor to freeze immediately whenencountering the next WAIT or FPU instruction(except for “No-Wait” instructions). This means thatif the FPU exception handler has not already beeninvoked due to the earlier exception (and thereforethe handler has not cleared that exception statefrom the FPU), the processor is forced to wait forthe handler to be invoked and handle the exception,before the processor can execute another WAIT orFPU instruction.

AP-578

14

2/21/97 12:57 PM 24329102.DOC


As explained in Section 2.3.3, if a “No-Wait”instruction is used outside of the FPU exceptionhandler, in the Intel486 and Pentium processors, itmay accept an unmasked exception from aprevious FPU instruction which happens to fallwithin the external interrupt sampling window that isopened near the beginning of execution of all FPUinstructions. This will not happen in the Pentium Proprocessor, because this sampling window has beenremoved from the “No-Wait” group of FPUinstructions.

3.0 RECOMMENDED PROTOCOLFOR MS-DOS ANDWINDOWS* 95 COMPATIBLEHANDLERS3

The activities of numeric programs can be split intotwo major areas: program control and arithmetic.The program control part performs activities suchas deciding what functions to perform, calculatingaddresses of numeric operands, and loop control.The arithmetic part simply adds, subtracts,multiplies, and performs other operations on thenumeric operands. The processor is designed tohandle these two parts separately and efficiently.An FPU exception handler, if a system chooses toimplement one, is often one of the mostcomplicated parts of the program control code.

3.1 Numeric Exceptions and theirDefaults

The FPU can recognize six classes of numericexception conditions while executing numericinstructions:

Footnotes3 Although there are some differences in the way

FPU exceptions are handled between MS-DOS,and WINDOWS 95 and WINDOWS 3.1 (andearlier versions), the WINDOWS operatingsystems operate the processor in the MS-DOScompatable mode, and the recommended protocolgiven here applies to all these systems. On theother hand, current versions of WINDOWS NTuse the FPU native mode.

1. #I — Invalid operation#IS Stack fault#IA IEEE standard invalid operation

2. #Z Divide-by-zero

3. #D Denormalized operand

4. #O Numeric overflow

5. #U Numeric underflow

6. #P Inexact result (precision)

For complete details on these exceptions and theirdefaults, see the Pentium Processor FamilyDeveloper’s Manual, Volume 3, Sections 7.1.7through 7.1.13.

3.1.1 TWO OPTIONS FOR HANDLINGNUMERIC EXCEPTIONS

Depending on options determined by the softwaresystem designer, the processor takes one of twopossible courses of action when a numericexception occurs:

1. The FPU can handle selected exceptionsitself, producing a default fix-up that isreasonable in most situations. This allows thenumeric program execution to continueundisturbed. Programs can mask individualexception types to indicate that the FPUshould generate this safe, reasonable resultwhenever the exception occurs. The defaultexception fix-up activity is treated by the FPUas part of the instruction causing theexception; no external indication of theexception is given (except that the instructiontakes longer to execute when it handles amasked exception.) When masked exceptionsare detected, a flag is set in the numeric statusregister, but no information is preservedregarding where or when it was set.

AP-578

2/21/97 12:57 PM 24329102.DOC


15

2. Alternatively, a software exception handler canbe invoked to handle the exception. When anumeric exception is unmasked and theexception occurs, the FPU stops furtherexecution of the numeric instruction andcauses a branch to a software exceptionhandler. The exception handler can thenimplement any sort of recovery proceduresdesired for any numeric exception detectableby the FPU.

3.1.2 AUTOMATIC EXCEPTION HANDLING:USING MASKED EXCEPTIONS

Each of the six exception conditions describedabove has a corresponding flag bit in the FPUstatus word and a mask bit in the FPU control word.If an exception is masked (the corresponding maskbit in the control word = 1), the processor takes anappropriate default action and continues with thecomputation. The processor has a default fix-upactivity for every possible exception condition itmay encounter. These masked-exceptionresponses are designed to be safe and aregenerally acceptable for most numeric applications.

For example, if the Inexact result (Precision)exception is masked, the system can specifywhether the FPU should handle a result that cannotbe represented exactly by one of four modes ofrounding: rounding it normally, chopping it towardzero, always rounding it up, or always down. If theUnderflow exception is masked, the FPU will storea number that is too small to be represented innormalized form as a denormal (or zero if it’ssmaller than the smallest denormal). Note thatwhen exceptions are masked, the FPU may detectmultiple exceptions in a single instruction, becauseit continues executing the instruction afterperforming its masked response. For example, theFPU could detect a denormalized operand, performits masked response to this exception, and thendetect an underflow.

As an example of how even severe exceptions canbe handled safely and automatically using thedefault exception responses, consider a calculationof the parallel resistance of several values usingonly the standard formula (Figure 4). If R1 becomeszero, the circuit resistance becomes zero. With thedivide-by-zero and precision exceptions masked,the processor will produce the correct result. FDIVof R1 into 1 gives infinity, and then FDIV of (infinity+R2 +R3) into 1 gives zero.

Figure 4. Arithmetic Example Using Infinity

By masking or unmasking specific numericexceptions in the FPU control word, programmerscan delegate responsibility for most exceptions tothe processor, reserving the most severeexceptions for programmed exception handlers.Exception-handling software is often difficult towrite, and the masked responses have beentailored to deliver the most reasonable result foreach condition. For the majority of applications,masking all exceptions yields satisfactory resultswith the least programming effort. Certainexceptions can usefully be left unmasked during thedebugging phase of software development, andthen masked when the clean software is actuallyrun. An invalid-operation exception for example,typically indicates a program error that must becorrected.

The exception flags in the FPU status word providea cumulative record of exceptions that haveoccurred since these flags were last cleared. Onceset, these flags can be cleared only by executingthe FCLEX/FNCLEX (clear exceptions) instruction,by reinitializing the FPU with FINIT/FNINIT orFSAVE/FNSAVE, or by overwriting the flags with anFRSTOR or FLDENV instruction. This allows aprogrammer to mask all exceptions, run acalculation, and then inspect the status word to see

AP-578

16

2/21/97 12:57 PM 24329102.DOC


if any exceptions were detected at any point in thecalculation.

3.2 Software Exception Handling

If the FPU in or with an Intel family processor(80286 and onwards) encounters an unmaskedexception condition, with the system operated in theMS-DOS compatible mode and with IGNNE# notasserted, a software exception handler is invokedthrough a PIC and the processor’s INTR pin. TheFERR# (or ERROR# ) output from the FPU thatbegins the process of invoking the exceptionhandler may occur when the error condition is firstdetected, or when the processor encounters thenext WAIT or FPU instruction. Which of these twocases occurs depends on the processor generationand also on which exception and which FPUinstruction triggered it, as discussed earlier inSection 2. The elapsed time between the initialerror signal and the invocation of the FPU exceptionhandler depends of course on the externalhardware interface, and also on whether theexternal interrupt for FPU errors is enabled. But thearchitecture ensures that the handler will beinvoked before execution of the next WAIT orfloating-point instruction since an unmaskedfloating-point exception causes the processor tofreeze just before executing such an instruction(unless the IGNNE# input is active, or it is a “No-Wait” FPU instruction).

The frozen processor waits for an external interrupt,which must be supplied by external hardware inresponse to the FERR# (or ERROR#) output of theprocessor (or coprocessor), usually through IRQ13on the “slave” PIC, and then through INTR. Thenthe external interrupt invokes the exceptionhandling routine. Note that if the external interruptfor FPU errors is disabled when the processorexecutes an FPU instruction, the processor willfreeze until some other (enabled) interrupt occurs ifan unmasked FPU exception condition is in effect.If NE = 0 but the IGNNE# input is active, theprocessor disregards the exception and continues.Error reporting via an external interrupt is supportedfor MS-DOS compatibility. Chapter 23 of thePentium Processor Family Developer’s Manual,Volume 3 contains further discussion ofcompatibility issues.

The references above to the ERROR# output fromthe FPU apply to the Intel387 and 80287 mathcoprocessors (NPX chips). If one of thesecoprocessors encounters an unmasked exceptioncondition, it signals the exception to the 80286 orIntel386 processor using the ERROR# status linebetween the processor and the coprocessor. SeeSection 2.2 above, and Chapter 23 of the Pentium

Processor Family Developer’s Manual, Volume 3for differences in FPU exception handling.

The exception-handling routine is normally a part ofthe systems software. The routine must clear (ordisable) the active exception flags in the FPU statusword before executing any FP instructions thatcannot complete execution when there is a pendingFP exception. Otherwise, the FP instruction willtrigger the FPU interrupt again, and the system willbe caught in an endless loop of nested FPexceptions, and hang. In any event, the routinemust clear (or disable) the active exception flags inthe FPU status word after handling them, andbefore IRET(D). Typical exception responses mayinclude:

• Incrementing an exception counter for laterdisplay or printing

• Printing or displaying diagnostic information(e.g., the FPU environment and registers)

• Aborting further execution, or using theexception pointers to build an instruction thatwill run without exception and executing it

Applications programmers should consult theiroperating system's reference manuals for theappropriate system response to numericalexceptions. For systems programmers, somedetails on writing software exception handlers areprovided in Chapter 14 of the Pentium ProcessorFamily Developer’s Manual, Volume 3, as well as inthis application note.

As discussed in Section 2.3.2, some early FERR#to INTR hardware interface implementations areless robust than the recommended circuit. This isbecause they depended on the exception handler toclear the FPU exception interrupt request to the PIC(by accessing port 0F0H) before the handler causesFERR# to be de-asserted by clearing the exceptionfrom the FPU itself. To eliminate the chance of a

AP-578

2/21/97 12:57 PM 24329102.DOC


17

problem with this early hardware, Intel recommendsthat FPU exception handlers always access port0F0H before clearing the error condition from theFPU.

3.3 Synchronization Required forUse of FPU Exception Handlers

Concurrency or synchronization managementrequires a check for exceptions before letting theprocessor change a value just used by the FPU. Itis important to remember that almost any numericinstruction can, under the wrong circumstances,produce a numeric exception.

3.3.1 EXCEPTION SYNCHRONIZATION:WHAT, WHY AND WHEN

Exception synchronization means that theexception handler inspects and deals with theexception in the context in which it occurred. Ifconcurrent execution is allowed, the state of theprocessor when it recognizes the exception is oftennot in the context in which it occurred. Theprocessor may have changed many of its internalregisters and be executing a totally differentprogram by the time the exception occurs. If theexception handler cannot recapture the originalcontext, it cannot reliably determine the cause ofthe exception or to recover successfully from theexception. To handle this situation, the FPU hasspecial registers updated at the start of eachnumeric instruction to describe the state of thenumeric program when the failed instruction wasattempted. This provides tools to help the exceptionhandler recapture the original context, but theapplication code must also be written withsynchronization in mind. Overall, exceptionsynchronization must ensure that the FPU andother relevant parts of the context are in a welldefined state when the handler is invoked after anunmasked numeric exception occurs.

When the FPU signals an unmasked exceptioncondition, it is requesting help. The fact that theexception was unmasked indicates that furthernumeric program execution under the arithmeticand programming rules of the FPU will probablyyield invalid results. Thus the exception must behandled, and with proper synchronization, or theprogram will not operate reliably.

For programmers in higher-level languages, allrequired synchronization is automatically providedby the appropriate compiler. However, for assembly

language programmers exception synchronizationremains the responsibility of the programmer. It isnot uncommon for a programmer to expect thattheir numeric program will not cause numericexceptions after it has been tested and debugged,but in a different system or numeric environment,exceptions may occur regularly nonetheless. Anobvious example would be use of the program withsome numbers beyond the range for which it wasdesigned and tested. The example in Section 3.3.2shows a more subtle way in which unexpectedexceptions can occur.

As described in Section 3.1.1, depending onoptions determined by the software systemdesigner, the processor can perform one of twopossible courses of action when a numericexception occurs.

• The FPU can provide a default fix-up forselected numeric exceptions. If the FPUperforms its default action for all exceptions,then the need for exception synchronization isnot manifest. However, code is often ported tocontexts and operating systems for which itwas not originally designed. The examplebelow illustrates that it is safest to alwaysconsider exception synchronization whendesigning code that uses the FPU.

• Alternatively, a software exception handler canbe invoked to handle the exception. When anumeric exception is unmasked and theexception occurs, the FPU stops furtherexecution of the numeric instruction andcauses a branch to a software exceptionhandler. When an FPU exception handler willbe invoked, synchronization must always beconsidered to assure reliable performance.

The following examples illustrate the need toalways consider exception synchronization whenwriting numeric code, even when the code is initiallyintended for execution with exceptions masked.

3.3.2 EXCEPTION SYNCHRONIZATION:EXAMPLES

In the following examples, three instructions areshown to load an integer, calculate its square root,then increment the integer. The synchronousexecution of the FPU will allow both of these

AP-578

18

2/21/97 12:57 PM 24329102.DOC


Incorrect Error Synchronization:

FILD COUNT ; FPU instructionINC COUNT ; integer instruction alters operandFSQRT ; subsequent FPU instruction -- error

; from previous FPU instruction detected here

Proper Error Synchronization:

FILD COUNT ; FPU instructionFSQRT ; subsequent FPU instruction -- error from

; previous FPU instruction detected hereINC COUNT ; integer instruction alters operand

programs to execute correctly, with INC COUNTbeing executed in parallel in the processor, as longas no exceptions occur on the FILD instruction.However, if the code is later moved to anenvironment where exceptions are unmasked, thecode in the first example will not work correctly:

In some operating systems supporting the FPU, thenumeric register stack is extended to memory. Toextend the FPU stack to memory, the invalidexception is unmasked. A push to a full register orpop from an empty register sets SF (Stack Faultflag) and causes an invalid operation exception.The recovery routine for the exception mustrecognize this situation, fix up the stack, thenperform the original operation. The recovery routinewill not work correctly in the first example shown inthe figure. The problem is that the value of COUNTis incremented before the exception handler isinvoked, so that the recovery routine will load anincorrect value of COUNT, causing the program tofail or behave unreliably.

3.3.3 PROPER EXCEPTI ONSYNCHRONIZATION IN GENERAL

As explained before (see Section 3.2), if the FPUencounters an unmasked exception condition asoftware exception handler is invoked beforeexecution of the next WAIT or floating-pointinstruction. This is because an unmasked floating-point exception causes the processor to freezeimmediately before executing such an instruction(unless the IGNNE# input is active, or it is a “No-Wait” FPU instruction). Exactly when the exceptionhandler will be invoked (in the interval betweenwhen the exception is detected and the next WAITor FPU instruction) is dependent on the processorgeneration, the system, and which FPU instructionand exception is involved.

To be safe in exception synchronization, oneshould assume the handler will be invoked at theend of the interval. Thus the program should notchange any value that might be needed by thehandler (such as COUNT in the above example)until after the next FPU instruction following an FPUinstruction that could cause an error. If the programneeds to modify such a value before the next FPUinstruction (or if the next FPU instruction could alsocause an error), then a WAIT instruction should beinserted before the value is modified. This will forcethe handling of any exception before the value ismodified. A WAIT instruction should also be placedafter the last floating-point instruction in anapplication so that any unmasked exceptions will beserviced before the task completes.

3.4 FPU Exception HandlingExamples

There are many approaches to writing exceptionhandlers. One useful technique is to consider theexception handler procedure as consisting of"prologue," "body," and "epilogue" sections of code.

In the transfer of control to the exception handlerdue to an INTR, NMI, or SMI, external interruptshave been disabled by hardware. The prologueperforms all functions that must be protected frompossible interruption by higher-priority sources.Typically, this involves saving registers andtransferring diagnostic information from the FPU tomemory. When the critical processing has beencompleted, the prologue may re-enable interrupts toallow higher-priority interrupt handlers to preemptthe exception handler. The standard "prologue" notonly saves the registers and transfers diagnosticinformation from the FPU to memory but also clearsthe FP exception flags in the status word.Alternatively, when it is not necessary for the

AP-578

2/21/97 12:57 PM 24329102.DOC


19

handler to be re-entrant, another technique mayalso be used. In this technique, the exception flagsare not cleared in the "prologue" and the body ofthe handler must not contain any FP instructionsthat cannot complete execution when there is apending FP exception. (The “No-Wait” instructionsare discussed in Section 6.3.7 of the Pentium

Processor Family Developer’s Manual, Volume 3).Note that the handler must still clear the exceptionflag(s) before executing the IRET. If the exceptionhandler uses neither of these techniques thesystem will be caught in an endless loop of nestedFP exceptions, and hang.

The body of the exception handler examines thediagnostic information and makes a response thatis necessarily application-dependent. This responsemay range from halting execution, to displaying amessage, to attempting to repair the problem andproceed with normal execution. The epilogueessentially reverses the actions of the prologue,restoring the processor so that normal executioncan be resumed. The epilogue must not load anunmasked exception flag into the FPU or anotherexception will be requested immediately.

The following code examples show the ASM386and ASM486 coding of three skeleton exceptionhandlers, with the save spaces given as correct for32 bit protected mode. They show how prologuesand epilogues can be written for various situations,but the application dependent exception handlingbody is just indicated by comments showing whereit should be placed.

The first two are very similar; their only substantialdifference is their choice of instructions to save andrestore the FPU. The tradeoff here is between theincreased diagnostic information provided byFNSAVE and the faster execution of FNSTENV.(Also, after saving the original contents, FNSAVEre-initializes the FPU, while FNSTENV only masksall FPU exceptions.) For applications that aresensitive to interrupt latency or that do not need toexamine register contents, FNSTENV reduces theduration of the "critical region," during which theprocessor does not recognize anotherinterrupt request. (See the Pentium ProcessorFamily Developer’s Manual, Volume 3, Section6.2.1.6 for a complete description of the FPU saveimage.)

After the exception handler body, the epiloguesprepare the processor to resume execution from thepoint of interruption (i.e., the instruction followingthe one that generated the unmasked exception).Notice that the exception flags in the memoryimage that is loaded into the FPU are cleared tozero prior to reloading (in fact, in these examples,the entire status word image is cleared).

Example 1 and Example 2 assume that theexception handler itself will not cause an unmaskedexception. Where this is a possibility, the generalapproach shown in 3 can be employed. The basictechnique is to save the full FPU state and then toload a new control word in the prologue. Note thatconsiderable care should be taken when designingan exception handler of this type to prevent thehandler from being reentered endlessly.

Example 1. Full-State Exception Handler

SAVE_ALL PROC;; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU STATE IMAGE

PUSH EBP..MOV EBP, ESPSUB ESP, 108 ; ALLOCATES 108 BYTES (32-bit PROTECTED MODE SIZE)

;SAVE FULL FPU STATE, RESTORE INTERRUPT ENABLE FLAG (IF)FNSAVE [EBP-108]PUSH [EBP + OFFSET_TO_EFLAGS] ; COPY OLD EFLAGS TO STACK TOPPOPFD ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION

;; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE;; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY)

AP-578

20

2/21/97 12:57 PM 24329102.DOC


; RESTORE MODIFIED STATE IMAGEMOV BYTE PTR [EBP-104], 0HFRSTOR [EBP-108]

; DE-ALLOCATE STACK SPACE, RESTORE REGISTERSMOV ESP, EBP..POP EBP

;; RETURN TO INTERRUPTED CALCULATION

IRETDSAVE_ALL ENDP

Example 2. Reduced-Latency Exception Handler

SAVE_ENVIRONMENT PROC;; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU ENVIRONMENT


;SAVE ENVIRONMENT, RESTORE INTERRUPT ENABLE FLAG (IF)FNSTENV [EBP-28]PUSH [EBP + OFFSET_TO_EFLAGS] ; COPY OLD EFLAGS TO STACK TOPPOPFD ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION

;; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE;; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY); RESTORE MODIFIED ENVIRONMENT IMAGE

MOV BYTE PTR [EBP-24], 0HFLDENV [EBP-28]


;; RETURN TO INTERRUPTED CALCULATION

IRETDSAVE_ENVIRONMENT ENDP

Example 3. Reentrant Exception Handler

LOCAL_CONTROL DW ? ; ASSUME INITIALIZED..

REENTRANT PROC;; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU STATE IMAGE

AP-578

2/21/97 12:57 PM 24329102.DOC


21


; SAVE STATE, LOAD NEW CONTROL WORD, RESTORE INTERRUPT ENABLE FLAG (IF)FNSAVE [EBP-108]FLDCW LOCAL_CONTROLPUSH [EBP + OFFSET_TO_EFLAGS] ; COPY OLD EFLAGS TO STACK TOPPOPFD ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION

.

.;; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE. AN UNMASKED EXCEPTION; GENERATED HERE WILL CAUSE THE EXCEPTION HANDLER TO BE REENTERED.; IF LOCAL STORAGE IS NEEDED, IT MUST BE ALLOCATED ON THE STACK.;

.

.; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY); RESTORE MODIFIED STATE IMAGE

MOV BYTE PTR [EBP-104], 0HFRSTOR [EBP-108]


;; RETURN TO POINT OF INTERRUPTION

IRETDREENTRANT ENDP

AP-578

22

2/21/97 12:57 PM 24329102.DOC


3.5 Need for Preserving the Stateof IGNNE# Circuit if Use FPUand SMM

In Section 2.3.2 the recommended circuit (Figure 2)for MS-DOS compatible FPU exception handling forIntel486 processors and beyond contains two flipflops. When the FPU exception handler accessesI/O port 0F0H it clears the IRQ13 interrupt requestoutput from Flip Flop #1 and also clocks out theIGNNE# signal (active) from Flip Flop #2. Theassertion of IGNNE# may be used by the handler ifneeded to execute any FPU instruction whileignoring the pending FPU errors. The problem hereis that the state of Flip Flop #2 is effectively anadditional (but hidden) status bit that can affectprocessor behavior, and so ideally should be savedupon entering SMM, and restored before resumingto normal operation. If this is not done, and also theSMM code saves the FPU state, AND an FPU errorhandler is being used which relies on IGNNE#assertion, then (very rarely) the FPU handler willnest inside itself and malfunction. The followingexample shows how this can happen.

The problem will only occur if the processor entersSMM between the OUT and the FLDCWinstructions. But if that happens, AND the SMMcode saves the FPU state using FNSAVE, then theIGNNE# Flip Flop will be cleared (becauseFNSAVE clears the FPU errors and thus de-assertsFERR#). When the processor returns from SMM itwill restore the FPU state with FRSTOR, which willre-assert FERR#, but the IGNNE# Flip Flop will notget set. Then when the FPU error handler executesthe FLDCW instruction, the active error conditionwill cause the processor to re-enter the FPU errorhandler from the beginning. This may cause thehandler to malfunction.

Note that NMI (or any interrupt through INTR that isenabled inside the FPU exception handler) willcause this same problem, if the interrupt routinesaves and restores the FPU state, and it happensto occur between the OUT and the FLDCWinstructions. SMI is the main focus here because itis much more likely to invoke FNSAVE/FRSTORthan other interrupts because of 0V suspend (seebelow). The problem can easily be eliminated fromall interrupts besides SMI and NMI by not enablingINTR inside the FPU exception handler.

To avoid this problem, Intel recommends twomeasures:

1. Do not use the FPU for calculations insideSMM code (or code for NMI, or any otherinterrupts enabled inside the FPU exceptionhandler). (The normal power management,and sometimes security, functions provided bySMM have no need for FPU calculations; ifthey are needed for some special case, usescaling or emulation instead.) This eliminatesthe need to do FNSAVE/FRSTOR inside SMMcode, except when going into an 0V suspendstate (in which, in order to save power, theprocessor is turned off completely, requiring itscomplete state to be saved).

2. The system should not call upon SMM code toput the processor into 0V suspend while theprocessor is running FPU calculations, or justafter an interrupt has occurred. Normal powermanagement protocol avoids this by going intopower down states only after timed intervals inwhich no system activity occurs.

3.6 Considerations When FPUShared Between Tasks

The Intel Architecture allows speculative deferral offloating-point state swaps on task switches. Thisfeature allows postponing an FPU state swap untilan FPU instruction is actually encountered inanother task. Since kernel tasks rarely use floating-point, and some applications do not use floating-point or use it infrequently, the amount of timesaved by avoiding unnecessary stores of thefloating-point state is significant. Speculativedeferral of FPU saves does, however, place anextra burden on the kernel in three key ways:

1. The kernel must keep track of which threadowns the FPU, which may be different fromthe currently executing thread.

2. The kernel must associate any floating-pointexceptions with the generating task. Thisrequires special handling since floating-pointexceptions are delivered asynchronous withother system activity.

3. There are conditions under which spuriousfloating-point exception interrupts aregenerated, which the kernel must recognizeand discard.

AP-578

2/21/97 12:57 PM 24329102.DOC


23

Suppose that the FPU exception handler includes the following sequence:

FNSTSW save_sw ; save the FPU status word using a “No-Wait” FPU instruction

OUT 0F0H, AL ; clears IRQ13 & activates IGNNE#

. . . .

FLDCW new_cw ; loads new CW ignoring FPU errors, since IGNNE# is assumed active; or any ; other FPU instruction that is not a “No-Wait” type will cause the

same problem

. . . .

FCLEX ; clear the FPU error conditions & thus turn off FERR# & reset theIGNNE# FF

3.6.1 SPECULATIVELY DEFERRING FPUSAVES, GENERAL OVERVIEW

In order to support multi-tasking, each thread in thesystem needs a save area for the general purposeregisters, and each task that is allowed to usefloating-point needs an FPU save area largeenough to hold the entire FPU stack and associatedFPU state such as the control word and statusword. (See the Pentium Processor FamilyDeveloper’s Manual, Volume 3, Section 6.2.1.6 fora complete description of the FPU save image.)

On a task switch, the general purpose registers areswapped out to their save area for the suspendingthread, and the registers of the resuming thread areloaded. The FPU state does not need to be savedat this point. If the resuming thread does not usethe FPU before it is itself suspended, then both asave and a load of the FPU state has beenavoided. It is often the case that several threadsmay be executed without any usage of the FPU.

The processor supports speculative deferral of FPUsaves via interrupt 7 “Device Not Available” (DNA),used in conjunction with CR0 bit 3, the “TaskSwitched” bit (TS). (See the Pentium ProcessorFamily Developer’s Manual, Volume 3, Sections10.1.3 & 14.9.7) Every task switch via the hardwaresupported task switching mechanism (see Section13.5 of the Pentium Processor Family Developer’sManual, Volume 3) sets TS. Multi-threaded kernelsthat use software task switching4 can set the TS bitby reading CR0, ORing a ‘1’ into bit 3, and writingback CR05. Any subsequent floating-pointinstructions (now being executed in a new threadcontext) will fault via interrupt 7 before execution.

Footnotes4In a software task switch, the operating system

uses a sequence of instructions to save thesuspending thread’s state and restore theresuming thread’s state instead of the single long,noninterruptable task switch operation provided bythe Intel Architecture.

5 Although CR0, bit 2, the emulation flag (EM), alsocauses a DNA exception, do not use the EM bitas a surrogate for TS. EM means that no floating-point unit is available and that FP instructionsmust be emulated. Using EM to trap on taskswitches is not compatible with Intel ArchitectureMMX Technology. If the EM flag is set, MMXinstructions raise the invalid opcode exception.

AP-578

2/21/97 12:57 PM 24329102.DOC


24

This allows the DNA handler to save the oldfloating-point context and reload the FPU state forthe current thread. The handler should clear the TSbit before exit using the CLTS instruction. On returnfrom the handler the faulting thread will proceedwith its floating-point computation.

Some operating systems save the FPU context onevery task switch, typically because they alsochange the linear address space between tasks.The problem and its solution discussed below applyto these operating systems also.

3.6.2 TRACKING FPU OWNERSHIP

Since the contents of the FPU may not belong tothe currently executing thread, the thread identifierfor the last FPU user needs to be trackedseparately. This is not complicated -- the kernelshould simply provide a variable to store the threadidentifier of the FPU owner, separate from thevariable that stores the identifier for the currentlyexecuting thread. This variable is updated in theDNA exception handler, and is used by the DNAexception handler to find the FPU save areas of theold and new threads. A simplified flow for a DNAexception handler is then:

1. Use the ‘FPU Owner’ variable to find the FPUsave area of the last thread to use the FPU.

2. Save the FPU contents to the old thread’ssave area, typically using an FNSAVEinstruction.

3. Set the ‘FPU Owner’ variable to the identifythe currently executing thread.

4. Reload the FPU contents from the newthread’s save area, typically using anFRSTOR instruction.

5. Clear TS using the CLTS instruction and exitthe DNA exception handler.

While this flow covers the basic requirements forspeculatively deferred FPU state swaps, there aresome additional subtleties that need to be handledin a robust implementation.

3.6.3 INTERACTION OF FPU STATE SAVESAND FP EXCEPTION ASSOCIATION

Recall these key points from earlier in thisdocument: When considering FP exceptions acrossall implementations of the Intel Architecture, andacross all FP instructions, an FP exception can be

initiated from any time during the excepting FPinstruction, up to just before the next FP instruction.The ‘next’ FP instruction may be the FNSAVE usedto save the FPU state for a task switch. In the caseof “no-wait:” instructions such as FNSAVE, theinterrupt from a previously excepting instruction(NE=0 case) may arrive just before the “no-wait”instruction, during, or shortly thereafter with asystem dependent delay. Note that this implies thatan FP exception might be registered during thestate swap process itself, and the kernel and FPexception interrupt handler must be prepared forthis case.

A simple way to handle the case of exceptionsarriving during FPU state swaps is to allow thekernel to be one of the FPU owning threads. Areserved thread identifier is used to indicate kernelownership of the FPU. During an FP state swap,the ‘FPU owner’ variable should be set to indicatethe kernel as the current owner. At the completionof the state swap, the variable should be set toindicate the new owning thread. The numericexception handler needs to check the FPU ownerand discard any numeric exceptions that occurwhile the kernel is the FPU owner. A more generalflow for a DNA exception handler that handles thiscase is shown next:

AP-578

2/21/97 3:11 PM 24329102.DOC


25

DNA Handler Entry

Current Threadsame as

FPU Owner?

FPU Owner := Kernel

FNSAVE to Old Thread’sFP Save Area

(may cause numeric exception)

<other handler set up code>

<other handler code>

FPU Owner := Current Thread

FRSTOR from Current Thread’sFP Save Area

CLTS (clears CR0.TS)

Exit DNA Handler

No

Yes

<handler final clean-up>

Numeric exceptions received while the kernel owns the FPU for a state swap must be discarded in the kernelwithout being dispatched to a handler. A flow for a numeric exception dispatch routine is shown below:

Numeric Exception Entry

Is KernelFPU Owner?

Normal Dispatch toNumeric Exception Handler Exit

No

Yes

AP-578

2/21/97 12:57 PM 24329102.DOC


26

It may at first glance seem that there is a possibilityof FP exceptions being lost because of exceptionsthat are discarded during state swaps. This is notthe case, as the exception will be re-issued whenthe FP state is reloaded. Walking through stateswaps both with and without pending numericexceptions will clarify the operation of these twohandlers.

Case 1: FPU State Swap Without NumericException

Assume two threads ‘A’ and ‘B’, both using thefloating-point unit. Let A be the thread to have mostrecently executed a FP instruction, with no pendingnumeric exceptions. Let B be the currentlyexecuting thread. CR0.TS was set when thread Awas suspended. When B starts to execute a FPinstruction the instruction will fault with the DNAexception because TS is set.

At this point the handler is entered, and eventually itfinds that the current FPU Owner is not thecurrently executing thread. To guard the FPU stateswap from extraneous numeric exceptions, the FPUOwner is set to be the kernel. The old owner’s FPUstate is saved with FNSAVE, and the currentthread’s FPU state is restored with FRSTOR.Before exiting, the FPU owner is set to thread B,and the TS bit is cleared.

On exit, thread B resumes execution of the faultingFP instruction and continues.

Case 2: FPU State Swap with DiscardedNumeric Exception

Again, assume two threads ‘A’ and ‘B’, both usingthe floating-point unit. Let A be the thread to havemost recently executed a FP instruction, but thistime let there be a pending numeric exception. LetB be the currently executing thread. When B startsto execute a FP instruction the instruction will faultwith the DNA exception and enter the DNA handler.(If both numeric and DNA exceptions are pending,the DNA exception takes precedence, in order tosupport handling the numeric exception in its owncontext.)

When the FNSAVE starts, it will trigger an interruptvia FERR# because of the pending numericexception. After some system dependent delay, thenumeric exception handler is entered. It may beentered before the FNSAVE starts to execute, or itmay be entered shortly after execution of theFNSAVE. Since the FPU

Owner is the kernel, the numeric exception handlersimply exits, discarding the exception. The DNAhandler resumes execution, completing theFNSAVE of the old FP context of thread A and theFRSTOR of the FP context for thread B.

Thread A eventually gets an opportunity to handlethe exception that was discarded during the taskswitch. After some time, thread B is suspended,and thread A resumes execution. When thread Astarts to execute an FP instruction, once again theDNA exception handler is entered. B’s FPU state isFNSAVE’ed, and A’s FPU state is FRSTOR’ed.Note that in restoring the FPU state from A’s savearea, the pending numeric exception flags arereloaded in to the FP status word. Now when theDNA exception handler returns, thread A resumesexecution of the faulting FP instruction just longenough to immediately generate a numericexception, which now gets handled in the normalway. The net result is that the task switch andresulting FPU state swap via the DNA exceptionhandler causes an ‘extra’ numeric exception whichcan be safely discarded.

3.6.4 INTERRUPT ROUTING FROM THEKERNEL

In MS-DOS, an application that wishes to handlenumeric exceptions hooks interrupt 2 by placing itshandler address in the interrupt vector table, andexiting via a jump to the previous interrupt 2handler. Protected mode systems that run MS-DOSprograms under a subsystem can emulate thisexception delivery mechanism. For example,assume a protected mode O.S. that runs withCR.NE = 1, and that runs MS-DOS programs in avirtual machine subsystem. The MS-DOS programis set up in a virtual machine that provides avirtualized interrupt table. The MS-DOS applicationhooks interrupt 2 in the virtual machine in thenormal way. A numeric exception will trap to thekernel via the real INT 16 residing in the kernel atring 0. The INT 16 handler in the kernel thenlocates the correct MS-DOS virtual machine, andreflects the interrupt to the virtual machine monitor.The virtual machine monitor then emulates aninterrupt by jumping through the address in thevirtualized interrupt table, eventually reaching theapplication’s numeric exception handler.

AP-578

2/21/97 12:57 PM 24329102.DOC


27

4.0 DIFFERENCES FOR HANDLERSUSING NATIVE MODE

The 8087 has a pin INT which it asserts when anunmasked exception occurs. But there is nointerrupt input pin in the 8086 or 8088 dedicated toits attachment, nor an interrupt vector number in the8086 or 8088 specific for an FPU error assertion.But beginning with the Intel 80286 and 80287,hardware connections were dedicated to supportthe FPU exception, and interrupt vector 16assigned to it.

4.1 Origin with 80286 and 80287;Intel386™ Processor andIntel387 Math Coprocessor

The 80286 and 80287 and Intel386 processor andIntel387 math coprocessor pairs are each providedwith ERROR# pins that are recommended to beconnected between the processor and FPU. If thisis done, when an unmasked FPU exception occurs,the FPU records the exception, and asserts itsERROR# pin. The processor recognizes this activecondition of the ERROR# status line immediatelybefore execution of the next WAIT or FPUinstruction (except for the “N0-Wait” type) in itsinstruction stream, and branches to the routine atinterrupt vector 16. Thus an FPU exception will behandled before any other FPU instruction (after theone causing the error) is executed (except for “No-Wait” instructions, which will be executed withouttriggering the FPU exception interrupt, but it willremain pending).

Using the dedicated interrupt 16 for FPU exceptionhandling is referred to as the native mode. It is thesimplest approach, and the one recommendedmost highly by Intel.

4.2 Changes with Intel486 ,Pentium and Pentium ProProcessors with CR0.NE=1

With these latest three generations of the IntelArchitecture, more enhancements and speedupfeatures have been added to the correspondingFPUs. Also, the FPU is now built into the same chipas the processor, which allows further increases inthe speed at which the FPU can operate as part ofthe integrated system. This also means that thenative mode of FPU exception handling, selectedby setting bit NE of register CR0 to 1, is nowentirely internal.

If an unmasked exception occurs during an FPUinstruction, the FPU records the exceptioninternally, and triggers the exception handlerthrough interrupt 16 immediately before executionof the next WAIT or FPU instruction (except for “No-Wait” instructions, which will be executed asdescribed in Section 4.1 above).

An unmasked numerical exception causes theFERR# output to be activated even with NE=1, andat exactly the same point in the program flow as itwould have been asserted if NE were zero.However, the system would not connect FERR# toa PIC to generate INTR when operating in thenative, internal mode. (If the hardware of a systemhas FERR# connected to trigger IRQ13 in order tosupport MS-DOS, but an OS using the native modeis actually running the system, it is the OSsresponsibility to make sure that IRQ13 is notenabled in the slave PIC.) With this configuration asystem is immune to the problem discussed inSection 2.3.3, where for Intel486 and Pentiumprocessors a “No-Wait” FPU instruction can get anFPU exception.

4.3 Considerations When FPUShared Between Tasks UsingNative Mode

The protocols recommended in Section 3.6 forMS-DOS compatible FPU exception handlers thatare shared between tasks may be used withoutchange with the native mode. However, theprotocols for a handler written specifically for nativemode can be simplified, because the problem of aspurious floating-point exception interrupt occurringwhile the kernel is executing cannot happen innative mode.

The problem as actually found in practical code in aMS-DOS compatible system happens when theDNA handler uses FNSAVE to switch FPUcontexts. If an FPU exception is active, thenFNSAVE triggers FERR# briefly, which usually willcause the FPU exception handler to be invokedinside the DNA handler. In native mode, neitherFNSAVE nor any other “No-Wait” instructions cantrigger interrupt 16. (As discussed above, FERR#gets asserted independent of the value of the NEbit, but when NE=1, the OS should not enable itspath through the PIC.) Another possible (very rare)way a floating-point exception interrupt could occurwhile the kernel is executing is by an FPUimmediate exception case having its interruptdelayed by the external hardware until executionhas switched to the kernel. This also cannot

AP-578

2/21/97 12:57 PM 24329102.DOC


28

happen in native mode because there is no delaythrough external hardware.

Thus the native mode FPU exception handler canomit the test to see if the kernel is the FPU owner,and the DNA handler for a native mode system canomit the step of setting the kernel as the FPUowner at the handler’s beginning. Since howeverthese simplifications are minor and save little code,it would be a reasonable and conservative habit (aslong as the MS-DOS compatible mode is widelyused) to include these steps in all systems.

Note that the special DP (Dual Processing) modefor Pentium processors, and also the more generalIntel MultiProcessor Specification for systems withmultiple Pentium or Pentium Pro processors,support FPU exception handling only in the nativemode. Intel does not recommend using theMS-DOS compatible FPU mode for systems usingmore than one processor.

Documents

Software and Hardware Considerations for FPU Exception ...cdn.preterhuman.net/texts/computing/intel_cpu... · 2/21/97 12:57 PM 24329102.DOC INTEL CONFIDENTIAL (until publication date)