C2. Fixed Point and Floating Point Operations


  • 8/18/2019 C2. Fixed Point and Floating Point Operations

    1/71

    Table of Contents

    Chapter - 1  Basic Structure of Computer
    Chapter - 2  Fixed Point and Floating Point Operations
    Chapter - 3  Basic Processing Unit
    Chapter - 4  Pipelining
    Chapter - 5  Memory System
    Chapter - 6  I/O Organization
    Appendix - A  Proofs

    Features of Book

    * Use of clear, plain and lucid language making the understanding very easy.
    * Book provides detailed insight into the subject.
    * Approach of the book resembles classroom teaching.
    * Excellent theory well supported with practical examples and illustrations.
    * Neat and to-the-scale diagrams for easier understanding of concepts.


    Syllabus (Computer Organization and Architecture)

    1. Basic Structure of Computers (Chapter - 1, 2)
    Functional units - Basic operational concepts - Bus structures - Performance and metrics - Instructions and instruction sequencing - Hardware - software interface - Instruction set architecture - Addressing modes - RISC - CISC. ALU design - Fixed point and floating point operations.

    2. Basic Processing Unit (Chapter - 3)
    Fundamental concepts - Execution of a complete instruction - Multiple bus organization - Hardwired control - Micro programmed control - Nano programming.

    3. Pipelining (Chapter - 4)
    Basic concepts - Data hazards - Instruction hazards - Influence on instruction sets - Data path and control considerations - Performance considerations - Exception handling.

    4. Memory System (Chapter - 5)
    Basic concepts - Semiconductor RAM - ROM - Speed - Size and cost - Cache memories - Improving cache performance - Virtual memory - Memory management requirements - Associative memories - Secondary storage devices.

    5. I/O Organization (Chapter - 6)
    Accessing I/O devices - Programmed input/output - Interrupts - Direct memory access - Buses - Interface circuits - Standard I/O Interfaces (PCI, SCSI, USB), I/O devices and processors.


    Table of Contents (Detail)

    1.1 Introduction
    1.2 Computer Types
    1.3 Functional Units
    1.3.1 Input Unit
    1.3.2 Memory Unit
    1.3.3 Arithmetic and Logic Unit
    1.3.4 Output Unit
    1.3.5 Control Unit
    1.4 Basic Operational Concepts
    1.5 Bus Structures
    1.5.1 Single Bus Structure
    1.5.2 Multiple Bus Structures
    1.6 Software
    1.7 Performance
    1.7.1 Processor Clock
    1.7.2 CPU Time
    1.7.3 Performance Metrics
    1.7.3.1 Hardware Software Interface
    1.7.3.2 Other Performance Measures
    1.7.4 Performance Measurement
    1.8 Instructions and Instruction Sequencing
    1.8.1 Register Transfer Notation
    1.8.2 Assembly Language Notation
    1.8.3 Basic Instruction Types
    1.8.3.1 Three Address Instructions
    1.8.3.2 Two Address Instructions
    1.8.3.3 One Address Instructions
    1.8.3.4 Zero Address Instructions
    1.8.4 Instruction Execution and Straight-Line Sequencing


    1.8.5 Branching
    1.8.6 Conditional Codes
    1.8.7 Generating Memory Addresses
    1.9 Instruction Set Architecture
    1.10 RISC-CISC
    1.10.1 RISC Versus CISC
    1.11 Addressing Modes
    1.11.1 Implementation of Variables and Constants
    1.11.2 Indirection and Pointers
    1.11.3 Indexing and Arrays
    1.11.4 Relative Addressing
    1.11.5 Additional Modes
    1.11.6 RISC Addressing Modes
    Review Questions
    University Questions with Answers

    Chapter - 2  Fixed Point and Floating Point Operations

    2.1 Introduction
    2.2 Addition and Subtraction of Signed Numbers
    2.2.1 Adders
    2.2.1.1 Half-Adder
    2.2.1.2 Full-Adder
    2.2.2 Serial Adder
    2.2.3 Parallel Adder
    2.2.4 Parallel Subtractor
    2.2.5 Addition / Subtraction Logic Unit
    2.3 Design of Fast Adders
    2.3.1 Carry-Lookahead Adders
    2.4 Multiplication of Positive Numbers
    2.5 Signed Operand Multiplication
    2.5.1 Booth's Algorithm
    2.6 Fast Multiplication
    2.6.1 Bit-Pair Recoding of Multipliers
    2.7 Integer Division
    2.7.1 Restoring Division
    2.7.2 Non-restoring Division
    2.7.3 Comparison between Restoring and Non-restoring Division Algorithm
    2.8 Floating Point Numbers and Operations


    Fixed Point and Floating Point Operations

    2.1 Introduction

    This chapter explains the addition and subtraction of signed numbers and the implementation of their circuits. We will also see the design of fast adders. Multiplication algorithms for unsigned and signed numbers are covered next, followed by the Booth algorithm, a technique that works equally well for both positive and negative multipliers. Then the algorithms for integer division are explained. Finally, the chapter discusses floating point numbers and the basic arithmetic operations on them.

    Based on the number system, two basic data types are implemented in a computer system: fixed point numbers and floating point numbers. Representing numbers in these data types is commonly known as fixed point representation and floating point representation, respectively.

    In the binary number system, a number can be represented as an integer or a fraction. Depending on the design, the hardware can interpret a number as an integer or as a fraction. The radix point is never explicitly specified; it is implicit in the design, and the hardware interprets it accordingly. In integer numbers, the radix point is fixed and assumed to be to the right of the rightmost digit. Because the radix point is fixed, the number system is referred to as a fixed point number system. With a fixed point number system we can represent positive or negative integer numbers. A floating point number system, however, allows the representation of numbers having both an integer part and a fractional part.
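    To make the implicit radix point concrete, the short Python sketch below reads one bit pattern under two assumed conventions: radix point just right of the rightmost bit (an integer) and radix point just left of the leftmost bit (a pure fraction). The function names are illustrative only; in hardware the convention is fixed by the design rather than chosen at run time.

```python
def as_integer(bits: str) -> int:
    """Radix point assumed just right of the rightmost bit."""
    return int(bits, 2)


def as_fraction(bits: str) -> float:
    """Radix point assumed just left of the leftmost bit (pure fraction)."""
    return int(bits, 2) / (1 << len(bits))


pattern = "0110"
print(as_integer(pattern))   # 6
print(as_fraction(pattern))  # 0.375
```

    The stored bits are identical in both cases; only the assumed position of the radix point changes the value.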

    2.2 Addition and Subtraction of Signed Numbers

    We can relate the addition and subtraction operations of numbers by the following relationships :

    $$(\pm A) - (+B) = (\pm A) + (-B)$$

    and

    $$(\pm A) - (-B) = (\pm A) + (+B)$$


    Computer Organization & Architecture    2-2    Fixed Point and Floating Point Operations

    Therefore, we can change a subtraction operation to an addition operation by changing the sign of the subtrahend. Let us see how we can represent negative numbers in the binary system.

    1's Complement Representation

    The 1's complement of a binary number is the number that results when we change the 1s to zeros and the zeros to ones.

    Example 2.1 : Find 1's complement of (1 1 0 1)2.

    Solution :

        1 1 0 1  ← number
        0 0 1 0  ← 1's complement

    Example 2.2 : Find 1's complement of (1 0 1 1 1 0 0 1)2.

    Solution :

        1 0 1 1 1 0 0 1  ← number
        0 1 0 0 0 1 1 0  ← 1's complement
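    The rule used in Examples 2.1 and 2.2 amounts to inverting every bit. A minimal Python sketch, assuming the number is given as a string of '0' and '1' characters (an input format chosen here for illustration):

```python
def ones_complement(bits: str) -> str:
    """Change every 1 to 0 and every 0 to 1."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


print(ones_complement("1101"))      # 0010      (Example 2.1)
print(ones_complement("10111001"))  # 01000110  (Example 2.2)
```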

    2's Complement Representation

    The 2's complement is the binary number that results when we add 1 to the 1's complement. It is given as

    2's complement = 1's complement + 1

    The 2's complement form is used to represent negative numbers.

    Example 2.3 : Find 2's complement of (1 0 0 1)2.

    Solution :

        1 0 0 1  ← number
        0 1 1 0  ← 1's complement
      +       1
        0 1 1 1  ← 2's complement

    Example 2.4 : Find 2's complement of (1 0 1 0 0 0 1 1)2.

    Solution :

        1 0 1 0 0 0 1 1  ← number
        0 1 0 1 1 1 0 0  ← 1's complement
      +               1
        0 1 0 1 1 1 0 1  ← 2's complement
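    Examples 2.3 and 2.4 follow the rule 2's complement = 1's complement + 1. The sketch below applies it to a bit string of fixed width. The modulo step, which discards a carry out of the MSB (this happens only for an all-zero input), is a detail added here so the result always keeps the original width.

```python
def twos_complement(bits: str) -> str:
    """2's complement = 1's complement + 1, kept to the original width."""
    width = len(bits)
    ones = "".join("1" if bit == "0" else "0" for bit in bits)
    value = (int(ones, 2) + 1) % (1 << width)  # drop any carry out of the MSB
    return format(value, f"0{width}b")


print(twos_complement("1001"))      # 0111      (Example 2.3)
print(twos_complement("10100011"))  # 01011101  (Example 2.4)
```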

    Let us see the subtraction of binary numbers using the 1's complement and 2's complement number representations.


    Binary Arithmetic - Negative Numbers in 1's Complement Form

    Case 1 (Both Positive) : Add (28)10 and (15)10

    Converting 28 to binary by repeated division by 2 :

        28 ÷ 2 = 14, remainder 0  ← LSD
        14 ÷ 2 = 7,  remainder 0
         7 ÷ 2 = 3,  remainder 1
         3 ÷ 2 = 1,  remainder 1
         1 ÷ 2 = 0,  remainder 1  ← MSD

    We have (011100)2 → (28)10 and (01111)2 → (15)10

    Addition of 28 and 15 :

          0 1 1 1 0 0       Carry
          0 0 1 1 1 0 0     (28)10
        + 0 0 0 1 1 1 1     (15)10
        ---------------
          0 1 0 1 0 1 1     (43)10

    Note : Here, the magnitude of the greater number is 5-bit; however, the magnitude of the result is 6-bit. Therefore, the numbers are sign-extended to 7 bits.

    Case 2 (Smaller Negative) : Add (28)10 and (-15)10

    We have (011100)2 → (28)10 and (01111)2 → (15)10, and (10000)2 → 1's complement of 15


    Verification :

          1 0 1 0 1 0 0
          0 1 0 1 0 1 1   → (43)10

    Note : Here, the magnitude of the greater number is 5-bit; however, the magnitude of the result is 6-bit. Therefore, the numbers are sign-extended to 7 bits.

    For a proper result, we suggest extending the number having the greater magnitude by one sign bit and representing the number having the smaller magnitude with the same extended number of bits.
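    The four 1's complement cases can be checked with a small Python model using 7-bit operands, as in the sign-extended examples. This sketch assumes the standard 1's complement rule for a carry out of the sign bit: the carry is added back into the least significant bit (end-around carry). The helper names are hypothetical.

```python
def ones_encode(value: int, width: int) -> int:
    """Return the width-bit 1's complement pattern of a signed integer."""
    mask = (1 << width) - 1
    return value if value >= 0 else ~(-value) & mask


def ones_decode(bits: int, width: int) -> int:
    """Interpret a width-bit 1's complement pattern as a signed integer."""
    mask = (1 << width) - 1
    return -(~bits & mask) if (bits >> (width - 1)) & 1 else bits


def ones_add(a: int, b: int, width: int) -> int:
    """Add two 1's complement patterns, folding any carry-out back into the LSB."""
    mask = (1 << width) - 1
    total = a + b
    if total > mask:                 # a carry came out of the sign bit
        total = (total & mask) + 1   # end-around carry
    return total


W = 7  # both operands sign-extended to 7 bits, as in the text
for x, y in [(28, 15), (28, -15), (-28, 15), (-28, -15)]:
    s = ones_add(ones_encode(x, W), ones_encode(y, W), W)
    print(f"({x}) + ({y}) = {s:07b} -> {ones_decode(s, W)}")
```

    The pattern printed for (-28) + (-15) is 1010100, the same pattern verified above as the 1's complement of 0101011, that is, 43.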

    Binary Arithmetic - Negative Numbers in 2's Complement Form

    Case 1 (Both Positive) : Add (28)10 and (15)10

    We have (011100)2 → (28)10 and (01111)2 → (15)10

    Addition of 28 and 15 (both sign-extended to 7 bits) :

          0 0 1 1 1 0 0   (28)10
        + 0 0 0 1 1 1 1   (15)10
        ---------------
          0 1 0 1 0 1 1   (43)10

    Case 2 (Smaller Negative) : Add (28)10 and (-15)10

    We have (011100)2 → (28)10 and (01111)2 → (15)10, and (10001)2 → 2's complement of 15

    Addition of 28 and -15 (both sign-extended to 7 bits) :

          0 0 1 1 1 0 0   (28)10
        + 1 1 1 0 0 0 1   (-15)10
        ---------------
        1 0 0 0 1 1 0 1   → ignore the carry: 0 0 0 1 1 0 1 = (13)10

    Case 3 (Greater Negative) : Add (-28)10 and (15)10

    We have (011100)2 → (28)10 and (01111)2 → (15)10, and (100100)2 → 2's complement of 28


    Addition of (-28) and (15) (both sign-extended to 7 bits) :

          1 1 0 0 1 0 0   (-28)10
        + 0 0 0 1 1 1 1   (15)10
        ---------------
          1 1 1 0 0 1 1   (-13)10, result in 2's complement form

    Verification :

          1 1 1 0 0 1 1
          0 0 0 1 1 0 0   ← 1's complement
        +             1
        ---------------
          0 0 0 1 1 0 1   → (13)10

    Case 4 (Both Negative) : Add (-28)10 and (-15)10

    We have (011100)2 → (28)10 and (01111)2 → (15)10, (100100)2 → 2's complement of 28, and (10001)2 → 2's complement of 15

    Addition of -28 and -15 (both sign-extended to 7 bits) :

          1 1 0 0 1 0 0   (-28)10
        + 1 1 1 0 0 0 1   (-15)10
        ---------------
        1 1 0 1 0 1 0 1   → ignore the carry: 1 0 1 0 1 0 1 = (-43)10

    The result is in 2's complement form.

    Verification :

          1 0 1 0 1 0 1
          0 1 0 1 0 1 0   ← 1's complement
        +             1
        ---------------
          0 1 0 1 0 1 1   → (43)10
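    The four 2's complement cases can be reproduced with a small Python model. It assumes 7-bit operands, as in the sign-extended examples, and relies on the fact that Python's `&` applied to a negative int yields its 2's complement bit pattern. The helper names are hypothetical.

```python
def twos_encode(value: int, width: int) -> int:
    """Width-bit 2's complement pattern (Python's & masks negatives correctly)."""
    return value & ((1 << width) - 1)


def twos_decode(bits: int, width: int) -> int:
    """Interpret a width-bit pattern as a signed 2's complement integer."""
    return bits - (1 << width) if (bits >> (width - 1)) & 1 else bits


def twos_add(a: int, b: int, width: int) -> int:
    """Plain binary addition; the carry out of the sign bit is ignored."""
    return (a + b) & ((1 << width) - 1)


W = 7
for x, y in [(28, 15), (28, -15), (-28, 15), (-28, -15)]:
    s = twos_add(twos_encode(x, W), twos_encode(y, W), W)
    print(f"({x}) + ({y}) = {s:07b} -> {twos_decode(s, W)}")
```

    Unlike the 1's complement sketch, no end-around step is needed: masking to 7 bits is exactly the "ignore carry" step shown in Cases 2 and 4.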


    2.2.1 Adders

    Digital computers perform various arithmetic operations. The most basic operation, no doubt, is the addition of two binary digits. This simple addition consists of four possible elementary operations, namely,

        0 + 0 = 0
        0 + 1 = 1
        1 + 0 = 1
        1 + 1 = (10)2

    The first three operations produce a sum whose length is one digit, but when the last operation is performed the sum is two digits. The higher significant bit of this result is called the carry, and the lower significant bit is called the sum. The logic circuit which performs this operation is called a half-adder. The circuit which performs the addition of three bits (two significant bits and a previous carry) is a full-adder. Let us see the logic circuits that perform half-adder and full-adder operations.

    2.2.1.1 Half-Adder

    The half-adder operation needs two binary inputs, the augend and addend bits, and two binary outputs, sum and carry. The truth table shown in Table 2.1 gives the relation between the input and output variables for the half-adder operation.

        Inputs        Outputs
      A      B      Carry   Sum
      0      0        0      0
      0      1        0      1
      1      0        0      1
      1      1        1      0

    Table 2.1 Truth table for half-adder

    Fig. 2.1 Block schematic of half-adder (inputs A and B; outputs Carry and Sum)

    K-map simplification for carry and sum :

    $$\text{Carry} = AB$$

    $$\text{Sum} = \bar{A}B + A\bar{B} = A \oplus B$$


    Fig. 2.3 Logic diagram for half-adder
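    The half-adder equations Carry = AB and Sum = A ⊕ B map directly onto Python's bitwise AND and XOR operators. This sketch regenerates Table 2.1 from them; `half_adder` is an illustrative name.

```python
def half_adder(a: int, b: int):
    """Return (carry, sum) for two input bits: Carry = AB, Sum = A xor B."""
    return a & b, a ^ b


print("A B | Carry Sum")
for a in (0, 1):
    for b in (0, 1):
        carry, s = half_adder(a, b)
        print(f"{a} {b} |   {carry}    {s}")
```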

    Limitations of Half-Adder : In multidigit addition we have to add two bits along with the carry of the previous digit addition. Effectively, such addition requires the addition of three bits. This is not possible with a half-adder. Hence half-adders alone are not used for multidigit addition in practice.

    2.2.1.2 Full-Adder

    A full-adder is a combinational circuit that forms the arithmetic sum of three input bits. It consists of three inputs and two outputs. Two of the input variables, denoted by A and B, represent the two significant bits to be added. The third input, Cin, represents the carry from the previous lower significant position. The truth table for the full-adder is shown in Table 2.2.

            Inputs             Outputs
      A      B      Cin      Carry   Sum
      0      0      0          0      0
      0      0      1          0      1
      0      1      0          0      1
      0      1      1          1      0
      1      0      0          0      1
      1      0      1          1      0
      1      1      0          1      0
      1      1      1          1      1

    Table 2.2 Truth table for full-adder

    Fig. 2.4 Block schematic of full-adder

    K-map simplification for carry and sum :

    $$C_{out} = AB + AC_{in} + BC_{in}$$


    Fig. 2.6 Sum of product implementation of full-adder

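    The Boolean function for sum can be further simplified as follows :

    $$
    \begin{aligned}
    \text{Sum} &= \bar{A}\bar{B}C_{in} + \bar{A}B\bar{C}_{in} + A\bar{B}\bar{C}_{in} + ABC_{in} \\
    &= C_{in}(\bar{A}\bar{B} + AB) + \bar{C}_{in}(\bar{A}B + A\bar{B}) \\
    &= C_{in}(A \odot B) + \bar{C}_{in}(A \oplus B) \\
    &= C_{in}(\overline{A \oplus B}) + \bar{C}_{in}(A \oplus B) \\
    &= C_{in} \oplus (A \oplus B)
    \end{aligned}
    $$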
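    The simplified equations can be checked exhaustively against Table 2.2. In this sketch Cout uses the minimal sum-of-products form AB + A·Cin + B·Cin, and each row asserts that (Cout, Sum) is the 2-bit binary value of A + B + Cin. The function name is illustrative.

```python
def full_adder(a, b, cin):
    """Return (carry, sum): Sum = Cin xor (A xor B), Cout = AB + A.Cin + B.Cin."""
    s = cin ^ (a ^ b)
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


# Rebuild Table 2.2 and confirm each row is the binary value of A + B + Cin.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
            print(a, b, cin, "|", cout, s)
```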

    With this simplified Boolean function, the circuit for the full-adder can be implemented as shown in Fig. 2.7.

    Fig. 2.7 Implementation of full-adder

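    A full-adder can also be implemented with two half-adders and one OR gate, as shown in Fig. 2.8. The sum output from the second half-adder is the exclusive-OR of Cin and the output of the first half-adder, giving

    $$
    \begin{aligned}
    \text{Sum} &= C_{in} \oplus (A \oplus B) \\
    &= C_{in} \oplus (\bar{A}B + A\bar{B}) \\
    &= C_{in}(\overline{\bar{A}B + A\bar{B}}) + \bar{C}_{in}(\bar{A}B + A\bar{B}) \\
    &= C_{in}(\overline{\bar{A}B} \cdot \overline{A\bar{B}}) + \bar{C}_{in}(\bar{A}B + A\bar{B}) \\
    &= C_{in}[(A + \bar{B})(\bar{A} + B)] + \bar{C}_{in}(\bar{A}B + A\bar{B})
    \end{aligned}
    $$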
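    The two-half-adder construction can be modelled the same way: the first half-adder adds A and B, the second adds Cin to their sum bit, and an OR gate merges the two carries. A sketch with hypothetical function names:

```python
def half_adder(a, b):
    return a & b, a ^ b          # (carry, sum)


def full_adder_two_half_adders(a, b, cin):
    c1, s1 = half_adder(a, b)    # first half-adder: A + B
    c2, s = half_adder(s1, cin)  # second half-adder: (A xor B) + Cin
    return c1 | c2, s            # OR gate merges the two carries


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder_two_half_adders(a, b, cin)
            print(a, b, cin, "|", cout, s)
```

    The two carries can never both be 1 (c1 = 1 forces A ⊕ B = 0 and hence c2 = 0), which is why a simple OR gate is enough to combine them.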


    Fig. 2.10 Block diagram of n-bit parallel adder

    It should be noted that either a half-adder can be used for the least significant position, or the carry input of a full-adder is made 0, because there is no carry into the least significant bit position.
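    The carry chain of the n-bit parallel adder in Fig. 2.10 can be simulated with a loop in which each full-adder stage consumes the carry from the stage below. This is a behavioural sketch, not a timing model; it sets the LSB carry-in to 0 as described above, and the function names are illustrative.

```python
def full_adder(a, b, cin):
    """One full-adder stage: returns (carry_out, sum)."""
    return (a & b) | (cin & (a ^ b)), a ^ b ^ cin


def ripple_carry_add(a: int, b: int, n: int, c0: int = 0):
    """n-bit parallel (ripple-carry) adder built from n full-adder stages.

    Stage i receives the carry produced by stage i - 1, so the carry
    ripples from the LSB to the MSB. c0 = 0 because nothing is carried
    into the least significant position.
    """
    carry, result = c0, 0
    for i in range(n):
        carry, s = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry   # n-bit sum and final carry-out C_n


print(ripple_carry_add(0b0101, 0b0011, 4))  # (8, 0)
```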

    2.2.4 Parallel Subtractor

    The subtraction of binary numbers can be done most conveniently by means of complements. The subtraction A - B can be done by taking the 2's complement of B and adding it to A. The 2's complement can be obtained by taking the 1's complement and adding one to the least significant pair of bits. The 1's complement can be implemented with inverters, and a one can be added to the sum through the input carry to get the 2's complement, as shown in Fig. 2.11.

    Fig. 2.11 4-bit parallel subtractor (input carry C0 = 1)
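    The parallel subtractor reuses the same adder: invert B and feed 1 into the input carry. The sketch below models that with Python integers as n-bit patterns. It also returns the final carry output, since for unsigned operands that carry is 1 exactly when A ≥ B (an extra observation, not stated in the text). The function name is illustrative.

```python
def parallel_subtract(a: int, b: int, n: int):
    """A - B as A + (1's complement of B) + 1 on an n-bit ripple adder.

    Returns (n-bit difference, final carry). For unsigned operands the
    final carry is 1 exactly when A >= B.
    """
    b_inv = ~b & ((1 << n) - 1)          # inverters: 1's complement of B
    carry, result = 1, 0                 # input carry C0 = 1 adds the final 1
    for i in range(n):
        ai, bi = (a >> i) & 1, (b_inv >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result, carry


print(parallel_subtract(7, 3, 4))  # (4, 1)
print(parallel_subtract(3, 7, 4))  # (12, 0): 1100 is -4 in 4-bit 2's complement
```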

    2.2.5 Addition / Subtraction Logic Unit

    Fig. 2.12 shows hardware to implement integer addition and subtraction. It consists of an n-bit adder, a 2's complement circuit and an overflow detector logic circuit. Number a and number b are the two inputs for the n-bit adder. For subtraction, the subtrahend (number b) is converted into its 2's complement form by making the Add/Subtract control signal logic one. When the Add/Subtract control signal is one, all bits of number b are complemented and carry zero (C0) is set to one. Therefore, the n-bit adder gives the result as $R = a + \bar{b} + 1$, where $\bar{b} + 1$ represents the 2's complement of number b. Fig. 2.12 (b) shows the implementation of the logical expression to detect overflow.

    Fig. 2.12 (a) Hardware for integer addition and subtraction (with Add/Subtract control input)

    Fig. 2.12 (b) Overflow detector logic circuit
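    A behavioural model of the add/subtract unit follows. It assumes the common sign-based overflow rule, which may differ in form from the gates of Fig. 2.12 (b): overflow occurs when both adder inputs have the same sign and the result's sign differs (equivalently, when the carries into and out of the sign position differ). Function names are hypothetical.

```python
def add_sub_unit(a: int, b: int, subtract: bool, n: int):
    """Model of an n-bit add/subtract unit with an overflow flag.

    a and b are n-bit 2's complement patterns. When subtract is True every
    bit of b is complemented and C0 = 1, so the adder forms a + ~b + 1.
    """
    mask = (1 << n) - 1
    y = (~b & mask) if subtract else b     # adder's second input
    c0 = 1 if subtract else 0
    r = (a + y + c0) & mask                # n-bit adder; carry-out is dropped
    a_s = (a >> (n - 1)) & 1               # sign bits of both adder inputs
    y_s = (y >> (n - 1)) & 1
    r_s = (r >> (n - 1)) & 1               # and of the result
    # Assumed overflow rule: inputs share a sign but the result's sign differs.
    overflow = a_s == y_s and r_s != a_s
    return r, overflow


def to_signed(bits: int, n: int) -> int:
    return bits - (1 << n) if (bits >> (n - 1)) & 1 else bits


r, v = add_sub_unit(0b0011, 0b0101, True, 4)   # 3 - 5
print(to_signed(r, 4), v)                      # -2 False
r, v = add_sub_unit(0b0111, 0b0001, False, 4)  # 7 + 1 does not fit in 4 bits
print(to_signed(r, 4), v)                      # -8 True
```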


    2.3 Design of Fast Adders

    The n-bit adder discussed in the last section is implemented using full-adder stages, in which the carry output of each full-adder stage is connected to the carry input of the next higher-order stage. Therefore, the sum and carry outputs of any stage cannot be produced until the input carry occurs; this leads to a time delay in the addition process. This delay is known as carry propagation delay, which can be explained by considering the following addition.

        0 1 0 1
      + 0 0 1 1
      ---------
        1 0 0 0

    Addition of the LSB position produces a carry into the second position. This carry, when added to the bits of the second position (stage), produces a carry into the third position. The latter carry, when added to the bits of the third position, produces a carry into the last position. The key thing to notice in this example is that the sum bit generated in the last position (MSB) depends on the carry that was generated by the addition in the previous positions. This means that the adder will not produce the correct result until the LSB carry has propagated through the intermediate full-adders. This represents a time delay that depends on the propagation delay produced in each full-adder. For example, if each full-adder is considered to have a propagation delay of 30 ns, then S3 will not reach its correct value until 90 ns after the LSB carry is generated. Therefore, the total time required to perform the addition is 90 + 30 = 120 ns. Obviously, this situation becomes much worse if we extend the adder circuit to add a greater number of bits. If the adder were handling 16-bit numbers, the carry propagation delay could be 480 ns.

    Generally, carry propagation delay and sum propagation delay are measured in terms of gate delays. Looking at Fig. 2.8, we can notice that for a full-adder, Cout requires two gate delays and sum requires only one gate delay. When we connect such full-adder circuits in cascade to generate an n-bit ripple adder, as shown in Fig. 2.10, $C_{n-1}$ is available in $2(n-1)$ gate delays, and $S_{n-1}$ is correct one XOR gate delay later. The final carry-out, $C_n$, is available after $2n$ gate delays. Thus for a 4-bit ripple adder, $C_4$ is available after 8 (2 × 4) gate delays, $C_3$ is available in 6 [2(4 - 1)] gate delays and $S_3$ is available in 7 gate delays.
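    The two timing models in this section can be written down directly. `ripple_gate_delays` follows the gate-delay counts above (2 gate delays per carry stage, plus one XOR delay for a sum bit), and `ripple_time_ns` follows the coarser 30 ns-per-stage example. Both are simplified models that ignore wiring and fan-out, and both function names are hypothetical.

```python
def ripple_gate_delays(n: int):
    """Gate-delay model used in the text for an n-bit ripple adder.

    Each stage adds 2 gate delays to the carry path, so C_i is ready after
    2i gate delays; S_i is ready one XOR delay after its carry-in C_i.
    """
    carry_ready = [2 * i for i in range(n + 1)]   # C_0 ... C_n
    sum_ready = [2 * i + 1 for i in range(n)]     # S_0 ... S_{n-1}
    return carry_ready, sum_ready


def ripple_time_ns(n: int, stage_delay_ns: float = 30.0) -> float:
    """Coarser model from the 30 ns example: the MSB sum waits for all n stages."""
    return n * stage_delay_ns


carry_ready, sum_ready = ripple_gate_delays(4)
print(carry_ready[4], carry_ready[3], sum_ready[3])  # 8 6 7
print(ripple_time_ns(4))                             # 120.0
```

    Both models grow linearly with n, which is the motivation for the carry-lookahead scheme that follows.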


    2.3.1 Carry-Lookahead Adders

    One method of speeding up this process by eliminating the inter-stage carry delay is called lookahead-carry addition. This method uses logic gates to look at the lower-order bits of the augend and addend to see if a higher-order carry is to be generated. It uses two functions: carry generate and carry propagate.

    Fig. 2.13 Full adder circuit

Consider the circuit of the full adder shown in Fig. 2.13. Here, we define two functions : carry generate and carry propagate.

$P_i = A_i \oplus B_i$

$G_i = A_i B_i$

(Refer Appendix-A for details.)

The output sum and carry can be expressed as

$S_i = P_i \oplus C_i$

$C_{i+1} = G_i + P_i C_i$

$G_i$ is called a carry generate, and it produces a carry when both $A_i$ and $B_i$ are one, regardless of the input carry. $P_i$ is called a carry propagate because it is the term associated with the propagation of the carry from $C_i$ to $C_{i+1}$.

Now $C_{i+1}$ can be expressed as a sum-of-products function of the P and G outputs of all the preceding stages. For example, the carries in a four-stage carry-lookahead adder are defined as follows :
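Substituting the recurrence $C_{i+1} = G_i + P_i C_i$ into itself stage by stage, starting from $C_1 = G_0 + P_0 C_0$, gives the following two-level (AND-OR) expressions. Each carry depends only on the G and P functions and on $C_0$, which is what allows all four carries to be formed in parallel:

$$
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
$$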


delays after the signals A, B and $C_{in}$ are applied. In comparison, note that a 4-bit ripple-carry adder requires 7 gate delays for S3 and 8 gate delays for C4.

A complete 4-bit carry-lookahead adder in block form is shown in Fig. 2.15.
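To make the lookahead equations concrete, here is a minimal Python sketch of a 4-bit carry-lookahead adder. It is a functional model only (it says nothing about gate timing), and the function name `cla4` is hypothetical. Every carry is written as a two-level AND-OR expression of the $G_i$, $P_i$ and $C_0$ signals, so no carry waits for the one below it.

```python
def cla4(a, b, c0=0):
    """Illustrative 4-bit carry-lookahead adder (logic only, no timing).

    Returns (4-bit sum, carry-out C4).
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [x & y for x, y in zip(A, B)]   # carry generate   G_i = A_i B_i
    P = [x ^ y for x, y in zip(A, B)]   # carry propagate  P_i = A_i XOR B_i
    # Every carry is a two-level AND-OR function of G, P and C0 only.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    C = [c0, c1, c2, c3]
    S = [P[i] ^ C[i] for i in range(4)]  # S_i = P_i XOR C_i
    return sum(s << i for i, s in enumerate(S)), c4


print(cla4(0b1011, 0b0110))  # (1, 1)
```

An exhaustive comparison against ordinary integer addition over all 4-bit operands and both carry-in values is a cheap way to confirm the equations.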

We can cascade such 4-bit carry-lookahead adders to form a 16-bit or 32-bit adder. We require four 4-bit carry-lookahead adders to form a 16-bit carry-lookahead adder and eight 4-bit carry-lookahead adders to form a 32-bit carry-lookahead adder.

Fig. 2.16 (figure; signals S3 to S0, A3 to A0 and B3 to B0)

In the 32-bit adder, the carry-out C4 from the low-order 4-bit adder is available 3 gate delays after the input operands A, B and C0 are applied to the 32-bit adder. Then C8 is available at the output of the second adder after a further 2 gate delays, C12 is available after a further 2 gate delays, and so on. Finally, C28, the carry-in to the high-order 4-bit adder, is available after a total of (6 × 2) + 3 = 15 gate delays. C32 is available after 2 more gate delays, i.e. after 17 gate delays, and S31 is available 3 gate delays after C28, i.e. after 18 gate delays. These delays are much smaller than the total delays of 63 and 64 for S31 and C32 if a ripple-carry adder is used.
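The same bookkeeping can be written as a small delay model. The Python sketch below is illustrative and simply encodes the figures used in the text: the first block's carry-out appears after 3 gate delays, each further 4-bit block adds 2, and the top sum bit is ready 3 gate delays after the carry-in of the last block arrives. The function name is an assumption, and the model is only meant for two or more cascaded blocks.

```python
def cascaded_cla_delays(n_bits, block=4):
    """Illustrative delays (C_n, S_{n-1}) for cascaded 4-bit CLA blocks.

    Encodes the text's figures: block 0 carry-out after 3 gate delays,
    +2 per further block, top sum bit 3 delays after the last carry-in.
    """
    blocks = n_bits // block
    if blocks < 2:
        raise ValueError("model assumes at least two cascaded blocks")
    carry_out = [3 + 2 * k for k in range(blocks)]   # C4, C8, ..., C_n
    msb_sum = carry_out[-2] + 3                      # carry-in to top block + 3
    return carry_out[-1], msb_sum


print(cascaded_cla_delays(32))  # (17, 18)
print(cascaded_cla_delays(16))  # (9, 10)
```

The 16-bit result (9 and 10 gate delays for C16 and S15) is the cascaded-block figure against which the multilevel scheme below is compared.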

Multilevel Generate and Propagate Functions

In the 32-bit adder just discussed, the carries C4, C8, C12, ... ripple through the 4-bit adder blocks with two gate delays per block. This is analogous to the way individual carries ripple through each bit stage in a ripple-carry adder. By using multilevel block generate and propagate functions, it is possible to use the lookahead approach to develop the carries C4, C8, C12, ... in parallel, in a multilevel carry-lookahead circuit.

Fig. 2.17 shows a 16-bit adder implemented using four 4-bit adder blocks. Here, the blocks provide new output functions defined as $G_K^I$ and $P_K^I$, where K = 0 for the first 4-bit block, K = 1 for the second 4-bit block and so on.

In the first block,

$P_0^I = P_3 P_2 P_1 P_0$

$G_0^I = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$

Fig. 2.17 16-bit carry-lookahead adder built from 4-bit adders

Therefore, we can use the first-level $G_i$ and $P_i$ functions to determine whether bit stage i generates or propagates a carry, and we can use the second-level $G_K^I$ and $P_K^I$ functions to determine whether block K generates or propagates a carry. With these new functions available, it is not necessary to wait for carries to ripple through the 4-bit blocks.

Looking at Fig. 2.17, we can determine C16 as

$C_{16} = G_3^I + P_3^I G_2^I + P_3^I P_2^I G_1^I + P_3^I P_2^I P_1^I G_0^I + P_3^I P_2^I P_1^I P_0^I C_0$


The above expression is identical in form to the expression for C4; only the variable names are different. Therefore, the structure of the carry-lookahead circuit in Fig. 2.17 is identical to the carry-lookahead circuit shown in Fig. 2.16.

However, it is important to note that the carries C4, C8, C12 and C16 generated internally by the 4-bit adder blocks are not needed in Fig. 2.17, because they are generated by the multilevel carry-lookahead circuits.
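The two-level scheme can be checked with a small functional model. The Python sketch below (helper names such as `block_gp` and `lookahead4` are hypothetical) forms $G_i$ and $P_i$, then the block functions $G_K^I$ and $P_K^I$, feeds them to a second lookahead circuit with the same equations as the first level to obtain C4, C8, C12 and C16 directly, and only then lets each block form its own internal carries from its block carry-in. It models logic only, not gate delays.

```python
def block_gp(G, P):
    """Second-level functions G_K^I, P_K^I for one 4-bit block."""
    g = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
    p = P[3] & P[2] & P[1] & P[0]
    return g, p


def lookahead4(G, P, c0):
    """Four-stage lookahead equations: returns [C1, C2, C3, C4] from G, P, c0."""
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [c1, c2, c3, c4]


def cla16(a, b, c0=0):
    """Illustrative 16-bit two-level carry-lookahead adder; returns (sum, C16)."""
    A = [(a >> i) & 1 for i in range(16)]
    B = [(b >> i) & 1 for i in range(16)]
    G = [x & y for x, y in zip(A, B)]          # first-level generate
    P = [x ^ y for x, y in zip(A, B)]          # first-level propagate
    blocks = [(G[4 * k:4 * k + 4], P[4 * k:4 * k + 4]) for k in range(4)]
    GI, PI = zip(*(block_gp(g, p) for g, p in blocks))
    # Second-level lookahead: same structure as the first level.
    block_cin = [c0] + lookahead4(GI, PI, c0)  # C0, C4, C8, C12, C16
    C = []
    for k, (g, p) in enumerate(blocks):
        cin = block_cin[k]
        C += [cin] + lookahead4(g, p, cin)[:3]  # C_{4k} .. C_{4k+3}
    S = [P[i] ^ C[i] for i in range(16)]
    return sum(s << i for i, s in enumerate(S)), block_cin[4]


print(cla16(0xFFFF, 0x0001))  # (0, 1)
```

Reusing `lookahead4` for both levels mirrors the observation that the second-level circuit has the same structure as the first; only the inputs differ.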

Let us see the delay in producing outputs from the 16-bit carry-lookahead adder. The delay in developing the carries produced by the carry-lookahead circuits is two gate delays more than the delay needed to develop the $G_K^I$ and $P_K^I$ functions. The $G_K^I$ and $P_K^I$ functions require two gate delays and one gate delay, respectively, after the generation of $G_i$ and $P_i$, which themselves take one gate delay. Therefore, all carries produced by the carry-lookahead circuits are available 1 + 2 + 2 = 5 gate delays after A, B and C0 are applied as inputs.

The carry C15 is generated inside the high-order 4-bit block in Fig. 2.17 two gate delays after C12, followed by S15 one further gate delay later. Therefore, S15 is available after 8 gate delays. These delays, 5 gate delays for C16 and 8 gate delays for S15, are less than the 9 and 10 gate delays for C16 and S15, respectively, in cascaded 4-bit carry-lookahead adder blocks.

We can cascade two 16-bit adders to implement a 32-bit adder. Here, only two more gate delays are required to get C32 and S31 than to get C16 and S15, respectively. Therefore, C32 is available after 7 gate delays and S31 is available after 10 gate delays. These delays are less than the 17 and 18 gate delays for the same outputs if the 32-bit adder is built from a cascade of eight 4-bit adders.

If we go to a third level, then we can build a 64-bit adder using four 16-bit adders. The delay through this adder will be 12 gate delays for S63 and 7 gate delays for C64.
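The multilevel figures can be reproduced with a short delay model. The Python sketch below is an illustrative reconstruction rather than a formula from the text. It assumes one gate delay for $G_i$ and $P_i$, two more for each higher-level block generate and one more for each block propagate, two delays for any lookahead circuit once its inputs are ready, and one final XOR delay for a sum bit; `levels` counts lookahead levels in a full 4-way tree (2 for 16 bits, 3 for 64 bits).

```python
def multilevel_cla_delays(levels):
    """Illustrative gate-delay model of a full 4-way multilevel CLA tree.

    levels = 1 for 4 bits, 2 for 16 bits, 3 for 64 bits.
    Returns (delay of the final carry-out, delay of the MSB sum bit).
    """
    g, p = 1, 1                      # first-level G_i, P_i: one gate delay
    for _ in range(levels - 1):      # block functions, one level up each time
        g, p = max(g, p) + 2, p + 1  # block G: 2 more delays, block P: 1 more
    carry_out = max(g, p) + 2        # top lookahead circuit: 2 more delays
    # The carry into the top sub-block is ready at the same time as the
    # carry-out; each level below adds 2 delays until the carry into the
    # MSB stage is ready, then one XOR delay gives the sum bit.
    msb_carry_in = carry_out + 2 * (levels - 1)
    return carry_out, msb_carry_in + 1


for levels in (2, 3):
    print(4 ** levels, multilevel_cla_delays(levels))
# 16 (5, 8)
# 64 (7, 12)
```

Under these assumptions the model agrees with the 16-bit figures (5 for C16, 8 for S15) and the 64-bit figures (7 for C64, 12 for S63) given above.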

2.4 Multiplication of Positive Numbers

Multiplication is a more complex operation than addition and subtraction. It can be performed in hardware or in software, and a wide variety of algorithms have been used in various computers. For simplicity, we will first see the multiplication algorithms for unsigned (positive) integers, and then we will see the multiplication algorithm for signed numbers.

1101 (13) Multiplicand
× 1001 (9) Multiplier
Partial products : 1101, 0000, 0000, 1101 (each shifted one position to the left of the one before)
1110101 (117) Final product

Fig. 2.18 Manual multiplication algorithm


Fig. 2.18 shows the usual algorithm for multiplying positive numbers by hand. Looking at this algorithm, we can note the following points :

• The multiplication process involves the generation of partial products, one for each digit in the multiplier. These partial products are then summed to produce the final product.
• In the binary system the partial products are easily defined. When the multiplier bit is 0, the partial product is 0, and when the multiplier bit is 1, the partial product is the multiplicand.
• The final product is produced by summing the partial products. Before the summing operation, each successive partial product is shifted one position to the left relative to the preceding partial product, as shown in Fig. 2.18.
• The product of two n-digit numbers can be accommodated in 2n digits, so the product of two 4-bit numbers fits into 8 bits.
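The four points above translate almost line for line into code. The Python sketch below (the function name `manual_multiply` is illustrative) builds one partial product per multiplier bit, shifts each one a further position to the left, and sums them, reproducing the Fig. 2.18 example.

```python
def manual_multiply(multiplicand, multiplier, n=4):
    """Illustrative paper-and-pencil multiplication of two unsigned n-bit numbers."""
    partials = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        pp = multiplicand if bit else 0   # multiplier bit 1 -> multiplicand, 0 -> 0
        partials.append(pp << i)          # shift left i positions
    return partials, sum(partials)        # product fits in 2n bits


partials, product = manual_multiply(0b1101, 0b1001)
print([format(p, "b") for p in partials], format(product, "08b"), product)
# ['1101', '0', '0', '1101000'] 01110101 117
```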

Fig. 2.19 shows the implementation of the manual multiplication approach. It consists of an n-bit binary adder, shift and add control logic, and four registers, A, B, C and Q. As shown in Fig. 2.19, the multiplier and multiplicand are loaded into register Q and register B, respectively, and registers A and C are initially set to 0.

(Figure labels: multiplicand register B, n-bit adder, n-bit bus, add and shift control logic, 1-bit register C, register A, multiplier register Q. Note : dotted lines indicate control signals.)

Fig. 2.19 Hardware implementation of unsigned binary multiplication
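A register-level sketch of the Fig. 2.19 datapath is given below in Python. Register names follow the figure; the function name is illustrative. It assumes that when Q0 is 0 no addition is made and C, A and Q are simply shifted right, and that the add/shift cycle repeats once per multiplier bit.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Illustrative unsigned shift-and-add multiplier using registers C, A, Q, B."""
    mask = (1 << n) - 1
    B = multiplicand & mask          # multiplicand register
    Q = multiplier & mask            # multiplier register
    A, C = 0, 0                      # A and C start at 0
    for _ in range(n):
        if Q & 1:                    # Q0 = 1: add multiplicand to partial product
            total = A + B
            C, A = total >> n, total & mask
        # Shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1), Q0 dropped.
        Q = (Q >> 1) | ((A & 1) << (n - 1))
        A = (A >> 1) | (C << (n - 1))
        C = 0
    return (A << n) | Q              # 2n-bit product in A:Q


print(shift_add_multiply(0b1101, 0b1001))  # 117
```

The C register catches the adder's carry-out so that it is not lost; the next right shift moves it into the top bit of A.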

Multiplication Operation Steps

1. Bit 0 of the multiplier operand (Q0 of the Q register) is checked.
2. If bit 0 (Q0) is one, then the multiplicand and partial product are added, and all bits of the C, A and Q registers are shifted to the right by one bit, so that the C bit


Therefore, the product can be computed by adding $2^4$ times the multiplicand to the 2's complement of $2^1$ times the multiplicand. In simple notation, we can describe the sequence of required operations by recoding the preceding multiplier as 0 +1 0 0 -1 0.

In general, for Booth's algorithm the recoding scheme can be given as : -1 times the shifted multiplicand is selected when moving from 0 to 1, +1 times the shifted multiplicand is selected when moving from 1 to 0, and 0 times the shifted multiplicand is selected for none of the above cases, as the multiplier is scanned from right to left. We have to assume an implied 0 to the right of the multiplier LSB. This is illustrated in the following examples.

Example 2.5 : Recode the multiplier 101100 for Booth multiplication.

Solution :
Multiplier : 1 0 1 1 0 0 (implied 0 to the right of the LSB)
Recoded multiplier : -1 +1 0 -1 0 0

Example 2.6 : Recode the multiplier 011001 for Booth multiplication.

Solution :
Multiplier : 0 1 1 0 0 1 (implied zero to the right of the LSB)
Recoded multiplier : +1 0 -1 0 +1 -1
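The recoding rule fits in a few lines of Python. In the sketch below (function names illustrative), the digit at each position is simply the bit to its right minus the bit itself, which gives -1 for a 0-to-1 transition, +1 for a 1-to-0 transition and 0 otherwise, with the implied 0 appended to the right of the LSB.

```python
def booth_recode(bits):
    """Illustrative Booth recoding of a multiplier bit string (MSB first).

    Returns the recoded digits (-1, 0 or +1), also MSB first.
    """
    b = [int(ch) for ch in bits] + [0]          # implied 0 right of the LSB
    # digit = (bit on the right) - (this bit), scanning right to left
    return [b[k + 1] - b[k] for k in range(len(bits))]


def digits_value(digits):
    """Weighted sum of recoded digits (MSB first)."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


print(booth_recode("101100"))                # [-1, 1, 0, -1, 0, 0]
print(booth_recode("011001"))                # [1, 0, -1, 0, 1, -1]
print(digits_value(booth_recode("101100")))  # -20
```

The value check shows that the recoded digits represent the same 2's complement number (101100 is -20), which is why recoding does not change the product.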

Fig. 2.22 shows Booth's multiplication. As shown in Fig. 2.22, whenever the multiplicand is multiplied by -1, its 2's complement is taken as the partial result.

Multiplier : 0 0 1 1 0 0 (+12)
Multiplicand : 0 1 0 0 1 1 (+19)
Recoded multiplier : 0 +1 0 -1 0 0

Multiplication : 010011 × (0 +1 0 -1 0 0)
000000000000
000000000000
111110110100 ← 2's complement of the multiplicand (shifted two positions)
000000000000
000100110000 (multiplicand shifted four positions)
000000000000
000011100100 (+228)

Fig. 2.22 Booth's multiplication


The same algorithm can be used for a negative multiplier. This is illustrated in the following example.

Example 2.7 : Multiply 01110 (+14) and 11011 (-5).

Solution :
Multiplicand : 0 1 1 1 0 (+14)
Multiplier : 1 1 0 1 1 (-5)
Recoded multiplier : 0 -1 +1 0 -1

Multiplication : 01110 × (0 -1 +1 0 -1)
1111110010 ← 2's complement of the multiplicand
0000000000
0000111000 (multiplicand shifted two positions)
1110010000 ← 2's complement of the multiplicand (shifted three positions)
0000000000
1110111010 (-70)

The same algorithm can also be used for a negative multiplier and a negative multiplicand. This is illustrated in the following example.

Example 2.8 : Multiply the following pair of signed 2's complement numbers.
Multiplicand : 1 1 0 0 1 1 (-13)
Multiplier : 1 0 1 1 0 0 (-20)

Solution :
Multiplier : 1 0 1 1 0 0
Recoded multiplier : -1 +1 0 -1 0 0


Multiplication : 110011 (multiplicand) × (-1 +1 0 -1 0 0) (recoded multiplier)
000000000000
000000000000
000000110100 ← 2's complement of the multiplicand (shifted two positions)
000000000000
111100110000 (multiplicand shifted four positions)
000110100000 ← 2's complement of the multiplicand (shifted five positions)
000100000100 (260)
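Fig. 2.22 and Examples 2.7 and 2.8 all follow the same recipe: recode the multiplier, pick 0, the multiplicand or its 2's complement for each digit, shift, sign-extend to 2n bits and add, discarding any carry out of the 2n-bit field. The Python sketch below (function name illustrative) does exactly that and returns the rows as bit strings so they can be compared with the hand calculations.

```python
def booth_partial_products(multiplicand, multiplier_bits):
    """Illustrative: Booth partial products as 2n-bit 2's complement words.

    multiplicand is a Python int; multiplier_bits is the n-bit 2's complement
    multiplier as a string, MSB first. Rows are listed from position 0 upward.
    """
    n = len(multiplier_bits)
    width = 2 * n
    mask = (1 << width) - 1
    b = [int(ch) for ch in multiplier_bits] + [0]   # implied 0 right of LSB
    digits = [b[k + 1] - b[k] for k in range(n)]    # Booth digits, MSB first
    rows = []
    for pos, d in enumerate(reversed(digits)):      # pos 0 is the LSB position
        # d = -1 selects the 2's complement of the multiplicand, d = 0 gives 0
        rows.append(((d * multiplicand) << pos) & mask)
    product = sum(rows) & mask                      # carries beyond 2n bits dropped
    return [format(r, f"0{width}b") for r in rows], format(product, f"0{width}b")


rows, product = booth_partial_products(14, "11011")   # Example 2.7
print(product)  # 1110111010
```

Calling it with (-13, "101100") or (19, "001100") gives the products of Example 2.8 and Fig. 2.22.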

Booth's algorithm can be implemented as shown in Fig. 2.23. The circuit is similar to the circuit for positive-number multiplication. It consists of an n-bit adder, shift, add and subtract control logic, and four registers, A, B, Q and Q-1. As shown in Fig. 2.23, the multiplier and multiplicand are loaded into register Q and register B, respectively, and registers A and Q-1 are initially set to 0.

(Figure labels: Multiplicand. Initial settings : A = 0 and Q-1 = 0)

Fig. 2.23 Hardware implementation of signed binary multiplication


The n-bit adder performs the addition of two inputs. One input is the A register and the other input is the multiplicand. In case of addition, the Add/sub line is 0; therefore the adder's carry-in is 0 and the multiplicand is applied directly as the second input to the n-bit adder. In case of subtraction, the Add/sub line is 1; therefore the carry-in is 1 and the multiplicand is complemented and then applied to the n-bit adder. As a result, the 2's complement of the multiplicand is added to the A register.

The shift, add and subtract control logic scans bits Q0 and Q-1 at each step and generates the control signals as shown in Table 2.3. If the two bits are the same (1-1 or 0-0), then all of the bits of the A, Q and Q-1 registers are shifted to the right by 1 bit without addition or subtraction (Add/Subtract Enable = 0). If the two bits differ, then the multiplicand (B register) is added to or subtracted from the A register, depending on the status of the bits : if Q0 = 0 and Q-1 = 1, the multiplicand is added, and if Q0 = 1 and Q-1 = 0, the multiplicand is subtracted. After the addition or subtraction, a right shift occurs such that the leftmost bit of A ($A_{n-1}$) is not only shifted into $A_{n-2}$ but also remains in $A_{n-1}$. This is required to preserve the sign of the number in A and Q. It is known as an arithmetic shift, since it preserves the sign bit. The right shift done when Q0 and Q-1 are equal is an arithmetic shift as well.

Q0 | Q-1 | Add/sub | Add/Subtract Enable | Shift
0 | 0 | X | 0 | 1
0 | 1 | 0 | 1 | 1
1 | 0 | 1 | 1 | 1
1 | 1 | X | 0 | 1

Table 2.3 Truth table for shift, add and subtract control logic

The sequence of events in Booth's algorithm can be explained with the help of the flowchart shown in Fig. 2.24. Let us see the multiplication of the 4-bit numbers 5 and 4 with all possible sign combinations.


Fig. 2.24 Booth's algorithm for signed multiplication

CASE 1 : Both Positive (5 × 4)
Multiplicand (B) ← 0101 (5)
Multiplier (Q) ← 0100 (4)

Steps | A | Q | Q-1 | Operation
Initial | 0000 | 0100 | 0 |
Step 1 : | 0000 | 0010 | 0 | Shift right
Step 2 : | 0000 | 0001 | 0 | Shift right
Step 3 : | 1011 | 0001 | 0 | A ← A - B
 | 1101 | 1000 | 1 | Shift right
Step 4 : | 0010 | 1000 | 1 | A ← A + B
 | 0001 | 0100 | 0 | Shift right

Result : 0001 0100 = 20


CASE 2 : Negative Multiplier (5 × -4)
Multiplicand (B) ← 0101 (5)
Multiplier (Q) ← 1100 (-4)

Steps | A | Q | Q-1 | Operation
Initial | 0000 | 1100 | 0 |
Step 1 : | 0000 | 0110 | 0 | Shift right
Step 2 : | 0000 | 0011 | 0 | Shift right
Step 3 : | 1011 | 0011 | 0 | A ← A - B
 | 1101 | 1001 | 1 | Shift right
Step 4 : | 1110 | 1100 | 1 | Shift right

Result : 1110 1100 = -20 (2's complement of 20)

CASE 3 : Negative Multiplicand (-5 × 4)
Multiplicand (B) ← 1011 (-5)
Multiplier (Q) ← 0100 (4)

Steps | A | Q | Q-1 | Operation
Initial | 0000 | 0100 | 0 |
Step 1 : | 0000 | 0010 | 0 | Shift right
Step 2 : | 0000 | 0001 | 0 | Shift right
Step 3 : | 0101 | 0001 | 0 | A ← A - B
 | 0010 | 1000 | 1 | Shift right
Step 4 : | 1101 | 1000 | 1 | A ← A + B
 | 1110 | 1100 | 0 | Shift right

Result : 1110 1100 = -20 (2's complement of 20)


CASE 4 : Both Negative (-5 × -4)
Multiplicand (B) ← 1011 (-5)
Multiplier (Q) ← 1100 (-4)

Steps | A | Q | Q-1 | Operation
Initial | 0000 | 1100 | 0 |
Step 1 : | 0000 | 0110 | 0 | Shift right
Step 2 : | 0000 | 0011 | 0 | Shift right
Step 3 : | 0101 | 0011 | 0 | A ← A - B
 | 0010 | 1001 | 1 | Shift right
Step 4 : | 0001 | 0100 | 1 | Shift right

Result : 0001 0100 = 20
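The register-level procedure traced in the four cases can be sketched in Python as below. This is an illustrative model of Fig. 2.23 and Table 2.3, and the function and variable names are assumptions: A, Q and Q-1 are held as separate integers, the pair (Q0, Q-1) selects add, subtract or nothing, and every step ends with an arithmetic right shift that copies the sign bit of A. As with any n-bit Booth datapath, the sketch assumes the multiplicand is not the most negative n-bit value (-8 for n = 4), because A - B could then overflow the A register.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Illustrative register-level Booth multiplier (A, Q, Q-1; B = multiplicand).

    Operands are n-bit 2's complement values given as Python ints.
    Returns the signed 2n-bit product.
    """
    mask = (1 << n) - 1
    B = multiplicand & mask
    A, Q, q_1 = 0, multiplier & mask, 0
    for _ in range(n):
        q0 = Q & 1
        if (q0, q_1) == (1, 0):
            A = (A - B) & mask               # A <- A - B
        elif (q0, q_1) == (0, 1):
            A = (A + B) & mask               # A <- A + B
        # Arithmetic right shift of A, Q, Q-1 treated as one register.
        q_1 = Q & 1
        Q = (Q >> 1) | ((A & 1) << (n - 1))
        A = (A >> 1) | (A & (1 << (n - 1)))  # sign bit A(n-1) stays in place
    product = (A << n) | Q
    if product & (1 << (2 * n - 1)):         # interpret A:Q as signed
        product -= 1 << (2 * n)
    return product


for m, q in ((5, 4), (5, -4), (-5, 4), (-5, -4)):
    print(m, q, booth_multiply(m, q))
# 5 4 20
# 5 -4 -20
# -5 4 -20
# -5 -4 20
```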

2.6 Fast Multiplication

There are two techniques for speeding up the multiplication process. In the first technique, the maximum number of summands is reduced to n/2 for n-bit operands. The second technique, called carry-save addition, reduces the time needed to add the summands. In this section we will see the first technique to speed up multiplication.

2.6.1 Bit-Pair Recoding of Multipliers

To speed up the multiplication process in Booth's algorithm, a technique called bit-pair recoding is used. It is also called the modified Booth's algorithm, and it halves the maximum number of summands. In this technique, the Booth-recoded multiplier bits are grouped in pairs. Each pair is then represented by its equivalent single-digit multiplier, reducing the total number of multiplier digits to half. For example, the pair (+1 -1) is equivalent to the pair (0 +1). That is, instead of adding -1 times the multiplicand at shifted position i to +1 times the multiplicand at position i + 1, the same result is obtained by adding +1 times the multiplicand at position i. Similarly, (+1 0) is equivalent to (0 +2), (-1 +1) is equivalent to (0 -1), and so on.

By replacing pairs with their equivalents, we can get the bit-pair recoded multiplier. But instead of deriving the bit-pair recoded multiplier from the Booth-recoded multiplier, one can derive it directly from the original multiplier using Table 2.4, which shows the bit-pair code for all possible multiplier bit options.


Multiplier bit-pair (i+1, i) | Multiplier bit on the right (i-1) | Bit-pair recoded multiplier bit at position i
0 0 | 0 | 0
0 0 | 1 | +1
0 1 | 0 | +1
0 1 | 1 | +2
1 0 | 0 | -2
1 0 | 1 | -1
1 1 | 0 | -1
1 1 | 1 | 0

Table 2.4

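Example 2.9 : Find the bit-pair code for the multiplier 11010.

Solution : By referring to Table 2.4 we can derive the bit-pair code as follows. The multiplier is first sign-extended by one bit to 1 1 1 0 1 0, and an implied 0 is placed to the right of the LSB. Taking the bits in pairs from the right :
(1 0) with 0 on the right gives -2; (1 0) with 1 on the right gives -1; (1 1) with 1 on the right gives 0.
Bit-pair recoded multiplier : 0 -1 -2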

Example 2.10 : Multiply the given signed 2's complement numbers using bit-pair recoding.
A = 110101 multiplicand (-11)
B = 011011 multiplier (+27)

Solution : Let us find the bit-pair code for the multiplier. With an implied 0 to the right of the LSB, the pairs from the right are (1 1) with 0 on the right, giving -1; (1 0) with 1 on the right, giving -1; and (0 1) with 1 on the right, giving +2.
Bit-pair recoded multiplier : +2 -1 -1

Multiplication : 110101 × (+2 -1 -1)
000000001011 ← 2's complement of the multiplicand
000000101100 ← 2's complement of the multiplicand (shifted two positions)
111010100000 (multiplicand × (+2), shifted four positions)
111011010111 (-297)
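Table 2.4 amounts to the formula digit = -2·b(i+1) + b(i) + b(i-1) applied at every second bit position. The Python sketch below (names illustrative) uses that formula, sign-extends an odd-length multiplier by one bit as in Example 2.9, and then forms the product as a sum of 0, ±1 or ±2 times the multiplicand at positions 0, 2, 4, ...; it reproduces Examples 2.9 and 2.10.

```python
def bit_pair_recode(bits):
    """Illustrative bit-pair (modified Booth) recoding.

    bits: 2's complement multiplier as a string, MSB first.
    Returns digits in {-2, -1, 0, +1, +2}, most significant pair first.
    """
    if len(bits) % 2:                        # sign-extend to an even length
        bits = bits[0] + bits
    b = [int(ch) for ch in reversed(bits)]   # b[0] is the LSB
    digits = []
    for i in range(0, len(b), 2):
        right = b[i - 1] if i > 0 else 0     # implied 0 right of the LSB
        # Table 2.4 as a formula: -2*b(i+1) + b(i) + b(i-1)
        digits.append(-2 * b[i + 1] + b[i] + right)
    return digits[::-1]


def multiply_bit_pair(multiplicand, multiplier_bits):
    """Sum of 0, +/-1 or +/-2 times the multiplicand at positions 0, 2, 4, ..."""
    digits = bit_pair_recode(multiplier_bits)
    k = len(digits)
    return sum(d * multiplicand * 4 ** (k - 1 - j) for j, d in enumerate(digits))


print(bit_pair_recode("11010"))          # [0, -1, -2]
print(bit_pair_recode("011011"))         # [2, -1, -1]
print(multiply_bit_pair(-11, "011011"))  # -297
```

A 6-bit multiplier yields only three summands instead of six, which is the halving of summands described above.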


2.7 Integer Division

Division is more complex than multiplication. For simplicity, we will see division for positive numbers. Fig. 2.25 shows the usual algorithm for dividing positive numbers by hand. It shows an example of decimal division and the binary division of the same values.

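Decimal : divisor 12, dividend 169, quotient 14. 16 - 12 = 4; bringing down 9 gives 49; 49 - 48 = 1 (remainder).
Binary : divisor 1100, dividend 10101001, quotient 1110. 10101 - 1100 = 1001; bringing down 0 gives the partial remainder 10010; 10010 - 1100 = 110; bringing down 0 gives 01100; 01100 - 1100 = 0; bringing down 1 gives 00001, which is less than the divisor, so the remainder is 1.

Fig. 2.25 Division examples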

The process is the same in both divisions; the only difference is that in binary division each quotient bit can only be 0 or 1.

We will now see the binary division process in detail. First, the bits of the dividend are examined from left to right until the set of bits examined represents a number greater than or equal to the divisor; this is referred to as the divisor being able to divide the number. Until this condition occurs, 0s are placed in the quotient from left to right. When the condition is satisfied, a 1 is placed in the quotient and the divisor is subtracted from the partial dividend. The result is referred to as a partial remainder. From this point onwards, the division process repeats the same steps. In each cycle, additional bits from the dividend are brought down to the partial remainder until the result is greater than or equal to the divisor (a 0 is placed in the quotient for every bit brought down that leaves the result smaller than the divisor), and the divisor is subtracted from the result to produce a new partial remainder. The process continues until all the bits of the dividend have been brought down and the result is still less than the divisor; this final result is the remainder.
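The pencil-and-paper procedure can be sketched in a few lines of Python. This is an illustrative sketch, assuming positive operands given as bit strings; the function name `long_divide` is hypothetical. Python integers hold the partial remainder, and one quotient bit is produced for each dividend bit brought down.

```python
def long_divide(dividend: str, divisor: str) -> tuple[str, str]:
    """Pencil-and-paper binary division of positive numbers given as bit strings.

    Dividend bits are brought down one at a time; whenever the partial
    remainder is >= the divisor, a 1 goes into the quotient and the divisor
    is subtracted, otherwise a 0 goes into the quotient.
    """
    d = int(divisor, 2)
    if d == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    partial = 0
    quotient_bits = []
    for bit in dividend:
        partial = (partial << 1) | int(bit)   # bring down the next dividend bit
        if partial >= d:
            quotient_bits.append("1")
            partial -= d                      # new partial remainder
        else:
            quotient_bits.append("0")
    quotient = "".join(quotient_bits).lstrip("0") or "0"  # drop leading 0s
    return quotient, format(partial, "b")


print(long_divide("10101001", "1100"))  # ('1110', '1'), as in Fig. 2.25
```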

2.7.1 Restoring Division

Fig. 2.26 shows the hardware for implementing the division process described in the previous section. It consists of an (n + 1)-bit binary adder, shift, add and subtract control logic, and registers A, B and Q. As shown in Fig. 2.26, the divisor and the dividend are loaded into register B and register Q, respectively. Register A is initially set to zero. The division operation is then carried out. After the division is complete, the n-bit quotient is in register Q and the remainder is in register A.


Fig. 2.26 Hardware to implement binary division

Division Operation Steps :
1. Shift A and Q left one binary position.
2. Subtract the divisor from A and place the answer back in A (A ← A - B).
3. If the sign bit of A is 1, set Q0 to 0 and add the divisor back to A (that is, restore A); otherwise, set Q0 to 1.
4. Repeat steps 1, 2 and 3 n times.

A flowchart for the restoring division operation is shown in Fig. 2.27.


Fig. 2.27 Flowchart for restoring division operation
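The restoring steps can be modelled in Python, with integers standing in for the registers. This is a sketch under stated assumptions: Q holds the n-bit dividend, A and B are n + 1 bits wide, the divisor must fit in n bits, and subtraction is done by adding the 2's complement of B, as in the hardware. The function name `restoring_divide` is illustrative.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Restoring division of an n-bit dividend by a divisor that fits in n bits.

    Registers are modelled as Python ints: Q is n bits wide, A and B are
    n + 1 bits wide. Returns (quotient in Q, remainder in A).
    """
    if not 0 < divisor < (1 << n):
        raise ValueError("divisor must be non-zero and fit in n bits")
    mask_a = (1 << (n + 1)) - 1        # A and B are n + 1 bits
    mask_q = (1 << n) - 1              # Q is n bits
    sign_a = 1 << n                    # sign bit of A
    a, q, b = 0, dividend & mask_q, divisor
    neg_b = (-b) & mask_a              # 2's complement of B
    for _ in range(n):
        # Step 1: shift A and Q left one position; the MSB of Q enters A
        a = ((a << 1) | (q >> (n - 1))) & mask_a
        q = (q << 1) & mask_q
        # Step 2: A <- A - B, done by adding the 2's complement of B
        a = (a + neg_b) & mask_a
        # Step 3: if A is negative, Q0 stays 0 and A is restored; else Q0 <- 1
        if a & sign_a:
            a = (a + b) & mask_a
        else:
            q |= 1
    return q, a


print(restoring_divide(0b1010, 0b0011, 4))  # (3, 1): quotient 0011, remainder 00001
```

The extra bit in A is what lets the sign test after each subtraction work: the result always lies between -B and B - 1, which fits in n + 1 bits.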


Let us see an example. Consider a 4-bit dividend and a 2-bit divisor :
Dividend = 1010
Divisor = 0011
Fig. 2.28 shows the steps involved in this binary division.

(A and B are 5 bits wide: B = 00011; _ marks the vacated Q0 position.)

                        A register    Q register
Initially               0 0 0 0 0     1 0 1 0
First cycle
  Shift                 0 0 0 0 1     0 1 0 _
  Subtract B            1 1 1 1 0     0 1 0 _
  Set Q0 (A < 0)        1 1 1 1 0     0 1 0 0
  Restore (A + B)       0 0 0 0 1     0 1 0 0
Second cycle
  Shift                 0 0 0 1 0     1 0 0 _
  Subtract B            1 1 1 1 1     1 0 0 _
  Set Q0 (A < 0)        1 1 1 1 1     1 0 0 0
  Restore (A + B)       0 0 0 1 0     1 0 0 0
Third cycle
  Shift                 0 0 1 0 1     0 0 0 _
  Subtract B            0 0 0 1 0     0 0 0 _
  Set Q0 (A ≥ 0)        0 0 0 1 0     0 0 0 1
Fourth cycle
  Shift                 0 0 1 0 0     0 0 1 _
  Subtract B            0 0 0 0 1     0 0 1 _
  Set Q0 (A ≥ 0)        0 0 0 0 1     0 0 1 1
                        Remainder     Quotient

Note : Subtract B means add B in 2's complement form (that is, add 1 1 1 0 1).

Fig. 2.28 A restoring division example


The division algorithm just discussed needs to restore register A after each unsuccessful subtraction (a subtraction is said to be unsuccessful if the result is negative). Therefore, it is referred to as the restoring division algorithm. This algorithm can be improved to give the non-restoring division algorithm.

Consider the sequence of operations that takes place after the subtraction operation in the restoring algorithm :
If A is positive : shift left and subtract divisor → $2A - B$
If A is negative : restore → $A + B$; then shift left and subtract divisor → $2(A + B) - B = 2A + B$

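Thus, when A is negative, restoring A and then shifting and subtracting in the next cycle gives the same result as simply shifting A left and adding the divisor. Looking at the above operations, we can write the following steps for the non-restoring algorithm.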

Step 1 : If the sign of A is 0, shift A and Q left one bit position and subtract the divisor from A; otherwise, shift A and Q left and add the divisor to A. Then, if the sign of A is 0, set Q0 to 1; otherwise, set Q0 to 0.
Step 2 : Repeat step 1 n times.
Step 3 : If the sign of A is 1, add the divisor to A.
Note : Step 3 is required to leave the proper positive remainder in A at the end of the n cycles.

2.7.2 Non-restoring Division

A flowchart for the non-restoring division operation is shown in Fig. 2.29.
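The non-restoring steps can be sketched in Python under the same assumptions as a restoring model: Q holds the n-bit dividend, A and B are n + 1 bits wide, the divisor fits in n bits, and integers stand in for the registers. The function name `non_restoring_divide` is illustrative. Each cycle performs exactly one addition or subtraction, and step 3 is a single correcting addition at the end.

```python
def non_restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Non-restoring division of an n-bit dividend by a divisor that fits in n bits.

    Returns (quotient in Q, remainder in A).
    """
    if not 0 < divisor < (1 << n):
        raise ValueError("divisor must be non-zero and fit in n bits")
    mask_a = (1 << (n + 1)) - 1        # A and B are n + 1 bits
    mask_q = (1 << n) - 1              # Q is n bits
    sign_a = 1 << n                    # sign bit of A
    a, q, b = 0, dividend & mask_q, divisor
    neg_b = (-b) & mask_a              # 2's complement of B
    for _ in range(n):                 # Step 2: repeat step 1 n times
        a_was_negative = bool(a & sign_a)
        # Step 1: shift A and Q left one position ...
        a = ((a << 1) | (q >> (n - 1))) & mask_a
        q = (q << 1) & mask_q
        # ... then add B if A was negative, otherwise subtract B
        a = (a + (b if a_was_negative else neg_b)) & mask_a
        if not a & sign_a:
            q |= 1                     # Q0 <- 1 when the new A is positive
    # Step 3: a negative final A is corrected by adding B once
    if a & sign_a:
        a = (a + b) & mask_a
    return q, a


print(non_restoring_divide(0b1010, 0b0011, 4))  # (3, 1), the example of Fig. 2.30
print(non_restoring_divide(0b1011, 0b0101, 4))  # (2, 1), the example of Fig. 2.31
```

The shifted value of A can briefly exceed the n + 1 bit range, but because the arithmetic is modulo 2^(n+1) and the result of each addition or subtraction lies between -B and B - 1, the value left in A is always correct.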

Let us see an example. Consider a 4-bit dividend and a 2-bit divisor :
Dividend = 1010
Divisor = 0011
Fig. 2.30 shows the steps involved in the non-restoring binary division.


Fig. 2.29 Flowchart for non-restoring division operation


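(A and B are 5 bits wide: B = 00011, and -B in 2's complement form is 11101; _ marks the vacated Q0 position.)

                          A register    Q register
Initially                 0 0 0 0 0     1 0 1 0   ← Dividend
First cycle
  Shift                   0 0 0 0 1     0 1 0 _
  Subtract B (+ 11101)    1 1 1 1 0     0 1 0 _
  Set Q0 (A < 0)          1 1 1 1 0     0 1 0 0
Second cycle
  Shift                   1 1 1 0 0     1 0 0 _
  Add B (+ 00011)         1 1 1 1 1     1 0 0 _
  Set Q0 (A < 0)          1 1 1 1 1     1 0 0 0
Third cycle
  Shift                   1 1 1 1 1     0 0 0 _
  Add B (+ 00011)         0 0 0 1 0     0 0 0 _
  Set Q0 (A ≥ 0)          0 0 0 1 0     0 0 0 1
Fourth cycle
  Shift                   0 0 1 0 0     0 0 1 _
  Subtract B (+ 11101)    0 0 0 0 1     0 0 1 _
  Set Q0 (A ≥ 0)          0 0 0 0 1     0 0 1 1
                          Remainder     Quotient

Fig. 2.30 A non-restoring division example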

In the above example, register A is positive after 4 cycles and hence step 3 is not required. Let us see another example where step 3 is needed. Consider a 4-bit dividend and a 3-bit divisor :
Dividend = 1011
Divisor = 0101
Fig. 2.31 shows the steps involved in the non-restoring binary division.

The hardware shown in Fig. 2.26 can also be used to perform the non-restoring algorithm.

There is no simple algorithm for signed division. In signed division, the operands are first preprocessed to transform them into positive values. The quotient and remainder are then calculated using one of the algorithms just discussed, and are finally transformed to the correct signed values.
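The preprocess, divide, postprocess scheme can be sketched as follows. The text does not fix the sign rules, so this sketch assumes a common convention (used by C99, for example): the quotient is truncated toward zero and the remainder takes the sign of the dividend. Python's `divmod` on the magnitudes stands in for the restoring or non-restoring divider; `signed_divide` is a hypothetical name.

```python
def signed_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Signed division built on a division of magnitudes.

    Assumed convention (not stated in the text): the quotient is truncated
    toward zero and the remainder takes the sign of the dividend, so that
    dividend == quotient * divisor + remainder.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    # Preprocess: work with the magnitudes of both operands
    q, r = divmod(abs(dividend), abs(divisor))  # stands in for the unsigned divider
    # Postprocess: give the quotient and remainder their signs
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r


print(signed_divide(-7, 2))   # (-3, -1)
print(signed_divide(7, -2))   # (-3, 1)
```

Other conventions exist; Python's own `//` and `%` round the quotient toward minus infinity instead, so `-7 // 2` is -4 and `-7 % 2` is 1.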


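(A and B are 5 bits wide: B = 00101, and -B in 2's complement form is 11011; _ marks the vacated Q0 position.)

                          A register    Q register
Initially                 0 0 0 0 0     1 0 1 1   ← Dividend
First cycle
  Shift                   0 0 0 0 1     0 1 1 _
  Subtract B (+ 11011)    1 1 1 0 0     0 1 1 _
  Set Q0 (A < 0)          1 1 1 0 0     0 1 1 0
Second cycle
  Shift                   1 1 0 0 0     1 1 0 _
  Add B (+ 00101)         1 1 1 0 1     1 1 0 _
  Set Q0 (A < 0)          1 1 1 0 1     1 1 0 0
Third cycle
  Shift                   1 1 0 1 1     1 0 0 _
  Add B (+ 00101)         0 0 0 0 0     1 0 0 _
  Set Q0 (A ≥ 0)          0 0 0 0 0     1 0 0 1
Fourth cycle
  Shift                   0 0 0 0 1     0 0 1 _
  Subtract B (+ 11011)    1 1 1 0 0     0 0 1 _
  Set Q0 (A < 0)          1 1 1 0 0     0 0 1 0
Restore remainder
  Add B (+ 00101)         0 0 0 0 1     0 0 1 0
                          Remainder     Quotient

Fig. 2.31 A non-restoring division example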

2.7.3 Comparison between Restoring and Non-restoring Division Algorithm

Sr. No. | Restoring | Non-restoring
1. | Needs restoring of register A if the result of subtraction is negative. | Does not need restoring.
2. | In each cycle, the content of register A is first shifted left and then the divisor is subtracted from it. | In each cycle, the content of register A is first shifted left and then the divisor is added to or subtracted from it, depending on the sign of A.
3. | Does not need restoring of the remainder. | Needs restoring of the remainder if the remainder is negative.
4. | Slower algorithm: every unsuccessful subtraction costs an extra addition to restore A. | Faster algorithm: each cycle needs only one addition or subtraction.


    point

    number

    system.

The string of significant digits is commonly known as the mantissa.

In the above example, we can say that
Sign = 0
Mantissa = 11101100110
Exponent = 5

In floating point numbers, a bias value is added to the true exponent. This solves the problem of representing negative exponents, because the biased exponent is always a non-negative number. As a result, the magnitudes of two numbers can be compared by comparing their biased exponents first, without having to deal with signed exponent values.

2.8.1 IEEE Standard for Floating-Point Numbers

The standards for representing floating point numbers in 32 bits and 64 bits have been developed by the Institute of Electrical and Electronics Engineers (IEEE) and are referred to as the IEEE 754 standard. Fig. 2.32 shows these IEEE standard formats.

(a) Single precision (32 bits)
    Bit 31       : S, sign of number (0 signifies +, 1 signifies -)
    Bits 30 - 23 : E', 8-bit signed exponent in excess-127 representation
    Bits 22 - 0  : M, 23-bit mantissa fraction
    Value represented = $\pm 1.M \times 2^{E'-127}$

(b) Double precision (64 bits)
    Bit 63       : S, sign of number
    Bits 62 - 52 : E', 11-bit excess-1023 exponent
    Bits 51 - 0  : M, 52-bit mantissa fraction
    Value represented = $\pm 1.M \times 2^{E'-1023}$

Fig. 2.32 IEEE standard floating-point formats
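The field layout of Fig. 2.32 can be inspected with Python's standard `struct` module, which packs a value into its IEEE 754 bit pattern. This sketch assumes normalized numbers: zero, denormals, infinities and NaN use reserved exponent values and are not handled by the hypothetical helper `value_from_fields32`, which only evaluates the formula $\pm 1.M \times 2^{E'-127}$.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E', M) of x after rounding it to IEEE 754 single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E', M) of x in IEEE 754 double precision."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    return word >> 63, (word >> 52) & 0x7FF, word & ((1 << 52) - 1)


def value_from_fields32(s: int, e_biased: int, m: int) -> float:
    """Evaluate ±1.M x 2^(E' - 127); valid only for normalized numbers."""
    return (-1) ** s * (1 + m / 2**23) * 2.0 ** (e_biased - 127)


fields = float32_fields(-0.75)
print(fields)                        # (1, 126, 4194304)
print(value_from_fields32(*fields))  # -0.75
```

Because of the bias, the exponent fields compare as ordinary unsigned numbers: 0.5 has E' = 126 and 3.0 has E' = 128.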


The 32-bit standard representation shown in Fig. 2.32 (a) is called a single precision representation because it occupies a single 32-bit word. The 32 bits are divided into three fields as shown below :
(field 1) Sign


Example 2.11 : Represent 1259.125 in single precision and double precision formats.

Solution : Step 1 : Convert the decimal number into binary format.
Integer part (repeated division by 16) :
    1259 ÷ 16 = 78, remainder 11 (B)
      78 ÷ 16 =  4, remainder 14 (E)
       4 ÷ 16 =  0, remainder  4
    1259 = 4EBH = 0100 1110 1011 = 10011101011
Fractional part (repeated multiplication by 2) :
    0.125 × 2 = 0.25 → 0
    0.25 × 2 = 0.5 → 0
    0.5 × 2 = 1.0 → 1
    0.125 = 0.001
Binary number = 10011101011 + 0.001 = 10011101011.001

Step 2 : Normalize the number :
$10011101011.001 = 1.0011101011001 \times 2^{10}$

Now we will see the representation of the number in single precision and double precision formats.

Single Precision

For the given number, S = 0, E = 10 and M = 0011101011001.
The bias for the single precision format is 127.
$E' = E + 127 = 10 + 127 = 137_{10} = 10001001_2$
The number in single precision format is given as
    0      10001001    00111010110010000000000
    Sign   Exponent    Mantissa (M followed by ten 0s, 23 bits in all)
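The hand conversion can be checked with Python's standard library: `float.hex` shows the normalized binary form in hexadecimal, and `struct.pack` gives the stored bit pattern. This is a verification sketch only; the 64-bit pattern is printed ahead of the double precision part of the example.

```python
import struct

x = 1259.125
print(x.hex())   # 0x1.3ac8000000000p+10, i.e. 1.0011101011001 x 2^10

single = struct.pack(">f", x).hex()   # 32-bit pattern as big-endian hex
word = int(single, 16)
fields = (
    format(word >> 31, "b"),              # sign S
    format((word >> 23) & 0xFF, "08b"),   # biased exponent E'
    format(word & 0x7FFFFF, "023b"),      # 23-bit mantissa M
)
print(single, fields)
# 449d6400 ('0', '10001001', '00111010110010000000000')

double = struct.pack(">d", x).hex()   # 64-bit pattern for double precision
print(double)    # 4093ac8000000000
```

The hexadecimal digits 3ac8 after the point in `x.hex()` are the bits 0011 1010 1100 1000, which is M with zeros appended.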


    Double Precision

    For

    a

    given

    number

    S