The Intel Pentium Processor


  • 8/9/2019 The Intel Pentium Processor


    The Intel Pentium Processor

    Bogdan Ilisie & Rika Kanai

    [Figure: Pentium, Pentium Pro, Pentium II and Pentium III processors]


    The first Intel Pentium was introduced to the market on March 22, 1993, with a CPU clock rate of 66 MHz.

    It brought many innovations, the most notable being:

    - Superscalar architecture
    - Dynamic Branch Prediction
    - Pipelined Integer Unit
    - Pipelined Floating-Point Unit

    These features made the new chip a popular choice for desktop computers, although the processor was later found to have some notorious implementation errors, most famously the FDIV bug in floating-point division.


    The Pentium CPU (MMX)


    Pipelined Integer Unit

    The Pentium's pipelined integer unit has five stages:

    1) PF: Pre-fetch
    2) D1: Decode
    3) D2: Address generate
    4) EX: Execute (ALU and cache access)
    5) WB: Writeback
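To make the overlap between the five stages concrete, here is a minimal Python sketch of one ideal pipeline with no stalls, hazards or pairing. The labels PF, D1, D2, EX and WB are the usual short names for Pre-fetch, Decode, Address generate, Execute and Writeback; the function names are hypothetical, and the model assumes one new instruction enters the pipeline every cycle.

```python
# Short names for the five integer pipeline stages, in order.
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_timeline(num_instructions):
    """Map each clock cycle to {stage: instruction index} for one ideal,
    stall-free pipeline that accepts one new instruction per cycle."""
    timeline = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i reaches stage s in cycle i + s + 1
            timeline.setdefault(cycle, {})[stage] = i
    return timeline


def total_cycles(num_instructions):
    """Cycles until the last instruction leaves WB: fill latency plus one per instruction."""
    if num_instructions == 0:
        return 0
    return num_instructions + len(STAGES) - 1


if __name__ == "__main__":
    n = 4
    timeline = pipeline_timeline(n)
    print("cycle " + " ".join(f"{s:>3}" for s in STAGES))
    for cycle in sorted(timeline):
        labels = [f"I{timeline[cycle][s]}" if s in timeline[cycle] else "-" for s in STAGES]
        print(f"{cycle:>5} " + " ".join(f"{lab:>3}" for lab in labels))
    print("total cycles:", total_cycles(n))  # 8 for four instructions
```

With a single pipeline, n instructions take n + 4 cycles. The first instruction needs four cycles to fill the pipeline, after which one instruction completes per cycle.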

    Although later processors such as the Pentium MMX modified these five steps (for example, by adding intermediate FIFO queues to hold groups of instructions), the steps remain the core foundation of the pipeline.

    As the previous diagram shows, the integer unit has two pipelines (U and V), while the floating-point unit (FPU) has one pipeline.


    1) In the Pre-fetch stage, two pre-fetch buffers read the instructions to be executed. The instructions are then issued to either the U or the V pipeline; the U pipeline can handle more complex instructions than the V pipeline.

    2) In the Decode stage, two decoders decode the instructions and try to pair them so that they can run in parallel, since the Pentium features a superscalar architecture.

    Even though the Pentium processor is superscalar, two instructions can run concurrently, as in the diagram below, only if they satisfy certain rules. Essentially, the instructions have to be independent; otherwise they cannot be paired.

    3) In the second Decode stage, also called the address generate stage, the addresses of memory operands are calculated. After these calculations, the instruction is ready for the EX stage of the pipeline.


    A floating-point instruction cannot be paired with an integer instruction.


    Pipelined Integer Unit (Conclusion)

    4) In the Execute stage, the instruction is carried out by the ALU.

    5) In the Writeback stage, the results are written back to the registers.

    If two paired instructions are executing concurrently in the pipeline (having satisfied the pairing conditions and being independent) and one of them stalls because of a hazard, the other one will also stall.

    For two instructions to be paired in the Decode stage, they must not depend on each other.

    The two paired instructions also have to be simple, in the sense that they contain no displacement or immediate operands.

    As a result, despite its superscalar capability, the processor will sometimes execute only one instruction at a time.
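The pairing conditions above can be summarized in a small Python sketch. It is a simplified model, not Intel's full pairing rules. The `Instr` record and its fields are hypothetical, as is the greedy in-order grouping. The `simple` flag stands in for the no-displacement/no-immediate rule, and only read-after-write and write-after-write register conflicts are checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified description of one instruction."""
    text: str
    dest: Optional[str]                 # register written, or None
    srcs: FrozenSet[str] = frozenset()  # registers read
    simple: bool = True                 # False if it uses a displacement or immediate
    is_fp: bool = False                 # floating-point instruction


def can_pair(u: Instr, v: Instr) -> bool:
    """Simplified pairing test for instruction u (U pipe) followed by v (V pipe)."""
    if not (u.simple and v.simple):
        return False
    if u.is_fp or v.is_fp:
        # The slides only say FP and integer instructions cannot pair;
        # this sketch conservatively never pairs FP instructions at all.
        return False
    if u.dest is not None and (u.dest in v.srcs or u.dest == v.dest):
        return False  # v reads (RAW) or overwrites (WAW) u's result
    return True


def issue_groups(instrs):
    """Greedily issue instructions in order, two at a time when they can pair."""
    groups, i = [], 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_pair(instrs[i], instrs[i + 1]):
            groups.append((instrs[i], instrs[i + 1]))
            i += 2
        else:
            groups.append((instrs[i],))
            i += 1
    return groups


# Hypothetical example program.
EXAMPLE = [
    Instr("mov eax, ebx", dest="eax", srcs=frozenset({"ebx"})),
    Instr("add eax, ecx", dest="eax", srcs=frozenset({"eax", "ecx"})),
    Instr("mov edx, esi", dest="edx", srcs=frozenset({"esi"})),
    Instr("add edi, ebx", dest="edi", srcs=frozenset({"edi", "ebx"})),
]

if __name__ == "__main__":
    for group in issue_groups(EXAMPLE):
        print(" | ".join(ins.text for ins in group))
```

In the example, `add eax, ecx` cannot pair with the preceding `mov eax, ebx` because it reads `eax`, so the first instruction issues alone. The second and third instructions are independent and issue together. The last instruction issues alone because nothing follows it, so four instructions take three issue slots instead of two.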


    Branch Prediction

    Besides the superscalar ability of the Pentium processor, its branch prediction mechanism is another much-debated improvement.

    Predicting branch behavior can have a very strong impact on a machine's performance, since a wrong prediction results in a flush of the pipelines and wasted cycles.

    The branch prediction mechanism is built around a branch target buffer (BTB), which stores information about branches that have already been encountered.

    Whether a jump is predicted depends on the branch's previous behavior. There are four possible states that describe a branch's tendency to jump:

    State 0: Very unlikely that a jump will occur
    State 1: Unlikely that a jump will occur
    State 2: Likely that a jump will occur
    State 3: Very likely that a jump will occur


    Branch Prediction

    Once a branch has its address in the branch target buffer, its behavior is tracked.

    The diagram shows the four states associated with branch prediction.

    If a branch in State 2 does not jump two times in a row, it goes down to State 0.

    Once in State 0, the algorithm will not predict another jump unless the branch jumps two consecutive times (moving it from State 0 to State 2).

    Once in State 3, the algorithm will not predict a no-jump unless the branch is not taken two consecutive times.
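The four-state scheme described above behaves like a standard 2-bit saturating counter. The Python sketch below models it under that assumption; the class and function names are hypothetical. A jump moves the state one step toward State 3 and a non-jump moves it one step toward State 0. States 2 and 3 predict a jump. Only the counter is modelled, not the branch target buffer.

```python
class TwoBitCounter:
    """2-bit saturating counter: States 0-1 predict 'no jump', States 2-3 predict 'jump'."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one state toward 3 on a jump and toward 0 otherwise, saturating at both ends.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def simulate(outcomes, start=0):
    """Feed a sequence of branch outcomes; return the predictions made and the final state."""
    counter = TwoBitCounter(start)
    predictions = []
    for taken in outcomes:
        predictions.append(counter.predict())  # predict before the outcome is known
        counter.update(taken)
    return predictions, counter.state


if __name__ == "__main__":
    preds, final = simulate([True, True, True, False, False], start=0)
    print(preds, final)  # [False, False, True, True, True] 1
```

Starting from State 0, two jumps are needed before a jump is predicted. Starting from State 3, two non-jumps are needed before a no-jump is predicted, which matches the transitions described above.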


    Branch Prediction

    It is actually believed that the Pentium's branch prediction algorithm is flawed.

    As the diagram to the right shows, a branch can move directly from State 0 to State 3, instead of following the usual path through State 1 and State 2.

    This abnormality might be attributed to the way the branch target buffer operates:

    - If a branch is not found in the branch target buffer, it is predicted not to jump.
    - A branch does not get an entry in the branch target buffer until the first time it jumps, and when it does, it goes straight into State 3.
    - Because of this, a branch that is not yet in the buffer effectively sits in State 0, and its first jump takes it directly to State 3, which alters the state diagram as shown.

    More information about this problem can be found at http://x86.ddj.com/articles/branch/branchprediction.htm
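The behaviour described on this slide can be modelled by wrapping the same counter logic in a lookup table keyed by branch address. This is a hypothetical sketch based only on the three points above. A branch missing from the buffer is predicted not to jump, a branch is allocated only on its first jump, and a new entry starts in State 3. Buffer capacity, associativity and eviction are ignored.

```python
class PentiumLikeBTB:
    """Hypothetical model of the described BTB behaviour (no capacity limit or eviction)."""

    def __init__(self):
        self.entries = {}  # branch address -> state (0..3)

    def predict(self, addr: int) -> bool:
        state = self.entries.get(addr)
        return state is not None and state >= 2  # a miss means "no jump"

    def update(self, addr: int, taken: bool) -> None:
        state = self.entries.get(addr)
        if state is None:
            if taken:
                self.entries[addr] = 3  # first jump: allocated straight into State 3
            return  # branches that have never jumped get no entry
        self.entries[addr] = min(state + 1, 3) if taken else max(state - 1, 0)


def trace(outcomes, addr=0x40):
    """Return (prediction before each outcome, state after it) for one branch address."""
    btb = PentiumLikeBTB()
    steps = []
    for taken in outcomes:
        prediction = btb.predict(addr)
        btb.update(addr, taken)
        steps.append((prediction, btb.entries.get(addr)))
    return steps


if __name__ == "__main__":
    for prediction, state in trace([False, True, True, False, False, False]):
        print(prediction, state)
```

Unlike the plain counter starting in State 0, which needs two jumps before it predicts a jump, this model predicts a jump immediately after a branch's first jump. That is the shortcut from State 0 to State 3 shown in the diagram.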


    Branch Prediction (in later Pentium Models)

    The Pentium's branch prediction algorithm is indeed better than a 50% guess, but it has limitations.

    To increase the accuracy of branch predictions, the processors that followed the Pentium adopted a different branch prediction algorithm.

    Some branches, such as those in loops, follow repetitive patterns that need to be recognized, and a single 2-bit counter cannot capture such patterns. For example, a branch that alternates between jumping and not jumping defeats it.

    Later-generation processors, such as the Pentium MMX, Pentium Pro, and Pentium II, use another mechanism for branch prediction.

    A 4-bit register records the previous behavior of the branch. For example, a value of 0001 means that, out of its last four executions, the branch jumped only the last time.

    On its own, a 4-bit register would not be of much use. Alongside it, there are sixteen 2-bit counters like the ones shown previously.


    Branch Prediction (in later Pentium Models)

    By combining a 4-bit register that records the behavior of the branch with sixteen 2-bit counters, the mechanism is able to give more accurate branch predictions.

    Since the register has 4 bits, it has 16 possible values, so its current value can always be associated with one of the 16 counters, as shown in the diagram to the right.

    Each value of the 4-bit register represents a trend of that branch, and for each trend we must be able to predict the next outcome.

    Since each register value points to a different 2-bit counter, that counter's state will most likely give the correct prediction for that particular pattern.

    Therefore, by combining a 4-bit register that records past trends with 16 individually updated 2-bit counters, we end up with a much stronger prediction mechanism, which is used in the Pentium MMX, Pentium II, and others.


    Newer Generation Chips

    The next step up from the Pentium was the Pentium MMX.

    The Pentium MMX includes new instructions, registers, and data types aimed at maximizing the speed of multimedia computations.

    Since multimedia work requires massive data manipulation, the MMX extension added SIMD (single instruction, multiple data) instructions. SIMD instructions operate on multiple data values at once, maximizing the amount of work done by each instruction.
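To illustrate what operating on multiple data values at once means, the Python sketch below emulates the semantics of a packed byte addition on a 64-bit value holding eight 8-bit lanes. It covers the wraparound form, as in MMX `PADDB`, and the unsigned-saturation form, as in MMX `PADDUSB`. The function names are hypothetical. A real MMX instruction performs all eight additions at once, whereas this emulation loops over the lanes only to show the result.

```python
LANES = 8  # a 64-bit MMX register holds eight 8-bit values


def pack(values):
    """Pack up to eight byte values (lane 0 in the lowest byte) into a 64-bit integer."""
    word = 0
    for lane, v in enumerate(values[:LANES]):
        word |= (v & 0xFF) << (8 * lane)
    return word


def unpack(word):
    """Split a 64-bit integer back into its eight byte lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(LANES)]


def packed_add_bytes(a, b, saturate=False):
    """Add eight byte lanes independently; carries never cross into the next lane.

    saturate=False mimics PADDB (each result wraps modulo 256);
    saturate=True mimics PADDUSB (each result is clamped to 255).
    """
    result = 0
    for lane in range(LANES):
        shift = 8 * lane
        s = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        s = min(s, 0xFF) if saturate else s & 0xFF
        result |= s << shift
    return result


if __name__ == "__main__":
    pixels = pack([1, 2, 3, 4, 5, 6, 7, 250])
    brighten = pack([10] * LANES)
    print(unpack(packed_add_bytes(pixels, brighten)))                 # [11, 12, 13, 14, 15, 16, 17, 4]
    print(unpack(packed_add_bytes(pixels, brighten, saturate=True)))  # [11, 12, 13, 14, 15, 16, 17, 255]
```

Saturation matters for data such as pixel brightness. In the last lane, a wraparound add turns a bright 250 into a dark 4, while a saturating add stops at 255.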

    The improved multimedia support of the MMX, along with lower power consumption, larger caches, and new branch prediction mechanisms, paved the way for the next generations of Pentiums (II and III).