Assembly Language x86 Family Architecture Motaz K. Saad Spring 2007 1 Motaz K. Saad, Dept. of CS

The x86 Family


Overview
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System


General Concepts

• Basic microcomputer design
• Instruction execution cycle
• Reading from memory
• How programs run


Basic Microcomputer Design
• Clock synchronizes CPU operations
• Control unit (CU) coordinates sequence of execution steps
• ALU performs arithmetic and bitwise processing


[Figure: block diagram of a processor (control unit and arithmetic logic unit) connected to memory, input devices, output devices, and storage devices; instructions, data, and information flow between them.]


Clock
• Synchronizes all CPU and bus operations
• Machine (clock) cycle measures time of a single operation
• Clock is used to trigger events

[Figure: square-wave clock signal alternating between 1 and 0, with one cycle marked.]


Instruction Execution Cycle

• Fetch
• Decode
• Fetch operands
• Execute
• Store output
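The five steps above can be traced in a small simulation. This is only a sketch: the toy machine, its opcodes (LOAD, ADD, STORE), and the tuple instruction format are hypothetical and chosen for illustration; they are not real x86 encodings.

```python
# Toy machine for illustration only; the opcodes and tuple format are
# hypothetical and do not correspond to real x86 encodings.

def run(program, memory):
    """Run (opcode, destination, source) tuples and return the registers."""
    registers = {"A": 0, "B": 0}
    ip = 0  # instruction pointer: index of the next instruction
    while ip < len(program):
        instruction = program[ip]          # fetch
        ip += 1
        opcode, dest, src = instruction    # decode
        # fetch operands: a string names a register, an int is a memory address
        value = registers[src] if isinstance(src, str) else memory[src]
        if opcode in ("LOAD", "STORE"):    # execute
            result = value
        elif opcode == "ADD":
            result = registers[dest] + value
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # store output: back to a register or to memory
        if isinstance(dest, str):
            registers[dest] = result
        else:
            memory[dest] = result
    return registers


memory = [5, 7, 0]
program = [("LOAD", "A", 0), ("ADD", "A", 1), ("STORE", 2, "A")]
registers = run(program, memory)
print(registers["A"], memory)  # 12 [5, 7, 12]
```

A real processor performs these steps in hardware and overlaps them across instructions; the loop here only makes the order of the steps visible.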


Cache Memory
• High-speed, expensive static RAM both inside and outside the CPU.
  – Level-1 cache: inside the CPU
  – Level-2 cache: outside the CPU
• Cache hit: when data to be read is already in cache memory
• Cache miss: when data to be read is not in cache memory.
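Hits and misses can be counted with a small model. The sketch below assumes a direct-mapped cache that stores one address per line; the function name, the line count, and the address trace are illustrative choices, not details from the slides.

```python
def simulate_cache(addresses, num_lines=4):
    """Count hits and misses for a tiny direct-mapped cache (illustrative model)."""
    lines = [None] * num_lines        # address currently held in each cache line
    hits = misses = 0
    for address in addresses:
        index = address % num_lines   # each address maps to exactly one line
        if lines[index] == address:
            hits += 1                 # cache hit: data already in the cache
        else:
            misses += 1               # cache miss: load it, evicting the old entry
            lines[index] = address
    return hits, misses


print(simulate_cache([0, 1, 2, 0, 1, 4, 0]))  # (2, 5)
```

In the trace, address 4 maps to the same line as address 0 and evicts it, so the final read of address 0 misses even though it was cached earlier.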


How a Program Runs


Multitasking

• OS can run multiple programs at the same time.
• Multiple threads of execution within the same program.
• Scheduler utility assigns a given amount of CPU time to each running program.
• Rapid switching of tasks
  – gives illusion that all programs are running at once
  – the processor must support task switching.


IA-32 Processor Architecture

• Modes of operation
• Basic execution environment
• Floating-point unit
• Intel Microprocessor history


Modes of Operation
• Protected mode
  – native mode (Windows, Linux)
• Real-address mode
  – native MS-DOS
• System management mode
  – power management, system security, diagnostics
• Virtual-8086 mode
  – hybrid of Protected mode
  – each program has its own 8086 computer


Basic Execution Environment
• Addressable memory
• General-purpose registers
• Index and base registers
• Specialized register uses
• Status flags
• Floating-point, MMX, XMM registers


Addressable Memory

• Protected mode
  – 4 GB
  – 32-bit address
• Real-address and Virtual-8086 modes
  – 1 MB space
  – 20-bit address
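The memory sizes above follow from the address width: n address bits select 2^n distinct byte addresses. A minimal sketch of that arithmetic (the helper name `describe` is made up for this example):

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]

def describe(address_bits):
    """Return the memory reachable with the given number of address bits."""
    size = 2 ** address_bits          # one byte per distinct address
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (20, 32):
    print(bits, "bits ->", describe(bits))
# 20 bits -> 1 MB
# 32 bits -> 4 GB
```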


X86 General-Purpose Registers

Named storage locations inside the CPU, optimized for speed.

32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI

16-bit Segment Registers: CS, SS, DS, ES, FS, GS

Other registers: EIP, EFLAGS


Accessing Parts of Registers
• Use 8-bit name, 16-bit name, or 32-bit name
• Applies to EAX, EBX, ECX, and EDX
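For EAX, the 16-bit name AX refers to the low 16 bits, and the 8-bit names AH and AL refer to the high and low bytes of AX. The sketch below models that overlap with bit masks; the Python function names are illustrative only.

```python
def register_views(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return ax, ah, al

def write_al(eax, value):
    """Return EAX after storing an 8-bit value in AL; other bits are unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = register_views(0x12345678)
print(hex(ax), hex(ah), hex(al))          # 0x5678 0x56 0x78
print(hex(write_al(0x12345678, 0xFF)))    # 0x123456ff
```

The same pattern applies to EBX (BX, BH, BL), ECX (CX, CH, CL), and EDX (DX, DH, DL).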


Index and Base Registers
• Some registers have only a 16-bit name for their lower half: ESI (SI), EDI (DI), EBP (BP), ESP (SP)


Some Specialized Register Uses
• Segment
  – CS – code segment
  – DS – data segment
  – SS – stack segment
  – ES, FS, GS – additional segments
• EIP – instruction pointer
• EFLAGS
  – status and control flags
  – each flag is a single binary bit
• General-Purpose
  – EAX – accumulator
  – EBX – base register
  – ECX – loop counter
  – EDX – data register
  – ESP – stack pointer
  – ESI, EDI – index registers
  – EBP – extended frame pointer (stack)


Status Flags
• Carry
  – unsigned arithmetic out of range
• Overflow
  – signed arithmetic out of range
• Sign
  – result is negative
• Zero
  – result is zero
• Auxiliary Carry
  – carry from bit 3 to bit 4
• Parity
  – the low byte of the result contains an even number of 1 bits
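To see how these flags are derived, the sketch below computes them for an 8-bit addition. It is a Python model written for this example, not processor code; the flag abbreviations (CF, OF, SF, ZF, AF, PF) are the usual x86 names.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) as the flags are defined above."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                                # unsigned result out of range
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,   # signed result out of range
        "SF": (result & 0x80) != 0,                        # sign bit set: negative
        "ZF": result == 0,                                 # result is zero
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,            # carry from bit 3 to bit 4
        "PF": bin(result).count("1") % 2 == 0,             # even number of 1 bits
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), [name for name, on in flags.items() if on])
# 0x80 ['OF', 'SF', 'AF']
```

Adding 1 to 0x7F (+127) gives 0x80, which is in range as an unsigned value (no Carry) but out of range as a signed value (Overflow set).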


Intel Microprocessor History

• Intel 8086, 80286
• IA-32 processor family
• P6 processor family
• CISC and RISC


Early Intel Microprocessors
• Intel 8080
  – 64K addressable RAM
  – 8-bit registers
  – CP/M operating system
  – S-100 BUS architecture
  – 8-inch floppy disks!
• Intel 8086/8088
  – IBM-PC used 8088
  – 1 MB addressable RAM
  – 16-bit registers
  – 16-bit data bus (8-bit for 8088)
  – separate floating-point unit (8087)


The IBM-AT

• Intel 80286
  – 16 MB addressable RAM
  – Protected memory
  – several times faster than 8086
  – introduced IDE bus architecture
  – 80287 floating point unit


Intel IA-32 Family
• Intel386
  – 4 GB addressable RAM, 32-bit registers, paging (virtual memory)
• Intel486
  – instruction pipelining
• Pentium
  – superscalar, 32-bit address bus, 64-bit internal data path


Intel P6 Family
• Pentium Pro
  – advanced optimization techniques in microcode
• Pentium II
  – MMX (multimedia) instruction set
• Pentium III
  – SIMD (streaming extensions) instructions
• Pentium 4 and Xeon
  – Intel NetBurst micro-architecture, tuned for multimedia


CISC and RISC
• CISC – complex instruction set
  – large instruction set
  – high-level operations
  – requires microcode interpreter
  – examples: Intel 80x86 family
• RISC – reduced instruction set
  – simple, atomic instructions
  – small instruction set
  – directly executed by hardware
  – examples:
    • ARM (Advanced RISC Machines)
    • DEC Alpha (now Compaq)


IA-32 Memory Management

• Real-address mode
• Calculating linear addresses
• Protected mode
• Multi-segment model
• Paging


Real-Address mode

• 1 MB RAM maximum addressable
• Application programs can access any area of memory
• Single tasking
• Supported by MS-DOS operating system


Segmented Memory

Segmented memory addressing: absolute (linear) address is a combination of a 16-bit segment value added to a 16-bit offset

[Figure: the range of linear addresses divided into segments, with one segment marked.]


Calculating Linear Addresses
• Given a segment address, multiply it by 16 (append a hexadecimal zero), and add it to the offset
• Example: convert 08F1:0100 to a linear address

Adjusted Segment value: 0 8 F 1 0

Add the offset: 0 1 0 0

Linear address: 0 9 0 1 0


Your turn . . .

What linear address corresponds to the segment/offset address 028F:0030?

028F0 + 0030 = 02920

Always use hexadecimal notation for addresses.
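The calculation can be checked mechanically. In this sketch, shifting the segment left by 4 bits is the same as multiplying by 16; the function name and the "SSSS:OOOO" string format are choices made for this example.

```python
def linear_address(text):
    """Convert a real-mode 'segment:offset' string (hexadecimal) to a linear address."""
    segment_text, offset_text = text.split(":")
    segment = int(segment_text, 16)
    offset = int(offset_text, 16)
    return (segment << 4) + offset    # segment * 16 + offset


print(f"{linear_address('08F1:0100'):05X}")  # 09010
print(f"{linear_address('028F:0030'):05X}")  # 02920
```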


Your turn . . .

What segment addresses correspond to the linear address 28F30h?

Many different segment-offset addresses can produce the linear address 28F30h. For example:

28F0:0030, 28F3:0000, 28B0:0430, . . .
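The full set of answers can be enumerated by trying every 16-bit segment value and keeping those whose remaining offset fits in 16 bits. A brute-force sketch (the function name is illustrative):

```python
def segment_offset_pairs(linear):
    """List every (segment, offset) pair of 16-bit values that yields the linear address."""
    pairs = []
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


pairs = segment_offset_pairs(0x28F30)
print(len(pairs))                  # 4096
print((0x28F0, 0x0030) in pairs)   # True
```

Because consecutive segments start only 16 bytes apart while each spans 64 KB, this address is reachable through 4096 different segment values.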


Protected Mode (1 of 2)

• 4 GB addressable RAM
  – (00000000h to FFFFFFFFh)
• Each program assigned a memory partition which is protected from other programs
• Designed for multitasking
• Supported by Linux & MS-Windows


Protected mode (2 of 2)

• Segment descriptor tables
• Program structure
  – code, data, and stack areas
  – CS, DS, SS segment descriptors
  – global descriptor table (GDT)
• MASM programs use the Microsoft flat memory model


Components of an IA-32 Microcomputer

• Motherboard
• Video output
• Memory
• Input-output ports


Motherboard

• CPU socket
• External cache memory slots
• Main memory slots
• BIOS chips
• Sound synthesizer chip (optional)
• Video controller chip (optional)
• IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors
• PCI bus connectors (expansion cards)


Intel D850MD Motherboard

[Figure: photograph of the Intel D850MD motherboard with labeled components: dynamic RAM; Pentium 4 socket; speaker; IDE drive connectors; mouse, keyboard, parallel, serial, and USB connectors; AGP slot; battery; video; power connector; memory controller hub; diskette connector; PCI slots; I/O controller; firmware hub; audio chip.]

Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification


Video Output

• Video controller
  – on motherboard, or on expansion card
  – AGP (accelerated graphics port technology)
• Video memory (VRAM)
• Video CRT Display
  – uses raster scanning
  – horizontal retrace
  – vertical retrace
• Direct digital LCD monitors
  – no raster scanning required


Sample Video Controller (ATI Corp.)

• 128-bit 3D graphics performance powered by RAGE™ 128 PRO

• 3D graphics performance

• Intelligent TV-Tuner with Digital VCR

• TV-ON-DEMAND™

• Interactive Program Guide

• Still image and MPEG-2 motion video capture

• Video editing

• Hardware DVD video playback

• Video output to TV or VCR


Memory
• ROM
  – read-only memory
• EPROM
  – erasable programmable read-only memory
• Dynamic RAM (DRAM)
  – inexpensive; must be refreshed constantly
• Static RAM (SRAM)
  – expensive; used for cache memory; no refresh required
• Video RAM (VRAM)
  – dual ported; optimized for constant video refresh
• CMOS RAM
  – complementary metal-oxide semiconductor
  – system setup information
• See: Intel platform memory (Intel technology brief)


Input-Output Ports
• USB (universal serial bus)
  – intelligent high-speed connection to devices
  – up to 12 megabits/second
  – USB hub connects multiple devices
  – enumeration: computer queries devices
  – supports hot connections
• Parallel
  – short cable, high speed
  – common for printers
  – bidirectional, parallel data transfer
  – Intel 8255 controller chip


Input-Output Ports (cont)

• Serial
  – RS-232 serial port
  – one bit at a time
  – uses long cables and modems
  – 16550 UART (universal asynchronous receiver transmitter)
  – programmable in assembly language


Levels of Input-Output
• Level 3: Call a library function (C++, Java)
  – easy to do; abstracted from hardware; details hidden
  – slowest performance
• Level 2: Call an operating system function
  – specific to one OS; device-independent
  – medium performance
• Level 1: Call a BIOS (basic input-output system) function
  – may produce different results on different systems
  – knowledge of hardware required
  – usually good performance
• Level 0: Communicate directly with the hardware
  – may not be allowed by some operating systems


Displaying a String of Characters

When a HLL program displays a string of characters, the following steps take place:


ASM Programming Levels

ASM programs can perform input-output at each of the following levels:


Summary
• Central Processing Unit (CPU)
• Arithmetic Logic Unit (ALU)
• Instruction execution cycle
• Multitasking
• Floating Point Unit (FPU)
• Complex Instruction Set
• Real mode and Protected mode
• Motherboard components
• Memory types
• Input/Output and access levels


More Details about X86 Family Architecture

X86 family Generations


X86 Family
• 8086 and 8088 Microprocessors
• 80x86 architecture

Address bus: 20 bits (16 bits for 8-bit chips); max. memory capacity: 1 MB

Internal structure is divided into a bus interface unit (BIU) and an execution unit (EU), so instruction fetch and execution can occur simultaneously

Length of internal registers expanded from 8 bits to 16 bits

H/W multiply and divide instructions built into the processor

Support for an external math coprocessor that performs floating-point operations in H/W as much as 100 times faster


Intel 8085 architecture : 8-bit data, 16-bit address


Internal architecture of 8086


PC Standard

A 16-bit data bus requires two 8-bit memory banks, which was expensive at the time.

In 1979, Intel announced the 8088 µ-P, which is identical to the 8086 except for its external 8-bit data bus. Two memory accesses are needed to read a word.

IBM announced the IBM-PC, using the 8088 µ-P and 16 KB of memory (expandable to 64 KB). Clock speed: 4.77 MHz. This defined the PC standard.


80186 and 80188 Microprocessors

High-integration CPUs: each includes an 8086 (or 8088) core and a clock generator, a programmable timer, an interrupt controller, a DMA controller, etc.

The instruction set is fully compatible with the 8086 and 8088, but includes 9 new instructions.

Used for IBM-PC compatibles and many embedded computers.


80286 Microprocessor

Processor of the IBM PC-AT
Provides two programming modes

1) Real mode
 - functions exactly the same as the 8086
 - uses only the 20 least significant address lines (max. 1 MB)
 - faster than the 8086 due to redesign and a higher clock rate

2) Protected mode
 - 16 new instructions are added
 - supports a multi-program environment by giving each program a predetermined amount of memory (16 MB)
 - programs no longer have physical addresses, but are addressed by a segment selector
 - several programs can be loaded into memory at the same time, but protected from each other (*MS-DOS)


The 8086 and 80286 microprocessors.


80386 Microprocessor

New standard announced by Intel in 1985, Intel Architecture-32 (IA-32), with a commitment that successive u-P generations would remain compatible with this chip through 2000.

Data bus & internal registers: 32 bits
Address bus: 32 bits, max. 4 GB of physical memory


Internal architecture of 80386


Internal registers (partly) of 80386


80386 supports two operating modes (like 80286)

1) Real Address Mode
 - used by MS-DOS
 - in this mode, the 80386 becomes a fast 8086

2) Protected Virtual Address Mode (Protected Mode)
 - on-board MMU manages 4 GB of memory
 - each task is given a segment of memory governed by a descriptor register that defines the segment base address, the segment limit, and the attributes for the segment (code, data, read-only, etc.)
 - uses a paging technique: 4 KB pages can be swapped in and out of memory (using a disk) to allow a task to have a virtual memory space as large as 64 TB
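The paging figures can be reproduced with a short calculation. The sketch below splits a linear address into a 4 KB page number and an offset, and then shows one way to arrive at 64 TB. The segment count used for that second step (2^14 segments of up to 4 GB each) is an assumption taken from general IA-32 background, not something stated on the slide.

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages, as on the slide

def split_linear(address):
    """Return (page number, offset within the page) for a linear address."""
    return address // PAGE_SIZE, address % PAGE_SIZE

# Assumption (general IA-32 background, not from the slide):
# a task can reach 2**14 segments, each up to 4 GB (2**32 bytes).
SEGMENTS = 2 ** 14
SEGMENT_SIZE = 2 ** 32
virtual_space = SEGMENTS * SEGMENT_SIZE

page, offset = split_linear(0x00403ABC)
print(hex(page), hex(offset))           # 0x403 0xabc
print(virtual_space // 2 ** 40, "TB")   # 64 TB
```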


When operating with 64 KB of cache, the 386 achieves a hit rate of 93%, so the processor operates at full speed 93% of the time.

Instruction set of 386 is 100% compatible with the older processors in the family.

14 new instructions are added and several others are modified. [ex] data can be moved between the internal registers at a time.

80386SX : designed to ease the transition from 16- to 32-bit processors --- 16-bit external data bus and 24-bit address bus.


80486 Microprocessor

Maintains compatibility with the older u-Ps
Only 6 new instructions are added, to be used by OS S/W, not by application programs.
Redesigned using RISC concepts: frequently used instructions execute in a single clock cycle.
New 5-stage instruction execution pipeline: up to 5 instructions can be in execution at once.
On-board 8K cache and 80387 coprocessor: twice as fast as the 386 (20 MHz 387 = 40 MHz 386)

486SX: excludes the 80387; designed for low-end applications that do not require a coprocessor.
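The benefit of a 5-stage pipeline can be estimated with an idealized cycle count. This sketch assumes every stage takes exactly one clock cycle and that there are no stalls; real pipelines lose cycles to branches and data dependencies. The function names are made up for this example.

```python
def cycles_unpipelined(instructions, stages=5):
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages

def cycles_pipelined(instructions, stages=5):
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


print(cycles_unpipelined(100), cycles_pipelined(100))  # 500 104
```

For long instruction sequences the ratio approaches the number of stages, which is why the ideal throughput is described as one instruction per clock cycle.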


486DX2 and DX4

DX2: the internal clock rate is twice the external clock.
DX4: the internal clock rate is three times the external clock.

This allows less expensive components to be used on the computer system board while the processor operates at its maximum data rate internally.

[Ex] 486DX2 66: 66 MHz (int. clock) & 33 MHz (ext. clock)
     486DX4 100: 100 MHz (int. clock) & 33 MHz (ext. clock)

Overdrive Processors: 486 system boards include an overdrive socket to allow users to upgrade a low-speed 486DX or 486SX with 486DX2 and DX4 style processors.


Pentium

Superscalar Architecture: provides two instruction execution pipelines, each with its own ALU, address generation circuitry, and data cache interface, so it can execute two different instructions simultaneously

Additional Features:
• includes on-board cache (separate 8K instruction cache and data cache) and a coprocessor
• 8-stage instruction pipelines
• achieves 5~8 times the floating-point performance of the 486
• external data bus: 64 bits
• about twice as fast as the 486


Key features of the Pentium microprocessor. The execution unit has two pipelines allowing two instructions to be executed simultaneously.


MMX (Multimedia Extension): provides 3 architectural enhancements over the non-MMX Pentium

① 57 instructions are added for multimedia (audio, video, and graphic data) applications.

② SIMD (Single-Instruction stream Multiple-Data stream) allows the same operation to be performed on multiple data items. Because many multimedia applications require large blocks of data to be manipulated, SIMD provides a significant performance enhancement.

③ Internal cache size is increased from 16K to 32K.

For general applications, performance improves by 10~20%. For multimedia applications, it improves by nearly 70%.
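The SIMD idea in ② can be modeled by treating one 64-bit value as eight independent byte lanes. The sketch below is loosely modeled on the MMX packed byte add (PADDB, which wraps around on overflow); it is a plain Python illustration written for this example and loops over the lanes, whereas the hardware processes all eight at once.

```python
def packed_add_bytes(x, y):
    """Add eight packed bytes lane by lane; each lane wraps around at 256."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (x >> shift) & 0xFF
        b = (y >> shift) & 0xFF
        result |= ((a + b) & 0xFF) << shift   # a carry never spills into the next lane
    return result


total = packed_add_bytes(0x01020304050607FF, 0x0101010101010101)
print(f"{total:016X}")  # 0203040506070800
```

The lowest lane holds 0xFF + 0x01 and wraps to 0x00 without disturbing its neighbor, which is what distinguishes a packed add from an ordinary 64-bit addition.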


Socket 7 : ZIF(zero insertion force) socket

Pentium chip: 296-pin PGA package. A heat sink and fan are mounted atop the chip, and the entire assembly is plugged into a ZIF socket called Socket 7.

Socket 7 defines a platform that specifies the front side bus connection to the L2 cache, disk interface, video interface, and the ISA and PCI expansion buses.


Pentium processor with heat sink and fan mated to a Socket 7 connector.


Pentium Pro

6th-generation processors (Pentium Pro, Pentium II, Pentium III and Celeron)

36 address lines: max. 64 GB memory

New features
1. Inclusion of L2 cache in the same package with the processor
2. New system board platforms called Socket 8 (Pro), Slot 1 & 2 (Pentium II, III, and Celeron), and Socket 370 (Pentium III and Celeron)
3. New instruction architecture based on Dynamic Execution

Two chips in one package: the Pentium Pro consists of two separate silicon dies, one for the processor and the other for the 256 KB L2 cache.


The Pentium Pro is two chips in one. The larger die is the processor, the smaller a 256K L2 cache. (Courtesy of Intel Corporation.)


Dynamic Execution: a new approach to processing S/W instructions that reduces idle processor time.

① Multiple Branch Prediction: the Pentium Pro can look as far as 30 instructions ahead to anticipate conditional branches, which reduces wasted pipeline clocks.

② Data Flow Analysis: looks at upcoming S/W instructions to find the optimal sequence of processing.

③ Speculative Execution: allows instructions to be executed in a different order from that in which they entered the processor = “out-of-order execution”. The results of these instructions are stored as speculative results until their final states can be determined.


Superscalar Processor of Degree Three: the Pentium Pro has three instruction decoders and can execute 3 instructions simultaneously.

Internal Cache : L2 cache in the same package.


Pentium II

• Pentium Pro is dead (short life) due to
 - the lack of MMX instructions
 - use of the expensive dual- and tri-cavity package

Pentium II is a Pentium Pro with MMX technology, repackaged in a new single-edge contact (SEC) cartridge that is inserted in a “Slot 1 connector – 242 pins” or “Slot 2 connector – 330 pins”

Processor and L2 are mounted on a ceramic substrate (silicon dies are separate)

Processor clock: 300 ~ 450 MHz, bus clock: 100 MHz
L1 (32 KB) & L2 (512 KB) with 64-bit dedicated bus


Exploded view of single-edge contact (SEC) cartridge. (Courtesy of Intel Corporation.)


Installing the SEC cartridge into the retention mechanism. (Courtesy of Intel Corporation.)


Celeron

Pentium II without L2 cache (Pentium II SX ?)

Uses the Slot 1 connector without the plastic cover, the so-called “naked CPU”

Celeron A: includes 128 KB L2 cache on the same die with the processor.

- Drawback: 66 MHz bus cycle
- 370-pin PGA package (called Socket 370)


The Celeron processor is a Pentium II without the L2 cache. Later versions, called the Celeron A, include this cache on the same silicon die with the processor. (Courtesy of Intel Corporation.)


Pentium III

Higher clock speed : based on the Pentium II core, with 600MHz clock and an external bus freq. of 133MHz

70 new streaming SIMD extensions (SSE):
- 50 to improve floating-point performance
- 12 to improve multimedia processing
- 8 to improve the efficiency of L1 cache


The Pentium III microprocessor with integrated L2 cache. This chip has more than 22 million transistors. (Courtesy of Intel Corporation.)


Xeon Processors

Scalability : As processing demands increase, additional processors can be interconnected to keep pace.

- One of the advantages of the Pentium Pro is that it can support up to 4 processors: SMP (symmetric multiprocessing)

The Pentium II Xeon processor can be scaled to 2, 4, 8 or more processors, and is used for high-end servers and workstations.

Pentium III Xeon processor: similar, but offers the streaming SIMD technology.


P7 Itanium

IA-64 : 7th-generation processor architecture, Code name = Merced

64-bit architecture: 128 64-bit registers & 128 82-bit floating-point registers (including hidden bits)
[c.f.] IA-32: 10 32-bit reg., 8 fl-pt. reg.

Explicit parallelism: instructions are packed in 128-bit bundles ready for execution. Each bundle consists of three 41-bit instructions and a 5-bit template. All three instructions are dispatched in parallel.
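The bundle arithmetic (3 × 41 + 5 = 128 bits) can be made concrete by packing three slots and a template into one integer. This is only a sketch: the bit positions chosen here (template in the lowest 5 bits, then slots 0 to 2) are an assumption for illustration, and the slot values are arbitrary numbers rather than real IA-64 instruction encodings.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    bundle = template & TEMPLATE_MASK            # assumed position: lowest 5 bits
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + SLOT_BITS * i)
    return bundle

def unpack_bundle(bundle):
    """Recover (template, [slot0, slot1, slot2]) from a packed bundle."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
             for i in range(3)]
    return template, slots


bundle = pack_bundle(0x10, [1, 2, 3])
print(bundle.bit_length() <= 128)   # True
print(unpack_bundle(bundle))        # (16, [1, 2, 3])
```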


Speculation : preload data to minimize memory delays when data is needed

Predication : When a conditional branch instruction is encountered, Itanium follows both branch paths, then commits the results of the correct path only.

Data bus : 128 bits

Address bus: 64 bits, max. 2^64 bytes of memory


80x86 Compatible Microprocessors

Second Sources: companies that manufacture 80x86 u-P chips under license from Intel.

Clones and Look-Alikes

Pin-for-pin replacements with all of the same features as the Intel processor.

[Ex] AMD 386DX, 486DX4-100, Cyrix 5x86, etc.


The AMD K7 or Athlon processor. It mates to a new proprietary socket called Slot A. (Courtesy of Advanced Micro Devices.)


Measuring Processor Performance

Benchmark programs: used to measure the performance of a computer system (system benchmarks) or of a component in that system such as the processor, disk, video card, or main memory (component benchmarks).

Component-level Benchmarks

• Whetstone: used to measure the time to execute integer and floating-point arithmetic instructions and “if” statements. It includes a high percentage of fl.-pt. operations and is mostly used to represent numerical programs.


• Dhrystone: a synthetic benchmark consisting of 12 procedures with 94 statements, no fl.-pt. ops.

• Microprocessor Benchmarks: developed for comparing the processing ability of the various u-P chips, e.g. Ziff-Davis’ CPUmark and Intel’s iCOMP index.

CPUmark: measures the speed of a PC’s processor subsystem, including the CPU, its internal and external caches, and system RAM. [Ex] Fig. 1-20: CPUmark99 ratings for 80x86s

iCOMP: combines 4 industry standard benchmarks: CPUmark32, Norton SI32, SPEC95, and the Intel Media Benchmark (audio, video, image, 3-D, etc.).


CPUmark is a benchmark that measures the speed of the processor and its internal cache.


System-level Benchmarks

• Microcomputer Benchmarks: measure the speed of the processor while taking a slow disk or video subsystem into account.

Winstone: system-level, application-based benchmark to measure a PC’s overall performance when running today’s 32-bit applications on Windows 95, 98, NT. [Ex] Winstone 98 ratings for 80x86s

Performance Rating: Cyrix and AMD developed the P-rating (Processor Performance) system, which runs applications on a processor and compares the results to a Pentium u-P. [Ex] Table 1-2: PR166 ~ 366 for AMD and Cyrix chips


Winstone 98 measures the performance of a PC system running typical Windows applications.
