Advanced Higher Computing Computer Architecture Chapter 2

The internal architecture of the microprocessor

The memory address register (MAR)

The MAR holds the address of the memory location that is about to be accessed. Addresses never arrive in the MAR from outside the processor. The address bus is a one-way bus, unlike the data bus, which can carry data in either direction.

The memory data register (MDR), sometimes called the memory buffer register (MBR)

Data transferred to the processor from anywhere else in the system arrives in the MDR. Data which is to be sent out from the processor is sent out from the MDR along the data bus. The MDR forms a tiny buffer between the internal bus and the data bus, so it is also known as the memory buffer register.

When the item being fetched from memory is a machine code instruction, it arrives in the MDR like any other item of data. It is then transferred by the internal bus to the instruction register.

The instruction register (IR)

The instruction is held here while being decoded by the control unit. At any instant, the IR holds the machine code instruction which is currently being decoded and executed. A machine code program consists of a series of machine code instructions, held in main memory. These are fetched one by one from memory.

The program counter (PC)

How does the processor know where to find the next instruction to be processed? The program counter holds the address of the next instruction.

In addition, there will be many general purpose registers (GP registers), which, as their name implies, can be used to store any item of data at any time as required by the current program running in the processor.

Pupil task

Complete questions 10-14 on pages 30 & 31

To execute a machine code program it must first be loaded, together with any data that it needs, into main memory (RAM). Once loaded, it is accessible to the CPU which fetches one instruction at a time, decodes and executes it.

Fetch, decode and execute are repeated until a program instruction to HALT is encountered. This is known as the fetch-execute cycle.

1. Fetch. The instruction is fetched from the memory location whose address is contained in the Program Counter and placed in the Instruction Register. The instruction will consist of an operation code and possibly some operands. The operation code determines which operation is carried out. The term opcode is usually used as a shorthand for operation code.

2. Decode. The pattern of the opcode is interpreted by the Control Unit and the appropriate actions are taken by the electronic circuitry of the CU. These actions may include the fetching of operands for the instruction from memory or from the general purpose registers.

3. Increment. The Program Counter is incremented. The size of the increment will depend upon the length of the instruction in the IR. For example, if this instruction was a 3 byte instruction then the PC would be incremented by 3.

4. Execute. The instruction in the Instruction Register is executed. This may lead to operands being sent to the ALU for arithmetic processing and the return of the result to one of the general purpose registers. When a HALT instruction is received then execution of this program ceases.
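The effect of instruction length on the Program Counter can be sketched in a few lines of Python. This is only an illustration: the opcodes, mnemonics and instruction lengths below are made up for the example and do not belong to any real processor.

```python
# Hypothetical instruction set: opcode -> (mnemonic, total length in bytes).
INSTRUCTION_SET = {
    0x01: ("LOAD", 3),  # opcode + register number + memory address
    0x02: ("INC", 2),   # opcode + register number
    0xFF: ("HALT", 1),  # opcode only, no operands
}


def trace_program(memory):
    """Return (address, mnemonic, operands) for each instruction up to HALT."""
    pc = 0
    trace = []
    while True:
        opcode = memory[pc]                         # fetch the opcode
        mnemonic, length = INSTRUCTION_SET[opcode]  # decode it
        operands = list(memory[pc + 1:pc + length])
        trace.append((pc, mnemonic, operands))
        pc += length                                # increment by instruction length
        if mnemonic == "HALT":                      # HALT ends the cycle
            return trace


# LOAD r0, 10 (3 bytes); INC r0 (2 bytes); HALT (1 byte)
program = bytes([0x01, 0, 10, 0x02, 0, 0xFF])
for address, mnemonic, operands in trace_program(program):
    print(address, mnemonic, operands)
```

The addresses printed are 0, 3 and 5: the PC moves on by 3 after the 3 byte instruction and by 2 after the 2 byte instruction.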

The fetch phase

1. The contents of the PC are copied into the MAR;

2. The contents of memory at the location designated by the MAR are copied into the MDR;

3. The PC is incremented;

4. The contents of the MDR are copied into the IR.

The execute phase

The execute phase consists of the following steps:

1. Decode the instruction in the IR;

2. Execute the instruction in the IR.

For convenience we can write this series of steps as a pseudocode representation:

loop forever
    PC → MAR
    [MAR] → MDR
    PC + 1 → PC
    MDR → IR
    Decode IR
    Execute IR
end loop

Note that → means "is copied to" and that [MAR] means the contents of the location pointed to by the MAR.
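The register transfers in the loop can be tried out with a small Python simulation. This is a minimal sketch under stated assumptions: the three instructions (LOAD, ADD, STORE), the single accumulator register ACC, and the use of one memory location per instruction are all invented for the example and are not part of the course text.

```python
def run(memory):
    """Run the fetch-execute loop until HALT and return the final registers.

    `memory` is a list: instructions are (opcode, operand) tuples and
    data items are integers. ACC is a hypothetical accumulator register.
    """
    reg = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None, "ACC": 0}
    while True:
        # Fetch phase: one register transfer per line.
        reg["MAR"] = reg["PC"]             # PC -> MAR
        reg["MDR"] = memory[reg["MAR"]]    # [MAR] -> MDR
        reg["PC"] = reg["PC"] + 1          # PC + 1 -> PC
        reg["IR"] = reg["MDR"]             # MDR -> IR

        # Execute phase: decode the IR, then carry out the instruction.
        opcode, operand = reg["IR"]
        if opcode == "HALT":
            return reg
        reg["MAR"] = operand               # operand address -> MAR
        if opcode == "LOAD":
            reg["MDR"] = memory[reg["MAR"]]
            reg["ACC"] = reg["MDR"]
        elif opcode == "ADD":
            reg["MDR"] = memory[reg["MAR"]]
            reg["ACC"] = reg["ACC"] + reg["MDR"]
        elif opcode == "STORE":
            reg["MDR"] = reg["ACC"]
            memory[reg["MAR"]] = reg["MDR"]
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Program in locations 0-3, data in locations 4-6.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
final = run(memory)
print(memory[6], final["PC"])  # prints "5 4"
```

Every item, whether instruction or data, passes through the MDR, and every memory access uses the address held in the MAR.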


Web animations on Scholar:

Use of Registers in an Instruction Fetch

Sequencing the steps in an instruction fetch

Improving performance

Computer and microprocessor designers are driven by the need to improve computer performance to meet the ever increasing demands of computer users.

Early microprocessors had clock speeds measured in kHz (thousands of cycles per second) while modern processors such as the Pentium III are now achieving speeds of over 1 GHz (thousand million cycles per second). Obviously clock speed is an important factor in determining performance.

The table below shows the clock speed versus the performance of Intel processors as measured in Million Instructions per Second (MIPS). MIPS is now an outdated way to measure performance but it is the only measure applicable over the whole range.

Intel Processor   Clock Speed   MIPS
8086              8 MHz         0.8
80286             12.5 MHz      2.7
80386DX           20 MHz        6.0
80486DX           25 MHz        20
Pentium           60 MHz        100
Pentium Pro       200 MHz       440

This table shows that performance, as measured by MIPS, has gone up at a higher rate than the clock rate.
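One way to check this claim is to work out the MIPS achieved per MHz of clock speed for each processor. The short Python sketch below uses only the figures from the table; the variable names are chosen for the example.

```python
# (processor, clock speed in MHz, MIPS), copied from the table.
processors = [
    ("8086", 8.0, 0.8),
    ("80286", 12.5, 2.7),
    ("80386DX", 20.0, 6.0),
    ("80486DX", 25.0, 20.0),
    ("Pentium", 60.0, 100.0),
    ("Pentium Pro", 200.0, 440.0),
]

# MIPS delivered for each MHz of clock speed.
ratios = {name: round(mips / clock, 2) for name, clock, mips in processors}
for name, ratio in ratios.items():
    print(f"{name:12} {ratio:5.2f} MIPS per MHz")

# Overall growth from the 8086 to the Pentium Pro.
clock_growth = processors[-1][1] / processors[0][1]
mips_growth = processors[-1][2] / processors[0][2]
print(f"Clock speed rose {clock_growth:.0f} times, MIPS rose {mips_growth:.0f} times")
```

The ratio climbs from 0.1 MIPS per MHz for the 8086 to 2.2 for the Pentium Pro, so factors other than clock speed must also have improved performance.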

Pupil task (30 mins)

Go to http://www.intel.com/pressroom/kits/quckreffam.htm, or go to http://www.intel.com/ and enter quckreffam into the search box.

Complete the table below. Then predict what clock speeds you would expect to be available in the next 5 years.

Intel Processor     Clock Speed   MIPS
Pentium 2 (1997)
Pentium 3 (1999)
Pentium 4 (2000)
Itanium (2001)
Pentium M (2003)

Increasing data bus width

Increasing the clock speed will increase the number of data fetches that can be made per second.

Increasing the data bus width will increase the amount of data that can be fetched each time.

A data bus width of only 4 bits took 2 fetches to fetch a byte from memory to the processor. The Intel 8008 processor (1972) used an 8 bit data bus. Clearly the internal registers (particularly the MDR) had to match this.

The Intel 8086 was then developed, which used a 16 bit data bus and set of internal registers. This gave huge improvements in performance, and allowed the development of the first PCs.

In 1985, Intel decided to increase the data bus width and internal registers of its processor again, so the 80386 was produced with a 32 bit data bus. 32 bits was the norm for the next 10 years, until the first 64 bit Pentium chip was introduced in 1995. All PC designs since then have made use of 64 bit technology.

NOTE: A similar development has taken place in the Motorola chips which are used in Apple computers, from the early 68000 16-bit architecture through to the current G5 64-bit architecture.
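The link between data bus width and the number of fetches can be sketched as a short calculation. The function names are made up for the example, and the transfer-rate figure assumes one fetch on every clock cycle, which real memory does not achieve.

```python
def fetches_needed(item_bits, bus_width_bits):
    """Number of fetches needed to move an item across the data bus."""
    # Ceiling division: a partly filled final fetch still counts as a fetch.
    return -(-item_bits // bus_width_bits)


def peak_bytes_per_second(fetches_per_second, bus_width_bits):
    """Upper limit on transfer rate, assuming every fetch fills the bus."""
    return fetches_per_second * bus_width_bits // 8


print(fetches_needed(8, 4))    # one byte over a 4 bit bus: 2 fetches
print(fetches_needed(32, 8))   # a 32 bit item over an 8 bit bus: 4 fetches
print(fetches_needed(32, 32))  # the same item over a 32 bit bus: 1 fetch

# Same number of fetches per second, but four times the bus width.
print(peak_bytes_per_second(1_000_000, 8))   # 1000000 bytes per second
print(peak_bytes_per_second(1_000_000, 32))  # 4000000 bytes per second
```

Doubling the bus width halves the number of fetches for large items, while raising the clock speed increases the number of fetches per second; the two improvements multiply together.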

The width of the address bus determines the maximum memory address, and so the maximum amount of memory that can be addressed.

The width of the address bus has no direct effect on performance.

Address bus widths have also increased steadily over the last few decades, from 16-bit to 32-bit, and now 64-bit.
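The relationship between address bus width and addressable memory is a power of two: each extra address line doubles the number of locations. The sketch below assumes each address refers to one byte, which is common but not universal; the function name is chosen for the example.

```python
def addressable_locations(address_bus_width):
    """Number of distinct addresses an address bus of this width can carry."""
    return 2 ** address_bus_width


for width in (16, 32, 64):
    locations = addressable_locations(width)
    print(f"{width}-bit address bus: {locations} locations, "
          f"maximum address {locations - 1}")

# Assuming one byte per location:
print(addressable_locations(16) // 1024, "Kbytes")       # 64 Kbytes
print(addressable_locations(32) // 1024 ** 3, "Gbytes")  # 4 Gbytes
```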

The earliest computers had a single system bus connecting the processor with the main memory and peripheral interfaces. This system bus operated at the same speed as the processor.

The data bus width has been stepped up from 8 to 16, 32 and now 64 bits.

The number of different components within a system has also increased. A modern processor is likely to be connected to a range of peripherals as well as main memory. These peripherals operate at lower speeds than the processor and main memory. As a result, designs have developed with multiple buses within the system: a very fast "frontside" system bus for main memory, and a slower bus for communication with peripheral devices.

The PCI and PCI-X buses are connected to the main system bus by a bus bridge, which controls the traffic on and off the bus.

The PCI and PCI-X buses are known as multipoint buses (this means they can have branches to any number of devices).

The previous diagram shows how components interact. The separation of (relatively slow) peripheral traffic on to the PCI bus means that fast data transfers between main memory and the processor are not slowed down.

The PCI-X bus, as well as being faster and wider than the original PCI bus, also has a number of special features to maximise performance.

These include the prioritisation of data from different devices, which particularly improves performance of streaming audio and video.

Cache Memory

The speeds quoted for data access to main memory sound quite impressive. However, current processors are able to process data even faster than that!

One solution to this problem would be to increase the number of registers on the microprocessor itself, so that all data required would be instantaneously available to the processor.

However, this solution is impractical, leading to over-complex and large microprocessor chips.

A more practical solution is cache memory: a small amount of fast memory placed between main memory and the processor. Cache memory is connected to the processor by the "backside" bus.

Normally whole blocks of data are transferred from main memory into cache, while single words are transferred along the backside bus from the cache to the processor.

Cache memory uses the faster but more expensive static RAM chips rather than the less expensive, but slower, dynamic RAM chips which are used for most of the main memory
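The benefit of transferring whole blocks into cache can be seen in a small simulation. This is only a sketch: it assumes a "direct-mapped" cache (each block of main memory can go in exactly one cache line), which is one common arrangement and is not specified in the course text. The class name, line count and block size are invented for the example.

```python
class DirectMappedCache:
    """Toy cache: whole blocks come in from main memory, single words go out."""

    def __init__(self, main_memory, num_lines=4, block_size=4):
        self.memory = main_memory
        self.num_lines = num_lines
        self.block_size = block_size
        self.lines = {}  # cache line number -> (tag, copy of the block)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number = address // self.block_size
        line = block_number % self.num_lines   # the only line this block may use
        tag = block_number // self.num_lines   # identifies which block is stored
        entry = self.lines.get(line)
        if entry is not None and entry[0] == tag:
            self.hits += 1                     # word is already in the cache
        else:
            self.misses += 1                   # copy the whole block from memory
            start = block_number * self.block_size
            block = self.memory[start:start + self.block_size]
            self.lines[line] = (tag, block)
        # Only a single word is passed on to the processor.
        return self.lines[line][1][address % self.block_size]


main_memory = list(range(100, 164))  # 64 words of made-up data
cache = DirectMappedCache(main_memory)
values = [cache.read(address) for address in range(8)]
print(cache.hits, cache.misses)  # prints "6 2"
```

Reading 8 consecutive words causes only 2 slow accesses to main memory; the other 6 reads are served from the cache because each miss brought in a block of 4 words.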

L1 and L2 cache

Most modern chips also have level 1 (L1) cache. This is similar to L2 cache, but the cache is actually on the same chip as the processor. This means that it is even faster to access than L2 cache.

Pentium processors have two L1 caches on the processor. One of these is for caching data, while the other is used for caching instructions (machine code).

In the Pentium 4 processor, each of these is 8 Kbytes. Similarly, the PowerPC G4 processor has two 32 Kbyte L1 caches.

As we have seen, memory access is one of the major bottlenecks limiting performance of computer systems.

Many techniques have been devised to overcome this, including use of SRAM, widening the data bus, using separate buses for memory and peripherals, and the use of L1 and L2 cache.

Another technique which can be applied is called interleaving.

Memory interleaving

The idea behind interleaving is that memory can be split up into 2 or 4 independent RAM chips. A memory read or write will normally take 3 or 4 clock cycles to perform. Data is actually being transferred along the data bus during only 1 of these clock cycles. The processor has to insert "wait" states into its program to allow for this.

If memory interleaving has been implemented, the processor can use a "wait" state to initiate the next memory access, so saving time. In effect, the processor can access the 4 memory chips almost simultaneously. This increases throughput significantly. To make best use of this, successive data items must be stored in different memory chips.
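The saving from interleaving can be estimated with a simple timing model. The sketch below makes several assumptions that are not in the course text: every access takes exactly 4 clock cycles, the chip for an address is chosen by address modulo the number of chips, and the processor can start one new access per clock cycle provided the chip it needs is free.

```python
def total_cycles(addresses, num_chips, access_cycles=4):
    """Clock cycles needed to complete a sequence of memory accesses."""
    chip_free_at = [0] * num_chips  # cycle at which each chip is next free
    next_issue = 0                  # earliest cycle the next access can start
    finish = 0
    for address in addresses:
        chip = address % num_chips  # successive addresses use different chips
        start = max(next_issue, chip_free_at[chip])
        finish = start + access_cycles
        chip_free_at[chip] = finish
        next_issue = start + 1      # use a "wait" state to start the next access
    return finish


print(total_cycles(range(8), num_chips=1))         # no interleaving: 32 cycles
print(total_cycles(range(8), num_chips=4))         # 4-way interleaving: 11 cycles
print(total_cycles(range(0, 32, 4), num_chips=4))  # all in one chip: 32 cycles
```

The last line shows why successive data items must be stored in different chips: if every access lands in the same chip, each one has to wait for the previous access to finish and nothing is gained.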

Memory interleaving is tricky to implement for memory fetches, as the processor has to deal with the data that arrives, which may require further processing steps. It is, therefore, more often used for memory writes, where the processor simply sends the data off to memory, and does not have to "worry" about what happens next.

For a similar reason, memory interleaving is used to speed up access to video RAM.

This is less problematic than main memory, as all the "data" is simply data, whereas in main memory, the "data" may be instructions!

Direct Memory Access (DMA)

Another technique is direct memory access (DMA), which is used when data is being transferred to or from a peripheral device.

There are two methods commonly used to transfer data without DMA:

• programmed I/O
• interrupt-driven I/O

The inefficiencies of programmed and interrupt driven I/O are not too serious under most circumstances, but become a serious issue when large blocks of data are to be transferred between main memory and a slow peripheral.

DMA is a technique which overcomes this.

Diagram of DMA (a diagram of the DMA controller, DMAC, can be required for exam papers)


Complete questions 18 – 25 on pages 42 & 43
