Projects for IC-project and Verification HT2-2010-VT2-2011
Select a project as a team of two members. Apply by email to [email protected], stating your 1st and 2nd choice. We need to distribute the workload, so we cannot guarantee that you will get your 1st choice. Apply no later than Dec 2nd.
MAP Channel Decoder (Reza)
Reconfigurable UMTS filter (Deepak)
Implementation of a JPEG Accelerator (Johan)
Hardware Based Media Player (Yasser)
Surveillance System (Isael)
RISC track:
1st draft Mini-MIPS (Chenxin)
IC-Project: MAP Channel Decoder
Objective: In this project the students will design a decoder for rate-1/2 convolutional codes generated by the [7, 5] encoder. The encoder has memory two and is shown in Figure 1, so its trellis representation has 4 states. Decoding requires more complicated hardware than encoding and is done using the BCJR decoding algorithm on a tail-biting structure. The input of the decoder is the coded data generated by the encoder (implemented in MATLAB, C++ or any other programming language), and the output of the decoder is the decoded (original) message. In this project the students only need to implement the decoder in hardware, not the encoder.
Figure 1. [7,5] convolutional encoder
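Since the encoder itself is implemented in software, a minimal Python sketch of it may help when generating test data. It assumes the generators are given in octal (7 = 1+D+D^2, 5 = 1+D^2), that the output of generator 7 is sent first, and that tail-biting is achieved by preloading the two memory cells with the last two message bits. The function name and these conventions are assumptions, not part of the project description.

```python
def conv_encode_75(bits, tail_biting=True):
    """Rate-1/2 feedforward encoder with octal generators 7 and 5 (memory 2)."""
    if tail_biting and len(bits) >= 2:
        # Assumed tail-biting convention: start state equals end state.
        s1, s2 = bits[-1], bits[-2]
    else:
        s1, s2 = 0, 0
    out = []
    for u in bits:
        out.append(u ^ s1 ^ s2)  # generator 7 = 111
        out.append(u ^ s2)       # generator 5 = 101
        s1, s2 = u, s1           # shift the register
    return out


# Impulse response from the all-zero state.
print(conv_encode_75([1, 0, 0], tail_biting=False))
```

The four trellis states correspond to the four possible values of the pair (s1, s2).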
Grading:
Grade: 3
Mandatory: The decoder should work properly according to the known input and the desired output sequences during the verification. A fixed block length for the codes and soft decision for decoding can be considered here. A thorough comparison with MATLAB results is also required.
Grade: 4
Mandatory: Tasks for grade 3 + thorough optimization and verification of the design using power analysis, trying different wordlengths for the inputs and calculations, as well as efficient memory utilization. Important design factors such as area and power consumption are expected to be optimized to a reasonable extent. All considerations for an excellent design and a well prepared report are taken into account.
Reconfigurable UMTS filter:
This project deals with the implementation of a UMTS filter used in a Wideband-CDMA system. A
generic block diagram of the W-CDMA receiver is shown in Figure 1.
Figure.1: Block diagram of a w-CDMA receiver.
The filter specification has to satisfy the requirements of the 3GPP standard, out-of-band signal
attenuation being one of them. The length of the filter that satisfies the 3GPP specification was found
in [1] to be at least 65 taps.
The difference between the in-band power and the out-of-band power is defined as adjacent channel
selectivity (ACS) and is shown in Figure 2. There can be scenarios where the out-of-band signal power is
either very strong or weak or somewhere in between. It was also shown in [1] that, by measuring the
in-band and out-of-band signals, the filter can be operated at a relaxed specification to reduce power.
Coefficient optimizations on the filter were carried out in [2].
Figure.2: UMTS filter specification.
Figure.3: Block diagram of the optimized filter architecture.
In [3], further optimizations were carried out for ASIC implementations, such as splitting up the filter into
filter banks to reduce the number of clock domains and taking advantage of early saturation of the signal
power measurement units.
This project aims at implementing the architecture of the UMTS filter shown in Figure 3.
Challenges:
In logic design: The architecture involves the design of an FIR and an IIR filter as modules. It also contains
a control unit that varies the length of the FIR filter between a maximum of
65 taps and a minimum of 5 taps. The control unit also requires a hardware
division unit.
In Synthesis and Place&Route: Designing with multiple clock domains, clock dividers.
[1] R. Veljanovski, “A Reconfigurable Root Raised Cosine Filter for a mobile receiver,” Ph.D. dissertation, Victoria University of Technology, 2003.
[2] H. Bruce, “Power optimisation of a reconfigurable FIR filter,” Master’s thesis, Lund University, Sweden, 2004.
[3] D. Dasalukunte, A. Palsson, M. Kamuf, P. Persson, R. Veljanovski, V. Öwall, “Architectural Optimization for Low power in a Reconfigurable UMTS filter,” WPMC, San Diego, 2006.
Implementation of a JPEG Accelerator
Johan Lofgren
email: [email protected]
Abstract—The aim of this project is to implement a JPEG accelerator
in hardware. The implementation should be able to both encode and
decode JPEG images. In the encoder chain, there will be a Discrete
Cosine Transform (DCT) unit, a Quantizer unit and an Entropy Encoder unit.
In the decoder chain the needed blocks are the Entropy Decoder,
the Dequantizer and the Inverse Discrete Cosine Transform (IDCT). The
DCT unit should be based on the algorithm proposed by Arai et al. The
implementation process goes through the basic steps of an ASIC design flow,
from specification, software modelling and implementation to synthesis
and place and route.
I. INTRODUCTION
The JPEG standard is one of the most common image compression
standards available. It is a lossy compression, which means that the
encoded image can normally not be exactly recreated. This allows
for higher compression rates. The encoding/decoding procedure is
suitable for hardware acceleration, because the process consists of
many similar, simple operations. In this project acceleration hardware
is constructed.
A. JPEG Encoding
[Figure: Image (8x8 block) → DCT → Quantizer (ZigZag) → Entropy Encoder → Compressed Data; the Quantizer and the Entropy Encoder each use a table]
Fig. 1. JPEG encoding chain
The encoding chain is shown in Fig. 1. First the image data is
transformed with a Discrete Cosine Transform (DCT). This transform
has properties that allow for better compression of the
image. The next stage quantizes the output of the DCT. This is the
lossy stage, where information is lost. However, the removed
information is that to which the human eye is least sensitive, and
therefore the image still looks good to the human observer. Finally
the image data is encoded using an entropy (Huffman) encoding, and
stored in compressed form.
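The quantizer and zigzag steps can be sketched in a software model as follows. This is only an illustration: the function names are hypothetical, rounding to the nearest integer is an assumed quantization rule, and the uniform table used in the example is a placeholder rather than a table from the JPEG standard.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are walked upwards, odd ones downwards.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def quantize(coeffs, table):
    """Divide each DCT coefficient by its table entry and round (lossy step)."""
    return [[int(round(c / q)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Inverse step used in the decoder: multiply by the table entry."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


table = [[16, 16], [16, 16]]            # placeholder step sizes
levels = quantize([[100.0, -7.0], [3.0, 0.4]], table)
print(levels, dequantize(levels, table))
print(zigzag_order(8)[:6])
```

The difference between the original coefficients and the dequantized values is the information lost in the lossy stage.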
B. JPEG Decoding
[Figure: Compressed Data → Entropy Decoder → Dequantizer (ZigZag) → IDCT → Image (8x8 block); the Entropy Decoder and the Dequantizer each use a table]
Fig. 2. JPEG decoding chain
Decoding runs the process in reverse, as shown in Fig. 2.
First the entropy decoding block is executed, then the dequantizer and
finally the Inverse Discrete Cosine Transform (IDCT). Different
table data is used depending on the level of compression.
C. Discrete Cosine Transform
The DCT is similar to the Discrete Fourier Transform (DFT) and
is defined by the following equation
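$$F(u, v) = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x, y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2M}\right]$$
where f(x, y) is the input sample at position (x, y) of the N × M block.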
The 2-D DCT has better energy concentration properties than the
DFT, see Fig. 3. It produces fewer high frequency components at the
boundaries of the transformation blocks, and thus fewer visual artifacts.
Therefore, the DCT is used in many standardized compression
algorithms, e.g., JPEG and MPEG. As for the DFT, there exist many
efficient implementation algorithms, and a good compilation of these
implementations is given by G.S. Taylor and G.M. Blair [1]. Of these,
the algorithm proposed by Arai et al. [2] is the most important, and
it is the one on which this assignment is based.
Fig. 3. Example of energy compaction of an 8 × 8 block using the DCT algorithm.
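For a software reference model, the DCT definition can be evaluated directly. The sketch below is a plain double sum over the block, without scaling factors; it is not the fast algorithm of Arai et al. that the hardware should use, and the function name is an assumption.

```python
import math


def dct2_direct(block):
    """Direct (unscaled) 2-D DCT of an N x M block given as a list of rows."""
    n, m = len(block), len(block[0])
    result = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            acc = 0.0
            for x in range(n):
                for y in range(m):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * m)))
            result[u][v] = acc
    return result


# A flat block has all of its energy in the DC coefficient F(0, 0).
flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct2_direct(flat)
print(round(coeffs[0][0], 6), round(abs(coeffs[1][0]), 6))
```

Such a slow but obviously correct model is mainly useful for generating test vectors against which the VHDL implementation can be compared.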
II. ASSIGNMENT
The assignment is divided into a number of parts.
Grade 3: In order to pass the course (grade 3), the minimal
requirements are to implement the 2D DCT and the 2D IDCT,
together with the quantizer/dequantizer units.
Grade 4: To receive a grade 4, also the entropy encoder/decoder
needs to be implemented.
Grade 5: A grade 5 requires a full implementation including a
way to display the image, either using a VGA output or an interface
to a computer. Gray-scale images are sufficient.
All Grades: Implement a model in e.g. C or Matlab and generate
test vectors. Translate the software model into VHDL and verify the
functionality by simulation in Modelsim. Synthesize the design and
perform P&R.
REFERENCES
[1] G. Taylor and G. Blair, “Design for the discrete cosine transform in VLSI,” in Computers and Digital Techniques, IEE Proceedings, vol. 145, no. 2, Mar. 1998.
[2] Y. Arai, T. Agui, and M. Nakajima, “A fast DCT-SQ scheme for images,” Trans. IEICE, vol. E71, Nov. 1988, pp. 1095–1097.
IC-Project:
Hardware Based Media player
Objective:
In this project the students will design a hardware based media player application. This
application must have the capability of handling analog stereo sound input and output the
sound on the speakers. The student must display the different frequency bands on the
VGA monitor, with a keyboard based control panel. The panel should be able to control
the volume, balance and frequency attenuations or amplifications.
Grading:
Grade: 3
Mandatory: Handling analog stereo sound input and outputting the sound on the
speakers. The student must display 10 different frequency bands on the VGA monitor for each channel.
You will have to submit a preliminary report explaining how you have
defined your filter bands, together with MATLAB and VHDL implementations.
Grade: 4
Mandatory: Tasks for grade 3 + a keyboard based control panel that should be
able to control the volume, balance and frequency attenuations or amplifications. You
will have to submit a preliminary report explaining how you have reconstructed
your audio signal after filtering, together with MATLAB and VHDL implementations.
Grade: 5
Mandatory: Tasks for grade 4 + an echo feature with a control system to change the
echo duration between at least two different time periods. You will have to submit a report
explaining how you have implemented the echo, together with MATLAB and VHDL
implementations.
All tasks must go through the complete flow of Digital IC Design, and a final report
must be submitted.
Supervisors:
Yasser
IC-Project and Verification - ETIN01
Surveillance System
Project Description
Isael Diaz
EIT - Lund University
2010 / 2011 Ht2-Vt2
1 Introduction
Nowadays automated surveillance systems are being introduced in modern society more and more often. One of the factors influencing this trend is that today it is possible to include an image sensor in nearly any tiny gadget, the mobile phone being the perfect example. Not only are cameras becoming smaller, and therefore less intrusive, but people are also getting more used to being surrounded by cameras.
An additional factor is the high demand for situation awareness. Safety has become one of the main concerns of the average citizen, and companies and governments spend more money than ever on making sure that a certain area is secure and that any threat or danger is detected in good time. Surveillance systems can be designed to cover one or several tasks such as identity tracking, location tracking, activity tracking, etc. All these tasks require awareness of the system’s scene. Typical questions to be answered by such systems are: Is there someone in the scene? Who is that someone? Is that someone allowed to be in the scene? What is that someone doing?
2 Functional Description
Since the approach selected for this project is based on background subtraction, the first stage consists of a background subtraction in which the pixels not considered part of the background are extracted. Robust background subtraction is a research topic on its own. A binary foreground pixel is obtained by subtracting the background model from the current image and thresholding the absolute difference:
FG = |I − BG| > Thr    (1)
where FG denotes the binary foreground, I the current pixel value in the scene, BG the known background model stored in the system, and Thr a constant threshold determined by the illumination and other conditions in the scene.
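Equation 1 maps directly onto a per-pixel comparison. The following Python sketch of a software reference model assumes gray-scale images stored as lists of rows; the function name and the example values are illustrative only.

```python
def foreground_mask(image, background, thr):
    """Binary foreground: 1 where |I - BG| > Thr, else 0 (Equation 1)."""
    return [[1 if abs(i - b) > thr else 0 for i, b in zip(irow, brow)]
            for irow, brow in zip(image, background)]


background = [[10, 10, 10],
              [10, 10, 10]]
image = [[12, 80, 10],
         [10, 75, 9]]
print(foreground_mask(image, background, thr=20))
```

Small differences caused by noise stay below the threshold and are classified as background.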
[Figure: Object moving → BG Subtraction → Object Segmentation → Feature Extraction → Object tracking]
Figure 1: Object tracking stages
Once the foreground is detected, the pixels must be grouped into blobs, where every blob represents an independent object moving in the scene. This is accomplished by connecting and classifying neighboring foreground pixels. The operation is called labeling and consists of tagging all adjacently connected pixels with a common label. All pixels with the same label are defined to belong to the same blob. A pixel’s label can be calculated by Equation 2, where L denotes the label at position (i, j), P() is the propagation function that generates a new label or propagates the neighboring label, and K denotes the neighborhood dimensions.
$$L(i, j) = I(i, j) \cdot P\left(\sum_{n=-K}^{K} L(i + n, j - 1)\right) \qquad (2)$$
A blob describes an object or group of objects. Once pixels are grouped into blobs, features such as area and center of gravity (COG) can be measured from the maximum and minimum coordinates of each blob in the image. When stereo images are provided, it is even possible to estimate the depth from the sensors to the object of interest. Objects identified in the foreground can be seen as having a specific position, occupying a certain area, and moving with a certain speed in a specific direction in the scene. All these parameters can be tracked in order to raise an alarm when an object moves too close to a certain corner of the scene, which could be an entrance to a restricted area.
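Labeling and feature extraction can be prototyped in software before moving to hardware. The sketch below is not the single-pass propagation scheme of Equation 2; it swaps in a simple breadth-first flood fill with 8-connectivity, which yields the same kind of blobs and is easier to check. The function name, the dictionary layout and the use of the bounding-box center as position are assumptions.

```python
from collections import deque


def label_blobs(fg):
    """Label 8-connected foreground pixels and collect per-blob features."""
    rows, cols = len(fg), len(fg[0])
    labels = [[0] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not fg[r][c] or labels[r][c]:
                continue
            label = len(blobs) + 1          # generate a new label
            labels[r][c] = label
            queue = deque([(r, c)])
            rmin = rmax = r
            cmin = cmax = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                rmin, rmax = min(rmin, y), max(rmax, y)
                cmin, cmax = min(cmin, x), max(cmax, x)
                for dy in (-1, 0, 1):       # propagate to all neighbors
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and fg[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = label
                            queue.append((ny, nx))
            blobs.append({"label": label, "area": area,
                          "bbox": (rmin, cmin, rmax, cmax),
                          "center": ((rmin + rmax) / 2, (cmin + cmax) / 2)})
    return labels, blobs


fg = [[1, 1, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1]]
labels, blobs = label_blobs(fg)
print(blobs)
```

The bounding box stored per blob is what would be drawn around each detected object on the display.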
Figure 1 shows the different stages from background subtraction to object tracking. Some examples of final implementations can be found in the articles listed in the reference section at the end of this document.
3 Project
The project consists of developing a number of hardware accelerators found in a typical surveillance system. The number of accelerators to be developed varies according to the desired grade in the project.
Figure 2 shows a block diagram of the final surveillance system. The accelerators to be developed are colored pink, while the remaining blocks must be put in place in order to have full functionality. They can be downloaded from the Internet (typically the Xilinx website) or obtained from anywhere else.
3.1 Grade 3 - Object Segmentation and Feature Extraction
In order to obtain a grade 3, two functions have to be developed, namely Object Segmentation and Feature Extraction. The input to this stage is an SVGA-size image (800x600) of the foreground stored in the external memory.
The developed block has to be able to read the foreground image from the external memory, group pixels into blocks, or segments, and extract object features such as size and position. Note that it is crucial not to interfere with the writing of the foreground by the previous stages. The input image has to be shown on a VGA monitor, which is to be connected to the test environment. The objects in the foreground have to be enclosed in a green box. The system has to be able to detect objects with a minimum size of 25x20 pixels.
[Figure: block diagram containing Memory Controller, External Memory, Background Model, Foreground, Background Subtraction, Filters stage, Object Segmentation, Feature Extraction, Object Tracking, VGA Controller and Test platform]
Figure 2: Surveillance system
3.2 Grade 4 - Object tracking
In order to obtain a grade 4, the position and size of the previously labeled objects are registered, and their speed and direction are extracted. The green box enclosing an object in the foreground changes color to red for objects that are moving faster than n pixels per second. By default n is equal to 30. The system has to be able to follow up to 10 independent objects in the image.
Instead of a single foreground image, the test environment will store a new foreground image into the external memory at regular intervals, e.g. 18 frames per second.
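The speed test can be illustrated with a small calculation. The sketch below assumes that an object's position is its box center in pixel coordinates, that speed is the Euclidean displacement between two consecutive frames multiplied by the frame rate, and that the frame rate is the 18 frames per second given as an example; the function names are hypothetical.

```python
import math


def speed_px_per_s(prev_center, curr_center, fps=18.0):
    """Speed in pixels per second from the positions in two consecutive frames."""
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    return math.hypot(dx, dy) * fps


def box_color(speed, n=30):
    """Red for objects moving faster than n pixels per second, else green."""
    return "red" if speed > n else "green"


fast = speed_px_per_s((0, 0), (3, 4))   # 5 pixels per frame
slow = speed_px_per_s((0, 0), (1, 0))   # 1 pixel per frame
print(fast, box_color(fast), slow, box_color(slow))
```

At 18 frames per second, an object has to move more than 30/18 ≈ 1.67 pixels per frame to exceed the default limit.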
3.3 Grade 5 - Complete Surveillance system
Grade 5 is obtained by completing the functionality of the entire system by incorporating a real camera and performing background subtraction to create the foreground image used by the subsequent stages described above. Since the image will be streamed in from real-life scenes, some filtering is advised to get rid of small undesired particles in the image.
The background model will be created from an initial image taken by the camera and stored in the external memory. This initial image has to be clean, that is to say, it must contain no moving objects.
3.4 Specifications
• All accelerators have to be placed together in the same silicon core. Try to reuse as many pads as possible.
• For functionality purposes, the Xilinx Virtex II Pro development kit will be used, together with an NI-LM9648 camera with a speed of 18 frames per second (in case of grade 5).
• All blocks or functions must be able to show their functionality on the FPGA as standalone blocks.
• The minimum size of objects to be detected is 25x20 pixels (grade 3).
• Calculate and indicate speed for a maximum of 10 objects that move faster than 30 pixels per second (grade 4).
3.5 Grading
• For grade 5, the entire surveillance system has to be demonstrated together, not as separate blocks.
• The final grade will be assigned by analyzing the project’s development, realization and verification, and the corresponding written reports.
ETIN01 - Digital IC-project & Verification (HT2~VT2 2010/2011)
Mini-MIPS design project v.1.0
Short project description
This document describes the Mini-MIPS design project which is part of course ETIN01
“Digital IC-project & Verification” conducted at EIT, LTH. Mini-MIPS is a 32-bit RISC
with a simple instruction set similar to the MIPS instruction subset considered in chapter
4 of the textbook: D.A. Patterson & J.L. Hennessy, “Computer Organization and Design -
The Hardware/Software Interface, fourth edition.” (Chapters 5 and 6 in the third edition)
The idea of this project based course is to guide students through a simple microprocessor
design, in order to get a thorough understanding of the basic concepts taught throughout
the preceding course EITF20 “Computer Architecture”. Hence, functional richness of the
CPU design is not the primary goal in this course.
With the aid of SPIM assembly language simulator, ModelSim SE simulator, ASIC
design tools, and FPGA development board, students in the course will:
• Design an executable specification of the Mini-MIPS (Task 1);
• Design a simple 5-stage pipelined implementation of the Mini-MIPS (Task 2);
• Synthesize and Place & Route the pipelined Mini-MIPS in standard ASIC design
flow using 130nm low power CMOS cells (Task 3);
• Verify the pipelined Mini-MIPS in FPGA development board (Task 4);
• Integrate the pipelined Mini-MIPS with a console I/O peripheral (Task 5);
• Add a memory hierarchy to the pipelined Mini-MIPS (Tasks 6 & 7);
• Design and implement an extended instruction set to use MipsIt GCC C compiler
(Task 8).
Prerequisite courses to the Mini-MIPS design project are EITF35 “Introduction to
Structured VLSI Design” and EITF20 “Computer Architecture”, which cover the VHDL language,
the ModelSim simulator, and the background knowledge on computer architecture.
The standard ASIC design flow, including hardware integration, synthesis, and place & route,
has been introduced in earlier laboratory exercises of
this course.
This project is composed of two parts and is correspondingly evaluated with two different
grades. Tasks 1, 2, 3 and 4 are mandatory, and students will get grade 4
after completing them. To get a higher grade (grade 5), students should complete
one of the optional assignments among three choices: Task 5, or Tasks 6 & 7, or Task 8.
Contents
1 Introduction
2 Mini-MIPS specification
2.1 Instruction set
2.2 System structure
2.3 Bus protocol
3 Using SPIM for the Mini-MIPS project
3.1 Introduction
3.2 Memory map and initialization
3.3 Loading program data into the VHDL model
3.4 Virtual and bare machine
4 Compulsory tasks
4.1 Task 1: Single cycle behavioural specification
4.2 Task 2: Five stage pipeline implementation
4.3 Task 3: ASIC synthesis and place & route
4.4 Task 4: FPGA verification
5 Overview of design files
6 Optional tasks
6.1 Task 5: Adding a console I/O peripheral
6.2 Task 6: Adding a memory interface
6.3 Task 7: Adding an instruction- and/or data cache
6.4 Task 8: Using the MipsIt GCC C Compiler
1 Introduction
Mini-MIPS is a 32-bit RISC with a simple instruction set similar to the MIPS instruction
subset considered in chapter 4 of the textbook: D.A. Patterson & J.L. Hennessy,
“Computer Organization and Design - The Hardware/Software Interface, fourth edition”,
Morgan Kaufmann, 2008 (Chapters 5 and 6 in the third edition). The Mini-MIPS 1.0
implements a true subset of the MIPS I instruction set architecture used in the MIPS
R2000. Like the real MIPS, the Mini-MIPS CPU operates with separate instruction and
data memories and has a 5-stage pipeline with register forwarding, delayed branches and
delayed load.
You are given a VHDL description of a Mini-MIPS system consisting of a Mini-MIPS
CPU template (which you will have to complete), a clock generator, and two instances
of a memory entity: the instruction memory and the data memory. Your tasks will be to
specify (task 1), design and implement in VHDL (task 2), synthesize (task 3), and verify
(task 4) the Mini-MIPS CPU, and to simulate the system and run small test programs to verify
its functionality. You may also choose to do any of the optional tasks (task 5, 6, 7, 8) in
order to get a higher grade.
We use the PC-version of SPIM to develop assembly language programs and to translate
these into binary machine code. The VHDL model of the memory entity is capable
of extracting the binary image of a program and its data from a log-file saved from SPIM
and loading it into the memory entity. In this way you can execute programs developed using
SPIM on your VHDL model of the Mini-MIPS system.
2 Mini-MIPS specification
2.1 Instruction set
Mini-MIPS is a 32-bit RISC processor with 32-bit instructions and 32-bit data. The
instruction and data memory address space is 2^32 bytes, and only word-aligned addressing
is supported in the Mini-MIPS instruction subset. Mini-MIPS has 32 general purpose
32-bit registers R[0:31]. Two registers are special: R[0] contains the constant 0 (hardwired)
and writes to it are ignored; R[31] is used to hold return addresses in case of procedure calls
(the jal-instruction). The Mini-MIPS instruction formats are defined in Fig. 1 and the
instruction set is defined in Table 1.
2.2 System structure
The Mini-MIPS CPU uses separate instruction and data memories. This basic configu-
ration is shown in Fig. 2(a). The separate instruction and data memory interfaces of the
Mini-MIPS CPU provide significant freedom in designing the memory system. Fig. 2(b)
shows a simple low-cost system with only one single main memory module, and Fig. 2(c)
shows a high performance system with separate instruction and data caches and a
single main memory module. Notice also that the main memory bus used in Fig. 2(b) and
(c) can be different from the one used by the Mini-MIPS CPU.
R-type: Opcode [31:26] | Rs [25:21] | Rt [20:16] | Rd [15:11] | Shamt [10:6] | Function [5:0]
I-type: Opcode [31:26] | Rs [25:21] | Rt [20:16] | Immediate/Offset [15:0]
J-type: Opcode [31:26] | Target [25:0]
Figure 1: Mini-MIPS instruction formats.
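The three formats can be packed into 32-bit words with a few shifts and masks. The following Python sketch is one way to do this in a software model, for example when hand-checking the machine code produced by SPIM; the helper names are assumptions and not part of the provided design files.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """R-type: opcode | rs | rt | rd | shamt | function."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode | rs | rt | 16-bit immediate/offset."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J-type: opcode | 26-bit target (a word address)."""
    return (opcode << 26) | (target & 0x3FFFFFF)


def decode_fields(word):
    """Split a 32-bit instruction word into all possible fields."""
    return {"opcode": (word >> 26) & 0x3F, "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F, "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F, "funct": word & 0x3F,
            "imm": word & 0xFFFF, "target": word & 0x3FFFFFF}


print(hex(encode_r(rs=1, rt=2, rd=3, shamt=0, funct=0x21)))  # addu $3, $1, $2
print(hex(encode_i(0x09, rs=0, rt=2, imm=-1)))               # addiu $2, $0, -1
print(hex(encode_j(0x02, 0x00400000 >> 2)))                  # j 0x00400000
```

Decoding extracts every field regardless of format; which fields are meaningful depends on the opcode.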
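[Figure: (a) Mini-MIPS with separate I-Mem and D-Mem; (b) Mini-MIPS connected through a bus interface to a single memory (Mem); (c) Mini-MIPS with I-Cache and D-Cache connected through a bus interface to a single memory (Mem)]
Figure 2: Different Mini-MIPS system configurations.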
2.3 Bus protocol
Both data and instruction memory interfaces of the Mini-MIPS use the same protocol.
This implies that I-Mem and D-Mem in Fig. 2(a) can be modelled as instances of the
same VHDL entity, with I-Mem initialized with instructions and D-Mem with data. Fig. 3
shows signals in the memory interface and Fig. 4 shows the protocol. The interfaces are
synchronous and only two transactions are provided: Write Word and Read Word. Also,
the CPU indicates on a clock cycle basis if it is using the bus or not. Signals REQ and
RW are used for this indication (and they are valid just after the rising edge of the clock
and throughout the clock period).
As there are multiple drivers on the DATA bus, the different data sources need to be
properly connected to avoid conflicts. Usually, three kinds of bus connections are
available: OR-, MUX-, and tristate-bus. In this project, the OR-bus is chosen because it has the
smallest hardware footprint and is always available (tristate buffers are not always available, for example
in FPGAs).
The Hold signal in the memory interface is raised when the memory needs more than one
clock cycle to perform the read or write transaction. In case of a write operation, the
Mini-MIPS should continuously drive the bus as shown in Fig. 4.
Table 1: Mini-MIPS instruction set (a true subset of the MIPS R2000 instruction set).
α = 0 for the single cycle Task 1 version, α = 4 for the pipelined Task 2 version. ‘&’
indicates bit-string concatenation; ‘s()’ represents data sign extension.
For I-type instructions, Imm/Offset occupies bits 15-0; for J-type instructions, Target occupies bits 25-0.

Inst.   31-26   25-21   20-16   15-11   10-6    5-0     Semantics

Arithmetic
addu    X“00”   R[s]    R[t]    R[d]    X“00”   X“21”   R[d] = R[s] + R[t]
addiu   X“09”   R[s]    R[t]    Imm                     R[t] = R[s] + s(Imm)
subu    X“00”   R[s]    R[t]    R[d]    X“00”   X“23”   R[d] = R[s] - R[t]
multu   X“00”   R[s]    R[t]    X“00”   X“00”   X“19”   LO = ((R[s] * R[t]) ≪ 32) ≫ 32
                                                        HI = (R[s] * R[t]) ≫ 32

Logical
and     X“00”   R[s]    R[t]    R[d]    X“00”   X“24”   R[d] = R[s] AND R[t]
or      X“00”   R[s]    R[t]    R[d]    X“00”   X“25”   R[d] = R[s] OR R[t]
xor     X“00”   R[s]    R[t]    R[d]    X“00”   X“26”   R[d] = R[s] XOR R[t]
sll     X“00”   X“00”   R[t]    R[d]    Shamt   X“00”   R[d] = R[t] ≪ Shamt (logical)
srl     X“00”   X“00”   R[t]    R[d]    Shamt   X“02”   R[d] = R[t] ≫ Shamt (logical)
sra     X“00”   X“00”   R[t]    R[d]    Shamt   X“03”   R[d] = R[t] ≫ Shamt (arithmetic)
slt     X“00”   R[s]    R[t]    R[d]    X“00”   X“2A”   R[d] = if (R[s] < R[t]) then 1 else 0 (signed)
sltu    X“00”   R[s]    R[t]    R[d]    X“00”   X“2B”   R[d] = if (R[s] < R[t]) then 1 else 0 (unsigned)

Data Transfer
mfhi    X“00”   X“00”   X“00”   R[d]    X“00”   X“10”   R[d] = HI
mflo    X“00”   X“00”   X“00”   R[d]    X“00”   X“12”   R[d] = LO
lui     X“0F”   X“00”   R[t]    Imm                     R[t] = Imm & X“0000”
lw      X“23”   R[s]    R[t]    Offset                  R[t] = Mem[R[s] + s(Offset)]
sw      X“2B”   R[s]    R[t]    Offset                  Mem[R[s] + s(Offset)] = R[t]

Unconditional jump
j       X“02”   Target                                  PC = (PC + α)[31:28] & Target[25:0] & “00”
jal     X“03”   Target                                  R[31] = PC + 4 + α
                                                        PC = (PC + α)[31:28] & Target[25:0] & “00”
jr      X“00”   R[s]    X“00”   X“00”   X“00”   X“08”   PC = R[s]

Conditional branch
beq     X“04”   R[s]    R[t]    Offset                  PC = if (R[s] == R[t]) then (PC + α + (s(Offset) ≪ 2)) else (PC + 4)
bne     X“05”   R[s]    R[t]    Offset                  PC = if (R[s] ≠ R[t]) then (PC + α + (s(Offset) ≪ 2)) else (PC + 4)
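Some of the semantics in Table 1 are easy to get wrong in an executable specification, in particular sign extension, signed versus unsigned comparison, arithmetic shift and the HI/LO split of multu. The Python sketch below models these on 32-bit values as a reference for test programs; the function names are assumptions, and the behaviour follows the semantics column of the table.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """s(): interpret a 16-bit field as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed32(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def multu(rs, rt):
    """Return (HI, LO) of the unsigned 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def slt(rs, rt):
    return 1 if to_signed32(rs) < to_signed32(rt) else 0


def sltu(rs, rt):
    return 1 if (rs & MASK32) < (rt & MASK32) else 0


def sra(rt, shamt):
    """Arithmetic right shift: the sign bit is replicated."""
    return (to_signed32(rt) >> shamt) & MASK32


def branch_target(pc, offset, alpha):
    """Taken-branch address: PC + alpha + (s(Offset) << 2)."""
    return (pc + alpha + (sign_extend16(offset) << 2)) & MASK32


print(multu(0xFFFFFFFF, 2), slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))
print(hex(sra(0x80000000, 4)), hex(branch_target(0x00400010, 0xFFFF, 4)))
```

Comparing the simulated register contents against such a model helps to separate errors in the test program from errors in the VHDL.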
[Figure: Mini-MIPS connected to Memory through the signals REQ, RW, Hold, Addr and Data; both are clocked by CLK]

REQ   RW   Description
0     X    Bus not used
1     1    Read transaction (load word)
1     0    Write transaction (store word)

Figure 3: Mini-MIPS memory interface (used for both I-Mem and D-Mem).
[Figure: waveforms of CLK, REQ, RW, Hold, Addr and Data for read transactions R1 to R3 and write transactions W4 and W5 at addresses Addr1 to Addr5]
Figure 4: Timing diagram showing read and write transactions.
3 Using SPIM for the Mini-MIPS project
3.1 Introduction
We use SPIM to develop programs for the Mini-MIPS project, i.e. to simulate and debug
the programs, and to translate the symbolic assembly language code into binary machine
code, which is what the real hardware as well as the VHDL models of the Mini-MIPS can
execute. PC-SPIM is introduced in appendix A of the Patterson & Hennessy textbook as
well as in sections “Software” and “Tutorials” on the companion CD.
3.2 Memory map and initialization
The Mini-MIPS implementation conforms to the memory usage conventions of SPIM
and the real MIPS (P&H "COD" 3e, figures 2.17 and A.5.1): A user program must start
at address 0x00400000 (the text-segment). Similarly the data-segment starts at address
0x10000000, and by default SPIM will place data at address 0x10010000 unless the .sdata
or .rdata directive is followed by an address.
The VHDL code modelling of the memory unit contains data structures which popu-
late the following 3 fragments of the address space:
• 0x00000000 − 0x00000017 (Initialization and jump to 0x00400000)
• 0x00400000 − 0x00401ffc (Text segment, program starts at 0x00400000)
• 0x10010000 − 0x10011ffc (Data segment, only dynamic data and stack)
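When writing test programs it can be handy to check whether an address falls inside one of the populated fragments. The following Python sketch encodes the three ranges listed above; the segment names and the function name are assumptions chosen for illustration.

```python
# (name, first byte address, last byte address) of each populated fragment
SEGMENTS = [
    ("init", 0x00000000, 0x00000017),
    ("text", 0x00400000, 0x00401FFC),
    ("data", 0x10010000, 0x10011FFC),
]


def segment_of(addr):
    """Return the fragment name for a word-aligned address, or None if unpopulated."""
    if addr % 4 != 0:
        raise ValueError("only word-aligned addressing is supported")
    for name, first, last in SEGMENTS:
        if first <= addr <= last:
            return name
    return None


print(segment_of(0x00400000), segment_of(0x10011FFC), segment_of(0x00402000))
```

An access for which the function returns None would hit a part of the address space that the VHDL memory model does not populate.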
At reset the program counter is set to 0x00000000, and the initialization code in the
VHDL model sets up the stack pointer ($sp) & global pointer ($gp) and jumps to address
0x00400000 where your program starts. In the real MIPS as well as in SPIM, the top of the
stack will start at address 0x7fffeffc, and the $sp (R[29]) will be initialized accordingly.
Similarly, $gp (R[28]) is initialized to point to 0x10008000; but this is only used by a
compiler and you may forget all about it for now. As the VHDL model of the memory
populates only a small fraction of the address space, $sp is initialized to 0x10011ffc in
the VHDL model. If your test program does not depend on these initializations, you can
skip the initialization code and simply set the program counter to 0x00400000 and start
your program.
In a similar way SPIM may load an exception program (along with your test program)
which performs similar initializations and jumps to 0x00400000 starting your program.
The exception handler does much more than initializing the above registers, and it is
quite complex. Therefore it is recommended that you do not load it, i.e., deselect “Load
exceptions file” in the settings menu, and remember to manually set the program counter
to 0x00400000 before every simulation run.
3.3 Loading program data into the VHDL model
The VHDL model of the memory entity is capable of: (i) extracting the binary image of a
program and its data from a log-file saved from SPIM, and (ii) loading it into the memory.
In this way you can execute programs developed using SPIM on your VHDL model of
the Mini-MIPS system.
In practice this is handled as follows. Just after opening the assembler source file in
SPIM you save a log file (File-menu, “Save Log File”). Then you copy the log file to your
ModelSim project directory (where the .mpf and .vhd files are), and edit the filename
in the configuration (in test1.vhd or test2.vhd) to conform with the name of the log file.
Make sure that the first word of data at 0x10010000 is non-zero, or the .log file loader
will fail.
As explained later we provide source code for two test programs, and for these we
also provide log files generated with the “Load exceptions file” setting disabled.
3.4 Virtual and bare machine
SPIM supports both the virtual machine and the bare machine instruction set. SPIM is
capable of expanding pseudo-instructions but it is not able to reorder instructions.
When simulating the virtual machine described in Appendix A of the textbook, enable
the settings: “Allow pseudo-instructions”, “Mapped I/O”; and disable “Bare machine”,
“Delayed branches” and “Delayed loads”. This is how you develop test programs for the
single-cycle Mini-MIPS in Task 1.
When simulating the pipelined bare machine, i.e. test programs for the pipelined
Mini-MIPS in Task 2, “Bare machine”, “Delayed branches” and “Delayed load” must
be checked. Note that in this mode, the assembly programs must be written to account
for the branch delay slot and the load-use data hazards, as SPIM is not able to reorder
(reschedule) instructions by itself. Beware of placing pseudo-instructions in the branch
delay slot, since they may be expanded into multiple instructions. The “quick and dirty”
way to do this is to manually insert a nop instruction after every load, branch & jump
instruction in the assembly source. Optimally the program instructions should be reordered
7
to place useful instructions in the branch delay slots and after load instructions. Verify
that the final program runs OK with “Delayed branches” and “Delayed load” enabled in
SPIM, before running the program on your pipelined Mini-MIPS.
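The “quick and dirty” nop insertion can be automated with a small script. The Python sketch below is only an illustration: the function name insert_nops and the DELAYED mnemonic set are assumptions made for this example, and it handles only plain source with one instruction per line, optional labels and # comments. Pseudo-instructions that expand into branches or loads are not recognised.

```python
import re

# Mnemonics that are followed by a delay slot (assumption: the Mini-MIPS
# load, branch and jump instructions; pseudo-instructions are not covered).
DELAYED = {"lw", "beq", "bne", "bltz", "bgez", "blez", "bgtz",
           "j", "jal", "jr", "jalr"}

def insert_nops(source: str) -> str:
    """Return the assembly text with a nop after every load, branch and jump."""
    out = []
    for line in source.splitlines():
        out.append(line)
        code = line.split("#", 1)[0]           # drop a trailing comment
        code = re.sub(r"^\s*\w+:", "", code)   # drop a leading label
        words = code.split()
        if words and words[0] in DELAYED:
            out.append("\tnop")
    return "\n".join(out)
```

The output is a starting point only: it still has to be checked in SPIM with “Delayed branches” and “Delayed loads” enabled, and useful instructions should replace the nops where possible.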
4 Compulsory tasks
4.1 Task 1: Single cycle behavioural specification
Your first task will be to write a top-level behavioural specification of the Mini-MIPS CPU
(as abstract, short and precise as possible). You will be given a test bench corresponding
to the system structure shown in Fig. 2(a). The CPU architecture body includes the data-
and instruction memories in the form of two arrays of 32-bit words. The “memories” are
initialized by reading a test program from a text file. The filename is a generic and it
is specified in the configuration for the test bench. This makes it straightforward to run
different programs on the VHDL model of the Mini-MIPS.
The only thing missing in this model is the VHDL code specifying the behaviour of
the CPU entity. The specification should be a single cycle model (i.e. without pipelining,
delayed branch, delayed load, and register forwarding). An overview of the files given to
you is found in Section 5.
We provide you with two test programs, one for computing Fibonacci numbers and
one for computing square roots. These two programs will NOT test the CPU completely.
To test all instructions in the instruction set you will need to develop additional test
programs. Try to use each instruction in a few different ways so that all parts of the CPU
get tested.
At reset your Mini-MIPS should start fetching instructions from address 0x00000000
which contains a small start-up program. You can bypass this for debugging purposes and
start from 0x00400000 if you need to.
A) Write a behavioural specification of the Mini-MIPS CPU and run the two test
programs in the VHDL simulator. NB: Make sure to select VHDL-93 in the simulator.
B) Write a test program that tests all instructions in the Mini-MIPS instruction set.
Compare the behaviour of your Mini-MIPS with SPIM.
4.2 Task 2: Five stage pipeline implementation
Your next task is to make an implementation-like behavioural model of the Mini-MIPS
CPU with a 5-stage pipeline (IF, ID, EX, MEM, WB), and the register forwarding
needed to resolve data hazards in the pipeline. Your CPU should work with external
memory modules operating according to the timing diagram in Fig. 4. Do not forget to
support the HOLD signal. You will be given a VHDL model of a memory module, and
a test bench instantiating a system composed of a clock generator, your CPU, and two
memories. The memories are loaded from text files specified in the configuration.
Unlike the example in chapter 6 in the textbook, your Mini-MIPS implementation
must have a branch delay slot of only one instruction, similar to the MIPS I ISA’s
“delayed branches”. This means that calculation of branch conditions and updating of the
PC should be done in the ID-stage of the pipeline. This implies that register forwarding
must also be handled by the ID-stage, which again implies that the ID-stage always
delivers the proper register operands to the EX-stage. Consequently, the register forwarding
mechanism can be implemented by multiplexers just after the outputs of the register file.
Figure 5: Structuring your VHDL model into processes. Process P1 contains both
combinational and sequential logic; all outputs are registered. P2comb is pure combinational
logic. P2reg is a sequential process which models a register (using flip-flops).
Finally we adopt the “delayed load” MIPS I ISA programming convention that no
instruction will immediately access a register that is loaded by the preceding lw instruction.
With two memories, delayed branches, and delayed load, it is now possible to resolve all
remaining hazards with a forwarding unit.
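The selection performed by the forwarding multiplexers in the ID-stage can be sketched as a small reference model. The Python function below is an illustration only, not the required VHDL structure: the name forward_operand and the in_flight list are assumptions made for this example. It assumes that results of older instructions in the EX, MEM and WB stages are visible to the ID-stage, and it relies on the delayed load convention so that a lw still in the EX-stage is never matched.

```python
def forward_operand(reg, regfile_value, in_flight):
    """Select the value the ID-stage should use for source register `reg`.

    in_flight holds the results of older instructions still in the pipeline,
    youngest first (EX, then MEM, then WB). Each entry is (dest_reg, value),
    or None for a bubble or an instruction that writes no register.
    """
    if reg == 0:
        return 0                      # register 0 always reads as zero
    for entry in in_flight:
        if entry is not None and entry[0] == reg:
            return entry[1]           # the youngest matching producer wins
    return regfile_value              # no hazard: use the register file output
```

Checking the youngest stage first matters when two in-flight instructions write the same register: the consumer must see the most recent result. A model like this can serve as a golden reference when writing the forwarding test programs in Task 2 C).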
A) Draw a block diagram of your 5-stage Mini-MIPS pipeline (similar to the figures
in chapter 6 of the textbook 3e, for example figure 6.36 or 6.41). If you make this
detailed, nice, and clear it will save you hours of VHDL code debugging.
B) Write a behavioural implementation-like VHDL description of the pipelined
Mini-MIPS. You should organize your description into a number of processes executing
concurrently within a single architecture body. You might want to describe larger
components (ALU, register file) as separate entities and instantiate them as
components in the main architecture body of the CPU, as this makes testing easier.
Hints: Each pipeline stage in the Mini-MIPS CPU performs a number of independent
computations, and you can model each of these as a separate process. Fig. 5
illustrates this idea. If the input signals to a register are needed elsewhere in the
design, for example to implement forwarding, these signals must be declared and the
pipeline stage fragment must be modelled as a combinational part and a sequential part.
NB: The VHDL coding style used in Fig. 5 is synthesizable. There is a branch delay
slot after unconditional jump and conditional branch instructions, i.e. the bxx, bxxx, j,
jr, jal and jalr instructions.
C) Develop additional test programs to test register forwarding from all stages, delayed
load and delayed branches. Make sure that all data paths in the CPU are tested.
Figure 6: An illustration of the test set-up for Mini-MIPS using Xilinx MicroBlaze.
(Diagram labels: PC, UART Lite, MicroBlaze, PLB bus 4.6, PLB bus driver & user logic,
Mini-MIPS pipeline registers IF/ID, ID/EXE, EXE/MEM and MEM/WB, debugging data path,
Xilinx XUP Virtex-II Pro development board.)
4.3 Task 3: ASIC synthesis and place & route
Synthesize and place & route the Mini-MIPS in a standard ASIC design flow using 130 nm
low-power CMOS cells. Make sure the CPU is correctly implemented by doing a
back-annotated timing simulation. You should also determine the longest combinational path
and the maximum clock frequency of your Mini-MIPS CPU, as well as the hardware
resources used.
4.4 Task 4: FPGA verification
Synthesize and place & route the Mini-MIPS for FPGA implementation in Xilinx ISE,
and verify the functionality using the Xilinx XUP Virtex-II Pro development board.
Hints: In order to send data into and read data out of the Mini-MIPS, you should
design a proper verification environment around the CPU to ease the tests. One simple
approach is to use a master processor to drive the Mini-MIPS. You might for instance
build a testbed in Xilinx EDK, where you attach the Mini-MIPS as a slave processor and an
RS232 module as a communication interface to the common PLB data bus, and drive
the bus with the Xilinx MicroBlaze microprocessor. The RS232 interface can be used to
transfer data between the MicroBlaze and an external PC, while the MicroBlaze in turn
transfers data to and from the Mini-MIPS. The block diagram of this test set-up is
illustrated in Fig. 6.
5 Overview of design files
The following files are available. You can download them from the course homepage.
types.vhd: defines common types for the rest of the modules.
clock.vhd: defines the clock generator.
Task 1
cpu1.vhd: defines the CPU entity. It includes the necessary memory initializations.
Your task is to fill in VHDL code to model the behaviour of a single-cycle Mini-MIPS
CPU. The entity cpu1 has a generic parameter called programname. This parameter is
used to specify which program is executed by the CPU. The file containing the program
is obtained by loading an assembly file (for example fib1.s) into the SPIM simulator and
then saving the log file. The log file obtained has all the information necessary for the
CPU to run the program.
mem1.vhd: defines the memory package. It includes procedures for accessing the
instruction and data memory.
test1.vhd: defines the test bench. It instantiates two components: A clock generator
and a CPU (cpu1). Two different configurations are given. testfib is used to execute the
Fibonacci test program and testsqrt is used to execute the square root test program.
fib1.s & fib1.log: the source code and the log file for the Fibonacci test program.
sqrt1.s & sqrt1.log: the source code and the log file for the square root test program.
Task 2
cpu2.vhd: defines the CPU entity with ports to the external memory components.
The body of the entity has been removed; it is your task to develop a 5-stage pipelined
version of the CPU.
mem2.vhd: defines the memory entity. Note that the entity has two generic parameters:
filename, which defines the name of a binary memory image to load, and n, which defines
the number of hold cycles (wait states) that the memory instance inserts at each access.
By setting this parameter larger than zero, you can test that your CPU model responds
correctly to the hold signal.
test2.vhd: defines the test bench. Notice how PORT MAP statements are used to
connect instances of the clock, cpu, and memory entities. Two different configurations
are given. testfib is used to execute the Fibonacci test program and testsqrt is used to
execute the square root test program. Notice how wait states are inserted for data memory
accesses. Notice also that the same binary image is loaded into the instruction memory
and the data memory. The reason for not separating instructions and data is that the test
programs can then also be used to test the system configurations shown in figures 3(b)
and 3(c).
fib2.s & fib2.log: the source code and the log file for the Fibonacci test program. fib2
differs from fib1 by using delayed branches.
sqrt2.s & sqrt2.log: the source code and the log file for the square root test program.
sqrt2 differs from sqrt1 by using delayed branches.
6 Optional tasks
This section specifies a number of optional tasks. You will need to complete at least one
group of tasks (Task 6 and 7 belong to one group) in order to acquire a higher grade.
6.1 Task 5: Adding a console I/O peripheral
The Mini-MIPS is quite useless without any I/O communication with the external world.
In this task you need to add an I/O bus interface to the Mini-MIPS, and attach at least one
peripheral module to the bus, such as I/O switches and LEDs, an RS232 interface, etc.
The console I/O peripheral can be connected to the data memory interface of the
Mini-MIPS, with suitable address decoding for the memory-mapped registers. For example,
you can allocate a range of addresses in the memory space for the peripherals, and use
an I/O console to do the proper address decoding and to manage the operation of the
peripherals. You may also consider adding an interrupt controller to further improve the
usability of the Mini-MIPS. For example, when a timer is connected to the I/O bus, the CPU
needs to be interrupted upon a timer event.
PCSpim is able to simulate the console hardware (Window->Console), so you can test
simple console programs. Four hardware registers are memory-mapped to the address
range 0xFFFF0000 to 0xFFFF000F (16 bytes):
0xFFFF0000  Receiver control register: Bit 0: ready, Bit 1: keyboard interrupt enable.
0xFFFF0004  Receiver data register: Lower 8 bits: last character typed.
0xFFFF0008  Transmitter control register: Bit 0: ready, Bit 1: transmit interrupt enable.
0xFFFF000C  Transmitter data register: Lower 8 bits: character to be sent.
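The address decoding for these four registers can be sketched as follows. This Python model is an illustration only: the function name decode_console_address and the register names in CONSOLE_REGS are assumptions chosen for this example, while the addresses are the ones listed above.

```python
CONSOLE_BASE = 0xFFFF0000

# Byte offsets within the 16-byte console window, as listed above.
CONSOLE_REGS = {
    0x0: "receiver_control",
    0x4: "receiver_data",
    0x8: "transmitter_control",
    0xC: "transmitter_data",
}

def decode_console_address(addr: int):
    """Return the console register selected by addr, or None for ordinary memory."""
    if (addr & 0xFFFFFFF0) != CONSOLE_BASE:
        return None                        # outside the console window
    return CONSOLE_REGS.get(addr & 0xF)    # None for non-word-aligned offsets
```

In hardware the same decision is a comparison of the upper 28 address bits against the window base, followed by a 2-bit register select taken from address bits 3-2.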
6.2 Task 6: Adding a memory interface
Design a bus interface and implement the system structure shown in Fig. 2(b), which uses
only one memory module. This can be regarded as a von Neumann architecture. Create an
entity called minimips3 that connects the CPU from Task 2 to the bus interface entity. Hint:
You’ll need to be able to stall the CPU, as the memory in this set-up holds both instructions
and data.
6.3 Task 7: Adding an instruction and/or data cache
Add an instruction or data cache between the CPU and memory bus interface as illustrated
in Fig. 2(c). Start your design with simple cache configurations, such as a direct-mapped
write-through cache, and gradually increase the hardware complexity.
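For a direct-mapped cache, each address is split into a tag, a line index and a word offset. The Python sketch below shows that split; the function name split_address and the default geometry (64 lines of 4 words) are hypothetical values chosen for illustration, and both parameters are assumed to be powers of two.

```python
def split_address(addr: int, num_lines: int = 64, words_per_line: int = 4):
    """Split a 32-bit byte address into (tag, index, word_offset).

    num_lines and words_per_line are assumed to be powers of two.
    """
    offset_bits = (words_per_line - 1).bit_length()   # word-in-line bits
    index_bits = (num_lines - 1).bit_length()         # line-select bits
    word_addr = (addr & 0xFFFFFFFF) >> 2              # drop byte-in-word bits
    word_offset = word_addr & (words_per_line - 1)
    index = (word_addr >> offset_bits) & (num_lines - 1)
    tag = word_addr >> (offset_bits + index_bits)
    return tag, index, word_offset
```

Two addresses with the same index but different tags compete for the same cache line, which is a useful case to cover in a cache test program.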
6.4 Task 8: Using the MipsIt GCC C Compiler
MipsIt is a simplified GCC C cross-compiler for MIPS, with a Windows interface. By
extending the instruction set to include the instructions listed in Table 2 and Table 3, it
is possible to compile C programs and execute them on the Mini-MIPS. The following
describes how to generate an assembly language file for SPIM. The MipsIt software can be
downloaded from the course homepage.
Locate and run “MipsIt.exe” in the bin directory of your MipsIt installation. To
begin, select “File->New” in the menu and create a new “C(minimal)/Assembler” project.
Create a new C file (or add an existing one). When the C file is open, select “Build->View
Assembler” to compile the C program into readable MIPS assembly code. Save the
assembler code in a .s file and load it in SPIM. Note that GCC does not know about delayed
loads and delayed branches, so to run the code in SPIM, “Delayed loads” and “Delayed
branches” must be disabled. To use the GCC output with the pipelined Task 2 Mini-MIPS,
you must manually modify the assembly code as described in Section 3.4, e.g. reschedule
your code or insert nop instructions to fill the delay slots. Other guidelines for using
GCC with Mini-MIPS are listed below:
• No C libraries are available for Mini-MIPS, so you’ll have to do without <stdio.h>
and so on.
• Since the Mini-MIPS does not include instructions for loading and storing bytes
and half-words, the char and short data types cannot be used, just use the int type
for everything.
• Mini-MIPS does not include the div instructions, so the C operators ‘/’ and ‘%’
should only be used with constants.
• Q: What is _main()? It doesn’t exist!
A: The MipsIt C compiler will insert a call to _main() to do some initialization
before running your program. Easy fix: define it as an empty function:
    void _main(void) {}
• Q: Why does my program (e.g. sqrt1.s) read from the stack?
A: PCSpim inserts some C startup code before your program; just let it run and your
program will eventually start at address 0x00400024. To disable this, change .text
in “sqrt1.s” to .text 0x00400000 and the startup code will not be there.
Table 2: Extended Mini-MIPS instruction set supporting the MipsIt GCC C compiler.
Extended instructions are shown in bold. α = 4 for the pipelined processor. ‘&’ indicates
bit-string concatenation; ‘s()’ represents signed extension; ‘us()’ represents unsigned
extension. Imm occupies bits 15-0.
Inst.  31-26  25-21  20-16  15-11  10-6   5-0    Semantics
Arithmetic
addu   X“00”  R[s]   R[t]   R[d]   X“00”  X“21”  R[d] = R[s] + R[t]
addiu  X“09”  R[s]   R[t]   Imm                  R[t] = R[s] + s(Imm)
subu   X“00”  R[s]   R[t]   R[d]   X“00”  X“23”  R[d] = R[s] - R[t]
multu  X“00”  R[s]   R[t]   X“00”  X“00”  X“19”  LO = ((R[s] * R[t]) ≪ 32) ≫ 32
                                                 HI = (R[s] * R[t]) ≫ 32
Logical
and    X“00”  R[s]   R[t]   R[d]   X“00”  X“24”  R[d] = R[s] AND R[t]
or     X“00”  R[s]   R[t]   R[d]   X“00”  X“25”  R[d] = R[s] OR R[t]
xor    X“00”  R[s]   R[t]   R[d]   X“00”  X“26”  R[d] = R[s] XOR R[t]
sll    X“00”  X“00”  R[t]   R[d]   Shamt  X“00”  R[d] = R[t] ≪ Shamt (logical)
srl    X“00”  X“00”  R[t]   R[d]   Shamt  X“02”  R[d] = R[t] ≫ Shamt (logical)
sra    X“00”  X“00”  R[t]   R[d]   Shamt  X“03”  R[d] = R[t] ≫ Shamt (arithmetic)
slt    X“00”  R[s]   R[t]   R[d]   X“00”  X“2A”  R[d] = if (R[s] < R[t]) then 1D else 0D (signed)
sltu   X“00”  R[s]   R[t]   R[d]   X“00”  X“2B”  R[d] = if (R[s] < R[t]) then 1D else 0D (unsigned)
nor    X“00”  R[s]   R[t]   R[d]   X“00”  X“27”  R[d] = R[s] NOR R[t]
andi   X“0C”  R[s]   R[t]   Imm                  R[t] = R[s] AND us(Imm)
ori    X“0D”  R[s]   R[t]   Imm                  R[t] = R[s] OR us(Imm)
xori   X“0E”  R[s]   R[t]   Imm                  R[t] = R[s] XOR us(Imm)
sllv   X“00”  R[s]   R[t]   R[d]   X“00”  X“04”  R[d] = R[t] ≪ R[s][4:0] (logical)
srlv   X“00”  R[s]   R[t]   R[d]   X“00”  X“06”  R[d] = R[t] ≫ R[s][4:0] (logical)
srav   X“00”  R[s]   R[t]   R[d]   X“00”  X“07”  R[d] = R[t] ≫ R[s][4:0] (arithmetic)
slti   X“0A”  R[s]   R[t]   Imm                  R[t] = if (R[s] < s(Imm)) then 1D else 0D (signed)
sltiu  X“0B”  R[s]   R[t]   Imm                  R[t] = if (R[s] < s(Imm)) then 1D else 0D (unsigned)
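When comparing your VHDL model against SPIM, a small software reference for the less obvious rows of Table 2 can help. The Python sketch below models a few of them on 32-bit values; the helper names (to_signed, sign_extend16 and the per-instruction functions) are assumptions made for this example and are not part of the provided design files.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def sign_extend16(imm: int) -> int:
    """s(Imm): sign-extend a 16-bit immediate to a 32-bit pattern."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def addiu(rs: int, imm: int) -> int:
    return (rs + sign_extend16(imm)) & MASK32       # wraps, no overflow trap

def srl(rt: int, shamt: int) -> int:
    return (rt & MASK32) >> shamt                   # zeros shifted in

def sra(rt: int, shamt: int) -> int:
    return (to_signed(rt) >> shamt) & MASK32        # sign bit shifted in

def slt(rs: int, rt: int) -> int:
    return int(to_signed(rs) < to_signed(rt))       # signed comparison

def sltu(rs: int, rt: int) -> int:
    return int((rs & MASK32) < (rt & MASK32))       # unsigned comparison

def sltiu(rs: int, imm: int) -> int:
    # The immediate is sign-extended first, then compared as unsigned.
    return int((rs & MASK32) < sign_extend16(imm))

def multu(rs: int, rt: int):
    """Return (HI, LO) of the unsigned 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32
```

The pairs srl/sra and slt/sltu differ only in how bit 31 is treated, so operands with bit 31 set make good test vectors for them.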
Table 3: Extended Mini-MIPS instruction set supporting the MipsIt GCC C compiler,
continued. Extended instructions are shown in bold. α = 4 for the pipelined processor.
‘&’ indicates bit-string concatenation; ‘s()’ represents signed extension; ‘us()’ represents
unsigned extension. Imm and Offset occupy bits 15-0; Target occupies bits 25-0.
Inst.  31-26  25-21  20-16  15-11  10-6   5-0    Semantics
Data Transfer
mfhi   X“00”  X“00”  X“00”  R[d]   X“00”  X“10”  R[d] = HI
mflo   X“00”  X“00”  X“00”  R[d]   X“00”  X“12”  R[d] = LO
lui    X“0F”  X“00”  R[t]   Imm                  R[t] = Imm & X“0000”
lw     X“23”  R[s]   R[t]   Offset               R[t] = Mem[R[s] + s(Offset)]
sw     X“2B”  R[s]   R[t]   Offset               Mem[R[s] + s(Offset)] = R[t]
Unconditional jump
j      X“02”  Target                             PC = (PC + α)[31:28] & Target[25:0] & “00”
jal    X“03”  Target                             R[31] = PC + 4 + α
                                                 PC = (PC + α)[31:28] & Target[25:0] & “00”
jr     X“00”  R[s]   X“00”  X“00”  X“00”  X“08”  PC = R[s]
jalr   X“00”  R[s]   X“00”  R[d]   X“00”  X“09”  R[d] = PC + 4 + α (R[d] is usually R[31])
                                                 PC = R[s] (R[s] and R[d] must be different)
Conditional branch
beq    X“04”  R[s]   R[t]   Offset               PC = if (R[s] == R[t])
                                                 then (PC + α + (s(Offset) ≪ 2))
                                                 else (PC + 4)
bne    X“05”  R[s]   R[t]   Offset               PC = if (R[s] != R[t])
                                                 then (PC + α + (s(Offset) ≪ 2))
                                                 else (PC + 4)
bltz   X“01”  R[s]   X“00”  Offset               PC = if (R[s] < 0)
                                                 then (PC + α + (s(Offset) ≪ 2))
                                                 else (PC + 4)
bgez   X“01”  R[s]   X“01”  Offset               PC = if (R[s] >= 0)
                                                 then (PC + α + (s(Offset) ≪ 2))
                                                 else (PC + 4)
blez   X“06”  R[s]   X“00”  Offset               PC = if (R[s] <= 0)
                                                 then (PC + α + (s(Offset) ≪ 2))
                                                 else (PC + 4)
bgtz   X“07”  R[s]   X“00”  Offset               PC = if (R[s] > 0)
                                                 then (PC + α + (s(Offset) ≪ 2))
                                                 else (PC + 4)
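The field layout and the target calculations in the tables can be checked with a short reference model. The Python sketch below is an illustration only: the function names fields, branch_target and jump_target are assumptions made for this example. The default alpha = 4 follows the table caption for the pipelined processor; since this section does not state the value for the single-cycle model, alpha is left as a parameter.

```python
def fields(instr: int) -> dict:
    """Split a 32-bit instruction word into the bit fields used in the tables."""
    return {
        "op":     (instr >> 26) & 0x3F,    # bits 31-26
        "rs":     (instr >> 21) & 0x1F,    # bits 25-21
        "rt":     (instr >> 16) & 0x1F,    # bits 20-16
        "rd":     (instr >> 11) & 0x1F,    # bits 15-11
        "shamt":  (instr >> 6) & 0x1F,     # bits 10-6
        "funct":  instr & 0x3F,            # bits 5-0
        "imm":    instr & 0xFFFF,          # bits 15-0 (Imm / Offset)
        "target": instr & 0x03FFFFFF,      # bits 25-0 (Target)
    }

def branch_target(pc: int, offset: int, alpha: int = 4) -> int:
    """Taken-branch address: PC + alpha + (s(Offset) << 2)."""
    s = offset & 0xFFFF
    if s & 0x8000:
        s -= 0x10000                       # sign-extend the 16-bit offset
    return (pc + alpha + (s << 2)) & 0xFFFFFFFF

def jump_target(pc: int, target: int, alpha: int = 4) -> int:
    """Jump address: (PC + alpha)[31:28] & Target[25:0] & "00"."""
    return ((pc + alpha) & 0xF0000000) | ((target & 0x03FFFFFF) << 2)
```

For example, the word 0x00851021 decodes to op = 0, rs = 4, rt = 5, rd = 2 and funct = X“21”, which is an addu instruction according to Table 2.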