Elementary µprocessor
Nabil Chouba
The Beginning : 1969
- Scientists felt that IC technology was not yet ready to support a computer on a chip.
- Busicom (Japan) & Intel : design of a calculator built from 12 chips.
- Ted Hoff decided that Intel could build the CPU on one chip, supported by ROM and RAM.
- Ted Hoff designed the architecture.
- Federico Faggin did the logic and physical design (he later founded Zilog).
- Stan Mazor wrote the application programs.
=> The first commercial microprocessor : Intel 4004
   4-bit CPU, 8-bit instructions (some take two words, 16 bits), 12-bit PC, about 2,300 transistors
Instruction set

Our elementary processor :

Instruction register IR:

  15          8 7           0
  +------------+------------+
  |   Opcode   |  Address   |
  +------------+------------+

Assembly language   Machine coding               Functionality description
ADD   Address       00000001 a7a6a5a4a3a2a1a0    ACC ← ACC + Mem[Address]
XOR   Address       00000010 a7a6a5a4a3a2a1a0    ACC ← ACC xor Mem[Address]
STORE Address       00000011 a7a6a5a4a3a2a1a0    Mem[Address] ← ACC
LOAD  Address       00000100 a7a6a5a4a3a2a1a0    ACC ← Mem[Address]
JUMP  Address       00000101 a7a6a5a4a3a2a1a0    PC ← Address
JUMPZ Address       00000110 a7a6a5a4a3a2a1a0    If ACC = 0 then PC ← Address
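The 16-bit instruction word can be packed and unpacked with two shifts and a mask. The sketch below is only an illustration: the helper names `encode` and `decode` are hypothetical, and the opcode numbering is an assumption taken from the order of the machine-coding column above.

```python
# Assumed opcode numbering (order of the machine-coding column).
OPCODES = {"ADD": 1, "XOR": 2, "STORE": 3, "LOAD": 4, "JUMP": 5, "JUMPZ": 6}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic, address):
    """Pack opcode into bits 15..8 and address into bits 7..0."""
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    return (OPCODES[mnemonic] << 8) | address


def decode(word):
    """Split a 16-bit instruction word into (mnemonic, address)."""
    opcode = (word >> 8) & 0xFF
    address = word & 0xFF
    return MNEMONICS[opcode], address


word = encode("ADD", 5)
print(f"{word:016b}")   # 0000000100000101
print(decode(word))     # ('ADD', 5)
```

An unknown opcode raises `KeyError` in `decode`; a real decoder would need to decide what to do with such words.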
Datapath : Main Components
[Figure: datapath block diagram with the registers PC, MAR, MDR, ACC and IR, the ALU, and the Memory with ports A (address), D (dataIn) and Q (dataOut).]
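Datapath : JUMP, Next inst
[Figure: datapath block diagram (PC, MAR, MDR, ACC, IR, ALU, Memory with ports A, D, Q), with a +1 incrementer and a mux added on the PC input.]
IR : bits 15..8 = Opcode, bits 7..0 = Address
JUMP : PC ← Address
Next inst : PC ← PC + 1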
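Datapath : ADD, XOR, LOAD
[Figure: datapath block diagram (PC, MAR, MDR, ACC, IR, +1, mux, ALU, Memory with ports A, D, Q), with a mux added on the ACC input; MDR = Mem[Adrs], where Adrs is taken from IR bits 7..0.]
ADD  : ACC ← ACC + Mem[Address]
XOR  : ACC ← ACC xor Mem[Address]
LOAD : ACC ← Mem[Address]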
Datapath : STORE
[Figure: datapath block diagram (PC, MAR, MDR, ACC, IR, +1, mux, ALU, Memory with ports A (address), D (dataIn), Q (dataOut)).]
STORE : Mem[Address] ← ACC
Datapath : Fetch inst : Get the instruction from the RAM
Von Neumann architecture : the RAM holds both data and instructions.
RAM contents :  0: ADD 5  …  5: 520
[Figure: datapath block diagram annotated with the three steps below.]
1. MAR <= PC          (MAR = 0)
2. MDR <= Mem[MAR]    (MDR = ADD 5)
3. IR <= MDR          (IR = ADD 5)
Datapath : Fetch Data : Get the data from the RAM
IR : bits 15..8 = ADD, bits 7..0 = Adrs : 5
RAM contents :  0: ADD 5  …  5: 520
[Figure: datapath block diagram annotated with the two steps below.]
1. MAR <= IR          (address field, MAR = 5)
2. MDR <= Mem[MAR]    (MDR = 520)
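The two fetch sequences above (instruction, then operand) can be replayed as plain register transfers. This is a minimal sketch, not the author's model: the function names and the dictionary of registers are hypothetical, the PC increment is borrowed from the "Next inst" slide, and `ADD` is encoded with opcode `00000001` as in the instruction-set table.

```python
def fetch_instruction(regs, mem):
    """Fetch inst: MAR <= PC, MDR <= Mem[MAR], IR <= MDR (PC is incremented too)."""
    regs["MAR"] = regs["PC"]                  # step 1
    regs["PC"] = (regs["PC"] + 1) & 0xFF      # PC <= PC + 1, 8-bit address space
    regs["MDR"] = mem[regs["MAR"]]            # step 2
    regs["IR"] = regs["MDR"]                  # step 3


def fetch_data(regs, mem):
    """Fetch data: MAR <= IR (address field), MDR <= Mem[MAR]."""
    regs["MAR"] = regs["IR"] & 0xFF           # address field = IR bits 7..0
    regs["MDR"] = mem[regs["MAR"]]


mem = [0] * 256
mem[0] = (0b00000001 << 8) | 5                # 0: ADD 5
mem[5] = 520                                  # 5: 520
regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0}

fetch_instruction(regs, mem)
fetch_data(regs, mem)
print(regs)   # {'PC': 1, 'MAR': 5, 'MDR': 520, 'IR': 261}
```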
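Datapath : Control Unit
[Figure: datapath block diagram (PC, MAR, MDR, ACC, IR, +1, mux, ALU, Memory) driven by the control unit CTR. Memory ports: A (address), D (dataIn), Q (dataOut), R/W.]
IR : bits 15..8 = opcode, bits 7..0 = Address
CTR inputs  : IR Opcode, Zero flag Z
CTR outputs : loadPC, loadMAR, loadMDR, loadIR, loadACC, muxPC, muxMAR, muxACC, opALU, R/W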
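ALU
[Figure: ALU with inputs a and b feeding an adder (+) and an xor, and a mux selecting Result. Shown twice: top-level RTL view, and netlist (gate view).]
The ALU can be implemented using standard cells or a hard macro.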
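A behavioural sketch of this ALU fits in a few lines. It assumes the 16-bit data width and the `opALU` encoding used elsewhere in the deck (`'1'` for add, `'0'` for xor); the function name `alu` is chosen here only for illustration.

```python
MASK16 = 0xFFFF  # 16-bit data path


def alu(a, b, op_alu):
    """op_alu = 1 selects a + b, op_alu = 0 selects a xor b."""
    if op_alu == 1:
        return (a + b) & MASK16   # the carry out of bit 15 is dropped
    return (a ^ b) & MASK16


print(alu(2, 1, 1))        # 3
print(alu(8, 8, 0))        # 0
print(alu(0xFFFF, 1, 1))   # 0 (wraps around)
```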
Memory : SRAM
Pinout :
- A : address
- D : data input
- Q : data output
- R/W : read / write
Features :
- Row address decoder
- Sense amplifiers
- Column decoder/mux
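CTR : state machine

rst → Fetch Next stage 1

State                  Operation
Fetch Next stage 1     MAR ← PC, PC ← PC + 1
Fetch Next stage 2     MDR ← Mem[MAR]
Fetch Next stage 3     IR ← MDR
Decode instruction     decode (IR[15:8]), MAR ← IR[7:0]
Execute LOAD MDR       MDR ← Mem[MAR]   (first execute cycle of ADD, XOR and LOAD)
Execute ADD            ACC ← ACC + MDR
Execute XOR            ACC ← ACC xor MDR
Execute LOAD           ACC ← MDR
Execute STORE          Mem[MAR] ← ACC
Execute JUMP           PC ← IR[7:0]

Transitions :
- Fetch Next stage 1 → stage 2 → stage 3 → Decode instruction
- Decode, Opcode = ADD : Execute LOAD MDR → Execute ADD → Fetch Next stage 1
- Decode, Opcode = XOR : Execute LOAD MDR → Execute XOR → Fetch Next stage 1
- Decode, Opcode = LOAD : Execute LOAD MDR → Execute LOAD → Fetch Next stage 1
- Decode, Opcode = STORE : Execute STORE → Fetch Next stage 1
- Decode, Opcode = JUMP or JUMPZ & Z : Execute JUMP → Fetch Next stage 1
- Decode, Opcode = JUMPZ & !Z : Fetch Next stage 1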
CTR : state machine, control outputs per state (outputs not listed are '0')

State          Outputs
Fetch_1        muxMAR ← '0', muxPC ← '0', loadPC ← '1', loadMAR ← '1'
Fetch_2        R/W ← '0', loadMDR ← '1'
Fetch_3        loadIR ← '1'
Decode         muxMAR ← '1', loadMAR ← '1'
ExecADD_1      R/W ← '0', loadMDR ← '1'
ExecADD_2      loadACC ← '1', muxACC ← '0', opALU ← '1'
ExecOR_1       R/W ← '0', loadMDR ← '1'
ExecOR_2       loadACC ← '1', muxACC ← '0', opALU ← '0'
ExecLoad_1     R/W ← '0', loadMDR ← '1'
ExecLoad_2     loadACC ← '1', muxACC ← '1'
ExecStore_1    R/W ← '1'
ExecJump       muxPC ← '1', loadPC ← '1'
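The state machine can be sketched as a transition table, which also gives the number of clock cycles each instruction takes. This is a rough model under stated assumptions: state names follow the deck (the xor instruction uses the `ExecOR_*` states), while `next_state` and `cycles` are hypothetical helper names, and one clock cycle per state is assumed.

```python
# Transitions that do not depend on the opcode.
FIXED_NEXT = {
    "Fetch_1": "Fetch_2", "Fetch_2": "Fetch_3", "Fetch_3": "Decode",
    "ExecADD_1": "ExecADD_2", "ExecADD_2": "Fetch_1",
    "ExecOR_1": "ExecOR_2", "ExecOR_2": "Fetch_1",
    "ExecLoad_1": "ExecLoad_2", "ExecLoad_2": "Fetch_1",
    "ExecStore_1": "Fetch_1", "ExecJump": "Fetch_1",
}
# First execute state chosen in Decode.
EXEC_ENTRY = {
    "ADD": "ExecADD_1", "XOR": "ExecOR_1", "LOAD": "ExecLoad_1",
    "STORE": "ExecStore_1", "JUMP": "ExecJump",
}


def next_state(state, opcode, z):
    """Return the state after the next clock edge."""
    if state == "Decode":
        if opcode == "JUMPZ":
            return "ExecJump" if z else "Fetch_1"
        return EXEC_ENTRY[opcode]
    return FIXED_NEXT[state]


def cycles(opcode, z=False):
    """Count clock cycles from Fetch_1 until the FSM is back in Fetch_1."""
    state, count = "Fetch_1", 0
    while True:
        state = next_state(state, opcode, z)
        count += 1
        if state == "Fetch_1":
            return count


for op in ("ADD", "XOR", "LOAD", "STORE", "JUMP"):
    print(op, cycles(op))            # ADD 6, XOR 6, LOAD 6, STORE 5, JUMP 5
print("JUMPZ taken", cycles("JUMPZ", z=True))       # 5
print("JUMPZ not taken", cycles("JUMPZ", z=False))  # 4
```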
VHDL – FSM : Next State Processing

  -- next state processing
  combinatory_FSM_next : process(state_reg, opcode, zflag)
  begin
    state_next <= state_reg;  -- default: stay in the current state
    case state_reg is
      when Fetch_1 => state_next <= Fetch_2;
      when Fetch_2 => state_next <= Fetch_3;
      when Fetch_3 => state_next <= decode;
      when decode =>
        case opcode is
          when op_add   => state_next <= ExecADD_1;
          when op_or    => state_next <= ExecOR_1;
          when op_load  => state_next <= ExecLoad_1;
          when op_store => state_next <= ExecStore_1;
          when op_jump  => state_next <= ExecJump;
          when op_jumpz =>
            -- jump only if ACC is zero, otherwise fetch the next instruction
            state_next <= Fetch_1;
            if zflag = '1' then
              state_next <= ExecJump;
            end if;
          when others => null;
        end case;
      when ExecADD_1   => state_next <= ExecADD_2;
      when ExecADD_2   => state_next <= Fetch_1;
      when ExecOR_1    => state_next <= ExecOR_2;
      when ExecOR_2    => state_next <= Fetch_1;
      when ExecLoad_1  => state_next <= ExecLoad_2;
      when ExecLoad_2  => state_next <= Fetch_1;
      when ExecStore_1 => state_next <= Fetch_1;
      when ExecJump    => state_next <= Fetch_1;
      when others => null;
    end case;
  end process;
VHDL – FSM : Output Processing

  -- output processing
  combinatory_outproc : process(state_reg)
  begin
    -- default values
    muxPC <= '0'; muxMAR <= '0'; muxACC <= '0';
    loadMAR <= '0'; loadPC <= '0'; loadACC <= '0'; loadMDR <= '0'; loadIR <= '0';
    opALU <= '0'; MemRW <= '0';
    case state_reg is
      when Fetch_1     => muxMAR <= '0'; muxPC <= '0'; loadPC <= '1'; loadMAR <= '1';
      when Fetch_2     => MemRW <= '0'; loadMDR <= '1';
      when Fetch_3     => loadIR <= '1';
      when decode      => muxMAR <= '1'; loadMAR <= '1';
      when ExecADD_1   => MemRW <= '0'; loadMDR <= '1';
      when ExecADD_2   => loadACC <= '1'; muxACC <= '0'; opALU <= '1';
      when ExecOR_1    => MemRW <= '0'; loadMDR <= '1';
      when ExecOR_2    => loadACC <= '1'; muxACC <= '0'; opALU <= '0';
      when ExecLoad_1  => MemRW <= '0'; loadMDR <= '1';
      when ExecLoad_2  => loadACC <= '1'; muxACC <= '1';
      when ExecStore_1 => MemRW <= '1';
      when ExecJump    => muxPC <= '1'; loadPC <= '1';
      when others => null;
    end case;
  end process;
Synthesis – CTR – Gate view
* synthesis done by Synplify tool
VHDL – Datapath

  u_alu : alu port map (
    A => MDR_reg, B => ACC_reg, opALU => opALU, Rout => ALU_out);

  MDR_next <= MemQ when loadMDR = '1' else MDR_reg;

  IR_next <= MDR_reg when loadIR = '1' else IR_reg;

  ACC_next <= MDR_reg when loadACC = '1' and muxACC = '1' else
              ALU_out when loadACC = '1' and muxACC = '0' else
              ACC_reg;
VHDL – Datapath (continued)

  -- Instruction format (see Instruction set):
  -- IR_reg(15 downto 8) = opcode, IR_reg(7 downto 0) = address
  PC_next <= IR_reg(7 downto 0) when loadPC = '1' and muxPC = '1' else
             PC_reg + 1         when loadPC = '1' and muxPC = '0' else
             PC_reg;

  MAR_next <= IR_reg(7 downto 0) when loadMAR = '1' and muxMAR = '1' else
              PC_reg             when loadMAR = '1' and muxMAR = '0' else
              MAR_reg;

  MemAddr <= MAR_reg;
  MemD    <= ACC_reg;
  opCode  <= IR_reg(15 downto 8);

  zflag <= '1' when ACC_reg = "0000000000000000" else '0';
Synthesis – Datapath – RTL view
[Figure: RTL schematic of the datapath, with the blocks ACC_reg & alu, MDR_reg & alu, zflag, MAR_reg, PC_reg (+1) and IR_reg marked.]
Synthesis – Datapath – Gate view
VHDL – µpros : Top level

  u_ctr : ctr port map (
    clk => clk, rst => rst,
    zflag => zflag, opCode => opCode,
    muxPC => muxPC, muxMAR => muxMAR, muxACC => muxACC,
    loadMAR => loadMAR, loadPC => loadPC, loadACC => loadACC,
    loadMDR => loadMDR, loadIR => loadIR,
    opALU => opALU, MemRW => MemRW);

  U_Memory : ram
    generic map ( d_width => 16, addr_width => 8, mem_depth => 256)
    port map (
      clk => clk, we => MemRW, d => MemD, q => MemQ,
      addr => unsigned(MemAddr));

  u_datapath : datapath port map (
    clk => clk, rst => rst,
    opCode => opCode,
    muxPC => muxPC, muxMAR => muxMAR, muxACC => muxACC,
    loadMAR => loadMAR, loadPC => loadPC, loadACC => loadACC,
    loadMDR => loadMDR, loadIR => loadIR,
    zflag => zflag, opALU => opALU,
    MemAddr => MemAddr, MemD => MemD, MemQ => MemQ);
Synthesis – µpros : Top level
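VHDL – ALU

  library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;
  use IEEE.STD_LOGIC_UNSIGNED.ALL;  -- provides "+" on STD_LOGIC_VECTOR

  entity alu is
    Port ( A     : in  STD_LOGIC_VECTOR (15 downto 0);
           B     : in  STD_LOGIC_VECTOR (15 downto 0);
           opALU : in  STD_LOGIC;
           Rout  : out STD_LOGIC_VECTOR (15 downto 0));
  end alu;

  architecture RTL of alu is
  begin
    -- opALU = '1' selects the addition, opALU = '0' selects the xor
    Rout <= A + B when opALU = '1' else A xor B;
  end RTL;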
Synthesis – ALU – RTL view
VHDL – RAM – coding style

  library IEEE;
  use IEEE.STD_LOGIC_1164.ALL;
  use IEEE.STD_LOGIC_ARITH.ALL;  -- provides unsigned and conv_integer

  entity ram is
    generic ( d_width    : integer;
              addr_width : integer;
              mem_depth  : integer );
    port ( clk  : in  STD_LOGIC;
           we   : in  STD_LOGIC;
           d    : in  STD_LOGIC_VECTOR(d_width - 1 downto 0);
           q    : out STD_LOGIC_VECTOR(d_width - 1 downto 0);
           addr : in  unsigned(addr_width - 1 downto 0));
  end ram;

  architecture RAM_arch of ram is
    type mem_type is array (mem_depth - 1 downto 0) of STD_LOGIC_VECTOR (d_width - 1 downto 0);
    signal mem : mem_type;
  begin
    -- synchronous write port
    wr_port : process ( clk )
    begin
      if (clk'event and clk = '1') then
        if ( we = '1') then
          mem(conv_integer(addr)) <= d;
        end if;
      end if;
    end process wr_port;

    -- asynchronous read port
    q <= mem(conv_integer(addr));
  end RAM_arch;

ASIC : (using a specific generator)
- .db file for synthesis
- .hdl file for simulation
FPGA :
- Already supported by the FPGA provider
- Synthesis recognizes a specific coding style
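The RAM coding style above (write on the rising clock edge, read combinationally) can be mimicked by a small behavioural model. This is a sketch only: the class name `Ram` and its method names are hypothetical, and the depth is assumed to be 2^addr_width, which matches the 256-word, 8-bit-address instance used in the top level.

```python
class Ram:
    """Single-port RAM: synchronous write, asynchronous read."""

    def __init__(self, d_width=16, addr_width=8):
        self.mask = (1 << d_width) - 1
        self.mem = [0] * (1 << addr_width)   # assumed depth: 2**addr_width words

    def rising_edge(self, we, addr, d):
        """Model one rising clock edge: store d at addr only when we = 1."""
        if we:
            self.mem[addr] = d & self.mask

    def q(self, addr):
        """Asynchronous read: the output follows the address immediately."""
        return self.mem[addr]


ram = Ram()
ram.rising_edge(we=1, addr=13, d=3)
ram.rising_edge(we=0, addr=13, d=99)   # ignored, write enable is low
print(ram.q(13))   # 3
```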
Synthesis – RAM – Gate view
Synthesis – Resource Usage Report

Mapping to part: xc3s1500fg456-4
Cell usage:
  FDC        14 uses
  FDCE       84 uses
  FDP         1 use
  GND         2 uses
  MULT_AND   15 uses
  MUXCY_L    22 uses
  MUXF5      16 uses
  RAM64X1S   64 uses
  XORCY      23 uses
  LUT2        1 use
  LUT3      107 uses
  LUT4       20 uses

I/O ports: 27

RAM/ROM usage summary
  Single Port Rams (RAM64X1S): 64

Global Clock Buffers: 1 of 8 (12%)

Mapping Summary:
  Total LUTs: 384 (1%)
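The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that a RAM64X1S primitive stores 64 words of 1 bit and occupies four LUTs on this device family; the second point is an assumption, but it is consistent with the reported LUT total.

```python
# Memory capacity: 256 words x 16 bits, built from 64 x 1-bit RAM primitives.
ram_primitives = 64
bits_per_primitive = 64 * 1
memory_bits = 256 * 16
print(ram_primitives * bits_per_primitive, memory_bits)   # 4096 4096

# LUT total: logic LUTs plus the LUTs used as distributed RAM.
logic_luts = 1 + 107 + 20             # LUT2 + LUT3 + LUT4
luts_per_ram64x1s = 4                 # assumption: four 16-bit LUTs per RAM64X1S
total_luts = logic_luts + luts_per_ram64x1s * ram_primitives
print(total_luts)                     # 384

# Flip-flops: FDC + FDCE + FDP.
print(14 + 84 + 1)                    # 99
```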
Synthesis – Worst Slack

                  Requested    Estimated    Requested   Estimated
Starting Clock    Frequency    Frequency    Period      Period      Slack
--------------------------------------------------------------------------
micorPross|clk    201.7 MHz    171.4 MHz    4.958       5.833       -0.875
==========================================================================

Starting Points with Worst Slack    Ending Points with Worst Slack
u_ctr.state_reg_fast[6]             u_datapath.ACC_reg[15]
u_datapath.ACC_reg_fast[0]          u_datapath.ACC_reg[14]
u_datapath.ACC_reg[1]               u_datapath.ACC_reg[13]
u_datapath.MDR_reg[1]               u_datapath.ACC_reg[12]
u_datapath.MDR_reg_fast[0]          u_datapath.ACC_reg[11]
u_datapath.ACC_reg[2]               u_ctr.state_reg[11]
u_datapath.MDR_reg[2]               u_datapath.ACC_reg[10]
u_datapath.ACC_reg[3]               u_datapath.ACC_reg[9]
u_datapath.MDR_reg[3]               u_datapath.PC_reg[0]
u_datapath.ACC_reg[4]               u_datapath.PC_reg[1]
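The numbers in the clock summary are tied together by two simple relations: frequency is the inverse of the period, and slack is the requested period minus the estimated period. A quick check, assuming the periods are given in nanoseconds (the report does not state the unit):

```python
requested_period_ns = 4.958
estimated_period_ns = 5.833

# Frequency in MHz is 1000 / period in ns.
requested_mhz = round(1000 / requested_period_ns, 1)
estimated_mhz = round(1000 / estimated_period_ns, 1)
print(requested_mhz, estimated_mhz)   # 201.7 171.4

# Negative slack means the design misses the requested clock period.
slack_ns = round(requested_period_ns - estimated_period_ns, 3)
print(slack_ns)                       # -0.875
```

A slack of -0.875 therefore means that the critical path is 0.875 too long for the requested 201.7 MHz clock, while the design is estimated to run at about 171.4 MHz.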
Synthesis – Worst Path

Instance / Net                   Type     Pin   Dir   Delay   Arrival Time   Fan Out(s)
------------------------------------------------------------------------------------------
u_ctr.state_reg_fast[6]          FDC      Q     Out   0.720   0.720          -
opALU_fast                       Net      -     -     0.790   -              8
u_datapath.ACC_next_0_axb_0      LUT3     I0    In    -       1.510          -
u_datapath.ACC_next_0_axb_0      LUT3     O     Out   0.579   2.089          -
u_datapath.ACC_next_0_cry_0_0    MUX      S     In    -       2.089          -
u_datapath.ACC_next_0_cry_0_0    MUX      LO    Out   0.480   2.570          -
ACC_next_0_cry_0                 Net      -     -     0.000   -              2
... ACC_cry 1..2..3..4..5..6..7..8..9..10......15 …
u_datapath.ACC_next_0_s_15       XORCY    CI    In    -       3.410          -
u_datapath.ACC_next_0_s_15       XORCY    O     Out   0.883   4.293          -
u_datapath.ACC_next_0[15]        LUT3_L   I2    In    -       5.003          -
u_datapath.ACC_next_0[15]        LUT3_L   LO    Out   0.579   5.582          -
ACC_next[15]                     Net      -     -     0.000   -              1
u_datapath.ACC_reg[15]           FDCE     D     In    -       5.582          -
==========================================================================================

Total path delay (propagation time + setup) of 5.693 is 4.193 (73.6%) logic and 1.500 (26.4%) route.
Address   Assembly language   Functionality description
0:        LOAD 11             ACC ← Mem[11]
1:        ADD 12              ACC ← ACC + Mem[12]
2:        STORE 13            Mem[13] ← ACC
3:        XOR 10              ACC ← ACC xor Mem[10]
4:        JUMPZ 7             If ACC = 0 then PC ← 7
5:        LOAD 13             ACC ← Mem[13]
6:        JUMP 1              PC ← 1
7:        JUMP 7              PC ← 7
10:       8                   Data
11:       2                   Data
12:       1                   Data

Machine coding :
0: 00000011 00001011
1: 00000001 00001100
2: 00000100 00001101
3: 00000010 00001010
…

Equivalent C code :
for (i=2;i!=8;i++);
for (;;);
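To see what the program does, it can be run on a small instruction-level interpreter. The sketch below works on mnemonics rather than machine code, so it does not depend on any opcode numbering; the function name `run` and the stop condition (a `JUMP` to its own address, i.e. the final `for (;;);`) are choices made for this illustration.

```python
PROGRAM = {
    0: ("LOAD", 11), 1: ("ADD", 12), 2: ("STORE", 13), 3: ("XOR", 10),
    4: ("JUMPZ", 7), 5: ("LOAD", 13), 6: ("JUMP", 1), 7: ("JUMP", 7),
}
DATA = {10: 8, 11: 2, 12: 1}


def run(program, data, max_steps=1000):
    """Execute until the program reaches a JUMP to itself (endless loop)."""
    mem = dict(data)
    acc, pc, stores = 0, 0, []
    for _ in range(max_steps):
        op, addr = program[pc]
        if op == "JUMP" and addr == pc:
            break                          # "for (;;);" reached
        pc += 1                            # default: next instruction
        if op == "LOAD":
            acc = mem[addr]
        elif op == "ADD":
            acc = (acc + mem[addr]) & 0xFFFF
        elif op == "XOR":
            acc ^= mem[addr]
        elif op == "STORE":
            mem[addr] = acc
            stores.append(acc)             # record every value written
        elif op == "JUMP":
            pc = addr
        elif op == "JUMPZ" and acc == 0:
            pc = addr
    return pc, acc, stores


pc, acc, stores = run(PROGRAM, DATA)
print(pc, acc, stores)   # 7 0 [3, 4, 5, 6, 7, 8]
```

The values written to address 13 count up to 8, at which point the xor with Mem[10] = 8 gives zero and `JUMPZ 7` leaves the loop.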
Overall Simulation
for (i=2;i!=8;i++); for (;;);
[Figure: simulation waveform annotated i=2, i=3, i=4, i=5, i=6, i=7, i=8, then "hold on".]
Loop : 1
[Figure: simulation waveform after RST, annotated with the instructions of the first pass through the loop:]
LOAD 11, ADD 12, STORE 13, XOR 10, JUMPZ 7, LOAD 13, JUMP 1

JUMPZ 7 & loop on JUMP 7