M2: Team Paradigm

:: Milestone 2 2-D Discrete Cosine Transform

Group M2: Tommy Taylor, Brandon Hsiung, Changshi Xiao, Bongkwan Kim

Project Manager: Yaping Zhan

Project status

Design Proposal (Complete)

Architecture Proposal (Almost Complete)
: Algorithm description (Done)
: High level simulation (Done)
: Mapping algorithm into hardware (Done)
: Behavioral Verilog and test bench (Debugging)

Size estimates/floor plan (To be completed)
: Structural Verilog
: More accurate transistor count
: Floor plan

Design decisions

Do not include motion prediction

Go with 2-D DCT

Use SRAM

No pipelining

Will not run in real-time

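Distributed algorithm of 1D DCT:

A = cos(π/4), B = cos(π/8), C = sin(π/8), D = cos(π/16), E = cos(3π/16), F = sin(3π/16), G = sin(π/16)

$$
\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
A & A & A & A \\
B & C & -C & -B \\
A & -A & -A & A \\
C & -B & B & -C
\end{bmatrix}
\begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix}
$$

$$
\begin{bmatrix} X_1 \\ X_3 \\ X_5 \\ X_7 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
D & E & F & G \\
E & -G & -D & -F \\
F & -D & G & E \\
G & -F & E & -D
\end{bmatrix}
\begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}
$$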
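The two 4x4 products above can be checked numerically against the DCT definition. The sketch below is illustrative only: it assumes the usual 8-point DCT-II scaling (which the 1/2 factor and A = cos(π/4) suggest), and the function names `dct8_even_odd` and `dct8_definition` are made up for this example.

```python
import math

A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.sin(3 * math.pi / 16)
G = math.sin(math.pi / 16)

# Coefficient matrices copied from the slide (even and odd outputs)
EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
ODD = [[D, E, F, G], [E, -G, -D, -F], [F, -D, G, E], [G, -F, E, -D]]


def dct8_even_odd(x):
    """8-point DCT via the two 4x4 matrix products."""
    sums = [x[i] + x[7 - i] for i in range(4)]
    diffs = [x[i] - x[7 - i] for i in range(4)]
    out = [0.0] * 8
    for r in range(4):
        out[2 * r] = 0.5 * sum(c * s for c, s in zip(EVEN[r], sums))
        out[2 * r + 1] = 0.5 * sum(c * d for c, d in zip(ODD[r], diffs))
    return out


def dct8_definition(x):
    """Reference: X_k = (c_k / 2) * sum_n x_n cos((2n+1) k pi / 16), c_0 = 1/sqrt(2)."""
    out = []
    for k in range(8):
        ck = math.sqrt(0.5) if k == 0 else 1.0
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * ck * acc)
    return out
```

Each 4x4 product needs 16 multiplications, so the split uses 32 multiplications where the full 8x8 product would use 64.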

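Distributed algorithm of 1D DCT (continued):

In two's complement representation:

$$
u_i = -b_i^{B-1} + \sum_{j=0}^{B-2} 2^{-(B-1-j)}\, b_i^{j}
$$

where $b_i^{j}$ is the $j$th bit of $u_i$ and $b_i^{B-1}$ is the MSB, i.e. the sign bit.

$$
X_n = \sum_{j=0}^{B-2} 2^{-(B-1-j)}\, D_n(b^{j}) - D_n(b^{B-1}),
\quad \text{where } D_n(b^{j}) = \sum_{i=0}^{3} C_{i,n}\, b_i^{j}
$$

With 16-bit words (B = 16), the even outputs are formed from the bits of the four inputs (the constant factor 1/2 can be absorbed into the coefficients $C_{i,n}$):

$$
\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix}
=
\begin{bmatrix}
A & A & A & A \\
B & C & -C & -B \\
A & -A & -A & A \\
C & -B & B & -C
\end{bmatrix}
\begin{bmatrix}
b_0^{15}\, b_0^{14} \ldots b_0^{0} \\
b_1^{15}\, b_1^{14} \ldots b_1^{0} \\
b_2^{15}\, b_2^{14} \ldots b_2^{0} \\
b_3^{15}\, b_3^{14} \ldots b_3^{0}
\end{bmatrix}
$$

For example, $D_0(b^{14}) = A b_0^{14} + A b_1^{14} + A b_2^{14} + A b_3^{14}$.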
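As a sanity check on the bit-level formula, the sketch below evaluates X_n one bit position at a time and lets the result be compared with the ordinary weighted sum. It assumes 16-bit words read as two's complement fractions in [-1, 1), with bit 15 as the sign bit; `bits_of`, `fraction_from_bits` and `da_output` are hypothetical helper names, not part of the original simulation.

```python
B_BITS = 16  # word length, assumed from the 16-bit data path


def bits_of(value, width=B_BITS):
    """Two's complement bits of an integer; list index j is bit j (LSB is j = 0)."""
    return [(value >> j) & 1 for j in range(width)]


def fraction_from_bits(bits):
    """u = -b[B-1] + sum over j < B-1 of 2^-(B-1-j) * b[j]."""
    top = len(bits) - 1
    return -bits[top] + sum(bits[j] * 2.0 ** -(top - j) for j in range(top))


def da_output(coeffs, words):
    """One output X_n from the bit-level formula; coeffs holds one coefficient per input word."""
    top = B_BITS - 1
    bit_rows = [bits_of(w) for w in words]

    def d(j):
        # D_n(b^j): add the coefficients of the inputs whose bit j is 1
        return sum(c * row[j] for c, row in zip(coeffs, bit_rows))

    return sum(2.0 ** -(top - j) * d(j) for j in range(top)) - d(top)
```

For instance, `da_output(coeffs, words)` should agree with `sum(c * w / 2**15 for c, w in zip(coeffs, words))` up to rounding, which is the point of the rearrangement: the multiplications are replaced by per-bit sums of coefficients.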

1D DCT architecture

[Block diagram. Blocks: register file 8x16 (two), selector, adder and subtractor (+ / -), parallel to serial, bit address generator (two), ROM (two), adder with register R (two), control logic. Signals: in_data(16), in_valid, out_data(16), out_valid, out_ready, out_done, clk, reset, vdd, vss.]

2D DCT:

Two 1D DCT modules can operate as a pipeline to boost throughput. This requires a RAM that can be read and written at the same time, with each 1D DCT module accessing the RAM alternately in row order and column order.

[Block diagram: Data in, 1D DCT (on rows), Transpose RAM, 1D DCT (on columns), Data out, with control logic.]
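The row/column structure can be sketched in software as two 1D passes with a transpose in between, which is the role the transpose RAM plays in hardware. This is a minimal illustration, assuming an 8x8 block and the orthonormal DCT-II scaling; `dct8`, `transpose` and `dct8x8` are names chosen for this example.

```python
import math


def dct8(x):
    """8-point 1D DCT from the definition (orthonormal scaling assumed)."""
    out = []
    for k in range(8):
        ck = math.sqrt(0.5) if k == 0 else 1.0
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * ck * acc)
    return out


def transpose(block):
    """Swap rows and columns, as the transpose RAM does between the two passes."""
    return [list(col) for col in zip(*block)]


def dct8x8(block):
    """2D DCT: 1D DCT on rows, transpose, 1D DCT on the former columns, transpose back."""
    rows_done = [dct8(row) for row in block]
    cols_done = [dct8(col) for col in transpose(rows_done)]
    return transpose(cols_done)
```

A constant block of ones should come out with a single non-zero coefficient of 8 in the top-left (DC) position.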

Transistor count and performance estimation:

1D DCT module:

adder      register    ROM       control logic   total   pins
4x16x30    18x16x20    8x16x2    1000            ~9k     40

2D DCT = 2 x 1D DCT + SRAM ~ 24k

throughput             latency
8 samples/64 cycles    528 cycles
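The ~9k figure can be reproduced by multiplying out the table entries. The reading of each product as (number of units) x (bit width) x (transistors per bit) is an assumption, and the SRAM figure below is only what is left over from the rounded ~24k total, not a number given on the slide.

```python
# Per-block estimates multiplied out from the table
adder = 4 * 16 * 30
register = 18 * 16 * 20
rom = 8 * 16 * 2
control = 1000

total_1d = adder + register + rom + control

# Assumption: the SRAM budget is whatever remains of the ~24k total
sram_budget = 24_000 - 2 * total_1d

print(total_1d, sram_budget)  # prints "8936 6128"
```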

High level simulation (in C/C++): three implementations of 1D DCT:

1. Based on definition

2. Based on fast algorithm

3. Based on distributed algorithm

[Diagram: the same input feeds Function 1, Function 2, Function 3 and Matlab; the outputs are compared to give pass/fail.]
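The compare step can be sketched as a small harness that feeds the same random inputs to several implementations and reports pass or fail against a reference. This is not the team's C/C++ code: here the definition-based function stands in for the Matlab reference, `dct8_table` is a made-up second implementation, and the tolerance and trial count are arbitrary choices.

```python
import math
import random


def dct8_definition(x):
    """Reference 1D DCT from the definition (stands in for Matlab here)."""
    out = []
    for k in range(8):
        ck = math.sqrt(0.5) if k == 0 else 1.0
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * ck * acc)
    return out


# Coefficient table computed once; used by the second implementation
COEFF = [
    [0.5 * (math.sqrt(0.5) if k == 0 else 1.0) * math.cos((2 * n + 1) * k * math.pi / 16)
     for n in range(8)]
    for k in range(8)
]


def dct8_table(x):
    """Same transform written as a product with the precomputed table."""
    return [sum(c * v for c, v in zip(row, x)) for row in COEFF]


def compare(reference, candidates, trials=100, tol=1e-6, seed=0):
    """Return {name: "pass" or "fail"} for each candidate function."""
    rng = random.Random(seed)
    inputs = [[rng.randint(-32768, 32767) for _ in range(8)] for _ in range(trials)]
    results = {}
    for name, func in candidates.items():
        ok = all(
            abs(a - b) <= tol
            for x in inputs
            for a, b in zip(reference(x), func(x))
        )
        results[name] = "pass" if ok else "fail"
    return results
```

Usage: `compare(dct8_definition, {"table": dct8_table})` returns a dictionary with one pass/fail verdict per implementation.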

Step 1:

We begin by inputting eight 16-bit values into individual registers (R0 to R7).

We use a selector to select the registers that will be added and subtracted.

The R0 and R7 values are added and subtracted in parallel, and likewise for R1 and R6, R2 and R5, and R3 and R4.

It takes 8 clock cycles to get all the data.

[Diagram: registers R0 to R7 feed a selector, followed by an adder and a subtractor.]

Step 1 (Verilog)

    // Write operation: load one input word per clock into the register file.
    // The array is named data_buf because buf is a reserved word in Verilog.
    always @(posedge clk or negedge rst) begin
        if (rst == 0) begin
            count <= 0;
        end else begin
            if (in_clr == 1) begin
                count <= 0;
            end else begin
                if (in_valid && ~out_full) begin
                    data_buf[count] <= in_data;
                    count <= count + 1;
                end
            end
        end
    end // always @ (posedge clk or negedge rst)

    // Read operation: fetch the mirrored pair data_buf[i] and data_buf[7-i].
    always @(posedge clk) begin
        if (in_read) begin
            out_data1 <= data_buf[in_addr];
            out_data2 <= data_buf[7-in_addr];
        end
    end
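The write and read behaviour of Step 1 can be mirrored in a small software model. This is a sketch only: the class name `InputBuffer` is invented, and treating `out_full` as "eight words written" is an assumption, since the slide does not show how that signal is driven.

```python
class InputBuffer:
    """Behavioural model of the Step 1 register file (eight words)."""

    def __init__(self):
        self.words = [0] * 8
        self.count = 0

    @property
    def full(self):
        # Assumed meaning of out_full: all eight words have been written
        return self.count == 8

    def clear(self):
        # Mirrors in_clr: restart the write counter
        self.count = 0

    def write(self, value):
        # Mirrors the in_valid && ~out_full condition: ignore writes when full
        if not self.full:
            self.words[self.count] = value
            self.count += 1

    def read_pair(self, addr):
        # Mirrors out_data1 and out_data2: word addr and its mirror 7 - addr
        return self.words[addr], self.words[7 - addr]


def sums_and_differences(buffer):
    """Four reads give the inputs to the adder and subtractor."""
    pairs = [buffer.read_pair(addr) for addr in range(4)]
    return [a + b for a, b in pairs], [a - b for a, b in pairs]
```

Loading takes eight writes, after which four paired reads are enough to produce all four sums and all four differences.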

Step 2

Bit Address Generator

Store the results of the additions and subtractions in eight 16-bit registers.

Taking the same bit position from each of four registers (the four addition results, and likewise the four subtraction results), the bit address generator forms a 4-bit address that selects the corresponding entry in the ROM.

[Diagram: bit 1 of each of four registers (R0 to R7) forms the address 1011, which indexes the ROMs (Rom0 to Rom7).]

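Step 2 (Verilog)

    // Read operation: capture the add/subtract results from the previous stage.
    // The array is named data_buf because buf is a reserved word in Verilog.
    always @(posedge clk or negedge rst) begin
        if (rst == 0) begin
            count <= 0;
        end else begin
            if (in_clr == 1) begin
                count <= 0;
            end else begin
                if (in_read & ~out_full) begin
                    data_buf[count] <= in_data;
                    count <= count + 1;
                end
            end
        end
    end

    // Bit address generator: bit in_bitpos of each of the four words forms the
    // ROM address. Combinational logic, so blocking assignments and a single
    // bit-select replace the variable part-select [in_bitpos:in_bitpos].
    always @* begin
        out_addr[3] = data_buf[0][in_bitpos];
        out_addr[2] = data_buf[1][in_bitpos];
        out_addr[1] = data_buf[2][in_bitpos];
        out_addr[0] = data_buf[3][in_bitpos];
    end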
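The address formation and the table it indexes can be sketched as follows. The bit ordering follows the Verilog above (word 0 supplies the address MSB). The ROM contents are not shown on the slides, so `build_rom` is an assumption: it stores, for each 4-bit address, the sum of the coefficients whose address bit is 1, which is what D_n(b^j) requires.

```python
def bit_address(words, bit_pos):
    """4-bit ROM address: bit `bit_pos` of words[0] becomes the address MSB."""
    addr = 0
    for w in words:
        addr = (addr << 1) | ((w >> bit_pos) & 1)
    return addr


def build_rom(coeffs):
    """16-entry table; entry `addr` is the sum of coeffs[i] for every address bit i that is 1."""
    rom = []
    for addr in range(16):
        rom.append(sum(c for i, c in enumerate(coeffs) if (addr >> (3 - i)) & 1))
    return rom
```

With four words whose bit 1 values are 1, 0, 1, 1, `bit_address` returns 0b1011, the address used in the slide's example, and a single ROM read then replaces four multiply-and-add operations.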

Step 3

Parallel to Serial

The data read from the ROM addresses are added and stored in a register, then the result is shifted (multiplied by a factor of two, in two's complement).

[Diagram labels: Rom0 to Rom7, R5, R6, S1, S0.]

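Step 3 (Verilog)

    // Accumulator: add one ROM output per clock, counting bit positions down from 15.
    // As written, the running sum is not shifted between additions.
    always @(posedge clk or negedge rst) begin
        if (rst == 0) begin
            out_data <= 0;
            bit_pos <= 15;
        end else begin
            if (in_clr == 1) begin
                out_data <= 0;
                bit_pos <= 15;
            end else begin
                if (~out_done) begin
                    out_data <= out_data + in_data;
                    bit_pos <= bit_pos - 1;
                end
            end // else: !if(in_clr==1)
        end
    end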
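The shift-and-add described in Step 3 can be sketched as below. The MSB-first order matches `bit_pos` counting down from 15. The doubling and the subtraction of the sign-bit term come from the two's complement formula rather than from the Verilog shown, so treat them as an assumption about the intended behaviour. `shift_accumulate` is a hypothetical name.

```python
def shift_accumulate(rom_outputs_msb_first):
    """Combine per-bit ROM outputs, sign bit first.

    Each step doubles the running sum (a left shift) and adds the next ROM
    output; the first output belongs to the sign bit, so it is subtracted.
    """
    acc = 0
    for step, d in enumerate(rom_outputs_msb_first):
        acc = 2 * acc + (-d if step == 0 else d)
    return acc
```

If each ROM output is just one bit of a single 16-bit word (coefficient 1), the function returns the signed integer value of that word, which is an easy way to check the sign handling.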

C Code Result

:: Conclusion & questions

: Implementing 2D DCT

: Roughly 24k transistor count

: Verilog needs debugging