Download pdf - Andreas Ehliar 2015-09-01 - Linköping · PDF fileAndreas Ehliar 2015-09-01 ... I Basic VHDL or Verilog ... I No lab signup list as there is only one lab group Andreas Ehliar 01 -

01 - Introduction

Andreas Ehliar

2015-09-01

http://www.isy.liu.se/edu/kurs/TSEA26/

Andreas Ehliar 01 - Introduction


Motivation - Why you should read this course!

I Learn processor design

I Learn about efficient hardware design

I Learn some firmware design tricksI The course is focused on ASIPs (Application Specific

Instruction set Processors) and signal processingI Most of the ideas are applicable in other situations as wellI For example: Creating highly efficient computing units in

FPGAs


Motivation

I After this course you should be able to design a simpleapplication specific processor, or similar device, by yourself.

I The course will be challenging, but rewarding


Credits

I Many of the slides have been adapted from slides initiallycreated by Dake Liu (who had this course before me)


Course coverage

I Focus is not on DSP algorithms

I Nor on IC basicsI Focus on implementation

I In the labs, the tutorials, and the examI Hardware on RTL levelI Software on assembler level


Pre-requirements

I Basic DSP algorithm knowledge

I Basic computer engineering (CPU, assembler, etc)

I Logic design with synchronous logic

I Basic VHDL or VerilogI Diplomatically speaking, if you don’t have the pre-requisites,

suffice to say that you will have gained a lot of knowledgewhen passing this course. . .

I Students have passed this course before, without having all ofthe prerequisites, but they probably had to work pretty hard todo so.

I (Please talk to me in the break if you have any doubts aboutthese requirements. )


Goal of the course

I You should be able to design a simple processor:I Design methodsI The instruction set & architectureI The microarchitectureI Integration and verificationI Firmware


Scope of the course

I (10%) ASIP Design Methods

I (20%) Explore architecture and instruction set

I (50%) Micro architecture design of the core

I (10%) Integration and verification

I (10%) Firmware and programming tools


Course literature

I Dake Liu, Embedded DSP Processor DesignI Should be available at local book storesI Also available as an ebook at http://www.bibl.liu.se/

I Andreas Ehliar, Exercise Collection for TSEA26I Available at http:

//www.isy.liu.se/edu/kurs/TSEA26/tutorials.htmlI Mostly based on problems from old exams.


http://www.bibl.liu.se/

http://www.isy.liu.se/edu/kurs/TSEA26/tutorials.html

http://www.isy.liu.se/edu/kurs/TSEA26/tutorials.html

TSEA26 Staff

I Lectures/examiner: Andreas Ehliar

I Tutorials: Andreas Ehliar, Andreas Karlsson

I Labs: Andreas Karlsson, Syed Asad Alam

I Course homepage:http://www.isy.liu.se/edu/kurs/TSEA26/



TSEA26 - Course registration

I The computer labs fit exactly 32 students

I A few late arrivals also want to attend the course

I If you for some reason want to drop the course, please let meknow!


Labs

I Computer based laborationsI Four labs and 10 scheduled lab slots (10h each)I Recommended lab slot usage:

I Slot 1-2: Lab 1I Slot 3-5: Lab 2I Slot 6-8: Lab 3I Slot 9-10: Lab 4

I No lab signup list as there is only one lab group


Application example: Communication

Estimatedmessage signal

Receiver

ReceivedSignal

Channel

TransmittedSignalTransmitter

Streaming signal between a transceiver pair

... Normal signal Training signal ... Normal signal ... Training ...

Recovering data from a noisy channel

Knowndata

Channelbehavior

ReceiveddataReceiver

Synchronization andchannel estimator

A streaming period

Liu2008, figure 1.13


Application example: Voice coding

The vocal model synthesis: e.g. to model a vocalchannel using a 10-tap IIR filter

The gain of the voice

The pitch of the voice

The attenuation of the voice

The noise of the voice

Can compress voice data from 104 kbit/s to 1.2 kbit/s

Synth

esi

zed v

oic

e p

att

ern

A/D

convert

er



Application example: Video compression

F-doma in frame compressionB-f

ram

eG

-fra

me

R-f

ram

e RG

B t

o Y

UV

convers

ion

Y-fr

am

eU

-fra

me

V-f

ram

e

Inter-frame compression

Intra-frame compression F-d

om

ain

resi

due

com

pre

ssio

n

Loss

less

com

pre

ssio

n



DSP Alternatives - Normal desktop/laptop

I Suitable for many DSP applications such as video and audioprocessing

I Specialied instruction set extensions such as SSE (and MMX)allows for efficient parallelization of DSP tasks

I (GPU offloading is also getting more popular)


DSP Alternatives - Normal desktop/laptop

I Not suitable for applications with hard real-timerequirements

I Not suitable for applications with low powerrequirements

I Not suitable for medium or high volume embeddedapplications with low cost requirements


DSP Alternatives - General purpose DSP processor

I Suitable for most typical DSP tasks

I Suitable for hard real-time tasks

I Probably a good choice for low or medium volume embeddedapplications

I Not a good choice for very computationally demandingDSP applications


Application specific Instruction set Processor (ASIP)

I A processor optimized for a certain kind of application domain

I Instruction set optimized for a certain application

I Accelerators for very demanding parts of the application

I Low power, high speed, low cost

I The focus of this course


Application Specific Integrated Circuit (ASIC)

I In theory:I Lowest cost (in high volume)I Lowest powerI Highest performance

I In practice:I Very high development costI Long development timeI Only used when absolutely necessary


ASIP Design flow

I DSP architecture

I DSP FW design tools

I DSP algorithm design


Discussion break

I How do you decide on an architecture/instruction set for aprocessor that is supposed to be very good at running“Application X”?


Discussion break

I How do you decide on an architecture/instruction set for aprocessor that is supposed to be very good at running“Application X”?

I Hint: In almost all applications there are some functions thatare only run once or twice and some functions that arerunning almost all the time


Some possibilities, discussed during class

I What do we need to know about Application X?

I Size of dataset, locality of data

I Real-time requirements

I Power requirements

I What kind of operations and how common they are

I Parallelization possibililties

I etc. . .


Simplified ASIP Design Flow

Application / function coverage analysis

Instruction set proposal and 90% 10% locality

Instruction set simulator and assembler

Benchmarking: speed up and coverage

satisfied

Release the instruction set architecture

No

yes

Micro architecture design, RTL, and VLSI

yes

Source code profiling

Instruction set design

Processor implementation

Liu2008, figure 1.26 (slightly modified)


ASIP Design in the Future

I The dream:I Give a tool the source code of an application (or applications)I The tool automatically generates the RTL code of an ASIP

optimized for this application (or applications)I (Oh, and I’d like a pony as well, or perhaps a unicorn.)


ASIP Design in the future

I Reality checkI Some tools are available to aid in processor designI Higher level than VHDL or VerilogI Removes the need for some of the “grunt” workI Typically limits the design space to reduce the complexity of

the problemI Still requires a skilled user to make the most of it


To design an efficient ASIP today, we need

I Application specific datapath and data typesI Deep understanding of data types and corner cases

I Application specific memory accessI Deep understanding of parallel data features

I Application specific program flowI Deep understanding and specification of control complexities


Todays lecture: Finite length arithmetic

I Numerical representations

I Finite length data

I Signal processing under finite data precision

I Corner cases


DEMO

I Two versions of an MP3 decoder

I The number format used for intermediate results in both MP3decoders are 13 bits wide, yet the files sound very different.Why?


Numerical Representation

I Fixed point: Integer

I Fixed point: FractionalI Floating point basic

I Floating point: IEEE 754I Floating point: DSP specific floating point

I (Block floating point / Dynamic scaling)


General requirements

I Low silicon cost requirements often lead to custom data typesin DSP applications

I Typical nomenclature: Word (16 bit), double word (32 bit)

I Other possibilities: Custom word length for computation,memories, bus widths as required by the application


Two’s complement representation

I From hardware point of view: Addition/subtraction isidentical to unsigned addition/subtraction

I In this course: Almost always fixed point two’s complementrepresentation


Fixed point numerical representation

I Why fixed point (Important)I Easy to implement data path hardwareI Low hardware cost (low chip area)I Short physical critical pathI Low power


Fixed point numerical representation

I What is fixed point numerical representation?I Integer or fractional representation, dominates DSP fieldI Integer: Between −2n−1 and +2n−1 − 1I Fractional: Between −1 and +1− 2−n+1

I (Where n is the number of bits)


Fractional numerical representation

I Fractional representation is straightforward:I −a0 + a−1 · 2−1 + a−2 · 2−2 + ...a−n · 2−n

I a0 is the sign bit, the range is −1 ≤ x < 1I (You do remember that the secret with two’s complement

numbers is that the sign bit has a negative weight rather thana positive weight?)

I Example:I 010100002 = 0.625 (decimal)I = −0 · 20 + 1 · 2−1 + 1 · 2−3 = 0.625


Fractional vs integer multiplication

Integer

x

Fractional

x

Weights

Sign bit Sign bit Sign bit Sign bit

Result } }4-bitresult

4-bitresult

I Integer: Overflow is a real concernI Example: 0111× 0111 = 00110001

I Integer: 001100012 = 4910

I Integer (truncated): 00012 = 1 (Fatal error)I Fractional: 00.110001 = 0.765625I Fractional (truncated): 0.1102 = 0.7510 (Pretty close to

correct result)


Fixed point

I A more general case: The radix point can be located betweenany bits:

I Q(n,m) notation is often used to signify this:

I n: The number of bits to the left of the radix point

I m: The number of bits to the right of the radix pointI Beware of confusion:

I In the textbook: n includes the sign bitI In many other sources: n do not include the sign bit

I To be on the safe side: Mention the width separately: (E.g., a16 bit number in Q(2, 13) format.


Useful definitions

I PrecisionI The distance between the smallest values that can be

represented using a certain number formatI Example: Using 16 bits fractional numbers, the precision of

the data is 0.0000000000000012 ≈ 0.00030510


Useful definitions

I Dynamic range (of a digital signal)I The ratio between the largest range a certain number format

can represent and the precision.I E.g., the dynamic range of a 16 bit fixed point number format

is 65535/1I Commonly measured in dB. Example:

20× log1065535 ≈ 96dB. Each extra bit adds about 6dB


Useful definitions

I Quantization error (of digital signals)I The numerical error introduced when a longer numeric format

is converted to a shorter oneI E.g: s.fffffffff → s.ffffI (Quantization errors that occur during A/D and D/A

conversions are not discussed much in this course.)


Typical DSP Kernel

I Read data from memory

I Calculate using as many bits as required

I Store data in memory using as many bits as required (typicallyfewer)


Drawbacks of Fixed Point

I Sometimes it is not possible to separate dynamic range andprecision

I Higher firmware design costs (E.g., when using a Matlabmodel as a reference)


Floating Point Repetition

I Numbers are represented by a mantissa (m, an exponent (e),and a sign bit (s)

I value = −1S ×m × 2e


IEEE-754 Standard for Floating Point Numbers

I IEEE-754 Single precision floating point number (32 bits)

31 30 23 22 0

S Exponent Mantissa

I S: Sign bit, 0 is positive, 1 is negative

I Exponent: 8-bit field in excess 127 format. If 255: A specialvalue such as infinity or NaN (Not a Number)

I Mantissa: 23-bit field containing the normalized fraction(using an implicit one)


Why Floating Point?

I Large dynamic range using a small number of bits

I Usually easier for the programmer → faster time to market(TTM)


Why not Floating Point?

I When precision is more important than dynamic range

I Complicated data path, longer critical path in the data path,higher silicon cost for arithmetic units

I Harder to reason about the stability of calculationsI Example: x + y + z 6= z + y + x


IEEE-754 Standard for Floatnig Point Numbers

I 23 bit mantissa (with implicit one)

I 8 bit exponent

I 1 sign bitI Other features:

I Rounding (Round to + inf, − inf, 0, and round to nearest even)I Subnormal numbers (sometimes called denormalized numbers)


Discussion break

I If you are designing an application specific processor whereyou need floating point numbers, can you reduce the hardwarecost by using a custom FP format?

I (Note: Not really a discussion break, rather left as somethingto think of until Lecture 2)