01 - Introduction
Andreas Ehliar
2015-09-01
http://www.isy.liu.se/edu/kurs/TSEA26/
Andreas Ehliar 01 - Introduction
Motivation - Why you should read this course!
I Learn processor design
I Learn about efficient hardware design
I Learn some firmware design tricksI The course is focused on ASIPs (Application Specific
Instruction set Processors) and signal processingI Most of the ideas are applicable in other situations as wellI For example: Creating highly efficient computing units in
FPGAs
Andreas Ehliar 01 - Introduction
Motivation
I After this course you should be able to design a simpleapplication specific processor, or similar device, by yourself.
I The course will be challenging, but rewarding
Andreas Ehliar 01 - Introduction
Credits
I Many of the slides have been adapted from slides initiallycreated by Dake Liu (who had this course before me)
Andreas Ehliar 01 - Introduction
Course coverage
I Focus is not on DSP algorithms
I Nor on IC basicsI Focus on implementation
I In the labs, the tutorials, and the examI Hardware on RTL levelI Software on assembler level
Andreas Ehliar 01 - Introduction
Pre-requirements
I Basic DSP algorithm knowledge
I Basic computer engineering (CPU, assembler, etc)
I Logic design with synchronous logic
I Basic VHDL or VerilogI Diplomatically speaking, if you don’t have the pre-requisites,
suffice to say that you will have gained a lot of knowledgewhen passing this course. . .
I Students have passed this course before, without having all ofthe prerequisites, but they probably had to work pretty hard todo so.
I (Please talk to me in the break if you have any doubts aboutthese requirements. )
Andreas Ehliar 01 - Introduction
Goal of the course
I You should be able to design a simple processor:I Design methodsI The instruction set & architectureI The microarchitectureI Integration and verificationI Firmware
Andreas Ehliar 01 - Introduction
Scope of the course
I (10%) ASIP Design Methods
I (20%) Explore architecture and instruction set
I (50%) Micro architecture design of the core
I (10%) Integration and verification
I (10%) Firmware and programming tools
Andreas Ehliar 01 - Introduction
Course literature
I Dake Liu, Embedded DSP Processor DesignI Should be available at local book storesI Also available as an ebook at http://www.bibl.liu.se/
I Andreas Ehliar, Exercise Collection for TSEA26I Available at http:
//www.isy.liu.se/edu/kurs/TSEA26/tutorials.htmlI Mostly based on problems from old exams.
Andreas Ehliar 01 - Introduction
TSEA26 Staff
I Lectures/examiner: Andreas Ehliar
I Tutorials: Andreas Ehliar, Andreas Karlsson
I Labs: Andreas Karlsson, Syed Asad Alam
I Course homepage:http://www.isy.liu.se/edu/kurs/TSEA26/
Andreas Ehliar 01 - Introduction
TSEA26 - Course registration
I The computer labs fit exactly 32 students
I A few late arrivals also want to attend the course
I If you for some reason want to drop the course, please let meknow!
Andreas Ehliar 01 - Introduction
Labs
I Computer based laborationsI Four labs and 10 scheduled lab slots (10h each)I Recommended lab slot usage:
I Slot 1-2: Lab 1I Slot 3-5: Lab 2I Slot 6-8: Lab 3I Slot 9-10: Lab 4
I No lab signup list as there is only one lab group
Andreas Ehliar 01 - Introduction
Application example: Communication
Estimatedmessage signal
Receiver
ReceivedSignal
Channel
TransmittedSignalTransmitter
Streaming signal between a transceiver pair
... Normal signal Training signal ... Normal signal ... Training ...
Recovering data from a noisy channel
Knowndata
Channelbehavior
ReceiveddataReceiver
Synchronization andchannel estimator
A streaming period
Liu2008, figure 1.13
Andreas Ehliar 01 - Introduction
Application example: Voice coding
The vocal model synthesis: e.g. to model a vocalchannel using a 10-tap IIR filter
The gain of the voice
The pitch of the voice
The attenuation of the voice
The noise of the voice
Can compress voice data from 104 kbit/s to 1.2 kbit/s
Synth
esi
zed v
oic
e p
att
ern
A/D
convert
er
Liu2008, figure 1.15
Andreas Ehliar 01 - Introduction
Application example: Video compression
F-doma in frame compressionB-f
ram
eG
-fra
me
R-f
ram
e RG
B t
o Y
UV
convers
ion
Y-fr
am
eU
-fra
me
V-f
ram
e
Inter-frame compression
Intra-frame compression F-d
om
ain
resi
due
com
pre
ssio
n
Loss
less
com
pre
ssio
n
Liu2008, figure 1.16
Andreas Ehliar 01 - Introduction
DSP Alternatives - Normal desktop/laptop
I Suitable for many DSP applications such as video and audioprocessing
I Specialied instruction set extensions such as SSE (and MMX)allows for efficient parallelization of DSP tasks
I (GPU offloading is also getting more popular)
Andreas Ehliar 01 - Introduction
DSP Alternatives - Normal desktop/laptop
I Not suitable for applications with hard real-timerequirements
I Not suitable for applications with low powerrequirements
I Not suitable for medium or high volume embeddedapplications with low cost requirements
Andreas Ehliar 01 - Introduction
DSP Alternatives - General purpose DSP processor
I Suitable for most typical DSP tasks
I Suitable for hard real-time tasks
I Probably a good choice for low or medium volume embeddedapplications
I Not a good choice for very computationally demandingDSP applications
Andreas Ehliar 01 - Introduction
Application specific Instruction set Processor (ASIP)
I A processor optimized for a certain kind of application domain
I Instruction set optimized for a certain application
I Accelerators for very demanding parts of the application
I Low power, high speed, low cost
I The focus of this course
Andreas Ehliar 01 - Introduction
Application Specific Integrated Circuit (ASIC)
I In theory:I Lowest cost (in high volume)I Lowest powerI Highest performance
I In practice:I Very high development costI Long development timeI Only used when absolutely necessary
Andreas Ehliar 01 - Introduction
ASIP Design flow
I DSP architecture
I DSP FW design tools
I DSP algorithm design
Andreas Ehliar 01 - Introduction
Discussion break
I How do you decide on an architecture/instruction set for aprocessor that is supposed to be very good at running“Application X”?
Andreas Ehliar 01 - Introduction
Discussion break
I How do you decide on an architecture/instruction set for aprocessor that is supposed to be very good at running“Application X”?
I Hint: In almost all applications there are some functions thatare only run once or twice and some functions that arerunning almost all the time
Andreas Ehliar 01 - Introduction
Some possibilities, discussed during class
I What do we need to know about Application X?
I Size of dataset, locality of data
I Real-time requirements
I Power requirements
I What kind of operations and how common they are
I Parallelization possibililties
I etc. . .
Andreas Ehliar 01 - Introduction
Simplified ASIP Design Flow
Application / function coverage analysis
Instruction set proposal and 90% 10% locality
Instruction set simulator and assembler
Benchmarking: speed up and coverage
satisfied
Release the instruction set architecture
No
yes
Micro architecture design, RTL, and VLSI
yes
Source code profiling
Instruction set design
Processor implementation
Liu2008, figure 1.26 (slightly modified)
Andreas Ehliar 01 - Introduction
ASIP Design in the Future
I The dream:I Give a tool the source code of an application (or applications)I The tool automatically generates the RTL code of an ASIP
optimized for this application (or applications)I (Oh, and I’d like a pony as well, or perhaps a unicorn.)
Andreas Ehliar 01 - Introduction
ASIP Design in the future
I Reality checkI Some tools are available to aid in processor designI Higher level than VHDL or VerilogI Removes the need for some of the “grunt” workI Typically limits the design space to reduce the complexity of
the problemI Still requires a skilled user to make the most of it
Andreas Ehliar 01 - Introduction
To design an efficient ASIP today, we need
I Application specific datapath and data typesI Deep understanding of data types and corner cases
I Application specific memory accessI Deep understanding of parallel data features
I Application specific program flowI Deep understanding and specification of control complexities
Andreas Ehliar 01 - Introduction
Todays lecture: Finite length arithmetic
I Numerical representations
I Finite length data
I Signal processing under finite data precision
I Corner cases
Andreas Ehliar 01 - Introduction
DEMO
I Two versions of an MP3 decoder
I The number format used for intermediate results in both MP3decoders are 13 bits wide, yet the files sound very different.Why?
Andreas Ehliar 01 - Introduction
Numerical Representation
I Fixed point: Integer
I Fixed point: FractionalI Floating point basic
I Floating point: IEEE 754I Floating point: DSP specific floating point
I (Block floating point / Dynamic scaling)
Andreas Ehliar 01 - Introduction
General requirements
I Low silicon cost requirements often lead to custom data typesin DSP applications
I Typical nomenclature: Word (16 bit), double word (32 bit)
I Other possibilities: Custom word length for computation,memories, bus widths as required by the application
Andreas Ehliar 01 - Introduction
Two’s complement representation
I From hardware point of view: Addition/subtraction isidentical to unsigned addition/subtraction
I In this course: Almost always fixed point two’s complementrepresentation
Andreas Ehliar 01 - Introduction
Fixed point numerical representation
I Why fixed point (Important)I Easy to implement data path hardwareI Low hardware cost (low chip area)I Short physical critical pathI Low power
Andreas Ehliar 01 - Introduction
Fixed point numerical representation
I What is fixed point numerical representation?I Integer or fractional representation, dominates DSP fieldI Integer: Between −2n−1 and +2n−1 − 1I Fractional: Between −1 and +1− 2−n+1
I (Where n is the number of bits)
Andreas Ehliar 01 - Introduction
Fractional numerical representation
I Fractional representation is straightforward:I −a0 + a−1 · 2−1 + a−2 · 2−2 + ...a−n · 2−n
I a0 is the sign bit, the range is −1 ≤ x < 1I (You do remember that the secret with two’s complement
numbers is that the sign bit has a negative weight rather thana positive weight?)
I Example:I 010100002 = 0.625 (decimal)I = −0 · 20 + 1 · 2−1 + 1 · 2−3 = 0.625
Andreas Ehliar 01 - Introduction
Fractional vs integer multiplication
Integer
x
Fractional
x
Weights
Sign bit Sign bit Sign bit Sign bit
Result } }4-bitresult
4-bitresult
I Integer: Overflow is a real concernI Example: 0111× 0111 = 00110001
I Integer: 001100012 = 4910
I Integer (truncated): 00012 = 1 (Fatal error)I Fractional: 00.110001 = 0.765625I Fractional (truncated): 0.1102 = 0.7510 (Pretty close to
correct result)
Andreas Ehliar 01 - Introduction
Fixed point
I A more general case: The radix point can be located betweenany bits:
I Q(n,m) notation is often used to signify this:
I n: The number of bits to the left of the radix point
I m: The number of bits to the right of the radix pointI Beware of confusion:
I In the textbook: n includes the sign bitI In many other sources: n do not include the sign bit
I To be on the safe side: Mention the width separately: (E.g., a16 bit number in Q(2, 13) format.
Andreas Ehliar 01 - Introduction
Useful definitions
I PrecisionI The distance between the smallest values that can be
represented using a certain number formatI Example: Using 16 bits fractional numbers, the precision of
the data is 0.0000000000000012 ≈ 0.00030510
Andreas Ehliar 01 - Introduction
Useful definitions
I Dynamic range (of a digital signal)I The ratio between the largest range a certain number format
can represent and the precision.I E.g., the dynamic range of a 16 bit fixed point number format
is 65535/1I Commonly measured in dB. Example:
20× log1065535 ≈ 96dB. Each extra bit adds about 6dB
Andreas Ehliar 01 - Introduction
Useful definitions
I Quantization error (of digital signals)I The numerical error introduced when a longer numeric format
is converted to a shorter oneI E.g: s.fffffffff → s.ffffI (Quantization errors that occur during A/D and D/A
conversions are not discussed much in this course.)
Andreas Ehliar 01 - Introduction
Typical DSP Kernel
I Read data from memory
I Calculate using as many bits as required
I Store data in memory using as many bits as required (typicallyfewer)
Andreas Ehliar 01 - Introduction
Drawbacks of Fixed Point
I Sometimes it is not possible to separate dynamic range andprecision
I Higher firmware design costs (E.g., when using a Matlabmodel as a reference)
Andreas Ehliar 01 - Introduction
Floating Point Repetition
I Numbers are represented by a mantissa (m, an exponent (e),and a sign bit (s)
I value = −1S ×m × 2e
Andreas Ehliar 01 - Introduction
IEEE-754 Standard for Floating Point Numbers
I IEEE-754 Single precision floating point number (32 bits)
31 30 23 22 0
S Exponent Mantissa
I S: Sign bit, 0 is positive, 1 is negative
I Exponent: 8-bit field in excess 127 format. If 255: A specialvalue such as infinity or NaN (Not a Number)
I Mantissa: 23-bit field containing the normalized fraction(using an implicit one)
Andreas Ehliar 01 - Introduction
Why Floating Point?
I Large dynamic range using a small number of bits
I Usually easier for the programmer → faster time to market(TTM)
Andreas Ehliar 01 - Introduction
Why not Floating Point?
I When precision is more important than dynamic range
I Complicated data path, longer critical path in the data path,higher silicon cost for arithmetic units
I Harder to reason about the stability of calculationsI Example: x + y + z 6= z + y + x
Andreas Ehliar 01 - Introduction
IEEE-754 Standard for Floatnig Point Numbers
I 23 bit mantissa (with implicit one)
I 8 bit exponent
I 1 sign bitI Other features:
I Rounding (Round to + inf, − inf, 0, and round to nearest even)I Subnormal numbers (sometimes called denormalized numbers)
Andreas Ehliar 01 - Introduction