

    Spring 2005 EECS150 - Lec19-review Page 1

    EECS150 - Digital Design
    Lecture 19 – Review

    March 31, 2005
    John Wawrzynek

    Exam II
    • Midterm Exam next week Tuesday (4/5)
      – In class
      – Closed book/notes
      – Covers lecture 9 (FSMs) through lecture 17 (memory 1)
    • Exam held in 125 Cory
    • Today:
      – Highlights from lectures 9 - 17
      – I will mention most important points from each lecture
      – Exam may cover subtopics not mentioned today
      – Use homework as a guide to the type of questions on the exam


    Lecture 9 - Finite State Machines 1

    February 15, 2005

    Finite State Machines (FSMs)
    • FSM circuits are a type of sequential circuit:
      – output depends on present and past inputs
      – effect of past inputs is represented by the current state
    • Behavior is represented by State Transition Diagram:
      – traverse one edge per clock cycle.

    Formal Design Process

    Review of Design Steps:
    1. Specify circuit function (English)
    2. Draw state transition diagram
    3. Write down symbolic state transition table
    4. Write down encoded state transition table
    5. Derive logic equations
    6. Derive circuit diagram
       – FFs for state
       – CL for NS and OUT

    State Encoding
    • One-hot encoding of states.
    • One FF per state.
    • Why one-hot encoding?
      – Simple design procedure.
        • Circuit matches state transition diagram (example next page).
      – Often can lead to simpler and faster “next state” and output logic.
    • Why not do this?
      – Can be costly in terms of FFs for FSMs with large number of states.
    • FPGAs are “FF rich”, therefore one-hot state machine encoding is often a good approach.

    One-hot encoded FSM
    • Even Parity Checker Circuit:
      – Circuit generated through direct inspection of the STD.
    • In General:
      – FFs must be initialized for correct operation (only one 1)
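A minimal Python sketch of the one-hot idea for the even parity checker, assuming two states named EVEN and ODD and a single input `bit` (these names are illustrative, not taken from the slide). Each flip-flop's next value is the OR of the incoming edges of its state in the STD, and reset leaves exactly one flip-flop set.

```python
# One-hot model of an even parity checker: one flip-flop per state.
# State names EVEN/ODD and the input name `bit` are illustrative assumptions.

def step(even_ff, odd_ff, bit):
    """Return the next (even_ff, odd_ff) values for one clock cycle."""
    # Each next-state equation is an OR over the incoming edges of the STD.
    next_even = (even_ff and not bit) or (odd_ff and bit)
    next_odd = (odd_ff and not bit) or (even_ff and bit)
    return bool(next_even), bool(next_odd)

def run(bits):
    """Clock the machine once per input bit and report the final state."""
    # Reset must leave exactly one flip-flop set (here: EVEN).
    even_ff, odd_ff = True, False
    for bit in bits:
        even_ff, odd_ff = step(even_ff, odd_ff, bool(bit))
        assert even_ff != odd_ff  # one-hot invariant: exactly one 1
    return "EVEN" if even_ff else "ODD"
```

If both flip-flops started at 0, neither equation could ever produce a 1, which is why the initialization requirement matters.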


    Lecture 10 - Finite State Machines 2

    February 17, 2005

    FSM Recap
    • Moore Machine: each state is labeled STATE [output values]; each edge is labeled with the input value.
    • Mealy Machine: each state is labeled STATE; each edge is labeled input value/output values.

    Both machine types allow one-hot implementations.

    FSM Comparison

    Solution A: Moore Machine
    • output function only of PS
    • maybe more states (why?)
    • synchronous outputs
      – no glitches
      – one cycle “delay”
      – full cycle of stable output

    Solution B: Mealy Machine
    • output function of both PS & input
    • maybe fewer states
    • asynchronous outputs
      – if input glitches, so does output
      – output immediately available
      – output may not be stable long enough to be useful (below):

    If output of Mealy FSM goes through combinational logic before being registered, the CL might delay the signal and it could be missed by the clock edge.
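To make the comparison concrete, here is a small Python sketch that models both styles on a rising-edge detector. The choice of example, the state names, and the cycle-by-cycle simulation style are assumptions made for illustration; the slide does not say which circuit Solutions A and B implement.

```python
def moore_edge(inputs):
    """Moore machine: output depends on the present state only (3 states)."""
    state, outputs = "ZERO", []
    for x in inputs:
        outputs.append(1 if state == "EDGE" else 0)  # output from PS only
        if x == 0:
            state = "ZERO"
        elif state == "ZERO":
            state = "EDGE"
        else:
            state = "ONE"
    return outputs

def mealy_edge(inputs):
    """Mealy machine: output depends on present state and input (2 states)."""
    state, outputs = "ZERO", []
    for x in inputs:
        outputs.append(1 if (state == "ZERO" and x == 1) else 0)
        state = "ONE" if x == 1 else "ZERO"
    return outputs
```

In this model the Moore output is the Mealy output shifted by one cycle, and the Moore version needs the extra EDGE state to remember that an edge was just seen.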

    General FSM Design Process with Verilog Implementation

    Design Steps:
    1. Specify circuit function (English)
    2. Draw state transition diagram
    3. Write down symbolic state transition table
    4. Assign encodings (bit patterns) to symbolic states
    5. Code as Verilog behavioral description
       – Use parameters to represent encoded states.
       – Use separate always blocks for register assignment and CL logic block.
       – Use case for CL block. Within each case section assign all outputs and next state value based on inputs. Note: For Moore style machine make outputs dependent only on state not dependent on inputs.


    FSMs in Verilog

    always @(posedge clk) if (rst) ps

    Lecture 11 - Shifters & Counters

    February 24, 2005

    Universal Shift-register

    Shift Registers
    • Plain shift register:
    • Shifter with shift-enable input
    • Verilog:
        assign OUT = Q[0];
        always @ (posedge clk)
          if (shiftEnable) Q
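A behavioral Python sketch of a shifter with a shift-enable input. The Verilog fragment on the slide is cut off, so the shift direction, the register width, and the serial input name `serial_in` are all assumptions; only `OUT = Q[0]` and the enable condition come from the slide.

```python
class ShiftRegister:
    """Behavioral model of a shift register with a shift-enable input."""

    def __init__(self, width=4):
        self.width = width
        self.q = [0] * width  # q[0] is the output end

    @property
    def out(self):
        return self.q[0]  # mirrors "assign OUT = Q[0]"

    def clock(self, serial_in, shift_enable):
        """Model one rising clock edge."""
        if shift_enable:
            # Assumed direction: data enters at the top bit and moves toward q[0].
            self.q = self.q[1:] + [serial_in]
        # When shift_enable is 0 the register holds its value.
```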

    Odd Counts
    • Extra combinational logic can be added to terminate count before max value is reached:
    • Example: count to 12
    • Alternative:
    [Figure: 4-bit binary counter with a load input and a TC output]
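A short Python sketch of terminating the count early, assuming the extra logic compares the count against a final value and forces a synchronous load of 0. The slide does not say whether 12 is the last value or the number of states, so the parameter `last` (a hypothetical name) leaves that choice to the caller.

```python
def odd_counter(cycles, last=12, width=4):
    """Yield the count on each cycle; reload 0 after reaching `last`."""
    count = 0
    for _ in range(cycles):
        yield count
        # Extra combinational logic: detect the last value and force a load of 0.
        load = (count == last)
        count = 0 if load else (count + 1) % (1 << width)
```

Without the compare-and-load term, the same loop is an ordinary 4-bit binary counter that wraps at 15.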

    Synchronous Counters
    • How do we extend to n-bits?
    • Extrapolate c+: d+ = d ⊕ abc, e+ = e ⊕ abcd
    • Has difficulty scaling (AND gate inputs grow with n)
    • CE is “count enable”, allows external control of counting,
    • TC is “terminal count”, is asserted on highest value, allows cascading, external sensing of occurrence of max value.
    [Figures: counter stages a, b, c, d with next-state values a+, b+, c+, d+; second version adds CE and TC signals]
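The toggle equations can be checked with a small Python model. This is a sketch under the assumption that bit a is the least significant bit, that CE gates the whole chain, and that TC is the AND of CE with every bit; the function name and list representation are illustrative.

```python
def next_count(bits, ce=1):
    """bits[0] is the least significant bit (a), bits[1] is b, and so on.

    Returns (next_bits, tc), where tc is computed from the current bits.
    """
    nxt, all_lower_ones = [], ce
    for q in bits:
        nxt.append(q ^ all_lower_ones)   # e.g. d+ = d xor (a b c) when ce = 1
        all_lower_ones &= q              # AND chain grows with bit position
    tc = all_lower_ones                  # 1 when ce = 1 and every bit is 1
    return nxt, tc
```

Written as a running AND, the model corresponds to the chained structure whose delay grows with n; flattening each term into one wide AND gate gives the version whose gate inputs grow with n.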

    Synchronous Counters
    [Figure: counter stages a, b, c, d with next-state values a+, b+, c+, d+ and chained CE and TC signals]
    • How does this one scale?
      – Delay grows ∝ n
    • Generation of TC signals very similar to generation of carry signals in adder.
    • “Parallel Prefix” circuit reduces delay:
    [Figure: parallel prefix tree with log2 n levels]

    Ring Counters
    • “one-hot” counters: 0001, 0010, 0100, 1000, 0001, …
    • “Self-starting” version:
    • What are these good for?
    [Figures: four D flip-flops connected in a ring with outputs q3, q2, q1, q0; a second version uses the flip-flops' S and R inputs, driven by reset]
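A Python sketch of both ring counter styles on a 4-bit state held as an integer (bit i is q_i). The self-starting feedback used here, a 1 fed into q0 only when the lower three bits are all 0, is one common way to build a self-correcting ring counter and is an assumption; the slide's figure is not recoverable from the text.

```python
def ring_next(state, width=4):
    """Plain ring counter: rotate the one-hot pattern left by one position."""
    mask = (1 << width) - 1
    return ((state << 1) | (state >> (width - 1))) & mask

def self_starting_next(state, width=4):
    """Shift left; feed a 1 into q0 only when the lower width-1 bits are all 0."""
    mask = (1 << width) - 1
    feed = 1 if (state & (mask >> 1)) == 0 else 0
    return ((state << 1) & mask) | feed
```

The plain version stays stuck if it ever reaches 0000, while this self-starting version falls back into the one-hot sequence from any 4-bit state within a few clock cycles.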


    Lecture 12 – Project Description

    March 1, 2005

    Digital Audio
    • Music waveform
      – A series of numbers is used to represent the waveform, rather than a voltage or current, as in analog systems.
    • Discrete time: regular spacing of sample values in time. Most digital audio systems use 44.1KHz (consumer) sample rate or 48KHz (professional) sample rate.
      – A lower sample frequency would limit the maximum representable frequency content. (Human hearing max is 20KHz)
    • Digital: All inputs/outputs and internal values (signals) take on discrete values (not analog). Most digital audio systems use 16-bit values (64K possible values for any point in waveform). Using many fewer than 16 bits generates noticeable noise from distortion.

    Analog / Digital Conversion
    • Converters are used to move from/to the analog domain.
    • ADC & DAC often combined in a single chip called CODEC (coder/decoder).
    • Other types of CODECs perform other functions (ex: video conversion, audio compression/decompression).
    [Figure: sound source (microphone) → Analog to Digital Converter (ADC), driven by sample clock → samples 26, 46, 51, 55, 51, … → Digital System (processing, recording, playback, synthesis, compression, decompression) → samples 26, 46, 51, 55, 51, … → Digital to Analog Converter (DAC), driven by sample clock → power amplifier]

    Digital Audio Data-rates

    44.1K samples/sec x 2 (stereo) x 16 bits/sample = 1.4 Mbit/sec = 176,400 Bytes/sec
    1 minute ≈ 10MByte total

    • Relatively small storage devices have prompted the development and application of many compression algorithms for music and speech:
      – Typically compression ratios of 10-100
      – MP3: 32Kbits/sec - 320Kbits/sec (factor of 4x to 44x)
      – These techniques are lossy; information is lost. However the better ones (MP3 & AAC for example) use techniques based on characteristics of human auditory perception to drop information of little importance.
    • In our project, uncompressed audio will be used.
      – Sufficient network bandwidth to support multiple streams of audio.
      – Much simpler hardware design.
    • Uncompressed audio is often referred to as PCM (pulse code modulation). (.wav files in Windows)
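The data-rate arithmetic can be reproduced in a few lines of Python. The function and variable names are illustrative; the sample rate, channel count, sample width, and MP3 bit rates are the figures quoted on the slide.

```python
def pcm_rate(sample_rate_hz=44_100, channels=2, bits_per_sample=16):
    """Return (bits per second, bytes per second) for uncompressed PCM audio."""
    bits_per_sec = sample_rate_hz * channels * bits_per_sample
    return bits_per_sec, bits_per_sec // 8

bits_per_sec, bytes_per_sec = pcm_rate()   # 1,411,200 bits/s and 176,400 bytes/s
bytes_per_minute = bytes_per_sec * 60      # 10,584,000 bytes, roughly 10 MByte

# Compression factor relative to PCM for the two MP3 bit rates on the slide.
mp3_ratio_high_quality = bits_per_sec / 320_000   # about 4.4x
mp3_ratio_low_quality = bits_per_sec / 32_000     # about 44x
```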


    Local Area Network (LAN) Basics

    • A LAN is made up physically of a set of switches, wires, and hosts. Routers and gateways provide connectivity out to other LANs and to the internet.

    • Ethernet defines a set of standards for data-rate (10/100Mbps, 1/10Gbps), and signaling to allow switches and computers to communicate.

    • Most Ethernet implementations these days are “switched” (point to point connections between switches and hosts, no contention or collisions).

    • Information travels in variable sized blocks, called Ethernet Frames, each frame includes preamble, header (control) information, data, and error checking. We usually call these packets.

    • Preamble is a fixed pattern used by receivers to synchronize their clocks to the data.

    • Link level protocol on Ethernet is called the Medium Access Control (MAC) protocol. It defines the format of the packets.

    [Figure: hosts connected to switches, with one switch linked to a router or gateway]
    [Figure: frame layout: Preamble (8 bytes) | MAC header | Payload | CRC]

    Ethernet Medium Access Control (MAC)

    • MAC protocol encapsulates a payload by adding a 14 byte header before the data and a 4-byte cyclic redundancy check (CRC) after the data.
      – A 6-byte destination address specifies either a single recipient node (unicast mode), a group of recipient nodes (multicast mode), or the set of all recipient nodes (broadcast mode).
      – A 6-byte source address is set to the sender’s globally unique node address. Its common function is to allow address learning, which may be used to configure the filter tables in switches.
      – A 2-byte type field identifies the type of protocol being carried (e.g. 0x0800 for IP protocol).
    • The CRC provides error detection in the case where line errors result in corruption of the MAC frame. In most applications a frame with an invalid CRC is discarded by the MAC receiver.

    Ethertypes for EECS150 project (picked from the range of “experimental” type codes to avoid potential conflict):
      – 0x0101: audio packets
      – 0x0102: LCD packets
      – One way transmission only.
      – All packets will be broadcast.
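The frame layout can be sketched in Python as follows. This is a software illustration, not the project's hardware design: it assumes the CRC is the standard CRC-32 that `zlib.crc32` computes, that the 4-byte trailer is stored least significant byte first, and it uses a made-up source address. The preamble is left out.

```python
import struct
import zlib

BROADCAST = b"\xff" * 6
ETHERTYPE_AUDIO = 0x0101  # project Ethertype for audio packets (from the slide)

def build_frame(dst, src, ethertype, payload):
    """Return header + payload + CRC (preamble not included)."""
    header = dst + src + struct.pack("!H", ethertype)   # 6 + 6 + 2 = 14 bytes
    body = header + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return body + struct.pack("<I", crc)   # trailer byte order is an assumption

def check_frame(frame):
    """Receiver side: recompute the CRC and compare with the 4-byte trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == struct.unpack("<I", trailer)[0]
```

A receiver that gets `False` from `check_frame` would discard the frame, matching the behavior described for an invalid CRC.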

    Protocol Stacks
    • Usual case is that MAC protocol encapsulates IP (Internet Protocol), which in turn encapsulates TCP (Transmission Control Protocol), which in turn encapsulates the application layer. Each layer adds its own headers.
    • Other protocols exist for other network services (ex: printers).
    • When the reliability features (retransmission) of TCP are not needed, UDP/IP is used. Examples: gaming and other applications where reliability is provided at the application layer.
    [Figure: two protocol stacks. Left: application layer, ex: http (Layer 5) / TCP (Layer 4) / IP (Layer 3) / MAC (Layer 2). Right: streaming, ex: Mpeg4 (Layer 5) / UDP (Layer 4) / IP (Layer 3) / MAC (Layer 2)]

    Standard Hardware-Network-Interface
    • Usually divided into three hardware blocks. (Application level processing could be either hardware or software.)
      – MAG. “Magnetics” chip is a transformer for providing electrical isolation.
      – PHY. Provides serial/parallel and parallel/serial conversion and encodes bit-stream for Ethernet signaling convention. Drives/receives analog signals to/from MAG. Recovers clock signal from data input.
      – MAC. Media access layer processing. Processes Ethernet frames: preambles, headers, computes CRC to detect errors on receiving and to complete packet for transmission. Buffers (stores) data for/from application level.
    • Application level interface
      – Could be a standard bus (ex: PCI)
      – or designed specifically for application level hardware.
    • MII is an industry standard for connecting PHY to MAC.
    • Calinx has no MAC chip, must be handled in FPGA.
    [Figure: Ethernet connection → MAG (transformer) → PHY (Ethernet signal) → Media Independent Interface (MII) → MAC (MAC layer processing) → application level interface]


    Lecture 14 - CMOS

    March 8, 2005

    Transistor-level Logic Circuits
    • NAND gate
    • NOR gate
    • Note (NOR gate):
      – out = 0 iff either a OR b = 1, therefore out = (a+b)’
      – Again pFET network and nFET network are duals of one another.

    Other more complex functions are possible. Ex: out = (a+bc)’
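The duality of the two networks can be checked with a switch-level Python sketch. It treats each transistor as an ideal switch (an nFET conducts when its gate is 1, a pFET when its gate is 0); the helper names `series` and `parallel` are illustrative.

```python
def series(*xs):
    """A series chain conducts only if every switch conducts."""
    return all(xs)

def parallel(*xs):
    """A parallel group conducts if any switch conducts."""
    return any(xs)

def nor_gate(a, b):
    pulldown = parallel(a, b)            # nFETs in parallel to GND
    pullup = series(not a, not b)        # pFETs in series to Vdd (the dual)
    assert pullup != pulldown            # exactly one network conducts
    return int(pullup)

def nand_gate(a, b):
    pulldown = series(a, b)              # nFETs in series to GND
    pullup = parallel(not a, not b)      # pFETs in parallel to Vdd
    assert pullup != pulldown
    return int(pullup)

def complex_gate(a, b, c):
    """out = (a + bc)'"""
    pulldown = parallel(a, series(b, c))
    pullup = series(not a, parallel(not b, not c))
    assert pullup != pulldown
    return int(pullup)
```

Swapping series for parallel (and nFETs for pFETs) turns the pulldown network into the pullup network, so the output is never floating and never shorted.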

    Transmission Gate
    • Transmission gates are the way to build “switches” in CMOS.
    • In general, both transistor types are needed:
      – nFET to pass zeros.
      – pFET to pass ones.
    • The transmission gate is bi-directional (unlike logic gates).
    • Does not directly connect to Vdd and GND, but can be combined with logic gates or buffers to simplify many logic structures.

    Pass-Transistor Multiplexor
    • 2-to-1 multiplexor: c = sa + s’b
    • Switches simplify the implementation:
    [Figure: two transmission gates controlled by s and s’, passing a or b to the output c]

    Tri-state Buffers
    • Bidirectional connections:
    • Busses:

    Tri-state buffers are used when multiple circuits all connect to a common bus. Only one circuit at a time is allowed to drive the bus. All others “disconnect”.
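A minimal Python sketch of the bus rule, assuming a disconnected (high impedance) output is represented by `None`. The function names are hypothetical, and raising an error on contention is a modeling choice; real hardware would produce an undefined level instead.

```python
def tristate(value, enable):
    """Drive `value` when enabled; otherwise present high impedance (None)."""
    return value if enable else None

def resolve_bus(drivers):
    """Return the bus value, or None if floating; reject multiple active drivers."""
    active = [d for d in drivers if d is not None]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0] if active else None
```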

    Transistor-level Logic Circuits
    • Positive level-sensitive latch:
    • Latch transistor level:
    • Positive edge-triggered flip-flop built from two level-sensitive latches:
    [Figures: latch and flip-flop schematics with transmission gates controlled by clk and clk’]
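A behavioral Python sketch of the two-latch construction. It assumes the first latch is enabled by clk’ and the second by clk, and it ignores all delays, so it shows only the edge-triggered behavior; the class names are illustrative.

```python
class Latch:
    """Positive level-sensitive latch: transparent while enable is 1."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q

class FlipFlop:
    """Positive edge-triggered flip-flop built from two latches."""

    def __init__(self):
        self.first, self.second = Latch(), Latch()

    def update(self, d, clk):
        m = self.first.update(d, 1 - clk)    # first latch enabled by clk'
        return self.second.update(m, clk)    # second latch enabled by clk
```

While clk is 0 the first latch follows D and the second holds; when clk rises the first latch freezes and the second passes the captured value to Q, so changes on D while clk is 1 have no effect.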


    Lecture 15 - Timing

    March 10, 2005

    Limitations on Clock Rate
    1. Logic Gate Delay
       • What are typical delay values?
       [Figure: gate input and output waveforms separated by delay t]
    2. Delays in flip-flops
       • Both times contribute to limiting the clock period. Plus clock skew.
       [Figure: D, clk, and Q waveforms showing setup time and clock to Q delay]

    • What must happen in one clock cycle for correct operation?
    • Assuming perfect clock distribution (all flip-flops see the clock at the same time):
      – All signals connected to FF inputs must be ready and “setup” before rising edge of clock.

    General Model of Synchronous Circuit
    [Figure: input → CL → reg → CL → reg → output, with a common clock input and optional feedback]

    • In general, for correct operation:

      T ≥ time(clk→Q) + time(CL) + time(setup)
      T ≥ τclk→Q + τCL + τsetup

      for all paths.
    • How do we enumerate all paths?
      – Any circuit input or register output to any register input or circuit output.
      – “setup time” for circuit outputs depends on what it connects to
      – “clk-Q time” for circuit inputs depends on from where it comes.
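The constraint can be applied to a list of paths with a few lines of Python. This is a sketch assuming every path shares the same clk→Q and setup times; the path names and delay values below are hypothetical and chosen only for illustration.

```python
def min_clock_period(paths, t_clk_to_q, t_setup):
    """paths maps a path name to its combinational logic delay (same time unit).

    Returns the limiting path and the smallest period T that satisfies
    T >= t_clk_to_q + t_CL + t_setup on every path.
    """
    worst = max(paths, key=paths.get)
    return worst, t_clk_to_q + paths[worst] + t_setup

# Hypothetical delays in nanoseconds.
paths = {"reg1->reg2": 4.0, "reg2->reg1 (feedback)": 6.5, "input->reg1": 3.0}
name, period = min_clock_period(paths, t_clk_to_q=1.0, t_setup=0.5)
```

With these made-up numbers the feedback path limits the clock period to 8.0 ns, that is, a maximum clock rate of 125 MHz.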

    Gate Switching Behavior
    • Inverter:
    • NAND gate:

    Gate Delay
    • Cascaded gates:
    [Figure: “transfer curve” for inverter, Vout versus Vin]

    Gate Delay
    • Fan-out:
    [Figure: gate 1 driving gates 2 and 3]
    • The delay of a gate is proportional to its output capacitance, because gates 2 and 3 turn on/off at a later time. (It takes longer for the output of gate 1 to reach the switching threshold of gates 2 and 3 as we add more output capacitance.)

    “Critical” Path
    • Critical Path: the path with the maximum delay, from any input to any output.
      – In general, we include register set-up and clk-to-Q times in critical path calculation.
    • What is the critical path in this circuit?
    • Why do we care about the critical path?

    Delay in Flip-flops
    • Setup time results from delay through first latch.
    • Clock to Q delay results from delay through second latch.
    [Figures: D, clk, and Q waveforms showing setup time and clock to Q delay; flip-flop schematic with transmission gates controlled by clk and clk’]

    Clock Skew (cont.)
    [Figure: two registers separated by CL, with clock skew (delay in distribution) between their CLK/CLK’ inputs]
    • If clock period T = TCL+Tsetup+Tclk→Q, circuit will fail.
    • Therefore:
      1. Control clock skew
         a) Careful clock distribution. Equalize path delay from clock source to all clock loads by controlling wire delay and buffer delay.
         b) Don’t “gate” clocks.
      2. T ≥ TCL+Tsetup+Tclk→Q + worst case skew.
    • Most modern large high-performance chips (microprocessors) control end to end clock skew to a few tenths of a nanosecond.
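The skew-adjusted constraint in Python, as a sketch with hypothetical function names. The numeric values used to exercise it are made up, and the model simply adds the worst case skew to the period as rule 2 on the slide does.

```python
def min_period_with_skew(t_cl, t_setup, t_clk_to_q, worst_skew):
    """Smallest T satisfying T >= T_CL + T_setup + T_clk->Q + worst case skew."""
    return t_cl + t_setup + t_clk_to_q + worst_skew

def meets_timing(period, t_cl, t_setup, t_clk_to_q, worst_skew):
    """True if the chosen clock period leaves room for the worst case skew."""
    return period >= min_period_with_skew(t_cl, t_setup, t_clk_to_q, worst_skew)
```

For example, a design with 6.5 ns of logic, 0.5 ns setup, and 1.0 ns clk→Q that just fits an 8.0 ns clock without skew no longer meets timing once 0.25 ns of skew is included.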


    Lecture 16 - Power

    March 15, 2005

    Basics
    • Power supply provides energy for charging and discharging wires and transistor gates. The energy supplied is stored & then dissipated as heat.
    • If a differential amount of charge dq is given a differential increase in energy dw, the potential of the charge is increased by: $V = dw/dq$
    • By definition of current: $I = dq/dt$
    • Power: $P \equiv dw/dt$. Rate of work being done w.r.t. time. Rate of energy being used.

      $$\frac{dw}{dt} = \frac{dw}{dq} \times \frac{dq}{dt} = P = V \times I$$

      A very practical formulation!
    • If we would like to know total energy:

      $$w = \int_{-\infty}^{t} P\,dt$$

    • Units: $P = E/\Delta t$, Watts = Joules/seconds

    Metrics
    • How does MIPS/watt relate to energy?
    • Average power consumption = energy / time

      MIPS/watt = (instructions/sec) / (joules/sec) = instructions/joule

      – therefore an equivalent metric (reciprocal) is energy per operation (E/op)
    • E/op is more general - applies to more than processors
      – also, usually more relevant, as battery life is limited by total energy draw.
      – This metric gives us a measure to use to compare two alternative implementations of a particular function.

    Power in CMOS
    [Figure: pullup network to Vdd and pulldown network to GND driving a load capacitance C; waveforms of i(t) and v(t) between t0 and t1 as the output rises to Vdd. Labels: energy supplied, energy dissipated, energy stored]

    Switching Energy: energy used to switch a node

    Calculate energy dissipated in pullup (with $v_0 = 0$ and $v_1 = V_{dd}$):

    $$E_{sw} = \int_{t_0}^{t_1} P(t)\,dt = \int_{t_0}^{t_1} (V_{dd} - v)\, i(t)\,dt = \int_{t_0}^{t_1} (V_{dd} - v)\, c\,\frac{dv}{dt}\,dt$$

    $$= c V_{dd} \int_{v_0}^{v_1} dv - c \int_{v_0}^{v_1} v\,dv = c V_{dd}^2 - \tfrac{1}{2} c V_{dd}^2 = \tfrac{1}{2} c V_{dd}^2$$

    An equal amount of energy is dissipated on pulldown.
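The result can be checked numerically. The Python sketch below assumes the pullup network behaves like a fixed resistance `r` (a simplification, real transistors are not linear) and integrates the power dissipated in it while the capacitor charges; the parameter names are illustrative.

```python
import math

def pullup_energy(c, vdd, r, steps=20_000):
    """Numerically integrate (Vdd - v) * i(t) while C charges from 0 toward Vdd."""
    tau = r * c
    t_end = 20 * tau                 # long enough for v to settle at Vdd
    dt = t_end / steps
    energy = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt           # midpoint rule
        v = vdd * (1.0 - math.exp(-t / tau))
        i = (vdd - v) / r            # current through the pullup resistance
        energy += (vdd - v) * i * dt
    return energy
```

Changing `r` changes how fast the node charges but not the energy dissipated, which stays at half of c times Vdd squared in this model.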

    Controlling Energy Consumption
    • Largest contributing component to CMOS power consumption is switching power:

      $$P_{avg} = n \cdot \alpha_{avg} \cdot f \cdot \tfrac{1}{2}\, c_{avg}\, V_{dd}^2$$

    • Factors influencing power consumption:
      – n: total number of nodes in circuit
      – α: activity factor (probability of each node switching)
      – f: clock frequency (does this affect energy consumption?)
      – Vdd: power supply voltage
    • What control do you have over each factor?
    • How does each affect the total energy?
    • What control do you have as a designer?

    In EECS150 design projects, we will not optimize for power consumption.
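A small Python sketch of the switching power expression and of the energy spent per clock cycle. The function names and the example values in the comment are hypothetical; the model includes only switching power, as on the slide.

```python
def avg_switching_power(n, alpha, f, c_avg, vdd):
    """P_avg = n * alpha * f * (1/2) * c_avg * Vdd^2, in watts."""
    return n * alpha * f * 0.5 * c_avg * vdd ** 2

def energy_per_cycle(n, alpha, c_avg, vdd):
    """Average switching energy per clock cycle, in joules (no f term)."""
    return n * alpha * 0.5 * c_avg * vdd ** 2

# Hypothetical example: 1e6 nodes, alpha = 0.1, 100 MHz, 10 fF per node, 2.5 V
# gives about 0.31 W and about 3.1 nJ per cycle.
```

In this model, halving f halves the power but leaves the energy per cycle unchanged, while halving Vdd cuts both by a factor of four.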


    Lecture 17 – Memory 1

    March 17, 2005

    Standard Internal Memory Organization
    • 2-D array of bit cells. Each cell stores one bit of data.
    • Special circuit tricks are used for the cell array to improve storage density. (We will look at these later)
    • RAM/ROM naming convention:
      – examples: 32 X 8, "32 by 8" => 32 8-bit words
      – 1M X 1, "1 meg by 1" => 1M 1-bit words

    Read Only Memory (ROM)
    • Simple form of memory. No write operation needed.
    • Functional Equivalence:
      [Figure: address decoder driving a bit-cell array]
      – Connections to Vdd used to store a logic 1, connections to GND for storing logic 0.
    • Full tri-state buffers are not needed at each cell point.
    • In practice, single transistors are used to implement zero cells. Logic ones are derived through precharging or bit-line pullup transistor.

    Column MUX in ROMs and RAMs:
    • Controls physical aspect ratio
      – Important for physical layout and to control delay on wires.
    • In DRAM, allows time-multiplexing of chip address pins

    Cascading Memory Modules (or chips)
    • Example: assemblage of 256 x 8 ROM using 256 x 4 modules:
    • Example: 1K x * ROM using 256 x 4 modules:
    • Each module has tri-state outputs:
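Both kinds of cascading can be sketched in Python by treating each module as a read function. The helper names are hypothetical, and the assumption that the upper address bits select which module drives the shared output is the usual decoding arrangement rather than something stated on the slide.

```python
def make_rom(contents, width):
    """A ROM module as a read function; width is the word size in bits."""
    limit = 1 << width
    assert all(0 <= w < limit for w in contents)
    return lambda addr: contents[addr]

def widen(high_rom, low_rom, low_width=4):
    """256 x 8 from two 256 x 4 modules: same address, outputs side by side."""
    return lambda addr: (high_rom(addr) << low_width) | low_rom(addr)

def deepen(modules, module_depth=256):
    """1K words from four 256-word modules: upper address bits enable one module."""
    def read(addr):
        select, offset = divmod(addr, module_depth)
        return modules[select](offset)   # only the selected module drives the output
    return read
```

Widening shares the address lines and concatenates data outputs, while deepening shares the data lines, which is where the tri-state outputs are needed.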

    Memory Components Types:
    • Volatile:
      – Random Access Memory (RAM):
        • DRAM "dynamic"
        • SRAM "static"
    • Non-volatile:
      – Read Only Memory (ROM):
        • Mask ROM "mask programmable"
        • EPROM "erasable programmable"
        • EEPROM "electrically erasable programmable"
        • FLASH memory - similar to EEPROM with programmer integrated on chip

    Volatile Memory Comparison

    The primary difference between different memory types is the bit cell.

    • SRAM Cell [figure: cell with a word line and two bit lines]
      – Larger cell ⇒ lower density, higher cost/bit
      – No refresh required
      – Simple read ⇒ faster access
      – Standard IC process ⇒ natural for integration with logic
    • DRAM Cell [figure: cell with a word line and one bit line]
      – Smaller cell ⇒ higher density, lower cost/bit
      – Needs periodic refresh, and refresh after read
      – Complex read ⇒ longer access time
      – Special IC process ⇒ difficult to integrate with logic circuits

    Dual-ported Memory Internals
    • Add decoder, another set of read/write logic, bit lines, word lines:
      [Figure: cell array with decoders deca and decb on the address ports and two r/w logic blocks on the data ports]
    • Example cell: SRAM
      [Figure: SRAM cell with word lines WL1 and WL2 and bit-line pairs b1 and b2]
    • Repeat everything but cross-coupled inverters.
    • This scheme extends up to a couple more ports, then need to add additional transistors.