A 96-channel FPGA-based Time-to-Digital Converter

7/26/2019 A 96-channel FPGA-based Time-to-Digital Converter

1/14

ARTICLE IN PRESS

Nuclear Instruments and Methods in Physics Research A 554 (2005) 444457

A 96-channel FPGA-based Time-to-Digital Converter

(TDC) and fast trigger processor module with multi-hit

capability and pipeline

Mircea Bogdana, Henry Frischa, Mary Heintza, Alexander Paramonova,,Harold Sandersa, Steve Chappab, Robert DeMaatb, Rod Kleinb, Ting Miaob,

Peter Wilsonb, Thomas J. Phillipsc

aEnrico Fermi Institute, University of Chicago, USAbFermilab National Accelerator Laboratory, USA

cDuke University, USA

Received 2 July 2005; received in revised form 15 August 2005; accepted 18 August 2005

Available online 7 September 2005

Abstract

We describe an field-programmable gate arrays based (FPGA), 96-channel, Time-to-Digital converter (TDC) and

trigger logic board intended for use with the Central Outer Tracker (COT) [T. Affolder et al., Nucl. Instr. and Meth.

A 526 (2004) 249] in the CDF Experiment [The CDF-II detector is described in the CDF Technical Design Report

(TDR), FERMILAB-Pub-96/390-E. The TDC described here is intended as a further upgrade beyond that described in

the TDR] at the Fermilab Tevatron. The COT system is digitized and read out by 315 TDC cards, each serving 96 wires

of the chamber. The TDC is physically configured as a 9U VME card. The functionality is almost entirely programmed

in firmware in two Altera Stratix FPGAs. The special capabilities of this device are the availability of 840 MHz LVDS

inputs, multiple phase-locked clock modules, and abundant memory. The TDC system operates with an input

resolution of 1.2 ns, a minimum input pulse width of 4.8 ns and a minimum separation of 4.8 ns between pulses. Each

input can accept up to 7 hits per collision. The time-to-digital conversion is done by first sampling each of the 96 inputs

in 1.2-ns bins and filling a circular memory; the memory addresses of logical transitions (edges) in the input data are

then translated into the time of arrival and width of the COT pulses. Memory pipelines with a depth of 5 :5ms allowdeadtime-less operation in the first-level trigger; the data are multiple-buffered to diminish deadtime in the second-level

trigger. The complete process of edge-detection and filling of buffers for readout takes 12 ms. The TDC VME interfaceallows a 64-bit Chain Block Transfer of multiple boards in a crate with transfer-rates up to 47 Mbytes/s. The TDC

module also produces prompt trigger data every Tevatron crossing via a deadtimeless fast logic path that can be easily

reprogrammed. The trigger bits are clocked onto the P3 VME backplane connector with a 22-ns clock for transmission

to the trigger. The full TDC design and multi-card test results are described. There is no measurable cross-talk between

www.elsevier.com/locate/nima

0168-9002/$- see front matter r 2005 Elsevier B.V. All rights reserved.

doi:10.1016/j.nima.2005.08.071

Corresponding author. Tel.: +1 773 7027479; fax: +1 773 8345959.

E-mail address: [email protected] (A. Paramonov).
http://www.elsevier.com/locate/nimahttp://www.elsevier.com/locate/nima


2/14

channels; linearity is limited by the least-count time bin. The physical simplicity ensures low-maintenance; the

functionality being in firmware allows reprogramming for other applications.

r 2005 Elsevier B.V. All rights reserved.

PACS:29.40.Gx; 07.50.E; 29.40.+r

Keywords:TDC; FPGA; Pipelined; Multi-hit; Fast trigger processer

1. Introduction

The Collider Detector at Fermilab (CDF), is a

large (5000-ton) detector of particles produced in

protonantiproton collisions at 1.96 TeV at the

Fermilab Tevatron[2]. The detector consists of a

solenoidal magnetic spectrometer surrounded by

systems of segmented calorimeters and muon

chambers. Inside the solenoid, precision tracking

systems measure the trajectories of particles; the

particle momenta are measured from the curvature

in the magnetic field and the energy deposited in

the calorimeters. The tracking systems consist of a

silicon-strip system with 4750,000 channels

around the beam-pipe, followed by the Central

Outer Tracker (COT), a large cylindrical drift

chamber with 30,240 sense wires arranged in 96

layers divided into 8 superlayers of 12 wireseach[1]. Four of the layers have the wires parallel

to the beam axis; the remaining four are tilted by

2 to provide small-angle stereo for 3D recon-

struction of tracks. The maximum drift time of the

COT is 200 ns; the maximum drift length is

0.88 cm.

During the present Run II, which started in

2001, the peak luminosity of the Tevatron has

grown to over 1032 cm2 s1, a factor of more than

five higher than in Run I. The Tevatron operates

with a time between beam crossings of 396 ns, withthe result that the occupancy (hits/channel) in the

COT increases with luminosity as the average

number of protonantiproton collisions per bunch

crossing is now substantially greater than one.

A broad range of efforts are underway to upgrade

the readout bandwidth to allow operation at

luminosities up to 3 1032 cm2 s1.

The COT is used to provide a precise measure-

ment in the magnetic spectrometer of the trajec-

tories of the particles produced in the high-energy

protonantiproton collisions. These measurements

are made by recording the time of arrival of

electrical charge at sense wires in the COT.

A particle traversing the COT perpendicular to

the beams traverses 96 layers of sense wires; there

are a total of 30,240 sense wires in the COT. The

measurement of the time of arrival at each wire is

made with a multi-channel Time-to-Digital con-

verter (TDC)[3].

The conversion of time to a digital signal is a

well-developed and sophisticated field[47]. In this

note we describe the design of a new 96-channel

TDC and trigger logic module designed for the

COT implemented in two 48-channel field-pro-

grammable gate arrays (FPGAs) [8]. A third,

smaller, FPGA serves as the VME interface. The

other chips on the board are limited to delay lines,

buffers on the input and output signals to theconnectors, and DC-to-DC converters to supply

voltages not available in the existing VME [9,10]

crates.

Using FPGAs for the combined TDC and

trigger logic functionalities has the benefits of

negligible cross-talk, linearity limited only by the

least count, integrated flexible signal processing for

trigger logic, and high-reliability due to very-low

chip count. In addition, because the implementa-

tion is in firmware, the chip can easily be

reconfigured for enhanced capability; the abilityto reconfigure also shortens the design cycle as the

board does not need to be rebuilt (in general)

during prototyping.

Thirty working prototypes have been built and

tested. A comprehensive suite of test routines,

including some that exploit the capabilities of large

FPGAs to implement sources of test data, has

been implemented and documented. We present

results on performance and readout bandwidth.

Fig. 1shows one of the 30 preproduction boards.

ARTICLE IN PRESS

M. Bogdan et al. / Nuclear Instruments and Methods in Physics Research A 554 (2005) 444457 445


3/14

2. TDC Specifications

A summary of the TDC physical and opera-

tional characteristics is given in Table 1. The

schematics of the board are available at [11].Details of how the TDC operates are given in the

text below.

3. Principle of operation of the TDC

Two functions are implemented in the firmware

of the FPGAs: a conventional TDC, digitizing the

time of arrival of signals, and the processing of

COT data for the trigger system. In this section we

describe the TDC implementation.Secondary particles from antiprotonproton

(pp) collisions traverse the Tevatron beam pipe,

the silicon-strip vertex detector, and then the COT

drift chamber volume. The charged particles ionize

the gas in the drift chamber volume; the tracks are

measured from the time of arrival of the ionization

on the sense wires of the COT[1]. These electrical

pulses (hits), are amplified and shaped by the

Amplifier Shaper Discriminator (ASDQ) cards[1]

directly on the end-plates of the COT, and

transmitted to Repeater cards that drive the

cables to the VME crates on the outside of

the magnet yoke that contain the TDC boards.

The TDC is used to digitize the time of arrival and,as a measure of pulse height, the width of the

pulses from the Repeater cards.

The time-to-digital conversion is implemented

with two Altera Stratix FPGAs, each handling 48

sense wires. This device has an LVDS differential

I/O interface that consists of one dedicated serial-

izer/deserializer circuit [13] for each of the 48

differential I/O pairs. Serial data are received

along with a low-frequency clock. An internal

Phase-Locked Loop (PLL) multiplies the incoming

clock by a factor of 1 to 10. Each input signal is

sampled at the resulting high-frequency clock

rate, converting it to a (1-bit) serial stream, which

is then shifted serially through a shift register.

The shift register is read out as a parallel word

at the low-frequency clock rate, thus convert-

ing the serial data stream into a parallel data

stream that contains the input data sampled at the

higher clock rate. In this application the low-

frequency clock is internally generated with a 12 ns

period and the multiplier factor is set to 10, for a

resulting 1.2 ns sampling rate of the incoming

LVDS signal and a 10-bit wide parallel outputdata stream.

Fig. 2illustrates the serial-to-parallel conversion,

as seen in the Altera Quartus II [14] simulation

window. The input pulse is converted into a 10-bit

parallel data stream, clocked out as successive

words at a 12 ns period. The leading edge of a hit

in the tracking chamber is determined by inspecting

this stream, two words at a time, and counting the

number of 0 bits before the first 1 bit of a

string of at least four 1 bits in 2 consecutive

words at a time. The width of a hit is calculated bycounting the number of successive bits (either 0

or 1) until the start of a string of at least four

consecutive 0 bits occurs.

4. TDC BoardBlock diagram

The physical layout and the data flow on the

board are presented in Fig. 3. We step through

each element in turn below.

ARTICLE IN PRESS

Fig. 1. The CDF-II TDC board. The two large chips with silver

heat sinks are the TDC FPGAs; the large black objects are DC-

to-DC converters (the layout allows addition DC-to-DCconverters, not needed and hence not stuffed on this board).

The FPGA for the VME interface can be seen in the upper left-

hand corner. Connector headers and dip switches near the

center of the board allow debugging with a logic analyzer.

M. Bogdan et al. / Nuclear Instruments and Methods in Physics Research A 554 (2005) 444457446


4/14

The Front Panel (on the left in Fig. 3) receives

96 differential inputs, arranged in four 24-

channel connectors, that receive pulses from

the amplifier/shaper circuits of the COT. The

signals are first applied to a receiver block

that converts them from a CDF-specific

quasi-LVDS signal [12] into standard LVDS

and passes 48 of them directly to each of the

two FPGA TDC chips, which have identical

designs.

Each of the two TDC FPGAs (TDC Chips)

does the time-to-digital conversion for 48 wires

to generate the Hit-Count and Hit-Data results.

These are stored in internal VME Readout

buffers implemented on the chip. The TDC

Chips also generate prompt data for the Level-1

track trigger processor (XFT).

The VME interface block is implemented with

an Altera Apex FPGA [15]. This block co-

ordinates VME access to the TDC Chips for

ARTICLE IN PRESS

Table 1

The physical and operational characteristics of the CDF-II TDC

Characteristic Values Comment

TDC digitization performance

Channels 96 48/FPGA

Time bin size 1.2 ns

Hits/channel p7 hits Configurable via VME

Min time between hits 4.8 ns

Full scale range 304.8 ns 254 1.2ns

Min width/hit 4.8 ns

Max width/hit 304.8 ns

Differential non-linearity o200 ps See Section 8.1

Accuracy (jitter) p1 count See Section 8.1

Channel-to-channel skew o100 ps

L1 pipe-line size 512 words=6:144ms 480-bit wordsTest data RAM size 512 words=6:144ms 480-bit words

L2 buffer length p64 words/768 ns 480-bit words; ConfigurableProcessing time 12ms after L2A Includes readout packing

Min interval between L2As 12ms L2A is Level-2 Accept

Prompt (Trigger) Outputs

XFT trigger bits/wire 6 From 11 time windows

XFT time-window 6 ns Min 1.2 ns; Max 12 ns

# of XFT time-windows 11/wire Mapped into 6 trigger bits

Trigger latency 80 ns after BC First word out

Trigger output freq. 32 bits/ 22 ns

Trigger output 43 pins TTL, on VME P3

Readout characteristics

VME interface VME64 Implemented in FPGA

VME readout modes A32/D32, A32/D64 D64 in CBLT mode onlyCBLT64 transfer rate 47 MBytes/s Burst speed

Test modes Data generator Internal 8192 wordmemory

Physical characteristics

Physical format 9U VME ANSI VIPA[9]

Power requirements (V/A) 5 V=15A; 5 V=2 AInput connectors 68-pin Mini-D Ribbon

Input levels LVDS CDF uses quasi-LVDS[12]

Front panel LEDs 1 Triple LED/FPGA Configurable in firmware

Trigger output connector VME P3



5/14

regular and Chain Block Transfers (CBLT) [9]

in both 32- and 64-bit modes. The VME chip

itself is connected to only the 16 least significant

bits of the VME data bus (the TDC Chips

connect to all 32 data lines).

Control signals from the CDF data acquisition

and trigger systems are brought onto the TDC

board using user-defined pins of the VME P2

backplane connector. The control signals are

bussed on the backplane of the CDF-standard

9U VME/VIPA crate[9,10]from a CDF TracerCard [16] to each of the TDC boards. The

specific control signals used by the TDC are as

follows:

The CDF system clockthis is the master

reference signal for the CDF data-acquisi-

tion system. The clock has a 132-ns period

and is synchronous with the accelerator RF

structure. On the TDC board the differential

PECL signal from the backplane is first

converted to TTL, then phase-locked, buf-

fered and applied to the TDC chips. The

clock signal is also applied to the TDC chips

after passing through a pair of program-

mable delay lines (0.25 ns granularity, 64 ns

range).

The Bunch Crossing signal (BC), indicating

that a clock corresponds to a crossing of

proton and antiproton bunches.

The Bunch 0 signal (B0) marks the first

proton bunch which comes once per cycle

around the Tevatron ring. The Level l and 2 trigger Accept/Reject

signals, as well as the Level 2 Buffer address

bits.

A calibration pulse from the VME back-

plane. This is converted from PECL to ECL

and can be applied to a pair of pins on each

of the four front panel connectors. The pulse

is thus sent to the amplifierdiscriminator

shaper card (ASDQ) of the COT[1] and is

used for testing and calibration.

ARTICLE IN PRESS

12ns clock

Input pulse

Leading edge Trailing edge

bit 9

bit 8

bit 7

bit 6

bit 5

bit 4

bit 3

bit 2

bit 1

bit 0

OUTPUT

0ns

Word 0 Word 1 Word 2 Word 3 Word 4 Word 5 Word 60000000111 1111111111 1111000000

Fig. 2. An example serial-to-parallel conversion of one of the 96 input channels into a 10-bit-wide parallel data stream as seen in the

Quartus II[14]simulation window. The top two traces are one input line from the tracking chamber and the locally-generated 12-ns

clock. The next 10 traces are the 10 bits of parallel data from the serial-to-parallel conversion, output as one word every 12 ns. The

parallel data, shown at the bottom of the figure, are then examined two words at a time for transitions that signify a leading or trailing

edge of a hit in the COT.



6/14

The VME P3 backplane connector is used to

transmit trigger flags generated by the TDC to

the eXtremely Fast Tracker (XFT) processor to

identify tracks in the COT for the Level-1

trigger.

Each FPGA is connected to a 20-pin header

so that it is easy to use a logic analyzer for

testing and diagnosis. Signals can be routed

to the header by programming the FPGA

firmware.

5. The TDC FPGA Chip

The block diagram of a TDC Chip is presented

in Fig. 4. There are two major data paths inside

the TDC Chip, one to record the COT hits for

VME readout, and one for the generation of the

prompt trigger bits (Trigger Primitives) for the

XFT trigger track processor. The Chip is also

provided with a Test Data generator, an LVDS

pulse generator and a PLL clock generator.

The major functional blocks inside the TDC

Chip are described individually below. These are

implemented as firmware and are optimized for the

CDF application; other applications can be

accommodated by firmware changes.

5.1. The input block

Each TDC Chip has four banks of 12 high-speed LVDS inputs. From each bank a 120-bit

wide data bus passes data to the MUX/MASK

block (see Fig. 4), which can be set under VME

control to block out any unwanted channel (for

example, a COT wire that is continuously set true

due to a some failure). The MUX/MASK block

also allows internal testing of the TDC Chip by

allowing the inputs to be switched to a test pattern

generated inside the Chip with the Test Data

Generator block (described in Section 5.5). The

ARTICLE IN PRESS

FP Conn4

FP Conn3

FP Conn2

FP Conn1

68 pin LVDS

68 pin LVDS

68 pin LVDS

68 pin LVDS

From COT

TDC Chip_1

STRATIX

EP1S30F780C6

TDC Chip_0STRATIX

EP1S30F780C6

P

P

P

1

2

3

VME

EP20K100QC240-3V

CDF_CLK_DEL

2 x 3D3418-0.25S

VMEBuffe

r

Buffer

to XFT

CDF Control

VME

VME

VMEBuffer

Receiver

Receiver

Receiver

Receiver

Fig. 3. The physical layout of the TDC board. The four input connectors, each with 24 LVDS channels, are on the left; the VME

backplane connectors are on the right. The elements are described in turn in the text.



7/14

fast digitization and conversion to a 10-bit wide

data stream for each channel then follows.

5.2. The pipeline and the Level 2 buffer system

The CDF Level-1 trigger is deadtime-less, with

all front-end data held in a pipeline for 5:544mswhile the Level-1 trigger decision is being made

[17]. On the TDC card the delay is implementedwith a clocked pipeline. On receipt of a Beam

Crossing (BC) signal, an input memory address

counter is set to zero; the counter then increments

on every 12 ns clock. The phase of this input pulse

can be adjusted on each board in 0.25 ns steps, set

by VME, to compensate for signal propagation

and input signal length differences. The memory is

512 words deep, each word containing 480 bits (48

channels 10 bits/channel). The memory has two

ports; the first port is always writing and the

second port is always reading. The adjustable

address of the second port, an offset from the first

port, is set to establish the desired delay period for

the pipeline. The maximum delay is 6:1ms.To reduce deadtime in the CDF Level-2 (L2)

trigger system, on a Level-1 Accept signal (L1A)

the data from a given beam crossing are trans-

ferred to one of four Level-2 buffers, awaiting a L2

trigger decision [17]. The four L2 buffers areindependently controlled by an accompanying L2

buffer-selection signal. These signals are in phase

with the CDF clock pulse. The L2 buffers are

two-port memories; each has a respective write

address counter for the input write port and all

share a single address counter for their (output)

read port. This allows the writing of a second

Level-1 buffer while a first is still collecting data.

The write clock is synchronous with the logic in

the pipeline, 12 ns per tick. The data are written

ARTICLE IN PRESS

SERDES

IN

TEST-DATA RAM*

MUX

MASK*

RAM*

512 words

00

01

10

11

RAM*64 words

Inputs from COT

Edge

Detector*

The Pipeline

L2 Buffers

PLL

XFT

Block* XFT-DAQ*

Hit Count Buffer*

7 VME32 words

Hit Data Buffer*168 VME32 words

L1A

L2A

CDF CLOCK

Tx Pulse

RAM*

512 words

SERDES

OUTTx Out Pulse

to front panel

To VME

Interface VME

Decoder

VME

Access(*)

Delayed 12ns , 22ns , 66ns

clocks

readwrite

To P3

Fig. 4. The functional block diagram of the TDC FPGA (TDC Chip). All processing is determined by programming in firmware. Each

Chip handles 48 LVDS channels (shown as coming into the SERDES block in the upper left). The prompt trigger flags for the XFT

trigger processor are output through the P3 VME connector. Data are read out by the CDF Data Acquisition system from the Hit

Count and Hit Data Buffers. An asterisk indicates registers or memories that are VME-accessible. The individual blocks in the diagram

are described in the text.



8/14

10-bits-wide per channel, so that 48 channels are

written at each tick in a 480-bit wide word. To

achieve a maximum of 384 ns for the time range,

the maximum length is 32 words, set via VME atinitialization (the maximum drift time in the CDF

COT is 200ns).

A Level-2 Accept signal, together with an

address pointer, selects one of the four Level-2

buffers to be transferred to the Edge Detector

Block. The read port of the selected memory is

driven through the full range of stored data

addresses to present all the stored data. As not

all buffers contain data from beam crossings that

pass the Level-2 trigger (and therefore these would

not receive a Level-2 Accept signal), the logic

allows any Level-2 buffer to be overwritten

whether or not it has been read. No data memory

buffer is ever erased.

5.3. The edge detector block

The purpose of the Edge Detector Block is to

find hits on the wires. Hits are defined as pulses of

at least 4.8 ns in width. Since the pulses can be

of indefinite length, the techniques of pattern

matching or look-up tables cannot be used. The

technique used in the Edge Detector Block is tolook for leading and trailing edges of a pulse. A

leading edge is defined as a transition from low (0)

to high (1) and a trailing edge is defined as a

transition from high (1) to low (0). It is assumed

that all wires start out in a low (0) state and the

first transition to find is a leading edge.

The Edge Detector Block is made up of two

modules. The first, called the ED, finds and stores

the edges on each of the 48 wires. The second,

called the ED48, controls the timing of the data

transfer into the Edge Detector Block, collects andpacks the output hit data, and signals when the

Block is finished. Each wire has its own dedicated

ED module, making 48 on each TDC Chip; there

is only one ED48 module on each Chip.

The data from the Level-2 Buffer are fed into

the Edge Detector Block in 10-bit words. A single

ED looks for hits in two consecutive 10-bit words

at a time. The beginning of a hit is defined as a

zero followed by at least four ones, or in the case

of the first word, four ones in a row. The end of a

hit is defined as a one followed by four zeroes.

There are three possible transitions in each word

and each transition needs a memory cycle.

The maximum number of words looked at perbeam crossing for each wire is variable, with a

maximum of 33 words (396/12). The maximum is

set by the time between Tevatron beam crossings,

396 ns, and the data clocking period of 12 ns for

each word (1.2 ns per bit times 10 bits per word).

The number of words to be searched by the ED is

set in a register via VME at initialization.

5.3.1. A worked example

Table 2 shows the relationship between the

position of bits in the words sent to the ED andtheir respective time value as an example in a word

that has three possible transitions.

In this case, there is a leading edge (transition

#1) starting at time value 1 and ending at time

value 4 (transition #2). There is also possibly

another leading edge (transition #3), starting at

time value 9, depending on what is in the next

word. In this case, the ED, which looks at 2 words

at a time, would find a hit if the next word started

with three ones.

Once a hit has been found, the data describing

the hit are stored in a RAM in the ED and the hittotal is incremented. Each hit is characterized by

the number of the time bin of the leading edge and

the width, expressed as the number of time bins in

the hit. Thus if the data in the example ofTable 2

were the first word in the data stream, the leading

edge stored would be 1 and the width would be 4.

If instead the example data were the third word,

the leading edge time would be 21 (10 bits each for

ARTICLE IN PRESS

Table 2

A sample word showing three possible transitions in the dataword sent to the Edge Detector

Bit position [9] [8] [7] [6] [5] [4] [3] [2] [1] [0]

Sample data 0 1 1 1 1 0 0 0 0 1

Time value 0 1 2 3 4 5 6 7 8 9

The first is a leading edge starting at time value 1 and ending at

time value 4, which is the second transition. There is also

possibly another leading edge transition starting at time value 9,

depending on what is in the next word. Time values go from

early (9) to late (0).



9/14

the first and second words plus time value 1 in the

third word) and the width would still be 4.

5.4. The XFT Block

The choice of implementing the TDC function-

ality by firmware in an FPGA facilitates integrat-

ing an associated functionality, the implementation

of complex deadtimeless pattern recognition to

feed the tracking trigger system. The TDC XFT

block generates Trigger Primitives used by the

eXtremely Fast Tracker (XFT) [18], which identi-

fies tracks in the Central Outer Tracker (COT).

The tracks are used in the Level 1 Trigger, as well

as in Level 2, and consequently the complete

pattern recognition and momentum reconstruction

have to be available within 5:5ms after the beamcrossing. The sense-wire planes of each superlayer

in the COT are tilted in the r f plane so that a

high momentum track will traverse each superlayer

[1], and consequently will travel between a pair of

sense wires in each plane, resulting in some hits

that are prompt. The present Run IIa XFT splits

the 396ns interval between beam crossings into

three time bins, aprompt, anot sure, and a delayed

time bin. These three bins are logically combined to

give 2 trigger bits per wire every 396ns. The newTDC/XFT design described here uses 11 time

windows to produce 6 output trigger bits per COT

wire every 396 ns. The 6 bits, referred to below

as Trigger Primitives, are derived from the hit

occupancies in the 11 time windows with Boolean

logic. The larger number of time windows allows

better momentum resolution and fake track rejec-

tion.The Trigger Primitives, consisting of the 6 bits

per wire times 48 wires, are output each 396 ns

from the XFT block on each of the 2 FPGAs.

Every 22 ns the TDC Chip sends out 16 bits to

the P3 backplane (48 bits for 66 ns, and so 48

6 288 bits for 396 ns). In order to speed up the

transmission of data to the XFT, the bit for each

of the time bins is transmitted in turn after the

calculation for its corresponding three time-win-

dows (see below) is completed. The time windows

can be reprogrammed to optimize performance by

changing data in VME registers without firmware

changes.

The TDC XFT block receives two types of

signals, the CDF control pulses Bunch 0 (B0) and

Bunch Crossing (BC), and the primary (digital)

data-stream from the COT. The XFT block sends

out timing alignment signals and the Trigger

Primitives (trigger bits) to the P3 connector on

the VME backplane for transmission to the XFT.

The TDC XFT block is controlled by two VME

registers and an internal RAM[19]. The values of

the registers determine the input and outputdelays, and the RAM contents define the time-

window intervals.

The TDC XFT block includes three main blocks

(Fig.5):

ARTICLE IN PRESS

48 Occupancy DetectorsOutput

Multiplexer

Data Stream Trigger Bits

Trigger

Primitives

Trigger

Logic

Control

CDF_CLK

BC

Time-window

enable bits

Time-window

clear bits

BC Delayed

B0

Fig. 5. Block diagram of the TDC XFT Logic. The data stream comes from the MUX/MASK Block (Section 5.1) as 480-bit words

every 12 ns. The CDF_CLK, BC and B0 signals are the 132 ns CDF master clock, and the Bunch Crossing and Bunch Zero signals

generated by the Tevatron, and are transmitted to the VME backplane through the CDF DAQ system [16]. The TDC XFT block sends

out the Trigger Primitives, which are the multiplexed trigger bits, and trigger control signals.



10/14

Occupancy Detectors (OD).

The 48 Occupancy Detectors, one per channel,

receive the digitized COT data stream (48

channels 10 bits/channel every 12 ns), per-form hit recognition, and send out 48 6

Trigger Primitive bits every bunch crossing to

the Output Multiplexer. Each OD looks for a

1111 pattern (a hit) in two consecutive words

in the data stream (20-bits), which corresponds

to a 24 ns time interval. The search for hits is

performed separately by two Hit Scanners

inside each OD; one for hits starting in the first

5 bits (6 ns), and the second for hits in the next 5

bits of the 20-bit data segment. Each Occu-

pancy Detector contains 11 independent 1-bit

registers, (Time-Window Registers), controlled

by the Trigger Logic Control, that store hit

information. Each register records if there is a

hit in during the corresponding 12-ns time-

window. In the current firmware design each

time-window is controlled by two bits (time-

window control bits), used to implement

separate hit scanning for the first and the last

6 ns of each 12-ns clock interval. If the first

control bit is high, the register is sensitive to the

Hit Scanner for the first 6 ns of every 12ns

interval (the first 5 bits of 10-bit data word).Similarly, if the second bit is high, the register is

sensitive to the Hit Scanner for the last 6 ns of

every 12 ns (last 5 bits of 10-bit data word). The

time-window register records the logical OR of

the output of the two Hit Scanners. For each of

the 11 registers, the time-windows covering a

396 ns cycle (33 12-ns intervals) are defined by a

2 33 bitmap stored in the Trigger Logic

Control. At the end of every 396-ns cycle the

Time Window Register bits are sent to the

Output Multiplexer. The registers are thencleared.

This implementation allows defining time-win-

dow ranges in units of 6 ns for the CDF XFT.

However, hit scanning can be done separately

for every bit of the 10-bit data word (this would

require configuring the firmware for 10 time-

window control bits per channel). In this case

the time-window unit would be 1.2 ns. The 6 ns

resolution is thought to be sufficient for the

existing XFT system.

Trigger Logic Control (TLC). The TLC con-

trols the XFT logic. Every 12 ns the Trigger

Logic Control block sends 22 bits, two per time-

window to the Occupancy Detectors definingwhich 11 time-windows are enabled for this

clock cycle. All Occupancy Detectors get the

same 22 time-window bits. These time-window

bits are stored in a 22-bit wide RAM, accessible

via VME. The RAMs Address Counter is

incremented every 12-ns clock and hence the

time-window bits refresh every 12ns. The

Address Counter starts to count on receiving

an XFT-Enable pulse, which is adjusted to be

delayed by the same amount as the COT data

stream relative to the Bunch Crossing (BC) and

Bunch Zero (B0).

Output Multiplexer (OM). This receives the

trigger bits from the 48 Occupancy Detectors

and sends the Trigger Primitives in parallel

to the P3 backplane. Each OM sends out a

16-bit Trigger Primitive word every 22 ns on

the P3 connector to the XFT. The signals

are buffered to the P3 connector as TTL levels.

The OM also sends synchronously a Word_0

marker, the B0 marker if appropriate (i.e. the

crossing is that of the Tevatron bunch 0), and

an alignment signal (Data Strobe). The OMdoes not perform any logical operations on

the trigger bits, but sends them in the order

required by the XFT [18].

The existing cables to the XFT have too low a

band-width to transmit the 22-ns Main Clock as

a data strobe, and so Trigger Primitive bits are

sent on the leading and trailing edges of a

slower clock which is also transmitted on the

cables. The Data Strobe (DS) is a 44 ns clock

formed from doubling the period of the Main

Clock. Thirty-two bits, 16 from each TDCChip, are sent every 22 ns, with 18 such cycles in

the beam-crossing period of 396 ns.

5.4.1. TDC XFT-DAQ Block

The TDC XFT-DAQ block of the firmware,

used for testing and diagnostic purposes only,

connects to a dedicated DAQ system similar to the

hit-data stream. It has the same structure, consist-

ing of a Pipeline, L2 buffers, and VME-readout

buffers, and follows the same L1A/L2A sequence

ARTICLE IN PRESS



11/14

as that of the main hit-data stream. The length of

the XFT-L2 buffers is also VME controlled. For

testing, the XFT block is fitted with another simple

VME Readout RAM that contains the currentXFT Trigger Primitives. This RAM can be frozen

and read out via VME for diagnostic tests, in

particular for debugging the TDC-XFT connec-

tion.

5.5. The Test Data Generator

The Test Data Pattern Generator inside the

TDC Chip allows testing of the functionality of the

TDC by running test data through the entire

sequence. The Pattern Generator is implemented

in the firmware as a VME accessible dual-port,

512-bit wide RAM, which can store 8192 32-bit

words. The first 480 bits of the 512 are used to

drive the 48 channels with 10-bit words of test

data. The Pattern Generator is used in the tests

described below in Section 8.

5.6. The ASDQ Pulse Generator

The COT electronics chain of ASDQ-Repeater-

TDC boards can currently be calibrated by having

each TDC send a differential-ECL calibrationpulse upstream on a pair of dedicated lines in the

multi-conductor signal cable to its associated

ASDQ cards. Each ASDQ uses this input to

generate a calibration pulse at its input, which then

propagates back through the data lines of the same

signal cable[1]. This loop allows a time calibration

of the signal path, including the actual cable

length. Multiple chains can be calibrated simulta-

neously by providing the bussed calibration back-

plane signal to each TDC (see Section 4). The

TDC firmware also provides for local generationon the TDC of the pulses to the ASDQ cards. The

choice of the two calibration modes is made by

VME register selection.

In the local generation mode, the SERDES

OUT block (see Fig. 4) generates a serial LVDS

pulse pattern, available as an ECL signal on the

front panel. The timing and number of pulses are

controlled via VME by writing the contents of the

Tx Pulse RAM(Fig.4), which is implemented as a

10-bit wide, 512-word memory.

5.7. The Clock Generator Block

The Clock Generator Block, implemented in

each TDC FPGA with PLLs, generates the 12-nsand 22-ns clocks used inside the Chip. All the

clocks are synchronous with the delayed CDF

clock. The prompt CDF clock is also received and

is used to latch the CDF-specific back-plane

control signals (described in Section 4). Four 12-

ns clocks are generated onto output pins of the

FPGA as LVDS signals and routed back into the

Chip onto the FPGAs dedicated high-speed clock

input pins, one for each high-speed I/O bank.

6. The VME Interface Block

The VME Interface is implemented with a third,

smaller, Altera FPGA [15]. The design permits

Chained Block Transfer (CBLT) read commands

in both 32- and 64-bit modes for data transfer.

CBLT uses geographical addresses different from

the ones normally used in the crate, recognized

only by the participating modules. In CDF there

are typically 18 TDC cards per VME crate.

There are two possible CBLT read commands:

(1) Read Block Transfer from virtual slot 30: Hit

Count words are read from every TDC module

in the crate. Each TDC produces 14 words/

board in 32-bit mode and 8 words/board in 64-

bit mode.

(2) Read Block Transfer from virtual slot 31: Hit

Data words are read from every TDC module

in the crate. Each TDC produces up to 336

words/board in 32-bit mode and up to 168

words/board in 64-bit mode.

In this firmware implementation, the CBLT mode

is enabled by default. Up to 18 TDC cards sit in a

VME crate on the CDF detector; the TDC module

closest to the Crate CPU is automatically con-

sidered first in the chain. The setting of a module

as last in the chain, or the possible removal of a

module from the readout, is done by writing to a

register in the modules VME Chip to disable

CBLT mode[19].

ARTICLE IN PRESS



12/14

7. Module power requirements

The TDC board receives 5 V=15 A and

5 V=2A on the P0, P1, and P2 backplaneconnectors. The board generates 1:5 V=15 Aand 3:3 V=10 A with DC/DC converters, and2:5 V=3 A using a linear regulator. A 3:3 Vvoltage is also generated and passed through the

front panel connectors to the amplifierdiscrimi-

natorshaper card (ASDQ)[1]. Spare locations on

the TDC board are provided for two additional

DC/DC converters.

8. Performance of the TDC and Readout

The implementation of both the TDC function-

ality and the processor with its extensive trigger

logic and self-testing capabilities in one chip with

digital inputs and outputs has an impact on

characterizing the properties of the TDC. We

discuss the linearity, cross-talk, and channel-to-

channel time skew in turn below [20]. We also

describe the bandwidth of the module for both 32-

and 64-bit VME block transfers.

8.1. Linearity

Unlike in more conventional analog TDCs, the

time-to-digital conversion is basically digital, with

the time binning determined by the sampling of the

input signal during de-serializing. This is con-

trolled by a phase-locked loop. The PLL locks a

local oscillator; an error bit monitors loss of lock.

The jitter in the clock driven by the PLL is

specified to be less than 200 ps; the drift in the PLL

is negligible [13]. The differential non-linearity

consequently is less than 200 ps. Channel-to-channel skew is specified to be less than 100 ps

[13]. Each of these effects is significantly smaller

than the least count of 1.2 ns; we (conservatively)

quote 1 count to be the inherent precision of the

device.

In the CDF application described here, the data

arrive at the front face of the TDC asynchronously

with the CDF clock. There is consequently an

undetermined phase between the clock and the

data, resulting in a possible additional 1 count

jitter.1 Fig. 6shows measurements of the linearity,

with no deviation from expectations.

8.2. Channel-to-channel skew

Skew measurements were made from the LVDS

receivers output pins to the FPGA input pins. The

measuring point on the FPGA was accessed by

soldering test pick-off wires to the FPGA BGAhole pattern on the solder-side of the PC board.

The channel-to-channel skew from contributions

other than the FPGA was measured to be less than

100 ps. Channel-to-channel variations inside the

FPGA are negligible.

8.3. Crosstalk

To test for crosstalk we compare the output of

the TDC boards with a reference set, computed

automatically by the test routines from theinput data stored in the RAM in the TDC Chips.

All test routines are implemented as C-code,

which is executed on the PowerPC (MVME2301

or MVME5500) crate controller after compilation.

Other crate controllers can also be used without

any significant differences. Both walking 1s and

0s, and random patterns were used. The test

ARTICLE IN PRESS

00

50HitDa

taCount(Ti

me)

+20 ns+0 ns +60 ns+40 ns +80 ns +100 ns

Time Change of ASDQ Input Pulse

T0

100

150

200

Avg CNT=91.5

Avg CNT=108.1

Avg CNT=125.0Avg CNT=142.0

Avg CNT=158.4

Avg CNT=175.0

Slope is 1-count/1.202 ns

Fig. 6. Measurement of linearity of one channel: counts versus

delay in the input pulse. Because the sampling is controlled by a

phase-locked loop the differential non-linearity is expected tobe less than the least count, consistent with the measurement.

1This is not inherent in the TDC electronics, but is from the

application. For a synchronous application this does not apply.



13/14

routines exercise the Edge Detector (ED48 test),

the TDC XFT block (XFT test) and the Chain

Block Transfer in both 32- and 64-bit modes

(CBLT32/64 tests).A total of 5 109 cycles were performed; zero

bit-errors were observed. This puts a limit at 90%

C.L. on the cross-talk bit-error rate due to cross-

talk inside the FPGA of 5 1010.

8.4. Readout bandwidth

The results of tests of the readout speed are

described below in more detail. The burst-mode

readout speed achieved in CBLT64 mode is

47 MB/s. The Edge Detector (ED48) processing

time is always less than 12 ms (for 7 hits/wire).

8.5. Testing chain block transfer: the CBLT32/64

Test

The CBLT 32/64 tests read out multiple boards

in a crate sequentially. Tests were performed with

up to 18 TDC boards in a VME crate. The tests

are designed to check the VME chain-block

transfer capabilities of the board, and also crate-

wide characteristics such the crate backplane

capability, stability, and multi-board performance.In the CBLT tests the initial contents of the

Level 2 buffer RAMs are used to predict the

output of the ED48 modules as read from the Hit

Count and the Hit Data RAMs. In the full-crate

test 18 TDC boards are used. For each of the 36

TDC Chips in the test (2/board), the L2 buffer

lengths are set so that all possible patterns are

sampled. The order of accessing the four L2

buffers is also selected differently for each TDC

board, so that all combinations of L2 buffer and

buffer length are sampled. The CBLT32 andCBLT64 tests were repeated 5 109 times each

without a single failure, such as would be due to

cross-talk or timing jitter in the logic.

9. Conclusions and summary

A new 96-channel TDC and trigger processor

module has been designed for the CDF Experi-

ment at Fermilab using the multichannel bit-

sampling capabilities of the AlteraStratix FPGA

family. The board, built in a 9U VME format,

contains few other components other than the 2

TDC FPGAs, a VME controller implemented in a3rd FPGA, DC-to-DC-converters, and input/out-

put buffers. The functionality is exceptionally

flexible, being controlled by firmware, so that it

can be reprogrammed for different applications.

The TDC has extensive test capabilities, imple-

mented directly in the FPGAs. Thirty boards have

been built and tested. The reliability of the board is

high as the chip count is very low. The TDC

differential non-linearity is 200 ps, i.e. within 1

count at any point in the range, and no cross-talk

was detected in 5 109 96-channel tests. The

channel-to-channel skew is less than 100 ps. A full

crate of the CDF-II TDC has been operated and

read out in 64-bit block-transfer mode at a speed

of 47 Mbytes/s.

Acknowledgements

We thank Bill Badgett, Frank Chlebana, Pat

Lukens, Aseet Mukherjee, and Kevin Pitts for

help, support, and advice. Nils Krumnack and Ed

Rogers deserve special thanks for providing

critical input on the XFT specifications and the

design of the XFT sections of the TDC. We thank

Rich Northrop for the picture of the board.

This work was supported in part by the

National Science Foundation under Grant No.

5-43270, and the US Department of Energy.

References

[1] T. Affolder, et al., Nucl. Instr. and Meth. A 526 (2004)249.

[2] The CDF-II detector is described in the CDF Technical

Design Report (TDR), FERMILAB-Pub-96/390-E. The

TDC described here is intended as a further upgrade

beyond that described in the TDR.

[3] R.S. Moore [the CDF Run II collaboration], A custom 96-

channel VME TDC for the CDF detector for Tevatron

collider Run II; FERMILAB-CONF-04-262 Prepared for

2004 IEEE Nuclear Science Symposium and Medical

Imaging Conference (NSS/MIC), Rome, Italy, 1622

October 2004.

[4] D.I. Porat, IEEE Trans. Nucl. Sci. NS-20 (1973) 36.

ARTICLE IN PRESS



14/14

[5] J. Kalisz, Metrologia 41 (2004) 17

http://www.iop.org/EJ/abstract/0026-1394/41/1/004.

[6] J.F. Genat, Nucl. Instr. and Meth. A 315 (13) (1992) 411.

[7] An extensive list of references can be found in:

A. Ma ntyniemi, A High Resolution Time-to-DigitalConverter Based on Stabilised Three-stage Delay Line

Interpolation, Department of Electrical and Information

Engineering, University of Oulu; OULU 2004 Thesis;

ISBN 951-42-7460-I;ISBN 951-42-7460-X;http://herkules.

oulu.fi/isbn951427461X/isbn951427461X.pdf

[8] J. Kalisz, R. Szplet, J. Pasierbinski, A. Poniecki, IEEE

Trans. Instrum. Meas. 46 (1997) 51.

[9] ANSI/VIPA 23-1998, March 22, 1998. These crates

support geographical addressing. Chain Block Transfer is

described in Appendix E.

[10] T. Shaw, G. Sullivan, A Standard Front-End Trigger

VME Bus-Based Readout Crate for the CDF Upgrade,

CDF/DOC/TRIGGER/CDFR/2388, May 12, 1998. The

CDF readout crate is a 21-slot 9U VME crate based on the

VIPA standard, with added bussed signals on rows A and

C of the P2 connector. All 64 of the user-defined pins on

these rows are bussed between slots 2 and 21 using the

same termination scheme as the standard VME bussed

lines. Additional power pins are provided on the P0

connector. The J3 backplane, which is physically separate

from J1/J2, is customized for I/O specific to each CDF

subsystem.

[11] The schematics and test results are available online

at: http://edg.uchicago.edu/bogdan/tdc/index.html. The

firmware is in CVS at http://www-cdfonline.fnal.gov/

cgi-bin/cvsweb.cgi/NuTDC. The schematics and code are

available on request.

[12] B. Bevensee, et al., IEEE Trans. Nucl. Sci. NS-43 (1996)

1725;

K. Pitts, private communication.

[13] Altera Corporation, Stratix Device Handbook, v3.0, Apr.

2004. The serializer/deserializer block is labeled SERDESin this Handbook, and in the figures and text here. The

specifications on differential non-linearity and skew in the

chip are from communications from Altera.

[14] Altera Corporation, Quartus II Handbook, v2.1, August

2004.

[15] Altera Apex AP20K100.

[16] T. Shaw, Specification for Trigger And Clock + Event

Readout Module (TRACER) Terri Shaw, CDF Internal

Note 4686, 8/1/98.

[17] M. Campbell, H. Frisch, M. Shochet, G. Sullivan,

D. Toback, J. Wahl, P. Wilson, CDF Internal Note

2038, April 1993.

[18] E. Rogers, N. Krumnack, CDF Internal Note 7193,

August 2004. This note is the specification for the XFT

implementation.

[19] M. Bogdan, H. Sanders, CDF Internal Note 6998, June

2004.

[20] Tests were made with VxWorks V5.3c [21] and FISION

V2.12 [22] running on a RedHat V6.2 Linux operating

system. CDF control pulses were provided by a CDF

TESTCLK card[23].

[21] r2003 WindRiver Systems, Inc. MCL-DS-VXW-0309.

[22] J. Pangburn, FISION V2.12 Users Guide, http://

www-cdfonline.fnal.gov/vme/FISION.html.

[23] T. Shaw, W. Stuermer, Internal note TESTCLK V7, CDF

Crate Clock and Trigger Driver, Fermilab, ETT/CDF

Upgrade Group, July, 1998.

ARTICLE IN PRESS

http://www.iop.org/EJ/abstract/0026-1394/41/1/004http://herkules.oulu.fi/isbn951427461X/isbn951427461X.pdfhttp://herkules.oulu.fi/isbn951427461X/isbn951427461X.pdfhttp://edg.uchicago.edu/~bogdan/tdc/index.htmlhttp://edg.uchicago.edu/~bogdan/tdc/index.htmlhttp://www-cdfonline.fnal.gov/cgi-bin/cvsweb.cgi/NuTDChttp://www-cdfonline.fnal.gov/cgi-bin/cvsweb.cgi/NuTDChttp://www-cdfonline.fnal.gov/vme/FISION.htmlhttp://www-cdfonline.fnal.gov/vme/FISION.htmlhttp://www-cdfonline.fnal.gov/vme/FISION.htmlhttp://www-cdfonline.fnal.gov/vme/FISION.htmlhttp://www-cdfonline.fnal.gov/cgi-bin/cvsweb.cgi/NuTDChttp://www-cdfonline.fnal.gov/cgi-bin/cvsweb.cgi/NuTDChttp://edg.uchicago.edu/~bogdan/tdc/index.htmlhttp://edg.uchicago.edu/~bogdan/tdc/index.htmlhttp://herkules.oulu.fi/isbn951427461X/isbn951427461X.pdfhttp://herkules.oulu.fi/isbn951427461X/isbn951427461X.pdfhttp://www.iop.org/EJ/abstract/0026-1394/41/1/004

Documents

A 96-channel FPGA-based Time-to-Digital Converter