
Performance analysis of self-electro-optic-effect-device-based (SEED-based) smart-pixel arrays used in data sorting



M. P. Y. Desmulliez, B. S. Wherrett, A. J. Waddie, J. F. Snowdon, and J. A. B. Dines

The performance factors associated with self-electro-optic-effect-device-based (SEED-based) smart-pixel arrays are analyzed in terms of semiconductor technology and pixel complexity. The sorting task is chosen as a practical example. Complementary metal-oxide semiconductor (CMOS)-SEED 2 × 2 self-routing nodes operated with quasi-cw-mode lasers are shown to provide the maximum processing power and on- or off-chip communication rate. The need for new front-end amplifiers for the smart-pixel technology is emphasized. © 1996 Optical Society of America

Key words: Optical computing, optical information processing, smart pixel arrays, sorting algorithm, perfect shuffle, self-electro-optic-effect device.

M. P. Y. Desmulliez is with the Department of Computing and Electrical Engineering and B. S. Wherrett, A. J. Waddie, J. F. Snowdon, and J. A. B. Dines are with the Department of Physics, Heriot-Watt University, Edinburgh EH14 4AS, Scotland, United Kingdom.

Received 20 February 1996; revised manuscript received 3 June 1996.

0003-6935/96/326397-20$10.00/0 © 1996 Optical Society of America

10 November 1996 / Vol. 35, No. 32 / APPLIED OPTICS 6397

1. Introduction

Advances in the integration of micrometer-size optoelectronic components with very-large-scale-integration (VLSI) circuits have enabled arrays of electronic-processing elements to be interconnected through free space or by means of guided-wave structures.¹ The combination of these components with conventional electronic-logic families [emitter-coupled logic (ECL), complementary metal-oxide semiconductors (CMOS’s), bipolar CMOS’s (BiCMOS’s)] has permitted the design of various elements, defined as smart pixels, whose common feature is the reception and the modulation or emission of information in the optical domain and its processing in the electrical domain.²,³ The use of this hybrid technology in optical demonstrator systems⁴,⁵ has prompted investigations of whether the performance of smart-pixel arrays,⁶⁻⁹ their cost, and their reliability¹⁰ can be regarded as feasible in information processing.

Of particular importance are the issues involved in determining the complexity of individual processing elements, such as for optimizing the overall system performance. The complexity of the pixel refers here to the degree of smartness. Very complex (intelligent) pixels demand a large chip area, so that few can be accommodated per chip. The resulting low on-chip parallelism may not optimally exploit the recognized advantages of nonlocal or space-variant optical interconnects.¹¹ Conversely, the optics may not be able to cope with vast numbers of simple (dumb) pixels per chip. The power dissipated on chip by the optical transceivers and the electronic circuitry also influences the pixel density. Insufficient heat-removal capabilities induce thermal nonuniformity across the array and a degradation of the system performance. Furthermore, if the processing array is to be operated at a given frequency, the limited amount of laser-source power available to read from or write onto the array imposes, for some systems, a maximum pixel density.

All the above limitations, which depend on the architecture, the algorithm, and the sophistication of the electronic and optoelectronic components, are quantified in this paper. The optoelectronic components are taken in all the cases studied to be multiple-quantum-well structures (MQW’s) acting as p-i-n photodetectors or as electro-absorption modulators.¹² The different transceivers considered span the evolution of self-electro-optic-effect device (SEED) technology.¹³ Classes of devices ranging from the symmetric-SEED¹⁴ (S-SEED) to the CMOS-SEED⁴ are investigated, accounting for the logic functionality that each can deliver. The S-SEED provides the least intelligent of the pixels; the logic functions of these p-i-n photodiodes are limited to NAND and NOR or AND and OR operations. By comparison, any logic is a priori possible with logic-SEED’s¹⁵ (L-SEED’s), which form the second class of pixels studied. Finally, the integration of photodetectors and light modulators with VLSI electronics is considered. This integration is either monolithic¹⁶ [field-effect-transistor (FET)-SEED] or a result of the flip-chip bonding of the optical transceiver chip onto a CMOS chip¹⁷ (CMOS-SEED). For all these components, the smart-pixel arrays are cascaded such that a laser source is used to read the output modulators of a given array, and the resulting information signals propagate, by means of interconnection optics, to the photodetectors of a consecutive array.

One specific information-processing task has been chosen for this study: the sorting of a two-dimensional (2-D) array of bit-serial optical data streams. Data sorting has been and remains one of the most commonly performed tasks in computing.¹⁸ On algorithmic grounds, data sorting is achieved in reduced computational time if it is carried out by the use of space-variant nonlocal interconnects¹¹ rather than the nearest-neighbor (mesh) interconnects common in electronic parallel-array processors. It is in the provision of such nonlocal interconnections in combination with the reduction in power required to transmit signals off chip¹⁹ that optics excels. Pixels of different complexities have been designed and laid out to perform the sorting task. The use of this particular example, hence, enables us to quantify accurately the power dissipated and the real estate demanded by the pixel in each of the different optoelectronic families examined. The purpose of this article is fourfold:

1. To define suitable overall system-performance metrics and figures of merit for SEED-based smart-pixel arrays.
2. To analyze how, for a given system configuration, the performance factors are influenced by operating conditions, such as the mode of operation (cw or quasi-cw) or the wavelength of the laser source.
3. To determine the optimum degree of intelligence that the pixels should possess to maximize the performance factors while satisfying optical, architectural, algorithmic, and thermal constraints.
4. To propose an optimum system configuration for an optical sorting-module demonstrator.

The plan of the paper is as follows: The extended-interconnect cellular-logic image processor (EX-CLIP) architecture¹¹ is used as a generic architecture designed to take advantage of the potential of optics to implement a variety of nonlocal interconnect patterns. A fast sorting algorithm, Batcher’s bitonic sort,²⁰ is mapped onto the architecture. This algorithm is explained in Section 2, as are the EX-CLIP architecture and the specific optical interconnection patterns used to implement the mapping. The 2-D perfect-shuffle interconnect is selected for data sorting (in combination with a derivative, the stacked-deck interconnect, for the case of simple pixels). At each stage of the sorting task, the information is shuffled and neighboring data words are compared and either exchanged or not. This procedure occurs a number of times. The interconnect pattern is identical at each stage of the complete algorithm (i.e., it is stage invariant) in order to map the bitonic sort. Hence, a recirculating architecture can be exploited, minimizing the optical and electrical hardware. This property makes the use of shuffle interconnections an attractive proposition when compared with other isomorphic interconnections that are not stage invariant, such as the banyan or the crossover.²¹ In Section 2, the different architectures used in data sorting and their modes of operation are also explained. These architectures depend on the complexity of the optoelectronic components that perform the conversion, processing, or both of the signals. For each system the number of cycles necessary to perform the sort is derived.

The SEED-based devices are considered in Section 3. Models for the optical and electrical responses of the components are presented. The model for the SEED response must include the saturation that can occur at high optical-power levels. Too much optical power decreases the peak responsivity of the photodetectors and reduces the output contrast ratio of the modulators; this leads to a maximum operation frequency if cascaded arrays are utilized.²²⁻²⁴ The resulting system-throughput rate, defined as the number of millions of sorted data that can be output per second [millions of operations per second (Mops)], is therefore a nonmonotonic function of the laser-source power per pixel.

The different contributions to the net processing time of a smart-pixel array are discussed in Section 4. These contributions are influenced by the size of the array, the wavelength of operation, and the electronic family involved for the more complex pixels. It is also shown in Section 4 that the frequency of operation depends on the mode of operation of the laser source. The lasers themselves may be operated either in cw mode or in quasi-cw mode. In the analysis the optical power is allowed to vary up to a peak power of 1 W/laser source, such that the optical transceivers are used at their optimum operating conditions.

By the combination of the results of Sections 2 and 4, the throughput rates of the different systems, subject to the limited amount of laser-source power, are presented in Section 5. The throughput rate is calculated for each system as a function of the chip size. For all the technologies considered, the maximum chip area is taken to be 1 cm²; such an area would permit a good yield in the manufacturing process and a straightforward implementation of the optical hardware.⁴

The throughput rate is not determined by the laser-source power alone but also by the ability to remove the heat dissipated from the optoelectronic and electronic chips. The heat generated at a smart pixel is the sum of that heat induced optically at the transceivers and that produced by the electronic circuitry. The overall power dissipated is calculated as a function of the frequency of operation of the system. This power depends strongly on the mode of operation of the laser source. The maximum heat-removal capability is set at 10 W cm⁻², for which conventional cooling methods are just adequate. The resulting effects on the potential system throughput are described in Section 6. For comparing the different technologies, the communication rates on and off the individual 1-cm²-area chips are determined in Section 7. Finally, in Section 8 the various sorters are studied under the relaxation of one or several constraints.

2. Architectures for Implementing the Batcher Bitonic Sort

A. Generic Extended-Interconnect Cellular-Logic Image Processor Architecture

Figure 1 shows the general architecture considered for SEED-based 2-D data sorting. The $2^m$ L-bit-long words to be sorted are input in bit-slice form from a spatial light modulator (SLM). The L bit planes enter the processing loop at a 2-D smart-pixel array acting as a memory unit. The output from this memory passes through the interconnection pattern. The bit planes are then input sequentially at a smart-pixel array, which performs the processing. The processed data are then reintroduced onto the memory unit prior to further processing, etc. Circulation around this procedure loop is repeated a specified number of times (known in advance from the algorithm) until sorting from the maximum to the minimum has been completed. The sorted words are then output from the system.

Fig. 1. Functional schematic diagram of the EX-CLIP architecture used for data sorting. An unsorted 2-D array of input data beams circulates a fixed number of times around the processing loop before being output to the output SLM.

All the systems analyzed here use time-multiplexed multistage networks designed with respect to such an iterative architecture. The number and setup of the memory and processing arrays differ according to the pixel smartness. The detailed architecture and the performance factors depend on the logic functionality at a particular pixel. For example, the reduction in pixel complexity from a fully integrated 2 × 2 self-routing node⁴,²⁵ to a single S-SEED, although permitting an increased pixel density and decreased cycle time, increases the number of physical cycles necessary to implement the sorting algorithm.
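The interconnection stage of the loop in Fig. 1 is, for the sorters considered in this paper, a perfect shuffle that is applied unchanged on every pass. The following Python sketch illustrates that permutation only; the function names are hypothetical, and the 2-D version assumes the common separable definition (a 1-D shuffle of the rows followed by a 1-D shuffle within each row), which the text does not spell out.

```python
def perfect_shuffle(items):
    """1-D perfect shuffle: interleave the first half of `items` with the second half."""
    n = len(items)
    if n % 2:
        raise ValueError("perfect shuffle needs an even number of items")
    half = n // 2
    shuffled = []
    for i in range(half):
        shuffled.append(items[i])         # element from the first half
        shuffled.append(items[half + i])  # its partner from the second half
    return shuffled


def perfect_shuffle_2d(grid):
    """Assumed separable 2-D shuffle: shuffle the row order, then shuffle within each row."""
    return [perfect_shuffle(row) for row in perfect_shuffle(grid)]


channels = list(range(8))
print(perfect_shuffle(channels))  # [0, 4, 1, 5, 2, 6, 3, 7]

# Three passes through the same shuffle restore the order of 2**3 channels
restored = channels
for _ in range(3):
    restored = perfect_shuffle(restored)
print(restored == channels)  # True
```

Because the same permutation is reused on every circulation, a single piece of interconnection optics can serve the whole recirculating loop.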

B. Bitonic Sorting Algorithm

The Batcher bitonic sorting algorithm²⁰ uses arrays of 2-input, 2-output nodes as processing elements. Three logic functions need be achieved at the nodes: (i) compare the two inputs and send the maximum input to one output channel (and the minimum to the other), (ii) compare the inputs and send the minimum to the first channel (and the maximum to the other), and (iii) let the data go through uncompared. The envisaged arrays are two dimensional. However, for clarity the algorithm is explained here for the one-dimensional (1-D) case; mapping between the two schemes is straightforward.

Figure 2 shows the sorting of a set of eight numbers entered in (eight) channels (at the left-hand side).⁴,²⁶ A reverse-bitonic sequence of length n, $\langle z_1, \ldots, z_n \rangle$, is defined by the concatenation of an increasing and a decreasing set of numbers, $z_1 \le z_2 \le \ldots \le z_k \ge \ldots \ge z_n$, for some $k \in [1, n]$. A (reverse) bitonic merge sorts a reverse-bitonic sequence into a set of (decreasing) increasing numbers. At the end of the first stage of the algorithm, two reverse-bitonic sequences of length 4, ⟨4, 8, 3, 1⟩ and ⟨2, 7, 6, 5⟩, have been created from four reverse-bitonic sequences of length 2. The second stage generates a single reverse-bitonic sequence of length 8, ⟨1, 3, 4, 8, 7, 6, 5, 2⟩, which is subsequently sorted in the third and final processing stage.

It is shown, for example in Refs. 4 and 27, that, if the three functionalities described above can be achieved by programming a single smart pixel, then the algorithm shown in Fig. 2 can be implemented in the general EX-CLIP architecture by the use of a single 2-D perfect-shuffle interconnection or two orthogonal 1-D shuffles. For a set of $2^m$ numbers there are (m − 1) stages in the algorithm, each consisting of m processing steps. Each of these steps corresponds to a single physical cycle of all L data planes, incorporating a 2-D shuffle and a 2 × 2 node decision. The decision to be made at a given pixel depends on the precise cycle and on the position of that pixel in the array. The desired programming of the pixel array can be built into the array in advance or be determined by binary images (masks) that circulate along with the data.⁴,²⁵ If the smart-pixel functionality is less than that of the programmable 2 × 2 node, then sorting can still be achieved but requires larger numbers of cycles and somewhat more complicated iterative architectures, as discussed in Subsections 2.C and 2.D. Alternatively 2 × 2 nodes may be combined to provide pixels of higher functionality. Data can be sorted most-significant-bit (MSB) first or least-significant-bit (LSB) first. Here we concentrate on descriptions for the MSB case.
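As an illustration of the 1-D algorithm, the sketch below implements a textbook bitonic sorting network built from a 2-input, 2-output node with the three functions listed above. It is a minimal sketch rather than the shuffle-based recirculating mapping of Refs. 4 and 27: the nodes are wired directly to their partners, so the bypass function is defined but never exercised. The names `node`, `bitonic_sort`, and `is_reverse_bitonic`, and the unsorted input order, are illustrative assumptions.

```python
def node(a, b, mode):
    """2-input, 2-output node: 'max_first', 'min_first', or 'bypass'."""
    if mode == "bypass":
        return a, b                      # (iii) pass the data through uncompared
    lo, hi = (a, b) if a <= b else (b, a)
    if mode == "max_first":
        return hi, lo                    # (i) maximum to the first output
    return lo, hi                        # (ii) minimum to the first output


def is_reverse_bitonic(seq):
    """True if seq rises (weakly) to some element and then falls (weakly)."""
    k = 0
    while k + 1 < len(seq) and seq[k] <= seq[k + 1]:
        k += 1
    return all(seq[i] >= seq[i + 1] for i in range(k, len(seq) - 1))


def bitonic_sort(values):
    """Sort a list whose length is a power of two into increasing order."""
    data = list(values)
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                        # stage: merge bitonic runs of length k
        j = k // 2
        while j >= 1:                    # processing step within the stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    mode = "min_first" if (i & k) == 0 else "max_first"
                    data[i], data[partner] = node(data[i], data[partner], mode)
            j //= 2
        k *= 2
    return data


print(is_reverse_bitonic([4, 8, 3, 1]))              # True
print(is_reverse_bitonic([1, 3, 4, 8, 7, 6, 5, 2]))  # True
print(bitonic_sort([8, 4, 3, 1, 2, 7, 6, 5]))        # [1, 2, 3, 4, 5, 6, 7, 8]
```

The node mode depends only on the stage and on the node position, which is why the programming can be fixed in advance or supplied as circulating masks.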


C. CMOS-SEED- and FET-SEED-Based Architectures

A possible CMOS-SEED- or FET-SEED-based sorter is shown in Fig. 3 for the MSB mode of operation. This architecture is analyzed first since the system layout is very close to that of the generic EX-CLIP, described in Subsection 2.A. Routing the data is carried out within a single chip (the sorting-node array), which encompasses the whole logic complexity of the sort. Each node treats a pair of input data words (2-input–2-output sorting node). The logic functionality of a single node is depicted in Fig. 3. Two state variables (S and D) are needed in the MSB case. Defining $A_n$ and $B_n$ as the pair of inputs at the nth bit plane lets the state variable $D_n$ be calculated as

$$D_n = D_{n-1} + A_n \bar{B}_n + \bar{A}_n B_n, \tag{1}$$

with $D_{-1} = 0$. Thus D latches to 1 when the first bit difference in the two inputs is detected. The second state variable latches to the value of $A_n$ for the bit plane at which the first bit difference occurs:

$$S_n = \bar{D}_{n-1} A_n \bar{B}_n + S_{n-1} D_{n-1}, \tag{2}$$

with $S_{-1} = 0$. Thus, after D has latched, the value of S describes which of the two input words is the larger. In processing the nth bit plane, the determination of the state variables requires the knowledge of the values calculated during the processing of the previous bit plane. This knowledge is obtained by the use of two electronic feedback loops, as shown in Fig. 3. $S_n$ is then fed, along with both inputs, into the exchange–bypass module (e/b) whose functionality is determined by two algorithm-control bits. These bits indicate whether the two inputs have to be exchanged or passed through at the output; they are defined by the bitonic-sorting algorithm and are known in advance of processing.²⁶ One control bit is global and can therefore be distributed electrically over the whole array. The other control bit varies with the pixel position; however, if required, it can be generated optically within the EX-CLIP architecture.⁴,²⁸ To achieve this optical generation it is necessary to introduce a single control bit plane (called a mask) before the set of words to be sorted; this bit plane circulates along with the data planes, providing all the necessary subsequent masks.

Fig. 2. Diagram of the Batcher bitonic sorting of eight numbers.

Fig. 3. Architectures for the 2 × 2 CMOS-SEED- or FET-SEED-based bitonic sorter. Each sorting node (gray box) encompasses the exchange–bypass (e/b) and the sorting logic.

The sorting-module clock period is the time between optical data planes. At each processing step, (L + F) activations of the clock are needed to process the words, where F is the number of clock signals required to load the control mask and to reset the S and D modules to their initial (n = −1) state. In the present scheme a value of F = 2 is used. There are, in effect, (L + F) active planes around the physical EX-CLIP circuit, a number L of which contain data arrays at any given time. All of the (L + F) smart-pixel arrays are clocked simultaneously. In practice, the active planes that form the memory unit are configured on a single smart-pixel chip. If one takes into account the loading of the input data planes at the beginning of the algorithm, then the total number of clock periods is given by

$$N^{\mathrm{MSB}}_{\mathrm{hybrid}} = [m^2 - m + 1](L + F). \tag{3}$$
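The bit-serial behavior described by Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal model under stated assumptions: the overbars lost in extraction are read as $A_n \bar{B}_n + \bar{A}_n B_n$ in Eq. (1) and $\bar{D}_{n-1} A_n \bar{B}_n$ in Eq. (2), words are lists of bits with the MSB first, and the two algorithm-control bits of the exchange–bypass module are not modeled (the function simply returns the larger and the smaller word). All function names are hypothetical.

```python
def to_bits(value, width):
    """Return `value` as a list of `width` bits, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def sorting_node_msb(word_a, word_b):
    """Bit-serial 2 x 2 node: returns (larger word, smaller word, D, S)."""
    d = 0  # D_{-1} = 0: no bit difference seen yet
    s = 0  # S_{-1} = 0
    larger, smaller = [], []
    for a, b in zip(word_a, word_b):
        d_prev, s_prev = d, s
        # Eq. (1): D latches to 1 at the first bit plane where A and B differ
        d = d_prev | (a & (1 - b)) | ((1 - a) & b)
        # Eq. (2): S takes the value of A at that bit plane and then holds it
        s = ((1 - d_prev) & a & (1 - b)) | (s_prev & d_prev)
        # Routing for this bit plane (S = 1 means A is the larger word);
        # while D = 0 the two bits are equal, so either routing is correct
        larger.append(a if s else b)
        smaller.append(b if s else a)
    return larger, smaller, d, s


hi, lo, d, s = sorting_node_msb(to_bits(11, 4), to_bits(13, 4))
print(from_bits(hi), from_bits(lo), d, s)  # 13 11 1 0
```

Only the previous values of D and S are needed at each bit plane, which corresponds to the two feedback loops of Fig. 3.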

CMOS- and FET-SEED arrays of the functionality described by Eqs. (1) and (2) have been designed and fabricated by the European Silicon Structures (ES2) and AT&T foundry services, respectively. The 2 × 2 sorting nodes in both designs perform the same functions. Only the MSB mode of operation is considered here because the chips have been designed and laid out for this mode. The chip area of the 2 × 2 node is dominated by that of the exchange–bypass units. Reduction of the pixel area (and therefore more pixels per chip) is achieved by the replacement of these units by pairs of 2-input–1-output units situated on separate chips. The outputs A, B, and S from the array of comparison units are fanned out optically onto two 2 × 1 node arrays. The required number of clock periods is not increased by this change; however, the period itself is increased because of the added transfer time between the two arrays and the effect of additional fan-out (see Section 3). The trade-off between the increased parallelism and the reduced clock frequency is quantified in Section 8.

D. L-SEED-Based Architecture

Prior to the availability of hybrid optoelectronic smart pixels, a range of all-optically addressed L-SEED chips was fabricated at AT&T Laboratories.¹³,¹⁵ The data must remain optical in this technology, as a consequence of which the architectures described in Subsections 2.A and 2.C must be modified. Figure 4 shows a possible L-SEED-based sorter for the MSB mode. The state variables $D_n$ and $S_n$, defined as in Eqs. (1) and (2), are generated by the diode-connected logic of specifically designed L-SEED’s. These L-SEED’s and the exchange–bypass modules, also designed with L-SEED’s, are detailed in Section 4. As in the CMOS-based sorter, the data are processed in pairs. S-SEED arrays D and S are used to store the state variables $D_{n-1}$ and $S_{n-1}$ of the previous bit plane. The new state variables $S_n$ and $D_n$ are generated in the L-SEED arrays $S_{\mathrm{logic}}$ and $D_{\mathrm{logic}}$. $S_n$ and the input data are then transferred onto the exchange–bypass modules, which perform the switching decision. The control bits $C_1$ and $C_2$ are provided by the output patterns of two control SLM’s. These SLM’s indicate the direction of the switch at each processing step of the sorting algorithm.

Fig. 4. Architecture for the L-SEED-based bitonic sorter. The functionality of the sorting is distributed over several specifically designed L-SEED’s.

The subsidiary data cycling in the S (and D) loops increases the number of clock periods needed to complete the data sorting by comparison with the CMOS-SEED case. Whenever $S_n$ needs to be determined, that is, whenever a switch decision has to be made, 3(2L + 1) clock periods are used. $S_n$ needs to be determined for only (m² − m)/2 of the (m² − m + 1) processing steps; the data are passed through uncompared on the remaining steps. If one takes into account loading the data into the memory, the total number of clock periods is given by

$$N^{\mathrm{MSB}}_{\mathrm{L\text{-}SEED}} = \left[\frac{m^2 - m}{2}\,3(2L + 1) + (m^2 - m + 2)L\right]. \tag{4}$$

E. S-SEED-Based Architecture

The logic functionality of the S-SEED is limited to NAND and NOR operations.¹³,¹⁴ The resultant architecture chosen for the sorting system, shown in Fig. 5, has a set of loops used to implement the comparison between the sets of input pairs and the routing decisions.²⁹ These iterative loops are preferred to other configurations because of the logic simplicity of the pixels and the resulting reduction in optical hardware. The outer loop encompasses an S-SEED logic unit (S-SEEDLU), two interconnection patterns (perfect shuffle and stacked deck²⁹) and two S-SEED arrays (S-SEED 1 and S-SEED 2). Other loops are formed by the S-SEED logic unit and one of the parallel sets of S-SEED memory arrays. One accomplishes the 2-input–2-output self-routing functionality by cycling the data around the loops several times, involving the use of several S-SEED arrays.

It is not the purpose of this paper to describe in detail the sequence of instructions that lead to the sorting of data. Two phases, however, can be distinguished:

1. The generation of a 2-D array of information bits, i.e., the mask $S_n$, which indicates the position of the larger bit for each pair of input data in the array.
2. The routing of the input data according to the above-mentioned mask and specific 2-D input patterns provided by SLM 1 and SLM 2.

The outer loop is used to generate intermediate data patterns that are stored in one of the memories (phase 1) and is also used to route the data (phase 2). The other loops are utilized to compute the mask in the S-SEED logic unit (phase 1). An additional memory unit is necessary to store the value of the variable $D_n$.

Fig. 5. Architecture for the S-SEED-based bitonic sorter. Only NOR and NAND operations are permitted for S-SEED’s, resulting in a more complex architecture than that shown in Figs. 3 and 4.

For sorting $2^m$ words of length L bits, 20 clock periods are needed to produce the mask, and routing is carried out in 11 clock periods/bit plane. Additional processing steps are needed, for which all the nodes must let the input data pass straight through, as in the L-SEED case. The total number of clock cycles is

$$N^{\mathrm{MSB}}_{\mathrm{S\text{-}SEED}} = \left[\frac{m^2 - m}{2}(20 + 11) + (m^2 - m + 2)\right]L = [33m^2 - 33m + 4](L/2). \tag{5}$$

Practical data-array sizes will depend on the chosen technology. A standard size of 32 × 32 (m = 10), 8-bit words can be used to give some feel for the efficiencies of the architectures. For the MSB operation for each case, the number of clock periods required for a complete data sort is $N_{\mathrm{hybrid}} = 910$, $N_{\mathrm{L\text{-}SEED}} = 3031$, and $N_{\mathrm{S\text{-}SEED}} = 11{,}896$.
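The three clock-period counts quoted above follow directly from Eqs. (3)–(5) with m = 10, L = 8, and F = 2. The short Python check below reproduces them; the function names are illustrative only.

```python
def clock_periods_hybrid(m, L, F=2):
    """Eq. (3): CMOS-SEED or FET-SEED 2 x 2 sorting nodes."""
    return (m * m - m + 1) * (L + F)


def clock_periods_lseed(m, L):
    """Eq. (4): L-SEED-based sorter (m*m - m is always even)."""
    return (m * m - m) // 2 * 3 * (2 * L + 1) + (m * m - m + 2) * L


def clock_periods_sseed(m, L):
    """Eq. (5): S-SEED-based sorter (33*m*m - 33*m + 4 is always even)."""
    return (33 * m * m - 33 * m + 4) // 2 * L


m, L = 10, 8  # 32 x 32 array of 8-bit words
print(clock_periods_hybrid(m, L))  # 910
print(clock_periods_lseed(m, L))   # 3031
print(clock_periods_sseed(m, L))   # 11896
```

For this array size the S-SEED sorter therefore needs roughly 13 times as many clock periods as the hybrid sorter, which must be weighed against its shorter cycle time and higher pixel density.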

3. SEED-Based Devices

A. SEED Optical Response

The optoelectronic elements considered in this paper are MQW p-i-n photodetectors and electro-absorption modulators, all used in a dual-rail logic mode.¹⁴ Whereas the S-SEED and L-SEED rely on the electro-optic effect for bistability, the latching behavior of the FET-SEED and CMOS-SEED is implemented by electronics. As a consequence, the operational wavelength for the hybrid pixels can be different from those for S- and L-SEED’s. For all optoelectronic elements the most appropriate experimental characteristics for modeling the devices are the reflectivity R and responsivity S.³⁰,³¹ Figures 6 and 7 show the SEED-response characteristics as functions of the voltage drop across the diode for the two relevant operational wavelengths,³² traditionally called $\lambda_0$ and $\lambda_1$. The corresponding idealized models used in the present analysis are shown as straight, solid line segments. The S-SEED and L-SEED are operated at the heavy-hole exciton absorption-peak wavelength $\lambda_0$ at a zero voltage drop.¹² The negative-conductance region of the responsivity curve ensures that, at this wavelength, bistability and latching behaviors exist.

Fig. 6. Experimental SEED-responsivity S curves at wavelengths of $\lambda_0$ (solid curves) and $\lambda_1$ (dashed curves). The two-line idealized models are also represented (straight, solid line segments). The superscripts 0 and 1 represent $\lambda_0$ and $\lambda_1$ operation, respectively.

At longer wavelengths, the device is no longer bistable but a greater output contrast ratio is experienced.²³ The precise wavelength used for the hybrid pixels, $\lambda_1$, is usually determined by the simultaneous optimization of the photoreceiver and the modulator.³²,³³ Modeling SEED and S-SEED operation has previously been based on either the empirical data or three-line approximations to the reflectivity and responsivity.³¹,³⁴ For the present purpose a simpler (two-line) approximation is appropriate. It is adequate to divide the voltage range into just two regimes, in each of which the reflectivity R and responsivity S are independent of the voltage.³⁰ The regimes are shown in Figs. 6 and 7 for each wavelength.

B. Saturation Effects

It is shown below that, for a fast electronic responseof the photodetectors, a high input optical power isnecessary. As the power increases, so does the pos-sible operating frequency because the time to convertthe optical signals into electronic signals decreases.However, eventually saturation effects start to dom-inate, with the result that the operating frequencyactually decreases again.At a phenomenological level, the irradiance-

dependent absorption coefficient a at the heavy-holeexciton peak can be described by22–24

a~I! 5a0

1 1 IyIsat, (6)

Fig. 7. Experimental SEED reflectivity R at l0 ~solid curves! andl1 ~dashed curves!, and the two-line idealized models ~straight,solid line segments!. The superscripts 0 and 1 are the same as forFig. 6.


where I = P/(πz²) is the irradiance of the Gaussian laser beam of waist z, α0 is the absorption coefficient at low optical irradiance, and Isat is the saturation irradiance. The absorption saturation affects the reflectivity R and responsivity S responses at low voltages. The peak responsivity Sp decreases in magnitude and shifts toward higher voltage. The low-level reflectivity Rp increases accordingly as the absorption coefficient decreases.35–37 During the transfer of information from one array to another, the time required to convert optical into electronic signals increases at high power levels because

1. The decreasing difference in low-level and high-level reflectivity states of the output modulators of the previous array induces a reduction in the optical-power difference.
2. The peak responsivity of the photodetectors decreases, which reduces the induced photocurrent.

The variations of the responsivity and reflectivity in the higher-voltage regime, which are weaker than those in the low-voltage region, are neglected in the present model. The power dependence of the voltage at the peak in the responsivity is also neglected. At λ0, the low-level reflectivity and the corresponding responsivity of the SEED are then

$$R_p(P) = R_p(0)^{x}, \quad \text{where } x = \frac{1}{1 + P/P_{\mathrm{sat}}}, \tag{7}$$

$$S_p(P) = S_p(0)\,\frac{1 - R_p(P)}{1 - R_p(0)}. \tag{8}$$

The power dependences of these response functions are used in the following subsections to calculate the throughput rate and power consumption of smart-pixel arrays. Saturation effects can usually be neglected at λ1.23 At this wavelength, the absorption increases with the applied electric field. The resulting increase of the photogenerated carriers does not bleach the excitonic peak because the carriers are swept away by the strong field.
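The saturation model of Eqs. (6)–(8) can be evaluated numerically. The sketch below makes two assumptions that should be checked against the original typeset paper: Eq. (7) is read as Rp(0) raised to the power x, which is the reading consistent with the stated increase of the low-level reflectivity under saturation, and Psat is taken as Isat times a 5 μm × 5 μm spot area. All function names are illustrative.

```python
# Hedged sketch of Eqs. (6)-(8) with Table 1 values at lambda_0.

def saturation_factor(p, p_sat):
    """x = 1 / (1 + P / P_sat), the same form as Eq. (6)."""
    return 1.0 / (1.0 + p / p_sat)

def low_reflectivity(p, p_sat, r_p0=0.125):
    # Eq. (7), read here as R_p(0) ** x (assumption, see lead-in)
    return r_p0 ** saturation_factor(p, p_sat)

def peak_responsivity(p, p_sat, s_p0=0.5, r_p0=0.125):
    # Eq. (8): responsivity scales with the absorbed fraction 1 - R_p
    return s_p0 * (1.0 - low_reflectivity(p, p_sat, r_p0)) / (1.0 - r_p0)

# Assumed saturation power: 3 kW/cm^2 over a 5 um x 5 um spot (5 um = 5e-4 cm)
P_SAT = 3e3 * (5e-4 * 5e-4)  # W

for p in (0.0, 0.1 * P_SAT, P_SAT):
    print(f"P = {p:.2e} W  R_p = {low_reflectivity(p, P_SAT):.3f}  "
          f"S_p = {peak_responsivity(p, P_SAT):.3f} A/W")
```

With this reading, increasing the power raises Rp toward Rm and lowers Sp, which are the two effects listed above as slowing the optical-to-electronic conversion.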

C. Modes of Operation of the Laser Source

The laser sources for smart-pixel circuitry can be operated cw or, at a high modulation rate, quasi-cw. Typically cw power levels of 1 W are now achieved for semiconductor diode-laser bars. In the quasi-cw mode, a laser source such as the master-oscillator power amplifier38 has been successfully operated at 0.6-W time-averaged power for a modulation rate of 1 Gbit/s. In the analysis presented in Section 4, quasi-cw is taken to refer to any periodically modulated radiation, regardless of the mark-to-space ratio. In all cases a maximum (peak) power of 1 W is assumed.

D. Optical Losses

In all architectures optical losses must be accounted for in the calculation of the throughput rate, because the processing time of the arrays is power dependent. The most lossy optical path from a laser source to the detectors of a smart-pixel array determines the overall system frequency. The efficiency of this path, η, is4

$$\eta = \eta_{\mathrm{bpg}}\,\eta_{\mathrm{bulk}}\,\eta_{\mathrm{inter}}, \tag{9}$$

where the subscript bpg denotes a binary phase grating, bulk denotes the bulk optics, and inter denotes the optical interconnects. Typically the efficiency of the binary phase grating, which generates the array of beams from the laser source, has a value of ηbpg = 0.8. For the bulk optics, which encompass the relay lenses and beam splitters, the typical value is ηbulk = 0.5, and for the shuffle interconnects it is ηinter = 0.7. The resultant efficiency is therefore η = 0.28.

The total laser-source power Ptot is divided into 2^(m+1) beams for sorting 2^m words if dual-rail logic is employed. The two (dual-rail) beam powers incident upon the individual photodetectors are therefore

$$P_{\mathrm{high}} = 2^{-m-1}\,\eta\,R_m\,P_{\mathrm{tot}}, \qquad P_{\mathrm{low}} = 2^{-m-1}\,\eta\,R_p\,P_{\mathrm{tot}}. \tag{10}$$

It is shown below that the difference in these levels determines the system performance for the chosen optoelectronic hardware.
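As a worked example of Eqs. (9) and (10), the sketch below evaluates the path efficiency and the dual-rail beam powers for a 32 × 32 channel array (m = 10) driven by 1 W. The reflectivities are the λ0 values of Table 1 without saturation; the function names are illustrative.

```python
# Worked example of Eqs. (9) and (10); saturation of R_p is neglected here.

ETA_BPG, ETA_BULK, ETA_INTER = 0.8, 0.5, 0.7

def path_efficiency():
    # Eq. (9): product of grating, bulk-optics, and interconnect efficiencies
    return ETA_BPG * ETA_BULK * ETA_INTER

def beam_powers(m, p_tot=1.0, r_m=0.375, r_p=0.125):
    """Return (P_high, P_low) in watts for 2**m words in dual-rail logic."""
    per_beam = p_tot / 2 ** (m + 1)   # source power shared by 2**(m+1) beams
    eta = path_efficiency()
    return eta * r_m * per_beam, eta * r_p * per_beam

p_high, p_low = beam_powers(10)
print(f"eta = {path_efficiency():.2f}")
print(f"P_high = {p_high * 1e6:.1f} uW, P_low = {p_low * 1e6:.1f} uW")
```

Each doubling of the channel number halves both powers, and therefore halves the difference Phigh − Plow that drives the receivers.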

4. Clock Period

A. Introduction

Knowledge of the clock period is required to determine the throughput rate. Figure 8 is a schematic diagram of a smart pixel decomposed into functional blocks with their respective latency times; the processing time T of a smart-pixel array is the sum of the different times shown. The clock period must be at least as long as the time T. The optical-to-electronic conversion time Tconv depends on the optical input power on the photodetectors. To reduce the required optical input power, an amplification stage is often added at the front end of the most complex hybrid smart pixel.39 Tamp is the time needed to amplify the signal for logic processing. Amplification of the electronic signals takes place as soon as the input voltage has reached the threshold voltage, that is, before the full conversion of the optical signals; Tconv and Tamp therefore overlap each other. Telec is the propagation time of the electronic signals from the

Fig. 8. Functional schematic diagram of a smart pixel. The different times depicted are explained in the text.


output of the amplification stage to the input of the last electronic stage. This time is usually determined by means of the computer program Simulation Program with Integrated Circuit Emphasis (SPICE), although analytical determination is possible for some cases.40 The switching time of the output electronic stage Tout depends on the optical power if the laser source that reads the output modulators is operated in the cw mode, as explained in Subsection 4.D.41 Finally, the time of flight of the information signals between arrays, Topt, must also be included. Typically Topt is of the order of 1 ns. The processing times of the different pixels are analyzed in Subsections 4.B–4.D. Quasi-cw operation can be employed for all devices; for the CMOS- and FET-SEED's, a cw mode can also be utilized, as explained in Subsection 4.D.

B. Timing of the S-SEED

In the quasi-cw mode, the read beams of one array are switched off as soon as the full input-voltage swing ΔVin has been reached by the receivers of the next array. For simplicity, the laser source is assumed to generate square pulses of a duration w and peak power Ppeak. For S-SEED arrays the mark-to-space ratio must equal unity, and the processing time is given by34,42

$$T_{\text{S-SEED}} = 2T_{\mathrm{conv}} + T_{\mathrm{opt}} = 2C_{\text{S-SEED}} \int_{V_{\mathrm{init}}}^{V_{\mathrm{fin}}} \frac{dV}{\left| P_R\,S(V) - P_S\,S(V_0 - V) \right|} + T_{\mathrm{opt}}, \tag{11}$$

where PR and PS are the powers of the beams derived from the S-SEED R and S windows, respectively, of a previous array,14 Vinit and Vfin are the initial and final voltages, respectively, across the R window, CS-SEED is the total capacitance of the two photodetectors, and V0 is the bias voltage applied to the S-SEED.

An analytical expression for the processing time can be provided by the use of the idealized model described in Subsection 3.A. The responsivity S is equal to Sp over the voltage range [−Vt, Vm] and is equal to Sm elsewhere; the reflectivity R is equal to Rp and Rm over these voltage ranges, respectively. Again, in Figs. 6 and 7 the superscripts on Sp, etc., refer to operation at λ0 and λ1, respectively. For simplification of the notation, the power of the individual beams (from the binary phase grating) at the output modulators is defined as Pread = 2^(−m−1) ηbpg Ptot.

The low-level reflectivity from an S-SEED modulator and the peak responsivity of the following, receiving S-SEED detector both depend on the individual beam powers, according to Eqs. (7) and (8):

$$R_p = R_p(P_{\mathrm{read}}), \tag{12}$$

$$S_p = S_p(P_{\mathrm{low}}) = S_p\!\left[2^{-m-1}\,\eta\,R_p(P_{\mathrm{read}})\,P_{\mathrm{tot}}\right], \qquad S_p(P_{\mathrm{high}}) = S_p\!\left(2^{-m-1}\,\eta\,R_m\,P_{\mathrm{tot}}\right), \tag{13}$$


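depending on the S-SEED window. Equation (11) then becomes

$$T_{\text{S-SEED}} = 4C_{\mathrm{cap}}A \left[ \int_{-V_t}^{V_m} \frac{dV}{S_m P_R - S_p P_S} + \int_{V_m}^{V_0 - V_m} \frac{dV}{S_m (P_R - P_S)} + \int_{V_0 - V_m}^{V_0 + V_t} \frac{dV}{S_p P_R - S_m P_S} \right] + T_{\mathrm{opt}}, \tag{14}$$

where CS-SEED = 2CcapA, Ccap is the capacitance per unit area, and A is the window area of the photodetector. The S-SEED voltage is assumed to swing from −Vt to V0 + Vt. With the approximation for the SEED responses of Figs. 6 and 7 we have

$$T_{\text{S-SEED}} = 4C_{\mathrm{cap}}A \left[ \frac{V_m + V_t}{S_m P_{\mathrm{high}} - S_p(P_{\mathrm{low}})\,P_{\mathrm{low}}} + \frac{V_0 - 2V_m}{S_m (P_{\mathrm{high}} - P_{\mathrm{low}})} + \frac{V_t + V_m}{S_p(P_{\mathrm{high}})\,P_{\mathrm{high}} - S_m P_{\mathrm{low}}} \right] + T_{\mathrm{opt}}. \tag{15}$$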

Typical values for TS-SEED exceed 20 ns; Topt makes up only a few percent of this period. In practice the two windows are fabricated on a mesa, which increases the effective S-SEED capacitance. In the present analysis it is assumed that the mesa contribution can be made small; in the initial technology this has not been the case, and the resulting switching times and switching energies are an order of magnitude larger than those given above.
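The closed-form clock period of Eq. (15) is straightforward to evaluate. The sketch below uses the Table 1 values at λ0 and, as a simplifying assumption, neglects saturation, so that Sp(Plow) = Sp(Phigh) = Sp(0). The example beam powers assume 2^11 channels sharing 1 W according to Eq. (10). Because of these simplifications the printed number is not expected to reproduce the curves of the paper; the function name is illustrative.

```python
# Hedged sketch of Eq. (15); saturation is neglected by default.

def s_seed_clock_period(p_high, p_low, s_p_low=0.5, s_p_high=0.5, s_m=0.3,
                        v0=8.0, v_m=4.0, v_t=1.0,
                        c_cap=200e-18, area_um2=50.0, t_opt=1e-9):
    """Return T_S-SEED in seconds (c_cap in F/um^2, powers in W)."""
    prefactor = 4.0 * c_cap * area_um2
    bracket = ((v_m + v_t) / (s_m * p_high - s_p_low * p_low)
               + (v0 - 2.0 * v_m) / (s_m * (p_high - p_low))
               + (v_t + v_m) / (s_p_high * p_high - s_m * p_low))
    return prefactor * bracket + t_opt

# Assumed example: m = 11, eta = 0.28, R_m = 0.375, R_p = 0.125, P_tot = 1 W
per_beam = 1.0 / 2 ** 12
p_high = 0.28 * 0.375 * per_beam
p_low = 0.28 * 0.125 * per_beam
t_clock = s_seed_clock_period(p_high, p_low)
print(f"T_S-SEED = {t_clock * 1e9:.1f} ns")
```

With V0 = 2Vm, as in Table 1, the middle term vanishes and the period is set by the two end regimes, where the photocurrent difference is smallest.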

C. Timing of the L-SEED

An expression similar to Eq. (11) can be obtained for the processing time of an L-SEED by the replacement of CS-SEED with an effective capacitance Ceff, which depends on the pixel smartness.15 Attenuation of some of the input beams of the L-SEED is also necessary for correct operation. As for the S-SEED, there are no electronic amplification or output stages, so that Tamp = 0 and Tout = 0. The propagation times of the electrical signals within the pixel are also negligible, so that Telec = 0.

The most complex L-SEED determines the frequency of operation of the system. This is likely to be the circuit that generates the variables Sn and is shown in Fig. 9. For such a circuit some of the optical signal powers must be reduced in order that the generated photocurrents produce the correct logic functionality [Eq. (2)]. This reduction influences the potential operating frequency. For example, the photocurrent generated at Dn−1 or Sn−1 in the upper half of the circuit must exceed the sum of the photocurrents produced by Dn−1, An, and Bn in the lower half. The optical input powers at the latter SEED detectors must therefore be one third of those powers at the upper-half SEED's. The determination of the switching times in this complex circuit would involve


the study of the time evolution of the voltage nodes.15 It is, however, possible to provide an estimate of the switching time by a consideration of the effective capacitance. In general, serially connected diodes can be replaced by the diode that generates the least current. It can also be shown that one of two groups of the parallel connected diodes in the bottom half of the circuit can be discarded.15 The total capacitance is then Ceff = 7CcapA. The factor of 7 stems from the sum of the two capacitances at the output: two for the top half and three for the bottom half. The processing time is then

$$T_{\text{L-SEED}} = 21C_{\mathrm{cap}}A \left[ \int_{-V_t}^{V_m} \frac{dV}{S_m P_R - S_p P_S} + \int_{V_m}^{V_0 - V_m} \frac{dV}{S_m (P_R - P_S)} + \int_{V_0 - V_m}^{V_0 + V_t} \frac{dV}{S_p P_R - S_m P_S} \right] + T_{\mathrm{opt}}. \tag{16}$$

Note that TL-SEED is approximately equal to 21(TS-SEED/4), given the small contributions from Topt. For the logic functionalities relevant here L-SEED clock periods exceed 100 ns.

D. Timing of the FET- and CMOS-SEED

As in the cases described above, the difference in input optical-pulse energies on the dual-rail detectors must be sufficient to produce a voltage swing ΔVin. In the case of hybrid devices, however, this need be only a small voltage swing about the threshold voltage of the input transistor. The replacement of the responsivity function by its mean value S̄ near the threshold voltage is therefore justified. Any excursion of the voltage swing beyond ΔVin is usually prevented by the addition of clamping diodes at the receivers,39 as shown in Fig. 10, which depicts the front end of such a smart pixel. We consider here only a single-stage voltage amplifier with zero output

Fig. 9. L-SEED circuit that generates the output Sn according to Eq. (13).

conductance, although the generalization to multistage amplifiers is straightforward.37 The pulse duration w times the difference of the high and low optical powers of the writing beams must satisfy the following condition:

$$w\,(P_{\mathrm{high}} - P_{\mathrm{low}}) \ge \Delta E_{\min} = C_{\mathrm{in}}\,\frac{\Delta V_{\mathrm{in}}}{\bar{S}}, \tag{17}$$

where Cin is the total capacitance formed by the detectors, the bonding pad, and the input transistor gate (Table 1). In practice, 75% of this energy is input before the onset of the amplification period.39 For an amplifier of constant transconductance gm, which establishes a logic-level voltage swing ΔVlog, the conversion and amplification times are then39,40

$$T_{\mathrm{conv}} + T_{\mathrm{amp}} = \frac{3}{4}\,C_{\mathrm{in}}\,\frac{\Delta V_{\mathrm{in}}}{\bar{S}}\,\frac{1}{P_{\mathrm{high}} - P_{\mathrm{low}}} + 2C_{\mathrm{out}}\,\frac{\Delta V_{\mathrm{log}}}{g_m\,\Delta V_{\mathrm{in}}}, \tag{18}$$

where Cout is the output capacitance of the amplification stage. The switching time of the last electronic stage, which is required to establish a voltage V0 at the modulators, depends on the mode of operation of the smart pixel. In the quasi-cw mode, the modulator states are prepared prior to the introduction of the optical read beams, so that we have

$$T_{\mathrm{out}}(P = 0) = \frac{2C_{\mathrm{ex}}V_0}{\Delta I_{\mathrm{trans}}}, \tag{19}$$

where Cex is the output capacitance of the smart pixel and ΔItrans is the difference of the transistor currents flowing to the output capacitance. In FET- and CMOS-SEED-based circuits, cw lasers may be used, provided that the electronic drivers of the output modulators are properly scaled to sink rapidly the photocurrents created by the continuous reading of the modulators. The switching time is affected by the read-beam powers through the photocurrents,41

Fig. 10. Schematic diagram of a diode-clamped smart pixel and an example of the FET-SEED. The clamping diodes limit the excursion of the input voltage. Vbi is the built-in voltage of the diodes.


as shown by

$$T_{\mathrm{out}}(P) = T_{\mathrm{out}}(0)\,\frac{f}{2}\,\ln\!\left(\frac{f + 1}{f - 1}\right), \tag{20}$$

where f = ΔItrans/ΔIph is the ratio of the difference of the transistor currents to the difference of the photocurrents of the modulators. Tout(P) approaches Tout(P = 0) when the transistors are properly dimensioned. Tout(P = 0) is of the order of tens of picoseconds, so that the output time can generally be neglected.

The clock period for the case of the hybrid devices must exceed the processing time:

$$T_{\mathrm{hybrid}} = \frac{3}{4}\,C_{\mathrm{in}}\,\frac{\Delta V_{\mathrm{in}}}{\bar{S}}\,\frac{1}{P_{\mathrm{high}} - P_{\mathrm{low}}} + 2\,\frac{C_{\mathrm{out}}\,\Delta V_{\mathrm{log}}}{g_m\,\Delta V_{\mathrm{in}}} + T_{\mathrm{elec}} + T_{\mathrm{opt}}, \tag{21}$$

where Telec is determined, for example, by SPICE simulations. Typical values of Telec are 10 ns for CMOS technology and 3 ns for FET components. The second term on the right-hand side in Eq. (21) contributes approximately 1 ns to Thybrid (see Table 1). Because of the power dependence of the first term on the right-hand side, its value depends strongly on the array size; for a 32 × 32 array subject to a 1-W laser-source power, this conversion period is of the order of 3 ns.
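The individual terms of Eq. (21), and the output-time factor of Eq. (20), can be checked numerically. In the sketch below the input capacitance is an assumption: it is taken as two 10 μm × 10 μm detectors plus the bond pad and input gate of Table 1, which gives 300 fF. The beam powers assume m = 10 at λ1 with η = 0.28. Function names are illustrative.

```python
import math

# Hedged sketch of Eqs. (20) and (21) using Table 1 values.

def hybrid_clock_period(p_high, p_low, c_in, t_elec,
                        dv_in=0.25, s_mean=0.5, c_out=120e-15,
                        dv_log=1.0, g_m=1e-3, t_opt=1e-9):
    """Return (T_hybrid, conversion term, amplification term) in seconds."""
    t_conversion = 0.75 * c_in * dv_in / (s_mean * (p_high - p_low))
    t_amplification = 2.0 * c_out * dv_log / (g_m * dv_in)
    total = t_conversion + t_amplification + t_elec + t_opt
    return total, t_conversion, t_amplification

def output_time_ratio(f):
    """T_out(P) / T_out(0) from Eq. (20); requires f > 1."""
    return 0.5 * f * math.log((f + 1.0) / (f - 1.0))

# Assumed input capacitance: 2 detectors (20 fF each) + pad + gate
C_IN = 2 * 200e-18 * 100.0 + 200e-15 + 60e-15
per_beam = 1.0 / 2 ** 11                   # m = 10, P_tot = 1 W
p_high = 0.28 * 0.60 * per_beam            # R_m at lambda_1
p_low = 0.28 * 0.10 * per_beam             # R_p at lambda_1
total, t_c, t_a = hybrid_clock_period(p_high, p_low, C_IN, t_elec=10e-9)
print(f"conversion {t_c * 1e9:.2f} ns, amplification {t_a * 1e9:.2f} ns, "
      f"T_hybrid {total * 1e9:.2f} ns")
print(f"T_out ratio for f = 10: {output_time_ratio(10.0):.4f}")
```

The amplification term is independent of the optical power, whereas the conversion term doubles each time the channel number doubles; for CMOS the fixed Telec of about 10 ns dominates at moderate array sizes.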

5. Throughput Rate and Results Based onLaser-Power Restrictions

In this section, the expressions of Section 2 for the number of clock periods required for the sorting task

Table 1. Definitions and Selected Values of the S-SEED-Technology Device Parameters

| Definition | Symbol | Value |
|---|---|---|
| Applied voltage | V0 | 8 V |
| Voltage threshold of minimum responsivity | Vm | 4 V |
| Voltage threshold for current | −Vt | −1 V |
| Maximum responsivity voltage | Vp | 1 V |
| Peak responsivity | Sp | 0.5 A/W |
| Minimum responsivity | Sm | 0.3 A/W |
| Mean responsivity | S̄ | 0.5 A/W |
| Logic-0 reflectivity at λ0 | Rp | 0.125 |
| Logic-1 reflectivity at λ0 | Rm | 0.375 |
| Logic-0 reflectivity at λ1 | Rp^1 | 0.10 |
| Logic-1 reflectivity at λ1 | Rm^1 | 0.60 |
| Saturation irradiance | Isat | 3 kW/cm² |
| S-SEED window area | A | 5 × 10 μm² |
| L-SEED window area | A | 5 × 5 μm² |
| CMOS- and FET-SEED window area | A | 10 × 10 μm² |
| Maximum input-voltage swing | ΔVin | 0.25 V |
| Logic-voltage swing | ΔVlog | 1 V |
| Word length | L | 8 |
| Transconductance | gm | 10^−3 S |
| Input-gate capacitance | Cinp | 60 fF |
| Bond-pad capacitance | Cpad | 200 fF |
| First electronic stage output capacitance | Cout | 120 fF |
| Detector capacitance per area | Ccap | 200 aF/μm² |


are combined with those of Section 4 for the time scale of a single clock period. In each case in this section, the throughput rate is given for devices operated at their optimum conditions. That is, for a given chip size and problem size, the total laser-source power per pixel array has been calculated (in the limit of a 1-W peak power) so as to provide the maximum throughput rate. The system throughput rate H is defined as the number of words that can be sorted per second. If the sorting per word is regarded as an operation, the performance metric H is expressed in millions of operations per second (Mops).

For each system, the maximum throughput rate is

$$H(m) = 10^{-6}\,\frac{2^{m}}{N(m)\,T(m)}, \tag{22}$$

where N and T are the number of clock periods for sorting and the processing times, respectively. Note that T depends on m through the power available per pixel.

The parameter H is especially useful in comparing the performance of different technologies for identical array sizes (equal m). It must be used with care, however, for comparisons involving different m values. The reason is that sorting 2^m words is more than twice as complex a problem as sorting 2^(m−1) words. H may be modified so as to make direct comparisons by the technique discussed in Ref. 4. For the hybrid technologies, sorting a given number of words, 2^M, say, by the use of a physical array of size 2^m is performed at a throughput rate of

$$H(M, m) = \begin{cases} H(m)\,\dfrac{1}{(M - m)^2 + (M - m) + 1}, & M > m, \\[2ex] H(m)\,\dfrac{m^2 - m + 1}{(m^2 - m + 1) - (m - M)\,m}, & M < m. \end{cases} \tag{23}$$

The architectures employed for the S-SEED and L-SEED technologies lead to more complicated relations, which are accounted for in the following analysis. The time taken to sort a single set of 2^M words is

$$T_{\mathrm{sort}} = \frac{2^{M}}{10^{6}\,H(M, m)}\ \mathrm{s}. \tag{24}$$

For example, two sets of 1024 words (M = 10) can be sorted with an increase of 11% in the throughput rate by the use of a 32 × 64 channel number (m = 11), compared with the matched case (M = m = 10). Conversely, sorting 2048 (M = 11) words by the use of a 32 × 32 channel number (m = 10) has a throughput rate equal to 50% of that of the matched case (M = m = 11).
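The factor that multiplies H(m) in Eq. (23) can be tabulated directly. The sketch below is illustrative only: the function names are hypothetical, and the sort-time helper assumes that H is expressed in Mops, so that the time to sort 2^M words is 2^M divided by 10^6 H.

```python
# Hedged sketch of the mismatch factor in Eq. (23) and the sort time.

def mismatch_factor(big_m, m):
    """Return H(M, m) / H(m) for a problem of 2**M words on 2**m channels."""
    if big_m > m:
        d = big_m - m
        return 1.0 / (d * d + d + 1)
    q = m * m - m + 1
    return q / (q - (m - big_m) * m)

def sort_time_s(big_m, h_mops):
    """Time in seconds to sort 2**M words at a throughput of h_mops (Mops)."""
    return 2 ** big_m / (1e6 * h_mops)

print(f"M = 10 on m = 11: factor {mismatch_factor(10, 11):.2f}")
print(f"M = 11 on m = 10: factor {mismatch_factor(11, 10):.3f}")
```

For M = 10 and m = 11 the factor is 111/100, which is the 11% figure quoted above. Comparisons between different physical array sizes additionally involve the ratio of the corresponding H(m) values, which this factor alone does not capture.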

A. S-SEED-Based Sorter

The calculated throughput rates of the S-SEED-based sorter are shown in Fig. 11. In this and Figs. 12 and 13 (below), H is plotted as a function of the


number of input data channels per array (2^m). The maximum chip size, 1 cm², is indicated by a vertical hash mark on the lower horizontal axis. For the S-SEED case, each device corresponds to a data channel; given a total S-SEED pixel area12 of 20 μm × 40 μm, the channel numbers convert to the occupied chip areas indicated. The saturation irradiance22 is assumed to be 3 kW/cm², and the radiation spot size to be 5 μm × 5 μm. Two problem sizes, H(10, m) and H(16, m), have been analyzed; they correspond to sorting 1024 (M = 10) and 65,536 (M = 16) 8-bit words, respectively. For a small number of devices, the optical power available at each S-SEED is adjusted to provide the largest throughput rate. Thus, for array sizes below 16 × 32 S-SEED's, the optical power is reduced to avoid undue saturation, as indicated by the dotted curve in Fig. 12. Conversely, for array sizes in excess of 32 × 64 the reduction in the input power available per node leads to longer processing times. The reduction in the throughput rate at high array sizes occurs because the interconnect is assumed to be a fixed 2-D perfect shuffle of the full array size. The increase in the number of nodes that would exactly offset the increase in processing time cannot therefore be fully exploited.

For sorting 32 × 32 arrays of data, H(10, m), the optimum throughput achievable is 3.4 Mops (point A in Fig. 11). This optimum occurs for a 32 × 64 channel number and a clock period of 46 ns; a total of 13,244 clock periods is required, taking a total of approximately 0.6 ms to sort each data set. Note that the optimum in throughput does not correspond to the matched problem (m = M) in the H(10,

Fig. 11. Throughput rate and laser-source power for the S-SEED-based bitonic sorter plotted as functions of the channel number. Each S-SEED element corresponds to a channel. The solid curve indicates the throughput rate achieved for sorting data sets of 1024 elements. The dashed curve shows the throughput when 65,536 elements are to be sorted. The corresponding amount of laser-source power required in both cases is shown by the dotted curve. The vertical hash mark on the lower horizontal axis indicates the chip-area limit (1 cm²).

m) case because a greater number of devices operating at a slightly lower frequency can still increase the throughput rate. For larger data sets the optimum throughput rate naturally occurs for a larger channel number. A 256 × 256 array can be sorted at 1.4 Mops (point B, Fig. 11) by the use of a matched physical array. The clock period must be increased to 2.9 μs as a result of the source-power reduction per channel. The 31,696 clock periods required now need approximately 93 ms to sort each data set.
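The quoted sorting times follow from multiplying the number of clock periods by the clock period, and point A can be cross-checked against Eq. (22). The sketch below is a consistency check only; it reads the point B clock period as 2.9 μs, and the function names are illustrative.

```python
# Consistency check of the S-SEED figures quoted in the text.

def total_sort_time_s(n_clock_periods, t_clock_s):
    return n_clock_periods * t_clock_s

def throughput_mops(m, n_clock_periods, t_clock_s):
    # Eq. (22): words sorted per second, in millions
    return 1e-6 * 2 ** m / (n_clock_periods * t_clock_s)

# Point A: 32 x 64 channels (m = 11), 13,244 periods of 46 ns
t_a = total_sort_time_s(13244, 46e-9)
h_a = throughput_mops(11, 13244, 46e-9)
print(f"point A: {t_a * 1e3:.2f} ms, {h_a:.2f} Mops")

# Point B: 31,696 periods of 2.9 us (clock period read as microseconds)
t_b = total_sort_time_s(31696, 2.9e-6)
print(f"point B: {t_b * 1e3:.1f} ms")
```

Point A reproduces both the 0.6-ms sorting time and the 3.4-Mops throughput to within rounding of the quoted clock period.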

B. L-SEED-Based Sorter

Figure 12 shows the throughput rate and laser-source power for the H(10, m) case plotted as functions of the channel number. Note that each L-SEED (2 × 1) node receives two input data signals; the number of L-SEED pixels on a single chip is one half the data channel number. For comparison, the throughput rate of the S-SEED system is displayed (dashed curve). The area of a 2 × 1 L-SEED switching node is 55 μm × 55 μm, which is taken as the area for all L-SEED circuits of the system.12 A slightly lower throughput rate is achieved than for the S-SEED-based sorter. The optimum performance is 2.6 Mops for a 32 × 64 array size (point A, Fig. 12). The corresponding processing time TL-SEED is 240 ns; 3702 cycles are necessary to sort, in ~0.9 ms, the two sets of 1024 numbers. Thus, the increase in pixel intelligence compared with the single NAND–NOR operation of the S-SEED does not compensate for the increased clock period for the L-SEED architecture.

Fig. 12. Throughput rate and laser-source power for the L-SEED-based bitonic sorter plotted as functions of the channel number. Results of sorting 1024 data are represented by the solid curve. Each L-SEED node corresponds to one channel. The throughput rate of the S-SEED-based system is indicated for comparison (dashed curve).


C. CMOS- and FET-SEED-Based Sorters

Quasi-cw-mode operation is considered for detailed analysis at the λ1 operating wavelength, because this mode offers the shortest processing time. The calculation of the processing time requires the determination of the electronic-propagation times. Simulation of the electronic circuit predicts that the CMOS version can run at 100 MHz and the GaAs-based FET-SEED at approximately 315 MHz.

Figure 13 shows the throughput rates of the CMOS- and FET-SEED-based bitonic sorters as functions of the channel number for an assumed input-voltage swing of ΔVin = 0.25 V.28 The other numerical constants used are listed in Table 1. The smart pixel in each family is a 2 × 2 self-routing node; hence again the number of pixels is one half the channel number. The maximum array sizes that can be laid out on 1 cm² are indicated by vertical hash marks on the horizontal axes for the CMOS-SEED- and the FET-SEED-based sorters. In all cases, the peak powers at individual detectors are given by Eq. (10), where Ptot (the peak power of the laser source) is taken to be 1 W. The duration of the pulse is taken to be that required to satisfy Eq. (17) so that the optical energy received is the minimum needed to induce the full input-voltage swing ΔVin. For both technologies, the maximum throughput rate occurs at the maximum chip size.

The dimension of the 1-μm CMOS node is 200 μm × 400 μm, and a 32 × 64 channel array can be laid out on a 1-cm² chip.25 The maximum throughput rate is 140 Mops (point A, Fig. 13), which corresponds to a clock period of approximately 14 ns; the number of clock periods per sort is 1000. The area of the FET-SEED 2 × 2 node is 280 μm × 560 μm, and a 32 × 32 channel array can be laid out on a 1-cm² chip.

Fig. 13. Throughput rates of the CMOS-SEED- and FET-SEED-based 2 × 2 bitonic sorters plotted as functions of the channel number. Each node corresponds to two channels. The chip-area limits are indicated by the vertical hash marks on the horizontal axes.


The maximum throughput rate is 215 Mops (point B, Fig. 13), which corresponds to a clock period of approximately 3 ns, and 910 clock periods.
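The maximum array sizes quoted for a 1-cm² chip can be verified from the node dimensions. The sketch below assumes that the quoted dimensions are in micrometres and that the nodes tile the chip without routing overhead; the function name is illustrative.

```python
# Chip-area check for the quoted maximum array sizes (dimensions in um).

CHIP_AREA_UM2 = 1e8  # 1 cm^2 expressed in um^2

def occupied_area_um2(channels, channels_per_pixel, width_um, height_um):
    pixels = channels // channels_per_pixel
    return pixels * width_um * height_um

cases = {
    "S-SEED, 256 x 256 channels": (65536, 1, 20, 40),
    "CMOS-SEED, 32 x 64 channels": (2048, 2, 200, 400),
    "FET-SEED, 32 x 32 channels": (1024, 2, 280, 560),
}
for label, (channels, per_pixel, w, h) in cases.items():
    area = occupied_area_um2(channels, per_pixel, w, h)
    doubled = occupied_area_um2(2 * channels, per_pixel, w, h)
    print(f"{label}: {area / 1e6:.1f} mm^2 "
          f"(fits: {area <= CHIP_AREA_UM2}, doubled fits: {doubled <= CHIP_AREA_UM2})")
```

In each case the quoted array fits on the chip while the next power-of-two array does not, which is consistent with the area limits marked in Figs. 11 and 13.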

6. Thermal Limitations

The variation of the throughput rate under the constraint of limited laser-source power suggests that the FET-SEED-based arrays should be employed for data sorting. A further constraint, the heat-removal capability, however, has to be included in benchmarking the different arrays. This technological constraint is all the more important because both the modulator output response and the behavior of the electronic circuit are sensitive to the heat dissipated. In this section, the power dissipation in the S-SEED and CMOS- and FET-SEED's under cw-mode and quasi-cw-mode operation is therefore reviewed. In the following a maximum power dissipation of 10 W/cm² is assumed.

A. S-SEED Architecture

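The power dissipated in the S-SEED's originates from the photocurrent generated by the optical beams. The power Psw, dissipated during the conversion time of the S-SEED, is43

$$P_{\mathrm{sw}} = \int_{V_{\mathrm{init}}}^{V_{\mathrm{final}}} I_{\mathrm{ph}}^{S}\,dV_S + \int_{V_{\mathrm{final}}}^{V_{\mathrm{init}}} I_{\mathrm{ph}}^{R}\,dV_R, \tag{25}$$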

where I_ph^S and I_ph^R are the photocurrents generated at the S and R windows, respectively. The final voltage Vfinal and the low-level reflectivity of a previous SEED, Rp, depend on the writing-beam powers, so that

$$P_{\mathrm{sw}} = P_{\mathrm{high}} \int_{V_{\mathrm{init}}}^{V_{\mathrm{final}}} S(V_S)\,dV_S + P_{\mathrm{low}} \int_{V_{\mathrm{final}}}^{V_{\mathrm{init}}} S(V_R)\,dV_R. \tag{26}$$

To a good approximation Vfinal = V0 + Vt and Vinit = −Vt. During the reading of the final state of the device, the power supplied by the optical read beams that is dissipated by the two SEED's is therefore

$$P_{\mathrm{state}} = P_{\mathrm{read}}\,S(V_0 + V_t)\,V_0, \tag{27}$$

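where V0 is the applied voltage. The average power dissipated over a total clock period is then

$$P_{\mathrm{av}} = \tfrac{1}{2}\left(P_{\mathrm{state}} + P_{\mathrm{sw}}\right) = \tfrac{1}{2}\Big\{ P_{\mathrm{read}}\,S(V_0 + V_t)\,V_0 + (P_{\mathrm{low}} + P_{\mathrm{high}})\,S_m\,(V_0 + V_t - V_m) + \big[P_{\mathrm{low}}\,S_p(P_{\mathrm{low}}) + P_{\mathrm{high}}\,S_p(P_{\mathrm{high}})\big](V_m + V_t) \Big\}. \tag{28}$$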
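The closed form of Eq. (28) can be checked against a direct numerical integration of the two-line responsivity over the voltage swing. The sketch below makes several assumptions: saturation is neglected, so that a single Sp applies to both beams; the two integrals of Eq. (26) are taken as magnitudes; and the beam powers are arbitrary illustrative values. All names are hypothetical.

```python
# Numerical check that Eq. (28) matches integration of the two-line model.

def s_two_line(v, s_p=0.5, s_m=0.3, v_t=1.0, v_m=4.0):
    return s_p if -v_t <= v <= v_m else s_m

def integral_of_s(v_init, v_final, steps=10000):
    """Midpoint-rule integral of S(V) from v_init to v_final."""
    dv = (v_final - v_init) / steps
    return sum(s_two_line(v_init + (i + 0.5) * dv) for i in range(steps)) * dv

def p_av_closed_form(p_read, p_high, p_low, s_p=0.5, s_m=0.3,
                     v0=8.0, v_t=1.0, v_m=4.0):
    p_state = p_read * s_m * v0      # S(V0 + Vt) = S_m in the two-line model
    p_sw = ((p_low + p_high) * s_m * (v0 + v_t - v_m)
            + (p_low + p_high) * s_p * (v_m + v_t))
    return 0.5 * (p_state + p_sw)

# Illustrative powers in watts (assumed values, not taken from the paper)
p_read, p_high, p_low = 2.0e-4, 2.5e-5, 8.5e-6
swing_integral = integral_of_s(-1.0, 9.0)   # from -Vt to V0 + Vt
p_av_numeric = 0.5 * (p_read * 0.3 * 8.0 + (p_high + p_low) * swing_integral)
print(f"closed form: {p_av_closed_form(p_read, p_high, p_low):.6e} W")
print(f"numerical:   {p_av_numeric:.6e} W")
```

The agreement reflects that the integral of S over the swing is Sp(Vm + Vt) + Sm(V0 + Vt − Vm), which is the structure of the last two terms in Eq. (28).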

For a given laser-source power per pixel, there is a maximum frequency at which the sorting module can operate. Under such an operating condition, Eq. (28) determines the power per channel that must be


dissipated. To demonstrate which of the limitations (laser-source power or power dissipation) applies in practice, it is useful to plot these powers per channel as functions of the clock frequency. The constant value of the product of the optical power and the conversion time for these devices suggests that high frequencies can be achieved if high optical power is available. The saturation of the absorption peak at high power levels, however, limits the maximum frequency of operation. For the device data given above a maximum frequency of 50 MHz is achieved for 2.2 mW of laser-source power for each S-SEED; this amount corresponds to 200 μW of differential power reaching the S-SEED. The corresponding optical switching energy is approximately 0.6 pJ. At this maximum operating frequency, 370 μW of average power is dissipated per S-SEED. The 50-MHz frequency, however, cannot be obtained for a large array that is given only 1 W of laser-source power. This situation is shown in Fig. 14 for the writing of one S-SEED array from a similar one. For Fig. 14 and the following Figs. 15 and 16, the required laser-source power per data channel (left-side vertical axis) is displayed as a function of the operating frequency of the devices. Also plotted (right-side vertical axis) is the power dissipated per data channel as a consequence of operation at that frequency and source power. The two vertical axes are scaled such that the maximum channel number that can be tolerated appears at the same position on each axis. The vertical axes therefore correspond to the inverse of the array size. The area limit is drawn for the maximum array size (256 × 256) that can be fabricated on a 1-cm² chip.

For the 32 × 64 channel number indicated, the 1 W of available laser power is capable of operating the system at 22 MHz (46-ns clock period); see point A in Fig. 14. The corresponding power dissipated per

Fig. 14. Average power dissipated (dashed curve) and laser-source power required (solid curve) plotted as functions of the frequency of operation of the S-SEED. The area limit is indicated in long dashed lines. The limit on laser-source power and heat dissipated that is induced by the sorting of 32 × 64 channels is represented by the horizontal solid line. See text.

channel is approximately 80 μW (point A′), and the optical switching energy is 0.6 pJ. This is well below the 5 mW/channel that could be dissipated for the 2^11 channels. It can be seen that the power-dissipation (dashed) curve lies below the source-power (solid) curve at all frequencies. The S-SEED system throughput is therefore limited by the laser-source power. Points B and C in Fig. 14 correspond to the identically labeled points in Fig. 11, i.e., the matched situation for 2^16- and 2^10-sized data sets, respectively. At a high source power per node, the effect of saturation is sufficient to reduce the achievable operating frequency; one would not operate above the turning points of the curves in Fig. 14.
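The 5 mW/channel figure is the assumed 10 W/cm² heat-removal limit shared among the channels of a 1-cm² chip. A minimal sketch of this budget follows; the function names are illustrative, and the chip is assumed to be fully used.

```python
# Per-channel budgets for a 1-cm^2 chip with 2**m data channels.

def dissipation_budget_w(m, max_w_per_cm2=10.0, chip_cm2=1.0):
    """Heat that each channel may dissipate, in watts."""
    return max_w_per_cm2 * chip_cm2 / 2 ** m

def source_power_per_channel_w(m, p_tot=1.0):
    """Laser-source power available per channel, in watts."""
    return p_tot / 2 ** m

budget = dissipation_budget_w(11)
supply = source_power_per_channel_w(11)
print(f"thermal budget: {budget * 1e3:.2f} mW/channel")
print(f"source power:   {supply * 1e3:.2f} mW/channel")
```

Because the thermal budget is an order of magnitude above the power that a 1-W source can deliver per channel, the source power is the tighter constraint for the S-SEED array.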

B. L-SEED Architecture

It was shown in Section 5 that there is a degradation of performance in terms of the throughput in combining SEED's into the logic combinations needed for the sorting task. No further analysis is therefore presented here, but the thermal constraints are expected to have little effect on the L-SEED throughputs, just as they do not influence S-SEED operating conditions.

C. CMOS- and FET-SEED Architectures

For CMOS- and FET-SEED-based sorting nodes, the heat is created by the power absorbed at the transceivers and the power dissipated by the electronic gates. Power dissipation depends on the mode of operation of the laser and the electronic family considered.

1. Quasi-cw Mode of Operation

In the quasi-cw mode of operation, the writing beams are switched off as soon as the input-voltage swing has been achieved. During the optical-to-electronic conversion time Tconv the output voltage of the photodetectors experiences the voltage swing ΔVin so that the power dissipated per node is

$$P_{\mathrm{sw}}^{1} = 2\,\eta\,P_{\mathrm{read}}\left[R_m + R_p(P_{\mathrm{read}})\right]\bar{S}(P_{\mathrm{read}})\,\Delta V_{\mathrm{in}}. \tag{29}$$

The factor of 2 in the above expression indicates that each node has two inputs. The power dependences of the low-level reflectivity and mean responsivity disappear at λ1. During the rest of the processing time, T − Tconv, the input beams are OFF. Power is dissipated at the output modulators when the S-SEED's are read during the time Tconv required to transfer the information signals onto the next smart-pixel array. Beforehand, the output voltage has been set up by the electronics. In the worst case, the power dissipated is then

$$P_{\mathrm{out}}^{1} = 2\,P_{\mathrm{read}}\,S_p\,V_0. \tag{30}$$

The total average power is then

$$P_{\mathrm{av}}^{\text{quasi-cw}} = \frac{\left(P_{\mathrm{sw}}^{1} + P_{\mathrm{out}}^{1}\right)T_{\mathrm{conv}} + P_{\mathrm{elec}}\,T_{\mathrm{elec}}}{T}, \tag{31}$$


where Pelec is the power dissipated by electronics during the time Telec, which is discussed below.
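Equations (29)–(31) can be combined into a single function for the average power per node in the quasi-cw mode. The sketch below uses λ1 reflectivities from Table 1 and treats the timing and electronic-power inputs as assumed example values (3-ns conversion, 10-ns electronic propagation, 14-ns clock period, 2.5 mW of electronic power). The use of Sp = 0.5 A/W in Eq. (30) is also an assumption, and the names are illustrative.

```python
# Hedged sketch of Eqs. (29)-(31) for one 2 x 2 node in the quasi-cw mode.

def p_av_quasi_cw(p_read, t_conv, t_total, p_elec, t_elec,
                  eta=0.28, r_m=0.60, r_p=0.10, s_mean=0.5, s_p=0.5,
                  dv_in=0.25, v0=8.0):
    p_sw1 = 2.0 * eta * p_read * (r_m + r_p) * s_mean * dv_in    # Eq. (29)
    p_out1 = 2.0 * p_read * s_p * v0                              # Eq. (30)
    return ((p_sw1 + p_out1) * t_conv + p_elec * t_elec) / t_total  # Eq. (31)

# Assumed example: m = 11, P_read = 2**-12 * eta_bpg * P_tot with P_tot = 1 W
P_READ = 0.8 / 2 ** 12
p_av = p_av_quasi_cw(P_READ, t_conv=3e-9, t_total=14e-9,
                     p_elec=2.5e-3, t_elec=10e-9)
print(f"P_av (quasi-cw) = {p_av * 1e3:.2f} mW per node")
```

In this example the electronic term dominates, and the modulator read-out term of Eq. (30) is far larger than the receiver term of Eq. (29) because V0 greatly exceeds ΔVin.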

2. cw Mode of Operation

In the cw mode, the writing and read beams are ON during the whole processing time. The same power consumption as demonstrated in Eq. (29) occurs at the photodetectors while the information is written. During the rest of the processing time T − Tconv, the input beams dissipate energy in the fully switched input stage (of voltage ΔVin), so that the power P_sw^2 is

$$P_{\mathrm{sw}}^{2} = P_{\mathrm{sw}}^{1}/2. \tag{32}$$

The determination of the power dissipation at the output modulators is more complicated in the cw case. Reading the output S-SEED's can be decomposed into three distinct steps:

1. The output voltage remains constant during the time T − Tout because the signals have not yet reached the last electronic stage. Irrespective of the wavelength of operation, the average power is P_out^1.
2. Either the output-node voltage can stay constant or the node can switch during the time Tout. In either case, this time is usually extremely small if the output transistors are scaled properly, so that the average power dissipation of this stage can safely be neglected.
3. The output S-SEED is read during Tconv to permit the transfer of information onto the next array. The situation is in fact identical to the first stage of the reading process. The power dissipated is then

$$P_{\mathrm{out}}^{2} = P_{\mathrm{out}}^{1}. \tag{33}$$

The average power over the clock period is therefore given by

$$P_{\mathrm{av}}^{\mathrm{cw}} = \frac{P_{\mathrm{sw}}^{1} T_{\mathrm{conv}} + P_{\mathrm{sw}}^{2}(T - T_{\mathrm{conv}}) + T_{\mathrm{elec}} P_{\mathrm{elec}} + (T - T_{\mathrm{out}}) P_{\mathrm{out}}^{1} + T_{\mathrm{conv}} P_{\mathrm{out}}^{2}}{T}. \tag{34}$$

Neglecting $T_{\mathrm{out}}$ yields

$$P_{\mathrm{av}}^{\mathrm{cw}} = \frac{(P_{\mathrm{sw}}^{2} + P_{\mathrm{out}}^{2})(T + T_{\mathrm{conv}}) + T_{\mathrm{elec}} P_{\mathrm{elec}}}{T}. \tag{35}$$

Fig. 15. Average power dissipated and laser-source power required for a 2 × 2 CMOS-SEED sorting node plotted as a function of the frequency of operation. See text for details.

D. Power $P_{\mathrm{elec}}$

The power dissipated by the logic circuitry, $P_{\mathrm{elec}}$, depends on the electronic logic family. $P_{\mathrm{elec}}$ is calculated by the use of the SPICE simulation package. To this power should also be added the power required to drive the clock distribution network. In the CMOS logic, the dynamic power, averaged over the processing time, is roughly proportional to the frequency of operation. This power is dissipated during switching of the electronic gates, with the maximum instantaneous power occurring at the reset of the node. The electronics in the CMOS 2 × 2 node dissipates 0.3 mW at 10 MHz and 2.5 mW at 100 MHz.

Although important in CMOS logic, the dynamic power accounts for no more than 10% of the total electronic power dissipated in MESFET technology. Thus, the quiescent power forms the main power contribution in buffered-layer MESFET logic, whereas this power can be neglected in CMOS. In the buffered-FET logic, this static power $P_q$ is dissipated over the whole clock period, so that in Eq. (35) the term $T_{\mathrm{elec}} P_{\mathrm{elec}}/T$ should be replaced by $P_q$ plus a small dynamic power contribution. The simulation of an optimized version predicts that $P_{\mathrm{elec}} = 160$ mW will be dissipated in the 2 × 2 MESFET sorting node.
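The step from Eq. (34) to Eq. (35) relies on Eqs. (32) and (33). A minimal numerical check is sketched below; the function names are hypothetical and the input values are placeholders chosen only to exercise the formulas, not data from the paper.

```python
import math


def cw_average_power_full(p_sw1, p_out1, t_conv, t_out, p_elec, t_elec, period):
    """Eq. (34), using P_sw^2 = P_sw^1 / 2 (Eq. 32) and P_out^2 = P_out^1 (Eq. 33)."""
    p_sw2 = p_sw1 / 2.0
    p_out2 = p_out1
    numerator = (p_sw1 * t_conv
                 + p_sw2 * (period - t_conv)
                 + t_elec * p_elec
                 + (period - t_out) * p_out1
                 + t_conv * p_out2)
    return numerator / period


def cw_average_power_approx(p_sw1, p_out1, t_conv, p_elec, t_elec, period):
    """Eq. (35): Eq. (34) with T_out neglected."""
    p_sw2 = p_sw1 / 2.0
    p_out2 = p_out1
    return ((p_sw2 + p_out2) * (period + t_conv) + t_elec * p_elec) / period


# Placeholder inputs (hypothetical).
args = dict(p_sw1=0.2e-3, p_out1=2.5e-3, t_conv=4e-9,
            p_elec=2e-3, t_elec=10e-9, period=14e-9)
full = cw_average_power_full(t_out=0.0, **args)
approx = cw_average_power_approx(**args)
print("Eq. (34) with T_out = 0 matches Eq. (35):",
      math.isclose(full, approx, rel_tol=1e-12))
```

With a nonzero `t_out` the full expression is slightly smaller than the approximation, which is why neglecting $T_{\mathrm{out}}$ gives a conservative estimate of the dissipated power.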

E. Results for the Hybrid Systems

Figure 15 shows the average power dissipated per node and the corresponding laser-source power for the CMOS-SEED system under quasi-cw operation. The frequency of operation and average power dissipated increase as the input power increases. In the quasi-cw mode of operation, for a 32 × 64 channel number, a maximum of a 70-MHz operating frequency (point A, Fig. 15) is achievable for a power-dissipation value of 2 mW/node and a source-power value of ~1 mW/node. In the cw regime, only 53 MHz is achievable for the above channel number; more generally, for a given laser-source power per node, the operating frequency achievable in the quasi-cw mode is higher than that in the cw mode. At any frequency, the quasi-cw mode of operation is also the more energy efficient, because the power is applied only when needed: 1 mW/node of source power is required, and 6 mW/node must be dissipated. However, in both modes, the limitation is the laser-source power rather than heat dissipation. Thus, the CMOS-SEED-based bitonic sorter is constrained by the laser-source power and ideally should be operated in the quasi-cw mode.

The situation is different for the FET-SEED bitonic sorter, as shown in Fig. 16. In this case, the size of the array is limited by the heat-removal capability because of the large amount of quiescent power $P_q$ dissipated per node (160 mW). At all frequencies, the power dissipated in the quasi-cw mode is mainly electronic in nature; the optically induced heat accounts for no more than 1% of the total average power. A 32 × 32 channel number is not possible with this technology because the power dissipated by such an array vastly exceeds the allowable limit. An 8 × 8 data-channel array size can be operated at 303 MHz in the quasi-cw mode (point C, Fig. 16). This point of operation stands just under the limit of the heat-removal capability. With reference to point C in Fig. 13, the maximum throughput rate for the FET-SEED system is therefore of the order of only 5 Mops for sorting 32 × 32 data sets.
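The thermal argument for the FET-SEED sorter can be illustrated with a short back-of-the-envelope sketch. It assumes that each 2 × 2 node serves two data channels (consistent with the 8 × 16 nodes quoted for 16 × 16 channels in Subsection 8.F), counts only the 160-mW quiescent power per node, and, as a simplification, compares the result against the full 10 W that a 1-cm² chip could remove. All names are hypothetical.

```python
P_NODE_W = 0.160             # quiescent power per 2 x 2 FET-SEED node (from the text)
HEAT_LIMIT_W_PER_CM2 = 10.0  # heat-removal capability (from the text)
CHIP_AREA_CM2 = 1.0          # assumption: the whole 1-cm^2 chip area is usable


def node_count(rows, cols):
    """Number of 2 x 2 nodes, assuming two data channels per node."""
    return (rows * cols) // 2


def quiescent_power_w(rows, cols):
    """Total quiescent power of the array in watts."""
    return node_count(rows, cols) * P_NODE_W


def within_heat_budget(rows, cols):
    """True if the quiescent power alone fits the assumed heat budget."""
    return quiescent_power_w(rows, cols) <= HEAT_LIMIT_W_PER_CM2 * CHIP_AREA_CM2


for rows, cols in [(8, 8), (32, 32)]:
    print(f"{rows} x {cols} channels: "
          f"{quiescent_power_w(rows, cols):.2f} W, "
          f"fits budget: {within_heat_budget(rows, cols)}")
```

Under these assumptions the 8 × 8 array dissipates about 5 W, whereas a 32 × 32 array would need roughly 82 W, which is the sense in which the larger array vastly exceeds the allowable limit.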

Fig. 16. Average power dissipated and laser-source power required for a 2 × 2 FET-SEED sorting node (AT&T Laboratories) plotted as a function of the frequency of operation. The maximum heat-removal capability of 10 W/cm² permits sorting no more than 8 × 8 data channels.

7. Communication Rates between Smart-Pixel Chips

It was shown in Section 6 that smarter, faster, or both smarter and faster pixels do not necessarily provide higher throughput rates. The constraints may be of an algorithmic and architectural nature, as in the S-SEED sorter, of a thermal nature, as in the 2 × 2 FET-SEED-based sorter, or of an optical origin (laser source), as in the CMOS-SEED sorter. Whereas the throughput rate describes the performance of the entire system, it is useful to define a performance measure for the smart-pixel chip itself on the basis of the arguments of Sections 5 and 6. The number of bits transmitted per second both on chip and off chip provides a suitable measure, the figure of merit; it is the aggregate communication rate and is expressed in effective pin-Hz (inputs times hertz), $G$. For the S-SEED sorter,

$$G = \frac{\text{time-average power at detectors}}{\text{optical switching energy needed per channel}} \tag{36}$$

is the data rate experienced by the detectors. For the hybrid technologies, $G$ is twice the detection rate because input and output beams from a single chip are present within a single clock period. Equations (15) and (21), which describe the clock periods of the different sorters, show that, in the limit of large optical fan-out (i.e., low laser-source power per channel), the conversion time is the dominant term. The time of flight of the optical signals and the propagation time of the electronic signals can be neglected. In this limit the clock period is inversely proportional to the laser-source power per channel so that the aggregate communication rate $G$ becomes independent of the array size. In the S-SEED case,

$$G_{\text{S-SEED}} = \frac{\eta}{4 C_{\mathrm{cap}} A K}, \tag{37}$$

where, if saturation effects are neglected,

$$K = \frac{V_m + V_T}{S_m R_m - S_p R_p} + \frac{V_0 + V_T - V_m}{S_p R_m - S_m R_p}.$$

For the hybrid technology,

$$G_{\mathrm{hybrid}} = \frac{2\eta \bar{S}\,(R_m - R_p)}{C_{\mathrm{in}}\,\Delta V_{\mathrm{in}}}, \tag{38}$$

when the amplification time is neglected. Expressions (37) and (38) provide an upper value of the maximum aggregate data rates on and off chip that can be achieved. By the use of the empirical data in Table 1, the theoretical maximum communication rate for S-SEED chips is 5 × 10¹⁰ pin-Hz for 1 W of source power. The practical value, taking all of the above factors into consideration, is very close to this (4.5 × 10¹⁰ pin-Hz) because the optical-to-electronic conversion time dominates. For the hybrid technology, the theoretical limit is 5.2 × 10¹¹ pin-Hz/W. For the CMOS-SEED sorter the practical limit is closer to 3 × 10¹¹ pin-Hz because the electronic logic timing is still significant at those array sizes that can be fabricated onto a 1-cm² chip. The position is far worse for the FET-SEED technology, in which the thermal-dissipation constraint prevents one from working at a high pixel density; the practical communication limit for the sorting-node array is 3.9 × 10¹⁰ pin-Hz.
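The practical communication rates quoted above appear to be consistent with the product of the optical pin number and the data rate listed in Table 2. The sketch below checks this reading; the dictionary layout and function name are hypothetical, and the interpretation "rate = pins × data rate" is an assumption based on the pin-Hz unit.

```python
import math

# technology: (optical pin number, data rate in Hz, quoted rate in pin-Hz),
# all taken from Table 2
TABLE_2_RATES = {
    "S-SEED": (2048, 22e6, 4.5e10),
    "CMOS-SEED": (4096, 70e6, 2.9e11),
    "FET-SEED": (128, 303e6, 3.9e10),
}


def communication_rate(pins, data_rate_hz):
    """Aggregate communication rate in pin-Hz (assumed: pins x data rate)."""
    return pins * data_rate_hz


for name, (pins, rate_hz, quoted) in TABLE_2_RATES.items():
    computed = communication_rate(pins, rate_hz)
    agrees = math.isclose(computed, quoted, rel_tol=0.02)
    print(f"{name}: computed {computed:.3g} pin-Hz, quoted {quoted:.3g}, "
          f"agree within 2%: {agrees}")
```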

8. Relaxation of the Constraints

A. Algorithm Options: MSB or LSB

For each of the technologies above it is possible to construct a sorting module to operate on the data input in the LSB format. The influence of the LSB format on the throughput rate depends on the technology used. In the hybrid case, the throughput is decreased by a factor of 2 because the data have to pass twice through the sorting node before being output. In the S-SEED and L-SEED systems, however, the throughput is increased from 3.4 to 6.9 Mops (S-SEED) and from 2.5 to 5 Mops (L-SEED) for the matched case M = m = 10. This counterintuitive result is due to the number of additional steps that are necessary in MSB operation for the logic determination of the state variable Dn. Thus, although the nonhybrid technologies are slightly improved in comparison with the hybrid systems when the algorithm is changed in this way, the general conclusions of this study are not altered.

B. Architectural Options: Pixel Smartness

The method that has been described above for determining the throughput can be applied to more complex pixels. Four 2 × 2 nodes are required to construct a 4 × 4 node capable of merging two bitonic sequences of length 2; 12 nodes would be required to construct an 8 × 8 node, and so forth. Increasing the node complexity decreases the number of cycles that must be made around the processing loop at the expense of a greater chip area per pixel. In the pulse-mode limit and for a fixed problem size of 1024 words, 112, 140, and 180 Mops would be achieved for matched 2 × 2, 4 × 4, and 8 × 8 sorting-node arrays, respectively, in the CMOS family. The corresponding chip areas, 0.4, 0.8, and 1.2 cm², respectively, imply that the most efficient use of a 1-cm² chip would be in the 2 × 2 node configuration.

The pixel complexity of the hybrid technologies could also be reduced (to 2 × 1 nodes). Because an additional fan-out of two is needed, the pulse duration would need to be twice as long as in the 2 × 2 node case. The reduced logic complexity would enable the array to run faster. The net effect is estimated to be a small (10%) reduction in the clock period. This benefit is, however, offset by the need to double the smart-pixel hardware for the 2 × 1 case.
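The node counts quoted above (4 and 12) match the standard size of an n-input bitonic merging network, (n/2) log₂ n compare-exchange elements, and the quoted throughputs and areas can be compared per unit area. The sketch below illustrates both points; the names are hypothetical, and throughput per square centimeter is only one possible reading of "most efficient use".

```python
def nodes_in_bitonic_merger(n):
    """Number of 2 x 2 nodes in an n-input bitonic merger: (n / 2) * log2(n)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, n >= 2")
    return (n // 2) * (n.bit_length() - 1)


# node size: (throughput in Mops, chip area in cm^2), as quoted for the CMOS family
QUOTED = {2: (112.0, 0.4), 4: (140.0, 0.8), 8: (180.0, 1.2)}


def throughput_per_area(n):
    """Quoted throughput divided by quoted chip area, in Mops per cm^2."""
    mops, area_cm2 = QUOTED[n]
    return mops / area_cm2


for n in QUOTED:
    print(f"{n} x {n} node: {nodes_in_bitonic_merger(n)} elementary nodes, "
          f"{throughput_per_area(n):.0f} Mops/cm^2")

most_efficient = max(QUOTED, key=throughput_per_area)
print("highest throughput per unit area:", f"{most_efficient} x {most_efficient}")
```

The 8 × 8 configuration also exceeds the 1-cm² area constraint on its own, so only the first two options fit on a single chip.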

C. Interconnection Reconfiguration

Figure 11 shows that, for undersized problems (M < m), the performance of the S-SEED sorter actually decreases as the device array size increases. In the limit of large m

$$H(M) \propto (Mm)^{-1}. \tag{39}$$

The reason for this decrease is that, in all of the analysis, a single 2-D perfect-shuffle interconnect across the full device array size has been assumed. For undersized problems it is, however, more efficient to operate a number of m/M quite independent sorting tasks, each of size $2^M$. An S-SEED sorter targeted specifically at sorting data sets of size 2¹⁰ would then achieve 3.4 Mops for all array sizes exceeding 32 × 64 S-SEED's. Added flexibility would be given if the interconnect were dynamically reconfigurable; the performance could be optimized for each incident data-set size. For the hybrid technologies the lower number of pixels per centimeter squared means that little benefit would be gained by the above approach. If reconfigurability were available, however, a quite different interconnect topology, the radix-2 scheme, would enable higher sorting performances for all the SEED technologies.⁴⁴,⁴⁵

A throughput rate of 1.7 Mops is achieved on a 256² S-SEED array size for sorting 256 × 256 data sets. This is the largest array size that can be accommodated on one chip and therefore makes the highest use of the 2-D perfect-shuffle interconnection. In comparison, 7 Mops are achieved for sorting the same problem size on the CMOS-SEED sorter, which is restricted to a 32 × 64 array size. The advantage of intelligent 2 × 2 nodes in the hybrid technology is greater than the disadvantage of a small interconnect size. If there is a regime for the simple S-SEED technology, it would be at these high array sizes, but an increased laser-source power would be essential.

D. Beyond 1 W of Laser-Source Power

The throughputs of the S-SEED and CMOS-SEED sorters are both limited by the laser-source power. With reference to Fig. 14 for the S-SEED's, operation at the maximum frequency of 50 MHz requires 2.2 mW/channel and demands a rate of 300-mW power dissipation/channel. Hence, for a 32 × 32 channel-number sorter, a maximum throughput of 8.6 Mops could be achieved if 2.3 W of source power were available. For larger array sizes improved performance could in principle be gained by an increase in the laser power. The theoretical limit of 74 Mops for a 32 × 32 problem tackled with a 128² channel-number array dissipating 6 W/cm² of heat would, however, require 36 W of input power (and there would be associated electrical-power-supply problems).

In the CMOS-SEED case (Fig. 15), for which a 32 × 64 array can be operated at 50% of the maximum frequency and for which the 1-cm² area limitation prevents channel numbers exceeding 64², 140 Mops were achievable for a 1-W laser-source power. In the pulse-mode limit the throughput rate could go up to 260 Mops for a 32 × 32 problem tackled with a 32 × 64 channel array. More realistically a 4-W laser source would enable achievement of a 210-Mops throughput. Regarding the communication rates, the theoretical limits are 8 × 10¹⁰ pin-Hz (S-SEED) and 10¹² pin-Hz (CMOS-SEED), given a 1-cm² chip with a 10-W/cm² dissipation capacity but high input power.
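The S-SEED source-power figures quoted in this subsection follow from multiplying the per-channel requirement by the channel number. A minimal sketch, with a hypothetical function name and the 2.2 mW/channel value taken from the text:

```python
import math

PER_CHANNEL_W = 2.2e-3  # source power per channel at 50 MHz (from the text)


def source_power_w(rows, cols, per_channel_w=PER_CHANNEL_W):
    """Total laser-source power for a rows x cols channel array, in watts."""
    return rows * cols * per_channel_w


for rows, cols in [(32, 32), (128, 128)]:
    print(f"{rows} x {cols} channels: {source_power_w(rows, cols):.2f} W")
```

The two results round to the 2.3 W and 36 W quoted for the 32 × 32 and 128² channel-number arrays.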

E. Beyond a 10-W Heat-Removal Capacity

With 1 W of laser power, the FET-SEED sorter is the only system affected by the relaxation of the thermal constraint. For such a source power a 32 × 32 channel-number sorter could operate at up to 315 MHz if up to 67 W/cm² could be dissipated. The throughput rate would then be 210 Mops, and the communication rate 2.8 × 10¹¹ pin-Hz.

Table 2. Performance Characteristics of S-SEED's and Hybrid-SEED-Based Smart Pixels for Data Sortingᵃ

| | S-SEED | CMOS-SEED | FET-SEED |
|---|---|---|---|
| Throughput (Mops) | 3.4 | 140 | 4.8 |
| Clock period (ns) | 46 | 14.3 | 3.3 |
| Optical pulse duration (ns) | 23 | 4.3 | 0.13 |
| Channel number | 32 × 64 | 32 × 64 | 8 × 8 |
| Active chip area (cm²) | 0.02 | 0.8 | 0.6 |
| Number of logic chips required | 8 | 1 | 1 |
| Number of clock periods | 13,244 | 1000 | 64,480 |
| Time to sort a single 32² data set (μs) | 609 | 14.3 | 213 |
| Laser-source power per chip (W) | 1 | 1 | 1 |
| Maximum power dissipation on chip (W) | 0.16 | 2.5 | 5.4 |
| Communication rate (pin-Hz) | 4.5 × 10¹⁰ | 2.9 × 10¹¹ | 3.9 × 10¹⁰ |
| Data rate (MHz) | 22 | 70 | 303 |
| Optical pin number (input–output) | 2048 | 4096 | 128 |
| On-chip gate (switches per second) | 4.5 × 10¹⁰ | 2.4 × 10¹³ | 2.3 × 10¹² |

ᵃ Sorting 32 × 32, 8-bit words is chosen to permit comparison of different technologies.
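Several rows of Table 2 are related by simple arithmetic: the sort time is the clock period multiplied by the number of clock periods, which comes out in microseconds, and the data rate is the reciprocal of the clock period. The sketch below reproduces those rows from the tabulated clock periods and cycle counts; the names are hypothetical.

```python
import math

# technology: (clock period in ns, number of clock periods), from Table 2
TABLE_2_TIMING = {
    "S-SEED": (46.0, 13_244),
    "CMOS-SEED": (14.3, 1_000),
    "FET-SEED": (3.3, 64_480),
}


def sort_time_us(clock_period_ns, n_periods):
    """Time to sort one data set, in microseconds."""
    return clock_period_ns * n_periods / 1000.0


def data_rate_mhz(clock_period_ns):
    """Data rate in MHz, taken as the reciprocal of the clock period."""
    return 1000.0 / clock_period_ns


for name, (period_ns, n_periods) in TABLE_2_TIMING.items():
    print(f"{name}: sort time {sort_time_us(period_ns, n_periods):.1f} us, "
          f"data rate {data_rate_mhz(period_ns):.1f} MHz")
```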

F. Beyond a 1-cm² Chip Area

Even if the optical systems were able to cope with larger fields of view, there is nothing to be gained by an increase in the chip area for the S-SEED and CMOS-SEED cases unless one of the above constraints (Subsections 8.A–8.E) is also relaxed. However, for the FET-SEED, an increased chip area would allow the same number of smart pixels to be distributed less densely and thereby alleviate the thermal dissipation. For example, a 4-cm² chip could accommodate 8 × 16 nodes, providing 16 × 16 data channels with the same power dissipated per node as in the 8 × 8 channel sorter. The resulting throughput rate would be approximately 45 Mops.

G. Need for New Front-End Interfaces

A limiting factor in the CMOS-SEED technology is the time taken to convert the optical signals to voltage levels compatible with the electronic logic ($T_{\mathrm{conv}}$). The whole analysis for the hybrid technologies was based on the use of a voltage-gain stage for the amplification scheme. The performance of the systems is, however, strongly dependent on the input-voltage swing, that is, on the kind of amplifier used for the optoelectronic interface. As the input voltage decreases so does the minimum optical switching energy. For a given total laser-source power and channel number, an optimum input-voltage swing $\Delta V_{\mathrm{in,optim}}$ that minimizes the conversion and amplification times [Eq. (18)] can be calculated by

$$\Delta V_{\mathrm{in,optim}} = \sqrt{\frac{8\eta \bar{S}\, C_{\mathrm{out}}\, \Delta V_{\mathrm{log}}\,(R_m - R_p)}{3 C_{\mathrm{in}}\, g_m}\,\frac{P_{\mathrm{tot}}}{2^{m+1}}}. \tag{40}$$

With the values from Table 1, the optimum input-voltage swing $\Delta V_{\mathrm{in,optim}}$ for a 32 × 32 channel-number sorter is $\Delta V_{\mathrm{in}} = 0.14$ V. With a value of $\Delta V_{\mathrm{in}} = 0.25$ V, the CMOS-SEED sorter has a longer conversion time, $T_{\mathrm{conv}} = 2.7$ ns, than does the optimum system with a time of $T_{\mathrm{conv}} = 2.1$ ns. Another alternative consists of providing a current swing instead of a voltage swing in the amplification process, thereby reducing the effective input capacitance. Several current-mode circuits, such as low-power transimpedance amplifiers,⁴⁶ current conveyors,⁴⁷ or charged-sense amplifiers,⁴⁸ have been proposed to avoid charging the front-end capacitance. It is expected that the value of the minimum switching energy $\Delta E_{\mathrm{min}}$ will be lower than that used in the present analysis, with a consequent reduction in $T_{\mathrm{conv}}$ and improvement in the module throughput rates.

The FET-SEED configuration, in the form fabricated to date, has been shown to be power-dissipation limited. In principle, the use of direct-coupled FET logic (DCFL)⁴⁹ would reduce the quiescent power and improve throughputs. Monolithic DCFL-SEED integration has not yet been achieved, however.
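The scaling behavior of Eq. (40) can be explored with the sketch below. It assumes that the square root spans the whole right-hand side, which is the reading that gives the result the dimension of a voltage. The function name and every parameter value are hypothetical placeholders, not the Table 1 data, so the absolute numbers carry no meaning; only the trends do.

```python
import math


def optimum_input_swing(eta, s_mean, c_out, dv_log, r_m, r_p, c_in, g_m, p_tot, m):
    """Eq. (40), assuming the radical covers the full right-hand side."""
    power_term = p_tot / 2 ** (m + 1)  # P_tot / 2^(m+1), as written in Eq. (40)
    prefactor = 8.0 * eta * s_mean * c_out * dv_log * (r_m - r_p) / (3.0 * c_in * g_m)
    return math.sqrt(prefactor * power_term)


# Hypothetical device parameters (placeholders, NOT the values of Table 1).
PARAMS = dict(eta=0.5, s_mean=0.4, c_out=50e-15, dv_log=3.3,
              r_m=0.3, r_p=0.1, c_in=100e-15, g_m=1e-4)

base = optimum_input_swing(p_tot=1.0, m=10, **PARAMS)
more_power = optimum_input_swing(p_tot=4.0, m=10, **PARAMS)
more_channels = optimum_input_swing(p_tot=1.0, m=12, **PARAMS)
print(f"4x source power: swing x {more_power / base:.2f}")
print(f"4x channels:     swing x {more_channels / base:.2f}")
```

Under this reading the optimum swing grows as the square root of the source power per channel, so it shrinks as the array is enlarged at fixed total power.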

9. Conclusion

A variety of smart-pixel arrays based on existing SEED technology has been studied in the context of demonstrators that perform sorting. Two figures of merit have been defined to quantify the performance of each system. The sorting of 32 × 32, 8-bit data sets has been taken as a benchmark problem. Each completed sort is considered to be a task involving 1024 operations, and the system figure of merit is quoted in the number of millions of operations per second (Mops). The device figure of merit is the communication rate on and off chip and is quoted in pin-hertz (inputs times hertz). Three constraints were imposed on the systems; they reflect the real operating conditions of a demonstrator: (i) The chip area was limited to 1 cm², (ii) the power dissipated off chip was limited to 10 W cm⁻², and (iii) the maximum peak power of the laser source was 1 W. Table 2 summarizes the throughput and communication rates for the technologies considered, under optimal operating conditions in each case.

It has long been recognized that the pure S-SEED technology has a lower performance level than its hybrid versions. This is quantified here for the benchmark task. The S-SEED chip data-communication rate is one quarter that of the CMOS-SEED chip, primarily because of the optical energy needed to switch each device (~0.4 pJ/channel at the S-SEED detectors and 0.15 pJ/channel for the CMOS-SEED). The resultant system performance is, however, considerably lower for the S-SEED technology because each optical channel takes advantage of the processing power of more than 100 electronic gates in the CMOS-SEED case. Only 3.4 Mops are achievable for the 32 × 64 channel S-SEED sorting module, whereas 140 Mops are expected for the CMOS-SEED system. In both cases the laser-source power is the main performance limiter. The FET-SEED technology is, in contrast, presently limited by heat-dissipation requirements, so that fewer electronic gates can be operated per unit area. Thus the FET-SEED chip data-communication rate is lower than that for the other technologies, i.e., the system performance, even though 300-MHz clock frequencies are possible, is restricted to approximately 5 Mops.

It is not the purpose of this paper to make comparisons between optoelectronic and all-electronic dedicated systems. However, it is noted that Bellcore have developed a 32-channel sorter on a single chip using a space-multiplexed architecture.⁵⁰ This Batcher chip sorts the small 8-bit data set at an impressive 675 Mops. Translated to 32 × 32 data sets such chips should in principle achieve sorting at approximately 32 Mops, at the expense, however, of additional chips used for data storage. A complex picture arises from the analysis carried out in this paper. The methodology laid out in this article, which combines analytic and SPICE simulation results, however, allows us to quantify the relative performance of one system against another. This methodology can be extended to any kind of task and smart-pixel technology without too much difficulty. It is important to note that optoelectronic smart-pixel-based modules can now exceed the capabilities of electronics for specific tasks. This opens up considerations of architectures and applications that exploit the optical 2-D parallel interconnection of electronic chips. The reasons that these comments can now be made are

1. The high optical chip-to-chip communication rate, 3 × 10¹¹ pin-Hz, is an order of magnitude higher than electrical pin-out achieves. For example, the 64 × 64 crosspoint-switch chip from Vitesse (part no. VSC864) operates at 200 Mb/s, providing a communication rate of 0.25 × 10¹¹ pin-Hz.⁵¹

2. The free-space optical interconnection can exploit nonlocal space-variant shuffle patterns, as well as providing an effective optical pin-out count of 4096 (for the CMOS-SEED sorter).

3. The SEED-based differential detectors considered here have a low optical-energy budget. This circumstance is especially important given the difficulty in delivering optical power to chips (a less than 1-W capacity) compared with the amount of electrical power (1–10 W).

The perfect-shuffle interconnected bitonic sorter is one of the optical demonstrators under construction within the framework of the Scottish Collaborative Initiative in Opto-electronic Sciences (SCIOS), funded by the Engineering and Physical Sciences Research Council of the United Kingdom.

The authors have benefited from numerous discussions with D. T. Neilson, D. A. Baillie, S. M. Prince, L. C. Wilkinson, M. G. Forbes, and A. C. Walker (who is the coordinator of the above consortium). One of the authors, M. Desmulliez, acknowledges the financial support of the European Programme's Optical Parallel Interconnected Processors (OPIP) project. Support is also acknowledged from the UK Engineering and Physical Sciences Research Council (EPSRC) through SCIOS.

The authors are grateful for the comments of one of the anonymous referees, some of which have been implemented in this article.

The e-mail address for M. P. Y. Desmulliez is [email protected].

References

1. P. Heremans, M. Kuijk, R. Vounckx, and G. Borghs, “Differential optical pnpn switch operating at 16 MHz with 250 fJ optical input energy,” Appl. Phys. Lett. 65, 19–21 (1994).

2. H. S. Hinton, “Architectural considerations for photonic switching networks,” IEEE J. Select. Areas Commun. 6, 1209–1226 (1988).

3. S. R. Forrest and H. S. Hinton, “Introduction to the special issue on smart pixels,” IEEE J. Quantum Electron. 29, 598–599 (1993).

4. M. P. Y. Desmulliez, F. A. P. Tooley, J. A. B. Dines, N. L. Grant, D. A. Baillie, B. S. Wherrett, P. W. Foulk, S. Ashcroft, and P. Black, “Perfect-shuffle interconnected bitonic sorter: opto-electronic design,” Appl. Opt. 34, 5077–5090 (1995).

5. F. B. McCormick, T. J. Cloonan, A. L. Lentine, J. M. Sasian, R. L. Morrison, M. G. Beckman, S. L. Walker, M. J. Wojcik, S. J. Hinterlong, R. J. Crisci, R. A. Novotny, and H. S. Hinton, “Five-stage free-space optical switching network with field-effect transistor self-electro-optic-effect-device smart-pixel arrays,” Appl. Opt. 33, 1601–1618 (1994).

6. M. K. Hibbs-Brenner, S. D. Mukherjee, B. L. Grung, and J. Skogen, “GaAs OEICs for opto-electronic smart pixels,” in LEOS Summer Topical Meeting Digest on Smart Pixels, 1993 (Institute of Electrical and Electronics Engineers, Lasers and Optoelectronics Society, New York, 1993), pp. 26–27.

7. F. E. Kiamilev, P. J. Marchand, A. V. Krishnamoorthy, S. C. Esener, and S. H. Lee, “Performance comparison between optoelectronics and VLSI multistage interconnection networks,” J. Lightwave Technol. 9, 1674–1692 (1991).

8. A. V. Krishnamoorthy, P. J. Marchand, F. E. Kiamilev, and S. C. Esener, “Grain-size considerations for optoelectronic multistage interconnection networks,” Appl. Opt. 31, 5480–5507 (1992).

9. D. T. Lu, V. H. Ozguz, P. J. Marchand, A. V. Krishnamoorthy, F. A. Kiamilev, R. Paturi, S. H. Lee, and S. C. Esener, “Design trade-offs in optoelectronic parallel processing systems using smart SLM’s,” Opt. Quantum Electron. 24, S379–S403 (1992).

10. C. W. Stirk, “Cost models of components for free-space optically interconnected systems,” in Photonics for Computers, Neural Networks and Memories, Proc. SPIE 1773, 231–241 (1993).

11. B. S. Wherrett, J. F. Snowdon, S. Bowman, and A. Kashko, “Digital optical circuits for 2-D data processing,” in Optical Computing 1992, Proc. SPIE 1806, 333–346 (1992).

12. D. A. B. Miller, “Quantum well self-electro-optic-effect devices,” Opt. Quantum Electron. 22, S61–S98 (1990).

13. A. L. Lentine and D. A. B. Miller, “Evolution of the SEED technology: bistable logic gates to opto-electronics smart pixels,” J. Quantum Electron. 29, 655–669 (1993).

14. A. L. Lentine, H. S. Hinton, D. A. B. Miller, J. E. Henry, J. E. Cunningham, and L. M. F. Chirovsky, “Symmetric self-electro-optic-effect device: optical reset latch, differential logic gate and differential modulator/detector,” J. Quantum Electron. 25, 1929–1936 (1989).

15. A. L. Lentine, D. A. B. Miller, J. E. Henry, J. E. Cunningham, L. M. F. Chirovsky, and L. A. d’Asaro, “Optical logic using electrically connected quantum well PIN diode modulators and detectors,” Appl. Opt. 29, 2153–2163 (1990).

16. T. K. Woodward, L. M. F. Chirovsky, A. L. Lentine, L. A. d’Asaro, E. Laskowski, M. Focht, G. Guth, S. Pei, F. Ren, G. Przybylek, L. Smith, R. Leibenguth, M. Asom, R. Kopf, J. Kuo, and M. Feuer, “Operation of a fully integrated GaAs-AlₓGa₁₋ₓAs FET-SEED: a basic optically addressed integrated circuit,” Photon. Technol. Lett. 4, 616–618 (1992).

17. M. Goodwin, A. Moseley, M. Kearley, R. Morris, C. Kirkby, J. Thompson, R. Goodfellow, and I. Bennion, “Opto-electronic component array for optical interconnection of circuits and subsystems,” J. Lightwave Technol. 9, 1639–1644 (1991).

18. D. Knuth, The Art of Computer Programming (Addison Wesley, Reading, Massachusetts, 1973), Vol. 3.

19. J. W. Goodman, “Optics as an interconnect technology,” in Optical Processing and Computing, H. H. Arsenault, T. Szoplik, and B. Macukow, eds. (Academic, San Diego, 1989), pp. 1–32.

20. K. E. Batcher, “Sorting networks and their applications,” in Proceedings of the Spring Joint Computer Conference, Vol. 32 of Proceedings Series (American Federation of Information Processing Societies, Reston, Virginia, 1968), pp. 307–314.

21. T. J. Cloonan, G. W. Richards, R. L. Morrison, A. L. Lentine, J. L. Sasian, F. B. McCormick, S. J. Hinterlong, and H. S. Hinton, “Shuffle-equivalent interconnection topologies based on computer-generated binary phase gratings,” Appl. Opt. 33, 1405–1430 (1994).

22. A. M. Fox, D. A. B. Miller, G. Livescu, J. E. Cunningham, and W. Y. Jan, “Quantum well carrier sweep-out: relation to electro-absorption and exciton saturation,” J. Quantum Electron. 27, 2281–2295 (1991).

23. G. D. Boyd, J. A. Cavailles, L. M. F. Chirovsky, and D. A. B. Miller, “Wavelength dependence of saturation and thermal effects in multiple quantum well modulators,” Appl. Phys. Lett. 63, 1715–1717 (1993).

24. T. Sizer, T. K. Woodward, U. Keller, K. Sauer, T. H. Chiu, D. L. Sivco, and A. Y. Cho, “Measurement of carrier escape rates, exciton saturation intensity and saturation density in electrically biased multiple-quantum-well modulators,” J. Quantum Electron. 30, 399–407 (1994).

25. P. W. Foulk, M. P. Y. Desmulliez, F. A. P. Tooley, J. G. Crowder, N. L. Grant, and B. S. Wherrett, “Arrays of processing nodes for massively parallel sorting using optical switching and interconnect,” in Proceedings of the International Conference on ASIC, ASICON (IEEE Beijing Section, Beijing, China, 1994), pp. 201–204.

26. C. W. Stirk and R. A. Athale, “Sorting with optical compare-and-exchange modules,” Appl. Opt. 27, 1721–1726 (1988).

27. M. P. Y. Desmulliez, B. S. Wherrett, J. F. Snowdon, and J. A. B. Dines, “Optical, algorithmic and electronic considerations on the desirable ‘smartness’ of optical processing pixels,” in Optical Computing 1994, B. S. Wherrett and P. Chavel, eds., Vol. 139 of Proceedings Series (Institute of Physics, Bristol, UK, 1995), pp. 489–492.

28. A. C. Walker, M. P. Y. Desmulliez, F. A. P. Tooley, D. T. Neilson, J. A. B. Dines, D. A. Baillie, S. M. Prince, L. C. Wilkinson, M. R. Taghizadeh, P. Blair, J. F. Snowdon, B. S. Wherrett, C. Stanley, F. Pottier, I. Underwood, D. G. Vass, W. Sibbett, and M. H. Dunn, “Construction of demonstration parallel optical processors based on CMOS/InGaAs smart pixel technology,” in Massively Parallel Processing Using Optical Interconnections, E. Schenfeld, ed. (IEEE Computer Societies Press, Los Alamos, N.M., 1995), pp. 180–187.

29. J. F. Snowdon, A. J. Waddie, and B. S. Wherrett, “Efficient deployment of digital processing modules,” in Photonics for Computers, Neural Networks, and Memories, W. J. Miceli, J. A. Neff, and S. T. Kowel, eds., Proc. SPIE 1773, 193–197 (1992).

30. A. L. Lentine, D. A. B. Miller, L. M. F. Chirovsky, and L. A. D’Asaro, “Optimization of absorption in symmetric self-electro-optic-effect devices: a systems perspective,” J. Quantum Electron. 27, 2431–2439 (1991).

31. M. P. Y. Desmulliez, B. S. Wherrett, and J. F. Snowdon, “Tolerance analysis of cascaded self-electro-optic-effect device arrays,” Appl. Opt. 33, 1368–1375 (1994).

32. D. J. Goodwill, A. C. Walker, C. R. Stanley, M. C. Holland, and M. McElhinney, “Improvements in strain-balanced InGaAs/GaAs optical modulators for 1047-nm operation,” Appl. Phys. Lett. 64, 1192–1994 (1994).

33. G. D. Boyd, L. M. F. Chirovsky, A. L. Lentine, and G. Livescu, “Wavelength optimization of quantum well modulators in smart pixels,” Appl. Opt. 34, 323–332 (1995).

34. L. M. Loh, J. L. LoCicero, and A. L. Lentine, “S-SEED switching characteristics,” J. Lightwave Technol. 12, 2122–2130 (1994).

35. D. S. Chemla, D. A. B. Miller, P. W. Smith, A. C. Gossard, and W. Wiegmann, “Room temperature excitonic nonlinear absorption and refraction in GaAs/AlGaAs multiple quantum well structures,” IEEE J. Quantum Electron. 20, 265–275 (1984).

36. S. Schmitt-Rink, D. S. Chemla, and D. A. B. Miller, “Linear and nonlinear properties of semiconductor quantum wells,” Adv. Phys. 38, 89–188 (1989).

37. T. K. Woodward, W. H. Know, B. Toll, A. Vinattieri, and M. T. Asom, “Experimental studies of proton-implanted GaAs/AlGaAs multiple quantum well modulators for low photocurrent applications,” J. Quantum Electron. 16, 2854–2865 (1994).

38. A. Yu, M. Krainak, and G. Unger, “1047-nm laser diode master oscillator Nd:YLF power amplifier laser system,” Electron. Lett. 29, 678–679 (1993).

39. A. L. Lentine, L. M. F. Chirovsky, and T. K. Woodward, “Optical energy considerations for diode-clamped smart-pixel optical receivers,” J. Quantum Electron. 30, 1167–1174 (1994).

40. T. K. Woodward, A. L. Lentine, and L. M. F. Chirovsky, “Experimental sensitivity studies of diode-clamped FET-SEED smart pixel optical receivers,” J. Quantum Electron. 30, 2319–2324 (1994).

41. A. L. Lentine, L. M. F. Chirovsky, L. A. D’Asaro, E. Laskowski, S. Pei, M. Focht, J. Freund, G. Guth, R. Leibenguth, L. Smith, and T. K. Woodward, “Field-effect transistor self-electro-optic-effect device (FET-SEED) electrically addressed differential modulator array,” Appl. Opt. 33, 2849–2855 (1994).

42. B. S. Wherrett, M. P. Y. Desmulliez, and J. F. Snowdon, “Operating conditions for symmetric self-electro-optic-effect devices within digital optical circuits,” Opt. Comput. Process. 3, 19–38 (1993).

43. S. Yu and S. R. Forrest, “Implementations of smart pixels for optoelectronic processors and interconnection systems. 2. SEED-based technology and comparison with optoelectronic gates,” J. Lightwave Technol. 11, 1670–1680 (1993).

44. C. R. Jesshope, “The implementation of fast radix-2 transforms on array processors,” IEEE Trans. Comput. 29, 20–27 (1980).

45. J. A. B. Dines, Department of Physics, Heriot-Watt University, Edinburgh EH14 4AS, Scotland, UK (personal communication, 15 April 1996).

46. M. Ingels, G. Vanderplas, J. Crols, and M. Steyaert, “A CMOS 18-THz-Omega 240 Mb/s transimpedance amplifier and 155 Mb/s LED-driver for low-cost optical fiber links,” IEEE J. Solid State Circuits 29, 1552–1559 (1994).

47. N. Tan and S. Eriksson, “Low-power chip-to-chip communication circuits,” Electron. Lett. 30, 1732–1733 (1994).

48. J. A. B. Dines, “Smart pixel optoelectronic receiver based on a charge sensitive amplifier design,” IEEE Special Issue on Selected Topics in Quantum Electronics (to be published).

49. S. I. Long and S. E. Butner, “Gallium arsenide digital integrated circuit design,” McGraw-Hill Series in Electrical Engineering (McGraw-Hill, New York, 1990).

50. W. S. Marcus, “A CMOS Batcher and Banyan chip set for B-ISDN packet switching,” IEEE J. Solid State Circuits 25, 1426–1432 (1990).

51. Vitesse Corporation, “Gallium arsenide 64 × 64 Crosspoint Switch,” Preliminary Data Sheet, Part no. VSC864A-2 (Vitesse, San Jose, Calif., 1993).