
[IEEE 2005 9th International Workshop on Cellular Neural Networks and Their Applications - Hsinchu, Taiwan (28-30 May 2005)] 2005 9th International Workshop on Cellular Neural Networks



Simulation-Framework for Purely Digital CNN/MRF-Architectures

Stephan C. Stilkerich
EADS Corporate Research Center
Image and Signal Processing Group
D-81663 Munich, Germany
E-mail: [email protected]

Abstract: Any kind of hardware-relevant modeling, simulation and analysis of purely digital and massively parallel architectures that are based on CNN/MRF¹ processing principles is a time-consuming, resource-intensive, error-prone and complex task. Until now, no industrially qualified toolkit has been available to support these tasks systematically in a single closed environment. In this contribution we present a novel Simulation-Framework for purely digital CNN/MRF processing systems. The modeling, hardware-relevant simulation and analysis capabilities unified in this Simulation-Framework allow one to systematically investigate (1) the massively parallel processing dynamics of digital CNN/MRF devices, (2) the model's convergence behavior and (3) complete CNN/MRF systems of application-specific size. The paper concludes with simulation results demonstrating the ability of the framework to handle CNN/MRF processing systems of realistic size and complexity. This demonstrates the industrial relevance of the proposed CNN/MRF Simulation-Framework.

I. INTRODUCTION

The continuously impressive semiconductor progress of the last decade and the resulting availability of digital ASIC technology families with up to 100 million usable gates and 6-8 global wiring levels pave the way for industrially relevant, massively parallel and purely digital architectures of CNN/MRF based image- and signal-processing systems. In contrast to the prevalent CNN architectures, where on the one hand the sensing and processing elements often form a spatially neighboring structure [6] [7] [2] and on the other hand the implementation is done in analog or mixed-mode technologies [5] [1], we advocate a different architectural CNN/MRF setting. We propose a sensing-processing arrangement with a clear separation of the sensor element(s) and the CNN/MRF processing structure(s). Furthermore, we advocate a purely digital realization of the CNN/MRF processing units. Of course we are aware of the data bottleneck between the sensor and the CNN/MRF processing unit. One architectural possibility to mitigate this problem is described in [9]. Without the separated sensing-processing arrangement and the purely digital realization, several promising application scenarios and systems, where the unique CNN/MRF processing paradigm is essentially required, will not become reality. The following arguments justify the decision to separate the sensor element(s) and the CNN/MRF processing structure(s) as well as to realize the processing unit(s) purely digitally: (1) Several important sensor signal sources like radar, infrared and sonar are excluded from being processed in the CNN/MRF paradigm if one insists on spatially merged sensing-processing arrangements on one chip. The integration of CNN/MRF processing structures in the above-mentioned sensors is not feasible for technical, economic and intellectual property reasons. (2) Any data-fusion process is substantially complicated in the CNN/MRF processing paradigm with a combined sensing-processing arrangement in analog technologies, because several signal sources are available only digitally or the analog signal levels are incompatible. (3) The approval process for computing devices in safety-critical scenarios in space flight and aviation (airplane, helicopter, UAV²) is difficult to manage for combined sensing and processing units in analog technologies. (4) Radiation hardening, and even hardening for merely harsh environments, is extremely difficult to manage in a merged analog CNN/MRF sensing-processing arrangement, as the hardening has to be done simultaneously for both the sensor and the processing structures and thus almost certainly affects the performance of one of them. (5) High-level design methodologies have recently been demonstrated [8] to be promising in adequately handling the design complexity of large CNNs/MRFs.

¹ Markov Random Field [10], [3]

This paper describes the novel, industrially approved Simulation-Framework, which systematically supports the modeling, hardware-relevant simulation and analysis of purely digital and massively parallel CNN/MRF processing architectures. The Simulation-Framework is not a stand-alone system, but is also linked to a technology-independent CNN/MRF high-level design framework [8] for FPGA prototyping and ultra deep submicron ASIC realization. Together, the Simulation-Framework and the high-level design framework form a seamless digital CNN/MRF design flow.

II. SIMULATION-FRAMEWORK

Obviously, one of the very first steps toward any kind of CNN/MRF hardware-architecture conception, verification, realization and overall system integration is the formal algebraic formulation of the corresponding CNN/MRF signal- and image-processing model. Additionally, a functionally equivalent model in C/C++ or Matlab can reveal elementary problems at an early stage and is, in our experience, rather helpful. The development step that directly follows is that of exhaustive hardware-relevant simulation, analysis and possible model refinement. Figure 1 shows the different components and the overall arrangement of the proposed Simulation-Framework, which supports this development step. Structurally, the framework is divided into two parts, indicated by the dashed line in figure 1. The upper part represents the infrastructure and modules used to define and generate the CNN/MRF model that is to be simulated. The bottom part represents the modules that actually control and conduct the simulation itself, including modules to display different data during simulation and to store these data for further off-line analysis.

² Unmanned Aerial Vehicle

Fig. 1. Components and arrangement of the proposed hardware-relevant CNN/MRF Simulation-Framework.

A. Simulation Setup

If the CNN/MRF model has been mathematically defined, it is a straightforward task to set up that specific model within the Simulation-Framework and to start and conduct a simulation run. Only the following steps have to be carried out by the user: (1) The overall size of the CNN/MRF and the neighborhood system, which connects the processing elements with each other, have to be defined. This definition is given in a text file with a predefined syntax. (2) Then the user has to write a so-called Frame Cell Module by hand. This module defines the unique name of the simulation model, the optimization method of each CNN/MRF processing element, the signals of the model to be stored for later off-line analysis, and the values for the CNN/MRF border, where the neighborhood system is not complete. (3) After that, the user has to define the core calculations of each processing element in standard C/C++, possibly by referencing predefined standard CNN/MRF calculations collected in the library of energy-functional models. These core calculations represent the CNN/MRF signal- or image-processing model. (4) Furthermore, the core calculations have to be constrained (bit-width) with respect to the fixed-point arithmetic³ used in digital CNN/MRF realizations. (5) All these definitions, directives and the frame cell module are passed to the Simulation-Model Generator, which automatically generates the complete CNN/MRF model.

To stress this point, each CNN/MRF processing element is explicitly and distinctly generated as a part of the simulation. This also holds true for the wiring among the processing elements, defined by the neighborhood system. Consequently, the simulation-model generator synthesizes a complete and detailed simulation copy of a massively parallel CNN/MRF digital VLSI architecture, which realizes the CNN/MRF signal- or image-processing model.

(6) This simulation copy of the CNN/MRF digital VLSI architecture is then triggered and executed by an event-driven simulation kernel, which imitates the time-discrete and clocked behavior of digital systems. (7) Finally, the simulation results can be displayed to the user during the simulation run or, alternatively, be stored so that the user can conduct off-line analyses.
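Steps (4) and (6) can be illustrated with a small C++ sketch. It is not the framework's generated model or its event-driven kernel: the names `saturate`, `Grid` and `clockTick`, the weighted 4-neighbour average used as core calculation, and the single border value are assumptions chosen purely for illustration. The sketch shows a bit-width-constrained integer calculation and a clocked update in which every cell reads the old state before any register changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp v to the range of a signed two's-complement word with `bits` bits
// (hypothetical helper that models a bit-width constraint).
inline std::int32_t saturate(std::int64_t v, int bits) {
    assert(bits >= 2 && bits <= 32);
    const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
    const std::int64_t lo = -hi - 1;
    return static_cast<std::int32_t>(std::min(std::max(v, lo), hi));
}

// Cell array with one fixed value for sites outside the array
// (an assumed border rule, not the framework's).
struct Grid {
    int rows;
    int cols;
    std::int32_t border;
    std::vector<std::int32_t> state;  // one register per cell, row-major

    std::int32_t at(int r, int c) const {
        if (r < 0 || r >= rows || c < 0 || c >= cols) return border;
        return state[static_cast<std::size_t>(r) * cols + c];
    }
};

// One clock cycle: compute every next value from the old state only,
// then commit all registers at once.
void clockTick(Grid& g, int bits) {
    std::vector<std::int32_t> next(g.state.size());
    for (int r = 0; r < g.rows; ++r) {
        for (int c = 0; c < g.cols; ++c) {
            // Placeholder core calculation: (4 * centre + 4 neighbours) / 8.
            const std::int64_t sum = 4 * std::int64_t{g.at(r, c)}
                + g.at(r - 1, c) + g.at(r + 1, c)
                + g.at(r, c - 1) + g.at(r, c + 1);
            next[static_cast<std::size_t>(r) * g.cols + c] =
                saturate(sum / 8, bits);
        }
    }
    g.state.swap(next);
}
```

Calling `clockTick` repeatedly advances the whole array by one clock per call. Narrowing `bits` in such a sketch is one simple way to observe when saturation starts to distort the result.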

Several module libraries are arranged around the Simulation-Model Generator (cf. figure 1) and are used by this generator during the synthesis of the simulation model. On the right-hand side are the module libraries that contribute solely to the structure and topology of each massively parallel CNN/MRF architecture. On the left-hand side are the module libraries that define general processing parts within each CNN/MRF cell. These two groups of module libraries are explained in the next sections.

B. Topology & Structure

The building blocks that define topology and structure, and which are essential for setting up a hardware-relevant CNN/MRF simulation model, are organized in three different libraries. The first library comprises empty cells and agglomerations of empty cells (cell-clusters), which are the most fundamental, topology-defining parts of every massively parallel CNN/MRF architecture. Several different types of cells are currently collected in this library. These cells differ only with respect to the ports at which data is transmitted and received at each cell and the kind of data buffering at these ports. Most of the cases significant for VLSI are covered by these cells or can easily be built from the already defined cells. Cell-clusters are recursive structures that instantiate smaller cell-clusters until only simple cells are instantiated. These cell-cluster structures were introduced to simplify the generation process and to improve the usage of computing resources, because each cell-cluster is optimized with respect to memory and CPU-time usage.

The second library comprises a rich set of wiring modules to connect cells and cell-clusters. Depending on the neighborhood system used, the wiring, i.e. the connections between cells, differs and has to be adapted correctly. Wiring characteristics for standard CNN/MRF neighborhood systems (currently 1st to 5th order) as well as generic wiring modules are collected in this library. Again, the simulation-model generator selects from this library the wiring modules that are required to synthesize the simulation model.

The third library comprises different framer modules, which represent common simulation settings with predefined display and data-storage settings. The selection of such a framer can be defined and passed to the simulation-model generator if needed. This scheme simplifies and speeds up the simulation setup for first runs on CNN/MRF models under investigation and improvement, or on completely new models.

³ Floating-point arithmetic is not suitable for digital VLSI realizations due to chip area constraints.
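To make the notion of a 1st to 5th order neighborhood system concrete, the following C++ sketch enumerates the neighbor offsets of a central site. The paper does not define the orders, so the sketch assumes the convention common in the MRF literature, in which the order grows with the Euclidean distance of the included sites; under this assumption the 5th order fills the complete 5 x 5 window. The function name `neighbourhoodOffsets` is hypothetical and not part of the framework.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Offsets (dr, dc) of all neighbours of the central site for a given order.
// Assumed convention: order n includes every site whose squared distance to
// the centre does not exceed the n-th entry of {1, 2, 4, 5, 8}.
std::vector<std::pair<int, int>> neighbourhoodOffsets(int order) {
    assert(order >= 1 && order <= 5);
    static const int maxSquaredDistance[5] = {1, 2, 4, 5, 8};
    const int limit = maxSquaredDistance[order - 1];

    std::vector<std::pair<int, int>> offsets;
    for (int dr = -2; dr <= 2; ++dr) {
        for (int dc = -2; dc <= 2; ++dc) {
            const int d2 = dr * dr + dc * dc;
            if (d2 > 0 && d2 <= limit) {  // d2 == 0 is the centre itself
                offsets.emplace_back(dr, dc);
            }
        }
    }
    return offsets;
}
```

Under the stated assumption the orders 1 to 5 yield 4, 8, 12, 20 and 24 neighbors per cell, which gives a feeling for how quickly the wiring effort grows with the neighborhood order.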

C. Processing

The building blocks that define the processing of CNNs/MRFs, which are essential for any hardware-relevant simulation of massively parallel architectures, are also organized in three different libraries. The first library contains so-called energy functionals, which represent the calculations within each CNN/MRF cell and depend on the concrete signal- or image-processing problem to be solved. Both complete CNN/MRF models for de-noising, de-blurring, edge extraction and segmentation and commonly used sub-modules and calculations are collected in this library. Consequently, users can refer to a rich set of predefined specific and common modules to define their own CNN/MRF model within this Simulation-Framework. This simplifies the creation of the simulation model and considerably shortens the development time for each new CNN/MRF model.

The second library comprises several different optimization methods for calculating the solution of the signal- or image-processing problem represented by the CNN/MRF. The spectrum of optimization methods ranges from purely deterministic methods (e.g. Iterated Conditional Modes, ICM) to purely stochastic samplers (e.g. the Gibbs sampler) and also includes mixed methods like Modified Metropolis Dynamics [4]. Additionally, one can choose Deterministic Annealing (DA) or Expectation Maximization (EM) as the optimization method. With these different methods at hand, one can systematically investigate the model's performance with respect to the optimization method.

Finally, the third library comprises hardware-relevant interface modules, which are enriched with processing capabilities in order to conduct the model's parameter estimation. During the transfer of raw data to the cells, this structure can perform calculations to estimate the model's parameters, and likewise when result data is received from the cells. These modules are very specific to VLSI implementations and system integration issues, which is why we do not discuss them in detail here.
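As an illustration of a purely deterministic optimization method, the following C++ sketch applies Iterated Conditional Modes to a simple labeling problem. It is a minimal sketch under stated assumptions and not one of the framework's library modules: the Potts-type energy with integer weights `dataWeight` and `beta`, the 4-neighborhood, the raster-order sweep and all names (`LabelImage`, `localEnergy`, `runIcm`) are chosen here for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major label image (hypothetical container for this sketch).
struct LabelImage {
    int rows;
    int cols;
    std::vector<int> data;

    int get(int r, int c) const {
        return data[static_cast<std::size_t>(r) * cols + c];
    }
    void set(int r, int c, int v) {
        data[static_cast<std::size_t>(r) * cols + c] = v;
    }
};

// Local energy of `label` at site (r, c): a data term that penalises
// disagreement with the observation plus a Potts term that counts
// disagreeing 4-neighbours. Sites outside the image are ignored.
int localEnergy(const LabelImage& labels, const LabelImage& observed,
                int r, int c, int label, int dataWeight, int beta) {
    int energy = (label == observed.get(r, c)) ? 0 : dataWeight;
    const int dr[4] = {-1, 1, 0, 0};
    const int dc[4] = {0, 0, -1, 1};
    for (int k = 0; k < 4; ++k) {
        const int nr = r + dr[k];
        const int nc = c + dc[k];
        if (nr < 0 || nr >= labels.rows || nc < 0 || nc >= labels.cols) continue;
        if (labels.get(nr, nc) != label) energy += beta;
    }
    return energy;
}

// ICM: visit the sites in raster order and give each one the label with the
// lowest local energy (ties keep the current label). Returns the number of
// sweeps executed, including the final sweep that changed nothing, or
// maxSweeps if the limit was reached first.
int runIcm(LabelImage& labels, const LabelImage& observed, int numLabels,
           int dataWeight, int beta, int maxSweeps) {
    assert(labels.rows == observed.rows && labels.cols == observed.cols);
    for (int sweep = 1; sweep <= maxSweeps; ++sweep) {
        int changed = 0;
        for (int r = 0; r < labels.rows; ++r) {
            for (int c = 0; c < labels.cols; ++c) {
                int best = labels.get(r, c);
                int bestEnergy = localEnergy(labels, observed, r, c, best,
                                             dataWeight, beta);
                for (int l = 0; l < numLabels; ++l) {
                    const int e = localEnergy(labels, observed, r, c, l,
                                              dataWeight, beta);
                    if (e < bestEnergy) { best = l; bestEnergy = e; }
                }
                if (best != labels.get(r, c)) {
                    labels.set(r, c, best);
                    ++changed;
                }
            }
        }
        if (changed == 0) return sweep;
    }
    return maxSweeps;
}
```

Starting from a 5 x 5 observation that is all zeros except for a single noisy centre site, one sweep with `dataWeight = 1` and `beta = 1` relabels the centre and a second sweep confirms convergence. The sketch visits sites sequentially; how the updates are scheduled across massively parallel cells is a separate design question that it does not address.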

D. Limitations

The Simulation-Framework presently has two main limitations, which will be removed step by step in upcoming versions of the framework. As the first limitation, the system can only handle CNNs/MRFs with a limited neighborhood support. Neighborhood systems of size 5 x 5 with respect to the central site/pixel can be simulated without limitations. Any larger neighborhood system affects details of the hardware-relevant simulation, which finally leads to a loss of accuracy. The second limitation is the very high memory usage (cf. table I) of the Simulation-Framework, which depends on the complexity and the neighborhood system of the CNN/MRF models.

TABLE I
COMPUTING RESOURCES FOR THE UNSUPERVISED SEGMENTATION MODEL WITH 8 CLASSES AND 5TH ORDER NEIGHBOURHOOD SYSTEM

Size      Memory   Model Generation   Sim Setup   Sim Time
64x64     0.2 GB   0.1 h              0.01 h      0.15 h
128x128   0.7 GB   0.2 h              0.03 h      0.25 h
256x256   2.8 GB   0.4 h              0.05 h      0.35 h
512x512   -        -                  -           -

III. RESULTS

The Simulation-Framework proposed and explained above was implemented in C++ and intensively tested, improved, rearranged and debugged. For this purpose, we used various artificial simulation settings as well as two concrete and industrially relevant CNN/MRF models. With the help of the artificial simulation settings we tested individual components of the Simulation-Framework separately and in different combinations. This testing procedure for the distinct framework parts exposed some problems and at the same time allowed us to improve each module separately as well as the interplay of the different modules. The actual spectrum of capabilities that the novel Simulation-Framework offers was then demonstrated with two concrete CNN/MRF models. The first model, not shown in this publication, removes noise from grey-tone images and concurrently preserves significant intensity changes of the image, which represent discriminating features for different image regions. The second CNN/MRF model realizes an unsupervised segmentation process. Figure 2 shows the raw image data and a simulation sequence of that unsupervised segmentation model, which illustrates the performance and processing dynamics of the model. A detailed discussion of this specific model and earlier simulation results can be found in [9].

The exhaustive simulation runs conducted on the unsupervised segmentation model and their subsequent systematic analysis disclose two main insights into the model. Both insights are essential for future digital VLSI device implementations and system integrations. The first insight is that the convergence speed of the model is much faster than expected and can be improved even further when processing image sequences, where the old result is used as a starting point for the new image. The second insight comprises detailed suggestions for the bit-widths of the different fixed-point representations within the model and the astonishing robustness of the model against numerical inaccuracy.

Fig. 2. Unsupervised 8-class segmentation. Raw image data (a) and simulation sequence (b) of the processing dynamics (from left to right and top to bottom). Exactly two calculation steps are omitted between each picture pair; thus the displayed steps are 1 + 3n, n = 0, 1, 2, 3, 4, ...

IV. CONCLUSION

In this contribution we have introduced and described a novel Simulation-Framework for purely digital and massively parallel CNN/MRF image- and signal-processing architectures, which uniquely combines the following key capabilities: (1) simulation of complete CNN/MRF systems, (2) purely digital VLSI-relevant modeling and simulation, (3) significant simulation speed-up compared with standard HDL simulations, (4) flexible CNN/MRF simulation-model generation by means of self-generating CNN/MRF topology gantries.

This novel Simulation-Framework, suitable for a broad class of CNNs/MRFs formulated on regular grids with a limited neighborhood relation, represents one absolutely essential part of an overall systematic development flow, which supports our effort to be first-time-silicon-right with respect to highly integrated CNN/MRF devices in ultra deep sub-micron semiconductor technologies.

ACKNOWLEDGMENT

I thank Prof. J. M. Buhmann and his working group at the Swiss Federal Institute of Technology (ETH Zurich) for their discussions and suggestions on state-of-the-art statistical clustering and segmentation models. I would also like to thank my colleagues at EADS Corporate Research Center for their critical comments and fruitful discussions on the Simulation-Framework.

REFERENCES

[1] J. M. Cruz and L. O. Chua. Pinout and operation manual of Analog-input Dual(AnalogLogic)-output 16*16 CNN universal chip. Technical report, UC Berkeley, 1996.

[2] R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vazquez, and R. Carmona. A CNN Universal Chip in CMOS Technology. In Proc. of the third IEEE Int. Workshop on Cellular Neural Networks and their Applications (CNNA'94), pages 91-96, 1994.

[3] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 6(6):721-741, 1984.

[4] Z. Kato, J. Zerubia, and M. Berthod. Image classification using Markov Random Fields with two new relaxation methods: Deterministic pseudo annealing and modified Metropolis dynamics. Technical report 1606, INRIA, 1992.

[5] G. Linan, R. Dominguez-Castro, S. Espejo, and A. Rodriguez-Vazquez. Design of a large-complexity analog I/O CNNUC. In Proc. of ECCTD99, pages 42-57, Stresa, 1999.

[6] Alireza Moini. Vision chips or seeing silicon. Technical report, The University of Adelaide, March 1997.

[7] T. Roska and L. O. Chua. The CNN Universal Machine: An Analogic Array Computer. IEEE Transactions on Circuits and Systems-II, 40:163-173, March 1993.

[8] Stephan C. Stilkerich and Joachim K. Anlauf. High-Level design environment for massive parallel VLSI-Implementations of statistical signal- and image processing models. Proc. IEEE ISCAS Int. Sym. on Circuits and Systems, 2004.

[9] Stephan C. Stilkerich and Joachim M. Buhmann. Massively Parallel Architecture for an unsupervised segmentation model. Proc. IEEE ICSES Int. Conf. on Signals and Electronic Systems, 2004.

[10] G. Winkler. Image Analysis, Random Fields and Dynamic Monte Carlo Methods, volume 27 of Applications of Mathematics. Springer-Verlag, 2003.