Visual saliency on networks of neurosynaptic cores

A. Andreopoulos, B. Taba, A. S. Cassidy, R. Alvarez-Icaza, M. D. Flickner, W. P. Risk, A. Amir, P. A. Merolla, J. V. Arthur, D. J. Berg, J. A. Kusnitz, P. Datta, S. K. Esser, R. Appuswamy, D. R. Barch, D. S. Modha

Identifying interesting or salient regions in an image plays an important role for multimedia search, object tracking, active vision, segmentation, and classification. Existing saliency extraction algorithms are implemented using the conventional von Neumann computational model. We propose a bottom-up model of visual saliency, inspired by the primate visual cortex, which is compatible with TrueNorth, a low-power, brain-inspired neuromorphic substrate that runs large-scale spiking neural networks in real time. Our model uses color, motion, luminance, and shape to identify salient regions in video sequences. For a three-color-channel video with 240 × 136 pixels per frame and 30 frames per second, we demonstrate a model utilizing ~3 million neurons, which achieves competitive detection performance on a publicly available dataset while consuming ~200 mW.

Introduction
From an engineering perspective, building a portable, energy-efficient, and real-time vision system that is capable of identifying interesting or salient visual regions of an image is challenging, because of the inherent ambiguity and intractability of the vision problem [1–10]. Biological visual systems, on the other hand, have evolved specialized attentional neural circuitry that is able to quickly locate and discriminate visual events important for an organism's survival [11]. Remarkably, the biological neural networks that underlie this complex visual task are extremely energy-efficient (the entire human brain consumes only 10 W), perform well in a wide range of conditions (low light and moving and cluttered environments), and react quickly (complex identification occurs in a few hundred milliseconds).

It has been proposed that biological visual systems make use of the fact that salient objects in a visual image differ statistically from their background, the so-called "pop-out" effect that automatically draws an observer's attention. Based on Treisman and Gelade's Feature Integration Theory [12], Koch and Ullman [13] hypothesized a purely stimulus-driven model of primate visual attention selection. The basic elements of this hypothesis are 1) a massively parallel computation of separate feature maps (orientation, color, motion, etc.), 2) the merging of the separate feature maps into a single topographic saliency map encoding the relative locations of salient regions, and 3) a winner-take-all mechanism enabling the serial selection of conspicuous image locations. Itti et al. [14] presented an algorithmic implementation building upon [13] that has been used for diverse tasks such as predicting eye-movement fixation patterns in primates [15], target detection [14], and video compression [16]. Judd et al. [17] used a database of eye-tracking experiments to build a supervised learning model of saliency that combines bottom-up cues (characterized by little or no high-level direction) and top-down image semantics (satisfying specific goals and targets).

By drawing upon the rich and diverse literature on saliency algorithms, our main contribution is to invent a spiking-based, real-time, bottom-up visual saliency model and implement it on a brain-inspired processor, TrueNorth [18, 19], to enable real-world vision applications (Figures 1 and 2).

Hardware and software substrates
We begin by providing an overview of the TrueNorth hardware, followed by an introduction to the software environment used for application development. This material constitutes a necessary prerequisite for describing the implementation of the saliency model.

©Copyright 2015 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.


0018-8646/15 © 2015 IBM

Digital Object Identifier: 10.1147/JRD.2015.2400251


The TrueNorth architecture
TrueNorth is a low-power, brain-inspired, digital chip architecture [18, 19] with one million spiking neurons and 256 million synapses organized in 4,096 neurosynaptic cores. TrueNorth is implemented in a 28-nm silicon process and has ~5.4 billion transistors (Figure 1). The cores are interconnected by a two-dimensional on-chip mesh network. Further, multiple TrueNorth chips can be seamlessly interconnected by grid tiling.

Each core consists of 256 input axons i ∈ {0, ..., 255} and 256 output neurons j ∈ {0, ..., 255}, interconnected by programmable binary synapses W_{i,j}, implemented as a 256 × 256 binary crossbar (Figure 3). To implement weighted synapses, each axon i has a type index G_i ∈ {0, 1, 2, 3}, and each neuron j assigns a 9-bit signed weight, S_j^{G_i}, to axon type G_i. Thus, the effective weight from axon i to neuron j is W_{i,j} · S_j^{G_i}.

Information is communicated via spikes, generated by neurons and sent to axon inputs via the on-chip/off-chip interconnection network. A spike is a packet with a target delivery time, encoding the value 1. The absence of a spike corresponds implicitly to a value of 0. An axon receiving a spike transfers it to each neuron it is connected to via the binary synaptic crossbar. Spikes may be used to represent values using the rate, time, and/or place at which spikes are created (Figure 4). A core's operation is driven by a global clock with a nominal 1-ms tick, during which all spikes are delivered to their destinations.

The computation performed by neuron j at tick t is defined by the neuron equation, described in detail in [20]. It is an extension of the leaky integrate-and-fire neuron model and comprises five operations executed in sequence (a software sketch of these steps follows the list):

1. Synaptic integration: The neuron state, or membrane potential, V_j(t), is the sum of its value V_j(t−1) at the previous tick and the weighted sum of input spikes A_i(t) that arrive at the neuron from up to 256 input axons i, using the neuron's weight S_j^{G_i} associated with each axon's type G_i:

V_j(t) = V_j(t−1) + Σ_i A_i(t) · W_{i,j} · S_j^{G_i}

Figure 1
A TrueNorth chip consists of 4,096 interconnected cores. Each core takes its input spikes through 256 input axons and also has 256 output neurons. A network of nine such cores is shown in the bottom left inset.


2. Leak integration: The neuron membrane potential V_j(t) is incremented (or decremented) by a signed leak λ_j, acting as a constant bias on the neuron dynamics.

3. Threshold evaluation: After synaptic and leak integration, V_j(t) is compared with a threshold α_j.

4. Spike firing: If V_j(t) ≥ α_j, the neuron "fires," or injects a spike into the network, bound for its destination axon. Note that if V_j(t) does not reach its threshold, it will not spike.

5. Reset: If a spike is fired, the neuron resets V_j(t) to a configurable reset value.
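For intuition, the following is a minimal Python sketch of these five operations for a single neuron; the variable names and simplified signature are our own, and the sketch omits the stochastic and extended leak modes of the full neuron model [20].

```python
import numpy as np

def neuron_tick(V, spikes_in, axon_types, crossbar_col, S, leak, threshold, reset_value):
    """One tick of a simplified TrueNorth-style neuron (deterministic modes only).

    V            -- membrane potential V_j(t-1) carried over from the previous tick
    spikes_in    -- binary vector A_i(t) of spikes on the core's 256 input axons
    axon_types   -- G_i in {0, 1, 2, 3} for each axon
    crossbar_col -- binary vector W_{i,j}: this neuron's column of the crossbar
    S            -- four signed weights S_j^{G_i}, one per axon type
    """
    # 1. Synaptic integration: weighted sum of the input spikes.
    V = V + np.sum(spikes_in * crossbar_col * S[axon_types])
    # 2. Leak integration: a constant signed bias per tick.
    V = V + leak
    # 3./4. Threshold evaluation and spike firing.
    fired = bool(V >= threshold)
    # 5. Reset to a configurable value if a spike was fired.
    if fired:
        V = reset_value
    return V, fired
```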

In addition to this basic operation mode, the neuron also supports several leakage modes, stochastic modes for synapses, leak, thresholds, and more. This neuron model has been demonstrated to implement a wide range of arithmetic, logical, and stochastic operations, and to emulate the 20 Izhikevich models of biological neurons [20]. Configuring a neuron's computational behavior is accomplished by setting the 23 neuron parameters (synaptic weights, leak, thresholds, stochastic operation, etc.). Various neuron configurations were used to implement a number of different algorithms and applications [21]. In this paper, we use several neuron configurations/types to implement the needed functionality.

Each neuron can output up to one spike per tick. With a tick frequency of 1 kHz, each neuron can output between 0 and 1,000 spikes per second, sent to an axon located either on the same core or on a different core. An axon may receive its input from more than one neuron (referred to as a bus-OR), such that if more than one spike arrives at an axon in the same tick, these spikes are merged into a single spike (logical OR operation). Furthermore, each neuron has an associated delay value between 1 and 15, which is the number of ticks from the time a spike is generated by the neuron until the time the spike is consumed by the target axon. At the time of consumption, the axon distributes the spike to up to 256 neurons on the core, according to the programmed crossbar connectivity.

The corelet programming paradigm
This section provides a brief overview of the corelet programming paradigm [22] for building TrueNorth applications. This new paradigm supports the different thinking programmers must acquire in their migration from implementing sequential algorithms on von Neumann architectures toward programming brain-inspired, neurosynaptic architectures.

Figure 2
An overview of the basic saliency system's corelet architecture (top) as well as example inputs and outputs (bottom). Each of these subcomponents is described in the paper.



The corelet programming paradigm offers:

1. An abstraction for a TrueNorth program, named a corelet, which is an object-oriented paradigm for creating and hierarchically composing networks [22], conceptually akin to the concepts of compositional modularity [23].

2. A library of reusable corelets suitable for combining into larger, complex functional networks.

3. An end-to-end corelet programming environment (CPE) that integrates seamlessly with the TrueNorth simulator, Compass. Compass is a highly scalable, parallel, spike-for-spike equivalent simulator of the TrueNorth functional blueprint, which runs on Linux**, OS X**, Windows**, and IBM Blue Gene* supercomputers. Compass has been tested with networks containing more than 2 billion neurosynaptic cores, 500 billion neurons, and more than 10^14 synapses [24, 25].

A TrueNorth program is a complete specification of a network of neurosynaptic cores, along with a specification of its input axons and output neurons. A corelet is an abstraction that represents a parameterized TrueNorth program, exposing only its external inputs and outputs while masking all other details of the neurosynaptic cores and their connectivity and configuration.

Figure 4
The value 7 encoded by three different spike-coding schemes, all using a 16-tick time window. In rate coding, the information is encoded by the number of spikes occurring over the time window. In burst coding, the spikes encoding the value are sequential and commence at the beginning of the time window. In time-to-spike coding, the spike's time of occurrence in ticks within the time window denotes the value.
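To make the three schemes concrete, the hypothetical helper below encodes a small integer as a binary spike train over a 16-tick window; it is our own illustration, not part of any corelet.

```python
import random

def encode(value, scheme, window=16):
    """Encode an integer in [0, window) as a binary spike train (illustrative sketch)."""
    train = [0] * window
    if scheme == "rate":
        # Rate code: only the spike *count* matters; the ticks may be arbitrary.
        for t in random.sample(range(window), value):
            train[t] = 1
    elif scheme == "burst":
        # Burst code: consecutive spikes starting at the first tick of the window.
        for t in range(value):
            train[t] = 1
    elif scheme == "time":
        # Time-to-spike code: a single spike whose tick index encodes the value.
        train[value] = 1
    return train

# encode(7, "burst") -> [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```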

Figure 3
The left inset shows a TrueNorth core's crossbar, with input spikes entering via the axons to the left, and spikes exiting via neurons at the bottom. Black circles at crossbar intersections denote synaptic connectivity. PRNG stands for "pseudo-random number generator." The right insets give an example of two periodically spiking neurons that send their periodic spikes to a four-way splitter, which is responsible for making four copies of each bus-OR merged input spike (see Table 1 for the actual neuron parameters). Such splitters and synchronization neurons are fundamental blocks used in the saliency system. The six neurons are indexed as six distinct neuron types A–F.


The internal network connectivity of these external inputs and outputs is hidden from the corelet user through the use of lookup tables (referred to as input connectors and output connectors, respectively). By specifying values for the corelet parameters, a TrueNorth program instantiation is generated that can specify the TrueNorth processor's behavior.

The saliency system introduced in this paper is a corelet that consists of multiple subcorelets, each of which in turn consists of several subcorelets. Some of the more prominent subcorelets used within the saliency system are: convolution, Gaussian pyramid, image gradient, rate-to-burst conversion, local averaging, center-surround filter, weighted feature merge, image upsampling, splitter, delay, motion detection, and motion history corelets.
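To make the abstraction concrete, here is a minimal Python sketch of a corelet-like interface that hides the internal cores and exposes only connectors; the class and method names are illustrative and do not reflect the actual CPE API described in [22].

```python
class Corelet:
    """Illustrative corelet-like abstraction: a parameterized network object
    that exposes only its input/output connectors and hides its cores."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params          # corelet parameters fixed at instantiation
        self._cores = []              # internal neurosynaptic cores (hidden)
        self.subcorelets = []         # hierarchical composition
        self.input_connector = {}     # lookup table for external input axons
        self.output_connector = {}    # lookup table for external output neurons

    def add(self, sub):
        """Compose a subcorelet into this corelet."""
        self.subcorelets.append(sub)
        return sub

    def connect(self, out_pin, dst, in_pin):
        """Route one of our output pins to another corelet's input pin."""
        dst.input_connector[in_pin] = self.output_connector.get(out_pin)

# Hypothetical composition mirroring part of Figure 2's pipeline:
saliency = Corelet("saliency")
pyramid = saliency.add(Corelet("gaussian_pyramid", layers=3))
motion = saliency.add(Corelet("motion_detection", k=1))
pyramid.connect("luminance_L0", motion, "frame_in")
```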

Methods
Figure 2 illustrates the block-level organization of the components that make up the saliency system. The input consists of sequences of video frames, where each frame typically consists of three channels representing luminance and chromaticity. A spike representation of each pixel's intensity and color in each frame is provided as input to the system.

The saliency system's pipeline first calculates a Gaussian pyramid [26] in each channel, which enables the processing of each input frame at multiple scales. The luminance channel is sent to the motion detection block, which detects image pixels where a significant change in luminance occurs over a short period of time, and is realized in the form of a multiscale frame-difference algorithm combined with a stochastic motion history neuron for retaining the recent frame-difference history. The user has the flexibility of specifying the actual separation of frame pairs whose difference is calculated, as well as the spatial scales used.

The Gaussian pyramid outputs for all three channels are also sent to the center-surround detector [27]. The center-surround detector responds strongly to small regions that are significantly different from their surroundings, and is realized in the form of a 15 × 15 center-on, surround-off filter that is applied to the multiscale edge maps. These edge maps are extracted from the output of the Gaussian pyramid by a finite-differences algorithm. In combination, the Gaussian pyramid with this edge detection operation is akin to performing a difference-of-Gaussians on the input image. Subsequently, a spatial rescaling step is applied. It involves upsampling and downsampling blocks to bring all the motion detection and center-surround outputs to a common resolution. Afterwards, a smoothing block is applied to the registered images. It suppresses speckle noise and enhances the centers of salient pixel patches. The final step involves applying a cascade of nonlinear weighted local maxima blocks, combined with weighted averaging, to produce the final saliency map.
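Read sequentially, the pipeline's data flow can be summarized by the following non-spiking Python sketch; it is our own simplification (motion is computed at a single scale, and fusion is reduced to a plain max), with illustrative parameters throughout.

```python
import numpy as np
from scipy import ndimage

def saliency_frame(channels, prev_luma):
    """Conceptual, non-spiking sketch of Figure 2's data flow for one frame.
    channels: dict of 2-D float arrays including "luminance";
    prev_luma: the previous frame's luminance channel."""
    h, w = channels["luminance"].shape
    maps = [np.abs(channels["luminance"] - prev_luma)]   # motion: frame difference
    for ch in channels.values():
        layer = ch
        for _ in range(3):                               # three-layer Gaussian pyramid
            gy, gx = np.gradient(layer)
            edges = np.abs(gx) + np.abs(gy)              # finite-difference edge map
            # Haar-like center-on, surround-off response on the edge map
            cs = ndimage.uniform_filter(edges, 5) - ndimage.uniform_filter(edges, 15)
            maps.append(np.clip(cs, 0, None))
            layer = ndimage.gaussian_filter(layer, 1.0)[::2, ::2]
    # register all maps to a common resolution, smooth, and fuse
    common = [ndimage.zoom(m, (h / m.shape[0], w / m.shape[1]), order=0) for m in maps]
    smoothed = [ndimage.gaussian_filter(m, 1.0) for m in common]
    return np.maximum.reduce(smoothed)                   # stand-in for weighted max/average
```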

The next section explains how to map the above-described conceptual blocks onto corelets for the TrueNorth architecture.

Gaussian pyramid

Input considerations: Source of input spikes
If this is a standalone corelet, the spikes arrive from the external video transduction mechanism that represents each channel of each input frame as a set of spikes. If this is a subcorelet of a larger system, the corelet that generates these input spikes must also output a spike-based representation of each frame in a video sequence.

Input considerations: Input spike format
The input consists of sequences of video frames entering at 30 frames per second, one frame every 33 ticks. Each frame typically consists of three channels, some channels (typically one) representing luminance and the rest chromaticity. For each channel, the pixel values are uniformly quantized into 4 bits and then rate-coded using 0 to 15 spikes distributed across the first 15 ticks of the 33-tick window (see Figure 4).
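In conventional terms, the transduction amounts to the following; this numpy sketch is ours, with the frame assumed to be an 8-bit single-channel array.

```python
import numpy as np

def transduce(frame_u8, window=33, active=15):
    """Quantize an 8-bit channel to 4 bits and rate-code each pixel as 0-15
    spikes in the first 15 ticks of the 33-tick frame window (our sketch)."""
    counts = frame_u8 >> 4                      # uniform 8-bit -> 4-bit quantization
    h, w = frame_u8.shape
    spikes = np.zeros((window, h, w), dtype=np.uint8)
    ticks = np.arange(active)[:, None, None]    # tick indices 0..14
    spikes[:active] = ticks < counts[None]      # a pixel with value v spikes on v ticks
    return spikes                               # spikes[t, y, x] in {0, 1}
```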

Algorithm: Functional description
The algorithm calculates the Gaussian pyramid representation of each input frame, providing a multiscale view of the frame. A cascade of Gaussian-filter-based convolutions, each followed by subsampling, produces the desired pyramidal representation of the input.

Algorithm: Corelet implementation
As shown in Figure 2, the first layer of the saliency pipeline consists of a corelet that extracts three-layer Gaussian pyramids, one pyramid per input channel. The resolutions for the three layers are 240 × 136, 120 × 68, and 60 × 34 pixels (width × height).

Each downsampled pyramid layer applies a 3 × 3 convolution with a discrete Gaussian kernel to the previous layer's output and downsamples the resultant image by discarding the output of every second horizontal and vertical pixel. As shown in Figures 5 and 6, the kernel consists of three distinct non-negative integers r, s, and t. Each layer outputs a 4-bit rate-coded representation of the result. Figures 5 and 6 and Table 1 show the neuron parameters necessary to implement this convolution.

In order to make efficient use of the TrueNorth architecture, each core receives input from up to 256 pixels, with one axon used per input pixel. Each input pixel/axon affects the output of multiple neurons, due to the overlapping 3 × 3 window of the Gaussian convolution. As a result, it becomes necessary to use three different neuron types per core (see Figures 5 and 6) and three axon types. The fourth axon type of each neuron has a negative weight and is used to reset the membrane potential every frame.

Note that neurons of type D calculate the weighted average value of the eight border pixels of each 3 × 3 kernel; then, by using a copy of the central pixel value (neuron E), a weighted-average neuron C calculates the weighted average of the two inputs, which provides the desired result. Splitters are used to replicate any input pixels that are used at the boundaries by more than one core. Each one of these neurons also uses rounding-bias axons that increment the membrane potential by a constant value at the start of each period, so that the normalized output value lies closer to the rounded result after division by r + 4s + 4t or by 4s + 4t.
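In dense-image terms, one pyramid layer computes the following; this numpy sketch is ours, with the kernel written via the integers r, s, t and the normalization by r + 4s + 4t (the example values of r, s, t are illustrative, not the paper's).

```python
import numpy as np
from scipy import ndimage

def pyramid_layer(img, r=4, s=2, t=1):
    """One Gaussian pyramid layer: a 3x3 integer-kernel convolution normalized
    by r + 4s + 4t, then discarding every second row and column (our sketch)."""
    kernel = np.array([[t, s, t],
                       [s, r, s],
                       [t, s, t]], dtype=float)
    smoothed = ndimage.convolve(img.astype(float), kernel / (r + 4 * s + 4 * t))
    return smoothed[::2, ::2]
```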

Figure 5
Gaussian convolutions and weighted-average neuron. Shown here are the neuron parameters used to calculate the 3 × 3 Gaussian convolution for each block of pixels mapped to a core, by using neuron types A–E (also see Table 1). Note that the letters a, b, c, and d in square boxes denote an axon type. See Figure 6 for related information.

Figure 6
Kernel and gradient. The subfigure at the left shows the modeling of a 5 × 5 Gaussian kernel as a cascade of two 3 × 3 kernels, where the second 3 × 3 kernel is applied over four distinct grates. The right subfigure shows the edge extraction network, with the neuron type A–D specification given in Table 1. See Figure 5 for related information.


Synchronization neurons (see Figure 3) are used to send periodic reset and rounding pulses to the control axons, at the appropriate ticks within each frame.

While only 3 × 3 kernels were used in this instantiation of the Gaussian pyramid, the corelet can also use kernels with larger support regions, as shown in Figures 5 and 6 for 5 × 5 kernels. The idea is to decompose an image into multiple sub-images by uniformly sampling the original image every f = 2 pixels horizontally and vertically to create f² sub-images (f is referred to as the grating parameter), and to apply a 3 × 3 convolution to each of these sub-images. In other words, the idea is to first apply a standard 3 × 3 Gaussian convolution, and then successively apply more convolutions as needed, with larger but sparser kernels (see Figures 5 and 6), so that the result approximates the desired kernel.

Output considerations: Output spike format
For each channel, output connectors are provided that represent the output for the 240 × 136, 120 × 68, and 60 × 34 pixel resolutions of the respective channel.

Output considerations: Destination of output spikes
For each output resolution of the luminance channel, one copy of the spikes is sent to a motion detection corelet, and the other copy is sent to an edge extraction corelet, which is in turn used as part of the center-surround operation. Each unique output resolution of the non-luminance channels is sent to an edge extraction corelet.

Rate-to-burst conversion and edge extraction

Input considerations: Source of input spikes
The input spikes represent the result of a Gaussian pyramid layer.

Table 1
The assignments for the neuron parameters of the networks discussed in this paper. These neuron parameters define the dynamic behavior of the neuron according to the neuron equation [20] with each new tick of the simulation and with each new set of input spikes that enter the neuron at the corresponding tick. Variables r, s, t, tc, tm, and k are described in the "Methods" section of this paper and in Figures 3 and 5 through 8. The function fl(.) stands for the floor operator.


Input considerations: Input spike format
The input to the edge extraction network must be burst-rate coded (see Figure 4). If only rate-coded input is available, a rate-to-burst conversion corelet can be used first to convert the rate code to a burst code. Tests show that if a rate-coded representation is provided to the edge extractor, system performance is not significantly affected; therefore, the conversion mechanism can be removed to decrease the system's neuron count.

Algorithm: Functional description
The absolute gradient network is shown in Figure 6. The network takes a batch of pixels as input, and for every pixel (x, y) in this batch, it uses the burst representation of pixel intensity values at coordinates {(x−1, y), (x, y), (x, y−1)} to calculate the horizontal and vertical finite differences, thus approximating the partial derivatives of the input intensity image.

Algorithm: Corelet implementation
The output of the Gaussian pyramid network is rate-coded but not necessarily burst-coded. This suggests the inclusion of an optional rate-to-burst code conversion network to convert the output of the Gaussian pyramid before it is used by the edge extraction network, in order to improve performance. In other words, we need a mechanism that takes as input a sequence of potentially random spikes and outputs the same number of concatenated/juxtaposed spikes, since this is the input that the edge extraction mechanism expects. The idea is simple: by using a linear neuron (γ_j = 1) with a positive threshold of 1 and a negative threshold of −255, which is also initialized to a membrane potential of −255, we are guaranteed that the neuron will store any input spikes without producing any output spikes, so long as the total number of input spikes does not exceed 255. By using a control pulse to increase the membrane potential by 255, once all the output spikes of the corresponding Gaussian pyramid neuron have entered the conversion neuron's axons, we can force the neuron to output a sequence of juxtaposed spikes (burst code) of length equal to the number of input spikes. After all the spikes have exited the neuron, a reset pulse along a synapse with weight −255 causes a reset of the membrane potential to −255, at which point the process is repeated for the next frame.

The edge-extraction corelet (Figure 6) can then process this burst-coded input. By merging the first two output neurons of the finite-differences network in Figure 6 using a single bus-OR axon, and also merging the last two neurons using another bus-OR axon, we are guaranteed that the two destination axons will take as input a number of spikes that is equal to the absolute value of the horizontal and vertical finite differences, respectively.
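A minimal simulation of the conversion neuron, under our reading of the parameters above (unit threshold with a linear reset that subtracts the threshold on each spike):

```python
def rate_to_burst(input_spikes, capacity=255):
    """Simulate the rate-to-burst neuron over one frame (our sketch).
    input_spikes: list of 0/1 values, one per tick of the accumulation phase."""
    V = -capacity               # initial membrane potential
    V += sum(input_spikes)      # stored spikes keep V below the positive threshold 1
    V += capacity               # control pulse: V now equals the stored spike count
    burst = []
    while V >= 1:               # threshold 1 with linear reset: one spike per tick
        burst.append(1)
        V -= 1
    V = -capacity               # reset pulse (synaptic weight -255) re-arms the neuron
    return burst                # consecutive spikes, i.e., a burst code
```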

Output considerations: Output spike format
The four output neurons provide a number of spikes equal to the positive and negative parts of the horizontal and vertical finite differences. As a result, for any frame, at most two of the four neurons output spikes.

Output considerations: Destination of output spikes
As previously described, each horizontal and vertical pair of neurons can be bus-OR merged via an input axon of the local averaging corelet, providing as input to the local averaging module the absolute value of the horizontal or vertical finite differences.

Motion detection

Input considerations: Source of input spikes
The input spikes correspond to the output of a Gaussian pyramid layer, typically that of a luminance channel.

Input considerations: Input spike format
The input consists of sequences of video frames entering at 30 frames per second, one frame every 33 ticks. The pixel values may be rate-coded using 0 to 15 spikes distributed across the first 15 ticks of the 33-tick window, or burst-coded.

Algorithm: Functional description
As shown in the saliency system's architecture in Figure 2, the Gaussian pyramid is sent to a motion detection module (Figure 7). The network subtracts the current frame from the k-th previous frame by calculating the absolute value of the pixel intensity differences at each scale. If this value is at least equal to a user-specified threshold tm, a single output spike is created for that pair of pixels. This spike in turn is sent to a motion history neuron, which uses the stochastic threshold parameters of the TrueNorth neuron [20] to integrate the current input spike with previous spikes and fire a spike with a probability proportional to the current membrane potential of the neuron. Notice that a negative leak is associated with the motion history neuron; as a result, in the absence of new inputs, the probability of it firing decreases with time. In other words, the motion history neuron fires with a time-weighted probability that increases as it receives more spikes over a more recent time window.
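The motion history neuron's qualitative behavior can be sketched as follows; the decay constant, gain, and linear probability mapping are our own stand-ins for the stochastic threshold machinery of [20].

```python
import random

def motion_history_step(V, motion_spike, leak=-2, gain=16, v_max=255):
    """One tick of a simplified motion history neuron (our sketch): integrate
    motion spikes, decay via a negative leak, and fire stochastically with
    probability proportional to the current membrane potential."""
    V = max(0, min(v_max, V + leak + gain * motion_spike))
    fired = random.random() < V / v_max    # stand-in for the stochastic threshold
    return V, fired
```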

Algorithm: Corelet implementation
Every input pixel goes through a splitter corelet creating two copies of the input pixel intensity's rate-coded representation. One copy goes through a number of delay neurons so that the total extra delay is 33 ticks (i.e., a k = 1 frame delay). This delayed frame and the current frame finally go into the main motion detection network axon inputs (see Figure 7). These two rate-coded inputs are subtracted from one another using two neurons H and I. One neuron's membrane potential is increased in proportion to the positive difference of the inputs, and the other neuron's membrane potential is increased in proportion to the negative difference of the inputs. A synchronization neuron E is then responsible for sending a periodic probe signal at the appropriate tick after all the input spikes have entered the axons, so that the two neurons output a total of at most one spike for the current frame if and only if the absolute value of the difference between the two input rate codes is at least equal to threshold tm. These two neurons are merged using a bus-OR, which causes a neuron J to output a single spike if and only if the absolute value of the difference between the two inputs is at least equal to the desired threshold. Right after the probing control spike enters, a reset spike is sent from neuron F, which resets the membrane potentials of neurons H and I in preparation for the next frame.

Notice that there is also a suppression neuron G that is responsible for suppressing any possible outputs at the beginning of the simulation. This is important until the first delayed frame enters the input axon, to ensure that no unwarranted output spikes are created. Notice that we only need to use one periodic probe, reset, and suppress neuron per core, since the corresponding control spikes can be shared by all neurons in the crossbar. The output of neuron J is then sent to the motion history neuron K of Figure 7. As previously described, this is a leaky neuron that fires with a time-weighted probabilistic threshold that is proportional to the number of spikes that entered the neuron over a finite time window. By appropriately controlling the neuron parameters, it is possible, for example, to maintain a "motion tail" of the recent paths traversed by the moving target.

Output considerations: Output spike format
The output of stochastic neuron K is binarized so that a single spike is sent to the next corelet if and only if the stochastic neuron outputs at least a certain number of spikes during the frame.

Output considerations: Destination of output spikes
The binarized output spikes of each scale of the motion detection corelet are sent to a "normalization" subcorelet that uses nearest-neighbor interpolation to resample the corresponding map to a common resolution. This in effect registers the images to a common coordinate frame.

Local averaging and center surround

Input considerations: Source of input spikes
The input spikes are a function of the output of the edge extraction module.

Input considerations: Input spike format
The edge extraction module provides the positive and negative values of the horizontal and vertical partial derivatives at every pixel of an image.

Figure 7

Motion detection network. See Figure 8 for a blocked center-surround network. See Table 1 for the neuron parameter values.


These four outputs are fused at the corresponding input axon of the local-averaging module by a bus-OR operation, providing a simple and efficient representation of the intensity of the edge at the corresponding pixel. Each one of the four inputs that are bus-ORed is rate-coded using 0 to 15 spikes distributed across the first 15 ticks of the 33-tick window.

Algorithm: Functional description
Figure 8 shows the 15 × 15 center-on, surround-off operator used, where the central 5 × 5 pixels correspond to the center-on region. Compared with often-used operators, such as difference of Gaussians, the Haar-like operator with a nontrivial support region lends itself to an efficient implementation on TrueNorth.

There are two main components to the operator. The first component is local averaging, which estimates the 5 × 5 average pixel value at each scale and pixel that is output by the corresponding absolute gradient networks (see Figures 2 and 6). Then, based on the definition of image gratings given above for the Gaussian convolution corelet, each locally averaged matrix/image is decomposed into 25 grates/sub-images, by using an f = 5 grating parameter. Similar to the case of the 5 × 5 Gaussian, each grate is an independent non-overlapping sub-matrix/sub-image containing all the information needed to calculate the center-surround response.

Algorithm: Corelet implementation
Figure 8 and Table 1 show the neuron tiling and neuron parameters used to apply the center-surround operator on a 2D rate-coded input matrix corresponding to each one of these grates. The placement of the three axon types {a, b, c} across a grate's pixels is shown, which is similar to the placement for the Gaussian kernel. The fourth available axon type is used for the reset axon, which resets the membrane potential at the end of the frame. An output neuron type from A, B, and C is associated with each grate pixel, where each output neuron's corresponding 3 × 3 receptive field is centered at an axon/input pixel of type a, b, and c, respectively.

By distributing the neuron types within each grate in the manner shown in Figure 8, it is possible to create a network that uses nine 5 × 5 pixel sub-tiles to patch together a 15 × 15 center-surround operator, which outputs a single spike if the result is at least equal to the user-specified threshold tc. Neurons A and B output the thresholded center-surround response, while neuron D outputs the average response of the surround region, which is later combined with a copy of a 5 × 5 center region (neuron E) by using a "positive difference" neuron of type C to give the thresholded center-surround response for the rest of the image pixels.

Figure 8
The blocked center-surround network is implemented as a cascade of filtering operations applied on the original input. See Table 1 for the neuron parameter values.


Similar to the previously described synchronization design pattern, the fourth axon type in each grate is dedicated as a reset axon with a negative synaptic weight of −255, and linear combinations of the other two or three positively weighted axon types of a neuron are defined so that control spikes, which arrive simultaneously and one tick before the reset spike, increase the membrane by a sufficient amount to cause a single output spike only when the center is greater than the average of the surround by at least tc. Mathematically, this can be expressed as

8 · center(x, y) − Σ_{i=1}^{8} avg_i(x, y) ≥ 8 · tc,

where center(x, y) denotes the 5 × 5 local average of the patch centered at pixel (x, y), and avg_i(x, y), for i = 1, ..., 8, denotes the local averages of the eight 5 × 5 patches that have a horizontal or vertical distance of 5 pixels from pixel (x, y), or in other words are centered at pixels {(x−5, y−5), (x−5, y), (x−5, y+5), (x, y+5), (x, y−5), (x+5, y−5), (x+5, y), (x+5, y+5)} of the original input image. Notice that for uniform image regions the filter gives no response.

Within each grate, the problem of center-surround operators that overlap multiple cores is addressed in the same way it was addressed for the Gaussian pyramid, namely by using splitters to replicate pixels whose values are used by multiple neighboring cores.
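In dense-image terms, the operator evaluates this inequality at every pixel; the numpy sketch below is ours and ignores the grating decomposition, which only affects how the computation is tiled onto cores.

```python
import numpy as np
from scipy import ndimage

def center_surround_spikes(edge_map, tc):
    """15x15 center-on, surround-off test (our dense sketch): one spike wherever
    the 5x5 center average exceeds the mean of the eight flanking 5x5 averages
    by at least tc, i.e., 8*center - sum(avg_i) >= 8*tc."""
    avg5 = ndimage.uniform_filter(edge_map.astype(float), size=5)
    surround = sum(np.roll(avg5, (dy, dx), axis=(0, 1))
                   for dy in (-5, 0, 5) for dx in (-5, 0, 5)
                   if (dy, dx) != (0, 0))
    return (8 * avg5 - surround >= 8 * tc).astype(np.uint8)
```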

Output considerations: Output spike format
Either one or zero spikes are output per frame for each output neuron on which a center-surround operator is centered. The output spikes might have an extra delay associated with them to synchronize their arrival at the destination axon with the outputs from other modules.

Output considerations: Destination of output spikes
Similar to the output for the motion detection module, the spikes are sent to a "normalization" subcorelet that uses nearest-neighbor interpolation to resample the corresponding map to a common resolution. This in effect registers the binary images to a common coordinate frame.

Weighted multichannel merge

Input considerations: Source of input spikes
As discussed in the introduction, one of the core components of the saliency map hypothesis is the existence of a stage where the individual feature maps are merged into a single master saliency map. This paper's saliency system merges the maps through a preprocessing stage for registering and smoothing the individual multiscale feature maps, followed by the application of a network for merging input feature maps by applying cascades of (and possibly alternating) weighted max and weighted average operations (see Figure 2). The first stage of this merging phase consists of a normalization/registration that aligns all the multiscale motion and center-surround maps into a common resolution. The input for this normalization routine is provided by the motion detection and center-surround routines previously described.

Input considerations: Input spike format
The input is binary and therefore consists of 0 or 1 spike per frame per pixel.

Algorithm: Functional description
The registration downsamples or upsamples some of the feature maps through a nearest-neighbor interpolation corelet (implemented through a splitter network), in order to bring the feature maps to a common resolution. This is then followed by a Gaussian smoothing stage that inhibits speckle noise and enhances the centroids of salient patches.

For the subsequent weighted max operator, a gain value is assigned to the input rate code at the moment the code is created at the source, which effectively multiplies the number of input spikes by a weight of significance specified at corelet/network instantiation time. By merging the burst-rate codes through a bus-OR, we obtain the maximum of the input rate codes. A cascade of weighted averages can then merge multiple results of the weighted max operation into a single saliency map.
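The reason a bus-OR computes a maximum is that burst codes are left-aligned: OR-ing bursts of lengths m and n gives a burst of length max(m, n). A small sketch of this (ours; the gains 2 and 1 are arbitrary):

```python
def burst(value, window=16):
    """Left-aligned burst code: 'value' consecutive spikes from tick 0."""
    return [1] * value + [0] * (window - value)

def bus_or(trains):
    """Bus-OR merge: spikes arriving at an axon in the same tick collapse into one."""
    return [int(any(t)) for t in zip(*trains)]

# Gains applied at the source scale the burst lengths; the OR keeps the longest burst.
merged = bus_or([burst(2 * 3), burst(1 * 5)])   # weighted inputs: 6 and 5 spikes
assert sum(merged) == max(6, 5)                 # the merged rate is the weighted max
```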

Algorithm: Corelet implementation
The weighted average operation relies to a large extent on the weighted average neuron, an instantiation of which was previously used for the Gaussian pyramid corelet (see neuron C of the Gaussian convolution in Figure 5). Assume that the user wishes to use the weighted average neurons to merge n feature maps, where a positive weight of importance is associated with each feature map. The first step is to normalize the weights so that their sum does not exceed 255. This constraint is attributed to the use of an 8-bit synaptic weight in TrueNorth. Since each weighted average neuron can find the average of two inputs, we apply a cascade of weighted pairwise merges, where at each layer of the cascade, the merge weight of each input is equal to the summation of the weights of all previously merged features that the input rate code represents.

As shown in Figure 2, in the basic saliency system a single weighted max operation is first applied to each triple of corresponding pixels from the motion or center-surround sequence. This results in numerous merged maps, where each map contains the cross-scale maximum of motion and center-surround responses. The output of this operation is then followed by a number of weighted merge operations, which merge the resulting maps across all channels into the final saliency map. In Figure 9, we show the resulting raw saliency maps, as well as their peaks, which result after thresholding the raw saliency maps. Note that the corelet is flexible in the actual sequence of operations performed. For example, with different corelet parameter values it is possible to fuse all maps using a single max operation without any Gaussian smoothing or averaging.
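The pairwise cascade can be pictured as follows; this Python sketch (ours) carries forward, at every merge, the accumulated weight of each operand, as the text prescribes, and ignores the 255-weight normalization constraint.

```python
def cascade_weighted_average(values_and_weights):
    """Merge n (value, weight) pairs via repeated pairwise weighted averages,
    where each operand's merge weight is the sum of all weights already
    folded into it (our sketch of the cascade)."""
    items = list(values_and_weights)
    while len(items) > 1:
        (v1, w1), (v2, w2) = items.pop(0), items.pop(0)
        merged = (w1 * v1 + w2 * v2) / (w1 + w2)
        items.append((merged, w1 + w2))   # the accumulated weight carries forward
    return items[0][0]

# cascade_weighted_average([(10, 1), (20, 2), (30, 1)]) == 20.0, the exact
# weighted average (1*10 + 2*20 + 1*30) / 4.
```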

Output considerations: Output spike format
The output consists of a single saliency map (typically 120 × 68 or 240 × 136 pixels) where each pixel outputs between 0 and 15 spikes per frame, with more spikes denoting a more salient region. Sometimes an extra thresholding is performed on this map by suppressing to zero the regions that have below a certain number of spikes, as this helps better visualize the level sets.

Output considerations: Destination of output spikes
The output spikes are used by whatever module might "want" to use the saliency system, such as a router system that simulates eye movements.

Results and discussion
The system is evaluated on the publicly available NeoVision Tower dataset [28, 29], which contains annotated video sequences of pedestrians, cars, buses, trucks, and bicyclists. This provides a qualitative and quantitative measure of system performance on a real-world problem. To the best of our knowledge, this constitutes the first attempt at constructing a saliency model that is implemented and tested on a low-power spiking neurosynaptic architecture. Our results provide an early proof-of-concept that TrueNorth provides a viable platform for a low-power implementation of a complex application.

The dataset consists of 45,000 RGB frames at a resolution of 1,920 × 1,088 pixels per frame (width × height). The video was recorded at 30 frames per second, for a total of 25 minutes of video. See Figure 2 for an example RGB frame. The transduction process converts each input frame to a 240 × 136 pixel three-channel input frame, where each channel is quantized to 4 bits from the original 8 bits. This results in a 0 to 15 spike rate-coded representation for every pixel of every input channel, and an average of ~23,500,000 input spikes per second.

The basic saliency system was optimized for the L*logYb* space (a mix of channels from the CIE L*a*b* and YCbCr color spaces) and was tested using a model consisting of 13,727 neurosynaptic cores (about 3 million neurons).

Figure 9
Examples of input frames for different camera orientations and illuminations (column 1), the corresponding saliency maps (column 2), and the thresholded saliency maps that result after suppressing any values below a certain threshold (column 3).


The network size depends on many of the corelet parameters, such as the number of layers and the resolutions/filters of the Gaussian pyramids, as well as the parameter k of the motion detector, which specifies the number of frames separating the frame pairs used in motion detection, which in turn affects the number of delay neurons needed. Under appropriate parameters, the corelet can generate saliency models that process a maximum of 52 frames per second, since each frame can be pipelined within a minimum of 19 ticks for 4-bit inputs; each extra Gaussian pyramid layer is the most costly factor that decreases the maximum frame rate. Our basic model and synchronization neuron parameters were set to expect a single frame every 33 ticks. The motion detector was applied only to the luminance channel, and the center-surround operator was applied to all three channels.

The system accuracy was measured with respect to various overlap ratios between salient/non-salient pixels and foreground annotated objects versus background pixels. The following list provides more details. (i) The true positive rate (TP) is defined as the number of ground-truth patches overlapping at least one salient pixel, divided by the number of ground-truth patches. (ii) The positive predictive value (PPV) is defined as the number of salient-labeled pixels overlapping a ground-truth patch, divided by the number of all salient-labeled pixels. (iii) The true negative value (TN) is the number of non-salient pixels not overlapping a ground-truth patch, divided by the number of pixels not overlapping a ground-truth patch. (iv) The negative predictive value (NPV) is the number of non-salient pixels not overlapping a ground-truth patch, divided by the number of non-salient pixels. For completeness, the YCbCr and CIE L*a*b* spaces were also tested under slightly different model parameters/thresholds, and gave performance results (see Table 2) demonstrating that system performance under 4-bit quantization is only slightly affected by the choice of color representation.

We also executed on a single TrueNorth chip, consisting of 4,096 cores, a smaller version of the saliency system (four chips are necessary to run the full 13,727-core system). The smaller version uses a single input channel (luminance) to calculate the Gaussian pyramid, motion detection, center-surround, and merging. On a 4-second test video sequence, we measured a power consumption of 50 mW, and based on this we extrapolate a power consumption of ~200 mW for the full system. On average, each neuron's firing rate for the single-chip version of the basic saliency system was ~97 spikes/second, or ~3 spikes/frame.

Qualitatively, we observe that the system detects most moving objects, and has trouble detecting small moving objects with low contrast. Most of the non-annotated salient moving-object detections occurred due to splashing water near a fountain. While gusts of wind often caused foliage to move, the resolution of the input stream diminished such false detections. Salient non-moving objects, such as sitting pedestrians, were usually detected by the center-surround operators. False positive detections typically occurred near objects whose structure varied significantly compared to their surroundings, such as lamp-posts (see Figures 2 and 9).
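Stated as code over binary masks, the four measures are computed as follows (our sketch; `salient` is the thresholded saliency mask and `patches` is a list of boolean ground-truth masks):

```python
import numpy as np

def detection_metrics(salient, patches):
    """TP, PPV, TN, NPV as defined above, over boolean 2-D masks (our sketch)."""
    gt_any = np.any(patches, axis=0)      # union of all ground-truth patches
    tp = np.mean([np.any(salient & p) for p in patches])       # patches hit
    ppv = (salient & gt_any).sum() / max(salient.sum(), 1)     # salient pixels on GT
    tn = (~salient & ~gt_any).sum() / max((~gt_any).sum(), 1)  # background kept quiet
    npv = (~salient & ~gt_any).sum() / max((~salient).sum(), 1)
    return tp, ppv, tn, npv
```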

Conclusion
We described an implementation of a bottom-up visual saliency algorithm on a spiking, neurosynaptic, non-von Neumann, parallel, distributed, real-time, energy-efficient architecture that supplies an alternative template to the prevailing von Neumann architecture for future research in cognitive computing [30]. Future improvements include the use of different spike coding schemes for producing saliency maps with higher dynamic ranges, the inclusion of more channels, and more features [31].

Acknowledgments
We thank Hayley Wu and Marc Gonzalez-Tallada for providing suggestions that improved this paper's presentation. This research was sponsored by DARPA (Defense Advanced Research Projects Agency) under contract No. HR0011-09-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or the U.S. Government.

*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.

**Trademark, service mark, or registered trademark of Linus Torvalds, Apple, Inc., or Microsoft Corporation in the United States, other countries, or both.

Table 2
System performance achieved for various color representations.


References1. J. K. Tsotsos, A Computational Perspective on Visual Attention.

Cambridge, MA, USA: MIT Press, 2011.2. S. W. Zucker, BStereo, shading, surfaces: Curvature

constraints couple neural computations,[ Proc. IEEE, vol. 102,no. 5, pp. 812–829, May 2014.

3. A. Andreopoulos and J. K. Tsotsos, B50 years of objectrecognition: Directions forward,[ Comput. Vis. ImageUnderstanding, vol. 117, no. 8, pp. 827–891, Aug. 2013.

4. G. Cauwenberghs, BReverse engineering the cognitive brain,[Proc. Nat. Academy Sci., vol. 110, no. 39, pp. 15512–15513,Sep. 2013.

5. G. Indiveri, B. Linares-Barranco, T. J. Hamilton, A. van Schaik,R. Etienne-Cummings, T. Delbruck, S. Liu, P. Dudek,P. Hafliger, S. Renaud, J. Schemmel, G. Cauwenberghs,J. Arthur, K. Hynna, F. Folowosele, S. Saighi,T. Serrano-Gotarredona, J. Wijekoon, Y. Wang, andK. Boahen, BNeuromorphic silicon neuron circuits,[ FrontiersNeurosci., vol. 5, no. 73, May 2011.

6. E. Neftcia, J. Binasa, U. Rutishauserb, E. Chiccaa, G. Indiveria,and R. J. Douglas, BSynthesizing cognition in neuromorphicelectronic systems,[ Proc. Nat. Academy Sci., vol. 110, no. 37,pp. E3468–E3476, Jun. 2013.

7. R. J. Vogelstein, U. Mallik, E. Culurciello, G. Cauwenberghs, andR. Etienne-Cummings, BSaliency-driven image acuity modulationon a reconfigurable array of spiking silicon neurons,[ in Proc. Adv.Neural Inf. Process. Syst., 2004, pp. 1457–1464.

8. R. J. Vogelstein, U. Mallik, J. T. Vogelstein, andG. Cauwenberghs, BDynamically reconfigurable silicon array ofspiking neurons with conductance-based synapses,[ IEEE Trans.Neural Netw., vol. 18, no. 1, pp. 253–265, Jan. 2007.

9. S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, BTheSpiNNaker project,[ Proc. IEEE, vol. 102, no. 5, pp. 652–665,May 2014.

10. K. A. Zaghloul and K. Boahen, BA silicon retina thatreproduces signals in the optic nerve,[ J. Neural Eng., vol. 3, no. 4,pp. 257–267, Dec. 2006.

11. L. Itti and C. Koch, BComputational modelling of visualattention,[ Nat. Rev. Neurosci., vol. 2, no. 3, pp. 194–203,Mar. 2001.

12. A. Treisman and G. Gelade, BA feature integration theory ofattention,[ Cognitive Psychol., vol. 12, no. 1, pp. 97–136,Jan. 1980.

13. C. Koch and S. Ullman, BShifts in selective visual attention:Towards the underlying neural circuitry,[ Human Neurobiol.,vol. 4, no. 4, pp. 219–227, 1985.

14. L. Itti, C. Koch, and E. Niebur, BA model of saliency-based visualattention for rapid scene analysis,[ IEEE Trans. Pattern AnalysisMach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.

15. D. J. Berg, S. E. Boehnke, R. A. Marino, D. P. Munoz, and L. Itti,BFree viewing of dynamic stimuli by humans and monkeys,[J. Vis., vol. 9, no. 5, p. 19, May 2009.

16. L. Itti, BAutomatic foveation for video compression using aneurobiological model of visual attention,[ IEEE Trans. ImageProcess., vol. 13, no. 10, pp. 1304–1318, Oct. 2004.

17. T. Judd, K. Ehinger, F. Durand, and A. Torralba, BLearning topredict where humans look,[ in Proc. IEEE 12th Int. Conf.Comput. Vis., 2009, pp. 2106–2113.

18. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy,J. Sawada, F. Akopyan, B. L. Jackson, N. Imam,C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser,R. Appuswamy, B. Taba, A. Amir, M. D. Flickner,W. P. Risk, R. Manohar, and D. S. Modha, BA Millionspiking-neuron integrated circuit with a scalable communicationnetwork and interface,[ Science, vol. 345, no. 6197, pp. 668–673,Aug. 2014.

19. A. Cassidy, R. Alvarez-Icaza, F. Akopyan, J. Sawada,J. V. Arthur, P. A. Merolla, P. Datta, M. Gonzalez Tallada,B. Taba, A. Andreopoulos, A. Amir, S. K. Esser,J. Kusnitz, R. Appuswamy, C. Haymes, B. Brezzo,R. Moussalli, R. Bellofatto, C. Baks, M. Mastro, K. Schleupen,

C. E. Cox, K. Inoue, S. Millman, N. Imam, E. McQuinn,Y. Y. Nakamura, I. Vo, C. Guo, D. Nguyen, S. Lekuch,S. Asaad, D. Friedmann, B. L. Jackson, M. D. Flickner,W. P. Risk, R. Manohar, and D. S. Modha, BReal-timescalable cortical computing at 46 giga-synaptic ops/watt with�100� speedup in time-to-solution and �100; 000� reductionin energy-to-solution,[ in Int. Conf. High Perform.Comput., Netw., Storage Anal.VSupercomput.,2014, pp. 27–38.

20. A. S. Cassidy, P. Merolla, J. V. Arthur, S. K. Esser, B. Jackson,R. A Icaza, P. Datta, J. Sawada, T. M. Wong, V. Feldman,A. Amir, D. Ben-Dayan Rubin, F. Akopyan, E. McQuinn,W. P. Risk, and D. S. Modha, BCognitive computing buildingblock: A versatile and efficient digital neuron model forneurosynaptic cores,[ in Proc. IEEE Int. Joint Conf. NeuralNetw., 2013, pp. 1–10.

21. S. K. Esser, A. Andreopoulos, R. Appuswamy, P. Datta,D. Barch, A. Amir, J. Arthur, A. Cassidy, M. Flickner, P. Merolla,S. Chandra, N. Basilico, S. Carpin, T. Zimmerman, F. Zee,R. A. Icaza, J. A. Kusnitz, T. M. Wong, W. P. Risk,E. McQuinn, T. K. Nayak, R. Singh, and D. S. Modha, BCognitivecomputing systems: Algorithms and applications for networksof neurosynaptic cores,[ in Proc. IEEE Int. Joint Conf. NeuralNetw., 2013, pp. 1–10.

22. A. Amir, P. Datta, W. P. Risk, A. S. Cassidy, J. A. Kusnitz, S. K. Esser, A. Andreopoulos, T. M. Wong, M. Flickner, R. Alvarez-Icaza, E. McQuinn, B. Shaw, N. Pass, and D. S. Modha, "Cognitive computing programming paradigm: A corelet language for composing networks of neurosynaptic cores," in Proc. IEEE Int. Joint Conf. Neural Netw., 2013, pp. 1–10.

23. G. S. Banavar, "An application framework for compositional modularity," Ph.D. dissertation, Univ. Utah, Salt Lake City, UT, USA, 1995.

24. T. M. Wong, R. Preissl, P. Datta, M. Flickner, R. Singh, S. K. Esser, E. McQuinn, R. Appuswamy, W. P. Risk, H. D. Simon, and D. S. Modha, "10^14," IBM Res. Div., Armonk, NY, USA, Res. Rep. RJ10502, 2012.

25. R. Preissl, T. M. Wong, P. Datta, M. Flickner, R. Singh, S. K. Esser, W. P. Risk, H. D. Simon, and D. S. Modha, "Compass: A scalable simulator for an architecture for cognitive computing," in Proc. IEEE Int. Conf. High Perform. Comput., Netw., Storage Anal. (SC), 2012, p. 54.

26. P. J. Burt, "Fast filter transforms for image processing," Comput. Graph. Image Process., vol. 16, no. 1, pp. 20–51, May 1981.

27. D. H. Hubel, "The visual cortex of the brain," Sci. Amer., vol. 209, no. 5, pp. 54–62, 1963.

28. Neovision2 Dataset, iLab, University of Southern California, Los Angeles, CA, USA. [Online]. Available: http://ilab.usc.edu/neo2/dataset/

29. R. Kasturi, D. Goldgof, R. Ekambaram, G. Pratt, E. Krotkov, D. D. Hackett, Y. Ran, Q. Zheng, R. Sharma, M. Anderson, M. Peot, M. Aguilar, D. Khosla, Y. Chen, K. Kim, L. Elazary, R. C. Voorhies, D. F. Parks, and L. Itti, "Performance evaluation of neuromorphic-vision object recognition algorithms," in Proc. Int. Conf. Pattern Recog., 2014, pp. 2401–2406.

30. B. Shaw, A. Cox, P. Besterman, J. Minyard, C. Sassano, R. Alvarez-Icaza, A. Andreopoulos, R. Appuswamy, A. Cassidy, S. Chandra, P. Datta, E. McQuinn, W. Risk, and D. S. Modha, "Cognitive computing commercialization: Boundary objects for communication," in Proc. Int. Conf. IDEMI, Porto, Portugal, Sep. 4–6, 2013, pp. 1–10.

31. J. M. Wolfe and T. S. Horowitz, "What attributes guide the deployment of visual attention and how do they do it?" Nat. Rev. Neurosci., vol. 5, no. 6, pp. 495–501, Jun. 2004.

Received May 1, 2014; accepted for publication June 5, 2014



Alexander Andreopoulos IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Andreopoulos is a Research Staff Member working on the Cognitive Computing/DARPA SyNAPSE project at IBM Research - Almaden. His research interests lie in the areas of computer vision, vision-based robotics, computational neuroscience, machine learning, and medical imaging. He has an Honors B.Sc. degree from the University of Toronto in computer science (first class honors) and mathematics, as well as M.Sc. and Ph.D. degrees in computer science from York University, Toronto, Canada.

Brian Taba IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Taba is a Software Engineer working on the DARPA SyNAPSE project in the Cloud and Synaptic Systems department at the Almaden Research Center. He received a B.S. degree in electrical engineering from the California Institute of Technology in 1999 and a Ph.D. degree in bioengineering from the University of Pennsylvania in 2005.

Andrew S. Cassidy IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Cassidy is a Research Staff Member on the SyNAPSE project at the Almaden Research Center. He received M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University and Johns Hopkins University in 2002 and 2010, respectively. He subsequently joined IBM at the Almaden Research Center, where he has worked on large-scale neural computing architecture with the SyNAPSE team. He is author or coauthor of over 20 technical papers. Dr. Cassidy is a member of the Institute of Electrical and Electronics Engineers (IEEE), Tau Beta Pi, and Eta Kappa Nu.

Rodrigo Alvarez-Icaza IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Alvarez is a Research Staff Member on the SyNAPSE project at the Almaden Research Center. He received a B.S. degree in mechanical and electrical engineering from Universidad Iberoamericana, Mexico City, an M.S. degree in bioengineering from the University of Pennsylvania, and a Ph.D. degree in bioengineering from Stanford University in 1999, 2005, and 2010, respectively. His research focuses on brain-inspired computer architectures and spans all layers of hardware. He is author or coauthor of over 20 patents and 12 technical publications.

Myron D. Flickner IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Mr. Flickner is an engineer, manager, inventor, programmer, and author with more than 20 patents and more than 75 publications in the areas of image analysis, computer virus detection, retail, neuromorphic systems, and human-computer interaction. He currently works at IBM Research - Almaden on cognitive computing, creating brain-inspired low-power computer systems. Mr. Flickner joined IBM Research - Almaden in 1982, working on automated inspection of thin-film disk heads. Since then, he has held a variety of roles at IBM and Google. He received a B.S. degree (1980) and an M.S. degree (1982) in electrical engineering from Kansas State University.

William P. Risk IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Risk is a Senior Engineer and Technical Project Manager working on the DARPA SyNAPSE project at IBM Research - Almaden. He received a B.S.E. degree in electrical engineering from Arizona State University in 1982, and M.S. and Ph.D. degrees in electrical engineering from Stanford University in 1983 and 1986, respectively. He subsequently joined IBM at the Almaden Research Center, where he has worked on lasers, optics, optical storage, quantum cryptography, nanoscale devices, visualization, and neurosynaptic systems. He is author or coauthor of 114 publications, 15 patents, and one book. Dr. Risk is a Fellow of the Optical Society of America and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Arnon Amir IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Amir is a Research Staff Member in the Cognitive Computing Group at the IBM Almaden Research Center, where he works on the DARPA SyNAPSE project. As a member of the software team, he develops the corelet programming paradigm and new algorithms for neurosynaptic computational substrates. He received his B.Sc. degree in electrical and computer engineering from Ben Gurion University, Israel, in 1989, and M.Sc. and D.Sc. degrees in computer science from the Technion - Israel Institute of Technology in 1992 and 1997, respectively. Since joining IBM in 1997, he has worked on a number of projects, from eye-gaze tracking and human-computer interaction, to speech and video indexing and retrieval, to video archival and tape storage. He initiated and co-invented the Emmy-awarded Linear Tape File System (LTFS). Dr. Amir has coauthored more than 70 technical papers and 20 issued patents. He has served as program chair and in other roles at various international conferences in computer vision and multimedia. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Paul A. Merolla IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Merolla received his B.S. degree with high distinction in electrical engineering from the University of Virginia, Charlottesville, Virginia, in 2000 and his Ph.D. degree in bioengineering from the University of Pennsylvania, Philadelphia, in 2006. He was a Post-Doctoral Scholar in the Brains in Silicon Lab at Stanford University (2006–2009), working as a lead chip designer on Neurogrid, an affordable supercomputer for neuroscientists. Since 2010, he has been a Research Staff Member at the IBM Almaden Research Center, where he was a lead chip designer for the first fully digital neurosynaptic core as part of the DARPA-funded SyNAPSE project and, more recently, for the TrueNorth chip with one million neurons and 256 million synapses, which consumes less than 100 mW. His research involves building more intelligent computers, drawing inspiration from neuroscience, neural networks, and machine learning. His interests include low-power neuromorphic systems, asynchronous circuit design, large-scale modeling of cortical networks, statistical mechanics, machine learning, and probabilistic computing.

John V. Arthur IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Arthur is a Research Staff Member working on the SyNAPSE project. He received a B.S.E. degree in electrical engineering from Arizona State University in 2000 and a Ph.D. degree in bioengineering from the University of Pennsylvania in 2006. He was a postdoctoral scholar in bioengineering at Stanford University. His research focuses on applying brain-inspired principles to chip design and architecture, with interests including dynamical systems, neuromorphic and neurosynaptic architecture, and hardware-aware algorithm design.

David J. Berg IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Berg is a Senior Software Engineer in the Cognitive Computing Group at the IBM Almaden Research Center. He received a B.S. degree in cognitive science from the University of California at San Diego in 2003 and a Ph.D. degree in computational neuroscience from the University of Southern California in 2013. He subsequently joined IBM at the Almaden Research Center, where his work is focused on software tools and visual processing algorithms for the DARPA SyNAPSE project.



Jeff A. Kusnitz IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Mr. Kusnitz has been with IBM for more than 26 years. He spent 15 of those years working in IBM Research and Software Group organizations on speech and telephony platforms and technologies, in roles ranging from software development to platform integration to worldwide standards as IBM's representative to several industry standards organizations. Following that, he worked with IBM Research's WebFountain and Semantic Super Computing organizations, focusing primarily on enterprise-scale text analytics and indexing, developing and maintaining several services used within IBM to manage and mine IBM-internal web pages. He is currently working at IBM Research - Almaden in the Cognitive Computing Group, focusing on infrastructure, tooling, and simulators.

Pallab Datta IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Datta is a Research Staff Member on the DARPA SyNAPSE project in the Cognitive Computing Group at the IBM Almaden Research Center. He received a B.E. degree in electronics engineering from the University of Allahabad, India, in 1999, and a Ph.D. degree in computer engineering from Iowa State University in 2005. Prior to joining the IBM Almaden Research Center, Dr. Datta worked at The Neurosciences Institute in San Diego, California, was a Technical Staff Member at Los Alamos National Laboratory in the Information Sciences (CCS-3) Division, and was a visiting researcher at INRIA, Sophia-Antipolis, France. He is currently working on large-scale simulations using the IBM neurosynaptic core simulator (Compass) and on development of the corelet programming language for programming reconfigurable neurosynaptic hardware. He is also involved in the development of algorithms and applications with networks of neurosynaptic cores for building cognitive systems. His technical interests include neuromorphic architecture and simulation, high-performance computing, machine learning, optimization techniques, and graph theory. He is author or coauthor of several patents and 20 technical papers. Dr. Datta is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM).

Steve K. Esser IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Esser is a Research Staff Member working on the DARPA SyNAPSE project in the Cloud and Synaptic Systems department at the IBM Almaden Research Center. He received B.S. and Ph.D. degrees from the University of Wisconsin-Madison, where he developed computational models of the brain during sleep and wakefulness. His current research focuses on brain-inspired algorithms and applications for operation on the TrueNorth chip.

Rathinakumar Appuswamy IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Appuswamy received his B.Tech. degree from Anna University, Chennai, India, and his M.Tech. degree from the Indian Institute of Technology, Kanpur, India, both in electrical engineering, in 2002 and 2004, respectively. He received an M.A. degree in mathematics and a Ph.D. degree in electrical and computer engineering, both from the University of California, San Diego, in 2008 and 2011, respectively. During 2011, he was a postdoctoral researcher at IBM Research - Almaden, and since 2012 he has been a Research Staff Member. His research interests include multi-modal learning, network coding, communication for computing, and network information theory.

Davis R. Barch IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Barch is a Senior Software Engineer in the Cognitive Computing Group at the IBM Almaden Research Center. He received a B.S. degree in chemistry from The George Washington University, an M.S. degree in biochemistry from the University of Pittsburgh, an M.S. degree in computer science from the University of California at Santa Barbara, and a Ph.D. degree in vision science from the University of California at Berkeley. He joined the Cognitive Computing Group at the IBM Almaden Research Center in 2010. Dr. Barch is a member of the American Association for the Advancement of Science (AAAS).

Dharmendra S. Modha IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Modha is an IBM Fellow and IBM Chief Scientist, Brain-Inspired Computing. He is also Principal Investigator for the DARPA SyNAPSE project. He holds a B.Tech. degree in computer science and engineering from IIT Bombay (1990) and a Ph.D. degree in electrical and computer engineering from the University of California at San Diego (1995). Dr. Modha has authored more than 60 publications in international journals and conferences, holds more than 50 U.S. patents, and is an IBM Master Inventor. He is a member of the IBM Academy of Technology, the American Association for the Advancement of Science, the Association for Computing Machinery, and the Society for Neuroscience. He is also a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the World Technology Network. He has received the FAST (File and Storage Technologies) Test-of-Time Award; the Best Paper Award at IDEMI (International Conference on Integration of Design, Engineering, and Management for Innovation); First Place in the 2012 Science/NSF International Science and Engineering Visualization Challenge, Illustration Category; the Best Paper Award at ASYNC (International Symposium on Asynchronous Circuits and Systems); and the ACM Gordon Bell Prize. SyNAPSE was named one of the Best Innovation Moments of 2011 by The Washington Post, and Dr. Modha was named one of the "10 Electronics Visionaries to Watch" by EE Times on its 40th anniversary.
