11
. ................................................................................................................................................................................................................ MBUS:ASYSTEM INTEGRATION BUS FOR THE MODULAR MICROSCALE COMPUTING CLASS . ................................................................................................................................................................................................................ MBUS IS A NEW INTERCHIP INTERCONNECT MADE OF TWO SHOOT-THROUGH” RINGS THAT RESOLVES FUNDAMENTAL SIZE AND POWER ISSUES THAT PREVENT THE DESIGN OF COMPOSABLE MICROSCALE SYSTEMS. MBUS INTRODUCES POWER-OBLIVIOUS COMMUNICATION, WHICH GUARANTEES MESSAGE RECEPTION REGARDLESS OF THE RECIPIENTS POWER STATE.THIS DISENTANGLES POWER MANAGEMENT FROM COMMUNICATION, GREATLY SIMPLIFYING THE CREATION OF VIABLE, MODULAR, AND HETEROGENEOUS SYSTEMS THAT OPERATE ON THE ORDER OF NANOWATTS. ...... Bell’s law follows the trend from mainframes to desktops to mobile phones and predicts the emergence of a new class of computing roughly every decade. Today, we see the emergence of the wearable class: centi- meter-scale, wireless, battery-powered devi- ces. We look ahead to the next generation, the microscale computing class, and make the critical observation that even if we scale all the components, the processor, the radio, sensors, and so on, we still will not be able to produce viable microscale systems. The problem is rooted in the interconnect technologies used to synthesize systems. Micro- scale systems such as the pressure sensor in Figure 1 by their definition introduce unprece- dented constraints on size, which in turn intro- duces unprecedented constraints on available energy. Since their invention around the early 1980s, the Serial Peripheral Interface (SPI) and Inter-Integrated Circuit ðI 2 CÞ buses have served as the interconnects for the vast majority of embedded designs. In the creation of modu- lar, microscale systems, we discover that the area demands of SPI and the energy demands of I 2 C render them fundamentally unusable for composing the next class of computing. To address these limitations, we introduce MBus, a new interchip interconnect designed for the composition of current and future microscale and resource-constrained sys- tems. 1 MBus connects components with a pair of “shoot-through” rings, CLK and DATA, and delivers a superset of the features from I 2 C and SPI but at lower power, with a fixed area and pin count, using fully synthe- sizable logic, and minimal protocol overhead. We designed MBus from the ground up, lev- eraging a principled approach that begins by identifying the key properties of interconnect technologies from extant and newly emerging embedded designs. Pat Pannuto University of Michigan Yoonmyung Lee Sungkyunkwan University Ye-Sheng Kuo ZhiYoong Foo Benjamin Kempke Gyouho Kim Ronald G. Dreslinski David Blaauw Prabal Dutta University of Michigan ....................................................... 60 Published by the IEEE Computer Society 0272-1732/16/$33.00 c 2016 IEEE

MBUS:ASYSTEM INTEGRATION BUS MODULAR …web.eecs.umich.edu/faculty/blaauw/images/pdfs/Pannuto,MBus-a... · clockless silicon is a challenging problem. The ... on more than a dozen

  • Upload
    lymien

  • View
    214

  • Download
    1

Embed Size (px)

Citation preview

.................................................................................................................................................................................................................

MBUS: A SYSTEM INTEGRATION BUSFOR THE MODULAR MICROSCALE

COMPUTING CLASS.................................................................................................................................................................................................................

MBUS IS A NEW INTERCHIP INTERCONNECT MADE OF TWO “SHOOT-THROUGH” RINGS

THAT RESOLVES FUNDAMENTAL SIZE AND POWER ISSUES THAT PREVENT THE DESIGN OF

COMPOSABLE MICROSCALE SYSTEMS. MBUS INTRODUCES POWER-OBLIVIOUS

COMMUNICATION, WHICH GUARANTEES MESSAGE RECEPTION REGARDLESS OF THE

RECIPIENT’S POWER STATE. THIS DISENTANGLES POWER MANAGEMENT FROM

COMMUNICATION, GREATLY SIMPLIFYING THE CREATION OF VIABLE, MODULAR, AND

HETEROGENEOUS SYSTEMS THAT OPERATE ON THE ORDER OF NANOWATTS.

......Bell’s law follows the trend frommainframes to desktops to mobile phonesand predicts the emergence of a new class ofcomputing roughly every decade. Today, wesee the emergence of the wearable class: centi-meter-scale, wireless, battery-powered devi-ces. We look ahead to the next generation,the microscale computing class, and makethe critical observation that even if we scaleall the components, the processor, the radio,sensors, and so on, we still will not be able toproduce viable microscale systems.

The problem is rooted in the interconnecttechnologies used to synthesize systems. Micro-scale systems such as the pressure sensor inFigure 1 by their definition introduce unprece-dented constraints on size, which in turn intro-duces unprecedented constraints on availableenergy. Since their invention around the early1980s, the Serial Peripheral Interface (SPI) andInter-Integrated Circuit ðI2CÞ buses have

served as the interconnects for the vast majorityof embedded designs. In the creation of modu-lar, microscale systems, we discover that thearea demands of SPI and the energy demandsof I2C render them fundamentally unusablefor composing the next class of computing.

To address these limitations, we introduceMBus, a new interchip interconnect designedfor the composition of current and futuremicroscale and resource-constrained sys-tems.1 MBus connects components with apair of “shoot-through” rings, CLK andDATA, and delivers a superset of the featuresfrom I2C and SPI but at lower power, with afixed area and pin count, using fully synthe-sizable logic, and minimal protocol overhead.We designed MBus from the ground up, lev-eraging a principled approach that begins byidentifying the key properties of interconnecttechnologies from extant and newly emergingembedded designs.

Pat Pannuto

University of Michigan

Yoonmyung Lee

Sungkyunkwan University

Ye-Sheng Kuo

ZhiYoong Foo

Benjamin Kempke

Gyouho Kim

Ronald G. Dreslinski

David Blaauw

Prabal Dutta

University of Michigan

.......................................................

60 Published by the IEEE Computer Society 0272-1732/16/$33.00�c 2016 IEEE

Because existing technologies have workedfor so long, little to no work outlines thedesign space and requirements for the inter-chip networks of embedded devices. Our firstmajor contribution was to fill this void andpresent an analysis of the requirements anddesirable features we identified from nearly adecade of designing microscale systems. Fromthis discussion emerged a new constraint,unique to the new but growing class of hyper-power-conscious systems of a “power-aware”interconnect.

With power awareness, MBus takes a chal-lenging circuits problem and a challengingsystems problem and creates an architecturalsolution. To preserve precious energy, micro-scale systems completely shut off unused com-ponents. This is not the traditional darksilicon, which still leaks static power whileunclocked. This is “pitch-black” silicon thatdraws no power but also preserves no state. Incircuit design, cold-booting pitch-black,clockless silicon is a challenging problem. TheMBus protocol adds explicit support for wak-ing powered-off nodes, enabling circuitdesigners to leave fewer than 50 gates poweredfor maximal energy savings. When designingsystems, tracking every component’s powerstate can become a challenging distributedstate problem. Because MBus controls eachcomponent’s power state, it can provide theappearance that components are always onsimply by waking them before delivering mes-sages. This lets MBus member nodes be poweroblivious, which greatly simplifies the systemdesign and enables traditional componentsthat have no power awareness to seamlesslybenefit from the power-saving capabilities ofpower-conscious elements.

We have implemented MBus front endson more than a dozen distinct chips and usedthese components to compose half a dozendifferent systems. From empirical measure-ments, MBus requires only 22.6 pJ/bit/chipon average, and its efficient transmission-levelacknowledgements provide a 90 to 99 per-cent reduction in overhead for long messages.

Shortcomings of Existing InterconnectsThe SPI bus is conceptually very simple.Direct, single-ended connections run betweena master and peripheral devices. For efficiency,

the clock and data wires (SCLK, MOSI,MISO) are shared among all peripherals, withonly one distinct chip select line running toeach slave device. As we push toward smallerand more I/O-constrained devices, however, itis this multitude of chip selects that becomesthe issue.

In a modular system with a variable (andunknown until system design time) numberof components, it is difficult to choose theright number of chip select lines a priori—too few impede modularity and too manyviolate the area constraints of microscalesystems. Furthermore, the master/slave archi-tecture actually makes I/O twice as problem-atic. A subtle, yet critical, implication of asingle-master design is that all communica-tion is master-initiated. For a sensor to signalthe microcontroller (that is, an interrupt), itrequires a second dedicated wire, a resourcethat is unavailable to microscale systems.

Another drawback of the master/slavearchitecture is that all communication betweenslave devices must go through the masternode. This more than doubles the communi-cation cost for slave-to-slave transmissions:every message is sent twice, plus the extraenergy cost of running the central controller.Owing primarily to I/O constraints, SPI andits derivatives are fundamentally incompatiblewith size-constrained microsystems.

The other primary embedded interconnecttechnology, the I2C bus, requires only two

Figure 1. Microscale pressure sensing

system on the edge of a US nickel.

Microscale systems, and the techniques

such as 3D-stacking employed to realize

their size, introduce volume and surface

area constraints that lead to extremely

limited energy storage, energy harvesting,

and I/O capabilities.

.............................................................

MAY/JUNE 2016 61

wires, clock and data (SCL and SDA), inde-pendent of the number of connected devices.Each wire is an open-collector circuit, con-nected to each device. This turns each bus lineinto a wired-AND; one or many devices candrive a 0 on the bus, but if nothing activelydrives low, then pull-up resistors pull each linehigh. This approach has the advantages ofdecentralized arbitration and multitiered pri-ority. The pull-up resistors, however, are notenergy efficient and result in designs that haveup to three orders of magnitude worse energyper bit than MBus.

To illustrate, consider an idealized I2C con-figuration running at 1.2 V that we try to opti-mize for energy consumption. I2C typicallyrequires the pull-up resistor be sized to accom-modate 400 pF of total bus capacitance, butlet us relax that to 50 pF for microscale sys-tems; fast mode I2C has a 400 kHz clock andmust reach 80 percent VDD in 300 ns, but letus relax that (eliminate setup and hold time)to the full half-cycle (1.25 ls). This relaxedI2C bus requires a pull-up resistor no greaterthan 15.5 kX. To generate the bus clock, thisresistor is shorted to ground for a half-period,dumping the charge in the bus wires, pads,and FET gates (23 pJ) and dissipating powerin the resistor (116 pJ). The clock line thenfloats for a half-cycle and the resistor pulls ithigh (35 pJ). Thus, generating the clock alonedraws 69.6 lW. Eliminating the switchingpower—the 23 pJ/bit charging and discharg-ing of the wire, pad, and gate capacitance—requires complex adiabatic clocks, which areoutside the scope of our design. MBus finds itsenergy gains by eliminating the 151 pJ/bit lostto the pull-up resistor.

In an earlier generation, we attempted tomodify I2C, replacing the energy-hungry pull-up resistors with a low-energy bus-keeper cir-cuit that preserves the last value, similar toI2C’s ultrafast mode.2 Although this improvedenergy, pushing it down to 88 pJ/bit (fourtimes that of MBus), the design required man-ual, process-specific tuning for each fabricatedchip—not an appealing property for a gen-eral-purpose bus. Furthermore, although thedesign attempted to preserve commercialinteroperability, in practice, actual interopera-tion required a field-programmable gate arrayto translate between actual I2C and the“I2C-like” bus. MBus eschews the “partial

compatibility” that I2C-like buses provide anduses the clean break to reconsider the primi-tives provided by the system interface, allowingthe addition of features such as power-oblivi-ous communication, broadcast messages, andefficient transaction-level acknowledgments.

Interconnect RequirementsHaving found show-stopping limitations withexisting interconnect technology, we decidedto turn the problem on its head. Drawingfrom our experience as embedded system andmicroscale system developers, we drew up a listof key properties that we identified for the sys-tem interconnect of ultraconstrained systems(see Table 1). Although these properties willeventually guide our development of MBus,they stand independent of the MBus design asan architectural summary of the demands foran interconnect for microscale systems.

SynthesizableTo facilitate widespread adoption, we requirea process-agnostic solution, a block of “pure”HDL with no process-specific custom macros.Our previous I2C variant required process-specific tuning of custom ratioed logic, addingcost, complexity, and risk to every chip.2

Low Wire CountWith submillimeter-scale systems, the areacost required to place bonding pads (35 to65 lm wide) on one edge or around the perim-eter of a chip limits the system’s ability to scale.Although advancements such as through-silicon vias help, many popular processes do notsupport them (such as IBM130 or TSMC65).

Address SpaceA bus must provide a means of addressingeach element, for example, with hardwaresupport (such as SPI chip selects) or explicitaddresses (such as I2C). If addresses are used,they must both support a large number ofpossible devices (a naive search of DigiKeyreturns over 215 I2C-capable devices) andminimize overhead on each transmission.

Low Standby PowerResource-constrained systems spend most oftheir time in standby, so standby inefficienciesare magnified. Existing interconnects are well-

..............................................................................................................................................................................................

TOP PICKS

............................................................

62 IEEE MICRO

suited to this, so any new bus must draw lessthan 100 pW to be competitive.

Low Active PowerMicroscale systems have extremely con-strained power budgets. Our most constrainedsystem is powered by a 0.5 lAh battery andtargets a total system active power budget ofless than 40 lW (and idle power of 20 nW).For reference, recall that the idealized I2Cclock alone uses 69.6 lW. Although any abso-lute number will be system dependent, toallow an Amdahl-balanced system design, wetarget an upper limit of 20 lW total activepower draw for the system interconnect.

Data-Independent BehaviorProtocols that use dedicated symbols to com-municate special cases (such as an end-of-message indicator) require byte stuffing,which in pathological cases can double a mes-sage’s length. This affects the ability to reasonabout protocol performance both in energyand time, and in real-time systems can leadto violations of timing requirements orrequire artificially high provisioning.

Fault ToleranceIt must be impossible for the bus to enter a“locked-up” state due to any transient faults.

These faults include spontaneous bit flips orother glitches, but cannot necessarily accountfor permanent failures such as faultyhardware.

InterruptsTo facilitate a diverse and unpredictable setof devices and system applications, any devicemust be able to initiate a transmission to anyother device at any time. This requires eitheran efficient, non-polling-based interruptmechanism or a true multi-master design.

Efficient AcknowledgmentsMany applications require reliable messagetransport. This feature can be directly sup-ported by the bus protocol in hardware or asan optional software feature if it can be madesufficiently low-overhead.

Power AwareUnlike deep sleep, which still loses energy tostatic leakage, a power-gated circuit loses allstate. Ultraconstrained systems need to fre-quently cold boot and shut down subcircuitswithout affecting active areas. To power on apower-gated circuit reliably and withoutintroducing glitches can be a challenging cir-cuit-design problem, but it is made muchsimpler with a reliable clock source.

Table 1. Population-independent area, ultra-low-power operation, synthesizability, an area-free global name-

space, and interrupt support are fundamental requirements for a microscale interconnect.

Key properties I2C SPI UART Lee-I2C MBus

Critical

I/O pads (n nodes) 2/4 3þ n 2�n 2/4 4

Standby power Low Low Low Low Low

Active power High Low Low Med Low

Synthesizable Yes Yes Yes No Yes

Globally unique addresses 128 — — 128 224

Multi-master (interrupt) Yes No No Yes Yes

Desirable

Broadcast messages No Option No No Yes

Data independent Yes Yes Yes Yes Yes

Power aware No No No No Yes

Hardware acknowledgments Yes No No Yes Yes

Bits overhead (n byte message) 10þ n 2 (2 – 3)�n 10þ n 19 or 43...................................................................................................................................*I2C: Inter-Integrated Circuit; SPI: Serial Peripheral Interface; UART: Universal Asynchronous Receiver Transmitter.

.............................................................

MAY/JUNE 2016 63

Aggressively low-power designs have noclock sources in their lowest-power state,however, so every design requires a customwakeup circuit to generate these edges, add-ing cost and complexity. For systems develop-ers, managing the system’s power state asvarious subcomponents try to quickly powerone another on and off becomes a challeng-ing distributed state problem.

By adding power awareness to the intercon-nect fabric, we can eliminate difficult challengesfor both systems engineers and circuit design-ers. The system interconnect should provide aclean abstraction for sending messages—that is,one that ensures receipt independent of the tar-get device type or immediate power state.Because the interconnect is responsible foralerting a component that it needs to boot inthe first place, it should also provide assistanceto the booting device’s wakeup circuitry.

InteroperabilityNot all systems are severely resource or energyconstrained, yet they might still wish to usechips designed for ultra-low-power applica-tions. Any interconnect must bridge the gapbetween power conscious and power oblivi-ous—no notion of power gating and no speci-alized constructs to support it—devices toavoid fracturing the component ecosystemand to enable reuse across all device classes.

MBus DesignWorking from the requirements outlined forthe interconnect of microscale systems, wedeveloped a new interconnect system: MBus.

TopologyTo meet the wire count requirement, the bustopology must be independent of the num-ber of nodes—that is, adding another nodecannot require adding a new wire. Becausemany nodes thus share the same wires, MBusrequires a scheme to avoid conflicts, drivingthe same line high and low. Some form oftoken-passing or leader-based protocol viola-tes the efficient interrupt requirement,requiring the leader to poll to find the inter-rupter. MBus prevents conflicts by using tworings, CLK and DATA, as shown in Figure 2.To minimize active power, MBus clocks allbus logic off the bus clock itself, obviatingthe need for a local oscillator on each node.With no local clock, the rings are “shoot-through”: signals pass through only a mini-mal amount of combinational logic from onenode to the next.

Clock GenerationMBus introduces one special node, the medi-ator, which is responsible for generating theMBus clock and resolving arbitration. EveryMBus system must have exactly one media-tor, either attached to a core device (such as amicrocontroller device) or as a stand-alonecomponent (similar to the pull-up resistorsin I2C). For ultra-low-power designs, MBuspower gates all but the forwarding drivers(the wire controller) and a minimalistwakeup front end (the sleep controller). Themediator must therefore be capable of self-starting. In an ultra-low-power design, some-thing must have the capability to self-start;the mediator allows that self-start require-ment to be contained within a single, reus-able component.

ArbitrationIn the idle state, all nodes forward high CLKand DATA signals around the rings. A noderequests the bus by breaking the chain anddriving its DATAOUT low. This propagatesaround the DATA ring until it reaches themediator, which does not forward DATA

DOUT CLKOUT CLKINDIN

Mediator

Figure 2. MBus physical topology. An MBus

system consists of a mediator node and one

or more member nodes, connected in two

“shoot-through” rings,CLK andDATA.

..............................................................................................................................................................................................

TOP PICKS

............................................................

64 IEEE MICRO

during arbitration. The falling edge onDATAIN triggers the mediator self-start,which begins toggling CLK as soon as it isactive. At the first rising edge of CLK, anyarbitrating nodes sample their DATAIN line.If DATAIN is high, the node has won arbitra-tion; otherwise, it has lost.

This arbitration scheme introduces atopologically dependent priority on MBusnodes. To afford physically low-prioritynodes an opportunity to send low-latencymessages, MBus adds a priority arbitrationcycle after arbitration. The priority arbitra-tion scheme is similar, except it is the arbitra-tion winner that does not forward DATA andnodes pull DATAOUT high to issue a priorityrequest. Figure 3 shows a waveform of arbi-tration and priority arbitration.

Hierarchical WakeupFor maximum power efficiency, MBus on apower-gated node leaves only a minimalist,highly optimized front end on continuously.To receive an MBus transmission, however,the power-gated node’s bus controller mustfirst be activated. The key insight that enablesMBus’s power-oblivious properties is that apower-gated node can use the edges on theCLK line from arbitration to drive the

wakeup circuitry of the bus controller.Because arbitration occurs before every mes-sage, all bus controllers in the ring are activeby the addressing phase and can determinewhether the message is destined for thisnode. MBus does not wake the rest of thenode until its address matches. This ensuresthat a message powers on only the destinationnode. This design lets any node transmit toany other node at any time while ensuringthat the receiving node (and only the receiv-ing node) will be powered on to receive themessage.

AddressesMBus uses an addressing scheme to directtransmissions and divides addresses into twocomponents: a prefix and a functional unitID (FU-ID). A prefix uniquely addresses aphysical MBus interface (one of the actualchips in the system), whereas FU-IDs areused to address chip subcomponents. FU-IDs are 4 bits, allowing for up to 16 subcom-ponents behind each physical MBus frontend. MBus reserves prefix 0 for broadcastmessages. On a shared bus, broadcast mes-sages are cheap to implement in hardwarebut expensive (linear in system size) to emu-late in software, motivating hardware

Bus clk

Data inData out

Data inData out

Data inData out

Data inData out

1

2

3

Med

←Drive bus request

← Does not forward

Drive bus request

Loses arbitrationb/c DATAIN = 0

→→

tDoes noteforwardw

Drive priority request

Begin forwarding →

←Priority requested,back off

← Wins priority arbitration

Begin forwarding →

Drive bit 0 →

Mediator wakeup0 1 2 3 4 5 6 7 8

Arbitra

tion

Priority

drive Prio

rity

latch

Reser

ved

Reser

ved

Drive

bit 0 Latch

bit 0 Drive

bit 1 ...

Wins arbitration b/c DATAIN = 1

Figure 3. MBus arbitration. To begin a transaction, one or more nodes pull down on DATAOUT. Here, we show node 1 and

node 3 requesting the bus at nearly the same time (node 1 shortly after node 3). Node 1 initially wins arbitration, but node 3

uses the priority arbitration cycle to claim the bus. The propagation delay of the data line between nodes is exaggerated to

show the shoot-through nature of MBus. Momentary glitches caused by nodes transitioning from driving to forwarding are

resolved before the next rising clock edge.

.............................................................

MAY/JUNE 2016 65

broadcast support. MBus repurposes the FU-ID of broadcast messages as broadcast chan-nel identifiers, which lets nodes listen to onlythe broadcast messages they support or areinterested in.

Prefix AssignmentTo retain the efficiency afforded by shortaddresses while allowing for a diverse ecosys-tem of unique components, MBus uses run-time enumeration to assign 4-bit shortprefixes. Enumeration is a series of broadcastmessages containing short prefixes that canbe sent by any node (although in practice,most likely by a microcontroller). All unas-signed nodes attempt to reply with an identi-fication message, and the arbitration winneris assigned the enumerated short prefix. Aside effect of this enumeration protocol isthat a node’s short prefix encodes its topologi-cal priority. Enumeration is performed once,when the system is first powered. As an opti-mization, devices can assign themselves astatic short prefix, akin to I2C addressing, sothat, if there are no conflicts, enumerationcan be skipped.

Every chip design is assigned a unique, 20-bit full prefix. Full prefixes let nodes refer toone another with static addresses at the cost of16 bits of additional overhead per message.The short prefix 0xF is reserved to indicate fulladdresses, leaving MBus with 14 usable shortprefixes per system. Chips can be addressedusing either short or full addresses interchange-ably. It is sometimes advantageous for a systemto have two copies of the same chip (for exam-ple, memory), which requires short prefixesand enumeration to disambiguate.

Sending DataMBus transmitters drive data on the fallingedge of CLK, and receivers latch data on therising edge of CLK. Although standard flopscan be clocked only on one edge (rising orfalling), only the internal data FIFO needs tobe clocked on the falling edge, so this doesnot violate the synthesizability requirement.This allows for identical setup and hold mar-gins, easing interoperability when drivingunknown loads (for more details, see ourCICC paper3).

At the end of a message, the receiveracknowledges or rejects the entire message. In

MBus, any node can terminate any message atany time, even a forwarder. The transmittermight end the message when it is finished, orthe receiver might interject midmessage toindicate an error, such as a buffer overrun.Thus, by not interjecting, a receiver implicitlyacknowledges every byte. This is less powerfulthan I2C acknowledgments, which can detecta dead receiver after the first byte, but is moreefficient during error-free operation andallows MBus to scale to long (multi-kilobyte)messages with a fixed, length-independentoverhead.

Ending MessagesAt any point, the bus may be interrupted byan MBus interjection. In normal MBus oper-ation, DATA never toggles meaningfully with-out a CLK edge. This lets us design a reliable,independent interjection-detection module,essentially a saturating counter clocked byDATA and reset by CLK. The interjection sig-nal acts as a reset signal to the bus controller,clearing its current state and placing it in con-trol mode. MBus control is two cycles longand is used to express why the bus was inter-jected, either because the transmitter finishedsending data or to express some type of error.

The MBus interjection request mecha-nism, holding CLK high, results in nodesobserving a varying number of clock edgesdepending where they are in the ring. MBusrequires that messages be byte aligned toresolve this potential ambiguity, potentiallyrequiring a small amount (up to 7 bits) ofpadding to be added to MBus messages.

MBus interjections are used both forextreme cases, such as rescuing a hung bus orindicating receiver error, and as a regular end-of-message signal. Any node can generate aninterjection at any time. This allows for unam-biguous signaling of control functions that canresynchronize a bus without requiring out-of-band signals like chip selects or a reset line.This further lets a node with a latency-sensitivemessage interrupt an active transaction, ena-bling responsiveness across a diverse array ofworkloads not possible with current buses.

EvaluationIn our ISCA paper,1 we evaluated the per-formance of the MBus protocol in theory

..............................................................................................................................................................................................

TOP PICKS

............................................................

66 IEEE MICRO

and practice and empirically validated theultra-low-power claims. MBus supports apeak throughput of 7.1 Mbits/second forcommon designs. Its implicit acknowledge-ments make it more efficient than I2C andthe Universal Asynchronous Receiver Trans-mitter (UART) for messages at least 9 byteslong and only slightly (8 bits) less efficientthan I2C for 1-byte messages.

We taped out four chips (processor, radio,temperature sensor, and image sensor) thatwe composed with MBus into two distinctsystems: temperature sensing and imaging(see Figures 4 and 5, respectively). Frompower traces of the temperature sensor, wefound that MBus requires only 22.6 pJ/bit/chip, a four-times reduction over our pre-vious I2C-like bus and two orders of magni-tude better than standard I2C.

We further evaluated each of the com-posed systems, showing how MBus’s multi-master faculties alone achieve a 7 percentimprovement in system lifetime for the tem-perature sensor. Evaluating the image sensorshowcases how well MBus scales to handle(relatively) large volumes of data, such as a28.8 Kbit image. At that size, MBus cantransmit up to 238 frames per second (farmore than the imager can produce). Themessage-level acknowledgements are a key toenabling this image bandwidth, generating a13 percent reduction in the number of bitssent to transfer images.

MBus Limitations, Open Questions, andFuture DirectionsMBus does not guarantee fairness (nor doesI2C). Making arbitration fair a priori is diffi-cult (how should ties be broken?), and inmany cases, system designers might prefer pri-oritization over fairness (for example, in theautomotive CAN bus, braking has higher pri-ority than windshield wipers). If mutable pri-ority is available, one fair scheme couldautomatically rotate priority on every message.

Using interjections to end messages per-mits nodes to send messages of arbitrarylength. While this is useful for sending longmessages efficiently, it also has the effect oflocking the bus for a long time, which couldharm responsiveness. Although the MBusdesign lends itself well to resuming an inter-

rupted transmission (both TX and RX nodesknow how far through a message they were),it is not possible to indicate to other nodeson the bus which messages are interrupt-friendly. One idea is to leverage one or morefunctional units as well-known resumablemessage destinations to indicate to all nodesthat this message can be opportunisticallyinterrupted. However, resumable messagescome with other challenges—nodes musthave buffers for multiple in-flight transac-tions and preserve state across transactions,which complicates bus controller design.

As a ring with a (small) number of gatesbetween IN and OUT at each node, the upperbound for clock speed is limited. However,the design is amenable to additional DATA

Figure 4. Temperature sensing system. Our

evaluated temperature sensing system

consisting of a 2 lAH battery, a 900-MHz

near-field radio, an ARM Cortex M0

processor, and an ultra-low-power

temperature sensor, interconnected using

MBus.

(a) (b)

Figure 5. Motiondetectionandimagingsystem.(a)Ourimagermadeofa900MHz

near-field radio, a 5 lAH battery, an ARM Cortex M0, and a 160 x 160 pixel, 9-bit

graysacle imager with ultra-low-power motion detection all connected using

MBus. (b) A full-resolution (28.8 Kbits) image that was transferred by MBus.

.............................................................

MAY/JUNE 2016 67

lines. Although this increases I/O cost, eachadditional DATA line doubles the MBus pay-load throughput. A hybrid ring of traditionaland parallel MBus nodes can be imagined,with the DATA0 line touching every nodeand the additional data lines forming asmaller ring of only parallel-capable MBusnodes. When transmitting, the bus controllercould stripe data across as many DATA linesas the target node has available. This parallelMBus design is backward compatible withtraditional MBus and can even use the same,unmodified mediator.

ImpactHistory has shown time and again that pro-viding a diverse array of powerful buildingblocks will enable the masses to synthesizeunique and creative things undreamed of bythe original creators. Today’s explosion ofintelligent, networked devices (the Internetof Things) is driven, at least in part, by theaccessibility of modular components, whichlet individuals pull together a custom-tailored set of microcontrollers, radios, sen-sors, and other parts optimized for newapplications.

In contrast to these systems, previous for-ays into microscale design have lacked thismodularity. From our own experience, theMichigan Integrated Circuits Lab—theresearch team that produced MBus andtoday’s Michigan Micro Motes—in 2008produced the Phoenix processor, then thesmallest, lowest-power computing system; itwas essentially a temperature sensor. Uponrequest from collaborators in medicine, thenext project was to build a pressure sensor ofsimilar size and energy budget. As a conse-quence of the tight, monolithic design, ittook nearly three years to change a tempera-ture sensing system to a pressure sensing sys-tem. MBus introduces critical modularity tothe microscale design space. Today, we candevelop, optimize, and debug a processorindependent of the radio, temperature, orpressure sensor. With MBus, we can pivotthe temperature sensing stack presented inour ISCA paper1 to a pressure sensing stackin only three months, reusing 80 percent ofthe components.

MBus’s key contribution is the recogni-tion that modularity is critical for system

design and that no existing technology canenable modularity for microscale systems.MBus (or perhaps a different, new intercon-nect that respects the requirements of micro-scale systems) is a necessary and criticalenabler for the development of the next classof computing.

A lthough we have derived great benefitfrom MBus, it is not a success if it is a

technology used exclusively by a team ofresearchers at one university. For this reason,we released the MBus specification and areference Verilog implementation for free tothe public domain (http://mbus.io).

Today, it is difficult if not impossible topurchase a microcontroller without supportfor SPI, I2C, or UART. In 20 years, we envi-sion MBus joining that list. Today, we con-tinue to develop new chips that add newcapabilities to microscale systems. Leveragingthe modularity and reusability afforded byMBus, we can synthesize completely new sys-tems in a fraction of the time previouslyrequired. As technology progresses and inter-est in the design and development of milli-meter-scale systems grows beyond theresearch realm, we aim to establish MBus asthe solution for next-generation systemsintegration.

Currently, as key core components of theMichigan Micro Mote platform stabilize, weare beginning the outreach effort. By provid-ing core components with MBus front endssuch as the processor, radio, and power man-agement to system developers, we aim to ena-ble others to springboard rapidly intomicroscale system design. By releasing MBusitself as a free and open standard with a freelyavailable implementation, we aim to encour-age its use and adoption by the emergingmicroscale systems design community, lead-ing to eventual integration with the broaderembedded systems community as microscalesystems become mainstream.

Last spring, the Michigan Micro Motewas inducted into the Computer HistoryMuseum as the world’s smallest computer. Itis our sincerest hope that the record does notstand for long, and we expect that MBus willplay a key role in helping to continue tobreak records in the years to come. MICRO

..............................................................................................................................................................................................

TOP PICKS

............................................................

68 IEEE MICRO

AcknowledgmentsThis work was supported in part by the Ter-raSwarm Research Center, one of six centerssupported by the STARnet phase of theFocus Center Research Program (FCRP), aSemiconductor Research Corporation pro-gram sponsored by MARCO and DARPA.This research was conducted with govern-ment support under and awarded by theDoD, Air Force Office of ScientificResearch, National Defense Science andEngineering Graduate (NDSEG) Fellow-ship, and 32 CFR 168a. This material isbased on work partially supported by theNational Science Foundation under grantsCNS-0964120, CNS-1111541, and CNS-1350967, and generous gifts from Intel,Qualcomm, and Texas Instruments.

....................................................................References1. P. Pannuto et al., “MBus: An Ultra-Low

Power Interconnect Bus for Next Genera-

tion Nanopower Systems,” Proc. 42nd Int’l

Symp. Computer Architecture, 2015,

pp. 629–641.

2. Y. Lee et al., “A Modular 1 mm3 Die-

Stacked Sensing Platform with Low Power

I2C Inter-Die Communication and Multi-

Modal Energy Harvesting,” IEEE J. Solid-

State Circuits, vol. 48, no. 1, 2013,

pp. 229–243.

3. Y.-S. Kuo et al., “MBus: A 17.5 pJ/bit/chip

Portable Interconnect Bus for Millimeter-

Scale Sensor Systems with 8 nW Standby

Power,” IEEE Proc. Custom Integrated Cir-

cuits Conf., 2014; doi:10.1109/CICC.

2014.6946046.

Pat Pannuto is a PhD candidate in the Com-puter Science and Engineering Department atthe University of Michigan. His researchinterests include systems design for microscalesystems and technologies for high-fidelityindoor localization. Pannuto received a BSEin computer engineering from the Universityof Michigan. He also received the NationalDefense Science & Engineering GraduateFellowship, the National Science FoundationGraduate Research Fellowship ProgramFellowship, and the Qualcomm InnovationFellowship. Contact him at [email protected].

Yoonmyung Lee is an assistant professor inthe Department of Semiconductor SystemsEngineering at Sungkyunkwan University.His research interests include energy-efficientintegrated circuits design for low-power, high-performance VLSI systems and millimeter-scale wireless sensor systems. Lee received aPhD in electrical engineering from theUniversity of Michigan. He also received aSamsung Scholarship and Intel PhD Fellow-ship. Contact him at [email protected].

Ye-Sheng Kuo is a research fellow in theComputer Science and Engineering Depart-ment at the University of Michigan. Hisresearch interests include embedded systems,sensor networks, and visible light communica-tion. Kuo received a PhD in electrical engi-neering from the University of Michigan.Contact him at [email protected].

ZhiYoong Foo is the chief executive officer ofCubeWorks, a start-up spun out of the Univer-sity of Michigan. His research interests includelow-cost and low-power VLSI circuit systemsintegration. Foo received a PhD in electricalengineering from the University of Michigan,where he completed the work for this article.Contact him at [email protected].

Benjamin Kempke is a PhD candidate inthe Electrical Engineering and ComputerScience and Engineering Departments atthe University of Michigan. His researchinterests include the design of low-powerand high-accuracy indoor RF localizationtechnologies. Kempke received an MSE incomputer science and engineering from theUniversity of Michigan. Contact him [email protected].

Gyouho Kim is a research fellow in the Elec-trical Engineering Department at the Univer-sity of Michigan. His research interests includeultra-low-power VLSI design for energy-constrained systems and novel sensing plat-forms. Kim received a PhD in electricalengineering from the University of Michigan.Contact him at [email protected].

Ronald G. Dreslinski is an assistant profes-sor in the Computer Science and EngineeringDepartment at the University of Michigan.

.............................................................

MAY/JUNE 2016 69

His research interests include near-thresholdcomputing (NTC), architectural simulatordevelopment, and high-radix on-chip inter-connects. Dreslinski received a PhD in com-puter science and engineering from the Uni-versity of Michigan. He is the winner of theISSCC 2011 student design contest and is therecipient of the Young Computer ArchitectAward from the IEEE Computer Society’sTechnical Committee on Computer Architec-ture. Contact him at [email protected].

David Blaauw is a professor in the Electri-cal and Computer Engineering Departmentat the University of Michigan. His researchinterests include VLSI design, ultra-lowpower, and high-performance design.Blaauw received a PhD in computer sciencefrom the University of Illinois, Urbana-Champaign. He is an IEEE Fellow. Contacthim at [email protected].

Prabal Dutta is an associate professor in theElectrical Engineering and Computer ScienceDepartment at the University of Michigan.His research interests include design, deploy-ment, and scaling of wireless, embedded, net-worked, and sensory systems for applicationsin health, energy, and the environment. Duttareceived a PhD in computer science from theUniversity of California, Berkeley. He is therecipient of an Alfred P. Sloan Research Fel-lowship, an NSF CAREER Award, a PopularScience Brilliant Ten Award, and an IntelEarly Career Award. Contact him at [email protected].

..............................................................................................................................................................................................

TOP PICKS

............................................................

70 IEEE MICRO