Lifting-BasedFractionalWaveletFilter:Energy-EfficientDWT ...tionalwaveletﬁlter(FrWF)architecture[62].Foranimage ofdimension J × J pixels,theline,stripe,andblock-based architectures

Research ArticleLifting-Based Fractional Wavelet Filter Energy-Efficient DWTArchitecture for Low-Cost Wearable Sensors

Mohd Tausif 1 Ekram Khan 2 Mohd Hasan 2 and Martin Reisslein 3

1Faculdade de Engenharia Departamento de Informatica Universidade da Beira Interior Covilhatilde Portugal2Department of Electronic Engineering Zakir Husain College of Engineering amp Technology Aligarh Muslim UniversityAligarh 202002 India3School of Electrical Computer and Energy Engineering Arizona State University Goldwater Center East Tyler Mall 650MC 5706 Tempe AZ 85287-5706 USA

Correspondence should be addressed to Martin Reisslein reissleinasuedu

Received 20 April 2020 Revised 18 November 2020 Accepted 28 November 2020 Published 16 December 2020

Academic Editor Constantine Kotropoulos

Copyright copy 2020 Mohd Tausif et al ampis is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

ampis paper proposes and evaluates the LFrWF a novel lifting-based architecture to compute the discrete wavelet transform (DWT)of images using the fractional wavelet filter (FrWF) In order to reduce the memory requirement of the proposed architecture onlyone image line is read into a buffer at a time Aside from an LFrWF version with multipliers ie the LFrWFm we develop amultiplier-less LFrWF version ie the LFrWFml which reduces the critical path delay (CPD) to the delay Ta of an adder ampeproposed LFrWFm and LFrWFml architectures are compared in terms of the required adders multipliers memory and criticalpath delay with state-of-the-art DWTarchitectures Moreover the proposed LFrWFm and LFrWFml architectures along with thestate-of-the-art FrWF architectures (with multipliers (FrWFm) and without multipliers (FrWFml)) are compared throughimplementation on the same FPGA boardampe LFrWFm requires 22 less look-up tables (LUT) 34 less flip-flops (FF) and 50less compute cycles (CC) and consumes 65 less energy than the FrWFm Also the proposed LFrWFml architecture requires 50less CC and consumes 43 less energy than the FrWFml ampus the proposed LFrWFm and LFrWFml architectures appear suitablefor computing the DWT of images on wearable sensors

1 Introduction

11 Motivation ampe availability of low-cost small-sizedcameras attached to wearable sensors and portable imagingdevices has opened up a wide range of imaging-orientedapplications including assisted living smart healthcaretraffic monitoring virtual sports experiences and posturerecognition [1ndash12] An interconnection of visual sensornodes (sensor nodes with attached camera) is known asvisual sensor network (VSN) [13 14] or as wireless multi-media sensor network (WMSN) [15 16] Wearable visualsensors may also be a part of the Internet of things (IoT)[17ndash21] Low-cost IoT wearable sensors [22] enable a widerange of activities for the benefit of society eg hazardavoidance systems for worker safety [23] navigation aids for

visually impaired individuals [24] activity monitoring [25]smart irrigation [26] and sports [27]

In many visual applications of wearable sensors andportable imaging devices images captured by the cameraneed to be transmitted wirelessly to a body-worn or nearbyhub device ampe wearable sensors and portable imagingdevices have limited resources and the wireless links havenarrow bandwidth [28] making it impossible to directlysend the raw (uncompressed) imagesampus there is a need tocompress the images before transmission [29] amperefore animage coder is needed in order to compress the images In animage coder an image is generally first transformed usingthe discrete cosine transform (DCT) [30] or discrete wavelettransform (DWT) [31 32] and then it is quantized andentropy coded ampe DWT which is also used in JPEG 2000

HindawiAdvances in MultimediaVolume 2020 Article ID 8823689 13 pageshttpsdoiorg10115520208823689

[33] is popular in a wide variety of applications includingactivity monitoring [34] fault detection in inverter circuits[35] medical imaging [36] image denoising [37] imagerecognition [38] image reconstruction [39] watermarking[40] computer graphics and real-time processing [41] dueto its multiresolution feature and excellent energy com-paction properties [42 43]

ampe hardware architectures for wearable visual sensorsand portable imaging devices in the IoT and wirelessmultimedia sensor networks should require minimalhardware resources and consume low energy for a smallform factor and long battery life [44 45] Generally thecomputational capabilities of visual sensor nodes have beenincreasing in recent years [46] Nevertheless due to theeconomic pressures on visual sensor designs and despite theemergence of specialized hardware acceleration eg FPGAand components [47ndash49] the computational resources ofvisual sensors will likely remain scarce Emerging computingand communication paradigms such as mobile ad hoc cloudcomputing [50 51] expect the nodes to not only transmitsensed images but also to participate in some servicecomputing functions eg for localized image analysis anddecision making which can be orchestrated through soft-ware-defined networking and control structures [52ndash54] Inorder to make the economical functioning of wearable visualsensors in such networked systems feasible the resourceusage of the image coding and transform must be very lowIn particular as the DWT is an important component of animage coder for visual sensors the DWT hardware archi-tecture should have minimal area and energy consumption

12 Related Work ampe conventional convolution-basedDWT computation of an image requires a huge amount ofmemory due to its row- and column-wise scanning [55 56]making it unsuitable for memory-constrained wearablesensors ampe different low-memory architectures reported inthe literature for computation of DWTcan be categorized asline-based architectures [57] stripe-based architectures[58 59] block-based architectures [60 61] and the frac-tional wavelet filter (FrWF) architecture [62] For an imageof dimension J times J pixels the line stripe and block-basedarchitectures require random access memory (which werefer to as RAM or memory for brevity) in the range of 3J to55J words while the FrWF architecture requires 2J + 22words of RAM [62]

Another low-memory pipeline-based architecture hasbeen proposed in [63] However the design in [63] is basedon the nonseparable DWT computation approach which isunpopular because of its higher computational requirementsthan the conventional separable approach It is a well-knownfact that at a particular throughput the separable 2D DWTcomputation approach is computationally more efficientthan the nonseparable approach [64] A dual data scanning-based DWT architecture is reported in [65] In this archi-tecture several 2D DWT units are combined into a parallelmultilevel architecture for computing up to six DWT levelsHowever this architecture needs 13J words of memory Anarchitecture based on an interlaced read scan algorithm

(IRSA) is proposed in [66] in conjunction with a lifting-based approach with a 53 filter-bank which requires 2J

words of memory However the long critical path delay(CPD) of 2Tm + 4Ta (where Tm is the multiplier delay andTa is the adder delay) of the architecture in [66] may limit itsuse in real-time applications

An LUT-based lifting architecture for computing theDWT has been reported in [67]ampe design [67] has low areaand power requirements However it has a long CPD equalto TLUT + (W4 minus 1)TFA + 2Ta (where TLUT is the look-uptable (LUT) delay W 16 bits is the word length and TFA isthe full adder delay) A lifting-based architecture for com-puting both the 1D and 2DDWT has been presented in [68]However this design uses a transpose buffer of size J2 Anenergy-efficient block-based DWT architecture has beenproposed in [61] However this architecture requires a largenumber of multipliers namely 16 and 36 multipliers for 53and 97 filters respectively Another energy-efficient lifting-based reconfigurable DWT architecture has been proposedin [69] mainly for medical applications However thefrequency of operation of this architecture is limited to20MHz An energy-efficient lifting-based configurableDWT architecture for neural sensing applications has beenproposed in [70] requiring 12 adders and 12 multipliersHowever its operating frequency is limited to only 400KHzand 80KHz for the gating and interleaving architecturesused in the main architecture respectively

A power-efficient modified form of the DWT architec-ture has been presented in [71] using Radix-8 boothmultipliers ampis architecture uses bit truncation to reducethe area and power However bit truncation degrades thequality of the reconstructed image when the inverse DWT isapplied ampere have been some DWT implementations ongraphics processing units (GPUs) [72ndash78] however GPUsare relatively expensive for low-cost sensing platforms

ampe recently proposed FrWF architecture requires only2J + 22 words of memory and has a CPD equal to the delayof a multiplier Tm [62] A multiplier-less FrWF architecturewas also reported in [62] which reduces the CPD to the delayof an adder Ta Ta ltTm However the FrWF architecture(with and without multipliers) has high energy consumptionowing to its large number of compute cycles ampe highenergy consumption of the FrWF architecture may beprohibitive for wearable sensors and portable imaging de-vices with tight memory and energy constraints [79]

13 Contributions and Structure of 7is Article ampis paperproposes the LFrWFm a novel lifting-based energy-efficientarchitecture to compute the DWT coefficients of an imagewith a 53 filter-bank At the core of the proposed LFrWFmarchitecture is a novel basic Lift_block that computes the H

and L subband coefficients with only two two-input addersand one multiplier (plus two pipeline registers) thus greatlyreducing the hardware requirements compared to priorconvolution architectures Moreover a multiplier-lessimplementation of the proposed architecture denoted byLFrWFml is designed ampe multiplier-less LFrWFml has ashorter CPD than the proposed multiplier-based LFrWFm

2 Advances in Multimedia

architecture ampe proposed LFrWFm and LFrWFml archi-tectures are not only efficient in terms of energy but alsorequire fewer adders multipliers and registers than thestate-of-the-art FrWF architectures (with multipliers(FrWFm) and without multipliers (FrWFml)) We comparethe proposed architectures with state-of-the-art DWTcomputation architectures in terms of the required addersmultipliers memory and critical path delay We also im-plement the proposed architectures and the state-of-the-artFrWF architectures on the same FPGA board Experimentalresults demonstrate that the proposed LFrWFm andLFrWFml architectures have lower hardware resource re-quirements and energy consumption than the state-of-the-art FrWFm and FrWFml architectures

ampe remaining part of the paper is arranged as followsSection 2 gives a brief overview of the DWT and FrWFtechniques ampe proposed lifting-based LFrWF architectureis described in detail in Section 3 along with its memoryrequirement ampe evaluation results along with relateddiscussions are presented in Section 4 Finally Section 5concludes the paper

2 Background

ampis section briefly reviews the DWT and FrWF techniquesalong with FrWF architecture ampe main notations used inthis article are summarized in Table 1

21 Discrete Wavelet Transform (DWT) ampe most popularapproach for computing the two-dimensional (2D) DWTofan image is the separable approach in which the rows arefiltered first followed by column-wise filtering of theresulting coefficientsWhen a row is convolved (filtered) by alow-pass filter (LPF) and a high-pass filter (HPF) followedby downsampling by a factor of two the results are known asapproximation and detail coefficients respectively For a 1Dsignal of dimension J which we consider as a preliminarystep for computing the 2D DWT there are J2 approxi-mation coefficients and J2 detail coefficients Combiningthe downsampling with the convolution operation theapproximation coefficients a(i) and the detail coefficientsd(i) for i 0 1 J2 minus 1 can be expressed mathematicallyas [55]

a(i) 1113944

jlfloorf12rfloor

jminuslfloorf12rfloorx2i+jlj i 0 1

J

2minus 1

d(i) 1113944

jlfloorf22rfloor

jminuslfloorf22rfloorx2i+j+1hj i 0 1

J

2minus 1

(1)

respectively whereby lj and hj denote the jth LPF and HPFcoefficient respectively x2i+j denotes the (2i + j)th signalsample while f1 and f2 are the number of LPF and HPFcoefficients respectively ampe largest integer less than orequal to x is denoted by the symbol lfloorxrfloor

In the separable approach all image rows are firstconvolved separately by a HPF and a LPF followed bydownsampling with a factor of two resulting in the H and L

subbands ampen the columns of the H and L subbands areconvolved by a HPF and a LPF followed by downsamplingwith a factor of two resulting in the HH HL LH and LLsubbands [80] However this approach needs to save theentire J times J image in the RAM on the sensor (board) systemampus this DWT computation approach requires a hugeamount of memory making this approach unsuitable forlow-cost wearable sensors and portable imaging devices withlimited RAM [55 56]

ampe lifting scheme [81] computes the DWT of imagesusing inplace computations which save memory Moreoverthe lifting scheme uses predict and update steps for com-puting the subbands In particular the low-pass filteredcoefficients are predicted using the high-pass filtered coef-ficients ampus the lifting scheme reduces the convolutionoperations needed by the LPF coefficients Hence the liftingscheme reduces the number of arithmetic computationsrequired for computing the image DWT [82]

ampe lifting scheme for a 53 filter-bank is shown inFigure 1 In this figure x0 x1 x6 are the input signalsamples Among these samples x0 x2 x4 and x6 are theeven-indexed samples while x1 x3 and x5 are the odd-indexed samples Also α and β are the high-frequency andlow-frequency lifting parameters respectively G0 and G1 arethe scaling parameters whereby α minus05 β 025 andG0 G1 1 [66] d0 d1 and d2 are the high-frequencywavelet coefficients while a0 a1 a2 and a3 are the low-frequency wavelet coefficients ampe high- and low-frequencywavelet coefficients are computed following the diagram inFigure 1 for instance

d0 x0 + x2( 1113857α + x11113858 1113859G0

a1 d0 + d1( 1113857β + x21113858 1113859G1(2)

It should be noted that the arrows without an associatedsymbol in Figure 1 have the unit multiplication factor ie 1

22 Fractional Wavelet Filter (FrWF) ampe FrWF is a low-memory DWT computation technique [56] It uses a specificimage data scanning technique in order to reduce the memoryrequired for computing the DWT It selects a vertical filter area(VFA) scanning f1 rows of the image from an SD-card (wheref1 is the number of LPF coefficients) ampe rows in a VFA areread in raster scan order Once the reading of all the image rowsin a VFA is complete the VFA is shifted by two lines in thevertical direction ampis shifting of the VFA is done in order toincorporate the dyadic downsampling One line of theHHHL LH and LL subbands is computed from one VFA Allthe image lines are covered by shifting the VFA ampe VFA will

Table 1 Summary of main notations

J times J Image size in pixelsG Number of segments per image lineTm Delay of a multiplierTa Delay of an adderf1 Number of low-pass filter coefficientsf2 Number of high-pass filter coefficients

Advances in Multimedia 3

be shifted J2 times for an image of dimension J times Jampe FrWFhas been combinedwith a low-memory image coding algorithmto design an efficient image coder for WMSNs in [83]

An FPGA architecture for the FrWF with a 53 filter-bank has been proposed in [62] ampis FrWF architecturewhich follows the FrWF data scanning order requires 2J +

22 words of memory and a total of 5J22 compute cyclesampelarge number of compute cycles results in a high energyconsumption which may be prohibitive for resource-con-strained wearable visual sensors and portable imaging de-vices ampe proposed LFrWF focuses on reducing the energyconsumption for computing the DWT of images

3 Proposed LFrWF Low Energy Architecture

ampis section presents the proposed LFrWF lifting-basedarchitecture to compute the DWT of an image using theFrWF approach with a 53 filter-bank

31 Data Scanning Order ampe proposed lifting-based ar-chitecture follows the data scanning order of the FrWFalgorithm [56] It is assumed (as is common for low-memoryimplementations of the DWTcomputation) that the originalimage is stored on an SD-card throughout the SD accessesare appropriately buffered to compensate for the latencies ofthe SD-card accesses Initially a vertical filter area whichspans f1 image lines (f1 is the number of LPF coefficients) ismarked in the SD-card ampe rows of the image are read inraster scan order from the VFA one line at a time into theRAM buffer P_store (as shown in Figure 2) After theprocessing of all the rows of the VFA is completed the VFAis shifted down by two lines and the new rows are again readinto buffer P_store in raster scan order ampe complete imageis read by repeatedly shifting the VFA downwards by twolines until all the rows are read In the proposed architectureone complete line is read at a time and scanned in rasterorder in contrast the FrWF architecture in [62] reads only 5coefficients of an image line at a time

32 Proposed Lifting-Based LFrWF Architecture ampis sub-section describes the proposed lifting-based DWT archi-tecture in detail

321 Top-Level Architecture Figure 2 shows the top-levelblock diagram of the proposed LFrWF architectureampe LFrWFarchitecture works as follows First the input image pixels of aline are read into the register P_store ampis P_store register

stores the original image pixels of 8 bits each ampe pixels of theimage from P_store are sent to the Lift_block (as detailed inFigure 3) to compute the H and L subband coefficients usingthe lifting schemeampe generated H and L subband coefficientsare saved in the register 1D_storeampe contents of the 1D_storeregister are used as inputs for the Conv_block (as shown inFigure 4) which generates intermediate coefficients that aresaved in the HH_store HL_store LH_store and LL_storeregisters ampese intermediate values are successively updated bythe next image lines ampe intermediate values in the registersHH_store HL_store LH_store and LL_store after updatingwill give the values of the HHHL LH and LL subbandsrespectively Once the final subband coefficients of theHHHL LH and LL subbands are computed they are trans-ferred and saved in an external SD-cardampe functioning of thedifferent blocks leading to the computation of the subbands isdescribed next

322 Lifting Block In the lifting schemewith a 53 filter-banktwo previous high-pass filtered coefficients are used to predict alow-pass filtered coefficient For the efficient implementation ofthe lifting scheme we introduce a novel basic Lift_block Asillustrated in Figure 3 the basic Lift_block computes two H

subband coefficients and one L subband coefficient from agroup of five input pixels in three steps ampe inputs (Input1Input2 Input3 and Liftpar) and output (Out1) of the adders andmultiplier to be used in Figure 3 for the different steps areshown in Table 2ampe first two steps compute two coefficients oftheH subband and the third step computes a coefficient of theL

subband In Table 2 P0 P1 P2 P3 and P4 are the first fivepixels of an image line H0 and H1 are the first two high-passfiltered coefficients which are stored as the first two elements ofthe register 1D_store L0 is the first low-pass filtered coefficientand is stored as the third element of the register 1D_store ampehigh-pass filtered coefficients (H0 and H1) and the low-passfiltered coefficient (L0) are computed as

H0 P0 + P2( 1113857α + P1 (3)

H1 P2 + P4( 1113857α + P3 (4)

L0 H0 + H1( 1113857β + P2 (5)

G1

x0 x1 x2 x3 x4 x5 x6

α α α α α α

β β β β β β

d0 d1 d2

a0 a1 a2 a3

G0

Figure 1 Lifting steps for 53 filter-bank [66]Controller

Imag

e pix

els

P_store Li_block

1D_store Conv_ block

HH_store

HL_store

LH_store

LL_store To ex

tern

al m

emor

yca

rdS

D-c

ard

Data-path

Figure 2 Block diagram of top-level structure of the proposedlifting-based LFrWF architecture the novel Lift_block is displayedin detail in Figure 3 while the Conv_block is displayed in detail inFigure 4


where α minus05 and β 025 are lifting parameters [66]Once the five pixels (P0 P1 P2 P3 and P4) are processed thefirst two pixels are discarded and two new pixels are readalong with the previous last three pixels ampe same proce-dure in equations (3)ndash(5) is repeated on these new pixels tocompute the H and L subband coefficients

ampe basic Lift_block in Figure 3 requires two two-inputadders and one multiplier ampe functionality of this basicLift_block essentially replaces the functionality of the con-volution stage-1 block in the FrWFm architecture as shownin Figure 3 in [62] and elaborated in Figures 4ndash7 in [62] Foran LPF length of f1 and an HPF length of f2 the FrWFm

convolution stage-1 block in [62] requires f1 minus 1 two-inputadders and f1 multipliers for the low-pass filtering as well asf2 minus 1 two-input adders and f2 multipliers for the high-passfiltering ampus for a 53 filter the FrWFm convolution stage-1 block requires six adders as well as eight multipliers

323 Convolution Block In the Conv_block in Figure 4 theH subband coefficients from the 1D_store register aremultiplied by a suitable HPF and LPF coefficient (as de-termined by a multiplexer) and then addedstored with theprevious value in the registers HH_store and HL_storerespectively Similarly the L subband coefficient in the1D_store register is multiplied by a suitable HPF and LPFcoefficient (as determined by a multiplexer) and then addedstored with the previous value in the registers LH_store andLL_store respectively ampe values in the registers HH_storeHL_store LH_store and LL_store are updated to computethe coefficients of the HHHL LH and LL subbandsrespectively

We note that the Conv_block in Figure 4 is essentiallyequivalent to the aggregation of the FrWF convolutionstage-2 blocks in Figures 4ndash7 in [62] ampe Conv_block inFigure 4 requires four two-input adders and four multipliersOn the other hand the aggregation of the FrWF convolutionstage-2 blocks in Figures 4ndash7 in [62] requires two two-inputadders and two multipliers

324 Pipeline Registers ampe Lift_block and the Conv_blockuse two and four pipeline registers respectively to tem-porarily save the intermediate results after each computecycle amprough the use of the pipeline registers the criticalpath delay (CPD) of the proposed LFrWF architecturebecomes the multiplier delay Tm

Overall for a 53 filter considering both the basicLift_block (Figure 3) and the Conv_block (Figure 4) theproposed LFrWFm requires six two-input adders and fivemultipliers compared to eight two-input adders and tenmultipliers of the FrWFm architecture (Figures 4ndash7 in [62])

+

+

timesPipeline

Register

Out1

Input3

Liftpar

Input2Input1

Figure 3 Schematic representation of the novel basic Lift_blockemployed in the proposed LFrWF architecture

X

Select

+

X

Select

+

X

Select

+

X

Select

+

L subbandcoefficients

H subband coefficients

HH subband

line saved to SD-card

HL subband


LH subband


LL subband


LL_store

LH_store

HL_store

HH_store

Pipelineregister

Pipelineregister

Pipelineregister

Pipelineregister

Conv_block

hndash1h0h1

lndash2lndash1l0l1l2

hndash1h0h1


Figure 4 Schematic representation of Conv_block for computingsubbands in the proposed LFrWF

Table 2 Input and output to be used in the lift block in Figure 3

Input1 Input2 Input3 Liftpar Out1Step 1 P0 P2 P1 α H0Step 2 P2 P4 P3 α H1Step 3 H0 H1 P2 β L0


ampe proposed LFrWF architecture stores the originalimage and the subbands in the SD-cardampus higher waveletdecomposition levels can be computed with the same ar-chitecture whereby the LL subband coefficients are taken asinput

33 Proposed Multiplier-less LFrWFml Implementationampe 53 filter-bank coefficients (shown in Table 3) and the 53filter-bank lifting parameters involve integer division andmultiplication ampus they can be implemented using the shiftand add method More specifically the convolution with the53 filter-bank requires only integer multiplication and di-vision and can therefore be implemented with only shift andadd operations For example z middot 025 z middot 2minus 2 ie shiftingthe number z two times to the right is equivalent to dividing z

by 4 ampe shift and add concept as applied to the 53 filtercoefficients operates as follows

(1) ampe filter coefficient lminus2 l2 minus18 minus123 can beimplemented by three right shift operations fol-lowed by a complement operation

(2) ampe filter coefficient lminus1 l1 28 122 can beimplemented by two right shift operations

(3) ampe filter coefficient l0 68 122 + 12 can beimplemented by two right shift operations followedby addition with one right shift

(4) ampe filter coefficient h0 h2 minus12 can be imple-mented by one right shift operation followed by acomplement operation

(5) ampe coefficient h1 1 thus no shifting is required

With these specified shifting operations the convolutionblock can be simplified and implemented using only shiftersand adders Multiplier-less computation blocks for the 53LPF and HPF coefficients are given in Figures 5 and 6respectively One benefit of the multiplier-less imple-mentation over the multiplier-based architecture in Section32 is that the multiplier-less implementation reduces theCPD from the multiplier delay Tm down to the adder delayTa

34 Memory Requirement In order to compute the DWTcoefficients the proposed LFrWF architecture uses fourregisters (HH_store HL_store LH_store and LL_store)two register arrays (P_store and 1D_store) and six pipelineregisters ampe register array P_store (of size J words) is usedto store an image line ampe H and L subband coefficientscomputed by the Lift_block are saved in the register array1D_store of 3 wordsampe four registers HH_store HL_storeLH_store and LL_store are of J2 words each ampe totalmemory requirement of the proposed architecture is equalto the sum of all registers ie

MemLFrWF 3J + 9words (6)

35 Line Segmentation Equation (6) indicates that LFrWFmemory requirement grows with the image dimension J and

thus will be significantly greater than the FrWF memoryrequirement of 2J + 22 words for large images In order toreduce the memory requirement of the proposed LFrWFarchitectures each image line may be segmented as illus-trated in Figure 7 with overlapping of 1113860f121113861 coefficients atboth boundaries of the second to the last but one segment(the first and last segments only require overlapping at oneboundary) (Appendix E in reference [88]) In this approachonly one line segment needs to be read into the register arrayP_store ampus the memory requirement of the LFrWF withG line segments is

MemLFrWF seg J

G+ 2J + 2lfloor

f1

2rfloor + 9words (7)

For the 53 filter-bank with a VFA of f1 5 lines thememory requirement is

Mem53 filtLFrWF seg

J

G+ 2J + 13words (8)

ampe other resource requirements are independent of linesegmentation and remain unchanged

ampe line segmentation reduces the memory requirementof the proposed LFrWF architectures so that their memoryrequirement can be reduced below the memory required byFrWF architectures of [62] ampe FrWF architecture does notinclude a line segmentation provision therefore its memoryrequirement cannot be reduced further We observe fromTable 4 that the memory requirements of the proposedLFrWF architectures are greater than the FrWF memoryrequirements However by incorporating the line seg-mentation approach the memory requirement of theLFrWF architectures can be reduced below that of the FrWFarchitectures In case of the 53 filter-bank we observe fromTable 4 that the memory requirement of the FrWF archi-tectures is 2J + 22 while the memory requirement of LFrWFarchitecture with G line segments is JG + 2J + 13 seeequation (8) amperefore the LFrWF memory requirement isless than the FrWF memory requirement if Ggt J9

4 Results and Discussion

ampis section presents the implementation of the proposedLFrWF architecture and its comparison with state-of-the-artarchitectures First we compare the proposed LFrWF ar-chitecture with several state-of-the-art architectures in termsof the required numbers of adders and multipliers as well asthe critical path delay (CPD) and required memory Nextthe postimplementation results of the proposed LFrWFarchitectures are compared with the state-of-the-art FrWF

Table 3 Coefficients of 53 filter-bank [84]

LPF coeff Value HPF coeff Valuelminus2 minus18 hminus2 0lminus1 28 hminus1 0l0 68 h0 minus12l1 28 h1 1l2 minus18 h2 minus12


architecture [62] by implementing both architectures on theXilinx Artix-7 FPGA platform

41 Adders Multipliers CPD and Memory Table 4 com-pares the numbers of required adders andmultipliers as wellas the CPD and the required RAM of the proposed LFrWFarchitectures with state-of-the-art architectures ampe

numbers of adders and multipliers of the existing state-of-the-art architectures shown in Table 4 have been taken fromthe corresponding papers We observe from Table 4 that theproposed LFrWFm architecture requires the least number ofadders (namely only six adders see Figures 3 and 4) amongthe state-of-the-art architectures While the proposedLFrWFm reduces the number of required adders only by twocompared to the FrWFm architecture the proposed LFrWFm

Select

Adderp3

AdderTo RAM

BufferClearLoad

Three shift and complement (lndash2)

Two shift (lndash1)

One shift + two shift (l0)

Two shift (l1)

Three shift and complement (l2)

Three shift and

complement (l-2)

Two shift(l-1)

One shift + Two shift

(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5

Figure 5 Multiplier-less block for LPF

Select

Adder AdderTo RAM

BufferClearLoad

One shi and onecomplement (h0)

No shi (h1)

One shi and one complement (h2)

One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3

Figure 6 Multiplier-less block for HPF

JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment

Figure 7 Illustration of segmentation of an image line of J pixels into G segments with overlapping of half the VFA ie of f12 lines at eachend of a segment


reduces the number of adders down to less than half of theother prior architectures Among the architectures usingmultipliers the proposed LFrWFm architecture also requiresthe least number of multipliers namely only five multi-pliers see Figures 3 and 4 Only the RMA [85] has a similarlylowmultiplier requirement with six multipliers (but requiresapproximately twice the memory compared to LFrWF) ampeother prior architectures require twice or more multipliersthan the proposed LFrWFm architecture

We also observe from Table 4 that the CPD of theproposed LFrWFm architecture and the FrWFm architecture[62] are Tm which is less than the architectures in [85 86]We note from Table 4 that the multiplier-less LFrWFml andFrWFml have reduced the CPD to Ta which is less than theCPD of other state-of-the-art architectures ampe CPD of Ta

achieved by the proposed LFrWFml architecture cuts theshortest CPD of any existing architecture of 2Ta achieved bythe Aziz architecture [87] down to half Note that the shifterdelay Ts is commonly larger than the adder delay Ta ieTs gtTa thus the PMA architecture [85] has a longer CPDthan the Aziz architecture ampe benefit of the reduction inCPD is that the architectures can be operated at higherfrequencies since maximum operations frequency 1CPDAs the CPD decreases the maximum operating frequencyincreases

Table 4 furthermore indicates that the FrWF architecturehas the lowest memory requirement However the memoryrequirement of the proposed LFrWF architecture is less thanthe memory requirement of the other state-of-the-art ar-chitectures in Table 4 As noted in Section 43 with seg-mentation of a line of J words (pixels) into G segments (ofJG words each) the LFrWF memory requirement dropsbelow the FrWF memory requirement if more than J9segments are used

42 FPGA Implementation ampe proposed LFrWF archi-tecture computes the DWT coefficients of images based onlifting while following the FrWF approach As observedfrom Table 4 the FrWF architecture [62] requires the leastmemory among the state-of-the-art architectures ampus weimplemented the FrWF architectures [62] and the proposedLFrWF architectures (initially without segmentation ieG 1) on an Artix-7 FPGA (family Artix-7 device xc7a15tpackage csg324 speed minus2L) ampe implementations used

identical multipliers adders and other components pro-vided by the Xilinx Artix-7 FPGA family All architecturesused an input pixel width of 8 bits and a data-path width of16 bits Table 5 summarizes the FPGA implementationcomparison We report averages for evaluations with sevenpopular 512 times 512 (8 bitspixel) test images namely ldquolenardquoldquobarbarardquo ldquogoldhillrdquo ldquoboatrdquo ldquomandrillrdquo ldquopeppersrdquo andldquozeldardquo obtained from the Waterloo Repertoire (httplinksuwaterlooca) ampe energy consumption is evaluated bymultiplying the number of compute cycles with the averagepower consumption and the compute (clock) cycle dura-tions of 50 ns and 15 ns for the architectures with multi-pliers and without multipliers respectivelyampese clock cycledurations have been selected to satisfy the CPD constraint asgiven in Table 5 namely a CPD of 48 ns for the design withmultipliers and a CPD of 145 ns for the multiplier-lessdesign ampe number of compute cycles and the averagepower consumption were evaluated by simulation with theXilinx Vivado software suite version 20182

We observe from Table 5 that the proposed LFrWFmarchitecture requires approximately 22 less LUTs 34 lessFFs and 50 less compute cycles and consumes 65 lessenergy than the FrWFm architecture Due to the reducednumber of hardware components (LUTs and FFs) the areaoccupied by the LFrWFm architecture will be less than thearea of the corresponding FrWFm architecture Moreoverthe proposed multiplier-less LFrWFml architecture requires26 less FFs and 50 less cycles and consumes 43 lessenergy than the multiplier-less FrWFml architecture [62]ampe proposed LFrWFml architecture requires slightly moreLUTs than the multiplier-less FrWFml architecture

We also observe from Table 5 that the proposed LFrWFreduces the number of required compute cycles to roughlyhalf the compute cycles required by the FrWF More spe-cifically while the FrWF requires on the order of 10 millioncompute cycles for a 512 times 512 image the proposed LFrWFrequires only a little more than 5 million compute cyclesampis substantial reduction is primarily due to the compu-tational efficiency of the novel Lift_block (see Section 322)for computing the decomposition subband coefficients

Moreover we observe from Table 5 that the powerconsumption of the proposed LFrWF architecture withmultipliers is less than the power consumption of thecorresponding FrWF architecture with multipliers while themultiplier-less LFrWF and FrWF have approximately thesame power consumption ampe energy consumption isevaluated by multiplying clock cycle duration (which isbased on the CPD) with the number of clock cycles and theconsumed power Due to the reduced (almost half ) numberof compute cycles and the lower (or same) power con-sumption the energy consumption levels of the proposedLFrWF architectures are substantially lower than the energyconsumption levels of the FrWF architectures We furtherobserve from Table 5 that compared to the designs withmultipliers the multiplier-less designs of both the LFrWFand the FrWF have the same numbers of clock cycles butshorter CPD and (slightly) reduced power levels thus themultiplier-less designs have substantially reduced energyconsumption levels

Table 4 Comparison of required adders and multipliers as well ascritical path delay and required RAM of proposed LFrWFm andLFrWFml vs state-of-the-art architectures for one wavelet de-composition level with 53 filter-bank (S number of parallel procunits Ts delay of shifter)

Architecture Add Mul CPD MemPMA [85] 56 28 Ta + Ts 48J

RMA [85] 12 6 2Ta + Tm 65J

Savic and Rajovic [86] 22 17 Tm + Ta 4J

Aziz [87] 20 10 2Ta 4J

FrWFm [62] 8 10 Tm 2J + 22FrWFml [62] 10 0 Ta 2J + 22LFrWFm 6 5 Tm 3J + 9LFrWFml 11 0 Ta 3J + 9


We also observe from Table 5 that both architectureshave the same CPD We note that the numbers of hardwarecomponents eg adders multipliers LUT and FF andother parameters such as the number of clock cyclesmemory and CPD (Tm or Ta) are independent of theplatform on which the design is implemented and the testimage Among the results presented in Tables 4ndash6 only theenergy consumption the power consumption and the en-ergy delay product (EDP) depend on the platform andimage

43 Line Segmentation We observe from Table 6 that in-creasing the number of line segments G reduces the memoryrequirement while increasing the number of compute cyclesand the energy consumption ampe compute cycle and energyconsumption increases are mainly due to the overlapping of1113860f121113861 coefficients at the line segment boundaries whichneed to be read twice However for all line segmentations(G 2 4 8) the number of compute cycles and energyconsumption are less than for the corresponding FrWFarchitectures see Table 5 We observe from Tables 5 and 6that even with G 8 segments per line the number ofcompute cycles and the energy consumption of the proposedLFrWF architectures are less than those for the corre-sponding FrWF architectures Since the FrWF architecturesof [62] read only 5 pixels at a time the segmentation ap-proach cannot be incorporated into the FrWF architectureHence the memory of the FrWF architectures cannot befurther reduced by incorporating line segmentation

ampe EDPs of the LFrWF and FrWF architectures withand without multipliers are compared in Figures 8 and 9respectively ampe EDP which characterizes both the con-sumed energy and the computational performance isevaluated by multiplying the consumed energy with thecorresponding clock cycle duration We observe from Fig-ures 8 and 9 that the EDPs of the proposed LFrWF archi-tectures (with and without multipliers) are less than theEDPs of the corresponding FrWF architectures (with andwithout multipliers) ampe EDP of the proposed LFrWFm

architecture with multipliers (G 1) is approximately 65less than that for the FrWF architecture withmultipliers andthe EDP of the proposed LFrWFml multiplier-less archi-tecture (G 1) is approximately 43 less than that for themultiplier-less FrWF architecture We observe from

Figures 8 and 9 that the EDPs of the proposed LFrWFarchitectures increase with the number G of segmentsHowever even with G 8 segments per image line theEDPs of the proposed LFrWF architectures are less thanthose for the corresponding FrWF architectures

5 Conclusion

ampis paper proposed and evaluated a lifting-based archi-tecture to compute the DWT coefficients of an image basedon the FrWF approach with a 53 filter-bank ampe proposedarchitecture requires fewer adders and multipliers thanstate-of-the-art architectures ampe proposed architecture

Table 5 Comparison of FPGA implementation resource utiliza-tion of proposed LFrWF vs FrWF [62] with 53 filter-bank for onewavelet decomposition level for 512 times 512 image

Param FrWFm LFrWFmMultiplier-less

FrWFml LFrWFml

LUT 215 168 119 135FF 305 201 190 185CC 10485760 5242880 10485760 5242880CPD (ns) 480 480 145 145Power (W) 0162 0114 0110 0109Energy (mJ) 8545 3014 1741 0860LUT look-up tables FF flip-flops CC compute cycles

Table 6 Line segmentation evaluation memory requirement andnumber of compute cycles for proposed LFrWF as well as energyconsumption of proposed LFrWFm and LFrWFml for different linesegment numbers G with 53 filter-bank for 512 times 512 image

Parameters G 1 G 2 G 4 G 8Memory (words) 1545 1293 1165 1101 cycles 5242880 5243904 5244928 5245952E (mJ) LFrWFm 3014 3040 3073 3122E (mJ) LFrWFml 0990 0997 1006 1017

4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm

(G = 1)LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)

Architectures with multipliers

Figure 8 Energy delay product (EDP) of architectures withmultipliers LFrWF with G line segments and FrWF

3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)

Multiplier-less architectures

Figure 9 Energy delay product (EDP) of architectures withoutmultipliers LFrWFml with G line segments and FrWFml


with multipliers (LFrWFm) and without multipliers(LFrWFml) and the state-of-the-art FrWF architecture (withand without multipliers) [62] have been implemented on thesame FPGA board and compared

ampe experimental results show that the proposedLFrWFm architecture requires less hardware components(and thus less area) and consumes 65 less energy than theFrWFm architecture Moreover the proposed LFrWFmlarchitecture consumes 43 less energy with only a slightincrease in area compared to the FrWFml architecture ampelower energy consumption with minimal area overheadmakes the proposed architectures promising candidates forcomputing the DWT of images on resource-constrainedwearable sensors

An important direction for future research is to integratethe LFrWF architecture with efficient architectures of state-of-the-art wavelet-based image coding algorithms to designFPGA-based image coders for real-time applications onwearable visual sensors and IoT platforms Another inter-esting future research direction is the examination of the useof our proposed approach in the context of compressivesensing [15 89]

Data Availability

ampe evaluation data used to support the findings of this studyare included within the article

Conflicts of Interest

ampe authors declare that there are no conflicts of interestregarding the publication of this paper

References

[1] Y Gu Y Tian and E Ekici ldquoReal-time multimedia pro-cessing in video sensor networksrdquo Signal Processing ImageCommunication vol 22 no 3 pp 237ndash251 2007

[2] T Kamal R Watkins Z Cen J Rubinstein G Kong andW M Lee ldquoDesign and fabrication of a passive dropletdispenser for portable high resolution imaging systemrdquo Sci-entific Reports vol 7 pp 1ndash13 2017

[3] R LeMoyne and T Mastroianni ldquoWearable and wireless gaitanalysis platforms smartphones and portable media devicesrdquo inWireless MEMS Networks and Applications D UttamchandaniEd pp 129ndash152 Woodhead Publishing Cambridge UK 2017

[4] A Nag S C Mukhopadhyay and J Kosel ldquoWearable flexiblesensors a reviewrdquo IEEE Sensors Journal vol 17 no 13pp 3949ndash3960 2017

[5] H NgW-H Tan J Abdullah andH-L Tong ldquoDevelopmentof vision based multiview gait recognition system withMMUGait databaserdquo 7e Scientific World Journal vol 20142014

[6] S Plangi A Hadachi A Lind and A Bensrhair ldquoReal-timevehicles tracking based on mobile multi-sensor fusionrdquo IEEESensors Journal vol 18 no 24 pp 10077ndash10084 2018

[7] S Seneviratne Y Hu T Nguyen et al ldquoA survey of wearabledevices and challengesrdquo IEEE Communications Surveys ampTutorials vol 19 no 4 pp 2573ndash2620 2017

[8] L Tsao L Li and L Ma ldquoHuman work and status evaluationbased on wearable sensors in human factors and ergonomics

a reviewrdquo IEEE Transactions on Human-Machine Systemsvol 49 no 1 pp 72ndash84 2019

[9] B Velusamy and S C Pushpan ldquoAn enhanced channel accessmethod to mitigate the effect of interference among bodysensor networks for smart healthcarerdquo IEEE Sensors Journalvol 19 no 16 pp 7082ndash7088 2019

[10] G Yang W Tan H Jin T Zhao and L Tu ldquoReview wearablesensing system for gait recognitionrdquo Cluster Computingvol 22 no S2 pp 3021ndash3029 2019

[11] M Zheng P X Liu R Gravina and G Fortino ldquoAn emergingwearable world new gadgetry produces a rising tide ofchanges and challengesrdquo IEEE Systems Man and CyberneticsMagazine vol 4 no 4 pp 6ndash14 2018

[12] J Wang and S Payandeh ldquoHand motion and posture rec-ognition in a network of calibrated camerasrdquo Advances inMultimedia vol 2017 Article ID 2162078 2017

[13] C-H Hsia J-M Guo and J-S Chiang ldquoA fast DiscreteWavelet Transform algorithm for visual processing applica-tionsrdquo Signal Processing vol 92 no 1 pp 89ndash106 2012

[14] S Soro and W Heinzelman ldquoA survey of visual sensornetworksrdquo Advances in Multimedia vol 2009 Article ID640386 2009

[15] A S Unde and P P Deepthi ldquoRate-distortion analysis ofstructured sensing matrices for block compressive sensing ofimagesrdquo Signal Processing Image Communication vol 65pp 115ndash127 2018

[16] S Heng C So-In and T G Nguyen ldquoDistributed imagecompression architecture over wireless multimedia sensornetworksrdquo Wireless Communications and Mobile Computingvol 2017 Article ID 5471721 2017

[17] F Sun C Mao X Fan and Y Li ldquoAccelerometer-basedspeed-adaptive gait authentication method for wearable IoTdevicesrdquo IEEE Internet of 7ings Journal vol 6 no 1pp 820ndash830 2019

[18] H Baali H Djelouat A Amira and F Bensaali ldquoEmpow-ering technology enabled care using IoT and smart devices areviewrdquo IEEE Sensors Journal vol 18 no 5 pp 1790ndash18092018

[19] S Hiremath G Yang and K Mankodiya ldquoWearable Internetof ampings concept architectural components and promisesfor person-centered healthcarerdquo in Proceedings of the 7thInternational Conference on Wireless Mobile Communicationand Healthcare (MOBIHEALTH) pp 304ndash307 ViennaAustria 2014

[20] M Manas A Sinha S Sharma and M R Mahboob ldquoA novelapproach for IoT based wearable health monitoring andmessaging systemrdquo Journal of Ambient Intelligence and Hu-manized Computing vol 10 no 7 pp 2817ndash2828 2019

[21] L E Lima B Y L Kimura and V Rosset ldquoExperimentalenvironments for the Internet of ampings a reviewrdquo IEEESensors Journal vol 19 no 9 pp 3203ndash3211 2019

[22] F Javed M K Afzal M Sharif and B Kim ldquoInternet ofampings (IoTs) operating systems support networking tech-nologies applications and challenges a comparative reviewrdquoIEEE Sensors Journal vol 20 no 3 pp 2062ndash2100 2018

[23] K Kim H Kim and H Kim ldquoImage-based constructionhazard avoidance system using augmented reality in wearabledevicerdquo Automation in Construction vol 83 pp 390ndash4032017

[24] Z Bauer A Dominguez E Cruz F Gomez-Donoso S Orts-Escolano and M Cazorla ldquoEnhancing perception for thevisually impaired with deep learning techniques and low-costwearable sensorsrdquo Pattern Recognition Letters vol 137pp 27ndash36 2020


[25] U Lee K Han H Cho et al ldquoIntelligent positive computingwith mobile wearable and IoT devices literature review andresearch directionsrdquoAdHoc Networks vol 83 pp 8ndash24 2019

[26] M Ayaz M Ammad-uddin I Baig and E-H M AggouneldquoWireless sensorrsquos civil applications prototypes and futureintegration possibilities a reviewrdquo IEEE Sensors Journalvol 18 no 1 pp 4ndash30 2018

[27] A Kos V Milutinovic and A Umek ldquoChallenges in wirelesscommunication for connected sensors and wearable devicesused in sport biofeedback applicationsrdquo Future GenerationComputer Systems vol 92 pp 582ndash592 2019

[28] M Cagnazzo F Delfino L Vollero and A Zinicola ldquoTradingoff quality and complexity for a HVQ-based video codec onportable devicesrdquo Journal of Visual Communication andImage Representation vol 17 no 3 pp 564ndash572 2006

[29] A Chefi A Soudani and G Sicard ldquoHardware compressionscheme based on low complexity arithmetic encoding for lowpower image transmission over WSNsrdquo AEU-InternationalJournal of Electronics and Communications vol 68 no 3pp 193ndash200 2014

[30] M Chen Y Zhang and C Lu ldquoEfficient architecture ofvariable size HEVC 2D-DCT for FPGA platformsrdquo AEU-International Journal of Electronics and Communicationsvol 73 pp 1ndash8 2017

[31] A Madanayake R J Cintra V Dimitrov et al ldquoLow-powerVLSI architectures for DCTDWT precision vs approxi-mation for HD video biomedical and smart antenna ap-plicationsrdquo IEEE Circuits and SystemsMagazine vol 15 no 1pp 25ndash47 2015

[32] T K Araghi A A Manaf A Alarood and A B Zainol ldquoHostfeasibility investigation to improve robustness in hybridDWT+SVD based image watermarking schemesrdquo Advancesin Multimedia vol 2018 Article ID 1609378 2018

[33] H Persson A Brunstrom and T Ottosson ldquoUtilizing cross-layer information to improve performance in JPEG2000decodingrdquo Advances in Multimedia vol 2007 Article ID024758 2007

[34] B Yan T Pei and X Wang ldquoWavelet method for automaticdetection of eye-movement behaviorsrdquo IEEE Sensors Journalvol 19 no 8 pp 3085ndash3091 2019

[35] B D E Cherif and A Bendiabdellah ldquoDetection of two-levelinverter open-circuit fault using a combined DWT-NN ap-proachrdquo Journal of Control Science and Engineering vol 2018Article ID 1976836 2018

[36] A F R Guarda J M Santos L A da Silva CruzP A A Assunccedilatildeo N M M Rodrigues and S M M de FarialdquoA method to improve HEVC lossless coding of volumetricmedical imagesrdquo Signal Processing Image Communicationvol 59 pp 96ndash104 2017

[37] M L L De Faria C E Cugnasca and J R A AmazonasldquoInsights into IoT data and an innovative DWT-basedtechnique to denoise sensor signalsrdquo IEEE Sensors Journalvol 18 no 1 pp 237ndash247 2018

[38] F Han X Qiao Y Ma W Yan X Wang and X Pan ldquoGrassleaf identification using dbNwavelet and CILBPrdquoAdvances inMultimedia vol 2020 Article ID 1909875 8 pages 2020

[39] J G Escobedo and K B Ozanyan ldquoTiled-block image re-construction by wavelet- based parallel-filtered back-pro-jectionrdquo IEEE Sensors Journal vol 16 no 12 pp 4839ndash48462016

[40] D Bhowmik and C Abhayaratne ldquo2D+ t wavelet domainvideo watermarkingrdquo Advances in Multimedia vol 2012Article ID 973418 19 pages 2012

[41] P R Hill N Anantrasirichai A Achim M E Al-Mualla andD R Bull ldquoUndecimated dual-tree complex wavelet trans-formsrdquo Signal Processing Image Communication vol 35pp 61ndash70 2015

[42] T Brahimi F Laouir L Boubchir and A Ali-Cherif ldquoAnimproved wavelet-based image coder for embedded greyscaleand colour image compressionrdquoAEU-International Journal ofElectronics and Communications vol 73 pp 183ndash192 2017

[43] H Liu K-K Huang C-X Ren Y-F Yu and Z-R LaildquoQuadtree coding with adaptive scanning order for space-borne image compressionrdquo Signal Processing Image Com-munication vol 55 pp 1ndash9 2017

[44] R Banerjee and S Das Bit ldquoAn energy efficient imagecompression scheme for wireless multimedia sensor networkusing curve fitting techniquerdquo Wireless Networks vol 25no 1 pp 167ndash183 2019

[45] M Tukel A Yurdakul and B Ors ldquoCustomizable embeddedprocessor array for multimedia applicationsrdquo Integrationvol 60 pp 213ndash223 2018

[46] D G Costa ldquoVisual sensors hardware platforms a reviewrdquoIEEE Sensors Journal vol 20 no 8 pp 4025ndash4033 2020

[47] L Linguaglossa S Lange S Pontarelli et al ldquoSurvey ofperformance acceleration techniques for network functionvirtualizationrdquo Proceedings of the IEEE vol 107 no 4pp 746ndash764 2019

[48] G S Niemiec L M S Batista A E Schaeffer-Filho andG L Nazar ldquoA survey on FPGA support for the feasibleexecution of virtualized network functionsrdquo IEEE Commu-nications Surveys amp Tutorials vol 22 no 1 pp 504ndash525 2020

[49] P Shantharama A S ampyagaturu and M ReissleinldquoHardware-accelerated platforms and infrastructures fornetwork functions a survey of enabling technologies andresearch studiesrdquo IEEE Access vol 8 pp 132021ndash1320852020

[50] A J Ferrer J M Marques and J Jorba ldquoTowards thedecentralised cloud survey on approaches and challenges formobile ad hoc and edge computingrdquo ACM ComputingSurveys (CSUR) vol 51 no 6 2019

[51] M Mehrabi D You V Latzko H Salah M Reisslein andF H Fitzek ldquoDevice-enhanced MEC multi-access edgecomputing (MEC) aided by end device computation andcaching a surveyrdquo IEEE Access vol 7 pp 166 079ndash1661082019

[52] N Karakoc A Scaglione A Nedic and M Reisslein ldquoMulti-layer decomposition of network utility maximization prob-lemsrdquo IEEEACM Transactions on Networking vol 28 no 5pp 2077ndash2091 2020

[53] T Luo H-P Tan and T Q S Quek ldquoSensor OpenFlowenabling software-defined wireless sensor networksrdquo IEEECommunications Letters vol 16 no 11 pp 1896ndash1899 2012

[54] P Shantharama A S ampyagaturu N Karakoc et al ldquoLay-Back SDN management of multi-access edge computing(MEC) for network access services and radio resource shar-ingrdquo IEEE Access vol 6 pp 57 545ndash557 561 2018

[55] S Rein and M Reisslein ldquoLow-memory wavelet transformsfor wireless sensor networks a tutorialrdquo IEEE Communica-tions Surveys amp Tutorials vol 13 no 2 pp 291ndash307 2011

[56] S Rein and M Reisslein ldquoPerformance evaluation of thefractional wavelet filter a low-memory image wavelettransform for multimedia sensor networksrdquo Ad Hoc Net-works vol 9 no 4 pp 482ndash496 2011

[57] B K Mohanty A Mahajan and P K Meher ldquoArea- andpower-efficient architecture for high-throughput


implementation of lifting 2-D DWTrdquo IEEE Transactions onCircuits and Systems II Express Briefs vol 59 no 7pp 434ndash438 2012

[58] R K Bhattar K R Ramakrishnan and K S Dasgupta ldquoStripbased coding for large images using waveletsrdquo Signal Pro-cessing Image Communication vol 17 no 6 pp 441ndash4562002

[59] Y Hu and C C Jong ldquoA memory-efficient scalable archi-tecture for lifting-based discrete wavelet transformrdquo IEEETransactions on Circuits and Systems II Express Briefs vol 60no 8 pp 502ndash506 2013

[60] L Ye and Z Hou ldquoMemory efficient multilevel discretewavelet transform schemes for JPEG2000rdquo IEEE Transactionson Circuits and Systems for Video Technology vol 25 no 11pp 1773ndash1785 2015

[61] Y Hu and V K Prasanna ldquoEnergy- and area-efficient pa-rameterized lifting-based 2-D DWT architecture on FPGArdquoin Proceedings of the IEEE High Performance Extreme Com-puting Conference (HPEC) pp 1ndash6 MA USA 2014

[62] M Tausif A Jain E Khan and M Hasan ldquoLow memoryarchitectures of fractional wavelet filter for low-cost visualsensors and wearable devicesrdquo IEEE Sensors Journal vol 20no 13 pp 6863ndash6871 2020

[63] C Zhang C Wang and M O Ahmad ldquoA pipeline VLSIarchitecture for fast computation of the 2-D discrete wavelettransformrdquo IEEE Transactions on Circuits and Systems IRegular Papers vol 59 no 8 pp 1775ndash1785 2012

[64] B K Mohanty and P K Meher ldquoMemory-efficient high-speed convolution-based generic structure for multilevel 2-DDWTrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 23 no 2 pp 353ndash363 2013

[65] Y Zhang H Cao H Jiang and B Li ldquoMemory-efficient high-speed VLSI implementation of multi-level discrete wavelettransformrdquo Journal of Visual Communication and ImageRepresentation vol 38 pp 297ndash306 2016

[66] C-H Hsia J-S Chiang and J-M Guo ldquoMemory-efficienthardware architecture of 2-D dual-mode lifting-based dis-crete wavelet transformrdquo IEEE Transactions on Circuits andSystems for Video Technology vol 23 no 4 pp 671ndash6832013

[67] G Hegde K S Reddy and T K Shetty Ramesh ldquoA newapproach for 1-D and 2-D DWT architectures using LUTbased lifting and flipping cellrdquo AEU - International Journalof Electronics and Communications vol 97 pp 165ndash1772018

[68] M M A Basiri and S N Mahammad ldquoAn efficient VLSIarchitecture for lifting based 1D2D discrete wavelet trans-formrdquo Microprocessors and Microsystems vol 47 pp 404ndash418 2016

[69] C Wang J Zhou L Liao et al ldquoNear-threshold energy- andarea-efficient reconfigurable DWPTDWT processor forhealthcare-monitoring applicationsrdquo IEEE Transactions onCircuits and Systems II Express Briefs vol 62 no 1 pp 70ndash742015

[70] T Wang P Huang K Chen et al ldquoEnergy-efficient con-figurable discrete wavelet transform for neural sensing ap-plicationsrdquo in Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS) pp 1841ndash1844Melbourne Australia June 2014

[71] G Kumar N Balaji N Balaji K Reddy and V ampanujaldquoPower and area efficient radix-8 booth multiplier for 2-DDWT architecturerdquo International Journal of Intelligent En-gineering and Systems vol 12 no 3 pp 148ndash155 2019

[72] J Franco G Bernabe J Fernandez and M Ujaldon ldquoampe 2Dwavelet transform on emerging architectures GPUs andmulticoresrdquo Journal of Real-Time Image Processing vol 7no 3 pp 145ndash152 2012

[73] V Galiano O Lopez M P Malumbres and H MigallonldquoParallel strategies for 2D Discrete Wavelet Transform inshared memory systems and GPUsrdquo 7e Journal of Super-computing vol 64 no 1 pp 4ndash16 2013

[74] T Ikuzawa F Ino and K Hagihara ldquoReducingmemory usageby the lifting-based discrete wavelet transform with a unifiedbuffer on a GPUrdquo Journal of Parallel and Distributed Com-puting vol 93-94 pp 44ndash55 2016

[75] W J van der Laan A C Jalba and J B T M RoerdinkldquoAccelerating wavelet lifting on graphics hardware usingCUDArdquo IEEE Transactions on Parallel and Distributed Sys-tems vol 22 no 1 pp 132ndash146 2011

[76] M Kucis D Barina M Kula and P Zemcik ldquo2-D discretewavelet transform using GPUrdquo in Proceedings of the Inter-national Symposium on Computer Architecture and HighPerformance Computing Workshop pp 1ndash6 Paris France2014

[77] T M Quan and W-K Jeong ldquoA fast discrete wavelettransform using hybrid parallelism on GPUsrdquo IEEE Trans-actions on Parallel and Distributed Systems vol 27 no 11pp 3088ndash3100 2016

[78] R Khemiri F Sayadi M Atri and R Tourki ldquoMatLab ac-celeration for DWT rdquodaubechies 97rdquo for JPEG2000 standardon GPUrdquo in Proceedings of the Global Summit on Computer ampInformation Technology (GSCIT) pp 1ndash4 Sousse Tunisia2014

[79] G Suseela and Y Asnath Victy Phamila ldquoEnergy efficientimage coding techniques for low power sensor nodes a re-viewrdquo Ain Shams Engineering Journal vol 9 no 4pp 2961ndash2972 2018

[80] H Sun and Y Q Shi Image and Video Compression forMultimedia Engineering Fundamentals Algorithms andStandards CRC Press Boca Raton FL USA 2nd edition2008

[81] H ZainEldin M A Elhosseini and H A Ali ldquoA modifiedlistless strip based SPIHT for wireless multimedia sensornetworksrdquo Computers amp Electrical Engineering vol 56pp 519ndash532 2016

[82] W Sweldens ldquoampe lifting scheme a construction of secondgeneration waveletsrdquo SIAM Journal on Mathematical Anal-ysis vol 29 no 2 pp 511ndash546 1998

[83] S A Rein F H P Fitzek C Guhmann and T SikoraldquoEvaluation of the wavelet image two-line coder a lowcomplexity scheme for image compressionrdquo Signal ProcessingImage Communication vol 37 pp 58ndash74 2015

[84] A Skodras C Christopoulos and T Ebrahimi ldquoampe JPEG2000 still image compression standardrdquo IEEE Signal Pro-cessing Magazine vol 18 no 5 pp 36ndash58 2001

[85] A D Darji S S Kushwah S N Merchant andA N Chandorkar ldquoHigh-performance hardware architec-tures for multi-level lifting-based discrete wavelet transformrdquoEURASIP Journal on Image and Video Processing vol 20142014

[86] G Savic and V Rajovic ldquoNovel memory efficient hardwarearchitecture for 53 lifting-based 2D Inverse DWTrdquo Journal ofCircuits Systems and Computers vol 28 no 7 2018

[87] S M Aziz and D M Pham ldquoEfficient parallel architecture formulti-level forward discrete wavelet transform processorsrdquoComputers amp Electrical Engineering vol 38 no 5 pp 1325ndash13352012


[88] M Tausif E Khan M Hasan and M Reisslein ldquoSMFrWFsegmentedmodified fractional wavelet filter fast low-memorydiscrete wavelet transform (DWT)rdquo IEEE Access vol 7 pp 84448ndash84 467 2019

[89] R Li H Liu Y Zeng and Y Li ldquoBlock compressed sensing ofimages using adaptive granular reconstructionrdquo Advances inMultimedia vol 2016 2016


[33] is popular in a wide variety of applications includingactivity monitoring [34] fault detection in inverter circuits[35] medical imaging [36] image denoising [37] imagerecognition [38] image reconstruction [39] watermarking[40] computer graphics and real-time processing [41] dueto its multiresolution feature and excellent energy com-paction properties [42 43]

ampe hardware architectures for wearable visual sensorsand portable imaging devices in the IoT and wirelessmultimedia sensor networks should require minimalhardware resources and consume low energy for a smallform factor and long battery life [44 45] Generally thecomputational capabilities of visual sensor nodes have beenincreasing in recent years [46] Nevertheless due to theeconomic pressures on visual sensor designs and despite theemergence of specialized hardware acceleration eg FPGAand components [47ndash49] the computational resources ofvisual sensors will likely remain scarce Emerging computingand communication paradigms such as mobile ad hoc cloudcomputing [50 51] expect the nodes to not only transmitsensed images but also to participate in some servicecomputing functions eg for localized image analysis anddecision making which can be orchestrated through soft-ware-defined networking and control structures [52ndash54] Inorder to make the economical functioning of wearable visualsensors in such networked systems feasible the resourceusage of the image coding and transform must be very lowIn particular as the DWT is an important component of animage coder for visual sensors the DWT hardware archi-tecture should have minimal area and energy consumption

12 Related Work ampe conventional convolution-basedDWT computation of an image requires a huge amount ofmemory due to its row- and column-wise scanning [55 56]making it unsuitable for memory-constrained wearablesensors ampe different low-memory architectures reported inthe literature for computation of DWTcan be categorized asline-based architectures [57] stripe-based architectures[58 59] block-based architectures [60 61] and the frac-tional wavelet filter (FrWF) architecture [62] For an imageof dimension J times J pixels the line stripe and block-basedarchitectures require random access memory (which werefer to as RAM or memory for brevity) in the range of 3J to55J words while the FrWF architecture requires 2J + 22words of RAM [62]

Another low-memory pipeline-based architecture hasbeen proposed in [63] However the design in [63] is basedon the nonseparable DWT computation approach which isunpopular because of its higher computational requirementsthan the conventional separable approach It is a well-knownfact that at a particular throughput the separable 2D DWTcomputation approach is computationally more efficientthan the nonseparable approach [64] A dual data scanning-based DWT architecture is reported in [65] In this archi-tecture several 2D DWT units are combined into a parallelmultilevel architecture for computing up to six DWT levelsHowever this architecture needs 13J words of memory Anarchitecture based on an interlaced read scan algorithm

(IRSA) is proposed in [66] in conjunction with a lifting-based approach with a 53 filter-bank which requires 2J

words of memory However the long critical path delay(CPD) of 2Tm + 4Ta (where Tm is the multiplier delay andTa is the adder delay) of the architecture in [66] may limit itsuse in real-time applications

An LUT-based lifting architecture for computing theDWT has been reported in [67]ampe design [67] has low areaand power requirements However it has a long CPD equalto TLUT + (W4 minus 1)TFA + 2Ta (where TLUT is the look-uptable (LUT) delay W 16 bits is the word length and TFA isthe full adder delay) A lifting-based architecture for com-puting both the 1D and 2DDWT has been presented in [68]However this design uses a transpose buffer of size J2 Anenergy-efficient block-based DWT architecture has beenproposed in [61] However this architecture requires a largenumber of multipliers namely 16 and 36 multipliers for 53and 97 filters respectively Another energy-efficient lifting-based reconfigurable DWT architecture has been proposedin [69] mainly for medical applications However thefrequency of operation of this architecture is limited to20MHz An energy-efficient lifting-based configurableDWT architecture for neural sensing applications has beenproposed in [70] requiring 12 adders and 12 multipliersHowever its operating frequency is limited to only 400KHzand 80KHz for the gating and interleaving architecturesused in the main architecture respectively

A power-efficient modified form of the DWT architec-ture has been presented in [71] using Radix-8 boothmultipliers ampis architecture uses bit truncation to reducethe area and power However bit truncation degrades thequality of the reconstructed image when the inverse DWT isapplied ampere have been some DWT implementations ongraphics processing units (GPUs) [72ndash78] however GPUsare relatively expensive for low-cost sensing platforms

ampe recently proposed FrWF architecture requires only2J + 22 words of memory and has a CPD equal to the delayof a multiplier Tm [62] A multiplier-less FrWF architecturewas also reported in [62] which reduces the CPD to the delayof an adder Ta Ta ltTm However the FrWF architecture(with and without multipliers) has high energy consumptionowing to its large number of compute cycles ampe highenergy consumption of the FrWF architecture may beprohibitive for wearable sensors and portable imaging de-vices with tight memory and energy constraints [79]

13 Contributions and Structure of 7is Article ampis paperproposes the LFrWFm a novel lifting-based energy-efficientarchitecture to compute the DWT coefficients of an imagewith a 53 filter-bank At the core of the proposed LFrWFmarchitecture is a novel basic Lift_block that computes the H

and L subband coefficients with only two two-input addersand one multiplier (plus two pipeline registers) thus greatlyreducing the hardware requirements compared to priorconvolution architectures Moreover a multiplier-lessimplementation of the proposed architecture denoted byLFrWFml is designed ampe multiplier-less LFrWFml has ashorter CPD than the proposed multiplier-based LFrWFm




2 Background



a(i) 1113944

jlfloorf12rfloor


J

2minus 1

d(i) 1113944

jlfloorf22rfloor


J

2minus 1

(1)






d0 x0 + x2( 1113857α + x11113858 1113859G0

a1 d0 + d1( 1113857β + x21113858 1113859G1(2)


















H0 P0 + P2( 1113857α + P1 (3)

H1 P2 + P4( 1113857α + P3 (4)

L0 H0 + H1( 1113857β + P2 (5)

G1

x0 x1 x2 x3 x4 x5 x6

α α α α α α

β β β β β β

d0 d1 d2

a0 a1 a2 a3

G0


Imag

e pix

els

P_store Li_block


HH_store

HL_store

LH_store

LL_store To ex

tern

al m

emor

yca

rdS

D-c

ard

Data-path










+

+

timesPipeline

Register

Out1

Input3

Liftpar

Input2Input1


X

Select

+

X

Select

+

X

Select

+

X

Select

+



HH subband


HL subband


LH subband


LL subband


LL_store

LH_store

HL_store

HH_store

Pipelineregister

Pipelineregister

Pipelineregister

Pipelineregister

Conv_block

hndash1h0h1


hndash1h0h1



















MemLFrWF seg J

G+ 2J + 2lfloor

f1



Mem53 filtLFrWF seg

J

G+ 2J + 13words (8)











Select

Adderp3

AdderTo RAM

BufferClearLoad


Two shift (lndash1)


Two shift (l1)


Three shift and

complement (l-2)

Two shift(l-1)


(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5


Select

Adder AdderTo RAM

BufferClearLoad


No shi (h1)


One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3


JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment














RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References


































































































2 Background



a(i) 1113944

jlfloorf12rfloor


J

2minus 1

d(i) 1113944

jlfloorf22rfloor


J

2minus 1

(1)






d0 x0 + x2( 1113857α + x11113858 1113859G0

a1 d0 + d1( 1113857β + x21113858 1113859G1(2)


















H0 P0 + P2( 1113857α + P1 (3)

H1 P2 + P4( 1113857α + P3 (4)

L0 H0 + H1( 1113857β + P2 (5)

G1

x0 x1 x2 x3 x4 x5 x6

α α α α α α

β β β β β β

d0 d1 d2

a0 a1 a2 a3

G0


Imag

e pix

els

P_store Li_block


HH_store

HL_store

LH_store

LL_store To ex

tern

al m

emor

yca

rdS

D-c

ard

Data-path










+

+

timesPipeline

Register

Out1

Input3

Liftpar

Input2Input1


X

Select

+

X

Select

+

X

Select

+

X

Select

+



HH subband


HL subband


LH subband


LL subband


LL_store

LH_store

HL_store

HH_store

Pipelineregister

Pipelineregister

Pipelineregister

Pipelineregister

Conv_block

hndash1h0h1


hndash1h0h1



















MemLFrWF seg J

G+ 2J + 2lfloor

f1



Mem53 filtLFrWF seg

J

G+ 2J + 13words (8)











Select

Adderp3

AdderTo RAM

BufferClearLoad


Two shift (lndash1)


Two shift (l1)


Three shift and

complement (l-2)

Two shift(l-1)


(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5


Select

Adder AdderTo RAM

BufferClearLoad


No shi (h1)


One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3


JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment














RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References












































































































H0 P0 + P2( 1113857α + P1 (3)

H1 P2 + P4( 1113857α + P3 (4)

L0 H0 + H1( 1113857β + P2 (5)

G1

x0 x1 x2 x3 x4 x5 x6

α α α α α α

β β β β β β

d0 d1 d2

a0 a1 a2 a3

G0


Imag

e pix

els

P_store Li_block


HH_store

HL_store

LH_store

LL_store To ex

tern

al m

emor

yca

rdS

D-c

ard

Data-path










+

+

timesPipeline

Register

Out1

Input3

Liftpar

Input2Input1


X

Select

+

X

Select

+

X

Select

+

X

Select

+



HH subband


HL subband


LH subband


LL subband


LL_store

LH_store

HL_store

HH_store

Pipelineregister

Pipelineregister

Pipelineregister

Pipelineregister

Conv_block

hndash1h0h1


hndash1h0h1



















MemLFrWF seg J

G+ 2J + 2lfloor

f1



Mem53 filtLFrWF seg

J

G+ 2J + 13words (8)











Select

Adderp3

AdderTo RAM

BufferClearLoad


Two shift (lndash1)


Two shift (l1)


Three shift and

complement (l-2)

Two shift(l-1)


(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5


Select

Adder AdderTo RAM

BufferClearLoad


No shi (h1)


One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3


JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment














RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References







































































































+

+

timesPipeline

Register

Out1

Input3

Liftpar

Input2Input1


X

Select

+

X

Select

+

X

Select

+

X

Select

+



HH subband


HL subband


LH subband


LL subband


LL_store

LH_store

HL_store

HH_store

Pipelineregister

Pipelineregister

Pipelineregister

Pipelineregister

Conv_block

hndash1h0h1


hndash1h0h1



















MemLFrWF seg J

G+ 2J + 2lfloor

f1



Mem53 filtLFrWF seg

J

G+ 2J + 13words (8)











Select

Adderp3

AdderTo RAM

BufferClearLoad


Two shift (lndash1)


Two shift (l1)


Three shift and

complement (l-2)

Two shift(l-1)


(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5


Select

Adder AdderTo RAM

BufferClearLoad


No shi (h1)


One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3


JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment














RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References













































































































MemLFrWF seg J

G+ 2J + 2lfloor

f1



Mem53 filtLFrWF seg

J

G+ 2J + 13words (8)











Select

Adderp3

AdderTo RAM

BufferClearLoad


Two shift (lndash1)


Two shift (l1)


Three shift and

complement (l-2)

Two shift(l-1)


(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5


Select

Adder AdderTo RAM

BufferClearLoad


No shi (h1)


One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3


JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment














RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References



































































































Select

Adderp3

AdderTo RAM

BufferClearLoad


Two shift (lndash1)


Two shift (l1)


Three shift and

complement (l-2)

Two shift(l-1)


(l0)

Two shift (l1)

Three shift and

complement (l2)

p1

p2

p4

p5


Select

Adder AdderTo RAM

BufferClearLoad


No shi (h1)


One shiand

one complement(h0)

No shi (h1)

One shiand

one complement(h2)

p5

p4

p3


JG JG JG

hellip

JG

hellip

[f12]

[f12][f12]

[f12] [f12] [f12]

Firstsegment

Secondsegment

gthsegment

Lastsegment














RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References











































































































RMA [85] 12 6 2Ta + Tm 65J


Aziz [87] 20 10 2Ta 4J








5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References





































































































5 Conclusion




FrWFml LFrWFml




4540353025201510

50

EDP

(pJs

)FrWFm LFrWFm


LFrWFm(G = 4)

LFrWFm(G = 8)



3

25

2

15

1

05

0

EDP

(pJs

)

FrWFm LFrWFm(G = 1)

LFrWFm(G = 2)

LFrWFm(G = 4)

LFrWFm(G = 8)







Data Availability




References



































































































Data Availability




References











































































































































































































Documents

Lifting-BasedFractionalWaveletFilter:Energy-EfficientDWT ...tionalwaveletﬁlter(FrWF)architecture[62].Foranimage ofdimension J × J pixels,theline,stripe,andblock-based architectures