Design and Implementation of Fast FPGA Based Architecture for Reversible Watermarking

Sudip Ghosh 1*, Bijoy Kundu 2, Debopam Datta 3, Santi P Maity 4, and Hafizur Rahaman 1,4

1 School of VLSI Technology (Bengal Engineering and Science University at Shibpur, India)

2 Dept. of Electronics and Telecommunication (Bengal Engineering and Science University at Shibpur, India)

3 Dept. of Electrical and Computer Engineering (University of Illinois at Chicago, USA)

4 Dept. of Information Technology (Bengal Engineering and Science University at Shibpur, India)

*E-mail: [email protected]

Abstract: Diverse hardware realizations for digital watermarking of multimedia have been proposed in the literature. This paper focuses on the design and implementation of a fast FPGA (Field Programmable Gate Array) based architecture using a reversible contrast mapping (RCM) based image watermarking algorithm. The distinguishing feature of this architecture is the clock-less design and implementation of the encoder, which makes the design faster. The encoder module response time is independent of clock frequency, so the watermark can be embedded as soon as the input is fetched. The schematic based design and implementation of the VLSI architecture have been done with Xilinx ISE 14.1 on the Spartan 3E FPGA family. The encoder requires 528 4-input LUTs and 303 slices, while the decoder requires 613 LUTs and 347 slices. The maximum clock frequency of the decoder is 45 MHz. The results show the viability of the proposed VLSI architecture for low cost, high speed, real-time use.

Keywords- VLSI Architecture, Reversible Watermarking, FPGA.

I. INTRODUCTION

Digital watermarking [1] is an efficient tool to prevent unauthorized use of data. Digital watermarks may be used to verify the authenticity or integrity of the original data. Nowadays, it is prominently used for tracing copyright infringements and for banknote authentication. Digital watermarking is broadly classified by the type of signal into audio watermarking, image watermarking, video watermarking, database watermarking, and so on. The present work is focused on image watermarking.

In image watermarking, digital information (such as a digital image, a digital signature, or a random sequence of binary numbers) is embedded into an image. The embedded information may or may not be perceptible after watermarking, and the scheme is accordingly categorized as visible or invisible watermarking, respectively. Depending on the robustness of the watermark, it can also be categorized as robust or fragile watermarking [2].

One limitation of watermarking-based authentication schemes is the distortion inflicted on the host media by the embedding process. Although the distortion is often insignificant, it may not be acceptable for some applications, especially in the areas of medical imaging and military applications. Therefore, a watermarking scheme capable of removing the distortion and recovering the original media after passing the authentication is desirable. Schemes with this capability are often referred to as reversible watermarking schemes [3]. Various reversible watermarking techniques have been proposed with different types of algorithms [4]-[5]. Popular techniques of reversible watermarking are: i) Difference Expansion, ii) Histogram Bin Shifting, iii) Data hiding using Integer Wavelet Transform, iv) Contrast Mapping, and v) Integer Discrete Cosine Transform. Usually, a reversible scheme performs some type of lossless compression operation on the host media in order to make space for hiding the compressed data and the Message Authentication Code (MAC) (e.g., hash, signature, or some other feature derived from the media) used as the watermark [6]. To authenticate the received media, the hidden information is extracted and the compressed data is decompressed to reveal the possible original media. A MAC is then derived from the possible original media. If the newly derived MAC matches the extracted one, the possible original media is deemed authentic/original.

In this paper, a reversible watermarking technique is implemented using a specific transform reported by Coltuc et al. in [5]. This technique was chosen for its low computational complexity and robustness. The primary goal of the proposed design is to achieve a high speed, hardware efficient VLSI architecture. The RCM technique was first implemented in Matlab to verify the algorithm and analyze various design constraints. The desired architecture was then realized on an FPGA using Xilinx tools. The rest of the paper is organized as follows. Section II describes the related works. Section III reports the proposed VLSI architecture of reversible watermarking, followed by the analysis and experimental results in Section IV. Finally, the work is concluded in Section V.

II. RELATED WORKS

In the scheme proposed by Fridrich et al. [1], the Discrete Cosine Transform (DCT) technique has been implemented. A 128-bit hash of all the DCT coefficients is used as the watermark. The extracted compressed bit-stream is used for verification; however, the hash contains only the signature of the image, with no local information. Therefore, despite its simplicity and ability to detect inauthenticity, this technique is unable to locate the position where the tampering has been done. Van Leest et al. [3] proposed another reversible watermarking scheme based on a transformation function that introduces gaps in the image histogram of image blocks.

2013 International Conference on Electrical Information and Communication Technology (EICT)

978-1-4799-2299-4/13/$31.00 2013 IEEE



One drawback of this scheme is its need for the overhead information and the protocol to be hidden in the image. Moreover, a potential security loophole in the scheme is that, because the computational cost of extracting the watermark is insignificant, an attacker can defeat the scheme by exhaustively trying all 256 possible gray levels, assuming each time that the gray level being tried is the gap. In [5], Coltuc et al. proposed a Reversible Contrast Mapping (RCM) based algorithm for reversible watermarking in the spatial domain. It provides a high data embedding bit-rate at a very low mathematical complexity. The proposed scheme does not need any additional data compression but is able to recover the original image even after alterations in the encoded data.

Over the last decade, a lot of research has been performed on reversible watermarking; however, VLSI implementation of the RCM based approach is still an area to be explored. In this paper, the advantages of the RCM based watermarking technique have been explored and implemented. The main focus is on developing a low cost, high speed VLSI architecture that can be used for real-time applications. Some significant hardware implementations of digital watermarking include the works [8], [9], [10], [11]. Mohanty et al. [9] concentrated on spatial-domain invisible-fragile watermarking and its architecture. However, these designs are seriously constrained by their hardware complexity. In [12], a hardware architecture that can insert two visible watermarks in images in the spatial domain is introduced. The main objective of that architecture was to decrease the hardware complexity while keeping the performance intact. Employing the advantages of the RCM technique, a low cost, hardware efficient VLSI implementation of RCM based reversible watermarking (RW) is presented in this paper.

III. PROPOSED VLSI ARCHITECTURE OF REVERSIBLE WATERMARKING

The watermarking algorithm is implemented using the Xilinx ISE Design Suite for the Spartan 3E FPGA family. An FPGA is used for the hardware implementation because of its advantages such as re-configurability, low cost, and a simpler design process. The entire watermarking architecture consists of two main blocks, the encoder and the decoder. Each block is further divided into three sub-blocks named module 1, module 2, and module 3. Each of these modules is designed individually and later interfaced with the others. The encoder and decoder were designed and simulated separately. Both designs are described in detail with their respective modules in the following subsections 1 and 2.

1. ENCODER:

In the proposed architecture, the encoder part is designed in three stages as given in Fig. 1.

Image Acquisition and Pixel Transform:

The proposed architecture is implemented and optimized for 8-bit gray images. The source to the encoder, which is basically a device providing image pixels as input, can be a storage device like a RAM or direct external input by the user in 8-bit digital form. In the proposed architecture, the original image data is stored in a 256 byte RAM (eight 32-word by 8-bit SRAMs). As discussed in [5], a specific transformation is performed on the image, operating on a pair of pixels. Among the various ways of acquiring these pixels from the source, sequential column-wise fetching from the memory is carried out in this design (each element of a particular row and column is an 8-bit pixel value addressed by an 8-bit address). The 8-bit pixel value read from the memory is then converted to 10-bit data (adding zeros at the 9th and 10th bit positions) to provide the correct form of input for the pixel transformation. The transformation technique [5] is mathematically given by,

$A_{transform} = 2A - B$ (1)

$B_{transform} = 2B - A$ (2)

where A and B are the pair of input pixels of the original image and Atransform and Btransform are the pair of transformed pixels. This transformation technique allows error-free transmission and detection of the image at the transmitter and receiver ends, respectively [5].

Fig. 1. Data flow path in encoder (Modules 1 to 3; signals A, B, Atransform, Btransform, X, Y, Z, OUTA, OUTB)

Fig. 2. Data flow path in image acquisition and pixel transform module (256 byte RAM, address decoder, and SUB blocks producing Atransform and Btransform from A and B)

Fig. 2 shows the data flow path in the image acquisition and pixel transform module. The pixel transform is implemented using only two subtractor modules for each pixel. The 10-bit subtractors perform signed subtraction using two's complement logic and also prevent overflow.
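The pixel transform described above can be illustrated with a short software model. This is a minimal sketch, not the authors' schematic: it assumes the forward RCM transform of [5] (Atransform = 2A - B, Btransform = 2B - A) and realizes each output with two subtractions, mirroring the two subtractor modules per pixel mentioned in the text. The function names `zero_extend_10` and `rcm_forward` are hypothetical.

```python
def zero_extend_10(pixel):
    """Model the 8-bit to 10-bit conversion: the two added upper bits are zero."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    return pixel  # numerically unchanged; only the word width grows


def rcm_forward(a, b):
    """Forward transform of one pixel pair using two subtractions per output.

    Assumed form (from [5]): a_t = 2a - b and b_t = 2b - a.
    """
    a10, b10 = zero_extend_10(a), zero_extend_10(b)
    a_t = a10 - (b10 - a10)  # first subtract b - a, then a - (b - a) = 2a - b
    b_t = b10 - (a10 - b10)  # first subtract a - b, then b - (a - b) = 2b - a
    return a_t, b_t


if __name__ == "__main__":
    print(rcm_forward(100, 98))
    print(rcm_forward(0, 255))
```

For 8-bit inputs the results lie between -255 and 510, which fits in a signed 10-bit word (range -512 to 511). This is consistent with the remark that the 10-bit subtractors prevent overflow.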

Control Signal:

As mentioned earlier, the watermarking algorithm (embedding the watermark image data into the original image) is performed on image pixels constrained to a particular domain, Dc, of the transformed pairs [5]. Domain Dc of the transformed pixels of the original image is defined such that the pair of transformed pixels, Atransform and Btransform, belongs to [0, L], where L takes values from 0 to 254 leaving 1. The domain Dc prevents underflow and overflow and removes ambiguous pairs. It also ensures robust, error-free transmission of the watermarked image. This module generates the control signals that are essential to the data embedding process, which includes determining Dc. As mentioned by Coltuc et al. in [5], three distinct groups are formed, partially depending on Dc, and they are identified by three control signals (X, Y, and Z) in Fig. 3.

The generation of these control signals is briefed below.

A low logic level, 0, of X is generated when the pair Atransform and Btransform belongs to Dc and each of them is even.

A high logic level, 1, of Y is generated when the pair Atransform and Btransform belongs to Dc and is odd.

A high logic level, 1, of Z is generated when the pair does not belong to Dc.

These control signals are generated entirely using logic gates as shown in Fig. 3. The 10th bit determines the polarity (either positive or negative) of the transformed value, while the 9th bit determines whether the transformed value exceeds 255. The LSB determines whether it is even or odd.
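The bit-level checks described above can be modelled in a few lines. This is a sketch under stated assumptions, not the gate network of Fig. 3: it treats a transformed value as a 10-bit two's complement word, takes Dc to mean that both transformed values lie in [0, 255], and assumes (following the grouping in [5]) that Y flags pairs in Dc whose transformed values are both odd while X = 0 covers every other pair in Dc. The names `to_word10`, `in_byte_range` and `control_signals` are hypothetical.

```python
MASK10 = 0x3FF  # width of the 10-bit datapath


def to_word10(value):
    """Encode a signed integer as a 10-bit two's complement word."""
    return value & MASK10


def in_byte_range(word):
    """True when the 10-bit word holds a value in [0, 255].

    Bit 9 (the 10th bit) is the sign; bit 8 (the 9th bit) is set for 256..511.
    """
    sign = (word >> 9) & 1
    above_255 = (word >> 8) & 1
    return not (sign or above_255)


def control_signals(a, b):
    """Return (X, Y, Z) for the original pixel pair (a, b)."""
    a_t = to_word10(2 * a - b)
    b_t = to_word10(2 * b - a)
    in_dc = in_byte_range(a_t) and in_byte_range(b_t)
    both_odd = bool(a_t & 1) and bool(b_t & 1)
    x = 0 if (in_dc and not both_odd) else 1  # active low: transform and embed
    y = 1 if (in_dc and both_odd) else 0      # odd pair: embed without transform
    z = 0 if in_dc else 1                     # outside Dc: skip embedding
    return x, y, z


if __name__ == "__main__":
    for pair in [(100, 98), (101, 99), (250, 10)]:
        print(pair, control_signals(*pair))
```

Only the two upper bits and the LSB of each 10-bit word are inspected, which is why the hardware can produce these signals from a handful of gates.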

Data Embedding:

The circuit performing the task of embedding the watermark image in the original image is given in Fig. 4. This module takes the control signals as input and selectively performs watermarking depending on them. The control signals determine whether the pixels are to be transformed before embedding the watermark sequence into the original image. The LSB of the 2nd pixel of the pair is used to embed the watermark image, while the transformation information (whether the pixels are transformed or not) is embedded into the LSB of the 1st pixel. The watermarking algorithm in terms of the control signals is given in pseudo code as follows.

Watermarking based on control signals:

When X=0, pass the pair Atransform and Btransform for watermarking. Set the LSB of Atransform to 1 and replace the LSB of Btransform with the watermark image bit.

When Y=1, pass the pair A and B for watermarking. Set the LSB of A to 0 and replace the LSB of B with the watermark image bit.

When Z=1, the watermarking step is skipped. Set the LSB of A to 0 and transmit the original image pixels.
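The three cases above can be sketched as one function that embeds a single watermark bit into a pixel pair. This is an illustrative model rather than the multiplexer circuit of Fig. 4. It assumes the forward transform of [5], takes Dc to mean that both transformed values lie in [0, 255], and treats the Y case as the one where both transformed values are odd. The function name `embed_pair` is hypothetical.

```python
def embed_pair(a, b, bit):
    """Embed one watermark bit into the pixel pair (a, b).

    Returns the output pair (out_a, out_b) for the three control cases.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    a_t, b_t = 2 * a - b, 2 * b - a  # forward transform assumed from [5]
    in_dc = 0 <= a_t <= 255 and 0 <= b_t <= 255
    both_odd = (a_t & 1) == 1 and (b_t & 1) == 1

    if in_dc and not both_odd:
        # X = 0: send the transformed pair, flag LSB = 1, bit in second pixel
        return a_t | 1, (b_t & 0xFE) | bit
    if in_dc and both_odd:
        # Y = 1: send the original pair, flag LSB = 0, bit in second pixel
        return a & 0xFE, (b & 0xFE) | bit
    # Z = 1: no embedding, flag LSB = 0, second pixel unchanged
    return a & 0xFE, b


if __name__ == "__main__":
    print(embed_pair(100, 98, 1))
    print(embed_pair(101, 99, 0))
    print(embed_pair(250, 10, 1))
```

In the Z case the LSB of the first pixel is overwritten without a watermark bit being stored. In [5] the true LSB is saved as part of the embedded payload so that the pixel can be restored; the text above does not describe that step, so it is omitted here.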

2. DECODER:

The decoder block is structured similarly to the encoder block. It comprises three modules: the signal generation block, the inverse transform block, and the image and watermark extraction block. The entire decoder architecture is given in Fig. 5. The following sections give a detailed hardware description of the individual modules of the decoder.

Control Signal:

Similar to the encoder, the control signals are generated from the 8-bit input data received from the transmitter, i.e. the encoder. The watermark image data as well as the transformation information has been embedded into the LSBs of the transmitted pairs. Therefore, the preliminary task of this module is to extract the LSBs of both received pixels. The LSB of the received signal WIA contains the transformation information and the LSB of WIB contains the watermark data. The primary task of this module is the generation of the signal labeled Dc, which checks whether the corresponding pair at the encoder input belonged to the domain Dc. The image transform module, previously used in the encoder, followed by the Dc check block and a few logic blocks, generates this signal. The LSB of WIA determines whether the received pair was transformed. If this LSB, WIA(0), is equal to logical 1, then the pair was transformed and is fed to the inverse transform module; otherwise it is passed on to the Dc check block. This routing is achieved by the 8-bit 2-to-1 MUX and DEMUX pair. If the generated signal Dc is satisfied, then the pair corresponds to one of the odd pairs transmitted by the encoder. It is then passed forward for further processing.

Fig. 3. Circuit diagram of the control signal module of Encoder

Inverse Transform:

This module is the most important part of the decoder and consumes the major share of the processing time. The inverse transform is applied to the received transformed pairs to obtain the original image pixels. As mentioned in [5], the inverse transform is given by the following expressions:

$A = \left\lceil \frac{2}{3} A_{transform} + \frac{1}{3} B_{transform} \right\rceil$ (3)

$B = \left\lceil \frac{1}{3} A_{transform} + \frac{2}{3} B_{transform} \right\rceil$ (4)

where $\lceil x \rceil$ denotes the ceil function (the smallest integer greater than or equal to x). From (3) and (4), the inverse transform can be executed by addition and division without using a multiplier. Addition is performed twice, followed by division by 3. All of these tasks are executed by the inverse transform block in Fig. 5. Keeping the cost constraint in mind, the division is performed by the repetitive subtraction method, which limits the overall decoding speed. However, the delay due to the other combinational blocks is low enough to permit a high frequency clock. The divider used in this module had to be 10-bit because the upper limit of Atransform and Btransform (inputs WIA and WIB at the decoder) is 255, so the sum formed by the two additions can reach 765, which needs a 10-bit input at the divider. The ceil function is achieved by a check operation performed on the two LSBs, the 0th and 1st bits, followed by an adder block. In Fig. 5, the check is performed by a single OR gate, and its output is added to the counter output (the quotient) of the divider block.
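The divide-by-3 and ceil steps described above can be modelled as follows. This is a sketch under assumptions, not the circuit of Fig. 5: it uses the inverse RCM expressions from [5], implements the division as a counting loop of repeated subtractions, and assumes that the two bits fed to the OR gate are the two LSBs of the remainder left after the subtractions. The names `divide_by_3`, `ceil_div_3` and `rcm_inverse` are hypothetical.

```python
def divide_by_3(n):
    """Divide a non-negative integer by 3 using repeated subtraction.

    Returns (quotient, remainder); the quotient plays the role of the counter.
    """
    quotient = 0
    while n >= 3:
        n -= 3
        quotient += 1
    return quotient, n


def ceil_div_3(n):
    """Ceiling of n / 3: add 1 to the quotient when the remainder is non-zero."""
    quotient, remainder = divide_by_3(n)
    round_up = (remainder & 1) | ((remainder >> 1) & 1)  # OR of bit 0 and bit 1
    return quotient + round_up


def rcm_inverse(a_t, b_t):
    """Inverse transform: two additions per output followed by division by 3."""
    a = ceil_div_3(a_t + a_t + b_t)  # ceil((2*a_t + b_t) / 3)
    b = ceil_div_3(a_t + b_t + b_t)  # ceil((a_t + 2*b_t) / 3)
    return a, b


if __name__ == "__main__":
    print(rcm_inverse(102, 96))
    print(rcm_inverse(100, 98))
```

The loop runs once per unit of the quotient, so its duration grows with the pixel value, which matches the remark that repetitive subtraction limits the decoding speed. The largest possible dividend, 2 * 255 + 255 = 765, needs 10 bits.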

Image and Watermark Extraction:

As mentioned earlier, watermark extraction is performed entirely according to the control signals. The watermark image, embedded into the LSB of WIB, is extracted by using the signal labeled Watermark_seq_sig in Fig. 5. This signal identifies the WIB values that contain the embedded watermark sequence. The watermark sequence is stored in a 128 byte RAM (four 32-word by 8-bit SRAMs). The size of the storage device depends on the size of the watermark image. The signal labeled Watermark_seq_sig is applied to the Write Enable (WE) input of the RAM as shown in Fig. 5. The A and B output signals from the 8-bit MUX form the extracted image. The select line of this MUX is generated from the LSB of WIA and the carry out (Cout) of the inverse transform block as shown in Fig. 5.
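The overall decoder flow for one received pair can be summarized in software. This is a simplified sketch, not the datapath of Fig. 5. It assumes the decoding rules of [5]: when the LSB of WIA is 1, both LSBs are cleared before the inverse transform; when it is 0, both LSBs are set to 1 and the resulting pair is tested against Dc to recognize an odd pair. Dc is taken to mean that both transformed values lie in [0, 255], and the exclusion of ambiguous border pairs mentioned earlier is not modelled. The names `in_dc`, `ceil_div3` and `decode_pair` are hypothetical.

```python
def in_dc(a, b):
    """True when the transformed pair (2a - b, 2b - a) stays within [0, 255]."""
    return 0 <= 2 * a - b <= 255 and 0 <= 2 * b - a <= 255


def ceil_div3(n):
    """Ceiling of n / 3 for a non-negative integer n."""
    return (n + 2) // 3


def decode_pair(wia, wib):
    """Decode one received pair; returns (a, b, bit), bit is None if none was embedded."""
    if wia & 1:
        # Transformed pair: read the bit, clear both LSBs, apply the inverse transform
        bit = wib & 1
        a_t, b_t = wia & 0xFE, wib & 0xFE
        return ceil_div3(2 * a_t + b_t), ceil_div3(a_t + 2 * b_t), bit
    a_odd, b_odd = wia | 1, wib | 1
    if in_dc(a_odd, b_odd):
        # Odd pair sent untransformed: restore both LSBs to 1
        return a_odd, b_odd, wib & 1
    # Pair outside Dc: no watermark bit; the first pixel's true LSB is not restored here
    return wia, wib, None


if __name__ == "__main__":
    print(decode_pair(103, 97))
    print(decode_pair(100, 99))
    print(decode_pair(250, 10))
```

The returned bit corresponds to the data written to the watermark RAM, and the `None` case corresponds to a pair for which the write enable would stay inactive.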

IV. ANALYSIS AND EXPERIMENTAL RESULTS

The simulation and implementation of the entire architecture

is carried out in ISE Design suite and other tools of Xilinx.

The hardware is optimized in terms of hardware cost. A

size binary watermark image is used to perform the

watermark embedding and its extraction. The watermarking is

performed on an 8 bit gray image of size . The experimental results and their analysis are summarized in the following part of this section.

1. Encoder Results:

As discussed earlier, the watermark encoding process is carried out by three main blocks: the transform block, the control signal block, and the final watermark image embedding block. Considering the hardware complexity of these blocks, they were designed and implemented separately and then integrated to perform the desired operation. The multiplication required by the transform operation on the pair of pixels is realized using four subtractors in order to maintain hardware efficiency. The watermark embedding operation is performed efficiently by using the control signals generated from combinational logic gates. Finally, the watermark embedding is realized using customized multiplexers and logic blocks. The implementation of the entire encoder required 303 slices and 528 four-input LUTs. The hardware utilization of the encoder along with its sub-blocks is given in Table I.

Fig. 4. Circuit diagram of the watermark data embedding module of encoder

TABLE I: DEVICE UTILIZATION SUMMARY (ESTIMATED VALUES)

Logic Utilization            Pixel transform   Control signal   Watermark embedding   Watermarking encoder
Number of 4 input LUTs       432               17               19                    528
Number of occupied Slices    268               9                10                    303
Number of bonded IOBs        152               23               50                    230

The encoder module is practically devoid of any clock signals, as the module is realized using only combinational logic elements. As a result, the watermarking process is fast even though some intensive operations are being performed. The clock-less architecture can be extended by incorporating parallel processing and pipelining, enabling the system to be highly effective in real-time applications such as digital cameras, printers, and medical and military applications. A concern with the implemented encoder is the combinational path delay, whose maximum value is found to be 31.642 ns. However, there is considerable scope for reducing this delay by employing a pipelined architecture.

2. Decoder Results:

The decoder module has a higher complexity than the encoder, mainly because of the pixel inverse transform and the recovery of the original pixels. As a result, its hardware requirement is also higher than that of the encoder, as detailed in Table II. The divider (division by 3) used for the pixel inverse transformation is application specific and is of the subtraction followed by right shifting type. The decoder requires only 613 4-input LUTs, 347 slices, and 56 slice flip-flops. The inverse transform module also involves 4 adders, 1 subtractor, an 8-bit counter, and multiplexers.

TABLE II: DEVICE UTILIZATION SUMMARY (ESTIMATED VALUES)

Logic Utilization             Pixel inverse transform   Control signal   Watermark extraction (decoder)
Number of Slice Flip Flops    48                        8                56
Number of 4 input LUTs        217                       13               613
Number of occupied Slices     125                       8                347
Number of bonded IOBs         42                        20               37

Because of its hardware complexity, the implemented decoder architecture is prone to a slower response. Incorporating a pipelined architecture helped reduce the delay to a minimum of 22.663 ns. The maximum clock frequency of the watermark extraction module is 45 MHz.
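The delay figures reported in this section can be related to operating rates with a one-line conversion. This is only an arithmetic illustration: treating each reported delay as the period of one operation is an assumption, and the helper name `delay_ns_to_mhz` is hypothetical.

```python
def delay_ns_to_mhz(delay_ns):
    """Return the rate in MHz whose period equals delay_ns nanoseconds."""
    if delay_ns <= 0:
        raise ValueError("delay must be positive")
    return 1000.0 / delay_ns


if __name__ == "__main__":
    encoder_rate = delay_ns_to_mhz(31.642)  # encoder combinational path delay
    decoder_rate = delay_ns_to_mhz(22.663)  # decoder delay after pipelining
    print(f"encoder: {encoder_rate:.1f} MHz, decoder: {decoder_rate:.1f} MHz")
```

Under that assumption the encoder delay corresponds to roughly 31.6 million pair evaluations per second, and the decoder delay to about 44.1 MHz, which is close to the reported maximum clock frequency of 45 MHz.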

V. CONCLUSION

This paper focuses on the design and implementation of a fast FPGA (Field Programmable Gate Array) based architecture using a reversible contrast mapping (RCM) based image watermarking algorithm. To the best of our knowledge, prior research on VLSI implementation of RCM based watermarking is very limited. This restricted the comparison of this hardware implementation with others, and hence only its own significance has been summarized. The encoder requires 528 4-input LUTs and 303 slices, while the decoder requires 613 LUTs and 347 slices. The encoder module is practically independent of the clock, so the watermark can be embedded as soon as the input is fetched. This feature, along with the low hardware cost, supports its use in real-time applications such as digital cameras and medical and military applications. The hardware complexity of the decoder module is higher than that of the encoder module because of the division followed by the ceil function in the inverse transform module. The maximum clock frequency of the decoder is 45 MHz. The design is fast, low cost, and easily implementable for real-time watermarking.

Fig. 5. Circuit diagram of the decoder comprised of all necessary modules

REFERENCES

[1] Fridrich, J., Goljan, M., & Du, R. (2001). Invertible authentication watermark for JPEG images. Proceedings of the IEEE International Conference on Information Technology, 223-227.

[2] Cox, I., Miller, M., & Bloom, J. (2002). Digital watermarking: Principles and practice. Morgan Kaufmann.

[3] Van Leest, A., Van der Veen, M., & Bruekers, F. (2003). Reversible image watermarking. Proceedings of the IEEE International Conference on Image Processing, II, 731-734.

[4] J. Tian. Wavelet-based reversible watermarking for authentication. In E. J. Delp III and P. W. Wong, editors, Security and Watermarking of Multimedia Contents, volume 4675 of Proc. of SPIE, pages 679-690, Jan. 2002.

[5] Coltuc, D., Chassery, J.M.: Very Fast Watermarking by Reversible Contrast Mapping. IEEE Signal Processing Letters 14, 255-258 (2007).

[6] Juergen Seitz. Digital Watermarking for digital media, University of Cooperative Education Heidenheim, Germany, 2005.

[7] M. U. Celik, G. Sharma, A. M. Tekalp, and E. Saber. Reversible data hiding. In Proc. of International Conference on Image Processing, volume II, pages 157-160, Sept. 2002.

[8] Mohanty SP, Ranganathan N, Namballa RK. VLSI implementation of invisible digital watermarking algorithms towards the development of a secure JPEG encoder. In: Proceedings of the IEEE workshop on signal processing systems; 2003. p. 183-8.

[9] Mohanty SP, Kougianos E, Ranganathan N. VLSI architecture and chip for combined invisible robust and fragile watermarking. IET Comput Digital Tech (CDT) 2007;1(5):600-11.

[10] Mohanty SP, Nayak S. FPGA based implementation of an invisible-robust image watermarking encoder. In: Lecture notes in computer science, vol. 3356; 2004. p. 344-53.

[11] A. Garimella, M. V. V. Satyanarayan, R. S. Kumar, P. S. Murugesh, and U. C. Niranjan, VLSI Implementation of Online Digital Watermarking Techniques with Difference Encoding for the 8-bit Gray Scale Images, in Proceedings of the International Conference on VLSI Design, 2003, pp. 283-288.

[12] S. P. Mohanty, N. Ranganathan, and R. K. Namballa, A VLSI Architecture for Visible Watermarking in a Secure Still Digital Camera (S2DC) Design, IEEE Transactions on Very Large Scale Integration Systems, vol. 13, no. 8, pp. 1002-1012, August 2005.
