Implementing and testing various digital
watermarking techniques on audio data.
Steven Morgan
BSc in Computer Software Theory
COPYRIGHT
Attention is drawn to the fact that copyright of this thesis rests with its author. The
Intellectual Property Rights of the products produced as part of the project belong to the
University of Bath (see http://www.bath.ac.uk/ordinances/#intelprop).
This copy of the thesis has been supplied on condition that anyone who consults it is
understood to recognise that its copyright rests with its author and that no quotation from
the thesis and no information derived from it may be published without the prior consent
of the author.
Declaration
This dissertation is submitted to the University of Bath in accordance with the
requirements of the degree of Bachelor of Science in the Department of Computer Science.
No portion of the work in this dissertation has been submitted in support of an application
for any other degree or qualification of this or any other university or institution of
learning. Except where specifically acknowledged, it is the work of the author.
Abstract
Digital watermarking describes the imperceptible insertion of data within a host sound,
image or video file in order to prove ownership. Over the past decade many
watermarking techniques have been proposed to make this possible.
Any such mark should still be detectable after common processing operations, including
lossy file compression. Various audio watermarking techniques representative of this
work are implemented and tested against a range of sound processing operations that may
or may not remove the watermark. The techniques are compared, their individual
strengths and weaknesses analysed, and potential paths for further development
suggested. It is concluded that no technique is yet fully robust against all potential
attacks, and that for watermarking to be seriously considered in the audio domain,
fundamental principles need to be reconsidered.
Acknowledgements
The author would like to thank Russell Bradford for his initial project proposal
and John Fitch for his assistance in allowing the specification focus to be brought
into the audio domain.
Contents
Abstract
Acknowledgements
Contents
1. Introduction
   1.1 Background
   1.2 What is Digital Watermarking?
   1.3 What are Digital Watermarking's Uses?
   1.4 Digital Watermarking Characteristics
       1.4.1 Watermark Robustness
       1.4.2 Watermark Extractability
       1.4.3 Watermark Fidelity
   1.5 Report Outline
2. Literature Review
   2.1 History
   2.2 Image vs. Audio Watermarking
   2.3 Spatial vs. Frequency Domain
   2.4 Specific Audio Techniques
3. Preparation
   3.1 Technology Choices
   3.2 High Level Design
       3.2.1 Finding the maximum inaudible volume
       3.2.2 LSB
       3.2.3 Echo Hiding
       3.2.4 Spread-Spectrum
       3.2.5 The Patchwork Technique
   3.3 Development Methodology
   3.4 Test Plan
       3.4.1 Module Testing
       3.4.2 End To End Testing
   3.5 Evaluation Data
   3.6 Attacking Techniques
       3.6.1 Cropping
       3.6.2 Noise Reduction
       3.6.3 High and Low Pass Filtering
       3.6.4 Lossy Compression
       3.6.5 Addition of noise
       3.6.6 Changing The Sampling Rate
       3.6.7 Pitchshifting
       3.6.8 Volume Reduction
       3.6.9 Sound Compression
4. Implementation
   4.1 Framework Overview
   4.2 Graph Visualisation Tool
   4.3 Fast Fourier Transform
   4.4 Watermark Implementations
   4.5 Implementation Review
5. Evaluation & Test Results
   5.1 Attacking Techniques
       5.1.1 Cropping
       5.1.2 Noise Reduction
       5.1.3 High and Low Pass Filtering
       5.1.4 Lossy Compression
       5.1.5 Addition of noise
       5.1.6 Changing The Sampling Rate
       5.1.7 Pitchshifting
       5.1.8 Volume Reduction
       5.1.9 Sound Compression
   5.2 Evaluation Summary
6. Conclusion
7. Further Development
   7.1 Potential Improvements
8. Bibliography
9. Appendices
   9.1 Additional Algorithm Descriptions
       9.1.1 Phase Coding
       9.1.2 Phase Modulation
       9.1.3 Watermarking the compressed bitstream
       9.1.4 Integrating watermark embedding into compression encoder
   9.2 Source Code Sample
   9.3 Project Proposal
1. Introduction
1.1 Background
One of the greatest technological advancements to change people's lives in the past
decade or so has been the Internet. From humble beginnings in the early 1960s, when it
started as a method of maintaining communication in the event of nuclear war, it has grown
into the worldwide network of computer networks it is today, as discussed in How the Web Was
Born [1]. Today, the Internet affects all industries as its potential continues to be
recognised and explored. From e-commerce to a means of advertising, it has become the
first port of call for many uses. The increase in demand has meant great attention has
been paid to the evolution of net technology, including increased data transfer speeds.
Dial-up modems are becoming less common as broadband becomes the standard option for
residents of developed nations, with businesses investing in more costly alternatives to
cater for their large user bases.
One of the more notable side effects of this evolution has been the volume of data transfer
between users, including the trading of pirate material, whether software, video or audio.
Piracy has always been a major problem that has concerned the music and film industries.
Before the advent of the Internet other mediums were met with outcry. Audio tapes and
video cassettes were both greeted with the same concerns about intellectual property
theft. However, most personal breaches of copyright were ignored because they made
little difference to sales. Organised piracy syndicates were the focus of the industry’s
protests. The advent of the Internet has changed this. With high transfer speeds and the
ability to communicate with any other user connected to the Internet, piracy has become a
serious problem. The ease with which pirated material can be obtained has dramatically
affected the volume of intellectual property breaches.
The music industry in particular has been affected with significantly reduced sales. Sonic
Boom: Napster, P2P and the Battle for the Future of Music [2] tells of the MP3
revolution and Napster, which has made headlines around the world. Napster was a
company that was famed for its pioneering peer-to-peer software that enabled the transfer
of audio files between users with little more effort than the click of a mouse. Legitimate
music downloads were indistinguishable from pirated material, making it hard for the user
to tell the two apart; sometimes users did not even realise they were breaking the
law.
Given Napster’s ease-of-use, it didn’t take long for the network of users to grow to a
considerable number, with huge volumes of music being transferred on a daily basis.
Record sales dropped and the industry began to acknowledge the extent of the problem.
Since that time, measures have been taken to combat this huge flow of piracy. In 2000,
the Recording Industry Association of America (RIAA) won an injunction to shut Napster
down, but similar software emerged, such as AudioGalaxy, Kazaa and eDonkey, with even
greater popularity and functionality. The piracy battle is not being fought by the RIAA
and its international counterparts alone; many musicians have become vocal on the issue.
Many approaches have been taken to try to stop or discourage file sharing, one
being the RIAA systematically suing individual downloaders for large sums of money,
a move that seems to show the industry's desperation in the matter.
Another side-effect of the Internet's usage growth has been the ease of finding images to
suit any purpose. By simply referring to a search engine, it is easy to find almost any
picture and download it. With such a flow of images between computers, and with many
web-sites using images without declaring their source, ownership has become an issue.
With digital technology, there is no way to prove that you originally took a picture,
since digital cameras have no film. One of the major technologies that has arisen to
combat this problem is Digital Watermarking.
1.2 What is Digital Watermarking?
A watermark is a translucent design impressed on paper during manufacture and visible
when the paper is held to the light. Its purpose is to stop imitation, by making the
watermark near impossible to reproduce. The concept of Digital Watermarking considers
this process in the digital domain from the ethos of cryptography and steganography. The
purpose of a digital watermark is to hide within an image or audio file some data
specifically relating to that file. The base requirement of a watermarking technique is that
adding a mark to a sound file should not degrade its quality. The watermark should
survive altering operations and remain detectable.
The number of papers written about Digital Watermarking has grown considerably as its
applications are recognised more and more. Common art packages such as Adobe
Photoshop now provide watermarking as a built-in option. Watermarking has expanded
its use to images, video and, to a lesser extent, audio. It has remained a
fringe idea in the audio domain, for reasons discussed later, but has not been disregarded.
This project focuses specifically on the watermarking of audio, trying to discover why
the approach has been relatively fruitless in this domain, and suggests areas of
improvement based on our results.
1.3 What are Digital Watermarking’s Uses?
A digital watermark has three broad uses:
• To prove whether a media file is the original.
• To prove who owns the media file.
• To identify whether a media file is copyrighted material.
In real world scenarios:
- Watermarks could be embedded in media sold via the Internet from outlets such
as iTunes or Rhapsody. This would enable the music industry to create software
agents that could trawl P2P networks for copyrighted material. Upon identifying
copyrighted material the agent could report the user to a regulative authority.
- An artist may make their music available for download on their promotional web-
site. In this situation, the artist would not be overly bothered by the sound quality
of the download, since its purpose is previewing only. As such, a robust
watermark could be used even if it was audible to the human ear.
- In both of these cases, the watermark could be used as evidence in a court of law
to determine who owns the original copyrighted material.
Potentially, watermarking could be performed in the recording studio or editing suite. As
part of the final mixing process the relevant copyright information could be inserted.
This would ensure that the watermark was included early on in the production and
distribution chain. Apart from the master, all other copies would contain the watermark.
However, to maintain the integrity of the essence, watermarking conducted in the studio
would have to be completely undetectable to human perception.
1.4 Digital Watermarking Characteristics
The concerns surrounding watermarking are as follows:
• Robustness – The ease with which the watermark can be detected after intentional and
unintentional alterations of the watermarked media.
• Extractability – There is disagreement as to whether the extraction of the
watermark should be blind or informed.
• Fidelity – How well the watermarked sound resembles the original. This has
been the biggest issue to date with audio watermarking.
1.4.1 Watermark Robustness
When discussing watermarking it is common to discuss a watermark’s fragility.
Fragile watermarks refer to watermarks where any modifications to the essence would
either remove the watermark or irrevocably alter it, so that only the true original would
pass a watermark check.
Semi–fragile watermarks refer to watermarks that would survive some basic operations,
such as saving to a different format including lossy file compression techniques such as
MPEG, but would be removed by any actual editing of the image.
Robust watermarks refer to watermarks that survive both unintentional and intentional
alterations, so that when called upon they are able to prove beyond doubt who owns the
original media.
1.4.2 Watermark Extractability
Blind extraction is where an extraction program exists which can detect and read the
watermark, so that it can be used to inform of ownership as well as prove it. This does,
however, put some restrictions on the watermarking techniques available and may make
attacking easier.
Informed detection means that generally only the owner of the original media can detect
the watermark; the extraction method may even require the original file.
If the watermark is to have effective commercial use then it needs to be blind detectable;
however, this leads to many problems. Presumably the method of watermarking should be
as openly published as possible, like any good encryption technique. However, this would
make removing the watermark very easy if no key is required to detect just where in the
waveform the watermark lies. Similarly, it would be impractical to require the original
version of the sound, since it may not be available, and allowing the original sound to be
released into the public domain effectively makes the watermark pointless. For these
reasons the two types of watermark are also known as public watermarking systems and
private watermarking systems respectively, as described in Digital Watermarking [3].
1.4.3 Watermark Fidelity
In some applications if the watermark introduces audible artifacts into the signal then it
will be instantly dismissed as a candidate technique for identifying copyrighted material.
For example, there was a plan to introduce audio watermarking to coincide with the
release of the DVD-A format but unfortunately, the technology didn’t meet the
requirements. In August 2000, leading classical recording engineer Tony Faulkner was
quoted as saying “Watermarking could reduce the perceived quality of DVD-A to
somewhere between a good MiniDisc and a below-average CD,” after conducting
research into the format. The plans to use an audio watermark were dropped since the
value of releasing the DVD-A came from the ability to produce a level of quality akin to
the original studio recording.
1.5 Report Outline
The remainder of this report is organised as follows:
• Chapter 2 discusses previous work in the field, drawing out relevant
developments and describing aspects utilised at later stages.
• Chapter 3 states exactly what is to be done, why and how.
• Chapter 4 describes the details of the implementation, expanding on the outline
set out in Chapter 3.
• Chapter 5 evaluates the results obtained in Chapter 4 by implementing attacks.
• Chapter 6 concludes.
• Chapter 7 suggests further development that could be carried out with a larger
timescale.
2. Literature Review
2.1 History
Traditionally, cryptography has been the principal technique for obscuring information.
From the pre-computer era, encrypted messages have been exchanged between people in
situations where the sender wants to ensure that the message in question cannot be
understood by anyone other than those with the decrypting key. A famous example of
this was the German Enigma Machine from the Second World War, whose story is told in
Enigma [4]. This was a machine used by the Germans with a complicated encryption
technique that was eventually broken by the Allied forces.
As discussed previously, the rise of peer-to-peer (P2P) software has caused a surge in the
ease of piracy, as discussed in Sonic Boom: Napster, P2P and the Battle for the Future of
Music [2]. Piracy has always been a matter the music industry has taken very seriously
but in recent years, the ease of file sharing has caused actual media sales to decline
noticeably. This has been a major reason for the continued research and development of
audio watermarking as the industry continues to seek ways to stem the flow of illegal
downloads. Napster’s closure only caused a ‘blip’ in the rise of piracy methods and
paved the way for other software to appear in a similar vein. It was evident that
something else needed to be done.
It is only in the past decade that steganography has received serious consideration as a
method of combating this. Steganography is the process of hiding a message within a
larger one in such a way that others cannot discern the presence or contents of the hidden
message. It wasn't until the first conference on the subject in 1996 that it was formally
addressed; many groups had already been researching the subject independently of each
other. The details of that conference are stored in the journal Information Hiding: 1st
International Workshop [5]. Common terminology was agreed upon at this pioneering
event.
With collaboration between different research groups, ideas were traded and the development
of these steganographic techniques rapidly gained pace as new ideas grew from these
original sparks. Image watermarking was the most natural form of development due to its
similarity to the way an artist signifies their creation of a painting by signing the
bottom corner of the picture.
A web-site named Digital Watermarking World [6] was set up to centralise watermarking
resources, including relevant books, research and upcoming conferences to name a few.
Unfortunately due to the commercialisation of watermarking techniques, the web-site is
now defunct and serves merely as an archive of past research.
As watermarking evolved, the focus of the research remained on image processing with
few exceptions due to the added complications in making the process effective for audio,
as discussed in the first book to focus purely on digital watermarking, intuitively named
Digital Watermarking [3]. This book serves as a good introduction to the various
approaches used in watermarking but mainly focuses on images. The book does not go into
great depth, but it introduces the concepts and terminology, alongside definitions and
explanations, very well. A similar surface-level breakdown can be
found in Information Hiding Techniques for Steganography and Digital Watermarking
[7].
2.2 Image vs. Audio watermarking
Image watermarking cannot be directly transferred to audio watermarking due to the
fundamental differences between the way our eyes and ears work, as discussed in
Sensation and Perception [8]. When the eye views an imperfect picture, the brain blanks
out the imperfections, seeing the picture as it is supposed to be viewed. For example,
when a person visits the cinema, they see the projected image on the screen as it was
intended to be seen, without noticing the flickers unless they explicitly look out for them;
the ear, by contrast, can pick up on the slightest imperfection in a sound, as shown in
Attacks on Copyright Marking Systems [9].
When represented in visual form, it is easy to see the difference between the diagrams,
just as it would be easy to tell the difference between their sounds. Here, it is evident that
(b) is the same sound wave as (a) but with a 20 millisecond echo added as a watermark.
The echo is clearly visible at 0.02 seconds and would be easily audible when listening;
the small imperfection in the picture would, in the audio domain, come across as a click.
A similar imperfection in an image watermark would have been near invisible to the
human eye.
Despite the added complexity of audio watermarking, the robustness techniques and
attacks of visual and audible watermarks have to be analysed for their similarities,
since there is so much more literature for image watermarking. When considering Fabien
Petitcolas's Stirmark Benchmark [10], attacks can be grouped in the following way:
• Cropping (Essentially the same)
• Compression (lossy or lossless)
• Random bending (pitchshifting)
• Gaussian Filtering (low-pass filter) etc.
Following the creation of Stirmark, other benchmarks came into existence to test
the robustness of watermarks, including Checkmark Benchmarking [11], which
contains extra attacks, and Optimark [12], which features a GUI. Hiding digital
watermarks using multiresolution wavelet transform [13] reports statistical results
from Checkmark showing that only one existing technique, the Xie 1-bit transform,
remained completely robust against the various forms of attack.
These benchmarks are the standard ways of testing the robustness of a watermark. Those
wishing to attack a watermark are always going to devise new ways of removing it, so the
fastest way for the technology to evolve is to monitor the way people remove watermarks;
researching according to these attacks accelerates development.
2.3 Spatial vs. Frequency Domain
Most watermarking techniques can be divided into two approaches:
those in the spatial domain and those in the frequency domain. The main difference
between these approaches is their robustness, as will be discussed later. Spatial
techniques were the initial development in the field. A Watermark Technique Based On
One-Way Hash Functions [14] proposes embedding the watermark securely in the least
significant bit (LSB) plane so that only the person who has placed the watermark may
retrieve it, using cryptographic hash functions to ensure the security of the watermark.
Transparent robust image watermarking [27] suggests an imaging technique that creates
a random key to decide on a co-ordinate in the image. Once the key is chosen, the
brighter and darker pixels are distinguished and the brightest is brightened and the darkest
is darkened. This technique could be applied to sound with a change in amplitude of a
sample, by negligible amounts that would remain inaudible to the listener. With the
correct key used to retrieve the watermark information, one simple comparison operation
can find the watermark. The problem with this technique is its susceptibility to noise.
Any interference could alter the volume differences and render the watermark useless.
Spatial domain watermarks are being developed today, since their techniques are
relatively cheap and for more trivial examples they can quickly create a watermark with
little effort. The main area of focus with this form of watermarking is in the randomised
key. If the key follows a pattern, then the human mind is more likely to pick up on the
imperfection and so it will be easier to notice and therefore remove.
Although it is accepted that spatial domain watermarks are not as robust as frequency
domain watermarks, there seems to be some ambiguity with regards to which technique has
the largest capacity (i.e. which technique allows the most data to be embedded in a sound file).
Adaptive watermarking in the DCT domain [28] suggests that the spatial domain has the larger
capacity, but Image watermarking for tamper detection [29] has a different opinion. It
seems that both statements are true, but as more watermark data is embedded using spatial
techniques the quality of the original media degrades, unlike with frequency domain
watermarking. Since spatial domain techniques distort the media, very little watermarking
can be carried out in practice, so their use is fairly limited.
Techniques applied in the frequency domain are more robust than those applied in the
spatial domain. This explains in part why the bulk of current research is directed towards
the exploration of frequency based techniques.
The benefits of real-time watermarking include safe and traceable audio streaming, radio
broadcasting and cellular phone recordings. With these practical applications,
watermarking in audio can be viewed as a more serious technique with great potential.
Digital image watermarking using Daubechies' wavelets and error correction coding
[30] discusses how this could be possible, despite applications on the Windows platform
having an embedded 8x run time.
The biggest debate with frequency domain watermarking is where exactly to put the
watermark. Technically, if the watermark is placed in an inaudible area of the spectrum
then a low pass filter should be able to eradicate it. Many techniques are being developed
to counter this, but if a filter cannot erase the watermark then it is arguably not doing
its job properly and needs to be adapted. With this kind of trade-off between two
different fields of development, one has to wonder what the future direction of
audio watermarking will be.
Another issue regarding watermarking is security: who can read the watermark? In
principle, for a watermark to be truly robust, it should have only one possible method of
removal, known only to the owner. However, this is an ideal which cannot be achieved
with current technology, so a trade-off has to be established between the ease of finding
the watermark and the ease of its removal. Hidden digital watermarks in images [31]
suggests that the key that encodes the watermark be known only to the image owner, while
a key that decodes, or recognises, the watermark would be available to the public.
2.4 Specific Audio Techniques
Techniques and Applications of Digital Watermarking and Content Protection [21] is one
of the few books to contain a chapter exclusively on audio watermarking. In this chapter,
five specific watermarking techniques are formally specified:
• Least Significant Bit (LSB) Coding
The substitution of the LSB carrier signal with the bit pattern from the watermark
noise.
• Embedding Watermarks into the phase (phase coding and phase modulation)
Exploiting the fact that humans have a low sensitivity to relative phase changes.
Phase coding splits the original audio stream into blocks and embeds the whole
watermark into the phase spectrum of the first block whilst phase modulation
performs independent multiband phase modulation.
• Echo hiding
Embedding watermarks into a signal by adding echoes in a slightly delayed time
position to produce a marked signal.
• Spread spectrum audio watermarking
Spreading the watermark across the host signal in the manner of a spread-spectrum transmission.
• Patchwork Technique
Use a pseudorandom process to embed a certain statistic into a data set that is
detected in the reading process with the help of numerical indexes.
3. Preparation
3.1 Technology Choices
Development OS & Programming Language I chose to implement the project in a Windows environment due to its familiarity and
convenience. The system could have been developed in a number of languages. I selected
Java to capitalise on my experience with the language. Many people criticise Java for
being slow. However, the order of complexity of an algorithm remains invariant under the
language, thus arguments of efficiency of languages are not as important as delivering a
lean and efficient algorithm. Java 1.4 also provides added performance by virtue of the
Java HotSpot virtual machine.
The Java HotSpot uses adaptive compilation to capitalise on the trend that most programs
spend 80% of their time in 20% of the code. The HotSpot VM compiles the most
frequently run code, performing advanced optimisation and in-lining of methods. This
type of compilation is particularly effective for audio processing kernels.
Audio Library
Audio functionality is a relatively new feature in Java, with Digital Audio With Java
[22] providing a slightly dated yet nonetheless useful outlook on audio programming in
the language.
I decided to use the Java Sound API. The API provides an abstraction of an audio signal
from its underlying medium, be it wav, mp3, au etc., or an input device, e.g. a microphone.
The API also provided a platform independent interface to audio hardware for playback.
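As a brief illustration of the Java Sound API calls involved, the sketch below opens an audio file and prints its format; the file name is purely illustrative.

import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Minimal sketch: open a WAV file with the Java Sound API and inspect its format.
// The file name "drums.wav" is illustrative only.
public class OpenAudioExample {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("drums.wav"));
        AudioFormat format = in.getFormat();
        System.out.println("Sample rate: " + format.getSampleRate() + " Hz, "
                + format.getSampleSizeInBits() + " bits, "
                + format.getChannels() + " channel(s)");
        in.close();
    }
}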
Sound Editor
Goldwave version 4.25 is a sophisticated sound editor with many pre-built sound
manipulators such as filters, limiters and gates. It contains many filtering tools with easy
to adjust settings. Goldwave was used to create the attacks on each file since it seems
unnecessary to ‘re-invent the wheel’.
Backup Procedure
Throughout development I regularly backed up my code onto a network share.
3.2 High Level Design
What follows are some of the most commonly used audio watermarking techniques and
an analysis of their implementation methods.
It is my intention to produce implementations of the following algorithms:
- LSB
- Echo Hiding
- Spread Spectrum
- Patchwork
If time remains I will also produce implementations of the following algorithms:
- Phase Coding
- Phase Modulation
- Watermarking the compressed bitstream
- Integrating watermark embedding into the compression encoder
For a description of these algorithms see Appendices - Additional Algorithm
Descriptions.
3.2.1 Finding the maximum inaudible volume
To find the maximum volume inaudible to humans, and thus the boundary of audibility
for future watermarks, the following steps need to be taken (a sketch of the first step is
given after the list):
1. Calculation of the power spectrum;
2. Identification of the tonal (sinusoid-like) and nontonal (noise-like) components;
3. Decimation of the maskers to eliminate all irrelevant maskers;
4. Computation of the individual masking thresholds;
5. Computation of the global masking threshold;
6. Determination of the minimum masking threshold in each subband.
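As an illustration of step 1, the sketch below computes the power spectrum of one block of samples. The window length, the Hann window and the Fft helper class are assumptions for illustration, not necessarily the exact routines used in the project.

// Sketch of step 1 (power spectrum) for one block of samples.
// Fft.transform is a hypothetical in-place FFT helper (real/imaginary arrays).
public final class PowerSpectrum {

    public static double[] compute(double[] block) {
        int n = block.length;                 // assumed to be a power of two, e.g. 512
        double[] re = new double[n];
        double[] im = new double[n];
        for (int i = 0; i < n; i++) {
            // Hann window reduces spectral leakage before the FFT.
            double w = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / (n - 1));
            re[i] = block[i] * w;
        }
        Fft.transform(re, im);                // hypothetical helper
        double[] powerDb = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double mag2 = re[k] * re[k] + im[k] * im[k];
            powerDb[k] = 10.0 * Math.log10(mag2 + 1e-12);   // avoid log of zero
        }
        return powerDb;
    }
}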
3.2.2 LSB
One of the first techniques investigated in the watermarking field, as for virtually all
media types, is the so-called LSB encoding. It is based on substituting the LSB of the
carrier signal with the bit pattern from the watermark noise.
This method places message bits into cover audio by modifying the least significant bits
of the audio. The scheme developed places bits into the mth bit of the cover audio, where
m is a parameter that ranges from 1 (MSB) to 16 (LSB). This method has extremely low
computational complexity, on the order of O(n). To allow as fair a comparison between
watermarking methods as possible, the cover audio is segmented and bits are placed at the
first location of each segment. Thus, the embedded bit locations were known in advance.
This violates the provision that watermarks should be statistically invisible; technically
this method can be considered as more of a “data-hiding” than a watermarking algorithm.
Decoding simply involves taking values at these known locations and extracting the
desired bit. Interestingly, bits encoded down to the 10th bit location could not be heard by
human observers.
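A minimal sketch of this embedding scheme is given below. It assumes 16-bit signed PCM samples held in a short array; the segment length and the bit-depth parameter m are illustrative values, not the project's exact settings.

// Sketch of LSB-style embedding into the m-th bit of the first sample of each segment.
// m counts from 1 (MSB) to 16 (LSB), matching the description above.
public final class LsbEmbedder {

    public static void embed(short[] samples, boolean[] bits, int segmentLength, int m) {
        int bitPos = 16 - m;                      // 0 = LSB, 15 = MSB
        for (int i = 0; i < bits.length; i++) {
            int idx = i * segmentLength;          // first sample of segment i
            if (idx >= samples.length) break;
            int s = samples[idx] & 0xFFFF;
            s &= ~(1 << bitPos);                  // clear the target bit
            if (bits[i]) s |= (1 << bitPos);      // set it when the watermark bit is 1
            samples[idx] = (short) s;
        }
    }

    public static boolean[] extract(short[] samples, int bitCount, int segmentLength, int m) {
        int bitPos = 16 - m;
        boolean[] bits = new boolean[bitCount];
        for (int i = 0; i < bitCount; i++) {
            int idx = i * segmentLength;
            if (idx < samples.length) {
                bits[i] = ((samples[idx] >> bitPos) & 1) == 1;   // read bit at known location
            }
        }
        return bits;
    }
}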
3.2.3 Echo Hiding
Echo Hiding was developed by Gruhl, Lu and Bender in Echo Hiding [25] and proposed
to encode bits by introducing a small, imperceptible echo to the file. Overlapping an echo
kernel with the original signal implements the echo.
A variety of watermarking algorithms are based on echo hiding methods, according to
Techniques and Applications of Digital Watermarking and Content Protection [21]. Echo
hiding algorithms embed watermarks into a signal co(t) by adding echoes co(t − ∆t) to
produce a marked signal cw(t ):
cw(t) = co(t) + αco(t − ∆t) (i)
In the above equation, the parameters ∆t and α can be adjusted to keep the echo inaudible,
whilst changing ∆t alone encodes bits of the watermark into the audio signal. In general,
(i) can be written as
cw(t) = Σk αk co(t − ∆tk), k = 0, . . . , N (ii)
where co(t) is the original signal with parameters α0 = 1, ∆t0 = 0, and N the number of
different echo signals embedded. By substituting the response function
h(t) = Σk αk δ(t − ∆tk) (iii)
a short form convolution of the echoes with the original signal can be written
cw(t) = co(t) * h(t) (iv)
The marked signal cw(t) can also be expressed in the frequency domain as
Cw(ω) = Co(ω)H(ω) (v)
where Co(ω) and H(ω) are the Fourier transformations of the signals co(t) and h(t),
respectively. During the detection step, the calculation of h(t) is necessary to determine
the individual echoes with corresponding delay times ∆tk encoding the bits k = 1, . . . , N.
According to (v), the signal can be separated by dividing Cw(ω) by Co(ω) in the frequency
domain and calculating the inverse Fourier transformation. Performing this operation
requires a priori knowledge of the original signal Co(ω), which is not practical in the
case of watermarking. The method for separating the signal and the echoes is known as
homomorphic deconvolution.
The basic idea behind homomorphic deconvolution is to apply a logarithmic function to
convert the product (v) into a sum. Using the definition of the complex cepstrum as the
inverse Fourier transformation of the log-normalized Fourier transform of the
watermarked signal, the transformed signal can be written as
Cw(q) = F−1{log |Co(ω)H(ω)|}
      = F−1{log |Co(ω)|} + F−1{log |H(ω)|}
      = Co(q) + H(q) (vi)
as a function of the time or quefrency domain which is equivalent in nature to a time
domain representation. According to (vi), the original signal Co(q) and the embedded
echoes H(q) are clearly separated on the quefrency axis q. Using this deconvolution
technique in the detection of the watermark bits, an algorithm adding two different echoes
for embedding 0 and 1 bits can be constructed. The original signal co is split into M
blocks coj, 0 ≤ j ≤ M − 1, each with N samples. Each block carries 1 bit of the
watermark.
1. For each block co j of the original signal, the echo signal for the 0 and 1 bits are
constructed with the corresponding delay time and attenuation factors α0 and α1.
wk(t) = αkco(t − ∆tk), for k = 0, 1 (vii)
2. Two complementary modulation signals mk(t), k = 0, 1, for the 0 and 1 bits are
generated:

m0(t) = (1 − bj) rectj(t), m1(t) = bj rectj(t) (viii)

with

m0(t) + m1(t) = 1 for all t, rectj(t) = 1 for tj ≤ t < tj+1 and 0 otherwise (ix)

and bj = m[j mod l(m)], where l(m) is the length of the watermark bit string m.
The modulation signals are used to construct the echo signals according to the bits of
the watermark.
3. After multiplying the echo signals wk(t) with the modulation signals mk(t), the marked
audio stream is generated by addition of the computed signals to the original one:
cw(t) = co(t) + m0(t)w0(t) + m1(t)w1(t) (x)
Applying this theory in practice, echo delays of 5500 and 4400 were found through trial
and error to yield good decoding results. The
amplitude of the echo kernel may also be adjusted. A higher amplitude means a stronger
echo. When the echo kernel amplitude was less than 0.5, very few listeners could hear any
difference between the original and echoed signal. Stronger amplitudes produced a more
resonant, “richer” sound.
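The embedding side of equations (vii) to (x) can be sketched as below for the simple case where each block receives a single echo chosen by its watermark bit. The delays, attenuation factor and block size are illustrative assumptions, and samples are taken to be floating-point values in [-1, 1].

// Sketch of per-block echo embedding: one of two delays is chosen per watermark bit
// and an attenuated, delayed copy of the original is added (cw(t) = co(t) + a*co(t - dt)).
public final class EchoEmbedder {

    public static double[] embed(double[] original, boolean[] bits,
                                 int blockSize, int delay0, int delay1, double alpha) {
        double[] marked = original.clone();
        for (int j = 0; j < bits.length; j++) {
            int start = j * blockSize;
            if (start >= original.length) break;
            int end = Math.min(start + blockSize, original.length);
            int delay = bits[j] ? delay1 : delay0;        // delay encodes the bit
            for (int t = start + delay; t < end; t++) {
                marked[t] += alpha * original[t - delay]; // attenuated, delayed copy
            }
        }
        return marked;
    }
}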
Retrieving the watermark requires a synchronization procedure to perform an alignment
with the watermarked blocks:
1 Transformation of the sequence in the cepstrum domain Cw = F−1{log(|F{ cw}|)};
2 Autocorrelation of Cw in the cepstrum domain;
3 Measurement of the delay time δt via the peaks of the autocorrelation of Cw;
4 Determination of the embedded bit by comparison of δt with ∆tk, k = 0 or 1.
Using masking effects, echo hiding uses the postmasking effect in order to control the
inaudibility of the embedded watermark. The louder the echo, the stronger the watermark
will be. The watermark should not fall far below the lowest level humans can hear, else
any lossy compression will instantly cut it, but nor should it be much greater, else it
will be too audible. The delay times ∆tk and attenuation factors αk, k = 0, 1
have to be adjusted in the embedding process according to the perception threshold of the
human auditory system to ensure the relative inaudibility of the echoes. It is a blind
watermarking method, so the original audio file is not required, which extends the
usability of the method. The embedding and the detection are performed in two different
domains, the time and cepstrum domains respectively, which can add complexity to the
algorithm since a large number of transformations has to be computed for detection in
the cepstrum domain.
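As a sketch of the first detection step, the real cepstrum of a block can be computed as below; Fft.transform and Fft.inverseTransform are hypothetical helper routines standing in for the project's own FFT code. Peaks in the returned array near the two candidate delays indicate which echo, and hence which bit, is present.

// Sketch of the cepstrum Cw(q) = F^-1{ log |F{cw}| } used for echo detection.
public final class CepstrumDetector {

    public static double[] realCepstrum(double[] block) {
        int n = block.length;
        double[] re = block.clone();
        double[] im = new double[n];
        Fft.transform(re, im);                    // forward FFT (hypothetical helper)
        for (int k = 0; k < n; k++) {
            double mag = Math.hypot(re[k], im[k]);
            re[k] = Math.log(mag + 1e-12);        // log magnitude, avoiding log(0)
            im[k] = 0.0;
        }
        Fft.inverseTransform(re, im);             // back to the quefrency domain
        return re;                                // inspect peaks near delay0 and delay1
    }
}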
3.2.4 Spread-Spectrum
Spread-spectrum methods, originally conceived for masking the origin of radio
transmissions and enhancing resilience against jamming, are often used in the
transmission of digital information according to Techniques and Applications of Digital
Watermarking and Content Protection [21]. The requirements of suppressing
jamming during transmission, hiding a signal from an unintended listener and ensuring
information privacy are very similar to those in watermarking applications; in fact,
spread-spectrum methods are probably the most widely used techniques in the development of watermarking
algorithms. From the spread-spectrum viewpoint, the original audio signal can be
considered as a jammer interfering with the signal carrying the watermark information.
The spread-spectrum modulation is a special form of watermark modulation. The
modulation is performed on Co, which is the transformed block of samples co. The
transformation is used to model the audio signal with orthonormal base functions
spanning the signal space. If the identity transformation is used, the signal is represented
by the block of PCM samples itself. In the case of the Fourier transformation, the
trigonometric functions are used as basis functions and the transformed block consists of
the Fourier coefficients, represented by the vector Co. Each bit k ∈ {0, 1} is modelled by a
pseudonoise vector pnk consisting of two equally probable elements {−1, +1} generated
by means of the secret key. Therefore, the expectation value of the pseudonoise sequence
is E{pnk} = 0. Usually the pseudonoise sequences for the two bits are inverted: pn0 =
−pn1 = pn. The original signal co is split into M blocks coj, 0 ≤ j ≤ M − 1, each with
N samples.
To simplify the discussion, consider one block (co : = co j ) carrying 1 bit of the
watermark.
1. The block co is transformed with the orthogonal transform T in the corresponding
domain Co.
Co = T (co) (i)
2. The PN sequence pnk is weighted with α to adjust between quality and robustness.
W = αpnk (ii)
3. The modulated and weighted watermark signal is added to the cover signal in the
transformed domain.
Cw = Co +W (iii)
4. The watermarked signal is transformed back into the time domain.
cw = T −1(Cw) (iv)
During the detection step, the same vector pnk, k = 0, 1 has to be generated via the secret
key. A comparator function is used in order to decide about the presence of the embedded
vector pn. This requires a perfect synchronization with the embedding block of samples.
1. Synchronization with the beginning of the embedding block cw;
2. Transformation of cw into embedding domain Cw = T (cw);
3. Correlation of Cw with pnk, k = 0, 1 by applying the comparator function Cτ:
Cτ (Cw, pn) = Cτ (Co, pn) + Cτ (αpn, pn) (v)
4. Detection of the transmitted bit, usually made on the sign of the comparator function:

sign(Cτ(Cw, pn)) > 0 for pn0, sign(Cτ(Cw, pn)) < 0 for pn1 (vi)
One of the widely used comparator functions Cτ is the linear correlation

Cτ(x, y) = ⟨x, y⟩ = (1/N) Σi x[i] y[i] (vii)
with the signal vectors x and y. The result of the correlation consists of the two
contributions Cτ (Co, pn) and Cτ (αpn, pn). The second term accumulates the contribution
of the pseudonoise sequence embedded in the different base functions, whereas the first
term represents the correlation, or interference, between the carrier signal and the
pseudonoise sequence. If the pseudonoise sequence is split into the two sequences
containing positive and negative elements, the correlation Cτ (Co, pn) can also be written
as:
Cτ(Co, pn) = (1/N) [ Σi C+o[i] − Σi C−o[i] ] = (µ+ − µ−) / 2 (viii)
with µ+ and µ− denoting the mean values. According to the central limit theorem, the
distribution of the means is normal if N is sufficiently large. Furthermore, the difference
of two normal distributions is also normal with N(µCτ , σCτ ). Since Co and pn are two
independent random variables, the mean µCτ and the variance σCτ can be calculated
according to
µCτ = E{Cτ(Co, pn)} = E{Co} E{pnk} = 0 (ix)

σ²Cτ = Var{(µ+ − µ−)/2} = (σ²µ+ + σ²µ−)/4 = σ²Co / N (x)
By using the model of the distribution function N(0, σCo/√N) in the unwatermarked case
and assuming a fixed weighting α := {α}, i = 1, . . . , N, of the pseudonoise sequence, the
probability distribution functions for the two sequences are normal densities of the same
width, centred at +α and −α respectively:

fpn0(t) = N(+α, σCo/√N)(t), fpn1(t) = N(−α, σCo/√N)(t) = fpn0(−t) (xi)
Errors in detection of the bits occur if Cτ (Co, pn) > Cτ (αpn, pn). Therefore, the false
alarm probability is obtained by
Pfa = P01 + P10 = p0 ∫(−∞,0) fpn0(t) dt + p1 ∫(0,∞) fpn1(t) dt (xii)
where P01 represents the probability that a 0 bit is transmitted and a 1 bit is detected,
and P10 the converse. Setting the a priori probabilities that the different bits are
transmitted to p0 = p1 = ½ and using the definition of the complementary error function erfc(x),
erfc(x) = 1 − erf(x) = (2/√π) ∫(x,∞) e^(−t²) dt (xiii)
this can be written with the threshold τ = α according to (xii) as

Pfa = P01 + P10 = ½ erfc( τ / (√2 σCτ) ) = ½ erfc( α√N / (√2 σCo) ) (xiv)
Different kinds of audio watermarking algorithms use different embedding domains and
representations of the transformed signal vector Co. Furthermore, the psychoacoustic
parameters have to correspond to the specific embedding domain in order to perform the
psychoacoustic weighting step. One of the first algorithms to use the masking
properties of the human auditory system, proposed by Tewfik et al., works in the Fourier domain.
The psychoacoustic weighting is performed by shaping the Fourier coefficients of the PN
sequence according to the masking threshold calculated by a psychoacoustic model (such
as the procedure outlined in section 3.2.1). Furthermore, this algorithm approximates the
temporal masking behaviour by using the envelope of the signal for the increase and a
decaying exponential for the decrease of the signal.
Spread spectrum is a widely used technique for different types of media given its high
robustness against signal manipulations. If a secret key is used to generate the
pseudonoise sequence pn, this algorithm does not need the original audio signal in order
to detect the embedded bits and is therefore a blind watermarking method, provided that
the synchronization requirement is met. The main disadvantage is the vulnerability
against desynchronization attacks. Furthermore, the length of the correlator has to be
sufficient in order to ensure small error probabilities, which is evident from (xiv).
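A sketch of the embedding and detection steps is given below for the simplest case, where the identity transform is used and the block of samples itself is modulated. The use of java.util.Random as the key-driven pseudonoise generator and the fixed weighting alpha are simplifying assumptions.

import java.util.Random;

// Sketch of spread-spectrum embedding and blind detection in the sample domain.
// pn0 = -pn1 = pn; detection uses the sign of the normalised linear correlation.
public final class SpreadSpectrum {

    static double[] pnSequence(long key, int n) {
        Random rng = new Random(key);          // stand-in for a key-driven PRNG
        double[] pn = new double[n];
        for (int i = 0; i < n; i++) pn[i] = rng.nextBoolean() ? 1.0 : -1.0;
        return pn;
    }

    static double[] embedBit(double[] block, boolean bit, long key, double alpha) {
        double[] pn = pnSequence(key, block.length);
        double[] out = new double[block.length];
        for (int i = 0; i < block.length; i++) {
            out[i] = block[i] + (bit ? -alpha : alpha) * pn[i];   // Cw = Co + alpha*pn_k
        }
        return out;
    }

    static boolean detectBit(double[] block, long key) {
        double[] pn = pnSequence(key, block.length);
        double c = 0.0;
        for (int i = 0; i < block.length; i++) c += block[i] * pn[i];
        c /= block.length;                      // normalised correlation
        return c < 0.0;                         // negative => pn1 was embedded => bit 1
    }
}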
3.2.5 The Patchwork Technique
The patchwork technique, first presented by Bender et al. in Techniques For Data
Hiding, [15] for embedding watermarks in images is a statistical method based on
hypothesis testing described in Techniques and Applications of Digital Watermarking and
Content Protection [21]. These methods use stochastic models relying on large sets,
which make them applicable for CD-quality audio data due to the large amount of
samples. The watermark encoding procedure uses a pseudorandom process to embed a
certain statistic into a data set which is detected in the reading process with the help of
numerical indexes (like the mean) describing the specific distribution. This method is
applied to magnitudes in the Fourier domain in order to spread the watermark in the time
domain and be more robust against random sample cropping operations.
The selection of the two subsets can be described by a permutation of the indices i = (1, . .
. , 2N) according to the bit to be embedded:
π = (a1, . . . , aN, b1, . . . , bN), with pn[ai] = +1, pn[bi] = −1 (xv)
Therefore, the watermarked block is obtained by
Cw[n] = Co[n] + ∆Co[n]pn[n], n = π[i], i = 1, . . . , 2N (xvi)
Cw = Co +W (xvii)
where the alteration of the different Fourier magnitudes is described by the vector ∆Co.
The test performed during detection in the patchwork algorithm is a difference of subsets
defined by the indexes a1, . . . , aN and b1, . . . , bN, which can be written as
Σi=1..N (Cw[ai] − Cw[bi]) = Σi=1..2N Cw[π[i]] pn[π[i]] = Cτ(Cw, pn) (xviii)
Therefore, the patchwork technique in this form is equivalent to the linear correlation
comparator function in the spread-spectrum technique as described earlier.
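The detection statistic of equation (xviii) can be sketched as follows; the key-driven index selection via java.util.Random and the use of Fourier magnitudes as the input values are assumptions made for illustration (the sketch also assumes 2n is comfortably smaller than the number of magnitudes).

import java.util.Random;

// Sketch of the patchwork test statistic: a key-defined permutation selects two
// disjoint subsets of Fourier magnitudes and their difference of sums is returned.
public final class Patchwork {

    public static double statistic(double[] magnitudes, long key, int n) {
        int[] idx = new int[2 * n];
        boolean[] used = new boolean[magnitudes.length];
        Random rng = new Random(key);
        int count = 0;
        while (count < 2 * n) {
            int i = rng.nextInt(magnitudes.length);
            if (!used[i]) { used[i] = true; idx[count++] = i; }   // draw distinct indices
        }
        double diff = 0.0;
        for (int i = 0; i < n; i++)     diff += magnitudes[idx[i]];   // subset A (pn = +1)
        for (int i = n; i < 2 * n; i++) diff -= magnitudes[idx[i]];   // subset B (pn = -1)
        return diff;    // a large positive value suggests the watermark is present
    }
}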
3.3 Development Methodology
Whilst implementing the project I planned to stick to an evolutionary development model
whose stages consisted of expanding increments of an operational software product. This
enabled me to deliver a working solution of at least one watermark implementation fairly
quickly.
Critics of this software model argue that it veers towards a 'code and fix' model, so it is
important to plan its evolution and provision wisely for unplanned events.
Each stage is implemented and tested before progressing to the next. This provides a
trusted base to isolate bugs faster:
First phase of development
This stage is concerned with creating the end-to-end framework required to implement a
watermark and a very simple watermarking technique itself.
From a component level this involves:
- The creation of a generic interface to represent a watermark and its associated
operations.
- A component that manages the application and detection of a watermark given an
audio file and a given implementation of the watermark interface.
- An implementation of the Watermark interface that uses the LSB algorithm.
- A test class that applies a watermark to an input file and then attempts to detect
the same watermark within the resultant file.
Subsequent phases of development
Each additional stage of development was concerned with implementing a new version of
the Watermark interface described above.
3.4 Test Plan
3.4.1 Module Testing
The modules were designed to be self-contained and therefore able to be tested separately
by both an individual test harness and by hand using a debugger and visualisation tools.
For testing purposes, a class that takes as input both a Watermark and an audio file will
insert the watermark into the audio file and then scan the resultant file for the
corresponding watermark. If it cannot detect the freshly inserted watermark there is an
error in the implementation.
In the case of the Maths functions, it will be possible to test the validity of some simple
identity relationships with random data (a test sketch follows the identities), such as:
fastFourierTransform(inverseFastFourierTransform( x )) = x
convolution(x, [1, 0, 0...]) = x
cepstrum(convolution(x, y)) = cepstrum( x ) + cepstrum( y )
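A sketch of how the first identity might be checked with JUnit is shown below; MathsUtil and its method names are hypothetical stand-ins for the project's own maths module, and the signals are treated as real-valued arrays for simplicity.

import static org.junit.Assert.assertEquals;

import java.util.Random;
import org.junit.Test;

// Sketch of a module test for the FFT/IFFT identity on random data.
public class MathsUtilTest {

    @Test
    public void fftOfInverseFftIsIdentity() {
        Random rng = new Random(42);
        double[] x = new double[1024];                     // power-of-two length
        for (int i = 0; i < x.length; i++) x[i] = rng.nextDouble() * 2 - 1;

        double[] roundTrip = MathsUtil.fastFourierTransform(
                MathsUtil.inverseFastFourierTransform(x)); // hypothetical helpers

        for (int i = 0; i < x.length; i++) {
            assertEquals(x[i], roundTrip[i], 1e-9);        // allow floating-point error
        }
    }
}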
3.4.2 End To End Testing
For end-to-end testing, a class that takes as input both a Watermark and an audio file will
insert the watermark into the audio file and then scan the resultant file for the
corresponding watermark. If it cannot detect the watermark in an audio file that has just
had a watermark inserted, then there is an error in the implementation.
3.5 Evaluation Data
To evaluate the different watermark techniques it is necessary to supply different types of
input files to obtain a balanced view of how the watermark copes with a variety of styles.
In mass usage, the permutations of sounds that possibly exist mean that testing cannot be
exhaustive, but by covering a handful of different sound types, we should receive a good
reflection.
The sounds chosen and the reasons why are as follows:
• A single vocal. A one layered sound that the human ear easily recognises.
Volume levels fluctuate greatly in a single vocal since pauses in between words
tend to fall to very low volumes. In a sound as basic as this, audible marks
should be relatively easy to recognise.
• A drum beat. The drum beat has some of the qualities of the single vocal in that
not much is going on in the sound so it is easy to pick out the individual elements
of the sound. The test posed for the watermark in this case is whether it can
withstand a constant rhythm with identical beats hitting in exact timings.
• Classical music test. This particular classical snippet is heavy on treble and is the
longest sample used, testing the algorithms' ability to deal with sounds that really
utilise the windowing technique. The subtleties of this classical piece and its quiet
nature make the effects on an actual piece of music clearer.
• Loud full band test. This piece should withstand the most due to its complexity
and layering. With many layers of drums, bass, vocals and guitars, the watermarks
should be relatively easy to disguise.
3.6 Attacking Techniques
As mentioned in Chapter 2, there are two types of attack, intentional and unintentional.
As stated in [5], some sources like to define robust watermarks as those resistant to
unintentional attacks, whilst secure watermarks survive intentional attacks as well, making robust
watermarks a subclass of their secure counterparts. In the image domain, set benchmarks
exist as a set measure of the durability of the watermark. As mentioned in Chapter 2,
despite these benchmarks being designed for images, it doesn’t take much intuition to see
how they can be adapted for use in the audio domain. Take the Stirmark Benchmark
[10] for example. Stirmark was developed by Fabien Petitcolas, a well-known developer
in the field of watermarking. There also exists Checkmark Benchmarking [11] and
Optimark [12] which perform similar tasks but with extra benchmarks tested against.
With audio watermarking, more specific attacks can be implemented. Using these
adaptable attacks alongside some suggested in Techniques and Applications of Digital
Watermarking and Content Protection [21], a set of attacks were produced to test the
robustness of the watermark.
3.6.1 Cropping
Cropping is performed by shortening the sound file or by removing an arbitrary piece
from within the waveform. The crop could occur anywhere within a track but is most
likely to happen at the start or the end of the song, to avoid disrupting the flow of the
sound. The following setting was applied to both the Echo Hiding and LSB Coding
watermarks as a form of attack: the original sound was 2 minutes, 24 seconds and 520
milliseconds long, and 1 second, 315 milliseconds was cropped off the end of the file,
disrupting the layout of the file without seriously disfiguring the sound inside.
3.6.2 Noise Reduction
Noise reduction is the process of removing specific sound frequencies within a waveform.
Using an envelope shape to specify which frequencies to remove means that there are
many different ways to apply noise reduction to strip out the specific sounds you want
removed.
3.6.3 High and Low Pass Filtering
High pass filters block low pitch frequencies but allow high-pitched frequencies to pass.
They can remove deep rumbling noise or unwanted sounds below a given cut-off
frequency. A high pass filter was applied at 250Hz, only allowing frequencies above
this through. Applying the filter at this level meant the song was not affected too much
by the reduction in sound.
Low pass filters block high-pitched frequencies (treble), but allow low pitched
frequencies (bass) to pass. They can be used to reduce high-end hiss noise or remove
unwanted sounds above the given cut-off frequency. By applying a low pass filter at
4000Hz to both sounds, the effect on the actual sound itself is minimised.
3.6.4 Lossy Compression
Lossy compression is probably the most common attack a watermark will have to
withstand whilst also being the most difficult to withstand. Lossy compression is
compression where the original image cannot be perfectly retained from the compressed
form as opposed to lossless compression. The most common form of lossy compression
today is MPEG Layer 3, more commonly known as MP3 compression. MP3
compression works because uncompressed audio stores more data than the human brain
can actually process. It utilises the fact that if two sounds occur together but one is
much louder than the other, your brain may never perceive the quieter signal, and that
your ears are more sensitive to some frequencies than others.
MP3 encoding tools analyse the incoming source signal and compare patterns to
psychoacoustic models stored in the encoder itself. The encoder can then discard most of
the data that doesn't match the stored models. MP3 frames can also borrow space from an
adjoining frame if it has excess capacity.
To achieve compression of approximately 1000% efficiency and still be left with a sound
very similar to the original, MP3s go through a lot of processing. With these measures in
mind, MP3 encoding is likely to disturb the entire layout of the sound file.
3.6.5 Addition of noise
Certain noises are not perceived by the human ear, especially if they are quiet, consistent
and continuous. Exploiting this, noise can be added consistently throughout the original
file at a level barely audible to the human ear. The noise is mixed with the file from
start to finish using an arbitrary white noise sample.
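A minimal sketch of this attack is shown below; the 0.5% noise level is an illustrative
assumption, chosen to stay barely audible.

    import java.util.Random;

    // Mixes low-level white noise into 16-bit PCM samples, start to finish.
    public class NoiseAttack {
        static void addNoise(short[] samples, long seed) {
            Random rng = new Random(seed);
            double level = 0.005 * Short.MAX_VALUE; // roughly 0.5% of full scale
            for (int i = 0; i < samples.length; i++) {
                double noisy = samples[i] + (rng.nextDouble() * 2.0 - 1.0) * level;
                // Clamp so the added noise can never clip or wrap around.
                samples[i] = (short) Math.max(Short.MIN_VALUE,
                        Math.min(Short.MAX_VALUE, Math.round(noisy)));
            }
        }
    }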
3.6.6 Changing The Sampling Rate
By changing the number of samples used to represent the sound, what is left is essentially
a different signal. Raising the sample rate alone changes nothing, since no genuinely new
samples can be introduced where none were recorded. In this test the sample resolution
was instead halved from 16-bit to 8-bit in both sounds (strictly a change of sample size
rather than rate), leaving the waveform half as precisely defined.
3.6.7 Pitchshifting
By changing the pitch of the sound sample by small amounts, the listener should notice
little difference, if any. Raising the pitch shortens the length of the sample, since
pitchshifting of this kind also changes the tempo of the sound.
3.6.8 Volume Reduction
By making slight changes to the volume, the quality of the recording will suffer slightly,
but if the change is small enough the difference should be negligible. Increasing the
volume risks distorting the highest peaks of the recording when they exceed the maximum
representable level, an effect known as clipping. It is therefore safest to reduce the
volume of the entire sound slightly, so the difference is less noticeable.
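A minimal sketch of the volume change follows; a gain of 0.75 corresponds to the
three-quarters level mentioned in the results later.

    // Scales 16-bit PCM samples by a gain factor; gain < 1 reduces the volume.
    public class VolumeAttack {
        static void scaleVolume(short[] samples, float gain) {
            for (int i = 0; i < samples.length; i++) {
                int v = Math.round(samples[i] * gain);
                // Clamp in case gain > 1 pushes a peak past full scale (clipping).
                samples[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
            }
        }
    }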
3.6.9 Sound Compression
This is a dynamics effect that makes use of compressors, limiters, expanders and gates.
The process is controlled by three parameters: the ratio, the threshold and the smoothness.
The ratio specifies the compression or expansion ratio. This value was set at 90% to
ensure the difference made to the file itself was minimal.
The threshold specifies the envelope level at which the expander or compressor activates;
compressors change the volume of all sounds above that level. Depending on the
smoothness setting, the threshold may have to be set much lower than expected. In this
case it was set at 0.250.
The smoothness specifies how quickly the compressor moves from one volume level to
the next, and how quickly it activates. A value of 0% means volumes change instantly,
which can cause rough distortion in sections of audio that hover around the threshold
level. A value of 100% means volumes change gradually over 100ms. With a high
smoothness setting the threshold has to be reduced, because the higher setting makes the
envelope detector respond more slowly to changes in the sound, resulting in a lower
envelope range. As a compromise, 50% was chosen for the smoothness.
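A minimal sketch of a compressor built around these three parameters follows. The
envelope smoothing is a simplification of the behaviour described above, and the
parameter values mirror the test settings.

    // Feed-forward compressor over normalised float samples in [-1, 1].
    public class Compressor {
        static void compress(float[] samples, float threshold, float ratio, float smoothness) {
            float env = 0f; // smoothed envelope follower
            for (int i = 0; i < samples.length; i++) {
                float level = Math.abs(samples[i]);
                env = smoothness * env + (1f - smoothness) * level;
                if (env > threshold) {
                    // Scale only the portion of the envelope above the threshold.
                    float gain = (threshold + (env - threshold) * ratio) / env;
                    samples[i] *= gain;
                }
            }
        }

        public static void main(String[] args) {
            float[] samples = new float[44100]; // e.g. one second of audio
            compress(samples, 0.250f, 0.90f, 0.50f); // the settings used in the tests
        }
    }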
4. Implementation
4.1 Framework Overview
The framework overview corresponds to the first phase of development as described in
Development Methodology (section 3.3):
Watermark Interface
Provides a generic interface to represent a watermark and its associated operations.
WatermarkImpl
Encapsulates common functionality all watermark implementations require.
WatermarkedAudioInputStream
The WatermarkedAudioInputStream class is a subclass of
javax.sound.sampled.AudioInputStream.
The AudioInputStream is used for both reading and writing audio. The extended version
of this class enables users to read audio, apply a watermark and save the watermarked
audio to a standard audio format.
Message
The message class can determine which bit of the watermark message needs to be
inserted for the current block of audio data.
Audio Format
This class encapsulates information about the underlying audio format.
LSBDetector
Detects LSB watermarks in input audio streams.
LSBWatermark
Inserts LSB watermarks into the underlying audio stream.
[Figure: Framework Overview UML class diagram]
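To make the LSB approach concrete, a minimal sketch of least-significant-bit embedding
follows. It assumes 16-bit little-endian PCM (the low byte of each sample comes first)
and is an illustration of the idea only, not the project's actual LSBWatermark
implementation.

    // Writes one message bit into the least significant bit of each 16-bit sample.
    public class LsbSketch {
        static void embed(byte[] pcm, boolean[] messageBits) {
            for (int bit = 0; bit < messageBits.length && 2 * bit + 1 < pcm.length; bit++) {
                int lo = 2 * bit; // low-order byte of sample 'bit' holds the sample's LSB
                pcm[lo] = (byte) ((pcm[lo] & ~1) | (messageBits[bit] ? 1 : 0));
            }
        }

        static boolean[] extract(byte[] pcm, int messageLength) {
            boolean[] bits = new boolean[messageLength];
            for (int bit = 0; bit < messageLength && 2 * bit + 1 < pcm.length; bit++) {
                bits[bit] = (pcm[2 * bit] & 1) == 1;
            }
            return bits;
        }
    }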
4.2 Graph Visualisation Tool
The graph visualisation applet was developed to help visualise the processes as they
occurred. It proved a valuable debugging tool: the watermark operations currently being
applied could be inspected by eye, making it possible to identify where a problem lay.
On the following graphs, the x axis is measured in samples in the time domain, or in
quefrency in the cepstrum domain. The two scroll bars were part of a test harness
application, where they were used when calculating the power density spectrum. The
debug output is controlled by viewGraph.
In the following graph, a visualisation of echo hiding can be seen. Notice the large peak
on the right: it sits in one of two possible positions, representing a 1 or a 0 bit. What is
shown is a portion of the cepstrum produced by the echo hiding algorithm; the detector
searches for these strong peaks in the sound pattern in order to decode the watermark.
The following graph is an example where the echo has not been found.
4.3 Fast Fourier Transform
The Fast Fourier Transform is a well-studied algorithm, so implementing one from
scratch would be a poor use of time. Instead I decided to obtain pre-written source code
for the FFT. Searching for such code in Java proved fruitless, leaving two alternatives:
write my own or port an FFT from C code.
I decided to port the Numerical Recipes implementation of the FFT from Numerical
Recipes in C [23]. This implementation is well known and is relatively straightforward to
adapt to the Java language.
A peculiarity was noticed in the specific implementation of the Numerical Recipes FFT.
In reference to the realft packing, it states:
"Calculates the Fourier transform of a set of n real-valued data points. Replaces this
data (which is stored in array data[1..n]) by the positive frequency half of its complex
Fourier transform. The real-valued first and last components of the complex transform
are returned as elements data[1] and data[2], respectively. n must be a power of 2. This
routine also calculates the inverse transform of a complex data array if it is the
transform of real data. (Result in this case must be multiplied by 2/n.)"
From this description it can be noted that the real data[1..n] passes through realft and
becomes the positive frequency half of its complex Fourier transform. The real-valued
first and last components are returned as data[1] and data[2] because the negative
frequency half consists of complex conjugates.
All arithmetic performed on arrays in the format produced by realft therefore had to be
aware of this packing: for example complexMultiply2, logModulus, and modulusSqr, the
last of which also unpacks its result into a plain list of real values.
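As an illustration of such packing-aware arithmetic, a sketch of a complex multiply over
this format follows, written against 0-based Java arrays as used by the project's FFT
wrappers; the project's own complexMultiply2 is assumed to behave along these lines.

    // Pointwise complex multiply of two spectra in the realft packed format:
    // a[0] holds the purely real DC term, a[1] the purely real Nyquist term,
    // and each following pair holds (re, im) of one positive frequency.
    static void multiplyPacked(float[] a, float[] b) {
        a[0] *= b[0]; // DC component
        a[1] *= b[1]; // Nyquist component
        for (int i = 2; i < a.length; i += 2) {
            float re = a[i] * b[i] - a[i + 1] * b[i + 1];
            float im = a[i] * b[i + 1] + a[i + 1] * b[i];
            a[i] = re;
            a[i + 1] = im;
        }
    }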
4.4 Watermark Implementations
[Figure: Watermark Framework UML class diagram]
4.5 Implementation Review
The watermarking techniques implemented cover LSB coding, echo hiding and the
patchwork technique, from the types discussed in Chapter 2. Before any attack was
implemented, certain strengths and weaknesses were already apparent from the
algorithms. Implementing the LSB algorithm is a relatively straightforward task, since it
involves no actual sound manipulation, only the placing of binary values.
For all other techniques, explicitly coding the algorithms gave a good insight into the
true workings of the watermark. Hand-coding them was intended to reveal further
reasons for the weakness of the watermarks. Although this process did illuminate how
the watermarks work and gave a better understanding of their advantages and
disadvantages, no epiphanies occurred.
At present the block size used is 8192 samples. The echo delays are 5500 and 4400
samples respectively (roughly a tenth of a second at a 44100Hz sample rate), which
means only about half of each 8192-sample block contains an echo. Had the delays been
much shorter, it would have been far more difficult to extract the peaks, because the
cepstrum is much noisier at smaller echo delays; had the echo been much louder, it
would have been too audible. Both factors add to the uncertainty when detecting the
watermark. It is possible to wrap the echo so that the last part of the block is heard
first, but the resulting sound would be strange and noticeable, so no echo wrapping was
used. The effect of silences in the sound adds further uncertainty.
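For reference, the embedding step can be sketched as below: a delayed, attenuated copy
of the block is added to itself, with the delay chosen by the message bit. The delays are
those given above; the method name and amplitude parameter are illustrative rather than
the project's exact API.

    // Embeds one bit into a block by echo hiding; iterating backwards ensures
    // each added echo is taken from the unmodified part of the block.
    static void embedEchoBit(float[] block, boolean bit, float echoAmplitude) {
        int delay = bit ? 5500 : 4400; // one delay per bit value
        for (int i = block.length - 1; i >= delay; i--) {
            block[i] += echoAmplitude * block[i - delay];
        }
    }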
A workaround was implemented when a problem arose with the program failing to
terminate: the BugFix class ensures the javax.sound.sampled API's thread system is shut
down, allowing the program to exit. As far as could be determined, the underlying
behaviour is a bug in the API.
Due to a lack of time, the implementation of the patchwork technique did not reach a
stage worthy of testing; however, the framework for the algorithm was built, following
the published description of its development.
When finding the maximum audible volume, the first two steps noted in Chapter 3
(calculation of the power spectrum; identification of the tonal (sinusoid-like) and
non-tonal (noise-like) components) were successfully implemented, but due to time
constraints the process went no further.
Detailed commenting of the source code thoroughly explains the workings of the
program.
5. Evaluation & Test Results
5.1 Attacking Techniques
As described in Chapter 3, a set of attacks was defined to be applied to the various
watermarking methods to test their robustness. This section collects those results and
presents them together with analysis and reasoning.
The following summary sets out results and predictions of watermarking robustness for
each technique against each attack. Since not all watermarks were fully implemented,
the entries for the patchwork and spread spectrum algorithms are estimates, made by
referring to Enhanced Spread Spectrum Watermarking of MPEG-2 AAC Audio [20] and
drawing conclusions from the points made there.
LSB
- Cropping: errors occur from the crop onwards.
- All other attacks (noise reduction, high/low pass filtering, lossy compression, addition of noise, changing the sample rate, pitchshifting, volume reduction, sound compression): watermark unrecognisable.

Echo Hiding
- Cropping: errors occur from the crop onwards.
- Noise reduction: errors grew in proportion to the amount of reduction.
- High/low pass filtering: the more filtering, the more errors, depending on the equalisation of the sound.
- Lossy compression: removes the watermark completely if the echo is inaudible; no change if audible; errors increase around the audibility mark.
- Addition of noise: the more silence in the sound, the more errors.
- Changing the sample rate: no effect.
- Pitchshifting: errors grow with the amount of shift, since higher-frequency definition is lost.
- Volume reduction: errors grow with the amount of reduction, since amplitude definition is lost.
- Sound compression: errors increase as compression increases.

Patchwork (predicted)
- Cropping: errors occur from the crop onwards.
- Noise reduction: watermark unrecognisable.
- High/low pass filtering: some errors caused.
- Lossy compression: watermark unrecognisable.
- Addition of noise: watermark unrecognisable.
- Changing the sample rate: causes some errors.
- Pitchshifting: errors grow with the amount of shift, since higher-frequency definition is lost.
- Volume reduction: errors grow with the amount of reduction, since amplitude definition is lost.
- Sound compression: causes some errors.

Spread Spectrum (predicted)
- Cropping: errors occur from the crop onwards.
- Noise reduction: errors grew in proportion to the amount of reduction.
- High/low pass filtering: the more filtering, the more errors, depending on the equalisation of the sound.
- Lossy compression: removes the watermark completely if it is inaudible; no change if audible; errors increase around the audibility mark.
- Addition of noise: the more silence in the sound, the more errors.
- Changing the sample rate: no effect.
- Pitchshifting: no effect.
- Volume reduction: errors grow with the amount of reduction, since amplitude definition is lost.
- Sound compression: errors increase as compression increases.
5.1.1 Cropping
In the case of both the LSB encoding and the echo hiding, the watermark remained
untouched up to the point where the crop occurred. Since the file is read in
chronological order, problems only begin at the break in the flow, after which the
detector can no longer find the watermark.
5.1.2 Noise Reduction
In the case of the LSB encoding, the watermark became unrecognisable, the noise
reduction seeming to wipe out all trace of it. The noise reduction also caused problems
for the echo hiding, since the signal was weakened by the reduction. Naturally, the
greater the severity of the reduction, the more the signal weakens, eventually eliminating
the watermark entirely. With this particular echo hiding algorithm, it did not take much
noise reduction to start distorting the results, with most envelope shapes rendering the
recovered watermark effectively useless for proving ownership.
5.1.3 High and Low Pass Filtering
In the case of the LSB encoding, the watermark became unrecognisable. For the echo
hiding, the effect of the filtering depended on the content of the sound itself. In the
classical music piece, the low pass filter caused fewer errors, since the sound was rich in
treble in its original form. This contrasted with the results for the high pass filter, where
the echo was completely eradicated, though the sound was also left quite warped; it
would be rare for someone to apply a filter that changed the sound so much. If the echo
were inaudible, however, the filters would be among the most efficient ways of removing
it, since they could cancel the relevant frequencies.
5.1.4 Lossy Compression
Lossy compression was the toughest attack applied to the sounds. In the case of the
LSB encoding, the watermark became unrecognisable. Echo hiding did not fare well
against it either: if the echo was inaudible, the process wiped out any trace of it
instantly, since discarding imperceptible detail is one of the main features of lossy
compression. When the echo was audible, some results could be obtained from the
detector, but even at a relatively high level the watermark was still not sufficient to give
a useful proof of ownership.
5.1.5 Addition of noise
In the case of the LSB encoding, the watermark became unrecognisable. During loud
moments in the audio the echo hiding fared well, returning positive results, but in
quieter moments the noise took effect. With a consistently loud sound like the rock
song, the watermark could be recovered to a reasonable level, but its use is limited.
5.1.6 Changing The Sampling Rate
In the case of the LSB encoding, the watermark became unrecognisable. It made no
difference to the echo hiding.
5.1.7 Pitchshifting
In the case of the LSB encoding, the watermark became unrecognisable. Pitchshifting
also made the echo hiding unrecognisable, though this was due more to the change in
the length of the file than to the change of pitch itself.
5.1.8 Volume Reduction
In the case of the LSB encoding, the watermark became unrecognisable. Small volume
changes made no difference to the echo hiding, but once levels dropped to around three
quarters, the errors became a lot more apparent.
5.1.9 Sound Compression
In the case of the LSB encoding, the watermark became unrecognisable. The sound
compression caused problems for the echo hiding, since the signal was weakened.
Naturally, the greater the severity of the compression, the more the signal weakens,
eventually eliminating the watermark entirely; with this particular echo hiding algorithm,
it did not take much compression to start distorting the results.
5.2 Evaluation Summary
Each of the watermarks discussed has been referred to as if it were a single fixed
technique, but most of the methods described are concepts adapted from implementations
drawn from various sources. It is not only the delay and audibility of the echo that can
be varied while still leaving a functioning echo hiding technique; the contrast between
the echo hiding results and the LSB results shows this.
With the LSB, the exact expected results were obtained every time, without a single
rogue value, yet it failed to remain robust against anything except, partially, a crop.
There are ways to improve the robustness of the LSB, such as moving the signal to a
more significant bit, which would probably remain inaudible; but for such a haphazard
method, faring significantly worse than the other watermarking techniques, there is little
point.
The echo hiding results were, on average, worse than expected. In principle it should
not be too difficult to predict how an echo will fare against these various forms of
attack, but with several free parameters, and a detection method that is far from
clear-cut, the scope for unexpected outcomes increases.
In general, the loud full-band song fared better than all the other samples. It was the
only piece not to contain a moment of absolute silence, which seemed to make a
difference, since the echo hiding watermark writer has nothing to work with in moments
of silence.
The input data of size N was zero-padded to a block of size 2N before performing
convolution. The advantage is that, where the echo is present, it is an exact delayed
copy of the source signal, making it less audible. The disadvantage is that the echo is
not continuously present; it is absent at the beginning of each block, which is believed to
make it harder for the detector to find. Padding also increases computation time,
because a list of size 2N is transformed for audio data of size N, roughly doubling the
cost.
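A sketch of this padded convolution step, reusing the FFT helpers and packed multiply
from the appendix sources, is shown below; it assumes the response is no longer than the
block.

    // FFT convolution with 2N zero-padding to avoid circular wrap-around.
    static float[] convolve(float[] block, float[] response) {
        float[] a = new float[block.length * 2]; // second halves stay zero
        float[] b = new float[block.length * 2];
        System.arraycopy(block, 0, a, 0, block.length);
        System.arraycopy(response, 0, b, 0, response.length);
        FFT.fastFourierTransform(a);
        FFT.fastFourierTransform(b);
        Channels.multiplyComplex2(a, b); // multiply in the packed frequency domain
        FFT.inverseFastFourierTransform(a);
        return a; // first N samples plus the tail that spills into the padding
    }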
Despite this, even if the techniques had surpassed expectations, their practical use would
still be severely limited. Most of the attacks implemented represent ordinary processing
rather than malicious tampering; if a watermark cannot withstand these, it is difficult to
see its commercial use.
6. Conclusion
Watermarking in the audio domain seems, as a concept, to be fundamentally flawed, a
view shared by many leading steganography researchers and similar to that found in Why
Watermarking Is Nonsense [24]. The reasons can be summed up as follows. If a
watermark exists and is inaudible, an all-pass filter or MP3 compression can remove it
with ease. If the watermark is audible, its uses are already limited, since in most cases
the user will not want the sound changed by the watermark; audibility also makes it
easier to locate the watermark itself, giving an attacker an advantage when trying to
remove it.
Can a watermark really be called secure? Can it really be claimed that we will reach a
stage where the information hidden within the medium is truly hidden? Security cannot
really be measured: all security systems are built on assumptions and measured
accordingly, and most real-world assumptions do not stand on their own, tending to be
invalidated as technology and its users evolve. The problem generalises to security as a
whole. One can ask how hard it is to break the weakest part of the system with existing
technology and a reasonable amount of resources, and how hard it is to invalidate the
system's assumptions; but the entire defence rests on assumptions about the attacks. As
long as watermarks exist, there will be people attempting to break them, and in any
system that is not trivially simple, its complexity makes it impossible to state definitively
what the weakest part is.
One of the main forms of copyright protection being pushed by the music industry is
Digital Rights Management (DRM). This involves a container that carries encrypted
audio; the rights are distributed within the audio file itself and are enforced by the
software player at decode time. Under this system many of the traditional watermark
attacks are foiled, since altering the file usually renders it useless. However, most DRM
systems require proprietary infrastructure to support the encoding, licence management
and decoding process, whereas watermarks have the advantage of being format
independent, as they are embedded within the essence itself.
Watermarking does not stand alone as the future of audio protection. Essentially it is a
technique inspired by its sibling in the visual domain; the significant overlap between the
two fields means the concept can be transferred into the audio domain with relative ease,
but it is the differences between the two where it falls down.
For watermarking to be taken seriously, its underlying principles need to be revised to
address the evident shortcomings; and with rival processes fulfilling the same goals, one
has to wonder how practical further development really is.
7. Further Development
7.1 Potential Improvements
So far the LSB coding and echo hiding watermarking techniques have been successfully
implemented. As discussed earlier, many more algorithms could potentially be tested,
including variations of those already attempted, such as the echo hiding technique [15].
The techniques currently in use also have room for improvement with regard to their
robustness. With more time, further algorithms with a range of characteristics could
have been developed to establish which is best suited to which situation; from this, it
might even have been possible to suggest improvements to current techniques.
More extensive testing could have revealed more about the particular strengths and
weaknesses of certain watermarks. Extending the number of source sounds tested would
have given a broader picture of which kinds of sound file suit which watermarks.
Although it is impossible to test sounds and combinations of sounds exhaustively, more
tests would have meant more assured results. More parameters could also have been
applied to the attacking techniques, to find the levels at which the existing watermarks
succeed or fail most often.
Specific improvements could also have been made to the algorithms themselves.
Examples of variations on the echo hiding technique can be found in New Echo
Embedding Technique for Robust and Imperceptible Audio Watermarking [26]. One
major drawback of echo hiding in general is its vulnerability to malicious attack, since
the information can be detected by anyone without a secret key: an attacker who knows
the underlying algorithm can exploit this to apply a removal attack. A
possible countermeasure against the easy determination of the delay time is the spreading
of the echo over the time axis. This is accomplished by substituting the Dirac delta
function in the response function with a pseudonoise (PN) sequence. Instead of
calculating the autocorrelation in the cepstrum domain, despreading of the echo is
performed by cross-correlation of the cepstral signal with the PN sequence generated
from a secret key.
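A sketch of such a time-spread echo kernel is given below; the kernel and PN lengths
and the amplitude are illustrative, and the caller must ensure the PN sequence fits within
the kernel after the delay.

    // Builds an echo kernel whose single delayed impulse is replaced by a
    // +/-1 pseudonoise sequence derived from a secret key.
    public class TimeSpreadEcho {
        static float[] kernel(int kernelLength, int delay, int pnLength,
                              float amplitude, long secretKey) {
            float[] kernel = new float[kernelLength];
            kernel[0] = 1f; // the direct (unechoed) path
            java.util.Random pn = new java.util.Random(secretKey); // detector regenerates this
            for (int i = 0; i < pnLength; i++) {
                kernel[delay + i] = amplitude * (pn.nextBoolean() ? 1f : -1f);
            }
            return kernel;
        }
    }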
The lack of usability is a significant flaw in the program. Usability was not a major
consideration for the project, since its purpose was simply to collect information about
watermarking techniques. With a graphical user interface (GUI), the program would
become accessible to other users, making the retrieval of results easier and more
efficient. It could then potentially be used to apply one's own watermarks to one's own
media, using the testing tools to decide which watermark is the right fit; it would also
make implementing watermarks for testing easier.
8. Bibliography
[1] Gillies, James. Cailliau, Robert. How the Web Was Born: The Story of
the World Wide Web (Popular Science). Oxford Paperbacks. 0192862073.
[2] Alderman, John. Sonic Boom: Napster, P2P and the Battle for the Future of
Music. Fourth Estate. 1841155136.
[3] Cox, Ingemar J. Miller, Matthew L. Bloom, Jeffrey A. Digital Watermarking.
Morgan Kaufmann. 1558607145.
[4] Kozaczuk, Wladyslaw. Enigma: How the German Machine Cipher Was Broken
and How It Was Read by the Allies in World War Two. Univ Publications of
America. 0890935475.
[5] Anderson, Ross J. Information Hiding: 1st International Workshop, Cambridge,
U.K., May 30-June 1, 1996: Proceedings (Lecture Notes in Computer Science,
1174). Springer Verlag. 3540619968.
[6] Digital Watermarking World. (http://www.watermarkingworld.org/)
[7] Katzenbeisser, Stefan. Petitcolas, Fabien A P. Information Hiding Techniques for
Steganography and Digital Watermarking. Artech House, 1580530354.
[8] Matlin, Margaret W. Foley, Hugh J. Sensation and Perception. Pearson Allyn &
Bacon. 0205263828.
[9] Petitcolas, Fabien A P. Anderson, Ross J. Kuhn, Markus G. Attacks on Copyright
Marking Systems. (http://www.cl.cam.ac.uk/~fapp2/publications/ih98-
attacks.pdf)
[10] Petitcolas, Fabien A P. Stirmark Benchmark 4.0.
(http://www.petitcolas.net/fabien/watermarking/stirmark/).
[11] Pereira, Shelby. Checkmark Benchmarking.
(http://watermarking.unige.ch/Checkmark/index.html).
[12] Argyriou, Vasilis. Optimark. (http://poseidon.csd.auth.gr/optimark/)
[13] Hsieh, Ming-Shing. Tseng, Din-Chang. Huang, Yong-Huai. Hiding digital
watermarks using multiresolution wavelet transform. IEEE Transactions
on Industrial Electronics, 48(5):875–882, 2001.
[14] Hwang, Min-Shiang. Chang, Chin-Chen. Hwang, Kuo-Feng. A Watermark
Technique Base On One-Way Hash Functions. IEEE Transactions on Consumer
Electronics, Volume 45 Issue 2. 0098-3063.
[15] Morimoto, N. Bender, W. Gruhl, D. Lu, A. Techniques For Data Hiding. IBM
Systems Journal Vol. 35, No. 3&4, 1996 - MIT Media Lab. G321-5608.
[20] Cheng, Samuel. Yu, Heather. Xiong, Zixiang. Enhanced Spread Spectrum
Watermarking of MPEG-2 AAC Audio. (ICASSP). Vol. 4, pp. 3728–3731.
[21] Arnold, Michael. Wolthusen, Stephen D. Schmucker, Martin. Techniques and
Applications of Digital Watermarking and Content Protection. Artech House.
1580531113.
[22] Lindley, Craig. Digital Audio With Java. Prentice Hall PTR. 0130876763.
[23] Press, William H. Flannery, Brian P. Teukolsky, Saul A. Vetterling, William T.
Numerical Recipes in C : The Art of Scientific Computing. Cambridge University
Press. 0521431085.
[24] Herley, C. Why Watermarking Is Nonsense. IEEE Signal Processing Magazine,
pp. 10-11, Sep. 2002.
[25] Gruhl, Daniel. Bender, Walter. Lu, Anthony. Echo Hiding. Information Hiding:
First International Workshop, Vol. 1174 of Lecture Notes in Computer Science,
Cambridge, Springer-Verlag, pp. 295-315.
[26] Oh, Hyen-O. Seok, Jong-Won. Hong, Jin-Woo. Youn, Dae-Hee. New Echo
Embedding Technique for Robust and Imperceptible Audio Watermarking.
International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Orlando, pp. 1341–1344.
[27] Swanson, Mitchell D. Zhu, Bin. Tewfik, Ahmed H. Transparent robust image
watermarking. In 1996 SPIE Conf. on Visual Communications and Image Proc.,
volume III, pages 211–214, 1996.
[28] Tao, Bo. Dickinson, Bradley. Adaptive watermarking in the DCT domain. In
International Conference on Acoustics, Speech, and Signal Processing, ICASSP '97,
April 1997.
[29] Fridrich, J. Image watermarking for tamper detection.
(http://citeseer.nj.nec.com/fridrich98image.html), 1998.
[30] Wang, James Ze and Wiederhold, Gio. Wavemark: Digital image watermarking
using Daubechies' wavelets and error correction coding. In Proceedings of SPIE,
volume 3528, pages 432–439, November 1998.
[31] Hsu, Chiou-Ting and Wu, Ja-Ling. Hidden digital watermarks in images. IEEE
Transactions on Image Processing, 8(1):58–68, 1999.
9. Appendices
9.1 Additional Algorithm Descriptions
9.1.1 Phase Coding
Approaches that embed the watermark into the phase of the original signal exploit the
fact that the human auditory system has a low sensitivity to relative phase changes, as
stated in [21].
This method was presented by Walter Bender et al. in [15] and proposes splitting the
original audio stream into blocks and embedding the whole watermark into the phase
spectrum of the first block, as described in [21].
One disadvantage of the phase coding approach is the low payload that can be achieved:
only the first block is used to embed the watermark. Since the watermark is not
distributed over the entire data set but is localised, it can easily be removed if cropping
is acceptable.
9.1.2 Phase Modulation
Another way of embedding the watermark into the phase is to perform independent
multiband phase modulation, as [21] states. This algorithm exploits inaudible phase
modifications through controlled multiband phase alterations of the original signal.
Both phase embedding approaches use the psychoacoustic features of the human auditory
system with regard to the just noticeable phase changes. They exploit the inaudibility of
phase changes if the time envelope of the original signal is approximately preserved.
Because of the phase alteration, embedding and detection of the watermark is done in the
Fourier domain by processing the audio stream blockwise. While the phase coding
method is embedding the watermark in the phases of the first block, the phase modulation
algorithm performs a long-term multiband phase modulation. Both algorithms are non-
blind watermarking methods, since they require the original signal during the watermark
retrieval, which of course limits their applicability.
9.1.3 Watermarking the compressed bitstream
Several approaches exist to embed the watermark directly into the already compressed
audio bit stream, as described in [21]. The time-consuming decoding, watermark
embedding and re-encoding required by pulse code modulation (PCM) watermarking
techniques are then unnecessary. Furthermore, the retrieval process involves no decoding
procedure, which speeds up watermark retrieval further. Nevertheless, the starting point
for professionally created audio material is always the PCM format. These approaches
change the contents of the
MPEG frame directly. The scaling factor can be viewed as a logarithmic gain factor for
the sample values in order to retrieve the original samples in PCM format. The
embedding of the watermark is done by changing the scaling factors of different frames
according to a special pattern derived from a secret key. A problem of this method is that
some audio streams carry only a few scaling factors per frame. Therefore, the space for
embedding a watermark is reduced. This leads to the problem that multiple watermarks
cannot be embedded, because altering scale factors already used for embedding the first
watermark destroys the quality of the audio data. A second approach in the variation of
the MPEG frame tries to alter the sample values instead of the scaling factors. Embedding
multiple watermarks is also critical in this case. The additional requirement of using the
original track as input for the retrieval process further limits the applicability of this
approach.
Besides working on MP3 bit streams, methods like the one presented by Cheng et al. in
Enhanced Spread Spectrum Watermarking of MPEG-2 AAC Audio [20] embed
watermarks into the advanced audio coding (AAC) compressed bit stream by direct
modification of the quantized coefficients. The watermark bits are embedded by
performing a spread-spectrum modulation of the quantized coefficients. The individual
bits are retrieved by a linear correlation of the PN sequence used during the embedding
with the quantized coefficients of the watermarked bit stream. The coefficients to be
modified are selected by applying a heuristic, which uses only nonzero coefficients in a
predefined frequency range. The amount of distortion applied is fixed, set to one
quantization step.
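To illustrate the embedding step just described, a minimal sketch follows; the method
name, key handling and frequency-range bounds are assumptions made for the purpose
of illustration.

    // Spread-spectrum embedding of one watermark bit into quantized AAC-style
    // coefficients: a key-derived +/-1 PN chip, scaled by the bit's sign, is added
    // to each nonzero coefficient in a predefined index range.
    public class SsEmbedSketch {
        static void embedBit(int[] quantizedCoeffs, int from, int to, boolean bit, long key) {
            java.util.Random pn = new java.util.Random(key);
            int sign = bit ? 1 : -1;
            for (int i = from; i < to; i++) {
                int chip = pn.nextBoolean() ? 1 : -1; // advance the PN sequence at every index
                if (quantizedCoeffs[i] != 0) {        // heuristic: nonzero coefficients only
                    quantizedCoeffs[i] += sign * chip; // fixed distortion of one quantization step
                }
            }
        }
    }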
Methods of directly watermarking the compressed bit stream have in common that they
do not make use of a psychoacoustic model. Both embedding and detection are performed
directly on the compressed bit stream, where the audio stream is processed in frames
according to the formatting of the bit stream in the specific compression algorithm.
Additional information is not necessary if the audio data are synchronized. The main
advantage is the low computational cost. Furthermore, these methods have inherent
robustness against their own compression format, since the watermark is embedded in
the already compressed bit stream. The main disadvantage of these methods is the
missing psychoacoustic counterpart in comparison to the uncompressed audio signal. The
influence on the audio quality of the original track by altering scaling factors, sample
data, or the quantized coefficients can only be estimated. Moreover, the decoding of the
compressed bit stream and a new compression with a shifted audio stream may lead to a
synchronization problem because of the new scaling factors, sample data, and
quantization coefficients of the MPEG frames. Furthermore, the complexity advantage is
lost if the watermarked audio tracks have to be transcoded in another compression format.
9.1.4 Integrating watermark embedding into the compression encoder
Besides directly watermarking the bit stream, other methods extract the information in the
compressed bit stream from the quantization of the audio samples. This enables the
estimation of the masking threshold to shape the watermark noise below this threshold in
order to ensure inaudibility. Integrating the watermark and compression encoders has two
advantages: the quality during watermarking can be controlled, in contrast to the methods
described above, and the speed of embedding is improved compared with running
watermarking and compression as two separate processes. The building blocks consist of
parts of the PCM watermark embedder and of the compression decoder and encoder.
Part of the bit stream decoder is used in order to read the scaling factors and decode the
bit stream and perform the inverse quantization of compressed samples. The information
about the quantization enables the calculation of the masking threshold. The masking
threshold controls the multiplication factors used to weight the spectral lines of the
constructed watermark, applying the masking effects as in any perceptual watermark
encoder. The watermark generation can be the same as for the PCM watermark
embedder. After weighting the spectrum of the watermark noise, the result is added to the
embedder. After weighting the spectrum of the watermark noise, the result is added to the
original spectral lines. The extracted scaling factors from the original frame are used in
order to quantize the marked audio data again and format the bit stream. The final output
is the marked bit stream.
This method makes implicit usage of the psychoacoustic model by approximating the
perceptual information contained in the MPEG frames. Detection can be performed on
the compressed and uncompressed audio data. It is a blind watermarking method, which
distributes the bits over different MPEG frames. Due to the usage of parts of the
compression encoder and decoder, such a mechanism is tied to the special compression
scheme used. For each newly developed compression algorithm, a new integration of the
watermarking embedding procedure becomes necessary.
9.2 Source Code Samples
FFT.java

package math;

import applet.FloatGraph;
import applet.Graph;
import applet.ViewGraph;
import audio.Channels;

/**
 * @author Steven Morgan
 *
 * Calculates the Fourier transform of a set of n real-valued data
 * points. Replaces this data (which is stored in array data[1..n])
 * by the positive frequency half of its complex Fourier transform.
 *
 * The real-valued first and last components of the complex transform
 * are returned as elements data[1] and data[2], respectively. n must
 * be a power of 2. This routine also calculates the inverse
 * transform of a complex data array if it is the transform of real
 * data. (Result in this case must be multiplied by 2/n.)
 */
public class FFT {

    public static final void getFloats(int[] ints, float[] floats) {
        for (int i = 0; i < ints.length; i++) {
            floats[i + 1] = ints[i];
        }
    }

    public static final void getInts(float[] floats, int[] ints) {
        for (int i = 0; i < ints.length; i++) {
            ints[i] = Math.round(floats[i + 1]);
        }
    }

    public static final void fudgeFactor(float[] data, int n) {
        // Fudge factor: multiply result by 2 / n.
        float factor = 2.0f / (float) n;
        for (int i = 1; i <= n; i++)
            data[i] *= factor;
    }

    public static final void four1(float[] data, int nn, int isign) {
        int i, j, m, n, mmax, istep;
        double wtemp, wr, wpr, wpi, wi, theta; // Double precision for the trigonometric recurrences.
        float tempr, tempi;

        n = nn << 1;
        j = 1;
        // This is the bit-reversal section of the routine.
        for (i = 1; i < n; i += 2) {
            if (j > i) {
                // Exchange the two complex numbers.
                float d;
                d = data[j];
                data[j] = data[i];
                data[i] = d;
                d = data[j + 1];
                data[j + 1] = data[i + 1];
                data[i + 1] = d;
            }
            m = nn;
            while (m >= 2 && j > m) {
                j -= m;
                m >>= 1;
            }
            j += m;
        }
        // Here begins the Danielson-Lanczos section of the routine.
        mmax = 2;
        while (n > mmax) { // Outer loop executed log2(nn) times.
            istep = mmax << 1;
            theta = isign * (2.0 * Math.PI / mmax); // Initialize the trigonometric recurrence.
            wtemp = Math.sin(0.5 * theta);
            wpr = -2.0 * wtemp * wtemp;
            wpi = Math.sin(theta);
            wr = 1.0;
            wi = 0.0;
            // Here are the two nested inner loops.
            for (m = 1; m < mmax; m += 2) {
                for (i = m; i <= n; i += istep) {
                    j = i + mmax;
                    // This is the Danielson-Lanczos formula:
                    tempr = (float) (wr * data[j] - wi * data[j + 1]);
                    tempi = (float) (wr * data[j + 1] + wi * data[j]);
                    data[j] = data[i] - tempr;
                    data[j + 1] = data[i + 1] - tempi;
                    data[i] += tempr;
                    data[i + 1] += tempi;
                }
                wr = (wtemp = wr) * wpr - wi * wpi + wr; // Trigonometric recurrence.
                wi = wi * wpr + wtemp * wpi + wi;
            }
            mmax = istep;
        }
    }

    public static final void realft(float data[], int n, int isign) {
        int i, i1, i2, i3, i4, np3;
        float c1 = 0.5f, c2, h1r, h1i, h2r, h2i;
        double wr, wi, wpr, wpi, wtemp, theta; // Double precision for the trigonometric recurrences.

        theta = Math.PI / (double) (n >> 1); // Initialize the recurrence.
        if (isign == 1) {
            c2 = -0.5f;
            four1(data, n >> 1, 1); // The forward transform is here.
        } else {
            c2 = 0.5f; // Otherwise set up for an inverse transform.
            theta = -theta;
        }
        wtemp = Math.sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = Math.sin(theta);
        wr = 1.0 + wpr;
        wi = wpi;
        np3 = n + 3;
        for (i = 2; i <= (n >> 2); i++) { // Case i = 1 done separately below.
            i4 = 1 + (i3 = np3 - (i2 = 1 + (i1 = i + i - 1)));
            h1r = c1 * (data[i1] + data[i3]); // The two separate transforms are separated out of data.
            h1i = c1 * (data[i2] - data[i4]);
            h2r = -c2 * (data[i2] + data[i4]);
            h2i = c2 * (data[i1] - data[i3]);
            // Here they are recombined to form the true transform of the original real data.
            data[i1] = (float) (h1r + wr * h2r - wi * h2i);
            data[i2] = (float) (h1i + wr * h2i + wi * h2r);
            data[i3] = (float) (h1r - wr * h2r + wi * h2i);
            data[i4] = (float) (-h1i + wr * h2i + wi * h2r);
            wr = (wtemp = wr) * wpr - wi * wpi + wr; // The recurrence.
            wi = wi * wpr + wtemp * wpi + wi;
        }
        if (isign == 1) {
            // Squeeze the first and last data together to get them all within the original array.
            data[1] = (h1r = data[1]) + data[2];
            data[2] = h1r - data[2];
        } else {
            data[1] = c1 * ((h1r = data[1]) + data[2]);
            data[2] = c1 * (h1r - data[2]);
            four1(data, n >> 1, -1); // This is the inverse transform for the case isign = -1.
        }
    }

    public static final void cosft1(float y[], int n) {
        int j, n2;
        float sum, y1, y2;
        double theta, wi = 0.0, wpi, wpr, wr = 1.0, wtemp; // Double precision for the trigonometric recurrences.

        theta = Math.PI / n; // Initialize the recurrence.
        wtemp = Math.sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = Math.sin(theta);
        sum = 0.5f * (y[1] - y[n + 1]);
        y[1] = 0.5f * (y[1] + y[n + 1]);
        n2 = n + 2;
        for (j = 2; j <= (n >> 1); j++) { // j = n/2 + 1 unnecessary since y[n/2+1] is unchanged.
            wr = (wtemp = wr) * wpr - wi * wpi + wr; // Carry out the recurrence.
            wi = wi * wpr + wtemp * wpi + wi;
            y1 = 0.5f * (y[j] + y[n2 - j]); // Calculate the auxiliary function.
            y2 = (y[j] - y[n2 - j]);
            y[j] = (float) (y1 - wi * y2); // The values for j and N - j are related.
            y[n2 - j] = (float) (y1 + wi * y2);
            sum += wr * y2; // Carry along this sum for later use in unfolding the transform.
        }
        realft(y, n, 1); // Calculate the transform of the auxiliary function.
        y[n + 1] = y[2];
        y[2] = sum; // sum is the value of F1 in equation (12.3.21).
        for (j = 4; j <= n; j += 2) {
            sum += y[j]; // Equation (12.3.20).
            y[j] = sum;
        }
    }

    /*
     * Calculates the sine transform of a set of n real-valued data points
     * stored in array y[1..n]. The number n must be a power of 2. On exit
     * y is replaced by its transform. This program, without changes, also
     * calculates the inverse sine transform, but in this case the output
     * array should be multiplied by 2/n.
     */
    public static final void sinft(float y[], int n) {
        int j, n2 = n + 2;
        float sum, y1, y2;
        double theta, wi = 0.0, wr = 1.0, wpi, wpr, wtemp; // Double precision in the trigonometric recurrences.

        theta = Math.PI / (double) n; // Initialize the recurrence.
        wtemp = Math.sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = Math.sin(theta);
        y[1] = 0.0f;
        for (j = 2; j <= (n >> 1) + 1; j++) {
            wr = (wtemp = wr) * wpr - wi * wpi + wr; // Calculate the sine for the auxiliary array.
            wi = wi * wpr + wtemp * wpi + wi; // The cosine is needed to continue the recurrence.
            y1 = (float) (wi * (y[j] + y[n2 - j])); // Construct the auxiliary array.
            y2 = 0.5f * (y[j] - y[n2 - j]);
            y[j] = y1 + y2; // Terms j and N - j are related.
            y[n2 - j] = y1 - y2;
        }
        realft(y, n, 1); // Transform the auxiliary array.
        y[1] *= 0.5; // Initialize the sum used for odd terms below.
        sum = y[2] = 0.0f;
        for (j = 1; j <= n - 1; j += 2) {
            sum += y[j];
            y[j] = y[j + 1]; // Even terms determined directly.
            y[j + 1] = sum; // Odd terms determined by this running sum.
        }
    }

    public static void printFloats(String name, float[] floats) {
        System.out.print(name + "\t:= ");
        printFloats(floats);
    }

    public static void printFloats(float[] floats) {
        System.out.print("[");
        for (int i = 0; i < floats.length; i++) {
            if (i > 0)
                System.out.print(", ");
            System.out.print(Math.round(floats[i] * 1000) / 1000.0f);
        }
        System.out.println("]");
    }

    // Wrappers that shift between 0-based arrays and realft's 1-based packing.
    public static void fastFourierTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = floats.length - 1; i >= 0; i--) {
            temp[i + 1] = floats[i];
        }
        realft(temp, floats.length, 1);
        for (int i = floats.length - 1; i >= 0; i--) {
            floats[i] = temp[i + 1];
        }
    }

    public static void inverseFastFourierTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = floats.length - 1; i >= 0; i--) {
            temp[i + 1] = floats[i];
        }
        realft(temp, floats.length, -1);
        float fudgeFactor = 2.0f / (float) floats.length;
        for (int i = floats.length - 1; i >= 0; i--) {
            floats[i] = temp[i + 1] * fudgeFactor;
        }
    }

    public static void cosineTransform(float[] floats) {
        float[] temp = new float[floats.length + 2];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        cosft1(temp, floats.length);
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1];
        }
    }

    public static void inverseCosineTransform(float[] floats) {
        float[] temp = new float[floats.length + 2];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        cosft1(temp, floats.length);
        float fudgeFactor = 2.0f / (float) floats.length;
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1] * fudgeFactor;
        }
    }

    public static void sineTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        sinft(temp, floats.length);
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1];
        }
    }

    public static void inverseSineTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        sinft(temp, floats.length);
        float fudgeFactor = 2.0f / (float) floats.length;
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1] * fudgeFactor;
        }
    }

    public static void testRealft1() {
        float[] floats = new float[8 + 1];
        floats[8] = 1;
        printFloats(floats);
        realft(floats, 8, -1);
        fudgeFactor(floats, 8);
        printFloats(floats);
        realft(floats, 8, 1);
        printFloats(floats);
    }

    public static void testRealft2() {
        float[] theFloats = new float[1 << 10];
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = i;
        }
        printFloats(theFloats);
        fastFourierTransform(theFloats);
        //printFloats(theFloats);
        inverseFastFourierTransform(theFloats);
        printFloats(theFloats);
    }

    public static void testRealft3() {
        float[] theFloats = new float[1 << 3];
        int v = 1;
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = v;
            v *= -1;
        }
        printFloats(theFloats);
        fastFourierTransform(theFloats);
        printFloats(theFloats);
        theFloats[0] = 0;
        theFloats[1] = 0;
        theFloats[theFloats.length - 1] = theFloats.length;
        printFloats(theFloats);
        inverseFastFourierTransform(theFloats);
        printFloats(theFloats);
    }

    public static void testRealfft4() {
        float[] theFloats = new float[1 << 3];
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = i + 1;
        }
        //Cepstrum.randomize(theFloats);
        printFloats(theFloats);
        float[] response = new float[theFloats.length];
        response[5] = 1;
        printFloats(response);
        fastFourierTransform(theFloats);
        printFloats(theFloats);
        fastFourierTransform(response);
        printFloats(response);
        Channels.multiplyComplex2(theFloats, response);
        //Channels.multiply(theFloats, response);
        printFloats(theFloats);
        inverseFastFourierTransform(theFloats);
        printFloats(theFloats);
    }

    public static void main(String[] args) {
        float[] theFloats = new float[1 << 3];
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = i + 1;
        }
        FFT.printFloats("floats", theFloats);
        float[] fft;
        fft = (float[]) theFloats.clone();
        fastFourierTransform(fft);
        FFT.printFloats("fft", fft);
        inverseFastFourierTransform(fft);
        FFT.printFloats("inverse", fft);
        fft = (float[]) theFloats.clone();
        Graph graph = new FloatGraph(fft);
        graph.setTitle("fft");
        ViewGraph view = new ViewGraph(graph);
        view.setVisible(true);
    }
}
Cepstrum.java

package math;

import audio.Channels;

/**
 * @author Steven Morgan
 *
 * Utility class that provides cepstrum-related functions.
 */
public class Cepstrum {

    /**
     * Computes the complex log of this list of complex numbers in the
     * format as produced by the math.FFT.realft() method.
     *
     * This method is experimental: there is a problem with phase
     * wrapping - i.e. the imaginary part of the complex log. This
     * problem is due to the multivalued result of a complex log.
     *
     * @param dest - the list of complex numbers
     */
    public static final void log(float[] dest) {
        dest[0] = (float) Math.log(Math.abs(dest[0]));
        dest[1] = (float) Math.log(Math.abs(dest[1]));
        for (int i = 2; i < dest.length; i += 2) {
            float r = (float) Math.sqrt(dest[i] * dest[i] + dest[i + 1] * dest[i + 1]);
            // 0 <= arg < Pi
            float arg = (float) Math.acos(dest[i] / r);
            if (dest[i + 1] < 0.0) {
                //arg = (float) (2.0 * Math.PI - arg);
                arg = -arg;
            }
            //if (arg < -2.7) arg += 2.0 * Math.PI;
            //arg += 8 * Math.PI;
            dest[i] = (float) Math.log(r);
            dest[i + 1] = arg;
        }

        /*
         * For uniqueness, it is necessary that the phase be "unwrapped",
         * which eliminates the jumps as the phase passes between -PI and
         * PI. This careful definition causes the complex cepstrum of a
         * real sequence to also be a real sequence. Following is a
         * utility function that will perform the phase unwrapping for
         * discrete numeric data. This technique may fail for extremely
         * oscillatory functions, and is not appropriate for very noisy
         * data.
         */
        float min = dest[3];
        float max = min;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 2; i < dest.length - 2; i += 2) {
                float arg = dest[i + 1];
                float next = dest[(i + 2) + 1];
                /*
                while (next - arg > Math.PI) {
                    next -= 2 * Math.PI;
                    changed = true;
                }
                */
                while (arg - next > Math.PI) {
                    next += 2 * Math.PI;
                    changed = true;
                }
                dest[(i + 2) + 1] = next;
                min = Math.min(min, next);
                max = Math.max(max, next);
            }
            // Incomplete shift handling left over from development.
            float shift = 0.0f;
            if (min < -Math.PI) {
            }
        }
    }

    /**
     * Computes the log of the modulus of this list of complex numbers
     * in the format as produced by the math.FFT.realft() method.
     *
     * @param dest - the list of complex numbers to be modified
     */
    public static final void logModulus(float[] dest) {
        dest[0] = (float) Math.log(Math.abs(dest[0]));
        dest[1] = (float) Math.log(Math.abs(dest[1]));
        for (int i = 2; i < dest.length; i += 2) {
            float r = (float) Math.sqrt(dest[i] * dest[i] + dest[i + 1] * dest[i + 1]);
            dest[i] = (float) Math.log(r);
            dest[i + 1] = 0;
        }
    }

    /**
     * Computes the log of the list of real numbers provided.
     *
     * @param floats - the list of floats to be modified
     */
    public static final void realLog(float[] floats) {
        for (int i = 0; i < floats.length; i++) {
            floats[i] = (float) Math.log(Math.abs(floats[i]));
        }
    }

    /**
     * Computes the complex cepstrum of the list of real-valued data
     * - uses the experimental complex log method above.
     *
     * @param floats - the list of real-valued data
     * @return the complex cepstrum of the real input data
     */
    public static final float[] complexCepstrum(float[] floats) {
        float[] cepstrum = (float[]) floats.clone();
        FFT.fastFourierTransform(cepstrum);
        log(cepstrum);
        FFT.inverseFastFourierTransform(cepstrum);
        return cepstrum;
    }

    /**
     * Computes the cepstrum of the list of real-valued data
     * - uses the logModulus method above to compute the log.
     *
     * @param floats - the list of real-valued data
     * @return the real cepstrum of the real input data
     */
    public static final float[] realCepstrum(float[] floats) {
        float[] cepstrum = (float[]) floats.clone();
        FFT.fastFourierTransform(cepstrum);
        logModulus(cepstrum);
        FFT.inverseFastFourierTransform(cepstrum);
        return cepstrum;
    }

    /**
     * Test helper method - fills the given list of floats with random
     * values between -1000 and 1000.
     *
     * @param floats - the list to be modified
     */
    public static void randomize(float[] floats) {
        for (int i = 0; i < floats.length; i++) {
            floats[i] = (float) (Math.random() - 0.5) * 2000;
        }
    }

    /**
     * Test method - verifies the cepstrum property for the given float
     * lists: the cepstrum of the convolution of two functions is the
     * same as the sum of the cepstrum of each.
     *
     * This method is used to test the experimental complex log() method.
     *
     * @param theC - the first input list of real numbers
     * @param theH - the second input list of real numbers
     * @param debug - flag set for debug output
     * @return the number of differences between the two result lists
     */
    public static final int testCepstrumLogProperty(float[] theC, float[] theH, boolean debug) {
        float[] c = (float[]) theC.clone();
        float[] h = (float[]) theH.clone();
        float[] cepC = complexCepstrum(c);
        float[] cepH = complexCepstrum(h);
        FFT.fastFourierTransform(c);
        FFT.fastFourierTransform(h);
        if (debug) {
            FFT.printFloats("F[c]", c);
            FFT.printFloats("F[h]", h);
        }
        float[] ch = (float[]) c.clone();
        Channels.multiplyComplex2(ch, h);
        FFT.inverseFastFourierTransform(ch);
        if (debug)
            FFT.printFloats("ch", ch);
        float[] cepCcepH = (float[]) cepC.clone();
        Channels.addComplex(cepCcepH, cepH);
        float[] cepCH = complexCepstrum(ch);
        if (debug) {
            FFT.printFloats("cC+cH", cepCcepH);
            FFT.printFloats("cepCH", cepCH);
        }
        int diffcount = 0;
        for (int i = 0; i < cepCcepH.length; i++) {
            if (Math.abs(cepCcepH[i] - cepCH[i]) >= 0.00001) {
                diffcount++;
            }
        }
        return diffcount;
    }

    /**
     * Test method to repeatedly test the cepstrum log property for
     * random input lists.
     */
    public static void testLogProperty() {
        float[] c = new float[1 << 3];
        float[] h = new float[c.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = ((17 * (i + 1)) % 8);
            h[i] = -4 + i * i;
        }
        boolean debug = false;
        int matchCount = 0, runCount = 100;
        for (int i = 0; i < runCount; i++) {
            randomize(c);
            randomize(h);
            if (debug) {
                FFT.printFloats("c", c);
                FFT.printFloats("h", h);
            }
            int diffCount = testCepstrumLogProperty(c, h, false);
            if (diffCount == 0) {
                matchCount++;
            }
        }
        System.out.println("matched " + matchCount + " out of " + runCount);
    }

    public static void main(String[] args) {
        testLogProperty();
    }
}
WatermarkedAudioInputStream.java

package player;

import java.io.*;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import watermark.*;

/**
 * @author Steven Morgan
 *
 * AudioInputStream class which repeatedly applies a given Watermark
 * to fixed-size windows from the source AudioInputStream.
 */
public class WatermarkedAudioInputStream extends AudioInputStream {

    private AudioInputStream ais;
    int currentPosition;
    int bytesInWindow;
    private byte[] window;
    int windowSize;
    Watermark watermark;

    public WatermarkedAudioInputStream(AudioInputStream ais, int windowSize) {
        super(ais, ais.getFormat(), ais.getFrameLength());
        this.ais = ais;
        this.windowSize = windowSize;
        window = new byte[windowSize];
        currentPosition = 0;
        bytesInWindow = 0;
    }

    public WatermarkedAudioInputStream(AudioInputStream ais, int windowSize, Watermark watermark) {
        this(ais, windowSize);
        setWatermark(watermark);
    }

    public Watermark getWatermark() {
        return watermark;
    }

    public void setWatermark(Watermark watermark) {
        this.watermark = watermark;
        watermark.setAudioFormat(getFormat());
    }

    public int available() throws IOException {
        return bytesInWindow - currentPosition + ais.available();
    }

    public int read(byte[] buf, int offset, int length) throws IOException {
        int bytesRead = 0;
        while (bytesRead < length) {
            if (currentPosition >= bytesInWindow) {
                currentPosition = 0;
                bytesInWindow = ais.read(window, 0, windowSize);
                if (bytesInWindow <= 0) {
                    break;
                }
                // Zero-pad a partially filled final window before watermarking.
                for (int i = bytesInWindow; i < windowSize; i++) {
                    window[i] = 0;
                }
                watermark.apply(window, 0, window.length);
            }
            int bytesToRead = Math.min(bytesInWindow - currentPosition, length - bytesRead);
            // Copy relative to the caller's offset (the original omitted it).
            System.arraycopy(window, currentPosition, buf, offset + bytesRead, bytesToRead);
            bytesRead += bytesToRead;
            currentPosition += bytesToRead;
        }
        return bytesRead;
    }

    public int read(byte[] buf) throws IOException {
        return read(buf, 0, buf.length);
    }

    public int read() throws IOException {
        byte[] buf = new byte[1];
        read(buf);
        return buf[0];
    }

    public void close() throws IOException {
        ais.close();
    }

    public synchronized void mark(int arg0) {
        throw new UnsupportedOperationException();
    }

    public boolean markSupported() {
        return false;
    }

    public synchronized void reset() throws IOException {
        throw new UnsupportedOperationException();
    }

    public long skip(long bytes) throws IOException {
        throw new UnsupportedOperationException();
    }

    public AudioFormat getFormat() {
        return ais.getFormat();
    }

    public long getFrameLength() {
        return ais.getFrameLength();
    }
}
App.java

package applet;

import java.applet.Applet;
import java.awt.*;
import java.awt.event.*;
import java.beans.*;
import java.io.*;
import javax.sound.sampled.*;
import math.FFT;
import audio.Band;
import audio.Channels;

/**
 * @author Steven Morgan
 *
 * Experimental class used to visualise and test the various methods
 * in the other packages.
 */
public class App extends Applet implements ActionListener, AdjustmentListener, PropertyChangeListener {

    public static final int MPEG_LAYER1_BLOCK_SIZE = 1 << 9;
    public static final int MPEG_LAYER2_BLOCK_SIZE = 1 << 10;
    public static final int BLOCK_SIZE = MPEG_LAYER1_BLOCK_SIZE;

    AudioFormat audioFormat;
    Visualization vis;
    Label field;
    Graph currentSubbandGraph = null;
    float[] freq = new float[BLOCK_SIZE];
    float[] allSubbands = null;

    public void setFreq(int frequency) {
        for (int i = 0; i < freq.length; i++) {
            freq[i] = 0;
        }
        freq[frequency] = freq.length;
        FFT.inverseFastFourierTransform(freq);
    }

    public void setup(Visualization vis) {
        String filename = "I:\\Original Compositions\\Drum Samples\\808\\Open Hi-Hat\\TR-808\\OH\\OH75.WAV";
        try {
            File file = new File(filename);
            AudioInputStream ais = AudioSystem.getAudioInputStream(file);
            audioFormat = ais.getFormat();
            byte[] buf = new byte[BLOCK_SIZE * audioFormat.getFrameSize()];
            ais.read(buf, 0, buf.length);
            int[] channel = Channels.getChannel(buf, 0, audioFormat);
            ais.close();
            float[] floats = Channels.getFloats(channel);
            float[] hanning = Channels.hanning(floats.length);

            FloatGraph source = new FloatGraph(
                    (float[]) floats.clone(),
                    Channels.getMinValue(audioFormat.getSampleSizeInBits()),
                    Channels.getMaxValue(audioFormat.getSampleSizeInBits()));
            source.setColor(Color.black);
            source.setTitle("s(t)");
            vis.addGraph(source);

            // Apply a Hanning window to the source block.
            for (int i = 0; i < floats.length; i++) {
                floats[i] *= hanning[i];
            }
            FloatGraph graph = new FloatGraph(hanning);
            graph.setColor(Color.blue);
            graph.setTitle("H(t)");
            //vis.addGraph(graph);

            FloatGraph hanningGraph = new FloatGraph(floats);
            hanningGraph.setColor(Color.red);
            hanningGraph.setTitle("H(t) * s(t)");
            //vis.addGraph(hanningGraph);

            float[] fft = (float[]) floats.clone();
            FFT.fastFourierTransform(fft);
            FloatGraph fftGraph = new FloatGraph(fft);
            fftGraph.setColor(Color.green);
            fftGraph.setTitle("fft(H(t) * s(t))");
            System.out.println();

            float[] fftInverse = (float[]) fft.clone();
            FFT.inverseFastFourierTransform(fftInverse);
            graph = new FloatGraph(fftInverse);
            graph.setColor(Color.pink);
            graph.setScale(hanningGraph);
            graph.setTitle("fft-1(fft(H(t) * s(t)))");
            //addGraph(graph);
            //vis.addGraph(fftGraph);

            // Power density spectrum in decibels.
            float[] x = (float[]) fft.clone();
            for (int i = 0; i < x.length; i++) {
                x[i] = (float) (10.0 * Math.log(x[i] * x[i]) / Math.log(10.0));
            }
            FloatGraph xGraph = new FloatGraph(x);
            xGraph.setTitle("X(k)");
            xGraph.setColor(Color.blue);
            vis.addGraph(xGraph);
            allSubbands = x;

            Graph bandGraph = new Graph((int) (audioFormat.getSampleRate() / 2)) {
                {
                    minValue = 0;
                    maxValue = 24;
                }

                public float getValue(int x) {
                    return (int) Band.getSubBand(x);
                }
            };
            bandGraph.setTitle("Sub Band");
            bandGraph.setColor(new Color(0, 128, 255));
            vis.addGraph(bandGraph);
            //vis.graphs.clear();

            setFreq(0);
            Graph g = new FloatGraph(freq, -2, 2);
            g.setColor(Color.black);
            g.setTitle("freq");
            //vis.addGraph(g);
        } catch (UnsupportedAudioFileException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void init() {
        super.init();
        LayoutManager layout = new FlowLayout();
        this.setLayout(layout);
        vis = new Visualization();
        setup(vis);
        vis.setSize(640, 480);
        {
            Scrollbar scrollbar = new Scrollbar() {
                public Dimension getPreferredSize() {
                    return new Dimension(20, 200);
                }
            };
            scrollbar.setName("frequency");
            scrollbar.setValues(0, 1, 0, BLOCK_SIZE);
            scrollbar.setSize(20, 480);
            scrollbar.addAdjustmentListener(this);
            this.add(scrollbar);
            field = new Label("0");
            this.add(field);
        }
        {
            Scrollbar scrollbar = new Scrollbar() {
                public Dimension getPreferredSize() {
                    return new Dimension(20, 200);
                }
            };
            scrollbar.setName("subband");
            scrollbar.setValues(0, 1, 0, 25);
            scrollbar.setSize(20, 480);
            scrollbar.addAdjustmentListener(this);
            this.add(scrollbar);
        }
        this.add(vis);
        Button layoutBut = new Button("Layout");
        layoutBut.setActionCommand("layout");
        layoutBut.addActionListener(this);
        this.add(layoutBut);
        this.setSize(900, 480);
        this.doLayout();
    }

    public void actionPerformed(ActionEvent event) {
        System.out.println(event.getActionCommand());
        if ("layout".equals(event.getActionCommand())) {
            doLayout();
            System.out.println("size:" + vis.getWidth() + ", " + vis.getHeight());
        }
    }

    public void adjustmentValueChanged(AdjustmentEvent event) {
        System.out.println("adjustmentValueChanged(" + event.getValue() + ")");
        String componentName = ((Component) event.getSource()).getName();
        System.out.println(componentName);
        if ("frequency".equals(componentName)) {
            setFreq(event.getValue());
            field.setText("" + event.getValue());
            vis.repaint();
        } else if ("subband".equals(componentName)) {
            vis.removeGraph(currentSubbandGraph);
            currentSubbandGraph = new FloatGraph(
                    Band.getSubBand(allSubbands, event.getValue(), (int) audioFormat.getSampleRate()));
            currentSubbandGraph.setColor(Color.red);
            vis.addGraph(currentSubbandGraph);
            vis.repaint();
        }
    }

    public void propertyChange(PropertyChangeEvent event) {
        System.out.println("propertyChange(" + event.getPropertyName() + ")");
        if ("frequency".equals(event.getPropertyName())) {
            System.out.println(event);
            vis.repaint();
        }
    }
}
Visualization.java

package applet;

import java.awt.Component;
import java.awt.Dimension;
import java.awt.Graphics;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * @author Steven Morgan
 *
 * An AWT Component which displays a plot of a number of graphs.
 * The graphs are drawn overlaid on top of each other.
 */
public class Visualization extends Component {

    public List graphs;

    public Visualization() {
        graphs = new ArrayList();
    }

    public void addGraph(Graph graph) {
        graphs.add(graph);
    }

    public boolean removeGraph(Graph graph) {
        return graphs.remove(graph);
    }

    public synchronized void paint(Graphics g) {
        int width = getWidth();
        int height = getHeight();
        Iterator it = graphs.iterator();
        while (it.hasNext()) {
            Graph graph = (Graph) it.next();
            /*
            System.out.println(graph.getTitle() + ": range["
                + graph.getMinValue() + ", " + graph.getMaxValue() + "]");
            */
            float minHeight = graph.getMinValue();
            float maxHeight = graph.getMaxValue();
            g.setColor(graph.getColor());
            // Each pixel column summarises the range of samples it covers
            for (int t = 0; t < width; t++) {
                int sampleCount = 0;
                float minSample = maxHeight;
                float maxSample = minHeight;
                float averageSample = 0.0f;
                for (int idx = (t * graph.size()) / width;
                         idx <= ((t + 1) * graph.size()) / width;
                         idx++) {
                    float sample = graph.getValue(Math.min(idx, graph.size() - 1));
                    averageSample += sample;
                    minSample = Math.min(minSample, sample);
                    maxSample = Math.max(maxSample, sample);
                    sampleCount++;
                }
                averageSample += sampleCount / 2;
                averageSample /= sampleCount;
                // Map sample values to pixel coordinates (the y axis is inverted);
                // the average is computed but only the min/max envelope is drawn
                minSample = (float) height - 1.0f
                    - ((minSample - minHeight) / (maxHeight - minHeight))
                        * (float) height;
                maxSample = (float) height - 1.0f
                    - ((maxSample - minHeight) / (maxHeight - minHeight))
                        * (float) height;
                averageSample = (float) height - 1.0f
                    - (averageSample - minHeight) * (float) height
                        / (maxHeight - minHeight);
                g.fillRect(t,
                           Math.max(Math.round(maxSample), 0),
                           1,
                           Math.round(minSample - maxSample + 1.0f));
                /*
                g.fillOval(t, Math.round(maxSample + 0.5f), 5, 5);
                g.fillOval(t,
                           Math.round(maxSample + 0.5f)
                               + Math.round(minSample - maxSample),
                           5, 5);
                */
            }
        }
    }

    public Dimension getMinimumSize() {
        return new Dimension(200, 200);
    }

    public Dimension getPreferredSize() {
        return new Dimension(800, 400);
    }
}
Autocorrelation.java

package math;

/**
 * @author Steven Morgan
 *
 * Utility class that provides autocorrelation related functions.
 */
public class Autocorrelation {

    /**
     * Packs the real components of a list of complex numbers into the first
     * half of the array and mirrors them into the second half. The input
     * array is in the format produced by the math.FFT.realft() method.
     *
     * @param floats - source and destination float array
     */
    static final void pack(float[] floats) {
        float mid = floats[1];
        for (int i = 1; i < floats.length / 2; i++) {
            floats[i] = floats[i << 1];
        }
        floats[floats.length / 2] = mid;
        for (int i = 1; i < floats.length / 2; i++) {
            floats[floats.length - i] = floats[i];
        }
    }

    public static final void sqrt(float[] floats) {
        for (int i = 0; i < floats.length; i++) {
            floats[i] = (float) Math.sqrt(floats[i]);
        }
    }

    /**
     * Calculates the real valued modulus squares of a list of complex
     * numbers in the format produced by the math.FFT.realft() method.
     *
     * @see math.FFT#realft
     *
     * @param floats - source and destination float array
     */
    public static final void modulusSqr(float[] floats) {
        floats[0] *= floats[0];
        floats[1] *= floats[1];
        for (int i = 2; i < floats.length; i += 2) {
            float r = floats[i] * floats[i] + floats[i + 1] * floats[i + 1];
            floats[i] = r;
            floats[i + 1] = 0;
        }
        pack(floats);
    }

    /**
     * Performs the autocorrelation function on the supplied list of floats,
     * following the Wiener-Khinchin theorem: transform, take the power
     * spectrum, and transform again. Since the power spectrum is real and
     * even, a second forward transform yields the autocorrelation up to a
     * constant scale factor.
     *
     * @param floats - list of floats in the time domain
     */
    public static void autocorrelation(float[] floats) {
        FFT.fastFourierTransform(floats);
        modulusSqr(floats);
        FFT.fastFourierTransform(floats);
    }

    public static float minPositive = 99999999;
    public static float maxNegative = 0;

    public static void main(String[] args) {
    }
}
Player.java

package player;

import java.io.*;
import javax.sound.sampled.*;

/**
 * @author Steven Morgan
 *
 * Utility class to feed an AudioInputStream to a SourceDataLine.
 * This can be used to listen to an AudioInputStream if the
 * AudioInputStream can provide the data in realtime.
 */
public class Player {

    private InputStream ais;
    private SourceDataLine source;

    public Player(InputStream ais, SourceDataLine source) {
        this.ais = ais;
        this.source = source;
    }

    /**
     * Sends the AudioInputStream data to the SourceDataLine.
     * This method will only return once the AudioInputStream has
     * been completely consumed by the SourceDataLine.
     *
     * @throws LineUnavailableException
     * @throws IOException
     */
    public void play() throws LineUnavailableException, IOException {
        int bytesRead;
        byte[] buf = new byte[10240];
        source.open();
        source.start();
        do {
            bytesRead = ais.read(buf, 0, buf.length);
            if (bytesRead > 0) {
                source.write(buf, 0, bytesRead);
            }
        } while (bytesRead > 0);
        source.drain();
        source.stop();
        source.close();
    }
}
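For illustration, a hypothetical invocation of Player might look as follows. This demo class is not part of the project sources; the file name is an assumption and exceptions are simply propagated.

PlayerDemo.java (hypothetical)

package player;

import java.io.File;
import javax.sound.sampled.*;

// Hypothetical usage sketch for Player; "sample.wav" is an assumed file name
public class PlayerDemo {
    public static void main(String[] args) throws Exception {
        AudioInputStream ais =
            AudioSystem.getAudioInputStream(new File("sample.wav"));
        // Request a line capable of playing the stream's format
        DataLine.Info info =
            new DataLine.Info(SourceDataLine.class, ais.getFormat());
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        new Player(ais, line).play();
    }
}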
AudioSystemInfo.java

package test;

import javax.sound.sampled.*;

/**
 * @author Steven Morgan
 */
public class AudioSystemInfo {

    public static final void main(String[] args) {
        System.out.println("AudioFileTypes");
        AudioFileFormat.Type[] list = AudioSystem.getAudioFileTypes();
        for (int i = 0; i < list.length; i++) {
            System.out.println(list[i].toString());
        }

        System.out.println("Mixers");
        Mixer.Info[] mixerInfoList = AudioSystem.getMixerInfo();
        for (int i = 0; i < mixerInfoList.length; i++) {
            Mixer.Info info = mixerInfoList[i];
            System.out.println(info.toString());
            Mixer mixer = AudioSystem.getMixer(info);
            Line[] sourceLines = mixer.getSourceLines();
            System.out.println("  Found " + sourceLines.length + " SourceLines");
            for (int n = 0; n < sourceLines.length; n++) {
                System.out.println("  SourceLineInfo[" + n + "]: "
                    + sourceLines[n].getLineInfo().toString());
            }
        }

        Line.Info[] lines = {
            Port.Info.COMPACT_DISC,
            Port.Info.HEADPHONE,
            Port.Info.LINE_IN,
            Port.Info.LINE_OUT,
            Port.Info.MICROPHONE,
            Port.Info.SPEAKER
        };
        System.out.println("Ports");
        for (int idx = 0; idx < lines.length; idx++) {
            Line.Info[] line = AudioSystem.getSourceLineInfo(lines[idx]);
            for (int i = 0; i < line.length; i++) {
                System.out.println("SourceLine: " + line[i].toString());
            }
            line = AudioSystem.getTargetLineInfo(lines[idx]);
            for (int i = 0; i < line.length; i++) {
                System.out.println("TargetLine: " + line[i].toString());
            }
        }
        BugFix.apply();
    }
}
Graph.java

package applet;

import java.awt.Color;

/**
 * @author Steven Morgan
 *
 * Abstract class to represent the graph of a function.
 */
public abstract class Graph {

    protected float minValue, maxValue;
    private String title;
    private Color color = Color.BLACK;
    protected int theSize;

    public Graph() {
    }

    public Graph(int size) {
        this.theSize = size;
    }

    /**
     * @return the graph's title
     */
    public String getTitle() {
        return title;
    }

    /**
     * Sets the graph's title.
     *
     * @param title
     */
    public void setTitle(String title) {
        this.title = title;
    }

    /**
     * @return the color used to draw the graph
     */
    public Color getColor() {
        return color;
    }

    /**
     * Sets the color with which to draw this graph.
     *
     * @param color - the color of the graph
     */
    public void setColor(Color color) {
        this.color = color;
    }

    /**
     * @return the minimum value on the Y axis
     */
    public float getMinValue() {
        return minValue;
    }

    /**
     * @return the maximum value on the Y axis
     */
    public float getMaxValue() {
        return maxValue;
    }

    public void setScale(FloatGraph graph) {
        this.minValue = graph.getMinValue();
        this.maxValue = graph.getMaxValue();
    }

    public abstract float getValue(int x);

    public final int size() {
        return theSize;
    }
}
Tone.java

package audio;

/**
 * @author Steven Morgan
 *
 * Psycho-acoustic computation functions.
 */
public class Tone {

    /**
     * Extracts a list of maxima points from a power density spectrum.
     * A maximum is marked as tonal if it exceeds its neighbours within
     * the examination range by at least 7 dB.
     *
     * @param floats
     * @return the list of maxima found
     */
    public static Maxima[] getMaxima(float[] floats) {
        int maximaCount = 0;
        Maxima[] list = null;
        // First pass: count the local maxima so the result array can be sized
        for (int idx = 1; idx < floats.length - 1; idx++) {
            if (floats[idx] > floats[idx - 1] && floats[idx] >= floats[idx + 1]) {
                maximaCount++;
            }
        }
        list = new Maxima[maximaCount];
        maximaCount = 0;
        // Second pass: record each maximum and test it for tonality
        for (int idx = 1; idx < floats.length - 1; idx++) {
            if (floats[idx] > floats[idx - 1] && floats[idx] >= floats[idx + 1]) {
                Maxima maxima = new Maxima();
                maxima.frequencyIndex = idx;
                maxima.tonal = true;
                // The examination range widens with increasing frequency
                int range = 2;
                if (idx >= floats.length / 4 - 1)
                    range = 3;
                if (idx >= floats.length / 2)
                    range = 6;
                for (int i = 2; i <= range; i++) {
                    if ((idx - i >= 0)
                            && (floats[idx] - floats[idx - i] < 7)) {
                        maxima.tonal = false;
                        break;
                    }
                    if ((idx + i < floats.length)
                            && (floats[idx] - floats[idx + i] < 7)) {
                        maxima.tonal = false;
                        break;
                    }
                }
                list[maximaCount++] = maxima;
            }
        }
        return list;
    }
}
BugFix.java

package test;

/**
 * @author Steven Morgan
 *
 * Workaround for a Java Sound problem: the "Java Sound event dispatcher"
 * thread can keep the JVM alive after playback has finished, so it is
 * located by name and interrupted to allow a clean exit.
 */
public class BugFix {

    public static final void apply() {
        // Size the array generously; Thread.enumerate() silently ignores
        // any threads that do not fit
        Thread[] ts = new Thread[Thread.activeCount() * 2];
        int threadCount = Thread.enumerate(ts);
        Thread javaSoundEventDispatcherThread = null;
        for (int i = 0; i < threadCount; i++) {
            if ("Java Sound event dispatcher".equals(ts[i].getName())) {
                javaSoundEventDispatcherThread = ts[i];
            }
        }
        if (javaSoundEventDispatcherThread != null) {
            System.out.println(
                "BugFix: Interrupting 'Java Sound event dispatcher' Thread");
            javaSoundEventDispatcherThread.interrupt();
        }
    }
}
9.2 Project Proposal
Understanding The Effectiveness Of Current Digital Watermarking Techniques
(sound or picture)
Author : Steven Morgan
Supervisor : Professor John P Fitch
Initial Project Description
With the advent of Internet publishing it is difficult to retain control over data like images
or sound. Some people have developed watermarking techniques (hiding data in the
image) to promote this control. A watermark is some data hidden in the medium in such a
way that a) the watermark is not visible/audible; b) the watermark is robust in the sense
that it is not easily removable by, say, changing a few pixels in the image; c) the
watermark is easily readable by the copyright owner.
The aim of this project is to implement a few watermarking techniques, and determine
their effectiveness against a few simple attacks.
Digital watermarking is the process of editing pictures, sounds or videos to include an
unnoticeable, robust change that can still be read after the media has been manipulated.
Watermarking is much like fingerprinting, except that a watermark contains more
information about the owner.
Watermarking is used to help enforce copyright law by ensuring that the creator of any
media can add a subtle mark containing traceable information. Many different
watermarking techniques currently exist, but none is impenetrable: various forms of
attack exist to break the techniques widely available. Watermarking is still a relatively
young field, but it is an area of cryptography believed to be a promising direction in the
fight against piracy.
The growth of the Internet has created problems in tracking the usage of media. The
copying of MPEG-1 Layer 3 (MP3) files and films through peer-to-peer programs, and
the unauthorised use of images, are problems that have become too big to ignore, and the
software and techniques developed to prevent them are becoming more and more
sophisticated.
Watermarking shows great promise as a copyright-upholding technique, since the
watermark does not rely on any third party to uphold its integrity. Previous methods of
copy protection have relied on the viewer or player to abide by the copyright restrictions
they carry. For example, early Digital Versatile Discs (DVDs) carried a boolean marker
that decided whether a disc was allowed to be copied. Some discs allowed one copy
(after which the flag would be changed to never copy) and others allowed no copies at
all. The problem was that this relied on the DVD player upholding the copy protection.
With some people modifying the hardware, and some players being built without any
protection at all, the entire system collapsed. By making the watermark undetectable to
an attacker and invisible, it becomes a much more difficult thing to remove.
As long as there are techniques preventing copyright infringement, there will always be a
concerted effort to break these defences.
For watermarking to be effective, it has to fulfil three important criteria:
• The watermark is not visible or audible
To make the watermark a viable option, the user should not even know it is there.
Whether it is an edit to a picture or an echo added to a sound file, unless the
watermark is as undetectable as possible, people will not want to use it, and a
detectable mark makes the job of removing or distorting it much easier.
• The watermark is robust
Since there will be people who will do everything they can to remove these
watermarks, the watermark should still be recognisable after stretching, shearing,
shifting, rotating etc.
• The watermark is easily recognisable
Despite being well hidden, the watermark should, given the right procedure, be
simple to find. Even after distortions, there should be effective techniques for
finding the watermark despite its altered form.
It is accepted that watermarking will probably never be impenetrable: since there must
always be a way to recognise the watermark, there will always be a way to distort or
destroy it. If the technique is developed to a point where the media has to be heavily
modified to destroy the watermark, and the removal process is long and arduous, then
most attackers should be deterred, since the media will be tainted once the mark is
removed (much like the ink tags attached to clothes in department stores: once the tag
is forcefully removed, it releases ink that permanently stains the item of clothing
involved).
Since there are many different watermarking schemes currently available, and since so
many of them are easily broken, a standard attack was created to test the effectiveness of
watermarks in general. This tool is known as the StirMark Benchmark [10]. StirMark
is a relatively simple attack that applies minor distortion to the media to see whether the
watermark survives. A surprising number of current techniques fall down even at this
basic level. For example, Adobe Photoshop and CorelDRAW come equipped with a
watermarking facility called PictureMarc, which relies on a user ID and a two-digit
password. This watermarking procedure does not even withstand StirMark, yet it is
widely used.
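To illustrate the style of distortion such an attack applies, the sketch below applies a
small random time-warp to audio samples: each output sample is read from a slightly
jittered, linearly interpolated position in the input. This is not StirMark's actual
algorithm, and the class and parameter names are assumptions; it only demonstrates the
idea of a minor, perceptually invisible distortion that can nonetheless desynchronise a
watermark detector.

import java.util.Random;

// Illustrative sketch only, NOT StirMark's algorithm; assumes at least two samples
public class JitterAttack {

    public static float[] jitter(float[] samples, float maxOffset, long seed) {
        Random random = new Random(seed);
        float[] out = new float[samples.length];
        for (int i = 0; i < out.length; i++) {
            // Read from a position displaced by up to +/- maxOffset samples
            float pos = i + (random.nextFloat() * 2 - 1) * maxOffset;
            int lo = Math.max(0,
                Math.min(samples.length - 2, (int) Math.floor(pos)));
            // Linear interpolation between the two nearest input samples
            float frac = Math.max(0, Math.min(1, pos - lo));
            out[i] = samples[lo] * (1 - frac) + samples[lo + 1] * frac;
        }
        return out;
    }
}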
To fully understand watermarking, one has to look as much at ways of breaking the
system as at prevention techniques; it is said that the best form of defence is attack. This
project will mainly be research-based, in order to fully appreciate the techniques
currently in use and to understand where they fall short. This will also involve research
into the possible ways of breaking a watermark. Over time, the more inventive the attack
techniques become, the more effective the defence techniques will become. Unless
people try to break the watermarks, the technology cannot evolve; but as long as piracy
exists, the testers for the software will already exist.
Implementing the watermarking techniques will require a thorough understanding of how
they work. This may cause problems, since watermark vendors will not be keen to reveal
exactly how they implement their watermarks, as doing so would be a security breach.
Many watermarking techniques are openly published, but effort will have to be made to
ensure that the information gathered is up to date.
What follows is a high-level breakdown of the tasks involved in carrying out this project:
• Research
There are currently many different watermarking techniques in existence, none of
which has emerged as definitive. Background reading needs to be done to fully
understand the wide array of current watermarking techniques and how to
implement them. Some basic programming and ordered note-taking should be
performed to ensure that all relevant material read can be used later with
maximum efficiency. Most research should come from journals, since
watermarking is a cutting-edge technology that is constantly evolving.
• Implementation of many watermarking techniques across a wide array of
media
Once the research is carried out, many of the techniques will have to be
implemented and tested across a wide array of media, both to fully understand
their capabilities and to ensure consistent results in the later testing. For example,
the echo hiding technique adds a fractional echo to a sound file (with a delay of
between 0.5 and 2 milliseconds), too short for the human ear to perceive. The
larger the echo, the more reliably the watermark can be detected, but the smaller
the echo, the better hidden it is. These are the kinds of trade-offs that need to be
explored to find the pros and cons of each technique (a minimal sketch of an
echo-hiding kernel is given after this list).
• Testing the effectiveness of each watermark
For a watermark to be effective, it needs to withstand heavy attack and still come
out intact. Initial tests should measure each marking technique against StirMark
to see which marks are truly ineffectual. More complicated attacks will then
reveal which watermarks resist which techniques. With echo hiding this would
include testing how large the echo can be before humans detect it, and how easily
the echo can be found depending on its size.
• Evaluation of findings
Without any form of evaluation, the results would mean nothing. An evaluation
looks at all the results obtained and weighs up their pros and cons. At this point,
improvements to existing watermarking schemes could be proposed, and even
implemented, depending on the earlier results.
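As referenced in the implementation task above, a minimal sketch of echo hiding
follows. It is illustrative only: the delay values, amplitude and names are assumptions,
not the parameters ultimately used in this project. A '1' bit is embedded by mixing in a
faint echo at one delay and a '0' bit at another; a detector would later recover the bit by
locating the echo peak, for instance in the cepstrum or autocorrelation of each block.

// Minimal echo-hiding sketch with assumed parameters, not this
// project's final implementation; one bit is embedded per block by
// adding a faint echo whose delay encodes the bit value
public class EchoHidingSketch {

    // Assumed delays of roughly 0.5 ms and 1.0 ms at 44.1 kHz
    static final int DELAY_ONE = 22;          // samples encoding a '1' bit
    static final int DELAY_ZERO = 44;         // samples encoding a '0' bit
    static final float ECHO_AMPLITUDE = 0.2f; // kept small to stay inaudible

    /** Embeds one bit into a block of samples by adding a delayed copy. */
    public static void embedBit(float[] block, boolean bit) {
        int delay = bit ? DELAY_ONE : DELAY_ZERO;
        // Work backwards so the added echo does not feed back into itself
        for (int i = block.length - 1; i >= delay; i--) {
            block[i] += ECHO_AMPLITUDE * block[i - delay];
        }
    }
}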
Since this project is largely research-based, it is hard to predict its precise direction, as
that will depend on the conclusions drawn from the literature review. Following this
project plan, however, should at least aid understanding of the tasks ahead and how to go
about tackling them.