Implementing and testing various digital
watermarking techniques on audio data.
Steven Morgan
BSc in Computer Software Theory
COPYRIGHT
Attention is drawn to the fact that copyright of this thesis rests with its author. The
Intellectual Property Rights of the products produced as part of the project belong to the
University of Bath (see http://www.bath.ac.uk/ordinances/#intelprop).
This copy of the thesis has been supplied on condition that anyone who consults it is
understood to recognise that its copyright rests with its author and that no quotation from
the thesis and no information derived from it may be published without the prior consent
of the author.
Declaration
This dissertation is submitted to the University of Bath in accordance with the
requirements of the degree of Bachelor of Science in the Department of Computer Science.
No portion of the work in this dissertation has been submitted in support of an application
for any other degree or qualification of this or any other university or institution of
learning. Except where specifically acknowledged, it is the work of the author.
Abstract
Digital watermarking describes the imperceptible insertion of data within a host sound,
image or video file in order to prove ownership. Over the past decade many
watermarking techniques have been proposed to make this possible.
Any such mark should still be detectable after common processing operations, including
lossy file compression. Various audio watermarking techniques representative of this
work are implemented and tested against a range of sound processing operations that may
or may not remove the watermark. The techniques are compared, their individual
strengths and weaknesses analysed, and potential paths for further development
suggested. It is concluded that no technique is yet fully robust against all potential
attacks, and that for watermarking to be seriously considered in the audio domain,
fundamental principles need to be reconsidered.
Acknowledgements
The author would like to thank Russell Bradford for his initial project proposal
and John Fitch for his assistance in allowing the specification focus to be brought
into the audio domain.
Contents
Abstract
Acknowledgements
Contents
1. Introduction
   1.1 Background
   1.2 What is Digital Watermarking?
   1.3 What are Digital Watermarking's Uses?
   1.4 Digital Watermarking Characteristics
       1.4.1 Watermark Robustness
       1.4.2 Watermark Extractability
       1.4.3 Watermark Fidelity
   1.5 Report Outline
2. Literature Review
   2.1 History
   2.2 Image vs. Audio Watermarking
   2.3 Spatial vs. Frequency Domain
   2.4 Specific Audio Techniques
3. Preparation
   3.1 Technology Choices
   3.2 High Level Design
       3.2.1 Finding the maximum inaudible volume
       3.2.2 LSB
       3.2.3 Echo Hiding
       3.2.4 Spread-Spectrum
       3.2.5 The Patchwork Technique
   3.3 Development Methodology
   3.4 Test Plan
       3.4.1 Module Testing
       3.4.2 End To End Testing
   3.5 Evaluation Data
   3.6 Attacking Techniques
       3.6.1 Cropping
       3.6.2 Noise Reduction
       3.6.3 High and Low Pass Filtering
       3.6.4 Lossy Compression
       3.6.5 Addition of noise
       3.6.6 Changing The Sampling Rate
       3.6.7 Pitchshifting
       3.6.8 Volume Reduction
       3.6.9 Sound Compression
4. Implementation
   4.1 Framework Overview
   4.2 Graph Visualisation Tool
   4.3 Fast Fourier Transform
   4.4 Watermark Implementations
   4.5 Implementation Review
5. Evaluation & Test Results
   5.1 Attacking Techniques
       5.1.1 Cropping
       5.1.2 Noise Reduction
       5.1.3 High and Low Pass Filtering
       5.1.4 Lossy Compression
       5.1.5 Addition of noise
       5.1.6 Changing The Sampling Rate
       5.1.7 Pitchshifting
       5.1.8 Volume Reduction
       5.1.9 Sound Compression
   5.2 Evaluation Summary
6. Conclusion
7. Further Development
   7.1 Potential Improvements
8. Bibliography
9. Appendices
   9.1 Additional Algorithm Descriptions
       9.1.1 Phase Coding
       9.1.2 Phase Modulation
       9.1.3 Watermarking the compressed bitstream
       9.1.4 Integrating watermark embedding into compression encoder
   9.2 Source Code Sample
   9.3 Project Proposal
1. Introduction
1.1 Background
One of the greatest technological advancements to change people's lives in the past
decade or so has been the Internet. From humble beginnings in the early 1960s, when it
started as a method of maintaining communication in the event of nuclear war, it has grown
into the worldwide network of computer networks it is today, as discussed in How the Web Was
Born [1]. Today, the Internet affects all industries as its potential continues to be
recognised and explored. From e-commerce to a means of advertising, it has become the
first port of call for many uses. The increase in demand has meant great attention has
been paid to the evolution of net technology, including increased data transfer speeds.
Dial-up modems are becoming less common as broadband becomes the standard option for
residents of developed nations, with businesses investing in more costly alternatives to
cater for their large user bases.
One of the more notable side effects of this evolution has been the volume of data transfer
between users, including the trading of pirate material, whether software, video or audio.
Piracy has always been a major problem that has concerned the music and film industries.
Before the advent of the Internet other mediums were met with outcry. Audio tapes and
video cassettes were both greeted with the same concerns about intellectual property
theft. However, most personal breaches of copyright were ignored because they made
little difference to sales. Organised piracy syndicates were the focus of the industry’s
protests. The advent of the Internet has changed this. With high transfer speeds and the
ability to communicate with any other user connected to the Internet, piracy has become a
serious problem. The ease with which pirated material can be obtained has dramatically
affected the volume of intellectual property breaches.
The music industry in particular has been affected with significantly reduced sales. Sonic
Boom: Napster, P2P and the Battle for the Future of Music [2] tells of the MP3
revolution and Napster, which has made headlines around the world. Napster was a
company that was famed for its pioneering peer-to-peer software that enabled the transfer
of audio files between users with little more effort than the click of a mouse. Legitimate
music downloads were indistinguishable from pirated material, making it hard for the user
to tell the two apart; sometimes users did not even realise they were breaking the
law.
Given Napster’s ease-of-use, it didn’t take long for the network of users to grow to a
considerable number, with huge volumes of music being transferred on a daily basis.
Record sales dropped and the industry began to acknowledge the extent of the problem.
Since that time, measures have been taken to combat this huge flow of piracy. In 2000,
the Recording Industry Association of America (RIAA) won an injunction to shut Napster
down, but similar software emerged, such as AudioGalaxy, Kazaa and eDonkey, with even
greater popularity and functionality. The piracy battle is not being fought by the RIAA
and its international counterparts alone; many musicians have become vocal on the issue.
Many approaches have been taken to try to stop or discourage file sharing, one
being the RIAA systematically suing individual downloaders for large sums of money,
a move that seems to show the industry's desperation in the matter.
Another side-effect of the Internet's usage growth has been the ease of finding images to
suit any purpose. By simply referring to a search engine, it is easy to find almost any
picture and download it. With such a flow of images between computers, and with many
web-sites using images without declaring their source, ownership has become an issue.
With digital technology, there is no way to prove that you originally took a picture,
since digital cameras have no film. One of the major technologies that has arisen to
combat this problem is Digital Watermarking.
1.2 What is Digital Watermarking?
A watermark is a translucent design impressed on paper during manufacture and visible
when the paper is held to the light. Its purpose is to stop imitation, by making the
watermark near impossible to reproduce. The concept of Digital Watermarking considers
this process in the digital domain from the ethos of cryptography and steganography. The
purpose of a digital watermark is to hide within an image or audio file some data
specifically relating to that file. The base requirement of a watermarking technique is that
adding a mark to a sound file should not degrade its quality. The watermark should
survive altering operations and remain detectable.
The number of papers written about Digital Watermarking has grown considerably as its
applications are recognised more and more. Common art packages such as Adobe
Photoshop now provide watermarking as a built-in option. Watermarking has expanded
its use to images, video and, to a lesser extent, audio. It has remained a
fringe idea in the audio domain, for reasons discussed later, but has not been disregarded.
This project focuses specifically on the watermarking of audio, trying to discover why
the approach has been relatively fruitless in this domain, and suggests areas of
improvement based on our results.
1.3 What are Digital Watermarking’s Uses?
A digital watermark has three broad uses:
• To prove whether a media file is the original.
• To prove who owns the media file.
• To identify whether a media file is copyrighted material.
In real world scenarios:
- Watermarks could be embedded in media sold via the Internet from outlets such
as iTunes or Rhapsody. This would enable the music industry to create software
agents that could trawl P2P networks for copyrighted material. Upon identifying
copyrighted material the agent could report the user to a regulative authority.
- An artist may make their music available for download on their promotional web-
site. In this situation, the artist would not be overly bothered by the sound quality
of the download, since its purpose is previewing only. As such, a robust
watermark could be used even if it was audible to the human ear.
- In both of these cases, the watermark could be used as evidence in a court of law
to determine who owns the original copyrighted material.
Potentially, watermarking could be performed in the recording studio or editing suite. As
part of the final mixing process the relevant copyright information could be inserted.
This would ensure that the watermark was included early on in the production and
distribution chain. Apart from the master, all other copies would contain the watermark.
However, to maintain the integrity of the essence, watermarking conducted in the studio
would have to be completely undetectable to human perception.
1.4 Digital Watermarking Characteristics
The concerns surrounding watermarking are as follows:
• Robustness – The ease with which the watermark can be detected after intentional and
unintentional alterations of the watermarked media.
• Extractability – There is disagreement as to whether the extraction of the
watermark should be blind or informed.
• Fidelity – How well the watermarked sound resembles the original. This has
been the biggest issue to date with audio watermarking.
1.4.1 Watermark Robustness
When discussing watermarking it is common to discuss a watermark’s fragility.
Fragile watermarks refer to watermarks where any modifications to the essence would
either remove the watermark or irrevocably alter it, so that only the true original would
pass a watermark check.
Semi–fragile watermarks refer to watermarks that would survive some basic operations,
such as saving to a different format including lossy file compression techniques such as
MPEG, but would be removed by any actual editing of the image.
Robust watermarks refer to watermarks that survive both unintentional and intentional
alterations, so that when called upon they are able to prove beyond doubt who owns the
original media.
1.4.2 Watermark Extractability
Blind extraction is where an extraction program exists which can detect and read the
watermark, so that it can be used to inform of ownership as well as prove it. This does,
however, put some restrictions on the watermarking techniques available and may make
attacking easier.
Informed detection means that generally only the owner of the original media can detect
the watermark; the extraction method may even require the original file.
If the watermark is to have effective commercial use then it needs to be blind detectable;
however, this leads to many problems. Presumably the method of watermarking should be
as openly published as possible, like any good encryption technique. However, this would
make removing the watermark very easy if no key is required to detect just where in the
waveform the watermark lies. Similarly, it would be impractical to require the original
version of the sound, since it may not be available, and allowing the original sound to be
released into the public domain effectively makes the watermark pointless. For these
reasons the two types of watermark are also known as public watermarking systems and
private watermarking systems respectively, as described in Digital Watermarking [3].
1.4.3 Watermark Fidelity
In some applications if the watermark introduces audible artifacts into the signal then it
will be instantly dismissed as a candidate technique for identifying copyrighted material.
For example, there was a plan to introduce audio watermarking to coincide with the
release of the DVD-A format but unfortunately, the technology didn’t meet the
requirements. In August 2000, leading classical recording engineer Tony Faulkner was
quoted as saying “Watermarking could reduce the perceived quality of DVD-A to
somewhere between a good MiniDisc and a below-average CD,” after conducting
research into the format. The plans to use an audio watermark were dropped since the
value of releasing the DVD-A came from the ability to produce a level of quality akin to
the original studio recording.
1.5 Report Outline
The remainder of this report is organised as follows:
• Chapter 2 discusses previous work in the field, drawing out relevant
developments and describing aspects utilised at later stages.
• Chapter 3 states exactly what is to be done, why and how.
• Chapter 4 describes the details of the implementation, expanding on the outline
set out in Chapter 3.
• Chapter 5 evaluates the results obtained in Chapter 4 by implementing attacks.
• Chapter 6 concludes.
• Chapter 7 suggests further development that could be carried out with a larger
timescale.
2. Literature Review
2.1 History
Traditionally, cryptography has been the principal technique for obscuring information.
From the pre-computer era, encrypted messages have been exchanged between people in
situations where the sender wants to ensure that the message in question cannot be
understood by anyone other than those with the decrypting key. A famous example of
this was the German Enigma Machine from the Second World War, whose story is told in
Enigma [4]. This was a machine used by the Germans with a complicated encryption
technique that was eventually broken by the Allied forces.
As discussed previously, the rise of peer-to-peer (P2P) software has caused a surge in the
ease of piracy, as discussed in Sonic Boom: Napster, P2P and the Battle for the Future of
Music [2]. Piracy has always been a matter the music industry has taken very seriously
but in recent years, the ease of file sharing has caused actual media sales to decline
noticeably. This has been a major reason for the continued research and development of
audio watermarking as the industry continues to seek ways to stem the flow of illegal
downloads. Napster’s closure only caused a ‘blip’ in the rise of piracy methods and
paved the way for other software to appear in a similar vein. It was evident that
something else needed to be done.
It is only in the past decade that steganography has received serious consideration as a
method of combating this. Steganography is the process of hiding a message within a
larger one in such a way that others cannot discern the presence or contents of the hidden
message. It wasn't until the first conference on the subject in 1996 that it was formally
addressed; many groups had already been researching the subject independently of each
other. The details of that conference are stored in the journal Information Hiding: 1st
International Workshop [5]. Common terminology was agreed upon at this pioneering
event.
With collaboration between different research groups, ideas were traded and the development
of these steganographic techniques rapidly gained pace as new ideas grew from these
original sparks. Image watermarking was the most natural form of development due to its
similarity to the way an artist signifies their creation of a painting by signing the
bottom corner of the picture.
A web-site named Digital Watermarking World [6] was set up to centralise watermarking
resources, including relevant books, research and upcoming conferences to name a few.
Unfortunately due to the commercialisation of watermarking techniques, the web-site is
now defunct and serves merely as an archive of past research.
As watermarking evolved, the focus of the research remained on image processing with
few exceptions due to the added complications in making the process effective for audio,
as discussed in the first book to focus purely on digital watermarking, intuitively named
Digital Watermarking [3]. This book serves as a good introduction to the various
approaches used in watermarking but mainly focuses on images. The book does not go into
great depth, but it introduces the concepts and terminology, alongside definitions and
explanations, very well. A similar surface-level breakdown can be
found in Information Hiding Techniques for Steganography and Digital Watermarking
[7].
2.2 Image vs. Audio watermarking
Image watermarking cannot be directly transferred to audio watermarking due to the
fundamental differences between the way our eyes and ears work, as discussed in
Sensation and Perception [8]. When the eye views an imperfect picture, the brain blanks
out the imperfections, seeing the picture as it is supposed to be viewed. For example,
when a person visits the cinema, they see the projected image on the screen as it was
intended to be seen, without noticing the flickers unless they explicitly look out for them;
the ear, by contrast, can pick up on the slightest imperfection in a sound, as shown in
Attacks on Copyright Marking Systems [9].
When represented in visual form, it is easy to see the difference between the diagrams,
just as it would be easy to tell the difference between their sounds. Here, it is evident that
(b) is the same sound wave as (a) but with a 20 millisecond echo added as a watermark.
The echo is clearly visible at 0.02 seconds and would be easily audible when listening;
the small imperfection in the picture would, in the audio domain, come across as a click.
A similar imperfection in an image watermark would have been near invisible to the
human eye.
Despite the added complexity of audio watermarking, the robustness techniques and
attacks of visual and audible watermarks have to be analysed for their similarities,
since there is so much more literature for image watermarking. When considering Fabien
Petitcolas's Stirmark Benchmark [10], attacks can be grouped in the following way:
• Cropping (Essentially the same)
• Compression (lossy or lossless)
• Random bending (pitchshifting)
• Gaussian Filtering (low-pass filter) etc.
Following the creation of Stirmark, other benchmarks came into existence to test
the robustness of watermarks, including Checkmark Benchmarking [11], which
contains extra attacks, and Optimark [12], which features a GUI. Hiding digital
watermarks using multiresolution wavelet transform [13] reports statistical results
from Checkmark showing that only one existing technique, the Xie 1-bit transform,
remained completely robust against the various forms of attack.
These benchmarks are the standard ways of testing the robustness of a watermark. Those
wishing to attack a watermark are always going to devise new ways of removing it, so the
fastest way for the technology to evolve is to monitor the way people remove watermarks;
researching according to these attacks accelerates development.
2.3 Spatial vs. Frequency Domain
Most watermarking techniques can be divided into two approaches:
those in the spatial domain and those in the frequency domain. The main difference
between these approaches is their robustness, as will be discussed later. Spatial
techniques were the initial development in the field. A Watermark Technique Based On
One-Way Hash Functions [14] proposes embedding the watermark securely in the least
significant bit (LSB) plane so that only the person who has placed the watermark may
retrieve it, using cryptographic hash functions to ensure the security of the watermark.
Transparent robust image watermarking [27] suggests an imaging technique that creates
a random key to decide on a co-ordinate in the image. Once the key is chosen, the
brighter and darker pixels are distinguished and the brightest is brightened and the darkest
is darkened. This technique could be applied to sound with a change in amplitude of a
sample, by negligible amounts that would remain inaudible to the listener. With the
correct key used to retrieve the watermark information, one simple comparison operation
can find the watermark. The problem with this technique is its susceptibility to noise.
Any interference could alter the volume differences and render the watermark useless.
Spatial domain watermarks are being developed today, since their techniques are
relatively cheap and for more trivial examples they can quickly create a watermark with
little effort. The main area of focus with this form of watermarking is in the randomised
key. If the key follows a pattern, then the human mind is more likely to pick up on the
imperfection and so it will be easier to notice and therefore remove.
Although it is accepted that spatial domain watermarks are not as robust as frequency
domain watermarks, there seems to be some ambiguity with regards to which technique has
the largest capacity (i.e. which technique allows the most data to be embedded in a sound file).
Adaptive watermarking in the DCT domain [28] suggests that the spatial domain has the larger
capacity, but Image watermarking for tamper detection [29] has a different opinion. It
seems that both statements are true, but as more watermark data is embedded using spatial
techniques the quality of the original media degrades, unlike with frequency domain
watermarking. Since spatial domain techniques distort the media, very little watermarking
can be carried out in practice, so their use is fairly limited.
Techniques applied in the frequency domain are more robust than those applied in the
spatial domain. This explains in part why the bulk of current research is directed towards
the exploration of frequency based techniques.
The benefits of real-time watermarking include safe and traceable audio streaming, radio
broadcasting and cellular phone recordings. With these practical applications,
watermarking in audio can be viewed as a more serious technique with great potential.
Digital image watermarking using Daubechies' wavelets and error correction coding
[30] discusses how this could be possible, despite applications on the Windows platform
having an embedded 8x run time.
The biggest debate with frequency domain watermarking is where exactly to put the
watermark. Technically, if the watermark is placed in an inaudible area of the spectrum
then a low pass filter should be able to eradicate it. Many techniques are being developed
to counter this, but if a filter cannot erase the watermark then it is arguably not doing
its job properly and needs to be adapted. With this kind of trade-off between two
different fields of development, one has to wonder what the future direction of
audio watermarking will be.
Another issue regarding watermarking is security: who can read the watermark? In
principle, for a watermark to be truly robust, it should have only one possible method of
removal, known only to the owner. However, this is an ideal which cannot be achieved
with current technology, so a trade-off has to be established between the ease of finding
the watermark and the ease of its removal. Hidden digital watermarks in images [31]
suggests that the key that encodes the watermark be known only to the image owner, while
a key that decodes, or recognises, the watermark would be available to the public.
2.4 Specific Audio Techniques
Techniques and Applications of Digital Watermarking and Content Protection [21] is one
of the few books to contain a chapter exclusively on audio watermarking. In this chapter,
five specific watermarking techniques are formally specified:
• Least Significant Bit (LSB) Coding
The substitution of the LSB carrier signal with the bit pattern from the watermark
noise.
• Embedding Watermarks into the phase (phase coding and phase modulation)
Exploiting the fact that humans have a low sensitivity to relative phase changes.
Phase coding splits the original audio stream into blocks and embeds the whole
watermark into the phase spectrum of the first block whilst phase modulation
performs independent multiband phase modulation.
• Echo hiding
Embedding watermarks into a signal by adding echoes in a slightly delayed time
position to produce a marked signal.
• Spread spectrum audio watermarking
Spreading the watermark across the host signal in the manner of a spread-spectrum transmission.
• Patchwork Technique
Use a pseudorandom process to embed a certain statistic into a data set that is
detected in the reading process with the help of numerical indexes.
3. Preparation
3.1 Technology Choices
Development OS & Programming Language I chose to implement the project in a Windows environment due to its familiarity and
convenience. The system could have been developed in a number of languages. I selected
Java to capitalise on my experience with the language. Many people criticise Java for
being slow. However, the order of complexity of an algorithm remains invariant under the
language, thus arguments of efficiency of languages are not as important as delivering a
lean and efficient algorithm. Java 1.4 also provides added performance by virtue of the
Java HotSpot virtual machine.
The Java HotSpot uses adaptive compilation to capitalise on the trend that most programs
spend 80% of their time in 20% of the code. The HotSpot VM compiles the most
frequently run code, performing advanced optimisation and in-lining of methods. This
type of compilation is particularly effective for audio processing kernels.
Audio Library
Audio functionality is a relatively new feature in Java, with Digital Audio With Java
[22] providing a slightly dated yet nonetheless useful outlook on audio programming in
the language.
I decided to use the Java Sound API. The API provides an abstraction of an audio signal
from its underlying medium, be it wav, mp3, au etc., or an input device, e.g. a microphone.
The API also provided a platform independent interface to audio hardware for playback.
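As a brief illustration of the Java Sound API calls involved, the sketch below opens an audio file and prints its format; the file name is purely illustrative.

import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Minimal sketch: open a WAV file with the Java Sound API and inspect its format.
// The file name "drums.wav" is illustrative only.
public class OpenAudioExample {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("drums.wav"));
        AudioFormat format = in.getFormat();
        System.out.println("Sample rate: " + format.getSampleRate() + " Hz, "
                + format.getSampleSizeInBits() + " bits, "
                + format.getChannels() + " channel(s)");
        in.close();
    }
}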
Sound Editor
Goldwave version 4.25 is a sophisticated sound editor with many pre-built sound
manipulators such as filters, limiters and gates. It contains many filtering tools with easy
to adjust settings. Goldwave was used to create the attacks on each file since it seems
unnecessary to ‘re-invent the wheel’.
Backup Procedure
Throughout development I regularly backed up my code onto a network share.
3.2 High Level Design
What follows are some of the most commonly used audio watermarking techniques and
an analysis of their implementation methods.
It is my intention to produce implementations of the following algorithms:
- LSB
- Echo Hiding
- Spread Spectrum
- Patchwork
If time remains I will also produce implementations of the following algorithms:
- Phase Coding
- Phase Modulation
- Watermarking the compressed bitstream
- Integrating watermark embedding into the compression encoder
For a description of these algorithms see Appendices - Additional Algorithm
Descriptions.
3.2.1 Finding the maximum inaudible volume
To find the maximum volume inaudible to humans, and thus the boundary of audibility
for future watermarks, the following steps need to be taken (a sketch of the first step is
given after the list):
1. Calculation of the power spectrum;
2. Identification of the tonal (sinusoid-like) and nontonal (noise-like) components;
3. Decimation of the maskers to eliminate all irrelevant maskers;
4. Computation of the individual masking thresholds;
5. Computation of the global masking threshold;
6. Determination of the minimum masking threshold in each subband.
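As an illustration of step 1, the sketch below computes the power spectrum of one block of samples. The window length, the Hann window and the Fft helper class are assumptions for illustration, not necessarily the exact routines used in the project.

// Sketch of step 1 (power spectrum) for one block of samples.
// Fft.transform is a hypothetical in-place FFT helper (real/imaginary arrays).
public final class PowerSpectrum {

    public static double[] compute(double[] block) {
        int n = block.length;                 // assumed to be a power of two, e.g. 512
        double[] re = new double[n];
        double[] im = new double[n];
        for (int i = 0; i < n; i++) {
            // Hann window reduces spectral leakage before the FFT.
            double w = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / (n - 1));
            re[i] = block[i] * w;
        }
        Fft.transform(re, im);                // hypothetical helper
        double[] powerDb = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double mag2 = re[k] * re[k] + im[k] * im[k];
            powerDb[k] = 10.0 * Math.log10(mag2 + 1e-12);   // avoid log of zero
        }
        return powerDb;
    }
}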
3.2.2 LSB
One of the first techniques investigated in the watermarking field, as for virtually all
media types, is the so-called LSB encoding. It is based on substituting the LSB of the
carrier signal with the bit pattern from the watermark noise.
This method places message bits into cover audio by modifying the least significant bits
of the audio. The scheme developed places bits into the mth bit of the cover audio, where
m is a parameter that ranges from 1 (MSB) to 16 (LSB). This method has extremely low
computational complexity, on the order of O(n). To allow as fair a comparison between
watermarking methods as possible, the cover audio is segmented and bits are placed at the
first location of each segment. Thus, the embedded bit locations were known in advance.
This violates the provision that watermarks should be statistically invisible; technically
this method can be considered as more of a “data-hiding” than a watermarking algorithm.
Decoding simply involves taking values at these known locations and extracting the
desired bit. Interestingly, bits encoded down to the 10th bit location could not be heard by
human observers.
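A minimal sketch of this embedding scheme is given below. It assumes 16-bit signed PCM samples held in a short array; the segment length and the bit-depth parameter m are illustrative values, not the project's exact settings.

// Sketch of LSB-style embedding into the m-th bit of the first sample of each segment.
// m counts from 1 (MSB) to 16 (LSB), matching the description above.
public final class LsbEmbedder {

    public static void embed(short[] samples, boolean[] bits, int segmentLength, int m) {
        int bitPos = 16 - m;                      // 0 = LSB, 15 = MSB
        for (int i = 0; i < bits.length; i++) {
            int idx = i * segmentLength;          // first sample of segment i
            if (idx >= samples.length) break;
            int s = samples[idx] & 0xFFFF;
            s &= ~(1 << bitPos);                  // clear the target bit
            if (bits[i]) s |= (1 << bitPos);      // set it when the watermark bit is 1
            samples[idx] = (short) s;
        }
    }

    public static boolean[] extract(short[] samples, int bitCount, int segmentLength, int m) {
        int bitPos = 16 - m;
        boolean[] bits = new boolean[bitCount];
        for (int i = 0; i < bitCount; i++) {
            int idx = i * segmentLength;
            if (idx < samples.length) {
                bits[i] = ((samples[idx] >> bitPos) & 1) == 1;   // read bit at known location
            }
        }
        return bits;
    }
}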
3.2.3 Echo Hiding
Echo Hiding was developed by Gruhl, Lu and Bender in Echo Hiding [25] and proposed
to encode bits by introducing a small, imperceptible echo to the file. Overlapping an echo
kernel with the original signal implements the echo.
A variety of watermarking algorithms are based on echo hiding methods, according to
Techniques and Applications of Digital Watermarking and Content Protection [21]. Echo
hiding algorithms embed watermarks into a signal co(t) by adding echoes co(t − ∆t) to
produce a marked signal cw(t ):
cw(t) = co(t) + αco(t − ∆t) (i)
In the above equation, the parameters ∆t and α can be adjusted to keep the echo inaudible,
whilst changing ∆t alone encodes bits of the watermark into the audio signal. In general,
(i) can be written as
cw(t) = Σk αk co(t − ∆tk), k = 0, . . . , N (ii)
where co(t) is the original signal with parameters α0 = 1, ∆t0 = 0, and N the number of
different echo signals embedded. By substituting the response function
h(t) = Σk αk δ(t − ∆tk) (iii)
a short form convolution of the echoes with the original signal can be written
cw(t) = co(t) * h(t) (iv)
The marked signal cw(t) can also be expressed in the frequency domain as
Cw(ω) = Co(ω)H(ω) (v)
where Co(ω) and H(ω) are the Fourier transformations of the signals co(t) and h(t),
respectively. During the detection step, the calculation of h(t) is necessary to determine
the individual echoes with corresponding delay times ∆tk encoding the bits k = 1, . . . , N.
According to (v), the signal can be separated by dividing Cw(ω) by Co(ω) in the frequency
domain and calculating the inverse Fourier transformation. Performing this operation
requires a priori knowledge of the original signal Co(ω), which is not practical in the
case of watermarking. The method for separating the signal and the echoes is known as
homomorphic deconvolution.
The basic idea behind homomorphic deconvolution is to apply a logarithmic function to
convert the product (v) into a sum. Using the definition of the complex cepstrum as the
inverse Fourier transformation of the log-normalized Fourier transform of the
watermarked signal, the transformed signal can be written as
Cw(q) = F−1{log |Co(ω)H(ω)|}
      = F−1{log |Co(ω)|} + F−1{log |H(ω)|}
      = Co(q) + H(q) (vi)
as a function of the time or quefrency domain which is equivalent in nature to a time
domain representation. According to (vi), the original signal Co(q) and the embedded
echoes H(q) are clearly separated on the quefrency axis q. Using this deconvolution
technique in the detection of the watermark bits, an algorithm adding two different echoes
for embedding 0 and 1 bits can be constructed. The original signal co is split into M
blocks coj, 0 ≤ j ≤ M − 1, each with N samples. Each block carries 1 bit of the
watermark.
1. For each block co j of the original signal, the echo signal for the 0 and 1 bits are
constructed with the corresponding delay time and attenuation factors α0 and α1.
wk(t) = αkco(t − ∆tk), for k = 0, 1 (vii)
2. Two complementary modulation signals mk(t), k = 0, 1, for the 0 and 1 bits are
generated:

m0(t) = (1 − bj) rectj(t), m1(t) = bj rectj(t) (viii)

with

m0(t) + m1(t) = 1 for all t, rectj(t) = 1 for tj ≤ t < tj+1 and 0 otherwise (ix)

and bj = m[j mod l(m)], where l(m) is the length of the watermark bit string m.
The modulation signals are used to construct the echo signals according to the bits of
the watermark.
3. After multiplying the echo signals wk(t) with the modulation signals mk(t), the marked
audio stream is generated by addition of the computed signals to the original one:
cw(t) = co(t) + m0(t)w0(t) + m1(t)w1(t) (x)
Applying this theory in practice, echo delays of 5500 and 4400 were found through trial
and error to yield good decoding results. The
amplitude of the echo kernel may also be adjusted. A higher amplitude means a stronger
echo. When the echo kernel amplitude was less than 0.5, very few listeners could hear any
difference between the original and echoed signal. Stronger amplitudes produced a more
resonant, “richer” sound.
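The embedding side of equations (vii) to (x) can be sketched as below for the simple case where each block receives a single echo chosen by its watermark bit. The delays, attenuation factor and block size are illustrative assumptions, and samples are taken to be floating-point values in [-1, 1].

// Sketch of per-block echo embedding: one of two delays is chosen per watermark bit
// and an attenuated, delayed copy of the original is added (cw(t) = co(t) + a*co(t - dt)).
public final class EchoEmbedder {

    public static double[] embed(double[] original, boolean[] bits,
                                 int blockSize, int delay0, int delay1, double alpha) {
        double[] marked = original.clone();
        for (int j = 0; j < bits.length; j++) {
            int start = j * blockSize;
            if (start >= original.length) break;
            int end = Math.min(start + blockSize, original.length);
            int delay = bits[j] ? delay1 : delay0;        // delay encodes the bit
            for (int t = start + delay; t < end; t++) {
                marked[t] += alpha * original[t - delay]; // attenuated, delayed copy
            }
        }
        return marked;
    }
}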
Retrieving the watermark requires a synchronization procedure to perform an alignment
with the watermarked blocks:
1 Transformation of the sequence in the cepstrum domain Cw = F−1{log(|F{ cw}|)};
2 Autocorrelation of Cw in the cepstrum domain;
3 Measurement of the delay time δt via the peaks of the autocorrelation of Cw;
4 Determination of the embedded bit by comparison of δt with ∆tk, k = 0 or 1.
Using masking effects, echo hiding uses the postmasking effect in order to control the
inaudibility of the embedded watermark. The louder the echo, the stronger the watermark
will be. The watermark should not fall far below the lowest level humans can hear, else
any lossy compression will instantly cut it, but nor should it be much greater, else it
will be too audible. The delay times ∆tk and attenuation factors αk, k = 0, 1
have to be adjusted in the embedding process according to the perception threshold of the
human auditory system to ensure the relative inaudibility of the echoes. It is a blind
watermarking method, so the original audio file is not required, which extends the
usability of the method. The embedding and the detection are performed in two different
domains, the time and cepstrum domains respectively, which can add complexity to the
algorithm since a large number of transformations has to be computed for detection in
the cepstrum domain.
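As a sketch of the first detection step, the real cepstrum of a block can be computed as below; Fft.transform and Fft.inverseTransform are hypothetical helper routines standing in for the project's own FFT code. Peaks in the returned array near the two candidate delays indicate which echo, and hence which bit, is present.

// Sketch of the cepstrum Cw(q) = F^-1{ log |F{cw}| } used for echo detection.
public final class CepstrumDetector {

    public static double[] realCepstrum(double[] block) {
        int n = block.length;
        double[] re = block.clone();
        double[] im = new double[n];
        Fft.transform(re, im);                    // forward FFT (hypothetical helper)
        for (int k = 0; k < n; k++) {
            double mag = Math.hypot(re[k], im[k]);
            re[k] = Math.log(mag + 1e-12);        // log magnitude, avoiding log(0)
            im[k] = 0.0;
        }
        Fft.inverseTransform(re, im);             // back to the quefrency domain
        return re;                                // inspect peaks near delay0 and delay1
    }
}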
3.2.4 Spread-Spectrum
Spread-spectrum methods, originally conceived for masking the origin of radio
transmissions and enhancing resilience against jamming, are often used in the
transmission of digital information according to Techniques and Applications of Digital
Watermarking and Content Protection [21]. The requirements of suppressing
jamming during transmission, hiding a signal from an unintended listener and ensuring
information privacy are very similar to those in watermarking applications; in fact,
spread-spectrum methods are probably the most widely used techniques in the development of watermarking
algorithms. From the spread-spectrum viewpoint, the original audio signal can be
considered as a jammer interfering with the signal carrying the watermark information.
The spread-spectrum modulation is a special form of watermark modulation. The
modulation is performed on Co, which is the transformed block of samples co. The
transformation is used to model the audio signal with orthonormal base functions
spanning the signal space. If the identity transformation is used, the signal is represented
by the block of PCM samples itself. In the case of the Fourier transformation, the
trigonometric functions are used as basis functions and the transformed block consists of
the Fourier coefficients, represented by the vector Co. Each bit k ∈ {0, 1} is modelled by a
pseudonoise vector pnk consisting of two equally probable elements {−1, +1} generated
by means of the secret key. Therefore, the expectation value of the pseudonoise sequence
is E{pnk} = 0. Usually the pseudonoise sequences for the two bits are inverted: pn0 =
−pn1 = pn. The original signal co is split into M blocks coj, 0 ≤ j ≤ M − 1, each with
N samples.
To simplify the discussion, consider one block (co : = co j ) carrying 1 bit of the
watermark.
1. The block co is transformed with the orthogonal transform T in the corresponding
domain Co.
Co = T (co) (i)
2. The PN sequence pnk is weighted with α to adjust between quality and robustness.
W = αpnk (ii)
3. The modulated and weighted watermark signal is added to the cover signal in the
transformed domain.
Cw = Co +W (iii)
4. The watermarked signal is transformed back into the time domain.
cw = T −1(Cw) (iv)
During the detection step, the same vector pnk, k = 0, 1 has to be generated via the secret
key. A comparator function is used in order to decide about the presence of the embedded
vector pn. This requires a perfect synchronization with the embedding block of samples.
1. Synchronization with the beginning of the embedding block cw;
2. Transformation of cw into embedding domain Cw = T (cw);
3. Correlation of Cw with pnk, k = 0, 1 by applying the comparator function Cτ:
Cτ (Cw, pn) = Cτ (Co, pn) + Cτ (αpn, pn) (v)
4. Detection of the transmitted bit, usually made on the sign of the comparator function:

sign(Cτ(Cw, pn)) > 0 for pn0, sign(Cτ(Cw, pn)) < 0 for pn1 (vi)
One of the widely used comparator functions Cτ is the linear correlation

Cτ(x, y) = ⟨x, y⟩ = (1/N) Σi x[i] y[i] (vii)
with the signal vectors x and y. The result of the correlation consists of the two
contributions Cτ (Co, pn) and Cτ (αpn, pn). The second term accumulates the contribution
of the pseudonoise sequence embedded in the different base functions, whereas the first
term represents the correlation, or interference, between the carrier signal and the
pseudonoise sequence. If the pseudonoise sequence is split into the two sequences
containing positive and negative elements, the correlation Cτ (Co, pn) can also be written
as:
Cτ(Co, pn) = (1/N) [ Σi C+o[i] − Σi C−o[i] ] = (µ+ − µ−) / 2 (viii)
with µ+ and µ− denoting the mean values. According to the central limit theorem, the
distribution of the means is normal if N is sufficiently large. Furthermore, the difference
of two normal distributions is also normal with N(µCτ , σCτ ). Since Co and pn are two
independent random variables, the mean µCτ and the variance σCτ can be calculated
according to
µCτ = E{Cτ(Co, pn)} = E{Co} E{pnk} = 0 (ix)

σ²Cτ = Var{(µ+ − µ−)/2} = (σ²µ+ + σ²µ−)/4 = σ²Co / N (x)
By using the model of the distribution function N(0, σCo/√N) in the unwatermarked case
and assuming a fixed weighting α := {α}, i = 1, . . . , N, of the pseudonoise sequence, the
probability distribution functions for the two sequences are normal densities of the same
width, centred at +α and −α respectively:

fpn0(t) = N(+α, σCo/√N)(t), fpn1(t) = N(−α, σCo/√N)(t) = fpn0(−t) (xi)
Errors in detection of the bits occur if Cτ (Co, pn) > Cτ (αpn, pn). Therefore, the false
alarm probability is obtained by
Pfa = P01 + P10 = p0 ∫(−∞,0) fpn0(t) dt + p1 ∫(0,∞) fpn1(t) dt (xii)
where P01 represents the probability that a 0 bit is transmitted and a 1 bit is detected,
and P10 the converse. Setting the a priori probabilities that the different bits are
transmitted to p0 = p1 = ½ and using the definition of the complementary error function erfc(x),
erfc(x) = 1 − erf(x) = (2/√π) ∫(x,∞) e^(−t²) dt (xiii)
this can be written with the threshold τ = α according to (xii) as

Pfa = P01 + P10 = ½ erfc( τ / (√2 σCτ) ) = ½ erfc( α√N / (√2 σCo) ) (xiv)
Different kinds of audio watermarking algorithms use different embedding domains and
representations of the transformed signal vector Co. Furthermore, the psychoacoustic
parameters have to correspond to the specific embedding domain in order to perform the
psychoacoustic weighting step. One of the first algorithms to use the masking
properties of the human auditory system, proposed by Tewfik et al., works in the Fourier domain.
The psychoacoustic weighting is performed by shaping the Fourier coefficients of the PN
sequence according to the masking threshold calculated by a psychoacoustic model (such
as the procedure outlined in section 3.2.1). Furthermore, this algorithm approximates the
temporal masking behaviour by using the envelope of the signal for the increase and a
decaying exponential for the decrease of the signal.
Spread spectrum is a widely used technique for different types of media given its high
robustness against signal manipulations. If a secret key is used to generate the
pseudonoise sequence pn, this algorithm does not need the original audio signal in order
to detect the embedded bits and is therefore a blind watermarking method, provided that
the synchronization requirement is met. The main disadvantage is the vulnerability
against desynchronization attacks. Furthermore, the length of the correlator has to be
sufficient in order to ensure small error probabilities, which is evident from (xiv).
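A sketch of the embedding and detection steps is given below for the simplest case, where the identity transform is used and the block of samples itself is modulated. The use of java.util.Random as the key-driven pseudonoise generator and the fixed weighting alpha are simplifying assumptions.

import java.util.Random;

// Sketch of spread-spectrum embedding and blind detection in the sample domain.
// pn0 = -pn1 = pn; detection uses the sign of the normalised linear correlation.
public final class SpreadSpectrum {

    static double[] pnSequence(long key, int n) {
        Random rng = new Random(key);          // stand-in for a key-driven PRNG
        double[] pn = new double[n];
        for (int i = 0; i < n; i++) pn[i] = rng.nextBoolean() ? 1.0 : -1.0;
        return pn;
    }

    static double[] embedBit(double[] block, boolean bit, long key, double alpha) {
        double[] pn = pnSequence(key, block.length);
        double[] out = new double[block.length];
        for (int i = 0; i < block.length; i++) {
            out[i] = block[i] + (bit ? -alpha : alpha) * pn[i];   // Cw = Co + alpha*pn_k
        }
        return out;
    }

    static boolean detectBit(double[] block, long key) {
        double[] pn = pnSequence(key, block.length);
        double c = 0.0;
        for (int i = 0; i < block.length; i++) c += block[i] * pn[i];
        c /= block.length;                      // normalised correlation
        return c < 0.0;                         // negative => pn1 was embedded => bit 1
    }
}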
3.2.5 The Patchwork Technique
The patchwork technique, first presented by Bender et al. in Techniques For Data
Hiding, [15] for embedding watermarks in images is a statistical method based on
hypothesis testing described in Techniques and Applications of Digital Watermarking and
Content Protection [21]. These methods use stochastic models relying on large sets,
which make them applicable for CD-quality audio data due to the large amount of
samples. The watermark encoding procedure uses a pseudorandom process to embed a
certain statistic into a data set which is detected in the reading process with the help of
numerical indexes (like the mean) describing the specific distribution. This method is
applied to magnitudes in the Fourier domain in order to spread the watermark in the time
domain and be more robust against random sample cropping operations.
The selection of the two subsets can be described by a permutation of the indices i = (1, . .
. , 2N) according to the bit to be embedded:
π = (a1, . . . , aN, b1, . . . , bN), with pn[ai] = +1, pn[bi] = −1 (xv)
Therefore, the watermarked block is obtained by
Cw[n] = Co[n] + ∆Co[n]pn[n], n = π[i], i = 1, . . . , 2N (xvi)
Cw = Co +W (xvii)
where the alteration of the different Fourier magnitudes is described by the vector ∆Co.
The test performed during detection in the patchwork algorithm is a difference of subsets
defined by the indexes a1, . . . , aN and b1, . . . , bN, which can be written as
Σi=1..N (Cw[ai] − Cw[bi]) = Σi=1..2N Cw[π[i]] pn[π[i]] = Cτ(Cw, pn) (xviii)
Therefore, the patchwork technique in this form is equivalent to the linear correlation
comparator function in the spread-spectrum technique as described earlier.
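The detection statistic of equation (xviii) can be sketched as follows; the key-driven index selection via java.util.Random and the use of Fourier magnitudes as the input values are assumptions made for illustration (the sketch also assumes 2n is comfortably smaller than the number of magnitudes).

import java.util.Random;

// Sketch of the patchwork test statistic: a key-defined permutation selects two
// disjoint subsets of Fourier magnitudes and their difference of sums is returned.
public final class Patchwork {

    public static double statistic(double[] magnitudes, long key, int n) {
        int[] idx = new int[2 * n];
        boolean[] used = new boolean[magnitudes.length];
        Random rng = new Random(key);
        int count = 0;
        while (count < 2 * n) {
            int i = rng.nextInt(magnitudes.length);
            if (!used[i]) { used[i] = true; idx[count++] = i; }   // draw distinct indices
        }
        double diff = 0.0;
        for (int i = 0; i < n; i++)     diff += magnitudes[idx[i]];   // subset A (pn = +1)
        for (int i = n; i < 2 * n; i++) diff -= magnitudes[idx[i]];   // subset B (pn = -1)
        return diff;    // a large positive value suggests the watermark is present
    }
}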
3.3 Development Methodology
Whilst implementing the project I planned to stick to an evolutionary development model
whose stages consisted of expanding increments of an operational software product. This
enabled me to deliver a working solution of at least one watermark implementation fairly
quickly.
Critics of this software model argue that it veers towards a 'code and fix' model, so it is
important to plan its evolution and provision wisely for unplanned events.
Each stage is implemented and tested before progressing to the next. This provides a
trusted base to isolate bugs faster:
First phase of development
This stage is concerned with creating the end-to-end framework required to implement a
watermark and a very simple watermarking technique itself.
From a component level this involves:
- The creation of a generic interface to represent a watermark and its associated
operations.
- A component that manages the application and detection of a watermark given an
audio file and a given implementation of the watermark interface.
- An implementation of the Watermark interface that uses the LSB algorithm.
- A test class that applies a watermark to an input file and then attempts to detect
the same watermark within the resultant file.
Subsequent phases of development
Each additional stage of development was concerned with implementing a new version of
the Watermark interface described above.
3.4 Test Plan
3.4.1 Module Testing
The modules were designed to be self-contained and therefore able to be tested separately
by both an individual test harness and by hand using a debugger and visualisation tools.
For testing purposes, a class that takes as input both a Watermark and an audio file will
insert the watermark into the audio file and then scan the resultant file for the
corresponding watermark. If it cannot detect the freshly inserted watermark there is an
error in the implementation.
In the case of the Maths functions, it will be possible to test the validity of some simple
identity relationships with random data (a test sketch follows the identities), such as:
fastFourierTransform(inverseFastFourierTransform( x )) = x
convolution(x, [1, 0, 0...]) = x
cepstrum(convolution(x, y)) = cepstrum( x ) + cepstrum( y )
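A sketch of how the first identity might be checked with JUnit is shown below; MathsUtil and its method names are hypothetical stand-ins for the project's own maths module, and the signals are treated as real-valued arrays for simplicity.

import static org.junit.Assert.assertEquals;

import java.util.Random;
import org.junit.Test;

// Sketch of a module test for the FFT/IFFT identity on random data.
public class MathsUtilTest {

    @Test
    public void fftOfInverseFftIsIdentity() {
        Random rng = new Random(42);
        double[] x = new double[1024];                     // power-of-two length
        for (int i = 0; i < x.length; i++) x[i] = rng.nextDouble() * 2 - 1;

        double[] roundTrip = MathsUtil.fastFourierTransform(
                MathsUtil.inverseFastFourierTransform(x)); // hypothetical helpers

        for (int i = 0; i < x.length; i++) {
            assertEquals(x[i], roundTrip[i], 1e-9);        // allow floating-point error
        }
    }
}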
3.4.2 End To End Testing
For end-to-end testing, a class that takes as input both a Watermark and an audio file will
insert the watermark into the audio file and then scan the resultant file for the
corresponding watermark. If it cannot detect the watermark in an audio file that has just
had a watermark inserted, then there is an error in the implementation.
3.5 Evaluation Data
To evaluate the different watermark techniques it is necessary to supply different types of
input files to obtain a balanced view of how the watermark copes with a variety of styles.
In mass usage, the permutations of sounds that possibly exist mean that testing cannot be
exhaustive, but by covering a handful of different sound types, we should receive a good
reflection.
The sounds chosen and the reasons why are as follows:
• A single vocal. A one layered sound that the human ear easily recognises.
Volume levels fluctuate greatly in a single vocal since pauses in between words
tend to fall to very low volumes. In a sound as basic as this, audible marks
should be relatively easy to recognise.
• A drum beat. The drum beat has some of the qualities of the single vocal in that
not much is going on in the sound so it is easy to pick out the individual elements
of the sound. The test posed for the watermark in this case is whether it can
withstand a constant rhythm with identical beats hitting in exact timings.
• Classical music test. This particular classical snippet is heavy on treble and is the
longest sample used, testing the algorithms' ability to deal with sounds that really
utilise the windowing technique. The subtleties of this classical piece and its quiet
nature make the effects on an actual piece of music clearer.
• Loud full band test. This piece should withstand the most due to its complexity
and layering. With many layers of drums, bass, vocals and guitars, the watermarks
should be relatively easy to disguise.
3.6 Attacking Techniques
As mentioned in Chapter 2, there are two types of attack, intentional and unintentional.
As stated in [5], some sources like to define robust watermarks as those resistant to
unintentional attacks, whilst secure watermarks survive intentional attacks as well, making robust
watermarks a subclass of their secure counterparts. In the image domain, set benchmarks
exist as a set measure of the durability of the watermark. As mentioned in Chapter 2,
despite these benchmarks being designed for images, it doesn’t take much intuition to see
how they can be adapted for use in the audio domain. Take the Stirmark Benchmark
[10] for example. Stirmark was developed by Fabien Petitcolas, a well-known developer
in the field of watermarking. There also exists Checkmark Benchmarking [11] and
Optimark [12] which perform similar tasks but with extra benchmarks tested against.
With audio watermarking, more specific attacks can be implemented. Using these
adaptable attacks alongside some suggested in Techniques and Applications of Digital
Watermarking and Content Protection [21], a set of attacks were produced to test the
robustness of the watermark.
3.6.1 Cropping
Cropping is performed by shortening the sound file or by removing an arbitrary piece
from within the waveform. The crop could occur anywhere within a track but is most
likely to happen at the start or the end of the song, to avoid disrupting the flow of the
sound. The following setting was applied to both the Echo Hiding and LSB Coding
watermarks as a form of attack: the original sound was 2 minutes, 24 seconds and 520
milliseconds long, and 1 second, 315 milliseconds was cropped off the end of the file,
disrupting the layout of the file without seriously disfiguring the sound inside.
3.6.2 Noise Reduction
Noise reduction is the process of removing specific sound frequencies within a waveform.
Using an envelope shape to specify which frequencies to remove means that there are
many different ways to apply noise reduction to strip out the specific sounds you want
removed.
3.6.3 High and Low Pass Filtering
High pass filters block low pitch frequencies but allow high-pitched frequencies to pass.
They can remove deep rumbling noise or unwanted sounds below a given cut-off
frequency. A high pass filter was applied at 250Hz, only allowing frequencies above
this through. Applying the filter at this level meant the song was not affected too much
by the reduction in sound.
Low pass filters block high-pitched frequencies (treble), but allow low pitched
frequencies (bass) to pass. They can be used to reduce high-end hiss noise or remove
unwanted sounds above the given cut-off frequency. By applying a low pass filter at
4000Hz to both sounds, the effect on the actual sound itself is minimised.
3.6.4 Lossy Compression
Lossy compression is probably the most common attack a watermark will have to
withstand whilst also being the most difficult to withstand. Lossy compression is
compression where the original image cannot be perfectly retained from the compressed
form as opposed to lossless compression. The most common form of lossy compression
today is MPEG Layer 3, more commonly known as MP3 compression. MP3
compression works because uncompressed audio stores more data than the human brain
can actually process. It utilises the fact that if two sounds occur together but one is
much louder than the other, your brain may never perceive the quieter signal, and that
your ears are more sensitive to some frequencies than others.
MP3 encoding tools analyse the incoming source signal and compare patterns to
psychoacoustic models stored in the encoder itself. The encoder can then discard most of
the data that doesn't match the stored models. MP3 frames can also borrow space from an
adjoining frame if it has excess capacity.
To achieve compression of approximately 1000% efficiency and still be left with a sound
very similar to the original, MP3s go through a lot of processing. With these measures in
mind, MP3 encoding is likely to disturb the entire layout of the sound file.
3.6.5 Addition of noise
Certain noises are not perceived by the human ear, especially if they are quiet, consistent
and continuous. Exploiting this, noise can be added consistently throughout the original
file at a level barely audible to the human ear. The noise is mixed with the file from
start to finish using an arbitrary white noise sample.
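A minimal sketch of this attack is shown below; the 0.5% noise level is an illustrative
assumption, chosen to stay barely audible.

    import java.util.Random;

    // Mixes low-level white noise into 16-bit PCM samples, start to finish.
    public class NoiseAttack {
        static void addNoise(short[] samples, long seed) {
            Random rng = new Random(seed);
            double level = 0.005 * Short.MAX_VALUE; // roughly 0.5% of full scale
            for (int i = 0; i < samples.length; i++) {
                double noisy = samples[i] + (rng.nextDouble() * 2.0 - 1.0) * level;
                // Clamp so the added noise can never clip or wrap around.
                samples[i] = (short) Math.max(Short.MIN_VALUE,
                        Math.min(Short.MAX_VALUE, Math.round(noisy)));
            }
        }
    }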
3.6.6 Changing The Sampling Rate
By changing the number of samples used to represent the sound, what is left is essentially
a different signal. Raising the sample rate alone changes nothing, since no genuinely new
samples can be introduced where none were recorded. In this test the sample resolution
was instead halved from 16-bit to 8-bit in both sounds (strictly a change of sample size
rather than rate), leaving the waveform half as precisely defined.
3.6.7 Pitchshifting
By changing the pitch of the sound sample by small amounts, the listener should notice
little difference, if any. Raising the pitch shortens the length of the sample, since
pitchshifting of this kind also changes the tempo of the sound.
3.6.8 Volume Reduction
By making slight changes to the volume, the quality of the recording will suffer slightly,
but if the change is small enough the difference should be negligible. Increasing the
volume risks distorting the highest peaks of the recording when they exceed the maximum
representable level, an effect known as clipping. It is therefore safest to reduce the
volume of the entire sound slightly, so the difference is less noticeable.
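A minimal sketch of the volume change follows; a gain of 0.75 corresponds to the
three-quarters level mentioned in the results later.

    // Scales 16-bit PCM samples by a gain factor; gain < 1 reduces the volume.
    public class VolumeAttack {
        static void scaleVolume(short[] samples, float gain) {
            for (int i = 0; i < samples.length; i++) {
                int v = Math.round(samples[i] * gain);
                // Clamp in case gain > 1 pushes a peak past full scale (clipping).
                samples[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
            }
        }
    }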
3.6.9 Sound Compression
This is a dynamics effect that makes use of compressors, limiters, expanders and gates.
The process is controlled by three parameters: the ratio, the threshold and the smoothness.
The ratio specifies the compression or expansion ratio. This value was set at 90% to
ensure the difference made to the file itself was minimal.
The threshold specifies the envelope level at which the expander or compressor activates;
compressors change the volume of all sounds above that level. Depending on the
smoothness setting, the threshold may have to be set much lower than expected. In this
case it was set at 0.250.
The smoothness specifies how quickly the compressor moves from one volume level to
the next, and how quickly it activates. A value of 0% means volumes change instantly,
which can cause rough distortion in sections of audio that hover around the threshold
level. A value of 100% means volumes change gradually over 100ms. With a high
smoothness setting the threshold has to be reduced, because the higher setting makes the
envelope detector respond more slowly to changes in the sound, resulting in a lower
envelope range. As a compromise, 50% was chosen for the smoothness.
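A minimal sketch of a compressor built around these three parameters follows. The
envelope smoothing is a simplification of the behaviour described above, and the
parameter values mirror the test settings.

    // Feed-forward compressor over normalised float samples in [-1, 1].
    public class Compressor {
        static void compress(float[] samples, float threshold, float ratio, float smoothness) {
            float env = 0f; // smoothed envelope follower
            for (int i = 0; i < samples.length; i++) {
                float level = Math.abs(samples[i]);
                env = smoothness * env + (1f - smoothness) * level;
                if (env > threshold) {
                    // Scale only the portion of the envelope above the threshold.
                    float gain = (threshold + (env - threshold) * ratio) / env;
                    samples[i] *= gain;
                }
            }
        }

        public static void main(String[] args) {
            float[] samples = new float[44100]; // e.g. one second of audio
            compress(samples, 0.250f, 0.90f, 0.50f); // the settings used in the tests
        }
    }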
4. Implementation
4.1 Framework Overview
The framework overview corresponds to the first phase of development as described in
Development Methodology (section 3.3):
Watermark Interface
Provides a generic interface to represent a watermark and its associated operations.
WatermarkImpl
Encapsulates common functionality all watermark implementations require.
WatermarkedAudioInputStream
The WatermarkedAudioInputStream class is a subclass of
javax.sound.sampled.AudioInputStream.
The AudioInputStream is used for both reading and writing audio. The extended version
of this class enables users to read audio, apply a watermark and save the watermarked
audio to a standard audio format.
Message
The message class can determine which bit of the watermark message needs to be
inserted for the current block of audio data.
Audio Format
This class encapsulates information about the underlying audio format.
LSBDetector
Detects LSB watermarks in input audio streams.
LSBWatermark
Inserts LSB watermarks into the underlying audio stream.
[Figure: Framework Overview UML class diagram]
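To make the LSB approach concrete, a minimal sketch of least-significant-bit embedding
follows. It assumes 16-bit little-endian PCM (the low byte of each sample comes first)
and is an illustration of the idea only, not the project's actual LSBWatermark
implementation.

    // Writes one message bit into the least significant bit of each 16-bit sample.
    public class LsbSketch {
        static void embed(byte[] pcm, boolean[] messageBits) {
            for (int bit = 0; bit < messageBits.length && 2 * bit + 1 < pcm.length; bit++) {
                int lo = 2 * bit; // low-order byte of sample 'bit' holds the sample's LSB
                pcm[lo] = (byte) ((pcm[lo] & ~1) | (messageBits[bit] ? 1 : 0));
            }
        }

        static boolean[] extract(byte[] pcm, int messageLength) {
            boolean[] bits = new boolean[messageLength];
            for (int bit = 0; bit < messageLength && 2 * bit + 1 < pcm.length; bit++) {
                bits[bit] = (pcm[2 * bit] & 1) == 1;
            }
            return bits;
        }
    }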
4.2 Graph Visualisation Tool
The graph visualisation applet was developed to help visualise the processes as they
occurred. It proved a valuable debugging tool: the watermark operations currently being
applied could be inspected by eye, making it possible to identify where a problem lay.
On the following graphs, the x axis is measured in samples in the time domain, or in
quefrency in the cepstrum domain. The two scroll bars were part of a test harness
application, where they were used when calculating the power density spectrum. The
debug output is controlled by viewGraph.
In the following graph, a visualisation of echo hiding can be seen. Notice the large peak
on the right: it sits in one of two possible positions, representing a 1 or a 0 bit. What is
shown is a portion of the cepstrum produced by the echo hiding algorithm; the detector
searches for these strong peaks in the sound pattern in order to decode the watermark.
The following graph is an example where the echo has not been found.
4.3 Fast Fourier Transform
The Fast Fourier Transform is a well-studied algorithm, so implementing one from
scratch would be a poor use of time. Instead I decided to obtain pre-written source code
for the FFT. Searching for such code in Java proved fruitless, leaving two alternatives:
write my own or port an FFT from C code.
I decided to port the Numerical Recipes implementation of the FFT from Numerical
Recipes in C [23]. This implementation is well known and is relatively straightforward to
adapt to the Java language.
A peculiarity was noticed in the specific implementation of the Numerical Recipes FFT.
In reference to the realft packing, it states:
"Calculates the Fourier transform of a set of n real-valued data points. Replaces this
data (which is stored in array data[1..n]) by the positive frequency half of its complex
Fourier transform. The real-valued first and last components of the complex transform
are returned as elements data[1] and data[2], respectively. n must be a power of 2. This
routine also calculates the inverse transform of a complex data array if it is the
transform of real data. (Result in this case must be multiplied by 2/n.)"
From this description it can be noted that the real data[1..n] passes through realft and
becomes the positive frequency half of its complex Fourier transform. The real-valued
first and last components are returned as data[1] and data[2] because the negative
frequency half consists of complex conjugates.
All arithmetic performed on arrays in the format produced by realft therefore had to be
aware of this packing: for example complexMultiply2, logModulus, and modulusSqr, the
last of which also unpacks its result into a plain list of real values.
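As an illustration of such packing-aware arithmetic, a sketch of a complex multiply over
this format follows, written against 0-based Java arrays as used by the project's FFT
wrappers; the project's own complexMultiply2 is assumed to behave along these lines.

    // Pointwise complex multiply of two spectra in the realft packed format:
    // a[0] holds the purely real DC term, a[1] the purely real Nyquist term,
    // and each following pair holds (re, im) of one positive frequency.
    static void multiplyPacked(float[] a, float[] b) {
        a[0] *= b[0]; // DC component
        a[1] *= b[1]; // Nyquist component
        for (int i = 2; i < a.length; i += 2) {
            float re = a[i] * b[i] - a[i + 1] * b[i + 1];
            float im = a[i] * b[i + 1] + a[i + 1] * b[i];
            a[i] = re;
            a[i + 1] = im;
        }
    }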
4.4 Watermark Implementations
[Figure: Watermark Framework UML class diagram]
4.5 Implementation Review
The watermarking techniques implemented cover LSB coding, echo hiding and the
patchwork technique, from the types discussed in Chapter 2. Before any attack was
implemented, certain strengths and weaknesses were already apparent from the
algorithms. Implementing the LSB algorithm is a relatively straightforward task, since it
involves no actual sound manipulation, only the placing of binary values.
For all other techniques, explicitly coding the algorithms gave a good insight into the
true workings of the watermark. Hand-coding them was intended to reveal further
reasons for the weakness of the watermarks. Although this process did illuminate how
the watermarks work and gave a better understanding of their advantages and
disadvantages, no epiphanies occurred.
At present the block size used is 8192 samples. The echo delays are 5500 and 4400
samples respectively (roughly a tenth of a second at a 44100Hz sample rate), which
means only about half of each 8192-sample block contains an echo. Had the delays been
much shorter, it would have been far more difficult to extract the peaks, because the
cepstrum is much noisier at smaller echo delays; had the echo been much louder, it
would have been too audible. Both factors add to the uncertainty when detecting the
watermark. It is possible to wrap the echo so that the last part of the block is heard
first, but the resulting sound would be strange and noticeable, so no echo wrapping was
used. The effect of silences in the sound adds further uncertainty.
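For reference, the embedding step can be sketched as below: a delayed, attenuated copy
of the block is added to itself, with the delay chosen by the message bit. The delays are
those given above; the method name and amplitude parameter are illustrative rather than
the project's exact API.

    // Embeds one bit into a block by echo hiding; iterating backwards ensures
    // each added echo is taken from the unmodified part of the block.
    static void embedEchoBit(float[] block, boolean bit, float echoAmplitude) {
        int delay = bit ? 5500 : 4400; // one delay per bit value
        for (int i = block.length - 1; i >= delay; i--) {
            block[i] += echoAmplitude * block[i - delay];
        }
    }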
A workaround was implemented when a problem arose with the program failing to
terminate: the BugFix class ensures the javax.sound.sampled API's thread system is shut
down, allowing the program to exit. As far as could be determined, the underlying
behaviour is a bug in the API.
Due to a lack of time, the implementation of the patchwork technique did not reach a
stage worthy of testing; however, the framework for the algorithm was built, following
the published description of its development.
When finding the maximum audible volume, the first two steps noted in Chapter 3
(calculation of the power spectrum; identification of the tonal (sinusoid-like) and
non-tonal (noise-like) components) were successfully implemented, but due to time
constraints the process went no further.
Detailed commenting of the source code thoroughly explains the workings of the
program.
5. Evaluation & Test Results
5.1 Attacking Techniques
As described in Chapter 3, a set of attacks was defined to be applied to the various
watermarking methods to test their robustness. This section collects those results and
presents them together with analysis and reasoning.
The following summary sets out results and predictions of watermarking robustness for
each technique against each attack. Since not all watermarks were fully implemented,
the entries for the patchwork and spread spectrum algorithms are estimates, made by
referring to Enhanced Spread Spectrum Watermarking of MPEG-2 AAC Audio [20] and
drawing conclusions from the points made there.
LSB
- Cropping: errors occur from the crop onwards.
- All other attacks (noise reduction, high/low pass filtering, lossy compression, addition of noise, changing the sample rate, pitchshifting, volume reduction, sound compression): watermark unrecognisable.

Echo Hiding
- Cropping: errors occur from the crop onwards.
- Noise reduction: errors grew in proportion to the amount of reduction.
- High/low pass filtering: the more filtering, the more errors, depending on the equalisation of the sound.
- Lossy compression: removes the watermark completely if the echo is inaudible; no change if audible; errors increase around the audibility mark.
- Addition of noise: the more silence in the sound, the more errors.
- Changing the sample rate: no effect.
- Pitchshifting: errors grow with the amount of shift, since higher-frequency definition is lost.
- Volume reduction: errors grow with the amount of reduction, since amplitude definition is lost.
- Sound compression: errors increase as compression increases.

Patchwork (predicted)
- Cropping: errors occur from the crop onwards.
- Noise reduction: watermark unrecognisable.
- High/low pass filtering: some errors caused.
- Lossy compression: watermark unrecognisable.
- Addition of noise: watermark unrecognisable.
- Changing the sample rate: causes some errors.
- Pitchshifting: errors grow with the amount of shift, since higher-frequency definition is lost.
- Volume reduction: errors grow with the amount of reduction, since amplitude definition is lost.
- Sound compression: causes some errors.

Spread Spectrum (predicted)
- Cropping: errors occur from the crop onwards.
- Noise reduction: errors grew in proportion to the amount of reduction.
- High/low pass filtering: the more filtering, the more errors, depending on the equalisation of the sound.
- Lossy compression: removes the watermark completely if it is inaudible; no change if audible; errors increase around the audibility mark.
- Addition of noise: the more silence in the sound, the more errors.
- Changing the sample rate: no effect.
- Pitchshifting: no effect.
- Volume reduction: errors grow with the amount of reduction, since amplitude definition is lost.
- Sound compression: errors increase as compression increases.
5.1.1 Cropping
In the case of both the LSB encoding and the echo hiding, the watermark remained
untouched up to the point where the crop occurred. Since the file is read in
chronological order, problems only begin at the break in the flow, after which the
detector can no longer find the watermark.
5.1.2 Noise Reduction
In the case of the LSB encoding, the watermark became unrecognisable, the noise
reduction seeming to wipe out all trace of it. The noise reduction also caused problems
for the echo hiding, since the signal was weakened by the reduction. Naturally, the
greater the severity of the reduction, the more the signal weakens, eventually eliminating
the watermark entirely. With this particular echo hiding algorithm, it did not take much
noise reduction to start distorting the results, with most envelope shapes rendering the
recovered watermark effectively useless for proving ownership.
5.1.3 High and Low Pass Filtering
In the case of the LSB encoding, the watermark became unrecognisable. For the echo
hiding, the effect of the filtering depended on the content of the sound itself. In the
classical music piece, the low pass filter caused fewer errors, since the sound was rich in
treble in its original form. This contrasted with the results for the high pass filter, where
the echo was completely eradicated, though the sound was also left quite warped; it
would be rare for someone to apply a filter that changed the sound so much. If the echo
were inaudible, however, the filters would be among the most efficient ways of removing
it, since they could cancel the relevant frequencies.
5.1.4 Lossy Compression
Lossy compression was the toughest attack applied to the sounds. In the case of the
LSB encoding, the watermark became unrecognisable. Echo hiding did not fare well
against it either: if the echo was inaudible, the process wiped out any trace of it
instantly, since discarding imperceptible detail is one of the main features of lossy
compression. When the echo was audible, some results could be obtained from the
detector, but even at a relatively high level the watermark was still not sufficient to give
a useful proof of ownership.
5.1.5 Addition of noise
In the case of the LSB encoding, the watermark became unrecognisable. During loud
moments in the audio the echo hiding fared well, returning positive results, but in
quieter moments the noise took effect. With a consistently loud sound like the rock
song, the watermark could be recovered to a reasonable level, but its use is limited.
5.1.6 Changing The Sampling Rate
In the case of the LSB encoding, the watermark became unrecognisable. It made no
difference to the echo hiding.
5.1.7 Pitchshifting
In the case of the LSB encoding, the watermark became unrecognisable. Pitchshifting
also made the echo hiding unrecognisable, though this was due more to the change in
the length of the file than to the change of pitch itself.
5.1.8 Volume Reduction
In the case of the LSB encoding, the watermark became unrecognisable. Small volume
changes made no difference to the echo hiding, but once levels dropped to around three
quarters, the errors became a lot more apparent.
5.1.9 Sound Compression
In the case of the LSB encoding, the watermark became unrecognisable. The sound
compression caused problems for the echo hiding, since the signal was weakened.
Naturally, the greater the severity of the compression, the more the signal weakens,
eventually eliminating the watermark entirely; with this particular echo hiding algorithm,
it did not take much compression to start distorting the results.
5.2 Evaluation Summary
Each of the watermarks discussed has been referred to as if it were a single fixed
technique, but most of the methods described are concepts adapted from implementations
drawn from various sources. It is not only the delay and audibility of the echo that can
be varied while still leaving a functioning echo hiding technique; the contrast between
the echo hiding results and the LSB results shows this.
With the LSB, the exact expected results were obtained every time, without a single
rogue value, yet it failed to remain robust against anything except, partially, a crop.
There are ways to improve the robustness of the LSB, such as moving the signal to a
more significant bit, which would probably remain inaudible; but for such a haphazard
method, faring significantly worse than the other watermarking techniques, there is little
point.
The echo hiding results were, on average, worse than expected. In principle it should
not be too difficult to predict how an echo will fare against these various forms of
attack, but with several free parameters, and a detection method that is far from
clear-cut, the scope for unexpected outcomes increases.
In general, the loud full-band song fared better than all the other samples. It was the
only piece not to contain a moment of absolute silence, which seemed to make a
difference, since the echo hiding watermark writer has nothing to work with in moments
of silence.
The input data of size N was zero-padded to a block of size 2N before performing
convolution. The advantage is that, where the echo is present, it is an exact delayed
copy of the source signal, making it less audible. The disadvantage is that the echo is
not continuously present; it is absent at the beginning of each block, which is believed to
make it harder for the detector to find. Padding also increases computation time,
because a list of size 2N is transformed for audio data of size N, roughly doubling the
cost.
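A sketch of this padded convolution step, reusing the FFT helpers and packed multiply
from the appendix sources, is shown below; it assumes the response is no longer than the
block.

    // FFT convolution with 2N zero-padding to avoid circular wrap-around.
    static float[] convolve(float[] block, float[] response) {
        float[] a = new float[block.length * 2]; // second halves stay zero
        float[] b = new float[block.length * 2];
        System.arraycopy(block, 0, a, 0, block.length);
        System.arraycopy(response, 0, b, 0, response.length);
        FFT.fastFourierTransform(a);
        FFT.fastFourierTransform(b);
        Channels.multiplyComplex2(a, b); // multiply in the packed frequency domain
        FFT.inverseFastFourierTransform(a);
        return a; // first N samples plus the tail that spills into the padding
    }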
Despite this, even if the techniques had surpassed expectations, their practical use would
still be severely limited. Most of the attacks implemented represent ordinary processing
rather than malicious tampering; if a watermark cannot withstand these, it is difficult to
see its commercial use.
6. Conclusion
Watermarking in the audio domain seems, as a concept, to be fundamentally flawed, a
view shared by many leading steganography researchers and similar to that found in Why
Watermarking Is Nonsense [24]. The reasons can be summed up as follows. If a
watermark exists and is inaudible, an all-pass filter or MP3 compression can remove it
with ease. If the watermark is audible, its uses are already limited, since in most cases
the user will not want the sound changed by the watermark; audibility also makes it
easier to locate the watermark itself, giving an attacker an advantage when trying to
remove it.
Can a watermark really be called secure? Can it really be claimed that we will reach a
stage where the information hidden within the medium is truly hidden? Security cannot
really be measured: all security systems are built on assumptions and measured
accordingly, and most real-world assumptions do not stand on their own, tending to be
invalidated as technology and its users evolve. The problem generalises to security as a
whole. One can ask how hard it is to break the weakest part of the system with existing
technology and a reasonable amount of resources, and how hard it is to invalidate the
system's assumptions; but the entire defence rests on assumptions about the attacks. As
long as watermarks exist, there will be people attempting to break them, and in any
system that is not trivially simple, its complexity makes it impossible to state definitively
what the weakest part is.
One of the main forms of copyright protection being pushed by the music industry is
Digital Rights Management (DRM). This involves a container that carries encrypted
audio; the rights are distributed within the audio file itself and are enforced by the
software player at decode time. Under this system many of the traditional watermark
attacks are foiled, since altering the file usually renders it useless. However, most DRM
systems require proprietary infrastructure to support the encoding, licence management
and decoding process, whereas watermarks have the advantage of being format
independent, as they are embedded within the essence itself.
Watermarking does not stand alone as the future of audio protection. Essentially it is a
technique inspired by its sibling in the visual domain; the significant overlap between the
two fields means the concept can be transferred into the audio domain with relative ease,
but it is the differences between the two where it falls down.
For watermarking to be taken seriously, its underlying principles need to be revised to
address the evident shortcomings; and with rival processes fulfilling the same goals, one
has to wonder how practical further development really is.
7. Further Development
7.1 Potential Improvements
So far the LSB coding and echo hiding watermarking techniques have been successfully
implemented. As discussed earlier, many more algorithms could potentially be tested,
including variations of those already attempted, such as the echo hiding technique [15].
The techniques currently in use also have room for improvement with regard to their
robustness. With more time, further algorithms with a range of characteristics could
have been developed to establish which is best suited to which situation; from this, it
might even have been possible to suggest improvements to current techniques.
More extensive testing could have revealed more about the particular strengths and
weaknesses of certain watermarks. Extending the number of source sounds tested would
have given a broader picture of which kinds of sound file suit which watermarks.
Although it is impossible to test sounds and combinations of sounds exhaustively, more
tests would have meant more assured results. More parameters could also have been
applied to the attacking techniques, to find the levels at which the existing watermarks
succeed or fail most often.
Specific improvements could also have been made to the algorithms themselves.
Examples of variations on the echo hiding technique can be found in New Echo
Embedding Technique for Robust and Imperceptible Audio Watermarking [26]. One
major drawback of echo hiding in general is its vulnerability to malicious attack, since
the information can be detected by anyone without a secret key: an attacker who knows
the underlying algorithm can exploit this to apply a removal attack. A
possible countermeasure against the easy determination of the delay time is the spreading
of the echo over the time axis. This is accomplished by substituting the Dirac delta
function in the response function with a pseudonoise (PN) sequence. Instead of
calculating the autocorrelation in the cepstrum domain, despreading of the echo is
performed by cross-correlation of the cepstral signal with the PN sequence generated
from a secret key.
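A sketch of such a time-spread echo kernel is given below; the kernel and PN lengths
and the amplitude are illustrative, and the caller must ensure the PN sequence fits within
the kernel after the delay.

    // Builds an echo kernel whose single delayed impulse is replaced by a
    // +/-1 pseudonoise sequence derived from a secret key.
    public class TimeSpreadEcho {
        static float[] kernel(int kernelLength, int delay, int pnLength,
                              float amplitude, long secretKey) {
            float[] kernel = new float[kernelLength];
            kernel[0] = 1f; // the direct (unechoed) path
            java.util.Random pn = new java.util.Random(secretKey); // detector regenerates this
            for (int i = 0; i < pnLength; i++) {
                kernel[delay + i] = amplitude * (pn.nextBoolean() ? 1f : -1f);
            }
            return kernel;
        }
    }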
The lack of usability is a significant flaw in the program. Usability was not a major
consideration for the project, since its purpose was simply to collect information about
watermarking techniques. With a graphical user interface (GUI), the program would
become accessible to other users, making the retrieval of results easier and more
efficient. It could then potentially be used to apply one's own watermarks to one's own
media, using the testing tools to decide which watermark is the right fit; it would also
make implementing watermarks for testing easier.
8. Bibliography
[1] Gillies, James. Cailliau, Robert. How the Web Was Born: The Story of
the World Wide Web (Popular Science). Oxford Paperbacks. 0192862073.
[2] Alderman, John. Sonic Boom: Napster, P2P and the Battle for the Future of
Music. Fourth Estate. 1841155136.
[3] Cox, Ingemar J. Miller, Matthew L. Bloom, Jeffrey A. Digital Watermarking.
Morgan Kaufmann. 1558607145.
[4] Kozaczuk, Wladyslaw. Enigma: How the German Machine Cipher Was Broken
and How It Was Read by the Allies in World War Two. Univ Publications of
America. 0890935475.
[5] Anderson, Ross J. Information Hiding: 1st International Workshop, Cambridge,
U.K., May 30-June 1, 1996: Proceedings (Lecture Notes in Computer Science,
1174). Springer Verlag. 3540619968.
[6] Digital Watermarking World. (http://www.watermarkingworld.org/)
[7] Katzenbeisser, Stefan. Petitcolas, Fabien A P. Information Hiding Techniques for
Steganography and Digital Watermarking. Artech House, 1580530354.
[8] Matlin, Margaret W. Foley, Hugh J. Sensation and Perception. Pearson Allyn &
Bacon. 0205263828.
[9] Petitcolas, Fabien A P. Anderson, Ross J. Kuhn, Markus G. Attacks on Copyright
Marking Systems. (http://www.cl.cam.ac.uk/~fapp2/publications/ih98-
attacks.pdf)
[10] Petitcolas, Fabien A P. Stirmark Benchmark 4.0.
(http://www.petitcolas.net/fabien/watermarking/stirmark/).
[11] Pereira, Shelby. Checkmark Benchmarking.
(http://watermarking.unige.ch/Checkmark/index.html).
[12] Argyriou, Vasilis. Optimark. (http://poseidon.csd.auth.gr/optimark/)
[13] Hsieh, Ming-Shing. Tseng, Din-Chang. Huang, Yong-Huai. Hiding digital
watermarks using multiresolution wavelet transform. IEEE Transactions
on Industrial Electronics, 48(5):875–882, 2001.
[14] Hwang, Min-Shiang. Chang, Chin-Chen. Hwang, Kuo-Feng. A Watermark
Technique Base On One-Way Hash Functions. IEEE Transactions on Consumer
Electronics, Volume 45 Issue 2. 0098-3063.
[15] Morimoto, N. Bender, W. Gruhl, D. Lu, A. Techniques For Data Hiding. IBM
Systems Journal Vol. 35, No. 3&4, 1996 - MIT Media Lab. G321-5608.
[20] Cheng, Samuel. Yu, Heather. Xiong, Zixiang. Enhanced Spread Spectrum
Watermarking of MPEG-2 AAC Audio. (ICASSP). Vol. 4, pp. 3728–3731.
[21] Arnold, Michael. Wolthusen, Stephen D. Schmucker, Martin. Techniques and
Applications of Digital Watermarking and Content Protection. Artech House.
1580531113.
[22] Lindley, Craig. Digital Audio With Java. Prentice Hall PTR. 0130876763.
[23] Press, William H. Flannery, Brian P. Teukolsky, Saul A. Vetterling, William T.
Numerical Recipes in C : The Art of Scientific Computing. Cambridge University
Press. 0521431085.
[24] Herley, C. Why Watermarking Is Nonsense. IEEE Signal Processing Magazine,
pp. 10-11, Sep. 2002.
[25] Gruhl, Daniel. Bender, Walter. Lu, Anthony. Echo Hiding. Information Hiding:
First International Workshop, Vol. 1174 of Lecture Notes in Computer Science,
Cambridge, Springer-Verlag, pp. 295-315.
[26] Oh, Hyen-O. Seok, Jong-Won. Hong, Jin-Woo. Youn, Dae-Hee. New Echo
Embedding Technique for Robust and Imperceptible Audio Watermarking.
International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Orlando, pp. 1341–1344.
[27] Swanson, Mitchell D. Zhu, Bin. Tewfik, Ahmed H. Transparent robust image
watermarking. In 1996 SPIE Conf. on Visual Communications and Image Proc.,
volume III, pages 211–214, 1996.
[28] Tao, Bo. Dickinson, Bradley. Adaptive watermarking in the DCT domain. In
International Conference on Acoustics, Speech, and Signal Processing, ICASSP '97,
April 1997.
[29] Fridrich, J. Image watermarking for tamper detection.
(http://citeseer.nj.nec.com/fridrich98image.html), 1998.
[30] Wang, James Ze and Wiederhold, Gio. Wavemark: Digital image watermarking
using Daubechies' wavelets and error correction coding. In Proceedings of SPIE,
volume 3528, pages 432–439, November 1998.
[31] Hsu, Chiou-Ting and Wu, Ja-Ling. Hidden digital watermarks in images. IEEE
Transactions on Image Processing, 8(1):58–68, 1999.
9. Appendices
9.1 Additional Algorithm Descriptions
9.1.1 Phase Coding
Approaches that embed the watermark into the phase of the original signal exploit the
fact that the human auditory system has a low sensitivity to relative phase changes, as
stated in [21].
This method was presented by Walter Bender et al. in [15] and proposes splitting the
original audio stream into blocks and embedding the whole watermark into the phase
spectrum of the first block, as described in [21].
One disadvantage of the phase coding approach is the low payload that can be achieved:
only the first block is used to embed the watermark. Since the watermark is not
distributed over the entire data set but is localised, it can easily be removed if cropping
is acceptable.
9.1.2 Phase Modulation
Another way of embedding the watermark into the phase is to perform independent
multiband phase modulation, as [21] states. This algorithm exploits inaudible phase
modifications through controlled multiband phase alterations of the original signal.
Both phase embedding approaches use the psychoacoustic features of the human auditory
system with regard to the just noticeable phase changes. They exploit the inaudibility of
phase changes if the time envelope of the original signal is approximately preserved.
Because of the phase alteration, embedding and detection of the watermark is done in the
Fourier domain by processing the audio stream blockwise. While the phase coding
method is embedding the watermark in the phases of the first block, the phase modulation
algorithm performs a long-term multiband phase modulation. Both algorithms are non-
blind watermarking methods, since they require the original signal during the watermark
retrieval, which of course limits their applicability.
9.1.3 Watermarking the compressed bitstream
Several approaches exist to embed the watermark directly into the already compressed
audio bit stream, as described in [21]. The time-consuming decoding, watermark
embedding and re-encoding required by pulse code modulation (PCM) watermarking
techniques are then unnecessary. Furthermore, the retrieval process involves no decoding
procedure, which speeds up watermark retrieval further. Nevertheless, the starting point
for professionally created audio material is always the PCM format. These approaches
change the contents of the
MPEG frame directly. The scaling factor can be viewed as a logarithmic gain factor for
the sample values in order to retrieve the original samples in PCM format. The
embedding of the watermark is done by changing the scaling factors of different frames
according to a special pattern derived from a secret key. A problem of this method is that
some audio streams carry only a few scaling factors per frame. Therefore, the space for
embedding a watermark is reduced. This leads to the problem that multiple watermarks
cannot be embedded, because altering scale factors already used for embedding the first
watermark destroys the quality of the audio data. A second approach in the variation of
the MPEG frame tries to alter the sample values instead of the scaling factors. Embedding
multiple watermarks is also critical in this case. The additional requirement of using the
original track as input for the retrieval process further limits the applicability of this
approach.
Besides working on MP3 bit streams, methods like the one presented by Cheng et al. in
Enhanced Spread Spectrum Watermarking of MPEG-2 AAC Audio [20] embed
watermarks into the advanced audio coding (AAC) compressed bit stream by direct
modification of the quantized coefficients. The watermark bits are embedded by
performing a spread-spectrum modulation of the quantized coefficients. The individual
bits are retrieved by a linear correlation of the PN sequence used during the embedding
with the quantized coefficients of the watermarked bit stream. The coefficients to be
modified are selected by applying a heuristic, which uses only nonzero coefficients in a
predefined frequency range. The amount of distortion applied is fixed, set to one
quantization step.
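To illustrate the embedding step just described, a minimal sketch follows; the method
name, key handling and frequency-range bounds are assumptions made for the purpose
of illustration.

    // Spread-spectrum embedding of one watermark bit into quantized AAC-style
    // coefficients: a key-derived +/-1 PN chip, scaled by the bit's sign, is added
    // to each nonzero coefficient in a predefined index range.
    public class SsEmbedSketch {
        static void embedBit(int[] quantizedCoeffs, int from, int to, boolean bit, long key) {
            java.util.Random pn = new java.util.Random(key);
            int sign = bit ? 1 : -1;
            for (int i = from; i < to; i++) {
                int chip = pn.nextBoolean() ? 1 : -1; // advance the PN sequence at every index
                if (quantizedCoeffs[i] != 0) {        // heuristic: nonzero coefficients only
                    quantizedCoeffs[i] += sign * chip; // fixed distortion of one quantization step
                }
            }
        }
    }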
Methods of directly watermarking the compressed bit stream have in common that they
do not make use of a psychoacoustic model. Both embedding and detection are performed
directly on the compressed bit stream, where the audio stream is processed in frames
according to the formatting of the bit stream in the specific compression algorithm.
Additional information is not necessary if the audio data are synchronized. The main
advantage is the low computational cost. Furthermore, these methods have inherent
robustness against their own compression format, since the watermark is embedded in
the already compressed bit stream. The main disadvantage of these methods is the
missing psychoacoustic counterpart in comparison to the uncompressed audio signal. The
influence on the audio quality of the original track by altering scaling factors, sample
data, or the quantized coefficients can only be estimated. Moreover, the decoding of the
compressed bit stream and a new compression with a shifted audio stream may lead to a
synchronization problem because of the new scaling factors, sample data, and
quantization coefficients of the MPEG frames. Furthermore, the complexity advantage is
lost if the watermarked audio tracks have to be transcoded in another compression format.
9.1.4 Integrating watermark embedding into the compression encoder
Besides directly watermarking the bit stream, other methods extract the information in the
compressed bit stream from the quantization of the audio samples. This enables the
estimation of the masking threshold to shape the watermark noise below this threshold in
order to ensure inaudibility. Integrating the watermark and compression encoders has two
advantages: the quality during watermarking can be controlled, in contrast to the methods
described above, and the speed of embedding is improved compared with running
watermarking and compression as two separate processes. The building blocks consist of
parts of the PCM watermark embedder and of the compression decoder and encoder.
Part of the bit stream decoder is used in order to read the scaling factors and decode the
bit stream and perform the inverse quantization of compressed samples. The information
about the quantization enables the calculation of the masking threshold. The masking
threshold controls the multiplication factors used to weight the spectral lines of the
constructed watermark, applying the masking effects as in any perceptual watermark
encoder. The watermark generation can be the same as for the PCM watermark
embedder. After weighting the spectrum of the watermark noise, the result is added to the
embedder. After weighting the spectrum of the watermark noise, the result is added to the
original spectral lines. The extracted scaling factors from the original frame are used in
order to quantize the marked audio data again and format the bit stream. The final output
is the marked bit stream.
This method makes implicit usage of the psychoacoustic model by approximating the
perceptual information contained in the MPEG frames. Detection can be performed on
the compressed and uncompressed audio data. It is a blind watermarking method, which
distributes the bits over different MPEG frames. Due to the usage of parts of the
compression encoder and decoder, such a mechanism is tied to the special compression
scheme used. For each newly developed compression algorithm, a new integration of the
watermarking embedding procedure becomes necessary.
9.2 Source Code Samples
FFT.java

package math;

import applet.FloatGraph;
import applet.Graph;
import applet.ViewGraph;
import audio.Channels;

/**
 * @author Steven Morgan
 *
 * Calculates the Fourier transform of a set of n real-valued data
 * points. Replaces this data (which is stored in array data[1..n])
 * by the positive frequency half of its complex Fourier transform.
 *
 * The real-valued first and last components of the complex transform
 * are returned as elements data[1] and data[2], respectively. n must
 * be a power of 2. This routine also calculates the inverse
 * transform of a complex data array if it is the transform of real
 * data. (Result in this case must be multiplied by 2/n.)
 */
public class FFT {

    public static final void getFloats(int[] ints, float[] floats) {
        for (int i = 0; i < ints.length; i++) {
            floats[i + 1] = ints[i];
        }
    }

    public static final void getInts(float[] floats, int[] ints) {
        for (int i = 0; i < ints.length; i++) {
            ints[i] = Math.round(floats[i + 1]);
        }
    }

    public static final void fudgeFactor(float[] data, int n) {
        // Fudge factor: multiply result by 2 / n.
        float factor = 2.0f / (float) n;
        for (int i = 1; i <= n; i++)
            data[i] *= factor;
    }

    public static final void four1(float[] data, int nn, int isign) {
        int i, j, m, n, mmax, istep;
        double wtemp, wr, wpr, wpi, wi, theta; // Double precision for the trigonometric recurrences.
        float tempr, tempi;

        n = nn << 1;
        j = 1;
        // This is the bit-reversal section of the routine.
        for (i = 1; i < n; i += 2) {
            if (j > i) {
                // Exchange the two complex numbers.
                float d;
                d = data[j];
                data[j] = data[i];
                data[i] = d;
                d = data[j + 1];
                data[j + 1] = data[i + 1];
                data[i + 1] = d;
            }
            m = nn;
            while (m >= 2 && j > m) {
                j -= m;
                m >>= 1;
            }
            j += m;
        }
        // Here begins the Danielson-Lanczos section of the routine.
        mmax = 2;
        while (n > mmax) { // Outer loop executed log2(nn) times.
            istep = mmax << 1;
            theta = isign * (2.0 * Math.PI / mmax); // Initialize the trigonometric recurrence.
            wtemp = Math.sin(0.5 * theta);
            wpr = -2.0 * wtemp * wtemp;
            wpi = Math.sin(theta);
            wr = 1.0;
            wi = 0.0;
            // Here are the two nested inner loops.
            for (m = 1; m < mmax; m += 2) {
                for (i = m; i <= n; i += istep) {
                    j = i + mmax;
                    // This is the Danielson-Lanczos formula:
                    tempr = (float) (wr * data[j] - wi * data[j + 1]);
                    tempi = (float) (wr * data[j + 1] + wi * data[j]);
                    data[j] = data[i] - tempr;
                    data[j + 1] = data[i + 1] - tempi;
                    data[i] += tempr;
                    data[i + 1] += tempi;
                }
                wr = (wtemp = wr) * wpr - wi * wpi + wr; // Trigonometric recurrence.
                wi = wi * wpr + wtemp * wpi + wi;
            }
            mmax = istep;
        }
    }

    public static final void realft(float data[], int n, int isign) {
        int i, i1, i2, i3, i4, np3;
        float c1 = 0.5f, c2, h1r, h1i, h2r, h2i;
        double wr, wi, wpr, wpi, wtemp, theta; // Double precision for the trigonometric recurrences.

        theta = Math.PI / (double) (n >> 1); // Initialize the recurrence.
        if (isign == 1) {
            c2 = -0.5f;
            four1(data, n >> 1, 1); // The forward transform is here.
        } else {
            c2 = 0.5f; // Otherwise set up for an inverse transform.
            theta = -theta;
        }
        wtemp = Math.sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = Math.sin(theta);
        wr = 1.0 + wpr;
        wi = wpi;
        np3 = n + 3;
        for (i = 2; i <= (n >> 2); i++) { // Case i = 1 done separately below.
            i4 = 1 + (i3 = np3 - (i2 = 1 + (i1 = i + i - 1)));
            h1r = c1 * (data[i1] + data[i3]); // The two separate transforms are separated out of data.
            h1i = c1 * (data[i2] - data[i4]);
            h2r = -c2 * (data[i2] + data[i4]);
            h2i = c2 * (data[i1] - data[i3]);
            // Here they are recombined to form the true transform of the original real data.
            data[i1] = (float) (h1r + wr * h2r - wi * h2i);
            data[i2] = (float) (h1i + wr * h2i + wi * h2r);
            data[i3] = (float) (h1r - wr * h2r + wi * h2i);
            data[i4] = (float) (-h1i + wr * h2i + wi * h2r);
            wr = (wtemp = wr) * wpr - wi * wpi + wr; // The recurrence.
            wi = wi * wpr + wtemp * wpi + wi;
        }
        if (isign == 1) {
            // Squeeze the first and last data together to get them all within the original array.
            data[1] = (h1r = data[1]) + data[2];
            data[2] = h1r - data[2];
        } else {
            data[1] = c1 * ((h1r = data[1]) + data[2]);
            data[2] = c1 * (h1r - data[2]);
            four1(data, n >> 1, -1); // This is the inverse transform for the case isign = -1.
        }
    }

    public static final void cosft1(float y[], int n) {
        int j, n2;
        float sum, y1, y2;
        double theta, wi = 0.0, wpi, wpr, wr = 1.0, wtemp; // Double precision for the trigonometric recurrences.

        theta = Math.PI / n; // Initialize the recurrence.
        wtemp = Math.sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = Math.sin(theta);
        sum = 0.5f * (y[1] - y[n + 1]);
        y[1] = 0.5f * (y[1] + y[n + 1]);
        n2 = n + 2;
        for (j = 2; j <= (n >> 1); j++) { // j = n/2 + 1 unnecessary since y[n/2+1] is unchanged.
            wr = (wtemp = wr) * wpr - wi * wpi + wr; // Carry out the recurrence.
            wi = wi * wpr + wtemp * wpi + wi;
            y1 = 0.5f * (y[j] + y[n2 - j]); // Calculate the auxiliary function.
            y2 = (y[j] - y[n2 - j]);
            y[j] = (float) (y1 - wi * y2); // The values for j and N - j are related.
            y[n2 - j] = (float) (y1 + wi * y2);
            sum += wr * y2; // Carry along this sum for later use in unfolding the transform.
        }
        realft(y, n, 1); // Calculate the transform of the auxiliary function.
        y[n + 1] = y[2];
        y[2] = sum; // sum is the value of F1 in equation (12.3.21).
        for (j = 4; j <= n; j += 2) {
            sum += y[j]; // Equation (12.3.20).
            y[j] = sum;
        }
    }

    /*
     * Calculates the sine transform of a set of n real-valued data points
     * stored in array y[1..n]. The number n must be a power of 2. On exit
     * y is replaced by its transform. This program, without changes, also
     * calculates the inverse sine transform, but in this case the output
     * array should be multiplied by 2/n.
     */
    public static final void sinft(float y[], int n) {
        int j, n2 = n + 2;
        float sum, y1, y2;
        double theta, wi = 0.0, wr = 1.0, wpi, wpr, wtemp; // Double precision in the trigonometric recurrences.

        theta = Math.PI / (double) n; // Initialize the recurrence.
        wtemp = Math.sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = Math.sin(theta);
        y[1] = 0.0f;
        for (j = 2; j <= (n >> 1) + 1; j++) {
            wr = (wtemp = wr) * wpr - wi * wpi + wr; // Calculate the sine for the auxiliary array.
            wi = wi * wpr + wtemp * wpi + wi; // The cosine is needed to continue the recurrence.
            y1 = (float) (wi * (y[j] + y[n2 - j])); // Construct the auxiliary array.
            y2 = 0.5f * (y[j] - y[n2 - j]);
            y[j] = y1 + y2; // Terms j and N - j are related.
            y[n2 - j] = y1 - y2;
        }
        realft(y, n, 1); // Transform the auxiliary array.
        y[1] *= 0.5; // Initialize the sum used for odd terms below.
        sum = y[2] = 0.0f;
        for (j = 1; j <= n - 1; j += 2) {
            sum += y[j];
            y[j] = y[j + 1]; // Even terms determined directly.
            y[j + 1] = sum; // Odd terms determined by this running sum.
        }
    }

    public static void printFloats(String name, float[] floats) {
        System.out.print(name + "\t:= ");
        printFloats(floats);
    }

    public static void printFloats(float[] floats) {
        System.out.print("[");
        for (int i = 0; i < floats.length; i++) {
            if (i > 0)
                System.out.print(", ");
            System.out.print(Math.round(floats[i] * 1000) / 1000.0f);
        }
        System.out.println("]");
    }

    // Wrappers that shift between 0-based arrays and realft's 1-based packing.
    public static void fastFourierTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = floats.length - 1; i >= 0; i--) {
            temp[i + 1] = floats[i];
        }
        realft(temp, floats.length, 1);
        for (int i = floats.length - 1; i >= 0; i--) {
            floats[i] = temp[i + 1];
        }
    }

    public static void inverseFastFourierTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = floats.length - 1; i >= 0; i--) {
            temp[i + 1] = floats[i];
        }
        realft(temp, floats.length, -1);
        float fudgeFactor = 2.0f / (float) floats.length;
        for (int i = floats.length - 1; i >= 0; i--) {
            floats[i] = temp[i + 1] * fudgeFactor;
        }
    }

    public static void cosineTransform(float[] floats) {
        float[] temp = new float[floats.length + 2];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        cosft1(temp, floats.length);
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1];
        }
    }

    public static void inverseCosineTransform(float[] floats) {
        float[] temp = new float[floats.length + 2];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        cosft1(temp, floats.length);
        float fudgeFactor = 2.0f / (float) floats.length;
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1] * fudgeFactor;
        }
    }

    public static void sineTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        sinft(temp, floats.length);
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1];
        }
    }

    public static void inverseSineTransform(float[] floats) {
        float[] temp = new float[floats.length + 1];
        for (int i = 0; i < floats.length; i++) {
            temp[i + 1] = floats[i];
        }
        sinft(temp, floats.length);
        float fudgeFactor = 2.0f / (float) floats.length;
        for (int i = 0; i < floats.length; i++) {
            floats[i] = temp[i + 1] * fudgeFactor;
        }
    }

    public static void testRealft1() {
        float[] floats = new float[8 + 1];
        floats[8] = 1;
        printFloats(floats);
        realft(floats, 8, -1);
        fudgeFactor(floats, 8);
        printFloats(floats);
        realft(floats, 8, 1);
        printFloats(floats);
    }

    public static void testRealft2() {
        float[] theFloats = new float[1 << 10];
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = i;
        }
        printFloats(theFloats);
        fastFourierTransform(theFloats);
        //printFloats(theFloats);
        inverseFastFourierTransform(theFloats);
        printFloats(theFloats);
    }

    public static void testRealft3() {
        float[] theFloats = new float[1 << 3];
        int v = 1;
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = v;
            v *= -1;
        }
        printFloats(theFloats);
        fastFourierTransform(theFloats);
        printFloats(theFloats);
        theFloats[0] = 0;
        theFloats[1] = 0;
        theFloats[theFloats.length - 1] = theFloats.length;
        printFloats(theFloats);
        inverseFastFourierTransform(theFloats);
        printFloats(theFloats);
    }

    public static void testRealfft4() {
        float[] theFloats = new float[1 << 3];
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = i + 1;
        }
        //Cepstrum.randomize(theFloats);
        printFloats(theFloats);
        float[] response = new float[theFloats.length];
        response[5] = 1;
        printFloats(response);
        fastFourierTransform(theFloats);
        printFloats(theFloats);
        fastFourierTransform(response);
        printFloats(response);
        Channels.multiplyComplex2(theFloats, response);
        //Channels.multiply(theFloats, response);
        printFloats(theFloats);
        inverseFastFourierTransform(theFloats);
        printFloats(theFloats);
    }

    public static void main(String[] args) {
        float[] theFloats = new float[1 << 3];
        for (int i = 0; i < theFloats.length; i++) {
            theFloats[i] = i + 1;
        }
        FFT.printFloats("floats", theFloats);
        float[] fft;
        fft = (float[]) theFloats.clone();
        fastFourierTransform(fft);
        FFT.printFloats("fft", fft);
        inverseFastFourierTransform(fft);
        FFT.printFloats("inverse", fft);
        fft = (float[]) theFloats.clone();
        Graph graph = new FloatGraph(fft);
        graph.setTitle("fft");
        ViewGraph view = new ViewGraph(graph);
        view.setVisible(true);
    }
}
Cepstrum.java

package math;

import audio.Channels;

/**
 * @author Steven Morgan
 *
 * Utility class that provides cepstrum-related functions.
 */
public class Cepstrum {

    /**
     * Computes the complex log of this list of complex numbers in the
     * format as produced by the math.FFT.realft() method.
     *
     * This method is experimental: there is a problem with phase
     * wrapping - i.e. the imaginary part of the complex log. This
     * problem is due to the multivalued result of a complex log.
     *
     * @param dest - the list of complex numbers
     */
    public static final void log(float[] dest) {
        dest[0] = (float) Math.log(Math.abs(dest[0]));
        dest[1] = (float) Math.log(Math.abs(dest[1]));
        for (int i = 2; i < dest.length; i += 2) {
            float r = (float) Math.sqrt(dest[i] * dest[i] + dest[i + 1] * dest[i + 1]);
            // 0 <= arg < Pi
            float arg = (float) Math.acos(dest[i] / r);
            if (dest[i + 1] < 0.0) {
                //arg = (float) (2.0 * Math.PI - arg);
                arg = -arg;
            }
            //if (arg < -2.7) arg += 2.0 * Math.PI;
            //arg += 8 * Math.PI;
            dest[i] = (float) Math.log(r);
            dest[i + 1] = arg;
        }

        /*
         * For uniqueness, it is necessary that the phase be "unwrapped",
         * which eliminates the jumps as the phase passes between -PI and
         * PI. This careful definition causes the complex cepstrum of a
         * real sequence to also be a real sequence. Following is a
         * utility function that will perform the phase unwrapping for
         * discrete numeric data. This technique may fail for extremely
         * oscillatory functions, and is not appropriate for very noisy
         * data.
         */
        float min = dest[3];
        float max = min;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 2; i < dest.length - 2; i += 2) {
                float arg = dest[i + 1];
                float next = dest[(i + 2) + 1];
                /*
                while (next - arg > Math.PI) {
                    next -= 2 * Math.PI;
                    changed = true;
                }
                */
                while (arg - next > Math.PI) {
                    next += 2 * Math.PI;
                    changed = true;
                }
                dest[(i + 2) + 1] = next;
                min = Math.min(min, next);
                max = Math.max(max, next);
            }
            // Incomplete shift handling left over from development.
            float shift = 0.0f;
            if (min < -Math.PI) {
            }
        }
    }

    /**
     * Computes the log of the modulus of this list of complex numbers
     * in the format as produced by the math.FFT.realft() method.
     *
     * @param dest - the list of complex numbers to be modified
     */
    public static final void logModulus(float[] dest) {
        dest[0] = (float) Math.log(Math.abs(dest[0]));
        dest[1] = (float) Math.log(Math.abs(dest[1]));
        for (int i = 2; i < dest.length; i += 2) {
            float r = (float) Math.sqrt(dest[i] * dest[i] + dest[i + 1] * dest[i + 1]);
            dest[i] = (float) Math.log(r);
            dest[i + 1] = 0;
        }
    }

    /**
     * Computes the log of the list of real numbers provided.
     *
     * @param floats - the list of floats to be modified
     */
    public static final void realLog(float[] floats) {
        for (int i = 0; i < floats.length; i++) {
            floats[i] = (float) Math.log(Math.abs(floats[i]));
        }
    }

    /**
     * Computes the complex cepstrum of the list of real-valued data
     * - uses the experimental complex log method above.
     *
     * @param floats - the list of real-valued data
     * @return the complex cepstrum of the real input data
     */
    public static final float[] complexCepstrum(float[] floats) {
        float[] cepstrum = (float[]) floats.clone();
        FFT.fastFourierTransform(cepstrum);
        log(cepstrum);
        FFT.inverseFastFourierTransform(cepstrum);
        return cepstrum;
    }

    /**
     * Computes the cepstrum of the list of real-valued data
     * - uses the logModulus method above to compute the log.
     *
     * @param floats - the list of real-valued data
     * @return the real cepstrum of the real input data
     */
    public static final float[] realCepstrum(float[] floats) {
        float[] cepstrum = (float[]) floats.clone();
        FFT.fastFourierTransform(cepstrum);
        logModulus(cepstrum);
        FFT.inverseFastFourierTransform(cepstrum);
        return cepstrum;
    }

    /**
     * Test helper method - fills the given list of floats with random
     * values between -1000 and 1000.
     *
     * @param floats - the list to be modified
     */
    public static void randomize(float[] floats) {
        for (int i = 0; i < floats.length; i++) {
            floats[i] = (float) (Math.random() - 0.5) * 2000;
        }
    }

    /**
     * Test method - verifies the cepstrum property for the given float
     * lists: the cepstrum of the convolution of two functions is the
     * same as the sum of the cepstrum of each.
     *
     * This method is used to test the experimental complex log() method.
     *
     * @param theC - the first input list of real numbers
     * @param theH - the second input list of real numbers
     * @param debug - flag set for debug output
     * @return the number of differences between the two result lists
     */
    public static final int testCepstrumLogProperty(float[] theC, float[] theH, boolean debug) {
        float[] c = (float[]) theC.clone();
        float[] h = (float[]) theH.clone();
        float[] cepC = complexCepstrum(c);
        float[] cepH = complexCepstrum(h);
        FFT.fastFourierTransform(c);
        FFT.fastFourierTransform(h);
        if (debug) {
            FFT.printFloats("F[c]", c);
            FFT.printFloats("F[h]", h);
        }
        float[] ch = (float[]) c.clone();
        Channels.multiplyComplex2(ch, h);
        FFT.inverseFastFourierTransform(ch);
        if (debug)
            FFT.printFloats("ch", ch);
        float[] cepCcepH = (float[]) cepC.clone();
        Channels.addComplex(cepCcepH, cepH);
        float[] cepCH = complexCepstrum(ch);
        if (debug) {
            FFT.printFloats("cC+cH", cepCcepH);
            FFT.printFloats("cepCH", cepCH);
        }
        int diffcount = 0;
        for (int i = 0; i < cepCcepH.length; i++) {
            if (Math.abs(cepCcepH[i] - cepCH[i]) >= 0.00001) {
                diffcount++;
            }
        }
        return diffcount;
    }

    /**
     * Test method to repeatedly test the cepstrum log property for
     * random input lists.
     */
    public static void testLogProperty() {
        float[] c = new float[1 << 3];
        float[] h = new float[c.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = ((17 * (i + 1)) % 8);
            h[i] = -4 + i * i;
        }
        boolean debug = false;
        int matchCount = 0, runCount = 100;
        for (int i = 0; i < runCount; i++) {
            randomize(c);
            randomize(h);
            if (debug) {
                FFT.printFloats("c", c);
                FFT.printFloats("h", h);
            }
            int diffCount = testCepstrumLogProperty(c, h, false);
            if (diffCount == 0) {
                matchCount++;
            }
        }
        System.out.println("matched " + matchCount + " out of " + runCount);
    }

    public static void main(String[] args) {
        testLogProperty();
    }
}
WatermarkedAudioInputStream.java

package player;

import java.io.*;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import watermark.*;

/**
 * @author Steven Morgan
 *
 * AudioInputStream class which repeatedly applies a given Watermark
 * to fixed-size windows from the source AudioInputStream.
 */
public class WatermarkedAudioInputStream extends AudioInputStream {

    private AudioInputStream ais;
    int currentPosition;
    int bytesInWindow;
    private byte[] window;
    int windowSize;
    Watermark watermark;

    public WatermarkedAudioInputStream(AudioInputStream ais, int windowSize) {
        super(ais, ais.getFormat(), ais.getFrameLength());
        this.ais = ais;
        this.windowSize = windowSize;
        window = new byte[windowSize];
        currentPosition = 0;
        bytesInWindow = 0;
    }

    public WatermarkedAudioInputStream(AudioInputStream ais, int windowSize, Watermark watermark) {
        this(ais, windowSize);
        setWatermark(watermark);
    }

    public Watermark getWatermark() {
        return watermark;
    }

    public void setWatermark(Watermark watermark) {
        this.watermark = watermark;
        watermark.setAudioFormat(getFormat());
    }

    public int available() throws IOException {
        return bytesInWindow - currentPosition + ais.available();
    }

    public int read(byte[] buf, int offset, int length) throws IOException {
        int bytesRead = 0;
        while (bytesRead < length) {
            if (currentPosition >= bytesInWindow) {
                currentPosition = 0;
                bytesInWindow = ais.read(window, 0, windowSize);
                if (bytesInWindow <= 0) {
                    break;
                }
                // Zero-pad a partially filled final window before watermarking.
                for (int i = bytesInWindow; i < windowSize; i++) {
                    window[i] = 0;
                }
                watermark.apply(window, 0, window.length);
            }
            int bytesToRead = Math.min(bytesInWindow - currentPosition, length - bytesRead);
            // Copy relative to the caller's offset (the original omitted it).
            System.arraycopy(window, currentPosition, buf, offset + bytesRead, bytesToRead);
            bytesRead += bytesToRead;
            currentPosition += bytesToRead;
        }
        return bytesRead;
    }

    public int read(byte[] buf) throws IOException {
        return read(buf, 0, buf.length);
    }

    public int read() throws IOException {
        byte[] buf = new byte[1];
        read(buf);
        return buf[0];
    }

    public void close() throws IOException {
        ais.close();
    }

    public synchronized void mark(int arg0) {
        throw new UnsupportedOperationException();
    }

    public boolean markSupported() {
        return false;
    }

    public synchronized void reset() throws IOException {
        throw new UnsupportedOperationException();
    }

    public long skip(long bytes) throws IOException {
        throw new UnsupportedOperationException();
    }

    public AudioFormat getFormat() {
        return ais.getFormat();
    }

    public long getFrameLength() {
        return ais.getFrameLength();
    }
}
App.java

package applet;

import java.applet.Applet;
import java.awt.*;
import java.awt.event.*;
import java.beans.*;
import java.io.*;
import javax.sound.sampled.*;
import math.FFT;
import audio.Band;
import audio.Channels;

/**
 * @author Steven Morgan
 *
 * Experimental class used to visualise and test the various methods
 * in the other packages.
 */
public class App extends Applet implements ActionListener, AdjustmentListener, PropertyChangeListener {

    public static final int MPEG_LAYER1_BLOCK_SIZE = 1 << 9;
    public static final int MPEG_LAYER2_BLOCK_SIZE = 1 << 10;
    public static final int BLOCK_SIZE = MPEG_LAYER1_BLOCK_SIZE;

    AudioFormat audioFormat;
    Visualization vis;
    Label field;
    Graph currentSubbandGraph = null;
    float[] freq = new float[BLOCK_SIZE];
    float[] allSubbands = null;

    public void setFreq(int frequency) {
        for (int i = 0; i < freq.length; i++) {
            freq[i] = 0;
        }
        freq[frequency] = freq.length;
        FFT.inverseFastFourierTransform(freq);
    }

    public void setup(Visualization vis) {
        String filename = "I:\\Original Compositions\\Drum Samples\\808\\Open Hi-Hat\\TR-808\\OH\\OH75.WAV";
        try {
            File file = new File(filename);
            AudioInputStream ais = AudioSystem.getAudioInputStream(file);
            audioFormat = ais.getFormat();
            byte[] buf = new byte[BLOCK_SIZE * audioFormat.getFrameSize()];
            ais.read(buf, 0, buf.length);
            int[] channel = Channels.getChannel(buf, 0, audioFormat);
            ais.close();
            float[] floats = Channels.getFloats(channel);
            float[] hanning = Channels.hanning(floats.length);

            FloatGraph source = new FloatGraph(
                    (float[]) floats.clone(),
                    Channels.getMinValue(audioFormat.getSampleSizeInBits()),
                    Channels.getMaxValue(audioFormat.getSampleSizeInBits()));
            source.setColor(Color.black);
            source.setTitle("s(t)");
            vis.addGraph(source);

            // Apply a Hanning window to the source block.
            for (int i = 0; i < floats.length; i++) {
                floats[i] *= hanning[i];
            }
            FloatGraph graph = new FloatGraph(hanning);
            graph.setColor(Color.blue);
            graph.setTitle("H(t)");
            //vis.addGraph(graph);

            FloatGraph hanningGraph = new FloatGraph(floats);
            hanningGraph.setColor(Color.red);
            hanningGraph.setTitle("H(t) * s(t)");
            //vis.addGraph(hanningGraph);

            float[] fft = (float[]) floats.clone();
            FFT.fastFourierTransform(fft);
            FloatGraph fftGraph = new FloatGraph(fft);
            fftGraph.setColor(Color.green);
            fftGraph.setTitle("fft(H(t) * s(t))");
            System.out.println();

            float[] fftInverse = (float[]) fft.clone();
            FFT.inverseFastFourierTransform(fftInverse);
            graph = new FloatGraph(fftInverse);
            graph.setColor(Color.pink);
            graph.setScale(hanningGraph);
            graph.setTitle("fft-1(fft(H(t) * s(t)))");
            //addGraph(graph);
            //vis.addGraph(fftGraph);

            // Power density spectrum in decibels.
            float[] x = (float[]) fft.clone();
            for (int i = 0; i < x.length; i++) {
                x[i] = (float) (10.0 * Math.log(x[i] * x[i]) / Math.log(10.0));
            }
            FloatGraph xGraph = new FloatGraph(x);
            xGraph.setTitle("X(k)");
            xGraph.setColor(Color.blue);
            vis.addGraph(xGraph);
            allSubbands = x;

            Graph bandGraph = new Graph((int) (audioFormat.getSampleRate() / 2)) {
                {
                    minValue = 0;
                    maxValue = 24;
                }

                public float getValue(int x) {
                    return (int) Band.getSubBand(x);
                }
            };
            bandGraph.setTitle("Sub Band");
            bandGraph.setColor(new Color(0, 128, 255));
            vis.addGraph(bandGraph);
            //vis.graphs.clear();

            setFreq(0);
            Graph g = new FloatGraph(freq, -2, 2);
            g.setColor(Color.black);
            g.setTitle("freq");
            //vis.addGraph(g);
        } catch (UnsupportedAudioFileException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void init() {
        super.init();
        LayoutManager layout = new FlowLayout();
        this.setLayout(layout);
        vis = new Visualization();
        setup(vis);
        vis.setSize(640, 480);
        {
            Scrollbar scrollbar = new Scrollbar() {
                public Dimension getPreferredSize() {
                    return new Dimension(20, 200);
                }
            };
            scrollbar.setName("frequency");
            scrollbar.setValues(0, 1, 0, BLOCK_SIZE);
            scrollbar.setSize(20, 480);
            scrollbar.addAdjustmentListener(this);
            this.add(scrollbar);
            field = new Label("0");
            this.add(field);
        }
        {
            Scrollbar scrollbar = new Scrollbar() {
                public Dimension getPreferredSize() {
                    return new Dimension(20, 200);
                }
            };
            scrollbar.setName("subband");
            scrollbar.setValues(0, 1, 0, 25);
            scrollbar.setSize(20, 480);
            scrollbar.addAdjustmentListener(this);
            this.add(scrollbar);
        }
        this.add(vis);
        Button layoutBut = new Button("Layout");
        layoutBut.setActionCommand("layout");
        layoutBut.addActionListener(this);
        this.add(layoutBut);
        this.setSize(900, 480);
        this.doLayout();
    }

    public void actionPerformed(ActionEvent event) {
        System.out.println(event.getActionCommand());
        if ("layout".equals(event.getActionCommand())) {
            doLayout();
            System.out.println("size:" + vis.getWidth() + ", " + vis.getHeight());
        }
    }

    public void adjustmentValueChanged(AdjustmentEvent event) {
        System.out.println("adjustmentValueChanged(" + event.getValue() + ")");
        String componentName = ((Component) event.getSource()).getName();
        System.out.println(componentName);
        if ("frequency".equals(componentName)) {
            setFreq(event.getValue());
            field.setText("" + event.getValue());
            vis.repaint();
        } else if ("subband".equals(componentName)) {
            vis.removeGraph(currentSubbandGraph);
            currentSubbandGraph = new FloatGraph(
                    Band.getSubBand(allSubbands, event.getValue(), (int) audioFormat.getSampleRate()));
            currentSubbandGraph.setColor(Color.red);
            vis.addGraph(currentSubbandGraph);
            vis.repaint();
        }
    }

    public void propertyChange(PropertyChangeEvent event) {
        System.out.println("propertyChange(" + event.getPropertyName() + ")");
        if ("frequency".equals(event.getPropertyName())) {
            System.out.println(event);
            vis.repaint();
        }
    }
}
Visualization.java

package applet;

import java.awt.Component;
import java.awt.Dimension;
import java.awt.Graphics;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * @author Steven Morgan
 *
 * An AWT Component which displays a plot of a number of graphs.
 * The graphs are drawn overlaid on top of each other.
 */
public class Visualization extends Component {

    public List graphs;

    public Visualization() {
        graphs = new ArrayList();
    }

    public void addGraph(Graph graph) {
        graphs.add(graph);
    }

    public boolean removeGraph(Graph graph) {
        return graphs.remove(graph);
    }

    public synchronized void paint(Graphics g) {
        int width = getWidth();
        int height = getHeight();
        Iterator it = graphs.iterator();
        while (it.hasNext()) {
            Graph graph = (Graph) it.next();
            /*
            System.out.println(graph.getTitle() + ": range["
                + graph.getMinValue() + ", " + graph.getMaxValue() + "]");
            */
            float minHeight = graph.getMinValue();
            float maxHeight = graph.getMaxValue();
            g.setColor(graph.getColor());
            // Each pixel column summarises the range of samples it covers
            for (int t = 0; t < width; t++) {
                int sampleCount = 0;
                float minSample = maxHeight;
                float maxSample = minHeight;
                float averageSample = 0.0f;
                for (int idx = (t * graph.size()) / width;
                         idx <= ((t + 1) * graph.size()) / width;
                         idx++) {
                    float sample = graph.getValue(Math.min(idx, graph.size() - 1));
                    averageSample += sample;
                    minSample = Math.min(minSample, sample);
                    maxSample = Math.max(maxSample, sample);
                    sampleCount++;
                }
                averageSample += sampleCount / 2;
                averageSample /= sampleCount;
                // Map sample values to pixel coordinates (the y axis is inverted);
                // the average is computed but only the min/max envelope is drawn
                minSample = (float) height - 1.0f
                    - ((minSample - minHeight) / (maxHeight - minHeight))
                        * (float) height;
                maxSample = (float) height - 1.0f
                    - ((maxSample - minHeight) / (maxHeight - minHeight))
                        * (float) height;
                averageSample = (float) height - 1.0f
                    - (averageSample - minHeight) * (float) height
                        / (maxHeight - minHeight);
                g.fillRect(t,
                           Math.max(Math.round(maxSample), 0),
                           1,
                           Math.round(minSample - maxSample + 1.0f));
                /*
                g.fillOval(t, Math.round(maxSample + 0.5f), 5, 5);
                g.fillOval(t,
                           Math.round(maxSample + 0.5f)
                               + Math.round(minSample - maxSample),
                           5, 5);
                */
            }
        }
    }

    public Dimension getMinimumSize() {
        return new Dimension(200, 200);
    }

    public Dimension getPreferredSize() {
        return new Dimension(800, 400);
    }
}
Autocorrelation.java

package math;

/**
 * @author Steven Morgan
 *
 * Utility class that provides autocorrelation related functions.
 */
public class Autocorrelation {

    /**
     * Packs the real components of a list of complex numbers into the first
     * half of the array and mirrors them into the second half. The input
     * array is in the format produced by the math.FFT.realft() method.
     *
     * @param floats - source and destination float array
     */
    static final void pack(float[] floats) {
        float mid = floats[1];
        for (int i = 1; i < floats.length / 2; i++) {
            floats[i] = floats[i << 1];
        }
        floats[floats.length / 2] = mid;
        for (int i = 1; i < floats.length / 2; i++) {
            floats[floats.length - i] = floats[i];
        }
    }

    public static final void sqrt(float[] floats) {
        for (int i = 0; i < floats.length; i++) {
            floats[i] = (float) Math.sqrt(floats[i]);
        }
    }

    /**
     * Calculates the real valued modulus squares of a list of complex
     * numbers in the format produced by the math.FFT.realft() method.
     *
     * @see math.FFT#realft
     *
     * @param floats - source and destination float array
     */
    public static final void modulusSqr(float[] floats) {
        floats[0] *= floats[0];
        floats[1] *= floats[1];
        for (int i = 2; i < floats.length; i += 2) {
            float r = floats[i] * floats[i] + floats[i + 1] * floats[i + 1];
            floats[i] = r;
            floats[i + 1] = 0;
        }
        pack(floats);
    }

    /**
     * Performs the autocorrelation function on the supplied list of floats,
     * following the Wiener-Khinchin theorem: transform, take the power
     * spectrum, and transform again. Since the power spectrum is real and
     * even, a second forward transform yields the autocorrelation up to a
     * constant scale factor.
     *
     * @param floats - list of floats in the time domain
     */
    public static void autocorrelation(float[] floats) {
        FFT.fastFourierTransform(floats);
        modulusSqr(floats);
        FFT.fastFourierTransform(floats);
    }

    public static float minPositive = 99999999;
    public static float maxNegative = 0;

    public static void main(String[] args) {
    }
}
Player.java

package player;

import java.io.*;
import javax.sound.sampled.*;

/**
 * @author Steven Morgan
 *
 * Utility class to feed an AudioInputStream to a SourceDataLine.
 * This can be used to listen to an AudioInputStream if the
 * AudioInputStream can provide the data in realtime.
 */
public class Player {

    private InputStream ais;
    private SourceDataLine source;

    public Player(InputStream ais, SourceDataLine source) {
        this.ais = ais;
        this.source = source;
    }

    /**
     * Sends the AudioInputStream data to the SourceDataLine.
     * This method will only return once the AudioInputStream has
     * been completely consumed by the SourceDataLine.
     *
     * @throws LineUnavailableException
     * @throws IOException
     */
    public void play() throws LineUnavailableException, IOException {
        int bytesRead;
        byte[] buf = new byte[10240];
        source.open();
        source.start();
        do {
            bytesRead = ais.read(buf, 0, buf.length);
            if (bytesRead > 0) {
                source.write(buf, 0, bytesRead);
            }
        } while (bytesRead > 0);
        source.drain();
        source.stop();
        source.close();
    }
}
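For illustration, a hypothetical invocation of Player might look as follows. This demo class is not part of the project sources; the file name is an assumption and exceptions are simply propagated.

PlayerDemo.java (hypothetical)

package player;

import java.io.File;
import javax.sound.sampled.*;

// Hypothetical usage sketch for Player; "sample.wav" is an assumed file name
public class PlayerDemo {
    public static void main(String[] args) throws Exception {
        AudioInputStream ais =
            AudioSystem.getAudioInputStream(new File("sample.wav"));
        // Request a line capable of playing the stream's format
        DataLine.Info info =
            new DataLine.Info(SourceDataLine.class, ais.getFormat());
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        new Player(ais, line).play();
    }
}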
AudioSystemInfo.java

package test;

import javax.sound.sampled.*;

/**
 * @author Steven Morgan
 */
public class AudioSystemInfo {

    public static final void main(String[] args) {
        System.out.println("AudioFileTypes");
        AudioFileFormat.Type[] list = AudioSystem.getAudioFileTypes();
        for (int i = 0; i < list.length; i++) {
            System.out.println(list[i].toString());
        }

        System.out.println("Mixers");
        Mixer.Info[] mixerInfoList = AudioSystem.getMixerInfo();
        for (int i = 0; i < mixerInfoList.length; i++) {
            Mixer.Info info = mixerInfoList[i];
            System.out.println(info.toString());
            Mixer mixer = AudioSystem.getMixer(info);
            Line[] sourceLines = mixer.getSourceLines();
            System.out.println("  Found " + sourceLines.length + " SourceLines");
            for (int n = 0; n < sourceLines.length; n++) {
                System.out.println("  SourceLineInfo[" + n + "]: "
                    + sourceLines[n].getLineInfo().toString());
            }
        }

        Line.Info[] lines = {
            Port.Info.COMPACT_DISC,
            Port.Info.HEADPHONE,
            Port.Info.LINE_IN,
            Port.Info.LINE_OUT,
            Port.Info.MICROPHONE,
            Port.Info.SPEAKER
        };
        System.out.println("Ports");
        for (int idx = 0; idx < lines.length; idx++) {
            Line.Info[] line = AudioSystem.getSourceLineInfo(lines[idx]);
            for (int i = 0; i < line.length; i++) {
                System.out.println("SourceLine: " + line[i].toString());
            }
            line = AudioSystem.getTargetLineInfo(lines[idx]);
            for (int i = 0; i < line.length; i++) {
                System.out.println("TargetLine: " + line[i].toString());
            }
        }
        BugFix.apply();
    }
}
Graph.java

package applet;

import java.awt.Color;

/**
 * @author Steven Morgan
 *
 * Abstract class to represent the graph of a function.
 */
public abstract class Graph {

    protected float minValue, maxValue;
    private String title;
    private Color color = Color.BLACK;
    protected int theSize;

    public Graph() {
    }

    public Graph(int size) {
        this.theSize = size;
    }

    /**
     * @return the graph's title
     */
    public String getTitle() {
        return title;
    }

    /**
     * Sets the graph's title.
     *
     * @param title
     */
    public void setTitle(String title) {
        this.title = title;
    }

    /**
     * @return the color used to draw the graph
     */
    public Color getColor() {
        return color;
    }

    /**
     * Sets the color with which to draw this graph.
     *
     * @param color - the color of the graph
     */
    public void setColor(Color color) {
        this.color = color;
    }

    /**
     * @return the minimum value on the Y axis
     */
    public float getMinValue() {
        return minValue;
    }

    /**
     * @return the maximum value on the Y axis
     */
    public float getMaxValue() {
        return maxValue;
    }

    public void setScale(FloatGraph graph) {
        this.minValue = graph.getMinValue();
        this.maxValue = graph.getMaxValue();
    }

    public abstract float getValue(int x);

    public final int size() {
        return theSize;
    }
}
Tone.java

package audio;

/**
 * @author Steven Morgan
 *
 * Psycho-acoustic computation functions.
 */
public class Tone {

    /**
     * Extracts a list of maxima points from a power density spectrum.
     * A maximum is marked as tonal if it exceeds its neighbours within
     * the examination range by at least 7 dB.
     *
     * @param floats
     * @return the list of maxima found
     */
    public static Maxima[] getMaxima(float[] floats) {
        int maximaCount = 0;
        Maxima[] list = null;
        // First pass: count the local maxima so the result array can be sized
        for (int idx = 1; idx < floats.length - 1; idx++) {
            if (floats[idx] > floats[idx - 1] && floats[idx] >= floats[idx + 1]) {
                maximaCount++;
            }
        }
        list = new Maxima[maximaCount];
        maximaCount = 0;
        // Second pass: record each maximum and test it for tonality
        for (int idx = 1; idx < floats.length - 1; idx++) {
            if (floats[idx] > floats[idx - 1] && floats[idx] >= floats[idx + 1]) {
                Maxima maxima = new Maxima();
                maxima.frequencyIndex = idx;
                maxima.tonal = true;
                // The examination range widens with increasing frequency
                int range = 2;
                if (idx >= floats.length / 4 - 1)
                    range = 3;
                if (idx >= floats.length / 2)
                    range = 6;
                for (int i = 2; i <= range; i++) {
                    if ((idx - i >= 0)
                            && (floats[idx] - floats[idx - i] < 7)) {
                        maxima.tonal = false;
                        break;
                    }
                    if ((idx + i < floats.length)
                            && (floats[idx] - floats[idx + i] < 7)) {
                        maxima.tonal = false;
                        break;
                    }
                }
                list[maximaCount++] = maxima;
            }
        }
        return list;
    }
}
BugFix.java

package test;

/**
 * @author Steven Morgan
 *
 * Workaround for a Java Sound problem: the "Java Sound event dispatcher"
 * thread can keep the JVM alive after playback has finished, so it is
 * located by name and interrupted to allow a clean exit.
 */
public class BugFix {

    public static final void apply() {
        // Size the array generously; Thread.enumerate() silently ignores
        // any threads that do not fit
        Thread[] ts = new Thread[Thread.activeCount() * 2];
        int threadCount = Thread.enumerate(ts);
        Thread javaSoundEventDispatcherThread = null;
        for (int i = 0; i < threadCount; i++) {
            if ("Java Sound event dispatcher".equals(ts[i].getName())) {
                javaSoundEventDispatcherThread = ts[i];
            }
        }
        if (javaSoundEventDispatcherThread != null) {
            System.out.println(
                "BugFix: Interrupting 'Java Sound event dispatcher' Thread");
            javaSoundEventDispatcherThread.interrupt();
        }
    }
}
9.2 Project Proposal
Understanding The Effectiveness Of Current Digital Watermarking Techniques
(sound or picture)
Author : Steven Morgan
Supervisor : Professor John P Fitch
Initial Project Description
With the advent of Internet publishing it is difficult to retain control over data like images
or sound. Some people have developed watermarking techniques (hiding data in the
image) to promote this control. A watermark is some data hidden in the medium in such a
way that a) the watermark is not visible/audible; b) the watermark is robust in the sense
that it is not easily removable by, say, changing a few pixels in the image; c) the
watermark is easily readable by the copyright owner.
The aim of this project is to implement a few watermarking techniques, and determine
their effectiveness against a few simple attacks.
Digital watermarking is the process of editing pictures, sounds or videos to include an
unnoticeable, robust change that can still be read after the media has been manipulated.
Watermarking is much like fingerprinting, except that a watermark contains more
information about the owner.
Watermarking is used to help enforce copyright law by ensuring that the creator of any
media can add a subtle mark containing traceable information. Many different
watermarking techniques currently exist, but none is impenetrable: various forms of
attack exist to break the techniques widely available. Watermarking is still a relatively
young field, but it is an area of cryptography believed to be a promising direction in the
fight against piracy.
The growth of the Internet has created problems in tracking the usage of media. The
copying of MPEG-1 Layer 3 (MP3) files and films through peer-to-peer programs, and
the unauthorised use of images, are problems that have become too big to ignore, and the
software and techniques developed to prevent them are becoming more and more
sophisticated.
Watermarking shows great promise as a copyright-upholding technique, since the
watermark does not rely on any third party to uphold its integrity. Previous methods of
copy protection have relied on the viewer or player to abide by the copyright restrictions
they carry. For example, early Digital Versatile Discs (DVDs) carried a boolean marker
that decided whether a disc was allowed to be copied. Some discs allowed one copy
(after which the flag would be changed to never copy) and others allowed no copies at
all. The problem was that this relied on the DVD player upholding the copy protection.
With some people modifying the hardware, and some players being built without any
protection at all, the entire system collapsed. By making the watermark undetectable to
an attacker and invisible, it becomes a much more difficult thing to remove.
As long as there are techniques preventing copyright infringement, there will always be a
concerted effort to break these defences.
For watermarking to be effective, it has to fulfil three important criteria:
• The watermark is not visible or audible
To make the watermark a viable option, the user should not even know it is there.
Whether it is an edit to a picture or an echo added to a sound file, unless the
watermark is as undetectable as possible, people will not want to use it, and a
detectable mark makes the job of removing or distorting it much easier.
• The watermark is robust
Since there will be people who will do everything they can to remove these
watermarks, the watermark should still be recognisable after stretching, shearing,
shifting, rotating etc.
• The watermark is easily recognisable
Despite being well hidden, the watermark should, given the right procedure, be
simple to find. Even after distortions, there should be effective techniques for
finding the watermark despite its altered form.
It is accepted that watermarking will probably never be impenetrable: since there must
always be a way to recognise the watermark, there will always be a way to distort or
destroy it. If the technique is developed to a point where the media has to be heavily
modified to destroy the watermark, and the removal process is long and arduous, then
most attackers should be deterred, since the media will be tainted once the mark is
removed (much like the ink tags attached to clothes in department stores: once the tag
is forcefully removed, it releases ink that permanently stains the item of clothing
involved).
Since there are many different watermarking schemes currently available, and since so
many of them are easily broken, a standard attack was created to test the effectiveness of
watermarks in general. This tool is known as the StirMark Benchmark [10]. StirMark
is a relatively simple attack that applies minor distortion to the media to see whether the
watermark survives. A surprising number of current techniques fall down even at this
basic level. For example, Adobe Photoshop and CorelDRAW come equipped with a
watermarking facility called PictureMarc, which relies on a user ID and a two-digit
password. This watermarking procedure does not even withstand StirMark, yet it is
widely used.
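To illustrate the style of distortion such an attack applies, the sketch below applies a
small random time-warp to audio samples: each output sample is read from a slightly
jittered, linearly interpolated position in the input. This is not StirMark's actual
algorithm, and the class and parameter names are assumptions; it only demonstrates the
idea of a minor, perceptually invisible distortion that can nonetheless desynchronise a
watermark detector.

import java.util.Random;

// Illustrative sketch only, NOT StirMark's algorithm; assumes at least two samples
public class JitterAttack {

    public static float[] jitter(float[] samples, float maxOffset, long seed) {
        Random random = new Random(seed);
        float[] out = new float[samples.length];
        for (int i = 0; i < out.length; i++) {
            // Read from a position displaced by up to +/- maxOffset samples
            float pos = i + (random.nextFloat() * 2 - 1) * maxOffset;
            int lo = Math.max(0,
                Math.min(samples.length - 2, (int) Math.floor(pos)));
            // Linear interpolation between the two nearest input samples
            float frac = Math.max(0, Math.min(1, pos - lo));
            out[i] = samples[lo] * (1 - frac) + samples[lo + 1] * frac;
        }
        return out;
    }
}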
To fully understand watermarking, one has to look as much at ways of breaking the
system as at prevention techniques; it is said that the best form of defence is attack. This
project will mainly be research-based, in order to fully appreciate the techniques
currently in use and to understand where they fall short. This will also involve research
into the possible ways of breaking a watermark. Over time, the more inventive the attack
techniques become, the more effective the defence techniques will become. Unless
people try to break the watermarks, the technology cannot evolve; but as long as piracy
exists, the testers for the software will already exist.
Implementing the watermarking techniques will require a thorough understanding of how
they work. This may cause problems, since watermark vendors will not be keen to reveal
exactly how they implement their watermarks, as doing so would be a security breach.
Many watermarking techniques are openly published, but effort will have to be made to
ensure that the information gathered is up to date.
What follows is a high-level breakdown of the tasks involved in carrying out this project:
• Research
There are currently many different watermarking techniques in existence, none of
which has emerged as definitive. Background reading needs to be done to fully
understand the wide array of current watermarking techniques and how to
implement them. Some basic programming and ordered note-taking should be
performed to ensure that all relevant material read can be used later with
maximum efficiency. Most research should come from journals, since
watermarking is a cutting-edge technology that is constantly evolving.
• Implementation of many watermarking techniques across a wide array of
media
Once the research is carried out, many of the techniques will have to be
implemented and tested across a wide array of media, both to fully understand
their capabilities and to ensure consistent results in the later testing. For example,
the echo hiding technique adds a fractional echo to a sound file (with a delay of
between 0.5 and 2 milliseconds), too short for the human ear to perceive. The
larger the echo, the more reliably the watermark can be detected, but the smaller
the echo, the better hidden it is. These are the kinds of trade-offs that need to be
explored to find the pros and cons of each technique (a minimal sketch of an
echo-hiding kernel is given after this list).
• Testing the effectiveness of each watermark
For a watermark to be effective, it needs to withstand heavy attack and still come
out intact. Initial tests should measure each marking technique against StirMark
to see which marks are truly ineffectual. More complicated attacks will then
reveal which watermarks resist which techniques. With echo hiding this would
include testing how large the echo can be before humans detect it, and how easily
the echo can be found depending on its size.
• Evaluation of findings
Without any form of evaluation, the results would mean nothing. An evaluation
looks at all the results obtained and weighs up their pros and cons. At this point,
improvements to existing watermarking schemes could be proposed, and even
implemented, depending on the earlier results.
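As referenced in the implementation task above, a minimal sketch of echo hiding
follows. It is illustrative only: the delay values, amplitude and names are assumptions,
not the parameters ultimately used in this project. A '1' bit is embedded by mixing in a
faint echo at one delay and a '0' bit at another; a detector would later recover the bit by
locating the echo peak, for instance in the cepstrum or autocorrelation of each block.

// Minimal echo-hiding sketch with assumed parameters, not this
// project's final implementation; one bit is embedded per block by
// adding a faint echo whose delay encodes the bit value
public class EchoHidingSketch {

    // Assumed delays of roughly 0.5 ms and 1.0 ms at 44.1 kHz
    static final int DELAY_ONE = 22;          // samples encoding a '1' bit
    static final int DELAY_ZERO = 44;         // samples encoding a '0' bit
    static final float ECHO_AMPLITUDE = 0.2f; // kept small to stay inaudible

    /** Embeds one bit into a block of samples by adding a delayed copy. */
    public static void embedBit(float[] block, boolean bit) {
        int delay = bit ? DELAY_ONE : DELAY_ZERO;
        // Work backwards so the added echo does not feed back into itself
        for (int i = block.length - 1; i >= delay; i--) {
            block[i] += ECHO_AMPLITUDE * block[i - delay];
        }
    }
}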
Since this project is largely research-based, it is hard to predict its precise direction, as
that will depend on the conclusions drawn from the literature review. Following this
project plan, however, should at least aid understanding of the tasks ahead and how to go
about tackling them.