STEGANALYSIS OF BINARY IMAGES
This thesis is presented for the degree of
DOCTOR OF PHILOSOPHY
by
KANG LENG CHIEW
Department of Computing
Faculty of Science
MACQUARIE UNIVERSITY
Australia
June 2011
© 2011 KANG LENG CHIEW
TABLE OF CONTENTS
Page
LIST OF FIGURES iv
LIST OF TABLES vi
ABSTRACT vii
LIST OF PUBLICATIONS x
ACKNOWLEDGMENTS xi
1 Introduction 1
1.1 Motivations . . . . . . 2
1.2 Research Problems . . . . . . 3
1.3 Objectives . . . . . . 4
1.4 Research Overview . . . . . . 5
1.4.1 Contributions . . . . . . 5
1.4.2 Organisation of the Thesis . . . . . . 6
2 Background and Concepts 9
2.1 Overview of Steganography . . . . . . 9
2.2 Steganalysis—Model of Adversary . . . . . . 11
2.3 Level of Analysis . . . . . . 13
2.4 Blind Steganalysis as Pattern Recognition . . . . . . 14
2.4.1 Feature Extraction . . . . . . 15
2.4.2 Classification . . . . . . 16
2.5 Digital Images . . . . . . 19
2.5.1 Image File Formats . . . . . . 20
2.5.2 Spatial and Frequency Domain Images . . . . . . 21
3 Literature Review 22
3.1 Steganography . . . . . . 22
3.1.1 Liang et al. Binary Image Steganography . . . . . . 22
3.1.2 Pan et al. Binary Image Steganography . . . . . . 25
3.1.3 Tseng and Pan Binary Image Steganography . . . . . . 26
3.1.4 Chang et al. Binary Image Steganography . . . . . . 27
3.1.5 Wu and Liu Binary Image Steganography . . . . . . 28
3.1.6 F5 Steganography . . . . . . 28
3.1.7 OutGuess Steganography . . . . . . 29
3.1.8 Model-Based Steganography . . . . . . 30
3.2 Steganalysis . . . . . . 31
3.2.1 Differentiation of Cover and Stego Images . . . . . . 31
3.2.2 Classification of Steganographic Methods . . . . . . 41
3.2.3 Estimation of Message Length . . . . . . 47
3.2.4 Identification of Stego-Bearing Pixels . . . . . . 52
3.2.5 Retrieval of Stegokey . . . . . . 56
3.2.6 Extracting the Hidden Message . . . . . . 58
4 Blind Steganalysis 59
4.1 Comparison of the Steganography Methods under Analysis . . . . . . 60
4.2 Proposed Steganalysis Method . . . . . . 61
4.2.1 Grey Level Run Length Matrix . . . . . . 62
4.2.2 Pixel Differences . . . . . . 62
4.2.3 GLRL Matrix from the Pixel Difference . . . . . . 63
4.2.4 GLGL Matrix . . . . . . 64
4.2.5 Final Feature Sets . . . . . . 65
4.3 Experimental Results . . . . . . 67
4.3.1 Experimental Setup . . . . . . 67
4.3.2 Results Comparison . . . . . . 67
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Multi-Class Steganalysis 70
5.1 Summary of the Steganographic Methods under Analysis . . . . . . 71
5.2 Proposed Steganalysis . . . . . . 72
5.2.1 Increasing the Grey Level via the Pixel Difference . . . . . . 73
5.2.2 Grey Level Run Length Matrix . . . . . . 75
5.2.3 Grey Level Co-Occurrence Matrix . . . . . . 75
5.2.4 Cover Image Estimation . . . . . . 76
5.2.5 Final Feature Sets . . . . . . 77
5.3 Multi-Class Classification . . . . . . 79
5.4 Experimental Results . . . . . . 81
5.4.1 Experimental Setup . . . . . . 81
5.4.2 Results Comparison . . . . . . 82
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6 Hidden Message Length Estimation 86
6.1 Boundary Pixel Steganography . . . . . . 87
6.2 Proposed Method . . . . . . 88
6.2.1 512-Pattern Histogram as the Distinguishing Statistic . . . . . . 88
6.2.2 Matrix Right Division . . . . . . 91
6.2.3 Message Length Estimation . . . . . . 93
6.3 Experimental Results . . . . . . 94
6.3.1 Experimental Setup . . . . . . 94
6.3.2 Results of the Estimation . . . . . . 95
6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7 Steganographic Payload Location Identification 98
7.1 Background . . . . . . 99
7.2 Motivation and Challenges . . . . . . 99
7.3 Proposed Stego-Bearing Pixel Location Identification . . . . . . 101
7.4 Experimental Results . . . . . . 103
7.4.1 Experimental Setup . . . . . . 103
7.4.2 Results Comparison . . . . . . 104
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8 Feature-Pooling Blind JPEG Image Steganalysis 109
8.1 Feature Extraction Techniques . . . . . . 109
8.1.1 Image Quality Metrics . . . . . . 110
8.1.2 Moment of Wavelet Decomposition . . . . . . 110
8.1.3 Feature-Based . . . . . . 111
8.1.4 Moment of CF of PDF . . . . . . 112
8.2 Features-Pooling Steganalysis . . . . . . 113
8.2.1 Feature Selection in Feature-Based Method . . . . . . 113
8.2.2 Feature-Pooling . . . . . . 114
8.3 Experimental Results . . . . . . 116
8.3.1 Classifier Selection . . . . . . 116
8.3.2 Results Comparison . . . . . . 118
8.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
9 Improving JPEG Image Steganalysis 121
9.1 Steganography as Additive Noise . . . . . . 121
9.2 Image-to-Image Variation Minimisation . . . . . . 122
9.3 Steganalysis Improvement . . . . . . 125
9.3.1 Moments of Wavelet Decomposition . . . . . . 125
9.3.2 Moment of CF of PDF . . . . . . 126
9.3.3 Moment of CF of Wavelet Subbands . . . . . . 126
9.4 Experimental Results . . . . . . 127
9.4.1 Experimental Setup . . . . . . 127
9.4.2 Results Comparison . . . . . . 128
9.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
10 Conclusions and Future Research Directions 131
10.1 Summary . . . . . . 131
10.2 Future Research Directions . . . . . . 132
Bibliography 134
LIST OF FIGURES
Page
1.1 Overview of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 General model of steganography . . . . . . 10
2.2 General framework of blind steganalysis . . . . . . 15
2.3 Two-class SVM classification . . . . . . 18
3.1 Example of eligible pixels . . . . . . 24
3.2 Example of ineligible pixels . . . . . . 24
3.3 Effect of flipping a pixel . . . . . . 26
3.4 Measurement of smoothness and connectivity . . . . . . 29
3.5 Algorithm of model-based steganography . . . . . . 31
3.6 Co-occurrence matrices extracted from cover and stego images . . . . . . 33
3.7 Illustration of wavelet decomposition . . . . . . 37
3.8 Intra- and inter-block correlations in a JPEG image . . . . . . 39
3.9 The 64 modes of an 8×8 DCT block . . . . . . 43
3.10 Modified image calibration for double compressed JPEG image . . . . . . 44
3.11 One-against-one approach for a multi-class classification . . . . . . 46
3.12 A portion of image histogram before and after LSB embedding . . . . . . 48
3.13 The boundaries of 8×8 blocks . . . . . . 49
3.14 The extraction of residual image . . . . . . 55
4.1 Detection results displayed in ROC curves and AUR . . . . . . . . 68
5.1 Pixel difference in vertical direction . . . . . . . . . . . . . . . . . . 73
6.1 Illustration of a boundary pixel . . . . . . 89
6.2 Examples of 512 patterns . . . . . . 89
6.3 Comparison of patterns histogram between cover and stego images . . . . . . 90
6.4 Histogram difference between two binary images . . . . . . 92
6.5 Histogram quotient with increasing message length . . . . . . 94
6.6 Estimated length of hidden messages for all binary images . . . . . . 95
6.7 Example of a highly distorted stego image . . . . . . 96
6.8 Estimation error of hidden message length for all binary images . . . . . . 97
7.1 Identification results for different window sizes . . . . . . 104
7.2 Comparison of results for image Database A . . . . . . 107
7.3 Comparison of results for image Database B . . . . . . 107
7.4 Comparison of results for image Database C . . . . . . 107
8.1 Features comparison in detecting F5 . . . . . . . . . . . . . . . . . 114
8.2 Features comparison in detecting OutGuess . . . . . . 115
8.3 Features comparison in detecting MB1 . . . . . . 115
8.4 Classifier comparison in detecting F5 . . . . . . 117
8.5 Classifier comparison in detecting OutGuess . . . . . . 117
8.6 Classifier comparison in detecting MB1 . . . . . . 118
8.7 Comparison of steganalysis performance in detecting F5 . . . . . . 119
8.8 Comparison of steganalysis performance in detecting OutGuess . . . . . . 120
8.9 Comparison of steganalysis performance in detecting MB1 . . . . . . 120
9.1 Two images with their respective underlying statistics . . . . . . 123
9.2 Transformed image by scaling and cropping . . . . . . 124
LIST OF TABLES
Page
4.1 Comparison of the steganographic techniques . . . . . . 61
4.2 Summary of the 68-dimensional feature space . . . . . . 66
4.3 Experimental parameters . . . . . . 67
5.1 Properties of features . . . . . . 79
5.2 Example of majority-voting strategy for multi-class SVM . . . . . . 80
5.3 Summary of image databases . . . . . . 81
5.4 Summary of stego image databases . . . . . . 82
5.5 Confusion matrix for the textual database . . . . . . 84
5.6 Confusion matrix for the mixture database . . . . . . 85
5.7 Confusion matrix for the scene database . . . . . . 85
6.1 Mean and standard deviation of the estimation . . . . . . . . . . . 96
7.1 Summary of image databases . . . . . . 103
7.2 The accuracy of the identification for image Database A . . . . . . 105
7.3 The accuracy of the identification for image Database B . . . . . . 105
7.4 The accuracy of the identification for image Database C . . . . . . 106
8.1 Feature selection comparison for SFFS, T-test and Bhattacharyya . 114
9.1 Comparison for the proposed technique and the Farid technique . . . . . . 128
9.2 Comparison for the proposed technique and the COM technique . . . . . . 129
9.3 Comparison for the proposed technique and the MW technique . . . . . . 129
ABSTRACT
Steganography is the science of hiding messages in multimedia documents. A
message can be hidden in a document only if the content of the document is
highly redundant. Although the embedded message changes the characteristics
and nature of the document, these changes must be difficult for an unsuspecting
user to identify. On the other hand, steganalysis develops theories,
methods and techniques that can be used to detect hidden messages in multi-
media documents. The documents without any hidden messages are called cover
documents and the documents with hidden messages are named stego documents.
The work of this thesis concentrates on image steganalysis. We present four differ-
ent types of steganalysis techniques. These steganalysis techniques are developed
to counteract the steganographic methods that use binary (black and white) im-
ages as the cover media. Unlike greyscale and colour images, binary images have
a rather modest statistical nature. This makes it difficult to apply existing
steganalysis techniques directly to binary images.
The first steganalysis technique addresses blind steganalysis. Its objective is to
detect the existence of a secret message in a binary image. Since the detection of
a secret message is often modelled as a classification problem, it can
be approached using pattern recognition methodology.
The second steganalysis technique is known as multi-class steganalysis. Its purpose
is to identify the type of steganographic method used to create the stego image.
This extends the earlier blind steganalysis from two-class (cover or stego image) to
multi-class (cover or different types of stego images) classification. Similar to blind
steganalysis, this technique is also based on the pattern recognition methodology
to perform the classification.
The third steganalysis technique uses a first-order statistic—the binary pattern
histogram—to estimate the length of an embedded message. This technique is
used specifically to analyse the steganography developed by Liang et al. The es-
timated message length usually plays an important role and is needed at other
levels of analysis.
The fourth steganalysis technique identifies the steganographic payload locations
based on multiple stego images. This technique can reveal which pixels in the
binary image carry the message bits. This technique is crucial as it not only
reveals the existence of a hidden message but also provides information to locate
the hidden message.
Finally, we propose two improvements to existing JPEG image steganalysis. We
combine several feature sets and apply a feature selection technique to obtain
a set of powerful features. We show that by minimising the influence of image
content, we can improve the features' sensitivity with respect to steganographic
alteration.
STATEMENT OF CANDIDATE
I certify that the work in this thesis entitled “STEGANALYSIS OF BINARY
IMAGES” has not previously been submitted for a degree nor has it been sub-
mitted as part of the requirements for a degree to any other university or institution
other than Macquarie University.
I also certify that the thesis is an original piece of research and it has been written
by me. Any help and assistance that I have received in my research work and the
preparation of the thesis itself have been appropriately acknowledged.
In addition, I certify that all information sources and literature used are indicated
in the thesis.
KANG LENG CHIEW
(41375521)
8 June 2011
LIST OF PUBLICATIONS
1. K. L. Chiew and J. Pieprzyk. Features-Pooling Blind JPEG Image Ste-
ganalysis. IEEE Conference on Digital Image Computing: Techniques and
Applications, 96–103, 2008.
2. K. L. Chiew and J. Pieprzyk. JPEG Image Steganalysis Improvement via
Image-to-image Variation Minimization. International IEEE Conference on
Advanced Computer Theory and Engineering, 223–227, 2008.
3. K. L. Chiew and J. Pieprzyk. Estimating Hidden Message Length in Binary
Image Embedded by Using Boundary Pixels Steganography. International
Conference on Availability, Reliability and Security, 683–688, 2010.
4. K. L. Chiew and J. Pieprzyk. Blind Steganalysis: A Countermeasure for
Binary Image Steganography. International Conference on Availability, Re-
liability and Security, 653–658, 2010.
5. K. L. Chiew and J. Pieprzyk. Binary Image Steganographic Techniques Clas-
sification Based on Multi-Class Steganalysis. 6th International Conference on
Information Security, Practice and Experience, 6047:341–358, 2010.
6. K. L. Chiew and J. Pieprzyk. Identifying Steganographic Payload Location
in Binary Image. 11th Pacific Rim Conference on Multimedia—Advances in
Multimedia Information Processing, 6297:590–600, 2010.
ACKNOWLEDGMENTS
I would like to express my sincere appreciation to my supervisor, Professor Josef
Pieprzyk for his countless help, assistance and guidance in every stage of my
research. I have benefited a lot from the valuable discussion with him since the
very beginning of my research.
I would also like to express my gratitude and special thanks to Dr. Scott McCallum
for being so patient and inspiring in guiding my academic writing skills. The
interaction with him has tremendously improved my understanding of academic
writing.
I want to take this opportunity to thank the Ministry of Higher Education Malaysia
and Universiti Malaysia Sarawak for providing me with the SLAI scholarship for my
research. I am also very grateful for the HDR Project Support Funds provided
by Macquarie University.
Very special thanks to Joan for spending valuable time to proof-read my thesis.
I would like to thank Nana who always provides me with valuable information,
hints and updates related to my research. I would also like to thank Gaurav for
the enjoyable discussions and interactions. To all the staff in the Department of
Computing, their excellent support is highly appreciated.
Thanks to my parents, brother, sister and brother-in-law for their continuous sup-
port, encouragement and motivation throughout the years.
I am so grateful to my wife for her love, thoughtful comments, support and
nurturing in all aspects. Her advice and encouragement have always been my
point of reference whenever I am lost. These years would have been much tougher
without her company.
And finally, to all the people who have helped directly and indirectly to support
me throughout this undertaking, thank you.
This thesis was edited by Dr Lisa Lines, and editorial intervention was restricted
to Standards D and E of the Australian Standards for Editing Practice.
Chapter 1
Introduction
The process of sending messages between two parties through a public channel
in such a way as to prevent the adversary from realising the existence of the
communication is known as steganography. Tracing back to antiquity, Histiaeus
shaved a slave's head, wrote a message on his scalp and, once the slave's hair had
grown back, sent him as a messenger to convey the steganographic content [12]. The
Greeks received warning about the intention of invasion by Xerxes from a message
underneath a writing tablet covered by wax [3, 84]. In more recent history,
invisible ink was used as a form of steganography during World War II [12, 59] to
establish covert communication.
An application of steganography was reported in the literature around the 1980s,
when British Prime Minister Margaret Thatcher had word processors programmed
to encode each user's identity in the word spacing, in order to trace the disloyal
ministers responsible for the leaks of cabinet documents [2, 3].
The ongoing development of computer and network technologies provides an excel-
lent new channel for steganography. Most digital documents contain redundancy.
This means that there are parts of documents that can be modified without an
impact on their quality. The redundant parts of a document can be identified in
many distinct ways. Consider an image. Typically, margins of the image do not
convey any significant information and they can be used to hide a secret message.
Also, some pixels of the image can be modified to carry a small number of secret
bits, as small modifications (e.g., to the least significant bits of pixels) will not be noticeable
to an unsuspecting user. As the redundant parts of a digital document can be
determined in a variety of ways, many steganographic methods can be developed.
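The least-significant-bit idea mentioned above can be sketched in a few lines. This is a generic illustration of LSB embedding, not one of the specific schemes analysed in this thesis, and the pixel values and message bits are invented for the example.

```python
def embed_lsb(pixels, bits):
    """Hide each message bit in the least significant bit of a pixel value."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego

def extract_lsb(pixels, n_bits):
    """Recover the hidden bits by reading each pixel's LSB."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [200, 201, 98, 130, 47, 255, 16, 77]  # hypothetical 8-bit greyscale values
message = [1, 0, 1, 1]
stego = embed_lsb(cover, message)
assert extract_lsb(stego, len(message)) == message
assert all(abs(c - s) <= 1 for c, s in zip(cover, stego))  # each pixel changes by at most 1
```

Because each pixel value changes by at most one grey level, the modification is imperceptible in a typical greyscale image; in a binary image, by contrast, flipping a pixel turns black into white, which is exactly why binary image steganography requires a more careful choice of pixels.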
Steganography mainly considers methods and techniques that can create covert
communication channels for the unobtrusive transmission of information for
military purposes.
Steganography is also used for automatic monitoring of radio advertisements, in-
dexing of videomail (to embed comments) and medical imaging (to embed infor-
mation like patient and physician names, DNA sequences and other particulars)
[3]. Other applications include: smart video-audio synchronization, secure and
invisible storage of confidential information, identity cards (to embed individuals’
details) and checksum embedding [12].
Steganography is also used for the less dramatic purpose of watermarking. The
applications of watermarking mainly involve the protection of intellectual property
such as ownership protection, file duplication management, document authentica-
tion (by inserting an appropriate digital signature) and file annotation.
1.1 Motivations
Like most other areas, steganography has thrived in the digital era. Many inter-
esting steganographic techniques have been created, and their continuing evolution
is guaranteed by a growing need for information security. Inevitably, these
techniques are potentially open to abuse and can be used by criminals and terrorists.
An article from USA Today stated that steganography was used by terrorists [60],
although there was little evidence to substantiate this claim [79]. Nonetheless,
the 9/11 incident triggered immediate concern about the possibility that
steganography could be used in terrorist planning. In addition, several reports
from the literature stated that steganography has been suspected as a possible
means of covert communication and planning of terrorist attacks [6, 103, 52].
A training manual for the Mujahideen, which contains an exposition on image
steganography over the Internet, is also reported in Hogan's PhD thesis [52]. While
the use of steganography by terrorists initially appeared doubtful, it has since
become accepted as plausible and should be treated seriously.
In less drastic cases, those who wish to evade surveillance (e.g., people who have
reason to fear punishment for expressing sensitive political thoughts) can use
steganography. For example, the communication between members of a political
dissident organisation is usually under surveillance. The adversary (i.e., government
agencies) may arrest the dissidents if evidence of sensitive issues being
discussed and planned is found. Therefore, steganography may be the safest form
of communication between dissidents. There are a large number of steganographic
tools available as commercial software or freeware, which can be easily downloaded.
With these tools, accomplishing such activities will become even simpler¹. As a
result, this has created unique challenges for law enforcement agencies.
Digital media and information technology have developed rapidly and are ubiq-
uitous. Information is stored digitally and is abundant. Specifically, there are
a multitude of daily tasks that involve dealing with documents. The originals
of these documents might be digital or they may be converted from hardcopies
into appropriate digital formats. In general, the majority of documents are binary
(black and white), which consist of foreground (black) and background (white).
Scanning such a document produces a binary image that can potentially be used as
a medium for steganography. This deserves a careful analysis.
Despite the importance and widespread use of binary images in steganography, this area
has received little attention, especially the steganalysis of binary image steganog-
raphy. More research is found on the more commonplace steganalysis of greyscale
and colour images; however, these techniques cannot be directly used to analyse
binary image steganography. Therefore, a more appropriate and effective set of
techniques should be developed.
1.2 Research Problems
In general, the steganalysis techniques can be categorised into six levels depending
on how much information about the hidden messages we require. These levels
(ordered according to the increasing amount of information acquired) are as follows:
❐ Differentiation between cover and stego documents—this is the first step in
steganalysis and the purpose of this technique is to determine if a given
document carries a hidden message.
❐ Identification of steganographic method—this technique identifies the type of
steganographic method used and it is the so-called multi-class steganalysis.
❐ Estimation of the length of a hidden message—this technique reveals the
amount of the embedded message.
❐ Identification of stego-bearing pixels—this technique uncovers the exact lo-
cations of the pixels used to carry the message bits.
❐ Retrieval of stegokey—this technique provides access to the stego-bearing
pixels as well as the embedding sequence.

❐ Message extraction—this technique normally involves extracting and deciphering
the hidden message to obtain a meaningful message.

¹A list of free steganographic tools can be found in the citation entry #25 given in [12].
1.3 Objectives
The main part of the thesis is steganalysis of information hiding techniques. The
task of steganalysis is to design an algorithm that can tell a cover document
apart from a copy of it that carries a hidden message. The larger part of the
steganalysis work published so far deals with greyscale and colour images. We
consider the less explored area of binary image steganography, which is becoming
increasingly important for electronic publishing, document distribution, the
management of printed documents and electronic libraries.
To summarise, our main objectives cover the following:
❐ To study techniques that can be applied to distinguish images containing
hidden secret messages from those without. Such techniques will serve as an
automated system for analysing a large number of images.
❐ To evaluate the functionality of the steganalysis technique across different
steganographic methods. In particular, we are going to investigate how the
steganalysis technique could be used to detect new and unknown stegano-
graphic methods.
❐ To investigate different types of binary image steganography. This is impor-
tant to gain an understanding of the internal mechanism used during the
embedding operation.
❐ To make contributions that will extend the steganalysis technique to extract
additional secret parameters. These secret parameters include hidden mes-
sage length, type of steganographic method used, locations of stego-bearing
pixels and secret key.
Note that there are two aspects of steganalysis. The first relates to the attempt
to break or attack a steganographic method; the second uses steganalysis as an
effective way of evaluating and measuring the security of steganography. This work studies
steganalysis in terms of the first aspect. In particular, we aim to carry out different
levels of analysis to extract the relevant secret parameters.
Figure 1.1: Overview of the thesis. The chapters fall into three parts: Background and Review (Chapters 2 and 3), Binary Image Steganalysis (Chapter 4: blind steganalysis; Chapter 5: multi-class steganalysis; Chapter 6: message length estimation; Chapter 7: payload location identification) and Steganalysis Enhancement (Chapter 8: feature-pooling steganalysis; Chapter 9: improving JPEG image steganalysis), framed by the Introduction (Chapter 1) and Conclusion (Chapter 10).
1.4 Research Overview
The general structure of the thesis is shown in Figure 1.1. The chapters can
be divided into the following three parts: background and review, binary image
steganalysis and steganalysis enhancement. The background and review part de-
scribes the main developments and concepts in steganography and its analysis. It
also describes the state of the art and major publications that have influenced the
research developments in the field. The binary image steganalysis part presents
techniques to counteract binary image steganography. The underlying ideas are
to employ statistical techniques to analyse the given images. The steganalysis
enhancement part provides improvement to some of the existing steganalysis tech-
niques that deal with JPEG images.
1.4.1 Contributions
The major contributions of this thesis are listed below.
❐ Blind steganalysis. We have developed a steganalysis technique to distin-
guish a stego image from a cover image. Mainly, we have broken several
steganographic methods from the literature. This technique uses an image
processing technique that extracts sensitive statistical data as the feature
set. From the feature set, it employs a classifier to determine the existence of
a secret message. In addition, this technique can be refined and used to detect
different types of steganographic methods. This property is important
when dealing with new and unknown steganographic methods.
❐ Multi-class steganalysis. We have extended our blind steganalysis to deter-
mine the type of steganographic method used to produce the stego image.
This is important information that allows an adversary to mount a more
specific attack. To the best of our knowledge, this is the first multi-class
steganalysis technique developed particularly to attack binary image
steganography.
❐ Message length estimation. We have designed a simple yet effective technique
based on a first-order statistic to estimate the length of an embedded message.
This estimation is crucial and normally is required if we intend to extract a
hidden message. We have identified that the notches and protrusions can be
utilised to approximate the degree of image distortion caused by the embedding
operation. In particular, this technique attacks the steganographic method
developed in [69].
❐ Steganographic payload location identification. We have presented a tech-
nique to identify the locations where hidden message bits are embedded.
This technique is one of the very few in the literature that is able to
extract additional secret information. This information is very important
for an adversary who wishes to remove a hidden message or deceive the
communicating parties.
❐ Enhancement of existing steganalysis techniques. We have proposed improvements
to existing JPEG image steganalysis. Specifically, we select and combine
several types of features from several existing steganalysis techniques
by using a feature selection technique to form a more powerful blind ste-
ganalysis. We have shown that the technique has improved the detection
accuracy and also reduced the computational resources. We also show that
by minimising the influence of image content, the detection accuracy can be
improved.
1.4.2 Organisation of the Thesis
The rest of the thesis is organised into nine chapters.
Chapter 2 introduces some background to explore the state-of-the-art techniques
studied in this work. Additionally, we introduce the fundamental concepts that
will be used in the following chapters. More precisely, this chapter gives short in-
troductions to the field, including the definitions, terms, synonyms and taxonomy.
Chapter 3 reviews the literature related to our work. We select several steganalysis
techniques that are going to be analysed in the thesis. To make the presentation as
meaningful as possible, the reviews are organised into different levels of analysis.
There is a myriad of possible steganographic methods available; however, we will
discuss only the methods selected for our analysis. Please refer to [12] for a
comprehensive review of steganography.
Our steganalysis starts from finding an algorithm that is able to distinguish a cover
image from a stego one. This work employs pattern recognition methodology to
perform the classification. Our focus is to extract a discriminative feature set to
enable accurate detection of the existence of secret messages. This analysis was
published in [20] and is presented in Chapter 4.
Chapter 5 discusses an algorithm for identification of a steganographic method
that has been used to embed a secret message into a binary image. We assume
that the collection of possible methods is known. The objective of this analysis is
twofold: to differentiate an image with a hidden message from one without and to
identify the type of steganographic method used. This analysis is an extension of
the work presented in Chapter 4 to form a more powerful multi-class steganalysis.
This work has been published in [19].
In Chapter 6, we present a technique for estimating the length of a hidden message
embedded in a binary image. This estimated length is one of the important secret
steganographic parameters and is usually required to accomplish further analysis,
such as retrieving the stegokey shared between the sender and receiver. The
technique presented in this chapter has also been published in [21].
The work done in the previous chapters so far has enabled us to discriminate
images with a hidden message from those without one. However, the ability to
discriminate images does not enable us to locate the hidden message. Therefore, we
wish to investigate the identification of the locations of hidden message bits in an image.
The work is based on the concept developed by Ker [62] where it is assumed that
we may access different stego images with message bits embedded in the same
locations. This assumption is possible when the same stegokey is reused for a
batch of secret communications. The essential difference is the medium under
analysis, namely the binary image, which is known to have modest statistical
characteristics. This work is presented in Chapter 7. An initial study of this
chapter has been published in [22].
Although the previous chapters focused primarily on binary image steganalysis,
we have also paid attention to the steganalysis in other image domains. Our
contribution to greyscale image steganalysis is supplementary, but is as important
as that of the other chapters and is presented in Chapters 8 and 9. This work
can be considered an adjunct to existing steganalysis techniques that contributes
some enhancements. The enhancements discussed in Chapters 8 and 9 have been
published in [17] and [18], respectively.
We conclude the thesis in Chapter 10 where we discuss possible future directions
for the research.
Chapter 2
Background and Concepts
This chapter introduces and defines the concepts used throughout this thesis and
provides relevant background information. We start by providing an overview
of steganography and a formal definition. We also provide a description of
its counterpart, namely steganalysis. We discuss different types of steganalysis,
which are referred to as different levels of analysis. For steganalysis that involves
classification, we dedicate a section that discusses different types of classifiers.
Finally, since this thesis focuses on the analysis of image steganography, we also
provide a description of a variety of common digital images used for steganography.
2.1 Overview of Steganography
Usually cryptography is used to protect a communication from eavesdropping.
Messages are encrypted and only a rightful recipient can decrypt and read the mes-
sages. However, encrypted messages are conspicuous, which might arouse the suspicion
of an eavesdropper. Consequently, the communication becomes susceptible to
attacks.
Steganography is an alternative method for privacy and security. Instead of en-
crypting, we can hide the messages in an innocuous-looking medium (a carrier) so
that their existence is not revealed. Whereas the goal of cryptography is to protect
the content of messages, the goal of steganography is to hide their existence. An
advantage of steganography is that it can be employed to secretly transmit mes-
sages without the fact of the transmission being discovered. Often, cryptography
and steganography are used together to achieve higher security.
Figure 2.1: General model of steganography
Steganography can be mathematically defined as follows:
Emb : C × M × K → S,
Ext : S × K → M, (2.1)
such that Emb(C,M,K) = S and Ext(S,K) = M . Emb and Ext are the embed-
ding and extraction mapping functions, respectively. C is the cover medium, S is
the medium embedded with message M and K denotes the key.
Figure 2.1 shows a simple representation of the generic embedding and extrac-
tion operation in steganography. During the embedding operation, a message is
inserted into the medium by altering some portion of it. The extraction oper-
ation involves the recovery of the message from the medium. In this example,
the message is embedded inside a carrier and is transmitted via a public channel
(e.g., the internet). At the receiving site, the message is extracted using the key
shared between the sender and receiver. The message is the hidden information
and can be plain text, cipher text, an image or anything that can be converted into
a stream of bits.
Consider a typical image steganographic scheme. In the embedding operation, a
secret message is transformed into a stream of bits, which is embedded into the
least significant bits (LSBs) of the image pixels. The embedding overwrites the
pixel LSB with the message bit if the pixel LSB and message bit do not match.
Otherwise, no changes are necessary. For the extraction operation, message bits
are retrieved from pixel LSBs and combined to form the secret message.
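The LSB mechanism described above can be sketched as follows. This is a minimal illustration of sequential LSB embedding and extraction, assuming an 8-bit greyscale cover held as a NumPy array; the function names are our own, not from any cited scheme.

```python
import numpy as np

def lsb_embed(cover, bits):
    """Embed message bits into pixel LSBs, scanning sequentially
    (left to right, top to bottom). A pixel value changes only when
    its LSB differs from the message bit."""
    stego = cover.copy().ravel()
    if len(bits) > stego.size:
        raise ValueError("message longer than image capacity")
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | b   # overwrite the LSB
    return stego.reshape(cover.shape)

def lsb_extract(stego, n_bits):
    """Recover the first n_bits message bits from pixel LSBs."""
    return [int(p) & 1 for p in stego.ravel()[:n_bits]]

cover = np.arange(16, dtype=np.uint8).reshape(4, 4)
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, message)
assert lsb_extract(stego, len(message)) == message
assert int((stego != cover).sum()) <= len(message)  # at most one change per bit
```

Note that the per-pixel distortion is at most one intensity level, which is why LSB embedding is visually imperceptible yet statistically detectable.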
There are two main selection algorithms that can be employed to embed secret
message bits: sequential and random. For sequential selection, the locations of
pixels used for embedding are selected sequentially—one after another. For in-
stance, pixels are selected from left to right and top to bottom until all message
bits are embedded. With random selection, the locations of the pixels used for
embedding are permuted and distributed over the whole image. The distribution
of the message bits is controlled by a pseudorandom number generator (PRNG)
whose seed is a secret shared by the sender and the receiver. This seed is also
called the stegokey.
The latter selection method provides better security than the former because ran-
dom selection scatters the image distortion over the whole image, which makes
it less perceptible. In addition, the complexity of tracing the selection path for
an adversary is increased when random selection is applied. Apart from this,
steganographic security can be enhanced by encrypting the secret message before
embedding it.
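The random-selection path can be illustrated with a seeded PRNG standing in for the stegokey. This is a sketch under the assumption that sender and receiver derive the permutation in the same way; the function name and parameters are ours.

```python
import numpy as np

def embedding_path(n_pixels, n_bits, stegokey):
    """Both parties seed the same PRNG with the shared stegokey and take
    the first n_bits positions of the resulting permutation, scattering
    the message bits over the whole image."""
    rng = np.random.default_rng(stegokey)
    return rng.permutation(n_pixels)[:n_bits]

sender = embedding_path(10_000, 64, stegokey=1234)
receiver = embedding_path(10_000, 64, stegokey=1234)
assert np.array_equal(sender, receiver)        # same key: identical path
adversary = embedding_path(10_000, 64, stegokey=9999)
assert not np.array_equal(sender, adversary)   # wrong key: different path
```

The final assertion is exactly the adversary's difficulty: without the stegokey, the embedding path cannot be reproduced.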
Almost any form of digital media can be used for steganographic purposes as long
as the information in the media has redundancy. These media can be classified
into (but are not limited to) the following categories: images, videos, audio, texts, executable
files and computer file systems [67, 94, 5, 81, 29, 26, 104, 118, 46, 1, 28, 83]. The
most common medium is an image, as the large redundancy of images allows easy
embedding of messages [78]. The input image used in the embedding operation
is called the cover image; the generated output image (with the secret message
embedded in it) is called the stego image. Ideally, the cover and stego images
should appear identical—it should be difficult for an unsuspecting user to tell
apart the stego image from the cover image.
A list of possible choices for cover images includes binary (black and white),
greyscale and colour images. Tseng and Pan [107] developed a steganography
that embeds a secret message in a binary image, and Liang et al. [69] used binary
images in their steganography. OutGuess [90] and F5 [110] are examples of
steganography applied to greyscale and colour images. A more recent stegano-
graphic method developed by Yang (see [117]) uses colour images.
2.2 Steganalysis—Model of Adversary
The invasive nature of steganography leaves detectable traces within the stego
image. This allows an adversary to use steganalysis techniques to reveal that a
secret communication is taking place. Sometimes, an adversary is also referred
to as a warden. In general, there are two types of warden: passive and active.
A passive warden only examines the communication and wishes to know if the
communication contains some hidden messages. The warden does not modify
the content of the communication. For example, the communication is allowed if
no evidence of a secret message is found. Otherwise, it is blocked. On the other
hand, an active warden may introduce distortion to interrupt and destroy the
communication even when there is no evidence of secret communication. Most
current steganographic methods are designed for the passive warden scenario.
Without loss of generality, we will use the term adversary instead of warden in all
the following steganalysis scenarios.
Besides the warden scenario discussed above, sometimes an adversary may not
have the authority or resources to block the communication. Then, the adversary
might wish to acquire related secret information (parameters) or even to extract
the secret message. Note that our work is based on this type of adversary, who
wants to extract information about a secret message. We will discuss this at length
in the next section.
In general, there are two types of steganalysis: targeted and blind. Targeted ste-
ganalysis is designed to attack one particular embedding algorithm. For example,
the work in [7, 49, 57, 42] is considered targeted steganalysis. Targeted steganal-
ysis can produce more accurate results, but it normally fails if the embedding
algorithm used is not the target.
Blind steganalysis can be considered a universal technique for detecting different
types of steganography. Because blind steganalysis can detect a wider class of
steganographic techniques, it is generally less accurate; however, blind steganalysis
can detect new steganographic techniques where there is no targeted steganalysis
available yet. In other words, blind steganalysis is an irreplaceable detection tool
if the embedding algorithm is unknown or secret. The feature-based steganalysis
developed in [35] is one example of successful blind steganalysis. Other examples
are to be found in [99, 70].
The most widely used definition of steganographic security is based on Cachin's
scheme [8]. Let the distributions of cover images and stego images be denoted
P_C and P_S, respectively. Cachin defined steganographic security by comparing the
two distributions, P_C and P_S. The comparison can be made using the Kullback-Leibler
distance, defined as follows:
D(P_C ‖ P_S) = Σ_{c ∈ C} P_C(c) log [P_C(c) / P_S(c)]. (2.2)
When D(P_C ‖ P_S) = 0, the distribution of stego images, P_S, is identical
to the distribution of cover images, P_C. This implies that the steganography is per-
fectly secure, because it is impossible for the adversary to distinguish between
cover and stego images. If D(P_C ‖ P_S) ≤ ε, then Cachin defined the steganogra-
phy as ε-secure. Thus, the smaller ε is, the greater the likelihood that a covert
communication (i.e., steganography) will not be detected.
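Equation (2.2) can be computed directly on empirical histograms. The sketch below is ours; it smooths the distributions with a small constant so the logarithm is defined even for empty bins.

```python
import numpy as np

def kl_distance(p_c, p_s, eps=1e-12):
    """D(P_C || P_S) of Equation (2.2) for discrete distributions
    given as (possibly unnormalised) histograms."""
    p_c = np.asarray(p_c, dtype=float) + eps
    p_s = np.asarray(p_s, dtype=float) + eps
    p_c /= p_c.sum()
    p_s /= p_s.sum()
    return float(np.sum(p_c * np.log(p_c / p_s)))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]
assert kl_distance(uniform, uniform) < 1e-9   # identical: perfectly secure
assert kl_distance(skewed, uniform) > 0.1     # measurable deviation: detectable
```

In practice P_C and P_S are unknown and must be estimated from samples, so this quantity serves as a theoretical benchmark rather than a direct detector.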
As discussed in [25], another possible way to define steganography security is based
on a specific steganalysis technique. Alternatively, one could define the security
with respect to the inability of an adversary to prove the existence of covert
communication. In other words, a steganographic method may be considered
“practically secure” if no existing steganalysis technique can be used to mount a
successful attack.
2.3 Level of Analysis
Under ideal circumstances, an adversary applying steganalysis intends to extract
the full hidden information. This task can be very difficult, or even impossible
to achieve. Thus, the adversary may start steganalysis with more realistic and
modest goals in mind, such as restricting the effort to differentiating cover and
stego images, classifying the embedding technique, estimating the length of hidden
messages, identifying the locations where bits of hidden information are embedded
and retrieving the stegokey. Achieving some of these goals allows improvement of
the steganalysis, making it more effective and appropriate for the steganographic
method.
The first step in analysing steganography can be distinguishing cover from stego
images. This involves analysing the characteristics of the image and looking for
the evidence of abnormalities. This step is plausible because the embedding op-
eration will distort the image content and produce deviations from normal image
characteristics. For example, the first-order statistic of a stego image tends to
exhibit histogram bin pairing, an abnormal characteristic that practically never
occurs in a cover image. This analysis is commonly known as the most basic level
of blind steganalysis.
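The bin-pairing effect can be demonstrated on synthetic data: full-rate LSB embedding of random message bits evens out the counts within each pair of intensity values (2i, 2i+1). The cover below is artificial, constructed only to make the effect visible, and is not drawn from real images.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 8-bit cover whose first-order statistic is skewed within
# pairs: even intensities are three times as likely as odd ones
values = rng.integers(0, 128, 100_000) * 2
cover = values + (rng.random(100_000) < 0.25)
# full-rate LSB embedding of a random message
stego = (cover & 0xFE) | rng.integers(0, 2, cover.size)

def pair_imbalance(img):
    """Total imbalance between the counts of bins 2i and 2i+1."""
    h = np.bincount(img, minlength=256)
    return np.abs(h[0::2] - h[1::2]).sum() / h.sum()

assert pair_imbalance(cover) > 0.4   # cover: pairs far from equal
assert pair_imbalance(stego) < 0.05  # stego: bins within each pair even out
```

This equalisation within pairs of values is the abnormality that first-order steganalysis looks for.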
It is also possible to extend this level of blind steganalysis to a more involved
level, known as multi-class steganalysis. From a practical perspective, multi-class
steganalysis is similar to the basic level; however, instead of classifying two classes
(cover and stego images), multi-class steganalysis can classify images into more
classes that come from different types of stego images produced by different em-
bedding techniques. Hence, the task of multi-class steganalysis is to identify the
embedding algorithm applied to produce a given stego image, or to classify it as
a cover image if no embedding is performed on it.
Normally, to avoid suspicion, the amount of message embedded is far less than the
image can accommodate. Thus, an adversary cannot tell how much information
has been embedded based on the size of the image and a statistical approach needs
to be utilised to estimate the hidden message length. Note that the terms message,
hidden message and secret message are used interchangeably. The message length
is the number of bits embedded in the image. It is normally defined by the ratio
between the number of embedded message bits and the maximum number of bits
that can be embedded in a given image. It can also be measured in bits per pixel
(bpp).
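As a quick worked example of these measures (the numbers are illustrative, not taken from any experiment in this thesis):

```python
def relative_length(n_message_bits, n_pixels, capacity_bits_per_pixel=1):
    """Ratio of embedded bits to the maximum embeddable bits; for a
    1-bit-per-pixel scheme this coincides with bits per pixel (bpp)."""
    return n_message_bits / (n_pixels * capacity_bits_per_pixel)

# a 512x512 image carrying a 32768-bit message under a 1 bpp capacity
rate = relative_length(32_768, 512 * 512)
assert rate == 0.125   # i.e., 12.5% of capacity, or 0.125 bpp
```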
The analysis levels discussed so far cannot reveal the locations where hidden mes-
sage bits are embedded. However, with the help of estimated hidden message
length as side information, an adversary can proceed to identify the stego-bearing
pixels. Identifying the exact location of stego-bearing pixels is not easy for two
reasons. First, the message bits are often randomly scattered throughout the
whole image. Second, it is difficult or impossible to detect hidden message bits
that are unchanged with respect to the cover image.
Identifying the stego-bearing pixels locates the message bits, but does not deter-
mine the sequence of the message bits. Thus, the next level of steganography
analysis is to retrieve the stegokey. Successfully retrieving the stegokey can be
considered a bigger achievement—it provides access to the stego-bearing pixels as
well as the embedding sequence. In other words, a correct stegokey will give in-
formation about the order of bits that create the hidden message. Studies related
to each analysis technique will be given and elaborated in Section 3.2.
2.4 Blind Steganalysis as Pattern Recognition
A classification problem involves dividing a set of possible ob-
jects into disjoint subsets, where each subset forms a class. Usually, pattern
recognition techniques are used to solve this problem. Pattern recognition is an
important aspect of Computer Science that focuses on recognising complex patterns
from samples and making intelligent decisions based on those patterns.

Figure 2.2: General framework of blind steganalysis
As discussed in Section 2.3, blind steganalysis examines the image characteristics
(samples) and determines whether these characteristics exhibit abnormalities (de-
cision making). This means that, given an image, the steganalysis should be able
to decide the class (cover or stego) in which the image belongs. Hence, the prob-
lem of blind steganalysis can be considered a classification problem and techniques
from pattern recognition can be employed.
Different embedding techniques are thought to produce different changes in image
characteristics. In other words, the characteristics of cover and stego images differ,
and those resulting from different stego images (stego images produced by different
embedding techniques) differ as well. Therefore, it is possible to extend the pattern
recognition techniques to differentiate and classify these images. This extended
blind steganalysis is known as multi-class steganalysis.
As with any pattern recognition methodology, blind and multi-class steganaly-
sis consist of two processes—feature extraction and classification. The general
framework for blind steganalysis is shown in Figure 2.2.
2.4.1 Feature Extraction
Feature extraction is a process of constructing a set of discriminative statistical
descriptors or distinctive statistical attributes from an image. These descriptors or
attributes are called features. Alternatively, feature extraction can be considered
a form of dimensionality reduction. It is desirable that the extracted features
be sensitive to embedding artefacts rather than to the image content.
Some examples of the features extracted in the early stages of blind steganalysis
research include image quality metrics, wavelet decompositions and moments of
image statistics histograms. These features were used in the blind steganalysis
developed in [4], [73] and [48], respectively. More recently developed features include
the Markov empirical transition matrix, moments of image statistics from the spatial and
frequency domains, and the co-occurrence matrix, which are employed in [54], [14]
and [116], respectively. The details of these features will be covered in Section 3.2.
2.4.2 Classification
Classification identifies or categorises images into classes (such as a cover or stego
image) based on their feature values. The primary classification involved in ste-
ganalysis is supervised learning. In supervised learning, a set of training samples
(consisting of input features and class labels) is fed in to train the classifier. Once
the classifier is trained (trained model), it predicts the class label based on the
given features.
Some of the common classifiers used in steganalysis include multivariate regression,
Fisher linear discriminant, neural network and support vector machines (SVM).
Multivariate regression [11] provides a trained model, which consists of regres-
sion coefficients. During training, the regression coefficients are estimated by
minimising the mean square error. For example, let the target label (or class label)
be y_i and let x_ij denote the features, where i = 1, . . . , N indexes the ith image
and j = 1, . . . , n indexes the jth feature; then the linear model is as
shown below:

y_1 = β_1 x_11 + β_2 x_12 + · · · + β_n x_1n + ε_1,
y_2 = β_1 x_21 + β_2 x_22 + · · · + β_n x_2n + ε_2,
⋮
y_N = β_1 x_N1 + β_2 x_N2 + · · · + β_n x_Nn + ε_N, (2.3)
where the β_j are the regression coefficients and the ε_i are zero-mean Gaussian noise terms. N and
n are the total number of samples and features, respectively. With these regression
coefficients, a given image can be classified by regressing its features. The
computed target value is then compared with a threshold to determine the correct
image class.
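A least-squares fit of the coefficients in (2.3) can be sketched on synthetic features. The data, the added intercept term (which the equations omit) and the 0.5 threshold are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
X = rng.normal(size=(n, d))                 # features x_ij
y = (rng.random(n) < 0.5).astype(float)     # class labels y_i (0 = cover, 1 = stego)
X[:, 0] += 3.0 * y                          # embedding shifts the first feature

# minimum-mean-square-error estimate of the regression coefficients
A = np.hstack([np.ones((n, 1)), X])         # intercept column + features
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# classify by regressing the features and thresholding the target value
pred = (A @ beta > 0.5).astype(float)
assert (pred == y).mean() > 0.85            # most images classified correctly
```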
Fisher linear discriminant is a classification method that projects multi-
dimensional features x onto a linear space [16]. Suppose two classes of obser-
vations have means µ_{y=0} and µ_{y=1}, and covariances Σ_{y=0} and Σ_{y=1}; then the linear
combination of features w^T x will have mean w^T µ_{y=i} and variance w^T Σ_{y=i} w
for i = 0, 1. The Fisher linear discriminant is defined as the linear combination of features
that maximises the following separation S:

S = σ²_{between class} / σ²_{within class}
  = (w^T µ_{y=0} − w^T µ_{y=1})² / (w^T Σ_{y=0} w + w^T Σ_{y=1} w)
  = (w^T (µ_{y=0} − µ_{y=1}))² / (w^T (Σ_{y=0} + Σ_{y=1}) w). (2.4)
Next, it can be shown that the optimal w is given by

w = (Σ_{y=0} + Σ_{y=1})^{−1} (µ_{y=0} − µ_{y=1}). (2.5)
Finally, an image can be classified by linearly combining its extracted features
with w and comparing the result to a threshold.
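Equation (2.5) translates directly into code. Below is a sketch on synthetic two-dimensional features; the class means, spreads and midpoint threshold are invented for illustration.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Optimal projection w of Equation (2.5), estimated from samples
    of the two classes."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return np.linalg.solve(S, mu0 - mu1)       # (Sigma0 + Sigma1)^-1 (mu0 - mu1)

rng = np.random.default_rng(2)
X0 = rng.normal(loc=[0.0, 0.0], size=(500, 2))   # cover-image features
X1 = rng.normal(loc=[3.0, 1.0], size=(500, 2))   # stego-image features
w = fisher_direction(X0, X1)

# classify by projecting onto w and thresholding at the midpoint
threshold = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())
accuracy = 0.5 * ((X0 @ w > threshold).mean() + (X1 @ w < threshold).mean())
assert accuracy > 0.9
```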
Artificial neural network, usually called neural network, is an information-
processing model inspired by the way the biological nervous system (e.g., the
brain) processes information. The basic building block of the neural network is
the processing element (PE), commonly known as the neuron. The processing
capabilities are derived from a collection of interconnected neurons (PEs). Math-
ematically, a neural network can be considered a mapping function F : Xn → Y ,
where n dimensions of features X are the inputs to the neural network, with deci-
sion values Y (class labels) [119]. The function F can be defined as a composition
of other functions Gi = (G1, . . . , Gm). In addition, function Gi can further be
defined as a composition of other functions. The composition of these function
definitions forms the neural network. The structure of these functions and their de-
pendencies between inputs and outputs will determine the type of neural network.
The most common type used in classification is the feedforward neural network
trained with backpropagation. As with any other supervised learning, the classification
Figure 2.3: Two-class SVM classification
process in a neural network involves two operations—training and testing. During
training, the neural network learns to associate outputs with input patterns. This
is carried out by systematically modifying the weights of the inputs throughout
the neural network. When the neural network is used for testing, it identifies
the input pattern and tries to determine the associated output. When the input
pattern has no associated output, the neural network provides an output that
corresponds to the best match of the learned input patterns.
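The composition of functions described above can be made concrete with a tiny fixed-weight feedforward network; the weights below are arbitrary, chosen only to show the forward pass, not trained on any data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """F : X^n -> Y as a composition of functions: each layer is a
    weighted sum of its inputs passed through a nonlinearity."""
    g = sigmoid(W1 @ x + b1)      # hidden neurons (the inner functions G_i)
    return sigmoid(W2 @ g + b2)   # output neuron: decision value in (0, 1)

# arbitrary fixed weights: 3 input features, 2 hidden neurons, 1 output
W1 = np.array([[1.0, -1.0, 0.5],
               [0.2,  0.3, -0.7]])
b1 = np.array([0.0, 0.1])
W2 = np.array([[1.5, -2.0]])
b2 = np.array([0.2])

y_out = forward(np.array([0.5, -0.2, 0.1]), W1, b1, W2, b2)
assert y_out.shape == (1,) and 0.0 < y_out[0] < 1.0   # a valid decision value
```

Training would adjust W1, b1, W2 and b2 by backpropagation so that the decision values match the class labels of the training images.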
Support vector machines (SVM) are a classification technique that can learn
from samples. More precisely, we can train an SVM to recognise and assign class
labels based on a given collection of data (i.e., features). For example, we train the
SVM to differentiate cover images from stego images by examining the extracted
features from many instances of cover and stego images. To illustrate the point,
let us interpret this example using the illustration shown in Figure 2.3. The X and
Y axes represent two different features. Cover and stego images are represented by
circles and stars, respectively. Given an unknown image (represented by a square),
the SVM is required to predict the class to which it belongs.
This example is easy, as the two classes (cover and stego) form two distinct clusters
that can be separated by a straight line. Hence, the SVM finds the separating line
and determines the cluster for the unknown image. Finding the right separating
line is crucial, and it is determined during training. In practice, the feature
dimensionality is higher and we need a separating plane, known as a separating
hyperplane, instead of a line.
Thus, the goal of SVM is to find a separating hyperplane that can effectively
separate classes. To do that, the SVM will try to maximise the margin of the sep-
arating hyperplane during training. Obtaining this maximum-margin hyperplane
will optimise the SVM’s ability to predict the correct class of an unknown object
(image).
However, there are often non-separable datasets that cannot be separated by a
straight separating line or flat plane. The solution to this difficulty is to use a
kernel function. A kernel function is a mathematical routine that projects the
features from a low-dimensional space into a higher dimensional space. Note that
the choice of kernel function will affect the classification accuracy. For additional
reading on SVMs, see [80].
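The kernel idea can be illustrated with an explicit (polynomial-style) feature map instead of an implicit kernel: points that no straight line can separate in the plane become separable by a single flat hyperplane after lifting. The data and map below are our illustration, not the SVM training procedure itself.

```python
import numpy as np

def lift(X):
    """Explicit quadratic feature map: append x1^2 + x2^2, so a circle
    in the plane becomes a hyperplane in the lifted space."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

rng = np.random.default_rng(3)
# non-separable in 2-D: one class inside a circle, the other outside it
radius = np.concatenate([rng.uniform(0, 1, 200), rng.uniform(2, 3, 200)])
angle = rng.uniform(0, 2 * np.pi, 400)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([-np.ones(200), np.ones(200)])

Z = lift(X)
# in the lifted space the flat plane x1^2 + x2^2 = 2.5 separates perfectly
pred = np.where(Z[:, 2] > 2.5, 1.0, -1.0)
assert np.array_equal(pred, y)
```

An SVM with a polynomial or RBF kernel performs this lifting implicitly and, in addition, chooses the maximum-margin hyperplane in the lifted space.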
2.5 Digital Images
As discussed in the overview of steganography section, practically any form of
digital media can be used to carry secret messages. Examples of these media
include image, video, audio, text, etc. By far the most popular choice is the image.
In this section we will introduce various digital images, since the vast majority of
research in steganography is concerned with image steganography. In addition,
the work of the thesis is concentrated on image steganalysis.
A digital image is produced through a process called digitisation. Digitising an
image involves converting analogue information into digital information; thus, a
digital image is the representation of an original image by discrete sets of points.
Each of these points is called a picture element or pixel.
Pixels are normally arranged in a two-dimensional grid corresponding to the spatial
coordinates in the original image. The number of distinct colours in a digital
image depends on the number of bits per pixel (bpp). Hence, the types of digital
image can be classified according to the number of bits per pixel. There are three
common types of digital image:
❐ Binary image. In a binary image, only one bpp is allocated for each pixel.
Since a bit has only two possible states (on or off), each pixel in a binary
image must represent one of two colours. Usually, the two colours used are
black and white. A binary image is also called a bi-level image.
❐ Greyscale image. A greyscale image is a digital image in which the only
colours are shades of grey. The darkest possible shade is black, whereas
the lightest possible shade is white. Normally, there are eight bits per pixel
assigned for a greyscale image. This creates 256 possible different shades of
grey.
❐ Colour image. In general, a pixel in a colour image consists of several primary
colours. Red, green and blue are the most commonly used primary colours.
Each primary colour forms a single component called a channel, with eight
bits usually allocated for each channel, producing 24 bits per pixel. This
corresponds to roughly 16.7 million possible distinct colours. When the
channels in a colour image are split, each forms a different greyscale image.
2.5.1 Image File Formats
After digitisation, a digital image can be stored in a specific file format. Although
many file formats exist, the major formats include BMP, JPEG, TIFF, GIF and
PNG. Images stored in these formats are considered raster graphics. Another
type of graphic image is a vector graphic image. Unlike raster graphics, which use
pixels, vector graphics use geometric primitives such as points, lines and polygons
to represent the images. The rendering of the geometric primitives in vector
graphics is based on mathematical equations. This thesis focuses on raster rather
than vector graphics.
In a raw image, the data captured from a digital device sensor are preserved and
stored in a file. The data captured are raw in the sense that no adjustment or
processing is applied. The data are merely a collection of pixel values captured
at the time of exposure. Note that there is no standard for a raw image and it is
device dependent. Hence, a raw image is often considered an image, rather than
a standard image file format.
The bitmap or BMP format is considered a simple image file format. Normally
the data is uncompressed and easy to manipulate. However, the uncompressed
BMP format gives a BMP image a larger file size than that of a compressed
image. A BMP image can also use a colour palette for indexed-colour images.
Nonetheless, a colour palette is not used for BMP images of 16 bpp or higher.
The Joint Photographic Experts Group (JPEG) format is by far the most com-
mon image file format. JPEG images are very popular and are primarily used for
photographs. Their popularity is due to the excellent image quality they produce
despite a smaller file size. This is achieved through lossy compression. Many
imaging applications allow users to control the level of compression. This is useful
because users can trade off image quality for a smaller file size and vice versa.
However, lossy compression reduces the image quality and cannot be reversed.
In situations where the image quality is as important as the file size, the tagged
image file format (TIFF) could be a suitable choice. The TIFF format uses
lossless compression, which reduces the image file size while preserving the original
image quality. This makes TIFF a popular image archive option. In addition, as
the name implies, the TIFF format also offers flexible information fields in the
image header called tags. These tags are very useful and can be defined to hold
application-specific information.
The graphics interchange format (GIF) uses a colour palette to produce an indexed-
colour image. It also uses lossless compression. GIF can offer optimum compres-
sion when the image contains solid colour graphics (such as a logo, diagram, draw-
ing, or clipart). In addition, GIF supports transparency and animation. These
features make GIF an excellent format for certain web images. However, GIF is
not suitable for complex photographs with continuous tones, as a GIF image can
store only 256 distinct colours.
Compared with GIF, the portable network graphics (PNG) format provides several
improvements. These include greater compression, better colour
support, gamma correction in brightness control and image transparency. The
PNG format is an alternative to GIF and is expected to become a mainstream
format for web images.
2.5.2 Spatial and Frequency Domain Images
In a general sense, an image (I) can be considered a result of the projection of a
scene (S) [34]. The spatial domain image is said to have a normal image space,
which means that each image element at location ℓ in image I is a projection at
the same location in scene S. The distance in spatial domain corresponds to the
real distance. A common example of a spatial domain image is the BMP image.
The frequency domain image has a space where each element value at location
ℓ in image I represents the rate of change over a specific distance related to the
location ℓ. A popular frequency domain image is the JPEG image.
Chapter 3
Literature Review
This chapter considers research relevant to both steganography and steganalysis.
Steganography is presented in Section 3.1. We give an overview of different types
of steganography with the emphasis on image steganography. In particular, we
discuss binary image steganography in the first part of the section and JPEG image
steganography in the second part. Steganalysis is discussed in Section 3.2. We
review and highlight the most relevant existing techniques in steganalysis. These
techniques are specifically used in analysing image steganography. The discussion
is divided into six subsections. We organise the discussion according to the
different levels of analysis presented in Section 2.3.
3.1 Steganography
This section discusses a selection of steganographic methods; these par-
ticular methods are the subjects of our analysis. The first five subsections
discuss steganographic methods that use binary images as the cover images and
the rest of the subsections discuss methods that use JPEG images.
3.1.1 Liang et al. Binary Image Steganography
Consider a variant of boundary pixel steganography proposed by Liang et al. [69].
Boundary pixel steganography hides a message along the edges, where white and
black pixels meet—these are known as boundary pixels. That is, boundary pixels
are those pixels within the image at which a colour transition occurs between
white and black. They should not be confused with the pixels along the four
borders of an image.
To obtain higher imperceptibility, the pixel locations used for embedding are per-
muted and distributed over the whole image. The distribution of message bits
is controlled by a pseudorandom number generator whose seed is a secret shared
by the sender and the receiver of the hidden message. This seed is also called
the stegokey.
As the message bits are embedded on the boundary pixels of the image, it is
important to identify the boundary pixels and their orders unambiguously. Once
the sequence of boundary pixels is obtained, a pseudorandom number generator is
used to determine the place where the message bits should be hidden. The authors
of [69] define boundary pixels as those that have at least one neighbouring pixel
with a different intensity. For example, a white (black) pixel must have at least
one black (white) neighbouring pixel. Note that a pixel can have, at most, four
neighbours (left, right, top and bottom).
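To make this definition concrete, the following Python sketch (with hypothetical function names; NumPy is assumed, and this is our illustration rather than the authors' code) lists the boundary pixels of a binary image under the four-neighbour rule:

```python
import numpy as np

def boundary_pixels(img):
    """Return the (row, col) positions of boundary pixels in a binary image.

    A pixel is a boundary pixel if at least one of its four neighbours
    (left, right, top, bottom) has a different value, following the
    definition used by Liang et al.
    """
    rows, cols = img.shape
    found = []
    for i in range(rows):
        for j in range(cols):
            neighbours = []
            if i > 0:
                neighbours.append(img[i - 1, j])
            if i < rows - 1:
                neighbours.append(img[i + 1, j])
            if j > 0:
                neighbours.append(img[i, j - 1])
            if j < cols - 1:
                neighbours.append(img[i, j + 1])
            if any(n != img[i, j] for n in neighbours):
                found.append((i, j))
    return found
```

For a single black pixel in a white field, for instance, the black pixel and its four white neighbours are all boundary pixels.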
Not all boundary pixels are suitable for carrying message bits because embedding
a bit into an arbitrary boundary pixel may convert it into a non-boundary one. If
this happens, then the extraction will not be correct and recovery of the hidden
message is impossible.
Because of this technical difficulty, the authors have proposed a modified algo-
rithm that adds restrictions on the selection of boundary pixels for embedding. A
currently evaluated boundary pixel, P, is considered eligible for embedding if the
following two conditions are satisfied:
i. Among the four neighbouring pixels, there must exist at least two unmarked
neighbouring pixels whose pixel values differ.
ii. For each marked neighbouring pixel (if any), its neighbouring pixels (exclud-
ing the current pixel, P ) must also satisfy the first criterion.
A pixel is said to be marked if it has already been evaluated or is assigned a
(pseudorandom) index with a smaller value than the current index. In contrast,
a pixel is said to be unmarked if it is evaluated after the current pixel.
Figures 3.1 and 3.2 show some examples of eligible and ineligible pixels, respec-
tively. The shaded box represents a pixel value of zero and the white box represents
a pixel value of one. These pixels are taken from some portion of a binary image.
Pixel P is the currently evaluated pixel and the number inside each box is the
pseudorandom index. This index will indicate if a pixel is unmarked or marked.
For example, in Figure 3.1(b), the current pixel, P, has three unmarked
neighbours (left, right and top) and one marked neighbour (bottom).
Pixel P in Figure 3.1(a) is an eligible pixel because it satisfies the first condition
and it does not have any marked neighbouring pixel. Pixel P in Figure 3.1(b)
satisfies both conditions and thus, it is an eligible pixel. On the other hand,
pixel P in Figure 3.2(a) is an ineligible pixel because it does not satisfy the first
condition. Pixel P in Figure 3.2(b) only satisfies the first condition; therefore, it
is also considered ineligible.
[Figure: two 4 × 4 portions of a binary image, panels (a) and (b); each cell shows its pseudorandom index and the currently evaluated pixel is labelled P]
Figure 3.1: Example of eligible pixels
[Figure: two 4 × 4 portions of a binary image, panels (a) and (b); each cell shows its pseudorandom index and the currently evaluated pixel is labelled P]
Figure 3.2: Example of ineligible pixels
Once a boundary pixel is found eligible, the message bit is embedded by
overwriting the pixel value if the message bit does not match it; otherwise,
the pixel is left intact. This procedure is repeated to embed the remaining
message bits.
3.1.2 Pan et al. Binary Image Steganography
Motivated by Wu and Lee [113], Pan et al. developed a steganographic method
that embeds secret messages in binary images [82]. Compared with [113], this
method is more flexible in choosing cover image blocks: it uses every block
within an image to carry the secret message, giving a greater embedding
capacity. Security is also improved because the cover image is altered less.
In this embedding algorithm, a random binary matrix κ and a secret weight
matrix ω are defined and shared between the sender and receiver. Both matrices
are of size m × n. The matrix ω has elements drawn from {1, 2, . . . , 2^r − 1},
where r is the number of message bits to be embedded within a block. A given
binary image is partitioned into non-overlapping blocks, Fi of size m × n and
the following quantity, ϕi is computed:

ϕi = Σ[(Fi ⊕ κ) ⊗ ω],    (3.1)

where ⊕ and ⊗ are the bitwise exclusive-OR and pair-wise multiplication
operators, respectively, and Σ[·] denotes the arithmetic summation of all
elements of the matrix.
r message bits are embedded in block Fi by ensuring the following invariant:

ϕi ≡ mN (mod 2^r),    (3.2)

where mN = Φ(m1m2 . . . mr) is the decimal representation of the message bits
and Φ(·) denotes binary-to-decimal conversion. If the invariant already holds,
Fi is left intact. Otherwise, some pixels of Fi are altered: in most cases,
flipping one pixel corrects the mismatch, and when it does not, flipping a
second pixel guarantees that the invariant holds. Hence, at most two pixels of
Fi are altered. This method can embed up to r = ⌊log2(mn + 1)⌋ bits per block.
Successfully extracting a secret message requires the correct combination of κ
and ω, which together can be considered the stegokey. The receiver also needs
to know the parameters (m, n and r) used in the embedding. The secret message
bits embedded in a block are then extracted through Equation (3.2) as
mN = ϕi mod 2^r. The mN extracted from each block is converted into binary
bits, and the results are concatenated to form the secret message.
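The embedding and extraction steps around Equations (3.1) and (3.2) can be sketched as follows. This is an illustrative implementation only: the exhaustive search over one- and two-pixel flips stands in for the authors' constructive pixel-selection procedure (which the paper uses to guarantee at most two flips), and the names K and W stand in for κ and ω.

```python
import numpy as np

def phi(F, K, W):
    # Equation (3.1): arithmetic sum of the elements of (F XOR K) * W
    return int(np.sum((F ^ K) * W))

def embed_block(F, K, W, m, r):
    # Make phi(F) ≡ m (mod 2^r) by flipping at most two pixels.
    # Brute-force search here; the paper shows two flips always suffice.
    mod = 2 ** r
    if phi(F, K, W) % mod == m:
        return F.copy()
    cells = [(i, j) for i in range(F.shape[0]) for j in range(F.shape[1])]
    for c in cells:                                # one-pixel flips
        G = F.copy()
        G[c] ^= 1
        if phi(G, K, W) % mod == m:
            return G
    for a in range(len(cells)):                    # two-pixel flips
        for b in range(a + 1, len(cells)):
            G = F.copy()
            G[cells[a]] ^= 1
            G[cells[b]] ^= 1
            if phi(G, K, W) % mod == m:
                return G
    raise ValueError("no valid embedding found")

def extract_block(F, K, W, r):
    # Equation (3.2): the embedded value is phi mod 2^r
    return phi(F, K, W) % (2 ** r)
```

For example, with a 2 × 4 block, r = ⌊log2(8 + 1)⌋ = 3, so any value m ∈ {0, . . . , 7} can be embedded by altering at most two pixels.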
3.1.3 Tseng and Pan Binary Image Steganography
Although the method developed in [82] generally enhanced security (by altering
fewer pixels for the same amount of embedded message), the quality of the stego
image was not taken into consideration. Noise may become prominent in certain
blocks after embedding; for example, an isolated dot may appear in an entirely
black or white block.
As a sequel to the work done in [82], Tseng and Pan revised the method and
enhanced it [107]. The main contribution of this work was to maintain the image
quality through sacrificing some of the payload. According to the authors, the
image quality can be greatly improved while still maintaining a good embedding
rate—as much as r = ⌊log2(mn+1)⌋−1 bits per block, where m×n is the size of
a block. On average, r is only one bit per block less than their previous method.
To maintain image quality, the method discards any block that is either entirely
black or white. In addition, when a pixel must be flipped to carry a message bit,
the selection of which pixel to flip is governed by a distance matrix. The
distance matrix selects only a pixel whose new value (after flipping) matches
the value of the majority of its neighbouring pixels. This prevents the
generation of isolated dots, which would degrade the image quality. For
example, Figure 3.3 shows
two possible ways of flipping a pixel. Obviously, the effect of flipping will be
less visible in Figure 3.3(b) than in Figure 3.3(c). The authors also defined an
additional criterion for the secret weight matrix, ω which also improves the image
quality.
[Figure: (a) an original block of pixels; (b) and (c) the same block after flipping a pixel at two different positions]

Figure 3.3: Effect of flipping a pixel: (a) original block of pixels; (b) no isolated dot; (c) obvious isolated dot
Similar to their previous method, the maximum number of pixels that must be
altered per block to carry the message bits is, at most, two. The rest of the em-
bedding and extraction algorithms are similar to the previous method. However,
if block Fi becomes entirely black or white after embedding, it is skipped. The
alteration of that block will not be reversed and the same message bits will be em-
bedded in the next block. This is important to ensure the correctness of message
extraction.
Both methods offer the flexibility to trade payload size for security. When
increased security is necessary, the block size (parameters m and n) can be
increased. A larger block size reduces the total number of blocks per image
and therefore, at the same r bits per block, reduces the total payload.
3.1.4 Chang et al. Binary Image Steganography
The steganographic method developed by Chang et al. [10] can be considered an
improved variant of the binary image steganography developed by Pan et al.
[82]. In general, this method offers the same embedding rate as the Pan et al.
method, which is r = ⌊log2(mn + 1)⌋ bits per block (m × n is the block size).
However, this method is superior to the Pan et al. method in the sense that it
alters one pixel (at most) to embed the same amount of message bits within a
block (as opposed to two pixels in the Pan et al. method). Thus, this method
provides a higher level of security by reducing the alteration of the stego image.
In practice, the Chang et al. method also employs two matrices during embed-
ding: a random binary matrix and a serial number matrix. The main difference in
the Chang et al. method is the introduction of the serial number matrix to re-
place the secret weight matrix. This enables this method to work with less image
alteration. With the serial number matrix, r linear equations, known as general
hiding equations, are defined to embed r bits of message in a block. The general
hiding equations are used to determine the pixel suitable for flipping. To obtain
valid general hiding equations, the serial number matrix is required to have
2^r − 1 elements with non-duplicate decimal values.
For message extraction, each block is transformed using the bitwise exclusive-
OR operator with the random binary matrix. For each block, r general hiding
equations are defined through the serial number matrix. The parities of results
calculated from the r general hiding equations are obtained as the message bits.
Clearly, the random binary matrix and serial number matrix are used as the
stegokey and shared between the sender and receiver.
3.1.5 Wu and Liu Binary Image Steganography
Another block-based steganographic method that embeds secret messages in
binary images was developed by Wu and Liu [112]. This technique also starts
by partitioning a given image into blocks. To avoid synchronisation problems
(which lead to incorrect message extraction) between embedding and extraction,
this technique embeds a fixed number of message bits within a block. In their
implementation details, the authors opt to embed one message bit per block. The
embedding algorithm is based on the odd-even relationship of the number of black
pixels within a block. In other words, the total number of black pixels within
a block is kept odd when a message bit of one is embedded, and even when a
message bit of zero is embedded. If the odd-even relationship matches the
message bit, no flipping is needed. Otherwise, a pixel must be flipped to
change the parity.
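The odd-even rule itself is simple, as the following sketch illustrates (hypothetical function names; the flip position is assumed to be supplied externally, by the flippability scoring that the method uses to select the least visible pixel):

```python
import numpy as np

def embed_bit(block, bit, flip_pos):
    """Embed one bit in a block: the count of black (1) pixels is kept odd
    for bit 1 and even for bit 0. flip_pos is the pixel chosen for flipping
    (in the real method, the one with the highest flippability score)."""
    out = block.copy()
    if int(out.sum()) % 2 != bit:
        out[flip_pos] ^= 1
    return out

def extract_bit(block):
    # the parity of the black-pixel count carries the message bit
    return int(block.sum()) % 2
```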
Like any other embedding technique, the most important part is the selection of
pixels for flipping. An efficient selection approach ensures minimum distortion.
That is why, in [112], Wu and Liu introduced a flippability scoring system for
selecting pixels for flipping. The score for each pixel is computed by examining the
pixel and its immediate neighbours (those within a 3 × 3 block). The flippability
score is produced by a decision module based on the input of two measurements.
The first measurement is the smoothness, which computes the total number of
transitions in the vertical, horizontal and two diagonal directions. The second
measurement is the connectivity, which computes the total number of black and
white clusters formed within a block. These measurements are all computed within
a 3 × 3 block. An illustration of these measurements is shown in Figure 3.4.
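A minimal sketch of the two measurements (our own implementation of the counting described above; the decision module that combines them into a flippability score is not shown) might look like:

```python
import numpy as np

def transitions(block):
    # smoothness: transitions along the horizontal, vertical, diagonal
    # and anti-diagonal directions of a 3 x 3 block
    t = int(np.sum(block[:, :-1] != block[:, 1:]))      # horizontal
    t += int(np.sum(block[:-1, :] != block[1:, :]))     # vertical
    t += int(np.sum(block[:-1, :-1] != block[1:, 1:]))  # diagonal
    t += int(np.sum(block[:-1, 1:] != block[1:, :-1]))  # anti-diagonal
    return t

def clusters(block, colour):
    # connectivity: number of 4-connected clusters of a given colour
    seen = np.zeros(block.shape, dtype=bool)
    count = 0
    for i in range(block.shape[0]):
        for j in range(block.shape[1]):
            if block[i, j] == colour and not seen[i, j]:
                count += 1
                stack = [(i, j)]
                while stack:                 # flood fill one cluster
                    a, b = stack.pop()
                    if (0 <= a < block.shape[0] and 0 <= b < block.shape[1]
                            and block[a, b] == colour and not seen[a, b]):
                        seen[a, b] = True
                        stack += [(a + 1, b), (a - 1, b),
                                  (a, b + 1), (a, b - 1)]
    return count
```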
3.1.6 F5 Steganography
In [111], Westfeld and Pfitzmann observed that an embedding algorithm that
overwrites the LSBs of JPEG coefficients causes the coefficient distribution
to form pairs of values (PoVs). A PoV occurs when two adjacent frequencies in
the JPEG distribution become similar (Figure 3.12 shows the effect of PoVs).
Westfeld and Pfitzmann showed that such a steganographic method can be broken
by exploiting the PoVs; they demonstrated the analysis and attack on Jsteg
using the chi-square test (details of this attack are discussed in Subsection
3.2.3).
[Figure: (a) a 3 × 3 block annotated with transition indicators in the vertical, horizontal, diagonal and anti-diagonal directions; (b) a 3 × 3 block containing one white cluster and two black clusters]

Figure 3.4: Measurement of smoothness and connectivity: (a) smoothness is measured by the total number of transitions in four directions (the arrows indicate the transition directions, 0 indicates no transition and 1 indicates a transition); (b) connectivity is measured by the number of black and white clusters (four white pixels forming one cluster and five black pixels forming two clusters)

As a result, Westfeld developed a new steganographic method called F5 [110].
F5 is formulated to preserve the original property of this statistic (i.e.,
the JPEG coefficient distribution). When alteration is required during
embedding, F5 decrements the absolute value of a JPEG coefficient by one,
instead of overwriting its LSB with a message bit. This prevents the formation
of PoVs; hence, F5 cannot be detected through the chi-square test.
To minimise the changes caused by embedding, matrix encoding is employed to
increase the embedding efficiency. Finally, to avoid concentrating the embedded
message bits in a certain part of the image, F5 embeds along a randomly
permuted sequence of coefficients. The permutation is generated by a PRNG.
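Matrix encoding can be illustrated with a Hamming-code-style (1, 2^k − 1, k) scheme, which embeds k message bits into n = 2^k − 1 cover LSBs by changing at most one of them. The sketch below is a simplified illustration with our own function names: real F5 applies the encoding to the LSBs of nonzero JPEG coefficients and decrements coefficient magnitudes rather than flipping bits directly.

```python
def syndrome(lsbs):
    # XOR of the 1-based positions of the LSBs that equal 1
    s = 0
    for i, b in enumerate(lsbs, start=1):
        if b:
            s ^= i
    return s

def me_embed(bits, lsbs, k):
    # (1, 2^k - 1, k) matrix encoding: embed k bits into n = 2^k - 1
    # cover LSBs by changing at most one of them
    n = 2 ** k - 1
    assert len(lsbs) == n and len(bits) == k
    m = int("".join(map(str, bits)), 2)
    delta = syndrome(lsbs) ^ m
    out = list(lsbs)
    if delta:
        out[delta - 1] ^= 1          # flip the LSB at position delta
    return out

def me_extract(lsbs, k):
    # the embedded bits are the syndrome of the stego LSBs
    return [int(c) for c in format(syndrome(lsbs), f"0{k}b")]
```

With k = 3, for instance, three message bits are carried by seven LSBs at the cost of at most one change, which is the source of F5's improved embedding efficiency.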
3.1.7 OutGuess Steganography
OutGuess is a type of JPEG image steganography developed by Provos in [90].
This method was designed to withstand both the chi-square attack and its
extended version. It can be summarised as two main operations: embedding and
statistical correction.
Similar to other JPEG image steganographies, OutGuess embeds message bits
by altering the LSBs of JPEG coefficients. The embedding is spread randomly
throughout the whole image using a random selection that walks through the
coefficients from the beginning to the end of the image: to select the next
coefficient, OutGuess computes a random offset and adds it to the current
coefficient location. The random offsets are computed by a PRNG.
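The random-offset walk can be sketched as follows. This is a deliberate simplification with hypothetical names: the actual OutGuess implementation derives its offsets from a seeded PRNG and adapts the offset interval to the remaining message length and capacity, which this sketch omits.

```python
import random

def select_positions(n_coeffs, n_bits, seed, max_offset):
    # Walk through the coefficients from start to end, advancing by a
    # pseudorandom offset each time; the seed acts as the shared secret.
    rng = random.Random(seed)
    pos, out = -1, []
    while len(out) < n_bits:
        pos += rng.randint(1, max_offset)
        if pos >= n_coeffs:
            raise ValueError("message too long for this cover")
        out.append(pos)
    return out
```

Because the offsets are strictly positive, the selected positions are strictly increasing, which is what lets the extractor reproduce the same sequence from the shared seed.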
Note that the embedding causes the image statistics (i.e., the distribution of
the coefficients) to deviate; hence, some coefficients are reserved (left
unaltered) with the intention of correcting this statistical deviation. In
other words, after all the message bits are embedded, the reserved
coefficients are adjusted accordingly.
The adjustment is carried out such that the distributions of cover and stego images
are similar.
3.1.8 Model-Based Steganography
Based on the concept of statistical modelling and information theory, Sallee de-
veloped a steganography called model-based steganography [96]. Model-based
steganography is designed to withstand a first-order statistical attack while main-
taining a high embedding rate. Unlike OutGuess, which preserves only the dis-
tribution of an image, model-based steganography preserves the distributions of
individual coefficient modes.
To start the embedding, model-based steganography separates an image into an
unalterable part xα and an alterable part xβ. If a JPEG image is used as the
cover image, the most significant bits of the coefficients form xα and the
least significant bits form xβ. xα is used to build a conditional probability
P(xβ|xα) from
a selected cover image model. Together with this conditional probability and a
secret message, a non-adaptive arithmetic decoder is used to generate a new part
x′β , which will carry the message bits. The selection of the coefficients to use is
based on a PRNG. Finally, xα and x′β are combined to form the stego image. The
embedding algorithm is shown in Figure 3.5(a).
To extract the secret message, steps similar to those discussed above are followed
with the exception of the non-adaptive arithmetic decoder. An arithmetic encoder
is used instead of an arithmetic decoder. The input to the non-adaptive arithmetic
encoder is x′β and the conditional probability P (xβ|xα). Since xα is unaltered, the
conditional probability can be regenerated. Therefore, the secret message can be
extracted successfully through the non-adaptive arithmetic encoder. Figure 3.5(b)
illustrates the extraction algorithm.
[Figure: two block diagrams. In (a), the cover image is split into xα and xβ; xα and the image model drive conditional probability generation, and entropy decoding combines the message with P(xβ|xα) to produce x′β, which is recombined with xα into the stego image. In (b), the stego image is split into xα and x′β, and entropy encoding recovers the message.]

Figure 3.5: (a) Embedding algorithm of model-based steganography; (b) extraction algorithm of model-based steganography
3.2 Steganalysis
This review of steganalysis techniques is not intended to be exhaustive;
rather, it is organised according to the different levels of possible
steganographic analysis. More precisely, these levels are ordered according to
the type of secret information or parameter an adversary wishes to extract. We
begin with the techniques employed by the adversary to detect the presence of a
secret message in an image and to determine which type of steganographic method
is used. After that, we discuss the techniques used to recover some attributes
(secret parameters) of the embedded secret message. These attributes include
the secret message length, the location of stego-bearing pixels and the
stegokey.
3.2.1 Differentiation of Cover and Stego Images
In this scenario, it is assumed that the adversary has access to an image (or a
collection of images) and tries to determine if the image contains a secret message
(stego image) or does not (cover image). This task is feasible only if the
statistical features of cover and stego images differ enough to allow a
reliable decision. In order to do that, different feature extraction techniques
can be applied to extract relevant statistical features. The following collection of
statistical features can be found in the literature:
❐ Co-occurrence matrix
❐ Statistical moments
❐ Wavelet subbands
❐ Pixel difference
The next step is to perform classification based on the extracted features.
Because the distributions of cover and stego images are never exactly known,
their features sometimes overlap. To alleviate this problem, cover image
estimation is utilised to derive more sensitive features for steganalysis.
In the following subsections, we discuss how these features have been applied
in steganalysis, followed by a discussion of classification and, finally,
cover image estimation.
Co-occurrence matrix
Sullivan et al. use an empirical matrix as the feature set to construct a
steganalysis technique [102]. The technique can detect several variants
of spread-spectrum data hiding techniques [24, 76] and perturbed quantisation
steganography [36]. This empirical matrix is also known as a co-occurrence ma-
trix.
The authors observe that the empirical matrix of a cover image is highly concen-
trated along the main diagonal. However, data hiding will spread the concentra-
tion away from the main diagonal. An example of this effect is shown in Figure
3.6. To capture this effect, the six elements with the highest probability
along the main diagonal of the empirical matrix are chosen. The ten nearest
elements of each of these are also chosen, giving a 66-dimensional feature
vector. Next, the authors subsample the remaining main diagonal elements by
four, obtaining another 63 features. The combined 129-dimensional feature set
is used in their steganalysis.
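For reference, a horizontal co-occurrence (empirical) matrix can be computed as follows (a generic NumPy sketch, not the authors' code):

```python
import numpy as np

def cooccurrence(img, levels=256):
    # empirical matrix of horizontally adjacent pixel pairs, normalised
    # so that its entries sum to one
    M = np.zeros((levels, levels))
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(M, (left, right), 1)   # count each (left, right) pair
    return M / M.sum()
```

Because neighbouring pixels of natural images are highly correlated, most of the probability mass of `M` lies on or near the main diagonal (e.g. `np.trace(M)` is large for smooth images), which is exactly the concentration that embedding disturbs.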
The feature set selected in [102] is stochastic and may not effectively capture the
embedding artefacts. Xuan et al. [116] constructed a better feature set from
the co-occurrence matrices.

[Figure: plots of two co-occurrence matrices; in (b) the probability mass spreads away from the main diagonal]

Figure 3.6: Plot of co-occurrence matrices extracted from: (a) cover image; (b) stego image

They generated four co-occurrence matrices from the horizontal, vertical, main
and minor diagonal directions (as opposed to using
only the horizontal direction as in [102]). These four matrices are averaged and
normalised to form a final matrix. Note that, because the final co-occurrence
matrix is symmetric, it is sufficient to use the main diagonal and part of the upper
triangle of the co-occurrence matrix. Xuan et al. selected 1018 elements from this
area to form their feature set (a 1018-dimensional feature set).
A specifically tuned classifier (class-wise non-principal components analysis) is
used to obtain a high detection rate. Xuan et al. demonstrated its
effectiveness on JPEG and spatial domain image steganography. However, their
high-dimensional
features may suffer from the curse of dimensionality when applied to other types
of classifier. Although their current implementation is arguably optimal, it is
threshold dependent, which limits its flexibility for blind steganalysis.
Chen et al. developed a blind steganalysis based on a co-occurrence matrix [15].
It is well known that direct use of a co-occurrence matrix as the feature set
leads to high dimensionality; for example, for an 8-bit image, the
co-occurrence matrix has 256 × 256 elements. Therefore, Chen et al. projected
the co-occurrence matrix onto a first-order statistic to reduce its
dimensionality.
More precisely, this first-order statistic is the frequency of occurrence along the
horizontal axis of the co-occurrence matrix.
In [43], the authors exploited the correlations between the discrete cosine trans-
form (DCT) coefficients in intra- and inter-blocks of JPEG images. Intra-block
correlation is the correlation between neighbouring coefficients within a block;
inter-block correlation measures the correlation between a DCT coefficient in one
block and the coefficient of the same position in another block.
The authors arranged the DCT coefficients in a block into a one-dimensional
vector using the zigzag order. For each block, only AC coefficients are considered
while the DC coefficient is discarded. This is because normally DC coefficients are
not changed in JPEG steganography. In addition, the authors also discard some
coefficients with a high frequency of occurrence (i.e. coefficients with a value of
zero). All the blocks in a JPEG image are scanned in a fixed pattern to form a
new re-ordered block called a 2-D array. Only the magnitudes of the coefficients
are used.
Markov empirical transition matrices are used to capture these dependencies. Hor-
izontal and vertical Markov empirical transition matrices are used to capture the
intra- and inter-block correlations, respectively. The authors also further trim the
dimensionality of the matrices by thresholding the 2-D array. In other words,
elements with a magnitude greater than the threshold are assigned a maximum
value (the threshold value).
Statistical moments
Harmsen and Pearlman [48] showed that additive noise data hiding techniques are
equivalent to a low-pass filtering of image histograms. The centre of mass (COM)
is used to quantify this effect. Note that the COM is the first-order
statistical moment.
The authors have shown that it is better to compute the COM from the frequency
domain. Hence, the discrete Fourier transform is applied to transform the im-
age histogram. This transformation produces a histogram characteristic function.
COM is computed based on this characteristic function.
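The histogram characteristic function (HCF) and its COM can be sketched in plain NumPy as follows (an illustration of the quantities described above; the function name is ours):

```python
import numpy as np

def histogram_com(img, levels=256):
    # Histogram characteristic function (HCF): the DFT of the image
    # histogram.  The centre of mass (COM) is computed over the first
    # half of the HCF magnitude spectrum.
    h, _ = np.histogram(img, bins=levels, range=(0, levels))
    H = np.abs(np.fft.fft(h))[: levels // 2]
    k = np.arange(levels // 2)
    return float((k * H).sum() / H.sum())
```

Because additive-noise embedding low-pass filters the histogram, the high-frequency part of the HCF is attenuated and the COM of a stego image tends to shift towards lower frequencies.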
The detection accuracy reported exceeded 95 per cent when the embedding rate
was 1.0 bpp. Unfortunately, the authors did not test for a smaller embedding
rate. In most cases, decreasing the embedding rate reduces the detection accuracy.
Further, only 24 images were used to test the detection accuracy; such a small
image set may not fully represent the actual accuracy.
However, the use of the COM as a feature in [48] has informed much subsequent
research. For instance, Shi et al. [100] used a set of statistical moments as
the
features in their blind steganalysis. First, the authors use the Haar wavelet to
decompose the image. After the decomposition, eight wavelet subbands are pro-
duced and a discrete Fourier transform is applied to the probability density
function of these subbands. The same discrete Fourier transform is applied to
the
given image as well. Note that these transformations produce nine characteristic
functions. Finally, the first and second orders of statistical moments are computed
from the characteristic functions.
In a different work [115], Xuan et al. developed an enhanced version, based on
statistical moments. Enhancement is achieved with an additional level of wavelet
decomposition. The third order is used in addition to the first two orders of
statistical moments. The reported experimental results show improvements in
detection accuracy and an ability to detect more steganographic types.
Shi et al. further improved the use of statistical moments as features [101]. The
main difference compared to [100] and [115] is the incorporation of a prediction-
error image. The prediction-error image is obtained from the pixel-wise difference
between the given image and its predicted version. The prediction algorithm is
based on a predefined relationship within a block of four neighbouring pixels.
The statistical moments in [101] are computed from two image components: the
given image and the prediction-error image. The procedures to compute the sta-
tistical moments are the same as the procedures used in [115], obtaining a 78-
dimensional feature set (39 features from each image component).
The reported experimental results are promising. However, the detection
accuracy is unclear for a wider range of embedding rates, since only certain
percentages of hidden messages were tested.
Research similar to that developed in [101] can be found in [15]. The authors of
[15] raised concerns about precision degradation in the first-order statistic
when a wavelet is used (i.e., the wavelet coefficients are floating-point
values). Hence, the co-occurrence matrix (with discrete integer entries) was
used instead of wavelet decomposition.
Inspired by the work in [101], Chen et al. enhanced and applied the statisti-
cal moments on JPEG image steganalysis [14]. This enhancement involves the
incorporation of additional high-order statistics.
In their work, the first feature set is inherited directly from [101]. The same feature
extraction procedure, with some modification in the prediction algorithm, is used
to form the second feature set. Note that the second feature set is extracted from
the absolute value of non-zero DCT coefficients. For the third feature set, the
same set of non-zero DCT coefficients and wavelet subbands are used to construct
three co-occurrence matrices. These co-occurrence matrices are transformed into
the characteristic functions and the statistical moments are calculated from these
characteristic functions.
According to the authors, it is crucial to use higher-order statistics as the fea-
tures because some modern steganography, such as OutGuess and Model-based
Steganography, tries to preserve the first-order statistics. This may cause the
first-order statistical features to become less effective. Hence, it is suitable to
incorporate co-occurrence matrices as features.
The statistical moments computed from characteristic functions are more effective
than those computed from image histogram (i.e., the image probability density
function). The main difference between moments of characteristic functions and
image histogram is the variance proportionality (i.e., 1σand σ, respectively). This
means moments of characteristic functions are determined by a smaller variance
distribution. Moments of image histogram are determined by a larger variance
distribution. Since data hiding involves the addition of smaller variance noise, it
is clear that the effect will be reflected more strongly in moments of characteristic
functions. Hence, moments of characteristic functions are more sensitive to data
hiding. This claim has been verified in [115].
Wavelet subbands
It is well known that natural images exhibit strong higher-order statistical reg-
ularities and consistencies. Thus, wavelet decomposition is often used to repre-
sent these characteristics for various image processing purposes. It is also well
known that steganographic embedding significantly disturbs these statistical
characteristics. Hence, it is natural to employ wavelet decomposition to
detect such disturbances.
The first steganalysis technique using wavelet decomposition was developed by
Farid [32, 33]. In his work, quadrature mirror filters (QMFs) are used to decompose
a given image into multiple scales and orientations of wavelet subbands, obtaining
nine wavelet subbands. A quadrature mirror filter bank is formed by the
combination of low- and high-pass decomposition filters and their associated
reconstruction filters, which produces three different directions (i.e.,
horizontal, vertical and diagonal).
An illustration of the decomposition is shown in Figure 3.7.

[Figure: a three-level wavelet decomposition pyramid showing subbands H1, V1, D1 through H3, V3, D3]

Figure 3.7: Illustration of wavelet decomposition. Hi, Vi and Di denote the horizontal, vertical and diagonal subbands, respectively. The index i indicates the scale

Farid also used a linear predictor to compute the log errors from the
magnitudes of the coefficients in each subband. A linear predictor is defined
as a linear combination of scalar weighting values and a subset of
neighbouring coefficients. This results in another nine sets of log errors
(i.e., one from each of the nine wavelet subbands).
Finally, the mean, variance, skewness and kurtosis are used to characterise the
wavelet coefficient distribution in all nine subbands. The same statistics are used
to characterise the nine sets of log error distributions. Combining these statistics
forms a 72-dimensional final feature set.
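The four per-subband statistics can be computed directly, as in the generic sketch below (our own code; note the kurtosis here is the non-excess form, which may differ from the exact normalisation used in [32, 33]):

```python
import numpy as np

def four_stats(x):
    # mean, variance, skewness and kurtosis of a coefficient array
    x = np.asarray(x, dtype=float).ravel()
    mu = x.mean()
    var = x.var()
    sd = np.sqrt(var)
    skew = ((x - mu) ** 3).mean() / sd ** 3
    kurt = ((x - mu) ** 4).mean() / var ** 2
    return [mu, var, skew, kurt]
```

Applying this function to the nine subbands and the nine sets of log errors yields the 2 × 9 × 4 = 72 features of Farid's final feature set.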
In subsequent work [74], Lyu and Farid extended the wavelet statistics to include
the colour components of an image. The wavelet decomposition process used is
the same as in their prior work. This means that each colour component will
be treated as a greyscale image and is decomposed into wavelet subbands. For
example, in a colour image consisting of red, green and blue components, each is
processed independently as a greyscale image. However, the main difference in [74]
is the second part of the feature set—the log errors. The linear predictor used to
compute the log errors has been updated to include neighbouring coefficients from
different colour components. Identical to their prior work, the mean, variance,
skewness and kurtosis are used to characterise the wavelet coefficients and log
error distributions. Accordingly, the dimensionality of the final feature set
increases to 216.
Through extensive work on wavelet decomposition, Lyu and Farid [75] extended
their work to include phase statistics (in addition to their prior work with mag-
nitude statistics). In their work, phase statistics are modelled using the local
angular harmonic decomposition (LAHD). The LAHD can be regarded as a local
decomposition of image structure by projecting onto a set of angular Fourier basis
kernels. Different orders of LAHD can be computed from the convolution of the
image with the derivatives of a differentiable radial filter such as a Gaussian filter.
The feature set has been extended to form a 432-dimensional feature set. The reported experimental results show promise and the ability to detect eight different steganographic methods.
A feature set extracted from wavelet decomposition may seem effective, but the feature dimensionality is normally large, which increases the complexity of the classification process. In addition, a higher-dimensional feature set requires more training samples to achieve stable classification. Other related works that utilise wavelet decomposition to extract feature sets can be found in [100, 14, 120].
Pixel difference
Liu et al. treat the differential operation as a high-pass filtering process when applied to images [70]. This is desirable because it captures the small distortions caused by the embedding operation. In [70], the differential operation is defined as the pixel-wise difference between two neighbouring pixels in the horizontal direction (and similarly in the vertical direction). The operation is applied repeatedly to obtain the second and third orders. The authors call these statistics differential statistics.
In the feature extraction phase, differential statistics and the image pixel proba-
bility mass function are used to construct the first- (histogram for the frequency of
occurrence) and second-order (co-occurrence matrix) statistics. With these first-
and second-order statistics, a discrete Fourier transform is applied to obtain the
respective characteristic functions. Finally, the COM for each characteristic func-
tion is computed as a feature set. Note that the COM features computed are
identical to the features developed by Harmsen and Pearlman [48].
The experimental results reported in [70] suggest that this method can effectively
detect spread-spectrum data hiding. In addition, incorporating the differential
statistics feature set significantly improves the JPEG blind steganalysis developed
in [35]. According to the authors, the differential statistics are used to enlarge
the blockiness effects incurred during embedding. Hence, the enlargement makes
their feature set more sensitive to data hiding.
In a different work [99], Shi et al. developed an effective steganalysis technique
to attack JPEG steganography. The high accuracy achieved by this technique
is due to a sensitive feature set, notably the use of a difference JPEG 2D array.
The JPEG 2D array has the same size as the given image and is filled with the absolute values of the quantised DCT coefficients. Note that the difference JPEG 2D array is very similar to the differential statistics in [70]; more precisely, it is the first-order differential statistic.
Compared with differential statistics, where only the horizontal and vertical direc-
tions are used, Shi et al. included the major and minor diagonal directions. For
each of the four directions, a transition probability matrix is computed. Thresh-
olding is also utilised to achieve a balance between detection accuracy and com-
putational complexity.
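A hedged sketch of this construction, assuming a horizontal scan direction and a hypothetical threshold T = 4 (the scheme uses four directions and its own threshold). The transition probability matrix is formed from row-normalised counts of horizontally adjacent values in the thresholded difference array:

```python
import numpy as np

def transition_matrix(coeffs, T=4):
    """Horizontal transition probability matrix of the thresholded
    difference array (one of the four directions in the scheme).
    Differences are clipped to [-T, T], giving a (2T+1) x (2T+1) matrix."""
    a = np.abs(np.asarray(coeffs, dtype=int))   # absolute quantised DCT values
    d = a[:, :-1] - a[:, 1:]                    # horizontal difference array
    d = np.clip(d, -T, T)                       # thresholding step
    M = np.zeros((2 * T + 1, 2 * T + 1))
    # count transitions between horizontally adjacent difference values
    for u, v in zip(d[:, :-1].ravel(), d[:, 1:].ravel()):
        M[u + T, v + T] += 1
    row = M.sum(axis=1, keepdims=True)
    return np.divide(M, row, out=np.zeros_like(M), where=row > 0)

P = transition_matrix(np.random.randint(-10, 10, (8, 8)))
print(P.shape)  # (9, 9)
```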
The difference JPEG 2D array in [99] reflects the correlations of neighbouring coef-
ficients within an 8×8 block. These correlations are called intra-block correlations.
Later, the authors in [13] include the inter-block correlations. For inter-block cor-
relations, the difference between two coefficients with the same mode is computed
from two neighbouring 8× 8 blocks (as opposed to two immediately neighbouring
coefficients within an 8 × 8 block for intra-block correlations). Figure 3.8 shows
an example of these correlations. Note that there are 64 coefficients per block and
the location of each coefficient within a block is known as the mode. The exper-
imental results indicate a significant improvement from incorporating inter-block
correlations. Clearly, the coefficient differences contribute crucial information for
this improvement.
Figure 3.8: Illustration of the intra- and inter-block correlations in a JPEG image
The effectiveness of differential statistics [70] can be attributed to the net results
after high-pass filtering. More precisely, the results of differentiation will produce
only the variable parts—possibly altered during embedding. This characteristic is
desirable as it amplifies the embedding artefact. Similarly, the alterations incurred
in JPEG steganography can be greatly enlarged and captured. This is the case for
the difference 2D array in [99], where the authors examine the difference between
a DCT coefficient and its neighbouring coefficient. This may work well for images with rich statistics, such as 8-bit images; however, its applicability is questionable for images with more modest statistics, such as halftone images.
Classification
As discussed in Section 2.4, differentiating a stego image from a cover image
involves classification. From the literature, the most commonly used classifiers
include Fisher linear discriminant, artificial neural networks and support vector
machines. These classifiers were discussed in Subsection 2.4.2.
Note that most work on blind steganalysis focuses on feature extraction; the choice of classifier is secondary. Feature extraction is considered more crucial than classifier selection in steganalysis, primarily because detection accuracy depends significantly on the sensitivity of the features to the embedding artefact. Given a sensitive and discriminating feature set, the overall accuracy can then be further improved by tuning the classifier. Switching from one type of classifier to another is straightforward. For example, Farid changed from the Fisher linear discriminant in [33] to an SVM in [73], and in [101] and [14] the initial neural network classifier was later replaced by an SVM.
Cover image estimation
Normally, a cover image is destroyed or kept secret once a stego image is generated
to ensure maximum security of covert communications [92]. This implies that only
one version is typically available. If we have access to both the cover and stego
versions of the image, we can tell the differences easily and the steganography
scheme is considered broken.
In general, the effect of data hiding can be modelled as the effect of additive noise
in an image. It is sufficient to assume that if the additive noise or message is
independent of the cover image, the probability mass function (PMF) of the stego
image is equal to the convolution of additive noise PMF and cover image PMF
[48]. Hence, the cover image can be estimated from the stego image if the additive
noise is eliminated.
This has inspired the incorporation of cover image estimation in much steganalysis
research, such as image calibration [39] and prediction-error [101], to increase fea-
ture sensitivity with respect to the embedded artefacts and to remove the influence
of the image content. This improves the discriminatory power of steganalysis.
Ker [61] applied image calibration (along with another improvement called the ad-
jacency histogram) to improve the blind steganalysis initially developed by Harm-
sen and Pearlman [48]. In his work, a given image is down sampled with an
averaging filter. This down sampling involves addition and rounding operations
on the pixels (or coefficients). These operations even out the additive noise, allow-
ing the cover image to be estimated. However, the efficiency degrades when it is
used to detect stego images with shorter messages. This suggests that calibration
by down sampling may not be the optimal option.
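A minimal sketch of calibration by downsampling, assuming a 2x2 averaging filter with rounding (the exact filter in [61] may differ):

```python
import numpy as np

def calibrate_downsample(img):
    """Estimate the cover image by 2x2 block averaging with rounding;
    the averaging and rounding even out additive embedding noise."""
    x = np.asarray(img, dtype=float)
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    x = x[:h, :w]                               # drop any odd remainder
    avg = (x[0::2, 0::2] + x[1::2, 0::2]
           + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
    return np.rint(avg).astype(int)

est = calibrate_downsample(np.random.randint(0, 256, (64, 64)))
print(est.shape)  # (32, 32)
```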
In [121], Zou et al. used a simpler method to obtain the estimated cover image,
which they call a prediction-error image. The current pixel is subtracted from
the neighbouring pixel to obtain the prediction. For example, x(i, j)− x(i+ 1, j)
will produce the prediction-error image in the horizontal direction. x(i, j) is the
current pixel at location i and j. The same prediction is applied for vertical and
diagonal directions.
The authors note that the prediction values within a prediction-error image may exhibit high variation. For instance, for an 8-bit image, the prediction values lie in [−255, 255]. To overcome this issue, the authors proposed using a
threshold T . If the absolute value of the prediction is greater than T , it will be set
to zero. The authors suggest that thresholding is effective as a high variation in
the prediction values is mostly caused by the image content (hence, is insignificant
in steganalysis).
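The prediction-error construction with thresholding can be sketched as follows (horizontal direction only; T = 3 is an illustrative choice, not necessarily the value used in [121]):

```python
import numpy as np

def prediction_error(img, T=3):
    """Horizontal prediction-error image, x(i, j) - x(i, j+1) style;
    predictions with magnitude above T are zeroed, since large errors
    stem from image content rather than embedding."""
    x = np.asarray(img, dtype=int)
    e = x[:, :-1] - x[:, 1:]     # horizontal; vertical/diagonal analogous
    e[np.abs(e) > T] = 0         # thresholding step
    return e

e = prediction_error([[10, 11, 50], [12, 12, 13]], T=3)
print(e.tolist())  # [[-1, 0], [0, -1]]
```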
However, it may be possible for adaptive steganography to counteract this thresh-
olding technique by adaptively selecting the region with high variations. This
causes the hidden data to be regarded as the original image content and discarded;
therefore, the detection fails.
3.2.2 Classification of Steganographic Methods
In this case, the adversary holds an image and wants to discover which steganographic technique has been used. Furthermore, we assume that the collection of possible steganographic techniques is public and known to the adversary. This
steganalysis problem has been tackled using the following approaches:
❐ Feature extraction
❐ Multi-class classification
Feature extraction
The first part of multi-class steganalysis is feature extraction. Here, several important feature extraction techniques used in multi-class steganalysis are described and analysed.
Rodriguez and Peterson focus on the determination of the embedding techniques
for JPEG image steganography [95]. The feature set is extracted from the mul-
tilevel energy bands of DCT coefficients. First, all DCT coefficients are arranged
into blocks of 8 × 8 coefficients. Within each block, the DCT coefficients are ar-
ranged using zigzag and Peano scan methods to produce the multilevel energy
bands. Then, the higher-order statistics (such as inertia, energy and entropy) are
computed for each band. These higher-order statistics form the first part of the
feature set. In addition, log errors are computed for the multilevel energy bands.
The log errors are the residuals computed from the DCT coefficients and their
predicted coefficients. The predicted coefficients are obtained from a predefined
subset of neighbouring coefficients. The same higher-order statistics are applied
to these log errors to form the second part of the feature set.
In general, the multi-class steganalysis in [95] performs fairly well. However, the
method performs poorly on some steganographic techniques, such as OutGuess
[90] and StegHide [51], because OutGuess and StegHide can use a similar em-
bedding algorithm, which makes differentiation difficult. On the other hand, this
also shows that the developed feature set may not discriminate sufficiently. The
weakness is manifested in the feature elimination procedure. It is unclear how the
method performs the feature elimination. As a result, important information can
be discarded.
The extensive work in multi-class steganalysis carried out by Pevny and Fridrich
[85, 86, 87, 88, 89] was aimed at determining the types of embedding algorithms
employed in JPEG image steganography. The first version of their multi-class
steganalysis was an enhanced version of their blind steganalysis developed in [35].
Mainly, they utilised their proven discriminative feature set, applying it to multi-
class steganalysis.
There are 23 features and they can be grouped as global histogram, individual
histograms, dual histograms, variation, blockiness and co-occurrence matrices. A
global histogram is the histogram of all DCT coefficients in an image. Individual
histograms are extracted from the DCT coefficients of the five lowest-frequency AC
modes. Note that the mode refers to the position of a DCT coefficient within the block, and there are 64 modes. Figure 3.9 shows an illustration of the modes in a JPEG
image. The next features are dual histograms, which represent the distributions
of eleven selected DCT coefficient values within the 64 modes. Variation is used
to measure the inter-block dependencies among the DCT coefficients. Blockiness
measures the spatial inter-block boundary discontinuities. The discontinuities
are calculated from the spatial pixel values of the decompressed JPEG image.
Finally, the co-occurrence matrices are calculated from the DCT coefficients of
neighbouring blocks. In addition, the estimated cover image is also used in feature
construction to increase the discriminative power of the feature set. To obtain an
estimation of the cover image, the authors decompress the JPEG image, crop
off some portion of the image and re-compress it. This process is called image
calibration.
Figure 3.9: The 64 modes of an 8×8 DCT block. The circle represents the DCT coefficient
Note that their multi-class steganalysis shows promising results. Later, Pevny
and Fridrich extended it to include a more complicated case, which involved the
analysis of double compressed JPEG images [86, 87]. Double compression occurs
when a JPEG image has been decompressed and re-compressed with different
JPEG quality factors after embedding the secret message. This can occur when
F5 or OutGuess is used to generate the stego image. According to the authors,
the double compression effect will make cover image estimation inaccurate. Hence,
the results of steganalysis may be misleading.
The main difficulty lies in the unavailability of the previous or first JPEG quality
factor. To alleviate this problem, the authors use an estimation algorithm from
[72] to estimate the previous JPEG quality factor. The estimation algorithm
utilises a set of neural networks to compute the closest estimation, based on the
Figure 3.10: The modified image calibration steps used for a double compressed JPEG image. The shaded box represents the calibrated image and Q denotes the JPEG quality factor
DCT coefficients of the five lowest-frequency AC modes. With the estimated JPEG quality factor, the updated image calibration process proceeds as follows. First, the JPEG image is decompressed, cropped and re-compressed with the estimated JPEG quality factor. Then, the re-compressed JPEG image is decompressed again and re-compressed a second time using the second JPEG quality factor, that is, the quality factor stored in the JPEG image before calibration.
These steps are shown in Figure 3.10. The rest of the feature extraction process
remains the same.
Pevny and Fridrich later discovered that some important information might be
lost due to the existing feature representation [88]. Hence, they enhanced some
of the features by replacing the L1 norm with the feature differences within a
carefully chosen DCT coefficient range. Only a subset of features is involved in
the improvement—the global histogram, individual histograms, dual histograms
and co-occurrence matrices. According to the authors, their feature set effectively
models the inter-block dependencies of DCT coefficients. To build a strong multi-
class steganalysis requires features that can also model intra-block dependency.
Hence, the authors incorporate the feature set developed in [99] with their ex-
tended feature set. Prior to the incorporation, the feature set developed in [99] is
averaged and calibrated.
Building on this line of work, Pevny and Fridrich combined their techniques into a complete, functional multi-class steganalysis system in [89]. This system was developed to
handle both single and double compressed stego images generated from current
popular steganographic techniques. The system can perform classification under
a diversified range of JPEG quality factors. In addition, for some non-standard
JPEG quality factors tested, the system also shows reliable classification results.
The reported experimental results showed that the system could classify stego images that it had not previously been trained on.
In a different work [31], Dong et al. constructed a multi-class steganalysis based
on the analysis of image run length. This work is an extension of their previous
work [30]. The main contribution of this work is the ability to perform multi-class
steganalysis across different image domains—the same technique can be used to
classify spatial (e.g., BMP) and frequency (e.g., JPEG image) domain images.
This shows the ability to generalise, which is desirable in multi-class steganalysis.
The core feature in their work is the histogram of image run length. Image run
length can be considered a compression technique. A sequence of consecutive
pixels with the same intensity along a direction can be represented compactly as a
single intensity value and count. This forms a matrix r(g, ℓ) with intensity value
g and count ℓ as the axes. For an 8-bit image and a maximum count of run length
L, the histogram of the image run length can be defined as follows:
H(ℓ) = ∑_{g=0}^{255} r(g, ℓ),   1 ≤ ℓ ≤ L,   (3.3)
Note that the histogram count defined in Equation (3.3) is for one direction. Other
directions (e.g., 0◦, 45◦, 90◦ and 135◦) are computed in a similar manner. Based on
the histograms of image run length, several higher-order moments are computed
and used as a feature set.
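A sketch of the horizontal run-length histogram underlying Equation (3.3), with an illustrative maximum run length L = 16 (other directions are computed analogously):

```python
import numpy as np

def run_length_histogram(img, L=16):
    """Histogram H(l) of horizontal run lengths (the 0-degree direction):
    counts of maximal runs of consecutive equal-intensity pixels."""
    H = np.zeros(L + 1)
    for row in np.asarray(img):
        run = 1
        for a, b in zip(row[:-1], row[1:]):
            if b == a:
                run += 1            # extend the current run
            else:
                if run <= L:
                    H[run] += 1     # close the run, record its length
                run = 1
        if run <= L:
            H[run] += 1             # close the final run of the row
    return H[1:]                    # H(1) ... H(L)

H = run_length_histogram([[0, 0, 0, 1, 1, 2]])
print(H[:4].tolist())  # [1.0, 1.0, 1.0, 0.0]: one run each of lengths 1, 2, 3
```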
The embedding of a secret message alters the distribution of the run length. More
precisely, the original pixel sequence, with identical intensity, will be turned into
different shorter sequences. These changes will be significantly reflected in the
image run length.
The reported experimental results show comparable performance in the spatial
and frequency domains; however, these results may not be representative because
the experimental message lengths are arbitrary. It is well known that detection
accuracy is influenced significantly by the size of the embedded message. It will
be useful to determine a fair measurement in terms of message length that can
be used in both image domains such that the detection performance accurately
reflects the discriminative power of the multi-class steganalysis.
Multi-class classification
The second part of multi-class steganalysis is classification. The most common
classifier used in multi-class steganalysis is the support vector machine. In general,
there are two methods for constructing a multi-class classifier: the all-together
method and the method that combines several two-class classifiers. The all-
together method can be considered one that solves the entire classification with
a single optimised classifier. Clearly, this method requires more computational
resources and involves a complex classifier. The other method solves multi-class
classification problem by combining several two-class classifiers (for brevity, we
refer to this as the multiple two-class classifiers method). This method requires
relatively less computational resources and provides competitive classification ac-
curacy.
According to the review in [53], there are three multiple two-class classifier ap-
proaches: one-against-one, one-against-all and directed acyclic graph support vec-
tor machine (DAGSVM). Based on the findings in [53], one-against-one is prefer-
able and more suitable for practical applications. Examples of work using this
approach in multi-class steganalysis can be found in [97, 89, 31].
The first step in the one-against-one approach is to perform a normal two-class
classification among the classes. Every two-class classifier is trained to classify one
class against each of the other classes. For instance, the first two-class classifier
is assigned to distinguish between cover images and type-1 stego images. The
next two-class classifier is assigned to distinguish between type-1 and type-2 stego
images and so on until all pairs of combinations are formed. This method uses
K(K−1)/2 two-class classifiers for all pairs of classes, where K is the total number
of classes. The conceptual diagram for this approach is shown in Figure 3.11.
Figure 3.11: The multi-class classification on the left is formed by a combination of several two-class classifications on the right
The second step is to employ a strategy to determine the correct class for the
image. A commonly used strategy is majority voting or the max-wins strategy. In
the majority-voting strategy, the results from each two-class classifier are obtained
and accumulated. From the accumulated results, the class receiving the highest
count is assigned as the correct class. If two classes obtain the same highest count,
one class is randomly selected.
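The max-wins voting step can be sketched as follows. The `pairwise` dictionary of trained two-class decision functions is a hypothetical interface; the toy classifiers here follow a fixed rule rather than being trained SVMs:

```python
from collections import Counter
from itertools import combinations
import random

def one_vs_one_predict(x, classes, pairwise):
    """Majority (max-wins) voting over K(K-1)/2 two-class classifiers.
    pairwise[(a, b)] is a two-class decision function returning
    either a or b for sample x (hypothetical interface)."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    return random.choice(winners)   # ties broken at random

# Toy example: three classes, so 3 * 2 / 2 = 3 two-class classifiers,
# each voting for the first class of its pair.
classes = ["cover", "stego-1", "stego-2"]
pairwise = {pair: (lambda x, p=pair: p[0]) for pair in combinations(classes, 2)}
print(one_vs_one_predict(None, classes, pairwise))  # cover (wins 2 of 3 votes)
```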
Clearly, an embedding algorithm alters a cover image one way or another. This
implies that, with an effective feature set, the distance in the feature space between cover images and all types of stego images should be large, whereas the distances among the different types of stego images should be comparatively small. Therefore, it is more efficient to use blind steganalysis
(i.e., two-class classification) to initially separate cover images from stego images.
Then, multi-class classification can determine the type of embedding algorithm.
This scheme reduces both the number of classes and the number of classifiers. Its efficiency has been demonstrated in [31].
3.2.3 Estimation of Message Length
If the steganographic method is known to the adversary, he can begin to recover
some attributes about the embedded message. For instance, the steganalysis tech-
nique used may provide the adversary with an estimate of the number of em-
bedding changes. For that, the adversary can approximately infer the embedded
message length. In the following, we discuss several well-known steganalysis techniques that estimate the length of an embedded message.
Note that the LSB embedding algorithm that overwrites the pixel LSBs will not
change the grand total frequencies of pixel intensities. Only the frequencies of
occurrence are swapped between these intensities. In other words, when embed-
ding occurs, the frequencies of occurrence for odd pixel intensities are transferred
to the corresponding even pixel intensities and vice versa. These frequencies of
odd-even pixel intensities are called pairs of values (PoV), (2i, 2i+1). This change
involves swapping the frequencies of occurrence within each PoV and the sums of
the frequencies in every PoV remain the same. If the message bits are uniformly
distributed (typically the case, because the message is encrypted), the frequen-
cies of the intensities in each PoV will become identical after embedding (refer to
Figure 3.12).
From this observation, Westfeld and Pfitzmann [111] developed a steganalysis
technique based on the chi-square test (known as a chi-square or χ2 attack). The
Figure 3.12: (a) A portion of an image histogram before embedding. (b) The same portion of the image histogram after embedding. Notice that the histogram bins of each PoV have been equalised
chi-square test measures the degree of similarity between the observed sample
distribution and the expected frequency distribution. The observed sample dis-
tribution is obtained from the given image distribution. The expected frequency
distribution is computed from the arithmetic mean of the PoVs. The χ2 attack
can estimate the length of an embedded message as long as the message is em-
bedded sequentially. However, the attack is unable to provide reliable detection if
the message bits are randomly embedded in the image.
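A sketch of the core computation, producing only the χ2 statistic over the PoVs of a greyscale histogram (the published attack additionally converts the statistic into a p-value via the χ2 distribution, evaluated over growing sample sizes):

```python
import numpy as np

def chi_square_pov(hist):
    """Chi-square statistic over pairs of values (2i, 2i+1): the expected
    frequency of each PoV member is the pair's arithmetic mean; a small
    statistic suggests equalised PoVs, i.e. sequential LSB embedding."""
    h = np.asarray(hist, dtype=float)
    even, odd = h[0::2], h[1::2]
    expected = (even + odd) / 2.0
    mask = expected > 0                    # skip empty pairs
    chi2 = np.sum((even[mask] - expected[mask]) ** 2 / expected[mask]) \
         + np.sum((odd[mask] - expected[mask]) ** 2 / expected[mask])
    return chi2

print(chi_square_pov([400, 100, 300, 300]))  # 180.0 (unequal PoVs)
print(chi_square_pov([250, 250, 300, 300]))  # 0.0 (equalised PoVs)
```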
To address this weakness, Provos and Honeyman [90, 91] extended the chi-square
attack. In contrast to the previous chi-square attack (where the sample size was
increased from a fixed start location along the test), the extended chi-square attack
uses a fixed sample size that moves over the entire image. The start location
for the fixed sample size is set at the beginning of an image and moved with a
constant distance along the test. Another difference is that, instead of computing
the PoV arithmetic mean, the expected frequency distribution is obtained from
the arithmetic mean of pairs of unrelated coefficients.
Although the χ2 attack is effective against generic LSB replacement steganography, it fails when a steganographic scheme employs a more sophisticated algorithm such as F5 [110].
For that, Fridrich et al. [39, 40] developed a steganalysis technique targeted specif-
ically to attack F5. This technique can estimate the length of a hidden message
embedded in the JPEG image.
The main idea is based on the proportionality of a defined macroscopic quantity
and the hidden message length. In other words, the size of the embedded message
will be reflected in the macroscopic quantity. Hence, the hidden message length
can be determined by computing the macroscopic quantity.
The first step in this technique is to estimate a copy of the cover image from the given stego image. The estimation is carried out by cropping four pixels in both
the horizontal and vertical directions after decompressing the stego image. The
cropped image is then recompressed with the same JPEG quantisation table from
the stego image. Note that this process is the image calibration discussed in the
preceding subsection.
In the next step, the authors use the histograms of several low-frequency DCT
coefficient modes as the macroscopic quantity. The histograms used are from the
given stego image and the estimated cover image. The modification caused by the
embedding will be reflected on the distribution of the histograms. Hence, based
on the histograms, the modification rate can be determined. Finally, with the
modification rate, the size of the hidden message can be computed.
In [38], Fridrich et al. launched an attack on OutGuess using a similar concept.
They started by determining a macroscopic quantity that progressively changes
with the size of the embedded message.
Due to the LSB flipping algorithm of OutGuess, embedding increases the spatial
discontinuities at the boundaries of all 8 × 8 blocks. Hence, the authors used
blockiness as the macroscopic quantity for measuring the degree of change that
occurred at the boundaries. In Figure 3.13, we show an illustration of the 8 × 8
block boundaries, where the blockiness measurement is calculated.
Figure 3.13: The shaded regions denote the boundaries of 8×8 blocks in a decompressed JPEG image
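One plausible blockiness measure can be sketched as the sum of absolute pixel differences across the horizontal and vertical 8x8 block boundaries (the exact formula in [38] may differ):

```python
import numpy as np

def blockiness(img):
    """Sum of absolute pixel differences across the horizontal and
    vertical boundaries of 8x8 blocks in a decompressed image."""
    x = np.asarray(img, dtype=float)
    b = 0.0
    for r in range(7, x.shape[0] - 1, 8):   # rows just above a block boundary
        b += np.abs(x[r, :] - x[r + 1, :]).sum()
    for c in range(7, x.shape[1] - 1, 8):   # columns just left of a boundary
        b += np.abs(x[:, c] - x[:, c + 1]).sum()
    return b

x = np.zeros((16, 16))
x[8:, :] = 5                                # step exactly on a block boundary
print(blockiness(x))  # 80.0 (16 column differences of 5)
```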
According to the authors, the increase of blockiness is expected to be smaller in
a stego image than in a cover image when a full-length dummy message (random
bits) is artificially re-embedded using OutGuess. This is because of the partial can-
cellation effect on the stego image. For example, the LSB of a pixel xi is changed
from zero to one after embedding. For re-embedding a full-length message, the
LSB of pixel xi is changed back from one to zero.
Note that this attack also depends on the estimation of the cover image, which
is the same technique used in [39]. With the blockiness measurement and the
re-embedding of a full-length message on the given stego image and the estimated
cover image, a linear interpolation is used to estimate the length of the embedded
message.
In another example, He and Huang [50, 49] analysed non-adaptive stochastic mod-
ulation steganography [36] and showed how to estimate the length of a hidden
message. Stochastic modulation steganography is a noise-additive steganography,
where a signal with specific probabilistic distribution is modulated and added to
carry the message bits. The signal in this context is known as stego noise.
The attack is based on the differing probability distributions of pixel differences for
cover and stego images. More precisely, the probability distribution of pixel differ-
ence for a cover image closely follows a generalised Gaussian distribution (GGD),
while the probability distribution of a stego image reflects the statistical charac-
teristics of a hidden message. Note that, for non-adaptive stochastic modulation
steganography, the probabilistic distribution of a stego image’s pixel difference is
a convolution of the probabilistic distributions of a cover image’s pixel difference
and a stego noise difference. Thus, the attack starts by establishing a model to
describe the statistical relationship among the cover image, stego image and stego
noise. Next, the required distributional parameters are estimated from the given
stego image. Then, based on the distributional parameters, the authors employ a
grid search and chi-square goodness of fit test approach to estimate the length of
the embedded message.
The experimental results reported show promising detection accuracy. The au-
thors mention that this steganalysis technique is not only effective for noise addi-
tive steganography, but also suitable for other types of non-adaptive steganography
(e.g., LSB-based steganography and ±k steganography). Unfortunately, no further details regarding this are provided. This technique depends significantly on the assumption that the pixel difference of a cover image is accurately modelled by a GGD. However, this assumption will likely cause the technique to fail when the analysed cover image is a binary image, owing to the modest statistical characteristics of binary images.
Jiang et al. [57] launched an attack on boundary-based steganography that embeds
a secret message in a binary image. Their attack hinges on the observation that
embedding disturbs pixel positions and this degrades the fit of the autoregressive
model on binary object boundaries.
The attack works by assuming that the boundaries of characters or symbols in a
textual document can be modelled by a cubic polynomial. This allows a bound-
ary pixel to be estimated from its neighbouring pixels through an autoregressive
process. An estimation error vector is computed from the given and estimated
boundary pixels. Then the mean and variance of the estimation error vector are
calculated. According to their experiments, the mean and variance increase pro-
portionally with respect to the relative message length. Hence, based on some
testing samples, a linear equation is defined for message length estimation. How-
ever, this attack is not applicable when the object boundaries cannot be modelled
by a cubic polynomial.
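The final linear estimation step can be sketched as a least-squares fit over calibration samples; the numbers below are illustrative, not taken from [57]:

```python
import numpy as np

# Calibration samples: (relative message length, observed error variance).
# The attack fits a line variance = a * length + b to testing samples,
# then inverts it to estimate the length for an observed variance.
lengths = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
variances = np.array([0.10, 0.22, 0.35, 0.47, 0.60])

a, b = np.polyfit(lengths, variances, 1)   # least-squares linear fit
estimate = (0.40 - b) / a                  # invert for an observed variance
print(round(estimate, 2))  # ≈ 0.6
```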
In [56], Jiang et al. launched another attack on binary image steganography. This
attack is based on the idea that the entropy of a stego image is a monotonically
increasing function of the embedding rate. The JBIG2 binary image compression algorithm is used to capture the entropy. This compression algorithm establishes a
quantitative relationship between the compression rate and embedding rate. Thus,
the estimate of message length can be derived from the computed compression rate.
The message length estimation steganalysis techniques mentioned above mainly target a specific steganographic method. The work developed in [71] generalises the estimation technique so that it can be applied to a wider range of steganographic methods. Indeed, it can be considered a multi-class steganalysis technique that uses a
multi-class classifier to estimate the hidden message length. The authors employ
SVM classifiers with one-against-all strategy to perform the multi-classification
tasks. The measurement of standard mean square error is modified and used as a
feature set.
Unfortunately, the use of a multi-classification technique to estimate the message
length is of limited use and impractical. Unlike the multi-class steganalysis dis-
cussed in Subsection 3.2.2, where the number of classes is small, treating each
different message length as a single class will contribute to a large number of
classes. For instance, if there are n classes of steganographic methods with m
different lengths, the multi-classifier will be required to classify n × m different
classes. Clearly, when n and m increase, the classification will become ineffective
as the extracted feature points may have significant overlaps. Therefore, the mes-
sage length estimation technique developed in [71] will likely become unreliable
when the number of classes is large.
3.2.4 Identification of Stego-Bearing Pixels
When the adversary is certain about and has knowledge of the steganographic
method used, he has the opportunity to identify which pixels carry the message
bits.
The work in [27] is motivated by the concept of outlier detection. A model of image
distribution is built first. After that, any pixel that deviates from the model is
identified as an outlier. Together with the outlier detection, the authors of [27]
opted to utilise an image restoration technique. Their idea is that a pixel altered
to carry a message bit will deviate from the image distribution and be identified
as an outlier. When the image restoration technique is applied, the pixel (outlier)
will be automatically removed. Obtaining the list of removed pixels identifies the
locations of stego-bearing pixels.
Due to the wide variety of image content, a non-parametric model will be more
suitable and useful than a parametric model. Hence, the image pixel energy has
been used to model the distribution. In the restoration process, each pixel is
examined and may be conditionally updated to minimise the pixel energy. The
whole process is repeated until convergence occurs.
The developed technique is reported to work with greyscale and colour palette
images, such as GIF. However, the authors report that their technique may be
defeated if the message is adaptively embedded in the high-energy regions of an
image. This also implies that identification of stego-bearing pixels becomes unre-
liable when an image with rich texture content is used as the cover image.
Kong et al. developed a steganalysis technique to identify the region in a colour
image where a secret message is embedded [68]. This technique is specifically
targeted to attack steganography with a sequential embedding algorithm. The
idea comes from the fact that when the colour components (e.g., red, green and
blue) of the colour image are altered independently, the smoothness of the colour
will be disturbed. According to the authors, this observation becomes prominent
under the investigation of a different colour system (e.g., HSI, YUV and YCbCr)
which uses luminance, hue and saturation to describe the colours.
Kong et al. suggest that, in general, the hue of a cover image varies slowly and
tends to be constant in a small neighbourhood of pixels. This is no longer true
when a hidden message is embedded. Thus, when the coherence of the hue in a
region under examination exceeds a certain threshold, there is a good reason to
suspect that it contains bits of a hidden message.
Kong et al.'s technique can be summarised as follows. Given a
colour image, their technique will partition the image into blocks and examine
each block separately. The pixels within each block are divided into two distinct
groups: coherence and incoherence. A pixel is assigned to the coherence group
if the maximal difference of hue between this pixel and its neighbouring pixels
is less than a threshold. In addition, at least one neighbouring pixel with the
same hue as that pixel must exist. Any pixel that fails to fulfil these conditions
will be assigned to the incoherence group. The ratio of the coherence group to
the incoherence group determines if a block should be labelled as a stego-bearing
region.
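The per-block grouping step can be sketched as follows. This is a toy illustration operating on a precomputed hue array; the coherence threshold t and the block-labelling rule are hypothetical stand-ins, since [68] uses three tuned thresholds.

```python
import numpy as np

def coherence_ratio(hue_block, t=0.05):
    """Fraction of pixels in the coherence group: the hue differs from every
    4-neighbour by less than t AND at least one neighbour has an identical
    hue. (The threshold t is a hypothetical choice for illustration.)"""
    H, W = hue_block.shape
    coherent = 0
    for x in range(H):
        for y in range(W):
            neighbours = [hue_block[i, j]
                          for i, j in ((x - 1, y), (x + 1, y),
                                       (x, y - 1), (x, y + 1))
                          if 0 <= i < H and 0 <= j < W]
            diffs = [abs(hue_block[x, y] - v) for v in neighbours]
            if max(diffs) < t and any(d == 0 for d in diffs):
                coherent += 1
    return coherent / (H * W)

smooth = np.full((8, 8), 0.3)                    # constant hue: fully coherent
rng = np.random.default_rng(0)
disturbed = rng.uniform(0.0, 1.0, size=(8, 8))   # independent hues: incoherent
```

A block whose ratio of coherent to incoherent pixels falls past the (tuned) labelling threshold would then be marked as a stego-bearing region.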
Note that this technique involves a high degree of threshold dependency. For exam-
ple, the technique requires three different thresholds to identify the stego-bearing
region. This may be a drawback in practice, especially with a wide variety of image
content. Hence, careful selection of cover images renders this technique ineffec-
tive. Furthermore, this technique only works for steganography with sequential
embedding. If the embedding is random, then this attack will fail.
In [62], Ker argues that it is possible to have a situation, where several different
cover images are used for a batch of secret communications. These images are the
same size, but embedded with different messages. It is very likely that the same
stegokey will be used for this batch of communications.
We can relate a simple but plausible scenario to this assumption. For instance, a
batch of secret communications can use a set of different images, captured with
the same settings of a digital camera. This produces images with the same size.
For security reasons, random embedding algorithms are preferable. As usual,
the random embedding is controlled by a stegokey and it is quite possible the
same stegokey is reused for the entire batch of communications. This will result
in different messages embedded in the same locations across different images. In
addition, it is also possible that a sequential embedding algorithm is used, resulting
in embedding with the same fixed pixel locations.
This is what inspired Ker [62] to develop a technique to identify the locations of
these stego-bearing pixels. In this work, Ker employed a weighted stego image
(WS), initially developed in [37] and later improved [63]. The analysis is based on
the residuals of WS. The residual of a WS is the pixel-wise difference between the
stego image and the estimated cover image. The residual at the ith pixel can be
defined as follows:
r_i = (s_i − s̄_i)(s_i − c_i),   (3.4)

where s_i is the ith pixel of the stego image, s̄_i is the corresponding stego pixel
with its LSB flipped and c_i is the pixel of the estimated cover image.
With access to multiple stego images, as in the scenario described above, the mean
of the residual at the ith pixel can be computed as follows:
r̄_i = (1/N) Σ_{j=1}^{N} r_ij,   (3.5)

where N is the total number of stego images and r_ij is obtained as in Equation (3.4)
for the jth stego image:

r_ij = (s_ij − s̄_ij)(s_ij − c_ij).   (3.6)
When N is sufficiently large, the mean of the residual will provide strong evidence
for separating stego-bearing pixels from normal pixels.
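Equations (3.4)–(3.6) translate directly into code. A minimal sketch, assuming 8-bit greyscale stego images and a cover estimate supplied by some external estimator (the toy batch below simply uses the exact cover):

```python
import numpy as np

def ws_residual(stego, cover_est):
    """r_i = (s_i - sbar_i)(s_i - c_i), Eq. (3.4); sbar is the stego image
    with every least significant bit flipped."""
    s = stego.astype(np.int64)
    sbar = s ^ 1
    return (s - sbar) * (s - cover_est.astype(np.int64))

def mean_residual(stegos, cover_ests):
    """Pixel-wise mean of the residuals over N stego images, Eq. (3.5)."""
    return np.mean([ws_residual(s, c) for s, c in zip(stegos, cover_ests)],
                   axis=0)

# Toy batch: the same 50 pixel positions carry LSB-replacement payload in
# every image, mimicking a reused stegokey.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16)).astype(np.int64)
stegos = []
for _ in range(40):
    s = cover.copy()
    flat = s.reshape(-1)                # view: writes go through to s
    flat[:50] = (flat[:50] & ~1) | rng.integers(0, 2, size=50)
    stegos.append(s)

means = mean_residual(stegos, [cover] * 40).reshape(-1)
```

With the exact cover, unchanged pixels have zero residual while payload pixels average about 0.5, so ranking pixels by the mean residual exposes the embedding locations.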
Note that the analysis developed in [62] is most effective for LSB replacement
steganography. It may become ineffective for other steganographic methods, such
as LSB matching. Aware of this limitation, Ker and Lubenko ex-
tended the work to cover the analysis of LSB-matching steganography in [64].
They use the residuals of wavelet absolute moments (WAM), which are derived
from the feature set developed for blind steganalysis in [44]. The residuals of
WAM are computed as follows. Given an image, one level of wavelet decomposition
using an eight-tap Daubechies filter is employed. The decomposition produces four
subbands—the low frequency, vertical, horizontal and diagonal subbands. Then,
the authors use a quasi-Wiener filter to compute the residuals of WAM from the
vertical, horizontal and diagonal subbands. The low-frequency subband
is not used; it is set to zero. The residuals of WAM and the zeroed low-
frequency subband are reconstructed through the inverse of the wavelet transform.
This reconstruction produces something similar to a spatial domain image that
Figure 3.14: The extraction of the residual image. L, V, H and D denote the low-
frequency, vertical, horizontal and diagonal subbands, respectively. L' is the zeroed-
out low-frequency subband. R[V], R[H] and R[D] denote the residuals of WAM
from the vertical, horizontal and diagonal subbands, respectively.
the authors call a spatial domain residual image. The whole process is depicted
in Figure 3.14.
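The pipeline of Figure 3.14 can be sketched with PyWavelets. This is a simplified version in which the quasi-Wiener filter is reduced to single-window local-variance Wiener shrinkage; the noise variance and window size are assumptions of this sketch, not values from [64].

```python
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def residual_image(img, noise_var=0.5, win=3):
    """One-level db4 (eight-tap Daubechies) decomposition, Wiener-style
    residuals of the three detail subbands, zeroed low-frequency subband,
    then inverse transform back to the spatial domain."""
    cA, (cH, cV, cD) = pywt.dwt2(np.asarray(img, dtype=float), 'db4')

    def wiener_residual(h):
        # Local signal-variance estimate over a win x win neighbourhood.
        local_var = np.maximum(
            uniform_filter(h * h, win) - uniform_filter(h, win) ** 2, 0.0)
        # Shrinkage keeps the estimated signal; the residual is what is left.
        return h - h * local_var / (local_var + noise_var)

    zeroed = np.zeros_like(cA)                     # L' in Figure 3.14
    return pywt.idwt2(
        (zeroed, (wiener_residual(cH), wiener_residual(cV),
                  wiener_residual(cD))), 'db4')
```

A flat image yields an all-zero residual while embedding-like noise survives the filtering; averaging the absolute residual per pixel over a batch of stego images then ranks candidate stego-bearing pixels.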
Similar to [62], and under the same assumption that multiple stego images
are available and the same stegokey is reused for the embedding, the identification
of stego-bearing pixels proceeds as follows. It starts by computing the mean of
the absolute residual for every pixel across all stego images; this mean is computed
analogously to the mean of the residual defined in Equation (3.5). A pixel used to
carry a message bit will have a higher mean of the absolute residual.
Thus, the locations of stego-bearing pixels can be identified by selecting the p
pixels with the highest mean of the absolute residual. According to the authors,
p can be estimated by a quantitative steganalysis technique (the analysis discussed
in Subsection 3.2.3). However, any inaccuracy in the estimate of p will cause
inaccurate identification of stego-bearing pixels.
One important observation is that cover image estimation has a significant effect
on the analysis. In general, the accuracy of the analysis can be greatly improved
by using a more accurate estimation technique. In [62, 64] it is very important
to keep the number of required stego images to a minimum. For example, if
several hundred stego images are required to obtain an accurate identification, the
technique will be of limited use.
3.2.5 Retrieval of Stegokey
In a scenario where the embedded message is not encrypted and the key space
is small, it is very likely that the adversary can mount a dictionary attack or
brute-force search for the stegokey. For example, for every stegokey tried, the
adversary gets an alleged message and the correct stegokey would be revealed
when a meaningful message is obtained. The following examples show a more
advanced version of this type of attack.
Fridrich et al. [41] developed a steganalysis technique that can retrieve the ste-
gokey. The technique was developed with several assumptions: (i) the retrieved
stegokey is the seed of a pseudorandom number generator (PRNG), and (ii) the
steganalysis is independent of the encryption algorithm. As the steganography
may use a mapping component, such as a hash function, to map the password
to the seed, it is reasonable to assume the retrieved stegokey is the seed of a
PRNG rather than the password. Normally, a message will be encrypted before
the embedding algorithm is applied. For that, a stegokey (or stegokeys) is used
in the encryption algorithm as well as the PRNG that generates the embedding
path in the embedding algorithm. Clearly, the computation of stegokey retrieval
may become infeasible—this is where the second assumption comes into play. The
technique involves only finding the seed used in the embedding algorithm and
discarding the encryption algorithm.
Given an image with N pixels, where m < N pixels are randomly selected during
embedding to carry the secret message bits, the embedding path generated by the
stegokey is a random path. The steganalysis technique starts by taking n samples
of pixels where n < m. The n samples are selected randomly from the stego image
and the random selection path is generated from a seed kj. kj is from the stegokey
space. The correct stegokey is determined through a brute force search within the
stegokey space for different kj. The distributions of n samples for the correct and
incorrect stegokeys are different; therefore, it is suitable to use their probability
density functions (PDFs) as the statistical properties. Finally, the chi-square test
is used. For every tested stegokey, the chi-square statistic is obtained and the
outlier will be identified as the correct stegokey.
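The search loop can be illustrated with a toy model. This is not Fridrich et al.'s exact statistic (they compare the PDFs of the sampled pixels); the cover model with constant LSBs, the seed-to-path mapping and the plain LSB-uniformity chi-square below are all assumptions made to show how the statistic for the correct seed stands out as an outlier.

```python
import numpy as np

N, m, n = 10_000, 1_000, 200      # image pixels, embedded pixels, sample size

def path(seed):
    """Pseudorandom embedding path generated from a candidate seed."""
    return np.random.default_rng(seed).permutation(N)

# Toy stego image: cover LSBs are all 0; LSB replacement along the true path
# randomises the LSBs of the first m pixels on that path.
pixels = np.full(N, 128, dtype=np.int64)
true_key = 7
emb = path(true_key)[:m]
pixels[emb] = (pixels[emb] & ~1) | np.random.default_rng(99).integers(0, 2, m)

def chi_square(seed):
    """Chi-square statistic of LSB counts along the alleged path; small
    (uniform LSBs) only when the seed reproduces the embedding path."""
    ones = int((pixels[path(seed)[:n]] & 1).sum())
    expected = n / 2
    return ((ones - expected) ** 2 + ((n - ones) - expected) ** 2) / expected

stats = {k: chi_square(k) for k in range(20)}
best = min(stats, key=stats.get)   # the outlier statistic reveals the seed
```

Every incorrect seed samples mostly unmodified pixels, whose biased LSBs inflate the statistic, while the correct seed's sample looks uniform.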
This technique was tested on two JPEG steganographic methods—F5 and Out-
Guess. Later, they extended the analysis to cover spatial domain image steganog-
raphy [42]. In [42], Fridrich et al. chose generic LSB replacement and LSB match-
ing steganography as the benchmark steganographic methods for testing their
extended steganalysis technique.
Their extended technique adds a pre-processing step. In the pre-processing step,
a non-linear filtering operation is used to increase the signal-to-noise ratio (SNR)
between the stego signal and the cover image. Thus, instead of pixel values,
residuals are used. Residuals are computed as the difference between the pixel
values of the image and its filtered version.
Although both techniques are powerful and can be applied practically to a wider
class of steganography, the following issues may reduce their effectiveness:
i. The embedded message occupies 100 per cent of the image capacity.
ii. Matrix encoding is employed as part of the embedding algorithm.
iii. The speed of the PRNG is reduced (hence the brute-force search time increases
exponentially and makes the technique infeasible).
The authors also noted that their technique could become complicated and difficult
when the stego image is noisy or the stegokey space is huge. However, if multiple
stego images embedded with the same stegokey are available, this will increase the
probability of retrieving the correct stegokey.
A similar analysis is found in [105, 106]. The focus of the analysis is on retrieving
the stegokey in a sequential embedding algorithm. The stegokey is defined differ-
ently and identified as the start-end locations of a consecutive embedding path.
To identify the start-end locations, the analysis employs a sequential analysis tech-
nique called the cumulative sum (CUSUM). The idea is to detect a “sudden jump”
in the statistic, which indicates the existence of a message.
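The "sudden jump" detection can be sketched with a one-sided CUSUM. The drift and alarm threshold below are illustrative choices, and the per-pixel statistic being monitored is abstracted into a plain sequence:

```python
def cusum(samples, drift=0.5):
    """One-sided CUSUM: accumulate deviations above `drift`, resetting at
    zero, so a sustained mean shift makes the statistic climb steadily."""
    s, trace = 0.0, []
    for x in samples:
        s = max(0.0, s + x - drift)
        trace.append(s)
    return trace

# No signal in the first 50 samples, then a mean-1 "message" segment: the
# statistic stays at zero, then jumps once embedding starts.
trace = cusum([0.0] * 50 + [1.0] * 50)
alarm = next(i for i, v in enumerate(trace) if v > 3.0)  # alleged start
```

The index at which the statistic first crosses the threshold locates the start of the sequential embedding; running the scan backwards would locate its end.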
The authors extended the steganalysis by utilising a locally most powerful (LMP)
sequential statistical test. The LMP test is an optimal statistical test for the detection
of weak signals. The extended steganalysis is mainly used to handle the difficulty
of analysing messages with a low signal-to-noise ratio (SNR)1 in a stego image. The
CUSUM was later combined with LMP to form an enhanced steganalysis technique
and its effectiveness tested with spread-spectrum steganography. In addition, it
can detect multiple messages embedded sequentially in different image segments.
The developed steganalysis technique seems to be a useful tool to evaluate the
security level in watermarking applications. However, this technique may not be
suitable for analysing steganography, especially when it uses a random embedding
algorithm. Note that the analysis presented in [105, 106] is more related to the
identification of stego-bearing pixel locations (discussed in Subsection 3.2.4) than
1 Low SNR is often required to maintain the imperceptibility of embedding.
stegokey retrieval.
3.2.6 Extracting the Hidden Message
In general, messages are encrypted using a cryptographically strong encryption
algorithm before embedding, which provides a second layer of security. Therefore,
we might not obtain a meaningful message even after extracting a hidden message
using the steganalysis techniques discussed. Clearly, it is most desirable for the
extracted message to be deciphered as well.
It is reasonable to consider and separate the analysis of steganography into two
phases: steganalysis and cryptanalysis. Steganalysis involves the analysis dis-
cussed from Subsection 3.2.1 until 3.2.5, whereas cryptanalysis deciphers the hid-
den message extracted in the steganalysis phase. Note that if a message is not
encrypted before embedding, cryptanalysis is not required.
Chapter 4
Blind Steganalysis
In general, there are two types of steganalysis—targeted and blind. Targeted
steganalysis is designed to attack one particular embedding algorithm. For exam-
ple, Bohme and Westfeld [7] broke model-based steganography [96] using analy-
sis of the Cauchy probability distribution. In another example, He and Huang
[49] successfully estimated the hidden message length for stochastic modulation
steganography [36], where a specific probabilistic distribution of a signal is modu-
lated and added to carry message bits. Jiang et al. in [57] launched an attack on
boundary-based steganography, which embeds secret messages in binary images.
Their attack hinges on an observation that embedding disturbs pixel positions and
this degrades the fit of an autoregressive model on binary object boundaries. The
work in [41, 42] showed how to estimate the stegokeys used for embedding hidden
messages. Targeted steganalysis can produce more accurate results, but can fail
if the embedding algorithm differs from the target one.
Blind steganalysis can be considered a universal technique that detects differ-
ent types of steganography. Because blind steganalysis can detect a wider class
of steganographic techniques, it is generally less accurate compared to targeted
steganalysis. However, blind steganalysis can detect a new steganographic tech-
nique, when there is no targeted steganalysis available. Thus, blind steganalysis
is an irreplaceable detection tool if the embedding algorithm is unknown or se-
cret. Successful blind steganalysis techniques include the feature-based steganal-
ysis proposed in [35], where a set of effective statistical characteristics (features)
is extracted to differentiate cover images from stego images. A similar technique
relying on pixel differences was used in [99, 70] to detect hidden messages—this
feature was proven to work well. Meng et al. employ a run length histogram in
their steganalysis to detect a hidden message in binary images [77].
In this work, we further study blind steganalysis and its effectiveness in detecting
a secret message embedded in a binary image. To confirm that our attack works,
we experiment with steganographic techniques for which we have reduced the
length of the embedded message; in other words, we use images with a reduced
steganographic payload1 in our experiments. Our experiments show that our
steganalysis works well over a wide range of steganographic payloads.
The organisation of this chapter is as follows. In the next section, we give a brief
comparison of the steganographic methods. The technique of analysis we apply is
given in Section 4.2. Section 4.3 presents the experimental results of the analysis
and Section 4.4 concludes the chapter.
4.1 Comparison of the Steganography Methods
under Analysis
The steganographic techniques in [107, 10] are extensions of [82]. Without
loss of generality, we describe the technique given in [82] as an example.
A detailed description of all three techniques can be found in Section 3.1.
Steganography involves two basic operations—embedding and extraction. The em-
bedding operation in [82] starts by partitioning a given image into non-overlapping
blocks of size m × n. The payload for each non-overlapping block is r bits. The
message bits are segmented into streams of r bits and embedded by modifying
some pixels in the blocks. The modification of the pixels is governed by certain
criteria computed through bitwise exclusive-OR and pair-wise multiplication oper-
ations between the non-overlapping block, a random binary matrix (denoted κ) and
a secret matrix (denoted ω). Both κ and ω are m × n matrices and serve as the stegokey.
During extraction, parameters such as m, n and r must be communicated correctly
between the sender and receiver to construct the correct size of non-overlapping
blocks and number of r bits per stream. In addition, the correct stegokey (κ and
ω) is needed to extract the secret message. After that, the receiver can derive the
message bits by using the inverse of the criteria used in the embedding operation.
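The extraction side of this family of schemes can be sketched as follows. This is a simplified reading in which the r bits are recovered as Σ((B ⊕ κ) ⊗ ω) mod 2^r over a block B; it illustrates the exclusive-OR/pair-wise multiplication criterion, not a full re-implementation of [82].

```python
import numpy as np

def extract_bits(block, kappa, omega, r):
    """Recover the r embedded bits of one m x n binary block as
    sum((block XOR kappa) * omega) mod 2**r (pair-wise multiplication)."""
    value = int(np.sum((block ^ kappa) * omega)) % (2 ** r)
    return [(value >> i) & 1 for i in range(r)]   # least significant bit first

# 4 x 4 block, r = 3, with example kappa and weight matrix omega: flipping
# the pixel whose weight is 5 changes the extracted value from 0 to 5.
kappa = np.zeros((4, 4), dtype=int)
omega = np.arange(1, 17).reshape(4, 4)            # example weight matrix
block = np.zeros((4, 4), dtype=int)
```

The embedding side flips at most two pixels to steer this sum to the next r message bits, which is why the receiver needs the same m, n, r, κ and ω.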
The steganography in [107] is an improved version of the steganography devel-
oped initially in [82]. The improvement is mainly in the control of the visual quality
1 Reducing the payload minimises the alteration of image pixels and causes less distortion, hence increasing the steganographic security.
Table 4.1: Comparison of the steganographic techniques

Steganography        Secret Matrix ω        Payload r                 Image Quality   Altered Bits
Pan et al. [82]      weight matrix          ≤ ⌊log2(mn + 1)⌋          –               2
Tseng & Pan [107]    weight matrix          ≤ ⌊log2(mn + 1)⌋ − 1      enhanced        2
Chang et al. [10]    serial number matrix   ≤ ⌊log2(mn + 1)⌋          –               1
of the produced stego image, where only boundary pixels are flipped. The third
steganography [10] also improves on the steganography in [82]: whereas the method
in [82] requires at most two bit alterations per block, the method in [10] requires
only one. The extraction operations for the Tseng and Pan [107] and Chang et
al. [10] steganographic methods are similar to that of Pan et al. [74]. All of these
steganographic techniques were developed to embed secret messages in binary
images. Table 4.1 compares the main steganographic characteristics of the techniques.
It is interesting that these techniques can embed many message bits with few
alterations; for example, the method in [82] can embed as many as ⌊log2(mn + 1)⌋
message bits while altering at most two pixels, whereas conventional techniques
accommodate at most one message bit per altered pixel. Further, adjusting m and
n changes the payload and affects the security level. This gives flexibility in
balancing payload against security.
4.2 Proposed Steganalysis Method
Blind steganalysis can be viewed as a supervised machine learning problem that
classifies an image as either a cover image or a stego image carrying an inserted
secret message. Our analysis includes feature extraction and data classification.
The first stage is crucial and we show how to construct the features. The second
stage uses the SVM [23] as the classifier. The SVM seeks an optimal hyperplane
that separates the feature points of the two classes onto different sides. Based on
this separation, the class an image belongs to can be determined.
4.2.1 Grey Level Run Length Matrix
The feature we want to extract from images is based on the grey level run length
(GLRL). The length is measured by the number of consecutive pixels for a given
grey level g and direction θ. Note that 0 ≤ g ≤ G − 1, G is the total number of
grey levels and θ, where 0◦ ≤ θ ≤ 180◦, indicates the direction. The sequence of
pixels (at a grey level) is characterised by its length (run length) and its frequency
count (run length value), which tells us how many times the run has occurred in
the image. Thus, our feature is a GLRL matrix that fully characterises different
grey runs in two dimensions: the grey level g and the run length ℓ. The GLRL
matrix is defined as follows:
r(g, ℓ|θ) = #{(x, y) | p(x, y) = p(x + s, y + t) = g;
            p(x + u, y + v) ≠ g;
            0 ≤ s < u & 0 ≤ t < v;
            u = ℓ cos(θ) & v = ℓ sin(θ);
            0 ≤ g ≤ G − 1 & 1 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},   (4.1)
where # denotes the number of elements and p(x, y) is the pixel intensity (grey
level) at position x, y. G is the total number of grey levels and L is the maximum
run length.
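Equation (4.1) restricted to θ = 0° (runs along rows) or θ = 90° (runs along columns) can be computed in a single scan. A sketch, assuming grey levels already lie in 0..G−1 and clipping runs longer than L to L:

```python
import numpy as np

def glrl_matrix(img, G, L, theta=0):
    """Grey level run length matrix r(g, l | theta): r[g, l-1] counts the
    maximal runs of grey level g having length l (clipped to L)."""
    r = np.zeros((G, L), dtype=int)
    rows = img if theta == 0 else img.T      # theta = 90 scans columns
    for row in rows:
        run = 1
        for prev, cur in zip(row, row[1:]):
            if cur == prev:
                run += 1
            else:
                r[prev, min(run, L) - 1] += 1
                run = 1
        r[row[-1], min(run, L) - 1] += 1     # close the final run
    return r

img = np.array([[0, 0, 1, 1, 1],
                [1, 0, 0, 0, 1]])
r = glrl_matrix(img, G=2, L=5)
```

Summing over the run-length axis, `r.sum(axis=1)`, collapses the matrix into a per-grey-level histogram of run counts.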
The GLRL matrix extracted from an image can be considered a set of higher-
order statistical characteristics. The GLRL matrix alone is not sufficient for an
analysis of black and white images, where the number of grey levels is drastically
reduced (greyscale and colour images have at least 256 grey levels, whereas binary
images have only two). To fix this technical difficulty, we propose a solution that
allows us to create more grey levels and, consequently, more meaningful statistics.
Our approach is to use pixel differences.
4.2.2 Pixel Differences
The pixel difference is the difference between a pixel and its neighbouring pixels.
Given pixel p(x, y) of an image, with x ∈ [1, X] and y ∈ [1, Y], where X and Y
are the image width and height, respectively, the difference for the pixel p(x, y)
in the vertical direction is defined as follows:

p_v(x′, y′) = p(x, y + 1) − p(x, y),   (4.2)

where x′ ∈ [1, X − 1] and y′ ∈ [1, Y − 1]. The pixel differences in the horizontal,
main diagonal and minor diagonal directions are defined similarly.
It is easy to observe, and has been confirmed by experiments, that introducing the
pixel difference increases (almost doubles) the number of grey levels. To illustrate
the point, consider a greyscale image with 256 grey levels. After introducing the
pixel difference, the range of grey levels becomes [−255, +255]. The same dou-
bling effect happens for binary images. This effect is desirable for addressing the
technical difficulty mentioned in Subsection 4.2.1.
The authors of [70] call a similar pixel difference high-order differentiation
and derive some additional sets from it. Their features are defined as follows:

p_c^{n+1}(x, y) = p^n(x, y + 1) − p^n(x, y),
p_r^{n+1}(x, y) = p^n(x + 1, y) − p^n(x, y),   (4.3)

p^1(x, y) = |p_r^1(x, y)| + |p_c^1(x, y)|,   (4.4)

p^2(x, y) = p_r^1(x, y) − p_r^1(x − 1, y) + p_c^1(x, y) − p_c^1(x, y − 1),   (4.5)

where n = 0, 1, 2 and |·| denotes the absolute value. p_c^1(x, y) and p_r^1(x, y) can
be considered the pixel differences in the vertical and horizontal directions, respec-
tively. p^1(x, y) and p^2(x, y) are the respective higher-order total differentiations.
p^0(x, y) is a special case: the given image itself.
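Equations (4.3)–(4.5) can be vectorised with array slicing; the index cropping used below to make shapes line up is a convention of this sketch:

```python
import numpy as np

def pixel_differences(p):
    """First-order differences p_c^1, p_r^1 (Eq. 4.3) and the higher-order
    totals p^1 (Eq. 4.4) and p^2 (Eq. 4.5), cropped to common shapes."""
    p = np.asarray(p, dtype=int)
    pc = p[:, 1:] - p[:, :-1]          # p_c^1(x, y) = p(x, y+1) - p(x, y)
    pr = p[1:, :] - p[:-1, :]          # p_r^1(x, y) = p(x+1, y) - p(x, y)
    p1 = np.abs(pr[:, :-1]) + np.abs(pc[:-1, :])
    p2 = (pr[1:, :] - pr[:-1, :])[:, :-2] + (pc[:, 1:] - pc[:, :-1])[:-2, :]
    return pc, pr, p1, p2

# On a binary checkerboard the second-order total p^2 hits the extremes +/-4.
checker = np.indices((6, 6)).sum(axis=0) % 2
pc, pr, p1, p2 = pixel_differences(checker)
```

For a binary input the first-order differences lie in {−1, 0, 1} and p^2 in [−4, 4], which is the widened grey-level range exploited in the next subsection.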
4.2.3 GLRL Matrix from the Pixel Difference
The statistical features we use in our analysis are developed in the following two
stages:
1. In the first stage, we use the pixel difference to increase the number of grey
levels. We incorporate the pixel difference shown in Equation (4.5). Note
that p^2(x, y) in Equation (4.5) is obtained by summing the pixel differences
computed in the horizontal and vertical directions, and that the range of
grey levels increases from [0, 1] for p(x, y) to [−4, 4] for p^2(x, y). This is not
hard to verify. The minimum and maximum grey levels for p_r^1 are −1 and
+1, respectively, and the same applies to p_c^1. Hence, the minimum of
p^2(x, y) is obtained when the neighbouring differences for both p_r^1 and p_c^1
(the paired terms in Equation (4.5)) are −2, which produces p^2(x, y) = −4,
whereas the maximum is obtained when both neighbouring differences are 2,
which produces p^2(x, y) = 4.
2. In the second stage, we compute the GLRL matrix to obtain the required
feature set. This is achieved by extracting the GLRL matrix discussed in
Subsection 4.2.1 from the pixel difference obtained in the first stage. We do
not include p(x, y) (the pixels of the given binary image) or the other pixel
differences of Subsection 4.2.2 (except p^2(x, y)) because we observed that
these features do not improve the results significantly. We also observed that
it is sufficient to use only two directions for θ: 0° and 90°. Thus, by substituting
p^2(x, y) for p(x, y) in Equation (4.1), we obtain our first set of features.
4.2.4 GLGL Matrix
Since GLRL matrix features tend to measure “plateaus” of the image, we need
additional sensitive features to reflect the image “peaks”2. The grey level gap
length (GLGL) matrix proposed in [114] seems to be an appropriate choice. The
authors in [114] used the GLGL matrix in texture analysis and defined it as follows:
a(g, ℓ|θ) = #{(x, y) | p(x, y) = p(x + u, y + v) = g;
            p(x + s, y + t) ≠ g;
            s < u & t < v;
            u = (ℓ + 1) cos(θ) & v = (ℓ + 1) sin(θ);
            0 ≤ g ≤ G − 1 & 0 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},   (4.6)

where # denotes the number of elements, L is the maximum gap length and the
rest of the notations are the same as in Equation (4.1).
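One operational reading of Equation (4.6) at θ = 0° is: for each grey level g, count pairs of g-valued pixels in the same row whose intervening gap of exactly ℓ pixels contains no g. A sketch under that reading:

```python
import numpy as np

def glgl_matrix(img, G, L):
    """Grey level gap length matrix a(g, l | 0deg): a[g, l] counts pairs of
    g-pixels in a row separated by exactly l non-g pixels (l = 0 means the
    pair is adjacent); gaps longer than L are ignored."""
    a = np.zeros((G, L + 1), dtype=int)
    for row in img:
        last_seen = {}                  # last column at which g occurred
        for col, g in enumerate(row):
            if g in last_seen:
                gap = col - last_seen[g] - 1   # pixels strictly between
                if gap <= L:
                    a[g, gap] += 1
            last_seen[g] = col
    return a

a = glgl_matrix(np.array([[0, 1, 0],
                          [1, 1, 0]]), G=2, L=3)
```

Because only consecutive occurrences of g are paired, every counted gap is guaranteed to contain no pixel of level g, matching the "≠ g" condition of the definition.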
We compose two features from the GLGL matrix for our second feature set. The
first feature is computed directly from the binary image, as shown in Equation
(4.6). The second feature is computed from the pixel difference, as for the GLRL
matrix in Subsection 4.2.3, by replacing p(x, y) in Equation (4.6) with p^2(x, y).
We set θ = 0° for both features.

2 Small pixel-wide notches and protrusions near the boundary pixels caused by embedding.
4.2.5 Final Feature Sets
It is too computationally expensive to use all elements of the GLRL and GLGL
matrices as feature elements. Therefore, we simplify by transforming the
two-dimensional GLRL and GLGL matrices into one-dimensional histograms:
h_g^GLRL = Σ_{ℓ=1}^{L} r(g, ℓ|θ),   0 ≤ g ≤ G − 1,   (4.7)

where θ = 0° and 90°, and the rest of the notations are the same as in Equation
(4.1). In addition, we observe that, within a GLRL matrix, there is a high
concentration of frequencies near the short runs, which may be important. Hence,
we propose to extract the first four short runs as a single histogram, h_g^{srα}:

h_g^{srα} = r(g, α|θ),   0 ≤ g ≤ G − 1,   (4.8)

where θ = 0° and 90°, and α = 1, 2, 3, 4 are the selected short runs.
The one-dimensional histogram of the GLGL matrix, h_g^GLGL, can be obtained in
the same way as in Equation (4.7), with r(g, ℓ|θ) replaced by a(g, ℓ|θ) for both
p(x, y) and p^2(x, y). In this case θ = 0° and 0 ≤ ℓ ≤ L.
We also incorporate some of the high-order differentiation features developed in
[70]. These one-dimensional histograms can be derived from Equations (4.3) to
(4.5) as follows:

h_q^{p^n} = Σ_{x=1}^{X} Σ_{y=1}^{Y} δ(q, p^n(x, y)),   min_p ≤ q ≤ max_p,   (4.9)

h_q^{p_c^m + p_r^m} = Σ_{x=1}^{X} Σ_{y=1}^{Y−1} δ(q, p_c^m(x, y)) + Σ_{x=1}^{X−1} Σ_{y=1}^{Y} δ(q, p_r^m(x, y)),
   min_p ≤ q ≤ max_p,   (4.10)

where δ(µ, ν) = 1 if µ = ν and 0 otherwise, n = 1, 2 and m = 1, 2, 3. min_p and
max_p denote the minimum and maximum values of the grey level, respectively.
Other notations are the same as in Subsection 4.2.2.
As noted previously, blind steganalysis can be considered two-class classification,
so the extracted feature sets must be sensitive to embedding alterations. That is,
the feature values of the cover image should differ from those of the stego image;
the larger the difference, the better the features. Hence, we apply a characteristic
function, CF, to each of the above histograms to achieve better discrimination.
The characteristic function is computed using the discrete Fourier transform, as
shown in Equation (4.11):
CF_k = Σ_{n=0}^{N−1} h_n e^{−(2πi/N)kn},   0 ≤ k ≤ N − 1,   (4.11)

where N is the vector length, i is the imaginary unit and e^{−2πi/N} is a primitive
Nth root of unity.
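Equation (4.11) and the four statistics taken from it can be sketched with the FFT. The standardised-moment definitions of skewness and kurtosis are assumptions of this sketch, and a zero-variance characteristic function is guarded against:

```python
import numpy as np

def cf_moments(hist):
    """|CF_k| of a histogram via the DFT (Eq. 4.11), summarised by its
    mean, variance, kurtosis and skewness."""
    cf = np.abs(np.fft.fft(np.asarray(hist, dtype=float)))
    mu = cf.mean()
    sigma = cf.std()
    if sigma == 0.0:                 # flat |CF|: higher moments undefined
        return np.array([mu, 0.0, 0.0, 0.0])
    z = (cf - mu) / sigma
    return np.array([mu, sigma ** 2, np.mean(z ** 4), np.mean(z ** 3)])
```

Applied to each histogram (and each direction where applicable), these four numbers per characteristic function assemble into the full feature vector.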
For each characteristic function (one for each histogram), we compute the mean,
variance, kurtosis and skewness. This includes the characteristic functions cal-
culated from Equations (4.9) and (4.10), for which the original work in [70] uses
only the first-order moment. Together these statistics form a 68-dimensional
feature space, as summarised in Table 4.2.
Table 4.2: Respective feature sets and the total number of dimensions for each set

Histogram Type        Number of Directions   Number of Matrices   Statistics3   Total Dimension
h_g^GLRL              2                      1                    4             8
h_g^{srα}             2                      4                    4             32
h_g^GLGL              1                      2                    4             8
h_q^{p^n}             –                      2                    4             8
h_q^{p_c^m + p_r^m}   –                      3                    4             12
Empirical evidence shows that the difference in feature values between the cover
image and the re-embedded cover image4 is significantly larger than the difference
between those of the stego image and the re-embedded stego image. This is helpful
in creating discriminating features and we apply it in our 68-dimensional feature
space as the final constructed feature set.
3 Consists of the mean, variance, kurtosis and skewness.
4 The re-embedded image is the same image re-embedded with a full-length random message using the same steganography.
Table 4.3: Experimental parameters

Steganography       Block Size (pixels)   Payload r   Message Length (%)   Total Number of Stego Images
Pan et al. [82]     32 × 32               3           10, 30, 50 and 80    659 × 4 = 2636
Tseng & Pan [107]   32 × 32               3           10, 30, 50 and 80    659 × 4 = 2636
Chang et al. [10]   32 × 32               3           10, 30, 50 and 80    659 × 4 = 2636
4.3 Experimental Results
4.3.1 Experimental Setup
In our experiments, we construct a set of 659 binary images as cover images. The images are all textual documents with a white background and black foreground, at a resolution of 200 dpi and an image size of 800 × 800 pixels. The experimental parameters are summarised in Table 4.3.
As shown in Table 4.3, we used a larger non-overlapping block size (32 × 32) and a shorter secret message (a small payload of r = 3 bits per block). This setup makes our attack more difficult because the steganographic embedding is more secure. The secret message length is measured as the ratio of the number of embedded message bits to the maximum number of message bits that can be embedded in a binary image. We use uniformly distributed random message bits in the experiments.
We extract the feature sets proposed in Subsection 4.2.5 for each image and use
the SVM implemented in [9] to classify the class (cover or stego) of the image. For
all experiments, we dedicate 80 per cent of the images to training the classifier;
the remaining 20 per cent are used for testing. The prototype implementation is
coded in Matlab R2008a.
4.3.2 Results Comparison
We use a receiver operating characteristic (ROC) curve to illustrate our detection
results. The ROC curve is a plot of detection probability versus false alarm prob-
ability; each point plotted on the ROC curve represents the achieved performance
of the steganalysis. We also use the area under the ROC curve (AUR) to provide
a clearer comparison. The AUR values range from 0.5 to 1.0, where 0.5 is the worst detection performance and 1.0 is optimal: an AUR of 0.5 indicates that the detection is no better than random guessing, while an AUR of 1.0 means the detection is perfectly reliable (detection probability = 1.0). Therefore, the closer the AUR is to 1.0, the better.
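As an illustration of how these quantities are obtained, the following sketch uses scikit-learn (rather than the Matlab tooling used in the thesis) to compute an ROC curve and its area from hypothetical classifier scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical decision scores: higher means "more likely stego".
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 = cover, 1 = stego
scores = np.array([0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.90, 0.30])

fpr, tpr, _ = roc_curve(labels, scores)  # false alarm vs. detection probability
aur = auc(fpr, tpr)                      # area under the ROC curve
```

Each (fpr, tpr) point corresponds to one decision threshold; the AUR summarises the whole curve with a single number between 0.5 and 1.0.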
The respective ROC curves with the AUR values (in brackets) are shown in Figure
4.1. The area under the dotted diagonal line in each ROC curve is 0.5 (AUR =
0.5), which corresponds to random guessing. The figure clearly shows that the
detection results are very promising and the steganography developed by Tseng
and Pan [107] appeared to be the most difficult to detect. This is consistent
with the claim by Tseng and Pan that their method is an improved version. The
detection results for Pan et al. [82] and Chang et al. [10] methods are nearly
perfect.
[Figure 4.1 comprises three ROC plots of detection probability versus false alarm probability, one per steganographic method. AUR values by payload: (a) 10%: 0.9203; 30%, 50% and 80%: 1.0000. (b) 1%: 0.5205; 10%: 0.7843; 30%: 0.9930; 50%: 0.9997; 80%: 1.0000. (c) 10%: 0.9182; 30%, 50% and 80%: 1.0000.]

Figure 4.1: Detection results using ROC curves and AUR: (a) detection result for Pan et al. [82]; (b) for Tseng and Pan [107]; and (c) for Chang et al. [10]
It is also worth mentioning that the longer the embedded secret message, the more
image distortion it produces. Hence, it is relatively easier to detect a stego image
with a long embedded message than one with a shorter message. This is shown in
Figure 4.1 where the detection accuracy increased as the test moved from a shorter
message (10 per cent) to a longer message (80 per cent). For a message length
greater than 30 per cent, the detection is very accurate. However, a very short message (one per cent) is very difficult to detect (refer to the yellow line in Figure 4.1(b)). The main reason is that the image alteration caused by the embedding is minimal (e.g., modifying 15 pixels in an 800 × 800 pixel image) and is therefore barely captured by our features.
4.4 Conclusion
Our 48 newly proposed feature dimensions, used in combination with a modified version of the existing 20-dimensional feature set, achieve reliable and effective detection of secret messages embedded in binary images. The experimental results show that the proposed method can detect embedded messages of different lengths, even at low embedding rates. In addition, our proposed method can detect more than one steganographic method, which makes it a suitable blind steganalysis for binary images.
Chapter 5
Multi-Class Steganalysis
In general, blind steganalysis is considered a two-class classification problem. This means that, given an image, the steganalysis should be able to decide the class (cover or stego) of the image. It is possible to extend blind steganalysis to form a multi-class
steganalysis. From a practical point of view, multi-class steganalysis is similar to
blind steganalysis; however, it can accommodate more classes. The additional
classes come from different types of stego images, produced by different embed-
ding techniques. The task of multi-class steganalysis is to identify the embedding
algorithm applied to produce the given stego image or, if no embedding has been
performed on the image, it should be classified as a cover image.
In [87], Pevny and Fridrich extended the blind steganalysis developed in [35] to
form a multi-class steganalysis. Their multi-class steganalysis can classify embed-
ding algorithms based on the given JPEG stego images. Rodriguez and Peterson
[95] studied a different multi-class steganalysis for JPEG images. In [95], the extracted features are based on wavelet decomposition and an SVM is employed as the classifier. The most recent work is the technique developed by Dong et al. [31].
The main contribution of this multi-class steganalysis is its ability to carry out
classification in two different image domains—the frequency domain (e.g., JPEG
images) and spatial domain (e.g., BMP images). Other multi-class steganalysis
approaches can be found in [88, 97, 108] and all were developed to counter JPEG
image steganography.
Note that these multi-class steganalysis techniques are for images with at least
eight bits per pixel intensity. This means that the images can be greyscale, colour
or true colour images. It is not clear how the existing multi-class steganalysis can
be generalised for black and white binary images. Unlike greyscale and colour
images, binary images have a rather modest statistical nature. This makes it
difficult to apply the existing multi-class steganalysis techniques on binary images.
To the best of our knowledge, there is no multi-class steganalysis proposed for
binary images in the literature.
In this chapter, we propose a multi-class steganalysis for binary images. The
work in this chapter is based on an extension of our previously developed blind
steganalysis for binary images (Chapter 4). There are three main contributions of
this chapter. First, we incorporate additional new features to our existing feature
sets. Second, the concept of cover image estimation is incorporated to enhance the
feature sensitivity. Third, a new multi-class steganalysis technique is developed.
Consequently, we are able to assign a given image to its appropriate class. This will
provide valuable information for steganalysts (e.g., forensic examiners) towards the
goal of extracting hidden messages.
The remainder of this chapter is organised as follows. In the next section, we summarise the steganographic methods under analysis. The method of analysis we apply is given in Section 5.2, and the construction of the multi-class classifier is discussed in Section 5.3. Section 5.4 presents the experimental results of the analysis and Section 5.5 concludes the chapter.
5.1 Summary of the Steganographic Methods under Analysis
This chapter analyses five different types of steganography. These steganographic methods were described in Section 3.1; here, we briefly summarise them with a focus on their embedding algorithms. All of the steganographic methods were developed to embed secret messages in binary images.
The first three methods under analysis are from the work developed in [82, 107,
10]. These methods are all variants of block-based steganography. To perform
embedding, a given binary image will be partitioned into non-overlapping blocks.
The message bits are divided into a stream of r bits before being embedded in
the block. Two sets of matrices, the random binary matrix and secret weight
matrix (the method in [10] uses the serial number matrix instead) are used to
determine which pixels should be flipped when necessary. The two matrices are
shared between the sender and receiver as the stegokey.
The steganography developed in [69] is considered boundary-based steganography.
This type of steganography will hide a message along the edges where white pixels
meet black ones—these pixels are known as boundary pixels. To obtain higher
imperceptibility, the locations of pixels used for embedding are permuted and
distributed over the whole image. The permutation is controlled by a PRNG
whose seed is a secret shared by the sender and the receiver.
Not all boundary pixels are suitable for carrying message bits, because embedding a bit into an arbitrary boundary pixel may convert it into a non-boundary one. This would jeopardise extraction and make recovery of the hidden message impossible. Because of these technical difficulties, improvements were developed that add restrictions on the selection of boundary pixels for embedding.
The last steganography under our analysis is that developed by Wu and Liu in
[112]. This technique also starts by partitioning a given image into blocks. The
odd-even relationship of the pixels within a block is adjusted to hold the message
bit. Clearly, when this odd-even relationship holds for the message bit to be
embedded then no alteration is required. Otherwise, some pixels are selected and
altered to adjust the odd-even relationship. Moreover, a flippability scoring system
is constructed to ensure the pixel selection for alteration is efficient.
5.2 Proposed Steganalysis
The ultimate goal of steganalysis is to extract the full hidden message. This task,
however, may be very difficult to achieve. Thus, we may start with more realistic
and modest goals, such as identifying the type of steganographic technique used
for the embedding. We want to improve our existing technique so that we can
identify the embedding algorithm.
To do this, we propose a multi-class steganalysis. Multi-class steganalysis can be
viewed as a supervised machine learning problem where we want to determine
the class of a given image. Our analysis consists of feature extraction and classification stages. The first stage is crucial, and we show how to construct the existing and new features in this section. The second stage uses the SVM [23] to construct the multi-class classifier, which we describe in detail in Section 5.3.
Figure 5.1: Pixel difference in vertical direction
5.2.1 Increasing the Grey Level via the Pixel Difference
Black and white images have a drastically reduced number of grey levels (recall that greyscale and colour images have at least 256 grey levels), which is insufficient for statistical analysis. To resolve this technical difficulty, we propose a solution that creates more grey levels and, consequently, more meaningful statistics. Our approach is based on the pixel difference.
The pixel difference is the difference between a pixel and its neighbouring pixels. Given a pixel p(x, y) of an image, with x ∈ [1, X] and y ∈ [1, Y], where X and Y are the image width and height, respectively, the pixel difference in the vertical direction is defined as

p_v(x', y') = p(x, y + 1) − p(x, y),   (5.1)

where x' ∈ [1, X − 1] and y' ∈ [1, Y − 1]. The pixel differences for the horizontal,
main diagonal and minor diagonal directions are defined similarly. Figure 5.1
illustrates the pixel difference in the vertical direction.
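The four directional differences can be sketched in NumPy; the array convention (rows index y, columns index x) is our own assumption, since the thesis does not fix one:

```python
import numpy as np

def pixel_differences(p):
    """Directional pixel differences of an image array (Equation (5.1) and analogues).

    Rows are assumed to index y and columns to index x.
    """
    p = np.asarray(p, dtype=int)  # signed type so differences can be negative
    vert       = p[1:, :] - p[:-1, :]     # p(x, y+1) - p(x, y)
    horiz      = p[:, 1:] - p[:, :-1]     # p(x+1, y) - p(x, y)
    main_diag  = p[1:, 1:] - p[:-1, :-1]  # p(x+1, y+1) - p(x, y)
    minor_diag = p[1:, :-1] - p[:-1, 1:]  # p(x-1, y+1) - p(x, y)
    return vert, horiz, main_diag, minor_diag
```

Note the cast to a signed integer type: a binary image stored as unsigned bits would silently wrap the −1 differences.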
It is easy to observe, and has been confirmed by our experiments, that introducing the pixel difference almost doubles the number of grey levels. To illustrate this point, consider a greyscale image with 256 grey levels: after taking the pixel difference, the range of grey levels becomes [−255, +255]. The same doubling effect occurs for binary images, and it is exactly what is needed to resolve the technical difficulty mentioned above.
For this purpose, we incorporated the pixel differences developed in [70]. Their features are defined as follows:
p_c^{n+1}(x, y) = p^n(x, y + 1) − p^n(x, y),
p_r^{n+1}(x, y) = p^n(x + 1, y) − p^n(x, y),   (5.2)

p^1(x, y) = |p_r^1(x, y)| + |p_c^1(x, y)|,   (5.3)

p^2(x, y) = p_r^1(x, y) − p_r^1(x − 1, y) + p_c^1(x, y) − p_c^1(x, y − 1),   (5.4)

where n = 0, 1, 2 and |·| represents the absolute value. p_c^1(x, y) and p_r^1(x, y) can be considered pixel differences in the vertical and horizontal directions, respectively, while p^1(x, y) and p^2(x, y) are the corresponding higher-order total differences. p^0(x, y) is a special case: it is the given binary image itself.
We further define the pixel differences one order higher as follows:

p_c^3(x, y) = p^2(x, y + 1) − p^2(x, y),
p_r^3(x, y) = p^2(x + 1, y) − p^2(x, y).   (5.5)
We call these third-order pixel differences. We would like to stress that all the statistical features used in our analysis are based on this third-order pixel difference, and the procedure can be summarised in the following two stages:

1. In the first stage, we use the third-order pixel difference to increase the number of grey levels. Note that p^2(x, y) in Equation (5.4) is obtained by summing the pixel differences computed in the horizontal and vertical directions. The doubling effect of the pixel difference increases the grey levels from [0, 1] for p(x, y) to [−4, 4] for p^2(x, y). This is not hard to verify: the minimum and maximum grey levels of p_r^1 are −1 and +1, respectively, and the same applies to p_c^1. The minimum of p^2(x, y) is obtained when the neighbouring differences for both p_r^1 and p_c^1 (the paired terms in Equation (5.4)) equal −2, which produces p^2(x, y) = −4; the maximum is obtained when both equal 2, which produces p^2(x, y) = 4. Finally, applying the same argument to the third-order pixel difference increases the number of grey levels to 17 (i.e., [−8, 8]).
2. In the second stage, we proceed with the computed third-order pixel difference to extract each of the specific feature sets. In other words, a certain
feature set (the feature sets will be discussed in Subsections 5.2.2 and 5.2.3)
is extracted on top of this third-order pixel difference.
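The two stages can be sketched as follows. Boundary handling (cropping each difference to the region where it is defined) is our own assumption, since the thesis does not specify it:

```python
import numpy as np

def third_order_difference(p):
    """Third-order pixel differences p_c^3 and p_r^3 (Equations (5.2)-(5.5))."""
    p = np.asarray(p, dtype=int)
    d_col = p[:, 1:] - p[:, :-1]   # first-order difference along columns
    d_row = p[1:, :] - p[:-1, :]   # first-order difference along rows
    # p^2 (Equation (5.4)), valid on the interior; grey levels lie in [-4, 4]:
    a = (d_row[1:, :] - d_row[:-1, :])[:, 1:-1]
    b = (d_col[:, 1:] - d_col[:, :-1])[1:-1, :]
    p2 = a + b
    # Third order (Equation (5.5)); grey levels lie in [-8, 8]:
    p3c = p2[:, 1:] - p2[:, :-1]
    p3r = p2[1:, :] - p2[:-1, :]
    return p3c, p3r
```

Running this on a binary image confirms the 17-level range [−8, 8] derived in stage 1.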
5.2.2 Grey Level Run Length Matrix
The first feature set we extract is based on the grey level run length (GLRL). The length is measured as the number of consecutive pixels with a given grey level g along a direction θ, where 0 ≤ g ≤ G − 1, G is the total number of grey levels and 0° ≤ θ ≤ 180°. A sequence of pixels at a grey level is characterised by its length (run length) and its frequency count (run length value), which tells us how many times the run occurs in the image. Thus, our feature is a GLRL matrix that fully characterises the different grey runs in two dimensions: the grey level g and the run length ℓ. The general GLRL matrix is defined as follows:
r(g, ℓ | θ) = #{(x, y) | p(x, y) = p(x + s, y + t) = g;
               p(x + u, y + v) ≠ g;
               0 ≤ s < u & 0 ≤ t < v;
               u = ℓ cos(θ) & v = ℓ sin(θ);
               0 ≤ g ≤ G − 1 & 1 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},   (5.6)
where # denotes the number of elements and p(x, y) is the pixel intensity (grey
level) at position x, y. G is the total number of grey levels and L is the maximum
run length.
For our practical implementation, we simply concatenate p_c^3(x, y) with p_r^3(x, y) and substitute the result for p(x, y) in Equation (5.6). In addition, we observed that it is sufficient to use only two directions for θ: 0° and 90°. The GLRL matrices extracted from the third-order pixel difference can therefore be considered higher-order statistical features.
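A sketch of the GLRL matrices for θ = 0° (runs along rows) and θ = 90° (runs along columns); clipping runs longer than the maximum length L into the last bin is our own assumption:

```python
import numpy as np
from itertools import groupby

def glrl_matrices(img, levels, offset, max_run):
    """GLRL matrices r(g, l | theta) for theta = 0 and 90 degrees.

    `offset` maps grey level g to the row index g + offset; for third-order
    differences in [-8, 8] one would use levels = 17, offset = 8.
    """
    img = np.asarray(img)
    out = {}
    for theta, lines in ((0, img), (90, img.T)):
        r = np.zeros((levels, max_run), dtype=int)
        for line in lines:
            for g, run in groupby(line):           # consecutive equal values
                length = min(len(list(run)), max_run)  # clip long runs
                r[g + offset, length - 1] += 1
        out[theta] = r
    return out
```

Each entry r[g + offset, ℓ − 1] counts how often a run of grey level g with length ℓ occurs in the chosen direction.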
5.2.3 Grey Level Co-Occurrence Matrix
We replaced the grey level gap length (GLGL) matrix proposed in our previous
work with the grey level co-occurrence matrix (GLCM). From empirical studies,
we found that GLCM performs better in multi-class classifications than GLGL.
GLCM can be considered an approach for capturing the inter-pixel relationships.
More precisely, the elements in a GLCM matrix represent the relative frequencies
of two pixels (with grey level g1 and g2, respectively) separated by a distance, d.
GLCM can be defined as follows:
o(g1, g2, d|θ) = # {(x, y) | p(x, y) = g1;
p(x+ u, y + v) = g2;
u = d cos(θ) & v = d sin(θ);
0 ≤ g1, g2 ≤ G− 1 & 1 ≤ d ≤ D & 0◦ ≤ θ ≤ 180◦}, (5.7)
where # denotes the number of elements and p(x, y) is the pixel intensity (grey
level) at position x, y. G is the total number of grey levels and D is the maximum
distance between two pixels.
In our implementation, we substitute p(x, y) in Equation (5.7) with p_c^3(x, y), p_r^3(x, y) and |p_c^3(x, y)| + |p_r^3(x, y)|. To avoid confusion, we call the resulting matrices o_1(g_1, g_2, d | θ), o_2(g_1, g_2, d | θ) and o_3(g_1, g_2, d | θ), respectively. From each of these we obtain four GLCM matrices, one for each of the four directions (0°, 45°, 90° and 135°), and we set the distance d to one.
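The co-occurrence count for a single displacement can be sketched as below; the mapping of each θ to a pixel displacement (dx, dy) is our own assumption about the geometry:

```python
import numpy as np

def glcm(img, levels, dx, dy, offset=0):
    """Co-occurrence counts o(g1, g2) for a single displacement (dx, dy).

    `offset` shifts signed grey levels to non-negative indices, e.g.
    levels = 17 and offset = 8 for values in [-8, 8].
    """
    img = np.asarray(img)
    o = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            o[img[y, x] + offset, img[y + dy, x + dx] + offset] += 1
    return o

# One possible displacement per direction, at distance d = 1 (an assumption):
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}
```

The loop bounds simply keep the displaced pixel inside the image, so every valid pair is counted exactly once.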
5.2.4 Cover Image Estimation
Cover image estimation is the process of eliminating embedding artefacts1 in a
given image with the objective of getting close to a “clean image”. Cover image
estimation was first proposed by Fridrich and known as image calibration [38, 39,
35]. For brevity, consider the following proposition:
Let I_c and I_s represent the cover image and stego image, respectively. If Σ|I_c − I'_c| < Σ|I_s − I'_s|, then

φ(I_c) − φ(I'_c) < φ(I_s) − φ(I'_s),   (5.8)

where I'_c and I'_s are the estimated cover images from I_c and I_s, respectively; I − I' is the pixel-wise difference between two images of the same resolution; |·| represents the absolute value; and φ(·) denotes the feature extraction function.

¹ An embedding artefact is any alteration or mark introduced by embedding.
From this proposition, the feature sets extracted from the feature differences (e.g.,
φ(Is) − φ(I ′s)) can be considered as the differences caused by the embedding op-
eration, as long as the relationship holds. This is desired because we want to
have feature sets that are sensitive to the embedding artefacts and invariant to
the image content.
We chose an image filtering approach to cover image estimation. Among the several alternative image filters, our empirical studies found that the Gaussian filter produces the best results. Three parameters must be determined to use this filter: the standard deviation of the Gaussian distribution (σ) and the filter extents in the horizontal and vertical directions (d_h and d_v, respectively). By trial and error, we determined that σ = 0.6, d_h = 3 and d_v = 3 give the best results.
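A sketch using SciPy's Gaussian filter. The 3 × 3 window is obtained by truncating the kernel at a one-pixel radius, and re-binarising the filtered image by thresholding at 0.5 is our own assumption (the thesis does not state how the filtered image is handled):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_cover(img, sigma=0.6):
    """Estimate the cover image by Gaussian low-pass filtering over a 3x3 window."""
    smooth = gaussian_filter(np.asarray(img, dtype=float), sigma=sigma,
                             truncate=1.0 / sigma)  # kernel radius of 1 pixel
    return (smooth >= 0.5).astype(int)              # assumed re-binarisation

def calibrated_features(img, phi):
    """Feature difference phi(I) - phi(I') used as the calibrated feature vector."""
    return phi(img) - phi(estimate_cover(img))
```

Smoothing removes isolated flipped pixels while leaving large uniform regions unchanged, which is exactly the behaviour the proposition in Equation (5.8) relies on.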
5.2.5 Final Feature Sets
It is very computationally expensive to use all elements in GLRL and GLCM ma-
trices as the feature elements. Therefore, we propose to simplify them by trans-
forming the two-dimensional GLRL and GLCM matrices into one-dimensional
histograms.
h_g^{GLRL} = Σ_{ℓ=1}^{L} r(g, ℓ | θ),   0 ≤ g ≤ G − 1,   (5.9)
where θ = 0° and 90°, and the rest of the notation is the same as in Equation (5.6). We observe that, within a GLRL matrix, there is a high concentration of frequencies near the short runs, which may be important. Hence, we propose extracting each of the first four short runs as a histogram, h_g^{sr_α}:

h_g^{sr_α} = r(g, α | θ),   0 ≤ g ≤ G − 1,   (5.10)

where θ = 0° and 90° and α = 1, 2, 3, 4 are the selected short runs.
The one-dimensional histogram of the GLCM matrix, h_g^{GLCM_η}, can be obtained in a similar manner to Equation (5.9) and is defined as follows:

h_g^{GLCM_η} = ( Σ_{g_1=0}^{G−1} o_η(g_1, g, d | θ) + Σ_{g_2=0}^{G−1} o_η(g, g_2, d | θ) ) / 2,   0 ≤ g ≤ G − 1,   (5.11)

where η = 1, 2, 3 and θ = 0°, 45°, 90° and 135°; d = 1 and the rest of the notation is the same as in Equation (5.7).
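The three histogram reductions amount to simple matrix marginals; in the sketch below, r and o stand for GLRL and GLCM matrices indexed by grey level:

```python
import numpy as np

def glrl_histogram(r):
    """Sum over run lengths (Equation (5.9)): one bin per grey level."""
    return r.sum(axis=1)

def short_run_histograms(r, alphas=(1, 2, 3, 4)):
    """One histogram per selected short run length alpha (Equation (5.10))."""
    return [r[:, a - 1] for a in alphas]

def glcm_histogram(o):
    """Average of the row and column marginals of a GLCM (Equation (5.11))."""
    return (o.sum(axis=0) + o.sum(axis=1)) / 2.0
```

Collapsing the two-dimensional matrices this way is what keeps the final feature dimensionality manageable.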
As noted before, multi-class steganalysis can be considered a multi-class classifica-
tion, so the extracted feature sets must be sensitive to embedding alterations—the
feature values should be very distinctive. The larger the differences across the dif-
ferent classes, the better the features. Hence, we apply the characteristic function,
CF to each of the histograms to achieve better discrimination. The characteristic
function can be computed by a discrete Fourier transform, as shown in Equation
(5.12).
CF_k = Σ_{n=0}^{N−1} h_n e^{−(2πi/N)kn},   0 ≤ k ≤ N − 1,   (5.12)

where N is the vector length, i is the imaginary unit and e^{−2πi/N} is a primitive Nth root of unity.
For each characteristic function (one per histogram), we compute the mean, variance, kurtosis and skewness. The exception is the characteristic functions of the four h_g^{GLCM_η} histograms (one per direction) in Equation (5.11): we first average these four characteristic functions, and then compute the mean, variance, kurtosis and skewness of the averaged function.
We include another four statistics for each of the computed GLCM matrices, as
discussed in Subsection 5.2.3. These four statistics can be defined as follows:
contrast = Σ_{g_1} Σ_{g_2} |g_1 − g_2|² o(g_1, g_2),   (5.13)

energy = Σ_{g_1} Σ_{g_2} o(g_1, g_2)²,   (5.14)

homogeneity = Σ_{g_1} Σ_{g_2} o(g_1, g_2) / (1 + |g_1 − g_2|),   (5.15)

correlation = Σ_{g_1} Σ_{g_2} (g_1 − μ_{g_1})(g_2 − μ_{g_2}) o(g_1, g_2) / (σ_{g_1} σ_{g_2}),   (5.16)

where μ_{g_1} and μ_{g_2} are the means of o(g_1, g_2), and σ_{g_1} and σ_{g_2} are the standard deviations of o(g_1, g_2).
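These four statistics (standard Haralick texture measures) can be computed from a GLCM as below; normalising the counts to relative frequencies first follows the definition of the GLCM in Subsection 5.2.3:

```python
import numpy as np

def glcm_statistics(o):
    """Contrast, energy, homogeneity and correlation (Equations (5.13)-(5.16))."""
    o = np.asarray(o, dtype=float)
    o = o / o.sum()                        # relative frequencies
    g1, g2 = np.indices(o.shape)           # grey level index grids
    contrast = np.sum(np.abs(g1 - g2) ** 2 * o)
    energy = np.sum(o ** 2)
    homogeneity = np.sum(o / (1.0 + np.abs(g1 - g2)))
    mu1, mu2 = np.sum(g1 * o), np.sum(g2 * o)
    s1 = np.sqrt(np.sum((g1 - mu1) ** 2 * o))
    s2 = np.sqrt(np.sum((g2 - mu2) ** 2 * o))
    correlation = np.sum((g1 - mu1) * (g2 - mu2) * o) / (s1 * s2)
    return contrast, energy, homogeneity, correlation
```

A purely diagonal GLCM is the limiting case: zero contrast, maximal homogeneity and correlation equal to one.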
We form a 100-dimensional feature space as summarised in Table 5.1.
² Averaged from the four directions.
Table 5.1: Respective feature sets and the total number of dimensions for each set

Feature Set     Number of Directions   Number of Matrices   CF        Mean, Variance, Kurtosis and Skewness   Total Dimensions
h_g^{GLRL}      2                      1                    applied   4                                       8
h_g^{sr_α}      2                      4                    applied   4                                       32
h_g^{GLCM_η}    1²                     3                    applied   4                                       12
contrast        4                      3                    –         –                                       12
energy          4                      3                    –         –                                       12
homogeneity     4                      3                    –         –                                       12
correlation     4                      3                    –         –                                       12
5.3 Multi-Class Classification
As stated in Section 5.2, the second stage of our proposed steganalysis is multi-
class classification. We have chosen the SVM as our multi-class classifier. We
start this section by explaining the general terminology of two-class SVM classifi-
cation and then show how to generalise the two-class classification into multi-class
classification using SVM.
SVM can be considered a classification technique that learns from examples. More precisely, we can train the SVM to recognise and assign labels (classes) based on a given data collection (using features). For example, we train the SVM to differentiate a cover image (class-1) from a stego image (class-2) by examining the features extracted from many instances of cover images and stego images.

The SVM finds the separating line and determines the cluster into which an unknown image falls. Finding the right separating line is crucial, and this is what the training accomplishes. In practice, the feature dimensionality is higher and we need a separating plane instead of a line; this is known as a separating hyperplane.
The goal of SVM is to find a separating hyperplane that effectively separates
classes. To do that, the SVM will try to maximise the margin of the separat-
ing hyperplane during training. Obtaining this maximum-margin hyperplane will
optimise the ability of the SVM to predict the class of an unknown object (image).
However, there are often non-separable datasets that cannot be separated by a
straight separating line or flat plane. The solution to this difficulty is to use a
Table 5.2: Example of majority-voting strategy for multi-class SVM

              class-1   class-2   class-3
SVM-a         0         1         0
SVM-b         1         0         0
SVM-c         0         1         0
Total Votes   1         2         0
kernel function. The kernel function is a mathematical routine that projects the features from a low-dimensional space to a higher-dimensional space. Note that the choice of kernel function affects the classification accuracy. For further reading on SVM, readers are referred to [80].
Although the nature of SVM is two-class classification, it is not hard to generalise
the SVM to handle multiple classes. Several approaches can be used, including
one-against-one, one-against-all and all-together. According to the recommenda-
tions given in [53], one-against-one provides the best and most efficient classifica-
tions.
Here, therefore, we use the one-against-one approach and discuss only it; for the other approaches and a detailed comparison, readers are referred to [53]. For a multi-class SVM based on the one-against-one approach with K classes, K(K − 1)/2 two-class SVMs are constructed. Each of these SVM classifiers is assigned to the classification of a distinct pair of classes, so no two classifiers share the same pair. After all two-class classifications are completed, a majority-voting strategy determines the final class of an object: the class receiving the most votes is taken as the correct class. If two or more classes tie on votes, one is chosen arbitrarily.
Consider the following example. Suppose we have class-1, class-2 and class-3. We
can construct three two-class SVMs—SVM-a classifying classes-1 and -2, SVM-b
classifying classes-1 and -3 and SVM-c classifying classes-2 and -3. Assume that,
given an image, each of the two-class SVM classification results can be obtained,
as tabulated in Table 5.2. From the table, the given image is identified as belonging to class-2 because it received the highest number of votes (typeset in bold).
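The one-against-one construction and the majority vote can be sketched with scikit-learn's `SVC` (the thesis uses the Matlab/LIBSVM implementation from [9] instead); the toy data below are hypothetical:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def train_one_against_one(X, y):
    """Train K(K-1)/2 two-class RBF SVMs, one per unordered pair of classes."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        clf = SVC(kernel="rbf")            # RBF kernel, as recommended in [53]
        clf.fit(X[mask], y[mask])
        models[(a, b)] = clf
    return models

def predict_majority_vote(models, x):
    """Each pairwise SVM casts one vote; the class with the most votes wins."""
    votes = {}
    for clf in models.values():
        winner = clf.predict(x.reshape(1, -1))[0]
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Three hypothetical, well-separated classes in a 2-D feature space:
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5],
              [10, 0], [10, 1], [11, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
models = train_one_against_one(X, y)
```

In practice, scikit-learn's `SVC` already performs this one-against-one voting internally for multi-class inputs; the explicit construction above mirrors the description in the text.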
Table 5.3: Summary of image databases

Database   Total Images   Resolution   Image Type
Textual    659            200 dpi      Textual
Mixture    659            200 dpi      Textual and Graphic
Scene      1338           72 dpi       Scene
5.4 Experimental Results
5.4.1 Experimental Setup
To cover a wider range of images, we constructed three image databases. The first database consists of 659 binary images used as cover images; all are textual documents with a white background and black foreground, at a resolution of 200 dpi and an image size of 800 × 800 pixels. The second database also consists of 659 binary cover images with the same properties, except that we added some graphics (i.e., cartoons, clipart and random shapes) at random positions in each textual document. For the third database, we constructed 1338 binary images from greyscale images using the Irfanview version 4.10 freeware. These images were converted from natural images; their resolution is 72 dpi and their image size is 512 × 384 pixels.
Overall, we constructed 2656 cover images. The image databases are summarised
in Table 5.3. For brevity, we will name the image databases textual, mixture and
scene databases, respectively.
As discussed in Section 5.1, we used five different steganographic techniques to generate the different types (classes) of stego images. Because each technique uses a different embedding algorithm, the steganographic capacities also vary significantly. Hence, to obtain a fair comparison, we use the absolute steganographic capacity, measured in bits per pixel (bpp). Since a binary image has only one bit per pixel, bpp can be read as the fraction of pixels used to carry message bits. For example, embedding at 0.01 bpp means that only one pixel in every 100 carries a message bit. 0.01 bpp is very small, which means there is little distortion in the image; the produced stego image is therefore relatively secure and harder for steganalysis to detect.
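The bpp arithmetic is straightforward; for instance, for the 800 × 800 textual images used here:

```python
# Number of message bits that a given absolute capacity allows:
total_pixels = 800 * 800
for bpp in (0.003, 0.006, 0.01):
    message_bits = round(bpp * total_pixels)
    print(bpp, message_bits)
```

So even the largest capacity tested here (0.01 bpp) touches at most one per cent of the pixels.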
Table 5.4: Summary of stego image databases

Database   Total Images   Number of Steganography   0.003 bpp   0.006 bpp   0.01 bpp   Total Stego Images
Textual    659            5                         1           1           1          9885
Mixture    659            5                         1           1           1          9885
Scene      1338           5                         –           –           1          6690
To verify the effectiveness of our proposed multi-class steganalysis, we constructed
stego images with capacities of 0.003, 0.006 and 0.01 bpp for each steganography
approach. Every stego image in the experiment is embedded with randomly generated message bits. In total, this gives a relatively large collection of 26460 stego images (as summarised in Table 5.4).
We extract the feature sets for each image in all the image databases mentioned above, using the feature extraction methods proposed in Section 5.2. These feature sets serve as the inputs to the multi-class SVM. As described in Section 5.3, we use the one-against-one approach to construct our multi-class steganalysis. We use the SVM implemented in [9], follow the recommendation in [53] to use the radial basis function (RBF) as the kernel function, and determine the optimum SVM parameters with the grid-search tool from [9]. We dedicated 80 per cent of each image database to training the classifiers; the remaining 20 per cent is used for testing. The prototype is implemented in Matlab R2008a.
5.4.2 Results Comparison
To simplify the presentation, we abbreviate the five steganographic methods as
PCT, TP, CTL, LWZ and WL for [82], [107], [10], [69] and [112], respectively.
Our multi-class steganalysis classification results are displayed in a table format
called a confusion matrix. In the confusion matrix, the first column consists of the
classes, which include one cover image class and five different classes of stego image
(i.e., each class of the stego image is produced by each of the five steganographic
methods discussed). The value within brackets beside each class indicates the
embedded capacity. For the cover image class, there is no embedding and we can
consider the embedded capacity as zero (0.0 bpp). The first row of the confusion
matrix indicates the class of a given image.
We separated the three databases into three confusion matrices—Tables 5.5, 5.6
and 5.7. To better illustrate the results, we typeset the desired results in bold.
In other words, the correct classification results are aligned along the diagonal
elements within each confusion matrix.
From the confusion matrices, we clearly see that the multi-class steganalysis gives
very promising results. Especially in Table 5.5, the detections are nearly perfect.
The results obtained for the mixture image database (Table 5.6) are accurate, although slightly less so than those in Table 5.5. The results for
the scene image database (Table 5.7) appear to be the least accurate; however,
the detection reliability is good and all the detection results show at least 80
per cent accuracy. Note that the type of cover image used affects the detection
accuracy, which means it is relatively easier to detect images with textual content
than images with natural scenes. This observation is supported by the detection
accuracy order (where the results in Table 5.5 are the best, followed by the results
in Table 5.6 and lastly the results in Table 5.7).
We attribute this phenomenon to the fact that the textual content in an image
has periodic patterns that are uniform and consistent. However, an image with
scene content has fewer fixed patterns and may appear more random.
It is also worthwhile mentioning that embedding a longer secret message produces
more distortion in an image. Hence, it is relatively easier to detect a stego image
with a longer embedded message (higher bpp) than with a shorter message (lower
bpp). This is seen by comparing the rows with 0.01 bpp to the rows with 0.003
bpp in the confusion matrices.
5.5 Conclusion
We proposed a multi-class steganalysis for binary images. Our proposed 60-dimensional
feature set, used in combination with the existing 40-dimensional feature set
extended from our previous work, effectively and accurately classified images
into the appropriate class: one cover image class and five classes of stego images
produced by different steganographic techniques. We employed the concept of cover image
estimation, which improved the classification. Experimental results showed that
our proposed method can detect stego images with low embedding capacities. Further, the
experimental results showed that a detection accuracy of at least 92 per cent can be achieved
with textual or a mixture of textual and graphic images. However, the accuracy
decreased slightly, to 80 per cent, for natural scene binary images.
Table 5.5: Confusion matrix of the multi-class steganalysis for the textual database
Classified as
WL(%) LWZ(%) CTL(%) TP(%) PCT(%) Cover(%)
Cover 0.00 0.00 0.00 0.00 0.00 100.00
PCT (0.003 bpp) 0.00 0.00 0.77 0.00 99.23 0.00
TP (0.003 bpp) 0.00 0.00 0.00 100.00 0.00 0.00
CTL (0.003 bpp) 0.00 0.00 100.00 0.00 0.00 0.00
LWZ (0.003 bpp) 0.00 99.23 0.77 0.00 0.00 0.00
WL (0.003 bpp) 96.15 0.77 0.77 1.54 0.00 0.77
PCT (0.006 bpp) 0.00 0.00 0.00 0.00 100.00 0.00
TP (0.006 bpp) 0.00 0.00 0.00 100.00 0.00 0.00
CTL (0.006 bpp) 0.00 0.00 100.00 0.00 0.00 0.00
LWZ (0.006 bpp) 0.00 99.23 0.77 0.00 0.00 0.00
WL (0.006 bpp) 98.46 0.00 0.77 0.77 0.00 0.00
PCT (0.01 bpp) 0.00 0.00 0.00 0.00 100.00 0.00
TP (0.01 bpp) 0.77 0.00 0.00 99.23 0.00 0.00
CTL (0.01 bpp) 0.00 0.00 100.00 0.00 0.00 0.00
LWZ (0.01 bpp) 0.00 100.00 0.00 0.00 0.00 0.00
WL (0.01 bpp) 99.23 0.00 0.77 0.00 0.00 0.00
Table 5.6: Confusion matrix of the multi-class steganalysis for the mixture database
Classified as
WL(%) LWZ(%) CTL(%) TP(%) PCT(%) Cover(%)
Cover 0.00 0.77 0.00 0.00 0.00 99.23
PCT (0.003 bpp) 0.00 0.00 2.31 0.77 96.92 0.00
TP (0.003 bpp) 2.31 1.54 0.00 96.15 0.00 0.00
CTL (0.003 bpp) 0.00 0.77 92.31 0.77 6.15 0.00
LWZ (0.003 bpp) 1.54 96.15 0.77 1.54 0.00 0.00
WL (0.003 bpp) 96.92 1.54 0.77 0.77 0.00 0.00
PCT (0.006 bpp) 0.00 0.00 2.31 0.00 97.69 0.00
TP (0.006 bpp) 0.00 0.77 0.00 99.23 0.00 0.00
CTL (0.006 bpp) 0.00 0.00 99.23 0.00 0.77 0.00
LWZ (0.006 bpp) 0.00 99.23 0.00 0.00 0.77 0.00
WL (0.006 bpp) 98.46 0.00 0.00 0.00 0.77 0.77
PCT (0.01 bpp) 0.00 0.00 1.54 0.00 98.46 0.00
TP (0.01 bpp) 0.00 0.00 0.00 100.00 0.00 0.00
CTL (0.01 bpp) 0.00 0.00 99.23 0.00 0.77 0.00
LWZ (0.01 bpp) 0.00 99.23 0.00 0.00 0.77 0.00
WL (0.01 bpp) 99.23 0.00 0.00 0.00 0.77 0.00
Table 5.7: Confusion matrix of the multi-class steganalysis for the scene database
Classified as
WL(%) LWZ(%) CTL(%) TP(%) PCT(%) Cover(%)
Cover 2.26 5.26 1.88 8.65 0.38 81.58
PCT (0.01 bpp) 0.76 0.38 4.51 0.00 93.99 0.38
TP (0.01 bpp) 11.65 3.01 0.38 80.08 0.00 4.89
CTL (0.01 bpp) 1.13 1.50 89.10 0.00 3.76 4.51
LWZ (0.01 bpp) 1.88 91.35 0.38 2.26 0.38 3.76
WL (0.01 bpp) 85.34 2.26 0.38 10.90 0.38 0.75
Chapter 6

Hidden Message Length Estimation
The field of information hiding has two facets. The first relates to the design
of efficient and secure data hiding and embedding methods. The second facet,
steganalysis, attempts to discover hidden data in a medium. Under ideal circum-
stances, an adversary who applies steganalysis wishes to extract the full hidden in-
formation. This task, however, may be very difficult or even impossible to achieve.
Thus, the adversary may start steganalysis with more realistic and modest goals.
These could be restricted to finding the length of hidden messages, identification
of places where bits of hidden information have been embedded, estimation of the
stegokey and classification of the embedding algorithms. Achieving some of these
goals enables the adversary to improve the steganalysis, making it more effective
and appropriate for the steganographic method used.
Most works published on steganalysis relate to methods that use colour or greyscale
images. Steganography that uses binary images has received relatively little atten-
tion. This can be partially attributed to the difficulty of applying the statistical
model used for colour and greyscale images and adapting it to the new environ-
ment. In spite of this difficulty, binary images are very popular and frequently used
to store textual documents, black and white pictures, signatures and engineering
drawings, to name a few.
Colour and greyscale images are characterised by a rich collection of various statis-
tical features that have been used to develop new steganalysis techniques. Unlike
colour and greyscale images, binary images have a rather modest statistical na-
ture. In general, it is difficult to convert steganalysis used for colour or greyscale
images to an attack on binary images. However, we have successfully adapted
the concepts used for colour or greyscale images and we propose a new collection
of statistical features that estimate the length of hidden message embedded in
a binary image. Consequently, we can decide whether a given image contains a
hidden message. More precisely, we can tell apart the cover images from the stego
images.
We must emphasise that our steganalysis is designed to attack the steganographic
technique developed in [69]. This means that our analysis is a type of targeted
steganalysis. In this work, we define the length of embedded message as the ratio
between the number of bits of the embedded message and the maximum number
of bits that can be embedded in a given binary image. Note that we use the
terms message length, embedded message length and hidden message length as
synonyms.
The organisation of this chapter is as follows. In the next section, we give a brief
summary of the steganographic method we analyse. The technique of analysis
we apply is given in Section 6.2. Section 6.3 presents the results of the analysis.
Section 6.4 concludes the chapter.
6.1 Boundary Pixel Steganography
We briefly introduce the steganographic method under analysis in this section.
Note that this steganographic method [69] is described in detail in Subsection
3.1.1 and is summarised here.
The steganography developed in [69] is a variant of boundary pixel steganography.
This method uses a binary image as the medium for secret message bits. A set of
rules is proposed to determine the data carrying eligibility of the boundary pixels.
This plays an important role in ensuring that embedding produces minimum
distortion and in obtaining error-free message extraction. In addition, the embedding
algorithm generates no isolated pixels. This method also employs a PRNG to
produce a random selection path for embedding.
As the embedding algorithm modifies only boundary pixels, the visual distortions
are minimal and there is no pepper-and-salt like noise. However, if we take a close
look at an image with an embedded message, we can observe small pixel-wide
notches and protrusions near the boundary pixels.
We use these small distortions to launch an attack on the steganographic algo-
rithm. In our attack, we first detect the existence of a hidden message and then
estimate its length.
6.2 Proposed Method
We want to propose a steganalysis technique that can counteract the steganog-
raphy developed in [69]. However, given an image, we do not know whether the
image is a cover or a stego image without a priori knowledge. What we can do is
to extract some useful characteristics from the given image. These characteristics
may reveal an estimate of the length of the embedded hidden message (i.e., if zero
per cent is estimated, the given image is a cover image; if a certain nonzero
percentage is estimated, it is a stego image).
We first define a statistic by measuring the number of notches and protrusions
in the image. This statistic will reflect the degree of image distortion. Then we
define a numerical value associated with this statistic. Finally, we show that this
numerical value is approximately proportional to the size of the embedded mes-
sage, which enables us to compute an estimate of the embedded message length.
6.2.1 512-Pattern Histogram as the Distinguishing Statistic
For any boundary pixel (as shown in Figure 6.1), we can form a certain pixel
pattern together with its eight neighbouring pixels. Examples of the pattern are
shown in Figure 6.2 (the shaded box represents a pixel value of zero and the white
box represents a pixel value of one). Altogether, 512 patterns can be formed by
the different combinations of black and white pixels in a block of nine pixels;
however, two patterns cannot be used, because they do not contain any boundary
pixels. Clearly, these patterns are formed by either all black or all white pixels.
To simplify our considerations, we assume that there are 512 patterns.
The 512-pattern histogram H(J) tabulates the frequency of occurrence of each
pattern in the given image J . The frequency of occurrence hi for the ith pattern
Figure 6.1: Illustration of a boundary pixel in a magnified view of some portion of the ‘n’ character
Figure 6.2: Examples of the patterns formed by a single boundary pixel (denoted by b) and its eight neighbouring pixels (denoted by n) from a binary image
is given by

hi = ∑_{k=1}^{M} δ(i, p(k)),    (6.1)
where p(k) denotes the kth pattern in the given image, M is the total number of
patterns in the given image and δ is the Kronecker delta function (δ(u, v) = 1 if u = v
and 0 otherwise). For brevity, we let H represent H(J) and have
H = {hi | 1 ≤ i ≤ 512}. (6.2)
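To make the construction concrete, the following is a minimal Python sketch of the 512-pattern histogram (the thesis prototype was written in MATLAB; this translation and the function name are ours). Each 3×3 neighbourhood is encoded as a 9-bit pattern index. Note that the full method builds the histogram only from patterns centred on boundary pixels; for brevity, this sketch counts every interior pixel.

```python
def pattern_histogram(img):
    """Return h, where h[i] counts occurrences of the ith 3x3 pattern.

    img is a binary image given as a list of rows of 0/1 pixel values.
    Each 3x3 neighbourhood is read row by row into a 9-bit index,
    so there are 2**9 = 512 possible patterns (Equations 6.1 and 6.2).
    """
    h = [0] * 512
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            idx = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    idx = (idx << 1) | img[y + dy][x + dx]
            h[idx] += 1
    return h
```

An all-zero image contributes only to bin 0, and an all-one image only to bin 511, matching the two degenerate patterns the chapter notes contain no boundary pixel.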
In this 512-pattern histogram, a cover image has some high-frequency bins (corre-
sponding to the pattern types) and some other bins with low frequency. However,
these bins tend to be flattened1 after embedding (see Figure 6.3 for example).
This happens because, during embedding, some image pixels are flipped to carry
message bits, which disturbs the inter-pixel correlation. This is reflected in the
pattern changes.
The longer the embedded message, the flatter the 512-pattern histogram. From
¹ By flattened we mean that some of the local maxima in the histogram will decrease and some of the local minima will increase.
Figure 6.3: Some of the bins from the 512-pattern histogram are selected to illustrate the comparison between a cover image and a stego image (embedded with 80 per cent of the message length). Note that some of the bins are flattened in the stego image
this observation, we propose to compute the histogram difference to capture the
“flatness” of the 512-pattern histogram. The histogram difference is the bin-wise
absolute difference between the 512-pattern histograms for two images. The first
histogram is from the given binary image and the second is from the same image
after it has been re-embedded with a random message of the maximum length (100 per
cent). The re-embedding operation uses the steganographic technique described
in Section 6.1.
The following equation defines the second histogram:
H′ = {h′i | 1 ≤ i ≤ 512}, (6.3)
where h′i is the corresponding frequency of occurrence for the ith pattern in the
same image that has been re-embedded with 100 per cent of the length of a random
message. Then the histogram difference can be written as follows:
HD = {|hi − h′i| | 1 ≤ i ≤ 512}, (6.4)
where | · | represents the absolute value.
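In code, the histogram difference of Equation (6.4) is a single bin-wise operation. This sketch (the function name is ours) assumes h and h_prime are the two 512-entry histograms of Equations (6.2) and (6.3):

```python
def histogram_difference(h, h_prime):
    """Bin-wise absolute difference of two pattern histograms (Equation 6.4)."""
    return [abs(a - b) for a, b in zip(h, h_prime)]
```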
We choose to calculate the histogram difference because using the 512-pattern
histogram for the given image directly is insufficient. The 512-pattern histogram
for a given image does not fully represent the embedding artefacts and may be
biased by the image content. It would be easier if we had both the cover and
stego versions of the images—the differences between these two images would be
only from the embedded message. However, under normal circumstances we do
not have both versions. Therefore, it is useful to work backward by determining
how many remaining boundary pixels can be used for embedding, which gives an
estimate of the number of message bits that have (or have not) been embedded. This
explains why we opted to use re-embedding to obtain the histogram difference.
We use Figure 6.4 to illustrate our considerations. Figure 6.4(a) shows two binary
images that are slightly different but their pattern histograms (Figure 6.4(b)) are
entirely different. However, as shown in Figure 6.4(c), the histogram differences for
the respective bins of the two binary images are (almost) identical. This argument
supports the use of the histogram difference.
6.2.2 Matrix Right Division
To allow the histogram difference to measure the embedded message length, we
propose using matrix right division. Matrix right division can be considered a
transformation of a histogram to a numerical value (one-dimensional metric). Al-
ternatively, matrix right division can be seen as an attempt to solve an appropriate
system of linear equations.
The matrix right division used is from the standard MATLAB R2007b built-in
matrix division function and defined as follows:
If A is a non-singular and square matrix and B is a row vector, then
x = B/A is the solution to the system of linear equations x × A = B
computed by Gaussian elimination with partial pivoting.
If x×A = B is an over-determined system of linear equations, then x =
B/A is the solution in the least squares sense of the over-determined
system.
In general, when A is non-singular and square, the system has an exact solution
given by x = B × A−1 where A−1 is the inverse of matrix A. Hence, the solution
can be computed by multiplying the row vector B with the inverse of matrix A. It
is also defined as multiplication with the pseudo inverse (refer to [58, 45] for details
of the pseudo inverse). However, a solution based on a matrix inverse is inefficient
for practical applications and may cause large numerical errors. A better approach
is to use matrix division.
For an over-determined system of equations, it is impossible to compute the inverse
(a)
1 2 3 4 5 6 7 8 9 100
50
100
150
200
250
300
350
400
450
Selected Bins from the 512 Patterns Histogram
Fre
quen
cy
Image # 1Image # 2
(b)
1 2 3 4 5 6 7 8 9 100
100
200
300
400
500
600
700
800
Selected Bins from the 512 Histogram Difference
Fre
quen
cy
Image # 1Image # 2
(c)
Figure 6.4: (a) Two sample binary images. (b) Some bins from the 512-pattern histogram of the binary images shown in (a). (c) The respective bins in the histogram difference of the binary images shown in (a).
of matrix A. However, a solution can still be computed by minimising the
Euclidean length (norm) of the residual r = x × A − B, that is, the sum of the
squares of its elements. This is what matrix right division does, yielding a
solution in the least squares sense. In our application of matrix
right division, matrix A is actually a row vector of the same length as B.
It is reasonable to consider a pattern histogram as a row vector; since the his-
togram difference is the bin-wise absolute difference between two 512-pattern his-
tograms, the histogram difference can be considered a row vector as well. Thus,
we can perform matrix right division between the histogram difference and the
512-pattern histogram of a given binary image, as in Equation (6.5). We call the
resulting numerical value a histogram quotient hq. However, the division is not an
element-wise division.
hq = HD/H. (6.5)
We illustrate matrix right division in the following examples:

Example 1: x = [2 4 8], y = [1 2 4], x/y = 2
Example 2: x = [2 4 12 9 18 6], y = [1 2 6 3 6 2], x/y = 2.5444
In Example 1, every element of x is twice the corresponding element of y, so
the quotient of the matrix right division is two. In Example 2, the first three
elements of x are twice the first three elements of y, while the last three
elements of x are three times the last three elements of y. Thus, the quotient
of the matrix right division is 2.5444, a value between two and three.
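For the row-vector case used in this chapter, MATLAB's right division B/A reduces to a scalar least-squares fit, x = (B·Aᵀ)/(A·Aᵀ). The following Python sketch (ours, covering only this special case, not MATLAB's general algorithm) reproduces the two examples above:

```python
def right_divide(b, a):
    """Least-squares solution x of x * a = b for row vectors a and b.

    Minimising the sum of squares of (x * a_i - b_i) over scalar x gives
    the projection x = (b . a) / (a . a).
    """
    num = sum(bi * ai for bi, ai in zip(b, a))  # b . a
    den = sum(ai * ai for ai in a)              # a . a
    return num / den
```

With the vectors of Example 1, `right_divide([2, 4, 8], [1, 2, 4])` returns 2.0, and with Example 2 it returns approximately 2.5444, matching the MATLAB quotients.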
6.2.3 Message Length Estimation
In this subsection, we select a binary image to demonstrate the consistent be-
haviour of the histogram quotient. We re-embedded it in five per cent increments
of the message length and observed the response of the histogram quotient.
As shown in Figure 6.5, the respective histogram quotient increases almost lin-
early until a certain point, beyond which a further increase in the length of the re-embedded
message does not increase the histogram quotient. This shows that the desired
Figure 6.5: Histogram quotient with a five per cent increment in the re-embedded message length
consistency can be obtained by using a histogram quotient based on the histogram
difference. Therefore, as discussed in Subsection 6.2.1, finding the corresponding
difference between the two circles shown in Figure 6.5 proves crucial and provides
us with a strong indicator for estimating the embedded message length.
In short, our proposed method first identifies all boundary pixels in a given binary
image. The boundary pixel used here is defined as a pixel that has at least one
neighbouring pixel (among the four neighbouring pixels) with a different pixel
value. Then the 512-pattern histogram will be obtained from these boundary
pixels. Based on this pattern histogram, the histogram difference is computed and
the histogram quotient is calculated (denoted hq in Equation (6.5)). Finally, we
employ linear interpolation to obtain an approximate constant of proportionality
c such that hq ≈ c × ℓ, where ℓ is the message length. Then, for any particular
value of hq, we can compute an estimate of ℓ using ℓ ≈ hq/c.
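The calibration step can be sketched as follows. The function names are ours, and the calibration pairs in the usage example are illustrative values, not measurements from this thesis.

```python
def fit_c(lengths, quotients):
    """Least-squares slope through the origin for hq ~= c * length.

    Minimises sum((hq - c * length)^2) over the calibration pairs.
    """
    num = sum(l * q for l, q in zip(lengths, quotients))
    den = sum(l * l for l in lengths)
    return num / den

def estimate_length(hq, c):
    """Invert hq ~= c * length to estimate the embedded message length."""
    return hq / c
```

For instance, with illustrative calibration pairs (10, 0.05), (20, 0.10) and (30, 0.15), the fitted constant is c = 0.005, and an observed quotient of 0.2 yields an estimated length of 40 per cent.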
6.3 Experimental Results
6.3.1 Experimental Setup
The experimental settings are described below:
❐ The embedding algorithm used to create the stego images is the steganog-
raphy proposed in [69].
❐ The total embeddable pixels per image produced by this embedding algo-
rithm is about 25 per cent of the total boundary pixels.
❐ The maximum message length (100 per cent length) is defined as the total
number of embeddable pixels per image.

Figure 6.6: Estimated length of hidden messages for all binary images
❐ Eight sets of stego images (i.e., 10, 20, 30, 40, 50, 60, 70 and 80 per cent)
are created from 659 binary cover images.
❐ The cover images are all textual documents with a white background and
black foreground.
❐ The resolution of all binary images is 200 dpi, with an image size of 800 × 800 pixels.
❐ The prototype is implemented in MATLAB R2008a.
6.3.2 Results of the Estimation
From the 5931 images (the mixture of cover and stego images), we estimate the
length of the embedded message using our proposed method and compare it with
the actual embedded lengths of 0, 10, 20, 30, 40, 50, 60, 70 and 80 per cent.
Zero per cent represents a cover image.
The estimation results are shown in Figure 6.6. The estimated lengths are very
close to the actual lengths. The estimates for long embedded messages, such
as 80 per cent, are not as close as the others, although they remain accurate.
At such a high percentage, some stego images are quite distorted
and the pixels exhibit a high degree of randomness. We believe this randomness
causes slight instability of our proposed method; however, this phenomenon does
not pose a serious problem because we can easily spot the embedding artefacts
in such a highly distorted stego image (Figure 6.7 shows a highly distorted stego
image).
Table 6.1 summarises the mean and standard deviation of all the estimated message
lengths according to the actual embedded message lengths. The average value for
Figure 6.7: Example of a highly distorted stego image embedded with 80 per cent of the message
Table 6.1: Mean and standard deviation of the estimation
Length (%) Mean Standard Deviation
0 −0.0277 1.8761
10 9.8540 1.8034
20 19.8438 1.5966
30 29.9271 1.4337
40 39.9608 1.3445
50 50.0210 1.2869
60 60.0763 1.2747
70 70.3436 1.7666
80 79.9598 2.0547
each estimated length is very close to the actual length. The standard deviation is
also very small—only about one or two per cent. This implies that the estimated
lengths do not deviate much from the actual lengths.
The estimation errors are displayed in Figure 6.8. The estimation error for each
binary image is computed as the difference between the estimated and the actual
embedded message length in percentage terms. The estimation errors are
relatively low and concentrated around 0.00 per cent. The largest errors, of
about 6.00 per cent, occur only occasionally; a single outlier has an error of
7.43 per cent.
Figure 6.8: Estimation error of hidden message length for all binary images
6.4 Conclusion
The method proposed in this work can detect the steganography developed in
[69] and estimate the length of the embedded message. In this estimation, we
first build the 512-pattern histogram from a binary image as the distinguishing
statistic. From this 512-pattern histogram, we compute the histogram difference
to capture the changes caused by the embedding operation. Performing matrix
right division creates a histogram quotient. Based on this histogram quotient,
the length of the embedded message is estimated. We used a large image
database, consisting of 5931 binary images (one set of cover images and eight
sets of stego images), to test the proposed method. From the experimental
results, we conclude that our proposed method effectively estimated the hidden
message length with low estimation error.
We observe that using only a set of rules to select suitable data-carrying
pixels is insufficient, because the notches and protrusions produced by
embedding can still be exploited to mount an attack. To alleviate this
shortcoming in the steganography, we suggest incorporating an adaptive pixel
selection mechanism to identify suitable data-carrying pixels.
Chapter 7

Steganographic Payload Location Identification
In general, as discussed in Section 3.2, the task of steganalysis involves several dif-
ferent levels of analysis (also considered different forms of attacks). They are the
determination of the existence of a hidden message, classification of the stegano-
graphic methods, finding the length of hidden message, identification of locations
where bits of hidden message have been embedded and retrieval of the stegokey.
Compared to other forms of attack, the identification of locations that carry stego
pixels and retrieval of the stegokey have received relatively less attention in the
literature. These attacks require extracting extra and more information about
the steganography method used. Consequently, they are much more difficult than
attacks that extract only partial information. For instance, the estimation of
the stegokey in the attack given in [42] requires the identification of the hidden
message length.
In this chapter, we develop an attack that identifies the steganographic payload
locations in binary images, where bit-replacement steganography is used. More
precisely, our proposed method will find and locate the pixels in the image used
to carry secret message bits. Note that steganographic payload, hidden data and
message are used interchangeably throughout the chapter.
The remainder of the chapter is structured as follows. In the next section, we
provide the related background. The motivation for this research and the main research
challenges are discussed in Section 7.2. The attack is discussed in detail in Sec-
tion 7.3. Section 7.4 gives experimental results for the attack and the chapter is
concluded in Section 7.5.
7.1 Background
Some attacks, such as blind steganalysis or stego message length estimation, can
determine if a given image is a stego or cover image. Assume that we have already
determined that the image contains a steganographic payload. The next and quite
natural step is to identify the location of the hidden message.
Because of the invasive nature of steganography, the embedding operation is likely
to disturb the inter-pixel correlations. The embedding operation creates pixels
with high energy, as defined by Davidson and Paul [27]. They developed a method
to measure the energy caused by the embedding disturbance and were able to
identify pixels with high energy that are likely to carry the hidden message bits.
However, their method suffers from high false negatives or missed detections when
some message bits do not change the pixel energy. This occurs when the parity of
the hidden message bits is the same as the parity of the image pixels.
Kong et al. in [68] used the coherence of hue in a colour image to identify a subset
of pixels used to carry the hidden message bits. They observed that, in cover
images (without a hidden message), the coherence of hue varies slowly and tends
to be constant in a small neighbourhood of pixels. This is no longer true when a
hidden message is embedded. Thus, when the hue of a region under examination
exceeds a certain threshold, there is good reason to suspect that it contains bits
of the hidden message. Unfortunately, this analysis only works for steganography
with sequential embedding. If the embedding is random, then this attack fails.
In [62], Ker showed how to use the residual of the weighted stego image to identify
the location of bits of the hidden message. The residual is the pixel-wise difference
between the stego image and the estimated cover image. This analysis requires a
large number of stego images. The only concern is whether it is possible to obtain
multiple different stego images with the payload embedded in the same locations.
Nevertheless, this is plausible when the same stegokey is reused across different
stego images. In a separate paper, the author applied a similar concept to attack
LSB-matching steganography, where it also proved effective [64].
7.2 Motivation and Challenges
The promising results obtained by the Ker method in analysis of greyscale im-
age steganography motivated us to take a closer look at the analysis and extend
the concept to binary image steganography. The Ker method is superior to the
methods developed in [27, 68]. More importantly, the method can be applied to
both sequential and random embedding and has a low false negative rate.
These two advantages are very important. For example, the problem of false
negatives gets worse and becomes critical when a message is encrypted before
embedding. The bits of an encrypted message behave like truly random ones
with uniform probability distribution for zeros and ones. This implies that half
of the time the message bits will match the pixel LSBs, so no change takes place.
Consequently, nothing can be detected. It is well known that sequential embedding
is insecure as it can be easily detected using a visual inspection. Most current
steganographic techniques employ random embedding. Thus, steganalysis is of
limited use if it can only attack sequential embedding.
Let us introduce the ideas used in our work. We follow the conventions used by Ker
in his work [62]. Consider a stego image given as a sequence of pixels S = {s1, s2, · · · , sn}
and an estimated cover image C = {c1, c2, · · · , cn}. C can be estimated from
the stego image by taking the average of the four connected neighbouring pixels
(linear filtering). n is the total number of pixels. Now we can define a vector of
residuals
ri = (si − s̄i)(si − ci), (7.1)

where s̄i denotes the ith pixel of the stego image with its LSB flipped.
Assume that we have N multiple stego images. We can define the residual of the
ith pixel in the jth stego image as
rij = (sij − s̄ij)(sij − cij). (7.2)
The mean of the residual of the ith pixel can be computed as follows:
ri· = (1/N) ∑_{j=1}^{N} rij. (7.3)
With a sufficient number of stego images, this mean of residuals will provide
strong evidence that can be used to separate the stego-bearing pixels from non-
stego-bearing pixels.
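Equations (7.1) to (7.3) can be sketched directly in Python (the function names are ours). Here images are assumed to arrive as flat lists of integer pixel values, with the cover estimates (possibly non-integer, e.g. filtered) already computed.

```python
def residual(s, c):
    """r_i = (s_i - sbar_i)(s_i - c_i), Equation (7.1).

    sbar_i is s_i with its least significant bit flipped (s_i ^ 1),
    which for a binary image simply inverts the pixel.
    """
    return [(si - (si ^ 1)) * (si - ci) for si, ci in zip(s, c)]

def mean_residual(stego_images, cover_estimates):
    """Pixel-wise mean of residuals over N stego images, Equation (7.3)."""
    n = len(stego_images[0])
    acc = [0.0] * n
    for s, c in zip(stego_images, cover_estimates):
        for i, ri in enumerate(residual(s, c)):
            acc[i] += ri
    return [a / len(stego_images) for a in acc]
```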
Although there are some similarities in the embedding algorithms for greyscale
and binary image steganography, the attack on different image types may require
a different approach. Unlike a greyscale image, a binary image has a rather modest
statistical nature. This makes it difficult to apply the existing method directly. In
addition, it is clear that the Ker technique offers a high accuracy; however, there
is always a trade-off between the required number of stego images and detection
accuracy.
7.3 Proposed Stego-Bearing Pixel Location Identification
In this section, we discuss our proposed method for attacking binary stego im-
ages embedded using bit-replacement steganography. Let us first introduce bit-
replacement steganography for a binary image. Given a cover image C =
{c1, c2, · · · , cn} and a stego image S = {s1, s2, · · · , sn} that contains a hidden
message. Since a binary image has only two intensities (black and white), the
embedding operation involves simply flipping the one-bit pixel (i.e. changing the
zeros to ones and vice versa) when the message bit does not match that of the
image pixel. Assume that the hidden message bits are embedded in a randomly
permuted order. The random permutation is obtained from a
PRNG that is controlled by a stegokey.
Clearly, it is possible for an adversary to gain access to multiple stego images
that reuse the same stegokey for a batch of covert communications. Although the
content of the secret message and the cover image used may differ every time,
the steganographic payload locations will be the same because they use the same
stegokey.
We begin by adapting the method proposed by Ker in [62]. This includes finding
the vector of residuals and employing multiple stego images to obtain the mean
of the residuals. However, due to the limited statistical characteristics of a binary image,
we need a different approach to estimating the cover image. We choose an image
smoothing approach to achieve binary cover image estimation. Several alternatives
exist; our empirical studies found that a Gaussian filter produces the best results.
The Gaussian filter is defined as follows:
g(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)), (7.4)

where x and y are the horizontal and vertical coordinates within the filter
window, respectively, and σ is the standard deviation of the Gaussian
distribution.
Once we estimate the cover image, we can compute the vector of residuals for each
stego image using Equation (7.1). We compute the mean of residuals, as shown
in Equation (7.3), by employing multiple stego images. The identification of pixel
locations containing a steganographic payload can be carried out by choosing the
M pixels with the highest mean residual ri·. According to the author in [62], M
can be calculated as M = 2n·r··, where r·· = (1/(nN)) ∑_{i=1}^{n} ∑_{j=1}^{N} rij.
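A sketch of this selection rule (the function name is ours), assuming the mean residuals have already been computed:

```python
def select_top_m(mean_res):
    """Select the M pixel locations with the highest mean residual.

    M is estimated as 2 * n * (grand mean of the residuals), following
    the rule quoted from [62]; the result is a set of pixel indices.
    """
    n = len(mean_res)
    grand_mean = sum(mean_res) / n
    m = min(n, round(2 * n * grand_mean))
    order = sorted(range(n), key=lambda i: mean_res[i], reverse=True)
    return set(order[:m])
```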
However, the estimation of M for binary images does not always give as accurate an
estimate as for greyscale images. This happens because of the modest statistical
characteristics of a binary image and becomes more severe when N is small.
To overcome this problem, we propose incorporating the entropy measurement.
Entropy is defined as follows:
E(I) = − ∑_{i=1}^{K} pi log2 pi, (7.5)
where I can be the given stego image S or the estimated cover image C, and pi
is the probability of occurrence of the ith pixel intensity, out of K possible
intensities. However, computing entropy for the entire image at once will give us
a global feature. Instead, we use Equation (7.5) to compute the local entropy of
the 3×3 neighbourhood around the ith pixel. We can obtain the local entropy for
every pixel in both the stego image and its corresponding estimated cover image.
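The local entropy of Equation (7.5), restricted to a 3×3 neighbourhood of a binary image (K = 2), can be sketched as follows (the function name is ours):

```python
from math import log2

def local_entropy(img, y, x):
    """Shannon entropy of the 3x3 neighbourhood around pixel (y, x).

    For a binary image, only two intensities occur, so the entropy
    depends only on the fraction of ones in the nine-pixel window.
    """
    ones = sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    p1 = ones / 9.0
    e = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0.0:
            e -= p * log2(p)
    return e
```

A uniform window gives zero entropy, while a mixed window approaches the one-bit maximum, which is what makes the measure sensitive to embedding-induced randomness.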
Entropy has been widely used as a statistical measure of randomness to characterise
the content of an image. Note that the embedding operation alters the image
content, which directly changes the degree of randomness in the image. Thus,
incorporating entropy appears to be an appropriate way of capturing the embedding
artefact.
The next question would be, how do we combine the mean residual with the local
entropy? Firstly, we find the local entropy difference,
d_i = ε_i^s − ε_i^c,   (7.6)

where ε_i^s and ε_i^c are the local entropies of the ith pixel in the stego image and its
estimated cover image, respectively. Secondly, we employ multiple stego images
to compute the mean of the local entropy differences d̄_{i·} by replacing r_{ij} in Equation
(7.3) with d_{ij}; d_{ij} is obtained in a manner similar to r_{ij} in Equation (7.2).
Table 7.1: Summary of image databases
Database Total Images Resolution Image Size
Database A 5867 300 dpi 400 × 400
Database B 1338 96 dpi 512 × 384
Database C 2636 200 dpi 400 × 400
Thirdly, we construct two pixel subsets, S_r ⊆ S and S_d ⊆ S, by evaluating the
mean of the residuals and the mean of the local entropy differences, respectively.
Subset S_r contains the pixels with the highest mean residuals; we take 10 per cent
more than M pixels, where the 10 per cent margin is determined empirically and
aims to obtain slightly more samples. For the second subset, S_d, we select those
pixels whose mean local entropy difference exceeds a threshold τ. Finally, the
pixels containing the steganographic payload are identified as S_r ∩ S_d.
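The three-step selection can be sketched as follows, assuming the mean residuals and mean local entropy differences have already been computed; the helper name, the toy data and the rounding of the 10 per cent margin are all illustrative.

```python
def locate_payload(mean_residual, mean_entropy_diff, M, tau=0.05):
    """Identify candidate stego-bearing pixel indices as the intersection
    of S_r (top mean residuals) and S_d (large mean entropy differences)."""
    n_r = min(len(mean_residual), int(round(1.1 * M)))   # 10% more than M pixels
    order = sorted(range(len(mean_residual)),
                   key=lambda i: mean_residual[i], reverse=True)
    s_r = set(order[:n_r])                               # highest mean residuals
    s_d = {i for i, d in enumerate(mean_entropy_diff) if d > tau}
    return sorted(s_r & s_d)

# Toy data: pixels 1 and 3 have both a high residual and a large entropy change.
residuals = [0.01, 0.40, 0.02, 0.35, 0.05]
ent_diffs = [0.00, 0.20, 0.01, 0.15, 0.30]
located = locate_payload(residuals, ent_diffs, M=2)
```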
7.4 Experimental Results
7.4.1 Experimental Setup
To cover a diverse set of images, we constructed three image databases. The first
image database consists of 5867 binary cover images. These images are cropped
from a set of 4288 × 2848 pixel RAW images captured by a Nikon D90 digital
camera. Then we use the conversion software supplied by the camera manufacturer
to convert the images to TIFF. In the second image database, we constructed 1338
binary images from the image database used in [98]. The third image database
consists of 2636 binary cover images.
The images in the first and second databases are natural scene images. The images
in the third database are textual document images. The cropping operation and
greyscale to binary conversion are carried out with Irfanview version 4.10 freeware.
Overall, we constructed 9841 cover images and the databases are summarised in
Table 7.1. For brevity, we call these Database A, B and C.
We use bit-replacement steganography (as discussed in the first paragraph of
Section 7.3) to generate stego images from the three image databases for different
message lengths. We generated three message lengths (0.01, 0.05 and 0.10 bpp)
for each database. Since a binary image has only one bit per pixel, we can think of
bpp as the average number of message bits embedded per image pixel. For example,
0.01 bpp embedding means that, for every 100 pixels, only one pixel is used to carry
message bits. We employ a uniform distribution of random message bits for the
experiments.
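A sketch of this bit-replacement embedding, with a seeded pseudo-random choice of locations standing in for the sender's key; the exact key handling of Section 7.3 is not reproduced, and the function name is illustrative.

```python
import random

def embed_bit_replacement(pixels, message_bits, seed=42):
    """Replace the values of randomly chosen pixels with the message bits.

    `pixels` is a flat list of binary pixel values; the embedding rate in
    bpp is len(message_bits) / len(pixels).
    """
    rng = random.Random(seed)
    locations = rng.sample(range(len(pixels)), len(message_bits))
    stego = list(pixels)
    for loc, bit in zip(locations, message_bits):
        stego[loc] = bit
    return stego, sorted(locations)

cover = [0] * 100                       # 100-pixel binary "image"
message = [1]                           # 1 bit over 100 pixels -> 0.01 bpp
stego, locs = embed_bit_replacement(cover, message)
```

Note that a replaced pixel whose value already matches the message bit is left unchanged in effect, which is exactly why single-image localisation is hard (Section 7.2).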
For the parameters of our proposed method, we set x = 3, y = 3 (abbreviated as
3× 3) and σ = 0.6 for the Gaussian filter. We also tried several different window
sizes for the Gaussian filter and the results are given in Figure 7.1. Window sizes
of 3×3 and 5×5 give the optimum performance. Since the 3×3 and 5×5 windows
give about the same accuracy, we chose 3×3 to reduce the demand for
computational resources. The threshold τ is set to 0.05.
[Figure 7.1 plots the true positive rate, TP (%), against the number of images, N, for window sizes 2×2 to 6×6.]

Figure 7.1: Identification results of 0.05 bpp for different window sizes: (a) Database A (b) Database B (c) Database C
7.4.2 Results Comparison
To evaluate the accuracy of the identification, we compared the estimated locations
with the actual set of stego-bearing pixel locations. For each message length, we
show the accuracy of the identification in terms of true positives (abbreviated as
TP), false positives (FP) and false negatives (FN).
We divided the results into three tables, one for each image database (Tables 7.2,
7.3 and 7.4, respectively). The identification accuracy in each table is given as a
percentage.
The tables show clearly that the proposed method gives very promising results.
Especially in Table 7.2, the identification is nearly perfect for N = 100 and perfect
for N greater than 300 images. Similarly, reliable accuracy is also shown in Table
Table 7.2: The accuracy of the stego-bearing pixel location identification for image Database A (* indicates the message length)
# of images, N 100 200 300 · · · > 320
TP (* 0.01bpp) 100 100 100 · · · 100
FP (* 0.01bpp) 0.00 0.00 0.00 · · · 0.00
FN (* 0.01bpp) 0.00 0.00 0.00 · · · 0.00
TP (* 0.05bpp) 99.95 99.99 100 · · · 100
FP (* 0.05bpp) 0.00 0.00 0.00 · · · 0.00
FN (* 0.05bpp) 0.05 0.01 0.00 · · · 0.00
TP (* 0.10bpp) 99.80 99.96 99.99 · · · 100
FP (* 0.10bpp) 0.04 0.00 0.00 · · · 0.00
FN (* 0.10bpp) 0.16 0.04 0.01 · · · 0.00
Table 7.3: The accuracy of the stego-bearing pixel location identification for image Database B (* indicates the message length)
# of images, N 100 200 300 · · · > 820
TP (* 0.01bpp) 99.90 100 100 · · · 100
FP (* 0.01bpp) 0.05 0 0 · · · 0
FN (* 0.01bpp) 0.05 0 0 · · · 0
TP (* 0.05bpp) 99.77 100 100 · · · 100
FP (* 0.05bpp) 0.11 0.00 0.00 · · · 0.00
FN (* 0.05bpp) 0.12 0.00 0.00 · · · 0.00
TP (* 0.10bpp) 99.62 99.86 99.92 · · · 99.98
FP (* 0.10bpp) 0.19 0.07 0.04 · · · 0.01
FN (* 0.10bpp) 0.19 0.07 0.04 · · · 0.01
Table 7.4: The accuracy of the stego-bearing pixel location identification for image Database C (* indicates the message length)
# of images, N 100 200 300 · · · > 2600
TP (* 0.01bpp) 84.48 90.33 92.59 · · · 99.31
FP (* 0.01bpp) 1.95 0.94 0.50 · · · 0.06
FN (* 0.01bpp) 13.57 8.73 6.91 · · · 0.63
TP (* 0.05bpp) 86.69 90.84 92.85 · · · 99.48
FP (* 0.05bpp) 3.21 1.91 1.26 · · · 0.09
FN (* 0.05bpp) 10.10 7.25 5.89 · · · 0.43
TP (* 0.10bpp) 85.91 90.41 92.45 · · · 99.40
FP (* 0.10bpp) 4.11 2.35 1.72 · · · 0.13
FN (* 0.10bpp) 9.98 7.23 5.83 · · · 0.47
7.3, except for an embedded message length of 0.10 bpp, where near-perfect
identification is achieved for N > 820 images. The identification of stego-bearing
pixel locations for images in Database C (Table 7.4) appeared to be the most
difficult. However, the detection reliability is still very good: all the identifications
show at least 84 per cent TP for N = 100 and more than 90 per cent when N > 200.
Further analysis reveals that the textual content in image Database C has periodic
patterns that are uniform and consistent across the whole image, which significantly
increases the global image entropy. Since our method is partly based on the local
entropy, this interfered with our identification mechanism.
To the best of our knowledge, no stego-bearing pixel identification approach for
binary images has been proposed in the literature. Thus, we compare our proposed
method to a general method in which just the residual of the weighted stego images
and linear filtering are used. The results shown in Figures 7.2, 7.3 and 7.4
demonstrate that our proposed method performs better. With Database C,
however, the identification results for the two methods show only a marginal
difference, as Figure 7.4 illustrates. This is consistent with the explanation given
in the previous paragraph: local entropy is less effective on textual images.
[Figure 7.2 plots the true positive rate, TP (%), against the number of images, N.]

Figure 7.2: Comparison of results for image Database A (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp
[Figure 7.3 plots the true positive rate, TP (%), against the number of images, N.]

Figure 7.3: Comparison of results for image Database B (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp
[Figure 7.4 plots the true positive rate, TP (%), against the number of images, N.]

Figure 7.4: Comparison of results for image Database C (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp
7.5 Conclusion
We have proposed a steganalysis technique to identify the steganographic payload
locations in binary stego images. This work was motivated by the concept developed
in [62] for greyscale stego images. We enhanced the concept and applied it to binary
stego images, proposing Gaussian smoothing to estimate the cover images and local
entropy to improve the identification accuracy. Experimental results showed that
our proposed method provides reliable identification accuracy of at least 84 per
cent for N = 100 and more than 90 per cent when N > 200. The results also
showed that our method provides nearly perfect (≈ 99 per cent) identification for
N as low as 100 on non-textual stego images.
It is important to note that our proposed method will not produce the same
accuracy if only one stego image is available. Although this may seem like a
downside, it is unavoidable: if only one stego image is available, we do not have
sufficient evidence to locate the unchanged pixels whose LSBs already matched
the message bits. As a result, the problem of high false negatives discussed in
Section 7.2 (second paragraph) would occur.
Chapter 8
Feature-Pooling Blind JPEG
Image Steganalysis
From a practical point of view, blind steganalysis is more useful: if an image is
suspected of carrying a secret message, we can first use blind steganalysis to detect
the existence of a hidden message and then carry out further analysis, such as
identifying the steganographic technique used. The exception is the rare case where
a priori knowledge of the type of steganography is available (for example, when the
computer of a suspect is confiscated and a certain steganography tool is found on
it).
This chapter focuses on blind steganalysis. The analysis is carried out and tested
on greyscale JPEG image steganography. We will study several existing JPEG
image blind steganalysis techniques, especially their feature extraction techniques,
and then select and combine features to form a pooled feature set.
The rest of the chapter is structured as follows. In the next section, we discuss the
feature extraction techniques. The proposed feature-pooling steganalysis will be
given in Section 8.2. Section 8.3 presents the experimental results and the chapter
is concluded in Section 8.4.
8.1 Feature Extraction Techniques
Feature extraction plays an important role in blind steganalysis. A good feature
should be representative and sensitive to steganographic operations. Moreover,
the feature should be insensitive to image content. In the following subsections,
several well-known steganalysis techniques are discussed, with emphasis on their
feature extraction algorithms.
8.1.1 Image Quality Metrics
In [4], the authors proposed and selected a set of ten image quality metrics. These
metrics are the mean absolute error, mean square error, Czekanowski correlation,
angle mean, image fidelity, cross correlation, spectral magnitude distance, median
block spectral phase distance, median block weighted spectral distance and
normalised mean square HVS error.
These metrics were selected based on one-way ANOVA tests. Among them, seven
metrics are more sensitive in detecting active warden steganography, while the
other four are more sensitive in detecting passive warden steganography. Active
warden steganography is constructed to withstand alterations made by the warden
(steganalyst), that is, to be robust. Robustness is not the main objective in passive
warden steganography; rather, it is to conceal the existence of a secret message and
so create a covert communication (the descriptions of the active and passive warden
are given in Section 2.2). The metric sensitivity is based on the statistical
significance obtained from the ANOVA tests, where the tests are performed on
active and passive warden steganography separately.
8.1.2 Moment of Wavelet Decomposition
Lyu and Farid [73] proposed using higher-order statistics as features, namely the
mean, variance, skewness and kurtosis. Two sets of these higher-order statistics
are obtained, resulting in a 72-dimensional feature vector.
The first set is acquired from a wavelet decomposition based on separable
quadrature mirror filters. In the decomposition, a given image is decomposed into
multiple orientations and scales. Each scale has three orientations: the vertical,
horizontal and diagonal subbands. The elements in each subband are called wavelet
subband coefficients. The original paper used the first three scales, producing nine
subbands. The mean, variance, skewness and kurtosis of the coefficients of each
wavelet subband are computed, yielding 36 statistics, which are used as the first
set of features.
Next, based on the nine decomposed subbands, a linear predictor of the wavelet
coefficients is obtained from the neighbouring wavelet coefficients for each vertical,
horizontal and diagonal subband. The linear relationship for the predictor is
defined as

V = Qw,   (8.1)

where w is the weight vector, V is the vector of vertical subband coefficients and Q
is the matrix of neighbouring coefficients. The log error of the linear predictor is

E = log₂(V) − log₂(|Qw|).   (8.2)

The same linear predictor and log error are applied to the horizontal and diagonal
subbands. The second set of features is composed of the mean, variance, skewness
and kurtosis of the log errors of all nine subbands, which yields another 36 features.
8.1.3 Feature-Based
In [35], the cover image is estimated from a stego image using calibration. A
set of 20 features is constructed from the L1 norm of the difference between the
features of the estimated cover image and those of the stego image, where the L1
norm is the sum of the absolute values of a vector. Among these features, 17 are
first-order features. They are defined as follows:
❐ Global histogram: the frequency plot of quantised DCT coefficients
❐ Individual histogram: low frequency coefficient of individual DCT mode
histogram where five DCT modes are selected
❐ Dual histogram: the frequency of occurrence of the (i, j)-th quantised DCT
coefficient in an 8 × 8 block being equal to a fixed value d over the whole image,
defined as follows:

g^d_{i,j} = ∑_{k=1}^{B} δ(d, d_k(i, j)),   (8.3)

where δ(u, v) = 1 if u = v and 0 otherwise, B is the total number of blocks in
the JPEG image, and 11 values of d are selected.
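A sketch of the dual histogram count in Equation (8.3), using toy 2×2 "blocks" in place of real 8×8 quantised DCT blocks; the function name and data are illustrative.

```python
def dual_histogram(blocks, d, i, j):
    """Count blocks whose (i, j)-th quantised DCT coefficient equals d (Eq. 8.3)."""
    return sum(1 for block in blocks if block[i][j] == d)

# Two toy 2x2 "blocks" of quantised coefficients (real blocks are 8x8).
blocks = [[[0, 1], [2, 0]],
          [[0, 3], [2, 1]]]
g_0_00 = dual_histogram(blocks, 0, 0, 0)   # coefficient (0, 0) equal to 0
g_1_01 = dual_histogram(blocks, 1, 0, 1)   # coefficient (0, 1) equal to 1
```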
The other three second-order features are given below.
❐ Variation measures the inter-block dependency and is defined as follows:

V = ( ∑_{i,j=1}^{8} ∑_{k=1}^{|I_r|−1} |d_{I_r(k)}(i, j) − d_{I_r(k+1)}(i, j)| + ∑_{i,j=1}^{8} ∑_{k=1}^{|I_c|−1} |d_{I_c(k)}(i, j) − d_{I_c(k+1)}(i, j)| ) / (|I_r| + |I_c|),   (8.4)

where I_r and I_c are the collections of blocks scanned by rows and by columns,
respectively, throughout the image, and d_{I_r(k)}(i, j) is the quantised DCT
coefficient at the (i, j)-th position of the kth 8 × 8 block.
❐ Blockiness measures the spatial inter-block boundary discontinuity and is
defined as follows:

B_α = ( ∑_{i=1}^{⌊(M−1)/8⌋} ∑_{j=1}^{N} |p_{8i,j} − p_{8i+1,j}|^α + ∑_{j=1}^{⌊(N−1)/8⌋} ∑_{i=1}^{M} |p_{i,8j} − p_{i,8j+1}|^α ) / ( N⌊(M − 1)/8⌋ + M⌊(N − 1)/8⌋ ),   (8.5)

where α = 1, 2, p_{i,j} is the spatial pixel value, and M and N are the image
dimensions.
The final three features, which bring the total to 23 features in [35], are based
on the co-occurrence matrix, defined as follows:

C_{st} = ( ∑_{k=1}^{|I_r|−1} ∑_{i,j=1}^{8} δ(s, d_{I_r(k)}(i, j)) δ(t, d_{I_r(k+1)}(i, j)) + ∑_{k=1}^{|I_c|−1} ∑_{i,j=1}^{8} δ(s, d_{I_c(k)}(i, j)) δ(t, d_{I_c(k+1)}(i, j)) ) / (|I_r| + |I_c|),   (8.6)

where s, t ∈ {−1, 0, 1}, which gives nine combinations. From these combinations,
the three final features are obtained as the difference between C_{st} of the
estimated cover image and C_{st} of the stego image.
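The row-scan half of Equation (8.6) can be sketched as below; the column scan is analogous and omitted, so the normalisation here is simplified to the number of blocks rather than |I_r| + |I_c|, and the 2×2 toy blocks stand in for 8×8 ones.

```python
def cooccurrence(blocks, s, t):
    """Normalised count of adjacent block pairs whose (i, j)-th coefficients
    equal s then t (row-scan half of Eq. 8.6, simplified normalisation)."""
    if len(blocks) < 2:
        return 0.0
    size = len(blocks[0])
    count = 0
    for k in range(len(blocks) - 1):          # consecutive block pairs
        for i in range(size):
            for j in range(size):
                if blocks[k][i][j] == s and blocks[k + 1][i][j] == t:
                    count += 1
    return count / len(blocks)

blocks = [[[0, 1], [-1, 0]],
          [[0, 0], [-1, 1]]]
c_00 = cooccurrence(blocks, 0, 0)
c_mm = cooccurrence(blocks, -1, -1)
```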
8.1.4 Moment of CF of PDF
The use of characteristic functions (CF) in steganalysis was pioneered by Harmsen
and Pearlman in [48]. In their work, they model the stego image histogram as the
convolution of the hidden message's probability mass function with the cover image
histogram, because a steganographic operation can be considered as noise addition.
The characteristic function is obtained by applying the discrete Fourier transform
to the probability density function (PDF) of an image. From this characteristic
function, the first-order absolute moment (the centre of mass, in their terminology)
is computed and used as the feature:

M = ∑_{k=0}^{K} k |H[k]| / ∑_{k=0}^{K} |H[k]|,   (8.7)

where H[·] is the characteristic function, K = N/2 − 1 and N is the width of the
domain of the PDF.
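The moment in Equation (8.7) can be computed with a direct DFT of the histogram, as in the pure-Python sketch below (a real implementation would use an FFT); only the first N/2 frequency bins are used, the degenerate all-zero case is not handled, and the PDFs are illustrative.

```python
import cmath

def cf_first_moment(pdf):
    """Centre of mass of the characteristic function (Eq. 8.7): a direct DFT
    of the histogram/PDF, keeping the first N/2 frequency bins."""
    n = len(pdf)
    H = []
    for k in range(n // 2):                   # k = 0 .. N/2 - 1
        H.append(abs(sum(pdf[x] * cmath.exp(-2j * cmath.pi * k * x / n)
                         for x in range(n))))
    return sum(k * h for k, h in enumerate(H)) / sum(H)

flat = [0.25, 0.25, 0.25, 0.25]
peaky = [0.7, 0.1, 0.1, 0.1]
m_flat = cf_first_moment(flat)
m_peaky = cf_first_moment(peaky)
```

A flat PDF concentrates the CF energy at k = 0, so its moment is near zero, while a peaked PDF shifts mass to higher frequencies.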
8.2 Feature-Pooling Steganalysis
This section discusses the proposed feature-pooling method motivated by the fea-
ture selection capability discussed in [55]. The proposed method selects from the
existing sensitive discriminant features and pools them with another two feature
sets from different feature extraction techniques.
8.2.1 Feature Selection in Feature-Based Method
The first set of proposed feature-pooling features is obtained from [35]. The fea-
tures from [35] are selected because they include first- and second-order features
that are sensitive to steganographic operations. In addition, the experiments car-
ried out in Section 8.3.2 proved the efficacy of these features.
The feature selection technique used is the sequential forward floating selection
(SFFS) technique from [93]. As shown experimentally by Jain and Zongker [55],
SFFS dominated the other feature selection techniques tested. We also tested other
selection techniques based on the T-test and the Bhattacharyya distance; the
experimental results confirmed the superiority of SFFS. A comparison of the
results is summarised in Table 8.1.
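For orientation, the SFFS loop can be sketched as below with a toy criterion function; the real criterion in this chapter is a classifier-based separability measure, and the floating conditions in [93] are more elaborate than this simplification (no cycle guard is included, and all names are illustrative).

```python
def sffs(features, criterion, target_size):
    """Sequential forward floating selection, simplified: greedily add the best
    feature, then conditionally drop features while removal improves J."""
    selected = []
    while len(selected) < target_size:
        # Forward step: add the feature that maximises the criterion.
        best = max((f for f in features if f not in selected),
                   key=lambda f: criterion(selected + [f]))
        selected.append(best)
        # Floating step: drop a feature while that strictly improves J.
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for f in list(selected):
                reduced = [g for g in selected if g != f]
                if criterion(reduced) > criterion(selected):
                    selected = reduced
                    improved = True
                    break
    return selected

# Toy criterion: features "a" and "c" together form the informative pair.
def toy_criterion(subset):
    score = {"a": 0.4, "b": 0.1, "c": 0.3}
    bonus = 0.3 if "a" in subset and "c" in subset else 0.0
    return sum(score[f] for f in subset) / (1 + len(subset)) + bonus

selected = sffs(["a", "b", "c"], toy_criterion, 2)
```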
We also compared the efficiencies of the selected feature set (selected through the
SFFS technique) and the original 23-dimensional feature set. The comparison was
Table 8.1: Feature selection comparison for SFFS, T-test and Bhattacharyya
SFFS T-test Bhattacharyya
F5 0.86528 0.86063 0.85447
OutGuess 0.84505 0.84185 0.84232
MB1 0.80208 0.78638 0.79566
[Figure 8.1 plots the AUR against the number of combined features; the maximum AUR is 0.86528 at a combination of 9 features, versus 0.85447 for the original 23 features.]

Figure 8.1: Comparison between the selected features and the original features in detecting F5
made using three steganographic models, namely F5, OutGuess¹ and model-based
steganography² (MB1), from [110], [90] and [96], respectively. The F5, OutGuess
and MB1 steganography are discussed in Section 3.1. The area under the ROC
curve (AUR) is used to evaluate the detection accuracy and is shown in Figures
8.1, 8.2 and 8.3; the higher the AUR, the better the detection accuracy. It can
be clearly seen that the selected feature set performs better than the original
23-dimensional feature set for all three steganographic models. The Y-axis in each
graph represents the AUR, ranging from 0.5 to 1, and the X-axis is the number of
top-ranked features selected and combined by SFFS. The squared marker
corresponds to the AUR of the original 23-dimensional feature set and the circled
marker to that of the selected top-ranked features. The selected features form the
best feature set with optimum discriminant capability.
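The AUR itself can be computed from classifier output scores without plotting the ROC curve, via the rank (Mann-Whitney) formulation sketched below; the score values are illustrative, not from the experiments.

```python
def aur(scores_cover, scores_stego):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a stego image scores higher than a cover image,
    counting ties as half a win."""
    wins = 0.0
    for s in scores_stego:
        for c in scores_cover:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(scores_stego) * len(scores_cover))

cover_scores = [0.1, 0.2, 0.3]
stego_scores = [0.25, 0.6, 0.9]
auc_val = aur(cover_scores, stego_scores)
```

Identical score distributions give AUR = 0.5 (chance level), matching the diagonal of the ROC graph.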
8.2.2 Feature-Pooling
Pooling the selected feature set from the SFFS technique in Section 8.2.1 with
two additional feature sets from different feature extraction techniques creates the
¹ OutGuess steganography with statistic correction.
² Model-based steganography without deblocking.
[Figure 8.2 plots the AUR against the number of combined features; the maximum AUR is 0.84505 at a combination of 18 features, versus 0.83394 for the original 23 features.]

Figure 8.2: Comparison between the selected features and the original features in detecting OutGuess
[Figure 8.3 plots the AUR against the number of combined features; the maximum AUR is 0.80208 at a combination of 17 features, versus 0.78451 for the original 23 features.]

Figure 8.3: Comparison between the selected features and the original features in detecting MB1
final feature set for blind steganalysis. The first additional set is extracted from
the image quality metric developed in [4]. The second additional set is from the
feature extraction developed in [48], which is the moment of characteristic function
computed from the image PDF. These two feature sets are discussed in Sections
8.1.1 and 8.1.4, respectively.
Based on the analysis given in the original paper [4], we chose the four features
assigned for the passive warden steganography case, because the blind steganalysis
that we propose in this research is for passive warden steganography as well.
However, from these four features, we excluded the angle mean feature because
the images tested here are all greyscale images. The contribution of the angle
mean feature will be significant only when colour images are used.
For the next pooled features, the original feature proposed in [48] includes only the
first moment, so we extend it to the second and third moments according to the
following equation, for α ∈ {1, 2, 3}:

M_α = ∑_{k=0}^{K} k^α |H[k]| / ∑_{k=0}^{K} |H[k]|.   (8.8)
Increasing the moment to a higher order does not always improve the result sig-
nificantly, which has been justified well in [109]. Furthermore, in our experiments,
we found that it is sufficient to use only the first three orders.
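A sketch of Equation (8.8): the same DFT-based characteristic function as in Section 8.1.4, with the frequency index raised to the power α; α = 1 recovers the original centre-of-mass feature, and the toy PDF is illustrative.

```python
import cmath

def cf_moment(pdf, alpha):
    """Order-alpha absolute moment of the characteristic function (Eq. 8.8),
    using the first N/2 frequency bins of a direct DFT of the PDF."""
    n = len(pdf)
    H = [abs(sum(pdf[x] * cmath.exp(-2j * cmath.pi * k * x / n)
                 for x in range(n)))
         for k in range(n // 2)]
    return sum((k ** alpha) * h for k, h in enumerate(H)) / sum(H)

pdf = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]
moments = [cf_moment(pdf, a) for a in (1, 2, 3)]   # the three pooled features
```

Because k^α grows with α for k ≥ 1, the moments form a non-decreasing sequence, so the three orders capture progressively more of the high-frequency tail.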
8.3 Experimental Results
This section presents and analyses the experimental results. First we choose the
optimal classifier for our proposed blind steganalysis; then we compare the results
with those of some existing blind steganalysis techniques.
In the construction of the image database, 2037 images of four different sizes
(512×512, 608×608, 768×768 and 1024×1024) were downloaded from [47]. All
images were cropped to obtain the centre portion of the image and were converted
to greyscale images. F5, OutGuess and MB1 were selected as the steganographic
models for creating three different types of stego images. To have a percentage
wise equal number of changes over all images, we define the embedding rate in
term of bits per embeddable quantised DCT coefficient of the cover image for
each. We define the “embeddable coefficients” as the coefficients that can be used
to carry the message bits in each steganographic model. We used four embedding
rates (5, 25, 50 and 100 per cent), resulting in a mixture of 10,185 cover and stego
images in the database. We employ uniform distribution of random message bits
for the experiments and the prototype is implemented in Matlab R2008a.
8.3.1 Classifier Selection
We selected four types of classifiers: multivariate regression, Fisher linear
discriminant, support vector machine and neural network. Concise explanations
of these classifiers are available in Section 2.4.2. In this section, we compare the
different classifiers using the same proposed feature set; the purpose is to choose
the optimal combination of feature set and classifier. To test the flexibility and
consistency of this combination, the same three steganographic models were tested.
Figures 8.4, 8.5 and 8.6 compare the ROC curves for F5, OutGuess and MB1,
respectively. In each figure, the Y-axis represents the detection rate and X-axis is
the false alarm rate. Each axis ranges from zero to one. The value shown inside the
bracket is the AUR value, indicating the detection accuracy. NN, FLD, SVM and
MR stand for neural network, Fisher linear discriminant, support vector machine
and multivariate regression, respectively. In the comparison for all steganographic
models, classifications using the neural network as the classifier produced the
highest AUR values. This indicates that the combination of the proposed feature
set and the neural network provide optimal blind steganalysis. Thus, our blind
steganalysis is constructed by combining the proposed feature set with a neural
network classifier.
[Figure 8.4 shows ROC curves (detection rate versus false alarm rate); AUR values: NN 0.9359, FLD 0.87976, SVM 0.87766, MR 0.75908.]

Figure 8.4: Classifier comparison using the proposed features in detecting F5
[Figure 8.5 shows ROC curves; AUR values: NN 0.91068, FLD 0.86376, SVM 0.89634, MR 0.88408.]

Figure 8.5: Classifier comparison using the proposed features in detecting OutGuess
[Figure 8.6 shows ROC curves; AUR values: NN 0.76213, FLD 0.7232, SVM 0.72565, MR 0.64392.]

Figure 8.6: Classifier comparison using the proposed features in detecting MB1
8.3.2 Results Comparison
This section compares the performance of our proposed blind steganalysis to that
of selected existing blind steganalysis. From the constructed image database, 80
per cent of the images are used for training and the remaining 20 per cent are
used for testing. The same steganographic models (i.e., F5, OutGuess and MB1)
are used and the classification for each steganographic model is carried out separately.
The following blind steganalysis techniques are selected for the detection perfor-
mance comparison:
❐ Image quality metrics are combined with the multivariate regression classifier
[4] (IQM).
❐ Moment of wavelet decomposition is combined with the SVM classifier [73]
(Farid).
❐ Feature-based method is combined with the SVM classifier³ [85] (FB).
❐ Moments of characteristic function of the image PDF is combined with Fisher
linear discriminant classifier [48] (COM).
Figure 8.7 shows the ROC curves and AUR values for our proposed method and
other blind steganalysis techniques at an embedding rate of 25 per cent. From
the best ROC curve at the top left of the graph to the diagonal, the AUR values
are 0.9359 for our proposed method, followed by the FB method at 0.72827. The
Farid method, at 0.53736, is slightly better than the COM and IQM methods at
0.52292 and 0.51072, respectively. Our proposed method outperformed all other
³ Although the original paper [35] used Fisher linear discriminant as the classifier, their later paper [85] obtained an improvement by using SVM.
[Figure 8.7 shows ROC curves; AUR values: PF 0.9359, Farid 0.53736, FeatureBased 0.72827, IQM 0.51072, COM 0.52292.]

Figure 8.7: Comparison of steganalysis performance in detecting F5
blind steganalysis techniques in detecting F5.
From Figure 8.8, the steganalysis results in detecting OutGuess show that the FB
method is competitive with our proposed method: the difference in AUR values
is only 0.0243. However, our proposed method is better overall and especially at
lower false alarm rates. This property is desirable, because an optimal blind
steganalysis should classify correctly with a low false alarm rate. The other three
blind steganalysis techniques do not perform well at this low embedding rate; their
AUR values are centred around 0.51.
Although it is well known that, among the three steganographic methods (F5,
OutGuess and MB1), OutGuess is relatively easier to detect, we obtained a
relatively lower AUR value than for detecting F5. This is because we are using
bits per embeddable quantised DCT coefficient as the embedding rate, which
reduces the message length embedded by OutGuess. In other words, we are using
a shorter message length in our experiments, which makes the detection more
difficult for a steganalysis technique.
Figure 8.9 compares the detection performance in detecting MB1. Again, our
proposed method outperformed all other blind steganalysis techniques at an AUR
value of 0.76213. This exceeds the AUR values of 0.64868, 0.53785, 0.53025 and
0.50423 for FB, Farid, IQM and COM, respectively. All the AUR values in Figure
8.9 are low compared to the AUR values in both Figure 8.7 and 8.8. This finding
is consistent with the finding in [35, 65, 66], which indicated that MB1 is the
hardest to detect.
[Figure 8.8 shows ROC curves; AUR values: PF 0.91068, Farid 0.53844, FeatureBased 0.88637, IQM 0.50571, COM 0.50159.]

Figure 8.8: Comparison of steganalysis performance in detecting OutGuess
[Figure 8.9 shows ROC curves; AUR values: PF 0.76213, Farid 0.53785, FeatureBased 0.64868, IQM 0.53025, COM 0.50423.]

Figure 8.9: Comparison of steganalysis performance in detecting MB1
8.4 Conclusion
In this research, we proposed a feature-pooling method for building a blind
steganalysis feature set. We applied the SFFS technique to select the key
significant features from the feature-based method [35] and then combined them
with two additional feature sets: the image quality metrics [4] and the modified
first three moments of the characteristic function computed from the image PDF
[48]. Based on this pooled feature set, we employed a neural network classifier to
construct a blind JPEG image steganalysis. The experimental results showed that
our proposed blind steganalysis outperforms the other tested blind steganalysis
techniques.
Chapter 9
Improving JPEG Image
Steganalysis
Although the performance of blind steganalysis is often inferior to a targeted one,
its flexibility and wide coverage of different steganographic methods make it an
attractive and practical choice. This chapter focuses on blind steganalysis; specif-
ically we will propose a technique for improving some of the existing steganalysis
techniques. To do that we propose to minimise the image-to-image variations,
which increases the discriminative ability of a feature set. We will illustrate the
efficiency of the proposed method by incorporating it into several existing blind
JPEG image steganalysis techniques. The experimental results presented will ver-
ify the feasibility and applicability of the proposed technique for improving existing
techniques.
The remainder of this chapter is as follows. The next section models the
steganographic artefact as additive noise. The proposed method is discussed in
Sections 9.2 and 9.3. Section 9.4 presents the experimental results and Section 9.5
concludes the chapter.
9.1 Steganography as Additive Noise
Let X denote an instance of a JPEG cover image and let P_C(x) denote the
probability mass function of a cover image. In a JPEG image, the probability mass
function can be considered as the frequency count of the quantised DCT
coefficients.
The probability mass function of the secret message can be defined as the
distribution of the additive stego noise:

P_N(n) ≡ P(x′ − x = n),   (9.1)

where x and x′ are the quantised DCT coefficients before and after embedding,
respectively.
Generally, a cover image used in steganography can be divided into two parts, $x_c$
and $x_s$. Part $x_c$ is the unperturbed part and normally consists of a group of the
most significant bits; $x_s$ is the part that will be altered to carry the secret
message and normally contains a group of the less significant bits.
Since the additive stego noise is independent of the cover image, perturbing the $x_s$
part by embedding a secret message bit into it is equivalent to the convolution of
the additive stego noise probability mass function with the cover image probability
mass function. This can be expressed as follows:

$P_S(n) = P_N(n) \ast P_C(n)$,   (9.2)

where $\ast$ denotes convolution and $P_S(n)$ is the stego image probability mass function.
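Equation (9.2) can be evaluated directly. The numpy sketch below uses a hypothetical cover PMF and an assumed ±1-noise model, chosen purely for illustration rather than taken from the thesis experiments:

```python
import numpy as np

# Hypothetical cover PMF over quantised DCT values -4..4 (9 bins);
# the values below are illustrative only, not measured from real images.
p_cover = np.array([0.02, 0.05, 0.10, 0.20, 0.26, 0.20, 0.10, 0.05, 0.02])

# Assumed additive stego-noise PMF over {-1, 0, +1}: a coefficient is
# left unchanged with probability 0.5, or shifted by +/-1 otherwise.
p_noise = np.array([0.25, 0.50, 0.25])

# Equation (9.2): the stego PMF is the convolution of noise and cover PMFs.
p_stego = np.convolve(p_cover, p_noise)

# The result is still a valid PMF, but smoother than the cover PMF.
assert np.isclose(p_stego.sum(), 1.0)
```

The smoothing visible in `p_stego` is exactly the statistical footprint that the feature sets in the following sections try to detect.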
9.2 Image-to-Image Variation Minimisation
Defining a discriminative feature set in image steganalysis is a challenging task
because the defined feature set should be optimally sensitive to steganographic
alteration and not to image-to-image variations. Image-to-image variation is de-
fined as the difference between the underlying statistic of one image and that of
another. The underlying statistic can be the histogram distribution of the DCT
coefficients or the pixel intensities. For example, the images shown in Figure 9.1
are obviously different and, therefore, their underlying statistics (histogram dis-
tributions shown below each image) differ. This difference is the image-to-image
variation. In other words, the image-to-image variation is caused by the difference
of the image content.
The question of interest here is how we can categorise these images into either
cover or stego images. It is obvious that there is no consistency in differentiating
these two images as either a cover or stego image by just examining the histogram
distribution because the distribution is rather random and different for each image.
Figure 9.1: Two images with their respective underlying statistics
If we apply feature extraction directly to the histogram distribution, then the
extracted feature will have poor discriminative capability because the image-to-
image variation is large.
Ideally, the cover image is presented together with the stego image during ste-
ganalysis detection. We could subtract the stego image S from the cover image C
directly as follows:
N = S − C. (9.3)
The result of the subtraction, N , is the corresponding stego noise, and the
image-to-image variation is minimal. Note that the subtraction is pixel-wise.
However, this case is not typical; most of the time we have only one version of
the image—the cover or the stego image.
As a result, it is reasonable to estimate the cover image from the stego image, so
that we can minimise the image-to-image variation. To demonstrate the efficiency
of our proposed technique, we will apply it to existing steganalysis techniques.
Thus, we propose two different techniques for optimum performance of the respec-
tive existing steganalysis techniques. For the first technique, given two versions
of an image, we will first extract the feature set for each image and compute the
difference between them. For the second technique, we will compute the pixel-wise
difference between the two images and follow it with feature extraction.
Figure 9.2: Transformed image by scaling (left) and cropping (right)
The two proposed techniques are defined as follows:

$\Psi_1 = \Phi(\upsilon) - \Phi(\hat{\upsilon})$,   (9.4)

$\Psi_2 = \Phi(\eta)$, where $\eta = \upsilon - \hat{\upsilon}$,   (9.5)

where $\upsilon$ and $\hat{\upsilon}$ are the given image (possibly a cover or a stego image) and the
estimated cover image, respectively. The variable $\eta$ is the additive stego noise
generated by the embedding operation and $\Psi_i$, $i = 1, 2$, is the feature set produced
by the feature-extraction technique $\Phi(\cdot)$. If $\upsilon$ is a cover image, then $\Psi_i \approx 0$;
if $\upsilon$ is a stego image, then the absolute value of $\Psi_i$ is always greater than zero.
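The two techniques can be sketched as follows. The feature extractor here is a toy grey-level histogram standing in for the real feature sets of Sections 9.3.1 to 9.3.3, and the function names `phi`, `psi1` and `psi2` are illustrative only:

```python
import numpy as np

def phi(img):
    """Toy stand-in for the feature extractor Phi: a grey-level histogram."""
    return np.bincount(img.ravel(), minlength=256).astype(float)

def psi1(img, est_cover):
    """Equation (9.4): difference of the two feature vectors."""
    return phi(img) - phi(est_cover)

def psi2(img, est_cover):
    """Equation (9.5): features of the pixel-wise difference (stego noise)."""
    eta = img.astype(int) - est_cover.astype(int)      # eta = v - v_hat
    return np.bincount((eta + 255).ravel(), minlength=511).astype(float)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8)).astype(np.uint8)

# With a perfect cover estimate, a cover input yields Psi_1 ~ 0.
assert np.allclose(psi1(cover, cover), 0.0)
```

In practice the estimate $\hat{\upsilon}$ is never perfect, so $\Psi_i$ is small rather than exactly zero for cover images; the classifier separates "small" from "clearly non-zero".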
In cover image estimation, we first decompress the JPEG images to the spatial
domain and apply a transformation to the decompressed images. In our experi-
ments, we employed scaling by bilinear interpolation (shown in the left of Figure
9.2) and cropping four pixels in both the horizontal and vertical directions (shown
in the right of Figure 9.2). We then recompress the transformed image back to
the JPEG domain. In the decompression and recompression processes, we used
the same JPEG image quality as before the transformation to avoid double com-
pression. Since steganography can be modelled as additive noise, the effect of the
transformation can be attributed as noise neutralisation. Hence, the cover image
estimation is reasonable. Similar estimation approaches have proven efficient and
can be found in [35, 61].
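A minimal numpy-only sketch of the cropping transformation follows; the actual pipeline also decompresses the JPEG, crops, and recompresses at the original quality factor (for example with a JPEG codec library), which this sketch omits:

```python
import numpy as np

def crop_estimate(pixels, offset=4):
    """Crop `offset` pixels in the horizontal and vertical directions.
    After recompression at the original quality, the 8x8 JPEG block grid
    of the cropped image no longer lines up with the original grid,
    which neutralises the additive embedding noise."""
    return pixels[offset:, offset:]

img = (np.arange(64 * 64).reshape(64, 64) % 256).astype(np.uint8)
est = crop_estimate(img)
assert est.shape == (60, 60)
```

The offset of four pixels is chosen so that each new 8×8 block straddles four blocks of the original grid, which is what breaks the alignment between the stego noise and the recompressed coefficients.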
9.3 Steganalysis Improvement
We will select three existing steganalysis techniques to demonstrate the efficiency
of the proposed technique. In the following subsections, we discuss the incorpo-
ration of the proposed technique in each of the selected steganalysis techniques
separately.
9.3.1 Moments of Wavelet Decomposition
Lyu and Farid [73] proposed using higher-order statistics as features—mean, vari-
ance, skewness and kurtosis. Two sets of these higher-order statistics are obtained,
resulting in a 72-dimensional feature vector.
The first set is acquired from wavelet decomposition based on separable quadrature
mirror filters. A total of nine subbands are obtained and the mean, variance,
skewness and kurtosis are computed for each subband. These 36 higher-order
statistics (nine subbands × four statistics), $\Psi_{w_k}$ for $k = 1, 2, \ldots, 36$,
will be used as the first feature set.
The second set of features is obtained from the log error in the linear predictor for
each of the same nine subbands. The four higher-order statistics (mean, variance,
skewness and kurtosis) are computed for each log error, resulting in another
36-dimensional feature set, $\Psi_{e_k}$ for $k = 1, 2, \ldots, 36$.
Instead of using the 72-dimensional features extracted directly from the image in
the classification, we use the proposed technique discussed in Section 9.2. Specif-
ically, we improve the feature discrimination capability by employing the second
proposed technique, defined in Equation (9.5). Thus, our improved feature set is
defined in the following equation:
$\eta = \upsilon - \hat{\upsilon}$,
$\Psi_{w_k} = \Phi_{w_k}(\eta)$,
$\Psi_{e_k} = \Phi_{e_k}(\eta)$,
$\Psi = \Psi_{w_k} + \Psi_{e_k}$.   (9.6)
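The subband statistics in Equation (9.6) can be illustrated with a one-level Haar decomposition standing in for the separable quadrature mirror filters of [73]; the Haar filters are an assumption made here purely for brevity, not the filters of the original method:

```python
import numpy as np

def haar_subbands(x):
    """One-level 2-D Haar decomposition into LL, HL, LH and HH subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 4, (a - b + c - d) / 4, \
           (a + b - c - d) / 4, (a - b - c + d) / 4

def four_stats(s):
    """Mean, variance, skewness and kurtosis of one subband."""
    s = s.ravel()
    mu, var = s.mean(), s.var()
    z = (s - mu) / (np.sqrt(var) if var > 0 else 1.0)
    return np.array([mu, var, (z ** 3).mean(), (z ** 4).mean()])

rng = np.random.default_rng(1)
eta = rng.normal(size=(64, 64))                # stand-in for the stego noise
ll, hl, lh, hh = haar_subbands(eta)
features = np.concatenate([four_stats(b) for b in (hl, lh, hh)])
assert features.shape == (12,)                 # 3 subbands x 4 statistics
```

Applying the same four statistics over three decomposition levels, plus the linear-predictor log errors, yields the full 72-dimensional vector.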
9.3.2 Moment of CF of PDF
The use of characteristic functions (CF) was pioneered by Harmsen and Pearlman
in [48]. The CF is obtained by applying a discrete Fourier transform to the PDF
of an image. From this characteristic function, the first-order absolute moment is
computed and used as the feature. Equation (9.7) shows the calculation of this
moment.
$M = \dfrac{\sum_{k=0}^{K} k\,|H[k]|}{\sum_{k=0}^{K} |H[k]|}$,   (9.7)

where $H[\cdot]$ is the characteristic function, $K = N/2 - 1$ and $N$ is the width
of the domain of the PDF.
The feature set proposed in [48] contains only one feature—the first moment. We
extended it to the second and third moments according to the following equation:

$M_\alpha = \dfrac{\sum_{k=0}^{K} k^{\alpha}\,|H[k]|}{\sum_{k=0}^{K} |H[k]|}$,   (9.8)

where $\alpha \in \{1, 2, 3\}$. Increasing the moment to a higher order does not always
improve the result significantly and is therefore not necessary [109].
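A numpy sketch of Equation (9.8) follows, applied to a hypothetical PMF (the values are illustrative, not from the thesis experiments); the characteristic function is the DFT of the PMF, and only the first half of the spectrum enters the sums:

```python
import numpy as np

def cf_moments(pmf, orders=(1, 2, 3)):
    """Moments of the characteristic function, as in Equation (9.8).
    The CF is the DFT of the PMF; only the first half of the spectrum
    (k = 0, ..., N/2 - 1) enters the sums."""
    H = np.abs(np.fft.fft(pmf))[: len(pmf) // 2]
    k = np.arange(len(H))
    return np.array([(k ** a * H).sum() / H.sum() for a in orders])

# Hypothetical PMF of quantised DCT coefficients (illustrative values).
pmf = np.array([0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05, 0.00])
m = cf_moments(pmf)
assert m.shape == (3,) and np.all(m >= 0)
```

Because embedding smooths the PMF, it attenuates the high-frequency part of $|H[k]|$, which lowers these moments; the feature difference in Equation (9.9) isolates that shift.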
By incorporating the proposed technique defined in Equation (9.4), we obtained
a new set of features, defined as follows:

$\Psi = \Phi(\upsilon) - \Phi(\hat{\upsilon}) = M_\alpha - \hat{M}_\alpha$,   (9.9)

where $\hat{M}_\alpha$ is the moment computed from the estimated cover image.
9.3.3 Moment of CF of Wavelet Subbands
The feature set proposed in [115] is based on a Haar wavelet decomposition.
The authors decomposed the image into 12 subbands, denoted $LL_i$, $HL_i$, $LH_i$
and $HH_i$ for $i = 1, 2, 3$. The histogram of the given image, denoted $LL_0$,
is also employed.
Essentially, the probability mass function can be considered as the distribution for
the wavelet subbands and the image histogram. Motivated by the characteristic
function from [48], the authors constructed characteristic functions from all the
wavelet subbands and the image histogram, resulting in 13 characteristic functions.
After that, the first three moments for each of the characteristic functions can be
computed as in the following equation:

$M_\alpha = \dfrac{\sum_{k=0}^{N/2} f_k^{\alpha}\,|H(f_k)|}{\sum_{k=0}^{N/2} |H(f_k)|}$,   (9.10)

where $\alpha = 1, 2, 3$ indexes the three moments, $f_k$ is the frequency at which the
characteristic function $H(\cdot)$ is evaluated, and $N$ is the width of the domain of
the probability mass function.
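The resulting 39-dimensional feature vector (three moments for each of the 13 characteristic functions) can be assembled as in the following sketch, where the 13 histograms are random stand-ins for $LL_0$ and the 12 wavelet-subband histograms rather than outputs of a real decomposition:

```python
import numpy as np

def cf_three_moments(hist):
    """First three CF moments of one histogram, as in Equation (9.10)."""
    H = np.abs(np.fft.fft(hist))[: len(hist) // 2]
    f = np.arange(len(H))
    return [(f ** a * H).sum() / H.sum() for a in (1, 2, 3)]

rng = np.random.default_rng(2)
# Thirteen random stand-ins for LL0 and the 12 wavelet-subband histograms.
hists = [np.histogram(rng.normal(size=1024), bins=32)[0].astype(float) + 1e-9
         for _ in range(13)]
features = np.concatenate([cf_three_moments(h) for h in hists])
assert features.shape == (39,)                 # 13 CFs x 3 moments
```

The same 39 moments are computed for the estimated cover image, and the element-wise difference of Equation (9.11) forms the improved feature set.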
Next, we improve this method by incorporating the proposed technique, as defined
in Equation (9.4). The improved feature set can be defined as follows:

$\Psi = \Phi(\upsilon) - \Phi(\hat{\upsilon}) = M_j^{\alpha} - \hat{M}_j^{\alpha}$,   (9.11)

where $\hat{M}_j^{\alpha}$ is the moment computed from the estimated cover image and
$j = 1, 2, \ldots, 13$ indexes the 13 characteristic functions.
9.4 Experimental Results
9.4.1 Experimental Setup
Since we are interested in comparing feature discrimination performance, we stan-
dardise the classification by using a SVM [9] as the classifier in all experiments.
Three different steganographic methods, F5 [110], OutGuess [90] and MB1 [96],
are employed to create three different types of stego images. To achieve a
percentage-wise equal number of changes over all images, we define the embedding
rate in terms of bits per embeddable quantised DCT coefficient for each
steganographic method. We use four embedding rates: 5, 25, 50 and 100 per cent.
In the construction of the image database, 2037 images of four different sizes (512
× 512, 608 × 608, 768 × 768 and 1024 × 1024) were downloaded from [47]. All
Table 9.1: Performance comparison between the proposed technique and the Farid technique

Rate    Steganalysis   F5        OutGuess   MB1
5%      Improved       0.5160    0.5120     0.5165
        Original       0.5077    0.4625     0.5064
25%     Improved       0.5850    0.6460     0.6304
        Original       0.5374    0.5384     0.5379
50%     Improved       0.7236    0.7983     0.7564
        Original       0.5940    0.6606     0.5754
100%    Improved       0.90437   0.8811     0.8815
        Original       0.7218    0.7650     0.6485
images were cropped to obtain the centre portion and then converted to greyscale
images. From the constructed database, 80 per cent is used for training and the
remaining 20 per cent is used for testing. The prototype implementation is coded
in Matlab R2008a.
9.4.2 Results Comparison
We compare the improved version, which uses our proposed method, to the original
version for each of the three steganalysis techniques, as discussed in Section 9.3.
The detection results were evaluated using the area under the ROC curve (AUR).
A higher AUR value indicates better steganalysis performance. The obtained
results are tabulated in Tables 9.1, 9.2 and 9.3. We abbreviate the original version
of steganalysis techniques discussed in Subsections 9.3.1, 9.3.2 and 9.3.3 as Farid,
COM and MW, respectively.
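For reference, AUR values such as those reported in Tables 9.1 to 9.3 can be computed from classifier output scores without tracing an explicit ROC curve, via the rank-sum (Mann-Whitney) identity; a minimal numpy sketch (not the thesis implementation):

```python
import numpy as np

def aur(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the fraction of (stego, cover) pairs the detector ranks correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A perfect detector scores every stego image above every cover image.
assert aur([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]) == 1.0
```

A value of 0.5 corresponds to random guessing, which is why the 5 per cent rows in the tables hover near that level.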
Table 9.1 shows the comparison between the improved and original versions of the
Farid technique. Every AUR value of the improved version is larger than that of
the corresponding original version, indicating that the improved version
outperformed the original for all the steganographic models and all four embedded
message lengths.
Table 9.2 compares the improved version and the original version of the steganaly-
sis technique in [48]. Although the improvement is not as large as the improvement
obtained in Table 9.1, the overall performance has been improved.
Table 9.2: Performance comparison between the proposed technique and the COM technique

Rate    Steganalysis   F5       OutGuess   MB1
5%      Improved       0.5055   0.4838     0.5040
        Original       0.5022   0.4631     0.5020
25%     Improved       0.5279   0.5261     0.5080
        Original       0.5097   0.4794     0.5029
50%     Improved       0.5654   0.5408     0.5147
        Original       0.5314   0.4799     0.5030
100%    Improved       0.6392   0.5686     0.5228
        Original       0.5971   0.4817     0.5052
Table 9.3: Performance comparison between the proposed technique and the MW technique

Rate    Steganalysis   F5       OutGuess   MB1
5%      Improved       0.5016   0.5087     0.5105
        Original       0.5009   0.4793     0.5042
25%     Improved       0.5514   0.5385     0.5116
        Original       0.5202   0.4927     0.5043
50%     Improved       0.7601   0.5635     0.5648
        Original       0.5625   0.5064     0.5286
100%    Improved       0.8518   0.6213     0.5747
        Original       0.6667   0.5307     0.5520
As for the third improved steganalysis technique, Table 9.3 clearly shows sig-
nificant improvement in detecting all the steganographic models—the detection
performance for the F5 steganographic model appears the most improved. This
verifies the effectiveness of the proposed technique.
9.5 Conclusion
In conclusion, our proposed technique has improved the three selected steganalysis
techniques by minimising image-to-image variations. To minimise the image-to-
image variation, we estimate the cover image from the stego image and then
compute the difference between the two. Finally, we extract the feature set from
this difference. The experimental results demonstrate the effectiveness of the
proposed technique.
Chapter 10
Conclusions and Future Research
Directions
10.1 Summary
In this thesis, we investigated steganalysis techniques that extract information
related to a secret message hidden in a multimedia document. In particular, we
focused our analysis on steganographic methods that use binary images as the
medium for a secret message. We organised our work according to the amount of
information extracted about the hidden message (i.e., following the structure laid
out in Section 3.2).
The work presented in this thesis is summarised below.
1. Blind steganalysis. We studied and analysed the characteristics of images
produced by three different steganographic methods. Based on this analysis,
we developed an effective feature extraction technique
to extract a set of sensitive and discriminating features. Using a SVM as
the classifier, we constructed a blind steganalysis to detect the presence of
secret messages embedded in the binary images.
2. Multi-class steganalysis. To the best of our knowledge, no multi-class ste-
ganalysis was proposed for binary images at the time we published our multi-
class steganalysis in [19]. Besides being able to detect the presence of a secret
message in the binary image, this analysis reveals the type of steganographic
method used to produce the stego image. This information is crucial and
serves as an additional secret parameter that can narrow the scope of analysis.
Thus, our multi-class steganalysis can be considered to extend blind
steganalysis to a more involved level of analysis.
3. Message length estimation. Information such as the length of an embedded
message is important. In this thesis, we proposed a technique for estimating
the length of a message embedded in a binary image. Specifically, in this
work, our technique attacks the steganographic method developed by Liang
et al. in [69]. This type of analysis is normally considered targeted steganal-
ysis, which plays an important role at other levels of analysis (i.e., retrieval
of the stegokey).
4. Steganographic payload location identification. In general, the only evidence
needed to break a steganographic scheme is verification that a secret message
exists in the image. However, this does not provide enough information for
us to locate the secret message. We developed a technique for identifying
the steganographic payload locations, based on multiple stego images. Our
technique can reveal which pixels in the binary image have been used to
carry the secret message bits.
Finally, we revisited some of the existing blind steganalysis techniques for analysing
JPEG images. We combined several types of features and applied a feature selec-
tion technique for the analysis, which not only improves the detection accuracy,
but also reduces the computational resources. We showed that an enhancement
can be obtained by minimising the influence of image content. In other words, we
increased the feature sensitivity with respect to the differences caused by stegano-
graphic artefacts, rather than the image content.
Even though this thesis is formulated as an attack on binary image steganography,
we hope that it will contribute to the design of a more secure steganographic
method. More precisely, the analysis presented in this thesis can be used to
evaluate and measure the security level of a steganographic method, instead of
using conventional measurements, such as PSNR.
10.2 Future Research Directions
Modern blind steganalysis techniques are not universal in the sense that their ef-
fectiveness depends very much on both the type of cover images and the stegano-
graphic methods used. For example, effective blind steganalysis for JPEG image
steganography will not be as effective when applied to a spatial domain image.
Hence, future work should focus on constructing a real universal steganalysis tech-
nique.
For multi-class steganalysis, the performance drops when the number of different
steganographic methods used to train the classifier increases. This happens be-
cause the feature set may not be optimal. Another reason is that a similarity in
the embedding algorithms might exist across different steganographic methods,
making them difficult to identify and differentiate. Thus, a more effective and dis-
criminating feature set should be developed. In addition, a better strategy should
be found and employed for constructing the multi-class steganalysis.
The identification of payload locations in this thesis has been simplified to rep-
resent a generic case. This can be seen in our experiments where a generic LSB
replacement steganography is used. A more challenging environment can be set up
to include other, more complicated steganographic schemes, such as steganography
with adaptive embedding functionality.
Even though our steganographic payload identification technique can identify the
locations with high accuracy, unfortunately, we cannot retrieve meaningful content
of the secret message because what we have is a randomly scattered collection of
message bits. We need to re-order them into the correct sequence to extract the
message. Obviously, a more complete analysis, involving correct sequence
extraction, should be undertaken. To gain further insight into this problem, we
should examine the retrieval of the stegokey. Unfortunately, this area has been
scarcely studied, except for the material published in [41, 42].
Bibliography
[1] B. Anckaert, B. D. Sutter, D. Chanet, and K. D. Bosschere. Steganography for Executables and Code Transformation Signatures. 7th International Conference on Information Security and Cryptology, 3506:425–439, 2004.
[2] R. J. Anderson. Stretching the Limits of Steganography. 1st International Workshop on Information Hiding, 1174:39–48, 1996.
[3] R. J. Anderson and F. A. P. Petitcolas. On the limits of steganography. IEEE Journal of Selected Areas in Communications, 16(4):474–481, 1998.
[4] I. Avcibas, M. Nasir, and B. Sankur. Steganalysis based on image quality metrics. IEEE 4th Workshop on Multimedia Signal Processing, pages 517–522, 2001.
[5] S. Badura and S. Rymaszewski. Transform domain steganography in DVD video and audio content. IEEE International Workshop on Imaging Systems and Techniques, pages 1–5, 2007.
[6] J. D. Ballard, J. G. Hornik, and D. Mckenzie. Technological Facilitation of Terrorism: Definitional, Legal, and Policy Issues. American Behavioral Scientist, 45(6):989–1016, 2002.
[7] R. Bohme and A. Westfeld. Breaking cauchy model-based JPEG steganography with first order statistics. 9th European Symposium on Research in Computer Security, 3193:125–140, 2004.
[8] C. Cachin. An Information-Theoretic Model for Steganography. 2nd International Workshop on Information Hiding, 1525:306–318, 1998.
[9] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm, 2001.
[10] C.-C. Chang, C.-S. Tseng, and C.-C. Lin. Hiding data in binary images. 1st International Conference on Information Security Practice and Experience, 3439:338–349, 2005.
[11] S. Chatterjee and A. S. Hadi. Regression Analysis by Example. John Wiley and Sons, 4th edition, 2006.
[12] A. Cheddad, J. Condell, K. Curran, and P. McKevitt. Digital image steganography: Survey and analysis of current methods. Signal Processing, 90(3):727–752, 2010.
[13] C. Chen and Y. Q. Shi. JPEG Image Steganalysis Utilizing both Intrablock and Interblock Correlations. IEEE International Symposium on Circuits and Systems, pages 3029–3032, 2008.
[14] C. Chen, Y. Q. Shi, W. Chen, and G. Xuan. Statistical Moments Based Universal Steganalysis using JPEG 2-D Array and 2-D Characteristic Function. IEEE International Conference on Image Processing, pages 105–108, 2006.
[15] X. Chen, Y. Wang, T. Tan, and L. Guo. Blind Image Steganalysis Based on Statistical Analysis of Empirical Matrix. International Conference on Pattern Recognition, 3:1107–1110, 2006.
[16] Z. Chen, S. Haykin, J. J. Eggermont, and S. Becker. Correlative Learning: A Basis for Brain and Adaptive Systems. John Wiley and Sons, 2007.
[17] K. L. Chiew and J. Pieprzyk. Features-Pooling Blind JPEG Image Steganalysis. IEEE Conference on Digital Image Computing: Techniques and Applications, pages 96–103, 2008.
[18] K. L. Chiew and J. Pieprzyk. JPEG Image Steganalysis Improvement Via Image-to-Image Variation Minimization. International IEEE Conference on Advanced Computer Theory and Engineering, pages 223–227, 2008.
[19] K. L. Chiew and J. Pieprzyk. Binary Image Steganographic Techniques Classification Based on Multi-Class Steganalysis. 6th International Conference on Information Security, Practice and Experience, 6047:341–358, 2010.
[20] K. L. Chiew and J. Pieprzyk. Blind steganalysis: A countermeasure for binary image steganography. International Conference on Availability, Reliability and Security, pages 653–658, 2010.
[21] K. L. Chiew and J. Pieprzyk. Estimating Hidden Message Length in Binary Image Embedded by Using Boundary Pixels Steganography. International Conference on Availability, Reliability and Security, pages 683–688, 2010.
[22] K. L. Chiew and J. Pieprzyk. Identifying Steganographic Payload Location in Binary Image. 11th Pacific Rim Conference on Multimedia – Advances in Multimedia Information Processing, 6297:590–600, 2010.
[23] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[24] I. Cox, J. Kilian, F. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12):1673–1687, 1997.
[25] I. J. Cox, M. L. Miller, J. A. Bloom, J. Fridrich, and T. Kalker. Digital watermarking and steganography. The Morgan Kaufmann series in multimedia information and systems. Morgan Kaufmann Publishers, 2nd edition, 2008.
[26] N. Cvejic and T. Seppanen. Increasing robustness of LSB audio steganography by reduced distortion LSB coding. Journal of Universal Computer Science, 11(1):56–65, 2005.
[27] I. Davidson and G. Paul. Locating Secret Messages in Images. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 545–550, 2004.
[28] J. Davis, J. MacLean, and D. Dampier. Methods of information hiding and detection in file systems. 5th IEEE International Workshop on Systematic Approaches to Digital Forensic Engineering, pages 66–69, 2010.
[29] A. Delforouzi and M. Pooyan. Adaptive Digital Audio Steganography Based on Integer Wavelet Transform. 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pages 283–286, 2007.
[30] J. Dong and T. Tan. Blind image steganalysis based on run-length histogram analysis. 15th IEEE International Conference on Image Processing, pages 2064–2067, 2008.
[31] J. Dong, W. Wang, and T. Tan. Multi-class blind steganalysis based on image run-length analysis. 8th International Workshop on Digital Watermarking, 5703:199–210, 2009.
[32] H. Farid. Detecting Steganographic Messages in Digital Images. TR2001-412, Department of Computer Science, Dartmouth College, 2001.
[33] H. Farid. Detecting Hidden Messages Using Higher-Order Statistical Models. International Conference on Image Processing, 2:905–908, 2002.
[34] R. Fisher, S. Perkins, A. Walker, and E. Wolfart. Hypermedia Image Processing Reference. Available at http://homepages.inf.ed.ac.uk/rbf/HIPR2/spatdom.htm.
[35] J. Fridrich. Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes. 6th International Workshop on Information Hiding, 3200:67–81, 2004.
[36] J. Fridrich and M. Goljan. Digital image steganography using stochastic modulation. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents V, 5020(1):191–202, 2003.
[37] J. Fridrich and M. Goljan. On estimation of secret message length in LSB steganography in spatial domain. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:23–34, 2004.
[38] J. Fridrich, M. Goljan, and D. Hogea. Attacking the OutGuess. Proceedings of ACM: Special Session on Multimedia Security and Watermarking, 2002.
[39] J. Fridrich, M. Goljan, and D. Hogea. Steganalysis of JPEG Images: Breaking the F5 Algorithm. 5th International Workshop on Information Hiding, 2578:310–323, 2003.
[40] J. Fridrich, M. Goljan, D. Hogea, and D. Soukal. Quantitative steganalysis of digital images: estimating the secret message length. Multimedia Systems, 9(3):288–302, 2003.
[41] J. Fridrich, M. Goljan, and D. Soukal. Searching for the Stego-Key. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VI, 5306:70–82, 2004.
[42] J. Fridrich, M. Goljan, D. Soukal, and T. Holotyak. Forensic Steganalysis: Determining the Stego Key in Spatial Domain Steganography. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VII, 5681:631–642, 2005.
[43] D. Fu, Y. Q. Shi, and D. Zou. JPEG Steganalysis Using Empirical Transition Matrix in Block DCT Domain. International Workshop on Multimedia Signal Processing, 2006.
[44] M. Goljan, J. Fridrich, and T. Holotyak. New blind steganalysis and its implications. Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072, 2006.
[45] G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.
[46] D. Gong, F. Liu, B. Lu, P. Wang, and L. Ding. Hiding information in Java class file. International Symposium on Computer Science and Computational Technology, 2:160–164, 2008.
[47] P. Greenspun. Philip Greenspun. Available at http://philip.greenspun.com.
[48] J. Harmsen and W. Pearlman. Steganalysis of additive-noise modelable information hiding. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents V, 5020:131–142, 2003.
[49] J. He and J. Huang. Steganalysis of stochastic modulation steganography. Science in China Series F: Information Sciences, 49(3):273–285, 2006.
[50] J. He, J. Huang, and G. Qiu. A New Approach to Estimating Hidden Message Length in Stochastic Modulation Steganography. 4th International Workshop on Digital Watermarking, 3710:1–14, 2005.
[51] S. Hetzl and P. Mutzel. A Graph-Theoretic Approach to Steganography. 9th IFIP TC-6 TC-11 International Conference on Communications and Multimedia Security, 3677:119–128, 2005.
[52] M. Hogan. Security and Robustness Analysis of Data Hiding Techniques for Steganography. PhD thesis, University College Dublin, 2008.
[53] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13:415–425, 2002.
[54] F. Huang and J. Huang. Calibration based universal JPEG steganalysis. Science in China Series F: Information Sciences, 52(2):260–268, 2009.
[55] A. Jain and D. Zongker. Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), 1997.
[56] M. Jiang, N. Memon, E. Wong, and X. Wu. Quantitative steganalysis of binary images. IEEE International Conference on Image Processing, pages 29–32, 2004.
[57] M. Jiang, X. Wu, E. Wong, and A. Memon. Steganalysis of boundary-based steganography using autoregressive model of digital boundaries. IEEE International Conference on Multimedia and Expo, 2:883–886, 2004.
[58] L. Jodar, A. G. Law, A. Rezazadeh, J. H. Watson, and G. Wu. Computations for the Moore-Penrose and other generalized inverses. Congressus Numerantium, pages 57–64, 1991.
[59] N. F. Johnson and S. Jajodia. Exploring steganography: Seeing the unseen. Computer, 31(2):26–34, 1998.
[60] J. Kelley. Terror groups hide behind Web encryption. USA Today. Available at http://www.usatoday.com/tech/news/2001-02-05-binladen.htm, 2 May 2001.
[61] A. Ker. Steganalysis of LSB Matching in Grayscale Images. IEEE Signal Processing Letters, 12(6):441–444, 2005.
[62] A. D. Ker. Locating Steganographic Payload via WS Residuals. 10th ACM Workshop on Multimedia and Security, pages 27–32, 2008.
[63] A. D. Ker and R. Bohme. Revisiting weighted stego-image steganalysis. Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, 6819, 2008.
[64] A. D. Ker and I. Lubenko. Feature Reduction and Payload Location with WAM Steganalysis. Media Forensics and Security, 7254, 2009.
[65] M. Kharrazi, H. Sencar, and N. Memon. Benchmarking steganographic and steganalysis techniques. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VII, 5681:252–263, 2005.
[66] M. Kharrazi, H. Sencar, and N. Memon. Improving steganalysis by fusion techniques: a case study with image steganography. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072:51–58, 2006.
[67] C. Kim. Data hiding based on compressed dithering images. Advances in Intelligent Information and Database Systems, 283:89–98, 2010.
[68] X.-W. Kong, W.-F. Liu, and X.-G. You. Secret Message Location Steganalysis Based on Local Coherences of Hue. 6th Pacific-Rim Conference on Multimedia, 3768:301–311, 2005.
[69] G.-l. Liang, S.-z. Wang, and X.-p. Zhang. Steganography in binary image by checking data-carrying eligibility of boundary pixels. Journal of Shanghai University (English Edition), 11(3):272–277, 2007.
[70] Z. Liu, L. Ping, J. Chen, J. Wang, and X. Pan. Steganalysis based on differential statistics. 5th International Conference on Cryptology and Network Security, 4301:224–240, 2006.
[71] D.-C. Lou, C.-L. Liu, and C.-L. Lin. Message estimation for universal steganalysis using multi-classification support vector machine. Computer Standards & Interfaces, 31(2):420–427, 2009.
[72] J. Lukas and J. Fridrich. Estimation of primary quantization matrix in double compressed JPEG images. Proceedings on Digital Forensic Research Workshop, pages 5–8, 2003.
[73] S. Lyu and H. Farid. Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines. 5th International Workshop on Information Hiding, 2002.
[74] S. Lyu and H. Farid. Steganalysis using color wavelet statistics and one-class support vector machines. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:35–45, 2004.
[75] S. Lyu and H. Farid. Steganalysis Using Higher-Order Image Statistics. IEEE Transactions on Information Forensics and Security, 1(1):111–119, 2006.
[76] L. Marvel, C. G. Boncelet, Jr, and C. T. Retter. Spread Spectrum Image Steganography. IEEE Transactions on Image Processing, 8(8):1075–1083, 1999.
[77] Y.-y. Meng, B.-j. Gao, Q. Yuan, F.-g. Yu, and C.-f. Wang. A novel steganalysis of data hiding in binary text images. 11th IEEE Singapore International Conference on Communication Systems, pages 347–351, 2008.
[78] T. Morkel, J. H. P. Eloff, and M. S. Olivier. Using image steganography for decryptor distribution. OTM Confederated International Workshops, 4277:322–330, 2006.
[79] B. Morrison. Ex-USA Today reporter faked major stories. USA Today. Available at http://www.usatoday.com/news/2004-03-18-2004-03-18 kelleymain x.htm, 19 March 2004.
[81] H. Noda, T. Furuta, M. Niimi, and E. Kawaguchi. Video steganography based on bit-plane decomposition of wavelet-transformed video. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306(1):345–353, 2004.

[82] H.-K. Pan, Y.-Y. Chen, and Y.-C. Tseng. A secure data hiding scheme for two-color images. 5th IEEE Symposium on Computers and Communications, pages 750–755, 2000.

[83] H. Pang, K.-L. Tan, and X. Zhou. Steganographic schemes for file system and b-tree. IEEE Transactions on Knowledge and Data Engineering, 16:701–713, 2004.

[84] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn. Information hiding—a survey. Proceedings of the IEEE, 87(7):1062–1078, 1999.

[85] T. Pevny and J. Fridrich. Towards Multi-class Blind Steganalyzer for JPEG Images. International Workshop on Digital Watermarking, LNCS, 3710:39–53, 2005.

[86] T. Pevny and J. Fridrich. Determining the Stego Algorithm for JPEG Images. Special Issue of IEE Proceedings - Information Security, 153(3):75–139, 2006.

[87] T. Pevny and J. Fridrich. Multi-class blind steganalysis for JPEG images. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VIII, 6072(1):257–269, 2006.

[88] T. Pevny and J. Fridrich. Merging Markov and DCT features for multi-class JPEG steganalysis. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents IX, 6505(1):1–13, 2007.

[89] T. Pevny and J. Fridrich. Multi-Class Detector of Current Steganographic Methods for JPEG Format. IEEE Transactions on Information Forensics and Security, 3(4):635–650, 2008.

[90] N. Provos. Defending Against Statistical Steganalysis. Proceedings of the 10th USENIX Security Symposium, 10:323–335, 2001.

[91] N. Provos and P. Honeyman. Detecting Steganographic Content on the Internet. Proceedings of the Network and Distributed System Security Symposium, 2002.

[92] N. Provos and P. Honeyman. Hide and Seek: An Introduction to Steganography. IEEE Security & Privacy, 1(3):32–44, 2003.

[93] P. Pudil, J. Novovicova, and J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15(11):1119–1125, 1994.

[94] N. B. Puhan, A. T. S. Ho, and F. Sattar. High capacity data hiding in binary document images. 8th International Workshop on Digital Watermarking, 5703:149–161, 2009.
[95] B. Rodriguez and G. L. Peterson. Detecting steganography using multi-class classification. IFIP International Conference on Digital Forensics, 242:193–204, 2007.

[96] P. Sallee. Model-based steganography. 2nd International Workshop on Digital Watermarking, 2939:154–167, 2003.

[97] A. Savoldi and P. Gubian. Blind multi-class steganalysis system using wavelet statistics. 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2:93–96, 2007.

[98] G. Schaefer and M. Stich. UCID - An Uncompressed Colour Image Database. Proc. SPIE, Storage and Retrieval Methods and Applications for Multimedia, pages 472–480, 2004.

[99] Y. Q. Shi, C. Chen, and W. Chen. A Markov Process Based Approach to Effective Attacking JPEG Steganography. 8th International Workshop on Information Hiding, 4437:249–264, 2006.

[100] Y. Q. Shi, G. Xuan, C. Yang, J. Gao, Z. Zhang, P. Chai, D. Zou, C. Chen, and W. Chen. Effective Steganalysis Based on Statistical Moments of Wavelet Characteristic Function. International Conference on Information Technology: Coding and Computing, pages 768–773, 2005.

[101] Y. Q. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, and C. Chen. Image Steganalysis Based on Moments of Characteristic Functions Using Wavelet Decomposition, Prediction-Error Image, and Neural Network. IEEE International Conference on Multimedia and Expo, pages 269–272, 2005.

[102] K. Sullivan, U. Madhow, S. Chandrasekaran, and B. Manjunath. Steganalysis for Markov Cover Data With Applications to Images. IEEE Transactions on Information Forensics and Security, 1(2):275–287, 2006.

[103] P. S. Tibbetts. Terrorist Use of the Internet And Related Information Technologies. Monograph, School of Advanced Military Studies, Fort Leavenworth, 2002.

[104] U. Topkara, M. Topkara, and M. J. Atallah. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. 8th Workshop on Multimedia and Security, pages 164–174, 2006.

[105] S. Trivedi and R. Chandramouli. Locally Most Powerful Detector for Secret Key Estimation in Spread Spectrum Image Steganography. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:1–12, 2004.

[106] S. Trivedi and R. Chandramouli. Secret Key Estimation in Sequential Steganography. IEEE Transactions on Signal Processing, 53(2):746–757, 2005.
[107] Y.-C. Tseng and H.-K. Pan. Secure and invisible data hiding in 2-color images. 20th Annual Joint Conference of the IEEE Computer and Communications Societies, 2:887–896, 2001.

[108] P. Wang, F. Liu, G. Wang, Y. Sun, and D. Gong. Multi-class steganalysis for JPEG stego algorithms. 15th IEEE International Conference on Image Processing, pages 2076–2079, 2008.

[109] Y. Wang and P. Moulin. Optimized feature extraction for learning-based image steganalysis. IEEE Transactions on Information Forensics and Security, 2(1), 2007.

[110] A. Westfeld. F5 - A Steganographic Algorithm. 4th International Workshop on Information Hiding, 2137:289–302, 2001.

[111] A. Westfeld and A. Pfitzmann. Attacks on Steganographic Systems. 3rd International Workshop on Information Hiding, 1768:61–76, 2000.

[112] M. Wu and B. Liu. Data hiding in binary image for authentication and annotation. IEEE Transactions on Multimedia, 6(4):528–538, 2004.

[113] M. Y. Wu and J. H. Lee. A Novel Data Embedding Method for Two-Color Facsimile Images. International Symposium on Multimedia Information Processing, 1998.

[114] W. Xinli, F. Albregtsen, and B. Foyn. Texture features from gray level gap length matrix. Proceedings of IAPR Workshop on Machine Vision Applications, pages 375–378, 1994.

[115] G. Xuan, Y. Q. Shi, J. Gao, D. Zou, C. Yang, Z. Zhang, P. Chai, C. Chen, and W. Chen. Steganalysis Based on Multiple Features Formed by Statistical Moments of Wavelet Characteristic Functions. 7th International Workshop on Information Hiding, 3727:262–277, 2005.

[116] G. Xuan, Y. Q. Shi, C. Huang, D. Fu, X. Zhu, P. Chai, and J. Gao. Steganalysis Using High-Dimensional Features Derived from Co-occurrence Matrix and Class-Wise Non-Principal Components Analysis (CNPCA). 5th International Workshop on Digital Watermarking, 4283:49–60, 2006.

[117] C.-Y. Yang. Color image steganography based on module substitutions. 3rd IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2:118–121, 2007.

[118] L. Yuling, S. Xingming, G. Can, and W. Hong. An Efficient Linguistic Steganography for Chinese Text. IEEE International Conference on Multimedia and Expo, pages 2094–2097, 2007.

[119] G. Zhang. Neural networks for classification: a survey. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 30(4):451–462, 2000.
[120] H. Zong, F. Liu, and X. Luo. A wavelet-based blind JPEG image steganalysis using co-occurrence matrix. 11th International Conference on Advanced Communication Technology, 3:1933–1936, 2009.

[121] D. Zou, Y. Q. Shi, W. Su, and G. Xuan. Steganalysis Based on Markov Model of Thresholded Prediction-error Image. IEEE International Conference on Multimedia and Expo, pages 1365–1368, 2006.
Index
512-pattern histogram, 80
AUR, 59, 108
backpropagation, 13
bit per pixel, 10, 14
bitmap, 16
Blind steganalysis, 8, 10
BMP, 16
bpp, 10, 14
center of mass, 20
CF, 107
characteristic function, 20, 107
classification, 10
COM, 20
cover image, 7
cumulative sum, 41
curse of dimensionality, 19
CUSUM, 41
DCT, 28
differential operation, 24
differential statistics, 24
digitisation, 14
discrete Fourier transform, 20
embedding operation, 6
extraction operation, 6
F5, 7, 33
feedforward, 13
Fisher linear discriminant, 12
generalized Gaussian distribution, 35
GGD, 35
GIF, 16
GLCM, 67
grey level co-occurrence, 67
histogram difference, 82
histogram quotient, 85
image calibration, 26
JBIG2, 36
JPEG, 16
LAHD, 23
least significant bit, 7
LMP, 41
local angular harmonic decomposition, 23
locally most powerful, 41
LSB, 7
machine learning, 10
mode, 25, 29
Model-based steganography, 48
multivariate regression, 12
Neural networks, 13
OutGuess, 7
pairs of values, 33
pattern recognition, 10
pixel, 14
PMF, 26
PNG, 17
PoV, 33
prediction-error, 21, 26
PRNG, 7, 40, 94
probability density function, 21
probability mass function, 26
processing element, 13
pseudorandom number generator, 7, 40
QMF, 23
quadrature mirror filters, 23
radial basis function, 74
random embedding, 7
raster graphic, 15
RBF, 74
Receiver Operating Characteristic, 59
ROC, 59
separating hyperplane, 14
separating line, 13
sequential embedding, 7
sequential forward floating selection, 108
SFFS, 108
steganalysis, 6
steganographic, 6
stego image, 7
stegokey, 7
supervised learning, 11
support vector machine, 13, 31
SVM, 13
Targeted steganalysis, 8
TIFF, 16
vector graphic, 15
WAM, 39
wavelet absolute moments, 39
weighted stego image, 38
WS, 38