STEGANALYSIS OF BINARY IMAGES
This thesis is presented for the degree of
DOCTOR OF PHILOSOPHY
by
KANG LENG CHIEW
Department of Computing
Faculty of Science
MACQUARIE UNIVERSITY
Australia
June 2011
© 2011 KANG LENG CHIEW
TABLE OF CONTENTS
Page
LIST OF FIGURES iv
LIST OF TABLES vi
ABSTRACT vii
LIST OF PUBLICATIONS x
ACKNOWLEDGMENTS xi
1 Introduction 1
1.1 Motivations . . . . . . 2
1.2 Research Problems . . . . . . 3
1.3 Objectives . . . . . . 4
1.4 Research Overview . . . . . . 5
1.4.1 Contributions . . . . . . 5
1.4.2 Organisation of the Thesis . . . . . . 6
2 Background and Concepts 9
2.1 Overview of Steganography . . . . . . 9
2.2 Steganalysis—Model of Adversary . . . . . . 11
2.3 Level of Analysis . . . . . . 13
2.4 Blind Steganalysis as Pattern Recognition . . . . . . 14
2.4.1 Feature Extraction . . . . . . 15
2.4.2 Classification . . . . . . 16
2.5 Digital Images . . . . . . 19
2.5.1 Image File Formats . . . . . . 20
2.5.2 Spatial and Frequency Domain Images . . . . . . 21
3 Literature Review 22
3.1 Steganography . . . . . . 22
3.1.1 Liang et al. Binary Image Steganography . . . . . . 22
3.1.2 Pan et al. Binary Image Steganography . . . . . . 25
3.1.3 Tseng and Pan Binary Image Steganography . . . . . . 26
3.1.4 Chang et al. Binary Image Steganography . . . . . . 27
3.1.5 Wu and Liu Binary Image Steganography . . . . . . 28
3.1.6 F5 Steganography . . . . . . 28
3.1.7 OutGuess Steganography . . . . . . 29
3.1.8 Model-Based Steganography . . . . . . 30
3.2 Steganalysis . . . . . . 31
3.2.1 Differentiation of Cover and Stego Images . . . . . . 31
3.2.2 Classification of Steganographic Methods . . . . . . 41
3.2.3 Estimation of Message Length . . . . . . 47
3.2.4 Identification of Stego-Bearing Pixels . . . . . . 52
3.2.5 Retrieval of Stegokey . . . . . . 56
3.2.6 Extracting the Hidden Message . . . . . . 58
4 Blind Steganalysis 59
4.1 Comparison of the Steganography Methods under Analysis . . . . . . 60
4.2 Proposed Steganalysis Method . . . . . . 61
4.2.1 Grey Level Run Length Matrix . . . . . . 62
4.2.2 Pixel Differences . . . . . . 62
4.2.3 GLRL Matrix from the Pixel Difference . . . . . . 63
4.2.4 GLGL Matrix . . . . . . 64
4.2.5 Final Feature Sets . . . . . . 65
4.3 Experimental Results . . . . . . 67
4.3.1 Experimental Setup . . . . . . 67
4.3.2 Results Comparison . . . . . . 67
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Multi-Class Steganalysis 70
5.1 Summary of the Steganographic Methods under Analysis . . . . . . 71
5.2 Proposed Steganalysis . . . . . . 72
5.2.1 Increasing the Grey Level via the Pixel Difference . . . . . . 73
5.2.2 Grey Level Run Length Matrix . . . . . . 75
5.2.3 Grey Level Co-Occurrence Matrix . . . . . . 75
5.2.4 Cover Image Estimation . . . . . . 76
5.2.5 Final Feature Sets . . . . . . 77
5.3 Multi-Class Classification . . . . . . 79
5.4 Experimental Results . . . . . . 81
5.4.1 Experimental Setup . . . . . . 81
5.4.2 Results Comparison . . . . . . 82
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6 Hidden Message Length Estimation 86
6.1 Boundary Pixel Steganography . . . . . . 87
6.2 Proposed Method . . . . . . 88
6.2.1 512-Pattern Histogram as the Distinguishing Statistic . . . . . . 88
6.2.2 Matrix Right Division . . . . . . 91
6.2.3 Message Length Estimation . . . . . . 93
6.3 Experimental Results . . . . . . 94
6.3.1 Experimental Setup . . . . . . 94
6.3.2 Results of the Estimation . . . . . . 95
6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7 Steganographic Payload Location Identification 98
7.1 Background . . . . . . 99
7.2 Motivation and Challenges . . . . . . 99
7.3 Proposed Stego-Bearing Pixel Location Identification . . . . . . 101
7.4 Experimental Results . . . . . . 103
7.4.1 Experimental Setup . . . . . . 103
7.4.2 Results Comparison . . . . . . 104
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8 Feature-Pooling Blind JPEG Image Steganalysis 109
8.1 Feature Extraction Techniques . . . . . . 109
8.1.1 Image Quality Metrics . . . . . . 110
8.1.2 Moment of Wavelet Decomposition . . . . . . 110
8.1.3 Feature-Based . . . . . . 111
8.1.4 Moment of CF of PDF . . . . . . 112
8.2 Features-Pooling Steganalysis . . . . . . 113
8.2.1 Feature Selection in Feature-Based Method . . . . . . 113
8.2.2 Feature-Pooling . . . . . . 114
8.3 Experimental Results . . . . . . 116
8.3.1 Classifier Selection . . . . . . 116
8.3.2 Results Comparison . . . . . . 118
8.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
9 Improving JPEG Image Steganalysis 121
9.1 Steganography as Additive Noise . . . . . . 121
9.2 Image-to-Image Variation Minimisation . . . . . . 122
9.3 Steganalysis Improvement . . . . . . 125
9.3.1 Moments of Wavelet Decomposition . . . . . . 125
9.3.2 Moment of CF of PDF . . . . . . 126
9.3.3 Moment of CF of Wavelet Subbands . . . . . . 126
9.4 Experimental Results . . . . . . 127
9.4.1 Experimental Setup . . . . . . 127
9.4.2 Results Comparison . . . . . . 128
9.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
10 Conclusions and Future Research Directions 131
10.1 Summary . . . . . . 131
10.2 Future Research Directions . . . . . . 132
Bibliography 134
LIST OF FIGURES
Page
1.1 Overview of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 General model of steganography . . . . . . 10
2.2 General framework of blind steganalysis . . . . . . 15
2.3 Two-class SVM classification . . . . . . 18
3.1 Example of eligible pixels . . . . . . 24
3.2 Example of ineligible pixels . . . . . . 24
3.3 Effect of flipping a pixel . . . . . . 26
3.4 Measurement of smoothness and connectivity . . . . . . 29
3.5 Algorithm of model-based steganography . . . . . . 31
3.6 Co-occurrence matrices extracted from cover and stego images . . . . . . 33
3.7 Illustration of wavelet decomposition . . . . . . 37
3.8 Intra- and inter-block correlations in a JPEG image . . . . . . 39
3.9 The 64 modes of an 8×8 DCT block . . . . . . 43
3.10 Modified image calibration for double compressed JPEG image . . . . . . 44
3.11 One-against-one approach for a multi-class classification . . . . . . 46
3.12 A portion of image histogram before and after LSB embedding . . . . . . 48
3.13 The boundaries of 8×8 blocks . . . . . . 49
3.14 The extraction of residual image . . . . . . 55
4.1 Detection results displayed in ROC curves and AUR . . . . . . . . 68
5.1 Pixel difference in vertical direction . . . . . . . . . . . . . . . . . . 73
6.1 Illustration of a boundary pixel . . . . . . 89
6.2 Examples of 512 patterns . . . . . . 89
6.3 Comparison of patterns histogram between cover and stego images . . . . . . 90
6.4 Histogram difference between two binary images . . . . . . 92
6.5 Histogram quotient with increasing message length . . . . . . 94
6.6 Estimated length of hidden messages for all binary images . . . . . . 95
6.7 Example of a highly distorted stego image . . . . . . 96
6.8 Estimation error of hidden message length for all binary images . . . . . . 97
7.1 Identification results for different window sizes . . . . . . 104
7.2 Comparison of results for image Database A . . . . . . 107
7.3 Comparison of results for image Database B . . . . . . 107
7.4 Comparison of results for image Database C . . . . . . 107
8.1 Features comparison in detecting F5 . . . . . . . . . . . . . . . . . 114
8.2 Features comparison in detecting OutGuess . . . . . . 115
8.3 Features comparison in detecting MB1 . . . . . . 115
8.4 Classifier comparison in detecting F5 . . . . . . 117
8.5 Classifier comparison in detecting OutGuess . . . . . . 117
8.6 Classifier comparison in detecting MB1 . . . . . . 118
8.7 Comparison of steganalysis performance in detecting F5 . . . . . . 119
8.8 Comparison of steganalysis performance in detecting OutGuess . . . . . . 120
8.9 Comparison of steganalysis performance in detecting MB1 . . . . . . 120
9.1 Two images with their respective underlying statistics . . . . . . 123
9.2 Transformed image by scaling and cropping . . . . . . 124
LIST OF TABLES
Page
4.1 Comparison of the steganographic techniques . . . . . . 61
4.2 Summary of the 68-dimensional feature space . . . . . . 66
4.3 Experimental parameters . . . . . . 67
5.1 Properties of features . . . . . . 79
5.2 Example of majority-voting strategy for multi-class SVM . . . . . . 80
5.3 Summary of image databases . . . . . . 81
5.4 Summary of stego image databases . . . . . . 82
5.5 Confusion matrix for the textual database . . . . . . 84
5.6 Confusion matrix for the mixture database . . . . . . 85
5.7 Confusion matrix for the scene database . . . . . . 85
6.1 Mean and standard deviation of the estimation . . . . . . . . . . . 96
7.1 Summary of image databases . . . . . . 103
7.2 The accuracy of the identification for image Database A . . . . . . 105
7.3 The accuracy of the identification for image Database B . . . . . . 105
7.4 The accuracy of the identification for image Database C . . . . . . 106
8.1 Feature selection comparison for SFFS, T-test and Bhattacharyya . 114
9.1 Comparison for the proposed technique and the Farid technique . . . . . . 128
9.2 Comparison for the proposed technique and the COM technique . . . . . . 129
9.3 Comparison for the proposed technique and the MW technique . . . . . . 129
ABSTRACT
Steganography is the science of hiding messages in multimedia documents. A
message can be hidden in a document only if the content of the document is
highly redundant. Although the embedded message changes the characteristics
and nature of the document, these changes must be difficult for an unsuspecting
user to identify. On the other hand, steganalysis develops theories,
methods and techniques that can be used to detect hidden messages in multi-
media documents. The documents without any hidden messages are called cover
documents and the documents with hidden messages are named stego documents.
The work of this thesis concentrates on image steganalysis. We present four differ-
ent types of steganalysis techniques. These steganalysis techniques are developed
to counteract the steganographic methods that use binary (black and white) im-
ages as the cover media. Unlike greyscale and colour images, binary images have
a rather modest statistical nature. This makes it difficult to apply existing
steganalysis techniques directly to binary images.
The first steganalysis technique addresses blind steganalysis. Its objective is to
detect the existence of a secret message in a binary image. Since the detection of
a secret message is often modelled as a classification problem, it can
be approached using pattern recognition methodology.
The second steganalysis technique is known as multi-class steganalysis. Its purpose
is to identify the type of steganographic method used to create the stego image.
This extends the earlier blind steganalysis from two-class (cover or stego image) to
multi-class (cover or different types of stego images) classification. Similar to blind
steganalysis, this technique is also based on the pattern recognition methodology
to perform the classification.
The third steganalysis technique uses a first-order statistic—the binary pattern
histogram—to estimate the length of an embedded message. This technique is
used specifically to analyse the steganography developed by Liang et al. The es-
timated message length usually plays an important role and is needed at other
levels of analysis.
The fourth steganalysis technique identifies the steganographic payload locations
based on multiple stego images. This technique can reveal which pixels in the
binary image carry the message bits. This technique is crucial as it not only
reveals the existence of a hidden message but also provides information to locate
the hidden message.
Finally, we propose two improvements to existing JPEG image steganalysis. We
combine several feature sets and apply a feature selection technique to obtain
a set of powerful features. We show that by minimising the influence of image
content, we can improve the features' sensitivity with respect to steganographic
alteration.
STATEMENT OF CANDIDATE
I certify that the work in this thesis entitled “STEGANALYSIS OF BINARY
IMAGES” has not previously been submitted for a degree nor has it been sub-
mitted as part of the requirements for a degree to any other university or institution
other than Macquarie University.
I also certify that the thesis is an original piece of research and it has been written
by me. Any help and assistance that I have received in my research work and the
preparation of the thesis itself have been appropriately acknowledged.
In addition, I certify that all information sources and literature used are indicated
in the thesis.
KANG LENG CHIEW
(41375521)
8 June 2011
LIST OF PUBLICATIONS
1. K. L. Chiew and J. Pieprzyk. Features-Pooling Blind JPEG Image Ste-
ganalysis. IEEE Conference on Digital Image Computing: Techniques and
Applications, 96–103, 2008.
2. K. L. Chiew and J. Pieprzyk. JPEG Image Steganalysis Improvement via
Image-to-image Variation Minimization. International IEEE Conference on
Advanced Computer Theory and Engineering, 223–227, 2008.
3. K. L. Chiew and J. Pieprzyk. Estimating Hidden Message Length in Binary
Image Embedded by Using Boundary Pixels Steganography. International
Conference on Availability, Reliability and Security, 683–688, 2010.
4. K. L. Chiew and J. Pieprzyk. Blind Steganalysis: A Countermeasure for
Binary Image Steganography. International Conference on Availability, Re-
liability and Security, 653–658, 2010.
5. K. L. Chiew and J. Pieprzyk. Binary Image Steganographic Techniques Clas-
sification Based on Multi-Class Steganalysis. 6th International Conference on
Information Security, Practice and Experience, 6047:341–358, 2010.
6. K. L. Chiew and J. Pieprzyk. Identifying Steganographic Payload Location
in Binary Image. 11th Pacific Rim Conference on Multimedia—Advances in
Multimedia Information Processing, 6297:590–600, 2010.
ACKNOWLEDGMENTS
I would like to express my sincere appreciation to my supervisor, Professor Josef
Pieprzyk for his countless help, assistance and guidance in every stage of my
research. I have benefited a lot from the valuable discussion with him since the
very beginning of my research.
I would also like to express my gratitude and special thanks to Dr. Scott McCallum
for being so patient and inspiring in guiding my academic writing skills. The
interaction with him has tremendously improved my understanding of academic
writing.
I want to take this opportunity to thank the Ministry of Higher Education Malaysia
and Universiti Malaysia Sarawak for providing me with the SLAI scholarship for my
research. I am also very grateful for the HDR Project Support Funds provided
by Macquarie University.
Very special thanks to Joan for spending valuable time to proof-read my thesis.
I would like to thank Nana who always provides me with valuable information,
hints and updates related to my research. I would also like to thank Gaurav for
the enjoyable discussions and interactions. To all the staff in the Department of
Computing, their excellent support is highly appreciated.
Thanks to my parents, brother, sister and brother-in-law for their continuous sup-
port, encouragement and motivation throughout the years.
I am so grateful to my wife for her love, thoughtful comments, support and
nurturing in all aspects. Her advice and encouragement have always been my
point of reference whenever I am lost. These years would have been much tougher
without her company.
And finally, to all the people who have helped directly and indirectly to support
me throughout this undertaking, thank you.
This thesis was edited by Dr Lisa Lines, and editorial intervention was restricted
to Standards D and E of the Australian Standards for Editing Practice.
Chapter 1
Introduction
The process of sending messages between two parties through a public channel
in such a way as to prevent the adversary from realising the existence of the
communication is known as steganography. Tracing back to antiquity, Histiaeus
shaved a slave's head, wrote a message on his scalp and, once the slave's hair had
grown back, sent him as a messenger to convey the steganographic content [12]. The
Greeks received warning about the intention of invasion by Xerxes from a message
underneath a writing tablet covered by wax [3, 84]. In more recent history,
invisible ink was used as a form of steganography during World War II [12, 59] to
establish covert communication.
An application of steganography was reported in the literature around the 1980s,
when British Prime Minister Margaret Thatcher had word processors programmed
to encode each user's identity in the word spacing, in order to trace the disloyal
ministers responsible for the leaks of cabinet documents [2, 3].
The ongoing development of computer and network technologies provides an excel-
lent new channel for steganography. Most digital documents contain redundancy.
This means that there are parts of documents that can be modified without an
impact on their quality. The redundant parts of a document can be identified in
many distinct ways. Consider an image. Typically, margins of the image do not
convey any significant information and they can be used to hide a secret message.
Also, some pixels of the image can be modified to carry a small number of secret
bits, as small modifications (e.g., to the least significant bits of pixels) will not be noticeable
to an unsuspecting user. As the redundant parts of a digital document can be
determined in a variety of ways, many steganographic methods can be developed.
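The least-significant-bit idea mentioned above can be sketched in a few lines. This is a generic illustration of LSB embedding, not one of the specific schemes analysed in this thesis, and the pixel values and message bits are invented for the example.

```python
def embed_lsb(pixels, bits):
    """Hide each message bit in the least significant bit of a pixel value."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego

def extract_lsb(pixels, n_bits):
    """Recover the hidden bits by reading each pixel's LSB."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [200, 201, 98, 130, 47, 255, 16, 77]  # hypothetical 8-bit greyscale values
message = [1, 0, 1, 1]
stego = embed_lsb(cover, message)
assert extract_lsb(stego, len(message)) == message
assert all(abs(c - s) <= 1 for c, s in zip(cover, stego))  # each pixel changes by at most 1
```

Because each pixel value changes by at most one grey level, the modification is imperceptible in a typical greyscale image; in a binary image, by contrast, flipping a pixel turns black into white, which is exactly why binary image steganography requires a more careful choice of pixels.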
Steganography mainly considers methods and techniques that can create covert
communication channels for the unobtrusive transmission of information for
military purposes.
Steganography is also used for automatic monitoring of radio advertisements, in-
dexing of videomail (to embed comments) and medical imaging (to embed infor-
mation like patient and physician names, DNA sequences and other particulars)
[3]. Other applications include: smart video-audio synchronization, secure and
invisible storage of confidential information, identity cards (to embed individuals’
details) and checksum embedding [12].
Steganography is also used for the less dramatic purpose of watermarking. The
applications of watermarking mainly involve the protection of intellectual property
such as ownership protection, file duplication management, document authentica-
tion (by inserting an appropriate digital signature) and file annotation.
1.1 Motivations
Like most other areas, steganography has thrived in the digital era. Many inter-
esting steganographic techniques have been created, and their continuing evolution
is guaranteed by a growing need for information security. Inevitably, these
techniques are potentially open to abuse and can be used by criminals and terrorists.
An article from USA Today stated that steganography was used by terrorists [60],
although there was little evidence to substantiate this claim [79]. Nonetheless,
the 9/11 incident triggered immediate concern about the possibility that
steganography could be used in terrorist planning. In addition, several reports
from the literature stated that steganography has been suspected as a possible
means of covert communication and planning of terrorist attacks [6, 103, 52].
A training manual for the Mujahideen, which contains an exposition on image
steganography over the Internet, is also reported in Hogan's PhD thesis [52]. While
the use of steganography by terrorists initially appeared doubtful, it has since
become accepted as plausible and should be treated seriously.
In less drastic cases, those who wish to evade surveillance (e.g., people who have
reason to fear punishment for expressing sensitive political thoughts) can use
steganography. For example, the communication between members of a political
dissident organisation is usually under surveillance. The adversary (i.e., government
agencies) may arrest the dissidents if evidence of sensitive issues being
discussed and planned is found. Therefore, steganography may be the safest form
of communication between dissidents. There are a large number of steganographic
tools available as commercial software or freeware, which can be easily downloaded.
With these tools, accomplishing such activities will become even simpler¹. As a
result, this has created unique challenges for law enforcement agencies.
Digital media and information technology have developed rapidly and are ubiq-
uitous. Information is stored digitally and is abundant. Specifically, there are
a multitude of daily tasks that involve dealing with documents. The originals
of these documents might be digital or they may be converted from hardcopies
into appropriate digital formats. In general, the majority of documents are binary
(black and white), which consist of foreground (black) and background (white).
Scanning such a document produces a binary image that can potentially be used as
a medium for steganography. This deserves a careful analysis.
Despite the importance and widespread use of binary images in steganography, this area
has received little attention, especially the steganalysis of binary image steganog-
raphy. More research is found on the more commonplace steganalysis of greyscale
and colour images; however, these techniques cannot be directly used to analyse
binary image steganography. Therefore, a more appropriate and effective set of
techniques should be developed.
1.2 Research Problems
In general, the steganalysis techniques can be categorised into six levels depending
on how much information about the hidden messages we require. These levels
(ordered according to the increasing amount of information acquired) are as follows:
❐ Differentiation between cover and stego documents—this is the first step in
steganalysis and the purpose of this technique is to determine if a given
document carries a hidden message.
❐ Identification of steganographic method—this technique identifies the type of
steganographic method used and it is the so-called multi-class steganalysis.
❐ Estimation of the length of a hidden message—this technique reveals the
amount of the embedded message.
❐ Identification of stego-bearing pixels—this technique uncovers the exact lo-
cations of the pixels used to carry the message bits.
❐ Retrieval of stegokey—this technique provides access to the stego-bearing
pixels as well as the embedding sequence.

❐ Message extraction—this technique normally involves extracting and deciphering
the hidden message to obtain a meaningful message.

¹A list of free steganographic tools can be found in the citation entry #25 given in [12].
1.3 Objectives
The main part of the thesis is steganalysis of information hiding techniques. The
task of steganalysis is to design an algorithm that can tell a cover document
apart from a copy of it that carries a hidden message. The larger part of the
steganalysis work published so far deals with greyscale and colour images. We
consider the less explored area of binary image steganography, which is becoming
increasingly important for electronic publishing, document distribution, the
management of printed documents and electronic libraries.
To summarise, our main objectives cover the following:
❐ To study techniques that can be applied to distinguish images containing
hidden secret messages from those without. Such techniques will serve as an
automated system for analysing a large number of images.
❐ To evaluate the functionality of the steganalysis technique across different
steganographic methods. In particular, we are going to investigate how the
steganalysis technique could be used to detect new and unknown stegano-
graphic methods.
❐ To investigate different types of binary image steganography. This is impor-
tant to gain an understanding of the internal mechanism used during the
embedding operation.
❐ To make contributions that will extend the steganalysis technique to extract
additional secret parameters. These secret parameters include hidden mes-
sage length, type of steganographic method used, locations of stego-bearing
pixels and secret key.
Note that there are two aspects of steganalysis. The first relates to the attempt
to break or attack a steganographic method; the second uses steganalysis as an
effective way of evaluating and measuring the security of steganography. This work studies
steganalysis in terms of the first aspect. In particular, we aim to carry out different
levels of analysis to extract the relevant secret parameters.
Figure 1.1: Overview of the thesis. The chapters fall into three parts: Background and Review (Chapters 2 and 3), Binary Image Steganalysis (Chapter 4: blind steganalysis; Chapter 5: multi-class steganalysis; Chapter 6: message length estimation; Chapter 7: payload location identification) and Steganalysis Enhancement (Chapter 8: feature-pooling steganalysis; Chapter 9: improving JPEG image steganalysis), framed by the Introduction (Chapter 1) and Conclusion (Chapter 10).
1.4 Research Overview
The general structure of the thesis is shown in Figure 1.1. The chapters can
be divided into the following three parts: background and review, binary image
steganalysis and steganalysis enhancement. The background and review part de-
scribes the main developments and concepts in steganography and its analysis. It
also describes the state of the art and major publications that have influenced the
research developments in the field. The binary image steganalysis part presents
techniques to counteract binary image steganography. The underlying ideas are
to employ statistical techniques to analyse the given images. The steganalysis
enhancement part provides improvement to some of the existing steganalysis tech-
niques that deal with JPEG images.
1.4.1 Contributions
The major contributions of this thesis are listed below.
❐ Blind steganalysis. We have developed a steganalysis technique to distin-
guish a stego image from a cover image. Mainly, we have broken several
steganographic methods from the literature. This technique uses an image
processing technique that extracts sensitive statistical data as the feature
set. From the feature set, it employs a classifier to determine the existence of
a secret message. In addition, this technique can be refined and used to detect
different types of steganographic methods. This property is important
when dealing with new and unknown steganographic methods.
❐ Multi-class steganalysis. We have extended our blind steganalysis to deter-
mine the type of steganographic method used to produce the stego image.
This is important information that allows an adversary to mount a more
specific attack. To the best of our knowledge, this is the first multi-class
steganalysis technique developed particularly to attack binary image
steganography.
❐ Message length estimation. We have designed a simple yet effective technique
based on a first-order statistic to estimate the length of an embedded message.
This estimation is crucial and normally is required if we intend to extract a
hidden message. We have identified that the notches and protrusions can be
utilised to approximate the degree of image distortion caused by the embedding
operation. In particular, this technique attacks the steganographic method
developed in [69].
❐ Steganographic payload location identification. We have presented a tech-
nique to identify the locations where hidden message bits are embedded.
This technique is one of the very few in the literature that is able to
extract additional secret information. This information is very important
for an adversary who wishes to remove a hidden message or deceive the
communicating parties.
❐ Enhancement of existing steganalysis techniques. We have proposed improvements
to existing JPEG image steganalysis. Specifically, we select and combine
several types of features from several existing steganalysis techniques
by using a feature selection technique to form a more powerful blind ste-
ganalysis. We have shown that the technique has improved the detection
accuracy and also reduced the computational resources. We also show that
by minimising the influence of image content, the detection accuracy can be
improved.
1.4.2 Organisation of the Thesis
The rest of the thesis is organised into nine chapters.
Chapter 2 introduces some background to explore the state-of-the-art techniques
studied in this work. Additionally, we introduce the fundamental concepts that
will be used in the following chapters. More precisely, this chapter gives short in-
troductions to the field, including the definitions, terms, synonyms and taxonomy.
Chapter 3 reviews the literature related to our work. We select several steganalysis
techniques that are going to be analysed in the thesis. To make the presentation as
meaningful as possible, the reviews are organised into different levels of analysis.
There is a myriad of possible steganographic methods available; however, we will
discuss only the methods selected for our analysis. Please refer to [12] for a
comprehensive review of steganography.
Our steganalysis starts from finding an algorithm that is able to distinguish a cover
image from a stego one. This work employs pattern recognition methodology to
perform the classification. Our focus is to extract a discriminative feature set to
enable accurate detection of the existence of secret messages. This analysis was
published in [20] and is presented in Chapter 4.
Chapter 5 discusses an algorithm for identification of a steganographic method
that has been used to embed a secret message into a binary image. We assume
that the collection of possible methods is known. The objective of this analysis is
twofold: to differentiate an image with a hidden message from one without and to
identify the type of steganographic method used. This analysis is an extension of
the work presented in Chapter 4 to form a more powerful multi-class steganalysis.
This work has been published in [19].
In Chapter 6, we present a technique for estimating the length of a hidden message
embedded in a binary image. This estimated length is one of the important secret
steganographic parameters and is usually required to accomplish further analysis,
such as retrieving the stegokey shared between the sender and receiver. The
technique presented in this chapter has also been published in [21].
The work done in the previous chapters so far has enabled us to discriminate
images with a hidden message from those without one. However, the ability to
discriminate images does not enable us to locate the hidden message. Therefore, we
wish to investigate the identification of the locations of hidden message bits in an image.
The work is based on the concept developed by Ker [62] where it is assumed that
we may access different stego images with message bits embedded in the same
locations. This assumption is possible when the same stegokey is reused for a
batch of secret communications. The essential difference is the medium under
analysis, namely the binary image, which is known to have modest statistical
characteristics. This work is presented in Chapter 7. An initial study of this
chapter has been published in [22].
Although the previous chapters focused primarily on binary image steganalysis,
we have also paid attention to the steganalysis in other image domains. Our
contribution to greyscale image steganalysis is supplementary, but is as important
as that of the other chapters and is presented in Chapters 8 and 9. This work
can be considered an adjunct to existing steganalysis techniques that contributes
some enhancements. The enhancements discussed in Chapters 8 and 9 have been
published in [17] and [18], respectively.
We conclude the thesis in Chapter 10 where we discuss possible future directions
for the research.
Chapter 2
Background and Concepts
This chapter introduces and defines the concepts used throughout this thesis and
provides relevant background information. We start by providing an overview
of steganography and a formal definition. We also provide a description of
its counterpart, namely steganalysis. We discuss different types of steganalysis,
which are referred to as different levels of analysis. For steganalysis that involves
classification, we dedicate a section that discusses different types of classifiers.
Finally, since this thesis focuses on the analysis of image steganography, we also
provide a description of a variety of common digital images used for steganography.
2.1 Overview of Steganography
Usually cryptography is used to protect a communication from eavesdropping.
Messages are encrypted and only a rightful recipient can decrypt and read the mes-
sages. However, encrypted messages are conspicuous, which might arouse the suspicion
of an eavesdropper. Consequently, the communication becomes susceptible to
attacks.
Steganography is an alternative method for privacy and security. Instead of en-
crypting, we can hide the messages in an innocuous-looking medium (a carrier) so
that their existence is not revealed. Whereas the goal of cryptography is to protect
the content of messages, the goal of steganography is to hide their existence. An
advantage of steganography is that it can be employed to secretly transmit mes-
sages without the fact of the transmission being discovered. Often, cryptography
and steganography are used together to achieve higher security.
Figure 2.1: General model of steganography
Steganography can be mathematically defined as follows:
Emb : C × M × K → S,
Ext : S × K → M, (2.1)
such that Emb(C,M,K) = S and Ext(S,K) = M . Emb and Ext are the embed-
ding and extraction mapping functions, respectively. C is the cover medium, S is
the medium embedded with message M and K denotes the key.
Figure 2.1 shows a simple representation of the generic embedding and extrac-
tion operation in steganography. During the embedding operation, a message is
inserted into the medium by altering some portion of it. The extraction oper-
ation involves the recovery of the message from the medium. In this example,
the message is embedded inside a carrier and is transmitted via a public channel
(e.g., the internet). At the receiving site, the message is extracted using the key
shared between the sender and receiver. The message is the hidden information
and can be plain text, cipher text, an image or anything that can be converted into
a stream of bits.
Consider a typical image steganographic scheme. In the embedding operation, a
secret message is transformed into a stream of bits, which is embedded into the
least significant bits (LSBs) of the image pixels. The embedding overwrites the
pixel LSB with the message bit if the pixel LSB and message bit do not match.
Otherwise, no changes are necessary. For the extraction operation, message bits
are retrieved from pixel LSBs and combined to form the secret message.
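The LSB mechanism described above can be sketched as follows. This is a minimal illustration of sequential LSB embedding and extraction, assuming an 8-bit greyscale cover held as a NumPy array; the function names are our own, not from any cited scheme.

```python
import numpy as np

def lsb_embed(cover, bits):
    """Embed message bits into pixel LSBs, scanning sequentially
    (left to right, top to bottom). A pixel value changes only when
    its LSB differs from the message bit."""
    stego = cover.copy().ravel()
    if len(bits) > stego.size:
        raise ValueError("message longer than image capacity")
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | b   # overwrite the LSB
    return stego.reshape(cover.shape)

def lsb_extract(stego, n_bits):
    """Recover the first n_bits message bits from pixel LSBs."""
    return [int(p) & 1 for p in stego.ravel()[:n_bits]]

cover = np.arange(16, dtype=np.uint8).reshape(4, 4)
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, message)
assert lsb_extract(stego, len(message)) == message
assert int((stego != cover).sum()) <= len(message)  # at most one change per bit
```

Note that the per-pixel distortion is at most one intensity level, which is why LSB embedding is visually imperceptible yet statistically detectable.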
There are two main selection algorithms that can be employed to embed secret
message bits: sequential and random. For sequential selection, the locations of
pixels used for embedding are selected sequentially—one after another. For in-
stance, pixels are selected from left to right and top to bottom until all message
bits are embedded. With random selection, the locations of the pixels used for
embedding are permuted and distributed over the whole image. The distribution
of the message bits is controlled by a pseudorandom number generator (PRNG)
whose seed is a secret shared by the sender and the receiver. This seed is also
called the stegokey.
The latter selection method provides better security than the former because ran-
dom selection scatters the image distortion over the whole image, which makes
it less perceptible. In addition, the complexity of tracing the selection path for
an adversary is increased when random selection is applied. Apart from this,
steganographic security can be enhanced by encrypting the secret message before
embedding it.
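The random-selection path can be illustrated with a seeded PRNG standing in for the stegokey. This is a sketch under the assumption that sender and receiver derive the permutation in the same way; the function name and parameters are ours.

```python
import numpy as np

def embedding_path(n_pixels, n_bits, stegokey):
    """Both parties seed the same PRNG with the shared stegokey and take
    the first n_bits positions of the resulting permutation, scattering
    the message bits over the whole image."""
    rng = np.random.default_rng(stegokey)
    return rng.permutation(n_pixels)[:n_bits]

sender = embedding_path(10_000, 64, stegokey=1234)
receiver = embedding_path(10_000, 64, stegokey=1234)
assert np.array_equal(sender, receiver)        # same key: identical path
adversary = embedding_path(10_000, 64, stegokey=9999)
assert not np.array_equal(sender, adversary)   # wrong key: different path
```

The final assertion is exactly the adversary's difficulty: without the stegokey, the embedding path cannot be reproduced.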
Almost any form of digital media can be used for steganographic purposes as long
as the information in the media has redundancy. These media can be classified
into (but are not limited to) the following categories: images, videos, audio, texts, executable
files and computer file systems [67, 94, 5, 81, 29, 26, 104, 118, 46, 1, 28, 83]. The
most common medium is an image, as the large redundancy of images allows easy
embedding of messages [78]. The input image used in the embedding operation
is called the cover image; the generated output image (with the secret message
embedded in it) is called the stego image. Ideally, the cover and stego images
should appear identical—it should be difficult for an unsuspecting user to tell
apart the stego image from the cover image.
A list of possible choices for cover images includes binary (black and white),
greyscale and colour images. Tseng and Pan [107] developed a steganography
that embeds a secret message in a binary image, and Liang et al. [69] used binary
images in their steganography. OutGuess [90] and F5 [110] are examples of
steganography applied to greyscale and colour images. A more recent stegano-
graphic method developed by Yang (see [117]) uses colour images.
2.2 Steganalysis—Model of Adversary
The invasive nature of steganography leaves detectable traces within the stego
image. This allows an adversary to use steganalysis techniques to reveal that a
secret communication is taking place. Sometimes, an adversary is also referred
to as a warden. In general, there are two types of warden: passive and active.
A passive warden only examines the communication and wishes to know if the
communication contains some hidden messages. The warden does not modify
the content of the communication. For example, the communication is allowed if
no evidence of a secret message is found. Otherwise, it is blocked. On the other
hand, an active warden may introduce distortion to interrupt and destroy the
communication even when there is no evidence of secret communication. Most
current steganographic methods are designed for the passive warden scenario.
Without loss of generality, we will use the term adversary instead of warden in all
the following steganalysis scenarios.
Besides the warden scenario discussed above, sometimes an adversary may not
have the authority or resources to block the communication. Then, the adversary
might wish to acquire related secret information (parameters) or even to extract
the secret message. Note that our work is based on this type of adversary, who
wants to extract information about a secret message. We will discuss this at length
in the next section.
In general, there are two types of steganalysis: targeted and blind. Targeted ste-
ganalysis is designed to attack one particular embedding algorithm. For example,
the work in [7, 49, 57, 42] is considered targeted steganalysis. Targeted steganal-
ysis can produce more accurate results, but it normally fails if the embedding
algorithm used is not the target.
Blind steganalysis can be considered a universal technique for detecting different
types of steganography. Because blind steganalysis can detect a wider class of
steganographic techniques, it is generally less accurate; however, blind steganalysis
can detect new steganographic techniques where there is no targeted steganalysis
available yet. In other words, blind steganalysis is an irreplaceable detection tool
if the embedding algorithm is unknown or secret. The feature-based steganalysis
developed in [35] is one example of successful blind steganalysis. Other examples
are to be found in [99, 70].
The most widely used definition of steganographic security is based on Cachin's
scheme [8]. Let the distributions of cover images and stego images be denoted
P_C and P_S, respectively. Cachin defined steganographic security by comparing the
two distributions, P_C and P_S. The comparison can be made using the Kullback-Leibler
distance, defined as follows:
D(P_C ‖ P_S) = Σ_{c ∈ C} P_C(c) log [P_C(c) / P_S(c)]. (2.2)
When D(P_C ‖ P_S) = 0, the distribution of stego images, P_S, is identical
to the distribution of cover images, P_C. This implies that the steganography is per-
fectly secure, because it is impossible for the adversary to distinguish between
cover and stego images. If D(P_C ‖ P_S) ≤ ε, then Cachin defined the steganogra-
phy as ε-secure. Thus, the smaller ε is, the greater the likelihood that a covert
communication (i.e., steganography) will not be detected.
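Equation (2.2) can be computed directly on empirical histograms. The sketch below is ours; it smooths the distributions with a small constant so the logarithm is defined even for empty bins.

```python
import numpy as np

def kl_distance(p_c, p_s, eps=1e-12):
    """D(P_C || P_S) of Equation (2.2) for discrete distributions
    given as (possibly unnormalised) histograms."""
    p_c = np.asarray(p_c, dtype=float) + eps
    p_s = np.asarray(p_s, dtype=float) + eps
    p_c /= p_c.sum()
    p_s /= p_s.sum()
    return float(np.sum(p_c * np.log(p_c / p_s)))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]
assert kl_distance(uniform, uniform) < 1e-9   # identical: perfectly secure
assert kl_distance(skewed, uniform) > 0.1     # measurable deviation: detectable
```

In practice P_C and P_S are unknown and must be estimated from samples, so this quantity serves as a theoretical benchmark rather than a direct detector.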
As discussed in [25], another possible way to define steganography security is based
on a specific steganalysis technique. Alternatively, one could define the security
with respect to the inability of an adversary to prove the existence of covert
communication. In other words, a steganographic method may be considered
“practically secure” if no existing steganalysis technique can be used to mount a
successful attack.
2.3 Level of Analysis
Under ideal circumstances, an adversary applying steganalysis intends to extract
the full hidden information. This task can be very difficult, or even impossible
to achieve. Thus, the adversary may start steganalysis with more realistic and
modest goals in mind, such as restricting the effort to differentiating cover and
stego images, classifying the embedding technique, estimating the length of hidden
messages, identifying the locations where bits of hidden information are embedded
and retrieving the stegokey. Achieving some of these goals allows improvement of
the steganalysis, making it more effective and appropriate for the steganographic
method.
The first step in analysing steganography can be distinguishing cover from stego
images. This involves analysing the characteristics of the image and looking for
the evidence of abnormalities. This step is plausible because the embedding op-
eration will distort the image content and produce deviations from normal image
characteristics. For example, the first-order statistic of a stego image tends to
exhibit histogram bin pairing, an abnormal characteristic that practically never
occurs in a cover image. This analysis is commonly known as the most basic level
of blind steganalysis.
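The bin-pairing effect can be demonstrated on synthetic data: full-rate LSB embedding of random message bits evens out the counts within each pair of intensity values (2i, 2i+1). The cover below is artificial, constructed only to make the effect visible, and is not drawn from real images.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 8-bit cover whose first-order statistic is skewed within
# pairs: even intensities are three times as likely as odd ones
values = rng.integers(0, 128, 100_000) * 2
cover = values + (rng.random(100_000) < 0.25)
# full-rate LSB embedding of a random message
stego = (cover & 0xFE) | rng.integers(0, 2, cover.size)

def pair_imbalance(img):
    """Total imbalance between the counts of bins 2i and 2i+1."""
    h = np.bincount(img, minlength=256)
    return np.abs(h[0::2] - h[1::2]).sum() / h.sum()

assert pair_imbalance(cover) > 0.4   # cover: pairs far from equal
assert pair_imbalance(stego) < 0.05  # stego: bins within each pair even out
```

This equalisation within pairs of values is the abnormality that first-order steganalysis looks for.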
It is also possible to extend this level of blind steganalysis to a more involved
level, known as multi-class steganalysis. From a practical perspective, multi-class
steganalysis is similar to the basic level; however, instead of classifying two classes
(cover and stego images), multi-class steganalysis can classify images into more
classes that come from different types of stego images produced by different em-
bedding techniques. Hence, the task of multi-class steganalysis is to identify the
embedding algorithm applied to produce a given stego image, or to classify it as
a cover image if no embedding is performed on it.
Normally, to avoid suspicion, the amount of message embedded is far less than the
image can accommodate. Thus, an adversary cannot tell how much information
has been embedded based on the size of the image and a statistical approach needs
to be utilised to estimate the hidden message length. Note that the terms message,
hidden message and secret message are used interchangeably. The message length
is the number of bits embedded in the image. It is normally defined by the ratio
between the number of embedded message bits and the maximum number of bits
that can be embedded in a given image. It can also be measured in bits per pixel
(bpp).
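As a quick worked example of these measures (the numbers are illustrative, not taken from any experiment in this thesis):

```python
def relative_length(n_message_bits, n_pixels, capacity_bits_per_pixel=1):
    """Ratio of embedded bits to the maximum embeddable bits; for a
    1-bit-per-pixel scheme this coincides with bits per pixel (bpp)."""
    return n_message_bits / (n_pixels * capacity_bits_per_pixel)

# a 512x512 image carrying a 32768-bit message under a 1 bpp capacity
rate = relative_length(32_768, 512 * 512)
assert rate == 0.125   # i.e., 12.5% of capacity, or 0.125 bpp
```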
The analysis levels discussed so far cannot reveal the locations where hidden mes-
sage bits are embedded. However, with the help of estimated hidden message
length as side information, an adversary can proceed to identify the stego-bearing
pixels. Identifying the exact location of stego-bearing pixels is not easy for two
reasons. First, the message bits are often randomly scattered throughout the
whole image. Second, it is difficult or impossible to detect hidden message bits
that are unchanged with respect to the cover image.
Identifying the stego-bearing pixels locates the message bits, but does not deter-
mine the sequence of the message bits. Thus, the next level of steganography
analysis is to retrieve the stegokey. Successfully retrieving the stegokey can be
considered a bigger achievement—it provides access to the stego-bearing pixels as
well as the embedding sequence. In other words, a correct stegokey will give in-
formation about the order of bits that create the hidden message. Studies related
to each analysis technique will be given and elaborated in Section 3.2.
2.4 Blind Steganalysis as Pattern Recognition
A classification problem involves dividing a set of possible ob-
jects into disjoint subsets, where each subset forms a class. Usually, pattern
recognition techniques are used to solve this problem. Pattern recognition is an
important aspect of Computer Science that focuses on recognising complex patterns
from samples and making intelligent decisions based on those patterns.

Figure 2.2: General framework of blind steganalysis
As discussed in Section 2.3, blind steganalysis examines the image characteristics
(samples) and determines whether these characteristics exhibit abnormalities (de-
cision making). This means that, given an image, the steganalysis should be able
to decide the class (cover or stego) in which the image belongs. Hence, the prob-
lem of blind steganalysis can be considered a classification problem and techniques
from pattern recognition can be employed.
Different embedding techniques are thought to produce different changes in image
characteristics. In other words, the characteristics of cover and stego images differ,
and those resulting from different stego images (stego images produced by different
embedding techniques) differ as well. Therefore, it is possible to extend the pattern
recognition techniques to differentiate and classify these images. This extended
blind steganalysis is known as multi-class steganalysis.
As with any pattern recognition methodology, blind and multi-class steganaly-
sis consist of two processes—feature extraction and classification. The general
framework for blind steganalysis is shown in Figure 2.2.
2.4.1 Feature Extraction
Feature extraction is a process of constructing a set of discriminative statistical
descriptors or distinctive statistical attributes from an image. These descriptors or
attributes are called features. Alternatively, feature extraction can be considered
a form of dimensionality reduction. It is desirable that the extracted features
be sensitive to embedding artefacts rather than to the image content.
Some examples of the features extracted in the early stages of blind steganalysis
research include image quality metrics, wavelet decompositions and moments of
image statistics histograms. These features were used in the blind steganalysis
developed in [4], [73] and [48], respectively. More recently developed features include
the Markov empirical transition matrix, moments of image statistics from the spatial and
frequency domains, and the co-occurrence matrix, which are employed in [54], [14]
and [116], respectively. The details of these features will be covered in Section 3.2.
2.4.2 Classification
Classification identifies or categorises images into classes (such as a cover or stego
image) based on their feature values. The primary classification involved in ste-
ganalysis is supervised learning. In supervised learning, a set of training samples
(consisting of input features and class labels) is fed in to train the classifier. Once
the classifier is trained (trained model), it predicts the class label based on the
given features.
Some of the common classifiers used in steganalysis include multivariate regression,
Fisher linear discriminant, neural network and support vector machines (SVM).
Multivariate regression [11] provides a trained model, which consists of regres-
sion coefficients. During training, the regression coefficients are estimated by
minimising the mean square error. For example, let the target label (or class label)
be y_i and let x_ij denote the features, where i = 1, . . . , N indexes the ith image
and j = 1, . . . , n indexes the jth feature; then the linear model is as
shown below:

y_1 = β_1 x_11 + β_2 x_12 + · · · + β_n x_1n + ε_1,
y_2 = β_1 x_21 + β_2 x_22 + · · · + β_n x_2n + ε_2,
⋮
y_N = β_1 x_N1 + β_2 x_N2 + · · · + β_n x_Nn + ε_N, (2.3)
where the β_j are the regression coefficients and the ε_i are zero-mean Gaussian noise terms. N and
n are the total number of samples and features, respectively. With these regression
coefficients, a given image can be classified by regressing its features. The
computed target value is then compared with a threshold to determine the correct
image class.
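A least-squares fit of the coefficients in (2.3) can be sketched on synthetic features. The data, the added intercept term (which the equations omit) and the 0.5 threshold are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
X = rng.normal(size=(n, d))                 # features x_ij
y = (rng.random(n) < 0.5).astype(float)     # class labels y_i (0 = cover, 1 = stego)
X[:, 0] += 3.0 * y                          # embedding shifts the first feature

# minimum-mean-square-error estimate of the regression coefficients
A = np.hstack([np.ones((n, 1)), X])         # intercept column + features
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# classify by regressing the features and thresholding the target value
pred = (A @ beta > 0.5).astype(float)
assert (pred == y).mean() > 0.85            # most images classified correctly
```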
Fisher linear discriminant is a classification method that projects multi-
dimensional features x onto a linear space [16]. Suppose two classes of obser-
vations have means µ_{y=0} and µ_{y=1}, and covariances Σ_{y=0} and Σ_{y=1}; then the linear
combination of features w^T x will have mean w^T µ_{y=i} and variance w^T Σ_{y=i} w
for i = 0, 1. The Fisher linear discriminant is defined as the linear combination of features
that maximises the following separation S:

S = σ²_{between class} / σ²_{within class}
  = (w^T µ_{y=0} − w^T µ_{y=1})² / (w^T Σ_{y=0} w + w^T Σ_{y=1} w)
  = (w^T (µ_{y=0} − µ_{y=1}))² / (w^T (Σ_{y=0} + Σ_{y=1}) w). (2.4)
Next, it can be shown that the optimal w is given by

w = (Σ_{y=0} + Σ_{y=1})^{−1} (µ_{y=0} − µ_{y=1}). (2.5)
Finally, an image can be classified by linearly combining its extracted features
with w and comparing the result to a threshold.
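Equation (2.5) translates directly into code. Below is a sketch on synthetic two-dimensional features; the class means, spreads and midpoint threshold are invented for illustration.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Optimal projection w of Equation (2.5), estimated from samples
    of the two classes."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return np.linalg.solve(S, mu0 - mu1)       # (Sigma0 + Sigma1)^-1 (mu0 - mu1)

rng = np.random.default_rng(2)
X0 = rng.normal(loc=[0.0, 0.0], size=(500, 2))   # cover-image features
X1 = rng.normal(loc=[3.0, 1.0], size=(500, 2))   # stego-image features
w = fisher_direction(X0, X1)

# classify by projecting onto w and thresholding at the midpoint
threshold = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())
accuracy = 0.5 * ((X0 @ w > threshold).mean() + (X1 @ w < threshold).mean())
assert accuracy > 0.9
```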
Artificial neural network, usually called neural network, is an information-
processing model inspired by the way the biological nervous system (e.g., the
brain) processes information. The basic building block of the neural network is
the processing element (PE), commonly known as the neuron. The processing
capabilities are derived from a collection of interconnected neurons (PEs). Math-
ematically, a neural network can be considered a mapping function F : Xn → Y ,
where n dimensions of features X are the inputs to the neural network, with deci-
sion values Y (class labels) [119]. The function F can be defined as a composition
of other functions Gi = (G1, . . . , Gm). In addition, function Gi can further be
defined as a composition of other functions. The composition of these function
definitions forms the neural network. The structure of these functions and their de-
pendencies between inputs and outputs will determine the type of neural network.
The most common type used in classification is the feedforward neural network
trained with backpropagation. As with any other supervised learning, the classification
Figure 2.3: Two-class SVM classification
process in a neural network involves two operations—training and testing. During
training, the neural network learns to associate outputs with input patterns. This
is carried out by systematically modifying the weights of the inputs throughout
the neural network. When the neural network is used for testing, it identifies
the input pattern and tries to determine the associated output. When the input
pattern has no associated output, the neural network provides an output that
corresponds to the best match of the learned input patterns.
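The composition of functions described above can be made concrete with a tiny fixed-weight feedforward network; the weights below are arbitrary, chosen only to show the forward pass, not trained on any data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """F : X^n -> Y as a composition of functions: each layer is a
    weighted sum of its inputs passed through a nonlinearity."""
    g = sigmoid(W1 @ x + b1)      # hidden neurons (the inner functions G_i)
    return sigmoid(W2 @ g + b2)   # output neuron: decision value in (0, 1)

# arbitrary fixed weights: 3 input features, 2 hidden neurons, 1 output
W1 = np.array([[1.0, -1.0, 0.5],
               [0.2,  0.3, -0.7]])
b1 = np.array([0.0, 0.1])
W2 = np.array([[1.5, -2.0]])
b2 = np.array([0.2])

y_out = forward(np.array([0.5, -0.2, 0.1]), W1, b1, W2, b2)
assert y_out.shape == (1,) and 0.0 < y_out[0] < 1.0   # a valid decision value
```

Training would adjust W1, b1, W2 and b2 by backpropagation so that the decision values match the class labels of the training images.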
Support vector machines (SVM) are a classification technique that can learn
from samples. More precisely, we can train an SVM to recognise and assign class
labels based on a given collection of data (i.e., features). For example, we train the
SVM to differentiate cover images from stego images by examining the extracted
features from many instances of cover and stego images. To illustrate the point,
let us interpret this example using the illustration shown in Figure 2.3. The X and
Y axes represent two different features. Cover and stego images are represented by
circles and stars, respectively. Given an unknown image (represented by a square),
the SVM is required to predict the class to which it belongs.
This example is easy, as the two classes (cover and stego) form two distinct clusters
that can be separated by a straight line. Hence, the SVM finds the separating line
and determines the cluster for the unknown image. Finding the right separating
line is crucial, and it is determined during training. In practice, the feature
dimensionality is higher and we need a separating plane, known as a separating
hyperplane, instead of a line.
Thus, the goal of SVM is to find a separating hyperplane that can effectively
separate classes. To do that, the SVM will try to maximise the margin of the sep-
arating hyperplane during training. Obtaining this maximum-margin hyperplane
will optimise the SVM’s ability to predict the correct class of an unknown object
(image).
However, there are often non-separable datasets that cannot be separated by a
straight separating line or flat plane. The solution to this difficulty is to use a
kernel function. A kernel function is a mathematical routine that projects the
features from a low-dimensional space into a higher dimensional space. Note that
the choice of kernel function will affect the classification accuracy. For additional
reading on SVMs, see [80].
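The kernel idea can be illustrated with an explicit (polynomial-style) feature map instead of an implicit kernel: points that no straight line can separate in the plane become separable by a single flat hyperplane after lifting. The data and map below are our illustration, not the SVM training procedure itself.

```python
import numpy as np

def lift(X):
    """Explicit quadratic feature map: append x1^2 + x2^2, so a circle
    in the plane becomes a hyperplane in the lifted space."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

rng = np.random.default_rng(3)
# non-separable in 2-D: one class inside a circle, the other outside it
radius = np.concatenate([rng.uniform(0, 1, 200), rng.uniform(2, 3, 200)])
angle = rng.uniform(0, 2 * np.pi, 400)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([-np.ones(200), np.ones(200)])

Z = lift(X)
# in the lifted space the flat plane x1^2 + x2^2 = 2.5 separates perfectly
pred = np.where(Z[:, 2] > 2.5, 1.0, -1.0)
assert np.array_equal(pred, y)
```

An SVM with a polynomial or RBF kernel performs this lifting implicitly and, in addition, chooses the maximum-margin hyperplane in the lifted space.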
2.5 Digital Images
As discussed in the overview of steganography section, practically any form of
digital media can be used to carry secret messages. Examples of these media
include image, video, audio, text, etc. By far the most popular choice is the image.
In this section we will introduce various digital images, since the vast majority of
research in steganography is concerned with image steganography. In addition,
the work of the thesis is concentrated on image steganalysis.
A digital image is produced through a process called digitisation. Digitising an
image involves converting analogue information into digital information; thus, a
digital image is the representation of an original image by discrete sets of points.
Each of these points is called a picture element or pixel.
Pixels are normally arranged in a two-dimensional grid corresponding to the spatial
coordinates in the original image. The number of distinct colours in a digital
image depends on the number of bits per pixel (bpp). Hence, the types of digital
image can be classified according to the number of bits per pixel. There are three
common types of digital image:
❐ Binary image. In a binary image, only one bpp is allocated for each pixel.
Since a bit has only two possible states (on or off), each pixel in a binary
image must represent one of two colours. Usually, the two colours used are
black and white. A binary image is also called a bi-level image.
❐ Greyscale image. A greyscale image is a digital image in which the only
colours are shades of grey. The darkest possible shade is black, whereas
the lightest possible shade is white. Normally, there are eight bits per pixel
assigned for a greyscale image. This creates 256 possible different shades of
grey.
❐ Colour image. In general, a pixel in a colour image consists of several primary
colours. Red, green and blue are the most commonly used primary colours.
Each primary colour forms a single component called a channel, with eight
bits usually allocated for each channel, producing 24 bits per pixel. This
corresponds to roughly 16.7 million possible distinct colours. When the
channels in a colour image are split, each forms a different greyscale image.
2.5.1 Image File Formats
After digitisation, a digital image can be stored in a specific file format. Although
many file formats exist, the major formats include BMP, JPEG, TIFF, GIF and
PNG. Images stored in these formats are considered raster graphics. Another
type of graphic image is a vector graphic image. Unlike raster graphics, which use
pixels, vector graphics use geometric primitives such as points, lines and polygons
to represent the images. The rendering of the geometric primitives in vector
graphics is based on mathematical equations. This thesis focuses on raster rather
than vector graphics.
In a raw image, the data captured from a digital device sensor are preserved and
stored in a file. The data captured are raw in the sense that no adjustment or
processing is applied. The data are merely a collection of pixel values captured
at the time of exposure. Note that there is no standard for a raw image and it is
device dependent. Hence, a raw image is often considered an image, rather than
a standard image file format.
The bitmap or BMP format is considered a simple image file format. Normally
the data is uncompressed and easy to manipulate. However, the uncompressed
BMP format gives a BMP image a larger file size than that of a compressed
image. A BMP image can also use a colour palette for indexed-colour images.
Nonetheless, a colour palette is not used for BMP images of 16 bpp or higher.
The Joint Photographic Experts Group (JPEG) format is by far the most com-
mon image file format. JPEG images are very popular and are primarily used for
photographs. Their popularity is due to the excellent image quality they produce
despite a smaller file size. This is achieved through lossy compression. Many
imaging applications allow users to control the level of compression. This is useful
because users can trade off image quality for a smaller file size and vice versa.
However, lossy compression reduces the image quality and cannot be reversed.
In situations where the image quality is as important as the file size, the tagged
image file format (TIFF) could be a suitable choice. The TIFF format uses
lossless compression, which reduces the image file size while preserving the original
image quality. This makes TIFF a popular image archive option. In addition, as
the name implies, the TIFF format also offers flexible information fields in the
image header called tags. These tags are very useful and can be defined to hold
application-specific information.
The graphics interchange format (GIF) uses a colour palette to produce an indexed-
colour image. It also uses lossless compression. GIF can offer optimum compres-
sion when the image contains solid colour graphics (such as a logo, diagram, draw-
ing, or clipart). In addition, GIF supports transparency and animation. These
features make GIF an excellent format for certain web images. However, GIF is
not suitable for complex photographs with continuous tones, as a GIF image can
store only 256 distinct colours.
Compared with GIF, the portable network graphics (PNG) format provides several
improvements. These include greater compression, better colour
support, gamma correction in brightness control and image transparency. The
PNG format is an alternative to GIF and is expected to become a mainstream
format for web images.
2.5.2 Spatial and Frequency Domain Images
In a general sense, an image (I) can be considered a result of the projection of a
scene (S) [34]. The spatial domain image is said to have a normal image space,
which means that each image element at location ℓ in image I is a projection at
the same location in scene S. The distance in spatial domain corresponds to the
real distance. A common example of a spatial domain image is the BMP image.
The frequency domain image has a space where each element value at location
ℓ in image I represents the rate of change over a specific distance related to the
location ℓ. A popular frequency domain image is the JPEG image.
Chapter 3
Literature Review
This chapter considers research relevant to both steganography and steganalysis.
Steganography is presented in Section 3.1. We give an overview of different types
of steganography with the emphasis on image steganography. In particular, we
discuss binary image steganography in the first part of the section and JPEG image
steganography in the second part. Steganalysis is discussed in Section 3.2. We
review and highlight the most relevant existing techniques in steganalysis. These
techniques are specifically used in analysing image steganography. The discussion
is divided into six subsections. We organise the discussion according to the
different levels of analysis presented in Section 2.3.
3.1 Steganography
This section discusses a selection of steganographic methods; these par-
ticular methods are the subjects of our analysis. The first five subsections
discuss steganographic methods that use binary images as the cover images and
the rest of the subsections discuss methods that use JPEG images.
3.1.1 Liang et al. Binary Image Steganography
Consider a variant of boundary pixel steganography proposed by Liang et al. [69].
Boundary pixel steganography hides a message along the edges, where white and
black pixels meet—these are known as boundary pixels. That is, boundary pixels
are those pixels within the image at which a colour transition occurs between
white and black. They should not be confused with the pixels along the four
borders of an image.
To obtain higher imperceptibility, the pixel locations used for embedding are per-
muted and distributed over the whole image. The distribution of message bits
is controlled by a pseudorandom number generator whose seed is a secret shared
by the sender and the receiver of the hidden message. This seed is also called
the stegokey.
As the message bits are embedded on the boundary pixels of the image, it is
important to identify the boundary pixels and their orders unambiguously. Once
the sequence of boundary pixels is obtained, a pseudorandom number generator is
used to determine the place where the message bits should be hidden. The authors
of [69] define boundary pixels as those that have at least one neighbouring pixel
with a different intensity. For example, a white (black) pixel must have at least
one black (white) neighbouring pixel. Note that a pixel can have, at most, four
neighbours (left, right, top and bottom).
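To make this definition concrete, the following Python sketch (with hypothetical function names; NumPy is assumed, and this is our illustration rather than the authors' code) lists the boundary pixels of a binary image under the four-neighbour rule:

```python
import numpy as np

def boundary_pixels(img):
    """Return the (row, col) positions of boundary pixels in a binary image.

    A pixel is a boundary pixel if at least one of its four neighbours
    (left, right, top, bottom) has a different value, following the
    definition used by Liang et al.
    """
    rows, cols = img.shape
    found = []
    for i in range(rows):
        for j in range(cols):
            neighbours = []
            if i > 0:
                neighbours.append(img[i - 1, j])
            if i < rows - 1:
                neighbours.append(img[i + 1, j])
            if j > 0:
                neighbours.append(img[i, j - 1])
            if j < cols - 1:
                neighbours.append(img[i, j + 1])
            if any(n != img[i, j] for n in neighbours):
                found.append((i, j))
    return found
```

For a single black pixel in a white field, for instance, the black pixel and its four white neighbours are all boundary pixels.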
Not all boundary pixels are suitable for carrying message bits because embedding
a bit into an arbitrary boundary pixel may convert it into a non-boundary one. If
this happens, then the extraction will not be correct and recovery of the hidden
message is impossible.
Because of this technical difficulty, the authors have proposed a modified algo-
rithm that adds restrictions on the selection of boundary pixels for embedding. A
currently evaluated boundary pixel, P, is considered eligible for embedding if the
following two conditions are satisfied:
i. Among the four neighbouring pixels, there must exist at least two unmarked
neighbouring pixels whose pixel values differ.
ii. For each marked neighbouring pixel (if any), its neighbouring pixels (exclud-
ing the current pixel, P ) must also satisfy the first criterion.
A pixel is said to be marked if it has already been evaluated or is assigned a
(pseudorandom) index with a smaller value than the current index. In contrast,
a pixel is said to be unmarked if it is evaluated after the current pixel.
Figures 3.1 and 3.2 show some examples of eligible and ineligible pixels, respec-
tively. The shaded box represents a pixel value of zero and the white box represents
a pixel value of one. These pixels are taken from some portion of a binary image.
Pixel P is the currently evaluated pixel and the number inside each box is the
pseudorandom index. This index will indicate if a pixel is unmarked or marked.
For example, in Figure 3.1(b), the current pixel, P, has three unmarked
neighbours (left, right and top) and one marked neighbour (bottom).
Pixel P in Figure 3.1(a) is an eligible pixel because it satisfies the first condition
and it does not have any marked neighbouring pixel. Pixel P in Figure 3.1(b)
satisfies both conditions and thus, it is an eligible pixel. On the other hand,
pixel P in Figure 3.2(a) is an ineligible pixel because it does not satisfy the first
condition. Pixel P in Figure 3.2(b) only satisfies the first condition; therefore, it
is also considered ineligible.
[Figure: two 4 × 4 portions of a binary image, panels (a) and (b); each cell shows its pseudorandom index and the currently evaluated pixel is labelled P]
Figure 3.1: Example of eligible pixels
[Figure: two 4 × 4 portions of a binary image, panels (a) and (b); each cell shows its pseudorandom index and the currently evaluated pixel is labelled P]
Figure 3.2: Example of ineligible pixels
Once a boundary pixel is found eligible, the message bit is embedded by
overwriting the pixel value if the message bit does not match it; otherwise,
the pixel is left intact. This procedure is repeated to embed the remaining
message bits.
3.1.2 Pan et al. Binary Image Steganography
Motivated by Wu and Lee [113], Pan et al. developed a steganographic method
that embeds secret messages in binary images [82]. Compared with [113], this
method is more flexible in choosing cover image blocks: it uses every block
within an image to carry the secret message, giving a greater embedding
capacity. Security is also improved because the cover image is altered less.
In this embedding algorithm, a random binary matrix κ and a secret weight
matrix ω are defined and shared between the sender and receiver. Both matrices
are of size m × n. The matrix ω has elements drawn from {1, 2, . . . , 2^r − 1},
where r is the number of message bits to be embedded within a block. A given
binary image is partitioned into non-overlapping blocks, Fi of size m × n and
the following quantity, ϕi is computed:

ϕi = Σ[(Fi ⊕ κ) ⊗ ω],    (3.1)

where ⊕ and ⊗ are the bitwise exclusive-OR and pair-wise multiplication
operators, respectively, and Σ[·] denotes the arithmetic summation of all
elements of the matrix.
r message bits are embedded in block Fi by ensuring the following invariant:

ϕi ≡ mN (mod 2^r),    (3.2)

where mN = Φ(m1m2 . . . mr) is the decimal representation of the message bits
and Φ(·) denotes binary-to-decimal conversion. If the invariant already holds,
Fi is left intact. Otherwise, some pixels of Fi are altered: in most cases,
flipping one pixel corrects the mismatch, and when it does not, flipping a
second pixel guarantees that the invariant holds. Hence, at most two pixels of
Fi are altered. This method can embed up to r = ⌊log2(mn + 1)⌋ bits per block.
Successfully extracting a secret message requires the correct combination of κ
and ω, which together can be considered the stegokey. The receiver also needs
to know the parameters (m, n and r) used in the embedding. The secret message
bits embedded in a block are then extracted through Equation (3.2) as
mN = ϕi mod 2^r. The mN extracted from each block is converted into binary
bits, and the results are concatenated to form the secret message.
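The embedding and extraction steps around Equations (3.1) and (3.2) can be sketched as follows. This is an illustrative implementation only: the exhaustive search over one- and two-pixel flips stands in for the authors' constructive pixel-selection procedure (which the paper uses to guarantee at most two flips), and the names K and W stand in for κ and ω.

```python
import numpy as np

def phi(F, K, W):
    # Equation (3.1): arithmetic sum of the elements of (F XOR K) * W
    return int(np.sum((F ^ K) * W))

def embed_block(F, K, W, m, r):
    # Make phi(F) ≡ m (mod 2^r) by flipping at most two pixels.
    # Brute-force search here; the paper shows two flips always suffice.
    mod = 2 ** r
    if phi(F, K, W) % mod == m:
        return F.copy()
    cells = [(i, j) for i in range(F.shape[0]) for j in range(F.shape[1])]
    for c in cells:                                # one-pixel flips
        G = F.copy()
        G[c] ^= 1
        if phi(G, K, W) % mod == m:
            return G
    for a in range(len(cells)):                    # two-pixel flips
        for b in range(a + 1, len(cells)):
            G = F.copy()
            G[cells[a]] ^= 1
            G[cells[b]] ^= 1
            if phi(G, K, W) % mod == m:
                return G
    raise ValueError("no valid embedding found")

def extract_block(F, K, W, r):
    # Equation (3.2): the embedded value is phi mod 2^r
    return phi(F, K, W) % (2 ** r)
```

For example, with a 2 × 4 block, r = ⌊log2(8 + 1)⌋ = 3, so any value m ∈ {0, . . . , 7} can be embedded by altering at most two pixels.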
3.1.3 Tseng and Pan Binary Image Steganography
Although the method developed in [82] generally enhanced security (by altering
fewer pixels for the same amount of embedded message), the quality of the stego
image was not taken into consideration. Noise may become prominent in certain
blocks after embedding; for example, an isolated dot may appear in an entirely
black or white block.
As a sequel to the work done in [82], Tseng and Pan revised the method and
enhanced it [107]. The main contribution of this work was to maintain the image
quality through sacrificing some of the payload. According to the authors, the
image quality can be greatly improved while still maintaining a good embedding
rate—as much as r = ⌊log2(mn+1)⌋−1 bits per block, where m×n is the size of
a block. On average, r is only one bit per block less than their previous method.
To maintain image quality, the method discards any block that is either entirely
black or white. In addition, when a pixel must be flipped to carry a message bit,
the selection of which pixel to flip is governed by a distance matrix. The
distance matrix selects only a pixel whose new value (after flipping) matches
the value of the majority of its neighbouring pixels. This prevents the
generation of isolated dots, which would degrade the image quality. For
example, Figure 3.3 shows
two possible ways of flipping a pixel. Obviously, the effect of flipping will be
less visible in Figure 3.3(b) than in Figure 3.3(c). The authors also defined an
additional criterion for the secret weight matrix, ω which also improves the image
quality.
[Figure: (a) an original block of pixels; (b) and (c) the same block after flipping a pixel at two different positions]

Figure 3.3: Effect of flipping a pixel: (a) original block of pixels; (b) no isolated dot; (c) obvious isolated dot
Similar to their previous method, the maximum number of pixels that must be
altered per block to carry the message bits is, at most, two. The rest of the em-
bedding and extraction algorithms are similar to the previous method. However,
if block Fi becomes entirely black or white after embedding, it is skipped. The
alteration of that block will not be reversed and the same message bits will be em-
bedded in the next block. This is important to ensure the correctness of message
extraction.
Both methods offer the flexibility to trade payload size for security. When
increased security is necessary, the block size (parameters m and n) can be
increased. A larger block size reduces the total number of blocks per image
and therefore, at the same r bits per block, reduces the total payload.
3.1.4 Chang et al. Binary Image Steganography
The steganographic method developed by Chang et al. [10] can be considered an
improved variant of the binary image steganography developed by Pan et al.
[82]. In general, this method offers the same embedding rate as the Pan et al.
method, which is r = ⌊log2(mn + 1)⌋ bits per block (m × n is the block size).
However, this method is superior to the Pan et al. method in the sense that it
alters one pixel (at most) to embed the same amount of message bits within a
block (as opposed to two pixels in the Pan et al. method). Thus, this method
provides a higher level of security by reducing the alteration of the stego image.
In practice, the Chang et al. method also employs two matrices during embed-
ding: a random binary matrix and a serial number matrix. The main difference in
the Chang et al. method is the introduction of the serial number matrix to re-
place the secret weight matrix. This enables this method to work with less image
alteration. With the serial number matrix, r linear equations, known as general
hiding equations, are defined to embed r bits of message in a block. The general
hiding equations are used to determine the pixel suitable for flipping. To obtain
valid general hiding equations, the serial number matrix is required to have
2^r − 1 elements with non-duplicate decimal values.
For message extraction, each block is transformed using the bitwise exclusive-
OR operator with the random binary matrix. For each block, r general hiding
equations are defined through the serial number matrix. The parities of results
calculated from the r general hiding equations are obtained as the message bits.
Clearly, the random binary matrix and serial number matrix are used as the
stegokey and shared between the sender and receiver.
3.1.5 Wu and Liu Binary Image Steganography
Another block-based steganographic method that embeds secret messages in
binary images was developed by Wu and Liu [112]. This technique also starts
by partitioning a given image into blocks. To avoid synchronisation problems
(which lead to incorrect message extraction) between embedding and extraction,
this technique embeds a fixed number of message bits within a block. In their
implementation details, the authors opt to embed one message bit per block. The
embedding algorithm is based on the odd-even relationship of the number of black
pixels within a block. In other words, the total number of black pixels within
a block is kept odd when a message bit of one is embedded, and even when a
message bit of zero is embedded. If the odd-even relationship matches the
message bit, no flipping is needed. Otherwise, a pixel must be flipped to
change the parity.
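The odd-even rule itself is simple, as the following sketch illustrates (hypothetical function names; the flip position is assumed to be supplied externally, by the flippability scoring that the method uses to select the least visible pixel):

```python
import numpy as np

def embed_bit(block, bit, flip_pos):
    """Embed one bit in a block: the count of black (1) pixels is kept odd
    for bit 1 and even for bit 0. flip_pos is the pixel chosen for flipping
    (in the real method, the one with the highest flippability score)."""
    out = block.copy()
    if int(out.sum()) % 2 != bit:
        out[flip_pos] ^= 1
    return out

def extract_bit(block):
    # the parity of the black-pixel count carries the message bit
    return int(block.sum()) % 2
```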
Like any other embedding technique, the most important part is the selection of
pixels for flipping. An efficient selection approach ensures minimum distortion.
That is why, in [112], Wu and Liu introduced a flippability scoring system for
selecting pixels for flipping. The score for each pixel is computed by examining the
pixel and its immediate neighbours (those within a 3 × 3 block). The flippability
score is produced by a decision module based on the input of two measurements.
The first measurement is the smoothness, which computes the total number of
transitions in the vertical, horizontal and two diagonal directions. The second
measurement is the connectivity, which computes the total number of black and
white clusters formed within a block. These measurements are all computed within
a 3 × 3 block. An illustration of these measurements is shown in Figure 3.4.
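A minimal sketch of the two measurements (our own implementation of the counting described above; the decision module that combines them into a flippability score is not shown) might look like:

```python
import numpy as np

def transitions(block):
    # smoothness: transitions along the horizontal, vertical, diagonal
    # and anti-diagonal directions of a 3 x 3 block
    t = int(np.sum(block[:, :-1] != block[:, 1:]))      # horizontal
    t += int(np.sum(block[:-1, :] != block[1:, :]))     # vertical
    t += int(np.sum(block[:-1, :-1] != block[1:, 1:]))  # diagonal
    t += int(np.sum(block[:-1, 1:] != block[1:, :-1]))  # anti-diagonal
    return t

def clusters(block, colour):
    # connectivity: number of 4-connected clusters of a given colour
    seen = np.zeros(block.shape, dtype=bool)
    count = 0
    for i in range(block.shape[0]):
        for j in range(block.shape[1]):
            if block[i, j] == colour and not seen[i, j]:
                count += 1
                stack = [(i, j)]
                while stack:                 # flood fill one cluster
                    a, b = stack.pop()
                    if (0 <= a < block.shape[0] and 0 <= b < block.shape[1]
                            and block[a, b] == colour and not seen[a, b]):
                        seen[a, b] = True
                        stack += [(a + 1, b), (a - 1, b),
                                  (a, b + 1), (a, b - 1)]
    return count
```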
3.1.6 F5 Steganography
In [111], Westfeld and Pfitzmann observed that an embedding algorithm that
overwrites the LSBs of JPEG coefficients causes the coefficient distribution
to form pairs of values (PoVs). A PoV occurs when two adjacent frequencies in
the JPEG distribution become similar (Figure 3.12 shows the effect of PoVs).
Westfeld and Pfitzmann showed that such a steganographic method can be broken
by exploiting the PoVs; they demonstrated the analysis and attack on Jsteg
using the chi-square test (details of this attack are discussed in Subsection
3.2.3).
[Figure: (a) a 3 × 3 block annotated with transition indicators in the vertical, horizontal, diagonal and anti-diagonal directions; (b) a 3 × 3 block containing one white cluster and two black clusters]

Figure 3.4: Measurement of smoothness and connectivity: (a) smoothness is measured by the total number of transitions in four directions (the arrows indicate the transition directions, 0 indicates no transition and 1 indicates a transition); (b) connectivity is measured by the number of black and white clusters (four white pixels forming one cluster and five black pixels forming two clusters)

As a result, Westfeld developed a new steganographic method called F5 [110].
F5 is formulated to preserve the original property of this statistic (i.e.,
the JPEG coefficient distribution). When alteration is required during
embedding, F5 decrements the absolute value of a JPEG coefficient by one,
instead of overwriting its LSB with a message bit. This prevents the formation
of PoVs; hence, F5 cannot be detected through the chi-square test.
To minimise the changes caused by embedding, matrix encoding is employed to
increase the embedding efficiency. Finally, to avoid concentrating the embedded
message bits in a certain part of the image, F5 embeds along a randomly
permuted sequence of coefficients. The permutation is generated by a PRNG.
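Matrix encoding can be illustrated with a Hamming-code-style (1, 2^k − 1, k) scheme, which embeds k message bits into n = 2^k − 1 cover LSBs by changing at most one of them. The sketch below is a simplified illustration with our own function names: real F5 applies the encoding to the LSBs of nonzero JPEG coefficients and decrements coefficient magnitudes rather than flipping bits directly.

```python
def syndrome(lsbs):
    # XOR of the 1-based positions of the LSBs that equal 1
    s = 0
    for i, b in enumerate(lsbs, start=1):
        if b:
            s ^= i
    return s

def me_embed(bits, lsbs, k):
    # (1, 2^k - 1, k) matrix encoding: embed k bits into n = 2^k - 1
    # cover LSBs by changing at most one of them
    n = 2 ** k - 1
    assert len(lsbs) == n and len(bits) == k
    m = int("".join(map(str, bits)), 2)
    delta = syndrome(lsbs) ^ m
    out = list(lsbs)
    if delta:
        out[delta - 1] ^= 1          # flip the LSB at position delta
    return out

def me_extract(lsbs, k):
    # the embedded bits are the syndrome of the stego LSBs
    return [int(c) for c in format(syndrome(lsbs), f"0{k}b")]
```

With k = 3, for instance, three message bits are carried by seven LSBs at the cost of at most one change, which is the source of F5's improved embedding efficiency.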
3.1.7 OutGuess Steganography
OutGuess is a type of JPEG image steganography developed by Provos in [90].
This method was designed to withstand both the chi-square attack and its
extended version. It can be summarised as two main operations: embedding and
statistical correction.
Similar to other JPEG image steganographies, OutGuess embeds message bits
by altering the LSBs of JPEG coefficients. The embedding is spread randomly
throughout the whole image using a random selection that walks through the
coefficients from the beginning to the end of the image: to select the next
coefficient, OutGuess computes a random offset and adds it to the current
coefficient location. The random offsets are computed by a PRNG.
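The random-offset walk can be sketched as follows. This is a deliberate simplification with hypothetical names: the actual OutGuess implementation derives its offsets from a seeded PRNG and adapts the offset interval to the remaining message length and capacity, which this sketch omits.

```python
import random

def select_positions(n_coeffs, n_bits, seed, max_offset):
    # Walk through the coefficients from start to end, advancing by a
    # pseudorandom offset each time; the seed acts as the shared secret.
    rng = random.Random(seed)
    pos, out = -1, []
    while len(out) < n_bits:
        pos += rng.randint(1, max_offset)
        if pos >= n_coeffs:
            raise ValueError("message too long for this cover")
        out.append(pos)
    return out
```

Because the offsets are strictly positive, the selected positions are strictly increasing, which is what lets the extractor reproduce the same sequence from the shared seed.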
Note that the embedding causes the image statistics (i.e., the distribution of
the coefficients) to deviate; hence, some coefficients are reserved (left
unaltered) with the intention of correcting this statistical deviation. In
other words, after all the message bits are embedded, the reserved
coefficients are adjusted accordingly.
The adjustment is carried out such that the distributions of cover and stego images
are similar.
3.1.8 Model-Based Steganography
Based on the concept of statistical modelling and information theory, Sallee de-
veloped a steganography called model-based steganography [96]. Model-based
steganography is designed to withstand a first-order statistical attack while main-
taining a high embedding rate. Unlike OutGuess, which preserves only the dis-
tribution of an image, model-based steganography preserves the distributions of
individual coefficient modes.
To start the embedding, model-based steganography separates an image into an
unalterable part xα and an alterable part xβ. If a JPEG image is used as the
cover image, the most significant bits of the coefficients form xα and the
least significant bits form xβ. xα is used to build a conditional probability
P(xβ|xα) from
a selected cover image model. Together with this conditional probability and a
secret message, a non-adaptive arithmetic decoder is used to generate a new part
x′β , which will carry the message bits. The selection of the coefficients to use is
based on a PRNG. Finally, xα and x′β are combined to form the stego image. The
embedding algorithm is shown in Figure 3.5(a).
To extract the secret message, steps similar to those discussed above are followed
with the exception of the non-adaptive arithmetic decoder. An arithmetic encoder
is used instead of an arithmetic decoder. The input to the non-adaptive arithmetic
encoder is x′β and the conditional probability P (xβ|xα). Since xα is unaltered, the
conditional probability can be regenerated. Therefore, the secret message can be
extracted successfully through the non-adaptive arithmetic encoder. Figure 3.5(b)
illustrates the extraction algorithm.
[Figure: two block diagrams. In (a), the cover image is split into xα and xβ; xα and the image model drive conditional probability generation, and entropy decoding combines the message with P(xβ|xα) to produce x′β, which is recombined with xα into the stego image. In (b), the stego image is split into xα and x′β, and entropy encoding recovers the message.]

Figure 3.5: (a) Embedding algorithm of model-based steganography; (b) extraction algorithm of model-based steganography
3.2 Steganalysis
This review of steganalysis techniques is not intended to be exhaustive;
rather, it is organised according to the different levels of possible
steganographic analysis. More precisely, these levels are ordered according to
the type of secret information or parameter an adversary wishes to extract. We
begin with the techniques employed by the adversary to detect the presence of a
secret message in an image and to determine which type of steganographic method
is used. After that, we discuss the techniques used to recover some attributes
(secret parameters) of the embedded secret message. These attributes include
the secret message length, the location of stego-bearing pixels and the
stegokey.
3.2.1 Differentiation of Cover and Stego Images
In this scenario, it is assumed that the adversary has access to an image (or a
collection of images) and tries to determine if the image contains a secret message
(stego image) or does not (cover image). This task is feasible only if the
statistical features of cover and stego images differ enough to allow a
reliable decision. In order to do that, different feature extraction techniques
can be applied to extract relevant statistical features. The following collection of
statistical features can be found in the literature:
❐ Co-occurrence matrix
❐ Statistical moments
❐ Wavelet subbands
❐ Pixel difference
The next step is to perform classification based on the extracted features.
Because the distributions of cover and stego images are never exactly known,
their features sometimes overlap. To alleviate this problem, cover image
estimation is utilised to derive more sensitive features for steganalysis.
In the following subsections, we discuss how these features have been applied
in steganalysis, followed by a discussion of classification and, finally,
cover image estimation.
Co-occurrence matrix
Sullivan et al. use an empirical matrix as the feature set to construct a
steganalysis technique [102]. The technique can detect several variants
of spread-spectrum data hiding techniques [24, 76] and perturbed quantisation
steganography [36]. This empirical matrix is also known as a co-occurrence ma-
trix.
The authors observe that the empirical matrix of a cover image is highly concen-
trated along the main diagonal. However, data hiding will spread the concentra-
tion away from the main diagonal. An example of this effect is shown in Figure
3.6. To capture this effect, the six elements with the highest probability
along the main diagonal of the empirical matrix are chosen. The ten nearest
elements of each of these are also chosen, giving a 66-dimensional feature
vector. Next, the authors subsample the remaining main diagonal elements by
four, obtaining another 63 features. The combined 129-dimensional feature set
is used in their steganalysis.
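For reference, a horizontal co-occurrence (empirical) matrix can be computed as follows (a generic NumPy sketch, not the authors' code):

```python
import numpy as np

def cooccurrence(img, levels=256):
    # empirical matrix of horizontally adjacent pixel pairs, normalised
    # so that its entries sum to one
    M = np.zeros((levels, levels))
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(M, (left, right), 1)   # count each (left, right) pair
    return M / M.sum()
```

Because neighbouring pixels of natural images are highly correlated, most of the probability mass of `M` lies on or near the main diagonal (e.g. `np.trace(M)` is large for smooth images), which is exactly the concentration that embedding disturbs.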
The feature set selected in [102] is stochastic and may not effectively capture the
embedding artefacts. Xuan et al. [116] constructed a better feature set from
the co-occurrence matrices.

[Figure: plots of two co-occurrence matrices; in (b) the probability mass spreads away from the main diagonal]

Figure 3.6: Plot of co-occurrence matrices extracted from: (a) cover image; (b) stego image

They generated four co-occurrence matrices from the horizontal, vertical, main
and minor diagonal directions (as opposed to using
only the horizontal direction as in [102]). These four matrices are averaged and
normalised to form a final matrix. Note that, because the final co-occurrence
matrix is symmetric, it is sufficient to use the main diagonal and part of the upper
triangle of the co-occurrence matrix. Xuan et al. selected 1018 elements from this
area to form their feature set (a 1018-dimensional feature set).
A specifically tuned classifier (class-wise non-principal components analysis) is
used to obtain a high detection rate. Xuan et al. demonstrated its
effectiveness on JPEG and spatial domain image steganography. However, their
high-dimensional
features may suffer from the curse of dimensionality when applied to other types
of classifier. Although their current implementation is arguably optimal, it is
threshold dependent, which limits its flexibility for blind steganalysis.
Chen et al. developed a blind steganalysis based on a co-occurrence matrix [15].
It is well known that direct use of a co-occurrence matrix as the feature set
leads to high dimensionality; for example, for an 8-bit image, the
co-occurrence matrix has 256 × 256 elements. Therefore, Chen et al. projected
the co-occurrence matrix onto a first-order statistic to reduce its
dimensionality.
More precisely, this first-order statistic is the frequency of occurrence along the
horizontal axis of the co-occurrence matrix.
In [43], the authors exploited the correlations between the discrete cosine trans-
form (DCT) coefficients in intra- and inter-blocks of JPEG images. Intra-block
correlation is the correlation between neighbouring coefficients within a block;
inter-block correlation measures the correlation between a DCT coefficient in one
block and the coefficient of the same position in another block.
The authors arranged the DCT coefficients in a block into a one-dimensional
vector using the zigzag order. For each block, only AC coefficients are considered
while the DC coefficient is discarded. This is because normally DC coefficients are
not changed in JPEG steganography. In addition, the authors also discard some
coefficients with a high frequency of occurrence (i.e. coefficients with a value of
zero). All the blocks in a JPEG image are scanned in a fixed pattern to form a
new re-ordered block called a 2-D array. Only the magnitudes of the coefficients
are used.
Markov empirical transition matrices are used to capture these dependencies. Hor-
izontal and vertical Markov empirical transition matrices are used to capture the
intra- and inter-block correlations, respectively. The authors also further trim the
dimensionality of the matrices by thresholding the 2-D array. In other words,
elements with a magnitude greater than the threshold are assigned a maximum
value (the threshold value).
Statistical moments
Harmsen and Pearlman [48] showed that additive noise data hiding techniques are
equivalent to a low-pass filtering of image histograms. The centre of mass (COM)
is used to quantify this effect. Note that the COM is the first-order
statistical moment.
The authors have shown that it is better to compute the COM from the frequency
domain. Hence, the discrete Fourier transform is applied to transform the im-
age histogram. This transformation produces a histogram characteristic function.
COM is computed based on this characteristic function.
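The histogram characteristic function (HCF) and its COM can be sketched in plain NumPy as follows (an illustration of the quantities described above; the function name is ours):

```python
import numpy as np

def histogram_com(img, levels=256):
    # Histogram characteristic function (HCF): the DFT of the image
    # histogram.  The centre of mass (COM) is computed over the first
    # half of the HCF magnitude spectrum.
    h, _ = np.histogram(img, bins=levels, range=(0, levels))
    H = np.abs(np.fft.fft(h))[: levels // 2]
    k = np.arange(levels // 2)
    return float((k * H).sum() / H.sum())
```

Because additive-noise embedding low-pass filters the histogram, the high-frequency part of the HCF is attenuated and the COM of a stego image tends to shift towards lower frequencies.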
The detection accuracy reported exceeded 95 per cent when the embedding rate
was 1.0 bpp. Unfortunately, the authors did not test for a smaller embedding
rate. In most cases, decreasing the embedding rate reduces the detection accuracy.
Further, only 24 images were used to test the detection accuracy; such a small
image set may not fully represent the actual accuracy.
However, the use of the COM as a feature in [48] has informed much subsequent
research. For instance, Shi et al. [100] used a set of statistical moments as
the
features in their blind steganalysis. First, the authors use the Haar wavelet to
decompose the image. After the decomposition, eight wavelet subbands are pro-
duced and a discrete Fourier transform is applied to the probability density
function of these subbands. The same discrete Fourier transform is applied to
the
given image as well. Note that these transformations produce nine characteristic
functions. Finally, the first and second orders of statistical moments are computed
from the characteristic functions.
In a different work [115], Xuan et al. developed an enhanced version, based on
statistical moments. Enhancement is achieved with an additional level of wavelet
decomposition. The third order is used in addition to the first two orders of
statistical moments. The reported experimental results show improvements in
detection accuracy and an ability to detect more steganographic types.
Shi et al. further improved the use of statistical moments as features [101]. The
main difference compared to [100] and [115] is the incorporation of a prediction-
error image. The prediction-error image is obtained from the pixel-wise difference
between the given image and its predicted version. The prediction algorithm is
based on a predefined relationship within a block of four neighbouring pixels.
The statistical moments in [101] are computed from two image components: the
given image and the prediction-error image. The procedures to compute the sta-
tistical moments are the same as the procedures used in [115], obtaining a 78-
dimensional feature set (39 features from each image component).
The reported experimental results are promising. However, the detection
accuracy is unclear for a wider range of embedding rates, since only certain
percentages of hidden messages were tested.
Research similar to that developed in [101] can be found in [15]. The authors of
[15] raised concerns about precision degradation in the first-order statistic
when a wavelet is used (i.e., the wavelet coefficients are floating-point
values). Hence, the co-occurrence matrix (with discrete integer entries) was
used instead of wavelet decomposition.
Inspired by the work in [101], Chen et al. enhanced and applied the statisti-
cal moments on JPEG image steganalysis [14]. This enhancement involves the
incorporation of additional high-order statistics.
In their work, the first feature set is inherited directly from [101]. The same feature
extraction procedure, with some modification in the prediction algorithm, is used
to form the second feature set. Note that the second feature set is extracted from
the absolute value of non-zero DCT coefficients. For the third feature set, the
same set of non-zero DCT coefficients and wavelet subbands are used to construct
three co-occurrence matrices. These co-occurrence matrices are transformed into
the characteristic functions and the statistical moments are calculated from these
characteristic functions.
According to the authors, it is crucial to use higher-order statistics as the fea-
tures because some modern steganography, such as OutGuess and Model-based
Steganography, tries to preserve the first-order statistics. This may cause the
first-order statistical features to become less effective. Hence, it is suitable to
incorporate co-occurrence matrices as features.
The statistical moments computed from characteristic functions are more effective
than those computed from image histogram (i.e., the image probability density
function). The main difference between moments of characteristic functions and
image histogram is the variance proportionality (i.e., 1σand σ, respectively). This
means moments of characteristic functions are determined by a smaller variance
distribution. Moments of image histogram are determined by a larger variance
distribution. Since data hiding involves the addition of smaller variance noise, it
is clear that the effect will be reflected more strongly in moments of characteristic
functions. Hence, moments of characteristic functions are more sensitive to data
hiding. This claim has been verified in [115].
Wavelet subbands
It is well known that natural images exhibit strong higher-order statistical reg-
ularities and consistencies. Thus, wavelet decomposition is often used to repre-
sent these characteristics for various image processing purposes. It is also well
known that steganographic embedding significantly disturbs these statistical
characteristics. Hence, it is natural to employ wavelet decomposition to
detect such disturbances.
The first steganalysis technique using wavelet decomposition was developed by
Farid [32, 33]. In his work, quadrature mirror filters (QMFs) are used to decompose
a given image into multiple scales and orientations of wavelet subbands, obtaining
nine wavelet subbands. A quadrature mirror filter bank is formed by the
combination of low- and high-pass decomposition filters and their associated
reconstruction filters, which produces three different directions (i.e.,
horizontal, vertical and diagonal).
An illustration of the decomposition is shown in Figure 3.7.

[Figure: a three-level wavelet decomposition pyramid showing subbands H1, V1, D1 through H3, V3, D3]

Figure 3.7: Illustration of wavelet decomposition. Hi, Vi and Di denote the horizontal, vertical and diagonal subbands, respectively. The index i indicates the scale

Farid also used a linear predictor to compute the log errors from the
magnitudes of the coefficients in each subband. A linear predictor is defined
as a linear combination of scalar weighting values and a subset of
neighbouring coefficients. This results in another nine sets of log errors
(i.e., one from each of the nine wavelet subbands).
Finally, the mean, variance, skewness and kurtosis are used to characterise the
wavelet coefficient distribution in all nine subbands. The same statistics are used
to characterise the nine sets of log error distributions. Combining these statistics
forms a 72-dimensional final feature set.
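The four per-subband statistics can be computed directly, as in the generic sketch below (our own code; note the kurtosis here is the non-excess form, which may differ from the exact normalisation used in [32, 33]):

```python
import numpy as np

def four_stats(x):
    # mean, variance, skewness and kurtosis of a coefficient array
    x = np.asarray(x, dtype=float).ravel()
    mu = x.mean()
    var = x.var()
    sd = np.sqrt(var)
    skew = ((x - mu) ** 3).mean() / sd ** 3
    kurt = ((x - mu) ** 4).mean() / var ** 2
    return [mu, var, skew, kurt]
```

Applying this function to the nine subbands and the nine sets of log errors yields the 2 × 9 × 4 = 72 features of Farid's final feature set.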
In subsequent work [74], Lyu and Farid extended the wavelet statistics to include
the colour components of an image. The wavelet decomposition process used is
the same as in their prior work. This means that each colour component will
be treated as a greyscale image and is decomposed into wavelet subbands. For
example, in a colour image consisting of red, green and blue components, each is
processed independently as a greyscale image. However, the main difference in [74]
is the second part of the feature set—the log errors. The linear predictor used to
compute the log errors has been updated to include neighbouring coefficients from
different colour components. Identical to their prior work, the mean, variance,
skewness and kurtosis are used to characterise the wavelet coefficients and log
error distributions. Accordingly, the dimensionality of the final feature set
increases to 216.
Through extensive work on wavelet decomposition, Lyu and Farid [75] extended
their work to include phase statistics (in addition to their prior work with mag-
nitude statistics). In their work, phase statistics are modelled using the local
angular harmonic decomposition (LAHD). The LAHD can be regarded as a local
decomposition of image structure by projecting onto a set of angular Fourier basis
kernels. Different orders of LAHD can be computed from the convolution of the
image with the derivatives of a differentiable radial filter such as a Gaussian filter.
The feature set has been extended to form a 432-dimensional feature set. The reported experimental results show promise and the ability to detect eight different steganographic methods.
A feature set extracted from wavelet decomposition may seem effective, but the feature dimensionality is normally large, which increases the complexity of the classification process. In addition, a higher-dimensional feature set requires more training samples to achieve stable classification. Other related works that utilise wavelet decomposition to extract feature sets can be found in [100, 14, 120].
Pixel difference
Liu et al. treat the differential operation as a high-pass filtering process when applied to images [70]. This is desirable because it captures the small distortions caused by the embedding operation. In [70], the differential operation is defined as the pixel-wise difference between two neighbouring pixels in the horizontal direction (and similarly in the vertical direction). The operation is applied repeatedly to obtain the second and third orders. The authors call these statistics differential statistics.
In the feature extraction phase, differential statistics and the image pixel proba-
bility mass function are used to construct the first- (histogram for the frequency of
occurrence) and second-order (co-occurrence matrix) statistics. With these first-
and second-order statistics, a discrete Fourier transform is applied to obtain the
respective characteristic functions. Finally, the COM for each characteristic func-
tion is computed as a feature set. Note that the COM features computed are
identical to the features developed by Harmsen and Pearlman [48].
The experimental results reported in [70] suggest that this method can effectively
detect spread-spectrum data hiding. In addition, incorporating the differential
statistics feature set significantly improves the JPEG blind steganalysis developed
in [35]. According to the authors, the differential statistics are used to enlarge
the blockiness effects incurred during embedding. Hence, the enlargement makes
their feature set more sensitive to data hiding.
In a different work [99], Shi et al. developed an effective steganalysis technique
to attack JPEG steganography. The high accuracy achieved by this technique
is due to a sensitive feature set, notably the use of a difference JPEG 2D array.
The JPEG 2D array has the same size as the given image and is filled with the absolute values of the quantised DCT coefficients. Note that the difference JPEG 2D array is very similar to the differential statistics in [70]; more precisely, it is the first-order differential statistic.
Compared with differential statistics, where only the horizontal and vertical direc-
tions are used, Shi et al. included the major and minor diagonal directions. For
each of the four directions, a transition probability matrix is computed. Thresh-
olding is also utilised to achieve a balance between detection accuracy and com-
putational complexity.
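A hedged sketch of this construction, assuming a horizontal scan direction and a hypothetical threshold T = 4 (the scheme uses four directions and its own threshold). The transition probability matrix is formed from row-normalised counts of horizontally adjacent values in the thresholded difference array:

```python
import numpy as np

def transition_matrix(coeffs, T=4):
    """Horizontal transition probability matrix of the thresholded
    difference array (one of the four directions in the scheme).
    Differences are clipped to [-T, T], giving a (2T+1) x (2T+1) matrix."""
    a = np.abs(np.asarray(coeffs, dtype=int))   # absolute quantised DCT values
    d = a[:, :-1] - a[:, 1:]                    # horizontal difference array
    d = np.clip(d, -T, T)                       # thresholding step
    M = np.zeros((2 * T + 1, 2 * T + 1))
    # count transitions between horizontally adjacent difference values
    for u, v in zip(d[:, :-1].ravel(), d[:, 1:].ravel()):
        M[u + T, v + T] += 1
    row = M.sum(axis=1, keepdims=True)
    return np.divide(M, row, out=np.zeros_like(M), where=row > 0)

P = transition_matrix(np.random.randint(-10, 10, (8, 8)))
print(P.shape)  # (9, 9)
```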
The difference JPEG 2D array in [99] reflects the correlations of neighbouring coef-
ficients within an 8×8 block. These correlations are called intra-block correlations.
Later, the authors in [13] include the inter-block correlations. For inter-block cor-
relations, the difference between two coefficients with the same mode is computed
from two neighbouring 8× 8 blocks (as opposed to two immediately neighbouring
coefficients within an 8 × 8 block for intra-block correlations). Figure 3.8 shows
an example of these correlations. Note that there are 64 coefficients per block and
the location of each coefficient within a block is known as the mode. The exper-
imental results indicate a significant improvement from incorporating inter-block
correlations. Clearly, the coefficient differences contribute crucial information for
this improvement.
Figure 3.8: Illustration of the intra- and inter-block correlations in a JPEG image
The effectiveness of differential statistics [70] can be attributed to the net results
after high-pass filtering. More precisely, the results of differentiation will produce
only the variable parts—possibly altered during embedding. This characteristic is
desirable as it amplifies the embedding artefact. Similarly, the alterations incurred
in JPEG steganography can be greatly enlarged and captured. This is the case for
the difference 2D array in [99], where the authors examine the difference between
a DCT coefficient and its neighbouring coefficient. This may work well for images with rich statistics, such as 8-bit images; however, its applicability is questionable for images with more modest statistics, such as halftone images.
Classification
As discussed in Section 2.4, differentiating a stego image from a cover image
involves classification. From the literature, the most commonly used classifiers
include Fisher linear discriminant, artificial neural networks and support vector
machines. These classifiers were discussed in Subsection 2.4.2.
Note that most work on blind steganalysis focuses on feature extraction; the choice of classifier is secondary. Feature extraction is considered more crucial than classifier selection in steganalysis, primarily because detection accuracy depends significantly on the sensitivity of the features to the embedding artefact. Given a sensitive and discriminating feature set, the overall accuracy can then be further improved by tuning the classifier. Switching from one type of classifier to another is straightforward. For example, Farid changed from the Fisher linear discriminant in [33] to an SVM in [73], and in [101] and [14] the initial neural network classifier was later replaced by an SVM.
Cover image estimation
Normally, a cover image is destroyed or kept secret once a stego image is generated
to ensure maximum security of covert communications [92]. This implies that only
one version is typically available. If we have access to both the cover and stego
versions of the image, we can tell the differences easily and the steganography
scheme is considered broken.
In general, the effect of data hiding can be modelled as the effect of additive noise
in an image. It is sufficient to assume that if the additive noise or message is
independent of the cover image, the probability mass function (PMF) of the stego
image is equal to the convolution of additive noise PMF and cover image PMF
[48]. Hence, the cover image can be estimated from the stego image if the additive
noise is eliminated.
This has inspired the incorporation of cover image estimation in much steganalysis
research, such as image calibration [39] and prediction-error [101], to increase fea-
ture sensitivity with respect to the embedded artefacts and to remove the influence
of the image content. This improves the discriminatory power of steganalysis.
Ker [61] applied image calibration (along with another improvement called the ad-
jacency histogram) to improve the blind steganalysis initially developed by Harm-
sen and Pearlman [48]. In his work, a given image is down sampled with an
averaging filter. This down sampling involves addition and rounding operations
on the pixels (or coefficients). These operations even out the additive noise, allow-
ing the cover image to be estimated. However, the efficiency degrades when it is
used to detect stego images with shorter messages. This suggests that calibration
by down sampling may not be the optimal option.
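A minimal sketch of calibration by downsampling, assuming a 2x2 averaging filter with rounding (the exact filter in [61] may differ):

```python
import numpy as np

def calibrate_downsample(img):
    """Estimate the cover image by 2x2 block averaging with rounding;
    the averaging and rounding even out additive embedding noise."""
    x = np.asarray(img, dtype=float)
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    x = x[:h, :w]                               # drop any odd remainder
    avg = (x[0::2, 0::2] + x[1::2, 0::2]
           + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0
    return np.rint(avg).astype(int)

est = calibrate_downsample(np.random.randint(0, 256, (64, 64)))
print(est.shape)  # (32, 32)
```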
In [121], Zou et al. used a simpler method to obtain the estimated cover image,
which they call a prediction-error image. The current pixel is subtracted from
the neighbouring pixel to obtain the prediction. For example, x(i, j)− x(i+ 1, j)
will produce the prediction-error image in the horizontal direction. x(i, j) is the
current pixel at location i and j. The same prediction is applied for vertical and
diagonal directions.
The authors note that the prediction values within a prediction-error image may exhibit high variation. For instance, for an 8-bit image, the prediction values lie in [−255, 255]. To overcome this issue, the authors proposed using a
threshold T . If the absolute value of the prediction is greater than T , it will be set
to zero. The authors suggest that thresholding is effective as a high variation in
the prediction values is mostly caused by the image content (hence, is insignificant
in steganalysis).
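The prediction-error construction with thresholding can be sketched as follows (horizontal direction only; T = 3 is an illustrative choice, not necessarily the value used in [121]):

```python
import numpy as np

def prediction_error(img, T=3):
    """Horizontal prediction-error image, x(i, j) - x(i, j+1) style;
    predictions with magnitude above T are zeroed, since large errors
    stem from image content rather than embedding."""
    x = np.asarray(img, dtype=int)
    e = x[:, :-1] - x[:, 1:]     # horizontal; vertical/diagonal analogous
    e[np.abs(e) > T] = 0         # thresholding step
    return e

e = prediction_error([[10, 11, 50], [12, 12, 13]], T=3)
print(e.tolist())  # [[-1, 0], [0, -1]]
```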
However, it may be possible for adaptive steganography to counteract this thresh-
olding technique by adaptively selecting the region with high variations. This
causes the hidden data to be regarded as the original image content and discarded;
therefore, the detection fails.
3.2.2 Classification of Steganographic Methods
In this case, the adversary holds an image and wants to discover which steganographic technique has been used. Furthermore, we assume that the collection of possible steganographic techniques is public and known to the adversary. This
steganalysis problem has been tackled using the following approaches:
❐ Feature extraction
❐ Multi-class classification
Feature extraction
The first part of multi-class steganalysis is feature extraction. Here, several important feature extraction techniques used in multi-class steganalysis are described and analysed.
Rodriguez and Peterson focus on the determination of the embedding techniques
for JPEG image steganography [95]. The feature set is extracted from the mul-
tilevel energy bands of DCT coefficients. First, all DCT coefficients are arranged
into blocks of 8 × 8 coefficients. Within each block, the DCT coefficients are ar-
ranged using zigzag and Peano scan methods to produce the multilevel energy
bands. Then, the higher-order statistics (such as inertia, energy and entropy) are
computed for each band. These higher-order statistics form the first part of the
feature set. In addition, log errors are computed for the multilevel energy bands.
The log errors are the residuals computed from the DCT coefficients and their
predicted coefficients. The predicted coefficients are obtained from a predefined
subset of neighbouring coefficients. The same higher-order statistics are applied
to these log errors to form the second part of the feature set.
In general, the multi-class steganalysis in [95] performs fairly well. However, the
method performs poorly on some steganographic techniques, such as OutGuess
[90] and StegHide [51], because OutGuess and StegHide can use a similar em-
bedding algorithm, which makes differentiation difficult. On the other hand, this
also shows that the developed feature set may not discriminate sufficiently. The
weakness is manifested in the feature elimination procedure. It is unclear how the
method performs the feature elimination. As a result, important information can
be discarded.
The extensive work in multi-class steganalysis carried out by Pevny and Fridrich
[85, 86, 87, 88, 89] was aimed at determining the types of embedding algorithms
employed in JPEG image steganography. The first version of their multi-class
steganalysis was an enhanced version of their blind steganalysis developed in [35].
Mainly, they utilised their proven discriminative feature set, applying it to multi-
class steganalysis.
There are 23 features and they can be grouped as global histogram, individual
histograms, dual histograms, variation, blockiness and co-occurrence matrices. A
global histogram is the histogram of all DCT coefficients in an image. Individual
histograms are extracted from the DCT coefficients of the five lowest-frequency AC
modes. Note that the mode refers to the position of a DCT coefficient within the block, and there are 64 modes. Figure 3.9 shows an illustration of the modes in a JPEG
image. The next features are dual histograms, which represent the distributions
of eleven selected DCT coefficient values within the 64 modes. Variation is used
to measure the inter-block dependencies among the DCT coefficients. Blockiness
measures the spatial inter-block boundary discontinuities. The discontinuities
are calculated from the spatial pixel values of the decompressed JPEG image.
Finally, the co-occurrence matrices are calculated from the DCT coefficients of
neighbouring blocks. In addition, the estimated cover image is also used in feature
construction to increase the discriminative power of the feature set. To obtain an
estimation of the cover image, the authors decompress the JPEG image, crop
off some portion of the image and re-compress it. This process is called image
calibration.
Figure 3.9: The 64 modes of an 8×8 DCT block. The circle represents the DCT coefficient
Note that their multi-class steganalysis shows promising results. Later, Pevny
and Fridrich extended it to include a more complicated case, which involved the
analysis of double compressed JPEG images [86, 87]. Double compression occurs
when a JPEG image has been decompressed and re-compressed with different
JPEG quality factors after embedding the secret message. This can occur when
F5 or OutGuess is used to generate the stego image. According to the authors,
the double compression effect will make cover image estimation inaccurate. Hence,
the results of steganalysis may be misleading.
The main difficulty lies in the unavailability of the previous or first JPEG quality
factor. To alleviate this problem, the authors use an estimation algorithm from
[72] to estimate the previous JPEG quality factor. The estimation algorithm
utilises a set of neural networks to compute the closest estimation, based on the
Figure 3.10: The modified image calibration steps used for a double compressed JPEG image. The shaded box represents the calibrated image and Q denotes the JPEG quality factor
DCT coefficients of the five lowest-frequency AC modes. With the estimated JPEG quality factor, the updated image calibration process proceeds as follows. First, the JPEG image is decompressed, cropped and re-compressed with the estimated JPEG quality factor. Then, the re-compressed JPEG image is decompressed again and re-compressed a second time using the second JPEG quality factor, that is, the quality factor stored in the JPEG image before calibration.
These steps are shown in Figure 3.10. The rest of the feature extraction process
remains the same.
Pevny and Fridrich later discovered that some important information might be
lost due to the existing feature representation [88]. Hence, they enhanced some
of the features by replacing the L1 norm with the feature differences within a
carefully chosen DCT coefficient range. Only a subset of features is involved in
the improvement—the global histogram, individual histograms, dual histograms
and co-occurrence matrices. According to the authors, their feature set effectively
models the inter-block dependencies of DCT coefficients. To build a strong multi-
class steganalysis requires features that can also model intra-block dependency.
Hence, the authors incorporate the feature set developed in [99] with their ex-
tended feature set. Prior to the incorporation, the feature set developed in [99] is
averaged and calibrated.
Building on this line of work, Pevny and Fridrich combined their techniques into a complete, functional multi-class steganalysis system in [89]. This system was developed to
handle both single and double compressed stego images generated from current
popular steganographic techniques. The system can perform classification under
a diversified range of JPEG quality factors. In addition, for some non-standard
JPEG quality factors tested, the system also shows reliable classification results.
The reported experimental results showed that the system could classify stego images that it had not previously been trained on.
In a different work [31], Dong et al. constructed a multi-class steganalysis based
on the analysis of image run length. This work is an extension of their previous
work [30]. The main contribution of this work is the ability to perform multi-class
steganalysis across different image domains—the same technique can be used to
classify spatial (e.g., BMP) and frequency (e.g., JPEG image) domain images.
This shows the ability to generalise, which is desirable in multi-class steganalysis.
The core feature in their work is the histogram of image run length. Image run
length can be considered a compression technique. A sequence of consecutive
pixels with the same intensity along a direction can be represented compactly as a
single intensity value and count. This forms a matrix r(g, ℓ) with intensity value
g and count ℓ as the axes. For an 8-bit image and a maximum count of run length
L, the histogram of the image run length can be defined as follows:
H(ℓ) = ∑_{g=0}^{255} r(g, ℓ),   1 ≤ ℓ ≤ L,   (3.3)
Note that the histogram count defined in Equation (3.3) is for one direction. Other
directions (e.g., 0◦, 45◦, 90◦ and 135◦) are computed in a similar manner. Based on
the histograms of image run length, several higher-order moments are computed
and used as a feature set.
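A sketch of the horizontal run-length histogram underlying Equation (3.3), with an illustrative maximum run length L = 16 (other directions are computed analogously):

```python
import numpy as np

def run_length_histogram(img, L=16):
    """Histogram H(l) of horizontal run lengths (the 0-degree direction):
    counts of maximal runs of consecutive equal-intensity pixels."""
    H = np.zeros(L + 1)
    for row in np.asarray(img):
        run = 1
        for a, b in zip(row[:-1], row[1:]):
            if b == a:
                run += 1            # extend the current run
            else:
                if run <= L:
                    H[run] += 1     # close the run, record its length
                run = 1
        if run <= L:
            H[run] += 1             # close the final run of the row
    return H[1:]                    # H(1) ... H(L)

H = run_length_histogram([[0, 0, 0, 1, 1, 2]])
print(H[:4].tolist())  # [1.0, 1.0, 1.0, 0.0]: one run each of lengths 1, 2, 3
```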
The embedding of a secret message alters the distribution of the run length. More
precisely, the original pixel sequence, with identical intensity, will be turned into
different shorter sequences. These changes will be significantly reflected in the
image run length.
The reported experimental results show comparable performance in the spatial
and frequency domains; however, these results may not be representative because
the experimental message lengths are arbitrary. It is well known that detection
accuracy is influenced significantly by the size of the embedded message. It will
be useful to determine a fair measurement in terms of message length that can
be used in both image domains such that the detection performance accurately
reflects the discriminative power of the multi-class steganalysis.
Multi-class classification
The second part of multi-class steganalysis is classification. The most common
classifier used in multi-class steganalysis is the support vector machine. In general,
there are two methods for constructing a multi-class classifier: the all-together
method and the method that combines several two-class classifiers. The all-
together method can be considered one that solves the entire classification with
a single optimised classifier. Clearly, this method requires more computational
resources and involves a complex classifier. The other method solves multi-class
classification problem by combining several two-class classifiers (for brevity, we
refer to this as the multiple two-class classifiers method). This method requires
relatively less computational resources and provides competitive classification ac-
curacy.
According to the review in [53], there are three multiple two-class classifier ap-
proaches: one-against-one, one-against-all and directed acyclic graph support vec-
tor machine (DAGSVM). Based on the findings in [53], one-against-one is prefer-
able and more suitable for practical applications. Examples of work using this
approach in multi-class steganalysis can be found in [97, 89, 31].
The first step in the one-against-one approach is to perform a normal two-class
classification among the classes. Every two-class classifier is trained to classify one
class against each of the other classes. For instance, the first two-class classifier
is assigned to distinguish between cover images and type-1 stego images. The
next two-class classifier is assigned to distinguish between type-1 and type-2 stego
images and so on until all pairs of combinations are formed. This method uses
K(K−1)/2 two-class classifiers for all pairs of classes, where K is the total number
of classes. The conceptual diagram for this approach is shown in Figure 3.11.
Figure 3.11: The multi-class classification on the left is formed by a combination of several two-class classifications on the right
The second step is to employ a strategy to determine the correct class for the
image. A commonly used strategy is majority voting or the max-wins strategy. In
the majority-voting strategy, the results from each two-class classifier are obtained
and accumulated. From the accumulated results, the class receiving the highest
count is assigned as the correct class. If two classes obtain the same highest count,
one class is randomly selected.
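The max-wins voting step can be sketched as follows. The `pairwise` dictionary of trained two-class decision functions is a hypothetical interface; the toy classifiers here follow a fixed rule rather than being trained SVMs:

```python
from collections import Counter
from itertools import combinations
import random

def one_vs_one_predict(x, classes, pairwise):
    """Majority (max-wins) voting over K(K-1)/2 two-class classifiers.
    pairwise[(a, b)] is a two-class decision function returning
    either a or b for sample x (hypothetical interface)."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    return random.choice(winners)   # ties broken at random

# Toy example: three classes, so 3 * 2 / 2 = 3 two-class classifiers,
# each voting for the first class of its pair.
classes = ["cover", "stego-1", "stego-2"]
pairwise = {pair: (lambda x, p=pair: p[0]) for pair in combinations(classes, 2)}
print(one_vs_one_predict(None, classes, pairwise))  # cover (wins 2 of 3 votes)
```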
Clearly, an embedding algorithm alters a cover image one way or another. This
implies that, with an effective feature set, the distance in the feature space between cover images and all types of stego images should be large, whereas the distances among the different types of stego images should be comparatively small. Therefore, it is more efficient to use blind steganalysis
(i.e., two-class classification) to initially separate cover images from stego images.
Then, multi-class classification can determine the type of embedding algorithm.
This scheme reduces both the number of classes and the number of classifiers. Its efficiency has been demonstrated in [31].
3.2.3 Estimation of Message Length
If the steganographic method is known to the adversary, he can begin to recover
some attributes about the embedded message. For instance, the steganalysis tech-
nique used may provide the adversary with an estimate of the number of em-
bedding changes. For that, the adversary can approximately infer the embedded
message length. In the following, we discuss several well-known steganalysis techniques that estimate the length of an embedded message.
Note that the LSB embedding algorithm that overwrites the pixel LSBs will not
change the grand total frequencies of pixel intensities. Only the frequencies of
occurrence are swapped between these intensities. In other words, when embed-
ding occurs, the frequencies of occurrence for odd pixel intensities are transferred
to the corresponding even pixel intensities and vice versa. These frequencies of
odd-even pixel intensities are called pairs of values (PoV), (2i, 2i+1). This change
involves swapping the frequencies of occurrence within each PoV and the sums of
the frequencies in every PoV remain the same. If the message bits are uniformly
distributed (typically the case, because the message is encrypted), the frequen-
cies of the intensities in each PoV will become identical after embedding (refer to
Figure 3.12).
From this observation, Westfeld and Pfitzmann [111] developed a steganalysis
technique based on the chi-square test (known as a chi-square or χ2 attack). The
Figure 3.12: (a) A portion of an image histogram before embedding. (b) The same portion of the image histogram after embedding. Notice that the histogram bins of each PoV have been equalised
chi-square test measures the degree of similarity between the observed sample
distribution and the expected frequency distribution. The observed sample dis-
tribution is obtained from the given image distribution. The expected frequency
distribution is computed from the arithmetic mean of the PoVs. The χ2 attack
can estimate the length of an embedded message as long as the message is em-
bedded sequentially. However, the attack is unable to provide reliable detection if
the message bits are randomly embedded in the image.
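A sketch of the core computation, producing only the χ2 statistic over the PoVs of a greyscale histogram (the published attack additionally converts the statistic into a p-value via the χ2 distribution, evaluated over growing sample sizes):

```python
import numpy as np

def chi_square_pov(hist):
    """Chi-square statistic over pairs of values (2i, 2i+1): the expected
    frequency of each PoV member is the pair's arithmetic mean; a small
    statistic suggests equalised PoVs, i.e. sequential LSB embedding."""
    h = np.asarray(hist, dtype=float)
    even, odd = h[0::2], h[1::2]
    expected = (even + odd) / 2.0
    mask = expected > 0                    # skip empty pairs
    chi2 = np.sum((even[mask] - expected[mask]) ** 2 / expected[mask]) \
         + np.sum((odd[mask] - expected[mask]) ** 2 / expected[mask])
    return chi2

print(chi_square_pov([400, 100, 300, 300]))  # 180.0 (unequal PoVs)
print(chi_square_pov([250, 250, 300, 300]))  # 0.0 (equalised PoVs)
```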
To address this weakness, Provos and Honeyman [90, 91] extended the chi-square
attack. In contrast to the previous chi-square attack (where the sample size was
increased from a fixed start location along the test), the extended chi-square attack
uses a fixed sample size that moves over the entire image. The start location
for the fixed sample size is set at the beginning of an image and moved with a
constant distance along the test. Another difference is that, instead of computing
the PoV arithmetic mean, the expected frequency distribution is obtained from
the arithmetic mean of pairs of unrelated coefficients.
Although the χ2 attack is effective against generic LSB replacement steganography, it fails when a steganographic scheme employs a more sophisticated algorithm such as F5 [110].
For that, Fridrich et al. [39, 40] developed a steganalysis technique targeted specif-
ically to attack F5. This technique can estimate the length of a hidden message
embedded in the JPEG image.
The main idea is based on the proportionality of a defined macroscopic quantity
and the hidden message length. In other words, the size of the embedded message
will be reflected in the macroscopic quantity. Hence, the hidden message length
can be determined by computing the macroscopic quantity.
The first step in this technique is to estimate a copy of the cover image from the given stego image. The estimation is carried out by cropping four pixels in both
the horizontal and vertical directions after decompressing the stego image. The
cropped image is then recompressed with the same JPEG quantisation table from
the stego image. Note that this process is the image calibration discussed in the
preceding subsection.
In the next step, the authors use the histograms of several low-frequency DCT
coefficient modes as the macroscopic quantity. The histograms used are from the
given stego image and the estimated cover image. The modification caused by the
embedding will be reflected on the distribution of the histograms. Hence, based
on the histograms, the modification rate can be determined. Finally, with the
modification rate, the size of the hidden message can be computed.
In [38], Fridrich et al. launched an attack on OutGuess using a similar concept.
They started by determining a macroscopic quantity that progressively changes
with the size of the embedded message.
Due to the LSB flipping algorithm of OutGuess, embedding increases the spatial
discontinuities at the boundaries of all 8 × 8 blocks. Hence, the authors used
blockiness as the macroscopic quantity for measuring the degree of change that
occurred at the boundaries. In Figure 3.13, we show an illustration of the 8 × 8
block boundaries, where the blockiness measurement is calculated.
Figure 3.13: The shaded regions denote the boundaries of 8×8 blocks in a decompressed JPEG image
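One plausible blockiness measure can be sketched as the sum of absolute pixel differences across the horizontal and vertical 8x8 block boundaries (the exact formula in [38] may differ):

```python
import numpy as np

def blockiness(img):
    """Sum of absolute pixel differences across the horizontal and
    vertical boundaries of 8x8 blocks in a decompressed image."""
    x = np.asarray(img, dtype=float)
    b = 0.0
    for r in range(7, x.shape[0] - 1, 8):   # rows just above a block boundary
        b += np.abs(x[r, :] - x[r + 1, :]).sum()
    for c in range(7, x.shape[1] - 1, 8):   # columns just left of a boundary
        b += np.abs(x[:, c] - x[:, c + 1]).sum()
    return b

x = np.zeros((16, 16))
x[8:, :] = 5                                # step exactly on a block boundary
print(blockiness(x))  # 80.0 (16 column differences of 5)
```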
According to the authors, the increase of blockiness is expected to be smaller in
a stego image than in a cover image when a full-length dummy message (random
bits) is artificially re-embedded using OutGuess. This is because of the partial can-
cellation effect on the stego image. For example, the LSB of a pixel xi is changed
from zero to one after embedding. For re-embedding a full-length message, the
LSB of pixel xi is changed back from one to zero.
Note that this attack also depends on the estimation of the cover image, which
is the same technique used in [39]. With the blockiness measurement and the
re-embedding of a full-length message on the given stego image and the estimated
cover image, a linear interpolation is used to estimate the length of the embedded
message.
In another example, He and Huang [50, 49] analysed non-adaptive stochastic mod-
ulation steganography [36] and showed how to estimate the length of a hidden
message. Stochastic modulation steganography is a noise-additive steganography,
where a signal with specific probabilistic distribution is modulated and added to
carry the message bits. The signal in this context is known as stego noise.
The attack is based on the differing probability distributions of pixel differences for
cover and stego images. More precisely, the probability distribution of pixel differ-
ence for a cover image closely follows a generalised Gaussian distribution (GGD),
while the probability distribution of a stego image reflects the statistical charac-
teristics of a hidden message. Note that, for non-adaptive stochastic modulation
steganography, the probabilistic distribution of a stego image’s pixel difference is
a convolution of the probabilistic distributions of a cover image’s pixel difference
and a stego noise difference. Thus, the attack starts by establishing a model to
describe the statistical relationship among the cover image, stego image and stego
noise. Next, the required distributional parameters are estimated from the given
stego image. Then, based on the distributional parameters, the authors employ a
grid search and chi-square goodness of fit test approach to estimate the length of
the embedded message.
The experimental results reported show promising detection accuracy. The au-
thors mention that this steganalysis technique is not only effective for noise addi-
tive steganography, but also suitable for other types of non-adaptive steganography
(e.g., LSB-based steganography and ±k steganography). Unfortunately, no further details regarding this are provided. This technique depends significantly on the assumption that the pixel difference of a cover image is accurately modelled by a GGD. However, this assumption will likely cause the technique to fail when the analysed cover image is a binary image, owing to the modest statistical characteristics of binary images.
Jiang et al. [57] launched an attack on boundary-based steganography that embeds
a secret message in a binary image. Their attack hinges on the observation that
embedding disturbs pixel positions and this degrades the fit of the autoregressive
model on binary object boundaries.
The attack works by assuming that the boundaries of characters or symbols in a
textual document can be modelled by a cubic polynomial. This allows a bound-
ary pixel to be estimated from its neighbouring pixels through an autoregressive
process. An estimation error vector is computed from the given and estimated
boundary pixels. Then the mean and variance of the estimation error vector are
calculated. According to their experiments, the mean and variance increase pro-
portionally with respect to the relative message length. Hence, based on some
testing samples, a linear equation is defined for message length estimation. How-
ever, this attack is not applicable when the object boundaries cannot be modelled
by a cubic polynomial.
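The final linear estimation step can be sketched as a least-squares fit over calibration samples; the numbers below are illustrative, not taken from [57]:

```python
import numpy as np

# Calibration samples: (relative message length, observed error variance).
# The attack fits a line variance = a * length + b to testing samples,
# then inverts it to estimate the length for an observed variance.
lengths = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
variances = np.array([0.10, 0.22, 0.35, 0.47, 0.60])

a, b = np.polyfit(lengths, variances, 1)   # least-squares linear fit
estimate = (0.40 - b) / a                  # invert for an observed variance
print(round(estimate, 2))  # ≈ 0.6
```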
In [56], Jiang et al. launched another attack on binary image steganography. This
attack is based on the idea that the entropy of a stego image is a monotonically
increasing function of the embedding rate. The JBIG2 binary image compression algorithm is used to capture the entropy. This compression algorithm establishes a
quantitative relationship between the compression rate and embedding rate. Thus,
the estimate of message length can be derived from the computed compression rate.
The message length estimation steganalysis techniques mentioned above mainly target a specific steganographic method. The work developed in [71] generalises the estimation technique so that it can be applied to a wider range of steganographic methods. Indeed, it can be considered a multi-class steganalysis technique that uses a
multi-class classifier to estimate the hidden message length. The authors employ
SVM classifiers with one-against-all strategy to perform the multi-classification
tasks. The measurement of standard mean square error is modified and used as a
feature set.
Unfortunately, the use of a multi-classification technique to estimate the message
length is of limited use and impractical. Unlike the multi-class steganalysis dis-
cussed in Subsection 3.2.2, where the number of classes is small, treating each
different message length as a single class will contribute to a large number of
classes. For instance, if there are n classes of steganographic methods with m
different lengths, the multi-classifier will be required to classify n × m different
classes. Clearly, when n and m increase, the classification will become ineffective
as the extracted feature points may have significant overlaps. Therefore, the mes-
sage length estimation technique developed in [71] will likely become unreliable
when the number of classes is large.
3.2.4 Identification of Stego-Bearing Pixels
When the adversary is certain about and has knowledge of the steganographic
method used, he has the opportunity to identify which pixels carry the message
bits.
The work in [27] is motivated by the concept of outlier detection. A model of image
distribution is built first. After that, any pixel that deviates from the model is
identified as an outlier. Together with the outlier detection, the authors of [27]
opted to utilise an image restoration technique. Their idea is that a pixel altered
to carry a message bit will deviate from the image distribution and be identified
as an outlier. When the image restoration technique is applied, the pixel (outlier)
will be automatically removed. Obtaining the list of removed pixels identifies the
locations of stego-bearing pixels.
Due to the wide variety of image content, a non-parametric model will be more
suitable and useful than a parametric model. Hence, the image pixel energy has
been used to model the distribution. In the restoration process, each pixel is
examined and may be conditionally updated to minimise the pixel energy. The
whole process is repeated until convergence occurs.
The developed technique is reported to work with greyscale and colour palette
images, such as GIF. However, the authors report that their technique may be
defeated if the message is adaptively embedded in the high-energy regions of an
image. This also implies that identification of stego-bearing pixels becomes unre-
liable when an image with rich texture content is used as the cover image.
Kong et al. developed a steganalysis technique to identify the region in a colour
image where a secret message is embedded [68]. This technique is specifically
targeted to attack steganography with a sequential embedding algorithm. The
idea comes from the fact that when the colour components (e.g., red, green and
blue) of the colour image are altered independently, the smoothness of the colour
will be disturbed. According to the authors, this observation becomes prominent
under the investigation of a different colour system (e.g., HSI, YUV and YCbCr)
which uses luminance, hue and saturation to describe the colours.
Kong et al. suggest that, in general, the hue of a cover image varies slowly and
tends to be constant in a small neighbourhood of pixels. This is no longer true
when a hidden message is embedded. Thus, when the coherence of the hue in a
region under examination exceeds a certain threshold, there is a good reason to
suspect that it contains bits of a hidden message.
Kong et al.'s technique can be summarised as follows. Given a
colour image, their technique will partition the image into blocks and examine
each block separately. The pixels within each block are divided into two distinct
groups: coherence and incoherence. A pixel is assigned to the coherence group
if the maximal difference of hue between this pixel and its neighbouring pixels
is less than a threshold. In addition, at least one neighbouring pixel with the
same hue as that pixel must exist. Any pixel that fails to fulfil these conditions
will be assigned to the incoherence group. The ratio of the coherence group to
the incoherence group determines if a block should be labelled as a stego-bearing
region.
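The per-block grouping step can be sketched as follows. This is a toy illustration operating on a precomputed hue array; the coherence threshold t and the block-labelling rule are hypothetical stand-ins, since [68] uses three tuned thresholds.

```python
import numpy as np

def coherence_ratio(hue_block, t=0.05):
    """Fraction of pixels in the coherence group: the hue differs from every
    4-neighbour by less than t AND at least one neighbour has an identical
    hue. (The threshold t is a hypothetical choice for illustration.)"""
    H, W = hue_block.shape
    coherent = 0
    for x in range(H):
        for y in range(W):
            neighbours = [hue_block[i, j]
                          for i, j in ((x - 1, y), (x + 1, y),
                                       (x, y - 1), (x, y + 1))
                          if 0 <= i < H and 0 <= j < W]
            diffs = [abs(hue_block[x, y] - v) for v in neighbours]
            if max(diffs) < t and any(d == 0 for d in diffs):
                coherent += 1
    return coherent / (H * W)

smooth = np.full((8, 8), 0.3)                    # constant hue: fully coherent
rng = np.random.default_rng(0)
disturbed = rng.uniform(0.0, 1.0, size=(8, 8))   # independent hues: incoherent
```

A block whose ratio of coherent to incoherent pixels falls past the (tuned) labelling threshold would then be marked as a stego-bearing region.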
Note that this technique involves a high degree of threshold dependency. For exam-
ple, the technique requires three different thresholds to identify the stego-bearing
region. This may be a drawback in practice, especially with a wide variety of image
content. Hence, careful selection of cover images renders this technique ineffec-
tive. Furthermore, this technique only works for steganography with sequential
embedding. If the embedding is random, then this attack will fail.
In [62], Ker argues that it is possible to have a situation, where several different
cover images are used for a batch of secret communications. These images are the
same size, but embedded with different messages. It is very likely that the same
stegokey will be used for this batch of communications.
We can relate a simple but plausible scenario to this assumption. For instance, a
batch of secret communications can use a set of different images, captured with
the same settings of a digital camera. This produces images with the same size.
For security reasons, random embedding algorithms are preferable. As usual,
the random embedding is controlled by a stegokey and it is quite possible the
same stegokey is reused for the entire batch of communications. This will result
in different messages embedded in the same locations across different images. In
addition, it is also possible that a sequential embedding algorithm is used, resulting
in embedding with the same fixed pixel locations.
This is what inspired Ker [62] to develop a technique to identify the locations of
these stego-bearing pixels. In this work, Ker employed a weighted stego image
(WS), initially developed in [37] and later improved [63]. The analysis is based on
the residuals of WS. The residual of a WS is the pixel-wise difference between the
stego image and the estimated cover image. The residual at the ith pixel can be
defined as follows:
r_i = (s_i − s̄_i)(s_i − c_i),   (3.4)

where s_i is the ith pixel of the stego image, s̄_i is the corresponding stego pixel
with its LSB flipped and c_i is the pixel of the estimated cover image.
With access to multiple stego images, as in the scenario described above, the mean
of the residual at the ith pixel can be computed as follows:
r̄_i = (1/N) Σ_{j=1}^{N} r_ij,   (3.5)

where N is the total number of stego images and r_ij is obtained as in Equation (3.4)
for the jth stego image:

r_ij = (s_ij − s̄_ij)(s_ij − c_ij).   (3.6)
When N is sufficiently large, the mean of the residual will provide strong evidence
for separating stego-bearing pixels from normal pixels.
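Equations (3.4)–(3.6) translate directly into code. A minimal sketch, assuming 8-bit greyscale stego images and a cover estimate supplied by some external estimator (the toy batch below simply uses the exact cover):

```python
import numpy as np

def ws_residual(stego, cover_est):
    """r_i = (s_i - sbar_i)(s_i - c_i), Eq. (3.4); sbar is the stego image
    with every least significant bit flipped."""
    s = stego.astype(np.int64)
    sbar = s ^ 1
    return (s - sbar) * (s - cover_est.astype(np.int64))

def mean_residual(stegos, cover_ests):
    """Pixel-wise mean of the residuals over N stego images, Eq. (3.5)."""
    return np.mean([ws_residual(s, c) for s, c in zip(stegos, cover_ests)],
                   axis=0)

# Toy batch: the same 50 pixel positions carry LSB-replacement payload in
# every image, mimicking a reused stegokey.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16)).astype(np.int64)
stegos = []
for _ in range(40):
    s = cover.copy()
    flat = s.reshape(-1)                # view: writes go through to s
    flat[:50] = (flat[:50] & ~1) | rng.integers(0, 2, size=50)
    stegos.append(s)

means = mean_residual(stegos, [cover] * 40).reshape(-1)
```

With the exact cover, unchanged pixels have zero residual while payload pixels average about 0.5, so ranking pixels by the mean residual exposes the embedding locations.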
Note that the analysis developed in [62] is most effective for LSB replacement
steganography. It may become ineffective for other steganographic methods, such
as LSB matching. Aware of this limitation, Ker and Lubenko ex-
tended the work to cover the analysis of LSB-matching steganography in [64].
They use the residuals of wavelet absolute moments (WAM), which are derived
from the feature set developed for blind steganalysis in [44]. The residuals of
WAM are computed as follows. Given an image, one level of wavelet decomposition
using an eight-tap Daubechies filter is employed. The decomposition produces four
subbands—the low frequency, vertical, horizontal and diagonal subbands. Then,
the authors use a quasi-Wiener filter to compute the residuals of WAM from the
vertical, horizontal and diagonal subbands. The low-frequency subband
is not used; it is set to zero. The residuals of WAM and the zeroed low-
frequency subband are reconstructed through the inverse of the wavelet transform.
This reconstruction produces something similar to a spatial domain image that
Figure 3.14: The extraction of the residual image. L, V, H and D denote the low-
frequency, vertical, horizontal and diagonal subbands, respectively. L' is the zeroed-
out low-frequency subband. R[V], R[H] and R[D] denote the residuals of WAM
from the vertical, horizontal and diagonal subbands, respectively.
the authors call a spatial domain residual image. The whole process is depicted
in Figure 3.14.
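The pipeline of Figure 3.14 can be sketched with PyWavelets. This is a simplified version in which the quasi-Wiener filter is reduced to single-window local-variance Wiener shrinkage; the noise variance and window size are assumptions of this sketch, not values from [64].

```python
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def residual_image(img, noise_var=0.5, win=3):
    """One-level db4 (eight-tap Daubechies) decomposition, Wiener-style
    residuals of the three detail subbands, zeroed low-frequency subband,
    then inverse transform back to the spatial domain."""
    cA, (cH, cV, cD) = pywt.dwt2(np.asarray(img, dtype=float), 'db4')

    def wiener_residual(h):
        # Local signal-variance estimate over a win x win neighbourhood.
        local_var = np.maximum(
            uniform_filter(h * h, win) - uniform_filter(h, win) ** 2, 0.0)
        # Shrinkage keeps the estimated signal; the residual is what is left.
        return h - h * local_var / (local_var + noise_var)

    zeroed = np.zeros_like(cA)                     # L' in Figure 3.14
    return pywt.idwt2(
        (zeroed, (wiener_residual(cH), wiener_residual(cV),
                  wiener_residual(cD))), 'db4')
```

A flat image yields an all-zero residual while embedding-like noise survives the filtering; averaging the absolute residual per pixel over a batch of stego images then ranks candidate stego-bearing pixels.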
Similar to [62], and under the same assumption that multiple stego images
are available and the same stegokey is reused for the embedding, the identification
of stego-bearing pixels proceeds as follows. It starts by computing the mean of
the absolute residual for every pixel across all stego images; this mean is computed
analogously to the mean of the residual defined in Equation (3.5). A pixel used to
carry a message bit will have a higher mean of the absolute residual.
Thus, the locations of stego-bearing pixels can be identified by selecting the p
pixels with the highest mean of the absolute residual. According to the authors,
p can be estimated by a quantitative steganalysis technique (the analysis discussed
in Subsection 3.2.3). However, any inaccuracy in the estimate of p will cause
inaccurate identification of stego-bearing pixels.
One important observation is that cover image estimation has a significant effect
on the analysis. In general, the accuracy of the analysis can be greatly improved
by using a more accurate estimation technique. In [62, 64] it is very important
to keep the number of required stego images to a minimum. For example, if
several hundred stego images are required to obtain an accurate identification, the
technique will be of limited use.
3.2.5 Retrieval of Stegokey
In a scenario where the embedded message is not encrypted and the key space
is small, it is very likely that the adversary can mount a dictionary attack or
brute-force search for the stegokey. For example, for every stegokey tried, the
adversary gets an alleged message and the correct stegokey would be revealed
when a meaningful message is obtained. The following examples show a more
advanced version of this type of attack.
Fridrich et al. [41] developed a steganalysis technique that can retrieve the ste-
gokey. The technique was developed with several assumptions: (i) the retrieved
stegokey is the seed of a pseudorandom number generator (PRNG), and (ii) the
steganalysis is independent of the encryption algorithm. As the steganography
may use a mapping component, such as a hash function, to map the password
to the seed, it is reasonable to assume the retrieved stegokey is the seed of a
PRNG rather than the password. Normally, a message will be encrypted before
the embedding algorithm is applied. For that, a stegokey (or stegokeys) is used
in the encryption algorithm as well as the PRNG that generates the embedding
path in the embedding algorithm. Clearly, the computation of stegokey retrieval
may become infeasible—this is where the second assumption comes into play. The
technique involves only finding the seed used in the embedding algorithm and
discarding the encryption algorithm.
Given an image with N pixels, where m < N pixels are randomly selected during
embedding to carry the secret message bits, the embedding path generated by the
stegokey is a random path. The steganalysis technique starts by taking n samples
of pixels where n < m. The n samples are selected randomly from the stego image
and the random selection path is generated from a seed kj. kj is from the stegokey
space. The correct stegokey is determined through a brute force search within the
stegokey space for different kj. The distributions of n samples for the correct and
incorrect stegokeys are different; therefore, it is suitable to use their probability
density functions (PDFs) as the statistical properties. Finally, the chi-square test
is used. For every tested stegokey, the chi-square statistic is obtained and the
outlier will be identified as the correct stegokey.
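The search loop can be illustrated with a toy model. This is not Fridrich et al.'s exact statistic (they compare the PDFs of the sampled pixels); the cover model with constant LSBs, the seed-to-path mapping and the plain LSB-uniformity chi-square below are all assumptions made to show how the statistic for the correct seed stands out as an outlier.

```python
import numpy as np

N, m, n = 10_000, 1_000, 200      # image pixels, embedded pixels, sample size

def path(seed):
    """Pseudorandom embedding path generated from a candidate seed."""
    return np.random.default_rng(seed).permutation(N)

# Toy stego image: cover LSBs are all 0; LSB replacement along the true path
# randomises the LSBs of the first m pixels on that path.
pixels = np.full(N, 128, dtype=np.int64)
true_key = 7
emb = path(true_key)[:m]
pixels[emb] = (pixels[emb] & ~1) | np.random.default_rng(99).integers(0, 2, m)

def chi_square(seed):
    """Chi-square statistic of LSB counts along the alleged path; small
    (uniform LSBs) only when the seed reproduces the embedding path."""
    ones = int((pixels[path(seed)[:n]] & 1).sum())
    expected = n / 2
    return ((ones - expected) ** 2 + ((n - ones) - expected) ** 2) / expected

stats = {k: chi_square(k) for k in range(20)}
best = min(stats, key=stats.get)   # the outlier statistic reveals the seed
```

Every incorrect seed samples mostly unmodified pixels, whose biased LSBs inflate the statistic, while the correct seed's sample looks uniform.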
This technique was tested on two JPEG steganographic methods—F5 and Out-
Guess. Later, they extended the analysis to cover spatial domain image steganog-
raphy [42]. In [42], Fridrich et al. chose generic LSB replacement and LSB match-
ing steganography as the benchmark steganographic methods for testing their
extended steganalysis technique.
Their extended technique adds a pre-processing step. In the pre-processing step,
a non-linear filtering operation is used to increase the signal-to-noise ratio (SNR)
between the stego signal and the cover image. Thus, instead of pixel values,
residuals are used. Residuals are computed as the difference between the pixel
values of the image and its filtered version.
Although both techniques are powerful and can be applied practically to a wider
class of steganography, the following issues may reduce their effectiveness:
i. The embedded message occupies 100 per cent of the image capacity.
ii. Matrix encoding is employed as part of the embedding algorithm.
iii. The speed of the PRNG is reduced (hence the brute-force search time increases
exponentially and makes the technique infeasible).
The authors also noted that their technique could become complicated and difficult
when the stego image is noisy or the stegokey space is huge. However, if multiple
stego images embedded with the same stegokey are available, this will increase the
probability of retrieving the correct stegokey.
A similar analysis is found in [105, 106]. The focus of the analysis is on retrieving
the stegokey in a sequential embedding algorithm. The stegokey is defined differ-
ently and identified as the start-end locations of a consecutive embedding path.
To identify the start-end locations, the analysis employs a sequential analysis tech-
nique called the cumulative sum (CUSUM). The idea is to detect a “sudden jump”
in the statistic, which indicates the existence of a message.
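The "sudden jump" detection can be sketched with a one-sided CUSUM. The drift and alarm threshold below are illustrative choices, and the per-pixel statistic being monitored is abstracted into a plain sequence:

```python
def cusum(samples, drift=0.5):
    """One-sided CUSUM: accumulate deviations above `drift`, resetting at
    zero, so a sustained mean shift makes the statistic climb steadily."""
    s, trace = 0.0, []
    for x in samples:
        s = max(0.0, s + x - drift)
        trace.append(s)
    return trace

# No signal in the first 50 samples, then a mean-1 "message" segment: the
# statistic stays at zero, then jumps once embedding starts.
trace = cusum([0.0] * 50 + [1.0] * 50)
alarm = next(i for i, v in enumerate(trace) if v > 3.0)  # alleged start
```

The index at which the statistic first crosses the threshold locates the start of the sequential embedding; running the scan backwards would locate its end.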
The authors extended the steganalysis by utilising a locally most powerful (LMP)
sequential statistical test. The LMP test is an optimal statistical test for the detection
of weak signals. The extended steganalysis is mainly used to handle the difficulty
of analysing messages with a low signal-to-noise ratio (SNR)1 in a stego image. The
CUSUM was later combined with LMP to form an enhanced steganalysis technique
and its effectiveness tested with spread-spectrum steganography. In addition, it
can detect multiple messages embedded sequentially in different image segments.
The developed steganalysis technique seems to be a useful tool to evaluate the
security level in watermarking applications. However, this technique may not be
suitable for analysing steganography, especially when it uses a random embedding
algorithm. Note that the analysis presented in [105, 106] is more related to the
identification of stego-bearing pixel locations (discussed in Subsection 3.2.4) than
1 Low SNR is often required to maintain the imperceptibility of embedding.
stegokey retrieval.
3.2.6 Extracting the Hidden Message
In general, messages are encrypted using a cryptographically strong encryption
algorithm before embedding, which provides a second layer of security. Therefore,
we might not obtain a meaningful message even after extracting a hidden message
using the steganalysis techniques discussed. Clearly, it is most desirable for the
extracted message to be deciphered as well.
It is reasonable to consider and separate the analysis of steganography into two
phases: steganalysis and cryptanalysis. Steganalysis involves the analysis dis-
cussed from Subsection 3.2.1 until 3.2.5, whereas cryptanalysis deciphers the hid-
den message extracted in the steganalysis phase. Note that if a message is not
encrypted before embedding, cryptanalysis is not required.
Chapter 4
Blind Steganalysis
In general, there are two types of steganalysis—targeted and blind. Targeted
steganalysis is designed to attack one particular embedding algorithm. For exam-
ple, Bohme and Westfeld [7] broke model-based steganography [96] using analy-
sis of the Cauchy probability distribution. In another example, He and Huang
[49] successfully estimated the hidden message length for stochastic modulation
steganography [36], where a specific probabilistic distribution of a signal is modu-
lated and added to carry message bits. Jiang et al. in [57] launched an attack on
boundary-based steganography, which embeds secret messages in binary images.
Their attack hinges on an observation that embedding disturbs pixel positions and
this degrades the fit of an autoregressive model on binary object boundaries. The
work in [41, 42] showed how to estimate the stegokeys used for embedding hidden
messages. Targeted steganalysis can produce more accurate results, but can fail
if the embedding algorithm differs from the target one.
Blind steganalysis can be considered a universal technique that detects differ-
ent types of steganography. Because blind steganalysis can detect a wider class
of steganographic techniques, it is generally less accurate compared to targeted
steganalysis. However, blind steganalysis can detect a new steganographic tech-
nique, when there is no targeted steganalysis available. Thus, blind steganalysis
is an irreplaceable detection tool if the embedding algorithm is unknown or se-
cret. Successful blind steganalysis techniques include the feature-based steganal-
ysis proposed in [35], where a set of effective statistical characteristics (features)
is extracted to differentiate cover images from stego images. A similar technique
relying on pixel differences was used in [99, 70] to detect hidden messages—this
feature was proven to work well. Meng et al. employ a run length histogram in
their steganalysis to detect a hidden message in binary images [77].
In this work, we further study blind steganalysis and its effectiveness in detecting
a secret message embedded in a binary image. To confirm that our attack works,
we experiment with steganographic techniques for which we have reduced the
length of the embedded message; in other words, we use images with a reduced
steganographic payload1 in our experiments. Our experiments show that our
steganalysis works well over a wide range of steganographic payloads.
The organisation of this chapter is as follows. In the next section, we give a brief
comparison of the steganographic methods. The technique of analysis we apply is
given in Section 4.2. Section 4.3 presents the experimental results of the analysis
and Section 4.4 concludes the chapter.
4.1 Comparison of the Steganography Methods
under Analysis
The steganographic techniques in [107, 10] are extensions of [82]. Without
loss of generality, we describe the technique given in [82] as an example.
A detailed description of all three techniques can be found in Section 3.1.
Steganography involves two basic operations—embedding and extraction. The em-
bedding operation in [82] starts by partitioning a given image into non-overlapping
blocks of size m × n. The payload for each non-overlapping block is r bits. The
message bits are segmented into streams of r bits and embedded by modifying
some pixels in the blocks. The modification of the pixels is governed by certain
criteria computed through bitwise exclusive-OR and pair-wise multiplication oper-
ations between the non-overlapping block, a random binary matrix (denoted κ) and
a secret matrix (denoted ω). Both κ and ω are m × n matrices and serve as the stegokey.
During extraction, parameters such as m, n and r must be communicated correctly
between the sender and receiver to construct the correct size of non-overlapping
blocks and number of r bits per stream. In addition, the correct stegokey (κ and
ω) is needed to extract the secret message. After that, the receiver can derive the
message bits by using the inverse of the criteria used in the embedding operation.
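The extraction side of this family of schemes can be sketched as follows. This is a simplified reading in which the r bits are recovered as Σ((B ⊕ κ) ⊗ ω) mod 2^r over a block B; it illustrates the exclusive-OR/pair-wise multiplication criterion, not a full re-implementation of [82].

```python
import numpy as np

def extract_bits(block, kappa, omega, r):
    """Recover the r embedded bits of one m x n binary block as
    sum((block XOR kappa) * omega) mod 2**r (pair-wise multiplication)."""
    value = int(np.sum((block ^ kappa) * omega)) % (2 ** r)
    return [(value >> i) & 1 for i in range(r)]   # least significant bit first

# 4 x 4 block, r = 3, with example kappa and weight matrix omega: flipping
# the pixel whose weight is 5 changes the extracted value from 0 to 5.
kappa = np.zeros((4, 4), dtype=int)
omega = np.arange(1, 17).reshape(4, 4)            # example weight matrix
block = np.zeros((4, 4), dtype=int)
```

The embedding side flips at most two pixels to steer this sum to the next r message bits, which is why the receiver needs the same m, n, r, κ and ω.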
The steganography in [107] is an improved version of the steganography devel-
oped initially in [82]. The improvement is mainly in the control of the visual quality
1 Reducing the payload minimises the alteration of image pixels and causes less distortion, hence increasing the steganographic security.
Table 4.1: Comparison of the steganographic techniques

Steganography        Secret Matrix ω        Payload r                 Image Quality   Altered Bits
Pan et al. [82]      weight matrix          ≤ ⌊log2(mn + 1)⌋          –               2
Tseng & Pan [107]    weight matrix          ≤ ⌊log2(mn + 1)⌋ − 1      enhanced        2
Chang et al. [10]    serial number matrix   ≤ ⌊log2(mn + 1)⌋          –               1
of the produced stego image, where only boundary pixels are flipped. The third
steganography [10] also improves on the steganography in [82]: whereas the method
in [82] requires at most two bit alterations per block, the method in [10] requires
only one. The extraction operations for the Tseng and Pan [107] and Chang et
al. [10] steganographic methods are similar to that of Pan et al. [74]. All of these
steganographic techniques were developed to embed secret messages in binary
images. Table 4.1 compares the main steganographic characteristics of the techniques.
It is interesting that these techniques can embed many message bits with few
alterations; for example, the method in [82] can embed as many as ⌊log2(mn + 1)⌋
message bits while altering at most two pixels, whereas conventional techniques
accommodate at most one message bit per altered pixel. Further, adjusting m and
n changes the payload and affects the security level. This gives flexibility in
balancing payload against security.
4.2 Proposed Steganalysis Method
Blind steganalysis can be viewed as a supervised machine learning problem that
classifies an image as either a cover image or a stego image carrying an inserted
secret message. Our analysis includes feature extraction and data classification.
The first stage is crucial and we show how to construct the features. The second
stage uses the SVM [23] as the classifier. The SVM seeks an optimal hyperplane
that separates the feature points of the two classes onto different sides. Based on
this separation, the class an image belongs to can be determined.
4.2.1 Grey Level Run Length Matrix
The feature we want to extract from images is based on the grey level run length
(GLRL). The length is measured by the number of consecutive pixels for a given
grey level g and direction θ. Note that 0 ≤ g ≤ G − 1, G is the total number of
grey levels and θ, where 0◦ ≤ θ ≤ 180◦, indicates the direction. The sequence of
pixels (at a grey level) is characterised by its length (run length) and its frequency
count (run length value), which tells us how many times the run has occurred in
the image. Thus, our feature is a GLRL matrix that fully characterises different
grey runs in two dimensions: the grey level g and the run length ℓ. The GLRL
matrix is defined as follows:
r(g, ℓ|θ) = #{(x, y) | p(x, y) = p(x + s, y + t) = g;
            p(x + u, y + v) ≠ g;
            0 ≤ s < u & 0 ≤ t < v;
            u = ℓ cos(θ) & v = ℓ sin(θ);
            0 ≤ g ≤ G − 1 & 1 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},   (4.1)
where # denotes the number of elements and p(x, y) is the pixel intensity (grey
level) at position x, y. G is the total number of grey levels and L is the maximum
run length.
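Equation (4.1) restricted to θ = 0° (runs along rows) or θ = 90° (runs along columns) can be computed in a single scan. A sketch, assuming grey levels already lie in 0..G−1 and clipping runs longer than L to L:

```python
import numpy as np

def glrl_matrix(img, G, L, theta=0):
    """Grey level run length matrix r(g, l | theta): r[g, l-1] counts the
    maximal runs of grey level g having length l (clipped to L)."""
    r = np.zeros((G, L), dtype=int)
    rows = img if theta == 0 else img.T      # theta = 90 scans columns
    for row in rows:
        run = 1
        for prev, cur in zip(row, row[1:]):
            if cur == prev:
                run += 1
            else:
                r[prev, min(run, L) - 1] += 1
                run = 1
        r[row[-1], min(run, L) - 1] += 1     # close the final run
    return r

img = np.array([[0, 0, 1, 1, 1],
                [1, 0, 0, 0, 1]])
r = glrl_matrix(img, G=2, L=5)
```

Summing over the run-length axis, `r.sum(axis=1)`, collapses the matrix into a per-grey-level histogram of run counts.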
The GLRL matrix extracted from an image can be considered a set of higher-
order statistical characteristics. The GLRL matrix alone is not sufficient for an
analysis of black and white images, where the number of grey levels is drastically
reduced (greyscale and colour images have at least 256 grey levels, whereas binary
images have only two). To fix this technical difficulty, we propose a solution that
allows us to create more grey levels and, consequently, more meaningful statistics.
Our approach is to use pixel differences.
4.2.2 Pixel Differences
The pixel difference is the difference between a pixel and its neighbouring pixels.
Given pixel p(x, y) of an image, with x ∈ [1, X] and y ∈ [1, Y], where X and Y
are the image width and height, respectively, the difference for the pixel p(x, y)
in the vertical direction is defined as follows:

p_v(x′, y′) = p(x, y + 1) − p(x, y),   (4.2)

where x′ ∈ [1, X − 1] and y′ ∈ [1, Y − 1]. The pixel differences in the horizontal,
main diagonal and minor diagonal directions are defined similarly.
It is easy to observe, and has been confirmed by experiments, that introducing the
pixel difference increases (almost doubles) the number of grey levels. To illustrate
the point, consider a greyscale image with 256 grey levels. After introducing the
pixel difference, the range of grey levels becomes [−255, +255]. The same dou-
bling effect happens for binary images. This effect is desirable for addressing the
technical difficulty mentioned in Subsection 4.2.1.
The authors of [70] call a similar pixel difference high-order differentiation
and derive some additional sets from it. Their features are defined as follows:

p_c^{n+1}(x, y) = p^n(x, y + 1) − p^n(x, y),
p_r^{n+1}(x, y) = p^n(x + 1, y) − p^n(x, y),   (4.3)

p^1(x, y) = |p_r^1(x, y)| + |p_c^1(x, y)|,   (4.4)

p^2(x, y) = p_r^1(x, y) − p_r^1(x − 1, y) + p_c^1(x, y) − p_c^1(x, y − 1),   (4.5)

where n = 0, 1, 2 and |·| denotes the absolute value. p_c^1(x, y) and p_r^1(x, y) can
be considered the pixel differences in the vertical and horizontal directions, respec-
tively. p^1(x, y) and p^2(x, y) are the respective higher-order total differentiations.
p^0(x, y) is a special case: the given image itself.
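Equations (4.3)–(4.5) can be vectorised with array slicing; the index cropping used below to make shapes line up is a convention of this sketch:

```python
import numpy as np

def pixel_differences(p):
    """First-order differences p_c^1, p_r^1 (Eq. 4.3) and the higher-order
    totals p^1 (Eq. 4.4) and p^2 (Eq. 4.5), cropped to common shapes."""
    p = np.asarray(p, dtype=int)
    pc = p[:, 1:] - p[:, :-1]          # p_c^1(x, y) = p(x, y+1) - p(x, y)
    pr = p[1:, :] - p[:-1, :]          # p_r^1(x, y) = p(x+1, y) - p(x, y)
    p1 = np.abs(pr[:, :-1]) + np.abs(pc[:-1, :])
    p2 = (pr[1:, :] - pr[:-1, :])[:, :-2] + (pc[:, 1:] - pc[:, :-1])[:-2, :]
    return pc, pr, p1, p2

# On a binary checkerboard the second-order total p^2 hits the extremes +/-4.
checker = np.indices((6, 6)).sum(axis=0) % 2
pc, pr, p1, p2 = pixel_differences(checker)
```

For a binary input the first-order differences lie in {−1, 0, 1} and p^2 in [−4, 4], which is the widened grey-level range exploited in the next subsection.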
4.2.3 GLRL Matrix from the Pixel Difference
The statistical features we use in our analysis are developed in the following two
stages:
1. In the first stage, we use the pixel difference to increase the number of grey
levels. We incorporate the pixel difference shown in Equation (4.5). Note
that p^2(x, y) in Equation (4.5) is obtained by summing the pixel differences
computed in the horizontal and vertical directions, and that the range of
grey levels increases from [0, 1] for p(x, y) to [−4, 4] for p^2(x, y). This is not
hard to verify. The minimum and maximum grey levels for p_r^1 are −1 and
+1, respectively, and the same applies to p_c^1. Hence, the minimum of
p^2(x, y) is obtained when the neighbouring differences for both p_r^1 and p_c^1
(the paired terms in Equation (4.5)) are −2, which produces p^2(x, y) = −4,
whereas the maximum is obtained when both neighbouring differences are 2,
which produces p^2(x, y) = 4.
2. In the second stage, we compute the GLRL matrix to obtain the required
feature set. This is achieved by extracting the GLRL matrix discussed in
Subsection 4.2.1 from the pixel difference obtained in the first stage. We do
not include p(x, y) (the pixels of the given binary image) or the other pixel
differences of Subsection 4.2.2 (except p^2(x, y)) because we observed that
these features do not improve the results significantly. We also observed that
it is sufficient to use only two directions for θ: 0° and 90°. Thus, by substituting
p^2(x, y) for p(x, y) in Equation (4.1), we obtain our first set of features.
4.2.4 GLGL Matrix
Since GLRL matrix features tend to measure “plateaus” of the image, we need
additional sensitive features to reflect the image “peaks”2. The grey level gap
length (GLGL) matrix proposed in [114] seems to be an appropriate choice. The
authors in [114] used the GLGL matrix in texture analysis and defined it as follows:
a(g, ℓ|θ) = #{(x, y) | p(x, y) = p(x + u, y + v) = g;
            p(x + s, y + t) ≠ g;
            s < u & t < v;
            u = (ℓ + 1) cos(θ) & v = (ℓ + 1) sin(θ);
            0 ≤ g ≤ G − 1 & 0 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},   (4.6)

where # denotes the number of elements, L is the maximum gap length and the
rest of the notations are the same as in Equation (4.1).
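One operational reading of Equation (4.6) at θ = 0° is: for each grey level g, count pairs of g-valued pixels in the same row whose intervening gap of exactly ℓ pixels contains no g. A sketch under that reading:

```python
import numpy as np

def glgl_matrix(img, G, L):
    """Grey level gap length matrix a(g, l | 0deg): a[g, l] counts pairs of
    g-pixels in a row separated by exactly l non-g pixels (l = 0 means the
    pair is adjacent); gaps longer than L are ignored."""
    a = np.zeros((G, L + 1), dtype=int)
    for row in img:
        last_seen = {}                  # last column at which g occurred
        for col, g in enumerate(row):
            if g in last_seen:
                gap = col - last_seen[g] - 1   # pixels strictly between
                if gap <= L:
                    a[g, gap] += 1
            last_seen[g] = col
    return a

a = glgl_matrix(np.array([[0, 1, 0],
                          [1, 1, 0]]), G=2, L=3)
```

Because only consecutive occurrences of g are paired, every counted gap is guaranteed to contain no pixel of level g, matching the "≠ g" condition of the definition.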
We compose two features from the GLGL matrix for our second feature set. The
first feature is computed directly from the binary image, as shown in Equation
(4.6). The second feature is computed from the pixel difference, as for the GLRL
matrix in Subsection 4.2.3, by replacing p(x, y) in Equation (4.6) with p^2(x, y).
We set θ = 0° for both features.

2 Small pixel-wide notches and protrusions near the boundary pixels caused by embedding.
4.2.5 Final Feature Sets
It is too computationally expensive to use all elements of the GLRL and GLGL
matrices as feature elements. Therefore, we simplify by transforming the
two-dimensional GLRL and GLGL matrices into one-dimensional histograms:
h_g^GLRL = Σ_{ℓ=1}^{L} r(g, ℓ|θ),   0 ≤ g ≤ G − 1,   (4.7)

where θ = 0° and 90°, and the rest of the notations are the same as in Equation
(4.1). In addition, we observe that, within a GLRL matrix, there is a high
concentration of frequencies near the short runs, which may be important. Hence,
we propose to extract the first four short runs as a single histogram, h_g^{srα}:

h_g^{srα} = r(g, α|θ),   0 ≤ g ≤ G − 1,   (4.8)

where θ = 0° and 90°, and α = 1, 2, 3, 4 are the selected short runs.
The one-dimensional histogram of the GLGL matrix, h_g^GLGL, can be obtained in
the same way as in Equation (4.7), with r(g, ℓ|θ) replaced by a(g, ℓ|θ) for both
p(x, y) and p^2(x, y). In this case θ = 0° and 0 ≤ ℓ ≤ L.
We also incorporate some of the high-order differentiation features developed in
[70]. These one-dimensional histograms can be derived from Equations (4.3) to
(4.5) as follows:

h_q^{p^n} = Σ_{x=1}^{X} Σ_{y=1}^{Y} δ(q, p^n(x, y)),   min_p ≤ q ≤ max_p,   (4.9)

h_q^{p_c^m + p_r^m} = Σ_{x=1}^{X} Σ_{y=1}^{Y−1} δ(q, p_c^m(x, y)) + Σ_{x=1}^{X−1} Σ_{y=1}^{Y} δ(q, p_r^m(x, y)),
   min_p ≤ q ≤ max_p,   (4.10)

where δ(µ, ν) = 1 if µ = ν and 0 otherwise, n = 1, 2 and m = 1, 2, 3. min_p and
max_p denote the minimum and maximum values of the grey level, respectively.
Other notations are the same as in Subsection 4.2.2.
As noted previously, blind steganalysis can be considered two-class classification,
so the extracted feature sets must be sensitive to embedding alterations. That is,
the feature values of the cover image should differ from those of the stego image;
the larger the difference, the better the features. Hence, we apply a characteristic
function, CF, to each of the above histograms to achieve better discrimination.
The characteristic function is computed using the discrete Fourier transform, as
shown in Equation (4.11):
CF_k = Σ_{n=0}^{N−1} h_n e^{−(2πi/N)kn},   0 ≤ k ≤ N − 1,   (4.11)

where N is the vector length, i is the imaginary unit and e^{−2πi/N} is a primitive
Nth root of unity.
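Equation (4.11) and the four statistics taken from it can be sketched with the FFT. The standardised-moment definitions of skewness and kurtosis are assumptions of this sketch, and a zero-variance characteristic function is guarded against:

```python
import numpy as np

def cf_moments(hist):
    """|CF_k| of a histogram via the DFT (Eq. 4.11), summarised by its
    mean, variance, kurtosis and skewness."""
    cf = np.abs(np.fft.fft(np.asarray(hist, dtype=float)))
    mu = cf.mean()
    sigma = cf.std()
    if sigma == 0.0:                 # flat |CF|: higher moments undefined
        return np.array([mu, 0.0, 0.0, 0.0])
    z = (cf - mu) / sigma
    return np.array([mu, sigma ** 2, np.mean(z ** 4), np.mean(z ** 3)])
```

Applied to each histogram (and each direction where applicable), these four numbers per characteristic function assemble into the full feature vector.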
For each characteristic function (one for each histogram), we compute the mean,
variance, kurtosis and skewness. This includes the characteristic functions cal-
culated from Equations (4.9) and (4.10), for which the original work in [70] uses
only the first-order moment. Together these statistics form a 68-dimensional
feature space, as summarised in Table 4.2.
Table 4.2: Respective feature sets and the total number of dimensions for each set

Histogram Type        Number of Directions   Number of Matrices   Statistics3   Total Dimension
h_g^GLRL              2                      1                    4             8
h_g^{srα}             2                      4                    4             32
h_g^GLGL              1                      2                    4             8
h_q^{p^n}             –                      2                    4             8
h_q^{p_c^m + p_r^m}   –                      3                    4             12
Empirical evidence shows that the difference in feature values between the cover
image and the re-embedded cover image4 is significantly larger than the difference
between those of the stego image and the re-embedded stego image. This is helpful
in creating discriminating features and we apply it in our 68-dimensional feature
space as the final constructed feature set.
3 Consists of the mean, variance, kurtosis and skewness.
4 The re-embedded image is the same image re-embedded with a full-length random message using the same steganography.
Table 4.3: Experimental parameters

Steganography       Block Size (pixels)   Payload r   Message Length (%)   Total Number of Stego Images
Pan et al. [82]     32 × 32               3           10, 30, 50 and 80    659 × 4 = 2636
Tseng & Pan [107]   32 × 32               3           10, 30, 50 and 80    659 × 4 = 2636
Chang et al. [10]   32 × 32               3           10, 30, 50 and 80    659 × 4 = 2636
4.3 Experimental Results
4.3.1 Experimental Setup
In our experiments, we construct a set of 659 binary images as cover images. The images are all textual documents with a white background and black foreground, at a resolution of 200 dpi and an image size of 800 × 800 pixels. The experimental parameters are summarised in Table 4.3.
As shown in Table 4.3, we used a larger non-overlapping block size (32 × 32) and a shorter secret message (a small payload of r = 3 bits per block). This setup makes our attack more difficult because the steganographic embedding is more secure. The secret message length is measured as the ratio of the number of embedded message bits to the maximum number of message bits that can be embedded in a binary image. We use uniformly distributed random message bits in the experiments.
We extract the feature sets proposed in Subsection 4.2.5 for each image and use
the SVM implemented in [9] to classify the class (cover or stego) of the image. For
all experiments, we dedicate 80 per cent of the images to training the classifier;
the remaining 20 per cent are used for testing. The prototype implementation is
coded in Matlab R2008a.
4.3.2 Results Comparison
We use a receiver operating characteristic (ROC) curve to illustrate our detection
results. The ROC curve is a plot of detection probability versus false alarm prob-
ability; each point plotted on the ROC curve represents the achieved performance
of the steganalysis. We also use the area under the ROC curve (AUR) to provide
a clearer comparison. The AUR values range from 0.5 to 1.0, where 0.5 is the worst detection performance and 1.0 is optimal: an AUR of 0.5 indicates that the detection is no better than random guessing, while an AUR of 1.0 means the detection is perfectly reliable (detection probability = 1.0). Therefore, the closer the AUR is to 1.0, the better.
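As an illustration of how these quantities are obtained, the following sketch uses scikit-learn (rather than the Matlab tooling used in the thesis) to compute an ROC curve and its area from hypothetical classifier scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical decision scores: higher means "more likely stego".
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 = cover, 1 = stego
scores = np.array([0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.90, 0.30])

fpr, tpr, _ = roc_curve(labels, scores)  # false alarm vs. detection probability
aur = auc(fpr, tpr)                      # area under the ROC curve
```

Each (fpr, tpr) point corresponds to one decision threshold; the AUR summarises the whole curve with a single number between 0.5 and 1.0.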
The respective ROC curves with the AUR values (in brackets) are shown in Figure
4.1. The area under the dotted diagonal line in each ROC curve is 0.5 (AUR =
0.5), which corresponds to random guessing. The figure clearly shows that the
detection results are very promising and the steganography developed by Tseng
and Pan [107] appeared to be the most difficult to detect. This is consistent
with the claim by Tseng and Pan that their method is an improved version. The
detection results for Pan et al. [82] and Chang et al. [10] methods are nearly
perfect.
[Figure 4.1 comprises three ROC plots of detection probability versus false alarm probability, one per steganographic method. AUR values by payload: (a) 10%: 0.9203; 30%, 50% and 80%: 1.0000. (b) 1%: 0.5205; 10%: 0.7843; 30%: 0.9930; 50%: 0.9997; 80%: 1.0000. (c) 10%: 0.9182; 30%, 50% and 80%: 1.0000.]

Figure 4.1: Detection results using ROC curves and AUR: (a) detection result for Pan et al. [82]; (b) for Tseng and Pan [107]; and (c) for Chang et al. [10]
It is also worth mentioning that the longer the embedded secret message, the more
image distortion it produces. Hence, it is relatively easier to detect a stego image
with a long embedded message than one with a shorter message. This is shown in
Figure 4.1 where the detection accuracy increased as the test moved from a shorter
message (10 per cent) to a longer message (80 per cent). For a message length
greater than 30 per cent, the detection is very accurate. However, a very short message (one per cent) is very difficult to detect (refer to the yellow line in Figure 4.1(b)). The main reason is that the image alteration caused by the embedding is minimal (e.g., modifying 15 pixels in an 800 × 800 pixel image) and is therefore barely captured by our features.
4.4 Conclusion
Our 48 newly proposed feature dimensions, used in combination with a modified version of the existing 20-dimensional feature set, achieve reliable and effective detection of secret messages embedded in binary images. The experimental results show that the proposed method can detect embedded messages of different lengths, even at low embedding rates. In addition, our proposed method can detect more than one steganographic method, which makes it a suitable blind steganalysis for binary images.
Chapter 5
Multi-Class Steganalysis
In general, blind steganalysis is considered a two-class classification problem. This means that, given an image, the steganalysis should be able to decide the class (cover or stego) of the image. It is possible to extend blind steganalysis to form a multi-class
steganalysis. From a practical point of view, multi-class steganalysis is similar to
blind steganalysis; however, it can accommodate more classes. The additional
classes come from different types of stego images, produced by different embed-
ding techniques. The task of multi-class steganalysis is to identify the embedding
algorithm applied to produce the given stego image or, if no embedding has been
performed on the image, it should be classified as a cover image.
In [87], Pevny and Fridrich extended the blind steganalysis developed in [35] to
form a multi-class steganalysis. Their multi-class steganalysis can classify embed-
ding algorithms based on the given JPEG stego images. Rodriguez and Peterson
[95] studied a different multi-class steganalysis for JPEG images. In [95], the extracted features are based on wavelet decomposition and an SVM is employed as the classifier. The most recent work is the technique developed by Dong et al. [31].
The main contribution of this multi-class steganalysis is its ability to carry out
classification in two different image domains—the frequency domain (e.g., JPEG
images) and spatial domain (e.g., BMP images). Other multi-class steganalysis
approaches can be found in [88, 97, 108] and all were developed to counter JPEG
image steganography.
Note that these multi-class steganalysis techniques are for images with at least
eight bits per pixel intensity. This means that the images can be greyscale, colour
or true colour images. It is not clear how the existing multi-class steganalysis can
be generalised for black and white binary images. Unlike greyscale and colour
images, binary images have a rather modest statistical nature. This makes it
difficult to apply the existing multi-class steganalysis techniques on binary images.
To the best of our knowledge, there is no multi-class steganalysis proposed for
binary images in the literature.
In this chapter, we propose a multi-class steganalysis for binary images. The
work in this chapter is based on an extension of our previously developed blind
steganalysis for binary images (Chapter 4). There are three main contributions of
this chapter. First, we incorporate additional new features to our existing feature
sets. Second, the concept of cover image estimation is incorporated to enhance the
feature sensitivity. Third, a new multi-class steganalysis technique is developed.
Consequently, we are able to assign a given image to its appropriate class. This will
provide valuable information for steganalysts (e.g., forensic examiners) towards the
goal of extracting hidden messages.
The remainder of this chapter is organised as follows. In the next section, we summarise the steganographic methods under analysis. The method of analysis we apply is given in Section 5.2, and the construction of the multi-class classifier is discussed in Section 5.3. Section 5.4 presents the experimental results of the analysis and Section 5.5 concludes the chapter.
5.1 Summary of the Steganographic Methods under Analysis
This chapter analyses five different types of steganography. These steganographic methods were described in Section 3.1; here, we briefly summarise them with a focus on their embedding algorithms. All of the steganographic methods were developed to embed secret messages in binary images.
The first three methods under analysis are from the work developed in [82, 107,
10]. These methods are all variants of block-based steganography. To perform
embedding, a given binary image will be partitioned into non-overlapping blocks.
The message bits are divided into a stream of r bits before being embedded in
the block. Two sets of matrices, the random binary matrix and secret weight
matrix (the method in [10] uses the serial number matrix instead) are used to
determine which pixels should be flipped when necessary. The two matrices are
shared between the sender and receiver as the stegokey.
The steganography developed in [69] is considered boundary-based steganography.
This type of steganography will hide a message along the edges where white pixels
meet black ones—these pixels are known as boundary pixels. To obtain higher
imperceptibility, the locations of pixels used for embedding are permuted and
distributed over the whole image. The permutation is controlled by a PRNG
whose seed is a secret shared by the sender and the receiver.
Not all boundary pixels are suitable for carrying message bits, because embedding a bit into an arbitrary boundary pixel may convert it into a non-boundary one. This would jeopardise extraction and make recovery of the hidden message impossible. Because of these technical difficulties, improvements were developed that add restrictions on the selection of boundary pixels for embedding.
The last steganography under our analysis is that developed by Wu and Liu in
[112]. This technique also starts by partitioning a given image into blocks. The
odd-even relationship of the pixels within a block is adjusted to hold the message
bit. Clearly, when this odd-even relationship holds for the message bit to be
embedded then no alteration is required. Otherwise, some pixels are selected and
altered to adjust the odd-even relationship. Moreover, a flippability scoring system
is constructed to ensure the pixel selection for alteration is efficient.
5.2 Proposed Steganalysis
The ultimate goal of steganalysis is to extract the full hidden message. This task,
however, may be very difficult to achieve. Thus, we may start with more realistic
and modest goals, such as identifying the type of steganographic technique used
for the embedding. We want to improve our existing technique so that we can
identify the embedding algorithm.
To do this, we propose a multi-class steganalysis. Multi-class steganalysis can be
viewed as a supervised machine learning problem where we want to determine
the class of a given image. Our analysis consists of feature extraction and classification stages. The first stage is crucial, and we show how to construct the existing and new features in this section. The second stage uses the SVM [23] to construct the multi-class classifier, which we describe in detail in Section 5.3.
Figure 5.1: Pixel difference in vertical direction
5.2.1 Increasing the Grey Level via the Pixel Difference
Black and white images have a drastically reduced number of grey levels (recall that greyscale and colour images have at least 256 grey levels), which is insufficient for statistical analysis. To resolve this technical difficulty, we propose a solution that creates more grey levels and, consequently, more meaningful statistics. Our approach is based on the pixel difference.
The pixel difference is the difference between a pixel and its neighbouring pixels. Given a pixel p(x, y) of an image, with x ∈ [1, X] and y ∈ [1, Y], where X and Y are the image width and height, respectively, the pixel difference in the vertical direction is defined as

p_v(x', y') = p(x, y + 1) − p(x, y),   (5.1)

where x' ∈ [1, X − 1] and y' ∈ [1, Y − 1]. The pixel differences for the horizontal,
main diagonal and minor diagonal directions are defined similarly. Figure 5.1
illustrates the pixel difference in the vertical direction.
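The four directional differences can be sketched in NumPy; the array convention (rows index y, columns index x) is our own assumption, since the thesis does not fix one:

```python
import numpy as np

def pixel_differences(p):
    """Directional pixel differences of an image array (Equation (5.1) and analogues).

    Rows are assumed to index y and columns to index x.
    """
    p = np.asarray(p, dtype=int)  # signed type so differences can be negative
    vert       = p[1:, :] - p[:-1, :]     # p(x, y+1) - p(x, y)
    horiz      = p[:, 1:] - p[:, :-1]     # p(x+1, y) - p(x, y)
    main_diag  = p[1:, 1:] - p[:-1, :-1]  # p(x+1, y+1) - p(x, y)
    minor_diag = p[1:, :-1] - p[:-1, 1:]  # p(x-1, y+1) - p(x, y)
    return vert, horiz, main_diag, minor_diag
```

Note the cast to a signed integer type: a binary image stored as unsigned bits would silently wrap the −1 differences.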
It is easy to observe, and has been confirmed by our experiments, that introducing the pixel difference almost doubles the number of grey levels. To illustrate this point, consider a greyscale image with 256 grey levels: after taking the pixel difference, the range of grey levels becomes [−255, +255]. The same doubling effect occurs for binary images, and it is exactly what is needed to resolve the technical difficulty mentioned above.
For this purpose, we incorporated the pixel differences developed in [70]. Their features are defined as follows:
p_c^{n+1}(x, y) = p^n(x, y + 1) − p^n(x, y),
p_r^{n+1}(x, y) = p^n(x + 1, y) − p^n(x, y),   (5.2)

p^1(x, y) = |p_r^1(x, y)| + |p_c^1(x, y)|,   (5.3)

p^2(x, y) = p_r^1(x, y) − p_r^1(x − 1, y) + p_c^1(x, y) − p_c^1(x, y − 1),   (5.4)

where n = 0, 1, 2 and |·| represents the absolute value. p_c^1(x, y) and p_r^1(x, y) can be considered pixel differences in the vertical and horizontal directions, respectively, while p^1(x, y) and p^2(x, y) are the corresponding higher-order total differences. p^0(x, y) is a special case: it is the given binary image itself.
We further define the pixel differences one order higher as follows:

p_c^3(x, y) = p^2(x, y + 1) − p^2(x, y),
p_r^3(x, y) = p^2(x + 1, y) − p^2(x, y).   (5.5)
We call these third-order pixel differences. We would like to stress that all the statistical features used in our analysis are based on this third-order pixel difference, and the procedure can be summarised in the following two stages:

1. In the first stage, we use the third-order pixel difference to increase the number of grey levels. Note that p^2(x, y) in Equation (5.4) is obtained by summing the pixel differences computed in the horizontal and vertical directions. The doubling effect of the pixel difference increases the grey levels from [0, 1] for p(x, y) to [−4, 4] for p^2(x, y). This is not hard to verify: the minimum and maximum grey levels of p_r^1 are −1 and +1, respectively, and the same applies to p_c^1. The minimum of p^2(x, y) is obtained when the neighbouring differences for both p_r^1 and p_c^1 (the paired terms in Equation (5.4)) equal −2, which produces p^2(x, y) = −4; the maximum is obtained when both equal 2, which produces p^2(x, y) = 4. Finally, applying the same argument to the third-order pixel difference increases the number of grey levels to 17 (i.e., [−8, 8]).
2. In the second stage, we proceed with the computed third-order pixel difference to extract each of the specific feature sets. In other words, a certain
feature set (the feature sets will be discussed in Subsections 5.2.2 and 5.2.3)
is extracted on top of this third-order pixel difference.
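The two stages can be sketched as follows. Boundary handling (cropping each difference to the region where it is defined) is our own assumption, since the thesis does not specify it:

```python
import numpy as np

def third_order_difference(p):
    """Third-order pixel differences p_c^3 and p_r^3 (Equations (5.2)-(5.5))."""
    p = np.asarray(p, dtype=int)
    d_col = p[:, 1:] - p[:, :-1]   # first-order difference along columns
    d_row = p[1:, :] - p[:-1, :]   # first-order difference along rows
    # p^2 (Equation (5.4)), valid on the interior; grey levels lie in [-4, 4]:
    a = (d_row[1:, :] - d_row[:-1, :])[:, 1:-1]
    b = (d_col[:, 1:] - d_col[:, :-1])[1:-1, :]
    p2 = a + b
    # Third order (Equation (5.5)); grey levels lie in [-8, 8]:
    p3c = p2[:, 1:] - p2[:, :-1]
    p3r = p2[1:, :] - p2[:-1, :]
    return p3c, p3r
```

Running this on a binary image confirms the 17-level range [−8, 8] derived in stage 1.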
5.2.2 Grey Level Run Length Matrix
The first feature set we extract is based on the grey level run length (GLRL). The length is measured as the number of consecutive pixels with a given grey level g along a direction θ, where 0 ≤ g ≤ G − 1, G is the total number of grey levels and 0° ≤ θ ≤ 180°. A sequence of pixels at a grey level is characterised by its length (run length) and its frequency count (run length value), which tells us how many times the run occurs in the image. Thus, our feature is a GLRL matrix that fully characterises the different grey runs in two dimensions: the grey level g and the run length ℓ. The general GLRL matrix is defined as follows:
r(g, ℓ | θ) = #{(x, y) | p(x, y) = p(x + s, y + t) = g;
               p(x + u, y + v) ≠ g;
               0 ≤ s < u & 0 ≤ t < v;
               u = ℓ cos(θ) & v = ℓ sin(θ);
               0 ≤ g ≤ G − 1 & 1 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},   (5.6)
where # denotes the number of elements and p(x, y) is the pixel intensity (grey
level) at position x, y. G is the total number of grey levels and L is the maximum
run length.
For our practical implementation, we simply concatenate p_c^3(x, y) with p_r^3(x, y) and substitute the result for p(x, y) in Equation (5.6). In addition, we observed that it is sufficient to use only two directions for θ: 0° and 90°. The GLRL matrices extracted from the third-order pixel difference can therefore be considered higher-order statistical features.
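A sketch of the GLRL matrices for θ = 0° (runs along rows) and θ = 90° (runs along columns); clipping runs longer than the maximum length L into the last bin is our own assumption:

```python
import numpy as np
from itertools import groupby

def glrl_matrices(img, levels, offset, max_run):
    """GLRL matrices r(g, l | theta) for theta = 0 and 90 degrees.

    `offset` maps grey level g to the row index g + offset; for third-order
    differences in [-8, 8] one would use levels = 17, offset = 8.
    """
    img = np.asarray(img)
    out = {}
    for theta, lines in ((0, img), (90, img.T)):
        r = np.zeros((levels, max_run), dtype=int)
        for line in lines:
            for g, run in groupby(line):           # consecutive equal values
                length = min(len(list(run)), max_run)  # clip long runs
                r[g + offset, length - 1] += 1
        out[theta] = r
    return out
```

Each entry r[g + offset, ℓ − 1] counts how often a run of grey level g with length ℓ occurs in the chosen direction.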
5.2.3 Grey Level Co-Occurrence Matrix
We replaced the grey level gap length (GLGL) matrix proposed in our previous
work with the grey level co-occurrence matrix (GLCM). From empirical studies,
we found that GLCM performs better in multi-class classifications than GLGL.
GLCM can be considered an approach for capturing the inter-pixel relationships.
More precisely, the elements in a GLCM matrix represent the relative frequencies
of two pixels (with grey level g1 and g2, respectively) separated by a distance, d.
GLCM can be defined as follows:
o(g1, g2, d|θ) = # {(x, y) | p(x, y) = g1;
p(x+ u, y + v) = g2;
u = d cos(θ) & v = d sin(θ);
0 ≤ g1, g2 ≤ G− 1 & 1 ≤ d ≤ D & 0◦ ≤ θ ≤ 180◦}, (5.7)
where # denotes the number of elements and p(x, y) is the pixel intensity (grey
level) at position x, y. G is the total number of grey levels and D is the maximum
distance between two pixels.
In our implementation, we substitute p(x, y) in Equation (5.7) with p_c^3(x, y), p_r^3(x, y) and |p_c^3(x, y)| + |p_r^3(x, y)|. To avoid confusion, we call the resulting matrices o_1(g_1, g_2, d | θ), o_2(g_1, g_2, d | θ) and o_3(g_1, g_2, d | θ), respectively. From each of these we obtain four GLCM matrices, one for each of the four directions (0°, 45°, 90° and 135°), and we set the distance d to one.
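The co-occurrence count for a single displacement can be sketched as below; the mapping of each θ to a pixel displacement (dx, dy) is our own assumption about the geometry:

```python
import numpy as np

def glcm(img, levels, dx, dy, offset=0):
    """Co-occurrence counts o(g1, g2) for a single displacement (dx, dy).

    `offset` shifts signed grey levels to non-negative indices, e.g.
    levels = 17 and offset = 8 for values in [-8, 8].
    """
    img = np.asarray(img)
    o = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            o[img[y, x] + offset, img[y + dy, x + dx] + offset] += 1
    return o

# One possible displacement per direction, at distance d = 1 (an assumption):
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}
```

The loop bounds simply keep the displaced pixel inside the image, so every valid pair is counted exactly once.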
5.2.4 Cover Image Estimation
Cover image estimation is the process of eliminating embedding artefacts1 in a
given image with the objective of getting close to a “clean image”. Cover image
estimation was first proposed by Fridrich and known as image calibration [38, 39,
35]. For brevity, consider the following proposition:
Let I_c and I_s represent the cover image and stego image, respectively. If Σ|I_c − I'_c| < Σ|I_s − I'_s|, then

φ(I_c) − φ(I'_c) < φ(I_s) − φ(I'_s),   (5.8)

where I'_c and I'_s are the estimated cover images from I_c and I_s, respectively; I − I' is the pixel-wise difference between two images of the same resolution; |·| represents the absolute value; and φ(·) denotes the feature extraction function.

¹ An embedding artefact is any alteration or mark introduced by embedding.
From this proposition, the feature sets extracted from the feature differences (e.g.,
φ(Is) − φ(I ′s)) can be considered as the differences caused by the embedding op-
eration, as long as the relationship holds. This is desired because we want to
have feature sets that are sensitive to the embedding artefacts and invariant to
the image content.
We chose an image filtering approach to cover image estimation. Among the several alternative image filters, our empirical studies found that the Gaussian filter produces the best results. Three parameters must be determined to use this filter: the standard deviation of the Gaussian distribution (σ) and the filter extents in the horizontal and vertical directions (d_h and d_v, respectively). By trial and error, we determined that σ = 0.6, d_h = 3 and d_v = 3 give the best results.
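A sketch using SciPy's Gaussian filter. The 3 × 3 window is obtained by truncating the kernel at a one-pixel radius, and re-binarising the filtered image by thresholding at 0.5 is our own assumption (the thesis does not state how the filtered image is handled):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_cover(img, sigma=0.6):
    """Estimate the cover image by Gaussian low-pass filtering over a 3x3 window."""
    smooth = gaussian_filter(np.asarray(img, dtype=float), sigma=sigma,
                             truncate=1.0 / sigma)  # kernel radius of 1 pixel
    return (smooth >= 0.5).astype(int)              # assumed re-binarisation

def calibrated_features(img, phi):
    """Feature difference phi(I) - phi(I') used as the calibrated feature vector."""
    return phi(img) - phi(estimate_cover(img))
```

Smoothing removes isolated flipped pixels while leaving large uniform regions unchanged, which is exactly the behaviour the proposition in Equation (5.8) relies on.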
5.2.5 Final Feature Sets
It is very computationally expensive to use all elements in GLRL and GLCM ma-
trices as the feature elements. Therefore, we propose to simplify them by trans-
forming the two-dimensional GLRL and GLCM matrices into one-dimensional
histograms.
h_g^{GLRL} = Σ_{ℓ=1}^{L} r(g, ℓ | θ),   0 ≤ g ≤ G − 1,   (5.9)
where θ = 0° and 90°, and the rest of the notation is the same as in Equation (5.6). We observe that, within a GLRL matrix, there is a high concentration of frequencies near the short runs, which may be important. Hence, we propose extracting each of the first four short runs as a histogram, h_g^{sr_α}:

h_g^{sr_α} = r(g, α | θ),   0 ≤ g ≤ G − 1,   (5.10)

where θ = 0° and 90° and α = 1, 2, 3, 4 are the selected short runs.
The one-dimensional histogram of the GLCM matrix, h_g^{GLCM_η}, can be obtained in a similar manner to Equation (5.9) and is defined as follows:

h_g^{GLCM_η} = ( Σ_{g_1=0}^{G−1} o_η(g_1, g, d | θ) + Σ_{g_2=0}^{G−1} o_η(g, g_2, d | θ) ) / 2,   0 ≤ g ≤ G − 1,   (5.11)

where η = 1, 2, 3 and θ = 0°, 45°, 90° and 135°; d = 1 and the rest of the notation is the same as in Equation (5.7).
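The three histogram reductions amount to simple matrix marginals; in the sketch below, r and o stand for GLRL and GLCM matrices indexed by grey level:

```python
import numpy as np

def glrl_histogram(r):
    """Sum over run lengths (Equation (5.9)): one bin per grey level."""
    return r.sum(axis=1)

def short_run_histograms(r, alphas=(1, 2, 3, 4)):
    """One histogram per selected short run length alpha (Equation (5.10))."""
    return [r[:, a - 1] for a in alphas]

def glcm_histogram(o):
    """Average of the row and column marginals of a GLCM (Equation (5.11))."""
    return (o.sum(axis=0) + o.sum(axis=1)) / 2.0
```

Collapsing the two-dimensional matrices this way is what keeps the final feature dimensionality manageable.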
As noted before, multi-class steganalysis can be considered a multi-class classifica-
tion, so the extracted feature sets must be sensitive to embedding alterations—the
feature values should be very distinctive. The larger the differences across the dif-
ferent classes, the better the features. Hence, we apply the characteristic function,
CF to each of the histograms to achieve better discrimination. The characteristic
function can be computed by a discrete Fourier transform, as shown in Equation
(5.12).
CF_k = Σ_{n=0}^{N−1} h_n e^{−(2πi/N)kn},   0 ≤ k ≤ N − 1,   (5.12)

where N is the vector length, i is the imaginary unit and e^{−2πi/N} is a primitive Nth root of unity.
For each characteristic function (one per histogram), we compute the mean, variance, kurtosis and skewness. The exception is the characteristic functions of the four h_g^{GLCM_η} histograms (one per direction) in Equation (5.11): we first average these four characteristic functions, and then compute the mean, variance, kurtosis and skewness of the averaged function.
We include another four statistics for each of the computed GLCM matrices, as
discussed in Subsection 5.2.3. These four statistics can be defined as follows:
contrast = Σ_{g_1} Σ_{g_2} |g_1 − g_2|² o(g_1, g_2),   (5.13)

energy = Σ_{g_1} Σ_{g_2} o(g_1, g_2)²,   (5.14)

homogeneity = Σ_{g_1} Σ_{g_2} o(g_1, g_2) / (1 + |g_1 − g_2|),   (5.15)

correlation = Σ_{g_1} Σ_{g_2} (g_1 − μ_{g_1})(g_2 − μ_{g_2}) o(g_1, g_2) / (σ_{g_1} σ_{g_2}),   (5.16)

where μ_{g_1} and μ_{g_2} are the means of o(g_1, g_2), and σ_{g_1} and σ_{g_2} are the standard deviations of o(g_1, g_2).
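These four statistics (standard Haralick texture measures) can be computed from a GLCM as below; normalising the counts to relative frequencies first follows the definition of the GLCM in Subsection 5.2.3:

```python
import numpy as np

def glcm_statistics(o):
    """Contrast, energy, homogeneity and correlation (Equations (5.13)-(5.16))."""
    o = np.asarray(o, dtype=float)
    o = o / o.sum()                        # relative frequencies
    g1, g2 = np.indices(o.shape)           # grey level index grids
    contrast = np.sum(np.abs(g1 - g2) ** 2 * o)
    energy = np.sum(o ** 2)
    homogeneity = np.sum(o / (1.0 + np.abs(g1 - g2)))
    mu1, mu2 = np.sum(g1 * o), np.sum(g2 * o)
    s1 = np.sqrt(np.sum((g1 - mu1) ** 2 * o))
    s2 = np.sqrt(np.sum((g2 - mu2) ** 2 * o))
    correlation = np.sum((g1 - mu1) * (g2 - mu2) * o) / (s1 * s2)
    return contrast, energy, homogeneity, correlation
```

A purely diagonal GLCM is the limiting case: zero contrast, maximal homogeneity and correlation equal to one.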
We form a 100-dimensional feature space as summarised in Table 5.1.
² Averaged from the four directions.
Table 5.1: Respective feature sets and the total number of dimensions for each set

Feature Set     Number of Directions   Number of Matrices   CF        Mean, Variance, Kurtosis and Skewness   Total Dimensions
h_g^{GLRL}      2                      1                    applied   4                                       8
h_g^{sr_α}      2                      4                    applied   4                                       32
h_g^{GLCM_η}    1²                     3                    applied   4                                       12
contrast        4                      3                    –         –                                       12
energy          4                      3                    –         –                                       12
homogeneity     4                      3                    –         –                                       12
correlation     4                      3                    –         –                                       12
5.3 Multi-Class Classification
As stated in Section 5.2, the second stage of our proposed steganalysis is multi-
class classification. We have chosen the SVM as our multi-class classifier. We
start this section by explaining the general terminology of two-class SVM classifi-
cation and then show how to generalise the two-class classification into multi-class
classification using SVM.
SVM can be considered a classification technique that learns from examples. More precisely, we can train the SVM to recognise and assign labels (classes) based on a given data collection (using features). For example, we train the SVM to differentiate a cover image (class-1) from a stego image (class-2) by examining the features extracted from many instances of cover images and stego images.

The SVM finds the separating line and determines the cluster into which an unknown image falls. Finding the right separating line is crucial, and this is what the training accomplishes. In practice, the feature dimensionality is higher and we need a separating plane instead of a line; this is known as a separating hyperplane.
The goal of SVM is to find a separating hyperplane that effectively separates
classes. To do that, the SVM will try to maximise the margin of the separat-
ing hyperplane during training. Obtaining this maximum-margin hyperplane will
optimise the ability of the SVM to predict the class of an unknown object (image).
However, there are often non-separable datasets that cannot be separated by a
straight separating line or flat plane. The solution to this difficulty is to use a
Table 5.2: Example of majority-voting strategy for multi-class SVM

              class-1   class-2   class-3
SVM-a         0         1         0
SVM-b         1         0         0
SVM-c         0         1         0
Total Votes   1         2         0
kernel function. The kernel function is a mathematical routine that projects the features from a low-dimensional space to a higher-dimensional space. Note that the choice of kernel function affects the classification accuracy. For further reading on SVM, readers are referred to [80].
Although the nature of SVM is two-class classification, it is not hard to generalise
the SVM to handle multiple classes. Several approaches can be used, including
one-against-one, one-against-all and all-together. According to the recommenda-
tions given in [53], one-against-one provides the best and most efficient classifica-
tions.
Here, therefore, we use the one-against-one approach and discuss only it; for the other approaches and a detailed comparison, readers are referred to [53]. For a multi-class SVM based on the one-against-one approach with K classes, K(K − 1)/2 two-class SVMs are constructed. Each of these SVM classifiers is assigned to the classification of a distinct pair of classes, so no two classifiers share the same pair. After all two-class classifications are completed, a majority-voting strategy determines the final class of an object: the class receiving the most votes is taken as the correct class. If two or more classes tie on votes, one is chosen arbitrarily.
Consider the following example. Suppose we have class-1, class-2 and class-3. We
can construct three two-class SVMs—SVM-a classifying classes-1 and -2, SVM-b
classifying classes-1 and -3 and SVM-c classifying classes-2 and -3. Assume that,
given an image, each of the two-class SVM classification results can be obtained,
as tabulated in Table 5.2. From the table, the given image is identified as belonging to class-2 because it received the highest number of votes (typeset in bold).
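The one-against-one construction and the majority vote can be sketched with scikit-learn's `SVC` (the thesis uses the Matlab/LIBSVM implementation from [9] instead); the toy data below are hypothetical:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def train_one_against_one(X, y):
    """Train K(K-1)/2 two-class RBF SVMs, one per unordered pair of classes."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        clf = SVC(kernel="rbf")            # RBF kernel, as recommended in [53]
        clf.fit(X[mask], y[mask])
        models[(a, b)] = clf
    return models

def predict_majority_vote(models, x):
    """Each pairwise SVM casts one vote; the class with the most votes wins."""
    votes = {}
    for clf in models.values():
        winner = clf.predict(x.reshape(1, -1))[0]
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Three hypothetical, well-separated classes in a 2-D feature space:
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5],
              [10, 0], [10, 1], [11, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
models = train_one_against_one(X, y)
```

In practice, scikit-learn's `SVC` already performs this one-against-one voting internally for multi-class inputs; the explicit construction above mirrors the description in the text.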
Table 5.3: Summary of image databases

Database   Total Images   Resolution   Image Type
Textual    659            200 dpi      Textual
Mixture    659            200 dpi      Textual and Graphic
Scene      1338           72 dpi       Scene
5.4 Experimental Results
5.4.1 Experimental Setup
To cover a wider range of images, we constructed three image databases. The first database consists of 659 binary images used as cover images; all are textual documents with a white background and black foreground, at a resolution of 200 dpi and an image size of 800 × 800 pixels. The second database also consists of 659 binary cover images with the same properties, except that we added some graphics (i.e., cartoons, clipart and random shapes) at random positions in each textual document. For the third database, we constructed 1338 binary images from greyscale images using the Irfanview version 4.10 freeware. These images were converted from natural images; their resolution is 72 dpi and their image size is 512 × 384 pixels.
Overall, we constructed 2656 cover images. The image databases are summarised
in Table 5.3. For brevity, we will name the image databases textual, mixture and
scene databases, respectively.
As discussed in Section 5.1, we used five different steganographic techniques to generate the different types (classes) of stego images. Because each technique uses a different embedding algorithm, the steganographic capacities also vary significantly. Hence, to obtain a fair comparison, we use the absolute steganographic capacity, measured in bits per pixel (bpp). Since a binary image has only one bit per pixel, bpp can be read as the fraction of pixels used to carry message bits. For example, embedding at 0.01 bpp means that only one pixel in every 100 carries a message bit. 0.01 bpp is very small, which means there is little distortion in the image; the produced stego image is therefore relatively secure and harder for steganalysis to detect.
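The bpp arithmetic is straightforward; for instance, for the 800 × 800 textual images used here:

```python
# Number of message bits that a given absolute capacity allows:
total_pixels = 800 * 800
for bpp in (0.003, 0.006, 0.01):
    message_bits = round(bpp * total_pixels)
    print(bpp, message_bits)
```

So even the largest capacity tested here (0.01 bpp) touches at most one per cent of the pixels.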
Table 5.4: Summary of stego image databases

Database   Total Images   Number of Steganography   0.003 bpp   0.006 bpp   0.01 bpp   Total Stego Images
Textual    659            5                         1           1           1          9885
Mixture    659            5                         1           1           1          9885
Scene      1338           5                         –           –           1          6690
To verify the effectiveness of our proposed multi-class steganalysis, we constructed
stego images with capacities of 0.003, 0.006 and 0.01 bpp for each steganography
approach. Every stego image in the experiment is embedded with randomly generated message bits. In total, this gives a relatively large collection of 26460 stego images (as summarised in Table 5.4).
We extract the feature sets for each image in all the image databases mentioned above, using the feature extraction methods proposed in Section 5.2. These feature sets serve as the inputs to the multi-class SVM. As described in Section 5.3, we use the one-against-one approach to construct our multi-class steganalysis. We use the SVM implemented in [9], follow the recommendation in [53] to use the radial basis function (RBF) as the kernel function, and determine the optimum SVM parameters with the grid-search tool from [9]. We dedicated 80 per cent of each image database to training the classifiers; the remaining 20 per cent is used for testing. The prototype is implemented in Matlab R2008a.
5.4.2 Results Comparison
To simplify the presentation, we abbreviate the five steganographic methods as
PCT, TP, CTL, LWZ and WL for [82], [107], [10], [69] and [112], respectively.
Our multi-class steganalysis classification results are displayed in a table format
called a confusion matrix. In the confusion matrix, the first column consists of the
classes, which include one cover image class and five different classes of stego image
(i.e., each class of the stego image is produced by each of the five steganographic
methods discussed). The value within brackets beside each class indicates the
embedded capacity. For the cover image class, there is no embedding and we can
consider the embedded capacity as zero (0.0 bpp). The first row of the confusion
matrix indicates the class of a given image.
We separated the three databases into three confusion matrices—Tables 5.5, 5.6
and 5.7. To better illustrate the results, we typeset the desired results in bold.
In other words, the correct classification results are aligned along the diagonal
elements within each confusion matrix.
From the confusion matrices, we clearly see that the multi-class steganalysis gives
very promising results. Especially in Table 5.5, the detections are nearly perfect.
The results obtained for the mixture image database (Table 5.6) are accurate, although slightly less so than those in Table 5.5. The results for
the scene image database (Table 5.7) appear to be the least accurate; however,
the detection reliability is good and all the detection results show at least 80
per cent accuracy. Note that the type of cover image used affects the detection
accuracy, which means it is relatively easier to detect images with textual content
than images with natural scenes. This observation is supported by the detection
accuracy order (where the results in Table 5.5 are the best, followed by the results
in Table 5.6 and lastly the results in Table 5.7).
We attribute this phenomenon to the fact that the textual content in an image
has periodic patterns that are uniform and consistent. However, an image with
scene content has fewer fixed patterns and may appear more random.
It is also worthwhile mentioning that embedding a longer secret message produces
more distortion in an image. Hence, it is relatively easier to detect a stego image
with a longer embedded message (higher bpp) than with a shorter message (lower
bpp). This is seen by comparing the rows with 0.01 bpp to the rows with 0.003
bpp in the confusion matrices.
5.5 Conclusion
We proposed a multi-class steganalysis for binary images. Our proposed 60-dimensional
feature set, used in combination with the existing 40-dimensional feature set
extended from our previous work, effectively and accurately classified images
into the appropriate class: one cover image class and five classes of stego images
produced by different steganographic techniques. We employed the concept of cover image
estimation, which improved the classification. Experimental results showed that
our proposed method can detect stego images with low embedding capacities. Further, the
experimental results showed that a detection accuracy of at least 92 per cent can be achieved
with textual or a mixture of textual and graphic images. However, the accuracy
decreased slightly, to 80 per cent, for natural scene binary images.
Table 5.5: Confusion matrix of the multi-class steganalysis for the textual database
Classified as
WL(%) LWZ(%) CTL(%) TP(%) PCT(%) Cover(%)
Cover 0.00 0.00 0.00 0.00 0.00 100.00
PCT (0.003 bpp) 0.00 0.00 0.77 0.00 99.23 0.00
TP (0.003 bpp) 0.00 0.00 0.00 100.00 0.00 0.00
CTL (0.003 bpp) 0.00 0.00 100.00 0.00 0.00 0.00
LWZ (0.003 bpp) 0.00 99.23 0.77 0.00 0.00 0.00
WL (0.003 bpp) 96.15 0.77 0.77 1.54 0.00 0.77
PCT (0.006 bpp) 0.00 0.00 0.00 0.00 100.00 0.00
TP (0.006 bpp) 0.00 0.00 0.00 100.00 0.00 0.00
CTL (0.006 bpp) 0.00 0.00 100.00 0.00 0.00 0.00
LWZ (0.006 bpp) 0.00 99.23 0.77 0.00 0.00 0.00
WL (0.006 bpp) 98.46 0.00 0.77 0.77 0.00 0.00
PCT (0.01 bpp) 0.00 0.00 0.00 0.00 100.00 0.00
TP (0.01 bpp) 0.77 0.00 0.00 99.23 0.00 0.00
CTL (0.01 bpp) 0.00 0.00 100.00 0.00 0.00 0.00
LWZ (0.01 bpp) 0.00 100.00 0.00 0.00 0.00 0.00
WL (0.01 bpp) 99.23 0.00 0.77 0.00 0.00 0.00
Table 5.6: Confusion matrix of the multi-class steganalysis for the mixture database
Classified as
WL(%) LWZ(%) CTL(%) TP(%) PCT(%) Cover(%)
Cover 0.00 0.77 0.00 0.00 0.00 99.23
PCT (0.003 bpp) 0.00 0.00 2.31 0.77 96.92 0.00
TP (0.003 bpp) 2.31 1.54 0.00 96.15 0.00 0.00
CTL (0.003 bpp) 0.00 0.77 92.31 0.77 6.15 0.00
LWZ (0.003 bpp) 1.54 96.15 0.77 1.54 0.00 0.00
WL (0.003 bpp) 96.92 1.54 0.77 0.77 0.00 0.00
PCT (0.006 bpp) 0.00 0.00 2.31 0.00 97.69 0.00
TP (0.006 bpp) 0.00 0.77 0.00 99.23 0.00 0.00
CTL (0.006 bpp) 0.00 0.00 99.23 0.00 0.77 0.00
LWZ (0.006 bpp) 0.00 99.23 0.00 0.00 0.77 0.00
WL (0.006 bpp) 98.46 0.00 0.00 0.00 0.77 0.77
PCT (0.01 bpp) 0.00 0.00 1.54 0.00 98.46 0.00
TP (0.01 bpp) 0.00 0.00 0.00 100.00 0.00 0.00
CTL (0.01 bpp) 0.00 0.00 99.23 0.00 0.77 0.00
LWZ (0.01 bpp) 0.00 99.23 0.00 0.00 0.77 0.00
WL (0.01 bpp) 99.23 0.00 0.00 0.00 0.77 0.00
Table 5.7: Confusion matrix of the multi-class steganalysis for the scene database
Classified as
WL(%) LWZ(%) CTL(%) TP(%) PCT(%) Cover(%)
Cover 2.26 5.26 1.88 8.65 0.38 81.58
PCT (0.01 bpp) 0.76 0.38 4.51 0.00 93.99 0.38
TP (0.01 bpp) 11.65 3.01 0.38 80.08 0.00 4.89
CTL (0.01 bpp) 1.13 1.50 89.10 0.00 3.76 4.51
LWZ (0.01 bpp) 1.88 91.35 0.38 2.26 0.38 3.76
WL (0.01 bpp) 85.34 2.26 0.38 10.90 0.38 0.75
Chapter 6

Hidden Message Length Estimation
The field of information hiding has two facets. The first relates to the design
of efficient and secure data hiding and embedding methods. The second facet,
steganalysis, attempts to discover hidden data in a medium. Under ideal circum-
stances, an adversary who applies steganalysis wishes to extract the full hidden in-
formation. This task, however, may be very difficult or even impossible to achieve.
Thus, the adversary may start steganalysis with more realistic and modest goals.
These could be restricted to finding the length of hidden messages, identification
of places where bits of hidden information have been embedded, estimation of the
stegokey and classification of the embedding algorithms. Achieving some of these
goals enables the adversary to improve the steganalysis, making it more effective
and appropriate for the steganographic method used.
Most works published on steganalysis relate to methods that use colour or greyscale
images. Steganography that uses binary images has received relatively little atten-
tion. This can be partially attributed to the difficulty of applying the statistical
model used for colour and greyscale images and adapting it to the new environ-
ment. In spite of this difficulty, binary images are very popular and frequently used
to store textual documents, black and white pictures, signatures and engineering
drawings, to name a few.
Colour and greyscale images are characterised by a rich collection of various statis-
tical features that have been used to develop new steganalysis techniques. Unlike
colour and greyscale images, binary images have a rather modest statistical na-
ture. In general, it is difficult to convert steganalysis used for colour or greyscale
images to an attack on binary images. However, we have successfully adapted
the concepts used for colour or greyscale images and we propose a new collection
of statistical features that estimate the length of hidden message embedded in
a binary image. Consequently, we can decide whether a given image contains a
hidden message. More precisely, we can tell apart the cover images from the stego
images.
We must emphasise that our steganalysis is designed to attack the steganographic
technique developed in [69]. This means that our analysis is a type of targeted
steganalysis. In this work, we define the length of embedded message as the ratio
between the number of bits of the embedded message and the maximum number
of bits that can be embedded in a given binary image. Note that we use the
terms message length, embedded message length and hidden message length as
synonyms.
The organisation of this chapter is as follows. In the next section, we give a brief
summary of the steganographic method we analyse. The technique of analysis
we apply is given in Section 6.2. Section 6.3 presents the results of the analysis.
Section 6.4 concludes the chapter.
6.1 Boundary Pixel Steganography
We briefly introduce the steganographic method under analysis in this section.
Note that this steganographic method [69] is described in detail in Subsection
3.1.1 and is summarised here.
The steganography developed in [69] is a variant of boundary pixel steganography.
This method uses a binary image as the medium for secret message bits. A set of
rules is proposed to determine the data carrying eligibility of the boundary pixels.
This plays an important role in ensuring that embedding produces minimum
distortion and in obtaining error-free message extraction. In addition, the embedding
algorithm generates no isolated pixels. This method also employs a PRNG to
produce a random selection path for embedding.
As the embedding algorithm modifies only boundary pixels, the visual distortions
are minimal and there is no pepper-and-salt like noise. However, if we take a close
look at an image with an embedded message, we can observe small pixel-wide
notches and protrusions near the boundary pixels.
We use these small distortions to launch an attack on the steganographic algo-
rithm. In our attack, we first detect the existence of a hidden message and then
estimate its length.
6.2 Proposed Method
We want to propose a steganalysis technique that can counteract the steganog-
raphy developed in [69]. However, given an image, we do not know whether the
image is a cover or a stego image without a priori knowledge. What we can do is
to extract some useful characteristics from the given image. These characteristics
may reveal an estimate of the length of the embedded hidden message (i.e., if zero
per cent is estimated, the given image is a cover image; if a certain nonzero
percentage is estimated, it is a stego image).
We first define a statistic by measuring the number of notches and protrusions
in the image. This statistic will reflect the degree of image distortion. Then we
define a numerical value associated with this statistic. Finally, we show that this
numerical value is approximately proportional to the size of the embedded mes-
sage, which enables us to compute an estimate of the embedded message length.
6.2.1 512-Pattern Histogram as the Distinguishing Statistic
For any boundary pixel (as shown in Figure 6.1), we can form a certain pixel
pattern together with its eight neighbouring pixels. Examples of the pattern are
shown in Figure 6.2 (the shaded box represents a pixel value of zero and the white
box represents a pixel value of one). Altogether, 512 patterns can be formed by
the different combinations of black and white pixels in a block of nine pixels;
however, two patterns cannot be used, because they do not contain any boundary
pixels. Clearly, these patterns are formed by either all black or all white pixels.
To simplify our considerations, we assume that there are 512 patterns.
The 512-pattern histogram H(J) tabulates the frequency of occurrence of each
pattern in the given image J . The frequency of occurrence hi for the ith pattern
Figure 6.1: Illustration of a boundary pixel in a magnified view of some portion of the ‘n’ character
Figure 6.2: Examples of the patterns formed by a single boundary pixel (denoted by b) and its eight neighbouring pixels (denoted by n) from a binary image
is given by

hi = ∑_{k=1}^{M} δ(i, p(k)),    (6.1)
where p(k) denotes the kth pattern in the given image, M is the total number of
patterns in the given image and δ is the Kronecker delta function (δ(u, v) = 1 if u = v
and 0 otherwise). For brevity, we let H represent H(J) and have
H = {hi | 1 ≤ i ≤ 512}. (6.2)
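To make the construction concrete, the following is a minimal Python sketch of the 512-pattern histogram (the thesis prototype was written in MATLAB; this translation and the function name are ours). Each 3×3 neighbourhood is encoded as a 9-bit pattern index. Note that the full method builds the histogram only from patterns centred on boundary pixels; for brevity, this sketch counts every interior pixel.

```python
def pattern_histogram(img):
    """Return h, where h[i] counts occurrences of the ith 3x3 pattern.

    img is a binary image given as a list of rows of 0/1 pixel values.
    Each 3x3 neighbourhood is read row by row into a 9-bit index,
    so there are 2**9 = 512 possible patterns (Equations 6.1 and 6.2).
    """
    h = [0] * 512
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            idx = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    idx = (idx << 1) | img[y + dy][x + dx]
            h[idx] += 1
    return h
```

An all-zero image contributes only to bin 0, and an all-one image only to bin 511, matching the two degenerate patterns the chapter notes contain no boundary pixel.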
In this 512-pattern histogram, a cover image has some high-frequency bins (corre-
sponding to the pattern types) and some other bins with low frequency. However,
these bins tend to be flattened1 after embedding (see Figure 6.3 for example).
This happens because, during embedding, some image pixels are flipped to carry
message bits, which disturbs the inter-pixel correlation. This is reflected in the
pattern changes.
The longer the embedded message, the flatter the 512-pattern histogram. From
¹ By flattened we mean that some of the local maxima in the histogram will decrease and some of the local minima will increase.
Figure 6.3: Some of the bins from the 512-pattern histogram are selected to illustrate the comparison between a cover image and a stego image (embedded with 80 per cent of the message length). Note that some of the bins are flattened in the stego image
this observation, we propose to compute the histogram difference to capture the
“flatness” of the 512-pattern histogram. The histogram difference is the bin-wise
absolute difference between the 512-pattern histograms for two images. The first
histogram is from the given binary image and the second is from the same image
after it has been re-embedded with a random message of the maximum length (100 per
cent). The re-embedding operation uses the steganographic technique described
in Section 6.1.
The following equation defines the second histogram:
H′ = {h′i | 1 ≤ i ≤ 512}, (6.3)
where h′i is the corresponding frequency of occurrence for the ith pattern in the
same image that has been re-embedded with 100 per cent of the length of a random
message. Then the histogram difference can be written as follows:
HD = {|hi − h′i| | 1 ≤ i ≤ 512}, (6.4)
where | · | represents the absolute value.
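In code, the histogram difference of Equation (6.4) is a single bin-wise operation. This sketch (the function name is ours) assumes h and h_prime are the two 512-entry histograms of Equations (6.2) and (6.3):

```python
def histogram_difference(h, h_prime):
    """Bin-wise absolute difference of two pattern histograms (Equation 6.4)."""
    return [abs(a - b) for a, b in zip(h, h_prime)]
```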
We choose to calculate the histogram difference because using the 512-pattern
histogram for the given image directly is insufficient. The 512-pattern histogram
for a given image does not fully represent the embedding artefacts and may be
biased by the image content. It would be easier if we had both the cover and
stego versions of the images—the differences between these two images would be
only from the embedded message. However, under normal circumstances we do
not have both versions. Therefore, it is useful to work backward by determining
how many remaining boundary pixels can be used for embedding, which gives an
estimate of the number of message bits that have (or have not) been embedded. This
explains why we opted to use re-embedding to obtain the histogram difference.
We use Figure 6.4 to illustrate our considerations. Figure 6.4(a) shows two binary
images that are slightly different but their pattern histograms (Figure 6.4(b)) are
entirely different. However, as shown in Figure 6.4(c), the histogram differences for
the respective bins of the two binary images are (almost) identical. This argument
supports the use of the histogram difference.
6.2.2 Matrix Right Division
To allow the histogram difference to measure the embedded message length, we
propose using matrix right division. Matrix right division can be considered a
transformation of a histogram to a numerical value (one-dimensional metric). Al-
ternatively, matrix right division can be seen as an attempt to solve an appropriate
system of linear equations.
The matrix right division used is from the standard MATLAB R2007b built-in
matrix division function and defined as follows:
If A is a non-singular and square matrix and B is a row vector, then
x = B/A is the solution to the system of linear equations x × A = B
computed by Gaussian elimination with partial pivoting.
If x×A = B is an over-determined system of linear equations, then x =
B/A is the solution in the least squares sense of the over-determined
system.
In general, when A is non-singular and square, the system has an exact solution
given by x = B × A−1 where A−1 is the inverse of matrix A. Hence, the solution
can be computed by multiplying the row vector B with the inverse of matrix A. It
is also defined as multiplication with the pseudo inverse (refer to [58, 45] for details
of the pseudo inverse). However, a solution based on a matrix inverse is inefficient
for practical applications and may cause large numerical errors. A better approach
is to use matrix division.
For an over-determined system of equations, it is impossible to compute the inverse
(a)
1 2 3 4 5 6 7 8 9 100
50
100
150
200
250
300
350
400
450
Selected Bins from the 512 Patterns Histogram
Fre
quen
cy
Image # 1Image # 2
(b)
1 2 3 4 5 6 7 8 9 100
100
200
300
400
500
600
700
800
Selected Bins from the 512 Histogram Difference
Fre
quen
cy
Image # 1Image # 2
(c)
Figure 6.4: (a) Two sample binary images. (b) Some bins from the 512-pattern histogram of the binary images shown in (a). (c) The respective bins in the histogram difference of the binary images shown in (a).
of matrix A. However, a solution can still be computed by minimising the
Euclidean length (norm) of the residual r = x × A − B, that is, the sum of the
squares of its elements. This is what matrix right division does, yielding a
solution in the least squares sense. In our application of matrix
right division, matrix A is actually a row vector of the same length as B.
It is reasonable to consider a pattern histogram as a row vector; since the his-
togram difference is the bin-wise absolute difference between two 512-pattern his-
tograms, the histogram difference can be considered a row vector as well. Thus,
we can perform matrix right division between the histogram difference and the
512-pattern histogram of a given binary image, as in Equation (6.5). We call the
resulting numerical value a histogram quotient hq. However, the division is not an
element-wise division.
hq = HD/H. (6.5)
We illustrate matrix right division in the following examples:

Example 1: x = [2 4 8], y = [1 2 4], x/y = 2
Example 2: x = [2 4 12 9 18 6], y = [1 2 6 3 6 2], x/y = 2.5444
In Example 1, every element of x is twice the corresponding element of y, so
the quotient of the matrix right division is two. In Example 2, the first three
elements of x are twice the first three elements of y, while the last three
elements of x are three times the last three elements of y. Thus, the quotient
of the matrix right division is 2.5444, a value between two and three.
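For the row-vector case used in this chapter, MATLAB's right division B/A reduces to a scalar least-squares fit, x = (B·Aᵀ)/(A·Aᵀ). The following Python sketch (ours, covering only this special case, not MATLAB's general algorithm) reproduces the two examples above:

```python
def right_divide(b, a):
    """Least-squares solution x of x * a = b for row vectors a and b.

    Minimising the sum of squares of (x * a_i - b_i) over scalar x gives
    the projection x = (b . a) / (a . a).
    """
    num = sum(bi * ai for bi, ai in zip(b, a))  # b . a
    den = sum(ai * ai for ai in a)              # a . a
    return num / den
```

With the vectors of Example 1, `right_divide([2, 4, 8], [1, 2, 4])` returns 2.0, and with Example 2 it returns approximately 2.5444, matching the MATLAB quotients.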
6.2.3 Message Length Estimation
In this subsection, we select a binary image to demonstrate the consistent be-
haviour of the histogram quotient. We re-embedded it in five per cent increments
of the message length and observed the response of the histogram quotient.
As shown in Figure 6.5, the respective histogram quotient increases almost lin-
early until a certain point, beyond which a further increase in the length of the re-embedded
message does not increase the histogram quotient. This shows that the desired
Figure 6.5: Histogram quotient with a five per cent increment in the re-embedded message length
consistency can be obtained by using a histogram quotient based on the histogram
difference. Therefore, as discussed in Subsection 6.2.1, finding the corresponding
difference between the two circles shown in Figure 6.5 proves crucial and provides
us with a strong indicator for estimating the embedded message length.
In short, our proposed method first identifies all boundary pixels in a given binary
image. The boundary pixel used here is defined as a pixel that has at least one
neighbouring pixel (among the four neighbouring pixels) with a different pixel
value. Then the 512-pattern histogram will be obtained from these boundary
pixels. Based on this pattern histogram, the histogram difference is computed and
the histogram quotient is calculated (denoted hq in Equation (6.5)). Finally, we
employ linear interpolation to obtain an approximate constant of proportionality
c such that hq ≈ c × ℓ, where ℓ is the message length. Then, for any particular
value of hq, we can compute an estimate of ℓ using ℓ ≈ hq/c.
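The calibration step can be sketched as follows. The function names are ours, and the calibration pairs in the usage example are illustrative values, not measurements from this thesis.

```python
def fit_c(lengths, quotients):
    """Least-squares slope through the origin for hq ~= c * length.

    Minimises sum((hq - c * length)^2) over the calibration pairs.
    """
    num = sum(l * q for l, q in zip(lengths, quotients))
    den = sum(l * l for l in lengths)
    return num / den

def estimate_length(hq, c):
    """Invert hq ~= c * length to estimate the embedded message length."""
    return hq / c
```

For instance, with illustrative calibration pairs (10, 0.05), (20, 0.10) and (30, 0.15), the fitted constant is c = 0.005, and an observed quotient of 0.2 yields an estimated length of 40 per cent.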
6.3 Experimental Results
6.3.1 Experimental Setup
The experimental settings are described below:
❐ The embedding algorithm used to create the stego images is the steganog-
raphy proposed in [69].
❐ The total embeddable pixels per image produced by this embedding algo-
rithm is about 25 per cent of the total boundary pixels.
❐ The maximum message length (100 per cent length) is defined as the total
number of embeddable pixels per image.

Figure 6.6: Estimated length of hidden messages for all binary images
❐ Eight sets of stego images (i.e., 10, 20, 30, 40, 50, 60, 70 and 80 per cent)
are created from 659 binary cover images.
❐ The cover images are all textual documents with a white background and
black foreground.
❐ The resolution of all binary images is 200 dpi, with an image size of 800 × 800 pixels.
❐ The prototype is implemented in MATLAB R2008a.
6.3.2 Results of the Estimation
From the 5931 images (the mixture of cover and stego images), we estimate the
length of the embedded message using our proposed method and compare it with
the actual embedded lengths of 0, 10, 20, 30, 40, 50, 60, 70 and 80 per cent.
Zero per cent represents a cover image.
The estimation results are shown in Figure 6.6. The estimated lengths are very
close to the actual lengths. The estimates for long embedded messages, such
as 80 per cent, are not as close as the others, although they remain accurate.
At such a high percentage, some stego images are quite distorted
and the pixels exhibit a high degree of randomness. We believe this randomness
causes slight instability of our proposed method; however, this phenomenon does
not pose a serious problem because we can easily spot the embedding artefacts
in such a highly distorted stego image (Figure 6.7 shows a highly distorted stego
image).
Table 6.1 summarises the mean and standard deviation of all the estimated message
lengths according to the actual embedded message lengths. The average value for
Figure 6.7: Example of a highly distorted stego image embedded with 80 per cent of the message
Table 6.1: Mean and standard deviation of the estimation
Length (%) Mean Standard Deviation
0 −0.0277 1.8761
10 9.8540 1.8034
20 19.8438 1.5966
30 29.9271 1.4337
40 39.9608 1.3445
50 50.0210 1.2869
60 60.0763 1.2747
70 70.3436 1.7666
80 79.9598 2.0547
each estimated length is very close to the actual length. The standard deviation is
also very small—only about one or two per cent. This implies that the estimated
lengths do not deviate much from the actual lengths.
The estimation errors are displayed in Figure 6.8. The estimation error for each
binary image is computed as the difference between the estimated and the actual
embedded message length in percentage terms. The estimation errors are
relatively low and concentrated around 0.00 per cent. The largest errors, of
about 6.00 per cent, occur only occasionally; a single outlier has an error of
7.43 per cent.
Figure 6.8: Estimation error of hidden message length for all binary images
6.4 Conclusion
The method proposed in this work can detect the steganography developed in
[69] and estimate the length of the embedded message. In this estimation, we
first build the 512-pattern histogram from a binary image as the distinguishing
statistic. From this 512-pattern histogram, we compute the histogram difference
to capture the changes caused by the embedding operation. Performing matrix
right division creates a histogram quotient. Based on this histogram quotient,
the length of the embedded message is estimated. We used a large image
database, consisting of 5931 binary images (one set of cover images and eight
sets of stego images), to test the proposed method. From the experimental
results, we conclude that our proposed method effectively estimated the hidden
message length with low estimation error.
We observe that using only a set of rules to select suitable data-carrying
pixels is insufficient, because the notches and protrusions produced by
embedding can still be exploited to mount an attack. To alleviate this
shortcoming in the steganography, we suggest incorporating an adaptive pixel
selection mechanism to identify suitable data-carrying pixels.
Chapter 7

Steganographic Payload Location Identification
In general, as discussed in Section 3.2, the task of steganalysis involves several dif-
ferent levels of analysis (also considered different forms of attacks). They are the
determination of the existence of a hidden message, classification of the stegano-
graphic methods, finding the length of hidden message, identification of locations
where bits of hidden message have been embedded and retrieval of the stegokey.
Compared to other forms of attack, the identification of locations that carry stego
pixels and retrieval of the stegokey have received relatively less attention in the
literature. These attacks require extracting extra and more information about
the steganography method used. Consequently, they are much more difficult than
attacks that extract only partial information. For instance, the estimation of
the stegokey in the attack given in [42] requires the identification of the hidden
message length.
In this chapter, we develop an attack that identifies the steganographic payload
locations in binary images, where bit-replacement steganography is used. More
precisely, our proposed method will find and locate the pixels in the image used
to carry secret message bits. Note that steganographic payload, hidden data and
message are used interchangeably throughout the chapter.
The remainder of the chapter is structured as follows. In the next section, we
provide the related background. The motivation for this research and the main research
challenges are discussed in Section 7.2. The attack is discussed in detail in Sec-
tion 7.3. Section 7.4 gives experimental results for the attack and the chapter is
concluded in Section 7.5.
7.1 Background
Some attacks, such as blind steganalysis or stego message length estimation, can
determine if a given image is a stego or cover image. Assume that we have already
determined that the image contains a steganographic payload. The next and quite
natural step is to identify the location of the hidden message.
Because of the invasive nature of steganography, the embedding operation is likely
to disturb the inter-pixel correlations. The embedding operation creates pixels
with high energy, as defined by Davidson and Paul [27]. They developed a method
to measure the energy caused by the embedding disturbance and were able to
identify pixels with high energy that are likely to carry the hidden message bits.
However, their method suffers from high false negatives or missed detections when
some message bits do not change the pixel energy. This occurs when the parity of
the hidden message bits is the same as the parity of the image pixels.
Kong et al. in [68] used the coherence of hue in a colour image to identify a subset
of pixels used to carry the hidden message bits. They observed that, in cover
images (without a hidden message), the coherence of hue varies slowly and tends
to be constant in a small neighbourhood of pixels. This is no longer true when a
hidden message is embedded. Thus, when the hue of a region under examination
exceeds a certain threshold, there is good reason to suspect that it contains bits
of the hidden message. Unfortunately, this analysis only works for steganography
with sequential embedding. If the embedding is random, then this attack fails.
In [62], Ker showed how to use the residual of the weighted stego image to identify
the location of bits of the hidden message. The residual is the pixel-wise difference
between the stego image and the estimated cover image. This analysis requires a
large number of stego images. The only concern is whether it is possible to obtain
multiple different stego images with the payload embedded in the same locations.
Nevertheless, this is plausible when the same stegokey is reused across different
stego images. In a separate paper, the author applied a similar concept to attack
LSB-matching steganography, where it also proved effective [64].
7.2 Motivation and Challenges
The promising results obtained by the Ker method in analysis of greyscale im-
age steganography motivated us to take a closer look at the analysis and extend
the concept to binary image steganography. The Ker method is superior to the
methods developed in [27, 68]. More importantly, the method can be applied to
both sequential and random embedding and has a low false negative rate.
These two advantages are very important. For example, the problem of false
negatives gets worse and becomes critical when a message is encrypted before
embedding. The bits of an encrypted message behave like truly random ones
with uniform probability distribution for zeros and ones. This implies that half
of the time the message bits will match the pixel LSBs, so no change takes place.
Consequently, nothing can be detected. It is well known that sequential embedding
is insecure as it can be easily detected using a visual inspection. Most current
steganographic techniques employ random embedding. Thus, steganalysis is of
limited use if it can only attack sequential embedding.
Let us introduce the ideas used in our work. We follow the conventions used by Ker
in his work [62]. Consider a stego image given as a sequence of pixels S = {s1, s2, · · · , sn}
and an estimated cover image C = {c1, c2, · · · , cn}. C can be estimated from
the stego image by taking the average of the four connected neighbouring pixels
(linear filtering). n is the total number of pixels. Now we can define a vector of
residuals
ri = (si − s̄i)(si − ci), (7.1)

where s̄i denotes the ith pixel of the stego image with its LSB flipped.
Assume that we have N multiple stego images. We can define the residual of the
ith pixel in the jth stego image as
rij = (sij − s̄ij)(sij − cij). (7.2)
The mean of the residual of the ith pixel can be computed as follows:
ri· = (1/N) ∑_{j=1}^{N} rij. (7.3)
With a sufficient number of stego images, this mean of residuals will provide
strong evidence that can be used to separate the stego-bearing pixels from non-
stego-bearing pixels.
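Equations (7.1) to (7.3) can be sketched directly in Python (the function names are ours). Here images are assumed to arrive as flat lists of integer pixel values, with the cover estimates (possibly non-integer, e.g. filtered) already computed.

```python
def residual(s, c):
    """r_i = (s_i - sbar_i)(s_i - c_i), Equation (7.1).

    sbar_i is s_i with its least significant bit flipped (s_i ^ 1),
    which for a binary image simply inverts the pixel.
    """
    return [(si - (si ^ 1)) * (si - ci) for si, ci in zip(s, c)]

def mean_residual(stego_images, cover_estimates):
    """Pixel-wise mean of residuals over N stego images, Equation (7.3)."""
    n = len(stego_images[0])
    acc = [0.0] * n
    for s, c in zip(stego_images, cover_estimates):
        for i, ri in enumerate(residual(s, c)):
            acc[i] += ri
    return [a / len(stego_images) for a in acc]
```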
Although there are some similarities in the embedding algorithms for greyscale
and binary image steganography, the attack on different image types may require
a different approach. Unlike a greyscale image, a binary image has a rather modest
statistical nature. This makes it difficult to apply the existing method directly. In
addition, it is clear that the Ker technique offers a high accuracy; however, there
is always a trade-off between the required number of stego images and detection
accuracy.
7.3 Proposed Stego-Bearing Pixel Location Identification
In this section, we discuss our proposed method for attacking binary stego im-
ages embedded using bit-replacement steganography. Let us first introduce bit-
replacement steganography for a binary image. Given a cover image C =
{c1, c2, · · · , cn} and a stego image S = {s1, s2, · · · , sn} that contains a hidden
message. Since a binary image has only two intensities (black and white), the
embedding operation involves simply flipping the one-bit pixel (i.e. changing the
zeros to ones and vice versa) when the message bit does not match that of the
image pixel. Assume that the hidden message bits are embedded in a randomly
permuted order. The random permutation is obtained from a
PRNG that is controlled by a stegokey.
Clearly, it is possible for an adversary to gain access to multiple stego images
that reuse the same stegokey for a batch of covert communications. Although the
content of the secret message and the cover image used may differ every time,
the steganographic payload locations will be the same because they use the same
stegokey.
We begin by adapting the method proposed by Ker in [62]. This includes finding
the vector of residuals and employing multiple stego images to obtain the mean
of the residuals. However, due to the limited statistical characteristics of a binary image,
we need a different approach to estimating the cover image. We choose an image
smoothing approach to achieve binary cover image estimation. Several alternatives
exist; our empirical studies found that a Gaussian filter produces the best results.
The Gaussian filter is defined as follows:
g(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)), (7.4)

where x and y are the horizontal and vertical coordinates within the filter
window, respectively, and σ is the standard deviation of the Gaussian
distribution.
Once we estimate the cover image, we can compute the vector of residuals for each
stego image using Equation (7.1). We compute the mean of residuals, as shown
in Equation (7.3), by employing multiple stego images. The identification of pixel
locations containing a steganographic payload can be carried out by choosing the
M pixels with the highest mean residual ri·. According to the author in [62], M
can be calculated as M = 2n·r··, where r·· = (1/(nN)) ∑_{i=1}^{n} ∑_{j=1}^{N} rij.
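A sketch of this selection rule (the function name is ours), assuming the mean residuals have already been computed:

```python
def select_top_m(mean_res):
    """Select the M pixel locations with the highest mean residual.

    M is estimated as 2 * n * (grand mean of the residuals), following
    the rule quoted from [62]; the result is a set of pixel indices.
    """
    n = len(mean_res)
    grand_mean = sum(mean_res) / n
    m = min(n, round(2 * n * grand_mean))
    order = sorted(range(n), key=lambda i: mean_res[i], reverse=True)
    return set(order[:m])
```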
However, the estimation of M for binary images does not always give as accurate an
estimate as for greyscale images. This happens because of the modest statistical
characteristics of a binary image and becomes more severe when N is small.
To overcome this problem, we propose incorporating the entropy measurement.
Entropy is defined as follows:
E(I) = − ∑_{i=1}^{K} pi log2 pi, (7.5)
where I can be the given stego image S or the estimated cover image C, and pi
is the probability of occurrence of the ith pixel intensity, out of K possible
intensities. However, computing entropy for the entire image at once will give us
a global feature. Instead, we use Equation (7.5) to compute the local entropy of
the 3×3 neighbourhood around the ith pixel. We can obtain the local entropy for
every pixel in both the stego image and its corresponding estimated cover image.
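The local entropy of Equation (7.5), restricted to a 3×3 neighbourhood of a binary image (K = 2), can be sketched as follows (the function name is ours):

```python
from math import log2

def local_entropy(img, y, x):
    """Shannon entropy of the 3x3 neighbourhood around pixel (y, x).

    For a binary image, only two intensities occur, so the entropy
    depends only on the fraction of ones in the nine-pixel window.
    """
    ones = sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    p1 = ones / 9.0
    e = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0.0:
            e -= p * log2(p)
    return e
```

A uniform window gives zero entropy, while a mixed window approaches the one-bit maximum, which is what makes the measure sensitive to embedding-induced randomness.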
Entropy has been widely used as a statistical measure of randomness to characterise
the content of an image. Note that the embedding operation alters the image
content, which directly changes the degree of randomness in the image. Thus,
incorporating entropy appears to be an appropriate way of capturing the embedding
artefact.
The next question would be, how do we combine the mean residual with the local
entropy? Firstly, we find the local entropy difference,
d_i = ε_i^s − ε_i^c,   (7.6)

where ε_i^s and ε_i^c are the local entropies of the ith pixel in the stego image and its
estimated cover image, respectively. Secondly, we employ multiple stego images
to compute the mean of the local entropy differences d̄_{i·} by replacing r_{ij} in Equation
(7.3) with d_{ij}; d_{ij} is obtained in a manner similar to r_{ij} in Equation (7.2).
Table 7.1: Summary of image databases
Database Total Images Resolution Image Size
Database A 5867 300 dpi 400 × 400
Database B 1338 96 dpi 512 × 384
Database C 2636 200 dpi 400 × 400
Thirdly, we construct two pixel subsets, S_r ⊆ S and S_d ⊆ S, by evaluating the
mean of the residuals and the mean of the local entropy differences, respectively.
Subset S_r contains the pixels with the highest mean residuals; we take 10 per cent
more than M pixels, where the 10 per cent margin is determined empirically and
aims to obtain slightly more samples. For the second subset, S_d, we select those
pixels whose mean local entropy difference exceeds a threshold τ. Finally, the
pixels containing the steganographic payload are identified as S_r ∩ S_d.
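The three-step selection can be sketched as follows, assuming the mean residuals and mean local entropy differences have already been computed; the helper name, the toy data and the rounding of the 10 per cent margin are all illustrative.

```python
def locate_payload(mean_residual, mean_entropy_diff, M, tau=0.05):
    """Identify candidate stego-bearing pixel indices as the intersection
    of S_r (top mean residuals) and S_d (large mean entropy differences)."""
    n_r = min(len(mean_residual), int(round(1.1 * M)))   # 10% more than M pixels
    order = sorted(range(len(mean_residual)),
                   key=lambda i: mean_residual[i], reverse=True)
    s_r = set(order[:n_r])                               # highest mean residuals
    s_d = {i for i, d in enumerate(mean_entropy_diff) if d > tau}
    return sorted(s_r & s_d)

# Toy data: pixels 1 and 3 have both a high residual and a large entropy change.
residuals = [0.01, 0.40, 0.02, 0.35, 0.05]
ent_diffs = [0.00, 0.20, 0.01, 0.15, 0.30]
located = locate_payload(residuals, ent_diffs, M=2)
```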
7.4 Experimental Results
7.4.1 Experimental Setup
To cover a diverse set of images, we constructed three image databases. The first
image database consists of 5867 binary cover images. These images are cropped
from a set of 4288 × 2848 pixel RAW images captured by a Nikon D90 digital
camera. Then we use the conversion software supplied by the camera manufacturer
to convert the images to TIFF. In the second image database, we constructed 1338
binary images from the image database used in [98]. The third image database
consists of 2636 binary cover images.
The images in the first and second databases are natural scene images. The images
in the third database are textual document images. The cropping operation and
greyscale to binary conversion are carried out with Irfanview version 4.10 freeware.
Overall, we constructed 9841 cover images and the databases are summarised in
Table 7.1. For brevity, we call these Database A, B and C.
We use bit-replacement steganography (as discussed in the first paragraph of
Section 7.3) to generate stego images from the three image databases for different
message lengths. We generated three message lengths (0.01, 0.05 and 0.10 bpp)
for each database. Since a binary image has only one bit per pixel, we can think of
bpp as the average number of message bits embedded per image pixel. For example,
0.01 bpp embedding means that, for every 100 pixels, only one pixel is used to carry
message bits. We employ a uniform distribution of random message bits for the
experiments.
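A sketch of this bit-replacement embedding, with a seeded pseudo-random choice of locations standing in for the sender's key; the exact key handling of Section 7.3 is not reproduced, and the function name is illustrative.

```python
import random

def embed_bit_replacement(pixels, message_bits, seed=42):
    """Replace the values of randomly chosen pixels with the message bits.

    `pixels` is a flat list of binary pixel values; the embedding rate in
    bpp is len(message_bits) / len(pixels).
    """
    rng = random.Random(seed)
    locations = rng.sample(range(len(pixels)), len(message_bits))
    stego = list(pixels)
    for loc, bit in zip(locations, message_bits):
        stego[loc] = bit
    return stego, sorted(locations)

cover = [0] * 100                       # 100-pixel binary "image"
message = [1]                           # 1 bit over 100 pixels -> 0.01 bpp
stego, locs = embed_bit_replacement(cover, message)
```

Note that a replaced pixel whose value already matches the message bit is left unchanged in effect, which is exactly why single-image localisation is hard (Section 7.2).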
For the parameters of our proposed method, we set x = 3, y = 3 (abbreviated as
3× 3) and σ = 0.6 for the Gaussian filter. We also tried several different window
sizes for the Gaussian filter and the results are given in Figure 7.1. Window sizes
of 3×3 and 5×5 give the optimum performance. Since the 3×3 and 5×5 windows
give about the same accuracy, we chose 3×3 to reduce the demand for
computational resources. The threshold τ is set to 0.05.
[Figure 7.1 plots the true positive rate, TP (%), against the number of images, N, for window sizes 2×2 to 6×6.]

Figure 7.1: Identification results of 0.05 bpp for different window sizes: (a) Database A (b) Database B (c) Database C
7.4.2 Results Comparison
To evaluate the accuracy of the identification, we compared the estimated locations
with the actual set of stego-bearing pixel locations. For each message length, we
show the accuracy of the identification in terms of true positives (abbreviated as
TP), false positives (FP) and false negatives (FN).
We divided the results into three tables, one for each image database (Tables 7.2,
7.3 and 7.4, respectively). The identification accuracy in each table is given as a
percentage.
The tables show clearly that the proposed method gives very promising results.
Especially in Table 7.2, the identification is nearly perfect for N = 100 and perfect
for N greater than 300 images. Similarly, reliable accuracy is also shown in Table
Table 7.2: The accuracy of the stego-bearing pixel location identification for image Database A (* indicates the message length)
# of images, N 100 200 300 · · · > 320
TP (* 0.01bpp) 100 100 100 · · · 100
FP (* 0.01bpp) 0.00 0.00 0.00 · · · 0.00
FN (* 0.01bpp) 0.00 0.00 0.00 · · · 0.00
TP (* 0.05bpp) 99.95 99.99 100 · · · 100
FP (* 0.05bpp) 0.00 0.00 0.00 · · · 0.00
FN (* 0.05bpp) 0.05 0.01 0.00 · · · 0.00
TP (* 0.10bpp) 99.80 99.96 99.99 · · · 100
FP (* 0.10bpp) 0.04 0.00 0.00 · · · 0.00
FN (* 0.10bpp) 0.16 0.04 0.01 · · · 0.00
Table 7.3: The accuracy of the stego-bearing pixel location identification for image Database B (* indicates the message length)
# of images, N 100 200 300 · · · > 820
TP (* 0.01bpp) 99.90 100 100 · · · 100
FP (* 0.01bpp) 0.05 0 0 · · · 0
FN (* 0.01bpp) 0.05 0 0 · · · 0
TP (* 0.05bpp) 99.77 100 100 · · · 100
FP (* 0.05bpp) 0.11 0.00 0.00 · · · 0.00
FN (* 0.05bpp) 0.12 0.00 0.00 · · · 0.00
TP (* 0.10bpp) 99.62 99.86 99.92 · · · 99.98
FP (* 0.10bpp) 0.19 0.07 0.04 · · · 0.01
FN (* 0.10bpp) 0.19 0.07 0.04 · · · 0.01
Table 7.4: The accuracy of the stego-bearing pixel location identification for image Database C (* indicates the message length)
# of images, N 100 200 300 · · · > 2600
TP (* 0.01bpp) 84.48 90.33 92.59 · · · 99.31
FP (* 0.01bpp) 1.95 0.94 0.50 · · · 0.06
FN (* 0.01bpp) 13.57 8.73 6.91 · · · 0.63
TP (* 0.05bpp) 86.69 90.84 92.85 · · · 99.48
FP (* 0.05bpp) 3.21 1.91 1.26 · · · 0.09
FN (* 0.05bpp) 10.10 7.25 5.89 · · · 0.43
TP (* 0.10bpp) 85.91 90.41 92.45 · · · 99.40
FP (* 0.10bpp) 4.11 2.35 1.72 · · · 0.13
FN (* 0.10bpp) 9.98 7.23 5.83 · · · 0.47
7.3, except for an embedded message length of 0.10 bpp, where near-perfect
identification is achieved for N > 820 images. The identification of stego-bearing
pixel locations for images in Database C (Table 7.4) appeared to be the most
difficult. However, the detection reliability is still very good: all the identifications
show at least 84 per cent TP for N = 100 and more than 90 per cent when N > 200.
Further analysis reveals that the textual content in image Database C has periodic
patterns that are uniform and consistent across the whole image, which significantly
increases the global image entropy. Since our method is partly based on the local
entropy, this interfered with our identification mechanism.
To the best of our knowledge, no stego-bearing pixel identification approach for
binary images has been proposed in the literature. Thus, we compare our proposed
method to a general method in which just the residual of the weighted stego images
and linear filtering are used. The results shown in Figures 7.2, 7.3 and 7.4
demonstrate that our proposed method performs better. With Database C,
however, the identification results for the two methods show only a marginal
difference, as Figure 7.4 illustrates. This is consistent with the explanation given
in the previous paragraph: local entropy is less effective on textual images.
[Figure 7.2 plots the true positive rate, TP (%), against the number of images, N.]

Figure 7.2: Comparison of results for image Database A (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp
[Figure 7.3 plots the true positive rate, TP (%), against the number of images, N.]

Figure 7.3: Comparison of results for image Database B (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp
[Figure 7.4 plots the true positive rate, TP (%), against the number of images, N.]

Figure 7.4: Comparison of results for image Database C (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp
7.5 Conclusion
We have proposed a steganalysis technique to identify the steganographic payload
locations in binary stego images. This work was motivated by the concept developed
in [62] for greyscale stego images. We enhanced the concept and applied it to binary
stego images, proposing Gaussian smoothing to estimate the cover images and local
entropy to improve the identification accuracy. Experimental results showed that
our proposed method provides reliable identification accuracy of at least 84 per
cent for N = 100 and more than 90 per cent when N > 200. The results also
showed that our method provides nearly perfect (≈ 99 per cent) identification for
N as low as 100 on non-textual stego images.
It is important to note that our proposed method will not produce the same
accuracy if only one stego image is available. Although this may seem like a
downside, it is unavoidable: if only one stego image is available, we do not have
sufficient evidence to locate the unchanged pixels whose LSBs already matched
the message bits. As a result, the problem of high false negatives discussed in
Section 7.2 (second paragraph) would occur.
Chapter 8
Feature-Pooling Blind JPEG
Image Steganalysis
From a practical point of view, blind steganalysis is more useful: if an image is
suspected of carrying a secret message, we can first use blind steganalysis to detect
the existence of a hidden message and then carry out further analysis, such as
identifying the steganographic technique used. The exception is the rare case where
a priori knowledge of the type of steganography is available (for example, when the
computer of a suspect is confiscated and a certain steganography tool is found on
it).
This chapter focuses on blind steganalysis. The analysis is carried out and tested
on greyscale JPEG image steganography. We will study several existing JPEG
image blind steganalysis techniques, especially their feature extraction techniques,
and then select and combine features to form a pooled feature set.
The rest of the chapter is structured as follows. In the next section, we discuss the
feature extraction techniques. The proposed feature-pooling steganalysis will be
given in Section 8.2. Section 8.3 presents the experimental results and the chapter
is concluded in Section 8.4.
8.1 Feature Extraction Techniques
Feature extraction plays an important role in blind steganalysis. A good feature
should be representative and sensitive to steganographic operations. Moreover,
the feature should be insensitive to image content. In the following subsections,
several well-known steganalysis techniques are discussed, with emphasis on their
feature extraction algorithms.
8.1.1 Image Quality Metrics
In [4], the authors proposed and selected a set of ten image quality metrics. These
metrics are the mean absolute error, mean square error, Czekanowski correlation,
angle mean, image fidelity, cross correlation, spectral magnitude distance, median
block spectral phase distance, median block weighted spectral distance and
normalised mean square HVS error.
These metrics were selected based on one-way ANOVA tests. Among them, seven
metrics are more sensitive in detecting active warden steganography, while the
other four are more sensitive in detecting passive warden steganography. Active
warden steganography is constructed to withstand alterations made by the warden
(steganalyst), that is, to be robust. Robustness is not the main objective in passive
warden steganography; rather, it is to conceal the existence of a secret message and
so create a covert communication (the descriptions of the active and passive warden
are given in Section 2.2). The metric sensitivity is based on the statistical
significance obtained from the ANOVA tests, where the tests are performed on
active and passive warden steganography separately.
8.1.2 Moment of Wavelet Decomposition
Lyu and Farid [73] proposed using higher-order statistics as features, namely the
mean, variance, skewness and kurtosis. Two sets of these higher-order statistics
are obtained, resulting in a 72-dimensional feature vector.
The first set is acquired from a wavelet decomposition based on separable
quadrature mirror filters. In the decomposition, a given image is decomposed into
multiple orientations and scales. Each scale has three orientations: the vertical,
horizontal and diagonal subbands. The elements in each subband are called wavelet
subband coefficients. The original paper used the first three scales, producing nine
subbands. The mean, variance, skewness and kurtosis of the coefficients of each
wavelet subband are computed, yielding 36 statistics, which are used as the first
set of features.
Next, based on the nine decomposed subbands, a linear predictor of the wavelet
coefficients is obtained from the neighbouring wavelet coefficients for each vertical,
horizontal and diagonal subband. The linear relationship for the predictor is
defined as

V = Qw,   (8.1)

where w is the weight vector, V is the vector of vertical subband coefficients and Q
is the matrix of neighbouring coefficients. The log error of the linear predictor is

E = log₂(V) − log₂(|Qw|).   (8.2)

The same linear predictor and log error are applied to the horizontal and diagonal
subbands. The second set of features is composed of the mean, variance, skewness
and kurtosis of the log errors of all nine subbands, which yields another 36 features.
8.1.3 Feature-Based
In [35], the cover image is estimated from a stego image using calibration. A
set of 20 features is constructed from the L1 norm of the difference between the
features of the estimated cover image and those of the stego image, where the L1
norm is the sum of the absolute values of a vector. Among these features, 17 are
first-order features. They are defined as follows:
❐ Global histogram: the frequency plot of quantised DCT coefficients
❐ Individual histogram: low frequency coefficient of individual DCT mode
histogram where five DCT modes are selected
❐ Dual histogram: the frequency of occurrence of the (i, j)-th quantised DCT
coefficient in an 8 × 8 block being equal to a fixed value d over the whole image,
defined as follows:

g^d_{i,j} = ∑_{k=1}^{B} δ(d, d_k(i, j)),   (8.3)

where δ(u, v) = 1 if u = v and 0 otherwise, B is the total number of blocks in
the JPEG image, and 11 values of d are selected.
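A sketch of the dual histogram count in Equation (8.3), using toy 2×2 "blocks" in place of real 8×8 quantised DCT blocks; the function name and data are illustrative.

```python
def dual_histogram(blocks, d, i, j):
    """Count blocks whose (i, j)-th quantised DCT coefficient equals d (Eq. 8.3)."""
    return sum(1 for block in blocks if block[i][j] == d)

# Two toy 2x2 "blocks" of quantised coefficients (real blocks are 8x8).
blocks = [[[0, 1], [2, 0]],
          [[0, 3], [2, 1]]]
g_0_00 = dual_histogram(blocks, 0, 0, 0)   # coefficient (0, 0) equal to 0
g_1_01 = dual_histogram(blocks, 1, 0, 1)   # coefficient (0, 1) equal to 1
```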
The other three second-order features are given below.
❐ Variation measures the inter-block dependency and is defined as follows:

V = ( ∑_{i,j=1}^{8} ∑_{k=1}^{|I_r|−1} |d_{I_r(k)}(i, j) − d_{I_r(k+1)}(i, j)| + ∑_{i,j=1}^{8} ∑_{k=1}^{|I_c|−1} |d_{I_c(k)}(i, j) − d_{I_c(k+1)}(i, j)| ) / (|I_r| + |I_c|),   (8.4)

where I_r and I_c are the collections of blocks scanned by rows and by columns,
respectively, throughout the image, and d_{I_r(k)}(i, j) is the quantised DCT
coefficient at the (i, j)-th position of the kth 8 × 8 block.
❐ Blockiness measures the spatial inter-block boundary discontinuity and is
defined as follows:

B_α = ( ∑_{i=1}^{⌊(M−1)/8⌋} ∑_{j=1}^{N} |p_{8i,j} − p_{8i+1,j}|^α + ∑_{j=1}^{⌊(N−1)/8⌋} ∑_{i=1}^{M} |p_{i,8j} − p_{i,8j+1}|^α ) / ( N⌊(M − 1)/8⌋ + M⌊(N − 1)/8⌋ ),   (8.5)

where α = 1, 2, p_{i,j} is the spatial pixel value, and M and N are the image
dimensions.
The final three features, which bring the total to 23 features in [35], are based
on the co-occurrence matrix, defined as follows:

C_{st} = ( ∑_{k=1}^{|I_r|−1} ∑_{i,j=1}^{8} δ(s, d_{I_r(k)}(i, j)) δ(t, d_{I_r(k+1)}(i, j)) + ∑_{k=1}^{|I_c|−1} ∑_{i,j=1}^{8} δ(s, d_{I_c(k)}(i, j)) δ(t, d_{I_c(k+1)}(i, j)) ) / (|I_r| + |I_c|),   (8.6)

where s, t ∈ {−1, 0, 1}, which gives nine combinations. From these combinations,
the three final features are obtained as the difference between C_{st} of the
estimated cover image and C_{st} of the stego image.
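The row-scan half of Equation (8.6) can be sketched as below; the column scan is analogous and omitted, so the normalisation here is simplified to the number of blocks rather than |I_r| + |I_c|, and the 2×2 toy blocks stand in for 8×8 ones.

```python
def cooccurrence(blocks, s, t):
    """Normalised count of adjacent block pairs whose (i, j)-th coefficients
    equal s then t (row-scan half of Eq. 8.6, simplified normalisation)."""
    if len(blocks) < 2:
        return 0.0
    size = len(blocks[0])
    count = 0
    for k in range(len(blocks) - 1):          # consecutive block pairs
        for i in range(size):
            for j in range(size):
                if blocks[k][i][j] == s and blocks[k + 1][i][j] == t:
                    count += 1
    return count / len(blocks)

blocks = [[[0, 1], [-1, 0]],
          [[0, 0], [-1, 1]]]
c_00 = cooccurrence(blocks, 0, 0)
c_mm = cooccurrence(blocks, -1, -1)
```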
8.1.4 Moment of CF of PDF
The use of characteristic functions (CF) in steganalysis was pioneered by Harmsen
and Pearlman in [48]. In their work, they model the stego image histogram as the
convolution of the hidden message's probability mass function with the cover image
histogram, because a steganographic operation can be considered as noise addition.
The characteristic function is obtained by applying the discrete Fourier transform
to the probability density function (PDF) of an image. From this characteristic
function, the first-order absolute moment (the centre of mass, in their terminology)
is computed and used as the feature:

M = ∑_{k=0}^{K} k |H[k]| / ∑_{k=0}^{K} |H[k]|,   (8.7)

where H[·] is the characteristic function, K = N/2 − 1 and N is the width of the
domain of the PDF.
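The moment in Equation (8.7) can be computed with a direct DFT of the histogram, as in the pure-Python sketch below (a real implementation would use an FFT); only the first N/2 frequency bins are used, the degenerate all-zero case is not handled, and the PDFs are illustrative.

```python
import cmath

def cf_first_moment(pdf):
    """Centre of mass of the characteristic function (Eq. 8.7): a direct DFT
    of the histogram/PDF, keeping the first N/2 frequency bins."""
    n = len(pdf)
    H = []
    for k in range(n // 2):                   # k = 0 .. N/2 - 1
        H.append(abs(sum(pdf[x] * cmath.exp(-2j * cmath.pi * k * x / n)
                         for x in range(n))))
    return sum(k * h for k, h in enumerate(H)) / sum(H)

flat = [0.25, 0.25, 0.25, 0.25]
peaky = [0.7, 0.1, 0.1, 0.1]
m_flat = cf_first_moment(flat)
m_peaky = cf_first_moment(peaky)
```

A flat PDF concentrates the CF energy at k = 0, so its moment is near zero, while a peaked PDF shifts mass to higher frequencies.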
8.2 Feature-Pooling Steganalysis
This section discusses the proposed feature-pooling method motivated by the fea-
ture selection capability discussed in [55]. The proposed method selects from the
existing sensitive discriminant features and pools them with another two feature
sets from different feature extraction techniques.
8.2.1 Feature Selection in Feature-Based Method
The first set of proposed feature-pooling features is obtained from [35]. The fea-
tures from [35] are selected because they include first- and second-order features
that are sensitive to steganographic operations. In addition, the experiments car-
ried out in Section 8.3.2 proved the efficacy of these features.
The feature selection technique used is the sequential forward floating selection
(SFFS) technique from [93]. As shown experimentally by Jain and Zongker [55],
SFFS dominated the other feature selection techniques tested. We also tested other
selection techniques based on the T-test and the Bhattacharyya distance; the
experimental results confirmed the superiority of SFFS. A comparison of the
results is summarised in Table 8.1.
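For orientation, the SFFS loop can be sketched as below with a toy criterion function; the real criterion in this chapter is a classifier-based separability measure, and the floating conditions in [93] are more elaborate than this simplification (no cycle guard is included, and all names are illustrative).

```python
def sffs(features, criterion, target_size):
    """Sequential forward floating selection, simplified: greedily add the best
    feature, then conditionally drop features while removal improves J."""
    selected = []
    while len(selected) < target_size:
        # Forward step: add the feature that maximises the criterion.
        best = max((f for f in features if f not in selected),
                   key=lambda f: criterion(selected + [f]))
        selected.append(best)
        # Floating step: drop a feature while that strictly improves J.
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for f in list(selected):
                reduced = [g for g in selected if g != f]
                if criterion(reduced) > criterion(selected):
                    selected = reduced
                    improved = True
                    break
    return selected

# Toy criterion: features "a" and "c" together form the informative pair.
def toy_criterion(subset):
    score = {"a": 0.4, "b": 0.1, "c": 0.3}
    bonus = 0.3 if "a" in subset and "c" in subset else 0.0
    return sum(score[f] for f in subset) / (1 + len(subset)) + bonus

selected = sffs(["a", "b", "c"], toy_criterion, 2)
```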
We also compared the efficiencies of the selected feature set (selected through the
SFFS technique) and the original 23-dimensional feature set. The comparison was
Table 8.1: Feature selection comparison for SFFS, T-test and Bhattacharyya
SFFS T-test Bhattacharyya
F5 0.86528 0.86063 0.85447
OutGuess 0.84505 0.84185 0.84232
MB1 0.80208 0.78638 0.79566
[Figure 8.1 plots the AUR against the number of combined features; the maximum AUR is 0.86528 at a combination of 9 features, versus 0.85447 for the original 23 features.]

Figure 8.1: Comparison between the selected features and the original features in detecting F5
made using three steganographic models, namely F5, OutGuess¹ and model-based
steganography² (MB1), from [110], [90] and [96], respectively. The F5, OutGuess
and MB1 steganography are discussed in Section 3.1. The area under the ROC
curve (AUR) is used to evaluate the detection accuracy and is shown in Figures
8.1, 8.2 and 8.3; the higher the AUR, the better the detection accuracy. It can
be clearly seen that the selected feature set performs better than the original
23-dimensional feature set for all three steganographic models. The Y-axis in each
graph represents the AUR, ranging from 0.5 to 1, and the X-axis is the number of
top-ranked features selected and combined by SFFS. The squared marker
corresponds to the AUR of the original 23-dimensional feature set and the circled
marker to that of the selected top-ranked features. The selected features form the
best feature set with optimum discriminant capability.
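The AUR itself can be computed from classifier output scores without plotting the ROC curve, via the rank (Mann-Whitney) formulation sketched below; the score values are illustrative, not from the experiments.

```python
def aur(scores_cover, scores_stego):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a stego image scores higher than a cover image,
    counting ties as half a win."""
    wins = 0.0
    for s in scores_stego:
        for c in scores_cover:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(scores_stego) * len(scores_cover))

cover_scores = [0.1, 0.2, 0.3]
stego_scores = [0.25, 0.6, 0.9]
auc_val = aur(cover_scores, stego_scores)
```

Identical score distributions give AUR = 0.5 (chance level), matching the diagonal of the ROC graph.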
8.2.2 Feature-Pooling
Pooling the selected feature set from the SFFS technique in Section 8.2.1 with
two additional feature sets from different feature extraction techniques creates the
¹ OutGuess steganography with statistic correction.
² Model-based steganography without deblocking.
[Figure 8.2 plots the AUR against the number of combined features; the maximum AUR is 0.84505 at a combination of 18 features, versus 0.83394 for the original 23 features.]

Figure 8.2: Comparison between the selected features and the original features in detecting OutGuess
[Figure 8.3 plots the AUR against the number of combined features; the maximum AUR is 0.80208 at a combination of 17 features, versus 0.78451 for the original 23 features.]

Figure 8.3: Comparison between the selected features and the original features in detecting MB1
final feature set for blind steganalysis. The first additional set is extracted from
the image quality metric developed in [4]. The second additional set is from the
feature extraction developed in [48], which is the moment of characteristic function
computed from the image PDF. These two feature sets are discussed in Sections
8.1.1 and 8.1.4, respectively.
Based on the analysis given in the original paper [4], we chose the four features
assigned for the passive warden steganography case, because the blind steganalysis
that we propose in this research is for passive warden steganography as well.
However, from these four features, we excluded the angle mean feature because
the images tested here are all greyscale images. The contribution of the angle
mean feature will be significant only when colour images are used.
For the next pooled features, the original feature proposed in [48] includes only the
first moment, so we extend it to the second and third moments according to the
following equation, for α ∈ {1, 2, 3}:

M_α = ∑_{k=0}^{K} k^α |H[k]| / ∑_{k=0}^{K} |H[k]|.   (8.8)
Increasing the moment to a higher order does not always improve the result sig-
nificantly, which has been justified well in [109]. Furthermore, in our experiments,
we found that it is sufficient to use only the first three orders.
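A sketch of Equation (8.8): the same DFT-based characteristic function as in Section 8.1.4, with the frequency index raised to the power α; α = 1 recovers the original centre-of-mass feature, and the toy PDF is illustrative.

```python
import cmath

def cf_moment(pdf, alpha):
    """Order-alpha absolute moment of the characteristic function (Eq. 8.8),
    using the first N/2 frequency bins of a direct DFT of the PDF."""
    n = len(pdf)
    H = [abs(sum(pdf[x] * cmath.exp(-2j * cmath.pi * k * x / n)
                 for x in range(n)))
         for k in range(n // 2)]
    return sum((k ** alpha) * h for k, h in enumerate(H)) / sum(H)

pdf = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]
moments = [cf_moment(pdf, a) for a in (1, 2, 3)]   # the three pooled features
```

Because k^α grows with α for k ≥ 1, the moments form a non-decreasing sequence, so the three orders capture progressively more of the high-frequency tail.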
8.3 Experimental Results
This section presents and analyses the experimental results. First we choose the
optimal classifier for our proposed blind steganalysis; then we compare the results
with those of some existing blind steganalysis techniques.
In the construction of the image database, 2037 images of four different sizes
(512×512, 608×608, 768×768 and 1024×1024) were downloaded from [47]. All
images were cropped to obtain the centre portion of the image and were converted
to greyscale images. F5, OutGuess and MB1 were selected as the steganographic
models for creating three different types of stego images. To have a percentage
wise equal number of changes over all images, we define the embedding rate in
term of bits per embeddable quantised DCT coefficient of the cover image for
each. We define the “embeddable coefficients” as the coefficients that can be used
to carry the message bits in each steganographic model. We used four embedding
rates (5, 25, 50 and 100 per cent), resulting in a mixture of 10,185 cover and stego
images in the database. We employ uniform distribution of random message bits
for the experiments and the prototype is implemented in Matlab R2008a.
8.3.1 Classifier Selection
We selected four types of classifiers: multivariate regression, Fisher linear
discriminant, support vector machine and neural network. Concise explanations
of these classifiers are available in Section 2.4.2. In this section, we compare the
different classifiers using the same proposed feature set; the purpose is to choose
the optimal combination of feature set and classifier. To test the flexibility and
consistency of this combination, the same three steganographic models were tested.
Figures 8.4, 8.5 and 8.6 compare the ROC curves for F5, OutGuess and MB1,
respectively. In each figure, the Y-axis represents the detection rate and X-axis is
the false alarm rate. Each axis ranges from zero to one. The value shown inside the
bracket is the AUR value, indicating the detection accuracy. NN, FLD, SVM and
MR stand for neural network, Fisher linear discriminant, support vector machine
and multivariate regression, respectively. In the comparison for all steganographic
models, classifications using the neural network as the classifier produced the
highest AUR values. This indicates that the combination of the proposed feature
set and the neural network provide optimal blind steganalysis. Thus, our blind
steganalysis is constructed by combining the proposed feature set with a neural
network classifier.
[Figure 8.4 shows ROC curves (detection rate versus false alarm rate); AUR values: NN 0.9359, FLD 0.87976, SVM 0.87766, MR 0.75908.]

Figure 8.4: Classifier comparison using the proposed features in detecting F5
[Figure 8.5 shows ROC curves; AUR values: NN 0.91068, FLD 0.86376, SVM 0.89634, MR 0.88408.]

Figure 8.5: Classifier comparison using the proposed features in detecting OutGuess
[Figure 8.6 shows ROC curves; AUR values: NN 0.76213, FLD 0.7232, SVM 0.72565, MR 0.64392.]

Figure 8.6: Classifier comparison using the proposed features in detecting MB1
8.3.2 Results Comparison
This section compares the performance of our proposed blind steganalysis to that
of selected existing blind steganalysis. From the constructed image database, 80
per cent of the images are used for training and the remaining 20 per cent are
used for testing. The same steganographic models (i.e., F5, OutGuess and MB1)
are used and the classification for each steganographic model is carried out separately.
The following blind steganalysis techniques are selected for the detection perfor-
mance comparison:
❐ Image quality metrics are combined with the multivariate regression classifier
[4] (IQM).
❐ Moment of wavelet decomposition is combined with the SVM classifier [73]
(Farid).
❐ Feature-based method is combined with the SVM classifier³ [85] (FB).
❐ Moments of characteristic function of the image PDF is combined with Fisher
linear discriminant classifier [48] (COM).
Figure 8.7 shows the ROC curves and AUR values for our proposed method and
other blind steganalysis techniques at an embedding rate of 25 per cent. From
the best ROC curve at the top left of the graph to the diagonal, the AUR values
are 0.9359 for our proposed method, followed by the FB method at 0.72827. The
Farid method, at 0.53736, is slightly better than the COM and IQM methods at
0.52292 and 0.51072, respectively. Our proposed method outperformed all other
³ Although the original paper [35] used Fisher linear discriminant as the classifier, their later paper [85] obtained an improvement by using SVM.
[Figure 8.7 shows ROC curves; AUR values: PF 0.9359, Farid 0.53736, FeatureBased 0.72827, IQM 0.51072, COM 0.52292.]

Figure 8.7: Comparison of steganalysis performance in detecting F5
blind steganalysis techniques in detecting F5.
From Figure 8.8, the steganalysis results in detecting OutGuess show that the FB
method is competitive with our proposed method: the difference in AUR values
is only 0.0243. However, our proposed method is better overall and especially at
lower false alarm rates. This property is desirable, because an optimal blind
steganalysis should classify correctly with a low false alarm rate. The other three
blind steganalysis techniques do not perform well at this low embedding rate; their
AUR values are centred around 0.51.
Although it is well known that, among the three steganographic methods (F5,
OutGuess and MB1), OutGuess is relatively easier to detect, we obtained a
relatively lower AUR value than for detecting F5. This is because we are using
bits per embeddable quantised DCT coefficient as the embedding rate, which
reduces the message length embedded by OutGuess. In other words, we are using
a shorter message length in our experiments, which makes the detection more
difficult for a steganalysis technique.
Figure 8.9 compares the detection performance in detecting MB1. Again, our
proposed method outperformed all other blind steganalysis techniques at an AUR
value of 0.76213. This exceeds the AUR values of 0.64868, 0.53785, 0.53025 and
0.50423 for FB, Farid, IQM and COM, respectively. All the AUR values in Figure
8.9 are low compared to the AUR values in both Figure 8.7 and 8.8. This finding
is consistent with the finding in [35, 65, 66], which indicated that MB1 is the
hardest to detect.
[Figure 8.8 shows ROC curves; AUR values: PF 0.91068, Farid 0.53844, FeatureBased 0.88637, IQM 0.50571, COM 0.50159.]

Figure 8.8: Comparison of steganalysis performance in detecting OutGuess
[Figure 8.9 shows ROC curves; AUR values: PF 0.76213, Farid 0.53785, FeatureBased 0.64868, IQM 0.53025, COM 0.50423.]

Figure 8.9: Comparison of steganalysis performance in detecting MB1
8.4 Conclusion
In this research, we proposed a feature-pooling method for building a blind
steganalysis feature set. We applied the SFFS technique to select the key
significant features from the feature-based method [35] and then combined them
with two additional feature sets: the image quality metrics [4] and the modified
first three moments of the characteristic function computed from the image PDF
[48]. Based on this pooled feature set, we employed a neural network classifier to
construct a blind JPEG image steganalysis. The experimental results showed that
our proposed blind steganalysis outperforms the other tested blind steganalysis
techniques.
Chapter 9
Improving JPEG Image
Steganalysis
Although the performance of blind steganalysis is often inferior to a targeted one,
its flexibility and wide coverage of different steganographic methods make it an
attractive and practical choice. This chapter focuses on blind steganalysis; specif-
ically we will propose a technique for improving some of the existing steganalysis
techniques. To do that we propose to minimise the image-to-image variations,
which increases the discriminative ability of a feature set. We will illustrate the
efficiency of the proposed method by incorporating it into several existing blind
JPEG image steganalysis techniques. The experimental results presented will ver-
ify the feasibility and applicability of the proposed technique for improving existing
techniques.
The remainder of this chapter is as follows. The next section models the
steganographic artefact as additive noise. The proposed method is discussed in
Sections 9.2 and 9.3. Section 9.4 presents the experimental results and Section 9.5
concludes the chapter.
9.1 Steganography as Additive Noise
Let X denote an instance of a JPEG cover image and let P_C(x) denote the
probability mass function of a cover image. In a JPEG image, the probability mass
function can be considered as the frequency count of the quantised DCT
coefficients.
The probability mass function of the secret message can be defined as the
distribution of the additive stego noise:

P_N(n) ≡ P(x′ − x = n),   (9.1)

where x and x′ are the quantised DCT coefficients before and after embedding,
respectively.
Generally, a cover image used in steganography can be divided into two parts, $x_c$
and $x_s$. Part $x_c$ is the unperturbed part and normally consists of a group of the
most significant bits; $x_s$ is the part that will be altered to carry the secret
message and normally contains a group of the less significant bits.
Since the additive stego noise is independent of the cover image, perturbing the $x_s$
part by embedding a secret message bit into it is equivalent to the convolution of
the additive stego noise probability mass function with the cover image probability
mass function. This can be expressed as follows:

$P_S(n) = P_N(n) \ast P_C(n)$,   (9.2)

where $\ast$ denotes convolution and $P_S(n)$ is the stego image probability mass function.
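Equation (9.2) can be evaluated directly. The numpy sketch below uses a hypothetical cover PMF and an assumed ±1-noise model, chosen purely for illustration rather than taken from the thesis experiments:

```python
import numpy as np

# Hypothetical cover PMF over quantised DCT values -4..4 (9 bins);
# the values below are illustrative only, not measured from real images.
p_cover = np.array([0.02, 0.05, 0.10, 0.20, 0.26, 0.20, 0.10, 0.05, 0.02])

# Assumed additive stego-noise PMF over {-1, 0, +1}: a coefficient is
# left unchanged with probability 0.5, or shifted by +/-1 otherwise.
p_noise = np.array([0.25, 0.50, 0.25])

# Equation (9.2): the stego PMF is the convolution of noise and cover PMFs.
p_stego = np.convolve(p_cover, p_noise)

# The result is still a valid PMF, but smoother than the cover PMF.
assert np.isclose(p_stego.sum(), 1.0)
```

The smoothing visible in `p_stego` is exactly the statistical footprint that the feature sets in the following sections try to detect.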
9.2 Image-to-Image Variation Minimisation
Defining a discriminative feature set in image steganalysis is a challenging task
because the defined feature set should be optimally sensitive to steganographic
alteration and not to image-to-image variations. Image-to-image variation is de-
fined as the difference between the underlying statistic of one image and that of
another. The underlying statistic can be the histogram distribution of the DCT
coefficients or the pixel intensities. For example, the images shown in Figure 9.1
are obviously different and, therefore, their underlying statistics (histogram dis-
tributions shown below each image) differ. This difference is the image-to-image
variation. In other words, the image-to-image variation is caused by the difference
of the image content.
The question of interest here is how we can categorise these images into either
cover or stego images. It is obvious that there is no consistency in differentiating
these two images as either a cover or stego image by just examining the histogram
distribution because the distribution is rather random and different for each image.
Figure 9.1: Two images with their respective underlying statistics
If we apply feature extraction directly to the histogram distribution, then the
extracted feature will have poor discriminative capability because the image-to-
image variation is large.
Ideally, the cover image is presented together with the stego image during ste-
ganalysis detection. We could subtract the stego image S from the cover image C
directly as follows:
N = S − C. (9.3)
The result of the subtraction, N , is the corresponding stego noise, and the
image-to-image variation is minimal. Note that the subtraction is pixel-wise.
However, this case is not typical; most of the time we have only one version of
the image—the cover or the stego image.
As a result, it is reasonable to estimate the cover image from the stego image, so
that we can minimise the image-to-image variation. To demonstrate the efficiency
of our proposed technique, we will apply it to existing steganalysis techniques.
Thus, we propose two different techniques for optimum performance of the respec-
tive existing steganalysis techniques. For the first technique, given two versions
of an image, we will first extract the feature set for each image and compute the
difference between them. For the second technique, we will compute the pixel-wise
difference between the two images and follow it with feature extraction.
Figure 9.2: Transformed image by scaling (left) and cropping (right)
The two proposed techniques are defined as follows:

$\Psi_1 = \Phi(\upsilon) - \Phi(\hat{\upsilon})$,   (9.4)

$\Psi_2 = \Phi(\eta)$, where $\eta = \upsilon - \hat{\upsilon}$,   (9.5)

where $\upsilon$ and $\hat{\upsilon}$ are the given image (possibly a cover or a stego image) and the
estimated cover image, respectively. The variable $\eta$ is the additive stego noise
generated by the embedding operation and $\Psi_i$, $i = 1, 2$, is the feature set produced
by the feature-extraction technique $\Phi(\cdot)$. If $\upsilon$ is a cover image, then $\Psi_i \approx 0$;
if $\upsilon$ is a stego image, then the absolute value of $\Psi_i$ is always greater than zero.
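The two techniques can be sketched as follows. The feature extractor here is a toy grey-level histogram standing in for the real feature sets of Sections 9.3.1 to 9.3.3, and the function names `phi`, `psi1` and `psi2` are illustrative only:

```python
import numpy as np

def phi(img):
    """Toy stand-in for the feature extractor Phi: a grey-level histogram."""
    return np.bincount(img.ravel(), minlength=256).astype(float)

def psi1(img, est_cover):
    """Equation (9.4): difference of the two feature vectors."""
    return phi(img) - phi(est_cover)

def psi2(img, est_cover):
    """Equation (9.5): features of the pixel-wise difference (stego noise)."""
    eta = img.astype(int) - est_cover.astype(int)      # eta = v - v_hat
    return np.bincount((eta + 255).ravel(), minlength=511).astype(float)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8)).astype(np.uint8)

# With a perfect cover estimate, a cover input yields Psi_1 ~ 0.
assert np.allclose(psi1(cover, cover), 0.0)
```

In practice the estimate $\hat{\upsilon}$ is never perfect, so $\Psi_i$ is small rather than exactly zero for cover images; the classifier separates "small" from "clearly non-zero".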
In cover image estimation, we first decompress the JPEG images to the spatial
domain and apply a transformation to the decompressed images. In our experi-
ments, we employed scaling by bilinear interpolation (shown in the left of Figure
9.2) and cropping four pixels in both the horizontal and vertical directions (shown
in the right of Figure 9.2). We then recompress the transformed image back to
the JPEG domain. In the decompression and recompression processes, we used
the same JPEG image quality as before the transformation to avoid double com-
pression. Since steganography can be modelled as additive noise, the effect of the
transformation can be attributed as noise neutralisation. Hence, the cover image
estimation is reasonable. Similar estimation approaches have proven efficient and
can be found in [35, 61].
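A minimal numpy-only sketch of the cropping transformation follows; the actual pipeline also decompresses the JPEG, crops, and recompresses at the original quality factor (for example with a JPEG codec library), which this sketch omits:

```python
import numpy as np

def crop_estimate(pixels, offset=4):
    """Crop `offset` pixels in the horizontal and vertical directions.
    After recompression at the original quality, the 8x8 JPEG block grid
    of the cropped image no longer lines up with the original grid,
    which neutralises the additive embedding noise."""
    return pixels[offset:, offset:]

img = (np.arange(64 * 64).reshape(64, 64) % 256).astype(np.uint8)
est = crop_estimate(img)
assert est.shape == (60, 60)
```

The offset of four pixels is chosen so that each new 8×8 block straddles four blocks of the original grid, which is what breaks the alignment between the stego noise and the recompressed coefficients.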
9.3 Steganalysis Improvement
We will select three existing steganalysis techniques to demonstrate the efficiency
of the proposed technique. In the following subsections, we discuss the incorpo-
ration of the proposed technique in each of the selected steganalysis techniques
separately.
9.3.1 Moments of Wavelet Decomposition
Lyu and Farid [73] proposed using higher-order statistics as features—mean, vari-
ance, skewness and kurtosis. Two sets of these higher-order statistics are obtained,
resulting in a 72-dimensional feature vector.
The first set is acquired from wavelet decomposition based on separable quadrature
mirror filters. A total of nine subbands are obtained and the mean, variance,
skewness and kurtosis are computed for each subband. These 36 higher-order
statistics (nine subbands × four statistics), $\Psi_{w_k}$ for $k = 1, 2, \ldots, 36$,
will be used as the first feature set.
The second set of features is obtained from the log error in the linear predictor for
each of the same nine subbands. The four higher-order statistics (mean, variance,
skewness and kurtosis) are computed for each log error, resulting in another
36-dimensional feature set, $\Psi_{e_k}$ for $k = 1, 2, \ldots, 36$.
Instead of using the 72-dimensional features extracted directly from the image in
the classification, we use the proposed technique discussed in Section 9.2. Specif-
ically, we improve the feature discrimination capability by employing the second
proposed technique, defined in Equation (9.5). Thus, our improved feature set is
defined in the following equation:
$\eta = \upsilon - \hat{\upsilon}$,
$\Psi_{w_k} = \Phi_{w_k}(\eta)$,
$\Psi_{e_k} = \Phi_{e_k}(\eta)$,
$\Psi = \Psi_{w_k} + \Psi_{e_k}$.   (9.6)
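The subband statistics in Equation (9.6) can be illustrated with a one-level Haar decomposition standing in for the separable quadrature mirror filters of [73]; the Haar filters are an assumption made here purely for brevity, not the filters of the original method:

```python
import numpy as np

def haar_subbands(x):
    """One-level 2-D Haar decomposition into LL, HL, LH and HH subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 4, (a - b + c - d) / 4, \
           (a + b - c - d) / 4, (a - b - c + d) / 4

def four_stats(s):
    """Mean, variance, skewness and kurtosis of one subband."""
    s = s.ravel()
    mu, var = s.mean(), s.var()
    z = (s - mu) / (np.sqrt(var) if var > 0 else 1.0)
    return np.array([mu, var, (z ** 3).mean(), (z ** 4).mean()])

rng = np.random.default_rng(1)
eta = rng.normal(size=(64, 64))                # stand-in for the stego noise
ll, hl, lh, hh = haar_subbands(eta)
features = np.concatenate([four_stats(b) for b in (hl, lh, hh)])
assert features.shape == (12,)                 # 3 subbands x 4 statistics
```

Applying the same four statistics over three decomposition levels, plus the linear-predictor log errors, yields the full 72-dimensional vector.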
9.3.2 Moment of CF of PDF
The use of characteristic functions (CF) was pioneered by Harmsen and Pearlman
in [48]. The CF is obtained by applying a discrete Fourier transform to the PDF
of an image. From this characteristic function, the first-order absolute moment is
computed and used as the feature. Equation (9.7) shows the calculation of this
moment.
$M = \dfrac{\sum_{k=0}^{K} k\,|H[k]|}{\sum_{k=0}^{K} |H[k]|}$,   (9.7)

where $H[\cdot]$ is the characteristic function, $K = N/2 - 1$ and $N$ is the width
of the domain of the PDF.
The feature set proposed in [48] contains only one feature—the first moment. We
extended it to the second and third moments according to the following equation:

$M_\alpha = \dfrac{\sum_{k=0}^{K} k^{\alpha}\,|H[k]|}{\sum_{k=0}^{K} |H[k]|}$,   (9.8)

where $\alpha \in \{1, 2, 3\}$. Increasing the moment to a higher order does not always
improve the result significantly and is therefore not necessary [109].
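A numpy sketch of Equation (9.8) follows, applied to a hypothetical PMF (the values are illustrative, not from the thesis experiments); the characteristic function is the DFT of the PMF, and only the first half of the spectrum enters the sums:

```python
import numpy as np

def cf_moments(pmf, orders=(1, 2, 3)):
    """Moments of the characteristic function, as in Equation (9.8).
    The CF is the DFT of the PMF; only the first half of the spectrum
    (k = 0, ..., N/2 - 1) enters the sums."""
    H = np.abs(np.fft.fft(pmf))[: len(pmf) // 2]
    k = np.arange(len(H))
    return np.array([(k ** a * H).sum() / H.sum() for a in orders])

# Hypothetical PMF of quantised DCT coefficients (illustrative values).
pmf = np.array([0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05, 0.00])
m = cf_moments(pmf)
assert m.shape == (3,) and np.all(m >= 0)
```

Because embedding smooths the PMF, it attenuates the high-frequency part of $|H[k]|$, which lowers these moments; the feature difference in Equation (9.9) isolates that shift.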
By incorporating the proposed technique defined in Equation (9.4), we obtained
a new set of features, defined as follows:

$\Psi = \Phi(\upsilon) - \Phi(\hat{\upsilon}) = M_\alpha - \hat{M}_\alpha$,   (9.9)

where $\hat{M}_\alpha$ is the moment computed from the estimated cover image.
9.3.3 Moment of CF of Wavelet Subbands
The feature set proposed in [115] is based on a Haar wavelet decomposition.
The authors decomposed the image into 12 subbands, denoted $LL_i$, $HL_i$, $LH_i$
and $HH_i$ for $i = 1, 2, 3$. The histogram of the given image, denoted $LL_0$,
is also employed.
Essentially, the probability mass function can be considered as the distribution for
the wavelet subbands and the image histogram. Motivated by the characteristic
function from [48], the authors constructed characteristic functions from all the
wavelet subbands and the image histogram, resulting in 13 characteristic functions.
After that, the first three moments for each of the characteristic functions can be
computed as in the following equation:

$M_\alpha = \dfrac{\sum_{k=0}^{N/2} f_k^{\alpha}\,|H(f_k)|}{\sum_{k=0}^{N/2} |H(f_k)|}$,   (9.10)

where $\alpha = 1, 2, 3$ indexes the three moments, $f_k$ is the frequency at which the
characteristic function $H(\cdot)$ is evaluated, and $N$ is the width of the domain of
the probability mass function.
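The resulting 39-dimensional feature vector (three moments for each of the 13 characteristic functions) can be assembled as in the following sketch, where the 13 histograms are random stand-ins for $LL_0$ and the 12 wavelet-subband histograms rather than outputs of a real decomposition:

```python
import numpy as np

def cf_three_moments(hist):
    """First three CF moments of one histogram, as in Equation (9.10)."""
    H = np.abs(np.fft.fft(hist))[: len(hist) // 2]
    f = np.arange(len(H))
    return [(f ** a * H).sum() / H.sum() for a in (1, 2, 3)]

rng = np.random.default_rng(2)
# Thirteen random stand-ins for LL0 and the 12 wavelet-subband histograms.
hists = [np.histogram(rng.normal(size=1024), bins=32)[0].astype(float) + 1e-9
         for _ in range(13)]
features = np.concatenate([cf_three_moments(h) for h in hists])
assert features.shape == (39,)                 # 13 CFs x 3 moments
```

The same 39 moments are computed for the estimated cover image, and the element-wise difference of Equation (9.11) forms the improved feature set.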
Next, we improve this method by incorporating the proposed technique, as defined
in Equation (9.4). The improved feature set can be defined as follows:

$\Psi = \Phi(\upsilon) - \Phi(\hat{\upsilon}) = M_j^{\alpha} - \hat{M}_j^{\alpha}$,   (9.11)

where $\hat{M}_j^{\alpha}$ is the moment computed from the estimated cover image and
$j = 1, 2, \ldots, 13$ indexes the 13 characteristic functions.
9.4 Experimental Results
9.4.1 Experimental Setup
Since we are interested in comparing feature discrimination performance, we stan-
dardise the classification by using a SVM [9] as the classifier in all experiments.
Three different steganographic methods, F5 [110], OutGuess [90] and MB1 [96],
are employed to create three different types of stego images. To achieve a
percentage-wise equal number of changes over all images, we define the embedding
rate in terms of bits per embeddable quantised DCT coefficient for each
steganographic method. We use four embedding rates: 5, 25, 50 and 100 per cent.
In the construction of the image database, 2037 images of four different sizes (512
× 512, 608 × 608, 768 × 768 and 1024 × 1024) were downloaded from [47]. All
Table 9.1: Performance comparison between the proposed technique and the Farid technique

Rate    Steganalysis   F5        OutGuess   MB1
5%      Improved       0.5160    0.5120     0.5165
        Original       0.5077    0.4625     0.5064
25%     Improved       0.5850    0.6460     0.6304
        Original       0.5374    0.5384     0.5379
50%     Improved       0.7236    0.7983     0.7564
        Original       0.5940    0.6606     0.5754
100%    Improved       0.90437   0.8811     0.8815
        Original       0.7218    0.7650     0.6485
images were cropped to obtain the centre portion and then converted to greyscale
images. From the constructed database, 80 per cent is used for training and the
remaining 20 per cent is used for testing. The prototype implementation is coded
in Matlab R2008a.
9.4.2 Results Comparison
We compare the improved version, which uses our proposed method, to the original
version for each of the three steganalysis techniques, as discussed in Section 9.3.
The detection results were evaluated using the area under the ROC curve (AUR).
A higher AUR value indicates better steganalysis performance. The obtained
results are tabulated in Tables 9.1, 9.2 and 9.3. We abbreviate the original version
of steganalysis techniques discussed in Subsections 9.3.1, 9.3.2 and 9.3.3 as Farid,
COM and MW, respectively.
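For reference, AUR values such as those reported in Tables 9.1 to 9.3 can be computed from classifier output scores without tracing an explicit ROC curve, via the rank-sum (Mann-Whitney) identity; a minimal numpy sketch (not the thesis implementation):

```python
import numpy as np

def aur(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the fraction of (stego, cover) pairs the detector ranks correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A perfect detector scores every stego image above every cover image.
assert aur([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]) == 1.0
```

A value of 0.5 corresponds to random guessing, which is why the 5 per cent rows in the tables hover near that level.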
Table 9.1 shows the comparison between the improved and original versions of the
Farid technique. Every AUR value of the improved version is larger than that of
the corresponding original version, indicating that the improved version
outperformed the original for all the steganographic models and all four embedded
message lengths.
Table 9.2 compares the improved version and the original version of the steganaly-
sis technique in [48]. Although the improvement is not as large as the improvement
obtained in Table 9.1, the overall performance has been improved.
Table 9.2: Performance comparison between the proposed technique and the COM technique

Rate    Steganalysis   F5       OutGuess   MB1
5%      Improved       0.5055   0.4838     0.5040
        Original       0.5022   0.4631     0.5020
25%     Improved       0.5279   0.5261     0.5080
        Original       0.5097   0.4794     0.5029
50%     Improved       0.5654   0.5408     0.5147
        Original       0.5314   0.4799     0.5030
100%    Improved       0.6392   0.5686     0.5228
        Original       0.5971   0.4817     0.5052
Table 9.3: Performance comparison between the proposed technique and the MW technique

Rate    Steganalysis   F5       OutGuess   MB1
5%      Improved       0.5016   0.5087     0.5105
        Original       0.5009   0.4793     0.5042
25%     Improved       0.5514   0.5385     0.5116
        Original       0.5202   0.4927     0.5043
50%     Improved       0.7601   0.5635     0.5648
        Original       0.5625   0.5064     0.5286
100%    Improved       0.8518   0.6213     0.5747
        Original       0.6667   0.5307     0.5520
As for the third improved steganalysis technique, Table 9.3 clearly shows sig-
nificant improvement in detecting all the steganographic models—the detection
performance for the F5 steganographic model appears the most improved. This
verifies the effectiveness of the proposed technique.
9.5 Conclusion
In conclusion, our proposed technique has improved the three selected steganalysis
techniques by minimising image-to-image variations. To minimise the image-to-
image variation, we estimate the cover image from the stego image and then
compute the difference between the two. Finally, we extract the feature set from
this difference. The experimental results demonstrate the effectiveness of the
proposed technique.
Chapter 10
Conclusions and Future Research
Directions
10.1 Summary
In this thesis, we investigated steganalysis techniques that extract information
related to a secret message hidden in a multimedia document. In particular, we
focused our analysis on steganographic methods that use binary images as the
medium for a secret message. We organised our work according to the amount of
information extracted about the hidden message (i.e., following the structure laid
out in Section 3.2).
The work presented in this thesis is summarised below.
1. Blind steganalysis. We studied and analysed the characteristics of images
produced by three different steganographic methods. Based on this analysis,
we developed an effective feature extraction technique
to extract a set of sensitive and discriminating features. Using a SVM as
the classifier, we constructed a blind steganalysis to detect the presence of
secret messages embedded in the binary images.
2. Multi-class steganalysis. To the best of our knowledge, no multi-class ste-
ganalysis was proposed for binary images at the time we published our multi-
class steganalysis in [19]. Besides being able to detect the presence of a secret
message in the binary image, this analysis reveals the type of steganographic
method used to produce the stego image. This information is crucial and
serves as an additional secret parameter that can narrow the scope of analysis.
Thus, our multi-class steganalysis can be considered to extend blind
steganalysis to a more involved level of analysis.
3. Message length estimation. Information such as the length of an embedded
message is important. In this thesis, we proposed a technique for estimating
the length of a message embedded in a binary image. Specifically, in this
work, our technique attacks the steganographic method developed by Liang
et al. in [69]. This type of analysis is normally considered targeted steganal-
ysis, which plays an important role at other levels of analysis (i.e., retrieval
of the stegokey).
4. Steganographic payload location identification. In general, the only evidence
needed to break a steganographic scheme is verification that a secret message
exists in the image. However, this does not provide enough information for
us to locate the secret message. We developed a technique for identifying
the steganographic payload locations, based on multiple stego images. Our
technique can reveal which pixels in the binary image have been used to
carry the secret message bits.
Finally, we revisited some of the existing blind steganalysis techniques for analysing
JPEG images. We combined several types of features and applied a feature selec-
tion technique for the analysis, which not only improves the detection accuracy,
but also reduces the computational resources. We showed that an enhancement
can be obtained by minimising the influence of image content. In other words, we
increased the feature sensitivity with respect to the differences caused by stegano-
graphic artefacts, rather than the image content.
Even though this thesis is formulated as an attack on binary image steganography,
we hope that it will contribute to the design of a more secure steganographic
method. More precisely, the analysis presented in this thesis can be used to
evaluate and measure the security level of a steganographic method, instead of
using conventional measurements, such as PSNR.
10.2 Future Research Directions
Modern blind steganalysis techniques are not universal in the sense that their ef-
fectiveness depends very much on both the type of cover images and the stegano-
graphic methods used. For example, effective blind steganalysis for JPEG image
steganography will not be as effective when applied to a spatial domain image.
Hence, future work should focus on constructing a real universal steganalysis tech-
nique.
For multi-class steganalysis, the performance drops when the number of different
steganographic methods used to train the classifier increases. This happens be-
cause the feature set may not be optimal. Another reason is that a similarity in
the embedding algorithms might exist across different steganographic methods,
making them difficult to identify and differentiate. Thus, a more effective and dis-
criminating feature set should be developed. In addition, a better strategy should
be found and employed for constructing the multi-class steganalysis.
The identification of payload locations in this thesis has been simplified to rep-
resent a generic case. This can be seen in our experiments where a generic LSB
replacement steganography is used. A more challenging environment can be set up
to include other, more complicated steganographic schemes, such as steganography
with adaptive embedding functionality.
Even though our steganographic payload identification technique can identify the
locations with high accuracy, unfortunately, we cannot retrieve meaningful content
of the secret message because what we have is a randomly scattered collection of
message bits. We need to re-order them into the correct sequence to extract the
message. Obviously, a more complete analysis, involving correct sequence
extraction, should be undertaken. To gain further insight into this problem, we
should examine the retrieval of the stegokey. Unfortunately, this area has been
scarcely studied, except for the material published in [41, 42].
Bibliography
[1] B. Anckaert, B. D. Sutter, D. Chanet, and K. D. Bosschere. Steganography for Executables and Code Transformation Signatures. 7th International Conference on Information Security and Cryptology, 3506:425–439, 2004.
[2] R. J. Anderson. Stretching the Limits of Steganography. 1st International Workshop on Information Hiding, 1174:39–48, 1996.
[3] R. J. Anderson and F. A. P. Petitcolas. On the limits of steganography. IEEE Journal of Selected Areas in Communications, 16(4):474–481, 1998.
[4] I. Avcibas, M. Nasir, and B. Sankur. Steganalysis based on image quality metrics. IEEE 4th Workshop on Multimedia Signal Processing, pages 517–522, 2001.
[5] S. Badura and S. Rymaszewski. Transform domain steganography in DVD video and audio content. IEEE International Workshop on Imaging Systems and Techniques, pages 1–5, 2007.
[6] J. D. Ballard, J. G. Hornik, and D. Mckenzie. Technological Facilitation of Terrorism: Definitional, Legal, and Policy Issues. American Behavioral Scientist, 45(6):989–1016, 2002.
[7] R. Bohme and A. Westfeld. Breaking cauchy model-based JPEG steganography with first order statistics. 9th European Symposium on Research in Computer Security, 3193:125–140, 2004.
[8] C. Cachin. An Information-Theoretic Model for Steganography. 2nd International Workshop on Information Hiding, 1525:306–318, 1998.
[9] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm, 2001.
[10] C.-C. Chang, C.-S. Tseng, and C.-C. Lin. Hiding data in binary images. 1st International Conference on Information Security Practice and Experience, 3439:338–349, 2005.
[11] S. Chatterjee and A. S. Hadi. Regression Analysis by Example. John Wiley and Sons, 4th edition, 2006.
[12] A. Cheddad, J. Condell, K. Curran, and P. McKevitt. Digital image steganography: Survey and analysis of current methods. Signal Processing, 90(3):727–752, 2010.
[13] C. Chen and Y. Q. Shi. JPEG Image Steganalysis Utilizing both Intrablock and Interblock Correlations. IEEE International Symposium on Circuits and Systems, pages 3029–3032, 2008.
[14] C. Chen, Y. Q. Shi, W. Chen, and G. Xuan. Statistical Moments Based Universal Steganalysis using JPEG 2-D Array and 2-D Characteristic Function. IEEE International Conference on Image Processing, pages 105–108, 2006.
[15] X. Chen, Y. Wang, T. Tan, and L. Guo. Blind Image Steganalysis Based on Statistical Analysis of Empirical Matrix. International Conference on Pattern Recognition, 3:1107–1110, 2006.
[16] Z. Chen, S. Haykin, J. J. Eggermont, and S. Becker. Correlative Learning: A Basis for Brain and Adaptive Systems. John Wiley and Sons, 2007.
[17] K. L. Chiew and J. Pieprzyk. Features-Pooling Blind JPEG Image Steganalysis. IEEE Conference on Digital Image Computing: Techniques and Applications, pages 96–103, 2008.
[18] K. L. Chiew and J. Pieprzyk. JPEG Image Steganalysis Improvement Via Image-to-Image Variation Minimization. International IEEE Conference on Advanced Computer Theory and Engineering, pages 223–227, 2008.
[19] K. L. Chiew and J. Pieprzyk. Binary Image Steganographic Techniques Classification Based on Multi-Class Steganalysis. 6th International Conference on Information Security, Practice and Experience, 6047:341–358, 2010.
[20] K. L. Chiew and J. Pieprzyk. Blind steganalysis: A countermeasure for binary image steganography. International Conference on Availability, Reliability and Security, pages 653–658, 2010.
[21] K. L. Chiew and J. Pieprzyk. Estimating Hidden Message Length in Binary Image Embedded by Using Boundary Pixels Steganography. International Conference on Availability, Reliability and Security, pages 683–688, 2010.
[22] K. L. Chiew and J. Pieprzyk. Identifying Steganographic Payload Location in Binary Image. 11th Pacific Rim Conference on Multimedia – Advances in Multimedia Information Processing, 6297:590–600, 2010.
[23] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[24] I. Cox, J. Kilian, F. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12):1673–1687, 1997.
[25] I. J. Cox, M. L. Miller, J. A. Bloom, J. Fridrich, and T. Kalker. Digital watermarking and steganography. The Morgan Kaufmann series in multimedia information and systems. Morgan Kaufmann Publishers, 2nd edition, 2008.
[26] N. Cvejic and T. Seppanen. Increasing robustness of LSB audio steganography by reduced distortion LSB coding. Journal of Universal Computer Science, 11(1):56–65, 2005.
[27] I. Davidson and G. Paul. Locating Secret Messages in Images. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 545–550, 2004.
[28] J. Davis, J. MacLean, and D. Dampier. Methods of information hiding and detection in file systems. 5th IEEE International Workshop on Systematic Approaches to Digital Forensic Engineering, pages 66–69, 2010.
[29] A. Delforouzi and M. Pooyan. Adaptive Digital Audio Steganography Based on Integer Wavelet Transform. 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pages 283–286, 2007.
[30] J. Dong and T. Tan. Blind image steganalysis based on run-length histogram analysis. 15th IEEE International Conference on Image Processing, pages 2064–2067, 2008.
[31] J. Dong, W. Wang, and T. Tan. Multi-class blind steganalysis based on image run-length analysis. 8th International Workshop on Digital Watermarking, 5703:199–210, 2009.
[32] H. Farid. Detecting Steganographic Messages in Digital Images. TR2001-412, Department of Computer Science, Dartmouth College, 2001.
[33] H. Farid. Detecting Hidden Messages Using Higher-Order Statistical Models. International Conference on Image Processing, 2:905–908, 2002.
[34] R. Fisher, S. Perkins, A. Walker, and E. Wolfart. Hypermedia Image Processing Reference. Available at http://homepages.inf.ed.ac.uk/rbf/HIPR2/spatdom.htm.
[35] J. Fridrich. Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes. 6th International Workshop on Information Hiding, 3200:67–81, 2004.
[36] J. Fridrich and M. Goljan. Digital image steganography using stochastic modulation. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents V, 5020(1):191–202, 2003.
[37] J. Fridrich and M. Goljan. On estimation of secret message length in LSB steganography in spatial domain. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:23–34, 2004.
[38] J. Fridrich, M. Goljan, and D. Hogea. Attacking the OutGuess. Proceedings of ACM: Special Session on Multimedia Security and Watermarking, 2002.
[39] J. Fridrich, M. Goljan, and D. Hogea. Steganalysis of JPEG Images: Breaking the F5 Algorithm. 5th International Workshop on Information Hiding, 2578:310–323, 2003.
[40] J. Fridrich, M. Goljan, D. Hogea, and D. Soukal. Quantitative steganalysis of digital images: estimating the secret message length. Multimedia Systems, 9(3):288–302, 2003.
[41] J. Fridrich, M. Goljan, and D. Soukal. Searching for the Stego-Key. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VI, 5306:70–82, 2004.
[42] J. Fridrich, M. Goljan, D. Soukal, and T. Holotyak. Forensic Steganalysis: Determining the Stego Key in Spatial Domain Steganography. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VII, 5681:631–642, 2005.
[43] D. Fu, Y. Q. Shi, and D. Zou. JPEG Steganalysis Using Empirical Transition Matrix in Block DCT Domain. International Workshop on Multimedia Signal Processing, 2006.
[44] M. Goljan, J. Fridrich, and T. Holotyak. New blind steganalysis and its implications. Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072, 2006.
[45] G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.
[46] D. Gong, F. Liu, B. Lu, P. Wang, and L. Ding. Hiding information in Java class file. International Symposium on Computer Science and Computational Technology, 2:160–164, 2008.
[47] P. Greenspun. Philip Greenspun. Available at http://philip.greenspun.com.
[48] J. Harmsen and W. Pearlman. Steganalysis of additive-noise modelable information hiding. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents V, 5020:131–142, 2003.
[49] J. He and J. Huang. Steganalysis of stochastic modulation steganography. Science in China Series F: Information Sciences, 49(3):273–285, 2006.
[50] J. He, J. Huang, and G. Qiu. A New Approach to Estimating Hidden Message Length in Stochastic Modulation Steganography. 4th International Workshop on Digital Watermarking, 3710:1–14, 2005.
[51] S. Hetzl and P. Mutzel. A Graph-Theoretic Approach to Steganography. 9th IFIP TC-6 TC-11 International Conference on Communications and Multimedia Security, 3677:119–128, 2005.
[52] M. Hogan. Security and Robustness Analysis of Data Hiding Techniques for Steganography. PhD thesis, University College Dublin, 2008.
[53] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13:415–425, 2002.
[54] F. Huang and J. Huang. Calibration based universal JPEG steganalysis. Science in China Series F: Information Sciences, 52(2):260–268, 2009.
[55] A. Jain and D. Zongker. Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), 1997.
[56] M. Jiang, N. Memon, E. Wong, and X. Wu. Quantitative steganalysis of binary images. IEEE International Conference on Image Processing, pages 29–32, 2004.
[57] M. Jiang, X. Wu, E. Wong, and A. Memon. Steganalysis of boundary-based steganography using autoregressive model of digital boundaries. IEEE International Conference on Multimedia and Expo, 2:883–886, 2004.
[58] L. Jodar, A. G. Law, A. Rezazadeh, J. H. Watson, and G. Wu. Computations for the Moore-Penrose and other generalized inverses. Congressus Numerantium, pages 57–64, 1991.
[59] N. F. Johnson and S. Jajodia. Exploring steganography: Seeing the unseen. Computer, 31(2):26–34, 1998.
[60] J. Kelley. Terror groups hide behind Web encryption. USA Today. Available at http://www.usatoday.com/tech/news/2001-02-05-binladen.htm, 2 May 2001.
[61] A. Ker. Steganalysis of LSB Matching in Grayscale Images. IEEE Signal Processing Letters, 12(6):441–444, 2005.
[62] A. D. Ker. Locating Steganographic Payload via WS Residuals. 10th ACM Workshop on Multimedia and Security, pages 27–32, 2008.
[63] A. D. Ker and R. Bohme. Revisiting weighted stego-image steganalysis. Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, 6819, 2008.
[64] A. D. Ker and I. Lubenko. Feature Reduction and Payload Location with WAM Steganalysis. Media Forensics and Security, 7254, 2009.
[65] M. Kharrazi, H. Sencar, and N. Memon. Benchmarking steganographic and steganalysis techniques. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VII, 5681:252–263, 2005.
[66] M. Kharrazi, H. Sencar, and N. Memon. Improving steganalysis by fusion techniques: a case study with image steganography. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072:51–58, 2006.
[67] C. Kim. Data hiding based on compressed dithering images. Advances in Intelligent Information and Database Systems, 283:89–98, 2010.
[68] X.-W. Kong, W.-F. Liu, and X.-G. You. Secret Message Location Steganalysis Based on Local Coherences of Hue. 6th Pacific-Rim Conference on Multimedia, 3768:301–311, 2005.
[69] G.-l. Liang, S.-z. Wang, and X.-p. Zhang. Steganography in binary image by checking data-carrying eligibility of boundary pixels. Journal of Shanghai University (English Edition), 11(3):272–277, 2007.
[70] Z. Liu, L. Ping, J. Chen, J. Wang, and X. Pan. Steganalysis based on differential statistics. 5th International Conference on Cryptology and Network Security, 4301:224–240, 2006.
[71] D.-C. Lou, C.-L. Liu, and C.-L. Lin. Message estimation for universal steganalysis using multi-classification support vector machine. Computer Standards & Interfaces, 31(2):420–427, 2009.
[72] J. Lukas and J. Fridrich. Estimation of primary quantization matrix in double compressed JPEG images. Proceedings on Digital Forensic Research Workshop, pages 5–8, 2003.
[73] S. Lyu and H. Farid. Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines. 5th International Workshop on Information Hiding, 2002.
[74] S. Lyu and H. Farid. Steganalysis using color wavelet statistics and one-class support vector machines. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:35–45, 2004.
[75] S. Lyu and H. Farid. Steganalysis Using Higher-Order Image Statistics. IEEE Transactions on Information Forensics and Security, 1(1):111–119, 2006.
[76] L. Marvel, C. G. Boncelet, Jr, and C. T. Retter. Spread Spectrum Image Steganography. IEEE Transactions on Image Processing, 8(8):1075–1083, 1999.
[77] Y.-y. Meng, B.-j. Gao, Q. Yuan, F.-g. Yu, and C.-f. Wang. A novel steganalysis of data hiding in binary text images. 11th IEEE Singapore International Conference on Communication Systems, pages 347–351, 2008.
[78] T. Morkel, J. H. P. Eloff, and M. S. Olivier. Using image steganography for decryptor distribution. OTM Confederated International Workshops, 4277:322–330, 2006.
[79] B. Morrison. Ex-USA Today reporter faked major stories. USA Today. Available at http://www.usatoday.com/news/2004-03-18-2004-03-18 kelleymain x.htm, 19 March 2004.
[81] H. Noda, T. Furuta, M. Niimi, and E. Kawaguchi. Video steganography based on bit-plane decomposition of wavelet-transformed video. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306(1):345–353, 2004.

[82] H.-K. Pan, Y.-Y. Chen, and Y.-C. Tseng. A secure data hiding scheme for two-color images. 5th IEEE Symposium on Computers and Communications, pages 750–755, 2000.

[83] H. Pang, K.-L. Tan, and X. Zhou. Steganographic schemes for file system and b-tree. IEEE Transactions on Knowledge and Data Engineering, 16:701–713, 2004.

[84] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn. Information hiding—a survey. Proceedings of the IEEE, 87(7):1062–1078, 1999.

[85] T. Pevny and J. Fridrich. Towards Multi-class Blind Steganalyzer for JPEG Images. International Workshop on Digital Watermarking, LNCS, 3710:39–53, 2005.

[86] T. Pevny and J. Fridrich. Determining the Stego Algorithm for JPEG Images. Special Issue of IEE Proceedings - Information Security, 153(3):75–139, 2006.

[87] T. Pevny and J. Fridrich. Multi-class blind steganalysis for JPEG images. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VIII, 6072(1):257–269, 2006.

[88] T. Pevny and J. Fridrich. Merging Markov and DCT features for multi-class JPEG steganalysis. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents IX, 6505(1):1–13, 2007.

[89] T. Pevny and J. Fridrich. Multi-Class Detector of Current Steganographic Methods for JPEG Format. IEEE Transactions on Information Forensics and Security, 3(4):635–650, 2008.

[90] N. Provos. Defending Against Statistical Steganalysis. Proceedings of the 10th USENIX Security Symposium, 10:323–335, 2001.

[91] N. Provos and P. Honeyman. Detecting Steganographic Content on the Internet. Proceedings of the Network and Distributed System Security Symposium, 2002.

[92] N. Provos and P. Honeyman. Hide and Seek: An Introduction to Steganography. IEEE Security & Privacy, 1(3):32–44, 2003.

[93] P. Pudil, J. Novovicova, and J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15(11):1119–1125, 1994.

[94] N. B. Puhan, A. T. S. Ho, and F. Sattar. High capacity data hiding in binary document images. 8th International Workshop on Digital Watermarking, 5703:149–161, 2009.
[95] B. Rodriguez and G. L. Peterson. Detecting steganography using multi-class classification. IFIP International Conference on Digital Forensics, 242:193–204, 2007.

[96] P. Sallee. Model-based steganography. 2nd International Workshop on Digital Watermarking, 2939:154–167, 2003.

[97] A. Savoldi and P. Gubian. Blind multi-class steganalysis system using wavelet statistics. 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2:93–96, 2007.

[98] G. Schaefer and M. Stich. UCID - An Uncompressed Colour Image Database. Proc. SPIE, Storage and Retrieval Methods and Applications for Multimedia, pages 472–480, 2004.

[99] Y. Q. Shi, C. Chen, and W. Chen. A Markov Process Based Approach to Effective Attacking JPEG Steganography. 8th International Workshop on Information Hiding, 4437:249–264, 2006.

[100] Y. Q. Shi, G. Xuan, C. Yang, J. Gao, Z. Zhang, P. Chai, D. Zou, C. Chen, and W. Chen. Effective Steganalysis Based on Statistical Moments of Wavelet Characteristic Function. International Conference on Information Technology: Coding and Computing, pages 768–773, 2005.

[101] Y. Q. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, and C. Chen. Image Steganalysis Based on Moments of Characteristic Functions Using Wavelet Decomposition, Prediction-Error Image, and Neural Network. IEEE International Conference on Multimedia and Expo, pages 269–272, 2005.

[102] K. Sullivan, U. Madhow, S. Chandrasekaran, and B. Manjunath. Steganalysis for Markov Cover Data With Applications to Images. IEEE Transactions on Information Forensics and Security, 1(2):275–287, 2006.

[103] P. S. Tibbetts. Terrorist Use of the Internet And Related Information Technologies. Monograph, School of Advanced Military Studies, Fort Leavenworth, 2002.

[104] U. Topkara, M. Topkara, and M. J. Atallah. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. 8th Workshop on Multimedia and Security, pages 164–174, 2006.

[105] S. Trivedi and R. Chandramouli. Locally Most Powerful Detector for Secret Key Estimation in Spread Spectrum Image Steganography. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:1–12, 2004.

[106] S. Trivedi and R. Chandramouli. Secret Key Estimation in Sequential Steganography. IEEE Transactions on Signal Processing, 53(2):746–757, 2005.
[107] Y.-C. Tseng and H.-K. Pan. Secure and invisible data hiding in 2-color images. 20th Annual Joint Conference of the IEEE Computer and Communications Societies, 2:887–896, 2001.

[108] P. Wang, F. Liu, G. Wang, Y. Sun, and D. Gong. Multi-class steganalysis for JPEG stego algorithms. 15th IEEE International Conference on Image Processing, pages 2076–2079, 2008.

[109] Y. Wang and P. Moulin. Optimized feature extraction for learning-based image steganalysis. IEEE Transactions on Information Forensics and Security, 2(1), 2007.

[110] A. Westfeld. F5 - A Steganographic Algorithm. 4th International Workshop on Information Hiding, 2137:289–302, 2001.

[111] A. Westfeld and A. Pfitzmann. Attacks on Steganographic Systems. 3rd International Workshop on Information Hiding, 1768:61–76, 2000.

[112] M. Wu and B. Liu. Data hiding in binary image for authentication and annotation. IEEE Transactions on Multimedia, 6(4):528–538, 2004.

[113] M. Y. Wu and J. H. Lee. A Novel Data Embedding Method for Two-Color Facsimile Images. International Symposium on Multimedia Information Processing, 1998.

[114] W. Xinli, F. Albregtsen, and B. Foyn. Texture features from gray level gap length matrix. Proceedings of IAPR Workshop on Machine Vision Applications, pages 375–378, 1994.

[115] G. Xuan, Y. Q. Shi, J. Gao, D. Zou, C. Yang, Z. Zhang, P. Chai, C. Chen, and W. Chen. Steganalysis Based on Multiple Features Formed by Statistical Moments of Wavelet Characteristic Functions. 7th International Workshop on Information Hiding, 3727:262–277, 2005.

[116] G. Xuan, Y. Q. Shi, C. Huang, D. Fu, X. Zhu, P. Chai, and J. Gao. Steganalysis Using High-Dimensional Features Derived from Co-occurrence Matrix and Class-Wise Non-Principal Components Analysis (CNPCA). 5th International Workshop on Digital Watermarking, 4283:49–60, 2006.

[117] C.-Y. Yang. Color image steganography based on module substitutions. 3rd IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2:118–121, 2007.

[118] L. Yuling, S. Xingming, G. Can, and W. Hong. An Efficient Linguistic Steganography for Chinese Text. IEEE International Conference on Multimedia and Expo, pages 2094–2097, 2007.

[119] G. Zhang. Neural networks for classification: a survey. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 30(4):451–462, 2000.
[120] H. Zong, F. Liu, and X. Luo. A wavelet-based blind JPEG image steganalysis using co-occurrence matrix. 11th International Conference on Advanced Communication Technology, 3:1933–1936, 2009.

[121] D. Zou, Y. Q. Shi, W. Su, and G. Xuan. Steganalysis Based on Markov Model of Thresholded Prediction-error Image. IEEE International Conference on Multimedia and Expo, pages 1365–1368, 2006.
Index
512-pattern histogram, 80
AUR, 59, 108
backpropagation, 13
bit per pixel, 10, 14
bitmap, 16
Blind steganalysis, 8, 10
BMP, 16
bpp, 10, 14
center of mass, 20
CF, 107
characteristic function, 20, 107
classification, 10
COM, 20
cover image, 7
cumulative sum, 41
curse of dimensionality, 19
CUSUM, 41
DCT, 28
differential operation, 24
differential statistics, 24
digitisation, 14
discrete Fourier transform, 20
embedding operation, 6
extraction operation, 6
F5, 7, 33
feedforward, 13
Fisher linear discriminant, 12
generalized Gaussian distribution, 35
GGD, 35
GIF, 16
GLCM, 67
grey level co-occurrence, 67
histogram difference, 82
histogram quotient, 85
image calibration, 26
JBIG2, 36
JPEG, 16
LAHD, 23
least significant bit, 7
LMP, 41
local angular harmonic decomposition, 23
locally most powerful, 41
LSB, 7
machine learning, 10
mode, 25, 29
Model-based steganography, 48
multivariate regression, 12
Neural networks, 13
OutGuess, 7
pairs of values, 33
pattern recognition, 10
pixel, 14
PMF, 26
PNG, 17
PoV, 33
prediction-error, 21, 26
PRNG, 7, 40, 94
probability density function, 21
probability mass function, 26
processing element, 13
pseudorandom number generator, 7, 40
QMF, 23
quadrature mirror filters, 23
radial basis function, 74
random embedding, 7
raster graphic, 15
RBF, 74
Receiver Operating Characteristic, 59
ROC, 59
separating hyperplane, 14
separating line, 13
sequential embedding, 7
sequential forward floating selection, 108
SFFS, 108
steganalysis, 6
steganographic, 6
stego image, 7
stegokey, 7
supervised learning, 11
support vector machine, 13, 31
SVM, 13
Targeted steganalysis, 8
TIFF, 16
vector graphic, 15
WAM, 39
wavelet absolute moments, 39
weighted stego image, 38
WS, 38