Three Novel Algorithms for Hiding Data in PDF Files Based on Incremental Updates

Li Lei

School of Information Science and Technology

Sun Yat-Sen University

Contents

1 Introduction

2 The Structure of PDF Files

3 Incremental Updates

4 Proposed Algorithms

5 Experimental Results

6 Future Work

• Introduction

PDF (Portable Document Format)

A widely used electronic document format

High printing quality

Cross-platform applicability

Device-independence

Hiding information in PDF files

Secret message transmission

Mark the source and transmission path


Existing algorithms

First category: slightly varying the line, word or character spacing, or certain other display attributes [2,3,4,5,6,7].

These methods have obvious defects: the page display is disturbed, and information security is relatively low.

Second category: adding or changing the content of PDF file streams [8,9,10].

These methods are limited to some degree in guaranteeing large capacity, high security and robustness.


• The structure of PDF files

[Figure: physical layout of a PDF file, from top to bottom: Header, Body, Cross-reference table, Trailer]

File structure (physical structure)

It consists of the header; the body, which contains the objects; the cross-reference table, which contains information about the indirect objects in the file; and the trailer. It determines how the objects are stored in a PDF file.
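To make the four parts concrete, here is a minimal Python sketch (the slides contain no code; the object contents are hypothetical) that assembles a one-page PDF by hand. It shows how the cross-reference table records the byte offset of each indirect object in fixed 20-byte entries and how `startxref` points back to the table.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a one-page PDF: header, body, cross-reference table, trailer."""
    objects = [  # body: indirect objects 1..3 (hypothetical contents)
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")  # header
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # object 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # fixed 20-byte in-use entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # "%%%%" formats to "%%"
    return bytes(out)


pdf = build_minimal_pdf()
print(pdf.decode("latin-1"))
```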

Document structure (logical structure)

A PDF document can be regarded as a hierarchy of objects contained in the body section of a PDF file. This document structure is organized as an object tree rooted at the Catalog, with five subtrees: the page tree, the outline hierarchy, article threads, named destinations and the interactive form.


Object

An object is the basic element of a PDF file. PDF supports eight basic types of objects: Boolean, numeric, string, name, array, dictionary, stream and null objects.

Objects may be labeled so that they can be referred to by other objects. A labeled object is called an indirect object.
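The eight object types map naturally onto Python values. The sketch below is a simplified serializer under our own conventions: the `Name` and `Ref` wrapper classes are hypothetical helpers, escaping is reduced to the essentials, and generation numbers are fixed at 0. A stream object is written as a dictionary followed by its raw data, and `indirect` adds the "n 0 obj ... endobj" label that makes an object indirect.

```python
from typing import Optional


class Name(str):
    """Hypothetical wrapper marking a str as a PDF name object (/Name)."""


class Ref(int):
    """Hypothetical wrapper for an indirect reference 'n 0 R' (generation 0)."""


def to_pdf(value) -> str:
    """Serialize a Python value as a direct PDF object (simplified escaping)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, Ref):  # test before int: Ref is an int subclass
        return f"{int(value)} 0 R"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):  # PDF reals have no exponent form; round to 4 places
        return f"{value:.4f}".rstrip("0").rstrip(".")
    if isinstance(value, Name):  # test before str: Name is a str subclass
        return "/" + value
    if isinstance(value, str):  # literal string; escape backslash and parentheses
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(to_pdf(v) for v in value) + "]"
    if isinstance(value, dict):
        return "<< " + " ".join(f"/{k} {to_pdf(v)}" for k, v in value.items()) + " >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def indirect(num: int, value, stream: Optional[bytes] = None) -> bytes:
    """Label an object as 'num 0 obj'; a stream object is a dictionary plus data."""
    body = to_pdf(value).encode("latin-1")
    if stream is not None:
        body += b"\nstream\n" + stream + b"\nendstream"
    return b"%d 0 obj\n" % num + body + b"\nendobj\n"
```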


Content stream

The content streams, which belong to the page tree, contain almost all information about a PDF file's text contents and display attributes. Each page's contents are divided into blocks and stored in the objects referenced by the page's Contents entry (Contents objects). Each Contents object contains text objects and text state: a text object describes the text contents, and the text state is a collection of page display attributes.
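Content streams are usually compressed with the FlateDecode filter, whose data is in zlib format, so decoding a Contents object amounts to a `zlib.decompress` call. The sketch below assumes a FlateDecode stream with no other filters and uses a naive regular expression to pull out the text objects between the `BT` and `ET` operators; it does not handle `BT`/`ET` appearing inside string literals or inline images.

```python
import re
import zlib
from typing import List

# Naive: assumes no standalone "BT"/"ET" tokens inside strings or inline images.
TEXT_OBJECT = re.compile(rb"\bBT\b(.*?)\bET\b", re.S)


def text_objects(raw_stream: bytes, flate: bool = True) -> List[bytes]:
    """Decode a content stream and return the body of every BT ... ET text object."""
    data = zlib.decompress(raw_stream) if flate else raw_stream  # FlateDecode = zlib
    return TEXT_OBJECT.findall(data)


content = b"q 1 0 0 1 0 0 cm Q\nBT /F1 12 Tf 2 Tc 72 700 Td (Hello) Tj ET\nBT (x) Tj ET"
for body in text_objects(zlib.compress(content)):
    print(body.strip())
```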


• Incremental updates

[Figure: initial structure of a PDF file (header, original body, original cross-reference section, original trailer), followed by incremental update 1 (updated body 1, cross-reference section 1, updated trailer 1) through incremental update n (ending with updated trailer n)]

The contents of a PDF file can be updated incrementally without rewriting the entire file. Changes are appended to the end of the file, leaving its original contents intact.

In an incremental update, any new or changed objects are appended to the file as the updated body, followed by a new cross-reference section and a new trailer.
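The append step can be sketched as follows, assuming a file whose last cross-reference section is a classic `xref` table with a plain trailer dictionary (no cross-reference streams). As the PDF specification requires, the new trailer carries a `/Prev` entry with the offset of the previous cross-reference section, and the new section lists only the appended objects. The function name and the regex-based trailer parsing are our own simplifications.

```python
import re
from typing import Dict


def append_incremental_update(pdf: bytes, new_objects: Dict[int, bytes]) -> bytes:
    """Append changed objects, a new xref section and a new trailer (sketch).

    Assumes the last xref section is a classic table with a simple trailer
    dictionary (no nested '>>'), and that every object uses generation 0.
    """
    if not new_objects:
        raise ValueError("nothing to append")
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    trailer = re.findall(rb"trailer\s*<<(.*?)>>", pdf, re.S)[-1]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer).group(1)

    out = bytearray(pdf)  # original bytes stay untouched
    if not out.endswith(b"\n"):
        out += b"\n"
    offsets = {}
    for num in sorted(new_objects):  # updated body: new or changed objects
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + new_objects[num] + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(offsets):  # one single-entry subsection per object
        out += b"%d 1\n%010d 00000 n \n" % (num, offsets[num])
    new_size = max(size, max(offsets) + 1)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (new_size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```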

When are incremental updates made?

Modifying document properties via the right-click menu

Saving editing operations with “Save”

• Proposed algorithms

1. A compensated version of modifying display attributes

The text state in a Contents object specifies how text is displayed. Each attribute is set by an operator keyword, such as character spacing (Tc), word spacing (Tw), horizontal scaling (Tz), leading (TL), font and size (Tf), rendering mode (Tr) and text rise (Ts). The operands of these operators in the content stream can be modified to hide information.
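These operators follow their operands in postfix form (for example `0.5 Tc`), so the carrier values can be located with a regular expression. This is a simplified sketch: the number grammar is reduced, only the seven operators named above are matched, and for `Tf` only the size operand is captured. The returned byte offsets are what an embedder would rewrite.

```python
import re
from typing import List, Tuple

# Numeric operand immediately followed by one of the text-state operators.
# Simplified number grammar: optional sign, digits, optional fraction.
TEXT_STATE = re.compile(rb"(-?\d*\.?\d+)\s+(Tc|Tw|Tz|TL|Tf|Tr|Ts)\b")


def text_state_operands(content: bytes) -> List[Tuple[str, str, int]]:
    """Return (operator, operand, byte offset of operand) for each text-state setting."""
    return [
        (m.group(2).decode(), m.group(1).decode(), m.start(1))
        for m in TEXT_STATE.finditer(content)
    ]


print(text_state_operands(b"BT /F1 12 Tf 0.5 Tc -1 Tw 100 Tz 14 TL ET"))
```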



However, modifying these attributes affects how the PDF file is displayed.


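We can compensate for this effect by using incremental updates: after the text states of the Contents objects have been altered to embed information, the original Contents objects are written into the updated body. Because a viewer uses the most recently appended version of each object, the page is displayed with the original text state, while the altered objects in the original body carry the hidden data.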

Flowchart of algorithm 1:

PDF file → read file stream → get all Contents objects and decode them

If the text state space is not enough for the embedded data, report “Embedding failed”.

Otherwise, using the embedded data and a chaotic sequence derived from the stego-key, modify the parity of the lowest-order digit of the text state values.

Rewrite the original Contents objects by incremental update → output the stego-file and stego-key
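A sketch of the embedding step of algorithm 1, under several assumptions of ours: the chaotic sequence is taken to be a logistic map x(k+1) = mu * x(k) * (1 - x(k)) with mu = 3.99 and the stego-key as x(0), the sequence is used only to choose which text state operands carry bits, and each bit is written as the parity of the operand's last digit. The compensating incremental update (re-appending the original Contents objects) is not repeated here.

```python
import re
from typing import List

# Numeric operand of a text-state operator (simplified number grammar).
OPERAND = re.compile(rb"-?\d*\.?\d+(?=\s+(?:Tc|Tw|Tz|TL|Tf|Tr|Ts)\b)")


def chaotic_order(key: float, n: int, mu: float = 3.99) -> List[int]:
    """Rank positions 0..n-1 by a logistic-map sequence seeded with the key (assumed)."""
    x, seq = key, []
    for _ in range(n):
        x = mu * x * (1 - x)
        seq.append(x)
    return sorted(range(n), key=lambda i: seq[i])


def set_parity(token: bytes, bit: int) -> bytes:
    """Force the parity of the token's last digit to `bit`, changing it by +/-1."""
    d = token[-1] - ord("0")
    if d % 2 != bit:
        d = d + 1 if d < 9 else d - 1
    return token[:-1] + bytes([ord("0") + d])


def embed(content: bytes, bits: List[int], key: float) -> bytes:
    matches = list(OPERAND.finditer(content))
    if len(bits) > len(matches):
        raise ValueError("Embedding failed: not enough text state space")
    out = bytearray(content)
    for pos, bit in zip(chaotic_order(key, len(matches)), bits):
        m = matches[pos]
        # One digit is replaced by one digit, so all match offsets stay valid.
        out[m.start():m.end()] = set_parity(m.group(), bit)
    return bytes(out)


def extract(content: bytes, n_bits: int, key: float) -> List[int]:
    tokens = [m.group() for m in OPERAND.finditer(content)]
    order = chaotic_order(key, len(tokens))
    return [(tokens[pos][-1] - ord("0")) % 2 for pos in order[:n_bits]]
```

Because only the last digit changes, the stego stream has exactly the same length and the same set of operands, so the extractor regenerates the same chaotic order from the key alone.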


2. Algorithms based on new body and cross-reference section

① In the updated body, the actual embedding carriers are indirect objects. Considering the complexity of inserting objects, content security, capacity and other factors, we select stream objects as the embedding carrier.

② The new cross-reference section is selected as the covert information carrier. Information is embedded by controlling the 10-byte offset in each cross-reference entry: the difference between the offsets of adjacent entries represents the covert information.
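A sketch of the offset-difference idea in ②, assuming each segment is k bits wide and every gap between adjacent objects is at least `min_gap` bytes (the smallest object the embedder can write), so the carried value is `gap - min_gap`. The names, the segment width and the omission of the keyed scrambling are our assumptions. Each entry keeps the standard 20-byte form with a 10-digit offset.

```python
from typing import List


def bits_to_segments(bits: str, k: int) -> List[int]:
    """Split a bit string into k-bit values, zero-padding the last segment."""
    bits += "0" * (-len(bits) % k)
    return [int(bits[i:i + k], 2) for i in range(0, len(bits), k)]


def offsets_for(start: int, segments: List[int], min_gap: int) -> List[int]:
    """Offsets of n+1 objects whose n adjacent differences carry the segments."""
    offsets = [start]
    for v in segments:
        offsets.append(offsets[-1] + min_gap + v)  # object length = min_gap + v
    return offsets


def xref_entries(offsets: List[int]) -> List[str]:
    return ["%010d 00000 n \n" % off for off in offsets]  # 20-byte entries


def read_bits(entries: List[str], min_gap: int, k: int) -> str:
    offs = [int(e[:10]) for e in entries]
    return "".join(format(b - a - min_gap, f"0{k}b") for a, b in zip(offs, offs[1:]))


secret = "1011001110001111"
entries = xref_entries(offsets_for(1000, bits_to_segments(secret, 6), min_gap=80))
print("".join(entries), end="")
```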

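[Flowcharts of the two algorithms; recoverable steps:]

Stream-object carrier: PDF file → read file stream → scramble (chaos) the ordered binary segment sequence of the embedded data with the stego-key → write new stream objects, each stream's content being a compressed embedded segment → add a new cross-reference section → output the stego-PDF file and stego-key

Cross-reference carrier: PDF file → read file stream → scramble (chaos) the ordered binary segment sequence of the embedded data with the stego-key → add n+1 stream objects whose lengths are set by the ordered decimal sequence → add a new cross-reference section → output the stego-PDF file and stego-key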
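A sketch of the stream-object carrier, assuming the embedded data is split into n equal byte segments, each compressed with zlib into its own FlateDecode stream object. For the scrambling step we swap in Python's seeded `random.Random(key).shuffle` as a stand-in for the chaotic sequence; this is a substitution, not the authors' method. The function names are hypothetical.

```python
import random
import re
import zlib
from typing import List


def keyed_permutation(n: int, key: int) -> List[int]:
    """Stand-in for the chaotic scrambling: a permutation from a seeded PRNG."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order


def make_stream_objects(data: bytes, n: int, key: int, first_num: int) -> bytes:
    """Write n new stream objects; object j holds compressed segment order[j]."""
    size = -(-len(data) // n)  # ceiling division
    segments = [data[i * size:(i + 1) * size] for i in range(n)]
    out = bytearray()
    for j, seg_index in enumerate(keyed_permutation(n, key)):
        payload = zlib.compress(segments[seg_index])
        out += b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (first_num + j, len(payload))
        out += payload + b"\nendstream\nendobj\n"
    return bytes(out)


# Matches the dictionaries written above (sketch: assumes this exact layout).
STREAM_HEAD = re.compile(rb"<< /Length (\d+) /Filter /FlateDecode >>\nstream\n")


def recover(objects: bytes, n: int, key: int) -> bytes:
    payloads = []
    for m in STREAM_HEAD.finditer(objects):
        length = int(m.group(1))
        payloads.append(zlib.decompress(objects[m.end():m.end() + length]))
    segments = [b""] * n
    for j, seg_index in enumerate(keyed_permutation(n, key)):
        segments[seg_index] = payloads[j]  # undo the keyed scrambling
    return b"".join(segments)
```

With n = 4 and a 16-byte secret, this mirrors the 4-object, 128-bit setting reported in the experiments; the resulting bytes would then be appended with a new cross-reference section as an incremental update.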



• The experimental results and analysis

Data Embedding Capacity

[Figure: user interface]

Perceptual transparency property

As the display comparison shows, embedding data caused no change in how the cover file is displayed.


The robustness to reading and editing operations

1. Robustness to annotating and marking operations

We used Adobe Acrobat 9 Pro to annotate and mark the stego PDF file in various ways, and then tried to extract the covert information. The extraction accuracy was 100%.



2. Robustness to interactive form editing

(a) is the stego file without any editing, and (b) is the same file after some content has been written into (a). We tried to extract the covert information from (b), and the extraction accuracy was 100%.



Increase in the size of carrier file

1. Algorithm 1 (embedding 128 bits)

Rewriting a Contents object by incremental update increases the size of the original file by 1 to 8 KB, depending on the size of the original Contents object. In the experiments, the average increase in file size was around 1%.

File | Original size | Pages | Size after embedding | Size increase
1 | 149 KB | 4 | 153 KB | 2.7%
2 | 237 KB | 4 | 245 KB | 3.4%
3 | 271 KB | 4 | 272 KB | 0.4%
4 | 298 KB | 4 | 306 KB | 2.7%
5 | 303 KB | 6 | 304 KB | 0.3%
6 | 349 KB | 7 | 350 KB | 0.3%
7 | 413 KB | 2 | 415 KB | 0.5%
8 | 543 KB | 5 | 544 KB | 0.2%
9 | 663 KB | 4 | 664 KB | 0.15%
10 | 801 KB | 10 | 803 KB | 0.2%
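The "around 1%" figure can be checked against the table for algorithm 1. The sketch below recomputes each file's increase from the rounded KB sizes and averages them; because the sizes are rounded to whole KB, the per-file rates are approximate.

```python
# (original size, size after embedding) in KB, copied from the algorithm 1 table
sizes = [(149, 153), (237, 245), (271, 272), (298, 306), (303, 304),
         (349, 350), (413, 415), (543, 544), (663, 664), (801, 803)]

rates = [100.0 * (new - old) / old for old, new in sizes]  # percent increase per file
average = sum(rates) / len(rates)
print(f"average increase: {average:.2f}%")  # average increase: 1.08%
```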


2. Algorithms 2 and 3 (embedding 128 bits)

The size increase caused by algorithm 2 does not depend on the original file. Using 4 objects to embed 128 bits adds no more than 1 KB to the original PDF file (about 0.5% for a 200 KB file).

The size increase caused by algorithm 3 does not depend on the original file either. Using 22 cross-reference entries (which requires adding 22 new objects) to embed 128 bits, the maximum size increase is around 4 to 5 KB (about 2.5% for a 200 KB file).

Performance comparison

Performance | Incremental updates methods | wbStego 4.3 | Methods based on varying display attributes | Methods based on changing entries' order
Perceptual transparency | No change | No change | Slightly changed | No change
Embedding capacity | Large enough | Small | Depends on file | Depends on file
Security | High | Relatively low | Relatively low | High
Robustness | Strong | Relatively strong | Relatively strong | Medium

• Future work

Different versions of PDF files are in use at present, and some newer versions use cross-reference streams to store the information about indirect objects. Improving compatibility with different PDF versions is the focus of our next step.
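Since PDF 1.5, the `startxref` offset may point either to a classic `xref` table or to a cross-reference stream, which is an indirect object whose dictionary has `/Type /XRef`. A compatible embedder would first need to tell the two apart; the minimal check below is our own sketch and inspects only a fixed window after the offset, so it can misclassify unusual files.

```python
import re

# An indirect object whose dictionary declares /Type /XRef (searched within the window).
XREF_STREAM_DICT = re.compile(rb"\d+\s+\d+\s+obj\s*<<.*?/Type\s*/XRef", re.S)


def xref_kind(pdf: bytes, window: int = 1024) -> str:
    """Classify the section the last startxref points to: 'table', 'stream' or 'unknown'."""
    offset = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    head = pdf[offset:offset + window]  # sketch: only a fixed window is inspected
    if head.startswith(b"xref"):
        return "table"  # classic cross-reference table
    if XREF_STREAM_DICT.match(head):
        return "stream"  # cross-reference stream (PDF 1.5 and later)
    return "unknown"
```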

• References

1. Adobe Systems Incorporated. PDF Reference, fifth edition, version 1.6. http://www.adobe.com/devnet/pdf/pdfs/PDFReference16.pdf, 2006.

2. S. H. Low and N. F. Maxemchuk. Performance comparison of two text marking methods. IEEE Journal on Selected Areas in Communications, Vol. 16, No. 4, 1998, pp. 561-572.

3. J. T. Brassil, et al. Electronic marking and identification techniques to discourage document copying. IEEE Journal on Selected Areas in Communications, Vol. 13, No. 8, 1995, pp. 1495-1504.

4. Shangping Zhong, Tierui Chen. Information Steganography Algorithm Based on PDF Documents. Computer Engineering, Vol. 32, No. 3, Feb. 2006, pp. 161-163.

5. S. H. Low, et al. Document marking and identification using both line and word shifting. In Proceedings INFOCOM'95, Boston, MA, Apr. 1995, pp. 853-860.

6. N. F. Maxemchuk and S. H. Low. Marking text documents. In Proceedings, International Conference on Image Processing, Santa Barbara, CA, Oct. 1997, pp. 13-17.

7. E. Franz and A. Pfitzmann. Steganography secure against cover-stego-attacks. In Information Hiding, 3rd International Workshop, IH'99, 2000, pp. 29-46.

8. wbStego Studio. The steganography tool wbStego4. http://www.wbailer.com/wbstego, 2007.

9. Youji Liu, Xingming Sun, Gang Luo. A Novel Information Hiding Algorithm Based on Structure of PDF Document. Computer Engineering, Vol. 32, No. 17, Sep. 2006, pp. 230-232.

10. Xingtong Liu, Quan Zhang, Chaojing Tang, Jingjing Zhao and Jian Liu. A Steganographic Algorithm for Hiding Data in PDF Files Based on Equivalent Transformation. In 2008 International Symposiums on Information Processing (ISIP), 23-25 May 2008, pp. 417-421.
