Three Novel Algorithms for Hiding Data in
PDF Files Based on Incremental Updates
Li Lei School of Information Science and Technology
Sun Yat-Sen University
Contents
1 Introduction
2 The Structure of PDF Files
3 Incremental Updates
4 Proposed Algorithms
5 Experimental Results
6 Future work
• Introduction
PDF (Portable Document Format)
A widely used electronic document format
High printing quality
Cross-platform applicability
Device-independence
Hiding information in PDF files
Secret message transmission
Marking the source and transmission path
• Introduction
Existing algorithms
First category
Slightly varying the line, word or character spacing, or certain other display attributes [2,3,4,5,6,7].
Obvious defects: the page display is disturbed, and information security is relatively low.
Second category
Adding or changing the content of PDF file streams [8,9,10].
Disadvantages: to some degree, these methods cannot guarantee large capacity, high security and robustness.
• The structure of PDF file
File layout (top to bottom): Header, Body, Cross-reference table, Trailer
File structure (physical structure)
It consists of the header, the body, which contains a large number of objects, the cross-reference table, which holds information about the indirect objects in the file, and the trailer.
It determines how the objects are stored in a PDF file.
Document structure (logical structure)
A PDF document can be regarded as a hierarchy of objects contained in the body section of a PDF file.
The document structure is organized as an object tree whose root is the Catalog, with five subtrees: Page tree, Outline hierarchy, Article threads, Named destinations and Interactive form.
• The structure of PDF file
Object
An object is the basic element in PDF files. PDF supports eight basic types of
objects: Boolean Object, Numeric Object, String Object, Name Object, Array
Object, Dictionary Object, Stream Object and Null Object.
Objects may be labeled so that they can be referred to by other objects. A labeled
object is called an indirect object.
• The structure of PDF file
Content stream
The content streams that belong to the Page tree contain almost all the information
about the text contents and display attributes of a PDF file. Each page's contents are
cut into blocks and saved in stream objects named Contents objects.
Each Contents object contains text objects and text state. A text object
describes the text contents, and the text state is a collection of page display
attributes.
• Incremental updates
File layout after incremental updates (top to bottom):
Initial structure of PDF file: Header, Original body, Original cross-reference section, Original trailer
Incremental update 1: Updated body 1, Cross-reference section 1, Updated trailer 1
...
Incremental update n: ..., Updated trailer n
The contents of a PDF file can be updated
incrementally without rewriting the entire
file. Changes are appended to the end of
the file, leaving its original contents
intact.
In an incremental update, any new or
changed objects are appended to the file
and constitute the updated body. A
cross-reference section and a new trailer
are then appended after the updated body.
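The append-only layout described above can be sketched in a few lines of Python. This is only a structural toy, not a renderable PDF: the Catalog has no page tree, and the helper names (build_minimal, append_update, last_startxref) are made up for this illustration. It shows that an update leaves the original bytes untouched and that the new trailer points back to the previous cross-reference section through /Prev.

```python
import re

def build_minimal() -> bytes:
    """Build a toy file: header, one object, cross-reference table, trailer."""
    header = b"%PDF-1.6\n"
    body = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_offset = len(header) + len(body)
    # Each cross-reference entry is 20 bytes: 10-digit offset, 5-digit generation, n/f, EOL.
    xref = b"xref\n0 2\n0000000000 65535 f \n%010d 00000 n \n" % len(header)
    trailer = (b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
               b"startxref\n%d\n%%%%EOF\n" % xref_offset)
    return header + body + xref + trailer

def last_startxref(data: bytes) -> int:
    """Return the offset written after the final 'startxref' keyword."""
    return int(re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)[-1])

def append_update(data: bytes, obj_num: int, obj_body: bytes, size: int) -> bytes:
    """Append one new object as an incremental update (original bytes unchanged)."""
    prev = last_startxref(data)          # previous cross-reference section
    obj_offset = len(data)               # the new object starts at the old end of file
    obj = b"%d 0 obj\n%s\nendobj\n" % (obj_num, obj_body)
    xref_offset = obj_offset + len(obj)
    xref = b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    trailer = (b"trailer\n<< /Size %d /Root 1 0 R /Prev %d >>\n"
               b"startxref\n%d\n%%%%EOF\n" % (size, prev, xref_offset))
    return data + obj + xref + trailer

original = build_minimal()
updated = append_update(original, 2, b"<< /Note (hello) >>", 3)
print(updated.count(b"%%EOF"), "end-of-file markers after one update")
```

Each update adds one more %%EOF marker, so counting the markers gives a rough estimate of how many times a file has been saved incrementally.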
• Incremental updates
When do incremental updates happen?
Right-clicking the file and modifying its properties
Using “Save” after editing operations
• Proposed algorithms
1. A compensated version of modifying display attributes
The text state in a Contents object specifies the attributes of text display. Every attribute
is set by an operator keyword, such as Char Space: Tc, Word Space: Tw, Scale: Tz,
Leading: TL, Font size: Tf, Render: Tr, Rise: Ts, etc. The values set by these operators in the
content stream can be modified to hide information.
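As an illustration of where these operators sit, the sketch below scans an already decoded content stream for single-operand text state operators. The sample stream and the function name find_text_state are invented for this example, and Tf is left out because it takes two operands (font name and size).

```python
import re

# One numeric operand followed by a text state operator, e.g. "0.5 Tc".
TEXT_STATE = re.compile(
    rb"(?<![\w.+-])([+-]?(?:\d+\.?\d*|\.\d+))\s+(Tc|Tw|Tz|TL|Ts|Tr)\b"
)

def find_text_state(content: bytes):
    """Return (operator, operand) pairs in the order they appear."""
    return [(m.group(2).decode(), m.group(1).decode())
            for m in TEXT_STATE.finditer(content)]

# Hypothetical decoded content stream of one page.
sample = b"BT /F1 12 Tf 0.5 Tc 2 Tw 100 Tz (Hello) Tj ET"
for operator, operand in find_text_state(sample):
    print(operator, operand)
```

The operands found this way are the candidate positions for embedding. Real content streams are usually compressed and have to be decoded first.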
• Proposed algorithms
1. A compensated version of modifying display attributes
But algorithms that modify these attributes directly affect the display of the PDF file.
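• Proposed algorithms
1. A compensated version of modifying display attributes
We can compensate for the effect of data hiding
using incremental updates of PDF files:
After the text states of the Contents objects are altered
to embed information, the original Contents
objects are written in the updated body.
A reader uses the most recent version of each object, so the page is displayed
from the original Contents objects while the altered ones remain in the file.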
Embedding flow of Algorithm 1:
① Read the file stream of the PDF file.
② Get all Contents objects and decode them.
③ Check whether the text state space is enough. If no, report: Embedding failed.
④ If yes, modify the parity of the lowest order of the text state according to the embedded data (inputs: embedded data, chaotic sequence, stego-key).
⑤ Rewrite the original Contents objects by incremental update.
⑥ Output the stego-file and stego-key.
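The parity step of this flow can be sketched as follows. The slides do not say exactly which position carries the parity, so this example assumes it is the last decimal digit of each operand as written in the content stream. The scrambling with the chaotic sequence and stego-key is left out, and all function names are made up for the illustration.

```python
def embed_bit(operand: str, bit: int) -> str:
    """Make the last digit of the operand even for bit 0, odd for bit 1."""
    last = int(operand[-1])        # assumes the operand ends with a digit
    if last % 2 != bit:
        last ^= 1                  # 0<->1, 2<->3, ..., 8<->9: smallest possible change
    return operand[:-1] + str(last)

def extract_bit(operand: str) -> int:
    """Read the hidden bit back from the parity of the last digit."""
    return int(operand[-1]) % 2

def embed(operands, bits):
    """Embed one bit per operand; fail if the text state space is too small."""
    if len(bits) > len(operands):
        raise ValueError("Embedding failed: text state space is not enough")
    changed = [embed_bit(o, b) for o, b in zip(operands, bits)]
    return changed + list(operands[len(bits):])

operands = ["0.50", "2", "100", "12.3"]   # hypothetical text state values
stego = embed(operands, [1, 0, 1, 1])
print(stego, [extract_bit(o) for o in stego])
```

Flipping only the lowest digit keeps each change small, and writing the original Contents objects back through an incremental update means even this small change is not displayed.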
• Proposed algorithms
2. Algorithms based on new body and cross-reference section
① In the updated body, the actual embedding carriers are indirect objects. Considering the
complexity of inserting objects, content security, capacity and other factors, we
select stream objects as the embedding carrier.
② Select the new cross-reference section as the covert information carrier. We can embed
information by controlling the 10-digit byte offset in each entry of the cross-reference section, using
the difference between the offsets of adjacent entries to represent the covert information.
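For item ①, a stream object that carries one compressed segment might look like the sketch below. It assumes FlateDecode (zlib) compression, which the slides do not name, and it skips the stego-key scrambling. The object number and helper names are invented for the example.

```python
import re
import zlib

def make_stream_object(obj_num: int, segment: bytes) -> bytes:
    """Wrap one compressed segment in an indirect stream object."""
    data = zlib.compress(segment)
    head = b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (
        obj_num, len(data))
    return head + data + b"\nendstream\nendobj\n"

def read_stream_object(obj: bytes) -> bytes:
    """Recover the segment using the /Length value from the stream dictionary."""
    length = int(re.search(rb"/Length (\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])

carrier = make_stream_object(12, b"secret segment")
print(read_stream_object(carrier))
```

Because no page refers to such an object, a viewer never draws it, which is why adding it does not change the display.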
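Embedding flow based on the new body:
① Read the file stream of the PDF file.
② Scramble the ordered binary segment sequence of the embedded data chaotically, using the stego-key.
③ Write new stream objects; each stream content is a compressed embedded segment.
④ Add a new cross-reference section.
⑤ Output the stego-PDF file and stego-key.
Embedding flow based on the cross-reference section:
① Read the file stream of the PDF file.
② Scramble the ordered binary segment sequence of the embedded data chaotically, using the stego-key.
③ Add n+1 stream objects with the specified lengths based on the ordered decimal sequence.
④ Add a new cross-reference section.
⑤ Output the stego-PDF file and stego-key.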
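The cross-reference variant can be sketched as arithmetic on offsets: n values need n+1 entries, because each value is stored as the difference between two adjacent offsets. The segment width k and the minimum object length base below are assumptions chosen for the example, not figures from the slides, and the scrambling step is again omitted.

```python
def bits_to_values(bits: str, k: int) -> list:
    """Split a bit string into k-bit segments (zero padded) and read each as a decimal."""
    padded = bits.ljust(-(-len(bits) // k) * k, "0")
    return [int(padded[i:i + k], 2) for i in range(0, len(padded), k)]

def values_to_offsets(values, start: int, base: int) -> list:
    """Place n+1 objects so that offset[i+1] - offset[i] = base + value[i]."""
    offsets = [start]
    for v in values:
        offsets.append(offsets[-1] + base + v)
    return offsets

def xref_entries(offsets) -> list:
    """Format 20-byte cross-reference entries with a 10-digit offset field."""
    return [b"%010d 00000 n \n" % off for off in offsets]

def entries_to_values(entries, base: int) -> list:
    """Recover the values from the differences between adjacent offsets."""
    offs = [int(e[:10]) for e in entries]
    return [b - a - base for a, b in zip(offs, offs[1:])]

values = bits_to_values("101100011", k=3)
entries = xref_entries(values_to_offsets(values, start=1000, base=64))
print(values, entries_to_values(entries, base=64))
```

In a real file each object must actually have the length implied by its offset difference, which is why the flow adds stream objects "with the specified lengths".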
• The experimental results and analysis
Data embedding capacity
User interface:
Perceptual transparency property
As the display comparison shows, embedding
data caused no change in the
display of the cover file.
• The experimental results and analysis
The robustness to reading and editing operations
1. Robustness to annotating and marking operations
We used Adobe Acrobat 9 Pro to annotate and mark
the embedded PDF file in various ways,
then tried to extract the covert information
from it. The experimental result shows
that the accuracy of extracting data is 100%.
• The experimental results and analysis
The robustness to reading and editing operations
2. Robustness to interactive form editing
(a) is the stego file without any editing, and (b) is the file obtained by writing some contents
into (a). We tried to extract the covert information from (b), and the experimental result
shows that the accuracy of extracting data is 100%.
• The experimental results and analysis
Increase in the size of carrier file
1. Algorithm 1 (Embed 128 bits)
No.  File size  Page number  Embedded size  Size increase percentage
1    149KB      4            153KB          2.7%
2    237KB      4            245KB          3.4%
3    271KB      4            272KB          0.4%
4    298KB      4            306KB          2.7%
5    303KB      6            304KB          0.3%
6    349KB      7            350KB          0.3%
7    413KB      2            415KB          0.5%
8    543KB      5            544KB          0.2%
9    663KB      4            664KB          0.15%
10   801KB      10           803KB          0.2%
Rewriting a Contents object by incremental update increases the size of the original file by 1 to 8 KB, depending on the size of the original Contents object. The experimental results show that the average size increase is around 1%.
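As a quick arithmetic check, the percentages and their average can be recomputed from the file sizes in the table above. The sizes in the table are rounded to whole KB, so the recomputed figures only approximate the reported ones. The variable names are chosen for this example.

```python
# (file size in KB, embedded size in KB) for the ten test files in the table
SIZES_KB = [(149, 153), (237, 245), (271, 272), (298, 306), (303, 304),
            (349, 350), (413, 415), (543, 544), (663, 664), (801, 803)]

def increase_pct(before: int, after: int) -> float:
    """Relative size increase in percent."""
    return (after - before) / before * 100

percentages = [increase_pct(before, after) for before, after in SIZES_KB]
average = sum(percentages) / len(percentages)
print(f"average increase: {average:.2f}%")
```

The recomputed average stays close to the "around 1%" stated on the slide.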
• The experimental results and analysis
Increase in the size of carrier file
2. Algorithms 2 and 3 (Embed 128 bits)
The size increase caused by algorithm 2 is independent of the original file. Using 4 objects
to embed 128 bits adds no more than 1KB to the original PDF file (for a 200KB file, 0.5%).
The size increase caused by algorithm 3 is also independent of the original file. Using 22
cross-reference entries (which requires adding 22 new objects) to embed 128 bits, the maximal size
increase is around 4 to 5 KB (for a 200KB file, 2.5%).
• The experimental results and analysis
Performance comparison
Performance              Incremental updates methods  wbStego 4.3        Methods based on varying display attributes  Methods based on changing entries' order
Perceptual transparency  Not changed                  Not changed        Slightly changed                             Not changed
Embedding capacity       Large enough                 Small              Based on file                                Based on file
Security                 High                         Relatively low     Relatively low                               High
Robustness               Strong                       Relatively strong  Relatively strong                            Medium
• Future work
Different versions of PDF files are in use at present. Some higher versions of
PDF use cross-reference streams to store the information about indirect objects.
Improving compatibility across different PDF versions is the focus of our
next step.
• Reference
1. Adobe Systems Incorporated. PDF Reference, fifth edition, version 1.6. http://www.adobe.com/devnet/pdf/pdfs/PDFReference16.pdf, 2006.
2. S. H. Low and N. F. Maxemchuk. Performance comparison of two text marking methods. IEEE Journal on Selected Areas in Communications, Vol.16, No.4, 1998, pp.561-572.
3. J. T. Brassil, et al. Electronic marking and identification techniques to discourage document copying. IEEE Journal on Selected Areas in Communications, Vol.13, No.8, 1995, pp.1495-1504.
4. Shangping Zhong, Tierui Chen. Information Steganography Algorithm Based on PDF Documents. Computer Engineering, Vol.32, No.3, Feb. 2006, pp.161-163.
5. S. H. Low, et al. Document marking and identification using both line and word shifting. In Proceedings INFOCOM’95, Boston, MA, Apr. 1995, pp.853-860.
6. N. F. Maxemchuk and S. H. Low. Marking text documents. In Proceedings, International Conference Image Processing, Boston, Santa Barbara, CA, Oct. 1997, pp.13-17.
7. E. Franz and A. Pfitzmann. Steganography secure against Cover-Stego-Attacks. 3rd International Workshop, Information Hiding 1999, 2000, pp.29-46.
8. wbStego Studio. The steganography tool wbStego4. http://www.wbailer.com/wbstego, 2007.
9. Youji Liu, Xingming Sun, Gang Luo. A Novel Information Hiding Algorithm Based on Structure of PDF Document. Computer Engineering, Vol.32, No.17, Sep. 2006, pp.230-232.
10. Xingtong Liu, Quan Zhang, Chaojing Tang, Jingjing Zhao and Jian Liu. A Steganographic Algorithm for Hiding Data in PDF Files Based on Equivalent Transformation. In Information Processing (ISIP), 2008 International Symposiums on, 23-25 May 2008, pp.417-421.
That’s all
Thanks