8/12/2019 PDF SecretsPDF Secrets
1/129
PDF secrets
hiding & revealing secrets in PDF documents
Ange Albertini
RaumZeitLabor, Mannheim, Germany, 2014/05/17
CTF PDF stegano 101
reverse engineering & visual documentation
corkami.com
The problem
You need to remove sensitive elements of a PDF document for public release.
Are they actually removed? Can someone reveal your secrets?
It's not a new fact
https://www.schneier.com/blog/archives/2005/05/pdf_radacting_f.html
There are plenty of real examples
http://download.repubblica.it/pdf/rapportousacalipari.pdf
seen in its metadata: EmailSubject (Another Redact Job For You)
You just need to:
1. uncompress the PDF
2. remove all re\n occurrences (re = rectangle operator)
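The second step can be sketched in a few lines of Python. This is only an illustration under assumptions: the content stream is already uncompressed, and the helper name `strip_rectangles` is made up here. Unlike the bare "remove re\n" recipe, it also drops the four operands of each rectangle, so no stray numbers are left on the operand stack.

```python
import re

# Matches "x y w h re": four numeric operands followed by the rectangle operator.
RECT_OP = re.compile(rb"(?:[-+]?\d*\.?\d+\s+){4}re\b")


def strip_rectangles(content: bytes) -> bytes:
    """Remove every rectangle construction from an uncompressed content stream."""
    return RECT_OP.sub(b"", content)


# Hypothetical page content: a full-page rectangle painted over some text.
sample = b"q\n0 0 612 792 re\nf\nQ\nBT /F1 12 Tf (secret here) Tj ET\n"
cleaned = strip_rectangles(sample)
```

The fill operator `f` that followed the rectangle is left in place: with no path to fill, it paints nothing.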
AFAIK, the topic wasn't really covered technically.
The reverse problem
You need to carry a sensitive PDF, or exfiltrate some information:
Can you convincingly pretend that it was a mistake, and yet easily re-enable the contents?
...and, more importantly...
it still makes for an interesting exercise to learn and experiment with PDF internals
...and it might also be useful for a CTF steganography challenge...
It's about hiding parts of the PDF document, not hiding data in a PDF file
+ nothing reader-specific
General outline of this talk
3 relatively independent parts:
1. a non-technical approach
2. a basic introduction to the PDF file format
3. a technical perspective
Part I / III
a non-technical approach
What about that NSA doc?
There is an NSA document on the topic. It's worth a read, but it only covers Adobe Acrobat (Pro).
http://www.nsa.gov/ia/_files/app/pdf_risks.pdf
Preamble
This presentation has a lot of hands-on examples, which you can find at:
http://pdf.corkami.com
Outline
1. the problem (introduction)
2. outline
   a. see Google recursion
3. examples
   a. color
      i. forgotten text
   b. overlapped text
   c. secured documents
      i. bypassing security
   d. overlapped image
      i. extracting image
4. Conclusion
So, you tried to hide elements in a PDF...
well, I don't see them anymore
Try with the next slide: nothing is visible and yet...
1. Select All text with your favorite PDF viewer
2. Copy and paste in a text editor
You can't work on these slides via an online viewer like SlideShare: it displays rendered pictures of the PDF file, not the file itself. Try in a dedicated viewer, with the PDF file itself.
Example: color
hidden via white color
hint
It worked, right?
You can't see the text, but it's still on the page: the software can select it.
Example: color
hidden via white color
Btw...
This can lead to unexpected results, so be careful before publishing slides, even if you think you have nothing to remove.
Try with the next slide.
HyperVortex 1.0
a publication software
Roberto Martinez
title
authors
insert stupid footer here -- LaTeX sucks!!!
god, I hate making slides!!!
Example: forgotten text
Oops
Maybe it wasn't a secret to be removed, but it's still there!
put extra hidden content for easier indexing
god, I hate making slides!!!
Example: forgotten text
HyperVortex 1.0
a publication software
title
Roberto Martinez
authors
insert stupid footer here -- LaTeX sucks!!!
Another try
Try to get the secret from the next slide,with the same copy-paste trick...
Example: overlapped text
hidden via overlapping shape
CONFIDENTIAL
Once again...
The text is behind the CONFIDENTIAL shape, but it's still there!
The software selects everything (not only the front layer).
Example: overlapped text
CONFIDENTIAL
hidden via overlapping shape
But PDF can prevent that?
Yes, in theory, but the text is still there, and decrypted: the protection can be circumvented.
Bypassing copy/paste protection
either:
1. some readers just ignore it (like Evince)
2. generate a new file out of the original one:
   a. print PDF as PDF (not 100% compatible, but fast and usually works)
   b. decrypt
D:\>qpdf --decrypt protected.pdf unprotected.pdf
D:\> _
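The same decryption step can be scripted, for instance to process a batch of files. A minimal sketch, assuming `qpdf` is installed and on the PATH; the function names are illustrative only.

```python
import subprocess


def qpdf_decrypt_command(src: str, dst: str) -> list[str]:
    """Build the qpdf command line that writes a decrypted copy of src to dst."""
    return ["qpdf", "--decrypt", src, dst]


def decrypt(src: str, dst: str) -> None:
    """Run qpdf; raises CalledProcessError if qpdf exits with a non-zero status."""
    subprocess.run(qpdf_decrypt_command(src, dst), check=True)


command = qpdf_decrypt_command("protected.pdf", "unprotected.pdf")
```

Passing the arguments as a list avoids any shell quoting issue with file names that contain spaces, such as "PDF Secrets.pdf".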
1. open in Chrome
2. print
3. change the printer to "Save as PDF"
4. Save
The final document looks identical, but it's not (SECURED) anymore.
Copy/paste corruption
Sometimes, text can be copied, but it comes out corrupted.
It's not protection, just incompatibility: try with another reader.
It could be abused, but it's not easy to implement, and it's still easy to recover the content (it's just a substitution cipher).
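Since the corruption behaves like a substitution cipher, a known sample is enough to undo it. The sketch below is an assumption-laden illustration: it supposes you can read one passage on screen (`expected`) and copy its garbled version (`garbled`), and that each garbled character always stands for the same real character. Both function names are made up.

```python
def build_mapping(garbled: str, expected: str) -> dict[str, str]:
    """Learn a garbled -> real character mapping from one known sample."""
    if len(garbled) != len(expected):
        raise ValueError("samples must have the same length")
    mapping: dict[str, str] = {}
    for g, e in zip(garbled, expected):
        # setdefault keeps the first association; a conflict means
        # the corruption is not a simple substitution.
        if mapping.setdefault(g, e) != e:
            raise ValueError(f"{g!r} maps to both {mapping[g]!r} and {e!r}")
    return mapping


def recover(text: str, mapping: dict[str, str]) -> str:
    """Decode text, marking characters that the sample did not cover with '?'."""
    return "".join(mapping.get(ch, "?") for ch in text)


# Hypothetical sample: the viewer shows "Hello" but the clipboard gets "Ifmmp".
mapping = build_mapping("Ifmmp", "Hello")
decoded = recover("pmmfI", mapping)
```

The longer the known sample, the fewer `?` placeholders remain in the recovered text.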
copy/paste weirdness
Ok, a last one
is it hopeless?
try this one...
Example: overlapped image
SECRET
Failure?
The secret behind the shape is a picture: it's not copied as text by standard software (common software doesn't copy pictures).
Example: overlapped image
SECRET
Does it mean we're safe?
No: the image is still present in the PDF document. It's trivial to extract it with a standard tool.
Example: use pdfimages (or mutool)
extracting our secret image directly from the file
D:\>pdfimages -f 32 -l 32 "PDF Secrets.pdf" .
D:\> _
D:\>mutool extract "PDF Secrets.pdf"
extracting image img-0015.png
extracting image img-0016.png
...
Conclusion on Part I / III
text can be copied
images can be extracted
even if Select All does not work, secrets may still be recovered
but there are more advanced tricks!
we need to study PDF internals
Part II / III
PDF 101: basics of the PDF file format
My poster on the PDF format (free to print, reuse): http://pics.corkami.com
To order a print: http://prints.corkami.com
A simple example: helloworld.pdf
reminder: this is simplified, PDF is actually much more complex
binary stream
(text)
(text)
Recommended environment
1. a text editor
2. Sumatra: a single-file viewer, which updates on the fly
3. a tool to decompress streams (explanations later)
4. check mistakes with qpdf --check or pdfinfo
editing and viewing the changes on the fly
A PDF structure
1. header signature
2. body objects
3. cross-reference table
4. trailer
5. xref pointer
6. end of file signature
Signature
1. PDF signature: %PDF-1.0 to %PDF-1.7
2. charset identifier
   a. not required
   b. tells tools it's not ASCII
   c. 4 non-ASCII chars in a comment
Body
made of objects:
<object number> <generation number> obj
<content>
endobj
Xref
table: offsets of each object
xref
0 5                   5 objects, starting at 0
0000000000 65535 f    obj #0: always null
0000000016 00000 n    obj #1: offset 16
0000000051 00000 n    obj #2: offset 51
0000000111 00000 n
0000000283 00000 n
each line = 20 chars, with a space before the CR
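The 20-character rule is easy to check by generating entries programmatically. A small sketch (the helper names `xref_entry` and `xref_table` are illustrative, and a single-character line ending is assumed, hence the space before it):

```python
def xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Format one cross-reference entry: 10-digit offset, 5-digit generation, flag."""
    flag = b"n" if in_use else b"f"
    # The space before the newline pads the entry to exactly 20 bytes.
    return b"%010d %05d %s \n" % (offset, generation, flag)


def xref_table(offsets: list[int]) -> bytes:
    """Build an xref section for objects 1..n, preceded by the free object #0."""
    lines = [b"xref\n", b"0 %d\n" % (len(offsets) + 1)]
    lines.append(xref_entry(0, 65535, in_use=False))  # obj #0: always null
    lines.extend(xref_entry(offset) for offset in offsets)
    return b"".join(lines)


table = xref_table([16, 51, 111, 283])
```

Fixed-width entries let a reader jump straight to the entry of object N without parsing the ones before it.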
Trailer 1/2
structure:
a. trailer
b. object-like content
defines the root object
/Size = #(xref elements)
Trailer 2/2
1. pointer to xref
   a. startxref
   b. offset to xref (decimal)
2. End Of File marker
   a. %%EOF
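Reading that pointer back takes only a few lines. A minimal sketch, assuming a well-formed file whose last `startxref` sits within the final kilobyte; the function name is made up.

```python
import re


def find_xref_offset(pdf: bytes) -> int:
    """Return the decimal offset stored between the last startxref and %%EOF."""
    tail = pdf[-1024:]  # the pointer lives at the very end of the file
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref/%%EOF pair found")
    return int(matches[-1])


offset = find_xref_offset(b"...\ntrailer\n<< /Size 5 >>\nstartxref\n384\n%%EOF\n")
```

Taking the last match matters for files with several `%%EOF` markers, where only the final pointer describes the current version.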
Basic types: names, strings, dictionaries...
Literals
(string)
%comment until line return
some other, less-used types (PDF is quite f*cked up)
equivalent files
Object reference
<object number> <generation number> R
points to an object, and is replaced with the actual contents of that object
some objects CAN'T be inlined
the generation number is very rarely non-null
Object reference - example 1
2 equivalent examples via object reference:
57
is equivalent to
354 0 R
354 0 obj
57
endobj
Name objects
reserved keywords (like symbols in Ruby)
start with /: /Pages, /Kids
case sensitive, CamelCase by default
undefined names are ignored: /pages != /Pages (useful to disable tags)
Array
Syntax: [ <value>* ]
Examples:
[3 0 R] = 1 value
a. 3 0 R
[0 0 612 792] = 4 values
a. 0
b. 0
c. 612
d. 792
Dictionaries
Syntax: << (<name> <value>)* >>
Object 1 sets:
1. /Pages to 2 0 R
Object 2 sets:
1. /Kids to [3 0 R]
2. /Count to 1
3. /Type to /Pages
Object reference - example 2
/Pages 2 0 R
is equivalent to
/Pages << /Kids [3 0 R] /Count 1 /Type /Pages >>
and then 3 0 R is replaced too...
Binary streams: parameters, filters...
Streams
syntax:
1. usual object declaration
2. parameters dictionary
3. stream + return character
4. stream data
5. endstream + return character
6. usual endobj
stream data is not interpreted (at object level)
Example
object 4
stream parameters:
/Filter = /FlateDecode
/Length = 57
stream content (binary):
xsRPw3T044B BH-_ !0
Binary streams
can be stored with different encodings: /Filter
encodings can be cascaded: content is decoded after each filter
only the final data matters
Streams don't enforce encodings, as long as the result is correct once decoded by the filters.
<< ... >>
stream
BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET
endstream

<< ... >>
stream
x sRPw3T044 B BH - _ !0
endstream

these 2 streams are equivalent, just using a different encoding
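This equivalence can be reproduced with Python's standard `zlib` module, which reads and writes the zlib/deflate format used by /FlateDecode. A sketch using the Hello World content stream from the slides; the exact compressed bytes depend on the compression level, so they will not necessarily match the slide.

```python
import zlib

# The plain-text content stream from the Hello World example.
raw = b"BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET"

# What a PDF writer would store with /Filter /FlateDecode.
deflated = zlib.compress(raw)

# What a PDF reader does before interpreting the stream.
decoded = zlib.decompress(deflated)
```

Here `decoded` equals `raw`, and `deflated` starts with the byte `x` (0x78), like the binary stream shown on the slide.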
Main filters
1. no filter: raw binary, directly in the file
2. /FlateDecode: ZIP's Deflate compression (smaller)
3. /ASCIIHexDecode: turns hex into binary
   41 0A -> A\n
   easy text editing (but binary is very common)
mutool has a specific option for that
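As an illustration of the hex filter, here is a small decoder and encoder. It is a sketch, not a full implementation: it follows the usual /ASCIIHexDecode conventions (whitespace ignored, `>` as end-of-data marker, a missing final digit read as 0), and the function names are made up.

```python
def ascii_hex_decode(data: bytes) -> bytes:
    """Decode an /ASCIIHexDecode stream body into raw bytes."""
    body = data.split(b">", 1)[0]      # '>' marks the end of the data
    digits = b"".join(body.split())    # whitespace between digits is ignored
    if len(digits) % 2:
        digits += b"0"                 # an odd final digit is padded with 0
    return bytes.fromhex(digits.decode("ascii"))


def ascii_hex_encode(data: bytes) -> bytes:
    """Encode raw bytes so that the stream stays ASCII-only and text-editable."""
    return data.hex().upper().encode("ascii") + b">"


decoded = ascii_hex_decode(b"41 0A>")
encoded = ascii_hex_encode(b"A\n")
```

The encoded form doubles the size of the data, which is the price for being able to edit everything in a plain text editor.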
Other filters
Images:
/DCTDecode to store JPEG files directly (not just the data, even the header!)
JPEG2000, Fax
Encryption:
Crypt (RC4 or AES)
Let's put it all together: how is the file actually parsed?
Parsing 1/7
1. Signature is checked

%PDF-1.1
1 0 obj
<< /Pages 2 0 R >>
endobj
2 0 obj
<< /Kids [3 0 R] /Count 1 /Type /Pages >>
endobj
3 0 obj
<< ... /Contents 4 0 R /Type /Page >>
endobj
4 0 obj
<< ... >>
stream
BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000109 00000 n
0000000281 00000 n
trailer
<< ... >>
startxref
384
%%EOF
Parsing 2/7
2. %%EOF is located
80/129
Parsing 3/7
3. xref is located via startxref
1 0 obj>endobj
2 0 obj>endobj
3 0 obj >> /Contents 4 0 R /Type /Page >>endobj
4 0 obj>streamBT /F1 110 Tf 10 400 Td (Hello World!) TjETendstreamendobj
xref 0 5
0000000000 65535 f0000000016 00000 n0000000051 00000 n0000000109 00000 n0000000281 00000 n
trailer >
startxref 384%%EOF
Parsing 6/7
6. objects are parsed
a. /Root object contains /Pages
b. /Pages contains the page array /Kids
c. each /Page has:
   size: /MediaBox
   /Contents as stream object
   /Resources define the /Font dictionary
Parsing 7/7
7. the page is rendered
a. BT: BeginText
b. Tf: select font
c. Td: move cursor
d. Tj: display string
e. ET: EndText
BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET
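To experiment with these parsing steps, a similar file can be generated with correct offsets. The sketch below is not the author's helloworld.pdf: the dictionaries (a /Catalog root, Helvetica as /F1, a 612x792 /MediaBox) are assumptions chosen to make a conventional minimal file, and `build_hello_pdf` is a made-up name. The text is assumed to be ASCII without parentheses.

```python
def build_hello_pdf(text: str = "Hello World!") -> bytes:
    """Assemble a one-page PDF: header, 4 objects, xref, trailer, startxref."""
    content = b"BT /F1 110 Tf 10 400 Td (%s) Tj ET" % text.encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 << /Type /Font /Subtype /Type1 "
        b"/BaseFont /Helvetica >> >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
    ]
    out = bytearray(b"%PDF-1.1\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", for the xref
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # object 0 is always free
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the result to disk (for example `open("hello.pdf", "wb").write(build_hello_pdf())`) gives a file to open in a viewer and edit by hand.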
In practice
That was the strict minimum: a typical PDF embeds more information
fonts, font encodings, metadata
a generated Hello World typically weighs more than 5 KB
In practice - in the malware world
most readers accept malformed files, with many elements missing:
EOF, startxref, xref, /Length, endobj, endstream, /MediaBox, /Font
each reader has its own weirdness: see my Schizophrens talks and PoCs
so much for the so-called standard
%PDF-\0 1 0 obj2 0objstream \nBT/F1 105 Tf 0 400 Td
(Hello Adobe!)Tj ETendstream \nendobj \ntrailer
a Hello World for Adobe, in 179 bytes
Conclusion
We've covered the basics of: file structure, object relations, file parsing, page rendering
enough to play with PDF internals!
Part III / III
A technical perspective
Easy PDF editing
1. decompress streams: PDFtk, qpdf
   optional: use ASCIIHex to get an ASCII-only file
2. open in a text editor
3. view results via Sumatra
overwrite, or comment (don't delete): no offset to adjust
D:\>pdftk "PDF Secrets.pdf" output uncompressed.pdf uncompress
D:\>qpdf --qdf "PDF Secrets.pdf" uncompressed.pdf
Reminder
technically speaking, a PDF page is:
1. a stream object
2. as the /Contents of a /Type /Page object
3. in the /Kids array of a /Type /Pages object
4. as the value of /Pages in the root object
5. as the value of /Root in the trailer
and a text on the page is a simple (string) Tj
Remove a page?
easy hiding:
1. remove its reference from /Kids
2. write it back later
locate the /Kids array
Edit out your page's reference
...and don't forget to update the pages' /Count (forgetting it may lead to funny results)
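Both edits (the /Kids reference and the /Count) can be scripted on the text of the /Type /Pages object. A rough sketch with a made-up helper name; it overwrites instead of deleting, so the object keeps its length and no offset needs adjusting. It assumes the input is only the /Pages object of an uncompressed file.

```python
import re


def hide_page(pages_obj: bytes, page_ref: bytes) -> bytes:
    """Blank one reference in /Kids and decrement /Count, keeping the length."""
    ref = re.compile(rb"(?<!\d)" + re.escape(page_ref) + rb"\b")
    hidden, replaced = ref.subn(b" " * len(page_ref), pages_obj, count=1)
    if not replaced:
        raise ValueError("page reference not found")

    def decrement(match):
        digits = match.group(2)
        # Right-justify so that e.g. "10" becomes " 9": same number of bytes.
        new_count = str(int(digits) - 1).rjust(len(digits)).encode("ascii")
        return match.group(1) + new_count

    return re.sub(rb"(/Count\s+)(\d+)", decrement, hidden, count=1)


# Hypothetical two-page document: hide the second page.
pages = b"<< /Kids [3 0 R 5 0 R] /Count 2 /Type /Pages >>"
edited = hide_page(pages, b"5 0 R")
```

Restoring the page is the reverse edit: write the reference back in the blanked area and increment /Count.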
Erasing a page with a tool
tools such as PDFtk can operate on pages, but:
they don't erase pages! they extract the other pages
the whole page is lost, but the image contents (as objects) are still left, and extractable!!
D:\>pdftk "PDF Secrets.pdf" cat 1-3 5-end output no4.pdf
Erase overlapping element?
remove paint/text operators from the binary stream
Hint: overlapping elements might be at the end of the stream, as they were likely added last
paint operators (PDF 32000-1:2008, page 135)
text showing operators (PDF 32000-1:2008, pages 250-251)
Example: manually remove overlapping elements
take the uncompressed PDF
locate the /Contents stream object
locate the S (Stroke path) (you can search for \nS\n)
erase the S: no more black border
locate the f (path Filling)
no more gray surface
no more hidden elements!
bonus: the operation can be easily automated!
(on all pages, etc)
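One possible way to automate it is sketched below, under the assumption that streams are already uncompressed and that each operator sits alone on its line, as in the \nS\n search. Operators are overwritten with spaces rather than deleted, so stream lengths and xref offsets stay valid. The function name is made up.

```python
import re


def blank_operators(content: bytes, operators=(b"S", b"f")) -> bytes:
    """Overwrite standalone paint operators with spaces, preserving the length."""
    alternatives = b"|".join(re.escape(op) for op in operators)
    # The operator must be alone between two line breaks.
    pattern = re.compile(rb"(?<=\n)(?:" + alternatives + rb")(?=\r?\n)")
    return pattern.sub(lambda match: b" " * len(match.group(0)), content)


# Hypothetical stream: a stroked line, a filled rectangle, then some text.
sample = b"10 10 m\n90 90 l\nS\n10 10 80 80 re\nf\nBT /F1 12 Tf (Safe) Tj ET\n"
revealed = blank_operators(sample)
```

Applied to every /Contents stream of a file, this removes borders and filled surfaces in one pass; it also removes legitimate drawings, so it is a blunt tool for revealing, not for cleaning.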
Page size tricks
a page isn't just a /MediaBox :( PDF is not so simple!
/CropBox, /BleedBox, /TrimBox, /ArtBox...
What you see is the /CropBox
Copy/Paste and (some) pdftotext respect that: what is in the /MediaBox (but not in the /CropBox) is not extracted
disable /CropBox to see the full contents
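Because names are case sensitive and undefined names are ignored, one way to disable the /CropBox is to rename it to a name of the same length. A minimal sketch, assuming the page dictionaries are stored uncompressed in the file; `disable_cropbox` is a made-up name.

```python
def disable_cropbox(pdf: bytes) -> bytes:
    """Rename /CropBox to an undefined name of the same length."""
    # /cropBox is not a known key, so readers fall back to the /MediaBox.
    return pdf.replace(b"/CropBox", b"/cropBox")


# Hypothetical page dictionary that shows only a quarter of the page.
page = b"<< /Type /Page /MediaBox [0 0 612 792] /CropBox [0 0 306 396] >>"
full_page = disable_cropbox(page)
```

As the replacement has the same length, no offset changes, and restoring the crop is just the reverse rename.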
OS X actually applies a /CropBox when you copy/paste out of a PDF, and you can see the full original content by rotating the page.
A more deniable hiding
altering /Kids or the page's /Contents works, but there is another, elegant solution: incremental updates
Example
a confidential object, with a secret stream (object 4) to be hidden

%PDF-1.1
1 0 obj
<< ... >>
endobj
2 0 obj
<< ... >>
endobj
3 0 obj
<< ... /Contents 4 0 R /Type /Page >>
endobj
4 0 obj
<< ... >>
stream
BT /F1 120 Tf 10 400 Td (Top Secret) Tj ET
endstream
endobj
xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000052 00000 n
0000000110 00000 n
0000000282 00000 n
trailer
<< ... >>
startxref
385
%%EOF
New /Contents
append a new object 4
4 0 obj
<< ... >>
stream
BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET
endstream
endobj
Extra trailer 1/2
same /Size & /Root
references the previous xref via /Prev (not the previous trailer)
trailer
<< ... >>
Extra trailer 2/2
points to the new xref
startxref
654
%%EOF
Result
different content!
restore the original content by cutting the file after the first %%EOF
Incremental update to hide page
use the same trick to override the /Type /Pages object
%%EOF
1 0 obj
<< ... >>
endobj
xref
0 1
0000000000 65535 f
1 1
0000118783 00000 n
trailer
<< ... >>
startxref
118849
%%EOF
Actual leaks in the wild?
in any PDF with /Prev in the trailer: restore each intermediate version by truncating after each %%EOF
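Truncating after each %%EOF is simple to script. A sketch with a made-up function name; it naively treats every %%EOF byte sequence as a version boundary, which is an assumption (the marker could in principle also occur inside a stream).

```python
def revisions(pdf: bytes) -> list[bytes]:
    """Return one candidate file per %%EOF marker, each cut right after it."""
    marker = b"%%EOF"
    versions = []
    position = pdf.find(marker)
    while position != -1:
        end = position + len(marker)
        versions.append(pdf[:end] + b"\n")
        position = pdf.find(marker, end)
    return versions


# Hypothetical file: an original version, then one incremental update.
doc = b"%PDF-1.1\noriginal\nstartxref\n385\n%%EOF\nupdate\nstartxref\n654\n%%EOF\n"
versions = revisions(doc)
```

Each element can be written to its own file and opened in a viewer: the first one is the document as it was before any incremental update.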
incremental PDF found in the wild
(removed parts, incorrect page number)
Printed USA
Copy/Paste corruption
Copy/Paste corruption
some files produce corrupted text when copying (mentioned in the first part)
this is due to fonts: /Subtype /Type3 with no /ToUnicode mapping
Conclusion
the PDF file format is awkward, but not too complex if you just want to hide/reveal secrets
be careful when removing sensitive elements! it's quite easy to check whether elements are really removed or not: overlapping DOESN'T work
hiding and recovering elements is easy: the content is still there!
Suggestions?
I'm interested in:
hiding techniques
automated revealing techniques
documents that are a pain to rebuild: split fonts into small paths? licensed fonts are converted to glyphs: no more text
@angealbertini (https://twitter.com/angealbertini)
corkami.com (http://www.corkami.com/)