My Journey to Cracking
Steganography Mission 15
at HackThisSite
by Ivan Ivanov Petrov (Keeper)
FIRST EDITION
About the underground
The website was founded by Jeremy Hammond in late 2003. For a long time, it has been the target of
many different organizations trying to gain control over it and destroy its community.
In November 2004 the (now defunct) HackThisSite-based HowDark Security Group notified the phpBB
Group, makers of the phpBB bulletin software, of a serious vulnerability in the product. The
vulnerability was kept under wraps while it was brought to the attention of the phpBB admins,
who, after reviewing it, proceeded to downplay its risks. Unhappy with the Group's failure to
take action, HowDark then published the bug on the Bugtraq mailing list. Malicious users found
and exploited the vulnerability, which led to the takedown of several phpBB-based bulletin
boards and websites. Only then did the admins take notice and release a fix. Slowness to patch
the vulnerability by end-users led to an implementation of the exploit in the Perl/Santy worm,
which defaced upwards of 40,000 websites and bulletin boards within a few hours of its release.
- Wikipedia, the free encyclopedia
The community is dedicated to facilitating an open learning environment by providing a series
of hacking challenges, articles, resources, and discussion of the latest happenings in hacker
culture. It is an online movement of artists, activists, hackers and anarchists who are
organizing to create new worlds.
Considering that several of the hacking challenges are simulated web defacements, the
question of the ethics of hacking is repeatedly brought up. The community considers hacking
itself to be a tool, a skill which is neutral in itself, a means without an end. It can be used
for good (for the benefit of all) or bad (mindless destruction or theft). They do not encourage
negative use of the information they provide. They are more concerned with the greater risks of
not distributing this information and are ready to accept the consequences.
About Steganography Mission 15
The mission originally had a fairly simple solution, until a later update of the entire
challenge changed its concept completely. The mission drew attention because many famous and
lesser-known steganographers have tried to figure out the idea behind it, and none of them has
managed to do so far. Since 2008, the challenge has been solved by only eighteen people
worldwide (whose identities remain unknown to this day). Some claim that a few of those were
the site's own administrators, who have access to the answer of every challenge on the board.
Others are inclined to believe that the solves are the result of extensive exhaustive-search
attacks (a.k.a. brute-force attacks).
My involvement in this mission started back in 2012, when I was first introduced to
steganography. At first, I thought there wasn't anything special about it, but soon after I
took it to a higher level and was unable to solve it, I found out that it was an underground
competition.
Before we proceed any further, let us bring out the foremost details that need to be
mentioned, beginning with the image itself. The steganographied image (.PNG) has a divided IDAT
structure of 12 blocks (the last one slightly smaller). The data seems to have been concealed
by altering the enhanced LSB values, eliminating the high-level bits of each pixel except for
the least significant bit. So all bytes are going to be 0 or 1, and since 0 or 1 on a range of
256 values would not give any visible color, a 0 stays at 0 and a 1 becomes the maximum value,
255. Initial analyses of the image did not show anything specific or odd beyond the complete
absence of one value in all three color channels (RGB) and the heightened presence of another
value in one third of the color values. Studying these and replacing bytes gave me nothing,
however, and I was at a loss as to whether this avenue was even worth pursuing at all.
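The IDAT layout can be checked with a short script. The following is a minimal Python sketch that walks the PNG chunk list, verifies each CRC and reports the IDAT sizes. The function names are hypothetical and are not taken from any tool used in the mission. Encoders commonly split the compressed image stream into fixed-size IDAT chunks with a shorter final one, so a 12-block layout is not necessarily suspicious on its own.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))


def list_chunks(data: bytes) -> list[tuple[str, int]]:
    """Return (chunk type, data length) for every chunk, checking each CRC."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = []
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + body) != crc:
            raise ValueError(f"CRC mismatch in {ctype!r} chunk at offset {pos}")
        chunks.append((ctype.decode("ascii"), length))
        pos += 12 + length
        if ctype == b"IEND":
            break
    return chunks


def idat_sizes(data: bytes) -> list[int]:
    """Data lengths of the IDAT chunks, in file order."""
    return [length for ctype, length in list_chunks(data) if ctype == "IDAT"]
```

On a real file, `idat_sizes(open("mission15.png", "rb").read())` would list the block sizes. The file name is a placeholder.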
Hence, I looked into developing a script in Python, PHP or C/C++ that would reverse the
process and 'restore' the enhanced LSBs. Automating the process raises the chances of success,
since many different analyses can be carried out in a matter of seconds, whereas it would take
a single person quite a while to conduct these experiments by hand.
After converting the image to a 24-bit .BMP and tracking the red curve of a chi-square
steganalysis, it is certain that there is hidden data within the file, so none of this effort
has been or will be in vain.
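A 24-bit BMP stores its pixels uncompressed, so its channel bytes can be read directly for analysis. Below is a minimal sketch that assumes an uncompressed (BI_RGB) file with the standard 40-byte BITMAPINFOHEADER. `read_bmp24` and `write_bmp24` are hypothetical helper names. The conversion itself can be done with any image tool; with Pillow, for instance, `Image.open(path).convert("RGB").save("out.bmp")` would do it.

```python
import struct


def read_bmp24(data: bytes) -> tuple[int, int, list[tuple[int, int, int]]]:
    """Parse an uncompressed 24-bit BMP; return (width, height, RGB pixels top-down)."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    (pixel_offset,) = struct.unpack_from("<I", data, 10)
    width, height, _planes, bpp, compression = struct.unpack_from("<iiHHI", data, 18)
    if bpp != 24 or compression != 0:
        raise ValueError("only uncompressed 24-bit BMPs are handled here")
    bottom_up = height > 0  # positive height means the bottom row is stored first
    height = abs(height)
    row_size = (width * 3 + 3) & ~3  # each row is padded to a multiple of 4 bytes
    pixels = []
    for y in range(height):
        stored_row = height - 1 - y if bottom_up else y
        start = pixel_offset + stored_row * row_size
        for x in range(width):
            b, g, r = data[start + 3 * x:start + 3 * x + 3]  # BMP order is B, G, R
            pixels.append((r, g, b))
    return width, height, pixels


def write_bmp24(width: int, height: int, pixels: list[tuple[int, int, int]]) -> bytes:
    """Serialize top-down RGB pixels as a bottom-up, uncompressed 24-bit BMP."""
    row_size = (width * 3 + 3) & ~3
    rows = []
    for y in reversed(range(height)):
        row = bytearray()
        for r, g, b in pixels[y * width:(y + 1) * width]:
            row += bytes((b, g, r))
        row += b"\x00" * (row_size - len(row))
        rows.append(bytes(row))
    pixel_data = b"".join(rows)
    file_header = struct.pack("<2sIHHI", b"BM", 54 + len(pixel_data), 0, 0, 54)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              len(pixel_data), 2835, 2835, 0, 0)
    return file_header + info_header + pixel_data
```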
First, the chi-square graph shows a little more than 8 vertical zones, which means that the
hidden data is a little more than 8 kB in size. One pixel can be used to hide three bits (one in
the LSB of each RGB color channel), so we can hide (98 × 225) × 3 = 66,150 bits. To get the
number of kilobytes, we divide by 8 and by 1024: ((98 × 225) × 3) / (8 × 1024) ≈ 8.07, that is,
around 8.1 kilobytes. However, that is not the case here.
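The capacity arithmetic above can be written as a tiny helper. This sketch assumes one embedded bit per RGB channel, no use of the alpha channel, and 1 kB = 1024 bytes. The function name is hypothetical.

```python
def lsb_capacity_kib(width: int, height: int, channels: int = 3,
                     bits_per_channel: int = 1) -> float:
    """Maximum payload of plain LSB embedding, in kilobytes (1 kB = 1024 bytes)."""
    total_bits = width * height * channels * bits_per_channel
    return total_bits / 8 / 1024


print(f"{lsb_capacity_kib(98, 225):.2f} kB")  # prints "8.07 kB"
```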
Analyzing the APP0 and APP1 markers of a .JPG version of the file also gave some odd
output:
Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
OFFSET: 0x00000000
*** Marker: APP0 (xFFE0) ***
OFFSET: 0x00000002
length = 16
identifier = [JFIF]
version = [1.1]
density = 96 x 96 DPI (dots per inch)
thumbnail = 0 x 0
*** Marker: APP1 (xFFE1) ***
OFFSET: 0x00000014
length = 58
Identifier = [Exif]
Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
Endian = Motorola (big)
TAG Mark x002A = x[002A]
EXIF IFD0 @ Absolute x[00000026]
Dir Length = x[0003]
[IFD0.x5110 ] =
[IFD0.x5111 ] = 0
[IFD0.x5112 ] = 0
Offset to Next IFD = [00000000]
*** Marker: DQT (xFFDB) ***
Define a Quantization Table.
OFFSET: 0x00000050
Table length = 67
----
Precision=8 bits
Destination ID=0 (Luminance)
DQT, Row #0: 2 1 1 2 3 5 6 7
DQT, Row #1: 1 1 2 2 3 7 7 7
DQT, Row #2: 2 2 2 3 5 7 8 7
DQT, Row #3: 2 2 3 3 6 10 10 7
DQT, Row #4: 2 3 4 7 8 13 12 9
DQT, Row #5: 3 4 7 8 10 12 14 11
DQT, Row #6: 6 8 9 10 12 15 14 12
DQT, Row #7: 9 11 11 12 13 12 12 12
Approx quality factor = 94.02 (scaling=11.97 variance=1.37)
[Figures: Chi-square analysis (Java module); Chi-square analysis (Batch module)]
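The offsets in the dump follow directly from the segment lengths. Each JPEG marker is two bytes, followed by a big-endian length that counts itself but not the marker. So APP1 starts at 2 + 2 + 16 = 0x14, and DQT starts at 0x14 + 2 + 58 = 0x50. Below is a minimal sketch of such a marker walker. The names are hypothetical, and it stops at SOS, where the entropy-coded data begins.

```python
import struct

# Names for the markers seen in the dump; anything else is shown in hex.
MARKER_NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
                0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI"}


def walk_jpeg_markers(data: bytes) -> list[tuple[int, str, int]]:
    """List (offset, marker name, segment length) up to the start of scan (SOS)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [(0, "SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        code = data[pos + 1]
        # The length field includes its own 2 bytes but not the 2 marker bytes.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        segments.append((pos, MARKER_NAMES.get(code, f"0xFF{code:02X}"), length))
        if code == 0xDA:  # compressed image data follows SOS
            break
        pos += 2 + length
    return segments
```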
Being nearly convinced that no encryption algorithm was applied, and therefore that no key is
involved in the concealment, my idea was to code a script that would shift the LSB values back
and return the originals. The file was also run through several structural analyses,
statistical attacks, BPCS and a few others.
The histogram of the image shows a specific color with an unusual spike. I manipulated that as
best I could to try to reveal any hidden data, but to no avail. The spike is visible in the
histograms of the RGB values.
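The histogram checks mentioned above can be automated: one looks for a value that never occurs in any channel, the other for a value with an unusual spike. This is a minimal sketch that assumes the pixels are already available as (R, G, B) tuples, for example from Pillow's `Image.open(path).convert("RGB").getdata()`. The helper names are hypothetical.

```python
def channel_histograms(pixels):
    """One 256-bin histogram per channel for an iterable of (R, G, B) tuples."""
    hists = [[0] * 256 for _ in range(3)]
    for px in pixels:
        for c in range(3):
            hists[c][px[c]] += 1
    return hists


def spike(hist):
    """The most frequent value in one channel's histogram, with its count."""
    value = max(range(256), key=hist.__getitem__)
    return value, hist[value]


def missing_everywhere(hists):
    """Values that never occur in any of the three channels."""
    return [v for v in range(256) if all(h[v] == 0 for h in hists)]
```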
Then there are the multiple IDAT chunks. I put together a similar image by assigning random
color values to each pixel location, and I too wound up with several of these chunks.
Unfortunately, very little was found inside them. Even more interesting is the way color values
are repeated in the image. It seems as though the frequency of reused colors could hold some
clue, yet I did not fully understand that relationship, if one exists at all. Additionally,
there is only a single column and a single row of pixels that do not have a full value of 255 in
their alpha channel. I even interpreted the X, Y, A, R, G and B values of every pixel in the
image as ASCII, but wound up with nothing legible. Even the green curve (the average of the
LSBs) cannot tell us anything: there is no evident break. Several other histograms show the odd
curve of the blue channel.
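The color-reuse frequency and the non-opaque row and column can both be measured with a few lines. This minimal sketch assumes RGBA tuples in row-major order, for example from Pillow's `convert("RGBA").getdata()`. The function names are hypothetical. A row in which every pixel is non-opaque shows up with a count equal to the image width, and a full column shows up with a count equal to the height.

```python
from collections import Counter


def color_reuse(pixels):
    """Map 'times a colour is used' -> 'number of distinct colours used that often'."""
    uses_per_color = Counter(pixels)
    return Counter(uses_per_color.values())


def non_opaque_counts(pixels, width):
    """Count pixels with alpha < 255 per row and per column (row-major RGBA input)."""
    rows, cols = Counter(), Counter()
    for i, (_, _, _, alpha) in enumerate(pixels):
        if alpha < 255:
            rows[i // width] += 1
            cols[i % width] += 1
    return rows, cols
```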
The red curve shows some difference: it can see something that we cannot spot (yet).
Statistical detection is more sensitive than our eyes, and I guess that was my main point.
However, there is also a sort of latency in the red curve. Even without hidden data, it starts
at the maximum and stays there for some time, which is close to a false positive. It looks like
the LSBs in the image are very close to random, and the algorithm needs a large population of
pixels (keep in mind that the analysis was carried out on a steadily growing population of
pixels) before it reaches a threshold where it decides whether the red curve should go down or
up, depending on the state of the pixels (which are never randomized). The same sort of latency
occurs when data is hidden. You hide 1 kB or 2 kB of data, but the red curve does not change
direction right after that amount of data. It waits a little (in our case, until around 1.3 kB
and 2.6 kB respectively). Here is a representation of the data types from a hex editor:
Here is another spectrum confirming the behavior of the blue channel. Notice the sudden curve at
the beginning.
As mentioned above, there is no evident clue to the original RGB and alpha values. They are set
to either 255 or 0 depending on their least significant bit. The other possibility I had in
mind at that moment was that the mission was meant to implement a quantum steganography
protocol. Matlab and a few other steganalysis techniques seem tempting, but only to a certain
degree. The only steganalysis attack that can reveal whether anything has been concealed with
the eLSB technique is the chi-square attack. As for Matlab, the tools it offers are of little use,
since they are limited by the information the user supplies, and we currently have no valid
information. In principle, I could easily reverse the process by pulling the least significant
bit from every pixel channel, grouping the bits into 8-bit words and converting them back to
text. However, that only works if I know the key or variable used to encrypt the layer.
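The naive reversal described above looks like this in Python: pull the LSB of every channel, group the bits into 8-bit words, and decode. It is a sketch under the simplest possible assumptions: sequential pixels, R, G, B channel order, most significant bit first, and no key or encryption. As noted, those assumptions may well not hold for this mission. The function name is hypothetical.

```python
def extract_lsb_bytes(channel_values, msb_first=True):
    """Collect the LSB of every channel value and pack the bits into bytes.

    channel_values: flat sequence such as R0, G0, B0, R1, G1, B1, ...
    Trailing bits that do not fill a whole byte are dropped.
    """
    bits = [v & 1 for v in channel_values]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        group = bits[i:i + 8] if msb_first else reversed(bits[i:i + 8])
        byte = 0
        for bit in group:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

The result can be inspected with `extract_lsb_bytes(values).decode("latin-1")`, which never fails on arbitrary bytes. Here `values` would be built as `[c for px in pixels for c in px[:3]]`.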
Protocols such as those that hide quantum information in a codeword of a quantum
error-correcting code sent through a channel seem more likely to be the case. That would mean
that an eavesdropper who can monitor the channel, but lacks the secret key, cannot distinguish
the message from the channel noise. In other words, there must be something else at play that I
have not yet found. Also, noise would not refer only to the visual representation of the file. It
could relate to a hex dump or similar, that is, to any unreadable or corrupted data as a whole.
The idea behind eLSB shifting is that each pixel is replaced with a different value, which
makes the image totally unrecognizable. It is called enhanced because we eliminate the
high-level bits of every pixel except the least significant one. This is also the case in which
we can most often evaluate the layer by looking into the structure of the image: following, say,
an IDAT of 9 blocks, the last block will be either smaller than or equal to the previous ones
(rarely equal, in fact), which means that the previous ones have been altered and there is
literally no room left for the last one.
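The enhancement step can be sketched as follows. It also shows why the originals cannot be "restored" from an enhanced image: values that share the same LSB collapse to the same output. This is an illustrative sketch with hypothetical function names. With Pillow, the same mapping could be applied to an RGB image via `img.point(lambda v: 255 if v & 1 else 0)`.

```python
def enhance_lsb(channel_values):
    """Enhanced LSB view: keep only bit 0 of each value and stretch it to 0 or 255."""
    return bytes(255 if v & 1 else 0 for v in channel_values)


def lsb_plane(enhanced):
    """Recover the bare LSB plane (0/1) from an enhanced image; nothing more survives."""
    return [1 if v >= 128 else 0 for v in enhanced]
```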
One of the few techniques that can be used to detect eLSB steganography (and actually
differentiate it from the quantum kind) is statistical analysis. The chi-square module presents
its data as described below.
The program will output a graph with two curves. The first one in red is the result of the chi-square test. If it's close to one, then the probability for a random embedded message is high. So, if there is a
random message embedded, this green curve will stay around 0.5. On the graph network, every
vertical blue line represents 1 kilobyte of embedded data.
- Somnium, a.k.a. Guillermito
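The red and green curves can be reproduced in a minimal form. The sketch below implements the classic pairs-of-values chi-square test described by Westfeld and Pfitzmann, on the assumption that this is close to what the chi-square module computes. Guillermito's implementation may differ in detail, for example in how sparse histogram bins are handled. Here the red curve is the test's p-value on a growing prefix of the channel values, which helps explain the latency described earlier. The green curve is the mean LSB of each chunk. The names and the chunk size are hypothetical. The incomplete gamma function uses the standard series and continued-fraction method, written out to avoid a SciPy dependency.

```python
import math


def gamma_q(a: float, x: float) -> float:
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0:
        return 1.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        n = a
        for _ in range(1000):
            n += 1
            term *= x / n
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefix)
    # Modified Lentz continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        h *= d * c
        if abs(d * c - 1) < 1e-15:
            break
    return math.exp(log_prefix) * h


def pairs_of_values_p(values) -> float:
    """Chi-square pairs-of-values test; results near 1 suggest LSB embedding."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    stat, pairs = 0.0, 0
    for k in range(128):
        expected = (hist[2 * k] + hist[2 * k + 1]) / 2
        if expected > 0:  # skip value pairs that never occur
            stat += (hist[2 * k] - expected) ** 2 / expected
            pairs += 1
    if pairs < 2:
        return 0.0
    return gamma_q((pairs - 1) / 2, stat / 2)


def chi_square_curves(values, chunk_size=8192):
    """Red: p-value on a growing prefix. Green: mean LSB of each chunk.

    8192 channel values carry 1 kB of payload at one bit per value.
    """
    red, green = [], []
    for end in range(chunk_size, len(values) + 1, chunk_size):
        red.append(pairs_of_values_p(values[:end]))
        chunk = values[end - chunk_size:end]
        green.append(sum(v & 1 for v in chunk) / chunk_size)
    return red, green
```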
This is a sample representation of how the LSBs are enhanced and set to either 255 or 0.
Basically, the noise level depends on how much data we want to hide and, of course, on the size
of the image, its color depth, etc.
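The amount of noise the embedding adds can be illustrated by overwriting LSBs sequentially and counting the channel values that actually change. This is a generic sequential LSB embedder, not the method used in the mission (which is unknown), and the names are hypothetical. With a random-looking payload, roughly half of the overwritten LSBs already hold the right bit. The changed share is therefore about half the payload bits divided by the number of channel values.

```python
def embed_lsb(channel_values, payload: bytes):
    """Overwrite the LSBs of the first len(payload) * 8 values, MSB of each byte first."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(channel_values):
        raise ValueError("payload does not fit in the cover")
    stego = list(channel_values)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear bit 0, then set it to the payload bit
    return stego


def changed_fraction(cover, stego):
    """Share of channel values that differ between cover and stego."""
    return sum(a != b for a, b in zip(cover, stego)) / len(cover)
```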
Now let us say an audio file is involved as well. If we are good enough at steganography, we
could combine eLSB with an audio rendering of an image and come up with an incredibly secure
layer. Consider a file called fuke.wav which has been altered and has some data hidden within it.
One way to check for anything unusual is to run a frequency analysis on the file and see whether
there is anything worth pursuing.
First, let's look at a temporal analysis alongside a TFFT. The difference between a plain FFT and
this kind of time-frequency analysis is that the TFFT studies the signal in both time and
frequency. The plain FFT only gives the frequency content of the signal as a whole. In other
words, we need to compute a spectrum over successive time windows in order to see how the
frequencies change over time.
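If the TFFT is read as a short-time (windowed) FFT, the difference from a plain FFT is that the signal is cut into overlapping frames and each frame gets its own spectrum. Stacking those spectra gives the spectrogram. Below is a dependency-free sketch that assumes a 16-bit PCM .wav such as fuke.wav. The function names, window length and hop are illustrative choices. For real files, `numpy.fft.rfft` would replace the naive DFT.

```python
import cmath
import io
import math
import struct
import wave


def read_wav_mono(source):
    """Read 16-bit PCM WAV (path or binary file object); return (rate, channel 0 in [-1, 1))."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate, channels = w.getframerate(), w.getnchannels()
        raw = w.readframes(w.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return rate, [s / 32768 for s in samples[::channels]]


def stft_magnitudes(signal, window=256, hop=128):
    """Short-time Fourier transform magnitudes: one row per frame, one column per bin."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(window)]
        # Naive O(N^2) DFT keeps this dependency-free.
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window)
                               for n in range(window)))
                       for k in range(window // 2 + 1)])
    return frames


def sine_wav_bytes(freq, rate=8000, n=1024, amplitude=10000):
    """Build a mono 16-bit WAV in memory (test signal only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(
            struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq * t / rate)))
            for t in range(n)))
    return buf.getvalue()
```

Bin `k` of each frame corresponds to `k * rate / window` Hz. With an 8000 Hz rate and a 64-sample window, for example, a 1000 Hz tone peaks in bin 8.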
If that does not suit us, we can use sox on Linux boxes to generate a similar spectrum. Note
that sox reads .wav files out of the box (and .wav is the format that most software supports),
while support for some other formats, such as MP3, depends on optional libraries. Now, to
output a spectrum we do the following:
Code:
We may have to use a converter such as ffmpeg to convert the file to .wav if it was previously
generated in a different format. And so we end up with the following:
The following spectrograms are similar to that one. The first uses a dBV^2 scale with a
1024-point window at roughly 85%, and the second a linear scale with a 2048-point window at 90%+
and logarithmic frequency bins. As we can plainly see, they are much more legible. The same
would apply if we managed to scale the sox spectrogram and tweak it as much as possible, but
frankly I do not think sox offers such options.
Frankly speaking, there is a lot of software for embedding and extracting data, but none of it
is actually effective when it comes to reversing the process. In this case, the only possible
way to reverse it would be to pull the least significant bit from every pixel channel, group the
bits into 8-bit words and convert them back to text. That would only be possible if we had some
clue as to which pixels have been altered, which, as mentioned earlier, we do not. Matlab,
however, is not the only option we are left with. There are numerous software distributions for
that purpose, though many of the people who are (or have been) capable of reaching this point
will be experienced enough to code their own script for it (even if it is less optimized and
functional).
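We do not know which pixels, channel order or bit order were used. One cheap option for a home-made script is therefore to try every simple combination and rank the results by how text-like they are. Below is a minimal sketch of that idea, assuming row-major pixel order and plain, unencrypted text. The printable-ASCII score is a crude heuristic chosen for illustration, and all names are hypothetical.

```python
from itertools import permutations


def bits_to_bytes(bits, msb_first=True):
    """Pack groups of 8 bits into bytes, dropping an incomplete final group."""
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        group = bits[i:i + 8] if msb_first else bits[i:i + 8][::-1]
        out.append(int("".join(map(str, group)), 2))
    return bytes(out)


def printable_ratio(data: bytes) -> float:
    """Share of bytes that are printable ASCII or tab/newline/carriage return."""
    if not data:
        return 0.0
    return sum(32 <= b < 127 or b in (9, 10, 13) for b in data) / len(data)


def rank_lsb_readings(pixels, limit=4096):
    """Score every channel order and bit order; best candidate first.

    pixels: list of (R, G, B) tuples in row-major order. Only the first
    `limit` pixels are read, because the extent of the payload is unknown.
    """
    results = []
    for order in permutations(range(3)):
        bits = [px[c] & 1 for px in pixels[:limit] for c in order]
        for msb_first in (True, False):
            data = bits_to_bytes(bits, msb_first)
            label = "".join("RGB"[c] for c in order) + (" msb" if msb_first else " lsb")
            results.append((printable_ratio(data), label, data))
    # Stable sort: equal scores keep their enumeration order.
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Extending the search to column-major scanning or other pixel subsets only means generating more candidate bit streams. The ranking step stays the same.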
That being said, the mission remains a mystery that I have not been able to solve. There are
more avenues one could pursue in this challenge than even an experienced steganographer can
imagine.