My Journey to Cracking Steganography Mission 15 at HackThisSite

by Ivan Ivanov Petrov (Keeper)

FIRST EDITION

About the underground

The website was founded by Jeremy Hammond in late 2003. For a long time, it has been the target of many different organizations trying to gain control over it and destroy the community.

In November 2004 the (now defunct) HackThisSite-based HowDark Security Group notified the phpBB Group, makers of the phpBB bulletin software, of a serious vulnerability in the product. The vulnerability was kept under wraps while it was brought to the attention of the phpBB admins, who after reviewing, proceeded to downplay its risks. Unhappy with the Groups' failure to take action, HowDark then published the bug on the bugtraq mailing-list. Malicious users found and exploited the vulnerability which led to the takedown of several phpBB-based bulletin boards and websites. Only then did the admins take notice and release a fix. Slowness to patch the vulnerability by end-users led to an implementation of the exploit in the Perl/Santy worm which defaced upwards of 40,000 websites and bulletin boards within a few hours of its release.

- Wikipedia, the free encyclopedia

The community is dedicated to facilitating an open learning environment by providing a series of hacking challenges, articles, resources, and discussion of the latest happenings in hacker culture. It is an online movement of artists, activists, hackers and anarchists who are organizing to create new worlds.

Considering that several of the hacking challenges are simulated web defacements, the question of the ethics of hacking is repeatedly brought up. They consider hacking itself to be a tool, a skill which in itself is neutral, a means without an end. It can be used for good (for the benefit of all) or bad (mindless destruction or theft). They do not encourage negative use of the information they provide. They are more concerned with the greater risks of not distributing this information and are ready to accept the consequences.

About Steganography Mission 15

The mission originally had a fairly simple solution, until a follow-up update of the entire challenge altered the concept completely. It drew attention because many steganographers, famous and less well known alike, have tried to figure out the idea behind it, with little success so far. Since 2008 the challenge has been solved by only eighteen people worldwide (whose origin remains unknown). Some claim that a few of those were the website's own administrators, who have access to the answer to every challenge submitted to the board. Others are inclined to believe that the solutions were the result of exhaustive search attacks (a.k.a. brute-force attacks).

My involvement in this mission started back in 2012, when I first had the chance to be introduced to steganography. At first, I thought there wasn't anything special about it, but soon after I took it to a higher level and was unable to solve it, I found out that it was an underground competition.

Before we proceed any further, let us bring out the foremost details that need to be mentioned, beginning with the image itself.

The steganographic image (.PNG) has a divided IDAT structure of 12 blocks (the last one slightly smaller). The data seems to have been concealed by altering the enhanced LSB values, that is, by eliminating the high-level bits of each pixel and keeping only the least significant bit. Every byte is then either 0 or 1, and since 0 or 1 on a range of 256 values will not give any visible color, the values are enhanced: a 0 stays at 0, and a 1 becomes the maximum value, 255. Initial analyses of the image did not show anything specific or odd beyond the complete absence of one value from any of the three color channels (RGB) and the heightened presence of another value in one third of the color values. Studying these values and replacing bytes has given me nothing, however, and I was at a loss as to whether this avenue was even worth pursuing.
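The IDAT layout mentioned above can be checked without any special tooling. What follows is only a minimal sketch, assuming the standard PNG chunk layout (4-byte big-endian length, 4-byte type, data, CRC-32 over type and data); the helper names and the tiny synthetic image are made up for illustration and have nothing to do with the mission file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_chunks(data):
    """Return (type, length, crc_ok) for every chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = []
    pos = 8
    while pos + 12 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(crc_bytes) < 4:
            break  # truncated file, stop quietly
        stored_crc = struct.unpack(">I", crc_bytes)[0]
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == stored_crc
        chunks.append((ctype.decode("ascii"), length, crc_ok))
        pos += 12 + length
    return chunks


def make_chunk(ctype, body):
    """Build one well-formed chunk (hypothetical helper, demo only)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def build_demo_png():
    """Synthetic 1x1 RGB image whose pixel data is split over two IDAT chunks."""
    raw = zlib.compress(b"\x00\xff\x00\x00")  # filter byte + one red pixel
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", raw[:5]) + make_chunk(b"IDAT", raw[5:])
            + make_chunk(b"IEND", b""))


for ctype, length, ok in list_chunks(build_demo_png()):
    print(ctype, length, "crc ok" if ok else "crc BAD")
```

To look at a real file, the bytes read from it with open(path, "rb").read() can be passed to list_chunks in place of the synthetic image.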

Hence, I looked into developing a script in either Python, PHP or C/C++ that would reverse the process and 'restore' the enhanced LSBs. Automating the process improves the odds of success, since a number of different analyses can be carried out in a matter of seconds, whereas it would take quite a while for a single person to conduct these experiments by hand. After converting the image to a 24-bit .BMP and following the red curve of a chi-square steganalysis, it appears almost certain that there is hidden data within the file, so none of this effort has been or will be in vain.

First, there are a little more than 8 vertical zones in the chi-square graph. That means that the hidden data is a little more than 8kB in size. One pixel can be used to hide three bits (one in the LSB of each RGB color tone). So we can hide (98x225)x3 bits. To get the number of kilobytes, we divide by 8 and by 1024: ((98x225)x3)/(8x1024). Well, that should be around 8.1 kilobytes. However, that is not the case here.
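The arithmetic above is easy to double-check. This is just a sketch of the same formula, assuming one hidden bit per RGB channel as stated; the function name is mine.

```python
def lsb_capacity_bits(width, height, channels=3, bits_per_channel=1):
    """Number of bits that fit when every channel carries bits_per_channel LSBs."""
    return width * height * channels * bits_per_channel


bits = lsb_capacity_bits(98, 225)
kilobytes = bits / 8 / 1024  # bits -> bytes -> kilobytes
print(bits, round(kilobytes, 2))  # 66150 8.07
```

So the figure of roughly 8.1 kilobytes quoted above is the rounded-up form of about 8.07.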

The analysis of the APP0 and APP1 markers of a .JPG version of the file also gave some odd output:

Start Offset: 0x00000000

*** Marker: SOI (xFFD8) ***
  OFFSET: 0x00000000

*** Marker: APP0 (xFFE0) ***
  OFFSET: 0x00000002
  length = 16
  identifier = [JFIF]
  version = [1.1]
  density = 96 x 96 DPI (dots per inch)
  thumbnail = 0 x 0

*** Marker: APP1 (xFFE1) ***
  OFFSET: 0x00000014
  length = 58
  Identifier = [Exif]
  Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
  Endian = Motorola (big)
  TAG Mark x002A = x[002A]
  EXIF IFD0 @ Absolute x[00000026]
  Dir Length = x[0003]
  [IFD0.x5110 ] =
  [IFD0.x5111 ] = 0
  [IFD0.x5112 ] = 0
  Offset to Next IFD = [00000000]

*** Marker: DQT (xFFDB) ***
  Define a Quantization Table.
  OFFSET: 0x00000050
  Table length = 67
  ----
  Precision=8 bits
  Destination ID=0 (Luminance)
  DQT, Row #0: 2 1 1 2 3 5 6 7
  DQT, Row #1: 1 1 2 2 3 7 7 7
  DQT, Row #2: 2 2 2 3 5 7 8 7
  DQT, Row #3: 2 2 3 3 6 10 10 7
  DQT, Row #4: 2 3 4 7 8 13 12 9
  DQT, Row #5: 3 4 7 8 10 12 14 11
  DQT, Row #6: 6 8 9 10 12 15 14 12
  DQT, Row #7: 9 11 11 12 13 12 12 12
  Approx quality factor = 94.02 (scaling=11.97 variance=1.37)

[Figures: Chi-square analysis (Java module); Chi-square analysis (Batch module)]
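The offsets in a listing like this one follow directly from the segment lengths, which can be verified with a few lines of code. The sketch below assumes the usual JPEG segment layout (0xFF, a marker byte, then a 2-byte big-endian length that counts itself but not the marker). The synthetic segments only mimic the lengths reported above; their payloads are zeros, not the real data, and the helper names are mine.

```python
import struct

MARKER_NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1",
                0xDB: "DQT", 0xDA: "SOS", 0xD9: "EOI"}


def walk_markers(data):
    """Return (offset, name, length) for each segment up to the start of scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [(0, "SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        code = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, MARKER_NAMES.get(code, "x%02X" % code), length))
        if code == 0xDA:  # entropy-coded data follows, stop here
            break
        pos += 2 + length  # marker (2 bytes) + length field and payload
    return segments


def make_segment(code, payload):
    """Build one segment (hypothetical helper, demo only)."""
    return b"\xff" + bytes([code]) + struct.pack(">H", len(payload) + 2) + payload


demo = (b"\xff\xd8" + make_segment(0xE0, bytes(14))
        + make_segment(0xE1, bytes(56)) + make_segment(0xDB, bytes(65))
        + b"\xff\xd9")
for offset, name, length in walk_markers(demo):
    print("0x%08X %-4s length=%d" % (offset, name, length))
```

With lengths of 16, 58 and 67 the segments land at 0x02, 0x14 and 0x50, which agrees with the offsets in the dump.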

Since I am nearly convinced that no encryption algorithm has been applied, and therefore that no key is involved in the concealment, my idea is to code a script that would shift the LSB values back and return the originals. The file was run through several structure analyses, statistical attacks, BPCS and a few others.

The histogram of the image shows a specific color with an unusual spike to it. I manipulated that as best I could to try and view any hidden data, but to no avail. The histograms of the RGB values are as follows:

Then there are the multiple IDAT chunks. I put together a similar image by defining random color values for each pixel location, and I too wound up with several of these. Unfortunately, very little was found inside of them. Even more interesting is the way that color values are repeated in the image. It seems as though the frequency of reused colors could hold some clue, yet I did not fully understand that relationship, if any exists at all. Additionally, there is only a single column and a single row of pixels that do not have a full value of 255 on their alpha channel. I even interpreted the X, Y, A, R, G, and B values of every pixel in the image as ASCII, but wound up with nothing legible. Even the green curve of the average of the LSBs cannot tell us anything. There is no evident break. Here are several other histograms which show the weird curve of the blue channel:

The red curve shows some difference. It can see something that we cannot spot (yet). Statistical detection is more sensitive than our eyes, and I guess that was my final point. However, there is also a sort of latency in the red curve. Even without hidden data, it starts at its maximum and stays there for some time, which is close to a false positive. It looks like the LSBs in the image are very close to random, and the algorithm needs a large population (keep in mind that the analysis was carried out on a steadily growing population of pixels) before it reaches the threshold at which it can decide whether the red curve has to go down or stay up, depending on the state of the pixels (which are never randomized). The same sort of latency occurs when there is hidden data. You hide 1kB or 2kB of data, but the red curve pays no attention to that and does not change direction right after this amount of data. It waits a little (in our situation until around 1.3kB and 2.6kB respectively). Here is a representation of the data types from a hex editor:

Here's another spectrum to confirm the behavior of the blue (RGB) value. Notice the sudden

curve at the beginning.

As mentioned above, there is no evident clue as to the original values of the RGB and alpha channels. They are set to either 255 or 0 depending on their least significant bit. The other option that was on my mind at that moment was that the mission was intended to implement a protocol for the usage of quantum steganography. Matlab and a few other steganalysis techniques seem tempting, but only to a certain degree. As far as I can tell, the only steganalysis attack that can reveal whether anything is concealed with the eLSB technique is the chi-square test. As for Matlab, the tools it offers are of no great use, since they are restricted to the information the user supplies, and we currently have none that is valid. In particular, I could easily reverse the process by pulling the least significant bit from every pixel channel, grouping the bits into words of 8 and converting them back to text. However, that would only work if I knew the key or variable used for the layer encryption.
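For reference, the naive reversal described above looks roughly like this. It is only a sketch under assumptions that the mission does not confirm: pixels are read in row-major order, channels in R, G, B order, bits are packed most significant first, and no key or pixel selection is involved. The embedding helper exists purely so that the example can be checked against itself.

```python
def extract_lsb_bytes(pixels):
    """Collect the LSB of every channel and pack the bits MSB-first into bytes."""
    bits = [channel & 1 for pixel in pixels for channel in pixel]
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):  # ignore leftover bits
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed_lsb_bytes(pixels, message):
    """Write message bits into the channel LSBs (hypothetical helper, demo only)."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    flat = [channel for pixel in pixels for channel in pixel]
    if len(bits) > len(flat):
        raise ValueError("cover image too small for the message")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


cover = [(10, 20, 30)] * 6  # six RGB pixels give 18 usable bits
stego = embed_lsb_bytes(cover, b"Hi")
print(extract_lsb_bytes(stego))
```

If the bit order, the channel order or the set of pixels used differs from these assumptions, the output is just noise, which is exactly the situation described above.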

Protocols such as those for hiding quantum information in a codeword of a quantum error-correcting code passing through a channel seem more likely to me to be the case. This means that an eavesdropper who has the power to monitor the channel, but not the secret key, cannot distinguish the message from the channel noise. In other words, there must be something else at play that I have not yet found. Also, noise does not only refer to the visual representation of the file. It could be related to a hex dump or the like, that is, to any unreadable or corrupted data as a whole.

The idea behind eLSB shifting is that each pixel is replaced with a different value, which makes the image totally unrecognizable. It is called enhanced because we eliminate the high-level bits of every pixel except for the least significant one. This is also the case where we can most often evaluate the layer by looking into the structure of the image: given, let us say, an IDAT structure of 9 blocks, the last block will be either smaller than or equal to the previous ones (rarely equal, in fact), which I take to mean that the previous ones have been altered and that there is literally no room left in the last one.

One of the few techniques that can be used to detect eLSB steganography (and actually differentiate it from quantum steganography) is statistical analysis. The chi-square module presents its data as described below.

The program will output a graph with two curves. The first one in red is the result of the chi-square test. If it's close to one, then the probability for a random embedded message is high. So, if there is a random message embedded, this green curve will stay around 0.5. On the graph network, every vertical blue line represents 1 kilobyte of embedded data.

- Somnium, a.k.a. Guillermito
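To make the two curves less of a black box, here is a minimal sketch of the quantities behind them. I am assuming that the module uses the commonly described pairs-of-values chi-square test, in which the counts of each value pair (2k, 2k+1) are compared with their mean; this is my assumption about the tool, not something its output confirms. The function names are mine.

```python
from collections import Counter


def chi_square_pov(values):
    """Chi-square statistic over the value pairs (2k, 2k+1) of 8-bit samples.

    Returns (statistic, degrees_of_freedom). A statistic near zero means the
    two values of each pair occur about equally often, as random LSBs would
    make them.
    """
    counts = Counter(values)
    statistic = 0.0
    pairs_used = 0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2
        if expected > 0:  # skip pairs that never occur
            statistic += (even - expected) ** 2 / expected
            pairs_used += 1
    return statistic, max(pairs_used - 1, 0)


def average_lsb(values):
    """Mean of the least significant bits (what the green curve plots)."""
    return sum(v & 1 for v in values) / len(values)


equalized = [10, 11, 20, 21] * 50   # pairs perfectly balanced
skewed = [10] * 100 + [20] * 100    # only even values present
print(chi_square_pov(equalized), average_lsb(equalized))
print(chi_square_pov(skewed), average_lsb(skewed))
```

Turning the statistic into the probability plotted in red needs the chi-square survival function (scipy.stats.chi2.sf(statistic, dof) if SciPy is available), and the graph is obtained by repeating the test on a growing prefix of the image bytes.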

This is a sample representation of how the LSBs are enhanced and set to either 255 or 0. Basically, the noise level depends on how much data we want to hide and, of course, on the size of the image, the color capacity etc.
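In code, the enhancement itself is a one-liner. The sketch below assumes the pixels are available as a list of channel tuples, which is my choice of representation for the example and not something prescribed by any tool.

```python
def enhance_lsb(pixels):
    """Keep only the LSB of each channel and stretch it to 0 or 255."""
    return [tuple(255 * (channel & 1) for channel in pixel) for pixel in pixels]


sample = [(254, 255, 0), (1, 2, 3)]
print(enhance_lsb(sample))  # [(0, 255, 0), (255, 0, 255)]
```

Applied to a whole image, untouched areas usually keep some visible structure, while areas whose LSBs carry data look like noise.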

Now let us say there is some sound or other audio file involved. If we are good enough with steganography, we could mix both eLSB and an audio rendering of an image and come up with a far more secure layer. Suppose we have a file called fuke.wav which is somewhat altered and has some data within it. One of the ways to check for anything specific is to put the file under a frequency analysis and see whether there is something worth pursuing. First let's see a temporal analysis alongside a TFFT. The difference between the two is that a plain FFT describes the frequency content of the signal as a whole, while the TFFT studies both the time and the frequency of the signal (in other words, we need to define a spectrum over successive windows in order to see the temporal frequencies).
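The difference is easier to see on a toy signal. The following sketch computes a windowed transform frame by frame, which is what I understand a time-frequency analysis to do; it uses a slow textbook DFT so that it stays dependency-free, and the window size, hop size and test tones are arbitrary choices of mine.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first n/2 + 1 DFT bins of one frame (naive O(n^2))."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def stft_magnitudes(samples, window=64, hop=32):
    """Hann-windowed DFT of successive frames: rows are time, columns frequency."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window - 1))
            for i in range(window)]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = [samples[start + i] * hann[i] for i in range(window)]
        frames.append(dft_magnitudes(frame))
    return frames


def peak_bin(row):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(row)), key=row.__getitem__)


# a tone that jumps from bin 4 to bin 12 halfway through the signal
samples = ([math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
           + [math.sin(2 * math.pi * 12 * t / 64) for t in range(256)])
spectrogram = stft_magnitudes(samples)
print(peak_bin(spectrogram[0]), peak_bin(spectrogram[-1]))
```

A single FFT over all 512 samples would show both tones at once with no hint of when each one occurs, whereas the rows of the windowed version show the jump.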

If that does not suit us, we can use sox on Linux boxes and generate a similar spectrum. Note that sox is happiest with .wav files (which is pretty much the format that most software worships), and its support for other formats depends on how it was built. To output a spectrum we do the following:

Code:
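A minimal sketch of such a call, wrapped in Python to stay consistent with the other examples. It assumes that sox is installed and was built with its spectrogram effect; the output name fuke.png is a placeholder of mine, and this is not necessarily the exact command used for the picture that follows.

```python
import shutil
import subprocess


def sox_spectrogram_command(wav_path, png_path):
    """Equivalent to typing: sox <wav_path> -n spectrogram -o <png_path>"""
    return ["sox", wav_path, "-n", "spectrogram", "-o", png_path]


def render_spectrogram(wav_path, png_path):
    """Run sox if it is on the PATH, otherwise fail with a clear message."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on the PATH")
    subprocess.run(sox_spectrogram_command(wav_path, png_path), check=True)


print(" ".join(sox_spectrogram_command("fuke.wav", "fuke.png")))
```

The -n argument stands in for the output file, since only the picture written by the spectrogram effect is of interest here.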

We may have to use a converter like ffmpeg or similar to convert the file first if it is in a format other than .wav. And so we end up with the following:

Similar to that spectrogram are the following two. The first one uses a dBV^2 scale on a 1024-point window at roughly 85%, and the second one a linear scale and a 2048-point window at over 90% with a log bin. The result is much more visible, as we can plainly see. The same would apply if we managed to scale the sox spectrogram and manipulate it as best we can, but I frankly do not think sox offers such possibilities.

Frankly speaking, there is a lot of software for embedding and extracting data, but none of it is actually effective when it comes to reversing the process. In this case the only possible way to reverse it would be to pull the least significant bit from every pixel channel, group the bits into words of 8 and convert them back to text, but that would only be possible if we had any clue as to which pixels have been altered (which we do not, as mentioned earlier). Matlab, however, is not the only possibility we are left with. There are numerous software packages for that purpose, though a lot of the people who are, or have been, capable of reaching this point will be experienced enough to code their own script for it (even if it ends up less optimized and functional).

That being said, the ultimate mission remains a mystery which has led me nowhere so far. The avenues one could pursue throughout this challenge are literally more than even an experienced steganographer can imagine.
