Cracking the Code: Foundations of Cryptology

A brief introduction to the underlying terms and concepts of cryptography and cryptanalysis

Martina Weber


Project Definition & Requirements

Design and implement tools that allow you to quickly crack XOR-encryption schemes.

General Requirements:

– XOR-encrypt a text using a key.

– Given an encrypted message, produce the original message.

– Analyze the “quality” of various techniques and solutions.

– Create a Human Computer Interface for the system.

The Story Line

Alice needs to send a classified message to Bob, but she does not want her archrival, Eve, to learn the confidential information. Therefore, Alice and Bob agree to disguise their message by employing an encryption scheme with an agreed-upon key. But Eve is clever and devious...

Defining the Terms

Plain text - the text that Alice wishes to transmit to Bob, in its original form

Cipher text - the result of Alice encrypting the text with the key

Decrypt - reconstructing the plaintext using the cipher text and the key

The Conventions

To distinguish the original text from the encoded text, plaintext characters will be written in lower case and cipher text characters in upper case.

Generally, it is standard to omit all punctuation and spaces from the plaintext. This is done to eliminate analysis based on sentence structure and word length in the cipher text.

Eve’s Attack: Cryptanalysis

At first, Eve is baffled, but then she realizes that Alice and Bob only know two encryption schemes. Better yet, Eve is confident in her ability to cryptanalyze these schemes and knows she will be able to crack the code.

The Encryption Schemes

Monoalphabetic Substitution Cipher: Each plaintext character in a message is substituted with a unique alternate character to obtain the cipher text; thus any given letter of the alphabet is always enciphered by the same cipher text letter.

Polyalphabetic Substitution Cipher: The plaintext message is encoded with a keyword of length m. Thus, a character in the original text can be enciphered with any of the m characters of the keyword, depending on its position in the message, to produce the cipher text.

A Closer Look at Monoalphabetic Substitution Ciphers

When a monoalphabetic substitution cipher is used, there is a one-to-one correspondence between the characters in the plaintext and the characters in the cipher text.

A Simple Example Using a Monoalphabetic Substitution Cipher

The following is the key used:

Plaintext:   a b c d e f g h i j k l m n o p q r s t u v w x y z
Ciphertext:  J K L M N O P Q R S T U V W X Y Z A B C D E F G H I

Example using the key:

Plaintext:   thisiseasy
Ciphertext:  CQRBRBNJBH

To decrypt, simply look up the encrypted character in the table and use the plaintext character listed directly above it.
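The lookup in the key table can be sketched in a few lines of Python. This is only an illustration, not the project's code: the names `PLAIN`, `CIPHER`, `mono_encrypt` and `mono_decrypt` are hypothetical, and the cipher alphabet is copied from the key table.

```python
import string

# Plaintext alphabet (lower case) and the cipher alphabet from the key table (upper case).
PLAIN = string.ascii_lowercase
CIPHER = "JKLMNOPQRSTUVWXYZABCDEFGHI"

# str.maketrans builds a one-to-one lookup table in each direction.
ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def mono_encrypt(plaintext: str) -> str:
    """Replace every plaintext letter with its fixed cipher letter."""
    return plaintext.translate(ENCRYPT_TABLE)


def mono_decrypt(ciphertext: str) -> str:
    """Look each cipher letter up in the table and return the letter above it."""
    return ciphertext.translate(DECRYPT_TABLE)


if __name__ == "__main__":
    secret = mono_encrypt("thisiseasy")
    print(secret, mono_decrypt(secret))
```

Because each plaintext letter always maps to the same cipher letter, the letter frequencies of the plaintext survive encryption unchanged, which is what makes this scheme easy to attack.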

A Closer Look at Polyalphabetic Substitution Ciphers

When a polyalphabetic substitution cipher is used, there is NO one-to-one correspondence between the characters in the plaintext and the characters in the cipher text; a character could have been encoded using any of the m letters of the keyword.

Understanding Polyalphabetic Substitution Ciphers will Require a “New” Alphabet...

Instead of alphabetic characters, the new notation uses the numerical position (0 to 25) of a given letter

For example, A = 0, B = 1, ..., Y = 24, Z = 25

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

...And a Modest Mathematical Background in Modular Arithmetic

(x mod m) is evaluated as the remainder when dividing x by m

Modular addition ([x+y] mod m) is performed by first adding x and y and then reducing the result modulo m. Because of the reduction, adding two numbers in the range 0 to m-1 always yields a number in the range 0 to m-1.

Examples:

6 mod 3 = 0

5 mod 3 = 2

20 mod 7 = 6

10 mod 7 = 3

Let m = 26

(7+8) mod 26 = 15

(20 + 6) mod 26 = 0

(17 + 11) mod 26 = 2

(23 + 25) mod 26 = 22
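The same sums can be checked in Python, whose `%` operator returns the remainder. This is a minimal sketch; the helper names `add_mod` and `additive_inverse` are made up for illustration.

```python
M = 26  # size of the alphabet used throughout these slides


def add_mod(x: int, y: int, m: int = M) -> int:
    """Add x and y, then reduce the sum modulo m."""
    return (x + y) % m


def additive_inverse(x: int, m: int = M) -> int:
    """Return the number that adds with x to give 0 modulo m."""
    # In Python, % with a positive modulus never returns a negative value.
    return (-x) % m


if __name__ == "__main__":
    for x, y in [(7, 8), (20, 6), (17, 11), (23, 25)]:
        print(f"({x} + {y}) mod {M} = {add_mod(x, y)}")
```

The `additive_inverse` helper is included because subtraction modulo 26 is needed later for decryption.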

Here is a “Cheat Sheet” for Arithmetic Modulo 26

Addition mod 26 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 250 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 251 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 02 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 13 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 24 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 35 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 46 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 57 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 68 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 79 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8

10 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 911 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 1012 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 1113 13 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 1214 14 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 1315 15 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 1416 16 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1517 17 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1618 18 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1719 19 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1820 20 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1921 21 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 2022 22 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2123 23 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 2224 24 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 2325 25 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

A Simple Example Using a Polyalphabetic Substitution Cipher

First take the text to be encoded and convert it character by character into the respective numerical equivalent.

Then choose a key to use on the text. Convert this key into its numerical representation as well.

Next, write the converted key above the converted plaintext, repeating it as necessary, and add the characters together character by character, modulo 26.

Finally, convert the encoded numerical text back to alphabetic text (if you so wish)

Step One: Converting plaintext to numerical equivalent

Plaintext:              h  a  v  i  n  g  f  u  n  y  e  t
Numerical Equivalent:   7  0 21  8 13  6  5 20 13 24  4 19

Step Two: Converting key to numerical equivalent

Key:                    y  e  s
Numerical Equivalent:  24  4 18

Step Three: Adding the plaintext with the key, modulo 26

Plaintext:    7  0 21  8 13  6  5 20 13 24  4 19
+ Key:       24  4 18 24  4 18 24  4 18 24  4 18
=            31  4 39 32 17 24 29 24 31 48  8 37
Mod 26:       5  4 13  6 17 24  3 24  5 22  8 11

Step Four: Converting cipher text to its alphabetic equivalent

Ciphertext:              5  4 13  6 17 24  3 24  5 22  8 11
Alphabetic Ciphertext:   F  E  N  G  R  Y  D  Y  F  W  I  L

Thus, the plaintext “havingfunyet” is encoded with the key “yes” to the cipher text “FENGRYDYFWIL”.

Had Alice actually sent this message to Bob, he would decode it using the inverse procedure: subtract the key from the cipher text mod 26.

Note that subtracting modulo 26 means adding the additive inverse of an element. The inverse of a nonzero number x can be found by taking 26 - x (the inverse of 0 is 0). For the key “yes” (24 4 18) the inverses are 2 22 8, which appear in the “Inverse key” row of the decryption below.

Ciphertext:      5  4 13  6 17 24  3 24  5 22  8 11
+ Inverse key:   2 22  8  2 22  8  2 22  8  2 22  8
=                7 26 21  8 39 32  5 46 13 24 30 19
Mod 26:          7  0 21  8 13  6  5 20 13 24  4 19
Plaintext:       h  a  v  i  n  g  f  u  n  y  e  t
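The four encryption steps and the inverse procedure can be sketched in Python as follows. This is a minimal illustration under the slides' conventions (A = 0, ..., Z = 25, lower-case plaintext, upper-case cipher text); the function names are hypothetical and the key is assumed to be non-empty.

```python
A_LOWER = ord("a")
A_UPPER = ord("A")


def to_numbers(text: str) -> list:
    """Map letters to positions 0..25, ignoring case and skipping non-letters."""
    return [ord(ch) - A_LOWER for ch in text.lower() if "a" <= ch <= "z"]


def encrypt(plaintext: str, key: str) -> str:
    """Add the repeated key to the plaintext, letter by letter, modulo 26."""
    p, k = to_numbers(plaintext), to_numbers(key)
    return "".join(
        chr((x + k[i % len(k)]) % 26 + A_UPPER) for i, x in enumerate(p)
    )


def decrypt(ciphertext: str, key: str) -> str:
    """Subtract the repeated key from the cipher text, letter by letter, modulo 26."""
    c, k = to_numbers(ciphertext), to_numbers(key)
    return "".join(
        chr((x - k[i % len(k)]) % 26 + A_LOWER) for i, x in enumerate(c)
    )


if __name__ == "__main__":
    secret = encrypt("havingfunyet", "yes")
    print(secret, decrypt(secret, "yes"))
```

Subtracting a key letter modulo 26 gives the same result as adding its additive inverse, so `decrypt` does not need a separate inverse-key table.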

Initial Assumptions

Assumptions about the Language:

– The plaintext will be based on the English language

– When doing frequency analysis to determine the key used, I will assume the key is an actual word

Assumptions about the Method Used:

– I will be doing analysis on the XOR polyalphabetic substitution cipher

– XOR encryption will be modeled as addition mod 26, as used previously in the example (i.e. A = 0, B = 1, ..., Z = 25)

A Note on the Method

Using addition mod 26 (instead of converting letters to binary representations and doing XOR bit-by-bit) does not take away from the learning experience. This is because in this type of cryptanalysis, the algorithm analyzes one character at a time without regard to the actual character, noting only that it is a distinct character. Addition mod 26 simply provides an easier medium for both me and my peers to understand and convey the information, as we can talk about specific characters and not be concerned with the abnormal or unprintable characters that XOR encryption would otherwise produce.

Exploiting the Weaknesses

After Eve determines that Alice used a polyalphabetic cipher (after all, a monoalphabetic substitution cipher is too simple, even for Alice), she remembers the strategy for cracking the code: find the key length and then use a frequency analysis to determine either the plaintext or the key used for encryption.

Applicable Theories and Terms

Kasiski Test: In a polyalphabetic cipher text message, two identical segments of plaintext will be encrypted to the same cipher text whenever the distance between their occurrences in the plaintext is a multiple of the length of the keyword; therefore, if a string of characters appears repeatedly in the cipher message, it is likely that the distance between the occurrences is a multiple of the length of the keyword

Friedman Test: Used to determine whether a cipher text has been enciphered using a monoalphabetic or polyalphabetic substitution cipher. If the cipher used is polyalphabetic, the test also suggests the length of the keyword using the Index of Coincidence

Index of Coincidence: The probability of two letters randomly selected from a text being equal

– The expected frequencies of the letters A through Z in the English language are known. Using these probabilities, the index of coincidence for the language is approximately 6.5%. Hence, if two letters are arbitrarily chosen from an English text, about 6.5% of the time the letters would be the same.

– In a purely random text, the letters would occur with roughly the same frequency, resulting in the index of coincidence being about 3.8%.

Conventions and Abbreviations Employed

n = the length of the cipher text being cryptanalyzed

IC = Index of Coincidence, as discussed previously and represented by the following formula:

$$ IC = \frac{\sum_{i=0}^{25} f_i (f_i - 1)}{n (n - 1)} $$

where f_i represents the frequency (number of occurrences) of the i-th alphabetic character in the cipher text
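The formula can be evaluated directly from letter counts. The following Python sketch is illustrative only; the function name `index_of_coincidence` is an assumption, and returning 0.0 for texts shorter than two letters is a convenience choice rather than part of the definition.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Sum f_i * (f_i - 1) over all letters, divided by n * (n - 1)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # the formula is undefined here; 0.0 is an arbitrary fallback
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    print(index_of_coincidence("AABB"))
```

Values near 0.065 suggest English-like letter statistics, while values near 0.038 suggest text that looks random.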

Let the Code Breaking Begin...

Armed with this bank of knowledge, Eve can proceed to cryptanalyze Alice’s message to Bob. What are the methods she can use and how effective are the various techniques? What is the best approach?

And Now Onto the Fun Part...

Applying the theories and principles!

Determining the Key Length

I employed four distinct, yet related algorithms for finding the key length. These algorithms are outlined on the following slides.

Note: These algorithms can stand alone; however, they can be combined for increased accuracy

Algorithm One: “Plug it in”

Simply plug data into the following formula:

$$ \text{Key Length} = \frac{0.027\,n}{(n - 1)\,IC - 0.038\,n + 0.065} $$

where n is the length of the text and IC is the Index of Coincidence for a specific text

(Formula taken from Cryptology by Albrecht Beutelspacher, page 39.)
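As a sketch, the formula translates into a one-line Python function. The name `estimate_key_length` is hypothetical, and raising an error when the denominator is not positive is an assumption about how to handle texts whose IC is too low for the estimate to make sense.

```python
def estimate_key_length(n: int, ic: float) -> float:
    """Estimate the keyword length from the text length n and its index of coincidence."""
    denominator = (n - 1) * ic - 0.038 * n + 0.065
    if denominator <= 0:
        # Happens when the IC is at or below the random-text level of about 0.038.
        raise ValueError("IC is too low for the formula to give a usable estimate")
    return 0.027 * n / denominator


if __name__ == "__main__":
    # An IC of 0.065 (plain English statistics) corresponds to a key length of about 1.
    print(estimate_key_length(1000, 0.065))
```

The result is a real number, so in practice it would be rounded and treated as a rough guess to be confirmed by one of the other algorithms.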

Algorithm Two: Shift and Count

1. Make a duplicate copy of the cipher text.

2. Align the copy under the original, only shifted by x places.

3. Record x and the number of coincidences (i.e. where the letters match).

4. Increase x and go to step two.

5. The shift with the most coincidences is a likely guess for the key length.

(Algorithm taken from Introduction to Cryptography with Coding Theory by Wade Trappe and Lawrence C. Washington, page 19.)
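The steps above can be sketched in Python as shown here. The function names and the `max_shift` cutoff are illustrative assumptions; the slides do not say how far x is increased.

```python
def count_coincidences(ciphertext: str, shift: int) -> int:
    """Count positions where the text matches a copy of itself shifted by `shift` places."""
    return sum(a == b for a, b in zip(ciphertext, ciphertext[shift:]))


def shift_and_count(ciphertext: str, max_shift: int = 20) -> dict:
    """Record the number of coincidences for every shift from 1 to max_shift."""
    limit = min(max_shift, len(ciphertext) - 1)
    return {x: count_coincidences(ciphertext, x) for x in range(1, limit + 1)}


if __name__ == "__main__":
    table = shift_and_count("ABCABCABC", max_shift=4)
    print(table, "best shift:", max(table, key=table.get))
```

Multiples of the key length also tend to score well, so the shift with the most coincidences is a candidate to check rather than a guaranteed answer.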

Algorithm Three: Friedman Test

1. For m = 1 to n:

2. Fill the ROWS of a rectangular array that has m columns and about n/m rows with consecutive substrings of length m from the cipher text.

3. Compute the IC of each COLUMN.

4. Find the average of all the column ICs.

5. If the average IC is approximately 0.065, stop: m is the likely keyword length. Otherwise, continue the loop.

(Algorithm adapted from Cryptological Mathematics by Robert Edward Lewand, pages 90 - 92.)
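A possible Python sketch of this loop follows. It is not the project's implementation: the names are hypothetical, `max_m` caps the search for practicality, and the `threshold` of 0.06 is an assumed stand-in for "approximately 0.065".

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IC of a string of upper-case letters; 0.0 if it has fewer than two letters."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def average_column_ic(ciphertext: str, m: int) -> float:
    """Write the text in rows of length m and average the IC of the m columns."""
    columns = [ciphertext[i::m] for i in range(m)]  # column i = every m-th letter
    return sum(index_of_coincidence(col) for col in columns) / m


def guess_key_length(ciphertext: str, max_m: int = 20, threshold: float = 0.06):
    """Return the first m whose average column IC reaches the threshold, else None."""
    for m in range(1, max_m + 1):
        if average_column_ic(ciphertext, m) >= threshold:
            return m
    return None
```

Slicing with `ciphertext[i::m]` reads a column without building the array explicitly. On short texts the column ICs are noisy, so the threshold may need tuning.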

Algorithm Four: Kasiski Test

1. Determine repeating strings of characters in the cipher text (of length at least three).

2. Tabulate the distances between occurrences.

3. The probable key length is a divisor of the greatest common divisor (GCD) of all the distances.

(Algorithm adapted from Cryptological Mathematics by Robert Edward Lewand, pages 90 - 92, and Cryptography Theory and Practice by Douglas R. Stinson, page 31.)
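These three steps might look as follows in Python. This is a sketch under simplifying assumptions: only substrings of one fixed length are examined, distances are measured between consecutive occurrences, and the function names are made up.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_distances(ciphertext: str, length: int = 3) -> list:
    """Distances between consecutive occurrences of each repeated substring."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = []
    for starts in positions.values():
        distances.extend(b - a for a, b in zip(starts, starts[1:]))
    return distances


def kasiski_gcd(ciphertext: str, length: int = 3) -> int:
    """GCD of all repeat distances, or 0 if no substring repeats."""
    distances = repeated_distances(ciphertext, length)
    return reduce(gcd, distances) if distances else 0


if __name__ == "__main__":
    text = "XYZABCXYZDEFGHIJKLXYZ"
    print(sorted(repeated_distances(text)), kasiski_gcd(text))
```

A single accidental repeat can pull the GCD down to 1, so the raw GCD is best read together with the list of distances.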

Theory Behind the Kasiski Test

If a string of characters is repeated in a plaintext message at a distance apart which is equal to a multiple of the length of the keyword, then the cipher text representations of these characters will be identical in each occurrence

And the Winner is...

The Friedman Test is the most accurate algorithm, but also the slowest

The Shift and Count algorithm is very accurate as well, taking less time than the Friedman Test

The “Plug it in!” algorithm runs the fastest, but is only accurate on small keys

The Kasiski Test almost always results in output of the correct key length or a multiple thereof, but how many possible lengths must the user try before finding the correct one?

Determining the Plain Text/Key

I used three distinct, yet related algorithms for finding the plain text/key. These algorithms are outlined on the following slides.

These algorithms all require the key length as input. Knowing the key length, the cipher text can be split into rows of that length. Looking down a column, all letters are encrypted by the same key letter, so each column is a Monoalphabetic Substitution cipher!

Algorithm One: Basic Frequency Analysis

1. Split text into rows of the same length as the key.

2. For each column, determine the frequencies of each letter.

3. Compare to expected English frequencies (these values are known and tabulated) and "guess" at encryption.

4. Repeat process on next column.

(Algorithm taken from Beutelspacher and Lewand.)
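One simple way to turn the "guess" in step 3 into code is to assume that the most frequent letter in each column stands for plaintext e. That rule is an assumption made for this sketch, not something the slides specify, and the function names are hypothetical. The key length must not exceed the text length, so that no column is empty.

```python
from collections import Counter


def guess_key_letter(column: str, assumed_plain: str = "e") -> str:
    """Guess a key letter by assuming the column's most common letter encrypts `assumed_plain`."""
    most_common = Counter(column.upper()).most_common(1)[0][0]
    shift = (ord(most_common) - ord("A")) - (ord(assumed_plain) - ord("a"))
    return chr(shift % 26 + ord("a"))


def guess_key(ciphertext: str, key_length: int) -> str:
    """Apply the single-letter guess to each of the key_length columns."""
    return "".join(
        guess_key_letter(ciphertext[i::key_length]) for i in range(key_length)
    )


if __name__ == "__main__":
    print(guess_key("CICICI", 2))
```

With short columns the most frequent letter is often not e, which is consistent with this approach working best when each column contains plenty of text.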

Algorithm Two: Permute through All Shifts

1. Split text into rows of the same length as the key.

2. For each column, determine the frequencies of each letter.

3. Take the dot product of the column frequencies with every possible shift of the standard English alphabet frequencies.

4. The largest value is the most likely shift.

5. Repeat the process on the next column.

(Algorithm taken from Introduction to Cryptography with Coding Theory by Wade Trappe and Lawrence C. Washington, pages 22 - 23.)
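The dot-product test for a single column can be sketched like this. The `ENGLISH` list holds commonly tabulated approximate letter frequencies for a through z; the exact values and the function names are assumptions of this sketch.

```python
# Approximate relative frequencies of the letters a..z in English text.
ENGLISH = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
    0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
    0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001,
]


def column_frequencies(column: str) -> list:
    """Relative frequency of each letter A..Z in one column of cipher text."""
    counts = [0] * 26
    for ch in column.upper():
        if "A" <= ch <= "Z":
            counts[ord(ch) - ord("A")] += 1
    total = sum(counts) or 1  # avoid dividing by zero on an empty column
    return [c / total for c in counts]


def best_shift(column: str) -> int:
    """Return the shift g (0..25) whose shifted English frequencies best match the column."""
    freqs = column_frequencies(column)
    scores = [
        sum(freqs[(i + g) % 26] * ENGLISH[i] for i in range(26))
        for g in range(26)
    ]
    return max(range(26), key=scores.__getitem__)
```

The returned shift is the numerical value of the key letter for that column, so repeating the call on every column spells out the key.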

Algorithm Three: Find Relative Shifts between Key Letters

1. Split text into rows of the same length as the key.

2. For each column, determine the frequencies of each letter.

3. Compute the MIc of each column with every other column, for each relative shift g.

4. Search for the MIc's closest to .065; each one yields the relative shift from column i to column j.

5. Form a system of equations and solve in terms of one key letter.

6. The keyword is a cyclic shift of the result.

(Algorithm taken from Stinson pages 33 - 36.)

MIc (the mutual index of coincidence) is represented by the following equation:

$$ MI_c(f, h_g) = \frac{\sum_{i=0}^{25} f_i \, h_{i-g}}{n \, m} $$

where n and m are the lengths of substrings f and h, f_i is the frequency of letter i in f, and h_{i-g} is the frequency of letter i - g (taken mod 26) in h, where 0 <= g <= 25.
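The MIc computation for one pair of columns can be sketched in Python as below. The function names are hypothetical, and `best_relative_shift` simply picks the largest MIc as a simplified stand-in for "closest to .065".

```python
def letter_counts(text: str) -> list:
    """Number of occurrences of each letter A..Z in the text."""
    counts = [0] * 26
    for ch in text.upper():
        if "A" <= ch <= "Z":
            counts[ord(ch) - ord("A")] += 1
    return counts


def mutual_ic(f: str, h: str, g: int) -> float:
    """Sum of f_i * h_(i-g) over all letters, divided by the product of the lengths."""
    cf, ch = letter_counts(f), letter_counts(h)
    n, m = sum(cf), sum(ch)
    if n == 0 or m == 0:
        return 0.0
    return sum(cf[i] * ch[(i - g) % 26] for i in range(26)) / (n * m)


def best_relative_shift(f: str, h: str) -> int:
    """Shift g with the largest MIc: an estimate of (key letter of f) - (key letter of h) mod 26."""
    return max(range(26), key=lambda g: mutual_ic(f, h, g))
```

Each pair of columns contributes one equation of the form k_f - k_h = g (mod 26), which is the system solved in step 5.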

And the Winner is...

The Permute through All Shifts Algorithm is the logical winner, since all possibilities are attempted

The Basic Frequency Analysis works okay for small key lengths

What about the Relative Shifts Algorithm?

– I need far more computing power (or patience) to test this algorithm.

– Yields accurate results when the matrix can be solved

Down the Road: Unaddressed Issues and Enhancements to Implement

When the key length is equal to the plaintext length and the key is perfectly random, this XOR encryption method is considered perfectly secure. But does the key length really have to equal the plaintext length for the encryption to be secure? Where exactly is the critical point?

What if a random key is used instead of an actual word? How will this affect the frequency analysis to determine the key?


I used a ciphertext-only attack (the only available resource to analyze is the encrypted cipher text). Consideration should be given to other types of attacks, such as cribbing (knowing that a certain word or words appear in the plaintext) and taking advantage of multiple cipher texts in which the same key was used (additional information is gained under these circumstances because you KNOW the keys line up starting at the beginning of each cipher text; however, how do you determine initially that the same key was used?).


My final code requires “slimming down” to increase efficiency.

A spell checker/dictionary could be added to increase accuracy

– Instead of giving the user all cyclic shifts of the key word on the Find Relative Shifts between Key Letters Algorithm, only give the user actual words

– When using the other two algorithms, a spelling-auto-corrector would improve accuracy


In the first two find plain text algorithms, allow the user to select specific letters in the keyword or plaintext to change and display the effect of these changes.

The key length algorithm that attempts to compute the GCD could be altered to throw out “bad” data

– i.e. find the number(s) that are preventing a common GCD and ignore those numbers


Combine the various algorithms so they can share the results and base results off of one another.

Finally, how about considering a new method of encryption?


Strategies & Knowledge

Research, research, research!

– Understand everything you read, even how the author got from one step to the next

Trial and error, but try it.

Do an example first - ON PAPER (but make sure you do your math right)

No single part of the project was difficult to code, but implementation required an in-depth understanding of the problem

Advice to Next Year’s Seniors

Start EARLY! It goes by FAST.

It is almost impossible to stay on target with your first schedule, second schedule, third schedule...

Lofty aspirations at the beginning, but reality will hit

ASK QUESTIONS!

– Different professors have different “specialty” areas; take advantage of it

– Your classmates can provide great insight

– Don’t re-invent the wheel, check out other solutions first

QUESTIONS