Wednesday, June 25, 2014

Week 1 Part 1

EDIT: So yeah, I fell off the wagon with this blog - Gonna change it up into something a little bit more interesting in time to come.

I'm leaving this up here anyway in case someone wants to read it.

So I've started this basic blog to write down all of my thoughts and notes on Dan Boneh's Cryptography 1 course that I just started in Coursera.
 
Let's begin:
 
The initial videos within the class discuss the basics of discrete probability. In a nutshell, Boneh covers a quarter worth of DP in less than 45 minutes. I had to spend about an hour reading through wiki books just to keep my mind clear and thoroughly understand what he was talking about. That being said, it's not entirely imperative that you completely understand the material, you just need to know the basics.
 
I have not completed watching all of week one's videos yet; I only finished the introduction ones. Being the idiot that I am, I skipped the material and went straight to the programming challenge that is listed for extra credit. The regular assignments aren't due till the 20th of July so I figured why not give this a shot now.
 
The extra credit gives you 11 cipher texts with the end goal of decrypting the final message. The only knowns are that they used the same key and that they were encrypted with a stream cipher. OK. So what do we do with this?
 
I hit up google for some research and took some notes on various topics that I thought would be useful. They things I learned/found are listed below. I'm not going to explain them here. People who are that interested can and will look up the details on their own time. They are just serving as a reminder to me. I was able to solve everything with the five things below:
 
XOR
f(a)f(b) = a⊕b (PARAMOUNT)
Crib Dragging (CRACKING METHOD)
http://www.dolcevie.com/js/converter.html - A hex to string converter
http://codeseekah.com/cicada/xor.html - An online app that XORs hex for you
 
Now, I'm not going to give the solution on this blog - doing so would be cheating. I will write down my thought process that led to me solving the challenge. Doing so will help me solidify the near 8 hour ordeal such that if I ever (for what reasons, I don't know) need to do this again, it will be quicker and less painful. Anywho!
 
After reading around and listening to Boneh, I found that using a one-time pad key more than once is a REALLY bad idea. Why? Here's why:
 
When you use a key more than once, you create this scenario:
 
f(a)f(b) = a⊕b
 
What does this mean? It means that if you XOR two cipher texts with the same key, the result is a new cipher text where the plain text is a reversible key. Sounds a little confusing but it's a glaring security hole for many reasons. Without diving into too much wordiness, it means that I can effectively guess the message through frequency analysis.
 
How is that? Well, we utilize an attack called crib dragging. In the crypto world, a "crib" is the same as a guess. Basically, we make an educated guess as to what words might be encrypted message. The classic "crib" is the word " the " with spaces on either end of it. We use this first as it's the most common word in the English language that could be within the texts.
 
So I began to XOR every single cipher text with the final cipher text and built a lengthy list of 11 XORed outputs. I then deployed the crib drag across every possible position for the combination t⊕10 where t is the target and 10 is the 10th message given to us. The end result is that I found the first 5 digits of plain text of *one of the two* messages that were XORed.
 
Awesome. SO I have some text, not sure what it belongs to, but I have some text. Well, I can reverse the process. Because we understand the property listed above, we can take the text I newly found, convert it to hex, and XOR is with t⊕10 again to get the OTHER text - We now have two pieces of text that belong to each other via the t⊕10 ciphertext.
 
I know this sounds confusing, but if you are going through the entire process of decrypting this crap, it will make a ton of sense. Plus, I will be including an example below of how it works visually.
 
 From here, I continued to bounce back and forth off of the texts making guess from the fragments - As you get more and more letters, you guess what words are there and expand outward. Your "crib" gets bigger.
 
It's really that simple. It's tedious, but highly doable. If you are skilled with Python, it's even easier.
 
Here's a visual example.
 
M1: the message is a secret
M2: there is no cow level
K: mathematicsistotallyawesome
 
Ok. So to encrypt, we first convert the messages to hex using that site I listed above:
 
M1H : 746865206d657373616765206973206120736563726574
M2H : 7468657265206973206e6f20636f77206c6576656c
KH : 6d617468656d61746963736973746f74616c6c79617765736f6d65
 
Ok. So now we perform M1H⊕KH  and M2H⊕KH to get the encrypted message - We do this with the other website I listed above. The results are below:
 
M1e : 1909114808081207080416491a074f15411f091a131211736f6d65
M2e : 1909111a004d0807490d1c49101b18540d091a1c0d7765736f6d65

We can already see that there are the same digits in the beginning parts which would lead any cryptanalyst towards thinking the same process/key was used to create those two encrypted texts.

Nevertheless, if we take the two ciphertexts and run M1e⊕M2e - the results are astounding.

M12XOR - 0000005208451a0041090a000a1c57414c1613061e657400000000

Let's run an XOR of the string "the " against the first four digits of M12XOR and see what happens.

The result: 54686572 which in hex is "Ther" - Didn't one of my messages above have that in there? Oh yeah, M2 did! With this rudimentary example, we don't need a key to force our way to the messages. I can then convert "Ther" into hex and XOR it again against M12e and it will reverse the process and give me "the " - This is the exact process that I used to decrypt Boneh's first programming challenge. It's tedious, but it works. Obviously, it would be a lot less tedious if you programmed it recursively with something like Python...

Now, I know what you're thinking - It's so easy because we already know the text, right? Partially. In practice, it's a lot dicier - Especially in Boneh's texts which are long and annoying. But the thing is that almost all of his sentences contain common words so it's just a matter of time and patience. If the word isn't seen or the XORed output is garbage, we drag the test hex word over a digit and try again until something makes sense.

The end lesson? Never repeat key use with a one-time pad.