Interactive Learning Tool for Cryptography
Interactive Learning Tool for Cryptography
Celedonio Arroyo Serrano
Computer Science
Alfredo Cruz, PhD
Associate Director for Computer Science
Polytechnic University of Puerto Rico
Abstract - Cryptography is the science of keeping the secrecy of messages. It is a common technique used to assure data confidentiality and integrity. Using a combination of theory and practice techniques, we can reduce the time that it takes for a student to understand and get familiarized with the different types of ciphers that are available. With the development of a learning tool specifically designed to provide basic understanding and capabilities of cryptography, anyone can learn, understand and use monoalphabetic, polyalphabetic, and polygraph ciphers. The paper will provide an inside view of the various functionalities available for the user, and information related to the types of cryptography used within the tool. A tutorial is provided as an in class laboratory exercise for network security or any related course in were the student will be able to encrypt or decrypt any plaintext or ciphertext using a key. This will also clearly show the differences between transposition and substitution ciphers. The learning tool will lead to: a faster understanding of cryptography; motivate and engage students to practice the theory of the courses related to cryptography; help master encryption/decryption concepts quickly and effectively. The tool will also provide an extra resource for the professor incorporating the tool into the lab environment and the course.
- INTRODUCTION
In our society information assurance has taken an important role. Almost all the transactions that are done over the internet are secured by different kind of algorithms in order to maintain the confidentiality, privacy and the integrity of the information that is being shared. In order to secure this information we rely on cryptography which is the practice and study of hiding information [1]. The terms encryption and decryption have become more common in everyday life. In college I found that these concepts are in many of the courses that I have taken during my academic career.
In this paper we propose the use of a visualization technique in order to understand the basic concepts of cryptography.
The purpose of this project is to provide instructors with a tool to effectively teach the basic concepts of cryptography. The tool will motivate the student to participate and understand the concept better by using a tool related to classical ciphers. With this tool the user will be able to differentiate the use of mono-alphabetic, polyalphabetic and polygraph ciphers. Even though many of these ciphers can be solved by hand, it will engage the student into using the technique and verifying if the encrypted or decrypted message is correct.
- SUPPORTING THEORY
VISUALIZATION
In 2011, Simms and Chi [14] said that visualization is a process of taking raw data and converting it to a form that is viewable and understandable to humans. They also said that it is becoming a basic building block in our daily lives because of the overwhelming amount of information required to be processed by the human brain. Visualization is a technique for creating images, diagrams, or animations to communicate a message. Primitive drawings on buildings and monuments are an example of visualization techniques, proving that visualization has been around since the dawn of man. Advances in technology have made visualization a very informative and easy to practice. The use of computer models as tools for advancing scientific knowledge is becoming a key component of current scientific research. As we can see, the implementation of visualization in the classroom can be very productive.
In 2011 Ma, Tao, Keranen, Mayo, and Kuang [10] stated that cryptography should be a course regularly offered at colleges and universities. Textbooks and hand-books aid in the teaching of cryptography, but students attracted to this field may post some unique challenges to educators. In particular, Computer Science (CS) students find that understanding the sophisticated mathematics behind the crypto-systems is a daunting task, while math majors often get lost in the details of the complicated algorithms. Educators need to find a way to help students understand both how and why the algorithms are used [10].
Visualization tools can be an effective way for educators to battle this challenge. While some visualization tools have been developed, not many of them allow the user to understand both the mathematical theory and the algorithm behind a certain crypto-system [10].
CRYPTOGRAPHY
Cryptography is the method of storing and trans-mitting data in a way that only those it is intended for can read and process it. It is the science of protecting information by encoding it into an unreadable format. Cryptography is an effective way of protecting sensitive information stored on media or transmitted through a network communication path. Although the ultimate goal of cryptography, and the mechanisms that make it up, is to hide information from unauthorized individuals, most algorithms can be broken and the information can be revealed if the attacker has enough time, desire, and resources. So a more realistic goal of cryptography is to make obtaining the information too work-intensive to be attractive to the attacker [8].
If we start to look back at the history of cryptography we see in many text books that this concept can reach back to almost 4000 years. The Egyptians used a type of substitution cipher to encipher some of their hieroglyphic writing on monuments. It’s also a fact that ancient Hebrews enciphered certain words in the scriptures [6]. Even 2000 years ago Julius Caesar used a simple monoalphabetic cipher to send secret messages to his generals. In the 1200s Roger Bacon described several methods to cipher messages [5]. In 1585 the poly-alphabetic substitution cipher was described by Blaise de Vigenere [13]. Many other ciphers had evolved during that time, producing a clear difference between classical ciphers and modern ciphers. In this paper the approach of classical ciphers is one of the primary concerns.
CLASSICAL CIPHER
As mentioned, the Classical cipher has been used throughout history. But now, it’s being used to learn the basic concepts of cryptography. With the fast growth of technology many techniques such as cryptanalysis the encryption can be broken very fast. Encrypted messages are not secure and many of them can be solved with the use of a piece of paper and a pencil.
DEFINITION OF CONCEPTS
In order to understand better the concept involved in cryptography we need to define a couple of basic concepts:
Plaintext:The original intelligible message or data that is fed into the algorithm as input.
Ciphertext: The scrambled message produced as output.
Encryption: The process of converting from plain-text to ciphertext.
Decryption: The process of restoring the plaintext from the ciphertext
Key: A value used to encrypt or decrypt a message or data.
Cryptanalysis: The branch of cryptology dealing with the breaking of a cipher to recover information.
- TYPES OF CIPHERS USED IN THE TOOL
The Interactive Learning Tool is still in progress. So far, the ciphers that are available in the tool are the Caesar cipher, Vigenere Cipher, Playfair Cipher, and the Rail Fence Cipher. In order to provide a clear understanding it is important to understand how each of these ciphers work and to classify each of them correctly.
MONOALPHABETIC CIPHER
Mono-alphabetic ciphers use fixed substitution over a plaintext in order to create a ciphertext. It’s important to mention that the algorithms in this kind of cipher are based on two general principles: substitution cipher and transposition ciphers.
Substitution cipher is a method of encoding by which units of plaintext are replaced with ciphertext, according to a regular system, for example a pair of letters, triple letters or a mixture of letters. Using this cipher an encrypted message can be sent and the receiver can decode the message by performing an opposite substitution [6].
In Transposition ciphersthe units of the plaintext are rearranged in a different and usually quite complex order, but the units themselves are left unchanged. In contrast to the substitution cipher, the units of the plain-text are retained in the same sequence in the ciphertext, but the units themselves are altered [6].
CAESAR CIPHER
The Caesar cipher is a monoalphabetic cipher. It is composed of the use of the alphabet (in our case the English alphabet) from A to Z. It’s considered in many textbooks and other sources of information as an early cipher method. The Caesar cipher is commonly used to teach the basic concepts of cryptography. Other types of classical ciphers will use a key and a plain-text in order to encrypt or decrypt a message, but this cipher uses a shift parameter. The shift parameter is responsible for the transformation of the ciphertext or the plaintext. Based on the shift parameter, the alphabets that are aligned together will be rotated to the left or to the right, depending on the number of shifts.
In Figure 1, we can see how the alphabet was trans-formed using a shift parameter of 3. We can see that different lines from the top alphabet are pointing to different letters from the bottom alphabet. In this example A was transformed into D, B was transformed into E, C was transformed into F, and so on, until the entire alphabet is transformed. It’s important to mention that once the alphabet reaches the final letter in the alphabet (Z) it will keep on transforming the alphabet starting from A again until the full transformation is completed.
Figure 1: Transformation of the Alphabet
This classical monoalphabetic cipher is very easy to break, based on the fact that it only holds 26 characters from the English alphabet. In other words, we can conclude that one of the 26 shifts will be the correct one and once it’s found, and it can be very easy to keep doing the transformation until obtaining the plaintext. This kind of action is known as a brute force attack.
Frequency Analysis
Another way to break this cipher is with frequency analysis. This is a technique that is very important even in modern cryptography, as we can see in many tools that are developed to break different kinds of ciphers. Since its part of the basic knowledge we will include more information about it.
According to Martin (2012), in monoalphabetic ciphers (like in the simple substitution ciphers), frequency analysis can help to break the ciphertext based on the size of the ciphertext used. Since it is looking for words that are always present in the English alphabet, it can be assumed that those letters can be repeated almost all the time (like for example the letter T and the letter E). Martin (2012) also explained that even though frequency analysis has a degree of trial and error before establishing what the correct matches are, it reduces the choice with respect to the remaining letter. Frequency analysis does have it’s limitations, but there are many tools that provide almost instant decryption of a ciphertext generated by the simple subtitution cipher, [11].
Mathematical Description of the Caesar Cipher
The mathematical description of the Caesar cipher is one of the easiest to understand. This makes the cipher very easy to implement using any pro-gramming language, even without the use of an arithmetic expression. In this case, an array would be created and a series of loops can then shift the alphabet depending on the number of shifts wanted. In the mathematical description of the cipher, we can conclude that ciphertext will be equal to C, while the plaintext will be equal to P, and the shift parameter will be K. Many books, articles and web pages use the same arithmetic expression of this cipher. The arithmetic expression is as follows:
C = P + K mod 26
The alphabet will be represented from 0 to 25, in were A = 0, B = 1, C = 2…..Z = 25.
POLYALPHABETIC CIPHER
We can see a noticeable difference when we compare a polyalphabetic cipher to a mono-alphabetic cipher.In the polyalphabetic cipher we can see how the substitution is done at different positions of the message.
In this cipher a unit from the plaintext is mapped to one of several possibilities making this type of cipher a little bit more secure than the regular monoalphabetic ciphers [3].
One of the characteristic of this kind of ciphers is that multiple cipher alphabets are used. In order to facilitate encryption, all the alphabets are usually written out in a large table that consists of a 26×26 graph, so that 26 full ciphertext alphabets are available. The method of filling the table, and of choosing which alphabet to use next, defines the particular polyalphabetic cipher.
VIGENERE CIPHER
Considered one of the most popular, it was first published in 1585. It was considered unbreakable until 1863.
The Vigenere Cipher consists of the alphabet written out 26 times in different rows, as shown in Figure 2. Each alphabet is shifted cyclically to the left compared to the previous alphabet, corresponding to the 26 possible Caesar ciphers [7]. In Figure 2 we can observe that the top row represents the plaintext letters. The leftmost column of the square (that has the letters from A to Z in order from top to bottom) represent the key letters. Now that we know that, we can use it to map that plaintext letter to the ciphertext letter on the column. The letters on the right column represent the letter on the row that is part of the key. With this combination we can find the ciphertext letter that we are looking for. In other words, we select the letter from the plaintext on the top row, and then we select the letter from the right column that is part of the key. We then trace those two and select the letter that intersects the column and the row, this letter is then the ciphertext letter. It’s important to mention that in order to start the encryption process we need to get the plaintext and align each letter on it with the letters of the key. If the key is too short for all the letters, then we keep repeating the key[12].For example, if the plaintext is SECURITY and the key is VEGAS then the alphabet used at each point depends on repeating the key used to encode or decode, aligning each character using the same letter on the key until all the characters in the plaintext are covered as follows:
Plaintext: / S / E / C / U / R / I / T / YKey: / V / E / G / A / S / V / E / G
Figure 2: Vigenere Square
The decryption is performed by going to the row in the table corresponding to the key, finding the position of the ciphertext letter in this row, and then using the column's label as the plaintext, same operation as in the encryption process but backwards.
Mathematical Description
The algebraic description of this cipher is as follows: If the lettersA–Zare taken to be the numbers 0–25, and addition is performedmodulo26, then Vigenere encryptionEusing the keyKcan be written as:
Ci = Ek ( Mi ) = ( Mi + Ki ) mod 26
For decryption we use D as decryptionusing the keyK, for example:
Mi = Dk ( Ci ) = ( Ci – Ki ) mod 26
With these formulas, it’s very easy to create the pseudo-code that is implemented in the tool.
POLYGRAPH CIPHER
In a polygraphic substitution cipher, plaintext letters are substituted in larger groups, instead of substituting letters individually. The first advantage is that the frequency distribution is much flatter than that of individual letters (though not actually flat in real languages; for example, 'TH' is much more common than 'XQ' in English). Second, the larger number of symbols requires correspondingly more ciphertext to productively analyze letter frequencies, making this type of cipher harder to break [11].
The earliest practicaldigraphic cipheror pairwise substitution was the so-calledPlayfair cipher.
PLAYFAIR CIPHER
In this cipher, a 5 x 5 grid is filled with the letters of a mixed alphabet (two letters, usually I and J, are combined). A digraph substitution is then simulated by taking pairs of letters as two corners of a rectangle, and using the other two corners as the ciphertext. This was developed by Charles Weatston. Special rules handle double letters and pairs falling in the same row or column [2].
Thanks to the way it’s implemented, the Playfair cipher is harder to break. There are almost 600 possible digraphs rather than the 26 possible monographs, making frequency analysis harder. But with today’s computing power and capabilities, this could take only a few seconds.
In Figure 3, we can see a 5 x 5 grid, also known as a graph. On it a key was used and the grid was generated. Notice that I and J are together in order to fulfill the combination mentioned before.
Figure 3: Playfair Graph
To generate the key table, one would first fill in the spaces in the table with the letters of the keyword, in Figure 3, we used the key ANNUALCONFERENCE, (Dropping any duplicate letters) then we filled the remaining spaces with the rest of the letters of the alphabet in order, in this version we put both "I" and "J" in the same space, this is the classical Playfair cipher. The keyword together with the conventions for filling in the 5 by 5 table constitutes the cipher key.
To encrypt a message, one would break the message into digraphs (groups of 2 letters) such that, for example, "MEET ME IN LAS VEGAS" becomes "ME XE TM IN LA SV EG AS", and map them out on the key table. If needed, append an "X" to complete the final digraph, or if two letters are the same like for example in the word “MEET” an “X” need to be used to separate the double “EE”, for example “ME XE”. The two letters of the digraph will be used to do the decryption or the encryption, and this is done by using a rectangle as the reference. For example in “ME XE” we are going to start by looking “ME” on the graph. We look for “M” and then “E” and we create a rectangle with those two, in other words we will consider each letter as a corner of a rectangle in the key table. Depending on the type of rectangle we need to use the following 4 rules to order each pair of letters in the plaintext in order to encrypt the plaintext: