Cryptowork
Assessment C4 PH370 - Skills for Physicists
Row  0      16     32    48    64    80    96    112
0    [NUL]  [DLE]  [SP]  0     @     P     `     p
1    [SOH]  [DC1]  !     1     A     Q     a     q
2    [STX]  [DC2]  "     2     B     R     b     r
3    [ETX]  [DC3]  #     3     C     S     c     s
4    [EOT]  [DC4]  $     4     D     T     d     t
5    [ENQ]  [NAK]  %     5     E     U     e     u
6    [ACK]  [SYN]  &     6     F     V     f     v
7    [BEL]  [ETB]  '     7     G     W     g     w
8    [BS]   [CAN]  (     8     H     X     h     x
9    [HT]   [EM]   )     9     I     Y     i     y
10   [LF]   [SUB]  *     :     J     Z     j     z
11   [VT]   [ESC]  +     ;     K     [     k     {
12   [FF]   [FS]   ,     <     L     \     l     |
13   [CR]   [GS]   -     =     M     ]     m     }
14   [SO]   [RS]   .     >     N     ^     n     ~
15   [SI]   [US]   /     ?     O     _     o     [DEL]
Table 1: ASCII 1967 definitions. Column numbers indicate the value of the first element
of that column. Rows indicate the offset from that first element, i.e. ‘e’ would be column
value 96 plus row value 5 = 101. All names enclosed in square brackets are the code names
for control characters.
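The column-plus-row lookup described in the caption can be checked with Python's built-in ord() and chr() functions. This is only a minimal sketch; the helper name table_position is hypothetical and is not part of the assignment.

```python
def table_position(ch):
    """Return (column value, row offset) of ch in Table 1."""
    code = ord(ch)                      # ASCII value of the character
    return (code // 16) * 16, code % 16


column, row = table_position("e")
print(column, row, column + row)        # 96 5 101
print(chr(96 + 5))                      # e
```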
4.3 Ciphers
The objective of this assignment is to develop your own program for encrypting and decrypting
text. Encryption and decryption is a huge topic in itself, with constant development of new ways
to secure data both in place and in transit. The algorithm used to perform encryption is referred
to as a ‘cipher’ (or, less commonly, ‘cypher’). A cipher is applied to the ‘plaintext’ (the original,
unencrypted text or data), with a ‘key’. Based on the content of the key, the cipher generates an
encrypted version of the plaintext (the ‘ciphertext’). To reverse the process, one must possess (or
reverse engineer, in the more nefarious cases) the specific key used with the specific cipher to generate
the ciphertext.
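One of the simplest ciphers is the Cæsar Cipher, in which every letter is replaced by the letter a
fixed number of places further along the alphabet. With a key of 2:
Plain:  a b c d e f g h i j k l m n o p q r s t u v w x y z
Cipher: c d e f g h i j k l m n o p q r s t u v w x y z a b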
As you can see, a would be replaced by c, b by d and so on. This is not a particularly robust
cipher. A vital clue in breaking an encrypted piece of text is, with a sufficiently long piece of text, to
examine how often each letter turns up. With all languages, there is generally a very distinct pattern
of frequency of each letter. With a Cæsar Cipher encrypted text, you would see exactly the same
frequency, but shifted along by a number of characters. That number would tell you what the key was.
Additionally, if you guess or know that it is a Cæsar Cipher, you can simply try all 26 possible keys,
and see which one results in sensible text!
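Trying every key can be sketched in a few lines of Python. This is an illustration only: the function name caesar_shift and the sample phrase are assumptions made for the example, and only lowercase letters are shifted.

```python
import string

ALPHABET = string.ascii_lowercase       # 'abcdefghijklmnopqrstuvwxyz'


def caesar_shift(text, key):
    """Shift each lowercase letter forward by key places, wrapping z to a."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)           # leave spaces and punctuation alone
    return "".join(result)


ciphertext = caesar_shift("attack at dawn", 2)
print(ciphertext)                       # cvvcem cv fcyp

# Brute force: undo every possible shift and look for readable text.
for key in range(26):
    print(key, caesar_shift(ciphertext, -key))
```

Only one of the 26 printed lines reads as sensible English, and its number is the key.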
To consider the cipher in a different fashion, and see the maths which can be made to underpin
the system, give each character a value: a is 1, b is 2 and so on. With this, we can see that the
mathematical instruction for the cipher is as follows. In this case p_i is a given plaintext value, at
position i in the text, k is the key value, and c_i is the ciphertext value at position i. Encryption is
performed as follows:
    c_i = p_i + k,    (1)
as this is a ‘symmetric’ cipher, rearranging the equation gives us the process for decryption as well,
    p_i = c_i − k.    (2)
One addendum is that ‘modulo’ arithmetic must be followed, in other words, the values have to
loop around at the two ends. If we add 4 to 26, it must end up as 4, not as 30, as there isn’t a 30th
letter of the alphabet (in the English alphabet, anyway).
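With the letters numbered 1 to 26, this wrap-around can be written with Python's modulo operator (%). The sketch below is one possible way to express equations (1) and (2); the function names are hypothetical.

```python
def encrypt_value(p, k):
    """Equation (1) with wrap-around, for letter values 1..26."""
    return (p - 1 + k) % 26 + 1


def decrypt_value(c, k):
    """Equation (2) with wrap-around, for letter values 1..26."""
    return (c - 1 - k) % 26 + 1


print(encrypt_value(26, 4))             # 4, not 30
print(decrypt_value(4, 4))              # 26
```

Subtracting 1 before taking the remainder, and adding it back afterwards, is needed because the letters are numbered from 1 rather than from 0.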
In the case of a Cæsar Cipher, a single keytext value applies across every character, which is what
makes it unreliable. As computers deal only in numbers, making the cipher into a mathematical
statement is convenient. This is especially true as the symbols for the letters are already actually
stored as numbers, as shown in the section on the ASCII character table. Using this table (at least,
the displayable portion between 32 and 126) gives us far more variation than the basic alphabet Cæsar
Cipher, but the single key value is still its fundamental weakness.
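If the key value is instead allowed to differ at each position i (written k_i), as in a Vigenère cipher,
encryption becomes
    c_i = p_i + k_i,    (3)
this is still a ‘symmetric’ cipher, so again rearranging the equation gives us the process for decryption
as well,
    p_i = c_i − k_i.    (4)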
Consider the following example; in each case we show both the letters involved and their ASCII
character values.
First, we have the ‘plaintext’, the piece of text that is intended to be encrypted:
H    e    l    l    o    ,    [SP] m    y    [SP] n    a    m    e    [SP] i    s    [SP] A    n    o    n    2
72   101  108  108  111  44   32   109  121  32   110  97   109  101  32   105  115  32   65   110  111  110  50
Next, we choose our key, which the cipher will use. Note how, in both cases, we have been able to
use a combination of upper and lowercase letters, numbers and even punctuation (the comma in the
plaintext above), as they all have ASCII values:
S p y 0 0 7
83 112 121 48 48 55
To create a full key text, we repeat the key as many times as is necessary to be as long as the
plaintext (so that every position i has both a plaintext value and a keytext value):
S p y 0 0 7 S p y 0 0 7 S p y 0 0 7 S p y 0 0
83 112 121 48 48 55 83 112 121 48 48 55 83 112 121 48 48 55 83 112 121 48 48
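Repeating the key to the length of the plaintext could be sketched in Python as follows. The function name extend_key is hypothetical; string repetition and slicing do the work.

```python
def extend_key(key, length):
    """Repeat key until it covers `length` characters, then trim the excess."""
    repeats = length // len(key) + 1
    return (key * repeats)[:length]


plaintext = "Hello, my name is Anon2"
keytext = extend_key("Spy007", len(plaintext))
print(keytext)                              # Spy007Spy007Spy007Spy00
print([ord(ch) for ch in keytext[:6]])      # [83, 112, 121, 48, 48, 55]
```

An alternative that avoids building the long key at all is to index the short key with i % len(key) at each position i.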
We use this long version of the key to act as the value by which we shift each of the characters of
the plaintext, shifting the original ASCII values by the ASCII value of the key character at the same
location (same number of characters along in the strings). So, for the capital ‘A’ in ‘Anon’, ASCII
value 65, its matching key character would be the capital ‘S’ in the final ‘Spy’, ASCII value 83. So we
would want to shift the value of 65 by 83, giving 148. This would give us the ASCII value of the new
character which would be the encrypted version of that letter.
However, because the ASCII code stops at 127, and only values up to 126 are display characters (127
is the code for ‘delete’), we need to loop back around if a value exceeds 126. Additionally, display
characters only start at ASCII value 32 (the value for a space), so rather than looping back to zero, we
loop back to 32 (by subtracting 95). To make this work, we need to ensure that the key value won’t
skip too far. If we shift character 121 (letter ‘y’) by 122 (letter ‘z’), looping if the answer is above 126,
but starting at 32 when we loop back around, we have: 121 + 5 from the key, reaching 126, one more
step to loop back to 32, then with the remaining key value, we have 32 + 116 = 148 (equivalently,
121 + 122 − 95); still greater than 126. This would cause a problem.
As such, instead of adding the value of the key character, we add the value of the key character minus
32. This ensures that it will never overrun after the addition. We are making it represent the range of
characters available.
For our key, we have:
S p y 0 0 7
83 112 121 48 48 55
- - - - - -
32 32 32 32 32 32
↓ ↓ ↓ ↓ ↓ ↓
51 80 89 16 16 23
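The subtraction of 32 and the reason it prevents an overrun can be checked with a short sketch. The function name key_offsets is hypothetical.

```python
def key_offsets(key):
    """Subtract 32 from each key character's ASCII value (range 0..94)."""
    return [ord(ch) - 32 for ch in key]


print(key_offsets("Spy007"))            # [51, 80, 89, 16, 16, 23]

# Worst case: largest display character plus largest offset, wrapped once.
largest = 126 + (126 - 32)
print(largest)                          # 220
print(largest - 95)                     # 125, still a display character
```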
We can then apply the encryption across the whole of the plaintext: for each plaintext character,
we add the value of ‘key text - 32’, then subtract 95 if the result is greater than 126 (to loop back
to 32). This gives us the following:
H    e    l    l    o    ,    [SP] m    y    [SP] n    a    m    e    [SP] i    s    [SP] A    n    o    n    2
72   101  108  108  111  44   32   109  121  32   110  97   109  101  32   105  115  32   65   110  111  110  50
+    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +
51   80   89   16   16   23   51   80   89   16   16   23   51   80   89   16   16   23   51   80   89   16   16
↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓
{    V    f    |    [SP] C    S    ^    s    0    ~    x    A    V    y    y    $    7    t    _    i    ~    B
123  86   102  124  32   67   83   94   115  48   126  120  65   86   121  121  36   55   116  95   105  126  66
In a more readable layout, ‘Hello, my name is Anon2’ has become ‘{Vf| CS^s0~xAVyy$7t_i~B’;
quite unreadable, and extremely difficult, if not impossible, to convert back to the plaintext without
knowing the key.
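The worked example above can be reproduced with a short Python function. This is a sketch rather than a model answer: the function name encrypt is an assumption, and it assumes every plaintext and key character is a display character (ASCII 32 to 126).

```python
def encrypt(plaintext, key):
    """Encrypt display characters (ASCII 32..126) with a repeating key."""
    ciphertext = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - 32     # key value minus 32
        value = ord(ch) + shift
        if value > 126:                         # loop back round to 32
            value -= 95
        ciphertext.append(chr(value))
    return "".join(ciphertext)


print(encrypt("Hello, my name is Anon2", "Spy007"))
# {Vf| CS^s0~xAVyy$7t_i~B
```

Indexing the key with i % len(key) has the same effect as writing out the repeated key text first.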
The reason that this encryption, simple though it is, could potentially be impossible to crack is
that the length of the original key is fairly similar to the length of the plaintext. Consider a key which
is the same length as the plaintext: every character has been moved by an entirely different amount,
purely dependent on the key. When a key is repeated for a long piece of plaintext, that repetition can
leave detectable patterns; combined with analysis of letter frequency, the cipher can still be broken.
But with a key that doesn’t need to be repeated, there is no repetition which might give away the
underlying algorithm and key. This concept is called a ‘One-Time Pad’ (‘Pad’ referring to the key, in
this context). Additional criteria on the key in this scenario do exist, though: the key must itself be
entirely random. If the key contains repetition or patterns, such as regular words which are potentially
recognisable, then the possibility exists to break the code without the key.
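For illustration, a random key as long as the message could be drawn with Python's standard secrets module. The names DISPLAY_CHARS and random_key are hypothetical, and this sketch is not required by the assignment.

```python
import secrets

# Display characters, ASCII 32 (space) to 126 (~).
DISPLAY_CHARS = [chr(code) for code in range(32, 127)]


def random_key(length):
    """Return `length` characters, each drawn independently at random."""
    return "".join(secrets.choice(DISPLAY_CHARS) for _ in range(length))


message = "Hello, my name is Anon2"
pad = random_key(len(message))
print(len(pad) == len(message))         # True
```

The key printed will differ on every run, which is the point: a pad of this kind should be used once and then discarded.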
Hopefully this has provided some insight into how this encryption method works; the assignment
instructions also include an enumerated series of points to describe it fully again. The decryption
procedure is almost identical. 32 is still subtracted from the key values; rather than adding the
plaintext values and key values, the modified key values are subtracted from the ciphertext values; and
rather than checking whether the resultant value exceeds 126, the check should be whether it goes below 32.
If it does go below 32, it must loop back around from 126 by adding 95.
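The decryption steps just described can be sketched in the same way. The function name decrypt is an assumption, and the sketch assumes all characters are display characters (ASCII 32 to 126).

```python
def decrypt(ciphertext, key):
    """Reverse the shift for display characters (ASCII 32..126)."""
    plaintext = []
    for i, ch in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - 32     # same modified key value
        value = ord(ch) - shift
        if value < 32:                          # loop back round from 126
            value += 95
        plaintext.append(chr(value))
    return "".join(plaintext)


print(decrypt("{Vf| CS^s0~xAVyy$7t_i~B", "Spy007"))
# Hello, my name is Anon2
```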
The two main methods described here (Cæsar ciphers and Vigenère ciphers) are what are referred
to as ‘symmetric key algorithms’. This means that the same single key encrypts one way and, when
applied in reverse, decrypts. An extremely important field is that of ‘asymmetric key algorithms’. In
this case, there are two keys, paired to one another: key 1 and key 2. A message encrypted with key 1
cannot be decrypted by applying the same key in reverse; only its partner, key 2, can
perform the decryption. Vice versa, a message encrypted with key 2 can only be decrypted with key 1.
This is the system used by the RSA public-key cryptographic system, which is commonly used for the
transmission of electronic data over the Internet, for example in emails.
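The paired-key behaviour can be seen in a toy version of textbook RSA with very small primes. This sketch goes beyond what the assignment requires: the numbers are purely illustrative and far too small to be secure, and real RSA adds padding and other safeguards not shown here. The three-argument pow() with an exponent of -1 needs Python 3.8 or later.

```python
# Toy RSA key pair built from two small primes (illustration only).
p, q = 61, 53
n = p * q                       # 3233, shared by both keys
phi = (p - 1) * (q - 1)         # 3120
key1 = 17                       # chosen to share no factor with phi
key2 = pow(key1, -1, phi)       # modular inverse of key1: 2753

message = 65                    # ASCII value of 'A'
locked = pow(message, key1, n)              # encrypt with key 1
print(pow(locked, key2, n))                 # 65: key 2 undoes key 1
print(pow(locked, key1, n) == message)      # False: key 1 cannot undo itself

locked2 = pow(message, key2, n)             # encrypt with key 2
print(pow(locked2, key1, n))                # 65: key 1 undoes key 2
```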
It is, however, a fascinating topic; both encryption/decryption procedures themselves, and the
methods used to crack encryption.
This would output ‘r’ and ‘i’ to the screen, being the sixth character of the string (position 5, in Python
notation), and the seventh character of the string (position 6). In the latter case, instead of printing
the character directly to screen, we have first stored it in a variable a.
The following code fragment shows one way we could use a combination of the elements seen in
this section.
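    # Opening lines assumed from the description that follows;
    # the prompt wording is illustrative.
    mystring = input("Type a phrase: ")
    total_chars = len(mystring)
    for i in range(total_chars):
        c = mystring[i]                        # character at position i
        c_ascii = ord(c)                       # its ASCII value
        print("{0}:{1}".format(c, c_ascii))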
This code requests that the user type in a phrase and stores it (it will be a string by default). It
then finds how long the string is, and creates a loop up to that value. For each value i in the range
of positions inside the string, it stores the character at that position in the variable c, then finds the
ASCII value for that character, which it stores in the variable c_ascii. Both are then displayed to
the screen with a colon as the delimiter between them.
4.5 Brief
The following is the complete and comprehensive brief associated with the work to hand in. It should
contain all pertinent points regarding the nature and implementation of the assessed piece of work.
If there are any aspects which remain unclear, please make sure to ask.
• The goal is to produce a program/script which permits the user to encrypt or decrypt files.
• In particular, you will be required to encrypt and decrypt a file ‘plaintext.txt’ currently
accessible on the PH370 Moodle page. In addition, you will need to decrypt an already-encrypted
file ‘secret.txt’.
• Your code should first ask the user whether they would like to encrypt or decrypt a document
(exactly how you do this, and what you ask the user to type to make the choice, is up to you).
• It will also then need to ask the user for a passcode to act as the key.
• In encrypt mode, it should open a specified file (this can be hardcoded into the script, or can be
read in from the user). It should then apply the encryption algorithm using the user’s key, and
save the result with the same filename as the input file but with ‘.enc’ added to the end. (e.g.
‘plaintext.txt’ would become ‘plaintext.txt.enc’).
• In decrypt mode, it should open a specified file (again, this can be hardcoded and changed when
needed, or read in from the user). It should then apply the decryption algorithm using the user’s
key, and save the resultant text into a file with the same name as the input file but with ‘.dec’
added to the end. (e.g. ‘plaintext.txt.enc’ should become ‘plaintext.txt.enc.dec’).
1. First, using the file ‘plaintext.txt’ as input, your script should run in encrypt mode (this
should result in a file named plaintext.txt.enc). The key should be ‘PH370 Python’ (note
the capitalisation).
2. Then, the script should be run in decrypt mode using the output from the previous step
(plaintext.txt.enc), and should result in a file named plaintext.txt.enc.dec. This
should match the original plaintext.txt contents. (The key should be the same as that
used for encryption).
3. Finally, the script should be run in decrypt mode, but use the file secret.txt as input
(either by user selection, or by suitably modifying the code itself). This should be decrypted
using the same key as previously, PH370 Python. Following the established format, the
output file should be named ‘secret.txt.dec’.
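The points above could be tied together along the following lines. This is a sketch under stated assumptions, not a model answer: the function names, the prompts and the ‘e’/‘d’ choice are all hypothetical, and the treatment of characters outside ASCII 32 to 126 (such as newlines) is an assumption, since the brief does not specify it.

```python
def shift_text(text, key, direction):
    """Apply the cipher; direction is +1 to encrypt, -1 to decrypt.

    Assumption: characters outside ASCII 32..126 (such as newlines) are
    copied unchanged but still use up a key position.
    """
    out = []
    for i, ch in enumerate(text):
        value = ord(ch)
        if 32 <= value <= 126:
            value += direction * (ord(key[i % len(key)]) - 32)
            if value > 126:
                value -= 95             # loop back round to 32
            elif value < 32:
                value += 95             # loop back round from 126
        out.append(chr(value))
    return "".join(out)


def process_file(filename, key, mode):
    """Encrypt ('e') or decrypt (any other mode) a file; return the output name."""
    direction, suffix = (1, ".enc") if mode == "e" else (-1, ".dec")
    with open(filename, "r") as infile:
        text = infile.read()
    outname = filename + suffix
    with open(outname, "w") as outfile:
        outfile.write(shift_text(text, key, direction))
    return outname


def main():
    """Ask for mode, passcode and filename, then process the file."""
    mode = input("Encrypt or decrypt? (e/d): ").strip().lower()
    key = input("Passcode: ")
    filename = input("File to process: ")
    print("Wrote", process_file(filename, key, mode))
```

Calling main() at the bottom of the script runs it interactively. Note that an empty passcode would cause a division-by-zero error in this sketch, so a real script may want to check for that.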