Files cryptography based on one-time pad algorithm

Received Sep 5, 2020; Revised Sep 26, 2020; Accepted Dec 5, 2020

The Vernam cipher is known as the one-time pad algorithm. It is unbreakable because it uses a truly random key equal in length to the data to be encoded, and each element of the text is encrypted with an element of the encryption key. In this paper, we propose a novel technique to overcome the obstacles that hinder the use of the Vernam algorithm. First, the Vernam and advanced encryption standard (AES) algorithms are used to encrypt the data as well as to hide the encryption key. Second, a password is placed on the file through the use of the AES algorithm, so the level of protection becomes very high. The Huffman algorithm is then used for data compression to reduce the size of the output file. A set of files is encrypted and decrypted using our methodology. The experiments demonstrate the flexibility of our method, which succeeds without losing any information.


INTRODUCTION
In recent years, encryption has played an essential role in many areas, such as military affairs, trade secrets, and satellite imagery [1][2][3][4][5][6]. Encoding is the process of converting something in the physical world into a representation that can be stored or shared. Letters and words encode ideas and what is heard into a format that can be stored or shared. The goal of encoding is to deliver these ideas to their intended recipients; a person who does not understand the language, or does not know how to read it, will be unable to decode the information [7]. Encryption allows a person to hide the meaning of information or messages in such a way that only those who know the secret method may read them. For a very long time, people have had many different reasons for wanting to hide information from others. The earliest historical examples were hiding trade secrets, military secrets, and secret correspondence between spies and lovers [8]. Many encryption tools were used from ancient times until the modern era, when the revolution of technology gave rise to the concept of contemporary encryption [9]. These same encryption principles are now used to safeguard Internet communications [8].
Present-day encryption is achieved using algorithms that have a key to encrypt and decrypt data. These keys convert the data into "digital gibberish" through encryption and return them to their original form through decryption [10]. In principle, the longer the key, the harder the code is to break. For example, every binary unit of data has a value of 0 or 1, so an 8-bit key has 256 possible keys, while a 56-bit key has about 72 quadrillion possible keys to try when deciphering a message. With present-day technology, ciphers using keys of these lengths are becoming easier to break: DES, an early US Government-endorsed cipher with a key length of 56 bits, has had test messages broken. Further, as technology advances, so does encryption. One of the most remarkable developments in cryptography research is the introduction of asymmetric-key ciphers, algorithms that use two mathematically related keys to encrypt the same message. Before the introduction of the advanced encryption standard (AES), data sent over the Internet, such as financial information, was most commonly encrypted using the data encryption standard (DES), which was endorsed for only a brief period but nonetheless saw extensive use [11, 12].
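As a quick check of this key-space arithmetic, the short Python snippet below (illustrative only; the authors' system itself is written in VB.Net, as noted in section 2) enumerates the number of possible keys for an n-bit key, which is 2^n:

```python
# Key space of an n-bit key: 2**n possible values.
for bits in (8, 56):
    print(f"{bits}-bit key: {2**bits:,} possible keys")
# -> 8-bit key: 256 possible keys
# -> 56-bit key: 72,057,594,037,927,936 possible keys (about 72 quadrillion)
```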
Subsequently, the establishment of a new protocol known as the secure socket layer (SSL) [13] paved the way for online transactions. Transactions ranging from purchases to online bill pay and banking used SSL [14]. Moreover, as wireless Internet connections became increasingly common, the need for encryption grew, since a level of security was required in everyday situations. Data compression means encoding information using fewer bits than the original representation [15]. Compression can be grouped into two classes, lossy and lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; consequently, no data are lost in lossless compression. Conversely, lossy compression reduces bits by removing redundant or less significant data. Data compression is subject to a space-time complexity trade-off [16, 17].
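As a small illustration of the lossless case (not taken from the paper, which uses Huffman coding; zlib is used here only because it ships with Python), the round trip below recovers the input exactly:

```python
import zlib

data = b"AAAABBBCCD" * 100               # highly redundant input, 1000 bytes
packed = zlib.compress(data)
assert zlib.decompress(packed) == data   # lossless: the original is recovered exactly
print(len(data), "->", len(packed))      # e.g. 1000 -> about 30 bytes
```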
The one-time pad algorithm is derived from an earlier cipher called the Vernam cipher, named after Gilbert Vernam [17]. The Vernam cipher combined a message with a key stream read from a paper tape or pad [18]. The unbreakable nature of the one-time pad rests on two assumptions: the key stream used is entirely random, and the key is never used more than once [18]. The security of the one-time pad depends on keeping the key completely secret. The one-time pad is typically implemented using modular addition (XOR) to combine plain-text elements with key-stream elements. The key used for encryption is also used for decryption; applying the same key to the cipher text yields back the plain text. The cipher text is normally produced by applying the logical XOR operation to the individual bits of the plain text and the key stream. The benefit of using the XOR operation is that it can be reversed simply by applying the same operation again. Formula (1) describes the encryption and decryption processes in the Vernam algorithm:

C = P ⊕ K,    P = C ⊕ K    (1)

where ⊕ denotes the XOR operation, and P, K, and C represent the plain text, the key stream, and the cipher text, respectively.

The advanced encryption standard (AES) is a variant of the Rijndael block cipher created by two Belgian cryptographers [19]. Rijndael is a family of ciphers with various key and block sizes. The algorithm defined by AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data [20]. In computer science and information theory, a Huffman code is a particular kind of optimal prefix code commonly used for lossless data compression [21]. The output of Huffman's algorithm can be viewed as a variable-length code table for encoding source symbols. The algorithm derives this table from the estimated probability or frequency of occurrence of each possible value of the source symbol; hence, Huffman's technique can be implemented efficiently [22].

Recently, Zaeniah et al. [21] presented an examination of an encryption and decryption application using the one-time pad algorithm to protect data from those without the authority to access it, exploiting the statistical analysis used in compression algorithms to obtain a more efficient encryption key. Rishav Ray et al. [22] introduced a scheme to encrypt texts using a randomized data-hiding algorithm with a modified generalized cipher method. Miyano et al. [23] proposed a one-time-pad cryptographic method using a star network of N Lorenz subsystems, referred to as augmented Lorenz equations, which generates chaotic time series as pseudo-random numbers for concealing a plain text. Abiodun et al. [24] discussed the problem of the encryption key and the process of moving it safely.
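A minimal Python sketch of formula (1), assuming byte-level XOR as described above (the authors' implementation is in VB.Net; this sketch is only illustrative):

```python
import os

def vernam(data: bytes, key: bytes) -> bytes:
    """Apply C = P xor K (or P = C xor K); the key must cover the whole message."""
    assert len(key) >= len(data), "one-time pad key must be at least as long as the data"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"attack at dawn"
key = os.urandom(len(message))          # truly random key, used once
cipher = vernam(message, key)           # encryption: C = P xor K
assert vernam(cipher, key) == message   # decryption with the same key restores P
```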
The primary disadvantage of encryption using the one-time pad is that the encryption key has the same length as the message to be encrypted [24]. Thus, in this paper, a random key is generated before applying the Vernam algorithm and is then compressed together with the message to be encrypted. The biggest obstacle to applying and circulating the Vernam algorithm on data of considerable size is the key itself: for instance, to encrypt 1 MB of data, we need 1 MB for the key, and that key data must be shared between the sender and the recipient. The problems of using this algorithm can be summarized as follows:
- Keys should not be reused.
- Key sequences should not be repeated.
- Keys need to be shared somehow.
The modern cryptosystem concerns security criteria for data integrity, confidentiality, authentication, non-repudiation, and reliability [25]. The main contributions of the proposed secure cryptosystem, which overcome the problems that stand in the way of using the Vernam algorithm, can be outlined as follows (a sketch of the key-generation options appears after this list):
- A random key is generated, with the option of drawing the key from the integers or from ASCII-table symbols.
- The encryption key is compressed together with the encrypted text to overcome the encrypted-data size problem.
- The key is hidden within the same encrypted message so that it becomes a single encrypted file, overcoming the problem of transferring the encryption key to the recipient.
- The encrypted file is protected with a password encrypted with the AES algorithm, to be used during the decryption process.
- The password hide point is set so as to reduce the decryption time.
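The paper does not spell out how the two key alphabets are sampled, so the following Python sketch is a hypothetical reading of the first contribution: the key is drawn either from decimal digits only or from printable ASCII-table symbols.

```python
import secrets
import string

def generate_key(length: int, use_ascii_table: bool = False) -> bytes:
    """Hypothetical key generator: digits only, or the printable ASCII table."""
    alphabet = string.printable if use_ascii_table else string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length)).encode()

print(generate_key(16))                        # e.g. b'5183094620477112'
print(generate_key(16, use_ascii_table=True))  # e.g. b'k}2Q9(Tf;w0@xGm!'
```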
The rest of this paper is organized as follows: the proposed methodology is presented in section 2; the experimental results and analysis are introduced in section 3; finally, the main conclusions are presented in section 4.

METHODOLOGY
The flowchart of the proposed methodology is illustrated in Figure 1. In the implementation stage, the VB.Net programming language is used to create the system; Figure 2 shows the Vernam system. The main steps of our methodology are as follows. First, a random encryption key is generated to encrypt the plain text with the Vernam algorithm, and the key is then stored together with the encrypted version in one file. Second, the AES algorithm is used to encrypt the password, which serves as the point of separation between the encrypted text and the encryption key; the encryption phase thus combines two algorithms (Vernam and AES) with a simple steganography technique that hides the cryptographic key data. Third, data compression is applied, since the file size has doubled because the encryption key is stored with the encrypted data in the same file; the Huffman algorithm is used for this purpose. The outlines of the encryption phase and the decryption phase are shown in Algorithm 1 and Algorithm 2, respectively. The decryption phase (Algorithm 2) includes the following steps (a sketch of these steps follows the list):
2. Check the decryption options (password, random key option, compression option), which must be the same options used in the encryption.
3. Convert the decrypted file into an array of bytes.
4. Obtain the first ten elements of the array, which contain the password location.
5. Extract the password from its location and decrypt it with the AES algorithm.
6. Compare the extracted password with the password typed by the user: (i) if the passwords match, the encrypted text is divided into two parts, with the encrypted text located to the left of the password and the encryption key, which may be compressed according to the selected compression options, to its right; (ii) otherwise, the password does not match and decryption stops.
7. If the encryption key is compressed, decompress it using the Huffman decompression algorithm.
8. Once the random key array is recreated, start the decryption process with the Vernam algorithm.
9. Write the output file containing the decoded text.
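A minimal Python sketch of the decryption steps above, under stated assumptions: the first ten bytes store the password location as decimal text, and aes_decrypt and huffman_decompress are hypothetical stand-ins (stubbed as identity functions here) for the real AES and Huffman stages:

```python
def aes_decrypt(blob: bytes) -> bytes:         # hypothetical stand-in (identity)
    return blob

def huffman_decompress(blob: bytes) -> bytes:  # hypothetical stand-in (identity)
    return blob

def decrypt_file(blob: bytes, password: bytes, key_compressed: bool) -> bytes:
    loc = int(blob[:10].decode())                            # step 4: password location
    body = blob[10:]
    stored = aes_decrypt(body[loc:loc + len(password)])      # step 5: extract + AES-decrypt
    if stored != password:                                   # step 6: passwords must match
        raise ValueError("password mismatch")
    cipher_text = body[:loc]                   # encrypted text sits left of the password
    key = body[loc + len(password):]           # encryption key sits to its right
    if key_compressed:                         # step 7: Huffman-decompress the key
        key = huffman_decompress(key)
    return bytes(c ^ k for c, k in zip(cipher_text, key))    # step 8: Vernam, P = C xor K

# Usage with a toy file layout matching the assumptions above:
key, pw = b"XYZWVUTSRQPONM", b"pw"
cipher = bytes(p ^ k for p, k in zip(b"attack at dawn", key))
blob = f"{len(cipher):010d}".encode() + cipher + pw + key
print(decrypt_file(blob, b"pw", key_compressed=False))       # b'attack at dawn'
```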

EXPERIMENTAL RESULTS AND ANALYSIS
In this section, the proposed methodology is evaluated on a set of files of varying sizes and types; the change in file size is studied with respect to the compression option (with or without compression) and the selected random key (integers or ASCII table).

Experimental results via integer random key
Here, we achieved the highest compression rate; the compression ratio exceeded 30% for some files. The results of this experiment are reported in Table 1. In addition, Figure 4 illustrates the size difference between the expected file size and the encrypted file size after compression for various file types. Without data compression, the output file size equals the expected size of the encoded file, so there is no difference between the expected and encrypted file sizes; the numerical results of these experiments are reported in Table 2.
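For reference, the size-difference percentage behind the 30% figure can be computed as below (a sketch; we assume Tables 1 and 2 report the difference relative to the expected size, and the numbers are hypothetical):

```python
def size_difference_pct(expected_kb: float, encrypted_kb: float) -> float:
    """Reduction of the encrypted file relative to the expected size, in percent."""
    return (expected_kb - encrypted_kb) / expected_kb * 100

print(size_difference_pct(2048, 1400))  # ~31.6%, i.e. above the 30% mark
```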

Experimental results via ASCII table random key
In the same manner, we used the same files as in the previous experiments, but with the ASCII-table random key and data compression. This experiment did not achieve the compression we expected. For example, for some file types the size difference between the expected and encrypted file sizes reached 0%, and for others it fell below 0%; that is, the performance of the designed solution is affected by the dictionary of the compressed file, and the compressed data volume exceeded the expected size, albeit by a small percentage. The reason is that compression algorithms build a dictionary of the compressed file's symbols and include it in the same file; since the encryption key is generated from the symbols of the ASCII table, the size of this dictionary increases. Thus, although data compression was used, some file types, such as image files and text files, produced outputs bigger than the expected sizes, as shown in Table 3. Moreover, we studied the effect on file size when using the ASCII-table random key without compressing the data; the numerical results of this experiment are shown in Table 4.
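The dictionary effect can be reproduced with a toy Huffman code table: the table that must travel with the file grows with the number of distinct symbols, so a key drawn from the full byte range inflates it far more than a digits-only key. The sketch below is illustrative Python, not the authors' VB.Net code:

```python
import heapq
import secrets
import string
from collections import Counter

def huffman_codebook(data: bytes) -> dict:
    # Classic heap-based Huffman construction; returns {symbol: bit string}.
    # This code table is the "dictionary" stored alongside the compressed data.
    heap = [[freq, [sym, ""]] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # extend codes on the low branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]   # extend codes on the high branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

text = b"plain text to be encrypted " * 40
digit_key = "".join(secrets.choice(string.digits) for _ in range(len(text))).encode()
byte_key = secrets.token_bytes(len(text))         # key over the full 256-symbol range
print(len(huffman_codebook(text + digit_key)))    # small alphabet -> small table
print(len(huffman_codebook(text + byte_key)))     # large alphabet -> much bigger table
```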

Encryption and decryption time study
In this subsection, we study the encryption time (Et) and the decryption time (Dt). The Et is used to compute the throughput of encryption (TE), which is calculated as follows:

TE = Tp / Et    (2)

where Tp denotes the total plain text (file size) in kilobytes (KB) and Et denotes the encryption time in milliseconds (ms).
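A worked instance of formula (2), with hypothetical numbers chosen only for illustration:

```python
def encryption_throughput(tp_kb: float, et_ms: float) -> float:
    """TE = Tp / Et, in KB per millisecond."""
    return tp_kb / et_ms

print(encryption_throughput(1024, 250))  # a 1024 KB file encrypted in 250 ms -> 4.096 KB/ms
```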
The time consumed by the encryption process is shown in Table 5. Selecting the encryption option with compression takes longer because the compression algorithm itself takes time; the larger the file, the more time is taken, and vice versa. We also note that generating the cryptographic key from integers only takes much less time, with savings sometimes exceeding 50% compared with the ASCII-table random key option. When encrypting data without compression, however, the results were very close, and the file size had little effect on the encryption time.
Overall, the throughput of encryption without compression is higher than the throughput of encryption with an integer key and compression. The time consumed by the decryption process is shown in Table 6. Data encrypted with the compression algorithm took longer to decode than data encrypted without compression. Data compressed with a cryptographic key generated from integers only took much less time during decryption, with savings sometimes reaching 85% compared with data encrypted using a key generated from ASCII-code symbols. When decrypting data without compression, the file size does not have a significant impact on the decryption time.

CONCLUSION
Cryptography is a multidisciplinary topic that plays a vital role in many applications, such as network security and the privacy of information and communications. A cryptosystem becomes worthless if poorly managed and improperly implemented. This paper therefore proposed a practical methodology for file cryptography based on the one-time pad algorithm. The methodology overcomes the encryption-key management problem of the Vernam algorithm by controlling the data type of the encryption key. The Huffman algorithm reduced the size of the output file, and the output file was protected with a password encrypted by the AES algorithm, which increases the difficulty of breaking the encrypted output file. Experiments on various types of files (txt, pdf, doc, bmp, mp4, exe) were carried out successfully without losing any information. Furthermore, the results showed that the time consumed to encrypt and decrypt a file compressed with a cryptographic key generated from integers is less than with a key generated from the ASCII table, whereas the file size had little effect on encryption time without compression. Since cryptosystems play a vital role in many applications, future work will explore the design of a secure cryptosystem based on the long encryption key of the one-time pad.