DNA computing based stream cipher for internet of things using MQTT protocol

ABSTRACT


INTRODUCTION
Devices all around us, in houses, buildings, cities, and even in our bodies, can sense or generate data for various applications of daily life such as health care, environmental monitoring, military, and industry. When these devices communicate and share information with one another over a distributed area through the internet, they constitute an Internet of Things (IoT) application. Hence, an IoT device is able to communicate, upload, and download information through the internet without human intervention. In other words, the devices are capable of thinking and making decisions. Along with the rapid development of IoT applications, security in IoT has become a crucial issue, since threats aim to exploit possible weaknesses [1,2]. In IoT, security is divided into two parts. First, an authentication and authorization mechanism is required to secure the communication network and protect it from any intruder device that could send or receive information in the network. Second, the information itself should be secured by means of encryption techniques. Securing device data is therefore possible on the basis of different cryptography algorithms. Cryptography is mainly used to secure information by sharing a secret key among different devices. Two types of key are available: symmetric and asymmetric [3,4].
In symmetric cryptography, the same key is used on both sides, sender and receiver, while in asymmetric cryptography two different keys are used. IoT deals with real-time data, so timing can be critical, and the size of the data is an important metric too. For some applications, such as environmental monitoring, sampling time is not very critical since data could be collected every minute or every few hours, whereas in traffic monitoring or healthcare it is critical. Uploading or downloading a small amount of data does not require very high internet bandwidth, and vice versa. Cryptography may change the type or size of the data, depending on the algorithm used, so that an intruder cannot identify the original data. Therefore, the algorithm used for data encryption in IoT should be chosen carefully so that it does not overload the bandwidth or affect the real-time application, which can lead to poor device performance. The typical security requirements of an IoT system can be classified into the following terms: access control, authentication, privacy protection, communication security, data integrity and confidentiality, and availability [5].

LITERATURE SURVEY
Security is a critical issue in IoT applications since the data is available over the internet; therefore, more development is required in this field of research. Until now, there has been no clear security platform for IoT. Ibrahim et al. [6] propose a DNA computing encryption algorithm which uses amino acid coding to eliminate the one time pad limitation. Aieh et al. [7] propose a deoxyribonucleic acid (DNA) key sharing technique using the Diffie-Hellman algorithm with symmetric cryptography. An encryption technique has also been proposed by Anwar et al. [8], which uses symmetric key exchange, DNA computing hybridization, and the one time pad technique. Mektoubi et al. [9] propose an MQTT-based protocol for secure communication of data and key exchange in IoT networks. Bhawiyuga et al. [10] propose an authentication token for the MQTT protocol which has been implemented on a constrained device. Begum et al. [11] propose a hybrid cryptography algorithm using One Time Pad, RSA, and DNA computing for text hiding and protection from attackers. Huang et al. [12] propose a publish-subscribe pattern to preserve privacy in fog computing using the CoAP application protocol. Andy et al. [13] discuss the implementation of an adequate security mechanism for IoT. Wardana and Perdana et al. [14] propose an access control security system for IoT which uses the MQTT protocol for communication and a fog computing architecture.

IOT PROTOCOLS
IoT protocols are divided into four basic categories: application, service discovery, infrastructure, and other influential protocols. Table 1 is an open standard application layer protocol for the IoT focusing on message-oriented environments. It supports reliable communication via message delivery guarantee primitives including at-most-once, at-least-once, and exactly-once delivery [15,16].

MQTT PROTOCOL
The Message Queuing Telemetry Transport (MQTT) protocol is a machine-to-machine (M2M) protocol which runs over TCP/IP. It uses a publish/subscribe model between IoT nodes. A broker (cloud server) is the station to which publisher nodes send their messages under a specific topic, and which client nodes check for these topics. A node may subscribe to some topics and not to others, and other nodes can publish to a specific topic. For instance, if a node publishes to a topic, then every node subscribed to that topic receives the message, while nodes that are not subscribed to that topic do not receive it [18,19]. In this work, all messages transferred between IoT nodes are encrypted on the publisher side and decrypted on the subscriber side using the One Time Pad (OTP) technique and DNA computing. Figure 1 shows a schematic diagram of the MQTT protocol.
Figure 1. Schematic diagram of the MQTT protocol
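The publish/subscribe routing described above can be illustrated with a minimal in-memory sketch in Python. It is not an MQTT implementation and uses neither a network nor an MQTT library; the class `ToyBroker`, its methods, and the topic names are hypothetical, chosen only to show that a message published to a topic reaches exactly the nodes subscribed to that topic.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (exact topic match only)."""

    def __init__(self):
        # Maps a topic name to the list of subscriber callbacks.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback(topic, payload) for messages on this topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber of topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


# Usage: two subscriber nodes, each interested in a different topic.
broker = ToyBroker()
temperature_inbox, humidity_inbox = [], []
broker.subscribe("home/temperature", lambda t, p: temperature_inbox.append(p))
broker.subscribe("home/humidity", lambda t, p: humidity_inbox.append(p))

# Only the temperature subscriber receives this message.
broker.publish("home/temperature", "23.5")
```

In the scheme described in this work, the payload handed to the publish call would be the encrypted message, and the subscriber callback would decrypt it.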

One time pad
The one time pad is one of the most secure encryption techniques, since each key is used only once for each message. Each single piece of data is encrypted individually with a unique key. The disadvantage of this powerful method is that it requires a huge number of keys; a Pseudo Random Number Generator (PRNG) can therefore be used to generate the keys, but key repetition is then a problem [20]. In this work, a Linear Feedback Shift Register (LFSR) has been used to generate a series of keys according to the required polynomial and number of bits. These keys are joined to form a single key with a length equal to the length (in binary) of the original message. To improve the strength of the encryption algorithm, DNA computing has been used to encode the messages. The one time pad technique is easy to implement through the following encryption steps. The original plain text message is as follows [21]:
$\text{Message} = m_i = m_1, m_2, m_3, \ldots, m_n$
The key sequence generated by the PRNG is:
$\text{Pad} = k_i = k_1, k_2, k_3, \ldots, k_n$
Then the cipher text is as follows, where $\oplus$ denotes bitwise XOR:
$\text{Cipher} = c_i = m_i \oplus k_i$
To decrypt the cipher on the receiver side, the following function is used:
$m_i = c_i \oplus k_i$
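A one time pad over binary data is commonly realized as a bitwise XOR between the message and a pad of the same length, and applying the same XOR again recovers the message. The following Python sketch illustrates this generic form only. The function name `otp_xor` is hypothetical, and the pad here is drawn from the standard library `secrets` module as an assumption for illustration, not from the LFSR used in this work.

```python
import secrets


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad of equal length (same call encrypts and decrypts)."""
    if len(pad) != len(data):
        raise ValueError("pad must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, pad))


message = b"hello world"
pad = secrets.token_bytes(len(message))  # assumed key source, used once only

cipher = otp_xor(message, pad)       # c_i = m_i XOR k_i
recovered = otp_xor(cipher, pad)     # m_i = c_i XOR k_i
```

Because XOR is its own inverse, the receiver needs only the same pad; reusing a pad for a second message is what breaks the scheme.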

Genomic based cryptography
To improve the strength of the encryption, DNA computing has been implemented. Deoxyribonucleic Acid (DNA) is a biochemical macromolecule which contains the genetic information necessary for living beings. A DNA molecule consists of two single-stranded chains twisted together into a double helix, with hydrogen bonds between the bases A-T and G-C. Four kinds of bases are found in the strands: Adenine (A), Guanine (G), Thymine (T), and Cytosine (C), as shown in Figure 2. DNA-based cryptography algorithms have satisfactory results in terms of security and performance. Key features of DNA, such as large storage capacity and uniqueness, provide more security to DNA-based cryptography algorithms [22,23]. Tables 2 and 3 show the DNA addition and subtraction rules, where the addition rules are used in the encryption process and the subtraction rules in the decryption process.
Table 2. Addition operation for the DNA sequence
Table 3. Subtraction operation for the DNA sequence

Linear feedback shift register (LFSR)
A random number generator has been used to generate a large number of keys. An n-length LFSR consists of n flip-flops, numbered 0, 1, 2, ..., n-1, each of which can store a single bit. Figure 3 shows a 16-bit LFSR whose characteristic polynomial is $x^{16} + x^{15} + x^{13} + x^{4} + 1$ [24,25]. The LFSR generates a 16-bit key with each iteration. When the register returns to the seed value, the keys repeat. The algorithm that generates the key sequence is applied first; then another algorithm combines these 16-bit keys into a single binary key with the same size as the original binary plain text message (after the message has been converted into its ASCII code values). By doing so, each message has a key value that differs from that of other messages depending on its size (length in bits).
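A minimal Python sketch of such a key generator is given below. It assumes a Fibonacci (external XOR) register shifted to the right, with the taps 16, 15, 13, and 4 of the polynomial mapped to bit positions 0, 1, 3, and 12; the wiring of Figure 3 may differ. The seed value, the function names, and the choice to combine the 16-bit keys by simple concatenation and truncation are also assumptions, since the combining algorithm is not specified in the text.

```python
def lfsr16_states(seed: int):
    """Yield successive 16-bit states for x^16 + x^15 + x^13 + x^4 + 1."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero, or the register stays at 0")
    while True:
        # Feedback bit: XOR of taps 16, 15, 13, 4 (bit positions 0, 1, 3, 12).
        bit = (state ^ (state >> 1) ^ (state >> 3) ^ (state >> 12)) & 1
        state = (state >> 1) | (bit << 15)
        yield state


def key_bits(seed: int, n_bits: int) -> str:
    """Concatenate 16-bit LFSR outputs and truncate to n_bits (assumed rule)."""
    generator = lfsr16_states(seed)
    chunks = []
    while 16 * len(chunks) < n_bits:
        chunks.append(format(next(generator), "016b"))
    return "".join(chunks)[:n_bits]


# Usage: a key as long as an 11-character message at 16 bits per character.
SEED = 0xACE1  # hypothetical seed, not taken from the paper
key = key_bits(SEED, 11 * 16)
```

The same seed always yields the same key, so publisher and subscriber can regenerate identical keys as long as they share the seed and the message length.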

PROPOSED ALGORITHM
In this work, the messages transferred between IoT nodes through the MQTT protocol are encrypted and decrypted using one time pad and DNA computing techniques. A message (plain text) generated by the publisher node is encrypted, and the receiver node (subscriber) decrypts it to retrieve the original message. Figure 4 shows a schematic diagram of the proposed system.

RESULTS
The encryption works in the following steps:
1) Convert the plain text into binary form, using 16 bits per character. For example, the message "hello world" is converted to:
00000000011010000000000001100101000000000110110000000000011011000000000001101111000000000010000000000000011101110000000001101111000000000111001000000000011011000000000001100100
2) Encode the binary message such that each pair of bits denotes a base according to its value, where A=00, T=01, C=10, G=11. Then the DNA message is:
AAAATCCAAAAATCTTAAAATCGAAAAATCGAAAAATCGGAAAAACAAAAAATGTGAAAATCGGAAAATGACAAAATCGAAAAATCTA
3) Generate a pseudo-random sequence using the 16-bit LFSR, which produces an array in which each element is a 16-bit binary number. In this step, an algorithm combines these numbers into a binary sequence with a length equal to the length of the original binary plain text message:
1000000000001011100000000000101101000000000001010100000000000110100000000000100101000000000011001010000000001110010100000000111100101000000011110010100000010111100101
4) The binary key is also encoded into a DNA sequence in the same manner as in step 2:
CAAAAACGCAAAAACGTAAAAATTTAAAAATCCAAAAACTTAAAAAGACCAAAAGCTTAAAAGGACCAAAGGACCAATTGCTTAATCG
5) Applying Table 2 (addition rules) to the DNA message and the DNA key gives the DNA sequence:
CAAATCAGCAAATCGCTAAATCCTTAAATCCCCAAATCTCTAAAACGACCAATGCTTTAAT
6) A new binary key is generated using the LFSR, with a length equal to that of the DNA sequence generated in the steps above. If a bit in this key is 0, then the corresponding base is inverted (A is swapped with T, and G with C):
11011100000000000110110000000000011010000000000011100000000000110001000000000001000
7) The final sequence is the cipher message that is sent by the publisher node:
CATATCTCGTTTAGCGAAATTCGAATTTAGGGGAATTGAGATTTTGCTCCATACGAAATTA
The decryption process is the reverse of the encryption process, but Table 3 (subtraction rules) is used instead of Table 2 (addition rules).
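The encryption steps above can be sketched in Python as follows. Tables 2 and 3 are not reproduced in this text, so the sketch assumes that the DNA addition rule is a bitwise XOR of the 2-bit base codes; this assumption is consistent with the example sequences shown above, and under it the subtraction rule is the same operation. All function names are hypothetical, and characters are assumed to be encoded with 16 bits each, as the length of the example DNA message suggests. The DNA key and the binary inversion key are passed in as arguments rather than generated, so the sketch does not depend on a particular LFSR implementation.

```python
BASE_OF = {"00": "A", "01": "T", "10": "C", "11": "G"}   # coding from step 2
BITS_OF = {base: bits for bits, base in BASE_OF.items()}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}    # inversion in step 6


def text_to_bits(text: str, width: int = 16) -> str:
    """Step 1: each character becomes a fixed-width binary string."""
    return "".join(format(ord(ch), f"0{width}b") for ch in text)


def bits_to_dna(bits: str) -> str:
    """Steps 2 and 4: map each pair of bits to one base."""
    return "".join(BASE_OF[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Inverse of bits_to_dna."""
    return "".join(BITS_OF[base] for base in dna)


def dna_add(left: str, right: str) -> str:
    """Step 5, assuming the addition rule is XOR of the 2-bit codes."""
    out = []
    for a, b in zip(left, right):
        value = int(BITS_OF[a], 2) ^ int(BITS_OF[b], 2)
        out.append(BASE_OF[format(value, "02b")])
    return "".join(out)


def apply_mask(dna: str, mask_bits: str) -> str:
    """Step 6: complement a base wherever the matching key bit is 0."""
    return "".join(COMPLEMENT[b] if m == "0" else b
                   for b, m in zip(dna, mask_bits))


def encrypt(text: str, key_dna: str, mask_bits: str) -> str:
    """Steps 1 to 7 with externally supplied keys."""
    message_dna = bits_to_dna(text_to_bits(text))
    return apply_mask(dna_add(message_dna, key_dna), mask_bits)


def decrypt(cipher_dna: str, key_dna: str, mask_bits: str) -> str:
    """Reverse order: undo the inversion, then undo the addition."""
    summed = apply_mask(cipher_dna, mask_bits)
    bits = dna_to_bits(dna_add(summed, key_dna))  # XOR is its own inverse
    return "".join(chr(int(bits[i:i + 16], 2)) for i in range(0, len(bits), 16))
```

Both keys must be at least as long as the DNA message (one base per key base, one bit per base for the inversion key); shorter keys would silently truncate the output because `zip` stops at the shorter input.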

ALGORITHM IMPLEMENTATION RESULTS
In Figure 5

CONCLUSION
Information security is one of the most critical and challenging issues in IoT applications, and it requires more attention from researchers. In this work, multi-level data encryption has been applied. First, the plain text message is encoded into a DNA sequence. Then DNA computing is applied between the encoded DNA message and the encoded DNA key by means of the DNA computing rules. Finally, another key sequence is generated by the LFSR with a different seed value, this time with a length equal to the length of the encrypted DNA message, and it is used to generate the cipher DNA message. With the final algorithm, the size of the cipher message is twice that of the original message.