Implementation of message authentication code using DNA-LCG key and a novel hash algorithm

ABSTRACT


INTRODUCTION
Security aspects come into the picture when information must be protected from an adversary who may threaten its authenticity, confidentiality or integrity [1]. Techniques such as digital signatures are used for authentication, encryption provides data confidentiality, and data integrity is preserved using a message authentication code (MAC) [2]. MACs fall into two categories: those based on a cryptographic hash function, known as HMAC [2], and those based on a symmetric block cipher, known as CMAC. Most hash algorithms use a compression function, which is a combination of binary and logical operations [2].
The composition of a MAC algorithm is expressed in (1). It takes a message and a secret key as input and produces an authentication code as output. The integrity of the received message is verified at the receiver end: since the recipient possesses the secret key, it can generate the authentication code again and compare it with the one received. A popular form of MAC [2] uses a cryptographic hash function; the secret key can either be given to the hash as an input along with the message, or be embedded in an existing hash algorithm.
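To make the generate-and-verify flow concrete, here is a minimal Python sketch using the standard library's HMAC with SHA-256. It stands in for a generic hash-based MAC and is not the RMAC construction proposed later; the function names and the example key are hypothetical.

```python
import hashlib
import hmac


def generate_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute the authentication code for a message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Receiver side: recompute the code with the shared key and compare it
    with the received one (constant-time comparison)."""
    return hmac.compare_digest(generate_tag(key, message), received_tag)


if __name__ == "__main__":
    shared_key = b"hypothetical shared key"
    msg = b"sensor reading: 37.2"
    tag = generate_tag(shared_key, msg)
    print(verify_tag(shared_key, msg, tag))                     # True
    print(verify_tag(shared_key, b"sensor reading: 39.9", tag))  # False: altered message
```

Using a constant-time comparison at the receiver avoids leaking, through timing, how many leading bytes of a forged tag were correct.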
Numerous data security protocols have been reported over the last few decades [2]. An implementation of the Advanced Encryption Standard (AES) algorithm on a microcontroller for securing data in a small-scale network has been presented in [3]. Sofia [4] presents a biometric-based MAC algorithm for wireless body area networks (WBANs), which can be applied to assure data authenticity and integrity in such networks. Dilli and Chandra [6] presented another scheme that uses the HMAC-SHA-256 algorithm for message authentication and data integrity. Hans, Christian and Ulrich implemented a heterogeneous, flexible computing platform for network nodes, namely the Universal MAC (UMAC), using universal hashing [3]. Verma and Prajapati [6] present a novel SHA with a shorter execution time and a better bit-difference value, which can be implemented to increase security.
Security is a broad term that covers three requirements: authenticity, confidentiality and integrity. Authenticity ensures that the received message is authorized and comes from the intended party, confidentiality prevents unauthorized access to the data, and integrity detects whether the data contents have been altered by an unauthorized party [3]. Data integrity is usually provided by combining a shared key with a hash algorithm to form a MAC. In this scenario, the data sent by the source carries an appended tag, the output of the MAC function, known as a Message Digest (MD). At the receiver end, the MAC is recomputed and compared with the received value. If the two values match, the received message is concluded to be unaltered; otherwise it has been modified. This relies on the basic principle of a MAC: the MAC value is a practically unique representation of the data, i.e. it is computationally infeasible to find two different messages that produce the same MAC under the same key. MACs constructed from a block cipher such as DES are called cipher-based MAC schemes, and those formed from a cryptographic hash function such as SHA are called HMAC schemes [8].
Considering the significance of data integrity, a new MAC scheme, RMAC, is proposed; it uses the biological features of the user and a linear congruential generator (LCG) sequence as the key, along with a novel hash algorithm. The algorithm integrates a crypto-hash function with a biometric key generation technique based on DNA characteristics and an LCG sequence [9].

PROPOSED ALGORITHM
Data integrity is preserved using a MAC, which is a function of a secret key and a hash algorithm. RMAC uses a novel hash algorithm that follows the basic structure of SHA-160 (SHA-1) with an 'f' function integrated into it, together with a secret key produced from a DNA sequence and an LCG pseudo-random output sequence. The details are explained in the following subsections.

Novel hash algorithm
The novel hash algorithm used in this scheme embeds an 'f' function in the basic structure of SHA-160, which consists of 80 rounds; for every group of 20 rounds, a constant 'K' is used as an input. There are therefore four 'K' values in total, each an eight-digit hexadecimal (32-bit) value. The MD produced is 160 bits long. The 'f' function used in the algorithm consists of three operations, Expansion (EXP), S-Box substitution (S) and modulo 2^48 addition (+), applied to the five register values (A, B, C, D, E) [10]. The structure of the proposed hash algorithm is shown in Figure 1.
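The SHA-160 skeleton that the proposed algorithm builds on can be sketched as follows. This is a minimal Python sketch of standard SHA-1 in which the four round constants are exposed as a parameter `k`, so that they can be swapped out. It uses the standard SHA-1 round functions; the proposed 'f' function (EXP, S-Box and modulo 2^48 addition) is not specified here in enough detail to reproduce, so it is deliberately left out. The names `sha160` and `SHA1_K` are hypothetical.

```python
import hashlib
import struct

# Standard SHA-1 round constants, one per group of 20 rounds.
SHA1_K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)
MASK32 = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha160(message: bytes, k=SHA1_K) -> bytes:
    """SHA-1-style 160-bit hash with replaceable round constants k[0..3]."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Pad: 0x80, zeros up to 56 mod 64 bytes, then the 64-bit message bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        # Expand the 16 message words of this block into 80 schedule words.
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h  # the five registers A..E
        for t in range(80):
            if t < 20:
                f = (b & c) | (~b & d)
            elif t < 40 or t >= 60:
                f = b ^ c ^ d
            else:
                f = (b & c) | (b & d) | (c & d)
            # k[t // 20] selects the constant for the current 20-round group.
            temp = (_rotl(a, 5) + f + e + k[t // 20] + w[t]) & MASK32
            a, b, c, d, e = temp, a, _rotl(b, 30), c, d
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)


if __name__ == "__main__":
    # With the default constants the sketch should match standard SHA-1.
    assert sha160(b"abc") == hashlib.sha1(b"abc").digest()
    print(sha160(b"abc").hex())
```

Keeping the constants as an argument is what lets a keyed variant reuse the same compression loop: passing four different 32-bit values changes every digest while the round structure stays untouched.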

DNA-LCG based secret key
The proposed hash algorithm is applied to the input message along with the secret key. The secret key used in the presented technique is derived from the characteristics of DNA. DNA is represented as a sequence of the characters 'a', 'g', 'c' and 't', whose arrangement follows a pattern unique to every individual. This uniqueness makes a DNA-derived key difficult to replicate or steal.
To enhance the strength of the secret key, a linear congruential generator (LCG), a pseudo-random number generator, is used to produce a 256-bit output sequence from a secret seed value. The DNA sequence is converted into its binary form, and an exclusive-or operation is applied between the binary DNA sequence and the LCG sequence. The result is a 256-bit key, which is used in the formation of RMAC [9].
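A minimal Python sketch of this key derivation follows. The text does not state how bases are mapped to bits or which LCG parameters are used, so both are assumptions here: a 2-bit encoding (a=00, c=01, g=10, t=11), and the commonly cited Numerical Recipes parameters a = 1664525, c = 1013904223, m = 2^32, with eight successive 32-bit outputs concatenated into 256 bits. A 128-base sequence then yields exactly 256 bits. All names are hypothetical.

```python
# Assumed 2-bit encoding of the nucleotide bases (not specified in the text).
BASE_BITS = {"a": 0b00, "c": 0b01, "g": 0b10, "t": 0b11}

# Assumed LCG parameters (Numerical Recipes values, modulus 2^32).
LCG_A, LCG_C, LCG_M = 1664525, 1013904223, 2**32


def dna_to_int(dna: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in dna.lower():
        value = (value << 2) | BASE_BITS[base]
    return value


def lcg_bits(seed: int, n_bits: int = 256) -> int:
    """Concatenate successive 32-bit LCG outputs into an n_bits-wide integer."""
    state, out = seed % LCG_M, 0
    for _ in range(n_bits // 32):
        state = (LCG_A * state + LCG_C) % LCG_M
        out = (out << 32) | state
    return out


def dna_lcg_key(dna: str, seed: int) -> int:
    """XOR a 128-base (256-bit) DNA sequence with a 256-bit LCG sequence."""
    if len(dna) != 128:
        raise ValueError("expected 128 bases (256 bits)")
    return dna_to_int(dna) ^ lcg_bits(seed)


if __name__ == "__main__":
    key = dna_lcg_key("acgt" * 32, seed=12345)  # hypothetical DNA and seed
    print(f"{key:064x}")
```

Because XOR is its own inverse, anyone holding the seed can strip the LCG layer; the sketch only illustrates the combination step described above, not its security properties.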

Formation of RMAC
A hash-based MAC is also known as a keyed hash, i.e. a hash algorithm that requires a secret key to operate. The generated DNA-LCG key is 256 bits long, and in order to use it in RMAC it must be converted into four 32-bit keys. This conversion is done using a few operations on the 256-bit key, which are explained in Table 1. Figure 2 explains the operations applied to the DNA-LCG key to obtain the four keys used in the MAC; in the listing below, Y is the 256-bit DNA-LCG key stored as a binary row vector:

    % split the key into 8 parts of 32 bits each
    for i = 0:7
        x(i+1,:) = Y(32*i+1 : 32*(i+1));
    end
    % apply exclusive-or between each consecutive pair, forming four
    % 32-bit sequences, and then complement the results
    for j = 1:4
        k(j,:) = ~xor(x(2*j-1,:), x(2*j,:));
    end
    % k holds the four keys; they are converted into hexadecimal form
    % before being used in the MAC

The four final keys in hexadecimal form are shown in Table 2. These four keys replace the four 32-bit constant values used in the original SHA-160 [10], which yields the RMAC algorithm. The MAC values obtained using the proposed technique are presented in Table 3. The obtained MAC values are converted into binary form and evaluated against randomness and avalanche criteria.
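The splitting procedure can also be sketched in Python, treating the 256-bit key as an integer whose most significant 32 bits correspond to Y(1:32). The function name `split_key` is hypothetical.

```python
def split_key(key256: int) -> list[str]:
    """Derive four 32-bit hex keys from a 256-bit key.

    Steps: split into eight 32-bit words (most significant first), XOR each
    consecutive pair, complement the 32-bit result, format as 8 hex digits.
    """
    if not 0 <= key256 < 2**256:
        raise ValueError("key must fit in 256 bits")
    mask32 = 0xFFFFFFFF
    words = [(key256 >> (32 * (7 - i))) & mask32 for i in range(8)]
    keys = []
    for j in range(4):
        combined = ~(words[2 * j] ^ words[2 * j + 1]) & mask32  # complemented XOR
        keys.append(f"{combined:08X}")
    return keys


if __name__ == "__main__":
    print(split_key((0x12345678 << 192) | (0x0F0F0F0F << 160)))
```

Each returned string could be parsed with `int(key, 16)` to obtain a 32-bit integer of the same width as a SHA-160 round constant.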

RESULTS AND DISCUSSIONS
The performance of RMAC is analyzed on the basis of the NIST tests of randomness and the avalanche criterion. This is done for three input values of varying lengths. The NIST tests compute a P-value for a binary sequence, which must be greater than 0.01 for the sequence to be declared random [12]. The simulations have been carried out in MATLAB. The values computed for RMAC under the various tests indicate the efficiency of the proposed technique and its applicability in practical scenarios. To certify the efficiency of the presented scheme, the NIST results of RMAC are compared with those of traditional schemes. The eight-digit hexadecimal key used for these algorithms is '3A54E26B', which is kept constant for all the traditional techniques. A brief overview of the NIST tests used is given below.

a. Frequency Test
This test examines the proportion of ones and zeros in a sequence and checks how close the number of ones is to the number of zeros. A sequence is random if the two counts are close to each other [12]. The results in Table 4 show that the proposed algorithm yields a closer balance between ones and zeros than most of the other schemes.

b. Binary Derivative Test
This test repeatedly applies the exclusive-or operation between consecutive bits of a sequence, each pass producing a sequence one bit shorter, until only one bit is left. The ratio of the number of ones to the length of each resulting sequence is then computed. Finally, the average of these ratios is calculated; if it lies near 0.5, the sequence is said to be random [12]. The results in Table 5 indicate that the output of the proposed scheme is random.

Table 6.

The purpose of the following test is to examine the frequency of all the overlapping bit patterns in the sequence. It compares the frequency of overlapping blocks of two consecutive lengths with the expected outcome for a random sequence. The results are given in Table 7.

Table 8.
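Two of these tests can be sketched in Python for illustration. The frequency test follows the monobit test of NIST SP 800-22, whose P-value is erfc(|S_n| / sqrt(2n)), where S_n adds +1 for each one and -1 for each zero; the binary derivative function follows the description above and assumes the ratio is taken over every derived sequence, excluding the original. Function names are hypothetical, and this is not the simulation code used to produce Tables 4 and 5.

```python
import math


def frequency_test_p_value(bits: str) -> float:
    """NIST SP 800-22 frequency (monobit) test on a string of '0'/'1' characters."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty sequence")
    s_n = sum(1 if b == "1" else -1 for b in bits)  # +1 per one, -1 per zero
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def binary_derivative_ratio(bits: str) -> float:
    """Average proportion of ones over the successive binary derivatives.

    Each derivative XORs every pair of consecutive bits, so it is one bit
    shorter than the previous sequence; derivation stops at a single bit.
    """
    if len(bits) < 2:
        raise ValueError("need at least two bits")
    seq = [int(b) for b in bits]
    ratios = []
    while len(seq) > 1:
        seq = [x ^ y for x, y in zip(seq, seq[1:])]
        ratios.append(sum(seq) / len(seq))
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = "1011010101"
    p = frequency_test_p_value(sample)
    print(f"frequency test P-value: {p:.6f} ->", "random" if p > 0.01 else "non-random")
    print(f"binary derivative average: {binary_derivative_ratio(sample):.3f}")
```

For the sequence 1011010101, the worked example in SP 800-22, the frequency test gives a P-value of about 0.527, well above the 0.01 threshold.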
As observed from Tables 4 to 8, RMAC passes the NIST criteria for generating a random MAC and performs better than the compared schemes, indicating its efficiency as a MAC technique. The purpose of a MAC is to preserve data integrity and to reliably detect any change in the message [13]. Since a MAC value is practically unique to particular data content, any alteration of the data file changes the corresponding MAC value [14]. To study this property, another test, the Avalanche Test, has been applied to the RMAC values. This test calculates the avalanche effect, i.e. the change in the output with respect to a change in the input, using the formula given in (2). The input consists of 128 bits.

$$\text{Avalanche effect} = \frac{\text{number of flipped bits in the output}}{\text{total number of bits in the output}} \times 100\% \qquad (2)$$

The higher the avalanche effect, up to the ideal of about 50% of the output bits changing, the better the diffusion and hence the efficiency of the algorithm. This test has been applied by altering a single character of the input value. The Avalanche Test results are summarized in Table 9. RMAC performs well under this criterion too, demonstrating its efficiency. As observed from the avalanche test analysis, a small change in the input results in a major change in the output; hence, even a minute modification of the data would be detected. The increased complexity of RMAC makes it highly resistant to various network attacks on data integrity, which increases its applicability in data-sensitive environments; a brief analysis of the behaviour of the technique under these attacks is presented in Table 10 [15].
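The bit-difference computation in (2) can be sketched in Python as below. Because the RMAC hash is not fully specified in the text, the example uses HMAC-SHA-256 from the standard library as a stand-in; the 16-character (128-bit) input strings are hypothetical, and using the eight-digit value '3A54E26B' quoted for the traditional schemes as an ASCII-byte key is an assumption.

```python
import hashlib
import hmac


def bit_difference_percent(out1: bytes, out2: bytes) -> float:
    """Avalanche effect per (2): flipped output bits / total output bits x 100."""
    if len(out1) != len(out2) or not out1:
        raise ValueError("outputs must be non-empty and of the same length")
    flipped = sum(bin(x ^ y).count("1") for x, y in zip(out1, out2))
    return 100.0 * flipped / (8 * len(out1))


if __name__ == "__main__":
    key = b"3A54E26B"                # quoted test key, taken as ASCII bytes (assumption)
    original = b"ABCDEFGHIJKLMNOP"   # hypothetical 16-character (128-bit) input
    altered = b"ABCDEFGHIJKLMNOQ"    # same input with the last character changed
    mac1 = hmac.new(key, original, hashlib.sha256).digest()  # stand-in for RMAC
    mac2 = hmac.new(key, altered, hashlib.sha256).digest()
    print(f"avalanche effect: {bit_difference_percent(mac1, mac2):.2f}%")
```

For a well-diffusing MAC the printed value should land near 50%, since each output bit then flips with probability about one half.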

Data Diddling Attacks
Since the proposed scheme uses biological characteristics to form the secret key, an unauthorized party who does not hold the key cannot modify the data without the change being detected.

Man-in-the-Middle Attacks
Since the proposed technique is based on a keyed hash, it is highly resistant to unauthorized alterations of the transmitted data.

Seed Attacks
Keys generated using only the output of a random number generator are susceptible to seed attacks; the key used in the proposed MAC fuses the DNA sequence with the random number generator output, which increases its resistance to seed attacks.

CONCLUSION
This paper presents an efficient MAC technique built from a novel hash algorithm and a secret key generated using DNA and an LCG. A MAC, also known as a cryptographic checksum, is an authentication technique that uses a hash function together with a secret key to preserve data integrity and validate the source. RMAC uses biometric characteristics along with a novel hash algorithm to form the MAC, thus increasing its efficiency. The analysis of the results shows that the proposed algorithm has higher complexity than most of the other schemes and performs better than most existing HMAC schemes, such as those based on MD2, MD5, SHA-160, SHA-256, SHA-384 and SHA-512. Because the secret key involves the DNA characteristics of the user, the scheme is considerably more reliable and resistant to attacks. The proposed RMAC scheme can be used effectively in various cryptographic techniques for data integrity and better security.