An extensive research survey on data integrity and deduplication towards privacy in cloud storage

ABSTRACT

Distributed storage is one of the most significant services offered by cloud computing, yet ensuring data integrity, data privacy, and secure data deduplication over cloud storage remains an open challenge. This survey reviews recent research contributions towards each of these three problems, along with existing security auditing schemes, and highlights the open research issues found in them. It closes with a possible future line of research that jointly addresses data integrity and secure deduplication while preserving data privacy.

INTRODUCTION
Cloud computing offers various applications in the form of services, and distributed storage is one of its most significant contributions [1]. With a large-scale network of cloud clusters spread over global datacenters, it is now feasible to store data and make it accessible anytime [2]. The challenges associated with cloud computing include data availability, data access, data integrity, data location, and network load [3]. Apart from these, various other security threats challenge the cloud storage system. The primary challenge is to ensure data integrity, i.e., that access rights are exercised by legitimate members only. Therefore, a data integrity mechanism is usually implemented in the form of data validation. Various situations can leave data integrity vulnerable in a cloud storage system. At present, there are various data integrity schemes [4][5][6], but they suffer from i) lack of support for dynamic data, ii) no preservation of privacy, and iii) absence of error-correction codes for solving the data corruption issue. The majority of the existing data integrity schemes suffer from further problems; e.g., a limited number of secret keys is utilized in the verification process, so they are not applicable to large-scale data. Another problem of the existing system is that a user needs physical access to the complete data in order to generate a new security token, which makes such schemes inapplicable to larger file systems. In addition, the number of update operations, which is very important in security, is highly limited for the clients. Moreover, a scheme like scalable provable data possession does not support block insertion. Such data outsourcing mechanisms also raise privacy issues [7].
In the direction of data security, data deduplication is also frequently used for distributed data storage in the cloud. The prime task of the deduplication operation is to retain as much security information as possible while keeping storage space optimal [8]. Once the data is encrypted, it is subjected to a deduplication process that maintains security and manages redundant data. In order to offer a secure deduplication process, an encryption process is necessary [9, 10]: the replicated data is permuted with a specific secret key, where the elements of the replicated data are obtained by applying a conventional hash function. The client obtains the secret keys after the encryption process, and the encrypted data is then forwarded to the client. Conventional theory states that applying secure deduplication results in optimized channel capacity, better data reliability, improved performance, etc. However, this is very difficult to ascertain in practice, since various forms of threats are widely present over the cloud ecosystem whose attack behavior has never been studied in the past. At present, there is no standard model guaranteeing that data cannot be accessed by an illegitimate member, thereby breaching a distributed data storage system. Therefore, the present manuscript discusses recent research contributions towards data security over cloud storage systems in order to visualize the existing scenario. The core goal of this paper is to present the current state of existing solutions to security problems in the cloud storage system. Section 2 discusses the data integrity problem, while the data privacy problem is discussed in Section 3.
Research work towards data deduplication is covered in Section 4, followed by highlights of open research issues in Section 5. Section 6 outlines possible future research directions towards addressing the existing security problems in cloud storage. Finally, Section 7 concludes the paper.

STUDY TOWARDS DATA INTEGRITY PROBLEM
Data integrity is one of the primary security problems over the distributed storage system in the cloud ecosystem. The concept of data integrity lets the original user access and retain complete control over managing their intellectual property while barring illegitimate users. However, there is little evidence of this in practice. By replicating the data over distributed cloud servers, the service providers have the nearest access to such data. Therefore, there is always uncertainty about the ownership of the data from the security aspect, which places a question mark over data integrity in the cloud storage system. Various conventional mechanisms (Figure 1) have evolved to address the problem of data integrity over cloud storage systems, viz. i) provable data possession [11], ii) Message Authentication Codes integrated with the provable data possession scheme [11], iii) usage of symmetric encryption in the provable data possession scheme, and iv) Proof of Retrievability [11]. In the existing system, the data integrity problem is investigated by remotely accessing cloud storage units. However, such a mechanism of assessing data integrity is also challenging owing to the distributed nature of the cloud storage units.

The problem of data integrity is more complex in the area of Internet-of-Things (IoT), owing to its massive generation of data. Existing mechanisms are not functional in assuring IoT data integrity, as their applicability is restricted to a single data block. This problem is sorted out by a tree-based data structure design supporting the parallel update of multiple data blocks, as seen in the work of He et al. [12]. The authors have used a homomorphic encryption mechanism for seamless data transmission and for supporting an enhanced updating process. However, such schemes are quite ineffective against sensitive files whose integrity cannot be ascertained; a run-time check towards such forms of file system is also essential. A study in this direction is carried out by Shi et al. [13], where the integrity of such dynamic data can be verified. An effective resistance towards illegal access of files is constructed by tracking operations associated with cache and input-output.
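The tree-based parallel-update idea above is commonly realized with a Merkle hash tree, whose root acts as a compact integrity fingerprint: updating one block only requires rehashing the path from that leaf to the root rather than the whole file. The following is a minimal illustrative sketch of that principle, not the actual construction of He et al. [12]:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Compute the Merkle root over a list of data blocks."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# The verifier stores only the 32-byte root; any modification of any block
# changes the root, so integrity can be checked without the whole file.
blocks = [b"block0", b"block1", b"block2", b"block3"]
root = merkle_root(blocks)

blocks[2] = b"tampered"
assert merkle_root(blocks) != root      # any modification changes the root
```

In a full scheme the server would also return the sibling hashes along the leaf-to-root path, so the verifier can recompute the root from a single challenged block.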
Blockchain is another mechanism to offer data integrity, where the data is handled via a virtual machine. Zhao et al. [14] have constructed a network on the basis of blockchain, followed by developing a partially constructed block that is distributed to other nodes for ensuring data integrity. The technique also uses attribute-based encryption for further securing the network of the data block. Apart from this, an identity-based encryption mechanism is also reported to offer remote checking of data integrity, as seen in the work of Wang et al. [15]. Adoption of a data auditing mechanism also assists in identifying the state of data integrity. However, such mechanisms suffer from key management problems that leave open the possibility of intrusion in storage units. The work of Li et al. [16] has constructed an auditing model where fuzzy logic is used along with a secret sharing process to ascertain robust data integrity with fault tolerance. An auditing method to offer data integrity has been presented by Shao et al. [17], where a vehicular network is considered as a case study. The technique uses a tree-based structure with multiple branches to facilitate authentication, and it jointly uses a digital signature and a bilinear pairing scheme, because the bilinear scheme has been found to reduce the overhead of metadata generation, as claimed by Shuang et al. [18]. Apart from this, the usage of an enhanced signature scheme has also proven helpful for offering data integrity for multiple clients with the same data. Such work was carried out by Wang et al. [19], where a public verification process is presented with a data block being signed by multiple owners.
Essential information could also be in the form of a query system, which is currently found to be vulnerable in terms of authentication over outsourced cloud data. This problem has been addressed by Hu et al. [20], where a Voronoi-based scheme is introduced to understand the relationship between spatial data and the query system. Apart from data, service integrity is another problem over the cloud ecosystem when associated with the distributed architecture of the cloud. This problem has been solved by Du et al. [21], where a graph-based approach is adopted for identifying a malicious user followed by a quarantine operation. A graph-based approach towards distributed cloud storage has also been presented by Lu and Hu [22], where authentication is supported publicly by a Voronoi diagram over a graph along with an enhanced hash tree. The authors have also used a homomorphic validation scheme to ensure data integrity.
According to Chen and Lee [23], code regeneration is one effective mechanism to ensure fault tolerance over a distributed storage unit. A model has been developed that considers the mobility aspect of a Byzantine adversary and offers the client an enhanced capability to perform a remote check of data integrity using a mathematical model. A study towards remote checking of integrity has been carried out by Fan et al. [24] in order to protect the integrity proof using a non-conventional cryptographic handshaking mechanism. Adopting erasure coding while constructing a cloud storage system is also considered to protect data integrity; the integrity-checking scheme presented by Shen et al. [25] uses a homomorphic validation scheme. Adoption of a trust factor over the operational platform is another mechanism to address this problem. The approach of Du et al. [26] uses a virtualized platform where trust computation is carried out on access attempts over the cloud storage units. Apart from this, other popular existing schemes include the joint usage of identity and homomorphic encryption (Yu et al. [27]) and obfuscation-based approaches (Zhang et al. [28] and Zhu et al. [29]). These schemes address the data integrity problems with their specific cryptographic approaches; the next section briefs the schemes to protect the privacy factor.

STUDY TOWARDS DATA PRIVACY PROBLEM
Irrespective of the potential privileges of cloud storage in distributed form, there is always a privacy risk associated with the data. The primary reason for this is the high degree of dependency on third-party vendors to offer data security, which may not match the exact business demands, leading to loopholes in privacy. The root causes of privacy problems in the cloud are the following: ineffective control over the data (especially while file sharing is performed by a third party), illegitimate leakage of data (by the service provider as well as by malicious hackers), accessibility of the data/service by diversified devices (or service providers), higher risk of data interception over the internet, poor key management, and storage of user credentials over the cloud that can be fairly easily compromised. Therefore, there are various pitfalls of the existing system, which is not robust in protecting the privacy factor of the data stored over cloud storage units. In order to address the privacy problem, various research-based schemes and techniques have evolved. Among the various schemes (Figure 2), the encryption-based scheme is one potential scheme to resist an adversary leaking private data. The work carried out by Alabdulatif et al. [30] has used homomorphic encryption for retaining the privacy factor for sensor data reposited over the cloud. According to this scheme, the data is encrypted while being forwarded to the cloud servers. Apart from encryption, recent approaches have also witnessed the usage of watermarking towards strengthening data privacy. The work of Tang et al. [31] has utilized an adaptive watermarking scheme that is capable of encapsulating the data securely. The technique also uses Diffie-Hellman as a standard key-exchange mechanism for resisting replay attacks. The mechanism of data embedding is fixed while applying the adaptive watermarking operation.
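The Diffie-Hellman exchange mentioned above lets two parties agree on a shared secret over an open channel without ever transmitting that secret. The following textbook sketch uses a deliberately small 64-bit prime purely for illustration; real deployments use standardized groups of at least 2048 bits or elliptic-curve variants:

```python
import secrets

# Textbook Diffie-Hellman -- toy parameters for illustration only.
p = 18446744073709551557          # largest 64-bit prime; far too small for real use
g = 5                             # public generator

a = secrets.randbelow(p - 2) + 1  # client's private exponent (kept secret)
b = secrets.randbelow(p - 2) + 1  # server's private exponent (kept secret)
A = pow(g, a, p)                  # client sends A over the open channel
B = pow(g, b, p)                  # server sends B over the open channel

# Each side combines the other's public value with its own private exponent
# and derives the same shared secret.
assert pow(B, a, p) == pow(A, b, p)
```

In replay-resistance schemes like the one attributed to Tang et al. [31], the derived secret would additionally be bound to a fresh session nonce so that captured messages cannot be reused.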
The technique uses a consensus mechanism with a simplified challenge-response-based intrusion resistance technique for preventing replay attacks. Another recent work carried out by Du et al. [32] has used a symmetric encryption mechanism for resisting data leakage issues. The authors present an indexing mechanism that preserves privacy as well as protects multiple query processes, which is claimed to be resistant against keyword-based intrusion. However, the approach offers only forward privacy and not backward privacy, which is also essential. All these studies have been carried out with respect to hypothetical data and cannot be claimed to be secure if the data type is changed. This matters because various biometric-based applications run over the cloud system whose morphological information is protected in distributed storage units. The work carried out by Hu et al. [33] has used key agreement over a specific session as well as an encryption scheme for facilitating data privacy. The study has been implemented considering a fog computing environment where SHA-1 and the Advanced Encryption Standard (AES) are mainly used. Work in an equivalent direction adopting fog computing is also seen in the case study considered by Wang et al. [34]. According to the authors, the existing encryption techniques frequently adopted for data privacy are incapable of resisting threats within the cloud storage units. Therefore, a multilayered cloud storage system is formulated on fog computing. The technique also uses Hash-Solomon codes for splitting the data as well as for assisting in the decoding operation. Just like its capability to deal with the data integrity problem, the code regeneration technique is found to resist the data privacy problem too. By integrating an auditing scheme with a code regeneration approach, Liu et al. [35] have developed a system to ascertain robust data privacy.
Auditing schemes have also been found to offer a solution towards privacy protection. Unfortunately, existing privacy protection schemes are of limited help over the distributed nodes in the cloud. This problem has been discussed by Wang et al. [36], where a ring signature is utilized for constructing the verification metadata required to assess the appropriateness of distributed shared data. According to this scheme, the information connected with the user identity is kept private from other users without any dependency on the complete data.

User identity information is a highly distinctive attribute and can be used for protecting data integrity. Therefore, usage of user identity information integrated with a lightweight encryption scheme can be considered a good option for protecting data privacy. A study in this direction has been carried out by Yu et al. [37], where the authors have used user identity information integrated with the joint usage of key-based and homomorphic encryption mechanisms. The authors claim good control of computational complexity as well as reduced operational cost using this cryptographic approach. The authors also highlight that the frequently used public key infrastructure is not a good option, as it suffers from computational complexity. The technique further claims that data privacy is ensured without leaking any private information associated with the data stored over the cloud. The work of Li et al. [38] has developed an auditing scheme considering low-end computational devices. The technique uses a digital signature, and the mechanism offers better data dynamics with a wide range of support for batch auditing. A study towards facilitating the public assessment of data privacy is also carried out by Wang et al. [39]. According to the scheme, the verification of data privacy can be carried out without any need to access the original data content. Such claims are also offered in the work of Hao et al. [40]. Research Gap: The approaches towards ensuring data privacy have been discussed by various researchers, where the majority of the approaches share a common claim, i.e., ensuring data privacy without the verifier needing to access the original data. The next section briefs the data deduplication approach.

STUDY TOWARDS DATA DEDUPLICATION PROBLEM
Owing to the distributed nature of the cloud storage units and the presence of the virtualized environment, duplicated and redundant data always exists in multiple sources. Such presence of duplicated data results in error-prone query processing and could also result in a security breach over the cloud storage system. One of the recent techniques to mitigate this problem of redundant data is data deduplication, resulting in minimized storage overhead as well as better data integrity. According to the standard process, the input file is hashed to extract a hash value, which is then compared with the values maintained in a hash index table. Upon finding a positive match, a pointer is set to the existing location of the data; otherwise, the new data is stored and a new hash entry is allocated for it. Irrespective of the various methods (Figure 3), the deduplication process can take place at both target and source. Source deduplication results in zero hardware dependency along with minimized usage of storage and network resources. However, target-based deduplication is expensive even if it ensures performance benefits at large data scale [41]. At present, deduplication can be carried out as inline deduplication, post-processing deduplication, or block- or file-level deduplication [41]. However, this standard technique of deduplication suffers from various loopholes too, viz. i) large expense of data center management, ii) inadequate performance for catering to operating system and backup demands, iii) impractical capacity planning for the deduplication process, iv) inefficient usage of hashing over a large-scale environment, wasting processing resources, and v) a poorly planned life-cycle control process [41]. Apart from the issues mentioned above, there is a strong connection between the deduplication process and the security factor in cloud storage units.
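The hash-index flow described above (hash the incoming data, look it up in an index table, and either store a pointer or the new content) can be sketched as a small content-addressed store. This is a generic illustration of the standard process, not any particular cited system:

```python
import hashlib

class DedupStore:
    """Content-addressed store sketching the hash-index deduplication flow."""
    def __init__(self):
        self.chunks = {}   # hash -> chunk data, physically stored exactly once
        self.index = {}    # filename -> list of chunk hashes (pointers)

    def put(self, name, data, chunk_size=5):
        pointers = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # unseen content: store it
                self.chunks[digest] = chunk
            pointers.append(digest)         # seen before: keep the pointer only
        self.index[name] = pointers

    def get(self, name):
        return b"".join(self.chunks[h] for h in self.index[name])

store = DedupStore()
store.put("a.txt", b"hellohello")   # two identical 5-byte chunks
store.put("b.txt", b"hello")        # the same chunk arrives again
assert store.get("a.txt") == b"hellohello"
assert len(store.chunks) == 1       # "hello" is physically stored only once
```

Block-level chunking (as here) deduplicates partial overlaps between files, whereas file-level deduplication would hash each file as a single unit.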
When the cryptographic algorithms (which are majorly used in cloud storage security) are deployed, the original data is transformed into encrypted data of a very different format that is difficult even to recognize as the original. This conflicts with data deduplication, and hence there is a potential conflict between existing data encryption and the data deduplication process flow. When deduplication is applied over encrypted data, it becomes extremely challenging even to identify the target data, because the same data can yield multiple encrypted forms when encrypted under different secret keys. This eventually causes the deduplication process to fail.
In this regard, two standard techniques have evolved in the existing system, viz. convergent encryption and proof of ownership [42]. The existing system offers feasibility for the direct client to monitor the deduplication process of their data, which also facilitates them to check the data integrity. Therefore, the existing system has jointly investigated the data auditing process with deduplication. The study of Youn et al. [43] has applied a digital signature scheme as well as a homomorphic validation approach. This operation is outsourced to a third-party system in order to perform unbiased validation of the data integrity of deduplicated data. Such a mechanism performs deduplication of data prior to outsourcing to the cloud storage system in order to retain a better privacy factor. However, such schemes use an equivalent encryption key for the same content, making them vulnerable to man-in-the-middle attacks. This problem is addressed by Hur et al. [44], where deduplication takes place at the server for managing the access rights for the dynamic data being uploaded by the users.
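Convergent encryption resolves the encryption-versus-deduplication conflict described above by deriving the key from the content itself, so identical plaintexts always produce identical ciphertexts that the server can deduplicate without reading the data. The sketch below uses a SHA-256 counter-mode keystream purely for illustration; practical schemes use a proper block cipher such as AES:

```python
import hashlib

def _keystream_xor(key, data):
    """XOR data with a SHA-256 counter-mode keystream (illustrative cipher)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        stream = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[i:i + 32], stream))
    return bytes(out)

def convergent_encrypt(plaintext):
    key = hashlib.sha256(plaintext).digest()   # key derived from the content
    return key, _keystream_xor(key, plaintext)

def convergent_decrypt(key, ciphertext):
    return _keystream_xor(key, ciphertext)     # XOR stream cipher is symmetric

k1, c1 = convergent_encrypt(b"same file contents")
k2, c2 = convergent_encrypt(b"same file contents")
assert c1 == c2                                # equal plaintexts -> equal ciphertexts
assert convergent_decrypt(k1, c1) == b"same file contents"
```

The deterministic key is also the scheme's known weakness: anyone who can guess the plaintext can recompute the key, which is one reason the surveyed literature pairs convergent encryption with proof-of-ownership checks.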
The presented scheme offers minimal data leakage and maximal data integrity. The proof-based approach also uses encryption while performing deduplication; however, its applicability is limited to a single user. A study towards a similar proof concept for multiple users has been presented by He et al. [45], where the authors have used proof of storage in its dynamic form for assisting the deduplication process for cross-users. The technique also constructs a tree using homomorphic validation. Irrespective of its better execution formulation, the work suffers from a computational complexity problem, as there is an additional need for identifying all the duplicated encrypted files. This problem has been discussed in the work of Jiang et al. [46], using both static and dynamic methods for complexity reduction.
A unique data deduplication scheme has been presented by Stanek et al. [47], which is based on a popularity score of the data. According to this technique, the deduplication process is applied only when the data becomes popular. A study towards secure data deduplication over multimedia files is discussed in the work of Zheng et al. [48], which encrypts the deduplicated file and uploads it to a specific media center. However, the strategy to defend against attacks is put forward by a third-party server. A study towards deduplication concerning the reliability factor is carried out by Li et al. [49] over multiple servers of the cloud. The technique also implements secret sharing over a distributed storage system. A combined study of the data integrity and deduplication processes is presented by Li et al. [50], where a secured cloud system has been introduced. The mechanism performs the auditing operation over a conventional distributed software framework. This process generates an index of the specific data prior to the uploading process to ensure better data integrity. The approach presented by Yan et al. [51] has used re-encryption over proxy sources as well as challenges of ownership in order to perform deduplication of ciphered data over the cloud storage system. The technique also establishes an association between access control systems and data deduplication. A similar direction of work has also been carried out by Fan et al. [52], where a convergent encryption process is mainly applied along with hashing and public-key encryption. Research Gap: Irrespective of the approaches mentioned above, the studies towards data deduplication are quite few in contrast to the other problems associated with cloud data security. The next section outlines the auditing processes.

EXISTING SECURITY AUDITING SCHEMES
Auditing is a procedure to investigate the performance effectiveness of the services hosted over a cloud environment. Generally, auditing is carried out by third parties in order to extract data associated with various operational performances of cloud-based applications/services. The prime objectives of performing auditing are viz. i) formulating the data architecture, ii) controlling IT risk, iii) strategically constructing an IT plan, iv) communication management, and v) security controls. Therefore, the auditing scheme relates to the operational assessment of the cloud, where security is just one factor to be assessed along with many other functional factors [53][54][55][56]. Usage of the private key for validating the user by the third party is one of the common techniques of a security auditing system (Zhang et al. [57]). Such schemes offer benefits in processing time, which is required to assess the scalability of the auditing scheme. As seen from the previous section, deduplication is also frequently used as a standard auditing mechanism over cloud storage (Aujla et al. [58]). The auditing scheme can also be enhanced by using a messaging factor as well as blockchain. It was found most recently that blockchain offers more privacy and a better form of data integrity in the existing auditing scheme (Esposito et al. [59]). Apart from the messaging system, usage of an identity factor for auditing offers more data-hiding capability without affecting data availability. A study towards identity-based attributes for cloud auditing has also been carried out by Wang et al. [60], where the technique has been used for outsourcing data. Such a scheme facilitates the user to select a secured proxy in order to outsource the data over the server. The identity of such proxy nodes is used for verification, which discards the utilization of a certificate over the server (He et al. [61]). A scheme discussed by Shen et al. [62] has used a signature-based scheme for validating data integrity while performing remote auditing. Such schemes can be more inclined towards a single attribute of security, while multiple security attributes are highly mandatory to offer data security over distributed storage (Yang et al. [63]). Existing approaches towards the auditing scheme have also focused on using symmetric encryption with the capability to verify the outcomes. Such techniques, when integrated with the hash tree, offer robust evidence building. It was also noticed that the existing auditing scheme claims supportability for public users, where public key encryption plays a dominant role, and thereby public auditing tools have evolved (Yu et al. [64], Jiang et al. [65]). Usage of the hash tree was also found useful in auditing distributed software frameworks, e.g., MapReduce (Wang et al. [66]). Constructing a hash table dynamically also facilitates public cloud auditing, but such schemes still suffer from dependencies on third parties (Tian et al. [67]). A recent study carried out by Wang et al. [68] has used public key encryption without any certificate, while the scheme is claimed to offer provable possession of data. A similar form of adoption of provable data possession was also seen in the work of Wu et al. [69]. However, such schemes also suffer from the disclosure of the public key; Yu and Wang [70] present a study addressing this problem. Apart from this, such schemes only support static attributes, while dynamic attributes are highly demanded (Ni et al. [71]). Incorporating flexibility into such an auditing scheme offers more capability to extend its verification process over multiple nodes too (Jian et al. [72], Ren et al. [73], Zhu et al. [74]). Existing studies have also considered mobile users, where auditing is facilitated without any dependency on a third party (Zhang et al. [75]).
It was noted that the usage of proxy re-encryption is quite good enough for resisting threats if they are well defined. The literature has also witnessed a unique approach where the algebraic characteristics of data are computed for carrying out remote auditing of data over cloud storage (Sookhak et al. [76]). Enhancement of the existing data structure in this regard also assists in dynamic auditing of data. A similar line of methodology was also carried out by Yuchuan et al. [77], where the algebraic properties of data are dynamically computed for facilitating the remote auditing process. The study formulates the model using proxy nodes, the cloud, and the user, where signatures are used in proxy nodes and data is maintained in cloud storage. Adoption of the trust factor is another evolved scheme facilitating a secured auditing procedure over cloud storage. The work of Gonzales et al. [78] has developed a reference model using multi-tenancy. An effective auditing scheme is also presented in [54]; such a scheme offers enhanced forward security and a better security assessment model. Adoption of an encryption-based approach for performing remote auditing of data is more prevalent in the existing literature. The message authentication code is reportedly used alongside the homomorphic validation method for data auditing. Utilization of proof of retrievability is another data auditing scheme in the existing system [79]. Existing literature has also explored that if the updates among the storage units over the cloud could be performed securely, it could offer better-secured reposition of distributed data over the cloud. This fact was proven by Liu et al. [80], where a signature as well as a hash tree has been used. The work of Yang et al. [81] has taken the shape of a protocol emphasizing the privacy factor while performing auditing, while the work of Wang et al. [82] discusses data dynamicity associated with auditing.
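Many of the MAC-based remote auditing schemes surveyed above share a common skeleton: the owner tags each data block with a secret key before outsourcing, and an auditor later spot-checks randomly chosen blocks so the whole file never needs to be downloaded. The following is a simplified, hypothetical sketch of that skeleton, not the protocol of any specific cited work:

```python
import hashlib
import hmac
import secrets

def tag_blocks(key, blocks):
    """Owner: compute a per-block HMAC tag bound to the block's index."""
    return [hmac.new(key, i.to_bytes(8, "big") + b, hashlib.sha256).digest()
            for i, b in enumerate(blocks)]

def audit(key, server_blocks, tags, indices=None):
    """Auditor: verify a (random) sample of blocks against their stored tags."""
    if indices is None:
        indices = [secrets.randbelow(len(server_blocks)) for _ in range(2)]
    for i in indices:
        expected = hmac.new(key, i.to_bytes(8, "big") + server_blocks[i],
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tags[i]):
            return False
    return True

key = secrets.token_bytes(32)
blocks = [b"b0", b"b1", b"b2", b"b3"]
tags = tag_blocks(key, blocks)
assert audit(key, blocks, tags)                    # intact storage passes
blocks[1] = b"corrupted"
assert not audit(key, blocks, tags, indices=[1])   # a sampled corruption fails
```

Schemes such as provable data possession replace the plain HMAC with homomorphic tags, which lets the server aggregate many challenged blocks into one constant-size proof.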
Research Gap: It can be noticed that there has been extensive research focusing on the public auditing mechanism, which is mainly carried out remotely. The majority of the schemes offer such verification privileges to users, where different encryption and signature schemes are used to secure the auditing operation over cloud storage.

OPEN RESEARCH ISSUES
The open research issues are as follows:
- Unrealistic Assumptions: Almost all solutions to the data integrity problem rely on public verification by the user only and not by the service provider. This assumption binds the user to consistently take part in the verification process, with higher communication overhead. Moreover, the user cannot be assumed to always possess a high-configuration computational device and good network resource availability. Another unrealistic assumption of all the public auditing approaches is that the users (or auditors) are non-malicious nodes. This cannot always be confirmed, as users normally have more exposure to threats than service providers; hence, if the auditors are on the user side, there is no guarantee of their legitimacy.
- Non-Applicability towards External Intruders: A closer look at all the existing approaches to data integrity, data privacy, and data deduplication for secured cloud storage shows that they have been experimented with against specific forms of threats. Such threats are mainly internal, and hence privacy cannot be protected against external ones. External threats are highly capable of bypassing the existing auditing mechanism, as it is not cost-effectively feasible to construct a secure communication channel during an ongoing auditing process.
- Computational Cost not Considered: Practically speaking, not all outsourced data can be considered safe, which is not discussed in the existing system, owing to the lack of sufficient physical control over the outsourced data. Researchers have claimed that remote auditing schemes can solve this, but these are not very applicable to massive-scale data owing to the large cost involved. Some of the presented techniques claim to support update operations over dynamic data, but such operations are carried out at the cost of an extensive computational burden.
- Deduplication not Focused on Data Integrity: The existing approaches towards data deduplication mostly operate at the file level, and all of them use convergent encryption as a standard. By doing so, data integrity cannot be ascertained, as performing deduplication over the encrypted file requires some dependency on metadata information, which has not been considered by researchers. This means that the deduplication process in the existing system only retains privacy to some level, at a high computational cost, but not data integrity. In order to offer better data security over cloud storage, it is necessary to incorporate data integrity, data privacy, and secure data deduplication combined. None of the existing research work is found to offer a benchmarked outcome of secured distributed cloud storage to date.

FUTURE LINE OF RESEARCH
From the prior section, it was seen that it is quite a challenging process to jointly achieve data integrity and data deduplication in order to incorporate better data privacy over cloud storage. Therefore, a more feasible implementation of a secure cloud storage system can be carried out using a divide-and-conquer rule. Figure 5 highlights the future line of research to secure a distributed cloud storage system. The following is brief information on the implementation:
- The Strategy of Implementation: The core strategy of implementation will be to develop two different sub-frameworks, viz. i) a framework for offering robust data integrity and ii) a framework for secure data deduplication. Both frameworks will have the common goal of data privacy incorporated within them. Apart from this, the proposed system also aims to resist the majority of lethal threats over the cloud storage server.

- The Flow of Execution: The primary step will be to develop the first sub-framework, where users will be offered the authority to cross-check the integrity of the data stored in the distributed cloud. A simplified encryption scheme can be developed to store the indexed data, followed by a unique preventive measure. A challenge-based message could also be used for preventing any form of access by intruders, thereby protecting data integrity and privacy. The secondary step will be to enhance the standard approach of proof of ownership. A novel indexing mechanism can be formulated that maintains consistency over the secure data deduplication process. The existing tree structure can also be modified to facilitate a better encryption process over the key. This will assist in the generation of the secret key to be used for data uploading over storage servers, resulting in better privacy control. The indexing mechanism can be carried out at the block level, which offers a unique deduplication process along with privacy preservation.
- Anticipated Outcomes: The anticipated outcome of the proposed study is to retain a good balance between dynamic intruder resistivity and optimal service delivery. The model is also expected to offer both forward and backward secrecy with less computational overhead, unlike the existing systems.

CONCLUSION
Offering a higher degree of protection for split data in the storage servers of the cloud system is yet to be achieved. At present, various works are being carried out towards ensuring data security, but the approaches towards securing a data storage system are quite scattered. This is because efficient and robust cloud data storage mandatorily requires optimal data integrity, data privacy, and data deduplication, which are elementary operations. Unfortunately, the existing research work does not incorporate all the above three points towards evolving a better storage solution; the existing solutions always lack one of these three points towards a secure data storage system. This manuscript discusses the contributions of recent work carried out in this direction, briefs all the open-ended problems, and discusses a possible way to carry out further research work.