Novel holistic architecture for analytical operation on sensory data relayed as cloud services

ABSTRACT


INTRODUCTION
The adoption of sensors is consistently increasing in many commercial as well as domestic applications [1]. Sensors are programmed to collect a significantly large quantity of environmental data, which is accumulated at a sink through a process called data aggregation [2]. According to the technical standards and concepts of sensor networks, every sensor is allocated a specific amount of resources (especially energy), which it spends both while transmitting data and while idle sensing [3]. Consequently, a sensor's energy level degrades with every unit of operating time. When sensors are deployed in a massively connected environment, they accumulate a large stream of data that must also be processed. Usually, the algorithm responsible for this processing is embedded within the sensor itself. Hence, the more data there is to process, the more energy the sensor depletes, and this eventually starts to affect the complete network negatively. This is how energy efficiency is connected with analytical operations over big data, an issue that requires immediate attention from the perspective of application viability.
Apart from this, there are many other problems. For example, when big data analytics is investigated, the data under consideration should possess all the inherent properties of big data, so that they can serve as evidence that the input is indeed big data. Unfortunately, extracting such a live stream of data with all the characteristics of big data is not feasible from a research viewpoint, which is one of the core impediments to successful research on sensory big data [4]. Another essential problem associated with research-based analysis of big data is that all the big data must be stored in specific sectors of the distributed storage units of cloud data centers. Unfortunately, existing commercial practices of big data management first store the raw data in the storage units and then apply analytical schemes over the big data to obtain mined information. The mined information is then stored in a data warehouse [5]. Such a chain of processes not only occupies most of the infrastructure but is also computationally expensive, which is why the extracted knowledge is simply stored and its utilization again depends on the skills of the users. Research is currently being carried out on big data approaches and their improvement [6][7][8], but it is still in its infancy and requires more time before the anticipated outcomes can be expected. There is also no denying that a good number of mining/analytical tools exist, but there are always certain barriers to achieving maximized outcomes and full accuracy. Although various research papers advocate the use of the distributed software frameworks Hadoop and MapReduce, some researchers have explicitly highlighted their limitations, which remain unsolved to date [9]. Therefore, this paper presents a novel analytical modeling approach that takes sensory data as input and, after processing, yields knowledge with higher accuracy. The modeling is carried out so that a sensor can support better big data analytical operations without dissipating more resources. Optimization of storage is another significant target of the proposed system.
The organization of this paper is as follows: Section 2 discusses the system design, along with the assumptions and strategies used, followed by a discussion of the result analysis in Section 3. Finally, the concluding remarks are provided in Section 4.
This section discusses recent work on developing analytical applications using the big data concept, as an extension of our prior work [10]. Most recently, Zou et al. [11] stated the importance of the big data approach for forestry data. A similar discussion of big data analytics, considering a case study of disaster management, was carried out by Shah et al. [12]. A case study of the healthcare sector was considered in the work of Hadi et al. [13], where a network-optimization-based approach was discussed with respect to linear programming and a fairness scheme. Issues and challenges related to big data aggregation in sensory applications have been discussed by Boubiche et al. [14], where various strategies have been presented. Cao et al. [15] presented the connection between energy efficiency and big data and concluded that it needs further consideration, as the related issues remain open. Jabbar et al. [16] presented a framework for processing big data in existing systems, along with highlights of its effectiveness. Jindal et al. [17] used fuzzy logic to develop an analytical solution for a healthcare-based application, emphasizing classifier design. Meng et al. [18] presented a framework for improving quality of experience by adopting a convolutional neural network meant exclusively for high-dimensional problems. Puthal [19] used a lattice-based model to formulate an access-based methodology for streams of big data, focusing on the volume and velocity issues of big data in healthcare applications.
Scalability of sensory data was explicitly studied by Rafferty et al. [20]. Another review, by Rizwan et al. [21], emphasized the nano-communication aspect of big data, considering a case study of the healthcare sector. Al-Ali et al. [22] presented work emphasizing the energy efficiency factor while using business intelligence in analytical applications of big data. Magarino et al. [23] used an agent-based approach to investigate sleep-related data obtained from sensors and IoT applications with higher accuracy. Marjani et al. [24] also highlighted various open issues connected with the use of big data mining in IoT. Parwez et al. [25] used an unsupervised classification approach over a mobile network to analyze call records. Wang et al. [26] presented a predictive framework for forecasting the price of electricity on the basis of classification as well as the selection of potential features. Sun et al. [27] presented a discussion of analytical approaches over network communities in IoT. Li et al. [28] studied geographic data, applying an optimization approach with open-source distributed software frameworks. A similar line of research was carried out by Yue et al. [29], who developed big data analytics for identifying web events associated with sensory data. Jiang et al. [30] used hidden Markov modeling to monitor behaviour in ambient assisted living. Hence, various approaches to big data analytics have been shown to offer significantly beneficial mining operations. The identified research issues are as follows:
• Existing approaches to big data analytics do not address the comprehensive, inherent problems of the data prior to processing.
• Storage is not found to be addressed in any big data approach, and without it, analytics cannot be scalable or practical.
• The connectivity and impact of energy constraints on resource-constrained nodes have not been studied explicitly across the range of approaches.
• Studies demonstrating the cost-effectiveness of solutions for improving big data analytics performance receive little emphasis in existing work.
Therefore, the problem statement of the proposed study can be stated as "Developing a cost-effective analytical modeling tool for complex streams of sensory big data, in order to improve the performance of cloud services, is quite challenging." This part of the research work is an extension of our prior models [31] and [32], while the focus of the proposed approach is to evolve a holistic architecture that can use sensory data as a service with a cost-effective design implementation. Considering a case study of the internet-of-things (IoT), the implementation is carried out using an analytical research methodology. The pictorial representation of the proposed system implementation is shown in Figure 1.
According to Figure 1, the proposed system is a level-based, top-down architecture. The top level of the architecture concerns synthetic generation of sensory data, followed by the construction of a scalable storage system. This first level of operation leads to the generation of knowledge. The second level is responsible for performing analytical operations considering a real-time implementation scenario. The analysis considers an effective database management system that includes the IoT gateway system, different variants of servers, and sensor nodes.
The third level of operation performs the analytical operation, in which a tree-based mechanism is used for the topological construction of the IoT environment, followed by sorting of the branches. Finally, the proposed system uses frequent patterns as the mining algorithm, and the outcome shows accurate knowledge extraction. Notably, the proposed system enhances the existing frequent-pattern logic by adding all the levels of operation prior to applying the frequent-pattern-based mining algorithm. The core idea of the proposed system is to deliver the extracted knowledge, in the form of analyzed sensory data, as cloud-based services. Hence, the information obtained from the sensory fields, as representatives of big data, is not only optimally stored in cloud data centers but also relayed in the form of services.

SYSTEM DESIGN
The core purpose of the proposed system design is to evolve an innovative architectural framework capable of offering sensory data in the form of knowledge that can be relayed as cloud services. Various aspects are considered in the implementation phase of the proposed study. This section discusses the essential information included in the system design.

Assumption and dependencies
The primary assumption of the proposed system is that the network connecting the user terminal and the service provider is efficiently configured and highly secure, so that none of the artifacts result from a security breach. The user terminal is a gateway system directly connected to the base station for aggregating the overall sensory data. The secondary assumption is that a large amount of data arrives as a stream arranged in a dynamic queue, on which some adaptive queue management is assumed to be executed. The tertiary assumption is that all sensors have a static rate of energy dissipation when performing any information-forwarding process. The prime dependency of the proposed system is that it involves a definite number of sensors arranged in clusters and performing data aggregation. Another significant dependency is that the study relies heavily on dynamic sensory data, which is not feasible to obtain for analysis; it therefore demands a programmatic mechanism for generating sensory data dynamically. The implementation is carried out in consideration of these assumptions and dependencies.
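The queueing and energy assumptions above can be made concrete with a small model. The sketch below is a minimal illustration, assuming a drop-oldest bounded queue as the "adaptive queue management" and a fixed energy cost per forwarded reading as the "static rate of energy dissipation"; the class name `SensorNode` and all numeric values are hypothetical, not taken from the study.

```python
from collections import deque


class SensorNode:
    """Hypothetical sensor node reflecting the stated assumptions:
    a bounded stream queue and a static energy cost per forwarded reading."""

    def __init__(self, node_id, energy, cost_per_forward=0.05, queue_capacity=8):
        self.node_id = node_id
        self.energy = energy                      # remaining energy budget (J), illustrative
        self.cost_per_forward = cost_per_forward  # static dissipation per forward (J), illustrative
        # deque(maxlen=...) discards the oldest reading when full; this drop-oldest
        # rule is an assumed stand-in for "adaptive queue management"
        self.queue = deque(maxlen=queue_capacity)

    def sense(self, reading):
        """Enqueue a new reading from the data stream."""
        self.queue.append(reading)

    def forward(self):
        """Forward the oldest queued reading if the energy budget allows.
        Returns the reading, or None if the queue is empty or energy is exhausted."""
        if not self.queue or self.energy < self.cost_per_forward:
            return None
        self.energy -= self.cost_per_forward  # same cost for every forward (static rate)
        return self.queue.popleft()
```

With `energy=0.12` and `cost_per_forward=0.05`, a node can forward only two readings before its budget blocks further forwarding, which is how a fixed energy budget caps processing under this assumption.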

Implementation strategy
Developing a robust implementation plan for offering sensory data as a service is not an easy task, and various essential factors need to be considered while developing a cost-effective analytical model. The following strategies are involved in the design and development of the proposed system.
• Developing a holistic architecture using a modular approach: The proposed system needs to be formulated as a holistic architecture; however, constructing a holistic architecture is challenging, as there are many issues to be organized and addressed. Therefore, the proposed research work adopts a modular approach, where bigger problems are split into smaller ones and then grouped back together. The proposed system therefore splits the complete implementation into three phases, i.e., generation of sensory big data, optimization of the storage system, and analytical operation on top of it. These three modules are grouped together to construct the holistic architecture.
• Consideration of big data problems: The implementation of the proposed system must consider the problems associated with big data. The core issues are that the data are large, unstructured, and difficult to store in SQL-based storage systems. Therefore, the proposed system adopts a mechanism in which the internet-of-things (IoT) is considered as a case study, with a gateway system, a database, and local sensors (or IoT devices). The proposed system develops an explicit mechanism to ensure effective data transmission as well as comprehensive analytics that map to the real-time problem.
• Strategic involvement of energy constraint: There are various causes of energy dissipation in a sensor or IoT device, and the energy consumption is directly proportional to the data transmission process. Hence, if the processes involved in data processing and analysis are made lightweight, the amount of energy allocated to such tasks can be controlled to some extent. Therefore, the proposed system considers a fixed energy budget as a constraint and uses a tree-based mechanism along with a simplified mining approach based on frequent patterns to simplify the processes involved in sensory data analytics.
All three points above are considered the core implementation strategies of the proposed system, where the prime logic is to ensure effective knowledge extraction with greater reliability. Therefore, the proposed system is implemented using an analytical research method, which makes it feasible to perform extensive knowledge discovery from sensory data while ensuring that the process is highly energy efficient and that the delivered knowledge is accurate in its context.

Framework for sensory-information as services
This part of the framework is designed on the basis of the fact that existing sensory data are massively large in size and dimension, whereas existing research-based approaches place less emphasis on data complexity. Apart from this, the volume of sensory data is so high that it abnormally saturates the cloud storage units, which are again burdened with unstructured data. Hence, an analytical framework is designed for this purpose, which is further classified into three exclusive operations, as shown in Figure 2.
• Yield of sensed information: As the proposed system is a framework, it depends on streamed information. Any standard dataset could be used for this purpose, as the idea is simply to offer a maximized amount of data as a stream. However, offline sensory data lacks the problems associated with the real environment. Therefore, the proposed system programmatically generates sensor data with the aid of Contiki for this experiment. Contiki is an open-source system that offers an extensive assessment environment. The information generated by Contiki is treated as sensory data that programmatically bears all of its characteristics.
• Storage optimization: The proposed system offers elasticity of the cloud storage system in the form of a cloud bucket, which is a storage directory on top of the storage system used in the data centers. Notably, the proposed system offers storage to the user through cloud buckets rather than data-center storage. The implementation uses a distributed database management system to offer automatic fault-tolerance management. As the raw sensory data are highly unstructured, it is quite challenging to perform any form of data processing on them. This problem is mitigated by using HBase, which offers better indexing of all rows of the distributed data present in the cloud environment. Further, the cloud bucket mechanism allocates storage to the user on the basis of the actual demands of data processing. The proposed system maintains unique indexing keys across different types of distributed storage servers, which assists in faster data extraction and management. The prime mechanism of this system converts unstructured data into structured data, thereby making it suitable for applying any form of analytics.
• Cost-effective analytical operation: This part of the implementation performs the analytical operation that extracts mined knowledge in order to relay analyzed sensory data as a service. The patterns associated with the transmission of sensory data are obtained and used to extract explicit data from the sensor nodes, as well as other associated information, e.g., the frequency of retransmission attempts and the delay in transmitting information. As all information associated with network processing is very important to the network analyst, such latent information is extracted by the system for better precision management. In the proposed system, knowledge essentially means the trend in the transmission patterns formed by the sensor nodes/devices in the mode of different applications. The proposed system can use this knowledge to improve the operation of various applications that depend on sensory data. Supporting unstructured data in this manner also offers better access to the most discrete subsets of data, which helps control the data overhead.
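The yield and storage steps above can be sketched end to end. The study generates data with Contiki and stores it in HBase; neither is used here. Instead, a seeded Python generator stands in for Contiki output, and a plain dictionary stands in for a cloud bucket indexed by unique row keys. The `key=value` record format, the field names (`retx`, `delay_ms`), and the row-key layout are hypothetical choices for illustration. The key layout (zero-padded node id, then zero-padded timestamp) follows a common row-key design idea: lexicographic key order groups each node's rows together in time order, so one node's history can be read with a prefix scan.

```python
import random


def synthetic_stream(n_records, n_nodes=5, seed=0):
    """Yield raw, loosely formatted sensor lines (a stand-in for Contiki output)."""
    rng = random.Random(seed)
    for t in range(n_records):
        node = rng.randrange(n_nodes)
        yield (f"node={node};t={t};temp={rng.uniform(15, 35):.1f};"
               f"retx={rng.randint(0, 3)};delay_ms={rng.randint(5, 80)}")


def parse(line):
    """Turn one raw 'key=value;...' line into a structured record."""
    fields = dict(part.split("=", 1) for part in line.split(";"))
    return {
        "node": int(fields["node"]),
        "t": int(fields["t"]),
        "temp": float(fields["temp"]),
        "retx": int(fields["retx"]),
        "delay_ms": int(fields["delay_ms"]),
    }


def row_key(rec):
    """Zero-padded node id + timestamp, so key order groups rows by node, then time."""
    return f"{rec['node']:04d}#{rec['t']:010d}"


def build_bucket(lines):
    """Hypothetical 'cloud bucket': a dict mapping row key -> structured record."""
    return {row_key(r): r for r in map(parse, lines)}


def scan_prefix(bucket, node):
    """Return one node's rows in key order, imitating a prefix range scan."""
    prefix = f"{node:04d}#"
    return [bucket[k] for k in sorted(bucket) if k.startswith(prefix)]
```

Zero-padding matters because keys compare as strings: without it, node `10` would sort before node `2`.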

Figure 2. Framework for sensory-information as services (blocks include storage optimization and cost-effective analytical operation)
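The latent transmission information named in the cost-effective analytical operation (retransmission attempts and transmission delay) can be summarized per node with a simple aggregation. This is a minimal sketch, assuming structured records with hypothetical fields `node`, `retx`, and `delay_ms`; the study does not specify its own aggregation.

```python
from collections import defaultdict
from statistics import mean


def transmission_profile(records):
    """Per-node summary of transmission behaviour: packet count, total
    retransmission attempts, retransmissions per packet, and mean delay."""
    by_node = defaultdict(list)
    for r in records:
        by_node[r["node"]].append(r)

    profile = {}
    for node, rows in sorted(by_node.items()):
        retx = sum(r["retx"] for r in rows)
        profile[node] = {
            "packets": len(rows),
            "retx_total": retx,
            "retx_per_packet": retx / len(rows),
            "mean_delay_ms": mean(r["delay_ms"] for r in rows),
        }
    return profile
```

A node whose retransmissions per packet or mean delay stands out from its neighbours is the kind of trend the text treats as knowledge for the network analyst.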

Framework for advanced analytics
This part of the implementation focuses on developing advanced analytical operations, while also addressing two existing problems: restricted control over cloud-based resources and the incompatibility of existing analytical operations from a computational-cost perspective. The core objective of this part of the implementation is to offer a comprehensive design of a user-based knowledge mining approach. The secondary objective is to obtain potential patterns associated with the latent connectivity among the various sets of discrete data generated by the unique sensor nodes. The core logic is to take the output of the prior module and subject it to the tree mechanism for better analysis of the data in the form of nodes and edges. The process then carries out effective tree-based management as well as sorting of the branches, followed finally by the frequent-pattern concept, which ultimately generates the various mined outcomes. The prime novelty of this part of the implementation is that it can establish significant connectivity among the different data in the form of nodes while considering their respective context. The proposed system also presents an analytical framework capable of explicitly computing different variants of the estimates connected to the nodes. Apart from this, the complete concept is developed so that it can support distributed applications, with connectivity options for both single-hop and double-hop communication. The presented study carries out the entire operation with respect to the various demands of applications associated with cloud services.
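The exact enhanced procedure of the proposed system is not specified. For reference, the sketch below implements the standard FP-tree/FP-growth method, a well-known frequent-pattern technique: items below a minimum-support threshold are pruned, each transaction's items are sorted by descending frequency before insertion so that shared prefixes merge into common branches, and frequent itemsets are mined from conditional pattern bases. Standard FP-growth is recursive, whereas the proposed system is described as avoiding recursion, so this illustrates the baseline technique rather than the authors' variant. The item names and `min_support` values are illustrative.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, a parent link, and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Prune infrequent items, sort each transaction by descending frequency
    (ties broken by item name), and insert it into a prefix tree."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in freq.items() if c >= min_support}
    root = FPNode(None, None)
    header = {}  # item -> list of tree nodes holding that item
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header, frequent


def prefix_path(node):
    """Items on the path from node's parent up to (excluding) the root."""
    path = []
    node = node.parent
    while node is not None and node.item is not None:
        path.append(node.item)
        node = node.parent
    return path


def fp_growth(transactions, min_support, suffix=()):
    """Return {itemset (sorted tuple): support} for all itemsets with support >= min_support."""
    _, header, frequent = build_fp_tree(transactions, min_support)
    patterns = {}
    for item in frequent:
        itemset = tuple(sorted(suffix + (item,)))
        patterns[itemset] = frequent[item]
        # Conditional pattern base: each prefix path repeated by its node count
        cond = []
        for n in header[item]:
            cond.extend([prefix_path(n)] * n.count)
        patterns.update(fp_growth(cond, min_support, suffix + (item,)))
    return patterns
```

Repeating each prefix path by its count keeps the sketch short; a production implementation would carry counts on the paths instead of duplicating them.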
The proposed system also implements a threshold-based mechanism to determine whether the selected pattern is supported by the present traffic of the communication channel. The implementation continues with a novel tree-based mechanism that offers better localization of explicit data and finds a unique communication route between two active nodes, always starting from the root node. The implementation is carried out in two steps: the first constructs the tree structure, and the second carries out the analytical operation. Figure 3 highlights the proposed framework, which generates data SD1, SD2, …, SDn in a distributed manner and then uses the tree mechanism to generate a highly connected tree topology with individual trees t1, t2, …, tn. The mechanism supports tree management and performs branch sorting, followed by extraction of candidate frequent patterns, which constitutes the final knowledge extraction.
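In a tree, the route between two nodes is unique: it climbs from one node to their lowest common ancestor and then descends to the other. The sketch below is one possible reading of the root-anchored routing described above, assuming the tree is a breadth-first spanning tree rooted at a sink over a given connectivity graph; the node names and the adjacency list are hypothetical.

```python
from collections import deque


def bfs_tree(adjacency, root):
    """Spanning tree rooted at `root`: parent pointers and depths from breadth-first search."""
    parent, depth = {root: None}, {root: 0}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adjacency.get(u, ()):
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                frontier.append(v)
    return parent, depth


def tree_route(parent, depth, u, v):
    """Unique tree path from u to v through their lowest common ancestor (LCA)."""
    up_u, up_v = [u], [v]
    # Lift the deeper endpoint until both are at the same depth
    while depth[u] > depth[v]:
        u = parent[u]
        up_u.append(u)
    while depth[v] > depth[u]:
        v = parent[v]
        up_v.append(v)
    # Climb both together until they meet at the LCA
    while u != v:
        u, v = parent[u], parent[v]
        up_u.append(u)
        up_v.append(v)
    # up_u ends at the LCA; drop up_v's copy of it, then reverse the descent
    return up_u + up_v[-2::-1]


adjacency = {
    "sink": ["a", "b"], "a": ["sink", "c", "d"], "b": ["sink", "e"],
    "c": ["a"], "d": ["a"], "e": ["b"],
}
parent, depth = bfs_tree(adjacency, "sink")
```

Because every node stores only its parent and depth, a route is found by walking upward, with no search over the full graph.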

RESULT ANALYSIS
This section discusses the outcome obtained from the implementation of the proposed system. The proposed system is scripted in MATLAB, where 500 nodes are dispersed in a simulation area of 1000 × 1200 m². All sensor node configurations follow the characteristics of MEMSIC nodes, and the proposed system is compared with the existing frequent-pattern-based approach for a comparative performance analysis. The assessment is carried out with respect to effective mining duration, energy depletion, and memory consumption.
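The deployment part of the simulation setup can be reproduced in outline. The study used MATLAB with MEMSIC node characteristics; the sketch below only places 500 nodes in a 1000 × 1200 m field and derives neighbour lists. Uniform random placement, the assignment of 1000 m to the x-axis, the fixed seed, and any radio range passed to `neighbours` are assumptions, since the text does not state them.

```python
import random


def deploy_nodes(n=500, width_m=1000.0, height_m=1200.0, seed=42):
    """Uniform random placement of n nodes in a width x height field (metres)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width_m), rng.uniform(0, height_m)) for _ in range(n)]


def neighbours(positions, radio_range_m):
    """Adjacency lists: pairs of nodes within radio_range_m of each other (brute force)."""
    adj = {i: [] for i in range(len(positions))}
    r2 = radio_range_m ** 2
    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj = positions[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                adj[i].append(j)
                adj[j].append(i)
    return adj
```

The brute-force neighbour search checks all n(n − 1)/2 pairs, about 125,000 for 500 nodes, which is acceptable at this scale.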
The study outcomes shown in Figures 4-6 indicate that the proposed system performs better than the existing frequent-pattern approach. The mining duration is lower because the existing frequent-pattern approach involves counting items, which grows exponentially, whereas the proposed system counts only the knowledge obtained, so the time taken is much reduced. The proposed system involves no recursive operation, while the tree-based approach provides a better topological construct of the IoT system, ensuring better control of energy consumption, which the existing system cannot achieve. The proposed system also has lower memory consumption, as the tree-based approach reduces the dependency on maximized data storage locations, which is not seen in the existing approach. Therefore, the proposed system can be claimed to offer better and more cost-effective analytical performance than the existing system.
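The exponential growth mentioned above can be quantified for candidate-based frequent-pattern counting (an assumption about how the existing approach counts, since the text does not name its exact variant). With n distinct items, every non-empty subset is a potential candidate itemset:

```latex
\#\{\text{candidate itemsets}\} \;=\; \sum_{k=1}^{n} \binom{n}{k} \;=\; 2^{n} - 1
```

For example, n = 20 items already give 2^20 − 1 = 1,048,575 possible candidates, although minimum-support pruning usually explores only part of this space.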

CONCLUSION
At present, various approaches have been stated and claimed to effectively mitigate the problems of load balancing in cloud environments. However, mitigating the massive load over the dynamic and distributed environment of a private cloud remains quite challenging. Therefore, the proposed system introduces a cost-effective load-balancing mechanism using analytical methods. The significant contributions of the proposed approach are as follows: i) it overcomes the limitations of cloud-based control systems and the incompatibility of existing mining models, and it is applicable to comprehensive event processing; ii) the study presents a simplified yet sophisticated use of frequent patterns that connects data with nodes; iii) the method also uses a tree-based scheme capable of sustaining discrete transmission in sensory applications.