CCCORE: Cloud Container for Collaborative Research

ABSTRACT


INTRODUCTION
In the mid-1990s, various grid-based cyberinfrastructures, or e-infrastructures, were established that integrated high-speed research networks with middleware services and enabled researchers to share distributed resources collaboratively. These tightly integrated science gateways served as resource providers for specialized as well as generic research initiatives [1]. However, the restricted interface to the data and the domain-specific nature of science gateways did not meet the requirements of researchers outside those domains [2]. With the advent of cloud computing, easily reconfigurable and adaptive virtual private research environments and science clouds became a preferred alternative to traditional grid- or cluster-based e-infrastructures. Cloud-based collaborative research platforms provide researchers with the computing and storage resources required to run their applications and let them share data and applications while they concentrate on their own area of research. A cloud platform offers a compute environment with a set of computing resources much larger than what an individual research organization can afford. Organizations can scale resources up or down and pay according to usage. The multitenancy provided by cloud architecture enabled the creation of domain- and requirement-specific virtual private research environments that made it easier for researchers to collaborate and share resources [3]. Several science clouds, such as the Nectar Research Cloud [4], provide the infrastructure to run compute-intensive scientific applications [5], [6]. Even though a substantial amount of research has been carried out on cloud-based collaborative research platforms, little work exists on dynamic resource allocation in collaborative research cloud frameworks.
The primary aim of this paper is to design a Cloud Container based Collaborative Research (CCCORE) framework that employs on-demand, dynamic resource provisioning according to the varying workload, through a comprehensive assessment of the requirements of the users and the available resources in a collaborative research environment.

Background
In this section, we discuss a representative set of existing work related to cloud-based collaborative research platforms. Some of these platforms use hypervisor-based virtualization, while others deploy containerization-based resource allocation.
Benjamin H. Brinkman et al. [7] proposed a cloud-based portal for sharing data and collaborating on projects containing large EEG datasets, with the goal of fostering collaborative research. The authors discuss how the portal meets the fundamental requirements of a collaborative research platform. The features they emphasize are the security of the data and access rights on the data, access to data and analysis results, and a platform-independent tool to view and search datasets.
Tarek Sherif et al. [8] propose CBRAIN, a web-based generic collaborative research platform that offers access to remote data sources, distributed computing sites, and processing and visualization tools for data- and compute-intensive research in neuroimaging.
A. Mc Gregor et al. [9] present RP-SMARF, a cloud-based collaborative research platform in the area of smart facilities management, which connects geographically dispersed heterogeneous resources.
Bastian Roth et al. [10] examined the challenges in scientific collaboration and proposed an approach that leverages groupware tools and hypervisor-based virtualization techniques such as KVM, VMware vSphere, or Xen to run a generic collaboration platform.
Muhamad Fitra Kacamarga et al. [11] put forward a complete computing platform for bioinformatics research, which uses Docker containers for lightweight virtualization. The paper describes how Docker containers allow customization of the compute environment and effectively overcome the challenges of the VM-based approach.
Yujian Zhu et al. [12] demonstrate a lightweight, container-based, scalable system called Docket. It is based on LXC (Linux Containers) and provides a platform to run different application frameworks for academic and scientific research.
Elahehkheiri et al. [13] elaborated a tenant-based resource allocation approach that uses a genetic algorithm and a heuristic algorithm to overcome over-utilization and under-utilization in resource allocation for SaaS applications.
Sijin He et al. [14] proposed a virtual resource unit named EAC, which delivers better resource efficiency and scalability, and discussed the resource inefficiency of the VM-based approach.

Problem
Scientific research in various disciplines often involves researchers from different organizations collaborating to conduct analyses, experiments, or simulations that are data- and compute-intensive and have unpredictable resource requirements [15], [16]. These kinds of applications and tools require a highly dynamic resource allocation method. The resource-intensive applications, data, and tools shared in highly collaborative research platforms suffer from bursty workloads [17]. However, most collaborative research platforms depend on cloud service providers for resource provisioning, and these providers schedule the applications independently and provision the resources statically. The lack of a comprehensive assessment of applications and available resources can lead to under-utilization or over-utilization of resources and increased execution time for an application [18], which is undesirable in a collaborative research environment.
Therefore, we identified the following major problems for resource allocation in collaborative research cloud frameworks with varying workloads:
a. Bursty workloads owing to data- and compute-intensive tools and applications.
b. Static provisioning of resources, which leads to resource locking.
c. Increased execution time due to the lack of a comprehensive assessment of applications and available resources.

Proposed solution
We propose the design of a Cloud Container based Collaborative Research (CCCORE) framework that provides on-demand, customized containerization, performs a comprehensive assessment of resource requirements, and applies a scalable algorithm that uses underutilized residual resources to achieve optimal resource allocation in a dynamic collaborative research environment. CCCORE offers an efficient way to standardize research methods, establish relationships among data, and share findings among researchers. This enables researchers to focus on their domain of research rather than on gaining proficiency in infrastructure installation and analysis tools [19]. CCCORE rapidly spawns computational instances and provides a customized unit of resources according to the varying workload of the applications or tools used by the researcher [20]. Researchers often need to replicate results, study the inferences, or analyze the results by varying the parameters. CCCORE containerizes the entire set of data, the application, and all its dependencies, and hence delivers a complete compute environment for the researcher.

ARCHITECTURE OF CCCORE

CCCORE components
CCCORE integrates two units: a) the Research Collaboration Unit (RCU) and b) the Management Interface (MI). An RCU is a ready-to-use container with data, applications or tools, and an operating system. It is optimized based on finish time. An RCU is shared among collaborating researchers on a trusted network. The residual resource pool of an RCU gives it the capability to run an instance of an application and create an operating image for the researcher. Figure 1 shows the model of an RCU.

Figure 1. Model of RCU
We define the original researcher who owns the research data, applications, or tools as the owner. MI manages and administers the RCU. A researcher sends a login request to the owner through MI, and the owner approves or denies the request depending on the credentials. When a researcher requests resources, MI checks the resources available with the owner and provisions an RCU from the owner's pool of resources. CCCORE defines permissions to view, edit, delete, and publish the data and applications in the container based on user rights. The owner, through MI, sets the researcher's rights on the RCU through an access control list (ACL). Two conditions arise in setting the rights of the researcher:
a. The owner gives the researcher full rights on the RCU and gives up their own rights on it.
b. The owner and the researcher collaborate and hold the same rights on the RCU.
The rights that can be set on an RCU are as follows. Without any rights, the researcher will not see the RCU in their account.
View: The researcher can see the RCU in their account and can view the data and tools available in the RCU.
View and execute: The researcher can view the data and work on the data with the available tools under different parameter settings.
Ownership: The researcher will own the RCU.
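The access levels described above can be sketched as a small access control list. This is only an illustrative sketch, not the CCCORE implementation: the names `Right`, `ACCESS_LEVELS`, and `RcuAcl`, the label `"none"` for a researcher without rights, and the exact permission set attached to each level are assumptions made for the example.

```python
from enum import Flag, auto


class Right(Flag):
    """Permissions named in the text, plus EXECUTE for 'view and execute'."""
    NONE = 0
    VIEW = auto()
    EXECUTE = auto()
    EDIT = auto()
    DELETE = auto()
    PUBLISH = auto()


# Hypothetical mapping from access level to permission set (an assumption).
ACCESS_LEVELS = {
    "none": Right.NONE,
    "view": Right.VIEW,
    "view_and_execute": Right.VIEW | Right.EXECUTE,
    "ownership": (Right.VIEW | Right.EXECUTE | Right.EDIT
                  | Right.DELETE | Right.PUBLISH),
}


class RcuAcl:
    """Per-RCU access control list administered by the owner."""

    def __init__(self, owner):
        self.owner = owner
        self._entries = {owner: ACCESS_LEVELS["ownership"]}

    def grant(self, granted_by, researcher, level):
        # Only the current owner may set rights on the RCU.
        if granted_by != self.owner:
            raise PermissionError("only the owner can set rights on an RCU")
        self._entries[researcher] = ACCESS_LEVELS[level]

    def transfer_ownership(self, new_owner):
        # Condition (a): the researcher gets full rights, the owner keeps none.
        previous_owner = self.owner
        self._entries[new_owner] = ACCESS_LEVELS["ownership"]
        self._entries[previous_owner] = Right.NONE
        self.owner = new_owner

    def allows(self, researcher, right):
        return bool(self._entries.get(researcher, Right.NONE) & right)

    def is_visible_to(self, researcher):
        # An RCU appears in a researcher's account only with the VIEW right.
        return self.allows(researcher, Right.VIEW)
```

Condition (b), where owner and researcher hold the same rights, corresponds in this sketch to granting the researcher the `"ownership"` level without calling `transfer_ownership`.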

Sequence diagram of CCCORE
The stepwise description of the sequence diagram is given below:
Step 1: The researcher requests resources from MI.
Step 2: MI verifies and authenticates the researcher.
Step 3: MI sends the research request query to the owner.
Step 4: The owner verifies and authenticates the request and allocates resources packaged in an RCU.
Step 6: The researcher's rights on the RCU are set and the RCU is granted to the researcher.
Step 8: MI monitors RCU performance for under-provisioning or over-provisioning.
Step 9: MI manages the RCU resources and resource allocation.
Step 10: MI optimizes the RCU for better finish time and resource utilization.
Step 11: The researcher sends the decommission request to MI upon finishing the job.
Step 12: MI decommissions the RCU by releasing the resources.
Step 13: MI updates the resource pool of the RCU.
Step 14: MI reports the RCU decommission to the owner.
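The request, allocation, and decommission steps listed above can be illustrated with a minimal bookkeeping sketch. It is an assumption-laden toy model: the class name `ManagementInterface`, the abstract "resource unit", the set of known researchers standing in for authentication, and the log strings are all hypothetical, and monitoring and optimization (Steps 8 to 10) are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementInterface:
    pool: int                                            # free resource units held by the owner
    known_researchers: set = field(default_factory=set)  # stands in for authentication
    active: dict = field(default_factory=dict)           # rcu_id -> allocated units
    log: list = field(default_factory=list)
    _next_id: int = 1

    def request(self, researcher, units):
        # Steps 1-2: the researcher asks MI for resources; MI authenticates.
        if researcher not in self.known_researchers:
            self.log.append("denied")
            return None
        # Steps 3-4: the request is checked against the owner's resource pool.
        if units > self.pool:
            self.log.append("insufficient")
            return None
        self.pool -= units
        rcu_id = self._next_id
        self._next_id += 1
        self.active[rcu_id] = units
        # Step 6: the RCU is granted to the researcher.
        self.log.append("granted")
        return rcu_id

    def decommission(self, rcu_id):
        # Steps 11-13: release the resources and update the pool.
        units = self.active.pop(rcu_id)
        self.pool += units
        # Step 14: report the decommission to the owner.
        self.log.append("decommissioned")
```

For example, with a pool of 8 units, a request for 6 units by a known researcher succeeds and leaves 2 units free; decommissioning the RCU returns the pool to 8.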

CCCORE capabilities
In this section, we describe some of the key capabilities of CCCORE as a collaborative research platform.
Customization: CCCORE creates custom-built RCUs on demand according to the researcher's requirements. A researcher can select data (raw or analyzed), applications, and compute and storage resources, bundled as an RCU.
Flexibility: In scientific research analysis, a researcher may often need to build multiple environments to generate various results based on the parameter settings. CCCORE enables researchers to work on an existing project by duplicating the same settings irrespective of the local host environment [21]. CCCORE sets up an environment to run multiple instances of the same application for different users.
Reproducibility: Reproducing research is time consuming and challenging, and calls for configuring the platform, clustering virtual machines, and applying compatibility fixes for the operating system, software libraries, and tools [22]. CCCORE supports reproducibility by enabling researchers to reproduce the complete compute environment used by the original researcher. CCCORE creates lightweight RCUs with the entire set of data, the application, and all its dependencies, such as root file systems, registries, and software libraries. Thus the entire workflow of a project used by the original researcher can be replicated and extended by other researchers.
Computational portability: Some computational tools used for scientific analysis are tightly coupled with system environments and registry settings. An RCU, being a lightweight and platform-independent container, is portable across platforms. The problem of replicating the computational environments needed to run applications shared between researchers is resolved in CCCORE, as RCU instances can be exported to any environment, consequently enabling the emulation of the computational environments required to run these applications. The Open Virtualization Format (OVF) defines an open standard for packaging and distributing software for virtual machines.
Dynamic resource provisioning: CCCORE relies on autoscaling to support dynamic allocation of resources for compute-intensive research applications. Research tools or applications may demand a set of dedicated resources, or at times the workload can vary with the intensity of the analysis. The scalability [23] built into CCCORE enables allocation of resources in response to an uncertain workload. Demand-driven resource provisioning commissions or decommissions resource instances for the RCU through MI. To achieve a faster execution time, MI allocates the residual resources of any RCU to any other RCU that demands them. Provisioning compute capacity according to the varying workload of scientific applications requires eliminating the resource locking caused by static provisioning. Moreover, static resource provisioning causes under-utilization or over-utilization of resources, which poses a challenge in resource allocation.
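The idea of lending residual resources from one RCU to another can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: resources are reduced to a single integer unit, the residual of an RCU is taken to be `allocated - used`, and donors are visited in order of largest residual first, which is a policy chosen only for the example. The function name `borrow_residual` is hypothetical.

```python
def borrow_residual(rcus, borrower, demand):
    """Move up to `demand` units of unused capacity from other RCUs to `borrower`.

    `rcus` maps an RCU name to {"allocated": int, "used": int} and is updated
    in place. Returns the number of units actually moved.
    """
    def residual(name):
        return rcus[name]["allocated"] - rcus[name]["used"]

    moved = 0
    # Assumption: donors with the largest residual are asked first.
    donors = sorted((n for n in rcus if n != borrower), key=residual, reverse=True)
    for name in donors:
        if moved == demand:
            break
        take = min(residual(name), demand - moved)
        if take <= 0:
            continue
        rcus[name]["allocated"] -= take
        rcus[borrower]["allocated"] += take
        moved += take
    return moved
```

In this sketch a donor never gives up capacity it is currently using, so the lender's running work is unaffected while the borrower's allocation grows.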

Framework of CCCORE
The main layers of the CCCORE framework are the physical layer, the virtualization and control layer, the service layer, and the delivery layer. Figure 3 illustrates the layered framework architecture of CCCORE. Table 2 shows the functionality of each node of the virtual infrastructure diagram. We denote by a the virtual link bandwidth between the virtual router and the computational resources, and by b the virtual limit latency between compute nodes. MI connects the resources (storage and compute) through virtual routers. To create an RCU, MI selects one of the computation nodes 3, 6, or 7 based on the workload, through virtual router 4, creating route 5-4-6, 5-4-7, or 5-4-3. MI connects computation nodes 3, 6, and 7 to storage node 1 through virtual router 2. MI comprehensively assesses the available resources of CCCORE and allocates bandwidth and resources to any RCU based on workload requirement and finish time.
Virtualization and control layer: In hypervisor-based virtualization, the guest operating system that runs the applications consumes server resources, thus increasing the system overhead [24]. The virtualization and control layer adopts operating-system-level virtualization, which enables the RCUs to share the operating system with the host and with other RCUs [25]. The layer offers an abstraction for the researchers and ensures isolation of resources for all the RCUs.
Service layer: This layer acts as a repository that stores the images of all RCUs in OVF (Open Virtualization Format). An RCU is exported in OVF to the image repository. OVF enhances the portability and platform independence of the RCU. Researchers access the allocated RCU through the service layer.
Delivery layer: In a collaborative research environment where resource demands are always high, a Virtual Machine (VM) based approach can be inefficient. The delivery layer relies on rapidly scalable containers to accommodate high resource demands [26].
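The route selection described earlier in this section, in which MI picks one of computation nodes 3, 6, or 7 and connects it through virtual routers 4 and 2, can be sketched as below. The node numbers and routes come from the text; everything else is an assumption for illustration, in particular the selection policy (the node with the most free capacity that can hold the workload), the single-number capacity model, and the name `plan_routes`.

```python
COMPUTE_NODES = (3, 6, 7)   # computation nodes named in the text
COMPUTE_ROUTER = 4          # virtual router on routes 5-4-x
STORAGE_ROUTER = 2          # virtual router between compute nodes and storage
STORAGE_NODE = 1
ENTRY_NODE = 5              # start of the routes 5-4-3, 5-4-6, 5-4-7


def plan_routes(free_capacity, workload):
    """Pick a computation node for a workload and return its two routes.

    `free_capacity` maps node number -> free units (hypothetical unit).
    Returns None when no computation node can hold the workload.
    """
    candidates = [n for n in COMPUTE_NODES if free_capacity.get(n, 0) >= workload]
    if not candidates:
        return None
    # Assumed policy: prefer the node with the most free capacity.
    node = max(candidates, key=lambda n: free_capacity[n])
    return {
        "compute_route": [ENTRY_NODE, COMPUTE_ROUTER, node],
        "storage_route": [node, STORAGE_ROUTER, STORAGE_NODE],
    }
```

A fuller model would also weigh the link bandwidth a and latency b mentioned above; they are left out here because the source does not give their values.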

RESEARCH METHODOLOGY

System model
We model the dynamic resource allocation problem as an optimization problem that aims to minimize the finish time and improve the throughput in order to achieve optimal resource utilization. Our container-based resource allocation algorithm enhances dynamic scalability by employing underutilized residual resources [27] and hence minimizes the finish time of an application.
Consider the set of total available resources $N_p$ (compute, memory, storage, and bandwidth) in CCCORE. Each RCU is denoted as $r$, residual resources in each RCU is denoted . Consider a job (application) $A_j$ with workload $L_j$ and maximum allowed service delay $T_j$; then the resources required are calculated as = (1)

To avoid under-utilization and resource locking, an RCU will not execute a job whose size is less than a defined minimum value. We define a minimum size for any job executed by an RCU.
The minimum job size should be ≥ $L_j \beta_j$, where $\beta_j$ =
MI comprehensively assesses the total available resources in CCCORE to allocate resources optimally. The total residual resources in the RCUs are calculated as Z = ∑
The finish time for a job is the ratio of workload to required resources with a specific time delay. Finish time decreases with optimal utilization of residual resources. The finish time for a job $A_j$ is calculated as, = -
Let $Z_r$ be the allocated bandwidth for each user, $Y$ the unused bandwidth for an RCU, $n$ the maximum number of RCUs that can be created in CCCORE, and $x$ the number of active RCUs at any moment.
The maximum throughput allocated to any RCU ($X_r$) is calculated as:
The maximum throughput of CCCORE is calculated as ∑ ∑

Proposed algorithm
On-demand, flexible resource provisioning calls for a comprehensive assessment of the requirements of the users and the available resources. The proposed algorithm aims to minimize the finish time, improve the throughput, and achieve optimal resource utilization. If the initially provisioned resources of an RCU are not adequate to meet either the finish time or the resource requirements of an application, MI allocates the requested resources from the unused residual resources of other RCUs.

Algorithm 1: RCU Allocation
Input:
    A: Maximum number of RCUs allocated for each researcher (owner)
    N: Total number of RCUs available in CCCORE
    B: Maximum number of RCUs any researcher can request
Output: RCU_ij
If B ≤ A then
    Create RCU_ij
If B > A then
    Obtain RCU_ij (A ≤ B ≤ N) from N with MI approval
    Create RCU_ij
Set user rights for RCU_ij
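Algorithm 1 can be turned into a short runnable sketch. The branch structure follows the algorithm above; the rest is assumed for the example: the function name `allocate_rcus`, the `mi_approves` callback that stands in for MI approval, the dictionary representation of an RCU, the default rights label, and the choice to return an empty list when the request exceeds N or MI refuses.

```python
def allocate_rcus(requested, owner_quota, total_available, mi_approves,
                  rights="view_and_execute"):
    """Sketch of Algorithm 1 with B = requested, A = owner_quota, N = total_available.

    `mi_approves` is a hypothetical callable taking the requested count and
    returning True when MI approves drawing RCUs from the shared pool.
    Returns a list of RCU records, or an empty list when nothing is allocated.
    """
    if requested <= 0:
        raise ValueError("requested must be positive")

    if requested <= owner_quota:
        # B <= A: the owner's own allocation covers the request.
        source = "owner"
    elif requested <= total_available and mi_approves(requested):
        # A < B <= N: obtain the RCUs from the CCCORE pool with MI approval.
        source = "shared"
    else:
        # Request exceeds N or MI did not approve (behavior assumed here).
        return []

    # Create the RCUs and set the user rights on each of them.
    return [{"index": j, "source": source, "rights": rights}
            for j in range(requested)]
```

For instance, with A = 4 and N = 16, a request for 2 RCUs is served from the owner's quota, while a request for 6 is served from the shared pool only if the approval callback returns True.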

RESULT AND ANALYSIS
The hardware infrastructure deployed for the experiment is as follows: four physical machines with identical configuration, each with a Core i5-5287U processor (3 MB smart cache, 2 cores / 4 threads @ 2.9 GHz) and 4.00 GB of installed memory (RAM), connected through a 1G Ethernet switch (Cisco SF 300, 24 port). We configured the RCU-based systems on physical machines installed with Ubuntu 14.04, OpenStack, and 4 LXD (Linux containers). The VM-based systems were installed with Windows Server 2012 Standard edition with Service Pack 2 and 4 VMs.

Scenario I
We evaluated the VM-based and RCU-based systems for resource efficiency with respect to finish time and throughput. An improvement in finish time increases resource efficiency in a collaborative research environment. We compared the VM-based and RCU-based systems by running a .NET application and SageMath. While the .NET application is computationally light, SageMath is a memory- and compute-intensive application. We conducted multiple iterations while varying the configuration of the VM and the RCU: 50 iterations for the .NET application, as it is lightweight, and 10 iterations for SageMath. Table 3 shows the average finish time on the VM-based and RCU-based systems when executing the .NET application for configurations 1) 2-core compute, 1 GB RAM and 2) 4-core compute, 8 GB RAM, and when executing the SageMath application for configurations 3) 6-core compute, 8 GB RAM and 4) 8-core compute, 16 GB RAM.
Figure 5 highlights that the RCU showed a 45% better finish time than the VM for configuration 1), 53.8% for configuration 2), 41% for configuration 3), and 48% for configuration 4). The comparative analysis highlights that, with an increase of resources (cores and memory), the proposed RCU-based CCCORE delivers a better finish time than the VM, owing to the improved resource utilization implemented through our algorithm.

Scenario II
We compared the finish time of the VM, LXD, and RCU systems with an identical configuration of 8 cores and 16 GB RAM in three iterations that varied the residual resources. By varying the residual resources, we analyzed the impact of resource optimization on the finish time. In the first iteration, no residual resources were made available in the system; in the second iteration, 25% residual resources were available; and in the third iteration, 60% residual resources were available.
The comparative analysis shown in Figure 6 demonstrates that the finish time for the VM and LXD did not change with the availability of residual resources, whereas the RCU employed underutilized residual resources and achieved a better finish time.

Scenario III
We conducted experiments to evaluate the throughput of the RCU and the VM when processing data of varying sizes (1 GB, 4 GB, and bulk data ≥ 100 GB). The purpose of the study is to analyze the efficiency of the RCU in utilizing the unused bandwidth to achieve better throughput, as shown in Table 5.
Figure 7. Comparison of throughput of VM and RCU in processing data of varying sizes
As Figure 7 shows, when migrating 1 GB of data, the RCU-based systems deliver 13% higher throughput than the VM-based systems. The throughput gain increased to 24% with 4 GB of data and to 30.15% with bulk data migration. Therefore, the RCU achieves better throughput than the VM when processing data of variable sizes, since it is able to use the unused bandwidth.
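The percentage improvements reported in this section can be read as relative differences against the VM baseline. The paper does not state the exact formula, so the sketch below is an assumption: improvement is taken as the difference divided by the baseline value, with the sign flipped for metrics where lower is better. The function name `improvement_pct` and the numbers in the usage lines are hypothetical and are not measurements from the experiments.

```python
def improvement_pct(baseline, candidate, lower_is_better):
    """Relative improvement of `candidate` over `baseline`, in percent.

    Use lower_is_better=True for finish time and False for throughput.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    delta = (baseline - candidate) if lower_is_better else (candidate - baseline)
    return 100.0 * delta / baseline


# Hypothetical values chosen only to illustrate the arithmetic.
finish_gain = improvement_pct(200.0, 110.0, lower_is_better=True)       # finish times in seconds
throughput_gain = improvement_pct(100.0, 113.0, lower_is_better=False)  # throughputs in MB/s
```

With these illustrative inputs, a finish time that drops from 200 s to 110 s is a 45% improvement, and a throughput that rises from 100 MB/s to 113 MB/s is a 13% improvement.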

CONCLUSION
We have designed a Cloud Container based Collaborative Research (CCCORE) framework with dynamic resource provisioning according to the varying workload in a collaborative research environment. The proposed system relies on flexible, customized containers, named RCUs, to spawn a complete computational environment for researchers. A comprehensive assessment of user requirements and the use of underutilized residual resources enhanced the efficiency of CCCORE. Experimental evaluation indicates that the proposed RCU-based CCCORE framework outperformed VM-based systems in terms of finish time and throughput. Our future work will address workflow automation in CCCORE and improved container security.