Campus realities: forecasting user bandwidth utilization using Monte Carlo simulation

Received Oct 31, 2019; Revised Mar 12, 2020; Accepted Mar 30, 2020

Adequate network design, planning, and improvement are essential in a campus network as the use of smart devices escalates. Underinvesting in campus network devices leads to low network performance, while overinvesting leads to low resource utilization. It is therefore necessary to ascertain whether the current network capacity satisfies the bandwidth requirement. Bandwidth demand varies across times and periods as the number of connected devices increases, which emphasizes the need for an adequate bandwidth forecast. This paper presents a Monte Carlo simulation model that forecasts user bandwidth utilization in a campus network. The forecast helps in planning campus network design and upgrades so that content can be delivered in periods of both high and normal traffic load.


INTRODUCTION
At the inception of the internet, only a limited number of users were online in a typical campus network because mobile smart devices were not common. As mobile and smart devices proliferated, bandwidth on campus networks came under strain from streaming media and social platforms such as YouTube, Netflix, and Facebook, amongst others. Now, scientific education is moving to the cloud, so file transfers consume more bandwidth on the campus network [1][2][3][4][5]. This is driving many campus network operators to evaluate capacity upgrades. Because of cost, particularly in developing countries, many higher institutions and universities have not kept pace with network technology investment. However, it is essential that these universities find a way to upgrade their campus networks and extend the life of existing infrastructure while simplifying the architecture to keep the cost of operation low [6][7][8][9][10]. The educational missions of higher institutions and universities also depend on their network capabilities, yet most traditional campus networks were built on a three-tiered routed network model. This model assumes that learning in higher institutions and universities takes place in classrooms and that data is mostly consumed within the classroom environment. Nonetheless, the varied demands on campus networks that support mobile technologies, cloud applications, research files, and data transfers must be attended to [11][12][13][14][15][16][17]. Figure 1 shows a typical campus traffic allocation.
It is evident that the number of applications in use keeps growing. Bandwidth use and campus network traffic are determined by end-users accessing content from the internet or cloud data centers. Therefore, there is a need to develop a model for a high bandwidth network that will help schedule data movements [18][19]. Efficient network performance is achieved if network usage is planned a priori. Hence, there is a need to build a model that can predict data usage for real-world traffic in the campus network. This will help not only in planning and upgrading campus network resources but also in improving the overall performance of the network in terms of bandwidth consumption [20][21][22][23]. Past literature has tried to address bandwidth utilization in campus and residential networks by forecasting the bandwidth demands of aggregated subscribers, as presented in [1][2][3][4]. Statistical techniques were employed to quantify the concurrent traffic of fixed access networks, with a specific focus on residential areas only. The authors of [18][19][20][21] developed an adaptive bandwidth management system for higher education institutions with a view to increasing the bandwidth of users who access more educational websites. The theory of their design was based on the work in [24], which studied the utilization of bandwidth in the face of increased internet traffic in the era of 'bring your own device' (BYOD) and increased digital content. Traffic policing and shaping were applied to prioritize traffic and utilize bandwidth effectively. A hybrid data mining scheme that uses clustering and classification to allocate bandwidth in a priority-based manner has also been used to manage bandwidth, as presented in [25].
The essence of that work was to study and forecast students' behavioral patterns in a campus network and to determine the primary factors that influence students' internet browsing. The reviewed works showed that there is still a need to develop a user bandwidth utilization model based on campus realities.
Hence, the contribution of this paper is the development of a realistic model, based on an experimental setup, that can forecast user bandwidth utilization in a campus network from the user end. The remainder of the paper is organized as follows: section two presents the review of related works, their contributions, and limitations. Section three presents the model formulation and the governing equations. Results analysis and conclusions are presented in section four and section five, respectively.

RESEARCH METHOD
The step-by-step approach employed to implement the proposed model presented in this paper is discussed as follows.

Campus network design and upgrade
Usually, the most commonly used design in a campus network is the hierarchical network design. This design has three layers: the core layer, the distribution layer, and the access layer. The distribution layer switches at the higher layers are directly connected to the internet, while the access layer switches are directly connected to the end-users (computers or smart devices). To collect data, we set up a network spanning the higher to the lower layers of the hierarchy. A dedicated database server was responsible for collecting network traffic and network behavior data. This server operated 24/7 to provide adequate results. As depicted in Figure 2, we collected traffic data on a private network by configuring different user profiles. This is a typical campus network scenario in which each student or staff user has a unique username and password for accessing the internet.

Traffic generation
Approximately 50 computers (users) accessed and surfed various websites on the network concurrently, 24/7, using their login profiles to generate diversified traffic within a particular interval of time. The captured traffic from the different LANs and WANs was monitored and stored in a database server, continuously and without interruption or downtime. The daily, weekly and monthly traffic data generated show the average bandwidth and resource usage of the testbed network, based on the login information of the Mikrotik device, as in Figures 2-5. As a testbed, the network was subjected to rigorous usage within a week to collect traffic data as presented in

Traffic observation
In this work, we monitored all incoming and outgoing traffic on the router's port with the aid of the Wireshark software, in order to obtain accurate and precise results. The packets sent were recorded over varying times in hours and minutes, as shown in Figure 6. The values in Table 1, extracted from Wireshark, show the payload used each day by all the users surfing the net, that is, the bandwidth payload captured for the different days of the week. The payload comprises traffic generated from research, social media and video streaming sites. This traffic was monitored and is presented in Table 2 according to usage.

RESULTS AND DISCUSSIONS
Looking at the network topology in Figure 2, and from the sample of data traffic collected, our outcomes support the analytical results used for decision making and for dimensioning the network bandwidth precisely. Based on the data collected, our analysis and results are as follows. Let the total number of users a campus network can concurrently support be $U$, the amount of available network bandwidth in an institution be $A$, the average utilization be $K$, the average utilization per application be $K_p$, and the transfer data rate be $D$. These quantities are related by (1), where $A$ is the available bandwidth, $D$ is the transfer data rate and $U$ is the number of current users. The amount of available bandwidth is in Gbps and needs to be converted to Mbps to determine how much bandwidth each application is consuming. The bandwidth total is calculated in (2) from $K_{pR}$, $K_{psm}$ and $K_{pls}$, where $K_{pR}$ is data usage on research, $K_{psm}$ is data usage on social media, and $K_{pls}$ is data usage on live streaming. The required minimum bandwidth is calculated using (3). The bandwidth utilization of each user can be limited (increased or decreased) using (4).
The terms in (4) are the number of research sites accessed, the number of live streaming sites accessed, the number of social media sites accessed and the total number of sites accessed.
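The bookkeeping described above can be illustrated with a minimal sketch. It converts an available bandwidth from Gbps to Mbps, combines the per-category usage into a total, and reports each category's share. The additive form of the total, the function names, and every numeric value are assumptions made for illustration; they are not taken from the paper's equations or measurements.

```python
def gbps_to_mbps(gbps: float) -> float:
    """Convert Gbps to Mbps using the decimal convention (1 Gbps = 1000 Mbps)."""
    return gbps * 1000.0


def total_usage(k_research: float, k_social: float, k_streaming: float) -> float:
    """Total usage, ASSUMED here to be the plain sum of the three categories."""
    return k_research + k_social + k_streaming


def usage_shares(k_research: float, k_social: float, k_streaming: float) -> dict:
    """Fraction of the total usage attributed to each traffic category."""
    total = total_usage(k_research, k_social, k_streaming)
    if total == 0:
        raise ValueError("total usage is zero; shares are undefined")
    return {
        "research": k_research / total,
        "social_media": k_social / total,
        "live_streaming": k_streaming / total,
    }


# Hypothetical figures in Mbps, chosen only to exercise the functions.
available_mbps = gbps_to_mbps(1.0)
k_total = total_usage(120.0, 200.0, 180.0)
shares = usage_shares(120.0, 200.0, 180.0)
headroom_mbps = available_mbps - k_total  # capacity left after current usage
```

Keeping the conversion in its own function makes the unit convention explicit, which matters when comparing link capacity against per-application usage.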

Bandwidth forecast using Monte Carlo simulation
The steps employed for the Monte Carlo simulation are highlighted as follows.
a. Step 1: Sort the data in ascending order.
b. Step 2: Compute the margin of the extremum and calculate the number of intervals using (5) and (6) respectively. The terms in (5) and (6) are the extremum margin, the maximum of the data, the minimum of the data, the number of intervals and the total number of elements in the data. The margin between the lower and upper bound of the data is then calculated using (7), which gives the margin between the upper and lower bound of 5 intervals. Based on (5)-(7), the calculated interval values are given in Table 3.
c. Step 3: Compute the frequency distribution and convert it into probability values. The cumulative probability distribution is then computed, as shown in Table 4. The daily payload occurrence probability is shown in Figure 7, from which it can be observed that the payload with the highest probability of occurrence falls within the first interval. The corresponding probabilities of the other intervals are as given in the figure. Using this information, we performed a probability plot with the Anderson-Darling test, given in Figure 8. It was observed that the data follow a normal probability distribution, with p-value = 0.664 ≥ α = 0.05. Thus, fifty (50) normally distributed random numbers were generated for each day of the month using the Minitab simulator. Table 5 shows samples of the generated random numbers.

Figure 7. Probability of occurrence

The probability intervals (upper and lower bounds) were calculated using the cumulative probabilities obtained from the previous table. These probability intervals and their interval values are given in Table 6. Each random number was classified into one of these intervals, and the average of the lower and upper bound values of that interval was taken. These classified average values are given in Table 7.
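The three steps above can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: the extremum margin is taken as maximum minus minimum, the number of intervals defaults to Sturges' rule, the intervals are of equal width, and the random numbers are uniform on [0, 1) rather than the normally distributed numbers the paper generated in Minitab. The function names and the sample payloads are hypothetical.

```python
import math
import random
from bisect import bisect_right


def build_intervals(data, n_intervals=None):
    """Steps 1-2: sort the data and split its range into equal-width intervals."""
    values = sorted(data)
    lo, hi = values[0], values[-1]
    extremum_margin = hi - lo  # ASSUMED form: maximum minus minimum
    if extremum_margin == 0:
        raise ValueError("all values are identical; no intervals to build")
    if n_intervals is None:
        # ASSUMED rule (Sturges): k = 1 + 3.322 * log10(n), rounded down.
        n_intervals = int(1 + 3.322 * math.log10(len(values)))
    width = extremum_margin / n_intervals
    bounds = [(lo + i * width, lo + (i + 1) * width) for i in range(n_intervals)]
    return values, bounds


def cumulative_probabilities(values, bounds):
    """Step 3: frequency, then probability, then cumulative probability."""
    lo = bounds[0][0]
    width = bounds[0][1] - bounds[0][0]
    counts = [0] * len(bounds)
    for v in values:
        # The maximum value lands exactly on the top edge; keep it in the last bin.
        idx = min(int((v - lo) / width), len(bounds) - 1)
        counts[idx] += 1
    cumulative, running = [], 0.0
    for c in counts:
        running += c / len(values)
        cumulative.append(running)
    cumulative[-1] = 1.0  # guard against floating-point drift
    return counts, cumulative


def simulate_day(bounds, cumulative, n_draws, rng):
    """Classify each random draw into an interval and average the midpoints."""
    midpoints = [(a + b) / 2 for a, b in bounds]
    draws = [midpoints[bisect_right(cumulative, rng.random())] for _ in range(n_draws)]
    return sum(draws) / n_draws


# Hypothetical daily payloads in GB; these are not the paper's measurements.
payloads = [0.21, 0.25, 0.19, 0.30, 0.27, 0.22, 0.35, 0.24, 0.28, 0.26]
values, bounds = build_intervals(payloads, n_intervals=5)
counts, cumulative = cumulative_probabilities(values, bounds)
rng = random.Random(42)  # fixed seed so the run is reproducible
forecast = [simulate_day(bounds, cumulative, 50, rng) for _ in range(30)]
```

With 50 draws per day and 30 simulated days, `forecast` mirrors the shape of the row averages described in the text: one simulated payload per school day.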
The average of each row in Table 7 is computed as shown in the last column of the table. This average is treated as the Monte Carlo simulated payload for each day of the next thirty school days. The Monte Carlo simulated values were then compared with the actual daily payloads, as shown in Table 8. The two series presented in Table 7 are generated from different samples, so a two-sample t-test is used to test whether the population means are equal. The graphical view of the actual payloads and the Monte Carlo simulated payloads is presented in Figure 8. From Figure 8, it can be observed that the forecasted data follow the trend of the actual data very closely. This shows that the developed Monte Carlo based simulation model is effective in forecasting the internet payload usage of the study campus area network. The statistical data used for this study were generated for 50 users over 30 school days. On average, the total amount of data required to meet the network demand of these users in a month is estimated as 7.64 gigabytes. Scaling linearly by 2500/50 = 50, the total amount of data required to cater to the needs of 2500 students on the campus is estimated as 382 gigabytes.
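The comparison of the two series and the scaling arithmetic above can be sketched as follows. The paper does not state which form of the t-test was applied, so Welch's unequal-variance statistic is used here purely as an assumption. The function names and the two short sample series are hypothetical, and only the figures 7.64 GB, 50 users and 2500 students come from the text.

```python
import math
import statistics


def welch_t_statistic(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (Welch's form)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


def scale_demand(measured_gb, measured_users, target_users):
    """Linear scaling of monthly demand from the testbed to a larger population."""
    return measured_gb * target_users / measured_users


# Hypothetical daily payloads in GB; these are not the values in Table 8.
actual = [0.25, 0.27, 0.24, 0.26, 0.28]
simulated = [0.26, 0.26, 0.25, 0.27, 0.27]
t_stat = welch_t_statistic(actual, simulated)  # near zero when the means agree

# Figures from the text: 7.64 GB for 50 users, scaled to 2500 students.
campus_gb = scale_demand(7.64, 50, 2500)  # approximately 382 GB
```

Linear scaling assumes every additional student consumes data at the same average rate as the 50 testbed users, which is the assumption implicit in the 382 gigabyte estimate.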

CONCLUSION
The aim of this research work, which is to forecast user bandwidth utilization on a campus network, has been largely met. The research used Nuhu Bamalli Polytechnic, Zaria, Kaduna State, Nigeria as a case study. A network of 50 users concurrently surfing various sites for 30 days was set up to collect data. Monte Carlo simulation was used to forecast user bandwidth utilization in order to plan campus network design and capacity upgrades for 2500 users and beyond.