Sensitivity of MAPE using detection rate for big data forecasting crude palm oil on k-nearest neighbor

Forecasting involves all areas in predicting future events. Many problems can be solved by using a forecasting approach to become a study in the field of data science. Forecasting that learns through data in the light age is able to solve problems with large-scale data or big data. With the big data, the performance of the k-nearest neighbor (k-NN) method can be tested with several accuracy measurements. Generally, accuracy measurement uses MAPE so it is necessary to conduct sensitivity on MAPE by combining it with the detection rate which is the difference technique. In addition, the k-NN process has been developed for the sake of running sensitivity by performing normalized distance using normalized Euclidean distance so that in this paper using the crude palm oil (CPO) price dataset, it is able to forecast and become a future model and apply it to Business Intelligence and analysis. In the final stage of this paper, the accuracy value in doing big data forecasting on CPO prices with MAPE is 0.013526% and MAPE sensitivity combined with a detection rate of 0.000361% so that future processes using different methods need to involve detection rates.


INTRODUCTION
Forecasting is a technique of data mining combined with machine learning that is used to analyze and calculate future events by using reference to past data with qualitative and quantitative approaches [1][2][3]. Forecasting has the objective of estimating prospects for economic progress as well as business activities and the environmental impact on these prospects [4,5]. So that forecasting for every company or business organization enters the most important part in every management decision making and forecasting itself can be the basis for management to plan activities and decisions in the short, medium and long term [6,7]. In conducting the forecasting process, a method is needed in its application, namely the K-nearest neighbor method [8,9]. k-NN is a method for classifying and forecasting objects based on learning data that is closest to the data object carried out by training data [10][11][12]. k-NN is very often found in papers that do forecasting such as reference [13] doing big data forecasting research, which is to predict customers who will be in arrears in credit payments to a bank and the result is a time measurement when the program carries out the forecasting process. In doing forecasting, of course, a learning technique is needed to train the past [14]. Learning past data can be trained using a method [15,16]. One of the methods of forecasting is k-NN [17,18]. In carrying out training data, you must have large data because of the forecasting process so that large data is needed in forecasting to get as few errors as possible in it [19].
As in Al-Khowarizmi [20] predicting the price of crude palm oil (CPO) and turning it into business intelligence in order to cut processing operating costs are so high that the method used by reference in making predictions is the Simple Evolving Connectionist System method. CPO is a product that is transacted in the commodity market or natural resource product which is commonly processed into various derivative products, both in the form of consumer goods and industrial raw materials [21][22][23]. So that it is very suitable that the CPO data collected is used as big data to be forecasted.
In obtaining a small error value, of course, a technique of measuring accuracy must be carried out as is done by Lubis [24] to analyse bank customer data and measure the accuracy value looking for the smallest error with MAPE and MSE so that in his research the smallest error value was achieved using MAPE. In addition, reference [25] also applies the smallest error value accuracy in the Simple Evolving Connectionist System method using MAPE but is influenced by the normalized distance formula.
However, Prayudani [26] compared the smallest error value with MAPE, MAE and MSE where the MAPE and MAE results have the same value except that the difference is in units using percent (%). In addition, by getting the value of accuracy and the smallest error value, it will increase productivity in business intelligence [20]. So that the application of data science has been used in various fields of science such as in this paper to measure accuracy to get the smallest error value that will be used to do big data forecasting on CPO prices using the KNN method so that it can be used as a model in a business intelligence and analysis (BIA).

RESEARCH METHOD 2.1. Dataset
The dataset in this paper uses a big data pattern, namely getting CPO price, the dataset is obtained from www.investing.com based on time and world CPO prices from January 1 st 2010 to August 25 th 2020 in the forecasting process involving training data and data testing. Where is the training data from January 1, 2010 to December 30 th 2019 and testing data from January 1 st 2020 to August 25 th 2020.

General architecture
In this paper, so that the research does not expand, a general architecture is described in this paper as shown in Figure 1.  In general architecture, it is described in the steps: a. Putting the dataset on storage. b. Perform calculations with the normalized Euclidean formula c. Training data -Input data into the k-NN network by calculating the formula -Looking for neighbours with the formula assignment. d. Testing Data -Input data into the k-NN network by calculating the formula -Looking for neighbours with the formula assignment. e. Analyse the results by obtaining sensitivity in MAPE with the combination of Detection Rate to obtain the results of big data forecasting with CPO price dataset.

Development of k-NN distance formulas
K-NN is a machine learning method because it involves past data to predict future data. The k-NN process is not only for forecasting but for predicting classification and association in finding patterns [27]. K-NN is the result of advanced improvisation Nearest Neighbour classification techniques. This concludes because each new data can be trained by many k neighbours, where k is a positive integer, with a small amount of data [28]. Forecasting uses k-NN at its stage of conducting training data and testing data based on categories in the sample or on past data and in accordance with the training sample k which is the closest neighbour to the sample test, then entering into the category that has the largest probability category [29]. Near or far the distance of a point from its neighbour can be calculated using the Euclidean distance [30]. Euclidean distance is represented in (1): where: K is number of input nodes I is initial node value W is value of the destination node Dn is the distance between the starting point where it will move which is a point that is known to its class and new point to be addressed. The distance between the starting point and the training point is calculated and the closest k point is taken and if the new point or the target point is predicted to enter the class with the most forecasting of these points [31].

Sensitivity accuracy
Calculation of accuracy is one of the important things in pattern recognition. This process is carried out as a measure of evaluation in a system. Measuring the level of accuracy can use various ways, one of which is using the Detection Rate. The detection rate (DR) equation is represented in (2) [32]: where: a is actual data b is result data Then in this paper also measured by mean absolute percent error (MAPE). The (3) of MAPE is as [25,33]: where: a = actual data b = result data n = lots of data In addition to the Detection rate and MAPE, accuracy measurements are carried out by combining MAPE and Detection rate to get a good sensitivity value with (4).
where the difference value in getting the sensitivity MAPE value is combined with the detection rate.

RESULTS AND DISCUSSION
In this stage, forecasting is carried out because the forecasting process is needed to plan future activities, therefore it must consider and decide what forecasting is actually needed. Forecasting is a process that is not difficult, but the mistake of deciding the goal or what is desired will result in different results so that the accuracy of forecasting will be doubted. Each process also needs to determine every detail of the forecast in terms of weeks, months or years. In forecasting, identify what data is needed and what data is available. Identifying this data will have an impact on the choice of forecasting methods later.
After getting the data to be evaluated like a dataset, the next step is to choose and determine the right model or forecasting method. Generally, the method chosen for forecasting is a method that has considered factors such as cost and ease of use. In addition, one of the most important factors is the accuracy of forecasting. The most common way is to find the best two or three methods and then test them on historical data to see which forecasting method or model is the most accurate.
In this paper, big data forecasting is carried out on CPO prices using the k-NN method and obtaining an accurate sensitivity value on the accuracy measurement technique with MAPE in combination with the detection rate. Before doing data training and testing, first determine what data will do the training and data testing. The dataset used from January 1, 2010 to August 25, 2020 in the forecasting process involves training data and data testing. Where is the training data from January 1, 2010 to December 30, 2019 and testing data from January 1, 2020 to August 25. The dataset is summarized in Table 1.   Table 1 is a summarized dataset, data of 2365 rows and as a whole can be seen in the graph available in Figure 2. Figure 2 is the entire dataset. Next, do the forecasting process with the k-NN method where the first process of forecasting determines the value of k using the normalized Euclidean distance as in (1). The forecasting process is carried out based on 165 testing data and 2,200 training data with detailed training from 1 January 2010 to 30 December 2019 and testing data from 1 January 2020 to 25 August. Forecasting is an ongoing process at this stage. After making the forecast, one must record what actually happened (actual) and then use that information to measure the accuracy of the forecast and get the sensitivity of the accuracy. It should be noted that the best forecasting method in the past is not necessarily the best result for the future. Therefore, it is necessary to change the forecasting method along with changes in the data to be forecast. The forecasting results in this paper go well and can be seen in Table 2.   Table 2 shows the summary results of forecasting using k-NN where the process of determining the neighbor's distance uses the normalized Euclidean distance and the calculation of the error value and the detection rate value can be seen. The detection rate value is smaller than the error value commonly used in the calculation of accuracy. For more details, the summary of Table 2 can be seen in the graph provided in Figure 3.   The result of MAPE sensitivity looks smaller at 0.000361%. This further shows that the MAPE sensitivity combined with the detection rate obtain a very good accuracy value. So that MAPE sensitivity results have been found and can be tested on other methods.

CONCLUSION
In the end, this paper has found a sensitivity value in the accuracy measurement technique using MAPE combined with the detection rate and has been tested to perform big data forecasting on the k-NN method. The forecasting process is carried out with 2365 datasets with training data of 2200 and testing data of 165. The results of forecasting are going well so that this paper can develop towards data science as data processing to get very good accuracy values. As in the aim of this paper, to get a good sensitivity value in accuracy using MAPE and accuracy using MAPE combined with detection rates where the results received in forecasting CPO prices at k-NN, the MAPE value is 0.013526% while MAPE is combined with detection. rate of 0.000361% so that the best sensitivity value in forecasting using MAPE combined with detection rate.