Shannon entropy on near-infrared spectroscopy for nondestructively determining water content in oil palm

Indonesia is the world's largest producer of palm oil. To preserve its competitive advantages, the Indonesian oil palm sector must expand high-quality palm oil output. In oil palm quality control, water content is a crucial parameter because it can be used as a reference to determine the right harvest time. This study therefore proposes near-infrared (NIR) spectroscopy as a fast and non-destructive method for assessing oil palm water content. NIR spectra were processed using Shannon entropy to describe the characteristics at each wavelength. Oil palm fruit samples were collected at eight different maturity fractions. The analysis shows that the Shannon entropy value is closely related to changes in the water content of oil palm fruit: the entropy value decreases as the water content increases. The proposed technique predicts oil palm water content with satisfactory performance, achieving a coefficient of determination (R²) of 0.9746 and a root mean square error (RMSE) of 2.487. Application of this model will lead to a fast and accurate prediction system for oil palm water content.


INTRODUCTION
In Indonesia, millions of people are employed by the oil palm sector, which contributes significantly to the country's economy. Indonesia supplies about 45% of global demand. With an annual production of more than 45 million tonnes of palm oil and a plantation area of 14.99 million hectares in 2022 [1], Indonesia is the world's largest producer of palm oil [2], [3]. To preserve its competitive advantages, the Indonesian oil palm sector must expand high-quality palm oil output while keeping production costs low. Fruit quality can be determined by the degree of ripeness, water content, oil content, and free fatty acid content, which ultimately determine the quality of the palm oil produced [4]. Water content is a crucial parameter because it can be used as a reference to determine the right harvest time, which, combined with an accurate assessment of oil palm fruit ripeness, maintains optimal oil palm quality [5]-[8]. Demand for and interest in portable instruments in modern oil palm agriculture have grown rapidly. With properly designed sensors, the properties of an agricultural product can be measured quickly and related reliably to its quality. For oil palm fruit, measuring water content has been proposed as an indicator of both quality and maturity stage [9]. Traditional approaches to quality assessment of agricultural products have been used for a long time, but they are tedious, expensive, and time-consuming [10], [11]. These methods are carried out in a laboratory using chemical solutions that are potentially harmful to the environment, and they destroy the structure of the fruit and alter its water content. In addition, grading of oil palm fresh fruit bunches (FFB) is prone to error because bunches are grouped by color, and bunches of the same color are assumed to have the same water content. Therefore, replacing these methods with a fast, non-destructive, and cost-effective technique is necessary.
Oil palm FFB water content assessment has traditionally relied on visual inspection, but recent studies suggest that non-destructive testing technology could greatly improve efficiency and accuracy. Various methods for detecting the water content of oil palm fruit have been developed and tested, including those based on near-infrared (NIR) spectroscopy [3], inductive sensors [5], computer vision [12], image analysis [13], and electronic noses [14]. The NIR-based system is one of the most popular and promising methods for predicting the water content of oil palm [15]. Spectroscopic analysis in both the visible and near-infrared ranges has been widely used to classify fruit and to assess its quality and internal properties, for example the vitamin C content of apples [16], the chemical properties of mango [17], and the internal quality of citrus [18]. This method is known to be a fast and non-destructive way to assess fruit quality. Therefore, applying NIR to assess oil palm quality, especially water content, has tremendous potential.
With NIR spectroscopy, the chemical composition of oil palm fruit is estimated from spectrometer measurements using a regression model. The biggest challenge in implementing NIR technology is managing the large amount of data produced across the wavelength spectrum [15]. If these data are processed directly on instrument hardware, the computing burden increases considerably. An efficient and effective chemometric analysis is therefore required to produce accurate predictions. Researchers have presented chemometric strategies for evaluating the quality of oil palm, including principal component analysis [3], artificial neural networks (ANN) [10], empirical mode decomposition [15], partial least squares [19], and genetic algorithms [10]. Due to the nonlinear complexity of NIR signals, certain signal processing techniques are necessary to improve prediction accuracy. In contrast to previously reported chemometric analyses, we propose a Shannon entropy analysis to extract the characteristics of the spectra at each wavelength from the NIR data. This approach addresses the research gap outlined above and constitutes the novelty of this study.
Shannon entropy is widely used in information science. It quantifies the degree of unpredictability associated with a random variable [20], [21], that is, the expected value of the information contained in a message. It has also been used for many years in image encryption as a general measure of information and uncertainty. When NIR spectra are analyzed in the presence of significant noise interference, standard approaches are frequently inaccurate. This article proposes a method based on the computation of Shannon entropy to improve the accuracy and efficiency of the algorithm used to predict the water content of oil palm fruit. This study therefore aims to investigate the application of Shannon entropy analysis to NIR spectra to predict the water content of oil palm fruit. This water content information can be used to determine the quality of oil palm, so that each step in the oil palm industry can be handled effectively and efficiently.

METHOD
Collection of oil palm FFB samples
The Cikabayan Oil Palm Plantation (IPB University, Bogor, Indonesia) provided samples of oil palm fruit at several stages of ripeness. All samples were of the tenera variety, and FFB were harvested from eight different maturity fraction conditions according to the standards of the Indonesian Oil Palm Research Institute (IOPRI) [22]. Because the fruit develops from unripe to ripe after flowering, the collected FFB were categorized according to their maturity age: 3, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, and 6 months after anthesis (maa). The FFB samples were classified by professional plantation workers based on the oil palm records kept at each plantation. Each maturity age was represented by one FFB, and 35 fruits were taken from each FFB at proximal, central, and distal positions. Thus, a total of 350 fruits were used to develop the system and were sent directly to the laboratory for water content measurement and NIR spectra acquisition. Each new sample was taken from the same oil palm FFB on the test day. Testing had to be completed on the same day as sampling to avoid inconsistencies and contamination.

Water content analysis
Following the standard SNI 01-2891-1992, the oven method was used to determine the water content of oil palm fruit [23]. The principle of this approach is to evaporate the water from the fruit by heating it at 105 °C for 3 hours. This process is repeated until the sample's mass is stable. The difference between the sample's weight before and after heating, expressed relative to the weight before heating, is taken as its water content on a percent wet basis (% w.w).
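The wet-basis calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the laboratory procedure itself: the function names and the `tolerance_g` stopping threshold are hypothetical and are not taken from SNI 01-2891-1992.

```python
def water_content_wet_basis(mass_before_g: float, mass_after_g: float) -> float:
    """Return water content in % wet basis from masses before and after oven drying."""
    if mass_before_g <= 0:
        raise ValueError("mass_before_g must be positive")
    if not 0 <= mass_after_g <= mass_before_g:
        raise ValueError("mass_after_g must lie between 0 and mass_before_g")
    # Mass lost on drying is attributed to water; wet basis divides by the fresh mass.
    return (mass_before_g - mass_after_g) / mass_before_g * 100.0


def mass_is_stable(previous_g: float, current_g: float, tolerance_g: float = 0.001) -> bool:
    """Hypothetical stopping rule for repeated drying; the tolerance is an assumption."""
    return abs(previous_g - current_g) <= tolerance_g
```

For example, a fruit sample weighing 10.0 g before drying and 6.5 g after drying would have a water content of about 35% w.w.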

NIR spectra acquisition
The NIR spectra of oil palm samples were obtained using a NIRFlex N-500 spectrometer (BUCHI Labortechnik AG, Switzerland). The measurement mode was post-dispersive transflectance. The principle of NIR spectra measurement is to direct near-infrared light onto the oil palm sample; part of the reflected energy is received by the detector and recorded as absorbance data. Fresh fruit samples were scanned directly on the NIR sensor without slicing or grinding. Spectral absorbance at wavelengths from 1,000 to 2,500 nm was recorded for further analysis. All 350 fruit samples were scanned one by one, and their NIR spectra were used to develop a mathematical model for predicting water content.

Shannon entropy analysis
The German physicist Clausius proposed the idea of entropy in 1865. The term originally meant "energy deterioration" and referred to a certain physical condition [24]. It has been used extensively in thermodynamics. In 1948, Shannon developed "information theory" by adapting the notion of entropy from statistical physics to the analysis of signals, with the aim of resolving problems in the quantitative measurement of information. In this study, the concept of Shannon entropy was used to analyze the NIR spectra of oil palm. Shannon entropy in NIR spectra can be calculated using a modified formula based on Gianakopulus [25], as given in (1):
$$SE = -\sum_{i=1}^{n} c_i \log c_i \tag{1}$$
where c_i is the ratio of the total energy of the i-th NIR sub-wavelength spectrum to the total energy of the entire NIR spectrum, and i = 1, 2, 3, …, n.
In various contexts, the Shannon entropy is used to quantify the randomness or unpredictability of information. Specifically, Shannon entropy quantifies the expected value of the information contained in a data set. According to Nanda et al. [26], entropy can be described as a measure of the sudden changes in the energy levels of a spectrum.
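As an illustration of how an energy-ratio entropy of this kind might be computed, the following Python sketch splits one absorbance spectrum into sub-bands, forms the energy ratio c for each sub-band, and sums the entropy terms. Several choices here are assumptions rather than details given in the text: energy is taken as the sum of squared absorbance, the sub-bands are equal-length contiguous slices, and the number of sub-bands (`n_subbands`) and the logarithm base (`base`) are hypothetical parameters.

```python
import numpy as np


def shannon_entropy(absorbance, n_subbands=10, base=2.0):
    """Energy-ratio Shannon entropy of a single spectrum (illustrative sketch)."""
    spectrum = np.asarray(absorbance, dtype=float)
    if spectrum.ndim != 1 or spectrum.size < n_subbands:
        raise ValueError("need a 1-D spectrum with at least n_subbands points")
    # Assumption: energy of a sub-band is the sum of squared absorbance values.
    subbands = np.array_split(spectrum, n_subbands)
    energies = np.array([np.sum(band ** 2) for band in subbands])
    total_energy = energies.sum()
    if total_energy == 0:
        raise ValueError("spectrum has zero total energy")
    c = energies / total_energy          # ratio of sub-band energy to total energy
    c = c[c > 0]                         # treat 0 * log(0) as 0
    return float(-np.sum(c * np.log(c)) / np.log(base))
```

Under these assumptions, a spectrum whose energy is spread evenly over the sub-bands gives the largest entropy, while a spectrum whose energy is concentrated in a single sub-band gives an entropy of zero.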

Polynomial regression
Regression is a method of estimating relationships from given data to describe the characteristics of a data set [27]. This relationship can then be used for various calculations, such as forecasting future values [28]. In polynomial regression, the relationship between the explanatory and response variables is represented by a polynomial of the appropriate degree. If the degree of the polynomial is large, the regression curve becomes more flexible and can follow the pattern of the data distribution more closely [29]. In this study, polynomial regression was used to predict water content based on the Shannon entropy value of the NIR spectra. Polynomial regression of various degrees (linear, quadratic, cubic, and higher) can be formulated as in (2):
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_d x^d \tag{2}$$
where y is the target value of water content, \beta_i (i = 1, 2, 3, …, d) are the slope coefficients of the polynomial regression, x is the Shannon entropy value, d is the degree of the polynomial, and \beta_0 is the intercept.
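A polynomial fit of this form can be sketched with NumPy's least-squares routine `numpy.polyfit`. The helper names below are hypothetical, and the demonstration data are made up purely to show the mechanics; they are not the measurements of this study, and the paper does not state which software was used for fitting.

```python
import numpy as np


def fit_polynomial(entropy, water_content, degree):
    """Least-squares polynomial fit; returns coefficients intercept first."""
    x = np.asarray(entropy, dtype=float)
    y = np.asarray(water_content, dtype=float)
    # np.polyfit returns the highest power first, so reverse to intercept-first order.
    return np.polyfit(x, y, degree)[::-1]


def predict_water_content(betas, entropy):
    """Evaluate the fitted polynomial at the given entropy values."""
    x = np.asarray(entropy, dtype=float)
    return np.polyval(np.asarray(betas)[::-1], x)


# Synthetic, made-up data for illustration only (not the study's measurements).
x_demo = np.linspace(0.0, 1.0, 25)
y_demo = 80.0 - 50.0 * x_demo + 10.0 * x_demo**2
betas_demo = fit_polynomial(x_demo, y_demo, degree=2)
```

Because the synthetic data are an exact quadratic, `betas_demo` recovers the intercept and slope coefficients used to generate them, up to floating-point error.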

Evaluation
The performance of the model is assessed by comparing the measured values with the predicted results [30]. Root mean square error (RMSE) and coefficient of determination (R²) were used to rank the accuracy of the polynomial regression models developed for predicting oil palm water content [31]. A model with a high R² and a small RMSE is considered more reliable [32], [33]. The full dataset was randomly divided into calibration and prediction sets at a ratio of 8:2.
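The evaluation procedure can be sketched as follows, using the standard definitions of RMSE and R². The random split shown here is one plausible reading of the 8:2 division; the function names and the `seed` parameter are hypothetical, and the paper does not specify how the randomization was performed.

```python
import numpy as np


def split_calibration_prediction(n_samples, calibration_fraction=0.8, seed=0):
    """Randomly split sample indices into calibration and prediction sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_cal = int(round(calibration_fraction * n_samples))
    return order[:n_cal], order[n_cal:]


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((m - p) ** 2)))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((m - p) ** 2)
    ss_tot = np.sum((m - m.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("measured values are constant; R^2 is undefined")
    return float(1.0 - ss_res / ss_tot)
```

With 350 samples and an 8:2 ratio, this split yields 280 calibration samples and 70 prediction samples.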

NIR spectra
First and second overtones, as well as combinations of fundamental vibrations, especially those of carbon-hydrogen bonds, produce a complex near-infrared spectrum with several overlapping bands. When analyzing samples with NIR spectroscopy, it is possible for the spectra to overlap. This may occur if the samples being studied are not well-defined or if there is a great deal of variation within a sample. When spectra overlap, it can be more difficult to interpret the data precisely, and there may be an increase in outliers. Outliers are data points that differ dramatically from the general data trend or pattern. To obtain reliable results in this case, it may be necessary to employ additional analytical procedures or collect additional data.
The NIR spectrum of the oil palm samples analyzed in this work, shown in Figure 1, is consistent with the previous reports. In general, there are two main peaks along the NIR spectra. The first small peak occurs near 1,200 nm. This area corresponds to the absorption wavelength for carbohydrates [34]. An absorbance band occurs at a wavelength of about 1,450 nm, which is generated mostly by combining the fundamental vibrations of the C-H groups [35], [36]. The fluctuations in NIR absorption intensity reflect differences in the water content contained in the oil palm fruit. NIR spectra with this overlap will likely contain more outliers. Therefore, analytical techniques for dimension reduction, such as Shannon entropy, are very useful for increasing model accuracy, separating overlapping spectra and reducing the number of outliers.

Oil palm water content
During ripening, oil palm fruit water content displays distinctive characteristics. One of the factors in determining oil palm maturity is its water content. The standard water content for ripe oil palm fruit is around 30% to 40% [37]. In this study, the water content of oil palm fruit ranged from 75.70% (at 3 maa) to 23.5% (at 6 maa). In general, water content decreases during ripening, as shown in Figure 2. As maturity advances, the oil content of the fruit increases while the water content decreases. Sinambela et al. [7] reported that water content varies greatly across the proximal, central, and distal regions of the fresh fruit bunch. The water content of centrally located fruit is the lowest compared with its proximal and distal counterparts, which indicates that the center of the FFB matures faster than the surrounding parts. Numerous variables, including environmental conditions, climate, and cultivar, can influence variations in water content at different fruit locations.
Figure 3 shows a two-dimensional (2D) plot of the water content and Shannon entropy values. The Shannon entropy value ranges from 0.0024 to 0.073, with water content between 26% and 88%. Visually, the entropy value decreases as the water content increases, which indicates a close and regular relationship between water content and Shannon entropy. A sudden drop in the Shannon entropy value occurs at a water content of about 86%. This may be related to the change in absorbance in the NIR spectra of samples containing more water. The term "entropy" refers to a statistical quantity used to quantify randomness and disorder. Entropy, as defined by Shannon, is the "statistical average" of the information function over the set of information sources as a whole, as well as over individual information sources and their derived processes.

Figure 2. Water content in various stages of oil palm maturity
The main contribution of this study is the use of the Shannon entropy approach to characterize the NIR spectra at each wavelength, which offers a simpler route to reliable predictions. Previous researchers have also reported good performance with Shannon entropy: Yu et al. [38] proposed an entropy-based technique for selecting hyperspectral characteristic wavelengths to categorize Lycium, Zhu et al. [39] used the entropy of maize seed hyperspectral data to identify maize seed purity, and Liu et al. [40] applied Shannon entropy to classify the origins of eggs under various storage conditions.

Modelling
The Shannon entropy value is used as the main input for constructing a polynomial regression equation that predicts the water content of oil palm, as shown in Table 1. This study evaluates polynomial regression models with degrees between 2 and 6. The R² and RMSE values of these models range from 0.9679 to 0.9746 and from 2.487 to 2.531, respectively, which shows that the models perform satisfactorily in predicting water content. In this study, accuracy improved as the degree increased, as evidenced by higher R² and lower RMSE. The polynomial regression equation of degree 6 was therefore chosen as the model for predicting water content, with an R² of 0.9746 and an RMSE of 2.487. Application of this model will lead to a fast and accurate prediction system for oil palm water content. The NIR spectral characteristics of oil palm consist of various data sets with different properties. Predictions of oil palm parameters, such as water content, can be made with various forms of regression. Polynomial regression applied to a single explanatory variable is called simple polynomial regression, and polynomial regression applied to multiple explanatory variables is called multiple polynomial regression. A high-degree polynomial regression is more likely to fit the training data closely than linear regression. If the degree d of the polynomial regression is large, the regression curve becomes more flexible with respect to the trend of the data [41].
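The comparison across degrees 2 to 6 can be sketched as a simple loop that fits each model on the calibration set and scores it on the prediction set. Everything in the block below is illustrative: the data are synthetic stand-ins rather than the values behind Table 1, the entropy feature is assumed to be rescaled to the interval [-1, 1] for numerical conditioning, and the function name `evaluate_degrees` is hypothetical.

```python
import numpy as np


def evaluate_degrees(x_cal, y_cal, x_pred, y_pred, degrees=range(2, 7)):
    """Fit one polynomial per degree on calibration data; score on prediction data."""
    results = {}
    for degree in degrees:
        coeffs = np.polyfit(x_cal, y_cal, degree)
        residuals = y_pred - np.polyval(coeffs, x_pred)
        rmse = float(np.sqrt(np.mean(residuals**2)))
        ss_tot = float(np.sum((y_pred - y_pred.mean()) ** 2))
        r2 = 1.0 - float(np.sum(residuals**2)) / ss_tot
        results[degree] = {"r2": r2, "rmse": rmse}
    return results


# Synthetic stand-in data (not the study's measurements): water content falls
# as a rescaled entropy feature rises, with a little added noise.
rng = np.random.default_rng(42)
x_all = rng.uniform(-1.0, 1.0, 350)
y_all = 55.0 - 30.0 * x_all + 5.0 * x_all**2 + rng.normal(0.0, 1.0, 350)
order = rng.permutation(350)
cal, pred = order[:280], order[280:]

scores = evaluate_degrees(x_all[cal], y_all[cal], x_all[pred], y_all[pred])
best_degree = min(scores, key=lambda d: scores[d]["rmse"])
```

Scoring on the held-out prediction set is a deliberate choice in this sketch: a higher-degree least-squares polynomial always fits the calibration data at least as well as a lower-degree one, so calibration error alone cannot distinguish a genuinely better model from a more flexible one.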

CONCLUSION
This paper demonstrates an NIR spectra processing method based on Shannon entropy analysis for predicting the water content of oil palm fruit. The Shannon entropy value is closely related to changes in water content: the entropy value decreases as the water content increases. The Shannon entropy analysis was combined with polynomial regression to build a reliable oil palm water content prediction model. Based on the performance evaluation, the proposed technique predicts water content with satisfactory performance, with an R² of 0.9746 and an RMSE of 2.487. Application of this model will lead to a fast and accurate prediction system for oil palm water content. This information can be used to improve the production process and help ensure product quality. Thus, the entropy analysis technique has the potential to detect the water content of oil palm fresh fruit bunches and thereby determine their internal quality. Future research will embed this entropy-based model in a portable instrument for predicting oil palm water content.

Walidatush Sholihah is a lecturer in the Computer Engineering Technology Study Program at the College of Vocational Studies at IPB University. She completed her master's in computer science at IPB University, Indonesia, in 2020. She has authored or co-authored several conference papers in various reputable national and international journals. She has also authored some books in mathematics and computing. Her research interests include software engineering, automation systems, precision farming, and distance learning. In addition, she is a member of the Informatics and Computing Higher Education Association. Currently, she is the head of computational and library development at the College of Vocational Studies at IPB University. She can be contacted at email: walidah@apps.ipb.ac.id.

Gema Parasti Mindara is a lecturer in the Software Engineering Technology Study Program at the College of Vocational Studies at IPB University. She completed her master's degree in Computer Science at Universitas Indonesia in 2013. She has authored or co-authored several papers in various reputable national and international journals. Her research interests include software engineering, automated systems, and big data technology. Currently, she is the head of Software Engineering Technology at the College of Vocational Studies at IPB University. She is an active member of Indonesia Woman in Cyber Security and a member of the Informatics and Computing Higher Education Association in Indonesia. She can be contacted at email: gemaparasti@apps.ipb.ac.id.

Muhammad Iqbal Nurulhaq is a lecturer in the Technology and Management Plantation at the College of Vocational Studies at IPB University. He completed his master's in Agronomy and Horticulture at IPB University, Indonesia, in 2020. He has authored or co-authored several conference papers in various reputable national and international journals. He has also authored some books about agronomy, plant physiology, and community development. His research interests include sago palm, plantation commodities, community development, and precision farming. In addition, he is a member of the Indonesian Sago Palm Society. Currently, he is the secretary of the Production Technology and Agricultural Community Development Program at the College of Vocational Studies at IPB University. He can be contacted at email: muhammadiqbalnurulhaq@apps.ipb.ac.id.