Company bankruptcy prediction framework based on the most influential features using XGBoost and stacking ensemble

Journal homepage: http://ijece.iaescore.com

Info 2021

Company bankruptcy is a serious problem for companies. It can cause losses to stakeholders such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict its likelihood from the company's financial data. This study therefore aims to find the best predictive model for company bankruptcy using the Polish companies bankruptcy data set. The prediction process combines feature selection and ensemble learning. Features are selected using XGBoost feature importance, keeping those whose weight value exceeds 10. The ensemble learning method is stacking, which is composed of base models and a meta learner. The base models are K-nearest neighbor, decision tree, support vector machines (SVM), and random forest, while the meta learner is LightGBM. The stacking model outperforms the base models in accuracy, with an accuracy rate of


INTRODUCTION
Predicting company bankruptcy is one of the most important problems in management science. The main purpose of this prediction is to classify companies as safe or as unsafe (bankrupt) [1]. In addition, wrong decisions concerning firms in financial difficulty or distress impose social costs on many parties, such as owners or shareholders, managers, the government, and others. Therefore, company bankruptcy prediction has become a special concern among industry practitioners as well as academics and researchers [2]-[5].
Machine learning techniques [6] and artificial intelligence [7] are now widely used by researchers to solve bankruptcy prediction problems, including support vector machines (SVM) [8]-[16], decision trees [17]-[23], and artificial neural networks (ANN) [24]-[31], and the topic has been surveyed in systematic literature reviews [32]-[37]. Machine learning techniques have also been improved through various strategies. Boosting based on feature selection, known as FS-Boosting, has been shown to perform well as a learner, with higher accuracy and diversity on two company bankruptcy data sets [38]. A combination of SVM and ANN integrated with dropout and auto-encoders produced better accuracy than baselines including logistic regression and genetic algorithm [39]. A hybrid approach combining the synthetic minority over-sampling technique (SMOTE) with ensemble learning methods, i.e. boosting, bagging, naive Bayes, ANN, random forest, rotation forest, and diverse ensemble creation by oppositional relabeling of artificial training examples (DECORATE), was shown to improve performance measures such as accuracy, AUC, type I and type II errors, and G-mean on a data set of Spanish companies [40]. An ensemble strategy integrating proportion SVM (pSVM) with bagging and boosting, called Bagged-pSVM and Boosted-pSVM, is based on learning with label proportions, where unlabeled training data are grouped into bags and only the proportion of instances of each class in a bag is given. This approach was proposed to avoid the need for a large amount of instance-level labeled training data [41]. A hybrid of SMOTE with edited nearest neighbor (SMOTE-ENN) as the over-sampling technique and the CBoost algorithm as a cost-sensitive predictive model was reported to outperform existing learning techniques [42].
Another approach reduces class imbalance in bankruptcy data sets using over-sampling (SMOTE) and then applies an ANN as the predictive model. This approach achieved a significantly better AUC than the plain ANN and weak learners [43]. The borderline synthetic minority over-sampling technique (BSM) combined with a stacked auto-encoder (SAE) based on the softmax classifier was proposed to handle imbalanced classification in company bankruptcy prediction. This combination is considered more efficient than BSM combined with machine learning techniques and than machine learning techniques without over-sampling [44].
At the same time, a company's business operations produce financial data that can be used to predict bankruptcy [45]. Recent work on bankruptcy prediction focuses on feature selection [33]. Company financial data such as sales, profit, and asset data affect bankruptcy prediction. Because such data have many features, a feature analysis step is needed to improve prediction quality. One study combined two types of feature selection, filter and wrapper, with two types of ensemble classifier, bagging and boosting, to build predictive models [46]. Son et al. [47] used skewness reduction for data normalization and the XGBoost algorithm to select important features as attributes for bankruptcy prediction; their method improved prediction performance by 17% in terms of AUC. Nobre [48] used the XGBoost algorithm for feature selection, combined with principal component analysis (PCA) and the discrete wavelet transform (DWT), in a predictive analysis and reported a return of 49.26%.

Based on previous research, increasing accuracy is the main focus in studies of corporate bankruptcy prediction, and combined approaches or improved methods are still needed to achieve better accuracy. Therefore, this study uses a feature analysis approach to select the best features and combines several machine learning algorithms in a stacking ensemble to improve accuracy. XGBoost feature importance is used to select highly influential features based on the weight value of each feature during the prediction analysis [49]. In addition, this study combines K-nearest neighbor, decision tree, SVM, and random forest in an ensemble learning scheme using the stacking method [33].
The purpose of this study is to achieve the highest accuracy by selecting the best features and combining several machine learning methods in a stacking ensemble.

THEORETICAL BACKGROUND

Boosting tree method
Boosting is a method that combines several base classifiers into a model that achieves higher accuracy than the individual classifiers. It is an additive ensemble method that works by adding new models to reduce the errors made by the existing models. Models are added sequentially until no further improvement is possible. Boosted models can achieve good accuracy even when each base classifier is only slightly more accurate than random guessing; such a base classifier is called a weak learner [50].
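The additive idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a one-dimensional regression task with squared loss, one-split stumps as weak learners, and a hypothetical learning rate of 0.5. All function names and the toy data are invented for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right) if right else 0.0
        error = sum(
            (r - (left_value if x <= threshold else right_value)) ** 2
            for x, r in zip(xs, residuals)
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def boost(xs, ys, rounds, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    predictions = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared loss the residual is the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return stumps, predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data (hypothetical): a bump that a single stump cannot represent.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0.1, 0.0, 0.2, 0.1, 0.9, 1.0, 1.1, 0.3, 0.2, 0.1]

_, one_round = boost(xs, ys, rounds=1)
stumps, twenty_rounds = boost(xs, ys, rounds=20)
mse_1 = mean_squared_error(ys, one_round)
mse_20 = mean_squared_error(ys, twenty_rounds)
```

Each stump on its own is a weak learner, yet the training error after twenty rounds is lower than after one because every new stump corrects part of what the previous ones left unexplained.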
In a supervised learning setting, let the data set $D = \{(x_i, y_i) : x_i \in \mathbb{R}^m, y_i \in \mathbb{R}\}$ consist of $n$ samples with $m$ features and $n$ labels. A boosting tree model uses $K$ additive functions $g_k$ to predict the output:

$$\hat{y}_i = \sum_{k=1}^{K} g_k(x_i), \qquad g_k(x) = w_{q(x)}.$$

Here $q : \mathbb{R}^m \to \{1, \dots, T\}$ represents the structure of each tree, mapping a sample to the index of the corresponding leaf, and $w \in \mathbb{R}^T$ is the vector of leaf weights of a tree with $T$ leaves. To learn the set of functions, we minimize the loss function

$$L(g) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(g_k), \qquad \Omega(g) = \gamma T + \lambda \lVert w \rVert^2,$$

where $\Omega$ is a regularization term that penalizes model complexity. Because $L(g)$ has $K$ functions as parameters, it is hard to optimize directly, so the model is optimized additively instead. Let $\hat{y}_i^{(t)}$ be the prediction for the $i$-th sample at the $t$-th iteration. We add $g_t$ to minimize

$$L^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + g_t(x_i)\big) + \Omega(g_t),$$

which means that at each iteration we greedily add the $g_t$ that most improves the model. A second-order approximation, which uses the gradient of this intermediate loss function $L^{(t)}$, is applied; this is why the method is called the gradient boosting algorithm, as shown in algorithm 1 in Figure 1. XGBoost [49] is an open-source software library that provides a gradient boosting framework for C++, Java, Python, MATLAB and R. It uses a gradient boosting algorithm that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
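The second-order approximation mentioned above can be made explicit. The following derivation is a sketch under assumptions not stated in the text: the symbols $d_i$ and $h_i$ for the first and second derivatives of the loss, the leaf index sets $I_j$, the sums $S_j$ and $H_j$, and the regularizer $\Omega(g) = \gamma T + \lambda \lVert w \rVert^2$ written without a factor of one half.

```latex
\begin{align*}
L^{(t)} &\approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
          + d_i\, g_t(x_i) + \tfrac{1}{2} h_i\, g_t(x_i)^2 \Big]
          + \gamma T + \lambda \sum_{j=1}^{T} w_j^2, \\
d_i &= \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}, \\
I_j &= \{\, i : q(x_i) = j \,\}, \qquad
S_j = \sum_{i \in I_j} d_i, \qquad
H_j = \sum_{i \in I_j} h_i, \\
w_j^{*} &= -\frac{S_j}{H_j + 2\lambda}, \\
L^{(t)}_{\min} &= \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)}\big)
          - \frac{1}{2} \sum_{j=1}^{T} \frac{S_j^2}{H_j + 2\lambda} + \gamma T .
\end{align*}
```

The optimal leaf weight follows from setting the derivative of $S_j w_j + \tfrac{1}{2}(H_j + 2\lambda) w_j^2$ with respect to $w_j$ to zero. Under the other common convention, $\Omega(g) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$, the denominator becomes $H_j + \lambda$.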

Stacking ensemble modeling
The stacking ensemble, introduced by Wolpert [51], formalized by Breiman [52], and theoretically validated by Van der Laan et al. [53], is a learning algorithm known as a super learning framework based on minimizing generalization loss. Owing to its strong performance compared with other learning algorithms, the stacking ensemble has many applications in company bankruptcy prediction. The procedure is described in algorithm 2 in Figure 2. To improve prediction accuracy, this study therefore combines the stacking ensemble with the XGBoost algorithm.
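The mechanics of stacking can be sketched by hand with out-of-fold predictions. This is an illustrative sketch, not the procedure of algorithm 2: it assumes scikit-learn is available, uses synthetic data from `make_classification` in place of the bankruptcy data, only two base models, and logistic regression as a stand-in meta learner.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 400 samples, 10 features.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]

# Level 0: out-of-fold class-1 probabilities, so that the meta learner never
# sees a prediction made on a sample the base model was trained on.
meta_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# Refit each base model on the full training set to build test meta-features.
meta_test = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_models
])

# Level 1: the meta learner combines the base-model outputs.
meta_learner = LogisticRegression().fit(meta_train, y_train)
stack_accuracy = meta_learner.score(meta_test, y_test)
```

Using out-of-fold predictions as meta-features is the design choice that keeps the meta learner from simply trusting whichever base model overfits the training data most.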

RESEARCH METHOD
The research method of the bankruptcy prediction analysis consists of several stages: data collection, data pre-processing, feature importance, and modeling. The overall research framework is shown in Figure 3.

The data set in this study was taken from Kaggle, where it is publicly available. It contains historical bankruptcy data from Polish companies covering the years 2000 to 2012 [54]. The data set is composed of 65 features related to the company's business continuity and has 42,627 rows in total. The target feature is the "class" column, which takes the values 0 and 1: 0 means not bankrupt and 1 means bankrupt.

Data pre-processing normalizes data that do not support the analysis process [47], [55], namely repetitive data, blank data, and abnormal data. Features in the data set that are not related to the analysis process are also normalized [56], [57]. The pre-processing method in this study is data scaling, which brings the numeric values in the data set into a common range [58] so that the numeric data are balanced.

Important features are selected using the XGBoost algorithm [48]. The weight of each feature is calculated from the effect of the feature on the results of the predictive analysis. The resulting best features are then applied to the data set to improve prediction accuracy.

The modeling process uses stacking ensemble learning, which combines several machine learning algorithms such as K-nearest neighbor, decision tree, gradient boosting tree, and random forest [59]. Stacking is one of the ensemble learning methods and can use heterogeneous machine learning methods.
Stacking ensemble learning uses a meta-learning algorithm to find the best way of combining the predictions of two or more base machine learning algorithms. Its advantage is that it can exploit the strengths of several models that each perform well on a classification or regression task, and so make better predictions than any single model in the ensemble.
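The feature-importance stage of the research method can be sketched as a simple threshold filter. This sketch makes several assumptions: that the feature weight is XGBoost's "weight" importance (the number of times a feature is used in a split), that the `xgboost` package is installed when `xgb_weight_scores` is called, and that `n_estimators=100` is a reasonable setting. The function names and the example scores are hypothetical.

```python
def select_by_weight(scores, threshold=10):
    """Keep the features whose importance weight is strictly above the threshold."""
    return sorted(name for name, weight in scores.items() if weight > threshold)


def xgb_weight_scores(X, y):
    """Fit an XGBoost classifier and return {feature_name: split_count}.

    xgboost is imported lazily so the selection logic above can be used
    without the package installed.
    """
    import xgboost as xgb

    model = xgb.XGBClassifier(n_estimators=100)  # hypothetical setting
    model.fit(X, y)
    return model.get_booster().get_score(importance_type="weight")


# Hypothetical scores for illustration only; these are not results from the paper.
example_scores = {"Attr1": 25.0, "Attr2": 10.0, "Attr3": 3.0, "Attr4": 41.0}
selected = select_by_weight(example_scores)
```

With the threshold applied strictly, a feature with a weight of exactly 10 is dropped, so only `Attr1` and `Attr4` remain in this example.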

RESULTS AND DISCUSSION
The analysis was carried out in Google Colab using the Python programming language with the scikit-learn, pandas, and NumPy libraries and other supporting libraries. The data set from Kaggle consists of 5 CSV files, which were combined into one to facilitate the prediction analysis. In the pre-processing stage, the data were scaled using the StandardScaler class of scikit-learn. Scaling was applied to every numerical feature used for prediction. The target feature was not transformed because it is binary (0 and 1) and does not need scaling. The scaled data were then analyzed in the feature importance stage.
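The scaling step can be illustrated with a small sketch, assuming scikit-learn's `StandardScaler` and a tiny made-up feature matrix. The values below are hypothetical and only show that the features are standardized while the binary target is left untouched.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two numeric features on very different scales.
features = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 600.0],
    [4.0, 800.0],
])
target = np.array([0, 0, 1, 0])  # binary class label, not scaled

# Standardize each feature column to zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)
```

A common refinement, not described in the text, is to fit the scaler on the training portion only and reuse it on the test portion so that no test-set statistics leak into training.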

Feature importance
The feature importance stage selects the best features from the research data set. The best features are determined using the feature importance algorithm of the XGBoost machine learning method. Important features are selected based on the weight value of each feature generated during the prediction analysis; a feature is selected if its weight is greater than 10. Details of the feature selection results are shown in Table 1.

In the modeling stage, the bankruptcy prediction analysis is run on the data normalized in the pre-processing stage, using various machine learning methods. The analysis begins by dividing the data into training data and test data with a 75:25 ratio. The split is stratified and repeated. Stratified splitting preserves the class ratio of a selected categorical feature, here the target feature, in each partition. Repeated splitting means that the split is performed several times, as set by a parameter, and the data are shuffled on each repetition so that each iteration yields different partitions. Cross-validation is included in this process to avoid overfitting and underfitting and so maximize the quality of the predictive analysis. An overfitted model depends heavily on the training data and has a high error on the test data. An underfitted model fails to capture the patterns in the data set being analyzed.
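A stratified, repeated 75:25 split with shuffling can be sketched as follows. The choice of scikit-learn's `StratifiedShuffleSplit`, the five repetitions, and the synthetic imbalanced data are assumptions; the text does not name the splitter or the number of repetitions that were used.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic imbalanced data (assumption): 10% of samples in the positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.array([1] * 20 + [0] * 180)

# Five shuffled repetitions, each holding out 25% of the data for testing
# while preserving the class ratio of the target in both partitions.
splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
splits = list(splitter.split(X, y))

test_sizes = [len(test_idx) for _, test_idx in splits]
test_positive_rates = [float(y[test_idx].mean()) for _, test_idx in splits]
```

Each repetition draws a different shuffled partition, yet the share of positive samples in every test set stays close to the 10% found in the full data.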

Stacking
The machine learning method used in the modeling stage of this study is stacking ensemble learning. Stacking layers the outputs of several machine learning methods to produce better predictions. The methods combined in stacking can be heterogeneous. In this study, the stacking ensemble is used for classification, namely bankruptcy prediction.
The stacked machine learning algorithms in this study, called the base models, are K-nearest neighbor, decision tree, gradient boosting tree, and random forest. The base level can contain many algorithms, but the more algorithms are used, the more resources and time are required. The base level is not limited to one type of model; many variations can be used according to research needs. Because this study addresses a classification task, all algorithms used are classifiers. The outputs of the base models are passed to the meta learner, a machine learning algorithm that analyzes and combines the results of the base models to obtain better predictions than the base models alone. The meta learner used in this study is LightGBM, and the final output of the stacking model is the prediction generated by the meta learner. The accuracy details are shown in Figure 4, which shows that accuracy varies between the models. The lowest accuracy, 94.8%, is obtained by the decision tree algorithm, and the highest, 97%, by the stacking model.
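A stacking model with the four base models named above can be sketched with scikit-learn's `StackingClassifier`. Several things here are assumptions: the synthetic data, all hyperparameters, and the use of logistic regression as the meta learner in place of LightGBM to keep the example free of extra dependencies. If the `lightgbm` package is installed, its scikit-learn-compatible `LGBMClassifier` could be passed as `final_estimator` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), split 75:25 with stratification.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base models named in the text; the hyperparameters are illustrative only.
base_models = [
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("gbt", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=20, random_state=0)),
]

# Stand-in meta learner (assumption): logistic regression instead of LightGBM.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

The accuracy obtained on this synthetic data says nothing about the figures reported for the bankruptcy data set; the sketch only shows how the base models and the meta learner are wired together.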

CONCLUSION
In this study, a new method was used for bankruptcy prediction that combines feature selection and ensemble learning. The best features are selected using XGBoost feature importance, and prediction uses the stacking method. The base models are K-nearest neighbor, decision tree, gradient boosting tree, and random forest, and the meta learner is LightGBM. The stacking model outperforms the base models, with an accuracy rate of 97%.