Predicting reaction based on customer's transaction using machine learning approaches

Banking advertisements are important because they help banks target specific customers with subscription packages and other deals, such as fixed-term deposit offers for current customers. This is done through promotional advertisements on the Internet or in the media, a task handled by the marketing department. Many banks and telecommunications firms store their customers' data in order to build relationships with them, offer them the most suitable deals, and assure the company that these deposits can be recovered. The Portuguese bank in this study increases its sales by establishing relationships with its customers. This study proposes building a prediction model using machine learning algorithms to determine, with the aid of past records, how a customer will react to an offer of a fixed-term deposit. The classification is binary, i.e., predicting whether or not a customer will accept such an offer. Four classifiers were used: the k-nearest neighbor (k-NN) algorithm, decision tree, naive Bayes, and support vector machines (SVM). The best result was obtained by the decision tree classifier with an accuracy of 91%, followed by the SVM classifier with an accuracy of 89%.


INTRODUCTION
Banking advertising comprises advertisements by financial institutions. In addition to advertising directed at bank clients, this category includes business reports and information pamphlets; statements about the payment of new shares, reports on investment program outcomes, and several additional financial announcements may also be included [1]-[3]. Many banks and telecommunication companies store their customers' data to establish a relationship with customers, provide them with the best offers, and at the same time assure the company that these deposits can be recovered. The Portuguese bank increases its sales by establishing a cordial relationship with its customers. Transaction prediction applies the k-nearest neighbors (k-NN) algorithm [4], [5], decision tree [6]-[10], naive Bayes [6], [11]-[13], and support vector machines (SVM) [14] to bank marketing. This study proposes creating a prediction model using machine learning algorithms to see how the customer reacts to subscribing to fixed-term deposits or offers, based on their past data [15]-[19]. The classification is binary, i.e., the prediction of whether or not a customer will participate in these offers. Four classifiers were used: k-NN, decision tree, SVM, and naive Bayes.

LITERATURE REVIEW
As the number of Internet users and businesses grows, many clusters of e-commerce applications appear that are not physically linked to each other in the system but are inter-related in business. Online banking has been in practice since the 1980s, when it was first introduced by four major banks in New York [20], [21]. One study of commercial financial transactions made short-term predictions using a logistic regression model and an SVM model; the comparison concluded that the SVM model's prediction, at 97.67%, was better than that of the logistic regression model [22]. Another study built an intelligent banking system on the Hercules architecture and found that intelligent online banking on Hercules greatly improved the intelligence and security of the traditional online banking system; that work also summarized and analyzed the intelligent online banking system's value and innovation and examined the system's flaws [23]. Research was also conducted on neural networks for predicting automated teller machine (ATM) transactions [24]. Additionally, research was done on the security flaws in online and mobile banking systems [25], banking fraud detection [26], banking apps and online payment systems [27], strong and secure authentication methods [28], and Internet banking user behavior. It is advised that initiatives meant to boost trust in the financial sector receive more attention; winning customer trust through activities such as the secure processing and transmission of highly secret data could be a helpful step toward retaining electronic customers [29]. The dataset is downloaded from the UCI machine learning repository and relates to the direct marketing campaigns of a Portuguese banking institution. These campaigns were based on phone calls.
More than one call to the same customer was often needed to determine whether ("yes") or not ("no") the "term deposit" product was subscribed to. There were four datasets with all examples (41,188) and 20 inputs, ordered by date (from May 2008 to November 2010), of which bank-additional-full.csv is used. There are 20 input variables and 1 output variable (the desired target). The dataset contains a variety of customer details: age, job, marital, education, default, housing, loan, contact, month, day_of_week, duration, campaign, pdays, previous, poutcome, emp.var.rate, cons.price.idx, cons.conf.idx, euribor3m, and nr.employed, plus one output variable y denoting whether or not the consumer subscribed to the term deposit. These datasets were loaded in Python for preprocessing to check for missing values, and "unknown" values were found. The dataset is also imbalanced [30], [31]. Deep learning, data mining, robotics, decision trees, and k-NN were among the approaches presented in other studies [32]-[38].

METHOD
Pre-processing
In this step, the data is processed in four stages: first, dropping unknown data; second, converting features from categorical to numeric; third, balancing the data; and last, choosing the best features. We began by cleaning the data of null values; features with 330 unknown attribute values were excluded, and some categorical data were converted to numeric.
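The first two steps (dropping "unknown" values and encoding categorical features) can be sketched in Python with pandas; the toy frame below only mimics a few columns of bank-additional-full.csv, and its values are illustrative rather than taken from the real file:

```python
import pandas as pd

# Toy frame mimicking a few columns of bank-additional-full.csv
df = pd.DataFrame({
    "age": [30, 41, 52, 38],
    "job": ["admin.", "unknown", "technician", "services"],
    "loan": ["no", "yes", "no", "unknown"],
    "y": ["no", "yes", "no", "yes"],
})

# Step 1: drop rows containing 'unknown' attribute values
df = df[~df.isin(["unknown"]).any(axis=1)].reset_index(drop=True)

# Step 2: convert the remaining categorical features to numeric codes
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes

print(df.shape)  # (2, 4)
```

On the real file, the same pattern applies column by column; one-hot encoding is a common alternative to integer codes for nominal features.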

Unbalanced dataset
When using classification algorithms such as k-NN or naive Bayes, imbalanced data can cause poor performance or overfitting. Because our dataset is heavily imbalanced, with one class at 70% and the other at 30%, accuracy alone is an ineffective measure, and applying the classifiers directly will not yield high accuracy. The dataset must therefore be processed before applying the classifiers [39]. The difference between the balanced and unbalanced datasets was significant: the unbalanced dataset performed poorly, with low accuracy and a delay in finding the best value of K, which reached 37 with an accuracy of 0.92. The balanced data, by contrast, reached K=3 with a high accuracy of 0.94, as shown in Figure 1. To solve this problem, we use two techniques: random under-sampling and the synthetic minority oversampling technique, as shown in Figure 2.

Classifiers and test evaluation
Random under-sampling method
This method balances the classes by randomly removing samples from the majority class until it equals the minority class, repeating the process until the two classes are the same size [40]. However, it may negatively affect the model's performance by discarding informative rows. Such methods are known as "naive resampling" approaches because they make no assumptions about the data and use no heuristics. This makes them easy to implement and quick to execute, which is ideal for very large and complex datasets.
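A minimal pure-Python sketch of random under-sampling on a hypothetical 70/30 label distribution (a real pipeline would typically use a library such as imbalanced-learn instead):

```python
import random

random.seed(0)

# Hypothetical imbalanced dataset: 70 majority (class 0) vs 30 minority (class 1)
data = [(i, 0) for i in range(70)] + [(i, 1) for i in range(30)]

majority = [row for row in data if row[1] == 0]
minority = [row for row in data if row[1] == 1]

# Randomly discard majority samples until the two classes are equal
balanced = random.sample(majority, len(minority)) + minority

counts = {}
for _, label in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {0: 30, 1: 30}
```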

Synthetic minority oversampling technique
Synthetic minority oversampling technique (SMOTE) selects a point from the minority class and then produces a new synthetic point using the k-NN method [41]. This is a sound strategy because, in feature space, the new samples are quite similar to existing minority-class instances, but it necessitates numerous calculations, as depicted in Figure 3.
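The idea can be sketched in a few lines of NumPy. This is a simplified illustration of the interpolation step, not the reference implementation from imbalanced-learn, and the minority points here are randomly generated:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sketch(X_min, n_new, k=3):
    """Minimal SMOTE sketch: interpolate between a minority sample
    and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # random position on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = rng.normal(size=(30, 2))              # toy minority class
X_new = smote_sketch(X_min, n_new=40)
print(X_new.shape)  # (40, 2)
```

Because each synthetic point lies on a segment between two existing minority points, it stays inside the region the minority class already occupies.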

Experiment to choose the balancing approach
After applying the two techniques, it was found that the best choice was SMOTE, because the results generalized better and overfitting was reduced in the k-NN model. Moreover, the number of samples was sufficient for applying this technique, as shown in Figure 4. With k ranging from 3 to 100, the value k=3 was reached early, in the first cycle, with a high accuracy of 0.93, while the other method reached k=11 in the last cycle with a weaker accuracy of 0.87.

Feature selection from the original dataset
This involves choosing a subset of features to improve performance and reduce the model's prediction time. When employing statistically based feature selection techniques, each input variable's relationship to the target variable is assessed, and the input variables with the strongest relationships are chosen. Filter methods calculate a score for each feature against the target and then choose the best-scoring features. This is not a learning process, but rather a search for features related to the label prior to learning [42]. The advantages of these algorithms are that they are not computationally expensive and that they discard the features with the least effect on the target [43]-[45].
Wrapper methods create a subset of the dataset by training a machine learning model and then repeat this training process, adding or removing features, until the best combination is found (a greedy algorithm searches for the best combination). The common techniques in this approach are forward selection and backward elimination [42]. These algorithms produce higher-quality subsets, but they are computationally costly [46]-[48].
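Both approaches can be sketched with scikit-learn. A synthetic dataset stands in for the bank data here: SelectKBest is the filter method (scoring each feature against the target) and SequentialFeatureSelector wrapped around a k-NN model is the greedy forward-selection wrapper:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, f_classif,
                                       SequentialFeatureSelector)
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the bank data (20 features, as in the original set)
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Filter method: score each feature against the target, keep the best 14
filt = SelectKBest(score_func=f_classif, k=14).fit(X, y)

# Wrapper method: greedy forward selection driven by a k-NN model
wrap = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3), n_features_to_select=14,
    direction="forward", cv=5,
).fit(X, y)

print(filt.get_support().sum(), wrap.get_support().sum())  # 14 14
```

The wrapper retrains the k-NN model for every candidate feature at every step, which illustrates why it is slower but often yields a better-performing subset.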

Feature selection experiment
Two experiments were conducted. The first used filter selection to obtain the best 14 features, yielding the results shown in Table 1. The second used wrapper methods to obtain the best 14 features, with the results shown in Table 2. The selected features include age, job, education, loan, contact, month, day_of_week, duration, campaign, cons.conf.idx, emp.var.rate, cons.price.idx, and euribor3m. As shown in Figure 5 (metrics of accuracy for the two methods), after applying the k-NN model to both feature sets, the wrapper methods achieved the highest accuracy.

Classifiers and test evaluation
Classifiers
After applying the data processing of the previous steps, we implement the machine learning algorithms, which are divided into two categories: parametric and non-parametric classifiers. This process aims to find the best model in each of the two categories, and the best model of each type will be selected.

Non-parametric classifier
It is also called "lazy learning" as it makes no assumptions during learning; the samples collected in the training data are simply used [49]. The algorithms under this technique are k-NN and decision trees. After the processing of the previous steps, we apply our dataset to the k-NN and decision tree classifiers, compare the results, and choose the better algorithm of the two.

Parametric classifier
Parametric algorithms are often called linear machine learning algorithms, and linear regression is frequently used among them [50]. This category includes algorithms such as naive Bayes, SVM, and others. Following the processing of the preceding steps, we apply our dataset to the naive Bayes and SVM classifiers, compare the results, and select the better algorithm of the two.

Test evaluation experiment
After balancing the data, we have 53,258 rows, and we experiment with two splits: i) 20% for the testing dataset and 80% for the training dataset; and ii) 30% for the testing dataset and 70% for the training dataset. According to the experiment using the decision tree in Figure 6, 20% testing performs better than 30%.
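The 80/20 split can be sketched with scikit-learn's train_test_split; a placeholder array stands in for the 53,258-row balanced dataset, and the stratify argument keeps the class ratio equal in both partitions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder for the 53,258-row balanced dataset
X = np.arange(53258 * 2, dtype=float).reshape(53258, 2)
y = np.tile([0, 1], 53258 // 2)

# Option i): 20% testing, 80% training (the better option in this experiment)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)
print(len(X_tr), len(X_te))
```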

EXPERIMENTS AND ANALYSIS
The experiments were performed with Python, which provides most of the machine learning algorithms. Before modeling, we need a preliminary exploration of the dataset (41,188 rows and 21 features). We have previously explained that the data is highly imbalanced; this problem was addressed using the SMOTE method, which was superior to the other method. In this step, we build the k-NN and decision tree models for the non-parametric category, try naive Bayes and SVM for the parametric category, and choose the best model from each. K-fold (5 folds) cross-validation, which gave better results, was employed throughout the study.

k-NN model
We obtained the best accuracy after tuning the k-NN model's settings: the number of neighbors was set to 3, the weighting was set to distance, and the search algorithm chosen was brute force, with k-fold (5 folds) cross-validation; see Figure 7. We got an accuracy of 93%, but the model does not work well on class 0, and a value of k equal to 3 is undesirable because it tends to be sensitive to noise.
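These settings can be sketched with scikit-learn; synthetic data stands in for the processed bank dataset, so the score will not match the paper's 93%:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the balanced, feature-selected bank data
X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

# Settings reported in the paper: 3 neighbours, distance weighting, brute-force search
knn = KNeighborsClassifier(n_neighbors=3, weights="distance", algorithm="brute")
scores = cross_val_score(knn, X, y, cv=5)  # 5-fold cross-validation accuracy
print(scores.mean())
```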

Decision tree model
The decision tree algorithm was implemented, and the highest accuracy was obtained after tuning the model's settings: the best criterion chosen was Gini, min_samples_split was 2, and the depth of the tree was 14, with k-fold (5 folds) cross-validation. We got an accuracy of 92%, but the model does not work well on class 0, as shown in Figure 8.
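A comparable scikit-learn sketch of these settings, again on synthetic stand-in data rather than the bank dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

# Settings reported in the paper: Gini criterion, min_samples_split=2, tree depth 14
tree = DecisionTreeClassifier(criterion="gini", min_samples_split=2,
                              max_depth=14, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)
print(scores.mean())
```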

Naive Bayes model
The naive Bayes algorithm is a classification method built on Bayes' theorem and predicated on the assumption of predictor independence. Put simply, it assumes that the presence of one feature does not depend on the presence of any other feature in the class. The naive Bayes model was implemented with default values, and we got an accuracy of 81%, as shown in Figure 9.
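A minimal scikit-learn sketch with default GaussianNB settings; the data is a synthetic stand-in, so the score will differ from the reported 81%:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

# Default settings; each feature is treated as conditionally
# independent given the class (the "naive" assumption)
nb = GaussianNB()
scores = cross_val_score(nb, X, y, cv=5)
print(scores.mean())
```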

SVM model
SVM is a collection of supervised learning techniques used for outlier identification, regression, and classification. The benefits of SVMs include remaining effective when the number of dimensions exceeds the number of samples and being efficient in high-dimensional spaces. The SVM was implemented using the SVC class from Sklearn, and we got 89% accuracy using the settings {'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}, as shown in Figure 10.
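A scikit-learn sketch using the reported settings, on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

# The settings reported in the paper: {'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}
svm = SVC(C=1, gamma=0.1, kernel="rbf")
scores = cross_val_score(svm, X, y, cv=5)
print(scores.mean())
```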

RESULTS
The k-NN model is nonparametric; with 14 features and a parameter value of k=3, it achieved a testing accuracy of 92% and a score before testing of 91%. The SVM model, with 14 features and the parameter kernel='rbf', was tested with 89% accuracy and scored 89% before testing. The decision tree model, also nonparametric, with 14 features and a depth of 14, was tested with an accuracy of 92% and a score of 91% before testing. Finally, naive Bayes, a parametric model with 14 features and default parameter values, achieved a testing accuracy of 80% and a score before testing of 80%. Table 3 shows the results of each model and the number of features that were used.

CONCLUSION
For this binary prediction task, the k-NN model, with 14 features and k=3, was tested with 92% accuracy and a score of 91% before testing; it is a nonparametric model. The SVM model, with 14 features and kernel='rbf', scored 89% both before and after testing. The decision tree, with 14 features and depth=14, was tested with an accuracy of 92% and a score of 91% before testing. Finally, naive Bayes, a parametric model with 14 features and default parameter values, achieved a testing accuracy of 80% and a score of 80% before testing. The current work uses 14 features rather than the original 20 because they were identified by feature selection techniques, which improved the model's performance in terms of speed. According to the task assigned to us, we had to choose only two algorithms, one parametric and the other non-parametric. The naive Bayes algorithm was selected as the parametric algorithm, with an accuracy of 80%. The k-NN, SVM, and decision tree accuracies were 92%, 89%, and 91%, respectively. The decision tree was selected as the nonparametric algorithm instead of k-NN, since the value k=3 in k-NN is low and could be sensitive to noise and outliers. For future work, deep learning and clustering could also be used to analyze and classify loans, and this work could be developed into an application for smartphones.

Ghazwan Abdul Nabi Al Ali
received the B.S. degree in Computer Science from Iraq, University of Basra, and the M.S. degree in Computer Science from The University of Science Malaysia. He is currently working as a programmer at the University of Basra. His research interests include software engineering and deep learning. He can be contacted at ghazwan.alali@uobasrah.edu.iq.

Hussain A. Younis
received the bachelor's degree from the University of Basrah, Iraq, and the master's degree from Shiats University, India. He is currently pursuing a Ph.D. degree with the School of Computer Sciences, Universiti Sains Malaysia (USM). He is also a lecturer with the College of Education for Women, University of Basrah. His research interests include artificial intelligence, electronic education, robots, image processing, pattern recognition, QR code, biometrics, and intelligent information systems. He can be contacted at hussain.younis@uobasrah.edu.iq.