The role of Louvain-coloring clustering in the detection of fraud transactions

ABSTRACT


INTRODUCTION
Clustering is one of the data mining techniques based on unsupervised learning [1], [2]. Unsupervised learning is learning without labeled information, so that new knowledge is derived from large data sources [3], [4]. Clustering aims to group data that share the same characteristics and to separate them from groups with other characteristics [5], [6]. Clustering has been applied in various studies. For example, [7] clustered managerial data, which combine a large volume of data with the expertise of decision makers within a company, so that the results can serve as a reference when changing leadership. In addition, [8] categorized behavior on bitcoin. Job operations require different amounts of time, and a series of operations on a job cannot be processed simultaneously because the operations must be carried out sequentially. The problem is to choose the best sequence of work activities and to determine which machine performs each task so as to minimize machine idle and standby time [9].
This clustering technique can be applied in various fields, and its use is currently growing in the business sector [10]. In business, various problems, including those involving big data, can be solved with it [11]. Many types of businesses also need data mining techniques to gain new knowledge [12]. Banking is one example: in its financial transaction processes, one type of crime that often occurs is transaction fraud, including crimes of the fabrication type [13], [14].

Fraud in banking transactions causes considerable harm to various parties [15]. For example, [16] performed fraud detection using a machine learning approach and built a related-party transactions (RPTs) knowledge graph to measure performance and verify fraud in transactions. In addition, [17] grouped fraudulent transactions in a data mining process with the aim of increasing company profits and obtaining optimal results, so that management can improve the quality of the company.
Given these problems, and to contribute to the field of computer science, an appropriate algorithm must be chosen to detect fraud [18]. Recorded transactions form large data sets that can be used as a source of new knowledge [19]. An optimal algorithm is therefore needed to detect fraudulent transactions. For example, [20] created an anti-fraud system in banking that monitors transactions for fraud, where the system obtains its knowledge from a clustering approach to data mining.
In business processes, the Louvain algorithm is often used for clustering [18]. The Louvain algorithm is a graph-based community clustering algorithm. Transaction data are therefore represented as a graph, which facilitates the detection of fraud as a community [21]. The Louvain algorithm was also applied in [22] to cluster local energy market travel routes in the US, where the optimization model of the Louvain algorithm is based on modularity with a combination of graph-intrinsic parameters and has an accuracy of 0.05 to 0.15. In this study, the Louvain algorithm is optimized with a coloring function so that decision makers can quickly obtain labels for the fraud transaction community; the result is the Louvain-coloring algorithm approach.

MATERIAL AND METHOD

Dataset
The dataset in this research consists of community banking transaction data comprising 33,491 non-fraud transactions and 241 fraudulent transactions. The Louvain algorithm is used because it can perform clustering to detect communities. One shortcoming of the Louvain algorithm on large datasets, however, is its relatively long processing time. In terms of nodes and data proximity, it nevertheless makes it possible to check whether any further data lie close to the fraudulent transactions.
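As an illustration of how transaction records of this kind can be turned into a graph for community detection, the following minimal Python sketch builds an undirected weighted adjacency structure and computes the class balance of the dataset. The record layout (sender, receiver, amount), the toy records, and the function name `build_graph` are assumptions made for illustration; the text does not specify the fields of the dataset.

```python
from collections import defaultdict

def build_graph(transactions):
    """Build an undirected weighted graph from (sender, receiver, amount) tuples.

    The tuple layout is hypothetical. Repeated transactions between the same
    pair of accounts are summed into a single edge weight.
    """
    adj = defaultdict(dict)
    for sender, receiver, amount in transactions:
        if sender == receiver:
            continue  # self-transfers are ignored in this sketch
        adj[sender][receiver] = adj[sender].get(receiver, 0.0) + amount
        adj[receiver][sender] = adj[receiver].get(sender, 0.0) + amount
    return dict(adj)

# Toy records, not taken from the dataset.
toy = [("u1", "u2", 50.0), ("u2", "u1", 25.0), ("u2", "u3", 10.0)]
graph = build_graph(toy)

# Class balance reported in the text: 33,491 non-fraud and 241 fraud records.
NON_FRAUD, FRAUD = 33_491, 241
fraud_share = FRAUD / (NON_FRAUD + FRAUD)
print(graph["u1"]["u2"], f"{fraud_share:.2%}")
```

With the reported counts, fraudulent transactions make up about 0.71% of the records, so the fraud class is strongly under-represented.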

Louvain-coloring algorithm in clustering
The Louvain algorithm is an unsupervised learning algorithm that can be used to group fraudulent data in banking transactions [23]. It consists of modularity optimization and community aggregation, and it maximizes the modularity defined in (1) [24].
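$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \tag{1}$$

where $A_{ij}$ is the weight of the edge between nodes $i$ and $j$ (the corresponding entry of the weighted adjacency matrix), $k_i = \sum_j A_{ij}$ is the weighted degree of node $i$, $c_i$ is the community to which node $i$ is assigned, the function $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise, and $m = \frac{1}{2}\sum_{i,j} A_{ij}$ is the total edge weight of the graph. Louvain first visits all nodes of the network in random order, following the modularity optimization method [21]. Then, one by one, it removes each node from its community and inserts it into another community $C$, until no appreciable increase in modularity (controlled by an input parameter) is obtained. The gain from moving an isolated node $i$ into $C$ is given in (2):

$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right] \tag{2}$$

where $\Sigma_{in} = \sum_{j,l \in C} A_{jl}$ is the sum of the weights of the links inside $C$ (each link counted in both directions, as in (1)), $\Sigma_{tot} = \sum_{j \in C} k_j$ is the sum of the weighted degrees of the nodes in $C$, and $k_{i,in}$ is the sum of the weights of the links between node $i$ and the nodes in $C$. Expanding the squares, (2) simplifies to (3) [25]:

$$\Delta Q = \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}}{m}\cdot\frac{k_i}{2m} \tag{3}$$

While $k_{i,in}$ and $\Sigma_{tot}$ must be calculated for each candidate community, $k_i/(2m)$ is specific to the node being analyzed. In this way, the latter factor is recomputed only when a different node is considered during modularity optimization [26]. After the first stage is complete, all nodes belonging to the same community are merged into one giant node. The link between two giant nodes carries the sum of the weights of the links that previously connected the nodes of the two corresponding communities.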
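The relation between the modularity and the gain of a single move can be checked numerically. The following is a minimal Python sketch, assuming the standard definitions of modularity for an undirected weighted graph and of the gain obtained when an isolated node joins a community. The toy graph, the community labels, and the function names are hypothetical and are not taken from the study.

```python
def degrees(adj):
    """Weighted degree k_i = sum_j A_ij for every node."""
    return {i: sum(nbrs.values()) for i, nbrs in adj.items()}

def modularity(adj, community):
    """Modularity Q of a partition of an undirected weighted graph."""
    k = degrees(adj)
    two_m = sum(k.values())  # sum_ij A_ij equals 2m
    total = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                total += adj[i].get(j, 0.0) - k[i] * k[j] / two_m
    return total / two_m

def gain(adj, community, node, target):
    """Modularity gain when the isolated `node` joins community `target`."""
    k = degrees(adj)
    m = sum(k.values()) / 2.0
    members = [n for n, c in community.items() if c == target and n != node]
    k_i_in = sum(adj[node].get(n, 0.0) for n in members)  # links from node into target
    sigma_tot = sum(k[n] for n in members)                # total degree of target
    return k_i_in / m - (sigma_tot / m) * (k[node] / (2.0 * m))

# Toy graph: two triangles joined by the edge c-d, all weights 1.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

before = {"a": "A", "b": "A", "c": "c", "d": "B", "e": "B", "f": "B"}
after = dict(before, c="A")  # node c leaves its singleton and joins A

predicted = gain(adj, before, "c", "A")
observed = modularity(adj, after) - modularity(adj, before)
print(round(predicted, 6), round(observed, 6))
```

The two printed values agree, which shows why the local-moving phase can evaluate candidate moves from $k_{i,in}$, $\Sigma_{tot}$ and $k_i$ alone instead of recomputing the full modularity after every move.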

RESULTS AND DISCUSSION
Clustering is carried out on communities that commit transaction fraud, especially in banking. The dataset is based on public data consisting of 33,732 records, of which 33,491 are labeled as non-fraudulent and 241 as fraudulent transactions. To determine how accurate the knowledge obtained from the clustering technique is, the data are tested with the Louvain algorithm. After data exploration, it is checked whether transactions that are not labeled as fraudulent lie close to, or can be identified as, fraud.

Louvain algorithm to detect community
This test uses 6 scenarios to detect communities with the Louvain algorithm, using several combinations of the available parameter values, which are discussed in the test scenarios. The default parameter values are as follows. maxLevel (default=10) is the number of levels of the community hierarchy, that is, how many times the two phases are repeated on the new network. The higher the level, the larger the communities, which may not be exactly what is desired. Next is maxIteration (default=10): the first phase repeats iteratively until the increase in modularity is negligible or the specified number of iterations is reached. Tolerance (default=0.0001) is the threshold for that increase: the smaller the tolerance, the better the result, but more iterations may be required. Concurrency (default=4) is the number of threads that can run simultaneously.
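The interplay between the tolerance and the iteration limit can be sketched as follows. This is a minimal Python illustration of the stopping rule described above, not the implementation used in the tests; the class `LouvainParams`, its snake_case field names, and the function `first_phase_done` are hypothetical, and only the default values are taken from the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LouvainParams:
    """Hypothetical container for the four parameters and their defaults."""
    max_levels: int = 10       # maxLevel: levels of the community hierarchy
    max_iterations: int = 10   # maxIteration: cap on first-phase passes per level
    tolerance: float = 0.0001  # smallest modularity increase that still counts
    concurrency: int = 4       # number of threads

def first_phase_done(prev_q, new_q, iteration, params):
    """Return True when the first phase should stop at the current level."""
    negligible = (new_q - prev_q) <= params.tolerance
    return negligible or iteration >= params.max_iterations

defaults = LouvainParams()
strict = replace(defaults, tolerance=1e-9)  # a much tighter tolerance

# An increase of 0.00005 is negligible for the default tolerance only.
print(first_phase_done(0.98000, 0.98005, 3, defaults))
print(first_phase_done(0.98000, 0.98005, 3, strict))
```

With the default tolerance the phase stops after this small increase, whereas the tighter tolerance keeps iterating, which is why smaller tolerance values can cost more iterations.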

Scenario 1
Scenario 1 uses the default values for maxLevels, maxIteration, tolerance, and concurrency, and compares the test results of the Louvain algorithm and Louvain coloring. Louvain coloring's modularity value increased by 0.981199% compared to Louvain. Louvain coloring also has a better processing time, with a time reduction of 57.82%. The number of communities produced by the Louvain and Louvain coloring algorithms differs by only 0.07% (normal). The test results show a modularity value of 0.981199, communityID 181075, userCount 206, flaggedCount 7, and flaggedRatio 0.033891. By comparing all scenarios, scenario 5 is the best scenario. Based on the test results for scenario 1, the Louvain coloring algorithm is therefore better than the Louvain algorithm.

Scenario 2
Scenario 2 uses a maxLevels value of 10 and the default values for maxIteration, tolerance, and concurrency, and compares the test results of the Louvain algorithm and Louvain coloring. Louvain's modularity value is 0.981199% higher than that of Louvain coloring. The processing time of Louvain coloring is better than that of Louvain, with a time reduction of 23.53%. The number of communities produced by the Louvain and Louvain coloring algorithms differs by only 0.03% (normal). Based on the test results for scenario 2, the Louvain algorithm is therefore better than the Louvain coloring algorithm. Testing is carried out by varying the maxLevel parameter from 1 to 10, while the other parameters, namely maxIteration, tolerance, and concurrency, use the default values. Table 1 shows the test results for the maxLevel combinations: the result parameters modularity, communityCount, communityID, userCount, flaggedCount, and ratio take different values when maxLevel is changed, whereas the accuracy does not change. A maxLevel value of 10 gives the highest modularity of all tested values.

Scenario 3
Scenario 3 uses a maxIteration value of 700 and the default values for maxLevels, tolerance, and concurrency, and compares the test results of the Louvain algorithm and Louvain coloring. Louvain coloring's modularity value is 0.000128 higher than that of Louvain. The processing time of Louvain is better than that of Louvain coloring, with a time reduction of 80.23%. The number of communities produced by the Louvain coloring algorithm is better than that of Louvain. The result for scenario 3 is therefore that the Louvain coloring algorithm is better than the Louvain algorithm. The test is carried out by varying the maxIteration parameter, while the other parameters, namely maxLevel, tolerance, and concurrency, use the default values; the results are given in Table 2.

Scenario 4
Scenario 4 uses a tolerance value of 0.000000001 and the default values for maxLevels, maxIteration, and concurrency, and compares the test results of the Louvain algorithm and Louvain coloring. Louvain's modularity value is 0.0063% higher than that of Louvain coloring. The processing time of Louvain coloring is better than that of Louvain, with a time reduction of 17.45%. The number of communities produced by the Louvain and Louvain coloring algorithms differs by 0.05% (normal). The result for scenario 4 is therefore that the Louvain coloring algorithm is better than the Louvain algorithm. Testing is carried out by varying the tolerance parameter, while the other parameters, namely maxLevel, maxIteration, and concurrency, use the default values; the results are shown in Table 3. In Table 3, the tolerance values used are 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001, 0.00000001, 0.000000001, and 0.0000000001, with the highest test result at the value 0.000000001, which has a modularity value of 0.847597.

Scenario 5
Scenario 5 uses a concurrency value of 1 and the default values for maxLevels, maxIteration, and tolerance, and compares the test results of the Louvain and Louvain coloring algorithms. Louvain coloring's modularity value is 0.0024% higher than that of Louvain. The processing time of Louvain coloring is better than that of Louvain, with a time reduction of 55.96%. The number of communities produced by the Louvain and Louvain coloring algorithms differs by 0.12%. The result for scenario 5 is therefore that the Louvain coloring algorithm is better than the Louvain algorithm. Testing is carried out by varying the concurrency parameter, while the other parameters, namely maxLevel, maxIteration, and tolerance, use the default values; the results are shown in Table 4. In Table 4, the concurrency values used are 1, 2, 3, and 4, with the highest test result at a value of 1, which has a modularity value of 0.847234.

Louvain-coloring algorithm optimization to detect community
In this scenario, tests are run with parameter values for maxLevel, maxIterations, tolerance, and concurrency. In the test results, the value of Q_m is 0.981199, with communityID 181075, userCount 206, flaggedCount 7, and flaggedRatio 0.033891. Compared with all of the Louvain scenarios, this scenario can be said to be the best. The results are shown in Table 5.
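Per-community figures such as userCount, flaggedCount, and flaggedRatio can be derived from the community assignments. The following is a minimal Python sketch under the assumption that the flagged ratio of a community is its flagged count divided by its user count, which the text does not state explicitly. The function name, the user identifiers, and the toy assignments are hypothetical and do not reproduce the reported values.

```python
from collections import defaultdict

def flagged_ratio_by_community(assignments, flagged):
    """Return {community_id: (user_count, flagged_count, flagged_ratio)}.

    `assignments` maps each user to a community id, and `flagged` is the set
    of users known to be involved in fraud (both hypothetical structures).
    """
    counts = defaultdict(lambda: [0, 0])
    for user, community_id in assignments.items():
        counts[community_id][0] += 1
        if user in flagged:
            counts[community_id][1] += 1
    return {cid: (n, f, f / n) for cid, (n, f) in counts.items()}

# Toy community assignments, not taken from the dataset.
assignments = {"u1": 1, "u2": 1, "u3": 1, "u4": 1, "u5": 2, "u6": 2}
flagged = {"u2", "u5", "u6"}

stats = flagged_ratio_by_community(assignments, flagged)
# Communities with the highest share of flagged users come first.
ranked = sorted(stats.items(), key=lambda item: item[1][2], reverse=True)
print(ranked)
```

Ranking communities in this way puts those with the highest share of flagged users first, so the unflagged members of such communities can be reviewed as candidates that lie close to fraud.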
From the several scenarios that have been carried out, Table 5 shows the optimal results clearly. The results for the analyzed dataset show that some data in the good, non-fraudulent transaction community lie close to the fraud data. The resulting graph is shown in Figure 2. Figure 2 shows that, with the Louvain algorithm optimized as Louvain-coloring, 19 data points are close to the fraud community. The modularity value of the Louvain coloring algorithm is better than that of the Louvain algorithm. Of the 5 test scenarios carried out, the Louvain coloring algorithm excels in 4, namely scenarios 1, 3, 4, and 5, while the Louvain algorithm excels only in scenario 2. The modularity of the Louvain coloring algorithm ranges from a lowest value of 0.980969 to a highest value of 0.981207, whereas that of the Louvain algorithm ranges from 0.981037 to 0.981157. The highest modularity of the Louvain coloring algorithm is obtained in scenario 1, which uses the default parameter values, while the highest modularity of the Louvain algorithm is obtained in scenario 4, which uses a tolerance value of 0.000000001 and default values for the other parameters.
For the Louvain coloring algorithm, the shortest processing time is produced by scenario 4 (123 ms) and the longest by scenario 1 (394 ms). For the Louvain algorithm, the shortest processing time is produced by scenario 2 (136 ms) and the longest by scenario 1 (934 ms). In scenario 1, where both algorithms take longest, the time difference between them is substantial.
The Louvain coloring algorithm produces its smallest number of communities in scenario 2 (11,641) and its largest in scenario 4 (11,668). The Louvain algorithm likewise produces its smallest number of communities in scenario 2 (11,649) and its largest in scenario 4 (11,664). It is notable that both algorithms produce their smallest and largest numbers in the same scenarios. Although the numbers of communities differ, the difference is very small, between 4 and 14 communities (<1% of the communities). The researchers therefore consider that the number of communities can be excluded from the assessment of the algorithms. The Louvain-coloring algorithm is thus able to solve the clustering problem in transaction fraud by forming a new label, which facilitates the decision-making process.
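The time reduction for scenario 1 follows directly from the two processing times given above. The short Python sketch below shows the arithmetic; the function name `reduction_percent` is introduced here for illustration only.

```python
def reduction_percent(baseline_ms, improved_ms):
    """Time reduction of `improved_ms` relative to `baseline_ms`, in percent."""
    return 100.0 * (baseline_ms - improved_ms) / baseline_ms

# Scenario 1: Louvain takes 934 ms, Louvain coloring takes 394 ms.
scenario_1 = reduction_percent(baseline_ms=934, improved_ms=394)
print(f"{scenario_1:.2f}%")  # prints 57.82%
```

The result matches the 57.82% reduction reported for scenario 1.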
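The claim that the difference in the number of communities stays below 1% can be checked with the counts given above. The following Python sketch is illustrative only; the function name and the choice of the larger count as the denominator are assumptions, since the text does not say how the percentage was computed.

```python
def relative_difference_percent(count_a, count_b):
    """Absolute difference of two counts relative to the larger one, in percent."""
    return 100.0 * abs(count_a - count_b) / max(count_a, count_b)

# Smallest and largest community counts of both algorithms, as given in the text.
smallest = relative_difference_percent(11_641, 11_649)
largest = relative_difference_percent(11_668, 11_664)

# A difference of 14 communities, the upper end of the stated range,
# measured against the smallest reported count.
upper_bound = 100.0 * 14 / 11_641

print(smallest < 1.0, largest < 1.0, upper_bound < 1.0)
```

All three checks hold, which is consistent with treating the number of communities as a negligible factor in the comparison.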

CONCLUSION
This research focuses on clustering big data, namely fraudulent and non-fraudulent banking transactions. The data comprise 33,491 non-fraud transactions and 241 fraudulent transactions. To detect fraud, clustering is applied as a data mining technique. The results show that clustering with the Louvain algorithm is able to solve the clustering problem for fraudulent transactions. Based on the Louvain algorithm, the number of data points that approach the fraud data increases, and the accuracy is 90%. To make labeling easier, the Louvain algorithm is optimized by presenting the data as a colored graph, which makes detection easier. This provides a basis for future prediction or forecasting of the behavior of fraud perpetrators in the clustering process.
Int J Elec & Comp Eng ISSN: 2088-8708  The role of Louvain-coloring clustering in the detection of fraud transactions (Heru Mardiansyah)

Table 4. Scenario 5 results