Framework to predict NPA/Willful defaults in corporate loans: a big data approach

ABSTRACT


INTRODUCTION
A banking system that integrates advanced technology helps foster economic development. Banks perform two main functions. The first is mobilizing deposits, offering attractive interest rates to convert inert savings into active capital. The second is distributing these deposits as loans to corporates so that they can grow, which directly supports economic development. Availing a loan has become an easy process in India with credit and cheque settlements. Banks as well as Non-Banking Finance Companies offer different types of loans according to the requirements of corporates [1,2]. These requirements include purchase of inventory, payment of long-unpaid bills, building of infrastructure, purchase of equipment, loan repayments and so on [3]. Based on the requirements, loans are broadly classified as Personal Loan, Credit Card Loan, Home Loan, Vehicle Loan, Education Loan, Loan against Insurance Schemes/FD/Mutual Funds, and Business Loan to Corporates. Several kinds of business loan are available: Working capital loan for day-to-day activities, Real Estate loan to buy a property for production, Venture loan to start up a business, Line of credit loan for periodic financial assistance, Equipment loan to buy required assets, Term loan to acquire long-term fixed assets, Loan against property to support the business by providing security, Cash Credit facility as an overdraft against the security of stock by pledging current assets, and Letter of credit, with which the bank guarantees that the seller will receive payment on certain conditions. This paper focuses on the process of managing corporate loans, as recovery of these loans is a tedious task and heavily affects the economy of the country. A healthy banking system represents a healthy national economy. However, there are hindrances to achieving it.
Not all corporate loans end up as assets for the bank. A loan has one of two outcomes: it becomes a performing asset or a Non-Performing Asset (NPA). NPAs are loans that erode the capital of banks and are not easily recoverable. This is the most difficult challenge for the banking sector, as it lowers performance by reducing profits. It has become a major problem for all public sector and private sector banks. According to the RBI report 2016, the total gross NPA amount was 6 lakh crores. By 2017 there was an increase of 1 lakh crore [7.31 Cr]. Losses are over four times the profits, indicating the power of NPAs to trap the economy of the country in a vicious debt cycle.
Banks provide large loans to corporates in order to achieve higher profits [4]. Some companies start behaving as defaulters by showing losses in their finances. Some deliberately do not repay even when they have sufficient financial resources to pay. Companies propose to repay these loans by taking further loans from multiple banks. Unrecoverable loans push a company into bankruptcy. Because of willful default behaviour, many genuine companies do not get economic support when they need it and may end up failing. Such companies may then be unable to repay their existing loans and acquire the defaulter tag. When an individual or business enterprise declines to fulfill payment commitments to financial institutions even though it has sufficient capacity for repayment, such a borrower unit is considered a willful defaulter. In this way, existing bad loans generate more bad loans. To prevent the economy from failing, banks decide to lend money to save companies from going bankrupt. Borrowers take advantage of this situation, which leads to more willful defaults and brings the economy back to the same state.
A rising trend in NPAs has been observed, especially in public sector banks. There are several causes for this, and there is strong evidence of a relation between fraud and NPA [5]. RBI data obtained through an RTI request indicate that 8670 loan fraud cases amounting to Rs. 612.6 billion were recorded over the last five financial years. These frauds refer to cases where the borrower deliberately tries to deceive the bank and does not repay the loan amount [6].
NPA is a major issue for all commercial banks, since banks treat loan advances as revenue-generating assets. The quality of these assets needs to be maintained to improve the profitability of financial institutions and ultimately the financial climate of the economy as a whole. In this regard, early detection of NPAs would bring great relief for this major problem faced by all financial institutions. Analysis of the scenario shows that NPAs result not just from poor economic conditions; deliberate defaults have also contributed to the huge pile-up of NPAs. Hence, more emphasis on identifying willful default is the need of the hour. The objective of this paper is to build a data model for early detection of willful default. Classifying willful default involves considering various categories of parameters, such as financial, personal and social. The data that capture these parameters may be structured or unstructured and need continuous analysis. The data also need to be captured from various heterogeneous sources. Hence, Big Data technology is used to design a data model in which the size, variety and complexity of the data can be handled effectively [7]. Various classification algorithms are implemented using the big data approach to evaluate the classification data model for prediction of the early stages of NPAs.
The rest of the paper is organized as follows. Section 2 discusses the background of the work with respect to the national and international scenario and the literature review. Section 3 describes the parameterization process and data model for NPA/willful default identification. Section 4 explains the framework for NPA/willful default identification. The validation of the framework and the evaluation of the prediction models are explained in Section 5. Section 6 draws the conclusion.

BACKGROUND
Law agencies have set up several policies and schemes to deal with the falling economy of the country. An extensive literature survey has been carried out to study the NPA scenario in India as well as in various other countries [8][9][10]. The study also identifies various parameters which are useful for the NPA identification process.

International scenario
China recently had bad debt of 250 million dollars. These are mainly loans directly related to real estate and used to develop infrastructure. The state of the NPA is due to political and social implications, legal impediments, bankruptcy laws and real estate. Italy also faced bad debt of 207 billion dollars due to real estate. Bad debts along with national debts caused a huge financial crisis for the country. However, as it is part of the Eurozone, it was saved by bailout funds provided to recapitalize banks.
Russia has an NPA ratio of up to 9.16% of total loans. This is because the country is mostly dependent on oil and gas exports. When global oil prices crashed, Russia's oil and gas industries collapsed. In turn, banks approved loans to rescue the economy. However, the sanctioned money never came back, leading to the financial crisis. In addition, economic sanctions imposed by America and European countries slowed economic growth.
Spain faced a debt/housing crisis of unpaid loans in 2008, but the Government fixed the issue quickly with remedial measures. As a result, bad debt decreased from 6.09% in 2016 to 5.7% in June 2017. Ireland is facing the NPA issue due to economic slowdown. It has set up the National Asset Management Agency for insolvency services to support real estate and housing debtors. These remedial measures have reduced the country's bad loan ratio from 27% in 2013 to 14.2% in 2016. This improvement can be attributed to successful debt restructuring programs.

Indian scenario
In the Indian financial sector during 2017, the most discussed topics were GST, demonetization and NPAs. NPAs have affected almost 10% of loans, amounting to around 9 lakh crore, with a negative effect on the Indian economy. RBI is the Indian banking institution which coordinates and regulates the activities of banks in the Indian economy.
The major challenges faced by RBI are as follows [3]:
a. NPA: As explained earlier, NPA is the indicator that identifies corporate loans which are not regularly serviced and which create a financial crisis for the bank as well as for the company. NPAs affect the financial growth of the bank and hence weaken the economic condition of the country. To tackle this, the Indian government has taken many initiatives: 1) temporary relief of several thousand crores, 2) special courts to deal with companies having bad loans, 3) reduced interest rates, and 4) merger of banks to reduce the burden of bad loans.
b. Bank Frauds and Cyber Threats: Bank frauds and cyber-attacks on financial transactions are illegal means of obtaining money or assets, especially from a bank. One way to obtain money from a bank is to take out a loan, which bankers are more than willing to encourage if they have good reason to believe that the money will be repaid in full with interest. A fraudulent loan, however, is one in which the borrower is a business entity controlled by a dishonest bank officer or an accomplice. The "borrower" then declares bankruptcy or vanishes and the money is gone. The borrower may even be a non-existent entity and the loan merely an artifice to conceal a theft of a large sum of money from the bank. This can also be seen as a component within mortgage fraud. Today's robbers operate over the internet using targeted and sophisticated cybercrime tactics. Example attacks include phishing, Carbanak malware, SQL injection, and attacks on bank databases, credit cards and online financial transactions. IT teams at banks have increased protection of customer data and limited credit card fraud, but the security of most banks' internal systems still needs to be improved.
c. Increase in excess liquidity: An increase in the penalty rate will increase interest rates and the excess reserves held by banks. Therefore, the total liquidity in the economy will increase rapidly without involving a policy rate reduction mechanism (loose monetary policy), just when liquidity should be restricted. The reason behind the increase in excess liquidity in banks is an economy that is in a liquidity trap. A liquidity trap is a condition where the return from bank lending is too small to cover the intermediation cost, so banks get a higher yield from holding reserves than from giving loans. In this condition, expansionary monetary policy only increases excess reserves. Because of the increase in liquidity, financial crises in banks are increasing, which weakens the domestic currency with respect to international currencies.

Literature review
Charan and Brar [11] have presented a study on stressed assets in India. They identify a number of factors that lead to this situation, grouped into broad categories such as stress from the global slowdown, governance-related issues, political factors, and mala fide intentions and misconduct. They also emphasize the need for extensive research into the factors that cause deteriorating asset quality in public sector banks.
In the study on frauds in the Indian banking industry by Charan et al. [5], an interview-based approach is used to identify the reasons for frauds in the banking sector. The main factors they mention are lack of supervision from the management, lack of incentive mechanisms for employees, non-cooperative staff, corporate borrowers and third-party agencies, etc. A very important observation is the absence of a strong regulatory system and of tools and techniques to detect early warning signals.
Bardan and Mukhrjee [7] deal with willful default and its implications for the profitability and loan decision-making process of banks. They examine cases where the borrower defaults willfully by under-reporting its cash flow. Their analysis indicates that the regulator needs to choose a lower loan capacity to contain NPA levels at the bank due to willful default. However, this exerts downward pressure on the profit level of the bank. Hence there is a trade-off between a greater incidence of willful default and higher bank profit. They also emphasize that willful default is increasing because of weak monitoring and supervision systems and poor bankruptcy laws in developing countries like India. All these give the borrower an opportunity to default on the loan willfully.
The research papers [12][13][14][15][16] show that the risks banks face and default behavior were challenges even two decades ago; however, with the new technologies at hand these challenges have become more difficult to address. Based on the literature study and the national and international scenarios, the aim of this work is to define a standard process to identify the parameters which can be used for early detection of fraudulent behavior and thereby for early identification of NPAs/willful default.
The objectives of the proposed approach to identify NPAs/Willful default are as follows:
a. Understand the loan process
b. Identify data parameters for early identification of willful defaulters or NPAs
c. Identify suitable technology and develop a model and algorithms for willful default identification

PARAMETERIZATION PROCESS FOR NPA/WILLFUL DEFAULT IDENTIFICATION
The loan process, from the loan request through sanctioning to completion, is shown in Figure 1. The most important block of the loan process is monitoring the financial health of the corporate/customer in order to detect fraud or willful behavior. Monitoring financial health needs various parameters which are closely associated with this purpose. Hence a standard process is required to identify the parameters that are useful in defining the data model for early prediction of frauds, willful defaults and subsequently NPAs. Parameterization is the process of identifying the essential and critical parameters for carrying out a particular analytical task and arriving at a valuable outcome. The process defined for willful default identification in the loan scenario is shown in Figure 2. It starts with identifying sources that help to understand the various terminologies of the loan and the causes of NPA and thereby of willful default. The parameterization process considers parameters from different sources such as RBI documents, literature, case analysis reports, brainstorming sessions, bank documents and so on. These parameters are numerous and unstructured and hence need to be classified into broad categories, so that specific parameters can then be captured for each category along with their ranges. The process is dynamic in nature, covers parameters related to fraud, and identifies changes in ranges per category.

Figure 2. Parameterization process
According to RBI circular RBI/2015-16/100 [17], a willful default is considered to occur in any of the following four cases:
a. The borrower unit defaults on its repayment obligations to the financial institutions even when it has the capacity to honor those obligations. There is a deliberate intention of not repaying the loan.
b. The funds are not utilized for the specific purpose for which finance was availed but have been diverted to other purposes.
c. The funds have been siphoned off and not utilized for the purpose for which they were availed. Further, no assets are available which justify the usage of funds.
d. Assets bought with the lender's funds have been sold off without the knowledge of the lender.
Also, where group companies of willfully defaulting units have furnished a letter of comfort or guarantees and do not honor these obligations when the lender invokes them, such group companies are also considered willful defaulters.
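The four cases listed above can be read as a small rule set. The following is a minimal sketch, assuming a supervisor has already established each fact as a yes/no finding; the class `BorrowerFacts`, its field names and the function names are hypothetical and are not taken from the RBI circular, whose wording is more detailed than these flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BorrowerFacts:
    """Hypothetical yes/no findings recorded for one borrower unit."""
    in_default: bool = False
    can_repay: bool = False                  # has capacity to honor obligations
    funds_diverted: bool = False             # used for a purpose other than intended
    funds_removed_unused: bool = False       # taken out, not used for the purpose availed
    assets_justify_fund_usage: bool = True   # assets exist that explain the usage
    financed_assets_sold_unknown_to_lender: bool = False


def willful_default_grounds(facts: BorrowerFacts) -> List[str]:
    """Return the labels (a to d) of the cases in the text that apply."""
    grounds = []
    if facts.in_default and facts.can_repay:
        grounds.append("a")
    if facts.funds_diverted:
        grounds.append("b")
    if facts.funds_removed_unused and not facts.assets_justify_fund_usage:
        grounds.append("c")
    if facts.financed_assets_sold_unknown_to_lender:
        grounds.append("d")
    return grounds


def is_willful_default(facts: BorrowerFacts) -> bool:
    """Any single case suffices, following the 'any of the following' wording."""
    return bool(willful_default_grounds(facts))
```

For example, `willful_default_grounds(BorrowerFacts(in_default=True, can_repay=True))` yields only case a, while a borrower in default without capacity to repay matches none of the cases. The group-company guarantee clause is left out of the sketch because it concerns a different entity than the borrower unit.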
RBI suggests in its document on data standardization [3,6] that there is a data requirement for proper supervision. The data are broadly divided into two groups: 1) data submitted by banks and 2) data generated or compiled by the supervisor. Furthermore, data can have other characteristics which need to be considered. Table 1 shows these considerations suggested by RBI. Considering RBI's data standardization requirement and the increasing concern over loan frauds, the objective of the study is to define the set of parameters which help to detect willful default behavior and to build a data model.
To understand the usefulness of the required parameters, a brainstorming session was arranged with bank experts, company officials, loan supervisors and financial brokers. The discussion covered scenarios with respect to banks, companies taking loans, and other financial scenarios. This session was extremely useful in obtaining an initial broad set of parameters to begin the process. These are shown in Figure 3.
After the initial parameters were obtained from the brainstorming session, many further parameters were identified from case studies, the literature survey and discussion with the domain expert. Based on all these inputs and studies, the parameters identified as early indicators are grouped into six groups. The attributes under each group are defined as:
1. Financial
a. Financial leverage ratios: Leverage ratios indicate fixed expense obligations. Since fixed expenses are a period cost, they should be recovered in the period in which they are incurred. A worsening leverage ratio indicates that the company is not in a position to recover its fixed obligations.
1) Asset Coverage Ratio: It indicates the total backup of assets for each rupee of loan raised. If it is more than 1, the company can manage to repay its long-term loans with its existing assets.
2) Debt Equity Ratio: This indicates outsiders' contribution to capital compared to the owners' contribution. The ideal ratio is 1:1, but the standard is fixed based on the gestation period and sector.
d. Key personal outlook: the director of the company is responsible for the loan process undertaken for business/corporate requirements
e. Social behavior of the directors or management personnel, which can be obtained through social media posts
6. Bank: There are several parameters which banks maintain for each loan; some of them are listed below.
a. Purpose of the loan
b. Past loan status
c. Annual income
d. Grade of the loan
e. Credit score
f. Delinquencies
g. Delay in payments
Apart from these parameters, the Companies (Auditor's Report) Order (CARO) report can be considered the master document for analysing the parameters listed above. For instance, if the CARO report says the assets are not validated, then it is a negative indicator, and the asset ratios need not be considered even if they look good. The Companies Act, 2013 requires that the auditor's report of specified companies include a statement on the prescribed matters. These reporting requirements have been prescribed under the Companies (Auditor's Report) Order, 2016. The CARO report has information on Fixed Assets, Inventory, Loans given by the Company, Loans to directors and investment by the company, Deposits, Cost Records, Statutory Dues, Repayment of Loans, Utilization of IPO and further public offer, Reporting of Fraud, Approval of managerial remuneration, Nidhi Company, Related Party Transactions, Private Placement or Preferential Issues, Non-Cash Transactions, and Registration under the RBI Act, 1934. This information also needs to be considered as potential parameters for identification of NPAs. A pre-loan and post-loan performance analysis of the parameters mentioned above needs to be done to understand the pattern of performance. If post-loan performance has declined compared to pre-loan performance, it is a negative indicator. However, continuous monitoring of the loan is required for early detection of willful default behavior.
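The two leverage ratios and the pre-loan versus post-loan comparison described in this section can be sketched as follows. This is an illustration under simplifying assumptions: the asset coverage ratio is taken as total assets divided by total debt (practical definitions usually adjust for intangibles and current liabilities), a rising debt equity ratio is treated as a worsening, and all function and key names are hypothetical.

```python
from typing import Dict, List


def asset_coverage_ratio(total_assets: float, total_debt: float) -> float:
    """Assets backing each rupee of loan (simplified definition, an assumption)."""
    return total_assets / total_debt


def debt_equity_ratio(total_debt: float, owners_equity: float) -> float:
    """Outsiders' contribution to capital relative to the owners' contribution."""
    return total_debt / owners_equity


def asset_coverage_indicator(ratio: float, caro_assets_validated: bool) -> str:
    """Apply the CARO rule: unvalidated assets are negative whatever the ratio."""
    if not caro_assets_validated:
        return "negative"
    return "positive" if ratio > 1 else "negative"


def declined_parameters(pre: Dict[str, float], post: Dict[str, float],
                        higher_is_better: Dict[str, bool]) -> List[str]:
    """Names of parameters whose post-loan value is worse than the pre-loan value."""
    declined = []
    for name, before in pre.items():
        after = post[name]
        worse = after < before if higher_is_better[name] else after > before
        if worse:
            declined.append(name)
    return declined


# Illustrative figures only, not data from the study.
pre_loan = {"asset_coverage": asset_coverage_ratio(150.0, 100.0),
            "debt_equity": debt_equity_ratio(100.0, 100.0)}
post_loan = {"asset_coverage": asset_coverage_ratio(90.0, 100.0),
             "debt_equity": debt_equity_ratio(180.0, 100.0)}
direction = {"asset_coverage": True, "debt_equity": False}
negative_indicators = declined_parameters(pre_loan, post_loan, direction)
```

With these figures both ratios worsen after the loan, so both names are reported as negative indicators. In practice the direction of "better" for the debt equity ratio would depend on the sector standard mentioned in the text.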
Considering all the above broad groups, the parameterization process is carried out to identify effective parameters for the data model. For each parameter, a suitable data type and the range or value indicating a good loan are identified. The parameters, data types and values are shown in Table 2. The parameterization process followed here is unique and effective, as all aspects and scenarios related to the loan process have been taken into consideration while defining the final list of parameters. Hence, the process is highly feasible to implement with the help of Information and Communication Technologies (ICT). As the number of banks, companies and loans is increasing, capturing and analysing the data for all these parameters cannot be done using traditional ICT. The next section describes the novel big data based framework designed for the loan process and data analysis.
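A parameter entry of the kind described here (name, group, data type, and the range indicating a good loan) could be represented as below. This is only a sketch: the class `ParameterSpec`, the inclusive-bounds convention and the two example entries are assumptions for illustration, and the actual parameters and ranges are those of Table 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class ParameterSpec:
    name: str
    group: str                        # one of the six parameter groups
    dtype: type = float               # data type recorded for the parameter
    good_min: Optional[float] = None  # inclusive lower bound, None means unbounded
    good_max: Optional[float] = None  # inclusive upper bound, None means unbounded

    def indicates_good_loan(self, value: float) -> bool:
        """True when the value lies inside the range that indicates a good loan."""
        if self.good_min is not None and value < self.good_min:
            return False
        if self.good_max is not None and value > self.good_max:
            return False
        return True


# Placeholder entries only; they are not copied from Table 2.
SPECS = [
    ParameterSpec("asset_coverage_ratio", "Financial", float, good_min=1.0),
    ParameterSpec("delinquencies", "Bank", int, good_max=0),
]


def out_of_range(specs: Sequence[ParameterSpec],
                 observed: Dict[str, float]) -> List[str]:
    """Names of observed parameters that fall outside their good-loan range."""
    return [spec.name for spec in specs
            if spec.name in observed
            and not spec.indicates_good_loan(observed[spec.name])]
```

Keeping the ranges in data rather than in code fits the dynamic nature of the process, since a range can be changed per category without touching the checking logic.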

FRAMEWORK FOR NPA/WILLFUL DEFAULT IDENTIFICATION
A novel framework for NPA/willful default identification is designed and is represented in Figure 5. The framework provides a technical solution for handling the complete loan process, starting from sanctioning through to early identification of willful default. All the parameters required for early detection of NPA/willful default are identified through the data parameterization process. These parameters need to be collected at the loan approval stage, and then continuous monitoring has to be carried out until the loan is completed. During monitoring, the pattern of loan payments, the transactions carried out, and behavioral and social traits are analyzed; if the pattern is not normal, it is identified as outlier behavior and hence a possible default case. This process is carried out longitudinally until the loan is fully paid or declared an NPA.
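The longitudinal monitoring step can be illustrated with a very simple outlier rule. The sketch below assumes one monitored signal, the number of days each instalment is paid late, and flags a month when it deviates strongly from the borrower's own earlier history. The z-score rule, the threshold of 3.0 and the minimum history length are illustrative assumptions; the paper does not specify which outlier method the framework uses.

```python
from statistics import mean, stdev
from typing import List, Sequence


def is_outlier(history: Sequence[float], new_value: float,
               z_threshold: float = 3.0, min_history: int = 3) -> bool:
    """Flag new_value if it is far from the borrower's own past behaviour."""
    if len(history) < min_history:
        return False                      # not enough data to judge yet
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_value != mu            # any change from a constant pattern
    return abs(new_value - mu) / sigma > z_threshold


def monitor(delays: Sequence[float], z_threshold: float = 3.0) -> List[int]:
    """Return the indices of instalments whose delay looks abnormal."""
    flagged = []
    for index, delay in enumerate(delays):
        if is_outlier(delays[:index], delay, z_threshold):
            flagged.append(index)
    return flagged


# Illustrative payment delays in days for seven consecutive instalments.
payment_delays = [0, 1, 0, 2, 1, 0, 30]
alerts = monitor(payment_delays)
```

Here only the last instalment, paid 30 days late after a history of near-punctual payments, is flagged. A production system would combine several signals (transactions, behavioral and social traits) rather than a single series.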
According to the E&Y survey [18], early warning signs to identify defaults must leverage technology and data analytics capabilities. Only technology can bring a revolutionary shift in NPA management in India. Automated data analysis solutions can provide early indicators that generate alerts before the situation becomes worse. Classification algorithms are required to build a prediction model for NPA/willful default. Hence prediction algorithms are implemented using machine learning on various structured and unstructured parameters. These machine learning prediction models are designed using MapReduce logic on the Hadoop big data platform [19][20][21]. The classification algorithms considered are Naive Bayes [22], Logistic Regression [23], Support Vector Machine [24], Neural Network [25] and Random Forest [26]. They are implemented using the MapReduce technique on a Hadoop cluster. The models are compared on the accuracy obtained, and the algorithm with the best accuracy is used for prediction.
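To show how a classifier fits the map and reduce pattern, the following sketch trains a categorical Naive Bayes model by counting. It runs the map, shuffle and reduce phases locally in plain Python; it does not use the Hadoop API and is not the implementation used in the paper. The record layout, function names and toy data are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[Sequence[str], str]   # (categorical feature values, class label)


def map_record(record: Record) -> Iterator[Tuple[tuple, int]]:
    """Map step: emit one count per class and per (class, feature, value)."""
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1


def shuffle(pairs: Iterable[Tuple[tuple, int]]) -> Dict[tuple, List[int]]:
    """Group emitted values by key, as happens between map and reduce."""
    grouped: Dict[tuple, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: tuple, values: List[int]) -> Tuple[tuple, int]:
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def train(records: Iterable[Record]) -> Dict[tuple, int]:
    """Run map, shuffle and reduce to obtain the Naive Bayes count table."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


def predict(counts: Dict[tuple, int], features: Sequence[str],
            labels: Sequence[str], values_per_feature: Sequence[int]) -> str:
    """Pick the label with the highest log-posterior, using add-one smoothing."""
    total = sum(counts.get(("class", label), 0) for label in labels)
    scores = {}
    for label in labels:
        n_label = counts.get(("class", label), 0)
        score = math.log((n_label + 1) / (total + len(labels)))
        for index, value in enumerate(features):
            hits = counts.get(("feature", label, index, value), 0)
            score += math.log((hits + 1) / (n_label + values_per_feature[index]))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy records: (payment behaviour, credit score band) -> outcome.
toy_records = [(("late", "low"), "default"), (("late", "low"), "default"),
               (("on_time", "high"), "paid"), (("on_time", "high"), "paid"),
               (("on_time", "low"), "paid")]
model = train(toy_records)
```

Because training reduces to summing independent counts, the map step can run on separate data partitions and the reduce step can merge them, which is what makes Naive Bayes a natural fit for this style of processing. Calling `predict(model, ("late", "low"), ["default", "paid"], [2, 2])` on the toy model selects the default class.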

EVALUATION OF PREDICTION MODELS
The models are evaluated on both structured and unstructured data. The structured data fields include Loan ID, Customer ID, Current Loan Amount, Term, Credit Score, Annual Income, Years in current job, Home Ownership, Purpose, Monthly Debt, Years of Credit History, Months since last delinquent, Number of Open Accounts, Number of Credit Problems, Current Credit Balance, Maximum Open Credit, Bankruptcies, Tax Liens, etc. The dataset comprises around two lakh rows. The unstructured data consist of synthesized social media data. Sentiment analysis using Apache Hive is performed on this data to obtain a social outlook value [27,28]. A positive value indicates a positive lifestyle. Payment data are also used to derive spending pattern values. These values are added to the dataset. Spending pattern and social outlook are parameters from Table 2 and are synthesized for the purpose of validation. The aim of the model is loan default prediction. For this purpose, prediction models are built using machine learning on Hadoop and Spark. Multiple machine learning models are implemented and evaluated on the accuracy obtained. The algorithms considered for building the prediction models are Logistic Regression, Neural Network, Random Forest and Naive Bayes. The results are shown in Figure 6. As depicted in the figure, the Neural Network has the highest accuracy and hence is used for prediction of NPA and thereby of willful default.
Figure 6. Evaluation of classification algorithms for prediction
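The selection rule used in this evaluation, choosing the model with the highest accuracy, can be sketched as follows. The labels, predictions and model names below are made-up placeholders and do not reproduce the results in Figure 6; the helper names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of held-out loans whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


def select_best_model(y_true: Sequence[int],
                      predictions: Dict[str, Sequence[int]]
                      ) -> Tuple[str, Dict[str, float]]:
    """Score every model on the same labels and return the most accurate one."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder data: 1 = default, 0 = repaid.
held_out_labels = [1, 0, 1, 1, 0]
model_outputs = {
    "model_a": [1, 0, 0, 1, 0],
    "model_b": [1, 0, 1, 1, 0],
    "model_c": [0, 0, 0, 1, 1],
}
best_name, all_scores = select_best_model(held_out_labels, model_outputs)
```

All models are scored on the same held-out labels so that the accuracies are comparable. On this placeholder data `model_b` is selected.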

CONCLUSION
Banking is the major service sector that balances the economy of the country. Loans that go bad intentionally not only affect the bank's profitability but also cause a setback for the economy of the country as a whole. Technological assessment and support for early identification of such willful default is the need of the hour. It is imperative that the customer's entire profile, including behavioral, financial and social parameters, be considered and monitored. In this paper, a process for identifying the critical parameters for early identification of willful default is designed. This parameterization process needs to be integrated into the loan process. Hence a novel framework is designed which covers the loan from sanctioning to completion. The framework is built using big data technology, as it needs to deal with both structured and unstructured parameters. To choose the best prediction model for the framework, an experiment is conducted on a structured loan dataset together with generated synthetic unstructured data. Various classification models are built using MapReduce and compared on accuracy. The results show that the neural network has the best performance, and hence it is implemented in the framework. The results also indicate that unstructured components play a major role in identifying willful default.