Data visualization and toss related analysis of IPL teams and batsmen performances

ABSTRACT


INTRODUCTION
Sports analytics and data visualization have given player selectors, managers and the players themselves a stronger platform for improving on-field performance. Analysis, the next piece of the framework for decision makers, is the process of applying statistical tools and algorithms to data to gain insight into what is likely to happen in the future. Sports analytics [1] is being applied in various sports such as soccer, basketball and cricket. Each movement of the ball, the player's strike rate, the run rate and more are captured using special camera systems and other recording mechanisms. This data is run through various statistical algorithms, tools and visualization techniques to provide deeper insight and pave the way for recommendations to the player or team. With the ease of obtaining and storing data, advanced analytics and machine learning techniques are applied to build predictive models for team sports such as cricket. There are three formats of cricket: Test matches, One-Day Internationals (ODI) and Twenty20 (T20). Test cricket is one of the highest-level formats and is played between two countries over five days, ODI is a limited-overs format, and T20 is one of the latest and most successful forms of cricket. The T20 format gave birth to the Indian Premier League (IPL), a professional league contested during April and May of every year [2]. It was initiated by the BCCI (Board of Control for Cricket in India) in 2008. This shorter version of cricket is one of the most successful in terms of fan engagement and business, and it is widely enjoyed.
The main objective of this league is to provide a platform for young and talented players. IPL works on a franchise system of hiring players. There are eight teams in IPL. Each team is a group of eleven players consisting of batsmen, bowlers and all-rounders. The tournament is played in different cities, and because of this it has a huge fan following with a lot of media interest and business involvement. IPL is a mixture of talent and opportunity, so player performance is the key factor. Other key factors include the type of pitch (flat pitches, and pitches that favor fast, spin or swing bowling) and whether it benefits the batsmen, the non-striker and the bowlers in holding a good partnership. All these natural parameters, together with the historical data of players, help the team management in the selection process. In IPL, as in any sport, team strength, special (key) players and the home crowd play an important role in the prediction of a match. Analytics is one of the most important factors in cricket history [3]. There will always be some sort of uncertainty attached to the average performance of a bowler or batsman. The last overs and the power plays are the turning points of matches, and selecting the right player for these crucial overs is not easy. Analytics can help in all these tough situations. It bridges the gap for team selectors, coaches and managers and gives them a clearer idea of player consistency, fast scoring and finishing ability. Analytics plays a crucial role on and off the field in managing risk better and identifying the probable winners. Data visualization is one of the major outcomes of sports analytics [4]. The visual form of data is more easily understood than numbers and text. This paper explores data visualization techniques and toss-related analysis, including plots of the data collected.

RELATED WORK
Sports analysis draws on a large body of specific data and statistics. Sports analytics is the present and future of the professional sports era. On-field and off-field analytics have gone beyond providing player and team analysis and predicting results. The authors in [5] used a factor analysis approach to study the performance of cricket players, and their findings say that batting capability dominates over bowling. The study also reveals that the performance of bowlers is a crucial and significant factor that may change the course of a match. Coaches and selectors can include good all-rounders to improve team results. The work in [6] compared cricketers' batting and bowling performances using graphical methods. Batsmen's and bowlers' records from the 2008 season were used for the analysis and interpretation of the graphs. Twelve bowlers who bowled at least 100 balls and took at least four wickets, and twelve batsmen who faced at least 100 balls and had at least four completed innings, were selected. Player performance in ODI is predicted using various machine learning techniques in [7].
Naïve Bayes, decision tree, multiclass SVM and random forest are used to generate the prediction models for batsmen's scores and bowlers' wickets for both teams. Of the four techniques, random forest gives the most accurate results in both scenarios. The authors in [8] discussed various key performance indicators to study the performance of IPL players from different countries. Cluster analysis was applied to the datasets of players from IPL season 2010. The study reveals that the players of England performed well as a group and that New Zealand players were the lowest performers. The factor analysis in [9], used with various statistical techniques, shows that batting capability dominates over bowling. A dataset of 85 batsmen and 85 bowlers from IPL season 2012 was considered. Various dimensions of bowling and batting were used: three dimensions were grouped into factor two (bowling) and five dimensions into factor one (batting). The variance explained by factor one is much higher than that of factor two, which clearly shows that batting capability dominates over bowling. The authors in [10] evaluated the performance of fast bowlers and spinners based on various factors and ranked it with the help of AHP and TOPSIS. Criteria such as economy rate, bowling average and bowling strike rate are used to rank the players.
The study reveals that Indian bowlers performed well and that the top 7 bowlers are Indians in all three seasons (2008, 2009 and 2010). The machine learning-based approach in [11] clustered the players according to their roles and, in order to rank player performance, formulated a novel index called the Deep Performance Index. Players from IPL season 2008 were taken up for the formulation of the performance ranking. 201 players were analyzed using their T20 and IPL career data. Players were clustered into different groups depending on their batting and bowling performances. The authors in [12] evaluated IPL teams and players with the help of correlation, association and classification rules. Naïve Bayesian classification is used to predict team results by considering the individual performances of players. Team performance at home and away grounds is also analyzed. Using the support and confidence values of the players, selectors get an idea of which players to filter out for the next season. The work in [13] discussed a prediction tool and machine learning algorithms that analyze the past performance of players, which helps team authorities select the right player. HBase, an open-source distributed tool, is presented to keep the data related to the matches and players of IPL seasons. The past performances of players are visualized with the HBase tool. Statistical analysis of players is performed based on different characteristics. Team performance is predicted from the statistics of the individual players. The authors in [14] analyzed the data of the Indian cricket team's ODI matches and applied association rules to the home or away game attribute, the toss, the batting order and the final match result.
The authors in [15] proposed a model that works in two steps. The first method predicts the score of the first innings on the basis of the current run rate, the number of wickets fallen, the venue of the match and the batting team. The second method predicts the outcome of the match using the same attributes as the first method along with the target given to the batting team. A dataset of ODI matches from 2002 to 2014 is used in these two methods, which are implemented with a linear regression classifier and Naïve Bayes. The authors in [16] predicted the performance of batsmen in IPL season 4 based on the performances of the players in the first three seasons. A multi-layer perceptron (MLP) neural network is used for prediction based on past performances. This prediction can help the management and selectors decide which batsmen they should bid for and which should not be considered at all. The authors in [17] predicted the result of a match by comparing the strengths of the two teams. They measured the performance of the individual players in each team and implemented algorithms to predict the performances of batsmen and bowlers from past and recent career data. The work in [18] analyzes the performances of bowlers. A measure called the Combined Bowling Rate, which combines three traditional bowling parameters (bowling average, strike rate and economy), is used for the experiment.
The authors in [19] formulated a statistical model to estimate the value of a player by considering different statistics of batsmen, bowlers and all-rounders. They tried to build a systematic, logical decision model to select better players at auction. A multi-objective evolutionary optimization method is used in [20] to optimize the batting and bowling strengths of a team and to find the team members. The performance of each player is also evaluated using the NSGA-II algorithm. The authors in [21] use several string similarity metrics, namely LevenshteinSim (LEVS), LeeSim (LEES), Jaccard Coefficient (JACC), Dice Coefficient (DICE) and Jaro-Winkler Distance (JWD), to compare and differentiate the performances of unknown performers from those of experts. They used the concepts of Learning Analytics, Game Analytics, Productive Analytics and Data Visualization to analyze Serious Game Analytics from user-generated data. The work in [22] is carried out on artificial and real-world datasets and covers different visualization techniques: uncertainty visualization, ensemble data visualization and multidimensional/multivariate data visualization. They concluded that differences in ensemble distribution are the most crucial factors for the proper analysis of a game.

ABOUT TOOLS AND METHODOLOGY
IPL is one of the biggest leagues in T20 cricket, with millions of fans all over the world. Around 696 matches took place from 2008 to 2018. A large volume of data is available, including ball-by-ball insights for each innings of each match, with the match location and all other necessary details. Spyder, the free Scientific Python Development Environment, has been used for data exploration and for the plotting functions used in visualization.
Spyder offers various popular scientific packages for deep inspection and exploration of data. The analysis and visualization were performed in Spyder with packages such as NumPy, Pandas, Matplotlib and Seaborn. These packages support both basic and modern visualization. In this work, Seaborn is used for the toss-related analysis and Matplotlib is used for player visualization.

DATA COLLECTION
Data has been collected from www.iplt20.com and www.cricsheet.org. It consists of ball-by-ball details for a total of 696 matches from 2008 to 2018. Ball-by-ball data gives in-depth detail of every ball bowled in a particular over. The ball could be a wide, a dead ball or a no ball, or the batsman could score a single, a double, a triple, a four or a six on that ball. The dataset consists of two CSV files. Matches.csv gives the match venue, location, season, contesting teams, toss winner and toss decision, match result, whether the win was by runs or by wickets, player of the match, details of all three umpires, the match winner and so on. Deliveries.csv is the ball-by-ball data and combines all the deliveries for all the matches from 2008 to 2018. It consists of attributes such as Match_id, bowling team, batting team, batsman, bowler, non-striker, no-ball runs, penalty runs, extra runs, over and total runs. Innings indicates whether the delivery belongs to the first or the second innings. Over gives the current over number. Ball gives the current ball number within the current over. Table 1 describes the ten attributes that were used for the visualization of batsmen performances and the toss-related analysis. Toss Decision, Toss Winner and Winner are the key attributes used for the toss-related analysis of the 696 matches from 2008 to 2018. Toss Winner is the team that won the toss, and Winner is the team that won the match.
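The two-file layout described above can be illustrated with a minimal sketch. The column names (`id`, `season`, `toss_winner`, `toss_decision`, `winner`, `match_id`, `batsman_runs` and so on) and the rows themselves are assumptions made for illustration; the actual headers of Matches.csv and Deliveries.csv may differ, and the in-memory strings below merely stand in for the files.

```python
import io
import pandas as pd

# Stand-in for Matches.csv (one row per match); headers and rows are hypothetical.
matches_csv = io.StringIO(
    "id,season,venue,team1,team2,toss_winner,toss_decision,winner\n"
    "1,2008,Ground X,Team A,Team B,Team A,field,Team A\n"
    "2,2008,Ground Y,Team C,Team D,Team D,bat,Team C\n"
)
# Stand-in for Deliveries.csv (one row per ball); headers and rows are hypothetical.
deliveries_csv = io.StringIO(
    "match_id,inning,over,ball,batsman,bowler,batsman_runs,total_runs\n"
    "1,1,1,1,P1,Q1,4,4\n"
    "1,1,1,2,P1,Q1,0,1\n"
    "2,1,1,1,P2,Q2,6,6\n"
)

matches = pd.read_csv(matches_csv)
deliveries = pd.read_csv(deliveries_csv)

# The three key attributes used for the toss-related analysis.
toss_view = matches[["toss_winner", "toss_decision", "winner"]]

# Attach the season to every delivery through the match identifier.
joined = deliveries.merge(
    matches[["id", "season"]], left_on="match_id", right_on="id", how="left"
)
```

With the real files, the two `io.StringIO` objects would be replaced by the paths to the downloaded CSV files.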

Pre-processing phase
In this phase, the matches and deliveries datasets were filtered and cleaned. This phase mainly deals with the standardization, transformation and correction of data. No major pre-processing was needed for the data collected, as most of it was already normalized.
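The paper does not list its cleaning steps, so the following is only a sketch of what light standardization of the matches table might look like: trimming whitespace, mapping alternative spellings of a team name to one label, normalizing the toss decision and dropping duplicated matches. The function `clean_matches`, the `aliases` mapping and the column names are hypothetical.

```python
import pandas as pd

# Columns that hold team names (assumed names).
TEAM_COLUMNS = ["team1", "team2", "toss_winner", "winner"]

def clean_matches(matches: pd.DataFrame, aliases: dict) -> pd.DataFrame:
    """Return a standardized copy of the matches table."""
    cleaned = matches.copy()
    for col in TEAM_COLUMNS:
        # Trim stray whitespace, then map alternative spellings to one name.
        cleaned[col] = cleaned[col].str.strip().replace(aliases)
    # Normalize the toss decision to lower case ("bat" / "field").
    cleaned["toss_decision"] = cleaned["toss_decision"].str.strip().str.lower()
    # Keep a single row per match identifier.
    cleaned = cleaned.drop_duplicates(subset="id")
    return cleaned.reset_index(drop=True)

# Made-up rows: untidy spellings, a duplicated match and a match with no winner.
raw = pd.DataFrame(
    {
        "id": [1, 2, 2],
        "team1": [" Team A", "Team A XI", "Team A XI"],
        "team2": ["Team B", "Team C", "Team C"],
        "toss_winner": ["Team A ", "Team C", "Team C"],
        "toss_decision": ["Bat", " field", " field"],
        "winner": ["Team A", None, None],
    }
)
cleaned = clean_matches(raw, aliases={"Team A XI": "Team A"})
```

Matches without a winner are kept here rather than dropped; whether to exclude them is a choice that depends on the analysis.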

Data visualization
The most important part of data visualization and predictive analysis is representing the data as charts and graphs. The collected data is visualized to give a clear understanding of the parameters of each season, team, all-rounder, batsman and bowler, which helps team selectors, captains and managers at the next auction. Different packages are used for the analysis and visualization of players and teams. NumPy is used for numerical computing on the datasets. Pandas is used for data processing and I/O on both CSV files. Matplotlib is used for the basic visualization of players. Seaborn is used for the modern visualization in the toss-related analysis as well as for team and player insights. Several new features are introduced, such as the total number of matches played by each team across all eleven seasons, Maximum Man of the Matches, Maximum Centuries Scored by Batsmen, Maximum Player of the Match Awards, Maximum Count of Toss Wins by Different Teams and the decision taken by each team after winning the toss. Table 2 and Table 3 list the top players by centuries scored and by Man of the Match titles won. CH Gayle, AB de Villiers and SK Raina are at the top of both lists. Analysis: Comparing Figure 1, Table 2 and Table 3 shows that Chris Gayle, one of the best T20 batsmen from the West Indies, has won the Player of the Match award 20 times, which is the best record in IPL so far. Table 2, Table 3, Figure 1, Figure 2 and Figure 3 give a clear picture of the star batsmen who will be the first preference for team selectors and management to bid on. Table 4 lists the batsmen with the best strike rate from 2008 to 2018 and Table 5 shows the top 10 players by runs scored.
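As an illustration of how batsman features such as total runs, centuries and strike rate could be derived from the ball-by-ball data, here is a minimal sketch. It assumes columns named `match_id`, `batsman`, `batsman_runs` and `wide_runs`, uses the usual definition of strike rate (runs per 100 balls faced, with wides not counted as balls faced) and treats a batsman's runs within one match as one innings. The function name `batsman_summary` and the sample rows are hypothetical, not the paper's code or data.

```python
import pandas as pd

def batsman_summary(deliveries: pd.DataFrame) -> pd.DataFrame:
    """Total runs, balls faced, centuries and strike rate per batsman."""
    # Wides are not counted as balls faced by the batsman.
    legal = deliveries[deliveries["wide_runs"] == 0]
    balls = legal.groupby("batsman").size().rename("balls")
    runs = deliveries.groupby("batsman")["batsman_runs"].sum().rename("runs")
    # Runs per batsman per match; 100 or more in a match counts as a century.
    per_innings = deliveries.groupby(["match_id", "batsman"])["batsman_runs"].sum()
    centuries = (per_innings >= 100).groupby(level="batsman").sum().rename("centuries")
    summary = pd.concat([runs, balls, centuries], axis=1).fillna(0)
    summary["strike_rate"] = 100 * summary["runs"] / summary["balls"]
    return summary.sort_values("runs", ascending=False)

# Made-up deliveries: batsman "A" hits 17 sixes and faces one wide in match 1,
# then scores two singles in match 2; batsman "B" faces four balls in match 1.
rows = (
    [{"match_id": 1, "batsman": "A", "batsman_runs": 6, "wide_runs": 0}] * 17
    + [{"match_id": 1, "batsman": "A", "batsman_runs": 0, "wide_runs": 1}]
    + [{"match_id": 2, "batsman": "A", "batsman_runs": 1, "wide_runs": 0}] * 2
    + [
        {"match_id": 1, "batsman": "B", "batsman_runs": r, "wide_runs": 0}
        for r in (0, 1, 2, 1)
    ]
)
summary = batsman_summary(pd.DataFrame(rows))
```

Taking the first rows of `summary`, or sorting by `strike_rate` or `centuries`, would give rankings of the kind reported in the tables; a minimum number of balls faced is usually required before strike rates are compared.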
Table 6 gives a clear idea of the toss-related analysis for all teams; Mumbai Indians and Kolkata Knight Riders top the list with the maximum count of toss wins. Table 6 and Figure 4 show that the Mumbai Indians and Kolkata Knight Riders captains have had a good run with the coin. In T20 games the toss plays a crucial role; sometimes the dew factor on the ground, or the moisture content in the first 10 hours, can change the game. The captain and other team members analyze the score before the start of the match. Different types of pitches play different roles for batsmen and bowlers. By winning the toss, the captain can judge whether batting first or fielding first gives the team an advantage on that particular pitch. Table 7 depicts the decision taken by each team after winning the toss. Chennai Super Kings chose to bat first more often than to field, because of key players like MS Dhoni and Suresh Raina, who analyze the pitches very well. Analysis: Figure 5 and Table 7 illustrate the true mentality of the IPL as well as of the T20 game. After winning the toss, teams prefer to field first so that they can plan their innings well while chasing. Different types of pitches are available: pitches that favor spin bowling, which are mostly found in the Indian subcontinent; flat pitches, which are batsman friendly; pitches that favor swing bowling; and pitches that favor fast bowling. So fielding first rather than batting can become an advantage. In the last three seasons (2016, 2017 and 2018) team strategies are quite similar. Teams analyze the pitch and the venue very well. From Table 7 and Figure 6, one point to note is that Chennai Super Kings is the only team that prefers to bat after winning the toss. Out of 147 matches, Chennai won the toss 77 times, deciding to bat first 45 times and to bowl 32 times. This may be because of the captaincy of MS Dhoni, who relies on his bowling and fielding unit.
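The toss tallies discussed above (toss wins per team, the decision taken after winning the toss, and the decision by season) are simple counts over the matches table. The sketch below shows one way to compute them, assuming columns named `season`, `toss_winner` and `toss_decision`; the rows are placeholders, not real IPL results.

```python
import pandas as pd

# Made-up matches used only to demonstrate the counting.
matches = pd.DataFrame(
    {
        "season": [2016, 2016, 2017, 2017, 2018, 2018],
        "toss_winner": ["Team A", "Team B", "Team A", "Team C", "Team A", "Team B"],
        "toss_decision": ["bat", "field", "bat", "field", "field", "field"],
    }
)

# Count of toss wins by team.
toss_wins = matches["toss_winner"].value_counts()

# Decision taken by each team after winning the toss (team-wise).
decision_by_team = pd.crosstab(matches["toss_winner"], matches["toss_decision"])

# Toss decision season-wise.
decision_by_season = pd.crosstab(matches["season"], matches["toss_decision"])
```

Such tables can be drawn as grouped bar charts; with Seaborn, for example, `seaborn.countplot` with `x="toss_winner"` and `hue="toss_decision"` plots the team-wise counts directly from the matches table.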
The Chennai team has a winning count of 90 matches and a loss count of 57. MS Dhoni and SK Raina won Man of the Match 14 times, and both are in the list of maximum centuries as well. So, on these figures, batting first has been the right decision for Chennai. Chennai Super Kings has won the IPL three times (2010, 2011 and 2018). Among the other teams, Royal Challengers Bangalore in particular played a total of 166 matches and also won the toss 77 times, of which they decided to field first 57 times and to bat 20 times. CH Gayle and V Kohli of Bangalore scored the most centuries, with counts of 8 and 5, and Bangalore has a winning count of 79 and a loss count of 87. So their toss decisions have not worked as well as Chennai's. As they also have star players like V Kohli and CH Gayle, they could shift toward batting first rather than fielding in upcoming matches or seasons.
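The win and loss counts and the link between the toss decision and the match result can be checked with a short sketch like the one below. It assumes columns named `team1`, `team2`, `toss_winner`, `toss_decision` and `winner`; the helper names `team_record` and `toss_win_rate` and the sample rows are hypothetical. Losses are taken as matches played minus matches won, so ties and no-results would be counted as losses unless they are filtered out first.

```python
import pandas as pd

def team_record(matches: pd.DataFrame, team: str) -> dict:
    """Matches played, won and lost (played minus won) for one team."""
    played = matches[(matches["team1"] == team) | (matches["team2"] == team)]
    won = int((played["winner"] == team).sum())
    return {"played": len(played), "won": won, "lost": len(played) - won}

def toss_win_rate(matches: pd.DataFrame) -> pd.Series:
    """Share of matches won by the toss winner, split by toss decision."""
    toss_winner_won = matches["toss_winner"] == matches["winner"]
    return toss_winner_won.groupby(matches["toss_decision"]).mean()

# Made-up matches used only to demonstrate the calculation.
matches = pd.DataFrame(
    {
        "team1": ["Team A", "Team A", "Team B", "Team B"],
        "team2": ["Team B", "Team C", "Team A", "Team C"],
        "toss_winner": ["Team A", "Team C", "Team A", "Team B"],
        "toss_decision": ["bat", "field", "bat", "field"],
        "winner": ["Team A", "Team C", "Team B", "Team B"],
    }
)

record_a = team_record(matches, "Team A")
rates = toss_win_rate(matches)
```

Restricting `matches` to the rows where a given team won the toss before calling `toss_win_rate` would give that team's own success rate for batting first versus fielding first.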

CONCLUSION
In this paper, the performance of cricket players (batsmen) and the toss-related analysis in IPL for seasons 2008-2018 have been visualized. Finding the hidden parameters, patterns and attributes that lead to the outcome of a cricket match helps team owners and selectors recognize better players. The salary of IPL cricket players is decided through the auction process. Thus, deciding which player to bid for, and at what cost, on the basis of past IPL performance is a matter of decision making for the franchise. Every selector needs young and dynamic players who can handle pressure calmly and take the team over the winning line. This paper highlights player performance, especially that of batsmen, and presents the analysis done for Maximum Man of the Matches, Maximum Centuries Scored by Batsmen, Top Batsmen, Batsmen with Top Strike Rate and Top 10 Players with Maximum Runs. Statistics of 696 matches have been used in this experiment, including for the toss-related analysis: Count of Toss Wins, Decision Taken by Each Team After Winning the Toss, Toss Decision Season Wise and Toss Decision Team Wise. Based on the above analysis, the Indian batsmen are very good and are a top choice for the selectors. SK Raina is considered the finest batsman: he is second in the list of batsmen with maximum runs and appears in the lists for maximum Man of the Match awards and maximum centuries scored. V Kohli is in first position for maximum runs and is also in the list for maximum centuries. The other Indian star batsmen, MS Dhoni (best captain, maximum runs and maximum Man of the Match awards), Rishabh Pant (second-best strike rate and maximum centuries), RG Sharma, S Dhawan, G Gambhir, YK Pathan and M Vijay, performed very well in the last five overs. Selectors have a clear reason to give preference to Indian players first, as they performed very well in the seasons from 2008 to 2018. We also presented the toss-related analysis, in which MS Dhoni stands out as the best captain: CSK won the toss 77 times and mostly elected to bat first. Their choice to bat first mostly results in a win. Most of the time captains elect to field first so that they can plan and perform well while chasing. RCB, KKR, MI and KXIP elected to field first most of the time, with counts of 57 and 49.
Selectors have a clear choice in selecting batsmen from Mumbai Indians and Kings XI Punjab, as these two teams handled pressure very well during all the seasons from 2008 to 2018. By considering all this visualization and toss-related analysis, team management can select the right players and the right teams at the time of auction. A good and strong cricket team, with the highest chance of winning, can be formed within a given budget.