Dalal 2024 Ijca 923744
player workload. By providing live match predictions, real-time data streaming and predictive models could improve the fan experience. Beyond the virtual world, this real-time interaction may have an impact on the betting market and improve fan interaction at sporting events.

Smartwatches and fitness trackers have already started to make their way into the world of cricket. Real-time insights into players' physical conditions and levels of fatigue during matches may be possible by integrating data from such devices into analytics. This knowledge could help coaches improve practice regimens and in-game tactics.

2. OVERVIEW
2.1 Literature Review
Kapadia et al.[1] use machine learning to predict cricket match results from IPL data. They use four algorithms (Random Forest, Naive Bayes, Model Trees, and KNN) with features selected by filter-based methods, and evaluate the models' precision, recall, and accuracy. They find that tree-based models outperform statistical and probabilistic models, but that none of the models perform well when a coin toss is involved. They also make recommendations for potential applications of machine learning in sport analytics.

Kampakis and Thomas[2] investigate the use of historical data from the English Twenty20 Cup to predict match results. The paper builds predictive models from more than 500 team and player metrics using a variety of feature selection techniques and four classification algorithms. The study concludes that, for forecasting cricket match results, tree-based models, particularly gradient boosted decision trees, outperform probabilistic and statistical models.

Mahajan et al.[3] use ML to forecast IPL match results, taking into account factors including home advantage, player data, and current form. The study uses supervised learning methods including Random Forest, Naive Bayes, KNN, and Gradient Boosted Decision Trees to estimate team strength and player performance. The study rates model precision and provides guidance for future machine learning applications in sports analytics.

Bandulasiri's study[4] used logistic regression to investigate the factors that influence ODI results. It considers factors such as home advantage, toss choices, strategy, match type, and field advantage, and uses the Duckworth-Lewis technique for games impacted by rain. The study evaluates the significance of these factors and examines the accuracy of Duckworth-Lewis against a predetermined criterion. The study reveals unexpected findings and offers views on cricket analytics.

Passi and Pandey's research[5] aims to forecast ODI cricket players' performances. The study uses supervised learning methods to forecast the number of runs batsmen will score and the number of wickets bowlers will take. Out of Naive Bayes, Random Forest, Multiclass SVM, and Decision Trees, the study finds that Random Forest is the most trustworthy classifier for both predictions.

Ahmed's paper[6] aimed to predict ODI cricket results using data mining. The study built a model on team rating, toss outcome, venue, weather, and prior consecutive wins. It used machine learning methods such as k-Nearest Neighbors, Random Forests, Decision Trees, Naive Bayes, ANN, and Logistic Regression to categorize Pakistan's match results, and introduced the variable "Consecutive wins before current match."

Sinha[7] used machine learning techniques to predict IPL match outcomes. Using attributes like city, team1, team2, toss winner, and winner, the study compared six algorithms (Decision Tree, Naive Bayes, Random Forest, ANN, Logistic Regression, KNN) to predict a team's win or loss. The research highlighted Random Forest's 88.46% accuracy and extended the analysis to sentiment from Twitter data, gauging public opinion about IPL teams and players.

Ahmed et al.'s study[8] examined cricket team unpredictability. Analyzing Pakistan's One-Day International (ODI) matches, it employed attributes like batting average, bowling average, strike rate, economy rate, and fielding average. Applying SVM, kNN, Decision Tree, and Random Forest, the study highlighted SVM's 82.5% accuracy. Batting average and strike rate emerged as key predictors for match outcomes.

Anik A. I. et al.[9] focus on predicting individual ODI cricket player performance. The study aimed to establish a model estimating player batting and bowling performance by considering factors like role, order, style, opposition, and venue. Employing Linear Regression, Decision Tree, Random Forest, and Artificial Neural Network, the research achieved peak accuracy with Random Forest (93.75% for batting, 95% for bowling). Practical applications like team selection, ranking, and match analysis were also explored.

According to Mustafa R. U. et al.[10], Twitter sentiments can be used to estimate cricket match results. Analyzing sentiments and emotions towards teams and players, the study employed Naive Bayes, Support Vector Machine, Random Forest, and Artificial Neural Network to categorize tweets. The research reported Artificial Neural Network achieving 86.67% accuracy for sentiment analysis and 83.33% for emotion analysis, highlighting the crowd's predictive potential in match outcomes.

In the paper by P. E. Allsopp and Stephen R. Clarke[11], a Bayesian network was developed to model cricket team performance. Using factors like batting, bowling, fielding, home advantage, and weather, the study assigned ratings from historical data. The Bayesian network achieved 74% accuracy in one-day matches and 68% in test matches. Sensitivity analysis further revealed influential factors in match outcomes.

Manoj Ishi et al.'s[12] study focuses on accurately categorizing cricket players into five performance groups for ODI matches. They attain accuracy rates of 97.14% for batters, 97.04% for bowlers, 97.28% for batting all-rounders, 97.29% for bowling all-rounders, and 92.63% for wicket keepers using a hybrid strategy of CS-PSO feature optimization and Support Vector Machine. By considering players' historical and current performance, the authors provide an efficient method for team prediction, contributing to the advancement of predictive analytics in cricket.

Indika Wickramasinghe's paper[13] comprehensively explores machine learning applications in cricket, offering insights into player classification, performance prediction, and decision-making. Using 177 players and ten performance indicators, the study classifies all-rounders in ODI cricket, predicting new talents via KNN, Naive Bayes, and Random Forest classifiers. Random Forest attains a noteworthy prediction accuracy, surpassing its counterparts. The work not only surveys existing applications but also outlines challenges and future research directions, making it a valuable resource for understanding the evolving role of machine learning in cricket analysis.
Apurva Lawate et al.'s[14] study introduces an ML-based methodology to forecast the projected score and determine the winner in IPL cricket. Using key features such as wickets taken and runs scored in the last 5 overs, overs bowled, overall score, and wickets at the current ball, the model employs Linear Regression to predict the first innings score, achieving a data explanation rate of approximately 75.226%. However, the absence of information regarding the accuracy of winner prediction leaves a gap in the assessment. The research underscores the possibilities of ML in cricket outcome estimation while highlighting areas for further exploration in enhancing prediction accuracy and winner determination.

Mazhar Javed Awan et al.[15] investigate cricket match analytics using a Big Data methodology, combining machine learning and big data analytics to forecast team scores and winners. Using linear regression models both in traditional machine learning and in Spark ML within a big data framework, the study achieves notable predictive outcomes. The results, with accuracy at 95% and measured through various metrics including RMSE, MSE, and MAE, underscore the efficacy of the Spark ML-based approach in enhancing prediction accuracy. The study advances the expanding field of sports analytics by demonstrating how big data and machine learning have the potential to transform the prediction of cricket match results.

Pallavi Tekade et al.'s study[16] explores cricket match outcome prediction using a variety of supervised machine learning methods. Focused on Indian Premier League (IPL) matches, the research examines key determinants of match results and applies various algorithms including Decision Trees, Bayes Network, Logistic Regression, Support Vector Machines, Linear Regression, and Random Forest. The paper stresses the importance of selecting a regression model that aligns with the data, leading to superior predictions. With a peak accuracy of 90%, the study advances the field of sports analytics by demonstrating how machine learning may be used to predict cricket match results.

Prasad Thorat et al.'s research[17] addresses cricket score prediction using machine learning techniques. Focused on forecasting the first innings' final score in cricket matches, the study employs the linear regression algorithm. The model draws upon factors such as runs scored in 5 overs and wickets taken. While limited in scope, the research contributes to sports analytics, showcasing the potential of ML in anticipating cricket match results. Further exploration and validation could refine this predictive approach for enhanced accuracy and broader application within the field.

Rushikesh Bhor et al.'s[18] work explores cricket match prediction through machine learning methodologies. Focusing on outcome prediction, the study incorporates diverse factors influencing match results, encompassing ground conditions and historical player and team performance records at specific venues. In order to get the best predictions, a regression model is suggested in the study, which also emphasizes the significance of the "master" element's key factor impact. Leveraging a range of techniques, including Naïve Bayes Classification, Euler's Strength Formula, and Ensemble techniques, the research demonstrates the authors' commitment to enhancing predictive accuracy. While absent in the paper, the incorporation of multiple machine learning algorithms underscores the study's commitment to robust predictions in cricket match outcomes.

Srikantaiah K C et al.'s[19] study presents a comprehensive approach to predicting IPL match outcomes using a variety of ML algorithms, including Random Forest Classifier, Logistic Regression, SVM, and KNN. The research integrates team composition, player batting and bowling averages, previous match success, and traditional factors like toss, venue, and day-night considerations. With an accuracy of 88.10%, the study underscores the effectiveness of the Random Forest algorithm in outperforming other techniques. By combining player-centric and contextual variables, the research offers valuable insights into enhancing the precision of match outcome predictions in the dynamic context of IPL cricket.

Aman Sahu et al.'s study[20] examines predictive cricket analysis through machine learning methodologies. Focused on outcome prediction, the research employs the Random Forest Classifier algorithm and uses label encoding for dataset preprocessing. Using Google Colab, the authors process data and offer recommendations. While the accuracy of the model isn't explicitly mentioned, the research contributes to the field by highlighting the possibilities of ML algorithms in forecasting cricket game outcomes, offering insights into the processing techniques employed for improved predictive performance.

Daniel Mago Vistro et al.[21] focus on pre-match IPL winner prediction via trained machine learning models. Employing algorithms like Naive Bayes, Logistic Regression, Random Forest, SVM, and Decision Tree on various datasets, the study showcases the potential of these techniques in forecasting cricket match outcomes. Although the specific features used for prediction aren't outlined in the abstract, their related work attests to a noteworthy 90% prediction accuracy. This research contributes to the evolving landscape of sports analytics, underlining the role of machine learning and data analytics in cricket outcome prediction.

Manoj Ishi et al.'s research[22] addresses victory prediction in ODI cricket via a comprehensive ensemble methodology. Encompassing 128 features, the study introduces three models based on team batting-bowling strength, run-scoring pattern, and overall team prowess. The research employs ensemble algorithms like voting and stacking classifiers to predict match outcomes. Incorporating feature selection techniques, the investigation evaluates models based on F1 score, precision, accuracy, and recall. Notably, Support Vector Machine and Logistic Regression yield optimal results, achieving a 96.30% accuracy in predicting ODI match winners. This study contributes to enhancing cricket outcome predictions through advanced machine learning techniques.

2.2 Research Gap
Researchers and experts are becoming interested in cricket, a popular sport. The growing importance of using machine learning for analytics and prognostication in the domain is highlighted by the numerous research papers that have presented various models and methodologies to forecast cricket match results. However, the common flaws in these studies highlight a number of promising directions for further research in this developing subject.

An exclusive emphasis on particular cricket leagues or formats limits these analyses, so their conclusions may be inapplicable in other contexts. Some studies focus on data from the Indian Premier League (IPL), while others isolate information from English county cricket. Different rules, tactics, and characteristics specific to several formats and leagues can have an impact on players' and teams' efficacy and consistency. As a result, it is crucial to create more thorough and reliable
models that take a variety of cricket leagues, formats, and influencing variables into consideration.

These studies are further limited by their omission of relevant factors that may affect cricket match results. Players' current form and fitness, match timings (day or night), weather conditions, pitch characteristics, toss decisions, etc. are not always taken into account. For prediction, some studies consider only historical match results, while others include only team and individual data. However, these unaccounted factors have the potential to significantly influence batting and bowling plans and, in turn, players' and teams' performances. Therefore, it is clear that more comprehensive models must be developed in order to account for these factors and their complex interactions.

Furthermore, a fundamental flaw in these studies is the lack of comparison or validation with other cricket match prediction models or procedures, such as betting odds, professional opinions, and simulation methodology. While some studies claim that random forest is superior to other algorithms or that tree-based models are superior to probabilistic and statistical ones, these claims lack support from comparisons with other models or techniques. In order to accurately evaluate and verify various models and methodologies for predicting cricket matches, a variety of indicators and benchmarks must be used for thorough evaluation.

A fourth restriction in these analyses is the exclusion of explanatory or interpretative insights into the selected variables and models, as well as their implications for cricket analytics and decision-making. This omission is notable. Intricate hierarchical characteristics may be used to predict match results in some cases, or player batting and bowling abilities may be modelled using performance data. The relevance or significance of these characteristics within the context of cricket analytics and tactical decisions, however, remains unexplained. As a result, it becomes necessary to explain and construct the features and models used to predict cricket matches by utilising statistical methodologies and domain expertise for thorough understanding.

3. METHODOLOGY
3.1 Data Extraction
To assemble a comprehensive dataset for our cricket match analytics and prediction model, we employed a two-pronged approach, extracting data from two distinct sources: iplt20.com and cricsheet.org.

Player Data Extraction (iplt20.com) – To gather detailed player statistics, we used web scraping techniques with the Selenium framework on iplt20.com. Selenium facilitated the automated retrieval of player data, allowing us to navigate through the website's structure and extract relevant information efficiently. With this method, we acquired essential player-specific details, including batting and bowling averages, strike rates, and other performance metrics (see Figure 1).

Figure 1: Sample Player Data

Match Data Extraction (cricsheet.org) – For comprehensive match data, we turned to cricsheet.org, a valuable repository of cricket match information. We extracted complete match data in JSON format from cricsheet.org, encompassing a wide array of details such as match outcomes, player performances, team statistics, and other relevant attributes (see Figure 2 and Figure 3). This data, extracted in a structured JSON format, served as the foundation for our historical match analysis, enabling us to uncover patterns and trends that contribute to match predictions.

3.2 Data Compiling
Player Data Compilation – The player data extracted from iplt20.com was organized on the website in a year-wise manner. To create a comprehensive dataset for each player, we systematically aggregated the player statistics from different years, ensuring that we captured the evolution of each player's performance over time. By consolidating the information into combined stats for each player, we generated a cohesive dataset that encapsulated their overall contributions across multiple IPL seasons.

Match Data Compilation – While cricsheet.org provided a wealth of match data, it also included extraneous information that was not pertinent to predicting match outcomes. To streamline the dataset and focus on the most crucial factors, we filtered the data: unnecessary columns were excluded, and only the attributes essential for our predictive model were retained.
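
As a rough illustration of the extraction and compilation steps in this section, the sketch below flattens one cricsheet-style JSON match record into a few attributes and merges year-wise player rows into combined figures. The JSON layout (`info`, `teams`, `toss`, `outcome`, `venue`), the sample values, and the helper names `flatten_match` and `combine_seasons` are assumptions made for illustration only; they are not the code or the exact attribute set used in this study.

```python
import json

# Hypothetical, trimmed match record in a cricsheet-like layout (assumed fields).
RAW_MATCH = """
{
  "info": {
    "teams": ["Team A", "Team B"],
    "venue": "Sample Stadium",
    "toss": {"winner": "Team B", "decision": "field"},
    "outcome": {"winner": "Team A", "by": {"runs": 12}},
    "officials": {"umpires": ["U1", "U2"]}
  },
  "innings": []
}
"""


def flatten_match(record):
    """Keep only the attributes assumed useful for outcome prediction."""
    info = record["info"]
    team1, team2 = info["teams"]
    toss = info.get("toss", {})
    return {
        "team1": team1,
        "team2": team2,
        "venue": info.get("venue"),
        "toss_winner": toss.get("winner"),
        "toss_decision": toss.get("decision"),
        # A match without a result has no "winner" key; None marks it.
        "winner": info.get("outcome", {}).get("winner"),
    }


def combine_seasons(season_rows):
    """Merge year-wise batting rows into one combined record per player."""
    totals = {}
    for row in season_rows:
        agg = totals.setdefault(
            row["player"], {"runs": 0, "balls": 0, "dismissals": 0}
        )
        for key in ("runs", "balls", "dismissals"):
            agg[key] += row[key]
    combined = {}
    for player, agg in totals.items():
        combined[player] = {
            **agg,
            # Batting average: runs per dismissal (undefined if never out).
            "average": agg["runs"] / agg["dismissals"] if agg["dismissals"] else None,
            # Strike rate: runs per 100 balls faced.
            "strike_rate": 100 * agg["runs"] / agg["balls"] if agg["balls"] else None,
        }
    return combined


match_row = flatten_match(json.loads(RAW_MATCH))
players = combine_seasons([
    {"player": "Player X", "year": 2022, "runs": 300, "balls": 200, "dismissals": 10},
    {"player": "Player X", "year": 2023, "runs": 500, "balls": 300, "dismissals": 10},
])
print(match_row)
print(players["Player X"])
```

In this sketch the combined average and strike rate are recomputed from summed totals rather than by averaging the per-season values, so seasons with few innings do not carry the same weight as full seasons.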
composite player rating that considered various facets of a
player's skill set.
Weight determination using Deep Learning – To optimize
the weights assigned to each feature in our player rating
formula, we employed a deep learning model. This model was
trained on historical data to learn the intricate relationships
between the selected features and match outcomes. The result
was a set of dynamically determined weights that reflected the
contextual importance of each feature, providing a more
nuanced and adaptive approach to player rating (see Figure 4).
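
The idea of learning feature weights from historical outcomes can be sketched as follows. The network architecture used in this study is not reproduced here; as a stand-in, the sketch trains a single sigmoid unit (the simplest possible network, not a deep model) by gradient descent and reuses its learned weights in a weighted-sum rating. The feature names, the toy data, and the functions `learn_weights` and `player_rating` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def learn_weights(rows, labels, epochs=500, lr=0.5, seed=0):
    """Fit one weight per feature by batch gradient descent on log loss."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error of the sigmoid unit for this example.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def player_rating(features, weights):
    """Weighted sum of (already normalised) player features."""
    return sum(w * f for w, f in zip(weights, features))


# Toy rows: [normalised batting average, normalised strike rate, noise feature].
# Label 1 means the player's side won. All values are invented for illustration.
X = [
    [0.9, 0.8, 0.3], [0.8, 0.7, 0.6], [0.7, 0.9, 0.5], [0.85, 0.6, 0.4],
    [0.2, 0.3, 0.4], [0.3, 0.1, 0.6], [0.1, 0.2, 0.5], [0.25, 0.35, 0.3],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = learn_weights(X, y)
strong = player_rating([0.9, 0.8, 0.3], weights)
weak = player_rating([0.2, 0.3, 0.4], weights)
print("learned weights:", [round(w, 2) for w in weights])
print("strong vs weak rating:", round(strong, 2), round(weak, 2))
```

A deeper model would replace the single unit with several layers; the principle of letting the training data, rather than hand-picked constants, set the relative importance of each feature stays the same.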
Logistic Regression: Provided interpretability and insights
into linear relationships between features and match outcomes.
Naive Bayes: Known for simplicity and efficiency, suitable for
datasets with numerous features.
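
To make the Naive Bayes entry above concrete, here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier evaluated on a tiny held-out set. The Gaussian variant, the two features (rating difference and recent win-rate difference), and all data values are assumptions for illustration; the text above does not state which variant or feature set was used.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    model = {}
    for cls, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            # Log of the Gaussian density; features are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Invented rows: [team1 rating minus team2 rating, win-rate difference].
train_X = [
    [12.0, 0.3], [8.0, 0.1], [15.0, 0.4], [5.0, 0.2],
    [-10.0, -0.2], [-6.0, -0.1], [-14.0, -0.3], [-4.0, -0.4],
]
train_y = ["team1"] * 4 + ["team2"] * 4
test_X = [[9.0, 0.2], [-7.0, -0.25]]
test_y = ["team1", "team2"]

model = fit_gaussian_nb(train_X, train_y)
predictions = [predict(model, x) for x in test_X]
test_accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, test_accuracy)
```

The independence assumption is what keeps the model cheap with many features: each feature contributes one mean and one variance per class, with no interaction terms to estimate.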
4. RESULTS
In evaluating the performance of our cricket match prediction
models, we employed several machine learning algorithms,
each yielding a different testing accuracy. The Random Forest
model achieved the highest accuracy, 89.82% (see Figure 6).
Further analysis using a confusion matrix for the Random Forest
model gives precision, recall, and F1 score values of 0.91, 0.88,
and 0.9, respectively (see Figure 7).
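
For reference, the sketch below shows how precision, recall, F1 score, and accuracy follow from the four cells of a binary confusion matrix. The counts are hypothetical and are not the values from our confusion matrix; the last lines only check that an F1 score computed from the reported precision (0.91) and recall (0.88) is consistent with the reported 0.9.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions for the positive class of a binary classifier."""
    precision = tp / (tp + fp)   # share of predicted wins that were wins
    recall = tp / (tp + fn)      # share of actual wins that were predicted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 45, 5, 10, 40
p, r, f1 = precision_recall_f1(tp, fp, fn)
acc = accuracy(tp, fp, fn, tn)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.3f}")

# Consistency check on the reported figures: F1 is the harmonic mean of
# precision and recall, so 0.91 and 0.88 give roughly 0.895.
reported_f1 = 2 * 0.91 * 0.88 / (0.91 + 0.88)
print(f"F1 implied by reported precision and recall: {reported_f1:.4f}")
```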
Figure 6: Testing Accuracy of All Models

By comparison, the SVM Classifier achieved a testing accuracy of 77.77%, showing a respectable predictive capability. Logistic Regression and Naive Bayes models demonstrated testing accuracies of 77.28% and 62.41%, respectively.

Figure 7: Performance Matrix of All Models

The Random Forest confusion matrix (see Fig. 8) provides additional insight into the model's performance, detailing true negatives, false positives, false negatives, and true positives. This detailed breakdown enables a nuanced understanding of the model's predictive strengths and areas for improvement.

5. CONCLUSION
In conclusion, our research has made significant strides in the realm of cricket match analytics and prediction through the implementation and evaluation of various machine learning models. Amongst the models assessed, Random Forest emerged as the standout performer, achieving a testing accuracy of 89.82%. The precision, recall, and F1 score values of 0.91, 0.88, and 0.9, respectively, underscore the model's robust predictive capabilities.

While other models demonstrated respectable accuracies, the comprehensive performance of Random Forest, as evidenced by the detailed confusion matrix, sets it apart. This precision-oriented approach not only enhances our understanding of predictive strengths but also highlights areas for potential refinement.

The practical implications of our work extend beyond academic pursuits, offering tangible benefits to cricket enthusiasts and professional teams alike. Our research provides a reliable tool for strategic decision-making, unlocking crucial insights into the factors influencing match outcomes. As we navigate the intersection of sports and technology, the success of the Random Forest model in our study serves as a foundation for future endeavors, paving the way for continued innovation in cricket match analytics.

6. FUTURE WORK
To refine our cricket match prediction model, future work will build upon the foundation laid by historical match analysis. One notable enhancement involves the formulation of a proprietary method for calculating player consistency. Our unique approach to capturing player performance nuances has proven instrumental in boosting the model's predictive accuracy.

Moving forward, we aim to deepen our understanding of player dynamics by expanding and refining the player-centric metrics. Incorporating additional factors such as individual strengths can contribute to a more comprehensive assessment of a player's impact on match outcomes.

Furthermore, our future research will explore advanced techniques, including ensemble methods and deep learning architectures. These methodologies hold the potential to unveil intricate patterns and relationships within the data,
thereby elevating the sophistication and accuracy of our predictive models.

Collaboration with cricket experts, statisticians, and data scientists remains integral to our future endeavors. By combining domain expertise with cutting-edge technology, we aspire to develop predictive models that not only excel in historical match analysis but also adapt dynamically to real-time scenarios. The ongoing refinement of our model and the exploration of novel features will ensure its relevance and effectiveness in the ever-evolving landscape of cricket analytics. Through these future initiatives, we anticipate providing cricket enthusiasts and professional teams with a predictive tool that continually pushes the boundaries of accuracy and insight in the realm of cricket match prediction.

[11] Allsopp PE, Clarke SR. Rating teams and analysing outcomes in one-day and test cricket. Journal of the Royal Statistical Society Series A: Statistics in Society. 2004 Nov;167(4):657-67.

[12] Ishi M, Patil J, Patil V. An efficient team prediction for one day international matches using a hybrid approach of CS-PSO and machine learning algorithms. Array. 2022 Jul 1;14:100144.

[13] Wickramasinghe I. Applications of machine learning in cricket: a systematic review. Machine Learning with Applications. 2022 Dec 15;10:100435.
IJCATM : www.ijcaonline.org