Entropy 23 00090 v3
Article
Who Will Score? A Machine Learning Approach to Supporting
Football Team Building and Transfers
Bartosz Ćwiklinski, Agata Giełczyk * and Michał Choraś
Faculty of Telecommunications, Computer Science and Electrical Engineering, UTP University of Science and
Technology, 85-796 Bydgoszcz, Poland; bartosz.cwiklinski@utp.edu.pl (B.Ć.); chorasm@utp.edu.pl (M.C.)
* Correspondence: agata.gielczyk@utp.edu.pl
Abstract: Background: machine learning (ML) techniques have been applied in numerous domains,
including health care, security, entertainment, and sports. In this article, we present how ML can
be used for building a professional football team and planning player transfers. Methods: in this
research, we defined numerous parameters for player assessment and three definitions of a
successful transfer. We used the Random Forest, Naive Bayes, and AdaBoost algorithms to predict the
success of player transfers. We used realistic, publicly available data to train and test the
classifiers. Results: in the article, we present numerous experiments that differ in the weights of
the parameters, the definition of a successful transfer, and other factors. We report promising
results (accuracy = 0.82, precision = 0.84, recall = 0.82, and F1-score = 0.83). Conclusion: the
presented research shows that machine learning can be helpful in professional football team
building. The proposed algorithm will be developed further, and it may be implemented as a
professional tool for football talent scouts.
rugby league players throughout the 2015 NRL season, including some data collected by
GPS (Global Positioning System). Similar approaches were presented in [13,14], in which the
authors proposed systems for predicting the risk of injury in football players.
The authors of [15] focused exclusively on football. They summarized the use of machine learning
in this area and highlighted that ever-growing amounts of data could bring about a revolution in
football analytics. As potential fields for implementing ML, they indicated tactical improvements,
discovering the factors that lead to goal scoring, identifying the opposing team’s strengths and
weaknesses before a match, and determining the areas in which a team needs to improve.
The article [16] presents a machine learning approach to ranking professional football players.
The main aim of that research is to provide a data-driven framework that offers a role-aware
evaluation of football players’ performance. Ranking estimation is also presented in [17], in which
a genetic algorithm was used in the voting part of the system.
As football has drawn increasing attention across countries and continents, more bookmakers are
offering football bets. Scientific approaches to beating the bookmakers were presented in [18,19].
It is also possible to combine machine learning with computer vision in the sport domain, as
presented in [20]. That article presents the recognition of basketball players’ movements and their
classification as a shot, a pass, a catch, or a dribble.
Machine learning can also be used to predict the future performance of athletes. This kind of
analysis can be especially beneficial for coaches of young, promising players, or for scouts looking
for new sport stars. Applications of machine learning to performance prediction were presented for
tennis [21], handball [22], and archery [23].
However, machine learning-based solutions can be useful not only for professional athletes, but
also for amateurs. Wearable devices have recently become increasingly popular. In [24], an
implementation of machine learning techniques in wearable devices was presented; with this kind of
solution, a person can constantly monitor their health. In [25], in turn, an ML-based framework for
generating training plans in coaching was proposed.
In sport jargon, a person working on transfers and statistics is called a scout. There are plenty of
professional tools supporting scouts in their duties. However, they either do not provide any
transfer suggestions (they can only present statistics on a potential player) or treat transfer
management as a black box. The first one is Scoutastic (https://scoutastic.com/en/). Its listed
functionalities include: team management (dynamic resource planning, task assignment, ticket
status, area match search, and scouting-activity analytics); match report analysis (extracting,
displaying, and filtering the relevant content from match reports); and data generation (automated
generation of relevant performance data and development indicators using AI analysis of videos from
training, test matches, and TV broadcasts). However, it does not support team building and
transfers.
Wyscout (https://wyscout.com/) is another platform dedicated to scouts. It presents itself as the
world’s leading provider of football performance data. It provides advanced statistics that can
help coaches analyse and prepare for matches, give scouts powerful tools to identify the most
promising profiles, and enable player agents to better understand their players’ strengths and
weaknesses. Even though this platform provides extensive, detailed data, it does not support
transfers.
Finally, SciSports (https://www.scisports.com/) is a tool for scouts with three main
functionalities: recruitment, performance analysis, and data delivery. Although it supports
recruitment, it again does not support team building; it only offers extensive filters and criteria
(such as age ranges, nationalities, and contract end dates).
Entropy 2021, 23, 90 4 of 12
Table 1. Players’ physical parameters used in the football team building and transfer supporting
method.
Table 2. Players’ technical parameters used in the football team building and transfer supporting
method.
Table 3. Players’ psychological parameters used in the football team building and transfer supporting method.
The physical aspect includes statistics on the number of games played and the duels won against
opponents. We chose them to reflect the player’s performance and motor skills.
The statistics that make up the technical aspect were selected to reflect the player’s contribution
to the play of the whole team. Key elements of soccer, such as scoring goals, dribbling, or passing,
can be expressed in two ways: numerically (as an effect) or as a percentage (as effectiveness).
Because soccer is a team sport, choosing only one of these variants could produce misleading
results: if the team does not create shooting situations, even the best striker will not score many
goals; conversely, given a large number of situations, even a player with poor efficiency will score
goals.
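The following minimal Python sketch illustrates why both variants are kept. The two strikers and their goal and shot totals are hypothetical, chosen only so that the ranking flips depending on whether the count (effect) or the percentage (effectiveness) is used; they are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class ShootingStats:
    """Hypothetical season totals for one player (illustrative values only)."""
    name: str
    goals: int
    shots: int

    @property
    def effect(self) -> int:
        # Numerical variant: the absolute number of goals scored.
        return self.goals

    @property
    def effectiveness(self) -> float:
        # Percentage variant: goals per shot, guarding against zero shots.
        return 100.0 * self.goals / self.shots if self.shots else 0.0


# Striker A plays in a team that creates many chances; striker B does not.
a = ShootingStats("Striker A", goals=15, shots=150)
b = ShootingStats("Striker B", goals=8, shots=32)

best_by_effect = max([a, b], key=lambda p: p.effect).name
best_by_effectiveness = max([a, b], key=lambda p: p.effectiveness).name
print(best_by_effect, a.effectiveness, b.effectiveness, best_by_effectiveness)
```

Here striker A ranks first by goals (15 vs. 8) and striker B by conversion rate (25% vs. 10%), which is the kind of disagreement that motivates using both variants as features.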
The common saying “statistics don’t play” expresses the idea that basic statistics alone do not
always reflect a player’s skills and contribution to the team. To reflect the football profile as
well as possible, we decided to also include metrics such as “big chance created” and “mistake leads
to a goal”.
The psychological aspect consists of statistics on cards received and contacts with the ball,
together with parameters that indicate how likely the player is to acclimatize at the new club.
Penalties received for fouls indicate the player’s level of aggression, and the number of contacts
with the ball may indicate the player’s involvement in the team’s play.
To create the dataset, we gathered information on the transfers made in the strongest professional
football leagues: La Liga (Spain), Serie A (Italy), Bundesliga (Germany), Ligue 1 (France),
Premjer-Liga (Russia), Primeira Liga (Portugal), and the Premiership and Championship (UK). These
leagues are considered among the best worldwide, and essentially all of the transfers were made
between clubs from these leagues.
The dataset contains parameters obtained from four seasons: 2019/20, 2018/19, 2017/18, and
2016/17. The values were weighted so that the most recent season counts more than the earlier ones;
the weights are 1.0, 0.8, 0.5, and 0.3, respectively. In football, one position has particularly
specific characteristics and statistics: the goalkeeper, who rarely scores and whose aerial duels
and passes differ markedly from those of outfield players. Because goalkeepers could therefore
distort the predictions, we omitted them from the dataset. The dataset contains 3482 records, each
corresponding to a single football player transfer. The data acquisition step and all subsequent
elements of the proposed approach are presented in Figure 1.
Figure 1. The dataflow of the proposed method: data acquisition, pre-processing, machine learning,
and the result.
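The pre-processing described above can be sketched in Python as follows. Only the season weights come from the text; the rest is assumed. The article does not say how the weighted seasonal values are combined, so the sketch uses a weighted mean over the seasons a player actually has data for, and the `position` field with the value `"GK"` is a hypothetical record layout.

```python
from __future__ import annotations

# Season weights as stated in the text: the most recent season matters most.
SEASON_WEIGHTS = {"2019/20": 1.0, "2018/19": 0.8, "2017/18": 0.5, "2016/17": 0.3}


def weighted_value(per_season: dict[str, float]) -> float:
    """Combine one statistic across seasons.

    ASSUMPTION: a weighted mean over the seasons present; seasons outside
    SEASON_WEIGHTS get weight 0 and are ignored.
    """
    total_weight = sum(SEASON_WEIGHTS.get(s, 0.0) for s in per_season)
    if total_weight == 0:
        return 0.0
    return sum(SEASON_WEIGHTS.get(s, 0.0) * v for s, v in per_season.items()) / total_weight


def drop_goalkeepers(records: list[dict]) -> list[dict]:
    # HYPOTHETICAL schema: each record has a "position" field, "GK" = goalkeeper.
    return [r for r in records if r.get("position") != "GK"]


records = [
    {"player": "P1", "position": "FW", "goals": {"2019/20": 10, "2018/19": 6}},
    {"player": "P2", "position": "GK", "goals": {"2019/20": 0}},
]
outfield = drop_goalkeepers(records)
score = weighted_value(outfield[0]["goals"])  # (1.0*10 + 0.8*6) / 1.8
```

A weighted sum would be an equally plausible reading of the text; the two differ only in whether players with fewer recorded seasons end up with lower values.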
Figure 2 presents the distributions of the parameters in the dataset as four bar charts, showing the
number of transfers for each value of age, physical aspect, psychological aspect, and technical
aspect. The distributions resemble a Gaussian distribution, which reflects the real situation: most
players have an average level of technique, psyche, and physical condition, while only a few players
have extraordinary skills.
$$
P = a \cdot \sum_{l=1}^{k} x_{t_l}\,\alpha_l + b \cdot \sum_{i=1}^{n} x_{p_i}\,\beta_i + c \cdot \sum_{j=1}^{m} x_{f_j}\,\gamma_j \tag{1}
$$
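Equation (1) translates directly into code. Following the caption of Table 5, a, b, and c weight the technical, psychological, and physical aspects; we assume that x_{t_l}, x_{p_i}, and x_{f_j} are the player's k technical, n psychological, and m physical parameter values and that α_l, β_i, and γ_j are their per-parameter weights. All numbers in the usage example are placeholders, not the weights used in the article.

```python
from typing import Sequence


def weighted_sum(values: Sequence[float], weights: Sequence[float]) -> float:
    """One inner sum of Equation (1), e.g. sum over l of x_{t_l} * alpha_l."""
    if len(values) != len(weights):
        raise ValueError("each parameter needs exactly one weight")
    return sum(x * w for x, w in zip(values, weights))


def player_assessment(
    technical: Sequence[float], alpha: Sequence[float],
    psychological: Sequence[float], beta: Sequence[float],
    physical: Sequence[float], gamma: Sequence[float],
    a: float, b: float, c: float,
) -> float:
    """P = a*sum(x_t*alpha) + b*sum(x_p*beta) + c*sum(x_f*gamma)."""
    return (
        a * weighted_sum(technical, alpha)
        + b * weighted_sum(psychological, beta)
        + c * weighted_sum(physical, gamma)
    )


# Placeholder values: two technical, one psychological, two physical parameters.
P = player_assessment(
    technical=[0.7, 0.5], alpha=[0.6, 0.4],
    psychological=[0.8], beta=[1.0],
    physical=[0.9, 0.3], gamma=[0.5, 0.5],
    a=0.5, b=0.2, c=0.3,
)  # 0.5*0.62 + 0.2*0.8 + 0.3*0.6 = 0.65 (up to floating-point rounding)
```

Keeping the inner sums separate from the aspect weights makes it easy to vary a, b, and c independently, as in the experiments summarized in Table 5.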
Secondly, the success of a transfer can depend on the overall quality of the target team; e.g., it is
very difficult to score frequently when one’s teammates are weak and do not properly support the
player. Thus, in the second definition, we compare the player’s assessment (Equation (1)) with the
average assessment of the team. If the player’s evaluation is greater than or equal to the average,
the transfer is considered successful.
The third definition of a successful transfer is again based on the average assessment of the
team’s players, with one modification: the average is calculated from only 18 scores (the starting
eleven plus seven substitutes), because, generally speaking, other players do not appear in the
line-up.
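Definitions 2 and 3 can be expressed as two labelling functions, as in the Python sketch below. The article does not say which 18 players are used, so the sketch assumes the 18 highest-rated ones; under that assumption the squad average can never be lower than the full-team average, whereas the reported counts show slightly more successes for definition 3, so the article's selection rule probably differs. The team scores in the example are invented.

```python
from __future__ import annotations

from statistics import mean

SQUAD_SIZE = 18  # starting eleven + 7 substitutes, as stated in the text


def success_vs_team_average(player_score: float, team_scores: list[float]) -> bool:
    # Definition 2: successful if the player's assessment >= the team average.
    return player_score >= mean(team_scores)


def success_vs_matchday_squad(player_score: float, team_scores: list[float]) -> bool:
    # Definition 3: same comparison, but averaged over only 18 scores.
    # ASSUMPTION: the 18 highest-rated players make up the match-day squad.
    squad = sorted(team_scores, reverse=True)[:SQUAD_SIZE]
    return player_score >= mean(squad)


# Invented 25-player team: 18 regulars rated 0.7 and 7 fringe players rated 0.3.
team = [0.7] * 18 + [0.3] * 7
print(success_vs_team_average(0.65, team))    # True: 0.65 >= 0.588
print(success_vs_matchday_squad(0.65, team))  # False: 0.65 < 0.7
```

In this invented example the fringe players pull the full-team average down, so the same transfer is labelled successful under definition 2 but not under definition 3.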
Because three different definitions of a successful transfer are used, the ratio of successful to
unsuccessful transfers differs between experiments: it was 2024:1458 for definition 1, 2378:1104
for definition 2, and 2384:1098 for definition 3.
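Because the classes are imbalanced, a natural reference point for the reported accuracy is the majority-class baseline, i.e., the accuracy of a trivial classifier that labels every transfer as successful. The Python sketch below computes it from the ratios above; this baseline is not itself reported in the article.

```python
# Successful : unsuccessful transfer counts reported for the three definitions.
RATIOS = {1: (2024, 1458), 2: (2378, 1104), 3: (2384, 1098)}

baselines = {}
for definition, (successful, unsuccessful) in RATIOS.items():
    total = successful + unsuccessful  # 3482 records in every case
    # Accuracy of a classifier that always predicts "successful".
    baselines[definition] = successful / total
    print(f"definition {definition}: n={total}, majority baseline={baselines[definition]:.3f}")
```

This prints baselines of 0.581, 0.683, and 0.685, so the accuracy of 0.82 reported in the abstract is well above what always predicting the majority class would achieve under any of the three definitions.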
Figure 3. The market value (in €M) of some selected football players in 2019 and 2020: Kylian
Mbappe, Harry Kane, Neymar, Mohamed Salah, and Raheem Sterling.
Table 4. The results obtained from experiments for different successful transfer definitions.
Table 5. The results obtained from experiments for different weights of aspects (a—technical, b—
psychological, and c—physical).
Table 6. The results obtained from experiments for different classifiers.
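The abstract summarizes the results with accuracy, precision, recall, and F1-score. A minimal Python sketch of these metrics for the binary task, treating a successful transfer as the positive class, follows; the confusion-matrix counts in the example are invented. The article does not state how the metrics were averaged over classes, so the harmonic-mean check at the end is only an approximate consistency check.

```python
from __future__ import annotations


def f1_score(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Binary metrics with 'successful transfer' as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Invented confusion matrix, for illustration only.
m = metrics_from_counts(tp=80, fp=20, fn=10, tn=40)

# Consistency check against the abstract: P = 0.84 and R = 0.82 give F1 of about 0.83.
print(round(f1_score(0.84, 0.82), 2))
```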
5. Threats to Validity
Arsene Wenger, the veteran Arsenal manager, admitted that football, like any other sport, is
largely unpredictable (https://www.irishexaminer.com/sport/soccer/arid-20321805.html).
Consequently, the transfer support may be flawed, and the predictions made by the proposed method
should be treated as suggestions, since there are always unpredictable factors (e.g., family life or
problems, serious injuries).
Another threat to validity is the current situation: due to the global COVID-19 pandemic, the 2020
season was interrupted and the training cycle was disrupted in most leagues worldwide. This may
introduce misleading data into the statistics and, consequently, lead to failed predictions in the
future.
Finally, differences between leagues may be another threat to validity, as it is very difficult to
compare leagues from different countries. The proposed method was tested on data regarding players
and transfers from the best leagues.
Additionally, our data from the top leagues concerned only male players and men’s leagues. In
future work, we plan to evaluate whether the same approach works well on data regarding female
football players.
As for construct validity, the dataset used is believed to be representative of the top football
leagues, and it was constructed from openly available data. As for external validity, we are aware
that the proposed method is based on top leagues and that it may be biased, providing less promising
results for weaker players and teams.
6. Conclusions
In this paper, we present a possible approach to professional team building by means of a machine
learning-based tool. The proposed solution provides promising results and can support a scout or
team manager in transfer planning. Nevertheless, as mentioned in Section 5, there are some threats
to validity that can affect the obtained results and the whole approach. Additionally, as mentioned
in [27], sport is unpredictable. Hence, the indications coming from ML should be treated solely as
advice.
In this research, unlike some other recently published approaches (e.g., [28]), we used only real
data. We obtained the dataset from online sources and hope that it represents the real situation in
football.
The obtained results also depend on the definition of a successful transfer and the weights of the
selected parameters. In this paper, we provide three different definitions and three different
weights for each parameter. It is also worth noting that we take into account not only the technical
and physical parameters of a professional football player, but also his psychological state.
In the near future, we plan to further improve the presented solution. Possible extensions include
adding more classifiers (such as a neural network) and building ensembles. Apart from that, we plan
to verify the solution on other data, possibly coming from a greater number of (less popular)
leagues. Finally, the proposed approach could be transferred to other ball sports, e.g., volleyball
or basketball, which would require the parameter lists to be redefined.
Author Contributions: Conceptualization, B.Ć. and M.C.; methodology, B.Ć., M.C. and A.G.; software,
B.Ć.; validation, A.G.; formal analysis, B.Ć., M.C., and A.G.; investigation, B.Ć.; resources, B.Ć.
and A.G.; data curation, B.Ć.; writing, original draft preparation, B.Ć. and A.G.; writing, review
and editing, B.Ć., M.C., and A.G.; visualization, A.G.; supervision, M.C.; project administration,
M.C.; funding acquisition, M.C. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Sarlis, V.; Tjortjis, C. Sports analytics–Evaluation of basketball players and team performance. Inf. Syst. 2020, 93, 101562.
[CrossRef]
2. Bunker, R.P.; Thabtah, F. A machine learning framework for sport result prediction. Appl. Comput. Inform. 2019, 15, 27–33.
[CrossRef]
3. Xiao-wei, X. Study on the intelligent system of sports culture centers by combining machine learning with big data. Pers.
Ubiquitous Comput. 2020, 24, 151–163. [CrossRef]
4. Thabtah, F.; Zhang, L.; Abdelhamid, N. NBA game result prediction using feature analysis and machine learning. Ann. Data Sci.
2019, 6, 103–116. [CrossRef]
5. Baboota, R.; Kaur, H. Predictive analysis and modelling football results using machine learning approach for English Premier
League. Int. J. Forecast. 2019, 35, 741–755. [CrossRef]
6. Horvat, T.; Havaš, L.; Srpak, D. The Impact of Selecting a Validation Method in Machine Learning on Predicting Basketball Game
Outcomes. Symmetry 2020, 12, 431. [CrossRef]
7. Sharma, M.; Kumar, N.; Kumar, P. Badminton match outcome prediction model using Naïve Bayes and Feature Weighting
technique. J. Ambient. Intell. Humaniz. Comput. 2020, 1–15. [CrossRef]
8. Stübinger, J.; Mangold, B.; Knoll, J. Machine Learning in Football Betting: Prediction of Match Results Based on Player
Characteristics. Appl. Sci. 2020, 10, 46. [CrossRef]
9. Gu, W.; Foster, K.; Shang, J.; Wei, L. A game-predicting expert system using big data and machine learning. Expert Syst. Appl.
2019, 130, 293–305. [CrossRef]
10. Choraś, M.; Pawlicki, M.; Kozik, R.; Demestichas, K.; Kosmides, P.; Gupta, M. SocialTruth project approach to online disinformation
(fake news) detection and mitigation. In Proceedings of the 14th International Conference on Availability, Reliability and
Security, Canterbury, UK, 26–29 August 2019; pp. 1–10.
11. Schumaker, R.P.; Jarmoszko, A.T.; Labedz, C.S., Jr. Predicting wins and spread in the Premier League using a sentiment analysis
of twitter. Decis. Support Syst. 2016, 88, 76–84. [CrossRef]
12. Welch, M.C.; Cummins, C.; Thornton, H.; King, D.; Murphy, A. Training load prior to injury in professional rugby league players:
analysing injury risk with machine learning. ISBS Proc. Arch. 2018, 36, 330.
13. Vallance, E.; Sutton-Charani, N.; Imoussaten, A.; Montmain, J.; Perrey, S. Combining Internal-and External-Training-Loads to
Predict Non-Contact Injuries in Soccer. Appl. Sci. 2020, 10, 5261. [CrossRef]
14. Liu, G.; Sun, H.; Bai, W.; Li, H.; Ren, Z.; Zhang, Z.; Yu, L. A learning-based system for predicting sport injuries. In MATEC Web of
Conferences; EDP Sciences: Les Ulis, France, 2018; Volume 189, p. 10008.
15. Herold, M.; Goes, F.; Nopp, S.; Bauer, P.; Thompson, C.; Meyer, T. Machine learning in men’s professional football: Current
applications and future directions for improving attacking play. Int. J. Sport Sci. Coach. 2019, 14, 798–817. [CrossRef]
16. Pappalardo, L.; Cintia, P.; Ferragina, P.; Massucco, E.; Pedreschi, D.; Giannotti, F. PlayeRank: Data-driven performance evaluation
and player ranking in soccer via a machine learning approach. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–27. [CrossRef]
17. Maanijou, R.; Mirroshandel, S.A. Introducing an expert system for prediction of soccer player ranking using ensemble learning.
Neural Comput. Appl. 2019, 31, 9157–9174. [CrossRef]
18. Stübinger, J.; Knoll, J. Beat the Bookmaker–Winning Football Bets with Machine Learning (Best Application Paper). In
International Conference on Innovative Techniques and Applications of Artificial Intelligence; Springer: Berlin/Heidelberg, Germany,
2018; pp. 219–233.
19. Knoll, J.; Stübinger, J. Machine-learning-based statistical arbitrage football betting. KI-Künstliche Intell. 2020, 34, 69–80. [CrossRef]
20. Ji, R. Research on Basketball Shooting Action Based on Image Feature Extraction and Machine Learning. IEEE Access 2020,
8, 138743–138751. [CrossRef]
21. Kramer, T.; Huijgen, B.C.; Elferink-Gemser, M.T.; Visscher, C. Prediction of tennis performance in junior elite tennis players. J.
Sport Sci. Med. 2017, 16, 14.
22. Oytun, M.; Tinazci, C.; Sekeroglu, B.; Acikada, C.; Yavuz, H.U. Performance Prediction and Evaluation in Female Handball
Players Using Machine Learning Models. IEEE Access 2020, 8, 116321–116335.
23. Musa, R.M.; Majeed, A.A.; Taha, Z.; Abdullah, M.; Maliki, A.H.M.; Kosni, N.A. The application of Artificial Neural Network and
k-Nearest Neighbour classification models in the scouting of high-performance archers from a selected fitness and motor skill
performance parameters. Sci. Sport 2019, 34, e241–e249.
24. Huifeng, W.; Kadry, S.N.; Raj, E.D. Continuous health monitoring of sportsperson using IoT devices based wearable technology.
Comput. Commun. 2020, 160, 588–595. [CrossRef]
25. Zahran, L.; El-Beltagy, M.; Saleh, M. A Conceptual Framework for the Generation of Adaptive Training Plans in Sports
Coaching. In International Conference on Advanced Intelligent Systems and Informatics; Springer: Berlin/Heidelberg, Germany, 2019;
pp. 673–684.
26. Choraś, M.; Pawlicki, M. Intrusion Detection Approach based on Optimised Artificial Neural Network. Neurocomputing 2020, in
press.
27. Horvat, T.; Job, J. The use of machine learning in sport outcome prediction: A review. Wiley Interdiscip. Rev. Data Min. Knowl.
Discov. 2020, 10, e1380. [CrossRef]
28. Behravan, I.; Razavi, S.M. A novel machine learning method for estimating football players’ value in the transfer market. Soft
Comput. 2020, 1–13. [CrossRef]