Football Analyst: Decision Support Systems (Sistemes de Suport a la Decisió)
Marc Valls and Ernesto Sánchez
31-05-2022
1. Introduction
Obtaining, manipulating and interpreting data is now a fundamental part of many
areas of work. In our case, we have focused on football. All top-level professional
clubs currently have a team of professionals dedicated to this task, so we wanted to
investigate how they work, what data they use and what their reports look like. Our
research showed that, thanks to data, a team's technical staff can estimate how many
goals the opposing team is likely to score, how much the crowd in the stadium
influences a match, how individual referees tend to officiate, and so on. Note that
our database contains only data from the current season.
We now describe the dimensions of our database. First, the Local Teams and Away
Teams tables share the attributes Name, Stadium Name, Establishment and Sponsor.
They store all the information about the teams in our database: those belonging to
the five big European leagues and the participants in this season's Champions
League. For that reason, we also have a Competition table to store the different
competitions. The Time table keeps track of the match dates. As noted above,
referees play a very important role in football, so they could not be missing from
our database. Their attributes are name, nationality, age, years of experience and
gender. Finally, the Stadiums table has the attributes Name, City, Country, Capacity
and Ticket Price.
Once the dimensions were clear, we built the fact table, which we called Football
Matches. It holds the measures we chose for studying the teams. Among them are the
number of home and away goals for each team, the number of yellow cards shown by a
referee, and many more that appear later on. Once the entire database was built, we
moved on to the reports and machine learning.
2. Design
2.1 Conceptual design: information package
Hierarchy:
Name < City < Country (Stadiums)
Day < Month < Year < Season (Time)
Name < Stadium_name (Local_Teams and Away_Teams)
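The Time hierarchy above (Day < Month < Year < Season) can be derived from a single match date. The following is a minimal sketch, not the project's actual loading code: the function name `time_dimension_row` and the rule that a season starts in July are assumptions made for illustration.

```python
from datetime import date


def time_dimension_row(match_date: date, season_start_month: int = 7) -> dict:
    """Derive the Time-dimension attributes for one match date.

    Assumption: a season starts in `season_start_month` (July by default)
    and ends just before the same month of the following year.
    """
    if match_date.month >= season_start_month:
        first_year = match_date.year
    else:
        first_year = match_date.year - 1
    return {
        "date": match_date,
        "day": match_date.day,
        "month": match_date.month,
        "year": match_date.year,
        # "YYYY/YYYY" is 9 characters long, so it fits a VARCHAR(9) column.
        "season": f"{first_year}/{first_year + 1}",
    }


row = time_dimension_row(date(2022, 5, 28))
print(row["season"])
```

With these assumptions, a match in May 2022 and a match in August 2021 both fall in the same season label.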
Attribute types:
Time            Type
Date            DATE
Day             NUMERIC(2,0)
Month           NUMERIC(2,0)
Year            NUMERIC(4,0)
Season          VARCHAR(9)

Local_Teams     Type
Name            VARCHAR(40)
Stadium_name    VARCHAR(40)
Establishment   NUMERIC(4,0)
Sponsor         VARCHAR(20)

Away_Teams      Type
Name            VARCHAR(40)
Stadium_name    VARCHAR(40)
Establishment   NUMERIC(4,0)
Sponsor         VARCHAR(20)

Competitions    Type
Name            VARCHAR(20)
Round           VARCHAR(20)

Stadiums        Type
Name            VARCHAR(40)
City            VARCHAR(15)
Country         VARCHAR(10)
Capacity        NUMERIC(5,0)

Referees        Type
Name            VARCHAR(40)
Nationality     VARCHAR(10)
Age             NUMERIC(2,0)
Experience      NUMERIC(2,0)
Sex             VARCHAR(6)
2.2 Logical design: Star schema
Dimension       Key
Time            id_time
Competitions    id_competition
Local_Teams     id_localTeam
Away_Teams      id_awayTeam
Referees        id_referee
Stadiums        id_stadium
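As an illustration of how these keys tie the dimensions to the fact table, here is a minimal sketch that builds the star schema in an in-memory SQLite database. It is not the project's actual DDL: the fact table name `Football_Matches` and the measure columns `home_goals`, `away_goals` and `yellow_cards` are hypothetical names, and the inserted rows are made up.

```python
import sqlite3

# Dimension columns follow the attribute types listed in section 2.1.
# The measure columns of Football_Matches are hypothetical names.
SCHEMA = """
CREATE TABLE "Time" (
    id_time INTEGER PRIMARY KEY,
    "Date" DATE, Day NUMERIC(2,0), Month NUMERIC(2,0),
    Year NUMERIC(4,0), Season VARCHAR(9)
);
CREATE TABLE Competitions (
    id_competition INTEGER PRIMARY KEY,
    Name VARCHAR(20), Round VARCHAR(20)
);
CREATE TABLE Local_Teams (
    id_localTeam INTEGER PRIMARY KEY,
    Name VARCHAR(40), Stadium_name VARCHAR(40),
    Establishment NUMERIC(4,0), Sponsor VARCHAR(20)
);
CREATE TABLE Away_Teams (
    id_awayTeam INTEGER PRIMARY KEY,
    Name VARCHAR(40), Stadium_name VARCHAR(40),
    Establishment NUMERIC(4,0), Sponsor VARCHAR(20)
);
CREATE TABLE Referees (
    id_referee INTEGER PRIMARY KEY,
    Name VARCHAR(40), Nationality VARCHAR(10),
    Age NUMERIC(2,0), Experience NUMERIC(2,0), Sex VARCHAR(6)
);
CREATE TABLE Stadiums (
    id_stadium INTEGER PRIMARY KEY,
    Name VARCHAR(40), City VARCHAR(15), Country VARCHAR(10),
    Capacity NUMERIC(5,0)
);
CREATE TABLE Football_Matches (
    id_time INTEGER REFERENCES "Time"(id_time),
    id_competition INTEGER REFERENCES Competitions(id_competition),
    id_localTeam INTEGER REFERENCES Local_Teams(id_localTeam),
    id_awayTeam INTEGER REFERENCES Away_Teams(id_awayTeam),
    id_referee INTEGER REFERENCES Referees(id_referee),
    id_stadium INTEGER REFERENCES Stadiums(id_stadium),
    home_goals INTEGER, away_goals INTEGER, yellow_cards INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up rows, only to show a fact-to-dimension join.
conn.execute("INSERT INTO Referees (id_referee, Name) VALUES (1, 'Referee A')")
conn.executemany(
    "INSERT INTO Football_Matches (id_referee, home_goals, away_goals, yellow_cards) "
    "VALUES (?, ?, ?, ?)",
    [(1, 2, 1, 4), (1, 0, 0, 3)],
)

yellow_by_referee = conn.execute(
    """
    SELECT r.Name, SUM(m.yellow_cards)
    FROM Football_Matches AS m
    JOIN Referees AS r ON r.id_referee = m.id_referee
    GROUP BY r.Name
    """
).fetchall()
print(yellow_by_referee)
```

Every report query has the same shape: the fact table is joined to one or more dimensions on their `id_` keys, and the measures are aggregated.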
PARTITIONS
By Range:
An interesting option is PARTITION BY RANGE(date), which groups the matches by the date
range in which they were played, so the matches played between two dates are easy to find.
This is useful for a query that asks, for example, for the number of goals scored in a
month or in a part of the season.
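To make the idea concrete, the sketch below imitates range partitioning in plain Python: each row is routed to a partition by comparing its date with a list of upper bounds, and a date-range query scans only the partitions that can contain matching rows. The quarterly bounds, the helper names and the match rows are all assumptions made for illustration; a real database does this internally.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical upper bounds: partition i holds dates below UPPER_BOUNDS[i]
# and at or above the previous bound (similar to "VALUES LESS THAN").
UPPER_BOUNDS = [date(2021, 10, 1), date(2022, 1, 1), date(2022, 4, 1), date(2022, 7, 1)]


def partition_index(match_date):
    """Return the index of the partition whose range contains match_date."""
    idx = bisect_right(UPPER_BOUNDS, match_date)
    if idx == len(UPPER_BOUNDS):
        raise ValueError(f"{match_date} is beyond the last partition bound")
    return idx


# Made-up matches: (match date, total goals in the match).
matches = [
    (date(2021, 9, 12), 3),
    (date(2021, 11, 6), 1),
    (date(2022, 2, 19), 4),
    (date(2022, 2, 26), 2),
]

partitions = [[] for _ in UPPER_BOUNDS]
for match_date, goals in matches:
    partitions[partition_index(match_date)].append((match_date, goals))


def goals_between(start, end):
    """Sum goals for start <= date <= end, scanning only the relevant partitions."""
    first, last = partition_index(start), partition_index(end)
    return sum(
        goals
        for part in partitions[first:last + 1]
        for match_date, goals in part
        if start <= match_date <= end
    )


february_goals = goals_between(date(2022, 2, 1), date(2022, 2, 28))
print(february_goals)
```

The query for February only touches the partition covering January to March, which is the benefit a range partition on the date gives to monthly or part-of-season queries.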
By List:
A list partition suits queries such as the number of yellow cards shown by each referee
in a given list of referees. This is a common point of interest for the teams.
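List partitioning can be imitated in the same way: each partition is defined by an explicit list of key values, and a query for a list of referees only reads the partitions that own those values. This is a sketch under assumptions: the partition names, the grouping of referee ids and the rows are hypothetical.

```python
# Hypothetical list partitions: each one names the referee ids it stores.
PARTITION_LISTS = {
    "p_group_a": {1, 2},
    "p_group_b": {3, 4},
}


def partition_for(id_referee):
    """Return the name of the partition whose value list contains id_referee."""
    for name, ids in PARTITION_LISTS.items():
        if id_referee in ids:
            return name
    raise ValueError(f"referee {id_referee} is not assigned to any partition")


# Made-up fact rows: (id_referee, yellow_cards shown in one match).
rows = [(1, 4), (2, 3), (3, 5), (1, 2), (4, 1)]

partitions = {name: [] for name in PARTITION_LISTS}
for id_referee, yellow_cards in rows:
    partitions[partition_for(id_referee)].append((id_referee, yellow_cards))


def yellow_cards_for(referee_ids):
    """Total yellow cards per referee, reading only the partitions involved."""
    wanted = set(referee_ids)
    touched = {partition_for(r) for r in wanted}
    totals = {r: 0 for r in sorted(wanted)}
    for name in touched:
        for id_referee, yellow_cards in partitions[name]:
            if id_referee in wanted:
                totals[id_referee] += yellow_cards
    return totals


print(yellow_cards_for([1, 2]))
```

Here a query for referees 1 and 2 reads only `p_group_a`; the rows of the other referees are never scanned.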
3. Implementation
3.1 Power BI Reporting
Finally, we produced a season summary report.
3.2 Application of data mining algorithms with KNIME
[Figures: AWAY_TEAM, LOCAL_TEAM]
4. Conclusion
Carrying out this project has introduced us to the interesting world of data mining.
It required intensive research and a lot of study to reach the final result. The most
interesting part of the project was creating the dashboards: once the entire database
was built, we could explore what we wanted to analyse through all kinds of graphs.
We have also learnt the importance of using data well, and we believe this will be
useful in our future work.
With this document, we hope to introduce the exciting world of data mining to people
who are not familiar with it, and we hope they have found it interesting.