1.4 Data Mining in A Programming Language
Data Analysis
Coordinación de Tecnologías
para la Educación
Contemporary Developments - Edgar Olivares (2024) 1
INTRODUCTION
Hello everyone,
In this presentation we are going to talk about data analysis and data mining, and how we
can use a programming language to take advantage of them. As we all know, data is
extremely important for companies to understand their health and make decisions, as well as for
technology, especially artificial intelligence and machine learning. We will cover what data
analysis and data mining are, some of their tools, how to analyze data, and how this
relates to technology.
Remember that my door is always open in case you have any questions.
DATA ANALYSIS INTRODUCTION
WHAT IS DATA ANALYSIS
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist
facts to suit theories, instead of theories to suit facts,” Sherlock Holmes proclaims in
Sir Arthur Conan Doyle's A Scandal in Bohemia.
DATA ANALYSIS TYPES
There are four main types of data analysis:
● Descriptive
● Diagnostic
● Predictive
● Prescriptive
DATA ANALYSIS PROCESS
As the data available to companies continues to grow both in amount and complexity,
so too does the need for an effective and efficient process by which to harness the
value of that data. The data analysis process typically moves through several iterative
phases. Let’s take a closer look at each.
Identify the business question you’d like to answer. What problem is the company
trying to solve? What do you need to measure, and how will you measure it?
Collect the raw data sets you’ll need to help you answer the identified question.
Data collection might come from internal sources, like a company’s client
relationship management (CRM) software, or from secondary sources, like
government records or social media application programming interfaces (APIs).
Clean the data to prepare it for analysis. This often involves purging duplicate and
anomalous data, reconciling inconsistencies, standardizing data structure and
format, and dealing with white spaces and other syntax errors.
Analyze the data. By manipulating the data using various data analysis techniques
and tools, you can begin to find trends, correlations, outliers, and variations that
tell a story. During this stage, you might use data mining to discover patterns
within databases or data visualization software to help transform data into an
easy-to-understand graphical format.
Interpret the results of your analysis to see how well the data answered your
original question. What recommendations can you make based on the data?
What are the limitations to your conclusions?
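The clean and analyze phases described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the records, the field layout (name, region, monthly spend) and the outlier rule (distance from the median greater than three median absolute deviations) are assumptions made for the example, not part of the process itself.

```python
from statistics import mean, median

# Hypothetical raw records, e.g. exported from a CRM: (name, region, monthly_spend)
raw_rows = [
    ("  Ana ", "north", "120.0"),
    ("Ana", "North", "120.0"),    # duplicate once whitespace and case are normalized
    ("Luis", "south", "95.5"),
    ("Marta", "North ", "101.0"),
    ("Pedro", "south", "n/a"),    # value that cannot be parsed as a number
    ("Sofia", "north", "110.0"),
    ("Diego", "south", "980.0"),  # anomalous value
]

def clean(rows):
    """Strip white space, standardize case, drop unparseable rows and duplicates."""
    seen = set()
    cleaned = []
    for name, region, spend in rows:
        name, region = name.strip(), region.strip().lower()
        try:
            value = float(spend)
        except ValueError:
            continue  # skip rows whose amount is not a number
        key = (name, region, value)
        if key in seen:
            continue  # skip duplicates
        seen.add(key)
        cleaned.append(key)
    return cleaned

def find_outliers(values, k=3.0):
    """Flag values farther than k median absolute deviations from the median."""
    mid = median(values)
    mad = median(abs(v - mid) for v in values)
    return [v for v in values if abs(v - mid) > k * mad]

rows = clean(raw_rows)
spend = [value for _, _, value in rows]
outliers = find_outliers(spend)

# Simple descriptive analysis: average spend per region
by_region = {}
for _, region, value in rows:
    by_region.setdefault(region, []).append(value)
averages = {region: mean(values) for region, values in by_region.items()}

print("clean rows:", len(rows))
print("outliers:", outliers)
print("average spend per region:", averages)
```

Interpreting the result is still up to the analyst: here the anomalous value strongly inflates the average for one region, which is exactly the kind of limitation that should be reported alongside any recommendation.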
DATA ANALYSIS PROCESS IN EXCEL
DATA ANALYSIS (DATA MINING)
Now that we know what data analysis is, we need to introduce a related concept for
getting value out of information: data mining.
Data mining is the process of searching and analyzing a large batch of raw data in
order to identify patterns and extract useful information.
The difference between the two is that data analysis cleans the information and presents
it in a way that makes it easy to take decisions, whereas data mining works through the
information to extract specific information.
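As a small illustration of "identifying patterns" in raw data, the sketch below counts which pairs of items appear together most often in a set of purchases. The transactions, the function name `frequent_pairs` and the 40% support threshold are all hypothetical; real data mining tools apply the same idea to far larger data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: items bought together in each purchase
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a stable order so ("bread", "milk") is counted once
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

patterns = frequent_pairs(transactions, min_support=0.4)
for pair, support in sorted(patterns.items(), key=lambda item: -item[1]):
    print(pair, f"appears in {support:.0%} of purchases")
```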
DATA ANALYSIS (DATA MINING)
Data mining involves exploring and analyzing large blocks of information to
glean meaningful patterns and trends. The data mining process breaks down
into four steps:
● Classification
● Clustering
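To make the contrast between the two items above concrete: classification assigns new items to classes that are already known from labeled examples, while clustering groups items without any labels. The sketch below is a deliberately simplified, one-dimensional illustration. The data, the nearest-neighbor rule used for classification and the k-means routine used for clustering are choices made for this example only.

```python
# Hypothetical labeled examples: (value, class)
labeled = [(1.0, "low"), (2.0, "low"), (9.0, "high"), (10.0, "high")]

def classify(value, examples):
    """Classification: give a new value the label of its nearest labeled example."""
    _, label = min(examples, key=lambda example: abs(value - example[0]))
    return label

def kmeans_1d(points, centroids, max_iter=100):
    """Clustering: group unlabeled points around centroids (1-D k-means)."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

print(classify(3.0, labeled))
print(classify(7.5, labeled))

unlabeled = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, clusters = kmeans_1d(unlabeled, centroids=[1.0, 10.0])
print(centroids, clusters)
```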
DATA ANALYSIS (DATA MINING TECHNIQUES)
● Neural networks
DATA ANALYSIS (DATA MINING - DECISION TREES)
A decision tree is a non-parametric supervised learning
algorithm, which is utilized for both classification and
regression tasks. It has a hierarchical, tree structure, which
consists of a root node, branches, internal nodes and leaf
nodes.
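The structure described above (root node, branches, internal nodes and leaf nodes) can be represented directly in code. The sketch below builds a small tree by hand and walks it to classify a sample. The features (`income`, `savings`), the thresholds and the class names are hypothetical, and the tree is written manually rather than learned; a real decision tree algorithm would choose the splits from training data.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    """Leaf node: holds the final prediction."""
    prediction: str

@dataclass
class Node:
    """Root or internal node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # branch followed when value <= threshold
    right: Union["Node", Leaf]  # branch followed when value > threshold

def predict(node, sample):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction

# Hypothetical hand-built tree for a loan decision
tree = Node(
    feature="income", threshold=40000,                       # root node
    left=Node(                                               # internal node
        feature="savings", threshold=10000,
        left=Leaf("reject"),
        right=Leaf("review"),
    ),
    right=Leaf("approve"),                                   # leaf node
)

print(predict(tree, {"income": 55000, "savings": 0}))
print(predict(tree, {"income": 30000, "savings": 15000}))
print(predict(tree, {"income": 30000, "savings": 5000}))
```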
DATA ANALYSIS (DATA MINING - DECISION TREES)
A business example would be something like "earnings are expected to increase by $5
million." However, since the events indicated by end nodes are speculative in nature, chance
nodes also specify the probability of a specific projection coming to fruition.
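One common way to use these probabilities is to compute an expected value for each option. The sketch below assumes a simple dictionary representation of decision, chance and end nodes. The probabilities (60% and 40%), the loss of 2 million and the option names are hypothetical; only the 5 million gain echoes the example in the text.

```python
def expected_value(node):
    """Expected value of a node, in millions of dollars."""
    kind = node["type"]
    if kind == "end":
        return node["value"]
    if kind == "chance":
        # Weight each possible outcome by its probability
        return sum(p * expected_value(child) for p, child in node["outcomes"])
    if kind == "decision":
        # A decision maker is assumed to pick the option with the highest expected value
        return max(expected_value(child) for child in node["options"].values())
    raise ValueError(f"unknown node type: {kind}")

def best_option(decision_node):
    """Name of the option with the highest expected value."""
    options = decision_node["options"]
    return max(options, key=lambda name: expected_value(options[name]))

# Hypothetical tree: launch a product or do nothing
tree = {
    "type": "decision",
    "options": {
        "launch": {
            "type": "chance",
            "outcomes": [
                (0.6, {"type": "end", "value": 5.0}),   # earnings increase by $5 million
                (0.4, {"type": "end", "value": -2.0}),  # assumed loss if the launch fails
            ],
        },
        "do nothing": {"type": "end", "value": 0.0},
    },
}

print(best_option(tree), round(expected_value(tree), 2))
```

With these assumed numbers the launch is worth 0.6 × 5 + 0.4 × (−2) = 2.2 million in expectation, so it is preferred over doing nothing.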
WHAT IS BIG DATA
It is very important to remember what Big Data is, since advanced data analytics depends
on Big Data.
When we talk about Big Data we refer to data sets, or combinations of data sets, whose
size (volume), complexity (variability) and speed of growth (velocity) make it difficult to
capture, manage, process or analyze them using conventional technologies and tools,
such as relational databases and conventional statistical or visualization packages,
within the time necessary for them to be useful.
Although the size used to determine whether a given data set is considered Big Data is
not firmly defined and continues to change over time, most analysts and professionals
currently refer to data sets ranging from 30-50 terabytes to several petabytes.
WHAT IS DATA ANALYTICS
Advanced analytics is a comprehensive set of analytical techniques and methods, such
as Big Data, Artificial Intelligence (AI) and Machine Learning.
These techniques allow for better predictive analysis and provide insight into
technological change as it occurs. This gives organizations a broader view, enabling them
to develop better responses and act on more accurate forecasts and processes.
DIFFERENCE BETWEEN DATA ANALYSIS AND DATA ANALYTICS
Data analysis is a traditional or generic type of analytics used in enterprises to make
data-driven decisions.
Data analytics is a specialized type of analytics used in businesses to evaluate data and
gain insights.
DATA ANALYTICS TYPES
Within advanced data analytics we can differentiate between 4 main types:
ADVANCED ANALYTICS
ADVANCED ANALYTICS TOOLS
WHAT IS THE ROLE OF AI IN DATA ANALYTICS
CONCLUSION
In conclusion, data can be used in different ways: to review what happened in the
past, to understand what is happening in the present, and to anticipate what will
come in the future. It does not stop there, since intelligent machines can also
use data, clean it and analyze it to make decisions, for example the exact time to do
maintenance, or what to produce, what not to produce and when to produce it. What
other technological advances will the use of data bring us in the future?
All open educational resources produced by Universidad Anáhuac México and its
teaching staff are provided under the Creative Commons
Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) license.
http://creativecommons.org/licenses/by-nc-nd/4.0/
Selected Topics in Information Technologies - Edgar Olivares (2024)