Asm1 Statistics DongThiLinh BH01210
Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that
making a false declaration is a form of malpractice.
Grading grid
P1 P2 P3 M1 M2 D1
Contents
I. Introduction
II. Evaluate the nature and process of business and economic data/information from a range of different published sources (P1)
   2. Explain how data can be turned into information and information into knowledge from published sources
III. Evaluate data from a variety of sources using different methods of analysis (P2 + P3)
   2. Interpreting data from a variety of sources using different methods of analysis
IV. Critically evaluate the methods of analysis used to present business and economic data/information from a range of different published sources (M1)
I. Introduction
I am a Research Analyst at SSI Securities Joint Stock Company. My company is planning to prepare business reports on selected Vietnamese companies and industries by applying statistical methods. The purpose of this report is to give the clearest possible view of how statistical data analysis methods are applied in different fields of real life, and especially in the economic environment. At the same time, this report provides in-depth knowledge of how data is transformed into information, and information into knowledge. Besides, the report also helps readers understand how analytical methods are carried out by applying them to real data. From there, we can gain insight into the business results of industries in Vietnam and support investment decision-making for SSI Securities Joint Stock Company.
Below is the main content of my report. The first part evaluates the nature and process of business and economic data/information from a range of different published sources, defining data, information, and knowledge and describing the conversion process between them using actual data. The next section evaluates data from a variety of sources using different methods of analysis, applying analytical methods such as descriptive analytics, exploratory analytics, and confirmatory analytics, and showing how each is used by analyzing real data. The following section critically evaluates the methods of analysis used to present business and economic data/information from a range of different published sources. And the last part covers the application of analytical methods in different fields.
II. Evaluate the nature and process of business and economic data/information
from a range of different published sources (P1)
1. Define data, information and knowledge
Knowledge, information, and data are crucial words and concepts in knowledge management, intellectual capital, and organizational learning. These words represent foundational concepts that underpin the efficient functioning and growth of organizations. However, their meanings are often ambiguous and confused, which creates challenges in effectively managing and leveraging these assets. This section therefore explains the sources of that ambiguity and confusion, proposes definitions of the key terms, and describes how the three transform into and interact with one another.
Data in their simplest form consist of literal values and raw numbers, collected through various methods. Although data are usually alphanumeric (text, numbers, and symbols), they can also include images or sounds (internetofwater.org, 2020). Data alone have no meaning until they are processed, a step called contextualizing the data. From there, information emerges as we work with those numbers and find value and meaning in them (subscription.packtpub.com, 2018).
Information is created when data is processed, organized, or structured to provide context and
meaning. Information is essentially processed data. Information usually has several meanings and
purposes. Information can help us make decisions more easily. After processing the data, we can take
the information in a context to give it appropriate meaning (subscription.packtpub.com, 2018).
Knowledge is meaningful information. Knowledge arises only when human experience and insight are applied to data and information. We can speak of knowledge when data and information turn into a set of rules that support decisions. In fact, we cannot simply store knowledge, as it implies a theoretical or practical understanding of a subject. The ultimate purpose of knowledge is to create value (subscription.packtpub.com, 2018).
We can observe that the relationship between data, information and knowledge looks like
cyclical behavior. The example below will help you better understand these three terms.
Data | Information | Knowledge
Stream gage height | Convert gage height to streamflow estimates to provide summary stats for the last 10 years | Restrict withdrawals because streamflow is below 7Q10
Amount of precipitation in rain gage | Assess whether annual precipitation is increasing, decreasing, or staying the same | Prioritize investing in floodplain mapping given increases in precipitation over the last 20 years
Amount of lead in water samples | Combine lead level, customer, and drinking water standards data to locate violations | Alert customers with lead levels exceeding safe drinking water standards
Volume of treated water | Correlate volume of treated water with number of low-flush toilets installed over time | Continue investing in the low-flush toilet rebate program given large water savings
2. Explain how data can be turned into information and information into
knowledge from published sources
Through the preliminary analysis above, we can see the close relationship between data, information, and knowledge. In real life we collect countless data, and when we turn them into information we often do not even realize that we have processed them to make them meaningful. Besides, based on knowledge, experience, and so on, each person has their own way of turning that information into knowledge that brings value to real life. Researchers go through the same stages, so that from raw data they can arrive at great inventions or measures that advance society. To better understand how raw data can be transformed into information and information into knowledge, below I analyze that process in more detail.
[Chart: Average CPI growth rate (%) in the first 9 months of 2023 compared to the same period last year. January: 4.89; February: 4.60; March: 4.18; April: 3.84; May: 3.55; June: 3.29; July: 3.12; August: 3.10; September: 3.16.]
The nine figures above were searched for and collected through an official source (the General Statistics Office). They are considered raw data because, standing alone, they have no meaning; they are simply numbers, and none of us can accurately judge what they represent. Raw data refers to information that has not been processed: looking only at the numbers above, I might just as well guess that they represent a set of measurements or the rate of increase in the price of gold. What can easily be seen is that these are raw data, and they carry no value until they are processed.
Having established that the numbers collected above are raw data, the question is: how can we convert that data into information and make the numbers more meaningful? When processing any data, our brains tend to turn it into information automatically through a process of learning, understanding, and observing that data. The key point is that if we want data to become meaningful, we must process it, and this process is called data contextualization. It means we must add the context in which these numbers were collected or generated; without additional context, the raw data do not represent anything concrete. Returning to the data I collected, namely 4.89; 4.6; 4.18; 3.84; 3.55; 3.29; 3.12; 3.1; 3.16, for these figures to be meaningful it is imperative to put them in context. The context here is the average CPI growth rate in the first 9 months of 2023 compared to the same period last year in Vietnam. Clearly, once context is added, the data become more meaningful. At this point we can understand that 4.89; 4.6; 4.18; 3.84; 3.55; 3.29; 3.12; 3.1; 3.16 represent the average CPI growth rates in the first 9 months of 2023 compared to the same period last year in Vietnam. More specifically, 4.89 represents the average CPI increase in Vietnam in January 2023 compared to January 2022, and 3.16 represents the average CPI increase in Vietnam in September 2023 compared to September 2022. Without adding context to this data, we cannot derive meaningful information at all. Moreover, for the data to be truly transformed into information, we must carry out further processing in addition to adding context; this processing may include mathematical operations or comparisons. For the above data on the average CPI growth rate in the first 9 months of 2023 compared to the same period last year in Vietnam, comparing the growth rates of the 9 months yields the following information: the average CPI growth rate in January 2023 in Vietnam compared to January 2022 is the highest of the 9 months, while the average CPI growth rate in August 2023 compared to August 2022 is the lowest. Comparing and subtracting the CPI growth rates for August and September 2023 gives the following information: the CPI growth rate in September 2023 is 0.06 percentage points higher than in August 2023. Comparing and subtracting the rates for January and September 2023 gives: the CPI growth rate in September 2023 has decreased by 1.73 percentage points compared to January 2023. It can be seen that, by putting context into the data and processing it, we gain a great deal of information, and this information provides the foundation for valuable knowledge.
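The arithmetic behind these comparisons is simple enough to script. A minimal Python sketch, using the nine CPI figures quoted above (the month names and values come from the chart; the variable names are illustrative):

```python
# Average CPI growth rate (%) in the first 9 months of 2023 vs. the same
# month of 2022, as collected from the General Statistics Office.
cpi_growth = {
    "January": 4.89, "February": 4.60, "March": 4.18,
    "April": 3.84, "May": 3.55, "June": 3.29,
    "July": 3.12, "August": 3.10, "September": 3.16,
}

# Contextualised comparisons turn the raw numbers into information.
highest = max(cpi_growth, key=cpi_growth.get)   # month with the highest rate
lowest = min(cpi_growth, key=cpi_growth.get)    # month with the lowest rate
sep_vs_aug = round(cpi_growth["September"] - cpi_growth["August"], 2)
sep_vs_jan = round(cpi_growth["January"] - cpi_growth["September"], 2)

print(highest)     # January
print(lowest)      # August
print(sep_vs_aug)  # 0.06
print(sep_vs_jan)  # 1.73
```

The same comparison and subtraction operations described in the text produce the same conclusions: January 2023 has the highest rate, August 2023 the lowest, September is 0.06 points above August, and September is 1.73 points below January.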
And the final step of the data-driven transformation is turning information into knowledge. From the initial raw data, after adding context and processing, we obtain meaningful information, but information simply helps us understand the data and what the data represent. Knowledge is different: if information helps us understand data, knowledge can bring value. Knowledge can take the form of measures, policies, and so on, derived through an understanding of the information, and the way information is transformed into knowledge depends on each person's knowledge, experience, observation, and thinking. Returning to the data collected above: through contextualization we understood that the figures represent the average CPI growth rate in the first 9 months of 2023 compared to the same period last year in Vietnam, and through processing I concluded that the average CPI growth rate in Vietnam in 2023 is on a downward trend, falling from 4.89% in January 2023 to 3.16% by September 2023. After applying my own knowledge, research, and thinking, the following is the knowledge transformed from that information. Put simply, the consumer price index (CPI) measures the average change over time in the prices that urban consumers pay for a basket of consumer goods and services on the market. The rate of decline in CPI growth, in turn, reflects a slowdown in the increase of the average price level of that basket over a specific period. If the CPI itself falls, it usually indicates deflation, a decline in the general price level of goods and services, and this can have various effects on the economy.
Deflation can lead to lower consumer spending as people expect prices to fall further, which can negatively impact businesses and economic growth. It can also increase the real debt burden, as the value of money increases over time. However, moderate and controlled declines in the CPI can sometimes be a natural part of a healthy economic cycle. Central banks often aim for a target inflation rate, and a slight decline in the CPI can be a sign of a balanced economy. On the other hand, extreme and prolonged deflation poses challenges and is often a concern for policymakers.
The slowdown of CPI growth in 2023 was driven by several factors. The domestic petroleum price index decreased by 11.02% compared with 2022, following world price fluctuations, which reduced the overall CPI by 0.4 percentage points; kerosene decreased by 10.02%. In addition, the gas price index decreased by 6.94% compared with 2022, reducing the overall CPI by 0.1 percentage point, and the price index of the postal and telecommunications group decreased by 0.81% compared with the previous year owing to lower prices of older-generation mobile phones.
The lower CPI growth of 2023 may nonetheless put pressure on inflation in 2024. World prices of input raw materials are high, and Vietnam imports a large share of the raw materials used in production, so rising costs put pressure on businesses' production and push up the prices of domestic consumer goods. In addition, the rising US dollar increases the cost of importing raw materials, adding pressure on the domestic price level, and the Government's recovery support programs, public investment disbursement, tourism services, and so on are expected to put further pressure on prices in the near future.
In 2024, to achieve the average CPI target of 4%-4.5%, I propose the following policies to keep CPI growth and inflation at a stable level. First, strengthen the implementation of public investment under the 2024 State budget estimate approved by the National Assembly, while controlling the efficiency and the input material prices of projects. Second, continue a loose monetary policy (the State Bank maintaining low interest rates and expanding credit in a controlled way), while strictly ensuring that credit is disbursed for the right purposes and used effectively for production and consumption, so as to secure total supply, maintain business efficiency, and raise total demand through personal consumption. Third, stabilize the foreign exchange market and the value of the currency while the Government pursues a loose monetary policy that runs against the policy trend of other countries; this requires maintaining a trade surplus in goods and increasing the attraction and disbursement of FDI capital, drawing in foreign currency through investment projects and thereby helping to stabilize the foreign exchange market. Fourth, carry out the role of regulating and stabilizing the prices of State-managed goods effectively and flexibly; ministries and branches should proactively propose plans to adjust those prices, avoiding a build-up of adjustments in the first months of 2024. Fifth, fully prepare sources of goods to ensure a timely response to people's needs, especially food and essential consumer goods and services. Sixth, strengthen communication work, ensuring timely, transparent, and complete information on prices and on the price-management work of the Government and the Price Management Steering Committee.
III. Evaluate data from a variety of sources using different methods of analysis (P2 +
P3)
1. Definition of terms in statistics
Data analysis is the process of collecting, modeling, and analyzing data using various statistical
and logical methods and techniques. Businesses rely on analytical processes and tools to extract
insights that support strategic and operational decision making (Calzon, 2023). There are many different data analysis methods, each suited to a particular type of data or to the purpose the analyst wants the data to serve; below I examine three of them in more detail: descriptive analysis, exploratory analysis, and confirmatory analysis.
The first is descriptive analysis. Descriptive statistics are brief informational coefficients that
summarize a given data set, which can be either a representation of the entire population or a sample
of a population. Descriptive statistics are broken down into measures of central tendency and
measures of variability (spread). Measures of central tendency include the mean, median, and mode,
while measures of variability include standard deviation, variance, minimum and maximum variables,
kurtosis, and skewness (Adam Hayes, 2023).
• Mean: use the mean to describe the sample with a single value representing the center of the data. Many statistical analyses use the mean as a standard measure of the center of the data distribution.
• Minimum: the minimum is the smallest data value in the sample. Use the minimum to identify a possible outlier or data entry error. One of the simplest ways to evaluate the spread of your data is to compare the minimum and maximum values.
• Maximum: the maximum is the largest data value in the sample. Use the maximum to identify possible outliers or data entry errors, and compare it with the minimum to gauge the spread of the data.
• Median: the median is another measure of the center of a data distribution. The median is usually less affected by outliers than the mean. Half the data values are greater than the median and half are less than it.
• Standard deviation: use the standard deviation to determine the spread of the data around the mean.
• Observations: the data values collected for the units of the population, which may be expressed on different measurement scales (licensesoft.vn, 2019).
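As an illustration of these measures, the sketch below computes them with Python's standard `statistics` module, reusing the nine CPI growth figures from the previous section as a small sample (any quantitative data set would do):

```python
import statistics

# The nine average CPI growth rates (%) for January-September 2023,
# used here purely to illustrate the descriptive measures defined above.
sample = [4.89, 4.60, 4.18, 3.84, 3.55, 3.29, 3.12, 3.10, 3.16]

mean = statistics.mean(sample)      # center of the data
median = statistics.median(sample)  # middle value, robust to outliers
minimum = min(sample)               # smallest observation
maximum = max(sample)               # largest observation
stdev = statistics.stdev(sample)    # sample standard deviation (spread)
observations = len(sample)          # number of observations

print(round(mean, 3))    # 3.748
print(median)            # 3.55
print(minimum, maximum)  # 3.1 4.89
print(round(stdev, 3))   # 0.674
print(observations)      # 9
```

For large survey files the same measures would normally come from a statistical package such as Stata, as used later in this report, but the definitions are identical.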
The second is exploratory analysis. As its name suggests, the primary purpose of exploratory analysis is discovery: before this stage there is no established notion of the relationships between the data and its variables. Once the data are investigated, exploratory analysis helps you find connections and generate hypotheses and solutions to specific problems. A typical application area is data mining.
Besides the two data analysis methods above, there is also confirmatory analysis. Confirmatory data analysis is the stage where you evaluate your evidence using traditional statistical tools such as significance, inference, and reliability. It involves testing hypotheses, making estimates with specific levels of precision, regression analysis, and analysis of variance. In this way, confirmatory data analysis is where you put your findings and arguments to trial (sisense.com, 2019).
2. Interpreting data from a variety of sources using different methods of analysis
The analysis below applies these methods to enterprise survey data in order to give insight into the business results of industries in Vietnam, and at the same time to provide a foundation for decision-making in determining the investment level for SSI Securities Joint Stock Company.
Variable | Mean | Median | Maximum | Minimum | Std. dev. | Valid obs.
d3a | 81.564 | 100 | 100 | -9 | 34.675 | 996
d12a | 70.131 | 100 | 100 | -9 | 39.640 | n/a
l9b | 88.608 | 100 | 100 | -9 | 26.390 | 996
d3c | 11.915 | 0 | 100 | -9 | 28.502 | 996
eah2a | 35.056 | 25 | 100 | -9 | 33.056 | 304
In the case of the first variable, "d3a" represents the percentage of national sales for each product. The mean of this variable is 81.564, meaning that on average national sales account for 81.564% of each product's revenue. The median is 100, which means that half of the products have a national sales percentage of 100% or less, and half have 100% or more. The maximum is 100, meaning the highest percentage of national sales for any product is 100%. The minimum is -9, which would mean the lowest percentage of national sales for any product is -9%; this is not possible, because negative percentages have no meaning in this context, and it indicates that there are some outliers or errors in the data. The standard deviation is 34.675, which measures how far the values lie from the mean; a high standard deviation means there is a lot of variation or dispersion in the data. In this case the standard deviation is very high relative to the mean and median, indicating that the data are not consistent or reliable. The number of valid observations is 996, which means there are 996 products with valid values for the national sales percentage; this excludes missing or invalid values, such as blanks or non-numeric entries.
In the case of the second variable, "d12a" represents the percentage of input materials and supplies of domestic origin in the last fiscal year. The mean is 70.131, meaning that on average businesses use 70.131% domestic materials in their production process. The median is 100, meaning at least half of the businesses use only domestic raw materials and supplies, while the rest also use raw materials and supplies imported from abroad. The maximum is likewise 100, meaning some businesses use only domestic raw materials and supplies. The minimum is -9, meaning some businesses reported a negative proportion of domestic raw materials and supplies, which is impossible and indicates a data entry error. The standard deviation is 39.640, which means there are large differences in the value of this variable across companies. This suggests that there is no clear pattern or trend in companies' use of domestic raw materials and supplies, and that they may have different strategies or preferences regarding their sourcing decisions.
In the case of the third variable, "l9b" represents the percentage of full-time workers who have completed high school. The mean of "l9b" is 88.608, meaning that on average 88.608% of a firm's full-time workers have completed high school. The median of "l9b" is 100, meaning the middle value of the variable is 100%; this indicates a skewed distribution, with more values at the higher end. The maximum of "l9b" is also 100, meaning that in some firms every full-time worker has completed high school. The minimum of "l9b" is -9, meaning the lowest value of the variable is -9%; this is an invalid value, because a negative completion percentage is meaningless, and it suggests some error in the data collection or aggregation process. The standard deviation of "l9b" is 26.390, which means there is large variation or dispersion of values around the mean, again pointing to some extreme values or outliers in the data. The valid observations of "l9b" number 996, which means there are 996 firms with valid values for this variable.
In the case of the fourth variable, "d3c" represents the percentage of direct export revenue in a sample of 996 enterprises. The descriptive statistics of this variable show a mean of 11.915, meaning that on average the firms in the data directly export about 12% of their revenue. However, the median is 0, indicating that at least half of the firms do not export directly. This shows that the distribution of this variable is highly skewed, with a small number of companies having very high direct export percentages and many companies having zero (or invalid negative) values. The maximum is 100, meaning some firms directly export all of their revenue, while the minimum is -9, which would mean negative export revenue; since that is impossible, it must be due to errors in data collection or coding. The standard deviation of the variable is 28.502, reflecting the high variability and dispersion of the data. A large standard deviation implies that the data points lie far from the mean, suggesting a low degree of similarity between firms in terms of their direct export revenues.
In the case of the last variable, "eah2a" measures the percentage of revenue from new or significantly improved products or services in a sample of 304 companies. The descriptive statistics of this variable show a mean of 35.056, meaning that on average the companies in the sample derive 35.056% of their revenue from new products or services. The median is 25, meaning half of the companies earn less than 25% of their revenue from new products or services and half earn more. The maximum is 100, which means some companies get all their revenue from new products or services. The minimum is -9, an invalid value that again points to coding errors rather than genuinely negative revenue from new products or services. The standard deviation is 33.056, which means there are large differences in the proportion of revenue from major innovations across companies in the sample. This suggests that some companies have been much more successful than others in generating revenue from their key innovations.
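Since every variable above shows an impossible minimum of -9, which looks like a survey code for a missing or refused answer, any careful descriptive analysis should filter such sentinel values out before computing statistics. A minimal sketch with hypothetical responses (the values and the -9 convention here are assumptions for illustration, not the original survey file):

```python
import statistics

# Hypothetical survey responses for one percentage variable, where -9 is
# assumed to be the code used for a missing or refused answer.
raw = [100, 80, -9, 100, 55, -9, 100, 25, 70]

MISSING = -9
valid = [v for v in raw if v != MISSING]  # drop sentinel codes before analysis

print(len(valid))                        # 7 valid observations
print(round(statistics.mean(raw), 2))    # 56.89 - distorted by the -9 codes
print(round(statistics.mean(valid), 2))  # 75.71 - mean of the real answers
print(min(raw), min(valid))              # -9 25
```

Leaving the -9 codes in pulls the mean down and produces the meaningless negative minimum seen in the tables above, which is one reason the reported standard deviations look so large.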
From the descriptive analysis above, the next step is more advanced analysis: exploratory analysis. Having produced descriptive statistics for 5 variables selected at random from the data file, we can see the basic information of each variable summarized clearly and understandably. In the course of that descriptive analysis, applying my knowledge and experience, I noticed that the variable "d3a" (the percentage of national sales) and the variable "d3c" (the percentage of direct export sales) appear to be correlated. For "d3a" the mean is 81.564, the median 100, the maximum 100, the minimum -9, the standard deviation 34.675, and the number of valid observations 996. For "d3c" the mean is 11.915, the median 0, the maximum 100, the minimum -9, the standard deviation 28.502, and the number of valid observations 996. As I mentioned, I noticed a correlation between these two variables, but whether this is true cannot yet be confirmed, because the impression is based only on my own subjective judgment and has not been proven with specific evidence. So, using exploratory analysis with a scatter plot, with the goal of exploring the relationship between "d3a" and "d3c", we can see whether there is a correlation between the two variables. But first, for this to work, the two variables "d3a" and "d3c" must both be quantitative. They are: each represents a percentage (of national sales and of direct export sales, respectively), these values are fully measurable, and it is therefore possible to compute the mean, median, and so on for each of them. Since the condition is satisfied, I next examine the relationship between the two variables using a scatter plot.
Above is the scatter plot indicating the correlation between the two variables "d3a" and "d3c". Based on the chart, the relationship between the two variables is linear, negative, and quite strong. This shows that when the percentage of direct export revenue increases, the percentage of national revenue decreases, and vice versa. There are also some exceptions hidden in the data: according to the chart, some points lie in very low positions. These exceptions may reflect special characteristics that affect sales performance, such as market size, product quality, or customer preferences. Besides, the correlation coefficient of -0.944 also confirms a strong negative relationship between these two variables. This shows that, through exploratory analysis, we can discover the relationship between two variables and make the data more useful than ever.
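Before trusting a visual impression, the correlation coefficient itself can be computed directly. The sketch below implements the Pearson formula in plain Python on a small invented pair of samples (the values are hypothetical stand-ins; the real "d3a"/"d3c" survey columns are not reproduced here), showing how a strong negative linear relationship yields a coefficient close to -1:

```python
import math

# Hypothetical paired observations standing in for "d3a" (national sales %)
# and "d3c" (direct export %); invented purely for illustration.
d3a = [100, 100, 90, 70, 50, 30, 10, 0]
d3c = [0, 0, 10, 25, 50, 70, 85, 100]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(d3a, d3c)
print(round(r, 3))  # -0.998, a strong negative linear relationship
```

A coefficient near -1, like the -0.944 reported for the real data, is what gives the downward-sloping pattern seen in the scatter plot.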
In the process of performing exploratory data analysis and through scatter plots, we can
discover the correlation between two variables, specifically the variables “d3a” and “d3c”. This serves
as the foundation for us to proceed to confirmatory analysis. The purpose of confirmatory analysis is
to affirm the relationship between two variables with certainty. In exploratory analysis, as the name
suggests, we only discover the relationship between two variables, but it may be somewhat
ambiguous as it is only represented through the visual display of the chart. In contrast, with the
confirmatory analysis method, we can assert the clear relationship between the two variables by
demonstrating it through equations. Therefore, the main objective of this section is to find the
regression equation between the two variables and confirm the relationship between them.
The correlation coefficient of -0.944 between the two variables indicates a strong negative linear relationship: as one variable increases, the other decreases proportionally. From the correlation coefficient together with the summary statistics above, we can derive the simple regression line of "d3a" on "d3c". (Note that the slope of the regression line equals the correlation coefficient only when the two standard deviations are equal; in general the slope is b = r × (s_y / s_x).) Here the slope is b = -0.944 × (34.675 / 28.502) ≈ -1.148, and the intercept is a = mean(d3a) - b × mean(d3c) = 81.564 + 1.148 × 11.915 ≈ 95.25, giving:
d3a ≈ -1.148 × d3c + 95.25
Finding the regression equation in this way further confirms the negative correlation between the two variables. Thus, in the case where other factors remain constant, if the percentage of direct exports (d3c) increases by 1 percentage point, the percentage of national sales (d3a) decreases by about 1.15 percentage points. Additionally, the intercept of the equation (about 95.25) represents the value of "d3a" when "d3c" is 0, indicating that in the absence of direct exports, national sales would be about 95.25%. The equation can be used to predict the value of "d3a" for any given value of "d3c", or vice versa, by substituting the known variable and solving for the unknown one.
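The derivation from summary statistics can be written out as a short script. A sketch using only the figures reported earlier in this report (r, the means, and the standard deviations of the two variables; no raw data are used):

```python
# Deriving the simple regression line of d3a on d3c from the summary
# statistics reported above: b = r * (s_y / s_x), a = mean_y - b * mean_x.
r = -0.944                      # correlation between d3a and d3c
mean_d3a, sd_d3a = 81.564, 34.675
mean_d3c, sd_d3c = 11.915, 28.502

slope = r * sd_d3a / sd_d3c               # change in d3a per 1-point rise in d3c
intercept = mean_d3a - slope * mean_d3c   # predicted d3a when d3c = 0

print(round(slope, 3))      # -1.148
print(round(intercept, 2))  # 95.25

# Prediction for a firm exporting half of its revenue directly (d3c = 50).
predicted_d3a = slope * 50 + intercept
print(round(predicted_d3a, 1))
```

This is only the line implied by the reported summary statistics; fitting least squares on the raw survey data (with the -9 codes removed) could give somewhat different coefficients.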
3. Evaluate qualitative data
The variable "d30b" assesses the perceived level of difficulty that respondents face in accessing
transport services. It is a qualitative variable with seven possible categories: "Don't know," "Does not
apply," "No obstacle," "Minor obstacle," "Moderate obstacle," "Major obstacle," and "Very severe
obstacle." Among these, "Don't know" appears 18 times (1.81%); "Does not apply" appears 3 times
(0.30%); "No obstacle" appears 444 times (44.58%); "Minor obstacle" appears 287 times (28.82%);
"Moderate obstacle" appears 162 times (16.27%); "Major obstacle" appears 61 times (6.12%), and
"Very severe obstacle" appears 21 times (2.11%). Analyzing this data will provide a clearer
understanding of the results of the qualitative variable, identifying which results are predominant and
which ones have the smallest proportions. In this case, with the variable "d30b," the frequency
distribution shows that the majority of respondents (44.58%) indicate no obstacles in accessing
transportation, followed by 28.82% reporting minor obstacles. The percentage of respondents
reporting obstacles at a moderate, major, or very severe level is 16.27%, 6.12%, and 2.11%,
respectively. Only 0.30% of respondents state that the question does not apply to them, and 1.81%
claim not to know the answer. The variable "d30b" can be utilized to analyze the accessibility and
affordability of transport services for different respondent groups, as well as the potential impact of
transportation barriers on their livelihoods and well-being.
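A minimal sketch (in Python rather than Stata) of how such a frequency table can be reproduced from the counts reported above; the raw survey file itself is not reproduced here.

```python
from collections import Counter

# Frequency table for the qualitative variable "d30b",
# using the response counts reported in the text.
counts = Counter({
    "Don't know": 18, "Does not apply": 3, "No obstacle": 444,
    "Minor obstacle": 287, "Moderate obstacle": 162,
    "Major obstacle": 61, "Very severe obstacle": 21,
})

total = sum(counts.values())  # 996 responses in all
for category, n in counts.most_common():
    # Count and percentage of total, most frequent category first
    print(f"{category:22s} {n:4d}  {100 * n / total:5.2f}%")
```

Running this reproduces the percentages quoted above, e.g. 444/996 = 44.58% for "No obstacle".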
IV. Critically evaluate the methods of analysis used to present business and
economic data/information from a range of different published sources (M1)
Descriptive analytics is a data analysis method that summarizes and presents key features of a
data set without making any inferences or predictions (studyonline.unsw.edu.au, 2020). It is often
used as a preliminary step in exploratory data analysis or as a way to communicate basic
characteristics of a data set to a lay audience. The descriptive analysis method brings certain benefits
to users. First of all, descriptive statistics is a simple method of summarizing data (Yihang Dong, 2023). We only need basic statistical tools and techniques to understand the main features of the data set under study. By using manual formulas for small data files, or statistical software such as Stata for very large ones, we can obtain the mean, median, mode, standard deviation, and so on of any quantitative data file. As analyzed above, I chose 5 random variables to analyze very quickly and easily. All of these 5 variables have a very large number of observations, yet through descriptive statistics it took me less than a minute to obtain the mean, median, minimum, maximum, and other measures. Thanks to descriptive statistics, data analysis becomes faster and easier than ever. Next, descriptive analytics can also provide a clear overview of the data, which can facilitate data visualization and reporting (Hussain, 2015). These insights can then be used to create effective data visualizations and reports that communicate the key findings and meaning of the data to diverse audiences. For example, applying the descriptive analysis above to the variable "d3a", we can see that on average each establishment's products account for 81.564% of total national sales, while half of the observations have a national-sales percentage equal to or less than 100% and half have percentages equal to or greater than 100%. Understanding this data happens quickly and conveniently, and it is also the foundation for deeper analysis in real cases.
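The summary measures named above (mean, median, mode, standard deviation, min, max) can be sketched with Python's standard library; the sample values here are made up for illustration, not the survey data.

```python
import statistics

# Illustrative sample only -- the real Stata data set is not reproduced here.
sample = [100, 100, 95, 80, 60, 100, 75, 90, 100, 50]

print("mean   :", statistics.mean(sample))            # arithmetic average
print("median :", statistics.median(sample))          # middle value
print("mode   :", statistics.mode(sample))            # most frequent value
print("stdev  :", round(statistics.stdev(sample), 3)) # sample standard deviation
print("min/max:", min(sample), max(sample))           # range endpoints
```

The same one-line calls scale to thousands of observations, which is why these summaries take well under a minute even on large files.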
Besides its advantages, the descriptive analysis method also has certain limitations. The first is that descriptive analysis does not provide any causal explanations or predictions for observed phenomena, which can limit its usefulness in answering complex research questions or solving real-world problems (Hayes.L, 1997). For example, descriptive analysis can help us notice a correlation between two variables, but this rests on the subjective judgment of the analyst; descriptive analysis does not tell us how the variables are related. Next, descriptive analytics may not capture the underlying
relationships, interactions, or dependencies between variables, which can lead to oversimplification
or misinterpretation of the data (Tennant, 2019). For example, descriptive analysis may ignore the
impact of confounders or moderating variables that influence the outcome of interest. Overall,
descriptive analytics is a useful data analysis method that can provide a quick and easy overview of
data. However, it has some limitations that should be considered when interpreting and applying the
results. Descriptive analytics needs to be supplemented with other data analysis methods that can
provide more insights and value from the data.
Exploratory analysis methods are a set of techniques that aim to summarize, visualize, and
explore patterns in data without making any formal assumptions or hypotheses (J. Behlings, 2012).
This exploratory analysis method has certain advantages. First, exploratory analysis can help us better
understand the data, its structure, distribution, outliers, and discover relationships and underlying
data problems (Komorowski, 2016). Using descriptive statistics, histograms, box plots, or scatterplots
can help summarize key features and variations in the data. These methods can also help identify any
errors, missing values, or anomalies in the data that may affect the quality and validity of the analysis.
Next, exploratory analysis can help generate new ideas, hypotheses, or research questions that can
be tested later using more formal methods (A. Gelman, 2004). For example, using correlation analysis,
clustering, or dimensionality reduction can help identify patterns or groups in the data that may
suggest interesting relationships or factors. These methods can also help explore data from different
angles or perspectives, and reveal hidden or unexpected insights that may not be obvious from
just looking at the data.
Besides the advantages that this method can bring, exploratory analysis still has certain
limitations. First, exploratory analysis can be subjective and biased, depending on the choices and
preferences of the analyst (Russo, 2015; Schweinsberg, 2021). Different analysts can draw different
conclusions from the same data using different exploratory methods. For example, using different
scales, colors, or labels can affect how the audience perceives and interprets the data. These methods
may also reflect the analyst's prior knowledge, beliefs, or expectations, which may not be shared or
agreed upon by others. Next, they can be misleading or inaccurate, especially if the data are noisy,
complex, or contain hidden confounders (A. Gelman, 2004). Exploratory methods may not account
for uncertainty, randomness, or causality in the data. For example, using averages, percentages, or
simple correlations can ignore variation, distribution, or the influence of other variables in the data.
These methods can also overfit or underfit the data, meaning they can capture noise or miss important
signals in the data. In addition, exploratory methods may not provide sufficient evidence or
justification for claims or inferences made from the data. They may be incomplete or inconclusive, requiring further confirmation or validation by more formal methods. Exploratory methods may
not be able to answer specific research questions or hypotheses of interest (Lindsey, 1999). For
example, using linear regression, logistic regression, or decision trees may provide simple and
straightforward models for the data, but they may not capture the complexity, non-linearity, or interaction effects of the underlying phenomena. These methods may also lack robustness or
generality, meaning they may not work well for new or different data sets.
Confirmatory analysis methods are statistical techniques for testing whether a predetermined
model or hypothesis fits observed data (Fox, 2010). Confirmatory analysis methods include
confirmatory factor analysis (CFA), structural equation modeling (SEM), and hypothesis testing. The
confirmatory analysis method has certain advantages. First, they allow researchers to test their
theoretical assumptions and expectations against empirical evidence (Reuterberg, 1992). Or on the
other hand, confirmatory analysis can help confirm the relationship between two variables based on supporting evidence. For example, a researcher might use
confirmatory analysis to test whether a personality test measures the five personality factors as
expected. This can help the researcher confirm the validity of the test and its agreement with theory.
Additionally, researchers can use fit indices and factor loadings to evaluate how well the model
represents the data and how well each item relates to its corresponding factor. Next, confirmatory
analysis provides a rigorous and objective way to assess the validity and reliability of a measurement
tool or cause-and-effect relationship (Wolff, 1983). For example, we can use confirmatory analysis to
test whether a business's training program improves employee performance and how it affects
their motivation and satisfaction. This can help the researcher evaluate the effectiveness of
the program and its impact on various aspects of employee well-being. Furthermore, researchers can
use path coefficients and errors to estimate the magnitude and direction of causal effects and
measurement error.
Besides the advantages that the confirmatory analysis method brings, this method also has
certain limitations. First of all, carrying out confirmatory analysis requires a solid theoretical foundation and a clear specification (James, 1982). For example, if we want to confirm the relationship between two variables, we must first use our knowledge to judge whether those variables are plausibly related; if we make a wrong determination, the results will also be wrong. Likewise, a researcher needs to understand the literature and
theory behind the construct or relationship they want to measure or test. This can limit the
researcher's creativity and flexibility and restrict them from following a predetermined plan.
Additionally, if the researcher makes incorrect or inappropriate assumptions or specifications, they
may present a model or hypothesis that is invalid or misleading. Furthermore, when confirmatory analyses are performed, they can be influenced by researcher bias or confirmation bias if the model
or hypothesis is not tested against alternative explanations. For example, a researcher may ignore or
reject evidence that conflicts with his or her expectations or preferences. This could compromise the
objectivity and validity of the results and lead to false or misleading conclusions. Additionally, if there
are multiple plausible models or hypotheses that fit the data equally well, it may be difficult to choose
among them based on empirical criteria alone.
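To illustrate the hypothesis-testing idea at the core of confirmatory analysis, here is a hedged sketch of a one-sided large-sample z-test on made-up training-program scores; it uses a normal approximation from the standard library rather than a full t-test, and every number is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# H0: the program did not raise mean performance above the historical mean 70.
# Made-up post-training scores for 16 employees:
scores = [72, 75, 68, 80, 77, 74, 71, 79, 76, 73, 78, 75, 70, 74, 77, 76]

m, s, n = mean(scores), stdev(scores), len(scores)
z = (m - 70) / (s / sqrt(n))          # standardized test statistic
p_value = 1 - NormalDist().cdf(z)     # one-sided p, normal approximation

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject H0 at 5%" if p_value < 0.05 else "fail to reject H0")
```

The point is the workflow, not the numbers: a hypothesis is fixed in advance, a test statistic is computed, and the evidence either supports or fails to support rejecting it.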
V. Application of analytical methods in different fields (M2)
Amid the rapid evolution of various industries, the adoption of data analytics methods has
emerged as a key driver in informing decision-making processes. The complex and dynamic nature of
sectors such as healthcare, banking and manufacturing require a deep understanding of data to derive
valuable insights. This exploration delves into the multifaceted use of data analytics methods,
including descriptive analytics, exploratory analytics, and confirmatory analytics, across various
industries. By taking a close look at how these analytics methods can be harnessed in healthcare,
banking and manufacturing, we aim to shed light on their impact in improving operational efficiency,
planning overall strategy and decision-making in each area.
In the healthcare industry, decisions often have life-changing consequences for both patients
and the general population. The ability to quickly collect and analyze complete, accurate data allows
decision-makers to make choices regarding treatment or surgery, predict the path of large-scale
health events, and plan for the long term. Whether you are a medical doctor working directly with
patients or a healthcare administrator dealing with the business side of the industry, applying data
analytics methods such as descriptive, exploratory, or confirmatory analytics can provide the foundation for sound, impactful decision making. We start with descriptive analysis, which aims to
summarize and present data in a meaningful way. In the healthcare industry, descriptive analytics can
answer questions such as: What are the patient characteristics? How many patients are in each
category? What are the most frequent or common conditions or treatments? How do patients
compare across different groups or regions? (Catherine, 2021). For example, doctors use descriptive
analytics to understand data about patients with diabetes. A table analyzing data from descriptive
statistics can show the number and proportion of patients with different types of diabetes in a
hospital. This can help determine the prevalence and distribution of diabetes in patient populations
and compare it with national or global statistics. A table can also display descriptive statistics of the
data, such as mean, median, mode, standard deviation, range, and quartiles. Besides, in the medical
industry, descriptive analysis is also applied to hospital admission trends of patients. Descriptive
analytics can help track changes in the supply and demand of healthcare services and evaluate the
impact of policies or interventions. Descriptive analysis also helps us form initial ideas about the correlation or relationship between two or more variables, so that from descriptive analysis doctors can make preliminary, subjective predictions about the relationship between hospital admissions and seasonality or weather conditions. This is also the foundation on which exploratory and confirmatory analysis can later verify these predictions clearly and accurately. Next, in
the medical industry, descriptive statistics can also be used to analyze patient satisfaction. Applying
descriptive analysis can show what percentage of customers are completely satisfied with the quality
of service at the hospital, and what percentage of customers are dissatisfied. This can help measure
quality of care and identify areas for improvement or best practice. In particular, in the medical
industry it can be used to determine the extent of virus spread. One of the main uses of descriptive
analytics in determining the spread of a virus is to create dashboards that display key metrics and
indicators related to the outbreak. These dashboards can display information such as the number of
confirmed cases, deaths, recoveries, testing, vaccinations and hospitalizations over time and across
different regions. They also show the distribution of cases by age, gender, ethnicity, and other
demographic factors. Dashboards can help stakeholders such as health authorities, policymakers,
researchers and the public track the progress and impact of the virus outbreak and compare the
situation between different countries or regions (World Health Organization,
2020). Specifically, the Centers for Disease Control and Prevention (CDC) has leveraged descriptive
analytics to predict the next flu outbreak. CDC has been collecting data on reported flu cases for more
than a decade. Through a process of descriptive statistics, CDC uses that data to help predict the
severity of future flu seasons. Similar to the influenza example is the coronavirus (COVID-19)
pandemic. Descriptive analytics is fundamental to helping predict future spikes in cases that can help
hospitals ensure staff have enough personal protective equipment and patient beds. It can also enable
school administrators to decide if in-person learning is a safe option and for individuals to make the
right choices about personal safety and hygiene, social distancing, and travel (Catherine,
2021). Besides the descriptive analysis method, the exploratory analysis method also brings many
positive effects to the healthcare industry. It involves finding patterns, relationships, exceptions,
anomalies, or hidden factors in data that are not obvious from descriptive analysis. Exploratory
analysis can help us generate hypotheses, identify potential causes or effects, discover new insights,
or find opportunities for improvement. While descriptive analytics reports the current status of patient health, patient populations, and treatment or service outcomes, exploratory analysis probes why those results occur. For example, suppose a pharmaceutical company wants to develop a new drug to treat
diabetes. It performs exploratory analysis on clinical trial data to understand the effects of drugs on
different types of patients. Next, it uses descriptive statistics to summarize the baseline characteristics and outcomes of patients who received the drug or placebo, with visualizations showing changes in blood sugar over time for different patient groups. In addition, it uses correlation analysis to examine the relationship between the outcomes and potential predictors, such as age, gender, weight, and diet. As a result, it can evaluate the effectiveness and safety of the drug, and determine the
optimal dosage and target population for the trial. Miller (2017) explained that Blue Cross Blue Shield
has been analyzing pharmacy and insurance data relevant to opioid abuse and overdose for many years. Through exploratory analysis, Blue Cross Blue Shield was able to effectively identify
nearly 750 risk factors that can predict whether someone is at risk for opioid abuse (comptia.org,
2018). Confirmatory analysis is the highest level of data analysis. It involves testing hypotheses,
verifying assumptions, validating results, or evaluating using statistical techniques such as hypothesis
testing, confidence intervals, correlation, regression, ANOVA, etc. Confirmatory analysis can help us
confirm or reject our hypotheses, quantify our uncertainty, measure our effect size, or assess our
significance. The healthcare industry is one of the most data-intensive sectors in the world, generating
huge amounts of data from a variety of sources such as electronic health records, medical claims,
clinical trials, patient surveys, and wearables. These data can provide valuable insights to improve
healthcare quality, efficiency and outcomes. However, to draw valid and reliable conclusions from
these data, it is essential to use appropriate confirmatory data analysis methods that can account for
the complexity and uncertainty of the data. One of the most popular confirmatory data analysis methods in the healthcare industry is regression analysis. Regression analysis is a
statistical technique that examines the relationship between one or more independent variables
(predictors) and a dependent variable (outcome). For example, regression analysis can be used to
investigate how patient characteristics (such as age, gender, and comorbidities) influence length of
hospital stay or risk of readmission. Regression analysis can also be used to evaluate the effectiveness
of an intervention (such as a drug or procedure) by comparing the results of a treatment group and a
control group. A recent study by Christoph D Spinner et al (2020) used regression analysis to compare
the mortality of COVID-19 patients receiving remdesivir versus those receiving standard care.
Remdesivir is an antiviral drug approved by the FDA for the treatment of COVID-19 in October 2020.
This study was a retrospective cohort study using data from 2,348 hospitalized COVID-19 patients treated with remdesivir or standard care from May to October 2020. The study used Cox proportional hazards regression to estimate the hazard ratio (HR) of death for remdesivir versus standard of care, adjusting for potential confounders such as age, sex, race, comorbidities,
oxygen requirements, and hospital characteristics. Results showed that remdesivir was associated
with a lower risk of death in hospitalized COVID-19 patients (HR = 0.74; 95% CI = 0.59-0.93; p = 0.01),
suggesting that remdesivir may have a beneficial effect on survival. In summary, applying data analytics
methods, including descriptive analytics, exploratory analytics, and confirmatory analytics, has proven
to be instrumental in advancing the healthcare industry. These analytical methods have enabled
healthcare professionals and organizations to draw meaningful insights from massive data sets,
helping them make more informed decisions, improve patient outcomes, and increase operational efficiency. Integrating these data analytics methods not only transforms the way healthcare data is interpreted, it also paves the way for innovative research, personalized medicine, and evidence-based practice, ultimately contributing to the overall improvement of health care delivery and patient care. As technology continues to evolve, the synergy between data analytics and healthcare approaches will certainly play a pivotal role in shaping the future landscape of the industry, driving a data-driven approach to maximizing the potential for positive health outcomes.
Besides the application of statistical data analysis methods that bring many benefits to the
healthcare industry, the same goes for the banking industry. The advent and application of data
analytics has helped the banking industry optimize its processes and streamline its operations,
thereby improving efficiency and competitiveness. Many banks are working to improve their data
analytics, mainly to give them a competitive edge or to predict emerging trends that could affect their
business. Starting with descriptive analytics, banking businesses can use descriptive analytics to
understand customer segments and their preferences by analyzing demographic, behavioral, and
transaction data. Banks use descriptive analytics to segment customers based on their age,
income, location, spending habits, and loyalty. Banks can then offer different products and services to
each segment, such as savings accounts, credit cards, loans, insurance or investment plans. Banks can
also use descriptive analytics to track customer satisfaction and feedback, and adjust their services
accordingly. This can help banks tailor their products and services to meet customer needs and
expectations, as well as identify cross-selling and up-selling opportunities. Besides, a more notable
point is that descriptive analysis is also applied in the banking industry to manage risk and compliance, by analyzing credit scores, default rates, loan-to-value ratios, capital adequacy ratios,
liquidity ratios and regulatory reporting. More specifically, descriptive analytics can help identify the
source and type of risks affecting the banking industry such as credit risk, market risk, operational risk,
liquidity risk, etc. Analyzing historical data on overdue debts, interest rates, market fluctuations, fraud,
cyber attacks, etc., banking businesses can understand the nature and severity of these risks and how
they impact their performance and profitability. For example, descriptive data analytics can help
banking businesses determine a borrower's probability of default (PD) and loss given default (LGD),
based on credit history, income, collateral, etc. The use of descriptive techniques and tools, such as
dashboards, charts, graphs, tables, etc. helps banking businesses to visualize and communicate the
data results as well as their insights to stakeholders and decision makers, and facilitate identification
of gaps, challenges, trends, best practices, etc. Besides descriptive analysis, exploratory and confirmatory analysis are also widely applied in the banking industry. In the
banking industry, exploratory and confirmatory analytics can be applied to many different areas such
as customer segmentation, fraud detection, risk management, and marketing. Banking businesses use
exploratory and confirmatory analytics to identify the characteristics and preferences of different
customer segments, such as age, income, spending behavior, loyalty and churn rate. Exploratory
analysis may reveal that younger customers are more likely to use mobile banking and digital wallets,
while older customers prefer to visit branches and transact cash. Based on these insights, banks can
design different features and offers for each segment, such as mobile alerts, rewards programs, and
personalized offers. Furthermore, exploratory and confirmatory analytics can help track changes and
trends in customer behavior over time, such as the adoption of new technology, the impact of
economic conditions and competitor influence. This can help predict and respond to customer needs
and preferences in a timely and proactive manner, and can help tailor products and services to meet
the individual needs and expectations of each segment. Next, exploratory and confirmatory analytics are also
used in the banking industry to detect and prevent fraudulent transactions and activities, such as
identity theft, money laundering, phishing, and network hacking. Businesses in the banking industry are exposed to global financial crime, from money laundering and market misconduct to sanctions violations, terrorist financing, bribery, and corruption, which costs about 1.3 trillion USD per year according to the 2018 Refinitiv survey (signzy.com, 2022). The application of exploratory analysis along with the
application of new technologies and techniques can help the banking industry face this problem more
easily. Applying exploratory analytics can identify unusual patterns and outliers in transaction data,
such as high-frequency transactions, high volumes, or overseas locations. These may indicate possible
fraud attempts that require further investigation and verification. Additionally, exploratory and
confirmatory analytics can help build predictive models that can flag suspicious transactions and
activities in real time, based on historical data and machine learning algorithms. This can help prevent
fraud before it happens or minimize its impact. This can help protect banks and customers from
financial loss and reputational damage as well as comply with regulatory requirements. Exploratory
analytics is also applied in the banking industry to assess and manage risk levels and performance of
bank portfolios such as credit risk, market risk, liquidity risk and operational risks. Banks using
exploratory and confirmatory analytics can measure the correlation and covariance of different assets
and liabilities in a portfolio, such as loans, deposits, securities, and derivatives. This can help diversify the portfolio and reduce overall risk levels. Furthermore, exploratory
analysis can help simulate different scenarios and stress tests that could affect portfolio performance,
such as changes in interest rates, exchange rates, inflation rates, or market conditions. This can help
evaluate a portfolio's sensitivity and resilience under different circumstances.
Besides industries that carry a lot of data, the application of data analysis methods is also very
effective in the aviation industry. The airline industry is notoriously capital intensive. Aircraft
acquisitions are expensive, fleets require heavy maintenance, and everything about airline operations
- from fuel to airport parking to passengers to staff - incurs significant cost. Obviously, any minute
lost while an aircraft is out for maintenance prevents it from becoming a revenue-generating asset.
Airline players face immense pressure from aircraft maintenance, which involves maintaining
dispatch reliability, reducing maintenance costs and enhancing aircraft safety. MRO alone accounted
for 9% of total operating costs for airline operators globally in 2018. As airlines continuously struggle
with rising costs, reducing MRO-related costs becomes important. This is why experts are talking
about analyzing data through different methods to be able to know exactly what is happening, why it
is happening, and the possible impact of any event on their business (Saravanan Rajarajan,
2021). Aviation businesses use descriptive analytics to help visualize and summarize flight data, such
as flight time, distance, speed, altitude, fuel consumption, and emissions. This can help evaluate the
efficiency and environmental impact of flight operations as well as identify potential areas for
improvement. Vedant Singh & Somesh Kumar Sharma (2014) performed a descriptive analysis of flight
data from a major airline showing that a 2% reduction in airspeed can save up to 6% fuel and reduce
CO2 emissions by 18% every year. Descriptive analysis also shows that optimal flight speeds
vary depending on aircraft type, route, weather conditions and traffic congestion. Therefore, the
airline has implemented a dynamic speed management system, which adjusts flight speed according
to these factors. Descriptive analytics is also used to understand customer characteristics, such as
their demographics, preferences, and behavior. For example, American Airlines uses descriptive
analytics to segment their customers into different groups based on their loyalty status, travel
frequency, destination, and spending habits. This can help American Airlines tailor their marketing
campaigns, incentives, and rewards to each customer segment. An example of a descriptive analytics
project that American Airlines undertook was to analyze the travel patterns of their AAdvantage
members and determine the most popular routes, destinations, and seasons for each loyalty level.
Besides, combining data analysis methods can help airlines monitor and report the status and condition of aircraft components such as engines, landing gear, avionics systems, and cabins.
A descriptive analysis of maintenance data from a regional airline shows that changing air filters every
500 hours instead of 1000 hours can prevent engine failures and save up to $1 million each year.
Descriptive analysis also shows that air filters are more likely to become clogged in certain areas due
to dust and pollution. Therefore, the company has increased the frequency of air filter replacements
in those areas (Lee K., 2021). Besides, these data analysis methods are also applied in the aviation
industry to analyze and display safety data, such as accident reports, incident reports, near-miss situations, hazards, and risks. The exploratory analysis method also helps aviation businesses analyze
opportunities or problems. American Airlines used exploratory analysis to find out what factors
influence customer satisfaction, loyalty, and retention. This can help American Airlines improve
service quality, customer feedback, and loyalty programs. An example of an exploratory analytics
project that American Airlines worked on was using natural language processing (NLP) to analyze
customer sentiment and review topics across social media and online platforms. This helps American
Airlines understand what customers like or don't like about their experience and how they can
improve it.
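As a deliberately minimal illustration of the sentiment-scoring idea: real NLP pipelines such as the one described above are far richer, and the word lists and function below are my own assumptions, not American Airlines' system.

```python
# Toy keyword-based sentiment scorer: count positive vs negative words.
POSITIVE = {"great", "friendly", "comfortable", "on-time", "helpful"}
NEGATIVE = {"delayed", "lost", "rude", "cramped", "cancelled"}

def sentiment(review: str) -> str:
    """Label a review by the balance of positive and negative keywords."""
    words = set(review.lower().replace(",", " ").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Great crew, friendly and helpful"))    # positive
print(sentiment("Flight delayed and my bag was lost"))  # negative
```

Production systems replace the keyword sets with trained language models, but the output is the same kind of per-review label that can then be aggregated descriptively by route, season, or loyalty tier.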
In summary, this section has examined how these analytical methods are applied across three distinct industries: healthcare, banking and aviation. By shedding light on how these methods
are used in real-life situations in these fields, this assignment will bridge the gap between theoretical
knowledge and practical implementation. This not only reinforces the importance of analytical
methods in decision-making but also highlights their versatility across different industries. In short,
with my assignment above, I have met the requirements of the given topic, and I believe it merits a Merit grade.
VII. Conclusion
Above is the main content of my report. The first part is evaluating the nature and process of
business and economic data/information from a range of different published sources including
providing information about data, information, and knowledge, and describing the conversion process between them based on actual data. The next section, evaluating data from a variety of sources using different methods of analysis, covers the application of analytical methods such as descriptive analytics, exploratory analytics, and confirmatory analytics, and describes how they are applied to real data. The next section critically evaluates the methods of analysis used to present
business and economic data/information from a range of different published sources. And the last
part is application of analytical methods in different fields.
In essence, this assignment serves as a comprehensive guide, equipping readers with the knowledge and skills needed to navigate the complexities of economic and business data, understand different analytical methods, and appreciate their application in important industries. The journey from evaluating data to weighing the strengths and weaknesses of each analytical method culminates in a practical exploration of their use in areas important to our daily lives.
VIII. References
internetofwater.org (2018) What are data, information, and knowledge?, Internet of Water - Better Water Data for Better Water Management, [online] Available at: https://internetofwater.org/valuing-data/what-are-data-information-and-knowledge/ (Accessed March 4, 2024).
subscription.packtpub.com (2016) Practical Data Analysis - Second Edition, [online] Available at: https://subscription.packtpub.com/book/data/9781785289712/1/ch01lvl1sec11/data-information-and-knowledge (Accessed March 4, 2024).
Calzon, B. (2023) What is data analysis? Methods, techniques, types & how-to, BI Blog | Data Visualization & Analytics Blog | datapine, [online] Available at: https://www.datapine.com/blog/data-analysis-methods-and-techniques/.
Hayes, A. (2023) Descriptive Statistics: Definition, Overview, Types, Example, Investopedia, [online] Available at: https://www.investopedia.com/terms/d/descriptive_statistics.asp (Accessed March 4, 2024).
licensesoft.vn (n.d.) Overview of calculating statistical indicators for rows and columns [Tổng quan về tính toán các chỉ số thống kê đối với Hàng và Cột], [online] Available at: https://licensesoft.vn/tong-quan-ve-tinh-toan-cac-chi-so-thong-ke-doi-voi-hang-va-cot.htm (Accessed March 4, 2024).
Catherine (2021) 3 Applications of Data Analytics in Healthcare, Business Insights Blog, [online]
Available at: https://online.hbs.edu/blog/post/data-analytics-in-healthcare (Accessed March 4,
2024).
comptia.org (2018) How Is Data Analytics Used in Health Care | CompTIA, CompTIA, [online] Available
at: https://www.comptia.org/content/articles/how-is-data-analytics-used-in-health-care (Accessed
March 4, 2024).
Rajarajan, S. (2021) How can data analytics help the aviation industry?, [online] Available at: https://www.ramco.com/blog/aviation/how-can-data-help-aviation-industry.
Dong, Y. (2023) Descriptive Statistics and Its Applications, Highlights in Science, Engineering and Technology, [online] Available at: https://drpress.org/ojs/index.php/HSET/article/view/8159.
Hayes, L. J., Adams, M. A. and Dixon, M. R. (2017) Causal constructs and conceptual confusions - The
Psychological Record, SpringerLink, [online] Available at:
https://link.springer.com/article/10.1007/BF03395214.
Komorowski, M., Marshall, D. C., Salciccioli, J. D. and Crutain, Y. (2016) Exploratory Data Analysis, SpringerLink, [online] Available at: https://link.springer.com/chapter/10.1007/978-3-319-43742-2_15.
Russo, D. and Zou, J. (2019) How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage, IEEE Journals & Magazine | IEEE Xplore, [online] Available at: https://ieeexplore.ieee.org/document/8861141 (Accessed March 4, 2024).