DATA ANALYTICS Unit III & IV
Unit III
Working with time series data and regression analysis.
1) Introduction to time series data - Time series data is a type of data that is
collected or recorded over time at regular intervals. In time series analysis, the
order of observations is crucial, as they are taken at successive points in time. This
type of data is commonly used in various fields, including finance, economics,
signal processing, environmental science, and many others.
Bhise N K
DATA ANALYTICS BCS SY
Common uses of time series data include:
❖ Pattern Identification.
❖ Visualization.
❖ Forecasting.
❖ Business Intelligence.
2) Introduction to time series forecasting - Time series forecasting is a specialized area
of predictive analytics that involves making predictions about future values based on
historical data points ordered chronologically. In a time series, each data point is associated
with a specific timestamp, and the goal is to use the patterns and trends within the
historical data to make accurate predictions for future time points.
Key Aspects of time series analysis:
❖ Trend
❖ Seasonality
There are different types of moving averages, but the most common one is the
Simple Moving Average (SMA). The Simple Moving Average is calculated by
taking the average of a set of data points over a specified period and then
moving the average to the next set of data points. The formula for calculating the
Simple Moving Average for a given data set is :
SMA = Sum of data points in the specified period / Number of data points in the
specified period.
2) Choose a period - Decide on the period for your moving average. For example, if
you want a 3-period moving average, you would use the average of the first 3
data points, then the next 3, and so on.
3) Calculate the moving average - Place the formula in the cell where you want
the moving average to start.
If your data is in column A and you are calculating a 3-period moving average,
and your first data point is in cell A2, the formula in cell B4 would be:
=AVERAGE(A2:A4)
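The same trailing moving average can be sketched in Python. This is an illustrative pure-Python equivalent of dragging the =AVERAGE formula down a column; the function name is mine, not from any library:

```python
def simple_moving_average(values, period):
    """Trailing simple moving average: each output averages the current
    point and the (period - 1) points before it."""
    return [sum(values[i - period + 1:i + 1]) / period
            for i in range(period - 1, len(values))]

prices = [10, 12, 11, 13, 15, 14]
print(simple_moving_average(prices, 3))  # → [11.0, 12.0, 13.0, 14.0]
```

Note that the output is shorter than the input by period − 1 points, just as the first cells of the Excel column stay empty.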
Simple linear regression models the relationship between a dependent variable Y and an independent variable X as:

Y = β0 + β1X + ε

where β0 is the intercept, β1 is the slope, and ε is the error term. The goal of simple linear regression is to estimate the values of the coefficients β0 and
β1 that minimize the sum of the squared differences between the observed values of Y
and the values predicted by the regression model. This is often done using the method
of least squares.
Once the coefficients are estimated, the regression model can be used to make
predictions about the dependent variable based on values of the independent variable.
Additionally, the fit of the model can be assessed using various metrics such as the
coefficient of determination (R2) and hypothesis tests for the significance of the
coefficients.
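As an illustration of what the least-squares estimates and R² are computing, here is a small pure-Python sketch (function names are mine, not from any library; Excel's SLOPE, INTERCEPT, and RSQ functions compute the same quantities):

```python
def fit_simple_linear(x, y):
    """Least-squares estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1*mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return b0, b1

def r_squared(x, y, b0, b1):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # generated exactly as y = 1 + 2x
b0, b1 = fit_simple_linear(x, y)
print(b0, b1, r_squared(x, y, b0, b1))  # → 1.0 2.0 1.0
```

Because the example data lies exactly on a line, the residual sum of squares is zero and R² is exactly 1.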
Model diagnostics and validation in Excel typically involve assessing the performance
and accuracy of a model built within Excel, such as a financial model, forecasting
model, or regression analysis. Here's a general overview of steps you can take for
diagnostics and validation:
Data Preparation:
● Ensure your data is clean, organized, and appropriately formatted.
● Split your data into training and testing sets if applicable.
Model Building:
● Construct your model using Excel functions, formulas, or add-ins.
● Document your model's assumptions, methodology, and limitations.
Diagnostic Checks:
● Perform basic checks to ensure your model is functioning correctly, such
as:
● Verifying formulas and references.
● Checking for errors or inconsistencies.
● Assessing outliers or anomalies in the data.
Model Evaluation:
● Evaluate the performance of your model using appropriate metrics.
● For forecasting or regression models, consider metrics like Mean Absolute
Error (MAE), Root Mean Squared Error (RMSE), or R-squared (R²).
● For financial models, assess metrics such as Net Present Value (NPV),
Internal Rate of Return (IRR), or Payback Period.
Validation:
● Validate your model against real-world data or known outcomes.
● Compare model predictions or outputs with observed results.
● Use techniques like cross-validation if applicable.
Sensitivity Analysis:
● Conduct sensitivity analysis to understand how changes in input variables
affect model outputs.
● Use Excel's built-in tools like Data Tables or Scenario Manager for
sensitivity analysis.
Visualizations:
● Create visualizations to present your model's outputs and insights
effectively.
● Excel offers various chart types and formatting options for visual
representation.
Documentation:
● Document your findings, assumptions, methodologies, and validation
results thoroughly.
● Include notes within your Excel file or create a separate documentation
file.
Peer Review:
● Have your model reviewed by colleagues or subject matter experts to
identify potential errors or areas for improvement.
Revision and Iteration:
● Based on feedback and validation results, revise and refine your model as
needed.
● Iteratively improve your model to enhance its accuracy and reliability.
Version Control:
● Maintain version control to track changes and ensure traceability of model
revisions.
Final Review and Approval:
● Conduct a final review of your model before deployment or presentation.
● Obtain necessary approvals or sign-offs from stakeholders.
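The MAE and RMSE metrics mentioned in the evaluation step above are simple to compute by hand. A minimal Python sketch (illustrative only; the data values are made up):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100, 110, 120]
predicted = [98, 113, 121]
print(mae(actual, predicted))   # (2 + 3 + 1) / 3 → 2.0
print(rmse(actual, predicted))  # sqrt((4 + 9 + 1) / 3)
```

In Excel the same values can be assembled from ABS, SUMSQ, AVERAGE, and SQRT over the error column.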
Assessing the quality of a model based on R-squared, adjusted R-squared, and standard
error involves assessing how well the model fits the data and whether it provides
meaningful insights. Here's how you can interpret these metrics:
R-squared (R²):
● R-squared is a statistical measure that represents the proportion of the
variance in the dependent variable that is explained by the independent
variables in the model.
● It ranges from 0 to 1, where 1 indicates that the model explains all the
variability of the response data around its mean.
● Higher R-squared values generally indicate a better fit of the model to the
data.
● However, R-squared alone does not determine whether a model is good or
bad; it should be interpreted in conjunction with other metrics.
Interpretation:
● R-squared values closer to 1 imply that the model explains a large portion
of the variability in the data and is considered desirable.
● R-squared values closer to 0 suggest that the model does not explain
much of the variability in the data and may not be useful for prediction.
Adjusted R-squared:
● Adjusted R-squared is similar to R-squared but adjusts for the number of
predictors in the model.
● It penalizes excessive use of predictors and provides a more accurate
measure of model fit, especially when comparing models with different
numbers of predictors.
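The standard adjustment formula is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A one-line Python sketch:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared: penalizes R2 for the number of predictors p,
    given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# with R2 = 0.90, 30 observations, and 3 predictors:
print(adjusted_r_squared(0.90, 30, 3))
```

Adding a predictor that barely improves R² can lower the adjusted value, which is why it is the better metric for comparing models of different sizes.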
When judging overall model quality:
● Look for higher R-squared and adjusted R-squared values, indicating better model
fit.
● Compare adjusted R-squared values across models to assess the trade-off
between model complexity and explanatory power.
● Aim for lower standard error values, indicating more accurate predictions.
Normality Assumption:
● Normality of residuals is essential for regression analysis. You can assess
this assumption by examining the distribution of residuals.
● After running your regression model, calculate the residuals (the
differences between the observed and predicted values).
● Use Excel to create a histogram or a Q-Q plot of the residuals to visually
inspect their distribution.
● Additionally, you can perform a formal test for normality, such as the
Shapiro-Wilk test, using Excel's statistical functions or add-ins like Real
Statistics Resource Pack.
Linearity Assumption:
● The relationship between the independent and dependent variables should
be linear. You can check this assumption by plotting the observed values
of the dependent variable against the predicted values from your
regression model.
● After running the regression, create a scatter plot in Excel with the
observed values on the y-axis and the predicted values on the x-axis.
● Ensure that the points on the scatter plot are randomly distributed around
a diagonal line, indicating linearity.
● You can also check for linearity by examining residual plots, where
residuals should be randomly distributed around zero for different values
of the independent variables.
Multicollinearity Assumption:
● Multicollinearity occurs when independent variables in a regression model
are highly correlated with each other.
● Calculate correlation coefficients between independent variables using
Excel's CORREL function.
● Alternatively, you can use Excel's Data Analysis Toolpak to perform a
correlation analysis.
● Look for high correlation coefficients (close to +1 or -1) between pairs of
independent variables, indicating potential multicollinearity issues.
● Consider using variance inflation factor (VIF) calculations to quantitatively
assess multicollinearity, which can be computed using Excel formulas
after estimating your regression model.
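The VIF for a predictor is 1/(1 − R²) from regressing that predictor on the others. With only two predictors this reduces to 1/(1 − r²), where r is their correlation (the same value Excel's CORREL returns). A pure-Python sketch of that two-predictor case, with made-up data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, the same idea as Excel's CORREL."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
print(round(vif_two_predictors(x1, x2), 3))  # → 2.5
```

A common rule of thumb is that VIF values above 5 or 10 signal problematic multicollinearity.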
Homoscedasticity Assumption:
● Homoscedasticity means that the variance of the residuals is constant
across all levels of the independent variables.
● After running the regression, plot the residuals against the predicted
values or against each independent variable.
● Ensure that there are no discernible patterns or trends in the residual plot,
indicating constant variance.
● You can also perform formal tests for homoscedasticity, such as the
Breusch-Pagan test or White's test, using Excel's statistical functions or
add-ins.
Cross-validation and model selection testing are both important techniques used in
machine learning to evaluate and select the best-performing model for a given dataset.
Cross-validation:
● Cross-validation is a resampling technique used to assess how well a
model generalizes to an independent dataset.
● The basic idea is to partition the dataset into multiple subsets or folds.
The model is trained on a portion of the data and validated on the
remaining portion.
● Common types of cross-validation include k-fold cross-validation,
stratified k-fold cross-validation, leave-one-out cross-validation (LOOCV),
etc.
● By repeating this process with different partitions of the data, we can
obtain multiple estimates of model performance. The final performance
metric is often computed as the average across all folds.
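A minimal sketch of k-fold cross-validation in Python, using a trivial "model" (predict the training mean) just to show the mechanics. The round-robin fold assignment is one of several valid choices, and all names and data here are illustrative:

```python
def k_fold_splits(n, k):
    """Yield (train_idx, val_idx) pairs; each fold is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin assignment
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

y = [3, 5, 7, 9, 11, 13]
scores = []
for train, val in k_fold_splits(len(y), 3):
    pred = sum(y[j] for j in train) / len(train)              # "model": training mean
    scores.append(sum(abs(y[j] - pred) for j in val) / len(val))  # MAE on held-out fold
print(sum(scores) / len(scores))   # final metric: average across all folds
```

In practice the mean predictor would be replaced by whatever model is being evaluated; the splitting and averaging logic stays the same.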
Model selection testing:
● Model selection refers to the process of choosing the best model or
algorithm from a set of candidate models.
● Model selection testing involves evaluating different models using a
performance metric and selecting the one that performs best on unseen
data.
● This process typically involves comparing the performance of models
using techniques such as cross-validation, holdout validation, or other
validation strategies.
● Performance metrics used for model selection testing depend on the
problem at hand but often include accuracy, precision, recall, F1 score,
ROC AUC, etc.
In practice these two techniques are combined: each model candidate is trained and evaluated using cross-validation, and the model with the
best average performance across the folds is selected as the final model.

Non-linear regression is used when the relationship between the independent
variables and the dependent variable is not linear. In these cases, non-linear
models can capture more complex patterns and relationships in the data compared to
linear models.
Model Representation:
y = f(x, β) + ε
● Where:
● y is the dependent variable,
● x is the independent variable (or variables),
● β is the set of model parameters, and
● ε is the error term.
Model Fitting:
The parameters β are estimated by minimizing the difference between observed and
predicted values, typically using iterative optimization methods.
Model Evaluation:
Goodness-of-fit metrics indicate how well the model fits the data and how well it
generalizes to unseen data. Common metrics include R-squared, Root Mean Squared
Error (RMSE), Mean Absolute Error (MAE), etc.
Applications:
Non-linear models can capture complex relationships in the data and are valuable
tools for modeling real-world phenomena, provided they are paired with appropriate
evaluation techniques.
In Excel, non-linear regression can be carried out with the Solver
add-in.
Implementing a non-linear regression model in Excel using the Solver Add-In involves
fitting a curve to data points by minimizing the sum of squared differences between the
observed and predicted values:
Organize your data: Have your independent variable (X) in one column and your
dependent variable (Y) in another.
Choose a Model: Decide on the type of non-linear model you want to fit to your
data, for example an exponential model Y = A * EXP(B * X).
Initial Guess: Provide initial guesses for the parameters A and B. You can either
enter rough estimates directly or refine them later with Solver.
Set up the Model in Excel: In another column, calculate the predicted Y values
using the model formula and the cells holding A and B, then compute the residuals
(observed minus predicted Y).
Sum of Squared Residuals (SSR): Square each residual and sum them up. This is
the quantity Solver will minimize.
Use Solver Add-In: Go to the "Data" tab, click on "Solver" (if you haven't installed it
yet, you may need to add it from Excel Add-Ins), and set up Solver to minimize the
SSR cell by changing the cells that hold A and B.
Run Solver: Click Solve, and Solver will try different values of A and B to minimize
the SSR.
Analyze Results: Once Solver converges, you'll get the optimal values of
parameters A and B.
Let's say your data is in columns A and B, with X values in column A and Y values in
column B.
In another column, calculate predicted Y values using the formula Y = A * EXP(B * X),
compute the squared residuals, and sum them into an SSR cell.
Run Solver on that cell, and it will find the optimal values for A and B.
This approach can be generalized to any non-linear model. Just replace the model
formula with the one you want to fit.
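Solver minimizes the SSR numerically. As a sketch of the same idea in Python, the snippet below does a crude coordinate search for A and B on the exponential model (for this particular model a log transform would also linearize it; all names and starting values here are illustrative, and this is far simpler than Solver's actual algorithm):

```python
import math

def ssr(params, xs, ys):
    """Sum of squared residuals for the model y = a * exp(b * x)."""
    a, b = params
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

def solve(xs, ys, a=1.0, b=0.1, step=0.5, tol=1e-10):
    """Crude coordinate search: try moving each parameter up or down;
    when no move improves the SSR, shrink the step."""
    best = ssr((a, b), xs, ys)
    while step > 1e-7:
        improved = False
        for da, db in [(step, 0), (-step, 0), (0, step), (0, -step)]:
            cand = ssr((a + da, b + db), xs, ys)
            if cand < best - tol:
                a, b, best, improved = a + da, b + db, cand, True
        if not improved:
            step /= 2
    return a, b

xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.3 * x) for x in xs]   # generated with A = 2, B = 0.3
a, b = solve(xs, ys)
print(round(a, 3), round(b, 3))  # close to the true values 2 and 0.3
```

Solver's GRG Nonlinear engine is far more sophisticated, but the objective it minimizes is exactly this SSR.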
Time series decomposition separates a series into its
components, typically trend, seasonal, and irregular components. While Excel doesn't
have a built-in function specifically for time series decomposition, you can use some of
its standard formulas and tools to perform it manually.
One common method for time series decomposition is the classical decomposition
method.
Here's a general guide on how to perform time series decomposition in Excel using
these steps:
Import Your Time Series Data: Input your time series data into Excel. Typically,
you'll have two columns: one for the dates (time) and another for the
corresponding values.
Estimate the Trend: You can use various methods to estimate the trend, such as
moving averages or linear regression. For instance, you could calculate a moving
average over a certain window of time to smooth out fluctuations and estimate
the trend.
Seasonal Adjustment: To adjust for seasonal effects, you'll need to calculate
seasonal indices. One simple method is to calculate the average value of the
time series for each season (e.g., each month or each quarter) and then divide
each seasonal average by the overall average to obtain seasonal indices.
Dividing the original values by the corresponding seasonal indices gives you
the seasonally adjusted series.
Residual Calculation: Once you have estimated the trend and adjusted for
seasonal effects, you can calculate the residuals (irregular component) as the
difference between the original values and the sum of the trend and seasonal
components.
Visualization and Analysis: Plot the original time series data along with the
estimated trend, seasonal, and irregular components to visualize the
decomposition. You can use Excel charting features for this.
While performing this decomposition manually in Excel can be
laborious, it is possible with careful use of formulas and data manipulation techniques.
For heavier use, consider languages like Python or R that have built-in functions
and libraries for time series decomposition.
Trend:
● Definition: The long-term movement or direction of the data over
time. It represents the underlying pattern in the data that persists
over a long period.
● Characteristics:
● Trends can be increasing, decreasing, or stable over time.
● They reflect changes due to underlying factors such as
population growth, economic cycles, technological
advancements, etc.
● Identification:
● Visual inspection of the time series plot.
Decomposing time series data in Excel using moving averages and seasonal indices
involves estimating the trend and seasonal components separately. Here's how you can
do it step by step:
Import Your Time Series Data: Input your time series data into Excel. You should
have two columns: one for the dates (time) and another for the corresponding
values.
Calculate Moving Averages for Trend Estimation:
● Choose a window size for your moving average. The window size
determines how many consecutive data points are averaged.
● In a new column, calculate the moving average for each data point using
Excel's AVERAGE function combined with relative cell references. For
example, if your time series values are in column B starting from B2, and
you've chosen a window size of 5, in cell C4 (centered on B4) you would input
=AVERAGE(B2:B6) and drag this formula down to calculate moving averages
for all data points.
Calculate Seasonal Indices:
● Determine the periodicity of your seasonal component (e.g., monthly,
quarterly).
● Calculate the average value for each season. For instance, if you have
monthly data, calculate the average value for each month across all years.
● Divide each seasonal average by the overall average to obtain seasonal
indices.
● In Excel, you can calculate these seasonal averages manually or use
functions like AVERAGEIFS or PivotTables.
● Once you have the seasonal indices, expand them to match the length of
your time series data.
Calculate Seasonally Adjusted Values:
● Divide the original time series values by the seasonal indices to obtain
seasonally adjusted values. You can do this in a new column.
● This step removes the seasonal component from the original data, leaving
the trend and irregular components.
Calculate Residuals (Irregular Component):
● Subtract the trend (moving averages) from the seasonally adjusted values
to obtain residuals.
● Residuals represent the irregular component of the time series data.
Visualize the Components:
● Plot the original time series data, moving averages (trend), seasonal
indices, and residuals to visualize how each component contributes to the
overall series.
● Excel's charting features can be used for this purpose.
By following these steps, you can decompose your time series data into its trend,
seasonal, and irregular components using moving averages and seasonal indices in
Excel.
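The steps above can be sketched end-to-end in Python. This is a crude multiplicative variant under simplifying assumptions (a whole number of seasonal cycles, a trailing rather than centered moving average); all data and names are illustrative:

```python
def decompose(values, period):
    """Crude multiplicative decomposition: seasonal indices, seasonally
    adjusted series, and a trailing moving-average trend."""
    overall = sum(values) / len(values)
    # seasonal index = average of each season's values / overall average
    idx = [sum(values[s::period]) / len(values[s::period]) / overall
           for s in range(period)]
    # seasonally adjusted series = original values / matching seasonal index
    adjusted = [v / idx[i % period] for i, v in enumerate(values)]
    # trend = trailing moving average of the adjusted series
    trend = [sum(adjusted[i - period + 1:i + 1]) / period
             for i in range(period - 1, len(adjusted))]
    return idx, adjusted, trend

# two "years" of quarterly data with a fixed seasonal pattern around level 100
values = [90, 110, 120, 80, 90, 110, 120, 80]
idx, adjusted, trend = decompose(values, 4)
print([round(i, 2) for i in idx])        # → [0.9, 1.1, 1.2, 0.8]
print([round(a, 1) for a in adjusted])   # each ≈ 100 once seasonality is removed
```

The residual (irregular) component would then be the adjusted series minus the trend, which for this perfectly seasonal example is essentially zero.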
Advanced time series forecasting goes beyond simple smoothing and draws on more
sophisticated techniques and models, such as autoregressive (AR), moving average
(MA), and ARIMA models.
When applying advanced time series forecasting techniques, it's essential to evaluate
model performance using appropriate metrics and consider factors such as data
quality, seasonality, trend patterns, and the forecasting horizon. Additionally, model
complexity and interpretability should be taken into account when selecting the most
suitable technique for a particular forecasting task.
The Autoregressive (AR) model predicts future values from a linear combination of
past values. In an AR model of order p (denoted AR(p)), the current value Yt is
modeled as:

Yt = c + ϕ1Yt−1 + ϕ2Yt−2 + ⋯ + ϕpYt−p + εt

Where:
● Yt is the value of the series at time t,
● c is a constant,
● ϕ1, …, ϕp are the autoregressive coefficients, and
● εt is a white-noise error term.
The AR model captures the linear relationship between the current value of the
time series and its past values.
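The one-step forecast implied by the AR(p) equation is a direct sum. A Python sketch (the coefficients and data here are made-up illustrative values, not estimated from anything):

```python
def ar_forecast(history, coeffs, c=0.0):
    """One-step AR(p) forecast: c + phi1*y[t-1] + phi2*y[t-2] + ... + phip*y[t-p]."""
    p = len(coeffs)
    recent = history[-p:][::-1]          # y[t-1], y[t-2], ..., y[t-p]
    return c + sum(phi * y for phi, y in zip(coeffs, recent))

series = [100, 102, 105, 107]
print(ar_forecast(series, [0.6, 0.3], c=5))  # 5 + 0.6*107 + 0.3*105
```

Estimating the ϕ coefficients from data (rather than assuming them) is the hard part, which is why dedicated tools are normally used.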
The Moving Average (MA) model is another time series forecasting technique
that predicts future values based on the weighted sum of past prediction errors.
In an MA model of order q (denoted MA(q)), the current value Yt is
modeled as a function of the q most recent prediction errors:

Yt = μ + θ1εt−1 + θ2εt−2 + ⋯ + θqεt−q + εt

Where:
● μ is the mean of the series,
● θ1, …, θq are the moving-average coefficients, and
● εt, εt−1, … are the prediction (residual) errors.
The MA model captures the dependence between the current value of the time series
and the residual errors from previous predictions.
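The MA(q) one-step forecast has the same shape as the AR case, but weights past errors instead of past values. An illustrative Python sketch (the θ values and errors below are made up):

```python
def ma_forecast(errors, thetas, mu=0.0):
    """One-step MA(q) forecast: mu + theta1*e[t-1] + ... + thetaq*e[t-q]."""
    q = len(thetas)
    recent = errors[-q:][::-1]           # e[t-1], e[t-2], ..., e[t-q]
    return mu + sum(t * e for t, e in zip(thetas, recent))

past_errors = [0.5, -1.2, 0.8]           # most recent error last
print(ma_forecast(past_errors, [0.4, 0.2], mu=50))  # 50 + 0.4*0.8 + 0.2*(-1.2)
```

Unlike the AR case, the errors are not directly observed; in practice they come out of the fitting procedure itself.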
Implementing models such as ARIMA in Excel
using custom formulas alone can be quite challenging due to the complexity of these
models. However, you can use Excel in conjunction with add-ins or external tools to
perform advanced forecasting. One such popular add-in for Excel is the "Solver" add-in,
which can be used to optimize parameters for simpler models like exponential
smoothing, while an external tool like R or Python handles the ARIMA estimation
itself.
This approach leverages the strengths of both Excel and external tools like R or Python
to perform advanced time series forecasting. While Excel may not be suitable for
directly implementing complex forecasting models, it can still be a valuable tool for data
preparation, visualization, and presenting the results produced by external
software.
Unit IV
Hypothesis testing is a statistical method that involves formulating two competing
hypotheses about a population parameter: the null hypothesis (H0) and the alternative
hypothesis (H1).
Null Hypothesis (H0): This hypothesis typically represents the status quo or the
belief that there is no effect or no difference. It is the hypothesis that researchers
aim to test against.
Alternative Hypothesis (H1 or HA): This hypothesis represents the opposite of
the null hypothesis, suggesting that there is an effect or a difference.
Hypothesis testing matters because it provides a systematic
framework for making decisions and drawing conclusions based on data. Here's why it's
important:
Decision Making: It determines whether the evidence
from the sample data is strong enough to reject the null hypothesis in favor of
the alternative hypothesis.
Scientific Rigor : In scientific research, hypothesis testing helps ensure that
conclusions drawn from data are based on sound statistical principles rather
than intuition or anecdotal evidence.
Inference about Populations: By analyzing sample data and drawing conclusions
about population parameters, hypothesis testing allows researchers to make
generalizations and infer about broader populations from which the samples
were drawn.
Control of Type I and Type II Errors: Hypothesis testing helps control the risks of
making Type I errors (rejecting a true null hypothesis) and Type II errors (failing to
reject a false null hypothesis). By setting a significance level (alpha), researchers
can control the probability of making Type I errors.
Comparison of Models or Interventions: Hypothesis testing is often used to
compare different models, treatments, or interventions to determine which one is
more effective or whether there is a significant difference between them.
Validation of Theories: Hypothesis testing provides a means to validate or refute
theories by subjecting them to empirical scrutiny.
In summary, hypothesis testing provides a
systematic framework for making decisions and drawing conclusions based on data,
with applications in science, business, medicine, and more.
Null and alternative hypotheses are central to hypothesis
testing, as they form the foundation for making statistical decisions based on sample
data. The null hypothesis (H0) states that there is no difference or no
effect. It often represents the status quo or the default assumption, and in hypothesis
testing it is the claim to be tested against.
Example:
● H0: The mean exam scores of students who received tutoring are equal to the
mean exam scores of students who did not receive tutoring.
Example:
● HA: The mean exam scores of students who received tutoring are not equal to
the mean exam scores of students who did not receive tutoring.
● The alternative hypothesis contradicts the null hypothesis and suggests the
presence of an effect or a difference.
● In most cases, the objective of hypothesis testing is to assess the evidence
provided by sample data and decide whether there is sufficient evidence to reject
the null hypothesis in favor of the alternative hypothesis.
Understanding and properly formulating null and alternative hypotheses is essential for
designing tests and interpreting statistical evidence.
Hypothesis tests can be one-tailed or two-tailed,
depending on the nature of the alternative hypothesis. Let's explore each type:
One-Tailed Test:
In a one-tailed test, the alternative hypothesis specifies the direction of the difference or effect
relative to the null hypothesis. It focuses on testing whether a parameter is either greater than or less
than a hypothesized value.
Two-Tailed Test:
In a two-tailed test, the alternative hypothesis does not specify the direction of the difference or
effect relative to the null hypothesis. It focuses on testing whether a parameter is different (in either
direction) from the hypothesized value.
Two-Tailed Test:
● The alternative hypothesis suggests that the parameter is simply different from the
null hypothesis value, without specifying the direction.
● This type of test is used when researchers are interested in determining if there is a
significant difference, regardless of whether it is an increase or decrease.
● Example: Testing whether there is a significant difference in average response times
between two customer service methods.
Key Differences:
● One-tailed tests are more sensitive to detecting effects in one specific direction, while
two-tailed tests are sensitive to detecting effects in either direction.
● The choice between one-tailed and two-tailed tests depends on the research question and
the specific hypothesis being tested.
● One-tailed tests may be appropriate when there is a strong theoretical basis for predicting
the direction of the effect, while two-tailed tests are more conservative and are used when
the direction is uncertain or when researchers want to detect any significant difference.
Understanding the distinction between one-tailed and two-tailed tests is crucial for selecting the
appropriate hypothesis test and interpreting the results accurately based on the research objectives
and hypotheses.
2. Confidence Intervals.
In Excel, you can calculate confidence intervals for population means using the
CONFIDENCE function. This function returns the margin of error for a population
mean, given a significance level, the standard deviation of the population, and the size of
the sample. Its syntax is:
=CONFIDENCE(alpha, standard_dev, size)
● Alpha - The significance level, equal to 1 minus the confidence level; it ranges
from 0 to 1. For example, for a 95% confidence level, alpha
would be 0.05.
● standard_dev - The standard deviation of the population.
● size - The size of the sample.
Let's say you have a sample of exam scores with a mean of 80, a standard
deviation of 10, and a sample size of 30. To calculate the 95% confidence interval
for the population mean:
=CONFIDENCE(0.05, 10, 30)
This will return the margin of error (half-width of the confidence interval). To get the
upper and lower bounds of the confidence interval, you would add and subtract this
margin of error from the sample mean.
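The margin of error is just z · σ/√n. A Python sketch of the same calculation (the z value is hard-coded here for the 95% level; Excel computes it from alpha internally):

```python
import math

def confidence_margin(std_dev, size, z=1.959964):
    """Margin of error for a population mean, like CONFIDENCE(0.05, ...).
    z = 1.959964 is the normal critical value for a 95% confidence level."""
    return z * std_dev / math.sqrt(size)

margin = confidence_margin(10, 30)        # mean 80, sd 10, n 30
print(round(80 - margin, 2), round(80 + margin, 2))  # ≈ 76.42 to 83.58
```

For other confidence levels the z value changes (e.g., about 1.645 for 90%), which is what makes higher-confidence intervals wider.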
You can also use the CONFIDENCE.NORM function when the population standard
deviation is known, or the CONFIDENCE.T function when it is unknown and the sample
size is small (typically less than 30), in which case a t-distribution is used instead
of a normal distribution.
Remember, these functions help you calculate the confidence interval for the population
mean. If you want to calculate confidence intervals for other parameters or perform
different types of statistical analysis, Excel offers various other functions and tools.
1. CONFIDENCE Function:
The CONFIDENCE function in Excel is used to calculate the margin of error for a
population mean, given a significance level, the population standard deviation, and
the sample size. It assumes a normal distribution.
● Alpha - The significance level, equal to 1 minus the confidence level. It
ranges from 0 to 1.
● standard_dev - The standard deviation of the population.
● size - The size of the sample.
Interpretation:
● Higher confidence levels correspond to wider confidence intervals. For example,
a 95% confidence level implies that if the sampling process were repeated many
times, approximately 95% of the calculated confidence intervals would contain
the true population parameter.
● A lower confidence level, such as 90%, would yield a narrower confidence interval
but with less certainty that it contains the true population parameter.
2. CONFIDENCE.NORM Function:
The CONFIDENCE.NORM function calculates the confidence interval for a population mean
when the population standard deviation is known.
3. CONFIDENCE.T Function:
The CONFIDENCE.T function calculates the confidence interval for a population mean
when the population standard deviation is unknown and the sample size is small
(typically less than 30).
Example Interpretation:
Let's say you calculate a 95% confidence interval for the population mean using
CONFIDENCE. If the interval is (72, 88), it means that you are 95% confident that the true
population mean lies between 72 and 88.
In Excel, you can perform both Z-tests and T-tests to compare sample means or
proportions to population parameters or to compare two sample means. Here's an
overview of each:
Z-Test:
A Z-test is used when the population standard deviation is known, or the sample size is
large (typically n > 30). It is commonly used to compare a sample mean to a known
population mean or to test the proportion of successes in a sample against a known
population proportion.
In Excel, you can perform a Z-test for a mean using the Z.TEST function:
Z.TEST(array, x, [sigma])
where array is the range of sample data, x is the hypothesized population mean, and
sigma is the optional known population standard deviation (if omitted, the sample
standard deviation is used). The function returns the one-tailed p-value.
T-Test:
A T-test is used when the population standard deviation is unknown and the sample size
is small (typically n < 30). It is used to compare two sample means or to compare a
sample mean to a known population mean.
In Excel, you can perform different types of T-tests depending on whether the data is
from paired or unpaired samples and whether the variances of the samples are
assumed to be equal or not.
Example:
Let's say you have two samples, A1:A10 and B1:B10, and you want to perform a
two-sample T-test assuming equal variances:
=T.TEST(A1:A10, B1:B10, 2, 2)
This formula will return the probability associated with a two-tailed Student's t-test.
Remember to adjust the inputs based on your specific data and hypothesis testing
requirements.
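T.TEST returns the p-value directly; the statistic behind it is straightforward to compute by hand. A Python sketch of the equal-variance (pooled) two-sample t statistic, with made-up data (converting it to a p-value still requires a t table or statistical software):

```python
import math

def pooled_t_stat(a, b):
    """Two-sample t statistic assuming equal variances (the statistic
    behind =T.TEST(..., 2, 2))."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)     # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)   # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

a = [5, 6, 7, 8]
b = [7, 8, 9, 10]
print(round(pooled_t_stat(a, b), 3))  # negative: group a's mean is lower
```

The degrees of freedom for this statistic are na + nb − 2, which is what the t-distribution lookup uses.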
3.1 Performing one-sample, two-sample, and paired t-tests in Excel
using the Data Analysis ToolPak:
( https://www.geeksforgeeks.org/paired-sample-t-test-in-excel/ )
Example:
Let's say you have observed and expected frequencies in cells A2:A5 and B2:B5,
respectively. To perform the Chi-squared test, you would use:
=CHISQ.TEST(A2:A5, B2:B5)
This function returns the p-value associated with the Chi-squared test, allowing you to
assess the significance of the association.
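CHISQ.TEST derives its p-value from the chi-squared statistic, which is simple to compute directly. A Python sketch with made-up frequencies:

```python
def chi_squared_stat(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over all categories.
    (=CHISQ.TEST converts this statistic into a p-value.)"""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]
print(chi_squared_stat(observed, expected))  # → 4.32
```

A larger statistic means the observed counts deviate more from expectation, which translates into a smaller p-value for the given degrees of freedom.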
Example:
Suppose you have group data arranged so that each group's values are in a separate
column (for example, groups in columns A, B, and C). Excel has no ANOVA worksheet
function; instead, run the Data Analysis ToolPak's "ANOVA: Single Factor" tool on
that range.
The output includes the F-statistic and associated p-value, allowing you to assess the
significance of differences between group means.
Notes:
● Ensure your data is properly organized in Excel before applying these functions.
● Interpretation of results involves comparing the p-value to a chosen significance
level (e.g., 0.05). If the p-value is less than the significance level, you reject the
null hypothesis and conclude there is a significant difference.
ANOVA (Analysis of Variance) is a statistical method used to compare the means of
three or more groups and determine whether there are significant differences between
them. It's a powerful tool for analyzing the effects of categorical independent variables
on a continuous dependent variable.
Key Concepts:
Independent and Dependent Variables:
● ANOVA involves one dependent variable (continuous) and one or more
independent variables (categorical) with two or more levels or groups.
Null and Alternative Hypotheses:
● Null Hypothesis (H0): There is no significant difference in means between
groups.
● Alternative Hypothesis (HA): At least one group mean is significantly
different from others.
F-Statistic:
● ANOVA calculates the F-statistic, which represents the ratio of the
variance between groups to the variance within groups. If the ratio is
sufficiently large, it suggests that the group means are significantly
different.
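The F ratio described above can be computed by hand. A pure-Python sketch of the one-way ANOVA F statistic, with made-up groups (the p-value lookup against the F-distribution is left to statistical software):

```python
def one_way_f(groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    k = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total observations
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
print(one_way_f(groups))  # → 12.0
```

Here the group means (5, 7, 9) are well separated relative to the spread within each group, so the F value is large.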
Assumptions:
● Independence: Observations within each group are independent.
● Normality: The dependent variable follows a normal distribution within
each group.
● Homogeneity of Variance: The variance of the dependent variable is equal
across all groups.
Types of ANOVA:
One-Way ANOVA: Compares means across two or more independent groups or
levels of a single categorical variable.
Two-Way ANOVA: Examines the effects of two categorical independent variables
(factors) on a continuous dependent variable, allowing for the analysis of main
effects and interaction effects.
Interpretation:
● A significant result in ANOVA indicates that at least one group mean is different
from the others, but it doesn't specify which groups are different.
● Post-hoc tests, such as Tukey's HSD or Bonferroni correction, can be used to
identify specific group differences after obtaining a significant ANOVA result.
ANOVA is widely used in various fields, including experimental research, social sciences,
medicine, and business, to compare means across multiple groups and understand the
sources of variation. It provides insight into group differences and helps make informed
decisions based on statistical evidence.
4.3 Conducting one-way and two-way ANOVA in Excel using the Data
Analysis ToolPak -
ANOVA (Analysis of Variance) is a statistical technique for comparing group means and determining whether there are significant differences between them. It's particularly useful when you have categorical independent variables
and continuous dependent variables. ANOVA assesses whether the means of different
groups are equal or not by examining the variance within groups and between groups.
Go to the "Data" tab and click "Data Analysis" in the Analysis group. Choose
"ANOVA: Single Factor" for one-way ANOVA or "ANOVA: Two-Factor with
Replication" for two-way ANOVA.
In the dialog box that appears:
● Select the input range for the data.
● Choose whether your data has labels in the first row.
● Specify where you want the output to be placed.
Click "OK" to run the analysis.
Review the output:
● The output will include various statistics, including the F-statistic, p-value,
and means for each group.
● Examine the p-value to determine if there are statistically significant
differences between groups. A p-value less than your chosen significance
level (e.g., 0.05) indicates significance.
Notes:
● Ensure your data meets the assumptions of ANOVA, including normality of
residuals and homogeneity of variances.
● Interpret the results carefully, considering both statistical significance and
practical significance.
● ANOVA does not identify which specific groups differ from each other. Post-hoc
tests (e.g., Tukey's HSD test) can be used for pairwise comparisons if ANOVA
results are significant.
By following these steps, you can conduct one-way or two-way ANOVA in Excel using
the Data Analysis ToolPak, allowing you to compare multiple means and assess the significance of differences between group means.
The Analysis ToolPak is an Excel add-in that provides advanced data analysis
tools for users. It's particularly useful for tasks such as statistical analysis, financial
modeling, and engineering calculations. Here are some of its key features:
Installation: The Analysis ToolPak is not enabled by default in Excel. To use it,
you need to enable it first. You can do this by going to the "File" menu, selecting
"Options," then choosing "Add-Ins." From there, you can select "Analysis ToolPak"
and click "Go" to enable it.
Descriptive Statistics: The ToolPak includes functions for calculating various
descriptive statistics such as mean, median, mode, standard deviation, variance,
and quartiles. These functions are handy for summarizing and understanding the
characteristics of your data.
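For comparison, the same descriptive statistics the ToolPak reports can be computed with Python's standard library. This is a sketch with hypothetical data values:

```python
# Sketch: the descriptive statistics the ToolPak reports, computed with
# Python's standard library for comparison. Data values are hypothetical.
import statistics

data = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value
mode = statistics.mode(data)        # most frequent value
stdev = statistics.stdev(data)      # sample standard deviation
var = statistics.variance(data)     # sample variance
quartiles = statistics.quantiles(data, n=4)  # Q1, Q2, Q3

print(mean, median, mode, round(stdev, 3), quartiles)
```

These correspond to Excel's AVERAGE, MEDIAN, MODE.SNGL, STDEV.S, VAR.S, and QUARTILE functions, respectively.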
Histograms: You can use the ToolPak to create histograms, which are graphical
representations of the distribution of data. Histograms are useful for visualizing
the frequency distribution of a dataset.
Regression Analysis: The ToolPak provides tools for performing regression
analysis, including linear regression, multiple regression, and exponential
regression. These tools allow you to analyze the relationship between variables
and make predictions based on the data.
Overall, the Analysis ToolPak is a valuable resource for users who need to perform
advanced data analysis tasks within Excel. Whether you're a student, researcher, or
business professional, the ToolPak can help you gain insights from your data and make
informed decisions.
T-Test:
● To perform a T-Test using the Analysis ToolPak in Excel, first ensure that the add-in is enabled.
● Once enabled, you can find the T-Test tool under the "Data Analysis" tab.
● Input the ranges of your data for the two samples and other required
parameters.
● Excel will then calculate the T-Test statistics and provide results including the t-statistic, p-value, and critical values.
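The same comparison can be sketched outside Excel. The example below uses scipy's independent-samples t-test on hypothetical data:

```python
# Sketch: a two-sample t-test analogous to the ToolPak's T-Test tool,
# using scipy. Sample values are hypothetical.
from scipy import stats

sample_a = [14.1, 15.0, 13.8, 14.6, 15.2, 14.9]
sample_b = [16.0, 15.8, 16.4, 15.5, 16.2, 15.9]

# Welch's t-test (unequal variances); pass equal_var=True for the
# classic Student's t-test the ToolPak also offers.
t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)
print(round(t_stat, 3), round(p_value, 5))
```

A p-value below the chosen significance level (e.g., 0.05) indicates a significant difference between the two sample means.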
ANOVA:
● Similar to the T-Test, you can find the ANOVA tool under the "Data Analysis" tab.
● Input the ranges of your data and select the appropriate options.
● Excel will compute the ANOVA table, which includes the sum of squares, degrees of freedom, mean squares, F-statistic, and p-value.
Correlation:
● Input the range of your data, select the appropriate options, and specify where you want the output to be placed; Excel will produce a correlation matrix for the selected variables.
Regression:
● The Regression tool reports the coefficients along with their
standard errors, t-values, and p-values for each predictor variable, as well
as overall model statistics such as R-squared.
These steps provide a basic outline of how to use the Analysis ToolPak in Excel for
various statistical analyses. Depending on your specific dataset and research questions,
you may need to adjust the settings and interpret the results accordingly.
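As a cross-check outside Excel, correlation and simple linear regression can be sketched in Python. The data here is hypothetical, and scipy is assumed to be available:

```python
# Sketch: correlation and simple linear regression analogous to the
# ToolPak's Correlation and Regression tools. Data are hypothetical.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                       # predictor
y = [2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 14.2, 15.8]   # response

r, r_p = stats.pearsonr(x, y)   # correlation coefficient and its p-value
res = stats.linregress(x, y)    # slope, intercept, r-value, p-value, stderr

print(round(r, 4), round(res.slope, 3), round(res.intercept, 3))
```

Here `r` corresponds to the ToolPak's correlation output, and `res.slope`/`res.intercept` correspond to the regression coefficients in Excel's regression output table.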
Power Query is a data connection and transformation tool built into modern versions of Excel and available as a standalone add-in for Excel 2010 and 2013. It's also integrated
into other Microsoft products such as Power BI and SQL Server Integration Services
(SSIS). Here's an overview of how you can use Power Query for data transformation in
Excel:
Getting Data: The first step in using Power Query is to import your data into
Excel. You can do this by going to the "Data" tab and selecting "Get Data" or
"Get & Transform Data." Power Query supports a wide range of data sources,
including Excel tables, CSV files, text files, databases, and online sources like
web pages.
Data Cleaning and Transformation: Once you've imported your data, you can use
Power Query to clean and transform it as needed. This includes tasks such as:
● Removing duplicates
● Renaming columns
● Filtering rows and changing data types
Query Editor: Power Query provides a user-friendly interface called the Query
Editor for performing these data transformation tasks. The Query Editor lets you
preview your data and records each transformation as a step that can later be
modified or removed.
Merging and Appending Data: Power Query allows you to merge multiple tables
or append data from different sources into a single table. This is useful for
combining data from multiple sources or consolidating data from different files
or worksheets.
Query Parameters and Functions: Power Query supports the use of query
parameters and functions, which allow you to create reusable queries and
automate repetitive work. Parameters let you dynamically adjust filter criteria,
file paths, or connection strings, while functions encapsulate transformation
logic for reuse across queries.
Data Load Options: Once you've transformed your data in Power Query, you can
choose how to load it into Excel. Options include loading the data to a new
worksheet, an existing worksheet, or the data model. You can also choose to refresh the
data manually or automatically so it stays current over time.
Overall, Power Query provides a flexible and intuitive toolset for cleaning, transforming,
and reshaping data in Excel, making it easier to prepare data for analysis, reporting, and
visualization. It's particularly useful for working with large datasets or data from
multiple sources, as it allows you to automate and streamline the data preparation
process.
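The cleaning steps described above have direct analogues in code. Below is a minimal pandas sketch; the table and column names are made up for illustration, and pandas is assumed to be available:

```python
# Sketch: Power Query-style cleaning steps expressed in pandas.
# The table and column names are made up for illustration.
import pandas as pd

raw = pd.DataFrame({
    "cust name": ["Alice", "Bob", "Bob", "Carol"],
    "amt": ["100", "250", "250", "175"],      # amounts stored as text
})

clean = (
    raw.drop_duplicates()                     # remove duplicate rows
       .rename(columns={"cust name": "Customer", "amt": "Amount"})
       .astype({"Amount": int})               # fix the data type
       .reset_index(drop=True)
)
print(clean)
```

Each chained method corresponds to a transformation step Power Query would record in its Applied Steps list.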
Data Cleaning and Preparation: Power Query is commonly used to clean and
prepare raw data for analysis. This includes tasks such as removing duplicates,
handling missing values, correcting errors, and converting data into a consistent
format.
Data Integration: Power Query can be used to integrate data from multiple
sources into a single dataset. This is particularly useful in scenarios where data is spread across multiple files, systems, or databases.
Data Transformation: Power Query provides powerful tools for transforming and
reshaping data. Users can perform tasks such as splitting columns, merging
tables, unpivoting data, and creating calculated columns using a simple and
intuitive interface.
Data Enrichment: Power Query enables users to enrich their datasets by pulling
in additional data from external sources. This could include adding demographic, geographic, or reference data to an existing dataset.
Data Aggregation: Power Query can be used to group and summarize data to create
summary reports or dashboards. Users can group rows and apply aggregations
such as sums, counts, and averages.
Reporting: Power Query integrates with Excel's reporting features to create dynamic
and interactive reports. Users can use Power Query to prepare and clean data
before feeding it into pivot tables and charts.
Data Exploration and Analysis: Power Query can be used for ad-hoc data
exploration and analysis. Users can quickly filter, sort, and manipulate data to answer ad-hoc questions and spot patterns.
Data Refresh and Automation: Power Query enables users to automate data
refresh processes, ensuring that their reports and dashboards are always
up-to-date with the latest data. Users can schedule data refreshes or set up refresh-on-open so reports stay current.
Overall, Power Query is a versatile tool that can streamline and automate various
data-related tasks, from data cleaning and integration to analysis and reporting, making
it an essential tool for data analysts and data scientists working with Microsoft Excel or Power BI. A typical Power Query workflow looks like this:
Import Data:
● Go to the "Data" tab and select "Get Data" or "Get & Transform Data"
● Choose the data source you want to import from (e.g., Excel workbook, CSV file, text file, or database).
● Connect to your data source and load the data into the Power Query Editor.
Clean Data:
● Remove duplicates, handle missing values, and correct data types as needed.
Transform Data:
● Apply transformations such as splitting columns, merging tables, and text
operations (trims, concatenations, etc.).
● Split columns if needed (e.g., splitting full names into first and last
names).
● Use formulas and functions available in the Power Query Editor to perform custom calculations and transformations.
Data Load:
● Once you're satisfied with your data cleaning and transformations, you can
load the result into a worksheet or the data model by closing and loading the
query.
Data Refresh:
● Refresh the query at any time to pull in the latest data from the source.
Power Query offers multiple ways of working, including a user-friendly interface for
most operations, a formula language (M), and an Advanced Editor for writing
queries directly.
Remember, the exact steps might vary slightly depending on your version of Excel or
Power BI, but the general process remains the same. Practice and exploration of Power
Query features will help you become proficient in data import, cleaning, and
transformation tasks.
Merging and appending queries to combine data from multiple sources -
Merging and appending queries are essential operations in Power Query for combining
data from multiple sources. Here's how you can perform these operations:
Merging Queries:
● Merging is used when you have two or more tables with related data and you want to combine them based on a common key column.
● To merge queries:
● In the Power Query Editor, select the first query and choose "Merge
Queries" from the Home tab.
● Select the second query you want to merge with and choose the
matching columns in each table.
● Choose the type of join (e.g., inner, left outer, right outer, full outer) depending on which rows you want to keep.
● Click OK, and the merged data will be displayed in the Power Query
Editor.
Appending Queries:
● Appending is used when you have similar data structures from different
sources, and you want to stack them on top of each other to create a
single table.
● To append queries:
● In the Power Query Editor, choose "Append Queries" from the Home
tab and select the queries you want to combine.
● Click OK, and the data from the selected queries will be appended
into a single table.
Advanced Options:
● You can also perform more advanced merges and appends using the
"Merge Queries" and "Append Queries" options with additional options like
fuzzy matching or appending three or more tables.
● For more complex scenarios, you might need to use the "Advanced Editor"
to modify the M code directly.
Data Validation:
● After merging or appending, review the resulting data to ensure that it meets
your expectations. Check for any unexpected duplicates, mismatches, or
missing values.
● Once you've merged or appended your queries and verified the data, you
can choose to load it back into Excel or Power BI, or you can close and
keep it as a connection-only query.
By mastering merging and appending operations in Power Query, you can efficiently
combine data from multiple sources to perform comprehensive analyses. Practice and
experimentation will help you become proficient in handling various data integration
scenarios.
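The merge and append operations map directly onto joins and concatenation in code. Below is a pandas sketch with hypothetical tables; pandas is assumed to be available:

```python
# Sketch: Power Query's "Merge Queries" (a join) and "Append Queries"
# (stacking) expressed in pandas. Tables are hypothetical.
import pandas as pd

orders = pd.DataFrame({"cust_id": [1, 2, 3], "total": [100, 250, 175]})
names = pd.DataFrame({"cust_id": [1, 2, 4], "name": ["Alice", "Bob", "Dan"]})
more_orders = pd.DataFrame({"cust_id": [5], "total": [300]})

# Merge: a left outer join keeps every order row and fills in names
# where cust_id matches (unmatched rows get NaN).
merged = orders.merge(names, on="cust_id", how="left")

# Append: stack tables that share the same structure into one table.
appended = pd.concat([orders, more_orders], ignore_index=True)

print(len(merged), len(appended))
```

The `how` argument plays the same role as Power Query's join kind (inner, left outer, right outer, full outer).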
Power Pivot is a data modeling add-in for Excel that extends Excel's capabilities to handle larger volumes of data and perform more complex
analyses than traditional spreadsheets. Here's an introduction to Power Pivot for data
modeling in Excel:
● Data Modeling: Power Pivot acts as an in-workbook modeling engine,
allowing you to bring together data from multiple sources, relate them, and
analyze them as a single model.
● In-Memory Engine: Power Pivot uses an in-memory columnar engine
(VertiPaq) that allows for fast calculations and analysis of large datasets directly
within Excel.
● Advanced Calculations: With Power Pivot, you can create advanced calculations using DAX (Data Analysis Expressions).
● Integration with Power Query: Power Pivot works seamlessly with Power Query,
allowing you to import, clean, and transform data from various sources before loading it into the data model.
Data Modeling:
● Create relationships between tables so you can analyze large, related
datasets effectively.
Data Analysis:
● Build PivotTables and PivotCharts on top of the data model for
multidimensional analysis.
● Slice and dice your data dynamically to explore different perspectives and
uncover insights.
Performance Optimization:
● The columnar in-memory engine compresses data and evaluates
calculations efficiently.
Data Visualization:
● Visualize your data using a variety of chart types and formatting options
available in Excel.
● Combine the data model with slicers and timelines to present results
effectively.
● Results from the data model can be used alongside formulas in ordinary
Excel worksheets.
In summary, Power Pivot empowers Excel users to create robust data models, perform
advanced analytics, and gain valuable insights from their data, all within the familiar
Excel environment. With its intuitive interface and powerful features, Power Pivot is a
1. Importing Data:
Open Power Pivot: In Excel, go to the "Power Pivot" tab and click on "Manage" to open the Power Pivot window.
Import Data: In the Power Pivot window, click on "From Other Sources" to import
data from various sources such as Excel tables, SQL databases, text files, etc.
Transform Data (Optional): You can use Power Query within Power Pivot to clean
and transform your data before loading it into the data model.
Load Data: After importing and transforming your data, click on "Close & Load" to load it into the data model.
2. Defining Relationships:
Create Relationships: In the Power Pivot window, go to the "Home" tab and click
on "Diagram View." Here, you can visually define relationships between the tables by dragging a column from one table onto the matching column in another.
Manage Relationships: You can also manage relationships by going to the "Design" tab and clicking on "Manage Relationships." Here, you can edit existing relationships or create new ones.
3. Creating Calculated Columns and Measures:
Calculated Columns: In the Power Pivot window, select a table and click on "New
Column" in the "Columns" tab. You can then define a calculated column using
a DAX formula.
Measures: Create measures to aggregate data dynamically according to
specified criteria. Use DAX formulas to define measures such as sums, averages,
counts, etc.
4. Optimizing Performance:
Minimize Calculated Columns: Limit the number of calculated columns that consume memory. Use measures instead
of calculated columns where possible, since measures are computed on demand.
Data Model Properties: In the Power Pivot window, go to the "Advanced" tab to
access data model properties. Here, you can optimize memory settings, set refresh behavior, and configure other advanced options.
5. Refreshing Data:
Manual Refresh: You can manually refresh the data model by clicking on
"Refresh" in the Power Pivot window. This reloads data from the original data sources.
Automatic Refresh: Configure automatic data refresh options to keep your data
model up-to-date.
6. Testing and Validation:
Test the Model: Verify that your calculations and relationships are functioning as
expected by testing them with sample data.
Data Validation: Validate the accuracy and integrity of your data model by
comparing results against the source data.
Once your data model is created and validated, you can deploy it for use by yourself or
others in your organization. You can share Excel workbooks containing the Power Pivot data model with colleagues.
By following these steps, you can create and manage a data model using Power Pivot in
Excel, enabling you to perform advanced data analysis and gain valuable insights from
your data.
7.3 Working with calculated columns and measures using DAX (Data
Analysis Expressions) -
Working with calculated columns and measures using Data Analysis Expressions (DAX)
in Power Pivot allows you to perform advanced calculations and aggregations on your
data. Here's a guide on how to create and use calculated columns and measures using
DAX:
Calculated Columns:
Create Calculated Column:
● In Power Pivot, select the table to which you want to add a calculated
column.
● Go to the "Power Pivot" tab, click on "Add Column," and then select
"Calculated Column."
● A new column will appear in the table, and the formula bar will allow you to
enter a DAX formula.
● Write your DAX formula in the formula bar. For example, to concatenate
first and last names: =[FirstName] & " " & [LastName].
● Verify that the calculated values meet your expectations.
Measures:
Create Measure:
● In Power Pivot, go to the "Power Pivot" tab and click on "New Measure" in the Calculations group.
● The formula bar will appear, allowing you to enter a DAX formula for your
measure.
● Write your DAX formula in the formula bar. For example, to calculate the total of a Sales column: Total Sales := SUM(Sales[Amount]).
Aggregate Functions:
● Use DAX aggregate functions like SUM, AVERAGE, COUNT, MIN, MAX, etc., to summarize data in your measures.
Contextual Functions:
● Leverage DAX functions that work within the context of PivotTables and
slicers, such as CALCULATE and FILTER, to modify the filter context.
Formatting:
● Apply number formats to measures so results display consistently in
reports.
Best Practices:
● Performance Optimization: Use measures instead of calculated columns where possible, since measures are evaluated only when needed.
● Documentation: Document your DAX formulas and provide clear descriptions for each calculation.
By mastering calculated columns and measures using DAX in Power Pivot, you can
perform sophisticated data analysis and derive valuable insights from your data models
in Excel.
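The distinction between a calculated column (evaluated row by row) and a measure (an aggregation over the current context) can be sketched outside Power Pivot. The pandas example below uses hypothetical sales data; DAX itself runs only inside Power Pivot:

```python
# Sketch: calculated column vs. measure, illustrated in pandas.
# DAX itself runs only inside Power Pivot; the data here is hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "FirstName": ["Ana", "Ben", "Ana"],
    "LastName": ["Ruiz", "Lee", "Ruiz"],
    "Qty": [2, 3, 1],
    "Price": [10.0, 5.0, 10.0],
})

# Calculated column: evaluated row by row and stored with the table,
# like =[FirstName] & " " & [LastName] or [Qty] * [Price] in DAX.
sales["FullName"] = sales["FirstName"] + " " + sales["LastName"]
sales["LineTotal"] = sales["Qty"] * sales["Price"]

# Measure: an aggregation evaluated over the current filter context,
# like Total Sales := SUM(Sales[LineTotal]) in DAX.
total_sales = sales["LineTotal"].sum()
by_person = sales.groupby("FullName")["LineTotal"].sum()
print(total_sales)
```

The groupby result mirrors how a measure re-evaluates for each cell of a PivotTable, while the calculated columns are stored once per row.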
Power Map (called "3D Maps" in recent versions of Excel) is a 3D data visualization tool available in Microsoft Excel. It allows users to plot data on a 3D globe or custom
map, providing interactive and visually compelling ways to analyze geographic and
time-based data. Here's an introduction to Power Map for geospatial data visualization:
● Users can navigate, rotate, and zoom in/out of the map to explore data from different perspectives.
● Users can also visualize time-based data over time by animating data across timestamps.
● Power Map allows users to plot multiple data points on the map and
compare them using columns, bubbles, heat maps, or regions.
● Users can interactively filter, drill down, and slice data to gain deeper
insights.
● Customization: Users can customize colors, markers, themes, and map styles to better
represent their data and enhance the visualization.
● Users can easily create, edit, and update maps directly within Excel
● To use Power Map, ensure that you have the appropriate version of Excel
(2013 or later) and that the Power Map add-in is enabled. You can enable
it from "File" > "Options" > "Add-Ins" by selecting "COM Add-ins."
Prepare Data:
● Power Map works best with tabular data containing columns for location (e.g., country, city, or latitude/longitude), values, and optionally dates or times.
● In Excel, go to the "Insert" tab and click on "3D Map" (or "Map," depending on your Excel version).
● Import your data into Power Map by selecting the appropriate Excel table or range.
Map Configuration:
● Configure the map by selecting the geographic fields for location plotting and the numeric fields for visualization.
● Explore your data on the map by navigating, rotating, and zooming in/out.
● Use interactive features such as filtering, drilling down, and animating data
over time.
● Capture scenes and build tours to present your findings or reports
effectively.
In summary, Power Map is a valuable tool for geospatial data visualization in Excel,
helping users understand geographical datasets effectively. With its intuitive interface and powerful features,
Power Map enables users to explore, analyze, and present their data in visually
compelling ways.
Creating interactive 3D geospatial visualizations using Power Map -
Creating interactive 3D geospatial visualizations using Power Map in Excel allows you to
explore and analyze geographic data in a dynamic and immersive way. Here's a
step-by-step guide to building a visualization in Power Map:
Organize Data: Ensure that your data is organized in Excel with columns for geographic information, values, and (optionally) dates or times.
Clean and Format Data: Clean up any inconsistencies or errors in your data and format columns with appropriate data types.
Open Excel: Launch Microsoft Excel and open the workbook containing your
data.
Enable Power Map Add-In: If you haven't already enabled the Power Map add-in,
go to the "File" menu, select "Options," then choose "Add-Ins." From the "Manage"
dropdown, select "COM Add-ins," then click "Go" and check the "Microsoft Power Map for Excel" box.
Insert Power Map: With your data selected, go to the "Insert" tab in Excel and
click on "3D Map" (or "Map" depending on your Excel version) in the "Tours"
group.
Launch Power Map: Power Map will launch, and a new window will appear displaying a 3D globe.
Select Data: In the Power Map window, click on "Home" and then click on "New
Tour."
Select Data: Select the Excel table or range containing your data. Power Map will detect geographic fields automatically where possible.
Choose Geography: Select the geographic fields you want to visualize on the map, such as country, state, city, or latitude/longitude.
Choose Data: Select the numeric fields you want to represent visually on the map,
such as column height or bubble size.
Customize Visuals: Adjust colors, markers, themes, and map styles to enhance
visual clarity and appeal.
Navigate: Use the navigation controls to pan, zoom, and rotate the map to explore different regions.
Filter Data: Apply filters to focus on specific subsets of your data or to analyze particular categories.
Analyze Trends: Use the timeline slider to visualize time-based data and analyze trends and patterns over time.
Record Tour: Click on the "Tours" tab and then click on "New Scene" to start
recording a tour. Navigate around the map to capture different viewpoints and
insights.
Play Tour: Click on "Play Tour" to play back your tour and showcase your insights as an animated presentation.
Save Tour: Save your Power Map tour within your Excel workbook for future
reference or sharing.
Share Workbook: Share your Excel workbook containing the Power Map
visualization with others, ensuring they have the Power Map add-in enabled to view and interact with the tour.
By following these steps, you can create interactive 3D geospatial visualizations using
Power Map in Excel, enabling you to explore and analyze geographic data in a dynamic
and immersive way. Excel also offers built-in 2D map charts, which you can customize as
follows:
Select Data:
● Select the range containing your geographic data, then insert a map chart
from the "Insert" tab ("Maps" > "Filled Map").
● Once the map chart is inserted, you can select the data you want to
display or adjust the series as needed.
● In the Format Map pane, you can customize various aspects such as map projection, map area, and map labels.
Insert a Chart:
● Select your data, go to the "Insert" tab, and choose the type of chart you
want to create from the Charts group.
Customize the Chart:
● Adjust elements such as the chart title, axis labels, legend, etc.
● Use the Chart Elements button (plus sign icon) that appears next to the
chart to add or remove chart elements like axes, titles, data labels, and
gridlines.
● Explore various chart styles and themes available in the Chart Styles
gallery to quickly change the look and feel of your chart.
By following these steps, you can effectively customize map layers, chart types, and
visual elements in Excel to create visually appealing and insightful data presentations.
Hypothesis testing is a fundamental statistical technique used to make inferences about population
parameters based on sample data. While traditional hypothesis tests like t-tests and ANOVA are
widely used, there are advanced techniques available for situations where the assumptions of these
tests are violated or when dealing with non-parametric data. Three such advanced techniques are
the F test, the Mann-Whitney U test, and the Kruskal-Wallis test.
F Test:
● The F test is a statistical test used to compare the variances of two or more
populations.
● The null hypothesis of the F test states that the variances of the populations are
equal.
● The F test calculates the ratio of variances between groups to variances within
groups. If this ratio is sufficiently large, it suggests that the variances of the populations are not equal.
● The F test is sensitive to departures from normality; a more robust alternative (such as Levene's test) can be used when normality is in doubt.
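A variance-ratio F test can be sketched in Python. The data values are hypothetical, and scipy supplies the F distribution used for the p-value:

```python
# Sketch: a two-sample F test for equality of variances. Data are
# hypothetical; scipy supplies the F distribution for the p-value.
from statistics import variance
from scipy import stats

a = [21.0, 23.5, 19.8, 22.1, 20.6, 24.0, 21.7]   # more spread out
b = [20.2, 20.9, 20.5, 21.1, 20.4, 20.8, 20.6]   # tightly clustered

var_a, var_b = variance(a), variance(b)
f = var_a / var_b                  # ratio of sample variances
dfn, dfd = len(a) - 1, len(b) - 1  # degrees of freedom

# Two-tailed p-value from the F distribution
p = 2 * min(stats.f.sf(f, dfn, dfd), stats.f.cdf(f, dfn, dfd))
print(round(f, 3), round(p, 5))
```

A small p-value here leads to rejecting the null hypothesis that the two population variances are equal.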
Mann-Whitney U Test:
● The Mann-Whitney U test, also known as the Wilcoxon rank-sum test, is a non-parametric test used to compare two independent groups.
● It is used when the assumptions of parametric tests like the t-test cannot be met,
such as when the data are not normally distributed or when dealing with ordinal or
ranked data.
● The Mann-Whitney U test ranks all the observations from both samples together, then compares the sum of ranks between the two groups.
● The null hypothesis of the Mann-Whitney U test states that there is no difference between the distributions of the two groups.
● If the p-value calculated from the test is less than the chosen significance level, the null hypothesis is rejected, indicating a significant difference between the two groups.
Kruskal-Wallis Test:
● The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA, used to compare three or more independent groups.
● The Kruskal-Wallis test ranks all the observations from all groups together, then compares the mean ranks across the groups.
● The null hypothesis of the Kruskal-Wallis test states that there is no difference between the distributions of the groups.
● If the p-value calculated from the test is less than the chosen significance level, the
null hypothesis is rejected, indicating a significant difference between at least two of
the groups.
In summary, the F test, Mann-Whitney U test, and Kruskal-Wallis test are advanced hypothesis
testing techniques that provide alternatives to traditional parametric tests in situations where the
underlying assumptions cannot be met or when dealing with non-parametric data. These tests allow
researchers to make valid statistical inferences and draw conclusions from a wide range of data
types and distributions.
Implementing the Mann-Whitney U test and the
Kruskal-Wallis test in Excel can be achieved using custom formulas or add-ins. Here's a
general overview of how you can implement these tests using custom formulas and
add-ins:
Mann-Whitney U Test:
The Mann-Whitney U test is a non-parametric test used to compare two independent
groups.
Using Custom Formulas: Rank the combined observations (e.g., with RANK.AVG), sum
the ranks for each group, and compute U1 = R1 − n1(n1+1)/2.
Use the U statistic to calculate the p-value using the appropriate distribution (a
normal approximation for large samples, or exact tables for small samples).
Using Add-Ins:
There are Excel add-ins available that can perform the Mann-Whitney U test. You can
search for add-ins specifically designed for statistical analysis in Excel. These add-ins
typically automate the ranking and p-value calculations.
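The rank-sum computation described above can be sketched in Python and compared with a library result. The data is hypothetical and deliberately tie-free, since the simple rank dictionary below does not handle tied values; scipy is assumed to be available:

```python
# Sketch: the Mann-Whitney U statistic from rank sums, compared with
# scipy.stats.mannwhitneyu. Data are hypothetical and tie-free (the
# simple rank dictionary below does not handle tied values).
from scipy import stats

a = [3, 5, 8, 9, 12]
b = [1, 2, 4, 6, 7]

combined = sorted(a + b)
rank = {v: i + 1 for i, v in enumerate(combined)}  # 1-based ranks

r1 = sum(rank[v] for v in a)        # rank sum of the first group
n1 = len(a)
u1 = r1 - n1 * (n1 + 1) / 2         # U statistic for the first group

res = stats.mannwhitneyu(a, b, alternative="two-sided")
print(u1, res.statistic, round(res.pvalue, 4))
```

The same rank-sum logic is what an Excel implementation built on RANK.AVG would compute.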
Kruskal-Wallis Test:
The Kruskal-Wallis test is a non-parametric test used to compare more than two
independent groups.
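A Kruskal-Wallis test on three hypothetical groups can be run with scipy as a sketch:

```python
# Sketch: a Kruskal-Wallis test across three independent groups using
# scipy. Group values are hypothetical.
from scipy import stats

g1 = [7, 9, 6, 8, 10]
g2 = [12, 14, 11, 13, 15]
g3 = [5, 4, 6, 3, 7]

h_stat, p_value = stats.kruskal(g1, g2, g3)
print(round(h_stat, 3), round(p_value, 4))
```

A small p-value leads to rejecting the null hypothesis that the groups share the same distribution, the non-parametric analogue of a significant one-way ANOVA result.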
Using Add-Ins:
Similar to the Mann-Whitney U test, you can search for Excel add-ins that offer the
Kruskal-Wallis test functionality. These add-ins can automate the computation and
interpretation of the results.
Custom Formula Add-Ins: Alternatively, you can develop your own custom
formula add-in. This involves writing VBA (Visual Basic for Applications) code to create
custom functions that perform the calculations for the Mann-Whitney U test,
Kruskal-Wallis test, or other advanced hypothesis tests. Once created, these functions can be used like built-in Excel functions in any workbook.
When developing a custom formula add-in, ensure it's properly tested and validated to
ensure accurate results. Additionally, provide clear documentation and instructions for its users.
Keep in mind that while implementing these tests in Excel can be useful for quick
analyses, it's essential to verify the accuracy of results and understand the assumptions
and limitations of each test. For critical or complex analyses, consider using dedicated statistical software.