Regression analysis is a statistical technique for investigating relationships between variables. It lets you determine how the dependent variable changes with the independent variables and which independent variables affect the dependent variable. Linear regression finds the line of best fit that models the relationship between one dependent variable and one or more independent variables.
REGRESSION
• Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. This technique is used for forecasting, time series modelling, and investigating possible cause-and-effect relationships between variables.
• In statistical modelling, regression analysis is used to estimate the relationships between two or more variables:
• Dependent variable (criterion variable) is the main factor you are trying to understand and predict.
• Independent variables (explanatory variables, or predictors) are the factors that might influence the dependent variable.
Regression analysis helps you understand how the dependent variable changes when one of the independent variables varies, and lets you determine mathematically which of those variables actually has an impact.
• A regression model is fitted using sums of squares, a mathematical way to measure the dispersion of data points. The goal is the line with the smallest possible sum of squared residuals (the vertical distances between the data points and the line), that is, the line that comes closest to the data. This is the least squares method.
• Simple linear regression models the relationship between a dependent variable and one independent variable using a linear function. If you use two or more explanatory variables to predict the dependent variable, you are dealing with multiple linear regression.
• Linear regression equation
Mathematically, a simple linear regression is defined by this equation:
y = bx + a + ε
Where:
x is the independent variable.
y is the dependent variable.
a is the Y-intercept, the expected mean value of y when all x variables are equal to 0. On a regression graph, it is the point where the line crosses the Y axis.
b is the slope of the regression line, the rate of change in y as x changes.
ε is the random error term, the difference between the actual value of the dependent variable and its predicted value.
A linear regression equation always has an error term because, in real life, predictors are never perfectly precise. In Excel, the coefficients a and b are estimated by the least squares method, and the fitted equation is reported without the error term:
y = bx + a
Regression analysis output: summary output
Multiple R.
For a simple linear regression, it equals the absolute value of the correlation coefficient r, which measures the strength of the linear relationship between two variables. The correlation coefficient can be any value between -1 and 1, and its absolute value indicates the strength of the relationship. The larger the absolute value, the stronger the relationship:
• 1 means a perfect positive linear relationship
• -1 means a perfect negative linear relationship
• 0 means no linear relationship at all
Because it is an absolute value, Multiple R itself ranges from 0 to 1.
R Square. It is the coefficient of determination, used as an indicator of goodness of fit. It shows the proportion of the variation in y that the regression line accounts for. R2 is calculated as 1 minus the ratio of the residual sum of squares to the total sum of squares, where the total sum of squares is the sum of the squared deviations of the original y values from their mean. In our example, R2 is 0.91 (rounded to 2 digits), which is fairly good. It means that 91% of the variation in the dependent variable (y values) is explained by the independent variable (x values). Some sources treat an R Square of 95% or more as a good fit, but what counts as acceptable depends on the field and the data.
Adjusted R Square. It is R Square adjusted for the number of independent variables in the model. Use this value instead of R Square for multiple regression analysis, because R Square never decreases when you add more predictors.
Standard Error. It is another goodness-of-fit measure that shows the precision of your regression analysis: the smaller the number, the more certain you can be about your regression equation. While R2 is the percentage of the dependent variable's variance that the model explains, Standard Error is an absolute measure, in the units of y, that shows the typical distance between the data points and the regression line.
Observations. It is simply the number of observations in your model.
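The summary statistics above can be reproduced by hand. The following minimal Python sketch fits y = bx + a by least squares (b is the sum of cross-products of the centered x and y values divided by the sum of squared centered x values, and a = mean(y) - b * mean(x)), then computes Multiple R, R Square, Adjusted R Square, Standard Error and Observations with the usual textbook formulas. For the same data these should agree with Excel's summary output. The function name `regression_summary` and the five-point data set are hypothetical illustrations, not the umbrella data from the example.

```python
import math


def regression_summary(xs, ys):
    """Fit y = b*x + a by least squares and return summary statistics.

    Hypothetical helper for illustration; assumes one independent variable.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    k = 1  # number of independent variables (simple linear regression)

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-products of x and y, and centered sum of squares of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept

    # Total SS: squared deviations of y from its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    if ss_total == 0:
        raise ValueError("all y values are identical; R Square is undefined")
    # Residual SS: squared differences between actual and predicted y
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

    r_square = 1 - ss_residual / ss_total
    return {
        "intercept": a,
        "slope": b,
        # max() guards against a tiny negative value from rounding error
        "multiple_r": math.sqrt(max(r_square, 0.0)),
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "standard_error": math.sqrt(ss_residual / (n - k - 1)),
        "observations": n,
    }


# Hypothetical toy data, not the umbrella data set from the text
summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

For this toy data the sketch finds an intercept of 2.2, a slope of 0.6 and an R Square of 0.6, so Multiple R is about 0.7746, which matches the correlation coefficient between x and y because the slope is positive.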
Regression analysis output: ANOVA
The second part of the output is the Analysis of Variance (ANOVA). It splits the sum of squares into individual components that give information about the levels of variability within your regression model:
• df is the number of degrees of freedom associated with each source of variance.
• SS is the sum of squares. The smaller the Residual SS compared with the Total SS, the better your model fits the data.
• MS is the mean square, that is, SS divided by df.
• F is the F statistic for the null hypothesis that none of the independent variables has an effect (all slope coefficients are 0). It is used to test the overall significance of the model.
• Significance F is the p-value of F.
The ANOVA part is rarely used for a simple linear regression analysis in Excel, but you should take a close look at the last component. The Significance F value indicates how reliable (statistically significant) your results are. If Significance F is less than 0.05 (5%), the model is statistically significant at the 5% level. If it is greater than 0.05, you should probably choose another independent variable.
Regression analysis output: coefficients
This section provides specific information about the components of your analysis. The most useful component in this section is Coefficients. It enables you to build a linear regression equation in Excel:
y = bx + a
For our data set, where y is the number of umbrellas sold and x is the average monthly rainfall, the linear regression formula is:
y = Rainfall coefficient * x + Intercept
y = 0.45x - 19.074
For example, with an average monthly rainfall of 82 mm, umbrella sales would be approximately 17.8:
0.45 * 82 - 19.074 = 17.826 ≈ 17.8
In a similar manner, you can estimate how many umbrellas will be sold for any other monthly rainfall (x value) you specify.
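The ANOVA table and the prediction step can also be sketched in Python. This minimal example, using a hypothetical five-point data set and the hypothetical function names `anova_table` and `predict`, splits the total sum of squares into regression and residual parts, derives df, MS and F for a model with one independent variable, and then plugs a rainfall value into the umbrella equation from the text. It does not compute Significance F, which requires the F distribution; if SciPy is available, `scipy.stats.f.sf(F, df_regression, df_residual)` returns that p-value.

```python
def anova_table(xs, ys):
    """Return ANOVA components for a simple linear regression y = b*x + a.

    Hypothetical helper for illustration; assumes one independent variable.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    k = 1  # one independent variable

    # Least squares fit of slope b and intercept a
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x

    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    ss_regression = ss_total - ss_residual  # variation explained by the model

    df_regression = k
    df_residual = n - k - 1
    ms_regression = ss_regression / df_regression  # MS = SS / df
    ms_residual = ss_residual / df_residual
    # A perfect fit has zero residual MS, so F is unbounded
    f_stat = ms_regression / ms_residual if ms_residual > 0 else float("inf")
    return {
        "df": (df_regression, df_residual, n - 1),  # regression, residual, total
        "ss": (ss_regression, ss_residual, ss_total),
        "ms": (ms_regression, ms_residual),
        "f": f_stat,
    }


def predict(x, slope, intercept):
    """Apply the fitted equation y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical toy data, not the umbrella data set from the text
print(anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))

# Coefficients from the umbrella example: y = 0.45x - 19.074
print(round(predict(82, 0.45, -19.074), 1))
```

For the toy data this gives df of (1, 3, 4), SS of about (3.6, 2.4, 6.0) and F of about 4.5, and the prediction reproduces the 17.8 umbrellas from the text. Because 0.45 is a rounded coefficient, predictions made with it are approximate.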