ECONOMICS SEM 4 Notes Sakshi
Time series- Time-series data are often graphically depicted on a line chart, which is a plot
of the variable over time. It is created by plotting the value of the variable on the vertical
axis and the time periods on the horizontal axis.
Unit 4 – Sampling
Direct observation= The simplest method of obtaining data is by direct observation. When
data are gathered in this way, they are said to be observational.
Experimental data- produced through controlled experiments; this method is generally more expensive than direct observation.
Surveys- One of the most familiar methods of collecting data is the survey, which solicits
information from people concerning such things as their income, family size, and opinions
on various issues.
Personal Interview- involves an interviewer soliciting information from a respondent by
asking prepared questions. A personal interview has the advantage of having a higher
expected response rate than other methods of data collection.
Telephone Interview- A telephone interview is usually less expensive, but it is also less
personal and has a lower expected response rate. Unless the issue is of interest to them, many
people will refuse to respond to telephone surveys.
Sampling Error-
refers to differences between the sample and the population that exist only because of the
observations that happened to be selected for the sample.
Sampling error is an error that we expect to occur when we make a statement about a
population that is based only on the observations contained in a sample taken from the
population
The difference between the true (unknown) value of the population mean and its estimate,
the sample mean, is the sampling error.
E.g., the population example.
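As a sketch of this idea (illustrative numbers, not from the notes), Python's standard library can simulate drawing one sample from a known population and measuring the sampling error as the gap between the sample mean and the true population mean:

```python
import random
import statistics

# Hypothetical population: 1,000 incomes (illustrative values only)
random.seed(42)
population = [random.gauss(50_000, 12_000) for _ in range(1_000)]
mu = statistics.mean(population)        # true population mean (usually unknown)

sample = random.sample(population, 50)  # one sample of n = 50
x_bar = statistics.mean(sample)         # sample mean (the estimate)

# Sampling error: differs from zero only because of which
# observations happened to be selected for the sample
sampling_error = x_bar - mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:.2f}")
```

Re-running with a different seed gives a different sampling error, which is exactly the point: the error comes from the luck of the draw, not from any mistake in measurement.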
Non sampling Error-
Non sampling error is more serious than sampling error because taking a larger sample will
not diminish the size, or the possibility of occurrence, of this error. Even a census can (and
probably will) contain nonsampling errors. Nonsampling errors result from mistakes made
in the acquisition of data or from the sample observations being selected improperly.
1. Errors in data acquisition- arise from the recording of incorrect responses.
2. Nonresponse error- refers to error (or bias) introduced when responses are not
obtained from some members of the sample.
3. Selection bias- occurs when the sampling plan is such that some members of the
target population cannot possibly be selected for inclusion in the sample.
Unit 5
Measures of central location-
Arithmetic mean= mean is computed by summing the observations and dividing by the
number of observations.
Population mean- μ; Sample mean- x̄ (x-bar)
Function= AVERAGE ([Input Range]), or Descriptive Statistics in the Analysis ToolPak.
Median= The median is calculated by placing all the observations in order (ascending or
descending). The observation that falls in the middle is the median.
Function= MEDIAN (Input Range)
The mode is defined as the observation (or observations) that occurs with the greatest
frequency. Both the statistic and parameter are computed in the same way.
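The three measures of central location above can be computed with Python's standard library; the data set here is an illustrative example, not from the notes:

```python
import statistics

# Illustrative data set (not from the notes)
data = [7, 3, 9, 3, 5, 6, 3, 8]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # middle value after sorting (avg of two middles if n is even)
mode = statistics.mode(data)      # observation with the greatest frequency

print(mean, median, mode)  # 5.5 5.5 3
```

Here n = 8 is even, so the median is the average of the 4th and 5th sorted values, matching the "place in order, take the middle" rule above.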
Pg-93
Range = largest observation − smallest observation
Variance- measures how the observations vary around the arithmetic mean of the data set.
Sample variance denotes the variation of the sample data around the sample mean. The sample
is drawn from the much larger pool of the population (the entire set of available data).
Function: VAR (Input Range)
Population variance (σ²) and sample variance (s²): pg. 97
Variance cannot be negative: the deviations from the mean are squared, which eliminates any
possibility of a negative value.
Standard deviation: shows how much the data deviate or vary from the average (mean) of the
data. A low SD means the data are clustered around the mean; a high SD means there is a
high dispersion of the data from the mean.
Interpretation depends on the shape of the histogram; if the histogram is bell shaped, use the
empirical rule:
1. Approximately 68% of all observations fall within one standard deviation of the mean.
2. Approximately 95% of all observations fall within two standard deviations of the mean.
3. Approximately 99.7% of all observations fall within three standard deviations of the
mean.
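The three empirical-rule percentages can be checked against the standard normal distribution using `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    # P(mean - k*SD < X < mean + k*SD) for a bell-shaped distribution
    share = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {share:.1%}")
```

The exact values are about 68.3%, 95.4%, and 99.7%, which the empirical rule rounds to 68/95/99.7.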
The coefficient of variation of a set of observations is the standard deviation of the
observations divided by their mean: Population coefficient of variation: CV = σ/μ
Sample coefficient of variation: cv = s/x̄
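A small worked example of the sample coefficient of variation (illustrative data, not from the notes):

```python
import statistics

sample = [12, 15, 11, 14, 13, 16, 10]   # illustrative sample data
s = statistics.stdev(sample)            # sample standard deviation
x_bar = statistics.mean(sample)         # sample mean
cv = s / x_bar                          # unit-free measure of relative variability
print(round(cv, 3))
```

Because cv divides out the units, it lets you compare variability across data sets measured on different scales (e.g., incomes in rupees vs. heights in cm).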
Percentile
The P th percentile is the value for which P % are less than that value and (100 – P)% are
greater than that value.
The 25th, 50th, and 75th percentiles divide the data set into quarters, so these measures of relative
standing are also called quartiles. The first or lower quartile is labelled Q1. It is equal to the
25th percentile. The second quartile, Q2, is equal to the 50th percentile, which is also the
median. The third or upper quartile, Q3, is equal to the 75th percentile.
Quintiles divide the data into fifths, and deciles divide the data into tenths.
Location of the Pth percentile: L_P = (n + 1)P/100
The interquartile range measures the spread of the middle 50% of the observations. Large
values of this statistic mean that the first and third quartiles are far apart, indicating a high
level of variability.
IQR = Q3 − Q1
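The quartiles and IQR can be computed directly from the location formula L_P = (n + 1)P/100, interpolating when the location is not a whole number (illustrative data; note that software packages use several slightly different percentile conventions, this sketch follows the formula in the notes):

```python
data = sorted([5, 8, 12, 15, 18, 22, 25, 30, 34])  # illustrative data, n = 9

def percentile(sorted_data, p):
    """Pth percentile at location L_P = (n + 1) * P / 100 (1-based),
    interpolating linearly when L_P is not a whole number."""
    lp = (len(sorted_data) + 1) * p / 100
    lo = int(lp)                  # 1-based position of the lower neighbour
    frac = lp - lo
    if lo >= len(sorted_data):    # location beyond the last observation
        return sorted_data[-1]
    return sorted_data[lo - 1] + frac * (sorted_data[lo] - sorted_data[lo - 1])

q1 = percentile(data, 25)   # L_25 = 10 * 25/100 = 2.5 -> halfway between 8 and 12
q2 = percentile(data, 50)   # L_50 = 5 -> the 5th value, i.e. the median
q3 = percentile(data, 75)   # L_75 = 7.5 -> halfway between 25 and 30
iqr = q3 - q1               # spread of the middle 50% of observations
print(q1, q2, q3, iqr)
```

Note that Q2 coincides with the median, as stated above.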
Population mean (μ), sample mean, population variance, and sample variance. (Refer to
notebook.)
SD = square root of variance.
E(X) = Σ x · P(X = x)
It is the sum of the values taken by the random variable, each weighted by its associated probability.
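A minimal sketch of the expected-value formula, using a hypothetical discrete random variable (values and probabilities are illustrative, not from the notes):

```python
# Hypothetical discrete random variable: number of sales in a day
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]   # probabilities must sum to 1

# E(X) = sum of x * P(X = x) over all values of x
expected = sum(x * p for x, p in zip(values, probs))
print(round(expected, 2))  # 0*0.1 + 1*0.3 + 2*0.4 + 3*0.2 = 1.7
```

E(X) = 1.7 is a probability-weighted average: the long-run mean number of sales per day if the experiment were repeated many times.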
BINOMIAL DISTRIBUTION
Fixed number of trials (n)
Has only two outcomes: success and failure
Probability of success = p
Probability of failure = 1 − p
Trials are independent, which means that the outcome of one trial does not affect the
outcomes of any other trials
Bernoulli trial = a single trial with only two outcomes, success (probability p) and failure
(probability 1 − p); trials are independent.
P(x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x)
Function= BINOM.DIST(x, n, p, cumulative)
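The binomial formula above can be sketched in a few lines of Python; the numbers (n = 10, p = 0.4, x = 3) are illustrative:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 3 successes in 10 independent trials with p = 0.4
# (equivalent to Excel's BINOM.DIST(3, 10, 0.4, FALSE))
print(round(binom_pmf(3, 10, 0.4), 4))  # 0.215
```

Summing the pmf over x = 0..n gives 1, since the outcomes are exhaustive.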
POISSON DISTRIBUTION
If we want to find out the probability of successes in each interval of time or space, we use
Poisson distribution.
Function=POISSON.DIST(x, μ, TRUE/FALSE)
P(x) = e^(−μ) μ^x / x!; substituting x = 0 gives P(0) = e^(−μ).
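The Poisson formula can likewise be sketched directly (μ = 2 is an illustrative mean rate, not from the notes):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Probability of 0 and of 2 arrivals in an interval with mean rate mu = 2
# (equivalent to Excel's POISSON.DIST(x, 2, FALSE))
print(round(poisson_pmf(0, 2), 4))  # e^-2, the x = 0 case noted above
print(round(poisson_pmf(2, 2), 4))
```

Setting x = 0 collapses the formula to e^(−μ), matching the substitution mentioned above.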
NORMAL DISTRIBUTION
Based on continuous random variables, whose values lie within a particular interval, e.g.,
heights, weights, and distances covered.
It is bell shaped and symmetric around the mean.
Functions= NORM.DIST(x, μ, σ, TRUE)
NORM.S.DIST(z, TRUE)
NORM.INV
NORM.S.INV
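The four Excel functions above have close analogues in Python's `statistics.NormalDist` (3.8+); the mean 60 and SD 10 below are illustrative values, not from the notes:

```python
from statistics import NormalDist

# Hypothetical distribution: exam marks with mean 60, SD 10 (illustrative)
marks = NormalDist(mu=60, sigma=10)

p_below_75 = marks.cdf(75)       # like NORM.DIST(75, 60, 10, TRUE)
z = (75 - 60) / 10               # standardize: z = (x - mu) / sigma
p_z = NormalDist().cdf(z)        # like NORM.S.DIST(1.5, TRUE) -- same answer
cutoff = marks.inv_cdf(0.95)     # like NORM.INV(0.95, 60, 10): 95th percentile mark

print(round(p_below_75, 4), round(p_z, 4), round(cutoff, 2))
```

Standardizing first and using the standard normal gives exactly the same probability, which is why the NORM.S.* pair exists alongside NORM.*.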
Regression
Regression analysis is used to predict the value of one variable on the basis of other
variables. It develops a mathematical equation or model that describes the relationship between
the dependent variable (the variable to be forecast) and the independent variable(s)
(believed by the practitioner to be related to the dependent variable).
Simple Linear Regression Model- also called first order linear model
y = β0 + β1x + ε
y = dependent variable x = independent variable β0 = y-intercept β1 = slope of the line
(defined as rise/run) ε = error variable
The straight line that we use to estimate β0 and β1 is the "best" straight line, best in
the sense that it comes closest to the sample data points. This best straight line is called the
least squares line. / Least squares method – Chp 4 pg 114
The slope is defined as rise/run, which means that it is the change in y (rise) for a one-unit
increase in x (run). Put less mathematically, the slope measures the marginal rate of change
in the dependent variable. The marginal rate of change refers to the effect of increasing the
independent variable by one additional unit.
Intercept- point at which the regression line intersects the y-axis.
Coefficient of Correlation tells us about the linear relationship whether it is strong or weak.
Refer to pg 636 and 637 of textbook for all formulae and example.
Interpretation of Regression: for xr 16-04
1. Multiple R: The correlation coefficient between the independent variable and
dependent variable is 0.876 which indicates a strong positive relation.
2. R square: The coefficient of determination is 0.767, which means that 76.7% of the
variation in the dependent variable is explained by the independent variable.
3. Adjusted R: Adjusts the R squared value for the number of predictors in the model. 0.749
indicates that the independent variable explains a substantial amount of variation in
dependent variable.
4. Standard Error: The estimate is 3.4. It measures the average distance between the observed
and predicted values.
5. Observations- 15
6. ANOVA table shows the sources of variation in the regression model.
Df- Degrees of freedom. It is 1 for regression and 13 for residuals.
SS- Sum of squares. Represents the variation explained by the regression model and the
unexplained variation of the residuals.
MS- Mean square. Found by dividing SS by Df.
F- The F statistic is the ratio of the mean square values and tests the overall significance
of the regression model. In this case the associated p-value (Significance F) is small,
indicating that the model is statistically significant.
7. Intercept: The estimated intercept is 26.917, meaning that when x = 0 the ___ is
26.917. It represents the predicted value of the dependent variable when all
independent variables are zero.
8. Overweight: The estimated coefficient for the independent variable "Overweight" is
0.794. It indicates that for each unit increase in the "Overweight" variable, the
predicted value of the dependent variable increases by 0.794.
9. This table shows the predicted values, residuals, and observed values for each
observation in the dataset.
Predicted Television: The predicted values of the dependent variable based on the
regression model.
Residuals: The differences between the observed values and the predicted values.
Positive values indicate that the actual values are higher than predicted, while
negative values indicate the opposite.
10. Based on this information, you can conclude that the regression model is statistically
significant and explains a significant portion of the variation in the dependent
variable. The "Overweight" variable has a positive and significant effect on the
predicted value of the dependent variable.
11. The intercept is 26.917, the predicted value of the dependent variable when all
independent variables are zero; the slope coefficient (0.794) gives the predicted change
in the dependent variable for each one-unit increase in the independent variable.
OR
Linear Regression equation: ŷ (predicted value of y) = a (Y-intercept, the level of y when
x = 0) + bX (slope, i.e., the rate of increase or decrease in Y for each one-unit increase in X)
SSE- Sum of squares for error is the minimized sum of squared deviations.
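The least squares estimates, SSE, and R² described above can be computed by hand for a tiny data set; the (x, y) pairs below are illustrative, not the xr16-04 data from the notes:

```python
import statistics

# Illustrative (x, y) pairs (not the xr16-04 data)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Least squares estimates: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx          # slope: change in y per one-unit increase in x
b0 = y_bar - b1 * x_bar   # intercept: predicted y when x = 0

y_hat = [b0 + b1 * xi for xi in x]                     # predicted values
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # minimized sum of squared errors
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
r_squared = 1 - sse / sst  # coefficient of determination: share of variation explained

print(f"y_hat = {b0:.3f} + {b1:.3f}x, SSE = {sse:.4f}, R^2 = {r_squared:.4f}")
```

No other straight line through these points gives a smaller SSE: that minimization is exactly what "least squares" means.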
CORRELATION
It is a measure of linear association.
Karl Pearson’s correlation coefficient
Spearman’s rank correlation