Models Assignment
1. INTRODUCTION
An econometrics model specifies the statistical relationship that is believed to hold between the
various economic quantities pertaining to a particular economic phenomenon.
Econometrics puts numerical values on economic relationships by integrating economic theory with mathematical and statistical methods. In plain language, econometrics expresses economic theories in mathematical form and then combines them with empirical data. With econometric methods it becomes possible to obtain parameter values, that is, estimates of the coefficients in the relationships suggested by economics and mathematics.
An econometric model consists of a set of equations describing the behaviour of the quantities involved. These equations are derived from the economic model and have two parts: observed variables and disturbances.
An econometric model then is a set of joint probability distributions to which the true joint
probability distribution of the variables under study is supposed to belong. In the case in which
the elements of this set can be indexed by a finite number of real-valued parameters, the model is
called a parametric model; otherwise it is a nonparametric or semiparametric model. A large part
of econometrics is the study of methods for selecting models, estimating them, and carrying out
inference on them.
2. ECONOMETRICS MODELS
An econometric model is built from equations that describe how economic forces affect the outcomes of interest, for example in business management. Each equation combines observed variables with a disturbance term, and the equations are derived from an explicit economic model rather than imposed as bare assumptions.
3. LINEAR REGRESSION MODEL
In linear regression, the relationships are modeled using linear predictor functions whose
unknown model parameters are estimated from the data. Such models are called linear models.
Most commonly, the conditional mean of the response given the values of the explanatory
variables (or predictors) is assumed to be an affine function of those values; less commonly, the
conditional median or some other quantile is used. Like all forms of regression analysis, linear
regression focuses on the conditional probability distribution of the response given the values of
the predictors, rather than on the joint probability distribution of all of these variables, which is
the domain of multivariate analysis.
Linear regression has many practical uses. Most applications fall into one of the following two
broad categories:
1. If the goal is error reduction in prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables.
2. If the goal is to explain variation in the response variable that can be attributed to variation in
the explanatory variables, linear regression analysis can be applied to quantify the strength of
the relationship between the response and the explanatory variables, and in particular to
determine whether some explanatory variables may have no linear relationship with the
response at all, or to identify which subsets of explanatory variables may contain redundant
information about the response.
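To make the prediction use case concrete, here is a minimal sketch of fitting a predictive linear model with the statsmodels library; the simulated data and variable names are assumptions chosen purely for illustration and are not part of the assignment.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: the response depends linearly on two explanatory variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # explanatory variables
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

X_design = sm.add_constant(X)                  # add the intercept column
model = sm.OLS(y, X_design).fit()              # ordinary least squares fit

print(model.params)                            # estimated intercept and slopes
print(model.predict(X_design[:5]))             # fitted values for the first five rows
```

The same fitted model can then be used to predict the response for new observations of the explanatory variables.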
The linear regression model has five key assumptions:
Linear relationship between the dependent and independent variables.
Multivariate normality
No or little multicollinearity
No auto-correlation
Homoscedasticity, i.e. the variability of the response does not change as the values of the predictors change.
Additionally, the basic assumption of the linear regression model is that of a linear relationship
between the dependent and independent variables. We also assume that the errors follow a
normal distribution, and that the observations are independent of each other.
If one or more of these assumptions are violated, then the results of our linear regression may be
unreliable or even misleading.
Linear Regression is a very simple algorithm that can be implemented very easily to give
satisfactory results. Furthermore, these models can be trained easily and efficiently even on
systems with relatively low computational power when compared to other complex algorithms.
Linear regression has a considerably lower time complexity when compared to some of the other
machine learning algorithms. The mathematical equations of Linear regression are also fairly
easy to understand and interpret. Hence Linear regression is very easy to master.
Sensitive to outliers
Outliers of a data set are anomalies or extreme values that deviate from the other data points of
the distribution. Data outliers can damage the performance of a machine learning model
drastically and can often lead to models with low accuracy.
4. PROBIT MODEL
In probability theory and statistics, the Probit function is the quantile function associated with the
standard normal distribution. It has applications in data analysis and machine learning, in
particular exploratory statistical graphics and specialized regression modeling of binary response variables.
Largely because of the central limit theorem, the standard normal distribution plays a fundamental role in probability theory and statistics. Since the standard normal distribution places 95% of its probability between −1.96 and 1.96 and is symmetric around zero, it follows that probit(0.975) = 1.96 and probit(0.025) = −1.96.
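As a quick, illustrative check (not part of the original text), the probit function is simply the standard normal quantile function, available in SciPy as norm.ppf; the values below reproduce the ±1.96 figures quoted above.

```python
from scipy.stats import norm

# probit(p) is the standard normal quantile function (inverse CDF)
print(norm.ppf(0.975))   # approximately  1.96
print(norm.ppf(0.025))   # approximately -1.96
print(norm.cdf(1.96))    # approximately  0.975, confirming the inverse relationship
```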
5. Tobit model
In statistics, a tobit model is any of a class of regression models in which the observed range of
the dependent variable is censored in some way.
Tobin's idea was to modify the likelihood function so that it reflects the unequal sampling
probability for each observation depending on whether the latent dependent variable fell above
or below the determined threshold. For a sample that, as in Tobin's original case, was censored
from below at zero, the sampling probability for each non-limit observation is simply the height
of the appropriate density function. For any limit observation, it is the cumulative distribution,
i.e. the integral below zero of the appropriate density function. The tobit likelihood function is thus a mixture of densities and cumulative distribution functions. For a data set with N observations, the likelihood function for a type I tobit censored from below at zero is

L(\beta, \sigma) = \prod_{j : y_j > 0} \frac{1}{\sigma} \varphi\left(\frac{y_j - X_j \beta}{\sigma}\right) \prod_{j : y_j = 0} \Phi\left(\frac{-X_j \beta}{\sigma}\right),

where \varphi and \Phi denote the density and cumulative distribution function of the standard normal distribution. The log-likelihood is obtained by taking the logarithm of this product.
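The likelihood above can be coded directly. The following is a minimal sketch (assuming left-censoring at zero, as in Tobin's original case) of the type I tobit negative log-likelihood and its maximization with NumPy/SciPy; it is an illustration, not a production estimator.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def tobit_negloglik(params, y, X):
    """Negative log-likelihood of a type I tobit, left-censored at zero."""
    beta, sigma = params[:-1], params[-1]
    if sigma <= 0:
        return np.inf
    xb = X @ beta
    censored = y <= 0
    # Non-limit observations contribute the normal density;
    # limit observations contribute the probability mass at or below zero.
    ll_uncens = norm.logpdf(y[~censored], loc=xb[~censored], scale=sigma)
    ll_cens = norm.logcdf(-xb[censored] / sigma)
    return -(ll_uncens.sum() + ll_cens.sum())

# Illustrative usage with simulated, left-censored data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y_star = X @ np.array([0.5, 1.0]) + rng.normal(size=500)   # latent variable
y = np.maximum(y_star, 0.0)                                # censoring at zero

start = np.array([0.0, 0.0, 1.0])                          # [beta0, beta1, sigma]
res = minimize(tobit_negloglik, start, args=(y, X), method="Nelder-Mead")
print(res.x)                                               # estimates of beta and sigma
```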
Reparameterization
The log-likelihood as stated above is not globally concave, which complicates maximum likelihood estimation. Olsen suggested the simple reparameterization in terms of β/σ and 1/σ, under which the log-likelihood becomes globally concave.
Consistency
If the relationship parameter β is estimated by simply regressing the observed values of y on the explanatory variables, the resulting estimator is inconsistent; the tobit maximum likelihood estimator, by contrast, is consistent when the model is correctly specified.
Interpretation
The coefficient β should not be interpreted as the marginal effect of an explanatory variable on the observed outcome y; it measures the effect on the latent variable y*.
Type II
Type II tobit models introduce a second latent variable.
In Type I tobit, the latent variable absorbs both the process of participation and the outcome of interest. Type II tobit allows the process of participation (selection) and the outcome of interest to be independent, conditional on observable data.
The Heckman selection model falls into the Type II tobit class, which is sometimes called Heckit after James Heckman. Further variants (Types III to V) add further observed and latent variables to this framework.
Applications
Tobit models have, for example, been applied to estimate factors that affect grant receipt, including financial transfers distributed to sub-national governments that may apply for these grants. In these cases, grant recipients cannot receive negative amounts, and the data are thus left-censored. For instance, Dahlberg and Johansson (2002) analyse a sample of 115 municipalities (42 of which received a grant). Dubois and Fattore (2011) use a tobit model to investigate the role of various factors in European Union fund receipt by Polish sub-national governments.
Logit model
In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression (or logit regression) estimates the parameters of a logistic model (the coefficients in the linear combination).
Formally, in binary logistic regression there is a single binary dependent variable, coded by an
indicator variable, where the two values are labeled "0" and "1", while the independent variables
can each be a binary variable (two classes, coded by an indicator variable) or a continuous
variable (any real value).
Binary variables are widely used in statistics to model the probability of a certain class or event
taking place, such as the probability of a team winning, of a patient being healthy, etc. (see §
Applications), and the logistic model has been the most commonly used model for binary
regression since about 1970.
Applications
Logistic regression is used in various fields, including machine learning, most medical fields,
and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is
widely used to predict mortality in injured patients, was originally developed by Boyd et al.
using logistic regression.[6] Many other medical scales used to assess severity of a patient have
been developed using logistic regression. Logistic regression may be used to predict the risk of
developing a given disease (e.g. diabetes; coronary heart disease), based on observed
characteristics of the patient (age, sex, body mass index, results of various blood tests, etc.).
Another example might be to predict whether a Nepalese voter will vote for the Nepali Congress, the Communist Party of Nepal, or any other party, based on age, income, sex, race, state of residence, votes in previous elections, etc.
Model
Logistic regression is a method that we can use to fit a regression model when the response
variable is binary.
Assumptions of binary logistic regression
Before fitting a model to a dataset, logistic regression makes the following assumptions:
Assumption #1: The Response Variable is Binary
Logistic regression assumes that the response variable only takes on two possible outcomes.
Some examples include:
Yes or No
Male or Female
Pass or Fail
Drafted or Not Drafted
Malignant or Benign
Assumption #2: The Observations are Independent
Logistic regression assumes that the observations in the dataset are independent of each other.
Assumption #3: There is No Severe Multicollinearity Among Explanatory Variables
Logistic regression assumes that there is no severe multicollinearity among the explanatory variables.
Assumption #4: There are No Extreme Outliers
Logistic regression assumes that there are no extreme outliers or influential observations in the
dataset.
Assumption #5: There is a Linear Relationship Between Explanatory Variables and the
Logit of the Response Variable
Logistic regression assumes that there exists a linear relationship between each explanatory variable and the logit of the response variable. Recall that the logit is defined as logit(p) = log(p / (1 − p)), where p is the probability of the event.
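For illustration, the logit transform and a binary logistic regression fit can be sketched as follows; the simulated data and variable names are assumptions made only to show the mechanics.

```python
import numpy as np
import statsmodels.api as sm

# logit(p) = log(p / (1 - p))
def logit(p):
    return np.log(p / (1.0 - p))

# Simulated binary response depending on one explanatory variable
rng = np.random.default_rng(2)
x = rng.normal(size=300)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))   # true probabilities
y = rng.binomial(1, p)                        # observed 0/1 outcomes

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=False)          # binary logistic regression
print(fit.params)                             # coefficients on the log-odds scale
print(fit.predict(X)[:5])                     # predicted probabilities
```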
A binary logit model, then, is a statistical model that expresses the log-odds of an event as a linear combination of one or more independent variables. Binary logit models have the following main disadvantages:
If the number of observations is smaller than the number of features, Logistic Regression should not be used; otherwise, it may lead to overfitting.
The major limitation of Logistic Regression is the assumption of linearity between the
dependent variable and the independent variables. Non-linear problems can’t be solved with
logistic regression because it has a linear decision surface. Linearly separable data is rarely found
in real-world scenarios.
It can only be used to predict discrete functions. Hence, the dependent variable of Logistic
Regression is bound to the discrete number set.
It is tough to obtain complex relationships using logistic regression. More powerful and complex algorithms such as Neural Networks can easily outperform this algorithm.
Multinomial logit model
Multinomial logistic regression is used when the dependent variable can take more than two discrete outcomes, for example:
Which major will a college student choose, given their grades, stated likes and dislikes, etc.?
Which blood type does a person have, given the results of various diagnostic tests?
In a hands-free mobile phone dialing application, which person's name was spoken, given
various properties of the speech signal?
Which candidate will a person vote for, given particular demographic characteristics?
Which country will a firm locate an office in, given the characteristics of the firm and of the
various candidate countries?
Introduction
There are multiple equivalent ways to describe the mathematical model underlying multinomial
logistic regression. This can make it difficult to compare different treatments of the subject in
different texts. The article on logistic regression presents a number of equivalent formulations of
simple logistic regression, and many of these have analogues in the multinomial logit model.
The idea behind all of them, as in many other statistical classification techniques, is to construct a linear predictor function that constructs a score from a set of weights that are linearly combined with the explanatory variables (features) of a given observation using a dot product:

score(X_i, k) = \beta_k \cdot X_i,

where X_i is the vector of explanatory variables describing observation i, \beta_k is a vector of weights (regression coefficients) corresponding to outcome k, and score(X_i, k) is the score associated with assigning observation i to category k. In discrete choice theory, where observations represent people and outcomes represent choices, the score is considered the utility associated with person i choosing outcome k. The predicted outcome is the one with the highest score.
Assumptions
The multinomial logistic model assumes that data are case-specific; that is, each independent
variable has a single value for each case. As with other types of regression, there is no need for
the independent variables to be statistically independent from each other (unlike, for example, in
a naive Bayes classifier); however, collinearity is assumed to be relatively low, as it becomes
difficult to differentiate between the impact of several variables if this is not the case.[5]
If the multinomial logit is used to model choices, it relies on the assumption of independence of
irrelevant alternatives (IIA), which is not always desirable. This assumption states that the odds
of preferring one class over another do not depend on the presence or absence of other
"irrelevant" alternatives. For example, the relative probabilities of taking a car or bus to work do
not change if a bicycle is added as an additional possibility. This allows the choice of K
alternatives to be modeled as a set of K-1 independent binary choices, in which one alternative is
chosen as a "pivot" and the other K-1 are compared against it, one at a time.
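A minimal sketch of a multinomial logit fit is shown below, using MNLogit from statsmodels on simulated data; the categories and variables are illustrative assumptions. The first category serves as the "pivot" (baseline), and coefficients are reported for the remaining K-1 categories relative to it.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: three outcome categories (0, 1, 2) and two explanatory variables
rng = np.random.default_rng(3)
X = rng.normal(size=(600, 2))
scores = np.column_stack([
    np.zeros(600),                          # category 0 is the baseline ("pivot")
    0.8 * X[:, 0] - 0.3 * X[:, 1],
    -0.5 * X[:, 0] + 1.0 * X[:, 1],
])
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

fit = sm.MNLogit(y, sm.add_constant(X)).fit(disp=False)
print(fit.params)      # one column of coefficients per non-baseline category
```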
Strengths
Logistic Regression is one of the simplest machine learning algorithms and is easy to implement
yet provides great training efficiency in some cases. Also due to these reasons, training a model
with this algorithm doesn't require high computation power.
The predicted parameters (trained weights) give inference about the importance of each feature.
The direction of association i.e. positive or negative is also given. So we can use logistic
regression to find out the relationship between the features.
This algorithm allows models to be updated easily to reflect new data, unlike decision trees or
support vector machines. The update can be done using stochastic gradient descent.
Weaknesses
It is difficult to capture complex relationships using logistic regression. More powerful and
complex algorithms such as Neural Networks can easily outperform this algorithm.
The training features are known as independent variables. Logistic Regression requires
moderate or no multicollinearity between independent variables. This means if two
independent variables have a high correlation, only one of them should be used. Repetition of
information could lead to wrong training of parameters (weights) during minimizing the cost
function. Multicollinearity can be removed using dimensionality reduction techniques.
In Linear Regression, the independent and dependent variables should be related linearly. Logistic Regression, however, requires that the independent variables be linearly related to the log odds, log(p/(1-p)).
Only important and relevant features should be used to build a model otherwise the
probabilistic predictions made by the model may be incorrect and the model's predictive value
may degrade.
6. Propensity score matching model
In the statistical analysis of observational data, propensity score matching (PSM) is a
statistical matching technique that attempts to estimate the effect of a treatment, policy, or other
intervention by accounting for the covariates that predict receiving the treatment. PSM attempts
to reduce the bias due to confounding variables that could be found in an estimate of the
treatment effect obtained from simply comparing outcomes among units that received the
treatment versus those that did not.
The "propensity" describes how likely a unit is to have been treated, given its covariate values.
The stronger the confounding of treatment and covariates, and hence the stronger the bias in the
analysis of the naive treatment effect, the better the covariates predict whether a unit is treated or
not. By having units with similar propensity scores in both treatment and control, such
confounding is reduced.
PSM is intended for causal inference and confounding bias in non-experimental settings in which: (a) few units in the non-treatment comparison group are comparable to the treatment units; and (b) selecting a subset of comparison units similar to the treatment units is difficult because units must be compared across a high-dimensional set of pretreatment characteristics.
Weaknesses of PSM
PSM has been shown to increase model "imbalance, inefficiency, model dependence, and bias,"
which is not the case with most other matching methods. The insights behind the use of matching
still hold but should be applied with other matching methods; propensity scores also have other
productive uses in weighting and doubly robust estimation.
One disadvantage of PSM is that it only accounts for observed (and observable) covariates and
not latent characteristics. Factors that affect assignment to treatment and outcome but that cannot
be observed cannot be accounted for in the matching procedure. As the procedure only controls
for observed variables, any hidden bias due to latent variables may remain after matching.
Another issue is that PSM requires large samples, with substantial overlap between treatment and
control groups.
Strengths of PSM
The key advantages of PSM were, at the time of its introduction, that by using a linear
combination of covariates for a single score, it balances treatment and control groups on a large
number of covariates without losing a large number of observations. If units in the treatment and
control were balanced on a large number of covariates one at a time, large numbers of
observations would be needed to overcome the "dimensionality problem" whereby the
introduction of a new balancing covariate increases the minimum necessary number of
observations in sample geometrically.
Main theorems
The propensity score is a balancing score: conditional on the propensity score, the distribution of observed covariates is the same for treated and control units.
Any score that is 'finer' than the propensity score is a balancing score.
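A bare-bones illustration of propensity score matching follows: a logistic regression estimates each unit's propensity score from its covariates, and each treated unit is paired with the control whose score is closest. This is a sketch under simplifying assumptions (1-to-1 nearest-neighbour matching with replacement, no caliper), not a complete PSM workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3))                        # observed covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # treatment depends on covariates

# Step 1: estimate propensity scores P(treated | covariates)
ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score
treated_idx = np.where(treat == 1)[0]
control_idx = np.where(treat == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
_, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
matched_controls = control_idx[matches.ravel()]

# Outcomes for treated_idx and matched_controls can now be compared.
print(len(treated_idx), "treated units matched to",
      len(set(matched_controls)), "distinct control units")
```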
Ordinal logistic regression
Ordinal logistic regression is used when the response variable is ordinal, that is, when its categories have a natural ordering.
Interpretable coefficients. As with many other regression models, ordinal logistic regression models provide highly interpretable coefficients that explain the relationship between your features and your outcome variable. These coefficients often come along with confidence intervals and statistical tests for even better interpretability.
Not available in common libraries. Another downside of ordinal logistic regression is that it is
a relatively niche model that is not available in all common machine learning libraries. Ordinal
logistic regression, and regression models in general, tend to be more commonly used in fields
where inference and classical statistics are king. That means that ordinal logistic regression
models are more likely to be implemented in languages and programs that favor classical
statistics such as SAS and Stata.
General regression downsides. Ordinal logistic regression is subject to many of the same
pitfalls that other regression models like linear regression and logistic regression are. This means
that ordinal logistic regression models are also easily thrown off by things like outliers,
correlated features, non-specified interactions, and missing data.
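That said, an ordered-logit implementation does exist in the statsmodels library (the OrderedModel class in recent versions), consistent with the point that such models live mainly in classical-statistics tooling. The sketch below uses simulated data and assumed variable names purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Simulated ordinal outcome with three ordered levels: low < medium < high
rng = np.random.default_rng(5)
x = rng.normal(size=400)
latent = 1.0 * x + rng.logistic(size=400)
y = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 1, np.inf],
                     labels=["low", "medium", "high"], ordered=True))

model = OrderedModel(y, x.reshape(-1, 1), distr="logit")
fit = model.fit(method="bfgs", disp=False)
print(fit.params)      # slope plus threshold (cut-point) parameters
```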
Heckman selection model
The model obtains formal identification from the normality assumption when the same
covariates appear in the selection equation and the equation of interest, but identification will be
tenuous unless there are many observations in the tails where there is substantial nonlinearity in
the inverse mills ratio (IMR). Generally, an exclusion restriction is required to obtain credible
estimates.
The Heckman selection model assumes that:
1. the error terms of the selection equation and the main equation are correlated and jointly normally distributed,
2. the explanatory variables in the selection equation are independent of the error term, and
3. the explanatory variables in the main equation are independent of the error term.
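A common estimation strategy for this model is Heckman's two-step procedure: first a probit model for the selection equation, then an outcome regression on the selected sample that adds the inverse Mills ratio as an extra regressor. The sketch below uses simulated data and assumed variable names; it illustrates the mechanics rather than reproducing any particular textbook implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 2000
z = rng.normal(size=n)                          # variable excluded from the outcome equation
x = rng.normal(size=n)
u = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
selected = (0.5 + 1.0 * z + u[:, 0]) > 0        # selection equation
y = np.where(selected, 1.0 + 2.0 * x + u[:, 1], np.nan)   # outcome observed only if selected

# Step 1: probit for the selection equation
Zmat = sm.add_constant(np.column_stack([z, x]))
probit_fit = sm.Probit(selected.astype(int), Zmat).fit(disp=False)
xb = Zmat @ probit_fit.params
imr = norm.pdf(xb) / norm.cdf(xb)               # inverse Mills ratio

# Step 2: outcome regression on the selected sample, adding the IMR
Xmat = sm.add_constant(np.column_stack([x[selected], imr[selected]]))
ols_fit = sm.OLS(y[selected], Xmat).fit()
print(ols_fit.params)                           # intercept, slope on x, coefficient on the IMR
```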
Double-hurdle model
Double-hurdle models are used with dependent variables that take on the endpoints of an interval with positive probability and that are continuously distributed over the interior of the interval. For example, you observe the amount of alcohol individuals consume over a fixed period of time. The distribution of the amounts will be roughly continuous over positive values, but there will be a "pile up" at zero, which is the corner solution to the consumption problem the individuals face; no individual can consume a negative amount of alcohol.
Suppose individuals make their consumption decisions in two steps. First, the individual determines whether he or she wants to participate in the market. This is called the participation decision. Then the individual determines an optimal consumption amount (which may be zero) given his or her circumstances. This is called the quantity decision.
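To make the two-part decision structure concrete, here is a rough sketch of the log-likelihood of an independent double-hurdle (Cragg-type) model, in which a probit-style participation equation and a normal quantity equation are combined. The parameterization, variable names, and independence assumption are simplifying assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

def double_hurdle_negloglik(params, y, Z, X):
    """Negative log-likelihood of an independent double-hurdle model.

    Z drives the participation decision, X the quantity decision.
    """
    kz, kx = Z.shape[1], X.shape[1]
    gamma, beta, sigma = params[:kz], params[kz:kz + kx], params[-1]
    if sigma <= 0:
        return np.inf
    zg, xb = Z @ gamma, X @ beta
    zero = y <= 0
    # y = 0: the participation hurdle or the quantity hurdle (or both) fails
    ll_zero = np.log(1.0 - norm.cdf(zg[zero]) * norm.cdf(xb[zero] / sigma))
    # y > 0: both hurdles are passed, and the quantity has a normal density
    ll_pos = (norm.logcdf(zg[~zero])
              + norm.logpdf(y[~zero], loc=xb[~zero], scale=sigma))
    return -(ll_zero.sum() + ll_pos.sum())
```

This function can be handed to scipy.optimize.minimize in the same way as the tobit likelihood sketched earlier.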
Multiple linear regression
Multiple linear regression models the dependent variable as a linear function of several independent variables:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon,

where Y is the dependent variable, X_1, ..., X_p are the independent (explanatory) variables, \beta_0, ..., \beta_p are the regression coefficients, and \varepsilon is the error term.
Assumptions of Multiple Linear Regression
1. A linear relationship between the dependent and independent variables
The first assumption of multiple linear regression is that there is a linear relationship between the
dependent variable and each of the independent variables. The best way to check the linear
relationships is to create scatterplots and then visually inspect the scatterplots for linearity. If the
relationship displayed in the scatterplot is not linear, then the analyst will need to run a non-
linear regression or transform the data using statistical software, such as SPSS.
2. The independent variables are not highly correlated with each other
The data should not show multicollinearity, which occurs when the independent variables
(explanatory variables) are highly correlated. When independent variables show
multicollinearity, there will be problems figuring out the specific variable that contributes to the
variance in the dependent variable. The best method to test for the assumption is the Variance
Inflation Factor method.
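As an illustration, the Variance Inflation Factor check can be carried out with the variance_inflation_factor helper in statsmodels; the data below are simulated only to show the mechanics, and the rule of thumb quoted in the comment is a common convention rather than a hard threshold.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # deliberately correlated with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# A VIF well above 5-10 is commonly taken to signal problematic multicollinearity.
for i in range(1, X.shape[1]):               # skip the constant column
    print(f"VIF for variable {i}: {variance_inflation_factor(X, i):.2f}")
```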
Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.
Assumptions of simple linear regression
Simple linear regression is a parametric test, meaning that it makes certain assumptions about the
data. These assumptions are:
1. Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t
change significantly across the values of the independent variable.
2. Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no hidden relationships among observations.
3. Normality: The data follows a normal distribution.
For a simple linear regression, you can simply plot the observations on the x and y axes and then add the fitted regression line, as sketched below.
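A minimal sketch of such a plot, with simulated data and assumed variable names, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.7 * x + rng.normal(scale=1.0, size=100)

slope, intercept = np.polyfit(x, y, deg=1)    # least-squares line y = intercept + slope * x

plt.scatter(x, y, s=12, label="observations")
plt.plot(np.sort(x), intercept + slope * np.sort(x), color="red",
         label=f"y = {intercept:.2f} + {slope:.2f} x")
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.legend()
plt.show()
```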
Advantages of Linear Regression
1. Linear Regression performs well when the dataset is linearly separable. We can use it to find the nature of the relationship among the variables.
2. Linear Regression is easy to implement and interpret, and very efficient to train.
3. Linear Regression is prone to over-fitting, but this can easily be avoided using dimensionality reduction techniques, regularization (L1 and L2) and cross-validation.
Disadvantages of Linear Regression
1. The main limitation of Linear Regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, the data are rarely linearly separable; the assumed straight-line relationship between the dependent and independent variables is often incorrect.
2. Prone to noise and overfitting: If the number of observations is smaller than the number of features, Linear Regression should not be used, otherwise it may overfit because it starts fitting the noise while building the model.
3. Prone to outliers: Linear regression is very sensitive to outliers (anomalies), so outliers should be analyzed and removed before applying Linear Regression to the dataset.
4. Prone to multicollinearity: Before applying Linear Regression, multicollinearity should be removed (using dimensionality reduction techniques) because the model assumes that there is no relationship among the independent variables.
References
Oxford English Dictionary, 3rd ed., s.v. "probit" (article dated June 2007).
Bliss, C. I. (1934). "The Method of Probits". Science. 79 (2037): 38–39. Bibcode:1934Sci....79...38B. doi:10.1126/science.79.2037.38. PMID 17813446. "These arbitrary probability units have been called 'probits'."
Agresti, Alan (2015). Foundations of Linear and Generalized Linear Models. New York: Wiley.
pp. 183–186. ISBN 978-1-118-73003-4.
Aldrich, John H.; Nelson, Forrest D.; Adler, E. Scott (1984). Linear Probability, Logit, and
Probit Models. Sage. pp. 48–65. ISBN 0-8039-2133-
Tolles, Juliana; Meurer, William J (2016). "Logistic Regression Relating Patient Characteristics
to Outcomes". JAMA. 316 (5): 533–4. doi:10.1001/jama.2016.7653. ISSN 0098-7484. OCLC
6823603312. PMID 27483067.
Hosmer, David W.; Lemeshow, Stanley (2000). Applied Logistic Regression (2nd ed.). Wiley. ISBN 978-0-471-35632-5.
Cramer 2002, pp. 10–11.
Hayashi, Fumio (2000). Econometrics. Princeton: Princeton University Press. pp. 518–521.
ISBN 0-691-01018-8.
Goldberger, Arthur S. (1964). Econometric Theory. New York: J. Wiley. pp. 253–55. ISBN
9780471311010.
Tobin, James (1958). "Estimation of Relationships for Limited Dependent Variables". Econometrica. 26 (1): 24–36. doi:10.2307/1907382. JSTOR 1907382.
McCullagh, Peter (1980). "Regression Models for Ordinal Data". Journal of the Royal Statistical
Society. Series B (Methodological). 42 (2): 109–142. JSTOR 2984952.
"rologit.pdf" (PDF). Stata.
Greene, William H. (2012). Econometric Analysis (Seventh ed.). Boston: Pearson Education. pp.
824–827. ISBN 978-0-273-75356-8.
Freedman, David A. (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 26. "A simple regression equation has on the right hand side an intercept and an explanatory variable with a slope coefficient. A multiple regression equation has two or more explanatory variables on the right hand side, each with its own slope coefficient."
Rencher, Alvin C.; Christensen, William F. (2012), "Chapter 10, Multivariate regression –
Section 10.1, Introduction", Methods of Multivariate Analysis, Wiley Series in Probability and
Statistics, vol. 709 (3rd ed.), John Wiley & Sons, p. 19, ISBN 9781118391679.