10 Brozyna Co
net/publication/301343044
3 authors:
Grzegorz Mentel
Rzeszów University of Technology
All content following this page was uploaded by Grzegorz Mentel on 29 April 2016.
TRANSFORMATIONS IN BUSINESS & ECONOMICS
© Vilnius University, 2002-2016
© Brno University of Technology, 2002-2016
© University of Latvia, 2002-2016

Brozyna, J., Mentel, G., Pisula, T. (2016), „Statistical Methods of the Bankruptcy Prediction in the Logistics Sector in Poland and Slovakia”, Transformations in Business & Economics, Vol. 15, No 1 (37), pp. 93-114.
Received: December, 2014
1st Revision: January, 2015
2nd Revision: February, 2015
Accepted: April, 2015

ABSTRACT. The fundamental subject matter of this publication is the analysis of bankruptcy in the context of possible early threat signals. The research aims to verify the validity of the described models in predicting possible bankruptcy signals and in evaluating the financial condition of TSL sector (transport, forwarding, logistics) entities from Poland and Slovakia. To predict the bankruptcy risk of logistics companies, the following statistical bankruptcy classification models were used: classic linear discriminant analysis and logistic regression. In addition, predictions based on classification trees and on the nearest-neighbour method were applied. Empirical verification of the correct classification rates of these groups of statistical bankruptcy analysis methods showed that they achieve high bankruptcy prediction quality. The presented approaches make it fairly easy to evaluate the bankruptcy threat for a given group of entities. An important strength of the results is that the research sample was divided into a so-called learning group, on which the model parameters were estimated, and a test sample, used to assess classification accuracy; all predictions were made for horizons of both one and two years before bankruptcy.
Introduction
During recurring financial crises or periods of lesser or greater market turmoil, the vision of a stable company whose income grows year after year often cannot be realised. Many companies struggle with problems ranging from difficulties in obtaining a bank loan to bad debts. Some of these situations eventually reveal the first symptoms of financial distress, which over time may lead to bankruptcy.
Bankruptcy does not appear suddenly; it is a process that unfolds over a longer period, during which the worsening financial condition of an entity can be observed. Identifying the symptoms of an impending company crisis therefore helps to detect it in advance. This matters because it gives the company time to take appropriate measures.
Research on bankruptcy is therefore very important, and measures and methods that allow possible threats to be predicted are much desired. However, the research and the predictive models applied should not introduce excessive aversion to the company's activities. Publications of this type and attempts to implement such models can give managers a tool for making better-informed decisions.
In this article, four types of models from both the statistical and non-statistical groups are described. Logistics sector entities operating in the Podkarpacie region and in the Slovak market were analysed. Companies from this sector are of interest to the authors because of the sector's importance for the economy of the region and the country. The companies were divided into two groups: so-called healthy companies, not threatened with bankruptcy, and so-called ill companies, for which bankruptcy has been declared or insolvency proceedings are under way. The parameters of each model were estimated on a learning sample, whereas their verification was conducted on the companies assigned to the test sample.
The statistical techniques most commonly used to study company bankruptcy are discriminant analysis, logit models and decision trees. Nowadays they are rarely used as the sole model or research method; rather, they serve as benchmark models for non-statistical models, or as components of hybrid approaches. The division of research models used to predict bankruptcy is presented in Figure 1.
An important contribution to the application of statistical methods to bankruptcy prediction is the work of Altman et al. (1977), in which the authors introduced a new model for classifying bankrupt companies, named "Zeta analysis". The model was estimated on data from 111 companies and included 7 diagnostic variables. Its overall rate of correct classification was 96% for data one year before bankruptcy, but only 70% for data five years before bankruptcy.
Statistical methods for evaluating bankruptcy risk can also be found in the works of many other authors. Martin (1977) estimated a logistic regression model for predicting bank failures on the basis of data from the US Federal Reserve Bank.
Ohlson (1980) proposed a logistic regression model to assess a company's bankruptcy risk. Financial data were taken from the databases of several financial institutions (e.g. Moody's, COMPUSTAT). The rate of correct classification was more than 96% for data one year before bankruptcy and more than 95% for data two years before bankruptcy.
Karels and Prakash (1987) studied the application of linear discriminant analysis models to bankruptcy risk with respect to whether the required assumption of normally distributed financial indicators (the bankruptcy predictors) is fulfilled. Their model, estimated on a random sample of 50 companies (data from the COMPUSTAT database), correctly classified 96% of non-bankrupt companies and 55% of bankrupt companies.
Kolari et al. (2002) introduced an early warning system for the bankruptcy risk of large US banks. The system was based on logistic regression models and achieved overall correct classification rates of 96% and 95% for data one year and two years before bankruptcy, respectively.
Jones and Hensher (2004) presented a so-called mixed logit model to predict company bankruptcy. It was a three-state model: 0 – company not threatened with bankruptcy; 1 – company threatened with bankruptcy; 2 – company that has declared bankruptcy. They showed that their model has better predictive properties than a classic multi-state logit model. Applications of decision trees to bankruptcy classification problems can be found, for example, in Marais et al. (1984) and Frydman et al. (1985).
Bankruptcy risk analysis based on statistical and non-statistical models is applied to companies all over the world. For example, Tseng and Hu (2010) use four techniques (logit model, quadratic interval logit model, backpropagation multi-layer perceptron and radial basis function network) to predict bankrupt and non-bankrupt firms in England. Choi and Lee (2013) present a multi-industry investigation of the bankruptcy of Korean companies using a back-propagation neural network and multivariate discriminant analysis. Fedorova et al. (2013) apply combinations of modern learning algorithms to identify the most effective approach to bankruptcy prediction for Russian manufacturing companies. Korol (2012) compares the effectiveness of twelve models for forecasting the bankruptcy risk of stock exchange companies from Central Europe and Latin America.
Information on bankruptcies of Polish companies was taken from the corporate bankruptcy database of the EMIS (Emerging Markets Information Service) information system.
To predict the bankruptcy of logistics sector companies, 28 financial indicators characterizing the financial condition and management effectiveness of the studied companies were chosen as bankruptcy predictors. The indicators were divided into 5 groups: financial liquidity indicators, profitability indicators, indebtedness and financial leverage indicators, operating effectiveness (proficiency) indicators, and other indicators of the capital and asset structure of a company.
Financial indicator data for the Polish companies were taken from the companies' financial reports. The following financial indicators were chosen for the study:

Liquidity indicators (×100%):
X1 – CURRENT LIQUIDITY INDICATOR: Current assets/Short-term liabilities;
X2 – FAST LIQUIDITY INDICATOR: (Current assets – Stock)/Short-term liabilities;
X3 – LIQUIDITY INDICATOR (KO/SB): Working capital/Balance sheet total = (Current assets – Short-term prepayments and accruals – Short-term liabilities)/Balance sheet total;
X4 – IMMEDIATELY DUE INDICATOR: (Current assets – Stock – Short-term receivables)/Short-term liabilities;
X5 – CASH LIQUIDITY INDICATOR: Cash and cash equivalents/Short-term liabilities.

Profitability indicators (×100%):
X6 – OPERATING PROFIT MARGIN: Operating result (operating profit or loss)/Net sales income;
X7 – PROFITABILITY: Net profit/(Equity capital – Net profit);
X8 – RETURN ON ASSETS (ROA): Net profit/Balance sheet total;
X9 – RETURN ON EQUITY (ROE): Net profit/Equity capital;
X10 – RETURN ON CAPITAL: Net profit/(Total assets – Short-term liabilities);
X11 – NET SALES PROFITABILITY (ROS): Net profit/Net sales income;
X12 – GROSS PROFIT MARGIN: (Net income from sales of goods and products and equivalents – Operating expenses)/Net income from sales of goods and products and equivalents.

Indebtedness and financial leverage indicators (×100%):
X13 – GENERAL DEBT: (Short-term liabilities + Long-term liabilities)/Balance sheet total;
X14 – DEBT TO EQUITY: Total liabilities/Equity capital;
X15 – DEBT: (Equity capital + Long-term liabilities)/Fixed assets;
X16 – ASSETS DEBT: Short-term liabilities/Balance sheet total;
X17 – DEBT: Gross profit/Short-term liabilities;
X18 – DEBT: (Net profit + Depreciation)/Total liabilities;
X19 – LONG-TERM DEBT: Long-term liabilities/Equity capital;
X20 – FINANCIAL LEVERAGE: Total assets/Equity capital;
X21 – LEVERAGE (DEBT/COMPANY VALUE): Total liabilities/(Equity capital + Total liabilities – Cash and cash equivalents).

Operating effectiveness indicators:
X22 – RECEIVABLES TURNOVER [in days]: Average short-term receivables/Net sales income × 360;
X23 – ASSET TURNOVER: Net sales income/Assets × 100%;
X24 – STOCK TURNOVER [in days]: Stock/Net sales income × 360;
X25 – CASH CYCLE: Short-term receivables/Net sales income × 365 + Stock/Operating expenses × 365 – Short-term liabilities (without special funds and other short-term financial liabilities)/Operating expenses (without other operating expenses) × 365.

Indicators of the companies' capital and asset structure (×100%):
X26 – Equity capital/Balance sheet total;
X27 – Fixed assets (without long-term prepayments and accruals)/Balance sheet total;
X28 – Fixed assets/Current assets.
The research samples were created from the collected statistical data. The dependent variable Y was a qualitative dichotomous variable indicating whether a company declared bankruptcy (Y=1 – bankrupt) or was not threatened with bankruptcy (Y=0 – non-bankrupt). The 28 financial indicators characterized above were used as the set of input variables (bankruptcy predictors).
Two research samples were created. The first included bankrupt logistics companies and their matching healthy companies for which statistical data one year before bankruptcy were available (1-year prediction horizon). The second included bankrupt and healthy companies for which statistical data two years before bankruptcy were available (2-year prediction horizon). In each sample, one matching healthy company not threatened with bankruptcy was chosen for each bankrupt company. Healthy companies were selected using ratio analysis, which is a generally accepted standard for assessing company performance and has been used in practice for many years. Through thorough ratio analysis, only those logistics companies whose indicators pointed to a good financial condition and an ability to meet their obligations were selected.
The research sample for data one year before bankruptcy included 33 bankrupt and 33 healthy companies (data one year before bankruptcy were available for only that number of companies), whereas the sample for data two years before bankruptcy included 57 healthy and 57 bankrupt companies. Each research sample was randomly divided into two parts: the learning sample, on which the parameters of the prediction models were estimated, and the test sample, used to assess the rate of correct classification. For the one-year prediction horizon, the learning sample included 47 companies (23 bankrupts and 24 non-bankrupts), and the test sample 19 companies (10 bankrupts and 9 non-bankrupts). For the two-year prediction horizon, the learning sample included 86 logistics companies (43 bankrupts and 43 non-bankrupts), and the test sample 28 companies (14 bankrupts and 14 non-bankrupts).
To examine the influence of the chosen explanatory variables on the explained variable identifying company bankruptcy, a ranking analysis of predictors was conducted. An important consideration when choosing predictors is to keep only those with the best discriminating properties, i.e. those that best distinguish between bankrupt and healthy companies. In practice, the following coefficients can be used to rank predictors by their classifying power: Information Value (IV), Gini and Cramér's V.
The IV coefficient, i.e. the information value of a predictor, is given by the formula:

$$\mathrm{IV} = \sum_{i=1}^{k} \left( \frac{n_i^{NB}}{n_{NB}} - \frac{n_i^{B}}{n_{B}} \right) \ln \frac{n_i^{NB}/n_{NB}}{n_i^{B}/n_{B}}, \qquad (1)$$

where k is the number of attributes (variability intervals) of the examined predictor, $n_i^{NB}$ - the number of healthy companies in the i-th variability interval of the predictor's value, $n_i^{B}$ - the number of bankrupt companies in the i-th variability interval of the predictor's value, $n_{NB}$ - the total number of healthy companies, $n_B$ - the total number of bankrupt companies.
The higher the IV value, the greater the predictive power of the explanatory variable in distinguishing healthy from bankrupt companies. IV values above 0.3 are assumed to indicate strong predictive power, while values below 0.02 indicate a complete lack of predictive power.
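Formula (1) translates directly into code. A minimal Python sketch, assuming the predictor has already been grouped into categories and the per-category counts of healthy (NB) and bankrupt (B) companies are known; the function name and the example counts are hypothetical. Empty cells make the logarithm undefined, and the sketch rejects them rather than applying any smoothing.

```python
import math


def information_value(n_nb, n_b):
    """Information Value (IV) of a categorized predictor, formula (1).

    n_nb[i], n_b[i]: numbers of healthy (NB) and bankrupt (B) companies
    in category i of the predictor (hypothetical inputs).
    """
    if len(n_nb) != len(n_b):
        raise ValueError("count lists must have equal length")
    total_nb, total_b = sum(n_nb), sum(n_b)
    iv = 0.0
    for c_nb, c_b in zip(n_nb, n_b):
        if c_nb == 0 or c_b == 0:
            # ln(0) is undefined; this sketch does not smooth empty cells
            raise ValueError("each category needs both NB and B companies")
        share_nb = c_nb / total_nb
        share_b = c_b / total_b
        iv += (share_nb - share_b) * math.log(share_nb / share_b)
    return iv


# Hypothetical counts for k = 5 categories (24 healthy, 23 bankrupt companies)
print(round(information_value([8, 7, 5, 3, 1], [1, 2, 4, 7, 9]), 2))
```

These hypothetical counts give an IV of roughly 1.79, which the 0.3 threshold quoted above would classify as strong predictive power.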
The Gini coefficient is based on the Lorenz curve (the so-called ROC curve, Receiver Operating Characteristic). It is expressed as a ratio of areas on the ROC curve graph (see Figure 2):

$$\mathrm{Gini} = \frac{A}{A+B} = \frac{A}{0.5} = 2A = 2(0.5 - B) = 1 - 2B = 1 - \sum_{i=0}^{k-1} (y_{i+1} + y_i)(x_{i+1} - x_i), \qquad (2)$$

where k is the number of attributes (variability intervals) of the examined diagnostic variable, $y_i = \sum_{j=1}^{i} n_j^{B}/n_{B}$ - cumulated percent of bankrupts for the i-th attribute value of the variable, $x_i = \sum_{j=1}^{i} n_j^{NB}/n_{NB}$ - cumulated percent of healthy companies for the i-th attribute value of the variable, with $x_0 = y_0 = 0$.
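Formula (2) can be evaluated as a trapezoid sum over the cumulated shares. The Python sketch below is a minimal illustration; ordering the categories from the lowest to the highest bankrupt-to-healthy ratio is an assumption made here so that the curve starts at the origin and the result is non-negative, and the function name and counts are hypothetical.

```python
def gini_from_counts(n_nb, n_b):
    """Gini coefficient of a categorized predictor via formula (2).

    Assumption: categories are ordered from the lowest to the highest
    ratio of bankrupt to healthy companies before accumulating, so the
    curve starts at (x0, y0) = (0, 0) and the result is non-negative.
    """
    total_nb, total_b = sum(n_nb), sum(n_b)

    def bankrupt_ratio(cat):
        c_nb, c_b = cat
        return c_b / c_nb if c_nb else float("inf")

    cats = sorted(zip(n_nb, n_b), key=bankrupt_ratio)
    cum_nb = cum_b = 0
    x_prev = y_prev = 0.0
    twice_b = 0.0  # trapezoid sum, equal to 2B in formula (2)
    for c_nb, c_b in cats:
        cum_nb += c_nb
        cum_b += c_b
        x, y = cum_nb / total_nb, cum_b / total_b  # cumulated shares x_i, y_i
        twice_b += (y + y_prev) * (x - x_prev)
        x_prev, y_prev = x, y
    return 1.0 - twice_b


print(round(gini_from_counts([8, 7, 5, 3, 1], [1, 2, 4, 7, 9]), 3))
```

For these counts the trapezoid sum is 186/552, which gives a Gini coefficient of about 0.663.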
[Figure 2. ROC curve; vertical axis: Y_i (cumulated percent of bankrupts); area A marked on the graph]
Cramér's V coefficient is based on the χ² statistic of the independence test between the 0-1 variable defining company bankruptcy and the examined indicator (predictor) of bankruptcy.
The higher the values of Cramér's V (the closer to 1), the greater the predictive power of the examined indicator in predicting company bankruptcy.
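Cramér's V follows from the χ² statistic of the 2 × k contingency table of company status (healthy, bankrupt) against the predictor's categories, using the standard definition V = sqrt(χ² / (n (min(r, c) - 1))). A minimal Python sketch; the function name and counts are hypothetical, and every category is assumed to contain at least one company.

```python
import math


def cramers_v(n_nb, n_b):
    """Cramér's V for a 2 x k table: rows = healthy (NB) / bankrupt (B),
    columns = categories of the predictor. Assumes no empty column."""
    table = [list(n_nb), list(n_b)]
    n = sum(n_nb) + sum(n_b)
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(n_nb, n_b)]
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed - expected) ** 2 / expected
    # min(rows, columns) - 1 equals 1 for a two-row table with k >= 2 categories
    scale = min(len(table), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * scale))


print(round(cramers_v([8, 7, 5, 3, 1], [1, 2, 4, 7, 9]), 2))
```

For a two-row table the formula reduces to sqrt(χ²/n), so a predictor that separates the two groups perfectly reaches V = 1 and an independent one gives V = 0.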
Tables 1 and 2 present the values of the predictor-ranking measures, ordered by the information value (IV) coefficient, for the learning samples (one- and two-year bankruptcy prediction horizons).
Table 1. Predictor-ranking measures for the learning sample, data one year before the bankruptcy period
Predictor   IV     Cramér's V   Gini
X16         2.64   0.7          0.75
X14         2.47   0.68         0.01
X11         2.29   0.66         0.72
X26         2.06   0.75         0.78
X18         1.93   0.75         0.82
X15         1.93   0.75         0.82
X17         1.9    0.61         0.68
X8          1.9    0.61         0.68
X10         1.83   0.60         0.1
X1          1.72   0.74         0.75
X20         1.69   0.73         0.03
X21         1.66   0.71         0.72
X25         1.54   0.56         0.51
X9          1.46   0.54         0.07
X13         1.4    0.8          0.82
X6          1.37   0.52         0.58
X3          1.24   0.8          0.85
X4          1.23   0.67         0.75
X22         1.13   0.47         0.21
X5          1.04   0.77         0.82
X7          0.77   0.59         0.34
X19         0.67   0.5          0.04
X2          0.63   0.74         0.71
X12         0.61   0.58         0.58
X27         0.61   0.37         0.2
X24         0.53   0.35         0.26
X28         0.1    0.15         0.1
X23         0.03   0.08         0.07
Source: created by the authors.
Table 2. Predictor-ranking measures for the learning sample, data two years before the bankruptcy period
Predictor   IV     Cramér's V   Gini
X13         1.87   0.58         0.62
X26         1.63   0.57         0.6
X21         1.23   0.51         0.52
X16         1.23   0.49         0.55
X15         1.19   0.49         0.51
X3          1.11   0.47         0.51
X14         0.98   0.46         0.14
X20         0.97   0.45         0.11
X9          0.94   0.44         0.3
X10         0.9    0.43         0.09
X7          0.76   0.42         0.01
X8          0.73   0.4          0.38
X11         0.73   0.37         0.35
X25         0.53   0.34         0.35
X27         0.48   0.34         0.31
X2          0.46   0.46         0.33
X5          0.42   0.47         0.5
X17         0.42   0.31         0.33
X6          0.4    0.3          0.31
X19         0.38   0.29         0.21
X28         0.33   0.28         0.24
X18         0.32   0.44         0.45
X24         0.31   0.27         0.1
X1          0.29   0.42         0.31
X22         0.27   0.26         0.25
X4          0.21   0.42         0.4
X23         0.19   0.21         0.16
X12         0.08   0.14         0.14
Source: created by the authors.
Before this analysis, the indicators that are potential bankruptcy predictors were categorized into k = 5 categories according to intervals of the predictors' values.
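Grouping a predictor into k = 5 categories and counting healthy and bankrupt companies per category yields the inputs that formulas (1) and (2) require. The Python sketch below assumes equal-frequency (quantile) binning, which the text does not specify; the function names are hypothetical, and tied values are simply split by their position in the sorted order.

```python
def quantile_categories(values, k=5):
    """Assign each value a category 0..k-1 by rank (equal-frequency bins).

    Assumption: equal-frequency binning; the study only states that the
    predictors were grouped into k = 5 intervals of their values. Tied
    values are split by their position in the sorted order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    categories = [0] * n
    for rank, idx in enumerate(order):
        categories[idx] = rank * k // n
    return categories


def counts_by_category(categories, y, k=5):
    """Count healthy (y=0) and bankrupt (y=1) companies in each category."""
    n_nb, n_b = [0] * k, [0] * k
    for cat, label in zip(categories, y):
        if label == 1:
            n_b[cat] += 1
        else:
            n_nb[cat] += 1
    return n_nb, n_b


# Hypothetical indicator values for ten companies; the first five are healthy
cats = quantile_categories([float(v) for v in range(10)])
print(cats, counts_by_category(cats, [0] * 5 + [1] * 5))
```

For these ten hypothetical companies the categories are 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, giving [2, 2, 1, 0, 0] healthy and [0, 0, 1, 2, 2] bankrupt companies per category.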
The following statistical models of bankruptcy classification were used to predict the bankruptcy risk of logistics companies: classic linear discriminant analysis and logistic regression. In addition, predictions based on classification trees and on the nearest-neighbour method were used. As mentioned previously, the chosen methods represent the group of statistical methods.
Table 3. Summary of Mann-Whitney U test results for the diagnostic indicators showing no significant differences between groups, data one year before the bankruptcy period
Indicator   Sum of ranks (non-bankrupt)   Sum of ranks (bankrupt)   U statistic   Z statistic   p-value
X7 667 461 185 1.926 0.054
X9 547 581 247 -0.606 0.544
X10 597 531 255 0.436 0.663
X14 586 542 266 0.202 0.839
X19 578 550 274 0.032 0.974
X20 587 541 265 0.223 0.823
X22 501 627 201 -1.585 0.113
X23 547 581 247 -0.606 0.544
X24 647 481 205 1.500 0.134
X27 634 494 218 1.224 0.221
X28 624 504 228 1.011 0.312
Source: created by the authors.
The Mann-Whitney U test was conducted in the same way for the sample two years before bankruptcy. The variables X7, X10, X12, X14, X19, X20, X23, X24, X27 and X28 were excluded from further analysis for the discriminant model because they did not fulfil the assumption of significant differences between the groups.
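The U and Z values in Table 3 can be recovered from the rank sums. A minimal Python sketch, assuming the one-year learning sample sizes of 24 non-bankrupt and 23 bankrupt companies, a normal approximation with a 0.5 continuity correction and no tie correction (Statistica may apply one when ties occur). The sign of Z is taken as positive when the non-bankrupt group has the higher rank sum, which appears to match the signs in the table; the function name is hypothetical.

```python
import math


def mann_whitney_from_rank_sums(r_nb, r_b, n_nb, n_b):
    """Mann-Whitney U and continuity-corrected Z from group rank sums.

    Hypothetical helper: r_nb, r_b are the rank sums of the non-bankrupt and
    bankrupt groups, n_nb, n_b the group sizes. No tie correction is applied.
    """
    u_nb = r_nb - n_nb * (n_nb + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    mu = n_nb * n_b / 2  # mean of U under the null hypothesis
    sigma = math.sqrt(n_nb * n_b * (n_nb + n_b + 1) / 12)
    dev = u_nb - mu
    # 0.5 continuity correction; positive Z when the NB group ranks higher
    z = math.copysign(max(abs(dev) - 0.5, 0.0), dev) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return min(u_nb, u_b), z, p_value


# Rank sums for X7 in Table 3, one-year learning sample (24 NB, 23 B)
u, z, p = mann_whitney_from_rank_sums(667, 461, 24, 23)
print(u, round(z, 3), round(p, 3))
```

With the X7 rank sums this gives U = 185, Z ≈ 1.926 and p ≈ 0.054, in line with the first row of Table 3.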
Table 4. Selected factors and factor loadings of the financial indicators considered for the LDA model, one-year prediction horizon
Indicator   Factor 1    Factor 2    Factor 3    Factor 4
X1           0.202989    0.665655    0.122855    0.675759
X2           0.187557    0.661696    0.116269    0.679428
X3           0.963897    0.032657    0.157821    0.156122
X4           0.160394    0.044599   -0.032889    0.948049
X5           0.134304    0.062217   -0.025955    0.938738
X6           0.220255   -0.007286    0.960207   -0.011805
X8           0.863936    0.101991    0.282553    0.006641
X11          0.259115    0.495666    0.802406   -0.010217
X12          0.154252    0.027002    0.954167   -0.006586
X13         -0.975855   -0.076655   -0.139614   -0.123111
X15          0.355713   -0.026919    0.359925    0.093521
X16         -0.975831   -0.070230   -0.116049   -0.129125
X17          0.022439    0.981589    0.024921   -0.000348
X18          0.069037    0.955293    0.074854    0.140203
X21         -0.978005   -0.081108   -0.090674   -0.050183
X25          0.524968    0.070665    0.206248    0.258338
X26          0.974396    0.083771    0.135116    0.128659
Explained variance   6.152951   3.050959   2.849142   2.869047
Variance share       0.361938   0.179468   0.167597   0.168767
Source: created by the authors.
To eliminate variables with strong mutual correlations (which replicate the same information in the model), a multidimensional factor analysis was applied. When choosing the factor representatives, the values from the predictor-ranking tables were taken into account in order to select the most significant variables.
Principal components were used as the factor extraction method. A factor loading was regarded as a significant correlation with the factor when its absolute value reached the threshold of 0.7. Factor rotation was applied (normalized Varimax method). A minimum eigenvalue of 1 and a maximum of 7 extracted factors were set for factor extraction.
Table 4 presents the results of the factor analysis for the one-year prediction horizon. On this basis, the following predictors were chosen for the one-year LDA model: X21 (strong significant correlation with factor 1), X18 (strong significant correlation with factor 2), X11 (strong significant correlation with factor 3) and X4 (strong significant correlation with factor 4). The indicators weakly correlated both with the factors and with each other, X1, X2, X15 and X25, were also included in the model.
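The 0.7 threshold can be applied mechanically to the rotated loadings in Table 4. The Python sketch below lists, for each factor, the indicators whose absolute loading reaches 0.7, and collects the indicators that load strongly on no factor. It covers only this screening step; the final choice of one representative per factor also drew on the ranking measures and is not reproduced. The function and constant names are hypothetical.

```python
THRESHOLD = 0.7

# Rotated factor loadings from Table 4 (Factor 1 .. Factor 4)
LOADINGS = {
    "X1": (0.202989, 0.665655, 0.122855, 0.675759),
    "X2": (0.187557, 0.661696, 0.116269, 0.679428),
    "X3": (0.963897, 0.032657, 0.157821, 0.156122),
    "X4": (0.160394, 0.044599, -0.032889, 0.948049),
    "X5": (0.134304, 0.062217, -0.025955, 0.938738),
    "X6": (0.220255, -0.007286, 0.960207, -0.011805),
    "X8": (0.863936, 0.101991, 0.282553, 0.006641),
    "X11": (0.259115, 0.495666, 0.802406, -0.010217),
    "X12": (0.154252, 0.027002, 0.954167, -0.006586),
    "X13": (-0.975855, -0.076655, -0.139614, -0.123111),
    "X15": (0.355713, -0.026919, 0.359925, 0.093521),
    "X16": (-0.975831, -0.070230, -0.116049, -0.129125),
    "X17": (0.022439, 0.981589, 0.024921, -0.000348),
    "X18": (0.069037, 0.955293, 0.074854, 0.140203),
    "X21": (-0.978005, -0.081108, -0.090674, -0.050183),
    "X25": (0.524968, 0.070665, 0.206248, 0.258338),
    "X26": (0.974396, 0.083771, 0.135116, 0.128659),
}


def strong_loadings(loadings, threshold=THRESHOLD):
    """Group indicators by the factors they load on with |loading| >= threshold."""
    n_factors = len(next(iter(loadings.values())))
    per_factor = {f: [] for f in range(1, n_factors + 1)}
    weak = []  # indicators with no strong loading on any factor
    for name, row in loadings.items():
        hits = [f for f, value in enumerate(row, start=1) if abs(value) >= threshold]
        for f in hits:
            per_factor[f].append(name)
        if not hits:
            weak.append(name)
    return per_factor, weak


per_factor, weak = strong_loadings(LOADINGS)
print(per_factor)
print(weak)
```

On the Table 4 loadings the weakly loading group comes out as X1, X2, X15 and X25, the same indicators named in the text.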
A similar factor analysis was conducted for the indicator values two years before bankruptcy. In that case, the following variables were chosen for the two-year LDA model: X5, X6, X8, X9, X15, X17, X21, X22, X25, X26.
To estimate the linear discriminant analysis (LDA) model, the generalized discriminant analysis module of the Statistica 10 package was used. The variant of discriminant analysis implemented in the package computes, for each class j of the dependent variable, a so-called classification function given by the following formula (Prusak, 2005):

$$FK_j = a_{0,j} + a_{1,j} X_1 + \dots + a_{k,j} X_k, \qquad (4)$$

where $a_{0,j}, \dots, a_{k,j}$ are the classification function coefficients for classification category j and predictors $X_1, \dots, X_k$.
An analysed object is assigned to the class for which the value of its classification function is greater.
Only those diagnostic variables for which the Wilks' lambda statistic was significant at p < 0.1 were retained in the estimated models. Two models were estimated, one for each prediction horizon (one and two years).
Table 5 presents the values of the multivariate Wilks' test of significance of the diagnostic variables in each model, together with the estimated coefficients of the classification functions of these models.
Table 5. Results of Wilks' test and estimation of classification functions for the LDA models
Predictor   Test     Value   F      Effect df   Error df   p-value   Classification function: class NB   Classification function: class B
Discriminant model LDA – 1 year before the bankruptcy
Intercept   Wilks'   0.79    11.4   1           44         0.0014    -2.23126                            -1.04695
X1          Wilks'   0.72    17.3   1           44         0.0001     0.01555                             0.00411
X6          Wilks'   0.91     4.1   1           44         0.0481     0.00511                            -0.05231
Discriminant model LDA – 2 years before the bankruptcy
Intercept   Wilks'   0.80    20.4   1           83         0.0000    -1.98475                            -1.24563
X2          Wilks'   0.96     3.0   1           83         0.0848     0.02028                             0.01443
X26         Wilks'   0.89     9.7   1           83         0.0025     0.00313                            -0.01121
Source: created by the authors.
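Classification with formula (4) can be illustrated with the one-year coefficients in Table 5. In this Python sketch the company values are hypothetical and are assumed to be expressed in percent, following the ×100% convention of the indicator list; the company is assigned to the class with the larger classification function value.

```python
# Classification-function coefficients of the 1-year LDA model (Table 5):
# (intercept, coefficient of X1, coefficient of X6)
COEFFS = {
    "NB": (-2.23126, 0.01555, 0.00511),
    "B": (-1.04695, 0.00411, -0.05231),
}


def classification_scores(x1, x6):
    """Value of the classification function FK_j for each class j, formula (4)."""
    return {cls: a0 + a1 * x1 + a2 * x6 for cls, (a0, a1, a2) in COEFFS.items()}


def classify(x1, x6):
    """Assign the class whose classification function value is larger."""
    scores = classification_scores(x1, x6)
    return max(scores, key=scores.get)


# Hypothetical companies; X1 and X6 assumed to be in percent
print(classify(150.0, 5.0))   # NB
print(classify(50.0, -10.0))  # B
```

For X1 = 150 and X6 = 5 the NB function is larger (about 0.127 against -0.692), whereas X1 = 50 and X6 = -10 tips the result to B.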
The general form of the two-state logistic regression model, which describes the probability of bankruptcy of the examined companies as a function of the set of factors influencing its occurrence, is:

$$P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}. \qquad (5)$$
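Formula (5) can be evaluated as below. This Python sketch uses hypothetical coefficients and indicator values, not the estimates of Table 6, and rewrites the expression in a numerically stable form for large negative linear predictors.

```python
import math


def bankruptcy_probability(beta, x):
    """P(Y=1) from formula (5); beta = [b0, b1, ..., bk], x = [X1, ..., Xk]."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs exactly one more element than x")
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # numerically stable form of 1 / (1 + exp(-eta))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients and indicator values (not taken from Table 6)
p = bankruptcy_probability([-1.0, 0.05, -0.02], [30.0, 120.0])
print(round(p, 3))
```

Splitting on the sign of the linear predictor avoids an overflow in exp() for strongly negative values, while returning the same probability as formula (5).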
To choose potential variables for the logit model, factor analysis and the predictor-ranking statistics (Tables 1 and 2) were used. For the 1-year prediction horizon, variables X23 and X28 were discarded from the list of potential variables because of their low ranking measures, whereas for the 2-year model the following variables were discarded: X12, X19, X22, X23, X24, X28.
After the factor analysis, the following variables were chosen for estimating the one-year model: X26, X18, X20, X11, X22, X5, X10, together with other variables weakly correlated with the factors and with each other: X1, X2, X7, X15, X24, X25, X27.
A list of potential diagnostic indicators for the two-year model, comprising X3, X5, X7, X8, X9, X13, X17, X21 and X25, was selected in a similar way.
To estimate the parameters of the logistic regression model, the generalized linear and non-linear models module was used (generalized logit model).
Only those diagnostic variables for which the Wald statistic was significant at p < 0.05 were retained in the estimated models.
Table 6 presents the estimated coefficients and Wald statistics for both logit models, with 1-year and 2-year prediction horizons.
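The Wald screening rule can be sketched as follows, assuming the usual single-coefficient definition W = (b / SE(b))², which under the null hypothesis follows a χ² distribution with 1 degree of freedom. The coefficient and standard error are hypothetical values, not taken from Table 6, and the function name is hypothetical.

```python
import math


def wald_test(coef, std_err):
    """Wald statistic (b / SE)^2 and its p-value from a chi-squared
    distribution with 1 degree of freedom."""
    w = (coef / std_err) ** 2
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


w, p = wald_test(0.85, 0.31)  # hypothetical coefficient and standard error
keep = p < 0.05  # retention rule described for the logit models
print(round(w, 2), round(p, 4), keep)
```

The χ² tail with one degree of freedom is expressed through the complementary error function, so only the standard library is needed.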
C&RT (Classification and Regression Trees) is a statistical data-analysis tool used to create classification and regression models. A tree is a graphical model produced by recursively splitting the set of initial observations into numerous subsets. The aim of the splitting is to obtain subsets that are as homogeneous as possible with respect to the value of the dependent variable. The recursive splitting algorithm (so-called Recursive Partitioning) can use a different independent variable at each stage. All independent variables (predictors) are always considered, and the variable chosen is the one that gives the best split of the node, i.e. the split into the most homogeneous subsets.
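A single node split of this kind can be sketched in Python: for one predictor, every threshold between consecutive distinct values is tried, and the one giving the lowest size-weighted Gini impurity of the two child nodes is kept. This illustrates the splitting idea only, not the Statistica C&RT implementation; a full tree repeats the search over all predictors and all nodes, and pruning is a separate step. Function names and data are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of a node with 0/1 labels (1 = bankrupt)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_b = sum(labels) / n  # share of bankrupts in the node
    return 1.0 - p_b ** 2 - (1.0 - p_b) ** 2


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_impurity(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
        if impurity < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, impurity)
    return best


# Hypothetical indicator values for four companies (1 = bankrupt)
print(best_threshold([10, 20, 30, 40], [1, 1, 0, 0]))
```

Here the split at 25 separates the two bankrupt companies from the two healthy ones, so both child nodes are pure and the weighted impurity drops from 0.5 to 0.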
Decision tree algorithms can be divided into 3 basic types:
CLS (Concept Learning System);
AID (Automatic Interaction Detection), of which CHAID trees are an example;
C&RT (Classification and Regression Trees).
Further details on classification and regression tree methods and algorithms can be found in Breiman et al. (1993).
C&RT tree algorithms were used in this publication to analyse the bankruptcy of logistics companies, using the General Classification and Regression Trees module of the Statistica package. All 28 financial indicators were used as input variables. The Gini measure was used as the splitting criterion, and the best pruned tree was chosen by V-fold cross-validation with the one-standard-error rule. Minimization of the average cost of misclassification was used as the criterion for optimal tree pruning (equal misclassification costs of 1 were set for bankrupts and non-bankrupts).
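The one-standard-error rule for choosing the pruned tree can be sketched as follows: among the candidate subtrees, find the minimum cross-validated misclassification cost, then pick the smallest subtree whose cost does not exceed that minimum plus its standard error. The candidate list (number of terminal nodes, CV cost, standard error) is hypothetical; in practice it comes from V-fold cross-validation over the pruning sequence.

```python
# Hypothetical pruning sequence: (terminal nodes, CV cost, standard error of CV cost)
CANDIDATES = [
    (1, 0.50, 0.07),
    (2, 0.23, 0.06),
    (4, 0.18, 0.05),
    (7, 0.16, 0.05),
    (11, 0.19, 0.06),
]


def one_se_rule(candidates):
    """Smallest subtree whose CV cost is within one SE of the minimum CV cost."""
    _, best_cost, best_se = min(candidates, key=lambda c: c[1])
    limit = best_cost + best_se
    eligible = [c for c in candidates if c[1] <= limit]
    return min(eligible, key=lambda c: c[0])


print(one_se_rule(CANDIDATES))
```

The rule trades a cost difference that lies within cross-validation noise for a simpler tree: here the 4-node tree is preferred to the 7-node tree with the lowest CV cost.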
The structure of the best classification trees for the one- and two-year prediction horizons is presented in Table 7, which gives the tree splitting rules, the resulting nodes and the classification effectiveness of the trees. For the one-year classification tree, the average cost of misclassification was 0.106 for the learning sample and 0.162 for the test sample; for the two-year tree, these costs were 0.256 and 0.258, respectively.
Figure 3 presents the classification tree used to classify logistics companies threatened with bankruptcy over a one-year horizon.
Divergence is calculated with the following formula (Thomas, 2009):

$$D = \frac{2(\mu_{NB} - \mu_{B})^2}{\sigma_{NB}^2 + \sigma_{B}^2}, \qquad (11)$$

where $\mu_{NB} = \sum_x x f(x \mid NB)$ is the average value of bankruptcy probability in the population of healthy companies (NB), $\mu_{B} = \sum_x x f(x \mid B)$ - the average value of bankruptcy probability in the population of bankrupts (B), $\sigma_{NB}^2 = \sum_x (x - \mu_{NB})^2 f(x \mid NB)$ and $\sigma_{B}^2 = \sum_x (x - \mu_{B})^2 f(x \mid B)$ - the variances of the bankruptcy probability distribution for the population of healthy companies and bankrupts, respectively, $f(x \mid NB)$, $f(x \mid B)$ - the percentages of healthy and bankrupt companies in a given category x of bankruptcy probability.
The divergence is assumed to need values above 0.5 for the examined distributions to lie far enough from each other and for the examined model to have an acceptable ability to separate bankrupts from companies not threatened with bankruptcy.
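Formula (11) can be computed directly from the predicted bankruptcy probabilities of the two groups. In this Python sketch each company gets equal weight, so f(x | NB) and f(x | B) are the empirical distributions of the predicted probabilities; the probabilities and the function name are hypothetical.

```python
from statistics import mean, pvariance


def divergence(pd_nb, pd_b):
    """Divergence, formula (11), from predicted bankruptcy probabilities of
    healthy (pd_nb) and bankrupt (pd_b) companies; each observation gets
    equal weight, so f(x | .) is the empirical distribution."""
    mu_nb, mu_b = mean(pd_nb), mean(pd_b)
    var_nb, var_b = pvariance(pd_nb, mu_nb), pvariance(pd_b, mu_b)
    return 2.0 * (mu_nb - mu_b) ** 2 / (var_nb + var_b)


d = divergence([0.05, 0.10, 0.20, 0.15], [0.60, 0.85, 0.70, 0.95])
acceptable = d > 0.5  # rule of thumb quoted in the text
print(round(d, 2), acceptable)
```

The population variance (pvariance) matches the weighted sums in the definitions of the two variances, since every company carries the same weight.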
The Hosmer-Lemeshow statistic is based on the chi-squared statistic and is calculated using the following formula (Thomas, 2009):

$$HL = \sum_{i=1}^{N} \frac{(n_i p_i - NB_i)^2}{n_i p_i (1 - p_i)}, \qquad (12)$$

where $p_i$ is the average probability of belonging to the non-bankrupt class in rating category i, $NB_i$ - the number of healthy companies in rating category i, $n_i$ - the number of companies in rating category i, N - the number of rating categories into which the range of bankruptcy probability has been divided. The Hosmer-Lemeshow statistic has a $\chi^2$ distribution with $df = N - 2$ degrees of freedom.
The higher the values of the H-L statistic, the better the model's ability to differentiate the distributions in both populations (B and NB), and the better the classifying abilities of the model.
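A minimal Python sketch of formula (12), assuming the rating categories have already been formed and summarized as (n_i, p_i, NB_i); the example groups and the function name are hypothetical. Only the statistic and its degrees of freedom are returned; a p-value would additionally require the χ² distribution function.

```python
def hosmer_lemeshow(groups):
    """Hosmer-Lemeshow statistic, formula (12).

    groups: list of (n_i, p_i, nb_i) per rating category, where n_i is the
    number of companies, p_i the average predicted probability of belonging
    to the non-bankrupt class, and nb_i the observed number of healthy
    companies. Returns (HL, degrees of freedom N - 2).
    """
    hl = 0.0
    for n_i, p_i, nb_i in groups:
        expected = n_i * p_i  # expected number of healthy companies
        hl += (expected - nb_i) ** 2 / (expected * (1.0 - p_i))
    return hl, len(groups) - 2


# Hypothetical rating categories of 10 companies each
print(hosmer_lemeshow([(10, 0.9, 8), (10, 0.5, 5), (10, 0.1, 2)]))
```

Each category contributes the squared gap between expected and observed healthy companies, scaled by the binomial variance n_i p_i (1 - p_i).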
The ROC concentration curve is a graphical way of presenting a model's classification power in correctly separating bankrupt and healthy companies, compared with the perfect model (100% correct classification) and a random model (completely random classification). The measure of conformity with the perfect model is the area under the ROC curve:

$$AUROC = 0.5(\mathrm{Gini} + 1).$$

The higher (the closer to 1) the area under the ROC curve, the better the predictive ability of the evaluated model.
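The area under the ROC curve can also be computed without drawing the curve, as the share of (bankrupt, healthy) pairs in which the bankrupt company receives the higher predicted probability, with ties counted as one half; this is the standard rank interpretation of AUROC. The Gini value then follows from the relation AUROC = 0.5(Gini + 1). The probabilities and the function name are hypothetical.

```python
def auroc(pd_nb, pd_b):
    """Area under the ROC curve as the probability that a randomly chosen
    bankrupt has a higher predicted PD than a randomly chosen healthy
    company (ties count one half)."""
    wins = 0.0
    for p_b in pd_b:
        for p_nb in pd_nb:
            if p_b > p_nb:
                wins += 1.0
            elif p_b == p_nb:
                wins += 0.5
    return wins / (len(pd_b) * len(pd_nb))


area = auroc([0.1, 0.4, 0.2], [0.3, 0.8, 0.6])
gini = 2.0 * area - 1.0  # equivalent to AUROC = 0.5 * (Gini + 1)
print(round(area, 3), round(gini, 3))
```

The pairwise loop is quadratic in the sample size, which is unproblematic for samples of a few dozen companies.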
The Brier Score is calculated as

$$BS = \frac{1}{n} \sum_{i=1}^{n} (PD_i - d_i)^2, \qquad (13)$$

where n is the number of observations in the sample, $d_i$ - dummy variable equal to 1 when the company is considered bankrupt and 0 otherwise, $PD_i$ - bankruptcy probability estimated by the model.
The lower the Brier Score, the better the model is calibrated to the data, and the better its prediction properties should be.
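Formula (13) in Python, as a minimal sketch with hypothetical inputs and a hypothetical function name.

```python
def brier_score(pd, d):
    """Brier Score, formula (13): mean squared difference between predicted
    bankruptcy probability PD_i and the 0/1 bankruptcy indicator d_i."""
    if len(pd) != len(d) or not pd:
        raise ValueError("need equally long, non-empty inputs")
    return sum((p - di) ** 2 for p, di in zip(pd, d)) / len(pd)


# Hypothetical predictions for one bankrupt (d=1) and one healthy (d=0) company
print(round(brier_score([0.9, 0.2], [1, 0]), 3))
```

A perfectly confident and correct model scores 0, and the score penalizes confident wrong predictions quadratically.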
The model reliability coefficient (LL) is defined by the following formula (Prusak, 2005):

$$LL = \prod_{i=1}^{n} P(Y_i \mid X_i) = \prod_{i=1}^{n} PD_i(X_i)^{Y_i} \left(1 - PD_i(X_i)\right)^{1 - Y_i}, \qquad (14)$$

where n is the number of observations, $PD_i(X_i)$ - estimated bankruptcy probability at the given values of the input (independent) variables of the model, $Y_i$ - dummy variable with Y=1 for bankrupts and Y=0 for non-bankrupts.
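Because formula (14) multiplies many probabilities, the product becomes very small even for samples of a few dozen companies. The Python sketch below accumulates the logarithm and exponentiates at the end; a single observation that receives probability 0 for its actual class makes the whole product exactly zero. Inputs and the function name are hypothetical.

```python
import math


def model_likelihood(pd, y):
    """Model reliability LL, formula (14): product of P(Y_i | X_i)."""
    log_ll = 0.0
    for p, yi in zip(pd, y):
        factor = p if yi == 1 else 1.0 - p  # probability of the actual class
        if factor <= 0.0:
            return 0.0  # one impossible outcome makes the whole product zero
        log_ll += math.log(factor)
    return math.exp(log_ll)


# Hypothetical predictions for one bankrupt (Y=1) and one healthy (Y=0) company
print(round(model_likelihood([0.9, 0.2], [1, 0]), 4))
```

Summing logarithms avoids intermediate underflow; only the final exp() can round to zero when the true product is below the smallest representable float.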
Table 10. Validation parameters of estimated models for a prediction horizon of 1 year
Model                Sample     Eff1 (NB)  Eff2 (B)  IV    K-S    Gini   Divergence  H-L    AUROC  Brier Score  LL (model)
Logit                learning   88%        96%       3.6   0.83   0.89   8.8         11.2   0.95   0.081        1.8×10^-6
Logit                test       78%        90%       2.8   0.80   0.91   5.7         3.3    0.95   0.108        3.0×10^-3
MLP 26-8-2 network   learning   92%        91%       4.0   0.83   0.89   5.3         17.9   0.95   0.152        1.3×10^-10
MLP 26-8-2 network   test       89%        100%      2.4   0.89   0.82   2.9         48.2   0.91   0.162        2.1×10^-5
MLP 6-3-2 network    learning   92%        83%       2.6   0.75   0.86   4.2         13.1   0.93   0.135        1.5×10^-9
MLP 6-3-2 network    test       89%        90%       2.8   0.79   0.91   7.0         3.6    0.96   0.111        8.6×10^-4
C&RT tree            learning   92%        96%       5.7   0.96   0.93   14.4        7.2    0.97   0.059        1.8×10^-5
C&RT tree            test       78%        90%       1.3   0.68   0.67   4.2         19.0   0.83   0.140        1.1×10^-4
The higher the value of the model reliability coefficient for the learning sample, the better the model is calibrated to the input data. High values of the reliability coefficient for the test sample indicate that the model also classifies new, unseen cases well.
Table 10 and Table 11 present the validation statistics for all the examined models for predicting the bankruptcy of logistics companies.
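Tables 10 and 11 report Eff1 for the non-bankrupt (NB) group and Eff2 for the bankrupt (B) group. The sketch below assumes these are the shares of correctly classified companies in each group. It also assumes a company is classified as bankrupt when its estimated PD reaches a cut-off; 0.5 is used here, but the cut-off actually used is not stated in this section. The function name and the data are hypothetical.

```python
from typing import Sequence, Tuple


def group_efficiencies(
    pds: Sequence[float], d: Sequence[int], cutoff: float = 0.5
) -> Tuple[float, float]:
    """Return (Eff1, Eff2): share of NB firms and of B firms classified correctly.

    Assumption: a firm is classified as bankrupt when PD >= cutoff.
    d: 1 = bankrupt (B), 0 = non-bankrupt (NB).
    """
    nb = [p for p, y in zip(pds, d) if y == 0]
    b = [p for p, y in zip(pds, d) if y == 1]
    if not nb or not b:
        raise ValueError("both groups must be non-empty")
    eff1 = sum(p < cutoff for p in nb) / len(nb)   # healthy firms classified as healthy
    eff2 = sum(p >= cutoff for p in b) / len(b)    # bankrupt firms flagged as bankrupt
    return eff1, eff2


pds = [0.9, 0.2, 0.6, 0.1, 0.4]
d = [1, 0, 0, 0, 1]
print(group_efficiencies(pds, d))
print(group_efficiencies(pds, d, cutoff=0.3))
```

Moving the cut-off trades one efficiency against the other. In this example, lowering it to 0.3 raises Eff2 from 0.5 to 1.0 while Eff1 stays at 2/3.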
Table 11. Validation parameters of estimated models for a prediction horizon of 2 years
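| Model | Sample | Eff1 (NB) | Eff2 (B) | IV | K-S | Gini | Divergence | H-L | AUROC | Brier Score | LL (model) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Logit | learning sample | 74% | 81% | 1.9 | 0.58 | 0.65 | 2.2 | 6.0 | 0.82 | 0.172 | 1.1·10^-19 |
| Logit | test sample | 79% | 79% | 1.8 | 0.57 | 0.70 | 2.9 | 4.9 | 0.85 | 0.153 | 2.8·10^-6 |
| Network MLP 22-17-2 | learning sample | 88% | 84% | 3.6 | 0.74 | 0.87 | 5.9 | 9.6 | 0.94 | 0.103 | 4.8·10^-13 |
| Network MLP 22-17-2 | test sample | 100% | 86% | 3.7 | 0.86 | 0.92 | 10.2 | 2.4 | 0.96 | 0.087 | 3.7·10^-4 |
| Network MLP 4-8-2 | learning sample | 67% | 79% | 2.4 | 0.56 | 0.70 | 2.4 | 14.8 | 0.85 | 0.184 | 1.8·10^-21 |
| Network MLP 4-8-2 | test sample | 100% | 86% | 3.7 | 0.86 | 0.94 | 6.8 | 11.4 | 0.97 | 0.167 | 4.6·10^-7 |
| C&RT tree | learning sample | 88% | 81% | 2.8 | 0.70 | 0.75 | 4.2 | 8.8 | 0.88 | 0.127 | 2.6·10^-16 |
| C&RT tree | test sample | 71% | 71% | 1.3 | 0.50 | 0.56 | 0.8 | 19.0 | 0.78 | 0.229 | 0 |
| Method of k-nearest neighbours | test sample | 72% | 90% | 2.4 | 0.98 | 0.43 | 2.8 | 12.6 | 0.70 | 0.161 | 0 |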
5. Bankruptcy Prediction for Logistics Companies from the Podkarpacie Region and Slovakia
When making predictions of possible bankruptcy with the analysed models, it is worth separating them into two groups, as was done previously. The first comprises predictions of models estimated on data one year before the bankruptcy period. The second comprises predictions of the same group of models estimated on data two years before the bankruptcy period (Table 12). In the first case, the sample consists of 125 “healthy” logistics entities, of which 82 (65.6%) are companies from the Podkarpacie region and 43 (34.4%) are companies from Slovakia. In the second variant, the total number of companies is 104, of which 61 (58.7%) are from the Podkarpacie region and 43 (41.3%) from Slovakia.
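The shares quoted above can be reproduced from the group sizes. The following is a minimal check, in which the dictionary layout is only for illustration:

```python
# Number of "healthy" companies per region in each estimation variant
samples = {
    "one year before": {"Podkarpacie region": 82, "Slovakia": 43},
    "two years before": {"Podkarpacie region": 61, "Slovakia": 43},
}

shares = {}
for variant, counts in samples.items():
    total = sum(counts.values())
    # Percentage of the variant's sample, rounded to one decimal place
    shares[variant] = {region: round(100 * n / total, 1) for region, n in counts.items()}
    print(variant, total, shares[variant])
```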
Table 12. Average values of predictions by analysed model for the Podkarpacie region and Slovakia

| | Poland (Podkarpacie region) | Slovakia |
|---|---|---|
| Estimations based on data one year before the bankruptcy period | | |
| LDA model | 0.391036 | 0.381273 |
| Logit model | 0.316692 | 0.220125 |
| C&RT model | 0.234901 | 0.127353 |
| k-nearest neighbours method | 0.270732 | 0.248837 |
| One-year average prediction | 0.303340 | 0.244397 |
| Two-year average prediction | 0.438284 | 0.391394 |
| Three-year average prediction | 0.518581 | 0.491453 |
| Estimations based on data two years before the bankruptcy period | | |
| LDA model | 0.348442 | 0.354281 |
| Logit model | 0.354433 | 0.273509 |
| C&RT model | 0.439643 | 0.413356 |
| k-nearest neighbours method | 0.385246 | 0.395349 |
| One-year average prediction | 0.381941 | 0.359124 |
| Two-year average prediction | 0.564387 | 0.554978 |
| Three-year average prediction | 0.666512 | 0.671540 |
Source: created by the authors.
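The "One-year average prediction" rows of Table 12 are consistent with the arithmetic mean of the four models' predictions in the same block. This is an inference from the figures, not something the text states. The check below reproduces those rows to six decimal places, within rounding. The two- and three-year rows are not reproduced by this simple mean, and their derivation is not described in this section.

```python
from statistics import fmean

# Table 12 predictions in the order: LDA, logit, C&RT, k-nearest neighbours
table12 = {
    ("one year", "Podkarpacie"): [0.391036, 0.316692, 0.234901, 0.270732],
    ("one year", "Slovakia"): [0.381273, 0.220125, 0.127353, 0.248837],
    ("two years", "Podkarpacie"): [0.348442, 0.354433, 0.439643, 0.385246],
    ("two years", "Slovakia"): [0.354281, 0.273509, 0.413356, 0.395349],
}
# "One-year average prediction" values as reported in Table 12
reported = {
    ("one year", "Podkarpacie"): 0.303340,
    ("one year", "Slovakia"): 0.244397,
    ("two years", "Podkarpacie"): 0.381941,
    ("two years", "Slovakia"): 0.359124,
}

for key, preds in table12.items():
    print(key, f"{fmean(preds):.6f}", reported[key])
```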
Table 13. Scale of bankruptcy threat in terms of the number of bankruptcy indications by the analysed models

| Region / number of bankruptcy indications | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Estimations based on data one year before the bankruptcy period | | | | | |
| Poland (Podkarpacie region) | 52 (63.41%) | 4 (4.88%) | 5 (6.10%) | 8 (9.76%) | 13 (15.85%) |
| Slovakia | 29 (67.44%) | 5 (11.63%) | 2 (4.65%) | 5 (11.63%) | 2 (4.65%) |
| Total | 81 (64.80%) | 9 (7.20%) | 7 (5.60%) | 13 (10.40%) | 15 (12.00%) |
| Estimations based on data two years before the bankruptcy period | | | | | |
| Poland (Podkarpacie region) | 15 (24.59%) | 20 (32.79%) | 11 (18.03%) | 6 (9.84%) | 9 (14.75%) |
| Slovakia | 8 (18.60%) | 14 (32.56%) | 9 (20.93%) | 10 (23.26%) | 2 (4.65%) |
| Total | 23 (22.12%) | 34 (32.69%) | 20 (19.23%) | 16 (15.38%) | 11 (10.58%) |

Scale of bankruptcy threat: from "not very possible" (left) to "possible" (right).
Source: created by the authors.
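Table 13 groups companies by how many of the four models signal bankruptcy. The sketch below assumes each model's prediction has first been turned into a yes/no signal; the rule for doing so is not stated in this section. It shows the tally and how the regional rows combine into the Total row. The function name and the signals are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def indication_distribution(
    flags_per_company: Iterable[Sequence[bool]], n_models: int = 4
) -> List[Tuple[int, float]]:
    """For k = 0..n_models, return (number of companies, % of companies)
    flagged as threatened by exactly k models."""
    counts = Counter(sum(flags) for flags in flags_per_company)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no companies given")
    return [(counts[k], round(100 * counts[k] / n, 2)) for k in range(n_models + 1)]


# Hypothetical signals (LDA, logit, C&RT, k-NN) for four companies
flags = [
    (False, False, False, False),
    (True, False, False, False),
    (True, True, True, False),
    (True, True, True, True),
]
print(indication_distribution(flags))

# Combining the regional rows of the one-year block of Table 13 into the Total row
podkarpacie = [52, 4, 5, 8, 13]
slovakia = [29, 5, 2, 5, 2]
total = [p + s for p, s in zip(podkarpacie, slovakia)]
print(total, [round(100 * c / sum(total), 2) for c in total])
```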
A large number of companies from both regions were placed in the so-called low-risk group with respect to possible bankruptcy. This is most visible for estimations based on data one year before bankruptcy. For almost two-thirds of the analysed entities (64.80%), not even one of the analysed models indicated a bankruptcy threat. The situation is slightly worse for calculations based on data two years before the bankruptcy period. In this variant, the percentage of companies for which exactly one of the four analysed models signalled possible bankruptcy increased more than fourfold (from 7.20% to 32.69%), covering almost one-third of the entities.
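The figures quoted above come from the Total rows of Table 13. The arithmetic behind "almost two-thirds" and "more than fourfold" is shown below, where the variable and function names are only illustrative:

```python
# Total rows of Table 13: companies flagged by 0, 1, 2, 3 or 4 models
one_year_total = [81, 9, 7, 13, 15]
two_year_total = [23, 34, 20, 16, 11]


def share(counts, k):
    """Percentage of companies flagged by exactly k models."""
    return 100 * counts[k] / sum(counts)


no_signal_1y = share(one_year_total, 0)
single_1y = share(one_year_total, 1)
single_2y = share(two_year_total, 1)
print(f"{no_signal_1y:.2f}% with no signal (one year)")
print(f"{single_1y:.2f}% -> {single_2y:.2f}% with one signal, x{single_2y / single_1y:.2f}")
```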
Comparing companies from the Podkarpacie region with those from Slovakia, for the latter the number of entities for which most models (three or four of the four) predict a bankruptcy risk is lower in the one-year estimations than in the two-year ones. For estimations based on data one year before the bankruptcy period, the group of companies most threatened with bankruptcy contains seven companies (16.28% of those examined). For estimations based on data two years before the bankruptcy period, it contains 12 companies (27.91%). A similar tendency cannot be confirmed for the Podkarpacie region market, where the situation is somewhat different. Although most companies are in the low-risk group, the high-risk group is also large, at least in the first variant. There, the percentage of threatened entities is 25.61%, so roughly every fourth examined company falls into it. In the second variant the situation is slightly better, although the share of companies with an increased risk of bankruptcy remains at an almost unchanged level of 24.59%.
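Reading "most models" as at least three of the four reproduces the high-risk shares quoted above from Table 13. This reading is an interpretation, confirmed only by the arithmetic. The helper name below is hypothetical.

```python
# Table 13 counts of companies flagged by 0, 1, 2, 3 or 4 of the four models
table13 = {
    ("Podkarpacie", "one year"): [52, 4, 5, 8, 13],
    ("Slovakia", "one year"): [29, 5, 2, 5, 2],
    ("Podkarpacie", "two years"): [15, 20, 11, 6, 9],
    ("Slovakia", "two years"): [8, 14, 9, 10, 2],
}


def high_risk(counts, min_models=3):
    """Companies flagged by at least `min_models` models, as (count, percent)."""
    n = sum(counts[min_models:])
    return n, round(100 * n / sum(counts), 2)


for key, counts in table13.items():
    print(key, high_risk(counts))
```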
Conclusions
References
Altman, E.I., Haldeman, R.G., Narayanan, P. (1977), “ZETA analysis: A new model to identify bankruptcy risk of corporations”, Journal of Banking and Finance, pp.29-54.
Atiya, A.F. (2001), “Bankruptcy prediction for credit risk using neural networks: A survey and new results”, IEEE Transactions on Neural Networks, Vol. 12, No 4, pp.929-935.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1993), Classification and Regression Trees, Chapman and Hall.
Choi, W.S., Lee, S. (2013), “A multi-industry bankruptcy prediction model using back-propagation neural network and multivariate discriminant analysis”, Expert Systems with Applications, Vol. 40, No 8, pp.2941-2946.
Fedorova, E., Gilenko, E., Dovzhenko, S. (2013), “Bankruptcy prediction for Russian companies: Application of combined classifiers”, Expert Systems with Applications, Vol. 40, pp.7285-7293.
Fletcher, D., Goss, E. (1993), “Forecasting with neural networks: An application using bankruptcy data”, Information and Management, Vol. 24, pp.159-167.
Frydman, H., Altman, E.I., Kao, D. (1985), “Introducing recursive partitioning for financial classification: The case of financial distress”, Journal of Finance, Vol. 40, No 1, pp.269-291.
Hamrol, M., Chodakowski, J. (2008), “Prognozowanie zagrożenia finansowego przedsiębiorstwa. Wartość predykcyjna polskich modeli analizy dyskryminacyjnej”, Badania Operacyjne i Decyzje, No 3, pp.17-31, [Forecasting the financial threat to an enterprise. The predictive value of Polish discriminant analysis models, in Polish].
Jones, S., Hensher, D.A. (2004), “Predicting firm financial distress: A mixed logit model”, Accounting Review,
Vol. 79, No 4, pp.1011–1038.
Karels, G.V., Prakash, A.J. (1987), “Multivariate normality and forecasting of business bankruptcy”, Journal of Business Finance and Accounting, Vol. 14, No 4.
Kaski, S., Sinkkonen, J., Peltonen, J. (2001), “Bankruptcy analysis with self-organizing maps in learning metrics”, IEEE Transactions on Neural Networks, Vol. 12, No 4, pp.936-947.
Kiviluoto, K. (1998), “Predicting bankruptcies with self-organizing map”, Neurocomputing, Vol. 21, pp.191-201.
Kolari, J., Glennon, D., Shin, H., Caputo, M. (2002), “Predicting large US commercial bank failures”, Journal of
Economics and Business, Vol. 54, No 32 1, pp.361-387.
Korol, T. (2012), “Early warning models against bankruptcy risk for Central European and Latin American enterprises”, Economic Modelling, Vol. 31, pp.22-30.
Kumar, P.R., Ravi, V. (2007), “Bankruptcy prediction in banks and firms via statistical and intelligent techniques – A review”, European Journal of Operational Research, Vol. 180, pp.1-28.
Lam, M. (2004), “Neural networks techniques for financial performance prediction: integrating fundamental and technical analysis”, Decision Support Systems, Vol. 37, pp.567-581.
Lee, K., Booth, D., Alam, P. (2005), “A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms”, Expert Systems with Applications, Vol. 29, pp.1-16.
Leshno, M., Spector, Y. (1996), “Neural network prediction analysis: The bankruptcy case”, Neurocomputing, Vol. 10, pp.125-147.
Löffler, G., Posch, P.N. (2007), Credit risk modeling using Excel and VBA, Wiley, Chichester, West Sussex, p.156.
Marais, M.L., Patel, J., Wolfson, M. (1984), “The experimental design of classification models: An application of recursive partitioning and bootstrapping to commercial bank loan classifications”, Journal of Accounting Research, Vol. 22, pp.87-113.
Martin, D. (1977), “Early warning of bank failure: A logit regression approach”, Journal of Banking and Finance, Vol. 1, pp.249-276.
Matuszyk, A. (2004), Credit scoring – metoda zarządzania ryzykiem kredytowym, Wydawnictwo CeDeWu, Warszawa, pp.119-122, [Credit scoring: A method of credit risk management, in Polish].
Ohlson, J.A. (1980), “Financial ratios and the probabilistic prediction of bankruptcy”, Journal of Accounting Research, Vol. 18, pp.109-131.
Prusak, B. (2005), Nowoczesne metody prognozowania zagrożenia finansowego przedsiębiorstw, Wydawnictwo Difin, Warszawa, [Modern methods of forecasting the financial risks of companies, in Polish].
Serrano-Cinca, C. (1996), “Self-organizing neural networks for financial diagnosis”, Decision Support Systems, Vol. 17, pp.227-238.
Tam, K.Y., Kiang, M. (1992), “Predicting bank failures: A neural network approach”, Decision Sciences, Vol. 23, pp.926-947.
Thomas, L.C. (2009), Consumer credit models. Pricing, Profit and Portfolios, Oxford University Press, Oxford,
p.111.
Tseng, F.M., Hu, Y.C. (2010), “Comparing four bankruptcy prediction models: Logit, quadratic interval logit,
neural and fuzzy neural networks”, Expert Systems with Applications, Vol. 37, No 3, pp.1846-1853.
Wilson, R.L., Sharda, R. (1994), “Bankruptcy prediction using neural networks”, Decision Support Systems, Vol. 11, pp.545-557.
Witkowska, D. (2002), Sztuczne sieci neuronowe i metody statystyczne. Wybrane zagadnienia finansowe, C.H. Beck, Warszawa, pp.86-87, [Artificial neural networks and statistical methods. Selected financial issues, in Polish].
Yu, L., Wang, S., Lai, K.K., Zhou, L. (2008), Bio-Inspired Credit Risk Analysis. Computational Intelligence with
Support Vector Machines, Springer-Verlag, Berlin Heidelberg, pp.14-15.
SUMMARY
The main subject of this article is the problem of bankruptcy in the context of the emergence of possible threats. The presented research seeks to demonstrate the values obtained by the described models in predicting possible signs of bankruptcy and in evaluating the financial condition of Polish and Slovak companies from the TSL (transport, shipping, logistics) sector. To predict the bankruptcy risk of companies from the logistics sector, statistical bankruptcy classification models such as classical linear discriminant analysis and logistic regression were used. The predictions were also based on so-called classification trees and the nearest neighbour method. The empirical verification of correct classification, applied to the presented groups of statistical bankruptcy analysis methods in order to assess their efficiency, revealed that these methods are characterised by high-quality bankruptcy prediction. The described concepts make it possible to assess easily the threat of bankruptcy in the analysed companies. One of the essential advantages of the presented results is the division of the research sample into so-called learning groups, for which the parameters of the examined models were determined, and a test sample for examining the effectiveness of correct classification, for which all predictions were made for the periods one and two years before bankruptcy.