Asian Development Bank Institute: ADBI Working Paper Series
No. 1111
April 2020
The Working Paper series is a continuation of the formerly named Discussion Paper series;
the numbering of the papers continued without interruption or change. ADBI’s working papers
reflect initial ideas on a topic and are posted online for discussion. Some working papers may
develop into other forms of publication.
Suggested citation:
Nguyen, L. H. and M. Sagara. 2020. Credit Risk Database for SME Financial Inclusion. ADBI
Working Paper 1111. Tokyo: Asian Development Bank Institute. Available:
https://www.adb.org/publications/credit-risk-database-sme-financial-inclusion
Tel: +81-3-3593-5500
Fax: +81-3-3593-5571
URL: www.adbi.org
E-mail: info@adbi.org
Abstract
We introduce the Credit Risk Database (CRD) and its contribution to financial inclusion efforts
in Japan. By collecting financial data about small and medium-sized enterprises (SMEs), the
CRD contributes to the overall understanding of the SME sector, to the adoption of risk-based
lending, and to a fairer loan guarantee system. In addition to financial data, the CRD also
includes alternative data, namely bank account transaction data, for assessing SME credit. A
machine learning model is adopted to process the extremely large volume of transaction data.
The best-performing predictors of default include cash balance and cash outflows related to
repayments. The machine learning model outperforms the logistic model and is highly accurate
in predicting the probability of short-term default. The alternative data and model can serve
both as a short-term monitoring instrument and as a credit assessment tool for SMEs without
financial statements.
Contents
1. INTRODUCTION
REFERENCES
ADBI Working Paper 1111 Nguyen and Sagara
1. INTRODUCTION
Access to finance for small and medium-sized enterprises (SMEs) has been central to
policy discussion in both advanced and developing countries. Maningo (2016) and
Amornkitvikai and Harvie (2016) describe common situations in developing Asia where
SMEs struggle with limited funding, restrictive repayment periods, high financing costs,
and a complex lending process. One of the major concerns is collateral, which most
SMEs cannot provide.
Efforts to loosen collateral requirements only began after the Asian financial crisis, when
the value of financial assets and property declined. These efforts, however, had only a
small impact on closing the SME finance gap. The World Bank reports an unmet SME
finance demand of about $5.3 trillion every year, the largest share of which is in Asia.
Financial institutions are encouraged to further reduce their dependence on collateral and
to fast-track risk-based lending, so as to improve SMEs' access to finance.
In this paper, we introduce the Japanese approach to SME risk-based lending. First, we
present Japan's nationwide financial database, the Credit Risk Database (CRD), which serves
as SME data infrastructure enabling SME credit risk analysis and modeling, together with its
three most important contributions: SME statistical analysis, credit risk assessment models,
and the guarantee fee rating scheme. Second, we explore bank transaction data as alternative
data to be incorporated into risk models, to tackle the challenges of short-term monitoring
and the lack of financial information for micro SMEs and startups. We briefly describe our
model, which was developed together with the Bank of Japan (BOJ) and a Japanese megabank,
along with the main variables, the machine learning approach, and its overall performance.
Finally, we draw policy implications for improving SME access to finance from the Japanese
experience, highlighting the leadership role that the financial sector authority should take
in establishing SME data infrastructure and promoting statistical analysis using both
traditional financial data and alternative data.
2.5 million SMEs and 1.2 million sole proprietors. There are 21 million financial
statements (data points) for SMEs and 5.6 million financial statements for sole
proprietors.
1 The study investigates firm performance before and after receiving government subsidies. Firms
are categorized by their creditworthiness, measured by default probability, and by their growth
potential, measured by growth scores. Both default probabilities and growth scores are built-in
parameters calculated using CRD models. For further details, refer to the paper sponsored by the
Japan SME Agency (in Japanese): https://www.meti.go.jp/meti_lib/report/H30FY/000316.pdf.
As part of SME inclusion efforts, credit guarantee systems for SMEs are being discussed
and designed in many developing nations, such as the Philippines, Myanmar, and Thailand.
Maningo (2016) pointed out that the lack of SME information makes it difficult for the
Philippines' Credit Surety Fund to correctly assess and manage default risk, a growing
concern as its past due ratio rose from 0.6% in 2013 to 1.6% in 2014. A database such as
the CRD could both provide SME information for guarantee decision-making and serve as an
instrument for strategically setting guarantee fees.
2 From the 35–85 financial items in the financial statements, approximately 174 financial ratios are
calculated; these become candidates for model variables.
sample dataset segmented into industries and regions. If the model does not fit the
out-of-sample dataset and accuracy falls, the committee shall advise adjustments to the
model. The results of validation are made available yearly to all member institutions.
The validation framework is described in more detail by Kuwahara et al. (2019).
3.1 Dataset
The transaction dataset covers the period from October 2014 to May 2018
(44 months).⁴ The transaction data were originally recorded at a frequency of seconds
but were aggregated to monthly frequency for analysis. There are two main reasons
for this transformation: default information is often available only at monthly
frequency, and some major transactions, such as fixed revenues and fixed costs,
are usually made monthly.
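The aggregation from second-level records to monthly frequency can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `to_monthly`, the (timestamp, amount) record layout, the convention that positive amounts are inflows and negative amounts outflows, and the opening balance are all hypothetical. Months with no transactions are simply absent from the output.

```python
from collections import defaultdict
from datetime import datetime


def to_monthly(transactions, opening_balance=0.0):
    """Aggregate time-stamped transactions into monthly inflow, outflow and end-of-month balance.

    transactions: iterable of (datetime, amount) pairs; by assumption, positive amounts
    are inflows and negative amounts are outflows.
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for ts, amount in transactions:
        month = ts.strftime("%Y-%m")  # second-level timestamp -> "YYYY-MM" bucket
        if amount >= 0:
            inflow[month] += amount
        else:
            outflow[month] += -amount
    balance, result = opening_balance, {}
    for month in sorted(set(inflow) | set(outflow)):  # "YYYY-MM" sorts chronologically
        balance += inflow[month] - outflow[month]
        result[month] = {"inflow": inflow[month], "outflow": outflow[month], "end_balance": balance}
    return result


# Hypothetical account with three transactions over two months.
txns = [
    (datetime(2014, 10, 1, 9, 30, 15), 500.0),
    (datetime(2014, 10, 25, 14, 2, 7), -200.0),
    (datetime(2014, 11, 3, 11, 0, 0), -100.0),
]
monthly = to_monthly(txns, opening_balance=1000.0)
# monthly["2014-10"] == {"inflow": 500.0, "outflow": 200.0, "end_balance": 1300.0}
```

Monthly buckets keep fixed monthly payments in a single period, which matches the motivation given above for the transformation.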
Table 1 describes the three categories of data in our dataset: cash inflow, cash outflow,
and cash balance. The cash inflow subcategory mainly consists of cash flow from
3 The original study is published in the Bank of Japan Working Paper Series (in Japanese):
https://www.boj.or.jp/research/wps_rev/wps_2019/wp19j04.htm/.
4 This dataset was prepared for research purposes only.
Source: Author.
The study further employs XGBoost, a gradient boosting method that, like Random Forest,
builds an ensemble of decision trees but, unlike Random Forest, constructs the trees
sequentially so that the creation of each tree can be tailored. In XGBoost, each new tree
is added to correct or minimize the errors of the trees built before it, and trees are
added until no further improvement can be made. XGBoost is more prone to overfitting
(overlearning) than Random Forest, so regularization and cross-validation of the model
are necessary to prevent it.
Source: Authors.
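To illustrate the sequential, error-correcting idea, here is a hedged sketch of plain gradient boosting for a binary default label, written from scratch on a single hypothetical feature. It is not XGBoost: XGBoost additionally uses second-order gradient information and explicit penalties on tree complexity and leaf weights, which are omitted here. The sketch keeps two safeguards against overfitting in simplified form: shrinkage through a learning rate, and early stopping on a held-out validation set in place of full cross-validation. All names and the toy data are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump (a tree with one split) on a single feature."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def log_loss(ys, scores):
    eps, total = 1e-12, 0.0
    for y, s in zip(ys, scores):
        p = min(max(sigmoid(s), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(ys)


def boost(x_tr, y_tr, x_val, y_val, learning_rate=0.1, max_trees=200, patience=10):
    """Add stumps one at a time, each fitted to the current errors; stop early on validation loss."""
    trees, best_loss, best_n, stale = [], float("inf"), 0, 0
    s_tr, s_val = [0.0] * len(x_tr), [0.0] * len(x_val)
    for _ in range(max_trees):
        # Negative gradient of log loss = observed label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(y_tr, s_tr)]
        tree = fit_stump(x_tr, residuals)
        trees.append(tree)
        # Shrink each tree's contribution by the learning rate (a form of regularization).
        s_tr = [s + learning_rate * tree(x) for s, x in zip(s_tr, x_tr)]
        s_val = [s + learning_rate * tree(x) for s, x in zip(s_val, x_val)]
        loss = log_loss(y_val, s_val)
        if loss < best_loss - 1e-9:
            best_loss, best_n, stale = loss, len(trees), 0
        else:
            stale += 1
            if stale >= patience:  # validation loss stopped improving
                break
    return trees[:best_n], learning_rate


def predict_proba(model, x):
    trees, learning_rate = model
    return sigmoid(sum(learning_rate * tree(x) for tree in trees))


# Toy data: x is a hypothetical scaled cash-balance feature, y = 1 marks a default.
x_train, y_train = [1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0]
x_valid, y_valid = [1.5, 2.5, 4.5, 5.5], [1, 1, 0, 0]
model = boost(x_train, y_train, x_valid, y_valid)
print(predict_proba(model, 1.0), predict_proba(model, 6.0))
```

Each stump is fitted to the residual errors left by the trees already in the ensemble, which is the sequential behavior that distinguishes boosting from the independently grown trees of a random forest.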
5 AR denotes the accuracy ratio; the closer the AR is to one, the better the model. A random model's AR
equals 0 and a perfect model's AR equals 1. See Annex 3 of “The Study for the Introduction of Credit Risk
Database (CRD) in the Philippines: Data Quality Examination Report” for technical details.
6 Feature importance indicates a variable's predictive power, that is, which variables contribute most to
the model's predictions. It is calculated from the model fitted on the training data rather than on the
test data, most commonly as the decrease in Gini impurity. Refer to Hastie et al. (2014) for technical
details.
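As a minimal sketch of impurity-based feature importance, assuming binary default labels and a hypothetical flat list of the splits made in the trees fitted on the training data: each split's contribution is the decrease in Gini impurity it achieves, weighted by the number of training observations reaching the node, and the per-feature totals are normalized to sum to one. The feature names and labels below are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Gini impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def feature_importance(splits):
    """Sum sample-weighted impurity decreases per feature, then normalize to sum to 1.

    splits: hypothetical list of (feature, parent_labels, left_labels, right_labels),
    one entry per split in the trees fitted on the training data.
    """
    totals = Counter()
    for feature, parent, left, right in splits:
        totals[feature] += len(parent) * impurity_decrease(parent, left, right)
    grand_total = sum(totals.values())
    return {feature: value / grand_total for feature, value in totals.items()}


splits = [
    ("cash_balance", [1, 1, 0, 0], [1, 1], [0, 0]),  # perfect split: decrease 0.5
    ("cash_outflow", [1, 0, 0, 0], [1, 0], [0, 0]),  # partial split: decrease 0.125
]
importance = feature_importance(splits)
# importance == {"cash_balance": 0.8, "cash_outflow": 0.2}
```

Because the splits are recorded while the trees are grown, the importances describe the training fit, which is why they are computed on the training model rather than on test data.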
Table 4 reports the performance of the traditional logistic model and the machine
learning models. Logistic regression using several of the variables with the highest
feature importance performs slightly worse than the machine learning models on
out-of-sample data. The logistic model yields an AR of approximately 0.71 for both the
test data and the back-test data, lower than XGBoost in both tests and lower than random
forest in back testing.
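The AR values reported above can be computed from predicted default scores using the standard identity AR = 2 * AUC - 1, where AUC is the area under the ROC curve, consistent with footnote 5 (a random model scores 0 and a perfect model 1). The sketch below assumes that a higher score means a riskier firm and counts tied scores as one half; the function name and toy data are hypothetical, and the pairwise loop is quadratic, so it is meant only for illustration on small samples.

```python
def accuracy_ratio(scores, defaults):
    """Accuracy ratio via AR = 2 * AUC - 1.

    scores: predicted default scores (higher = riskier); defaults: 1 if default else 0.
    AUC is the probability that a randomly chosen defaulter is scored above a randomly
    chosen non-defaulter, with ties counted as one half.
    """
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    if not bad or not good:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    auc = wins / (len(bad) * len(good))
    return 2.0 * auc - 1.0


defaults = [1, 0, 1, 0, 0]
print(accuracy_ratio([0.9, 0.1, 0.8, 0.2, 0.3], defaults))  # perfect ranking
print(accuracy_ratio([0.5] * 5, defaults))                  # uninformative model
print(accuracy_ratio([0.9, 0.6, 0.4, 0.3, 0.1], defaults))  # partly correct ranking
```

The three calls give 1.0, 0.0 and about 0.67 respectively, matching the interpretation of AR in footnote 5.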
7 The screening for the new credit line will be done by a machine learning model that analyzes the cash
movements in the applicant's account. The financing cost ranges from 3% to 9%. The lending amount is
capped at 10 million yen, approximately $100,000 (assuming USD 1 = JPY 100). News release of Resona Bank
(in Japanese): https://www.resona-gr.co.jp/holdings/news/hd_c/download_c/files/20200110_1a.pdf?_ga=2.47077284.2056143362.1579136287-646537328.1579049610.
by cash inflows from revenue, cash outflows for variable costs, cash outflows for the cost of
goods and services, and cash inflows from loans.
Model accuracy, measured by the accuracy ratio, peaks at 0.707 for a random forest with 350
decision trees and 150 features, and at 0.733 for XGBoost with a maximum tree depth of four
and a learning rate of 0.01. Both models reach accuracy levels generally accepted by the
Japanese banking sector.
This paper indicates that SME financial inclusion efforts can incorporate both traditional
and alternative data. Traditional data mainly support lending decisions, while alternative
data support monitoring as well as credit assessment for SMEs without financial
statements. Since the establishment of a nationwide database involves many interested
parties and complex procedures, leadership by the financial sector authority is an
important factor in synchronizing the efforts of all parties and directing them toward
a common goal.
REFERENCES
Amornkitvikai, Y., and C. Harvie. 2016. The Impact of Finance on the Performance of
Thai Manufacturing Small and Medium-Sized Enterprises. ADBI Working Paper
576. Tokyo: Asian Development Bank Institute. Date of access: 2020/04/09.
Available: http://www.adb.org/publications/impactfinance-performance-thai-
manufacturing-small-and-medium-sized-enterprises/.
Bangko Sentral ng Pilipinas. 2019. Financial Inclusion Initiatives. Manila. Date of access:
2020/04/09. Available: http://www.bsp.gov.ph/downloads/Publications/
2019/microfinance_2019.pdf.
Deloitte Tohmatsu Consulting. 2018. Data utilization of SMEs and small businesses
and the optimal information website. Japan SME Agency Report. Date of
access: 2020/04/09. Available: https://www.meti.go.jp/meti_lib/report/
H30FY/000316.pdf.
Hastie, T., R. Tibshirani, and J. Friedman. 2014. The Elements of Statistical Learning.
New York: Springer.
Japan International Cooperation Agency (JICA). 2019. The Study for the Introduction of
Credit Risk Database (CRD) in the Philippines: Final Report. Date of access:
2020/04/09. Available: http://open_jicareport.jica.go.jp/pdf/12344552.pdf.
Japan SME Agency, Ministry of Economy, Trade and Industry. 2019. White Paper on
Small and Medium Enterprises in Japan. Date of access: 2020/04/09. Available:
https://www.meti.go.jp/english/press/2019/pdf/0426_010a.pdf.
Japan Federation of Credit Guarantee Corporations. 2019. Annual Report 2019. Date of
access: 2020/04/09. Available: https://www.zenshinhoren.or.jp/english/
anual2019.pdf.
Kuwahara, S., N. Yoshino, M. Sagara, and F. Taghizadeh-Hesary. 2019. Establishment
of the Credit Risk Database: Concrete Use to Evaluate the Creditworthiness of
SMEs. ADBI Working Paper 924. Tokyo: Asian Development Bank Institute.
Date of access: 2020/04/09. Available: https://www.adb.org/publications/
establishment-credit-risk-database-evaluatecreditworthiness-smes.
Maningo, G. V. 2016. Credit Surety Fund: A Credit Innovation for Micro, Small, and
Medium-Sized Enterprises in the Philippines. ADBI Working Paper 589. Tokyo:
Asian Development Bank Institute. Date of access: 2020/04/09. Available:
http://www.adb.org/publications/credit-surety-fundcredit-innovation-micro-small-
and-medium-sized-enterprises-philippines/.
Miura, S., Y. Ijitsu, and M. Takekawa. 2019. Credit Risk Assessment using Transaction
Information: An Empirical Study with AI Approach. Tokyo: Bank of Japan
Working Paper Series. Date of access: 2020/04/09. Available:
https://www.boj.or.jp/research/wps_rev/wps_2019/wp19j04.htm/.
World Bank. 2017. Global Findex Database. Date of access: 2020/04/09. Available:
https://globalfindex.worldbank.org/basic-page-overview.