Paper 25
ABSTRACT The significant losses that banks and other financial organizations suffer due to new bank
account (NBA) fraud are alarming as the number of online banking service users increases. The inherent
skewness and rarity of NBA fraud instances are a major challenge for machine learning (ML) models:
when non-fraud instances far outnumber fraud instances, the models tend to overlook fraud and
erroneously classify it as non-fraud. Such errors can erode the confidence and trust of customers.
Existing studies consider fraud patterns rather than the potential losses of NBA fraud risk features
when addressing the skewness of fraud datasets. This research proposes detecting NBA fraud within
the context of value-at-risk, a risk measure that treats fraud instances as a worst-case scenario.
Value-at-risk uses historical simulation to estimate the potential losses of risk features and models
them as a skewed tail distribution. The risk-return features obtained from value-at-risk were classified
using ML on the bank account fraud (BAF) dataset. Value-at-risk handles the fraud skewness by using an
adjustable threshold probability range to attach weight to the skewed NBA fraud instances. A novel
detection rate (DT) metric that considers risk fraud features was used to measure the performance of
the fraud detection model. An improved fraud detection model is achieved using K-nearest neighbor, with
a true positive (TP) rate of 0.95 and a DT rate of 0.9406. Under an acceptable loss tolerance in the
banking sector, value-at-risk presents an intelligent approach for establishing data-driven criteria
for fraud risk management.
INDEX TERMS Detection rate, fraud detection, K-nearest neighbor, skewed instances, value-at-risk.
I. INTRODUCTION
The Association of Certified Fraud Examiners (ACFE) released a financial fraud report in 2022 stating
that 2,110 fraud cases involving industries in financial sectors in 133 countries resulted in losses of
around $3.6 billion [1]. Financial fraud can be defined as the deliberate use of unlawful procedures or
tactics to obtain financial gain [2]. The consequences of financial fraud can potentially disrupt
economies, raise living expenses, and undermine consumer confidence [3]. Forms of financial fraud
include insurance fraud, money laundering, new bank account fraud, credit and debit card fraud,
mortgage fraud, and many more [4], [5]. The act of opening an account to commit fraud at banks or
other financial organizations is known as ‘‘new bank account (NBA) fraud’’ [6]. Fraud not only results
in immediate financial losses and erodes public confidence in institutions, but also has broader
consequences, affecting customers and financial systems through market instability and contributing to
larger macroeconomic downturns [7]. Fraud datasets typically exhibit properties such as skewness,
evolving patterns, high dimensionality, and restricted access to relevant information. Specifically,
fraud skewness, in which the non-fraud class forms a large majority over the fraud class, has been
a major concern in studies, as it affects the performance of fraud detection models. Skewed fraud
instances can

The associate editor coordinating the review of this manuscript and approving it for publication was
Chao Tong.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
VOLUME 12, 2024 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 64285
A. U. Usman et al.: Financial Fraud Detection Using Value-at-Risk With Machine Learning in Skewed Data
have a bad influence on machine learning algorithms such as distance-based algorithms [8]. Previous
efforts in tackling fraud involve developing rule-based expert systems, statistical methods, machine
learning, and risk-based methods [9], [10]. Due to the cost of maintenance and the inefficiency of
rule-based methods [10], decision-makers turned to statistical methods such as autoregressive models
to handle financial fraud [11], [12], [13]. The complex patterns and high-dimensional nature of fraud
make statistical methods less effective; as such, machine learning models were deployed [10], [14].
However, some of the studies that utilize machine learning techniques were found to have a high false
positive (FP) rate [15], [16], [17]. Machine learning models can potentially handle high-dimensional
data and complex patterns of fraud instances.
To evaluate the effectiveness of machine learning models, Jesus et al. [18] presented the first
domain-specific, real-world bank account fraud (BAF) dataset. The datasets were generated using
generative adversarial networks (GANs) and evaluated using the light gradient boosting machine (LGBM).
The studies [18], [19] utilize 25 sets of hyperparameter configurations to optimize the LGBM model;
utility-aware reweighing was used to handle the class skewness of the BAF dataset. The study [15]
utilizes stacking in ensemble learning with majority voting to evaluate the BAF dataset and address
the changing fraud patterns. The study [20] uses federated learning to address data privacy issues of
the BAF dataset and deep neural networks to classify fraud instances. These studies achieve good
performance in addressing BAF challenges; however, they do not consider the potential losses of fraud
risk features. To our knowledge, little research exists that employs machine learning techniques in
NBA fraud detection. The detection of NBA fraud is proposed in this paper within the context of risk
management that uses value-at-risk to consider skewed fraud instances as a worst-case scenario. To
adequately estimate the losses of fraud risks, value-at-risk was augmented with the expected loss and
expected shortfall of frauds, which further quantify the mean and extreme loss effects, respectively.
This combination of risk measures allows the quantification of risks across mean, worst-case, and
extreme scenarios. Value-at-risk employs historical simulation to estimate potential losses of risk
features. The risk-return features obtained from value-at-risk are based on assessing their exposure
to fraud risk. The risk-return features are sent as input to the NBA fraud detection model. Different
machine learning models were trained; however, the K-nearest neighbor outperformed the other models.
The contributions of this paper are:
• This paper used an extreme value theorem to model the tails (potential losses) instead of the fraud
pattern.
• This paper used value-at-risk to model the skewness of fraud instances more efficiently.
• This paper utilized historical simulation to estimate value-at-risk as it makes no assumptions on
any distribution.
• This paper used novel detection rate performance metrics to capture the overall performance in
detection of NBA fraud instances that incorporate risk fraud factors.
The remainder of the paper is arranged as follows: The study’s review of the literature is presented
in Section II. The problem definition is presented in Section III. The materials and procedures are
presented in Section IV. The experimental setup is presented in Section V. The results are presented
in Section VI. The study’s conclusions and discussions are presented in Section VII.

II. LITERATURE REVIEW
This section presents related studies in financial fraud detection. Different studies exist that
utilize both statistical and artificial intelligence-based methods in the context of a risk and
financial fraud perspective.

A. STATISTICAL METHODS OF FRAUD DETECTION
Many studies in the literature utilize statistical methods in evaluating financial fraud.
Specifically, significant studies were found to utilize ordinary least squares (OLS) regression and
autoregressive (AR) models for financial fraud evaluation. Using the Tehran Stock Exchange dataset,
the study [21] uses a regression model to investigate the association between auditor characteristics
and fraud detection in emerging economies. The authors provide useful information for improving the
reliability of the findings. Using pooled OLS and panel regressions, the study [22] investigates the
effect of political alignment on corporate fraud convictions, offering insights into the connection
between politics and fraud. The authors use state-level data from 2003 to 2018 on US corporate fraud
convictions and party affiliation. The study [23] utilizes OLS to investigate financial factors of
financial fraud, which is attributed to the fraud triangle. The study [24] uses logistic regression to
discover that external pressures and financial stability had a favorable impact on financial reporting
fraud. On the other hand, collaboration, arrogance, changes in directors, incompetent oversight, and
hubris have little bearing on false financial reporting. The study [25] provides evidence for the
contribution of gender diversity to fraud commission and detection in Chinese listed businesses
between 2007 and 2018 using a bivariate probit model. The authors opined that female corporate
executives are linked to a stronger ability to detect fraud, which lowers the likelihood that
businesses commit fraud. From the standpoint of external auditors, the study [26] sheds light on the
causes of fraud and the function of forensic accounting using regression analysis to analyze Lebanese
data. The study [4] discovered that while the overall number of employees engaged in fraud affects the
performance of money banks in Nigeria, the number of fraud cases and the total amount lost to fraud
had a favorable influence. The use of statistical methods such as OLS regression, Pearson correlation,
and descriptive analysis strengthens the findings of the authors. The sales growth index and the
depreciation index factors
that make up the M-score are used in the study [27] to analyze the possibility of profit management
using the Athens Stock Exchange Market. Notably, a large body of literature utilizes the AR model. To
handle large-scale non-uniform transactions more quickly, the study [12] employs the AR model, which
makes it appropriate for detecting money laundering operations. The study [11] uses factor analysis to
generate the composite indicator, and fractional integration (ARFIMA) and fractional cointegration VAR
(FCVAR) approaches to evaluate the behavior of the composite tax fraud suspicion indicator in relation
to GDP and tax collection. The study [13] employs the AR model, which considers the block-wise
structure of networks and is therefore appropriate for studying networks with such topologies, and
applies it to the detection of financial transaction fraud. The authors discovered that, in line with
reality, there is a risk relationship between fraudulent groups and ordinary loan applicants. The
study [28] outlined specific identification indicators that help with the detection of financial fraud
using digital distribution laws, and the authors demonstrate that the probability of financial fraud
increases significantly as the deviation of the financial data distribution from Benford’s law
increases.
In summary, a large body of literature uses statistical methods to analyze the causes and effects that
influence financial fraud, but due to the complex nature and scalability of fraud, statistical methods
are not enough to adequately examine financial fraud.

B. RISK-BASED METHODS OF FRAUD DETECTION
This section presents the financial fraud assessment from the perspective of risk mitigation. The
existing studies utilize different risk measures such as value-at-risk (VaR), expected loss, and
expected shortfall to assess the level of risk of fraud. The study [29] offers strategies for breaking
down the risk of fraud, identifying potential fraudsters, and enabling more targeted anti-fraud
measures by tying the motivation of the fraud triangle to human tendencies that lead to specific
actions as well as the meta-model of fraud together. Regression analysis is utilized in the study [30]
to look at how enterprises manage risk to determine how control environments, risk assessments,
control activities, information and communication, and monitoring contributed to fraud prevention and
detection efforts in Indonesian firms. The study [31] defined additional security attributes that
might have an impact on the cloud system and carried out an anomaly detection based on risk assessment
named parallel processing (PP) that covers cyber threats and exploitation likelihoods. The model
checker is then employed to determine the risk exposure rates associated with the respective attacks.
The study [32] proposes a framework, covering database construction and risk modeling, in which
doubly-truncated severity distributions are used to estimate the operational risk. By applying
value-at-risk and expected shortfall to identify operational risk sources like external fraud risk and
legal risk sections, the authors were able to produce better and consistent results. The study [33]
uses the number of compromised records to determine the cost of a data breach; the findings indicate
that the total number of affected records follows a Fréchet distribution, and random forest is used
for estimating the number of such records. The study [34] uses the estimate of generalized extreme
value parameters to evaluate competency, digital technology abilities, and personality qualities that
may improve the ability of external auditors to identify fraud risk. The efficiency of fraud risk
assessment was linked to digital technology abilities through the application of the partial
least-squares structural equation model (PLS-SEM). The study [35] identified a positive correlation
between fraud risk assessment and management and the efficient use of forensic accounting using
chi-square, Fisher test, and correlation; however, there is no relationship between fraud risk
assessment and management in terms of techniques causing fraud. The study [9] examines fraud using
ensemble learners for anomaly detection that also handle data skewness, a triage model that receives
input from the ensemble model, and a risk model that estimates the financial losses. The authors
successfully provide an effective fraud risk-based detection, from machine learning techniques to risk
assessment, but do not evaluate fraud detection by first considering the risk component before
subjecting it to machine learning detection.
In summary, risk measures are good in the assessment and management of the features associated with
fraud for effective fraud prevention and control. However, due to the nonlinearity, high dimension,
and complex nature of fraud, these risk measures need to be augmented with other techniques such as
machine learning techniques that enable proper and efficient fraud prevention and detection.

C. MACHINE LEARNING METHODS IN FRAUD DETECTION
This section presents studies that utilize machine learning techniques for the classification of fraud
applications. The majority of the presented studies consider the detection while addressing the skewed
nature of fraud instances. Sampling methods, hybrid methods, and other novel methods are mainly used
to overcome the skewed nature of fraud datasets. The study [36] addresses class skewness in credit
card fraud using quantum machine learning (QML) and support vector machines (SVM). The results show
that classic machine learning techniques are still useful for non-time series data, whereas QML
applications can be used for time-series-based and highly skewed data. In the study [37], a quantum
neural network (QNN) achieves good performance in fraud detection. The study [38] trained different
machine learning models, all using default implementations and parameters; XGBoost performed more
accurately than the other models. Telecom fraud detection is assessed in the study [39] using a
dynamic graph neural network (DGNN); the authors present a method for resolving the issue of telecom
fraud detection in extensive phone social networks. To assess credit card fraud while
considering the skewness of fraud instances, the study [40] makes use of logistic regression (LR),
K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and autoencoder (AE), as they can
handle skewed data better than other models; the AE model performs best. KNN, linear discriminant
analysis (LDA), and linear regression are used in the study [41] to investigate credit card fraud; by
addressing the skewed nature of the credit card fraud data and using cross-validation techniques, KNN
showed higher performance. Using the ARIMA model for fraud detection based on daily transaction
counts, the study [14] carried out anomaly detection; the model is contrasted with four
industry-standard anomaly detection algorithms: the box plot, isolation forest (IF), local outlier
factor (LOF), and K-means models. An ensemble classifier (EC) [42] incorporating bagging and boosting
has been used to address the issue of fraud class skewness; the approach is found to perform better
when compared to the current methods. The study [43] addresses the issue of skewed datasets by using
fuzzy C-means clustering and the selection of related instances. The authors address the issues with
conventional under-sampling strategies to enhance the detection performance and accuracy. To identify
fraudulent transactions, the study [44] suggested an LSTM ensemble; SMOTE-ENN was used to address the
problem of fraud skewness. The method outperformed other algorithms in terms of performance, but the
SMOTE method may occasionally produce instances that are not typical instances of the minority class.
A dynamic ensemble technique [45] for anomaly identification in Internet of Things systems is
proposed. To address the issue of fraud skewness, the borderline synthetic minority over-sampling
approach (Borderline-SMOTE), One-Sided Selection (OSS), and adaptive synthetic (ADASYN) were applied
in the study [46]; OSS was found to be the optimal under-sampling technique, and ADASYN performs
better when employing the gradient tree boosting (GTB) classifier. The random forest ensemble approach
[47] performed exceptionally well on oversampling and under-sampling. However, under-sampling usually
leads to the loss of important information, while oversampling introduces information that may not be
fully representative of the training set.

TABLE 1. Table of related studies.

It is widely acknowledged that the skewed distribution of fraud instances presents a significant
challenge for many machine learning models. The resampling techniques that have been used in effective
fraud skewness mitigation may not be free from certain shortcomings. The resampled instances are often
not representative of the dataset, are prone to overfitting, and can lose important data. Hence,
there is a need to augment the effort of machine learning algorithms with a novel approach to
overcoming this challenge.
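The resampling shortcomings summarized above can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions: the data are synthetic, the 1% fraud rate is hypothetical, and the helper names (`split_indices`, `random_undersample`, `random_oversample`) are invented for this example. It uses plain random resampling rather than SMOTE, OSS, or ADASYN, and it is not the method of any cited study.

```python
import random


def split_indices(labels):
    """Return (fraud_indices, non_fraud_indices) for 0/1 labels."""
    fraud = [i for i, y in enumerate(labels) if y == 1]
    non_fraud = [i for i, y in enumerate(labels) if y == 0]
    return fraud, non_fraud


def random_undersample(rows, labels, rng):
    """Keep every fraud row and an equally sized random subset of non-fraud rows."""
    fraud, non_fraud = split_indices(labels)
    kept = rng.sample(non_fraud, k=len(fraud))
    idx = fraud + kept
    return [rows[i] for i in idx], [labels[i] for i in idx]


def random_oversample(rows, labels, rng):
    """Duplicate randomly chosen fraud rows until both classes have equal size."""
    fraud, non_fraud = split_indices(labels)
    extra = rng.choices(fraud, k=len(non_fraud) - len(fraud))
    idx = non_fraud + fraud + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


rng = random.Random(0)
n_rows, n_fraud = 1000, 10  # hypothetical dataset with a 1% fraud rate
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_rows)]
labels = [1] * n_fraud + [0] * (n_rows - n_fraud)

under_rows, under_labels = random_undersample(rows, labels, rng)
over_rows, over_labels = random_oversample(rows, labels, rng)

# Under-sampling: most of the original rows are thrown away.
discarded = n_rows - len(under_rows)
# Over-sampling: many fraud rows, but only a few distinct ones.
fraud_total = sum(over_labels)
fraud_distinct = len({r for r, y in zip(over_rows, over_labels) if y == 1})

print(f"under-sampling discarded {discarded} of {n_rows} rows")
print(f"over-sampling holds {fraud_total} fraud rows, {fraud_distinct} distinct")
```

In this toy setting, balancing by under-sampling removes almost all non-fraud rows, while balancing by over-sampling repeats the same few fraud rows many times, which is one way the loss of information and the overfitting risk noted above can arise.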
D. RESEARCH PROBLEM
The problem of NBA fraud keeps increasing daily as the
number of online banking service users keeps increasing [3].
D. VALUE-AT-RISK V
Because value-at-risk emphasizes statistically extreme but significant fraud instances, it is well
suited to the development of efficient fraud detection models. In the financial sector, value-at-risk
is a quantile of the loss distribution that gives a range of potential losses and is one of the most
frequently used measures of risk. V can also be termed as a statistical measure of the risk of loss
over a specific time horizon at a given confidence level. It also plays a significant part in the
Basel regulatory framework. V has a confidence level α ∈ (0, 1) [48]. This experiment adopted the
Solvency II framework, which uses a one-year horizon with the level of confidence α equal to 0.995.
It can be written as in (1):
FIGURE 2. Value-at-risk return curve.
V =ℓ+C (3)
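As a rough illustration of the quantile view of value-at-risk described in this subsection, the following Python sketch estimates V by historical simulation at α = 0.995, together with the expected loss and expected shortfall mentioned in the introduction. It is not the paper's implementation: the function names, the synthetic log-normal losses, and the empirical-quantile convention (the smallest observed loss whose empirical distribution value reaches α) are all assumptions made for the example.

```python
import math
import random


def historical_var(losses, alpha):
    """Empirical alpha-quantile of the observed losses (historical simulation).

    Assumed convention: the smallest observed loss whose empirical CDF value
    is at least alpha. No parametric distribution is fitted.
    """
    ordered = sorted(losses)
    # round() guards against floating-point noise in alpha * n before ceil().
    rank = math.ceil(round(alpha * len(ordered), 9))
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def expected_shortfall(losses, alpha):
    """Mean of the losses at or beyond the value-at-risk (the extreme tail)."""
    var = historical_var(losses, alpha)
    tail = [loss for loss in losses if loss >= var]
    return sum(tail) / len(tail)


def expected_loss(losses):
    """Plain average loss (the mean scenario)."""
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical right-skewed losses; real inputs would be historical
    # observations of a risk feature.
    losses = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
    alpha = 0.995  # confidence level quoted in the text
    print("expected loss     :", expected_loss(losses))
    print("value-at-risk     :", historical_var(losses, alpha))
    print("expected shortfall:", expected_shortfall(losses, alpha))
```

With this convention the expected shortfall is never below the value-at-risk, because it averages only the losses at or beyond that quantile. Other empirical-quantile conventions (for example, interpolating between order statistics) would give slightly different values on small samples.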
TABLE 8. Parameter analysis involving a number of nearest neighbors k of KNN.
TABLE 9. Parameter analysis involving a distribution assumption of NB.
distributions were evaluated and the accuracy results are given in Table 9.

VII. DISCUSSIONS AND CONCLUSION
This section presents a discussion of the results and the conclusion of our findings.

A. DISCUSSIONS
This paper explored improving the performance of the NBA fraud detection model by employing
value-at-risk. The performance of the fraud detection models was measured based on the removal of
redundant features to lower the complexity of the model, the selection of important features capable
of influencing fraud detection to avoid noise and collinearity, and the engineering of features from
the contextual perspective that increase the model performance. The raw features were sent to BLR,
KNN, and NB models for classification. The KNN model outperforms the other models, as shown in Table 3,
with an accuracy of 0.9884, a TP rate of 0.0061, an FP rate of 0.0007, and an f-score of 0.0115. This
performance is neither good nor reliable, as evidenced by the low TP rate and f-score, which
necessitates improving the model. Given that, value-at-risk was employed to improve the model. To
improve the NBA fraud detection model, raw features were simulated through value-at-risk. The
risk-return features obtained from value-at-risk were sent to BLR, KNN, and NB models for
classification. Among the models, the KNN model performs best, as shown in Table 4, with an f-score of
0.9333, a TP rate of 0.95, an accuracy of 0.9167, and a DT rate of 0.9406. The NBA fraud detection
model based on value-at-risk features appears to have good performance. The reliability test conducted
using the Kupiec test showed the model to be reliable and consistent, as shown in Fig. 8. This
indicates that the value-at-risk engineered features led to the improvement of the K-nearest neighbor
fraud detection model.

B. CONCLUSION
The value-at-risk-based fraud detection model presented in this paper enables the quantification and
mitigation of fraud risk features and at the same time overcomes the influence of skewed fraud
instances, which is crucial in solving financial fraud challenges. Value-at-risk attaches a confidence
probability weight to the rare fraud cases with nearest neighbor distance k. The distance weight of
KNN is imperative in inhibiting class skewness by assigning a higher weight to near instances, which
in turn facilitates efficient detection of skewed instances. The deployment of expected shortfall and
expected loss alongside value-at-risk allows quantification of risk across mean, worst-case, and
extreme scenarios, enabling aggregation of their strengths. Therefore, an accurate fraud detection
system assists organizations in making effective choices and reducing the overall expense of fraud
detection and prevention. This paper does not consider time windows in the experiment. A further major
challenge is the limited availability of data for NBA fraud detection.

CONFLICT OF INTEREST
There are no competing interests disclosed by the authors.

AVAILABILITY OF DATA AND MATERIALS
The data used in this research is publicly available at
https://github.com/feedzai/bank-account-fraud

ACKNOWLEDGMENT
The authors would like to acknowledge the support of Prince Sultan University for paying the Article
Processing Charges (APC) of this publication and also would like to thank the School of Statistics &
Mathematics, Zhejiang Gongshang University, China.

REFERENCES
[1] ACFE. Association of Certified Fraud Examiners (ACFE) 2022 Report to the Nations. Accessed: 2023.
[Online]. Available: https://legacy.acfe.com/report-to-the-nations/2022/
[2] T. Ashfaq, R. Khalid, A. S. Yahaya, S. Aslam, A. T. Azar, S. Alsafari, and I. A. Hameed,
‘‘A machine learning and blockchain based efficient fraud detection mechanism,’’ Sensors, vol. 22,
no. 19, p. 7162, Sep. 2022.
[3] N. S. Alfaiz and S. M. Fati, ‘‘Enhanced credit card fraud detection model using machine
learning,’’ Electronics, vol. 11, no. 4, p. 662, 2022.
[4] A. Alfaadhel, I. Almomani, and M. Ahmed, ‘‘Risk-based cybersecurity compliance assessment system
(RC2AS),’’ Appl. Sci., vol. 13, no. 10, p. 6145, May 2023.
[5] D. Sarma, W. Alam, I. Saha, M. N. Alam, M. J. Alam, and S. Hossain, ‘‘Bank fraud detection using
community detection algorithm,’’ in Proc. 2nd Int. Conf. Inventive Res. Comput. Appl. (ICIRCA),
Jul. 2020, pp. 642–646.
[6] A. Pagano, ‘‘Digital account opening fraud on demand deposit accounts: An assessment of available
technology,’’ Ph.D. thesis, Utica College, Utica, NY, USA, 2020.
[7] Shuftipro. New Account Fraud—A New Breed of Scams. Accessed: 2023. [Online]. Available:
https://shuftipro.com/reports-whitepapers/new-account-fraud.pdf
[8] R. Sasirekha, B. Kanisha, and S. Kaliraj, ‘‘Study on class imbalance problem with modified KNN
for classification,’’ in Intelligent Data Communication Technologies and Internet of Things, vol. 101.
Singapore: Springer, 2022, pp. 207–217, doi: 10.1007/978-981-16-7610-9_15.
[9] P. Vanini, S. Rossi, E. Zvizdic, and T. Domenig, ‘‘Online payment fraud: From anomaly detection
to risk management,’’ Financial Innov., vol. 9, no. 1, p. 66, Mar. 2023,
doi: 10.1186/s40854-023-00470-w.
[10] X. Zhu, X. Ao, Z. Qin, Y. Chang, Y. Liu, Q. He, and J. Li, ‘‘Intelligent financial fraud
detection practices in post-pandemic era,’’ Innovation, vol. 2, no. 4, Nov. 2021, Art. no. 100176,
doi: 10.1016/j.xinn.2021.100176.
[11] M. Monge, C. Poza, and S. Borgia, ‘‘A proposal of a suspicion of tax fraud indicator based on
Google Trends to foresee Spanish tax revenues,’’ Int. Econ., vol. 169, pp. 1–12, May 2022,
doi: 10.1016/j.inteco.2021.11.002.
[12] S. Kannan and K. Somasundaram, ‘‘Autoregressive-based outlier algorithm to detect money
laundering activities,’’ J. Money Laundering Control, vol. 20, no. 2, pp. 190–202, May 2017,
doi: 10.1108/jmlc-07-2016-0031.
[13] B. Xiao, B. Lei, W. Lan, and B. Guo, ‘‘A blockwise network autoregressive model with application
for fraud detection,’’ Ann. Inst. Stat. Math., vol. 74, no. 6, pp. 1043–1065, Dec. 2022,
doi: 10.1007/s10463-022-00822-w.
[14] G. Moschini, R. Houssou, J. Bovay, and S. Robert-Nicoud, ‘‘Anomaly and fraud detection in credit
card transactions using the ARIMA model,’’ in Proc. 7th Int. Conf. Time Forecasting, Jul. 2021, p. 56,
doi: 10.3390/engproc2021005056.
[15] A. A. Alhashmi, A. M. Alashjaee, A. A. Darem, A. F. Alanazi, and R. Effghi, ‘‘An ensemble-based
fraud detection model for financial transaction cyber threat classification and countermeasures,’’
Eng., Technol. Appl. Sci. Res., vol. 13, no. 6, pp. 12433–12439, Dec. 2023, doi: 10.48084/etasr.6401.
[16] R. M. Aziz, R. Mahto, K. Goel, A. Das, P. Kumar, and A. Saxena, ‘‘Modified genetic algorithm with
deep learning for fraud transactions of ethereum smart contract,’’ Appl. Sci., vol. 13, no. 2, p. 697,
Jan. 2023, doi: 10.3390/app13020697.
[17] M. Hegazy, A. Madian, and M. Ragaie, ‘‘Enhanced fraud miner: Credit card fraud detection using
clustering data mining techniques,’’ Egyptian Comput. Sci. J., vol. 40, no. 3, pp. 1–10, 2016.
[18] S. Jesus, J. Pombal, D. Alves, A. Cruz, P. Saleiro, R. Ribeiro, J. Gama, and P. Bizarro,
‘‘Turning the tables: Biased, imbalanced, dynamic tabular datasets for ML evaluation,’’ in Proc. Adv.
Neural Inf. Process. Syst., vol. 35, 2022, pp. 33563–33575.
[19] J. Pombal, P. Saleiro, M. A. T. Figueiredo, and P. Bizarro, ‘‘Fairness-aware data valuation for
supervised learning,’’ 2023, arXiv:2303.16963.
[20] T. Awosika, R. Mani Shukla, and B. Pranggono, ‘‘Transparency and privacy: The role of explainable
AI and federated learning in financial fraud detection,’’ 2023, arXiv:2312.13334.
[21] J. Khaksar, M. Salehi, and M. Lari DashtBayaz, ‘‘The relationship between auditor characteristics
and fraud detection,’’ J. Facilities Manage., vol. 20, no. 1, pp. 79–101, Jan. 2022,
doi: 10.1108/jfm-02-2021-0024.
[22] A. Cordis, ‘‘Political alignment and corporate fraud: Evidence from the United States of
America,’’ J. Appl. Accounting Res., Oct. 2023, doi: 10.1108/jaar-06-2022-0159.
[23] M. J. Rahman and X. Jie, ‘‘Fraud detection using fraud triangle theory: Evidence from China,’’
J. Financial Crime, vol. 31, no. 1, pp. 101–118, Jan. 2024, doi: 10.1108/jfc-09-2022-0219.
[24] T. Achmad, I. Ghozali, and I. D. Pamungkas, ‘‘Hexagon fraud: Detection of fraudulent financial
reporting in state-owned enterprises Indonesia,’’ Economies, vol. 10, no. 1, p. 13, Jan. 2022.
[25] Y. Wang, M. Yu, and S. Gao, ‘‘Gender diversity and financial statement fraud,’’ J. Accounting
Public Policy, vol. 41, no. 2, Mar. 2022, Art. no. 106903.
[26] J. Hendieh, M. Schneider, and T. Sakr, ‘‘Fraud detection and prevention,’’ Middle-East J. Sci.
Res., vol. 31, no. 1, pp. 44–52, 2023.
[27] A. Maniatis, ‘‘Detecting the probability of financial fraud due to earnings manipulation in
companies listed in Athens stock exchange market,’’ J. Financial Crime, vol. 29, no. 2, pp. 603–619,
Mar. 2022.
[28] Y. Gong, J. Li, Z. Xu, and G. Li, ‘‘Detecting financial fraud using two types of Benford factors:
Evidence from China,’’ Proc. Comput. Sci., vol. 214, pp. 656–663, Jan. 2022,
doi: 10.1016/j.procs.2022.11.225.
[29] P. Kagias, A. Cheliatsidou, A. Garefalakis, J. Azibi, and N. Sariannidis, ‘‘The fraud triangle –
an alternative approach,’’ J. Financial Crime, vol. 29, no. 3, pp. 908–924, May 2022,
doi: 10.1108/jfc-07-2021-0159.
[30] T. Tarjo, H. V. Vidyantha, A. Anggono, R. Yuliana, and S. Musyarofah, ‘‘The effect of enterprise
risk management on prevention and detection fraud in Indonesia’s local government,’’ Cogent Econ.
Finance, vol. 10, no. 1, Dec. 2022, Art. no. 2101222, doi:
[34] N. I. Mat Ridzuan, J. Said, F. M. Razali, D. I. Abdul Manan, and N. Sulaiman, ‘‘Examining the
role of personality traits, digital technology skills and competency on the effectiveness of fraud
risk assessment among external auditors,’’ J. Risk Financial Manage., vol. 15, no. 11, p. 536,
Nov. 2022, doi: 10.3390/jrfm15110536.
[35] O. E. Akinbowale, H. E. Klingelhöfer, and M. F. Zerihun, ‘‘Application of forensic accounting
techniques in the south African banking industry for the purpose of fraud risk mitigation,’’ Cogent
Econ. Finance, vol. 11, no. 1, Dec. 2023, Art. no. 2153412, doi: 10.1080/23322039.2022.2153412.
[36] H. Wang, W. Wang, Y. Liu, and B. Alidaee, ‘‘Integrating machine learning algorithms with quantum
annealing solvers for online fraud detection,’’ IEEE Access, vol. 10, pp. 75908–75917, 2022.
[37] N. Innan, A. Sawaika, A. Dhor, S. Dutta, S. Thota, H. Gokal, N. Patel, M. A.-Z. Khan,
I. Theodonis, and M. Bennai, ‘‘Financial fraud detection using quantum graph neural networks,’’
Quantum Mach. Intell., vol. 6, no. 1, pp. 1–18, Jun. 2024.
[38] A. Alwadain, R. F. Ali, and A. Muneer, ‘‘Estimating financial fraud through transaction-level
features and machine learning,’’ Mathematics, vol. 11, no. 5, p. 1184, Feb. 2023.
[39] L. Ren, R. Hu, D. Li, Y. Liu, J. Wu, Y. Zang, and W. Hu, ‘‘Dynamic graph neural network-based
fraud detectors against collaborative fraudsters,’’ Knowl.-Based Syst., vol. 278, Oct. 2023,
Art. no. 110888.
[40] V. Chang, L. M. T. Doan, A. Di Stefano, Z. Sun, and G. Fortino, ‘‘Digital payment fraud detection
methods in digital ages and industry 4.0,’’ Comput. Electr. Eng., vol. 100, May 2022,
Art. no. 107734, doi: 10.1016/j.compeleceng.2022.107734.
[41] J. Chung and K. Lee, ‘‘Credit card fraud detection: An improved strategy for high recall using
KNN, LDA, and linear regression,’’ Sensors, vol. 23, no. 18, p. 7788, Sep. 2023,
doi: 10.3390/s23187788.
[42] V. S. S. Karthik, A. Mishra, and U. S. Reddy, ‘‘Credit card fraud detection by modelling
behaviour pattern using hybrid ensemble model,’’ Arabian J. Sci. Eng., vol. 47, no. 2, pp. 1987–1997,
Feb. 2022, doi: 10.1007/s13369-021-06147-9.
[43] H. Ahmad, B. Kasasbeh, B. Aldabaybah, and E. Rawashdeh, ‘‘Class balancing framework for credit
card fraud detection based on clustering and similarity-based selection (SBS),’’ Int. J. Inf.
Technol., vol. 15, no. 1, pp. 325–333, Jan. 2023, doi: 10.1007/s41870-022-00987-w.
[44] E. Esenogho, I. D. Mienye, T. G. Swart, K. Aruleba, and G. Obaido, ‘‘A neural network ensemble
with feature engineering for improved credit card fraud detection,’’ IEEE Access, vol. 10,
pp. 16400–16407, 2022, doi: 10.1109/ACCESS.2022.3148298.
[45] J. Jiang, F. Liu, Y. Liu, Q. Tang, B. Wang, G. Zhong, and W. Wang, ‘‘A dynamic ensemble algorithm
for anomaly detection in IoT imbalanced data streams,’’ Comput. Commun., vol. 194, pp. 250–257,
Oct. 2022, doi: 10.1016/j.comcom.2022.07.034.
[46] D. Sisodia and D. S. Sisodia, ‘‘Data sampling strategies for click fraud detection using
imbalanced user click data of online advertising: An empirical review,’’ IETE Tech. Rev., vol. 39,
no. 4, pp. 789–798, Jul. 2022, doi: 10.1080/02564602.2021.1915892.
[47] A. Singh, R. K. Ranjan, and A. Tiwari, ‘‘Credit card fraud detection under extreme imbalanced
data: A comparative study of data-level algorithms,’’ J. Exp. Theor. Artif. Intell., vol. 34, no. 4,
pp. 571–598, Jul. 2022, doi: 10.1080/0952813x.2021.1907795.
[48] A. J. McNeil, R. Frey, and P. Embrechts, ‘‘Quantitative risk management: Concepts, techniques and
tools, Revised edition,’’ in Princeton Series in Finance. Princeton, NJ, USA: Princeton Univ. Press,
2015.
[49] D. Gorton, ‘‘Modeling fraud prevention of online services using incident response trees and value
at risk,’’ in Proc. 10th Int. Conf. Availability, Rel. Secur., Toulouse, France, Aug. 2015,
pp. 149–158, doi: 10.1109/ARES.2015.17.
[50] Y. Lyu, F. Qin, R. Ke, Y. Wei, and M. Kong, ‘‘Does mixed frequency variables help to forecast
value at risk in the crude oil market?’’ Resour. Policy, vol. 88, Jan. 2024, Art. no. 104426,
doi: 10.1016/j.resourpol.2023.104426.
10.1080/23322039.2022.2101222. [51] S. B. Abdullahi and K. Chamnongthai, ‘‘IDF-sign: Addressing inconsistent
[31] B. Stojanović and J. Božić, ‘‘Robust financial fraud alerting system based depth features for dynamic sign word recognition,’’ IEEE Access, vol. 11,
in the cloud environment,’’ Sensors, vol. 22, no. 23, p. 9461, Dec. 2022, pp. 88511–88526, 2023.
doi: 10.3390/s22239461. [52] A. Mahajan, V. S. Baghel, and R. Jayaraman, ‘‘Credit card fraud detection
[32] Y. Yao and J. Li, ‘‘Operational risk assessment of third-party payment using logistic regression with imbalanced dataset,’’ in Proc. 10th Int. Conf.
platforms: A case study of China,’’ Financial Innov., vol. 8, no. 1, p. 19, Comput. Sustain. Global Develop. (INDIACom), Mar. 2023, pp. 339–342.
Dec. 2022, doi: 10.1186/s40854-022-00332-x. [53] F. Aslam, A. I. Hunjra, Z. Ftiti, W. Louhichi, and T. Shams, ‘‘Insurance
[33] J. S. Kamdem and D. Selambi, ‘‘Cyber-risk forecasting using machine fraud detection: Evidence from artificial intelligence and machine learn-
learning models and generalized extreme value distributions,’’ Hal Sci., ing,’’ Res. Int. Bus. Finance, vol. 62, Dec. 2022, Art. no. 101744, doi:
vol. 1, pp. 1–23, Jan. 2022. 10.1016/j.ribaf.2022.101744.
[54] E. Ileberi, Y. Sun, and Z. Wang, ‘‘A machine learning based credit card fraud detection using the GA algorithm for feature selection,’’ J. Big Data, vol. 9, no. 1, p. 24, Dec. 2022, doi: 10.1186/s40537-022-00573-8.
[55] P. Atchaya and K. Somasundaram, ‘‘Novel logistic regression over Naive Bayes improves accuracy in credit card fraud detection,’’ J. Surv. Fisheries Sci., vol. 10, no. 1S, pp. 2172–2181, 2023.
[56] R. Bin Sulaiman, V. Schetinin, and P. Sant, ‘‘Review of machine learning approach on credit card fraud detection,’’ Hum.-Centric Intell. Syst., vol. 2, nos. 1–2, pp. 55–68, Jun. 2022, doi: 10.1007/s44230-022-00004-0.
[57] A. Kannagi, J. Gori Mohammed, S. Sabari Giri Murugan, and M. Varsha, ‘‘Intelligent mechanical systems and its applications on online fraud detection analysis using pattern recognition K-nearest neighbor algorithm for cloud security applications,’’ Mater. Today: Proc., vol. 81, pp. 745–749, 2023, doi: 10.1016/j.matpr.2021.04.228.
[58] P. H. Kupiec, ‘‘Techniques for verifying the accuracy of risk measurement models,’’ in Division of Research and Statistics, Division of Monetary Affairs, Federal Reserve Board, vol. 95. USA: Journal of Derivatives, 1995.
[59] K. Kireev, M. Andriushchenko, C. Troncoso, and N. Flammarion, ‘‘Transferable adversarial robustness for categorical data via universal robust embeddings,’’ 2023, arXiv:2306.04064.

BAYAN ALGHOFAILY received the master’s and Ph.D. degrees in computer science from Toronto Metropolitan University, Toronto, Canada. During that period, she was a member of the Distributed Applications and Broadband Networks Laboratory (DABNEL). She focused on studying how the performance of machine learning models is affected by dataset features. She is currently an Assistant Professor with the Department of Information System, CCIS, Prince Sultan University (PSU). She is also a member of the Artificial Intelligence and Data Analytics (AIDA) Laboratory, CCIS, PSU. Her research interests include AI, NLP, ML, and neural networks. She continues to explore this further in her research.