Abstract
This paper proposes an automatic alpha factor mining method to help expert traders find profitable alpha factors efficiently. Rather than directly enumerating all possible combinations to find qualified alpha factors, the method formulates the mining task as an iterative convolution kernel learning problem, in which each kernel to be learned is associated with a unique alpha factor. Sparsity is introduced at the mutation step of the learning process so that simple, interpretable solutions are found efficiently and the risk of overfitting in real-world trading is reduced. A theorem shows that the designed learning process terminates automatically in a finite number of iterations, because all convolution kernel vectors converge to zero vectors. In addition, a score function based on win rate, expected return, and trade frequency is designed to evaluate, in practical terms, the market entry signals generated by the alpha factors. Convolution kernels with high scores are recorded and exported as the mined alpha factors. Experimental results show that the proposed method achieves superior performance on both the China government bond dataset and the gold dataset.
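The abstract does not give the mutation rule or the score formula, so the following is only a minimal sketch under stated assumptions. It illustrates why a sparsity-inducing mutation can force termination in finitely many iterations, and how win rate, expected return and trade frequency could be combined into one score. The functions `sparsify_step`, `iterations_until_zero` and `score`, the "zero one nonzero entry per step" rule, and the multiplicative score are all hypothetical and are not taken from the paper.

```python
import random


def sparsify_step(kernel, rng):
    """Hypothetical mutation: zero out one randomly chosen nonzero entry."""
    nonzero = [i for i, w in enumerate(kernel) if w != 0]
    if not nonzero:
        return list(kernel)
    mutated = list(kernel)
    mutated[rng.choice(nonzero)] = 0
    return mutated


def iterations_until_zero(kernel, seed=0):
    """Count mutation steps until the kernel becomes the zero vector.

    Each step removes exactly one nonzero entry, so the loop stops after
    as many steps as the kernel initially has nonzero entries.
    """
    rng = random.Random(seed)
    steps = 0
    while any(w != 0 for w in kernel):
        kernel = sparsify_step(kernel, rng)
        steps += 1
    return steps


def score(trade_returns, n_days):
    """Hypothetical score: win rate * mean return per trade * trades per day."""
    if not trade_returns or n_days <= 0:
        return 0.0
    win_rate = sum(r > 0 for r in trade_returns) / len(trade_returns)
    expected_return = sum(trade_returns) / len(trade_returns)
    trade_frequency = len(trade_returns) / n_days
    return win_rate * expected_return * trade_frequency


# A kernel with three nonzero entries reaches the zero vector in three steps.
steps = iterations_until_zero([1, -1, 0, 1])
# Four trades over 100 days, three of them profitable.
example_score = score([0.02, -0.01, 0.03, 0.04], n_days=100)
```

Under this assumed rule the iteration count is bounded by the number of nonzero kernel entries, which is one simple way a finite-termination guarantee of the kind stated in the abstract could arise.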

Availability of data and materials
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request and can be downloaded from https://github.com/szy1900/autoAlpha/
Ethics declarations
Conflict of Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix
A. List of abbreviations
The abbreviations used in this paper are summarized in Table 10.
B. List of symbols
The symbols used in this paper are summarized in Table 11.
C. Summary of 28 basic patterns
The names of all 28 basic patterns used as the columns of each input are provided in Table 12. For a given trading day, each basic pattern takes the value 1 or 0, indicating the occurrence or absence of that pattern, respectively.
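As a rough illustration of this input layout, the sketch below builds binary rows with 28 columns and slides a small binary kernel over consecutive trading days. It assumes that 1 marks an occurrence, that a kernel spans a few consecutive days, and that an entry signal fires only when every pattern the kernel selects is present. The helper names `day` and `entry_signals` and the firing rule are hypothetical and are not taken from the paper.

```python
N_PATTERNS = 28  # number of basic-pattern columns in each input row


def day(*active):
    """Build one trading-day row: 1 where a basic pattern occurs, else 0."""
    row = [0] * N_PATTERNS
    for index in active:
        row[index] = 1
    return row


def entry_signals(inputs, kernel):
    """Slide a k-day binary kernel over the daily rows.

    Hypothetical rule: emit 1 for a window when every pattern selected by
    the kernel is present in that window, otherwise emit 0.
    """
    k = len(kernel)
    required = sum(sum(kernel_row) for kernel_row in kernel)
    signals = []
    for t in range(k - 1, len(inputs)):
        window = inputs[t - k + 1 : t + 1]
        response = sum(
            w * x
            for kernel_row, input_row in zip(kernel, window)
            for w, x in zip(kernel_row, input_row)
        )
        signals.append(1 if required > 0 and response == required else 0)
    return signals


# One-day kernel that selects patterns 0 and 5 (indices chosen arbitrarily).
one_day = entry_signals([day(0, 5), day(0), day(0, 5, 7)], [day(0, 5)])
# Two-day kernel: pattern 0 on the first day, then pattern 5 on the next.
two_day = entry_signals([day(0), day(5), day(5)], [day(0), day(5)])
```

A sparse kernel in this sense selects only a few of the 28 patterns, which keeps the resulting rule easy to read as a conjunction of basic patterns.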
About this article
Cite this article
Shen, Z., Mao, X., Yang, X. et al. Mining profitable alpha factors via convolution kernel learning. Appl Intell 53, 28460–28478 (2023). https://doi.org/10.1007/s10489-023-05014-4
DOI: https://doi.org/10.1007/s10489-023-05014-4