Abstract
We use 63 features extracted from sources such as versioning and issue tracking systems to predict defects in short time frames of two months. Our multivariate approach covers aspects of software projects such as size, team structure, process orientation, complexity of existing solution, difficulty of problem, coupling aspects, time constrains, and testing data. We investigate the predictability of several severities of defects in software projects. Are defects with high severity difficult to predict? Are prediction models for defects that are discovered by internal staff similar to models for defects reported from the field?
We present both an exact numerical prediction of future defect numbers based on regression models as well as a classification of software components as defect-prone based on the C4.5 decision tree. We create models to accurately predict short-term defects in a study of 5 applications composed of more than 8.000 classes and 700.000 lines of code. The model quality is assessed based on 10-fold cross validation.
Chapter PDF
Similar content being viewed by others
References
Fenton, N.E., Neil, M.: A critique of software defect prediction models. IEEE Transactions on Software Engineering 25(5), 675–689 (1999)
Knab, P., Pinzger, M., Bernstein, A.: Predicting defect densities in source code files with decision tree learners. In: Proceedings of the International Workshop on Mining Software Repositories, Shanghai, China, May 2006, pp. 119–125. ACM Press, New York (2006)
Nagappan, N., Ball, T.: Use of relative code churn measures to predict system defect density. In: Proceedings of the International Conference on Software Engineering, St. Louis, MO, USA, May 2005, pp. 284–292 (2005)
Ostrand, T.J., Weyuker, E.J.: The distribution of faults in a large industrial software system. In: Proceedings of the International Symposium on Software Testing and Analysis, Rome, Italy, July 2002, pp. 55–64 (2002)
Schröter, A., Zimmermann, T., Zeller, A.: Predicting component failures at design time. In: Proceedings of the International Symposium on Empirical Software Engineering, Rio de Janeiro, Brazil, September 2006, pp. 18–27 (2006)
Wagner, S., Jürjens, J., Koller, C., Trischberger, P.: Comparing Bug Finding Tools with Reviews and Tests. In: Khendek, F., Dssouli, R. (eds.) TestCom 2005. LNCS, vol. 3502, pp. 40–55. Springer, Heidelberg (2005)
Khoshgoftaar, T.M., Yuan, X., Allen, E.B., Jones, W.D., Hudepohl, J.P.: Uncertain classification of fault-prone software modules. Empirical Software Engineering 7(4), 297–318 (2002)
Briand, L.C., Basili, V.R., Thomas, W.M.: A pattern recognition approach for software engineering data analysis. IEEE Transactions on Software Engineering 18(11), 931–942 (1992)
Nikora, A.P., Munson, J.C.: Developing fault predictors for evolving software systems. In: Proceedings of the Software Metrics Symposium, Sydney, Australia, September 2003, pp. 338–350 (2003)
Shirabad, J.S., Lethbridge, T.C., Matwin, S.: Mining the maintenance history of a legacy software system. In: Proceedings of the International Conference on Software Maintenance, Amsterdam, The Netherlands, September 2003, pp. 95–104 (2003)
Fischer, M., Pinzger, M., Gall, H.: Populating a release history database from version control and bug tracking systems. In: Proceedings of the International Conference on Software Maintenance, Amsterdam, Netherlands, September 2003, pp. 23–32. IEEE Computer Society Press, Los Alamitos (2003)
Moeller, K.H., Paulish, D.: An empirical investigation of software fault distribution. In: Proceedings of the International Software Metrics Symposium, pp. 82–90 (1993)
Hatton, L.: Re-examining the fault density-component size connection. IEEE Software 14(2), 89–98 (1997)
Lehman, M.M., Belady, L.A.: Program Evolution - Process of Software Change. Academic Press, London (1985)
Gall, H., Jazayeri, M., Ratzinger (former Krajewski), J.: CVS release history data for detecting logical couplings. In: Proceedings of the International Workshop on Principles of Software Evolution, Lisbon, Portugal, September 2003, pp. 13–23. IEEE Computer Society Press, Los Alamitos (2003)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Ratzinger, J., Fischer, M., Gall, H.: Evolens: Lens-view visualizations of evolution data. In: Proceedings of the International Workshop on Principles of Software Evolution, Lisbon, Portugal, September 2005, pp. 103–112 (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Ratzinger, J., Pinzger, M., Gall, H. (2007). EQ-Mine: Predicting Short-Term Defects for Software Evolution. In: Dwyer, M.B., Lopes, A. (eds) Fundamental Approaches to Software Engineering. FASE 2007. Lecture Notes in Computer Science, vol 4422. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71289-3_3
Download citation
DOI: https://doi.org/10.1007/978-3-540-71289-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71288-6
Online ISBN: 978-3-540-71289-3
eBook Packages: Computer ScienceComputer Science (R0)