Modeling hierarchical usage context for software exceptions based on interaction data

Chen, Hui; Damevski, Kostadin; Shepherd, David; Kraft, Nicholas A.

doi:10.1007/s10515-019-00265-3

Modeling hierarchical usage context for software exceptions based on interaction data

Published: 13 August 2019

Volume 26, pages 733–756, (2019)
Cite this article

Automated Software Engineering Aims and scope Submit manuscript

366 Accesses
Explore all metrics

Abstract

Traces of user interactions with a software system, captured in production, are commonly used as an input source for user experience testing. In this paper, we present an alternative use, introducing a novel approach of modeling user interaction traces enriched with another type of data gathered in production—software fault reports consisting of software exceptions and stack traces. The model described in this paper aims to improve developers’ comprehension of the circumstances surrounding a specific software exception and can highlight specific user behaviors that lead to a high frequency of software faults. Modeling the combination of interaction traces and software crash reports to form an interpretable and useful model is challenging due to the complexity and variance in the combined data source. Therefore, we propose a probabilistic unsupervised learning approach, adapting the nested hierarchical Dirichlet process, which is a Bayesian non-parametric hierarchical topic model originally applied to natural language data. This model infers a tree of topics, each of whom describes a set of commonly co-occurring commands and exceptions. The topic tree can be interpreted hierarchically to aid in categorizing the numerous types of exceptions and interactions. We apply the proposed approach to large scale datasets collected from the ABB RobotStudio software application, and evaluate it both numerically and with a small survey of the RobotStudio developers.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

€32.70 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (France)

Instant access to the full article PDF.

Institutional subscriptions

Fig. 3

Studying software logging using topic models

Article 30 January 2018

Mining Behavioral Patterns for Conformance Diagnostics

Separating passing and failing test executions by clustering anomalies

Article Open access 03 October 2016

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Notes

The Eclipse UDC dataset is a well known source of this type of data in the software engineering community. Available at: http://archive.eclipse.org/projects/usagedata/.
See the Stack Overflow discussion “Is it a good or bad idea throwing Exceptions when validating data?” at https://stackoverflow.com/questions/1504302/is-it-a-good-or-bad-idea-throwing-exceptions-when-validating-data and many other discussions on the subject.

References

Agrawal, A., Fu, W., Menzies, T.: What is wrong with topic modeling? And how to fix it using search-based software engineering. Inf. Softw. Technol. 98, 74–88 (2018)
Article Google Scholar
Arnold, D.C., Ahn, D.H., De Supinski, B.R., Lee, G.L., Miller, B.P., Schulz, M.: Stack trace analysis for large scale debugging. In: 2007 IEEE International Parallel and Distributed Processing Symposium, p. 64. IEEE (2007)
Blei, D.M., Moreno, P.J.: Topic segmentation with an aspect hidden Markov model. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’01, pp. 343–348. ACM, New York, NY, USA (2001). https://doi.org/10.1145/383952.384021
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
MATH Google Scholar
Blei, D.M., Griffiths, T.L., Jordan, M.I.: The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57(2), 7:1–7:30 (2010). https://doi.org/10.1145/1667053.1667056
Article MathSciNet MATH Google Scholar
Cao, L., Fei-Fei, L.: Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8 (2007). https://doi.org/10.1109/ICCV.2007.4408965
Chen, T.H., Thomas, S.W., Hassan, A.E.: A survey on the use of topic models when mining software repositories. Empir. Softw. Eng. 21(5), 1843–1919 (2016). https://doi.org/10.1007/s10664-015-9402-8
Article Google Scholar
Chou, A., Yang, J., Chelf, B., Hallem, S., Engler, D.: An empirical study of operating systems errors. SIGOPS Oper. Syst. Rev. 35(5), 73–88 (2001). https://doi.org/10.1145/502059.502042
Article Google Scholar
Damevski, K., Chen, H., Shepherd, D., Pollock, L.: Interactive exploration of developer interaction traces using a hidden Markov model. In: Proceedings of the 13th International Conference on Mining Software Repositories, MSR ’16, pp. 126–136. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2901739.2901741
Damevski, K., Shepherd, D.C., Schneider, J., Pollock, L.: Mining sequences of developer interactions in visual studio for usage smells. IEEE Trans. Softw. Eng. 43(4), 359–371 (2017). https://doi.org/10.1109/TSE.2016.2592905
Article Google Scholar
Damevski, K., Chen, H., Shepherd, D.C., Kraft, N.A., Pollock, L.: Predicting future developer behavior in the IDE using topic models. IEEE Trans. Softw. Eng. 44(11), 1100–1111 (2018). https://doi.org/10.1109/TSE.2017.2748134
Article Google Scholar
Dang, Y., Wu, R., Zhang, H., Zhang, D., Nobel, P.: Rebucket: a method for clustering duplicate crash reports based on call stack similarity. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp. 1084–1093. IEEE Press, Piscataway, NJ, USA (2012). http://dl.acm.org/citation.cfm?id=2337223.2337364
Davies, S., Roper, M., Wood, M.: Using bug report similarity to enhance bug localisation. In: 2012 19th Working Conference on Reverse Engineering, pp. 125–134 (2012). https://doi.org/10.1109/WCRE.2012.22
Dhaliwal, T., Khomh, F., Zou, Y.: Classifying field crash reports for fixing bugs: a case study of mozilla firefox. In: 2011 27th IEEE International Conference on Software Maintenance (ICSM), pp. 333–342 (2011). https://doi.org/10.1109/ICSM.2011.6080800
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, vol. 2, 3rd edn. Chapman & Hall/CRC, Boca Raton (2014)
MATH Google Scholar
Glerum, K., Kinshumann, K., Greenberg, S., Aul, G., Orgovan, V., Nichols, G., Grant, D., Loihle, G., Hunt, G.: Debugging in the (very) large: ten years of implementation and experience. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pp. 103–116. ACM (2009)
Gu, Z., Barr, E.T., Schleck, D., Su, Z.: Reusing debugging knowledge via trace-based bug search. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’12, pp. 927–942. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2384616.2384684
Han, S., Dang, Y., Ge, S., Zhang, D., Xie, T.: Performance debugging in the large via mining millions of stack traces. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp. 145–155. IEEE Press, Piscataway, NJ, USA (2012). http://dl.acm.org/citation.cfm?id=2337223.2337241
Hindle, A., Barr, E.T., Su, Z., Gabel, M., Devanbu, P.: On the naturalness of software. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp. 837–847. IEEE Press, Piscataway, NJ, USA (2012). https://dl.acm.org/citation.cfm?id=2337322
Kaushik, N., Tahvildari, L.: A comparative study of the performance of IR models on duplicate bug detection. In: 2012 16th European Conference on Software Maintenance and Reengineering, pp. 159–168 (2012). https://doi.org/10.1109/CSMR.2012.78
Li, Z., Tan, L., Wang, X., Lu, S., Zhou, Y., Zhai, C.: Have things changed now?: An empirical study of bug characteristics in modern open source software. In: Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, ASID ’06, pp. 25–33. ACM, New York, NY, USA (2006). https://doi.org/10.1145/1181309.1181314
Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. SIGOPS Oper. Syst. Rev. 42(2), 329–339 (2008). https://doi.org/10.1145/1353535.1346323
Article Google Scholar
Murphy-Hill, E., Jiresal, R., Murphy, G.C.: Improving software developers’ fluency by recommending development environment commands. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE ’12, pp. 42:1–42:11. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2393596.2393645
Nguyen, V.A., Boyd-Graber, J.L., Resnik, P., Chang, J.: Learning a concept hierarchy from multi-labeled documents. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3671–3679. Curran Associates Inc, Red Hook (2014)
Google Scholar
Paisley, J., Wang, C., Blei, D.M., Jordan, M.I.: Nested hierarchical Dirichlet processes. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 256–270 (2015). https://doi.org/10.1109/TPAMI.2014.2318728
Article Google Scholar
Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., De Lucia, A.: How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp. 522–531. IEEE Press, Piscataway, NJ, USA (2013). http://dl.acm.org/citation.cfm?id=2486788.2486857
Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155(2), 945–959 (2000)
Google Scholar
Roos, M., Martins, T.G., Held, L., Rue, H., et al.: Sensitivity analysis for bayesian hierarchical models. Bayesian Anal. 10(2), 321–349 (2015)
Article MathSciNet Google Scholar
Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04, pp. 487–494. AUAI Press, Arlington, Virginia, United States (2004). http://dl.acm.org/citation.cfm?id=1036843.1036902
Snipes, W., Nair, A.R., Murphy-Hill, E.: Experiences Gamifying developer adoption of practices and tools. In: Companion Proceedings of the 36th International Conference on Software Engineering, ICSE Companion 2014, pp. 105–114. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2591062.2591171
Soh, Z., Drioul, T., Rappe, P.A., Khomh, F., Gueheneuc, Y.G., Habra, N.: Noises in interaction traces data and their impact on previous research studies. In: 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 1–10 (2015). https://doi.org/10.1109/ESEM.2015.7321209
Sun, X., Liu, X., Li, B., Duan, Y., Yang, H., Hu, J.: Exploring topic models in software engineering data analysis: a survey. In: 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 357–362 (2016). https://doi.org/10.1109/SNPD.2016.7515925
Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101(476), 1566–1581 (2006). https://doi.org/10.1198/016214506000000302
Article MathSciNet MATH Google Scholar
van der Aalst, W., Weijters, T., Maruster, L.: Workflow mining: discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128–1142 (2004). https://doi.org/10.1109/TKDE.2004.47
Article Google Scholar
Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods for topic models. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pp. 1105–1112. ACM, New York, NY, USA (2009). https://doi.org/10.1145/1553374.1553515
Wang, Y., Agichtein, E., Benzi, M.: TM-LDA: efficient online modeling of latent topic transitions in social media. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp. 123–131. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2339530.2339552
Williams, C.C., Hollingsworth, J.K.: Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng. 31(6), 466–480 (2005). https://doi.org/10.1109/TSE.2005.63
Article Google Scholar
Wu, R., Zhang, H., Cheung, S.C., Kim, S.: Crashlocator: locating crashing faults based on crash stacks. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, pp. 204–214. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2610384.2610386
Yin, Z., Caesar, M., Zhou, Y.: Towards understanding bugs in open source router software. SIGCOMM Comput. Commun. Rev. 40(3), 34–40 (2010). https://doi.org/10.1145/1823844.1823849
Article Google Scholar

Download references

Acknowledgements

The authors would like to thank the RobotStudio team at ABB Inc for providing the interaction dataset and responding to the survey. The authors are also grateful to the anonymous reviewers’ constructive comments.

Author information

Authors and Affiliations

Department of Computer and Information Science, Brooklyn College of the City University of New York, Brooklyn, NY, 11210, USA
Hui Chen
Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
Kostadin Damevski
ABB Corporate Research, Raleigh, NC, 27606, USA
David Shepherd & Nicholas A. Kraft

Authors

Hui Chen
View author publications
You can also search for this author in PubMed Google Scholar
Kostadin Damevski
View author publications
You can also search for this author in PubMed Google Scholar
David Shepherd
View author publications
You can also search for this author in PubMed Google Scholar
Nicholas A. Kraft
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hui Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chen, H., Damevski, K., Shepherd, D. et al. Modeling hierarchical usage context for software exceptions based on interaction data. Autom Softw Eng 26, 733–756 (2019). https://doi.org/10.1007/s10515-019-00265-3

Download citation

Received: 11 May 2018
Accepted: 02 August 2019
Published: 13 August 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10515-019-00265-3

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

€32.70 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (France)

Instant access to the full article PDF.

Institutional subscriptions

Modeling hierarchical usage context for software exceptions based on interaction data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Studying software logging using topic models

Mining Behavioral Patterns for Conformance Diagnostics

Separating passing and failing test executions by clustering anomalies

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Modeling hierarchical usage context for software exceptions based on interaction data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Studying software logging using topic models

Mining Behavioral Patterns for Conformance Diagnostics

Separating passing and failing test executions by clustering anomalies

Explore related subjects

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation