Abstract
Transactional memory, which provides optimistic concurrency control for shared-memory parallel programs, has attracted increasing attention in recent years. Its rapid development and wide adoption make this programming paradigm promising for achieving breakthroughs in massively parallel computing. A large body of work on transactional memory systems aims to provide relatively simple and intuitive synchronization constructs for shared-memory parallel programs without sacrificing performance. Hardware transactional memory (HTM) has become commercially available in mainstream processors. However, inherent architectural limitations such as cache overflows, context switches, and hardware and software exceptions can abort hardware transactions, so current HTM systems are best-effort: they do not guarantee that a transaction will eventually commit, which necessitates a software fallback path to ensure forward progress. In this paper, we survey state-of-the-art software-side optimizations for best-effort hardware transactional memory systems, as well as several novel performance tuning techniques. Research efforts on the joint use of HTM and non-volatile memory (NVM) are also discussed.
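The best-effort pattern described above (attempt the hardware path a bounded number of times, then take a software fallback) can be illustrated with a small simulation. This is only a sketch under stated assumptions: Python has no access to HTM, so `try_hardware`, `SimulatedAbort`, `BestEffortExecutor`, and the retry budget `MAX_HW_RETRIES` are hypothetical names introduced here for illustration and are not taken from the paper or from any real HTM interface.

```python
import threading

MAX_HW_RETRIES = 3  # assumed retry budget, chosen only for illustration


class SimulatedAbort(Exception):
    """Stands in for a hardware abort (cache overflow, context switch, ...)."""


class BestEffortExecutor:
    def __init__(self, try_hardware):
        # try_hardware(body) is a hypothetical stand-in for a hardware
        # transaction: it either runs body() and returns its result, or
        # raises SimulatedAbort without any visible effect.
        self._try_hardware = try_hardware
        self._fallback_lock = threading.Lock()
        # Counters are for demonstration only and are not thread-safe.
        self.hw_commits = 0
        self.fallback_runs = 0

    def run(self, body):
        for _ in range(MAX_HW_RETRIES):
            # Do not start a speculative attempt while another thread
            # holds the fallback lock; this attempt counts as used.
            if self._fallback_lock.locked():
                continue
            try:
                result = self._try_hardware(body)
            except SimulatedAbort:
                continue  # aborted: retry until the budget is exhausted
            self.hw_commits += 1
            return result
        # Software fallback path: a single global lock guarantees progress.
        with self._fallback_lock:
            self.fallback_runs += 1
            return body()


def always_abort(body):
    # Models a transaction that can never commit in hardware.
    raise SimulatedAbort()


def always_commit(body):
    # Models a transaction that commits on the first attempt.
    return body()


counter = {"value": 0}


def increment():
    counter["value"] += 1
    return counter["value"]
```

With `always_abort`, every call to `run` exhausts the retry budget and completes under the fallback lock. With `always_commit`, the fallback path is never taken. A real implementation would additionally read the fallback lock inside the hardware transaction so that a thread entering the fallback path aborts concurrent speculative attempts.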





Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which improved the quality of the paper. This work is supported by the National High-level Personnel for Defense Technology Program (2017-JCJQ-ZQ-013) and NSF 61902405.
About this article
Cite this article
Wu, Z., Lu, K., Wang, R. et al. A survey on optimizations towards best-effort hardware transactional memory. CCF Trans. HPC 2, 401–414 (2020). https://doi.org/10.1007/s42514-020-00049-2
DOI: https://doi.org/10.1007/s42514-020-00049-2