Abstract
Dependency graph-based concurrency control (DGCC) protocol has been shown to achieve good performance on multi-core in-memory system. DGCCseparates contention resolution from the transaction execution and employs dependency graphs to derive serializable transaction schedules. However, distributed transactions complicate the dependency resolution, and therefore, an effective transaction partitioning strategy is essential to reduce expensive multi-node distributed transactions. During failure recovery, log must be examined from the last checkpoint onward and the affected transactions are re-executed based on the way they are partitioned and executed. Existing approaches treat both transaction management and recovery as two separate problems, even though recovery is dependent on the sequence in which transactions are executed. In this paper, we propose to treat the transaction management and recovery problems as one. We first propose an efficient distributed dependency graph-based concurrency control (DistDGCC) protocol for handling transactions spanning multiple nodes and propose a new novel and efficient logging protocol called dependency logging that also makes use of dependency graphs for efficient logging and recovery. DistDGCCoptimizes the average cost for each distributed transaction by processing transactions in batches. Moreover, it also reduces the effects of thread blocking caused by distributed transactions and consequently improves the runtime performance. Further, dependency logging exploits the same data structure that is used by DistDGCCto reduce the logging overhead, as well as the logical dependency information to improve the recovery parallelism. Extensive experiments are conducted to evaluate the performance of our proposed technique against state-of-the-art techniques. Experimental results show that DistDGCCis efficient and scalable, and dependency logging supports fast recovery with marginal runtime overhead. Hence, the overall system performance is significantly improved as a result.






















Similar content being viewed by others
References
Agrawal, R., Jagadish, H.: Recovery algorithms for database machines with non-volatile main memory. In: Database Machines, pp. 269–285 (1989)
Agrawal, R., Carey, M.J., Livny, M.: Concurrency control performance modeling: alternatives and implications. TODS 12(4), 609–654 (1987)
Arulraj, J., Perron, M., Pavlo, A.: Write-behind logging. PVLDB 10(4), 337–348 (2016)
Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading (1987)
Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: SoCC, pp. 143–154 (2010)
DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems. SIGMOD 14(2), 1–8 (1984)
Diaconu, C., Freedman, C., Ismert, E., Larson, P.A., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL server’s memory-optimized OLTP engine. In: SIGMOD, pp. 1243–1254 (2013)
Eswaran, K.P., Gray, J.N., Lorie, R.A., Traiger, I.L.: The notions of consistency and predicate locks in a database system. Commun. ACM 19(11), 624–633 (1976)
Faleiro, J.M., Abadi, D.J.: Rethinking serializable multiversion concurrency control. PVLDB 8(11), 1190–1201 (2015)
Gray, J.: The transaction concept: virtues and limitations. VLDB 81, 144–154 (1981)
Hagmann, R.B.: Reimplementing the cedar file system using logging and group commit. In: SOSP, pp. 155–162 (1987)
Harizopoulos, S., Abadi, D.J., Madden, S., Stonebraker, M.: OLTP through the looking glass, and what we found there. In: SIGMOD, pp. 981–992 (2008)
Jagadish, H., Silberschatz, A., Sudarshan, S.: Recovering from main-memory lapses. VLDB 93, 391–404 (1993)
Jagadish, H.V., Lieuwen, D., Rastogi, R., Silberschatz, A., Sudarshan, S.: Dali: a high performance main memory storage manager. In: VLDB, pp. 48–59 (1994)
Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. PVLDB 3(1–2), 681–692 (2010)
Kallman, R., Kimura, H., Natkins, J., Pavlo, A., Rasin, A., Zdonik, S., Jones, E.P., Madden, S., Stonebraker, M., Zhang, Y., et al.: H-store: a high-performance, distributed main memory transaction processing system. VLDB 1(2), 1496–1499 (2008)
Kemper, A., Neumann, T.: Hyper: a hybrid OLTP and OLAP main memory database system based on virtual memory snapshots. In: ICDE, pp. 195–206 (2011)
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)
Kim, K., Wang, T., Johnson, R., Pandis, I.: Ermia: fast memory-optimized database system for heterogeneous workloads. In: SIGMOD, pp. 1675–1687 (2016)
Kimura, H.: Foedus: OLTP engine for a thousand cores and NVRAM. In: SIGMOD, pp. 691–706 (2015)
Kimura, H., Graefe, G., Kuno, H.A.: Efficient locking techniques for databases on modern hardware. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, pp. 1–12 (2012)
Kung, H.T., Robinson, J.T.: On optimistic methods for concurrency control. TODS 6(2), 213–226 (1981)
Larson, P.Å., Blanas, S., Diaconu, C., Freedman, C., Patel, J.M., Zwilling, M.: High-performance concurrency control mechanisms for main-memory databases. PVLDB 5(4), 298–309 (2011)
Lehman, T.J., Carey, M.J.: A concurrency control algorithm for memory-resident database systems. In: Foundations of Data Organization and Algorithms, pp. 489–504 (1989)
Li, X., Eich, M.H.: Post-crash log processing for fuzzy checkpointing main memory databases. In: ICDE, pp. 117–124 (1993)
Lomet, D., Tzoumas, K., Zwilling, M.: Implementing performance competitive logical recovery. PVLDB 4(7), 430–439 (2011)
Louie, R.H.Y., Li, Y., Vucetic, B.: Practical physical layer network coding for two-way relay channels: performance analysis and comparison. IEEE Trans. Wirel. Commun. 9(2), 764–777 (2010)
Malviya, N., Weisberg, A., Madden, S., Stonebraker, M.: Rethinking main memory OLTP recovery. In: ICDE, pp. 604–615 (2014)
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: Aries: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. TODS 17(1), 94–162 (1992a)
Mohan, C., Pirahesh, H., Lorie, R.: Efficient and flexible methods for transient versioning of records to avoid locking by read-only transactions. SIGMOD 21(2), 124–133 (1992b)
Mu, S., Cui, Y., Zhang, Y., Lloyd, W., Li, J.: Extracting more concurrency from distributed transactions. In: OSDI, pp. 479–494 (2014)
Neumann, T., Mühlbauer, T., Kemper, A.: Fast serializable multi-version concurrency control for main-memory database systems. In: SIGMOD, pp. 677–689 (2015)
Ongaro, D., Rumble, S.M., Stutsman, R., Ousterhout, J., Rosenblum, M.: Fast crash recovery in ramcloud. In: SOSP, pp. 29–41 (2011)
Pelley, S., Wenisch, T.F., Gold, B.T., Bridge, B.: Storage management in the NVRAM era. PVLDB 7(2), 121–132 (2013)
Pinheiro, E., Weber, W.D., Barroso, L.A.: Failure trends in a large disk drive population. FAST 7, 17–23 (2007)
Ren, K., Thomson, A., Abadi, D.J.: Lightweight locking for main memory database systems. PVLDB 6(2), 145–156 (2012)
Ren, K., Faleiro, J.M., Abadi, D.J.: Design principles for scaling multi-core OLTP under high contention. In: SIGMOD, pp. 1583–1598 (2016)
Roos, F., Lindah, S.: Distribution system component failure rates and repair times—an overview. In: NORDAC, pp. 23–24 (2004)
Schroeder, B., Gibson, G.: A large-scale study of failures in high-performance computing systems. TDSC 7(4), 337–350 (2010)
Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., et al: C-store: a column-oriented dbms. In: VLDB, pp. 553–564 (2005)
Tan, K., Cai, Q., Ooi, B.C., Wong, W., Yao, C., Zhang, H.: In-memory databases: challenges and opportunities from software and hardware perspectives. SIGMOD Rec. 44(2), 35–40 (2015)
Tu, S., Zheng, W., Kohler, E., Liskov, B., Madden, S.: Speedy transactions in multicore in-memory databases. In: SOSP, pp. 18–32 (2013)
Wang, T., Johnson, R.: Scalable logging through emerging non-volatile memory. PVLDB 7(10), 865–876 (2014)
Wu, Y., Chan, C.Y., Tan, K.L.: Transaction healing: scaling optimistic concurrency control on multicores. In: SIGMOD, pp. 1689–1704 (2016)
Yao, C., Agrawal, D., Chen, G., Lin, Q., Ooi, B.C., Wong, W.F., Zhang, M.: Exploiting single-threaded model in multi-core in-memory systems. TKDE 28(10), 2635–2650 (2016a)
Yao, C., Agrawal, D., Chen, G., Ooi, B.C., Wu, S.: Adaptive logging: optimizing logging and recovery costs in distributed in-memory databases. In: SIGMOD, pp. 1119–1134 (2016b)
Yu, X., Bezerra, G., Pavlo, A., Devadas, S., Stonebraker, M.: Staring into the abyss: an evaluation of concurrency control with one thousand cores. PVLDB 8(3), 209–220 (2014)
Yu, X., Pavlo, A., Sanchez, D., Devadas, S.: Tictoc: time traveling optimistic concurrency control. SIGMOD 8, 209–220 (2016)
Yuan, Y., Wang, K., Lee, R., Ding, X., Xing, J., Blanas, S., Zhang, X.: Bcc: reducing false aborts in optimistic concurrency control with low cost for in-memory databases. PVLDB 9(6), 504–515 (2016)
Zhang, H., Chen, G., Ooi, B.C., Tan, K.L., Zhang, M.: In-memory big data management and processing: a survey. TKDE 27(7), 1920–1948 (2015)
Zheng, W., Tu, S., Kohler, E., Liskov, B.: Fast databases with fast durability and recovery through multicore parallelism. In: OSDI, pp. 465–477 (2014)
Acknowledgements
The work in this paper is supported by the National Research Foundation, Prime Ministers Office, Singapore, under its Competitive Research Programme (CRP Award No. NRF-CRP8-2011-08). The work is also in part supported by the Tencent-NUS Collaborative Research Grant. We would like to thank the anonymous reviewers for their insightful feedback that helped improve the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Yao, C., Zhang, M., Lin, Q. et al. Scaling distributed transaction processing and recovery based on dependency logging. The VLDB Journal 27, 347–368 (2018). https://doi.org/10.1007/s00778-018-0500-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-018-0500-2