Abstract
Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This compels the backup to exactly copy the commit order resulting from the primary’s concurrency control. Existing cloned concurrency control protocols guarantee this by limiting the backup’s parallelism. As a result, the primary’s concurrency control executes some workloads with more parallelism than these protocols. In this paper, we prove that this parallelism gap leads to unbounded replication lag, where writes can take arbitrarily long to replicate to the backup, a problem that has led to catastrophic failures in production systems. We then design C5, the first cloned concurrency control protocol to provide bounded replication lag. We implement two versions of C5. Our evaluation in MyRocks, a widely deployed database, demonstrates that C5 provides bounded replication lag. Our evaluation in Cicada, a recent in-memory database, demonstrates that C5 keeps up with even the fastest of primaries.
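To make the link between a parallelism gap and unbounded lag concrete, the following is a minimal back-of-the-envelope sketch, not the paper's proof or model. It assumes constant throughputs (the hypothetical parameters `primary_rate` and `backup_rate`, in writes per second), a backup that applies writes in the primary's commit order, and zero network delay.

```python
def replication_lag(primary_rate, backup_rate, t):
    """Lag in seconds of a write committed on the primary at time t.

    Illustrative assumptions (not taken from the paper): both rates are
    constant, the backup applies writes in commit order, the backlog is
    empty at time 0, and network delay is ignored.
    """
    if primary_rate <= 0 or backup_rate <= 0:
        raise ValueError("rates must be positive")
    if backup_rate >= primary_rate:
        return 0.0  # the backup keeps up, so no backlog accumulates
    # The write committed at time t is number primary_rate * t in commit
    # order; the backup finishes applying it at (primary_rate * t) / backup_rate.
    return primary_rate * t / backup_rate - t


if __name__ == "__main__":
    # With a fixed gap, lag grows linearly with time and never levels off.
    for t in (60, 120, 240):
        print(t, replication_lag(1000, 800, t))
```

Under these assumptions, any sustained gap (`backup_rate < primary_rate`) makes the lag grow in proportion to elapsed time, which is the intuition behind calling the lag unbounded.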
Notes
Some concurrency control protocols allow two transactions to update a row’s cells in parallel [25]. For ease of exposition, we assume they cannot, but rows are not fundamental to our design—C5 could be adapted for finer granularities.
References
C5-Cicada. https://github.com/princeton-sns/c5-cicada-exp (2022)
C5-MyRocks. https://github.com/princeton-sns/c5-myrocks-exp (2022)
Adya, A.: Weak consistency: A generalized theory and optimistic implementations for distributed transactions. Ph.D. thesis, MIT, Cambridge (1999)
Antonopoulos, P., Budovski, A., Diaconu, C., Hernandez Saenz, A., Hu, J., Kodavalla, H., Kossmann, D., Lingam, S., Minhas, U.F., Prakash, N., et al.: Socrates: The new SQL Server in the cloud. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1743–1756. Association for Computing Machinery, Amsterdam, The Netherlands (2019)
Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., O’Neil, P.: A critique of ANSI SQL isolation levels. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1–10. Association for Computing Machinery, San Jose (1995)
Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading (1987)
Cha, S.K., Song, C.: P*TIME: Highly scalable OLTP DBMS for managing update-intensive stream workload. In: Proc. International Conference on Very Large Data Bases (VLDB), pp. 1033–1044. VLDB Endowment, Toronto (2004)
Chandramouli, B., Prasaad, G., Kossmann, D., Levandoski, J., Hunter, J., Barnett, M.: FASTER: A concurrent key-value store with in-place updates. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 275–290. Association for Computing Machinery, Houston (2018)
DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1–8. Association for Computing Machinery, Boston (1984)
Diaconu, C., Freedman, C., Ismert, E., Larson, P.Å., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL Server’s memory-optimized OLTP engine. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1243–1254. Association for Computing Machinery, New York (2013)
Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., Webb, K., Akella, A., Wang, K., Ricart, G., Landweber, L., Elliott, C., Zink, M., Cecchet, E., Kar, S., Mishra, P.: The design and operation of CloudLab. In: Proc. USENIX Annual Technical Conference (ATC), pp. 1–14. USENIX Association, Renton (2019)
Elnikety, S., Pedone, F., Zwaenepoel, W.: Generalized snapshot isolation and a prefix-consistent implementation. Tech. Rep. IC/2004/21, School of Computer and Communication Sciences, EPFL, Lausanne (2004)
Facebook: MyRocks GitHub Wiki (2019). https://github.com/facebook/mysql-5.6/wiki
Facebook: RocksDB: A Persistent Key-value Store (2020). https://rocksdb.org/
Faleiro, J.M., Abadi, D.J.: Rethinking serializable multiversion concurrency control. Proc. Very Large Data Bases Endow. (PVLDB) 8(11), 1190–1201 (2015)
Faleiro, J.M., Abadi, D.J., Hellerstein, J.M.: High performance transactions via early write visibility. Proc. Very Large Data Bases Endow. (PVLDB) 10(5), 613–624 (2017)
Faleiro, J.M., Thomson, A., Abadi, D.J.: Lazy evaluation of transactions in database systems. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 15–26. Association for Computing Machinery, Snowbird (2014)
Gawlick, D., Kinkade, D.: Varieties of concurrency control in IMS/VS fast path. IEEE Data Eng. Bull. 8(2), 3–10 (1985)
GitLab: GitLab.com Database Incident (2017). https://about.gitlab.com/2017/02/01/gitlab-dot-com-database-incident/
GitLab: Postmortem of Database Outage of January 31 (2017). https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/
Google: LevelDB (2020). https://github.com/google/leveldb
Hellerstein, J.M., Stonebraker, M., Hamilton, J.: Architecture of a database system. Found. Trends Databases 1(2), 141–259 (2007)
Helt, J., Sharma, A., Abadi, D.J., Lloyd, W., Faleiro, J.M.: C5: Cloned concurrency control that always keeps up (2022). https://doi.org/10.48550/arXiv.2207.02746. https://arxiv.org/abs/2207.02746
Hong, C., Zhou, D., Yang, M., Kuo, C., Zhang, L., Zhou, L.: KuaFu: Closing the parallelism gap in database replication. In: Proc. IEEE International Conference on Data Engineering (ICDE), pp. 1186–1195. Institute of Electrical and Electronics Engineers, Brisbane (2013)
Huang, Y., Qian, W., Kohler, E., Liskov, B., Shrira, L.: Opportunities for optimism in contended main-memory multicore transactions. Proc. Very Large Data Bases Endow. (PVLDB) 13(5), 629–642 (2020)
Instagram: Instagration Part 2: Scaling Our Infrastructure To Multiple Data Centers (2015). https://instagram-engineering.com/instagration-pt-2-scaling-our-infrastructure-to-multiple-data-centers-5745cbad7834
Intel: Intel Optane Persistent Memory (2020). https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html
Intel: Intel Server Products (2020). https://www.intel.com/content/www/us/en/products/servers.html
Jeffrey, M.C., Ying, V.A., Subramanian, S., Lee, H.R., Emer, J., Sanchez, D.: Harmonizing speculative and non-speculative execution in architectures for ordered parallelism. In: Proc. IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 217–230. Institute of Electrical and Electronics Engineers, Fukuoka City (2018)
Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. Proc. Very Large Data Bases Endow. (PVLDB) 3(1–2), 681–692 (2010)
Kim, K., Wang, T., Johnson, R., Pandis, I.: ERMIA: Fast memory-optimized database system for heterogeneous workloads. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1675–1687. Association for Computing Machinery, San Francisco (2016)
King, R.P., Halim, N., Garcia-Molina, H., Polyzois, C.A.: Management of a remote backup copy for disaster recovery. ACM Trans. Database Syst. (TODS) 16(2), 338–368 (1991)
Larson, P.Å., Blanas, S., Diaconu, C., Freedman, C., Patel, J.M., Zwilling, M.: High-performance concurrency control mechanisms for main-memory databases. Proc. Very Large Data Bases Endow. (PVLDB) 5(4), 298–309 (2011)
Lee, J., Kim, K., Cha, S.K.: Differential logging: A commutative and associative logging scheme for highly parallel main memory databases. In: Proc. IEEE International Conference on Data Engineering (ICDE), pp. 173–182. Institute of Electrical and Electronics Engineers, Heidelberg (2001)
Levandoski, J., Lomet, D., Sengupta, S., Stutsman, R., Wang, R.: High performance transactions in Deuteronomy. In: Proc. Conference on Innovative Data Systems Research (CIDR), pp. 1–12. cidrdb.org, Asilomar (2015)
Lim, H., Kaminsky, M., Andersen, D.G.: Cicada: Dependably fast multi-core in-memory transactions. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 21–35. Association for Computing Machinery, Chicago (2017)
Linux: libhugetlbfs: preload library to back text, data, malloc() or shared memory with hugepages (2020). https://linux.die.net/man/7/libhugetlbfs
Lu, H., Veeraraghavan, K., Ajoux, P., Hunt, J., Song, Y.J., Tobagus, W., Kumar, S., Lloyd, W.: Existential consistency: Measuring and understanding consistency at Facebook. In: Proc. ACM Symposium on Operating Systems Principles (SOSP), pp. 295–310. Association for Computing Machinery, Monterey (2015)
MariaDB: MariaDB 10 Parallel Replication (2017). https://mariadb.com/kb/en/parallel-replication/
Matsunobu, Y.: Making slave pre-fetching work better with SSD. https://yoshinorimatsunobu.blogspot.com/2011/10/making-slave-pre-fetching-work-better.html/ (2011)
Minhas, U.F., Rajagopalan, S., Cully, B., Aboulnaga, A., Salem, K., Warfield, A.: RemusDB: transparent high availability for database systems. VLDB J. 22(1), 29–45 (2013)
Mituzas, D.: On MySQL replication prefetching. https://dom.as/2011/12/03/replication-prefetching/ (2011)
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. (TODS) 17(1), 94–162 (1992)
Mohan, C., Treiber, K., Obermarck, R.: Algorithms for the management of remote backup data bases for disaster recovery. In: Proc. IEEE International Conference on Data Engineering (ICDE), pp. 511–518. Institute of Electrical and Electronics Engineers, Vienna (1993)
MySQL: Improving The Parallel Applier With Writeset-based Dependency Tracking (2017). https://mysqlhighavailability.com/improving-the-parallel-applier-with-writeset-based-dependency-tracking/
MySQL: Group Commit of Binary Log (2019). https://dev.mysql.com/worklog/task/?id=5223
MySQL: InnoDB (2019). https://dev.mysql.com/doc/refman/5.6/en/innodb-storage-engine.html
MySQL: MySQL 5.6 Reference Manual (2020). https://dev.mysql.com/doc/refman/5.6/en/
Narula, N., Cutler, C., Kohler, E., Morris, R.: Phase reconciliation for contended in-memory transactions. In: Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 511–524. USENIX Association, Broomfield (2014)
Nielsen, K.: Bug #74177 --slave-preserve-commit-order causes slave to deadlock and break for some queries. https://bugs.mysql.com/bug.php?id=74177 (2014)
Oracle: Method of applying changes to a standby database system (2001). Patent No. US6980988B1, Filed Oct. 1, 2001, Issued Dec. 27, 2005
Oracle: Eager replication of uncommitted transactions (2014). Patent No. US9747356B2, Filed Jan. 23, 2014, Issued Aug. 29, 2017
Oracle: Oracle 19 Database Administrator’s Guide (2020). https://docs.oracle.com/en/database/oracle/oracle-database/19/admin/index.html
Oracle: Oracle Active Data Guard (2020). https://www.oracle.com/technetwork/database/availability/dg-adg-technical-overview-wp-5347548.pdf
Oracle: Understanding Oracle GoldenGate (2020). https://docs.oracle.com/en/middleware/goldengate/core/19.1/
Papadimitriou, C.H.: The serializability of concurrent database updates. J. ACM (JACM) 26(4), 631–653 (1979)
PostgreSQL: PostgreSQL 12.1 Documentation (2020). https://www.postgresql.org/docs/12/index.html
PostgreSQL: Snapshot Synchronization Functions (2020). https://www.postgresql.org/docs/current/functions-admin.html
Qin, D., Brown, A.D., Goel, A.: Scalable replay-based replication for fast databases. Proc. Very Large Data Bases Endow. (PVLDB) 10(13), 2025–2036 (2017)
Qin, D., Brown, A.D., Goel, A.: Caracal: Contention management with deterministic concurrency control. In: Proc. ACM Symposium on Operating Systems Principles (SOSP), pp. 180–194. Association for Computing Machinery, Virtual Event (2021)
Schwalb, D., Faust, M., Wust, J., Grund, M., Plattner, H.: Efficient transaction processing for Hyrise in mixed workload environments. In: Proc. International Workshop on In Memory Data Management and Analytics (IMDM), pp. 16–29. Springer, Hangzhou (2014)
Sharma, Y., Ajoux, P., Ang, P., Callies, D., Choudhary, A., Demailly, L., Fersch, T., Guz, L.A., Kotulski, A., Kulkarni, S., Kumar, S., Li, H., Li, J., Makeev, E., Prakasam, K., Renesse, R.V., Roy, S., Seth, P., Song, Y.J., Wester, B., Veeraraghavan, K., Xie, P.: Wormhole: Reliable pub-sub to support geo-replicated internet services. In: Proc. USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 351–366. USENIX Association, Oakland (2015)
Terry, D.B., Demers, A.J., Petersen, K., Spreitzer, M.J., Theimer, M.M., Welch, B.B.: Session guarantees for weakly consistent replicated data. In: Proc. International Conference on Parallel and Distributed Information Systems (PDIS), pp. 140–149. Institute of Electrical and Electronics Engineers, Austin (1994)
Thomson, A., Abadi, D.J.: The case for determinism in database systems. Proc. Very Large Data Bases Endow. (PVLDB) 3(1–2), 70–80 (2010)
Thomson, A., Diamond, T., Weng, S.C., Ren, K., Shao, P., Abadi, D.J.: Calvin: Fast distributed transactions for partitioned database systems. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1–12. Association for Computing Machinery, Scottsdale (2012)
Transaction Processing Performance Council: TPC Benchmark C revision 5.11 (2010)
Tu, S., Zheng, W., Kohler, E., Liskov, B., Madden, S.: Speedy transactions in multicore in-memory databases. In: Proc. ACM Symposium on Operating Systems Principles (SOSP), pp. 18–32. Association for Computing Machinery, Farmington (2013)
Verbitski, A., Gupta, A., Saha, D., Corey, J., Gupta, K., Brahmadesam, M., Mittal, R., Krishnamurthy, S., Maurice, S., Kharatishvilli, T., Bao, X.: Amazon Aurora: On avoiding distributed consensus for I/Os, commits, and membership changes. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 789–796. Association for Computing Machinery, Houston (2018)
Wang, T., Johnson, R., Pandis, I.: Query fresh: Log shipping on steroids. Proc. Very Large Data Bases Endow. (PVLDB) 11(4), 406–419 (2017)
Wang, T., Kimura, H.: Mostly-optimistic concurrency control for highly contended dynamic workloads on a thousand cores. Proc. Very Large Data Bases Endow. (PVLDB) 10(2), 49–60 (2016)
Whitney, A., Shasha, D., Apter, S.: High volume transaction processing without concurrency control, two phase commit, SQL or C++. In: International Workshop on High Performance Transaction Systems, pp. 211–217. Springer, Asilomar (1997)
Yan, C., Cheung, A.: Leveraging lock contention to improve OLTP application performance. Proc. Very Large Data Bases Endow. (PVLDB) 9(5), 444–455 (2016)
Zamanian, E., Yu, X., Stonebraker, M., Kraska, T.: Rethinking database high availability with RDMA networks. Proc. Very Large Data Bases Endow. (PVLDB) 12(11), 1637–1650 (2019)
Zheng, W., Tu, S., Kohler, E., Liskov, B.: Fast databases with fast durability and recovery through multicore parallelism. In: Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 465–477. USENIX Association, Broomfield (2014)
Acknowledgements
We thank the anonymous reviewers and our shepherd for their helpful comments and feedback. We are also grateful to Princeton’s systems group for their comments on earlier versions of this paper. This work was supported by the National Science Foundation under grants CNS-1824130 and IIS-1910613.
Cite this article
Helt, J., Sharma, A., Abadi, D.J. et al. C5: cloned concurrency control that always keeps up. The VLDB Journal 34, 24 (2025). https://doi.org/10.1007/s00778-025-00901-3
DOI: https://doi.org/10.1007/s00778-025-00901-3
