C5: cloned concurrency control that always keeps up

Jeffrey Helt ORCID: orcid.org/0000-0003-1192-7111¹,
Abhinav Sharma²,
Daniel J. Abadi³,
Wyatt Lloyd¹ &
Jose M. Faleiro⁴


Abstract

Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This compels the backup to exactly copy the commit order resulting from the primary’s concurrency control. Existing cloned concurrency control protocols guarantee this by limiting the backup’s parallelism. As a result, the primary’s concurrency control executes some workloads with more parallelism than these protocols. In this paper, we prove that this parallelism gap leads to unbounded replication lag, in which writes can take arbitrarily long to replicate to the backup; such lag has led to catastrophic failures in production systems. We then design C5, the first cloned concurrency control protocol to provide bounded replication lag. We implement two versions of C5. Our evaluation in MyRocks, a widely deployed database, demonstrates that C5 provides bounded replication lag. Our evaluation in Cicada, a recent in-memory database, demonstrates that C5 keeps up with even the fastest of primaries.
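
As a rough illustration of why a parallelism gap makes replication lag unbounded, consider a back-of-the-envelope model. This is not the proof from the paper. It assumes the primary commits writes at a steady rate and the backup applies them at a lower steady rate; the function names and the constant-rate assumption are hypothetical and chosen only for illustration.

```python
def backlog_after(seconds, primary_rate, backup_rate):
    """Writes still waiting to be applied on the backup after `seconds`.

    Rates are in writes per second. Assumes (for illustration only) that
    both rates are constant and the backup starts with an empty backlog.
    """
    return max(0.0, (primary_rate - backup_rate) * seconds)


def lag_seconds(seconds, primary_rate, backup_rate):
    """Time the backup would need to drain its backlog if the primary stopped now."""
    return backlog_after(seconds, primary_rate, backup_rate) / backup_rate


if __name__ == "__main__":
    # The backlog grows linearly with elapsed time whenever the backup is slower.
    for elapsed in (10, 100, 1000):
        print(elapsed, backlog_after(elapsed, 1000, 800), lag_seconds(elapsed, 1000, 800))
```

Under this model the lag grows without bound for as long as the primary sustains the higher rate, and it stays at zero only when the backup can apply writes at least as fast as the primary commits them.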


Notes

Some concurrency control protocols allow two transactions to update a row’s cells in parallel [25]. For ease of exposition, we assume they cannot, but rows are not fundamental to our design—C5 could be adapted for finer granularities.
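
The row-granularity assumption in this note can be pictured with a minimal sketch. It is not the C5 algorithm: it shows only that writes to the same row are applied in the primary's commit order while writes to different rows may proceed in parallel, and it omits the snapshot machinery a backup needs to serve read-only transactions. The log format `(commit_seq, row, value)` and the name `apply_log` are assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_log(log, workers=4):
    """Apply (commit_seq, row, value) writes, keeping same-row writes in commit order."""
    # Group writes by row, preserving the primary's commit order within each row.
    per_row = defaultdict(list)
    for commit_seq, row, value in sorted(log):
        per_row[row].append(value)

    table = {}

    def apply_row(row, values):
        # One task owns a row, so its writes are applied sequentially.
        for value in values:
            table[row] = value

    # Different rows are independent here, so their tasks can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_row, row, values) for row, values in per_row.items()]
        for future in futures:
            future.result()  # surface any exception raised by a worker
    return table


if __name__ == "__main__":
    print(apply_log([(1, "a", 1), (2, "b", 2), (3, "a", 3)]))
```

A finer granularity, such as individual cells, would change only the key used for grouping in this sketch.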

References

C5-Cicada. https://github.com/princeton-sns/c5-cicada-exp (2022)
C5-MyRocks. https://github.com/princeton-sns/c5-myrocks-exp (2022)
Adya, A.: Weak consistency: A generalized theory and optimistic implementations for distributed transactions. Ph.D. thesis, MIT, Cambridge (1999)
Antonopoulos, P., Budovski, A., Diaconu, C., Hernandez Saenz, A., Hu, J., Kodavalla, H., Kossmann, D., Lingam, S., Minhas, U.F., Prakash, N., et al.: Socrates: The new SQL server in the cloud. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1743–1756. Association for Computing Machinery, Amsterdam, The Netherlands (2019)
Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., O’Neil, P.: A critique of ANSI SQL isolation levels. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1–10. Association for Computing Machinery, San Jose (1995)
Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading (1987)
Cha, S.K., Song, C.: P*TIME: Highly scalable OLTP DBMS for managing update-intensive stream workload. In: Proc. International Conference on Very Large Data Bases (VLDB), pp. 1033–1044. VLDB Endowment, Toronto (2004)
Chandramouli, B., Prasaad, G., Kossmann, D., Levandoski, J., Hunter, J., Barnett, M.: FASTER: A concurrent key-value store with in-place updates. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 275–290. Association for Computing Machinery, Houston (2018)
DeWitt, D.J., Katz, R.H., Olken, F., Shapiro, L.D., Stonebraker, M.R., Wood, D.A.: Implementation techniques for main memory database systems. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1–8. Association for Computing Machinery, Boston (1984)
Diaconu, C., Freedman, C., Ismert, E., Larson, P.A., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL server’s memory-optimized OLTP engine. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1243–1254. Association for Computing Machinery, New York (2013)
Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., Webb, K., Akella, A., Wang, K., Ricart, G., Landweber, L., Elliott, C., Zink, M., Cecchet, E., Kar, S., Mishra, P.: The design and operation of CloudLab. In: Proc. USENIX Annual Technical Conference (ATC), pp. 1–14. USENIX Association, Renton (2019)
Elnikety, S., Pedone, F., Zwaenepoel, W.: Generalized snapshot isolation and a prefix-consistent implementation. Tech. Rep. IC/2004/21, School of Computer and Communication Sciences, EPFL, Lausanne (2004)
Facebook: MyRocks GitHub Wiki (2019). https://github.com/facebook/mysql-5.6/wiki
Facebook: RocksDB: A Persistent Key-value Store (2020). https://rocksdb.org/
Faleiro, J.M., Abadi, D.J.: Rethinking serializable multiversion concurrency control. Proc. Very Large Data Bases Endow. (PVLDB) 8(11), 1190–1201 (2015)
Faleiro, J.M., Abadi, D.J., Hellerstein, J.M.: High performance transactions via early write visibility. Proc. Very Large Data Bases Endow. (PVLDB) 10(5), 613–624 (2017)
Faleiro, J.M., Thomson, A., Abadi, D.J.: Lazy evaluation of transactions in database systems. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 15–26. Association for Computing Machinery, Snowbird (2014)
Gawlick, D., Kinkade, D.: Varieties of concurrency control in IMS/VS fast path. IEEE Data Eng. Bull. 8(2), 3–10 (1985)
GitLab: GitLab.com Database Incident (2017). https://about.gitlab.com/2017/02/01/gitlab-dot-com-database-incident/
GitLab: Postmortem of Database Outage of January 31 (2017). https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/
Google: LevelDB (2020). https://github.com/google/leveldb
Hellerstein, J.M., Stonebraker, M., Hamilton, J.: Architecture of a database system. Found. Trends Databases 1(2), 141–259 (2007)
Helt, J., Sharma, A., Abadi, D.J., Lloyd, W., Faleiro, J.M.: C5: Cloned concurrency control that always keeps up (2022). https://doi.org/10.48550/arXiv.2207.02746. https://arxiv.org/abs/2207.02746
Hong, C., Zhou, D., Yang, M., Kuo, C., Zhang, L., Zhou, L.: KuaFu: Closing the parallelism gap in database replication. In: Proc. IEEE International Conference on Data Engineering (ICDE), pp. 1186–1195. Institute of Electrical and Electronics Engineers, Brisbane (2013)
Huang, Y., Qian, W., Kohler, E., Liskov, B., Shrira, L.: Opportunities for optimism in contended main-memory multicore transactions. Proc. Very Large Data Bases Endow. (PVLDB) 13(5), 629–642 (2020)
Instagram: Instagration Part 2: Scaling Our Infrastructure To Multiple Data Centers (2015). https://instagram-engineering.com/instagration-pt-2-scaling-our-infrastructure-to-multiple-data-centers-5745cbad7834
Intel: Intel Optane Persistent Memory (2020). https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html
Intel: Intel Server Products (2020). https://www.intel.com/content/www/us/en/products/servers.html
Jeffrey, M.C., Ying, V.A., Subramanian, S., Lee, H.R., Emer, J., Sanchez, D.: Harmonizing speculative and non-speculative execution in architectures for ordered parallelism. In: Proc. IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 217–230. Institute of Electrical and Electronics Engineers, Fukuoka City (2018)
Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. Proc. Very Large Data Bases Endow. (PVLDB) 3(1–2), 681–692 (2010)
Kim, K., Wang, T., Johnson, R., Pandis, I.: ERMIA: Fast memory-optimized database system for heterogeneous workloads. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1675–1687. Association for Computing Machinery, San Francisco (2016)
King, R.P., Halim, N., Garcia-Molina, H., Polyzois, C.A.: Management of a remote backup copy for disaster recovery. ACM Trans. Database Syst. (TODS) 16(2), 338–368 (1991)
Larson, P.A., Blanas, S., Diaconu, C., Freedman, C., Patel, J.M., Zwilling, M.: High-performance concurrency control mechanisms for main-memory databases. Proc. Very Large Data Bases Endow. (PVLDB) 5(4), 298–309 (2011)
Lee, J., Kim, K., Cha, S.K.: Differential logging: A commutative and associative logging scheme for highly parallel main memory databases. In: Proc. IEEE International Conference on Data Engineering (ICDE), pp. 173–182. Institute of Electrical and Electronics Engineers, Heidelberg (2001)
Levandoski, J., Lomet, D., Sengupta, S., Stutsman, R., Wang, R.: High performance transactions in Deuteronomy. In: Proc. Conference on Innovative Data Systems Research (CIDR), pp. 1–12. cidrdb.org, Asilomar (2015)
Lim, H., Kaminsky, M., Andersen, D.G.: Cicada: Dependably fast multi-core in-memory transactions. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 21–35. Association for Computing Machinery, Chicago (2017)
Linux: libhugetlbfs: preload library to back text, data, malloc() or shared memory with hugepages (2020). https://linux.die.net/man/7/libhugetlbfs
Lu, H., Veeraraghavan, K., Ajoux, P., Hunt, J., Song, Y.J., Tobagus, W., Kumar, S., Lloyd, W.: Existential consistency: Measuring and understanding consistency at facebook. In: Proc. ACM Symposium on Operating Systems Principles (SOSP), pp. 295–310. Association for Computing Machinery, Monterey (2015)
MariaDB: MariaDB 10 Parallel Replication (2017). https://mariadb.com/kb/en/parallel-replication/
Matsunobu, Y.: Making slave pre-fetching work better with SSD. https://yoshinorimatsunobu.blogspot.com/2011/10/making-slave-pre-fetching-work-better.html/ (2011)
Minhas, U.F., Rajagopalan, S., Cully, B., Aboulnaga, A., Salem, K., Warfield, A.: RemusDB: transparent high availability for database systems. VLDB J. 22(1), 29–45 (2013)
Mituzas, D.: On MySQL replication prefetching. https://dom.as/2011/12/03/replication-prefetching/ (2011)
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. (TODS) 17(1), 94–162 (1992)
Mohan, C., Treiber, K., Obermarck, R.: Algorithms for the management of remote backup data bases for disaster recovery. In: Proc. IEEE International Conference on Data Engineering (ICDE), pp. 511–518. Institute of Electrical and Electronics Engineers, Vienna (1993)
MySQL: Improving The Parallel Applier With Writeset-based Dependency Tracking (2017). https://mysqlhighavailability.com/improving-the-parallel-applier-with-writeset-based-dependency-tracking/
MySQL: Group Commit of Binary Log (2019). https://dev.mysql.com/worklog/task/?id=5223
MySQL: InnoDB (2019). https://dev.mysql.com/doc/refman/5.6/en/innodb-storage-engine.html
MySQL: MySQL 5.6 Reference Manual (2020). https://dev.mysql.com/doc/refman/5.6/en/
Narula, N., Cutler, C., Kohler, E., Morris, R.: Phase reconciliation for contended in-memory transactions. In: Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 511–524. USENIX Association, Broomfield (2014)
Nielsen, K.: Bug #74177: --slave-preserve-commit-order causes slave to deadlock and break for some queries. https://bugs.mysql.com/bug.php?id=74177 (2014)
Oracle: Method of applying changes to a standby database system (2001). Patent No. US6980988B1, Filed Oct. 1, 2001, Issued Dec. 27, 2005
Oracle: Eager replication of uncommitted transactions (2014). Patent No. US9747356B2, Filed Jan. 23, 2014, Issued Aug. 29, 2017
Oracle: Oracle 19 Database Administrator’s Guide (2020). https://docs.oracle.com/en/database/oracle/oracle-database/19/admin/index.html
Oracle: Oracle Active Data Guard (2020). https://www.oracle.com/technetwork/database/availability/dg-adg-technical-overview-wp-5347548.pdf
Oracle: Understanding Oracle GoldenGate (2020). https://docs.oracle.com/en/middleware/goldengate/core/19.1/
Papadimitriou, C.H.: The serializability of concurrent database updates. J. ACM (JACM) 26(4), 631–653 (1979)
PostgreSQL: PostgreSQL 12.1 Documentation (2020). https://www.postgresql.org/docs/12/index.html
PostgreSQL: Snapshot Synchronization Functions (2020). https://www.postgresql.org/docs/current/functions-admin.html
Qin, D., Brown, A.D., Goel, A.: Scalable replay-based replication for fast databases. Proc. Very Large Data Bases Endow. (PVLDB) 10(13), 2025–2036 (2017)
Qin, D., Brown, A.D., Goel, A.: Caracal: Contention management with deterministic concurrency control. In: Proc. ACM Symposium on Operating Systems Principles (SOSP), pp. 180–194. Association for Computing Machinery, Virtual Event (2021)
Schwalb, D., Faust, M., Wust, J., Grund, M., Plattner, H.: Efficient transaction processing for Hyrise in mixed workload environments. In: Proc. International Workshop on In Memory Data Management and Analytics (IMDM), pp. 16–29. Springer, Hangzhou (2014)
Sharma, Y., Ajoux, P., Ang, P., Callies, D., Choudhary, A., Demailly, L., Fersch, T., Guz, L.A., Kotulski, A., Kulkarni, S., Kumar, S., Li, H., Li, J., Makeev, E., Prakasam, K., Renesse, R.V., Roy, S., Seth, P., Song, Y.J., Wester, B., Veeraraghavan, K., Xie, P.: Wormhole: Reliable pub-sub to support geo-replicated internet services. In: Proc. USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 351–366. USENIX Association, Oakland (2015)
Terry, D.B., Demers, A.J., Petersen, K., Spreitzer, M.J., Theimer, M.M., Welch, B.B.: Session guarantees for weakly consistent replicated data. In: Proc. International Conference on Parallel and Distributed Information Systems (PDIS), pp. 140–149. Institute of Electrical and Electronics Engineers, Austin (1994)
Thomson, A., Abadi, D.J.: The case for determinism in database systems. Proc. Very Large Data Bases Endow. (PVLDB) 3(1–2), 70–80 (2010)
Thomson, A., Diamond, T., Weng, S.C., Ren, K., Shao, P., Abadi, D.J.: Calvin: Fast distributed transactions for partitioned database systems. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 1–12. Association for Computing Machinery, Scottsdale (2012)
Transaction Processing Performance Council: TPC Benchmark C, revision 5.11 (2010)
Tu, S., Zheng, W., Kohler, E., Liskov, B., Madden, S.: Speedy transactions in multicore in-memory databases. In: Proc. ACM Symposium on Operating Systems Principles (SOSP), pp. 18–32. Association for Computing Machinery, Farmington (2013)
Verbitski, A., Gupta, A., Saha, D., Corey, J., Gupta, K., Brahmadesam, M., Mittal, R., Krishnamurthy, S., Maurice, S., Kharatishvilli, T., Bao, X.: Amazon Aurora: On avoiding distributed consensus for I/Os, commits, and membership changes. In: Proc. ACM International Conference on Management of Data (SIGMOD), pp. 789–796. Association for Computing Machinery, Houston (2018)
Wang, T., Johnson, R., Pandis, I.: Query fresh: Log shipping on steroids. Proc. Very Large Data Bases Endow. (PVLDB) 11(4), 406–419 (2017)
Wang, T., Kimura, H.: Mostly-optimistic concurrency control for highly contended dynamic workloads on a thousand cores. Proc. Very Large Data Bases Endow. (PVLDB) 10(2), 49–60 (2016)
Whitney, A., Shasha, D., Apter, S.: High volume transaction processing without concurrency control, two phase commit, SQL or C++. In: International Workshop on High Performance Transaction Systems, pp. 211–217. Springer, Asilomar (1997)
Yan, C., Cheung, A.: Leveraging lock contention to improve OLTP application performance. Proc. Very Large Data Bases Endow. (PVLDB) 9(5), 444–455 (2016)
Zamanian, E., Yu, X., Stonebraker, M., Kraska, T.: Rethinking database high availability with RDMA networks. Proc. Very Large Data Bases Endow. (PVLDB) 12(11), 1637–1650 (2019)
Zheng, W., Tu, S., Kohler, E., Liskov, B.: Fast databases with fast durability and recovery through multicore parallelism. In: Proc. USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 465–477. USENIX Association, Broomfield (2014)


Acknowledgements

We thank the anonymous reviewers and our shepherd for their helpful comments and feedback. We are also grateful to Princeton’s systems group for their comments on earlier versions of this paper. This work was supported by the National Science Foundation under grants CNS-1824130 and IIS-1910613.

Author information

Authors and Affiliations

Princeton University, Princeton, New Jersey, USA
Jeffrey Helt & Wyatt Lloyd
Meta Platforms, Menlo Park, California, USA
Abhinav Sharma
University of Maryland, College Park, Maryland, USA
Daniel J. Abadi
San Francisco, California, USA
Jose M. Faleiro

Corresponding author

Correspondence to Jeffrey Helt.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Helt, J., Sharma, A., Abadi, D.J. et al. C5: cloned concurrency control that always keeps up. The VLDB Journal 34, 24 (2025). https://doi.org/10.1007/s00778-025-00901-3

Received: 20 September 2024
Revised: 22 December 2024
Accepted: 16 January 2025
Published: 12 February 2025
Version of record: 12 February 2025
DOI: https://doi.org/10.1007/s00778-025-00901-3
