Concurrency Control with Java and Relational Databases
Sérgio Soares and Paulo Borba
Informatics Center, Federal University of Pernambuco, Recife, PE, Brazil
{scbs,phmb}@cin.ufpe.br
Abstract

Because web-based information systems usually run in a concurrent environment, they are significantly more complex to implement and test. It is therefore useful to have guidelines for introducing concurrency control that avoid ad hoc control strategies, which may have a negative impact on efficiency and may not guarantee system safety. This paper defines guidelines for concurrency control in web-based information systems implemented in Java with relational databases. In particular, we show where Java and relational database concurrency control mechanisms should be used in order to implement our concurrency control strategy. Additionally, we analyze the performance of different concurrency control approaches. The main point of the guidelines is to guarantee system correctness without redundant concurrency control, both increasing performance and guaranteeing safety.

1. Introduction

Because web-based information systems usually run in a concurrent environment, the complexity of implementing and testing them increases significantly. Subtle implementation errors can appear and are usually difficult to detect and locate. This indicates that we need adequate ways to implement concurrent programs. In particular, it is useful to have guidelines for introducing concurrency control that avoid ad hoc control strategies, which may hurt efficiency by introducing redundant controls. Moreover, ad hoc control strategies may not guarantee system safety, and may add new race conditions that lead to invalid executions.

To avoid those problems, the guidelines presented here guarantee safety while standardizing the control strategies, favoring system extensibility, and improving maintainability. In contrast with general, well-known design patterns for concurrency control [11, 14], our guidelines are more effective for developing web-based systems because they are tailored to a specific architecture widely used to develop this kind of system. They also assume the use of relational databases for implementing persistence, and take the databases' concurrency control support into account. We also analyze the performance of several concurrency controls, identifying the most efficient ones.

Since our architecture demands the manipulation of both Java [10] objects and database tables, we must use programming language concurrency control mechanisms over the objects. Our guidelines use the concurrency control mechanisms of both Java and relational databases, which are widely used for implementing web-based information systems. The guidelines indicate where each mechanism should be used in order to guarantee system correctness without redundant and expensive concurrency control. Programmers therefore know which concurrency controls must be implemented by the database mechanisms and which ones must be implemented with programming language features. A pessimistic approach could leave all concurrency control to the database management system. However, such an approach does not manipulate objects, and so forgoes several benefits of object orientation.

The negative impact on efficiency caused by programming language concurrency control mechanisms, and the need for guidelines that support the correct and efficient introduction of concurrency control, have been reported by several researchers [3, 1, 6]. They are concerned with guaranteeing execution efficiency by avoiding unnecessary synchronization in concurrent Java programs. Our approach differs in that it is not general, but tailored to a specific architecture, and considers both language and database concurrency control mechanisms. Although the architecture is specific, a wide range of applications can be developed with it. Some examples are given in Section 3.4.

We applied our guidelines to two real web-based information systems. In the first one, which was already implemented and running, we used the guidelines as a mechanism for checking whether the system controls concurrency correctly. This helped us to validate the guidelines, since some of the concurrency controls that our guidelines define were already used in that system, such as using transactions only when necessary. In the second system, the guidelines were applied during the implementation phase and accounted for up to 10% of the implementation time, which demonstrates their small impact.

This paper is structured in five sections. In Section 2 we present a specific architecture for developing web-based information systems. In Section 3 we define the concurrency control guidelines, which aim to guarantee safety and efficiency; we identify the mechanism (database or programming language) that should be used in each situation of the presented architecture. We discuss the performance impact of different concurrency control approaches in Section 4. Related approaches and conclusions are presented in Section 5.
2. A Specific Architecture for Web-based Systems

The architecture to which the guidelines are tailored aims to satisfy requirements that can be considered general to web-based information systems: the ability to work in a distributed environment, guaranteed data integrity and persistence in a production environment, the possibility of changing the persistence mechanism (or of using volatile data for requirements validation), and ease of maintenance. To meet those requirements, a layered architecture and design patterns were used in a Java implementation. The layered architecture separates data management, business rules, communication (distribution), and presentation (user interface) concerns. Such a structure prevents tangled code, for example business code interlaced with data access code, and the design patterns allow us to reach greater levels of reuse and extensibility.

Figure 1 presents a UML [4] class diagram that illustrates the software architecture and design patterns [13, 9, 5] considered here, for a simple bank example. Accesses to the system are made through a unique entry point, the system facade [9] (Bank). The system facade also implements the Singleton [9] design pattern to guarantee that there is just a single instance of this class, which is the instance distributed to the user interfaces. The facade is composed of business collections, to which the facade methods delegate. Accounts are registered, updated, and queried in a web client implemented using Java Servlets technology. Section 3 provides more details about the classes of the software architecture.

Figure 1. Architecture's class diagram.

Although the guidelines presented in this paper have been defined for a specific layered architecture and its associated design patterns [13], this architecture can be used to implement several kinds of system, and it is especially useful for the development of web-based information systems.

3. Concurrency Control Guidelines

In this section we indicate the concurrency control mechanisms (programming language or database) that should be used in each part of the code structured according to the pattern presented in the previous section (see Figure 1). By following those guidelines we obtain safe, non-redundant, and efficient concurrency control. We define specific guidelines for each kind of class of the architecture presented in Section 2. The guidelines prevent naive controls such as synchronizing all the system access methods (the facade's methods) or implementing database transactions for all those methods. We also indicate the most commonly applied concurrency control techniques, according to the experience we acquired through the implementation and analysis of systems that use the software architecture considered here.

General guidelines, based on basic principles of concurrency control, are needed to avoid concurrent executions of non-atomic methods and of methods that read the attributes modified by the non-atomic ones. Examples of this kind of method are the ones that access long or double attributes, since the Java specification [10] does not guarantee atomic assignments to these types. To avoid such concurrent executions we can use the Java synchronized method modifier, which serializes concurrent executions of the modified methods on the same object.

3.1. Business Basic Classes

We start defining the specific guidelines with the business basic classes (see Figure 1), which represent basic system objects such as customers and accounts. First of all, we must identify which basic objects might be concurrently accessed.

Identifying Concurrent Access

The data collection classes (see Figure 1) determine whether an object may be concurrently accessed, because these classes are responsible for storing and retrieving basic objects. Usually, persistent collections that use relational databases create a new instance, with the data retrieved from the database, for each request to search for an object.

Therefore, if two threads try, for example, to retrieve accounts with the same number, the threads will get references to distinct copies of the object stored in the database. This avoids concurrent access to any Account instance, since we assume that the concurrent environment allows two or more clients to access the system, but each user accesses it sequentially.

An alternative to this approach for implementing data collections is to use object caching to guarantee that there will be a single object in memory for each entity stored in the database. This approach is used by some relational and object-oriented database access APIs [16, 17, 7] to prevent inconsistencies that might happen, for example, in the case of a concurrent update of two copies of the same account. In this case, concurrent requests to retrieve an account with the same number will receive the same reference to the object stored in the cache. This allows concurrent access to basic objects, which is controlled by the general guideline given at the beginning of this section.

Introducing Concurrency Control

After identifying the basic objects that might be concurrently accessed, we must apply the general guidelines to the corresponding classes. In the classes whose objects are not concurrently accessed, we must still analyze situations where concurrent updates of copies of the same object should not be allowed. For example, an Account class has methods such as deposit and withdraw, which update the balance attribute based on its old value. Concurrent updates of two copies of the same account might take the system to an inconsistent state.

Consider a possible execution in which the balances of copies of the same Account are concurrently modified: two threads concurrently execute three operations, namely requesting an account from the data collection, depositing a value in the retrieved object, and updating the object in the data collection. After this execution, the account's balance information may be inconsistent, because each thread works with a different copy of the same object and the second copy to be updated overwrites the first one.

To avoid this problem with concurrent updates of object copies, our guideline suggests implementing an update control for each persistent object. This can be done using a timestamp-based technique that adds version information to those objects. Note that this technique is not an implementation of a database algorithm such as timestamp ordering [12]. The idea is to allow an object to be updated only if there is no newer version of it stored in the database. Otherwise, the operation must be restarted.

3.2. Data Collection Classes

The data collection classes (see Figure 1) are responsible for storing and retrieving basic objects. They implement the business-data interfaces, which are abstractions of the system's data management. These interfaces allow us to extend the system easily by changing the data management mechanism: it suffices to provide implementations of the business-data interfaces for the respective data management mechanism.

In the persistent data collections we can also find concurrency problems when an object is inserted into, updated in, or removed from the database. In these cases we must guarantee that the database features are properly used: we must insert, update, or remove an object inside a database transaction. In this way, we can assume that a large part of the concurrency control related to data updates is implemented by the database. Considering this, we must guarantee that the methods of the facade are atomic with respect to database access. We can do this by implementing each database access with a single SQL command, or by implementing transactions in the facade's methods, using the persistence mechanism interface services. To guarantee the atomicity of the data collection methods we must take the following steps:

• Identify data collection methods that directly or indirectly execute two or more SQL commands;

• Identify business collection methods that call the data collection methods identified in the previous step;

• Identify facade methods that call the business collection methods identified in the previous step;

• Make the facade methods identified in the previous step use the persistence mechanism interface methods beginTransaction, commitTransaction, and rollbackTransaction to implement a transaction in their bodies.

For example, in the following update method of the Bank class, the calls to beginTransaction, commitTransaction, and rollbackTransaction (underlined in the original listing) implement the transaction mechanism.

    public class Bank {
        private AccountsRecords accounts;
        private PersistenceMechanismInterface pm;

        public void update(Account account) {
            try {
                // Open a transaction around the multi-command update.
                pm.beginTransaction();
                accounts.update(account);
                pm.commitTransaction();
            }
            catch (DBTransactionException e) {
                // Undo any partial work if the transaction fails.
                pm.rollbackTransaction();
            }
        }
        ...
    }

This example assumes that updating an account requires two SQL commands.

3.3. Facade and Business Collection Classes

The business collection classes are responsible for implementing verifications and validations according to the application's business logic. Such classes use the business-data interface services to store and retrieve basic objects. The facade class [9] provides a unified interface to all system services, grouping all instances of the business collection classes.

Our guidelines for business collections and the facade are mainly concerned with identifying business logic that might lead to race conditions. An example of this kind of rule is verifying, before inserting an object into a collection, whether there is already an object with the same code as the object to be inserted (or with the same value for any information used as a primary key). A concurrent execution that tries to register two objects with the same code may lead the system to an inconsistent state. This problem can be avoided by automatically allocating a code for each object, for example using a relational database sequence or implementing such a sequence in the business collection, which eliminates the need for the code verification.

For other business verifications that generate situations like the one just described, we must prevent concurrent execution by synchronizing the methods responsible for the verification. We should therefore use the synchronized method modifier or the Concurrency Manager pattern [15], which provides an alternative to method synchronization aimed at increasing performance. Concurrency Manager uses knowledge about the semantics of the methods in order to block only conflicting execution flows, allowing the non-conflicting ones to execute concurrently.

Another concurrency problem occurs when, for example, the facade class implements some operation with multiple calls to methods of business collections. These calls indirectly invoke methods of the data collections, which implies executing more than one SQL command on the database. As mentioned in Section 3.2, when an operation cannot be executed with a single SQL command, we must implement transactions in the facade's methods, using the persistence mechanism interface services. The same control applies when a business collection implements an operation with multiple calls to data collection methods. One example of this kind of method is the business collection method transfer, which might be implemented by a call to withdraw followed by a call to deposit.

3.4. Commonly Applied Controls

After implementing and analyzing some running web-based systems that use the layered architecture and the design patterns presented here, we can identify which concurrency controls were frequently applied. Some of the systems we implemented and analyzed are a system to manage a telecommunication company's clients, a system for performing on-line exams, and a system for registering health complaints with the health authorities.

Usually, the only concurrency control in the facade class is the implementation of transactions. In the business collections there are some calls to the concurrency manager methods in order to avoid interference between business verifications. Method synchronization is applied only in the update method of the data collection classes that implement the timestamp mechanism, and in some business collection methods that do not use the concurrency manager. The latter happens when the system has few simultaneous users and these methods are lightweight, as discussed in the following section. In the basic classes, the concurrency control commonly applied is to implement the timestamp mechanism in the classes for which copies of the same object cannot be concurrently updated. This is our alternative to intuitive controls that tend to synchronize, and to implement transactions in, all facade methods, which is neither efficient nor safe.

4. Performance Evaluation

In this section we present and analyze performance tests with different techniques and concurrency control approaches, including the ones suggested by the guidelines. The tests show that some of these approaches are not recommended, validating the advantages of the approach suggested by our guidelines. Moreover, the tests support the decision of which alternative to use for concurrency control, since some of our guidelines offer more than one solution for some problems.

4.1. Performance Tests

We implemented a small customer registration system, which allows customers to be registered, retrieved, and updated.
The tests execute these three operations in different versions of the system, each one implementing different concurrency controls. We first compare the following versions:

• No control: system without concurrency control;

• Synchronized facade: all facade class methods synchronized;

• Facade with transactions: all facade class methods implementing transactions;

• Suggested control: applying the defined guidelines, with the concurrency manager in the business collection and the timestamp for the basic class.

Only the synchronized facade approach and the one that applies our guidelines guarantee system correctness if applied separately. In fact, the former guarantees system correctness only if two copies of a basic object may be concurrently updated; otherwise the approach is not safe. The no control approach is used as a reference to measure the impact of the controls.

The second test measures the impact of the timestamp mechanism, so we compare the following system versions:

• No control: system without concurrency control;

• Timestamp: timestamp mechanism implemented for the Customer basic class.

We also analyze our alternative to method synchronization, comparing the following system versions:

• Synchronization in the business collection: business collection insert method synchronized;

• Concurrency Manager in the business collection: business collection insert method using the Concurrency Manager pattern.

For each of the different versions we also analyze variations in the controlled methods. Thus, we can measure the impact of different concurrency controls in the different types of systems represented by those variations. For example, variations in method execution time were implemented by including a loop that increases the execution time by approximately 100%; we call the resulting methods heavyweight methods. Another variation in the tests is an increase in the system workload. We tested the system with workloads between 3 and 600 threads concurrently accessing the system. We could therefore simulate a situation of extreme concurrency, rarely found in the access rates of real systems. For instance, we collected numbers for a heavily used web system that has an access rate of more than 400 users (not hits) accessing the system in an hour, whereas our experiments execute in a few seconds. In our tests there were 12 available connections to the persistence mechanism, which were shared between the threads, without concurrent access to them.

4.2. Performance Analysis

The following paragraphs summarize the tests with the different variations, such as system workload and method weight.

General Approaches for Concurrency Control

Figure 2 presents a bar chart that compares the system without concurrency control with the approach that synchronizes all facade methods, the one that implements transactions in all facade methods, and the approach that applies our guidelines.

Figure 2. Impact of concurrency controls.

The chart shows that synchronizing all facade methods is a very expensive approach, increasing the execution time by up to 120%. We can also notice a significant overhead, more than 50%, when implementing transactions for all facade methods. This is a motivation to apply our guidelines for concurrency control, because they indicate exactly which facade methods must implement transactions. We can conclude that the impact of our guidelines is small, less than 10%, compared with the other approaches. However, this is a necessary impact to guarantee system safety.

This chart shows data observed when executing lightweight methods. In systems with heavyweight methods the execution time increases by 50%, but the difference in execution time between the approaches is lower.

Timestamp

Similarly to the results of the previous test, Figure 3 shows the performance impact of the timestamp mechanism. The impact with lightweight methods is smaller than with heavyweight methods. This occurs because of the number of connections, which restricts the number of concurrent requests to the persistence mechanism. If the methods of the data collection are heavier, they take more time to execute and hold the connections for longer, delaying the execution of other threads. In these tests we had 12 available connections to the persistence mechanism. Although the impact is considerable in the case of heavyweight methods, this is a necessary control to guarantee system correctness. However, increasing the number of available connections might decrease this impact.

Figure 3. Timestamp mechanism impact.

Concurrency Manager

The last test compares the performance of the concurrency manager design pattern and the synchronized modifier when controlling concurrency in the business collection methods. We considered another variation besides the system workload and method weight: some of the tests lead to race conditions on the same basic object, allowing us to evaluate our solution with respect to this aspect. This variation is implemented by creating threads that try to insert, update, and retrieve customers with the same code. We considered this variation because it suggests the use of different alternatives in specific situations, since the concurrency manager uses this information (method semantics) to synchronize the threads only when necessary.

Figure 4. Concurrency Manager versus synchronized.

Figure 4 shows the negative performance impact caused by the synchronized modifier compared with the concurrency manager. When few users are accessing the system concurrently, the synchronized modifier is slightly worse (up to 3%) than the concurrency manager. For systems that expect this access rate we suggest adopting the synchronized modifier solution, since it is simpler to implement and to maintain. For systems that expect higher access rates, we suggest using the concurrency manager, which offers a performance gain of up to 20%. This gain is bigger (up to 30%) when there is no race condition over the same object, which is typically the case for systems like the one we analyzed. In the tests with heavyweight methods the impact of the synchronized modifier is lower than with lightweight methods, but still relevant (10% to 20%).

5. Conclusions

In contrast with general, well-known approaches to concurrency control [11, 14], we defined guidelines for developing web-based systems that are tailored to a specific architecture widely used to develop this kind of system. We also assume the use of relational databases, which are likewise widely used for implementing web-based information systems. Our guidelines use the concurrency control mechanisms of both Java [10], the programming language, and relational databases. However, those general approaches [11, 14] cover a bigger portion of systems' concurrency problems, since, unlike our guidelines, they are not tailored to a specific architecture for web-based systems using relational databases.

The main point of the guidelines is to guarantee system correctness without redundant concurrency control, both increasing performance and guaranteeing safety. Database management systems (DBMS) deal with a large portion of the system's concurrency control, reducing the need for programming language concurrency control features. This is the case for basic classes, where language features are necessary only if their objects are concurrently accessed. According to our experience, such concurrent access does not occur in many web-based systems developed with JDBC. However, the DBMS does not solve all problems related to concurrency control. We therefore need to know where to use programming language features in order to avoid redundant controls and their negative performance impact. We have shown that those problems can be solved through our guidelines for concurrency control, preventing losses from 10% to 110% in execution time when compared with naive solutions such as synchronizing facade methods.

The concern with avoiding unnecessary concurrency control is the topic of many works [3, 1, 6]. One of them [3] uses global data flow analyses to identify which objects with synchronized methods cannot be concurrently accessed in a specific program. An advantage of this approach is that it is completely automated. In fact, a large portion of our guidelines can also be automated. Our approach differs from this one because we guide the system implementation to avoid unnecessary synchronization, and we also give guidelines to control race conditions added by business policies. However, in contrast with our approach, this related work does not guarantee system safety. It guarantees only the safety of the optimizations made by the analyses; therefore, the system implementation must guarantee safety before the analyses (optimizations) are applied. We might say that our guidelines and this approach are complementary, since the guidelines can be applied to guarantee system safety before executing the data flow analyses and the optimization.

In this paper we analyze some alternatives for solving concurrency problems, showing, in general, that the concurrency manager is more efficient than the synchronized modifier. Moreover, we show the negative impact of the widespread use of the synchronized method qualifier, as well as of the unnecessary implementation of transactions in facade methods. The experiments also allow us to state that the impact of applying the guidelines to a system is relatively small, mainly when compared with the other approaches (see Section 4). Another advantage of basing our proposal on a specific software architecture is that it allows the exact definition and application of the guidelines, giving better support to programmers. Although specific, the software architecture has been, and can be, used to implement a wide range of web-based information systems. Development productivity is increased because the guidelines precisely indicate the points where concurrency control code must be applied, identifying the classes and situations subject to control, and which mechanism should be used to control each problem.

We applied our guidelines to two real web-based information systems in order to validate them. The first system was already implemented and running, and we used the guidelines as a mechanism for checking whether the system controls concurrency correctly. This helped us to validate the guidelines, since some of the concurrency controls that our guidelines define were already used in that system, such as using transactions only when necessary. We also found some naive controls, which could have been avoided if our guidelines had been used. In the second system, which is another implementation of the application implemented by the first system, the guidelines were applied during the implementation phase. We did this to analyze the impact of using the guidelines on the implementation time. In this case the guidelines accounted for 10% of the implementation time, which shows their small impact.

References

[1] O. Agesen, D. Detlefs, A. Garthwaite, R. Knippel, Y. S. Ramakrishna, and D. White. An efficient meta-lock for implementing ubiquitous synchronization. In Proceedings of the 1999 ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 207–222, November 1999.

[2] I. M. Author. Some related article I wrote. Some Fine Journal, 99(7):1–100, January 1999.

[3] J. Bogda and U. Hölzle. Removing unnecessary synchronization in Java. In Proceedings of the 1999 ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 35–46, November 1999.

[4] G. Booch, I. Jacobson, and J. Rumbaugh. Unified Modeling Language – User's Guide. Addison-Wesley, 1999.

[5] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. A System of Patterns: Pattern-Oriented Software Architecture. John Wiley & Sons, 1996.

[6] J.-D. Choi, M. Gupta, M. Serrano, V. C. Sreedhar, and S. Midkiff. Escape analysis for Java. In Proceedings of the 1999 ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 1–19. ACM, November 1999.

[7] O. Design. PSE Pro for Java API user guide, 2001. Available at http://support.odi.com/i/documentation/doc/psepro/pse-java/doc/pdf/pseug.pdf.

[8] A. N. Expert. A Book He Wrote. His Publisher, Erewhon, NC, 1999.

[9] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.

[10] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification. Addison-Wesley, second edition, 2000.

[11] D. Lea. Concurrent Programming in Java. Addison-Wesley, second edition, 1999.

[12] V. Li. Performance models of timestamp-ordering concurrency control algorithms in distributed databases. IEEE Transactions on Computers, 36(9):1041–1051, 1987.

[13] T. Massoni, V. Alves, S. Soares, and P. Borba. PDC: Persistent Data Collections pattern. In First Latin American Conference on Pattern Languages Programming, SugarLoafPLoP, Rio de Janeiro, Brazil, 3rd–5th October 2001. To appear in UERJ Magazine: Special Issue on Software Patterns.

[14] D. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture, Vol. 2: Patterns for Concurrent and Networked Objects. Wiley & Sons, 2000.

[15] S. Soares and P. Borba. Concurrency Manager. In First Latin American Conference on Pattern Languages Programming, SugarLoafPLoP, Rio de Janeiro, Brazil, 3rd–5th October 2001. To appear in UERJ Magazine: Special Issue on Software Patterns.

[16] A. Software. O2 technology user manual: Java relational binding. Version 2.0, July 1997.

[17] Sun Microsystems. The Enterprise Java Beans Specification, October 2000. Available at http://java.sun.com/products/ejb/docs.html.