Behavior and Performance of Message-Oriented Middleware Systems

2002

The middleware technology used as the foundation of Internet-enabled enterprise systems is becoming increasingly complex. In addition, the various technologies offer a number of standard architectures that can be used by designers as templates to build applications. However, there is little concrete understanding in the software industry of the strengths and weaknesses of competing technologies, and of the different trade-offs that various component architectures impose. The SACT Group at CSIRO has qualitatively and quantitatively evaluated a number of commercially available Message-Oriented Middleware (MOM) systems. This paper focuses on the results obtained from the performance evaluation of IBM's MQSeries V5.2. It presents an overview of the technology, and discusses the metric used in this study for performance measurement. The test results related to the sustainable performance of the system using various test configurations are described and their implications discussed.

Phong Tran, Paul Greenfield
CSIRO Mathematical and Information Sciences, North Ryde, Sydney, Australia
{phong.tran, paul.greenfield}@csiro.au

Ian Gorton
School of Information Technologies, University of Sydney, Sydney, Australia

Keywords: Distributed system, message-oriented middleware, MQSeries, performance evaluation

1. Introduction

There are a number of strongly competing products in the Message-Oriented Middleware (MOM) technology marketplace. They all appear at first sight to offer much the same set of core features. In reality, these products are quite different, aiming to meet the diverse needs of different core markets. Not surprisingly, they have their own strengths and weaknesses. These become crucial when an IT department is deciding which product will best suit its particular needs. In particular, MOM technology performance is often a key differentiator during the acquisition process.
A literature survey was carried out to determine the current status of research on the performance of the most recent releases of various commercially available MOM products and how the system performance was measured. The survey result shows that the only papers related to the performance measurement of MOM systems are from the products' vendors. Two performance studies have been published recently comparing MQSeries and MSMQ, one by IBM [1] and the other commissioned by Microsoft [2]. The studies have different test and measurement methods and the results are essentially conflicting.

In response to these issues, the SACT research group in CSIRO, an independent research organization in Australia, initiated the Middleware Technology Evaluation (MTE) project. The major aim of this project was to rigorously and independently evaluate competing middleware products and disseminate the results to industry through publications and seminars. Previous work in this project includes studies on CORBA and EJB-based technologies [3][4][5]. The results of the work are reported in a series of comprehensive reports that are available via the Cutter Consortium¹ and CSIRO publishing².

As part of the MTE project, a research program on the performance evaluation of MOM systems has been undertaken. Three of the most widely used MOM products, namely IBM's MQSeries v5.2, Microsoft's MSMQ v2.0 and TIBCO Rendezvous v6.6, were selected. While the comprehensive report on MOM technology [6] presents a complete picture and the performance results of each of the MOM products evaluated, this paper limits itself to a study of the behavior of MQSeries V5.2.

2. Issues on performance measurement

A key aspect of the performance measurement of a MOM system is the establishment of a scenario that best reflects "real world" MOM-based applications.
Based on this scenario, a metric for performance comparison can be established.

The studies by IBM [1] and Microsoft [2] are based on different usage scenarios and reach contradictory conclusions. Apart from presenting the results, they reveal very little about the behavior of the systems. The study by IBM is based on a closed-loop test scenario. In this scenario, a sender and a receiver are synchronized with the MQSeries transmission agent, such that the processing of a message between these processes is serialized. In other words, the sender puts a message on the queue and waits for it to be returned. The sender only sends the next message when it receives a reply to the previously sent message.

This scenario is not ideal for most MOM applications, where message senders and receivers are asynchronous. In many practical scenarios, messages are sent and forgotten. In an asynchronous scenario, senders, receivers and queue managers are concurrently active and can push the application beyond its capacity, resulting in unsustainable performance. In this study, a different scenario has been used. The end result of this study is the establishment of a new metric that is then used for performance measurement.

¹ www.cutter.com
² www.cmis.csiro.au/adsat/mte_reports.htm

Proceedings of the 22nd International Conference on Distributed Computing Systems Workshops (ICDCSW'02) 0-7695-1588-6/02 $17.00 © 2002 IEEE

3. Overview of MQSeries

IBM's MQSeries [7] deals with queued messages. Figure 1 depicts the architecture of the system. Applications write messages to queues, where they can be sent to other queues and read by their receiving applications.

[Figure 1: MQSeries architecture. Components shown: sending and receiving applications, MQI API, MQ servers, queue managers, queues, transmission queue, MCAs and channel.]

MQSeries comes in both client and server forms. An MQSeries server contains all MQSeries components, including MQSeries Explorer, MQSeries Services and Queue Managers (QMs). It can provide messaging services to local applications and remote MQSeries clients, as well as exchanging messages with other MQSeries servers. An MQSeries client does not have its own queue managers. It relays messaging calls from applications to MQSeries servers.

MQSeries servers communicate with each other over message channels. These consist of a transmission queue and two Message Channel Agents (MCAs) connected by a network link. The MCAs are responsible for transmitting messages between queue managers.

MQSeries provides direct access to a large number of parameters that an administrator can use to tune its behavior and optimize performance for a particular environment and workload. The default configuration settings include standard binding for non-persistent messages and a 64KB log buffer with three 1024KB log files for persistent messages. An alternative to the standard binding is the FastPath binding for non-persistent messages. The FastPath binding links the queue manager into the application, reducing overheads and improving performance. The maximum log buffer is 2 MB.

4. Test scenario

[Figure 2: Open-loop test scenario. Components shown: a sending system and a receiving system, each with an application, a queue manager and a queue, connected by a TCP/IP channel; each side also has a database used for the transactional tests.]

Figure 2 depicts a scenario involving a multi-threaded sending application (sender), a multi-threaded receiving application (receiver) and two queue managers. As these components operate asynchronously, the sender and receiver are not synchronized. As depicted in Figure 2, the sender puts messages into a local queue and lets its queue manager transmit them to a remote queue. The receiver extracts them without replying to the sender. This is regarded as an open-loop scenario.
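The difference between the closed-loop scenario used in the IBM study and the open-loop scenario used here can be illustrated with a small sketch. It uses Python's standard queue and threading modules in place of MQSeries queues and queue managers; the function names and the in-process queues are assumptions made for illustration only and are not part of the test harness used in the study.

```python
import queue
import threading


def closed_loop_run(n_messages):
    """Closed loop: the sender waits for each reply before sending again."""
    requests, replies = queue.Queue(), queue.Queue()

    def echo():
        # Stand-in for the receiver: return every message to the sender.
        while True:
            msg = requests.get()
            if msg is None:          # sentinel: no more messages
                break
            replies.put(msg)

    worker = threading.Thread(target=echo)
    worker.start()
    returned = []
    for i in range(n_messages):
        requests.put(i)
        returned.append(replies.get())   # block until the reply arrives
    requests.put(None)
    worker.join()
    return returned


def open_loop_run(n_messages, n_receivers=2):
    """Open loop: the sender puts messages and never waits for a reply."""
    q = queue.Queue()
    received, lock = [], threading.Lock()
    stop = object()                      # sentinel, one per receiver thread

    def receiver():
        while True:
            msg = q.get()
            if msg is stop:
                break
            with lock:
                received.append(msg)

    workers = [threading.Thread(target=receiver) for _ in range(n_receivers)]
    for w in workers:
        w.start()
    for i in range(n_messages):
        q.put(i)                         # send and forget
    for _ in workers:
        q.put(stop)
    for w in workers:
        w.join()
    return sorted(received)


closed = closed_loop_run(5)
opened = open_loop_run(100)
```

In the closed-loop version at most one message is in flight at any time, so no queue can build up. In the open-loop version the sender is free to run ahead of the receivers, which is what allows the queue depth to grow.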
In the open-loop scenario, these components are independent processes and as such they can handle messages at different rates. The difference between the transmission rate and the sending rate determines the queue depth at the sender's queue. Similarly, the difference between the transmission rate and the receiving rate determines the queue depth at the receiver's queue.

5. Test platform

The sender and its supporting middleware are run on a single Dell PC, connected over a 100Mb/s switched LAN to another equivalent Dell PC running the receiver. The configuration of these test machines is shown in Table 1.

                    Sender                               Receiver
CPU                 1x 733MHz Pentium III                2x 450MHz Pentium II
RAM                 0.5 GB                               0.5 GB
Disk type           1x IBM Deskstar 75GXP disk,          1x Seagate ST118202LW disk,
                    7200 rpm, 8.5ms avg seek,            10,000 rpm, 6.2ms avg seek,
                    UltraATA/100                         Ultra2 Wide SCSI
Operating system    Microsoft Windows 2000 Server SP2    Microsoft Windows 2000 Server SP2
Database            SQL Server V7.0                      SQL Server V7.0

Table 1: Test platform

6. Behavior of the system

[Figure 3: Recorded rates vs. programmed rate. Axes: actual rate (mps) vs. programmed rate (mps); series: sending rate and receiving rate.]

[Figure 4: Sending rate vs. run time. Axes: sending rate (mps) vs. run time (sec).]

[Figure 5: Receiving rate vs. run time. Axes: receiving rate (mps) vs. run time (sec).]

An initial experiment with MQSeries was carried out to study the behavior of the system. Figure 3 shows a typical behavior of the system configured with its default configuration parameters. In this test, when the sender was sending non-persistent 64-byte messages at a rate below 2000 messages per second (mps), the receiver's receiving rate was almost the same as the sender's sending rate.
While the sending rate continued to increase beyond 2000 mps, the receiver saturated at about 2000 mps, causing the receiving rate to level off. We call this rate the saturation point. Messages were now being queued in the sender's transmission queue while the queue at the receiver remained empty. This means the QM's transmission rate had reached its maximum and was the same as the receiver's receiving rate.

Figures 4 and 5 show another behavior of the system using the same message properties (non-persistent, 64-byte length). In this case the sending rate was set beyond the saturation point, at 2400 mps. When the sender was sending messages at this rate, the receiver was initially receiving messages at the rate of 2000 mps. These rates remained stable for the first 100 seconds of run time. During this time the queue depth at the sender's queue increased continually with unsent messages, at a relatively linear rate, while the queue depth at the receiver's queue was almost zero.

After the 100-second mark, the sending rate fluctuated sharply and at the same time dropped (Fig. 4), causing a ripple effect on the receiving rate (Fig. 5). The system at this stage was in an unstable state. At this time, the queue contained about 40000 messages. When the sender process stopped at the 120-second mark (Fig. 4), no more new messages were put into the queue. At this time the sender's queue manager and the receiver continued to process the unsent messages, and the receiving rate increased sharply, becoming steady at 2200 mps (Fig. 5).

This behavior shows that the sender's queue manager cannot sustain the same level of performance when its queue is allowed to grow beyond a certain queue depth. The behavior also shows that the queue manager's transmission capacity is constrained by the message queuing processing at the queue.
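The queue depth reported for this run is consistent with simple arithmetic on the two rates. The sketch below uses a fluid approximation, which is an assumption made here for illustration rather than a model taken from the study: the sender's queue grows by the excess of the sending rate over the rate at which the queue is drained. The function name `backlog` is hypothetical.

```python
def backlog(send_rate, drain_rate, seconds):
    """Messages accumulated in the sender's queue after `seconds`.

    Fluid approximation: the queue grows by the excess of the sending rate
    over the drain rate, and never goes below zero.
    """
    return max(0, (send_rate - drain_rate) * seconds)


# Rates from the run described above, in messages per second (mps):
# 2400 mps sent, 2000 mps received, stable for the first 100 seconds.
depth_at_100s = backlog(send_rate=2400, drain_rate=2000, seconds=100)
print(depth_at_100s)  # 40000

# Below the saturation point the queue does not build up at all.
print(backlog(send_rate=1500, drain_rate=2000, seconds=100))  # 0
```

The result of 40000 messages matches the queue depth observed at the 100-second mark, which supports the reading that the queue grew roughly linearly while both rates were stable.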
When the sender and the queue manager are concurrently active, the transmission rate is lower than when only the queue manager is active. The explanation for this behavior is as follows. When the queue is empty, there is no disk I/O activity. When the queue starts to grow, its queue manager starts to fill up the queue's buffer. When the buffer is full, the queue manager flushes the buffered messages to a disk file. This condition persists while the sender is running. As the messages have the same delivery priority, the queue manager now has to read and send the old messages from the file first, while continuing to enqueue new messages and flush those in the buffer to the same disk file. This causes a high level of disk I/O activity and a high level of CPU usage. As a result, when the queue depth grows beyond a certain level, the queue manager starts to thrash as the CPU usage level reaches 100%.

When the sender has stopped and no new messages are sent to the queue, the queue manager only needs to dequeue the unsent messages, including those in the file. It was observed that the disk I/O level dropped considerably and the CPU usage level also dropped. This means the queue manager has more resources available and less disk I/O to perform, and as a result it can send the unsent messages at a higher transmission rate than the initial rate. This condition is equivalent to sending pre-loaded messages.

7. Metric for performance measurement

The system's behavior as described above leads to the conclusion that the system operates in two different states: sustainable and unsustainable. A sustainable state is the state in which the system can maintain the same level of performance for an unlimited run time. The unsustainable state causes erratic and reduced performance.
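One way to turn the distinction between the two states into a measurement procedure is to sample the queue depth during a run and to search for the highest sending rate at which the depth does not keep growing. The sketch below is a minimal illustration under stated assumptions, not the procedure used in the study: `simulate_depths` is a hypothetical stand-in for a real test run, the half-versus-half comparison in `is_sustainable` is one possible heuristic, and the 2000 mps capacity simply mirrors the saturation rate observed in Section 6.

```python
def simulate_depths(send_rate, capacity, seconds):
    """Hypothetical stand-in for a test run: queue depth sampled once a second."""
    depth, samples = 0, []
    for _ in range(seconds):
        depth = max(0, depth + send_rate - capacity)
        samples.append(depth)
    return samples


def is_sustainable(depth_samples, tolerance=0):
    """Heuristic: the queue is not deeper late in the run than early in it."""
    if len(depth_samples) < 2:
        raise ValueError("need at least two depth samples")
    half = len(depth_samples) // 2
    early, late = depth_samples[:half], depth_samples[half:]
    return max(late) <= max(early) + tolerance


def max_sustainable_rate(probe, low, high, resolution=50):
    """Bisect for the highest rate accepted by `probe`.

    Assumes `probe` is monotonic: once a rate is unsustainable, every
    higher rate is unsustainable as well.
    """
    if not probe(low):
        raise ValueError("the lower bound is already unsustainable")
    if probe(high):
        return high
    while high - low > resolution:
        mid = (low + high) // 2
        if probe(mid):
            low = mid       # still sustainable: search higher
        else:
            high = mid      # queue keeps growing: search lower
    return low


def probe(rate):
    # Assumed capacity of 2000 mps, for illustration only.
    return is_sustainable(simulate_depths(rate, capacity=2000, seconds=60))


mst_estimate = max_sustainable_rate(probe, low=0, high=6000)
```

With a real system the `probe` function would start a test run with empty queues and with both sender and receiver active, then return whether the observed queue depths stayed stable. The search itself is unchanged.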
The tests therefore quantified the system's performance by measuring the maximum sustainable throughput (MST). This is the throughput at the saturation point. This throughput level does not cause the queue depth to increase at either the sender's or the receiver's queue. The performance measurement is also conducted with queues initially empty, and with both the sender and receiver running.

There are a number of factors that can influence the performance of the system. These include:

• the quality of service, which defines the level of message delivery guarantee
• the system's configuration parameters
• the message length
• the receiver's number of threads

It is therefore important to explore the extent of the influence of these factors on the performance.

7.1. Quality of Service (QoS) variable

MOM technologies are about delivering messages reliably. Products normally offer three levels of QoS: non-persistent, persistent and transactional.

In non-persistent delivery, the MOM will do its best to deliver the message. Undelivered messages are only kept in memory and can be lost if a system or network fails before a message is delivered.

With persistent delivery, the MOM guarantees to deliver messages despite system and network failures by logging them to disk as well as keeping them in memory. This means they can be recovered and subsequently delivered after a system failure.

With transactional delivery, the MOM tightly integrates persistent messaging operations with application code, not allowing transactional messages to be sent or received until the application commits their enclosing transaction. Transactional messaging also allows message sending and receiving to be coordinated with other transactional operations, such as database updates. Transactional messaging normally involves a transaction manager (TM).

Combining these levels of QoS results in three different alternatives for performance measurement.
These are non-persistent non-transactional (NPNT), persistent non-transactional (PNT) and persistent transactional (PT). There are two variants of PT: Persistent Local Transaction (PLT) and Persistent Global Transaction (PGT). In PLT messaging, there is no integration with external transactional operations such as database updates, and the MOM product itself manages the transaction with single-phase commits. PGT messaging involves external transactional operations such as database updates, and the TM performs XA-compliant 2-phase commits. In these performance tests, MQSeries is used as the TM.

7.2. Other variables

The performance tests are carried out with various configuration parameter values (default values or tuned values recommended by the vendor). The configuration parameters include the method of binding (standard or FastPath), the log buffer size and the number of log files, as described in Section 3.

For the message length, the performance measurement is limited to 64-byte, 1024-byte and 4096-byte messages. The selection of these lengths may be arbitrary but should be sufficient to indicate how the system's performance is affected.

The performance tests are also carried out using two different types of receiver. The first test uses a single-threaded receiver and the second test uses a multi-threaded receiver. This is to show how throughput changes when the number of threads in the receiver is increased, and to give an indication of how well the products will scale in a more realistic test case with many receiving applications.

8. Performance results and analysis

Table 2 shows the maximum sustainable throughput (MST) obtained for NPNT messages with varying numbers of receiving threads, message lengths and types of binding. It shows that the system performs best with a single-threaded receiver and the FastPath binding.
FastPath binding improves the MST by about 40%. The throughput declines when multiple receiving threads are used, possibly due to lock contention somewhere inside the MQSeries code. It was observed that the CPU usage level remained stable at about 60% on both the sending and receiving machines for all of these tests.

Receiving threads         1      1      1      5      15     20
Msg length (bytes)        64     1024   4096   64     64     64
FastPath binding (mps)    3000   2700   1650   2750   2650   2600
Standard binding (mps)    2150   -      -      2050   2050   2050

Table 2: MST for NPNT messages

Since MQSeries precedes each message with a 430-byte header when sending it, the actual message length is the sum of the length of the user's message and the length of the header. Figure 6 shows the relationship between the MST and the actual message length. It shows that the MST decreases almost linearly as the message length increases.

[Figure 6: MST vs. message length for NPNT messages. Axes: MST (mps) vs. message length (bytes).]

Table 3 shows the MST obtained from the PNT messaging tests. It shows that the impact of varying the number of receiving threads is negligible, and that increasing the message length from 64 bytes to 1024 bytes reduces throughput by less than 1%. Tuning the log configuration to a 2MB buffer and twelve 4096KB log files has no effect.

Receiving threads                  1      15     20
64-byte message, default log       120    130    Not tested
64-byte message, tuned log         120    130    130
1024-byte message, tuned log       110    120    120

Table 3: MST for PNT messages (mps)

The overall performance for PNT messages is significantly lower than that for NPNT messages. This is to be expected given that messages now have to be logged to disk for recovery purposes. The disk I/O operations for message logging largely limit the throughputs shown in these tests.

Table 3 also shows that the throughput is unchanged as the number of receiver threads increases. This is because when queuing PNT messages, MQSeries locks the queue while updating the log to ensure that updates to the queue are serialized. This means there can only be one operation in progress at the queue at any one time.

                     64-byte message            1024-byte message
Receiving threads    Default log   Tuned log    Default log   Tuned log
1                    90            100          80            90
15                   520           650          380           570
20, 30               510, 480, - (column placement unclear)   370, - (column placement unclear)

Table 4: MST for PLT messages (mps)

Table 4 shows the MST obtained from the PLT messaging tests using different message lengths, log configurations and numbers of receiving threads. It shows that the MST increases with the number of receiving threads, reaching 520 mps with 15 receiving threads using the default log. The tuned log further improves the MST by 25% for 64-byte messages, and by 50% for 1024-byte messages. Increasing the number of receiving threads beyond 15 only decreases the performance. This is because the machine is disk I/O and CPU bound, causing contention. With the tuned log and 15 receiving threads, increasing the message length from 64 bytes to 1024 bytes causes the throughput to fall by about 12%, from 650 to 570 mps. Note that increasing the number of sending threads makes no difference to the performance.

For PLT messaging, the log management and queue management are performed in parallel and asynchronously. Message logging takes place at the same time as messages are being written to queues, allowing multiple applications to run simultaneously. This behavior is quite different from the serialized behavior for PNT messaging explained earlier. As a result, the MST for PLT messages is five times higher than the MST for PNT messages.

Logging becomes more efficient as the size of the log buffer and the number of log files are increased. The use of a larger buffer and more log files allows for more efficient and less frequent disk operations, with fewer checkpoints.
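Several of the figures quoted in this section can be re-derived from the table values. The sketch below takes its inputs from Tables 2 and 4 and from the 430-byte header size stated in the text; the helper names are hypothetical and the calculation is offered only as a cross-check, not as part of the original analysis.

```python
HEADER_BYTES = 430  # MQSeries header size stated in the text

# (user message length in bytes, MST in mps): NPNT messages, FastPath binding,
# one receiving thread, as read from Table 2.
NPNT_FASTPATH = [(64, 3000), (1024, 2700), (4096, 1650)]


def actual_length(user_bytes):
    """Length of the message as sent: user payload plus the header."""
    return user_bytes + HEADER_BYTES


def slopes(points):
    """Change in MST per additional byte between consecutive points."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def percent_change(old, new):
    """Relative change from `old` to `new`, in percent of `old`."""
    return 100.0 * (new - old) / old


on_wire = [(actual_length(n), mst) for n, mst in NPNT_FASTPATH]
print(on_wire)  # [(494, 3000), (1454, 2700), (4526, 1650)]

# Two similar slopes are what "almost linearly" means for Figure 6.
segment_slopes = slopes(on_wire)

fastpath_gain = percent_change(2150, 3000)    # FastPath vs. standard, 64-byte
tuned_gain_64 = percent_change(520, 650)      # PLT, 15 threads, 64-byte
tuned_gain_1024 = percent_change(380, 570)    # PLT, 15 threads, 1024-byte
length_penalty = percent_change(650, 570)     # PLT, tuned log, 64 -> 1024 bytes
# length_penalty is about -12.3 relative to 650; the same 80 mps difference
# is 14.0% when measured against 570 instead.
```

The two segment slopes come out at roughly -0.31 and -0.34 mps per byte, which is consistent with the near-linear decline described for Figure 6, and the FastPath gain of about 39.5% agrees with the "about 40%" quoted in the text.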
The larger log configuration reduces disk I/O substantially and leads to the throughput improvements.

Table 5 shows the MST obtained for PGT messages with different message lengths and varying numbers of receiving threads using the tuned log configuration. The transaction in each thread includes an update of a SQL Server V7.0 database table.

                     Message length
Receiving threads    64-byte   1024-byte   4096-byte
1                    30        30          30
15                   60        60          60
20                   60        60          60

Table 5: MST for PGT messages (mps)

Compared with the MST for the PLT messages, the MST for PGT messages is significantly lower. There are a number of factors that affect the MST in this case. The overhead of managing global 2-phase commit transactions, database updates, and interfacing with external resource managers are some of those that contribute to the drop in the MST. Because of these factors, increasing the message length has no observable impact on the performance. The highest MST is 60 mps with 15 receiving threads.

9. Conclusion

This paper presents the result of a study on the behavior of MQSeries V5.2 using an open-loop test scenario with asynchronous senders and receivers. The result indicates that the system operates in two different states: sustainable and unsustainable. The system can be in one of these two states depending on the message sending or receiving rate. It remains in the sustainable state if the sending rate does not cause the queue to grow beyond a certain stable queue depth. Otherwise, if the queue is allowed to grow continuously, the application enters an unsustainable state. The result of the study enables a clear metric to be established for the performance measurement of MOM technologies. The system's performance is quantified by measuring the maximum sustainable throughput (MST).

This paper presents the performance results using four types of QoS: non-persistent non-transactional, persistent non-transactional, persistent local transactional and persistent global transactional. The results show that the impact of various factors (message length, tuning parameters, QoS and the receiver's number of threads) on the performance is significant in many aspects. The MST decreases almost linearly with the increase of the message length for NPNT messages. Logging for PNT messages reduces the peak MST from 3000 mps to 130 mps. While the system does not scale for PNT messages, it scales well for both PLT and PGT messages, as increasing the receiver's number of threads increases the MST for PLT messages by more than six times, from 100 mps to 650 mps. The MST for PGT messages is constrained by the local database updates and the overhead of the transaction manager. It does not show the same level of scalability as PLT messaging.

References

[1] A. Rindos, M. Loeb, S. Woolet, "A Performance Comparison of IBM MQSeries 5.2 and Microsoft Message Queue 2.0 on Windows 2000", IBM SWG Competitive Technical Assessment, Research Triangle Park, 3/2001.
[2] "Performance Evaluation of Microsoft Messaging Queue, IBM MQSeries and MQBridge", http://www.microsoft.com/msmq/whitepapers.htm.
[3] I. Gorton, A. Liu, P. Tran, "The Devil is in the Detail, A Comparison of CORBA Object Transaction Services", the 6th International Conference on Object-Oriented Information Systems, pages 211-221, 18-20 December 2000, London.
[4] S. Ran, P. Brebner, I. Gorton, "The Rigorous Evaluation of Enterprise Java Bean Technology", in 15th International Conference on Information Networking (ICOIN-15), Beppu City, Japan, Feb 2001, IEEE.
[5] P. Tran, I. Gorton, "Analyzing the Scalability of Transactional CORBA Applications", TOOLS Europe 2001, Zurich, Switzerland, 12-14/3/01, IEEE proceedings TOOLS 38, pp 102-110.
[6] P. Tran, P. Greenfield, I. Gorton, H. Tran, "Performance Evaluation of Message-Oriented Middleware Technology - Report", CSIRO Mathematical and Information Sciences, Australia, 8/2001.
[7] IBM, "MQSeries - Planning Guide", 9th Edition, 3/2000.