A taxonomy of application scheduling tools for high performance cluster computing

Cluster Computing

Abstract

Application scheduling plays an important role in high-performance cluster computing. It can be classified into job scheduling and task scheduling. This paper surveys software tools for graph-based scheduling on cluster systems, with a focus on task scheduling. The tasks of a parallel or distributed application can be scheduled onto multiple processors so as to optimize the performance of the program (e.g., execution time or resource utilization). In general, scheduling algorithms are designed around the notion of a task graph, which represents the relationships among parallel tasks. The scheduling algorithms map the nodes of the graph to processors in order to minimize the overall execution time. Although many scheduling algorithms have been proposed in the literature, surprisingly few tools are in practical use. After discussing the fundamental scheduling techniques, we propose a framework and taxonomy for scheduling tools on clusters. Using this framework, we analyze and compare the features of existing scheduling tools. We also discuss important issues in improving the usability of scheduling tools.
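
As an illustration of mapping task-graph nodes to processors, the following is a minimal sketch of greedy list scheduling. It is not taken from the paper or from any of the surveyed tools. The function name `list_schedule`, the bottom-level priority rule, the assumption of identical processors, and the sample graph are all illustrative assumptions.

```python
from collections import defaultdict


def list_schedule(cost, edges, num_procs):
    """Greedy list scheduling of a task graph (DAG) on identical processors.

    cost:  {task: computation time}, assumed strictly positive
    edges: {(pred, succ): communication time}, paid only when the two
           tasks run on different processors (an assumption of this sketch)
    Returns ({task: (processor, start, finish)}, makespan).
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for (u, v) in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Priority is the "bottom level": the longest path from a task to an exit.
    blevel = {}

    def bottom_level(t):
        if t not in blevel:
            blevel[t] = cost[t] + max(
                (edges[(t, s)] + bottom_level(s) for s in succs[t]), default=0
            )
        return blevel[t]

    # With positive costs, descending bottom level is a valid topological order.
    order = sorted(cost, key=bottom_level, reverse=True)

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placed = {}
    for t in order:
        best = None
        for p in range(num_procs):
            # Earliest time all inputs of t are available on processor p.
            ready = max(
                (placed[u][2] + (0 if placed[u][0] == p else edges[(u, t)])
                 for u in preds[t]),
                default=0,
            )
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        placed[t] = (p, start, start + cost[t])
        proc_free[p] = start + cost[t]

    makespan = max(finish for _, _, finish in placed.values())
    return placed, makespan


# Hypothetical diamond-shaped task graph: A feeds B and C, which both feed D.
cost = {"A": 2, "B": 3, "C": 3, "D": 1}
edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
schedule, makespan = list_schedule(cost, edges, num_procs=2)
print(schedule, makespan)
```

On this sample graph the sketch finishes in 7 time units on two processors, against 9 on a single processor. The communication delays on edges between processors are what keep the gain below the ideal.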

Author information

Corresponding author

Correspondence to Jiannong Cao.

Additional information

This work is supported by the Hong Kong Polytechnic University under grant H-ZJ80 and by NASA Ames Research Center under a cooperative grant agreement with the University of Texas at Arlington.

Jiannong Cao received the BSc degree in computer science from Nanjing University, Nanjing, China in 1982, and the MSc and Ph.D. degrees in computer science from Washington State University, Pullman, WA, USA, in 1986 and 1990 respectively. He is currently an associate professor in the Department of Computing at the Hong Kong Polytechnic University, Hong Kong. He is also the director of the Internet and Mobile Computing Lab in the department. He was on the faculty of computer science at James Cook University and the University of Adelaide in Australia, and at City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile computing, fault tolerance, and distributed software architecture and tools. He has published over 120 technical papers in the above areas. He has served as a member of the editorial boards of several international journals, as a reviewer for international journals and conference proceedings, and as an organizing/programme committee member for many international conferences. Dr. Cao is a member of the IEEE Computer Society, the IEEE Communication Society, IEEE, and ACM. He is also a member of the IEEE Technical Committee on Distributed Processing, the IEEE Technical Committee on Parallel Processing, the IEEE Technical Committee on Fault Tolerant Computing, and the Computer Architecture Professional Committee of the China Computer Federation.

Alvin Chan is currently an assistant professor at the Hong Kong Polytechnic University. He graduated from the University of New South Wales with a Ph.D. degree in 1995 and was subsequently employed as a Research Scientist by the CSIRO, Australia. From 1997 to 1998, he was employed by the Centre for Wireless Communications, National University of Singapore as a Program Manager. Dr. Chan is one of the founding members and director of a university spin-off company, Information Access Technology Limited. He is an active consultant and has been providing consultancy services to both local and overseas companies. His research interests include mobile computing, context-aware computing and smart card applications.

Yudong Sun received the B.S. and M.S. degrees from Shanghai Jiao Tong University, China. He received the Ph.D. degree from the University of Hong Kong in 2002, all in computer science. From 1988 to 1996, he was among the teaching staff in the Department of Computer Science and Engineering at Shanghai Jiao Tong University. From 2002 to 2003, he held a research position at the Hong Kong Polytechnic University. At present, he is a Research Associate in the School of Computing Science at the University of Newcastle upon Tyne, UK. His research interests include parallel and distributed computing, Web services, Grid computing, and bioinformatics.

Sajal K. Das is currently a Professor of Computer Science and Engineering and the Founding Director of the Center for Research in Wireless Mobility and Networking (CReWMaN) at the University of Texas at Arlington. His current research interests include resource and mobility management in wireless networks, mobile and pervasive computing, sensor networks, mobile internet, parallel processing, and grid computing. He has published over 250 research papers, and holds four US patents in wireless mobile networks. He received Best Paper Awards at ACM MobiCom’99, ICOIN-16, ACM MSWiM’00, and ACM/IEEE PADS’97. Dr. Das serves on the Editorial Boards of IEEE Transactions on Mobile Computing, ACM/Kluwer Wireless Networks, Parallel Processing Letters, and Journal of Parallel Algorithms and Applications. He served as General Chair of IEEE PerCom’04, IWDC’04, MASCOTS’02, and ACM WoWMoM’00-02; General Vice Chair of IEEE PerCom’03, ACM MobiCom’00, and IEEE HiPC’00-01; Program Chair of IWDC’02 and WoWMoM’98-99; TPC Vice Chair of ICPADS’02; and as TPC member of numerous IEEE and ACM conferences.

Minyi Guo received his Ph.D. degree in information science from the University of Tsukuba, Japan in 1998. From 1998 to 2000, Dr. Guo was a research scientist at NEC Soft, Ltd., Japan. He is currently a professor at the Department of Computer Software, The University of Aizu, Japan. From 2001 to 2003, he was a visiting professor at Georgia State University, USA, and Hong Kong Polytechnic University, Hong Kong. Dr. Guo has served as general chair, program committee chair, or organizing committee chair for many international conferences, and has delivered more than 20 invited talks in the USA, Australia, China, and Japan. He is the editor-in-chief of the Journal of Embedded Systems. He is also on the editorial boards of International Journal of High Performance Computing and Networking, Journal of Embedded Computing, Journal of Parallel and Distributed Scientific and Engineering Computing, and International Journal of Computer and Applications.

Dr. Guo’s research interests include parallel and distributed processing, parallelizing compilers, data parallel languages, data mining, molecular computing and software engineering. He is a member of the ACM, IEEE, IEEE Computer Society, and IEICE. He is listed in Marquis Who’s Who in Science and Engineering.

About this article

Cite this article

Cao, J., Chan, A.T.S., Sun, Y. et al. A taxonomy of application scheduling tools for high performance cluster computing. Cluster Comput 9, 355–371 (2006). https://doi.org/10.1007/s10586-006-9747-2

  • DOI: https://doi.org/10.1007/s10586-006-9747-2
