A taxonomy of application scheduling tools for high performance cluster computing

Cao, Jiannong; Chan, Alvin T. S.; Sun, Yudong; Das, Sajal K.; Guo, Minyi

doi:10.1007/s10586-006-9747-2

Published: July 2006

Cluster Computing, Volume 9, pages 355–371 (2006)

Abstract

Application scheduling plays an important role in high-performance cluster computing. It can be classified into job scheduling and task scheduling. This paper presents a survey of software tools for graph-based scheduling on cluster systems, with a focus on task scheduling. The tasks of a parallel or distributed application can be scheduled onto multiple processors so as to optimize the performance of the program (e.g., execution time or resource utilization). In general, scheduling algorithms are designed around the notion of a task graph, which represents the relationships among parallel tasks. The scheduling algorithms map the nodes of the graph to processors in order to minimize overall execution time. Although many scheduling algorithms have been proposed in the literature, surprisingly few tools are in practical use. After discussing the fundamental scheduling techniques, we propose a framework and taxonomy for scheduling tools on clusters. Using this framework, the features of existing scheduling tools are analyzed and compared. We also discuss the important issues in improving the usability of scheduling tools.
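
As a rough illustration of the task-graph mapping described in the abstract, the sketch below applies a simple list-scheduling heuristic: tasks are prioritized by their longest remaining path to the end of the graph and each is placed on the processor where it can start earliest. It is not taken from the paper or from any of the surveyed tools. The function name `list_schedule`, the bottom-level priority, the homogeneous-processor model with zero communication cost on the same processor, and the four-task example graph are all assumptions made for illustration.

```python
from collections import defaultdict


def list_schedule(cost, edges, num_procs):
    """Hypothetical helper: map a task graph onto identical processors.

    cost:  {task: computation time}, all times assumed positive
    edges: {(u, v): communication time paid only if u and v run on
            different processors}
    Returns ({task: (processor, start, finish)}, makespan).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Bottom level: longest path (computation + communication) from a
    # task to the end of the graph. Used here as the task priority.
    blevel = {}

    def bottom_level(t):
        if t not in blevel:
            tail = max((edges[(t, s)] + bottom_level(s) for s in succs[t]),
                       default=0)
            blevel[t] = cost[t] + tail
        return blevel[t]

    for t in cost:
        bottom_level(t)

    # With positive costs a predecessor always has a larger bottom level
    # than its successors, so this order respects all dependencies.
    order = sorted(cost, key=lambda t: (-blevel[t], t))

    proc_free = [0] * num_procs   # time at which each processor is idle
    placement = {}
    for t in order:
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives late.
            ready = max((placement[u][2]
                         + (0 if placement[u][0] == p else edges[(u, t)])
                         for u in preds[t]), default=0)
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        placement[t] = (p, start, start + cost[t])
        proc_free[p] = start + cost[t]

    makespan = max(finish for _, _, finish in placement.values())
    return placement, makespan


# Assumed example: a diamond-shaped graph A -> {B, C} -> D.
cost = {"A": 2, "B": 3, "C": 3, "D": 2}
edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
placement, makespan = list_schedule(cost, edges, num_procs=2)
print(makespan)  # 8, compared with 10 on a single processor
```

Real tools typically layer more on top of such a heuristic, for example heterogeneous processors, task duplication, or clustering of tasks before mapping. The sketch only shows the basic node-to-processor mapping step.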

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Jiannong Cao & Alvin T. S. Chan
School of Computing Science, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK
Yudong Sun
Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, 76019-0015, USA
Sajal K. Das
Department of Computer Software, University of Aizu, Aizu-Wakamatsu City, Fukushima, 965-8580, Japan
Minyi Guo

Corresponding author

Correspondence to Jiannong Cao.

Additional information

This work is supported by the Hong Kong Polytechnic University under grant H-ZJ80 and by NASA Ames Research Center under a cooperative grant agreement with the University of Texas at Arlington.

Jiannong Cao received the BSc degree in computer science from Nanjing University, Nanjing, China in 1982, and the MSc and Ph.D. degrees in computer science from Washington State University, Pullman, WA, USA, in 1986 and 1990 respectively. He is currently an associate professor in the Department of Computing at the Hong Kong Polytechnic University, Hong Kong. He is also the director of the Internet and Mobile Computing Lab in the department. He was on the faculty of computer science at James Cook University and the University of Adelaide in Australia, and City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile computing, fault tolerance, and distributed software architecture and tools. He has published over 120 technical papers in the above areas. He has served as a member of the editorial boards of several international journals, a reviewer for international journals and conference proceedings, and an organizing or programme committee member for many international conferences. Dr. Cao is a member of the IEEE Computer Society, the IEEE Communication Society, IEEE, and ACM. He is also a member of the IEEE Technical Committee on Distributed Processing, IEEE Technical Committee on Parallel Processing, IEEE Technical Committee on Fault Tolerant Computing, and Computer Architecture Professional Committee of the China Computer Federation.

Alvin Chan is currently an assistant professor at the Hong Kong Polytechnic University. He graduated from the University of New South Wales with a Ph.D. degree in 1995 and was subsequently employed as a Research Scientist by the CSIRO, Australia. From 1997 to 1998, he was employed by the Centre for Wireless Communications, National University of Singapore as a Program Manager. Dr. Chan is one of the founding members and director of a university spin-off company, Information Access Technology Limited. He is an active consultant and has been providing consultancy services to both local and overseas companies. His research interests include mobile computing, context-aware computing and smart card applications.

Yudong Sun received the B.S. and M.S. degrees from Shanghai Jiao Tong University, China, and the Ph.D. degree from the University of Hong Kong in 2002, all in computer science. From 1988 to 1996, he was on the teaching staff of the Department of Computer Science and Engineering at Shanghai Jiao Tong University. From 2002 to 2003, he held a research position at the Hong Kong Polytechnic University. At present, he is a Research Associate in the School of Computing Science at the University of Newcastle upon Tyne, UK. His research interests include parallel and distributed computing, Web services, Grid computing, and bioinformatics.

Sajal K. Das is currently a Professor of Computer Science and Engineering and the Founding Director of the Center for Research in Wireless Mobility and Networking (CReWMaN) at the University of Texas at Arlington. His current research interests include resource and mobility management in wireless networks, mobile and pervasive computing, sensor networks, mobile internet, parallel processing, and grid computing. He has published over 250 research papers, and holds four US patents in wireless mobile networks. He received Best Paper Awards at ACM MobiCom’99, ICOIN-16, ACM MSWiM’00 and ACM/IEEE PADS’97. Dr. Das serves on the Editorial Boards of IEEE Transactions on Mobile Computing, ACM/Kluwer Wireless Networks, Parallel Processing Letters, and Journal of Parallel Algorithms and Applications. He served as General Chair of IEEE PerCom’04, IWDC’04, MASCOTS’02, and ACM WoWMoM’00-02; General Vice Chair of IEEE PerCom’03, ACM MobiCom’00 and IEEE HiPC’00-01; Program Chair of IWDC’02 and WoWMoM’98-99; TPC Vice Chair of ICPADS’02; and as TPC member of numerous IEEE and ACM conferences.

Minyi Guo received his Ph.D. degree in information science from the University of Tsukuba, Japan in 1998. From 1998 to 2000, Dr. Guo was a research scientist at NEC Soft, Ltd., Japan. He is currently a professor at the Department of Computer Software, The University of Aizu, Japan. From 2001 to 2003, he was a visiting professor at Georgia State University, USA, and Hong Kong Polytechnic University, Hong Kong. Dr. Guo has served as general chair, program committee chair or organizing committee chair for many international conferences, and has delivered more than 20 invited talks in the USA, Australia, China, and Japan. He is the editor-in-chief of the Journal of Embedded Systems. He is also on the editorial boards of the International Journal of High Performance Computing and Networking, Journal of Embedded Computing, Journal of Parallel and Distributed Scientific and Engineering Computing, and International Journal of Computer and Applications.

Dr. Guo’s research interests include parallel and distributed processing, parallelizing compilers, data parallel languages, data mining, molecular computing and software engineering. He is a member of the ACM, IEEE, IEEE Computer Society, and IEICE. He is listed in Marquis Who’s Who in Science and Engineering.

About this article

Cite this article

Cao, J., Chan, A.T.S., Sun, Y. et al. A taxonomy of application scheduling tools for high performance cluster computing. Cluster Comput 9, 355–371 (2006). https://doi.org/10.1007/s10586-006-9747-2

Received: 01 September 2003
Revised: 01 March 2004
Accepted: 01 May 2004
Issue Date: July 2006
DOI: https://doi.org/10.1007/s10586-006-9747-2
