Ga Based Workflow Schedule
Ga Based Workflow Schedule
Ga Based Workflow Schedule
2, 2014
Amandeep Verma*
Department of Information Technology,
University Institute of Engineering and Technology,
Panjab University,
Chandigarh 160014, India
Email: amandeepverma@pu.ac.in
*Corresponding author
Sakshi Kaushal
Department of Computer Science and Engineering,
University Institute of Engineering and Technology,
Panjab University,
Chandigarh 160014, India
Email: sakshi@pu.ac.in
Abstract: Task scheduling and resource allocation are the key challenges of cloud computing.
Compared with grid environment, data transfer is a big overhead for cloud workflows. So, the
cost arising from data transfers between resources as well as execution costs must also be taken
into account during scheduling based upon user’s Quality of Service (QoS) constraints. In this
paper, we present Deadline Constrained Heuristic based Genetic Algorithms (HGAs) to schedule
applications to cloud resources that minimise the execution cost while meeting the deadline
for delivering the result. Each workflow’s task is assigned priority using bottom-level (b-level)
and top-level (t-level). To increase the population diversity, these priorities are then used to
create the initial population of HGAs. The proposed algorithms are simulated and evaluated with
synthetic workflows based on realistic workflows. The simulation results show that our proposed
algorithms have a promising performance as compared to Standard Genetic Algorithm (SGA).
Keywords: cloud computing; workflow; scheduling; DAG; genetic algorithm; grid computing.
Reference to this paper should be made as follows: Verma, A. and Kaushal, S. (2014) ‘Deadline
constraint heuristic-based genetic algorithm for workflow scheduling in cloud’, Int. J. Grid and
Utility Computing, Vol. 5, No. 2, pp.96–106.
computing, storage and network services like conventional system users. Many heuristic algorithms such as Minimum
utilities in everyday life (e.g., water, electricity, gas and Execution Time, Minimum Completion Time, Min-min and
telephony) (Gabriel et al., 2011), on a pay-as-you-go basis. Max-min are used as candidates for best-effort based
Meanwhile, cloud service providers are obligated to scheduling strategies (Yu and Buyya, 2008). Yong et al.
provider satisfactory Quality of Service (QoS) based on (2011) has proposed a Deadline and Budget Constrained
business service contracts (Geelan, 2008). (DBC) scheduling heuristics to deal with sequential workflow
Workflow scheduling is one of the key issues in the applications in grids, considering completion time, budget
workflow management especially in the grid and cloud and Relative Cost (RC) together as QoS constraints.
workflow systems. It is a process that maps and manages the Yu and Buyya (2005a) proposed a cost-based workflow
execution of inter-dependent tasks on the distributed resources scheduling algorithm to schedule scientific workflow
(Taylor et al., 2007). It allocates suitable resources to workflow applications on utility grid that minimises execution cost
tasks such that the execution can be completed to satisfy while meeting the deadline for delivering results. It also
objective functions imposed by users. Proper scheduling can handled the delays of service executions by rescheduling
have significant impact on the performance of the system unexecuted tasks. A budget constraint, priority rule-based
(Deelman et al., 2008). However, in general, the problem of iterative heuristic, Maximum Profit with Serial Complexity
mapping tasks on distributed services belongs to a class of (MPSC) for grid workflows, has also been proposed to
problems known as NP-complete problems (Yu and Buyya, improve iteratively the initial feasible solution within a few
2005a). For such problems, no known algorithms are able iterations and less computation time (Yuan et al., 2009).
to generate the optimal solution within polynomial time Many researchers used Genetic Algorithm (GA) for task
(Wieczorek et al., 2007). assignment. Yu and Buyya (2006a, 2006b) proposed a
There are two major types of workflow scheduling, best- modified GA as the crossover and mutation operations of
effort based and QoS constraint based scheduling, primarily SGAs that focused on homogenous and non reservation-
for grid workflow management systems (Yu and Buyya, enabled multiprocessor systems are unable to be applied on
2008). These scheduling algorithms attempt to minimise the the grid directly. The fitness function of GA is developed to
execution time, ignoring other factors such as the monetary encourage the formation of the solutions to achieve the
cost of accessing resources. But, as cloud computing budget and deadline constraint time minimisation of
adopts ‘market-oriented business model’, so there is another workflow execution in grids while meeting a specified budget
important parameter other than the execution time, i.e. cost of for delivering results (Florin et al., 2009). Similarly, Xue and
accessing resources (Liu, 2009; Pandey 2010). Usually, faster Wenhua (2010) had proposed an improved genetic algorithm
resources are more expensive than the slower ones. Therefore, for grid workflows in which, chromosomes of poor fitness
scheduling algorithms applicable in grid workflow scheduling make secondary preferential hybridisation and mutation with
cannot be directly applied over cloud workflow applications the overall best individual to increase the convergence rate of
(Fida 2008; Pandey et al., 2011). population. Fatima and Mona (2010) developed Critical Path
In this paper, we proposed deadline constraint Genetic Algorithm (CPGA), based on rescheduling Critical
Heuristic-based Genetic Algorithms (HGAs) to schedule Path Nodes (CPNs) in the chromosome through different
applications to cloud resources that minimise the execution generations and Task Duplication Genetic Algorithm
cost while meeting the deadline for delivering the result. (TDGA), based on task duplication techniques to overcome
The remaining paper is organised as follows: Section 2 the communication overhead and improved the scheduling
gives the related work in the area of workflow scheduling. performance. Andrew et al. (2010) proposed a multi-heuristic
Section 3 presents the workflow scheduling model. The evolutionary task algorithm to dynamically map tasks to
Standard Genetic Algorithm (SGA) and proposed HGAs processor in a heterogeneous distributed system utilising GA,
are discussed in Sections 4 and 5, respectively. The combined with eight heuristics in an effort to minimise the
proposed HGAs are evaluated and compared with SGA in total execution time.
Section 6 and Section 7 concludes the paper. A Hierarchic Genetic Scheduler (Joanna and Samee, 2012)
is developed for improving the effectiveness of the single
population genetic based scheduler in the dynamic grid
2 Related work environment for scheduling independent jobs. The authors
considered the bi-objective independent batch job
Workflow scheduling is classical NP-complete problem (Yu scheduling problem with makespan and flowtime minimised
and Buyya, 2005a). The major grid workflow scheduling in hierarchical mode. Furthermore, a two phase algorithm,
algorithms have been classified into two basic categories called H2GS (Mohammad and Nawwaf, 2011) has been
which are best-effort based scheduling and QoS constraint proposed for task scheduling in heterogeneous processor
based scheduling (Yu and Buyya, 2005b). In traditional networks. The first phase implements a heuristic list based
community based computing paradigms, best-effort based algorithm, called LDCP to generate a high quality schedule. In
scheduling strategies based are often applied to only the second phase, this schedule is injected into the initial
minimise the execution time without considering the population of GA, which proceeds to evolve shorter schedules.
monetary cost since resources are shared freely among Amir and Mohammad (2008) embedded the elitism method
98 A. Verma and S. Kaushal
into GA to generate the shorter schedules as well as to decrease 3.1 Basic definitions
the computation time to find the sub-optimal schedule as
Bottom level (b-level): The b-level of a task is the length of
compared to basic GA. The authors further extend their work
the longest path from the task to a leaf task (Amir and
by improving sub-optimal results using Simulated Annealing
Mohammad, 2009). The b-level of node is calculated as:
(SA) (Amir and Mohammad, 2009) by considering the
multiprocessor systems.
Only few researches addressed the workflow scheduling
blevel ti wi max {dij blevel t j }
t j succ ti
(1)
Table 1 shows the expected completion time of various Figure 3 Schedule according to t-level
tasks on three different machines. b-level and t-level of all
tasks is calculated using equations (1) and (2), respectively. T4 T8
M1: T1 T7
Then the tasks are sorted in descending order of b-level and
in ascending order of t-level to decide the order of execution
of all the tasks (Table 2).
Tasks/Machines M1 M2 M3
T1 3 5 1
T2 2 3 1
T3 3 5 1 M3: T3 T6 T9
T4 2 3 1
T5 2 3 1
T6 2 3 1 4 Standard genetic algorithm (SGA)
T7 2 3 1
T8 4 6 2 Genetic algorithms provide robust search techniques that allow
T9 3 5 1 a high-quality solution to be derived from a large search space
T10 2 3 1 in polynomial time, by applying the principle of evolution.
T11 5 7 3 GA is a specific class of evolutionary algorithms inspired
by evolutionary biology. A genetic algorithm combines the
exploitation of best solutions from past searches with the
Table 2 b-level and t-level of DAG
exploration of new regions of the solution space. Any solution
Order of Order of in the search space of the problem is represented by an
Parameters execution execution individual or chromosome (Yu and Buyya, 2006a). A genetic
AvgECT b-level t-level
tasks according according algorithm maintains a population of individuals that evolves
to b-level to t-level over generations towards the better solutions through a
T1 3 16 0 2 1 repetitive application of genetic operators such as crossover,
T2 2 17 0 1 2 mutation and selection (Yu and Buyya, 2006b). The quality of
T3 3 14 0 3 3 an individual in the population is determined by a fitness-
function. The fitness value indicates how good the individual is
T4 2 13 0 4 4
compared to others in the population (Kumar and Verma,
T5 2 11 5 5 5 2012).
T6 2 8 6 7 6
T7 2 10 7 6 7
4.1 Representation of individual in the population
T8 4 4 12 9 10
T9 3 3 11 10 9 A two-dimensional coding scheme has been employed by
many researchers (Yu and Buyya, 2006a; Yu and Buyya,
T10 2 2 10 11 8
2006b) for scheduling tasks in distributed systems. As
T11 5 5 12 8 11
illustrated in Figures 2 and 3, each schedule is simplified by
The tasks are sent to different machines according to their representing it as a 2D string. One dimension represents the
order of execution for completion of workflow application. numbers of machines while the other dimension shows the
Figures 2 and 3 show the schedules generated according to order of tasks on each machine.
b-level and t-level of DAG, respectively.
4.2 Fitness function
Figure 2 Schedule according to b-level
A fitness function is used to measure the quality of the
individuals in the population. The fitness function should
T8
M1: T1 T4 T7 encourage the formation of the solution to achieve the
objective function. For deadline constrained scheduling
problem, GA defines a fitness function as:
t I
M2: T2 T5 T10 T11 F I (3)
D
where t(I) is the completion time of an individual I and D is
the user specified deadline for scheduling the workflow
M3: T3 T6 T9 application. An individual is fit if the value of F(I) < 1,
otherwise the individual is not included into the population.
100 A. Verma and S. Kaushal
4.3 Crossover operation selects a resource and swaps the positions of two randomly
selected tasks on the resource. Replacing mutation re-
The idea behind the crossover is that it may result in an
allocates an alternative resource to a task in an individual. It
even better individual by combining two fittest individuals. randomly selects a task and replaces its current resource
Crossovers are used to create new individuals in the current assignment with a resource randomly selected in the
population by combining and rearranging parts of the existing resources which are able to execute the task (Edwin et al.,
individuals, selected with a crossover probability Cr as shown 1994). Mutation operation is applied using mutation
in Figure 4. A crossover point is selected according to height of probability Mr. A random number between 0 and 1 is
tasks (Edwin et al., 1994) that will cut the ordered list of tasks generated. If that number is less than Mr, then swap
for a particular machine into two halves. mutation is applied otherwise replace mutation is applied.
Crossover
M2: T1 T5 T10 T11 points
Crossover
M2: T2 T5 T8 T11 points
M3: T7 T9
T3
M2:
T1 T5 T8 T11
M3:
T3
T7 T9
M2:
T2 T5 T11
T10
M3:
T3 T7 T8
8 Validate each child according to the fitness function. Apply the crossover operator on the selected parent using
7
crossover probability Cr to create the children.
9 Add the valid child to create the new population
Apply the mutation operator with probability Mr on the
10 end while. 8
newly created children.
11 END 9 Validate each child according to the fitness function.
10 Add the valid child to create the new population
To increase the population diversity of SGA, we proposed
deadline constraint heuristic-based genetic algorithms. 11 end while.
12 END
6
Apply the selection operator to select the parent from the found in Bharathi et al. (2008). Figure 5 shows the approximate
population structure of a small instance of each workflow. These
Apply the crossover operator on the selected parent workflows have different structural properties in terms of their
7
using crossover probability Cr to create the children. basic components, i.e. pipeline, data aggregation, data
Apply the mutation operator with probability Mr on the distribution and their composition. For each workflow, tasks
8
newly created children. with same colour are of similar type. The Directed Acyclic
9 Validate each child according to the fitness function. Graph in XML (DAX) format for all these workflows
10 Add the valid child to create the new population are available at website (http://confluenece.peagasus.isi.edu/
11 end while. display.peagasus/WorkflowGenerator), from which we have
12 END chosen three sizes for our experiment, i.e. Small (about 50),
Medium (about 100 tasks) and large (about 1000 tasks).
The other important parameter of the experiments is the time of executing workflow. So, the deadline for whole workflow is
interval. Most of the current commercial clouds charge users defined as:
based on a long time interval equal to one hour like Amazon Deadline= α * Mf
(www.amaazon.com). Here a user has to pay for the whole
last time interval even if the user has used a fraction of it. where α is a deadline factor in range from 1.5 to 5 (Saeid
Therefore, the users prefer shorter time intervals in order to et al., 2013). For GA, the following parameters are set:
pay close to what they have really used like CloudSigma
Parameter Value
(www.cloudsigma.com). To evaluate the impact of short and
Initial population 10
long time interval on our algorithms, we consider two
different time intervals: a long one equal to one hour and a Crossover probability (Cr) 0.7
short one equal to five minutes (Saeid et al., 2013). Mutation probability (Mr) 0.1
Maximum generation 50
Figure 6 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to 5 min (see online version
for colours)
Figure 6 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to 5 min (see online version
for colours) (continued)
(e) SIPHT
Source: http://confluenece.peagasus.isi.edu/display.peagasus/ WorkflowGenerator
Figure 7 shows the average NSC of scheduling large three algorithms in all cases. As Montage and CyberShake
workflows with BGA, TGA, BTGA and SGA with a time consist of larger number of smaller jobs, their NSC value is
interval of one hour. The overall results are almost the same very high for a time interval of one hour as compared to
as Figure 6, except the value of NSC is increased as other three workflows. For scheduling small and medium
expected. It shows that BTGA outperforms than the other size workflow, we are getting the similar graphs.
Figure 7 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to one hour
Figure 7 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to one hour (continued)
(e) SIPHT
7 Conclusion and future work Deelman, E., Gannon, D, Shields, M. and Taylor, I. (2008)
‘Workflows and e-science: an overview of workflow system
features and capabilities’, Journal of Future Generation
In this paper, we have presented deadline constrained Computer Systems, Vol. 25, No. 5, pp.528–540.
heuristic-based genetic algorithms (HGAs) to schedule
Edwin, S.H, Nirwan, A. and Hong, R. (1994) ‘A genetic algorithm
applications to cloud resources that minimise the execution for multiprocessor scheduling’, IEEE Transactions on
cost while meeting the deadline for delivering the result. Parallel and Distributed Systems, Vol. 5, No. 2, pp.113–120.
Each workflow’s task is assigned priority using bottom level Fatima, A.O. and Mona, M.A. (2010) ‘Genetic algorithms for task
and top level. These priorities are then used to create the scheduling problem’, Journal of Parallel and Distributed
initial population of BGA, TGA and BTGA. The proposed Computing, Vol. 70, No. 1, pp.13–22.
algorithms are evaluated with synthetic workflows that are Fida, A. (2008) Workflow Scheduling For Service Oriented Cloud
Computing, Unpublished PhD Thesis, University of
based on realistic workflows with different structures and
Saskatchewan, Saskatoon, Canada.
different sizes. These algorithms also consider the pay-as- Florin, P., Ciprian, D. and Valentin, C. (2009) ‘Genetic algorithm for
you-use pricing model of current commercial clouds. The DAG scheduling in grid environments’, IEEE ICCP 2009:
comparison of proposed algorithms is done with SGA under Proceeding of International Conference on Intelligent Computer
same deadline constraint and pricing model. The simulation Communications and Processing, 27–29 August, Cluj-Napoca,
results show that our proposed algorithms have a promising Romania, pp.299–305.
performance as compared to SGA. From the three proposed Foster, I., Zhao, Y., Raicu, L. and Lu, S. (2008) ‘Cloud computing
and grid computing 360-degree compared’, Proceeding of
algorithms, BTGA is outperformed in all cases. In future we Grid Computing Environment Workshop, 12–16 November,
intend to improve our work for the real cloud environment Austin, TX, USA, pp.1–10.
including other QoS constraints and comparison can be Gabriel, M., Wolfgang, G. and Calvin, J.R. (2011) ‘Hybrid
made with other meta-heuristic techniques like PSO and computing—where hpc meets grid and cloud computing’,
ACO, etc. Journal of Future Generation Computer Systems, Vol. 27,
No. 5, pp.440–453.
Geelan, J. (2008) ‘Twenty one experts define cloud computing:
virtualization’, Electronic Magazine. Available online at:
References http://virtualization.sys-con.com/node/612375
Amir, M.R. and Mohammad, A.V. (2008) ‘A novel task Joanna, K. and Samee, U.K. (2012) ‘Multi-level hierarchic
scheduling in multiprocessor systems with genetic algorithm genetic-based scheduling of independent jobs in dynamic
by using elitism stepping method’, INFOCOMP – Journal of heterogeneous grid environment’, Journal of Information
Computer Science, Vol. 7, No. 2, pp.58–64. Sciences, Vol. 214, No. 12, pp.1–19.
Amir, M.R. and Mohammad, A.V. (2009) ‘A novel genetic algorithm Ke, L., Hai, J., Jinjun, C., Xiao, L., Dong, Y. and Yun, Y. (2010) ‘A
for static task scheduling in distributed systems’, Journal of compromised-time-cost scheduling algorithm in SwinDeW-C for
Computer Theory and Engineering, Vol. 1, No. 1, pp.1–6. instance-intensive cost-constrained workflows on cloud
computing platform’, International Journal of High Performance
Andrew, J.P., Thomas, M.K. and Thomas, J.N. (2010) ‘Multi- Computing Applications, Vol. 24, No. 4, pp.445–456.
heuristic dynamic task allocation using genetic algorithms in
Kumar, P. and Verma, A. (2012) ‘ Scheduling using improved
a heterogeneous distributed system’, Journal of Parallel and
genetic algorithm in cloud computing for independent tasks’,
Distributed Computing, Vol. 70, No. 7, pp.758–766.
Proceeding of International Conference on Advances in
Bharathi, S., Lanitchi, A., Deelman, E., Mehta, G., Su, M.H. and Computing, Communications and Informatics, 3–5 August,
Vahi, K. (2008) ‘Characterization of scientific workflows’, Chennai, India, pp.137–142.
3rd Workshop on Workflows in Support of Large Scale Liu, K. (2009) Scheduling Algorithms for Instance Intensive Cloud
Science, 17 November, Austin, TX, USA, pp.1–10. Workflows, Unpublished PhD Thesis, Swinburne University
Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J. and Brandic, I. of Technology, Melbourne, Australia.
(2009) ‘Cloud computing and emerging it platforms: vision, Mohammad, I.D. and Nawwaf, K. (2011) ‘A hybrid heuristic-
hype, and reality for delivering computing as the 5th utility’, genetic algorithm for task scheduling in heterogeneous
Journal of Future Generation Computer Systems, Vol. 25, processor networks’, Journal of Parallel and Distributed
No. 6, pp.599–616. Computing, Vol. 71, No. 11, pp.1518–1531.
106 A. Verma and S. Kaushal
Pandey, S. (2010) Scheduling And Management Of Data Intensive Yu, J. and Buyya, R. (2005a) ‘Cost based scheduling of scientific
Application Workflows In Grid And Cloud Computing workflow application on utility grid’, Proceeding of 1st IEEE
Environment, Unpublished PhD Thesis, University of International Conference on e-Science and Grid Computing,
Melbourne, Melbourne, Australia. 1 July, Melbourne, Australia, pp.8–147.
Pandey, S., Karunamoorthy, D. and Buyya, R. (2011) ‘Workflow Yu, J. and Buyya, R. (2005b) ‘A taxonomy of workflow
engine for clouds’, Cloud Computing: Principles and management systems for grid computing’, Journal of Grid
Paradigms, Chapter 12, Weliy STM. Computing, Vol. 3, Nos. 1/2, pp.171–200.
Saeid, A., Mahmoud, N. and Dick, H.J.E. (2013) ‘Deadline- Yu, J. and Buyya, R. (2006a) ‘A budget constraint scheduling of
constrained workflow scheduling algorithms for infrastructure workflow application on utility grid using genetic algorithm’,
as a service clouds’, Journal of Future Generation Computer HPDC 2006: Proceeding of 15th IEEE International Symposium
Systems, Vol. 29, No. 1, pp.158–169. on High Performance Distributed Computing, 19–23 June, Paris,
Taylor, I., Deelman, E., Gannon, D. and Shields, M. (2007) Workflows pp.1–10.
for E-Science: Scientific Workflows for Grid, 1st ed., Springer. Yu, J. and Buyya, R. (2006b) ‘Scheduling scientific workflow
Verma, A. and Kaushal, S. (2011) ‘Cloud computing security issues applications with deadline and budget constraints using genetic
and challenges: a survey’, Proceeding of 1st International algorithms’, Scientific Programming Journal, Vol. 14, No. 3,
Conference on Advances in Computing and Communications, pp.217–230.
Part-IV, 22–24 July, Kochi, India, pp.445–454.
Yu, J. and Buyya, R. (2008) ‘Workflow scheduling algorithms for grid
Verma, A. and Kaushal, S. (2012) ‘Deadline and budget distribution computing’, in Xhafa, F. and Abraham, A. (Eds): Metaheuristics
based cost-time optimization workflow scheduling algorithm for for Scheduling in Distributed Computing Environment, Springer,
cloud’, IJCA Proceeding of International Conference on Recent Berlin.
Advances and Future Trends in IT, iRAFIT(7), 13–17 April,
Yuan, Y-C., Wang, K-J., Sun, X-S. and Guo, T. (2009) ‘An
Patiala, India, pp.1–4.
iterative heuristic for scheduling grid workflows with budget
Wieczorek, M., Prodan, R. and Hoheisel, A. (2007) Taxonomies of constraints’, Proceeding of 8th International Conference on
the Multi-Criteria Grid Workflow Scheduling Problem, Machine Learning and Cybernetics, 12–15 July, Baoding,
CoreGRID Technical Report Number TR-0106, 30 August. China, pp.1700–1705.
Xue, Z. and Wenhua, Z. (2010) ‘Grid workflow scheduling based
Zhangjun, W., Xiao, L., Zhiwei, N., Dong, Y. and Yun, Y. (2013) ‘A
on improved genetic algorithm’, Proceeding of International
market-oriented hierarchical scheduling strategy in cloud
Conference on Computer Design and Applications, 25–27
workflow systems’, Journal of Supercomputing, Vol. 63, No. 1,
June, Qinhuangdao, China, Vol. 5, pp.270–273.
pp.256–293.
Yang, Y., Liu, K., Chen, J., Liu, X., Yuan, D. and Jin, H. (2008)
‘An algorithm in swindew-c for scheduling transaction-
intensive cost-constrained cloud workflows’, e-Science08:
Proceeding of 4th IEEE International Conference on e- Websites
Science, 7–12 December, Indianapolis, IN, USA, pp.374–375.
http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGe
Yong, W., Bahati, R.M. and Bauer, M. (2011) ‘A novel deadline nerator (accessed 20 July 2012).
and budget constrained scheduling heuristics for computational
grids’, Journal of Central. South University of Technology, www.amazon.com (accessed on 10 August 2012).
Vol. 18, No. 2, pp.465−472. www.cloudsigma.com (accessed on 17 August 2012).