Ga Based Workflow Schedule

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

96 Int. J. Grid and Utility Computing, Vol. 5, No.

2, 2014

Deadline constraint heuristic-based genetic


algorithm for workflow scheduling in cloud

Amandeep Verma*
Department of Information Technology,
University Institute of Engineering and Technology,
Panjab University,
Chandigarh 160014, India
Email: amandeepverma@pu.ac.in
*Corresponding author

Sakshi Kaushal
Department of Computer Science and Engineering,
University Institute of Engineering and Technology,
Panjab University,
Chandigarh 160014, India
Email: sakshi@pu.ac.in

Abstract: Task scheduling and resource allocation are the key challenges of cloud computing.
Compared with grid environment, data transfer is a big overhead for cloud workflows. So, the
cost arising from data transfers between resources as well as execution costs must also be taken
into account during scheduling based upon user’s Quality of Service (QoS) constraints. In this
paper, we present Deadline Constrained Heuristic based Genetic Algorithms (HGAs) to schedule
applications to cloud resources that minimise the execution cost while meeting the deadline
for delivering the result. Each workflow’s task is assigned priority using bottom-level (b-level)
and top-level (t-level). To increase the population diversity, these priorities are then used to
create the initial population of HGAs. The proposed algorithms are simulated and evaluated with
synthetic workflows based on realistic workflows. The simulation results show that our proposed
algorithms have a promising performance as compared to Standard Genetic Algorithm (SGA).

Keywords: cloud computing; workflow; scheduling; DAG; genetic algorithm; grid computing.

Reference to this paper should be made as follows: Verma, A. and Kaushal, S. (2014) ‘Deadline
constraint heuristic-based genetic algorithm for workflow scheduling in cloud’, Int. J. Grid and
Utility Computing, Vol. 5, No. 2, pp.96–106.

Biographical notes: Amandeep Verma is an Assistant Professor (IT) in UIET, Panjab


University, Chandigarh, India. She is currently pursuing her PhD in workflow scheduling in
cloud computing. She received her BTech from Punjab Technical University, Jalandhar (Punjab),
India in 2002 and MTech degree in Computer Science Engineering from Punjabi University,
Patiala (Punjab), India in 2004. Her area of interest includes parallel and distributed computing,
computer networks and cloud computing.

Sakshi Kaushal is currently an Associate Professor at Department of Computer Science and


Engineering in UIET, Panjab University, Chandigarh, India. She received her PhD in Computer
Science and Engineering from Thapar University, Patiala, India in 2009. Her research interests
include computer networks and cloud computing. She has published various papers in the field of
mobility in wireless networks, analysis and implementation of mechanisms to optimise network
performance in high speed networks and cloud environment.

1 Introduction Internet (Foster et al., 2008). It attracts increasing interests


of researchers in the area of Distributed and Parallel
Cloud computing is emerging as the latest distributed Computing and Service Oriented Computing (Buyya et al.,
computing paradigm that provides a dynamically scalable 2009). Cloud computing delivers hardware infrastructure
service delivery and consumption platform facilitated and software applications as services (Verma and Kaushal,
through virtualisation of hardware and software with the 2011). It adopts a market-oriented business model where
provision of consuming various services on demand over users are charged for consuming cloud services such as

Copyright © 2014 Inderscience Enterprises Ltd.


Deadline constraint heuristic-based genetic algorithm 97

computing, storage and network services like conventional system users. Many heuristic algorithms such as Minimum
utilities in everyday life (e.g., water, electricity, gas and Execution Time, Minimum Completion Time, Min-min and
telephony) (Gabriel et al., 2011), on a pay-as-you-go basis. Max-min are used as candidates for best-effort based
Meanwhile, cloud service providers are obligated to scheduling strategies (Yu and Buyya, 2008). Yong et al.
provider satisfactory Quality of Service (QoS) based on (2011) has proposed a Deadline and Budget Constrained
business service contracts (Geelan, 2008). (DBC) scheduling heuristics to deal with sequential workflow
Workflow scheduling is one of the key issues in the applications in grids, considering completion time, budget
workflow management especially in the grid and cloud and Relative Cost (RC) together as QoS constraints.
workflow systems. It is a process that maps and manages the Yu and Buyya (2005a) proposed a cost-based workflow
execution of inter-dependent tasks on the distributed resources scheduling algorithm to schedule scientific workflow
(Taylor et al., 2007). It allocates suitable resources to workflow applications on utility grid that minimises execution cost
tasks such that the execution can be completed to satisfy while meeting the deadline for delivering results. It also
objective functions imposed by users. Proper scheduling can handled the delays of service executions by rescheduling
have significant impact on the performance of the system unexecuted tasks. A budget constraint, priority rule-based
(Deelman et al., 2008). However, in general, the problem of iterative heuristic, Maximum Profit with Serial Complexity
mapping tasks on distributed services belongs to a class of (MPSC) for grid workflows, has also been proposed to
problems known as NP-complete problems (Yu and Buyya, improve iteratively the initial feasible solution within a few
2005a). For such problems, no known algorithms are able iterations and less computation time (Yuan et al., 2009).
to generate the optimal solution within polynomial time Many researchers used Genetic Algorithm (GA) for task
(Wieczorek et al., 2007). assignment. Yu and Buyya (2006a, 2006b) proposed a
There are two major types of workflow scheduling, best- modified GA as the crossover and mutation operations of
effort based and QoS constraint based scheduling, primarily SGAs that focused on homogenous and non reservation-
for grid workflow management systems (Yu and Buyya, enabled multiprocessor systems are unable to be applied on
2008). These scheduling algorithms attempt to minimise the the grid directly. The fitness function of GA is developed to
execution time, ignoring other factors such as the monetary encourage the formation of the solutions to achieve the
cost of accessing resources. But, as cloud computing budget and deadline constraint time minimisation of
adopts ‘market-oriented business model’, so there is another workflow execution in grids while meeting a specified budget
important parameter other than the execution time, i.e. cost of for delivering results (Florin et al., 2009). Similarly, Xue and
accessing resources (Liu, 2009; Pandey 2010). Usually, faster Wenhua (2010) had proposed an improved genetic algorithm
resources are more expensive than the slower ones. Therefore, for grid workflows in which, chromosomes of poor fitness
scheduling algorithms applicable in grid workflow scheduling make secondary preferential hybridisation and mutation with
cannot be directly applied over cloud workflow applications the overall best individual to increase the convergence rate of
(Fida 2008; Pandey et al., 2011). population. Fatima and Mona (2010) developed Critical Path
In this paper, we proposed deadline constraint Genetic Algorithm (CPGA), based on rescheduling Critical
Heuristic-based Genetic Algorithms (HGAs) to schedule Path Nodes (CPNs) in the chromosome through different
applications to cloud resources that minimise the execution generations and Task Duplication Genetic Algorithm
cost while meeting the deadline for delivering the result. (TDGA), based on task duplication techniques to overcome
The remaining paper is organised as follows: Section 2 the communication overhead and improved the scheduling
gives the related work in the area of workflow scheduling. performance. Andrew et al. (2010) proposed a multi-heuristic
Section 3 presents the workflow scheduling model. The evolutionary task algorithm to dynamically map tasks to
Standard Genetic Algorithm (SGA) and proposed HGAs processor in a heterogeneous distributed system utilising GA,
are discussed in Sections 4 and 5, respectively. The combined with eight heuristics in an effort to minimise the
proposed HGAs are evaluated and compared with SGA in total execution time.
Section 6 and Section 7 concludes the paper. A Hierarchic Genetic Scheduler (Joanna and Samee, 2012)
is developed for improving the effectiveness of the single
population genetic based scheduler in the dynamic grid
2 Related work environment for scheduling independent jobs. The authors
considered the bi-objective independent batch job
Workflow scheduling is classical NP-complete problem (Yu scheduling problem with makespan and flowtime minimised
and Buyya, 2005a). The major grid workflow scheduling in hierarchical mode. Furthermore, a two phase algorithm,
algorithms have been classified into two basic categories called H2GS (Mohammad and Nawwaf, 2011) has been
which are best-effort based scheduling and QoS constraint proposed for task scheduling in heterogeneous processor
based scheduling (Yu and Buyya, 2005b). In traditional networks. The first phase implements a heuristic list based
community based computing paradigms, best-effort based algorithm, called LDCP to generate a high quality schedule. In
scheduling strategies based are often applied to only the second phase, this schedule is injected into the initial
minimise the execution time without considering the population of GA, which proceeds to evolve shorter schedules.
monetary cost since resources are shared freely among Amir and Mohammad (2008) embedded the elitism method
98 A. Verma and S. Kaushal

into GA to generate the shorter schedules as well as to decrease 3.1 Basic definitions
the computation time to find the sub-optimal schedule as
Bottom level (b-level): The b-level of a task is the length of
compared to basic GA. The authors further extend their work
the longest path from the task to a leaf task (Amir and
by improving sub-optimal results using Simulated Annealing
Mohammad, 2009). The b-level of node is calculated as:
(SA) (Amir and Mohammad, 2009) by considering the
multiprocessor systems.
Only few researches addressed the workflow scheduling
blevel  ti   wi  max {dij  blevel t j }
t j  succ  ti 
  (1)

on the clouds. For cloud workflow systems, mainly QoS-


constraint based scheduling strategies based on market- where wi is the average execution time of the task on the
oriented business model are required. A compromised-time- different computing machines. succ(ti) includes all the
cost scheduling algorithm has been proposed to accommodate children tasks of ti. dij is the data transmission time from a
transaction-intensive (Yang et al., 2008) and instance- task ti–tj. If a task has no children, its b-level is equal to the
intensive (Ke et al., 2010) cost-constrained workflows in average execution time of the task on the different
cloud respectively by compromising execution time and cost computing machines.
with user input enabled on the fly. The algorithm cut down Top-level (t-level): The t-level of a task of DAG is
the mean execution cost by over 15% whilst meeting the user- defined to be the length of the longest path from the task to
designated deadline or shortens the mean execution time by the entry task without considering the execution time of that
over 20% within the user-designated execution cost. A task (Amir and Mohammad, 2009) and is given by the
market-oriented hierarchical scheduling strategy (Zhangjun following equation:
et al., 2011) has been developed for instance intensive workflow
applications, to do the workflow scheduling at two levels in
tlevel  ti   max {dij  tlevel t j  wi }
t j  pred  ti 
  (2)
cloud environment. A package based random scheduling
algorithm has been presented as the candidate service-level where wi is the average execution time of the task on the
scheduling algorithm and three representative metaheuristic different computing machines. pred(ti) includes all the
based scheduling algorithms including GA, Ant Colony parent tasks of ti. dij is the data transmission time from a
Optimisation (ACO) and Particle Swarm Optimisation (PSO) task ti–tj. For entry task i.e. a task has no parent, its t-level is
were adapted, implemented and analysed as the candidate task- equal to zero.
level scheduling algorithms. Estimated completion time (ECT): The ECT is a n x m
However, all these works do not consider different matrix where ECTi,j shows the estimated completion time of
pricing model of cloud environment. Saeid et al. (2013) a task ti on the machine mj. The users are charged based
proposed two workflow scheduling algorithms for cloud upon the number of time intervals that they have used the
environment: one-phase algorithm, IC-PCP and two-phase particular machine. All computation and storage services of
algorithm, IC-PCPD2. Both algorithms have a polynomial service provider are assumed to be in the same physical
time complexity for scheduling large workflows under region, so the average bandwidth between the different
deadline constrained. The author considered different type available machines is roughly equal (Saeid et al., 2013).
of pricing model for simulation. So we used Genetic
Algorithm to schedule workflow applications to cloud 3.2 An illustrative example
resources by considering the different pricing model. In this
Consider a DAG with 11 tasks as shown in Figure 1. Each
paper, we focus on minimising the execution cost while
edge weight of DAG represents the data transmission time
meeting the deadline for delivering the result.
between the tasks.

Figure 1 A sample DAG


3 Workflow scheduling model
T3  T4 
T1  T2 
A workflow application is modelled by a Directed Acyclic 3 5
1  1

Graph (DAG), defined by a tuple G (T, E), where T is the
set of n tasks {t1, t2,, tn} and E is a set of e edges,
represent the dependencies. Each ti ε T, represents a task in 2
the application and each edge (titj) ε E represents a T5  T6  T7 
precedence constraint, such that the execution of tj ε T
2 1 
cannot be started before ti ε T finishes its execution (Verma 5 3

3
and Kaushal, 2012). If (ti, tj) ε T, then ti is the parent of tj 1
and tj is the child of ti. A task with no parent is known
T11 
as an entry task and a task with no children is known as T9 
T10 
T8 
exit task.
Deadline constraint heuristic-based genetic algorithm 99

Table 1 shows the expected completion time of various Figure 3 Schedule according to t-level
tasks on three different machines. b-level and t-level of all
tasks is calculated using equations (1) and (2), respectively. T4 T8
M1: T1 T7 
Then the tasks are sorted in descending order of b-level and
in ascending order of t-level to decide the order of execution
of all the tasks (Table 2).

Table 1 ECT matrix M2: T2 T5 T10  T11

Tasks/Machines M1 M2 M3
T1 3 5 1
T2 2 3 1
T3 3 5 1 M3: T3  T6  T9 
T4 2 3 1
T5 2 3 1
T6 2 3 1 4 Standard genetic algorithm (SGA)
T7 2 3 1
T8 4 6 2 Genetic algorithms provide robust search techniques that allow
T9 3 5 1 a high-quality solution to be derived from a large search space
T10 2 3 1 in polynomial time, by applying the principle of evolution.
T11 5 7 3 GA is a specific class of evolutionary algorithms inspired
by evolutionary biology. A genetic algorithm combines the
exploitation of best solutions from past searches with the
Table 2 b-level and t-level of DAG
exploration of new regions of the solution space. Any solution
Order of Order of in the search space of the problem is represented by an
Parameters execution execution individual or chromosome (Yu and Buyya, 2006a). A genetic
AvgECT b-level t-level
tasks according according algorithm maintains a population of individuals that evolves
to b-level to t-level over generations towards the better solutions through a
T1 3 16 0 2 1 repetitive application of genetic operators such as crossover,
T2 2 17 0 1 2 mutation and selection (Yu and Buyya, 2006b). The quality of
T3 3 14 0 3 3 an individual in the population is determined by a fitness-
function. The fitness value indicates how good the individual is
T4 2 13 0 4 4
compared to others in the population (Kumar and Verma,
T5 2 11 5 5 5 2012).
T6 2 8 6 7 6
T7 2 10 7 6 7
4.1 Representation of individual in the population
T8 4 4 12 9 10
T9 3 3 11 10 9 A two-dimensional coding scheme has been employed by
many researchers (Yu and Buyya, 2006a; Yu and Buyya,
T10 2 2 10 11 8
2006b) for scheduling tasks in distributed systems. As
T11 5 5 12 8 11
illustrated in Figures 2 and 3, each schedule is simplified by
The tasks are sent to different machines according to their representing it as a 2D string. One dimension represents the
order of execution for completion of workflow application. numbers of machines while the other dimension shows the
Figures 2 and 3 show the schedules generated according to order of tasks on each machine.
b-level and t-level of DAG, respectively.
4.2 Fitness function
Figure 2 Schedule according to b-level
A fitness function is used to measure the quality of the
individuals in the population. The fitness function should
T8 
M1:  T1  T4 T7  encourage the formation of the solution to achieve the
objective function. For deadline constrained scheduling
problem, GA defines a fitness function as:
t I 
M2:  T2  T5  T10  T11 F I   (3)
D
where t(I) is the completion time of an individual I and D is
the user specified deadline for scheduling the workflow
M3:  T3  T6  T9  application. An individual is fit if the value of F(I) < 1,
otherwise the individual is not included into the population.
100 A. Verma and S. Kaushal

4.3 Crossover operation selects a resource and swaps the positions of two randomly
selected tasks on the resource. Replacing mutation re-
The idea behind the crossover is that it may result in an
allocates an alternative resource to a task in an individual. It
even better individual by combining two fittest individuals. randomly selects a task and replaces its current resource
Crossovers are used to create new individuals in the current assignment with a resource randomly selected in the
population by combining and rearranging parts of the existing resources which are able to execute the task (Edwin et al.,
individuals, selected with a crossover probability Cr as shown 1994). Mutation operation is applied using mutation
in Figure 4. A crossover point is selected according to height of probability Mr. A random number between 0 and 1 is
tasks (Edwin et al., 1994) that will cut the ordered list of tasks generated. If that number is less than Mr, then swap
for a particular machine into two halves. mutation is applied otherwise replace mutation is applied.

4.4 Mutation operation 4.5 Selection operation


There are two types of mutation operations, swapping The selection operation is implemented using the roulette
mutation and replacing mutation. Swapping mutation randomly wheel (Edwin et al., 1994).

Figure 4 Applying crossover to two fittest individuals

Parent 1: M1  T2  T4  T6  T9 

Crossover 
M2:  T1 T5  T10 T11 points 

M3:  T3  T7  T8 

Parent 2: M1 T1  T4  T6  T10 


:

Crossover 
M2:  T2  T5  T8  T11  points 

M3:  T7  T9
T3

(a) Two Fittest individuals along their crossover points

Child 1:  M1:  T2  T4 T6 T10

M2: 
T1  T5 T8 T11 

M3: 
T3 
T7  T9 

Child 2:  M1:  T6  T9 


T1  T4

M2: 
T2  T5  T11 
T10 

M3: 
T3  T7  T8 

(b) Two children after applying crossover operation on (a)


Deadline constraint heuristic-based genetic algorithm 101

The pseudocode for SGA is given as: Pseudocode for TGA

Pseudocode for SGA 1 BEGIN


Calculate the t-level of all the tasks of workflow using
1 BEGIN 2
the equation (2).
Create an initial population consists of randomly
2 Create the initial population of TGA as the first
generated solutions.
individual is created by assigning the tasks according to
3 While termination criteria are not met do their t-level in ascending order to the available machines.
3
Evaluate the fitness of the individual in the population The rest of the individuals are created using random
4 assignment of the tasks to the available machines. Each
using equation (3).
individual is encoded using the 2-d encoding.
Apply the selection operator to select the parent from the
5 4 While termination criteria are not met do
population
Apply the crossover operator on the selected parent Evaluate the fitness of the individual in the population
6 5
using crossover probability Cr to create the children. using equation (3).
Apply the mutation operator with probability Mr on the Apply the selection operator to select the parent from the
7 6
newly created children. population

8 Validate each child according to the fitness function. Apply the crossover operator on the selected parent using
7
crossover probability Cr to create the children.
9 Add the valid child to create the new population
Apply the mutation operator with probability Mr on the
10 end while. 8
newly created children.
11 END 9 Validate each child according to the fitness function.
10 Add the valid child to create the new population
To increase the population diversity of SGA, we proposed
deadline constraint heuristic-based genetic algorithms. 11 end while.
12 END

5 Proposed algorithms 5.2 Top-level GA (TGA)


5.1 Bottom-level GA (BGA) In TGA, the first individual of initial population is created by
assigning the tasks according to their t-level in ascending order
A deadline constraint heuristic-based BGA is proposed in to the available machines. The rest of the individuals are
which each workflow task’s priority is assigned using
created using random assignment of the tasks to the available
bottom level. Then, these priorities are used to create the
machines. The pseudocode for TGA is presented above.
initial population of BGA. The pseudocode for the BGA is
given below:
5.3 Bottom-level and top-level GA (BTGA)
Pseudocode for BGA
BTGA creates the first individual of the initial population
1 BEGIN
by assigning the tasks according to their b-level in
Calculate the b-level of all the tasks of workflow using descending order to the available machines. For the rest of
2
the equation (1).
the individuals, firstly, the priority of each task is set equal
Create the initial population of BGA as the first to the total of its b-level and a random number which is
individual is created by assigning the tasks according to
their b-level in descending order to the available generated in the range of its (t-level/2, –t-level/2). Then all
3 machines. The rest of the individuals are created using the tasks are assigned to the available machines according to
random assignment of the tasks to the available their priority. The pseudocode for BTGA is given below:
machines. Each individual is encoded using the 2-d
encoding. Pseudocode for BTGA
4 While termination criteria are not met do 1 BEGIN
Evaluate the fitness of the individual in the population Calculate the b-level and t-level of all the tasks of
5 2
using equation (3). workflow using the equations (1) and (2).
Apply the selection operator to select the parent from the Create the initial population of BTGA as the first
6 individual is created by assigning the tasks according to
population
their b-level in descending order to the available
Apply the crossover operator on the selected parent using machines. For the rest of the individuals, firstly, the
7
crossover probability Cr to create the children. 3 priority of each task is set equal to the total of its b-level
Apply the mutation operator with probability Mr on the and a random number which is generated in the range of
8
newly created children. its (t-level/2, –t-level/2). Then all the tasks are assigned
9 Validate each child according to the fitness function. to the available machines according to their priority.
Each individual is encoded using the 2-d encoding.
10 Add the valid child to create the new population
4 While termination criteria are not met do
11 end while.
Evaluate the fitness of the individual in the population
12 END 5
using equation (3).
102 A. Verma and S. Kaushal

6
Apply the selection operator to select the parent from the found in Bharathi et al. (2008). Figure 5 shows the approximate
population structure of a small instance of each workflow. These
Apply the crossover operator on the selected parent workflows have different structural properties in terms of their
7
using crossover probability Cr to create the children. basic components, i.e. pipeline, data aggregation, data
Apply the mutation operator with probability Mr on the distribution and their composition. For each workflow, tasks
8
newly created children. with same colour are of similar type. The Directed Acyclic
9 Validate each child according to the fitness function. Graph in XML (DAX) format for all these workflows
10 Add the valid child to create the new population are available at website (http://confluenece.peagasus.isi.edu/
11 end while. display.peagasus/WorkflowGenerator), from which we have
12 END chosen three sizes for our experiment, i.e. Small (about 50),
Medium (about 100 tasks) and large (about 1000 tasks).

6 Performance evaluation 6.1 Experiment setup


For our experiment, we have simulated a cloud environment
In this section, we present our simulations of all the proposed
in java which consists of a data centre. The Virtual
algorithms. To evaluate the workflow scheduling algorithm,
Machines (VMs) are created over the physical resources in a
we used five synthetic workflows based on realistic
data centre. So, VMs in a data centre includes the resources
workflows from diverse scientific applications, which are:
with different processing speed and hence with different
 Montage: Astronomy pricing models. The processor speeds of VMs are selected
 CyberShake: Earthquake randomly in the range of 1000–5000 MIPS and price of
using these VMs is set within a range of 2–10 basic units
 Epigenomics: Biology such that fastest VM is roughly five times more expensive
 LIGO: Gravitational physics than the slowest one. The average bandwidth between these
VMs is assumed to be 20 Mbps (Saeid et al., 2013).
 SIPHT: Biology
The network transfer within a data centre is assumed to
The detailed characterisation for each workflow including their be free and only the cost for storage is considered during
structure and data and computational requirements can be experiment.

Figure 5 Structure of various workflows (see online version for colours)

(a) Montage (b) Epigenomics (c) SIPHT

(d) LIGO (e) CyberShake


Deadline constraint heuristic-based genetic algorithm 103

The other important parameter of the experiments is the time of executing workflow. So, the deadline for whole workflow is
interval. Most of the current commercial clouds charge users defined as:
based on a long time interval equal to one hour like Amazon Deadline= α * Mf
(www.amaazon.com). Here a user has to pay for the whole
last time interval even if the user has used a fraction of it. where α is a deadline factor in range from 1.5 to 5 (Saeid
Therefore, the users prefer shorter time intervals in order to et al., 2013). For GA, the following parameters are set:
pay close to what they have really used like CloudSigma
Parameter Value
(www.cloudsigma.com). To evaluate the impact of short and
Initial population 10
long time interval on our algorithms, we consider two
different time intervals: a long one equal to one hour and a Crossover probability (Cr) 0.7
short one equal to five minutes (Saeid et al., 2013). Mutation probability (Mr) 0.1
Maximum generation 50

6.2 Performance metrics


6.3 Experiment result
The performance metric chosen for the comparison is
Normalised Schedule Cost (NSC). The NSC of a schedule is As GA is a stochastic algorithm, so each algorithm is run ten
calculated as: times for each workflow and the average value of NSC is used
for comparing BGA, TGA, BTGA and SGA. Figure 6 shows
Total ScheduleCost the average NSC of scheduling large workflows with BGA,
NSC  (4)
Cc TGA, BTGA and SGA with a time interval of 5 minutes. It
shows that BTGA outperforms than the other three algorithms in
where Cc is the cost of executing the same workflow with all cases. It is clear from Figures 6 (a) and (b) that average NSC
the cheapest strategy i.e. to schedule all workflow tasks on for Montage and Cybershake is high as compared to other
the cheapest VM, according their precedence constraints. workflow structures as both of these workflow structure consist
For assigning the deadline, first we define the fastest of large number of tasks with smaller execution time on fastest
schedule as scheduling each workflow task on the fastest VM, VM at the second row, thus increasing the overall execution
according their precedence constraints, considering all data cost of the workflow. In case of SIPHT, the performance of
transmission time as zero. Thus the makespan of this fastest SGA and TGA is very similar as its structure consists of large
schedule, denoted by Mf, is a lower bound for the makespan number of top level tasks having priority zero.

Figure 6 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to 5 min (see online version
for colours)

(a) Montage (b) CyberShake

(c) Epigenomics (d) LIGO


104 A. Verma and S. Kaushal

Figure 6 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to 5 min (see online version
for colours) (continued)

(e) SIPHT
Source: http://confluenece.peagasus.isi.edu/display.peagasus/ WorkflowGenerator

Figure 7 shows the average NSC of scheduling large three algorithms in all cases. As Montage and CyberShake
workflows with BGA, TGA, BTGA and SGA with a time consist of larger number of smaller jobs, their NSC value is
interval of one hour. The overall results are almost the same very high for a time interval of one hour as compared to
as Figure 6, except the value of NSC is increased as other three workflows. For scheduling small and medium
expected. It shows that BTGA outperforms than the other size workflow, we are getting the similar graphs.

Figure 7 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to one hour

(a) Montage (b) CyberShake

(c) Epigenomics (d) LIGO


Deadline constraint heuristic-based genetic algorithm 105

Figure 7 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equals to one hour (continued)

(e) SIPHT

7 Conclusion and future work Deelman, E., Gannon, D, Shields, M. and Taylor, I. (2008)
‘Workflows and e-science: an overview of workflow system
features and capabilities’, Journal of Future Generation
In this paper, we have presented deadline constrained Computer Systems, Vol. 25, No. 5, pp.528–540.
heuristic-based genetic algorithms (HGAs) to schedule
Edwin, S.H, Nirwan, A. and Hong, R. (1994) ‘A genetic algorithm
applications to cloud resources that minimise the execution for multiprocessor scheduling’, IEEE Transactions on
cost while meeting the deadline for delivering the result. Parallel and Distributed Systems, Vol. 5, No. 2, pp.113–120.
Each workflow’s task is assigned priority using bottom level Fatima, A.O. and Mona, M.A. (2010) ‘Genetic algorithms for task
and top level. These priorities are then used to create the scheduling problem’, Journal of Parallel and Distributed
initial population of BGA, TGA and BTGA. The proposed Computing, Vol. 70, No. 1, pp.13–22.
algorithms are evaluated with synthetic workflows that are Fida, A. (2008) Workflow Scheduling For Service Oriented Cloud
Computing, Unpublished PhD Thesis, University of
based on realistic workflows with different structures and
Saskatchewan, Saskatoon, Canada.
different sizes. These algorithms also consider the pay-as- Florin, P., Ciprian, D. and Valentin, C. (2009) ‘Genetic algorithm for
you-use pricing model of current commercial clouds. The DAG scheduling in grid environments’, IEEE ICCP 2009:
comparison of proposed algorithms is done with SGA under Proceeding of International Conference on Intelligent Computer
same deadline constraint and pricing model. The simulation Communications and Processing, 27–29 August, Cluj-Napoca,
results show that our proposed algorithms have a promising Romania, pp.299–305.
performance as compared to SGA. From the three proposed Foster, I., Zhao, Y., Raicu, L. and Lu, S. (2008) ‘Cloud computing
and grid computing 360-degree compared’, Proceeding of
algorithms, BTGA is outperformed in all cases. In future we Grid Computing Environment Workshop, 12–16 November,
intend to improve our work for the real cloud environment Austin, TX, USA, pp.1–10.
including other QoS constraints and comparison can be Gabriel, M., Wolfgang, G. and Calvin, J.R. (2011) ‘Hybrid
made with other meta-heuristic techniques like PSO and computing—where hpc meets grid and cloud computing’,
ACO, etc. Journal of Future Generation Computer Systems, Vol. 27,
No. 5, pp.440–453.
Geelan, J. (2008) ‘Twenty one experts define cloud computing:
virtualization’, Electronic Magazine. Available online at:
References http://virtualization.sys-con.com/node/612375
Amir, M.R. and Mohammad, A.V. (2008) ‘A novel task Joanna, K. and Samee, U.K. (2012) ‘Multi-level hierarchic
scheduling in multiprocessor systems with genetic algorithm genetic-based scheduling of independent jobs in dynamic
by using elitism stepping method’, INFOCOMP – Journal of heterogeneous grid environment’, Journal of Information
Computer Science, Vol. 7, No. 2, pp.58–64. Sciences, Vol. 214, No. 12, pp.1–19.
Amir, M.R. and Mohammad, A.V. (2009) ‘A novel genetic algorithm Ke, L., Hai, J., Jinjun, C., Xiao, L., Dong, Y. and Yun, Y. (2010) ‘A
for static task scheduling in distributed systems’, Journal of compromised-time-cost scheduling algorithm in SwinDeW-C for
Computer Theory and Engineering, Vol. 1, No. 1, pp.1–6. instance-intensive cost-constrained workflows on cloud
computing platform’, International Journal of High Performance
Andrew, J.P., Thomas, M.K. and Thomas, J.N. (2010) ‘Multi- Computing Applications, Vol. 24, No. 4, pp.445–456.
heuristic dynamic task allocation using genetic algorithms in
Kumar, P. and Verma, A. (2012) ‘ Scheduling using improved
a heterogeneous distributed system’, Journal of Parallel and
genetic algorithm in cloud computing for independent tasks’,
Distributed Computing, Vol. 70, No. 7, pp.758–766.
Proceeding of International Conference on Advances in
Bharathi, S., Lanitchi, A., Deelman, E., Mehta, G., Su, M.H. and Computing, Communications and Informatics, 3–5 August,
Vahi, K. (2008) ‘Characterization of scientific workflows’, Chennai, India, pp.137–142.
3rd Workshop on Workflows in Support of Large Scale Liu, K. (2009) Scheduling Algorithms for Instance Intensive Cloud
Science, 17 November, Austin, TX, USA, pp.1–10. Workflows, Unpublished PhD Thesis, Swinburne University
Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J. and Brandic, I. of Technology, Melbourne, Australia.
(2009) ‘Cloud computing and emerging it platforms: vision, Mohammad, I.D. and Nawwaf, K. (2011) ‘A hybrid heuristic-
hype, and reality for delivering computing as the 5th utility’, genetic algorithm for task scheduling in heterogeneous
Journal of Future Generation Computer Systems, Vol. 25, processor networks’, Journal of Parallel and Distributed
No. 6, pp.599–616. Computing, Vol. 71, No. 11, pp.1518–1531.
106 A. Verma and S. Kaushal

Pandey, S. (2010) Scheduling And Management Of Data Intensive Yu, J. and Buyya, R. (2005a) ‘Cost based scheduling of scientific
Application Workflows In Grid And Cloud Computing workflow application on utility grid’, Proceeding of 1st IEEE
Environment, Unpublished PhD Thesis, University of International Conference on e-Science and Grid Computing,
Melbourne, Melbourne, Australia. 1 July, Melbourne, Australia, pp.8–147.
Pandey, S., Karunamoorthy, D. and Buyya, R. (2011) ‘Workflow Yu, J. and Buyya, R. (2005b) ‘A taxonomy of workflow
engine for clouds’, Cloud Computing: Principles and management systems for grid computing’, Journal of Grid
Paradigms, Chapter 12, Weliy STM. Computing, Vol. 3, Nos. 1/2, pp.171–200.
Saeid, A., Mahmoud, N. and Dick, H.J.E. (2013) ‘Deadline- Yu, J. and Buyya, R. (2006a) ‘A budget constraint scheduling of
constrained workflow scheduling algorithms for infrastructure workflow application on utility grid using genetic algorithm’,
as a service clouds’, Journal of Future Generation Computer HPDC 2006: Proceeding of 15th IEEE International Symposium
Systems, Vol. 29, No. 1, pp.158–169. on High Performance Distributed Computing, 19–23 June, Paris,
Taylor, I., Deelman, E., Gannon, D. and Shields, M. (2007) Workflows pp.1–10.
for E-Science: Scientific Workflows for Grid, 1st ed., Springer. Yu, J. and Buyya, R. (2006b) ‘Scheduling scientific workflow
Verma, A. and Kaushal, S. (2011) ‘Cloud computing security issues applications with deadline and budget constraints using genetic
and challenges: a survey’, Proceeding of 1st International algorithms’, Scientific Programming Journal, Vol. 14, No. 3,
Conference on Advances in Computing and Communications, pp.217–230.
Part-IV, 22–24 July, Kochi, India, pp.445–454.
Yu, J. and Buyya, R. (2008) ‘Workflow scheduling algorithms for grid
Verma, A. and Kaushal, S. (2012) ‘Deadline and budget distribution computing’, in Xhafa, F. and Abraham, A. (Eds): Metaheuristics
based cost-time optimization workflow scheduling algorithm for for Scheduling in Distributed Computing Environment, Springer,
cloud’, IJCA Proceeding of International Conference on Recent Berlin.
Advances and Future Trends in IT, iRAFIT(7), 13–17 April,
Yuan, Y-C., Wang, K-J., Sun, X-S. and Guo, T. (2009) ‘An
Patiala, India, pp.1–4.
iterative heuristic for scheduling grid workflows with budget
Wieczorek, M., Prodan, R. and Hoheisel, A. (2007) Taxonomies of constraints’, Proceeding of 8th International Conference on
the Multi-Criteria Grid Workflow Scheduling Problem, Machine Learning and Cybernetics, 12–15 July, Baoding,
CoreGRID Technical Report Number TR-0106, 30 August. China, pp.1700–1705.
Xue, Z. and Wenhua, Z. (2010) ‘Grid workflow scheduling based
Zhangjun, W., Xiao, L., Zhiwei, N., Dong, Y. and Yun, Y. (2013) ‘A
on improved genetic algorithm’, Proceeding of International
market-oriented hierarchical scheduling strategy in cloud
Conference on Computer Design and Applications, 25–27
workflow systems’, Journal of Supercomputing, Vol. 63, No. 1,
June, Qinhuangdao, China, Vol. 5, pp.270–273.
pp.256–293.
Yang, Y., Liu, K., Chen, J., Liu, X., Yuan, D. and Jin, H. (2008)
‘An algorithm in swindew-c for scheduling transaction-
intensive cost-constrained cloud workflows’, e-Science08:
Proceeding of 4th IEEE International Conference on e- Websites
Science, 7–12 December, Indianapolis, IN, USA, pp.374–375.
http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGe
Yong, W., Bahati, R.M. and Bauer, M. (2011) ‘A novel deadline nerator (accessed 20 July 2012).
and budget constrained scheduling heuristics for computational
grids’, Journal of Central. South University of Technology, www.amazon.com (accessed on 10 August 2012).
Vol. 18, No. 2, pp.465−472. www.cloudsigma.com (accessed on 17 August 2012).

You might also like