
Int. J. Environ. Res., 7(1):27-38, Winter 2013


ISSN: 1735-6865

Forecasting Municipal Solid Waste Generation by Hybrid Support Vector Machine and Partial Least Square Model

Abbasi, M.*, Abduli, M. A., Omidvar, B., Baghvand, A.

Faculty of Environment, University of Tehran, Tehran, Iran

*Corresponding author E-mail: mar80162015@yahoo.com

Received 12 Oct. 2011; Revised 14 Oct. 2012; Accepted 18 Oct. 2012

ABSTRACT: Forecasting municipal waste generation is a critical challenge for decision making and planning, because proper planning and operation of a solid waste management system depend strongly on the analysis of municipal solid waste (MSW) streams and on accurate predictions of the quantities of solid waste generated. Because of the dynamics and complexity of the solid waste management system, artificial intelligence models can be a useful solution to this problem. In this paper, a novel method for forecasting MSW generation is proposed: a support vector machine (SVM), as an intelligence tool, is combined with partial least squares (PLS), as a feature selection tool, for weekly prediction of the MSW generated in Tehran, Iran. Weekly MSW generated in the period 2008 to 2011 was used as input data for model learning. Moreover, the Monte Carlo method was used to analyze the uncertainty of the model results. Model performance was evaluated and compared using the statistical indices of Relative Mean Error, Root Mean Squared Error, Mean Absolute Relative Error and the coefficient of determination. Comparison of the SVM and PLS-SVM models showed that PLS-SVM is superior to the SVM model in predictive ability and in calculation-time saving. The results also demonstrate that PLS could successfully identify the complex nonlinearity and correlations among the input variables and reduce them. The uncertainty analysis also verified that the PLS-SVM model was more robust than SVM and had a lower sensitivity to changes in the input variables.

Key words: Municipal Solid Waste, Support Vector Machine, Partial Least Square, Intelligent model

INTRODUCTION
Prediction of solid waste generation is the initial and most important step in planning and operating an MSW management system (Chang and Lin, 1997; Chen and Chang, 2000; Thanh and Matsui, 2011; Arshad et al., 2011; Nouri et al., 2011; Hyun et al., 2011). Nowadays, various models have been proposed to forecast short- and long-term MSW generation, which demonstrates the difficulty of the problem (Beigl and Lebersorger, 2009; Maqbool et al., 2011; Chen et al., 2011; Safari et al., 2011). Rapid growth of waste generation, lack of information and the effect of variable and uncontrollable factors on waste generation make forecasting a complex engineering problem, especially in developing countries (Abdoli et al., 2012; Nada et al., 2012; Rashidi et al., 2012; Shafieiyoun et al., 2012; Mahmoudkhani et al., 2012). In conventional methods, waste generation is characterized by per-capita indices with respect to demographic and socioeconomic factors (Grossman et al., 1974; Mukherjee, 1997; Niessen and Alsobrook, 1972). These models can be applied to situations in which the underlying relationships have not changed significantly over time, which means they do not consider the dynamic properties of the MSW generation process; consequently, the process must be fully characterized. However, there are attempts that use current information about the input variables to forecast the future (Chang and Lin, 1997; Chang et al., 1993). Many developing countries, though, may not have a sufficient budget and management task force available to maintain a long-term and large-scale sampling and analysis program. On the other hand, classic statistical models such as the most commonly used multiple regression models cannot learn from new data, and their precision is poor when inaccurate data are used; nor can they find the information hidden in the data or act as universal approximators (Zhu and ReVelle, 1993; Svozil et al., 1997; Blasco et al., 1998). Because of the advantages of intelligence models, they have become popular and are of inherent interest in all sciences, including solid waste management (Bayar et al., 2009; Dong et al., 2003; Karaca and Ozkaya, 2006). In these models, the relation between input and output variables is first found by a learning process, and then future outputs are predicted. These data-driven models, which do not require a complete perception of the MSW generation process, have a high ability to model waste generation fluctuations.


Noori et al. used an artificial neural network (ANN) to forecast waste generation in Mashhad, Iran (Jalili and Noori, 2008). The model results showed good coincidence between the empirical data and the predictions. However, an ANN may not be able to precisely model non-stationary data if preprocessing of the input and/or output data is not performed. There are many data selection and preprocessing methods which minimize or convert inputs into useful information. Principal component analysis (PCA), the wavelet transform and the Gamma test were applied as data selection methods in waste generation prediction by Noori et al. (2009). Innate disadvantages such as over-fitting in training, local minima, difficulty in determining the network architecture and poor generalization performance remain unsolved and limit the application of the ANN approach in practice. The support vector machine, another intelligence model developed by Vapnik, can provide an effective novel approach that improves the generalization performance of neural networks and achieves global solutions simultaneously (Vapnik, 1995). Recently, the ε-insensitive type of SVM has been extended to solve non-linear regression estimation and time series prediction (Mukherjee et al., 1997; Broomhead and Lowe, 1998; Vapnik et al., 1997a, b). To provide acceptable prediction accuracy and fast calculation, data reduction and variable selection can be applied to preprocess the SVM inputs. There are many data reduction techniques (Zhang et al., 2006; Zhang, 2007; Corcoran et al., 2003; Wang et al., 2006). The method selected for the present study is partial least squares because, unlike PCA, it is a supervised dimension reduction technique. When the key area of application is multivariate regression, there may be considerable improvement if standardized linear combinations of the predictive variables are built to capture as much information as possible both in the raw predictive variables and in the relation between the predictive and target variables. PLS allows us to achieve this balance and provides an alternative to the PCA technique (Saikat and Jun, 2008).

Moreover, understanding the uncertainty in a model is important for interpreting its results. This becomes especially important if the outcomes to be compared are near one another in magnitude. The literature shows that only a few methods have been proposed for determining this uncertainty, such as the bootstrap, the sandwich estimator, maximum likelihood and Bayesian inference (Marce et al., 2004). In order to quantify the uncertainty associated with the estimation of MSW generation (MSWG), a Monte Carlo simulation was performed herein, owing to its good performance (Aqil et al., 2007a, b). Monte Carlo simulation is a flexible tool for performing uncertainty analysis of data-driven models.

The aims of the study are to develop a novel method for the prediction of solid waste generation using a hybrid PLS-SVM model and to analyze the uncertainty in the model results.

MATERIALS & METHODS
Tehran, the capital of Iran, with a population of approximately 13 million, is the most important metropolis and the largest commercial and political centre of the country. Daily waste generation in Tehran amounts to over 7500 tons. The total solid waste generated in Tehran during 2004 and 2005 was 2,614,904 and 2,626,519 tons respectively, and the total amount of MSW in these years was 2,561,069 and 2,570,988 tons respectively (Damghani et al., 2008). This amount is 2.5-3 times more than in other metropolises with the same population. Managing this large quantity of waste requires a suitable and precise model for forecasting solid waste generation. In this paper, weekly MSW generated in the period 2008 to 2011 was used to train the model.

To estimate the amount of waste generated in a city, seasonal patterns are more effective and applicable (Tchobanoglous et al., 1977). Therefore, weekly time series with 12 time lags (roughly equal to a season) were input to the model, so each predicted waste amount was based on the 12 previous weekly values, as sketched below.
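As a concrete illustration of this input layout, a minimal Python sketch is given below; the file name and variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def make_lagged_dataset(series, n_lags=12):
    """Build a supervised dataset from a weekly series: each row contains the
    n_lags previous weekly values and the target is the following week."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

# Hypothetical usage with a weekly MSWG series (tons per week); the file name
# is a placeholder, not a dataset distributed with the paper.
weekly_mswg = np.loadtxt("tehran_weekly_mswg.txt")
X, y = make_lagged_dataset(weekly_mswg, n_lags=12)
```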
Partial least squares was originally proposed by Wold (1966). It can separate the information and the noise in the predictors, or independent variables (X). PLS works by successively extracting factors from both X and the responses, or dependent variables (Y), such that the covariance between the extracted factors is maximized. The PLS technique is similar to principal component analysis (PCA): it also produces linear combinations of the original variables. However, PLS and PCA differ in the way they extract the principal directions. PCA ignores the information in Y when building the principal components, whereas PLS produces directions reflecting the relationship between Y and X, so PLS results have more practical meaning.

Assume X is a matrix with n rows and p columns and Y is a matrix with n rows and q columns. The PLS method can work with multivariate response variables (i.e., when Y is an n×q matrix with q>1); however, here it is supposed that Y is a single variable, i.e., Y is n×1 and X is n×p. To build a PLS model, X is regressed onto the X-scores (T), which are used to predict the Y-scores (U), which in turn are used to predict the responses Y. Thus X = TP^T + E and Y = UQ^T + F, where T (n×r) contains the X-scores, U (n×r) the Y-scores, P (p×r) the X-loadings, Q (1×r) the Y-loadings, E (n×p) the X-residuals and F (n×1) the Y-residuals. The decomposition is carried out so as to maximize the covariance between T and U.


The solution of this optimization problem can be found in Lorber et al. (1987) and Wold et al. (1984). In the eigenvalue decomposition process, the first extracted T and U are of the form T = Xw and U = Yc, where w and c are the eigenvectors corresponding to the first eigenvalue of X^T Y Y^T X and Y^T X X^T Y, respectively; note that X^T Y denotes the covariance of X and Y. Once the first factors have been extracted, the original values of X and Y are deflated as X1 = X − t t^T X and Y1 = Y − t t^T Y. The above process is then repeated to extract the second PLS factors, and it continues until all possible latent variables have been extracted into T and U. More details of the PLS procedure can be found in Geladi and Kowalski (1986).
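For the single-response case used here, the first weight vector is simply proportional to X^T y (since X^T Y Y^T X is then rank one), so one extraction-and-deflation round can be sketched as follows; this is only an illustration, not the authors' implementation.

```python
import numpy as np

def pls_first_factor(X, y):
    """One PLS extraction round for a single response: the weight vector w is
    the dominant eigenvector of X^T y y^T X, i.e. proportional to X^T y."""
    X = np.asarray(X, float)
    y = np.asarray(y, float).reshape(-1, 1)
    w = X.T @ y
    w /= np.linalg.norm(w)                    # first X-weight vector
    t = X @ w                                 # X-scores of the first factor
    t_unit = t / np.linalg.norm(t)
    X_deflated = X - t_unit @ (t_unit.T @ X)  # X1 = X - t t^T X (t normalized)
    y_deflated = y - t_unit @ (t_unit.T @ y)  # Y1 = Y - t t^T Y
    return w, t, X_deflated, y_deflated
```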
Kernel functions enable dot product to be performed
A brief description of the underlying principle of the support vector machine (SVM) is presented here; more details are described in the literature (Vapnik, 1995, 1997a, 1998). In a support vector machine, the input data are first mapped into a high-dimensional feature space by the use of a kernel function, and then linear regression is performed in that feature space. The non-linear feature mapping allows the treatment of non-linear problems in a linear space. After training on a data set, the SVM can be used to predict objects whose values are unknown.

A regression SVM model estimates the functional dependence of the dependent variable Y on a set of independent variables x. It assumes, like other regression problems, that the relationship between the independent and dependent variables is given by a deterministic function f(x). Consider a set of training data {(x1, y1), ..., (xl, yl)}, where each xi ∈ R^n denotes the input of the sample and has a corresponding target value yi ∈ R for i = 1, ..., l, where l corresponds to the size of the training data (Vapnik, 1995; Müller et al., 1997). The idea of the regression problem is to determine a function that can approximate future values accurately:

$$ f(x) = \big(w \cdot \Phi(x)\big) + b \qquad (1) $$

where w and b are the model parameters and Φ denotes a non-linear transformation from R^n to a high-dimensional feature space. The goal is to find the values of w and b such that values of y can be determined for given x by minimizing the regression risk:

$$ R_{reg}(f) = C \sum_{i=1}^{l} \Gamma\big(f(x_i) - y_i\big) + \frac{1}{2}\|w\|^{2} \qquad (2) $$

where Γ(·) is a cost function and C is a constant. The vector w can be written in terms of the data points as:

$$ w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*})\,\Phi(x_i) \qquad (3) $$

By substituting equation (3) into equation (1), the generic equation can be rewritten as:

$$ f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*})\big(\Phi(x_i) \cdot \Phi(x)\big) + b = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b \qquad (4) $$

In equation (4) the dot product is replaced with the function k(xi, x), known as the kernel function. Kernel functions enable the dot product to be performed in a high-dimensional feature space using low-dimensional input data, without knowing the transformation Φ. All kernel functions must satisfy Mercer's condition, i.e., they must correspond to the inner product of some feature space. The radial basis function (RBF) is commonly used as the kernel for regression:

$$ k(x_i, x) = \exp\big\{-\gamma \|x - x_i\|^{2}\big\} \qquad (5) $$

The ε-insensitive loss function is the most widely used cost function (Müller et al., 1997). The function has the form:

$$ \Gamma\big(f(x) - y\big) = \begin{cases} |f(x) - y| - \varepsilon, & \text{for } |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (6) $$

By solving the quadratic optimization problem in (7), the regression risk in equation (2) and the ε-insensitive loss function (6) are minimized:

$$ \min_{\alpha,\,\alpha^{*}} \; \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, k(x_i, x_j) \;-\; \sum_{i=1}^{l} \big[\alpha_i (y_i - \varepsilon) - \alpha_i^{*} (y_i + \varepsilon)\big] $$

subject to:

$$ \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*}) = 0, \qquad \alpha_i, \alpha_i^{*} \in [0, C] \qquad (7) $$

The Lagrange multipliers αi and αi* represent solutions to the above quadratic problem and act as forces pushing the predictions towards the target values yi.


Only the non-zero Lagrange multipliers in equation (7) are useful in forecasting the regression line, and the corresponding points are known as support vectors. For all points inside the ε-tube, the Lagrange multipliers are equal to zero and do not contribute to the regression function. Only if the requirement |f(x) − y| ≥ ε is fulfilled may the Lagrange multipliers take non-zero values and be used as support vectors.

The constant C introduced in equation (2) determines the penalty assigned to estimation errors. A large C assigns higher penalties to errors, so that the regression is trained to minimize error at the cost of lower generalization, while a small C assigns lower penalties to errors, allowing the margin to be minimized while tolerating some errors and thus giving higher generalization ability. If C goes to infinity, the SVR does not allow any error to occur and results in a complex model, whereas when C goes to zero, the result tolerates a large amount of error and the model is less complex.

Now the value of w is expressed in terms of the Lagrange multipliers. The variable b can be computed by applying the Karush-Kuhn-Tucker (KKT) conditions, which in this case imply that the product of the Lagrange multipliers and the constraints has to equal zero:

$$ \alpha_i \big(\varepsilon + \zeta_i - y_i + \langle w, x_i \rangle + b\big) = 0, \qquad \alpha_i^{*} \big(\varepsilon + \zeta_i^{*} + y_i - \langle w, x_i \rangle - b\big) = 0 \qquad (8) $$

$$ (C - \alpha_i)\,\zeta_i = 0, \qquad (C - \alpha_i^{*})\,\zeta_i^{*} = 0 \qquad (9) $$

where ζi and ζi* are slack variables used to measure errors outside the ε-tube. Since αi αi* = 0 and ζi* = 0 for αi ∈ (0, C), b can be computed as follows:

$$ b = y_i - \langle w, x_i \rangle - \varepsilon \quad \text{for } \alpha_i \in (0, C), \qquad b = y_i - \langle w, x_i \rangle + \varepsilon \quad \text{for } \alpha_i^{*} \in (0, C) \qquad (10) $$

Putting it all together, the SVM can be used without explicitly knowing the transformation Φ.
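An ε-SVR with an RBF kernel of this form is available in standard libraries. The sketch below uses scikit-learn's SVR as an illustrative stand-in (the paper does not state which software was used for the SVM), with the (C, ε, γ) values reported later for the plain SVM model; the 75% training split, the input standardization and the X, y arrays from the earlier lag-matrix sketch are assumptions added for the example.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# epsilon-SVR with an RBF kernel, using the (C, epsilon, gamma) values reported
# for the plain SVM model; the input standardization is an added assumption,
# since the paper does not describe its preprocessing.
svm_model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=150, epsilon=0.001, gamma=0.6),
)

n_train = int(0.75 * len(X))               # 75% of the samples for training
svm_model.fit(X[:n_train], y[:n_train])
y_pred = svm_model.predict(X[n_train:])    # forecasts for the remaining weeks
```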
The Monte Carlo method is one of many methods for analyzing uncertainty propagation, where the goal is to determine how random variation, lack of knowledge or error affects the sensitivity, performance or reliability of the system being modeled. The Monte Carlo method is a technique that involves repeatedly drawing a random vector of parameters from prescribed probability distributions, evaluating the function, and then computing the statistics of the evaluated function.

In this research, the uncertainty analysis was performed as follows:
Step 1: The database was randomly rearranged 1000 times without replacement, while the ratio between the training and validation sets was kept fixed. Thus, 1000 weekly MSWG series were generated.
Step 2: 1000 different results for each forecasted weekly MSWG value were obtained by the SVM and PLS-SVM models.
Step 3: The resulting statistical performances (mean, median and variance) were collected, tabulated, and their distributions were plotted. It is to be noted that only the 95% confidence intervals of the estimation are reported in this study, because confidence intervals provide more information than other statistical values about the range of prediction associated with the model. The 95% confidence intervals are determined by finding the 2.5th and 97.5th percentiles of the constructed distribution. A sketch of this procedure follows.
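Under the same assumptions as the previous sketches (X and y from the lag-matrix example, a 75% training fraction, and scikit-learn's SVR standing in for the model actually used), the resampling loop might look like this:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_runs, train_frac = 1000, 0.75
predictions = []

for _ in range(n_runs):
    # Step 1: rearrange the database without replacement, keeping the
    # train/validation ratio fixed.
    idx = rng.permutation(len(X))
    train_idx = idx[: int(train_frac * len(X))]

    # Step 2: refit the model on this realization and forecast every week.
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=150, epsilon=0.001, gamma=0.6))
    model.fit(X[train_idx], y[train_idx])
    predictions.append(model.predict(X))

# Step 3: per-week 95% interval from the 2.5th and 97.5th percentiles.
ensemble = np.vstack(predictions)                      # shape (n_runs, n_weeks)
lower, upper = np.percentile(ensemble, [2.5, 97.5], axis=0)
```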
Suppose the current time is t; y(t+l) for the future time t+l is predicted with knowledge of the values y(t−n), y(t−n+1), ..., y(t) for the past times t−n, t−n+1, ..., t, respectively. The prediction function is expressed as:

$$ y(t+l) = f\big(t, y(t), y(t-1), \ldots, y(t-n)\big) \qquad (11) $$

As discussed above, in this study the next week's waste generation is forecast from the 12 previous weekly waste generation values. The Relative Mean Error (RME), Root Mean Squared Error (RMSE), Mean Absolute Relative Error (MARE) and coefficient of determination (R²) are applied as performance indices:

$$ RME = \frac{1}{n} \sum_{i=1}^{n} \big(Y_i - Y_i^{*}\big) \qquad (12) $$

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(Y_i - Y_i^{*}\big)^{2}} \qquad (13) $$

$$ MARE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left|Y_i - Y_i^{*}\right|}{Y_i} \qquad (14) $$

$$ R^{2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - Y_i^{*})^{2}}{\sum_{i=1}^{n} (Y_i - \bar{Y})^{2}} \qquad (15) $$

where Yi is the observed value, Ȳ is the average of the observed values and Yi* is the predicted value.
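These indices are straightforward to compute; the helper below implements Eqs. (12)-(15) as written, with the RME taken literally as the signed mean error:

```python
import numpy as np

def performance_indices(y_obs, y_pred):
    """RME, RMSE, MARE and R2 of Eqs. (12)-(15); RME is the signed mean error,
    exactly as Eq. (12) is written."""
    y_obs = np.asarray(y_obs, float)
    y_pred = np.asarray(y_pred, float)
    rme = np.mean(y_obs - y_pred)
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    mare = np.mean(np.abs(y_obs - y_pred) / y_obs)
    r2 = 1.0 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return {"RME": rme, "RMSE": rmse, "MARE": mare, "R2": r2}
```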


RESULTS & DISCUSSION
A kernel function has to be selected from the qualified functions. The radial basis function was applied here because of its advantages over other kernel functions (Han and Cluckie, 2004; Dibike et al., 2001). Additionally, many works on modeling and forecasting have demonstrated the successful application of the radial basis function in support vector regression (Liong and Sivapragasam, 2002; Choy and Chan, 2003; Yu et al., 2004). The SVM parameters (the capacity C, ε and the kernel-specific parameter γ) are interdependent, and their (near) optimal values are often obtained by trial and error. Here, these parameters were optimized by a systematic grid search using leave-one-out cross-validation on the training set. In this grid search, first a broad range of parameter settings is investigated with large steps: optimized values of C and ε for a specified γ were obtained, and then γ was changed. Second, after identifying a promising region, this region is searched in more detail. The test set is used as an independent set to calculate the final prediction error. Furthermore, the test error is not used to select the optimal model; instead, its size is compared to the test-set errors obtained with other settings to identify possible overtraining. A sketch of such a search is given below.

RME, RMSE, MARE and R² were used to find the optimum. The optimal parameters (C, ε, γ) = (150, 0.001, 0.6) were obtained at RME = 1467, RMSE = 2070, MARE = 0.027 and R² = 0.761; Fig. 1 shows these optima. Seventy-five percent of the total inputs were used for training and the rest were used for testing. Observations are mapped against predictions in Fig. 2, and Fig. 3 illustrates the coefficient of determination in the training and test stages.
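The coarse stage of such a grid search can be sketched with scikit-learn as shown below; the grid values and the X_train, y_train arrays are illustrative assumptions (they are not the grids used by the authors), and a finer grid would then be placed around the best point found.

```python
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVR

# Coarse stage of the (C, epsilon, gamma) grid search with leave-one-out
# cross-validation on the training set; grid values are illustrative only.
param_grid = {
    "C": [1, 10, 100, 150, 300],
    "epsilon": [0.001, 0.01, 0.1],
    "gamma": [0.1, 0.6, 1.0, 2.0],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=LeaveOneOut(),
                      scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)   # X_train, y_train: the 75% training split (assumed)
print(search.best_params_)     # a finer grid is then placed around this point
```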

Fig. 1. Statistical indices versus γ values used to find the SVM optimum: a) RME (ton), b) RMSE (ton), c) MARE and d) coefficient of determination


Fig. 2. Forecasting results for MSWG by the SVM model: original versus predicted weekly MSWG (ton) over time (weeks)

Fig. 3. Observations versus predictions by the SVM model during a) the training stage (R² = 0.954) and b) the test stage (R² = 0.761)


As seen in Fig. 3, the SVM could forecast MSW generation with a coefficient of determination of 0.761, and the coincidence between observations and predictions is acceptable. This means that SVM has a good ability to predict MSW generation.

In the PLS method, the number of components (NCs) should be determined properly. Any method used for determining the NC should take into account not only the goodness of fit but also the complexity required to achieve that fit. In other words, when building the model components, a balance between the NC and the ability to predict data accurately should be considered. A model with an insufficient NC cannot predict the data accurately enough, while a model with too many components has more components than it needs to predict the data. The V-fold cross-validation method can find the optimal NC, and is described completely by Stone and Brooks (1990). Here, the X-scores produced by the plsregress function in the Statistics Toolbox of MATLAB (R2009) were used to predict MSWG.

To find the optimum number of components, V-fold cross-validation was applied with V set to 10. PLS then built four components, whose eigenvalues are given in Table 1. The X-scores of these components replaced the original data; it is noted that the components explain 98% of the variance.

Table 1. Eigenvalues of PLS components

Component No.    Eigenvalue    % Total variance
1                4.759         4.759
2                2.246         2.246
3                0.793         0.793
4                0.508         0.508

The PLS-SVM optimal parameters were found by a procedure similar to that used for the SVM, at (C, ε, γ) = (215, 0.125, 0.077), as seen in Fig. 4. In addition, the model results and the coefficients of determination for the training and test stages are shown in Figs. 5 and 6, respectively. A sketch of the combined PLS-SVM procedure follows.

Fig. 4. Statistical indices versus γ values used to find the PLS-SVM optimum: a) RME (ton), b) RMSE (ton), c) MARE and d) coefficient of determination

Fig. 5. Forecasting results for MSWG by the PLS-SVM model: original versus predicted weekly MSWG (ton) over time (weeks)

Fig. 6. Observations versus predictions by the PLS-SVM model during a) the training stage (R² = 0.986) and b) the test stage (R² = 0.869)


To compare the results of the SVM and PLS-SVM models, the statistical indices discussed above were used; they are given in Table 2. As seen there, PLS-SVM resulted in smaller errors and a higher coefficient of determination than SVM. Moreover, the computation for PLS-SVM took less time than for the SVM model. The PLS-SVM method produced acceptable results, but the error of both models still remains considerable. The complexity of MSW management systems and the many factors affecting MSW generation cause these errors. Nevertheless, the PLS-SVM model achieves a simpler model and faster training: the reduction of the input vector dimensions results in a reduction of the SVM size and shortens the SVM training period. Therefore, the PLS-SVM model can be a better predictive model than SVM. Alternatively, the proposed model could be implemented for annual or monthly MSW generation.
The uncertainty in the estimates of the observed and predicted weekly MSWG during the training and test stages has been quantified by estimating the confidence intervals of the simulation results. In this research, the 95 percent prediction uncertainties (95PPU) were calculated for the forecasting models. They are calculated from the 2.5th (X_L) and 97.5th (X_U) percentiles of the cumulative distribution of every simulated point. The goodness of fit is therefore assessed by uncertainty measures calculated from the percentage of measured data bracketed by the 95PPU band, and from the average distance between the upper and lower 95PPU (the degree of uncertainty), determined from Eq. (16):

$$ \bar{d}_x = \frac{1}{k} \sum_{l=1}^{k} \big(X_U - X_L\big) \qquad (16) $$

where k is the number of observed data points. The best outcome is that 100% of the measurements are bracketed by the 95PPU and that the average width is close to zero. However, because of model uncertainty, these ideal values will generally not be achieved. A reasonable relative measure is the d-factor, expressed as:

$$ d\text{-}factor = \frac{\bar{d}_x}{\sigma_x} \qquad (17) $$

where σ_x is the standard deviation of the measured variable X. A value of less than 1 is desirable for the d-factor; this calculation is sketched below.

Plots of the confidence intervals of SVM and PLS-SVM for Tehran during the training and testing stages are shown in Figs. 7a and 7b, and Table 3 summarizes the analysis of the 1000 simulations for the Tehran case study.

Table 2. Comparison of SVM and PLS-SVM models during the training and testing periods

Estimator               SVM (Train / Test)     PLS-SVM (Train / Test)
RME (ton)               677 / 1467             316 / 1139
RMSE (ton)              935 / 2070             499 / 1541
MARE                    0.012 / 0.027          0.006 / 0.021
R²                      0.954 / 0.761          0.986 / 0.869
Computation time (s)    42                     29

Table 3. Forecasting performance during the training and testing stages, based on averages obtained from 1000 simulations

Performance index                  PLS-SVM (Train / Test)    SVM (Train / Test)
Average 95PPU width (Eq. 16)       3724 / 4377               4258 / 5108
d-factor                           0.33 / 0.46               0.41 / 0.57


Fig. 7. Estimates of MSWG (ton) during the training and test stages for 1000 simulations by a) SVM and b) PLS-SVM, showing the original MSWG together with the upper and lower confidence intervals
As can be seen in Figs. 7a and 7b, both models predicted the changes of MSWG in Tehran well. For the SVM model the magnitudes of MSWG, except at the peaks, were estimated close to the observed data, whereas the PLS-SVM estimations were close to the observed data even at the peaks. A similar trend was also found in the testing stage, although the magnitude of uncertainty in the training stage was lower than in the testing stage, and the lower and upper confidence bounds were estimated closer to the observed data in the training stage.

In addition, it was found from Table 3 that PLS-SVM had lower uncertainty during the training and test stages. The higher d-factor of SVM shows that this model is more sensitive to the training data than the PLS-SVM model. Consequently, the width of the 95PPU band, i.e., the average distance of Eq. (16), is smaller for PLS-SVM than for SVM. For both models, 50% of the measurements were bracketed by the 95PPU. Because PLS-SVM produces a lower band width, the number of observed data points located on the confidence bounds does not change; however, the uncertainty of this model over the 95PPU is more reasonable than that of SVM. This shows that PLS-SVM was more robust and had a lower sensitivity to changes in the input variables than SVM.

CONCLUSION
Support vector machines and support vector regression have demonstrated their success in time-series analysis and statistical learning. However, little work has been done in waste management, including in forecasting waste generation. In this paper, the feasibility of applying support vector regression and data reduction was examined for waste generation time series prediction.


After numerous experiments, a set of SVM parameters was proposed that can predict the MSW generation time series very well. The results show that the SVM predictor significantly outperforms the baseline predictors, which evidences the applicability of support vector regression for forecasting MSW generation. Also, PLS reduced the input data and decreased the error of the final model; consequently, training finished very quickly with acceptable correlation. The combination of SVM with PLS produced a suitable model for MSW generation in large cities like Tehran.

Meanwhile, the uncertainty associated with the estimation of MSWG was estimated by Monte Carlo simulations. The simulation results using 95% confidence intervals indicated that the estimations of the SVM and PLS-SVM models were close to the observed data except at peak points. However, PLS-SVM was more robust and had a lower sensitivity to changes in the input variables than SVM.

REFERENCES
Abdoli, M. A., Karbassi, A. R., Samiee-Zafarghandi, R., Rashidi, Zh., Gitipour, S. and Pazoki, M. (2012). Electricity Generation from Leachate Treatment Plant. Int. J. Environ. Res., 6 (2), 493-498.

Arshad, A., Hashmi, H. N. and Qureashi, I. A. (2011). Anaerobic Digestion of Chlorphenolic Wastes. Int. J. Environ. Res., 5 (1), 149-158.

Bayar, S., Demir, I. and Engin, G. O. (2009). Modeling leaching behavior of solidified wastes using back-propagation neural networks. Ecotoxicology and Environmental Safety, 72 (3), 843-850.

Beigl, P. and Lebersorger, S. (2009). Forecasting Municipal Solid Waste Generation for urban and rural regions. In R. Cossu and R. Stegmann (eds.), Twelfth International Waste Management and Landfill Symposium (pp. 27-38). Sardinia, Italy, CISA Environmental Sanitary Engineering Center.

Blasco, J. A., Fueyo, N., Dopazo, C. and Ballester, J. (1998). Modeling the temporal evolution of a reduced combustion chemical system with an artificial neural network. Combustion and Flame, 113, 38-52.

Broomhead, D. and Lowe, D. (1998). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.

Chang, N. B. and Lin, Y. T. (1997). An analysis of recycling impacts on solid waste generation by time series intervention modeling. Resources, Conservation and Recycling, 19 (3), 165-186.

Chang, N. B., Pan, Y. C. and Haung, S. D. (1993). Time series forecasting of solid waste generation. J. Resour. Manage. Technol., 21, 1-10.

Chen, H. W. and Chang, N. B. (2000). Analysis of solid waste generation based on grey fuzzy dynamic modeling. Resources, Conservation and Recycling, 29, 1-18.

Chen, J., Huang, W., Han, J. and Cao, Sh. (2011). The Characterization and Application of Biological Remediation Technology for Organic Contaminants. Int. J. Environ. Res., 5 (2), 515-530.

Choy, K. Y. and Chan, C. W. (2003). Modelling of river discharges and rainfall using radial basis function networks based on support vector regression. International Journal of Systems Science, 34 (14-15), 763-773.

Corcoran, J., Wilson, I. and Ware, J. (2003). Sparse support vector regression based on orthogonal forward selection for the generalised kernel model. International Journal of Forecasting, 19, 623-634.

Damghani, A. M., Savarypour, G., Zand, E. and Deihimfard, R. (2008). Municipal solid waste management in Tehran: Current practices, opportunities and challenges. Waste Management, 28 (5), 929-934.

Dibike, Y. B., Velickov, S., Solomatine, D. and Abbott, M. B. (2001). Model induction with support vector machines: Introduction and applications. Journal of Computing in Civil Engineering, 15 (3), 208-216.

Dong, C. Q., Jin, B. S. and Li, D. J. (2003). Predicting the heating value of MSW with a feed forward neural network. Waste Management, 23 (2), 103-106.

Geladi, P. and Bruce, K. (1986). Partial least-squares regression: a tutorial. Analytica Chimica Acta, 185, 1-17.

Grossman, D., H. J. and Mark, D. H. (1974). Waste generation methods for solid waste collection. J. Environ. Eng., 6, 1219-1230.

Han, D. and Cluckie, I. (2004). Support vector machines identification for runoff modeling. In S. Y. Liong, K. K. Phoon and V. Babovic (eds.), Proceedings of the Sixth International Conference on Hydroinformatics, Singapore, 21-24 June 2004 (pp. 511-520).

Hyun, I., Borinara, P. and Hong, K. D. (2011). Geotechnical Considerations for End-Use of Old Municipal Solid Waste Landfills. Int. J. Environ. Res., 5 (3), 573-584.

Jalili, G. Z. M. and Noori, R. (2008). Prediction of municipal solid waste generation by use of artificial neural network: A case study of Mashhad. International Journal of Environmental Research, 2 (1), 13-22.

Karaca, F. and Ozkaya, B. (2006). NN-LEAP: A neural network-based model for controlling leachate flow-rate in a municipal solid waste landfill site. Environmental Modelling & Software, 21 (8), 1190-1197.

Liong, S. Y. and Sivapragasam, C. (2002). Flood stage forecasting with support vector machines. Journal of the American Water Resources Association, 38 (1), 173-186.

Lorber, A. and Kowalski, B. R. (1987). A theoretical foundation for the PLS algorithm. Journal of Chemometrics, 1, 19-31.


Mahmoudkhani, R., Hassani, A. H., Torabian, A. and Borghei, S. M. (2012). Study on High-strength Anaerobic Landfill Leachate Treatability by Membrane Bioreactor Coupled with Reverse Osmosis. Int. J. Environ. Res., 6 (1), 129-138.

Maqbool, F., Bhatti, Z. A., Malik, A. H., Pervez, A. and Mahmood, Q. (2011). Effect of Landfill Leachate on the Stream water Quality. Int. J. Environ. Res., 5 (2), 491-500.

Mukherjee, S., Osuna, E. and Girosi, F. (1997). Nonlinear prediction of chaotic time series using a support vector machine. IEEE Workshop on Neural Networks and Signal Processing, Amelia Island, FL.

Müller, K. R., Smola, A. J., Rätsch, G., Schölkopf, B., Kohlmorgen, J. and Vapnik, V. (1997). Predicting Time Series with Support Vector Machines. Proceedings of the 7th International Conference on Artificial Neural Networks (pp. 999-1004), Springer-Verlag.

Nada, W. M., Van Rensburg, L., Claassens, S., Blumenstein, O. and Friedrich, A. (2012). Evaluation of Organic Matter Stability in Wood Compost by Chemical and Thermogravimetric Analysis. Int. J. Environ. Res., 6 (2), 425-434.

Niessen, W. and Alsobrook, A. (1972). Municipal and industrial refuse: composition and rates. Proceedings of the National Waste Processing Conference, pp. 112-117.

Noori, R., Abdoli, M. A., Farokhnia, A. and Abbasi, M. (2009). Results uncertainty of solid waste generation forecasting by hybrid of wavelet transform-ANFIS and wavelet transform-neural network. Expert Systems with Applications, 36 (6), 9991-9999.

Nouri, N., Poorhashemi, S. A., Monavari, S. M., Dabiri, F. and Hassani, A. H. (2011). Legal Criteria and Executive Standards of Solid Waste Disposal Subjected to Solid Waste Management Act. Int. J. Environ. Res., 5 (4), 971-980.

Rashidi, Zh., Karbassi, A. R., Ataei, A., Ifaei, P., Samiee-Zafarghandi, R. and Mohammadizadeh, M. J. (2012). Power Plant Design Using Gas Produced by Waste Leachate Treatment Plant. Int. J. Environ. Res., 6 (4), 875-882.

Safari, E., Jalili Ghazizade, M., Shokouh, A. and Nabi Bidhendi, Gh. R. (2011). Anaerobic Removal of COD from High Strength Fresh and Partially Stabilized Leachates and Application of Multi stage Kinetic Model. Int. J. Environ. Res., 5 (2), 255-270.

Saikat, M. and Jun, Y. (2008). Principal Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression. Casualty Actuarial Society, Arlington, Virginia, 79-90.

Shafieiyoun, S., Ebadi, T. and Nikazar, M. (2012). Treatment of Landfill Leachate by Fenton Process with Nano sized Zero Valent Iron particles. Int. J. Environ. Res., 6 (1), 119-128.

Stone, M. and Brooks, R. J. (1990). Continuum regression: cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression. Journal of the Royal Statistical Society, 52, 237-269.

Svozil, D., Kvasnicka, V. and Pospichal, J. (1997). Introduction to multilayer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems, 39, 43-62.

Tchobanoglous, G., Eliassen, R. and Theisen, H. (1977). Solid Waste: Engineering Principles and Management. Tokyo, McGraw Hill.

Thanh, N. P. and Matsui, Y. (2011). Municipal Solid Waste Management in Vietnam: Status and the Strategic Actions. Int. J. Environ. Res., 5 (2), 285-296.

Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.

Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.

Vapnik, V., Golowich, S. and Smola, A. (1997a). Support vector method for function approximation, regression estimation, and signal processing. Report, MIT Press, Cambridge, MA.

Wang, X. X., Chen, S., Lowe, D. and Harris, C. J. (2006). Artificial neural networks based on principal component analysis input selection for quantification in overlapped capillary electrophoresis peaks. Chemom. Intell. Lab. Syst., 82, 165-175.

Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah (ed.), Multivariate Analysis (pp. 391-420). New York, Academic Press.

Wold, S., Ruhe, A., Wold, H. and Dunn, W. (1984). The collinearity problem in linear regression: the partial least squares (PLS) approach to generalized inverses. Journal of Statistics Computation, 5, 735-743.

Yu, X., Liong, S.-Y. and Babovic, V. (2004). EC-SVM approach for real-time hydrologic forecasting. Journal of Hydroinformatics, 6 (3), 209-223.

Zhang, Y. X., Li, H., Hou, A. X. and Havel, J. (2006). Artificial neural networks based on principal component analysis input selection for quantification in overlapped capillary electrophoresis peaks. Chemometrics and Intelligent Laboratory Systems, 82 (1-2), 165-175.

Zhang, Y. X. (2007). Artificial neural networks based on principal component analysis input selection for clinical pattern recognition analysis. Talanta, 73 (1), 68-75.

Zhu, Z. and ReVelle, C. (1993). A cost allocation method for facilities siting with fixed-charge cost functions. Civil Engineering Systems, 7, 29-35.
