Cat Boost
ISSN: 2454-132X
Impact Factor: 6.078
(Volume 9, Issue 5 - V9I5-1153)
Available online at: https://www.ijariit.com
ABSTRACT
In this paper, an ensemble learning method, the categorical boosting (CatBoost) algorithm, is adopted as an effective
predictive tool for estimating the average surface roughness and material removal rate during CNC turning of a
C45 steel workpiece with a tungsten carbide cutting tool. In order to develop the related models, a grid of
hyperparameter combinations is created and every possible combination is tested. The configurations with the optimal
values of the considered hyperparameters, i.e. those yielding the lowest training error, are then employed for
predicting the response values of the CNC turning process. The performance of the developed models is validated
with the root mean squared percentage error. It is observed that CatBoost can be applied efficiently as a
predictive tool with excellent accuracy in machining processes.
Keywords: Catboost, LSTM, Material removal rate, Root Mean Square Error, Root Mean Squared Percentage Error (RMSPE)
1. INTRODUCTION
CNC machine tools are recommended at some stage of the manufacturing process for the accurate machining of
metal components. The machining parameters of a turning operation depend on the machine tool, the workpiece material, the tool life, and the
operator's effectiveness. Selecting machining variables solely from the operator's expertise and handbooks benefits
only theoretical investigations. Good surface roughness and short machining times are the goals of the turning process, yet they are
conflicting, and how these objectives are balanced depends on the specific selection of cutting speed, feed, and depth of cut. In order
to investigate the complex interactions between the machining parameters, experiments were conducted on a CNC lathe to evaluate
performance measures such as surface roughness and machining time. Cemal Cakir et al. proposed a technique for determining the
machining conditions for turning operations with the lowest production cost as the objective [1]. Lee et al. established a
relationship between cutting speed, feed, and depth of cut and the resulting surface roughness, cutting force, and tool life using an
adaptive modeling method [2]. S. Chakraborty et al. present the key algorithmic techniques behind CatBoost, a new gradient
boosting toolkit, whose combination leads to CatBoost outperforming other publicly available boosting implementations in terms
of quality on a variety of datasets [3]. To obtain improved surface quality and surface integrity comparable to that produced by
grinding, modest feed rates, fine depths of cut, and suitable cutting tools should be employed under dry conditions [4].
Multi-response optimization is crucial in industrial applications; it is superior to single-response optimization because all
responses are affected simultaneously by all input factors. To optimize the turning process parameters on a CNC lathe with surface
roughness, cutting forces, and MRR as multi-performance characteristics, the Taguchi approach with Grey Relational Analysis (GRA)
is utilized. It has been applied successfully to create high-quality
products at minimal cost in the fields of automotive, aerospace, etc. [5]. Goel et al. established an effective way of improving the
Taguchi-with-GRA-based slab milling process of HSLA steel for multi-performance characteristics [6]. Siddiquee et al. used the Taguchi approach
to conduct tests on AISI 321 steel in order to optimize the deep drilling process parameters [7]. CatBoost employs a more
successful approach based on the ordering principle, the central idea of that work, motivated by online learning
algorithms that receive training samples sequentially over time [8, 11]. Gradient boosting is a powerful machine learning
technique that produces state-of-the-art results in a range of real-world tasks. It has long been the go-to technique for learning
problems involving heterogeneous features, noisy data, and complex dependencies, such as web search, recommendation systems,
weather forecasting, and many more [9, 10].
2. EXPERIMENTAL SETUP
A 2-axis CNC lathe with a spindle rated at 7 kW and 2800 rpm was used for the experiments. A tungsten carbide cutting tool was used to
perform the turning operation on a workpiece made of C45 steel. Spindle speeds up to 2000 rpm in increments of
200 rpm and feed rates of 0.1, 0.2, 0.3, and 0.4 mm/rev were used in the experiments, with a constant depth of cut of 1.5 mm.
Material removal rate, surface roughness, and tool life were the response variables in an experiment with a three-level, two-factor
factorial design and three center points. The process variables and the experimental setup are given in Table 1. The machining parameters
taken into account in the design of the experiment are shown in Table 2, along with the associated material removal rate, tool
life, and surface roughness obtained for those input values.
Table 2. Machining parameters with the corresponding tool life, total material removed, and surface roughness
Expt. No.  Cutting speed (m/min)  Feed rate (mm/rev)  Tool life (min)  Total material removed (cm³)  Surface roughness (µm)
1 1570 0.1 14.41 3393.555 0.0125
2 1570 0.2 14.4 6782.4 0.05
3 1570 0.3 14.46 10215.99 0.1125
4 1570 0.4 14.46 13621.32 0.2
5 2093 0.1 14.45 4537.3 0.0125
6 2093 0.2 14.44 9068.32 0.05
7 2093 0.3 14.46 13621.32 0.1125
8 2093 0.4 14.45 18149.2 0.2
9 2617 0.1 14.46 5675.55 0.0125
10 2617 0.2 14.46 11351.1 0.05
11 2617 0.2 14.55 11421.75 0.05
12 2617 0.2 14.6 11461 0.05
13 3140 0.2 14.44 13602.48 0.05
14 3140 0.2 14.33 13498.86 0.05
15 3140 0.2 14.66 13809.72 0.05
16 3140 0.2 14.56 13715.52 0.05
17 3663 0.2 14.56 16001.44 0.05
18 3663 0.2 14.63 16078.37 0.05
19 3663 0.2 14.44 15869.56 0.05
20 3663 0.2 14.56 16001.44 0.05
21 4187 0.2 15.56 19543.36 0.05
22 4187 0.2 16.56 20799.36 0.05
23 4187 0.2 17.56 22055.36 0.05
24 4187 0.2 18.56 23311.36 0.05
25 4710 0.2 19.56 27638.28 0.05
26 4710 0.2 20.56 29051.28 0.05
27 4710 0.2 21.56 30464.28 0.05
28 4710 0.2 22.56 31877.28 0.05
29 5233 0.2 23.56 36989.2 0.05
30 5233 0.2 24.56 38559.2 0.05
31 5233 0.2 25.56 40129.2 0.05
32 5233 0.2 26.56 41699.2 0.05
The study employs a two-step regression methodology to achieve precise predictions for crucial machining parameters—tool life,
total material removed, and surface roughness.
The equation of the first stage of the two-step regression is ŷ₁ = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ, where ŷ₁ is the predicted value of the dependent
variable in the first stage, β₀ is the intercept, and β₁ to βₚ are the regression coefficients for the independent variables X₁ to Xₚ. In the
first step, Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models are used to predict tool life,
capturing complex temporal relationships within the data. The implemented LSTM model predicts tool life from the input features
and can be described mathematically as follows. Input sequence: let X_t be the input feature vector at time step t; since there are
multiple input features, X_t is a vector at each time step. LSTM cell operations: each LSTM cell processes the input X_t and maintains
an internal hidden state and cell state. At each time step t, the cell performs the following operations: (i) forget gate: decides what
information from the previous cell state should be discarded or kept, with f_t denoting the forget gate value at time t; (ii) input gate:
updates the cell state with new information, with i_t denoting the input gate value at time t; (iii) candidate cell state: computes a new
candidate cell state C̃_t from the current input X_t and the previous hidden state; (iv) cell state update: updates the cell state C_t using
the forget gate, input gate, and candidate cell state; (v) output gate: decides what the next hidden state should be, with o_t denoting
the output gate value at time t.
Output layer: after processing the entire input sequence, the final hidden state h_t of the last LSTM cell is passed through a Dense
layer with a single unit (the output layer) to obtain the predicted tool life value.
Mathematically, the LSTM operations at each time step t can be summarized as:
f_t = σ(W_f · [h_{t−1}, X_t] + b_f)
i_t = σ(W_i · [h_{t−1}, X_t] + b_i)
C̃_t = tanh(W_c · [h_{t−1}, X_t] + b_c)
C_t = f_t · C_{t−1} + i_t · C̃_t
o_t = σ(W_o · [h_{t−1}, X_t] + b_o)
h_t = o_t · tanh(C_t)
Fig. 2. Long Short-Term Memory (LSTM) neural network [12]
Here, W_f, W_i, W_c, and W_o are the weight matrices of the forget gate, input gate, candidate cell state, and output gate,
respectively; b_f, b_i, b_c, and b_o are the corresponding bias terms; σ is the sigmoid activation function; tanh is the hyperbolic
tangent activation function; and [h_{t−1}, X_t] denotes the concatenation of the previous hidden state h_{t−1} and the current input X_t.
The final predicted tool life value is obtained by passing h_t through the output Dense layer. This mathematical representation
captures the operations performed by the implemented LSTM model to predict tool life from the input features. The LSTM layer in
the implementation has 50 units, i.e. 50 LSTM cells.
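For illustration, a minimal sketch of how such a first-stage LSTM could be set up with Keras is given below. The placeholder data, feature layout, and training settings are assumptions and not the exact code used in this study; only the 50-unit LSTM layer and the single-unit Dense output follow the description above.

# Minimal sketch of the first-stage LSTM tool-life predictor (assumed Keras implementation).
# Placeholder data and training settings are illustrative, not the exact code used in this study.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# X has shape (samples, time steps, features), e.g. cutting speed and feed rate per step;
# y is the measured tool life in minutes.
X = np.random.rand(32, 10, 2).astype("float32")   # placeholder input sequences
y = np.random.rand(32, 1).astype("float32")       # placeholder tool-life targets

model = keras.Sequential([
    keras.Input(shape=(X.shape[1], X.shape[2])),
    layers.LSTM(50),    # 50 LSTM cells, as described above
    layers.Dense(1),    # single-unit output layer -> predicted tool life
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=100, batch_size=8, verbose=0)

tool_life_pred = model.predict(X)   # first-stage predictions passed on to the second stage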
In the second stage of the two-step regression, the predicted values from the first stage are incorporated into a new
regression model to predict the dependent variable; the equation is
ŷ₂ = ŷ₁ + β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ, where ŷ₂ is the predicted value of the dependent variable in the second stage.
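Read literally, this second-stage equation adds a fitted correction term to the first-stage prediction. A minimal sketch of that composition with placeholder data follows; a plain linear model stands in for the second-stage regressor purely for illustration, while the paper's actual second stage is the CatBoost model described next.

# Sketch of the two-step composition y_hat2 = y_hat1 + (b0 + b1*X1 + ... + bp*Xp).
# Placeholder data; a linear model stands in for the second-stage regressor.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((32, 2))        # independent variables, e.g. cutting speed and feed rate
y = rng.random(32)             # dependent variable, e.g. total material removed
y_hat1 = rng.random(32)        # first-stage predictions (from the LSTM above)

# Fit the correction term on the first-stage residuals, then add it back.
stage2 = LinearRegression().fit(X, y - y_hat1)
y_hat2 = y_hat1 + stage2.predict(X)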
LSTM-predicted tool life values are incorporated into a CatBoost regression model in the second step, which leverages Bayesian
hyperparameter optimization and L2 regularization for fine-tuning, leading to significantly improved prediction accuracy. CatBoost
is an ensemble method based on gradient boosting with decision trees. While the exact equations of CatBoost are complex
due to the ensemble nature of the algorithm and its handling of categorical features, a simplified representation expresses the
prediction as a sum of decision trees whose outputs are updated sequentially.
The core equation for CatBoost can be summarized as follows:
ŷ_i = Σ_{t=1}^{N} f_t(X_i)
where ŷ_i represents the predicted target value for sample i, N is the total number of trees in the ensemble, and f_t(X_i) represents the
prediction of the t-th decision tree for the input features X_i. CatBoost builds an ensemble of decision trees, typically shallow trees
with limited depth. These trees are constructed sequentially, and each tree aims to correct the errors of the previous ones. Each
decision tree in the ensemble, denoted by f_t(X_i), predicts the target value for a given input X_i.
For a dataset with N samples, the running prediction for sample i is updated after the t-th tree by adding that tree's contribution ω_i:
ŷ_i^(t) = ŷ_i^(t−1) + ω_i
The contribution 𝜔𝑖 is calculated based on how the t-th tree affects the gradient of the loss function. It depends on the loss function
used (e.g., mean squared error for regression, cross-entropy for classification) and can be more complex in practice. These individual
tree predictions are typically a real number (for regression tasks) or a probability (for classification tasks). The final prediction 𝑦̂𝑖
for a sample i is obtained by combining the predictions of all the decision trees in the ensemble. This combination can involve
simple averaging or weighted averaging, depending on the problem and hyperparameters.
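A sketch of how this second-stage CatBoost regressor could be configured is given below. The placeholder data, feature layout, and parameter values are assumptions for illustration; the LSTM-predicted tool life is appended to the inputs and an L2 penalty is placed on the leaf weights, as described above.

# Sketch of the second-stage CatBoost regressor with the stage-1 output as an extra feature.
# Placeholder data; parameter values are illustrative, not the tuned configuration.
import numpy as np
from catboost import CatBoostRegressor

X = np.random.rand(32, 2)                  # cutting speed and feed rate (placeholder)
tool_life_pred = np.random.rand(32, 1)     # LSTM-predicted tool life from the first stage
y = np.random.rand(32)                     # target, e.g. surface roughness

X_stage2 = np.hstack([X, tool_life_pred])  # augment the inputs with the stage-1 prediction

model = CatBoostRegressor(
    iterations=500,         # number of trees N in the ensemble
    learning_rate=0.05,
    depth=4,                # shallow trees, as described above
    l2_leaf_reg=3.0,        # L2 regularization on the leaf weights
    loss_function="RMSE",
    verbose=False,
)
model.fit(X_stage2, y)
y_hat = model.predict(X_stage2)            # sum of the individual tree predictions f_t(X_i)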
The results demonstrate that this approach consistently yields Root Mean Squared Errors (RMSE) below 1 for all parameters,
showcasing the effectiveness of combining deep learning and gradient boosting with advanced hyperparameter tuning
for machining predictions, and offering valuable insights and applications for the manufacturing industry. A broad variety of
hyperparameters is available in CatBoost and may be adjusted to optimize the gradient boosting models; they influence the
performance of the model and control many aspects of the training process. The parameters fall into two primary groups, booster
parameters and learning task parameters, which are briefly detailed below together with short configuration sketches.
Booster parameters:
iterations (or n_estimators): This parameter sets the number of boosting iterations, i.e. the number of trees in the ensemble.
Increasing this value may improve the model's performance, but be cautious of overfitting.
learning_rate (or eta): Learning rate controls the step size at each iteration while moving towards a minimum of the loss function.
Lower values make the learning process more robust but require more iterations.
depth (or max_depth): This parameter specifies the maximum depth of each tree in the ensemble. Deeper trees can capture more
complex patterns but may lead to overfitting.
l2_leaf_reg: This is the L2 regularization term on the weights of the leaf nodes. It helps control overfitting by penalizing large
weights.
subsample: Controls the fraction of data used for training each tree. A value less than 1.0 introduces randomness and
can help prevent overfitting.
colsample_bylevel and colsample_bynode: These parameters control the fraction of features (columns) used at each level or node
of the tree. They add more randomness to the model and can improve generalization.
loss_function: Specifies the loss function used for training. It can be set to different loss functions depending on your regression
or classification task.
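To connect these booster parameters to the grid search mentioned in the abstract, a short configuration sketch follows. The grid values and placeholder data are illustrative assumptions; CatBoost's built-in grid_search is used here to evaluate every combination, although an equivalent scikit-learn GridSearchCV would also work.

# Sketch of a hyperparameter grid over the booster parameters listed above.
# Grid values and data are illustrative; every combination is evaluated by cross-validation.
import numpy as np
from catboost import CatBoostRegressor

X_stage2 = np.random.rand(32, 3)   # placeholder features (speed, feed, predicted tool life)
y = np.random.rand(32)             # placeholder target

param_grid = {
    "iterations": [200, 500, 1000],
    "learning_rate": [0.01, 0.05, 0.1],
    "depth": [3, 4, 6],
    "l2_leaf_reg": [1, 3, 5],
    "subsample": [0.8, 1.0],
}

model = CatBoostRegressor(
    loss_function="RMSE",
    bootstrap_type="Bernoulli",  # required so that the subsample fraction applies
    verbose=False,
)
result = model.grid_search(param_grid, X_stage2, y, cv=3, verbose=False)
print(result["params"])  # the combination that achieved the lowest cross-validated error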
Learning Parameters:
For Classification Tasks:
loss_function (default: 'Logloss'): This parameter specifies the loss function to be used for classification. Common choices include
'Logloss' (logarithmic loss, suitable for binary and multiclass classification) and 'CrossEntropy' (alternative name for 'Logloss').
eval_metric: Determines the metric used for evaluating the model's performance during training and early stopping. Common
choices include 'Logloss' for binary classification, 'MultiClass' for multiclass classification, and 'AUC' (Area Under the ROC Curve).
custom_metric (default: None): Allows you to define custom evaluation metrics. You can pass a list of custom metric functions to
this parameter.
class_weights (default: None): If you have imbalanced classes, you can use this parameter to assign different weights to different
classes. It helps the model give more importance to minority classes.
For Regression Tasks:
loss_function (default: 'RMSE'): This parameter specifies the loss function for regression tasks. Common choices include
'RMSE' (Root Mean Squared Error) and 'MAE' (Mean Absolute Error).
eval_metric (default: 'RMSE'): Determines the metric used for evaluating the model's performance during training and early
stopping. Common choices include 'RMSE,' 'MAE,' and 'R2' (Coefficient of Determination).
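As a short illustration of these learning-task settings for the regression case, a hedged sketch with placeholder data follows; the train/validation split and parameter values are assumptions, not the configuration used in this study.

# Sketch of learning-task parameters for a regression fit with early stopping.
# Placeholder data; the split and parameter values are illustrative.
import numpy as np
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split

X = np.random.rand(32, 3)   # placeholder features
y = np.random.rand(32)      # placeholder target

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

model = CatBoostRegressor(
    loss_function="RMSE",   # regression loss
    eval_metric="RMSE",     # metric monitored during training and early stopping
    iterations=1000,
    verbose=False,
)
model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50)
print(model.get_best_score())   # best validation RMSE reached during training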
A statistical learning algorithm's test error is made up of two components: bias and variance. Bias is the error introduced by the
simplifying assumptions built into the model. It may be described as the difference between the average prediction of the
developed model and the actual value it is attempting to predict. A heavily biased model oversimplifies the problem and pays little
attention to the training data, resulting in larger errors on both training and test data. Variance is the error caused by sensitivity
to fluctuations in the training data. High-variance models fit the training data very closely without generalizing; consequently,
they perform admirably on training data but may exhibit significant error rates on test data.
4. CATBOOST AS THE VALIDATORY TOOL
In order to validate the accuracy of the CatBoost algorithm for this CNC turning process, statistical error estimators are considered,
i.e. the Root Mean Square Error (RMSE) and the Root Mean Squared Percentage Error (RMSPE).
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
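For completeness, a small sketch of how these error measures can be computed is given below; the sample values are hypothetical and serve only to show the calculation, with RMSPE expressed as a percentage.

# Sketch of the validation metrics; y_true and y_pred are hypothetical placeholder values.
import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Square Error
    return np.sqrt(np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2))

def rmspe(y_true, y_pred):
    # Root Mean Squared Percentage Error, in percent
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)) * 100.0

y_true = [14.41, 14.40, 14.46]   # e.g. measured tool life (min)
y_pred = [14.35, 14.48, 14.50]   # hypothetical model predictions
print(rmse(y_true, y_pred), rmspe(y_true, y_pred))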
6. REFERENCES
[1] Asokan P, Baskar N, Babu K, Prabhaharan G, Saravanan R (2005). Optimization of surface grinding operation using particle swarm optimization technique. Journal of Manufacturing Science and Engineering, 127: 885-892.
[2] Cakir MC, Gurarda A (1998). Optimization and graphical representation of machining conditions in multi-pass turning operations. Computer Integrated Manufacturing Systems, 11: 157-170.
[3] Chakraborty S, Bhattacharya S (2021). Application of XGBoost algorithm as a predictive tool in a CNC turning process. Reports in Mechanical Engineering, 2(1): 190-201.
[4] Meyer R, Köhler J, Denkena B (2012). Influence of the tool corner radius on the tool wear and process forces during hard turning. The International Journal of Advanced Manufacturing Technology, 58: 933-940.
[5] Singh OP, Kumar G, Kumar M (2019). Role of Taguchi and grey relational method in optimization of machining parameters of different materials: a review. Acta Electronica Malaysia (AEM), 3(1): 19-22.
[6] Goel P, Khan ZA, Siddiquee AN, Kamaruddin S, Gupta RK (2012). Influence of slab milling process parameters on surface integrity of HSLA: a multi-performance characteristics optimization. The International Journal of Advanced