Anomaly Detection - Ensemble - Classifiers
Simple ensemble techniques: Max Voting, Averaging, Weighted Averaging
▪ Then, it automatically adjusts the weights of the data points after every decision tree.
▪ It gives more weight to incorrectly classified items to correct them for the next round.
▪ It repeats until the residual error, or the difference between actual and predicted values,
is acceptably small.
➢ Optimizes the loss function by generating base learners sequentially so that each
new base learner is more effective than the previous one
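The weight-adjustment loop described above can be sketched in plain Python. This is a minimal illustration of one AdaBoost-style round, assuming a toy 1-D dataset and a decision-stump base learner (both are made up for the example, not from the original material):

```python
import math

# Toy 1-D dataset: (feature, label) pairs; labels are +1 / -1.
data = [(1.0, 1), (2.0, 1), (3.0, -1), (4.0, -1), (5.0, 1)]

def stump_predict(threshold, x):
    """A decision stump: predict +1 below the threshold, -1 above."""
    return 1 if x < threshold else -1

def boost_round(weights, threshold):
    """One boosting round: score the stump, then up-weight its mistakes."""
    preds = [stump_predict(threshold, x) for x, _ in data]
    # Weighted error of the current base learner.
    err = sum(w for w, (_, y), p in zip(weights, data, preds) if p != y)
    err = min(max(err, 1e-10), 1 - 1e-10)        # avoid log(0)
    alpha = 0.5 * math.log((1 - err) / err)      # this learner's vote weight
    # Incorrectly classified points get larger weights for the next round.
    new_w = [w * math.exp(-alpha * y * p)
             for w, (_, y), p in zip(weights, data, preds)]
    total = sum(new_w)
    return [w / total for w in new_w], alpha

weights = [1 / len(data)] * len(data)            # start uniform
weights, alpha = boost_round(weights, threshold=2.5)
# The one misclassified point (x=5.0) now carries the largest weight.
print(max(range(len(data)), key=lambda i: weights[i]))  # -> 4
```

After the update, the misclassified point holds half the total weight, so the next base learner is pushed to correct it.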
• XGBoost uses multiple cores on the CPU so that learning can occur in parallel
during training.
➢ Ease of implementation
➢ Computational efficiency
Stacking
A technique that combines multiple machine learning algorithms via
meta-learning (one algorithm learns from the outputs of other learning algorithms)
Hyperparameter Tuning
Using Grid Search and
Random Search in Python
• Hyperparameter optimization is a technique that involves
searching through a range of values to find the combination of
hyperparameter values that achieves the best performance on a
given dataset.
• Two popular techniques used to perform hyperparameter
optimization –
• Grid search
• Random search
Grid Search
➢ We first need to define a parameter space or parameter grid, where
we include a set of possible hyperparameter values that can be used
to build the model.
➢ These hyperparameters are placed in a matrix-like structure, and
the model is trained on every combination of hyperparameter
values.
➢ The model with the best performance is then selected.
➢ Every possible combination of values is evaluated
Key arguments to scikit-learn's GridSearchCV:
1. estimator – A scikit-learn model
2. param_grid – A dictionary with parameter names as keys and lists of
parameter values.
3. scoring – The performance measure, for example 'r2' for regression
models and 'precision' for classification models.
4. cv – An integer that is the number of folds for K-fold cross-validation.
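The exhaustive search that grid search performs can be sketched with the standard library. The parameter grid and the scoring function below are illustrative assumptions (the scorer stands in for the K-fold cross-validation step):

```python
from itertools import product

# Hypothetical parameter grid, in the same shape as param_grid above.
param_grid = {"max_depth": [2, 4, 8], "n_estimators": [50, 100]}

def cv_score(params):
    """Stand-in for cross-validated scoring; this made-up score
    peaks at max_depth=4 with the larger n_estimators."""
    return -abs(params["max_depth"] - 4) + params["n_estimators"] / 100

# Train-and-score every combination in the grid, keep the best.
names, values = zip(*param_grid.items())
best_params, best_score = None, float("-inf")
for combo in product(*values):
    params = dict(zip(names, combo))
    score = cv_score(params)
    if score > best_score:
        best_params, best_score = params, score
print(best_params)  # -> {'max_depth': 4, 'n_estimators': 100}
```

With 3 × 2 = 6 combinations this is cheap, but the number of fits grows multiplicatively with each hyperparameter added, which is what motivates random search below.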
Random Search
• Randomly samples from a grid of hyperparameters instead of
conducting an exhaustive search.
• Can specify the number of total runs the random search should try
before returning the best model.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rf = RandomForestClassifier()
rs_space = {'n_estimators': [100, 200, 500], 'max_depth': [None, 10, 20]}  # illustrative grid
rf_random = RandomizedSearchCV(rf, rs_space, n_iter=500, scoring='accuracy',
                               n_jobs=-1, cv=3)
model_random = rf_random.fit(X, y)  # X, y: feature matrix and labels