Random Forest Algorithm
The below diagram explains the working of the Random Forest algorithm. Since the random forest combines multiple trees to predict the class of the dataset, it is possible that some decision trees predict the correct output while others do not. Below are two assumptions for a better Random Forest classifier:
o There should be some actual values in the feature variable of the dataset
so that the classifier can predict accurate results rather than a guessed
result.
o The predictions from each tree must have very low correlations.
Below are some advantages of the Random Forest algorithm:
o It takes less training time as compared to other algorithms.
o It predicts output with high accuracy, and it runs efficiently even on a large dataset.
o It can also maintain accuracy when a large proportion of the data is missing.
The working process can be explained in the below steps and diagram:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points
(subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2 until N trees are built.
Step-5: For new data points, find the prediction of each decision tree, and
assign the new data points to the category that wins the majority of votes.
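The steps above can be sketched with scikit-learn decision trees and a plain majority vote. This is a minimal toy illustration under made-up data, with hand-rolled bootstrap sampling and voting; it is not the library's own RandomForestClassifier:

```python
# Minimal sketch of the steps above: N trees, each trained on a random
# subset of K data points, combined by majority vote. The data is made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                # 200 points, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # binary target

N, K = 10, 120                               # Step-3: N trees; K points per subset
trees = []
for _ in range(N):                           # Step-4: repeat for all N trees
    idx = rng.choice(len(X), size=K, replace=True)       # Step-1: K random points
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))  # Step-2: build a tree

def predict(x_new):
    # Step-5: collect each tree's prediction and take the majority vote
    votes = [t.predict(x_new.reshape(1, -1))[0] for t in trees]
    return max(set(votes), key=votes.count)

print(predict(np.array([1.0, 1.0])))
```

The bootstrap subsets keep the individual trees' predictions weakly correlated, which is exactly the second assumption listed above.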
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images, and
this dataset is given to the Random Forest classifier. The dataset is divided
into subsets and given to each decision tree. During the training phase, each
decision tree produces a prediction result; when a new data point occurs, the
Random Forest classifier predicts the final decision based on the majority of
those results. Consider the below image:
1. Data Pre-Processing Step:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
In the above code, we have pre-processed the data: we have loaded the dataset,
extracted the independent and dependent variables, split them into training and
test sets, and applied feature scaling. The dataset is given as:
2. Fitting the Random Forest algorithm to the training set:
Now we will fit the Random forest algorithm to the training set. To fit it, we will
import the RandomForestClassifier class from
the sklearn.ensemble library. The code is given below:
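The fitting code appears to have been dropped here. Below is a sketch: since user_data.csv is not available, a small synthetic stand-in with the same shape (two scaled features and a binary Purchased target) is used; n_estimators=10 matches the 10 trees discussed later in the text, while criterion='entropy' is an assumption.

```python
# Fitting the Random Forest classifier to the training set.
# user_data.csv is not available here, so synthetic stand-in data with the
# same shape is used in place of the pre-processed x_train/y_train.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 2))          # stands in for scaled Age, EstimatedSalary
y_train = (x_train[:, 0] + x_train[:, 1] > 0).astype(int)  # stands in for Purchased

# n_estimators=10 matches the 10 trees mentioned in the text;
# criterion='entropy' is an assumption.
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
```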
Output:
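The prediction step that produces the vector discussed next appears to be missing; a sketch, again with synthetic stand-in data in place of the pre-processed user_data.csv split, is:

```python
# Predicting the test set result (synthetic stand-in data, since
# user_data.csv is not available here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 2))
y_train = (x_train[:, 0] + x_train[:, 1] > 0).astype(int)
x_test = rng.normal(size=(25, 2))
y_test = (x_test[:, 0] + x_test[:, 1] > 0).astype(int)

classifier = RandomForestClassifier(n_estimators=10, random_state=0).fit(x_train, y_train)

# the prediction vector, compared element-wise against y_test
y_pred = classifier.predict(x_test)
```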
By checking the above prediction vector against the test set's real vector, we
can determine the incorrect predictions made by the classifier.
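A confusion matrix is the usual way to count those incorrect predictions; a sketch with the same kind of synthetic stand-in data (user_data.csv is not available here):

```python
# Creating the confusion matrix: diagonal entries are correct predictions,
# off-diagonal entries are the incorrect ones.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 2))
y_train = (x_train[:, 0] + x_train[:, 1] > 0).astype(int)
x_test = rng.normal(size=(25, 2))
y_test = (x_test[:, 0] + x_test[:, 1] > 0).astype(int)

classifier = RandomForestClassifier(n_estimators=10, random_state=0).fit(x_train, y_train)
y_pred = classifier.predict(x_test)

cm = confusion_matrix(y_test, y_pred)
errors = cm[0, 1] + cm[1, 0]   # total incorrect predictions
```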
Output:
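The plotting code for these visualizations appears to have been dropped. Below is a sketch of the conventional meshgrid/contourf approach with synthetic stand-in data; the purple and green colours follow the description in the text:

```python
# Visualizing the classifier's decision regions: predict every point of a
# fine grid and shade the result (synthetic stand-in data, since
# user_data.csv is not available here).
import numpy as np
import matplotlib
matplotlib.use('Agg')                      # draw without a display
import matplotlib.pyplot as mtp
from matplotlib.colors import ListedColormap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
x_set = rng.normal(size=(100, 2))
y_set = (x_set[:, 0] + x_set[:, 1] > 0).astype(int)
classifier = RandomForestClassifier(n_estimators=10, random_state=0).fit(x_set, y_set)

x1, x2 = np.meshgrid(np.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
                     np.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01))
z = classifier.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape)
mtp.contourf(x1, x2, z, alpha=0.75, cmap=ListedColormap(('purple', 'green')))
for i, colour in enumerate(('purple', 'green')):
    mtp.scatter(x_set[y_set == i, 0], x_set[y_set == i, 1], color=colour, label=str(i))
mtp.title('Random Forest classifier (training set)')
mtp.legend()
mtp.savefig('rf_training_set.png')
```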
The above image is the visualization result for the Random Forest classifier
working with the training set result. It is very similar to the Decision Tree
classifier. Each data point corresponds to a user in user_data, and the purple
and green regions are the prediction regions: the purple region is classified
for the users who did not purchase the SUV car, and the green region is for the
users who purchased the SUV.
So, in the Random Forest classifier, we have taken 10 trees that have
predicted Yes or No for the Purchased variable. The classifier took the
majority of the predictions and provided the result.
The above image is the visualization result for the test set. We can check
that there is a minimal number of incorrect predictions (8) without the
overfitting issue. We will get different results by changing the number of
trees in the classifier.
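The effect of changing the number of trees can be checked directly; a sketch with synthetic stand-in data (the accuracies on the real user_data.csv will differ):

```python
# Comparing test accuracy for different numbers of trees
# (synthetic stand-in data, since user_data.csv is not available here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
# a noisy target, so the choice of n_estimators actually matters
y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=400)) > 0).astype(int)
x_train, x_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

for n in (1, 10, 100):
    clf = RandomForestClassifier(n_estimators=n, random_state=0).fit(x_train, y_train)
    print(n, clf.score(x_test, y_test))
```

More trees generally stabilize the majority vote, at the cost of extra training and prediction time; beyond some point the accuracy plateaus.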