Introduction to Random Forest

Random Forest works for both classification and regression problems.

Random Forest uses N Decision Trees as base models and gives each Decision Tree a sample of the data to train on and predict from.

If you are not familiar with Decision Trees (DT), see blogs 1,2.

For classification, Random Forest takes the output of every Decision Tree and does voting. Whichever class gets the most votes is taken as the output.

For regression, Random Forest takes the average of the numeric outputs of all the Decision Trees.
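To make the two rules concrete, here is a minimal sketch of the aggregation step only. It assumes the trees have already produced their outputs for one input row; the function names and the sample outputs are made up for illustration and do not come from any library.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_outputs):
    """Return the class label predicted by the most trees (majority vote)."""
    votes = Counter(tree_outputs)
    label, _count = votes.most_common(1)[0]
    return label


def aggregate_regression(tree_outputs):
    """Return the average of the numeric outputs of all trees."""
    return mean(tree_outputs)


# Hypothetical outputs from 5 trees for a single input row.
class_votes = ["yes", "no", "yes", "yes", "no"]
numeric_outputs = [10.0, 12.0, 11.0, 13.0, 14.0]

forest_class = aggregate_classification(class_votes)
forest_value = aggregate_regression(numeric_outputs)
print(forest_class, forest_value)  # yes 12.0
```

Here 3 of the 5 trees vote "yes", so "yes" wins, and the regression output is simply the mean of the five numbers.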

Why is Random Forest called Random Forest, and why do we need it?

Random Forest adds additional randomness to the model while growing the trees.

Here, randomness means RF gives each DT a random sample of the data. That is where "random" comes from.

RF uses Decision Trees for prediction. When there are many trees, it is called a forest. That is where "forest" comes from.

A single Decision Tree has a problem called overfitting. To reduce that problem, we use RF.

With randomness in the data and many DTs combined, overfitting in the model decreases.
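A rough sketch of the "random sample per tree" idea is shown below. It assumes the samples are drawn with replacement (bootstrap sampling), which is the usual choice but is an assumption here, since the text above only says "random sample". The function name `bootstrap_samples` and the toy rows are hypothetical.

```python
import random


def bootstrap_samples(rows, n_trees, seed=0):
    """Draw one random sample (with replacement) of the rows per tree."""
    rng = random.Random(seed)
    # Assumption: each sample has the same size as the original data.
    return [rng.choices(rows, k=len(rows)) for _ in range(n_trees)]


rows = list(range(10))  # stand-in for 10 training rows
samples = bootstrap_samples(rows, n_trees=3)
for i, sample in enumerate(samples):
    print(f"tree {i}: {sorted(sample)}")
```

Because each tree sees a slightly different sample, the trees make different mistakes, and combining them smooths those mistakes out.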

Hyperparameters for RF

1. n_estimators: total number of Decision Trees in RF. default=100
2. criterion: function for measuring the quality of a split.
• {"gini","entropy"}, default="gini"

3. max_features: number of features to consider when looking for the best split.

• {"auto","sqrt","log2"}, int or float, default="auto"
• If int, then consider max_features features at each split.
• If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.
• If "auto", then max_features=sqrt(n_features).
• If "sqrt", then max_features=sqrt(n_features) (same as "auto").
• If "log2", then max_features=log2(n_features).
• If None, then max_features=n_features.

Here, n_features means the number of features in the data. Note that these options match older scikit-learn releases; in newer releases "auto" was deprecated and then removed, and "sqrt" is the default for the classifier.
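The max_features rules listed above can be sketched as a small helper. This is only an illustration of the rules as written in this blog, not the library's own code. The name `resolve_max_features` is hypothetical, and truncating sqrt/log2 to an integer with a minimum of 1 is an assumption.

```python
import math


def resolve_max_features(max_features, n_features):
    """Turn a max_features setting into a number of features per split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            value = math.sqrt(n_features)
        elif max_features == "log2":
            value = math.log2(n_features)
        else:
            raise ValueError(f"unknown max_features: {max_features!r}")
        # Assumption: truncate to an integer and keep at least 1 feature.
        return max(1, int(value))
    if isinstance(max_features, float):
        # Fraction of the features, as described in the list above.
        return round(max_features * n_features)
    return int(max_features)


n_features = 16
for setting in (4, 0.5, "auto", "sqrt", "log2", None):
    print(setting, "->", resolve_max_features(setting, n_features))
```

With 16 features, "auto", "sqrt" and "log2" all happen to give 4, the fraction 0.5 gives 8, and None uses all 16.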

Pros

Reduces the overfitting of a single DT and helps to improve accuracy.

Can handle missing values present in the data, depending on the implementation.

Normalising the data is not required, because tree splits depend on the order of the feature values and not on their scale.
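The point about normalisation can be checked with a small sketch. It finds the best threshold split on one feature using Gini impurity, then repeats the search after min-max scaling. The helper names and the toy data are made up for illustration, and this is a single split, not a full tree.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split_partition(values, labels):
    """Return the set of row indices sent left by the best threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = None, None
    for cut in range(1, len(order)):
        # Skip cuts between equal values: no threshold can separate them.
        if values[order[cut - 1]] == values[order[cut]]:
            continue
        left, right = order[:cut], order[cut:]
        # Weighted impurity of the two sides; lower is better.
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right]))
        if best_score is None or score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


# Hypothetical feature (for example income) and class labels.
income = [20000.0, 35000.0, 50000.0, 80000.0, 120000.0, 150000.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]

# Min-max scaling to the range [0, 1].
lo, hi = min(income), max(income)
scaled = [(v - lo) / (hi - lo) for v in income]

same = best_split_partition(income, labels) == best_split_partition(scaled, labels)
print(same)  # True
```

The threshold value itself changes after scaling, but the rows that go left and right stay the same, so the tree makes the same decisions.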

Cons

Requires a lot of computational power, as it builds numerous trees and combines their outputs.

Requires more time, as it has to train and combine so many trees.

Because it is an ensemble of decision trees, it is also harder to interpret than a single tree. It can report importance scores for the features, but these do not explain how each variable affects an individual prediction.

Tip: you can use a different algorithm instead of DT to build the base models in RF. (Wikipedia)

I have a question regarding RF. If you know the answer, please let me know in the comment section.

Q) Suppose we have a classification problem (output yes/no) and we use 10 DTs in RF. If 5 DTs give the output "Yes" and the other 5 DTs give the output "No", which one does the RF classifier choose as the output?

Thanks for reading this blog :)