🌲 Random Forest Virtual Lab

Explore ensemble learning through multiple decision trees working together

• 🌟 Enhanced Training: full training with hyperparameters
• Forest Visualization: see how multiple trees work together
• Bootstrapping: understand random sampling with replacement
• Voting Mechanism: learn how trees vote on the final prediction
• Feature Importance: discover which features matter most

🌲 Enhanced Random Forest Training - Ensemble Learning

Build a Random Forest tree-by-tree with full control over hyperparameters and watch ensemble performance improve.

⚙️ Hyperparameters

• Number of trees: more trees generally improve performance, with diminishing returns
• Maximum tree depth
• Minimum samples required to split a node
• Number of features considered per split
• Split quality measure (e.g. Gini impurity)
• Bootstrap: sample with replacement
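As a minimal sketch, the hyperparameters listed above map directly onto scikit-learn's `RandomForestClassifier`; the dataset and specific values below are illustrative stand-ins, not the lab's own configuration.

```python
# Illustrative sketch: the lab's hyperparameters expressed via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=50,        # number of trees: more trees lower variance
    max_depth=5,            # maximum tree depth
    min_samples_split=4,    # min samples required to split a node
    max_features="sqrt",    # features considered per split
    criterion="gini",       # split quality measure
    bootstrap=True,         # sample with replacement
    oob_score=True,         # track out-of-bag error as built-in validation
    random_state=0,
)
clf.fit(X, y)
print(f"OOB accuracy: {clf.oob_score_:.3f}")
```

Setting `oob_score=True` gives the same OOB estimate the lab's dashboard plots, without a separate validation split.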

🎮 Training Controls

Progress: Tree 0 / 50 (0.0% complete)

Live per-tree metrics (all start at zero): OOB Error, Train Accuracy, Val Accuracy, Tree Depth, Num Leaves.

📊 Training Metrics

OOB Error vs Number of Trees

Accuracy vs Number of Trees

Feature Importance (Tree 0)

📋 Training History

Tree # | OOB Error | Train Acc | Val Acc | Depth | Leaves

📐 Random Forest Equations

Gini Impurity:

\text{Gini}(D) = 1 - \sum_{i=1}^{C} p_i^2

where p_i is the proportion of class i in dataset D
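As a small from-scratch sketch, the formula above can be computed directly from a node's class labels:

```python
# Gini impurity: 1 - sum_i p_i^2 over the class proportions p_i.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has impurity 0; a 50/50 two-class node has the maximum, 0.5.
print(gini(["a", "a", "a", "a"]))   # 0.0
print(gini(["a", "a", "b", "b"]))   # 0.5
```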

Information Gain:

\text{IG}(D, A) = \text{Impurity}(D) - \sum_{v \in \text{Values}(A)} \frac{|D_v|}{|D|}\, \text{Impurity}(D_v)
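A sketch of this formula using Gini as the impurity measure: the gain is the parent's impurity minus the size-weighted impurity of the children (the `gini` helper is repeated so the snippet runs standalone).

```python
# Information gain = Impurity(D) - sum_v |D_v|/|D| * Impurity(D_v).
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def information_gain(parent, children):
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# A perfect split separates the classes completely:
parent = ["a", "a", "b", "b"]
print(information_gain(parent, [["a", "a"], ["b", "b"]]))   # 0.5
```

A split that leaves each child as mixed as the parent yields zero gain, which is why trees prefer the split maximizing this quantity.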

Out-of-Bag (OOB) Error:

\text{OOB Error} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\hat{y}_i^{\text{OOB}} \neq y_i]

Estimated using the samples left out of each tree's bootstrap sample (on average ~37% of the data per tree)
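The ~37% figure can be checked empirically: drawing N samples with replacement from N items leaves each item out with probability (1 - 1/N)^N, which approaches 1/e ≈ 0.368.

```python
# Empirical check of the "~37% out-of-bag" claim via one bootstrap sample.
import random

random.seed(0)
N = 10_000
in_bag = {random.randrange(N) for _ in range(N)}   # indices drawn with replacement
oob_fraction = 1 - len(in_bag) / N
print(f"out-of-bag fraction: {oob_fraction:.3f}")  # close to 1/e ≈ 0.368
```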

Ensemble Prediction (Classification):

\hat{y} = \text{mode}\{h_1(x), h_2(x), \ldots, h_T(x)\}

Majority vote from T trees
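A minimal sketch of that vote: collect each tree's prediction h_t(x) and take the mode (the per-tree predictions below are made-up stand-ins for a trained forest's outputs).

```python
# Ensemble prediction by majority vote over the individual trees.
from collections import Counter

def majority_vote(predictions):
    # mode{h_1(x), ..., h_T(x)}: the most common class wins.
    return Counter(predictions).most_common(1)[0][0]

tree_predictions = ["cat", "dog", "cat", "cat", "dog"]
print(majority_vote(tree_predictions))   # -> cat
```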

Feature Importance:

\text{Importance}(f) = \frac{1}{T} \sum_{t=1}^{T} \sum_{n \in \text{splits on } f} \Delta\text{Impurity}_n
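A sketch of this mean-decrease-in-impurity calculation: sum each tree's impurity decreases per feature, then average over the T trees. The split records below are made-up stand-ins for what a trained forest would store.

```python
# Feature importance: (1/T) * sum over trees of the impurity decrease
# contributed by every split on that feature.
from collections import defaultdict

def feature_importance(split_records, n_trees):
    totals = defaultdict(float)
    for tree_splits in split_records:          # one (feature, ΔImpurity) list per tree
        for feature, delta in tree_splits:
            totals[feature] += delta
    return {f: total / n_trees for f, total in totals.items()}

splits = [
    [("x0", 0.30), ("x1", 0.10)],   # tree 1
    [("x0", 0.20)],                 # tree 2
]
print(feature_importance(splits, n_trees=2))   # {'x0': 0.25, 'x1': 0.05}
```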

💡 Understanding Random Forest

  • Ensemble Method: Combines multiple decision trees for better predictions
  • Bootstrap Aggregating (Bagging): Each tree trained on random sample with replacement
  • Random Feature Selection: Each split considers random subset of features
  • OOB Error: Built-in validation using out-of-bag samples (~37% per tree)
  • Feature Importance: Measures how much each feature contributes to predictions
  • Bias-Variance Tradeoff: More trees reduce variance without increasing bias
  • Advantages: Robust to overfitting, handles missing data, works with mixed data types
  • Use Cases: Classification, regression, feature selection, outlier detection