🌲 Random Forest Virtual Lab

Explore ensemble learning through multiple decision trees working together

• 🌟 Enhanced Training: full training with hyperparameters
• Forest Visualization: see how multiple trees work together
• Bootstrapping: understand random sampling with replacement
• Voting Mechanism: learn how trees vote on the final prediction
• Feature Importance: discover which features matter most

🌲 Enhanced Random Forest Training - Ensemble Learning

Build a Random Forest tree-by-tree with full control over hyperparameters and watch ensemble performance improve.

⚙️ Hyperparameters

• Number of trees: more trees generally improve performance, with diminishing returns
• Maximum tree depth
• Minimum samples required to split a node
• Number of features considered per split
• Split quality measure (e.g. Gini impurity)
• Bootstrap: sample with replacement
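As a minimal sketch, the hyperparameters listed above map directly onto scikit-learn's `RandomForestClassifier`; the dataset and specific values below are illustrative stand-ins, not the lab's own configuration.

```python
# Illustrative sketch: the lab's hyperparameters expressed via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=50,        # number of trees: more trees lower variance
    max_depth=5,            # maximum tree depth
    min_samples_split=4,    # min samples required to split a node
    max_features="sqrt",    # features considered per split
    criterion="gini",       # split quality measure
    bootstrap=True,         # sample with replacement
    oob_score=True,         # track out-of-bag error as built-in validation
    random_state=0,
)
clf.fit(X, y)
print(f"OOB accuracy: {clf.oob_score_:.3f}")
```

Setting `oob_score=True` gives the same OOB estimate the lab's dashboard plots, without a separate validation split.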

🎮 Training Controls

Progress: Tree 0 / 50 (0.0% complete)

Live per-tree metrics (all start at zero): OOB Error, Train Accuracy, Val Accuracy, Tree Depth, Num Leaves.

📊 Training Metrics

OOB Error vs Number of Trees

Accuracy vs Number of Trees

Feature Importance (Tree 0)

📋 Training History

Tree # | OOB Error | Train Acc | Val Acc | Depth | Leaves

📐 Random Forest Equations

Gini Impurity:

\text{Gini}(D) = 1 - \sum_{i=1}^{C} p_i^2

where p_i is the proportion of class i in dataset D
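As a small from-scratch sketch, the formula above can be computed directly from a node's class labels:

```python
# Gini impurity: 1 - sum_i p_i^2 over the class proportions p_i.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has impurity 0; a 50/50 two-class node has the maximum, 0.5.
print(gini(["a", "a", "a", "a"]))   # 0.0
print(gini(["a", "a", "b", "b"]))   # 0.5
```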

Information Gain:

\text{IG}(D, A) = \text{Impurity}(D) - \sum_{v \in \text{Values}(A)} \frac{|D_v|}{|D|}\, \text{Impurity}(D_v)
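A sketch of this formula using Gini as the impurity measure: the gain is the parent's impurity minus the size-weighted impurity of the children (the `gini` helper is repeated so the snippet runs standalone).

```python
# Information gain = Impurity(D) - sum_v |D_v|/|D| * Impurity(D_v).
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def information_gain(parent, children):
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# A perfect split separates the classes completely:
parent = ["a", "a", "b", "b"]
print(information_gain(parent, [["a", "a"], ["b", "b"]]))   # 0.5
```

A split that leaves each child as mixed as the parent yields zero gain, which is why trees prefer the split maximizing this quantity.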

Out-of-Bag (OOB) Error:

\text{OOB Error} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\hat{y}_i^{\text{OOB}} \neq y_i]

Estimated using the samples left out of each tree's bootstrap sample (on average ~37% of the data per tree)
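The ~37% figure can be checked empirically: drawing N samples with replacement from N items leaves each item out with probability (1 - 1/N)^N, which approaches 1/e ≈ 0.368.

```python
# Empirical check of the "~37% out-of-bag" claim via one bootstrap sample.
import random

random.seed(0)
N = 10_000
in_bag = {random.randrange(N) for _ in range(N)}   # indices drawn with replacement
oob_fraction = 1 - len(in_bag) / N
print(f"out-of-bag fraction: {oob_fraction:.3f}")  # close to 1/e ≈ 0.368
```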

Ensemble Prediction (Classification):

\hat{y} = \text{mode}\{h_1(x), h_2(x), \ldots, h_T(x)\}

Majority vote from T trees
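A minimal sketch of that vote: collect each tree's prediction h_t(x) and take the mode (the per-tree predictions below are made-up stand-ins for a trained forest's outputs).

```python
# Ensemble prediction by majority vote over the individual trees.
from collections import Counter

def majority_vote(predictions):
    # mode{h_1(x), ..., h_T(x)}: the most common class wins.
    return Counter(predictions).most_common(1)[0][0]

tree_predictions = ["cat", "dog", "cat", "cat", "dog"]
print(majority_vote(tree_predictions))   # -> cat
```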

Feature Importance:

\text{Importance}(f) = \frac{1}{T} \sum_{t=1}^{T} \sum_{n \in \text{splits on } f} \Delta\text{Impurity}_n
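A sketch of this mean-decrease-in-impurity calculation: sum each tree's impurity decreases per feature, then average over the T trees. The split records below are made-up stand-ins for what a trained forest would store.

```python
# Feature importance: (1/T) * sum over trees of the impurity decrease
# contributed by every split on that feature.
from collections import defaultdict

def feature_importance(split_records, n_trees):
    totals = defaultdict(float)
    for tree_splits in split_records:          # one (feature, ΔImpurity) list per tree
        for feature, delta in tree_splits:
            totals[feature] += delta
    return {f: total / n_trees for f, total in totals.items()}

splits = [
    [("x0", 0.30), ("x1", 0.10)],   # tree 1
    [("x0", 0.20)],                 # tree 2
]
print(feature_importance(splits, n_trees=2))   # {'x0': 0.25, 'x1': 0.05}
```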

💡 Understanding Random Forest

  • Ensemble Method: Combines multiple decision trees for better predictions
  • Bootstrap Aggregating (Bagging): Each tree trained on random sample with replacement
  • Random Feature Selection: Each split considers random subset of features
  • OOB Error: Built-in validation using out-of-bag samples (~37% per tree)
  • Feature Importance: Measures how much each feature contributes to predictions
  • Bias-Variance Tradeoff: More trees reduce variance without increasing bias
  • Advantages: Robust to overfitting, handles missing data, works with mixed data types
  • Use Cases: Classification, regression, feature selection, outlier detection