Build a Random Forest tree-by-tree with full control over hyperparameters and watch ensemble performance improve.
🎮 Training Controls
- Min samples to split a node
- Features considered per split
- Split quality measure
- Bootstrap: sample with replacement
📊 Training Metrics
OOB Error vs Number of Trees
Accuracy vs Number of Trees
Feature Importance (Tree 0)
📋 Training History
| Tree # | OOB Error | Train Acc | Val Acc | Depth | Leaves |
|---|---|---|---|---|---|
📐 Random Forest Equations
Gini Impurity:

$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{C} p_i^2$, where $p_i$ is the proportion of class $i$ in dataset $D$
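As a concrete illustration (a minimal sketch, not the app's own implementation), Gini impurity can be computed directly from the class counts:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(D) = 1 - sum_i p_i^2, where p_i is the proportion of class i."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A 50/50 split of two classes gives the maximum two-class impurity of 0.5, while a pure node gives 0.0.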
Information Gain:
$IG(D, A) = \mathrm{Impurity}(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|} \, \mathrm{Impurity}(D_v)$

Out-of-Bag (OOB) Error:

$\mathrm{OOB\ Error} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[\hat{y}_i^{\mathrm{OOB}} \neq y_i]$, estimated using the samples left out of each tree's bootstrap sample (~37% of the data)
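In code, the OOB error is just the misclassification rate over samples that have an out-of-bag prediction. A minimal sketch (the `None` convention for samples with no OOB vote is an assumption, not something the app specifies):

```python
def oob_error(oob_predictions, true_labels):
    """Fraction of samples whose OOB prediction disagrees with the true label.

    oob_predictions[i] is the majority vote of the trees that did NOT see
    sample i during training; None marks samples with no OOB vote yet.
    """
    pairs = [(p, y) for p, y in zip(oob_predictions, true_labels) if p is not None]
    return sum(p != y for p, y in pairs) / len(pairs)
```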
Ensemble Prediction (Classification):
$\hat{y} = \mathrm{mode}\{h_1(x), h_2(x), \ldots, h_T(x)\}$, the majority vote of the $T$ trees
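The majority vote is straightforward to sketch if each trained tree is treated as a callable from an input to a class label (an illustrative interface, not the app's actual one):

```python
from collections import Counter

def ensemble_predict(trees, x):
    """Classify x by majority vote over the trees' individual predictions."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```

For example, with three stub "trees" voting 1, 0, 1, the ensemble predicts 1.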
Feature Importance:
$\mathrm{Importance}(f) = \frac{1}{T} \sum_{t=1}^{T} \sum_{n \in \text{splits on } f} \Delta \mathrm{Impurity}_n$, averaged over the $T$ trees

💡 Understanding Random Forest
- **Ensemble Method:** Combines many decision trees into one, more accurate predictor
- **Bootstrap Aggregating (Bagging):** Each tree is trained on a random sample drawn with replacement
- **Random Feature Selection:** Each split considers only a random subset of features, decorrelating the trees
- **OOB Error:** Built-in validation using each tree's out-of-bag samples (~37% per tree)
- **Feature Importance:** Measures how much each feature contributes to reducing impurity
- **Bias-Variance Tradeoff:** Adding trees reduces variance without increasing bias
- **Advantages:** Relatively robust to overfitting, handles missing data, works with mixed data types
- **Use Cases:** Classification, regression, feature selection, outlier detection
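The ~37% OOB figure quoted above falls out of bootstrap sampling itself: when $N$ samples are drawn with replacement, each sample is left out with probability $(1 - 1/N)^N \approx e^{-1} \approx 0.368$. A quick empirical check (the function name is illustrative):

```python
import random

def oob_fraction(n, seed=0):
    """Fraction of the n indices never drawn in one bootstrap sample of size n."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}  # indices that made it into the bag
    return 1 - len(drawn) / n
```

For a reasonably large `n`, the result hovers near 0.368, matching the theoretical limit.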