Recurrent Neural Networks Virtual Lab

Explore sequential data processing with RNNs, LSTMs, and GRUs


🌟 Enhanced Training

Full RNN/LSTM/GRU training

🌦️

Weather Pattern Recognition

Predict weather patterns with an RNN

🔗

Sequence Processing

How RNNs handle sequential data

🧮

LSTM Architecture

Long Short-Term Memory cells

⚙️

GRU Comparison

Gated Recurrent Units vs LSTM

📉

Vanishing Gradient

Understanding the problem

📈

Time Series Prediction

Forecasting with RNNs

🔄 Enhanced RNN Training - Recurrent Neural Networks

Train RNN, LSTM, or GRU networks with full control over the hyperparameters and watch sequence learning in real time.

⚙️ Hyperparameters

  • Cell Type: Recurrent cell architecture (RNN, LSTM, or GRU)
  • Learning Rate: Controls convergence speed
  • Hidden Size: Hidden state dimension
  • Sequence Length: Input sequence length
  • Epochs: Training iterations
  • Batch Size: Sequences per batch
  • Dropout: Prevents overfitting
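The controls above can be collected into a single configuration object. A minimal sketch follows; the key names and default values are assumptions for illustration, not taken from the lab itself:

```python
# Hypothetical hyperparameter defaults (names and values are illustrative
# assumptions, not the lab's actual settings).
hyperparams = {
    "cell_type": "LSTM",    # recurrent cell architecture: RNN, LSTM, or GRU
    "learning_rate": 0.01,  # controls convergence speed
    "hidden_size": 64,      # hidden state dimension
    "sequence_length": 20,  # input sequence length
    "epochs": 50,           # training iterations
    "batch_size": 32,       # sequences per batch
    "dropout": 0.2,         # prevents overfitting
}
```

Keeping every knob in one dict makes it easy to log the exact configuration alongside each training run.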

🎮 Training Controls

Epoch 0 / 50 (0.0% Complete)

Train Loss: 0 · Val Loss: 0 · Train Perplexity: 0 · Val Perplexity: 0 · Gradient Norm: 0

📊 Training Metrics

Loss Over Epochs

Perplexity Over Epochs

Gradient Norm Over Epochs

LSTM Gate Activations (Last 10 Epochs)

📋 Training History

Epoch | Train Loss | Val Loss | Train PPL | Val PPL | Grad Norm

📐 LSTM Equations

Forget Gate:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

Input Gate:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

Cell State Update:

C_t = f_t * C_{t-1} + i_t * \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)

Output Gate:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

Hidden State:

h_t = o_t * \tanh(C_t)

Cross-Entropy Loss:

\mathcal{L} = -\frac{1}{T} \sum_{t=1}^{T} \log P(y_t | x_1, ..., x_t)

Perplexity:

\text{PPL} = \exp(\mathcal{L})
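The gate equations above translate almost line for line into code. A minimal numpy sketch of one LSTM time step follows; the weight shapes and function names are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step, following the gate equations above.
    Each W_* has shape (hidden_size + input_size, hidden_size)."""
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    f_t = sigmoid(z @ W_f + b_f)                        # forget gate
    i_t = sigmoid(z @ W_i + b_i)                        # input gate
    c_t = f_t * c_prev + i_t * np.tanh(z @ W_C + b_C)   # cell state update
    o_t = sigmoid(z @ W_o + b_o)                        # output gate
    h_t = o_t * np.tanh(c_t)                            # hidden state
    return h_t, c_t

# Usage: run one step with randomly initialized weights.
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
Ws = [rng.standard_normal((n_h + n_in, n_h)) * 0.1 for _ in range(4)]
bs = [np.zeros(n_h) for _ in range(4)]
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), *Ws, *bs)
```

A full forward pass just loops this step over the sequence, carrying `(h, c)` from one time step to the next.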

💡 Understanding LSTM

  • LSTM: Uses gates to control information flow, mitigating the vanishing-gradient problem
  • Forget Gate: Decides what to discard from the cell state
  • Input Gate: Decides what new information to store in the cell state
  • Output Gate: Decides what part of the cell state to expose as the hidden state
  • Use Case: Long sequences with long-range dependencies
  • Perplexity: Lower is better; measures the model's prediction uncertainty
  • Gradient Norm: Monitors gradient flow and helps detect vanishing or exploding gradients
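The last two metrics are cheap to compute from quantities the training loop already has. A small sketch, assuming `grads` is a list of per-parameter gradient arrays and the loss is the mean cross-entropy (function names are illustrative):

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm over all parameter gradients. A value shrinking toward zero
    across epochs signals vanishing gradients; a sudden spike signals
    exploding gradients."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

def perplexity(mean_cross_entropy):
    """PPL = exp(L): how 'surprised' the model is per prediction on average.
    A perfect model has loss 0 and perplexity 1."""
    return float(np.exp(mean_cross_entropy))
```

Logging both per epoch is what produces the Gradient Norm and Perplexity curves in the metrics panel above.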