Recurrent Neural Networks Virtual Lab

Explore sequential data processing with RNNs, LSTMs, and GRUs


🌟 Enhanced Training

Full RNN/LSTM/GRU training

🌦️

Weather Pattern Recognition

Predict weather patterns with an RNN

🔗

Sequence Processing

How RNNs handle sequential data

🧮

LSTM Architecture

Long Short-Term Memory cells

⚙️

GRU Comparison

Gated Recurrent Units vs LSTM

📉

Vanishing Gradient

Understanding the problem

📈

Time Series Prediction

Forecasting with RNNs

🔄 Enhanced RNN Training - Recurrent Neural Networks

Train RNN, LSTM, or GRU networks with full control over the hyperparameters and watch sequence learning in real time.

⚙️ Hyperparameters

  • Cell Type: Recurrent cell architecture (RNN, LSTM, or GRU)
  • Learning Rate: Controls convergence speed
  • Hidden Size: Hidden state dimension
  • Sequence Length: Input sequence length
  • Epochs: Training iterations
  • Batch Size: Sequences per batch
  • Dropout: Prevents overfitting
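The controls above can be collected into a single configuration object. A minimal sketch follows; the key names and default values are assumptions for illustration, not taken from the lab itself:

```python
# Hypothetical hyperparameter defaults (names and values are illustrative
# assumptions, not the lab's actual settings).
hyperparams = {
    "cell_type": "LSTM",    # recurrent cell architecture: RNN, LSTM, or GRU
    "learning_rate": 0.01,  # controls convergence speed
    "hidden_size": 64,      # hidden state dimension
    "sequence_length": 20,  # input sequence length
    "epochs": 50,           # training iterations
    "batch_size": 32,       # sequences per batch
    "dropout": 0.2,         # prevents overfitting
}
```

Keeping every knob in one dict makes it easy to log the exact configuration alongside each training run.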

🎮 Training Controls

Epoch 0 / 50 (0.0% Complete)

Train Loss: 0 · Val Loss: 0 · Train Perplexity: 0 · Val Perplexity: 0 · Gradient Norm: 0

📊 Training Metrics

Loss Over Epochs

Perplexity Over Epochs

Gradient Norm Over Epochs

LSTM Gate Activations (Last 10 Epochs)

📋 Training History

Epoch | Train Loss | Val Loss | Train PPL | Val PPL | Grad Norm

📐 LSTM Equations

Forget Gate:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

Input Gate:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

Cell State Update:

C_t = f_t * C_{t-1} + i_t * \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)

Output Gate:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

Hidden State:

h_t = o_t * \tanh(C_t)

Cross-Entropy Loss:

\mathcal{L} = -\frac{1}{T} \sum_{t=1}^{T} \log P(y_t | x_1, ..., x_t)

Perplexity:

\text{PPL} = \exp(\mathcal{L})
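The gate equations above translate almost line for line into code. A minimal numpy sketch of one LSTM time step follows; the weight shapes and function names are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step, following the gate equations above.
    Each W_* has shape (hidden_size + input_size, hidden_size)."""
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    f_t = sigmoid(z @ W_f + b_f)                        # forget gate
    i_t = sigmoid(z @ W_i + b_i)                        # input gate
    c_t = f_t * c_prev + i_t * np.tanh(z @ W_C + b_C)   # cell state update
    o_t = sigmoid(z @ W_o + b_o)                        # output gate
    h_t = o_t * np.tanh(c_t)                            # hidden state
    return h_t, c_t

# Usage: run one step with randomly initialized weights.
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
Ws = [rng.standard_normal((n_h + n_in, n_h)) * 0.1 for _ in range(4)]
bs = [np.zeros(n_h) for _ in range(4)]
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), *Ws, *bs)
```

A full forward pass just loops this step over the sequence, carrying `(h, c)` from one time step to the next.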

💡 Understanding LSTM

  • LSTM: Uses gates to control information flow, mitigating the vanishing-gradient problem
  • Forget Gate: Decides what to discard from the cell state
  • Input Gate: Decides what new information to store in the cell state
  • Output Gate: Decides what part of the cell state to expose as the hidden state
  • Use Case: Long sequences with long-range dependencies
  • Perplexity: Lower is better; measures the model's prediction uncertainty
  • Gradient Norm: Monitors gradient flow and helps detect vanishing or exploding gradients
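The last two metrics are cheap to compute from quantities the training loop already has. A small sketch, assuming `grads` is a list of per-parameter gradient arrays and the loss is the mean cross-entropy (function names are illustrative):

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm over all parameter gradients. A value shrinking toward zero
    across epochs signals vanishing gradients; a sudden spike signals
    exploding gradients."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

def perplexity(mean_cross_entropy):
    """PPL = exp(L): how 'surprised' the model is per prediction on average.
    A perfect model has loss 0 and perplexity 1."""
    return float(np.exp(mean_cross_entropy))
```

Logging both per epoch is what produces the Gradient Norm and Perplexity curves in the metrics panel above.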