Understanding LSTMs Without the Math
A simple memory system explained using 3 rules
Introduction
When I first learned about LSTMs (Long Short-Term Memory networks), they felt overwhelming.
Equations. Gates. Sigmoid. Tanh.
But everything clicked when I saw a simplified version of an LSTM - one that uses just 3 simple rules.
No heavy math. Just logic. Let’s break it down.
A Simple Memory System (Toy Version)
We are given:
An input vector at each time step: [x1, x2, x3]
A memory state aₜ
An output yₜ
The 3 Rules (Core Idea)
This system works using just three conditions:
If x2 = 1 → add x1 to memory
If x2 = -1 → reset memory to 0
If x3 = 1 → output current memory
That’s it. In every other case (for example, x2 = 0), memory is left unchanged and nothing is output.
Think Like This
Imagine memory as a running total:
You add values
You can reset it
You can print it when needed
Example
Let’s take a short sequence:
| Time | x1 | x2 | x3 |
|---|---|---|---|
| 1 | 3 | 1 | 0 |
| 2 | 5 | 0 | 1 |
| 3 | 2 | 1 | 0 |
| 4 | 3 | 0 | 1 |
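
To see what the three rules do with this sequence, here is a minimal Python sketch that walks through the table row by row. A few things are my own assumptions, not part of the rules: the function name `run_toy_memory` is made up, memory starts at 0, the memory update happens before the output check within a step, and "no output" is represented as `None`.

```python
def run_toy_memory(sequence):
    """Apply the three rules to each (x1, x2, x3) step.

    Returns a list of (memory, output) pairs, one per time step.
    Assumption: memory starts at 0 and is updated before the output check.
    """
    memory = 0
    trace = []
    for x1, x2, x3 in sequence:
        if x2 == 1:        # Rule 1: add x1 to memory
            memory += x1
        elif x2 == -1:     # Rule 2: reset memory to 0
            memory = 0
        # Rule 3: output the current memory only when x3 is 1
        output = memory if x3 == 1 else None
        trace.append((memory, output))
    return trace


# The four rows of the example table as (x1, x2, x3)
sequence = [(3, 1, 0), (5, 0, 1), (2, 1, 0), (3, 0, 1)]

for t, (memory, output) in enumerate(run_toy_memory(sequence), start=1):
    print(f"t={t}  memory={memory}  output={output}")
```

Under these assumptions, memory goes 3, 3, 5, 5, and the system outputs 3 at time step 2 and 5 at time step 4. The 5 and the 3 in the x1 column at steps 2 and 4 are ignored, because x2 is 0 there.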
From Numbers to Words: How This Relates to Real Sentences
So far, we’ve been working with numbers.
But in real NLP tasks, the inputs are not numbers like 3 or 5.
They are words such as:
["I", "love", "France"]
What Changes in a Real LSTM?
In a real LSTM, each word is first converted into a numeric representation called an embedding.
So instead of:
$$x_1 = 3$$
we now have:
$$x_t = \text{embedding}(\text{"France"})$$
The toy version stores numbers
A real LSTM stores meaning
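To make "embedding" a bit more concrete, here is a small sketch of a lookup table in plain Python. Everything in it is hypothetical: the name `embedding_table`, the three-number vectors, and the values themselves are made up for illustration. In a trained model these vectors are learned from data and are usually much longer.

```python
# Hypothetical 3-dimensional embeddings; the numbers are invented for illustration.
embedding_table = {
    "I": [0.1, -0.2, 0.0],
    "love": [0.7, 0.3, -0.5],
    "France": [-0.4, 0.9, 0.2],
}


def embed(tokens):
    """Replace each token with its vector from the lookup table."""
    return [embedding_table[token] for token in tokens]


sentence = ["I", "love", "France"]
vectors = embed(sentence)

# Each word is now a list of numbers that can be fed to the network step by step
for token, vector in zip(sentence, vectors):
    print(token, vector)
```

The point is only that the network never receives the string itself. At each time step it receives the vector that stands for the word.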
Our Toy Version vs Real NLP
| Toy Version | Real NLP Version |
|---|---|
| x1 = a number like 3 or 5 | xₜ = embedding of a word like “I”, “love”, “France” |
| x2 decides whether to store or reset | learned gates decide what context to keep or forget |
| x3 decides whether to output | learned output gate decides what hidden information to expose |
| memory stores a running value | memory stores context from the sentence |
Toy Input <> Sentence Token
| Time | Toy Input | Sentence Token | Intuition |
|---|---|---|---|
| 1 | [3,1,0] | "I" | low importance |
| 2 | [5,1,0] | "love" | useful context |
| 3 | [2,1,1] | "France" | important context, may influence output |
A Simple Comparison
Toy version example
At time step 1:
[3, 1, 0]
This means:
take value 3
store it in memory
do not output yet
Real sentence example
At time step 1, suppose the input word is:
"France"
A real LSTM does not literally see the string "France".
It sees a vector representation of that word and learns whether it should:
store that word’s meaning in memory
forget previous context
use that memory later for prediction
So conceptually, this:
[3, 1, 0]
is playing the same role as:
["France", store, don’t output yet]
Intuition
In the toy version, we add numbers to memory.
In a real LSTM, we add meaning to memory.
One-line takeaway
The toy model stores numbers.
A real LSTM stores context.
What’s Actually Happening?
Even though it looks simple, this toy system mimics how an LSTM works internally.
Mapping This to Real LSTM
A real LSTM has three gates:
Forget Gate
Input Gate
Output Gate
Let’s map our rules
Memory Update
To understand how this simplified system relates to a real LSTM, let’s look at how memory is updated.
Real LSTM:
$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t$$
In our version:
$$a_t = f_t \cdot a_{t-1} + i_t \cdot x_1$$
Gate Mapping
| Our Rule | LSTM Equivalent |
|---|---|
| x₂ = 1 → add | Input Gate (iₜ = 1) |
| x₂ = -1 → reset | Forget Gate (fₜ = 0) |
| x₃ = 1 → output | Output Gate (oₜ = 1) |
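
The mapping above can be sketched in Python by turning x2 and x3 into hard 0/1 gate values and then applying the toy update equation. The function names are made up, and two details are my assumptions: the forget gate is 1 whenever there is no reset, and the input gate is 0 whenever x2 is not 1. The input sequence is also hypothetical, chosen so that it includes a reset.

```python
def gates_from_input(x2, x3):
    """Translate the toy control signals into hard 0/1 gate values."""
    f = 0 if x2 == -1 else 1   # forget gate: 0 wipes the old memory (assumed 1 otherwise)
    i = 1 if x2 == 1 else 0    # input gate: 1 lets x1 into memory (assumed 0 otherwise)
    o = 1 if x3 == 1 else 0    # output gate: 1 exposes the memory
    return f, i, o


def gated_step(a_prev, x1, x2, x3):
    """One time step of the toy update: a_t = f_t * a_{t-1} + i_t * x1."""
    f, i, o = gates_from_input(x2, x3)
    a = f * a_prev + i * x1
    return a, o


# Hypothetical sequence: add 3, hold, reset, add 4 and output
a = 0
for x1, x2, x3 in [(3, 1, 0), (5, 0, 1), (2, -1, 0), (4, 1, 1)]:
    a, o = gated_step(a, x1, x2, x3)
    print(f"input=({x1}, {x2}, {x3})  memory={a}  output_gate={o}")
```

Written this way, the if/else rules disappear into a single multiply-and-add. That is the same shape as the real LSTM update, except that a real LSTM learns its gate values instead of reading them off the input.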
Output
Real LSTM:
$$h_t = o_t \cdot \tanh(c_t)$$
Simplified version:
$$y_t = o_t \cdot a_t$$
(No tanh: the memory is output directly.)
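One step closer to a real LSTM: real gates are not hard 0/1 switches but sigmoid outputs between 0 and 1. The sketch below swaps the toy's hard decisions for sigmoids and reruns the example sequence. The constant `SHARPNESS` and the 0.5 offsets are hand-picked by me so that the gates land close to 0 or 1. Nothing here is learned, and a real LSTM computes its gates from learned weights over the input and the previous hidden state.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


SHARPNESS = 20.0  # hand-picked assumption, not a learned weight


def soft_step(a_prev, x1, x2, x3):
    """One toy step with sigmoid gates instead of hard 0/1 switches."""
    f = sigmoid(SHARPNESS * (x2 + 0.5))  # close to 0 only when x2 = -1
    i = sigmoid(SHARPNESS * (x2 - 0.5))  # close to 1 only when x2 = 1
    o = sigmoid(SHARPNESS * (x3 - 0.5))  # close to 1 only when x3 = 1
    a = f * a_prev + i * x1              # same update as the toy equation
    y = o * a                            # toy output, no tanh
    return a, y


a = 0.0
outputs = []
for x1, x2, x3 in [(3, 1, 0), (5, 0, 1), (2, 1, 0), (3, 0, 1)]:
    a, y = soft_step(a, x1, x2, x3)
    outputs.append(y)
    print(f"memory={a:.3f}  output={y:.3f}")
```

With these settings the outputs at steps 2 and 4 come out very close to 3 and 5, the same answers the hard rules give. The difference is that soft gates can also sit somewhere in between, which lets a trained LSTM keep or forget part of its memory.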
Key Insight
An LSTM is not magic. It’s just:
A memory system controlled by decisions
Should I remember something?
Should I forget something?
Should I show the result?
Why This Matters
Understanding this simplified version helps you:
Grasp LSTMs without getting lost in math
Build intuition before diving into deep learning
See how neural networks simulate logical systems
Remember
Once you see an LSTM like this, you stop memorizing formulas and start understanding behavior.
This was one of those moments where a complex concept suddenly felt simple.
If you’re starting with AI/ML, focus on intuition first.
The math will make much more sense later.
If this helped you understand LSTMs better, feel free to share or connect. I’m always happy to learn and discuss more 🚀
