Understanding LSTMs Without the Math

A simple memory system explained using 3 rules

Introduction

When I first learned about LSTMs (Long Short-Term Memory networks), they felt overwhelming.

Equations. Gates. Sigmoid. Tanh.

But everything clicked when I saw a simplified version of an LSTM - one that uses just 3 simple rules.

No heavy math. Just logic. Let’s break it down.

A Simple Memory System (Toy Version)

We are given:

An input vector at each time step: [x1,x2,x3]
A memory state "aₜ"
An output "yₜ"

The 3 Rules (Core Idea)

This system works using just three conditions:

If x2 = 1 → add x1 to memory
If x2 = -1 → reset memory to 0
If x3 = 1 → output current memory

That’s it. If x2 = 0, memory stays as it is, and if x3 = 0, nothing is output.

Think Like This

Imagine memory as a running total:

You add values
You can reset it
You can print it when needed

Example

Let’s take a short sequence:

Time	x1	x2	x3
1	3	1	0
2	5	0	1
3	2	1	0
4	3	0	1
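To see the three rules in action, here is a minimal Python sketch that runs them over the sequence above. The function name `run_toy_memory` is made up for illustration, and two things are assumptions on my part: memory starts at 0, and within a step the memory is updated before it is output (no step in this example does both, so the order does not matter here).

```python
def run_toy_memory(sequence):
    """Apply the three rules to a list of (x1, x2, x3) steps."""
    memory = 0  # assumption: memory starts empty
    outputs = []
    for x1, x2, x3 in sequence:
        if x2 == 1:        # rule 1: add x1 to memory
            memory += x1
        elif x2 == -1:     # rule 2: reset memory
            memory = 0
        # rule 3: output the current memory, otherwise output nothing
        outputs.append(memory if x3 == 1 else None)
    return outputs


sequence = [(3, 1, 0), (5, 0, 1), (2, 1, 0), (3, 0, 1)]
outputs = run_toy_memory(sequence)
print(outputs)  # [None, 3, None, 5]
```

At time 1 the value 3 is stored. At time 2 nothing is added (x2 = 0) and the memory, 3, is output. At time 3 the value 2 is added, so memory becomes 5, and at time 4 that 5 is output.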

From Numbers to Words: How This Relates to Real Sentences

So far, we’ve been working with numbers.

But in real NLP tasks, the inputs are not numbers like 3 or 5.

They are words such as:

["I", "love", "France"]

What Changes in a Real LSTM?

In a real LSTM, each word is first converted into a numeric representation called an embedding.

So instead of:

$$x_1 = 3$$

we now have:

$$x_t = \text{embedding}(\text{"France"})$$

The toy version stores numbers
A real LSTM stores meaning
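As a rough picture of what "converted into an embedding" means, here is a small sketch of a lookup table. The vectors below are invented for illustration only. In a real model they are learned during training and usually have far more than three dimensions.

```python
# Hypothetical 3-dimensional embeddings, made up for this example.
embedding_table = {
    "I": [0.1, -0.3, 0.5],
    "love": [0.7, 0.2, -0.1],
    "France": [-0.4, 0.9, 0.3],
}


def embed(tokens):
    """Replace each word with its vector from the lookup table."""
    return [embedding_table[token] for token in tokens]


sentence = ["I", "love", "France"]
vectors = embed(sentence)

# Each time step now receives a vector instead of a single number.
for t, (token, vector) in enumerate(zip(sentence, vectors), start=1):
    print(t, token, vector)
```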

Our Toy Version vs Real NLP

Toy Version	Real NLP Version
`x1` = a number like 3 or 5	xₜ = embedding of a word like “I”, “love”, “France”
`x2` decides whether to store or reset	learned gates decide what context to keep or forget
`x3` decides whether to output	learned output gate decides what hidden information to expose
memory stores a running value	memory stores context from the sentence

Toy Input vs Sentence Token

Time	Toy Input	Sentence Token	Intuition
1	`[3,1,0]`	`"I"`	low importance
2	`[5,1,0]`	`"love"`	useful context
3	`[2,1,1]`	`"France"`	important context, may influence output

A Simple Comparison

In our toy example

At time step 1:

[3, 1, 0]

This means:

take value 3
store it in memory
do not output yet

Real sentence example

Now suppose the input word at some time step is:

"France"

A real LSTM does not literally see the string "France".

It sees a vector representation of that word and learns whether it should:

store that word’s meaning in memory
forget previous context
use that memory later for prediction

So conceptually, this:

[3, 1, 0]

is playing the same role as:

["France", store, don’t output yet]

Intuition

In the toy version, we add numbers to memory.

In a real LSTM, we add meaning to memory.

One-line takeaway

The toy model stores numbers.
A real LSTM stores context.

What’s Actually Happening?

Even though this looks simple, it mimics how an LSTM works internally.

Mapping This to Real LSTM

A real LSTM has three gates:

Forget Gate
Input Gate
Output Gate

Let’s map our rules

Memory Update

To understand how this simplified system relates to a real LSTM, let’s look at how memory is updated.

Real LSTM:

$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t$$

In our version:

$$a_t = f_t \cdot a_{t-1} + i_t \cdot x_1$$

Gate Mapping

Our Rule	LSTM Equivalent
x₂ = 1 → add	Input Gate open (iₜ = 1), memory kept (fₜ = 1)
x₂ = -1 → reset	Forget Gate closed (fₜ = 0), nothing added (iₜ = 0)
x₃ = 1 → output	Output Gate open (oₜ = 1)
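The mapping can be checked with a few lines of Python. This is only a sketch of the simplified update equation, not a real LSTM: the gates here are hard 0/1 switches read straight from x2, and the helper names and the short input sequence (including the reset at the end) are made up for illustration.

```python
def gates_from_x2(x2):
    """Hypothetical helper: turn the toy control signal x2 into (f, i)."""
    f = 0 if x2 == -1 else 1  # forget gate: 0 wipes memory, 1 keeps it
    i = 1 if x2 == 1 else 0   # input gate: 1 lets x1 into memory
    return f, i


def update_memory(a_prev, x1, x2):
    f, i = gates_from_x2(x2)
    return f * a_prev + i * x1  # a_t = f_t * a_{t-1} + i_t * x1


a = 0  # assumption: memory starts empty
memory_trace = []
for x1, x2 in [(3, 1), (5, 0), (2, 1), (4, -1)]:
    a = update_memory(a, x1, x2)
    memory_trace.append(a)

print(memory_trace)  # [3, 3, 5, 0]
```

The same single equation covers "add", "keep", and "reset". Only the gate values change.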

Output

Real LSTM:

$$h_t = o_t \cdot \tanh(c_t)$$

Simplified version:

$$y_t = o_t \cdot a_t$$

(No tanh: the memory value is output directly.)
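To make the difference between the two output equations concrete, here is a small sketch that evaluates both for a few memory values. The function names are made up for illustration, and the gate is treated as a plain 0 or 1, as in the toy version.

```python
import math


def toy_output(o, a):
    """Simplified version: y_t = o_t * a_t."""
    return o * a


def lstm_output(o, c):
    """Real LSTM output equation: h_t = o_t * tanh(c_t)."""
    return o * math.tanh(c)


# With the gate open (o = 1), the toy version passes memory through unchanged,
# while tanh squashes the value into the range (-1, 1).
for memory in [0.5, 3, 5]:
    print(memory, toy_output(1, memory), round(lstm_output(1, memory), 4))

# With the gate closed (o = 0), both versions output 0.
print(toy_output(0, 5), lstm_output(0, 5))
```

Note that in this equation form, a closed output gate yields 0 rather than "no output", which is how the toy rule "do not output" shows up numerically.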

Key Insight

An LSTM is not magic. It’s just:

A memory system controlled by decisions

Should I remember something?
Should I forget something?
Should I show the result?

Why This Matters

Understanding this simplified version helps you:

Grasp LSTMs without getting lost in math
Build intuition before diving into deep learning
See how neural networks simulate logical systems

Remember

Once you see an LSTM like this, you stop memorizing formulas and start understanding behavior.

This was one of those moments where a complex concept suddenly felt simple.

If you’re starting with AI/ML, focus on intuition first.
The math will make much more sense later.

If this helped you understand LSTMs better, feel free to share or connect. I’m always happy to learn and discuss more 🚀
