
Understanding LSTMs Without the Math


A simple memory system explained using 3 rules


Introduction

When I first learned about LSTMs (Long Short-Term Memory networks), they felt overwhelming.

Equations. Gates. Sigmoid. Tanh.

But everything clicked when I saw a simplified version of an LSTM: one that uses just 3 simple rules.

No heavy math. Just logic. Let’s break it down.


A Simple Memory System (Toy Version)

We are given:

  • An input vector at each time step: [x1,x2,x3]

  • A memory state "aₜ"

  • An output "yₜ"


The 3 Rules (Core Idea)

This system works using just three conditions:

  1. If x2 = 1 → add x1 to memory

  2. If x2 = -1 → reset memory to 0

  3. If x3 = 1 → output current memory

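That’s it. Any other value (such as x2 = 0 or x3 = 0) means “do nothing”: memory carries over unchanged and nothing is output.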

Think Like This

Imagine memory as a running total:

  • You add values

  • You can reset it

  • You can print it when needed
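The three rules fit in a few lines of Python. This is a minimal sketch: the function name `step`, the starting memory of 0, returning `None` for “no output”, and applying the memory update before reading the output are all assumptions made for illustration.

```python
def step(memory, x):
    """Apply the three toy rules to one input vector x = [x1, x2, x3].

    Returns (new_memory, output). output is None when the rules say
    not to show anything (x3 != 1).
    """
    x1, x2, x3 = x
    if x2 == 1:        # Rule 1: add x1 to memory
        memory = memory + x1
    elif x2 == -1:     # Rule 2: reset memory to 0
        memory = 0
    # Any other x2 (such as 0): memory carries over unchanged
    output = memory if x3 == 1 else None  # Rule 3: output current memory
    return memory, output


memory = 0                               # assumption: memory starts empty
memory, out = step(memory, [3, 1, 0])    # add 3, no output
print(memory, out)                       # 3 None
memory, out = step(memory, [4, -1, 1])   # reset, then output
print(memory, out)                       # 0 0
```

Keeping the rules in a single function makes the "running total" idea explicit: the only state carried from one step to the next is `memory`.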


Example

Let’s take a short sequence:

| Time | x1 | x2 | x3 |
|------|----|----|----|
| 1    | 3  | 1  | 0  |
| 2    | 5  | 0  | 1  |
| 3    | 2  | 1  | 0  |
| 4    | 3  | 0  | 1  |
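Stepping through the table one row at a time, assuming memory starts at a₀ = 0 and that x2 = 0 leaves memory unchanged:

$$
\begin{aligned}
t=1:&\quad x_2 = 1 \Rightarrow a_1 = a_0 + 3 = 0 + 3 = 3, && x_3 = 0 \Rightarrow \text{no output} \\
t=2:&\quad x_2 = 0 \Rightarrow a_2 = a_1 = 3, && x_3 = 1 \Rightarrow y_2 = 3 \\
t=3:&\quad x_2 = 1 \Rightarrow a_3 = a_2 + 2 = 5, && x_3 = 0 \Rightarrow \text{no output} \\
t=4:&\quad x_2 = 0 \Rightarrow a_4 = a_3 = 5, && x_3 = 1 \Rightarrow y_4 = 5
\end{aligned}
$$

The x1 values at t = 2 (5) and t = 4 (3) never enter memory, because x2 = 0 at those steps. The system outputs 3 and then 5.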

From Numbers to Words: How This Relates to Real Sentences

So far, we’ve been working with numbers.

But in real NLP tasks, the inputs are not numbers like 3 or 5.

They are words such as:

["I", "love", "France"]

What Changes in a Real LSTM?

In a real LSTM, each word is first converted into a numeric representation called an embedding.

So instead of:

$$x_1 = 3$$

we now have:

$$x_t = \text{embedding}(\text{"France"})$$

In short:

  • The toy version stores numbers

  • A real LSTM stores meaning
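A minimal sketch of the word-to-vector lookup, assuming a hypothetical three-word vocabulary and 4-dimensional vectors filled with random numbers. In a trained model these vectors are learned rather than random; PyTorch, for example, provides this lookup as `torch.nn.Embedding`.

```python
import random

# Hypothetical toy vocabulary and embedding size; a real model learns the vectors.
VOCAB = ["I", "love", "France"]
EMBEDDING_DIM = 4

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Embedding table: one vector of EMBEDDING_DIM numbers per word.
embedding_table = {
    word: [round(rng.uniform(-1.0, 1.0), 2) for _ in range(EMBEDDING_DIM)]
    for word in VOCAB
}


def embedding(word):
    """Look up the vector for a word: this is the x_t an LSTM actually receives."""
    return embedding_table[word]


sentence = ["I", "love", "France"]
inputs = [embedding(w) for w in sentence]  # one x_t per time step
for t, (word, x_t) in enumerate(zip(sentence, inputs), start=1):
    print(t, word, x_t)
```

The LSTM never sees the strings themselves, only the list `inputs`, one vector per time step.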


Our Toy Version vs Real NLP

| Toy Version | Real NLP Version |
|-------------|------------------|
| x1 = a number like 3 or 5 | xₜ = embedding of a word like “I”, “love”, “France” |
| x2 decides whether to store or reset | learned gates decide what context to keep or forget |
| x3 decides whether to output | learned output gate decides what hidden information to expose |
| memory stores a running value | memory stores context from the sentence |

Toy Input ↔ Sentence Token

| Time | Toy Input | Sentence Token | Intuition |
|------|-----------|----------------|-----------|
| 1    | [3,1,0]   | "I"            | low importance |
| 2    | [5,1,0]   | "love"         | useful context |
| 3    | [2,1,1]   | "France"       | important context, may influence output |

A Simple Comparison

In our toy version:

At time step 1:

[3, 1, 0]

This means:

  • take value 3

  • store it in memory

  • do not output yet


Real sentence example

Now take a different sentence, and suppose the input word at time step 1 is:

"France"

A real LSTM does not literally see the string "France".

It sees a vector representation of that word and learns whether it should:

  • store that word’s meaning in memory

  • forget previous context

  • use that memory later for prediction

So conceptually, this:

[3, 1, 0]

is playing the same role as:

["France", store, don’t output yet]

Intuition

In the toy version, we add numbers to memory.

In a real LSTM, we add meaning to memory.


One-line takeaway

The toy model stores numbers.
A real LSTM stores context.


What’s Actually Happening?

Even though this looks simple, it mimics how an LSTM works internally.


Mapping This to Real LSTM

A real LSTM has three gates:

  • Forget Gate

  • Input Gate

  • Output Gate

Let’s map our rules onto these gates.


Memory Update

To understand how this simplified system relates to a real LSTM, let’s look at how memory is updated.

Real LSTM:

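$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t$$

where fₜ is the forget gate, iₜ the input gate, and c̃ₜ the candidate memory computed from the current input and the previous hidden state (all products are element-wise).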

In our version:

$$a_t = f_t \cdot a_{t-1} + i_t \cdot x_1$$

Here aₜ plays the role of cₜ, and the raw input x₁ stands in for the candidate memory c̃ₜ.
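Plugging hard 0/1 gate values into the toy update recovers the three rules. The gate values for each case are an assumption chosen to match the rules: fₜ = 1, iₜ = 1 when x₂ = 1; fₜ = 0, iₜ = 0 when x₂ = -1; and fₜ = 1, iₜ = 0 when x₂ = 0.

$$
\begin{aligned}
x_2 = 1 &: \quad a_t = 1 \cdot a_{t-1} + 1 \cdot x_1 = a_{t-1} + x_1 && \text{(add)} \\
x_2 = -1 &: \quad a_t = 0 \cdot a_{t-1} + 0 \cdot x_1 = 0 && \text{(reset)} \\
x_2 = 0 &: \quad a_t = 1 \cdot a_{t-1} + 0 \cdot x_1 = a_{t-1} && \text{(keep)}
\end{aligned}
$$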


Gate Mapping

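| Our Rule | LSTM Equivalent |
|----------|-----------------|
| x₂ = 1 → add | Input Gate open (iₜ = 1), old memory kept (fₜ = 1) |
| x₂ = -1 → reset | Forget Gate clears memory (fₜ = 0), nothing added (iₜ = 0) |
| x₂ = 0 → keep | Memory carried over (fₜ = 1), nothing added (iₜ = 0) |
| x₃ = 1 → output | Output Gate open (oₜ = 1) |

In a real LSTM the gates are not hard 0/1 switches: each one is a sigmoid output between 0 and 1, learned from data, so the network can keep or add information partially.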
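The same toy system can be written in gate form. In this sketch the helper names `gates` and `run` are hypothetical, and the hard 0/1 gate values for each x₂, x₃ combination are an assumed mapping that reproduces the three rules. Because yₜ = oₜ · aₜ, “no output” shows up as 0 rather than as a missing value.

```python
def gates(x2, x3):
    """Map the toy control signals x2, x3 to hard 0/1 gate values (assumed mapping)."""
    f = 0 if x2 == -1 else 1   # forget gate: 0 wipes memory on reset
    i = 1 if x2 == 1 else 0    # input gate: 1 lets x1 into memory
    o = 1 if x3 == 1 else 0    # output gate: 1 exposes memory
    return f, i, o


def run(sequence, a=0):
    """Apply a_t = f*a_{t-1} + i*x1 and y_t = o*a_t over a list of [x1, x2, x3]."""
    memories, outputs = [], []
    for x1, x2, x3 in sequence:
        f, i, o = gates(x2, x3)
        a = f * a + i * x1   # memory update
        y = o * a            # output (no tanh in the toy version)
        memories.append(a)
        outputs.append(y)
    return memories, outputs


example = [[3, 1, 0], [5, 0, 1], [2, 1, 0], [3, 0, 1]]
memories, outputs = run(example)
print(memories)  # [3, 3, 5, 5]
print(outputs)   # [0, 3, 0, 5]
```

No `if` statement touches the memory directly here: every step uses the same update formula, and only the gate values change. That is the structural idea an LSTM shares with this toy.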

Output

Real LSTM:

$$h_t = o_t \cdot \tanh(c_t)$$

Simplified version:

$$y_t = o_t \cdot a_t$$

(No tanh: the memory is output directly.)
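For contrast, here is a minimal sketch of one real LSTM step using single numbers instead of vectors. The weights in `W` are hypothetical values picked by hand for illustration; a trained network learns them, and real cells work on vectors with element-wise products. The gates are sigmoid outputs between 0 and 1, the candidate memory uses tanh, and the final tanh keeps hₜ between -1 and 1.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, W):
    """One step of a standard LSTM cell with scalar input, memory and hidden state.

    W maps each part ("f", "i", "o", "c") to (w_x, w_h, b): hypothetical weights.
    """
    def pre(part):
        w_x, w_h, b = W[part]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))          # forget gate, between 0 and 1
    i = sigmoid(pre("i"))          # input gate
    o = sigmoid(pre("o"))          # output gate
    c_tilde = math.tanh(pre("c"))  # candidate memory, between -1 and 1
    c = f * c_prev + i * c_tilde   # c_t = f_t * c_{t-1} + i_t * c~_t
    h = o * math.tanh(c)           # h_t = o_t * tanh(c_t)
    return h, c


# Hypothetical weights picked by hand; a trained LSTM learns these.
W = {
    "f": (0.5, 0.1, 1.0),
    "i": (1.0, 0.2, 0.0),
    "o": (0.8, -0.3, 0.0),
    "c": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
hs, cs = [], []
for x in [3.0, 5.0, 2.0, 3.0]:   # reuse the x1 values from the toy example
    h, c = lstm_step(x, h, c, W)
    hs.append(h)
    cs.append(c)
    print(f"x={x}  c={c:.3f}  h={h:.3f}")
```

The structure matches the toy update line for line; the differences are that the gates are computed from the input and previous hidden state instead of being given as x2 and x3, and that they take values between 0 and 1.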

Key Insight

An LSTM is not magic. It’s just a memory system controlled by decisions:

  • Should I remember something?

  • Should I forget something?

  • Should I show the result?


Why This Matters

Understanding this simplified version helps you:

  • Grasp LSTMs without getting lost in math

  • Build intuition before diving into deep learning

  • See how a neural network’s gates can implement simple logical rules


Remember

Once you see an LSTM like this, you stop memorizing formulas and start understanding behavior.

This was one of those moments where a complex concept suddenly felt simple.

If you’re starting with AI/ML, focus on intuition first.
The math will make much more sense later.

If this helped you understand LSTMs better, feel free to share or connect. I’m always happy to learn and discuss more 🚀
