Let’s do this…
We all know LSTMs are super powerful, so we should know how they work and how to use them.
The gates are defined as:
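For reference, here is a sketch of the usual formulation of an LSTM cell without peephole connections. It assumes an input activation $a_t$, an input gate $i_t$, a forget gate $f_t$, an output gate $o_t$, the logistic sigmoid $\sigma$, and the element-wise product $\odot$:

$$
\begin{aligned}
a_t &= \tanh(W_a x_t + U_a out_{t-1} + b_a) \\
i_t &= \sigma(W_i x_t + U_i out_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f out_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o out_{t-1} + b_o) \\
state_t &= a_t \odot i_t + f_t \odot state_{t-1} \\
out_t &= \tanh(state_t) \odot o_t
\end{aligned}
$$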
Note for simplicity we define:
Which leads to:
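The four gates can be stacked in the order $a, i, f, o$, so that $gates_t = [a_t; i_t; f_t; o_t]$ and $W = [W_a; W_i; W_f; W_o]$, with $U$ and $b$ stacked the same way. Under that assumption, one time-step of the backward pass can be sketched as follows. Here $\Delta_t$ is the derivative of the loss with respect to $out_t$, and $\Delta out_t$ is the gradient that arrives from step $t+1$ through the recurrence. The terms $\delta a_t, \delta i_t, \delta f_t, \delta o_t$ are gradients with respect to the gate pre-activations, and they stack into $\delta gates_t = [\delta a_t; \delta i_t; \delta f_t; \delta o_t]$:

$$
\begin{aligned}
\delta out_t &= \Delta_t + \Delta out_t \\
\delta state_t &= \delta out_t \odot o_t \odot (1 - \tanh^2(state_t)) + \delta state_{t+1} \odot f_{t+1} \\
\delta a_t &= \delta state_t \odot i_t \odot (1 - a_t^2) \\
\delta i_t &= \delta state_t \odot a_t \odot i_t \odot (1 - i_t) \\
\delta f_t &= \delta state_t \odot state_{t-1} \odot f_t \odot (1 - f_t) \\
\delta o_t &= \delta out_t \odot \tanh(state_t) \odot o_t \odot (1 - o_t) \\
\delta x_t &= W^T \cdot \delta gates_t \\
\Delta out_{t-1} &= U^T \cdot \delta gates_t
\end{aligned}
$$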
The final updates to the internal parameters are computed as:
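For a sequence $x_0, \dots, x_T$, the usual backpropagation-through-time accumulation adds up each step's contribution. In the sketch below, $\delta gates_t = [\delta a_t; \delta i_t; \delta f_t; \delta o_t]$ holds the gradients of the gate pre-activations at step $t$, $\otimes$ is the outer product, and $W$, $U$, $b$ are the stacked parameters:

$$
\begin{aligned}
\delta W &= \sum_{t=0}^{T} \delta gates_t \otimes x_t \\
\delta U &= \sum_{t=0}^{T-1} \delta gates_{t+1} \otimes out_t \\
\delta b &= \sum_{t=0}^{T} \delta gates_t
\end{aligned}
$$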
Putting this all together, we can begin…
Let us begin by defining our internal weights:
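Stacking the per-gate parameters into a single input matrix $W$ (4×2), a recurrent matrix $U$ (4×1) and a bias vector $b$ lets one matrix-vector product produce all four gate pre-activations at once. The sketch below is a minimal illustration under several assumptions: a 2-dimensional input, a single hidden unit, and the gate order a, i, f, o. All numbers are hypothetical placeholders, not the weights used in this walkthrough, and `matvec` is a helper defined only for illustration.

```python
# Hypothetical per-gate parameters for a cell with 2 inputs and 1 hidden unit.
# These are placeholders, not the weights used in this walkthrough.
W_a, W_i, W_f, W_o = [0.5, 0.2], [0.9, 0.7], [0.6, 0.4], [0.3, 0.8]
U_a, U_i, U_f, U_o = [0.1], [0.6], [0.2], [0.3]
b_a, b_i, b_f, b_o = 0.1, 0.5, 0.2, 0.05

# Stack them in the gate order a, i, f, o.
W = [W_a, W_i, W_f, W_o]    # 4 x 2
U = [U_a, U_i, U_f, U_o]    # 4 x 1
b = [b_a, b_i, b_f, b_o]    # length 4


def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


x0 = [1.0, 0.5]             # hypothetical first input
out_prev = [0.0]            # out_{-1} = 0
# W x_0 + U out_{-1} + b: pre-activations of a, i, f, o in one pass.
pre = [wx + ux + bk for wx, ux, bk in zip(matvec(W, x0), matvec(U, out_prev), b)]
print(pre)
```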
And now input data:
I’m using a sequence length of two here to demonstrate how RNNs are unrolled over time.
From here, we can pass forward our state and output and begin the next time-step.
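The forward pass just described, in which each time-step's state and output feed into the next, can be sketched in Python as follows. This minimal sketch makes a few assumptions. The cell has a 2-dimensional input and a single hidden unit, so each gate is a scalar, and the initial state and output are zero. The weights and inputs are hypothetical placeholders, not the numbers used here, and `lstm_step` is an illustrative name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, out_prev, state_prev, W, U, b):
    """One forward step of a single-unit LSTM cell, gates ordered a, i, f, o.

    W is a list of 4 weight rows (one per gate); U and b are lists of 4 scalars.
    """
    pre = [sum(w * v for w, v in zip(W[k], x)) + U[k] * out_prev + b[k]
           for k in range(4)]
    a = math.tanh(pre[0])                       # input activation
    i, f, o = (sigmoid(p) for p in pre[1:])     # input, forget, output gates
    state = a * i + f * state_prev
    out = math.tanh(state) * o
    return out, state, (a, i, f, o)


# Hypothetical parameters and inputs (placeholders only).
W = [[0.5, 0.2], [0.9, 0.7], [0.6, 0.4], [0.3, 0.8]]
U = [0.1, 0.6, 0.2, 0.3]
b = [0.1, 0.5, 0.2, 0.05]
xs = [[1.0, 0.5], [0.2, 1.5]]

out, state = 0.0, 0.0          # out_{-1} = state_{-1} = 0
outs = []
for t, x in enumerate(xs):
    # The output and state of step t become the inputs of step t + 1.
    out, state, gates = lstm_step(x, out, state, W, U, b)
    outs.append(out)
    print(f"t={t}: out={out:.5f}, state={state:.5f}")
```

Returning the gate values alongside the state and output is a convenience: the backward pass needs them, so a fuller implementation would cache them per step.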
And since we’ve finished our sequence, we have everything we need to begin backpropagating.
First, we’ll need to compute how far the output is from the expected value (the label).
Note that for this we’ll be using the L2 loss: $E(out_t, label_t) = \frac{1}{2}(out_t - label_t)^2$. The derivative w.r.t. $out_t$ is $\Delta_t = out_t - label_t$.
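Here is a minimal sketch of this loss and its gradient. It assumes the ½ scaling, so the derivative is simply the difference. `l2_loss` and `l2_loss_grad` are illustrative names, and the numbers are hypothetical. A central finite difference confirms the analytic derivative.

```python
def l2_loss(out, label):
    """E = 1/2 * sum((out - label)^2) over the output units."""
    return 0.5 * sum((o - y) ** 2 for o, y in zip(out, label))


def l2_loss_grad(out, label):
    """dE/d(out) = out - label, one entry per output unit."""
    return [o - y for o, y in zip(out, label)]


out, label = [0.75], [0.5]                 # hypothetical output and label
print(l2_loss(out, label))                 # 0.03125
print(l2_loss_grad(out, label))            # [0.25]

# Central finite difference as a sanity check on the derivative.
eps = 1e-6
numeric = (l2_loss([0.75 + eps], label) - l2_loss([0.75 - eps], label)) / (2 * eps)
```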
At the last time-step, $\Delta out_t = 0$ because there are no future time-steps.
Now we can pass back our $\Delta out_{t-1}$ and continue computing the previous time-step…
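The backward pass can be sketched as below, under the same assumptions: a single hidden unit, a 2-dimensional input, gates ordered a, i, f, o, an L2 loss with ½ scaling, and hypothetical placeholder numbers. It walks the time-steps in reverse. It starts with $\Delta out = 0$ and a zero future-state gradient at the last step, and it hands $\Delta out_{t-1} = U^T \cdot \delta gates_t$ back to the previous step. Because a single slip ruins everything downstream, the sketch also checks the bias gradients $\sum_t \delta gates_t$ against central finite differences of the loss. `forward`, `backward` and `loss` are illustrative names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(xs, W, U, b):
    """Run a single-unit LSTM over xs, caching what the backward pass needs."""
    out, state = 0.0, 0.0                      # out_{-1} = state_{-1} = 0
    cache = []
    for x in xs:
        pre = [sum(w * v for w, v in zip(W[k], x)) + U[k] * out + b[k]
               for k in range(4)]
        a = math.tanh(pre[0])
        i, f, o = (sigmoid(p) for p in pre[1:])
        new_state = a * i + f * state
        new_out = math.tanh(new_state) * o
        cache.append((x, out, state, a, i, f, o, new_state, new_out))
        out, state = new_out, new_state
    return cache


def loss(xs, labels, W, U, b):
    """Sum over time of the L2 loss 1/2 * (out_t - label_t)^2."""
    return sum(0.5 * (step[-1] - y) ** 2
               for step, y in zip(forward(xs, W, U, b), labels))


def backward(cache, labels, U):
    """Return delta gates_t = [da, di, df, do] (pre-activation grads) per step."""
    d_gates = [None] * len(cache)
    d_out_future = 0.0          # Delta out = 0 at the last step: no future steps
    d_state_next, f_next = 0.0, 0.0
    for t in reversed(range(len(cache))):
        _, _, state_prev, a, i, f, o, state, out = cache[t]
        d_out = (out - labels[t]) + d_out_future          # Delta_t + Delta out_t
        tanh_s = math.tanh(state)
        d_state = d_out * o * (1 - tanh_s ** 2) + d_state_next * f_next
        d_gates[t] = [
            d_state * i * (1 - a ** 2),                   # delta a_t
            d_state * a * i * (1 - i),                    # delta i_t
            d_state * state_prev * f * (1 - f),           # delta f_t
            d_out * tanh_s * o * (1 - o),                 # delta o_t
        ]
        # Delta out_{t-1} = U^T . delta gates_t, handed to the previous step.
        d_out_future = sum(U[k] * d_gates[t][k] for k in range(4))
        d_state_next, f_next = d_state, f
    return d_gates


# Hypothetical parameters, inputs and labels (placeholders only).
W = [[0.5, 0.2], [0.9, 0.7], [0.6, 0.4], [0.3, 0.8]]
U = [0.1, 0.6, 0.2, 0.3]
b = [0.1, 0.5, 0.2, 0.05]
xs, labels = [[1.0, 0.5], [0.2, 1.5]], [0.3, 0.7]

d_gates = backward(forward(xs, W, U, b), labels, U)

# Gradient check: delta b = sum_t delta gates_t should match finite differences.
eps = 1e-6
checks = []
for k in range(4):
    b_plus, b_minus = b[:], b[:]
    b_plus[k] += eps
    b_minus[k] -= eps
    numeric = (loss(xs, labels, W, U, b_plus) - loss(xs, labels, W, U, b_minus)) / (2 * eps)
    analytic = sum(g[k] for g in d_gates)
    checks.append((k, analytic, numeric))
    print(f"b[{k}]: analytic={analytic:.8f} numeric={numeric:.8f}")
```

A finite-difference check like this is cheap for a cell this small. It catches exactly the kind of sequential error that would otherwise propagate silently through every later step.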
And we’re done with the backward step!
Now we’ll need to update our internal parameters according to whatever solving algorithm you’ve chosen. I’m going to use a simple Stochastic Gradient Descent (SGD) update with learning rate $\lambda$.
We’ll need to compute how much our weights are going to change by:
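The sums for $\delta W$, $\delta U$ and $\delta b$ can be accumulated with a short loop. This sketch assumes a single hidden unit and gates ordered a, i, f, o, and it assumes the per-step gate gradients have already been computed. `param_grads` is a hypothetical helper, and the numbers in the usage lines are placeholders.

```python
def param_grads(d_gates, xs, outs):
    """Accumulate dW, dU and db for a single-unit cell (gates ordered a, i, f, o).

    d_gates[t]: the four pre-activation gradients at step t
    xs[t]:      the input vector at step t
    outs[t]:    the cell output at step t
    """
    n_in = len(xs[0])
    dW = [[0.0] * n_in for _ in range(4)]
    dU = [0.0] * 4
    db = [0.0] * 4
    for t, g in enumerate(d_gates):
        for k in range(4):
            for j in range(n_in):
                dW[k][j] += g[k] * xs[t][j]      # delta gates_t (outer) x_t
            if t > 0:
                dU[k] += g[k] * outs[t - 1]      # delta gates_t (outer) out_{t-1}
            db[k] += g[k]                        # delta gates_t
    return dW, dU, db


# Placeholder gate gradients, inputs and outputs for a two-step sequence.
d_gates = [[0.1, -0.2, 0.0, 0.3], [0.05, 0.1, -0.1, 0.2]]
xs = [[1.0, 0.5], [0.2, 1.5]]
outs = [0.4, 0.6]
dW, dU, db = param_grads(d_gates, xs, outs)
print(dW, dU, db)
```

Skipping $t = 0$ for $\delta U$ reflects the assumption that $out_{-1} = 0$, so the first step contributes nothing to the recurrent weights.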
And updating our parameters based on the SGD update function, $W^{new} = W^{old} - \lambda \cdot \delta W$ (and likewise for $U$ and $b$), we get our new weight set:
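Here is a minimal sketch of the SGD rule $\theta \leftarrow \theta - \lambda \cdot \delta\theta$ applied to parameters stored as plain Python numbers or nested lists. `sgd_update` is a hypothetical helper, and the learning rate and values below are placeholders.

```python
def sgd_update(param, grad, lr):
    """Return param - lr * grad, recursing into nested lists (vectors, matrices)."""
    if isinstance(param, list):
        return [sgd_update(p, g, lr) for p, g in zip(param, grad)]
    return param - lr * grad


lr = 0.05                                   # hypothetical learning rate (lambda)
W = [[0.5, 0.2], [0.9, 0.7]]                # placeholder weights
dW = [[0.11, 0.4], [-0.2, 0.0]]             # placeholder gradients
W_new = sgd_update(W, dW, lr)
b_new = sgd_update(0.1, 0.5, lr)            # scalars work too
print(W_new, b_new)
```

Because the function recurses on lists and returns new values, the same call can update $W$, $U$ and $b$ whatever their shape, and the old parameters stay intact for comparison.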
And that completes one iteration of solving an LSTM cell!
Of course, this whole process is sequential in nature, and a small error will render all subsequent calculations useless, so if you catch ANYTHING, email me at firstname.lastname@example.org