Demystifying Neural Networks
All you need is a piece of paper and a pencil
“If you want to change the world, pick up a pen and write.” — Martin Luther
I will explain what neural networks do with a really simple example.
I will use this basic neural network architecture:
I will initialize the weights, biases, training inputs and target output with the following values:
The whole exercise consists of adjusting the values of the weights w1, w2, w3, w4, w5 and w6 so that we obtain the output we want, o = 1, for the given inputs x1 = .10 and x2 = .20.
Hidden neurons h1 and h2 and output neuron o each compute two functions: Net (a linear regression function) and Act (an activation function). I will use the logistic function as the activation function for all neurons.
These are the formulas for both functions:
Net = w * x + b   (linear regression function)
Act = Sigmoid(Net)   (logistic function: 1 / (1 + exp(-x)))
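These two formulas can be sketched in a few lines of Python (a minimal sketch; the helper names `sigmoid` and `net` are my own):

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def net(weights, inputs, bias):
    # Linear regression part: weighted sum of the inputs plus a bias
    return sum(w * x for w, x in zip(weights, inputs)) + bias
```

For example, `sigmoid(0.0)` returns 0.5, and `net([.30, .40], [.10, .20], .15)` returns 0.26.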
Phase I: Forward Pass, or Computing the Output
Let’s compute our first output using initialization values:
Neth1 = w1 * x1 + w2 * x2 + b1   (linear regression function)
Acth1 = Sigmoid(Neth1)   (logistic function)
Neth2 = w3 * x1 + w4 * x2 + b1
Acth2 = Sigmoid(Neth2)
Neto = w5 * Acth1 + w6 * Acth2 + b2
Acto = Sigmoid(Neto)
Sigmoid is the logistic function that squashes the output: 1 / (1 + exp(-x))
Neth1 = .30 * .10 + .40 * .20 + .15
Acth1 = Sigmoid(.26)
Acth1 = .5646363
Neth2 = .50 * .10 + .60 * .20 + .15
Acth2 = Sigmoid(.32)
Acth2 = .5793242
Neto = .70 * .5646363 + .80 * .5793242 + .25
Acto = Sigmoid(1.1087048)
Acto = .7518875 (vs 1.00 target)
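The forward pass above can be reproduced with a short script (variable names are my own):

```python
import math

def sigmoid(x):
    # Logistic activation: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

# Initialization values from the article
x1, x2 = 0.10, 0.20
w1, w2, w3, w4, w5, w6 = 0.30, 0.40, 0.50, 0.60, 0.70, 0.80
b1, b2 = 0.15, 0.25

# Hidden layer
act_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)   # Neth1 = 0.26
act_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)   # Neth2 = 0.32

# Output layer
act_o = sigmoid(w5 * act_h1 + w6 * act_h2 + b2)   # ~0.7518875 vs the 1.00 target
```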
Phase II: Computing the Cost
I will be using the popular squared error function for calculating the Total Cost.
Etotal = .5 * (target - output)**2
Etotal = .5 * (1 - .7518875)**2
Etotal = .0307799
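The cost is a one-liner:

```python
target = 1.0
output = 0.7518875   # Acto from Phase I

# Squared error: half the squared difference between target and output
e_total = 0.5 * (target - output) ** 2   # ~0.03078
```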
Phase III: Backpropagation for the Output Layer
We use partial derivatives of Total Error with respect to each weight in order to know how much a change in each weight affects the Total Error.
∂Etotal/∂w5 = ∂Etotal/∂Acto * ∂Acto/∂Neto * ∂Neto/∂w5   (chain rule)
∂Etotal/∂Acto = .5 * 2 * (target - output) * -1
∂Etotal/∂Acto = .5 * 2 * (1 - .7518875) * -1
∂Etotal/∂Acto = -.2481124
∂Acto/∂Neto = output * (1 — output)
∂Acto/∂Neto = .7518875 * (1 — .7518875)
∂Acto/∂Neto = .1865526
∂Neto/∂w5 = Acth1
∂Neto/∂w5 = .5646363
∂Etotal/∂w5 = -.2481124 * .1865526 * .5646363
∂Etotal/∂w5 = -.0261347
In order to decrease the error, we subtract this value, multiplied by a learning rate (which I will set to 0.5), from the current weight.
w5+ = w5 - α * ∂Etotal/∂w5
w5+ = .70 - .5 * -.0261347
w5+ = .7130673
Repeating this process we get the new weight w6:
w6+ = .8134073
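The output-layer updates can be written out as follows (a sketch; variable names are my own, and the values carried over from Phase I are rounded as in the article):

```python
# Values from the forward pass
act_h1, act_h2 = 0.5646363, 0.5793242
act_o = 0.7518875
target = 1.0
w5, w6 = 0.70, 0.80
alpha = 0.5   # learning rate

# Chain-rule pieces
dE_dActo = -(target - act_o)          # ~ -0.2481125
dActo_dNeto = act_o * (1 - act_o)     # ~  0.1865526
dE_dNeto = dE_dActo * dActo_dNeto     # ~ -0.0462860

# ∂Neto/∂w5 = Acth1 and ∂Neto/∂w6 = Acth2
dE_dw5 = dE_dNeto * act_h1
dE_dw6 = dE_dNeto * act_h2

# Gradient-descent step
w5_new = w5 - alpha * dE_dw5          # ~ 0.7130673
w6_new = w6 - alpha * dE_dw6          # ~ 0.8134073
```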
Phase IV: Backpropagation for the Hidden Layer
∂Etotal/∂w1 = ∂Etotal/∂Acth1 * ∂Acth1/∂Neth1 * ∂Neth1/∂w1   (chain rule)
∂Etotal/∂Acth1 = ∂Etotal/∂Neto * ∂Neto/∂Acth1
∂Etotal/∂Neto = ∂Etotal/∂Acto * ∂Acto/∂Neto
∂Etotal/∂Neto = -.2481124 * .1865526
∂Etotal/∂Neto = -.0462860
∂Etotal/∂Acth1 = -.0462860 * ∂Neto/∂Acth1
∂Etotal/∂Acth1 = -.0462860 * w5
∂Etotal/∂Acth1 = -.0462860 * .70
∂Etotal/∂Acth1 = -.0324002
∂Acth1/∂Neth1 = Acth1 * (1 - Acth1)
∂Acth1/∂Neth1 = .5646363 * (1 - .5646363)
∂Acth1/∂Neth1 = .2458221
∂Neth1/∂w1 = x1
∂Neth1/∂w1 = .10
∂Etotal/∂w1 = -.0324002 * .2458221 * .10
∂Etotal/∂w1 = -.0007964
w1+ = w1 - α * ∂Etotal/∂w1
w1+ = .30 - .5 * -.0007964
w1+ = .3003982
Repeating this process we get the new weights w2, w3 and w4:
w2+ = .4007964
w3+ = .5004512
w4+ = .6009024
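The four hidden-layer updates follow the same pattern (again a sketch with my own variable names; note that the original, pre-update w5 and w6 are used here, not the new ones):

```python
x1, x2 = 0.10, 0.20
act_h1, act_h2 = 0.5646363, 0.5793242
w5, w6 = 0.70, 0.80           # original (pre-update) values
dE_dNeto = -0.0462860          # carried over from the output-layer step
alpha = 0.5

# Error signal reaching each hidden neuron
dE_dActh1 = dE_dNeto * w5
dE_dActh2 = dE_dNeto * w6

# Local derivative of the logistic activation
dActh1_dNeth1 = act_h1 * (1 - act_h1)
dActh2_dNeth2 = act_h2 * (1 - act_h2)

# ∂Neth/∂w is the corresponding input x1 or x2
w1_new = 0.30 - alpha * dE_dActh1 * dActh1_dNeth1 * x1   # ~ 0.3003982
w2_new = 0.40 - alpha * dE_dActh1 * dActh1_dNeth1 * x2   # ~ 0.4007964
w3_new = 0.50 - alpha * dE_dActh2 * dActh2_dNeth2 * x1   # ~ 0.5004512
w4_new = 0.60 - alpha * dE_dActh2 * dActh2_dNeth2 * x2   # ~ 0.6009024
```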
Now it's time to go back to Phase I and compute our new output and new total error using the new weights:
Acto = .7547169 (vs 1.00 target)
Etotal = .0300818
You might think this is not a big change, but after repeating this process 5,000 times you will get the following results:
Acto = .9830802 (vs 1.00 target)
Etotal = .0001431
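The whole procedure can be condensed into one short training loop (a sketch assuming, as in the walkthrough above, that the biases stay fixed and only the six weights are updated):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2, target = 0.10, 0.20, 1.0
w1, w2, w3, w4, w5, w6 = 0.30, 0.40, 0.50, 0.60, 0.70, 0.80
b1, b2 = 0.15, 0.25
alpha = 0.5

for _ in range(5000):
    # Phase I: forward pass
    act_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    act_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    act_o = sigmoid(w5 * act_h1 + w6 * act_h2 + b2)

    # Phase III: output-layer gradient
    dE_dNeto = -(target - act_o) * act_o * (1 - act_o)
    dE_dw5 = dE_dNeto * act_h1
    dE_dw6 = dE_dNeto * act_h2

    # Phase IV: hidden-layer gradients (using the pre-update w5 and w6)
    dE_dNeth1 = dE_dNeto * w5 * act_h1 * (1 - act_h1)
    dE_dNeth2 = dE_dNeto * w6 * act_h2 * (1 - act_h2)

    # Gradient-descent updates
    w5 -= alpha * dE_dw5
    w6 -= alpha * dE_dw6
    w1 -= alpha * dE_dNeth1 * x1
    w2 -= alpha * dE_dNeth1 * x2
    w3 -= alpha * dE_dNeth2 * x1
    w4 -= alpha * dE_dNeth2 * x2

# Phase II: after 5,000 iterations the output approaches the 1.0 target
e_total = 0.5 * (target - act_o) ** 2
```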
Neural networks are no longer a mystery!