Demystifying Neural Networks
All you need is a piece of paper and a pencil
“If you want to change the world, pick up a pen and write.” — Martin Luther
I will explain what neural networks do with a really simple example.
I will use this basic neural network architecture:
I will initialize the weights, biases, training inputs and target output with the following values:
The whole exercise consists of adjusting the values of the weights w1, w2, w3, w4, w5 and w6 so that we obtain the output we want, o = 1, for the given inputs x1 = .10 and x2 = .20.
Hidden neurons h1 and h2 and output neuron o each compute two functions: Net (a linear regression function) and Act (an activation function). I will use the logistic function as the activation function for all neurons.
These are the formulas for both functions:
Net = w * x + b   (linear regression function)
Act = Sigmoid(Net)   (logistic function: 1 / (1 + exp(-x)))
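These two formulas can be sketched in a few lines of Python (a minimal sketch; the helper names `sigmoid` and `net` are my own):

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def net(weights, inputs, bias):
    # Linear regression part: weighted sum of the inputs plus a bias
    return sum(w * x for w, x in zip(weights, inputs)) + bias
```

For example, `sigmoid(0.0)` returns 0.5, and `net([.30, .40], [.10, .20], .15)` returns 0.26.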
Phase I: Forward Pass, or Computing the Output
Let’s compute our first output using initialization values:
Neth1 = w1 * x1 + w2 * x2 + b1   (linear regression function)
Acth1 = Sigmoid(Neth1)   (logistic function)
Neth2 = w3 * x1 + w4 * x2 + b1
Acth2 = Sigmoid(Neth2)
Neto = w5 * Acth1 + w6 * Acth2 + b2
Acto = Sigmoid(Neto)
Sigmoid is the logistic function that squashes the output: 1 / (1 + exp(-x))
Neth1 = .30 * .10 + .40 * .20 + .15
Acth1 = Sigmoid(.26)
Acth1 = .5646363
Neth2 = .50 * .10 + .60 * .20 + .15
Acth2 = Sigmoid(.32)
Acth2 = .5793242
Neto = .70 * .5646363 + .80 * .5793242 + .25
Acto = Sigmoid(1.1087048)
Acto = .7518875 (vs 1.00 target)
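The forward pass above can be reproduced with a short script (variable names are my own):

```python
import math

def sigmoid(x):
    # Logistic activation: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

# Initialization values from the article
x1, x2 = 0.10, 0.20
w1, w2, w3, w4, w5, w6 = 0.30, 0.40, 0.50, 0.60, 0.70, 0.80
b1, b2 = 0.15, 0.25

# Hidden layer
act_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)   # Neth1 = 0.26
act_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)   # Neth2 = 0.32

# Output layer
act_o = sigmoid(w5 * act_h1 + w6 * act_h2 + b2)   # ~0.7518875 vs the 1.00 target
```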
Phase II: Computing the Cost
I will be using the popular squared error function for calculating the Total Cost.
Etotal = .5 * (target - output)**2
Etotal = .5 * (1 - .7518875)**2
Etotal = .0307799
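The cost is a one-liner:

```python
target = 1.0
output = 0.7518875   # Acto from Phase I

# Squared error: half the squared difference between target and output
e_total = 0.5 * (target - output) ** 2   # ~0.03078
```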
Phase III: Backpropagation for the Output Layer
We use partial derivatives of Total Error with respect to each weight in order to know how much a change in each weight affects the Total Error.
∂Etotal/∂w5 = ∂Etotal/∂Acto * ∂Acto/∂Neto * ∂Neto/∂w5   (chain rule)
∂Etotal/∂Acto = .5 * 2 * (target - output) * -1
∂Etotal/∂Acto = .5 * 2 * (1 - .7518875) * -1
∂Etotal/∂Acto = -.2481124
∂Acto/∂Neto = output * (1 — output)
∂Acto/∂Neto = .7518875 * (1 — .7518875)
∂Acto/∂Neto = .1865526
∂Neto/∂w5 = Acth1
∂Neto/∂w5 = .5646363
∂Etotal/∂w5 = -.2481124 * .1865526 * .5646363
∂Etotal/∂w5 = -.0261347
In order to decrease the error, we subtract this value, multiplied by a learning rate (which I will set to 0.5), from the current weight.
w5+ = w5 - α * ∂Etotal/∂w5
w5+ = .70 - .5 * -.0261347
w5+ = .7130673
Repeating this process we get the new weight w6:
w6+ = .8134073
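The output-layer updates can be written out as follows (a sketch; variable names are my own, and the values carried over from Phase I are rounded as in the article):

```python
# Values from the forward pass
act_h1, act_h2 = 0.5646363, 0.5793242
act_o = 0.7518875
target = 1.0
w5, w6 = 0.70, 0.80
alpha = 0.5   # learning rate

# Chain-rule pieces
dE_dActo = -(target - act_o)          # ~ -0.2481125
dActo_dNeto = act_o * (1 - act_o)     # ~  0.1865526
dE_dNeto = dE_dActo * dActo_dNeto     # ~ -0.0462860

# ∂Neto/∂w5 = Acth1 and ∂Neto/∂w6 = Acth2
dE_dw5 = dE_dNeto * act_h1
dE_dw6 = dE_dNeto * act_h2

# Gradient-descent step
w5_new = w5 - alpha * dE_dw5          # ~ 0.7130673
w6_new = w6 - alpha * dE_dw6          # ~ 0.8134073
```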
Phase IV: Backpropagation for the Hidden Layer
∂Etotal/∂w1 = ∂Etotal/∂Acth1 * ∂Acth1/∂Neth1 * ∂Neth1/∂w1   (chain rule)
∂Etotal/∂Acth1 = ∂Etotal/∂Neto * ∂Neto/∂Acth1
∂Etotal/∂Neto = ∂Etotal/∂Acto * ∂Acto/∂Neto
∂Etotal/∂Neto = -.2481124 * .1865526
∂Etotal/∂Neto = -.0462860
∂Etotal/∂Acth1 = -.0462860 * ∂Neto/∂Acth1
∂Etotal/∂Acth1 = -.0462860 * w5
∂Etotal/∂Acth1 = -.0462860 * .70
∂Etotal/∂Acth1 = -.0324002
∂Acth1/∂Neth1 = Acth1 * (1 - Acth1)
∂Acth1/∂Neth1 = .5646363 * (1 - .5646363)
∂Acth1/∂Neth1 = .2458221
∂Neth1/∂w1 = x1
∂Neth1/∂w1 = .10
∂Etotal/∂w1 = -.0324002 * .2458221 * .10
∂Etotal/∂w1 = -.0007964
w1+ = w1 - α * ∂Etotal/∂w1
w1+ = .30 - .5 * -.0007964
w1+ = .3003982
Repeating this process we get the new weights w2, w3 and w4:
w2+ = .4007964
w3+ = .5004512
w4+ = .6009024
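The four hidden-layer updates follow the same pattern (again a sketch with my own variable names; note that the original, pre-update w5 and w6 are used here, not the new ones):

```python
x1, x2 = 0.10, 0.20
act_h1, act_h2 = 0.5646363, 0.5793242
w5, w6 = 0.70, 0.80           # original (pre-update) values
dE_dNeto = -0.0462860          # carried over from the output-layer step
alpha = 0.5

# Error signal reaching each hidden neuron
dE_dActh1 = dE_dNeto * w5
dE_dActh2 = dE_dNeto * w6

# Local derivative of the logistic activation
dActh1_dNeth1 = act_h1 * (1 - act_h1)
dActh2_dNeth2 = act_h2 * (1 - act_h2)

# ∂Neth/∂w is the corresponding input x1 or x2
w1_new = 0.30 - alpha * dE_dActh1 * dActh1_dNeth1 * x1   # ~ 0.3003982
w2_new = 0.40 - alpha * dE_dActh1 * dActh1_dNeth1 * x2   # ~ 0.4007964
w3_new = 0.50 - alpha * dE_dActh2 * dActh2_dNeth2 * x1   # ~ 0.5004512
w4_new = 0.60 - alpha * dE_dActh2 * dActh2_dNeth2 * x2   # ~ 0.6009024
```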
Now it's time to go back to Phase I and compute our new output and new total error using the new weights:
Acto = .7547169 (vs 1.00 target)
Etotal = .0300818
You might think this is not a big change, but after repeating this process 5,000 times you will get the following results:
Acto = .9830802 (vs 1.00 target)
Etotal = .0001431
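The whole procedure can be condensed into one short training loop (a sketch assuming, as in the walkthrough above, that the biases stay fixed and only the six weights are updated):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2, target = 0.10, 0.20, 1.0
w1, w2, w3, w4, w5, w6 = 0.30, 0.40, 0.50, 0.60, 0.70, 0.80
b1, b2 = 0.15, 0.25
alpha = 0.5

for _ in range(5000):
    # Phase I: forward pass
    act_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    act_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    act_o = sigmoid(w5 * act_h1 + w6 * act_h2 + b2)

    # Phase III: output-layer gradient
    dE_dNeto = -(target - act_o) * act_o * (1 - act_o)
    dE_dw5 = dE_dNeto * act_h1
    dE_dw6 = dE_dNeto * act_h2

    # Phase IV: hidden-layer gradients (using the pre-update w5 and w6)
    dE_dNeth1 = dE_dNeto * w5 * act_h1 * (1 - act_h1)
    dE_dNeth2 = dE_dNeto * w6 * act_h2 * (1 - act_h2)

    # Gradient-descent updates
    w5 -= alpha * dE_dw5
    w6 -= alpha * dE_dw6
    w1 -= alpha * dE_dNeth1 * x1
    w2 -= alpha * dE_dNeth1 * x2
    w3 -= alpha * dE_dNeth2 * x1
    w4 -= alpha * dE_dNeth2 * x2

# Phase II: after 5,000 iterations the output approaches the 1.0 target
e_total = 0.5 * (target - act_o) ** 2
```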
Neural networks are no longer a mystery!