This is an implementation of the backpropagation algorithm in NumPy, written to help me further understand how it works.
```python
import pandas as pd
import numpy as np
from pdb import set_trace
from sklearn import datasets
```

Design the network structure

Each layer has its own weights, bias and activation function.

```python
structures = [
    {"input_dim": 2, "output_dim": 25, "activation": "relu"},
    {"input_dim": 25, "output_dim": 50, "activation": "relu"},
    {"input_dim": 50, "output_dim": 50, "activation": "relu"},
    {"input_dim": 50, "output_dim": 25, "activation": "relu"},
    {"input_dim": 25, "output_dim": 1, "activation": "sigmoid"},
]
```

Initialize the parameters

The weights can be small random numbers. The biases should preferably be small positive values so that the units pass through the ReLU (i.e. produce non-zero outputs) at the beginning of training.

```python
def init_layers(structures, seed=1105):
    # Seed the global RNG so the initialization is reproducible
    np.random.seed(seed)
    params = {}
    for i, structure in enumerate(structures):
        # W_i has shape (input_dim, output_dim), scaled down by 10 to keep early outputs small
        params["W_{}".format(i)] = np.random.randn(structure["input_dim"], structure["output_dim"]) / 10
        # b_i has shape (1, output_dim) with positive values in {0.01, ..., 0.09}
        params["b_{}".format(i)] = np.random.randint(1, 10, (1, structure["output_dim"])) / 100
    return params
```

The forward and backward activation functions

During backpropagation we need each layer's output before activation from the feedforward pass. We therefore save each layer's output both before and after activation so it is available for backpropagation later.

```python
def relu(U):
    # Return a new array: zeroing U in place would destroy the
    # pre-activation values that relu_backward needs later
    return np.maximum(0, U)

def sigmoid(U):
    return 1 / (1 + np.exp(-U))

def relu_backward(du, U):
    # dReLU/dU is 1 where U > 0 and 0 elsewhere;
    # copy du so the caller's gradient array is not overwritten
    du = np.array(du, copy=True)
    du[U <= 0] = 0
    return du

def sigmoid_backward(du, U):
    # dsigmoid/dU = sigmoid(U) * (1 - sigmoid(U))
    sig = sigmoid(U)
    return du * sig * (1 - sig)
```

So `single_layer_feedforward` returns two values: the activated output and the unactivated output. The activated output is fed as input into the next layer. The unactivated output is used in backpropagation, because there we need the partial derivative of the activation function with respect to its input.
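The following is a minimal sketch of one way `single_layer_feedforward` could work. It assumes the function takes the previous layer's activation, the layer's weights and bias, and an activation name, and returns `(A, U)`. The signature, the `full_feedforward` helper, and the `A_i`/`U_i` cache keys are hypothetical; they are not the author's implementation. Inputs are assumed to have shape `(n_samples, input_dim)`, which matches the `(input_dim, output_dim)` weight shape used in `init_layers`.

```python
import numpy as np

def relu(U):
    return np.maximum(0, U)  # new array, U itself is left intact

def sigmoid(U):
    return 1 / (1 + np.exp(-U))

def init_layers(structures, seed=1105):
    np.random.seed(seed)
    params = {}
    for i, s in enumerate(structures):
        params["W_{}".format(i)] = np.random.randn(s["input_dim"], s["output_dim"]) / 10
        params["b_{}".format(i)] = np.random.randint(1, 10, (1, s["output_dim"])) / 100
    return params

def single_layer_feedforward(A_prev, W, b, activation):
    # Hypothetical signature: A_prev is (n_samples, input_dim), W is (input_dim, output_dim)
    U = A_prev @ W + b  # unactivated output, kept for backpropagation
    if activation == "relu":
        A = relu(U)
    elif activation == "sigmoid":
        A = sigmoid(U)
    else:
        raise ValueError("unsupported activation: " + activation)
    return A, U

def full_feedforward(X, params, structures):
    # Hypothetical helper: run every layer and cache each layer's input A_i
    # (needed for dW_i) and its pre-activation output U_i (needed for dU_i)
    cache = {}
    A = X
    for i, s in enumerate(structures):
        cache["A_{}".format(i)] = A
        A, U = single_layer_feedforward(
            A, params["W_{}".format(i)], params["b_{}".format(i)], s["activation"]
        )
        cache["U_{}".format(i)] = U
    return A, cache

structures = [
    {"input_dim": 2, "output_dim": 25, "activation": "relu"},
    {"input_dim": 25, "output_dim": 50, "activation": "relu"},
    {"input_dim": 50, "output_dim": 50, "activation": "relu"},
    {"input_dim": 50, "output_dim": 25, "activation": "relu"},
    {"input_dim": 25, "output_dim": 1, "activation": "sigmoid"},
]

np.random.seed(0)
X = np.random.randn(8, 2)  # 8 samples with 2 features each
out, cache = full_feedforward(X, init_layers(structures), structures)
print(out.shape)  # (8, 1)
```

Because `relu` returns a new array, the cached `U_i` still contains its negative entries. Those entries are exactly what `relu_backward` needs to decide where the gradient should be zeroed.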
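A numerical gradient check is one way to verify the backward activation functions. It perturbs each entry of `U` by a small ε, measures the change in a scalar loss, and compares the result with the analytic gradient. The sketch below assumes the loss is `sum(activation(U) * du)`, so its gradient with respect to `U` is exactly what `relu_backward(du, U)` and `sigmoid_backward(du, U)` should return. `numerical_grad` is a hypothetical helper. The test keeps `U` away from 0, where ReLU is not differentiable.

```python
import numpy as np

def relu(U):
    return np.maximum(0, U)

def sigmoid(U):
    return 1 / (1 + np.exp(-U))

def relu_backward(du, U):
    du = np.array(du, copy=True)  # do not overwrite the caller's du
    du[U <= 0] = 0
    return du

def sigmoid_backward(du, U):
    sig = sigmoid(U)
    return du * sig * (1 - sig)

def numerical_grad(f, U, du, eps=1e-6):
    # Central differences of the scalar loss L(U) = sum(f(U) * du)
    grad = np.zeros_like(U)
    for idx in np.ndindex(U.shape):
        U_plus = U.copy()
        U_plus[idx] += eps
        U_minus = U.copy()
        U_minus[idx] -= eps
        grad[idx] = (np.sum(f(U_plus) * du) - np.sum(f(U_minus) * du)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
U[np.abs(U) < 0.1] = 0.5  # keep ReLU inputs away from the kink at 0
du = rng.normal(size=(4, 3))
du_before = du.copy()

relu_err = np.max(np.abs(relu_backward(du, U) - numerical_grad(relu, U, du)))
sig_err = np.max(np.abs(sigmoid_backward(du, U) - numerical_grad(sigmoid, U, du)))
print(relu_err, sig_err)  # both are expected to be close to zero
```

Central differences are used rather than one-sided differences because their truncation error is O(ε²) rather than O(ε). The remaining discrepancy is therefore dominated by floating-point rounding.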
...