1. Cost Function
1. Important Parameters:
1. L => Total number of layers in network
2. Sl => Number of units ( not counting bias unit ) in layer l
3. Example network: L = 4, S1 = 3, S2 = 5, S4 = SL = 4
2. Two Classification Methods
1. Binary Classification
1. y = 0 or 1
2. SL = K = 1 ( One output unit )
2. Multi-class Classification
1. y is a logical (one-hot) vector of length K: a 1 marks the class, every other entry is 0
2. SL = K, K >= 3 ( K output units )

3. The Cost Function

1. J(theta) sums the logistic regression cost over all K output units and all training examples.
2. The regularisation term sums the squares of all Theta elements (excluding the bias weights) between each pair of adjacent layers.
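For reference, the two points above correspond to the usual regularised cost written out in full. This is a sketch of the standard formulation rather than something stated in these notes: m (number of training examples), h_Theta (the network output) and lambda (regularisation strength) are assumed symbols, while K, L and s_l follow the parameters listed earlier.

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ y_k^{(i)} \log\left( \left( h_\Theta(x^{(i)}) \right)_k \right)
       + \left( 1 - y_k^{(i)} \right) \log\left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \left( \Theta_{j,i}^{(l)} \right)^2
```

The first double sum runs over examples and output units; the triple sum runs over every non-bias weight in every Theta matrix.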
2. Back Propagation
1. Compute Gradient: used to compute the gradient (the partial derivatives of the cost function with respect to Theta)
2. Algorithm Explained
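The notes do not spell the algorithm out at this point. As a hedged reminder, the standard recurrences are sketched here, assuming sigmoid activations, the logistic regression cost from section 1, and no regularisation; the symbol \odot (element-wise product) and the example index t are notation chosen for this sketch.

```latex
\begin{align}
\delta^{(L)} &= a^{(L)} - y \\
\delta^{(l)} &= \left( \Theta^{(l)} \right)^{T} \delta^{(l+1)} \odot a^{(l)} \odot \left( 1 - a^{(l)} \right),
  \qquad l = L-1, \dots, 2 \\
\frac{\partial J}{\partial \Theta_{ij}^{(l)}} &= \frac{1}{m} \sum_{t=1}^{m} a_j^{(l),(t)} \, \delta_i^{(l+1),(t)}
\end{align}
```

There is no delta for the input layer, and the component of each hidden-layer delta that corresponds to the bias unit is discarded before it is propagated further.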
3. Back Propagation in Practice
1. Learning Algorithm
1. initialTheta
2. costFunction
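A minimal sketch of how initialTheta and costFunction fit together, assuming costFunction returns both the cost and its gradient. Plain gradient descent stands in here for whatever advanced optimiser is actually used; `minimise`, `learning_rate`, `iterations` and the toy cost are hypothetical names and values chosen for illustration.

```python
def minimise(cost_function, initial_theta, learning_rate=0.1, iterations=200):
    """Plain gradient descent: repeatedly step against the gradient."""
    theta = list(initial_theta)
    for _ in range(iterations):
        _, grad = cost_function(theta)  # cost_function returns (J, gradient)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


def cost_function(theta):
    """Hypothetical toy cost J = (t0 - 3)^2 + (t1 + 1)^2 and its gradient."""
    j = (theta[0] - 3) ** 2 + (theta[1] + 1) ** 2
    grad = [2 * (theta[0] - 3), 2 * (theta[1] + 1)]
    return j, grad


initial_theta = [0.0, 0.0]
opt_theta = minimise(cost_function, initial_theta)
```

The optimiser only ever sees a flat parameter vector, which is why the weight matrices need to be unrolled before being handed over.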
2. Unrolling Parameters
1. Unroll the Theta matrices into a single vector (the form the optimiser expects)
2. Reshape the vector back into matrices (inside costFunction)
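The round trip between matrices and a vector can be sketched as follows. This is an illustration only: `unroll` and `reshape` are hypothetical helper names, the matrices are plain lists of rows, and values are flattened row by row (an environment such as Octave flattens column by column instead, so the element order would differ there).

```python
def unroll(matrices):
    """Flatten a list of matrices (lists of rows) into one flat list, row by row."""
    return [value for matrix in matrices for row in matrix for value in row]


def reshape(vector, shapes):
    """Rebuild matrices from a flat list, given each matrix's (rows, cols) shape."""
    matrices, start = [], 0
    for rows, cols in shapes:
        matrix = [vector[start + r * cols:start + (r + 1) * cols] for r in range(rows)]
        matrices.append(matrix)
        start += rows * cols
    return matrices


# Hypothetical weights for a tiny network: Theta1 is 2x3, Theta2 is 1x3.
theta1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
theta2 = [[0.7, 0.8, 0.9]]

theta_vec = unroll([theta1, theta2])             # 9 values in one flat list
restored = reshape(theta_vec, [(2, 3), (1, 3)])  # back to [Theta1, Theta2]
```

The shapes have to be kept alongside the vector, because the flat list alone does not record where one matrix ends and the next begins.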
3. Gradient Checking
1. Use a numerical estimate (a two-sided finite difference) to approximate the derivatives
2. Pros:
1. It can check whether the derivatives computed by back propagation are correct
3. Cons:
1. It is super slow.
2. Once back propagation gives values close to the numerical gradient, turn gradient checking off.
3. Be sure to disable the gradient checking code before training your classifier; otherwise training will be super slow.
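A minimal sketch of the check, assuming the cost is a function of a flat parameter list. `numerical_gradient`, `toy_cost` and `toy_gradient` are hypothetical names; the toy cost stands in for the network's J(theta), and `toy_gradient` plays the role of the back propagation result being verified.

```python
def numerical_gradient(cost, theta, epsilon=1e-4):
    """Approximate each partial derivative with a two-sided difference:
    (J(theta + eps * e_i) - J(theta - eps * e_i)) / (2 * eps)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        grad.append((cost(plus) - cost(minus)) / (2 * epsilon))
    return grad


def toy_cost(theta):
    """Hypothetical stand-in cost: J = t0^2 + 3 * t0 * t1."""
    return theta[0] ** 2 + 3 * theta[0] * theta[1]


def toy_gradient(theta):
    """Analytic gradient of toy_cost (the value being checked)."""
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0]]


theta = [1.0, 2.0]
approx = numerical_gradient(toy_cost, theta)
exact = toy_gradient(theta)
max_diff = max(abs(a - e) for a, e in zip(approx, exact))  # should be tiny
```

Each call evaluates the cost twice per parameter, which is the reason the check is too slow to leave enabled during training.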
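4. Random Initialisation
1. "Zero Initialisation" does not work in a neural network: every unit in a hidden layer would compute the same value and receive the same update.
2. Random Initialisation: breaks this symmetry by starting each Theta element at a different small random value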
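Symmetry breaking can be sketched by drawing every weight uniformly from a small interval around zero. The helper name `random_init`, the bound `epsilon_init = 0.12` and the fixed seed are assumptions made for this illustration, not values taken from the notes.

```python
import random


def random_init(rows, cols, epsilon_init=0.12, rng=random):
    """Return a rows x cols matrix with entries drawn uniformly
    from [-epsilon_init, epsilon_init]."""
    return [[rng.uniform(-epsilon_init, epsilon_init) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)  # seeded only so the sketch is repeatable

# Theta1 maps 3 input units plus a bias unit to 5 hidden units (S1 = 3, S2 = 5).
theta1 = random_init(5, 4, rng=rng)
```

Because the rows differ, each hidden unit starts from different weights and can learn a different feature.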
5. Putting things together
1. Training a neural network
1. Pick a network architecture
1. Number of input units: Dimension of features x(i)
2. Number of output units: Number of Classes
3. Layers:
1. Number of layers
2. Units in each layer
1. Use the same number of units in every hidden layer
2. Usually the more units the better, at the price of more computation
2. Randomly initialise weights
1. Small values near zero
3. Implement forward propagation to get prediction for any x(i)
4. Implement code to compute the cost function J(theta)
5. Implement backprop to compute partial derivatives of J(theta)
1. for i = 1:m
1. Perform forward propagation and back-propagation using example (x(i), y(i))
2. Get activations a(l) and delta(l) for l = 2,…,L
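The loop in step 5 can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions: sigmoid activations, the unregularised logistic cost, weight matrices stored as lists of rows with the bias weight in column 0, and hypothetical names (`forward`, `cost`, `backprop`) and toy data throughout.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(theta1, theta2, x):
    """Forward propagation; a1 and a2 carry a leading bias unit of 1."""
    a1 = [1.0] + list(x)
    a2 = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]
    a3 = [sigmoid(sum(w * a for w, a in zip(row, a2))) for row in theta2]
    return a1, a2, a3


def cost(theta1, theta2, X, Y):
    """Unregularised logistic cost averaged over the m examples."""
    total = 0.0
    for x, y in zip(X, Y):
        a3 = forward(theta1, theta2, x)[2]
        total -= sum(t * math.log(a) + (1 - t) * math.log(1 - a) for a, t in zip(a3, y))
    return total / len(X)


def backprop(theta1, theta2, X, Y):
    """Partial derivatives of the unregularised cost for Theta1 and Theta2."""
    m = len(X)
    grad1 = [[0.0] * len(row) for row in theta1]
    grad2 = [[0.0] * len(row) for row in theta2]
    for x, y in zip(X, Y):  # for i = 1:m
        a1, a2, a3 = forward(theta1, theta2, x)
        delta3 = [a - t for a, t in zip(a3, y)]  # output-layer error
        # Hidden-layer error, skipping the bias unit at index 0 of a2.
        delta2 = [
            sum(theta2[k][j] * delta3[k] for k in range(len(delta3))) * a2[j] * (1 - a2[j])
            for j in range(1, len(a2))
        ]
        # Accumulate delta(l+1) * a(l) for every weight, averaged over m.
        for k, d in enumerate(delta3):
            for j, a in enumerate(a2):
                grad2[k][j] += d * a / m
        for k, d in enumerate(delta2):
            for j, a in enumerate(a1):
                grad1[k][j] += d * a / m
    return grad1, grad2


# Hypothetical toy data: 2 features, 2 hidden units, 1 output unit.
X = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1.0], [0.0]]
theta1 = [[0.05, -0.10, 0.08], [-0.03, 0.07, 0.11]]
theta2 = [[0.02, -0.09, 0.06]]

grad1, grad2 = backprop(theta1, theta2, X, Y)
```

The returned matrices have the same shapes as Theta1 and Theta2, so they can be unrolled and passed to an optimiser together with the value of `cost`.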