These are notes for the MIT 6.S191 course.
Lecture 1. Intro to Deep Learning
The Perceptron: Forward Propagation
Single-layer neural network with TensorFlow:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Model

inputs = Input(shape=(m,))      # m input features
hidden = Dense(d1)(inputs)      # hidden layer with d1 units
outputs = Dense(d2)(hidden)     # output layer with d2 units
model = Model(inputs, outputs)
These few lines of code define a single-layer neural network.
Deep Neural Network
A deep network stacks more hidden layers between the input and the output, as sketched below.
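A minimal sketch of a deeper network with tf.keras, reusing the hypothetical sizes m, d1, d2 from above and an assumed output size n_out (not defined in the original note):

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Model

inputs = Input(shape=(m,))                        # m input features (hypothetical size)
hidden1 = Dense(d1, activation="relu")(inputs)    # first hidden layer
hidden2 = Dense(d2, activation="relu")(hidden1)   # second (extra) hidden layer
outputs = Dense(n_out)(hidden2)                   # n_out output units (hypothetical size)
model = Model(inputs, outputs)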
Applying Neural Networks
Quantifying Loss
Compare the predicted output vs the actual output
Minimize the total loss over all training examples
Different loss functions
- Binary cross entropy loss (for outputs that are probabilities between 0 and 1)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=model.y, logits=model.pred
    )
)
- Mean squared error loss (for continuous, real-valued outputs)
loss = tf.reduce_mean(
    tf.square(
        tf.subtract(model.y, model.pred)
    )
)
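For reference, the same two losses can also be built with tf.keras.losses objects; a minimal sketch, assuming y_true holds the true labels and y_pred the model outputs (both hypothetical tensors):

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)   # binary cross entropy
bce_loss = bce(y_true, y_pred)

mse = tf.keras.losses.MeanSquaredError()                      # mean squared error
mse_loss = mse(y_true, y_pred)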
Training Neural Networks
Find the weight matrices W that result in the minimum loss over the training set
- Gradient descent
# randomly initialize W
weights = tf.Variable(tf.random_normal(shape, stddev=sigma))
# loop over the following two steps until the gradients converge
# 1. compute the gradient of the loss with respect to the weights
grads = tf.gradients(ys=loss, xs=weights)[0]
# 2. update the weights by taking a small step (learning rate lr) against the gradient
weights_new = weights.assign(weights - lr * grads)
# return the weights
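The same loop can also be written eagerly with tf.GradientTape (TF 2.x style); a minimal sketch, where shape, sigma, and lr are as above and num_steps and compute_loss(weights) are hypothetical:

import tensorflow as tf

weights = tf.Variable(tf.random.normal(shape, stddev=sigma))  # randomly initialize W
for step in range(num_steps):             # in practice: loop until the gradients converge
    with tf.GradientTape() as tape:
        loss = compute_loss(weights)      # hypothetical loss function
    grads = tape.gradient(loss, weights)  # 1. compute the gradient
    weights.assign_sub(lr * grads)        # 2. take a small step against the gradient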
Neural Networks in practice
- Learning rate is hard to set
- try out many different learning rates and see what works
- Design adaptive learning rates that adapt to the loss landscape
- Methods (TF 2.x equivalents are sketched after this list)
- Momentum
tf.train.MomentumOptimizer
- Adagrad
tf.train.AdagradOptimizer
- Adadelta
tf.train.AdadeltaOptimizer
- Adam
tf.train.AdamOptimizer
- RMSProp
tf.train.RMSPropOptimizer
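In TF 2.x these optimizers live under tf.keras.optimizers; a minimal sketch (the learning rates are just placeholder values):

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)   # SGD with momentum
# optimizer = tf.keras.optimizers.Adagrad(learning_rate=1e-2)
# optimizer = tf.keras.optimizers.Adadelta(learning_rate=1.0)
# optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
# optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-3)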
Mini-batch gradient descent
- compute the gradient on a small batch of examples instead of the full dataset
- batches can be computed in parallel on a GPU, which makes training much faster
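A minimal sketch of one mini-batch update using tf.GradientTape and the model built earlier, assuming hypothetical x_batch and y_batch tensors for one batch of examples:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

with tf.GradientTape() as tape:
    preds = model(x_batch, training=True)                 # forward pass on the mini-batch
    loss = loss_fn(y_batch, preds)                        # average loss over the batch
grads = tape.gradient(loss, model.trainable_variables)    # gradients for this batch only
optimizer.apply_gradients(zip(grads, model.trainable_variables))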
The problem of Over-fitting
Regularization
techniques to avoid over-fitting and improve generalization
Regularization methods
- Dropout: randomly set a fraction of activations to zero during training
tf.keras.layers.Dropout(rate=0.5)
- Early stopping: stop training before the model over-fits
- the training loss should keep decreasing during training
- once the validation loss starts to grow, stop training at that point (see the sketch below)
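A minimal sketch combining both ideas with Keras, reusing the hypothetical sizes m, d1, d2 from above (x_train, y_train, x_val, y_val are hypothetical data); Dropout sits between layers and an EarlyStopping callback watches the validation loss:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras import Model

inputs = Input(shape=(m,))
hidden = Dense(d1, activation="relu")(inputs)
hidden = Dropout(rate=0.5)(hidden)        # randomly drop 50% of activations during training
outputs = Dense(d2)(hidden)
model = Model(inputs, outputs)

# stop training once the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[early_stop])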
Core Foundation Review
- The Perceptron
y = g(xW + w0), where g is a non-linear activation function (sketched below)
- Neural networks and how they work
- Training in practice: over-fitting, regularization, etc.
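A minimal sketch of the perceptron forward pass y = g(xW + w0), using made-up numbers and a sigmoid as the non-linearity g:

import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0]])        # one example with m = 3 inputs (hypothetical values)
W = tf.constant([[0.5], [-0.2], [0.1]])   # m x 1 weight matrix
w0 = tf.constant([0.3])                   # bias
y = tf.sigmoid(tf.matmul(x, W) + w0)      # y = g(xW + w0)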