Dear all,
Here are the instructions for the Project 5 assignment. In Canvas, see Module 13: Keras.
DNN with Keras:
1. Detailed Version: With detailed explanation
[Jupyter Notebook] or [HTML]
2. Simplified Version: With code only
[Jupyter Notebook] or [HTML]
Consider one or more of the TIPS below to construct five different DNN models for MNIST (a parameterized builder sketch follows this list):
1) Use different # of hidden layers;
2) Use different activation functions;
3) Use different # of neurons;
4) Use different batch_size;
5) Use different # of epochs.
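For instance, the five variants could all come from one parameterized builder. This is a minimal sketch only; the helper name build_dnn and its default values are my own, not part of the assignment:

```python
from keras.models import Sequential
from keras.layers import Dense

def build_dnn(n_hidden=2, units=500, activation='relu'):
    """Build a fully connected MNIST classifier; every argument is tunable."""
    model = Sequential()
    # first hidden layer fixes the 784-dimensional flattened input
    model.add(Dense(input_dim=28 * 28, units=units, activation=activation))
    for _ in range(n_hidden - 1):
        model.add(Dense(units=units, activation=activation))
    # 10-way softmax output for the digit classes
    model.add(Dense(units=10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

# e.g., two of the five variants, differing in depth and activation:
# m1 = build_dnn(n_hidden=2, units=500, activation='relu')
# m2 = build_dnn(n_hidden=3, units=256, activation='tanh')
# batch_size and epochs (tips 4 and 5) are varied in the call to model.fit(...)
```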
In your project report, please use tables and/or figures to summarize your models' performance.
Some tips for building an efficient model on the MNIST dataset:
#### Precautions when using Keras to train the model
##### Build a neural network architecture
1. batch_size=100, epochs=20 is appropriate. If batch_size is too large, the loss curve will be too smooth and training may get stuck at a local minimum, saddle point, or plateau. If batch_size is too small, there will be too many updates and too much computation, making training slow; a smaller batch size can, however, bring a certain degree of accuracy improvement.
2. Do not use too many hidden layers; otherwise, vanishing gradients may occur. In general, two to three layers are appropriate.
3. With many layers, avoid saturating activation functions such as sigmoid, which shrink the gradient signal; choose approximately linear activation functions such as ReLU or Maxout instead (these are also good choices when the number of layers is small).
4. Five to six hundred neurons per hidden layer is a reasonable size.
5. For classification problems, the loss function must be cross entropy (categorical_crossentropy) rather than mean squared error (mse).
6. Adam is generally the optimizer of choice; it combines RMSProp and Momentum, taking into account past gradients, the current gradient, and the previous momentum.
7. If accuracy on the testing data is low while accuracy on the training data is relatively high, consider using dropout. In Keras, add model.add(Dropout(0.5)) after each hidden layer; the rate 0.5 is a hyperparameter you set yourself. Note that after adding dropout, accuracy on the training set will decrease while accuracy on the testing set increases; this is normal.
8. If the input consists of image pixels, be sure to normalize the gray values by dividing by 255, so that they lie between 0 and 1.
9. Finally, it is best to report the accuracy on the training set and the testing set at the same time, so that you can make the right diagnosis (a data-preparation sketch covering tips 5, 8, and 9 follows this list).
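As a concrete starting point, here is a minimal data-preparation sketch for tips 5, 8, and 9; it assumes the standard keras.datasets MNIST loader, and the variable names (x_train, etc.) are my own:

```python
from keras.datasets import mnist
from keras.utils import to_categorical

# load the standard MNIST train/test split
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# flatten the 28x28 images and normalize gray values to [0, 1]  (tip 8)
x_train = x_train.reshape(-1, 28 * 28).astype('float32') / 255
x_test = x_test.reshape(-1, 28 * 28).astype('float32') / 255

# one-hot encode the labels for categorical_crossentropy        (tip 5)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```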
##### For performance on training data
The first key point in training a neural network is that **we must first improve the performance on the training data**:
1. Use ReLU or Maxout as the activation function.
2. Use Adam for an adaptive learning rate.
3. If the two steps above fail to improve accuracy on the training set, your network lacks the capacity to fit the training data. Try changing the network structure, including the number of layers and the number of neurons in each layer.
##### For performance on test data:
1. If performance on the training set is poor, then no matter how the test set performs, go back and improve performance on the training data first.
2. If performance on the training set is good but performance on the test set is relatively poor, overfitting has occurred, and the dropout method may be adopted as a remedy.
For example, we can consider the following model structure:
```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# define network structure
model = Sequential()
model.add(Dense(input_dim=28 * 28, units=500, activation='relu'))
# model.add(Dropout(0.5))   # uncomment if the model overfits
model.add(Dense(units=500, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))

# set configurations
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
```
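Assuming the data preparation sketched earlier, training and evaluation might then look like the following; batch_size and epochs are the values suggested in tip 1, and reporting both accuracies follows tip 9:

```python
# train with the suggested batch size and epoch count (tip 1)
model.fit(x_train, y_train, batch_size=100, epochs=20,
          validation_data=(x_test, y_test))

# report accuracy on both sets to distinguish underfitting
# from overfitting (tip 9)
train_loss, train_acc = model.evaluate(x_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print('train accuracy: %.4f, test accuracy: %.4f' % (train_acc, test_acc))
```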