Notes in Applied Machine Learning – Optimization and Regularization – 15/10 2021

Memorising = overfitting.

Large neural networks tend to overfit, and that is to be avoided.

Trade-off between model complexity and the amount of data.

Regularization: changing the model, the training process, or the data in a way that helps avoid overfitting.

Flatten can be applied to data that is already flat, but it is not necessary there.
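A minimal sketch, assuming the note refers to the Keras Flatten layer; shapes and layer sizes are made up for illustration:

```python
import tensorflow as tf

# Flatten reshapes e.g. (28, 28) images to vectors of 784 values;
# on data that is already a flat vector it changes nothing.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # (28, 28) -> (784,)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```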

Overfitting happens because the model has the capacity to fit disturbance patterns in the data (random noise). This makes the model perform worse on new data.

The simplest way to avoid overfitting is to use a simpler model. Overfitting can still happen, but a small model has fewer ways to do so.

Large models have the potential to perform better when they do not overfit.

Validation data as a tool to detect overfitting
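A minimal sketch, assuming a Keras workflow; the dataset and model are hypothetical placeholders just to show the mechanics:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data, only to illustrate the validation split.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Hold out 20% of the training data as validation data; a rising val_loss
# while the training loss keeps falling is the classic sign of overfitting.
history = model.fit(x_train, y_train, epochs=20, batch_size=32,
                    validation_split=0.2, verbose=0)
print(history.history["val_loss"])
```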

Vanishing gradients lead to near-zero weight updates and very small outputs.

Exploding gradients lead to extremely large (effectively infinite) weights and very large outputs.

Early stopping means stopping training once it is possible to tell that the model's validation performance has started to stagnate.
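A minimal sketch of early stopping with a Keras callback; the patience value is a made-up example that has to be tuned:

```python
import tensorflow as tf

# Stop training when val_loss has stopped improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,                  # tolerate 5 stagnant epochs before stopping
    restore_best_weights=True,
)

# Used like:
# model.fit(x_train, y_train, epochs=200, validation_split=0.2,
#           callbacks=[early_stop])
```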

Weight regularization as a tool to improve generalization

The regularization strength is hard to choose, as it can only be tuned through trial and error.

Weight regularization can be done with weight decay. This works by shrinkage: weights belonging to poor predictors are shrunk towards 0.

Weight regularization can be done with L1 and L2 penalties, as sketched below.
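A minimal sketch of L1 and L2 weight penalties in Keras; the factor 1e-4 is a hypothetical value that must be found by trial and error:

```python
from tensorflow.keras import layers, regularizers

# L2 shrinks all weights towards 0; L1 can drive unhelpful weights to exactly 0.
dense_l2 = layers.Dense(64, activation="relu",
                        kernel_regularizer=regularizers.l2(1e-4))
dense_l1 = layers.Dense(64, activation="relu",
                        kernel_regularizer=regularizers.l1(1e-4))
```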

Weight initialization can also be used; this sets the weights sensibly before training starts.
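A minimal sketch of choosing an initializer in Keras (Keras defaults to Glorot uniform; He initialization is a common choice for ReLU layers):

```python
from tensorflow.keras import layers, initializers

# Weights are drawn from a distribution scaled to the layer size before training.
layer = layers.Dense(
    64,
    activation="relu",
    kernel_initializer=initializers.HeNormal(),
    bias_initializer="zeros",
)
```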

Dropout as a tool to improve generalization

Dropout can be used to regularize by randomly dropping some units/connections at a given layer during training. It introduces noise and limits the potential of hidden nodes to co-adapt (a sketch follows the figure references below).

How dropout works – Source: “Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.”
Dropout at training and test – Source: “Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.”
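A minimal sketch of dropout in a Keras model; the rates and layer sizes are made-up examples:

```python
import tensorflow as tf
from tensorflow.keras import layers

# During training, Dropout randomly zeroes 50% of the previous layer's outputs
# (and rescales the rest); at test time all units are kept unchanged.
model = tf.keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
```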

Batch normalization as a tool to improve training and generalization

Batch normalization is good to place before dropout. This will be discussed later.

Normalizing can help not only at the input layer, but at every layer. Batch normalization is one way to do that.

It was motivated by some drawbacks of training deep networks: without it, a low learning rate and careful initialization are required, and saturating non-linearities can make the gradients vanish.

Batch normalization algorithm – Source: “Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.”
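For reference, the transform in the cited algorithm, computed per mini-batch B = {x_1, ..., x_m} (γ and β are the learned scale and shift discussed below):

```latex
\mu_B      = \frac{1}{m} \sum_{i=1}^{m} x_i                    % mini-batch mean
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2        % mini-batch variance
\hat{x}_i  = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}  % normalize
y_i        = \gamma \hat{x}_i + \beta                          % scale and shift
```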

γ and β (the beta) introduce a bias, but since they are trainable parameters, they can be optimized during training. Essentially, noise is introduced only to be reduced again.

Batch normalization is almost always a good idea.

Small batch sizes don't always work so well – especially a batch size of 1. Too large batch sizes might introduce other problems too.

It does require somewhat more computational power.

Batch normalization works well in combination with other regularization methods, usually dropout.
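A minimal sketch of combining the two, with batch normalization placed before dropout as mentioned above; the exact ordering around the activation is one common arrangement, not the only one:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(128, input_shape=(784,)),
    layers.BatchNormalization(),   # normalize the pre-activations per mini-batch
    layers.Activation("relu"),
    layers.Dropout(0.3),           # then drop units for regularization
    layers.Dense(10, activation="softmax"),
])
```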

Data augmentation

With a dataset, e.g. images, and especially a small sample, the data can be transformed to some degree and used as new data.

This could be done by rotating, adjusting colors, flipping, and adding noise to the pictures.

Augmentation effectively adds a lot more data to the dataset.
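A minimal sketch, assuming the Keras preprocessing layers (TensorFlow ≥ 2.6); the transformation strengths are made-up examples:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random transformations applied on the fly during training only, so every
# epoch sees slightly different versions of the same images.
augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),    # rotate up to ~10% of a full turn
    layers.RandomContrast(0.2),    # adjust colors/contrast
    layers.GaussianNoise(0.05),    # add a little noise
])

# Typically placed at the front of the model:
# model = tf.keras.Sequential([augmentation, ...rest of the network...])
```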