In the paper "On-the-fly learning with perpetual learning machines" (Simpson, 2015), a novel type of DNN is presented: the perpetual learning machine. The perpetual learning machine is made up of two DNNs, one for storage (identification) and one for recall. Simpson trains the model on the MNIST data set: "We took the first 75 of the MNIST digits and assigned each to an arbitrary class." The recall DNN is used to synthesize images and feed them back to the storage DNN, allowing the model to perform self-supervised training via 'perpetual stochastic gradient descent'.
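For concreteness, here is a minimal sketch of the perpetual-SGD loop as I understand it from the paper, assuming one-hot class codes and MSE losses. All names, layer sizes, and hyperparameters here are my own illustrative guesses, not the author's code:

```python
# Sketch of 'perpetual stochastic gradient descent' as I read the paper.
# N_CLASSES, layer sizes, noise scale, etc. are assumptions for illustration.
import torch
import torch.nn as nn

N_CLASSES = 75      # the paper stores the first 75 MNIST images, one per class
IMG_DIM = 28 * 28

# Storage DNN: image -> class code (identification).
storage_net = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.Sigmoid(), nn.Dropout(0.5),
    nn.Linear(256, N_CLASSES), nn.Sigmoid(),
)
# Recall DNN: class code -> image (synthesis).
recall_net = nn.Sequential(
    nn.Linear(N_CLASSES, 256), nn.Sigmoid(), nn.Dropout(0.5),
    nn.Linear(256, IMG_DIM), nn.Sigmoid(),
)

opt = torch.optim.SGD(
    list(storage_net.parameters()) + list(recall_net.parameters()), lr=0.1
)
mse = nn.MSELoss()

def perpetual_sgd_step():
    """One step of perpetual SGD: pick a stored class at random, let the
    recall DNN synthesize that class's image, and use the synthetic
    (code, image) pair as the training example for both networks -- the
    original data is no longer needed."""
    code = torch.zeros(1, N_CLASSES)
    code[0, torch.randint(N_CLASSES, (1,))] = 1.0   # one-hot class code

    recall_net.eval()                    # deterministic recall (no dropout)
    with torch.no_grad():
        img = recall_net(code)
    img = img + 0.01 * torch.randn_like(img)         # 'dither': input noise

    recall_net.train(); storage_net.train()          # dropout active for SGD
    loss = mse(storage_net(img), code) + mse(recall_net(code), img)
    opt.zero_grad(); loss.backward(); opt.step()

for _ in range(1000):                    # runs indefinitely in the paper
    perpetual_sgd_step()
```

(If this reading is right, dropout and dither are what keep each step from being a no-op, since the recall target is the recall network's own output.)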

My question/concern is this: in the paper each class is made up of only one image, so how can this model be applied when a class comprises thousands of examples? How can the recall DNN synthesize an image representative of all (or any) instances of a given class? A second question is why dropout and dither are used at all: over-fitting cannot be an issue when the training set is the same as the evaluation/test set, can it?

Thanks in advance for your time. I understand that this is a very novel area, and perhaps I am mistaken as to what the author meant.

https://arxiv.org/abs/1509.00913
