Memory components in Memory-Augmented Neural Networks (MANNs) are pretty basic. I know of models with multiple attention components. Are there Deep Learning (DL) architectures that employ a multiple-component approach to memory, as in Baddeley & Hitch (1974) and Baddeley (2000)?
There were several connectionist models of the phonological loop published in the 1990s. You could find this work by running separate searches on George Houghton, David Glasspool, Mike Page, Neil Burgess, and Rik Henson. There are also some connectionist approaches in a book edited by Sue Gathercole, "Models of Working Memory". These models had multiple components, using variations of competitive queuing as a mechanism for serial order, with separate layers of phonological units, and they simulated many effects in the working memory literature.
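If you haven't come across competitive queuing before, here is a minimal sketch of the core select-then-suppress cycle in plain NumPy (an illustrative toy, not the exact dynamics of any of the models above): a parallel "plan" layer holds a graded activation pattern over items, a competitive choice layer picks the most active item at each step, and the winner is then suppressed so the next item can be recalled.

```python
import numpy as np

def competitive_queuing(activations, noise_sd=0.05, seed=0):
    """Recall items serially from a parallel activation gradient.

    At each step the most active item wins the competition, is emitted,
    and is then suppressed so the next most active item can win.
    Illustrative sketch only, not any published model's exact equations.
    """
    rng = np.random.default_rng(seed)
    plan = activations.astype(float).copy()   # parallel "plan" layer
    order = []
    for _ in range(len(plan)):
        # competitive choice layer: noisy winner-take-all over current activations
        noisy = plan + rng.normal(0.0, noise_sd, size=plan.shape)
        winner = int(np.argmax(noisy))
        order.append(winner)
        plan[winner] = -np.inf                # suppress the just-recalled item
    return order

# Example: a primacy gradient over five "phonological" items (item 0 strongest).
gradient = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
print(competitive_queuing(gradient))  # usually [0, 1, 2, 3, 4]; noise occasionally swaps neighbours
```

The noise in the choice stage is what produces transposition errors between neighbouring positions, one of the serial-recall effects these models were built to capture.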
Have such models been given a Deep Learning spin? Has anyone tried to run them on, or integrate them with, a Neural Turing Machine (NTM) or Differentiable Neural Computer (DNC), for instance?