When training Neural Networks (e.g. FFNNs, Restricted Boltzmann Machines), lots of regularizers and other tricks can be applied to improve the results. Such tricks, as well as different training parameters, can totally change the look of the neurons' filters (i.e. weights) of a trained model. That means that, depending on how the NN was trained, the data is REPRESENTED differently by the neurons and also DISTRIBUTED differently over the neurons. We know that sparsity, selectivity, and weight regularization may help, but we conclude this by looking at the results rather than by computing a "measure" from the parameters themselves.
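To make this more concrete, here is a minimal sketch of the kind of parameter-based measure I have in mind (my own illustration, not taken from any particular publication): the Hoyer (2004) sparseness of each neuron's incoming weight vector, which is 0 for a perfectly uniform filter and 1 for a filter with a single non-zero weight. The weight matrix `W` here is just a hypothetical placeholder for a trained layer.

```python
import numpy as np

def hoyer_sparseness(w, eps=1e-12):
    """Hoyer (2004) sparseness of a weight vector: 0 = uniform, 1 = one-hot."""
    n = w.size
    l1 = np.abs(w).sum()
    l2 = np.sqrt((w ** 2).sum()) + eps
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# W: (n_hidden, n_visible) weight matrix of a trained layer (hypothetical example data)
W = np.random.randn(100, 784)
per_neuron = np.array([hoyer_sparseness(w) for w in W])
print("mean filter sparseness:", per_neuron.mean())
```

Something like this gives one number per filter, but it is unclear to me whether such a statistic actually correlates with representation quality.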
Question: Does anyone know of publications on what makes a "good" representation (as measured from the weights and biases over all neurons)? How should the data be distributed over the neurons: is it better if the data is perfectly compressed (most of the data in the fewest neurons) or perfectly distributed (all neurons used in a balanced way to represent the data)? Are there other statistical analyses of filters? How should representations look for discriminative tasks vs. generative networks? How does the quality of the data relate to "good" representations (e.g. should the model size follow the "intrinsic dimensionality" of the data)?
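For the compressed-vs-distributed part of the question, the kind of statistic I imagine (again only a sketch under my own assumptions, computed from hidden activations on held-out data rather than from the weights alone) is a normalized entropy of how much each neuron is used:

```python
import numpy as np

def usage_entropy(H, eps=1e-12):
    """Normalized entropy of how activation 'mass' is spread over hidden units.
    1.0 -> perfectly balanced usage, near 0 -> a few neurons carry everything."""
    usage = np.abs(H).mean(axis=0)             # mean |activation| per neuron
    p = usage / (usage.sum() + eps)            # treat usage as a distribution over neurons
    ent = -(p * np.log(p + eps)).sum()
    return ent / np.log(len(p))

# H: (n_samples, n_hidden) hidden activations on held-out data (hypothetical example data)
H = np.maximum(0, np.random.randn(1000, 100))  # e.g. ReLU activations
print("normalized usage entropy:", usage_entropy(H))
```

Is there any published result saying whether such a value should be high (balanced usage) or low (compressed usage) for a "good" representation?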
Thank you for your answers!