Suppose we have a neural network with a binary output (0 or 1). What I am trying to do is remove neurons or layers from the network while preserving the classification of every instance that the original network labeled 1, and likewise for the output 0. Put differently: is there any way to spot the neurons that are paramount to the correct classification of the instances of a particular class? The aim is to remove all the neurons that are unnecessary for that output. Currently, I am trying to use the backpropagation phase to attribute a fitness score to each neuron that reflects its contribution to a given class.
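To illustrate the backpropagation idea, here is a minimal sketch of one common way to turn gradients into a per-neuron contribution score: a first-order Taylor criterion, |activation × gradient|, computed only on the instances of the class of interest. The architecture, the hook names, and the scoring rule below are my own illustrative assumptions, not a prescribed method:

```python
import torch
import torch.nn as nn

# Illustrative toy classifier; any feed-forward net with hidden
# activations would do.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        output.retain_grad()          # keep the gradient of this hidden activation
        activations[name] = output
    return hook

model[1].register_forward_hook(save_activation("relu1"))
model[3].register_forward_hook(save_activation("relu2"))

# Score each hidden neuron on the instances of one class (here: class 1).
x = torch.randn(64, 10)               # stand-in for the class-1 instances
logits = model(x)
logits.sum().backward()               # gradient of the class-1 logit w.r.t. everything

for name, act in activations.items():
    # First-order Taylor importance, averaged over the class-1 batch:
    # a score near zero means the neuron barely influences the class-1 output.
    score = (act * act.grad).abs().mean(dim=0)
    print(name, score)
```

Neurons whose score stays near zero across the whole class-1 set are candidates for removal with respect to that output, though any actual pruning would still need to be verified against the original predictions.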
In the case of Binary Neural Networks (binary weights and activations), one research track could be compiling the network to a Boolean formula and reasoning over it to spot the neurons that do not contribute to the chosen output, but carrying out this compilation is not always straightforward.
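To make the Boolean-reasoning idea concrete for a toy case: each binarized neuron (weights in {-1, +1}, sign activation) computes a Boolean threshold function of its inputs, so a small network can in principle be checked exhaustively. The sketch below uses brute-force enumeration as a stand-in for real SAT or knowledge-compilation reasoning (feasible only at toy sizes), the weights are illustrative assumptions, and "removing" a neuron is modeled as clamping it to a constant, which is just one possible notion of removal:

```python
import itertools

W1 = [[+1, -1, +1], [-1, -1, +1]]     # hidden layer: 2 neurons over 3 binary inputs
W2 = [+1, +1]                          # output neuron over the 2 hidden neurons

def binarized_neuron(weights, inputs):
    # Inputs and activations live in {-1, +1}; sign(w . x), ties mapped to +1.
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else -1

def network_output(x, masked_hidden=None):
    # masked_hidden: index of a hidden neuron clamped to +1 (simulated removal).
    h = [binarized_neuron(w, x) for w in W1]
    if masked_hidden is not None:
        h[masked_hidden] = +1
    return binarized_neuron(W2, h)

# A hidden neuron is unnecessary for class 1 if clamping it never flips
# an input that the full network classifies as 1.
for j in range(len(W1)):
    flips = [x for x in itertools.product([-1, +1], repeat=3)
             if network_output(x) == 1 and network_output(x, masked_hidden=j) != 1]
    status = "needed" if flips else "removable (for class 1)"
    print(f"hidden neuron {j}: {status}")
```

A SAT-based version would pose the same question as a satisfiability query (does there exist an input classified as 1 whose label flips when the neuron is clamped?), which scales further than enumeration but requires exactly the compilation step that is hard to carry out in general.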