Using non-linear activation functions, a single-layer NN can behave like a multi-layered one. My question is related to this: how can we see that this is the case? Is there a derivation that answers it?
I am not sure that this property holds as stated. Linear activation functions mean a linear combination of the inputs, and stacking multiple layers just composes linear combinations, which still yields a (complicated) linear combination of the inputs at the output. I do not think it can replace a non-linear function in any way (unless I am missing something).
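A minimal NumPy sketch of that collapse (the sizes and variable names are my own, chosen only for illustration): two stacked layers with a linear (identity) activation compute exactly the same thing as a single layer whose weight matrix is the product of the two.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)          # input vector
W1 = rng.normal(size=(4, 3))    # first-layer weights
W2 = rng.normal(size=(2, 4))    # second-layer weights

h = W1 @ x                      # hidden layer with linear (identity) activation
y_two_layers = W2 @ h           # output of the two-layer network

W_combined = W2 @ W1            # one equivalent weight matrix
y_one_layer = W_combined @ x    # output of the collapsed single-layer network

print(np.allclose(y_two_layers, y_one_layer))  # True
```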
On the other hand, a single hidden layer with suitable non-linear activation functions produces an output computed as an expansion of the inputs using those functions as a basis. If the activations are chosen properly, the NN becomes a universal approximator (provided enough neurons are used, i.e. enough terms in the expansion).
Theoretically, any function can be approximated that way, but I do not believe the converse is true...
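To illustrate the "expansion in a basis" view, here is a small sketch (the target function, number of hidden units, and random-weight scales are my own choices, not anything canonical): a single hidden layer of tanh units is treated as a set of basis functions, and only the output weights are fitted by least squares. Even with fixed random hidden weights, the non-linear target sin(x) is approximated closely.

```python
import numpy as np

rng = np.random.default_rng(0)

n_hidden = 50
x = np.linspace(-np.pi, np.pi, 200)[:, None]    # inputs, shape (200, 1)
target = np.sin(x).ravel()                      # non-linear target function

# Random hidden-layer weights and biases, kept fixed for this illustration.
W = rng.normal(scale=2.0, size=(1, n_hidden))
b = rng.normal(scale=2.0, size=n_hidden)

H = np.tanh(x @ W + b)                          # hidden activations = basis expansion
c, *_ = np.linalg.lstsq(H, target, rcond=None)  # output weights by least squares

approx = H @ c
print(np.max(np.abs(approx - target)))          # typically a small approximation error
```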
"How does a multi layer neural network behaves like a single layer network if the activation function is a linear one? "
The answer is fairly straightforward: since the connections between neurons are linear combinations, if the activation function is also linear, the whole system behaves as a single linear map.
So not only is a multi-layered ANN equivalent to a one-layered one, but an N-neuron hidden layer is also equivalent to a 1-neuron hidden layer.
After all, if X is the vector of inputs and W is a weight vector, then W^T * X is the most general linear combination of the inputs.
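To make the W^T * X argument concrete, a minimal NumPy sketch (the sizes are arbitrary, chosen only for illustration): a 10-neuron hidden layer with a linear activation feeding one output neuron computes exactly w^T x for a single equivalent weight vector w = W1 w2.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=5)          # input vector X
W1 = rng.normal(size=(5, 10))   # input -> hidden weights (10 hidden neurons)
w2 = rng.normal(size=10)        # hidden -> output weights

hidden = W1.T @ x               # hidden layer with linear activation
y_network = w2 @ hidden         # network output

w = W1 @ w2                     # equivalent single weight vector
y_direct = w @ x                # the W^T * X form from the answer

print(np.isclose(y_network, y_direct))  # True
```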