I have the intuition that these two methodologies are related, but I could not find any references or any clear explanation of this relationship, beyond the fact that both are modern, relatively novel kinds of artificial neural networks.
Reservoir computing generally refers to some kinds of recurrent neural networks where only the parameters of the final, non-recurrent output layer (known as the readout layer) are trained, while all the other parameters are randomly initialized subject to some condition that essentially prevents chaotic behavior and then they are left untrained.
There is also a non-recurrent analogue of reservoir computing, which goes by various names including "extreme learning machines", and consists of plain feed-forward neural networks where only the readout layer is trained. All these methods can be considered to belong to the larger class of "random projection" techniques.
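As a rough illustration, here is a minimal NumPy sketch of that non-recurrent variant: the hidden weights are drawn at random and frozen, and only the linear readout is fit by least squares. All names, sizes, and the toy task are illustrative, not taken from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only)
X = rng.normal(size=(200, 5))          # 200 samples, 5 input features
y = np.sin(X.sum(axis=1))              # target

# Random, untrained hidden layer (the "random projection")
W_in = rng.normal(size=(5, 100))       # input -> hidden weights, never trained
b = rng.normal(size=100)
H = np.tanh(X @ W_in + b)              # hidden activations

# Only the readout is trained, here by ordinary least squares
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = H @ w_out                      # predictions from the frozen network
```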
You can "unfold" a recurrent neural network into a feed-forward, generally deep, neural network where the internal layers are time-shifted replicas of each others. This is the intuition behind the backpropagation-through-time training algorithm.
In fact, if you train a deep neural network with vanilla backpropagation, or a recurrent neural network with vanilla backpropagation-through-time, you often observe that the parameters in the hidden/recurrent layers don't change much from the random values they got at initialization, due to an issue known as the "vanishing gradient problem" (there is also an "exploding gradient problem" that can cause chaotic behavior and numerical instability in some cases).
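A toy numerical sketch of that effect (all sizes and weight scales are illustrative): backpropagating through a stack of tanh layers multiplies the gradient by one Jacobian per layer, and the norm of that product typically shrinks towards zero as depth grows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, depth = 50, 30

# Forward pass through `depth` tanh layers with small random weights
W = [rng.normal(scale=0.5 / np.sqrt(n), size=(n, n)) for _ in range(depth)]
h = rng.normal(size=n)
acts = []
for Wl in W:
    h = np.tanh(Wl @ h)
    acts.append(h)

# Backward pass: multiply the gradient by each layer's Jacobian
g = np.ones(n)                                  # gradient at the output
for Wl, h in zip(reversed(W), reversed(acts)):
    g = Wl.T @ (g * (1.0 - h ** 2))             # Jacobian of a tanh layer
    print(np.linalg.norm(g))                    # typically decays towards zero
```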
This is where reservoir computing and deep learning part ways:
"Extreme"/Reservoir computing argues that since backpropagation/backpropagation-through-time is computationally very expensive but typically doesn't affect much the internal layers and it can run into chaotic behavior and numerical instability, we can often avoid it altogether and only train the readout layer for a small fraction of the computational cost (since it is a generalized linear classification/regression problem), while avoiding any instability by enforcing a simple constraint on the random parameters of the internal layers. This works very well for some problems.
Deep learning, on the other hand, argues that there are very hard problems that really do benefit from the training of the internal layers, and develops training algorithms, such as staged autoencoder pre-training, designed to overcome the limitations of vanilla backpropagation.
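A compressed sketch of that pre-training idea (plain NumPy, tied weights, all hyperparameters and the toy data illustrative): each hidden layer is first trained to reconstruct the output of the layer below it, and the resulting weights then initialize the deep network before ordinary supervised training.

```python
import numpy as np

rng = np.random.default_rng(3)

def pretrain_layer(X, n_hidden, lr=0.05, epochs=300):
    """Train a one-layer tied-weight autoencoder on X; return encoder weights."""
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    for _ in range(epochs):
        H = np.tanh(X @ W)                    # encode
        X_hat = H @ W.T                       # decode (tied weights)
        err = X_hat - X                       # reconstruction error
        # gradient of 0.5*||X_hat - X||^2 w.r.t. W (encoder + decoder paths)
        d_pre = (err @ W) * (1.0 - H ** 2)
        grad = X.T @ d_pre + err.T @ H
        W -= lr * grad / len(X)
    return W

X = rng.normal(size=(500, 20))                # toy unlabeled data

# Greedily pre-train two hidden layers, one on top of the other
W1 = pretrain_layer(X, 10)
H1 = np.tanh(X @ W1)
W2 = pretrain_layer(H1, 5)

# W1 and W2 would now initialize the deep network before supervised
# fine-tuning (e.g. backpropagation on the labeled task), instead of
# starting from purely random values.
```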
Reservoir computing essentially uses the trajectory of iterative updates of a recurrent network as a representation of the input. Since the input can arrive at different times, it is in principle a method for spatio-temporal pattern recognition. See the articles by Hananel Hazan for recent work on this. The original articles are by Maass (liquid state machines) and Jaeger (echo state networks). (The iterations cause distinct patterns to diverge, which in principle makes classification easier.)
On the other hand, deep learning methodologies are (at least in their basic formulation) a way to let feed-forward networks with many layers make use of their potential power.
Thus, they are two very different ideas: deep learning is essentially about static pattern recognition, while reservoir computing is about dynamic patterns.
However, having said that, they can potentially be connected in many ways. For example, in reservoir computing, typically a "detector" that looks at the patterns can be any good classifier, and in particular, it might be very useful to use the power of deep learning classifiers for this part.
In addition, one might consider investigating the deep learning paradigm for training the interconnections within the reservoir itself; however, this is still very much an open research question.
So, summarizing, and as far as I understand: both methods are recurrent multi-layer neural networks. They differ in the learning algorithms. While RC uses algorithms that only train the read-out layer (and this seems to be sufficient for some problems), deep learning techniques claim the need to train all layers (although the changes in layers other than the read-out one are minimal).
If this is true, deep learning seems to me to be a generalization of reservoir computing.
Deep learning doesn't necessarily involve recurrent neural networks. In fact, most research in deep learning is done on feed-forward neural networks.
A feed-forward neural network is generally considered to be deep if it has more than one hidden layer.
A recurrent neural network, when unfolded over time for an example of duration T, essentially becomes a feed-forward neural network with k*T hidden layers (where k is a constant, usually equal to one).
Since training many hidden layers using standard backpropagation techniques is difficult, extreme/reservoir computing gives up and just trains the output layer, while deep learning trains all the layers using techniques that extend standard backpropagation.
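To make the unfolding concrete, here is a small NumPy sketch (illustrative sizes): running the recurrence for T steps is the same computation as a feed-forward network with T hidden layers that all share the same weights.

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hidden, T = 3, 8, 4

W_in = rng.normal(size=(n_hidden, n_in))
W_rec = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
inputs = rng.normal(size=(T, n_in))

# Recurrent view: one hidden layer applied repeatedly over time
h = np.zeros(n_hidden)
for t in range(T):
    h = np.tanh(W_in @ inputs[t] + W_rec @ h)

# Unfolded view: T stacked "layers", each a time-shifted replica sharing weights
layers = [np.zeros(n_hidden)]
for t in range(T):
    layers.append(np.tanh(W_in @ inputs[t] + W_rec @ layers[-1]))

assert np.allclose(h, layers[-1])   # identical computation, read two ways
```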
We have implemented Reservoir Computing in electronic, optoelectronic and all-optical (laser) hardware, reaching information injection rates above 1 GSample/s and good performance values.
For us, one of the most important features of Reservoir Computing is its conceptual simplicity, strongly fostering such implementations. In case you want to know more I would be happy to send you some publications regarding hardware implementations of Reservoir Computing.
Basically, both are neural networks. A network that recurrently executes its layers is reservoir computing, while one with a purely feed-forward approach is a plain neural network.
Now, if the network has many layers, it is called deep learning.
I can tell you some differences. Deep learning is associated with feed-forward networks, with no feedback and therefore no memory. Reservoir computing is recurrent, so it has memory. The former is used more in image processing, while the latter is normally used in time-series processing. Another difference is the training: reservoir computing is trained through a simple linear fit, which makes it much easier.
Reservoir Computing is a recurrent (with loops, where the data persists or repeats in patterns, hence the recurrence) neural network approach that avoids the problems of backpropagation through time (BPTT) during training.
It's a limiting case of deep learning where the gradient goes to 0 at the (n-1)-th layer. It's successful in signal processing applications because in many cases the gradients do vanish quite rapidly; the relevant pieces of information in a signal usually persist for only a short duration in most real-world applications. Any temporal correlations are taken care of by the persistence of the dynamical memory states (the signal echo bouncing around in the reservoir).
RCs do have certain limitations, e.g. a reservoir has its characteristic timescale and decay rate of information, which dictate the temporal extent of the signals it can handle. But in some sense this is the equivalent problem in deep learning, where the number of layers imposes similar restrictions.
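A toy way to see that characteristic timescale (plain NumPy, illustrative sizes): drive a random reservoir with a single impulse and watch how quickly its state decays; that decay rate bounds how far back in the signal the readout can effectively see.

```python
import numpy as np

rng = np.random.default_rng(5)
n_res, T = 100, 40

W = rng.normal(size=(n_res, n_res))
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.8 -> fading memory
W_in = rng.uniform(-1, 1, size=n_res)

x = np.zeros(n_res)
for t in range(T):
    u = 1.0 if t == 0 else 0.0                    # single impulse at t = 0
    x = np.tanh(W_in * u + W @ x)
    print(t, np.linalg.norm(x))                   # the "echo" of the impulse dies away
```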
LSTMs are also interesting RNNs you should look at. They are an engineered solution to many of the problems with BPTT and take a different approach from RC: they allow you to tune and selectively remember or forget the states and the gradients.
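For reference, here is a bare-bones sketch of a single LSTM cell step (plain NumPy, illustrative sizes and parameter names), showing the gates that let the network selectively keep or forget state; it is not any particular library's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM time step; params holds one (W, b) pair per gate."""
    z = np.concatenate([x, h])
    f = sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    i = sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    o = sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    g = np.tanh(params["Wg"] @ z + params["bg"])   # candidate cell update
    c = f * c + i * g                              # selectively forget / remember
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(6)
n_in, n_hid = 4, 8
params = {f"W{k}": rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for k in "fiog"}
params.update({f"b{k}": np.zeros(n_hid) for k in "fiog"})

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):              # run over a toy sequence
    h, c = lstm_step(x, h, c, params)
```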
I'm very interested in RC for pure hardware implementations, especially in building adaptive signal processors, because they are much simpler to implement purely in hardware due to the simplicity of the algorithm. Many of the inherent requirements of RC networks are exhibited by the physical characteristics of memristors and spintronic technology, and hopefully we will see highly compact direct implementations of these networks in hardware, rather than simulations at a software level or even FPGA-based emulations.