If you are using an RNN, there is no such thing as a "window size". The window size, as I understand it, is the length of a (sliding) cutout of a time sequence of data. E.g., if you have data x(t) that you want to model, you could use a window of size k: x(n), x(n+1), ..., x(n+k-1). This method is commonly used with non-recurrent approximators such as feedforward networks.
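For concreteness, here is a minimal sketch of that windowing step, assuming a 1-D NumPy series; the function name `make_windows` and the choice k=5 are purely illustrative:

```python
import numpy as np

def make_windows(x, k):
    """Slice a 1-D time series into overlapping windows of length k.

    Each row [x(n), x(n+1), ..., x(n+k-1)] becomes one input vector
    for a non-recurrent (e.g., feedforward) approximator.
    """
    return np.stack([x[n:n + k] for n in range(len(x) - k + 1)])

x = np.sin(np.linspace(0, 10, 50))   # toy time series x(t)
windows = make_windows(x, k=5)
print(windows.shape)                  # (46, 5): one 5-sample window per row
```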
RNNs are powerful in that information from previous time steps can be encoded in the network's internal state, so they do not need a time-window input.
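As a rough illustration (untrained random weights, not any particular RNN library), a single Elman-style recurrence shows how the hidden state, rather than a window, carries the history:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 8
W_in = rng.normal(size=(n_hidden, n_in))              # input weights (random, untrained)
W_rec = rng.normal(size=(n_hidden, n_hidden)) * 0.1   # recurrent weights

h = np.zeros(n_hidden)                                # hidden state accumulates the past
for x_t in np.sin(np.linspace(0, 10, 50)):
    # One scalar sample per step -- no window; h encodes previous time steps.
    h = np.tanh(W_in @ np.array([x_t]) + W_rec @ h)
```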
A window size is used in Time Delay Neural Networks and in other older networks such as NETtalk. The effect of the window size is easiest to explain with the example of reading text. If you use a sliding window with no overlap, it is equivalent to ignoring any time relationship between the elements of your training set of character sequences. As you add overlap, you supply the context of the neighboring characters to the one in the middle of the window, as the sketch below shows.
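A small sketch of that reading example (the sample text and window size are arbitrary): each overlapping window presents a character together with its neighbors, in the spirit of NETtalk's letter-plus-context input:

```python
text = "neural networks"
k = 5          # window size; odd, so there is a middle character
half = k // 2

# Overlapping windows: the target character is shown with its neighbors
# as context, rather than in isolation.
for i in range(half, len(text) - half):
    window = text[i - half:i + half + 1]
    print(f"context={window!r} -> target={text[i]!r}")
```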
See for example:
* Parallel Networks that Learn to Pronounce English Text, by Sejnowski and Rosenberg
* Phoneme Recognition Using Time-Delay Neural Networks, by Waibel et al.