For your input variables, it does not hurt to scale them to "reasonable" values: if you initialize your network with very small random weights, the input to the activation function of the first layer will be quite small early in training, so it will fall within the range of any reasonable activation function, and the subsequent evolution of the weights will be driven by your learning algorithm.
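A minimal sketch of this point, assuming NumPy and an illustrative two-feature dataset (all names and values here are hypothetical, not from the source): with inputs rescaled to roughly [-1, 1] and small random weights, the first-layer pre-activations stay well inside the near-linear region of tanh.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw inputs on very different scales (e.g. metres vs. grams).
X_raw = np.column_stack([rng.uniform(0, 1e4, 100), rng.uniform(0, 1e-2, 100)])

# Rescale each column to a "reasonable" range, here roughly [-1, 1].
X = 2 * (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0)) - 1

# Very small random initial weights, as assumed in the text.
W = rng.normal(scale=0.1, size=(2, 8))

# First-layer pre-activations stay small, i.e. inside the near-linear
# region of any reasonable activation function such as tanh.
z = X @ W
print(np.abs(z).max())  # typically well below 1
```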
For your output variables, they must obviously be scaled to lie within the output range of the activation function of your output layer.
For instance, for a binary target (-1, +1) and a tanh(x) activation function in the output layer, it is useful to rescale the targets to (-0.8, +0.8) or so: not rescaling may cost you many useless updates that "drag" already well-classified samples toward ±1 along vanishing gradients.
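A short sketch of the target rescaling described above (the helper name and the 0.8 factor are taken from the example; everything else is illustrative): targets of ±0.8 sit away from tanh's saturated asymptotes, so gradient descent does not waste updates on already well-classified samples.

```python
import numpy as np

def rescale_targets(y, factor=0.8):
    """Map binary (-1, +1) targets to (-0.8, +0.8) so a tanh output
    unit never has to chase its asymptotes at +/- 1."""
    return factor * np.asarray(y, dtype=float)

print(rescale_targets([-1, 1, 1, -1]))  # [-0.8  0.8  0.8 -0.8]
```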
Before the dataset was used for model construction, the inputs and target outputs were normalized (scaled linearly between 0 and 1) in order to improve the accuracy, performance, and training speed of the ANN. Normalization, in the sense of Brigitte E. and Jérôme P. (2008), involves two operations: centering and reduction. The first operation, centering, subtracts the mean from each value, yielding centered values; the second, reduction, divides each centered value by the standard deviation, which frees the data from the arbitrary units used to measure each variable.
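The two operations named in the paragraph can be summarised in a short NumPy sketch (function names and sample data are illustrative assumptions): min-max scaling produces the [0, 1] range used for the ANN, while centering followed by reduction yields zero-mean, unit-variance variables independent of the measurement units.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column linearly into [0, 1], the range used for the ANN."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

def center_and_reduce(x):
    """Centering (subtract the mean) then reduction (divide by the
    standard deviation), making the result unit-free."""
    centered = x - x.mean(axis=0)    # centering
    return centered / x.std(axis=0)  # reduction

data = np.array([[170.0, 65.0],
                 [180.0, 80.0],
                 [160.0, 55.0]])
print(min_max_scale(data))      # each column mapped into [0, 1]
print(center_and_reduce(data))  # each column: zero mean, unit variance
```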