Theoretically (as I understand it), the weights can lie anywhere in (-inf, +inf).
That is a very large space in which to search for the optimal solution; if we could limit this space, we might reach the optimal solution faster, or at least a near-optimal one.
Most artificial networks work with inputs scaled down to [0, 1] or [-1, 1], depending on the neuron's activation function.
If you know the limits of your input/output space, you should normalize your signal into this interval so that 1 represents the highest limit and 0 the lowest.
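For example, a minimal min-max scaling sketch (assuming you already know the per-feature minimum and maximum; the example values are made up):

```python
import numpy as np

def min_max_scale(x, lo, hi, target=(0.0, 1.0)):
    """Map values from the known range [lo, hi] to the target interval (default [0, 1])."""
    a, b = target
    return a + (x - lo) * (b - a) / (hi - lo)

# Example: raw readings known to lie in [0, 255], rescaled to [0, 1]
raw = np.array([0.0, 51.0, 127.5, 255.0])
scaled = min_max_scale(raw, 0.0, 255.0)   # -> [0. , 0.2, 0.5, 1. ]
```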
Then, at the level of the weights, it is also good practice to try to keep them within a predetermined range to avoid overfitting. That can be achieved through several methods, including taking the previous weight value into account when calculating the delta to be applied.
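One way to read "taking the previous weight value into account" is a weight-decay-style update, where the delta includes a term proportional to the current weight so that weights cannot grow without bound. A minimal sketch (the function name and learning-rate/decay values are just illustrative):

```python
import numpy as np

def sgd_step_with_decay(w, grad, lr=0.01, decay=1e-4):
    """One gradient step where the delta also includes a small fraction
    of the current weight (weight decay), keeping weights in check."""
    delta = -lr * (grad + decay * w)
    return w + delta

w = np.array([2.0, -3.0, 0.5])
g = np.array([0.1, -0.2, 0.05])
w = sgd_step_with_decay(w, g)
```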
Well, it depends very much on your weight adaptation algorithm. If you allow your weights to reach large values, then you also need to find a way to compensate for them in order to acquire new information in incremental learning... But it is true that, since the weights themselves are wrapped inside the activation function, you may not see any effect at the output level.
I agree with Stephanie. Moreover, there are learning methods with regularization that shrink the weights toward zero during training and protect against overfitting.
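A common example is an L2 penalty added to the loss: its gradient pushes every weight toward zero in proportion to its size. A minimal sketch for a linear model (the regularization strength `lam` is an arbitrary illustrative value):

```python
import numpy as np

def loss_with_l2(w, X, y, lam=0.1):
    """Mean squared error plus an L2 penalty lam * ||w||^2."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

def gradient(w, X, y, lam=0.1):
    """Gradient of the regularized loss; the 2 * lam * w term
    shrinks the weights toward zero at every update."""
    n = len(y)
    return (2.0 / n) * X.T @ (X @ w - y) + 2.0 * lam * w
```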