Why do we select MSE, computed over several time instants, as the error measure in nonlinear system identification? Why can't we use the error at only one time instant?
Of course, it is not necessary to use MSE. You can use other measures, as long as they remain global; otherwise the method may not be sufficiently robust. So you should definitely not use the local error at a single time instant. The heuristic is likely to reduce the error at that particular time, but it could yield a very large error somewhere else.
Local error measures are avoided because in system identification the intended models are dynamical and therefore have memory, so it makes sense to assess performance (or to minimize error) over a window of data.
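To make this concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration and not taken from the question: the toy model y[k+1] = a*y[k] + b*tanh(u[k]), the sinusoidal input, the "true" parameters a = 0.8 and b = 0.5, and the chosen instant k0 = 10. It picks a deliberately wrong value of a and then tunes b so that the model matches the data exactly at k0, and compares the error at k0 with the MSE over the whole window.

```python
import math

def simulate(a, b, u):
    """Simulate the toy model y[k+1] = a*y[k] + b*tanh(u[k]) with y[0] = 0."""
    y = [0.0]
    for uk in u[:-1]:
        y.append(a * y[-1] + b * math.tanh(uk))
    return y

def mse(y_model, y_meas):
    """Mean squared error over the whole data window."""
    return sum((m - d) ** 2 for m, d in zip(y_model, y_meas)) / len(y_meas)

# Hypothetical experiment: sinusoidal input, noise-free "measurements".
N = 50
u = [math.sin(0.3 * k) for k in range(N)]
a_true, b_true = 0.8, 0.5
y_meas = simulate(a_true, b_true, u)

# Fit using ONLY the error at one instant k0.
# With y[0] = 0 the output is linear in b, so for any fixed (wrong) a
# we can solve for the b that makes the error at k0 vanish.
k0 = 10
a_wrong = 0.3
b_wrong = y_meas[k0] / simulate(a_wrong, 1.0, u)[k0]
y_wrong = simulate(a_wrong, b_wrong, u)

err_k0 = abs(y_wrong[k0] - y_meas[k0])              # local error at k0
mse_wrong = mse(y_wrong, y_meas)                    # global error, wrong model
mse_true = mse(simulate(a_true, b_true, u), y_meas) # global error, true model

print(f"wrong model: |error at k0| = {err_k0:.2e}, MSE = {mse_wrong:.4f}")
print(f"true model:  MSE = {mse_true:.4f}")
```

The wrong model is indistinguishable from the true one if you look only at k0, because a single sample cannot pin down both the memory term a and the gain b. The MSE over the window separates the two models, which is the point made in the answers above.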