1. Using mini-batch gradient descent lets you compute each parameter update much faster than full-batch gradient descent. This is reasonable, as we do not need to go through the whole training set (which can be huge, or even effectively infinite) before updating the network parameters.
2. Note that some data points may be redundant samples, i.e., many of them are similar to one another. In that case, a subset of the data provides almost the same gradient as using all the data points; the sketch after this list illustrates both points.
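Below is a minimal sketch (not from the original answer) using synthetic, deliberately redundant data. All names (`mse_gradient`, the data shapes, the batch size) are illustrative assumptions. It shows that a mini-batch gradient touches far fewer samples per update than the full-batch gradient, yet, because the samples are redundant, it points in almost the same direction.

```python
# Sketch: mini-batch vs. full-batch gradient on redundant data (assumed setup).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, highly redundant data: many near-duplicate rows.
n, d = 100_000, 20
base = rng.normal(size=(1_000, d))
X = base[rng.integers(0, 1_000, size=n)] + 0.01 * rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # current parameters

def mse_gradient(Xb, yb, w):
    """Gradient of 0.5 * mean((Xb @ w - yb)**2) with respect to w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient: touches all n samples for a single update.
g_full = mse_gradient(X, y, w)

# Mini-batch gradient: touches only `batch_size` samples.
batch_size = 256
idx = rng.choice(n, size=batch_size, replace=False)
g_mini = mse_gradient(X[idx], y[idx], w)

cosine = g_full @ g_mini / (np.linalg.norm(g_full) * np.linalg.norm(g_mini))
print(f"cosine similarity between mini-batch and full gradient: {cosine:.4f}")
print(f"samples touched per update: {batch_size} vs {n}")
```

With data this redundant, the printed cosine similarity is typically close to 1, which is exactly why a mini-batch update of 256 samples is nearly as informative as an update computed from all 100,000.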