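NA is defined as n·sin T, where T is the half-angle of the widest cone of light that the system can bring to focus and n is the refractive index of the medium (n = 1 in air, so there NA = sin T).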
For a given lens diameter, the larger the angle T, the shorter the focal distance of the lens. A wider cone also spreads out faster on either side of the focus, so the higher the NA, the shorter the depth of field.
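It's exactly the same phenomenon as in photography: if you want a deeper depth of field, you have to close the iris, and you lose some definition because the diffraction limit gets coarser. On the other hand, if you want blur in the foreground and background, you open the iris to the maximum, and you get higher definition in the plane of focus (as far as diffraction is concerned; real lenses wide open may be limited by aberrations instead).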
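To put rough numbers on that trade-off, here is a minimal sketch. It assumes a diffraction-limited system and uses two standard textbook approximations that are not from the post above: the Rayleigh criterion, 0.61·λ/NA, for lateral resolution, and n·λ/NA² for the diffraction-limited depth of field. The function names and the 0.55 µm wavelength are chosen only for illustration.

```python
def lateral_resolution(wavelength, na):
    """Rayleigh criterion: smallest resolvable separation, 0.61 * lambda / NA."""
    return 0.61 * wavelength / na


def depth_of_field(wavelength, na, n=1.0):
    """Diffraction-limited depth of field, n * lambda / NA**2."""
    return n * wavelength / na**2


# Assumed for illustration: green light, expressed in micrometres.
wavelength_um = 0.55

for na in (0.1, 0.25, 0.5, 0.9):
    res = lateral_resolution(wavelength_um, na)
    dof = depth_of_field(wavelength_um, na)
    print(f"NA={na:4.2f}  resolution={res:6.3f} um  depth of field={dof:7.3f} um")
```

Resolution scales as 1/NA while depth of field scales as 1/NA², so opening the aperture costs depth of field faster than it gains resolution.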
In simple terms (at least as long as diffractive effects are not dominant), the smaller the aperture, the lower the transverse aberration. This means that the spot diagram (or the PSF) degrades more slowly as you move away from focus than it does for larger apertures. As Olivier mentioned, this is the effect of the NA: it controls the cone of light (bounded by the marginal rays). If the cone is very narrow, there is a longer distance (e.g. the Rayleigh length) over which the image stays, relatively speaking, close to "in focus".
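As a rough illustration of the Rayleigh length argument, the sketch below treats the focused light as an ideal Gaussian beam in the paraxial limit. That is an assumption added here, not something stated in the thread. Under it, the waist radius is w0 = λ/(π·NA) and the Rayleigh length is z_R = π·w0²·n/λ, with λ the vacuum wavelength; the function names are hypothetical.

```python
import math


def waist_radius(wavelength, na):
    """Gaussian beam waist radius w0 = lambda / (pi * NA), paraxial approximation."""
    return wavelength / (math.pi * na)


def rayleigh_length(wavelength, na, n=1.0):
    """Rayleigh length z_R = pi * w0**2 * n / lambda, i.e. n * lambda / (pi * NA**2)."""
    w0 = waist_radius(wavelength, na)
    return math.pi * w0**2 * n / wavelength


def beam_radius(z, wavelength, na, n=1.0):
    """Beam radius at distance z from the waist: w0 * sqrt(1 + (z / z_R)**2)."""
    w0 = waist_radius(wavelength, na)
    z_r = rayleigh_length(wavelength, na, n)
    return w0 * math.sqrt(1.0 + (z / z_r) ** 2)


# Assumed for illustration: 0.55 um light in air.
for na in (0.05, 0.1, 0.2):
    z_r = rayleigh_length(0.55, na)
    print(f"NA={na:4.2f}  Rayleigh length={z_r:8.2f} um")
```

Halving the NA quadruples the Rayleigh length, which is the "narrow cone stays in focus longer" effect in numbers. At z = z_R the beam radius has grown by a factor of √2 over its waist value.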