They say in the comments and documentation that it is implemented using a "Direct Form II Transposed" realization.
But for FIR filters (i.e. when a = 1) I suspect it uses an fft() to do the convolution in the frequency domain when it is faster to do so.
This would make sense because the procedure operates on a batch of data, not a sequential stream.
I don't suppose it really matters if the result is the same. I was just wondering if anybody knew for sure (for sizing and timing reasons).