MMSE and sparse recovery methods both aim to improve on classical LS channel estimation by injecting prior information. For MMSE, the prior is learnt over time: the channel is assumed to be a correlated Gaussian vector whose covariance matrix is estimated from previous channel estimates. For sparse recovery methods, the prior is based on physics: only a few paths are assumed to contribute significantly to the channel, so that, provided the plane-wave assumption holds, the channel can be expressed as a sparse linear combination of steering vectors.
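To make the two priors concrete, here is a minimal Python/NumPy sketch on a purely synthetic channel, so it says nothing about realistic channels. Everything in it is an assumption for illustration:

- a 32-antenna half-wavelength ULA;
- K = 3 paths around hypothetical fixed directions (`MEAN_U`), with a small random angular jitter (`JITTER`) and Rayleigh gains;
- an identity pilot, so the LS estimate is simply y = h + n;
- a linear MMSE (Wiener) filter W = R(R + σ²I)⁻¹, with R learnt from the sample covariance of 500 past LS estimates (noise floor subtracted, eigenvalues clipped at zero);
- orthogonal matching pursuit (OMP) over a 4× oversampled grid of steering vectors as the sparse-recovery method.

The helper names (`steering`, `draw_channel`, `ls_estimate`, `learn_mmse_filter`, `omp`) are hypothetical and written only for this example.

```python
import numpy as np

# All parameters below are illustrative assumptions, not taken from any real system.
N = 32                                # antennas, half-wavelength ULA
K = 3                                 # number of dominant paths
NOISE_VAR = 10 ** (-10.0 / 10)        # 10 dB SNR, with E|h_n|^2 = 1
MEAN_U = np.array([-0.5, 0.1, 0.6])   # hypothetical mean directions, u = sin(theta)
JITTER = 0.01                         # hypothetical per-realization spread in u
rng = np.random.default_rng(0)


def steering(u):
    """ULA steering vectors, one column per direction cosine in u."""
    return np.exp(-1j * np.pi * np.outer(np.arange(N), np.atleast_1d(u)))


def draw_channel():
    """Few-path plane-wave channel with Rayleigh gains, normalized so E||h||^2 = N."""
    u = MEAN_U + JITTER * rng.standard_normal(K)
    g = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2 * K)
    return steering(u) @ g


def ls_estimate(h):
    """With an identity pilot, the LS estimate is the noisy observation y = h + n."""
    n = np.sqrt(NOISE_VAR / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return h + n


def learn_mmse_filter(past_ls):
    """Wiener filter W = R (R + s^2 I)^-1 with R learnt from past LS estimates (rows)."""
    c_y = past_ls.T @ past_ls.conj() / past_ls.shape[0]   # sample E[y y^H]
    eigval, eigvec = np.linalg.eigh(c_y)
    lam_h = np.clip(eigval - NOISE_VAR, 0.0, None)        # remove noise floor, keep PSD
    gains = lam_h / (lam_h + NOISE_VAR)
    return (eigvec * gains) @ eigvec.conj().T


def omp(y, dictionary, k):
    """Orthogonal matching pursuit: greedily pick k atoms, refit by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(dictionary.conj().T @ residual))))
        atoms = dictionary[:, support]
        coef, *_ = np.linalg.lstsq(atoms, y, rcond=None)
        residual = y - atoms @ coef
    return atoms @ coef


def nmse(est, true):
    return np.sum(np.abs(est - true) ** 2) / np.sum(np.abs(true) ** 2)


grid = steering(np.linspace(-1, 1, 4 * N, endpoint=False))   # 4x oversampled grid
W = learn_mmse_filter(np.array([ls_estimate(draw_channel()) for _ in range(500)]))

H = np.array([draw_channel() for _ in range(200)])
Y = np.array([ls_estimate(h) for h in H])
results = {
    "ls": nmse(Y, H),
    "mmse": nmse(Y @ W.T, H),                                  # row-wise W @ y
    "omp": nmse(np.array([omp(y, grid, K) for y in Y]), H),
}
for name, value in results.items():
    print(f"{name:>4}: NMSE = {10 * np.log10(value):.1f} dB")
```

The printed NMSE values depend heavily on the assumed jitter, SNR and grid. Off-grid paths or a larger angular spread typically penalise OMP, while a short or outdated covariance history typically penalises MMSE. This sensitivity is exactly why a comparison on realistic channels matters.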
I wonder if the two methods have already been thoroughly compared on realistic channels.