BM3D is a powerful image denoising method. There have been some recent attempts to achieve better results with the sparse representation framework by incorporating prior knowledge (dictionary learning), regional information (localized histograms) and other features. I have recently published an IEEE TSP paper entitled "An Adaptive Approach to Learn Overcomplete Dictionaries With Efficient Numbers of Elements" (DLENE). I have applied DLENE to image denoising, and the results are competitive with BM3D. The manuscript, along with its MATLAB code, is available on IEEE Xplore. Sadly, I have no experience with GPU programming, so I cannot help you in that regard. However, if you are interested in DLENE, I will try to support you with C++ programming and algorithmic problems.
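For readers less familiar with the sparse representation framework, a minimal sketch of its core step may help: coding a vectorized patch as a sparse combination of dictionary atoms with orthogonal matching pursuit (OMP). This is a generic textbook technique, not DLENE; DLENE concerns how the dictionary itself is learned, which is not shown here. The function name `omp`, the fixed sparsity level `k`, and the assumption of unit-norm atoms are illustrative choices.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit (illustrative sketch, not DLENE).

    D: (n, m) dictionary, columns assumed to have unit norm.
    y: (n,) signal, e.g. a vectorized image patch.
    k: maximum number of atoms to select (assumed fixed here).
    Returns an (m,) coefficient vector with at most k nonzeros.
    """
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    x_s = np.zeros(0)
    for _ in range(k):
        # Greedy step: pick the atom most correlated with the residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            # Residual is already orthogonal to the chosen atoms; stop.
            break
        support.append(j)
        # Least-squares fit of y on all atoms selected so far.
        x_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x_s
    coef[support] = x_s
    return coef
```

For denoising, one would typically code each overlapping noisy patch, reconstruct it as `D @ coef`, and average the overlapping reconstructions; stopping when the residual falls below a threshold tied to the noise level, rather than at a fixed `k`, is a common variant.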
BM3D is certainly a powerful Gaussian denoising algorithm, but it is not the most powerful one. A few algorithms outperform BM3D in denoising quality, such as these learning-based approaches:
LSSC, Non-local sparse models for image restoration, ICCV 2009
GMM-EPLL, From learning models of natural image patches to whole image restoration, ICCV 2011
opt-MRF, Insights into analysis operator learning: From patch-based sparse models to higher order MRFs, IEEE TIP 2014
WNNM, Weighted nuclear norm minimization with application to image denoising, CVPR 2014
CSF, Shrinkage fields for effective image restoration, CVPR 2014
So far, however, only CSF also outperforms BM3D in run time. All the other competing algorithms are slower than BM3D on a CPU; WNNM in particular is quite slow.
We have recently developed a denoising algorithm that beats BM3D in both image quality and run time. Our approach is based on the very old concept of nonlinear diffusion (partial differential equations, PDEs), but the model is optimized by training. Our paper will be publicly available soon, after the CVPR 2015 review.
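To make the nonlinear diffusion idea concrete, here is a minimal sketch of a classical, untrained nonlinear diffusion denoiser: explicit Perona–Malik steps plus a reaction (data-fidelity) term that pulls the estimate back toward the noisy input. This is not the trained model described above. In a trained model the filters and nonlinearities are learned from data, whereas here the four nearest-neighbour differences, the diffusivity `g`, and the parameters `dt`, `kappa`, `lam` and `steps` are fixed, hand-picked assumptions.

```python
import numpy as np

def diffusion_step(u, f, dt=0.2, kappa=0.1, lam=0.05):
    """One explicit step of Perona-Malik diffusion plus a reaction term.

    u: current estimate, f: noisy input (2-D float arrays, same shape).
    dt, kappa, lam: illustrative step size, edge threshold, fidelity weight.
    """
    # Differences to the four neighbours; edge padding gives zero flux at borders.
    p = np.pad(u, 1, mode="edge")
    dn = p[:-2, 1:-1] - u   # north
    ds = p[2:, 1:-1] - u    # south
    de = p[1:-1, 2:] - u    # east
    dw = p[1:-1, :-2] - u   # west

    # Edge-stopping diffusivity: close to 1 for small differences (noise),
    # close to 0 for large ones (edges), so edges are largely preserved.
    def g(d):
        return 1.0 / (1.0 + (d / kappa) ** 2)

    flux = g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw
    # Reaction term keeps the estimate from drifting too far from the data f.
    return u + dt * (flux - lam * (u - f))

def denoise(f, steps=20, **kw):
    """Run a fixed number of diffusion steps starting from the noisy image."""
    u = f.astype(float).copy()
    for _ in range(steps):
        u = diffusion_step(u, f, **kw)
    return u
```

Each pixel's update depends only on its four neighbours, so a step maps naturally to one GPU thread per pixel, which is one reason diffusion-type schemes tend to parallelize well.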
Concerning the GPU implementation of BM3D, it seems there are a few people trying to do this, such as
Regarding Miguel Bordallo Lopez's answer below, I also want to mention that our diffusion-based algorithm is well suited to GPU implementation and is very easy to port to a GPU. We will be glad to make our algorithm publicly available soon. We believe it is a more appealing denoising algorithm in terms of both quality and speed.
At the University of Oulu we have an "almost ready" implementation for GPUs (hopefully it will also work on mobile devices). You can contact us if you are interested in more details.
Our recent work on optimized nonlinear reaction diffusion models has been accepted as an oral presentation at CVPR 2015. The peer-review version is uploaded to my publications. Basically, for general image denoising, it is faster and better than BM3D. Feel free to download it.