The best approach is to let the machine learn to understand an image both with and without noise. Nowadays models are capable of classifying images with considerable noise levels, thanks to data augmentation during the training phase. In augmentation, the training images are modified by randomly adding different types of noise on top of whatever noise is already present. This helps the model stay robust when classifying images that contain anything from no noise to several significant kinds of noise. Check the following articles for more details.
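As a rough illustration of the noise augmentation described above, here is a minimal NumPy sketch. The function names, the two noise types (Gaussian and salt-and-pepper), and the noise ranges are my own assumptions for illustration; the right choices depend on your data.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a float image with values in [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(image, amount, rng):
    """Set roughly `amount` of the pixels to pure black or pure white."""
    noisy = image.copy()
    mask = rng.random(image.shape)
    noisy[mask < amount / 2] = 0.0        # pepper
    noisy[mask > 1 - amount / 2] = 1.0    # salt
    return noisy

def augment(image, rng):
    """Return the image unchanged or with one randomly chosen noise type."""
    choice = rng.integers(0, 3)
    if choice == 0:
        return image.copy()               # keep some clean examples
    if choice == 1:
        # Assumed noise range, chosen only for illustration
        return add_gaussian_noise(image, sigma=rng.uniform(0.01, 0.2), rng=rng)
    return add_salt_and_pepper(image, amount=rng.uniform(0.01, 0.1), rng=rng)

rng = np.random.default_rng(0)
clean = rng.random((32, 32))              # stand-in for a real training image
batch = np.stack([augment(clean, rng) for _ in range(8)])
```

In practice you would apply something like `augment` on the fly to each training batch, so the model sees a different noisy version of every image in each epoch.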
It is important to first understand what you want to classify; anything other than your region of interest is then "noise". If you can tell me the type of image and what you are looking for in it, I can suggest denoising techniques.
I would suggest using a convolutional neural network (CNN). There are many tools for building CNNs, such as TensorFlow, Caffe, and Keras. CNN models are now widely used for image classification because they can extract the features themselves.
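To give a feel for what "extracting features" means in a CNN, here is a small NumPy sketch of a single convolution, ReLU, and max-pooling step. This is not how you would build a real model (use one of the frameworks above for that). The kernel here is hand-picked by me for illustration, whereas a trained CNN learns its kernels from the data.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2-D kernel over a 2-D image without padding (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Downsample by taking the maximum of each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (an assumption; real CNNs learn these)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = max_pool_2x2(relu(conv2d_valid(image, kernel)))
```

The resulting `feature_map` responds strongly where the dark-to-bright edge is, which is the kind of feature a CNN layer picks up without anyone designing it by hand.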
Models such as CNNs and the others suggested by peers have a high computational cost. To the best of my knowledge, these models cannot achieve 100% retrieval accuracy, and I think there is no need to achieve 100% accuracy in retrieval. It is enough that the required image is among those retrieved. However, the system should retrieve only a limited number of images, because precision is determined by that number.
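To make the point about precision concrete, here is a small sketch of precision over the top k retrieved images, plus a check for whether the required image is among them. The function names and the image IDs are made up for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)

def contains_required(retrieved, required, k):
    """True if the required image appears among the top-k results."""
    return required in retrieved[:k]

# Hypothetical ranked result list and ground-truth relevant set
retrieved = ["img7", "img2", "img9", "img4", "img1"]
relevant = {"img2", "img4"}

p_at_5 = precision_at_k(retrieved, relevant, k=5)   # 2 hits out of 5
p_at_2 = precision_at_k(retrieved, relevant, k=2)   # 1 hit out of 2
found = contains_required(retrieved, "img4", k=5)
```

Returning fewer images tends to raise precision as long as the relevant ones are ranked near the top, which is why the size of the retrieved set matters.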
In your case, I do not know what type of images you are considering, but whatever they are, I would like to suggest a general idea.
First, remove the noise, because it may lead to low-accuracy feature extraction. Then segment the images into homogeneous regions, if the images are structured, and extract features region-wise. Now you can apply a distance measure (choose either a distribution-based or a point-wise/pairwise measure, depending on the number of features). If possible, also incorporate semantic concepts; this should improve the accuracy/precision of the retrieval.
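The steps above could be sketched roughly as follows, using NumPy only. Several things here are my own simplifying assumptions: a 3x3 median filter stands in for denoising, a fixed grid stands in for real segmentation into homogeneous regions, per-region intensity histograms are the features, and the chi-square distance is the distribution-based measure. Images are assumed to be 2-D float arrays with values in [0, 1].

```python
import numpy as np

def median_denoise(image):
    """3x3 median filter with edge padding (a simple denoising stand-in)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    # Stack the nine shifted views that make up each pixel's 3x3 neighbourhood
    windows = np.stack([padded[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)])
    return np.median(windows, axis=0)

def region_features(image, grid=2, bins=8):
    """Normalised intensity histogram per grid cell, concatenated.

    The fixed grid is only a placeholder for a real segmentation step.
    """
    h, w = image.shape
    feats = []
    for rows in np.array_split(np.arange(h), grid):
        for cols in np.array_split(np.arange(w), grid):
            region = image[np.ix_(rows, cols)]
            hist, _ = np.histogram(region, bins=bins, range=(0.0, 1.0))
            feats.append(hist / hist.sum())
    return np.concatenate(feats)

def chi_square_distance(p, q, eps=1e-12):
    """Distribution-based distance between two histogram feature vectors."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def rank_database(query, database):
    """Return database indices sorted from most to least similar to the query."""
    q = region_features(median_denoise(query))
    dists = [chi_square_distance(q, region_features(median_denoise(img)))
             for img in database]
    return np.argsort(dists)

# Toy database: a horizontal gradient, an all-black and an all-white image
gradient = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
database = [gradient, np.zeros((16, 16)), np.ones((16, 16))]

# Query: the gradient corrupted with a little salt-and-pepper noise
rng = np.random.default_rng(0)
query = gradient.copy()
mask = rng.random(query.shape)
query[mask < 0.02] = 0.0
query[mask > 0.98] = 1.0

ranking = rank_database(query, database)
```

Here `ranking[0]` is the index of the closest database image. For real data you would replace the grid with an actual segmentation and probably use richer features (texture, colour, shape) per region.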