Are CNN algorithms (ResNet-18 and ResNet-50) better for gemstone classification (from an image dataset) than Random Forest, Logistic Regression, and SVM?

Hello Bojan Andonovski,

I skimmed through the paper you referenced. The Random Forest algorithm was used with certain handcrafted features, which seem to include color and texture features. Notably, that particular feature combination seems to give the best performance across all the models they evaluated (Figure 8 in the paper).

Intuitively, considering the appearance of a gemstone, it makes sense that explicit color and texture information would be quite useful for this task. I am by no means a gemstone expert, but it is clear (from Figure 4 in the paper), for example, that the textural properties of the different gemstones are useful information. The same goes for the color properties, which (per the paper) were extracted in 3 different color spaces.
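As a rough illustration of what handcrafted color features computed in several color spaces might look like, here is a minimal Python sketch using only the standard library. The choice of RGB, HSV and YIQ, the mean/standard-deviation statistics, and the function name color_features are my own assumptions for illustration; they are not the features used in the paper.

```python
import colorsys
from statistics import mean, pstdev

def color_features(pixels):
    """Return per-channel mean and std in three color spaces.

    pixels: iterable of (r, g, b) tuples with channels scaled to [0, 1].
    The color spaces (RGB, HSV, YIQ) are an assumption for this sketch.
    """
    pixels = list(pixels)
    spaces = {
        "rgb": pixels,
        "hsv": [colorsys.rgb_to_hsv(*p) for p in pixels],
        "yiq": [colorsys.rgb_to_yiq(*p) for p in pixels],
    }
    features = []
    for name in ("rgb", "hsv", "yiq"):
        # zip(*...) turns a list of triples into three per-channel sequences
        for channel in zip(*spaces[name]):
            features.append(mean(channel))
            # Note: hue is circular, so its plain mean/std is only a crude summary
            features.append(pstdev(channel))
    return features  # 3 spaces x 3 channels x 2 statistics = 18 values

# Usage: a tiny all-red "image" given as a flat list of pixels
red_patch = [(1.0, 0.0, 0.0)] * 4
print(len(color_features(red_patch)))
```

A vector like this (usually combined with texture descriptors) is what a Random Forest or SVM would then be trained on.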

CNNs also perform feature extraction: the convolutional layers in particular are known to construct features hierarchically across layers. However, the features CNNs learn are not necessarily as detailed or domain-specific as the handcrafted features the authors used. More importantly, the CNNs used were originally trained on natural images (and limited to the RGB color space), then used in a transfer-learning setting: specifically, the authors trained only the classification part of the CNN, not any of the convolutional layers responsible for feature extraction. Given the distance between natural images and gemstones, it would have been a good idea to fine-tune some of the convolutional layers or, better yet, train a CNN from scratch on the gemstone images. Even so, the CNN performed rather well: it outperformed at least 4 other classifiers (NB, KNN, DT, LDA), even when those classifiers used the color + texture feature vector.

So in summary, given the problem and their methodology, it is not very strange that their method outperformed the CNN. They chose features that are very well suited to the problem, and used the CNN in a way that may have limited its potential. On a side note, the wider implication is that feature engineering (especially for non-standard problems like this one) is still very relevant. Automatic feature extraction (via CNNs and the like) is certainly more convenient, but is not guaranteed to be the best approach 100% of the time. There's no free lunch after all :-)

In your case, since you have such a large dataset, I would recommend training a CNN from scratch (you can use either of the ResNets they considered, or the newer models suggested by Giannis Tolios above) rather than using transfer learning. I expect this approach would provide compelling results. You could even augment the CNN features with some handcrafted features in a hybrid approach, which could provide further performance gains.
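One simple way to build such a hybrid is to concatenate the CNN embedding with the handcrafted feature vector before the final classifier. The sketch below assumes NumPy; the function name fuse_features and the feature dimensions are hypothetical. Each block is standardized first so that neither dominates just because of its scale.

```python
import numpy as np

def fuse_features(cnn_feats, handcrafted_feats, eps=1e-8):
    """Standardize each feature block per column, then concatenate them."""
    cnn = np.asarray(cnn_feats, dtype=np.float64)
    hand = np.asarray(handcrafted_feats, dtype=np.float64)
    if cnn.shape[0] != hand.shape[0]:
        raise ValueError("both blocks must describe the same images")

    def zscore(x):
        # In practice, compute mean/std on the training split only and reuse them
        return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

    return np.concatenate([zscore(cnn), zscore(hand)], axis=1)

# Usage with random stand-in data: 5 images, 512-d CNN embedding, 18-d handcrafted vector
rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=(5, 512)), 100 * rng.normal(size=(5, 18)))
print(fused.shape)
```

The fused matrix can then be fed to any classifier (a linear layer, an SVM, a Random Forest). Whether this actually helps is something to verify on a held-out set.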

I wish you the best of luck.

Tawfiq Beghriche

For a large image classification task with over 100,000 images, Convolutional Neural Networks (CNNs) are generally a better choice than Random Forest. CNNs have been shown to be highly effective for image classification tasks, especially when the dataset is large, as they are able to learn complex patterns and features in images. On the other hand, Random Forest is a decision tree-based algorithm that can handle large datasets but is not specifically designed for image classification.

Engr. Dr. Fawad Salam Khan

The results of the paper you mentioned could be due to several reasons, such as the specific features that were extracted from the images for training the algorithms, the quality of the annotations, the training and validation procedures, and the size and complexity of the problem. Both Random Forest and CNNs can be good options for image classification problems, depending on the specifics of the problem and the resources available. Random Forest is a traditional machine learning algorithm that is relatively simple to implement and can be very effective when the features of the images are well-defined and easy to extract. However, it might not perform as well as CNNs when dealing with more complex and high-dimensional data, such as images.

On the other hand, CNNs are designed to automatically learn hierarchical representations of the image data, which makes them well suited for handling complex visual data. They can also be more accurate than Random Forest, especially when large amounts of annotated data are available.

Regarding YOLO v8 or v7, they are object detection models, not classification models. Object detection models can be used to locate and classify multiple objects in an image, while classification models only predict a single class label for an entire image. If you are interested in classifying individual stones in images, then object detection models might not be the best choice for your problem.

The choice of algorithm for your problem depends on several factors, such as the quality and size of the dataset, the complexity of the problem, and the computational resources available. It might be a good idea to try both Random Forest and CNNs and compare their performance on your specific problem. You can also try to extract more relevant features from the images and use other feature selection techniques to improve the performance of the algorithms.

Manuel Günther

I think the main reason is dataset size. The paper that you mentioned uses 2000 images and 70 classes (about 30 images per class) for training, which is usually too small for training image-based deep networks such as the ResNet-18. Generally, the rule of 10 suggests that you need about ten times as many training samples as you have parameters in your model (when dealing with images, probably smaller amounts are sufficient).
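To put rough numbers on this, the short sketch below works through the arithmetic. The ResNet-18 parameter count is an approximation on my part (about 11.7 million for the standard ImageNet version), and the function names are only illustrative.

```python
def images_per_class(n_images, n_classes):
    return n_images / n_classes

def rule_of_ten_samples(n_parameters):
    # Rule of 10: roughly ten training samples per model parameter
    return 10 * n_parameters

RESNET18_PARAMS_APPROX = 11_700_000  # approximate; standard ImageNet ResNet-18

print(round(images_per_class(2000, 70), 1))         # about 28.6 images per class
print(rule_of_ten_samples(RESNET18_PARAMS_APPROX))  # 117000000 samples
```

Even if images need far fewer samples than the rule suggests, 2000 images is orders of magnitude below that figure, which is why training from scratch on the paper's dataset would be difficult.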

Since your dataset is much larger and surely sufficient to train a deep network, I would strongly recommend switching from hand-crafted features to automatically learned ones. It is worth considering fine-tuning a pre-trained network, for example one trained on ImageNet. This typically reduces the training time and leads to better results.

Abdelaaziz Hessane

Dear Bojan Andonovski, you must consider many factors before deciding which machine learning algorithm to use. For example, if you want to automate the feature engineering step, especially the extraction of relevant descriptors from your images, and you have enough data (how much depends on the number of classes you have; for example, a minimum of 1000 images per class), then I suggest using deep learning-based methods. However, these methods are computationally costly because of their deep and complex architectures. On the other hand, classical machine learning algorithms may be effective on smaller datasets, but features need to be computed manually, and you may need additional preprocessing techniques such as dimensionality reduction. Good luck.
