Should I combine both datasets and train the ML algorithm on the combined data, or should I train the algorithm separately on each dataset and test on a single set? The test set is the combination of both datasets.
If there are enough images, I would use one dataset for training and the other for testing. This helps show that your network generalises well to different datasets and that you haven't overfitted the data. One of the big challenges of training classifiers on histology images is accounting for differences in staining quality (two H&E images from two labs rarely have similar hues).
Obviously, if you have a very small number of images, then another approach is needed.
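To make the "one for training, one for testing" idea concrete, here is a minimal sketch of a split by dataset of origin. The record layout, the lab names and the file paths are all hypothetical and only serve as an illustration; the function simply keeps the two sources apart and refuses to continue if the same file ends up on both sides.

```python
def split_by_source(records, train_source, test_source):
    """Train on one dataset and test on the other, without mixing them."""
    train = [r for r in records if r["source"] == train_source]
    test = [r for r in records if r["source"] == test_source]
    # Guard against the same image file appearing in both splits.
    overlap = {r["path"] for r in train} & {r["path"] for r in test}
    if overlap:
        raise ValueError(f"images present in both splits: {sorted(overlap)}")
    return train, test


# Hypothetical records: paths, labels and lab names are made up.
records = [
    {"path": "labA/img001.png", "label": "tumour", "source": "labA"},
    {"path": "labA/img002.png", "label": "normal", "source": "labA"},
    {"path": "labB/img001.png", "label": "tumour", "source": "labB"},
    {"path": "labB/img002.png", "label": "normal", "source": "labB"},
    {"path": "labB/img003.png", "label": "normal", "source": "labB"},
]

train, test = split_by_source(records, "labA", "labB")
print(len(train), len(test))
```

A model that scores well on the second lab's images after seeing only the first lab's images gives some evidence that it has not just learned one lab's staining.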
Thank you, sir. Yes, the dataset that I have used for training contains 57 images, and the test dataset contains almost 2000 images.
If the H&E images were stained in different labs, will training and testing on different datasets not work? And to make the hues similar, can I do color normalization?
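As a rough illustration of what color normalization can mean here, the sketch below shifts and scales each RGB channel of one image so that its mean and standard deviation match those of a reference image. This is a deliberately simple statistics-matching approach and only one of several possible choices, not a method recommended anywhere in this thread; the function name and the tiny pixel lists are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
from statistics import fmean, pstdev


def match_channel_stats(source_pixels, target_pixels):
    """Match the per-channel mean and std of source pixels to the target's.

    Both arguments are lists of (R, G, B) tuples with values in 0..255.
    """
    out_channels = []
    for c in range(3):
        src = [p[c] for p in source_pixels]
        tgt = [p[c] for p in target_pixels]
        src_mean, src_std = fmean(src), pstdev(src)
        tgt_mean, tgt_std = fmean(tgt), pstdev(tgt)
        # A flat source channel cannot be rescaled; only shift it.
        scale = tgt_std / src_std if src_std > 0 else 1.0
        out_channels.append([
            min(255, max(0, round((v - src_mean) * scale + tgt_mean)))
            for v in src
        ])
    # Re-assemble the three channel lists into (R, G, B) tuples.
    return list(zip(*out_channels))


# Hypothetical two-pixel "images" from two differently stained slides.
source = [(200, 100, 150), (220, 120, 170)]
reference = [(180, 80, 160), (200, 120, 180)]

normalized = match_channel_stats(source, reference)
print(normalized)  # [(180, 80, 160), (200, 120, 180)]
```

Whether this kind of normalization is enough to make two labs' images comparable would still need to be checked on the actual data.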
Hope you are doing well. Just some basic issues first. Why two sets? Are these normal and pathologic specimens? Were the tissue specimens stained automatically by an instrument, or with different batches of H&E? (If so, the hue of the staining will differ, and then only a trained, experienced histopathologist will help.) What are the image parameters: size, magnification, resolution (e.g. 600-1200 dpi) and disk space (e.g. at least 5 MB per image)? The hue is important because it sometimes correlates with the extent of the pathology. I use 3 software programs for the final analysis:
1) GIMP to normalize the dpi of the images, should this be necessary, especially when different batches of H&E were used (3 normals/standards) or different labs were involved (3 normals/standards from each laboratory);
2) ImageJ (Fiji), for which histology plugins are available, for quantifying the images; and
3) RStudio to statistically evaluate the findings. Use 30% of the specimens for the training set and 70% for the testing set; have 10 specimens evaluated manually and place these in the testing set.
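The 30%/70% split with the manually evaluated specimens kept in the testing set could be sketched as follows. The specimen IDs, the function name and the fixed random seed are hypothetical, and the sketch assumes that the 30% is counted over all specimens, including the 10 manual ones; the post does not say which way it should be counted.

```python
import random


def split_30_70(specimens, manual_ids, train_fraction=0.3, seed=0):
    """Randomly split specimens, forcing manually evaluated ones into the test set."""
    manual = [s for s in specimens if s in manual_ids]
    pool = [s for s in specimens if s not in manual_ids]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pool)
    # Assumption: the training share is a fraction of ALL specimens.
    n_train = min(round(train_fraction * len(specimens)), len(pool))
    train = pool[:n_train]
    test = pool[n_train:] + manual
    return train, test


# Hypothetical specimen IDs; the first 10 stand in for the manually evaluated ones.
specimens = [f"specimen_{i:03d}" for i in range(100)]
manual_ids = set(specimens[:10])

train_set, test_set = split_30_70(specimens, manual_ids)
print(len(train_set), len(test_set))  # 30 70
```

Keeping the manually evaluated specimens out of training means the automated result can be compared against the manual assessment on images the method has not seen.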
I hope these guidelines help. All of the software is freely available.
I should add my configuration: OS = (Arco)Linux; CPU = i7; RAM = 24 GB; swap = 24 GB; HDD = 500 GB SSD; rendering = 2 GB; additional storage = 2 TB; elapsed time = 4.2-5.8 minutes. I have not attempted the evaluation on other configurations. If this configuration is not possible, then please look up NDIPTools, a publication, with a Google Scholar search. Wishing you every success.