I wish to estimate the expression of two types of markers by immunohistochemistry in biopsy specimens of a particular type of cancer, and to correlate their expression with clinical parameters and outcomes. We see approximately 400-500 patients with this cancer every year at our centre, which is about 10% of all our cancer patients. The crude rate of this cancer in India is also around 10%. The estimated prevalence of the two markers varies between 50% and 75% in published studies. What would be my ideal sample size?

How the data are split into training and testing (or validation) samples depends on the type of model selected. For parametric models, the training sample may be 10 to 20% of the data, with the remainder used for testing. For non-parametric models, however, the usual practice is 50 to 70% for training and the remainder for testing. For object detection, the data may be split into three parts: training, testing and validation. In general there is no specific optimal sample size that gives the best accuracy estimate; it varies with the data and the model.
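A minimal sketch of such a three-way split, using only the Python standard library. The function name `split_indices`, the 60/20/20 fractions and the figure of 450 samples are illustrative assumptions, not recommendations; substitute whatever proportions suit your model.

```python
import random


def split_indices(n_samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the indices 0..n_samples-1 and cut them into three disjoint parts.

    Hypothetical helper for illustration: whatever is left after the training
    and validation shares becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    indices = list(range(n_samples))
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Assumed example: 450 samples split 60/20/20.
train, val, test = split_indices(450, train_frac=0.6, val_frac=0.2)
print(len(train), len(val), len(test))
```

Setting `val_frac=0` gives a plain two-way training/testing split.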

It depends on the general population you want to infer to. The minimum should be 400 participants. However, you can add a 5% margin of error. If you are comparing two groups, you will need 420 participants in each, for a total of 840 participants including the 5% margin of error for each group.
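The arithmetic behind these figures can be checked with a short sketch. It assumes the 5% is simply added on top of the base number for each group; the function name `inflate` is hypothetical.

```python
def inflate(n_base, allowance_pct=5):
    """Return n_base increased by allowance_pct percent, rounded up.

    Integer arithmetic is used so the result is exact; the negated floor
    division is a ceiling division.
    """
    return -(-n_base * (100 + allowance_pct) // 100)


per_group = inflate(400)   # 400 plus 5%
total = 2 * per_group      # two groups of equal size
print(per_group, total)
```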

The classical sample size calculations are for studies based on statistical inference rather than descriptive ones. In a descriptive study, you describe what you observed using summary statistics. As always, more is better, but no inference beyond the data set you have is meaningful. See, e.g., W.G. Cochran, Sampling Techniques, 3rd ed.
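For reference, one classical calculation of this kind is the standard formula for estimating a single proportion, n0 = z^2 p (1 - p) / d^2, optionally followed by a finite population correction. The sketch below is only an illustration under stated assumptions: 95% confidence and an absolute precision of d = 0.05 are assumed, the prevalences 0.50 and 0.75 are the range quoted in the question, and N = 450 is an assumed population size taken from the middle of the 400-500 patients per year mentioned there. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, d, confidence=0.95):
    """Unrounded sample size n0 = z^2 * p * (1 - p) / d^2 for one proportion."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / d ** 2


def finite_population_correction(n0, population):
    """Shrink n0 when the sample is a large share of a finite population."""
    return n0 / (1 + (n0 - 1) / population)


for p in (0.50, 0.75):  # prevalence range quoted in the question
    n0 = n_for_proportion(p, d=0.05)  # d = 0.05 is an assumed precision
    n_fpc = finite_population_correction(n0, population=450)  # assumed N
    print(p, math.ceil(n0), math.ceil(n_fpc))
```

Using p = 0.50 gives the largest n0 for a given precision, so it is the conservative choice when the true prevalence is uncertain.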

Pundits say 70:30, but that changes with neural networks. Your sample size is very small, so you could use 80:20. Since this is biopsy data you can go with that, but a separate validation set is widely used nowadays, so a 60:40 split with 10% held out for validation would set the right tone.