I observed several models trained to perform robotic grasping tasks and noticed that the success rate varied by up to 7% from one training session to another. The validation performed during each training session may also affect the resulting performance. This suggests that training is neither stable nor efficient, which is why a Deep-RL model may need several training sessions before it reaches its best performance. Moreover, when the models from those training sessions are tested in different scenarios, their success rates vary as well.
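One way to make this session-to-session variability concrete is to summarize the success rates of independently trained sessions by their mean, standard deviation, and max-min range. The sketch below is a minimal illustration, not the method used in the observed experiments. It assumes that each session's success rate is the fraction of successful grasp attempts on a fixed evaluation set, and the function names `success_rate` and `run_spread` are hypothetical. The same summary can be applied to a single trained model evaluated across different test scenarios.

```python
from statistics import mean, stdev


def success_rate(outcomes):
    """Fraction of successful grasp attempts (each outcome is True or False)."""
    if not outcomes:
        raise ValueError("need at least one grasp attempt")
    return sum(outcomes) / len(outcomes)


def run_spread(rates):
    """Summarize how much success rates differ across training sessions.

    rates: success rates in [0, 1], one per independently trained session
    (hypothetical input format). Returns the mean, sample standard deviation,
    and max-min range, all expressed in percentage points.
    """
    if len(rates) < 2:
        raise ValueError("need at least two sessions to measure spread")
    pct = [100.0 * r for r in rates]
    return {
        "mean": mean(pct),
        "std": stdev(pct),
        "range": max(pct) - min(pct),
    }
```

Reporting the range alongside the standard deviation matches the way variability is described in the text ("varies by up to X%"), while the standard deviation is less sensitive to a single unusually good or bad session.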