With an ordinary for loop we know exactly where the loop runs, for example from 0 to 100. In OpenCL, how does the kernel know to run the loop for a fixed time when nothing is given in the .cl part? Can anyone help me out?
You can run loops inside the kernel, just like an ordinary for loop, but there is no point (in terms of speedup) in running serial for loops on the GPU.
You would want to make it a parallel for (and wait to sync threads), working on shared memory (called local memory in OpenCL) for best results. That way, the threads in a work-group all access the (small) shared memory and perform their individual operations on it.
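To illustrate the "parallel for" idea, here is a minimal sketch in plain C that only emulates a 1-D kernel launch; it does not use the OpenCL runtime. In real OpenCL the loop count is not written in the .cl file: the host passes a global work size to clEnqueueNDRangeKernel, and each work-item reads its own index with get_global_id(0). The names square_work_item and run_ndrange are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body: one call handles one work-item.
   In OpenCL C, gid would come from get_global_id(0) instead of a parameter. */
void square_work_item(size_t gid, const float *in, float *out)
{
    out[gid] = in[gid] * in[gid];
}

/* Emulates what the runtime does with the global work size chosen on the
   host: the kernel body is invoked once per index in [0, global_size).
   This emulation is serial; on a GPU the invocations run in parallel. */
void run_ndrange(size_t global_size, const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        square_work_item(gid, in, out);
}
```

Calling run_ndrange(100, in, out) plays the role of a for loop from 0 to 99, which is why the kernel itself needs no loop bounds. This sketch does not cover local memory or barriers.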
The question seems a bit vague. Why do you want to run a loop for a fixed time, and what is happening during that time? You can run a loop as long as you need. If you are using a loop to implement a delay, the key question is why you need a delay. Or do you need to run a loop that does not exceed some time constraint? That requires knowing something about the architecture of the machine. Note that machines with multilevel caches and deep instruction pipelines can exhibit timing differences of a factor of 20 due to normal variation in cache behavior, particularly when interrupts disrupt the cache.

So there is no nice general answer to a question this vague; you have to say what you are trying to accomplish and what machine you are trying to accomplish it on. Some machines have very high-resolution timers available. If "physical time" is the constraint, which is different from counting instruction cycles as you can on simple sequential microcontrollers, you might need to measure real time rather than rely on loop counts and simple addition of instruction times. The architecture of the target matters.
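If the constraint really is physical time, one possible approach is to check a timer inside the loop instead of counting iterations. The sketch below is host-side C, assuming a C11 library that provides timespec_get; now_ns and run_for_budget are hypothetical names. TIME_UTC is a wall-clock base that can be adjusted while the program runs, so a monotonic, higher-resolution timer would be a better choice where the platform offers one.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current time in nanoseconds, read from the C11 TIME_UTC clock. */
int64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Repeats a unit of work until budget_ns of real time has elapsed.
   Returns how many iterations fit; the count varies from run to run
   because of caches, pipelines, and interrupts. */
uint64_t run_for_budget(int64_t budget_ns)
{
    assert(budget_ns > 0);
    const int64_t deadline = now_ns() + budget_ns;
    volatile uint64_t sink = 0;  /* volatile keeps the work from being optimized away */
    uint64_t iterations = 0;
    while (now_ns() < deadline) {
        sink += iterations;      /* placeholder for the real work */
        ++iterations;
    }
    return iterations;
}
```

For example, run_for_budget(2000000) loops for roughly two milliseconds. The iteration count it returns is not reproducible, which is the point made above about architecture-dependent timing.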