I think the end of a time slice and an interrupt (other than the timer interrupt that ends the slice) are two different cases. The context switch that occurs in both cases depends heavily on the underlying hardware (CPU): the number of registers to save and restore, and so on. At the end of a time slice, the cost of getting the next task running (for its time quantum) depends on the cost of the scheduling algorithm as well, not only on the cost of the context switch. Many modern CPUs can save or restore most of their registers with a single instruction or a very short instruction sequence, which makes it difficult to optimize this part at the OS level.
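To make the distinction concrete, here is a rough cost model, only a sketch. The names (`SwitchCost`, `timeslice_overhead`, `interrupt_cost_us`) and the microsecond figures are made up for illustration, and I am assuming that an interrupt only pays the scheduler cost when it actually causes a reschedule.

```python
from dataclasses import dataclass


@dataclass
class SwitchCost:
    """Hypothetical per-event costs, in microseconds."""
    save_restore_us: float  # hardware-dependent: saving/restoring registers
    scheduler_us: float     # OS-dependent: picking the next task to run


def timeslice_overhead(cost: SwitchCost, quantum_us: float) -> float:
    """Fraction of time lost per quantum when a task is preempted at slice end.

    Both the register save/restore and the scheduling decision are paid.
    """
    overhead = cost.save_restore_us + cost.scheduler_us
    return overhead / (quantum_us + overhead)


def interrupt_cost_us(cost: SwitchCost, reschedule: bool) -> float:
    """Cost of handling one interrupt, excluding the handler body itself.

    Assumption: the scheduler only runs if the interrupt triggers a reschedule.
    """
    total = cost.save_restore_us
    if reschedule:
        total += cost.scheduler_us
    return total


if __name__ == "__main__":
    cost = SwitchCost(save_restore_us=2.0, scheduler_us=3.0)  # made-up numbers
    print(timeslice_overhead(cost, quantum_us=95.0))
    print(interrupt_cost_us(cost, reschedule=False))
    print(interrupt_cost_us(cost, reschedule=True))
```

In this model only `scheduler_us` is under the OS's control; `save_restore_us` is fixed by the CPU, which is the part that is hard to optimize in software.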