If an activation function has a jump discontinuity, then in the training process, can we implement backpropagation to compute the derivatives and update the parameters?
Yes, because what matters isn't the activation function, but the cost function.
It is possible to define the limiting values that enter into backpropagation.
The ``discontinuous activation functions'' one encounters usually aren't arbitrary; they're step functions. Their derivatives are therefore delta functions, which simplifies the calculations considerably.
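To make this concrete, here is a minimal sketch of one common way to put the idea into practice (assuming PyTorch; the class name `StepWithSurrogate`, the Gaussian surrogate, and its width are my own illustrative choices, not something prescribed by the answers above). The forward pass keeps the hard jump of a Heaviside step activation, while the backward pass replaces the delta-function derivative with a smeared-out Gaussian bump, so that backpropagation still delivers finite gradients to upstream parameters:

```python
import torch

class StepWithSurrogate(torch.autograd.Function):
    """Heaviside step activation with a hand-defined backward pass.

    Forward: hard threshold at 0 (discontinuous).
    Backward: instead of the true derivative (a delta function at the
    jump, zero elsewhere), pass through a smooth surrogate gradient so
    upstream parameters still receive a usable learning signal.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Surrogate: a narrow Gaussian bump centred at the jump,
        # i.e. a smeared-out approximation of the delta function.
        sigma = 0.1  # smoothing width; a tunable assumption
        surrogate = torch.exp(-(x / sigma) ** 2) / (sigma * torch.pi ** 0.5)
        return grad_output * surrogate


# Usage: the step appears in the forward pass, but .backward() still
# produces finite gradients with respect to the inputs.
x = torch.randn(4, requires_grad=True)
y = StepWithSurrogate.apply(x).sum()
y.backward()
print(x.grad)
```

This kind of "surrogate gradient" (or straight-through) trick is what is typically done in practice when training networks with step-like activations, for example in spiking or binarized networks.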