Well, maybe I'm not that much of an expert yet and this is just a disquisition, but one idea is that an AI could select activation functions based on known information about the training process. Another direction would be to change the NN structure as we know it. Right now a DNN layer is a sum of multiplications that we then put through an activation function. What if we used multiplications or powers in some layers instead? How would that influence the number of layers needed? Many questions can be raised, but all of them need testing.
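Just to make the "multiplications or powers in a layer" idea concrete, here is a minimal NumPy sketch (names like `product_unit` are made up for illustration, this is not an established layer type): instead of a weighted sum, each input is raised to a learned power and the results are multiplied, which for positive inputs is the same as `exp(w · log(x))`.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_unit(x, w, b):
    # Standard unit: weighted sum of inputs, passed through an activation (ReLU here).
    return np.maximum(0.0, x @ w + b)

def product_unit(x, w, b):
    # Hypothetical multiplicative unit: each input raised to a learned power,
    # then multiplied together. For x > 0 this equals exp(log(x) @ w + b).
    return np.exp(np.log(x) @ w + b)

x = np.abs(rng.normal(size=(4, 3))) + 1e-3  # keep inputs positive for the log
w = rng.normal(size=(3, 2))
b = rng.normal(size=(2,))

print(dense_unit(x, w, b).shape)    # (4, 2)
print(product_unit(x, w, b).shape)  # (4, 2)
```

Both units map a batch of 3 inputs to 2 outputs, so they could in principle be swapped layer by layer; whether that actually reduces the depth needed is exactly the kind of thing that would have to be tested.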