The topic of meta-parameters is very broad; some examples are:
1) Learning rate adaptation - this goes back at least to the work of Barto and Sutton on Delta-Delta and of Jacobs on Delta-Bar-Delta, and continued with Schraudolph's stochastic meta-descent and other variants whose purpose is to look at the error surface and adjust the rate of descent by tweaking the learning rate. Recent work includes Hypergradient Descent by Baydin, Cornish, Rubio, Schmidt, and Wood. (A minimal sketch of this kind of adaptation appears after the list.)
2) Architecture configuration by constructive or evolutionary means - this goes back at least to Ash (Dynamic Node Creation), Fahlman (the cascade-correlation architecture), and Xin Yao. Current work in this area can be seen in papers like "Converting Cascade-Correlation Neural Nets into Probabilistic Generative Models" by Nobandegani and Shultz. (A toy evolutionary search loop is sketched below.)
Other search heuristics such as PSO, simulated annealing, etc. can be put in the second bin. Also note that there are hybrids of these techniques, on either the learning-rate or the architecture side. Still other methods, such as Optimal Brain Damage style pruning of links, should also be considered when doing the review (a crude pruning sketch is included after the examples below).
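To make the first bin concrete, here is a minimal NumPy sketch in the spirit of Jacobs' Delta-Bar-Delta rule: each parameter keeps its own learning rate, which grows additively when the current gradient agrees in sign with an exponential trace of past gradients and shrinks multiplicatively when it disagrees. The function name, hyperparameter values, and the toy quadratic objective are my own illustrative choices, not taken from any of the papers above.

```python
import numpy as np

def delta_bar_delta(grad_fn, theta, steps=200, alpha0=0.05, kappa=0.01, phi=0.5, mu=0.7):
    """Per-parameter learning-rate adaptation in the spirit of Delta-Bar-Delta.
    grad_fn(theta) is assumed to return the gradient of the loss at theta."""
    alphas = np.full_like(theta, alpha0)   # one learning rate per parameter
    bar_delta = np.zeros_like(theta)       # exponential trace of past gradients
    for _ in range(steps):
        delta = grad_fn(theta)
        agree = delta * bar_delta          # positive where signs agree, negative where they flip
        alphas = np.where(agree > 0, alphas + kappa, alphas)  # grow additively on agreement
        alphas = np.where(agree < 0, alphas * phi, alphas)    # shrink multiplicatively on disagreement
        theta = theta - alphas * delta
        bar_delta = (1.0 - mu) * delta + mu * bar_delta
    return theta, alphas

# Toy usage: minimize f(x) = 0.5 * x^T diag(1, 10) x, whose gradient is diag(1, 10) x.
scales = np.array([1.0, 10.0])
theta_opt, rates = delta_bar_delta(lambda th: scales * th, np.array([5.0, 5.0]))
print(theta_opt, rates)
```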
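For the second bin, a toy illustration of the evolutionary flavour: keep mutating a candidate architecture (here just a list of hidden-layer widths), score each mutant with some validation metric, and keep the best one seen. The fitness function, mutation scheme, and parameter values below are invented placeholders meant only to show the shape of the loop, not any of the specific algorithms cited above.

```python
import random

def evolve_architecture(fitness, generations=20, pop_size=8, seed=0):
    """Toy (1+lambda)-style evolutionary search over hidden-layer widths.
    fitness(widths) is assumed to return a validation score to maximize
    for a network with the given hidden-layer widths."""
    rng = random.Random(seed)

    def mutate(widths):
        widths = list(widths)
        op = rng.choice(["grow", "shrink", "add", "drop"])
        i = rng.randrange(len(widths))
        if op == "grow":
            widths[i] += rng.randint(1, 8)
        elif op == "shrink":
            widths[i] = max(1, widths[i] - rng.randint(1, 8))
        elif op == "add":
            widths.insert(i, rng.randint(4, 32))
        elif op == "drop" and len(widths) > 1:
            widths.pop(i)
        return widths

    best = [16]                        # start from a single small hidden layer
    best_score = fitness(best)
    for _ in range(generations):
        for child in (mutate(best) for _ in range(pop_size)):
            score = fitness(child)
            if score > best_score:     # keep the best architecture seen so far
                best, best_score = child, score
    return best, best_score

# Toy usage: pretend the "validation score" rewards roughly two layers of ~32 units.
def fake_fitness(widths):
    return -abs(len(widths) - 2) - sum(abs(w - 32) for w in widths) / 32.0

print(evolve_architecture(fake_fitness))
```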
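Finally, for the pruning angle, a crude magnitude-based stand-in: Optimal Brain Damage proper ranks weights by a second-order saliency estimate, but the simplest way to illustrate the prune-then-retrain idea is to zero out the smallest-magnitude weights and keep a mask so they stay removed during further training. The names and threshold rule here are my own illustrative choices.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.
    (A crude stand-in for Optimal Brain Damage, which uses a second-order
    saliency estimate rather than raw magnitude to decide what to cut.)"""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    # Keep the mask so pruned links can be held at zero during retraining.
    return weights * mask, mask

# Toy usage: prune half of a random weight matrix.
w = np.random.default_rng(0).normal(size=(4, 4))
pruned, mask = magnitude_prune(w, sparsity=0.5)
print(int(mask.sum()), "of", mask.size, "weights kept")
```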