I came across a method called CAVE (Completely Automated Vowel Extraction). In this method, a hidden Markov model (HMM) speech recognizer was trained with the CMU Sphinx toolkit.
I asked around recently, and several colleagues said the best option at the moment is the University of Pennsylvania aligner, P2FA (https://www.ling.upenn.edu/phonetics/p2fa/). It seems to have been maintained for the last few years, but my colleagues were not sure about its long-term viability. The main issue they saw is that it is built on top of HTK, which hasn't had an update in years and is not likely to get one, given that Kaldi has supplanted HTK as the community's recognizer of choice.

They pointed out that Kaldi could be trained to do phonetic alignment, but that going this route might involve a steep learning curve. For HTK, the thought was that it might be important to retrain the acoustic model on the exact type of speech you are trying to process. Michael Wagner at McGill has made an effort to support this capability; see http://prosodylab.org/tools/

Hope this helps!
-Laura