Please have a look at the papers listed below; they may help with your research.
1. Adiga, M. T., & Bhandarkar, R. (2016, October). Improving single frequency filtering based Voice Activity Detection (VAD) using spectral subtraction based noise cancellation. In 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES) (pp. 18-23). IEEE.
2. Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1-3.
3. Sohn, J., & Sung, W. (1998, May). A voice activity detector employing soft decision based noise spectrum adaptation. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 1, pp. 365-368). IEEE.
It is possible, but the accuracy depends on the particular activity, or set of activities, you want to measure. Some frame activities are preserved in the frequency domain in a form that statistical features can capture, but activities that depend on fine detail often do not survive that way. Choosing the statistical feature is not straightforward either. If you could say more about the particular frame activity or the application, there may be researchers here who have already gone through the trial and error of finding features for that problem.
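To make the idea of a frequency-domain statistical feature concrete, here is a minimal sketch in Python. The three descriptors (energy, spectral flatness, spectral centroid), the function names, and the synthetic test frames are my own illustrative assumptions, not features taken from the papers above, and they may not be the right ones for your activity. It uses a naive DFT from the standard library so it runs without dependencies; for real signals a library FFT such as numpy.fft.rfft would be the usual choice.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def frame_features(frame, eps=1e-12):
    """Return a few statistical descriptors of one frame (illustrative choice)."""
    p = power_spectrum(frame)
    total = sum(p) + eps
    # Spectral flatness: geometric mean over arithmetic mean of the power
    # spectrum. Close to 0 for tonal frames, larger for noise-like frames.
    log_mean = sum(math.log(v + eps) for v in p) / len(p)
    flatness = math.exp(log_mean) / (total / len(p))
    # Spectral centroid, expressed in DFT bins.
    centroid = sum(k * v for k, v in enumerate(p)) / total
    # Mean-square energy of the time-domain frame.
    energy = sum(x * x for x in frame) / len(frame)
    return {"energy": energy, "flatness": flatness, "centroid": centroid}


# Two synthetic frames (assumed test data): a pure tone and white noise.
random.seed(0)
N = 64
tone = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]  # all power in bin 8
noise = [random.gauss(0.0, 1.0) for _ in range(N)]

tone_f = frame_features(tone)
noise_f = frame_features(noise)
print("tone :", tone_f)
print("noise:", noise_f)
```

The tone and the noise frame separate clearly on flatness, which is the kind of coarse activity that survives in the spectrum. Two frames that differ only in fine temporal detail could produce nearly identical values for all three features, which is why the choice of feature depends so much on the activity.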