From what I have seen on the web, autoencoders are mostly used in image or audio applications, but my question is: can we use them for fitting models?
Sure you can. A typical use case might be the following:
Suppose you have a dataset only part of which has been labeled. The labels may be classes or real values (which is exactly what your question is about) or something else. Suppose you have trained a (variational) autoencoder (AE) on the whole dataset. You can always do this, because an AE needs only the objects themselves and doesn't need the target values. You tune the AE hyperparameters to maintain a trade-off between the dimensionality of the embedded representation and the reconstruction quality (you will have to define your reconstruction-quality metric at this point).
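For concreteness, here is a minimal sketch of such an AE in Keras. The layer sizes, `input_dim`, and `latent_dim` are placeholders you would tune against your reconstruction-quality metric, and the random data just stands in for your (mostly unlabeled) dataset:

```python
# Minimal AE sketch; sizes and data are assumptions for illustration.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, latent_dim = 64, 8  # placeholder dimensions to tune

inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(32, activation="relu")(inputs)
encoded = layers.Dense(latent_dim, activation="relu")(encoded)
decoded = layers.Dense(32, activation="relu")(encoded)
decoded = layers.Dense(input_dim, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)  # the encoder part, reused later

# Reconstruction quality here is plain MSE; you may need a different metric.
autoencoder.compile(optimizer="adam", loss="mse")

X_all = np.random.rand(1000, input_dim)  # stands in for the full dataset
autoencoder.fit(X_all, X_all, epochs=50, batch_size=32, verbose=0)
```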
The encoder part of this AE can then be used as a feature extractor and as a tool for nonlinear dimensionality reduction of your dataset.
Using this model, you can apply a supervised learning method, for example some kind of regression model, to the labeled subset of your dataset. The objects fed to this regression model are described by the embedded representations produced by the encoder part of the trained AE.
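Continuing the sketch above, this step might look as follows. `X_labeled` and `y_labeled` are hypothetical labeled data, and Ridge is just one possible regressor:

```python
# Embed the labeled subset with the trained encoder, then fit an
# ordinary regressor on the low-dimensional embeddings.
import numpy as np
from sklearn.linear_model import Ridge

X_labeled = X_all[:200]          # pretend the first 200 rows are labeled
y_labeled = np.random.rand(200)  # placeholder targets

Z_labeled = encoder.predict(X_labeled)  # embedded features
regressor = Ridge(alpha=1.0).fit(Z_labeled, y_labeled)

# At prediction time, new objects go through the same encoder first.
y_pred = regressor.predict(encoder.predict(X_all[200:210]))
```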
In my experience, this works well.
You may also use a transfer-learning approach for this problem (that is, if you would like to confine yourself to neural networks only). There are a number of papers showing that supervised learning quality increases when a preceding unsupervised method (e.g., a VAE) is applied as a feature extractor. With this approach, you train a (V)AE and then use its encoder part to form a new network. The additional layers may constitute a neural classifier (or regressor). The transferred part (the trained encoder) can be "frozen" (its weights fixed). This is the usual starting point for neural-network transfer-learning tricks.
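A sketch of this variant, again continuing the Keras example above (the head architecture is an arbitrary choice for illustration):

```python
# Reuse the trained encoder, freeze its weights, and stack a small
# regression head on top of it.
encoder.trainable = False  # "freeze" the transferred part

head = keras.Sequential([
    encoder,
    layers.Dense(16, activation="relu"),
    layers.Dense(1),  # regression output; a softmax layer for classes
])
head.compile(optimizer="adam", loss="mse")
head.fit(X_labeled, y_labeled, epochs=30, batch_size=32, verbose=0)

# Optionally, unfreeze the encoder afterwards and fine-tune the whole
# network with a small learning rate.
```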