I am trying to find relationships between visual images and their associated one-dimensional time series using deep convolutional neural networks.
Could you give us a bit more insight into your idea? It's difficult to give leads if we don't know what kind of time series you want and what it should represent.
Otherwise, you can start with a basic VGG-like structure and flatten your features to use them as inputs for LSTM (or GRU) layers. But a few more details would help us provide more insightful information.
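As a rough illustration of that CNN-features-into-RNN idea, here is a minimal Keras sketch. The image size (64x64x3), sequence length, and layer sizes are all illustrative assumptions, not values from your project:

```python
from tensorflow.keras import layers, models

def build_cnn_to_rnn(seq_len=100, feat_dim=32):
    # Assumed input: a 64x64 RGB image
    inp = layers.Input(shape=(64, 64, 3))
    # Small VGG-like convolutional stack
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    # Compress to an image embedding, repeat it once per timestep,
    # and let a GRU turn it into a sequence
    x = layers.Dense(feat_dim, activation="relu")(x)
    x = layers.RepeatVector(seq_len)(x)
    x = layers.GRU(64, return_sequences=True)(x)
    out = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    return models.Model(inp, out)

model = build_cnn_to_rnn()
print(model.output_shape)  # (None, 100, 1)
```

`RepeatVector` is one simple way to feed a single image embedding to a recurrent decoder; an attention mechanism would be a more powerful alternative.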
Otherwise, if you plan to do supervised DL, taking your image as input with a "label" corresponding to the 1D time series you have, I think that goal is achievable. With x your input image and y your 1D time series:
- every y should have the same fixed length, scaled between 0 and 1 (or -1 and 1),
- input x to a network (let's say we keep a VGG-like structure) having an output layer with the shape of y.
- if y is scaled to [0, 1], use sigmoid as the activation function for the output layer; use tanh otherwise.
You can then fit your model on the image/signal pairs. I don't know whether this is a good approach: the network has to learn from the image alone while dealing with an "undefined"/"unstable"/"unclear" output, and it may never converge.
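The supervised setup above can be sketched in a few lines of Keras. The image size, signal length, and layer widths are assumptions for the sake of a runnable example:

```python
import numpy as np
from tensorflow.keras import layers, models

SIG_LEN = 128  # assumed fixed length of each 1D series y, scaled to [0, 1]

# VGG-like encoder, ending in a dense output the length of y
inp = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(SIG_LEN, activation="sigmoid")(x)  # tanh if y is in [-1, 1]
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="mse")

# Dummy data just to show the fit call on (image, signal) pairs
x_train = np.random.rand(8, 64, 64, 3).astype("float32")
y_train = np.random.rand(8, SIG_LEN).astype("float32")
model.fit(x_train, y_train, epochs=1, verbose=0)
```

MSE is a natural first loss for a regression target like this, but for signals a correlation- or spectrum-based loss can sometimes work better.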
It is a very raw idea at this point. Imagine images of different structures (faces) and their associated signals (voices); however, the voice is not necessarily determined by the shape of the lips and so on. I found the following, which is very interesting. However, they are generating faces based on the speech; I want to do it the other way around.
Thanks for the details you provided; it's a very interesting project. Maybe you can use a GAN like in this paper: https://arxiv.org/pdf/1903.10195.pdf
It's still generating faces from voices there, but I think you can turn it the other way around: feed images to the generator to create a voice, and concatenate face + voice in the discriminator.
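To make the "reversed" wiring concrete, here is a hedged Keras sketch of that conditional-GAN idea: the generator maps a face image (plus noise) to a 1D voice signal, and the discriminator scores a face/voice pair as real or fake. All shapes and sizes are illustrative assumptions, and the training loop is omitted:

```python
from tensorflow.keras import layers, models

VOICE_LEN, NOISE_DIM = 128, 16

def face_encoder():
    # Small CNN that embeds a face image into a 64-d vector
    inp = layers.Input(shape=(64, 64, 3))
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    return models.Model(inp, layers.Dense(64, activation="relu")(x))

def build_generator(enc):
    # Face embedding + noise -> synthetic 1D voice signal in [-1, 1]
    img = layers.Input(shape=(64, 64, 3))
    z = layers.Input(shape=(NOISE_DIM,))
    h = layers.Concatenate()([enc(img), z])
    h = layers.Dense(128, activation="relu")(h)
    voice = layers.Dense(VOICE_LEN, activation="tanh")(h)
    return models.Model([img, z], voice)

def build_discriminator(enc):
    # Face embedding concatenated with a voice -> real/fake score
    img = layers.Input(shape=(64, 64, 3))
    voice = layers.Input(shape=(VOICE_LEN,))
    h = layers.Concatenate()([enc(img), voice])
    h = layers.Dense(128, activation="relu")(h)
    out = layers.Dense(1, activation="sigmoid")(h)
    return models.Model([img, voice], out)

generator = build_generator(face_encoder())
discriminator = build_discriminator(face_encoder())
```

Here the generator and discriminator each get their own face encoder; whether to share encoder weights between them is a design choice you'd have to experiment with.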
Maybe this will help you: https://www.mathworks.com/help/deeplearning/examples/train-generative-adversarial-network.html
I never used MATLAB, but I heard it's not that different from Julia. Otherwise, Keras is really straightforward in Python; maybe you can do all your preprocessing in MATLAB, just load the inputs into a Keras model, and transfer the output back to MATLAB?
The input is an image and the output would be a sequence of words (a time series in your case). If your output sequence is too long, you probably need a hybrid architecture that translates the LSTM vector output, which is small, into your desired sequence.
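One possible form of that hybrid, offered as an assumption about what is meant: the encoder (CNN or LSTM) produces a short vector, and a stack of 1D transposed convolutions upsamples it to the full-length output sequence. All dimensions below are illustrative:

```python
from tensorflow.keras import layers, models

# Assumed: a 32-d vector coming out of the encoder/LSTM
vec = layers.Input(shape=(32,))
x = layers.Dense(16 * 8, activation="relu")(vec)
x = layers.Reshape((16, 8))(x)  # start from 16 timesteps, 8 channels
# Each Conv1DTranspose with stride 2 doubles the sequence length
x = layers.Conv1DTranspose(8, 4, strides=2, padding="same", activation="relu")(x)   # -> 32
x = layers.Conv1DTranspose(4, 4, strides=2, padding="same", activation="relu")(x)   # -> 64
out = layers.Conv1DTranspose(1, 4, strides=2, padding="same", activation="sigmoid")(x)  # -> 128
decoder = models.Model(vec, out)
print(decoder.output_shape)  # (None, 128, 1)
```

This keeps the recurrent part small while still producing a long output; the alternative is running the LSTM for every output timestep, which gets expensive for long sequences.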