After reading this post: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ I wondered about generating images from a formal description. I have only a basic understanding of deep learning, but it seems that an RNN trained on sequential descriptions of images could output new images.

SVG provides a format that is simpler than an actual programming language, with a smaller alphabet. It seems that one could build training data from a collection of SVG files.
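To make the idea concrete, here is a minimal sketch (hypothetical, with made-up sample strings) of the preprocessing step: treating SVG markup as plain character sequences and encoding them as integers, which is exactly the input a char-level RNN like the one in Karpathy's post consumes.

```python
# Hypothetical sketch: turn SVG markup into the integer character
# sequences a char-level RNN would train on.
svg_samples = [
    '<svg><circle cx="10" cy="10" r="5"/></svg>',
    '<svg><rect x="0" y="0" width="20" height="8"/></svg>',
]

# Build the character vocabulary over the corpus; since SVG uses a
# small alphabet, the model's output layer stays small too.
chars = sorted(set("".join(svg_samples)))
stoi = {c: i for i, c in enumerate(chars)}  # char -> id
itos = {i: c for c, i in stoi.items()}      # id -> char

def encode(s):
    """Map an SVG string to the integer sequence the RNN consumes."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map sampled integer ids back to SVG text."""
    return "".join(itos[i] for i in ids)

# Round-trip check: encoding then decoding recovers the original markup.
example = svg_samples[0]
assert decode(encode(example)) == example
print("vocabulary size:", len(chars))
```

At sampling time, the trained model would emit one character id at a time, and `decode` would turn the sequence back into (hopefully well-formed) SVG that a renderer can display.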

I am genuinely curious whether this is reasonable, or whether there is a strong reason why it wouldn't work.
