There are two, possibly three, books to read if you want the full picture. Given your background in machine vision, you could get away with just the one thick one and skip the foundations, but that might come back to haunt you later.
To start from the beginning and get some statistical foundations, begin with "An Introduction to Statistical Learning" (ISL) by James, Witten, Hastie, and Tibshirani. It's nice because the PDF can be downloaded free from the authors' website, and you can take a free online class by the authors that lets you code along in R/RStudio. All free. The class is pitched so that undergrads can follow it, which makes it a good way to get the big picture of the key concepts, and probably the best way to get a sense of just how important (and tricky) it can be to achieve proper generalization in learning.
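If you want a quick taste of that generalization problem before picking up ISL, here's a toy sketch (mine, not from the book) in plain NumPy: as the polynomial degree grows, training error keeps falling while error on held-out points gets worse.

```python
# Toy illustration of overfitting: fit polynomials of increasing degree
# to noisy samples of a sine wave, and compare training vs held-out error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

train = np.ones(x.size, dtype=bool)      # hold out every third point
train[1::3] = False
test = ~train

for degree in (1, 3, 10):
    # (NumPy may warn that the degree-10 fit is poorly conditioned;
    # that's part of the point.)
    coeffs = np.polyfit(x[train], y[train], degree)  # fit on training data only
    pred = np.polyval(coeffs, x)
    mse_train = np.mean((pred[train] - y[train]) ** 2)
    mse_test = np.mean((pred[test] - y[test]) ** 2)
    print(f"degree {degree:2d}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")

# Typically the degree-10 fit wins on training error but loses on test
# error: it has started to memorize the noise instead of the signal.
```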
When you're done with that, a more in-depth reference on the same material is "The Elements of Statistical Learning" (ESL) by Hastie, Tibshirani, and Friedman, also free for download. You don't need to read it cover-to-cover; it's more a supplement that takes the topics of the first book much deeper and connects them to the next book.
Then proceed to "Deep Learning" by Goodfellow et al.: it's a thick book, very well written, and a comprehensive view of ML from a computer science and vector calculus perspective. They assume you are a comp sci major, or at least can deal with a lot of programming, understand complexity and parallelism, etc. The good thing in your case is that the authors do a lot of vision work, which also happens to work well as an example for concepts such as manifold learning. It begins with a 175-page intro that presumes you know basic linear algebra and introductory calculus; the material works from there to extend those ideas to the key concepts of differential calculus on tensors, which sounds more intimidating than it is.
The explanations are good; there's just a lot of material.
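To see why "differential calculus on tensors" is less scary than it sounds, here's a minimal sketch (my example, not the book's) in plain NumPy: the derivative of a scalar function with respect to a matrix is just the array of ordinary partial derivatives, and you can check a closed-form gradient against finite differences.

```python
# For f(W) = ||W x||^2, the "matrix derivative" df/dW is 2 (W x) x^T.
# Verify the closed form numerically, one partial derivative at a time.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))              # a small weight matrix
x = rng.normal(size=4)                   # a fixed input vector

def f(W):                                # scalar function of a matrix
    return np.sum((W @ x) ** 2)          # f(W) = ||W x||^2

analytic = 2 * np.outer(W @ x, x)        # closed form, shape (3, 4)

numeric = np.zeros_like(W)               # finite-difference check
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        dW = np.zeros_like(W)
        dW[i, j] = eps
        numeric[i, j] = (f(W + dW) - f(W - dW)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True: gradients agree
```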
They use TensorFlow as the basic platform for advanced work. That's probably the best platform to use, because it's supported by Google, free, and works well with Python/Anaconda. Training is at heart an optimization problem (convex for many classical models, generally non-convex for deep networks), so the key value provided by libraries like TensorFlow is automatic differentiation of your network code: the library computes the gradients needed to find good weights via high-dimensional gradient-descent methods.
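As a minimal sketch of what that automatic differentiation looks like (assuming TensorFlow 2.x, whose eager API postdates the book but implements the same idea):

```python
# The library computes d(loss)/d(weights) for you, so an optimizer can
# follow the gradient downhill. Requires TensorFlow 2.x.
import tensorflow as tf

w = tf.Variable([1.0, -2.0])            # two toy "weights"
x = tf.constant([3.0, 4.0])             # fixed input data

with tf.GradientTape() as tape:         # records ops for differentiation
    loss = tf.reduce_sum((w * x) ** 2)  # a scalar loss built from w

grad = tape.gradient(loss, w)           # d(loss)/dw = 2 * w * x**2
print(grad.numpy())                     # -> [ 18. -64.]
```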
The book is free to read online at http://www.deeplearningbook.org/ but you can't download it. TensorFlow etc. are free.
Aside from the learning and network concepts themselves, tools like TensorFlow also require a shift in perspective: you are not directly writing the learning code; rather, you are writing code which specifies the network architecture, the learning calculations, and the methodology used to feed the system training data during the learning process. Together, these are compiled by TensorFlow into a "computational graph" structure. It's a kind of "meta-programming," which is good in that it saves you a lot of complexity, but it is counterintuitive if you haven't done that sort of thing before.
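Here's a hedged sketch of that style, again assuming TensorFlow 2.x, where @tf.function does the graph compilation: you write ordinary Python describing the architecture, the loss, and the update rule, and TensorFlow traces it into a computational graph. The names (train_step, lr) are mine, chosen for illustration.

```python
import tensorflow as tf

w = tf.Variable(0.0)                     # weights of a tiny model y = w*x + b
b = tf.Variable(0.0)
lr = 0.1                                 # learning rate for gradient descent

@tf.function                             # compiles this function into a graph
def train_step(x, y):
    with tf.GradientTape() as tape:      # record ops for autodiff
        y_pred = w * x + b               # the "network architecture"
        loss = tf.reduce_mean((y - y_pred) ** 2)
    dw, db = tape.gradient(loss, [w, b]) # the "learning calculations"
    w.assign_sub(lr * dw)                # gradient-descent weight updates
    b.assign_sub(lr * db)
    return loss

# "Feeding the system with training data": noisy-free samples of y = 2x + 1.
xs = tf.constant([0.0, 1.0, 2.0, 3.0])
ys = tf.constant([1.0, 3.0, 5.0, 7.0])
for _ in range(200):
    loss = train_step(xs, ys)
print(w.numpy(), b.numpy(), loss.numpy())  # w -> ~2, b -> ~1, loss -> ~0
```

The counterintuitive part: after the first call, train_step is no longer executed line-by-line as Python; TensorFlow runs the compiled graph it traced.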
BTW, in the US, the print/e-book of Deep Learning costs about $60, but it's worth it compared to 99% of textbooks today.
A lot of people like Andrew Ng's Coursera course on basic machine learning. It's more a competitor to ISL than to the Deep Learning book. But Ng's course depends on using MATLAB or Octave, and on figuring out efficient ways to do matrix calculations in MATLAB; I personally find that it requires a large amount of programming and debugging relative to the depth of what the student learns. I think they also charge money for the class now, though I could be wrong. In contrast, the Stanford ISL class teaches a slightly simpler version of the same material for free, and uses R/RStudio, with less emphasis on how to multiply arrays and more on illustrating concepts such as how to avoid overfitting and underfitting models. Also, dealing directly with matrix calculations is something you explicitly don't do in deep learning frameworks such as TensorFlow.
Overall, it's a big project. I recommend you start with ISL even if it seems overly simplistic, because there's kind of an 80/20 rule at work here: the simpler material covers most of what actually matters in practice.
I have taken the "Machine Learning" course on Coursera by Andrew Ng; it is a really good course with a lot of programming assignments. Now I am looking for "Deep Learning" courses.