Deep learning has produced exceptional results in many fields, but it is not THE absolute solution for everything. In my experience, traditional ML can perform quite well if you have structured data. However, I will try to answer your question from a different perspective. 'Accuracy' is a tempting measure in any problem, and we all want it to be as high as possible, but it is only one aspect of an algorithm. Space complexity, time complexity, and the availability of resources (memory, computational power, latency, etc.) also matter if you want to deploy an algorithm in real-world applications. Traditional ML algorithms are still popular and might be your only option if you are working with very limited resources (think of single-board computers).
In my limited experience, I have observed that deep learning is effective in applications with high computational resources and a large training data set. With a smaller data set, conventional ML algorithms such as SVM and KNN usually suffice.
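To make the small-data point concrete, here is a minimal sketch (the dataset and scikit-learn models are my own choices for illustration, not something from the question): SVM and k-NN baselines train in seconds on a few hundred tabular samples and are often already competitive.

```python
# Minimal sketch, assuming scikit-learn is available; the bundled
# breast-cancer dataset (~570 samples, 30 features) stands in for
# "a smaller data set".
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("kNN", KNeighborsClassifier(n_neighbors=5))]:
    # Scale features first; both methods are distance/margin based.
    scores = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```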
As successful as deep learning, or artificial neural network (ANN) training as it was known before the hype, has been in recent years, it is not a panacea for machine learning. Instead, it is a particular discriminative machine learning approach that has been especially successful on certain kinds of tasks, namely those with huge quantities of usually high-dimensional data with rich underlying structure. To suggest that ANNs are universally dominant even among discriminative machine learning tasks is somewhat preposterous: they are often, if not usually, inferior to Gaussian processes on low-dimensional, small-data tasks, and still arguably trail behind decision tree ensemble approaches for “out-of-the-box” usage on, for example, generic classification tasks. Perhaps more significantly, many problems call for a generative (e.g. Bayesian), rather than discriminative, approach, particularly when data is scarce or significant prior information is available. Interestingly, however, this is perhaps where a lot of the successes of ANNs actually originate. Compared to, say, random forests, ANNs constitute a very flexible and composable framework for discriminative machine learning, and their structuring perhaps provides a means of indirectly imposing prior knowledge. A major weakness of declarative generative approaches is that we almost always have to impose more assumptions on the model than we would like; all models will inevitably be misspecified. With ANNs, on the other hand, when there is no abundance of training data, one often struggles to impose enough assumptions, sometimes resorting, for example, to generating synthetic data to train the network.
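As a rough illustration of the low-dimensional, small-data point (the toy data, kernel, and network size below are my assumptions, not part of the original answer), a Gaussian process can be compared against a small MLP on thirty noisy 1-D samples:

```python
# Hedged sketch: GP regression vs a small MLP on very little data.
# The GP also returns predictive uncertainty, which the MLP does not.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000).fit(X, y)

X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
y_true = np.sin(X_test).ravel()
gp_mean, gp_std = gp.predict(X_test, return_std=True)

print("GP  test MSE:", np.mean((gp_mean - y_true) ** 2))
print("MLP test MSE:", np.mean((mlp.predict(X_test) - y_true) ** 2))
```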
Dear all, I hereby invite anyone who would like to co-author a textbook on Big Data Analytics, specifically to author one chapter on Deep Learning (other machine learning techniques are already being covered in a different chapter by another co-author).
As Tom pointed out, DL needs large volumes of training data. DL has been performing extremely well on multimedia data, e.g. for object recognition or natural language processing tasks. Whether DL or shallow learners should be used depends on the problem at hand. It is often good practice to train a simple learner, such as a Bayesian model, as a benchmark. Bayesian models are simple and converge fast, which makes them a good starting point. If a simpler learner performs as well as a more sophisticated one such as a convolutional neural network, the simpler one should be preferred (Occam's razor).
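A minimal version of that benchmarking habit might look like the following sketch (assuming scikit-learn; the digits dataset is just a stand-in for whatever data you actually have):

```python
# Sketch of "train a simple Bayesian benchmark first".
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

baseline = GaussianNB()  # fast to train, essentially no hyperparameters
print("Naive Bayes CV accuracy:",
      cross_val_score(baseline, X, y, cv=5).mean())
# Only if a heavier model (e.g. a CNN) clearly beats this number is the
# extra complexity justified (Occam's razor).
```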
In a range of cases DL algorithms are equivalent to "classical" ones. For example, linear regression can be expressed as an artificial neural network, and the two will be equivalent. The difference is in how they are solved: the ANN (usually) via some kind of gradient descent, and linear regression via, for example, an SVD or something similar. And please note: since they are equivalent here, there is no need for a huge dataset or a high-dimensional feature space.
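A small sketch of that equivalence (the toy data and learning rate are assumptions of mine): the closed-form least-squares fit and a gradient-descent fit of the same linear model, i.e. a one-layer linear "network", arrive at essentially the same weights.

```python
# Same linear model, two solvers: least squares (SVD) vs gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0 + 0.01 * rng.normal(size=200)

Xb = np.hstack([X, np.ones((200, 1))])      # append a bias column

# "Classical" solution via least squares.
w_ls, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Same model trained by plain gradient descent on the squared error.
w_gd = np.zeros(4)
for _ in range(5000):
    grad = 2 * Xb.T @ (Xb @ w_gd - y) / len(y)
    w_gd -= 0.1 * grad

print("least squares:   ", np.round(w_ls, 3))
print("gradient descent:", np.round(w_gd, 3))
```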
Almost the same holds for SVM vs ANN. One can construct an ANN that is equivalent to an SVM. The difference is that in a range of problems the SVM can be solved (optimized) much faster than the equivalent ANN, so hyperparameter optimization will take less time.
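As a rough illustration (model and data choices are mine, not a definitive benchmark), a dedicated linear-SVM solver can be set against the same hinge-loss model fitted by stochastic gradient descent:

```python
# Linear SVM via a dedicated solver vs the same hinge-loss objective via SGD.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

svm = LinearSVC(C=1.0).fit(X, y)                         # dedicated SVM solver
sgd = SGDClassifier(loss="hinge", alpha=1e-4).fit(X, y)  # gradient-descent fit

print("LinearSVC accuracy:  ", svm.score(X, y))
print("SGD (hinge) accuracy:", sgd.score(X, y))
```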
Yes, a range of ANNs has demonstrated spectacular results in a set of cases, but generally they are just multidimensional (in terms of feature space) approximators of a nonlinear decision boundary, and they have correlated weaknesses. For example, decision trees and their compositions (e.g. gradient boosting over DTs) handle categorical features in a more natural way than any ANN. That doesn't mean GBT will beat an ANN in every problem with a lot of categorical features, but it does make the choice easier when the feature space contains categorical features.
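For illustration, a hedged sketch with scikit-learn's HistGradientBoostingClassifier (the toy columns and target are invented for the example) shows the trees taking ordinal category codes directly, where a network would need one-hot encoding or embeddings:

```python
# Gradient-boosted trees with declared categorical columns.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
city = rng.integers(0, 3, size=n)     # 3 categories, ordinal-encoded
plan = rng.integers(0, 2, size=n)     # 2 categories, ordinal-encoded
usage = rng.normal(size=n)            # one numeric feature
X = np.column_stack([city, plan, usage])
y = (plan + (usage > 0)) > 1          # toy target mixing both feature types

# Columns 0 and 1 are treated as categories: the trees split on category
# subsets directly, no one-hot encoding or embedding layer required.
clf = HistGradientBoostingClassifier(categorical_features=[0, 1])
print("CV accuracy:", cross_val_score(clf, X, y, cv=3).mean())
```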