It depends on the type of application and data you are considering. If a large amount of properly labelled raw data is available and you don't want to spend time on feature extraction, convolutional neural networks (the deep learning mentioned above) are a solution with good empirical results. However, SVMs are still their main competitor. On the other hand, if you seek methods with a deep theoretical foundation, probabilistic models could be the choice.
I agree with Davide. Deep learning techniques have become more and more popular over the last decade (since 2006). However, some aspects of classical machine learning are still hot research topics. For instance, automatic kernel selection in SVMs according to the data is a current topic that interests many.
It always depends on your problem. If you do not have the hardware resources and you are in a rush, do not even think about deep learning: you will need hours of computation and large datasets.
An SVM is a nice classifier and can be combined with the kernel trick. However, in doing so, you cannot track or understand the higher-dimensional feature space into which you are projecting the data.
That is why boosting methods such as AdaBoost (with decision trees as weak learners) and RandomForest still allow some understanding of your initial features and of the decisions made by the algorithm.
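Purely to illustrate the contrast above, here is a minimal sketch (assuming scikit-learn and a synthetic toy dataset): the RBF-kernel SVM gives no direct per-feature view of its implicit feature space, while the tree ensembles expose feature importances.

```python
# Toy sketch contrasting the two points above (scikit-learn assumed; the
# dataset is synthetic and purely illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Kernel SVM: the RBF kernel implicitly projects the data into a
# higher-dimensional space, so there is no per-feature view of the decision.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("SVM training accuracy:", svm.score(X, y))

# Tree ensembles: feature_importances_ gives some insight into which of the
# original features the model relies on. AdaBoost's default weak learner is
# a depth-1 decision tree (a stump).
for name, model in [("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=0)),
                    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=0))]:
    model.fit(X, y)
    print(name, "feature importances:", model.feature_importances_.round(3))
```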
Basically, you have two main choices: use ML as a black box to get the best results, or work on the whole classification pipeline (feature detection, feature selection, feature classification) and apply what makes the most sense for your problem.
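If you go the second route, a small sketch of such a pipeline might look like this (again assuming scikit-learn; the selector and classifier are chosen only as examples):

```python
# Minimal sketch of working on the whole pipeline: explicit feature
# selection followed by classification (scikit-learn assumed; components
# chosen only for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),  # keep the 5 best-scoring features
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```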
The VEGGIE-CS project's graph grammar induction process extends SubdueGL's frequent pattern discovery from context-free (C-F) solutions to general context-sensitive (C-S) solutions within an induction step. While the base discovery method used to locate high-frequency sub-graphs within the data is not technically a "new" algorithm, capturing the contextual relationship (overlap) between discovered sub-graphs is a significant extension.
The C-S solution removes the arbitrary selection of one among several identical sub-graphs that overlap (i.e., share nodes). In doing so, the process captures the complete composition of such "islands" of sub-graphs, including grammar rules that expose the contextual relations within the overlapping regions, thereby mining the context. In the C-F solution, by contrast, arbitrary selection leaves unselected sub-graph remnants that may influence later induction iterations with undesirable results. In multidimensional induction, a C-F solution may have profound unforeseen effects on the induced language, whereas linear induction can easily manage string overlap.
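As a toy illustration of the overlap being mined (plain Python with hypothetical node identifiers; not the actual VEGGIE-CS or SubdueGL code):

```python
# Two instances of the same frequent sub-graph pattern, represented here
# simply by the sets of nodes they cover (hypothetical identifiers).
instance_a = {"n1", "n2", "n3"}
instance_b = {"n3", "n4", "n5"}

overlap = instance_a & instance_b   # shared nodes: the context a C-S rule captures
island = instance_a | instance_b    # the combined "island" of overlapping sub-graphs

# A C-F induction would arbitrarily keep one instance and leave the rest as
# remnants; a C-S induction keeps the whole island and also emits rules for
# the overlapping region.
print("overlap:", overlap)   # {'n3'}
print("island:", island)     # {'n1', 'n2', 'n3', 'n4', 'n5'}
```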
In frequent pattern discovery, the prioritization, classification, and clustering are generally automated and find natural induction solutions (a black-box solution). However, induction can be directed with preexisting rules and positive/negative example graphs, resulting in the feature detection and selection the user desires. Given that pattern recognition is the primary goal of frequent pattern discovery, the goals of neural networks (the basis of deep learning) are a natural fit for graph grammar induction. As usual, we categorize data by parsing against the language: success or failure. When parsing fails, a parsing fitness metric can further indicate the data's "closeness" to the language, analogous to finding the angle between two vectors.
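For the vector analogy at the end, cosine similarity is the usual formalization; a minimal sketch follows (NumPy assumed; the parse-fitness metric itself is not shown here):

```python
# Cosine of the angle between two vectors: 1.0 means identical direction,
# values near 0 mean nearly orthogonal. A parsing fitness metric plays an
# analogous role, scoring how "close" a graph is to the induced language.
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```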