It seems that using machine/deep learning to solve PDEs is very popular at the moment (not only in scientific computing, but in many other fields as well). So I would like to know the reasons behind this. And is the prospect cheerful?
I think the applicability of deep learning/AI/neural networks is being overhyped these days, and the main reason behind this hype is the availability of funding for research on these technologies. Everybody, irrespective of their field of research, wants to grab the fruit before it disappears. Many people I have seen (in engineering) are applying these technologies without thinking much about them. They are doing it simply because it is easier to get research grants this way than with conventional research methodologies.
The success of grant proposals these days is largely decided by fancy buzzwords. The more fancy words you put in your proposal, the higher the chances of success. Gone are the days when research panels evaluated proposals based on the quality of the methodology and applications rather than on how fancy they sound.
Solving PDEs can be shown to be equivalent to solving large sparse systems of equations on either regular or irregular grids/manifolds. Currently, very popular and computationally efficient approaches to such problems are multiresolution solvers (GMG, AMG, multiresolution wavelets). Now, some strong interconnections between machine learning (in particular kernel-based learning machines) and multiresolution solvers exist and have been highlighted in the past.
Machine/deep learning is becoming popular because it has recently become feasible on regular computers. In neuroscience it has been proposed as a model of how the brain works, but the proposal has little support. It has also been proposed as a way to bring new interpretations/explanations for particular phenomena, but this suggestion again has little support. It is a new technology that sheds little light on how the real brain works, so the "prospect" of using it in neuroscience does not seem particularly "cheerful".
Machine learning techniques are primarily optimization techniques well suited to problems involving strong nonlinearities. But they do not do everything on their own. I quote here an excerpt from an answer to a related question on ResearchGate:
" [Hierarchical] graphical models, machine learning, finite element methods and signal[/data] processing[/analysis] are gradually merging, jointly leading to better understanding and modeling of the world. The theoretical challenge is the continuous integration of increasing dimensions (multivariate data) in the processes while mastering/reducing the computational costs. " (excerpt from https://www.researchgate.net/post/What_do_you_think_are_the_hottest_issues_in_the_Industry_40_field_in_terms_of_theoretical_depth)
I agree with most of the comments about AI/ML/DL being the new "hot topic" and possibly being overhyped. ML/DL is basically a scheme for problems where a given set of inputs (often complicated or extensive) can be used to infer a particular conclusion. For example, a set of photo pixels corresponds to a dog (of a particular breed, if the system is good). The system needs to be trained by presenting a large number of input data sets and the corresponding inferences. The training part is what takes most of the computing time, but the parameters determined by the training can then be reused to make quick inferences about new, similar sets of input data.
There are some evolving fields where this has some promise. For example, weather forecasting is traditionally a matter of solving the PDEs of fluid dynamics: reliable, deterministic, and time-consuming. Alternatively, train a DL network to make an inference about the weather from a collected set of inputs (temperatures, air pressures, wind speeds, ...), somewhat imitating what experienced farmers have been doing for centuries. Similarly, people have discussed training networks to map genome anomalies to known diseases. In general, the scheme is to have a pattern of input data that reliably leads to an inference, train a neural network to make that inference, and then use the trained network to make inferences about new input data sets, as sketched below.
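To make the "train once, infer cheaply" idea concrete, here is a minimal sketch using a generic regressor on made-up weather-like features. The feature names, the synthetic target relation, and the network size are purely illustrative assumptions, not a real forecasting model.

```python
# Minimal sketch of the surrogate idea: expensive training, cheap inference.
# Features and target below are synthetic stand-ins for historical observations.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical inputs: temperature [C], pressure [hPa], wind speed [m/s]
X = rng.uniform([-10.0, 980.0, 0.0], [35.0, 1040.0, 25.0], size=(5000, 3))

# Hypothetical target: rainfall over the next 24 h [mm] (a made-up relation)
y = np.maximum(0.0, 0.2 * (1020.0 - X[:, 1]) + 0.1 * X[:, 2] - 0.05 * X[:, 0])
y += rng.normal(0.0, 0.3, size=5000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training is the expensive part...
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

# ...inference on new, similar inputs is then cheap.
print("held-out R^2:", model.score(X_test, y_test))
print("prediction for [20 C, 1005 hPa, 12 m/s]:", model.predict([[20.0, 1005.0, 12.0]]))
```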
I'm confident that there will be spectacular failure examples where the inference is nonsense or even dangerous. But that is not a good reason to discount the field in general.
In a previous discussion I shared our experience with solving the quadratic equation using ML; so far the results are not good. The input data are a, b, c and the output data are the two roots. This is supposed to be much simpler than a PDE or even an ODE. However, it has its pitfalls: the case a = 0, where the equation reduces to a linear one and the standard root formula breaks down because of division by zero; complex roots; and loss of significant digits when b^2 >> 4ac. So, as things stand now, I am very pessimistic about using ML for problems where accuracy is important.
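For illustration, the loss-of-significance pitfall mentioned above is easy to reproduce even without any ML in the loop. The sketch below (an illustration of the numerical issue, not the ML experiment itself) contrasts the textbook quadratic formula with a numerically stable variant; this is the kind of accuracy a learned surrogate would have to match.

```python
# Catastrophic cancellation in the textbook quadratic formula when b^2 >> 4ac.
import math

def roots_naive(a, b, c):
    d = math.sqrt(b * b - 4.0 * a * c)
    return ((-b + d) / (2.0 * a), (-b - d) / (2.0 * a))

def roots_stable(a, b, c):
    # Compute the larger-magnitude root first, then recover the other one from
    # the product of roots c/a; this avoids subtracting nearly equal numbers.
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))
    return (q / a, c / q)

a, b, c = 1.0, 1.0e8, 1.0       # b^2 >> 4ac; the exact small root is about -1e-8
print(roots_naive(a, b, c))     # the small root loses most of its digits
print(roots_stable(a, b, c))    # the small root is recovered accurately
```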
Machine learning (for pattern recognition) as a general technique or 'holy grail' to replace PDE solvers is likely not going to happen, as established numerical methods are designed to be robust, efficient, and accurate, and are well supported by theoretical frameworks. Machine learning brings opportunities for detecting hidden and useful patterns in data sets with high complexity, etc., and serves mostly different purposes than solving PDEs. So machine learning can be seen as another tool in the toolbox that complements the already successful and widely used existing ones.
Use of machine learning techniques for solving PDEs is popular for many reasons. For example, Sirignano and Spiliopoulos used deep neural networks to solve high-dimensional PDEs in finance. Using conventional numerical methods for these PDEs is computationally very expensive or intractable because of the curse of dimensionality. I have also used deep neural networks to solve high-dimensional random PDEs. Conventional methods for solving random PDEs have many shortcomings (stochastic collocation methods suffer from the curse of dimensionality, stochastic Galerkin methods are intrusive and very cumbersome to implement, and Monte Carlo methods are often computationally very expensive as they require a very large number of samples). The method we have proposed does not suffer from the curse of dimensionality, requires minimal problem-dependent setup (compared to the stochastic Galerkin method), and parallelizes well on GPUs.
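For illustration, here is a minimal sketch of the collocation-style, residual-minimization training that this family of methods (e.g., DGM and physics-informed networks) relies on. The toy problem, network size, and training settings are assumptions for demonstration only, not the implementation from the papers cited below.

```python
# Minimal sketch: train a network so that the PDE residual vanishes at random
# collocation points. Toy problem: u''(x) = -pi^2 sin(pi x) on (0, 1),
# u(0) = u(1) = 0, whose exact solution is u(x) = sin(pi x).
import math
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    # Random interior collocation points -- no labelled dataset is involved.
    x = torch.rand(128, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + math.pi ** 2 * torch.sin(math.pi * x)

    # Boundary conditions u(0) = u(1) = 0 enforced by a quadratic penalty.
    xb = torch.tensor([[0.0], [1.0]])
    loss = (residual ** 2).mean() + (net(xb) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

# The exact solution gives u(0.5) = 1, which the trained network should approach.
print(net(torch.tensor([[0.5]])).item())
```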
Arjun R, no, that is certainly not the reason. Also, please note that most of the recent machine learning methods for solving PDEs are unsupervised, meaning that we do not use any dataset to train the models.
Mikhail Shashkov could you please share your code with us? Also, are you using single precision or double precision?
References
Sirignano, J. and Spiliopoulos, K., 2018. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375, pp.1339-1364.
Nabian, M.A. and Meidani, H., 2018. A Deep Neural Network Surrogate for High-Dimensional Random Partial Differential Equations. arXiv preprint arXiv:1806.02957.
Regarding the latest trends in solving PDEs by combining multiresolution analysis (MRA) with adaptive (i.e., with machine learning inside) spectral graph wavelets, follow https://www.researchgate.net/post/What_is_the_fastest_way_of_solving_a_complex_PDE_equation
Everything is in the title (or how to bridge PDEs and machine learning-derived Gaussian Processes as solvers):
Eigel et al., " Variational Monte Carlo — Bridging concepts of machine learning and high dimensional partial differential equations ", 2018 - http://www.wias-berlin.de/preprint/2544/wias_preprints_2544.pdf
In the above paper, both low-rank hierarchical tensor networks and deep neural networks are addressed. Moreover, to help choose between these two modeling tools depending on the application requirements, I recommend the following publication:
Bachmayr et al., " Parametric PDEs: Sparse or low-rank approximations? ", 2017 - https://arxiv.org/pdf/1607.04444.pdf
This is not popular, but it has potential applicability in particular contexts (as mentioned by Stéphane Breton). In my opinion, the big field of applicability is the case of parametrized PDEs (combined with model order reduction techniques). Another case of applicability is for example the learning of control rules (still needing model order reduction for dimensionality reduction). Another comment is that there are techniques like DMD (Dynamic Mode Decomposition, Kutz, Brunton et al.) that are very close to machine learning for PDEs, and can be interpreted as "grey-box" ANNs.
To bounce back on Florian De Vuyst's post, DMD is actually very close to machine learning for PDEs. Here is an excerpt from Kutz et al., " Multi-Resolution Dynamic Mode Decomposition ", 2015 - https://arxiv.org/pdf/1506.00564.pdf:
"...The DMD method approximates the modes of the so-called Koopman operator. The Koopman operator is a linear, infinite-dimensional operator that represents nonlinear, possibly infinitedimensional, dynamics without linearization [...], and it is the adjoint of the Perron-Frobenius operator. The method can be viewed as computing, from the experimental data, the eigenvalues and eigenvectors (low-dimensional modes) of a linear model that approximates the underlying dynamics, even if the dynamics are nonlinear. Since the model is assumed to be linear, the decomposition gives the growth rates and frequencies associated with each mode. If the underlying model is linear, then the DMD method recovers the leading eigenvalues and eigenvectors normally computed using standard solution methods for linear differential equations... "
To bounce back on Allan Peter Engsig-Karup's post, whereas machine learning may help to solve high-dimensional and highly complex PDEs, it may reciprocally be of great interest to use established ODE solvers to help machine learning, as demonstrated by Chen et al., "Neural Ordinary Differential Equations", 2019 - https://arxiv.org/pdf/1806.07366v4.pdf
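The core idea in that paper is to treat the hidden state of a network as the solution of an ODE, dh/dt = f(h, t; theta), and to let a standard ODE solver act as the forward pass. Below is a minimal, forward-only sketch of that idea with a hand-rolled RK4 integrator and random weights; a real neural ODE would of course train theta by backpropagating through (or around) the solver, which this sketch does not do.

```python
# Forward pass of a toy "neural ODE": integrate dh/dt = f(h, t; W, b) with RK4.
import numpy as np

rng = np.random.default_rng(0)
W, b = 0.1 * rng.standard_normal((8, 8)), np.zeros(8)   # random, untrained weights

def f(h, t):
    # The continuous-time "layer": a vector field parameterized by (W, b).
    return np.tanh(W @ h + b)

def rk4_solve(h0, t0, t1, n_steps=50):
    h, t = h0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        k1 = f(h, t)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = f(h + dt * k3, t + dt)
        h = h + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t = t + dt
    return h

h0 = rng.standard_normal(8)      # "input features"
h1 = rk4_solve(h0, 0.0, 1.0)     # the forward pass is the ODE solve itself
print(h1)
```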
I thought it's worth adding something I noticed recently.
Neural networks for PDEs can be reformulated as collocation least-squares: the minimization problem for the loss function of the ANN is nothing but a least-squares problem. So the concept is not entirely new, and the disadvantages associated with least-squares methods, such as (i) difficulty in imposing the boundary conditions and (ii) ill-conditioned matrices (along with the difficulties in solving them with iterative solvers), are also present in ML schemes for PDEs.
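To make the equivalence explicit, the loss minimized in these schemes can generically be written (the notation here is illustrative) as a penalized least-squares collocation functional, for a PDE \mathcal{N}[u] = f on a domain \Omega with boundary condition \mathcal{B}[u] = g on \partial\Omega:

```latex
\min_{\theta} \;
\sum_{i=1}^{N_{\mathrm{int}}} \big| \mathcal{N}[u_\theta](x_i) - f(x_i) \big|^2
\;+\; \lambda \sum_{j=1}^{N_{\mathrm{bdy}}} \big| \mathcal{B}[u_\theta](x_j) - g(x_j) \big|^2
```

The only structural difference from classical least-squares collocation is that u_theta is a nonlinear parameterization (a neural network) rather than a linear combination of basis functions, so the conditioning and boundary-condition issues mentioned above carry over.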
To bounce back on Allan Peter Engsig-Karup's post, one may consider that, in some sense, machine learning and PDE solvers have fused through formulations based on simplicial cell complexes.
Now, given these points, the aim of ML is to learn patterns in the data. ML therefore lends itself as a promising candidate for two reasons: 1. closed-form/analytical solutions are not always feasible, and 2. decision-space partitioning. On the second point, PDEs essentially arise as solutions to optimal space-partitioning problems, as shown in the PhD thesis attached here. PDEs are therefore seen as promising candidates for decision-space design and partitioning in ML.
To follow up on Enoch A-iyeh's post, natural and intrinsic patterns in data can alternatively be revealed by identifying their topological critical points. In particular, this may be performed by unsupervised learning using Radial Basis Functions (RBFs), or alternatively by more conventional topological analysis methods.
Stéphane Breton I agree with your post. Further to that, here is a project touching upon it: https://www.researchgate.net/project/Finite-Element-Modelling-Quality-and-Analysis-in-Machine-Learning-and-Pattern-Recognition
I believe there is a two-fold motivation: (1) the need to conceptualize processes for which there is no clear understanding of the physics, and (2) the need to generate much faster and more practical versions of repetitive simulations that may facilitate automation. The first aims at physics discovery, the second at creating digital twin models.