Bayesian networks, Markov networks, and factor graphs are all graphical models, which are one approach to uncertainty based on probability theory. Nowadays graphical models are almost synonymous with uncertainty in AI, but there are other approaches.
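To make the shared idea concrete, here is a minimal sketch (with made-up probabilities for a hypothetical binary chain A -> B -> C) of the factored joint distribution a Bayesian network encodes, plus inference by brute-force enumeration:

```python
# A hypothetical three-node chain A -> B -> C over binary variables.
# All numbers are made up for illustration.
P_A = {True: 0.3, False: 0.7}            # P(A)
P_B_given_A = {True: 0.8, False: 0.1}    # P(B=True | A)
P_C_given_B = {True: 0.9, False: 0.2}    # P(C=True | B)

def joint(a, b, c):
    """The network encodes the factorization P(A,B,C) = P(A) P(B|A) P(C|B)."""
    pa = P_A[a]
    pb = P_B_given_A[a] if b else 1.0 - P_B_given_A[a]
    pc = P_C_given_B[b] if c else 1.0 - P_C_given_B[b]
    return pa * pb * pc

# Inference by enumeration: P(C=True | A=True).
num = sum(joint(True, b, True) for b in (True, False))
den = sum(joint(True, b, c) for b in (True, False) for c in (True, False))
print(num / den)  # 0.76
```

Markov networks and factor graphs specify the factorization differently (undirected potentials, explicit factor nodes), but they share this same idea of decomposing a joint distribution into local factors.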
Neural networks (and their current incarnation, deep learning) are one of them. They are not necessarily graphical models, but they can sometimes be seen as instances of them.
Circumscription and other sorts of non-monotonic reasoning (in logic) are non-quantitative approaches to uncertainty. They were very prominent in the 80s and 90s; I don't know the current state of that area.
There are also credal networks, which, unlike standard graphical models, avoid the commitment to modeling a single probability distribution, a commitment that some people consider a weakness of graphical models.
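For intuition, a minimal sketch under strong simplifying assumptions (a single binary parent with an interval-valued prior, made-up numbers): instead of asserting one prior such as P(A=True) = 0.3, we only assert P(A=True) lies in [0.2, 0.4], and the child's marginal becomes an interval, computed here by evaluating the endpoints (valid because the marginal is linear in the prior):

```python
# Credal sketch: P(A=True) is only known to lie in [0.2, 0.4] (made-up bounds).
# P(B=True | A) is fixed. The marginal P(B=True) is then interval-valued,
# obtained by evaluating at the extreme points of the credal set.
P_B_given_A = {True: 0.9, False: 0.3}

def p_b(p_a):
    return p_a * P_B_given_A[True] + (1.0 - p_a) * P_B_given_A[False]

bounds = [p_b(p_a) for p_a in (0.2, 0.4)]
print(min(bounds), max(bounds))  # P(B=True) in [0.42, 0.54]
```

Real credal networks generalize this to sets of distributions at every node, so inference yields lower and upper probabilities rather than point values.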
The thing I don't like about Bayesian approaches is the need for prior probability distributions. In practice, measurement error statistics and process noise statistics are rarely Gaussian (as assumed in the Kalman filter, for instance), and if they are, then the parameters are usually unknown and time-varying.
Bain and Engelhardt sum up the situation nicely, I think:
Prior information “constitutes an additional assumption that may be helpful or harmful depending on its correctness” [1].
[1] L. J. Bain and M. Engelhardt, Introduction to Probability and Mathematical Statistics (Second Edition), Duxbury Press, California, 1992.
When you have no prior knowledge of the unknown parameters to be estimated (in the form of a mean vector and a covariance matrix, for instance), they can still be estimated using standard statistical techniques, e.g. linear regression (see [1] in my earlier answer).
For example, in a Kalman filter (a Bayesian technique) you would assume that you know the sensor measurement noise is Gaussian with zero mean and variance R. But in most problems you could instead use linear regression: fit a polynomial over a finite or fading window, then estimate the measurement noise variance from the residuals of the fit, as in the sketch below. In the latter case you still assume the measurement errors are zero-mean Gaussian, but the variance is unknown.
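Here is a minimal sketch of that residual-based approach (the function name, window size, and polynomial degree are my own illustrative choices, not from [1]), using numpy's standard polyfit/polyval routines:

```python
import numpy as np

def estimate_r(measurements, window=50, degree=2):
    """Estimate the measurement noise variance R from the residuals of a
    polynomial fit over the most recent `window` samples.
    Assumes zero-mean errors; only the variance is treated as unknown."""
    y = np.asarray(measurements[-window:], dtype=float)
    t = np.arange(len(y), dtype=float)
    coeffs = np.polyfit(t, y, degree)      # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, t)
    dof = len(y) - (degree + 1)            # degrees of freedom of the fit
    return residuals @ residuals / dof     # unbiased variance estimate

# Synthetic check: a smooth signal plus Gaussian noise with known variance.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
z = 5.0 * t**2 + rng.normal(0.0, 0.5, size=t.shape)  # true R = 0.25
print(estimate_r(z))  # should come out close to 0.25
```

The estimated R can then be fed into the Kalman filter's measurement update in place of a hand-picked constant, and re-estimated over a sliding window if the noise statistics drift over time.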