Likelihood and information criteria such as the Akaike Information Criterion (AIC) are commonly used in statistical modeling to compare models and select the one that best fits the data. However, computing the likelihood and AIC for artificial neural network (ANN) models is not as straightforward as it is for traditional statistical models.
In ANNs, the model parameters are optimized to minimize a cost function, such as mean squared error or cross-entropy loss, using an optimization algorithm such as gradient descent. Unlike traditional statistical models, where the likelihood follows directly from an assumed probability density function, an ANN does not define a likelihood explicitly: one must first give the network's output a probabilistic interpretation.
However, several techniques can be used to estimate the likelihood and information criteria for ANN models:
Maximum Likelihood Estimation (MLE)
MLE is a commonly used technique in statistics that estimates the parameters of a probability distribution by maximizing the likelihood of the observed data. In ANNs, MLE can be applied by assuming that the network's output parameterizes a known probability distribution, such as a Gaussian (for regression) or a Bernoulli (for binary classification).
To use MLE, one computes the log-likelihood of the observed data given the model parameters by evaluating the assumed probability density (or mass) function of the network's output at the observed data points. In fact, minimizing mean squared error is equivalent to maximizing a Gaussian log-likelihood, and minimizing cross-entropy is equivalent to maximizing a Bernoulli (or categorical) log-likelihood; evaluating the density only becomes computationally expensive when the assumed output distribution itself is complex.
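As a minimal sketch, assume the network's outputs are the means of a Gaussian with a shared variance, and plug the variance's maximum-likelihood estimate (the mean squared residual) back into the density; y_true and y_pred below are hypothetical placeholders for the observed targets and the network's predictions:

```python
import numpy as np

def gaussian_log_likelihood(y_true, y_pred):
    """Log-likelihood of the targets under y ~ N(y_pred, sigma^2),
    with sigma^2 set to its MLE, the mean squared residual."""
    n = len(y_true)
    sigma2 = np.mean((y_true - y_pred) ** 2)  # MLE of the shared variance
    # At the variance MLE, log L = -n/2 * (log(2*pi*sigma2) + 1)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
```

Because sigma2 here is just the mean squared error, maximizing this log-likelihood is equivalent to minimizing MSE, which is why MSE training can be read as Gaussian maximum likelihood.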
Information criteria
Information criteria such as AIC and the Bayesian Information Criterion (BIC) are commonly used in statistics to compare models based on their goodness of fit and complexity. In ANNs, they can be used to compare different network architectures or to select the best model from a candidate set.
To compute an information criterion for an ANN model, one needs the likelihood of the model given the observed data and the number of parameters in the model. One practical way to estimate the likelihood is with a hold-out (or cross-validation) scheme: split the data into training and validation sets, fit the model on the training portion, and evaluate the log-likelihood on the validation set.
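For a binary classifier whose sigmoid output is read as a Bernoulli probability, the held-out log-likelihood is simply the negative of the summed binary cross-entropy. A sketch, where y_val and p_val are hypothetical validation labels and predicted probabilities:

```python
import numpy as np

def bernoulli_log_likelihood(y_val, p_val, eps=1e-12):
    """Log-likelihood of binary labels under y ~ Bernoulli(p_val).
    Probabilities are clipped to avoid log(0) on saturated outputs."""
    p = np.clip(p_val, eps, 1.0 - eps)
    return float(np.sum(y_val * np.log(p) + (1 - y_val) * np.log(1 - p)))
```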
Once the log-likelihood is available, the AIC or BIC is obtained by adding a penalty term that grows with the number of parameters: AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n is the number of observations, and L is the maximized likelihood. The penalty discourages overfitting and favors a simpler model that is more likely to generalize well to new data.
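Putting the pieces together, the criteria themselves are one-liners. The sketch below assumes k is taken to be the count of trainable weights and biases, which is a common but debatable choice for ANNs, since regularization reduces the effective number of parameters:

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: BIC = k*ln(n) - 2*ln(L)."""
    return k * math.log(n) - 2 * log_likelihood
```

Lower values are better: when comparing candidate architectures, compute each model's log-likelihood on the same data and prefer the model with the smallest AIC or BIC.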
In summary, computing the likelihood and information criteria for ANN models is not as straightforward as it is for traditional statistical models. However, by attaching an explicit output distribution to the network, maximum likelihood estimation can supply a log-likelihood, and information criteria such as AIC and BIC can then be used to compare models.