In any deep learning procedure, we effectively find the weights by some kind of minimization process. This process can get stuck in a local minimum, so the weights it returns may not be the best we could find. How is that avoided?
Hello. This is a good question. My understanding of deep learning is that it is a very large non-linear parameter estimation problem with certain structural constraints in terms of the connectivity of the neurons from one layer to the next, as well as non-negativity constraints on the activations enforced via rectified linear units (ReLUs). Gradient descent is used to optimise the network. As far as I know, there are no convergence guarantees and no convexity results for this type of optimisation problem. Therefore only local convergence is to be expected.
How can it be avoided? This would be a good research topic. I would start with a simple problem that you can analyse fully, train a small deep network on it, and see which data and which initialisation points in the parameter space lead to which convergence behaviours.
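As a starting point for that kind of experiment, here is a minimal sketch (plain Python/NumPy, no deep learning framework) of the simplest possible case: gradient descent on a one-dimensional non-convex function with two local minima. The function, step size, and starting points are arbitrary illustrative choices; the point is only that different initialisation points converge to different minima with different objective values.

```python
# Gradient descent on f(w) = w**4 - 3*w**2 + w, a toy non-convex function
# with two local minima. Different starting points converge to different
# minima, which is exactly the "only local convergence" behaviour above.
import numpy as np

def f(w):
    return w**4 - 3 * w**2 + w

def grad_f(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=1000):
    w = w0
    for _ in range(steps):
        w -= lr * grad_f(w)   # plain gradient descent update
    return w

for w0 in (-2.0, 0.0, 2.0):   # three different initialisation points
    w_star = gradient_descent(w0)
    print(f"start {w0:+.1f} -> converged to w = {w_star:+.4f}, f(w) = {f(w_star):+.4f}")
```

Running this, the starts at -2.0 and 0.0 end up in the deeper minimum near w ≈ -1.30, while the start at +2.0 gets trapped in the shallower one near w ≈ 1.13.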
A good technique is to force an arbitrary, major change to some of the coefficients and then let the optimisation continue to a new convergence. The most effective changes are the ones applied to the coefficients of bias nodes that should be close to zero.
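A rough, self-contained sketch of that perturb-and-continue idea on a toy objective might look like the following. Here a single coefficient stands in for the network weights, and the function, perturbation scale, and acceptance rule are illustrative choices rather than a standard recipe.

```python
# Perturb-and-continue on a toy non-convex objective: after the first
# convergence, force a large random change to the coefficient, let gradient
# descent re-converge, and keep the result only if the objective improved.
import numpy as np

def f(w):                                  # simple non-convex objective, two local minima
    return w**4 - 3 * w**2 + w

def grad_f(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=1000):
    w = w0
    for _ in range(steps):
        w -= lr * grad_f(w)
    return w

rng = np.random.default_rng(0)
best_w = gradient_descent(2.0)             # first convergence (a local minimum)
best_f = f(best_w)

for _ in range(5):                         # force a few arbitrary, major changes to the coefficient
    w_new = gradient_descent(best_w + rng.normal(scale=2.0))   # re-converge from the perturbed point
    if f(w_new) < best_f:                  # keep the change only if the objective improved
        best_w, best_f = w_new, f(w_new)

print(f"best coefficient found: w = {best_w:+.4f}, f(w) = {best_f:+.4f}")
```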
One thing about deep learning is that there is a very large number of parameters to tune, which means the loss function lives in a very high-dimensional parameter space. In Andrew Ng's Coursera Deep Learning Specialization, he mentions that in such high-dimensional spaces, precisely because there are so many parameters, we encounter saddle points far more often than local minima, and the optimizers we use today are quite capable of navigating through saddle points. So, in practical cases, you will most likely not get stuck at a local minimum.
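The saddle-point part of that claim can at least be illustrated on a toy example. The sketch below (plain NumPy; the function f(x, y) = x^2 - y^2, the step size, and the momentum coefficient are arbitrary choices) starts almost exactly on a saddle point and counts how many steps plain gradient descent versus gradient descent with heavy-ball momentum need to escape.

```python
# Escaping a saddle point of f(x, y) = x**2 - y**2 (saddle at the origin):
# plain gradient descent started almost exactly on the saddle leaves it very
# slowly, while the same descent with heavy-ball momentum escapes much faster.
import numpy as np

def grad(w):                       # gradient of f(x, y) = x**2 - y**2
    x, y = w
    return np.array([2 * x, -2 * y])

def steps_to_escape(use_momentum, lr=0.01, beta=0.9, max_steps=100_000):
    w = np.array([1.0, 1e-6])      # start a tiny distance off the saddle direction
    v = np.zeros(2)
    for step in range(1, max_steps + 1):
        g = grad(w)
        if use_momentum:
            v = beta * v + g       # heavy-ball momentum
            w = w - lr * v
        else:
            w = w - lr * g         # plain gradient descent
        if abs(w[1]) > 1.0:        # |y| > 1 means we have left the saddle region
            return step
    return max_steps

print("plain GD escapes after", steps_to_escape(False), "steps")
print("momentum escapes after", steps_to_escape(True), "steps")
```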
However, for real high-dimensional networks this is largely intuition and is hard to verify rigorously. It is a nice theoretical question to ask and, I think, a good research topic. One way to guard against poor local minima is to run your algorithm from multiple starting points, or to slightly perturb some of the coefficients after the algorithm has converged and check whether it converges to the same point again.
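As a concrete illustration of the multiple-starting-points idea, here is a small sketch that trains the same tiny one-hidden-layer network from several random initialisations and keeps the run with the lowest final loss. Everything here (the architecture, the sin-curve toy data, and the hyperparameters) is an arbitrary choice made only to keep the example short; it is a sketch of the idea, not a recipe.

```python
# Multi-start training: fit a tiny one-hidden-layer tanh network to toy 1-D
# regression data from several random initialisations and keep the best run.
import numpy as np

X = np.linspace(-3, 3, 64).reshape(-1, 1)      # toy 1-D regression data
Y = np.sin(X)

def train(seed, hidden=8, lr=0.05, steps=2000):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0, size=(1, hidden))   # random starting point
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)                   # forward pass
        P = H @ W2 + b2
        E = P - Y
        loss = np.mean(E**2)
        # backward pass: gradients of the mean-squared error
        dP = 2 * E / len(X)
        dW2, db2 = H.T @ dP, dP.sum(axis=0)
        dH = dP @ W2.T * (1 - H**2)                # tanh'(z) = 1 - tanh(z)**2
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1             # full-batch gradient descent
        W2 -= lr * dW2; b2 -= lr * db2
    return loss

losses = {seed: train(seed) for seed in range(5)}  # five different starting points
best = min(losses, key=losses.get)
print("final loss per seed:", {s: round(float(l), 4) for s, l in losses.items()})
print("keep the run from seed", best)
```

The same loop also covers the perturb-after-convergence variant: instead of drawing a fresh random initialisation, add noise to the best weights found so far and continue training from there.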