Without knowing the nature of the input data, I can only offer these generic suggestions. This behaviour can occur when the training and validation sets are not properly partitioned or not randomized. Several factors could be at play here. My suggestion is first to explore the dataset properly to understand the stratifications that exist in the data. Then, tune the sizes of the training and validation sets when testing the model. Neural networks should be used cautiously, since they can be over-trained or under-trained unintentionally when no attention is paid to the training set size. The other parameters that should be investigated are the momentum and the learning rate: if the learning rate is too high, that alone can degrade the performance of the model.
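To make the "explore the stratifications" advice concrete, here is a minimal sketch of a stratified train/validation split in plain Python (the function name and the toy labels are my own, not from the question); each class contributes the same fraction of its samples to the validation set:

```python
import random
from collections import defaultdict

def stratified_split(X, y, val_frac=0.2, seed=0):
    """Split indices so each class keeps roughly the same proportion in train/val."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, val_idx = [], []
    for label, idxs in by_class.items():
        rng.shuffle(idxs)
        n_val = max(1, int(round(len(idxs) * val_frac)))
        val_idx.extend(idxs[:n_val])      # held-out validation indices
        train_idx.extend(idxs[n_val:])    # remaining indices for training
    return train_idx, val_idx

# Toy imbalanced labels: 8 samples of class 0, 2 of class 1
y = [0] * 8 + [1] * 2
train_idx, val_idx = stratified_split(list(range(10)), y)
```

In practice you would use a library routine (e.g. scikit-learn's `train_test_split` with its `stratify` argument) rather than rolling your own, but the idea is the same.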
As you said, my target values are not balanced, so I am dealing with somewhat imbalanced data, but K-Fold cross-validation works quite sensibly and well on this dataset. The issue only arises when I use the plain data-splitting method.
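If K-Fold behaves well but a single split does not, a quick diagnostic is to compare the label frequencies on each side of the split; a sketch (the helper name and the toy labels are illustrative):

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in counts}

# Hypothetical split where the validation set is far more balanced than training
y_train = [0] * 70 + [1] * 10
y_val = [0] * 10 + [1] * 10
print(class_proportions(y_train))  # {0: 0.875, 1: 0.125}
print(class_proportions(y_val))    # {0: 0.5, 1: 0.5}
```

A large mismatch like this means the split was not stratified, and validation accuracy will not reflect training performance.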
The result you are getting in the first epochs is not a valid accuracy. Your dataset may not be carefully organized, so the model is getting odd inputs and producing random results.
In my opinion this is because the loss function, which is generally not convex for neural networks, reached a sub-optimal local minimum within a convergence region. Subsequently, after 100 epochs the loss function escaped that region and moved to a region with a lower error, which corresponds to an increase in the training accuracy. However, the aforementioned behaviour depends on the particular optimizer used to update the weights, such as stochastic gradient descent, RMSProp, Adam, and so on. Additionally, it strongly depends on the learning rate value. In fact, it is possible to achieve similar results earlier (in fewer epochs) by slightly increasing the learning rate. Note that selecting too high a value for the learning rate can cause the loss function to overshoot the minimum.
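The effect of the learning rate on convergence and overshoot can be seen on a toy convex problem (minimising f(x) = x², whose gradient is 2x); this is a sketch of my own, not the asker's model:

```python
def gradient_descent(lr, steps=100, x0=5.0):
    """Minimise f(x) = x^2 with fixed-step gradient descent; gradient is 2x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # each step multiplies x by (1 - 2*lr)
    return x

small = gradient_descent(lr=0.01)               # converges, but slowly
large = gradient_descent(lr=0.9)                # oscillates yet still converges
diverging = gradient_descent(lr=1.1, steps=20)  # |1 - 2*lr| > 1: overshoots and diverges
```

With `lr=0.01` the iterate is still far from zero after 100 steps, with `lr=0.9` it has essentially reached the minimum, and with `lr=1.1` each step overshoots so badly that the iterate grows without bound; the same trade-off governs neural-network training, just without the clean closed-form analysis.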
Moreover, I think that accuracy is a misleading metric for classification tasks, especially on imbalanced datasets, because you may run into the accuracy paradox: the accuracy value is high even though the classification model performs poorly. In that case, the model has learned that, in order to minimise the loss function, it should classify every record as belonging to the over-represented class. In such cases, in order to get a clear view of how the model is performing, I suggest that you:
Compute and interpret the confusion matrix;
Plot the precision/recall curve;
Plot the ROC curve;
Compute the F1 score.
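The accuracy paradox and the value of these extra metrics can be demonstrated with a few lines of plain Python (the helper function and the toy data are mine, for illustration); in practice you would use `sklearn.metrics`:

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus precision, recall and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A "majority class" model on 90/10 imbalanced data: it predicts 0 for everything
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
m = binary_metrics(y_true, y_pred)  # accuracy is 0.9, yet precision/recall/F1 are all 0
```

The 90% accuracy looks respectable, while the zero F1 score immediately reveals that the model never detects the minority class.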
Finally, one way to deal with imbalanced datasets is to remove the imbalance between classes through Random Under-Sampling or Random Over-Sampling techniques, although both have drawbacks.
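As a minimal sketch of Random Over-Sampling (the function name and toy data are mine; the imbalanced-learn library provides production-ready versions of both techniques), minority-class samples are duplicated at random until all classes have the same size:

```python
import random

def random_over_sample(X, y, seed=0):
    """Duplicate minority-class samples at random until all classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())  # size of the largest class
    X_out, y_out = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

X = list(range(10))
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_over_sample(X, y)  # both classes now have 8 samples
```

The drawback mentioned above is visible here: over-sampling only repeats existing minority examples (risking overfitting to them), while under-sampling would instead throw away majority-class information.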