I believe that improving your data quality and/or your model is the most practical route. I do not know of a method that easily solves the label noise problem in general. One suggestion is to try different activation functions in your model.
A Kalman filter approach, in which every data point contributes with a different weight depending on its standard deviation, might give good results.
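For a static scalar quantity, the Kalman update reduces to precision weighting: each measurement is pulled in with a gain inversely related to its variance, so noisier labels contribute less. A minimal sketch (the function name and the example variances are illustrative, not from any particular library):

```python
def kalman_update(mean, var, z, z_var):
    """Fuse a noisy measurement z (variance z_var) into the
    current estimate (mean, var).  The gain weights the
    measurement inversely to its variance, so noisier labels
    move the estimate less."""
    gain = var / (var + z_var)           # Kalman gain in [0, 1]
    new_mean = mean + gain * (z - mean)  # shift toward the measurement
    new_var = (1.0 - gain) * var         # uncertainty shrinks after fusing
    return new_mean, new_var

# Fuse two labels for the same quantity: a noisy one, then a cleaner one.
mean, var = 0.0, 1e6                              # diffuse prior
mean, var = kalman_update(mean, var, 10.0, 4.0)   # noisy label
mean, var = kalman_update(mean, var, 12.0, 1.0)   # cleaner label
```

After both updates the estimate sits much closer to the low-variance label (roughly 11.6 here), which is exactly the "weight by standard deviation" behaviour described above.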
We can use a loss function that allows the network to abstain on confusing samples during training, thereby continuing to learn from, and improving performance on, the non-abstained samples.
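One way to realise this is to add an extra "abstain" output and down-weight the class loss by the abstention probability, while charging a penalty for abstaining so the network cannot opt out of everything. A hedged NumPy sketch of such a per-sample loss (the function name, the single-sample interface, and the penalty weight `alpha` are illustrative choices):

```python
import numpy as np

def abstaining_loss(logits, label, alpha=1.0):
    """Cross-entropy with an extra 'abstain' output (last logit).
    The class term is scaled by (1 - p_abstain), so the network can
    reduce it on confusing samples by abstaining; alpha penalises
    abstention so it is not free."""
    z = logits - logits.max()            # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    p_abs = p[-1]                        # probability of abstaining
    p_cls = p[:-1] / (1.0 - p_abs)       # renormalised class probabilities
    class_term = (1.0 - p_abs) * -np.log(p_cls[label])
    abstain_penalty = alpha * -np.log(1.0 - p_abs)
    return class_term + abstain_penalty
```

With a small `alpha`, abstaining on a sample the network cannot classify is cheaper than paying the full cross-entropy, which is the mechanism that lets training ignore noisy labels while still fitting the rest.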