While outliers may be discovered and discarded when there is evidence that they are not just tail-of-the-distribution values, one may still have a problem with low data quality. This may be handled in various ways, such as errors-in-variables regression. But there are other ways to treat the estimated residuals (which are basically distance measures from the regression line) so as to reduce the impact of perhaps lower-quality data points.
The mildest adjustment that can be made, and one that is often best to use anyway, is weighted least squares (WLS) regression. It is generally needed simply because members of a population are not all of the same 'size.' (See https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity.) That heteroscedasticity can be modified, even reduced, by other factors, such as data quality being lowest for the smallest members of the sample. (See https://www.researchgate.net/publication/324706010_Nonessential_Heteroscedasticity.)
More extreme measures include least absolute deviation (also called least absolute value) regression; a rough sketch follows.
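In case it helps to see what that looks like, here is a minimal sketch in Python using the QuantReg class in statsmodels (median regression, q = 0.5, is the same as least absolute deviation regression); the data are made up just for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    x = rng.uniform(1, 10, 50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)
    y[0] += 15  # plant one gross outlier

    X = sm.add_constant(x)               # design matrix with intercept
    lad = sm.QuantReg(y, X).fit(q=0.5)   # median regression == LAD
    ols = sm.OLS(y, X).fit()

    print("LAD slope:", lad.params[1])   # much less affected by the outlier
    print("OLS slope:", ols.params[1])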
From what I have read, an M-estimator involves maximum likelihood estimation rather than least squares, though the two appear to be related.
Least absolute deviation regression and the M-estimator (and the R-estimator and S-estimator) are noted in the following:
The M-estimator is not something I have used, but the links I have seen from Pennsylvania State University have generally looked good to me, so I assume that last link may be useful for you. Also, I think that when someone speaks of "robust" regression, they are often referring to the M-estimator.
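For what it's worth, here is a hedged sketch of M-estimation in Python, using the RLM (robust linear model) class in statsmodels with a Huber objective; this is just my own illustration, not anything taken from that link:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, 50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)
    y[0] += 15  # one gross error

    X = sm.add_constant(x)
    # Huber's T is a common M-estimation objective: it acts like least
    # squares for small residuals and like least absolute deviations for
    # large ones, which is what limits an outlier's influence.
    huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
    print(huber.params)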
I suggest you try weighted least squares (WLS) regression first. There are multiple ways to estimate the coefficient of heteroscedasticity. As long as you factor the estimated residuals into a random factor and a nonrandom factor, such that the random factor reasonably passes a graphical residual analysis, that should be helpful. This works for multiple regression the same way it works for simple linear regression. Here are a couple of references:
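To make the idea concrete, here is a minimal sketch in Python with statsmodels. Using the preliminary OLS predictions as the size measure, and estimating the coefficient of heteroscedasticity (gamma) by regressing the log of the absolute residuals on the log of that size measure, is one simple approach I am assuming for illustration, not the only way:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(1, 10, 200)
    # variance grows with size: sigma proportional to x**0.8 here
    y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x**0.8, 200)

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()               # preliminary unweighted fit

    size = ols.fittedvalues                # size measure: predicted y
    log_abs_e = np.log(np.abs(ols.resid))
    gamma = sm.OLS(log_abs_e, sm.add_constant(np.log(size))).fit().params[1]
    print("estimated coefficient of heteroscedasticity:", gamma)

    w = size ** (-2.0 * gamma)             # weights ~ 1/size^(2*gamma)
    wls = sm.WLS(y, X, weights=w).fit()
    print(wls.params)

The graphical residual analysis mentioned above would then be done on the weighted residuals, that is, the residuals multiplied by the square roots of the weights.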
Here's a thought to follow up on the above: I wondered whether anyone has worked on a weighted version of least absolute deviation regression (to account for naturally occurring heteroscedasticity), that is, weighted least absolute deviation regression. A quick internet search suggested the answer may be "yes." Here is a reference for a paper whose abstract is interesting:
M. Norouzirad, S. Hossain & M. Arashi (2018) Shrinkage and penalized estimators in weighted least absolute deviations regression models, Journal of Statistical Computation and Simulation, 88:8, 1557-1575, DOI: 10.1080/00949655.2018.1441415
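I have not tried this myself, but a simple version of weighted least absolute deviations can be written directly as a minimization problem. The following Python sketch (my own illustration, not the estimator from the paper above) uses scipy to minimize the weighted sum of absolute residuals for a straight-line fit; the weights here are an arbitrary choice for demonstration:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    x = rng.uniform(1, 10, 100)
    y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x, 100)  # noise grows with x

    w = 1.0 / x  # illustrative weights: downweight the larger, noisier points

    def weighted_lad_loss(beta):
        # weighted sum of absolute residuals, intercept beta[0], slope beta[1]
        return np.sum(w * np.abs(y - beta[0] - beta[1] * x))

    result = minimize(weighted_lad_loss, x0=np.array([0.0, 0.0]),
                      method="Nelder-Mead")
    print("intercept, slope:", result.x)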
Taking a hint from that abstract, you might also look up lasso and ridge regression. The reference https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b notes that the idea is to avoid overfitting, which appears pertinent here.
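For completeness, here is a brief sketch of ridge and lasso in Python with scikit-learn, in the spirit of that reference; the alpha penalty values are arbitrary placeholders, not recommendations:

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + rng.normal(0, 1, 100)

    # Both penalize large coefficients to discourage overfitting:
    # ridge shrinks them smoothly; lasso can set some exactly to zero.
    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)
    print("ridge:", ridge.coef_)
    print("lasso:", lasso.coef_)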
By the way, the idea of least absolute deviation regression is to avoid giving too much weight to a potential outlier. If you use weighted least squares regression, you can form prediction intervals for the predicted y-values (the analog, for predictions, of confidence intervals). If, for example, you have substantially more than (100 - f) percent of your points outside of f-percent prediction intervals, or say one or two data points far outside any reasonable range, then you may have a problem to address.
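As an illustration of that check, here is a sketch using the wls_prediction_std helper in statsmodels, which, as I understand it, accounts for the regression weights when forming the interval half-widths; 95 percent stands in for the f percent above:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.sandbox.regression.predstd import wls_prediction_std

    rng = np.random.default_rng(4)
    x = rng.uniform(1, 10, 200)
    y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x, 200)  # sigma proportional to x

    X = sm.add_constant(x)
    w = 1.0 / x**2                  # weights ~ 1/variance, variance ~ x^2 here
    res = sm.WLS(y, X, weights=w).fit()

    # 95% prediction intervals at the observed points
    _, lower, upper = wls_prediction_std(res, alpha=0.05)
    outside = np.mean((y < lower) | (y > upper))
    print(f"fraction outside 95% intervals: {outside:.3f}")  # expect about 0.05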
A weighted least squares regression will also help avoid throwing suspicion on larger data points, since the estimated variance of the prediction error should be larger for larger predicted values.
There is a chapter on robust regression, Chapter 19, in Applied Regression Analysis and Generalized Linear Models, 2nd ed., John Fox, Sage, 2008. Fox discusses the fact (not really an assumption, just something that is basically true) that medians are generally less vulnerable to outliers than means.
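A two-line illustration of that point, with made-up numbers:

    import numpy as np
    data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100 is the outlier
    print(np.mean(data), np.median(data))  # mean is dragged up to 22.0; median stays at 3.0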
However, if weighted least squares (WLS) regression will handle what you need to do, then the link below can help you pick a coefficient of heteroscedasticity to use in a regression weight, which you could then enter as "w" in SAS PROC REG (via its WEIGHT statement), or do something similar in whatever software you might have available.
This spreadsheet tool was developed for linear regression and is provided with references:
Note that OLS regression is the special case of WLS regression where the coefficient of heteroscedasticity is zero, yielding equal weights. That is w = 1, though any constant regression weight will reduce WLS to OLS.
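That special case is easy to verify numerically; a quick sketch:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(1, 10, 50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)
    X = sm.add_constant(x)

    ols = sm.OLS(y, X).fit()
    wls = sm.WLS(y, X, weights=np.full(50, 7.0)).fit()  # any constant weight
    print(np.allclose(ols.params, wls.params))          # True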
By the way, if anyone reading this knows how regression weights are entered for use in R, or any other software, in addition to the SAS example I gave above, that might be good for others to know here. Thanks.
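To partially answer my own question, with what I believe to be the case: in R, the lm function takes a weights argument, as in lm(y ~ x, weights = w, data = mydata); in SAS PROC REG it is the WEIGHT statement; and in Python, the statsmodels WLS class takes the weights directly. A minimal sketch of the last of these, with made-up numbers:

    import numpy as np
    import statsmodels.api as sm

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.4, 3.1, 3.4, 4.2, 4.4])
    w = 1.0 / x  # illustrative weights; this plays the role of "w" in SAS PROC REG

    X = sm.add_constant(x)
    res = sm.WLS(y, X, weights=w).fit()
    print(res.params)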