By the law of large numbers and the central limit theorem, the ordinary least squares (OLS) estimators in linear regression will still be approximately normally distributed around the true parameter values in large samples, so the estimated parameters and their confidence intervals remain reliable. Hence, with a large sample, linear regression remains valid even if the dependent variable violates the “normality assumption.”
Ref: Article Are Linear Regression Techniques Appropriate for Analysis Wh...
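To make the large-sample claim concrete, here is a minimal simulation sketch, not taken from the cited article. It assumes a simple one-regressor model with strongly right-skewed (shifted exponential) errors and a normal-approximation 95% interval (z = 1.96). The function names `ols_slope_ci` and `coverage` and all parameter values are hypothetical choices for illustration.

```python
import random
import statistics


def ols_slope_ci(x, y, z=1.96):
    """Fit y = a + b*x by OLS; return (slope, lower, upper) for a z-based CI."""
    n = len(x)
    mx = statistics.fmean(x)
    my = statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares and the usual standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, slope - z * se, slope + z * se


def coverage(n=200, reps=1000, true_slope=2.0, seed=0):
    """Share of simulated samples whose interval contains the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # Assumed error model: exponential(1) shifted to mean zero (skewed).
        y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        _, lower, upper = ols_slope_ci(x, y)
        hits += lower <= true_slope <= upper
    return hits / reps


if __name__ == "__main__":
    print("Empirical coverage of the nominal 95% interval:", coverage())
```

If the large-sample argument holds, the empirical coverage should land near the nominal 95% even though the errors are far from normal. Rerunning with a much smaller `n` is one way to see where the approximation starts to strain.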
Data have no 'normality' requirement for using regression. It is the estimated residuals (or better, in weighted least squares regression, the random factors of the estimated residuals) that would ideally be close enough to normally distributed for the central limit theorem to help when estimating prediction intervals, and that is usually a rather weak requirement.
What is important are the estimated residuals, which are distributed vertically in a residual analysis graph.
Normality can help, but it is not a big requirement, and is for the estimated residuals, not the dependent and independent variables.
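A small sketch may help separate the two ideas. It assumes a skewed regressor (exponential) with normal errors, so the dependent variable is clearly non-normal while the estimated residuals are not. Sample skewness is used here only as a rough stand-in for a residual plot, and `fit_line`, `skewness`, and `simulate` are hypothetical helper names.

```python
import random
import statistics


def fit_line(x, y):
    """Return (intercept, slope) of the OLS line through the points."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def skewness(values):
    """Sample skewness: mean cubed z-score (about 0 for symmetric data)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulate(n=5000, seed=1):
    """Return (skewness of y, skewness of estimated residuals)."""
    rng = random.Random(seed)
    # Assumed setup: skewed regressor, normal errors.
    x = [rng.expovariate(1.0) for _ in range(n)]
    y = [1.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    intercept, slope = fit_line(x, y)
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return skewness(y), skewness(residuals)


if __name__ == "__main__":
    skew_y, skew_res = simulate()
    print("skewness of y:", skew_y)
    print("skewness of estimated residuals:", skew_res)
```

In this setup the dependent variable is strongly skewed while the residuals are roughly symmetric, which is why checking the raw y values for normality says little about whether the regression is appropriate.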
FYI: In a footnote on page 196 of Applied Regression Analysis and Generalized Linear Models, 2nd ed. (Sage Publications, Inc., 2008), John Fox states that if you add the assumption of normality to those of the Gauss-Markov Theorem, which address the "errors," then the least-squares estimator can be shown to be not only the best linear unbiased estimator (BLUE) but also the best of all unbiased estimators. Bonus!
For that, Fox references, as an example, page 319 in C. R. Rao (1973), Linear Statistical Inference and Its Applications, 2nd ed., New York, Wiley.
It may also be good, if you have enough data, to hold out some data that are not used for model selection or for estimating the regression coefficients, and then see how well you would have 'predicted' them. This may help you avoid overfitting your model to a specific data set.
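The hold-out idea above can be sketched as follows. This is a minimal illustration assuming a single regressor, a random 70/30 split, and root-mean-square error as the measure of how well the held-out points were 'predicted'. The names `fit_line` and `holdout_rmse` and the split fraction are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Return (intercept, slope) of the OLS line through the points."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def holdout_rmse(x, y, holdout_frac=0.3, seed=0):
    """Fit on a random training subset; return RMSE on the held-out rest."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_hold = max(1, int(len(idx) * holdout_frac))
    hold, train = idx[:n_hold], idx[n_hold:]
    # Coefficients are estimated from the training points only.
    intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
    # Prediction errors are measured only on points the fit never saw.
    sq_err = [(y[i] - intercept - slope * x[i]) ** 2 for i in hold]
    return statistics.fmean(sq_err) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(400)]
    ys = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in xs]
    print("hold-out RMSE:", holdout_rmse(xs, ys))
```

A hold-out error that is much larger than the error on the training data is one sign that the model has been fitted too closely to that particular data set.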
[Also, you do not want to model data together which actually fall under separate models. Does one model really apply to everything, or are there separate categories to be considered? Dummy variables? Should multilevel modeling be considered?]