In machine learning, gradient descent is used to find the best line matching the training set, but, as far as I know, in statistics specific formulas are used to compute the regression parameters. Is there a difference in the accuracy of these methods?
Let me share my take, and let's discuss this together.
Perhaps the similarity between the methods used in statistical modeling and machine learning makes people think they are the same thing. That is understandable, but it is not actually the case.
The most obvious example is linear regression, which is probably the main source of this misconception. Linear regression is a method with which we can both train a linear regressor (machine learning) and fit a statistical regression model by least squares (statistics).
In this case, what the former does is called "training" the model: it uses only a subset of the data, and the performance of the trained model can be known only after it is evaluated on another subset of the data, the test set. In this example, the ultimate goal of machine learning is to achieve the best performance on the test set.
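This train/test workflow can be sketched in a few lines. The data here are invented for illustration (a linear trend with Gaussian noise); the point is only that the model is fit on the training subset and judged on the held-out test subset:

```python
import numpy as np

# Hypothetical data: a linear trend with noise (values invented for illustration)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3.0 * x - 2.0 + rng.normal(0, 1.0, 100)

# Machine-learning workflow: fit on a training subset, evaluate on a held-out test set
idx = rng.permutation(len(x))
train, test = idx[:80], idx[80:]

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]  # fit on training data only

test_mse = np.mean((X[test] @ beta - y[test]) ** 2)  # performance judged on unseen data
print(f"test MSE: {test_mse:.3f}")
```

Libraries such as scikit-learn wrap this split-fit-score loop in utilities like `train_test_split`, but the logic is the same.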
For the latter, we assume in advance that the data follow a linear relationship with Gaussian noise and then try to find the line that minimizes the mean squared error over all the data. No training or test set is required, and in many cases, especially in research (as in the sensor example below), the purpose of modeling is to describe the relationship between the data and the output variables, rather than to make predictions about future data. We refer to this process as statistical inference, not prediction. Although we can use such a model to make predictions, which may be what you want, the way to evaluate the model is no longer test-set performance but the significance and robustness of the model parameters.
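On the accuracy question from the original post: for linear regression both routes minimize the same mean-squared-error objective, so gradient descent (run to convergence with a suitable learning rate) and the closed-form least-squares solution arrive at essentially the same line. A small sketch on synthetic data (the learning rate and iteration count here are illustrative choices, not universal settings):

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus Gaussian noise (illustrative values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 200)
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

# Closed-form least squares (normal equations): beta = (X^T X)^{-1} X^T y
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same mean-squared-error objective
beta_gd = np.zeros(2)
lr = 0.01
for _ in range(20000):
    grad = 2 / len(y) * X.T @ (X @ beta_gd - y)
    beta_gd -= lr * grad

# Both methods minimize the same convex objective, so they converge to the same line
print(beta_closed)
print(beta_gd)
```

Any remaining difference comes from the optimizer not having fully converged (or from numerical conditioning), not from gradient descent being an inherently less accurate estimator.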
The goal of machine learning (here, specifically supervised learning) is to obtain a model that predicts well on new data. We usually do not care whether the model is interpretable; machine learning cares only about results, much as your value to a company might be measured only by your performance. Statistical modeling, on the other hand, is more about finding relationships between variables and testing the significance of those relationships, which happens also to serve prediction.
Regression analysis is regression analysis either way. However, machine learning can significantly simplify some complex regression problems. Two examples:
1. The Cubist program (called M5 in Weka) divides the data set into subgroups and fits a regression model to each subgroup individually. In my experience this can improve the analysis enormously, with a large jump in R².
2. The Cox proportional hazards model is the standard regression model for survival analysis and for identifying variables with predictive value. Survival trees (a machine-learning method for survival analysis) are almost always superior to Cox regression and easier to interpret.
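The subgroup idea behind Cubist/M5 can be illustrated with a toy example. This is not the Cubist algorithm itself (which learns the splits from the data); here the split point at x = 5 is assumed known, purely to show why fitting separate linear models per subgroup can lift R² when the data are piecewise linear:

```python
import numpy as np

# Piecewise-linear data: two regimes with different slopes (synthetic, for illustration)
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = np.where(x < 5, 1.0 * x, 5.0 + 4.0 * (x - 5)) + rng.normal(0, 0.3, 300)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

X = np.column_stack([np.ones_like(x), x])

# Single global regression over all the data
beta = np.linalg.lstsq(X, y, rcond=None)[0]
r2_global = r2(y, X @ beta)

# Cubist-style idea: split into subgroups and fit a separate model in each
# (the split at x = 5 is assumed known here; Cubist/M5 learns such splits)
pred = np.empty_like(y)
for mask in (x < 5, x >= 5):
    b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    pred[mask] = X[mask] @ b
r2_split = r2(y, pred)

print(f"global R^2: {r2_global:.3f}, per-subgroup R^2: {r2_split:.3f}")
```

The per-subgroup fit captures the two regimes that a single global line cannot, which is the kind of R² improvement described above.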