My question is: if my data are not time series data, is stationarity (or is it not) a relevant concern for fitting a multiple linear regression model? And if so, what is the difference from time series data?
Correct, stationarity is not a relevant concern for fitting a multiple linear regression model when the data are not time series data. Stationarity is a concept that applies specifically to time series data, that is, data whose observations are ordered in time, usually at regular intervals. In time series data, stationarity means that the statistical properties of the series, such as its mean, variance, and autocorrelation, remain constant over time.
Multiple linear regression, on the other hand, is a statistical method used to model the relationship between a dependent variable and two or more independent variables, and it does not require the observations to be ordered in time. In this type of analysis, the relevant assumptions concern the linearity of the relationship and the normality, homoscedasticity, and independence of the errors, rather than stationarity.
Therefore, when fitting a multiple linear regression model to non-time series data, it is important to ensure that these assumptions are met and appropriate diagnostic tests are performed to check the validity of the model. These diagnostic tests may include residual plots, normal probability plots, and tests for multicollinearity and influential observations.
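As a rough illustration of one of the checks mentioned above, here is a minimal sketch of a multicollinearity test for the special case of exactly two predictors, where the variance inflation factor (VIF) reduces to 1 / (1 - r^2), with r the Pearson correlation between the predictors. The function names `pearson_r` and `vif_two_predictors` and the sample values are made up for illustration; with more than two predictors one would instead regress each predictor on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values, invented for this example.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

print("correlation:", pearson_r(x1, x2))
print("VIF:", vif_two_predictors(x1, x2))
```

A VIF of 1 means the predictors are uncorrelated; larger values mean the coefficient variances are inflated by the correlation between predictors. Which threshold counts as "too large" is a rule of thumb, not a fixed standard.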
That's correct. In time series data, stationarity is a desirable property: it means that the statistical properties of the data, such as the mean and variance, do not change over time. Stationarity is important for time series analysis because many statistical techniques and models rely on it, for example autoregressive moving average (ARMA) models. Autoregressive integrated moving average (ARIMA) models extend these to certain nonstationary series by differencing the data until it is stationary.
In the case of multiple linear regression, the data is typically not time series data and is instead cross-sectional data, where each observation represents a distinct individual, group, or event. Stationarity is not a relevant concern in this case because the assumption of stationary data does not apply. Instead, other assumptions, such as linearity, normality, independence, and homoscedasticity of errors, are more relevant for fitting a multiple linear regression model to cross-sectional data.
It is not the fact that the data are time series which is important. What is important is the presence of autocorrelation. For example, if your data describe the composition of the soil along a road, measured every 10 meters, you will also face autocorrelation even though the data are not time series. Methods for handling autocorrelation in regression are well known, but they add complexity and typically require larger samples.
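To make the soil example concrete, here is a minimal sketch of the Durbin-Watson statistic applied to regression residuals that are ordered by position rather than by time. The residual values are invented for illustration, and the function name `durbin_watson` is just a local helper. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, ordered by position along the road (every 10 m).
# Neighbouring points resemble each other: positive autocorrelation.
smooth = [1.0, 0.8, 0.6, 0.2, -0.2, -0.6, -0.8, -1.0]
# Neighbouring points flip sign: negative autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print("smooth:", durbin_watson(smooth))
print("alternating:", durbin_watson(alternating))
```

The only thing the statistic needs is a meaningful ordering of the observations, which is why it applies to spatially ordered data as well as to time series.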
To clarify further, suppose the variable y_t equals α y_{t-1} + ε_t, where ε_t is a pure random error term. The data series is nonstationary if α equals one (a unit root) and stationary if the absolute value of α is less than one; tests for nonstationarity are thus known as unit root tests.
Given this framework, unit root issues are only relevant if the data are time-series in nature or if there is a possible pattern to the cross-sectional correlations.
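The difference between the two cases can be seen by tracking the variance of y_t in the equation above. This is a minimal sketch under two assumptions that are not in the original post: the series starts at a fixed value y_0 (so its initial variance is zero), and the errors ε_t are uncorrelated with constant variance `sigma2`. Under those assumptions, Var(y_t) = α^2 Var(y_{t-1}) + sigma2. The function name `ar1_variance` is hypothetical.

```python
def ar1_variance(alpha, t, sigma2=1.0):
    """Variance of y_t for y_t = alpha * y_{t-1} + e_t, starting from a fixed y_0."""
    var = 0.0  # assumption: y_0 is a known constant, so it has zero variance
    for _ in range(t):
        var = alpha ** 2 * var + sigma2
    return var


for alpha in (0.5, 1.0):
    variances = [round(ar1_variance(alpha, t), 3) for t in (1, 10, 100)]
    print("alpha =", alpha, "variance at t = 1, 10, 100:", variances)
```

With α = 0.5 the variance settles near sigma2 / (1 - α^2), roughly 1.333, so the series has stable statistical properties. With α = 1 the variance equals t times sigma2 and grows without bound, which is the nonstationary (unit root) case.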