I know we have a number of statistical methods for dealing with network data, but I am wondering whether it is meaningful to simply use the adjacency matrix as regressors in a classical regression. If it is, how should we interpret the results?
Not being an expert in network analysis, I would still venture that regression could be helpful, but it would be hard, perhaps impossible, to sort out the relationships and be certain that you really know what is happening. Is that what you are questioning? If so, remember that you are looking at associations, not causality, and, as Theo Dijkstra and others noted in answering the attached question, you will need to theorize a reasonable model and then see whether the data support it.
I have also heard that 'big data' experimentation has shown that simply searching for relationships among data without a starting framework can go wrong; if so, that sounds related, though I am not a 'big data' expert either.
Thanks for your answer, Prof. Knaub. I was wondering whether this could be a convenient way to decide, based on significance, whether to apply further, more sophisticated network analysis.
'Significance' is a relative term. If you are looking at the coefficients in a multiple regression, then the association between a regressor and your y-value may not be very useful unless the coefficient is several times its standard error, and even that is relative among your regressors. There is also masking, collinearity, and other complex relationships among regressors, so it is not really straightforward. As people discussed in answers to the question I linked above, you need a good theoretical reason for a model.
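As a minimal sketch of the "coefficient several times its standard error" idea (entirely synthetic data of my own; note that using all n adjacency columns as regressors would leave no degrees of freedom, so here a simple network summary, node degree, stands in as the regressor):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic example: n nodes, a random adjacency
# matrix A, and a node-level outcome y driven partly by degree.
n = 50
A = (rng.random((n, n)) < 0.15).astype(float)
np.fill_diagonal(A, 0.0)
degree = A.sum(axis=1)                      # simple network summary
y = 2.0 + 0.8 * degree + rng.normal(scale=1.0, size=n)

# OLS of y on an intercept and degree.
X = np.column_stack([np.ones(n), degree])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))
t_ratio = beta / se    # is the coefficient several times its SE?
print(beta, t_ratio)
```

This only measures association in this one dataset; a large ratio here says nothing about causality or about how the model would fare with correlated regressors.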
Visually, I think scatterplots comparing two variables at a time can be helpful, but again, that is just comparing for association in your particular dataset.
Also, a scatterplot for multiple regression in just two dimensions could put y on the y-axis and predicted y on the x-axis.
Further, note heteroscedasticity; it may be important to your modeling.
So, you need theoretical reasons for your model, and then explore your data.
Attached are links to two files, in case they might be of any help: (1) one on problems with interpreting p-values, and (2) another on a simple way to visualize heteroscedasticity when looking at one regressor.
Article Practical Interpretation of Hypothesis Tests - letter to the...
Conference Paper Alternative to the Iterated Reweighted Least Squares Method ...
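As one minimal way to look for heteroscedasticity with a single regressor (a synthetic example of my own, not the specific method in the attached paper): plot the absolute residuals against x and look for a funnel shape.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical single-regressor example in which the error standard
# deviation grows with x, a common source of heteroscedasticity.
n = 200
x = rng.uniform(1.0, 10.0, size=n)
y = 3.0 * x + rng.normal(scale=0.5 * x)   # error sd proportional to x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Visual check: scatter |residual| against x, e.g.
# plt.scatter(x, np.abs(resid)); a widening funnel suggests
# heteroscedasticity. A crude numeric check compares the spread
# of residuals at low x with the spread at high x:
lo = np.abs(resid[x < 5.0]).mean()
hi = np.abs(resid[x >= 5.0]).mean()
print(lo < hi)
```

If the spread clearly grows with x, ordinary least squares may not be appropriate, and weighted approaches are worth considering.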
From a brief look at your thesis on statistical networks, it appears that one cannot be a two-dimensional thinker, like me, and keep this all straight. :-)
It looks a bit like operations research to me, but keep in mind that these are random variables, and the models should not be deterministic.
Considering how closely this relates to multiple regression, the problems there may be as big here, or much bigger. That means realizing you are looking at associations, not causality; OLS may not be appropriate, since heteroscedasticity should be considered in the error structures; a coefficient of variation is much more useful than a p-value; some regressors will mask the impact of others; some relationships may be nonlinear; and with all the data needed, you are bound to have data-quality issues that could easily confuse matters.
On the one hand, many relationships might be found that would otherwise be missed; on the other hand, it would be easy to encounter spurious relationships, which need to be verified, as Theo Dijkstra noted in a similar context when answering the question I linked above.
I have been thinking about your question before trying to offer a suggestion. The answer depends somewhat on what you already know or may have tried. Are you familiar with the idea of spatial autocorrelation (i.e. the possibility that errors in the model are similar in areas that are close to each other)? Assuming yes, there is a variant on that, namely network autocorrelation, where the spatial relationships are defined with respect to a network rather than, say, Euclidean coordinates. For many spatial modelers these are things to be careful with, and they usually indicate that you should make some effort to model the spatial dependence between observations; I would suggest this is true whether the dependence is measured by geometric OR network distance. The idea of SIMPLY using the weights of the adjacency measures as regressors (if that is what you mean) seems far too simplistic. At the very least you should use some distance-weighted combination of the observations from surrounding points on the network. If you want a challenging example to test your modeling skill: how, for example, would you write a model for traffic fatalities on a road network?
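A minimal sketch of that "weighted combination of surrounding observations" idea, under my own assumptions (synthetic undirected network, equal weights among neighbors): row-normalize the adjacency matrix into a weight matrix W, so that W @ x gives each node the average of its neighbors' values, which can then enter a regression in place of raw adjacency columns.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sketch: build a random undirected network, then form
# a row-normalized weight matrix W and the network-lagged covariate
# W @ x (each node's neighbor average), in the spirit of spatial /
# network autocorrelation models.
n = 25
A = (rng.random((n, n)) < 0.2).astype(float)
np.fill_diagonal(A, 0.0)
A = np.maximum(A, A.T)                       # make it undirected

deg = A.sum(axis=1, keepdims=True)
W = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)  # row-normalize

x = rng.normal(size=n)
lag_x = W @ x                                # neighbor average of x

# Each row of W sums to 1 for nodes with at least one neighbor:
print(np.allclose(W.sum(axis=1)[deg.ravel() > 0], 1.0))
```

Here the weights are simply 1/degree for each neighbor; a real analysis would use network distances (or a decay function of them) rather than bare adjacency, and would likely fit a proper spatial-lag or error model rather than plain OLS.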
You might want to check some of the contents of Geographical Analysis; most older papers are available in a free archive. In particular, Prof. Atsu Okabe has a large set of tools for dealing with network models and the correct treatment of their implications for spatial prediction. Simply using connections in OLS is probably not going to be the best way. Please let us know about your next steps and discoveries.