I want to generate some nice prediction plots from my MRQAP model. I've laid out my process below and would be very grateful for anyone's insight, as I'm not seeing much written about this online.

I am building my own regression models on network data in R, using the quadratic assignment procedure with Dekker and colleagues' (2007) double semi-partialling (DSP) method. In other words, I am predicting the weight of an edge from the traits of its respective nodes. This approach uses node permutations of residuals to adjust for the interdependence of observations in the network. (Regression on network data involves severe heteroskedasticity and autocorrelation, because the observations are literally connected.)
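(For context, here's a minimal sketch of an equivalent off-the-shelf fit using sna::netlm, whose nullhyp = "qapspp" option implements Dekker et al.'s semi-partialling; the matrices here are random stand-ins, not real data:)

```r
# Minimal MRQAP-DSP sketch using sna::netlm; `net` and `sim_trait`
# are random stand-in matrices for illustration only.
library(sna)

set.seed(123)
n <- 20
net       <- matrix(rnorm(n * n), n, n)  # outcome: weighted adjacency matrix
sim_trait <- matrix(rnorm(n * n), n, n)  # predictor: dyadic node-trait covariate

fit <- netlm(net, list(sim_trait),
             nullhyp = "qapspp",  # Dekker et al.'s double semi-partialling
             reps    = 1000)      # number of permutations
summary(fit)
```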

Traditionally, this method (MRQAP with DSP) just produces a p-value, and the original standard errors are suspect. So I am using Altman and Bland's method to back-calculate standard errors from the p-values, which better reflect the actual error range (read more here, thanks to @Andrew Paul McKenzie Pegman: https://www.bmj.com/content/343/bmj.d2090). This at least allows me to make nice dot-and-whisker plots of the beta coefficients with their confidence intervals (estimate ± 1.96 × SE, etc.). However, I'd still really like to make predictions.
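(For anyone curious, the back-calculation itself is simple; a sketch with a made-up coefficient and p-value:)

```r
# Altman & Bland's back-calculation of a standard error from a
# two-sided p-value; `est` and `p` are made-up example values.
est <- 0.42                 # beta coefficient from the MRQAP model
p   <- 0.03                 # two-sided DSP p-value

z  <- qnorm(1 - p / 2)      # z-statistic implied by the p-value
se <- abs(est) / z          # back-calculated standard error
est + c(-1.96, 1.96) * se   # 95% CI for the dot-and-whisker plot
```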

There seem to be two logical routes to make predictions from an MRQAP model.

First, you could just make predictions normally.

This relies on the observed residuals in the model to calculate the standard error of your predictions. I think this might even work, because the homoskedasticity assumption in regression really concerns covariate standard errors and p-values, not prediction; this means a heteroskedastic model can still produce solid point predictions (see Matthew Drury's and Jesse Lawson's helpful notes here: https://stats.stackexchange.com/questions/303787/using-model-with-heteroskedasticity-for-predictions). However, I would love some external verification of this. Are there any sources I can draw on to be confident I can use this for visualizing predicted effects from networks?
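(To illustrate what I mean: since the MRQAP point estimates are just OLS estimates — the permutations only affect inference — I can fit lm() on the vectorized dyads and call predict(). A sketch with simulated stand-in data, where `sim_trait` is a hypothetical dyadic covariate:)

```r
# Route one: ordinary predictions from an lm() fit on the vectorized dyads.
# `dyads` is simulated stand-in data for the edge list.
set.seed(1)
dyads <- data.frame(sim_trait = runif(200))
dyads$weight <- 0.5 * dyads$sim_trait + rnorm(200, sd = 0.2)

ols <- lm(weight ~ sim_trait, data = dyads)

newdata <- data.frame(sim_trait = seq(0, 1, by = 0.1))
predict(ols, newdata = newdata,
        interval = "confidence")  # intervals based on the observed residuals
```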

Second, you could simulate the predictions, like in Zelig/Clarify.

Simulation requires building a multivariate normal distribution whose mean vector is your vector of model coefficients and whose covariance matrix is your model's variance-covariance matrix, so the draws share the empirically observed correlation structure. Then you take a draw from this multivariate distribution (i.e., grab one simulated value for each coefficient), use these as your coefficients, and generate a set of predictions. You then repeat this about 1,000 times, grabbing different sets of slightly differing coefficients.
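(Continuing from the `ols` sketch above, this is straightforward with MASS::mvrnorm:)

```r
# Route two: Zelig/Clarify-style simulation, reusing `ols` from the sketch above.
library(MASS)

sims <- mvrnorm(n = 1000, mu = coef(ols), Sigma = vcov(ols))  # 1000 x 2 coefficient draws

X    <- c(1, 0.8)                      # scenario row: 1 (intercept), sim_trait = 0.8
yhat <- sims %*% X                     # 1000 simulated predictions
quantile(yhat, c(0.025, 0.5, 0.975))   # simulation-based estimate and 95% CI
```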

In other words, this approach comes with a few assumptions: 1) Your coefficients might be slightly off, but if they're wrong, their errors follow a normal distribution. 2) The distribution of each coefficient is related to the others in specific, empirically observed ways. 3) These distributions don't necessarily have standard deviations that reflect the nice new standard errors back-calculated from our DSP p-values! Ordinarily, I'd think you'd want a multivariate normal distribution where assumptions 1 (normality) and 2 (correlation) hold, but where each coefficient's marginal distribution is also constrained to reflect the DSP-based standard errors. Since the standard errors are the square roots of the diagonal of the variance-covariance matrix, one way to do this might be to rescale that matrix so its diagonal matches the DSP-based standard errors while keeping the original correlation structure (a sketch follows below), though I'm not sure whether that is statistically defensible.
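(Here is a sketch of that rescaling, reusing `ols` from above; `se_dsp` is a hypothetical vector of back-calculated DSP standard errors, one per coefficient, in the same order as coef(ols):)

```r
# The "third route": keep the empirical correlation structure of vcov(ols),
# but rescale so the implied standard errors match the DSP-derived ones.
# `se_dsp` holds hypothetical back-calculated SEs, in the order of coef(ols).
se_dsp <- c(0.10, 0.05)

R     <- cov2cor(vcov(ols))   # correlation structure of the coefficients
D     <- diag(se_dsp)
V_dsp <- D %*% R %*% D        # rescaled variance-covariance matrix

sims_dsp <- MASS::mvrnorm(1000, mu = coef(ols), Sigma = V_dsp)
```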

To any kind souls out there who have read this far, what would you recommend? Should I just use normal prediction? Should I simulate with a multivariate normal distribution? Should I make some weird third multivariate-normal-distribution-that-somehow-resembles-my-standard-errors-made-indirectly-from-MRQAP-DSP?

Any thoughts would be appreciated!
