I want to generate some nice prediction plots from my MRQAP model. I've laid out my process below and would be very grateful for anyone's insight, as I'm not seeing much written about this online.

I am building my own regression models on network data in R, using the quadratic assignment procedure with Dekker and colleagues' (2007) double semi-partialling (DSP) method. In other words, I am predicting the weight of an edge from the traits of its respective nodes. This approach uses node permutations of residuals to adjust for the interdependence of observations in the network. (Regression on network data involves severe heteroskedasticity and autocorrelation, because the observations are literally connected.)
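(For context, here's a minimal sketch of an equivalent off-the-shelf fit using sna::netlm, whose nullhyp = "qapspp" option implements Dekker et al.'s semi-partialling; the matrices here are random stand-ins, not real data:)

```r
# Minimal MRQAP-DSP sketch using sna::netlm; `net` and `sim_trait`
# are random stand-in matrices for illustration only.
library(sna)

set.seed(123)
n <- 20
net       <- matrix(rnorm(n * n), n, n)  # outcome: weighted adjacency matrix
sim_trait <- matrix(rnorm(n * n), n, n)  # predictor: dyadic node-trait covariate

fit <- netlm(net, list(sim_trait),
             nullhyp = "qapspp",  # Dekker et al.'s double semi-partialling
             reps    = 1000)      # number of permutations
summary(fit)
```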

Traditionally, this method (MRQAP with DSP) just produces a p-value, and the original standard errors are suspect. So I am using Altman and Bland's method to back-calculate standard errors from the p-values, which better reflect the actual error range (read more here, thanks to @Andrew Paul McKenzie Pegman: https://www.bmj.com/content/343/bmj.d2090). This at least allows me to make nice dot-and-whisker plots of the beta coefficients with their confidence intervals (estimate ± 1.96 × SE, etc.). However, I'd still really like to make predictions.
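(For anyone curious, the back-calculation itself is simple; a sketch with a made-up coefficient and p-value:)

```r
# Altman & Bland's back-calculation of a standard error from a
# two-sided p-value; `est` and `p` are made-up example values.
est <- 0.42                 # beta coefficient from the MRQAP model
p   <- 0.03                 # two-sided DSP p-value

z  <- qnorm(1 - p / 2)      # z-statistic implied by the p-value
se <- abs(est) / z          # back-calculated standard error
est + c(-1.96, 1.96) * se   # 95% CI for the dot-and-whisker plot
```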

There seem to be two logical routes to make predictions from an MRQAP model.

First, you could just make predictions normally.

This relies on the observed residuals in the model to calculate the standard error of your predictions. I think this might even work, because the homoskedasticity assumption in regression really concerns covariate standard errors and p-values, not prediction; this means a heteroskedastic model can still produce solid point predictions (see Matthew Drury's and Jesse Lawson's helpful notes here: https://stats.stackexchange.com/questions/303787/using-model-with-heteroskedasticity-for-predictions). However, I would love some external verification of this. Are there any sources I can draw on to be confident I can use this for visualizing predicted effects from networks?
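(To illustrate what I mean: since the MRQAP point estimates are just OLS estimates — the permutations only affect inference — I can fit lm() on the vectorized dyads and call predict(). A sketch with simulated stand-in data, where `sim_trait` is a hypothetical dyadic covariate:)

```r
# Route one: ordinary predictions from an lm() fit on the vectorized dyads.
# `dyads` is simulated stand-in data for the edge list.
set.seed(1)
dyads <- data.frame(sim_trait = runif(200))
dyads$weight <- 0.5 * dyads$sim_trait + rnorm(200, sd = 0.2)

ols <- lm(weight ~ sim_trait, data = dyads)

newdata <- data.frame(sim_trait = seq(0, 1, by = 0.1))
predict(ols, newdata = newdata,
        interval = "confidence")  # intervals based on the observed residuals
```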

Second, you could simulate the predictions, like in Zelig/Clarify.

Simulation requires building a multivariate normal distribution whose mean vector is your vector of model coefficients and whose covariance matrix is your model's variance-covariance matrix, so the draws share the empirically observed correlation structure. Then you take a draw from this multivariate distribution (i.e., grab one simulated value for each coefficient), use these as your coefficients, and generate a set of predictions. You then repeat this about 1,000 times, grabbing different sets of slightly differing coefficients.
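(Continuing from the `ols` sketch above, this is straightforward with MASS::mvrnorm:)

```r
# Route two: Zelig/Clarify-style simulation, reusing `ols` from the sketch above.
library(MASS)

sims <- mvrnorm(n = 1000, mu = coef(ols), Sigma = vcov(ols))  # 1000 x 2 coefficient draws

X    <- c(1, 0.8)                      # scenario row: 1 (intercept), sim_trait = 0.8
yhat <- sims %*% X                     # 1000 simulated predictions
quantile(yhat, c(0.025, 0.5, 0.975))   # simulation-based estimate and 95% CI
```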

In other words, this approach comes with a few assumptions: 1) Your coefficients might be slightly off, but if they're wrong, their errors follow a normal distribution. 2) The distribution of each coefficient is related to the others in specific, empirically observed ways. 3) These distributions don't necessarily have standard deviations that reflect the nice new standard errors back-calculated from our DSP p-values! Ordinarily, I'd think you'd want a multivariate normal distribution where assumptions 1 (normality) and 2 (correlation) hold, but where each coefficient's marginal distribution is also constrained to reflect the DSP-based standard errors. Since the standard errors are the square roots of the diagonal of the variance-covariance matrix, one way to do this might be to rescale that matrix so its diagonal matches the DSP-based standard errors while keeping the original correlation structure (a sketch follows below), though I'm not sure whether that is statistically defensible.
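(Here is a sketch of that rescaling, reusing `ols` from above; `se_dsp` is a hypothetical vector of back-calculated DSP standard errors, one per coefficient, in the same order as coef(ols):)

```r
# The "third route": keep the empirical correlation structure of vcov(ols),
# but rescale so the implied standard errors match the DSP-derived ones.
# `se_dsp` holds hypothetical back-calculated SEs, in the order of coef(ols).
se_dsp <- c(0.10, 0.05)

R     <- cov2cor(vcov(ols))   # correlation structure of the coefficients
D     <- diag(se_dsp)
V_dsp <- D %*% R %*% D        # rescaled variance-covariance matrix

sims_dsp <- MASS::mvrnorm(1000, mu = coef(ols), Sigma = V_dsp)
```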

To any kind souls out there who have read this far, what would you recommend? Should I just use normal prediction? Should I simulate with a multivariate normal distribution? Should I make some weird third multivariate-normal-distribution-that-somehow-resembles-my-standard-errors-made-indirectly-from-MRQAP-DSP?

Any thoughts would be appreciated!
