Hi, what are people currently using to determine the reliability of meta-analysis methods? I recently received a comment that fail-safe N is no longer regarded as a useful or appropriate method. Is there anything more appropriate?
Fail-safe N is essentially a sensitivity analysis rather than an assessment of true reliability. It aims to probe two components: publication bias and statistical robustness.
You can assess publication bias using funnel plots: visual interpretation, simple quantitative asymmetry tests such as Egger's regression test, and/or the trim-and-fill method. The quantitative tests have specific requirements and limitations, so they may or may not be appropriate for your data. The truth, though, is that all methods of assessing publication bias are inaccurate and insensitive.
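For concreteness, here is a minimal sketch of Egger's regression test in Python: the standardized effect (effect / SE) is regressed on precision (1 / SE), and an intercept significantly different from zero suggests funnel-plot asymmetry. The effect sizes and standard errors below are made up for illustration; in practice you would normally use a dedicated package such as R's metafor.

```python
import numpy as np
from scipy import stats

def eggers_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE);
    an intercept far from zero suggests small-study asymmetry, one
    possible symptom of publication bias.
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    y = effects / ses                                     # standardized effects
    X = np.column_stack([np.ones_like(ses), 1.0 / ses])   # [intercept, precision]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                      # residual variance
    se_intercept = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    t = beta[0] / se_intercept
    p = 2.0 * stats.t.sf(abs(t), df=n - k)
    return beta[0], p

# Hypothetical effect sizes and standard errors for five studies:
intercept, p = eggers_test([0.12, 0.25, 0.31, 0.44, 0.48],
                           [0.05, 0.10, 0.15, 0.20, 0.25])
print(f"Egger intercept = {intercept:.3f}, p = {p:.3f}")
```

Note that with very few studies the test has little power, which is one of the limitations mentioned above.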
You can assess statistical 'robustness' through sensitivity analyses of various forms, such as sub-group analyses. A 'leave-one-out' (influence) analysis is probably the most intuitive, and the one conceptually closest to fail-safe N.
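A leave-one-out analysis just recomputes the pooled estimate with each study omitted in turn; if the conclusion hinges on a single study, the omitted-study estimates will swing noticeably. A minimal sketch using a fixed-effect (inverse-variance) pooled estimate, with made-up inputs:

```python
import numpy as np

def pooled_estimate(effects, ses):
    """Fixed-effect (inverse-variance) pooled estimate."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2   # inverse-variance weights
    return float(np.sum(w * effects) / np.sum(w))

def leave_one_out(effects, ses):
    """Pooled estimate recomputed with each study omitted in turn."""
    idx = np.arange(len(effects))
    return [pooled_estimate(np.asarray(effects)[idx != i],
                            np.asarray(ses)[idx != i])
            for i in idx]

# Hypothetical effects with equal standard errors (so equal weights):
effects, ses = [0.2, 0.3, 0.4], [0.1, 0.1, 0.1]
print(pooled_estimate(effects, ses))   # ~0.3
print(leave_one_out(effects, ses))     # ~[0.35, 0.3, 0.25]
```

A random-effects version would add a between-study variance term to the weights, but the leave-one-out logic is identical.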
Jack Henry, thanks for that clear response. Just wondering what happens when there is not much data to subdivide. Do you know if that becomes an issue?
There is no well-defined lower limit on the number of studies required for qualitative sub-group analysis, but too few studies would generally preclude quantitative tests for differences between sub-groups, or any very concrete conclusions about them.
If you have very few studies, it's generally most practical to do a leave-one-out analysis plus a qualitative assessment of the funnel plot.
As already underscored by Jack Henry, validated methods to assess whether a meta-analysis is robust (and its results reliable) have their own requirements and limitations. Apart from funnel plots, Egger's test, the trim-and-fill method and leave-one-out analyses, you may consider "p-hacking" detection methods such as the p-curve.
The p-curve method tests whether a set of studies included in a meta-analysis is, on average, sufficiently powered to detect a true effect of the intervention, and it can correct for the inflated estimates that arise when published results have been intentionally nudged into significance (the so-called "p-hacking").
Combined with other tests, this method may help you evaluate both the risk of publication bias and any potential sign of data dredging.
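The intuition behind the p-curve is that studies of a true effect produce right-skewed p-values (many very small ones), whereas p-hacked or null results pile up just under 0.05. The full method (Simonsohn et al.) uses continuous pp-value tests, but the simple binomial "right-skew" check is easy to sketch; the p-values below are invented for illustration:

```python
from scipy.stats import binomtest

def pcurve_binomial(p_values, alpha=0.05):
    """Simple p-curve 'right-skew' binomial test.

    Among the statistically significant p-values, counts how many fall in
    the lower half of the significance range (p < alpha/2). If the studies
    detect a true effect, small p-values should dominate; a flat or
    left-skewed curve is consistent with no effect or with p-hacking.
    """
    sig = [p for p in p_values if p < alpha]
    n_low = sum(p < alpha / 2 for p in sig)
    result = binomtest(n_low, n=len(sig), p=0.5, alternative="greater")
    return len(sig), n_low, result.pvalue

# Hypothetical p-values extracted from the included studies:
n_sig, n_low, pv = pcurve_binomial([0.001, 0.002, 0.003, 0.004, 0.06])
print(n_sig, n_low, round(pv, 4))  # 4 4 0.0625
```

Here all four significant p-values sit below 0.025, which leans toward right skew, though four studies give the binomial test almost no power, mirroring the small-sample caveats discussed above.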
However, in my opinion, the key point is that any reliable meta-analysis should be backed by a well-conducted systematic literature review. Beyond a thorough evaluation of the quantitative synthesis methods, you should focus on the review methodology (a sufficiently wide literature search, proper study selection, critical appraisal of the included evidence...). This is a fundamental precondition for a robust analysis and is where, in my experience, the most common flaws hide. What follows can be important too, but it is rather secondary to the overall consistency of the results.
References
p-Curve and Effect Size (article)
A practical example of the p-curve method in application: Ginseng integrative supplementation for seasonal acute upper... (article)