I am planning to calculate the false discovery rate using SPSS, as a comparison to the Bonferroni adjustment to the p-value. Can anyone show me a step-by-step procedure for calculating the false discovery rate using SPSS?
You could use a DATA LIST command to create a small dataset containing the p-values for your multiple tests. Do something like the following, replacing the data line(s) between BEGIN DATA and END DATA with your own list of p-values.
DATA LIST free / p (F5.3).
BEGIN DATA
.152 .093 .055 .035 .044 .017 .001
END DATA.
After that, run the syntax you see on that IBM website.
I have tried to run the FDR correction by following your recommendation here, but I have never run any syntax in SPSS (I am only familiar with the regular Windows interface). Therefore, I do not know where to enter the data described here or on the IBM website. Would you be so kind and patient as to explain to me how to proceed? The more step-by-step the explanation is, the better. :-) I would be extremely grateful.
@Bruce. Very helpful :). Thanks a lot. I have a question: should I take the total number of tests into account (e.g., the data has three groups, so there are 3 pairwise comparisons among them)? The FDR Python code mentioned the total number of tests (http://stats.stackexchange.com/questions/870/multiple-hypothesis-testing-correction-with-benjamini-hochberg-p-values-or-q-va). Thanks a lot!
Hi Sugai. There was a problem with the link you inserted in your last message--it ran into the word Thanks, and so didn't work. Here is a working version of it (I hope), for anyone who wants to look at that Stack Exchange thread:
I don't really understand your question. If you are saying you have 3 groups and wish to make all pairwise comparisons among the groups, you could always use Fisher's least significant difference (LSD) procedure, as it controls the family-wise (FW) alpha at the same level as the per-contrast alpha. If you need support for the claim that Fisher's LSD controls the FW alpha when there are 3 groups, see the article linked below. See also David Howell's book, Statistical Methods for Psychology, and Thom Baguley's book, Serious Stats.
@Bruce. Thanks a lot for your reply. Fisher's LSD does not correct for multiple comparisons (https://en.wikipedia.org/wiki/Post_hoc_analysis). If the data has 3 groups, the post hoc tests will involve 3 comparisons among these groups, but you didn't consider the number of comparisons in your syntax. The current data has 3 groups, and I hope to use FDR to correct the uncorrected p-values. Thanks a lot!
Sugai, I found the line on that Wikipedia page that says Fisher's LSD does not correct for multiple comparisons. I assume that the author means it does not make any adjustment to the per-contrast alpha. That is true. However, the pair-wise contrasts are only carried out if the omnibus F-test is statistically significant. And in the case of 3 groups, Fisher's LSD does maintain the FW alpha at the per-contrast alpha, despite making no adjustment to the per-contrast alpha.
Re the syntax I posted, you say that it doesn't consider the number of comparisons. I think it does. The IBM Tech Note it was taken from (see link below) includes this advice:
Make sure that there are no other cases in the data file, as the number of cases in the file is used to count the number of comparisons involved.
Look at the syntax again, and the output it generates, and pay attention to the variable m.
HTH.
p.s. - In the syntax I posted earlier, I would add another SORT CASES line just before the final LIST, sorting by the p-values in ascending order. I think this lists the results in a manner that is more intuitive, with any significant results listed first, and non-significant results coming later.
DATA LIST free / p (F5.3).
BEGIN DATA
.152 .093 .055 .035 .044 .017 .001
END DATA.
SORT CASES by p (a).
COMPUTE i=$casenum.
SORT CASES by i (d).
COMPUTE q=.05.
COMPUTE m=max(i,lag(m)).
COMPUTE crit=q*i/m.
COMPUTE test=(p le crit).
COMPUTE test=max(test,lag(test)).
FORMATS i m test(f8.0) q (f8.2) crit(f8.6).
VALUE LABELS test 1 'Significant' 0 'Not Significant'.
SORT CASES by p (a).
LIST.
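For anyone following along in Python rather than SPSS (a few posters here mention Python), this is a hedged sketch of the same Benjamini-Hochberg decision rule that the syntax above implements; the function name and variable names are my own, not from the thread.

```python
def bh_flags(pvals, q=0.05):
    """Benjamini-Hochberg significance flags (True = significant) at FDR level q."""
    m = len(pvals)
    # Rank the p-values in ascending order (rank 1 = smallest p),
    # keeping track of each p-value's original position.
    order = sorted(range(m), key=lambda k: pvals[k])
    # Find the largest rank i with p_(i) <= q * i / m.
    largest = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= q * rank / m:
            largest = rank
    # Every p-value at rank <= largest is declared significant.
    flags = [False] * m
    for rank, k in enumerate(order, start=1):
        flags[k] = rank <= largest
    return flags

# The example p-values from the DATA LIST above:
ps = [.152, .093, .055, .035, .044, .017, .001]
print(bh_flags(ps))  # only the smallest p-value (.001) survives at q = .05
```

Like the SPSS syntax, this only produces significant / not significant flags, not adjusted p-values.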
Sugai, do you mean that your 3 groups contain p-values of a different nature, or are they 3 parts of one big set of p-values that you had to cut into 3 chunks?
Efron has a 2008 paper discussing how to combine p-values of two different natures. Benjamini and Bogomolov (2014) have a BH FDR approach that corrects for families. I have a method to combine chunks of p-values that are not of a different nature.
I just don't understand what your groups are for. Could you explain more?
@Bruce. Thanks a lot for your reply. Very helpful. I just know a little bit about R and Python. I should learn something about SPSS syntax. Thanks a lot!
Dear all, is there any way to also obtain the corrected p-values from the syntax instead of just a category of 1 vs. 0 for significant and not after the correction procedure?
@Rahajeng, please refer to the discussion above; you can use either FDR or Bonferroni to adjust the p-values. You can follow the syntax procedure suggested by Bruce. Hope it helps.
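Since the question about adjusted p-values keeps coming up in this thread: the SPSS syntax above only flags significance, but BH-adjusted p-values (what R's p.adjust with method = "BH" reports) can be computed directly. A hedged Python sketch, with illustrative names of my own:

```python
def bh_adjust(pvals):
    """BH-adjusted p-values: p_adj at rank i is min over j >= i of p_(j) * m / j."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the running minimum,
    # so adjusted p-values stay monotone in the original p-values.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted

ps = [.152, .093, .055, .035, .044, .017, .001]
print(bh_adjust(ps))
```

An adjusted p-value below your chosen q is equivalent to the 'Significant' flag produced by the syntax above, so you can keep reporting against the usual 0.05 level.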
p.s. - Note that a bit further down the thread, I posted a slightly revised version with a SORT CASES command inserted just before the final LIST command. I think this gives the results in a somewhat more sensible order.
@ Guillermo: Yes, you can set q to whatever false discovery rate (FDR) you wish. From the IBM Technote page where the basic syntax came from (with emphasis added):
These commands use a .05 level for the false discovery rate. You can change that by changing the value of q in the second COMPUTE command.
For me, in the end, this is the best and quickest way to get your BH-corrected values: paste all of your p-values into this automatic calculator http://www.sdmproject.com/utilities/?show=FDR
We have a new and much faster algorithm for the Benjamini-Hochberg procedure (aka the BH-LSU). Instead of sorting p-values, it makes a linear search in O(m). It also handles many chunks of p-values without changing the global FDR level; no compromise is required. We would appreciate it if someone could write this code in SPSS. We have R and SAS versions, but not SPSS.
The paper is available at advance access in Bioinformatics, http://bioinformatics.oxfordjournals.org/content/early/2016/02/25/bioinformatics.btw029
It is called "FastLSU: a more practical approach for the Benjamini–Hochberg FDR controlling procedure for huge-scale testing problems".
The reviewers asked us to run a running-time simulation, which you can find at
Hi Vered. I don't have time to look at it right now. But I should be able to look at it in late April or May, if no one has programmed it in SPSS for you by then. Alternatively, you could post to the SPSSX-L mailing list, and see if any of the regulars there have the time & interest. See the link below for the Nabble archive of that mailing list. (If you post via Nabble, you can upload attachments.)
Thank you for nice suggestions. The syntax given above by Bruce works well, but this produces only a new significance level while it does not give any new (adjusted) p-values as far as I understand it... It would be nice to keep the "usual" significance level (0.05) and get the new, adjusted p-values. Is it possible to calculate it in SPSS in this way?...
You can use any other value < 0.5 that you want, as long as you state it as your significance level.
Declaring a significance level of 0.05, 0.2, or even 0.4 is the same as announcing how far you are comfortable with your overall false discovery rate.
Somehow 5% became the standard, but I saw papers published with 10%, and if I am not wrong some with FDR
I've just had the same problem as Herri, and this thread has been extremely useful. Thank you so much for all the comments!
Unfortunately the syntax given above by Bruce has not worked for me (I am a novice to SPSS syntax), so I have been looking for an online FDR calculator, and found this:
We have a simple R code from our FastLSU paper that many people like to download. It is a different algorithm but performs faster and promises to give you exactly the same result. It is also aimed at controlling the FDR when you have two or more batches of p-values and want to control the global FDR level.
If you don't have many p-values, the original 1995 BH algorithm is really simple to apply too (and can serve as a good exercise). Just sort the p-values from the largest to the smallest and search for the largest kth p-value satisfying p_(k) <= k*alpha/m, where k is the rank counted from the smallest p-value. m is the overall number of p-values you have; alpha can be 0.05, 0.1, or any significance level you want to use.
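The hand exercise described above takes only a few lines in any language; here is a hedged Python sketch of that procedure (the function name is illustrative, not from the thread):

```python
def bh_threshold(pvals, alpha=0.05):
    """Largest p-value declared significant by the 1995 BH rule, or None."""
    m = len(pvals)
    asc = sorted(pvals)
    # Walk from the largest p-value down; stop at the first rank k (counted
    # from the small end) whose p-value meets p_(k) <= k * alpha / m.
    for k in range(m, 0, -1):
        if asc[k - 1] <= k * alpha / m:
            return asc[k - 1]  # this and every smaller p-value pass
    return None

print(bh_threshold([.152, .093, .055, .035, .044, .017, .001]))  # 0.001
```

Any p-value at or below the returned threshold is significant at FDR level alpha; None means nothing survives.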
Thank you for the question and the detailed answer. The code provided by Bruce was very helpful for me too, but I have a question: do the results pertain to the descending order of the old p-values?
@ Zainab Awanda: Sorry, I missed your post 3 months ago. Yes, the code I edited on 28-Aug-2015 includes these commands, which sort the data in descending order of the p-values:
SORT CASES by p (a).
COMPUTE i=$casenum.
SORT CASES by i (d).
If you wanted to list them in ascending order of the p-values, you could add another SORT command before the final LIST command:
SORT CASES by p(a).
Or if you wanted to list them in the original order of the p-values, you'd have to add something like this immediately after reading in the p-value list via DATA LIST:
Krzysztof e-mailed me the same question. Here's the reply I sent.
-----------------------------
Hello Krzysztof. That's a good question. The code I posted uses the original Benjamini-Hochberg approach. Does it assume independence of the p-values? I think it depends who you ask. When I Googled it, I found this interesting StackExchange discussion:
Here are a couple of answers that I thought were useful. First this one: FDR does not assume independence; see my answers here and here for example. There are several more specialized procedures for individual types of dependence, though you would have to provide more detail as to what kind of dependence you have in your dataset. – Chris C Apr 5 '16 at 21:50
And also this one:
You're looking for the Benjamini-Yekutieli procedure:
Benjamini, Yoav; Yekutieli, Daniel. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 (2001), no. 4, 1165--1188. doi:10.1214/aos/1013699998. http://projecteuclid.org/euclid.aos/1013699998
The procedure is available in R using the method = "BY" option in p.adjust(). For more info, try ?p.adjust.
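For Python users: the Benjamini-Yekutieli procedure is just BH with an extra harmonic-sum penalty c(m) = sum of 1/i for i = 1..m, which makes it valid under arbitrary dependence. A hedged pure-Python sketch (names my own; if I recall correctly, statsmodels' multipletests also offers this via method='fdr_by'):

```python
def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values: BH scaled by c(m) = sum_{i=1}^m 1/i."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # penalty for arbitrary dependence
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps the adjusted values at 1
    # Walk from the largest p-value down, carrying the running minimum.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m * c_m / rank)
        adjusted[k] = running_min
    return adjusted

print(by_adjust([.152, .093, .055, .035, .044, .017, .001]))
```

BY is uniformly more conservative than BH: each adjusted p-value is the BH value multiplied by c(m), then capped at 1.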
When I find time, I'll have to look at those links in the first answer and the Benjamini-Yekutieli article. My answer, for now, combining info from the various answers posted, is that (I think) BH does not assume independence of the p-values; but it does not assume any particular pattern of dependencies either. Thanks for bringing this distinction to my attention!