I am planning to calculate the false discovery rate using SPSS, as a comparison to the Bonferroni adjustment to the p-value. Can anyone show me a step-by-step procedure for calculating the false discovery rate using SPSS?
You could use a DATA LIST command to create a small dataset containing the p-values for your multiple tests. Do something like the following, replacing the data line(s) between BEGIN DATA and END DATA with your own list of p-values.
DATA LIST free / p (F5.3).
BEGIN DATA
.152 .093 .055 .035 .044 .017 .001
END DATA.
After that, run the syntax you see on that IBM website.
I have tried to run the FDR correction by following your recommendation here, but I have never run any syntax in SPSS (I am only familiar with the regular Windows interface). Therefore, I do not know where to enter the data described here or on the IBM website. Would you be so kind and patient as to explain to me how to proceed? The more step-by-step the explanation is, the better. :-) I would be extremely grateful.
@Bruce. Very helpful :). Thanks a lot. I have a question: should I take the total number of tests into account (e.g., the data has three groups, so there are 3 pairwise comparisons among them)? The FDR Python code mentioned the total number of tests (http://stats.stackexchange.com/questions/870/multiple-hypothesis-testing-correction-with-benjamini-hochberg-p-values-or-q-va). Thanks a lot!
Hi Sugai. There was a problem with the link you inserted in your last message--it ran into the word Thanks, and so didn't work. Here is a working version of it (I hope), for anyone who wants to look at that Stack Exchange thread:
I don't really understand your question. If you are saying you have 3 groups and wish to make all pairwise comparisons among the groups, you could always use Fisher's least significant difference (LSD) procedure, as it controls the family-wise (FW) alpha at the same level as the per-contrast alpha. If you need support for the claim that Fisher's LSD controls the FW alpha when there are 3 groups, see the article linked below. See also David Howell's book, Statistical Methods for Psychology, and Thom Baguley's book, Serious Stats.
@Bruce. Thanks a lot for your reply. Fisher's LSD does not correct for multiple comparisons (https://en.wikipedia.org/wiki/Post_hoc_analysis). If the data has 3 groups, the post hoc tests will involve 3 comparisons among these groups, but you didn't consider the number of comparisons in your syntax. The current data has 3 groups, and I hope to use FDR to correct the uncorrected p-values. Thanks a lot!
Sugai, I found the line on that Wikipedia page that says Fisher's LSD does not correct for multiple comparisons. I assume that the author means it does not make any adjustment to the per-contrast alpha. That is true. However, the pair-wise contrasts are only carried out if the omnibus F-test is statistically significant. And in the case of 3 groups, Fisher's LSD does maintain the FW alpha at the per-contrast alpha, despite making no adjustment to the per-contrast alpha.
Re the syntax I posted, you say that it doesn't consider the number of comparisons. I think it does. The IBM Tech Note it was taken from (see link below) includes this advice:
Make sure that there are no other cases in the data file, as the number of cases in the file is used to count the number of comparisons involved.
Look at the syntax again, and the output it generates, and pay attention to the variable m.
HTH.
p.s. - In the syntax I posted earlier, I would add another SORT CASES line just before the final LIST, sorting by the p-values in ascending order. I think this lists the results in a manner that is more intuitive, with any significant results listed first, and non-significant results coming later.
DATA LIST free / p (F5.3).
BEGIN DATA
.152 .093 .055 .035 .044 .017 .001
END DATA.
SORT CASES by p (a).
COMPUTE i=$casenum.
SORT CASES by i (d).
COMPUTE q=.05.
COMPUTE m=max(i,lag(m)).
COMPUTE crit=q*i/m.
COMPUTE test=(p le crit).
COMPUTE test=max(test,lag(test)).
FORMATS i m test(f8.0) q (f8.2) crit(f8.6).
VALUE LABELS test 1 'Significant' 0 'Not Significant'.
SORT CASES by p (a).
LIST.
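For anyone following along in Python rather than SPSS (a few posters here mention Python), this is a hedged sketch of the same Benjamini-Hochberg decision rule that the syntax above implements; the function name and variable names are my own, not from the thread.

```python
def bh_flags(pvals, q=0.05):
    """Benjamini-Hochberg significance flags (True = significant) at FDR level q."""
    m = len(pvals)
    # Rank the p-values in ascending order (rank 1 = smallest p),
    # keeping track of each p-value's original position.
    order = sorted(range(m), key=lambda k: pvals[k])
    # Find the largest rank i with p_(i) <= q * i / m.
    largest = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= q * rank / m:
            largest = rank
    # Every p-value at rank <= largest is declared significant.
    flags = [False] * m
    for rank, k in enumerate(order, start=1):
        flags[k] = rank <= largest
    return flags

# The example p-values from the DATA LIST above:
ps = [.152, .093, .055, .035, .044, .017, .001]
print(bh_flags(ps))  # only the smallest p-value (.001) survives at q = .05
```

Like the SPSS syntax, this only produces significant / not significant flags, not adjusted p-values.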
Sugai, do you mean that your 3 groups contain p-values of a different nature, or are they 3 parts of one big set of p-values that you had to cut into 3 chunks?
Efron has a 2008 paper discussing how to combine p-values of two different natures. Benjamini and Bogomolov (2014) have a BH FDR approach that corrects for families. I have a method to combine chunks of p-values that are not of a different nature.
I just don't understand what your groups are for. Could you explain more?
@Bruce. Thanks a lot for your reply. Very helpful. I just know a little bit about R and Python. I should learn something about SPSS syntax. Thanks a lot!
Dear all, is there any way to also obtain the corrected p-values from the syntax instead of just a category of 1 vs. 0 for significant and not after the correction procedure?
@Rahajeng, please refer to the discussion above; you can use either FDR or Bonferroni to adjust the p-values. You can follow the syntax procedure suggested by Bruce. Hope it helps.
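Since the question about adjusted p-values keeps coming up in this thread: the SPSS syntax above only flags significance, but BH-adjusted p-values (what R's p.adjust with method = "BH" reports) can be computed directly. A hedged Python sketch, with illustrative names of my own:

```python
def bh_adjust(pvals):
    """BH-adjusted p-values: p_adj at rank i is min over j >= i of p_(j) * m / j."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the running minimum,
    # so adjusted p-values stay monotone in the original p-values.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted

ps = [.152, .093, .055, .035, .044, .017, .001]
print(bh_adjust(ps))
```

An adjusted p-value below your chosen q is equivalent to the 'Significant' flag produced by the syntax above, so you can keep reporting against the usual 0.05 level.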
p.s. - Note that a bit further down the thread, I posted a slightly revised version with a SORT CASES command inserted just before the final LIST command. I think this gives the results in a somewhat more sensible order.
@ Guillermo: Yes, you can set q to whatever false discovery rate (FDR) you wish. From the IBM Technote page where the basic syntax came from (with emphasis added):
These commands use a .05 level for the false discovery rate. You can change that by changing the value of q in the second COMPUTE command.
For me, in the end, this is the best and quickest way to get your BH-corrected values: paste all of your p-values into this automatic calculator http://www.sdmproject.com/utilities/?show=FDR
We have a new and much faster algorithm for the Benjamini-Hochberg procedure (aka the BH-LSU). Instead of sorting p-values, it makes a linear search in O(m). It also handles many chunks of p-values without changing the global FDR level; no compromise is required. We would appreciate it if someone could write this code in SPSS. We have R and SAS versions, but not SPSS.
The paper is available at advance access in Bioinformatics, http://bioinformatics.oxfordjournals.org/content/early/2016/02/25/bioinformatics.btw029
It is called "FastLSU: a more practical approach for the Benjamini–Hochberg FDR controlling procedure for huge-scale testing problems".
The reviewers asked us to run a running-time simulation, which you can find at
Hi Vered. I don't have time to look at it right now. But I should be able to look at it in late April or May, if no one has programmed it in SPSS for you by then. Alternatively, you could post to the SPSSX-L mailing list, and see if any of the regulars there have the time & interest. See the link below for the Nabble archive of that mailing list. (If you post via Nabble, you can upload attachments.)
Thank you for nice suggestions. The syntax given above by Bruce works well, but this produces only a new significance level while it does not give any new (adjusted) p-values as far as I understand it... It would be nice to keep the "usual" significance level (0.05) and get the new, adjusted p-values. Is it possible to calculate it in SPSS in this way?...
You can use any other value < 0.5 that you want, as long as you state it as your significance level.
Declaring a significance level of 0.05, 0.2, or even 0.4 is the same as announcing how far you are comfortable with your overall false discovery rate.
Somehow 5% became the standard, but I saw papers published with 10%, and if I am not wrong some with FDR
I've just had the same problem as Herri, and this thread has been extremely useful. Thank you so much for all the comments!
Unfortunately the syntax given above by Bruce has not worked for me (I am a novice to SPSS syntax), so I have been looking for an online FDR calculator, and found this:
We have a simple R code from our FastLSU paper that many people like to download. It is a different algorithm but performs faster and promises to give you exactly the same result. It is also aimed at controlling the FDR when you have two or more batches of p-values and want to control the global FDR level.
If you don't have many p-values, the original 1995 BH algorithm is really simple to apply too (and can serve as a good exercise). Just sort the p-values from the largest to the smallest and search for the largest kth p-value satisfying p_(k) <= k*alpha/m, where k is the rank counted from the smallest p-value. m is the overall number of p-values you have; alpha can be 0.05, 0.1, or any significance level you want to use.
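The hand exercise described above takes only a few lines in any language; here is a hedged Python sketch of that procedure (the function name is illustrative, not from the thread):

```python
def bh_threshold(pvals, alpha=0.05):
    """Largest p-value declared significant by the 1995 BH rule, or None."""
    m = len(pvals)
    asc = sorted(pvals)
    # Walk from the largest p-value down; stop at the first rank k (counted
    # from the small end) whose p-value meets p_(k) <= k * alpha / m.
    for k in range(m, 0, -1):
        if asc[k - 1] <= k * alpha / m:
            return asc[k - 1]  # this and every smaller p-value pass
    return None

print(bh_threshold([.152, .093, .055, .035, .044, .017, .001]))  # 0.001
```

Any p-value at or below the returned threshold is significant at FDR level alpha; None means nothing survives.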
Thank you for the question and the detailed answer. The code provided by Bruce was very helpful for me too, but I have a question: do the results pertain to the descending order of the old p-values?
@ Zainab Awanda: Sorry, I missed your post 3 months ago. Yes, the code I edited on 28-Aug-2015 includes these commands, which sort the data in descending order of the p-values:
SORT CASES by p (a).
COMPUTE i=$casenum.
SORT CASES by i (d).
If you wanted to list them in ascending order of the p-values, you could add another SORT command before the final LIST command:
SORT CASES by p(a).
Or if you wanted to list them in the original order of the p-values, you'd have to add something like this immediately after reading in the p-value list via DATA LIST:
Krzysztof e-mailed me the same question. Here's the reply I sent.
-----------------------------
Hello Krzysztof. That's a good question. The code I posted uses the original Benjamini-Hochberg approach. Does it assume independence of the p-values? I think it depends who you ask. When I Googled it, I found this interesting StackExchange discussion:
Here are a couple of answers that I thought were useful. First this one: FDR does not assume independence; see my answers here and here for example. There are several more specialized procedures for individual types of dependence, though you would have to provide more detail as to what kind of dependence you have in your dataset. – Chris C Apr 5 '16 at 21:50
And also this one:
You're looking for the Benjamini-Yekutieli procedure:
Benjamini, Yoav; Yekutieli, Daniel. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 (2001), no. 4, 1165--1188. doi:10.1214/aos/1013699998. http://projecteuclid.org/euclid.aos/1013699998
The procedure is available in R using the method = "BY" option in p.adjust(). For more info, try ?p.adjust.
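For Python users: the Benjamini-Yekutieli procedure is just BH with an extra harmonic-sum penalty c(m) = sum of 1/i for i = 1..m, which makes it valid under arbitrary dependence. A hedged pure-Python sketch (names my own; if I recall correctly, statsmodels' multipletests also offers this via method='fdr_by'):

```python
def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values: BH scaled by c(m) = sum_{i=1}^m 1/i."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # penalty for arbitrary dependence
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps the adjusted values at 1
    # Walk from the largest p-value down, carrying the running minimum.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m * c_m / rank)
        adjusted[k] = running_min
    return adjusted

print(by_adjust([.152, .093, .055, .035, .044, .017, .001]))
```

BY is uniformly more conservative than BH: each adjusted p-value is the BH value multiplied by c(m), then capped at 1.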
When I find time, I'll have to look at those links in the first answer and the Benjamini-Yekutieli article. My answer, for now, combining info from the various answers posted, is that (I think) BH does not assume independence of the p-values; but it does not assume any particular pattern of dependencies either. Thanks for bringing this distinction to my attention!