The False-Discovery Rate: An alternative to the FWER for multiple comparisons
We’ve previously explored one common method of dealing with testing multiple simultaneous hypotheses: methods that control the Family-Wise Error Rate. However, we saw that the FWER can be quite conservative. The False-Discovery Rate is a powerful alternative to the FWER, which is often used in cases where hundreds or thousands of simultaneous hypotheses are tested. What is the FDR, how is it different from the FWER, and how do we control it?
In our discussion of the FWER, we walked through some strategies for avoiding Type I errors when we test multiple simultaneous hypotheses. It turned out that when we tested 5 hypotheses instead of one, we might accidentally reject a true null hypothesis much more often than $\alpha$, our significance level. We looked at the Bonferroni and Bonferroni-Holm methods, which let us ensure that the probability of making any false claims at all was less than $\alpha$.
Let’s imagine a scenario where instead of testing 5 hypotheses, we’re testing 5000. While this might seem a little far-fetched, it occurs fairly frequently:
- Machine learning models might have hundreds or thousands of features which we’d like to test for their correlation with the output
- Microarray studies involve looking at the expression of thousands of genes
- Experiments might cast a wide net and screen many possible treatments for potential value
In a case like that, our analysis might produce a huge number of significant results. An experiment which screens thousands of potential treatments might involve rejecting hundreds of null hypotheses. If that’s the case, the FWER is pretty strict - it will ensure that we very rarely make even one false claim. However, in a lot of these cases, we’re intentionally casting a wide net, and we don’t need to be so conservative. We’re often perfectly happy to reject 250 null hypotheses when we should have only rejected 248 of them; we still found 248 new and exciting relationships we can explore! The FWER-controlling methods, though, will work hard to make sure that doesn’t happen. If we’re very sensitive to False Positives, FWER control is what we need - but often, with many hypotheses, it’s not what we’re looking for.
The FWER, it turns out, is just one way of thinking about the Type I error rate when we test multiple hypotheses. In the case above, we had two false positives; but we had so many true positives that it wasn’t an especially big deal. The idea is that we made 248 “genuine” discoveries, and 2 “false” discoveries. In cases where we have so many useful discoveries, we’re often willing to pay the penalty of a few false ones. This is the idea behind controlling the False Discovery Rate, or FDR; we’d like to make it unlikely that too many of our claims are false.
Let’s get a little more specific in defining the FDR. For every hypothesis, there are four possible outcomes:
- A null hypothesis could be true, but we reject it, claiming a discovery when there is none. This is a False Positive.
- A null hypothesis could be false, and we reject it, claiming a discovery when there is one, hooray! This is a True Positive.
- A null hypothesis could be false, and we fail to reject it, missing out on a discovery we could have made. This is a False Negative.
- A null hypothesis could be true, and we fail to reject it, avoiding claiming a discovery when there isn’t one. This is a True Negative.
This is a bit of a mouthful, so we often summarize the four possible outcomes in a matrix like the following:
Table from the excellent Computer Age Statistical Inference, Ch. 15. My treatment glosses over some details around the definition of the FDR, and the original chapter is well worth a read.
- There are $N$ hypotheses overall. (We know this when we set up our analysis.)
- There are $N_0$ hypotheses for which the null is true, and $N_1$ hypotheses for which the alternative is true. (We don’t know this when we set up our experiment.)
- There are $a$ False Positives and $b$ True Positives. (We don’t know this either.)
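Putting those quantities together, the matrix looks something like this (a reconstruction from the definitions above, rather than a copy of the original table):

| | Null rejected | Null not rejected | Total |
|---|---|---|---|
| Null true | $a$ | $N_0 - a$ | $N_0$ |
| Alternative true | $b$ | $N_1 - b$ | $N_1$ |
| Total | $a + b$ | $N - (a + b)$ | $N$ |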
We can use this matrix to define the FWER and FDR in terms of the decisions and results under different kinds of procedures:
- FWER-controlling methods attempt to keep $\mathbb{P}(a \geq 1) \leq \alpha$; that is, they make it unlikely that we make even one false claim.
- FDR-controlling methods attempt to keep the average of $\frac{a}{a + b}$ below $\alpha$. That is, they make decisions so that $\mathbb{E}\left[\frac{a}{a + b}\right] \leq \alpha$.
Whether you decide to control the FDR or the FWER is determined by what you’d like to get out of your analysis - they solve different problems, so neither is automatically better. If you’re extremely sensitive to False Positives, then controlling the FWER might make sense; if you have many hypotheses and are willing to tolerate a small fraction of false discoveries, then you might choose to control the FDR instead.
The most well-known method of controlling the FDR is the Benjamini-Hochberg procedure. The procedure is quite similar to the Bonferroni-Holm method we discussed before. It goes something like this:
- Sort all the P-values you computed from your tests in ascending order. We’ll call these $P_1, \ldots, P_m$, and they’ll correspond to hypotheses $H_1, \ldots, H_m$.
- We’ll define a series of significance levels $\alpha_1, \ldots, \alpha_m$, where $\alpha_i = \frac{\alpha \times i}{m}$.
- Starting with $P_1$, see if it is significant at the level $\alpha_1$. If it is, reject it and move on to testing $P_2$ at $\alpha_2$. Continue up the list, and reject everything up to and including the last P-value which falls below its level.
- Put another way: if $k$ is the largest index such that $P_k \leq \alpha_k$, then reject all the hypotheses $1, \ldots, k$.
If you’d like to go a little deeper, the original 1995 paper remains pretty accessible.
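If you want to see the mechanics, here’s a minimal sketch of the step-up rule in NumPy (my own illustration, not the post’s code or a library routine):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)                       # sort the P-values ascending
    levels = alpha * np.arange(1, m + 1) / m        # alpha_i = alpha * i / m
    passed = np.nonzero(pvals[order] <= levels)[0]  # sorted positions clearing their level
    reject = np.zeros(m, dtype=bool)
    if passed.size > 0:
        k = passed.max()                            # largest k with P_k <= alpha_k
        reject[order[:k + 1]] = True                # reject H_1, ..., H_k
    return reject
```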
The details of the method don’t need to be implemented by hand, luckily; we simply need to call the `multipletests` method from statsmodels, and select `fdr_bh` as the method. That will tell us which of the hypotheses we entered can be rejected while keeping the FDR at the desired level.
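For example (the P-values here are made-up stand-ins for whatever your analysis produced):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
reject, pvals_adjusted, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
print(reject)  # which hypotheses can be rejected while controlling the FDR at 0.05
```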
The Benjamini-Hochberg procedure assumes that the hypotheses are independent. In some cases, this is clearly untrue; in others, it’s not as obvious. Nonetheless, it appears that empirically the BH procedure is relatively robust to violations of this assumption. An alternative which does not make this assumption is Benjamini-Yekutieli (`method='fdr_by'` above), but the power of this procedure can be much lower. If you’re not sure which to use, it might be worth running a simulation to compare them.
A close relative of the FDR is the False Coverage Rate, its confidence interval equivalent. Once we have performed the BH procedure to control the FDR, we can then compute adjusted confidence intervals for the parameters for which we rejected the null hypothesis.
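The adjustment itself is small; following my reading of Benjamini and Yekutieli’s FCR procedure, if we rejected $R$ of $m$ hypotheses, we build each selected interval at a widened confidence level (the numbers below are hypothetical):

```python
# Suppose the BH step rejected R of m hypotheses at level alpha
m, R, alpha = 1000, 120, 0.05
fcr_level = 1 - R * alpha / m  # build each selected CI at this level instead of 1 - alpha
print(fcr_level)               # 0.994: slightly wider intervals for the selected parameters
```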
Let’s look at a concrete example. We’ll look at a large number of simulations in which we’re attempting to figure out which regression coefficients are non-zero in a linear model. In each simulation there will be 1000 covariates and 2000 samples. Of those, 100 covariates will be non-zero; the rest will be red herrings. So in each simulation we’ll run 1000 simultaneous T-tests using statsmodels.
We’ll run 1000 simulations like the following:
- Generate a dataset from a linear model with 1000 covariates, of which 100 are non-zero.
- Run an OLS regression and calculate P-values for each covariate using the fitted model’s `pvalues` attribute.
- Declare some covariates significant and others non-significant, using the naive method (any $p < .05$ is rejected) and the Benjamini-Hochberg method.
- Calculate $\frac{a}{a + b}$, the False Discovery Rate.
This will give us 1000 simulated values of the FDR under each procedure, and show us whether our method works.
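A sketch of one way to implement this simulation (the effect size and noise scale are my choices, and I use fewer rounds than the post for speed):

```python
import numpy as np
from statsmodels.api import OLS
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_samples, n_covariates, n_nonzero, alpha = 2000, 1000, 100, 0.05

def one_simulation():
    X = rng.normal(size=(n_samples, n_covariates))
    beta = np.zeros(n_covariates)
    beta[:n_nonzero] = 0.1                            # the 100 genuine effects
    y = X @ beta + rng.normal(size=n_samples)
    pvals = np.asarray(OLS(y, X).fit().pvalues)       # one T-test per covariate
    fdr = {}
    for name, rejected in [("naive", pvals < alpha),
                           ("bh", multipletests(pvals, alpha=alpha, method="fdr_bh")[0])]:
        false_positives = rejected[n_nonzero:].sum()  # rejections among the red herrings
        fdr[name] = false_positives / max(rejected.sum(), 1)
    return fdr

results = [one_simulation() for _ in range(100)]  # the post used 1000 rounds
print("naive FDR:", np.mean([r["naive"] for r in results]))
print("BH FDR:   ", np.mean([r["bh"] for r in results]))
```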
We see that the naive method had an FDR of around 0.35 for this simulation - about a third of reported findings will be spurious. However, the BH procedure worked as intended, keeping the FDR around 0.05.
We mentioned that FWER-controlling methods are more conservative than FDR-controlling ones. Let’s take a look at a single simulation to explore the difference between Benjamini-Hochberg and Bonferroni-Holm in action.
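A plot along the lines of the original figure can be produced like this (a sketch: the simulated mixture of P-values stands in for one simulation round, and the markers and colors follow the description below):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
m, alpha, n_shown = 1000, 0.05, 100
# A stand-in mixture: 100 "genuine" small P-values plus 900 uniform null P-values
pvals = np.concatenate([rng.beta(0.05, 10, size=100), rng.uniform(size=900)])
genuine = np.arange(m) < 100            # flags the genuine discoveries
order = np.argsort(pvals)[:n_shown]     # the first 100 sorted P-values
idx = np.arange(1, n_shown + 1)

for flag, marker, label in [(True, "x", "genuine discovery"), (False, "o", "null")]:
    mask = genuine[order] == flag
    plt.scatter(idx[mask], pvals[order][mask], marker=marker, s=15, label=label)
plt.plot(idx, alpha * idx / m, color="green", label=r"Benjamini-Hochberg: $\alpha i / m$")
plt.plot(idx, alpha / (m - idx + 1), color="gold", label=r"Bonferroni-Holm: $\alpha / (m - i + 1)$")
plt.xlabel("index $i$")
plt.ylabel("P-value")
plt.legend()
plt.show()
```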
We’re only looking at the first 100 hypotheses here. The “genuine discoveries” are shown by an ×. We see that Bonferroni-Holm is much more strict than Benjamini-Hochberg. Both methods will correctly reject the “obvious” P-values very close to zero, but Bonferroni-Holm misses out on a number of discoveries because it is so strict. The hypotheses missed by Bonferroni-Holm but caught by Benjamini-Hochberg are the points below the green line, but above the yellow one.
We’ve repeatedly noted that FDR-controlling methods are an alternative to FWER-controlling ones. Are there any other choices we could make here?
As I noted in my last post, a third option is to abandon the Type I error paradigm altogether. This school of thought is (in my view) convincingly argued by Andrew Gelman. His perspective, very roughly, is that null hypotheses are usually not true anyway, and that Type I error control is not worth the price we pay for it in power. Rather, we should focus on correctly estimating the magnitudes of the effects we care about; if we adopt a Bayesian perspective, the problems go away and we can use all the available information to build a hierarchical model and get a more realistic view of things.
Posted on August 9th, 2020 by Louis Cialdella. Feel free to share!