False discovery rate
False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. It controls the expected proportion of incorrectly rejected null hypotheses (type I errors) among the rejected hypotheses [1]. FDR control is less conservative than familywise error rate (FWER) control[2] and has greater power, at the cost of an increased likelihood of type I errors.
The q-value is defined as the FDR analogue of the p-value: the q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to estimate q-values directly rather than fixing a level at which to control the FDR.
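As a minimal illustration of this idea, the sketch below computes q-values as the step-up adjusted p-values of the Benjamini–Hochberg procedure described later in this article. The function name, the use of NumPy, and the conservative assumption that all null hypotheses are true (Storey's approach instead estimates that proportion from the data) are choices made for this example, not part of the original text.

```python
import numpy as np

def q_values(p_values):
    """Sketch: q-values as Benjamini-Hochberg adjusted p-values.

    The q-value of a test is the smallest FDR level at which that test
    would be declared significant.  This conservative version treats the
    proportion of true nulls as 1; Storey's method estimates it instead.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)   # m * P_(i) / i
    # Enforce monotonicity: q_(i) = min over j >= i of m * P_(j) / j.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0, 1)
    return q
```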
Classification of m hypothesis tests
| | # declared non-significant | # declared significant | Total |
|---|---|---|---|
| # true null hypotheses | U | V | m0 |
| # non-true null hypotheses | T | S | m − m0 |
| Total | m − R | R | m |
- m0 is the number of true null hypotheses
- m − m0 is the number of false null hypotheses
- U is the number of true negatives
- V is the number of false positives
- T is the number of false negatives
- S is the number of true positives
- H1, ..., Hm are the null hypotheses being tested
- In m hypothesis tests of which m0 are true null hypotheses, R is an observable random variable, while S, T, U, and V are unobservable random variables.
The false discovery rate is given by $\mathrm{FDR} = E\!\left[\frac{V}{V+S}\right] = E\!\left[\frac{V}{R}\right]$ (with $V/R$ taken to be 0 when $R = 0$), and one wants to keep this value below a threshold α.
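As a purely illustrative numerical example: if m = 100 hypotheses are tested, R = 10 of them are rejected, and V = 2 of those rejections are in fact true nulls, the realized false discovery proportion is V/R = 2/10 = 0.2; the FDR is the expectation of this proportion over repeated experiments.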
Controlling procedures
Independent tests
The Simes procedure ensures that the expected value of the false discovery rate is less than a given α (Benjamini and Hochberg 1995). This procedure is valid only when the m tests are independent. Let

$H_1, \ldots, H_m$

be the null hypotheses and

$P_1, \ldots, P_m$

their corresponding p-values. Order these values in increasing order and denote them by

$P_{(1)}, \ldots, P_{(m)}$.

For a given α, find the largest k such that

$P_{(k)} \le \frac{k}{m}\,\alpha.$

Then reject (i.e. declare positive) all $H_{(i)}$ for $i = 1, \ldots, k$. Note that the mean α for these m tests is

$\frac{\alpha(m+1)}{2m},$

which could be used as a rough FDR (RFDR), or "α adjusted for m independent tests."
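The step-up rule above can be written as a short procedure. The following Python sketch (the function name and the use of NumPy are choices made here, not part of the original description) finds the largest k such that $P_{(k)} \le \frac{k}{m}\alpha$ and rejects the corresponding hypotheses, assuming independent tests.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Sketch of the Benjamini-Hochberg (Simes) step-up procedure.

    Returns a boolean array marking which hypotheses are rejected at
    FDR level alpha.  Assumes the p-values come from independent tests.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                  # indices of p-values in increasing order
    sorted_p = p[order]
    # Compare each sorted p-value P_(k) to its threshold (k/m) * alpha.
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        k = below[-1]                      # largest k with P_(k) <= (k/m) * alpha
        reject[order[: k + 1]] = True      # reject H_(1), ..., H_(k)
    return reject
```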
Dependent tests
The Benjamini and Yekutieli procedure controls the false discovery rate under dependence assumptions. This refinement modifies the threshold and finds the largest k such that

$P_{(k)} \le \frac{k}{m \cdot c(m)}\,\alpha.$

- If the tests are independent: c(m) = 1 (same as above)
- If the tests are positively correlated: c(m) = 1
- If the tests are negatively correlated: $c(m) = \sum_{i=1}^{m} \frac{1}{i}$

In the case of negative correlation, c(m) can be approximated using the Euler–Mascheroni constant γ ≈ 0.57721: $\sum_{i=1}^{m} \frac{1}{i} \approx \ln(m) + \gamma$.

Using the RFDR above, an approximate FDR (AFDR) is the mean α for m dependent tests: AFDR = RFDR / (ln(m) + 0.57721...).
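A sketch of the modified step-up rule follows, in the same hedged spirit as the earlier example (function name and NumPy usage are assumptions made here). It uses the correction factor for arbitrary (including negative) dependence, $c(m) = \sum_{i=1}^{m} 1/i$; with c(m) = 1 it reduces to the independent-test procedure above.

```python
import numpy as np

def benjamini_yekutieli(p_values, alpha=0.05):
    """Sketch of the Benjamini-Yekutieli step-up procedure.

    Uses the correction factor c(m) = sum_{i=1}^{m} 1/i, which keeps
    FDR control valid under dependence between the tests.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))   # harmonic number, roughly ln(m) + 0.57721
    order = np.argsort(p)
    sorted_p = p[order]
    # Compare each sorted p-value P_(k) to its threshold (k / (m * c(m))) * alpha.
    thresholds = (np.arange(1, m + 1) / (m * c_m)) * alpha
    below = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        reject[order[: below[-1] + 1]] = True
    return reject
```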
References
- Benjamini, Yoav; Hochberg, Yosef (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B (Methodological) 57 (1): 289–300. MR1325392.
- Benjamini, Yoav; Yekutieli, Daniel (2001). "The control of the false discovery rate in multiple testing under dependency". Annals of Statistics 29 (4): 1165–1188. DOI:10.1214/aos/1013699998. MR1869245.
- Storey, John D. (2002). "A direct approach to false discovery rates". Journal of the Royal Statistical Society, Series B (Methodological) 64 (3): 479–498. DOI:10.1111/1467-9868.00346. MR1924302.
- Storey, John D. (2003). "The positive false discovery rate: A Bayesian interpretation and the q-value". Annals of Statistics 31 (6): 2013–2035. DOI:10.1214/aos/1074290335. MR2036398.