Yates' correction for continuity
From Wikipedia, the free encyclopedia
Yates' correction for continuity, or Yates' chi-square test, is used in certain situations when testing for independence in a contingency table. It is needed because the chi-square test assumes that the discrete probability distribution of the observed frequencies can be approximated by the chi-squared distribution, which is continuous.
To reduce the error introduced by this approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table. This reduces the chi-square value obtained and thus increases its p-value, which prevents overestimation of statistical significance for small samples. This formula is chiefly used when at least one cell of the table has an expected frequency less than 5.
$$\chi^2_{\text{Yates}} = \sum_{i=1}^{N} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$
where:
- Oi = an observed frequency
- Ei = an expected (theoretical) frequency, asserted by the null hypothesis
- N = number of distinct events
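As an illustration, the sum over cells can be sketched in Python. The function names and the example counts are hypothetical and chosen only for easy arithmetic; the expected frequencies are assumed to have been computed beforehand under the null hypothesis.

```python
def pearson_chi_square(observed, expected):
    """Uncorrected Pearson statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def yates_chi_square(observed, expected):
    """Yates-corrected statistic: sum of (|O_i - E_i| - 0.5)^2 / E_i."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2 x 2 table, flattened row by row (illustrative counts only).
observed = [3, 7, 7, 3]
# Every row and column total is 10 and the grand total is 20,
# so each expected count is 10 * 10 / 20 = 5.
expected = [5.0, 5.0, 5.0, 5.0]

print(pearson_chi_square(observed, expected))
print(yates_chi_square(observed, expected))
```

For these counts the uncorrected statistic is about 3.2 and the corrected one about 1.8, which shows how the correction lowers the value.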
As a shortcut, for a 2 × 2 table with the following entries:
|   | S | F |   |
|---|---|---|---|
| A | a | b | N_A |
| B | c | d | N_B |
|   | N_S | N_F | N |
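we can write, with N here denoting the total number of observations:
$$\chi^2_{\text{Yates}} = \frac{N\,(|ad - bc| - N/2)^2}{N_S N_F N_A N_B}$$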
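The 2 × 2 shortcut can be sketched in Python as follows. The function names and the counts are hypothetical, all row and column totals are assumed to be non-zero, and the p-value helper relies on the identity that, for one degree of freedom, the upper tail of the chi-squared distribution equals erfc(sqrt(x/2)).

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Shortcut: N * (|ad - bc| - N/2)^2 / (N_S * N_F * N_A * N_B)."""
    n_a, n_b = a + b, c + d   # row totals
    n_s, n_f = a + c, b + d   # column totals
    n = n_a + n_b             # total number of observations
    # Assumes every margin is non-zero; otherwise the division fails.
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (n_s * n_f * n_a * n_b)


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


stat = yates_chi_square_2x2(3, 7, 7, 3)  # hypothetical counts
print(stat, p_value_1df(stat))
```

With these counts the shortcut gives about 1.8, the same value obtained by summing the corrected terms cell by cell with an expected frequency of 5 in every cell.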
Other sources say that this correction should be used when the expected frequency is less than 10[citation needed].
Yet other sources say that Yates' correction should always be applied[citation needed]. However, in situations with large sample sizes, using the correction will have little effect on the value of the test statistic, and hence on the p-value obtained.
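The shrinking effect of the correction at larger sample sizes can be illustrated with a short Python sketch. The table proportions, the scaling factors and the function name are hypothetical and serve only to show the trend.

```python
def chi_square_2x2(a, b, c, d, correction):
    """2 x 2 chi-square statistic, with or without Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if correction:
        diff -= n / 2  # Yates' continuity correction
    # Assumes every row and column total is non-zero.
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical table with fixed proportions, scaled up by a factor k.
for k in (1, 10, 100):
    cells = (3 * k, 7 * k, 7 * k, 3 * k)
    plain = chi_square_2x2(*cells, correction=False)
    yates = chi_square_2x2(*cells, correction=True)
    print(k, plain, yates, yates / plain)
```

For this table the ratio of the corrected to the uncorrected statistic rises from about 0.56 at k = 1 to about 0.95 at k = 10 and about 0.995 at k = 100.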
References
Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Journal of the Royal Statistical Society (Supplement) 1: 217-235.