Errors and residuals in statistics
From Wikipedia, the free encyclopedia
In statistics, the concepts of error and residual are easily confused with each other.
"Error" is a misnomer: an error is the amount by which an observation differs from its expected value, the latter being based on the whole population from which the statistical unit was chosen randomly. The expected value, being the average of the entire population, is typically unobservable. If the average height of 21-year-old men is 5 feet 9 inches, and one randomly chosen man is 5 feet 11 inches tall, then the "error" is 2 inches; if the randomly chosen man is 5 feet 7 inches tall, then the "error" is −2 inches. The nomenclature arose from random measurement errors in astronomy: it is as if measuring the man's height were an attempt to measure the population average, so that any difference between his height and the average is a measurement error.
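Once both heights are expressed in the same unit, the height example is plain subtraction of the expected value from the observation. A minimal Python sketch follows; the helper names `to_inches` and `error` are chosen here for illustration only.

```python
def to_inches(feet: int, inches: int) -> int:
    """Convert a height given in feet and inches to inches."""
    return 12 * feet + inches


def error(observation: float, expected: float) -> float:
    """Error = observed value minus the population expected value."""
    return observation - expected


# Population average height assumed in the text: 5 ft 9 in = 69 in.
population_mean = to_inches(5, 9)

tall_error = error(to_inches(5, 11), population_mean)  # 71 - 69
short_error = error(to_inches(5, 7), population_mean)  # 67 - 69
print(tall_error, short_error)  # prints: 2 -2
```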
A residual, on the other hand, is an observable estimate of the unobservable error. The simplest case involves a random sample of n men whose heights are measured. The sample average is used as an estimate of the population average. Then we have:
- The difference between each man's height and the unobservable population average is an error, and
- The difference between each man's height and the observable sample average is a residual.
- Residuals are observable; errors are not.
Note that the residuals necessarily sum to zero, so they cannot be independent of one another. The errors need not sum to zero; they are independent random variables if the individuals are chosen from the population independently.
- Errors are often independent of each other; residuals usually are not.
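The sum-to-zero property follows from Σ(X_i − X̄) = ΣX_i − nX̄ = 0, so any one residual is fixed once the others are known. The Python sketch below checks this on a small sample; both the sample values and the population mean of 69 inches are hypothetical, assumed only so that the normally unobservable errors can be computed at all.

```python
import math

heights = [70.0, 68.5, 72.0, 66.0, 71.5]  # hypothetical sample, inches
population_mean = 69.0  # assumed known here purely for illustration

sample_mean = sum(heights) / len(heights)

# Errors: deviations from the (unobservable) population mean.
errors = [h - population_mean for h in heights]
# Residuals: deviations from the observable sample mean.
residuals = [h - sample_mean for h in heights]

print(sum(errors))     # need not be zero
print(sum(residuals))  # zero up to floating-point rounding

# Dependence: the last residual is determined by the others.
print(math.isclose(residuals[-1], -sum(residuals[:-1]), abs_tol=1e-9))
```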
Example
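If we assume a normally distributed population with mean μ and standard deviation σ, and choose individuals independently, then

$$X_1, \dots, X_n \sim N(\mu, \sigma^2)$$

and the sample mean

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$$

is a random variable distributed as

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$

The errors are then

$$\varepsilon_i = X_i - \mu,$$

whereas the residuals are

$$\hat{\varepsilon}_i = X_i - \bar{X}.$$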
(As is customary, the "hat" over the letter ε indicates an observable estimate of an unobservable quantity called ε.)
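These distributional claims can be checked by simulation. The Python sketch below draws repeated samples from N(μ, σ²) and confirms that the sample means have mean close to μ and variance close to σ²/n. It also computes errors and residuals for one sample. The parameter values (μ = 69, σ = 3, n = 10) and the helper names are assumptions made for illustration.

```python
import random
import statistics


def errors_and_residuals(sample, mu):
    """Errors use the (normally unknown) population mean mu; residuals use the sample mean."""
    xbar = statistics.fmean(sample)
    errors = [x - mu for x in sample]
    residuals = [x - xbar for x in sample]
    return errors, residuals


def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` independent samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


mu, sigma, n = 69.0, 3.0, 10  # assumed parameters
means = simulate_sample_means(mu, sigma, n, trials=20_000)
print(statistics.fmean(means))     # should be close to mu = 69.0
print(statistics.variance(means))  # should be close to sigma**2 / n = 0.9

rng = random.Random(42)
one_sample = [rng.gauss(mu, sigma) for _ in range(n)]
errors, residuals = errors_and_residuals(one_sample, mu)
print(sum(residuals))  # approximately 0, up to rounding
```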
The sum of the squares of the errors, divided by σ², has a chi-square distribution with n degrees of freedom:

$$\frac{1}{\sigma^2}\sum_{i=1}^{n} \varepsilon_i^2 \sim \chi^2_n.$$
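This quantity, however, is not observable. The sum of the squares of the residuals, on the other hand, is observable. Its quotient by σ² has a chi-square distribution with only n − 1 degrees of freedom, one degree of freedom being used up by estimating μ with the sample mean:

$$\frac{1}{\sigma^2}\sum_{i=1}^{n} \hat{\varepsilon}_i^2 \sim \chi^2_{n-1}.$$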
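A chi-square distribution with k degrees of freedom has mean k, so a Monte Carlo average gives a quick check of the n versus n − 1 contrast. The Python sketch below uses assumed values μ = 0, σ = 2, n = 5; the averages should come out near 5 for the errors and near 4 for the residuals.

```python
import random


def scaled_sums_of_squares(mu, sigma, n, rng):
    """Return (sum of squared errors, sum of squared residuals), each divided by sigma^2."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    errors_ss = sum((x - mu) ** 2 for x in sample) / sigma**2
    residuals_ss = sum((x - xbar) ** 2 for x in sample) / sigma**2
    return errors_ss, residuals_ss


rng = random.Random(1)
mu, sigma, n, trials = 0.0, 2.0, 5, 50_000  # assumed parameters
pairs = [scaled_sums_of_squares(mu, sigma, n, rng) for _ in range(trials)]

mean_errors = sum(p[0] for p in pairs) / trials     # expect about n = 5
mean_residuals = sum(p[1] for p in pairs) / trials  # expect about n - 1 = 4
print(mean_errors, mean_residuals)
```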
Remarkably, this random variable can be shown to be independent of the sample mean. That fact, together with the normal and chi-square distributions given above, forms the basis of confidence-interval calculations based on Student's t-distribution. Because σ cancels between the numerator and the denominator in those calculations, the seemingly absurd assumption that σ² is known does no harm.
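The cancellation can be made explicit. Writing S² for the sum of squared residuals divided by n − 1, the statistic (X̄ − μ)/(S/√n) equals Z / √(V/(n − 1)), where Z = (X̄ − μ)/(σ/√n) is standard normal and V is the scaled sum of squared residuals from the chi-square formula above. The factor σ appears in both Z and V and drops out. The Python sketch below checks this numerically for several assumed values of σ, then forms a 95% t interval. The sample, the value μ = 69, the helper names, and the hard-coded critical value 2.776 (the two-sided 95% point of Student's t with 4 degrees of freedom, taken from standard tables rather than computed) are all illustrative assumptions.

```python
import math
import statistics


def t_statistic(sample, mu):
    """Observable form: uses the sample standard deviation s (n - 1 denominator)."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))


def t_statistic_via_sigma(sample, mu, sigma):
    """Same statistic written as Z / sqrt(V / (n - 1)); sigma appears in both parts."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    z = (xbar - mu) / (sigma / math.sqrt(n))             # standard normal part
    v = sum((x - xbar) ** 2 for x in sample) / sigma**2  # chi-square part, n - 1 df
    return z / math.sqrt(v / (n - 1))


def t_confidence_interval(sample, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); needs no knowledge of sigma."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return xbar - half_width, xbar + half_width


heights = [70.0, 68.5, 72.0, 66.0, 71.5]  # hypothetical sample, inches
mu = 69.0                                 # hypothetical population mean
T_CRIT_95_DF4 = 2.776  # two-sided 95% Student's t critical value, 4 df (from tables)

# Whatever sigma is assumed, the "sigma" form gives the same value.
print([t_statistic_via_sigma(heights, mu, s) for s in (1.0, 3.0, 10.0)])
print(t_statistic(heights, mu))
print(t_confidence_interval(heights, T_CRIT_95_DF4))
```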