From Wikipedia, the free encyclopedia
In econometrics and statistics, a top-coded dataset is one in which values above a given upper bound are censored: they are recorded only as being at or above that bound, so their exact values are not known. This is often done to preserve the anonymity of survey participants. For example, if a survey included a person with wealth of $51 billion, the record would not be anonymous, because people would know it is Bill Gates.
Example: Top Coding of Wealth
id | age | income | coding
1 | 26 | 24778 | exact value
2 | 32 | 26750 | exact value
3 | 45 | 26780 | exact value
4 | 32 | 30000+ | top coded
5 | 45 | 30000+ | top coded
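As a minimal sketch of how the income column above could be produced, the following Python function top codes a single value at a cap. The function name `top_code`, the choice to treat a value equal to the cap as top coded, and the true incomes of persons 4 and 5 are assumptions made for illustration; the table does not reveal them.

```python
def top_code(value, cap):
    """Return (reported_value, is_top_coded) for one observation.

    Assumption: values at or above the cap are top coded.
    """
    if value >= cap:
        return f"{cap}+", True
    return str(value), False


# The first three incomes come from the table; the last two are
# hypothetical, since the table only shows them as "30000+".
incomes = [24778, 26750, 26780, 41200, 58000]

for person_id, income in enumerate(incomes, start=1):
    reported, is_top_coded = top_code(income, 30000)
    label = "top coded" if is_top_coded else "exact value"
    print(f"{person_id} | {reported} | {label}")
```

Once the values are top coded, the two hypothetical incomes are indistinguishable in the published data, which is the source of the estimation problems that top coding creates.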
Implications for OLS
- If the lower bound of the top coded group is used as a regressor value (30000 in the example above), OLS is biased and inconsistent.
- The top-coded group can be omitted from the regression entirely. Provided there are no systematic differences between the omitted group and the included groups, OLS is consistent and unbiased.
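- The Tobit procedure (Tobin 1958), which treats top-coded values of the dependent variable as right-censored, accounts for top coding directly. Its maximum likelihood estimates are consistent, provided the model's assumptions of normally distributed, homoskedastic errors hold.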
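The first two points can be checked with a small simulation. The sketch below is illustrative only: the data-generating process (normally distributed income, a linear outcome with slope 0.001, unit-variance noise), the cap of 30000, and the helper names `ols_slope` and `simulate` are all assumptions rather than part of any particular survey or library. It compares the OLS slope when top-coded incomes are replaced by the cap with the slope when those observations are dropped.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, cap=30000.0, true_slope=0.001, seed=0):
    """Compare two ways of handling a top-coded regressor (assumed setup)."""
    rng = random.Random(seed)
    income = [rng.gauss(25000.0, 5000.0) for _ in range(n)]
    y = [2.0 + true_slope * x + rng.gauss(0.0, 1.0) for x in income]

    # (a) Use the cap as the regressor value for every top-coded income.
    capped = [min(x, cap) for x in income]
    slope_capped = ols_slope(capped, y)

    # (b) Omit the top-coded observations from the regression entirely.
    kept = [(x, yi) for x, yi in zip(income, y) if x < cap]
    slope_dropped = ols_slope([x for x, _ in kept], [yi for _, yi in kept])

    return {"true": true_slope, "capped": slope_capped, "dropped": slope_dropped}


result = simulate()
for name, slope in result.items():
    print(f"{name:>8}: {slope:.6f}")
```

In this setup the regression that uses the cap overstates the slope, while the regression on the retained observations stays close to the true value. The second result relies on observations being dropped on the basis of the regressor alone, which is the "no systematic differences" condition stated above.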
References
- Tobin, James (1958). "Estimation of relationships for limited dependent variables". Econometrica 26 (1), 24–36.