Calgary Corpus
From Wikipedia, the free encyclopedia
The Calgary Corpus is a collection of text and binary data files commonly used for comparing data compression algorithms. It was created by Ian Witten and Tim Bell in the 1980s and was widely used in the 1990s. In 1997 it was superseded by the Canterbury Corpus, but the Calgary Corpus remains available and is still useful for its original purpose of comparing compression algorithms.
Example compression ratios
Compression Algorithm | Compressed Size (bytes) | Compression Ratio |
---|---|---|
None | 3251493 | 1 |
lzop | 1592692 | 2.0415 |
AIT-3 | 1558353 | 2.0864 |
LTO-2 | 1558353 | 2.0864 |
ncompress -b13 | 1510478 | 2.1526 |
DLT | 1479577 | 2.1975 |
ncompress | 1367363 | 2.3779 |
gzip -6 | 1068037 | 3.0443 |
bzip2 -9 | 890079 | 3.6530 |
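
The ratios in the table are consistent with dividing the uncompressed size (the "None" row) by each compressed size. The following Python sketch reproduces that arithmetic for a few rows. It also shows one way to take a comparable measurement in memory with the standard-library `zlib` and `bz2` modules. The names `compression_ratio` and `measure` and the sample input are illustrative assumptions, not part of the corpus or of any tool listed above. The `zlib` and `bz2` calls are not expected to match the sizes produced by the `gzip` and `bzip2` command-line tools exactly.

```python
import bz2
import zlib

# Uncompressed size of the corpus in bytes, taken from the "None" row of the table.
UNCOMPRESSED_SIZE = 3251493

# Compressed sizes in bytes, copied from a few rows of the table.
COMPRESSED_SIZES = {
    "lzop": 1592692,
    "ncompress": 1367363,
    "gzip -6": 1068037,
    "bzip2 -9": 890079,
}


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return original_size / compressed_size (higher means better compression)."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


def measure(data: bytes) -> dict[str, float]:
    """Compress data in memory with two standard-library codecs and report ratios."""
    return {
        "zlib level 6": compression_ratio(len(data), len(zlib.compress(data, 6))),
        "bz2 level 9": compression_ratio(len(data), len(bz2.compress(data, 9))),
    }


if __name__ == "__main__":
    # Recompute the ratios for the rows copied from the table.
    for name, size in COMPRESSED_SIZES.items():
        print(f"{name:12s} {compression_ratio(UNCOMPRESSED_SIZE, size):.4f}")

    # Hypothetical sample input; a real measurement would read the corpus files.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 1000
    print(measure(sample))
```

Because the script rounds to four decimal places, a recomputed ratio may differ from the table in the last digit.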