循环冗余校验
维基百科,自由的百科全书
循环冗余校验(CRC)是根据如网络数据包或者计算机文件块这样的数据生成少数固定数目数据位的一种散列函数。校验和用来检测数据传输或者存储后可能出现的错误。CRC 在传输或者储存之前计算出来并且附加到数据后面,然后接收方进行检验确定数据是否发生变化。由于易于用二进制的电脑硬件实现、容易进行数学分析并且尤其善于检测传输通道噪声引起的错误,所以 CRC 得到了广泛使用。
目录 |
[编辑] 简介
CRC“校验和”是两个比特数据流采用二进制除法(没有进位,使用XOR异或来代替减法)相除所得到的余数。其中被除数是需要计算校验和的信息数据流的二进制表示;除数是一个长度为n + 1的预定义(短)的二进制数,通常用多项式的系数来表示。在做除法之前,要在信息数据之后先加上n个0.
CRCa 是基于有限域GF(2)(关于2同余)的多项式环。简单的来说,就是所有系数都为0或1(又叫做二进制)的多项式系数的集合,并且集合对于所有的代数操作都是封闭的。例如:
2会变成0,因为对系数的加法都会模2. 乘法也是类似的:
我们同样可以对多项式作除法并且得到商和余数。例如, 如果我们用x3 + x2 + x除以x + 1。我们会得到:
也就是说,
- (x3 + x2 + x) = (x2 + 1)(x + 1) − 1
这里除法得到了商x2 + 1和余数-1,因为是奇数所以最后一位是1。
字符串中的每一位其实就对应了这样类型的多项式的系数。为了得到CRC, 我们首先将其乘以xn,这里n是一个固定多项式的阶数, 然后再将其除以这个固定的多项式,余数的系数就是CRC。
在上面的等式中,x2 + x + 1表示了本来的信息位是111
, x + 1是所谓的钥匙, 而余数1(也就是x0)就是CRC. key的最高次為1, 所以我們將原來的信息乘上x1來得到x3 + x2 + x,也可視為原來的信息位補1個零成為1110
。
一般来说,其形式为:
这里 M(x) 是原始的信息多项式。K(x)是n阶的“钥匙”多项式。表示了将原始信息后面加上n个0。R(x)是余数多项式,既是CRC“校验和”。在通讯中,发送者在原始的信息数据M后加上n位的R(替换本来附加的0)再发送。接收者收到M和R后,检查是否能被K(x)整除。如果是,那么接收者认为该信息是正确的。值得注意的是就是发送者所想要发送的数据。这个串又叫做codeword.
CRCs 经常被叫做“校验和”, 但是这样的说法严格来说并不是准确的,因为技术上来说,校验“和”是通过加法来计算的,而不是CRC这里的除法。
“错误纠正编码”常常和CRCs紧密相关,其语序纠正在传输过程中所产生的错误。这些编码方式常常和数学原理紧密想关。
[编辑] 实现
There are simple, efficient algorithms for computing the CRC of a block of data, such as those shown below. (Warning, this algorithm is wrong, go look inside the discussion to discuss about it)
One common bitwise algorithm is based on a shift register which has a size in bits equal to the degree of the chosen polynomial. The main portion of the algorithm can be expressed in pseudocode as follows:
function crc(bit array bitString[1..len], int polynomial) { shiftRegister := initial value // commonly all 0 bits or all 1 bits for i from 1 to len { if most significant bit of shiftRegister = 1 { // Shift left, place data bit as LSB, then divide shiftRegister := shiftRegister left shift 1 shiftRegister := shiftRegister or bitString[i] shiftRegister := shiftRegister xor polynomial } else { // shiftRegister is not divisible by polynomial yet. // Just shift left and bring current data bit onto LSB of shiftRegister shiftRegister := shiftRegister left shift 1 shiftRegister := shiftRegister or bitString[i] } } return shiftRegister }
Another common algorithm uses a lookup table indexed by multiple most-significant bits of the shiftRegister
to process multiple bits at once. A 256-entry lookup table is a particularly common choice.
Another implementation uses a shift register, but instead of changing multiple bits in the shiftRegister
based on the xor of one bit of the shiftRegister
and one bit of the bitString
, it is possible to xor (compute the parity of) all the bits of the shiftRegister
selected by the polynomial
and the bitString
and add that single bit to the shiftRegister
. With suitable adjustments to the polynomial
, this also produces the same remainder. This algorithm is usually inefficient in software, but used in some hardware implementations, and is often used when describing the close relative to a CRC, the linear feedback shift register.
The specific CRC produced is defined by the polynomial used. To produce an n-bit CRC requires a degree-n polynomial, of the form . This is naturally expressed as an (n + 1)-bit string, but the leading xn term is normally implicit, leaving an n-bit string Thus, depending on the bit-order convention used, the CRC-16-IBM polynomial x16 + x15 + x2 + 1, will be represented as the hexadecimal number 0x8005 or as 0xA001.
One of the most commonly encountered CRC algorithms is known as CRC-32, used by (among others) Ethernet, FDDI, ZIP and other archive formats, and PNG image format. Its polynomial can be written 0x04C11DB7, or 0xEDB88320 by reversing the bit string derived from the polynomial as explained above. The W3C webpage on PNG includes an appendix with a short and simple table-driven implementation in C of CRC-32. Despite the popular acclaim, it is important to notice that the polynomial in question was arbitrarily chosen and generally exhibits very poor error detection properties (in terms of Hamming distance per given block size) compared to polynomials selected by algorithmic means[1] [2].
[编辑] 变体
CRC 有几种不同的变体
shiftRegister
可以逆向使用,这样就需要检测最低位的值,每次向右移动一位。这就要求polynomial
生成逆向的数据位结果。实际上这是最常用的一个变体。- 可以先将数据最高位读到移位寄存器,也可以先读最低位。在通讯协议中,为了保留 CRC 的突发错误检测特性,通常按照物理层发送数据位的方式计算 CRC。
- 为了检查 CRC,需要在全部的码字上进行 CRC 计算,而不是仅仅计算消息的 CRC 并把它与 CRC 比较。如果结果是 0,那么就通过这项检查。这是因为码字 可以被 K(x) 整除。
- 移位寄存器可以初始化成 1 而不是 0。同样,在用算法处理之前,消息的最初 n 个数据位要取反。这是因为未经修改的 CRC 无法区分只有起始 0 的个数不同的两条消息。而经过这样的取反过程,CRC 就可以正确地分辨这些消息了。
- CRC 在附加到消息数据流的时候可以进行取反。这样,CRC 的检查可以用直接的方法计算消息的 CRC、取反、然后与消息数据流中的 CRC 比较这个过程来完成,也可以通过计算全部的消息来完成。在后一种方法中,正确消息的结果不再是 0,而是 除以 K(x) 得到的结果。这个结果叫作核验多项式 C(x),它的十六进制表示也叫作幻数。
按照惯例,使用 CRC-32 多项式以及 CRC-16-CCITT 多项式时通常都要取反。CRC-32 的核验多项式是
C(x) = x31 + x30 + x26 + x25 + x24 + x18 + x15 + x14 + x12 + x11 + x10 + x8 + x6 + x5 + x4 + x3 + x + 1。
[编辑] Reversed representations and reciprocal polynomials
[编辑] Polynomial representations
All the well-known CRC polynomials of degree n have two common hexadecimal representations:
- A hexadecimal number with n bits, the least significant bit of which is always 1, the most significant bit representing the xn − 1 coefficient and the least significant bit representing the x0 coefficient.
- A hexadecimal number with n bits, the most significant bit of which is always 1, the most significant bit representing the x0 coefficient and the least significant bit representing the xn − 1 coefficient.
The first of these is the normal representation, and gives the correct results when used in a CRC algorithm using a left shift. The second is called the reversed representation, because it is the bitwise reverse of the normal representation. When used in a CRC algorithm using a right shift, the reverse representation gives the bitwise reverse of the results that using the left shift algorithm with the normal representation does.
Using the normal representation in the right shift algorithm or the reversed one in the left shift algorithm gives incorrect results.
In both these representations the xn coefficient is omitted and understood to be 1.
There is a third representation in use, from a paper by P. Koopman and T. Chakravarty [2][3]
- A hexadecimal number with n bits, the most significant bit of which is always 1, the most significant bit representing the xn coefficient and the least significant bit representing the x1 coefficient. The x0 coefficient is omitted and understood to be 1.
This representation (call it Koopman) has the advantage that (like the normal representation) the coefficients are retained in their usual mathematical order, and (like the reversed representation) the order of the polynomial can be determined directly from the representation. Unfortunately, it will not produce correct results in either the left- or right- shift versions of the CRC algorithm.
[编辑] Reciprocal polynomials
A reciprocal polynomial is created by assigning the xn through x0 coefficients of one polynomial to the x0 through xn coefficients of a new polynomial. Example: The reverse of x16 + x15 + x2 + 1 is x16 + x14 + x1 + 1. The most interesting property of reciprocal polynomials, when used in CRCs, is that they have the exact same error-detecting strength as the polynomials they are reciprocals of. The reciprocal of a polynomial generates the same codewords — that is, the bit strings consisting of a valid CRC appended to a message — as the original polynomial, only bit reversed. But the reciprocal polynomial is not the same as the original polynomial, and the CRCs generated using it are not the same (even modulo bit reversal) as those generated from the original polynomial.
Another interesting property of reciprocal polynomials is related to their representation. Consider again the CRC-16-IBM polynomial x16 + x15 + x2 + 1 and its reciprocal. These are its representations
- Normal original: 0x8005
- Reversed original: 0xA001
- Koopman original: 0xC002
- Normal reciprocal: 0x4003
- Reversed reciprocal: 0xC002
- Koopman reciprocal: 0xA001
Note that the Koopman representation of the reciprocal polynomial is the same as the reversed representation of the original polynomial. This means that if you mistakenly use the Koopman representation of a polynomial in a right-shift algorithm, you get a CRC that is just as strong as (but different from) the one you intended. This holds only for polynomials with a degree which is a multiple of 4.
[编辑] 错误检测能力
CRC 的错误检测能力依赖于关键多项式的阶次以及所使用的特定关键多项式。误码多项式 E(x) 是接收到的消息码字与正确消息码字的异或结果。当且仅当误码多项式能够被 CRC 多项式整除的时候 CRC 算法无法检查到错误。
- 由于 CRC 的计算基于除法,任何多项式都无法检测出一组全为零的数据出现的错误或者前面丢失的零。但是,可以根据 CRC 的变体来解决这个问题。
- 所有只有一个数据位的错误都可以被至少有两个非零系数的任意多项式检测到。误码多项式是 xk,并且 xk 只能被 的多项式 xi 整除。
- CRC 可以检测出所有间隔距离小于多项式阶次的双位错误,在这种情况下的误码多项式是
。
如上所述,xk 不能被 CRC 多项式整除,它得到一个 xi − k + 1 项。根据定义,满足多项式整除 xi − k + 1 的 i − k 最小值就是多项是的阶次。最高阶次的多项式是本原多项式,带有二进制系数的 n 阶多项式 term will not be divisible by the CRC polynomial, which leaves the xi − k + 1 term. By definition, the smallest value of i − k such that a polynomial divides xi − k + 1 is the polynomial's order. The polynomials with the largest order are called primitive polynomials, and for polynomials of degree n with binary coefficients, have order 2n − 1.
- All errors in an odd number of bits will be detected by a polynomial which is a multiple of x + 1. This is equivalent to the polynomial having an even number of terms with non-zero coefficients.
- All burst errors of length n will be detected by any polynomial of degree n or greater which has a non-zero x0 term.
(as an aside, there is never reason to use a polynomial with a zero x0 term. Recall that a CRC is the remainder of the message polynomial times x^n divided by the CRC polynomial. A polynomial with a zero x0 term always has x as a factor. So if K(x) is the original CRC polynomial and , then
That is, the CRC of any message with the K(x) polynomial is the same as that of the same message with the K'(x) polynomial with a zero appended. It is just a waste of a bit.)
The combination of these factors means that good CRC polynomials are often primitive polynomials (which have the best 2-bit error detection) or polynomials whose factors are a primitive polynomial of degree n - 1 and x + 1 (which detects all odd numbers of bit errors, and has half the two-bit error detection ability of a primitive polynomial of degree n)[2].
[编辑] CRC 多项式规范
下面的表格略去了“初始值”、“反射值”以及“最终异或值”。
- 对于一些复杂的校验和来说这些十六进制数值是很重要的,如 CRC-32 以及 CRC-64。通常小于 CRC-16 的 CRC 不需要使用这些值。
- 通常可以通过改变这些值来得到各自不同的校验和,但是校验和算法机制并没有变化。
CRC 标准化问题
- 由于 CRC-12 有三种常用的形式,所以 CRC-12 的定义会有歧义
- 在应用的 CRC-8 的两种形式都有数学上的缺陷。
- 据称 CRC-16 与 CRC-32 至少有 10 种形式,但没有一种在数学上是最优的。
- 同样大小的 CCITT CRC 与 ITU CRC 不同,这个机构在不同时期定义了不同的校验和。
[编辑] 常用 CRC(按照 ITU-IEEE 规范)
名称 | 多项式 | 表示法:正常或者翻转 |
---|---|---|
CRC-1 | x + 1 (用途:硬件,也称为奇偶校验位) |
0x1 or 0x1 (0x1) |
CRC-5-CCITT | x5 + x3 + x + 1 (ITU G.704 标准) | 0x15 (0x??) |
CRC-5-USB | x5 + x2 + 1 (用途:USB 信令包) | 0x05 or 0x14 (0x9) |
CRC-7 | x7 + x3 + 1 (用途:通信系统) | 0x09 or 0x48 (0x11) |
CRC-8-ATM | x8 + x2 + x + 1 (用途:ATM HEC) | 0x07 or 0xE0 (0xC1) |
CRC-8-CCITT | x8 + x7 + x3 + x2 + 1 (用途:1-Wire 总线) | |
CRC-8-Dallas/Maxim | x8 + x5 + x4 + 1 (用途:1-Wire bus) | 0x31 or 0x8C |
CRC-8 | x8 + x7 + x6 + x4 + x2 + 1 | 0xEA(0x??) |
CRC-10 | x10 + x9 + x5 + x4 + x + 1 | 0x233 (0x????) |
CRC-12 | x12 + x11 + x3 + x2 + x + 1 (用途:通信系统) |
0x80F or 0xF01 (0xE03) |
CRC-16-Fletcher | 参见 Fletcher's checksum | 用于 Adler-32 A & B CRC |
CRC-16-CCITT | x16 + x12 + x5 + 1 (X25, V.41, Bluetooth, PPP, IrDA) | 0x1021 or 0x8408 (0x0811) |
CRC-16-IBM | x16 +x15 + x2 + 1 | 0x8005 or 0xA001 (0x4003) |
CRC-16-BBS | x16 + x15 + x10 + x3 (用途:XMODEM 协议) | 0x8408 (0x????) |
CRC-32-Adler | See Adler-32 | 参见 Adler-32 |
CRC-32-MPEG2 | See IEEE 802.3 | 参见 IEEE 802.3 |
CRC-32-IEEE 802.3 | x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1 | 0x04C11DB7 or 0xEDB88320 (0xDB710641) |
CRC-32C (Castagnoli)[1] | x32 + x28 + x27 + x26 + x25 + x23 + x22 + x20 + x19 + x18 + x14 + x13 + x11 + x10 + x9 + x8 + x6 + 1 | 0x1EDC6F41 or 0x82F63B78 (0x05EC76F1) |
CRC-64-ISO | x64 + x4 + x3 + x + 1 (use: ISO 3309) |
0x000000000000001B or 0xD800000000000000 (0xB000000000000001) |
CRC-64-ECMA-182 | x64 + x62 + x57 + x55 + x54 + x53 + x52 + x47 + x46 + x45 + x40 + x39 + x38 + x37 + x35 + x33 + x32 + x31 + x29 + x27 + x24 + x23 + x22 + x21 + x19 + x17 + x13 + x12 + x10 + x9 + x7 + x4 + x + 1 (as described in ECMA-182 p.63) |
0x42F0E1EBA9EA3693 or 0xC96C5795D7870F42 (0x92D8AF2BAF0E1E85) |
CRC-128 | IEEE-ITU 标准。被 MD5 & SHA-1 取代 | |
CRC-160 | IEEE-ITU 标准。被 MD5 & SHA-1 取代 |
[编辑] CRC 与数据完整性
尽管在错误检测中非常有用,CRC 并不能可靠地验证数据完整性(即数据没有发生任何变化),这是因为 CRC 多项式是线性结构,可以非常容易地故意改变数据而维持 CRC 不变,参见CRC and how to Reverse it中的证明。我们可以用 Message authentication code 验证数据完整性。
[编辑] CRC发生碰撞的情况
与所有其它的散列函数一样,在一定次数的碰撞测试之后 CRC 也会接近 100% 出现碰撞。CRC 中每增加一个数据位,就会将碰撞数目减少接近 50%,如 CRC-20 与 CRC-21 相比。
- 理论上来讲,CRC64 的碰撞概率大约是每 18×1018 个 CRC 码出现一次。
- 由于 CRC 的不分解多项式特性,所以经过合理设计的较少位数的 CRC 可能会与使用较多数据位但是设计很差的 CRC 的效率相媲美。在这种情况下 CRC-32 几乎同 CRC-40 一样优秀。
[编辑] 设计 CRC 多项式
生成多项式的选择是 CRC 算法实现中最重要的部分,所选择的多项式必须有最大的错误检测能力,同时保证总体的碰撞概率最小。多项式最重要的属性是它的长度,也就是最高非零系数的数值,因为它直接影响着计算的校验和的长度。
最常用的多项式长度有
- 9 位 (CRC-8)
- 17 位 (CRC-16)
- 33 位 (CRC-32)
- 65 位 (CRC-64)
在构建一个新的 CRC 多项式或者改进现有的 CRC 时,一个通用的数学原则是使用满足所有模运算不可分解多项式约束条件的多项式。
- 这种情况下的不可分解是指多项式除了 1 与它自身之外不能被任何其它的多项式整除。
生成多项式的特性可以从算法的定义中推导出来:
- 如果 CRC 有多于一个的非零系数,那么 CRC 能够检查出输入消息中的所有单数据位错误。
- CRC 可以用于检测短于 2k 的输入消息中的所有双位错误,其中 k 是多项式的最长的不可分解部分的长度。
- 如果多项式可以被 x+1 整除,那么不存在可以被它整除的有奇数个非零系数的多项式。因此,它可以用来检测输入消息中的奇数个错误,就象奇偶校验函数那样。
- CRC polynomials detect (single) burst errors shorter than the number of the position of the highest polynomial coefficient.
[编辑] 参见
总的分类
- 纠错码
- 校验和算法列表
- 奇偶校验位
特殊技术参考
- Adler-32
- Fletcher's checksum
[编辑] 参考文献
- ^ 1.0 1.1 Castagnoli, G. and Braeuer, S. and Herrman, M. (June 1993). "Optimization of Cyclic Redundancy-Check Codes with 24 and 32 Parity Bits". IEEE Transactions on Communications 41 (6). - Castagnoli's et al. work on algorithmic selection of CRC polynomials
- ^ 2.0 2.1 2.2 Koopman, P. (June 2002). "32-Bit Cyclic Redundancy Codes for Internet Applications". The International Conference on Dependable Systems and Networks. - verification of Castagnoli's results by exhaustive search and some new good polynomials
- ↑ Koopman, P. and Chakravarty, T. (2004). "Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks". The International Conference on Dependable Systems and Networks. - analysis of short CRC polynomials for embedded applications
[编辑] 外部链接
- Tutorial and C++ implementation of CRC
- Cyclic redundancy check - a simple guide to what it means for your data, CD and DVD discs. http://www.softwarepatch.com/tips/cyclic-redundancy.html
- The CRC Pitstop
- Williams, R. (1993-09) A Painless Guide to CRC Error Detection Algorithms
- Understanding Cyclic Redundancy Check
- Black, R. (1994-02) Fast CRC32 in Software — Algorithm 4 is used in Linux and info-zip's zip and unzip.
- Barr, M. (1999-11, 1999-12, 2000-01) checksums, CRCs, and their source code. Embedded Systems Programming
- CRC32: Generating a checksum for a file, C++ implementation by Brian Friesen
- Online Tool to compute common CRCs (8/16/32/64) from strings
- Online CRC calculator
- Online CRC Tool: Generator of synthesizable CRC functions
- Online Char (ASCII), HEX, Binary, Base64, etc... Encoder/Decoder with MD2, MD4, MD5, SHA1+2, CRC, etc. hashing algorithms
- CRC16 to CRC64 collision research
- Reversing CRC – Theory and Practice.