频率分析
维基百科,自由的百科全书
频率分析在数学、物理学和信号处理中是一种分解函数、波形、或者信号的频率组成,以获取频谱的方法。
在密码学中,频率分析是指研究字母或者字母组合在文本中出现的频率。应用频率分析可以破解古典密码。
频率分析基于如下原理:在任何一种书面语言中,不同的字母或字母组合出现的频率各不相同。而且,对于以这种语言书写的任意一段文本,都具有大致相同的特征字母分布。比如,在英语中,字母E出现的频率很高,而X则出现得较少。类似地,ST、NG、TH,以及QU等双字母组合出现的频率非常高,NZ、QJ组合则极少。英语中出现频率最高的12个字母可以简记为“ETAOIN SHRDLU”。
In some ciphers, such properties of the natural language plaintext are preserved in the ciphertext, and these patterns have the potential to be exploited in a ciphertext-only attack.
目录 |
[编辑] 简单替换密码的频率分析
在一个简单的替换密码中,明文中的每一个字母都被另一个字母替换,而且且明文中相同的字母在转换为密文时总是被同一个字母所替换。比如,所有的e都会被替换成 X.一个含有大量X的密文消息会向密码破译者暗示X替换e.
The basic use of frequency analysis is to first count the frequency of ciphertext letters and then associate guessed plaintext letters with them. More X's in the ciphertext than anything else suggests that X corresponds to e in the plaintext, but this is not certain; t and a are also very common in English, so X might be either of them also. It is unlikely to be a plaintext z or q which are less common. Thus the cryptanalyst may need to try several combinations of mappings between ciphertext and plaintext letters.
More complex use of statistics can be conceived, such as considering counts of pairs of letters, or triplets (trigrams), and so on. This is done to provide more information to the cryptanalyst, for instance, Q and U nearly always occur together in that order in English, even though Q itself is rare.
[编辑] 一个例子
Suppose Eve has intercepted the cryptogram below, and it is known to be encrypted using a simple substitution cipher:
LIVITCSWPIYVEWHEVSRIQMXLEYVEOIEWHRXEXIPFEMVEWHKVSTYLXZIXLIKIIXPIJVSZEYPERRGERIM WQLMGLMXQERIWGPSRIHMXQEREKIETXMJTPRGEVEKEITREWHEXXLEXXMZITWAWSQWXSWEXTVEPMRXRSJ GSTVRIEYVIEXCVMUIMWERGMIWXMJMGCSMWXSJOMIQXLIVIQIVIXQSVSTWHKPEGARCSXRWIEVSWIIBXV IZMXFSJXLIKEGAEWHEPSWYSWIWIEVXLISXLIVXLIRGEPIRQIVIIBGIIHMWYPFLEVHEWHYPSRRFQMXLE PPXLIECCIEVEWGISJKTVWMRLIHYSPHXLIQIMYLXSJXLIMWRIGXQEROIVFVIZEVAEKPIEWHXEAMWYEPP XLMWYRMWXSGSWRMHIVEXMSWMGSTPHLEVHPFKPEZINTCMXIVJSVLMRSCMWMSWVIRCIGXMWYMX
For this example, uppercase letters are used to denote ciphertext, lowercase letters are used to denote plaintext (or guesses at such), and X~t is used to express a guess that ciphertext letter X represents the plaintext letter t.
Eve could use frequency analysis to help solve the message along the following lines: counts of the letters in the cryptogram show that I is the most common single letter, XL most common bigram, and XLI is the most common trigram. e is the most common letter in the English language, th is the most common bigram, and the the most common trigram. This strongly suggests that X~t, L~h and I~e. The second most common letter in the cryptogram is E; since the first and second most frequent letters in the English language, e and t are accounted for, Eve guesses that E~a, the third most frequent letter. Tentatively making these assumptions, the following partial decrypted message is obtained.
heVeTCSWPeYVaWHaVSReQMthaYVaOeaWHRtatePFaMVaWHKVSTYhtZetheKeetPeJVSZaYPaRRGaReM WQhMGhMtQaReWGPSReHMtQaRaKeaTtMJTPRGaVaKaeTRaWHatthattMZeTWAWSQWtSWatTVaPMRtRSJ GSTVReaYVeatCVMUeMWaRGMeWtMJMGCSMWtSJOMeQtheVeQeVetQSVSTWHKPaGARCStRWeaVSWeeBtV eZMtFSJtheKaGAaWHaPSWYSWeWeaVtheStheVtheRGaPeRQeVeeBGeeHMWYPFhaVHaWHYPSRRFQMtha PPtheaCCeaVaWGeSJKTVWMRheHYSPHtheQeMYhtSJtheMWReGtQaROeVFVeZaVAaKPeaWHtaAMWYaPP thMWYRMWtSGSWRMHeVatMSWMGSTPHhaVHPFKPaZeNTCMteVJSVhMRSCMWMSWVeRCeGtMWYMt
Using these initial guesses, Eve can spot patterns that confirm her choices, such as "that". Moreover, other patterns suggest further guesses. "Rtate" might be "state", which would mean R~s. Similarly "atthattMZe" could be guessed as "atthattime", yielding M~i and Z~m. Furthemore, "heVe" might be "here", giving V~r. Filling in these guesses, Eve gets:
hereTCSWPeYraWHarSseQithaYraOeaWHstatePFairaWHKrSTYhtmetheKeetPeJrSmaYPassGasei WQhiGhitQaseWGPSseHitQasaKeaTtiJTPsGaraKaeTsaWHatthattimeTWAWSQWtSWatTraPistsSJ GSTrseaYreatCriUeiWasGieWtiJiGCSiWtSJOieQthereQeretQSrSTWHKPaGAsCStsWearSWeeBtr emitFSJtheKaGAaWHaPSWYSWeWeartheStherthesGaPesQereeBGeeHiWYPFharHaWHYPSssFQitha PPtheaCCearaWGeSJKTrWisheHYSPHtheQeiYhtSJtheiWseGtQasOerFremarAaKPeaWHtaAiWYaPP thiWYsiWtSGSWsiHeratiSWiGSTPHharHPFKPameNTCiterJSrhisSCiWiSWresCeGtiWYit
In turn, these guesses suggest still others (for example, "remarA" could be "remark", implying A~k) and so on, and it is relatively straightforward to deduce the rest of the letters, eventually yielding the plaintext.
In this example, Eve's guesses were all correct. This would not always be the case, however; the variation in statistics for individual plaintexts can mean that initial guesses are incorrect. It may be necessary to backtrack incorrect guesses or to analyse the available statistics in much more depth than the somewhat simplified justifications given in the above example.
It is also possible that the plaintext does not exhibit the expected distribution of letter frequencies. Shorter messages are likely to show more variation. It is also possible to construct artificially skewed texts. For example, entire novels have been written that omit the letter "e" altogether — a form of literature known as a lipogram.
[编辑] 历史和应用
The first known recorded explanation of frequency analysis (indeed, of any kind of cryptanalysis) was given by 9th century Arabic polymath Abu Yusuf Yaqub ibn Ishaq al-Sabbah Al-Kindi in A Manuscript on Deciphering Cryptographic Messages (Ibraham, 1992). It has been suggested that close textual study of the Qur'an first brought to light that Arabic has a characteristic letter frequency. Its use spread, and was so widely used by European states by the Renaissance that several schemes were invented by cryptographers to defeat it. These included
- use of homophones — several alternatives to the most common letters in otherwise monoalphabetic substitution ciphers (for example, for English, both X and Y ciphertext might mean plaintext E).
- polyalphabetic substitution, that is, the use of several alphabets — chosen in assorted, more or less devious, ways (Leone Alberti seems to have been the first to propose this); and
- polygraphic substitution, schemes where pairs or triplets of plaintext letters are treated as units for substitution, rather than single letters (for example, the Playfair cipher invented by Charles Wheatstone in the mid 1800s).
A disadvantage of all these attempts to defeat frequency counting attacks is that it increases complication of both enciphering and deciphering, leading to mistakes. Famously, a British Foreign Secretary is said to have rejected the Playfair cipher because, even if school boys could cope successfully as Wheatstone and Playfair had shown, 'our attaches could never learn it!'.
The rotor machines of the first half of the 20th century (for example, the Enigma machine) were essentially immune to straightforward frequency analysis. However, other kinds of analysis ("attacks") successfully decoded messages from some of those machines.
Frequency analysis requires only a basic understanding of the statistics of the plaintext language and some problem solving skills, and, if performed by hand, some tolerance for extensive letter bookkeeping. During World War II (WWII), both the British and the Americans recruited codebreakers by placing crossword puzzles in major newspapers and running contests for who could solve them the fastest. Several of the ciphers used by the Axis powers were breakable using frequency analysis (for example, some of the consular ciphers used by the Japanese). Mechanical methods of letter counting and statistical analysis (generally IBM card type machinery) were first used in WWII, possibly by the US Army's SIS. There are lurid tales of midnight expeditions by the cryptographers to machines in another Department. Today, the hard work of letter counting and analysis has been replaced by computer software, which can carry out such analyses in seconds. With modern computing power, classical ciphers are unlikely to provide any real protection for confidential data.
[编辑] 小说中的频率分析
Frequency analysis has been described in fiction. Edgar Allan Poe's The Gold Bug, and Arthur Conan Doyle's Sherlock Holmes tale The Adventure of the Dancing Men are examples of stories which describe the use of frequency analysis to attack simple substitution ciphers. The cipher in the Poe story is encrusted with several deception measures, but this is more a literary device than anything significant cryptographically.
[编辑] 相关内容
- ETAOIN SHRDLU
- topics in cryptography
- Zipf's law
[编辑] 参考
- Helen Fouché Gaines, "Cryptanalysis", 1939, Dover. ISBN 0486200973
- Ibraham A. “Al-Kindi: The origins of cryptology: The Arab contributions”, Cryptologia, 16(2) (April 1992) pp. 97–126.
- Abraham Sinkov, "Elementary Cryptanalysis : A Mathematical Approach", The Mathematical Association of America, 1966. ISBN 0883856220.
[编辑] 外部链接
Template:Classical cryptography