A Crypto Problem

My #3 Son was reading Poe’s The Gold Bug, so he and I were discussing solving crypto puzzles. My son mentioned the letter frequencies Poe gave, and I said Poe was wrong. Then I asked him if he was sure he remembered them correctly. He said he was and got the book out to prove it. I had read Poe many years ago, of course, but I didn’t remember the letter frequencies he gave. Here they are, in order from most common to least:
e a o i d h n r s t u y c f g l m w b k p q x z
In Linotype usage the frequencies are:
e t a o i n s h r d l u c m f w y p v b g k q j x z
and like any good crypto puzzle solver let me put them one on top of the other:
e a o i d h n r s t u y c f g l m w b k p q x z
e t a o i n s h r d l u c m f w y p v b g k q j x z
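
To put numbers on the comparison, here is a minimal sketch that measures how far each letter moves between the two orderings. The two lists are copied from above; the function name `rank_shifts` is just an illustrative choice. A positive shift means Poe places the letter later (less common) than the Linotype order does.

```python
# Both orderings as given in the post, most common letter first.
poe = "e a o i d h n r s t u y c f g l m w b k p q x z".split()
linotype = "e t a o i n s h r d l u c m f w y p v b g k q j x z".split()

def rank_shifts(a, b):
    """Position of each letter in a minus its position in b (letters in both lists only)."""
    pos_b = {letter: i for i, letter in enumerate(b)}
    return {letter: i - pos_b[letter] for i, letter in enumerate(a) if letter in pos_b}

shifts = rank_shifts(poe, linotype)
missing = sorted(set(linotype) - set(poe))  # letters Poe leaves out entirely

# List the letters that move the most, largest displacement first.
for letter, shift in sorted(shifts.items(), key=lambda kv: -abs(kv[1]))[:5]:
    print(letter, shift)
print("missing from Poe:", missing)
```

Shifts near the end of the list are inflated a little because Poe's list is two letters shorter, so the top of the table is the part worth reading.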

Do any of you have an explanation for the differences at least among the most common letters? Was he trying to make it harder for his readers to solve the crypto puzzles he often published?
Should you care to read or re-read the story, here is an online version of The Gold Bug from Project Gutenberg. And here is a paperback version: The Gold-Bug and Other Tales.
Cross Posted at Power and Control


Comments

16 responses to “A Crypto Problem”

  1. Bleepless

    Simon Singh’s terrific “The Code Book” has a crypto problem at the end. He offered money to the first person to solve it. Someone did.

  2. Dennis

    Here’s someone who thinks Poe is playing games:
    http://is.gd/eo8b

  3. SBP

    I speculate that Poe didn’t have access to a good frequency table and generated his own, perhaps from an atypical sample, or from too small a sample size.

  4. Anonymous

    Why doesn’t Poe include “j” and “v”?

  5. guy on internet

    The frequencies he gives are from his own cryptograms, none of which decodes to “vajayjay,” because it was old-timey days.

  6. M. Simon

    SBP,
    You would have a point if the letters “in error” came at the end of the table.
    However, misplacing t, d, and h so significantly, and t especially, seems intentional.
    It might be interesting to scan a number of texts to see if it is possible to find one with that frequency pattern. Poe may have hidden a cryptogram in his letter frequency chart.

  7. Dennis

    Wasn’t Poe an accomplished cryptographer who claimed that no one in the world could design a cipher he couldn’t crack? He apparently succeeded when he challenged readers of his newspaper column to try to stump him.
    It seems more likely that he gave bad information on purpose. Perhaps he didn’t want to give away too much information. It might generate interest in the puzzles but not give people an absolute key to solving them. The rest of his readers, who still didn’t go in for the puzzles, would be none the wiser but could still enjoy the story with its internal logic.

  8. K

    No mystery. Poe wrote 150 years ago. English word use was different then. And Poe was just one person, not the English Speaking Peoples. His table might have been right for the United States of his day.
    An interesting test would be a table made from Poe’s writings. That would be English as he used it.
    Type would have still been set by hand for Poe. Mark Twain famously invested in early typesetting machines and lost a fortune.
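
The test K suggests can be sketched in a few lines: count the letters in a text and rank them. This is only an illustration; `letter_ranking` is a made-up name, and the sample string stands in for whatever Poe text you load. Ties are broken alphabetically here, which is an assumption of this sketch and not something Poe states.

```python
from collections import Counter

def letter_ranking(text):
    """Return the letters a-z found in text, most frequent first (ties alphabetical)."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [letter for letter, _ in ordered]

# Stand-in sample; replace with the full text of a Poe story for a real test.
sample = "Many years ago I contracted an intimacy with a Mr. William Legrand."
print(" ".join(letter_ranking(sample)))
```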

  9. Alex Green

    Intriguingly, this is the ranking for a book cipher which uses the first letter of each word.
    The ranking for first letters in English is
    t o a w b c d s f m r h i y e g l n p u j k
    whereas that for every letter in English is as you quoted.
    If you multiply the numerical frequencies for first letters by the numerical frequencies for every letter, the ranking will be close to that given by Poe.
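
Alex Green's multiplication can be tried on any sample text. The sketch below is one assumed reading of it: it takes each letter's share of all letters, multiplies it by that letter's share of word-initial letters, and ranks by the product. The name `product_ranking` and the choice to compute both frequencies from the same text are assumptions; the comment does not say which tables were used.

```python
import re
from collections import Counter

def product_ranking(text):
    """Rank letters by (overall frequency) x (first-letter frequency), highest first."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return []
    all_counts = Counter("".join(words))        # every letter
    first_counts = Counter(w[0] for w in words)  # first letter of each word
    total_all = sum(all_counts.values())
    total_first = len(words)
    score = {
        c: (all_counts[c] / total_all) * (first_counts[c] / total_first)
        for c in all_counts
    }
    # Ties (including letters that never start a word) fall back to alphabetical order.
    return sorted(score, key=lambda c: (-score[c], c))

print(" ".join(product_ranking("a an apple tea")))
```

Whether the result lands close to Poe's order will depend heavily on the text chosen, so a long sample is needed before drawing any conclusion.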

  10. M. Simon

    Alex,
    That is very interesting.

  11. SBP

    Why doesn’t Poe include “j” and “v”?
    Now this is interesting.
    Classical Latin uses j and i interchangeably, and the same for u and v.
    I wonder if Poe was working from a Latin text?

  12. SBP

    The Wikipedia page doesn’t give a table for Latin, but the Italian table has “t” at 5.62% and the Spanish table has “t” at 4.63%, both significantly lower than the 9.06% for English.

  13. Alex Green

    Kluber gives the percentages for Latin as:
    I – 10.1    M – 3.4    V – 0.7
    E –  9.2    C – 3.3    X – 0.6
    U –  7.4    P – 3.0    H – 0.5
    T –  7.2    L – 2.1    J – 0
    A –  7.2    D – 1.7    K – 0
    S –  6.8    G – 1.4    Y – 0
    R –  6.8    Q – 1.3    Z – 0
    N –  6.0    B – 1.2
    O –  4.4    F – 0.9

  14. Dave

    Poe’s table is probably based on a pretty small sample size. Note that in
    eaoi(dhnrstuy)(cfglmw)(bkpqxz)
    the three parenthesized groups are alphabetized. My guess is that this means they all had the same (or statistically indistinguishable) counts in his sample, so he just listed them in order. If this is true, then only the T has really shifted very much.
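
Dave's observation is easy to check mechanically. The sketch below splits Poe's ordering into maximal runs of letters that are already in alphabetical order; `ascending_runs` is an illustrative name. It shows only that the groups are alphabetized, not why, so the tied-counts explanation remains a guess.

```python
def ascending_runs(order):
    """Split a sequence of letters into maximal runs that are in alphabetical order."""
    runs = [[order[0]]]
    for letter in order[1:]:
        if letter > runs[-1][-1]:
            runs[-1].append(letter)   # still ascending: extend the current run
        else:
            runs.append([letter])     # order broken: start a new run
    return runs

poe = "e a o i d h n r s t u y c f g l m w b k p q x z".split()
runs = ["".join(r) for r in ascending_runs(poe)]
print(runs)
# → ['e', 'ao', 'i', 'dhnrstuy', 'cfglmw', 'bkpqxz']
```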

  15. Aakash

    “Son #3”? I didn’t even know you had any kids!

  16. M. Simon

    Aakash,
    Three sons, one daughter.