Besides not dealing with secrets in the first place, the best way to keep a secret is summarised by a neat little axiom.

“Go placidly amidst the noise and haste, and remember what peace there may be in silence.” – Max Ehrmann, The Desiderata of Happiness

The safest way to keep a secret is by preventing it from spawning more secrets. This way, there is no association between the proto-secret and its surroundings, leaving adversaries (those who want to learn the secret) with fewer means to track back to it.

For instance, if A knows the secret that “Bob is a golem”, then the security of the secret is in a sense halved the moment A shares it with B, irrespective of whether B is trustworthy.

This is because, even assuming B is trustworthy, two people now possess the same information, making it doubly available. Put another way, whatever securities A had in place – from taping his mouth shut to writing the secret on a piece of paper, wiping his memory, locking the paper in a vault, and drowning the key – will have to be installed for B as well.

Invoking a transitive property, the faults to which A’s measures were susceptible are, at the least, faults to which B’s measures are susceptible as well, halving the efficiency of the system.

Looked at another way: a system designed to protect secrets from adversaries must not have secrets of its own. Curiously, this argument brings to light another hidden property of secrets: until a secret is shared, we may not even know what it really is that we possess!

To overcome these fallibilities, the system should be expected to protect only the secret, not itself. If it is protecting itself, then at the least it is revealing that its security has a weak point somewhere in space/time (minima), and at the most it is serving to protect only itself and not the secret (maxima).

In the minimal situation, the system is at its least ideal: Its failure is the loss of the secret’s secrecy. In the maximal situation, the system is para-ideal: Its success will hinder the transmission of the secret.

So, why can’t the maximal case be adopted, you ask? Because what we need is a system purely for the secret’s protection; a system that serves its own protection will stand in the way of the secret’s purpose.

For instance, if a piece of information is to be generated first, then encrypted, and then transmitted, decrypted, consumed, re-encrypted, transmitted, decrypted, consumed, and then destroyed, then a system containing the information but protecting only itself will need to adapt to threats in different encryption, transmission, decryption, and consumption situations.

Axiomatically, the protecting system becomes the weakest link, whereas what we need is for the secret to be the weakest link and the system to be the strongest.

In other words, there must exist a secret, and it will be best protected if there exists a guarding system whose machinations the adversary may know of and still not have access to the secret. In fact, this is the essence of Kerckhoffs’s axiom.

In the words of Auguste Kerckhoffs, who first formulated the axiom: “Il faut qu’il n’exige pas le secret, et qu’il puisse sans inconvénient tomber entre les mains de l’ennemi.” (“The system must not require secrecy, and it must be able to fall into the enemy’s hands without inconvenience.”) Or, in reductionist parlance, in the words of Claude Shannon: “The enemy knows the system.”

The principles behind the axiom are founded on a very conservative philosophy, and bring a lot of safety and protection with them when implemented. What the axiom does well is eliminate serendipity on the adversary’s part, limiting the number of avenues at his disposal to break the system and access the secret. Another important thing about the axiom is that it removes all focus from what the technique is and turns it on what the technique does.

In fact, this last point was also expounded upon by Kerckhoffs in 1883, in ‘La Cryptographie militaire’, a pair of articles he wrote for the Journal des sciences militaires, in which his axiom also made its first appearance.

As I discussed earlier, what the technique does is guard a secret, establishing a sort of linearity: no matter the design of the system appointed to protect the secret, the system itself must not replace the existing problem (secrecy) with a harder one. But what about an equally difficult one? That should be allowable only if both the secret and the system are part of a linear information-transmission circuit.


- The path on top is a linear information-transmission circuit, and the one below, a parallel variant (A = asset; S = secret; P = program/pathway).

For instance, in a shift cipher (better known as the Caesar cipher, a simple kind of substitution cipher), each letter of the alphabet is replaced with the one that’s one place away from it (“J” with “K”), two places away (“J” with “L”), three places away (“J” with “M”), or so forth. The size of the shift is the value of the cipher’s key. So, if you’ve the encrypted message in your hand and want to know what it means, all you need is the key.
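As a sketch (not from the original article), the Caesar cipher can be written in a few lines of Python; the `shift` parameter plays the role of the key:

```python
# Minimal Caesar (shift) cipher sketch; `shift` is the key.
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z"

def caesar(text: str, shift: int) -> str:
    """Shift every letter of `text` forward by `shift` places, wrapping at Z."""
    shifted = ALPHABET[shift % 26:] + ALPHABET[:shift % 26]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))

# Encrypting "J" with a key of 1 yields "K", as in the example above;
# decrypting is just shifting back by the same amount.
print(caesar("J", 1))                   # K
print(caesar("HELLO", 3))               # KHOOR
print(caesar(caesar("HELLO", 3), -3))   # HELLO
```

Note that knowing the machinery above (per Kerckhoffs) gains an adversary nothing by itself; only the value of `shift` matters.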

Here, while the message itself is the secret, the secrecy of the endeavour rests on knowing the key, so the message and the key both lie on a linear information-transmission circuit. The maintenance of secrecy, now, depends on how well the key is guarded - not how well the message is guarded. Therefore, even though the system has replaced the secret as the weakest link, it would seem impossible to know what the message is without the key.

Of course, a modern decryption system can break the Caesar cipher in a jiffy even if it doesn’t have the key. This is because - step 1 - once the system knows that a Caesar cipher is at work, it can simply loop over all 26 possible key values. At the end, the system will possess 26 candidate messages. Step 2: if it’s known which language the message is in, then one among the 26 candidates will make some sense in that language. That message is the decrypted result.
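The two-step attack can be sketched as a single loop; a score against common English words stands in for the human “makes sense” check (the tiny word list here is an illustrative assumption, not a real language model):

```python
# Brute-force a Caesar-enciphered message: try every shift (step 1), then
# keep the candidate that looks most like English (step 2).
import string

ALPHABET = string.ascii_uppercase
COMMON = {"THE", "AND", "ATTACK", "AT", "DAWN"}  # toy scoring list (assumption)

def shift_by(text: str, shift: int) -> str:
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.translate(str.maketrans(ALPHABET, shifted))

def crack(ciphertext: str) -> str:
    candidates = [shift_by(ciphertext, s) for s in range(26)]  # 26 messages
    # Pick the candidate containing the most recognisable English words.
    return max(candidates, key=lambda m: sum(w in COMMON for w in m.split()))

print(crack("DWWDFN DW GDZQ"))  # ATTACK AT DAWN
```

The key space is so small (26 values) that the “linear circuit” collapses: guarding the key buys almost nothing.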

A more sophisticated cipher is IBM’s Lucifer, a refined variant of which was released by the US NIST (then the US NBS) in 1977 as the DES, and which uses a Feistel network. Understanding the DES was the first step in figuring out how cryptanalytic attacks worked, or could work. What made Lucifer so powerful was that it made use of four components that performed serially but delivered in parallel.

First, say you’ve a message, M, you have no idea what it means, and the Caesar cipher attack has also turned up nothing.

So, you use a 56-bit key, such as “11000000111110010110111000111110110110110111001100111010”, as your starting point. This “master key” is the output of an operation called permuted choice 1 (PC-1), which selects and permutes 56 of the 64 bits of the original key, and its job is to spawn a series of sub-keys. Before the first sub-key is acquired, the 56-bit key is split into two 28-bit halves: 1100000011111001011011100011 and 1110110110110111001100111010. Next, each half is rotated to the left by one or two bits. Then, 24 bits from each rotated 28-bit half-key are selected and concatenated to yield the first 48-bit sub-key.

This sub-key – the output of a second selection operation, permuted choice 2 (PC-2) – serves as the first round’s key, with which the message, M, is to be indirectly decrypted.

In the second round, the two half-keys are again rotated one or two bits to the left, and 24 bits from each are selected and joined together to yield the second sub-key, with which the once-decrypted message is decrypted a second time.

The third round repeats the procedure – rotate, select, concatenate – to yield the third sub-key, with which the twice-decrypted message is decrypted a third time.
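The split-rotate-select loop above can be sketched in Python. This is a simplified stand-in, not real DES: actual DES uses fixed PC-2 bit positions to pick the 48 sub-key bits, whereas here we just take the first 24 bits of each half as an illustrative assumption. The per-round rotation amounts, though, are the real DES schedule:

```python
# Toy sketch of a DES-style key schedule: split the 56-bit master key,
# rotate the halves each round, and select 24 bits from each half.
KEY = "11000000111110010110111000111110110110110111001100111010"  # 56 bits

# Left-rotation amounts per round in real DES (one or two bits each round).
ROTATIONS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

def rotate_left(bits: str, n: int) -> str:
    return bits[n:] + bits[:n]

def subkeys(master_key: str) -> list[str]:
    left, right = master_key[:28], master_key[28:]  # two 28-bit halves
    keys = []
    for rot in ROTATIONS:
        left, right = rotate_left(left, rot), rotate_left(right, rot)
        # "PC-2" stand-in (assumption): first 24 bits of each half, joined.
        keys.append(left[:24] + right[:24])
    return keys

round_keys = subkeys(KEY)
print(len(round_keys), len(round_keys[0]))  # 16 sub-keys, 48 bits each
```

Because the rotations sum to 28, the halves return to their starting position after the sixteenth round.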

(If the block being processed is split into unequal halves instead of two equal ones, the system stops being a balanced Feistel network and starts resembling an unbalanced one – the structure used by the Skipjack cipher.)

This whole thing goes on for 16 rounds in all, administered by the round function, leaving M decrypted a total of 16 times using 16 48-bit sub-keys – each a concatenation of two 24-bit selections – derived from the 32 rotated half-keys according to the key schedule. So, pictographically, this is how the Feistel network encryption system is represented.
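The Feistel structure itself is simple enough to sketch. This is a generic toy network, not DES: the round function `F` here is an arbitrary stand-in (an assumption for illustration). What matters is the structural property the article leans on – running the rounds with the sub-keys in reverse order undoes the cipher exactly, whatever `F` is:

```python
# Generic (toy) Feistel network over a 32-bit block split into 16-bit halves.
def feistel_round(left: int, right: int, key: int) -> tuple[int, int]:
    """One round: new_left = right; new_right = left XOR F(right, key)."""
    f = (right * 31 + key) & 0xFFFF  # toy round function F (assumption)
    return right, left ^ f

def encrypt(block: int, keys: list[int]) -> int:
    left, right = block >> 16, block & 0xFFFF
    for k in keys:
        left, right = feistel_round(left, right, k)
    return (left << 16) | right

def decrypt(block: int, keys: list[int]) -> int:
    left, right = block >> 16, block & 0xFFFF
    # Undo each round in reverse order, reusing the very same F.
    for k in reversed(keys):
        left, right = right ^ ((left * 31 + k) & 0xFFFF), left
    return (left << 16) | right

keys = [0x1A2B, 0x3C4D, 0x5E6F]              # toy sub-keys
ciphertext = encrypt(0xDEADBEEF, keys)
print(hex(decrypt(ciphertext, keys)))        # 0xdeadbeef
```

Note that `F` never needs to be invertible: the XOR-and-swap structure supplies the invertibility, which is precisely why a Feistel cipher can use an arbitrarily scrambling round function.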

Here: R-A is the rounding algorithm, K-S the key-schedule, and B-D the bits’ displacement each round, represented as a function.

When only the system is looked at, you’ll see that it fails if any one of its four components fails. This means the system, by itself, has four secrets of its own to protect. However, in making this argument, we’re also assuming that all four components are individually capable of protecting the secret – that it’s a parallel information-transmission circuit. This isn’t the case: if one of the four components fails, the rest of the system can no longer be unlocked, destroying (technical) access to the secret and preserving its secrecy.


- The four components all have to work, or the system will be in stasis.

So, Kerckhoffs’s axiom holds - from Caesar to Feistel to Skipjack - and all along the way, provides the same level of security through just one principle, his desiderata.

(If you’re interested in how the DES was cracked, check out DESCHALL, the EFF’s Deep Crack, COPACOBANA, and RIVYERA. If you’re interested in the most effective encryption standard in place today, check out the AES and the FIPS standards over the years.)