[dm-crypt] Basics

Michael Kjörling michael at kjorling.se
Sun Sep 27 15:47:21 CEST 2015

On 27 Sep 2015 13:08 +0200, from promike1987 at gmail.com (Mike Nagie):
> The cipher key size doesn't impact on disk space. Maybe it might impact 
> on speed; aes-xts 256b was 141.5MiB/s while aes-xts 512b was 108.5MiB/s. 

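Given that XTS splits its key in half (a 256-bit XTS key means two
AES-128 keys, a 512-bit XTS key means two AES-256 keys), and that
128-bit AES is 10 rounds whereas 256-bit AES is 14 rounds, AES-XTS with
512 bits of key should take about 40% longer (14/10) than AES-XTS with
256 bits of key, i.e. reach roughly 71% of its throughput. Your numbers
show AES-XTS-512 taking about 30% longer than AES-XTS-256
(141.5/108.5 ~ 1.30), which I would consider to be within tolerance;
some of the per-sector work does not depend on the round count, which
may explain why the measured difference is a bit smaller.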
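A back-of-the-envelope version of this estimate, as a minimal Python
sketch. It assumes that the cost of AES scales with its round count and
ignores everything else (tweak handling, memory traffic); the
throughput figures are the ones quoted above.

```python
# Rough model: the cost of one AES block is proportional to its round count.
# Assumption: everything else in XTS (tweak handling, memory access) is ignored.
AES_ROUNDS = {128: 10, 192: 12, 256: 14}


def xts_aes_rounds(xts_key_bits: int) -> int:
    """XTS splits its key in half (data key + tweak key); each half keys AES."""
    return AES_ROUNDS[xts_key_bits // 2]


# Predicted slowdown from the round count alone (14 / 10).
predicted_time_ratio = xts_aes_rounds(512) / xts_aes_rounds(256)
predicted_throughput_ratio = 1 / predicted_time_ratio

# Measured figures quoted above, in MiB/s.
measured_256, measured_512 = 141.5, 108.5
measured_time_ratio = measured_256 / measured_512

print(f"predicted: {predicted_time_ratio:.2f}x the time, "
      f"{predicted_throughput_ratio:.0%} of the throughput")
print(f"measured:  {measured_time_ratio:.2f}x the time, "
      f"{measured_512 / measured_256:.0%} of the throughput")
```

This prints 1.40x (71%) predicted against 1.30x (77%) measured.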

> Twofish is a riddle why it's so fast.

It's a completely different cipher with different properties.

> I don't know how reliable this is, but
> dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync gave me this 
> result:
> 536870912 bytes (537 MB) copied, 18.2785 s, 29.4 MB/s
> (Without fdatasync I got 572 MB/s, which obviously is not true)
> So according to the dd result, I could choose any cipher, even serpent 
> would not slow my system down.

Right; if you're only seeing some 30 MB/s of mostly sequential
throughput, any cipher that can do more than that will not be a
bottleneck.

The 572 MB/s number may very well be true, if you consider caching.
Without conv=fdatasync, dd's timing ends once the data has been handed
to the kernel and stored in RAM (the page cache); at some later time it
is written out to disk asynchronously. That's how _most_ I/O is done.
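To see the caching effect directly, here is a minimal Python sketch
that times the same writes with and without an fsync() before the
clock stops, roughly mirroring dd with and without conv=fdatasync. The
function name and sizes are illustrative; if the temporary directory
lives on tmpfs (i.e. in RAM), both numbers will be similar because
there is no disk behind it.

```python
import os
import tempfile
import time


def write_throughput(total_mib: int, sync: bool) -> float:
    """Write total_mib MiB of zeros in 1 MiB chunks and return MB/s.

    With sync=False the clock stops once the data has been handed to the
    kernel, so the figure mostly reflects how fast the page cache absorbs it.
    With sync=True os.fsync() runs before the clock stops, so the data must
    actually reach the storage device first.
    """
    chunk = bytes(1024 * 1024)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            start = time.perf_counter()
            for _ in range(total_mib):
                f.write(chunk)
            f.flush()  # push Python's own buffer down to the kernel
            if sync:
                os.fsync(f.fileno())  # wait until the kernel wrote it out
            elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mib * 1024 * 1024 / elapsed / 1e6  # decimal MB/s, like dd
```

For example, `write_throughput(512, sync=False)` versus
`write_throughput(512, sync=True)` on a real disk should show the same
kind of gap as the two dd runs.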

> Since iteration time means millisecond here, it doesn't matter which 
> hash I choose.
> cryptsetup -h sha1   -i 1000 ... 
> cryptsetup -h sha512 -i 1000 ... 
> both should take 1 second, just sha1 has 644088 iterations per second 
> (on my computer) while sha512 only 321254.
> Isn't sha1 safer in this case? I thought the more iterations, the 
> better/safer.

I suppose an argument could be made either way. As has been discussed
here previously, for how it is used by LUKS, practically _any_
cryptographic hash algorithm should be a safe choice, as long as it is
iterated a reasonable number of times.

> I still don't understand if -i just the number of milliseconds, why does 
> it differ if I change the CPU. Isn't 1000 milliseconds, 1000 milliseconds 
> everywhere?

The important part is how many hash iterations are done. Think of it
like this (_highly_ simplified):

1. You enter the passphrase "hello world".
2. "hello world" is fed to the hash, which outputs "123456".
3. "123456" is fed to the hash, which outputs "gjeiqp".
4. "gjeiqp" is fed to the hash, which outputs "mvie8m".
5. "mvie8m" is fed to the hash, which outputs "ba1nwq".
6. The hash that actually gets used is "ba1nwq".

Now, obviously, to arrive at the same result every time, you need to
know how many times to repeat the hashing step (steps 2-5).
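The simplified chain above, as a Python sketch. SHA-256 is an
arbitrary choice here, and real LUKS does not use a bare hash chain
like this (it uses PBKDF2 with a salt); the point is only that the
output depends on the exact number of repetitions.

```python
import hashlib


def iterated_hash(passphrase: str, iterations: int) -> str:
    """Hash the passphrase, then hash each output again, `iterations` times.

    Deliberately simplified: no salt, no PBKDF2, just a bare hash chain.
    """
    digest = passphrase.encode("utf-8")
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


# Same passphrase, same count: same result every time.
assert iterated_hash("hello world", 4) == iterated_hash("hello world", 4)
# Stopping one step early (3 instead of 4) gives an unrelated value.
assert iterated_hash("hello world", 3) != iterated_hash("hello world", 4)
```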

If you simply iterate for a given amount of time, a small difference
in system load may cause the process to terminate either early or
late. For example, if system load is slightly higher, then the process
may terminate after step 4, causing the hash used to be "mvie8m"
rather than "ba1nwq". The two are obviously not the same, so you get
an error.

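Accurate, reproducible timing is a difficult problem to solve, so LUKS'
approach is to let you set a ballpark figure (in milliseconds, using
"-i" to cryptsetup), measure how many iterations fit into that time on
your machine, and then _store_ that iteration count in the header.
(This is also why "-i 1000" gives different iteration counts on
different CPUs: a faster CPU fits more iterations into 1000
milliseconds.) With the count stored, the process instead becomes: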

1. You enter the passphrase "hello world".
2. LUKS reads the header and determines that four hash iterations are used.
3. "hello world" is fed to the hash, which outputs "123456".
4. "123456" is fed to the hash, which outputs "gjeiqp".
5. "gjeiqp" is fed to the hash, which outputs "mvie8m".
6. "mvie8m" is fed to the hash, which outputs "ba1nwq".
7. LUKS concludes that the hash has been iterated four times.
8. The hash that actually gets used is "ba1nwq".
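A minimal Python sketch of this "measure once, store the count" idea,
assuming PBKDF2-HMAC-SHA256 from Python's hashlib. The header
dictionary, its `check` field and the function names are hypothetical
and do not reflect the real LUKS on-disk format; only the principle is
the same. calibrate_iterations() also shows why the count differs
between machines: it asks how many iterations fit into the time budget
on *this* CPU.

```python
import hashlib
import hmac
import os
import time
from typing import Optional


def calibrate_iterations(iter_time_ms: int, hash_name: str = "sha256") -> int:
    """Estimate how many PBKDF2 iterations fit into iter_time_ms on this CPU."""
    probe = 10_000
    start = time.perf_counter()
    hashlib.pbkdf2_hmac(hash_name, b"probe", b"probe-salt", probe)
    elapsed = time.perf_counter() - start
    return max(1000, int(probe * (iter_time_ms / 1000) / elapsed))


def create_header(passphrase: bytes, iter_time_ms: int) -> dict:
    """Format time: measure once, then store the *count*, never the time."""
    salt = os.urandom(32)
    iterations = calibrate_iterations(iter_time_ms)
    key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations)
    return {
        "salt": salt,
        "iterations": iterations,
        # Hypothetical check value so a wrong passphrase can be detected.
        "check": hashlib.sha256(key).digest(),
    }


def unlock(passphrase: bytes, header: dict) -> Optional[bytes]:
    """Open time: no clock involved, just replay the stored iteration count."""
    key = hashlib.pbkdf2_hmac(
        "sha256", passphrase, header["salt"], header["iterations"]
    )
    if not hmac.compare_digest(hashlib.sha256(key).digest(), header["check"]):
        return None
    return key
```

Calling `calibrate_iterations(1000, "sha1")` and
`calibrate_iterations(1000, "sha512")` on the same machine should show
the effect described above: the cheaper hash gets more iterations in
the same time budget.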

At this point, while the wallclock time may differ slightly from one
time opening the container to the next, you are guaranteed to always
be able to open the container provided that you have the correct
passphrase, because the hash is always iterated the same number of
times, which results in the same final output value.

> Thank you for the hint about passwords/passphrases.
> Whether is 'cleft cam synod lacy yr wok' more secure than 'nXRUzbL6' (a 
> random 'pwgen' generated password)?

It is relatively straightforward mathematics. Note that I use some
weird-base logarithms here; logN(x) can be calculated as log(x)÷log(N)
where log(x) is the base-10 logarithm of x, such that 10^log(x) = x.
Generally, N^logN(x) = x.

The Diceware word list consists of 7776 (6^5) words, resulting in an
entropy of log2(7776) ~ 12.9 bits per word. Hence, if an adversary
knows that you are using a Diceware passphrase and that it is exactly
six words out of exactly the standard English word list and the words
are separated by exactly one space character (this is already quite a
bit of knowledge), then the search space corresponds to 77.5 bits,
because 6 × log2(7776) ~ 77.5. Each word you add or remove corresponds
to 12.9 bits of search space. (Passphrase hash iteration is meant to
make each attempt somewhat painful to an adversary without
significantly impacting normal use, but doesn't impact the _search
space_ itself.)
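The same arithmetic as a short Python sketch, using the log(x)÷log(N)
identity from above for the base-2 logarithm. The function names are
made up for illustration.

```python
import math


def log_base(x: float, n: float) -> float:
    """logN(x), computed as log10(x) / log10(N)."""
    return math.log10(x) / math.log10(n)


DICEWARE_WORDS = 6 ** 5  # 7776 words, one per five-dice roll


def diceware_bits(words: int) -> float:
    """Search space, in bits, of `words` uniformly chosen Diceware words."""
    return words * log_base(DICEWARE_WORDS, 2)


for n in (5, 6, 7):
    print(f"{n} words: {diceware_bits(n):.1f} bits")
```

Each extra word adds the same ~12.9 bits, so the loop prints 64.6,
77.5 and 90.5 bits.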

Properly generated Diceware passphrases draw on true randomness from a
physical process (rolling dice). pwgen uses a software pseudo-random
number generator, and making a deterministic process generate quality
randomness is one of the hard problems in cryptography. Low-quality
dice may be slightly biased; this can easily be tested and compensated
for by adding one or two words to the passphrase, or if it is a major
concern then perfectly fair high-quality "casino" dice can be bought.
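For reference, here is how five dice pick one word out of 7776, as a
small Python sketch (the function name is made up for illustration).
The standard list labels each word with a five-digit roll such as
16655; reading the rolls as base-6 digits shows why every word is
equally likely with fair dice, and why a biased die skews the choice.

```python
from typing import Sequence


def dice_to_index(rolls: Sequence[int]) -> int:
    """Map five d6 rolls (each 1-6) to a word-list position in 0..7775.

    The rolls are treated as the digits of a base-6 number, so each of
    the 6**5 = 7776 possible roll sequences selects a different word.
    """
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("need exactly five rolls, each between 1 and 6")
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index


print(dice_to_index((1, 6, 6, 5, 5)))  # 1288, i.e. the 1289th word
```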

pwgen, at least by default, uses a 62-character alphabet and eight
characters per password. That gives 62^8 possible outputs, which
corresponds to a search space of log2(62^8) ~ 47.6 bits _if_ every
output were equally likely, i.e. if the random number generator used
were perfect (which is not the case). That makes 47.6 bits an upper
bound; for our purposes here we can likely disregard the biases in the
random number generator it uses.

Remember Edward Snowden's advice: "Assume your adversary is capable of
one trillion guesses per second." One (US) trillion is 10^12 and
log2(10^12) ~ 39.86 bits. A 47.6-bit (more precisely 47.63-bit) search
space thus takes approximately 2^(47.63-39.86) = 2^7.77 ~ 218 seconds
to exhaust (the same as 62^8 ÷ 10^12); on average, half that. That's
about two minutes' work by a determined adversary. We don't know if
this refers to guesses with or without hash iteration, but given that
the advice was provided in the context of PGP secret keys, it's
probably safe to assume that it at least refers to something more than
raw hash computations.

The difference between 47.6 bits and 77.5 bits is almost exactly nine
orders of magnitude. In other words, breaking a properly generated
six-word Diceware passphrase _knowing how it was generated_ is about
_a billion times more difficult_ than breaking an eight-character
pwgen password. (More exactly, 7776^6 ÷ 62^8 ~ 1.01 × 10^9, which is
about 2^29.9; log10 of that is about 9.01.) This turns the 218 seconds
into about 7,000 _years_.

_That_ is the security difference between the eight-character pwgen
password and the six-word Diceware passphrase.
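The estimates above as a Python sketch, computed directly from 62^8
and 7776^6 rather than from rounded bit counts. The 10^12 guesses per
second is Snowden's figure; a Julian year of 365.25 days is assumed,
and the worst case (trying every candidate) is shown.

```python
import math

GUESSES_PER_SECOND = 10 ** 12          # Snowden's "one trillion guesses"
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

pwgen_space = 62 ** 8       # eight characters from [a-zA-Z0-9]
diceware_space = 7776 ** 6  # six words from the 7776-word Diceware list


def exhaust_seconds(space: int) -> float:
    """Worst case: try every candidate. On average it takes half this."""
    return space / GUESSES_PER_SECOND


ratio = diceware_space / pwgen_space

print(f"pwgen:    {math.log2(pwgen_space):.1f} bits, "
      f"{exhaust_seconds(pwgen_space):.0f} s")
print(f"diceware: {math.log2(diceware_space):.1f} bits, "
      f"{exhaust_seconds(diceware_space) / SECONDS_PER_YEAR:.0f} years")
print(f"ratio:    {ratio:.4g} (10^{math.log10(ratio):.2f})")
```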

To get the same security as the six-word Diceware passphrase using the
pwgen alphabet ([a-zA-Z0-9]), assuming the characters are selected
truly randomly, you need log62(2^77.5) ~ 13 characters. log2(62^13) ~
77.4 bits of search space, which for all intents and purposes is the
same.
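The same calculation as a Python sketch, with the alphabet size and
word-list size taken from the text above.

```python
import math

target_bits = 6 * math.log2(7776)           # six Diceware words, ~77.55
bits_per_char = math.log2(62)               # one [a-zA-Z0-9] char, ~5.95
chars_needed = target_bits / bits_per_char  # log62(2^77.55), ~13.02

print(f"{chars_needed:.2f} chars needed; "
      f"13 chars give {13 * bits_per_char:.1f} bits, "
      f"14 give {14 * bits_per_char:.1f} bits")
```

Strictly, 13 characters fall about 0.15 bits short and 14 characters
exceed the target, which matches "for all intents and purposes" above.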

Because Diceware passphrases use real words and real-word-lookalikes,
at a given bit strength level they tend to be easier to memorize than
truly random pwgen-style passwords. "cleft cam synod lacy yr wok" is
probably easier to memorize than "Hai0theePuXai".

Of course, because "cleft cam synod lacy yr wok" is published as an
example, its entropy in practice is significantly lower than 77.5
bits. You should never use a passphrase that anyone has published as
an example. (You probably realize this already, but I mention it just
in case someone comes across this later and doesn't realize it's a
published example.)
> I thought I was going to use the same password as my login password, so 
> I wouldn't have to enter 2 passwords during every boot.

That's up to you. (And there is no reason why you can't use a Diceware
passphrase for both, either.) I would keep in mind that you type the
login password significantly more often (for example, to unlock the
screen saver) which presents additional opportunities for an adversary
to learn your passphrase. This may or may not be a concern to you.

Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
OpenPGP B501AC6429EF4514 https://michael.kjorling.se/public-keys/pgp
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)
