[dm-crypt] En/decrypting in multi-sector batches?

Milan Broz mbroz at redhat.com
Wed Apr 27 09:19:20 CEST 2011


On 04/26/2011 05:17 PM, Will Drewry wrote:
> Recently, I've been benchmarking some different hardware crypto
> accelerators and many of them appear to be tuned toward largish
> requests (up to 16k) with a given key and a base IV.

Please can you explicitly say which accelerators you are using and
show some benchmarks?

Does dmcrypt work in async crypto mode, or do you also have
"accelerators" like special instructions which run synchronously
(like AES-NI)?

Of course a larger block means smaller overhead, but the difference should
not be significant (at least in theory).
If it is, we need to know why - it can be because of timing or
the way the requests are submitted, not the time
of the real encryption (initialization) itself.

In this case the crypto driver should be optimised first.

>  I've created a
> very simple patch for dm-crypt that uses PAGE_SIZE blocks to aid in
> the driver performance testing, but I lack the cryptographic
> understanding to determine if there is significant exposure by
> allowing a dm-crypt device to use a block size that exceeds the sector
> size.

As Arno said, there should be no real security problem for these block
sizes. Basically we are just using CBC or XTS mode today.

For XTS-AES, the definition explicitly says that the data unit (= your block)
size should not exceed 2^20 128-bit blocks (128 bits = one AES cipher block).
(And even here the possible attacks are closely related to the birthday
bound, IOW you need to have enough blocks encrypted with the same key.)
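
To put numbers on that limit, here is a small Python sketch of the arithmetic.
The constant and function names are mine, only the 2^20 figure and the 128-bit
AES block come from the text above.

```python
AES_BLOCK_BYTES = 16              # 128 bits
XTS_MAX_BLOCKS_PER_UNIT = 2 ** 20  # limit quoted above for one data unit


def xts_max_data_unit_bytes():
    """Largest data unit allowed by the limit above, in bytes."""
    return XTS_MAX_BLOCKS_PER_UNIT * AES_BLOCK_BYTES


def aes_blocks_in(unit_bytes):
    """Number of 128-bit cipher blocks in one data unit."""
    return unit_bytes // AES_BLOCK_BYTES


print(xts_max_data_unit_bytes())          # 16777216, i.e. 16 MiB
for unit in (512, 4096, 8192, 16384):
    # every size discussed in this thread is far below the limit
    print(unit, aes_blocks_in(unit), unit <= xts_max_data_unit_bytes())
```

So a 4k unit is 256 cipher blocks, and even 16k is only 1024, against an
allowed 2^20.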

So I do not see a real security problem here. But the problems are elsewhere.

> 1. Does anyone know if there will be significant exposure to the
> plaintext if dm-crypt used larger block sizes?

Should not be.

> 2. Would an optional, configurable block-size (up to PAGE_SIZE) be of
> interest?

Short answer would be no :-)

As I said, I would first like to prove that the problem is really in the block
size and not in some related problem.

Now the real problems:

The whole device mapper and dmcrypt work as transparent block encryption,
and we are always operating on 512B sectors.

Even if the device has 4k blocks, this is hidden in the underlying layer, and
DM just properly aligns data and propagates limits but
still operates on 512B sectors. (It can be inefficient for some
IO patterns, but it works.)

Changing the encryption block size makes the device incompatible with other
systems (note stacked devices, a common thing here - LVM over dmcrypt)
and IOs. You have to generate only IO aligned to your encryption block size.

(or change dmcrypt significantly)
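
The alignment constraint can be written down as a simple check. This is only
an illustrative Python sketch, io_allowed() is a made-up helper and not
anything in dm-crypt:

```python
def io_allowed(offset, length, enc_block):
    """True if an IO (byte offset, byte length) covers only whole
    encryption blocks of size enc_block."""
    return length > 0 and offset % enc_block == 0 and length % enc_block == 0


# With 512B encryption blocks every sector-aligned IO is fine.
print(io_allowed(0, 512, 512))       # True
print(io_allowed(512, 4096, 512))    # True

# With 4096B encryption blocks a single-sector IO no longer fits,
# and neither does a 4k IO starting at sector 1.
print(io_allowed(0, 512, 4096))      # False
print(io_allowed(512, 4096, 4096))   # False
print(io_allowed(0, 8192, 4096))     # True
```

Anything stacked on top that still issues 512B-granular IO falls into the
False cases.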

IO hints are not enough - maybe an example is better here:

Testing device (some random data there, not important)
# dmsetup table --showkeys
x: 0 417792 crypt aes-cbc-essiv:sha256 aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca 0 8:16 0
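
For readers not used to reading table lines, a rough Python sketch that splits
the line above into its fields. The field names are my own labels, following
the usual crypt target layout (start, length, target, cipher, key, IV offset,
device, offset); the parser is illustrative only.

```python
from collections import namedtuple

SECTOR = 512  # device mapper counts in 512B sectors

CryptTable = namedtuple(
    "CryptTable",
    "name start length target cipher key iv_offset device offset",
)


def parse_crypt_line(line):
    """Split one 'dmsetup table --showkeys' line of a crypt target."""
    name, rest = line.split(":", 1)   # first colon ends the device name
    f = rest.split()
    return CryptTable(name.strip(), int(f[0]), int(f[1]), f[2], f[3],
                      f[4], int(f[5]), f[6], int(f[7]))


line = ("x: 0 417792 crypt aes-cbc-essiv:sha256 "
        "aeb26d1f69eb6dddfb9381eed4d7299f091e99aa5d3ff06866d4ce9f620f7aca "
        "0 8:16 0")
t = parse_crypt_line(line)
print(t.cipher, t.device)                 # aes-cbc-essiv:sha256 8:16
print(t.length * SECTOR // (1024 * 1024)) # 204 (MiB)
print(len(t.key) * 4)                     # 256 (key bits)
```

Note that there is no field for the encryption block size at all, it is
implicitly one 512B sector.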

Let's generate some direct IOs (to avoid page cache)

*Without* your patch:

# dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum 
eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -

# dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum 
eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -

# dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum 
eed6cf19ee9b2ecc5f4a6d1b251468fd9d691cbee67124de730078a1eda2c0c4  -

As you can see, we get the same plain data with different IO sizes.
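
The three dd runs all cover the same 16 KiB (512*32 = 4096*4 = 8192*2), and
the digest of a byte stream does not depend on how it is chunked. A small
Python sketch of just that property, using an in-memory buffer as a stand-in
for the device (the data and helper name are made up):

```python
import hashlib
import io


def digest_with_bs(data, bs, count):
    """Read `count` chunks of `bs` bytes, like dd bs=... count=..., and hash them."""
    stream = io.BytesIO(data)
    h = hashlib.sha256()
    for _ in range(count):
        h.update(stream.read(bs))
    return h.hexdigest()


data = bytes(range(256)) * 64   # 16 KiB of stand-in "plain data"
digests = {digest_with_bs(data, bs, count)
           for bs, count in ((512, 32), (4096, 4), (8192, 2))}
print(len(digests) == 1)        # True: one digest regardless of IO size
```

So if the digests differ between IO sizes, the data returned by the device
really is different.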

Now *with* your patch (page size is 4096):

# dd if=/dev/mapper/x iflag=direct bs=512 count=32 | sha256sum 
dd: reading `/dev/mapper/x': Invalid argument

# dd if=/dev/mapper/x iflag=direct bs=4096 count=4 | sha256sum 
4f4271e7799097b6e0ed66d81a8341163b8a5a06a2c57f50b930d429a7aa94d1  -

# dd if=/dev/mapper/x iflag=direct bs=8192 count=2 | sha256sum 
17cf9897059800f5b43af38766471048b872d20a0f565ee553a351b1a6251141  -

So a block size of 512B causes the operation to fail (ok - IO hints).
IO of the encryption block size, and of a multiple of it, now
apparently returns something different.

This is probably not what we want...
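
To illustrate why data written with 512B units cannot simply be read back
with larger units, here is a toy CBC model in Python. Everything in it is an
assumption for illustration: the cipher is a throwaway Feistel construction
(not AES, not secure), iv_for() is a hypothetical stand-in for ESSIV, and I
assume the IV of a unit is derived from its first sector number. It is not
the code path of the patch.

```python
import hashlib

BLOCK = 16     # cipher block size in bytes, as for AES
SECTOR = 512


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, rnd, half):
    # Round function of the toy Feistel cipher.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def enc_block(key, blk):
    l, r = blk[:8], blk[8:]
    for rnd in range(4):
        l, r = r, _xor(l, _f(key, rnd, r))
    return l + r


def dec_block(key, blk):
    l, r = blk[:8], blk[8:]
    for rnd in reversed(range(4)):
        l, r = _xor(r, _f(key, rnd, l)), l
    return l + r


def iv_for(key, sector):
    # Hypothetical per-unit IV derived from key and first sector number.
    return hashlib.sha256(key + b"iv" + sector.to_bytes(8, "little")).digest()[:BLOCK]


def cbc_encrypt(key, data, unit):
    out = bytearray()
    for start in range(0, len(data), unit):
        prev = iv_for(key, start // SECTOR)      # chain restarts per unit
        for off in range(start, start + unit, BLOCK):
            prev = enc_block(key, _xor(data[off:off + BLOCK], prev))
            out += prev
    return bytes(out)


def cbc_decrypt(key, data, unit):
    out = bytearray()
    for start in range(0, len(data), unit):
        prev = iv_for(key, start // SECTOR)
        for off in range(start, start + unit, BLOCK):
            blk = data[off:off + BLOCK]
            out += _xor(dec_block(key, blk), prev)
            prev = blk
    return bytes(out)


key = b"demo key, not a real one"
plain = bytes(range(256)) * 64            # 16 KiB = 32 sectors
stored = cbc_encrypt(key, plain, 512)     # "old" on-disk format

print(cbc_decrypt(key, stored, 512) == plain)    # True
print(cbc_decrypt(key, stored, 4096) == plain)   # False
```

In this toy model only the first cipher block of each sector that is not at
the start of a 4k unit comes back wrong, because the CBC chain is no longer
restarted there. The real patch may differ in detail, but the point stands:
the on-disk format depends on the encryption unit size.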

(Note that I did not even test cross-encryption-block operations.)

Even if this is somehow solved, many other problems remain:

- we need to extend the mapping table parameters so that the block size
is configurable (an encrypted device image must be readable
on a system with a different page size; I have e.g. a Sparc with 8k page size).
(This will be needed for other extensions so it is not a real
problem, it just needs to be done first.)

- you need to store this block size info in the header,
for LUKS it means using a new LUKS header version
(requiring a parameter on the command line is dangerous - it must be enforced)

I would really prefer not to support this yet and first try to optimize
the crypto layer in such a way that it can process 512B blocks more
efficiently (of course it will not fix bad hw, but it can help batching
sector encryption, maybe using some hints, dunno).


