[dm-crypt] The future of disk encryption with LUKS2

f-dm-c at media.mit.edu
Sun Feb 7 08:09:58 CET 2016

This discussion of multiple headers wrt resizing seems to be
overcomplicating the issue, while potentially breaking LUKS
for my use case.

First, overcomplicating:

Putting a backup header at the very end of the device, as we've seen,
requires all sorts of gymnastics to ensure that the right things
happen with updates and resizes.  But what are we really trying to
fix here?  Accidental header smashes?  In that case, might I suggest
something much simpler:

(a) If the underlying container is smaller than some figure (100 meg?),
just use a single header.  You could back up -the whole container-
in seconds, much less the header.

(b) If it's bigger, put a second header maybe 1 meg after the first
header, and start the encrypted container after that second header.

The idea here is to allow tiny containers for those cases which make
sense (if there are any), without chewing up several extra meg for a
backup header.  But if the container is larger than (say) 100 meg, the
extra space rapidly becomes completely negligible.  We don't have to
put the header at the end of the device---just keeping it several meg
away from things that are likely to smash it is fine.  (Something that
decides to eat tens of meg into your filesystem is rare, and will turn
the FS to swiss cheese anyway; at that point you're most likely going
back to your backups.)
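
To make (a) and (b) concrete, here is a minimal sketch of the layout
arithmetic in Python.  It is not anything cryptsetup implements; the
function names, the 1 MiB header size, the 1 MiB gap, and the 100 MiB
threshold are all assumptions taken from the rough figures above.

```python
MIB = 1024 * 1024

def plan_layout(container_bytes, header_bytes=MIB,
                threshold_bytes=100 * MIB, gap_bytes=MIB):
    """Return (header_offsets, payload_offset), all in bytes.

    Hypothetical layout rule, not an existing LUKS format:
      (a) small containers get a single header at offset 0;
      (b) larger ones get a second copy gap_bytes past the end of
          the first, and the payload starts after that second copy.
    """
    if container_bytes < threshold_bytes:
        header_offsets = [0]
    else:
        header_offsets = [0, header_bytes + gap_bytes]
    payload_offset = header_offsets[-1] + header_bytes
    return header_offsets, payload_offset

def overhead_fraction(container_bytes, **kwargs):
    """Fraction of the container that sits in front of the payload."""
    _, payload_offset = plan_layout(container_bytes, **kwargs)
    return payload_offset / container_bytes

if __name__ == "__main__":
    for size in (50 * MIB, 200 * MIB, 8 * 1024 * 1024 * MIB):
        offsets, payload = plan_layout(size)
        print(size // MIB, "MiB:", offsets, payload,
              overhead_fraction(size))
```

Under these assumptions, every container at or above the threshold
gets the same header offsets and the same payload offset no matter
how large it is, so growing it never moves a header.  The sketch
assumes the layout is chosen once, at creation time.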

This doesn't solve the how-to-update-correctly problem (since we're
still talking two or more headers), but it -does- mean that enlarging
the partition -does not- require relocating a backup header!  This,
in turn, means no pressure to remove the ability to resize (most
especially, to -grow-) the container, which is very important to my
use case.

My use case:

I crucially depend on LUKS being able to grow to a larger container
-without having to throw away the existing filesystem-.  Why?  Because
one of my most-important use cases is a giant encrypted filesystem
which holds dirvish vaults.  These vaults are -very- extensively
hardlinked, not only forwards and backwards in time, but also sideways
across vaults, because I run faster-dupemerge across them to squeeze
out identical copies of files from similar hosts and from movement of
files from one host to another.

For example, I'm looking at one FS right now with 8 TB in it,
consisting of about 845 million inodes, with a huge number of
those reflecting files with tens of hardlinks or more.

This FS is built on top of LUKS, on top of LVM, on top of RAID.
I have enlarged it by either migrating to larger disks, or by
adding disks and adding LV's, then growing LUKS to cover them
(which it does automatically, since it resizes itself to the
size of the underlying device), and then growing the filesystem.
[I can do this online, since the filesystem is ext4.]
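
For anyone unfamiliar with the sequence, here is a rough sketch of
the grow steps as a small Python wrapper.  The device path, mapping
name, and size in the example are made up, and I'm assuming the LV
is extended with lvextend and that an already-open mapping is told
to pick up the new size with an explicit "cryptsetup resize".  It
only prints the commands unless you turn dry_run off.

```python
import subprocess

def grow_commands(lv_path, extra, mapping_name):
    """Build the three grow steps, bottom layer first.

    lv_path, extra and mapping_name are caller-supplied; the values
    used in the example below are hypothetical.
    """
    return [
        # 1. Grow the logical volume underneath LUKS.
        ["lvextend", "-L", f"+{extra}", lv_path],
        # 2. With no size given, the mapping is resized to cover
        #    the whole underlying device.
        ["cryptsetup", "resize", mapping_name],
        # 3. Grow the ext4 filesystem to fill the mapping.
        ["resize2fs", f"/dev/mapper/{mapping_name}"],
    ]

def grow(lv_path, extra, mapping_name, dry_run=True):
    """Print the steps, or run them in order, stopping on failure."""
    for cmd in grow_commands(lv_path, extra, mapping_name):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    grow("/dev/vg0/backup", "2T", "vaults")
```

The order matters: each layer can only grow into space that the
layer below it has already been given.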

Because there are so many hardlinks and the filesystem is so large, it
is NOT POSSIBLE to copy this filesystem at the file level to another
device.  I can point at previous discussions from years ago on other
lists detailing the difficulties, but here's an outline:  Neither
rsync nor tar nor cpio can walk the entire filesystem without eating
enormous quantities of RAM, which cannot physically fit in the
machine, which means enormous paging, which means runtime of months
if not years.  It is also infeasible to move the filesystem in slices,
because that would break all the hardlinks between the slices, and
recomputing them would both be computationally expensive -and- alter
directory write times in undesirable ways.

When I have migrated this FS to different hardware in the past, I've
either done a block-level copy with dd (after dismounting it, of
course), or swapped RAID devices underneath it.  And then, if I'm going
to larger disks (the primary reason for moving it, especially before
it was also a RAID), I've resized, including resizing the LUKS layer,
of course.

If LUKS lost the ability to resize in place, LUKS would become useless
to me for this workload.  "You should copy it elsewhere and redo LUKS
and then copy back" is simply a nonstarter.  At the very best, that
would mean a block-level copy of the whole thing, recreation of LUKS,
and copy back, using either dd or playing RAID games, while the entire
FS was down.  That's several days.  But the current scheme, where LUKS
can resize in place, means I can (if I have to) back up the filesystem
via RAID (e.g., add disks, sync, remove) while it's still up, then
resize, and see no downtime at all.
