[dm-crypt] KISS (was disappearing luks header and other mysteries)

Sven Eschenberg sven at whgl.uni-frankfurt.de
Sun Sep 21 17:38:13 CEST 2014

On Sun, September 21, 2014 16:29, Marc Ballarin wrote:
> Am 21.09.2014 um 11:58 schrieb Arno Wagner:
>> On Sat, Sep 20, 2014 at 02:29:43 CEST, Sven Eschenberg wrote:
>>> Well, it is not THAT easy.
>> Actually it is.
>>> If you want resilience/availability, you'll need RAID. Now what do you
>>> put on top of the RAID when you need to slice it?
>> And there the disaster starts: Don't slice RAID. It is not a good
>> idea.
>>> Put a disklabel/partition on top of it and stick with a static setup,
>>> or use LVM, which can span multiple RAIDs (and types), supports
>>> snapshotting, etc. Depending on your needs and usage you will end up
>>> with LVM in the end. If you want encryption, you'll need a crypto layer
>>> (or you put it in the FS alongside volume slicing). Partitions
>>> underneath the RAID are not necessary if the RAID implementation can
>>> subslice physical devices and arrange for different levels on the same
>>> disk. Except, unfortunately, when you need a bootloader.
>>> I don't see any alternative which would be KISS enough, except merging
>>> the layers to avoid collisions due to stacking order etc. Simple usage
>>> and debugging for the user, but the actual single merged layer would be
>>> anything but KISS.
>> You miss one thing: LVM breaks layering and rather badly so. That
>> is a deadly sin. Partitioning should only ever be done on
>> monolithic devices. There is a good reason for that, namely that
>> partition-raid, filesystems and LUKS all respect partitioning per
>> default, and hence it actually takes work to break the container
>> structure.
> Hi,
> I don't see how LVM breaks layering. In theory it replaces partitioning,
> but in practice it is still a very good idea to use one single partition
> per visible disk as a (more or less) universally accepted way to say
> "there is something here, stay away!". The same applies to LUKS or plain
> filesystems. No reason to put them on whole disks.
> The megabyte or so that you sacrifice for the partition table (plus
> alignment) is well spent. Partitions do not cause any further overhead,
> as unlike device mapper, they do not add a layer to the storage stack
> (from a user's POV they do, but not from the kernel's).

I always wondered why there weren't arbitrary slicing schemes. In the very
beginning firmware blindly loaded code from sector 0, so why waste code space
on slicing metadata? Admittedly, having it alongside the code made
things a little easier decades back.

> Note that there is little reason to use mdraid for data volumes nowadays
> (that includes "/" when using a proper initramfs). LVM can handle this
> just fine and unlike mdadm has not seen any major metadata changes, or
> even metadata location changes, in recent years. But I'm not sure it
> can offer redundancy on boot devices. In theory it should, if the boot
> loader knows how to handle it, but I have never tested it. This is
> basically the "merging of layers" that Sven talked about.

I overlooked that. I guess I'll have to look into this, maybe I can
eliminate mdraid in the long run. The bootloader itself is the problem.
If the firmware were extensible in a sane way, you'd add a module that
takes care of reading the metadata and providing access to the actual
bootloader, or you could have the bootloader within the firmware. That's
even true for (U)EFI where extensions (haha) need to reside on an ESP
readable by the firmware. Quite insane.

> Btrfs and ZFS push this even further, and while they are complex beasts,
> they actually eliminate a lot of complexity for applications and users.
> Just look at how simple, generic and cheap it becomes to create a
> consistent backup by using temporary snapshots, or to preserve old
> versions by using long lived snapshots. This can replace application
> specific backup solutions, that cost an insane amount of money and whose
> user interfaces are based on the principles of Discordianism (so that
> training becomes mandatory).
> Also: Stay away from tools like gparted or parted. Resizing and, above
> all, moving volumes is bound to cause problems. For example, looking at
> John Wells' issue from August 18th (especially mail
> CADt3ZtscbX-rmMt++aXme9Oiu3sxiBW_MD_CGJM_b=t+iMaerQ), the most likely
> culprit really wasn't LVM, but parted. It seems to have set up scratch
> space where it should not have.
> Once resizing or volume deletions/additions are necessary, LVM is
> actually the much simpler and more robust solution. Resizing as well as
> deletions and additions in LVM are well defined, robust and even
> undoable (as long as the filesystem was not adjusted/created). At work,
> we use that on 10,000s of systems.

Until now I have always been quite lucky when resizing and doing other
transformations (including parted operations and hex editing metadata),
but indeed there's no safety net and no double bottom. Once you go there,
there's no turning back when things start to break.

BTW, it has been quite some time since I looked deeply into LVM. Can LVM
nowadays 'defrag' LVs? Say I grow LVs and they end up in different PE
groups on the same PV, can I merge these groups down to get a single
contiguous area?
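
To make the question concrete, here is a rough sketch of what I mean. It
is plain Python and only a toy model, not LVM's real data structures: the
Segment type, both helper functions and the /dev/sda2 name are made up for
illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One run of physical extents (PEs) backing part of an LV (toy model)."""
    pv: str        # physical volume name (hypothetical)
    start_pe: int  # first PE of the run on that PV
    length: int    # number of PEs in the run


def is_contiguous(segments: List[Segment]) -> bool:
    """True if the segments, in logical order, form one unbroken run on one PV."""
    for prev, cur in zip(segments, segments[1:]):
        if cur.pv != prev.pv or cur.start_pe != prev.start_pe + prev.length:
            return False
    return True


def merged_layout(segments: List[Segment], new_start: int) -> List[Segment]:
    """Layout a 'defrag' would aim for: same PV, same size, a single run."""
    total = sum(s.length for s in segments)
    return [Segment(segments[0].pv, new_start, total)]


# An LV that was grown twice and now sits in three separate PE groups.
lv = [
    Segment("/dev/sda2", 0, 100),
    Segment("/dev/sda2", 250, 50),
    Segment("/dev/sda2", 400, 25),
]

fragmented = not is_contiguous(lv)
target = merged_layout(lv, new_start=500)
```

Here lv is the fragmented state and target is the single contiguous area I
would like to end up with, assuming there is a free run of that size on the
PV.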

> Lastly, it should be noted that complex storage stacks like
> MD-RAID->LVM->LUKS->(older)XFS can have reliability issues due to stack
> exhaustion (you can make it even worse by adding iSCSI, virtio,
> multi-path and many other things to your storage stack). When and if
> problems occur depends strongly on the architecture, the low-level
> drivers involved and the kernel version, but it is likely to happen at
> some point. Kernel 3.15 defused this by doubling the stack size on x86_64.
> (btw: That, and not bad memory, might actually be the most common cause
> behind FAQ item 4.3).

That's an interesting bit of info, luckily I never ran into this...

> Regards,
> Marc


