[dm-crypt] (More) Questions about LUKS / LVM

Arno Wagner arno at wagner.name
Mon Oct 3 12:55:23 CEST 2011

On Mon, Oct 03, 2011 at 08:17:36AM +0200, Luca Berra wrote:
> On Tue, Sep 20, 2011 at 04:14:31PM +0200, Arno Wagner wrote:
> >Indeed. Especially with the incredible mess that MD superblock
> >positioning is. I only use superblock format 0.9 for that
> >reason. Then I at least know it is at the end and that the kernel
> >can auto-detect. They should have let it stay there. That would
> >have been massively better than the insanity of having 3 possible
> >positions.
> Please, before speaking against something do some research.

Why do you assume I have not done research?

> There is no reason on earth to use 0.90 superblocks nowadays.

I do not agree.

> Even if it seems easier to do that with in-kernel autodetection and
> being able to access two halves of a mirror like they were a single
> disk, the drawbacks are unacceptable.

And I do not agree to that either.

> In kernel autodetection is not smart enough and can backfire, just plug
> a usb or e-sata device with an md superblock with the same md minor
> number as your root mirror.

Why ever would I have an MD element on a removable disk, except
on purpose and being very careful with it?

And are you telling me the kernel _lies_ to me when it
tells me right during boot, that it is using the UUID 
for array assembly? That would be incredibly bad and outright 
malicious! The following certainly had me assume that UUIDs 
are used as basis for assembly:

md: autorun ...
md: considering sdb9 ...
md:  adding sdb9 ...
md: sdb8 has different UUID to sdb9
md: sdb7 has different UUID to sdb9
md: sdb6 has different UUID to sdb9
md: sdb5 has different UUID to sdb9
md: sdb2 has different UUID to sdb9
md: sdb1 has different UUID to sdb9
md: sdc10 has different UUID to sdb9
md:  adding sdc9 ...
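
For readers who want to check their own boot log the same way, here is a
minimal sketch that pulls the "adding" and "different UUID" decisions out
of such an excerpt. The message format is assumed from the lines quoted
above only; other kernel versions may word these messages differently,
and the function name is made up for illustration.

```python
import re

# Excerpt copied verbatim from the boot log quoted above.
LOG = """\
md: autorun ...
md: considering sdb9 ...
md:  adding sdb9 ...
md: sdb8 has different UUID to sdb9
md: sdb7 has different UUID to sdb9
md: sdb6 has different UUID to sdb9
md: sdb5 has different UUID to sdb9
md: sdb2 has different UUID to sdb9
md: sdb1 has different UUID to sdb9
md: sdc10 has different UUID to sdb9
md:  adding sdc9 ...
"""

def summarize_autorun(log):
    """Return (devices added to the array, devices skipped for UUID mismatch)."""
    added, skipped = [], []
    for line in log.splitlines():
        m = re.match(r"md:\s+adding (\S+)", line)
        if m:
            added.append(m.group(1))
            continue
        m = re.match(r"md:\s+(\S+) has different UUID to \S+", line)
        if m:
            skipped.append(m.group(1))
    return added, skipped

added, skipped = summarize_autorun(LOG)
print("added:  ", added)
print("skipped:", skipped)
```

In this excerpt only sdb9 and sdc9 end up in the same array; every other
candidate is skipped because its UUID differs.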

> It has been left as is for historical reasons, the proper fix is using
> an initramfs, without bloating the kernel with unneeded code.

I do not use initramfs. It adds massive complexity and
opacity, and increases maintenance effort without good
reasons, at least in my set-ups.

I do see the distribution-kernel benefit, but I have not used
them since the early days of kernel 1.1.x.

I also do not use modular kernels, except where no other option
exists, again to decrease complexity. 

> accessing a raid member like it was a non raid device is also a bad
> idea. it is better to force assembly of a degraded array.
> Putting metadata at the end also raises a lot of confusion with
> partition tables, which are at the start of the disk.
> If you create a partition ending at the end of the disk, then add the
> partition to an md array, 0.9 metadata would be at the same location
> as if you added the whole device to the array.

So what? 
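
For reference, the overlap described in the quoted paragraph can be
reproduced with a little arithmetic. This is a sketch under stated
assumptions: the 0.90 superblock is taken to sit at the last 64 KiB
boundary that leaves at least 64 KiB to the end of the device (my reading
of the kernel's MD_NEW_SIZE_SECTORS macro), and the disk size and
partition start below are hypothetical example values. The two positions
coincide only when the partition start is itself a multiple of 64 KiB.

```python
SECTOR_BYTES = 512
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR_BYTES  # 64 KiB = 128 sectors

def sb090_offset(size_sectors):
    """Sector offset of a 0.90 superblock, relative to the device start."""
    aligned = size_sectors & ~(MD_RESERVED_SECTORS - 1)  # round down to 64 KiB
    return aligned - MD_RESERVED_SECTORS

disk_sectors = 976_773_168   # hypothetical disk size in 512-byte sectors
part_start = 2048            # hypothetical partition start (1 MiB aligned)
part_sectors = disk_sectors - part_start  # partition runs to the end of the disk

# Absolute position if the whole disk is the array member.
whole_disk_sb = sb090_offset(disk_sectors)
# Absolute position if the partition is the array member.
partition_sb = part_start + sb090_offset(part_sectors)

print(whole_disk_sb, partition_sb, whole_disk_sb == partition_sb)
```

With these values both computations land on the same absolute sector, so
a scan of the whole disk and a scan of the partition find the same
superblock.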

> If you create a raid 0/5/6 array using whole devices then partition it, the
> kernel will see a broken partition table on one or more of the component
> devices. This extends to any other kind of data besides partition.
> Add udev and event-driven activation of disks (especially in its first
> very early stages) and people started having the weirdest problems.

Well, if people are not competent to remove an md superblock, that
can of course have any kind of bad effects. However, this is not a 
low-competence area to be messing around in.

Incidentally, I think udev also is a mess that creates significantly
more problems than it solves. Automagic gone wrong. Replacing something
simple with something complex needs a _very_ _good_ reason, especially
in infrastructure. It seems to me some people in kernel design have
never heard of the "Second System Effect" and are making beginner's
mistakes. (As designers, not as coders.)

> Then there are limitations on number of components and array sizes which
> are possible to reach, and have already been reached by a number of
> users.

I agree on that. Design mistake, the same old story all around, 
people never learn or look more than a few years ahead. Or back. 

> The only reason nowadays to keep metadata at the end of a device is a
> limitation of grub 1, which cannot boot otherwise.

Your criticism of kernel autodetection is unconvincing. I have
been using it for about a decade now in numerous installations.
I like it and never had any problems with it.

Give me autodetection with the other formats and maybe I will switch. 

And let me add that conceptually, automatic RAID assembly
is definitely the task of the RAID controller. Here that
would be the md driver in the kernel (or the kernel itself),
not some external scripting or tool. 

> The latter case is covered by metadata 1.0, which addresses most of the
> limitations of 0.9, still keeping metadata at end in order to please
> grub.
> Then in order to protect the innocent, a schema with metadata at the
> start was implemented, first attempt was 1.1 (which imho was a bad
> idea).
> It puts metadata at the very beginning of disk, which poses metadata at
> risk of being overwritten (since that location is often used by mbr and
> partition table).

And from the number of people that manage to trash their LUKS 
headers, this is a real risk. However, the fix with the offset
is just a nightmare as it breaks layering.

Never break something fundamental unless there is absolutely 
no other option! 

The principles of least surprise and simplicity are fundamental 
to all engineering and ignoring them is a recipe for disaster!

Fixing one bad decision (end placement) with another bad
decision (start placement which breaks grub) and then a third 
bad decision (offset placement that still breaks grub and in 
addition breaks layering) shows fundamental problems in the 
dev team. 

Better to lose some md components now and then, than to lose the
overall hierarchical layering and surprise competent folks.
That is how you create huge disasters. The history of engineering
failure is full of them.

The incompetent will always be surprised, even by obvious things. 
Trying to fix that is just stupid. 

> In order to avoid that metadata 1.2 was devised, it is stored at
> beginning of disk, with an offset from the start, in order for it to be
> somewhat protected. There is also room on the disk to store some form of
> boot code.
> Consensus now is use metadata 1.2 for almost everything except for
> mirrors containing /boot, which need to use 1.0.
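
To make the placements under discussion concrete, here is a rough sketch
of where each format puts its superblock, in 512-byte sectors from the
start of the member device. The rules are my understanding of the md
driver (0.90: 64 KiB aligned near the end; 1.0: 4 KiB aligned, at least
8 KiB before the end; 1.1: at the start; 1.2: 4 KiB from the start).
Treat them as assumptions and verify against a real device, for example
with mdadm --examine. The function name and the device size are
hypothetical.

```python
def superblock_sector(version, size_sectors):
    """Sector offset of the md superblock for a given metadata version."""
    if version == "0.90":
        # Round down to a 64 KiB (128-sector) boundary, then step back 64 KiB.
        return (size_sectors & ~127) - 128
    if version == "1.0":
        # Step back 8 KiB (16 sectors), then round down to a 4 KiB boundary.
        return (size_sectors - 16) & ~7
    if version == "1.1":
        return 0   # very start of the device
    if version == "1.2":
        return 8   # 4 KiB from the start
    raise ValueError("unknown metadata version: " + version)

size = 976_773_168  # hypothetical member size in sectors
for v in ("0.90", "1.0", "1.1", "1.2"):
    print(v, superblock_sector(v, size))
```

Four formats give three distinct regions of the device (end, start,
start plus offset), which is the point being argued over here.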

And there is the mess. It would be very hard to do this any
worse and have it still working. To misquote: "Those that
prefer a little safety over architectural soundness 
shall neither have safety nor architectural soundness."

It is far better to have one format in one place or at least 
several similar formats in the same place, than having them 
all over the device, even if that one place is not optimal.

We have seen people here desperately trying to figure out
where their MD superblocks were, all the while juggling
partitioning, LUKS headers and LVM headers as well, with some
of them partially or fully overwritten and, to make matters
worse, opaque magic hidden in the initramfs. At least
the other superblocks are all in a consistent place, only the
md-folks have gone off the deep end.

Sorry, my original comment stands. This is a mess and shows
bad decision making, possibly due to inexperience.

At this time, format 0.90 is the best option (or rather
"least bad") and the efforts of fixing it have been an
impressive failure.

Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno at wagner.name 
GnuPG:  ID: 1E25338F  FP: 0C30 5782 9D93 F785 E79C  0296 797F 6B50 1E25 338F
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

If it's in the news, don't worry about it.  The very definition of 
"news" is "something that hardly ever happens." -- Bruce Schneier 
