[dm-crypt] Latency penalty

Ivan Babrou ibobrik at gmail.com
Fri Sep 22 00:34:41 CEST 2017


Hello,

We were looking at LUKS performance and found some disturbing numbers on SSDs.

* Linear write performance

We took 2 identical disks, encrypted one of them, put XFS on both and
tested linear write speed with fio:

[rewrite]
size=200g
bs=1m
rw=write
direct=1
loops=10000

Without LUKS we get 450MB/s write; with LUKS we get half of that,
225MB/s.

* Linear read performance

To avoid hitting any XFS bugs we just read 1GB from the raw device and
from the corresponding LUKS device, both with direct I/O. We tried
different block sizes too. Here's the script we used:

#!/bin/bash -e

SIZE=$((1024 * 1024 * 1024))

for power in $(seq 12 30); do
  BS=$((2 ** $power))
  COUNT=$(($SIZE / $BS))
  # $(NF-1) is the throughput figure on dd's summary line
  TIME_DIRECT=$(sudo dd if=/dev/sdd of=/dev/null bs=$BS count=$COUNT iflag=direct 2>&1 | tail -n1 | awk '{ print $(NF-1) }')
  TIME_LUKS=$(sudo dd if=/dev/mapper/luks-sdd of=/dev/null bs=$BS count=$COUNT iflag=direct 2>&1 | tail -n1 | awk '{ print $(NF-1) }')
  echo -e "${BS}\t${TIME_DIRECT}\t${TIME_LUKS}"
done

And the output (block size in bytes, then raw and LUKS read speed in
MB/s as reported by dd):

BS          raw   LUKS
4096        59.5  52.6
8192        103   91.0
16384       158   139
32768       227   181
65536       287   228
131072      354   243
262144      373   251
524288      428   307
1048576     446   327
2097152     474   396
4194304     485   431
8388608     496   464
16777216    499   483
33554432    504   498
67108864    508   503
134217728   508   506
268435456   510   509
536870912   511   511
1073741824  512   512

Here are the results on the graph: https://i.imgur.com/yar1GSC.png
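
To make the gap easier to read off the numbers, the LUKS column can be
expressed as a percentage of the raw column. This is only a small sketch,
assuming the three-column output of the script above is piped in or saved
to a file; the function name luks_ratio is made up for illustration and
was not part of the original test.

```bash
#!/bin/bash
# luks_ratio: hypothetical helper, not part of the original test.
# Reads "block_size raw_speed luks_speed" lines on stdin and appends
# the LUKS speed as a percentage of the raw speed.
# Lines whose second field is not a positive number are skipped.
luks_ratio() {
  awk 'NF >= 3 && $2 + 0 > 0 {
    printf "%s\t%s\t%s\t%.1f%%\n", $1, $2, $3, 100 * $3 / $2
  }'
}

# Three rows taken from the output above.
printf '4096 59.5 52.6\n131072 354 243\n1073741824 512 512\n' | luks_ratio
# prints:
# 4096        59.5  52.6  88.4%
# 131072      354   243   68.6%
# 1073741824  512   512   100.0%
```

On these rows the relative penalty is largest around the 128KB block
size and disappears at 1GB blocks.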

If I redo this test with a 1GB file on an actual filesystem:

#!/bin/bash -e

SIZE=$((1024 * 1024 * 1024))

for power in $(seq 12 30); do
 BS=$((2 ** $power))
 TIME_DIRECT=$(sudo dd if=/mnt/sda/zeros of=/dev/null bs=$BS iflag=direct 2>&1 | tail -n1 | awk '{ print $(NF-1) }')
 TIME_LUKS=$(sudo dd if=/mnt/sdd/zeros of=/dev/null bs=$BS iflag=direct 2>&1 | tail -n1 | awk '{ print $(NF-1) }')
 echo -e "${BS}\t${TIME_DIRECT}\t${TIME_LUKS}"
done

And the output (block size in bytes, then plain and LUKS read speed in
MB/s as reported by dd):

BS          plain LUKS
4096        73.5  54.8
8192        123   86.2
16384       189   130
32768       251   176
65536       302   226
131072      345   239
262144      373   243
524288      395   287
1048576     435   297
2097152     438   373
4194304     457   410
8388608     464   429
16777216    469   448
33554432    474   459
67108864    477   463
134217728   478   467
268435456   480   469
536870912   480   470
1073741824  481   471

Here are the results on the graph: https://i.imgur.com/OQk6kDo.png

If I do 1MB block reads from the raw device (sda) and from the LUKS
block device (sdd), then I see the following:

ivan@36com1:~$ iostat -x -m -d 1 /dev/sd* | grep -E '^(Device:|sda|sdd)'
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.01     0.00   76.84    0.34    20.32    16.86   986.70     0.60    7.77    1.84 1337.82   0.64   4.94
sdd               0.03     0.00  379.40    0.83    42.57    33.79   411.32     1.64    4.31    1.66 1214.03   0.31  11.87
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00 1002.00    0.00   501.00     0.00  1024.00     1.50    1.50    1.50    0.00   0.97  97.60
sdd               0.00     0.00  655.00    0.00   327.50     0.00  1024.00     1.00    1.53    1.53    0.00   1.01  66.00
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  999.00    0.00   499.52     0.00  1024.05     1.51    1.51    1.51    0.00   0.98  97.80
sdd               0.00     0.00  650.00    0.00   325.00     0.00  1024.00     1.00    1.53    1.53    0.00   1.01  65.60
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  983.00    0.00   491.48     0.00  1023.95     1.52    1.54    1.54    0.00   1.00  98.30
sdd               0.00     0.00  648.00    0.00   324.00     0.00  1024.00     1.00    1.54    1.54    0.00   1.01  65.30
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  979.00    0.00   490.00     0.00  1025.05     1.51    1.54    1.54    0.00   1.00  98.10
sdd               0.00     0.00  646.00    0.00   323.00     0.00  1024.00     0.99    1.54    1.54    0.00   1.01  65.20
^C

The end results for reading the full 240GB are 509MB/s (raw) and
360MB/s (LUKS). This is a pretty hard hit.
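
One way to read the steady-state iostat samples is through Little's law
(requests in flight = request rate * time per request). The sketch below
recomputes avgqu-sz from r/s and await for the second sample; the
function name inflight is made up for illustration. The per-request
latency comes out about the same on both devices (~1.5ms), but sdd has
about one request in flight where sda has about 1.5. Treat that as a
reading of these particular samples, not as a confirmed diagnosis.

```bash
#!/bin/bash
# inflight: hypothetical helper, not part of the original test.
# Given reads per second and the average read latency in milliseconds,
# estimate the average number of requests in flight
# (Little's law: L = rate * time).
inflight() {
  awk -v rps="$1" -v await_ms="$2" \
    'BEGIN { printf "%.2f\n", rps * await_ms / 1000 }'
}

inflight 1002 1.50   # sda, second iostat sample; prints 1.50
inflight 655 1.53    # sdd, second iostat sample; prints 1.00
```

Both values match the avgqu-sz column that iostat reported for that
sample (1.50 and 1.00).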

* Random write performance

The following fio scenario was used:

[rewrite]
size=10g
bs=64k
rw=randwrite
direct=1
numjobs=20
loops=10000

The raw block device gave us ~320MB/s; LUKS only does ~40MB/s.

* In-memory results

I made two 10GB loopback devices in tmpfs and formatted one of them as
LUKS. The plain device can read at 4.5GB/s, the LUKS device at
0.85GB/s. This is a big difference, but it doesn't really explain the
results from the physical SSD.

We are running kernel 4.9, but 4.4 seems to have the same behavior. We
tried a completely different SSD model and it had the same behavior
(352MB/s vs 274MB/s linear read). The spinning disks we have can do
under 200MB/s linear read and do not exhibit the issue.

Are these numbers expected? Is there any way to improve this situation?

Thanks!

