CephNotes

Some notes about Ceph
Laurent Barbe @SIB

RBD journal offloading

If you are using the rbd journaling feature (for example for rbd mirroring), it can be worthwhile in some cases to offload the journal to a dedicated pool, for example if your rbd pool is on hdd drives and you also have ssd or nvme drives.

external rbd journal

There are two ways to change the rbd journaling pool:

  • Cluster wide: add the option rbd_journal_pool to your ceph.conf or set it with ceph config. This option cannot be changed dynamically: it must be applied on all OSDs, and all of them must be restarted. If your cluster is already deployed and uses journaling, updating this configuration can be difficult.

  • Per image: with the --journal-pool option, given when the journaling feature is enabled on the image
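As a rough illustration, the two approaches could look like the sketch below. It is only a sketch under assumptions: the pool name rbd-journals, the image name vm-test-disk-0, the 5G size, the helper function names, and the choice of the global configuration section are all hypothetical and should be adapted to your cluster.

```shell
# Hypothetical helper names; pool and image names are examples only.

# Cluster wide: set the default journal pool in the configuration database.
# Using the "global" section is an assumption, adapt it to your setup.
set_default_journal_pool() {
    # $1 = journal pool
    ceph config set global rbd_journal_pool "$1"
}

# Per image: create a new image whose journal lives in a dedicated pool.
create_image_with_journal_pool() {
    # $1 = pool/image, $2 = size, $3 = journal pool
    rbd create "$1" --size "$2" \
        --image-feature layering \
        --image-feature exclusive-lock \
        --image-feature journaling \
        --journal-pool "$3"
}

# Usage:
#   set_default_journal_pool rbd-journals
#   create_image_with_journal_pool rbd-pve-mirrored/vm-test-disk-0 5G rbd-journals
```

The per image form is the one used in the rest of this post, because it does not depend on a cluster wide setting.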

Example of rbd image with mirroring

Let's look at an example rbd image with the default journal storage:

$ rbd info rbd-pve-mirrored/vm-1517-disk-0
rbd image 'vm-1517-disk-0':
    size 5 GiB in 1280 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 378952d8c4cf0c
    block_name_prefix: rbd_data.378952d8c4cf0c
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    op_features:
    flags:
    create_timestamp: Mon Mar  6 11:14:12 2023
    access_timestamp: Mon Mar 13 08:43:01 2023
    modify_timestamp: Mon Mar 13 09:13:37 2023
    journal: 378952d8c4cf0c        # <---
    mirroring state: enabled
    mirroring mode: journal        # <---
    mirroring global id: 9b968a20-91a5-47a0-85af-84d3d79111b8
    mirroring primary: true

The output shows the mirroring mode (mirroring mode: journal) and the journal id (journal: 378952d8c4cf0c). For more detail, use rbd journal info:

$ rbd journal info --pool rbd-pve-mirrored --image vm-1517-disk-0
rbd journal '378952d8c4cf0c':
    header_oid: journal.378952d8c4cf0c                  # <--- journal header
    object_oid_prefix: journal_data.51.378952d8c4cf0c.  # <--- journal data
    order: 24 (16 MiB objects)
    splay_width: 4

With rados ls, you can see these objects in the current rbd pool (the header journal.378952d8c4cf0c and the data objects prefixed with journal_data.51.378952d8c4cf0c.):

$ rados -p rbd-pve-mirrored ls | grep '^journal_data.51.378952d8c4cf0c.'
journal_data.51.378952d8c4cf0c.2
journal_data.51.378952d8c4cf0c.1
journal_data.51.378952d8c4cf0c.0
...
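To avoid copying the prefix by hand, the lookup can be scripted. The following is a minimal sketch, assuming the output layout of rbd journal info shown above; the function names journal_prefix and list_journal_objects are hypothetical.

```shell
# Print the journal data object prefix of an image, taken from the
# object_oid_prefix field of "rbd journal info".
journal_prefix() {
    # $1 = pool, $2 = image
    rbd journal info --pool "$1" --image "$2" \
        | awk '$1 == "object_oid_prefix:" { print $2 }'
}

# List the journal data objects of an image found in a given pool.
list_journal_objects() {
    # $1 = image pool, $2 = image, $3 = pool to search (defaults to $1)
    prefix=$(journal_prefix "$1" "$2") || return 1
    [ -n "$prefix" ] || return 1
    # -F treats the prefix as a fixed string, so its dots are not wildcards.
    rados -p "${3:-$1}" ls | grep -F -- "$prefix"
}

# Usage:
#   list_journal_objects rbd-pve-mirrored vm-1517-disk-0
```

Note that grep -F matches the prefix anywhere in the object name rather than only at the start, which is normally sufficient given how specific the prefix is.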

Migrate the journal

If the image is mirrored, the operation needs to be done on the primary side.

Create a new pool to store the journals, e.g. using a replicated crush rule on nvme:

$ ceph osd pool create rbd-journals 32 32 nvme-replicated
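A newly created pool that is not tagged with an application triggers a health warning, so it is common to initialize the pool for RBD right after creating it. The sketch below combines both steps; whether the initialization is strictly required for a pool that only holds journals is an assumption, and the function name create_journal_pool is hypothetical.

```shell
# Create the journal pool and tag it for RBD use.
create_journal_pool() {
    # $1 = pool name, $2 = pg_num (also used as pgp_num), $3 = crush rule
    ceph osd pool create "$1" "$2" "$2" "$3" || return 1
    # Assumption: the journal pool is initialized like any other rbd pool.
    rbd pool init "$1"
}

# Usage:
#   create_journal_pool rbd-journals 32 nvme-replicated
```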

Disable mirroring on the image. Warning: disabling mirroring will delete the replicated copies of the image.

$ rbd mirror image disable rbd-pve-mirrored/vm-1517-disk-0

Check that the journaling feature has been disabled; if it is still listed, run rbd feature disable rbd-pve-mirrored/vm-1517-disk-0 journaling:

$ rbd info rbd-pve-mirrored/vm-1517-disk-0 | grep features
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:

Now we can enable the journaling feature again, this time with the dedicated journal pool:

$ rbd feature enable rbd-pve-mirrored/vm-1517-disk-0 journaling --journal-pool rbd-journals

Check the journal details:

$ rbd journal info --pool rbd-pve-mirrored --image vm-1517-disk-0
rbd journal '378952d8c4cf0c':
    header_oid: journal.378952d8c4cf0c
    object_oid_prefix: journal_data.51.378952d8c4cf0c.
    order: 24 (16 MiB objects)
    splay_width: 4
    object_pool: rbd-journals      # <--- We can see the custom pool

Re-enable mirroring in journal mode:

$ rbd mirror image enable rbd-pve-mirrored/vm-1517-disk-0 journal

The journal objects are now in the new rbd-journals pool:

$ rados -p rbd-journals ls | grep '^journal_data.51.378952d8c4cf0c.'
journal_data.51.378952d8c4cf0c.2
journal_data.51.378952d8c4cf0c.1
journal_data.51.378952d8c4cf0c.0
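For more than a handful of images, the steps above can be wrapped in a small function. This is a minimal sketch, not a tested tool: the function name migrate_journal is hypothetical, it assumes the rbd info output layout shown earlier, and it must be run on the primary side. Keep in mind that disabling mirroring deletes the replicated copy of the image, which then has to be resynchronized in full.

```shell
# Move the journal of one mirrored image to a dedicated pool.
# WARNING: disabling mirroring deletes the replicated copy of the image.
migrate_journal() {
    # $1 = pool/image, $2 = journal pool
    rbd mirror image disable "$1" || return 1

    # Disabling journal-based mirroring may already have removed the
    # journaling feature; disable it explicitly only if it is still set.
    if rbd info "$1" | awk '$1 == "features:"' | grep -qw journaling; then
        rbd feature disable "$1" journaling || return 1
    fi

    # Re-create the journal in the dedicated pool, then restore mirroring.
    rbd feature enable "$1" journaling --journal-pool "$2" || return 1
    rbd mirror image enable "$1" journal
}

# Usage:
#   migrate_journal rbd-pve-mirrored/vm-1517-disk-0 rbd-journals
```

The function stops at the first failing step, so a failed run leaves the image in an intermediate state that should be checked with rbd info before retrying.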
