CephNotes

Some notes about Ceph
Laurent Barbe @CCM Benchmark

RBD journal offloading

If you are using the RBD journaling feature (for example for RBD mirroring), in some cases it can be useful to offload the journal to a specific pool. For example, if your RBD pool is on HDDs and you also have SSD or NVMe drives.

external rbd journal

To change the RBD journaling pool, there are 2 …
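As a hedged illustration of one possible approach (not necessarily the one the full post describes), the `rbd` CLI accepts a `--journal-pool` option when enabling the `journaling` feature on an image. The sketch below only builds and, by default, prints that command. The image spec `rbd/vm-disk1` and the pool `rbd-journal-ssd` are hypothetical names, and journaling assumes the `exclusive-lock` feature is already enabled on the image.

```python
import shlex
import subprocess


def enable_journal_cmd(image_spec, journal_pool):
    """Build an `rbd feature enable` command that places the image journal in journal_pool."""
    # Assumption: the image already has the exclusive-lock feature,
    # which journaling depends on.
    return ["rbd", "feature", "enable", image_spec, "journaling",
            "--journal-pool", journal_pool]


def enable_journal(image_spec, journal_pool, dry_run=True):
    """Print the command (dry run) or actually execute it against the cluster."""
    cmd = enable_journal_cmd(image_spec, journal_pool)
    if dry_run:
        # Only print the command; nothing is sent to the cluster.
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: HDD-backed image and SSD-backed journal pool.
    enable_journal("rbd/vm-disk1", "rbd-journal-ssd")
```

Running it with the default `dry_run=True` lets you review the exact command before touching a production image.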

How much data movement when I add a replica?

Make a simple simulation!

Use your own crushmap:

$ ceph osd getcrushmap -o crushmap

got crush map from osdmap epoch 28673

Or create a sample crushmap:

$ crushtool --outfn crushmap --build --num_osds 36 host straw 12 root straw 0

2017-07-28 15:01:16.240974 7f4dda123760  1 
ID  WEIGHT  TYPE NAME
-4 …
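One way to compare placements is to save the output of `crushtool -i crushmap --test --show-mappings --rule 0 --num-rep 2` to `before.txt`, then the same command with `--num-rep 3` to `after.txt` (assuming rule 0 is the rule you care about). The sketch below parses both files and counts the new copies to write and the inputs whose existing replicas moved. It assumes mapping lines look like `CRUSH rule 0 x 12 [3,17]`; any other line (such as the log header) is ignored.

```python
import re

# Assumed format of crushtool --show-mappings lines: "CRUSH rule <id> x <x> [osd,osd,...]"
MAPPING_RE = re.compile(r"^CRUSH rule (\d+) x (\d+) \[([\d,]*)\]$")


def parse_mappings(text):
    """Return {x: [osd, ...]} from crushtool --show-mappings output."""
    mappings = {}
    for line in text.splitlines():
        m = MAPPING_RE.match(line.strip())
        if m:
            osds = [int(o) for o in m.group(3).split(",") if o]
            mappings[int(m.group(2))] = osds
    return mappings


def compare(before, after):
    """Return (new_copies, moved_inputs) between two mapping dicts."""
    new_copies = 0    # replicas placed on an OSD that did not hold this input before
    moved_inputs = 0  # inputs where a previous OSD no longer holds a replica
    for x, old in before.items():
        new = after.get(x, [])
        new_copies += len(set(new) - set(old))
        if set(old) - set(new):
            moved_inputs += 1
    return new_copies, moved_inputs


if __name__ == "__main__":
    with open("before.txt") as f_before, open("after.txt") as f_after:
        print(compare(parse_mappings(f_before.read()), parse_mappings(f_after.read())))
```

With a `firstn` rule, the first replicas usually stay in place when `--num-rep` grows, so `moved_inputs` should be small compared to `new_copies`.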

Dealing with some osd timeouts

In some cases, some operations may take a little longer to be processed by the OSD. The operation may then fail, or even cause the OSD to commit suicide (abort itself). There are many parameters for these timeouts. Some examples:

Thread suicide timed out

heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1ee3ca7700' had suicide timed …
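To find which parameter is involved, it helps to extract the thread pool name from such a message. The sketch below assumes the full log line ends with `had timed out after <N>` or `had suicide timed out after <N>`. The lookup table is a hypothetical, partial mapping to the `osd_op_thread_*` and `filestore_op_thread_*` options; check the option names against the documentation for your Ceph release.

```python
import re

# Assumed message format:
# heartbeat_map is_healthy '<pool> thread <id>' had [suicide ]timed out after <seconds>
HB_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<tid>0x[0-9a-f]+)' "
    r"had (?P<suicide>suicide )?timed out after (?P<secs>\d+)"
)

# Hypothetical, partial table: thread pool -> (timeout option, suicide timeout option)
POOL_OPTIONS = {
    "OSD::osd_op_tp": ("osd_op_thread_timeout", "osd_op_thread_suicide_timeout"),
    "FileStore::op_tp": ("filestore_op_thread_timeout", "filestore_op_thread_suicide_timeout"),
}


def explain(line):
    """Return details about a heartbeat_map timeout line, or None if it does not match."""
    m = HB_RE.search(line)
    if not m:
        return None
    suicide = m.group("suicide") is not None
    options = POOL_OPTIONS.get(m.group("pool"))
    option = (options[1] if suicide else options[0]) if options else None
    return {
        "pool": m.group("pool"),
        "thread": m.group("tid"),
        "suicide": suicide,
        "seconds": int(m.group("secs")),
        "option": option,  # None when the pool is not in the table
    }
```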

Erasure code on small clusters

Erasure code is designed mainly for clusters of sufficient size. However, if you want to use it with a small number of hosts, you can adapt the crushmap so that the data distribution better matches your needs.

Here is a first example for distributing data with 1 host OR 2 …
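Before writing such a rule, a quick calculation shows what a layout can survive. An erasure coded object is split into k data chunks and m coding chunks, and stays readable as long as no more than m chunks are lost. The sketch below assumes each host holds at most `chunks_per_host` chunks of a given object (a hypothetical parameter standing for what your custom rule enforces) and computes how many hosts are needed and how many host failures are tolerated.

```python
import math


def hosts_needed(k, m, chunks_per_host):
    """Minimum number of hosts to place all k+m chunks with at most chunks_per_host per host."""
    return math.ceil((k + m) / chunks_per_host)


def tolerated_host_failures(k, m, chunks_per_host):
    """Host failures survivable when each failed host may take up to chunks_per_host chunks."""
    # The object stays readable while at most m chunks are lost.
    return m // chunks_per_host


if __name__ == "__main__":
    for k, m, cph in [(4, 2, 1), (4, 2, 2), (4, 2, 3)]:
        print(f"k={k} m={m} chunks/host={cph}: "
              f"{hosts_needed(k, m, cph)} hosts, "
              f"{tolerated_host_failures(k, m, cph)} host failure(s) tolerated")
```

For example, k=4, m=2 with 2 chunks per host fits on 3 hosts and still survives the loss of one whole host.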

Crushmap for 2 DC

An example of a crushmap for replication across 2 datacenters:

rule replicated_ruleset {
    ruleset X
    type replicated
    min_size 2
    max_size 3
    step take default
    step choose firstn 2 type datacenter
    step chooseleaf firstn -1 type host
    step emit
}

This works well with pool size=2 (not recommended!) or 3. If you set pool …
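To see how the rule above splits replicas between the two datacenters, here is a simplified model (an assumption, not CRUSH itself): `firstn 0` means "pool size" and `firstn -1` means "pool size - 1", and each `firstn` step stops once the rule has produced as many OSDs as the pool size. Which datacenter gets the larger share is chosen pseudo-randomly per PG by CRUSH, so the list below gives counts in selection order, not fixed datacenter names.

```python
def dc_split(pool_size, num_dc=2, leaf_firstn=-1):
    """Replicas per selected datacenter for:
    step choose firstn <num_dc> type datacenter
    step chooseleaf firstn <leaf_firstn> type host
    """
    # firstn N <= 0 is interpreted relative to the pool size.
    per_dc = leaf_firstn if leaf_firstn > 0 else pool_size + leaf_firstn
    split = []
    remaining = pool_size
    for _ in range(num_dc):
        # Each datacenter contributes up to per_dc hosts, capped by what is still needed.
        take = max(0, min(per_dc, remaining))
        split.append(take)
        remaining -= take
    return split


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, dc_split(size))
```

Under this model, size=2 gives one replica per datacenter and size=3 gives two replicas in one datacenter and one in the other.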