CephNotes

Some notes about Ceph
Laurent Barbe @Adelius / INRAE

Dealing with some OSD timeouts

Certain operations may occasionally take longer for the OSD to process, and the operation may fail or even cause the OSD to commit suicide. There are many parameters controlling these timeouts. Some examples:

Thread suicide timed out

heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1ee3ca7700' had suicide timed out after 150
common/HeartbeatMap …
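
The message above comes from the standard OSD option osd_op_thread_suicide_timeout, whose default of 150 seconds matches the "after 150" in the log. As a hedged sketch (osd.33 is just an example id, and an injected value does not survive a restart), checking and temporarily raising it could look like this:

# Check the current value on a running OSD (via the admin socket)
ceph daemon osd.33 config get osd_op_thread_suicide_timeout

# Temporarily raise it on all OSDs (not persistent across restarts)
ceph tell osd.* injectargs '--osd_op_thread_suicide_timeout 300'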

Erasure code on small clusters

Erasure coding is designed for clusters of a sufficient size. However, if you want to use it with a small number of hosts, you can also adapt the CRUSH map to get a distribution better matched to your needs.

Here is a first example for distributing data with 1 host OR 2 …
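
As a simple illustration of the idea (not the exact rules from the post), an erasure-coded pool whose failure domain is the OSD rather than the host can be created as below; the profile name, pool name, k/m values and PG count are arbitrary examples, and on older releases the option is called ruleset-failure-domain instead of crush-failure-domain:

# Profile with failure domain "osd", so several chunks may land on the same host
ceph osd erasure-code-profile set ec-small k=2 m=1 crush-failure-domain=osd

# Create a pool using this profile
ceph osd pool create ecpool 64 64 erasure ec-small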

Change log level on the fly for Ceph daemons

Aaahhh, a full disk this morning. Sometimes the logs can go crazy, and the files can quickly reach several gigabytes.

Show debug options (on the host):

# Look at log file
tail -n 1000 /var/log/ceph/ceph-osd.33.log

# Check debug levels
ceph daemon osd.33 config show | grep '"debug_'
    "debug_none …

Main new features in the latest versions of Ceph

It's always pleasant to see how fast new features appear in Ceph. :)

Here is a non-exhaustive list of some of them in the latest releases:

Kraken (October 2016)

  • BlueStore declared as stable
  • AsyncMessenger
  • RGW: metadata indexing via Elasticsearch, index resharding, compression
  • S3 bucket lifecycle API, RGW Export NFS version 3 …