CephNotes

Some notes about Ceph
Laurent Barbe @CCM Benchmark

Multiple Clusters on the Same Hardware: OSD Isolation With LXC

Ceph makes it easy to create multiple clusters on the same hardware through cluster naming. If you want better isolation, you can use LXC, for example to run a different Ceph version in each cluster.

For this you will need access to the physical disks from the container. Just allow access to the device with cgroups and create the device node with mknod:

# Retrieve the major and minor numbers for the device:
$ ls -l /dev/sda5
brw-rw---T 1 root disk 8, 5 Jan 26 18:47 /dev/sda5

$ mknod /var/lib/lxc/container-cluster1/rootfs/dev/sda5 b 8 5
$ echo "lxc.cgroup.devices.allow = b 8:5 rwm" >> /var/lib/lxc/container-cluster1/config

Replace Apache by Civetweb on the RadosGW

Since Firefly you can try the lightweight embedded web server Civetweb instead of Apache. Activating it is very simple: there is nothing new to install, just add this line to your ceph.conf:

[client.radosgw.gateway]
rgw frontends = "civetweb port=80"
...

If you have already installed Apache, remember to stop it before activating Civetweb, or make sure it does not listen on the same port.

Then:

/etc/init.d/radosgw restart
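
Once restarted, a quick way to check that Civetweb answers on the configured port (assuming the gateway runs on the local machine):

$ curl -i http://localhost:80/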

Difference Between ‘Ceph Osd Reweight’ and ‘Ceph Osd Crush Reweight’

From Gregory and Craig on the mailing list…

“ceph osd crush reweight” sets the CRUSH weight of the OSD. This
weight is an arbitrary value (generally the size of the disk in TB or
something) and controls how much data the system tries to allocate to
the OSD.

“ceph osd reweight” sets an override weight on the OSD. This value is
in the range 0 to 1, and forces CRUSH to re-place (1-weight) of the
data that would otherwise live on this drive. It does *not* change the
weights assigned to the buckets above the OSD, and is a corrective
measure in case the normal CRUSH distribution isn’t working out quite
right. (For instance, if one of your OSDs is at 90% and the others are
at 50%, you could reduce this weight to try and compensate for it.)

Note that ‘ceph osd reweight’ is not a persistent setting. When an OSD
gets marked out, the osd weight will be set to 0. When it gets marked in
again, the weight will be changed to 1.

Because of this ‘ceph osd reweight’ is a temporary solution. You should
only use it to keep your cluster running while you’re ordering more
hardware.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040961.html
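
As an illustration of the two commands (the OSD id and values are only examples):

# Long-term CRUSH weight, usually the disk size in TB
$ ceph osd crush reweight osd.5 1.81929

# Temporary override weight, between 0 and 1
$ ceph osd reweight 5 0.85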

Placement_pools on Rados-GW

The purpose of this test is to map a RadosGW bucket to a specific Ceph pool, for example a fast pool backed by SSDs and a slower pool for archives…

   standard_bucket data  --> .rgw.buckets        (default pool)
   specific_bucket data  --> .rgw.buckets.custom
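
A rough sketch of the approach on a pre-Jewel gateway (the pool and placement target names are only examples, and the JSON edits are done by hand):

# Create the dedicated pool
$ ceph osd pool create .rgw.buckets.custom 64 64

# Add a placement target pointing to it in the region and zone, then reload
$ radosgw-admin region get > region.json     # edit: add a "custom-placement" target
$ radosgw-admin region set < region.json
$ radosgw-admin zone get > zone.json         # edit: map the target to .rgw.buckets.custom
$ radosgw-admin zone set < zone.json
$ radosgw-admin regionmap update
$ /etc/init.d/radosgw restart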

Ceph RBD With LXC Containers

Update on Apr 14th, 2016:

A simple way to secure the data in your containers is to use distributed storage such as Ceph for the LXC root filesystem.

For example:

# lxc-create -n mycontainer -t debian -B rbd --pool rbd --rbd mycontainer --fstype ext4 --fssize 500
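
The container can then be started and the backing RBD image checked like any other image in the pool:

# lxc-start -n mycontainer -d
# rbd ls rbd
# rbd showmapped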

OpenNebula 4.8 With Ceph Support on Debian Wheezy

A quick howto to install OpenNebula 4.8 with support for Ceph on Debian Wheezy.

$ onedatastore show cephds
DATASTORE 101 INFORMATION
ID             : 101
NAME           : cephds
USER           : oneadmin
GROUP          : oneadmin
CLUSTER        : -
TYPE           : IMAGE
DS_MAD         : ceph
TM_MAD         : ceph
BASE PATH      : /var/lib/one//datastores/101
DISK_TYPE      : RBD
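
For reference, a datastore like the one above is typically created from a small template file; a sketch with assumed values (the pool name, Ceph user, bridge host and file name are only examples):

$ cat cephds.conf
NAME        = cephds
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_USER   = libvirt
BRIDGE_LIST = "kvm-host1"

$ onedatastore create cephds.conf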

Remove Pool Without Name

For example:

# rados lspools
data
metadata
rbd
                            <---- ?????
.eu.rgw.root
.eu-west-1.domain.rgw
.eu-west-1.rgw.root
.eu-west-1.rgw.control
.eu-west-1.rgw.gc
.eu-west-1.rgw.buckets.index
.eu-west-1.rgw.buckets
.eu-west-1.log


# ceph osd dump | grep "pool 4 "
pool 4 '' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1668 stripe_width 0

# rados rmpool "" "" --yes-i-really-really-mean-it
successfully deleted pool

Ceph Node.js Bindings for Librados

var rados = require('rados');  // assuming the node-rados bindings are installed as "rados"

var cluster = new rados.Rados("ceph", "client.admin", "/etc/ceph/ceph.conf");
cluster.connect();

var ioctx = new rados.Ioctx(cluster, "data");
ioctx.aio_write("testfile2", new Buffer("1234567879ABCD"), 14, 0, function (err) {
  if (err) {
    throw err;
  }
  ...

Ceph Primary Affinity

This option addresses a fairly common concern with heterogeneous clusters: not all disks have the same performance, or the same performance/size ratio. With this option it is possible to reduce the load on a disk without reducing the amount of data it contains. Furthermore, the option is easy to change because it does not trigger any data migration: only the preference between primary and secondary OSDs is modified and propagated to the clients.
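
In practice (since Firefly), primary affinity must first be allowed on the monitors before it can be lowered per OSD; a minimal sketch with an illustrative OSD id and value:

# In ceph.conf ([mon] section), allow the setting to be changed:
#   mon osd allow primary affinity = true

# Reduce the chance of osd.2 being selected as primary for its PGs:
$ ceph osd primary-affinity osd.2 0.5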