CephNotes

Some notes about Ceph
Laurent Barbe @CCM Benchmark

RadosGW Big Index

$ rados -p .default.rgw.buckets.index listomapkeys .dir.default.1970130.1 | wc -l
166768275

With each key containing between 100 and 250 bytes, this makes a very big object for RADOS (several GB). This is a problem especially when the object migrates from one OSD to another, because the migration locks all writes. Moreover, the OSD holding this object will use a lot of memory.
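As a rough back-of-the-envelope check, the key count above can be multiplied by the per-key size range. This is only a sketch: it assumes the 100 to 250 bytes cover the whole entry and ignores any storage overhead, so the real on-disk size may differ.

```shell
# Number of omap keys reported by listomapkeys above
keys=166768275

# Lower and upper bounds, assuming 100 to 250 bytes per entry
min_bytes=$((keys * 100))
max_bytes=$((keys * 250))

# Integer division, so the result is truncated to whole GB (10^9 bytes)
echo "estimated index size: $((min_bytes / 1000000000)) to $((max_bytes / 1000000000)) GB"
```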

Since the Hammer release, it is possible to shard the bucket index. However, you cannot shard an existing index; you can only set sharding up for new buckets. This is very good for scalability.

Setting up index max shards

You can specify the default number of shards for new buckets :

  • Per zone, in regionmap :
$ radosgw-admin region get
...
"zones": [
    {
        "name": "default",
        "endpoints": [
            "http:\/\/storage.example.com:80\/"
        ],
        "log_meta": "true",
        "log_data": "true",
        "bucket_index_max_shards": 8             <===
    },
...
  • In the radosgw section of ceph.conf (this overrides the per-zone value):
...
[client.radosgw.gateway]
rgw bucket index max shards = 8
....
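For the per-zone option, one possible workflow is to dump the region to a file, change the value, and load it back. The sketch below only covers the edit step. `set_max_shards` and `region.json` are hypothetical names, and the `radosgw-admin` calls in the comments assume the Hammer-era CLI, so check them against your release before relying on them.

```shell
# Hypothetical helper: rewrite bucket_index_max_shards in a region JSON dump.
# $1: path to the JSON file, $2: desired number of shards
set_max_shards() {
    sed -E "s/(\"bucket_index_max_shards\": *)[0-9]+/\\1$2/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Assumed workflow on a Hammer-era cluster (not executed here):
#   radosgw-admin region get > region.json
#   set_max_shards region.json 8
#   radosgw-admin region set < region.json
#   radosgw-admin regionmap update
```

The helper changes every zone in the file that carries the field. If zones need different values, edit the JSON by hand instead.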

Verification :

$ radosgw-admin metadata get bucket:mybucket | grep bucket_id
            "bucket_id": "default.1970130.1"

$ radosgw-admin metadata get bucket.instance:mybucket:default.1970130.1 | grep num_shards
            "num_shards": 8,

$ rados -p .rgw.buckets.index ls | grep default.1970130.1
.dir.default.1970130.1.0
.dir.default.1970130.1.1
.dir.default.1970130.1.2
.dir.default.1970130.1.3
.dir.default.1970130.1.4
.dir.default.1970130.1.5
.dir.default.1970130.1.6
.dir.default.1970130.1.7
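The naming pattern in this listing can be used to check that every shard object exists. The helper below is a minimal sketch based only on the names shown above (`.dir.<bucket_id>.<shard>` for sharded buckets, `.dir.<bucket_id>` for an unsharded one). `expected_shard_objects` is a hypothetical name, not part of any Ceph tool.

```shell
# Hypothetical helper: print the index object names expected for a bucket.
# $1: bucket_id, $2: num_shards (0 means a single, unsharded index object)
expected_shard_objects() {
    bucket_id=$1
    num_shards=$2
    if [ "$num_shards" -eq 0 ]; then
        echo ".dir.${bucket_id}"
        return
    fi
    i=0
    while [ "$i" -lt "$num_shards" ]; do
        echo ".dir.${bucket_id}.${i}"
        i=$((i + 1))
    done
}

# Print the 8 names expected for the bucket used in this post
expected_shard_objects default.1970130.1 8

# On a live cluster, the output could be compared with the real listing, e.g.:
#   expected_shard_objects default.1970130.1 8 | sort > expected.txt
#   rados -p .rgw.buckets.index ls | grep default.1970130.1 | sort > actual.txt
#   diff expected.txt actual.txt
```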

Bucket listing impact :

A simple test with ~200k objects in a bucket :

num_shards   time (s)
0            25
8            36
128          109

So do not use buckets with thousands of shards if you do not need them, because bucket listing will become very slow.
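To put the timings above in relative terms, the slowdown against the unsharded case can be computed directly. This is a small sketch using only the three measurements from the test; `slowdown` is a hypothetical helper name.

```shell
# Hypothetical helper: ratio of a listing time to the baseline time.
# $1: listing time in seconds, $2: baseline time in seconds
slowdown() {
    LC_ALL=C awk -v t="$1" -v base="$2" 'BEGIN { printf "%.2f\n", t / base }'
}

# Baseline is the unsharded bucket (25 s)
echo "8 shards:   $(slowdown 36 25)x slower"
echo "128 shards: $(slowdown 109 25)x slower"
```

With these figures, 8 shards cost about 1.44 times the unsharded listing time and 128 shards about 4.36 times. The numbers come from a single test with ~200k objects and may not generalize.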

Link to the blueprint :

https://wiki.ceph.com/Planning/Blueprints/Hammer/rgw%3A_bucket_index_scalability
