RadosGW big index

$ rados -p .default.rgw.buckets.index listomapkeys .dir.default.1970130.1 | wc -l
166768275
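
One way to spot oversized indexes is to count the omap keys of every index object in the pool. A quick sketch reusing the rados commands above (the pool name depends on your setup):

$ for obj in $(rados -p .default.rgw.buckets.index ls); do echo -n "$obj: "; rados -p .default.rgw.buckets.index listomapkeys "$obj" | wc -l; done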

With each key containing between 100 and 250 bytes, this makes a very big object for RADOS: roughly 167 M keys × 100-250 bytes, i.e. somewhere between 16 and 40 GB. This is a problem especially when the object is migrated from one OSD to another (all writes are blocked during the move); moreover, the OSD containing this object will use a lot of memory.

Since the Hammer release it is possible to shard the bucket index. You cannot shard an existing bucket, but you can set it up for new buckets. This is a very good thing for scalability.

Setting up index max shards

You can specify the default number of shards for new buckets:

  • Per zone, in the region map (see the sketch after this list):
$ radosgw-admin region get 
...
"zones": [
    {
        "name": "default",
        "endpoints": [
            "http:\/\/storage.example.com:80\/"
        ],
        "log_meta": "true",
        "log_data": "true",
        "bucket_index_max_shards": 8             <===
    },
...
  • In the radosgw section of ceph.conf (this overrides the per-zone value):
...
[client.radosgw.gateway]
rgw bucket index max shards = 8
...
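
To apply the per-zone value, the region map has to be edited and re-injected. A minimal sketch, assuming a single default region (the file name region.json is arbitrary):

$ radosgw-admin region get > region.json
$ # edit region.json and set "bucket_index_max_shards" in the zone entry
$ radosgw-admin region set < region.json
$ radosgw-admin regionmap update

Then restart the radosgw daemons so they pick up the change.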

Verification:

$ radosgw-admin metadata get bucket:mybucket | grep bucket_id
            "bucket_id": "default.1970130.1"

$ radosgw-admin metadata get bucket.instance:mybucket:default.1970130.1 | grep num_shards
            "num_shards": 8,

$ rados -p .rgw.buckets.index ls | grep default.1970130.1
.dir.default.1970130.1.0
.dir.default.1970130.1.1
.dir.default.1970130.1.2
.dir.default.1970130.1.3
.dir.default.1970130.1.4
.dir.default.1970130.1.5
.dir.default.1970130.1.6
.dir.default.1970130.1.7
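
To check how the keys are spread across the shards, count the omap keys of each shard object (same commands as above; adjust the pool name and bucket_id to your setup):

$ for i in $(seq 0 7); do echo -n ".dir.default.1970130.1.$i: "; rados -p .rgw.buckets.index listomapkeys .dir.default.1970130.1.$i | wc -l; done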

Bucket listing impact:

A simple test with ~200k objects in a bucket:

num_shards    time (s)
0             25
8             36
128           109
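
For reference, one simple way to take this kind of measurement, assuming s3cmd is configured against the gateway and the bucket is named mybucket:

$ time s3cmd ls s3://mybucket > /dev/null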

So do not use thousands of shards on a bucket if you do not need them: listing the bucket has to merge results from every shard, and it becomes very slow as the shard count grows.

Link to the blueprint:

https://wiki.ceph.com/Planning/Blueprints/Hammer/rgw%3A_bucket_index_scalability
