The purpose here is to verify where my data is stored on the Ceph cluster.
For this, I have just created a minimal cluster with 3 OSDs:
$ ceph-deploy osd create ceph-01:/dev/sdb ceph-02:/dev/sdb ceph-03:/dev/sdb
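Before digging into the on-disk layout, a quick sanity check that the cluster is up (nothing specific to this setup, just the usual status commands):
$ ceph -s
$ ceph osd stat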
Where is my OSD directory on ceph-01?
$ mount | grep ceph
/dev/sdb1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime,attr2,delaylog,noquota)
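ceph-disk (called by ceph-deploy) split the whole disk into an XFS data partition and a separate journal partition. A quick look with standard tools (assuming the journal landed on /dev/sdb2 of the same disk, which is the default behaviour; the journal symlink shown below is the authoritative reference):
$ parted /dev/sdb print
$ blkid /dev/sdb1 /dev/sdb2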
The directory contents:
$ cd /var/lib/ceph/osd/ceph-0; ls -l
total 52
-rw-r--r-- 1 root root 487 août 20 12:12 activate.monmap
-rw-r--r-- 1 root root 3 août 20 12:12 active
-rw-r--r-- 1 root root 37 août 20 12:12 ceph_fsid
drwxr-xr-x 133 root root 8192 août 20 12:18 current
-rw-r--r-- 1 root root 37 août 20 12:12 fsid
lrwxrwxrwx 1 root root 58 août 20 12:12 journal -> /dev/disk/by-partuuid/37180b7e-fe5d-4b53-8693-12a8c1f52ec9
-rw-r--r-- 1 root root 37 août 20 12:12 journal_uuid
-rw------- 1 root root 56 août 20 12:12 keyring
-rw-r--r-- 1 root root 21 août 20 12:12 magic
-rw-r--r-- 1 root root 6 août 20 12:12 ready
-rw-r--r-- 1 root root 4 août 20 12:12 store_version
-rw-r--r-- 1 root root 0 août 20 12:12 sysvinit
-rw-r--r-- 1 root root 2 août 20 12:12 whoami
$ du -hs *
4,0K activate.monmap → the current monmap
4,0K active → "ok"
4,0K ceph_fsid → cluster fsid (the same value returned by 'ceph fsid')
2,1M current
4,0K fsid → uuid of this osd
0 journal → symlink to the journal partition
4,0K journal_uuid
4,0K keyring → the osd keyring
4,0K magic → "ceph osd volume v026"
4,0K ready → "ready"
4,0K store_version
0 sysvinit
4,0K whoami → numeric id of this osd
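Most of these are tiny plain-text files, so they can simply be read (from /var/lib/ceph/osd/ceph-0):
$ cat whoami          # numeric id of this osd, here 0
$ cat ceph_fsid       # should match the output of 'ceph fsid'
$ readlink journal    # the journal partition behind the symlink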
The data is stored in the "current" directory: it contains a few files and many *_head directories (one per placement group):
$ cd current; ls -l | grep -v head
total 20
-rw-r--r-- 1 root root 5 août 20 12:18 commit_op_seq
drwxr-xr-x 2 root root 12288 août 20 12:18 meta
-rw-r--r-- 1 root root 0 août 20 12:12 nosnap
drwxr-xr-x 2 root root 111 août 20 12:12 omap
In the omap directory:
$ cd omap; ls -l
-rw-r--r-- 1 root root 150 août 20 12:12 000007.sst
-rw-r--r-- 1 root root 2031616 août 20 12:18 000010.log
-rw-r--r-- 1 root root 16 août 20 12:12 CURRENT
-rw-r--r-- 1 root root 0 août 20 12:12 LOCK
-rw-r--r-- 1 root root 172 août 20 12:12 LOG
-rw-r--r-- 1 root root 309 août 20 12:12 LOG.old
-rw-r--r-- 1 root root 65536 août 20 12:12 MANIFEST-000009
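The omap directory is a LevelDB store: the OSD's FileStore keeps per-object key/value metadata (omap) there. These files are not meant to be read directly; as a very rough peek, strings will show some of the keys (newer Ceph releases also ship a ceph-kvstore-tool that can dump such a store properly):
$ strings 000010.log | head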
In the meta directory:
$ cd ../meta; ls -l
total 940
-rw-r--r-- 1 root root 710 août 20 12:14 inc\uosdmap.10__0_F4E9C003__none
-rw-r--r-- 1 root root 958 août 20 12:12 inc\uosdmap.1__0_B65F4306__none
-rw-r--r-- 1 root root 722 août 20 12:14 inc\uosdmap.11__0_F4E9C1D3__none
-rw-r--r-- 1 root root 152 août 20 12:14 inc\uosdmap.12__0_F4E9C163__none
-rw-r--r-- 1 root root 153 août 20 12:12 inc\uosdmap.2__0_B65F40D6__none
-rw-r--r-- 1 root root 574 août 20 12:12 inc\uosdmap.3__0_B65F4066__none
-rw-r--r-- 1 root root 153 août 20 12:12 inc\uosdmap.4__0_B65F4136__none
-rw-r--r-- 1 root root 722 août 20 12:12 inc\uosdmap.5__0_B65F46C6__none
-rw-r--r-- 1 root root 136 août 20 12:14 inc\uosdmap.6__0_B65F4796__none
-rw-r--r-- 1 root root 642 août 20 12:14 inc\uosdmap.7__0_B65F4726__none
-rw-r--r-- 1 root root 153 août 20 12:14 inc\uosdmap.8__0_B65F44F6__none
-rw-r--r-- 1 root root 722 août 20 12:14 inc\uosdmap.9__0_B65F4586__none
-rw-r--r-- 1 root root 0 août 20 12:12 infos__head_16EF7597__none
-rw-r--r-- 1 root root 2870 août 20 12:14 osdmap.10__0_6417091C__none
-rw-r--r-- 1 root root 830 août 20 12:12 osdmap.1__0_FD6E49B1__none
-rw-r--r-- 1 root root 2870 août 20 12:14 osdmap.11__0_64170EAC__none
-rw-r--r-- 1 root root 2870 août 20 12:14 osdmap.12__0_64170E7C__none → current osdmap
-rw-r--r-- 1 root root 1442 août 20 12:12 osdmap.2__0_FD6E4941__none
-rw-r--r-- 1 root root 1510 août 20 12:12 osdmap.3__0_FD6E4E11__none
-rw-r--r-- 1 root root 2122 août 20 12:12 osdmap.4__0_FD6E4FA1__none
-rw-r--r-- 1 root root 2122 août 20 12:12 osdmap.5__0_FD6E4F71__none
-rw-r--r-- 1 root root 2122 août 20 12:14 osdmap.6__0_FD6E4C01__none
-rw-r--r-- 1 root root 2190 août 20 12:14 osdmap.7__0_FD6E4DD1__none
-rw-r--r-- 1 root root 2802 août 20 12:14 osdmap.8__0_FD6E4D61__none
-rw-r--r-- 1 root root 2802 août 20 12:14 osdmap.9__0_FD6E4231__none
-rw-r--r-- 1 root root 354 août 20 12:14 osd\usuperblock__0_23C2FCDE__none
-rw-r--r-- 1 root root 0 août 20 12:12 pglog\u0.0__0_103B076E__none → Log for each pg
-rw-r--r-- 1 root root 0 août 20 12:12 pglog\u0.1__0_103B043E__none
-rw-r--r-- 1 root root 0 août 20 12:12 pglog\u0.11__0_5172C9DB__none
-rw-r--r-- 1 root root 0 août 20 12:12 pglog\u0.13__0_5172CE3B__none
-rw-r--r-- 1 root root 0 août 20 12:13 pglog\u0.15__0_5172CC9B__none
-rw-r--r-- 1 root root 0 août 20 12:13 pglog\u0.16__0_5172CC2B__none
............
-rw-r--r-- 1 root root 0 août 20 12:12 snapmapper__0_A468EC03__none
Let's try decompiling the CRUSH map from the osdmap:
$ ceph osd stat
e12: 3 osds: 3 up, 3 in
$ osdmaptool osdmap.12__0_64170E7C__none --export-crush /tmp/crushmap.bin
osdmaptool: osdmap file 'osdmap.12__0_64170E7C__none'
osdmaptool: exported crush map to /tmp/crushmap.bin
$ crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
$ cat /tmp/crushmap.txt
# begin crush map
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root
# buckets
host ceph-01 {
id -2 # do not change unnecessarily
# weight 0.050
alg straw
hash 0 # rjenkins1
item osd.0 weight 0.050
}
host ceph-02 {
id -3 # do not change unnecessarily
# weight 0.050
alg straw
hash 0 # rjenkins1
item osd.1 weight 0.050
}
host ceph-03 {
id -4 # do not change unnecessarily
# weight 0.050
alg straw
hash 0 # rjenkins1
item osd.2 weight 0.050
}
root default {
id -1 # do not change unnecessarily
# weight 0.150
alg straw
hash 0 # rjenkins1
item ceph-01 weight 0.050
item ceph-02 weight 0.050
item ceph-03 weight 0.050
}
...
# end crush map
OK, it's what I expected. :)
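crushtool can also simulate placements directly from this map, which is a nice way to check the rules without touching the cluster (a sketch; rule 0 is assumed to be the default data rule, and x is just a small sample of input values):
$ crushtool -i /tmp/crushmap.bin --test --show-mappings --rule 0 --num-rep 2 --min-x 0 --max-x 5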
The cluster is empty:
$ find *_head -type f | wc -l
0
The directory list corresponds to the output of 'ceph pg dump':
$ for dir in `ceph pg dump | grep '\[0,' | cut -f1`; do if [ -d ${dir}_head ]; then echo exist; else echo nok; fi; done | sort | uniq -c
dumped all in format plain
69 exist
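To check a single PG's mapping without dumping everything:
$ ceph pg map 0.1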
To get all the stats for a specific PG:
$ ceph pg 0.1 query
{ "state": "active+clean",
"epoch": 12,
"up": [
0,
1],
"acting": [
0,
1],
"info": { "pgid": "0.1",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 1,
"last_epoch_started": 12,
"last_epoch_clean": 12,
"last_epoch_split": 0,
"same_up_since": 9,
"same_interval_since": 9,
"same_primary_since": 5,
"last_scrub": "0'0",
"last_scrub_stamp": "2013-08-20 12:12:37.851559",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2013-08-20 12:12:37.851559",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "12",
"reported_epoch": "12",
"state": "active+clean",
"last_fresh": "2013-08-20 12:16:22.709534",
"last_change": "2013-08-20 12:16:22.105099",
"last_active": "2013-08-20 12:16:22.709534",
"last_clean": "2013-08-20 12:16:22.709534",
"last_became_active": "0.000000",
"last_unstale": "2013-08-20 12:16:22.709534",
"mapping_epoch": 5,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 1,
"last_epoch_clean": 12,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2013-08-20 12:12:37.851559",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2013-08-20 12:12:37.851559",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0},
"stat_cat_sum": {},
"up": [
0,
1],
"acting": [
0,
1]},
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 12},
"recovery_state": [
{ "name": "Started\/Primary\/Active",
"enter_time": "2013-08-20 12:15:30.102250",
"might_have_unfound": [],
"recovery_progress": { "backfill_target": -1,
"waiting_on_backfill": 0,
"backfill_pos": "0\/\/0\/\/-1",
"backfill_info": { "begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []},
"peer_backfill_info": { "begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []},
"backfills_in_flight": [],
"pull_from_peer": [],
"pushing": []},
"scrub": { "scrubber.epoch_start": "0",
"scrubber.active": 0,
"scrubber.block_writes": 0,
"scrubber.finalizing": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []}},
{ "name": "Started",
"enter_time": "2013-08-20 12:14:51.501628"}]}
Retrieve an object on the cluster
In this test, we create a standard pool (pg_num=8 and 2 replicas):
$ rados mkpool testpool
$ wget -q http://ceph.com/docs/master/_static/logo.png
$ md5sum logo.png
4c7c15e856737efc0d2d71abde3c6b28 logo.png
$ rados put -p testpool logo.png logo.png
$ ceph osd map testpool logo.png
osdmap e14 pool 'testpool' (3) object 'logo.png' -> pg 3.9e17671a (3.2) -> up [2,1] acting [2,1]
My Ceph logo is on PG 3.2 (primary on osd.2, replica on osd.1).
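As a quick sanity check on that mapping: '9e17671a' is the hash of the object name, and since pg_num=8 is a power of two, the PG number is simply this hash modulo pg_num (that is what Ceph's stable mod reduces to here):
$ echo $((0x9e17671a % 8))
2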
$ ceph osd tree
# id weight type name up/down reweight
-1 0.15 root default
-2 0.04999 host ceph-01
0 0.04999 osd.0 up 1
-3 0.04999 host ceph-02
1 0.04999 osd.1 up 1
-4 0.04999 host ceph-03
2 0.04999 osd.2 up 1
And osd.2 is on ceph-03:
$ cd /var/lib/ceph/osd/ceph-2/current/3.2_head/
$ ls
logo.png__head_9E17671A__3
$ md5sum logo.png__head_9E17671A__3
4c7c15e856737efc0d2d71abde3c6b28 logo.png__head_9E17671A__3
It's exactly the same. :)
Import RBD
Same thing, but testing as a block device.
$ rbd import logo.png testpool/logo.png
Importing image: 100% complete...done.
$ rbd info testpool/logo.png
rbd image 'logo.png':
size 3898 bytes in 1 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.1048.2ae8944a
format: 1
Only one data object for this image:
$ rados ls -p testpool
logo.png
rb.0.1048.2ae8944a.000000000000
rbd_directory
logo.png.rbd
$ ceph osd map testpool logo.png.rbd
osdmap e14 pool 'testpool' (3) object 'logo.png.rbd' -> pg 3.d592352c (3.4) -> up [0,2] acting [0,2]
Let's go.
$ cd /var/lib/ceph/osd/ceph-0/current/3.4_head/
$ cat logo.png.rbd__head_D592352C__3
<<< Rados Block Device Image >>>
rb.0.1048.2ae8944aRBD001.005:
Here we can see the block name prefix of the RBD image, 'rb.0.1048.2ae8944a':
$ ceph osd map testpool rb.0.1048.2ae8944a.000000000000
osdmap e14 pool 'testpool' (3) object 'rb.0.1048.2ae8944a.000000000000' -> pg 3.d512078b (3.3) -> up [2,1] acting [2,1]
On ceph-03:
$ cd /var/lib/ceph/osd/ceph-2/current/3.3_head
$ md5sum rb.0.1048.2ae8944a.000000000000__head_D512078B__3
4c7c15e856737efc0d2d71abde3c6b28 rb.0.1048.2ae8944a.000000000000__head_D512078B__3
We retrieve the file unchanged, because it was small enough not to be split across several objects. :)
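For a larger image, the data would be split across several 4 MB objects named <block_name_prefix>.<object number on 12 hex digits>, each placed independently by CRUSH. A sketch of how to locate each piece (only the first object actually exists for this tiny image, but 'ceph osd map' computes a placement for any name):
$ for i in 0 1 2 3; do ceph osd map testpool $(printf 'rb.0.1048.2ae8944a.%012x' $i); done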
Try RBD snapshot
$ rbd snap create testpool/logo.png@snap1
$ rbd snap ls testpool/logo.png
SNAPID NAME SIZE
2 snap1 3898 bytes
$ echo "testpool/logo.png" >> /etc/ceph/rbdmap
$ service rbdmap reload
[ ok ] Starting RBD Mapping: testpool/logo.png.
[ ok ] Mounting all filesystems...done.
$ dd if=/dev/zero of=/dev/rbd/testpool/logo.png
dd: writing to '/dev/rbd/testpool/logo.png': No space left on device
8+0 records in
7+0 records out
3584 bytes (3.6 kB) copied, 0.285823 s, 12.5 kB/s
$ ceph osd map testpool rb.0.1048.2ae8944a.000000000000
osdmap e15 pool 'testpool' (3) object 'rb.0.1048.2ae8944a.000000000000' -> pg 3.d512078b (3.3) -> up [2,1] acting [2,1]
It's the same place on ceph-03:
$ cd /var/lib/ceph/osd/ceph-2/current/3.3_head
$ md5sum *
4c7c15e856737efc0d2d71abde3c6b28 rb.0.1048.2ae8944a.000000000000__2_D512078B__3
dd99129a16764a6727d3314b501e9c23 rb.0.1048.2ae8944a.000000000000__head_D512078B__3
We can see that the file whose name contains '2' (the snap id) holds the original data, while a new 'head' file has been created for the current data.
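To double-check that the snapshot still holds the original data, it can also be exported and checksummed from the client side (the destination path is arbitrary); the result should match the md5 of the original logo.png, while an export of the current head would not:
$ rbd export testpool/logo.png@snap1 /tmp/logo_snap1
$ md5sum /tmp/logo_snap1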
For the next tests, I will try striped files, RBD format 2, and pool snapshots.