Analysis of Ceph CRUSH Map Rules
Bucket types
A bucket is the CRUSH map's term for an internal node of the hierarchy: a host, rack, row, and so on. The CRUSH map defines a series of types used to describe these nodes. By default, these types include:
- osd (or device)
- host
- chassis
- rack
- row
- pdu
- pod
- room
- datacenter
- region (e.g. Asia, Europe)
- root
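These same types appear near the top of a decompiled CRUSH map. The numbering below reflects a Luminous-era default map and may differ on other releases:
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root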
To get a simple view of the local CRUSH hierarchy, dump the full map, or check which rule a pool uses:
ceph osd crush tree
ceph osd crush dump
ceph osd pool get <pool_name> crush_rule
To list the rules defined for the cluster:
ceph osd crush rule ls
To view the contents of the rules:
ceph osd crush rule dump
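To see the complete picture (devices, buckets, and rules together), the binary CRUSH map can also be exported and decompiled with crushtool. A minimal sketch, with arbitrary example filenames:
# export the binary CRUSH map
ceph osd getcrushmap -o crushmap.bin
# decompile it into an editable text file
crushtool -d crushmap.bin -o crushmap.txt
# after editing, recompile and inject it back into the cluster
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin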
Customizing the CRUSH map with commands
# Create an ssd root bucket
[root@node1 ~]# ceph osd crush add-bucket ssd root
# Create a host bucket for each node
[root@node1 ~]# ceph osd crush add-bucket node1-ssd host
[root@node1 ~]# ceph osd crush add-bucket node2-ssd host
[root@node1 ~]# ceph osd crush add-bucket node3-ssd host
# Move the host buckets under the ssd root bucket
[root@node1 ~]# ceph osd crush move node1-ssd root=ssd
[root@node1 ~]# ceph osd crush move node2-ssd root=ssd
[root@node1 ~]# ceph osd crush move node3-ssd root=ssd
[root@node1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0 root ssd
-10 0 host node1-ssd
-11 0 host node2-ssd
-12 0 host node3-ssd
-1 0.14635 root default
-3 0.04878 host node1
0 hdd 0.01949 osd.0 up 1.00000 1.00000
3 hdd 0.02930 osd.3 up 1.00000 1.00000
-5 0.04878 host node2
1 hdd 0.01949 osd.1 up 1.00000 1.00000
4 hdd 0.02930 osd.4 up 1.00000 1.00000
-7 0.04878 host node3
2 hdd 0.01949 osd.2 up 1.00000 1.00000
5 hdd 0.02930 osd.5 up 1.00000 1.00000
# Move existing OSDs into the new CRUSH hierarchy
[root@node1 ~]# ceph osd crush move osd.3 host=node1-ssd root=ssd
[root@node1 ~]# ceph osd crush move osd.4 host=node2-ssd root=ssd
[root@node1 ~]# ceph osd crush move osd.5 host=node3-ssd root=ssd
[root@node1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.08789 root ssd
-10 0.02930 host node1-ssd
3 hdd 0.02930 osd.3 up 1.00000 1.00000
-11 0.02930 host node2-ssd
4 hdd 0.02930 osd.4 up 1.00000 1.00000
-12 0.02930 host node3-ssd
5 hdd 0.02930 osd.5 up 1.00000 1.00000
-1 0.05846 root default
-3 0.01949 host node1
0 hdd 0.01949 osd.0 up 1.00000 1.00000
-5 0.01949 host node2
1 hdd 0.01949 osd.1 up 1.00000 1.00000
-7 0.01949 host node3
2 hdd 0.01949 osd.2 up 1.00000 1.00000
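Note that the OSDs moved under the ssd root still report device class hdd in the CLASS column above, which is why the rule below is created with class hdd. If these were real SSDs, their class could be corrected instead; a sketch, assuming Luminous or later:
# clear the auto-detected class, then assign ssd
[root@node1 ~]# ceph osd crush rm-device-class osd.3 osd.4 osd.5
[root@node1 ~]# ceph osd crush set-device-class ssd osd.3 osd.4 osd.5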
# Create a CRUSH rule (running the command without arguments prints its usage)
[root@node1 ~]# ceph osd crush rule create-replicated
Invalid command: missing required parameter name(<string(goodchars [A-Za-z0-9-_.])>)
osd crush rule create-replicated <name> <root> <type> {<class>} : create crush rule <name> for replicated pool to start from <root>, replicate across buckets of type <type>, use devices of type <class> (ssd or hdd)
Error EINVAL: invalid command
[root@node1 ~]# ceph osd crush rule create-replicated ssd-demo ssd host hdd
# View the newly created CRUSH rule
[root@node1 ~]# ceph osd crush rule dump
[root@node1 ~]# ceph osd crush rule ls
replicated_rule
ssd-demo
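Before pointing a pool at the new rule, its mappings can be sanity-checked offline with crushtool's test mode. A sketch, assuming the rule id reported by ceph osd crush rule dump is 1:
# re-export the current map, then simulate 3-replica placements for the rule
[root@node1 ~]# ceph osd getcrushmap -o crushmap.bin
[root@node1 ~]# crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings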
# Change the pool's CRUSH rule
[root@node1 ~]# ceph osd pool set pool_demo crush_rule ssd-demo
set pool 9 crush_rule to ssd-demo
[root@node1 ~]# ceph osd pool get pool_demo crush_rule
crush_rule: ssd-demo
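Alternatively, a new pool can be created with the rule attached from the start instead of switching it afterwards; a sketch with an illustrative pool name and PG counts:
[root@node1 ~]# ceph osd pool create ssd-pool 32 32 replicated ssd-demo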
# Verify that objects land on the expected OSDs
[root@node1 ~]# rbd create pool_demo/demo.img --size 5G
[root@node1 ~]# rbd -p pool_demo ls
demo.img
[root@node1 ~]# ceph osd map pool_demo demo.img
osdmap e183 pool 'pool_demo' (9) object 'demo.img' -> pg 9.c1a6751d (9.d) -> up ([3,4,5], p3) acting ([3,4,5], p3)
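The up/acting set [3,4,5] matches the OSDs placed under the ssd root, confirming the rule is in effect. Placement for every PG in the pool, rather than a single object, can be checked as well (available in Luminous and later):
[root@node1 ~]# ceph pg ls-by-pool pool_demo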
Caveats
- When editing the CRUSH map by decompiling the binary file, keep the original .bin file so the map can be restored if something goes wrong
- Design the CRUSH hierarchy when the cluster is first built whenever possible; changing it after data has been written triggers large-scale data migration and hurts performance
- After manual changes, restarting ceph-osd moves the OSDs back to their original CRUSH locations, invalidating the custom layout; to prevent this, set the following parameter:
osd crush update on start = false
[root@node1 my-cluster]# cat ceph.conf  # note the [osd] section
[global]
fsid = 3f5560c6-3af3-4983-89ec-924e8eaa9e06
public_network = 192.168.6.0/24
cluster_network = 172.16.79.0/16
mon_initial_members = node1
mon_host = 192.168.6.160
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon_allow_pool_delete = true
[client.rgw.node1]
rgw_frontends = "civetweb port=80"
[osd]
osd crush update on start = false
[root@node1 my-cluster]# ceph-deploy --overwrite-conf config push node1 node2 node3
[root@node1 my-cluster]# systemctl restart ceph-osd.target
[root@node1 my-cluster]# ssh node2 systemctl restart ceph-osd.target
[root@node1 my-cluster]# ssh node3 systemctl restart ceph-osd.target
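After the restart, it is worth confirming that the option took effect and that the OSDs stayed under the custom buckets; a sketch run from node1, which hosts osd.3:
# query the running OSD through its admin socket
[root@node1 my-cluster]# ceph daemon osd.3 config get osd_crush_update_on_start
# osd.3/4/5 should still sit under root ssd
[root@node1 my-cluster]# ceph osd tree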