System: CentOS 7.9 minimal installation; apply the latest software patches; disable SELinux and the firewall.
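The original does not list the exact preparation commands; a minimal sketch (assumed, adjust to your environment) would be:
yum update -y                                                          # apply the latest patches
setenforce 0                                                           # disable SELinux for the current session
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config    # keep SELinux disabled across reboots
systemctl disable --now firewalld                                      # stop and disable the firewall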
1. Local system pre-configuration (perform all of the following steps on every node)
a) Set the hostname on each Ceph node:
hostnamectl set-hostname ceph1    # on node 1
hostnamectl set-hostname ceph2    # on node 2
hostnamectl set-hostname ceph3    # on node 3
b) Configure the hosts file (/etc/hosts) with local resolution records for all nodes:
192.168.80.245 ceph1
192.168.80.246 ceph2
192.168.80.247 ceph3
[root@ceph1 ~]# cat >> /etc/hosts <<EOF
> 192.168.80.245 ceph1
> 192.168.80.246 ceph2
> 192.168.80.247 ceph3
> EOF
[root@ceph1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.80.245 ceph1
192.168.80.246 ceph2
192.168.80.247 ceph3
[root@ceph2 ~]# cat >> /etc/hosts <<EOF
> 192.168.80.245 ceph1
> 192.168.80.246 ceph2
> 192.168.80.247 ceph3
> EOF
[root@ceph2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.80.245 ceph1
192.168.80.246 ceph2
192.168.80.247 ceph3
[root@ceph3 ~]# cat >> /etc/hosts <<EOF
> 192.168.80.245 ceph1
> 192.168.80.246 ceph2
> 192.168.80.247 ceph3
> EOF
[root@ceph3 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.80.245 ceph1
192.168.80.246 ceph2
192.168.80.247 ceph3
c) Configure time synchronization (unsynchronized clocks will trigger Ceph cluster health warnings). CentOS 7.9 ships with the chrony time-sync service installed by default.
vim /etc/chrony.conf    # edit the chrony configuration file
# These servers were defined in the installation:
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server ntp.aliyun.com iburst    # comment out the four default servers above and point to a public NTP server
# Allow NTP client access from local network.
#allow 192.168.0.0/16
allow 192.168.80.0/24    # copy the allow line and set it to the internal subnet that is allowed to sync time from this server
[root@ceph1 ~]# systemctl restart chronyd.service    # restart chrony so the new configuration takes effect (repeat on the other nodes)
[root@ceph1 ~]# systemctl status chronyd.service     # check the service status to make sure chrony is running
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2021-03-14 22:24:35 CST; 4s ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 1663 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 1660 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1662 (chronyd)
   CGroup: /system.slice/chronyd.service
           └─1662 /usr/sbin/chronyd
Mar 14 22:24:35 ceph1 systemd[1]: Starting NTP client/server...
Mar 14 22:24:35 ceph1 chronyd[1662]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK ...UG)
Mar 14 22:24:35 ceph1 chronyd[1662]: Frequency 13.537 +/- 2.008 ppm read from /var/lib/chr...ift
Mar 14 22:24:35 ceph1 systemd[1]: Started NTP client/server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@ceph1 ~]# chronyc sources    # check communication with the upstream NTP server (an asterisk in the S column means it is in sync)
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 203.107.6.88                  2   6    17     3   +358us[ +283us] +/-   18ms
[root@ceph1 ~]# chronyc clients    # check communication with the other nodes acting as NTP clients (an entry in the Hostname column means they are syncing)
Hostname                      NTP   Drop Int IntL Last     Cmd   Drop Int  Last
===============================================================================
ceph2                           4      0   1   -    32       0      0   -     -
ceph3                           4      0   1   -    18       0      0   -     -
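As an additional check not shown in the original, each of the other nodes can report its own sync status with chrony's tracking command:
chronyc tracking    # run on ceph2/ceph3; "Leap status : Normal" and a small system time offset indicate the clock is in sync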
d) Install Python 3: yum install -y python3
[root@ceph1 ~]# yum install -y python3
[root@ceph2 ~]# yum install -y python3
[root@ceph3 ~]# yum install -y python3
e) Install the Docker service; refer to the article linked below:
Install Docker CE
1. Install some required system utilities: yum install -y yum-utils devi […]
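The excerpt above is truncated; for reference, a typical Docker CE installation on CentOS 7 looks roughly like this (the Aliyun mirror URL is an assumption here, adapt to your environment):
yum install -y yum-utils device-mapper-persistent-data lvm2    # required system utilities
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo    # add the Docker CE repo (Aliyun mirror assumed)
yum install -y docker-ce docker-ce-cli containerd.io           # install the Docker engine
systemctl enable --now docker                                  # start Docker and enable it at boot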
2. Install cephadm (perform all of the following steps on Ceph node 1)
a) Fetch the script: curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
[root@ceph1 ~]# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
[root@ceph1 ~]# ll    # confirm the file actually downloaded (it sometimes fails due to network issues; simply retry until it succeeds)
total 220
-rw-------. 1 root root   1456 Jan 15  2019 anaconda-ks.cfg
-rw-r--r--  1 root root 219622 Mar 14 22:49 cephadm
b) Make the script executable: chmod +x cephadm
[root@ceph1 ~]# chmod +x cephadm
c) Run the installation script:
./cephadm add-repo --release octopus    # add the Ceph repository
cp /etc/yum.repos.d/ceph.repo{,.bak}    # back up the repo file
sed -i 's#download.ceph.com#mirrors.aliyun.com/ceph#' /etc/yum.repos.d/ceph.repo    # switch the repo to the Aliyun mirror in China
./cephadm install
[root@ceph1 ~]# ./cephadm add-repo --release octopus
Writing repo to /etc/yum.repos.d/ceph.repo...
Enabling EPEL...
[root@ceph1 ~]# cp /etc/yum.repos.d/ceph.repo{,.bak}
[root@ceph1 ~]# sed -i 's#download.ceph.com#mirrors.aliyun.com/ceph#' /etc/yum.repos.d/ceph.repo
[root@ceph1 ~]# ./cephadm install
Installing packages ['cephadm']...
d) Verify that cephadm has been added to the PATH (you may need to log out and back in):
[root@ceph1 ~]# which cephadm
/usr/sbin/cephadm
3. Deploy the Ceph cluster (perform all of the following steps on Ceph node 1)
a) Bootstrap the cluster: cephadm bootstrap --mon-ip 192.168.80.245
[root@ceph1 ~]# cephadm bootstrap --mon-ip 192.168.80.245
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 12837782-84d6-11eb-a474-00505622b20c
Verifying IP 192.168.80.245 port 3300 ...
Verifying IP 192.168.80.245 port 6789 ...
Mon IP 192.168.80.245 is in CIDR network 192.168.80.0/24
Pulling container image docker.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Adding host ceph1...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Enabling mgr prometheus module...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 13...
Mgr epoch 13 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:
             URL: https://ceph1:8443/
            User: admin
        Password: 0m4lylrdco
You can access the Ceph CLI with:
        sudo /usr/sbin/cephadm shell --fsid 12837782-84d6-11eb-a474-00505622b20c -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Please consider enabling telemetry to help improve Ceph:
        ceph telemetry on
For more information see:
        https://docs.ceph.com/docs/master/mgr/telemetry/
Bootstrap complete.
PS: this command performs the following actions:
1. Creates a monitor and a manager daemon for the new cluster on the local host.
2. Generates a new SSH key for the Ceph cluster and adds it to the root user's /root/.ssh/authorized_keys file.
3. Writes a minimal configuration file needed to communicate with the new cluster to /etc/ceph/ceph.conf.
4. Writes a copy of the client.admin administrative (privileged!) secret key to /etc/ceph/ceph.client.admin.keyring.
5. Writes a copy of the public key to /etc/ceph/ceph.pub.
b) Inspect the configuration files that were created:
[root@ceph1 ~]# ll /etc/ceph/
total 12
-rw------- 1 root root  63 Mar 14 23:01 ceph.client.admin.keyring
-rw-r--r-- 1 root root 179 Mar 14 23:01 ceph.conf
-rw-r--r-- 1 root root 595 Mar 14 23:02 ceph.pub
c) Check the pulled container images and the running containers:
[root@ceph1 ~]# docker image ls
REPOSITORY           TAG       IMAGE ID       CREATED         SIZE
ceph/ceph            v15       5553b0cb212c   2 months ago    943MB
ceph/ceph-grafana    6.6.2     a0dce381714a   9 months ago    509MB
prom/prometheus      v2.18.1   de242295e225   10 months ago   140MB
prom/alertmanager    v0.20.0   0881eb8f169f   15 months ago   52.1MB
prom/node-exporter   v0.18.1   e5a616e4b9cf   21 months ago   22.9MB
[root@ceph1 ~]# docker container ls
CONTAINER ID   IMAGE                        COMMAND                  CREATED         STATUS         PORTS   NAMES
bef4209bd542   prom/node-exporter:v0.18.1   "/bin/node_exporter …"   2 minutes ago   Up 2 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-node-exporter.ceph1
d062d4b9fc13   ceph/ceph-grafana:6.6.2      "/bin/sh -c 'grafana…"   2 minutes ago   Up 2 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-grafana.ceph1
3b81b1145e40   prom/alertmanager:v0.20.0    "/bin/alertmanager -…"   2 minutes ago   Up 2 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-alertmanager.ceph1
6442a8ae6fe8   prom/prometheus:v2.18.1      "/bin/prometheus --c…"   2 minutes ago   Up 2 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-prometheus.ceph1
839e0a93f289   ceph/ceph:v15                "/usr/bin/ceph-crash…"   2 minutes ago   Up 2 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-crash.ceph1
fe8d6084aad4   ceph/ceph:v15                "/usr/bin/ceph-mgr -…"   3 minutes ago   Up 3 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-mgr.ceph1.usdqnp
180e32d34660   ceph/ceph:v15                "/usr/bin/ceph-mon -…"   3 minutes ago   Up 3 minutes           ceph-12837782-84d6-11eb-a474-00505622b20c-mon.ceph1
PS: the following components are now running:
ceph-mgr: the Ceph manager daemon
ceph-monitor: the Ceph monitor daemon
ceph-crash: the crash report collection module
prometheus: the Prometheus monitoring component
grafana: the dashboard that visualizes the monitoring data
alertmanager: the Prometheus alerting component
node_exporter: the Prometheus per-node metrics collector
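Besides docker container ls, these daemons can also be confirmed from the host by asking cephadm itself; a minimal check (not shown in the original) would be:
cephadm ls    # lists, as JSON, every daemon cephadm has deployed on this host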
d) Using the URL printed at the end of the bootstrap, open the Dashboard in a browser (https://ceph1-IP:8443), change the password, and log in to the Ceph Dashboard. There is also a Grafana page that shows the cluster status in real time (https://ceph1-IP:3000).
e) Enable the ceph commands:
e1) Standard usage: by default the basic ceph commands are not available on the local host; run cephadm shell to enter a dedicated shell where they work (type exit to leave).
[root@ceph1 ~]# ceph -s
-bash: ceph: command not found
[root@ceph1 ~]# cephadm shell
Inferring fsid 12837782-84d6-11eb-a474-00505622b20c
Inferring config /var/lib/ceph/12837782-84d6-11eb-a474-00505622b20c/mon.ceph1/config
Using recent ceph image ceph/ceph@sha256:37939a3739e4e037dcf1b1f5828058d721d8c6de958212609f9e7d920b9c62bf
[ceph: root@ceph1 /]# ceph -s
  cluster:
    id:     12837782-84d6-11eb-a474-00505622b20c
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum ceph1 (age 42m)
    mgr: ceph1.usdqnp(active, since 41m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

[ceph: root@ceph1 /]# exit
exit
[root@ceph1 ~]#
e2) Install the ceph-common package so the basic ceph commands work directly on the local host: cephadm install ceph-common
[root@ceph1 ~]# cephadm install ceph-common
Installing packages ['ceph-common']...
[root@ceph1 ~]# ceph -s
  cluster:
    id:     12837782-84d6-11eb-a474-00505622b20c
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum ceph1 (age 48m)
    mgr: ceph1.usdqnp(active, since 47m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
f) Add the other hosts to the cluster
f1) Install the cluster's public SSH key on the other Ceph nodes: ssh-copy-id -f -i /etc/ceph/ceph.pub root@Hostname
[root@ceph1 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/etc/ceph/ceph.pub"
The authenticity of host 'ceph2 (192.168.80.246)' can't be established.
ECDSA key fingerprint is SHA256:2Eo2WLWyofiltEAs4nLUFLOcXLFD6YvsuPSDlEDUZGk.
ECDSA key fingerprint is MD5:3c:b0:5f:a8:af:6a:15:45:eb:a9:2a:b0:20:21:65:04.
Are you sure you want to continue connecting (yes/no)? yes
root@ceph2's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@ceph2'"
and check to make sure that only the key(s) you wanted were added.

[root@ceph1 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph3
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/etc/ceph/ceph.pub"
The authenticity of host 'ceph3 (192.168.80.247)' can't be established.
ECDSA key fingerprint is SHA256:2Eo2WLWyofiltEAs4nLUFLOcXLFD6YvsuPSDlEDUZGk.
ECDSA key fingerprint is MD5:3c:b0:5f:a8:af:6a:15:45:eb:a9:2a:b0:20:21:65:04.
Are you sure you want to continue connecting (yes/no)? yes
root@ceph3's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@ceph3'"
and check to make sure that only the key(s) you wanted were added.
f2) Add the new nodes to the Ceph cluster: ceph orch host add Hostname
[root@ceph1 ~]# ceph orch host add ceph2
Added host 'ceph2'
[root@ceph1 ~]# ceph orch host add ceph3
Added host 'ceph3'
[root@ceph1 ~]# ceph orch host ls    # list all hosts now managed by Ceph
HOST   ADDR   LABELS  STATUS
ceph1  ceph1
ceph2  ceph2
ceph3  ceph3
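Optionally (not part of the original walkthrough), hosts can also be labeled at this point to steer daemon placement later, for example:
ceph orch host label add ceph1 mon    # attach an arbitrary label (here "mon") to a host for placement rules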
[root@ceph1 ~]# ceph -s
  cluster:
    id:     12837782-84d6-11eb-a474-00505622b20c
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:    # note that three mon daemons (including the first node) and two mgr daemons have now been deployed automatically
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 50s)
    mgr: ceph1.usdqnp(active, since 60m), standbys: ceph2.qopzlo
    osd: 0 osds: 0 up, 0 in

  task status:

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
4. Deploy OSDs (perform all of the following steps on Ceph node 1)
a) Automatically consume any available and unused storage device (this mode prints no feedback, which can make later problems harder to trace, so choose it at your discretion): ceph orch apply osd --all-available-devices
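Before choosing either mode, it can help to check which devices the orchestrator actually considers usable; a minimal check (not in the original) would be:
ceph orch device ls    # list the storage devices on every managed host and whether they are available for OSDs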
b) Here we choose to add the OSDs manually instead:
b1) Run lsblk on each node to identify the device name that will be turned into an OSD:
[root@ceph1 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk
├─sda1   8:1    0    1G  0 part /boot
└─sda2   8:2    0   99G  0 part /
sdb      8:16   0  100G  0 disk
[root@ceph2 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk
├─sda1   8:1    0    1G  0 part /boot
└─sda2   8:2    0   99G  0 part /
sdb      8:16   0  100G  0 disk
[root@ceph3 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk
├─sda1   8:1    0    1G  0 part /boot
└─sda2   8:2    0   99G  0 part /
sdb      8:16   0  100G  0 disk
b2) Once confirmed, add each device with: ceph orch daemon add osd Hostname:/dev/sdx
[root@ceph1 ~]# ceph orch daemon add osd ceph1:/dev/sdb
Created osd(s) 0 on host 'ceph1'
[root@ceph1 ~]# ceph orch daemon add osd ceph2:/dev/sdb
Created osd(s) 1 on host 'ceph2'
[root@ceph1 ~]# ceph orch daemon add osd ceph3:/dev/sdb
Created osd(s) 2 on host 'ceph3'
[root@ceph1 ~]# ceph -s
  cluster:
    id:     12837782-84d6-11eb-a474-00505622b20c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 6m)
    mgr: ceph2.qopzlo(active, since 6m), standbys: ceph1.usdqnp
    osd: 3 osds: 3 up (since 11s), 3 in (since 11s)    # after a short while the newly added OSDs all show as up and in, meaning they took effect correctly

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail
    pgs:     1 active+clean
[root@ceph1 ~]# ceph -s
  cluster:
    id:     12837782-84d6-11eb-a474-00505622b20c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 12m)
    mgr: ceph2.qopzlo(active, since 12m), standbys: ceph1.usdqnp
    osd: 3 osds: 3 up (since 6m), 3 in (since 6m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 297 GiB / 300 GiB avail
    pgs:     1 active+clean
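As a final check beyond what the original shows, the OSD layout and the daemons managed by the orchestrator can be inspected with:
ceph osd tree    # show the CRUSH tree; each host should carry one OSD, all marked up
ceph orch ps     # list every daemon the orchestrator is running across the three nodes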