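Background:
After deploying the Ambari big data platform on a test server, I found that the METRICS COLLECTOR service had a problem and would not start. A blog post suggested that the ntpd service might be at fault, so I checked the status of ntpd, which was as follows: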
[root@slave2 root]# systemctl status ntpd
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2022-04-10 11:27:07 CST; 5h 39min ago
Process: 756 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 781 (ntpd)
CGroup: /system.slice/ntpd.service
└─781 /usr/sbin/ntpd -u ntp:ntp -g
Apr 10 11:27:07 slave2 ntpd[781]: 0.0.0.0 c012 02 freq_set kernel 0.057 PPM
Apr 10 11:27:12 slave2 ntpd[781]: Listen normally on 4 ens33 192.168.0.18 UDP 123
Apr 10 11:27:12 slave2 ntpd[781]: Listen normally on 5 ens33 fe80::20c:29ff:fe23:5740 UDP 123
Apr 10 11:27:12 slave2 ntpd[781]: new interface(s) found: waking up resolver
Apr 10 11:35:55 slave2 ntpd[781]: 0.0.0.0 c615 05 clock_sync
Apr 10 12:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 13:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 14:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 15:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 16:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
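The ntpd service reports the error: frequency file /var/lib/ntp/drift.TEMP: Permission denied. This error can break time synchronization, which means the ntpq -p command may not work properly. Once time synchronization fails, a big data platform starts showing all sorts of strange problems. For example, the Ambari collector service gathers information from the services on each node on a time basis, so clock differences between nodes can prevent the service from starting.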
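To confirm that a node is hitting this particular error, the ntpd log can be scanned for the message. The following is a minimal sketch, assuming the log text arrives on standard input; the function name count_drift_errors is hypothetical and not part of ntp or Ambari.

```shell
#!/bin/sh
# Sketch: count ntpd log lines that report a drift-file permission error.
# Reads log text from stdin and prints the number of matching lines.
# The function name is hypothetical, chosen for illustration only.
count_drift_errors() {
    grep -c 'frequency file .*: Permission denied'
}

# Usage (on a systemd host):
#   journalctl -u ntpd --no-pager | count_drift_errors
```

A non-zero count suggests ntpd cannot write its frequency file on that node.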
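Solution:
Inspecting the drift file showed that its group had been changed to root, so I temporarily raised the file's permissions to 777. That did not seem safe, so I then tightened the permissions to 750. In any case, the file's attributes should end up as shown below: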
[root@slave2 ntp]# ll
total 4
-rw-r--r-- 1 ntp ntp 6 Mar 28 22:10 drift
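The state in the listing above could be restored with something like the following sketch. It assumes the drift file lives at /var/lib/ntp/drift and should be owned by ntp:ntp with mode 644, as the listing shows; the function name fix_drift_perms is hypothetical. Because the error message names drift.TEMP, a temporary file that ntpd creates next to the drift file, the sketch also prints the parent directory so its ownership can be checked by eye.

```shell
#!/bin/sh
# Sketch: restore owner and mode of the ntpd drift file.
# Defaults (path, ntp:ntp, 644) mirror the listing above and are assumptions
# about this host; the function name is hypothetical.
fix_drift_perms() {
    drift_file="${1:-/var/lib/ntp/drift}"
    owner="${2:-ntp:ntp}"

    # Refuse to continue if the file does not exist.
    [ -e "$drift_file" ] || { echo "missing: $drift_file" >&2; return 1; }

    chown "$owner" "$drift_file" || return 1
    chmod 644 "$drift_file" || return 1

    # Show the result, plus the parent directory where ntpd writes drift.TEMP.
    ls -l "$drift_file"
    ls -ld "$(dirname "$drift_file")"
}

# Usage (as root):
#   fix_drift_perms && systemctl restart ntpd
```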
Restart the ntpd service once more at this point, and the time service returns to normal.
[root@slave2 ntp]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
master LOCAL(0) 6 u 11 64 7 0.391 -0.083 0.008
[root@slave2 ntp]# ntpstat
unsynchronised
polling server every 8 s
[root@slave2 ntp]# ntpstat
unsynchronised
polling server every 8 s
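ntpstat still reports unsynchronised here, which is likely because ntpd had only just been restarted and had not yet selected a peer. ntpq -p marks the selected system peer with a leading "*", so a small check like the sketch below can be rerun until it succeeds. The function name has_system_peer is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if `ntpq -p` output on stdin contains a selected
# system peer, which ntpq marks with a leading "*" in the first column.
# The function name is hypothetical, chosen for illustration only.
has_system_peer() {
    awk '/^\*/ { found = 1 } END { exit !found }'
}

# Usage:
#   ntpq -p | has_system_peer && echo "a system peer has been selected"
```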
After restarting the Ambari collector service again, it returned to normal.
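Summary:
The test server had once been used to reproduce the sshd error ssh_exchange_identification: read: Connectio..., and for that test the permissions of the /var/lib/ directory were changed to 777. That change is what caused the problem on one node of the Ambari platform.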
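To find leftovers from a permission change like this one, the world-writable directories directly under /var/lib can be listed. The following is a rough sketch; the function name list_world_writable_dirs is hypothetical, and it relies on the -maxdepth option found in GNU and BSD find.

```shell
#!/bin/sh
# Sketch: list directories at or directly under a base directory that are
# world-writable (the "other" write bit is set), e.g. after a chmod 777.
# The function name is hypothetical; the default path is an assumption.
list_world_writable_dirs() {
    find "${1:-/var/lib}" -maxdepth 1 -type d -perm -0002
}

# Usage:
#   list_world_writable_dirs /var/lib
```

Each directory it prints can then be compared with a healthy node before its mode is changed back.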