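Environment: RHEL 5.5 + Oracle 11g RAC
A customer reported that after shutting down the cluster and starting it again, CRS would not come up. The error was "Cannot communicate with Cluster Ready Services".
Log in to the host and check the CRS status: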
[root@rac-2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
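When several nodes have to be checked, it can help to filter this output down to the components that are not online. The following is a minimal sketch, not part of the original procedure; the function name `crs_not_online` is made up for illustration, and it assumes that healthy components are reported with lines ending in "is online", as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: list the components that `crsctl check crs`
# does not report as online. Reads the command output on stdin,
# so it also works on saved output.
crs_not_online() {
    # Keep only CRS-nnnn message lines, then drop the "is online" ones.
    grep -E '^CRS-[0-9]+:' | grep -v 'is online$'
}

# Example on a live node:
#   crsctl check crs | crs_not_online
```

With the output shown above, only the CRS-4535 line would remain.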
Check the clusterware alert log:
[grid@rac-2 rac-2]$ tail -100 alertrac-2.log | more
2016-09-23 03:16:17.396
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:18.697
[crsd(22676)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:18.704
[crsd(22676)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
]. Details at (:CRSD00111:) in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:19.433
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:20.737
[crsd(22685)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:20.747
[crsd(22685)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
]. Details at (:CRSD00111:) in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:21.473
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:21.473
[ohasd(3899)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
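The alert log repeats the same few messages on every restart attempt, so a per-code count gives a quicker overview than reading the tail. The sketch below is an illustration only; `crs_error_summary` is a hypothetical name, and it counts just the first CRS code on each line.

```shell
#!/bin/sh
# Hypothetical helper: count how often each CRS-nnnn code appears.
# Reads alert log text on stdin and prints "CODE COUNT", sorted by code.
crs_error_summary() {
    awk 'match($0, /CRS-[0-9]+/) {
             # Tally the first CRS code found on the line.
             n[substr($0, RSTART, RLENGTH)]++
         }
         END { for (c in n) print c, n[c] }' | sort
}

# Example:
#   tail -100 alertrac-2.log | crs_error_summary
```

In the excerpt above, CRS-2765 (resource failed) recurs until CRS-2771 reports that the restart limit was reached, while CRS-1013 and CRS-0804 carry the underlying OCR error.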
Check crsd.log:
2016-09-23 03:16:20.461: [CRSMAIN][1106286912] Policy Engine is not initialized yet!
2016-09-23 03:16:20.463: [CRSMAIN][3556262304] Initializing OCR
[CLWAL][3556262304]clsw_Initialize: OLR initlevel [70000]
2016-09-23 03:16:20.735: [OCRASM][3556262304]proprasmo: Error in open/create file in dg [ORC_VOTE]
[OCRASM][3556262304]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
2016-09-23 03:16:20.735: [OCRASM][3556262304]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup
2016-09-23 03:16:20.737: [OCRASM][3556262304]proprasmo: kgfoCheckMount returned [7]
2016-09-23 03:16:20.737: [OCRASM][3556262304]proprasmo: The ASM instance is down
2016-09-23 03:16:20.738: [OCRRAW][3556262304]proprioo: Failed to open [+ORC_VOTE]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2016-09-23 03:16:20.738: [OCRRAW][3556262304]proprioo: No OCR/OLR devices are usable
2016-09-23 03:16:20.738: [OCRASM][3556262304]proprasmcl: asmhandle is NULL
2016-09-23 03:16:20.738: [GIPC][3556262304] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
2016-09-23 03:16:20.740: [default][3556262304]clsvactversion:4: Retrieving Active Version from local storage.
2016-09-23 03:16:20.742: [OCRRAW][3556262304]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2016-09-23 03:16:20.745: [OCRRAW][3556262304]proprinit: Could not open raw device
2016-09-23 03:16:20.745: [OCRASM][3556262304]proprasmcl: asmhandle is NULL
2016-09-23 03:16:20.746: [OCRAPI][3556262304]a_init:16!: Backend init unsuccessful : [26]
2016-09-23 03:16:20.747: [CRSOCR][3556262304] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2016-09-23 03:16:20.748: [CRSMAIN][3556262304] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2016-09-23 03:16:20.748: [CRSD][3556262304][PANIC] CRSD exiting: Could not init OCR, code: 26
2016-09-23 03:16:20.748: [CRSD][3556262304] Done.
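Most of crsd.log is internal tracing; the lines that matter here are the PROC- and ORA- errors. A rough way to pull out the distinct ones is sketched below. The function name `ocr_root_errors` is hypothetical, and `grep -o` is a GNU/BSD extension rather than POSIX, which is assumed to be available on the host.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct ORA-/PROC- error message once.
# Reads crsd.log text on stdin.
ocr_root_errors() {
    # -o prints only the matched part, from the error code to end of line.
    grep -oE '(ORA|PROC)-[0-9]+: .*' | sort -u
}

# Example:
#   ocr_root_errors < /u02/11.2.0/grid/log/rac-2/crsd/crsd.log
```

For the excerpt above this reduces the log to two messages: ORA-15077 (no ASM instance serving the disk group) and PROC-26 (error accessing the physical storage).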
The errors point to a problem with ASM, so check the ASM disks:
[root@rac-2 ~]# /etc/init.d/oracleasm listdisks
ASMDATA01
ASMDATA02
ASMDATA03
OCR_VOTE
The disks are present.
After stopping CRS, check for CRS-related processes:
[root@rac-2 ~]# ps -ef | grep d.bin
root 3899 1 0 Jan13 ? 00:18:59 /u02/11.2.0/grid/bin/ohasd.bin reboot
grid 4267 1 0 Jan13 ? 00:34:32 /u02/11.2.0/grid/bin/oraagent.bin
grid 4280 1 0 Jan13 ? 00:00:16 /u02/11.2.0/grid/bin/mdnsd.bin
grid 4293 1 0 Jan13 ? 00:06:10 /u02/11.2.0/grid/bin/gpnpd.bin
root 4304 1 0 Jan13 ? 01:31:25 /u02/11.2.0/grid/bin/orarootagent.bin
grid 4307 1 0 Jan13 ? 00:27:27 /u02/11.2.0/grid/bin/gipcd.bin
root 4322 1 0 Jan13 ? 00:45:33 /u02/11.2.0/grid/bin/osysmond.bin
root 4332 1 0 Jan13 ? 00:01:24 /u02/11.2.0/grid/bin/cssdmonitor
root 4350 1 0 Jan13 ? 00:02:39 /u02/11.2.0/grid/bin/cssdagent
grid 4362 1 0 Jan13 ? 01:45:38 /u02/11.2.0/grid/bin/ocssd.bin
root 4437 1 0 Jan13 ? 00:28:42 /u02/11.2.0/grid/bin/octssd.bin reboot
grid 4461 1 0 Jan13 ? 00:00:22 /u02/11.2.0/grid/bin/evmd.bin
grid 4843 4461 0 Jan13 ? 00:00:00 /u02/11.2.0/grid/bin/evmlogger.bin -o /u02/11.2.0/grid/evm/log/evmlogger.info -l /u02/11.2.0/grid/evm/log/evmlogger.log
root 4941 1 0 Jan13 ? 00:21:18 /u02/11.2.0/grid/bin/ologgerd -m rac-1 -r -d /u02/11.2.0/grid/crf/db/rac-2
root 23122 22979 0 03:54 pts/3 00:00:00 grep d.bin
CRS has been stopped, but many of its processes were never released. Kill them manually:
[root@rac-2 ~]# ps -ef | grep d.bin | awk '{print $2}' | xargs kill -9
kill 23131: No such process
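The "No such process" message most likely comes from the `grep` in the pipeline matching its own command line and exiting before `kill` runs. The pattern `d.bin` is also looser than it looks: `.` matches any character, so it matches `grid/bin` in every path, which is why `cssdmonitor` and `cssdagent` show up in the listing. A stricter variant is sketched below, assuming the Grid home shown in the listing; the names `GRID_BIN` and `grid_pids` are made up for illustration.

```shell
#!/bin/sh
# Assumption: Grid home bin directory, taken from the ps listing above.
GRID_BIN=${GRID_BIN:-/u02/11.2.0/grid/bin}

# Hypothetical helper: print the PIDs of processes whose command starts
# with the Grid home bin directory. Reads `ps -ef` output on stdin.
grid_pids() {
    # In `ps -ef` output, field 2 is the PID and field 8 is the command.
    awk -v bin="$GRID_BIN/" 'index($8, bin) == 1 { print $2 }'
}

# Usage, only on a node where CRS has already been stopped.
# (-r is a GNU xargs option that skips kill when the list is empty.)
#   ps -ef | grid_pids | xargs -r kill -9
```

Because the match is on the command path rather than on the whole line, the `grep` process itself is never selected.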
Start CRS again, and the problem is resolved.
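The original does not show the commands for this step. One plausible sequence is `crsctl start crs` as root, followed by repeated `crsctl check crs` until CRSD reports online. The sketch below assumes that a healthy CRSD is reported as "CRS-4537: Cluster Ready Services is online"; the function names, the `CRSCTL` variable, and the 10-second interval are illustrative choices.

```shell
#!/bin/sh
# Returns 0 if the `crsctl check crs` output on stdin reports CRSD online.
crsd_online() {
    grep -q '^CRS-4537:'
}

# Hypothetical helper: poll until CRSD is online or the attempts run out.
# $1 = number of attempts (default 30), $2 = seconds between attempts.
wait_for_crsd() {
    tries=${1:-30}
    interval=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if "${CRSCTL:-crsctl}" check crs | crsd_online; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

# Usage as root:
#   crsctl start crs && wait_for_crsd 30 10
```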
Permanent link to this article: http://www.linuxidc.com/Linux/2016-10/135749.htm