I/O Errors in Alert log with ORA-29701, with "gipcWait failed with 16" in trace (Doc ID 1496329.1)
1. Database alert log
Fri May 04 10:56:59 2018
Errors in file /oracle/app/oracle/diag/rdbms/orcl/rocl1/trace/rocl1_ora_65536796.trc:
ORA-01114: IO error writing block to file  (block # )
Fri May 04 10:57:00 2018
2. Trace file
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/db_1
System name:    AIX
Node name:      rac1
Release:        1
Version:        7
Machine:        00F6E7C84C00
Instance name: rocl1
Redo thread mounted by this instance: 1
Oracle process number: 1540
Unix process pid: 13962128, image: oracle@rac1

*** 2018-05-04 10:56:58.840
*** SESSION ID:(292.52991) 2018-05-04 10:56:58.840
*** CLIENT ID:() 2018-05-04 10:56:58.840
*** SERVICE NAME:(orcl) 2018-05-04 10:56:58.840
*** MODULE NAME:(JDBC Thin Client) 2018-05-04 10:56:58.840
*** ACTION NAME:() 2018-05-04 10:56:58.840

2018-05-04 10:56:58.828: [ CSSCLNT]clssscConnect: gipcWait failed with 16 (12)
2018-05-04 10:56:58.840: [ CSSCLNT]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_scdb02_)) failed, rc 16
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
kjfmsgr: unable to connect to NM for reg in shared group
ORA-01114: IO error writing block to file  (block # )
Dump of memory from 0x070001209CBA0328 to 0x070001209CBA0D3B
70001209CBA0320                   57495448 20544F44           [WITH TOD]
3. ocssd.log
-- Check the file /oracle/app/11.2.0/grid/log/rac1/cssd/ocssd.log
2018-05-04 10:56:59.495: [    CSSD][1029]clssgmQueueShare: (11ba99f10) target global grock DBORCL member 1 type 1 queued from client (1176496b0), global grock DBORCL, refcount 757
2018-05-04 10:56:59.495: [    CSSD][1029]clssgmRegisterShared: global grock DBORCL member 1 share type 1, refcount 757
2018-05-04 10:56:59.743: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11ba653d0, ret gipcretAuthFail (22)
2018-05-04 10:56:59.743: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ]  error during accept on endp 1112a2970
2018-05-04 10:56:59.744: [GIPCXCPT][1029] gipcmodClscCallback: async request failed req 1172b0bf0 [00000000e3b63bc0] { gipcSendRequest : addr '', data 11727c490, len 48, olen 0, parentEndp 11abbcef0, ret gipcretConnectionLost (12), objFlags 0x0, reqFlags 0x224 }, ret gipcretConnectionLost (12)
2018-05-04 10:56:59.745: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11abbcef0, ret gipcretConnectionInvalid (13)
2018-05-04 10:56:59.745: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretConnectionInvalid (13) ]  error during accept on endp 1112a2970
2018-05-04 10:56:59.804: [    CSSD][1029]clssscSelect: cookie accept request 11ad57f10
2018-05-04 10:56:59.804: [    CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ad57f10
2018-05-04 10:56:59.804: [    CSSD][1029]clssgmRegisterClient: proc(7589/11ad57f10), client(2/1174aaa90)
2018-05-04 10:56:59.804: [    CSSD][1029]clssscSelect: cookie accept request 11ba74630
2018-05-04 10:56:59.804: [    CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ba74630
2018-05-04 10:56:59.804: [    CSSD][1029]clssgmRegisterClient: proc(7591/11ba74630), client(1/117497510)
2018-05-04 10:56:59.931: [    CSSD][1029]clssgmRegisterShared: grp DG_LOCAL_DATA, mbr 0, type 1
2018-05-04 10:56:59.931: [    CSSD][1029]clssgmQueueShare: (11a93a690) target local grock DG_LOCAL_DATA member 0 type 1 queued from client (1174aaa90), local grock DG_LOCAL_DATA, refcount 721
2018-05-04 10:56:59.931: [    CSSD][1029]clssgmRegisterShared: local grock DG_LOCAL_DATA member 0 share type 1, refcount 721
2018-05-04 10:56:59.932: [    CSSD][1029]clssgmRegisterShared: grp DBORCL, mbr 1, type 1
2018-05-04 10:56:59.932: [    CSSD][1029]clssgmQueueShare: (11a93ab70) target global grock DBORCL member 1 type 1 queued from client (117497510), global grock DBORCL, refcount 758
2018-05-04 10:56:59.932: [    CSSD][1029]clssgmRegisterShared: global grock DBORCL member 1 share type 1, refcount 758
2018-05-04 10:57:00.194: [GIPCXCPT][1029] gipcmodClscCallback: async request failed req 11730eff0 [00000000e3b63c64] { gipcSendRequest : addr '', data 1172fce90, len 48, olen 0, parentEndp 11abbcef0, ret gipcretConnectionLost (12), objFlags 0x0, reqFlags 0x224 }, ret gipcretConnectionLost (12)
2018-05-04 10:57:00.195: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11abbcef0, ret gipcretConnectionInvalid (13)
2018-05-04 10:57:00.195: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretConnectionInvalid (13) ]  error during accept on endp 1112a2970
2018-05-04 10:57:00.254: [    CSSD][1029]clssscSelect: cookie accept request 11ba4a590
2018-05-04 10:57:00.254: [    CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ba4a590
2018-05-04 10:57:00.254: [    CSSD][1029]clssgmRegisterClient: proc(7590/11ba4a590), client(2/11764d8f0)
2018-05-04 10:57:00.254: [    CSSD][1029]clssscSelect: cookie accept request 1109c2e00
2018-05-04 10:57:00.254: [    CSSD][1029]clssgmAllocProc: (11bac8dd0) allocated
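Picking the gipcret return codes out of a long ocssd.log by eye is tedious, so a small script can tally them. The following is only a sketch: the function name count_gipc_errors and the regular expression are my own assumptions based on the message format seen in this log, not an Oracle tool or a documented log grammar.

```python
import re
from collections import Counter

# Matches return codes such as "ret gipcretAuthFail (22)".
# The pattern is inferred from the log lines in this post (an assumption).
GIPC_RET = re.compile(r"ret (gipcret\w+) \((\d+)\)")


def count_gipc_errors(lines):
    """Count every gipcret* return code found in the given log lines.

    A line that reports the same code twice is counted twice.
    """
    counts = Counter()
    for line in lines:
        for name, code in GIPC_RET.findall(line):
            counts[(name, int(code))] += 1
    return counts


# Two entries from the ocssd.log excerpt in this section.
sample = [
    "2018-05-04 10:56:59.743: [ GIPCMUX][1029] gipcmodMuxTransferAccept: "
    "EXCEPTION[ ret gipcretAuthFail (22) ]  error during accept on endp 1112a2970",
    "2018-05-04 10:56:59.745: [GIPCXCPT][1029] gipcmodMuxTransferAccept: "
    "internal accept request failed endp 1112a2970, child 11abbcef0, "
    "ret gipcretConnectionInvalid (13)",
]

result = count_gipc_errors(sample)
for (name, code), n in sorted(result.items()):
    print(f"{name} ({code}): {n}")
```

To scan the real file, pass an open file object instead of the sample list, for example `count_gipc_errors(open(path, errors="replace"))`, where `path` points at ocssd.log.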
4. Check CRS_HOME space and files
The directory has sufficient free space.
ls -ld /var/tmp/.oracle
drwxrwxrwt    2 root     oinstall        256 Nov 23 2014  /var/tmp/.oracle
ls -ld /tmp/.oracle
drwxrwxrwt    2 root     oinstall       4096 Jan 23 01:43 /tmp/.oracle
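The two checks in this step (does the directory exist, and is there free space on its filesystem) can be scripted so they are easy to repeat on every node. This is a minimal sketch: check_dir is a hypothetical helper, and treating /oracle/app/11.2.0/grid as CRS_HOME is my assumption, inferred from the ocssd.log path above.

```python
import os
import shutil


def check_dir(path):
    """Report whether a directory exists and the free space (MB) on its filesystem."""
    if not os.path.isdir(path):
        return {"path": path, "exists": False, "free_mb": None}
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    return {"path": path, "exists": True, "free_mb": free_mb}


# Socket directories named in this post, plus the assumed CRS_HOME.
for candidate in ("/var/tmp/.oracle", "/tmp/.oracle", "/oracle/app/11.2.0/grid"):
    print(check_dir(candidate))
```

A result with `exists` set to False for the socket directory, or a very small `free_mb` for CRS_HOME, would point to one of the two causes listed in the MOS note.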
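5. At this time the number of active sessions in the database rose sharply; SQL statement 459f3z9u4fb3u, which queries dictionary views, was waiting on the "cursor: pin S wait on X" event, and SGA components were shrinking and growing frequently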
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/03 16:44 |   1
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/03 16:44 |   2
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/03 16:44 |   2
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/03 16:44 |   3
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/03 16:44 |   3
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/03 16:44 |   2
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/03 16:44 |   2
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/03 16:44 |   3
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/03 16:44 |   3
SHRINK |IMMEDIATE |db_cache_size    | 92672| 92160| 92160|COMPLETE |05/03 16:45 |   3
GROW   |IMMEDIATE |shared_pool_size | 33280| 33792| 33792|COMPLETE |05/03 16:45 |   3
GROW   |DEFERRED  |db_cache_size    | 92160| 92672| 92672|COMPLETE |05/03 16:55 |   1
SHRINK |DEFERRED  |shared_pool_size | 33792| 33280| 33280|COMPLETE |05/03 16:55 |   1
SHRINK |DEFERRED  |shared_pool_size | 33280| 32768| 32768|COMPLETE |05/04 09:53 |   0
GROW   |DEFERRED  |db_cache_size    | 92672| 93184| 93184|COMPLETE |05/04 09:53 |   0
GROW   |DEFERRED  |db_cache_size    | 93184| 93696| 93696|COMPLETE |05/04 10:02 |  88
SHRINK |DEFERRED  |shared_pool_size | 32768| 32256| 32256|COMPLETE |05/04 10:02 |  88
GROW   |DEFERRED  |db_cache_size    | 93696| 94208| 94208|COMPLETE |05/04 10:53 | 104
SHRINK |DEFERRED  |shared_pool_size | 32256| 31744| 31744|COMPLETE |05/04 10:53 | 104
SHRINK |IMMEDIATE |db_cache_size    | 94208| 93696| 93696|COMPLETE |05/04 10:54 |   1
GROW   |IMMEDIATE |shared_pool_size | 31744| 32256| 32256|COMPLETE |05/04 10:54 |   1
GROW   |IMMEDIATE |shared_pool_size | 32256| 32768| 32768|COMPLETE |05/04 10:54 |   7
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/04 10:54 |   6
GROW   |IMMEDIATE |shared_pool_size | 32256| 32768| 32768|COMPLETE |05/04 10:54 |   6
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/04 10:54 |   7
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/04 10:55 |   1
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/04 10:55 |   1
SHRINK |IMMEDIATE |db_cache_size    | 92672| 92160| 92160|COMPLETE |05/04 10:55 |   4
SHRINK |IMMEDIATE |db_cache_size    | 92672| 92160| 92160|COMPLETE |05/04 10:55 |   1
GROW   |IMMEDIATE |shared_pool_size | 33280| 33792| 33792|COMPLETE |05/04 10:55 |   4
GROW   |IMMEDIATE |shared_pool_size | 33280| 33792| 33792|COMPLETE |05/04 10:55 |   1
SHRINK |DEFERRED  |shared_pool_size | 33792| 33280| 33280|COMPLETE |05/04 11:09 |  85
GROW   |DEFERRED  |db_cache_size    | 92160| 92672| 92672|COMPLETE |05/04 11:09 |  85
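To see how often each component was resized, the listing can be summarized per operation and component. The sketch below assumes the rows are pipe-delimited with the operation in the first field and the component in the third, padded with spaces, as in this listing; summarize_resize_ops is a hypothetical helper name.

```python
from collections import Counter


def summarize_resize_ops(rows):
    """Count (operation, component) pairs in pipe-delimited resize rows."""
    counts = Counter()
    for row in rows:
        fields = [field.strip() for field in row.split("|")]
        if len(fields) < 3:
            continue  # not a data row
        oper, _mode, component = fields[:3]
        counts[(oper, component)] += 1
    return counts


# Three rows from the listing in this section (05/04 10:54).
sample = [
    "SHRINK |IMMEDIATE |db_cache_size    | 94208| 93696| 93696|COMPLETE |05/04 10:54 |   1",
    "GROW   |IMMEDIATE |shared_pool_size | 31744| 32256| 32256|COMPLETE |05/04 10:54 |   1",
    "SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/04 10:54 |   6",
]

summary = summarize_resize_ops(sample)
for (oper, component), n in sorted(summary.items()):
    print(f"{oper:6} {component:17} {n}")
```

Many IMMEDIATE shrinks of db_cache_size paired with grows of shared_pool_size within the same minute, as in the rows around 10:54 and 10:55, are the pattern described here as frequent SGA shrinking and growing.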
Cause 3. ocssd log has "gipcretAuthFail (22)" (Doc ID 1496329.1)
Example:
2012-09-08 05:26:31.168: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ]  error during accept on endp 111249b70
gipcretAuthFail (22) indicates "general security authorization failure".
This could occur for multiple reasons:
 * If the filesystem is full and there is no space to create a file under the auth directory. Please check that there is sufficient space in CRS_HOME.
 * This issue could also occur if the /var/tmp/.oracle socket is deleted (/tmp/.oracle on some platforms). Please check this too.
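Our findings match the symptoms described in Cause 3, ocssd log has "gipcretAuthFail (22)" (Doc ID 1496329.1). However, in our case the database software directory has sufficient free space and the .oracle directory exists.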
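Analysis summary: the ORA-01114 alert was a consequence of database performance problems caused by SGA thrashing, that is, the SGA components shrinking and growing repeatedly.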
Recommendation: increase the SGA size from 132G to 180G (the value suggested by v$sga_target_advice).
RAC with ASM on AIX, ORA-01114 error, with "gipcretAuthFail (22)" in ocssd.log
Original article: https://www.cnblogs.com/wandering-mind/p/8992892.html