共计 2702 个字符,预计需要花费 7 分钟才能阅读完成。
最近两天生产数据库一直在报 ORA-24756 错误,查了 MOS 上的文档,看到一篇类似的,说是 BUG(Bug 19201866 – RECO reports ORA-24756 repeatedly into trace file (文档 ID 19201866.8)),但是 HP-UX 上的 11.2.0.4 没有解决需要升级到 12.2 版本,文档中还说重启实例也可以解决,但是不现实。
于是查报错时的 trace 文件都是如下的内容
1 2 3 4 | *** 2017-02-04 09:17:56.058 ERROR, tran=42.13.2709343, ose=0: ORA-24756: ...... |
看到 tran=42.13.2709343,印象中是分布式事务,于是查 DBA_2PC_PENDINGS 视图
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | SYS@db1> COL LOCAL_TRAN_ID FORMAT A13 SYS@db1> COL GLOBAL_TRAN_ID FORMAT A90 SYS@db1> COL STATE FORMAT A10 SYS@db1> COL MIXED FORMAT A3 SYS@db1> COL HOST FORMAT A10 SYS@db1> COL COMMIT # FORMAT A15 SYS@db1> SET LINESIZE 240 SYS@db1> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE,FAIL_TIME,FORCE_TIME,RETRY_TIME, MIXED, HOST, COMMIT # 2 FROM DBA_2PC_PENDING 3 / LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE FAIL_TIME FORCE_TIME RETRY_TIME MIX HOST COMMIT # ------------- ------------------------------------------------------------------------------------------ ---------- ----------------- ----------------- ----------------- --- ---------- --------------- 42.13.2709343 1096044365.31302E3235352E3233322E32332E746D313438363137313036383638333230333633 collecting 20170204 09:17:55 20170206 01:12:41 no bosbpm4s 764631398601 |
看到 LOCAL_TRAN_ID 和 trace 文件中的 tran 一致,刚开始的设想是正确的。其中 FAIL_TIME 是 20170204 09:17:55 对应到后台的 alert 日志中,看到这个时间前后的报错信息
1 2 3 4 5 6 7 8 9 10 11 12 13 | Sat Feb 04 09:17:50 2017 Error 22 trapped in 2PC on transaction 42.13.2709343. Cleaning up. Error stack returned to user: ORA-02050: transaction 42.13.2709343 rolled back, some remote DBs may be in -doubt ORA-00022: invalid session ID; access denied ORA-02063: preceding line from LINK_DB2 Sat Feb 04 09:17:56 2017 DISTRIB TRAN 41544f4d.31302E3235352E3233322E32332E746D313438363137313036383638333230333633 is local tran 42.13.2709343 (hex=2a.0d.29575f) insert pending collecting tran, scn=764631398601 (hex=b2.079538c9) Sat Feb 04 09:17:56 2017 Errors in file /Oracle11g/app/oracle/diag/rdbms/db1/db1/trace/db1_reco_23402 .trc: ORA-24756: |
这种分布式事务故障可能会锁定数据导致其他事务报 ORA-01591 报错或者一直占用 UNDO 段不能被别的事务重用。需要手工处理这种故障。
1 2 3 4 5 6 7 8 9 10 11 12 13 | SYS@db1> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY( '42.13.2709343' ); PL/SQL procedure successfully completed. SYS@db1> commit ; Commit complete. SYS@db1> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE,FAIL_TIME,FORCE_TIME,RETRY_TIME, MIXED, HOST, COMMIT # 2 FROM DBA_2PC_PENDING 3 / no rows selected |
清理完毕。
管理分布式事务的官方文档:http://docs.oracle.com/cd/E11882_01/server.112/e25494/ds_txnman.htm#ADMIN12252
MOS 文档:How to Purge a Distributed Transaction from a Database (文档 ID 159377.1) ORA-30019 When Executing Dbms_transaction.Purge_lost_db_entry (文档 ID 290405.1)
更多 Oracle 相关信息见 Oracle 专题页面 http://www.linuxidc.com/topicnews.aspx?tid=12
本文永久更新链接地址 :http://www.linuxidc.com/Linux/2017-02/140331.htm