Oracle Error ORA-00600 [4400]: Cause and Resolution


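Oracle error case study: an operations DBA reported that an Oracle database raised ORA-00600: internal error code, arguments: [4400], [48]. Analysis shows that the cause is related to distributed transactions.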

To analyze this ORA-600 error, I opened the trace file in UE (UltraEdit) and found the following:

Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /install1/oracle/bill1/product/9.2.0
System name: AIX
Node name: billing1
Release: 3
Version: 5
Machine: 00CB104D4C00
Instance name: bill1
Redo thread mounted by this instance: 1
Oracle process number: 148
Unix process pid: 1363982, image: oracle@billing1 (TNS V1-V3)

*** SESSION ID:(578.13852) 2012-06-04 12:08:44.492
*** 2012-06-04 12:08:44.492
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []

As the header shows, the system runs AIX 5.3 and the database is a single-instance 9.2.0.8.
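When many trace files need triage, the error arguments can be pulled out of the ORA-00600 line programmatically. The following is a minimal sketch, assuming the line has the same layout as the one in this trace; the function name `parse_ora600_arguments` is made up for illustration.

```python
import re

# Matches e.g. "ORA-00600: internal error code, arguments: [4400], [48], [], ..."
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def parse_ora600_arguments(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line, or None."""
    match = ORA600_RE.search(line)
    if match is None:
        return None
    # Every argument is printed in square brackets; unused slots appear as "[]".
    args = re.findall(r"\[([^\]]*)\]", match.group(1))
    return [a for a in args if a]


trace_line = (
    "ORA-00600: internal error code, arguments: "
    "[4400], [48], [], [], [], [], [], []"
)
print(parse_ora600_arguments(trace_line))  # ['4400', '48']
```

The first argument (4400 here) is the one to search for on MOS; the remaining arguments narrow the match.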
Further down, the call stack reads as follows:

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+0148 bl ksedst 1029555CC
ksfdmp+0018 bl 01FD46A8
kgeriv+0118 bl _ptrgl
kgeasi+00cc bl kgeriv 000000002 1100A2128
102954750 7000001B2B78F30
000000000
ktcddt+013c bl kgeasi 110006838 1103923A8
113000001130 200000002
100000001 000000004
000000030 7000001B1B74EE8
ktcsod+01f8 bl ktcddt 110006838 110006978
000000000
kssdch_stage+02b8 bl _ptrgl
kssdch+0014 bl kssdch_stage 000000000 700000000007D98
110002F50
ktcbod+030c bl kssdch 11000D618 000000008
kssdch_stage+02b8 bl _ptrgl
kssdch+0014 bl kssdch_stage 110002F50 110061758
000000009
ksuxds+1118 bl kssdch 1000E82E4 000000000
ksudel+006c bl ksuxds 7000001AB716A20 100000001
opilof+03dc bl 01FD4914
opiodr+08cc bl _ptrgl
ttcpip+0cc4 bl _ptrgl
opitsk+0d60 bl ttcpip 11000D4C0 000000000
000000000 000000000
000000000 000000000
000000000 000000000
opiino+0758 bl opitsk 000000000 000000000
opiodr+08cc bl _ptrgl
opidrv+032c bl opiodr 3C00000018 4101FAA40
FFFFFFFFFFFF8F0 0A0012010
sou2o+0028 bl opidrv 3C0C000000 4A0059B20
FFFFFFFFFFFF8F0
main+0138 bl 01FD40E8
__start+0098 bl main 000000000 000000000

My Oracle Support (MOS) describes this error as follows:
PURPOSE:
This article discusses the internal error “ORA-600 [4400]”, what it means and possible actions. The information here is only applicable to the versions listed and is provided only for guidance.

ERROR:
ORA-600 [4400] [a] [b] [c] [d] [e]

VERSIONS:
versions 6.0 to 11

DESCRIPTION:

Internal error 4400 means that we are trying to delete a transaction (for example at logoff time) but the transaction has not yet been marked completed.

This can happen at the remote site in a distributed transaction if the first part of the first stage of a two phase commit gets an error before it really starts the protocol.

FUNCTIONALITY:
TRANSACTION CONTROL

IMPACT:
PROCESS FAILURE - but only at logoff so minimal impact
NON CORRUPTIVE - No underlying data corruption.

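According to this note, error 4400 is related to distributed transactions. I have run into quite a few distributed-transaction problems before and wrote an earlier article on one of them: ORA-01591: lock held by in-doubt distributed transaction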

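Comparing call stacks shows that ours matches the one in the following note almost exactly: the frames from ksedmp through ksudel are the same. The note says the error can safely be ignored: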

ORA-00600 [4400], [48], [], [], [], [] From a Distributed Transaction [ID 464861.1]

Symptoms
The following error is reported on 9.2.0.5:

ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []

The call stack is:

ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod
kssdch_stage kssdch PGOSF40__ksuxds ksudel kxfprdp opirip opidrv sou2o

Cause
The error is encountered due to Bug 3840810 which was fixed in version 10.1.0.3.

The error is encountered when there is a dblink between 8i and 9i/10g databases. This error is only raised
in the log-off of the local session while trying to delete a transaction but the transaction has not yet
been marked completed. This lack of information is caused by the bug and if there is no process failure due
to this error, it can be ignored since there is no SQL statement/session affected.

This bug has been fixed by architectural changes in 10g and unfortunately is not backportable to 9.2.

If this is an one time occurrence then it can be safely ignored.
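The call-stack comparison can be sketched in a few lines. The frame lists below are transcribed by hand from the trace and from the MOS note. Stripping the `PGOSF40__` prefix is an assumption: it is treated here as a compiler-generated variant of `ksuxds`, which the note does not state. The helper names are hypothetical.

```python
import re

# Frames from our trace, top of stack first.
our_stack = (
    "ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod "
    "kssdch_stage kssdch ksuxds ksudel opilof opiodr ttcpip opitsk opiino "
    "opiodr opidrv sou2o main __start"
).split()

# Frames listed in MOS note 464861.1.
mos_stack = (
    "ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod "
    "kssdch_stage kssdch PGOSF40__ksuxds ksudel kxfprdp opirip opidrv sou2o"
).split()


def normalize(frame):
    # Assumption: "PGOSF<n>__" is a generated prefix on the same function.
    return re.sub(r"^PGOSF\d+__", "", frame)


def common_prefix(stack_a, stack_b):
    """Return the leading frames that both stacks share after normalizing."""
    shared = []
    for a, b in zip(stack_a, stack_b):
        if normalize(a) != normalize(b):
            break
        shared.append(normalize(a))
    return shared


shared = common_prefix(our_stack, mos_stack)
print(len(shared), shared[-1])  # 13 ksudel
```

The stacks diverge only below `ksudel`, that is, in the callers that led to the session logoff, not in the transaction-deletion path that raises the error.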

The trace also contains the following information:

BH (0x700000134fdd900) file#: 405 rdba: 0x65403498 (405/13464) class 1 ba: 0x700000134764000
set: 84 dbwrid: 3 obj: 343488 objn: 343488
hash: [700000070fead00,700000163feec00] lru: [7000000e8fe6d68,70000013ffe8468]
LRU flags: hot_buffer
ckptq: [7000000aefe93d8,700000088fe2dd8] fileq: [7000001b1174040,7000000eefc73e8]
st: XCURRENT md: NULL rsop: 0x0 tch: 5
flags: buffer_dirty gotten_in_current_mode block_written_once
redo_since_read
LRBA: [0x3e7cf.396e5.0] HSCN: [0x0bac.2f283f13] HSUB: [1] RRBA: [0x0.0.0]
buffer tsn: 32 rdba: 0x65403498 (405/13464)
scn: 0x0bac.2f283f13 seq: 0x01 flg: 0x02 tail: 0x3f130601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump: 0x65403498

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0040.026.00167771 0x7203e25c.0f91.02 C--- 0 scn 0x0bac.2f26d569
0x02 0x008c.04b.002ae01f 0x7242e607.0089.07 C--- 0 scn 0x0bac.2f26d709
0x03 0x0043.04f.00184d78 0x720020a7.e5ff.2f C--- 0 scn 0x0bac.2f28047f
…….
0x30 0x0005.041.001d254f 0x228560f5.c941.0e --U- 1 fsc 0x0075.2f282d74
……..
0x3f 0x008e.012.0029e870 0x22801e5b.0444.06 C--- 0 scn 0x0bac.2f26d4de
0x40 0x0005.000.001d261c 0x228560f5.c941.0a --U- 1 fsc 0x007b.2f282d25
………
0x61 0x0020.004.000bc447 0x7203ce74.e107.12 C--- 0 scn 0x0bac.2f26d66a
0x62 0x0049.04c.001e453a 0x2a42eb96.ffe3.06 C--- 0 scn 0x0bac.2f26d647

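The LRBA (low redo block address) is the point from which recovery starts; it is a checkpoint-related concept. For background, see: A Detailed Explanation of Oracle Checkpoints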
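The addresses in the buffer header can be decoded by hand. The sketch below splits the rdba into its file and block parts and parses the LRBA. It assumes the usual layout for a smallfile tablespace (upper 10 bits are the relative file number, lower 22 bits the block number) and assumes all three RBA fields are hexadecimal (log sequence, block, byte offset). The function names are illustrative only.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative data block address into (file#, block#)."""
    file_no = rdba >> 22        # upper 10 bits: relative file number
    block_no = rdba & 0x3FFFFF  # lower 22 bits: block number within the file
    return file_no, block_no


def decode_rba(text):
    """Parse an RBA printed as '[0xSEQ.BLOCK.OFFSET]' into three integers."""
    # Assumption: all three fields are hex; only the first carries "0x".
    seq, block, offset = text.strip("[]").split(".")
    return int(seq, 16), int(block, 16), int(offset, 16)


print(decode_rdba(0x65403498))         # (405, 13464)
print(decode_rba("[0x3e7cf.396e5.0]"))  # (255951, 235237, 0)
```

The first result agrees with the `(405/13464)` that the trace itself prints next to the rdba, which is a quick sanity check on the bit layout.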

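In the ITL list of the block dump above, every transaction flag is either C (committed) or U (upper-bound commit), so all of these transactions have been committed. This confirms that the error has no real impact.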
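With 0x62 ITL slots, checking the flags by eye is tedious. The following is a rough sketch of the same check in code, assuming the flag column is printed in its usual four-character form (for example `C---` or `--U-`) and that an all-zero Xid marks an unused slot. The sample lines are taken from the dump; `uncommitted_itls` is a hypothetical helper.

```python
def uncommitted_itls(lines):
    """Return the ITL slot ids whose flag shows neither C nor U."""
    pending = []
    for line in lines:
        fields = line.split()
        # Expected columns: Itl, Xid, Uba, Flag, Lck, "scn"/"fsc", value.
        # Anything else (header, elision marker) is skipped.
        if len(fields) != 7 or not fields[0].startswith("0x"):
            continue
        itl, xid, _uba, flag = fields[:4]
        if xid == "0x0000.000.00000000":
            continue  # assumption: all-zero Xid means the slot was never used
        if "C" not in flag and "U" not in flag:
            pending.append(itl)
    return pending


sample = [
    "Itl Xid Uba Flag Lck Scn/Fsc",
    "0x01 0x0040.026.00167771 0x7203e25c.0f91.02 C--- 0 scn 0x0bac.2f26d569",
    "0x30 0x0005.041.001d254f 0x228560f5.c941.0e --U- 1 fsc 0x0075.2f282d74",
    "0x40 0x0005.000.001d261c 0x228560f5.c941.0a --U- 1 fsc 0x007b.2f282d25",
]
print(uncommitted_itls(sample))  # []
```

An empty result matches the manual reading: no slot in the sample still belongs to an active transaction.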


Permanent link to this article: http://www.linuxidc.com/Linux/2017-05/143578.htm
