共计 11597 个字符,预计需要花费 29 分钟才能阅读完成。
一、问题概述
1、数据库环境:
Oracle Database 11.2.0.3.0 for Oracle Linux Server release 6.4,RAC,虚拟机
2、巡检时发现某数据库 alert.log 日志报 ORA-00600[13011] 错误,报错频繁,虽然未导致数据库宕机,但已影响业务,报错如下:
[oracle@NODE1 trace]$ grep -i ora-00600 alert*.log|grep 13011 | sort -u
ORA-00600: : [13011], [321401], [33682485], [24], [33682485], [3], [], [], [], [], [], [] — Tue Feb 06 00:07:53 开始报错
ORA-00600: : [13011], [321401], [33682485], [27], [33682485], [3], [], [], [], [], [], []
trace 文件中信息:
从这里可以看出是对 XXXXXMIN.XXX_XX_XX_XXX_OLD 表进行 DELEDE 操作导致该错误发生
Dump continued from file: /u01/app/oracle/diag/rdbms/racdb/racdb1/trace/racdb1_ora_19795.trc
ORA-00600: [13011], [321401], [33682485], [27], [33682485], [3], [], [], [], [], [], []
========= Dump for incident 49853 (ORA 600 [13011]) ========
*** 2018-02-06 09:37:44.987
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
—– Current SQL Statement for this session (sql_id=b6nmg0fpy3smf) —–
delete from “XXXXXMIN”.”XXX_XX_XX_XXX_OLD” where “AX_ID” = :1
二、问题分析
1、MOS 关于 ORA 600 [13011] 的描述:
Format: ORA-600 [13013] [a] [b] {c} [d] [e] [f] Arg
[a] Passcount Arg
[b] Data Object number Arg
{c} Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated Arg
[d] Row Slot number Arg
[e] Decimal RDBA of block being updated (Typically same as {c}) Arg
[f] Code
参考《New and Improved: ORA-600 [13013] “Unable to get a Stable set of Records” (文档 ID 1438920.1)》和《ORA-600 [13013] “Unable to get a Stable set of Records” (文档 ID 28185.1)》文章。该报错是由于对某个表执行 DML 操作,该表对应的某个索引损坏导致的,解决的办法是找出操作的表和受损的索引,重建索引即可。
2、查找报错对象
根据 ORA-00600 [13011], [321401], [33682485], [27], [33682485], [3] 报错代码,查找报错对象:
select dbms_utility.data_block_address_file(33682485) rfile,dbms_utility.data_block_address_block(33682485) blocks from dual;
RFILE BLOCKS
———- ———-
8 128053
select owner, segment_name, segment_type, tablespace_name, a.partition_name from dba_extents a where file_id = 8 and 128053 between block_id and block_id + blocks – 1;
OWNER SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME PARTITION_NAME
———— ———————— —————— —————- ————–
XXXXXMIN XXX_XX_XX_XXX_OLD TABLE XXX
trace 文件中信息:
BH (0xf60ee308) file#: 8 rdba: 0x0201f435 (8/128053) class: 1 ba: 0xf6c96000 – 其对象 XXXXXMIN.XXX_XX_XX_XXXXX_OLD 与查询一致
set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25
dbwrid: 0 obj: 321401 objn: 321401 tsn: 8 afn: 8 hint: f
hash: [0x13ef9fd78,0x13ef9fd78] lru: [0xc900efb0,0xaf13f128]
ckptq: [NULL] fileq: [NULL] objq: [0x132d5a950,0x132d5a950] objaq: [0x132d5a940,0x132d5a940]
st: XCURRENT md: NULL fpin: ‘kddwh01: kdddel’ tch: 1 le: 0xcb0e3ee8
flags: remote_transfered
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 8 rdba: 0x0201f435 (8/128053) – 与查询一致,其对象为 XXXXXMIN.XXX_XX_XX_XXXXX_OLD
scn: 0x0001.084d4f80 seq: 0x01 flg: 0x06 tail: 0x4f800601
frmt: 0x02 chkval: 0x538d type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
3、分析异常表
analyze table xxxxxmin.xxx_xx_xx_xxxxx_old validate structure cascade;
ERROR at line 1:
ORA-01499: table/index cross reference failure – see trace file –- 根据文档 1499.1 查找 trace 文件
OERR: ORA-1499 table/Index Cross Reference Failure – see trace file [ID 1499.1]
Error ORA-1499 is produced by statement “ANALIZE TABLE|CLUSTER VALIDATE STRUCTURE CASCADE” to report an inconsistency between a table or a cluster and its index where an index key value is not found in the index or vice versa.
The content of the trace file has:
: tsn: rdba:
description:
“row not found in index”
“Table/Index row count mismatch”
“row mismatch in index dba”
“Table row count/Bitmap index bit count mismatch”
“kdavls: kdcchk returns %d when checking cluster dba 0x%08lx objn %d\n”
tsn: Tablespace Number where the INDEX is stored.
rdba: Relative data block address of the INDEX segment header.
根据文档 1499.1 查找 trace 文件未找到相应的报错。看来与文档描述的情况不同,需进一步分析。
4、根据 ROWID 分析
通过前面的分析知道 ORA-600 [13013] 该报错是由于表与索引之间的逻辑数据不一致导致。查询明确关联的索引:
select owner,index_name,index_type from dba_indexes where table_name=’XXX_XX_XX_XXXXX_OLD’ and owner=’XXXXXMIN’;
OWNER INDEX_NAME INDEX_TYPE
——————————————- —————————
XXXXXMIN PK_XXX_XX_XX_XXXXX NORMAL
– 索引创建语句如下:
DBMS_METADATA.GET_DDL(UPPER(‘INDEX’),UPPER(‘PK_XXX_XX_XX_XXXXX’),UPPER(‘XXXXXMIN’))
———————————————————————————————————–
CREATE UNIQUE INDEX “XXXXXMIN”.”PK_XXX_XX_XX_XXXXX” ON “XXXXXMIN”.”XXX_XX_XX_XXXXX_OLD” (“AX_ID”, “BX_ID”)
根据 ”XXXXXMIN”.”PK_XXX_XX_XX_XXXXX” 索引的创建语句,该索引为 B 树索引,它是基于二叉树的, 由分支块和叶子块组成,包括每个索引列的值和行所对应的 ROWID。
通过下面的语句查询出全表扫描时和索引扫描时存在差异的行:
1 select /*+ INDEX_FFS(t pk_xxx_xx_xx_xxx) */ rowid,
2 dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno,
3 dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
4 from XXXXXMIN.XXX_XX_XX_XXXXX_OLD t where (t.AX_ID is not null or BX_ID is not null)
5 minus
6 select /*+ FULL(t1)*/ rowid,
7 dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno,
8 dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block from XXXXXMIN.XXX_XX_XX_XXXXX_OLD t1;
查询结果如下:
ROWID RELATIVE_FNO BLOCK
—————— ———— ———-
AABOd5AAIAAAfQ1AAP 8 128053
AABOd5AAIAAAfQ1AAQ 8 128053
AABOd5AAIAAAfQ1AAR 8 128053
AABOd5AAIAAAfQ1AAY 8 128053
AABOd5AAIAAAfQ1AAZ 8 128053
AABOd5AAIAAAfQ1AAa 8 128053
AABOd5AAIAAAfQ1AAb 8 128053
AABOd5AAIAAAfQ1AAc 8 128053
AABOd5AAIAAAfQ1AAd 8 128053
AABOd5AAIAAAfQ1AAe 8 128053
AABOd5AAIAAAfQ1AAf 8 128053
AABOd5AAIAAAfQ1AAg 8 128053
AABOd5AAIAAAfQ1AAq 8 128053
AABOd5AAIAAAfQ1AAr 8 128053
AABOd5AAIAAAfQ1AAs 8 128053
15 rows selected.
5、验证该表全表扫描与索引扫描时存在差异行
根据下面语句找出差异的数据:
select e.*,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.rowid > (select min(x.rowid) from XXXXXMIN.XXX_XX_XX_XXX_OLD x where x.AX_ID=e.AX_ID and x.BX_ID=e.BX_ID);
AX_ID BX_ID COMMITED_XXXXX UNCOMMITED_XXXXX UNSHARED_XXXXX UPDATED_T ROWID
———- ———- ————– —————- ————– ——— ——————
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAM
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAN
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAP
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAL
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAQ
**** ** ********** *************** ********** ********* AABOd5AAIAABFRCACA
**** ** ********** *************** ********** ********* AABOd5AAIAABFRCACl
**** ** ********** *************** ********** ********* AABOd5AAIAABFRCACk
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAB
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAE
**** ** ********** *************** ********** ********* AABOd5AAIAABFRCACC
**** ** ********** *************** ********** ********* AABOd5AAIAABFRCACm
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAD
**** ** ********** *************** ********** ********* AABOd5AAIAABFRCACB
**** ** ********** *************** ********** ********* AABOd5AAIAAAzAPAAO
15 rows selected.
取其中一条数据来验证走全表扫描和索引扫描时的差异
–SQL 执行计划通过索引扫描查询的数据
SQL> alter session set statistics_level=all;
Session altered.
SQL> select e.*,rownum,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.AX_ID=**** and e.BX_ID=**;
AX_ID BX_ID COMMITED_XXXXX UNCOMMITED_XXXXX UNSHARED_XXXXX UPDATED_T ROWNUM ROWID
———- ———- ————– —————- ————– ——— ———- ——————
**** ** ********** * * ********* * AABOd5AAIAAAGcaABR
SQL> select * from table(dbms_xplan.display_cursor(null,null,’allstats last’));
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————–
SQL_ID cy48jvzrnuv22, child number 1
————————————-
select e.*,rownum,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.AX_ID=**** and e.BX_ID=**
Plan hash value: 1022151449
————————————————————————————————————–
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
————————————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | COUNT | | 1 | | 1 |00:00:00.01 | 3 |
| 2 | TABLE ACCESS BY INDEX ROWID| XXX_XX_XX_XXX_OLD | 1 | 1 | 1 |00:00:00.01 | 3 |
|* 3 | INDEX UNIQUE SCAN | PK_XXX_XX_XX_XXX | 1 | 1 | 1 |00:00:00.01 | 2 |
————————————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
3 – access(“E”.”AX_ID”=**** AND “E”.”BX_ID”=**)
21 rows selected.
–SQL 执行计划通过全表查询的数据
SQL> select /*+ full(e) */ e.*,rownum,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.AX_ID=**** and e.BX_ID=**;
AX_ID BX_ID COMMITED_XXXXX UNCOMMITED_XXXXX UNSHARED_XXXXX UPDATED_T ROWNUM ROWID
———- ———- ————– —————- ————– ——— ———- ——————
**** ** ********** * * ********* * AABOd5AAIAABFRCACk
SQL> select * from table(dbms_xplan.display_cursor(null,null,’allstats last’));
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————–
SQL_ID 14vbv6bu472ty, child number 1
————————————-
select /*+ full(e) */ e.*,rownum,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.AX_ID=**** and e.BX_ID=**
Plan hash value: 3364144674
—————————————————————————————————-
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
—————————————————————————————————-
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 68 |
| 1 | COUNT | | 1 | | 1 |00:00:00.01 | 68 |
|* 2 | TABLE ACCESS FULL| XXX_XX_XX_XXX_OLD | 1 | 1 | 1 |00:00:00.01 | 68 |
—————————————————————————————————-
Predicate Information (identified by operation id):
—————————————————
2 – filter((“E”.”AX_ID”=**** AND “E”.”BX_ID”=**))
20 rows selected.
通过对比走全表扫描和索引扫描时存在差异
SQL> select e.*,rownum,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.AX_ID=**** and e.BX_ID=**;
AX_ID BX_ID COMMITED_XXXXX UNCOMMITED_XXXXX UNSHARED_XXXXX UPDATED_T ROWNUM ROWID
———- ———- ————– —————- ————– ——— ———- ——————
**** ** ********** * * ********* * AABOd5AAIAAAGcaABR
SQL> select /*+ full(e) */ e.*,rownum,rowid from XXXXXMIN.XXX_XX_XX_XXX_OLD e where e.AX_ID=**** and e.BX_ID=**;
AX_ID BX_ID COMMITED_XXXXX UNCOMMITED_XXXXX UNSHARED_XXXXX UPDATED_T ROWNUM ROWID
———- ———- ————– —————- ————– ——— ———- ——————
**** ** ********** * * ********* * AABOd5AAIAABFRCACk
小结:在执行 delete from “XXXXXMIN”.”XXX_XX_XX_XXXXX_OLD” where “AX_ID” = :1 时,该 SQL 的执行计划是走索引扫描,因为表与索引之间的逻辑数据不一致(索引列的值和行所对应的 ROWID 组成的索引数据与表数据不一致),在变量“:1”的值刚好是异常的值时,导致了 ORA 600 [13011] 的报错。
三、解决方案
1、重建 XXXXXMIN.PK_XXX_XX_XX_XXXXX 索引
因 XXXXXMIN.PK_XXX_XX_XX_XXXXX 为 ”AX_ID”, “BX_ID” 列的联合主键索引,并且 AX_ID 列作为 ”XXXXXMIN”.”XXX_VX” (“ID”) 的关联外键,BX_ID 列作为 “XXXXXMIN”.”XXX_DATAXXXXX” (“ID”) 的关联外键。所以为避免对业务造成影响,使用 ONLINE 在线重建的方式重建 XXXXXMIN.PK_XXX_XX_XX_XXXXX 索引。
ALTER INDEX XXXXXMIN.PK_XXX_XX_XX_XXX REBUILD ONLINE;
2、验证重建索引后的数据
1 select /*+ INDEX_FFS(t pk_xxx_xx_xx_xxx) */ rowid,
2 dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno,
3 dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
4 from XXXXXMIN.XXX_XX_XX_XXXXX_OLD t where (t.AX_ID is not null or BX_ID is not null)
5 minus
6 select /*+ FULL(t1)*/ rowid,
7 dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno,
8 dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block from XXXXXMIN.XXX_XX_XX_XXXXX_OLD t1;
no rows selected
更多 Oracle 相关信息见 Oracle 专题页面 https://www.linuxidc.com/topicnews.aspx?tid=12