v$archived_log.applied列不更新的解决方法
在Oracle Data Guard环境中,如果查询Primary库的v$archived_log.applied列一直显示为NO,但实际上Standby库已经应用了归档日志,可能会导致无法删除主库的归档日志,因为无法确认日志是否已经被应用。
Alert: In Oracle ADG, if the redo apply instance crashes, all other instances will from ‘OPEN’ to ‘Mount’
今天在一套11 G r2版本的2节点RAC adg环境,节点1因为硬件原因异常crash(apply redo 节点), 但是实例2也上的应用也都断开了(原来都是open),adg上是有连接一些只读业务,而且节点2 db alert log未发现明显手动close 实例的日志,并且是自动切换到了mount状态,RAC不是应该高可用吗?为什么死一个节点另外的节点也要跟着受影响?
Troubleshooting db instance start failed PRCR-1064 CRS-2643 or CRS-2717 CRS-0223 during patching
12c注意 instance terminal caused by ASMB process ORA-04031 init_heap_kfsg上篇提到了这个bug,在安装bug是不是很顺利分享一下。CRS-2717: Server ‘anbob2’ is not in any of the server pool(s) hosting resource ‘ora.stbydb.db’
CRS-0223: Resource ‘ora.stbydb.db’ has placement error.
clsr_start_resource:260 status:223
clsrapi_start_db:start_asmdbs status:22
12c注意 instance terminal caused by ASMB process ORA-04031 init_heap_kfsg
上个月刚刚分享了一个ORA-4031 bug《Troubleshooting ORA-4031 “init_heap_kfsg”占用大量内存 In 12c, 18c, 19c》,那篇中还提到了这个bug, 没想到这么快就在客户遇到。12c R2还没有安装20年07月RU的注意。ORA-04031: unable to allocate 4120 bytes of shared memory (“shared pool”,”unknown object”,”init_heap_kfsg”,”ASM extent pointer array”)
USER (ospid: 20366): terminating the instance due to error 4031
Troubleshooting ORA-4031 “init_heap_kfsg”占用大量内存 In 12c, 18c, 19c
上周刚分享了《Troubleshooting ORA-04031: unable to allocate 13840 bytes of shared memory “ges resource dynamic” in 12C+》, 在当前的新版本中又存在一个打击一片的BUG, 同样现ora-4031 占用最大的内存区为init_heap_kfsg
12c R2 DB Alert Log频繁输出”An internal routine has requested a dump of selected redo”
1套Oracle 12.2 4Nodes RAC ON SELS11的本地磁盘使用率告警,DIAG目录在不断的生成redo dump的trace file, db alert log也在不停的显示如下信息:
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error
19C: 非第一个节点执行 Root.sh 提示 “ERROR 4 OPENING DOM ASM/SELF IN 0xNNNN”
2020/08/10 16:02:13 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster … succeeded
昨天一客户安装19c在非第一个节点运行root.sh时,提示下面的错误,但是检查实例状态都已启动正常。
Error 4 opening dom ASM/Self in 0x9e3add0 <-------------
Domain name to open is ASM/Self <-------------
Error 4 opening dom ASM/Self in 0x9e3add0
Troubleshooting ORA-04031: unable to allocate 13840 bytes of shared memory “ges resource dynamic” in 12C+
在12c 版本以后”ges resource dynamic”逐渐增长最终导致shared_pool可能会超过手动管理的shared pool size达到sga_max_size后出现ora-4031. 与之相关的oracle bug就好几个,这最近因为这个问题导致lmd hang堵塞了其它实例的前台进程,关掉了这个节点临时恢复,简单记录。
Oracle 19c hot backup mode? (二)
When taking hot backups, the file header is frozen while the file is being copied. This means that each data file can have a different checkpoint in your backup set. To make the backup set consistent, all files need to be recovered until their SCNs match again and the fuzzy bits have been reset.
Oracle 19c hot backup mode? (一)
没有维护过oracle 8\9那个版本时,可能不会太接触这个热备份模式, 这个技术已经被RMAN所替代很多年,但是就是这个东西,让我们在最近一次19c 数据库故障中走了弯路, 数据库的内部某个机制触发了begin backup, 因为异常crash后又归档缺失,还尝试从备份做了恢复,最终还是使用bbed修改数据文件头异常恢复
Downgrade Grid Infrastructure 12.1.0.2 to 11.2.0.4降级后crs无法启动 No voting files found
Last month, after a set of Exadata test environment of our customer environment was downgraded,Downgrade Grid Infrastructure 12.1.0.2 to 11.2.0.4, the downgrade operation was successful, but when starting the CRS, it prompted that the VD could not be found, and the VD checked that it exists.
Troubleshooting dbms_sqltune ORA-04068 ORA-04065 ORA-06508 ORA-06512 在做异常恢复后
前几日有个库sysaux和部分业务表空间数据文件损坏,在数据库强制异常恢复后, 提示dbms_sqltune使用sql profile无法使用,这个问题与对象的先后创建顺序或部分重建导致,错误信息如下,这里我还原一下问题和分享一下思路。
Troubleshooting ORA-600 [KKZGPKORID] impdp from 11G to 19C
ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 1 with process name “DW00” prematurely terminated
ORA-00600: internal error code, arguments: [KKZGPKORID], [0], [], [], [], [], [], [], [], [], [], []
Troubleshooting ORA-00600: 内部错误代码 [kdt_bseg_srch_cbk PITL1]
ORA-00600 [PITL1]
ORA-00600 [kdt_bseg_srch_cbk PITL1]
ORA-00700: soft internal error, arguments: [kgegpa:parameter corruption]
Oracle 12c R2 – 19C Instance_mode read-only(不是雪中须送炭,聊装风景要诗来。)
Oracle数据库40年来还真是“急人所急 想人所想”,不断努力在一套软件中集成所有解决方案,以至于导致有人抱怨“她”太“胖”了。有没有想过oracle数据库中的读写分离场景?首先会想到使用Active DataGuard,但是如果不要DG,只在一套数据库RAC中不同节点实现呢?如一个节点写,其它节点只读呢。
如果存在Infiniband设备,ifconfig hardware address can be incorrect可以忽略
Infiniband(IB) 是一个用网络通信标准,满足科学计算实验的要求, 致力于服务器端的高性能计算的互联技术,适合用于RAC的CACHE FUSION和ORACLE Exadata等工程系统一体机,分布式存储系统. 使用ifconfig 查看ip信息,如果服务器上有IB时会提示如下错误”Infiniband hardware address can be incorrect”
Troubleshooting ORA-00600 [qosdExpStatRead: expcnt mismatch]
This is due to sys.exp_obj$. EXP_CNT mismatch rows of sys.exp_stat$
Following SQL is used to check issue data of some objects. if there are some rows return, that means data issue.
Oracle 12C wait ‘library cache lock’ after change password even set 28401 event 案例
主要是在11g引入的安全特性延迟密码认证在3-10秒,在延迟期间以X模式持有row cache lock防止同一用户的并发失败尝试。通常是配置28401 event禁用延迟认证特性,但这也不是“万能药”,像这次的案例。 除了密码延迟认证,PASSWORD_LIFE_TIME和FAILED_LOGIN_ATTEMPTS 也是用户的警惕的地方。
Oracle 19c新特性: EXPDP 参数TTS_CLOSURE_CHECK估算Transportable Tablespace时间
TTS(Transportable Tablespace)在大型数据库迁移方案看较常见,复制数据文件实现了在线,但是导出元数据阶段需要把表空间改为只读,写业务要中断,那就存在一个问题,导出metadata元数据需要多少时间?有没有不可预见的问题?19c DATAPUMP引入新特性TTS_CLOSURE_CHECK为此而生
High wait event ‘row cache mutex’ in 12cR2、19c
In Oracle 12.2.0.1.0 (12cR2), “row cache mutex” replaced 12.1.0.2.0 (12cR1) and 11g “latch: row cache objects”, similar to “latch: library cache” substitution by “library cache: mutex X” in the previous release. High waits on “row cache mutex” when looking up user or role information in user row cache (dc_users)