Troubleshooting Oracle LGWR wait Event ‘reliable message’ and %sys CPU Usage High, instance crash during DSG running
最近一客银行客户的Oracle环境,在部署了DSG做数据抽取后,数据库频繁的重启,希望分析一下原因, 环境oracle 12c 2nodes- RAC on RHEL x86-64 7.3 , 数据库实例为Datatguard Pyhical Standby端,使用多租户。开始LGwr等待 ‘reliable message’,后出现IPC Send timeout detected, 过几分钟后实例2驱逐,不久后实例1 crash 。Oracle home和/ 使用XFS 文件系统。 问题期间大量进程活动,从ps查看处于D状态,并且WCHAN等待为xlog_G开头的函数调用,这里记录一下该事件。
Migrate Oracle to PostgreSQL (系): AS Table OF
在 PostgreSQL 中,PL/pgSQL 不直接支持 TABLE OF 类型,这是 Oracle PL/SQL 中的一种集合类型。不过,PostgreSQL 提供了其他方式来实现类似的功能,例如使用数组、复合类型(composite types)和表值函数(table-valued functions)。最近在一个生产数据库oracle迁移到postgresql替换过程中,遇到了拆分由逗号分隔的字符串返回table的需求
Oracle国产化改造迁移时的问题: Varchar data type中的 invalid 字符集字符编码
最近在从orale迁移到Postgresql系的数据库时,迁移工具(Java JDBC)几张表迁移失败,源库和目标都是GBK字符集,但在迁移过程中提示ERROR: character with byte sequence 0xe4 0xb8 0xad in encoding “UTf8” has no equivalent in encoding “GBK”,说明在GBK数据库中有部分列数据的值是UTF8编码,这种以GBK字符集查询会乱码,验证在源库入库时错误,迁移时需要注意。
Troubleshooting Oracle 12.1 ‘enq: TC – contention’ wait event and related issues
Recently, a customer’s Oracle database RMAN backup job failed. When doing a full backup, it took a long time to complete. After checking the database, it was found that the backup process was waiting for ‘enq: TC – contention’. The environment was Oracle 12.1 2-nodes RAC. I would like to record this problem.
Oracle国产化改造迁移时的问题: Date data type中的 invalid date 0000-00-00(zero year )
我和我的团队近2年一直在做迁移oracle到国产库的项目,数据迁移相对于PLSQL对象兼容改写更加容易,一般迁移工具做好源和目标的data type映射基本就可以,但是有一些情况,如《Oracle国产化改造迁移时的问题: Number data type中的 invalid number》记录的那样无效number,有些数据在oracle中可以存储,但迁移到目标库时可能无法存储,最近我们在迁移一套oracle到postgresql系的国产库时,在date数据类型的列出现了无效日期数据0000年。
Oracle Column Group Extended Statistics列组扩展统计信息
扩展统计信息(也称为列组扩展)是 Oracle 11g 中引入的重要统计信息改进之一。虽然 Oracle Cost Based Optimizer 能够获得正确的单列选择性估计,但它无法计算出查询谓词中存在的两个或多个相关列的联合的基数。为这种列的联合计算的列组扩展旨在帮助 CBO 弄清楚这种列的相关性,以便获得准确的估计。但在某些情况下,CBO 拒绝使用列组扩展。
How to reduce space of the largest object (table system.logmnr_restart_ckpt$)in System Tablespace
Today, I noticed that the customer’s system tablespace usage is quite large, currently around 3.5TB. The largest object is the system.Logmnr_restart_ckpt$ table, which is already close to 2TB in size. The next largest is the aud$unified table used for unified auditing. In my blog yesterday 《Know more about Unified Auditing in Oracle 19c》
Know more about Unified Auditing in Oracle 19c
Today, a customer encountered a database issue in an Oracle 19c 4-node RAC environment on an Oracle Exadata machine. The database is experiencing a high number of active sessions—thousands in total—indicating waits for ‘enq: hw contention’ and ‘enq: tx contention.’ The blocked business session is executing the SQL statement “insert into AUD$UNIFIED…,” related to unified auditing of the database
How to diagnose slow performance or long execution times with Oracle Data Pump (expdp)?
You can easily track the duration of each export/import operation by directing the export/import job to write timestamps to the logfile using the LOGTIME parameter. For more details, refer to Expdp/Impdp LOGTIME.
However, simply having this information alone is often insufficient, even if you know there was a version or operating system change. What’s really needed to diagnose or analyze performance is concrete data—and that’s where the METRICS and LOGTIME parameters come in handy.
Troubleshooting ‘local write wait’ wait event during Truncate table
I have seen problems with Local Write Wait in the Oracle database, normal tables in the databases were being used for temporary working storage before that data was then written to another table. The content of the working storage tables was then cleared out by periodically truncating them.
Migrate oracle to openGauss/oceanbase/达梦/kingbase: md5 function
在十年前简单测试过oracle 9i 的加密解密用法之dbms_obfuscation_toolkit(二),其中有md5单向加密, 最近在oracle迁移到opengauss项目中用到了md5, 这里简单记录替换方案,在pg或og中直接就有md5 function. 在mysql及Mysql系的产品和ocenabse, 达梦一样存在该函数md5。
Migrate oracle to openGauss: dbms_crypto.encrypt /decrypt functions
在oracle迁移opengauss数据库时,可能会遇到在oracle数据库中使用dbms_crypto 加密的数据 ,在目标数据库opengauss有时也不需要完全等同,仅实现加密功能即可,需要我们改写对应的存储过程,或自定义包装function, 也需要合理规划数据迁移的一些方法,比如需要先解密,在目标库重新加密,尤其是加密方法不同,避免迁移源加密数据到目标库后无法解密,当然如果应用层能实现加密功能那是极好的
Migrate oracle to openGauss: cast_to_raw/cast_to_varchar2 & base64_encode/base64_decode functions
我和我们的团队最近在迁移oracle到openGauss(postgresql)时现在有一些存储过程中使用了加密函数,其中有一些涉及到编码的package 如utl_i18、utl_raw、utl_encode,对一些明文数值进行raw或base64编码,这里记录一下oracle到opengauss后对应的函数实现, 基本也适用于postgresql,下一篇会记录加密函数。
v$active_session_history slow, 如何查询v$fixed_view_definition中的全文本?
最近一个客户oracle数据库中大量的活动会话在查询v$active_session_history, 原因是一个监控软件的刷新ASH数据,但是该SQL正常也都是秒回,该数据库一次查询近6分钟,好奇要分析一下这个案例,但因为客户环境无法远程,所以无分析过程,这里记录我的分析思路,同时在分析v$active_session_history 发现取完整的SQL定义并不是很容易,记录oradebug peek , objdump, gdb等方式读取内存中FIXED VIEW定义的方法。
Troubleshooting Oracle RAC hang due to DBreplay capture write wcr file on NFS
我之前写过一篇关于
Oracle 19c RAC(19.11) ‘crsctl stop crs’ without -f Terminated Database Another node Instances, and more about Flex ASM
最近,在对一套Oracle 19c RAC(Real Application Clusters)进行计划内的维护操作时,遇到了一个预料之外的问题。原本打算通过一个常规的滚动停实例操作来维护其中一个节点,即通过执行 crsctl stop crs 命令来停止节点1的实例。然而,在执行此命令后,意外地发现节点2的实例也随之自动终止了。这一情况发生在Oracle 19.11版本中,如果此类问题频发,无疑会对未来的维护操作带来不确定性。
在去年《Troubleshooting Oracle RAC node OS shutdown (‘crsctl stop crs -f’) cause db instance stop on another node》我们也曾遇到过类似的问题,当时使用的版本是19.10。
Troubleshooting Oracle 12c/19c logon and DML fail with ORA-00604 &ORA-00904: “DECL_OBJ#”: invalid identifier
最近在Oracle 12C环境中,一个用户在尝试登录备用数据库(standby)时失败。在修复主数据库(primary db)并执行datapatch之后,发现大量包(package)失效,导致正常业务运行也出现错误。特别是在递归调用一个触发器(trigger)时,出现了错误:ORA-00904: “DECL_OBJ#”。此外,尝试禁用或删除该触发器时也失败。这是一个需要重点记录的问题。
Troubleshooting Oracle 19c real-time statistics row cache lock(dc_realtime_tabst)
Recently, a customer’s Oracle database encountered a performance problem, waiting for “row cache lock”. Analysis showed that the cache# p1 value pointed to the dc_realtime_tabst 。
How to fixed Oracle RAC Apache Tomcat CVE-2024-21733 issue?
Starting from 12.2.0.1 Grid infrastructure home does have tomcat installed on the GI_HOME. A security scan may find Tomcat security vulnerability CVE-2024-21733 in your Oracle RAC environment. How do I fix it ?
Troubleshooting Oracle a node CRS (asm resource) start fail with “CRS-5019” error after OS reboot
客户一套ORACLE 3 node RAC, 因为需要节点3 主机停机维护, 重启后CRS无法启动, 其它两个节点node 1 ,node2 运行正常。 node 3启动过程中 ora.asm 资源启动hang ,等待enq: dd – contention , 简单记录分析步骤。