ogg replicat 进程 abend 处理
Oracle Goldengate實際使用過程中經常會出現abend的現象,有時ggserr.log提示的信息又讓我們摸不著頭腦。
 2011-11-01 09:14:28? WARNING OGG-01431? Oracle GoldenGate Delivery for Oracle, rep1.prm:? Aborted grouped transaction on 'XXX.XXX_BONUS_LOG', Mapping error.
 2011-11-01 09:14:28? WARNING OGG-01003? Oracle GoldenGate Delivery for Oracle, rep1.prm:? Repositioning to rba 4662627 in seqno 2114.
 2011-11-01 09:14:28? WARNING OGG-01151? Oracle GoldenGate Delivery for Oracle, rep1.prm:? Error mapping from XXX.XXX_BONUS_LOG to XXX.XXX_BONUS_LOG.
 2011-11-01 09:14:28? WARNING OGG-01003? Oracle GoldenGate Delivery for Oracle, rep1.prm:? Repositioning to rba 4662627 in seqno 2114.
 2011-11-01 09:14:28? ERROR?? OGG-01296? Oracle GoldenGate Delivery for Oracle, rep1.prm:? Error mapping from XXX.XXX_BONUS_LOG to XXX.XXX_BONUS_LOG.
 2011-11-01 09:14:28? ERROR?? OGG-01668? Oracle GoldenGate Delivery for Oracle, rep1.prm:? PROCESS ABENDING.
上面的這個錯誤,如果用logdump去查看,你會發現日志seqno 2114 rba 4662627位置根本就不是XXX.XXX_BONUS_LOG。原因很簡單,Goldengate加載的時候默認遵循源端的事務一致性,在這個例子中,seqno 2114 rba 4662627只是事務的起點,而出錯的位置在ggserr.log中沒有辦法定位。這時候就需要用到一些特殊的參數來幫助我們來定位具體的問題原因。
- SHOWSYNTAX?
 
???????? Use the SHOWSYNTAX parameter to start an interactive session where you can view each Replicat SQL statement before it is applied. By viewing the syntax of SQL statements that failed, you might be able to diagnose the cause of the problem.
- NODYNSQL
 
???????? With DYNSQL, the default, Replicat uses dynamic SQL to compile a statement once, and then execute it many times with different bind variables.
- NOBINARYCHARS
 
???????? NOBINARYCHARS is an undocumented parameter that causes Oracle GoldenGate to treat binary data as a null-terminated string.
通過這三個參數的結合,在report文件中記錄詳細的SQL語句,和具體的出錯位置,結合logdump和具體的SQL語句,相信很快能夠定位出問題的原因。
 2011-11-01 09:15:56? WARNING OGG-01431? Aborted grouped transaction on 'XXX.XXX_BONUS_LOG', Mapping error.
 2011-11-01 09:15:56? WARNING OGG-01003? Repositioning to rba 4662954 in seqno 2114.
 2011-11-01 09:15:56? WARNING OGG-01151? Error mapping from XXX.XXX_BONUS_LOG to XXX.XXX_BONUS_LOG.
 2011-11-01 09:15:56? WARNING OGG-01003? Repositioning to rba 4662954 in seqno 2114.
 Source Context :
 ? SourceModule??????????? : [er.main]
 ? SourceID??????????????? : [/scratch/angorant/view_storage/angorant_ogg_12978807_x64/oggcore/OpenSys/src/app/er/rep.c]
 ? SourceFunction????????? : [take_rep_err_action]
 ? SourceLine????????????? : [16134]
 ? ThreadBacktrace???????? : [8] elements
 ????????????????????????? : [/u01/app/oracle/ggs/replicat(CMessageContext::AddThreadContext()+0x26) [0x5ef8b6]]
 ????????????????????????? : [/u01/app/oracle/ggs/replicat(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x7b2) [0x5e6382]]
 ????????????????????????? : [/u01/app/oracle/ggs/replicat(_MSG_ERR_MAP_TO_TANDEM_FAILED(CSourceContext*, DBString<777> const&, DBString<777> const&, CMessageFactory::MessageDisposition)+0x9b) [0x5c4bcb]]
 ????????????????????????? : [/u01/app/oracle/ggs/replicat [0x81ac2f]]
 ????????????????????????? : [/u01/app/oracle/ggs/replicat [0x8f73e2]]
 ????????????????????????? : [/u01/app/oracle/ggs/replicat(main+0x84b) [0x50764b]]
 ????????????????????????? : [/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x30f3c1c40b]]
 ????????????????????????? : [/u01/app/oracle/ggs/replicat(__gxx_personality_v0+0x1da) [0x4e3c2a]]
2011-11-01 09:15:56? ERROR?? OGG-01296? Error mapping from XXX.XXX_BONUS_LOG to XXX.XXX_BONUS_LOG.
可以看到,出錯的位置是rba 4662954,具體的trail文件信息如下:
 Logdump 195 >n
 ___________________________________________________________________?
 Hdr-Ind??? :???? E? (x45)???? Partition? :???? .? (x04)??
 UndoFlag?? :???? .? (x00)???? BeforeAfter:???? B? (x42)??
 RecLength? :?? 227? (x00e3)?? IO Time??? : 2011/10/31 11:18:19.230.994???
 IOType???? :???? 3? (x03)???? OrigNode?? :?? 255? (xff)?
 TransInd?? :???? .? (x01)???? FormatType :???? R? (x52)?
 SyskeyLen? :???? 0? (x00)???? Incomplete :???? .? (x00)?
 AuditRBA?? :?????? 4602?????? AuditPos?? : 200518188?
 Continued? :???? N? (x00)???? RecCount?? :???? 1? (x01)
 2011/10/31 11:18:19.230.994 Delete?????????????? Len?? 227 RBA 4662954?
 Name: XXX.XXX_BONUS_LOG?
 Before Image:???????????????????????????????????????????? Partition 4?? G? m???
 ?0000 0015 0000 3230 3131 2d31 302d 3331 3a31 303a | ……2011-10-31:10:??
 ?3537 3a34 3500 0100 0a00 0000 0000 0000 1ca2 1100 | 57:45……………??
 ?0200 0a00 0000 0000 0000 0000 1b00 0300 0a00 00ff | ………………..??
 ?ffff ffff fffc e000 0400 0a00 0000 0000 0000 0000 | ………………..??
 ?0000 0500 0700 0000 034e 4554 0006 000a 0000 0000 | ………NET……..??
 ?0102 aea6 3a3b 0007 0004 ffff 0000 0008 0014 0000 | ….:;…………..??
 ?0010 3331 3131 3938 302d bbfd b7d6 bbbb b9ba 0009 | ..3111980-……….??
 Column???? 0 (x0000), Len??? 21 (x0015)??
 ?0000 3230 3131 2d31 302d 3331 3a31 303a 3537 3a34 | ..2011-10-31:10:57:45
通過logdump發現這是處于事務中間的一個刪除語句出錯了,檢查發現這張表的該記錄確實不存在,因此導致Error mapping錯誤的發生。但由于這是事務中間的一條記錄,我們不能直接跳到故障語句之后,這里還需要借助另外兩個參數的幫助。
- GROUPTRANSOPS
 
?????????Controls the number of records that are sent to the trail in one batch.
- MAXTRANSOPS
 
???????? Divides large source transactions into smaller ones on the target system.
通過這兩個參數,可以把源端大的事務拆分成小的事務。為了方便起見,我們設置這兩個參數為1。
edit params rep1
 grouptransops 1
 maxtransops 1
再重啟rep1進程,rep1進程在出錯位置停下來后,手工跳過有問題的語句。
 alter rep1, extseqno 2114, extrba 4663281
 start rep1
至此,這個問題得到了解決。當然根治這個問題最好的辦法還是全同步數據不一致的表,但在一個比較大的生產環境中重新全同步表還是比較麻煩的,在出錯語句不是太多的情況下,這也不失為一種解決辦法。而我們這個案例剛好是delete操作,因此可以簡單的跳過,如果是update或insert則還需要進一步分析。
總結
以上是生活随笔為你收集整理的ogg replicat 进程 abend 处理的全部內容,希望文章能夠幫你解決所遇到的問題。
                            
                        - 上一篇: Oracle Goldengate OR
 - 下一篇: Linux开启FTP(vsftpd)服务