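A reader sent me an alert log. The original question was about a deadlock, but the log also showed a second problem: while redo was being written from the redo log buffer to the redo log file, the instance could not allocate a new log and reported "Private strand flush not complete". This is a redo log topic, and Metalink has a note describing it, quoted below.
1. Error message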
Tue Sep 24 14:27:48 2013
Thread 1 cannot allocate new log, sequence 22120
Private strand flush not complete
Current log# 4 seq# 22119 mem# 0: /u01/app/Oracle/oradata/orcl/redo04.log
2. Metalink's description (Doc ID 372557.1)
Oracle Database - Enterprise Edition - Version 10.2.0.1 to 11.2.0.3 [Release 10.2 to 11.2]
Information in this document applies to any platform.
Private strand flush not complete
"Private strand flush not complete" messages are being populated to the alert log, example:
Mon Jan 23 16:09:36 2012
Thread 1 cannot allocate new log, sequence 18358
Private strand flush not complete
Current log# 7 seq# 18357 mem# 0: /u03/oradata/bitst/redo07.log
Thread 1 advanced to log sequence 18358
Current log# 8 seq# 18358 mem# 0: /u03/oradata/bitst/redo08.log
When you switch logs all private strands have to be flushed to the current log before the switch is allowed to proceed.
--> Before a log switch, all private strands must be flushed to the current redo log file.
The message means that we haven't completed writing all the redo information to the log when we are trying to switch. It is similar in nature to a "checkpoint not complete" except that is only involves the redo being written to the log. The log switch can not occur until all of the redo has been written.
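--> The message means that not all redo had been written to the log file when the switch was attempted. It is similar in nature to "checkpoint not complete", except that it involves only the redo being written to the log.
A "strand" is new terminology for 10g and it deals with latches for redo. --> "Strand" is a new term for the way redo latches are handled.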
Strands are a mechanism to allow multiple allocation latches for processes to write redo more efficiently in the redo buffer and is related to the log_parallelism parameter present in 9i.
The concept of a strand is to ensure that the redo generation rate for an instance is optimal and that when there is some kind of redo contention then the number of strands is dynamically adjusted to compensate.
--> The main purpose of strands is to keep the redo generation rate optimal and, when redo contention appears, to adjust the number of strands dynamically to compensate.
The initial allocation for the number of strands depends on the number of CPU's and is started with 2 strands with one strand for active redo generation.
For large scale enterprise systems the amount of redo generation is large and hence these strands are *made active* as and when the foregrounds encounter this redo contention (allocated latch related contention) when this concept of dynamic strands comes into play.
There is always shared strands and a number of private strands .
Oracle 10g has some major changes in the mechanisms for redo (and undo), which seem to be aimed at reducing contention.
--> 10g changed these mechanisms substantially, mainly to reduce contention.
Instead of redo being recorded in real time, it can be recorded 'privately' and pumped into the redo log buffer on commit.
Similarly the undo can be generated as 'in memory undo' and applied in bulk. This affect the memory used for redo management and the possibility to flush it in pieces. The message you get is related to internal Cache Redo File management.
...You can disregard these messages as normal messages. --> They can be ignored as routine messages.
These messages are not a cause for concern unless there is a significant time gap between the "cannot allocate new log" message and the "advanced to log sequence" message. --> If there is a noticeable time gap between "cannot allocate new log" and "advanced to log sequence", consider increasing db_writer_processes.
Increasing the value for db_writer_processes can in some situations help to avoid the message from being generated. Why, because one of the DBWR main function is to keep the buffer cache clean by writing out dirty buffer blocks. So having multiple db_writer_processes should be able to produce a higher throughput.
Finally, these messages have also been seen when there are issues with the storage side or network for the archive log destination, as this leads to delay or hang in LGWR switch.
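To judge whether the gap between the two messages is significant, it can be measured from the alert log itself. The following is a minimal Python sketch, assuming the plain-text alert log format shown in the examples above (a timestamp line such as "Mon Jan 23 16:09:36 2012" followed by the messages logged at that time). The function names `parse_timestamp` and `switch_delays` are made up for this illustration and are not part of any Oracle tool.

```python
import re
from datetime import datetime

# Assumed timestamp layout, e.g. "Mon Jan 23 16:09:36 2012" (English day/month names).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
CANNOT_ALLOCATE = re.compile(r"Thread (\d+) cannot allocate new log, sequence (\d+)")
ADVANCED = re.compile(r"Thread (\d+) advanced to log sequence (\d+)")


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def switch_delays(lines):
    """Yield (thread, sequence, seconds) for each log switch that had to wait."""
    current_ts = None
    pending = {}  # (thread, sequence) -> time of the "cannot allocate" message
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        if current_ts is None:
            continue  # no timestamp seen yet, nothing to measure against
        blocked = CANNOT_ALLOCATE.search(line)
        if blocked:
            pending[(int(blocked.group(1)), int(blocked.group(2)))] = current_ts
            continue
        advanced = ADVANCED.search(line)
        if advanced:
            key = (int(advanced.group(1)), int(advanced.group(2)))
            started = pending.pop(key, None)
            if started is not None:
                yield key[0], key[1], (current_ts - started).total_seconds()


if __name__ == "__main__":
    sample = """\
Mon Jan 23 16:09:36 2012
Thread 1 cannot allocate new log, sequence 18358
Private strand flush not complete
  Current log# 7 seq# 18357 mem# 0: /u03/oradata/bitst/redo07.log
Thread 1 advanced to log sequence 18358
""".splitlines()
    for thread, sequence, seconds in switch_delays(sample):
        print(f"thread {thread} seq {sequence}: waited {seconds:.0f}s")
```

In the sample both messages fall under the same timestamp, so the measured wait is zero seconds, which matches the note's advice that such occurrences can be ignored. For a real log, pass an open file object for the alert log to `switch_delays`. The resolution is limited to the one-second timestamps the log contains.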
3. Further thoughts
In a highly concurrent, multi-user database, every client process writes redo data to the redo log buffer to guarantee data integrity and consistency. Access to the redo log buffer is managed with latches. Two latches matter most for redo: the redo allocation latch and the redo copy latch. The former allocates space in the redo log buffer for new redo; the latter covers copying the redo from the PGA into the redo log buffer. The redo generation flow is as follows.
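User process generates redo (in the PGA) ====> server process acquires a redo copy latch (there are several; the number depends on CPU_COUNT*2) ====> server process acquires the redo allocation latch (only one) ====> allocates space in the log buffer ====> releases the redo allocation latch ====> writes the redo entry into the log buffer ====> releases the redo copy latch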
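The order of the latch operations in this flow can be illustrated with a toy model. The sketch below is only an illustration in Python, assuming ordinary locks as stand-ins for latches and a byte array as the log buffer; `ToyLogBuffer`, `write_redo`, and `demo` are hypothetical names and say nothing about how Oracle implements this internally. The point it shows is that the single allocation latch is held only long enough to reserve space, while the copy itself happens under a copy latch.

```python
import threading


class ToyLogBuffer:
    """Toy stand-in for one shared redo log buffer (not Oracle's real layout)."""

    def __init__(self, size, copy_latches=4):
        self.buf = bytearray(size)
        self.next_free = 0
        self.allocation_latch = threading.Lock()  # only one
        self.copy_latches = [threading.Lock() for _ in range(copy_latches)]

    def write_redo(self, redo, worker_id):
        """Copy one redo entry into the buffer and return its start offset."""
        copy_latch = self.copy_latches[worker_id % len(self.copy_latches)]
        with copy_latch:                      # acquire a redo copy latch
            with self.allocation_latch:       # acquire the redo allocation latch
                start = self.next_free        # reserve space in the log buffer
                if start + len(redo) > len(self.buf):
                    raise MemoryError("toy log buffer is full")
                self.next_free = start + len(redo)
            # allocation latch is released here; the copy runs without it
            self.buf[start:start + len(redo)] = redo
        return start                          # copy latch is released here


def demo(workers=8, entries=100, entry_size=8):
    """Run several writer threads against one toy buffer and return it."""
    log = ToyLogBuffer(size=workers * entries * entry_size)

    def worker(wid):
        # Each worker writes its own marker byte (wid + 1, so 0 means "unwritten").
        for _ in range(entries):
            log.write_redo(bytes([wid + 1]) * entry_size, wid)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    log = demo()
    print("bytes allocated:", log.next_free)
```

Because space is reserved under the allocation latch, no two entries overlap even though the copies run concurrently.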
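As Doc ID 372557.1 notes above, Oracle introduced the log_parallelism mechanism starting with 9.2. When this parameter is greater than 1, the database allocates multiple shared redo log buffers. In other words, the redo log buffer is subdivided, and each shared buffer is protected by its own redo allocation latch to improve redo concurrency. These shared redo log buffers are called shared strands.

From 10gR2 onward there are also private strands, which are allocated from the shared pool rather than from the log buffer. Private strands are a large number of small private memory areas, typically about 64 KB to 128 KB each, and each is protected by its own redo allocation latch. Each small transaction is bound to a separate, free private redo strand, so a strand serves one active transaction at a time. With this mechanism, once a user process has obtained a private strand, its redo is no longer staged in the PGA, so the redo copy latch step is no longer needed. If a new transaction cannot obtain the redo allocation latch of a private strand, it falls back to the old redo buffer mechanism and writes to a shared strand. The new mechanism changes the redo generation flow as follows: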
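New transaction starts ====> requests the redo allocation latch of a private strand (on failure, requests the redo allocation latch of a shared strand) ====> generates redo entries in the private strand ====> flush/commit ====> requests the redo copy latch ====> LGWR writes the redo entries to the log file in bulk ====> releases the redo copy latch ====> releases the private strand's redo allocation latch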
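Under this mechanism, when redo is written out to the log file, LGWR has to write the contents of both the shared strands and the private strands. When a redo flush occurs, all public redo allocation latches must be acquired, the redo copy latches of all public strands must be checked, and all private strands containing active transactions must be held.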
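The private-strand path with its fallback to a shared strand can also be sketched as a toy model. The Python below is a simplified illustration under stated assumptions: locks stand in for latches, lists stand in for strands and for the log file, and committing moves a private strand's redo into the shared strand first, following the note's remark that privately recorded redo is "pumped into the redo log buffer on commit". `PrivateStrand`, `ToyRedo`, and their methods are hypothetical names, not Oracle structures.

```python
import threading


class PrivateStrand:
    """A small private redo area guarded by its own allocation latch."""

    def __init__(self, name):
        self.name = name
        self.latch = threading.Lock()
        self.entries = []


class ToyRedo:
    """Toy model: a few private strands, one shared strand, one log file."""

    def __init__(self, private_strands=2):
        self.private = [PrivateStrand(f"private-{i}") for i in range(private_strands)]
        self.shared_latch = threading.Lock()  # allocation latch of the shared strand
        self.shared = []                      # shared strand (part of the log buffer)
        self.log_file = []                    # stands in for the online redo log

    def begin(self):
        """Bind a new transaction to a free private strand, or return None."""
        for strand in self.private:
            if strand.latch.acquire(blocking=False):
                return strand
        return None  # no free private strand: fall back to the shared strand

    def generate(self, strand, entry):
        """Record one redo entry for a transaction."""
        if strand is None:
            with self.shared_latch:           # old path: contend for the shared strand
                self.shared.append(entry)
        else:
            strand.entries.append(entry)      # private path: no shared latch needed

    def commit(self, strand):
        """Move private redo into the shared strand in bulk, then free the strand."""
        if strand is None:
            return
        with self.shared_latch:
            self.shared.extend(strand.entries)
        strand.entries.clear()
        strand.latch.release()

    def lgwr_flush(self):
        """Write everything in the shared strand to the log file."""
        with self.shared_latch:
            self.log_file.extend(self.shared)
            self.shared.clear()


if __name__ == "__main__":
    redo = ToyRedo(private_strands=2)
    t1, t2, t3 = redo.begin(), redo.begin(), redo.begin()  # t3 gets no private strand
    redo.generate(t1, "t1: update A")
    redo.generate(t3, "t3: update C")
    redo.generate(t2, "t2: update B")
    for txn in (t1, t2, t3):
        redo.commit(txn)
    redo.lgwr_flush()
    print(redo.log_file)
```

With two private strands and three concurrent transactions, the third one takes the shared-strand path, so its redo reaches the shared strand immediately while the other two arrive only at commit. The model leaves out the log-switch case, where every private strand that still holds redo must be flushed before the switch can proceed.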
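From the above, the "Private strand flush not complete" message can in some situations be avoided by increasing the number of DBWn processes (the db_writer_processes parameter), because DBWn triggers LGWR to write redo to the log file.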