Data Guard Performance

Edit (07/2013):
The following points about Redo Apply performance on a physical standby are worth knowing.
Media recovery performance improvements in 11g include:
•More parallelism by default
•More efficient asynchronous redo read, parse, and apply
•Fewer synchronization points in the parallel apply algorithm
•The media recovery checkpoint at a redo log boundary no longer blocks the apply of the next log

When tuning Redo Apply in 11g, consider the following:

•By default, recovery parallelism is CPU count - 1. Do not set any other value.
•Keep PARALLEL_EXECUTION_MESSAGE_SIZE >= 8192
•Keep DB_CACHE_SIZE >= Primary value
•Set DB_BLOCK_CHECKING = FALSE on the standby only if you have to (this trades logical block checking for a faster apply rate)
•Assess system resources (CPU, memory, and I/O) on the standby
•Check what the MRP process is waiting on (run on the standby instance where MRP0 is running):

select event, wait_time, seconds_in_wait
from v$session_wait
where sid = (select sid from v$session
             where paddr = (select paddr from v$bgprocess where name = 'MRP0'));

See: Active Data Guard 11g Best Practices, Oracle Maximum Availability Architecture white paper.

When tuning Redo Transport Services, consider the following:

1 - Tune the LOG_ARCHIVE_MAX_PROCESSES parameter on the primary:
•Specifies the parallelism of redo transport
•Default value is 2 in 10g, 4 in 11g
•Increase it if the redo generation rate is high and/or there are multiple standbys
•In some cases it must be raised as high as 30 (the maximum value)
•Raising it can significantly increase the redo transport rate
2 - Consider using Redo Transport Compression:
•In 11.2.0.2, redo transport compression can be left on permanently
•Use it if network bandwidth is insufficient and spare CPU capacity is available
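Before enabling compression, it helps to check whether bandwidth really is the bottleneck by comparing the peak redo generation rate (for example, redo size per second from an AWR report) with the link speed. The sketch below is a hypothetical helper, not an Oracle tool; the `overhead_factor=0.7` default (roughly 30% of the raw link lost to protocol overhead) is a rule-of-thumb assumption, and the rates in the example are made up.

```python
def required_bandwidth_mbps(redo_bytes_per_sec: float, overhead_factor: float = 0.7) -> float:
    """Link bandwidth (Mbit/s) needed to ship redo at the given rate.

    overhead_factor is an assumed rule of thumb: only ~70% of the raw
    link is treated as usable for redo payload.
    """
    return redo_bytes_per_sec / overhead_factor * 8 / 1_000_000


def min_compression_ratio(redo_bytes_per_sec: float, link_mbps: float) -> float:
    """Smallest compression ratio that would let the redo fit on the link.

    Returns 1.0 when the link is already sufficient without compression.
    """
    needed = required_bandwidth_mbps(redo_bytes_per_sec)
    return max(1.0, needed / link_mbps)


if __name__ == "__main__":
    peak_redo = 14_000_000  # hypothetical peak redo rate: 14 MB/s
    link = 100.0            # hypothetical link: 100 Mbit/s
    print(f"required: {required_bandwidth_mbps(peak_redo):.1f} Mbit/s")
    print(f"compression ratio needed: {min_compression_ratio(peak_redo, link):.2f}")
```

A ratio of 1.0 means compression would only cost CPU without relieving the network; a ratio well above what your redo actually compresses to means compression alone will not close the gap.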

Also consider:
3 - Configuring the TCP send/receive buffer sizes (SEND_BUF_SIZE / RECV_BUF_SIZE)
4 - Increasing the Oracle Net session data unit (SDU) size
5 - Setting TCP.NODELAY to YES
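Socket buffer sizes are usually derived from the bandwidth-delay product (BDP) of the primary-to-standby link, that is, how many bytes can be in flight during one round trip. The sketch below assumes the commonly cited guidance of sizing SEND_BUF_SIZE and RECV_BUF_SIZE at three times the BDP; the multiplier, the 1 Gbit/s link, and the 20 ms RTT are assumptions to check against the Oracle Net documentation for your release, and the printed lines are only a starting point for sqlnet.ora.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes: bits per second * seconds / 8."""
    return round(bandwidth_mbps * 1_000_000 * (rtt_ms / 1000) / 8)


def socket_buffer_bytes(bandwidth_mbps: float, rtt_ms: float, multiplier: int = 3) -> int:
    """Suggested SEND_BUF_SIZE / RECV_BUF_SIZE; multiplier=3 is an assumption."""
    return multiplier * bdp_bytes(bandwidth_mbps, rtt_ms)


if __name__ == "__main__":
    # Hypothetical link: 1 Gbit/s with a 20 ms round-trip time.
    size = socket_buffer_bytes(bandwidth_mbps=1000, rtt_ms=20)
    print(f"SEND_BUF_SIZE={size}")
    print(f"RECV_BUF_SIZE={size}")
```

Buffers smaller than the BDP cap throughput below the link speed no matter how fast the link is, which is why high-latency links benefit most from this setting.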

See: the Redo Transport Services Best Practices chapter of Oracle® Database High Availability Best Practices 11g Release 1.
-------------------------------------------------------------------
Original Post:

Problem: Managed recovery on the standby had been stopped for a while, leaving a gap between the primary and the standby. After the recovery process was restarted, the standby could not catch up with the primary because the log apply rate was too low. Disk I/O and memory utilization on the standby server were both close to 100%.
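Whether a lagging standby can ever catch up is simple arithmetic: the primary keeps generating redo while the standby applies it, so the backlog shrinks only at the difference between the two rates. A minimal sketch with hypothetical numbers (measure the real rates on your own system):

```python
from typing import Optional


def catch_up_seconds(backlog_mb: float, apply_mb_s: float, redo_gen_mb_s: float) -> Optional[float]:
    """Time for the standby to clear its redo backlog.

    The backlog shrinks only by the margin apply_mb_s - redo_gen_mb_s,
    because the primary keeps generating redo during the catch-up.
    Returns None when apply is not faster than generation (never catches up).
    """
    margin = apply_mb_s - redo_gen_mb_s
    if margin <= 0:
        return None
    return backlog_mb / margin


if __name__ == "__main__":
    # Hypothetical: 50 GB behind, applying 20 MB/s, primary generating 12 MB/s.
    t = catch_up_seconds(backlog_mb=50 * 1024, apply_mb_s=20, redo_gen_mb_s=12)
    print(f"catch-up time: {t / 3600:.1f} h" if t is not None else "standby never catches up")
```

With the apply rate at or below the generation rate, no amount of waiting closes the gap, which is why the steps below concentrate on raising the apply rate.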

Solution:
1 – Rebooting the standby server reduced memory utilization slightly.

2 – ALTER DATABASE RECOVER MANAGED STANDBY DATABASE PARALLEL 8 DISCONNECT FROM SESSION;

In general, the parallel recovery option is most effective at reducing recovery time when several datafiles on several different disks are being recovered concurrently. Its benefit also depends on whether the operating system supports asynchronous I/O: without asynchronous I/O, parallel recovery can dramatically reduce recovery time; with it, parallel recovery may reduce recovery time only slightly.

3 – Set PARALLEL_EXECUTION_MESSAGE_SIZE = 4096. It is a static parameter, so it is set in the spfile and takes effect after an instance restart:

SQL> ALTER SYSTEM SET PARALLEL_EXECUTION_MESSAGE_SIZE = 4096 SCOPE = SPFILE;

When using parallel media recovery or parallel standby recovery, increasing the PARALLEL_EXECUTION_MESSAGE_SIZE database parameter to 4K (4096) can improve parallel recovery by as much as 20 percent. Set this parameter on both the primary and the standby database in preparation for switchover operations. Note that a larger value makes each parallel execution slave process use more shared pool memory. (For 11g, the edit at the top of this post recommends 8192 or more.)

4 – The following HP-UX kernel parameters were changed to reduce the size of the file system buffer cache:

Tunable      Value  Expression  Changes
dbc_max_pct  10     10          Immed
dbc_min_pct  3      3           Immed
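dbc_max_pct and dbc_min_pct are percentages of physical memory, so the actual cap they place on the file system buffer cache depends on the server's RAM. A small helper to translate them, assuming a hypothetical 32 GiB server:

```python
def pct_to_gib(physical_mem_gib: float, pct: float) -> float:
    """Memory (GiB) that a dbc_*_pct percentage represents on this server."""
    return physical_mem_gib * pct / 100


if __name__ == "__main__":
    ram = 32  # hypothetical server RAM in GiB
    print(f"dbc_max_pct=10 caps the buffer cache at {pct_to_gib(ram, 10):.1f} GiB")
    print(f"dbc_min_pct=3 sets a floor of {pct_to_gib(ram, 3):.2f} GiB")
```

Memory no longer claimed by the file system cache remains available to the Oracle SGA and processes, which matters when the standby is already near 100% memory utilization.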

5 – For HP Secure Path load balancing, the SQL (Shortest Queue Length) policy was chosen:

autopath set -l 6005-08B4-0007-4D25-0000-D000-025F-0000 -b SQL