Log File Sync
Log File Sync
Log File Sync
2009-2010 www.e2sn.com
WWW.E2SN.COM
Intro: About me
Tanel Pder
http://tech.e2sn.com - My company and technical Oracle stuff http://blog.tanelpoder.com - Personal Blog (more tech stuff) tanel@tanelpoder.com - Questions & enquiries
2009-2010 www.e2sn.com
WWW.E2SN.COM
Topics
How does commit & log file sync work
Overview
2009-2010 www.e2sn.com
WWW.E2SN.COM
DDL
Resulting recursive transactions commit
Transaction rollbacks
We had an internal rollback (because of some failure) Space allocation/ASSM problems, cancelled queries, killed sessions
2009-2010 www.e2sn.com
WWW.E2SN.COM
1) User issues a COMMIT 2) Foreground proc posts LGWR 3) LGWR issues the physical write syscall 5) LGWR posts the FG proc.
6) Commit complete
FG proc.
LGWR IO
WWW.E2SN.COM
FG proc.
LGWR IO
The physical write IO (log file parallel write) takes most of the time
Most of the log file sync time was spent waiting on log file parallel write The other components, scheduling latency, IPC were small
WWW.E2SN.COM
2009-2010 www.e2sn.com
LGWR IO
2009-2010 www.e2sn.com
WWW.E2SN.COM
2009-2010 www.e2sn.com
WWW.E2SN.COM
syscall -------------semtimedop times getrusage gettimeofday io_getevents io_submit semop semctl -------------total
2009-2010 www.e2sn.com
WWW.E2SN.COM
10
Warning
The (background processes) IO instrumentation has quite a few bugs
Different IO modes, sync, asynd, direct, buffered etc
On some versions, the log file parallel write (and db file parallel write) arent properly instrumented
Version dependent, for example 11.2.0.1 on Linux When filesystemio_options = NONE, IO syscall waits are instrumented ok (but you dont wan to use this option) When filesystem_io_options = ASYNC, IO reaping waits are all very short
However theres unaccounted time in LGWRs wait profile
WWW.E2SN.COM
11
2009-2010 www.e2sn.com
12
Instrumentation
Wait Events:
log file sync log file parallel write log file single write
2009-2010 www.e2sn.com
WWW.E2SN.COM
13
2009-2010 www.e2sn.com
WWW.E2SN.COM
14
2009-2010 www.e2sn.com
WWW.E2SN.COM
15
WWW.E2SN.COM
16
2009-2010 www.e2sn.com
WWW.E2SN.COM
17
Event Waits ------------------------ -----------log file parallel write 13,292 db file parallel write 3,470 db file sequential read 257
2009-2010 www.e2sn.com
WWW.E2SN.COM
18
2009-2010 www.e2sn.com
19
Timed_os_statistics
OS Wait-cpu (latency) time in v$sesstat
On other OSes, it cant be directly measured with standard tools Indirectly, you can look into system wide average CPU runqueue length and assume that LGWR was also queueing
Not a too systematic approach, huh?
2009-2010 www.e2sn.com
WWW.E2SN.COM
20
Measure scheduling latency (Solaris) Reading thread-level microstate accounting data with prstat:
USR SYS TRP TFL/DFL LCK SLP LAT % % % % % % % Time spent on CPU in user mode Time spent on CPU in kernel mode Time spent processing system traps (CPU traps) Time spent processing text/data page faults Time spent waiting for user locks Time spent sleeping (other than user locks) CPU scheduling latency
# prstat -mLp 5124 PID USERNAME USR 5124 oracle 4.2 5124 oracle 0.0 5124 oracle 0.0 5124 oracle 0.0 5124 oracle 0.0 5124 oracle 0.0 5124 oracle 0.0 5124 oracle 0.0 5124 oracle 0.0
SYS 1.6 0.6 0.6 0.6 0.6 0.0 0.0 0.0 0.0
TRP 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
TFL 0.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
DFL 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
LAT VCX ICX SCL SIG PROCESS/LWPID 0.1 412 20 3K 0 oracle/1 0.0 44 3 90 0 oracle/11 0.0 44 0 90 0 oracle/7 0.0 46 0 90 0 oracle/9 0.0 44 1 90 0 oracle/5 0.0 24 0 7 0 oracle/2 0.0 1 0 5 0 oracle/3 0.0 1 0 5 0 oracle/10 0.0 1 0 5 0 oracle/8
2009-2010 www.e2sn.com
WWW.E2SN.COM
21
Bugs, problems
No instrumentation
On some version/platform/IO configuration the wait interface doesnt record log file parallel write waits at all
The same goes for db file parallel writes (Ive noticed it on 9.2-10.2.0.x on Solaris for example)
For LGWR you can use V$SESSTAT redo write time statistic instead
Its in centiseconds
2009-2010 www.e2sn.com
WWW.E2SN.COM
22
Tuning
No need for tuning!
Log buffer is quite large by default
All memory remaining in a granule after the allocation for fixed SGA is given to log buffer
Since 9.2, Oracle can have the log buffer split into multiple buffers
Each protected by a separate redo allocation latch
From 10g, Oracle can keep lots of small private redo strands in shared pool
Each protected by a separate redo allocation latch @rs.sql Show redo strands available
WWW.E2SN.COM
2009-2010 www.e2sn.com
23
Evil tuning
If you dont care about the D in ACID (and want to occasionally lose data for fun), then:
10gR1:
commit_logging transaction commit log write behavior
10gR2:
commit_write commit_wait transaction commit log write behavior transaction commit log wait behavior
Old: Put redologs to /tmp (on Solaris) or in-memory disks (/dev/shm) for duration of a migration/upgrade
If your OS / server crashes, youll need to restore from a backup!
Enterprise 2.0 Service Networks 2009-2010 www.e2sn.com
WWW.E2SN.COM
24
No log file sync waits log buffer space / log file switch completion waits more likely!
2009-2010 www.e2sn.com
WWW.E2SN.COM
25
You should not put LGWR into the highest real-time class
Real time is tricky your process can monopolize a CPU for itself You dont want to make LGWR pre-empt the OS kernel! Note that Oracle sets some processes into higher priority by default:
_high_priority_processes
Enterprise 2.0 Service Networks 2009-2010 www.e2sn.com
LMS*|VKTM 26
WWW.E2SN.COM
LGWR configuration - IO
Reduce the amount of work and waiting a log file parallel write has to do
Unbuffered concurrent IO
Verify with truss/strace whether proper flags are used (O_DIRECT, O_DIO, O_CIO etc)
WWW.E2SN.COM
27
COMMIT LESS!!!
Commit when your business transaction ends, not after every single update!
Enterprise 2.0 Service Networks 2009-2010 www.e2sn.com
WWW.E2SN.COM
28
Summary
Application: Commit less!
Ideally only when your logical business transaction ends
Troubleshooting: Measure log file sync at session level detail CPU: If waits for log file sync are significant - see whether LGWR gets:
Enough (quality) CPU time Onto the CPU fast enough
IO: See how much LGWR waits for log file parallel write event
Whats the log file parallel write completion time V$EVENT_HISTOGRAM for better detail
Enterprise 2.0 Service Networks 2009-2010 www.e2sn.com
WWW.E2SN.COM
29
Thanks!
Download slides from:
http://tech.e2sn.com/oracle-slides-and-whitepapers
Blog:
http://blog.tanelpoder.com
Email:
tanel@tanelpoder.com
2009-2010 www.e2sn.com
WWW.E2SN.COM
30