Tanel Poder Advanced Oracle Troubleshooting
Tanel Poder Advanced Oracle Troubleshooting
Tanel Poder Advanced Oracle Troubleshooting
Introduction
About me:
Occupation: Expertise: DBA, engineer, researcher Oracle internals geek, End-to-end performance & scalability 10 years as DBA OCM (2002) OCP (1999) OakTable Network http://blog.tanelpoder.com
Tanel Poder
Introduction
About this presentation:
Systematic approach, rather than methodology Use right tools for right problems Break complex problems down to simple problems Therefore, use simple tools for simple problems In other words, use a systematic approach and life will be easier!
Tanel Poder
Tanel Poder
Non-systematic troubleshooting
Check alert.log Check for disk and tablespace free space Check for locks Check for xyz "We did a healthcheck and everything looks OK!"
?????!
Tanel Poder
Semi-systematic troubleshooting
Quick check for usual suspects
System load, locks, etc
then what?
Tanel Poder
Tanel Poder
Troubleshooting approaches
How do you solve problems?
a)
Change something Did it help? Problem fixed ?
b)
Understand Sure? Measure Manage change Problem fixed and prevented
Tanel Poder
Systematic troubleshooting
Understand the "flow" of a server process and how to measure it then measure it step by step using right tool at right step ...fix the problem once you understand it
Tanel Poder
Tanel Poder
Entry point
What counters are being incremented? In which kernel functions the execution is looping?
Tanel Poder
Entry point
v$session.sql_hash_value v$session.sql_id
dbms_profiler
v$sesstat
v$sql_plan_... statistics_all
Tanel Poder
Entry point
dbms_profiler
Tanel Poder
request Application
sql*net trace?
response
AWR
ASH
XYZ
Oracle Database
10046 trace Tanel Poder Wait interface
Endless request & response cycles Local procedure calls, remote procedure calls
2. Database...
a. b. c. d. ...waits for a request from an application ...issues physical IO calls to OS and waits for result ...processes the result data blocks ...returns processed results to application
Tanel Poder
4. Hardware controller...
a. ...waits for a request from the OS b. ...sends (electric) signals to actual hardware and waits for result c. ...processes the result data d. ...returns processed results to OS
Tanel Poder
Application Oracle Call Interface User program interface SQL*Net, Two-Task Common TCP/IP Ethernet / WAN link TCP/IP SQL*Net, Two-Task Common Oracle Program Interface Kernel Kompile Shared (cursors) Query Execution Runtime Kernel Cache Buffer management Kernel Service File i/o (OSD) System Kernel Generic File ? OS / IO system calls
Time
Tanel Poder
Application Oracle Call Interface User program interface SQL*NET, TNS, Two-Task Common TCP/IP Ethernet / WAN link TCP/IP SQL*NET, TNS, Two-Task Common Oracle Program Interface Kernel Kompile Shared (cursors) Query Execution Runtime Kernel Cache Buffer management Kernel Service File i/o (OSD) System Kernel Generic File ? OS / IO system calls
Application instrumentation, ltrace, truss -u"libclntsh:*" $OH/rdbms/demo/ociucb.mk, OCITrace SQL*Net trace, Wireshark TNS protocol digester Wireshark TCP protocol digester snoop, tcpdump, Wireshark Wireshark TCP protocol digester SQL*Net trace, Wireshark, Event 10079 Event 10051 sql_trace, Event 10046, 10270 v$sql_plan_statistics, v$sql_plan_statistics_all, sql_trace x$kcbsw, Event 10200,10298,10812, _trace_pin_time v$filestat, v$tempstat, v$session_wait, Event 10298 strace, truss, tusc, filemon.exe, procmon.exe
Tanel Poder
175982.1 ORA-600 Lookup Error Categories 453521.1 ORA-04031 KSFQ Buffers ksmlgpalloc @d.sql - Report data dictionary & X$ tables @pd.sql - Parameter descriptions @la.sql - Latch by address @lm.sql - Latch Misses by function location @fv.sql - Fixed variable by name @fva.sql- Fixed variable by address
Tanel Poder
Demo
Tanel Poder
Tanel Poder
Windows
procmon.exe procexp.exe
OS kernel
NIC Wire NIC Network IO Interface
Oracle Instance
Disk IO interface HBA/NIC Wire HBA/NIC
Storage subsystem
Tanel Poder
Tanel Poder
What if I need to look further inside Oracle ...if standard Oracle instrumentation isnt detailed enough... OS tools dont understand Oracle internal workings ...only for experimental environments
Tanel Poder
IO tracing events
10200, 00000, "consistent read buffer status" // *Cause: // *Action: alter session set "_trace_pin_time" = 1; // trace how long a current pin is held 10812, 00000, "Trace Consistent Reads" ( Trace into X$TRACE ) // *Cause: N/A // *Action: THIS IS NOT A USER ERROR NUMBER/MESSAGE. THIS DOES NOT // NEED TO BE TRANSLATED OR DOCUMENTED. IT IS USED ONLY FOR DEBUGGING. 10298, 00000, "ksfd i/o tracing" // *Cause: // *Action: If this event is set then ksfd module generates tracing // for each i/o request
Tanel Poder
Tanel Poder
10079, 00000, "trace data sent/received via SQL*Net" // *Cause: // *Action: level 1 - trace network ops to/from client // level 2 - in addition to level 1, dump data // level 4 - trace network ops to/from dblink // level 8 - in addition to level 4, dump data
Tanel Poder
Entry point
dbms_profiler
snapper.sql Sesspack
Questions?
Further questions welcome at http://blog.tanelpoder.com
Thank you!