Wednesday, 31 January 2024

Extended Statistics

 

Use Extended Statistics For Better SQL Execution Plans

1. https://blogs.oracle.com/optimizer/post/extended-statistics

In real-world data, there is often a relationship or correlation between the data stored in different columns of the same table. For example, consider a customers table where the values in a cust_state_province column are influenced by the values in a country_id column, because the state of California is only going to be found in the United States. If the Oracle Optimizer is not aware of these real-world relationships, it could potentially miscalculate the cardinality estimate if multiple columns from the same table are used in the where clause of a statement. With extended statistics you have an opportunity to tell the optimizer about these real-world relationships between the columns. 

https://blogs.oracle.com/optimizer/post/use-extended-statistics-for-better-sql-execution-plans

select dbms_stats.create_extended_stats(user,'t','(upper(surname))')
from dual;
begin
   dbms_stats.gather_table_stats(user,'t', 
      method_opt=>'for columns (upper(surname))',no_invalidate=>FALSE);
end;
/
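For correlated columns, such as the country_id / cust_state_province example above, a column group can be created the same way. This is a minimal sketch assuming a customers table with those two columns:

```sql
-- Define a column group on the correlated columns
-- (table and column names are from the blog's example, assumed to exist)
select dbms_stats.create_extended_stats(user, 'customers',
                                        '(country_id, cust_state_province)')
from dual;

-- Regather statistics so the column group gets its own statistics
begin
   dbms_stats.gather_table_stats(user, 'customers',
      method_opt => 'for columns (country_id, cust_state_province)');
end;
/
```

Once the column group has statistics, the optimizer can estimate the combined selectivity of predicates on both columns instead of multiplying the individual selectivities.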

2. https://blogs.oracle.com/optimizer/post/why-do-i-have-sql-statement-plans-that-change-for-the-worse

Inaccurate Cardinality, Inaccurate Cost

If cardinality estimates are inaccurate, then the calculated costs are likely to be inaccurate too. The resulting execution plan might be suboptimal because it is based on incorrect or missing information. Still, it is essential to remember that this does not mean that a bad execution plan is certain. After all, the optimizer is designed to work around issues like this as much as possible.

Tackling the Root Cause

If you have a problem SQL statement, I strongly suggest that you always ask yourself, ‘can I improve the cardinality estimates?’

Here are some ways to do that:

  • Ensure that statistics are present and not stale. Gather statistics regularly to keep them fresh. Hopefully, this is obvious.

  • Create histograms (manually or automatically), especially if there is a known range or value skew.

  • For cases where functions are wrapped around columns and used in query predicates, consider creating virtual columns or function-based indexes. Remember to regather statistics once the column/index has been created.

  • Create extended statistics for expressions and correlated columns.

  • Use dynamic statistics (dynamic sampling) at level 3 or more, especially if SQL statements are relatively long-running and have complex predicates (you can target individual queries or sessions; there is no need to use elevated dynamic sampling levels system-wide)
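As a sketch of the last point, dynamic sampling can be applied to one statement with a hint instead of raising the level system-wide (the sales table here is hypothetical):

```sql
-- Request dynamic sampling at level 3 for this query only,
-- leaving optimizer_dynamic_sampling at its default elsewhere.
SELECT /*+ DYNAMIC_SAMPLING(3) */ *
FROM   sales s
WHERE  UPPER(s.channel) = 'WEB'
AND    s.amount > 1000;
```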

Tuesday, 9 January 2024

What to look for in a slow query in Oracle

 

  • Query execution plan: This is a list of steps the database takes when running the query. 
  • Indexes: Indexes are often most useful when created on columns as part of the Join and Where clauses. Check which columns are used, check if they have indexes, and create them if they don't.
  • Statistics: Ensure that the tables and indexes used by the query have up-to-date statistics. You can use the DBMS_STATS package to gather statistics.
  • Actual cardinality: Run the statement with the GATHER_PLAN_STATISTICS hint (SELECT /*+ GATHER_PLAN_STATISTICS */ ...) to capture the actual row counts for each step of the plan.
  • CPU usage: Use the V$SESSTAT view to determine which session is using the most CPU.  Use V$STATNAME for statistics-id description.
  • Long-running queries: Use the v$session_longops view to find long-running queries.
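A minimal sketch of capturing and displaying actual row counts (table name t is hypothetical):

```sql
-- Run the statement with the hint so row source statistics are collected
SELECT /*+ GATHER_PLAN_STATISTICS */ *
FROM   t
WHERE  owner = 'SYS';

-- Then compare estimated vs. actual rows for the last executed cursor
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

The ALLSTATS LAST format adds E-Rows (estimated) and A-Rows (actual) columns; large gaps between them point at the cardinality problems discussed above.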



System-wide response time: Use V$SYSMETRIC, V$SYSMETRIC_SUMMARY, V$ACTIVE_SESSION_HISTORY, and V$SQLAREA to measure system-wide response time.
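As a sketch, V$SYSMETRIC can be queried directly for response-time metrics; the metric names below are common ones, but check V$METRICNAME on your version for the full list:

```sql
-- Recent system-wide response time and CPU metrics
SELECT metric_name, value, metric_unit
FROM   v$sysmetric
WHERE  metric_name IN ('SQL Service Response Time',
                       'Database CPU Time Ratio');
```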

------------------------------------------------------------------------------

To display the execution plan for a specific SQL ID: Use the DBMS_XPLAN.DISPLAY_AWR table function.

To display the execution plan of the last SQL statement:
SET LINESIZE 150
SET PAGESIZE 2000
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
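A sketch of the DISPLAY_AWR case, for plans captured in AWR (the SQL_ID value is a placeholder):

```sql
-- Show all historical plans recorded in AWR for one SQL_ID
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_AWR('6j28xhmyb12m7'));
```

This is handy when a plan has changed over time, since AWR keeps older plan versions that are no longer in the cursor cache.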
 


A db file sequential read wait event occurs when a process requests a single block of data from a datafile and is waiting for the operation to complete. This event is initiated by SQL statements that perform single-block read operations. 

A db file scattered read wait event is a wait for a physical read of multiple Oracle blocks from disk. It's similar to a db file sequential read, but the session is waiting for a multi-block I/O request to complete.
A db file scattered read is usually caused by scanning all or a sub-range of a table, index, table partition, or index partition. It's associated with full table scans and index fast full scan (FFS) operations.
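A sketch of finding the sessions currently on either wait, using the standard V$SESSION_WAIT columns (P1 is the file number and P2 the block number for both events):

```sql
-- Sessions currently waiting on single-block or multi-block reads
SELECT sid, event, p1 AS file#, p2 AS block#, seconds_in_wait
FROM   v$session_wait
WHERE  event IN ('db file sequential read', 'db file scattered read');
```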


Monday, 8 January 2024

Analyzing TKPROF Files

 https://expertoracle.com/2017/11/24/performance-tuning-basics-7-trace-and-tkprof-part-3-analyzing-tkprof-files/


Overall the structure of the tkprof body output is:

  • SQL Statement
  • Parse/Execute/Fetch statistics and timings
  • Library Cache information
  • Row source plan
  • Events waited for by the statement

Please note that this TKPROF output doesn't contain wait event information, because the trace that was applied was not a level 8 (with waits) trace.
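A level 8 style trace (with waits) can be enabled for another session via DBMS_MONITOR; the SID and serial# values here are placeholders taken from V$SESSION:

```sql
begin
   -- waits => TRUE is the equivalent of a level 8 trace
   dbms_monitor.session_trace_enable(session_id => 123,
                                     serial_num => 456,
                                     waits      => TRUE,
                                     binds      => FALSE);
end;
/
```

Disable it afterwards with dbms_monitor.session_trace_disable for the same SID/serial#.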

Here the SQL_ID is 6j28xhmyb12m7 and the plan hash value assigned to this SQL_ID is 2622831843. Now go through the table below to understand what each term means in the TKPROF output above.

TERM (UNIT OF MEASURE): MEANING

  • Call (not applicable): Statistics for each cursor's activity are divided into three areas: Parse, Execute, and Fetch. A total is also calculated.
  • Parse (number of times): Statistics from parsing the cursor. This step checks for proper security authorization, checks for the existence of tables, columns, and other referenced objects, and translates the SQL statement into an execution plan. The parses might be HARD or SOFT.
  • Execute (number of times): Statistics for the execution phase of a cursor. For INSERT, UPDATE, and DELETE statements, this modifies the data. For SELECT statements, this identifies the selected rows.
  • Fetch (number of times): Statistics for actually fetching the rows. Fetches are only performed for SELECT statements.
  • count (number of times): Number of times each individual activity (parse, execute, or fetch) has been performed on this particular cursor. Investigate why counts are high; a high number of parses in particular is important to explain.
  • cpu (seconds): Total CPU time in seconds for all parse, execute, or fetch calls. Remember that the TIMED_STATISTICS database parameter must be enabled to populate this information, or else it will show 0 here.
  • elapsed (seconds): Total elapsed time in seconds for all parse, execute, or fetch calls. It is the time taken to identify and modify rows during statement execution on the server. If this value is higher than the CPU time, review the wait events sections in the TKPROF output for the SQL to understand where the waits happened.
  • disk (number of data blocks): Total number of data blocks physically read for all parse, execute, or fetch calls. Generally it is preferable for blocks to be read from the buffer cache rather than disk.
  • query (number of data blocks): Logical reads. Total number of buffers retrieved in consistent mode for all parse, execute, or fetch calls. A consistent-mode buffer is one that has been generated to give a consistent read snapshot for a long-running transaction. Consistent read means "get the block as it appeared at the beginning of our statement, using rollback data if needed to undo changes". Usually this type of logical read is used by queries.
  • current (number of data blocks): Logical reads. Total number of buffers retrieved in current mode. Current mode means "get the block as it exists RIGHT NOW, with no rollback data applied". Buffers are retrieved in current mode for statements such as INSERT, UPDATE, and DELETE.
  • rows (number of rows): Total number of rows processed by the SQL statement, not including rows processed by its subqueries. For SELECT statements, the number of rows returned appears on the fetch step; for UPDATE, DELETE, and INSERT statements, the number of rows processed appears on the execute step.
  • total (number of times): Sum of all operations.
  • Misses in library cache during parse (number of times): Indicates whether the statement was found (soft parse) or not found (hard parse) in the library cache. If it is 0, a soft parse happened, as there was no miss; if it is 1, one hard parse happened. A high number of parses per execution could indicate a lack of bind variables in SQL statements or poor cursor reuse, so the goal is for misses to be minimal or zero.
  • Misses in library cache during execute (number of times): The SQL text was found but the cursor was not valid for reuse, either because of object invalidation through DDL or because different objects or privileges are referenced, meaning a new child cursor must be spawned. These are misses hit when about to execute the plan and finding it missing or invalid in the library cache. If no hard parse occurred during execute calls, this line is absent.
  • Optimizer mode (not applicable): FIRST_ROWS is a cost-based optimizer mode that returns rows as soon as possible, even if the overall query runs longer or consumes more resources. This setting is good when the user wants to see results quickly, and it favors index scans over full table scans. ALL_ROWS ensures that the overall query time is minimized, even if it takes longer to receive the first row. The default optimizer_mode is ALL_ROWS, as can be seen in the example trace above.
  • Parsing user id (not applicable): Database user id that was used to do the parsing. If you also specified the "explain" option when generating TKPROF, the username will be mentioned along with the user id.
  • Recursive depth (number): Provided only for recursive SQL statements. SQL statements directly executed by an application have a depth of 0. A recursive depth of 1 here indicates that the UPDATE is a child of some other piece of code with depth n-1 (in this case 0). For a PL/SQL block, which isn't executed by the database recursively, the recursive depth will not be shown, as it is 0.
  • Number of plan statistics captured (number): Tells you how many executions Oracle recorded plan information about. These are used to populate the "Rows (1st)", "Rows (avg)" and "Rows (max)" columns in the row source operation.
  • Row source plan (not applicable): This section displays the access path used at execution time for each statement, along with timing and actual row counts returned by each step in the plan, which can be very useful for a number of reasons. Row source plans are generated from STAT lines in the raw trace. STAT lines are written to the trace from time to time, but if the cursor is not closed cleanly the STAT lines will not be recorded and the row source plan will not be displayed. Setting SQL_TRACE to false DOES NOT close all cursors. Cursors are closed in SQL*Plus immediately after execution. The safest way to close all cursors is to cleanly exit the session in question.
  • Rows (1st) (number of rows): Database versions before 11.2.0.2 provided a single "Rows" column reporting the number of rows returned by every row source operation; 11.2.0.2 provides three columns ("Rows (1st)", "Rows (avg)" and "Rows (max)"). This one is the number of rows returned on the first execution.
  • Rows (avg) (number of rows): Average number of rows returned over all executions.
  • Rows (max) (number of rows): Maximum number of rows returned over all executions.
  • Row Source Operation (not applicable): The actual execution plan written in the trace file by the server process. Under this heading, the first thing you will see is the operation type, for example "INDEX RANGE SCAN", "NESTED LOOPS", or "SELECT ...". After that, in brackets, you will find I/O statistics such as:

cr = number of blocks read with logical reads in consistent mode.
pr= number of blocks read with physical reads from the disk.
pw= number of blocks written with physical writes to the disk.
time = shows the cumulative elapsed time for the step and the steps that preceded it in microseconds (µs: 1/1000000 of a second).  This information is very useful when looking for the point in an access path that takes all the time. By looking for the point at where the majority of the time originates it is possible to narrow down a number of problems.
cost =  estimated cost of the operation used by the optimizer for internal comparison
size = estimated space usage of the operation in bytes
card = estimated cardinality (number of rows returned) of that particular operation

These statistics can be useful in identifying steps that read or write a particularly large proportion of the overall data.
IMPORTANT: if the explain argument is specified during TKPROF creation, you will also see a second execution plan, generated by TKPROF itself. Since that second plan is generated by TKPROF WITHOUT EXECUTING THE SQL, it may not be the same as the first one ("Row Source Operation"). Always refer to the "Row Source Operation" plan, as that is the actual execution plan that was followed when the SQL was running.

  • Events waited for by the statement (not applicable): This section is not displayed in the sample trace above because we didn't include the waits option when generating the trace file. It displays all wait events that a statement waited for during tracing, and can be very useful in conjunction with the statistics and row source information for tracking down the causes of problems associated with long wait times. High numbers of waits, or waits with a long total duration, may be candidates for investigation, depending on the wait itself.

Thursday, 4 January 2024

Not Exists and Not In difference

 

  • NULLs
    NOT IN will not match any rows when the subquery returns even one null. NOT EXISTS can handle the NULL value.
  • Complexity
    NOT IN is simpler and tests for the presence of an element in a set. NOT EXISTS can handle more complicated queries, including grouping, results with multiple conditions, and can take advantage of indexes.
  • Performance
    NOT EXISTS is often faster than NOT IN, especially when the subquery result set is very large, though modern optimizer versions can transform the two into equivalent plans in many cases.
  • Nested or hash full table scans
    NOT IN can degenerate into nested full table scans, while NOT EXISTS can use an index within the subquery.
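The NULL point can be demonstrated with a self-contained query against DUAL; the subquery below returns the values 1 and NULL:

```sql
-- NOT IN matches nothing, because 2 <> NULL evaluates to UNKNOWN:
SELECT 'row survives' AS result
FROM   dual
WHERE  2 NOT IN (SELECT 1 FROM dual UNION ALL SELECT NULL FROM dual);
-- no rows selected

-- The NOT EXISTS form ignores the NULL and returns the row:
SELECT 'row survives' AS result
FROM   dual
WHERE  NOT EXISTS (
         SELECT 1
         FROM   (SELECT 1 v FROM dual UNION ALL SELECT NULL FROM dual)
         WHERE  v = 2);
-- one row selected
```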

Sharding in Oracle 12 c and above


Sharding is a common way to implement horizontal scaling. Horizontal scaling is the practice of adding more machines to an existing stack to spread out the load and allow for more traffic and faster processing.
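In Oracle Sharding, rows are distributed across shards by a sharding key declared in the table DDL. A minimal sketch of the shape of that DDL, with hypothetical table and tablespace-set names (it requires a configured sharded database, so it is shown only for illustration):

```sql
-- Rows are spread across shards by consistent hash on cust_id
CREATE SHARDED TABLE customers
( cust_id  NUMBER NOT NULL,
  name     VARCHAR2(100),
  region   VARCHAR2(20),
  CONSTRAINT pk_customers PRIMARY KEY (cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
TABLESPACE SET ts1;
```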
