Wednesday, 31 January 2024

Extended Statistics

 

Use Extended Statistics For Better SQL Execution Plans

1. https://blogs.oracle.com/optimizer/post/extended-statistics

In real-world data, there is often a relationship or correlation between the data stored in different columns of the same table. For example, consider a customers table where the values in a cust_state_province column are influenced by the values in a country_id column, because the state of California is only going to be found in the United States. If the Oracle Optimizer is not aware of these real-world relationships, it could potentially miscalculate the cardinality estimate if multiple columns from the same table are used in the where clause of a statement. With extended statistics you have an opportunity to tell the optimizer about these real-world relationships between the columns. 
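
As a minimal sketch of that idea, assuming a customers table with the country_id and cust_state_province columns described above (the table and column names here are illustrative), a column group extension can be created and gathered like this:

select dbms_stats.create_extended_stats(user,'customers','(country_id,cust_state_province)')
from dual;

begin
   dbms_stats.gather_table_stats(user,'customers',
      method_opt=>'for columns (country_id,cust_state_province) size auto');
end;
/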

https://blogs.oracle.com/optimizer/post/use-extended-statistics-for-better-sql-execution-plans

-- Create an extension for the expression UPPER(surname) on table t
select dbms_stats.create_extended_stats(user,'t','(upper(surname))')
from dual;

-- Gather statistics for the new expression so the optimizer can use them;
-- no_invalidate=>FALSE invalidates dependent cursors immediately
begin
   dbms_stats.gather_table_stats(user,'t',
      method_opt=>'for columns (upper(surname))',no_invalidate=>FALSE);
end;
/
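
Once the extension exists, it is visible in the USER_STAT_EXTENSIONS dictionary view, and predicates on the expression can benefit from the new statistics. A small check, assuming the table t from the snippet above ('SMITH' is just an illustrative value):

-- Confirm the extension was created
select extension_name, extension
from   user_stat_extensions
where  table_name = 'T';

-- A predicate on the expression can now be estimated more accurately
select count(*) from t where upper(surname) = 'SMITH';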

2. https://blogs.oracle.com/optimizer/post/why-do-i-have-sql-statement-plans-that-change-for-the-worse

Inaccurate Cardinality, Inaccurate Cost

If cardinality estimates are inaccurate, then the calculated costs are likely to be inaccurate too. The resulting execution plan might be suboptimal because it is based on incorrect or missing information. Still, it is essential to remember that this does not mean a bad execution plan is certain; after all, the optimizer is designed to work around issues like this as much as possible.

Tackling the Root Cause

If you have a problem SQL statement, I strongly suggest that you always ask yourself, ‘can I improve the cardinality estimates?’

Here are some ways to do that:

  • Ensure that statistics are present and not stale. Gather statistics regularly to keep them fresh. Hopefully, this is obvious (a quick staleness check is sketched after this list).

  • Create histograms (manually or automatically), especially if there is known range or value skew (see the histogram example after this list).

  • For cases where functions are wrapped around columns and used in query predicates, consider creating virtual columns or function-based indexes. Remember to regather statistics once the column or index has been created (see the virtual column sketch after this list).

  • Create extended statistics for expressions and correlated columns.

  • Use dynamic statistics (dynamic sampling) at level 3 or higher, especially if SQL statements are relatively long-running and have complex predicates. You can target individual queries or sessions; there is no need to use elevated dynamic sampling levels system-wide (see the final sketch below).
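
For the first point, a quick way to spot and refresh stale statistics is sketched below (the table name t is a placeholder; USER_TAB_STATISTICS flags staleness once enough data has changed):

-- List tables whose statistics are flagged as stale
select table_name, last_analyzed, stale_stats
from   user_tab_statistics
where  stale_stats = 'YES';

-- Regather statistics for an affected table
begin
   dbms_stats.gather_table_stats(user,'t');
end;
/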
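
For histograms, a minimal sketch, assuming a hypothetical orders table with a skewed status column:

begin
   dbms_stats.gather_table_stats(user,'orders',
      method_opt=>'for columns status size 254');
end;
/

-- Check which histogram (if any) was built
select column_name, histogram, num_buckets
from   user_tab_col_statistics
where  table_name = 'ORDERS';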
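
For predicates that wrap a function around a column, here is a virtual column sketch using the surname example from earlier (a function-based index on upper(surname) is an alternative):

alter table t add (surname_upper generated always as (upper(surname)) virtual);

-- Regather so the new virtual column gets its own statistics
begin
   dbms_stats.gather_table_stats(user,'t',
      method_opt=>'for columns surname_upper size auto');
end;
/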
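
Finally, dynamic sampling can be raised for a single statement or a single session rather than system-wide; the level 4 used here is just an example:

-- Per statement, via a hint
select /*+ dynamic_sampling(4) */ count(*)
from   t
where  upper(surname) = 'SMITH';

-- Per session
alter session set optimizer_dynamic_sampling = 4;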
