SQL Queries Hints (Advanced)

Improve query plans through use of SQL query hints.

Overview

The warehouse supports Postgres SQL queries, including query planning hints from pg_hint_plan , for both Insights and Direct Access users. SQL query hints can be used to address query execution time problems by giving the Postgres query planner "hints" to improve its plan for a query, which can lead to much better execution times. The process of developing the correct hints is a relatively technical task for advanced users, however. See the link for details on how to use the hinting mechanisms.

Some examples of using hints in queries are as follows. Note that pg_hint_planonly reads the first block comment and discards any following comments. In addition, Benching does not expose or support a hint table that can be used to systemically store hints across multiple users - hints in the query are the only support means of hinting.

Hints

Correcting Estimated Rows

One of the simplest and most common ways to improve query plans is to examine an EXPLAIN (ANALYZE)query plan output and find cases where the estimated number of rows is incorrect by large amounts (usually orders of magnitude) compared to the measured number rows (via the ANALYZE phase of the EXPLAIN plan). Direct Access warehouse users can run the EXPLAIN plan directly themselves to get this query planner output, but Insights users (who can also run the EXPLAIN) may run into a query timeout which may prevent the ANALYZE portion of their query to run to completion and may need to work with support to develop better query plans. If the query plan below indicates the rows in relation b are mis-estimated, the hint below can assign a corrected number of rows so the Postgres query planner can make a better plan. Mis-estimates of the query planner can occur in many ways, such as joins involving a expanded JSON encoded field (which are not supported by the Postgres statistics system) as well as queries involving correlated columns.

/*+ Rows(b #1000000) */ 
SELECT a.id as "Notebook Entry",
c.name as "Author",
coalesce(a.review_status, 'IN_PROGRESS') as "Review Status",
d.name as "Project",
a.created_at as "Created At"
FROM org_1.entry$raw a
LEFT JOIN org_1.entry_auditor$raw b on a.id = b.entry_id
LEFT JOIN org_1.principal$raw c on c.id = a.creator_id
LEFT JOIN org-1.project$raw d on a.source_id = d.id

Hinting for increased parallelism

The following hint causes a parallel scan on the table stagingtx.entry$raw- which may improve operations which take too long due to serial processing. The hint indicates that the relation a should be scanned with 3 parallel workers, with a hard constraint - meaning it is forcing the query planner. Another option is to used the hint soft to encourage the planner to use a parallel scan by reducing the costs associated with setting up for parallel scans. Note that the hint could reference any relation here, such as b, c or d.

/*+ Parallel(a 3 hard) */ 
SELECT a.id as "Notebook Entry",
c.name as "Author",
coalesce(a.review_status, 'IN_PROGRESS') as "Review Status",
d.name as "Project",
a.created_at as "Created At"
FROM org_1.entry$raw a
LEFT JOIN org_1.entry_auditor$raw b on a.id = b.entry_id
LEFT JOIN org_1.principal$raw c on c.id = a.creator_id
LEFT JOIN org-1.project$raw d on a.source_id = d.id

Join and Scan Hints

Many hints are available and follow a pattern similar to the hints above. Hints can modify the join strategy, scan strategy and other attributes the query planner uses. For example,

/*+ SeqScan(b) HashJoin(a b)  */ 
SELECT a.id as "Notebook Entry",
c.name as "Author",
coalesce(a.review_status, 'IN_PROGRESS') as "Review Status",
d.name as "Project",
a.created_at as "Created At"
FROM org_1.entry$raw a
LEFT JOIN org_1.entry_auditor$raw b on a.id = b.entry_id
LEFT JOIN org_1.principal$raw c on c.id = a.creator_id
LEFT JOIN org_1.project$raw d on a.source_id = d.id

hints to the planner to perform a sequential scan on org_1.entry$rawand to use a hash join between tables or subqueries named a and b.