By digoal
How does PostgreSQL add the hook feature to do statistics on the SQL statements of fully scanned tables?
In PostgreSQL, you can create a custom plug-in to use the hook mechanism, thus getting the statistics of fully scanned tables and the corresponding SQL statements. PostgreSQL's hook mechanism allows developers to insert custom behavior at key operating points without modifying the core PostgreSQL code.
To implement this feature, you may need to use an executor hook or a planner hook depending on when you want to capture the information of a full table scan. One possible approach is to use ExecutorStart_hook
which is called before the execution plan starts, or ExecutorRun_hook
, which is called before the plan actually runs.
Here's a simplified example of how to use ExecutorStart_hook
to add a hook to get statistical information of full table scans:
1. Create a new plug-in project and write custom hook code.
#include "postgres.h"
#include "executor/executor.h"
#include "nodes/plannodes.h"
PG_MODULE_MAGIC;
/* Declare the original ExecutorStart function pointer */
static ExecutorStart_hook_type prev_ExecutorStart = NULL;
/* Implement the custom ExecutorStart hook */
void MyExecutorStart(QueryDesc *queryDesc, int eflags)
{
/* Check the plan before calling the original ExecutorStart */
if (queryDesc->plannedstmt != NULL && queryDesc->plannedstmt->planTree != NULL)
{
Plan *plan = queryDesc->plannedstmt->planTree;
if (nodeTag(plan) == T_SeqScan)
{
/* A full table scan is found. You can make a record or do statistics here */
/* Example: Output to server logs */
elog(LOG, " Full table scan performed: %s", queryDesc->sourceText);
}
}
/* Call the original ExecutorStart function */
if (prev_ExecutorStart)
{
prev_ExecutorStart(queryDesc, eflags);
}
else
{
standard_ExecutorStart(queryDesc, eflags);
}
}
/* Initialization function when the module is loaded */
void _PG_init(void)
{
/* Save the original ExecutorStart function pointer */
prev_ExecutorStart = ExecutorStart_hook;
/* Set a new hook */
ExecutorStart_hook = MyExecutorStart;
}
/* The cleanup function when the module is uninstalled */
void _PG_fini(void)
{
/* Restore the original hook */
ExecutorStart_hook = prev_ExecutorStart;
}
2. Compile your plug-in code as a shared library and make sure it is in the PostgreSQL library path.
3. Load the plug-in in PostgreSQL. You can preload the plug-in at database startup by setting the parameter shared_preload_libraries
, or load it dynamically using the LOAD
command.
-- Add to postgresql.conf
shared_preload_libraries = 'my_custom_plugin'
-- Or use an SQL command to dynamically load
LOAD 'my_custom_plugin';
4. Restart the PostgreSQL service for the settings to take effect, or run the LOAD command.
pg_ctl restart
The hook in this example checks whether the current execution plan contains a full table scan (SeqScan) and writes the relevant information to the log before the execution plan starts.
Note that more complex situations need to be handled in actual use, such as nested nodes, multi-table scans, and different types of scans. You may need to traverse the entire plan tree to accurately detect all full table scans.
Before implementing such a custom hook, it is important to conduct adequate testing and code reviews to ensure that it does not affect the performance and stability of the database.
In PostgreSQL, if you want to capture and make statistical analysis on SQL queries of full table scans (Sequential Scans), and output information including database names, schema names, table names, and SQL statements, you may need to rely on several internal hook functions and combine system directory information. The following are the basic steps and sample code to show how to implement this feature by creating a custom plug-in that uses an executor hook.
ExecutorStart_hook
, ExecutorRun_hook
, or ExecutorFinish_hook
. In this hook function, you will check whether the execution plan contains full table scans and collect the required information.SeqScan
nodes. If so, you can obtain the corresponding database name, schema name, and table name from the system directory table.The following is a basic sample code that shows how to define an ExecutorStart_hook
and output information to the log when a full table scan is detected:
#include "postgres.h"
#include "executor/executor.h"
#include "nodes/plannodes.h"
#include "utils/rel.h"
PG_MODULE_MAGIC;
void _PG_init(void);
void _PG_fini(void);
static void myExecutorStart(QueryDesc *queryDesc, int eflags);
static ExecutorStart_hook_type prev_ExecutorStart = NULL;
static void myExecutorStart(QueryDesc *queryDesc, int eflags) {
if (prev_ExecutorStart) {
prev_ExecutorStart(queryDesc, eflags);
} else {
standard_ExecutorStart(queryDesc, eflags);
}
Plan *plan = queryDesc->plannedstmt->planTree;
if (nodeTag(plan) == T_SeqScan) {
SeqScan *scan = (SeqScan *)plan;
Relation relation = scan->scanrelid;
Oid relid = relation->rd_id;
// Obtain the table information
char *relname = get_rel_name(relid);
Oid schema_oid = get_rel_namespace(relid);
char *schemaname = get_namespace_name(schema_oid);
// Obtain the database information
char *dbname = get_database_name(MyDatabaseId);
// Output the log information
elog(LOG, "Database: %s, Schema: %s, Table: %s, SQL: %s",
dbname, schemaname, relname, queryDesc->sourceText);
}
}
void _PG_init(void) {
prev_ExecutorStart = ExecutorStart_hook;
ExecutorStart_hook = myExecutorStart;
}
void _PG_fini(void) {
ExecutorStart_hook = prev_ExecutorStart;
}
This code provides a basic framework. In real applications, you may need to do more error checking, traverse the entire plan tree to check all possible SeqScan
nodes, and consider factors such as concurrency and performance impact.
In addition, to compile this code, you need to configure the PostgreSQL development environment, and after writing, you need to compile it into a dynamic library and place the library file in the PostgreSQL library directory. Finally, you need to set the shared_preload_libraries
in the postgresql.conf
to load the plugin and restart the PostgreSQL service.
Note that the actual situation may be more complicated because this example is just a starting point. In the generated execution plan, the full table scan may be encapsulated in more complex nodes, such as subquery nodes or CTE nodes. Therefore, it may be necessary to recursively traverse the entire plan tree to ensure that all full table scans are captured.
Before deploying such a custom plug-in, ensure that it is adequately tested to avoid negative impacts on the performance and database stability.
Indeed, the previous example code did not contain logic to traverse the execution tree. In practice, you usually need to recursively check each node in the execution plan tree to determine whether any child nodes have performed a full table scan (SeqScan). The following sample code provides a simplified function that recursively traverses the execution plan tree and a modified ExecutorStart_hook
, so as to find and output information about the operation of full table scans.
#include "postgres.h"
#include "executor/executor.h"
#include "nodes/plannodes.h"
#include "utils/rel.h"
#include "catalog/pg_class.h"
#include "catalog/namespace.h"
#include "utils/lsyscache.h"
PG_MODULE_MAGIC;
void _PG_init(void);
void _PG_fini(void);
static void myExecutorStart(QueryDesc *queryDesc, int eflags);
static void findSeqScan(Plan *plan, QueryDesc *queryDesc);
static ExecutorStart_hook_type prev_ExecutorStart = NULL;
/* Recursively traverse the execution plan tree to find the SeqScan node */
static void findSeqScan(Plan *plan, QueryDesc *queryDesc) {
if (plan == NULL) {
return;
}
if (IsA(plan, SeqScan)) {
Oid relid = ((SeqScan *)plan)->scanrelid;
char *relname = get_rel_name(relid);
Oid schema_oid = get_rel_namespace(relid);
char *schemaname = get_namespace_name(schema_oid);
char *dbname = get_database_name(MyDatabaseId);
elog(LOG, "Detected SeqScan: Database: %s, Schema: %s, Table: %s, SQL: %s",
dbname, schemaname, relname, queryDesc->sourceText);
}
/* Traverse child nodes */
ListCell *cell;
foreach(cell, plan->initPlan) {
SubPlan *subplan = (SubPlan *) lfirst(cell);
Plan *subplan_plan = planner_subplan_get_plan(queryDesc->plannedstmt, subplan);
findSeqScan(subplan_plan, queryDesc);
}
findSeqScan(plan->lefttree, queryDesc);
findSeqScan(plan->righttree, queryDesc);
}
static void myExecutorStart(QueryDesc *queryDesc, int eflags) {
if (prev_ExecutorStart) {
prev_ExecutorStart(queryDesc, eflags);
} else {
standard_ExecutorStart(queryDesc, eflags);
}
/* Start recursively traversing the execution plan tree */
findSeqScan(queryDesc->plannedstmt->planTree, queryDesc);
}
void _PG_init(void) {
prev_ExecutorStart = ExecutorStart_hook;
ExecutorStart_hook = myExecutorStart;
}
void _PG_fini(void) {
ExecutorStart_hook = prev_ExecutorStart;
}
In this code, we define a findSeqScan
function to recursively traverse the execution plan tree. At each node, it checks if it is a SeqScan node. If so, it gets the name of the associated table, schema, and database, and records this information in logs.
In addition, for nodes that contain subqueries or subplans, we also recursively traverse these subnodes. The initPlan list contains SubPlan nodes in the execution plan, which may be subqueries or CTEs. By calling the planner_subplan_get_plan
, we can get the Plan structure corresponding to each subplan and recursively traverse them.
Note that this sample code is simplified for clearer illustration. In a real deployment, you may need to deal with more node types and consider different cases such as parametric scans, functional scans, and so on. You also need to perform adequate testing to ensure that the hook logic does not interfere with the normal operation of the database and does not introduce unnecessary performance overhead.
Optimization on Memory Usage During Rust Cargo Code Compiling
PolarDB for PostgreSQL | Database Performance Optimization for High-Concurrency Queue Processing
digoal - June 28, 2021
digoal - May 16, 2019
ApsaraDB - May 24, 2024
digoal - January 19, 2021
Alibaba EMR - February 20, 2023
digoal - July 4, 2019
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMigrate your legacy Oracle databases to Alibaba Cloud to save on long-term costs and take advantage of improved scalability, reliability, robust security, high performance, and cloud-native features.
Learn MoreMigrating to fully managed cloud databases brings a host of benefits including scalability, reliability, and cost efficiency.
Learn MoreMore Posts by digoal