All Products
Search
Document Center

PolarDB:Feature management

Last Updated:Mar 22, 2024

This topic describes the scenarios in which feature management is applicable in PolarDB for AI. This topic also describes how to create, update, and delete a feature, and view feature status and the feature list.

Scenarios

In most cases, it is difficult to build an AI model by using raw data. Before raw data is input into the model, the raw data is preprocessed or pre-calculated. The raw and preprocessed data are used as features.

For example, if a data set contains multiple items related to a house, such as the size, number of bedrooms, geographic location, and year of construction, each item can be considered a feature.

The selection and preparation of features are essential for building an effective AI model. Ideal features can greatly improve the performance of the model, while irrelevant or redundant features may adversely impact the learning process of the model and degrade model performance.

Feature engineering involves operations related to features, such as selecting the most relevant features, converting raw data into a format that is suitable for machine learning algorithms, and creating new features derived from the raw data.

Features are commonly used in the following scenarios:

  • Classification

    • Email classification: uses words and phrases as features to identify spam emails.

    • Image recognition: uses features such as pixel values, colors, textures, and shapes to classify images.

    • Customer churn prediction: uses features related to customers, such as their historical transactions, engagement level, and purchase frequency, to predict the retention rate of customers in the platform.

  • Regression

    • Housing price prediction: uses features such as the location, size, age, and amenities of a house to predict its future prices.

    • Stock price prediction: uses features such as historical prices, trading volume, and economic indicators to predict future stock prices.

  • Clustering

    • Market segmentation: uses features such as the purchase history and preferences of customers, and demographic information to group customers.

    • Social network analysis: uses features such as the social activity data and connectivity of users to discover groups in social networks.

  • Recommendation

    • Movie recommendation: uses features such as the movies watched by users and the ratings, type, and actors of the movies to recommend movies.

    • Product recommendation: uses features such as the purchase history and browsing behavior of users, and product attributes to recommend products.

  • Time series analysis

    • Weather forecast: uses features such as historical weather conditions, dates, and solar terms to predict future weather conditions.

    • Financial forecast: uses features such as the financial statements and market trends of a company to forecast the financial status of the company.

  • Anomaly detection

    • Credit card fraud detection: uses features such as the time, location, and amount of a transaction to identify unusual transactions.

    • Network security: uses features such as network traffic data, connection type, and traffic size to detect network attacks.

  • Natural language processing (NLP)

    • Sentiment analysis: uses features such as words, phrases, and sentence structures to analyze the emotional tendencies of text.

    • Machine translation: uses features such as words, grammatical structure, and context to assist in machine translation.

  • Computer vision

    • Facial recognition: uses features such as key points, contours, and textures of the face to identify individuals.

    • Object detection: uses features such as the shape, size, and color of an object to detect objects in an image.

Features are widely used in all fields. Feature selection and engineering are critical steps in the model building process. They directly affect the performance of models.

Create a feature

Features are created in an asynchronous manner. You can check whether a feature has been created by querying its status.

Syntax

  • Create a feature by using data in a database.

    CREATE FEATURE feature_name WITH (feature_class= '',parameters=()) AS (SELECT select_expr[, select_expr] ... FROM table_reference)
  • Create a feature by using an existing feature.

    CREATE FEATURE feature_name WITH (feature_class = '',parameters=()) USING FEATURE feature_reference

Parameters

Parameter

Description

feature_name

The name of the feature.

feature_class

The type of the feature. Valid values:

  • graph: a graph. This value can be used only when you create the feature by using data in your database.

  • connected_subgraph: a connected graph. This value can be used only when you create the feature by using an existing feature. The existing feature must be created by using a graph.

  • iv: evaluates the predictive power of a feature. Valid values range from 0 to positive infinity. This value can be used only when you create the feature by using data in your database. For more information, see Information Value.

  • none_value: measures the proportion of NULL values in a feature to evaluate the validity of a feature. This value can be used only when you create the feature by using data in your database.

    Note
    • If the proportion of NULL values is high (such as ≥80%), you can discard the feature.

    • If the proportion of NULL values is low, you can retain the feature and fill in the missing values to reduce the impact.

  • pearson: measures the linear correlation of two features. This value can be used only when you create the feature by using data in your database. For more information, see Pearson Correlation Coefficient.

  • psi: measures the difference between the expected distribution and actual distribution of data to evaluate the stability of a model. This value can be used only when you create the feature by using data in your database. For more information, see Population Stability Index.

parameters

Custom parameters for creating the feature.

select_expr

The name of the column used to create the feature.

table_reference

The name of the table containing the column used to create the feature.

feature_reference

The name of an existing feature.

Examples

  • Create a feature by using data in a database.

    /*polar4ai*/create feature test_graph_1 with (feature_class='graph') as (select ip, user_id from small_sample limit 30)

    Sample result:

    Query OK, 0 rows affected (0.79 sec)
  • Create a feature by using an existing feature.

    /*polar4ai*/create feature test_subgraph_1 with (feature_class='connected_subgraph') using feature test_graph_1

    Sample result:

    Query OK, 0 rows affected (0.79 sec)

Update a feature

Features are updated in an asynchronous manner. You can check whether a feature has been updated by querying its status.

Syntax

UPDATE FEATURE feature_name WITH (feature_class = '',parameters=()) AS (SELECT select_expr [, select_expr] ... FROM table_reference)

Parameters

Parameter

Description

feature_name

The name of the feature.

feature_class

The type of the feature. Set the value to graph.

parameters

Custom parameters for creating the feature.

select_expr

The name of the column used to update the feature.

Note
  • If the column used to update the feature has a different name from the column used in an existing feature but contains the same information, you must add aliases to the two columns to make their names the same.

  • If the column used to update the feature has the same name as the column used in an existing feature but contains different information, you must add aliases to the two columns to make their names different.

table_reference

The name of the table containing the column used to update the feature.

Examples

Modify the columns of the test_subgraph_1 feature to the ip and user_id columns in the small_sample table.

/*polar4ai*/update feature test_graph_1 with (feature_class='graph') as (select ip, user_id from small_sample limit 30)

Sample result:

Query OK, 0 rows affected (0.79 sec)

View feature status

View the current status of a feature. In most cases, you can check whether a feature is created or updated after you create or update the feature. A feature may be in one of the following states:

  • INIT: The feature is initialized.

  • PREPARING: The data for the feature is being prepared.

  • PREPARED: The data for the feature is prepared.

  • COMPUTING: The feature is being computed.

  • COMPUTED: The feature is computed.

  • RESTORING: The feature results are being stored.

  • RESTORED: The feature results are stored.

  • FINISHED: The feature computing is successful.

  • FAILED: The feature computing failed.

Syntax

SHOW FEATURE feature_name 

Parameters

Parameter

Description

feature_name

The name of the feature.

Examples

View the status of the test_subgraph_1 feature.

/*polar4ai*/show feature test_subgraph_1;

Sample result:

+-----------+-------------------------------------------------------------------------------------------------------------------------------+
| CSGStatus | featurePath                                                                                                                   |
+-----------+-------------------------------------------------------------------------------------------------------------------------------+
| FINISHED  | http://db4ai-collie-cn-hangzhou-dataset.oss-cn-hangzhou.aliyuncs.com/xxxx.txt?OSSAccessKeyId=xxxx&Expires=xxxx&Signature=xxxx |
+-----------+-------------------------------------------------------------------------------------------------------------------------------+
Note

The validity period of featurePath is 100 minutes. You can use the Expires parameter in featurePath to check whether the link is valid. You must access the link within the validity period.

View the feature list

You can view all features for the current cluster.

Syntax

SHOW FEATURES

Examples

/*polar4ai*/show features;

Sample result:

+--------------------+-----------------+----------------+
| feature_class      | feature_name    | feature_status |
+--------------------+-----------------+----------------+
| connected_subgraph | test_subgraph_1 | DELETED        |
| graph              | test_graph_1    | FINISHED       |
+--------------------+-----------------+----------------+
2 rows in set (0.32 sec)

Delete a feature

You can delete an existing feature.

Syntax

DROP FEATURE feature_name 

Parameters

Parameter

Description

feature_name

The name of the feature.

Examples

Deletes the existing feature test_subgraph_1.

/*polar4ai*/drop feature test_subgraph_1

The following result indicates that the feature has been deleted:

Query OK, 0 rows affected (0.69 sec)