Feature engineering is essential to model training in machine learning. It helps find feature crosses that benefit models, but algorithm engineers generally must spend a lot of effort on it. Machine Learning Platform for AI (PAI) provides the Auto Feature Cross component to help you find effective feature crosses. You can combine the features that form these crosses to optimize your model. This topic describes how to use the Auto Feature Cross component.
Flowchart
The Auto Feature Cross component is developed based on the deep learning framework TensorFlow. This component involves intensive parallel computing at the underlying layer and requires GPU resources. Only the China (Beijing) and China (Shanghai) regions support the Auto Feature Cross component.

1.Authorize PAI to access your GPU resources and OSS bucket
- Log on to the PAI console. In the left-side navigation pane, choose Model Training > Visualized Modeling (Machine Learning Studio). On the page that appears, find the project in which you want to perform operations and click Machine Learning in the Actions column.
- On the page that appears, click Settings in the left-side navigation pane. On the Settings page, select Authorize Machine Learning Platform for AI to access my OSS resources and Pay by used on the General tab.
2.Bin data
The Auto Feature Cross component supports only the BIGINT data type. However, raw data in most business scenarios is of the DOUBLE data type, as shown in the following figure.

In this case, you must use the SQL Script or One Hot Encoding component to convert the raw data from the DOUBLE type to the BIGINT type. In addition, you must use the Feature Discretization component to assign the values of each feature to bins based on the interval in which they fall. The following figure shows the data after binning.
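The idea behind binning can be pictured with a minimal Python sketch of equal-width discretization. It is an illustration only, not the implementation of the Feature Discretization component; the function name `equal_width_bins`, the choice of five bins, and the sample values are assumptions made for this example.

```python
def equal_width_bins(values, num_bins=5):
    """Map floating-point values to integer bin indexes in [0, num_bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into bin 0.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width)
        # The largest value would land on the upper edge; keep it in the last bin.
        bins.append(min(idx, num_bins - 1))
    return bins


# Hypothetical raw DOUBLE values for one feature.
raw = [0.0, 1.4, 2.3, 3.5, 6.2]
binned = equal_width_bins(raw)
print(binned)  # [0, 1, 1, 2, 4]
```

With five bins, every output value is an integer from 0 to 4, which matches the BIGINT input that the Auto Feature Cross component expects.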

3.Determine the range of feature values
- The maximum value of the thalach feature is 4.
- The maximum value of the oldpeak feature is 3.
- The maximum value of the ca feature is 4.

You can execute the following SQL statement to obtain the maximum value of each feature:
select max(feature) from table;
In the sample data of this topic, the largest value across all features after binning is 4.

You must set the Feature length parameter of the Auto Feature Cross component in the format shown in the following figure. In the format, 5 indicates a left-closed, right-open interval [0,5) that includes 4.
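As a rough sketch of how the Feature length value relates to the maximum values, the following Python snippet builds the string from the per-feature maximums. The helper name `build_feature_meta` is hypothetical, and using one shared upper bound for every feature is an assumption that follows the sample in this topic.

```python
def build_feature_meta(max_values):
    """Return a Feature length string with one entry per feature.

    Each entry is the exclusive upper bound of [0, bound), so the bound
    must be one larger than the largest binned value.
    """
    bound = max(max_values.values()) + 1
    return "[" + ",".join(str(bound) for _ in max_values) + "]"


# Maximum values after binning for the three features listed in this step.
max_values = {"thalach": 4, "oldpeak": 3, "ca": 4}
print(build_feature_meta(max_values))  # [5,5,5]
```

For the 13 features in the sample data, the same rule produces a list of thirteen 5s.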

4.Prepare training data and test data
In this topic, the training data is the same as the test data. In actual use, the test data can differ from the training data, provided that the fields in the test data are the same as the fields in the training data.
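Before you run the component, a quick field comparison can catch mismatches between the two tables. The sketch below is one possible check, assuming you have already read the column names of both tables into Python lists; the helper name `compare_fields` and the shortened column lists are hypothetical.

```python
def compare_fields(train_fields, test_fields):
    """Return (missing, extra): fields absent from or added to the test data."""
    missing = [f for f in train_fields if f not in test_fields]
    extra = [f for f in test_fields if f not in train_fields]
    return missing, extra


# Hypothetical, shortened column lists for illustration.
train_fields = ["sex", "cp", "fbs", "ifhealth"]
test_fields = ["sex", "cp", "ifhealth"]
missing, extra = compare_fields(train_fields, test_fields)
print(missing, extra)  # ['fbs'] []
```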
5.Set the parameters of the Auto Feature Cross component
- Fields Setting tab
In the Auto Feature Cross component, the input port on the left is used to import training data and the input port on the right is used to import test data.
- Feature selection: the feature columns that are selected for feature crossing.
- if sparse data: specifies whether the input data is in the sparse format. This check box is cleared by default, which means that the data is in the dense format.
- Label: the label column that is used to determine whether a feature cross is effective.
- Output path: the URL of the OSS bucket that stores the generated model.
- Parameters Setting tab
- Ergodic number: the number of iterations.
- Feature order: the maximum number of features in each feature cross. For example, 3 indicates that each feature cross involves a maximum of three features.
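To get a feel for what Feature order controls, the following Python sketch counts the candidate feature crosses for the 13 sample features when each cross may involve at most three features. It only illustrates the size of the candidate space; it makes no claim about how the component searches that space internally.

```python
from itertools import combinations

# The 13 feature columns used in the sample data of this topic.
features = ["sex", "cp", "fbs", "restecg", "exang", "slop", "thal",
            "age", "trestbps", "chol", "thalach", "oldpeak", "ca"]
max_order = 3  # corresponds to a Feature order of 3

# Enumerate every combination of 2 up to max_order feature indexes.
candidates = [
    combo
    for order in range(2, max_order + 1)
    for combo in combinations(range(len(features)), order)
]
print(len(candidates))  # 364 (78 pairs + 286 triples)
```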
PAI -name fives_ext -project algo_public
-DlabelColName="ifhealth" // The label column that is used to determine whether a feature cross is effective.
-Dmetric_file="metric_log.log" // The name of the system log file.
-Dfeature_meta="[5,5,5,5,5,5,5,5,5,5,5,5,5]"
-DtrainTable="odps://Project name/tables/Table name"
-Dbuckets="oss://{oss_bucket}/"
-Dthreshold="0.5"
-Dk="3"
-DossHost="oss-cn-beijing-internal.aliyuncs.com" // The region in which OSS is activated.
-Demb_dims="16"
-DenableSparse="0"
-Dtemp_anneal_steps="30000"
-DfeatureColName="sex,cp,fbs,restecg,exang,slop,thal,age,trestbps,chol,thalach,oldpeak,ca" // The feature columns that are selected for feature crossing.
-DtestTable="odps://Project name/tables/Table name"
-Darn="acs:ram::********:role/aliyunodpspaidefaultrole" // The ARN of the RAM role.
-Depochs="1500"
-DcheckpointDir="oss://{oss_bucket}/{path}/";
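If you generate the command from a script, a small helper can assemble it from a dictionary of parameters. The following Python sketch is one way to do that; the helper name `build_pai_command` is hypothetical, only a subset of the parameters from the preceding command is shown, and placeholders such as `Project name` and `{oss_bucket}` are kept as they appear in the command.

```python
def build_pai_command(name, project, params):
    """Assemble a PAI command string from a dict of -D parameters."""
    lines = [f"PAI -name {name} -project {project}"]
    lines += [f'-D{key}="{value}"' for key, value in params.items()]
    return "\n".join(lines) + ";"


# Values copied from the sample command; placeholders are left unfilled.
params = {
    "labelColName": "ifhealth",
    "featureColName": "sex,cp,fbs,restecg,exang,slop,thal,age,"
                      "trestbps,chol,thalach,oldpeak,ca",
    "feature_meta": "[5,5,5,5,5,5,5,5,5,5,5,5,5]",
    "trainTable": "odps://Project name/tables/Table name",
    "testTable": "odps://Project name/tables/Table name",
    "buckets": "oss://{oss_bucket}/",
    "k": "3",
    "epochs": "1500",
}
command = build_pai_command("fives_ext", "algo_public", params)
print(command)
```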
View the feature crosses
In the root directory of your OSS bucket, find the interactions.json file. The root directory of your OSS bucket is specified by the -Dbuckets parameter.

- [0,1] indicates that the cross of the first and second features is effective. Feature indexes start at 0 and follow the order of the features in the input table.
- [8, 6, 5] indicates that the cross of the ninth, seventh, and sixth features is effective.
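To make the index lists easier to read, you can map them back to feature names. The following Python sketch assumes that interactions.json holds a JSON array of index arrays, as the two examples suggest, and that the indexes follow the column order given in featureColName; both points are assumptions, and the inline JSON string stands in for the downloaded file.

```python
import json

# Feature columns in the order used by the sample command.
feature_cols = ("sex,cp,fbs,restecg,exang,slop,thal,age,"
                "trestbps,chol,thalach,oldpeak,ca").split(",")

# Stand-in for the content of interactions.json (assumed structure).
interactions_text = "[[0, 1], [8, 6, 5]]"
interactions = json.loads(interactions_text)

# Translate each list of zero-based indexes into feature names.
named = [[feature_cols[i] for i in cross] for cross in interactions]
for cross in named:
    print(" x ".join(cross))
# sex x cp
# trestbps x thal x slop
```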