Platform for AI: XGboost Predict

Last Updated: Feb 22, 2024

XGBoost is an extension of the gradient boosting algorithm that offers improved usability and robustness. It is widely used in production machine learning systems and in machine learning competitions, and it supports both classification and regression. The XGboost Predict component is developed by the Platform for AI (PAI) team based on the open source XGBoost algorithm. You can use it to perform offline inference with models trained by the XGboost Train component. This topic describes how to configure the XGboost Predict component.

Limits

The XGboost Predict component can run on MaxCompute, Flink, or Deep Learning Containers (DLC) resources.

Data formats

Table and LibSVM formats are supported.

  • Sample table-formatted data:

    | f0  | f1 | label |
    | --- | -- | ----- |
    | 0.1 | 1  | 0     |
    | 0.9 | 2  | 1     |

  • Sample LibSVM-formatted data:

    1 2:1 9:1 10:1 20:1 29:1 33:1 35:1 39:1 40:1 52:1 57:1 64:1 68:1 76:1 85:1 87:1 91:1 94:1 101:1 104:1 116:1 123:1
    0 0:1 9:1 18:1 20:1 23:1 33:1 35:1 38:1 41:1 52:1 55:1 64:1 68:1 76:1 85:1 87:1 91:1 94:1 101:1 105:1 115:1 121:1
    1 2:1 8:1 18:1 20:1 29:1 33:1 35:1 39:1 41:1 52:1 57:1 64:1 68:1 76:1 85:1 87:1 91:1 94:1 101:1 104:1 116:1 123:1
    0 2:1 9:1 13:1 21:1 28:1 33:1 36:1 38:1 40:1 53:1 57:1 64:1 68:1 76:1 85:1 87:1 91:1 94:1 97:1 105:1 113:1 119:1
    0 0:1 9:1 18:1 20:1 22:1 33:1 35:1 38:1 44:1 52:1 55:1 64:1 68:1 76:1 85:1 87:1 91:1 94:1 101:1 104:1 115:1 121:1
    0 0:1 8:1 18:1 20:1 23:1 33:1 35:1 38:1 41:1 52:1 55:1 64:1 68:1 76:1 85:1 87:1 91:1 94:1 101:1 105:1 116:1 121:1
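To make the LibSVM layout concrete, the following minimal Python sketch parses one line into a label and a sparse index-to-value mapping. The helper name `parse_libsvm_line` is hypothetical and is not part of PAI or XGBoost. It assumes that each line has the form `label index:value ...`, as in the sample data above.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Split one LibSVM line into (label, {feature_index: value}).

    Hypothetical helper for illustration; not a PAI or XGBoost API.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty LibSVM line")
    # The first token is the label; the rest are index:value pairs.
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, _, value = token.partition(":")
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 2:1 9:1 10:1")
print(label, features)
```

Only the indexes that appear in a line are stored. Features that are absent from a line are simply missing from the mapping.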

Configure the component in the PAI console

You can configure the XGboost Predict component in Machine Learning Designer. The following table describes the parameters.

| Tab | Parameter | Data type | Description |
| --- | --- | --- | --- |
| Field Setting | reservedCols | String array | The reserved columns. |
| Field Setting | featureCols | String array | The table-formatted feature columns. The values of the featureCols and vectorCol parameters are mutually exclusive. The input data must be of the table type. |
| Field Setting | vectorCol | String | The LibSVM-formatted vector column. The values of the featureCols and vectorCol parameters are mutually exclusive. The input data must be of the LibSVM type. |
| Parameter Setting | Prediction Result Column | String | The prediction column in the output. |
| Execution Tuning | Number of Workers | Positive integer | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. Valid values: [1, 9999]. |
| Execution Tuning | Memory per worker, unit MB | Positive integer | The memory size of each worker. Unit: MB. Valid values: [1024, 64 × 1024]. |
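The constraints in the table can be checked before a job is submitted. The following Python sketch is illustrative only: the function `validate_predict_config`, its keyword names, and its default values are hypothetical and do not correspond to any PAI SDK call. It checks only what the table states, namely that featureCols and vectorCol are not set together and that the two Execution Tuning values fall within their documented ranges.

```python
from typing import List, Optional


def validate_predict_config(
    feature_cols: Optional[List[str]] = None,
    vector_col: Optional[str] = None,
    num_workers: int = 1,              # assumed default, for illustration only
    memory_per_worker_mb: int = 1024,  # assumed default, for illustration only
) -> None:
    """Raise ValueError if the settings violate the documented constraints."""
    # featureCols (table input) and vectorCol (LibSVM input) are mutually exclusive.
    if feature_cols and vector_col:
        raise ValueError("set either featureCols or vectorCol, not both")
    # Number of Workers: valid values are [1, 9999].
    if not 1 <= num_workers <= 9999:
        raise ValueError("Number of Workers must be in [1, 9999]")
    # Memory per worker: valid values are [1024, 64 * 1024] MB.
    if not 1024 <= memory_per_worker_mb <= 64 * 1024:
        raise ValueError("Memory per worker must be in [1024, 65536] MB")


# A table-formatted input with four workers and 8 GB of memory per worker.
validate_predict_config(
    feature_cols=["f0", "f1"], num_workers=4, memory_per_worker_mb=8192
)
```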

Procedure

This example uses a Higgs boson classification scenario to show how to use the XGboost Predict component in Machine Learning Designer. The pipeline is built from the preset template "Use XGBoost algorithm to identify the Higgs boson". For information about how to create a pipeline from this template, see Create a pipeline from a preset template.

The component outputs JSON strings that are serialized from the JSON objects produced by the open source XGBoost library. To evaluate the predictions, you must convert this output to a format that an evaluation component supports. In this example, you can add an SQL Script component downstream of the XGboost Predict component to convert the output to the format expected by the subsequent Binary classification Evaluation V2 component. The following sample code shows how to configure the SQL Script component to perform the conversion. For more information, see XGBoost Parameters.

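-- Enable the new get_json_object behavior.
set odps.sql.udf.getjsonobj.new=true;

-- Take pred[0] as the class-1 probability and build the detail JSON string.
SELECT *, CONCAT("{\"0\":", 1.0 - prob, ",\"1\":", prob, "}") AS detail
FROM (
    SELECT *, CAST(get_json_object(pred, '$[0]') AS DOUBLE) AS prob
    FROM ${t1}
) t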
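To check the conversion outside MaxCompute, the following Python sketch mirrors the SQL above on a single row. It assumes, as the SQL does, that `pred` is a JSON array whose first element is the probability of class 1. The function name `to_detail` is hypothetical.

```python
import json


def to_detail(pred: str) -> str:
    """Build the detail JSON string from one pred JSON string.

    Hypothetical helper that mirrors the SQL conversion for a single row.
    """
    # Same role as get_json_object(pred, '$[0]') cast to double.
    prob = float(json.loads(pred)[0])
    # Keys "0" and "1" are the class labels; values are their probabilities.
    return json.dumps({"0": 1.0 - prob, "1": prob}, separators=(",", ":"))


print(to_detail("[0.75]"))  # prints {"0":0.25,"1":0.75}
```

Values that are not exactly representable in binary floating point, such as 0.7, can produce a long decimal expansion for `1.0 - prob`. This sketch does not round the result.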

References

  • You can use the XGboost Predict component together with the XGboost Train component. For more information about how to configure the XGboost Train component, see XGboost Train.

  • Machine Learning Designer provides various preset algorithm components. You can select a component for data processing based on your business requirements. For more information, see Component reference: Overview of all components.