Test trained text summarization models and evaluate inference performance based on prediction results.
Prerequisites
OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant permissions.
Limits
Only DLC compute resources are supported.
Configure component parameters in the UI
In Designer, configure component parameters in the UI.
-
Input
Input port (from left to right)
Type
Recommended upstream component
Required
Prediction data
OSS
Yes
Prediction model
Component output
No
-
Component configuration
Tab
Parameter
Description
Field Settings
Input data format
Text columns of the input file. The default value is target:str:1,source:str:1.
Source text column
Column name for source text in the input table. The default value is source.
Appended output columns
Specified text columns from the input file appended to output text columns. Separate multiple column names with a comma (,). The default value is source.
Output columns
Column names for the sink table. The default value is predictions,beams.
Prediction data output
OSS Bucket path where the prediction result file is stored.
Use custom model
Whether to use the default PAI model for direct prediction. Valid values:
-
Yes
-
No (default)
Is Megatron model
Only pre-trained models with the `mg` prefix listed in the Text Summarization Train component are supported. Valid values:
-
Yes
-
No (default)
Model path
Required only when Use custom model is set to Yes.
Storage path of the custom model in an OSS Bucket.
Parameter Settings
Batch size
Batch processing size during training. This is an INT type. The default value is 8.
For multi-GPU servers, this parameter specifies the batch size for each GPU.
Maximum text length
Maximum length of the entire sequence. This is an INT type. The value must be in the range (1, 512). The default value is 512.
Language
Language for text processing:
-
zh: Chinese.
-
en: English.
Copy text from source
Whether to use a copy mechanism. Valid values:
-
false (default)
-
true
Minimum decoder length
Minimum length of the decoder. This is an INT type. The default value is 12. The model output length must be greater than this value.
Maximum decoder length
Maximum length of the decoder. This is an INT type. The default value is 32. The model output length must be less than this value.
Minimal Unique Field
Size of the non-repeating segment (n-gram). This is an INT type. The default value is 2.
Beam search size
Size for beam search. This is an INT type. The default value is 5.
Number of returned candidates
Number of results to return. This is an INT type. The default value is 5.
ImportantThis parameter must be the same as Beam search size.
Execution Tuning
GPU type
GPU type of the compute resource. The default value is gn5-c8g1.2xlarge.
-
Example
Build a workflow using the Text Summarization Predict component. Two methods are available:
-
Method 1: Use a model fine-tuned by the Text Summarization Train component.

-
Method 2: Use a custom model.

Configure the component and run the workflow:
-
Build a workflow. For more information, see the Example section of the Text Summarization Train topic.
-
Prepare data to summarize (predict_data.txt) and upload it to an OSS bucket. The test data in this example is a tab-delimited TXT file.
CSV files are also supported. Use the Tunnel command of the MaxCompute client to upload the dataset to MaxCompute. For more information about how to install and configure the MaxCompute client, see Connect to MaxCompute using the client (odpscmd). For more information about the Tunnel command, see Tunnel commands.
-
Read the test dataset using the Read OSS Data-3 component in Method 1 or the Read OSS Data-1 component in Method 2. Set the OSS Data Path parameter to the OSS path where the test dataset is stored.
-
Connect the model file and test dataset to the Text Summarization Predict component and configure its parameters. For more information, see Configure component parameters in the UI.
-
For models fine-tuned by the Text Summarization Train component, connect the model output port of the Text Summarization Train component to the model input port of the Text Summarization Predict component.
-
For custom models, on the Field Settings tab, set the Use custom model parameter to Yes and set the Model path parameter to the OSS path where the model is stored.
-
-
Click the
button to run the workflow. After successful execution, view the output summary in the OSS path specified for the Prediction data output parameter of the Text Summarization Predict component.
References
-
For more information about how to configure the Text Summarization Train component, see Text Summarization Train.