Platform for AI: Text summarization

Last Updated: Mar 05, 2026

The Text Summarization component uses an automatic summarization algorithm based on the TextRank model to extract key sentences from a document. This process generates a concise and coherent summary that accurately captures the main idea of the original document. This topic describes how to configure the Text Summarization component.

Limits

The supported computing engine is MaxCompute.

Usage notes

Add a Sentence Splitting component upstream to split the text into one sentence per row.
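To illustrate the input shape the component expects, the following is a minimal sketch that turns one document into one row per sentence. It is not the Sentence Splitting component's algorithm: the function name `split_into_rows` and the punctuation rule are assumptions made for illustration only.

```python
import re


def split_into_rows(doc_id, text):
    """Return (doc_id, sentence) rows, one row per sentence."""
    # Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace. Real splitters handle many more cases.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(doc_id, part) for part in parts if part]


rows = split_into_rows(1000897, "This poses a great risk. It has drawn concern.")
for row in rows:
    print(row)
```

Each row keeps the document ID, so sentences from the same document can later be grouped and ranked together.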

Component configuration

You can configure the component parameters in one of the following ways.

Method 1: Use the GUI

You can configure the component parameters on the Designer workflow page.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Column for document ID | Enter the name of the column that contains document IDs. |
| Fields Setting | Sentence column | Specify one column. |
| Parameters Setting | Number of key sentences to output | The default value is 3. |
| Parameters Setting | Sentence similarity calculation method | The method to calculate sentence similarity: lcs_sim, levenshtein_sim, ssk, or cosine. |
| Parameters Setting | Weight of matching string | This parameter is active when Sentence similarity calculation method is set to ssk. The default value is 0.5. |
| Parameters Setting | Length of substring | This parameter is active when Sentence similarity calculation method is set to ssk or cosine. The default value is 2. |
| Parameters Setting | Damping factor | The default value is 0.85. |
| Parameters Setting | Maximum iterations | The default value is 100. |
| Parameters Setting | Convergence coefficient | The default value is 0.000001. |
| Execution tuning | Number of cores | Automatically allocated. |
| Execution tuning | Memory per core | Automatically allocated. |
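The damping factor, maximum iterations, and convergence coefficient are the usual controls of a TextRank-style iteration. The following sketch shows the commonly published weighted TextRank update on a small, made-up similarity matrix. It is an illustration under that assumption, not the component's internal implementation; the function name `textrank` and the sample matrix are hypothetical.

```python
def textrank(sim, damping=0.85, max_iter=100, epsilon=1e-6):
    """Score sentences by iterating over a symmetric similarity matrix."""
    n = len(sim)
    # Total edge weight leaving each sentence, ignoring self-similarity.
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j != i and out_weight[j] > 0:
                    # Sentence j passes on its score in proportion to sim[j][i].
                    rank += sim[j][i] / out_weight[j] * scores[j]
            new_scores.append((1 - damping) + damping * rank)
        # Stop early once no score changes by epsilon or more.
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < epsilon:
            break
    return scores


# Hypothetical pairwise similarities for three sentences.
sim = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]
scores = textrank(sim)
print(scores)
```

In this formulation, a lower damping factor pulls all scores toward the same baseline, while a smaller convergence coefficient lets the loop run longer before it stops.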

Method 2: Use PAI commands

You can use PAI commands to configure the component parameters. To do this, use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name TextSummarization
    -project algo_public
    -DinputTableName="test_input"
    -DoutputTableName="test_output"
    -DdocIdCol="doc_id"
    -DsentenceCol="sentence"
    -DtopN=2
    -Dlifecycle=30;

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The input table name. | None |
| inputTablePartitions | No | The partitions in the input table to use for computation. | All partitions of the input table |
| outputTableName | Yes | The output table name. | None |
| docIdCol | Yes | The name of the column that contains document IDs. | None |
| sentenceCol | Yes | The sentence column. You can specify only one column. | None |
| topN | No | The number of key sentences to output. | 3 |
| similarityType | No | The method to calculate sentence similarity: lcs_sim, levenshtein_sim, ssk, or cosine. | lcs_sim |
| lambda | No | The weight of a matching string. This parameter takes effect only when `similarityType` is set to ssk. | 0.5 |
| k | No | The length of a substring. This parameter takes effect only when `similarityType` is set to ssk or cosine. | 2 |
| dampingFactor | No | The damping factor. | 0.85 |
| maxIter | No | The maximum number of iterations. | 100 |
| epsilon | No | The convergence coefficient. | 0.000001 |
| lifecycle | No | The lifecycle of the output table. | None |
| coreNum | No | The number of cores for computation. | Automatically allocated by the system |
| memSizePerCore | No | The memory required for each core. | Automatically allocated by the system |
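To give a feel for what the similarity options measure, the following sketch implements common textbook definitions of three of them. The exact formulas and normalization that the component uses are not stated in this topic, so treat these functions as assumptions. The ssk option (a string subsequence kernel, which is where `lambda` applies) is omitted for brevity.

```python
from collections import Counter
from math import sqrt


def lcs_sim(a, b):
    """Longest common subsequence length divided by the longer string's length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def levenshtein_sim(a, b):
    """One minus the edit distance divided by the longer string's length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ch in enumerate(a, start=1):
        cur = [i]
        for j, other in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ch != other)))
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b))


def cosine_sim(a, b, k=2):
    """Cosine similarity between counts of k-character substrings."""
    grams_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    grams_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    dot = sum(grams_a[g] * grams_b[g] for g in grams_a)
    norm_a = sqrt(sum(v * v for v in grams_a.values()))
    norm_b = sqrt(sum(v * v for v in grams_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(lcs_sim("abcd", "abxd"))
print(levenshtein_sim("kitten", "sitting"))
print(cosine_sim("night", "nacht", k=2))
```

In this sketch, the `k` argument of `cosine_sim` plays the role of the substring length: larger values compare longer character sequences and are stricter about word order.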

Example

  1. Prepare the input table `test_input`. The following table shows sample data.

    You can use the MaxCompute client to create a table and use Tunnel commands to upload data. For more information about how to install and configure the MaxCompute client, see Connect using the local client (odpscmd). For more information about Tunnel commands, see Tunnel commands.

    | doc_id | sentence |
    | --- | --- |
    | 1000897 | Since the COVID-19 outbreak, the consumption of wild animals has become a prominent issue. This poses a great risk to public health and has drawn widespread social concern. Public security, forestry, and market regulation departments across the country have launched special campaigns to combat the illegal hunting, trafficking, and consumption of wild animals, achieving notable success. While cracking down on these illegal activities, law enforcement found that a large consumer base, enormous poaching profits, and the difficulty and high cost of identification are key reasons the illegal wildlife trade continues to thrive. |

    Where:

    • doc_id: The document ID column.

    • sentence: The sentence column.

  2. Use the Sentence Splitting component to split the text in the `sentence` column into one sentence per row. The output table is named `test_output`. The following table shows the content. For more information, see Sentence Splitting.

    | doc_id | sentence |
    | --- | --- |
    | 1000897 | Since the COVID-19 outbreak, the consumption of wild animals has become a prominent issue. |
    | 1000897 | This poses a great risk to public health and has drawn widespread social concern. |
    | 1000897 | Public security, forestry, and market regulation departments across the country have launched special campaigns to combat the illegal hunting, trafficking, and consumption of wild animals, achieving notable success. |
    | 1000897 | While cracking down on these illegal activities, law enforcement found that a large consumer base, enormous poaching profits, and the difficulty and high cost of identification are key reasons the illegal wildlife trade continues to thrive. |

  3. Run the following PAI command to generate a text summary.

    You can use an SQL Script component or an ODPS SQL Node component to run the following PAI command.

    PAI -name TextSummarization
        -project algo_public
        -DinputTableName="test_output"
        -DoutputTableName="test_output1"
        -DdocIdCol="doc_id"
        -DsentenceCol="sentence"
        -DtopN=2
        -Dlifecycle=30;

    The output table has two columns: doc_id and abstract.

    | doc_id | abstract |
    | --- | --- |
    | 1000897 | Since the COVID-19 outbreak, the consumption of wild animals has become a prominent issue. Public security, forestry, and market regulation departments across the country have launched special campaigns to combat the illegal hunting, trafficking, and consumption of wild animals, achieving notable success. |
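The abstract in this example combines two of the four input sentences. The following sketch shows one plausible way to assemble such an abstract once each sentence has a score. The scores are made up for illustration, and joining the selected sentences in document order is an assumption based on the sample output, not documented behavior. The function name `build_abstract` is hypothetical.

```python
def build_abstract(sentences, scores, top_n=3):
    """Keep the top_n highest-scoring sentences and join them in document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_n])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


# Placeholder sentences with hypothetical scores.
sentences = ["First.", "Second.", "Third.", "Fourth."]
scores = [0.9, 0.4, 1.2, 0.7]
abstract = build_abstract(sentences, scores, top_n=2)
print(abstract)
```

Here `top_n` corresponds to the `topN` parameter: raising it lengthens the abstract by admitting lower-scoring sentences.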

References

  • The Sentence Splitting component preprocesses data by splitting a text segment into one sentence per row. For more information, see Sentence Splitting.

  • For more information about Designer, see Designer overview.