
Platform For AI:Keyword extraction

Last Updated:Mar 05, 2026

Keyword extraction is a natural language processing (NLP) technique that identifies and extracts the words in a text that are most relevant to its main topic. This technique often uses the TextRank algorithm. TextRank builds a word co-occurrence network and applies a PageRank-like iterative calculation to score the importance of each word. The words with the highest weights are selected as keywords. This method helps you understand and summarize large amounts of text.
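The TextRank idea described above can be sketched in a few lines of Python. This is a minimal, unweighted illustration and not the component's actual implementation: the function name `textrank_keywords`, its argument names, and the exact window semantics (a window of 2 links each word to the next word) are assumptions made for this example.

```python
from collections import defaultdict


def textrank_keywords(tokens, top_n=5, window_size=2, damping=0.85,
                      max_iter=100, epsilon=1e-6):
    """Rank tokens with a simplified, unweighted TextRank (illustrative only)."""
    # Build an undirected co-occurrence graph: two distinct words are linked
    # when they fall inside the same sliding window of `window_size` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window_size]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    words = sorted(set(tokens))
    if not words:
        return []
    n = len(words)
    scores = {w: 1.0 / n for w in words}

    # PageRank-style update: each word receives an equal share of the score
    # of every neighbor, damped by `damping`.
    for _ in range(max_iter):
        new_scores = {}
        for w in words:
            incoming = sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            new_scores[w] = (1.0 - damping) / n + damping * incoming
        delta = max(abs(new_scores[w] - scores[w]) for w in words)
        scores = new_scores
        if delta < epsilon:  # stop once the largest change is small enough
            break

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

In this sketch, a word that co-occurs with many different words accumulates score from all of them, which is why frequently connected terms rise to the top. If the document has fewer distinct words than `top_n`, all of them are returned.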

The common workflow is as follows:

  1. Prepare the source data.

  2. Tokenize the text.

  3. Filter the words.

  4. Extract keywords.
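Steps 2 and 3 of the workflow above can be sketched as follows. This is a rough illustration under stated assumptions: the regular-expression tokenizer, the `STOP_WORDS` set, and the `prepare_document` function are hypothetical stand-ins, and a real pipeline would typically rely on dedicated tokenization and stop-word filtering steps.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def prepare_document(text, stop_words=STOP_WORDS):
    """Tokenize, filter, and join raw text into space-separated words."""
    # Step 2: a naive tokenizer that keeps letters, digits, and inner hyphens,
    # which also discards punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    # Step 3: drop stop words (case-insensitive comparison).
    kept = [t for t in tokens if t.lower() not in stop_words]
    # The result is a single space-separated string, one per document.
    return " ".join(kept)


if __name__ == "__main__":
    print(prepare_document("The solver is a finite-element solver, written in Fortran."))
```

The space-separated string that this produces matches the shape of the tokenized content column that the keyword extraction step expects.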

Component configuration

Method 1: Using the GUI

On the Designer workflow page, you can add the Keyword Extraction component and configure its parameters in the pane on the right.

| Parameter type | Parameter | Description |
| --- | --- | --- |
| Fields setting | Document ID column | The name of the column that contains document IDs. |
| Fields setting | The word segmentation results for the article content. | The name of the column that contains the tokenized document content. |
| Parameters setting | Number of keywords to output | An integer. Default value: 5. |
| Parameters setting | Window size | An integer. Default value: 2. |
| Parameters setting | Damping coefficient | Default value: 0.85. |
| Parameters setting | Maximum iterations | Default value: 100. |
| Parameters setting | Convergence coefficient | Default value: 0.000001. |
| Execution tuning | Number of cores. Auto-assigned by default. | Selected by default. |
| Execution tuning | Memory per core. Auto-assigned by default. | Selected by default. |

Method 2: Using PAI commands

You can use PAI commands to configure the parameters for the Keyword Extraction component. You can use the SQL script component to call PAI commands. For more information, see SQL Script.

PAI -name KeywordsExtraction      
    -DinputTableName=maple_test_keywords_basic_input    
    -DdocIdCol=docid -DdocContent=word    
    -DoutputTableName=maple_test_keywords_basic_output    
    -DtopN=19;

| Parameter | Required | Default value | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | None | The name of the input table. |
| inputTablePartitions | No | All partitions | The partitions of the input table to use as input. Use the partition_name=value format. For multi-level partitions, use name1=value1/name2=value2. Separate multiple partitions with commas (,). |
| outputTableName | Yes | None | The name of the output table. |
| docIdCol | Yes | None | The name of the column that contains document IDs. You can specify only one column. |
| docContent | Yes | None | The name of the column that contains the tokenized words. You can specify only one column. |
| topN | No | 5 | The number of top keywords to return. If the total number of keywords is less than this value, all keywords are returned. |
| windowSize | No | 2 | The window size for the TextRank algorithm. |
| dumpingFactor | No | 0.85 | The damping coefficient for the TextRank algorithm. |
| maxIter | No | 100 | The maximum number of iterations for the TextRank algorithm. |
| epsilon | No | 0.000001 | The convergence residual threshold for the TextRank algorithm. |
| lifecycle | No | None | The lifecycle of the output table. |
| coreNum | No | Calculated automatically | The number of workers. |
| memSizePerCore | No | Calculated automatically | The memory size per worker, in MB. |
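To show how the parameters above combine on the command line, the following sketch assembles a command string in Python. The helper `build_pai_command` is hypothetical and exists only for this example, as are the table names and partition values used with it. Whether values such as partition specifications need quoting is not covered here, so treat the output as a starting point rather than a verified command.

```python
def build_pai_command(input_table, output_table, doc_id_col, doc_content_col,
                      partitions=None, **options):
    """Assemble a KeywordsExtraction command string (hypothetical helper)."""
    params = {
        "inputTableName": input_table,
        "outputTableName": output_table,
        "docIdCol": doc_id_col,
        "docContent": doc_content_col,
    }
    if partitions:
        # Each entry may be multi-level (name1=value1/name2=value2);
        # multiple partitions are joined with commas.
        params["inputTablePartitions"] = ",".join(partitions)
    # Optional parameters such as topN, windowSize, dumpingFactor,
    # maxIter, epsilon, lifecycle, coreNum, and memSizePerCore.
    params.update(options)
    args = " ".join(f"-D{name}={value}" for name, value in params.items())
    return f"PAI -name KeywordsExtraction {args};"


if __name__ == "__main__":
    # Table names and partition values below are placeholders.
    print(build_pai_command(
        "my_input_table", "my_output_table", "docid", "word",
        partitions=["ds=20240101/region=cn"],
        topN=10, windowSize=3,
    ))
```

Only the four required parameters are positional in this sketch. Everything else is passed by keyword, so omitted options fall back to the defaults listed in the table.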

Example

  1. Generate data

    In the input table, separate words with spaces. Filter out stop words, such as 'the' and 'a', and all punctuation marks.

    | docid:string | word:string |
    | --- | --- |
    | doc0 | blended-wing-body aircraft is future aviation field development a new direction many research institutions have started on blended-wing-body aircraft research and its fully-automatic shape optimization algorithm has become a new research hot-spot existing achievements basis on top of analyze compare common modeling solving platform usage methods and features design write blended-wing-body aircraft shape optimization geometric modeling grid division flow-field solving shape optimization module compare different algorithms between pros and cons implement blended-wing-body aircraft conceptual-design in shape optimization geometric modeling and grid generation module implement based-on transfinite interpolation grid generation algorithm based-on spline curve modeling method flow-field solving module includes finite difference solver finite element solver and panel method solver among them finite difference solver mainly includes based-on finite difference method potential-flow mathematical modeling based-on Cartesian grid variable step-size difference format derivation Cartesian grid generation index algorithm based-on Cartesian grid Neumann boundary-condition expression form derivation implement based-on finite difference solver two-dimensional airfoil aerodynamic parameters calculation example finite element solver mainly includes based-on variational principle potential-flow finite element theory modeling two-dimensional finite element Kutta condition expression derivation based-on least squares velocity solving algorithm design based-on Gmsh two-dimensional with-wake airfoil spatial grid generator development implement based-on finite element solver two-dimensional airfoil aerodynamic parameters calculation example panel method solver mainly includes based-on panel method potential-flow theory modeling automatic wake generation algorithm design based-on panel method three-dimensional blended-wing-body body flow-field solver development based-on Blasius flat-plate solution drag estimation algorithm design solver Fortran language on port Python and Fortran code mixed-compilation based-on OpenMP and CUDA parallel acceleration algorithm design and development implement based-on panel method solver three-dimensional blended-wing-body body aerodynamic parameters calculation example shape optimization module implemented based-on free form deformation grid deformation algorithm genetic-algorithm differential evolution algorithm aircraft surface-area calculation algorithm based-on moment integration aircraft volume calculation algorithm development based-on VTK data visualization format tool |

  2. PAI command

    PAI -name KeywordsExtraction      
        -DinputTableName=maple_test_keywords_basic_input    
        -DdocIdCol=docid -DdocContent=word    
        -DoutputTableName=maple_test_keywords_basic_output    
        -DtopN=19;

  3. Output description

    | docid | keywords | weight |
    | --- | --- | --- |
    | doc0 | based-on | 0.041306752223538405 |
    | doc0 | algorithm | 0.03089845626854151 |
    | doc0 | modeling | 0.021782865850562882 |
    | doc0 | grid | 0.020669749212693957 |
    | doc0 | solver | 0.020245609506360847 |
    | doc0 | aircraft | 0.019850761705313365 |
    | doc0 | research | 0.014193732541852615 |
    | doc0 | finite element | 0.013831122054200538 |
    | doc0 | solving | 0.012924593244133104 |
    | doc0 | module | 0.01280216562287212 |
    | doc0 | derivation | 0.011907588923852495 |
    | doc0 | shape | 0.011505456605632607 |
    | doc0 | difference | 0.011477831662367547 |
    | doc0 | potential-flow | 0.010969269350293957 |
    | doc0 | design | 0.010830986516637251 |
    | doc0 | implement | 0.010747536556701583 |
    | doc0 | two-dimensional | 0.010695570768457084 |
    | doc0 | development | 0.010527342662670088 |
    | doc0 | new | 0.010096978306668461 |
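The output table holds one row per document and keyword, so downstream code usually needs to regroup it. The sketch below assumes the rows have already been read into Python as (docid, keyword, weight) tuples. The function name `keywords_by_document` is hypothetical, and how the rows are fetched from the output table is outside the scope of this example.

```python
from collections import defaultdict


def keywords_by_document(rows):
    """Group (docid, keyword, weight) rows into ranked keyword lists."""
    grouped = defaultdict(list)
    for docid, keyword, weight in rows:
        # Weights may arrive as strings, so convert them before sorting.
        grouped[docid].append((keyword, float(weight)))
    return {
        docid: [kw for kw, _ in sorted(items, key=lambda kv: -kv[1])]
        for docid, items in grouped.items()
    }


if __name__ == "__main__":
    # The first three rows of the output above, deliberately out of order.
    sample = [
        ("doc0", "modeling", "0.021782865850562882"),
        ("doc0", "based-on", "0.041306752223538405"),
        ("doc0", "algorithm", "0.03089845626854151"),
    ]
    print(keywords_by_document(sample))
```

Sorting by weight inside the helper means the result does not depend on the order in which rows are read.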