Platform For AI: Stop Word Filter

Last Updated: Mar 05, 2026

This topic describes the Stop Word Filter component in Designer.

The Stop Word Filter component is a pre-processing method in text analytics. It filters noise, such as "of", "is", or "a", from tokenization results.

The Stop Word Filter component takes two inputs: an input table and a stop word table. The input table contains the text to filter. The stop word table is a single-column table where each row is a stop word.
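The filtering described above can be illustrated with a minimal Python sketch. It is not the component's implementation. It assumes that the column to filter holds space-separated tokens and that matching is exact and case-sensitive, neither of which this topic specifies. The function name `filter_stop_words` and the sample rows are hypothetical.

```python
def filter_stop_words(segmented_text, stop_words, sep=" "):
    """Drop every token found in stop_words and keep the remaining order."""
    # Assumption: tokens are separated by `sep` and compared exactly.
    kept = [tok for tok in segmented_text.split(sep) if tok and tok not in stop_words]
    return sep.join(kept)


# Stop word table: a single column where each row is one stop word.
stop_word_rows = ["of", "is", "a"]
stop_words = set(stop_word_rows)

# Input table: each row carries a tokenized text column (hypothetical data).
input_rows = [
    {"id": 1, "words_seg1": "this is a collection of documents"},
    {"id": 2, "words_seg1": "stop word filtering is a pre-processing step"},
]

# Output table: same rows, with stop words removed from the selected column.
output_rows = [
    {**row, "words_seg1": filter_stop_words(row["words_seg1"], stop_words)}
    for row in input_rows
]

for row in output_rows:
    print(row["id"], row["words_seg1"])
```

Loading the stop word rows into a set keeps each membership check constant-time, which matters when the stop word table is large.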

You can configure the Stop Word Filter component in Designer using the GUI or PAI commands.

Component configuration

You can configure the Stop Word Filter component in one of the following ways.

Method 1: Use the GUI

You can configure the component parameters on the workflow page in Designer.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Column to Filter | The column to filter. Separate multiple columns with commas (,). |
| Execution Tuning | Number of cores | Automatically allocated by the system. |
| Execution Tuning | Memory size | Automatically allocated by the system. |

Method 2: Use a PAI command

You can use a PAI command to configure the component parameters. You can run PAI commands using the SQL Script component. For more information, see SQL Script.

PAI -name FilterNoise -project algo_public \
    -DinputTableName="test_input" -DnoiseTableName="noise_input" \
    -DoutputTableName="test_output" \
    -DselectedColNames="words_seg1,words_seg2" \
    -Dlifecycle=30

| Parameter name | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input tokenization table. | None |
| inputTablePartitions | No | The partitions of the input tokenization table to use. | All partitions |
| noiseTableName | Yes | The name of the stop word table. | None |
| noiseTablePartitions | No | The partitions of the stop word table to use. | All partitions |
| outputTableName | Yes | The name of the output table. | None |
| selectedColNames | Yes | The columns to filter. Separate multiple columns with commas (,). | None |
| lifecycle | No | The lifecycle of the output table. The value must be a positive integer. | None |
| coreNum | No | The number of cores for the computation. | Automatically allocated by the system. |
| memSizePerCore | No | The memory size for each core. | Automatically allocated by the system. |
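When the command is generated from a script, the required and optional parameters listed above can be checked before the command is submitted. The following Python sketch is one way to do that. The helper `build_filter_noise_command` is hypothetical and not part of PAI. It only assembles the command text, using the parameter names from this topic and straight double quotes around string values, and it does not run anything.

```python
REQUIRED = ("inputTableName", "noiseTableName", "outputTableName", "selectedColNames")
OPTIONAL = ("inputTablePartitions", "noiseTablePartitions", "lifecycle",
            "coreNum", "memSizePerCore")


def build_filter_noise_command(**params):
    """Assemble a FilterNoise PAI command string (hypothetical helper)."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    unknown = sorted(set(params) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))

    # lifecycle must be a positive integer when it is given.
    lifecycle = params.get("lifecycle")
    if lifecycle is not None:
        if isinstance(lifecycle, bool) or not isinstance(lifecycle, int) or lifecycle <= 0:
            raise ValueError("lifecycle must be a positive integer")

    parts = ["PAI -name FilterNoise -project algo_public"]
    for name in REQUIRED + OPTIONAL:
        value = params.get(name)
        if value is None:
            continue
        # Integers are written bare; everything else is wrapped in straight quotes.
        rendered = str(value) if isinstance(value, int) else '"{}"'.format(value)
        parts.append("-D{}={}".format(name, rendered))
    return " ".join(parts)


command = build_filter_noise_command(
    inputTableName="test_input",
    noiseTableName="noise_input",
    outputTableName="test_output",
    selectedColNames="words_seg1,words_seg2",
    lifecycle=30,
)
print(command)
```

The resulting string can then be pasted into the SQL Script component. Rejecting unknown parameter names early catches typos such as a misspelled `noiseTableName` before the job is submitted.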