Platform For AI: Split Word (Generate Model)

Last Updated: Mar 05, 2026

This topic introduces the Split Word (Generate Model) algorithm component provided by Designer.

The Split Word (Generate Model) algorithm component is based on the Alibaba Word Segmenter (AliWS) lexical analysis system. It generates a word segmentation model based on parameters and a custom dictionary.

The Split Word (Generate Model) component supports Chinese word segmentation for the Taobao and Internet domains.

Differences from Split Word:

  • The Split Word component directly segments the input text.

  • The Split Word (Generate Model) component generates a word segmentation model. To segment text, you must first deploy the model, and then make predictions or call the online API.

Component configuration

You can configure the Split Word (Generate Model) component in one of the following ways.

Method 1: Use the GUI

You can configure the component parameters on the Designer workflow page.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Selected Field Column | The field column used to generate the model. |
| Parameters Setting | Recognized Options | The content types to detect. Valid values: Detect simple entities, Detect person names, Detect organization names, Detect phone numbers, Detect time, Detect dates, and Detect numbers and letters. Default: Detect simple entities, Detect phone numbers, Detect time, Detect dates, and Detect numbers and letters are selected. |
| Parameters Setting | Merge Options | The content types to merge. Valid values: Merge Chinese numerals, Merge Arabic numerals, Merge Chinese dates, and Merge Chinese time. Default: Merge Arabic numerals is selected. |
| Parameters Setting | Tokenizer | The tokenizer type. Valid values: TAOBAO_CHN and INTERNET_CHN. Default: TAOBAO_CHN. |
| Parameters Setting | Pos Tagger | Specifies whether to perform part-of-speech tagging. By default, this feature is disabled. |
| Parameters Setting | Semantic Tagger | Specifies whether to perform semantic tagging. By default, this feature is disabled. |
| Parameters Setting | Filter out words that contain only numbers | Specifies whether to filter out segmented words that consist only of numbers. By default, this feature is disabled. |
| Parameters Setting | Filter out words that contain only English letters | Specifies whether to filter out segmented words that consist only of English letters. By default, this feature is disabled. |
| Parameters Setting | Filter out words that contain only punctuation marks | Specifies whether to filter out segmented words that consist only of punctuation marks. By default, this feature is disabled. |
| Execution Tuning | Number of cores | The number of cores. By default, the system allocates this value automatically. |
| Execution Tuning | Memory per core | The memory size of each core. By default, the system allocates this value automatically. |
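
The three filter options can be pictured as a post-processing pass over the segmented words. The following is a minimal sketch, assuming that each filter drops words made up entirely of digits, ASCII letters, or punctuation marks. The class `TokenFilterSketch` and its `filter` method are hypothetical and are not part of AliWS or Designer; the actual matching rules of the component may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenFilterSketch {
    // Hypothetical helper: keeps only the words that no enabled filter matches.
    public static List<String> filter(List<String> words, boolean dropNumbers,
                                      boolean dropEnglish, boolean dropPunctuation) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            // Assumption: "only numbers" means every character is an ASCII digit.
            if (dropNumbers && word.matches("\\d+")) continue;
            // Assumption: "only English letters" means every character is A-Z or a-z.
            if (dropEnglish && word.matches("[A-Za-z]+")) continue;
            // Assumption: "only punctuation" means every character is in Unicode category P.
            if (dropPunctuation && word.matches("\\p{P}+")) continue;
            kept.add(word);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("big", "data", "2026", ",", "v2");
        // Number and punctuation filters enabled, English filter disabled.
        System.out.println(filter(words, true, false, true));
    }
}
```

A mixed word such as `v2` passes every filter in this sketch, because each pattern must match the whole word.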

Method 2: Use PAI commands

You can run PAI commands in the SQL Script component to configure the component. For more information, see SQL Script.

pai -name split_word_model
    -project algo_public
    -DoutputModelName=aliws_model
    -DcolName=content
    -Dtokenizer=TAOBAO_CHN
    -DenableDfa=true
    -DenablePersonNameTagger=false
    -DenableOrgnizationTagger=false
    -DenablePosTagger=false
    -DenableTelephoneRetrievalUnit=true
    -DenableTimeRetrievalUnit=true
    -DenableDateRetrievalUnit=true
    -DenableNumberLetterRetrievalUnit=true
    -DenableChnNumMerge=false
    -DenableNumMerge=true
    -DenableChnTimeMerge=false
    -DenableChnDateMerge=false
    -DenableSemanticTagger=true
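
If the same command has to be generated for several models or tokenizers, it can be assembled programmatically. The following is a minimal sketch, assuming that the command is submitted as plain text ending in a semicolon. `PaiCommandSketch` and `build` are hypothetical helper names, and the sketch uses only parameter names that appear in this topic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PaiCommandSketch {
    // Hypothetical helper: renders a split_word_model command as a single line.
    public static String build(String outputModelName, Map<String, String> options) {
        StringBuilder sb = new StringBuilder("pai -name split_word_model -project algo_public");
        // outputModelName is the only required parameter, so it is always emitted.
        sb.append(" -DoutputModelName=").append(outputModelName);
        // Optional parameters are emitted in the iteration order of the map.
        for (Map.Entry<String, String> option : options.entrySet()) {
            sb.append(" -D").append(option.getKey()).append('=').append(option.getValue());
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("colName", "content");
        options.put("tokenizer", "INTERNET_CHN");
        options.put("enablePosTagger", "true");
        System.out.println(build("aliws_model", options));
    }
}
```

Parameters that are left out of the map are not emitted, so the component falls back to its default values for them.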

| Parameter Name | Required | Description | Default Value |
| --- | --- | --- | --- |
| userDictTableName | No | The name of the custom dictionary table to use, if any. A custom dictionary table has only one column, and each row contains one word. | None |
| outputModelName | Yes | The name of the output model. | None |
| colName | No | The name of the column that contains the text for prediction. | context |
| dictTableName | No | The name of the custom dictionary table to use, if any. A custom dictionary table has only one column, and each row contains one word. | None |
| tokenizer | No | The tokenizer type. Valid values: TAOBAO_CHN and INTERNET_CHN. | TAOBAO_CHN |
| enableDfa | No | Specifies whether to detect simple entities. Valid values: True and False. | True |
| enablePersonNameTagger | No | Specifies whether to detect person names. Valid values: True and False. | False |
| enableOrgnizationTagger | No | Specifies whether to detect organization names. Valid values: True and False. | False |
| enablePosTagger | No | Specifies whether to perform part-of-speech tagging. Valid values: True and False. | False |
| enableTelephoneRetrievalUnit | No | Specifies whether to detect phone numbers. Valid values: True and False. | True |
| enableTimeRetrievalUnit | No | Specifies whether to detect time. Valid values: True and False. | True |
| enableDateRetrievalUnit | No | Specifies whether to detect dates. Valid values: True and False. | True |
| enableNumberLetterRetrievalUnit | No | Specifies whether to detect numbers and letters. Valid values: True and False. | True |
| enableChnNumMerge | No | Specifies whether to merge Chinese numerals into a retrieval unit. Valid values: True and False. | False |
| enableNumMerge | No | Specifies whether to merge Arabic numerals into a retrieval unit. Valid values: True and False. | True |
| enableChnTimeMerge | No | Specifies whether to merge Chinese time expressions into a semantic unit. Valid values: True and False. | False |
| enableChnDateMerge | No | Specifies whether to merge Chinese date expressions into a semantic unit. Valid values: True and False. | False |
| enableSemanticTagger | No | Specifies whether to perform semantic tagging. Valid values: True and False. | False |
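
Because most parameters accept only a small set of values, a command can be checked before it is submitted. The following is a minimal sketch, assuming that Boolean values are accepted regardless of letter case; the component's own validation is not documented here and may be stricter. `SplitWordParamCheck` and `validate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitWordParamCheck {
    private static final Set<String> TOKENIZERS = Set.of("TAOBAO_CHN", "INTERNET_CHN");

    // Hypothetical helper: returns a list of problems; an empty list means no issue was found.
    public static List<String> validate(Map<String, String> params) {
        List<String> problems = new ArrayList<>();
        // outputModelName is the only parameter marked as required.
        if (!params.containsKey("outputModelName")) {
            problems.add("outputModelName is required");
        }
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            String value = entry.getValue();
            if (name.equals("tokenizer") && !TOKENIZERS.contains(value)) {
                problems.add("tokenizer must be TAOBAO_CHN or INTERNET_CHN: " + value);
            } else if (name.startsWith("enable")
                    && !value.equalsIgnoreCase("true") && !value.equalsIgnoreCase("false")) {
                // Assumption: all enable* switches are Boolean, as listed in the table above.
                problems.add(name + " must be true or false: " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(Map.of("tokenizer", "OTHER", "enableDfa", "yes")));
    }
}
```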

Examples

  • PAI command

    pai -name split_word_model
        -project algo_public
        -DoutputModelName=aliws_model
  • Deployment

    create onlinemodel ning_test_aliws_model_2 -offlinemodelName ning_test_aliws_model -instanceNum 1 -cpu 100 -memory 4096;
  • Online word segmentation

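    // Assumes predictClient, project_name, model_name, and col_name are already defined.
    KVJsonRequest request = new KVJsonRequest();
    // Add one input row and set the text column to segment.
    Map<String, JsonFeatureValue> row = request.addRow();
    row.put(col_name, new JsonFeatureValue("The big data algorithm platform is a new platform"));
    // Send a synchronous prediction request to the deployed model.
    KVJsonResponse res = predictClient.syncPredict(new JsonPredictRequest(project_name, model_name, request));
    // Print the output label returned for each row.
    List<ResponseItem> ri = res.getOutputs();
    for (ResponseItem item : ri) {
        System.out.println(item.getOutputLabel());
    }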
  • Offline word segmentation

    pai -name prediction
        -DmodelName=ning_test_aliws_model
        -DinputTableName=ning_test_aliws
        -DoutputTableName=ning_test_aliws_offline_predict;