Platform for AI: LLM-Length Filter (DLC)

Last Updated: Apr 27, 2025

The LLM-Length Filter (DLC) component of Platform for AI (PAI) filters texts based on three measures: the text length, the average line length of the text, and the maximum line length of the text. The input data file, stored in Object Storage Service (OSS), must be in the JSON Lines format: each line in the file is a valid JSON object, and the file consists of multiple such lines, so the file as a whole is not a single valid JSON object. For more information, see Example.
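The JSON Lines requirement can be checked locally before you upload a file to OSS. The following is a minimal sketch, not the component's own loader: the helper name `read_jsonl` and the field name `text` are hypothetical, and skipping blank lines is an assumption.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts.

    Hypothetical helper for illustration; not part of PAI.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are tolerated and skipped
        obj = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        records.append(obj)
    return records


# Two samples, one JSON object per line. "text" is an illustrative field name.
sample_lines = [
    '{"text": "first sample"}',
    '{"text": "second sample\\nwith two lines"}',
]
records = read_jsonl(sample_lines)
```

To check a real file, pass an open file handle, for example `read_jsonl(open("data.jsonl", encoding="utf-8"))`. Note that a line break inside a text value must be escaped as `\n` within the JSON string, so that each record still occupies exactly one physical line.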

Supported computing resources

Deep Learning Containers (DLC)

Configure the component

On the pipeline page of Machine Learning Designer, configure the parameters of the LLM-Length Filter (DLC) component.

| Tab | Parameter | Required | Description | Default value |
| --- | --- | --- | --- | --- |
| Fields Setting | Target Process Field | Yes | The name of the field that you want to process. | N/A |
| Fields Setting | Whether to Filter with Text Length | No | Specifies whether to filter texts based on the text length. If you select this option, you must configure the following parameters: Minimum Length (texts with lengths less than this value are filtered out) and Maximum Length (texts with lengths greater than this value are filtered out). | Unselected |
| Fields Setting | Whether to Filter with the Average Length of the Sample | No | The algorithm splits the text based on line breaks, calculates the average line length of the text, and then filters the text based on the average line length. If you select this option, you must configure the following parameters: Minimum average length (texts with average line lengths less than this value are filtered out) and Maximum Average Length (texts with average line lengths greater than this value are filtered out). | Unselected |
| Fields Setting | Whether to Filter with the Longest Line Length of the Sample | No | The algorithm splits the text based on line breaks, calculates the maximum line length of the text, and then filters the text based on the maximum line length. If you select this option, you must configure the following parameters: Minimum length of the Longest Line (texts with maximum line lengths less than this value are filtered out) and Maximum length of the Longest Line (texts with maximum line lengths greater than this value are filtered out). | Unselected |
| Fields Setting | OSS Directory for Saving OutputData | No | The OSS directory in which the generated data is stored. If you do not specify this parameter, the default path of the workspace is used. | N/A |
| Tuning | Number of Processes | No | The number of processes. | 8 |
| Tuning | Select Resource Group: Public Resource Group | No | The instance type (CPU or GPU), number of instances, and virtual private cloud (VPC) that you want to use. | N/A |
| Tuning | Select Resource Group: Dedicated resource group | No | The number of vCPUs, memory, shared memory, number of GPUs, and number of instances that you want to use. | N/A |
| Tuning | Maximum Running Duration | No | The maximum period of time for which the component can run. If this period of time is exceeded, the job is terminated. | N/A |
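The three filters described above can be sketched in a few lines of Python. This is an illustration of the documented rules under stated assumptions, not the component's implementation: the function name `passes_length_filters` and its keyword arguments are hypothetical, length is assumed to be counted in characters, lines are assumed to be split on `\n`, and empty lines are assumed to count toward the average.

```python
def passes_length_filters(
    text,
    min_len=None, max_len=None,            # text length bounds
    min_avg=None, max_avg=None,            # average line length bounds
    min_longest=None, max_longest=None,    # longest line length bounds
):
    """Return True if the text survives every enabled filter.

    Hypothetical sketch. A bound of None means that check is disabled,
    which mirrors leaving the corresponding option unselected.
    """
    # Assumption: lines are separated by "\n" and measured in characters.
    line_lengths = [len(line) for line in text.split("\n")]
    metrics = [
        (len(text), min_len, max_len),
        (sum(line_lengths) / len(line_lengths), min_avg, max_avg),
        (max(line_lengths), min_longest, max_longest),
    ]
    for value, lower, upper in metrics:
        if lower is not None and value < lower:
            return False  # less than the minimum: filtered out
        if upper is not None and value > upper:
            return False  # greater than the maximum: filtered out
    return True


# "text" stands in for the field named in Target Process Field.
records = [
    {"text": "short"},
    {"text": "a reasonably long line\nand another"},
]
kept = [r for r in records if passes_length_filters(r["text"], min_len=10)]
```

Because the documented rules filter out values strictly less than the minimum or strictly greater than the maximum, a value equal to either bound is kept in this sketch.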