
Platform For AI: LLM - Special Content Removal (DLC)

Last Updated: Jun 20, 2026

The LLM - Special Content Removal (DLC) component removes URL links and HTML tags from text and parses the remaining HTML into plain text. The input data file in OSS must be in JSONL format: each line is a standalone JSON object, but the file as a whole is not a single JSON document.
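As a minimal illustration of the JSONL requirement, the following Python sketch parses each line separately with the standard `json` module and shows that parsing the whole file as one document fails. The sample records and the `text` field name are hypothetical.

```python
import io
import json

# Hypothetical two-record JSONL input; "text" is an assumed field name.
sample = io.StringIO(
    '{"id": 1, "text": "See http://example.com for details."}\n'
    '{"id": 2, "text": "<ol><li>one</li></ol>"}\n'
)

def read_jsonl(fp):
    """Parse each non-empty line as its own JSON object."""
    records = []
    for line in fp:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline
            records.append(json.loads(line))
    return records

def parses_as_single_document(text):
    """Return True only if the whole text is one valid JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

records = read_jsonl(sample)
print(len(records), records[0]["text"])
# Two JSON objects on separate lines are not one JSON document.
print(parses_as_single_document(sample.getvalue()))
```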

Supported compute resources

DLC

Algorithm

The LLM - Special Content Removal (DLC) component supports the following features:

  • Remove URL links

    Removes every substring of the text that matches the regular expression r'(https?|http)?:\/\/[\w\.\/\?\=\&\%\-\_]+'.

  • Remove HTML tags and parse HTML text

    Replaces each '<li>' and '<ol>' tag with '\n*' and deletes each '</li>' and '</ol>' tag. The component then parses the result as HTML and returns only its text content.
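The two operations can be approximated in Python as follows. This is a sketch, not the component's implementation: it assumes the regular expression is applied with `re.sub`, and it uses the standard-library `html.parser` as a stand-in because this page does not name the HTML parser the component uses. The function names `remove_links` and `clean_html` are hypothetical.

```python
import re
from html.parser import HTMLParser

# URL pattern as documented for the component.
URL_PATTERN = re.compile(r'(https?|http)?:\/\/[\w\.\/\?\=\&\%\-\_]+')

def remove_links(text: str) -> str:
    """Delete every substring that matches URL_PATTERN."""
    return URL_PATTERN.sub('', text)

class _TextExtractor(HTMLParser):
    """Stand-in parser: collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def clean_html(raw_html: str) -> str:
    # List tags become "\n*" bullets; their closing tags are dropped.
    raw_html = raw_html.replace('<li>', '\n*').replace('</li>', '')
    raw_html = raw_html.replace('<ol>', '\n*').replace('</ol>', '')
    # Parse what is left and keep only the text content.
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.parts)

print(repr(remove_links('Docs: https://example.com/a?b=1 end')))
print(repr(clean_html('<p>Steps:</p><ol><li>Load</li><li>Clean</li></ol>')))
```

In this pattern the scheme is optional and the alternation `(https?|http)` is redundant, because `https?` already matches `http`, so any `://` followed by URL characters is removed. Whitespace around a removed URL is kept, which is why the first call leaves two spaces. Because '<ol>' is also replaced with '\n*', an ordered list starts with an empty bullet.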

The following example shows the effect of removing URL links from a field value:

Before

Before processing, the field value is minified AngularJS v1.3.0-beta.2 source code that contains the URL http://angularjs.org. A snippet of the field value:

/*
 AngularJS v1.3.0-beta.2
 (c) 2010-2014 Google, Inc. http://angularjs.org
 License: MIT
*/
(function(H,a,A){'use strict';function D(p,g){g=g||
{};a.forEach(g,function(a,c){delete g[c]});for(var c in
p)!p.hasOwnProperty(c)||"$"===c.charAt(0)&&"$"===c.charAt(1)|| // ...

After

After processing, the field value contains the same minified AngularJS v1.3.0-beta.2 code, including the copyright comment ((c) 2010-2014 Google, Inc., License: MIT) and the function definitions, but the URL http://angularjs.org has been removed.
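The URL step in this example can be reproduced on the header comment of the snippet with Python's `re.sub`. This sketch assumes the component applies the documented pattern as-is.

```python
import re

# Documented URL pattern of the component.
URL_PATTERN = re.compile(r'(https?|http)?:\/\/[\w\.\/\?\=\&\%\-\_]+')

# Header comment from the "Before" snippet.
before = (
    "/*\n"
    " AngularJS v1.3.0-beta.2\n"
    " (c) 2010-2014 Google, Inc. http://angularjs.org\n"
    " License: MIT\n"
    "*/\n"
)

after = URL_PATTERN.sub('', before)
print(after)
```

Only `http://angularjs.org` matches. The comment delimiters and the version string contain no `://`, so they are kept, and the space before the removed URL remains at the end of the copyright line.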

Configure the component

In the Designer workflow, add the LLM - Special Content Removal (DLC) component and configure its parameters in the right-side pane.

| Parameter type | Parameter | Required | Description | Default |
|---|---|---|---|---|
| Field settings | Target processing field | Yes | The name of the field to process. | None |
| Field settings | Remove URL links | No | Whether to remove URL links from the text. | Selected |
| Field settings | Remove HTML tags and parse HTML text | No | Whether to remove HTML tags and parse the resulting text. | Not selected |
| Field settings | Data output OSS directory | No | The OSS directory that stores the processed data. If this parameter is left empty, the component uses the default workspace path. | None |
| Execution tuning | Number of processes | No | The number of processes to use for the job. | 8 |
| Execution tuning | Select resource group: public resource group | No | Select the instance specification (CPU or GPU), the number of nodes, and the Virtual Private Cloud (VPC). | None |
| Execution tuning | Select resource group: dedicated resource group | No | Select the number of CPU cores, memory, shared memory, number of GPUs, and number of nodes. | None |
| Execution tuning | Maximum runtime | No | The maximum runtime of the component. If this time is exceeded, the system terminates the job. | None |
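To tie the parameters together, the following sketch models a job that cleans the target field of each JSONL line. `RemovalConfig`, `process_line`, and `process_lines` are hypothetical names, and the defaults mirror the parameters described for this component. Running URL removal before HTML cleanup is an assumption. `num_processes` is mapped onto a `multiprocessing.Pool` only as an analogy for the component's process count. Output to OSS and resource-group selection are not modeled.

```python
import json
import re
from dataclasses import dataclass
from functools import partial
from html.parser import HTMLParser
from multiprocessing import Pool
from typing import List, Optional

URL_PATTERN = re.compile(r'(https?|http)?:\/\/[\w\.\/\?\=\&\%\-\_]+')

@dataclass
class RemovalConfig:
    """Hypothetical mirror of the component parameters and their defaults."""
    target_field: str                 # required, no default
    remove_links: bool = True         # "Remove URL links": selected
    clean_html: bool = False          # "Remove HTML tags and parse HTML text": not selected
    output_dir: Optional[str] = None  # empty means the default workspace path (unused here)
    num_processes: int = 8

class _TextExtractor(HTMLParser):
    """Stand-in parser: collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def _strip_html(raw: str) -> str:
    raw = raw.replace('<li>', '\n*').replace('</li>', '')
    raw = raw.replace('<ol>', '\n*').replace('</ol>', '')
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return ''.join(parser.parts)

def process_line(line: str, cfg: RemovalConfig) -> str:
    """Clean the target field of one JSONL record and re-serialize it."""
    record = json.loads(line)
    text = record[cfg.target_field]
    if cfg.remove_links:  # assumed to run first
        text = URL_PATTERN.sub('', text)
    if cfg.clean_html:
        text = _strip_html(text)
    record[cfg.target_field] = text
    return json.dumps(record, ensure_ascii=False)

def process_lines(lines: List[str], cfg: RemovalConfig) -> List[str]:
    """Clean all non-empty lines, in parallel when num_processes > 1."""
    lines = [ln for ln in lines if ln.strip()]
    worker = partial(process_line, cfg=cfg)
    if cfg.num_processes > 1:
        with Pool(cfg.num_processes) as pool:
            return pool.map(worker, lines)
    return [worker(ln) for ln in lines]

sample = ['{"id": 1, "text": "<ol><li>Visit https://example.com</li></ol>"}']
cfg = RemovalConfig(target_field="text", clean_html=True, num_processes=1)
print(process_lines(sample, cfg))
```

The demo passes `num_processes=1` so that it runs in a single process. With a value above 1, call `process_lines` from a script guarded by `if __name__ == "__main__":`, as `multiprocessing` requires on platforms that spawn worker processes.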