The LLM-Remove LaTeX Bibliography (DLC) component processes TeX documents by removing the bibliography section from the end of the text. The input must be an OSS file in JSONL format, in which each line is a self-contained, valid JSON object. Because the objects are simply stacked one per line, the file as a whole is not a single valid JSON document.
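To make the JSONL requirement concrete, the following minimal sketch parses such input one line at a time. The field name `text` and the helper `iter_jsonl` are hypothetical and chosen only for illustration; they are not part of the component.

```python
import json


def iter_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Two records, one JSON object per line (the field name "text" is an assumption).
raw = '{"text": "first document"}\n{"text": "second document"}\n'

records = list(iter_jsonl(raw.splitlines()))

# Parsing the whole file at once fails, because two stacked objects
# do not form a single JSON document.
try:
    json.loads(raw)
    whole_file_is_json = True
except json.JSONDecodeError:
    whole_file_is_json = False
```

Each line parses on its own, while the file as a whole is rejected by a standard JSON parser.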
Supported computing resources
Algorithm
This component identifies the bibliography in a LaTeX text using the regular expression r'(\\appendix|\\begin\{references\}|\\begin\{REFERENCES\}|\\begin\{thebibliography\}|\\bibliography\{.*\}).*$'. In this expression, multiple match patterns are separated by a vertical bar (|).
The component removes all strings that match the regular expression. The following example shows the results before and after processing.
| Stage | Result |
| --- | --- |
| Before | The Current field value pop-up displays a LaTeX code snippet from the end of the |
| After | The Current field value pop-up shows the result after processing. Only the header comment text from the LaTeX file remains. |
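The removal step can be sketched in Python with the regular expression quoted in this section. This is an illustration under stated assumptions, not the component's actual implementation: the `re.DOTALL` flag is assumed so that the trailing `.*$` runs across line breaks to the end of the text, and the function name `remove_bibliography` and the sample document are hypothetical.

```python
import re

# The expression from this section, split over two literals for readability.
# re.DOTALL is an assumption: it lets ".*$" span newlines up to the end of the text.
BIB_PATTERN = re.compile(
    r'(\\appendix|\\begin\{references\}|\\begin\{REFERENCES\}'
    r'|\\begin\{thebibliography\}|\\bibliography\{.*\}).*$',
    flags=re.DOTALL,
)


def remove_bibliography(text: str) -> str:
    """Delete everything from the first bibliography marker to the end of the text."""
    return BIB_PATTERN.sub('', text)


sample = (
    "% Header comment\n"
    "\\section{Conclusion}\n"
    "We are done.\n"
    "\\begin{thebibliography}{9}\n"
    "\\bibitem{a} Some reference.\n"
    "\\end{thebibliography}\n"
    "\\end{document}\n"
)

cleaned = remove_bibliography(sample)
```

Under these assumptions, everything after the first matching marker is dropped, including a closing `\end{document}`. Text that contains none of the markers is returned unchanged.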
Configure component
In Designer, add the LLM-Remove LaTeX Bibliography (DLC) component to your workflow and configure its parameters in the right-side panel.
| Parameter type | Parameter | Required | Description | Default |
| --- | --- | --- | --- | --- |
| Field settings | Field to process | Yes | The name of the field to process. | None |
| Field settings | Output OSS directory | No | The OSS directory for storing the processed data. If this parameter is left empty, the default workspace path is used. | None |
| Execution tuning | Number of processes | No | The number of concurrent processes to use for the job. | 8 |
| Execution tuning | Select resource group: public resource group | No | Allows you to configure the node specification (CPU or GPU instance), number of nodes, and VPC. | None |
| Execution tuning | Select resource group: dedicated resource group | No | Allows you to configure the number of CPU cores, memory, shared memory, number of GPUs, and number of nodes. | None |
| Execution tuning | Maximum runtime | No | The maximum time allowed for the job to run. The job is terminated if it exceeds this limit. | None |
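As a rough illustration of how the Field to process parameter relates to the JSONL input, the sketch below strips the bibliography from one named field in each record and emits JSONL again. It runs locally on in-memory strings and does not touch OSS or Designer. The helper `process_jsonl`, the field name `text`, the `re.DOTALL` flag, and the choice to pass through records that lack the field are all assumptions.

```python
import json
import re

# Same alternation as the documented expression; re.DOTALL is an assumption.
PATTERN = re.compile(
    r'(\\appendix|\\begin\{references\}|\\begin\{REFERENCES\}'
    r'|\\begin\{thebibliography\}|\\bibliography\{.*\}).*$',
    flags=re.DOTALL,
)


def process_jsonl(lines, field):
    """Yield output JSONL lines with the bibliography stripped from `field`."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        value = record.get(field)
        if isinstance(value, str):
            record[field] = PATTERN.sub('', value)
        # Records without the field pass through unchanged (an assumption).
        yield json.dumps(record, ensure_ascii=False)


input_lines = [
    json.dumps({"id": 1, "text": "Body.\n\\bibliography{refs}\n\\end{document}"}),
    json.dumps({"id": 2, "text": "No references here."}),
]

output = [json.loads(line) for line in process_jsonl(input_lines, field="text")]
```

Only the selected field is modified. Other keys, such as the hypothetical `id`, are written back as they were read.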