Data Transmission Service: Tutorial: Connect OSS to a DTS RAGFlow knowledge base

Last Updated: Aug 21, 2025

This topic describes how to transfer data from Alibaba Cloud Object Storage Service (OSS) to a Data Transmission Service (DTS) RAGFlow knowledge base.

Prerequisites

You have created a RAGFlow knowledge base in DTS and configured an IP whitelist.

Supported file types

  • DOC, DOCX, PPT, PPTX, YML, XML, HTML, JSON, CSV, TXT, XLS, XLSX, WPS, RTF, MD, and SQL

  • JPG, JPEG, and PNG

  • INI

  • MP3
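As a rough illustration, the list above can be turned into a pre-check that flags OSS object keys whose extension is not listed. This is a minimal sketch: the function name is hypothetical, and treating extensions as case-insensitive is an assumption, not documented KBSync behavior.

```python
from pathlib import PurePosixPath

# Extensions copied from the supported file types list, lowercased.
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "yml", "xml", "html", "json", "csv",
    "txt", "xls", "xlsx", "wps", "rtf", "md", "sql",
    "jpg", "jpeg", "png",
    "ini",
    "mp3",
}


def is_supported(object_key: str) -> bool:
    """Return True if the object key ends in one of the listed extensions."""
    # suffix is ".docx" for "a/b.docx" and "" for keys without an extension.
    suffix = PurePosixPath(object_key).suffix
    # Assumption: matching is case-insensitive.
    return suffix[1:].lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported("docs/guide.DOCX") is true under these assumptions, while a folder key such as "docs/" is not treated as a supported file.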

Preparations

  1. Create an AccessKey pair and record the AccessKey ID and AccessKey secret.

    Note

    If the AccessKey pair belongs to a Resource Access Management (RAM) user, the RAM user must be granted either the read-only permission (AliyunOSSReadOnlyAccess) or the management permission (AliyunOSSFullAccess) for OSS.

  2. Obtain and record the OSS bucket information, including the bucket name and region ID.

    1. Log on to the OSS console.

    2. In the navigation pane on the left, click Buckets.

    3. Find the target bucket, that is, the bucket that stores the files to transfer.

    4. Record the Bucket Name of the target bucket.

    5. Note the Region of the target bucket, and then find and record its corresponding Region ID.
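Before you continue, it can help to confirm that the recorded AccessKey pair, region ID, and bucket name work together. The following sketch assumes that the third-party oss2 package (the OSS SDK for Python) is installed and that the bucket is reachable through the public endpoint for its region. Both function names are hypothetical, and internal or custom endpoints would need a different URL.

```python
from itertools import islice


def oss_public_endpoint(region_id: str) -> str:
    """Build the public OSS endpoint for a region ID such as cn-beijing."""
    return f"https://oss-{region_id}.aliyuncs.com"


def list_sample_keys(access_key_id, access_key_secret, region_id, bucket_name, limit=5):
    """List a few object keys to confirm that the AccessKey pair can read the bucket."""
    import oss2  # third-party package: pip install oss2

    auth = oss2.Auth(access_key_id, access_key_secret)
    bucket = oss2.Bucket(auth, oss_public_endpoint(region_id), bucket_name)
    # ObjectIterator pages through the bucket; islice stops after `limit` objects.
    return [obj.key for obj in islice(oss2.ObjectIterator(bucket), limit)]
```

If list_sample_keys raises an access error, the RAM user probably lacks the read permission described in the note above.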

Procedure

  1. Obtain the KBSync file.

    Note

    You can join the DingTalk group (ID: 79690034672) and contact the helpdesk to obtain the KBSync file.

  2. Prepare the runtime environment for the KBSync program.

    Note

    The KBSync program must run in a Linux environment that can access OSS and RAGFlow.

  3. Prepare the config configuration file.

    1. In the Linux environment, create a file named config.

    2. Copy the following code to the config file.

      whiteList=
      blackList=
      sinkType=RagFlow
      sourceType=OSS
      
      ragflowUrl=http://XX.XX.XX.XX
      ragflowApiKey=Bearer ragflow-Rh******
      ragflowDatasetId=******
      
      sourceOSSAccessKeyId=******
      sourceOSSAccessKeySecret=******
      sourceOSSRegion=cn-beijing
      sourceOSSBucket=kbsync

    3. Replace the parameters in the config file.

      Important
      • If a parameter does not require configuration, leave its value empty.

      • The blackList parameter takes precedence over the whiteList parameter.

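      Parameter: whiteList, blackList
      Required: No
      Description: The paths of the files to transfer (whiteList) and the files to exclude (blackList). This includes the paths of folders and documents in OSS.
      Note: Regular expressions are supported. Separate multiple paths with spaces.
      How to obtain: Obtain from OSS.

      Parameter: sinkType
      Required: Yes
      Description: The type of the sink.
      How to obtain: The value must be RagFlow.

      Parameter: sourceType
      Required: Yes
      Description: The type of the source.
      How to obtain: The value must be OSS.

      Parameter: ragflowUrl
      Required: Yes
      Description: The address of RAGFlow (API Server).
      How to obtain: See Get the API endpoint of the RAGFlow knowledge base.

      Parameter: ragflowApiKey
      Required: Yes
      Description: The API key for the RAGFlow knowledge base.
      Important: The value must start with Bearer followed by a space, for example, Bearer ragflow-RhMjc0NjFhNTZmNTExZjBiYWY****.
      How to obtain: See Get the API key of the RAGFlow knowledge base.

      Parameter: ragflowDatasetId
      Required: Yes
      Description: The ID of the RAGFlow knowledge base.
      How to obtain: See Get the ID of the RAGFlow knowledge base.

      Parameter: sourceOSSAccessKeyId
      Required: Yes
      Description: The AccessKey ID that you recorded in the Preparations section.
      How to obtain: See Preparations.

      Parameter: sourceOSSAccessKeySecret
      Required: Yes
      Description: The AccessKey secret that you recorded in the Preparations section.
      How to obtain: See Preparations.

      Parameter: sourceOSSRegion
      Required: Yes
      Description: The OSS region ID that you recorded in the Preparations section.
      How to obtain: See Preparations.

      Parameter: sourceOSSBucket
      Required: Yes
      Description: The OSS bucket name that you recorded in the Preparations section.
      How to obtain: See Preparations.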
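The interaction between whiteList and blackList can be sketched as follows. This is an illustration only, not the KBSync implementation. It assumes that each space-separated entry is a regular expression searched anywhere in the path, and that an empty whiteList places no restriction on what is transferred. The function names are hypothetical.

```python
import re


def parse_patterns(value: str) -> list:
    """Split a space-separated whiteList or blackList value into compiled regexes."""
    return [re.compile(pattern) for pattern in value.split()]


def should_transfer(path: str, white_list: str = "", black_list: str = "") -> bool:
    """Decide whether a path would be transferred under the assumed semantics."""
    whites = parse_patterns(white_list)
    blacks = parse_patterns(black_list)
    # blackList is checked first because it takes precedence over whiteList.
    if any(pattern.search(path) for pattern in blacks):
        return False
    # Assumption: an empty whiteList means that every remaining path is allowed.
    if not whites:
        return True
    return any(pattern.search(path) for pattern in whites)
```

Under these assumptions, whiteList=^docs/ combined with blackList=^docs/draft/ would transfer docs/a.md but skip docs/draft/a.md, because the blacklist match wins.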

  4. Place the KBSync file and the config configuration file in the same folder in the Linux environment.

  5. In the Linux environment, run the ./KBSync --config config command to start the KBSync program.

    If the output is similar to the following, the KBSync program is running correctly.

    INFO config SourceType=OSS, SinkType=RagFlow
    INFO config whiteList=, blackList=
    INFO config ragflowUrl=http://XX.XX.XX.XX ragflowApiKey=Bearer ragflow-Rh******
    INFO config ragflowDatasetId=b2******
    INFO config sourceOssKeyId=******, sourceOssRegion=cn-beijing
    INFO Verifying RAGFlow connection...
    INFO Attempting to list datasets to validate the connection...
    INFO Successfully found matching dataset: Name='test', ID='b2******'
    INFO RAGFlow connection verified successfully.
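If you wrap the start command in a script, two small checks may be useful: validating the config file before launch, and scanning the startup output for the success line shown above. The following is a minimal sketch with hypothetical helper names. The required-key list and the Bearer rule come from the parameter descriptions in this topic; the key=value parsing is an assumption about the file format based on the sample config.

```python
REQUIRED_KEYS = (
    "sinkType", "sourceType", "ragflowUrl", "ragflowApiKey", "ragflowDatasetId",
    "sourceOSSAccessKeyId", "sourceOSSAccessKeySecret",
    "sourceOSSRegion", "sourceOSSBucket",
)


def parse_config(text: str) -> dict:
    """Parse key=value lines; blank lines and lines without '=' are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def config_problems(config: dict) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"{key} is required" for key in REQUIRED_KEYS if not config.get(key)]
    if config.get("sinkType") and config["sinkType"] != "RagFlow":
        problems.append("sinkType must be RagFlow")
    if config.get("sourceType") and config["sourceType"] != "OSS":
        problems.append("sourceType must be OSS")
    api_key = config.get("ragflowApiKey", "")
    if api_key and not api_key.startswith("Bearer "):
        problems.append("ragflowApiKey must start with 'Bearer '")
    return problems


def connection_verified(log_text: str) -> bool:
    """Check the captured output for the success line from the sample log."""
    return "RAGFlow connection verified successfully." in log_text
```

A wrapper could refuse to start KBSync while config_problems returns a non-empty list, and report a failure if connection_verified is still false after startup.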

Appendix

Get the API endpoint of the RAGFlow knowledge base

  1. Log on to the RAGFlow page.

  2. In the navigation pane on the left, click API.

  3. Copy the API Server value.

Get the API key of the RAGFlow knowledge base

  1. Log on to the RAGFlow page.

  2. In the navigation pane on the left, click API.

  3. On the right side of RAGFlow API, click API KEY.

  4. In the API KEY dialog box, click Create New Key.

  5. Click the copy icon next to the new key and record the token.
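To see how the API Server value and the token fit together, the sketch below builds, without sending, a request that lists datasets. It assumes that the instance exposes the /api/v1/datasets path of the open-source RAGFlow HTTP API, which this topic does not confirm for DTS-hosted RAGFlow. The function name is hypothetical.

```python
import urllib.request


def build_list_datasets_request(ragflow_url: str, ragflow_api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists RAGFlow datasets."""
    # Assumption: the API path matches the open-source RAGFlow HTTP API.
    url = ragflow_url.rstrip("/") + "/api/v1/datasets"
    # ragflowApiKey already includes the "Bearer " prefix, so it is used as-is.
    return urllib.request.Request(url, headers={"Authorization": ragflow_api_key})
```

Passing the result to urllib.request.urlopen with a timeout would send the request. A successful response would suggest that the address, token, and IP whitelist are set up consistently.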

Get the ID of the RAGFlow knowledge base

  1. Log on to the RAGFlow page.

  2. On the Knowledge Base page, click the target knowledge base.

  3. In the URL of the current page, record the ID of the knowledge base.

    Note

    The information after id= is the ID of the knowledge base.
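Extracting the ID from a copied URL can also be scripted. The sketch below assumes that id appears as an ordinary query parameter. The function name and the example URL in the usage note are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def knowledge_base_id(page_url: str) -> str:
    """Return the value of the id query parameter, or an empty string if absent."""
    query = urlparse(page_url).query
    values = parse_qs(query).get("id", [])
    return values[0] if values else ""
```

For a URL of the form http://example.com/knowledge/dataset?id=abc123, the function returns abc123, which is the value to use for ragflowDatasetId.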