Data Transmission Service: Connect SharePoint to a DTS RAGFlow knowledge base

Last Updated: May 13, 2026

This document describes how to synchronize documents from SharePoint to a RAGFlow knowledge base in Data Transmission Service (DTS). DTS authenticates as a Microsoft Entra ID (formerly Azure AD) application and reads documents in SharePoint sites through the Microsoft Graph API. The application uses the OAuth 2.0 client credentials flow, so it authenticates as itself and accesses data without user interaction.

Prerequisites

You have created a RAGFlow knowledge base in DTS and configured an IP whitelist.

Supported data types

DTS RAGFlow supports ingesting files from SharePoint document libraries, such as Word, Excel, PDF, and PowerPoint files.

Before you begin

Step 1: Register an application in Microsoft Entra ID

  1. Sign in to the Azure portal.

  2. In the left navigation pane, select Microsoft Entra ID.

  3. Click App registrations > New registration.

  4. Enter the application information, and then click Register.

    | Parameter | Description |
    | --- | --- |
    | Name | Enter an application name, such as KBSync-SharePoint-Reader. |
    | Supported account types | Select Accounts in this organizational directory only (Single tenant). |
    | Redirect URI | You can leave this blank. The client credentials flow does not require a redirect URI. |

  5. After registration, record the following information from the application's Overview page.

    | Parameter | Description |
    | --- | --- |
    | Application (client) ID | The unique identifier of the application, which corresponds to the llamahubReader_client_id parameter in the KBSync configuration. |
    | Directory (tenant) ID | The Microsoft Entra tenant ID, which corresponds to the llamahubReader_tenant_id parameter in the KBSync configuration. |
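
Both IDs are GUIDs, so a quick format check can catch a truncated or mis-pasted value before it reaches the configuration file. The following is a minimal sketch; the helper name is_guid is hypothetical and not part of KBSync. It cannot tell whether a well-formed GUID is the right one (for example, the Object ID on the same page is also a GUID).

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in canonical 8-4-4-4-12 form."""
    candidate = value.strip().lower()
    try:
        # uuid.UUID also accepts braces or missing hyphens, so compare
        # against the canonical rendering to accept only the portal's format.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False
```

A masked placeholder such as 96b75717-****-****-****-46b70e29bec1 fails this check, which is a useful reminder to replace sample values with real ones.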

Step 2: Create a client secret

  1. On the application's page, select Certificates & secrets > Client secrets.

  2. Click New client secret.

  3. Enter a description and select an expiration period. We recommend 24 months, which is the longest period that the portal currently allows.

  4. Click Add, then immediately copy the secret's value.

    Important

    You cannot view the secret's value again after you leave the page, so copy it and store it securely as soon as it is created. This value corresponds to the llamahubReader_client_secret parameter in the KBSync configuration.
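
With the tenant ID, client ID, and client secret in hand, the client credentials flow amounts to a single form-encoded POST to the tenant's token endpoint. The sketch below builds that request without sending it. It assumes the global Microsoft identity platform endpoint (login.microsoftonline.com); national clouds use different hosts. The function name build_token_request is hypothetical, and this is not necessarily how KBSync obtains its token internally.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request for Microsoft Graph."""
    # Assumption: global cloud endpoint of the Microsoft identity platform.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" requests the application permissions already consented to.
        "scope": "https://graph.microsoft.com/.default",
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; a successful JSON response carries the token in its access_token field.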

Step 3: Configure API permissions

  1. On the application's page, select API permissions > Add a permission.

  2. Select Microsoft Graph > Application permissions.

  3. Search for and add the following permissions.

    | Permission type | Permission name | Description |
    | --- | --- | --- |
    | Application | Sites.Read.All or Sites.Selected | Sites.Read.All: Reads all SharePoint sites (recommended for full site access). Sites.Selected: Reads only selected SharePoint sites (a more secure option). Note: If you select the Sites.Selected permission, you must explicitly grant your application access to specific SharePoint sites through the Microsoft Graph API. See Appendix: Grant site-level permissions. |
    | Application | Files.Read.All | Allows the application to read all files in all site collections. |
    | Application | BrowserSiteLists.Read.All | Allows the application to read browser site lists. |

  4. Grant admin consent.

    After adding permissions, a global administrator must grant consent to activate them.

    1. On the API permissions page, click Grant admin consent for <tenant name>.

    2. Confirm that the Status column shows Granted for the permissions.

    Note

    If you do not have administrator privileges, contact your organization's Microsoft 365 global administrator to complete this step.
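
Besides the Status column in the portal, the granted application permissions can be inspected in an access token issued to the application. The sketch below assumes the token is a JWT whose payload lists application permissions in a roles claim. It decodes the payload without verifying the signature, so treat it as a diagnostic aid only. The function names are hypothetical.

```python
import base64
import json


def token_roles(access_token: str) -> set[str]:
    """Return the `roles` claim of a JWT without verifying its signature."""
    payload_b64 = access_token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return set(claims.get("roles", []))


def missing_permissions(access_token: str, required: set[str]) -> set[str]:
    """Return the required permission names that the token does not carry."""
    return required - token_roles(access_token)
```

An empty result from missing_permissions for the permissions you added suggests that admin consent has taken effect; a non-empty result names the permissions that are still missing.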

Step 4: Get SharePoint site information

Before configuring KBSync, collect the following SharePoint information.

| Parameter | Description | How to obtain |
| --- | --- | --- |
| Site name | The name of the SharePoint site. | Extracted from the SharePoint URL. For example, if the URL is https://contoso.sharepoint.com/sites/Marketing, the site name is Marketing. |
| Folder path | The path to a folder within the document library. | In SharePoint, view the path of the folder that contains the target documents. For example, Reports. |
| Host name | The SharePoint host name. Required when you use the Sites.Selected permission. | Extracted from the SharePoint URL. For example, contoso.sharepoint.com. |
| Drive name | The name of the SharePoint document library. | View the document library name in the SharePoint site. The default document library is typically Documents. |
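
Because the host name and the site name both come from the site URL, they can be extracted programmatically. The following sketch assumes a URL of the form https://<host>/sites/<site name>, as in the example above; other URL shapes are rejected rather than guessed at. The function name split_sharepoint_url is hypothetical.

```python
from urllib.parse import unquote, urlparse


def split_sharepoint_url(site_url: str) -> tuple[str, str]:
    """Return (host_name, site_name) for a URL shaped like https://<host>/sites/<name>."""
    parsed = urlparse(site_url)
    # Drop empty segments and decode percent-escapes such as %20.
    segments = [unquote(part) for part in parsed.path.split("/") if part]
    if not parsed.hostname or len(segments) < 2 or segments[0].lower() != "sites":
        raise ValueError(f"not a /sites/<name> SharePoint URL: {site_url!r}")
    return parsed.hostname, segments[1]
```

For https://contoso.sharepoint.com/sites/Marketing, the function returns contoso.sharepoint.com and Marketing, which map to llamahubReader_sharepoint_host_name and llamahubReader_sharepoint_site_name.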

Procedure

Step 1: Prepare the configuration file

  1. Obtain the KBSync program and prepare its runtime environment.

    Note
    • Submit a ticket to obtain the KBSync file.

    • The KBSync program requires a Linux environment with access to both the Microsoft Graph API and RAGFlow.

  2. Prepare a configuration file named config.

    1. Create a file named config in your Linux environment.

    2. Copy the following code into the config file.

      # Basic configuration
      sourceType=LlamaHub
      sinkType=RagFlowV2
      whiteList=
      blackList=
      
      sleepTime=600
      jobId=dts_kbsync_llamahub_sharepoint_to_ragflow
      documentSyncMode=full
      
      # LlamaHub Reader configuration
      llamahubReaderClass=llama_index.readers.microsoft_sharepoint.SharePointReader
      llamahubReader_client_id=96b75717-****-****-****-46b70e29bec1
      llamahubReader_tenant_id=c2211d60-****-****-****-fd062b3f8c2b
      llamahubReader_client_secret=Xwd8Q***************DmpRTDmpRTw
      llamahubReader_sharepoint_host_name=contoso.sharepoint.com
      llamahubReader_sharepoint_site_name=Marketing
      llamahubReader_drive_name=Documents
      llamahubLoad_sharepoint_folder_path=Reports
      
      # Sink RAGFlow configuration
      sinkOSSAccessKeyId=
      sinkOSSAccessKeySecret=
      sinkOSSRegion=
      sinkOSSBucket=
      sinkOSSEndpoint=
    3. Update the parameters in the config file.

      Important
      • If a parameter is not required for your setup, you can leave its value empty.

      • The blackList parameter takes priority over the whiteList parameter: a path that matches both is excluded.

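      | Parameter | Required | Description | How to obtain |
      | --- | --- | --- | --- |
      | whiteList | No | Specifies paths to include in the synchronization. Supports regular expressions. Separate multiple paths with spaces. | Obtain the target folder paths from SharePoint. |
      | blackList | No | Specifies paths to exclude from the synchronization. Supports regular expressions. Separate multiple paths with spaces. | Obtain the target folder paths from SharePoint. |
      | sourceType | Yes | The type of the source. | Set the value to LlamaHub. |
      | sinkType | Yes | The type of the sink. | Set the value to RagFlowV2. |
      | sleepTime | Yes | The interval in seconds for incremental scans. | - |
      | documentSyncMode | Yes | The sync mode. Valid values: full (full sync) and inc (incremental sync). | - |
      | llamahubReader_client_id | Yes | The Application (client) ID of the app registered in Microsoft Entra ID. | See Step 1: Register an application in Microsoft Entra ID. |
      | llamahubReader_tenant_id | Yes | The Directory (tenant) ID from Microsoft Entra ID. | See Step 1: Register an application in Microsoft Entra ID. |
      | llamahubReader_client_secret | Yes | The client secret created in Microsoft Entra ID. | See Step 2: Create a client secret. |
      | llamahubReader_sharepoint_site_name | Yes | The name of the SharePoint site. | See Step 4: Get SharePoint site information. |
      | llamahubReader_sharepoint_host_name | No (required when you use the Sites.Selected permission) | The SharePoint host name. | See Step 4: Get SharePoint site information. |
      | llamahubReader_drive_name | No | The name of the SharePoint document library. The default value is Documents. | See Step 4: Get SharePoint site information. |
      | llamahubLoad_sharepoint_folder_path | No | The path to a folder within the document library. If left blank, the system syncs all content in the root of the document library. | See Step 4: Get SharePoint site information. |
      | sinkOSSAccessKeyId | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
      | sinkOSSAccessKeySecret | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
      | sinkOSSRegion | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
      | sinkOSSBucket | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
      | sinkOSSEndpoint | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |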
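
To see how whiteList and blackList interact, the sketch below models the rules stated above: patterns are regular expressions separated by spaces, and blackList wins when both match. Two details are assumptions, because this document does not specify them: an empty whiteList is treated as "include everything", and a pattern may match anywhere in the path (re.search). KBSync's actual matching may differ.

```python
import re


def compile_patterns(value: str) -> list[re.Pattern]:
    """Split a space-separated parameter value into compiled regular expressions."""
    return [re.compile(pattern) for pattern in value.split()]


def is_selected(path: str, white_list: str = "", black_list: str = "") -> bool:
    """Return True if path would be synchronized under the modeled rules."""
    # blackList is checked first because it has higher priority than whiteList.
    if any(p.search(path) for p in compile_patterns(black_list)):
        return False
    allowed = compile_patterns(white_list)
    # Assumption: an empty whiteList places no restriction on paths.
    return not allowed or any(p.search(path) for p in allowed)
```

For example, with whiteList set to ^Reports/ and blackList set to /drafts/, the path Reports/2024/q1.pdf is selected and Reports/drafts/x.docx is not.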

Step 2: Run the KBSync program

  1. Place the KBSync file and the config file in the same directory in your Linux environment.

  2. From that directory, run the ./KBSync --config config command to start the KBSync program.
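
Before starting KBSync, it can help to check the config file for empty required parameters. The sketch below assumes the file uses key=value lines with # comments, as in the sample configuration, and takes the list of required parameters from the parameter table in this document. The function names are hypothetical, and KBSync may perform its own validation.

```python
REQUIRED = (
    "sourceType", "sinkType", "sleepTime", "documentSyncMode",
    "llamahubReader_client_id", "llamahubReader_tenant_id",
    "llamahubReader_client_secret", "llamahubReader_sharepoint_site_name",
    "sinkOSSAccessKeyId", "sinkOSSAccessKeySecret", "sinkOSSRegion",
    "sinkOSSBucket", "sinkOSSEndpoint",
)


def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as secrets may contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def find_problems(settings: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = [f"missing required parameter: {key}" for key in REQUIRED if not settings.get(key)]
    if settings.get("documentSyncMode") not in (None, "", "full", "inc"):
        problems.append("documentSyncMode must be full or inc")
    sleep_time = settings.get("sleepTime", "")
    if sleep_time and not sleep_time.isdigit():
        problems.append("sleepTime must be a whole number of seconds")
    return problems
```

A typical use is to read the file, call find_problems(parse_config(text)), and start KBSync only when the returned list is empty.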

Troubleshooting

| Error message | Possible cause | Solution |
| --- | --- | --- |
| Authentication failed | The client secret is incorrect or has expired. | Verify the Application (client) ID and client secret. If the secret has expired, create a new one in the Azure portal. |
| Access denied | The required API permissions have not been granted or admin consent is missing. | Ensure that you have added all required API permissions and granted admin consent. On the API permissions page in the Azure portal, verify that the status is Granted. |
| Site not found | The site name or URL is configured incorrectly. | Verify that the SharePoint site name and path are correct, and that the llamahubReader_sharepoint_site_name parameter is set to the site name and not the full URL. |
| Insufficient privileges | The application does not have permission to access the specified site. | When you use the Sites.Selected permission, you must explicitly grant the application access to the target sites by using the Microsoft Graph API. See Appendix: Grant site-level permissions to complete site-level authorization. |

Appendix: Grant site-level permissions

An application that has the Sites.Selected permission cannot access any site until you grant it access to each target site through the Microsoft Graph API. The Azure portal does not support per-site authorization.

  1. In Graph Explorer, sign in with a global administrator account and consent to the Sites.FullControl.All permission for the current session.

    To do so, enter any request, click the Modify permissions tab, find Sites.FullControl.All, and click Consent.

  2. Obtain the site ID for the target SharePoint site by running the following request in Graph Explorer:

    GET https://graph.microsoft.com/v1.0/sites/{hostname}:/{site-path}

    For example: GET https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com:/sites/project-alpha. The id field in the response is the site ID.

  3. Grant your application access to the specific site by running the following POST request in Graph Explorer:

    POST https://graph.microsoft.com/v1.0/sites/{site-id}/permissions
    Content-Type: application/json
    
    {
      "roles": ["write"],
      "grantedToIdentities": [
        {
          "application": {
            "id": "<your application's Application (client) ID>",
            "displayName": "<your application's name>"
          }
        }
      ]
    }

    The valid values for roles are: read (read-only), write (read-write), and owner (full control).

  4. Verify that the permissions have taken effect. Run GET https://graph.microsoft.com/v1.0/sites/{site-id}/permissions and confirm that the response contains your application and its role information.
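
The requests in steps 2 to 4 can also be prepared in a script. The sketch below only builds the lookup URL and the POST body and inspects a parsed response; sending the requests with an administrator token is left to the caller. The function names are hypothetical. The default role is read, on the assumption that a read-only sync needs no more than that, whereas the sample request above uses write. The check in app_is_listed assumes the list response has a value array whose entries carry grantedToIdentities, mirroring the request body.

```python
import json

GRAPH = "https://graph.microsoft.com/v1.0"


def site_lookup_url(host_name: str, site_path: str) -> str:
    """Build the GET URL that returns the site, including its id field."""
    return f"{GRAPH}/sites/{host_name}:/{site_path.strip('/')}"


def grant_body(client_id: str, display_name: str, role: str = "read") -> str:
    """Build the JSON body for POST /sites/{site-id}/permissions."""
    if role not in ("read", "write", "owner"):
        raise ValueError(f"unsupported role: {role!r}")
    body = {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": client_id, "displayName": display_name}}
        ],
    }
    return json.dumps(body, indent=2)


def app_is_listed(permissions_response: dict, client_id: str) -> bool:
    """Return True if the parsed GET /permissions response mentions the application."""
    for permission in permissions_response.get("value", []):
        for identity in permission.get("grantedToIdentities", []):
            if (identity.get("application") or {}).get("id") == client_id:
                return True
    return False
```

For example, site_lookup_url("contoso.sharepoint.com", "sites/project-alpha") reproduces the sample URL in step 2.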