This document describes how to transfer data from SharePoint to a RAGFlow knowledge base in Data Transmission Service (DTS). DTS supports authentication using a Microsoft Entra ID (formerly Azure AD) application to access documents in SharePoint sites via the Microsoft Graph API. This method uses the client credentials flow, which allows the application to authenticate itself and access data without user interaction.
Prerequisites
You have created a RAGFlow knowledge base in DTS and configured an IP whitelist.
Supported data types
DTS RAGFlow supports ingesting files from SharePoint document libraries, such as Word, Excel, PDF, and PowerPoint files.
Before you begin
Step 1: Register an application in Microsoft Entra ID
Sign in to the Azure portal.
In the left navigation pane, select Microsoft Entra ID.
Click App registrations > New registration.
Enter the application information, and then click Register.
Parameter | Description |
Name | Enter an application name, such as KBSync-SharePoint-Reader. |
Supported account types | Select Accounts in this organizational directory only (Single tenant). |
Redirect URI | You can leave this blank. The client credentials flow does not require a redirect URI. |
After registration, record the following information from the application's Overview page.
Parameter | Description |
Application (client) ID | The unique identifier of the application. It corresponds to the llamahubReader_client_id parameter in the KBSync configuration. |
Directory (tenant) ID | The Microsoft Entra tenant ID. It corresponds to the llamahubReader_tenant_id parameter in the KBSync configuration. |
Step 2: Create a client secret
On the application's page, select Certificates & secrets > Client secrets.
Click New client secret.
Enter a description and select an expiration period (we recommend 24 months or Never).
Click Add, then immediately copy the secret's value.
Important: You cannot view the secret value again after you leave the page. Copy it and store it securely as soon as it is created. This value corresponds to the llamahubReader_client_secret parameter in the KBSync configuration.
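The tenant ID, client ID, and client secret recorded in Steps 1 and 2 are the inputs to the client credentials flow. The following is a minimal sketch of the token request, assuming the standard Microsoft identity platform v2.0 token endpoint and the Microsoft Graph .default scope. The function name build_token_request is hypothetical and is not part of KBSync; the sketch only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

# Assumption: the standard Microsoft identity platform v2.0 token endpoint.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the application permissions already consented for the app.
        "scope": "https://graph.microsoft.com/.default",
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL.format(tenant_id=tenant_id),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace them with the values recorded in Steps 1 and 2.
    req = build_token_request("00000000-0000-0000-0000-000000000000", "my-client-id", "my-secret")
    print(req.get_method(), req.full_url)
```

Passing the returned request to urllib.request.urlopen should yield a JSON response whose access_token field is the bearer token for Microsoft Graph calls. A failure at this stage typically points to a wrong tenant ID, client ID, or client secret.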
Step 3: Configure API permissions
On the application's page, select API permissions > Add a permission.
Select Microsoft Graph > Application permissions.
Search for and add the following permissions.
Permission type | Permission name | Description |
Application | Sites.Read.All or Sites.Selected | Sites.Read.All: Reads all SharePoint sites (recommended for full site access). Sites.Selected: Reads only selected SharePoint sites (a more secure option). Note: If you select the Sites.Selected permission, you must explicitly grant access to specific SharePoint sites for your application via the Microsoft Graph API. |
Application | Files.Read.All | Allows the application to read all files in all site collections. |
Application | BrowserSiteLists.Read.All | Allows the application to read browser site lists. |
Grant admin consent.
After adding permissions, a global administrator must grant consent to activate them.
On the API permissions page, click Grant admin consent for <tenant name>.
Confirm that the Status column shows Granted for the permissions.
Note: If you do not have administrator privileges, contact your organization's Microsoft 365 global administrator to complete this step.
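Once admin consent is granted, application permissions normally appear in the roles claim of an app-only access token. The sketch below decodes a token's payload to list those roles, which can help confirm that the permissions from Step 3 are active. It is intended for troubleshooting only: it does not verify the signature, and client code should otherwise treat access tokens as opaque. The function names are hypothetical.

```python
import base64
import json

# Site permissions named in Step 3; at least one of them is expected in the token.
SITE_ROLES = {"Sites.Read.All", "Sites.Selected"}


def token_roles(access_token: str) -> set[str]:
    """Return the 'roles' claim of a JWT without verifying its signature (inspection only)."""
    parts = access_token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload_b64 = parts[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return set(claims.get("roles", []))


def has_site_permission(access_token: str) -> bool:
    """True if the token carries at least one of the site permissions from Step 3."""
    return bool(token_roles(access_token) & SITE_ROLES)
```

If the returned set is empty or lacks the expected permission names, admin consent has probably not been granted yet.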
Step 4: Get SharePoint site information
Before configuring KBSync, collect the following SharePoint information.
Parameter | Description | How to obtain |
Site name | The name of the SharePoint site. | Extracted from the SharePoint URL. For example, if the URL is |
Folder path | The path to a folder within the document library. | In SharePoint, view the folder path of the target document. For example, |
Host name | SharePoint hostname. Required when using the | Extracted from the SharePoint URL. For example, |
Drive name | The name of the SharePoint document library. | View the document library name in the SharePoint site. The default document library is typically |
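The host name and site name can usually be read off a SharePoint URL. The following sketch shows one way to split them out, assuming a URL of the form https://{host}/sites/{site-name}/... The example URL in the test is hypothetical, and the function name split_sharepoint_url is not part of KBSync. Note that the URL segment can differ from the site's display title.

```python
from urllib.parse import unquote, urlparse


def split_sharepoint_url(url: str) -> tuple[str, str]:
    """Return (host_name, site_name) from a URL shaped like https://host/sites/name/..."""
    parsed = urlparse(url)
    segments = [unquote(s) for s in parsed.path.split("/") if s]
    # Assumption: the site lives under the /sites/ or /teams/ managed path.
    if len(segments) < 2 or segments[0] not in ("sites", "teams"):
        raise ValueError(f"no /sites/<name> segment in {url!r}")
    return parsed.netloc, segments[1]
```

The two returned values map to llamahubReader_sharepoint_host_name and llamahubReader_sharepoint_site_name in the configuration file.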
Procedure
Step 1: Prepare the configuration file
Obtain the KBSync program and prepare its runtime environment.
Note: Submit a ticket to obtain the KBSync file.
The KBSync program requires a Linux environment with access to both the Microsoft Graph API and RAGFlow.
Prepare the configuration file named config.
Create a file named config in your Linux environment.
Copy the following code into the config file.
# Basic configuration
sourceType=LlamaHub
sinkType=RagFlowV2
whiteList=
blackList=
sleepTime=600
jobId=dts_kbsync_llamahub_sharepoint_to_ragflow
documentSyncMode=full
# LlamaHub Reader configuration
llamahubReaderClass=llama_index.readers.microsoft_sharepoint.SharePointReader
llamahubReader_client_id=96b75717-****-****-****-46b70e29bec1
llamahubReader_tenant_id=c2211d60-****-****-****-fd062b3f8c2b
llamahubReader_client_secret=Xwd8Q***************DmpRTDmpRTw
llamahubReader_sharepoint_host_name=contoso.sharepoint.com
llamahubReader_sharepoint_site_name=Marketing
llamahubReader_drive_name=Documents
llamahubLoad_sharepoint_folder_path=Reports
# Sink RAGFlow configuration
sinkOSSAccessKeyId=
sinkOSSAccessKeySecret=
sinkOSSRegion=
sinkOSSBucket=
sinkOSSEndpoint=
Update the parameters in the config file.
Important: If a parameter is not required for your setup, you can leave its value empty.
The blackList parameter has a higher priority than the whiteList parameter.
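The priority rule can be illustrated with a short sketch. It assumes that patterns are space-separated regular expressions (as the parameter table states), that a pattern matches anywhere in the path, and that an empty whiteList includes everything. KBSync's actual matching behavior may differ, and the function name is_synced is hypothetical.

```python
import re


def is_synced(path: str, white_list: str = "", black_list: str = "") -> bool:
    """Decide whether a path would be synchronized; blackList wins over whiteList."""
    white = white_list.split()
    black = black_list.split()
    # A blackList match excludes the path even if a whiteList pattern also matches.
    if any(re.search(pattern, path) for pattern in black):
        return False
    # Assumption: an empty whiteList means "include everything not blacklisted".
    if not white:
        return True
    return any(re.search(pattern, path) for pattern in white)
```

For example, with whiteList=^Reports/ and blackList=drafts, the path Reports/drafts/a.docx is excluded because the blackList match takes precedence.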
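Parameter | Required | Description | How to obtain |
whiteList | No | Specifies paths to include in the synchronization. Note: Supports regular expressions. Separate multiple paths with spaces. | Obtain the target folder paths from SharePoint. |
blackList | No | Specifies paths to exclude from the synchronization. Note: Supports regular expressions. Separate multiple paths with spaces. | Obtain the target folder paths from SharePoint. |
sourceType | Yes | The type of the source. | Set the value to LlamaHub. |
sinkType | Yes | The type of the sink. | Set the value to RagFlowV2. |
sleepTime | Yes | The interval in seconds for incremental scans. | - |
documentSyncMode | Yes | The sync mode. full: Full sync. inc: Incremental sync. | - |
llamahubReader_client_id | Yes | The Application (client) ID of the app registered in Microsoft Entra ID. | |
llamahubReader_tenant_id | Yes | The Directory (tenant) ID from Microsoft Entra ID. | |
llamahubReader_client_secret | Yes | The client secret created in Microsoft Entra ID. | |
llamahubReader_sharepoint_site_name | Yes | The name of the SharePoint site. | |
llamahubReader_sharepoint_host_name | No. Note: Required when you use the Sites.Selected permission. | The SharePoint host name. | |
llamahubReader_drive_name | No | The name of the SharePoint document library. The default value is Documents. | |
llamahubLoad_sharepoint_folder_path | No | The path to a folder within the document library. If left blank, the system syncs all content in the root of the document library. | |
sinkOSSAccessKeyId | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
sinkOSSAccessKeySecret | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
sinkOSSRegion | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
sinkOSSBucket | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |
sinkOSSEndpoint | Yes | Information related to the Object Storage Service (OSS) bucket. | Obtain this information from the Object Storage Service (OSS) console. |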
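Before starting KBSync, it can help to check that every parameter marked as required has a value. The following sketch does this, assuming that the config file contains one key=value pair per line and that lines starting with # are comments (inferred from the sample above, not from a KBSync specification). The function names are hypothetical.

```python
# Parameters marked "Yes" in the Required column of the table above.
REQUIRED_KEYS = [
    "sourceType", "sinkType", "sleepTime", "documentSyncMode",
    "llamahubReader_client_id", "llamahubReader_tenant_id",
    "llamahubReader_client_secret", "llamahubReader_sharepoint_site_name",
    "sinkOSSAccessKeyId", "sinkOSSAccessKeySecret", "sinkOSSRegion",
    "sinkOSSBucket", "sinkOSSEndpoint",
]


def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines; blank lines and lines starting with '#' are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def check_config(text: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    cfg = parse_config(text)
    problems = [f"missing required parameter: {k}" for k in REQUIRED_KEYS if not cfg.get(k)]
    if cfg.get("documentSyncMode") and cfg["documentSyncMode"] not in ("full", "inc"):
        problems.append("documentSyncMode must be 'full' or 'inc'")
    if cfg.get("sleepTime") and not cfg["sleepTime"].isdigit():
        problems.append("sleepTime must be a whole number of seconds")
    return problems
```

The check does not cover llamahubReader_sharepoint_host_name, which is required only when the Sites.Selected permission is used.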
Step 2: Run the KBSync program
Place the KBSync file and the config file in the same directory in your Linux environment.
In your Linux environment, run the ./KBSync --config config command to start the KBSync program.
Troubleshooting
Error message | Possible cause | Solution |
| The client secret is incorrect or has expired. | Verify the Application (client) ID and client secret. If the secret has expired, create a new one in the Azure portal. |
| The required API permissions have not been granted or admin consent is missing. | Ensure you have added all required API permissions and granted admin consent. On the API permissions page in the Azure portal, verify the status is Granted. |
| The site name or URL is configured incorrectly. | Verify that the SharePoint site name and path are correct, and that the |
| The application does not have permission to access the specified site. | When you use the |
Appendix: Grant site-level permissions
An application that holds the Sites.Selected permission has no site access until you grant it access to specific sites through the Microsoft Graph API. The Azure portal does not support per-site authorization.
In Graph Explorer, log in with a global administrator account, and grant the Sites.FullControl.All permission for the current session.
Enter any request. Click the Modify permissions tab above, find Sites.FullControl.All, and click Consent to complete the authorization.
Obtain the site ID for the target SharePoint site by running the following request in Graph Explorer:
GET https://graph.microsoft.com/v1.0/sites/{hostname}:/{site-path}
For example: GET https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com:/sites/project-alpha. The id field in the response is the Site ID.
Grant your application access to the specific site by running the following POST request in Graph Explorer:
POST https://graph.microsoft.com/v1.0/sites/{site-id}/permissions
Content-Type: application/json
{
  "roles": ["write"],
  "grantedToIdentities": [
    {
      "application": {
        "id": "<your application's Application (client) ID>",
        "displayName": "<your application's name>"
      }
    }
  ]
}
The valid values for roles are: read (read-only), write (read-write), and owner (full control).
Verify that the permissions have taken effect. Run GET https://graph.microsoft.com/v1.0/sites/{site-id}/permissions and confirm that the response contains your application and its role information.
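If you prefer to script these requests rather than type them into Graph Explorer, the URLs and the JSON body can be assembled as in the following sketch. It only builds the request data shown above and does not send anything. The function names site_lookup_url and grant_request are hypothetical.

```python
import json

GRAPH = "https://graph.microsoft.com/v1.0"


def site_lookup_url(hostname: str, site_path: str) -> str:
    """URL for GET /sites/{hostname}:/{site-path}; the response's 'id' field is the site ID."""
    return f"{GRAPH}/sites/{hostname}:/{site_path.strip('/')}"


def grant_request(site_id: str, client_id: str, display_name: str, role: str) -> tuple[str, str]:
    """Return (url, json_body) for POST /sites/{site-id}/permissions."""
    # Roles listed in this appendix: read, write, owner.
    if role not in ("read", "write", "owner"):
        raise ValueError(f"unsupported role: {role}")
    body = {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": client_id, "displayName": display_name}}
        ],
    }
    return f"{GRAPH}/sites/{site_id}/permissions", json.dumps(body)
```

The same URL returned by grant_request can be used with GET to verify the grant afterward. Sending either request requires a bearer token from an account or application that is allowed to manage site permissions.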