KBSync is a command-line tool that syncs files from an Alibaba Cloud Object Storage Service (OSS) bucket into a Data Transmission Service (DTS) RAGFlow knowledge base. Run KBSync from a Linux host with network access to both OSS and RAGFlow, point it at a config file, and it transfers the files into the specified knowledge base dataset.
Supported file types
KBSync can sync the following file types:
Documents: DOC, DOCX, PPT, PPTX, YML, XML, HTML, JSON, CSV, TXT, XLS, XLSX, WPS, RTF, MD, SQL
Images: JPG, JPEG, PNG
Other: INI, MP3
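To see in advance which objects in a bucket fall outside this list, a small check like the following may help. It is a minimal sketch: the function name is hypothetical, and matching extensions case-insensitively is an assumption, since the case handling in KBSync is not documented here.

```python
from pathlib import PurePosixPath

# Extensions copied from the supported file types list above.
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "yml", "xml", "html", "json", "csv",
    "txt", "xls", "xlsx", "wps", "rtf", "md", "sql",  # documents
    "jpg", "jpeg", "png",                              # images
    "ini", "mp3",                                      # other
}


def is_supported(object_key):
    """Return True if the OSS object key ends in a listed extension.

    Assumption: extensions are compared case-insensitively.
    """
    suffix = PurePosixPath(object_key).suffix  # ".DOCX", ".pdf", or ""
    return suffix.lstrip(".").lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("docs/report.DOCX")` is true under this assumption, while a PDF or a key with no extension is reported as unsupported.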
Prerequisites
Before you begin, make sure you have:
A RAGFlow knowledge base created in DTS, with an IP whitelist configured
An OSS bucket containing the files to sync
A Linux host that can reach both OSS and RAGFlow over the network
The KBSync binary (see step 1 of the procedure below)
Gather required values
Before configuring KBSync, collect the following values. Each is required in the config file.
OSS credentials and bucket details
Create an AccessKey pair and record the AccessKey ID and AccessKey secret.
If you use an AccessKey pair from a Resource Access Management (RAM) user, grant that RAM user either the AliyunOSSReadOnlyAccess (read-only) or AliyunOSSFullAccess (management) permission for OSS.
Record your OSS bucket name and region ID:
Log on to the OSS console.
In the navigation pane, click Buckets.
Find the target bucket and record its Bucket Name.
Note the Region, then find the corresponding region ID (for example, cn-beijing).
RAGFlow connection details
Collect the following three values from the RAGFlow page. To log on, follow the steps in Log on to the RAGFlow page.
API endpoint (ragflowUrl)
In the navigation pane, click API.
Copy the API Server value.
API key (ragflowApiKey)
In the navigation pane, click API.
Next to RAGFlow API, click API KEY.
In the API KEY dialog box, click Create New Key.
Click the copy icon to record the token.
Important: The API key must start with Bearer, for example: Bearer ragflow-RhMjc0NjFhNTZmNTExZjBiYWY****.
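If the copied token does not already carry the prefix, it can be added before the value goes into the config file. The helper below is an illustrative sketch with a hypothetical name; it only adds the Bearer prefix and a space when they are missing.

```python
def normalize_api_key(token):
    """Return the token in the 'Bearer <token>' form the config expects."""
    token = token.strip()
    if token.startswith("Bearer "):
        return token  # already in the expected form
    return f"Bearer {token}"
```

Passing a raw token such as `ragflow-Rh****` gives back the same token with `Bearer ` in front; a value that already starts with the prefix is returned unchanged.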
Knowledge base ID (ragflowDatasetId)
On the Knowledge Base page, click the target knowledge base.
In the URL, record the value after id=. That value is the knowledge base ID.
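Copying the ID by hand is easy to get wrong when the URL contains other parameters. The sketch below pulls out the value after id= with a regular expression. The function name and the sample URL are hypothetical; the exact URL layout of the RAGFlow page is not specified here.

```python
import re


def dataset_id_from_url(url):
    """Return the value of the id= parameter in a knowledge base URL."""
    # Match id= after "?" or "&", and stop at the next "&" or "#".
    match = re.search(r"[?&]id=([^&#]+)", url)
    if match is None:
        raise ValueError("no id= parameter found in URL")
    return match.group(1)


# Hypothetical URL, for illustration only.
example = "http://192.0.2.10/dataset?id=b2abcd1234ef&page=1"
dataset_id = dataset_id_from_url(example)
```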
Sync OSS files to RAGFlow
Step 1: Get the KBSync binary
Join the DingTalk group (ID: 79690034672) and contact the helpdesk to get the KBSync binary.
Step 2: Create the config file
On your Linux host, create a file named config.
Copy the following template into the file:
whiteList=
blackList=
sinkType=RagFlow
sourceType=OSS
ragflowUrl=http://XX.XX.XX.XX
ragflowApiKey=Bearer ragflow-Rh******
ragflowDatasetId=******
sourceOSSAccessKeyId=******
sourceOSSAccessKeySecret=******
sourceOSSRegion=cn-beijing
sourceOSSBucket=kbsync
Replace the placeholder values with the values you collected in the previous section:
Important: Leave optional parameters blank rather than removing them. blackList takes precedence over whiteList when both are set.

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| whiteList | No | Space-separated paths of OSS files or folders to include. Supports regular expressions. Leave blank to include everything. | docs/ reports/2024 |
| blackList | No | Space-separated paths of OSS files or folders to exclude. Supports regular expressions. Takes precedence over whiteList. | drafts/ *.tmp |
| sinkType | Yes | Must be RagFlow. | RagFlow |
| sourceType | Yes | Must be OSS. | OSS |
| ragflowUrl | Yes | The RAGFlow API Server endpoint. | http://192.0.2.10 |
| ragflowApiKey | Yes | The RAGFlow API key. Must start with Bearer. | Bearer ragflow-Rh**** |
| ragflowDatasetId | Yes | The ID of the target knowledge base. | b2abcd1234ef |
| sourceOSSAccessKeyId | Yes | Your AccessKey ID. | LTAI5tXxx |
| sourceOSSAccessKeySecret | Yes | Your AccessKey secret. | xXxXxXx |
| sourceOSSRegion | Yes | The region ID of your OSS bucket. | cn-beijing |
| sourceOSSBucket | Yes | The name of your OSS bucket. | my-bucket |
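A quick pre-flight check of the config file can catch missing values before KBSync runs. The following is a minimal sketch, not part of KBSync: it assumes the file holds one key=value pair per line, and the function names are hypothetical. The rules it checks are the ones stated in the parameter descriptions above.

```python
REQUIRED_KEYS = [
    "sinkType", "sourceType", "ragflowUrl", "ragflowApiKey",
    "ragflowDatasetId", "sourceOSSAccessKeyId",
    "sourceOSSAccessKeySecret", "sourceOSSRegion", "sourceOSSBucket",
]
OPTIONAL_KEYS = ["whiteList", "blackList"]
FIXED_VALUES = {"sinkType": "RagFlow", "sourceType": "OSS"}


def parse_config(text):
    """Parse key=value lines (assumed format) into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank lines and lines without "="
            config[key.strip()] = value.strip()
    return config


def check_config(config):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for key in OPTIONAL_KEYS:
        if key not in config:
            problems.append(f"{key}: keep the key and leave the value blank")
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"{key}: required value is missing")
    for key, expected in FIXED_VALUES.items():
        value = config.get(key)
        if value and value != expected:
            problems.append(f"{key}: must be {expected}")
    api_key = config.get("ragflowApiKey", "")
    if api_key and not api_key.startswith("Bearer "):
        problems.append("ragflowApiKey: must start with 'Bearer '")
    return problems
```

A typical use would be to read the config file, pass its text to `parse_config`, and print whatever `check_config` returns before starting the sync.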
Step 3: Run KBSync
Place the KBSync binary and the config file in the same directory.
Run the following command:
./KBSync --config config
Verify the sync
If KBSync starts successfully, the output looks similar to the following:
INFO config SourceType=OSS, SinkType=RagFlow
INFO config whiteList=, blackList=
INFO config ragflowUrl=http://XX.XX.XX.XX ragflowApiKey=Bearer ragflow-Rh******
INFO config ragflowDatasetId=b2******
INFO config sourceOssKeyId=******, sourceOssRegion=cn-beijing
INFO Verifying RAGFlow connection...
INFO Attempting to list datasets to validate the connection...
INFO Successfully found matching dataset: Name='test', ID='b2******'
INFO RAGFlow connection verified successfully.
The key indicator is the line "RAGFlow connection verified successfully." Once you see it, KBSync is connected and syncing files from your OSS bucket to the knowledge base.
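When KBSync runs unattended, the same check can be scripted against captured output. This sketch assumes the output has been saved as text (for example, by redirecting it to a file) and only looks for the indicator message shown above; the function name is hypothetical.

```python
SUCCESS_MARKER = "RAGFlow connection verified successfully"


def connection_verified(log_text):
    """Return True if any line of the captured output has the marker."""
    return any(SUCCESS_MARKER in line for line in log_text.splitlines())


sample_output = (
    "INFO Verifying RAGFlow connection...\n"
    "INFO RAGFlow connection verified successfully.\n"
)
verified = connection_verified(sample_output)
```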
Troubleshooting
KBSync cannot connect to RAGFlow
Check that ragflowUrl points to the correct API Server address and that the Linux host can reach that address over the network. Verify that the RAGFlow instance's IP whitelist includes the host's IP address.
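A basic reachability test from the Linux host can separate network problems from configuration problems. The sketch below opens a plain TCP connection to the host and port taken from ragflowUrl. The function names are hypothetical, and the default ports (80 for http, 443 for https) are an assumption when the URL does not name one. A successful connection shows only that the port is reachable, not that the API key or dataset ID is valid.

```python
import socket
from urllib.parse import urlparse


def endpoint_host_port(ragflow_url):
    """Split ragflowUrl into (host, port), defaulting the port by scheme."""
    parsed = urlparse(ragflow_url)
    if parsed.hostname is None:
        raise ValueError(f"cannot find a host in {ragflow_url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect(*endpoint_host_port(url))` with the ragflowUrl value returns False when the host is unreachable or the connection is refused or times out, which is consistent with a wrong address or a missing whitelist entry.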
Authentication fails
Confirm that ragflowApiKey starts with Bearer (including the space), and that the token has not expired. Create a new API key if needed.
OSS access is denied
Verify that the AccessKey ID and AccessKey secret are correct. If you are using a RAM user's AccessKey pair, confirm the RAM user has AliyunOSSReadOnlyAccess or AliyunOSSFullAccess on the bucket.
Files are not being synced
If you set whiteList, check that the paths match the actual OSS object paths. If blackList is also set, it takes precedence: a path matched by blackList is excluded even if it also matches whiteList.
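The precedence rule can be tried out on sample paths before a sync. The following sketch models it with Python regular expressions. It is an approximation under stated assumptions: each space-separated entry is treated as a pattern searched anywhere in the object path, which may differ from how KBSync matches, and the function name is hypothetical.

```python
import re


def is_selected(path, white_list, black_list):
    """Apply the documented rule: blackList wins, empty whiteList means all."""
    white = white_list.split()
    black = black_list.split()
    # Exclusion is checked first, so it overrides any whiteList match.
    if any(re.search(pattern, path) for pattern in black):
        return False
    if not white:
        return True  # blank whiteList includes everything
    return any(re.search(pattern, path) for pattern in white)
```

With `white_list="docs/"` and `black_list="drafts/"`, the path `docs/drafts/a.md` is excluded even though it matches the whiteList entry, while `docs/a.md` is included.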