This topic describes how to use the KBSync program to transfer files from an Alibaba Cloud Object Storage Service (OSS) bucket to a RAGFlow knowledge base in Data Transmission Service (DTS).
Prerequisites
You have created a RAGFlow knowledge base in DTS and configured an IP whitelist.
Supported file types
DOC, DOCX, PPT, PPTX, YML, XML, HTML, JSON, CSV, TXT, XLS, XLSX, WPS, RTF, MD, and SQL
JPG, JPEG, and PNG
INI
MP3
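If a bucket holds mixed content, you may want to see which objects fall outside the list above before you start. The following Python sketch flags object keys whose extension is not listed. It assumes that support is decided by file extension (case-insensitive), which this topic does not confirm. `is_supported` and `unsupported` are hypothetical helpers, not part of KBSync.

```python
from pathlib import PurePosixPath

# Extensions from the "Supported file types" list, compared case-insensitively.
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "yml", "xml", "html", "json", "csv", "txt",
    "xls", "xlsx", "wps", "rtf", "md", "sql",
    "jpg", "jpeg", "png",
    "ini",
    "mp3",
}


def is_supported(object_key):
    """Return True if the OSS object key ends with a listed extension."""
    suffix = PurePosixPath(object_key).suffix  # ".DOCX" for "a/b.DOCX", "" if none
    return suffix[1:].lower() in SUPPORTED_EXTENSIONS


def unsupported(object_keys):
    """Return the keys (folders excluded) whose extension is not in the list."""
    return [key for key in object_keys if not key.endswith("/") and not is_supported(key)]
```

For example, `unsupported(["a.md", "b.pdf", "dir/", "c.mp3"])` returns `["b.pdf"]`, because PDF does not appear in the list.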
Preparations
Create an AccessKey pair and record the AccessKey ID and AccessKey secret.
Note: If you use an AccessKey pair of a Resource Access Management (RAM) user, the RAM user must be granted the read-only permission (AliyunOSSReadOnlyAccess) or the management permission (AliyunOSSFullAccess) on OSS.
Obtain and record the OSS bucket information, including the bucket name and region ID.
Log on to the OSS console.
In the navigation pane on the left, click Buckets.
Find the bucket that stores the files to transfer.
Record the Bucket Name of the target bucket.
Note the Region of the target bucket, and then find and record its corresponding Region ID.
Procedure
Obtain the KBSync file.
Note: To obtain the KBSync file, join the DingTalk group (ID: 79690034672) and contact the helpdesk.
Prepare the runtime environment for the KBSync program.
Note: The KBSync program must run in a Linux environment that can access both OSS and RAGFlow.
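One way to confirm this requirement from the host before you start KBSync is sketched below in Python: it checks that the OS is Linux and that TCP connections to RAGFlow and OSS succeed. The helper names are hypothetical, and the OSS endpoint pattern `oss-<region>.aliyuncs.com` is an assumption; this topic does not say which OSS endpoint KBSync connects to.

```python
import platform
import socket
from urllib.parse import urlsplit


def target_of(url):
    """Split a URL into (host, port), defaulting the port from the scheme."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = target_of(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_environment(ragflow_url, oss_region):
    # Assumption: KBSync reaches OSS through the public endpoint for the region.
    oss_url = f"https://oss-{oss_region}.aliyuncs.com"
    return {
        "linux": platform.system() == "Linux",
        "ragflow": can_connect(ragflow_url),
        "oss": can_connect(oss_url),
    }
```

For example, `check_environment("http://XX.XX.XX.XX", "cn-beijing")` returns a dictionary in which every value should be `True` before you continue. A successful TCP connection shows only that the network path is open, not that the credentials are valid.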
Prepare the configuration file named config.
In the Linux environment, create a file named config.
Copy the following content into the config file.
whiteList=
blackList=
sinkType=RagFlow
sourceType=OSS
ragflowUrl=http://XX.XX.XX.XX
ragflowApiKey=Bearer ragflow-Rh******
ragflowDatasetId=******
sourceOSSAccessKeyId=******
sourceOSSAccessKeySecret=******
sourceOSSRegion=cn-beijing
sourceOSSBucket=kbsync
Replace the parameter values in the config file with your own values.
Important: If a parameter does not need to be configured, leave its value empty.
The blackList parameter takes precedence over the whiteList parameter.
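The precedence rule can be illustrated with the following Python sketch. It is not KBSync's actual matching logic. It assumes that each space-separated entry is a regular expression matched anywhere in the object path (`re.search`), and that an empty whiteList means every path that is not blacklisted is transferred. The sample config, which leaves both lists empty, suggests this but does not state it. `should_transfer` is a hypothetical helper.

```python
import re


def compile_patterns(value):
    """Split a space-separated whiteList or blackList value into regexes."""
    return [re.compile(p) for p in value.split()]


def should_transfer(path, white_list, black_list):
    whites = compile_patterns(white_list)
    blacks = compile_patterns(black_list)
    # blackList takes precedence: any blacklist match excludes the path.
    if any(p.search(path) for p in blacks):
        return False
    # Assumption: an empty whiteList allows every path not excluded above.
    if not whites:
        return True
    return any(p.search(path) for p in whites)
```

For example, with `whiteList=^docs/` and `blackList=^docs/tmp/`, the path `docs/guide.md` is transferred and `docs/tmp/draft.md` is not, even though it also matches the whitelist.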
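| Parameter | Required | Description | How to obtain |
| --- | --- | --- | --- |
| whiteList | No | The paths of the files to transfer, including the paths of folders and documents in OSS. Regular expressions are supported. Separate multiple paths with spaces. | Obtain from OSS. |
| blackList | No | The paths of the files to exclude, including the paths of folders and documents in OSS. Regular expressions are supported. Separate multiple paths with spaces. | Obtain from OSS. |
| sinkType | Yes | The type of the sink. The value must be RagFlow. | N/A |
| sourceType | Yes | The type of the source. The value must be OSS. | N/A |
| ragflowUrl | Yes | The address of RAGFlow (API Server). | See "Get the API endpoint of the RAGFlow knowledge base" in the Appendix. |
| ragflowApiKey | Yes | The API key of the RAGFlow knowledge base. Important: The value must start with Bearer, for example, Bearer ragflow-RhMjc0NjFhNTZmNTExZjBiYWY****. | See "Get the API key of the RAGFlow knowledge base" in the Appendix. |
| ragflowDatasetId | Yes | The ID of the RAGFlow knowledge base. | See "Get the ID of the RAGFlow knowledge base" in the Appendix. |
| sourceOSSAccessKeyId | Yes | The AccessKey ID used to access OSS. | The AccessKey ID that you recorded in the Preparations section. |
| sourceOSSAccessKeySecret | Yes | The AccessKey secret used to access OSS. | The AccessKey secret that you recorded in the Preparations section. |
| sourceOSSRegion | Yes | The region ID of the OSS bucket. | The region ID that you recorded in the Preparations section. |
| sourceOSSBucket | Yes | The name of the OSS bucket. | The bucket name that you recorded in the Preparations section. |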
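To catch mistakes before you start KBSync, you can check the config file against the rules in the table. The following Python sketch does this. It assumes one key=value pair per line with surrounding whitespace ignored; this topic does not document how KBSync itself parses the file. `parse_config` and `validate` are hypothetical helpers.

```python
# Parameters marked "Yes" in the Required column.
REQUIRED = [
    "sinkType", "sourceType", "ragflowUrl", "ragflowApiKey", "ragflowDatasetId",
    "sourceOSSAccessKeyId", "sourceOSSAccessKeySecret",
    "sourceOSSRegion", "sourceOSSBucket",
]


def parse_config(text):
    """Parse key=value lines; split on the first '=' so values may contain spaces."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def validate(config):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"{key} is required" for key in REQUIRED if not config.get(key)]
    if config.get("sinkType") and config["sinkType"] != "RagFlow":
        problems.append("sinkType must be RagFlow")
    if config.get("sourceType") and config["sourceType"] != "OSS":
        problems.append("sourceType must be OSS")
    key = config.get("ragflowApiKey")
    if key and not key.startswith("Bearer "):
        problems.append("ragflowApiKey must start with 'Bearer '")
    return problems
```

For example, `validate(parse_config(open("config").read()))` returns an empty list for a correctly filled file. Splitting on the first `=` keeps values such as `Bearer ragflow-Rh******` intact.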
Place the KBSync file and the config configuration file in the same folder in the Linux environment.
In the Linux environment, run the ./KBSync --config config command to start the KBSync program. If the output is similar to the following, the KBSync program is running correctly.
INFO config SourceType=OSS, SinkType=RagFlow
INFO config whiteList=, blackList=
INFO config ragflowUrl=http://XX.XX.XX.XX ragflowApiKey=Bearer ragflow-Rh******
INFO config ragflowDatasetId=b2******
INFO config sourceOssKeyId=******, sourceOssRegion=cn-beijing
INFO Verifying RAGFlow connection...
INFO Attempting to list datasets to validate the connection...
INFO Successfully found matching dataset: Name='test', ID='b2******'
INFO RAGFlow connection verified successfully.
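If you start KBSync from a script, you can watch its output for the last line of the sample above. The following Python sketch does this. It assumes that KBSync writes its log to stdout or stderr, which are merged here, and that the success line reads exactly as in the sample. `start_kbsync` and `wait_for_verification` are hypothetical helpers. The sketch does not stop the KBSync process.

```python
import subprocess

# Last line of the sample output in this topic.
SUCCESS_MARKER = "RAGFlow connection verified successfully."


def connection_verified(log_text):
    """Return True if any log line contains the success marker."""
    return any(SUCCESS_MARKER in line for line in log_text.splitlines())


def start_kbsync(workdir="."):
    """Run ./KBSync --config config in workdir, merging stderr into stdout."""
    return subprocess.Popen(
        ["./KBSync", "--config", "config"],
        cwd=workdir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )


def wait_for_verification(proc, max_lines=200):
    """Echo up to max_lines lines (blocking on each) until the marker appears."""
    for _, line in zip(range(max_lines), proc.stdout):
        print(line, end="")
        if SUCCESS_MARKER in line:
            return True
    return False
```

For example, `wait_for_verification(start_kbsync("/path/to/kbsync-dir"))` returns `True` once the success line is printed. If it returns `False`, review the printed lines for the step that failed.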
Appendix
Get the API endpoint of the RAGFlow knowledge base
In the navigation pane on the left, click API.
Copy the API Server value.
Get the API key of the RAGFlow knowledge base
In the navigation pane on the left, click API.
On the right side of RAGFlow API, click API KEY.
In the API KEY dialog box, click Create New Key.
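Click the icon next to the new key to copy it, and record the token. When you set the ragflowApiKey parameter, add the Bearer prefix before the token.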
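KBSync validates the connection by listing datasets, as the sample log shows. You can run a similar check yourself with the values you recorded. The following Python sketch calls the RAGFlow HTTP API. It assumes that the API Server value is the base URL and that your RAGFlow version exposes `GET /api/v1/datasets` with an `id` filter and a JSON response of the form `{"code": 0, "data": [...]}`. Verify both against your RAGFlow version. `build_list_datasets_request` and `find_dataset` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_list_datasets_request(ragflow_url, api_key, dataset_id):
    """Build a GET request that lists datasets filtered by ID."""
    query = urlencode({"id": dataset_id})
    url = f"{ragflow_url.rstrip('/')}/api/v1/datasets?{query}"
    # api_key is the ragflowApiKey value, which already includes "Bearer ".
    return urllib.request.Request(url, headers={"Authorization": api_key})


def find_dataset(ragflow_url, api_key, dataset_id, timeout=10.0):
    """Return the dataset name if the ID is found, otherwise None."""
    req = build_list_datasets_request(ragflow_url, api_key, dataset_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("code") != 0:
        raise RuntimeError(f"RAGFlow returned an error: {body}")
    for dataset in body.get("data") or []:
        if dataset.get("id") == dataset_id:
            return dataset.get("name")
    return None
```

For example, `find_dataset("http://XX.XX.XX.XX", "Bearer ragflow-Rh******", "b2******")` would return the knowledge base name (`test` in the sample log) if the key and ID are valid.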
Get the ID of the RAGFlow knowledge base
On the Knowledge Base page, click the target knowledge base.
In the URL of the current page, record the ID of the knowledge base.
Note: The value after id= in the URL is the ID of the knowledge base.
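If you copy the page URL, you can extract the ID with Python's standard `urllib.parse` module, as in the following sketch. The sample URLs are hypothetical. In case your RAGFlow version places the query after a `#` fragment, the sketch also looks there; that layout is an assumption, not something this topic states.

```python
from urllib.parse import parse_qs, urlsplit


def dataset_id_from_url(page_url):
    """Return the value of the id= parameter in a page URL, or None."""
    parts = urlsplit(page_url)
    # Check the query string first, then a fragment such as "#/path?id=...".
    for candidate in (parts.query, parts.fragment.partition("?")[2]):
        values = parse_qs(candidate).get("id")
        if values:
            return values[0]
    return None
```

For example, `dataset_id_from_url("http://XX.XX.XX.XX/knowledge/dataset?id=b2f0c1&page=1")` returns `"b2f0c1"`, which is the value to set for ragflowDatasetId.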