This topic describes how to use the console to import data into Data Management of Model Studio as a knowledge source for knowledge bases.
Use API: You can use the API to import unstructured data. Structured data can be imported only in the console. To keep a structured knowledge base updated automatically, build it on an ApsaraDB RDS data table.
Import from RDS: If you want to build a knowledge base based on an RDS data table, see Create a knowledge base.
Procedure
The Model Studio console supports importing Unstructured Data and Structured Data. Unstructured Data is not organized based on a predefined table structure, while Structured Data is.
Select Unstructured Data for:
Documents in formats such as PDF, DOCX, DOC, TXT, Markdown, PPTX, PPT, PNG, JPG, JPEG, BMP, or GIF.
Multiple XLSX or XLS documents whose table structures may differ.
Importing documents from Object Storage Service (OSS).
Select Structured Data for:
Multiple XLSX or XLS documents with identical table structures.
Documents in XLSX or XLS format that will be used for FAQ scenarios. For example, an Excel document contains two columns: question and answer. A structured knowledge base lets you use only the Question column for retrieval and the Answer column for reference. An unstructured knowledge base can hardly achieve this effect.
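The selection rules above can be sketched as a small helper. This is only an illustrative heuristic, not part of Model Studio: the function name `suggest_tab`, the input shape, and the `md` extension for Markdown are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Formats this topic lists for the Unstructured Data tab.
# Assumption: "Markdown" corresponds to the .md extension.
UNSTRUCTURED_EXTS = {
    "pdf", "docx", "doc", "txt", "md", "pptx", "ppt",
    "png", "jpg", "jpeg", "bmp", "gif", "xlsx", "xls",
}
SPREADSHEET_EXTS = {"xlsx", "xls"}


def suggest_tab(headers_by_file):
    """Suggest a Data Management tab for a batch of files.

    headers_by_file maps each file name to its header row (a list of
    column names) for spreadsheets, or to None for other documents.
    """
    if not headers_by_file:
        raise ValueError("no files given")
    exts = {
        PurePosixPath(name).suffix.lower().lstrip(".")
        for name in headers_by_file
    }
    unsupported = exts - UNSTRUCTURED_EXTS
    if unsupported:
        raise ValueError(f"unsupported formats: {sorted(unsupported)}")
    # Only spreadsheets that all share one header fit a single data table.
    if exts <= SPREADSHEET_EXTS:
        headers = [tuple(h) for h in headers_by_file.values()]
        if all(h == headers[0] for h in headers):
            return "Structured Data"
    return "Unstructured Data"
```

For example, two spreadsheets that both use the columns question and answer map to Structured Data, while a PDF mixed with a spreadsheet maps to Unstructured Data.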
Unstructured data
Go to the Data Management page and select the Unstructured Data tab.
Under Category Management on the left, select the desired category for data import.
Select the default category or click the icon to create a new one. Each workspace can have up to 500 categories.
Each workspace can have up to 100,000 documents.
Click Import Data to go to the Import Data page.
For Document Recognition, the default is Intelligent Document Parsing, which currently cannot be changed. However, you can configure parsing rules for different document formats through Data Parsing Settings to get better results.
(Optional) Configure Tags for documents.
When calling applications through the API, you can specify tags in the request parameter tags. When the application retrieves the knowledge base, it first filters documents based on tags, thereby improving efficiency. For agent applications, you can also set tags when editing the application in the console (enable ).
Click Confirm. The system will begin parsing and importing the documents. This may take some time.
Document parsing converts uploaded documents into a format that Model Studio can process. During peak periods, it may take longer.
After parsing and importing are complete, click Details to the right of the corresponding document to view the imported document.
You can view documents imported within 90 days. Documents beyond this time range will not be viewable.
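The tag-based pre-filtering mentioned in the optional Tags step can be pictured with a small local simulation. This is a sketch of the idea only, not the service's implementation: the function `filter_by_tags`, the document dictionaries, and the "match any tag" rule are all assumptions made for illustration.

```python
def filter_by_tags(documents, tags):
    """Keep only documents that carry at least one of the requested tags.

    Assumption: a document qualifies if it shares any tag with the request.
    """
    wanted = set(tags)
    if not wanted:
        # No tags requested: nothing is filtered out.
        return list(documents)
    return [doc for doc in documents if wanted & set(doc["tags"])]


# Hypothetical documents with the tags configured at import time.
docs = [
    {"name": "pricing.pdf", "tags": ["sales"]},
    {"name": "handbook.docx", "tags": ["hr", "policy"]},
    {"name": "faq.md", "tags": []},
]

candidates = filter_by_tags(docs, ["hr"])
print([d["name"] for d in candidates])  # prints ['handbook.docx']
```

Retrieval then only has to search the smaller candidate set, which is where the efficiency gain comes from.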
Structured data
Go to the Data Management page and select the Structured Data tab.
Create a new data table or select an existing one.
Each workspace can have up to 1,000 data tables, and each table can have up to 100,000 rows (including the header). Exceeding this limit will result in a failed import, so you may need to split the data in advance.
Create a new data table
Click the icon to create a data table.
Enter a Table Name.
Configure the table structure by selecting Upload Excel File or Custom Header.
Upload Excel File: Model Studio automatically identifies the header in the uploaded document and creates the data table structure accordingly. It then imports the remaining content into the table as data records.
Custom Header: Column Name and Type are required. Description is optional.
Important: Once the data table is created, you cannot modify the Column Name, Description, or Type.
Make sure the table schema matches the schema of the data to be imported. For example, if the data table to be imported has 2 columns, the structure here must also have 2 fields with corresponding column names. Click New Columns or Delete in the Actions column to adjust the fields.
Upload your documents.
Click the icon to select and upload documents (XLSX or XLS format).
The documents must have a header that matches the structure of the data table. Otherwise, the import will fail.
Then, click Preview to view the imported data.
Click Confirm. The new data table will appear under Table Management on the left.
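Because an import fails when the document header does not match the table structure, it can help to compare the two before uploading. The following is a minimal sketch under the assumption that a match means the same column names in the same order; `check_header` is a hypothetical helper, and the exact matching rule used by Model Studio is not specified in this topic.

```python
def check_header(table_columns, file_header):
    """Return a list of problems; an empty list means the header matches.

    Assumption: names must match exactly and appear in the same order.
    """
    problems = []
    if len(file_header) != len(table_columns):
        problems.append(
            f"expected {len(table_columns)} columns, found {len(file_header)}"
        )
    pairs = zip(table_columns, file_header)
    for position, (expected, actual) in enumerate(pairs, start=1):
        if expected != actual:
            problems.append(
                f"column {position}: expected {expected!r}, found {actual!r}"
            )
    return problems
```

For a table defined with the columns question and answer, `check_header(["question", "answer"], ["question", "reply"])` reports a mismatch in column 2, while an identical header returns an empty list.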
Select an existing data table
Select an existing data table under Table Management on the left and click Import Data.
For Import Type, select Upload and Overwrite or Incremental Upload.
You can click Download Template to download a blank document that contains the table header. Then, add your data to the template and upload it directly.
Click the icon to select and upload documents (XLSX or XLS format).
The documents must have a header that matches the structure of the data table. Otherwise, the import will fail.
Then, click Preview to view the imported data.
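Since each data table can hold at most 100,000 rows including the header, larger datasets need to be split before import. The sketch below shows one way to do the arithmetic on rows already loaded into memory. The helper `split_rows` is hypothetical, and how you read the spreadsheet and distribute the resulting chunks is left open.

```python
MAX_ROWS_PER_TABLE = 100_000  # limit stated in this topic, header included


def split_rows(header, records, max_rows=MAX_ROWS_PER_TABLE):
    """Split records into chunks that each fit the row limit with a header.

    header is a list of column names; records is a list of rows.
    Every returned chunk starts with its own copy of the header row.
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one record")
    per_chunk = max_rows - 1  # one row of the limit is taken by the header
    return [
        [header] + records[start:start + per_chunk]
        for start in range(0, len(records), per_chunk)
    ]
```

With the default limit, each chunk carries up to 99,999 data records plus the header row.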
More
Import data from OSS
If you are importing data from OSS for the first time, you must first complete authorization as prompted and add the bailian-datahub-access tag to the desired bucket.
If you are not familiar with the concepts and differences between Alibaba Cloud accounts and RAM users, read Permissions first.
Using an Alibaba Cloud account
Click Authorize Now.
In the dialog box that appears, click Confirm Authorization. The system will automatically create an OSS service-linked role, which is required.
This typically takes effect within seconds, but slight delays may occur during peak periods.
If you encounter error code "10041495", see the FAQ section of this topic.
Add the bailian-datahub-access tag to the desired OSS bucket. This tag is used to mark buckets that Model Studio can access. Model Studio cannot access buckets without this tag.
Go to the OSS console. In the left-side navigation pane, choose Buckets.
In the Tag column of the desired bucket, hover over the icon and click Edit.
Click Create Tag.
Click + Tag and enter the following key-value pair: bailian-datahub-access:read. Then, click Save.
Go back to the Import Data page of the Model Studio console. Select the target bucket and try importing again.
Model Studio cannot access files in the OSS root directory. Use an existing subdirectory or create a new one.
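The two access conditions described above (the bucket tag and the root-directory restriction) can be checked locally before retrying an import. This is an illustrative sketch only: `import_blockers` is a hypothetical helper that works on a plain dictionary of bucket tags and an object key, and it does not call OSS or Model Studio.

```python
# Tag key and value given in this topic.
REQUIRED_TAG = ("bailian-datahub-access", "read")


def import_blockers(bucket_tags, object_key):
    """List reasons why an OSS object would not be importable, per this topic.

    bucket_tags is a dict of the bucket's tags; object_key is the object's
    path inside the bucket, for example "docs/report.pdf".
    """
    blockers = []
    key, value = REQUIRED_TAG
    if bucket_tags.get(key) != value:
        blockers.append(f"bucket is missing the tag {key}:{value}")
    # A key without any "/" sits directly in the bucket root directory.
    if "/" not in object_key.lstrip("/"):
        blockers.append("object is in the bucket root directory")
    return blockers
```

An empty result means neither of the two conditions from this topic blocks the import; other causes, such as missing authorization, are not covered by this check.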
Using a RAM user
Click Authorize Now.
In the dialog box that appears, click Confirm Authorization. If the authorization fails because the current user does not have the permission to create a service-linked role, you must grant the RAM user the permissions to create a service-linked role and to access OSS through Model Studio.
Grant the permission to create a service-linked role
Log on to the RAM Console with your Alibaba Cloud account. In the left-side navigation pane, select . Then, click Create Policy.
On the JSON tab, enter the following values for Effect, Action, Resource, and Condition, and then click OK.
{
  "Action": [
    "ram:CreateServiceLinkedRole"
  ],
  "Resource": "*",
  "Effect": "Allow",
  "Condition": {
    "StringEquals": {
      "ram:ServiceName": "datahub.sfm.aliyuncs.com"
    }
  }
}
Enter the policy name, then click OK.
In the left-side navigation pane, choose . Find the desired RAM user and click Add Permissions in the Actions column.
Select the created policy from the list and click Grant permissions.
The RAM user is now able to create a service-linked role.
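If you keep policies in scripts or version control, the statement entered on the JSON tab can be built and sanity-checked before you paste it into the console. The sketch below assumes that a complete RAM policy document wraps statements in a `Version` and `Statement` envelope. That envelope and the helper `validate_statement` are illustrative assumptions, while the statement values are the ones given in this topic.

```python
import json

# Statement values taken from this topic.
statement = {
    "Action": ["ram:CreateServiceLinkedRole"],
    "Resource": "*",
    "Effect": "Allow",
    "Condition": {
        "StringEquals": {"ram:ServiceName": "datahub.sfm.aliyuncs.com"}
    },
}

# Assumption: a full policy document wraps statements in this envelope.
policy_document = {"Version": "1", "Statement": [statement]}


def validate_statement(stmt):
    """Return a list of problems found in a single policy statement."""
    problems = [
        f"missing field: {field}"
        for field in ("Effect", "Action", "Resource")
        if field not in stmt
    ]
    if stmt.get("Effect") not in ("Allow", "Deny"):
        problems.append("Effect must be Allow or Deny")
    return problems


if not validate_statement(statement):
    # Pretty-print the document so it can be reviewed or pasted.
    print(json.dumps(policy_document, indent=2))
```

The Condition element restricts the permission to service-linked roles for the datahub.sfm.aliyuncs.com service, so the RAM user cannot create service-linked roles for other services with this policy.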
Authorize the RAM user to access OSS through Model Studio.
Go back to the Import Data page of the Model Studio console. Click Authorize Now.
In the dialog box that appears, click Confirm Authorization. The system will automatically create the AliyunServiceRoleForSFMDataHubOSSImport service-linked role, which is required.
This typically takes effect within seconds, but slight delays may occur during peak periods.
If you encounter error code "10041495", see the FAQ section of this topic.
Add the bailian-datahub-access tag to the desired OSS bucket. This tag is used to mark buckets that Model Studio can access. Model Studio cannot access buckets without this tag.
Go to the OSS console. In the left-side navigation pane, choose Buckets.
In the Tag column of the desired bucket, hover over the icon and click Edit.
Click Create Tag.
Click + Tag and enter the following key-value pair: bailian-datahub-access:read. Then, click Save.
Go back to the Import Data page of the Model Studio console. Select the target bucket and try importing again.
Model Studio cannot access files in the OSS root directory. Use an existing subdirectory or create a new one.
FAQ
What should I do if I encounter error code "10041495"?
This error usually occurs because the Alibaba Cloud account has not activated OSS. Take these steps:
Log on to the OSS console with the Alibaba Cloud account. Activate OSS as prompted.
Go back to the Import Data page of Model Studio and try again.