Alibaba Cloud Model Studio: Import data

Last Updated: Apr 08, 2025

This topic describes how to use the console to import data into Data Management of Model Studio, where it serves as the knowledge source for knowledge bases.

Procedure

The Model Studio console supports importing Unstructured Data and Structured Data. Unstructured Data is not organized based on a predefined table structure, whereas Structured Data is.

Select Unstructured Data for:

  • Documents in formats such as PDF, DOCX, DOC, TXT, Markdown, PPTX, PPT, PNG, JPG, JPEG, BMP, or GIF.

  • Multiple XLSX or XLS documents whose table structures may differ.

  • Documents imported from Object Storage Service (OSS).

Select Structured Data for:

  • Multiple XLSX or XLS documents with identical table structures.

  • Documents in XLSX or XLS format that will be used for FAQ scenarios. For example, an Excel document contains two columns: Question and Answer. A structured knowledge base lets you restrict retrieval to the Question column and use the Answer column for reference. An unstructured knowledge base can hardly achieve this.
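
The selection rules above can be sketched as a small helper. This is only an illustration, not part of Model Studio: the function name `suggest_tab`, its parameters, and the `.md` extension for Markdown are assumptions, and the format list is limited to the formats named in this topic.

```python
from pathlib import Path

# Formats named in this topic. ".md" for Markdown is an assumption.
UNSTRUCTURED_EXTS = {".pdf", ".docx", ".doc", ".txt", ".md", ".pptx", ".ppt",
                     ".png", ".jpg", ".jpeg", ".bmp", ".gif"}
SPREADSHEET_EXTS = {".xlsx", ".xls"}


def suggest_tab(filenames, same_schema=False, from_oss=False):
    """Suggest "Unstructured Data" or "Structured Data" for a batch of files."""
    exts = {Path(name).suffix.lower() for name in filenames}
    unknown = exts - UNSTRUCTURED_EXTS - SPREADSHEET_EXTS
    if unknown:
        raise ValueError(f"Formats not listed in this topic: {sorted(unknown)}")
    # OSS imports and non-spreadsheet documents belong to Unstructured Data.
    if from_oss or exts & UNSTRUCTURED_EXTS:
        return "Unstructured Data"
    # Only spreadsheets remain: identical table structures suit Structured Data.
    return "Structured Data" if same_schema else "Unstructured Data"
```

For example, `suggest_tab(["faq1.xlsx", "faq2.xls"], same_schema=True)` suggests the Structured Data tab, while any batch that mixes in a PDF falls back to Unstructured Data.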

Unstructured data

  1. Go to the Data Management page and select the Unstructured Data tab.

  2. Under Category Management on the left, select the desired category for data import.

    Select the default category or create a new one. Each workspace can have up to 500 categories.
    Each workspace can have up to 100,000 documents.

  3. Click Import Data to go to the Import Data page.

  4. For Import Method, select Upload Local File or OSS.

    If you are importing data from OSS for the first time, you must first complete authorization as prompted and add the bailian-datahub-access tag to the desired bucket. For more information, see Import data from OSS.
    Model Studio does not support OSS buckets of the following storage classes: Archive, Cold Archive, and Deep Cold Archive. Buckets with content encryption and private buckets are supported.
    Model Studio cannot access files in the root directory of OSS. Select an existing sub-directory under the bucket or create a new sub-directory.
  5. For Document Recognition, the default is Intelligent Document Parsing (this cannot currently be changed). However, you can configure parsing rules for different document formats through Data Parsing Settings for better results.

    Data Parsing Settings

    On the Data Management page, you can configure parsing strategies. If you are not sure, keep the default settings.

    Digital Parsing: Cannot parse illustrations and charts in documents.
    Intelligent Parsing: The parser extracts text from illustrations and generates summaries. These summaries, along with the non-image content in the document, are chunked and embedded for knowledge base retrieval.
    LLM Parsing: Agent applications that use Qwen VL can answer questions about the illustrations and charts in documents. If you need the illustrations and charts in documents to be understood, select LLM Parsing.

  6. (Optional) Configure Tags for documents.

    When calling applications through the API, you can specify tags in the tags request parameter. When the application retrieves from the knowledge base, it first filters documents by tag, which improves retrieval efficiency. For agent applications, you can also set tags when editing the application in the console (enable Knowledge Base Retrieval Augmentation > Configure Knowledge Base > Advanced Configuration > Filter by Tag).
  7. Click Confirm. The system will begin parsing and importing the documents. This may take some time.

    Document parsing converts uploaded documents into a format that Model Studio can process. During peak periods, it may take longer.
  8. After parsing and importing are complete, click Details to the right of the corresponding document to view the imported document.

    You can view documents imported within the last 90 days. Documents imported earlier than that cannot be viewed.
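
The OSS restrictions in step 4 can be checked before you start an import. The following is a minimal sketch under stated assumptions: `check_oss_source` is a hypothetical helper, the storage class is passed in as a plain string (obtaining it from OSS is out of scope here), and a file is treated as being in the root directory when its object key contains no directory separator.

```python
# Storage classes this topic lists as unsupported, normalized for comparison.
UNSUPPORTED_STORAGE_CLASSES = {"archive", "coldarchive", "deepcoldarchive"}


def check_oss_source(storage_class, object_key):
    """Return a list of problems that would block an OSS import (empty if none)."""
    problems = []
    # Accept both "Cold Archive" and "ColdArchive" style spellings.
    normalized = storage_class.replace(" ", "").lower()
    if normalized in UNSUPPORTED_STORAGE_CLASSES:
        problems.append(f"storage class {storage_class!r} is not supported")
    # Assumption: a key without "/" sits directly in the bucket root.
    if "/" not in object_key.strip("/"):
        problems.append("file is in the root directory; use a sub-directory")
    return problems
```

An empty list means that neither restriction applies. The bailian-datahub-access tag and the authorization are separate prerequisites that this sketch does not check.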

Structured data

  1. Go to the Data Management page and select the Structured Data tab.

  2. Create a new data table or select an existing one.

    Each workspace can have up to 1,000 data tables, and each table can have up to 100,000 rows (including the header). Exceeding this limit will result in a failed import, so you may need to split the data in advance.

    Create a new data table

    Click the icon to create a data table.

    1. Enter a Table Name.

    2. Configure the table structure by selecting Upload Excel File or Custom Header.

      Upload Excel File: Model Studio will automatically identify the header in the uploaded document to create the data table structure accordingly. Then, it will import the remaining content as data records into the table.

      Custom Header: Column Name and Type are required. Description is optional.

      Important
      • Once the data table is created, you cannot modify the Column Name, Description, or Type.

      • Make sure the table schema matches the schema of the data to be imported. For example, if the data to be imported has two columns, the structure here must also have two fields with matching column names. Click New Columns or Delete in the Actions column to adjust the fields.

    3. Upload your documents.

      1. Select and upload documents (XLSX or XLS format).

        The documents must have a header that matches the structure of the data table. Otherwise, the import will fail.
      2. Then, click Preview to view the imported data.

    4. Click Confirm. The new data table will appear under Table Management on the left.

    Select an existing data table

    Select an existing data table under Table Management on the left and click Import Data.

    1. For Import Type, select Upload and Overwrite or Incremental Upload.

      You can click Download Template to download a blank document with the table header. Then, add your data to the template and upload it directly.
    2. Select and upload documents (XLSX or XLS format).

      The documents must have a header that matches the structure of the data table. Otherwise, the import will fail.
    3. Then, click Preview to view the imported data.
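
Two of the rules above lend themselves to a check before upload: the document header must match the table structure, and a table can hold at most 100,000 rows including the header. The sketch below is illustrative only. The function names are hypothetical, rows are assumed to be already loaded as Python lists, and comparing column names in order is an assumption about what "matches" means.

```python
MAX_ROWS_PER_TABLE = 100_000  # limit from this topic, header row included


def header_matches(table_columns, file_header):
    """True if the file header has the same column names in the same order."""
    return [c.strip() for c in file_header] == [c.strip() for c in table_columns]


def split_rows(header, rows, max_rows=MAX_ROWS_PER_TABLE):
    """Split data rows into chunks that fit the limit once the header is added."""
    per_chunk = max_rows - 1  # reserve one row for the header
    if per_chunk < 1:
        raise ValueError("max_rows must be at least 2")
    return [[header] + rows[i:i + per_chunk]
            for i in range(0, len(rows), per_chunk)]
```

Because the limit applies per table, each chunk would presumably need its own data table. Each chunk repeats the header so that it can be uploaded on its own.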

What to do next

Create a knowledge base

More

Import data from OSS

If you are importing data from OSS for the first time, you must first complete authorization as prompted and add the bailian-datahub-access tag to the desired bucket.

If you are not familiar with the concepts and differences between Alibaba Cloud accounts and RAM users, read Permissions first.

Use an Alibaba Cloud account

  1. Click Authorize Now.

  2. In the dialog box that appears, click Confirm Authorization. The system will automatically create the required OSS service-linked role.

    This typically takes effect within seconds, but slight delays may occur during peak periods.
    If you encounter error code "10041495", see the FAQ section of this topic.

  3. Add the bailian-datahub-access tag to the desired OSS bucket.

    This tag is used to mark buckets that Model Studio can access. Model Studio cannot access buckets without this tag.
    1. Go to the OSS console. In the left-side navigation pane, choose Buckets.

    2. In the Tag column of the desired bucket, hover over the icon and click Edit.

    3. Click Create Tag.

    4. Click + Tag and enter the key-value pair bailian-datahub-access:read (key bailian-datahub-access, value read). Then, click Save.

  4. Go back to the Import Data page of the Model Studio console. Select the target bucket and try importing again.

    Model Studio cannot access files in the OSS root directory. Use an existing subdirectory or create a new one.
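
The tagging requirement in step 3 can be expressed as a simple check over bucket tags. This is a sketch only: it assumes you already have each bucket's tags as a Python dictionary (retrieving them from OSS is not shown), the function names are hypothetical, and requiring the value to be exactly read is an assumption based on the pair entered above.

```python
REQUIRED_TAG_KEY = "bailian-datahub-access"
REQUIRED_TAG_VALUE = "read"  # assumption: the value must match exactly


def has_access_tag(bucket_tags):
    """True if the tag dictionary contains bailian-datahub-access:read."""
    return bucket_tags.get(REQUIRED_TAG_KEY) == REQUIRED_TAG_VALUE


def importable_buckets(buckets):
    """Given {bucket_name: tags}, return the names that carry the tag, sorted."""
    return sorted(name for name, tags in buckets.items() if has_access_tag(tags))
```

For example, a bucket tagged only with `{"team": "docs"}` would be filtered out, which mirrors the statement that Model Studio cannot access buckets without this tag.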

Use a RAM user

  1. Click Authorize Now.

  2. In the dialog box that appears, click Confirm Authorization. If the authorization fails because the current user does not have permission to create a service-linked role, you must grant the RAM user the permissions to create a service-linked role and to access OSS through Model Studio.

  3. Grant the permission to create a service-linked role.

    1. Log on to the RAM Console with your Alibaba Cloud account. In the left-side navigation pane, select Permissions > Policies. Then, click Create Policy.

    2. On the JSON tab, enter the following for Effect, Action, Resource, and Condition and click OK.

      {
          "Action": [
              "ram:CreateServiceLinkedRole"
          ],
          "Resource": "*",
          "Effect": "Allow",
          "Condition": {
              "StringEquals": {
                  "ram:ServiceName": "datahub.sfm.aliyuncs.com"
              }
          }
      }

    3. Enter the policy name, then click OK.

    4. In the left-side navigation pane, choose Identities > Users. Find the desired RAM user and click Add Permissions in the Actions column.

    5. Select the created policy from the list and click Grant permissions.

      The RAM user is now able to create a service-linked role.

  4. Authorize the RAM user to access OSS through Model Studio.

    1. Go back to the Import Data page of the Model Studio console. Click Authorize Now.

    2. In the dialog box that appears, click Confirm Authorization. The system will automatically create the required service-linked role, AliyunServiceRoleForSFMDataHubOSSImport.

      This typically takes effect within seconds, but slight delays may occur during peak periods.
      If you encounter error code "10041495", see the FAQ section of this topic.

  5. Add the bailian-datahub-access tag to the desired OSS bucket.

    This tag is used to mark buckets that Model Studio can access. Model Studio cannot access buckets without this tag.
    1. Go to the OSS console. In the left-side navigation pane, choose Buckets.

    2. In the Tag column of the desired bucket, hover over the icon and click Edit.

    3. Click Create Tag.

    4. Click + Tag and enter the key-value pair bailian-datahub-access:read (key bailian-datahub-access, value read). Then, click Save.

  6. Go back to the Import Data page of the Model Studio console. Select the target bucket and try importing again.

    Model Studio cannot access files in the OSS root directory. Use an existing subdirectory or create a new one.
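
The policy statement from step 3 of the RAM user procedure can also be generated and sanity-checked in code before you paste it into the JSON tab. The sketch below makes two assumptions that this topic does not confirm: that the statement sits inside a policy document with "Version" and "Statement" fields, and that the helper names are purely illustrative. The statement content itself is copied from this topic.

```python
import json

SERVICE_NAME = "datahub.sfm.aliyuncs.com"  # service name from the statement in this topic


def build_policy_document():
    """Wrap the statement from this topic in a policy document."""
    statement = {
        "Action": ["ram:CreateServiceLinkedRole"],
        "Resource": "*",
        "Effect": "Allow",
        "Condition": {"StringEquals": {"ram:ServiceName": SERVICE_NAME}},
    }
    # Assumption: the "Version"/"Statement" wrapper is not shown in this topic.
    return {"Version": "1", "Statement": [statement]}


def is_scoped_to_service(document, service_name=SERVICE_NAME):
    """True if every statement only allows creating a role for service_name."""
    for stmt in document["Statement"]:
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        if stmt.get("Action") != ["ram:CreateServiceLinkedRole"]:
            return False
        if condition.get("ram:ServiceName") != service_name:
            return False
    return True


if __name__ == "__main__":
    print(json.dumps(build_policy_document(), indent=4))
```

The check is deliberately strict: a statement without the ram:ServiceName condition would let the RAM user create service-linked roles for any service, so the sketch rejects it.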

FAQ

  • What should I do if I encounter error code "10041495"?

    This usually means that OSS has not been activated for the Alibaba Cloud account. Take these steps:

    1. Log on to the OSS console with the Alibaba Cloud account. Activate OSS as prompted.

    2. Go back to the Import Data page of Model Studio and try again.