OpenSearch: Import data

Last Updated: Apr 01, 2024

Data structure

A data structure organizes data based on a fixed set of fields, such as id, title, url, content, category, timestamp, and score.

Note

You can import data into the data structure by calling an API operation or uploading a file to the OpenSearch console.

A category field can have multiple values. Separate multiple values with commas (,). For more information, see Extended parameters.
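For illustration, the following Python sketch assembles one document with the fields listed above and serializes it as JSON. It assumes that a record is a flat JSON object keyed by field name and that multiple categories are joined into one comma-separated string. The helper name build_document and all field values are hypothetical, so check the exact file format that the console or API expects before relying on it.

```python
import json

def build_document(doc_id, title, url, content, categories, timestamp, score):
    """Assemble one record using the fields of the data structure.

    Assumptions: a record is a flat JSON object keyed by field name, and
    several category values are stored in one comma-separated string.
    """
    for value in categories:
        # A comma inside a single value would be read as a separator.
        if "," in value:
            raise ValueError(f"category value contains a comma: {value!r}")
    return {
        "id": doc_id,                      # primary key
        "title": title,
        "url": url,
        "content": content,
        "category": ",".join(categories),  # multiple values separated by commas
        "timestamp": timestamp,            # unit is not specified on this page
        "score": score,
    }

doc = build_document(
    doc_id="doc-001",
    title="Import data",
    url="https://example.com/import-data",
    content="You can import data by calling an API operation or uploading a file.",
    categories=["guide", "data"],
    timestamp=1711929600000,
    score=0.9,
)
print(json.dumps(doc, ensure_ascii=False))
```

Rejecting values that already contain a comma keeps the joined string unambiguous when it is split back into separate categories.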

Upload a file

1. Structured data

To upload structured data in a TXT or JSON file, click Import File on the Data Configuration page. In the Import File panel, click Upload Local File in the corresponding section, select the file that you prepared, and then click Upload File.

To upload structured data in an Excel file, click Import File on the Data Configuration page. In the Import File panel, click Upload Local File in the corresponding section, select the file that you prepared, and then click Upload File.

Note
  • The name of an Excel file can contain letters, digits, and underscores (_). The name can be up to 20 characters in length.

  • A field name can contain letters and underscores (_) and cannot start with an underscore (_). The name must be 1 to 30 characters in length.

  • A maximum of 30 fields in each Excel file can be imported and queried. Excess fields are ignored.
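The naming rules in the note can be checked locally before you upload. This is a minimal sketch, not an official validator: it assumes that the 20-character limit applies to the file name without its extension and that, when a file has more than 30 fields, the first 30 columns are the ones kept. The function names are hypothetical.

```python
import re

# Letters, digits, and underscores, up to 20 characters.
# Assumption: the limit applies to the name without its extension.
EXCEL_NAME = re.compile(r"[A-Za-z0-9_]{1,20}")
# Letters and underscores only, must not start with an underscore, 1 to 30 characters.
FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z_]{0,29}")
MAX_FIELDS = 30

def check_excel_name(file_name):
    """Return True if the file name (minus extension) follows the naming rule."""
    stem, dot, _ext = file_name.rpartition(".")
    if not dot:
        stem = file_name  # no extension present
    return EXCEL_NAME.fullmatch(stem) is not None

def check_fields(field_names):
    """Return (fields that would be imported, field names that break the rule).

    Assumption: if there are more than 30 fields, the first 30 in column
    order are imported and the rest are ignored.
    """
    invalid = [name for name in field_names if FIELD_NAME.fullmatch(name) is None]
    return field_names[:MAX_FIELDS], invalid

print(check_excel_name("product_faq.xlsx"))  # True
print(check_excel_name("product-faq.xlsx"))  # False: hyphens are not allowed
kept, invalid = check_fields(["title", "_tmp", "content"])
print(invalid)  # ['_tmp']
```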

2. Unstructured data

To upload unstructured data files in the DOC, DOCX, PDF, or HTML format, click Import File on the Data Configuration page. In the Import File panel, click Upload Local File in the corresponding section, select the files that you prepared, and then click Upload File.

Note
  • You can upload multiple unstructured data files at a time.

  • The size of each data file cannot exceed 128 MB.
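Before a multi-file upload, you could screen files against the formats and size limit above. This is a minimal sketch, assuming that 128 MB means 128 * 1024 * 1024 bytes and that the file extension identifies the format. check_file and check_local_files are hypothetical helpers.

```python
import os

# Formats and per-file size limit stated for unstructured data.
ALLOWED_EXTENSIONS = {".doc", ".docx", ".pdf", ".html"}
# Assumption: 128 MB is interpreted as 128 * 1024 * 1024 bytes.
MAX_BYTES = 128 * 1024 * 1024

def check_file(name, size_bytes):
    """Return the reasons a file would probably be rejected (empty list if none)."""
    problems = []
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the 128 MB limit")
    return problems

def check_local_files(paths):
    """Check a batch of local files; map each problem file to its reasons."""
    report = {}
    for path in paths:
        problems = check_file(os.path.basename(path), os.path.getsize(path))
        if problems:
            report[path] = problems
    return report
```

Keeping check_file independent of the file system makes the rules easy to test; check_local_files only adds the size lookup.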

Import web pages

1. On the Default Table (main) tab of the Data Configuration page, click Web Page URL Import to import the web pages whose content is to be used for conversational searches.

2. In the Web Page URL Import panel, click the Web Page Import tab. Enter the URL of the web page whose content you want to import. To import multiple web pages, enter one URL per row. Then, click Import.

3. In the Web Page URL Import panel, click the Website Import tab. On the Website Import tab, click Create Task. In the Create Task dialog box, enter the website URL and set the category of the website content.

Note
  • Website URL: The URL of the website whose content you want to import.

  • Category: The category of the content to be imported.

  • URL Filtering: By default, only URLs that start with the website URL are matched. For example, if the website URL is http://www.abc.com/, the default regular expression is http://www\.abc\.com/.*.

  • XPath Selector: You can use an XPath selector to extract specific content from web pages. For example, to extract the content of div elements, set this parameter to //div.

  • CSS Selector: You can use a CSS selector to extract specific content from web pages. For example, to extract content such as <div class="content">Web Page Content</div>, set this parameter to div.content.

  • URLs that end with .png, .jpg, or .jpeg are not supported.

4. After you configure the parameters, click OK. The task enters the Pending state. After the task is complete, the task enters the Finished state, and the number of imported URLs is displayed on the Website Import tab.
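The input rules for the two import tabs can be mirrored in a short sketch. It splits Web Page Import input into one URL per row, drops URLs that end with .png, .jpg, or .jpeg (the note lists these for website import; applying the same rule to single pages is an assumption), and rebuilds the documented default URL filter by escaping the website URL and appending .*. It assumes Python 3.7 or later, where re.escape leaves / and : unescaped, and that the filter is applied as a full match. The helper names are hypothetical.

```python
import re

# URLs with these endings are listed as unsupported.
UNSUPPORTED_SUFFIXES = (".png", ".jpg", ".jpeg")

def parse_url_rows(text):
    """Split multi-row input into URLs (one URL per row), skipping blank rows."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def drop_image_urls(urls):
    # Assumption: the suffix check is case-insensitive.
    return [u for u in urls if not u.lower().endswith(UNSUPPORTED_SUFFIXES)]

def default_url_filter(website_url):
    """Escape the website URL and append .* so any URL under it matches."""
    return re.escape(website_url) + ".*"

rows = """
https://www.abc.com/docs/intro.html

https://www.abc.com/images/logo.png
"""
print(drop_image_urls(parse_url_rows(rows)))  # ['https://www.abc.com/docs/intro.html']
pattern = default_url_filter("http://www.abc.com/")
print(pattern)  # http://www\.abc\.com/.*
print(bool(re.fullmatch(pattern, "http://www.abc.com/news/1.html")))  # True
```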

Add a data source

1. On the Data Configuration page, click Configure Custom Table. On the page that appears, click Add Table and then select Use Data Source.

2. In the Select Data Source panel, click MaxCompute. Then, click Connect to Database. In the Connect to Database dialog box, configure the Project Name, AccessKey ID, and AccessKey Secret parameters as prompted, and click Connect.

Note
  • Project Name: The name of the MaxCompute project.

  • AccessKey ID: The AccessKey ID of the account to which the MaxCompute project belongs.

  • AccessKey Secret: The AccessKey secret of the account to which the MaxCompute project belongs.

3. Select the data table to be used in conversational searches and click OK.

4. Turn on Text Q&A and select tags for the fields to be used in conversational searches.

5. Configure the Partition Import Conditions parameter. If you do not configure this parameter, data in all partitions is imported. Then, click Complete. After the table is created, you can perform conversational search tests on the Q&A Test page. For more information, see Q&A tests.
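The default behavior of Partition Import Conditions, where no condition means that every partition is imported, can be sketched as follows. The condition format used here (a mapping of partition column to required value) and the key=value partition spec strings are hypothetical; the console's actual condition syntax may differ.

```python
def parse_partition(spec):
    """Turn a partition spec such as 'ds=20240401,region=hz' into a dict."""
    return dict(part.strip().split("=", 1) for part in spec.split(","))

def select_partitions(partitions, condition=None):
    """Return the partitions whose data would be imported.

    condition: hypothetical mapping of partition column -> required value.
    None (or an empty mapping) mirrors the documented default: all
    partitions are imported.
    """
    if not condition:
        return list(partitions)
    selected = []
    for spec in partitions:
        values = parse_partition(spec)
        if all(values.get(key) == value for key, value in condition.items()):
            selected.append(spec)
    return selected

partitions = ["ds=20240330", "ds=20240331", "ds=20240401"]
print(select_partitions(partitions))                     # all three partitions
print(select_partitions(partitions, {"ds": "20240401"}))  # ['ds=20240401']
```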

Query data

After documents are uploaded, you can view the total number of documents in the Data Query section. You can also view the pushed content on the Q&A Test page. For more information, see Q&A test. In addition, you can view, delete, or modify a pushed document by its primary key. In this example, the primary key field is id.

1. View a document

To view a pushed document, select id from the drop-down list, enter the primary key value of the document in the search box, and then click the search icon.

2. Delete a document

To delete a document, select id from the drop-down list, enter the primary key value of the document in the search box, and then click the search icon. Find the document and click Delete in the Actions column. In the Delete Document message, click OK.

3. Modify a document

OpenSearch LLM-Based Conversational Search Edition allows you to modify a document in the OpenSearch console. To modify a document, select id from the drop-down list, enter the primary key value of the document in the search box, and then click the search icon. Find the document and click Edit in the Actions column. In the Data Editing panel, modify the fields of the document.
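The view, delete, and modify operations all address a document by its primary key. The following sketch is a hypothetical in-memory stand-in, not an OpenSearch client, that shows those semantics with id as the primary key field. It also replaces a stored document when a new one arrives with the same id, matching the first item in Usage notes. Refusing to edit the id field itself is an assumption of this sketch.

```python
class DocumentTable:
    """Hypothetical in-memory stand-in for a table keyed by a primary key field."""

    def __init__(self, primary_key="id"):
        self.primary_key = primary_key
        self._docs = {}

    def push(self, doc):
        # A document with an existing primary key value replaces the stored one.
        self._docs[doc[self.primary_key]] = dict(doc)

    def count(self):
        return len(self._docs)

    def view(self, key):
        """Look a document up by primary key value; None if it does not exist."""
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None

    def delete(self, key):
        """Delete by primary key; return True if a document was removed."""
        return self._docs.pop(key, None) is not None

    def edit(self, key, **changes):
        """Modify fields of an existing document and return the updated copy."""
        if key not in self._docs:
            raise KeyError(key)
        if self.primary_key in changes:
            raise ValueError("this sketch does not allow editing the primary key")
        self._docs[key].update(changes)
        return self.view(key)

table = DocumentTable()
table.push({"id": "1", "title": "Old title", "content": "v1"})
table.push({"id": "2", "title": "Pricing", "content": "..."})
table.push({"id": "1", "title": "New title", "content": "v2"})  # same id: overwrites
print(table.count())              # 2
print(table.view("1")["title"])   # New title
table.edit("2", title="Pricing FAQ")
print(table.delete("2"))          # True
print(table.count())              # 1
```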

Usage notes

  1. Each primary key value identifies exactly one document. If two documents have the same primary key value, the more recent document overwrites the earlier one.

  2. The size of a structured data file that you upload cannot exceed 2 MB.

  3. The size of an unstructured data file that you upload cannot exceed 128 MB.

  4. After the data is uploaded, the time before you can query it depends on the amount of data to be updated.

  5. You can add up to five custom tables. A maximum of 30 fields in each custom table can be imported.
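Because the wait time before uploaded data can be queried depends on the amount of data, polling with a timeout is usually safer than a fixed delay. In this minimal sketch, get_count is a hypothetical callable that you supply (for example, a wrapper around whatever API call or query you use to obtain the document total); the default timeout and interval values are arbitrary.

```python
import time

def wait_until_queryable(get_count, expected, timeout_s=300.0, interval_s=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until get_count() reports at least `expected` documents.

    Returns True if the count is reached before the timeout, otherwise False.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Returning False instead of raising lets the caller decide whether a slow update is an error or simply needs a longer wait.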