Hologres: Import public datasets with a few clicks

Last Updated: Jan 24, 2025

HoloWeb provides a visual interface for importing public datasets into a Hologres instance with a few clicks, which makes the data easy to load and query. This topic describes how to use HoloWeb to create a public dataset import task and view the task status.

Background information

In HoloWeb, you can import the tpch_10g, tpch_100g, tpch_1t, and github_event public datasets with a few clicks.

  • The tpch_10g, tpch_100g, and tpch_1t public datasets are TPC-H sample datasets for retail scenarios. They contain 10 GB, 100 GB, and 1 TB of data, respectively. For more information, see Test plan.

  • The github_event public dataset contains public event data from GitHub. For more information, see Introduction to business and data.

Prerequisites

  • The version of your Hologres instance is V1.3.13 or later.

  • The Hologres instance is connected to HoloWeb. For more information, see Log on to an instance.
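To confirm the version prerequisite before you create a task, you can query the instance version from any SQL editor that is connected to the instance. The following is a minimal sketch that assumes the hg_version() function is available on your instance. Compare the returned version string with V1.3.13 manually.

```sql
-- Returns the version string of the current Hologres instance.
-- The import feature requires V1.3.13 or later.
SELECT hg_version();
```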

Precautions

  • The public dataset importing feature is supported by Hologres instances that are deployed in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), and China (Zhangjiakou).

  • To import a public dataset with a few clicks, you must have the permissions to create a schema, create a table, and write data. For more information, see Hologres permission models.

  • Importing a public dataset into a Hologres instance takes approximately 3 to 20 minutes, depending on the instance specifications. The import consumes computing resources of the instance. We recommend that you plan your computing resources in advance to prevent negative impacts on your online business.

  • A public dataset import task automatically creates two schemas and several foreign tables and internal tables. Before you create a task, make sure that no existing schemas or tables in your Hologres instance have the same names as the automatically created schemas and tables. Otherwise, existing data may be deleted by mistake.
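The name conflict check in the last precaution can be sketched as a catalog query. This example uses the two schema names that appear in the DROP SCHEMA statement at the end of this topic for the tpch_100g dataset. The schema names for the other datasets are assumed, not confirmed, to follow the same pattern, so adjust the names for the dataset that you plan to import. The query also assumes that the standard information_schema views are accessible in your database.

```sql
-- Check whether schemas with the auto-created names already exist.
-- An empty result means that no conflicting schema was found.
-- Schema names are taken from the tpch_100g example; names for
-- other datasets are an assumption based on the same pattern.
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name IN (
    'hologres_dataset_tpch_100g',
    'hologres_foreign_dataset_tpch_100g'
);
```

If the query returns rows, rename or back up the existing schemas before you submit the import task.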

Create a public dataset import task

  1. Go to the HoloWeb console. For more information, see Connect to HoloWeb and perform queries.

  2. In the HoloWeb console, click Data Solutions in the top navigation bar.

  3. On the Data Solutions page, click Import Public Dataset in the left-side navigation pane.

  4. On the Import Public Dataset page, click Create Task for Importing Public Dataset.

  5. On the Create Task for Importing Public Dataset page, configure the Instance Name, Database, and Public Dataset Name parameters, turn on or off the Use Serverless Computing Resource to Import Data switch, and then click Submit.

View the information about a public dataset import task

  1. On the Import Public Dataset page, configure the Instance Name and Database parameters and click Query.

    You can view the information displayed in the task list and perform operations on a task:

    • Displayed information: No., Instance Name, Database, Public Dataset Name, Status, Progress, Created At, and Ended At. The progress is displayed in the following format: Number of completed SQL statements/Total number of SQL statements.

    • Supported operations: Details, Stop, Rerun, Delete, Execution History, and Query.

  2. When the task status changes to Successful, the public dataset import task is completed. Then, you can click Query in the Actions column to further analyze the data.
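After a task succeeds, you can also inspect what was created from a SQL editor. The following is a minimal sketch for the tpch_100g dataset. It assumes the schema names that are used in the DROP SCHEMA example in this topic and assumes that the information_schema.tables view is accessible. The tables that are returned depend on the dataset that you imported.

```sql
-- List the tables that the import task created for tpch_100g.
-- table_type distinguishes internal tables from foreign tables.
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema IN (
    'hologres_dataset_tpch_100g',
    'hologres_foreign_dataset_tpch_100g'
)
ORDER BY table_schema, table_name;
```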

Drop a public dataset

To drop a public dataset, execute a DROP SCHEMA statement on the two schemas that were created for the dataset. The CASCADE keyword also drops all objects in the schemas and all objects that depend on them. The following example drops the tpch_100g dataset. This operation cannot be undone. Exercise caution when you perform this operation.

DROP SCHEMA hologres_dataset_tpch_100g, hologres_foreign_dataset_tpch_100g CASCADE;