Import public datasets with a few clicks

You can use HoloWeb to import public datasets and query data in the public datasets in a visualized manner. This topic describes how to use HoloWeb to create an import task and view the task status.

Background information

In HoloWeb, you can import the tpch_10g, tpch_100g, and github_event public datasets with a few clicks. Each dataset may occupy 10 GB to 100 GB of storage space.

The tpch_10g and tpch_100g public datasets are sample datasets for retail scenarios. The tpch_10g dataset contains 10 GB of data, and the tpch_100g dataset contains 100 GB of data. For more information, see Test plan.
The github_event public dataset is available on GitHub. For more information, see Introduction to business and data.

Prerequisites

The version of your Hologres instance is V1.3.13 or later.
Your Hologres instance is connected to HoloWeb. For more information, see Log on to an instance.
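
As a quick way to confirm the version prerequisite, you can run the statement below from an SQL editor that is connected to the instance. This is a minimal sketch that assumes the hg_version() function is available on your instance. The exact format of the returned string may vary by release.

```sql
-- Returns the version string of the connected Hologres instance.
-- Compare the result against the minimum required version, V1.3.13.
SELECT hg_version();
```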

Usage notes

The public dataset importing feature is supported by Hologres instances that are deployed in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), and China (Zhangjiakou).
The account that you use to import public datasets must have permissions to perform operations such as creating schemas, creating tables, and writing data. For more information, see Overview.
Importing a public dataset into a Hologres instance may take 3 to 20 minutes, depending on the instance specifications. We recommend that you plan your computing resources in advance to prevent negative impacts on your online business.
A public dataset import task automatically creates two schemas and several external and internal tables. Make sure that no existing schemas or tables in your Hologres instance have the same names as the automatically created ones. This prevents data from being deleted by mistake.
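
Before you create an import task, one way to check for the name conflicts described above is to look up the schema names in advance. The following sketch assumes that the instance exposes the PostgreSQL-compatible information_schema views. It uses the two schema names that appear in the DROP SCHEMA example for tpch_100g in this topic. The schema names for the other datasets are assumed to follow the same pattern and are not confirmed here.

```sql
-- Lists any existing schemas whose names match the schemas that the
-- tpch_100g import task is expected to create.
-- An empty result suggests that no conflicting schema exists.
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name IN (
    'hologres_dataset_tpch_100g',
    'hologres_foreign_dataset_tpch_100g'
);
```

If the query returns rows, review those schemas before you submit the import task.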

Create a public dataset import task

Go to the HoloWeb console. For more information, see Connect to HoloWeb.
In the HoloWeb console, click Data Solution in the top navigation bar.
On the Data Solution page, click One-Click Import of public datasets in the left-side navigation pane.
On the One-Click import of public datasets page, click Create a public dataset import task.
On the New Public Dataset Import page, configure the Instance Name, Database, and Public Data Set Name parameters, and click Submit.

View the information about a public dataset import task

On the One-Click import of public datasets page, configure the Instance Name and Database parameters and click Query.
You can view the information displayed in the task list and perform operations on a task.
- Displayed information: No., Instance, Database, Public Data Set Name, Status, Progress, Created At, and Ended At. The Progress is displayed in the following format: Number of completed SQL statements/Total number of SQL statements.
- Supported operations: Details, Stop, Rerun, Delete, and Execution History.
A public dataset import task is complete if Status is Successful. You can then use the data for analytics.
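
After a task succeeds, a short query can confirm that the data is available. The following sketch makes two assumptions: that the internal tables of the tpch_100g dataset reside in the hologres_dataset_tpch_100g schema (the name is taken from the DROP SCHEMA example in this topic), and that one of the tables is named lineitem, as in the standard TPC-H schema. Adjust both names to match what your instance actually contains.

```sql
-- List the tables in the dataset schema.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'hologres_dataset_tpch_100g'
ORDER BY table_name;

-- Count the rows in one table. "lineitem" is an assumed name based on
-- the standard TPC-H schema; replace it with a name from the list above.
SELECT count(*) AS row_count
FROM hologres_dataset_tpch_100g.lineitem;
```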

Drop a public dataset

To drop a public dataset, execute the following statement. It drops the schemas in which the dataset resides, together with all dependent objects. In this example, the tpch_100g dataset is dropped. Exercise caution when you perform this operation.

DROP SCHEMA hologres_dataset_tpch_100g, hologres_foreign_dataset_tpch_100g CASCADE;
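
Because CASCADE removes every object in the listed schemas, it can help to review what they contain before you run the statement. The following is a sketch only, assuming that the PostgreSQL-compatible information_schema views are available on your instance.

```sql
-- Counts the tables in each schema that the DROP SCHEMA statement
-- would remove, grouped by schema and table type.
SELECT table_schema, table_type, count(*) AS table_count
FROM information_schema.tables
WHERE table_schema IN (
    'hologres_dataset_tpch_100g',
    'hologres_foreign_dataset_tpch_100g'
)
GROUP BY table_schema, table_type
ORDER BY table_schema, table_type;
```

If the result includes tables that you did not expect, investigate them before you drop the schemas.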