HoloWeb lets you import public datasets in a few clicks so that you can quickly load and query sample data. You can create import tasks and check their status directly in the HoloWeb console.
Background information
HoloWeb supports one-click import of the tpch_10g, tpch_100g, tpch_1t, and github_event public datasets.
-
The tpch_10g, tpch_100g, and tpch_1t public datasets are sample datasets in retail scenarios. The tpch_10g public dataset contains 10 GB of data, the tpch_100g public dataset contains 100 GB of data, and the tpch_1t public dataset contains 1 TB of data. For more information, see Test plan.
-
The github_event public dataset is available on GitHub. For more information, see Introduction to business and data.
Prerequisites
-
The version of your Hologres instance is V1.3.13 or later.
-
The Hologres instance is connected to HoloWeb. For more information, see Log on to an instance.
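To confirm the version prerequisite from a SQL editor, a query along the following lines may help. This is a minimal sketch that assumes the hg_version() function is available on your instance. The format of the returned version string may differ between releases.

```sql
-- Returns the version string of the connected Hologres instance.
-- Compare the reported version with the V1.3.13 minimum stated above.
SELECT hg_version();
```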
Precautions
-
The public dataset import feature is available for Hologres instances deployed in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), and China (Zhangjiakou).
-
Importing a public dataset requires permissions to create schemas, create tables, and write data. For more information, see Hologres permission models.
-
Importing a public dataset takes 3 to 20 minutes depending on the instance specifications. Plan your computing resources in advance to avoid affecting your online business.
-
The import task automatically creates two schemas and several foreign and internal tables. Ensure that no existing schemas or tables in your instance have the same names as the automatically created ones to avoid accidental data deletion.
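One way to check for a name conflict before you submit a task is to look up the schema names in the system catalog. The sketch below uses the two schema names that this topic gives for the tpch_100g dataset. The names for other datasets are assumed to follow the same pattern, so verify them before you rely on the result.

```sql
-- Look for existing schemas whose names match the ones the import task creates.
-- An empty result means that no conflicting schema exists in the current database.
SELECT nspname AS schema_name
FROM pg_catalog.pg_namespace
WHERE nspname IN (
    'hologres_dataset_tpch_100g',
    'hologres_foreign_dataset_tpch_100g'
);
```

If the query returns a row, rename or back up the existing schema before you start the import.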
Create a public dataset import task
-
Go to the HoloWeb development page. For more information, see Connect to HoloWeb.
-
In the top menu bar of the HoloWeb development page, click Data Solutions.
-
On the Data Solutions page, click Import Public Dataset in the left-side navigation pane.
-
On the Import Public Dataset page, click Create Task for Importing Public Dataset.
-
On the Create Task for Importing Public Dataset page, select an Instance Name, Database, and Public Dataset Name. Then, specify whether to Use Serverless Computing Resource to Import Data and click Submit.
This feature is available only in Hologres V1.3.13 or later. The public dataset consumes storage space in your instance. The import task automatically creates two schemas, such as hologres_dataset_tpch_100g and hologres_foreign_dataset_tpch_100g, and several foreign and internal tables. For the tpch_100g dataset, we recommend an instance with at least 32 CUs.
View the information about a public dataset import task
-
On the Import Public Dataset page, select an Instance Name and Database, and then click Search to view the list of public dataset import tasks.
The task list displays the following information and actions:
-
Information: No., Instance Name, Database, Public Dataset Name, Status, Progress (number of completed SQL statements/total number of SQL statements), Created At, and End Time.
-
Actions: Details, Stop, Rerun, Delete, Execution History, and Query.
-
When the task Status changes to Successful, the import is complete. You can then click Query in the Actions column to perform further data analysis.
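After a successful import, you can inspect what the task created from any SQL editor. The following sketch lists the tables in the two tpch_100g schemas named in this topic and then counts the rows in one table. The table name lineitem is an assumption based on the standard TPC-H schema and is not confirmed by this topic, so adjust it to match the tables that the first query returns.

```sql
-- List the internal and foreign tables that the import task created.
-- Only tables that the current user has privileges on are shown.
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema IN (
    'hologres_dataset_tpch_100g',
    'hologres_foreign_dataset_tpch_100g'
)
ORDER BY table_schema, table_name;

-- Row count for one imported table.
-- "lineitem" is an assumed name taken from the standard TPC-H schema.
SELECT count(*) AS row_count
FROM hologres_dataset_tpch_100g.lineitem;
```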
Drop a public dataset
Run the following SQL statement to drop the schemas that contain the public dataset and all their dependencies. This example drops the tpch_100g dataset. Exercise caution when you perform this operation.
DROP SCHEMA hologres_dataset_tpch_100g, hologres_foreign_dataset_tpch_100g CASCADE;
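If the cleanup may run more than once, for example in a script, a guarded variant can be useful. This sketch assumes that your Hologres version accepts the PostgreSQL IF EXISTS clause for DROP SCHEMA. It removes the same objects as the statement above.

```sql
-- IF EXISTS reports a notice instead of an error when a schema is already gone,
-- so the statement can be rerun safely.
-- CASCADE still drops every table and dependent object in both schemas.
DROP SCHEMA IF EXISTS hologres_dataset_tpch_100g, hologres_foreign_dataset_tpch_100g CASCADE;
```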