You can use HoloWeb to import public datasets with a few clicks in a visualized manner, which simplifies the import and query of public data. This topic describes how to use HoloWeb to create a public dataset import task and view the task status.
Background information
In HoloWeb, you can import the tpch_10g, tpch_100g, tpch_1t, and github_event public datasets with a few clicks.
The tpch_10g, tpch_100g, and tpch_1t public datasets are sample datasets in retail scenarios. The tpch_10g public dataset contains 10 GB of data, the tpch_100g public dataset contains 100 GB of data, and the tpch_1t public dataset contains 1 TB of data. For more information, see Test plan.
The github_event public dataset is available on GitHub. For more information, see Introduction to business and data.
Prerequisites
The version of your Hologres instance is V1.3.13 or later. You can check the instance version as shown in the example after this list.
The Hologres instance is connected to HoloWeb. For more information, see Log on to an instance.
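If you are not sure which version your instance runs, the following statement is a minimal way to check it from the SQL editor. The hg_version() function is assumed to be available in your instance; you can also view the version in the Hologres console.
-- Check the current Hologres instance version; it should report V1.3.13 or later.
SELECT hg_version();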
Precautions
The public dataset import feature is supported by Hologres instances that are deployed in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), and China (Zhangjiakou).
To import a public dataset with a few clicks, you must have the permissions to create a schema, create a table, and write data. For more information, see Hologres permission models.
It may take 3 minutes to 20 minutes to import a public dataset into a Hologres instance. The duration varies based on the instance specifications. We recommend that you plan your computing resources in advance to prevent negative impacts on your online business.
In a public dataset import task, two schemas and several foreign tables and internal tables are automatically created. Make sure that no existing schemas or tables in your Hologres instance have the same names as the automatically created ones. This prevents existing data from being deleted by mistake. You can check for name conflicts in advance, as shown in the following example.
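For example, before you import the tpch_100g dataset, a query along the following lines checks whether schemas with the expected names already exist. The schema names hologres_dataset_tpch_100g and hologres_foreign_dataset_tpch_100g are taken from the drop example at the end of this topic; adjust them for the dataset that you plan to import.
-- Check whether schemas that a tpch_100g import task would create already exist.
-- If this query returns rows, rename or drop the conflicting schemas before you import the dataset.
SELECT nspname
FROM pg_namespace
WHERE nspname IN ('hologres_dataset_tpch_100g', 'hologres_foreign_dataset_tpch_100g');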
Create a public dataset import task
Go to the HoloWeb console. For more information, see Connect to HoloWeb and perform queries.
In the HoloWeb console, click Data Solutions in the top navigation bar.
On the Data Solutions page, click Import Public Dataset in the left-side navigation pane.
On the Import Public Dataset page, click Create Task for Importing Public Dataset.
On the Create Task for Importing Public Dataset page, configure the Instance Name, Database, and Public Dataset Name parameters, turn on or off the Use Serverless Computing Resource to Import Data switch, and then click Submit.
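After you submit the task, the schemas, foreign tables, and internal tables described in the Precautions section are created. If you want to inspect them from the SQL editor, a query similar to the following sketch lists the tables that a tpch_100g import task creates; the schema names are assumptions based on the drop example at the end of this topic.
-- List the internal tables and foreign tables created by a tpch_100g import task.
SELECT table_schema, table_type, table_name
FROM information_schema.tables
WHERE table_schema IN ('hologres_dataset_tpch_100g', 'hologres_foreign_dataset_tpch_100g')
ORDER BY table_schema, table_name;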
View the information about a public dataset import task
On the Import Public Dataset page, configure the Instance Name and Database parameters and click Query.
You can view the information displayed in the task list and perform operations on a task:
Displayed information: No., Instance Name, Database, Public Dataset Name, Status, Progress, Created At, and Ended At. The progress is displayed in the following format: Number of completed SQL statements/Total number of SQL statements.
Supported operations: Details, Stop, Rerun, Delete, Execution History, and Query.
When the task status changes to Successful, the public dataset import task is completed. Then, you can click Query in the Actions column to further analyze the data.
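For example, after the tpch_100g dataset is imported, you can run a simple aggregation against one of its tables. This is a minimal sketch: the schema name follows the drop example below, and the orders table with its o_orderpriority column is part of the standard TPC-H schema; verify the actual schema and table names in your instance before running the query.
-- Sketch of a first analysis query against the imported tpch_100g dataset.
SELECT o_orderpriority, COUNT(*) AS order_count
FROM hologres_dataset_tpch_100g.orders
GROUP BY o_orderpriority
ORDER BY o_orderpriority;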
Drop a public dataset
You can execute the following SQL statement to drop the two schemas in which a public dataset resides, together with all of their dependencies. The CASCADE keyword drops every table and other object in the schemas, so exercise caution when you perform this operation. In this example, the tpch_100g dataset is dropped.
DROP SCHEMA hologres_dataset_tpch_100g, hologres_foreign_dataset_tpch_100g CASCADE;