This topic describes how to create a crawler to collect metadata from a Hologres data source. You can view the collected metadata on the DataMap page.
Prerequisites
A Hologres instance is associated with your workspace as a compute engine instance. For more information, see Create and manage workspaces.
Background information
After you create a metadata crawler to collect the full metadata of Hologres tables, the system enables automated incremental metadata collection. The crawler then automatically synchronizes incremental metadata of the Hologres tables to DataWorks.
Procedure
Log on to the DataWorks console and go to the DataMap page. For more information, see Go to the DataMap page.
In the top navigation bar, click Data Discovery.
In the left-side navigation pane, choose .
On the Hologres Metadata Crawler page, click Create Crawler.
In the Create Crawler dialog box, set the parameters in each step.
Configure the basic information.
In the Basic Information step, set the parameters as required.
Crawler Name: Required. The name of the crawler. The name must be unique.
Crawler Description: The description of the crawler.
Workspace: The workspace of the data source from which you want to collect metadata.
Data Source Type: The type of the data source from which you want to collect metadata. The default value is Hologres and cannot be changed.
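The rules for the Basic Information step can be sketched as a small validation routine. This is only an illustration of the constraints described above (a required, unique crawler name and a fixed data source type). The `CrawlerBasicInfo` class and the `validate` function are hypothetical names and are not part of any DataWorks API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerBasicInfo:
    """Hypothetical record that mirrors the fields of the Basic Information step."""
    name: str                            # Crawler Name (required, must be unique)
    workspace: str                       # Workspace of the data source
    description: str = ""                # Crawler Description (optional)
    data_source_type: str = "Hologres"   # Fixed value for this crawler type


def validate(info: CrawlerBasicInfo, existing_names: set[str]) -> list[str]:
    """Return a list of problems. An empty list means the input looks valid."""
    problems = []
    if not info.name.strip():
        problems.append("Crawler Name is required.")
    elif info.name in existing_names:
        problems.append(f"Crawler Name '{info.name}' is already in use.")
    if info.data_source_type != "Hologres":
        problems.append("Data Source Type must be Hologres.")
    return problems
```

For example, `validate(CrawlerBasicInfo(name="holo_daily", workspace="my_workspace"), {"holo_daily"})` reports that the name is already in use.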
Click Next.
Select a Hologres data source.
In the Select Collection Object step, select a data source from the Data Source drop-down list.
You can collect metadata only from the Hologres instances that are associated with your workspace as compute engine instances. If no data source is available, click Create to create one. For more information, see Add a Hologres data source.
Click Start Testing next to Test Crawler Connectivity. If the message The connectivity test is successful appears, the DataWorks metadata service can connect to the Hologres data source.
Note: If the message The connectivity test failed appears, you must find the cause of the connection error and troubleshoot the issue.
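When you troubleshoot a failed connectivity test, one basic check is whether the endpoint of the Hologres instance accepts TCP connections at all. The following sketch performs that check from the machine on which you run it. It does not reproduce the test that the DataWorks metadata service runs, and it does not verify credentials or permissions. The host and port in the usage comment are placeholders that you must replace with the endpoint of your own instance.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage (placeholder endpoint and port, replace with your own values):
# can_reach("my-instance.example.com", 80)
```

A result of `False` points to a network-level cause such as a wrong endpoint, a firewall rule, or a whitelist setting. A result of `True` only shows that the port is reachable from your machine.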
Click Next.
Configure an execution plan.
In the Configure Execution Plan step, configure an execution plan.
Valid values of the Execution Plan parameter are On-demand Execution, Monthly, Weekly, Daily, and Hourly. The execution plan that is generated varies based on the execution cycle. The system collects metadata from the Hologres data source based on the execution cycle that you specify. The following descriptions provide the details:
On-demand Execution: The system collects metadata from the Hologres data source only when you manually run the crawler.
Monthly: The system automatically collects metadata from the Hologres data source once at a specific time on several specific days of each month.
Important: Some months do not have a 29th, 30th, or 31st day. In these months, the system does not collect metadata from the Hologres data source on the missing dates. We recommend that you do not select the last few days of a month.
The following figure shows that the system automatically collects metadata from the Hologres data source once at 09:00 on the 1st, 11th, and 21st day of each month. An expression is automatically generated for the Cron Expression parameter based on the values of the Date and Time parameters.
Weekly: The system automatically collects metadata from the Hologres data source once at a specific time on several specific days of each week.
The following figure shows that the system automatically collects metadata from the Hologres data source once at 03:00 on Sunday and Monday of each week. An expression is automatically generated for the Cron Expression parameter based on the values of the Week and Time parameters. If the Time parameter is not set, the system automatically collects metadata from the Hologres data source once at 00:00:00 on the specified days of each week.
Daily: The system automatically collects metadata from the Hologres data source once at a specific time of each day.
The following figure shows that the system automatically collects metadata from the Hologres data source once at 01:00 each day. An expression is automatically generated for the Cron Expression parameter based on the value of the Time parameter.
Hourly: The system automatically collects metadata from the Hologres data source once at the (N × 5)th minute of each hour.
Note: For a Hologres metadata collection task that is run each hour, you can set the Minute value only to a multiple of 5.
The following figure shows that the system automatically collects metadata from the Hologres data source at the 5th and 10th minutes of each hour. An expression is automatically generated for the Cron Expression parameter based on the values of the Minutes parameter.
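To make the relationship between an execution cycle and its cron expression concrete, the following sketch builds expressions for the four scheduled cycles described above. It assumes a Quartz-style layout with six fields (second, minute, hour, day of month, month, day of week). The exact format that DataWorks generates is not stated in this topic and may differ. All function names are hypothetical. The `months_skipping` helper illustrates which months are skipped when you select the 29th, 30th, or 31st day.

```python
import calendar

WEEKDAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def monthly(days, hour, minute=0):
    """Once at hour:minute on each selected day of every month."""
    if not days or any(d < 1 or d > 31 for d in days):
        raise ValueError("days must be between 1 and 31")
    day_field = ",".join(str(d) for d in sorted(days))
    return f"0 {minute} {hour} {day_field} * ?"


def weekly(weekdays, hour=0, minute=0):
    """Once at hour:minute on each selected weekday (00:00 if no time is given)."""
    if not weekdays or any(w not in WEEKDAYS for w in weekdays):
        raise ValueError(f"weekdays must be chosen from {WEEKDAYS}")
    return f"0 {minute} {hour} ? * {','.join(weekdays)}"


def daily(hour, minute=0):
    """Once at hour:minute every day."""
    return f"0 {minute} {hour} * * ?"


def hourly(minutes):
    """At the selected minutes of every hour; each must be a multiple of 5."""
    if not minutes or any(m % 5 != 0 or not 0 <= m < 60 for m in minutes):
        raise ValueError("minutes must be multiples of 5 between 0 and 55")
    return f"0 {','.join(str(m) for m in sorted(minutes))} * * * ?"


def months_skipping(day, year):
    """Months of the given year that have no such day, so no run happens."""
    return [m for m in range(1, 13) if calendar.monthrange(year, m)[1] < day]


print(monthly([1, 11, 21], hour=9))    # 0 0 9 1,11,21 * ?
print(weekly(["SUN", "MON"], hour=3))  # 0 0 3 ? * SUN,MON
print(daily(hour=1))                   # 0 0 1 * * ?
print(hourly([5, 10]))                 # 0 5,10 * * * ?
print(months_skipping(31, 2023))       # [2, 4, 6, 9, 11]
```

The four scheduled examples correspond to the settings that the figures illustrate. The last line shows why selecting the 31st day results in no collection in five months of the year.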
Click Next.
Confirm the settings of the crawler.
In the Confirm Information step, check the information that you specified.
Click Confirm.
On the Hologres Metadata Crawler page, you can view the information about your crawler and manage your crawler.
The following descriptions show the information that you can view and the operations that you can perform:
You can view the status and execution plan of the crawler. You can also view the time when the last execution started, the amount of time consumed for the last execution, the average amount of time consumed, the number of updated tables in the last execution, and the number of created tables in the last execution.
You can click Details, Edit, Delete, Run, or Stop in the Actions column to perform the desired operation.
Details: View the crawler name and the data source and execution plan configured for the crawler.
Edit: Modify the configurations of the crawler.
Delete: Delete the crawler.
Run: Run a task to collect metadata from the Hologres data source. The Run button is available only if the Execution Plan parameter is set to On-demand Execution.
Stop: Stop the crawler. The Stop button is displayed only if a crawler is in the Pending state.
Result
After the metadata of the Hologres data source is collected, click All Data in the top navigation bar and select Hologres from the drop-down list in the upper part of the page. You can then view the Hologres tables whose metadata was collected.