In Data Map, you can use the external catalog feature of a StarRocks data source to collect the metadata of external data sources. This topic describes the required configurations for creating an external catalog to collect the metadata of an external data source. After you complete the configurations and collect the metadata of an external data source, you can search for and view the metadata, such as the information about the tables and fields of the external data source, in Data Map.
Background information
By default, Data Map obtains only the metadata of the StarRocks internal catalog after you add a StarRocks data source to DataWorks and start metadata collection. If you want to obtain the metadata of an external data source by using the StarRocks external catalog feature, you must establish a connection with the external data source that is associated with the external catalog in DataWorks and collect the metadata. After the collection is complete, Data Map automatically associates the metadata of the external data source with the external catalog. You can view the external catalog and the associated metadata of the external data source in the StarRocks data source.
Prerequisites
A DataWorks workspace is created. For more information, see Create a workspace.
A StarRocks database is added to DataWorks as a StarRocks data source. For more information, see StarRocks data source.
If you want to collect the metadata of an external data source for which an IP address whitelist is enabled, make sure that the permissions related to the IP address whitelist are configured. For more information, see Configure IP address whitelists for metadata collection.
Limits
Elasticsearch external catalogs are not supported.
Object Storage Service (OSS) Paimon catalogs are not supported.
Procedure
In this topic, a MySQL external catalog supported by StarRocks is used to collect the metadata of a MySQL data source. You create a MySQL database named mysql_catalog_db
. In Data Map, you create and configure a metadata crawler of the MySQL data source type. After you use the metadata crawler to collect the metadata of the MySQL database, you can search for and view the metadata in the mysql_catalog_db
database on the Search page where StarRocks is selected on the Data Source tab.
Step 1: Prepare data
Create a MySQL data source
Create a MySQL data source based on the mysql_catalog_db
database and create a sample table named mysql_catalog_table
. Sample script:
CREATE TABLE mysql_catalog_table(
catalog_table_id INT,
catalog_table_name VARCHAR(255)
)
Prepare the MySQL JDBC driver package
You must upload the MySQL Java Database Connectivity (JDBC) driver package of the corresponding version to OSS.
To upload the Java ARchive (JAR) package of the JDBC driver of the same version as your MySQL database to OSS, log on to the OSS console and click Buckets in the left-side navigation pane.
On the Buckets page, find the desired bucket and click the bucket name to go to the Objects page. In this example, the
catalog-bucket-oss
bucket is used.On the Objects page, click Create Directory to create a directory that is used to store the JAR package. In the Create Directory panel, set Directory Name to
libs
and click OK.Go to the directory where the JDBC driver JAR package is stored. Click Upload Object. In the Files to Upload section, click Select Files and add the
mysql-connector-java-8.0.28.jar
JDBC driver JAR package. Then, click Upload Object.Find the JDBC driver JAR package that you uploaded and click View Details in the Actions column. In the View Details panel, click Set ACL. In the Set ACL panel, set the ACL parameter to Public Read/Write and click OK.
Step 2: Establish a connection with the external data source
Go to the query list of the StarRocks instance.
Log on to the EMR console. In the left-side navigation pane, choose . The Instances tab appears.
Find the StarRocks instance that you created and click Connect in the Actions column. On the page that appears, click the New Connection tab.
On the New Connection tab, set the Region parameter to the region where the created StarRocks instance is deployed and the Instance parameter to the name of the created StarRocks instance, configure the Connection Name, Username, and Password parameters, and then click Test Network Connectivity. After the instance is connected, click OK to go to the query list of the StarRocks instance.
Establish a connection with the MySQL data source in the StarRocks data source.
On the Queries tab, click + File. In the Create a file dialog box, configure the Name and Storage path parameters and click Confirm.
Double-click the name of the file that you created under All files to go to the configuration tab of the StarRocks instance. Enter the following sample external connection script. For more information, see the Sample configuration of a StarRocks external catalog section in this topic.
CREATE EXTERNAL CATALOG mysql_db_catalog PROPERTIES ( "driver_class" = "com.mysql.cj.jdbc.Driver", "driver_url" = "https://catalog-bucket-oss.oss-cn-hangzhou-internal.aliyuncs.com/libs/mysql-connector-java-8.0.28.jar", "type" = "jdbc", "user" = "<UserName>", "password"="<PassWord>", "jdbc_uri" = "jdbc:mysql://xxx:3306/mysql_catalog_db" );
NoteThe
mysql_db_catalog
parameter specifies the name of the external catalog that you created for the MySQL data source.Configure the
UserName
andPassWord
parameters based on your business requirements.jdbc_uri
: Enter the Uniform Resource Identifier (URI) of the external database that you created.
After the script is edited, click Run to run the script. After the script is run, you can view the information about the related table on the Databases tab.
Step 3: Configure the data sources
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Workspace in the left-side navigation pane. On the Workspaces page, find the workspace that you created and click Manage in the Actions column. The SettingCenter page appears.
In the left-side navigation pane of the SettingCenter page, choose Add a StarRocks data source and Add a MySQL data source.
. On the Data Sources page, add the StarRocks and MySQL data sources. For more information, seeNoteSet the Configuration Mode parameter to Connection String Mode when you add a MySQL data source. Set the Configuration Mode parameter to Connection String Mode when you add an external data source of the JDBC type.
Step 4: Configure a metadata crawler
You must log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, click Go to Data Map. Configure a metadata crawler in Data Map.
Collect the metadata of the StarRocks internal catalog
In the left-side navigation pane of the DataMap page, click the
icon. On the page that appears, find the StarRocks data source type and click Manage in the upper-right corner.
On the StarRocks Metadata Collection page, click the Data Sources for Which No Crawler Is Created tab. On the tab, set the Workspace parameter to your workspace and the Data Source Name parameter to the name of the StarRocks data source that you added in SettingCenter.
Configure a metadata crawler.
Find the desired StarRocks data source and click Create Crawler in the Actions column. In the Configure Collection Plan dialog box, set the Resource Group Name parameter to the name of the resource group that you created and click Test Network Connectivity. Wait until the connectivity test is successful, configure the Collection Plan parameter, and then click Confirm.
Run the metadata crawler.
Click the Data Sources for Which Crawler Is Created tab. On the tab, find the collection plan that you configured. Click Run in the Actions column. Wait until the Status of the collection plan changes to Successful.
Collect the metadata of the StarRocks external catalog
You must perform the steps for collecting the metadata of the StarRocks internal catalog in Step 4: Configure a metadata crawler to collect the metadata of the MySQL data source. Otherwise, the external catalog information of the MySQL data source cannot be found.
If you create a metadata crawler for a MySQL data source, select the MySQL data source type on the Collect Metadata page.
Step 5: Search for metadata
Wait until the StarRocks and MySQL metadata collection tasks are complete. Then, click the icon in the left-side navigation pane of the DataMap page to go to the search page.
On the Data Source tab in the Type section, click StarRocks. In the Filter Conditions section, set the Instance parameter to the StarRocks instance that you created, the Data Catalog parameter to the name of the external data storage catalog that you created for the MySQL data source, and the Database parameter to the corresponding MySQL database. You can view the MySQL catalog information in the StarRocks data source. The following figure shows the result.
Click MySQL on the Data Source tab in the Type section. In the Filter Conditions section, set the Database parameter to the name of the MySQL database you created. Then, check whether the table information is the same.
View the table details.
You can click the table name to view the table details, as shown in the following figure.
The following figure shows the details.
Sample configuration of a StarRocks external catalog
Syntax used by the sample configuration of a StarRocks external catalog
CREATE EXTERNAL CATALOG <Catalog_Name> COMMENT '' PROPERTIES("type"="","xxx1"="","xxx2"="");
The Catalog_Name
parameter specifies the name of the external catalog. You can configure the catalog name based on your business requirements.
The following table describes how to use the StarRocks external catalog feature to collect the metadata of external data sources. For more information.
Collection method | Connection method | Sample configuration of a StarRocks external catalog |
Default Catalog | default | If the default catalog is used, the internal metadata of StarRocks is collected by default. In this case, you do not need to configure an external catalog. For more information, see Overview of catalogs. |
ODPS Catalog | VPC |
Note You must replace |
Hive Catalog | Hive Metastore (HMS) |
Note Replace the value of the |
Data Lake Framework (DLF) |
Note Replace the value of the | |
Iceberg Catalog | Hive |
Note If you use Hive Metastore as the metadata service when you configure an Iceberg external catalog, replace the value of the |
Hudi Catalog | Hive |
Note If you use Hive Metastore as the metadata service when you configure a Hudi external catalog, replace the value of the |
Data Lake Framework (DLF) |
Note If you use Data Lake Formation (DLF) as the metadata service when you configure a Hudi external catalog, replace the value of the | |
Delta Lake Catalog | Hive |
Note
|
Data Lake Framework (DLF) |
Note Replace the value of the | |
JDBC Catalog | MySQL |
Note
|
PostgreSQL |
Note
| |
Paimon Catalog | Hive (StarRocks 3.1 and later) |
Note If you use Hive as the metadata service when you configure a Paimon external catalog, take note of the following items:
|
Data Lake Formation (StarRocks 3.1 and later) |
Note If you use dlf as the metadata service when you configure a Paimon external catalog, take note of the following items:
| |
Unified Catalog | Hive (StarRocks 3.2 and later) |
Note If you use Hive as the metadata service when you configure a unified external catalog, replace the value of the |