Dataphin can use ArgoDB as an offline compute engine to run offline computing tasks. This topic describes how to create an ArgoDB compute source.
Prerequisites
You have initialized the TDH Inceptor metadata warehouse compute engine and set the compute engine to TDH Inceptor. For more information, see Initialize a metadata warehouse by using TDH as the metadata warehouse compute engine and Set the compute engine of a Dataphin instance to TDH or ArgoDB.
Creating an ArgoDB compute source requires a TDH Inceptor metadata warehouse compute engine. Other types of metadata warehouse compute engines are not supported.
Background information
ArgoDB is a distributed analytical database developed by Transwarp. It is designed to replace hybrid architectures that combine Hadoop with Massively Parallel Processing (MPP) databases. It supports standard SQL and provides capabilities such as multi-model data analytics, real-time data processing, decoupled storage and compute, and hybrid deployment on heterogeneous servers. For more information, visit the official ArgoDB website.
Limitations
-
When you use MySQL Metadatabase, ArgoDB System Library, or HMS to acquire metadata, some information may be unavailable or inaccurate, as described below.
-
If Metadata Acquisition Method is set to MySQL Metadatabase or HMS:
-
Data volume for Asset Panorama, Data Portal, and projects is unavailable.
-
Table data volume, partition data volume, and partition record counts in Asset Catalog are unavailable.
-
Storage-related metrics in Resource Management are inaccurate.
-
Data volume and record counts for dim_dataphin_table and dim_dataphin_partition in the metadata warehouse sharing model are unavailable.
-
If Metadata Acquisition Method is set to ArgoDB System Library:
-
Partition record counts in Asset Catalog are unavailable.
-
Data volume and partition data volume for holodesk tables in Asset Catalog are unavailable.
-
Record counts for dim_dataphin_table and dim_dataphin_partition, and data volume for holodesk tables, in the metadata warehouse sharing model are unavailable.
-
If the HDFS connection uses Non-Kerberos Authentication and the ArgoDB configuration uses Non-LDAP Authentication, unknown issues may occur. Before you use this combination, contact the Dataphin operations and deployment team for confirmation.
-
Other limitations:
-
Dataphin does not support table management when ArgoDB is used as the compute source.
-
Salted hash desensitization algorithms (salted SHA256, salted SHA384, salted SHA512, and salted MD5) and the Gaussian noise (GaussianNoise) desensitization algorithm are not supported.
-
SQL dialects such as Oracle, IBM DB2, and Teradata are not supported, nor are Oracle and DB2 stored procedures. Using them may cause errors during SQL execution.
-
A range-partitioned table supports only Data Query Language (DQL) statements and a limited number of Data Definition Language (DDL) and Data Manipulation Language (DML) statements.
Procedure
-
On the Dataphin homepage, click Plan.
-
In the left-side navigation pane, choose Projects > Compute Source.
-
On the Compute Source page, click + New Compute Source and select ArgoDB Compute Source from the drop-down list.
-
On the Create Compute Source page, configure the parameters.
a. Configure Basic Information.
Compute Source Type: Select ArgoDB.
Compute Source Name: The name must meet the following requirements:
-
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
-
It can be up to 64 characters long.
Compute Source Description: A brief description of the compute source.
b. Configure Basic Cluster Information.
namenode: This field is pre-filled with the NameNode value from your compute settings and cannot be modified.
core-site.xml, hdfs-site.xml, hive-site.xml, yarn-site.xml, Other Configuration Files: Upload the HDFS configuration files core-site.xml and hdfs-site.xml, the Hive configuration file hive-site.xml, and the YARN configuration file yarn-site.xml. If you have other configuration files, upload them under Other Configuration Files.
Authentication Method: If your ArgoDB cluster uses Kerberos authentication, select Kerberos. Kerberos is a symmetric-key authentication protocol that supports single sign-on (SSO), which allows a client to access multiple services, such as HBase and HDFS, after authenticating once.
If you select Kerberos, you must upload a krb5 file or specify the KDC server address:
-
Krb5 File Configuration: Upload the krb5 file used for Kerberos authentication.
-
KDC Server Address: The address of the Key Distribution Center (KDC) server that assists with Kerberos authentication. You can specify multiple KDC server addresses, separated by commas (,).
c. Configure parameters in the HDFS Connection Information section.
Execution Username, Password: The username and password used to log in to the execution machine to run MapReduce tasks and to read data from and write data to HDFS.
Important: Make sure that the specified user has the permissions required to submit MapReduce tasks.
Authentication Method: If your HDFS uses Kerberos authentication, select Kerberos. Kerberos is a symmetric-key authentication protocol that supports SSO, which allows a client to access multiple services, such as HBase and HDFS, after authenticating once.
-
If you select Kerberos, you must upload a keytab file and configure the principal:
Keytab File: Upload the keytab file used for Kerberos authentication.
Principal: The Kerberos principal name.
-
If you select no authentication, you must specify the username used to access HDFS.
d. Configure parameters in the ArgoDB Configuration section.
JDBC URL: The JDBC connection address of Hive Server, in the format jdbc:hive2://InceptorServerIP:Port/Database.
Authentication Method: Select the authentication method that matches your ArgoDB configuration. Supported options are No Authentication, LDAP, and Kerberos:
-
No Authentication: No authentication is required.
-
LDAP: Provide the username and password for access.
-
Kerberos: Upload a Kerberos authentication file and provide the principal.
Execution User for Development Tasks: Depending on the selected authentication method, configure the username and password, or upload a Kerberos authentication file and provide the principal, for tasks in the development environment.
Execution User for Periodically Scheduled Tasks: Depending on the selected authentication method, configure the username and password, or upload a Kerberos authentication file and provide the principal, for periodically scheduled tasks.
Priority Task Queue: Specify how the execution user for priority tasks is determined. Valid values: Use Default Execution User and Custom.
If you select Custom, configure a separate username for each task priority level.
Note: Priority queues allocate resources by creating different YARN queues on the Hadoop cluster. Each task is submitted to the YARN queue that corresponds to its priority level.
e. Configure ArgoDB Metadata Connection Information.
Metadata Acquisition Method: Acquire metadata from a metadatabase or from Hive Metastore (HMS). To use HMS, first upload the hdfs-site.xml, hive-site.xml, and core-site.xml files and configure the authentication method in the Basic Cluster Information section.
Database Type: Select the metadatabase type of ArgoDB. Currently, only ArgoDB is supported.
JDBC URL: The connection address of the metadatabase, in the format jdbc:postgresql://{host}:{port}/{database name}.
Username, Password: The username and password used to log in to the metadatabase.
Note: Make sure that the specified account has the data permissions required for tasks to run as expected.
-
Click Test Connection.
-
After the connection test succeeds, click Submit.
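Before you click Test Connection, it can help to sanity-check the values entered on the Create Compute Source page. The following Python sketch is a hypothetical, client-side check only: it does not call any Dataphin API, and Dataphin applies its own validation. It encodes the documented Compute Source Name rules (assuming that "Chinese characters" can be approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF), splits a comma-separated KDC Server Address value, and checks that the two JDBC URLs follow the documented shapes jdbc:hive2://InceptorServerIP:Port/Database and jdbc:postgresql://{host}:{port}/{database name}. The function names, host names, and ports are illustrative assumptions.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Compute Source Name: only Chinese characters, letters, digits, underscores (_),
# and hyphens (-), at most 64 characters. Assumption: "Chinese characters" is
# approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and an
# empty name is treated as invalid.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_compute_source_name(name: str) -> bool:
    """Return True if the whole name matches the documented naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


def parse_kdc_addresses(value: str) -> list[str]:
    """Split the KDC Server Address field on commas, dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_jdbc_url(url: str, expected_scheme: str) -> tuple[str, int, str]:
    """Check a jdbc:<scheme>://host:port/database URL and return its parts.

    Raises ValueError if the URL does not have that shape. Anything after a
    ';' in the path (for example, Hive session settings) is ignored here.
    """
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError(f"JDBC URL must start with {prefix!r}: {url}")
    parts = urlsplit(url[len(prefix):])
    if parts.scheme != expected_scheme:
        raise ValueError(f"expected scheme {expected_scheme!r}, got {parts.scheme!r}")
    # parts.port itself raises ValueError for a non-numeric or out-of-range port.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"JDBC URL must include a host and a port: {url}")
    database = parts.path.lstrip("/").split(";", 1)[0]
    if not database:
        raise ValueError(f"JDBC URL must include a database name: {url}")
    return parts.hostname, parts.port, database


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(is_valid_compute_source_name("argodb_prod-01"))  # True
    print(is_valid_compute_source_name("argodb prod"))     # False: contains a space
    print(parse_kdc_addresses("kdc1.example.com, kdc2.example.com"))
    print(parse_jdbc_url("jdbc:hive2://192.168.0.10:10000/default", "hive2"))
    print(parse_jdbc_url("jdbc:postgresql://meta.example.com:5432/metastore", "postgresql"))
```

The sketch only checks the shape of the values; it does not contact any server or verify credentials, so Test Connection in Dataphin remains the way to confirm connectivity.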
Next steps
After creating the ArgoDB compute source, you can bind it to a project. For more information, see Create a general-purpose project.