Dataphin supports the integration of ArgoDB as an offline compute engine, enabling the handling of offline computing tasks within Dataphin projects. This topic outlines the steps to create an ArgoDB compute source.
Prerequisites
Ensure the TDH Inceptor metadata warehouse compute engine has been initialized and set as the compute engine for the Dataphin instance. For more information, see Initialize the metadata warehouse using TDH as the compute engine and Set the Dataphin instance's compute engine to TDH or ArgoDB.
Note that only the TDH Inceptor metadata warehouse compute engine supports the creation of an ArgoDB compute source. Other metadata warehouse compute engines do not offer this capability.
Background information
ArgoDB, developed by Transwarp, is a distributed analytic database that can replace a Hadoop + MPP hybrid architecture. It supports business development with standard SQL syntax and offers advanced features such as multi-model data analysis, real-time data processing, storage-compute decoupling, and hybrid deployment on heterogeneous servers.
Limits
Certain metadata may not be retrievable or may be inaccurate when using MySQL metadatabase, ArgoDB System database, or HMS as the metadata retrieval method, as outlined below:
If Metadata retrieval method is set to MySQL metadatabase or HMS:
Data volumes for the asset overview, data sections, and projects cannot be retrieved.
Table data volumes, partition data volumes, and partition record counts in the asset directory are not retrievable.
Storage-related metrics in resource administration may be inaccurate.
Data volumes and record counts for dim_dataphin_table and dim_dataphin_partition in the metadata warehouse shared model are not retrievable.
If Metadata retrieval method is set to ArgoDB System database:
Partition record count information in the asset directory is not retrievable.
Data volumes for Holodesk tables and partition data volumes in the asset directory are not retrievable.
In the metadata warehouse shared model, record counts for dim_dataphin_table and dim_dataphin_partition, as well as data volumes for tables in Holodesk format, are not retrievable.
If the HDFS connection information uses non-Kerberos authentication and the ArgoDB configuration uses non-LDAP authentication, unexpected issues may arise. Consult the Dataphin operations and deployment team before proceeding.
Additional usage limitations include:
Table management is not supported when using ArgoDB as the compute engine.
Salted hash algorithms (including salted SHA256, salted SHA384, salted SHA512, salted MD5) and the Gaussian noise desensitization algorithm (GaussianNoise) are not supported.
Oracle, IBM DB2, and Teradata dialects, as well as Oracle and DB2 stored procedures, are not supported; executing such SQL may result in errors.
Range-partitioned tables support only DQL statements and a limited set of DDL and DML statements.
Procedure
Navigate to the Dataphin home page and click Planning.
In the left-side navigation pane, select Project > Compute Source.
On the Compute Source page, click + Add Compute Source, and from the drop-down menu, select ArgoDB Compute Source.
On the Create Compute Source page, fill in the necessary parameters.
a. Configure the basic information for the compute engine source.
Parameter
Description
Compute Source Type
Choose ArgoDB as the compute source type.
Compute Source Name
The name must adhere to the following criteria:
Only include Chinese characters, numbers, uppercase and lowercase English letters, underscores (_), and hyphens (-).
Be no longer than 64 characters.
Compute Source Description
Provide a brief description of the compute source.
b. Configure the basic information for the cluster.
Parameter
Description
namenode
Defaults to the NameNode parameter value configured in the compute settings. No modifications are allowed.
core-site.xml, hdfs-site.xml, hive-site.xml, Other Configuration Files
Upload the HDFS configuration files core-site.xml and hdfs-site.xml, and the Hive configuration file hive-site.xml.
If additional configuration files are needed, upload them in the other configuration files section.
Authentication Type
If the ArgoDB cluster uses Kerberos authentication, select Kerberos as the authentication type. Kerberos is an identity authentication protocol based on symmetric-key cryptography; after a client is authenticated once, it can access multiple services such as HBase and HDFS (single sign-on, SSO).
Upon selecting Kerberos authentication, upload the Krb5 authentication file or configure the KDC Server address:
Krb5 File Configuration: Upload the Krb5 file required for Kerberos authentication.
KDC Server Address: Provide the KDC server address to facilitate Kerberos authentication. Multiple addresses can be entered, separated by commas.
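For reference, the following Java sketch shows how a JVM-based client is typically pointed at either a krb5 file or an explicit KDC address and realm. The file path, host name, and realm are placeholder values and are not taken from this topic.

    public class KerberosClientConfig {
        public static void main(String[] args) {
            // Option 1: point the JVM at a krb5 configuration file
            // (equivalent to uploading a Krb5 file for the compute source).
            System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");

            // Option 2: specify the KDC server address and realm directly
            // (comparable to configuring the KDC Server Address instead of a Krb5 file).
            // System.setProperty("java.security.krb5.kdc", "kdc01.example.com:88");
            // System.setProperty("java.security.krb5.realm", "EXAMPLE.COM");
        }
    }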
c. Configure the HDFS Connection Information Area Parameters.
Parameter
Description">Description
Execution Username, Password
Enter the username and password required for logging into the compute execution machine, which are used for executing MapReduce tasks, reading and writing HDFS, and other operations.
Important: Ensure that the provided credentials have the necessary permissions to submit MapReduce tasks.
Authentication Type
Select Kerberos as the authentication method if HDFS is secured with Kerberos authentication. Kerberos provides identity authentication for services and supports Single Sign-On (SSO), allowing authenticated clients to access multiple services such as HBase and HDFS.
Upon selecting Kerberos authentication, upload the Keytab file and configure the Principal:
Keytab File: Upload the Keytab file necessary for Kerberos authentication.
Principal: Specify the Kerberos authentication username.
If no authentication is selected, configure the username for HDFS access.
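For reference, the following Java sketch illustrates a keytab-based Kerberos login against HDFS using the standard Hadoop client API. The configuration paths, principal, and keytab location are placeholder values and are not taken from this topic.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class HdfsKeytabLogin {
        public static void main(String[] args) throws Exception {
            // Load the same core-site.xml and hdfs-site.xml that are uploaded for the compute source.
            Configuration conf = new Configuration();
            conf.addResource(new Path("/path/to/core-site.xml"));
            conf.addResource(new Path("/path/to/hdfs-site.xml"));
            conf.set("hadoop.security.authentication", "kerberos");

            // Authenticate with the Principal and Keytab file configured in this area.
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab("dataphin@EXAMPLE.COM", "/path/to/dataphin.keytab");
            System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
        }
    }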
d. Configure the ArgoDB configuration area parameters.
Parameter
Description
JDBC URL
Enter the JDBC connection address for the Hive Server in the format jdbc:hive2://InceptorServerIP:Port/Database. A connection example sketch is provided after this parameter list.
Authentication Type
Choose the appropriate authentication method for ArgoDB based on the engine's configuration. Options include No Authentication, LDAP, and Kerberos:
No Authentication: No additional authentication required.
LDAP: Configure the username and password for LDAP access.
Kerberos: Upload the Kerberos authentication file and specify the Principal.
Execution User For Development Environment Tasks
Depending on the selected authentication method, configure the execution username and password, or upload the Kerberos authentication file and specify the Principal for tasks in the development environment.
Execution User For Periodic Scheduling Tasks
Configure the execution username and password, or upload the Kerberos authentication file and specify the Principal for periodic scheduling tasks, based on the authentication method.
Priority Task Queue
Choose either Use Default Execution User or Custom as the execution user for priority queues.
If you choose Custom, configure the username used to execute tasks of each priority.
Note: Priority queues allocate resources by creating different YARN queues on the Hadoop cluster, with tasks of corresponding priorities executed in their respective YARN queues.
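For reference, the following Java sketch verifies connectivity to the ArgoDB (Inceptor) JDBC endpoint using a Hive-compatible JDBC driver. The IP address, port, database, and LDAP credentials are placeholder values; Transwarp clusters may ship their own Inceptor JDBC driver rather than the Apache Hive driver assumed here.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ArgoDbJdbcCheck {
        public static void main(String[] args) throws Exception {
            // Register a Hive-compatible JDBC driver (assumption; the cluster's own driver may differ).
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Same format as the JDBC URL parameter: jdbc:hive2://InceptorServerIP:Port/Database
            String url = "jdbc:hive2://192.168.0.10:10000/default";

            // For LDAP authentication, pass the LDAP username and password;
            // for No Authentication, the credentials can typically be left empty.
            try (Connection conn = DriverManager.getConnection(url, "ldap_user", "ldap_password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println("Connectivity check result: " + rs.getInt(1));
                }
            }
        }
    }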
e. Configure the ArgoDB metadata connection information.
Parameter
Description
Metadata Retrieval Method
Choose between metadatabase and HMS for metadata retrieval. When using HMS, upload the hdfs-site.xml, hive-site.xml, and core-site.xml configuration files and configure the authentication method.
Database Type
Select the metadatabase type. Currently, only ArgoDB is supported.
JDBC URL
Enter the connection address for the metadatabase in the format jdbc:postgresql://<host>:<port>/<database name>. A connection example sketch is provided after this parameter list.
Username, Password
Enter the username and password for the metadatabase login.
Note: To ensure proper task execution, verify that the user has the necessary database permissions.
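For reference, the following Java sketch verifies connectivity to the metadatabase using the PostgreSQL JDBC driver. The host, port, database name, and credentials are placeholder values and are not taken from this topic.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class MetadataDbCheck {
        public static void main(String[] args) throws Exception {
            // Same format as the metadatabase JDBC URL parameter:
            // jdbc:postgresql://<host>:<port>/<database name>
            String url = "jdbc:postgresql://192.168.0.20:5432/metastore";

            // Username and password of the metadatabase account configured above.
            try (Connection conn = DriverManager.getConnection(url, "meta_user", "meta_password")) {
                System.out.println("Metadatabase connection OK: " + conn.isValid(5));
            }
        }
    }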
Click Test Connection to verify the configurations.
Once the connection test is successful, click Submit to finalize the creation of the ArgoDB compute source.
What to do next
Once you have established the ArgoDB compute source, you can associate it with your project. For detailed instructions, see Create a general project.