This topic describes how to connect to Dataphin using the Java Database Connectivity (JDBC) driver.
Prerequisites
Dataphin JDBC uses the Dataphin OpenAPI to run SQL queries. Before you can use Dataphin JDBC, you must enable the Dataphin OpenAPI feature.
Function overview
Two authentication modes: Dataphin JDBC supports two authentication modes: basic mode and proxy mode. For more information, see Authentication modes.
Consistent execution results: Running an SQL statement using Dataphin JDBC is equivalent to running an SQL statement in Dataphin. Therefore, the SQL execution results are consistent with the results from the Dataphin UI. Configurations in Dataphin, such as permissions, data masking rules, security settings, and code specifications, also affect the execution results of Dataphin JDBC.
Limits
Dataphin does not support JDBC connections if the compute engine is Databricks, SelectDB, or Doris.
When Dataphin JDBC runs an SQL statement, Dataphin performs pre-processing operations, such as SQL translation and data masking. The result data is also transmitted and forwarded through Dataphin. These processes cause performance degradation. Therefore, the query takes longer than a direct query to the compute engine.
Dataphin JDBC runs SQL statements by submitting them to Dataphin through the Dataphin OpenAPI. Dataphin then processes the statements and submits them to the underlying compute engine for execution. Before you use Dataphin JDBC, you must assess the number of calls and determine whether you need to scale out the Dataphin cluster and compute engine.
NoteIf you cannot assess the capacity, you can submit a capacity assessment request for the Dataphin cluster to the Dataphin Operations and Maintenance (O&M) team. This request does not include a capacity assessment for the compute engine.
Dataphin does not support throttling or concurrency control. You must carefully assess the impact.
JDBC driver version and download link
Contact a Dataphin O&M engineer to obtain the driver JAR package.
Connection parameters
When Dataphin JDBC uses the platform AccessKey ID and AccessKey secret (proxy mode), you can use the set syntax to specify the access user.
set dp_delegation_uid = 'target_source_user_uid';set dp_delegation_name = 'target_user_name';
JDBC URL format: jdbc:dataphin://host:port/catalog?[tenant_id=TenantID][&ssl=true][&log_level=Log_Level][&user=UserName][&password=PassWord][&delegation_uid=DelegationUid][&account_type=AccountType][&connect_timeout=ConnectTimeout][&engine=engine_type][compute_project=ProjectName]
Parameter names are case-sensitive.
The square brackets ([]) are for illustrative purposes only. You must remove them when you run the command.
Parameter | Required | Description | Example |
host | Yes | The domain name of the Dataphin OpenAPI. You can obtain the domain name of Dataphin from OpenAPI Call Address in Personal Center > AccessKey Management. The following figure shows an example.
| dataphin-openapi.****.aliyun.com |
port | No | The port number depends on whether HTTPS is enabled for the Dataphin OpenAPI. If HTTPS is enabled, the port is | 80 |
catalog | Yes | The default queryable data scope.
Note
| Exprojectname |
tenant_id | Yes | The ID of the tenant for the query. | 111***111 |
ssl | No | Specifies whether to use HTTPS.
| False |
currentschema | No | The schema of the data source.
| information_schema |
compute_project | No | The English name of the project. This project is used to run SQL queries. The project and its attached compute engine must have the read permission on the queried tables.
| Exprojectname |
user | Yes | The AccessKey ID of the user or the platform. In proxy mode, use the platform's AccessKey ID.
| kIB**********PT0 |
log_level | No | The logging level. Valid values:
| DEBUG |
password | Yes | The AccessKey secret of the access user.
| Cy**************r2T |
delegation_uid | No | When you use a platform AccessKey, you can act as a proxy for other users to establish connections and access Dataphin. This parameter specifies the Dataphin user to be proxied. For the Dataphin user to be proxied, you must pass the corresponding account ID based on the selected account_type. After you set this parameter, the proxy mode is used for authentication. | 999***999 |
account_type | No | When you use proxy mode for authentication, you must specify the account type of the proxied user.
Note
| USER_ID |
connect_timeout | No | The maximum timeout period for obtaining a connection, in seconds.
| 10 |
engine | No | The offline engine for the project specified by project_name in the session or connection string. If it is a Hadoop compute source, the default engine is Hive. You can set this parameter to Impala or Spark. To set the engine type, you must configure it in the compute source in advance. If the specified engine is not supported by the project, the setting is ignored, but an alert is reported. Valid values:
Note This parameter is ignored when you access a data source. | MaxCompute |
acceleration_source | No | Specifies the acceleration source. You can select the code of any acceleration source within the tenant. | starrocks_code |
acceleration_resource_group | No | The acceleration resource group. You can select a configured resource group for the selected acceleration source. | starrocks_resource_group |
Authentication modes
Basic mode
You can set the username to the user's AccessKey ID and the password to the user's AccessKey secret. The current access is then identified as that user. For more information about how to view the AccessKey, see Dataphin OpenAPI AccessKey management.
When you access Dataphin using the JDBC driver, Dataphin authenticates the AccessKey and authorizes the user of the AccessKey to access resources or run SQL statements. Dataphin then identifies the user who runs the SQL statements.
Proxy mode
Before you use proxy mode, you must contact the Dataphin O&M team to enable and configure this feature.
Proxy mode is typically used for integration between systems and Dataphin. This method avoids the need to distribute or configure AccessKeys for different users. By specifying the proxy user in the JDBC connection string (JDBC URL), you can ensure that operations are authorized based on the specified user. Platform-level AccessKeys have high permissions and can act as a proxy for all users. As shown in the following figure, the client is configured with a platform AccessKey. When userA accesses the client, you must specify delegation_uid=userA to indicate that this access is a proxy for userA. The function permissions and data permissions of userA are then used for permission verification.
Dataphin driver details
com.aliyun.dataphin.jdbc.DataphinDriver
Interface | Function description | Interface definition |
connect | Obtains a database connection. | |
acceptsURL | Checks whether the URL is supported. | |
com.aliyun.dataphin.jdbc.DataphinConnection
Interface | Description | Interface definition |
createStatement | Creates a Statement object. | |
prepareStatement | Creates a PreparedStatement object. | |
com.aliyun.dataphin.jdbc.DataphinStatement
Interface | Description | Interface definition |
executeQuery | Runs an SQL statement and returns a ResultSet object. | |
setFetchSize | Retrieves result data in batches based on the specified number of rows. If this parameter is not set or is set to 0, the default value is 1,000. | |
cancel | Cancels the execution of the Statement object. | |
com.aliyun.dataphin.jdbc.DataphinPrepareStatement
Interface | Description | Interface definition |
executeQuery | Runs an SQL statement and returns a ResultSet object. | |
com.aliyun.dataphin.jdbc.DataphinResultSetMetaData
Interface | Description | Interface definition |
getColumnCount | Obtains the number of columns in the result schema. | |
getColumnName | Obtains the column name of the result schema. | |
com.aliyun.dataphin.jdbc.ResultSet
Interface | Description | Interface definition |
next | Retrieves the data of SQL execution results row by row. | |
com.aliyun.dataphin.jdbc.DatabaseMetaData
Interface | Description | Interface definition |
getTables | Obtains table information.
| |
getColumns | Obtains the field information of a table.
| |
Examples
You can retrieve catalog information using the Dataphin JDBC driver.
1. Obtain a list of tables
This operation retrieves the list of physical tables and physical views in a project.
Syntax
SHOW TABLES
[FROM db_name]
[LIKE 'pattern']Parameter description
db_name:
The name of the Dataphin project. The physical tables in the specified project are displayed.
For development projects and data domains, you must explicitly add _Dev.
If you do not specify db_name, the tables in the project that is specified by project_name in the URL are displayed by default.
Return result
Name | Type | Comment |
dim_user | LOGICAL TABLE | User table. |
ods_user | PHYSICAL TABLE | User source table. |
ods_user_logical_view | LOGICAL_VIEW | Logical view. |
ods_user_physical_view2 | PHYSICAL_VIEW | Physical view. |
2. Obtain the schema of a table
This operation retrieves the field details of a physical table or a physical view.
Syntax
{DESCRIBE | DESC} table_name;Only physical tables and physical views are supported.
Parameter description
table_name: The name of the physical table or physical view.
Return result
Name | Type | Comment |
ID | BigInt | User ID. |
Name | String | Username. |
DS | String | Partition time. |
Server-side connection control
Maximum connections: The default value is 100.
Connection timeout: The default value is 288,000 s (2 hours). If a service connects to the database but remains idle for a specific period, the connection is automatically disconnected. If you perform an operation over this connection, an error is reported or the connection is closed unexpectedly. This corresponds to the Connection_Idle_Timeout parameter in the connection URL.
Information about Dataphin JDBC jobs submitted to MaxCompute instances for execution
When you submit a Dataphin JDBC job to a MaxCompute instance for execution, the system also submits Dataphin-related information. The following table describes the information.
Parameter | Description |
logical_project | The name of the Dataphin project where the JDBC job is run. |
EXT_JDBC_TASKRUN_ID | The ID of the JDBC job. |
EXT_DPN_TENANT_ID | The ID of the Dataphin tenant where the JDBC job is run. |
EXT_PLATFORM_ID | The ID of the upper-layer platform that submits the job to MaxCompute. The default value is Dataphin. |
biz_id | The ID of the Dataphin member who runs the JDBC job. |
odps.idata.userenv | The user environment information. This includes the Java SDK version, Java version, IP address, and device MAC address. Example:
|
You can use the submitted information to analyze MaxCompute bills and view the time consumed by jobs. For more information, see Collect statistics on accounts of top N costs and time-consuming jobs of MaxCompute projects.
