To run offline integration in Dataphin with OSS-based Hive external tables on an E-MapReduce (EMR) 5.x Hadoop compute engine, you must first configure OSS access settings. This topic describes how to configure these settings.
Configuration instructions
Before using this feature, configure the required parameters in the core-site.xml file of the Hive data source or Hadoop compute source, and then upload the updated file.
If Dataphin and OSS are in the same region, set the fs.oss.endpoint parameter in the core-site.xml file. If Dataphin and OSS are in different regions, set the fs.oss.accessKeyId and fs.oss.accessKeySecret parameters in addition to fs.oss.endpoint.
If fs.oss.endpoint is set to an internal endpoint, you do not need to configure fs.oss.accessKeyId or fs.oss.accessKeySecret.
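The core-site.xml file follows the standard Hadoop configuration format: a single <configuration> root element that holds one <property> element per setting. The following is a minimal sketch of the file layout for the same-region case, assuming the file already contains other properties of the data source or compute source (marked by a comment). The endpoint value is only an example; replace it with the endpoint for your region.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Keep the properties that the file already contains. -->

  <!-- OSS endpoint. Example value only; an internal endpoint needs no AccessKey pair. -->
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
  </property>
</configuration>
```

For the different-region case, add the fs.oss.accessKeyId and fs.oss.accessKeySecret properties inside the same <configuration> element rather than in a second root element.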
Configuration examples
Dataphin and OSS Are in the Same Region
<property>
<name>fs.oss.endpoint</name>
<value>oss-cn-hangzhou-internal.aliyuncs.com</value>
</property>
Dataphin and OSS Are in Different Regions
<!-- Use a public endpoint: internal endpoints are reachable only from the same region. -->
<property>
<name>fs.oss.endpoint</name>
<value>oss-cn-hangzhou.aliyuncs.com</value>
</property>
<property>
<name>fs.oss.accessKeyId</name>
<value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
<name>fs.oss.accessKeySecret</name>
<value>YOUR_ACCESS_KEY_SECRET</value>
</property>
Note: Set the <value> of fs.oss.endpoint according to the region of your OSS bucket. For more information, see the referenced document. Set the <value> of fs.oss.accessKeyId and fs.oss.accessKeySecret to the AccessKey ID and AccessKey secret of your account. To obtain an AccessKey pair, see Create AccessKey.
FAQ
What do I do if offline integration fails with the following error message?
com.alibaba.dt.pipeline.plugin.center.exception.DataXException: Code:[HDFSConnection-06], Description:[An IO exception occurred while establishing a connection with HDFS.]. - java.io.IOException: No FileSystem for scheme: oss
This error means that Hadoop has no FileSystem implementation registered for the oss:// scheme.
Add the following configuration to the core-site.xml file:
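<!-- Map the oss:// scheme to the JindoSDK OSS FileSystem implementation. -->
<property>
<name>fs.oss.impl</name>
<value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
</property>
<!-- AbstractFileSystem implementation used by the Hadoop FileContext API. -->
<property>
<name>fs.AbstractFileSystem.oss.impl</name>
<value>com.aliyun.jindodata.oss.OSS</value>
</property>
<!-- Disable the JindoFSx data cache. -->
<property>
<name>fs.jindofsx.data.cache.enable</name>
<value>false</value>
</property>
<!-- JindoFSx namespace service address; replace with the value for your cluster. -->
<property>
<name>fs.jindofsx.namespace.rpc.address</name>
<value>emr-cluster:8101</value>
</property>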
Set the <value> of fs.jindofsx.namespace.rpc.address according to your cluster configuration. If you need assistance, contact EMR technical support.
What do I do if offline integration fails with the following error message?
Description:[An IO exception occurred while establishing a connection with HDFS.]. - java.io.IOException: ERROR: not found login secrets, please configure the accessKeyId and accessKeySecret
Add the following configuration to the core-site.xml file, and set the <value> of fs.jindofsx.namespace.rpc.address according to your cluster configuration:
<property>
<name>fs.jindofsx.namespace.rpc.address</name>
<value>emr-cluster:8101</value>
</property>