Dataphin: Offline integration using Hive foreign tables created based on OSS

Last Updated: Jan 21, 2025

Before you can run offline integration in Dataphin with OSS-based Hive foreign tables on the E-MapReduce 5.x Hadoop compute engine, you must configure a few OSS connection settings. This topic describes those settings.

Configuration instructions

Before you use this feature, add the required parameters to the core-site.xml file of the Hive data source or Hadoop compute source, and then upload the updated file.

  • If Dataphin and OSS are in the same region, set the fs.oss.endpoint parameter in the core-site.xml file.

  • If Dataphin and OSS are in different regions, set the fs.oss.endpoint parameter and also the fs.oss.accessKeyId and fs.oss.accessKeySecret parameters.

Note

If fs.oss.endpoint is set to an internal network address, you do not need to configure fs.oss.accessKeyId and fs.oss.accessKeySecret.
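
The snippets in this topic show only individual <property> elements. In a standard Hadoop core-site.xml file, these elements sit inside the <configuration> root element. A minimal sketch of the file layout, assuming an otherwise empty file and using a China (Hangzhou) internal endpoint purely as a placeholder:

```xml
<?xml version="1.0"?>
<configuration>
    <!-- Any properties already present for your cluster stay in this element. -->
    <property>
        <name>fs.oss.endpoint</name>
        <!-- Placeholder value: replace with the OSS endpoint for your region. -->
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
</configuration>
```

Additional properties are added as sibling <property> elements inside the same <configuration> element.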

Configuration examples

  • Dataphin and OSS are in the same region

    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
    
  • Dataphin and OSS are in different regions

    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>ak</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>ks</value>
    </property>
    
    Note

    • Set the <value> of the fs.oss.endpoint parameter to the OSS endpoint that matches your region. For more information, see the OSS documentation on regions and endpoints.

    • Set the <value> of the fs.oss.accessKeyId and fs.oss.accessKeySecret parameters to the AccessKey ID and AccessKey secret of your account. To obtain an AccessKey pair, see Create AccessKey.
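
The cross-region example above reuses an internal endpoint. OSS internal endpoints are generally reachable only from Alibaba Cloud resources in the same region as the bucket, so a cross-region setup may need the public endpoint instead. The following is a sketch under that assumption, using the China (Hangzhou) public endpoint and placeholder credentials (YOUR_ACCESS_KEY_ID and YOUR_ACCESS_KEY_SECRET are hypothetical values, not real keys):

```xml
<property>
    <name>fs.oss.endpoint</name>
    <!-- Assumption: public endpoint of the bucket's region, here China (Hangzhou). -->
    <value>oss-cn-hangzhou.aliyuncs.com</value>
</property>
<property>
    <name>fs.oss.accessKeyId</name>
    <!-- Placeholder: replace with your AccessKey ID. -->
    <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
    <name>fs.oss.accessKeySecret</name>
    <!-- Placeholder: replace with your AccessKey secret. -->
    <value>YOUR_ACCESS_KEY_SECRET</value>
</property>
```

Confirm which endpoint applies to your bucket and network path before using this variant.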

FAQ

Offline integration fails with the following error message: com.alibaba.dt.pipeline.plugin.center.exception.DataXException: Code:[HDFSConnection-06], Description:[An IO exception occurred while establishing a connection with HDFS.]. - java.io.IOException: No FileSystem for scheme: oss

This error means that no file system implementation is registered for the oss scheme. Add the following configuration to the core-site.xml file:

<property>
    <name>fs.oss.impl</name>
    <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
</property>
<property>
    <name>fs.AbstractFileSystem.oss.impl</name>
    <value>com.aliyun.jindodata.oss.OSS</value>
</property>
<property>
    <name>fs.jindofsx.data.cache.enable</name>
    <value>false</value>
</property>
<property>
    <name>fs.jindofsx.namespace.rpc.address</name>
    <value>emr-cluster:8101</value>
</property>

Important

Set the <value> of the fs.jindofsx.namespace.rpc.address parameter according to your cluster's configuration. If you need assistance, contact the EMR product helpdesk.

Offline integration fails with the following error message: Description:[An IO exception occurred while establishing a connection with HDFS.]. - java.io.IOException: ERROR: not found login secrets, please configure the accessKeyId and accessKeySecret.

Add the following configuration to the core-site.xml file, and set the <value> according to your cluster's configuration:

<property>
    <name>fs.jindofsx.namespace.rpc.address</name>
    <value>emr-cluster:8101</value>
</property>