This topic describes the parameters that you must configure when you use the metadata of Data Lake Formation (DLF) in an Iceberg table.

The following compute engines are supported:

Spark

Alibaba Cloud Object Storage Service (OSS) is used as the file system. The default name of the catalog and the parameters that you must configure vary based on the version of your cluster.

  • EMR V3.40 or a later minor version, and EMR V5.6.0 or later

    Note: The default name of the catalog is iceberg.
    Parameter | Description | Remarks
    spark.sql.extensions | The SQL extension module of Spark. | Set the value to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions. Note: This parameter is introduced in Apache Iceberg 0.11.0. Only Apache Spark 3.x supports this parameter.
    spark.sql.catalog.<catalog-name> | The name of the catalog. | Set the value to org.apache.iceberg.spark.SparkCatalog.
    spark.sql.catalog.<catalog-name>.catalog-impl | The class name of the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.hive.DlfCatalog.
  • EMR V3.39.X and EMR V5.5.X

    Note: The default name of the catalog is dlf.
    Parameter | Description | Remarks
    spark.sql.extensions | The SQL extension module of Spark. | Set the value to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions. Note: This parameter is introduced in Apache Iceberg 0.11.0. Only Apache Spark 3.x supports this parameter.
    spark.sql.catalog.<catalog-name> | The name of the catalog. | Set the value to org.apache.iceberg.spark.SparkCatalog.
    spark.sql.catalog.<catalog-name>.catalog-impl | The class name of the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.hive.DlfCatalog.
  • EMR V3.38.X, EMR V5.3.X, and EMR V5.4.X

    Note: The default name of the catalog is dlf_catalog.
    Parameter | Description | Remarks
    spark.sql.extensions | The SQL extension module of Spark. | Set the value to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions. Note: This parameter is introduced in Apache Iceberg 0.11.0. Only Apache Spark 3.x supports this parameter.
    spark.sql.catalog.<catalog-name> | The name of the catalog. | Set the value to org.apache.iceberg.spark.SparkCatalog.
    spark.sql.catalog.<catalog-name>.catalog-impl | The class name of the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.DlfCatalog.
    spark.sql.catalog.<catalog-name>.io-impl | The name of the class that the catalog uses for file I/O operations. | Set the value to org.apache.iceberg.hadoop.HadoopFileIO.
    spark.sql.catalog.<catalog-name>.oss.endpoint | The endpoint of your OSS bucket. | For more information, see Regions and endpoints. We recommend that you set this parameter to the virtual private cloud (VPC) endpoint of the OSS bucket. For example, if you select the China (Hangzhou) region, set this parameter to oss-cn-hangzhou-internal.aliyuncs.com. Note: If you want to access OSS across VPCs, set this parameter to the public endpoint of the OSS bucket.
    spark.sql.catalog.<catalog-name>.warehouse | The OSS path in which table data is stored. | None.
    spark.sql.catalog.<catalog-name>.access.key.id | The AccessKey ID of your Alibaba Cloud account. | For more information about how to obtain the AccessKey ID of an Alibaba Cloud account, see Obtain an AccessKey pair.
    spark.sql.catalog.<catalog-name>.access.key.secret | The AccessKey secret of your Alibaba Cloud account. | For more information about how to obtain the AccessKey secret of an Alibaba Cloud account, see Obtain an AccessKey pair.
    spark.sql.catalog.<catalog-name>.dlf.catalog-id | The ID of your Alibaba Cloud account. | To obtain the ID of your Alibaba Cloud account, go to the Security Settings page.
    spark.sql.catalog.<catalog-name>.dlf.endpoint | The endpoint of DLF. | We recommend that you set this parameter to the VPC endpoint of DLF. For example, if you select the China (Hangzhou) region, set this parameter to dlf-vpc.cn-hangzhou.aliyuncs.com. Note: You can also set this parameter to the public endpoint of DLF. If you select the China (Hangzhou) region, the public endpoint is dlf.cn-hangzhou.aliyuncs.com.
    spark.sql.catalog.<catalog-name>.dlf.region-id | The ID of the region in which DLF is activated. | Make sure that the region that you specify in this parameter matches the endpoint that you specify in the spark.sql.catalog.<catalog-name>.dlf.endpoint parameter.
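
The Spark parameters for the three version groups can be assembled programmatically. The following Python sketch is illustrative only: the helper names (spark_dlf_conf, as_submit_flags) and the version-group labels are hypothetical, the values in angle brackets are placeholders that you must replace, and the region ID cn-hangzhou is assumed to correspond to the China (Hangzhou) endpoints used as examples in this topic. The sketch builds the properties for a version group and renders them as --conf key=value flags of the kind accepted by spark-submit and spark-sql.

```python
# Sketch only: helper names and version-group labels are hypothetical,
# not part of EMR, Spark, or Iceberg.
ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
SPARK_CATALOG = "org.apache.iceberg.spark.SparkCatalog"
LEGACY_GROUP = "emr-3.38/5.3/5.4"

# Default catalog name for each EMR version group, as listed in this topic.
DEFAULT_CATALOG_NAME = {
    "emr-3.40+/5.6+": "iceberg",
    "emr-3.39/5.5": "dlf",
    LEGACY_GROUP: "dlf_catalog",
}


def spark_dlf_conf(version_group, legacy_settings=None):
    """Return the Spark properties for one EMR version group as a dict."""
    name = DEFAULT_CATALOG_NAME[version_group]
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        "spark.sql.extensions": ICEBERG_EXTENSIONS,
        prefix: SPARK_CATALOG,
    }
    if version_group != LEGACY_GROUP:
        # Newer versions only need the catalog implementation class.
        conf[f"{prefix}.catalog-impl"] = "org.apache.iceberg.aliyun.dlf.hive.DlfCatalog"
        return conf
    # The oldest version group also needs the I/O class plus OSS and DLF settings.
    conf[f"{prefix}.catalog-impl"] = "org.apache.iceberg.aliyun.dlf.DlfCatalog"
    conf[f"{prefix}.io-impl"] = "org.apache.iceberg.hadoop.HadoopFileIO"
    for key, value in (legacy_settings or {}).items():
        conf[f"{prefix}.{key}"] = value
    return conf


def as_submit_flags(conf):
    """Render a properties dict as a list of '--conf key=value' strings."""
    return [f"--conf {key}={value}" for key, value in conf.items()]


# Example values for the China (Hangzhou) region; angle-bracket values are placeholders.
legacy = {
    "oss.endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
    "warehouse": "oss://<your-bucket>/<warehouse-path>",
    "access.key.id": "<your-access-key-id>",
    "access.key.secret": "<your-access-key-secret>",
    "dlf.catalog-id": "<your-account-id>",
    "dlf.endpoint": "dlf-vpc.cn-hangzhou.aliyuncs.com",
    "dlf.region-id": "cn-hangzhou",  # assumed region ID for China (Hangzhou)
}
for flag in as_submit_flags(spark_dlf_conf(LEGACY_GROUP, legacy)):
    print(flag)
```

Keeping the version-specific differences in one function makes it easier to see that only the oldest version group requires the OSS, AccessKey, and DLF settings. The other two groups differ only in the default catalog name.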

Hive

You can configure the parameters described in the following tables based on the version of your cluster.

  • EMR V3.39.0 or a later minor version, and EMR V5.5.0 or later

    Note: The default name of the catalog is dlf.
    Parameter | Description | Remarks
    iceberg.catalog.<catalog-name>.catalog-impl | The class name of the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.hive.DlfCatalog.
  • EMR V3.38.X, EMR V5.3.X, and EMR V5.4.X

    Note: The default name of the catalog is dlf_catalog.
    Parameter | Description | Remarks
    iceberg.catalog | The name of the catalog. | Set the value to a custom name.
    iceberg.catalog.<catalog-name>.type | The type of the catalog. | Set the value to custom.
    iceberg.catalog.<catalog-name>.catalog-impl | The class name of the catalog. | Set the value to org.apache.iceberg.aliyun.dlf.DlfCatalog.
    iceberg.catalog.<catalog-name>.io-impl | The name of the class that the catalog uses for file I/O operations. | Set the value to org.apache.iceberg.hadoop.HadoopFileIO.
    iceberg.catalog.<catalog-name>.warehouse | The warehouse path in which table data is stored. | Table data can be stored in Hadoop Distributed File System (HDFS) or OSS.
    iceberg.catalog.<catalog-name>.access.key.id | The AccessKey ID of your Alibaba Cloud account. | For more information about how to obtain the AccessKey ID of an Alibaba Cloud account, see Obtain an AccessKey pair.
    iceberg.catalog.<catalog-name>.access.key.secret | The AccessKey secret of your Alibaba Cloud account. | For more information about how to obtain the AccessKey secret of an Alibaba Cloud account, see Obtain an AccessKey pair.
    iceberg.catalog.<catalog-name>.dlf.catalog-id | The ID of your Alibaba Cloud account. | To obtain the ID of your Alibaba Cloud account, go to the Security Settings page.
    iceberg.catalog.<catalog-name>.dlf.endpoint | The endpoint of DLF. | We recommend that you set this parameter to the VPC endpoint of DLF. For example, if you select the China (Hangzhou) region, set this parameter to dlf-vpc.cn-hangzhou.aliyuncs.com. Note: You can also set this parameter to the public endpoint of DLF. If you select the China (Hangzhou) region, the public endpoint is dlf.cn-hangzhou.aliyuncs.com.
    iceberg.catalog.<catalog-name>.dlf.region-id | The ID of the region in which DLF is activated. | Make sure that the region that you specify in this parameter matches the endpoint that you specify in the iceberg.catalog.<catalog-name>.dlf.endpoint parameter.
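
The Hive parameters for EMR V3.38.X, EMR V5.3.X, and EMR V5.4.X can be assembled in the same way, together with a check that the region ID matches the DLF endpoint. The following Python sketch is illustrative only. The helper names are hypothetical, and the values in angle brackets are placeholders. The check assumes that the region ID appears as one dot-separated label of the endpoint, as it does in the China (Hangzhou) examples. Rendering the properties as Hive SET statements is also an assumption about how you apply them in your session.

```python
# Sketch only: helper names are hypothetical, not part of EMR, Hive, or Iceberg.

def check_region_matches_endpoint(endpoint, region_id):
    """Raise ValueError if the region ID is not a label of the DLF endpoint.

    Assumption: endpoints follow the pattern of the examples in this topic,
    such as dlf-vpc.cn-hangzhou.aliyuncs.com or dlf.cn-hangzhou.aliyuncs.com.
    """
    if region_id not in endpoint.split("."):
        raise ValueError(
            f"dlf.region-id '{region_id}' does not match dlf.endpoint '{endpoint}'"
        )


def hive_dlf_properties(catalog_name, settings):
    """Return the Hive properties for the oldest version group as a dict."""
    check_region_matches_endpoint(settings["dlf.endpoint"], settings["dlf.region-id"])
    prefix = f"iceberg.catalog.{catalog_name}"
    props = {
        "iceberg.catalog": catalog_name,
        f"{prefix}.type": "custom",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aliyun.dlf.DlfCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.hadoop.HadoopFileIO",
    }
    for key, value in settings.items():
        props[f"{prefix}.{key}"] = value
    return props


def as_set_statements(props):
    """Render a properties dict as Hive 'SET key=value;' statements."""
    return [f"SET {key}={value};" for key, value in props.items()]


# Example values for the China (Hangzhou) region; angle-bracket values are placeholders.
settings = {
    "warehouse": "oss://<your-bucket>/<warehouse-path>",
    "access.key.id": "<your-access-key-id>",
    "access.key.secret": "<your-access-key-secret>",
    "dlf.catalog-id": "<your-account-id>",
    "dlf.endpoint": "dlf-vpc.cn-hangzhou.aliyuncs.com",
    "dlf.region-id": "cn-hangzhou",  # assumed region ID for China (Hangzhou)
}
for statement in as_set_statements(hive_dlf_properties("dlf_catalog", settings)):
    print(statement)
```

Running the check before the properties are generated surfaces a mismatched region and endpoint early, before the catalog attempts to connect to DLF.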