E-MapReduce (EMR) provides MetaService, a special ECS application role. In EMR V3.32.0 and earlier V3.X.X versions and in EMR V4.5.0 and earlier V4.X.X versions, this role is automatically bound to your cluster when you create the cluster. Applications that run on your EMR cluster use this role to access other Alibaba Cloud resources without an AccessKey pair, which prevents the AccessKey pair from being exposed in a configuration file.

Prerequisites

This role is authorized. For more information, see Authorize roles.

Background information

MetaService allows you to access only Object Storage Service (OSS), Log Service, and Message Service (MNS) without an AccessKey pair.

Permissions

The default role AliyunEmrEcsDefaultRole is configured with the policy AliyunEmrECSRolePolicy. The following table describes OSS-related permissions.
Permission (Action)          Description
oss:PutObject                Uploads a file or folder.
oss:GetObject                Obtains a file or folder.
oss:ListObjects              Lists files.
oss:DeleteObject             Deletes a file.
oss:AbortMultipartUpload     Aborts a multipart upload task.
Notice: Modify or delete the AliyunEmrEcsDefaultRole role with caution. Otherwise, cluster creation or jobs may fail.

Data sources that support MetaService

MetaService allows you to access OSS, Log Service, and MNS. You can use an EMR SDK in your EMR cluster to read data from and write data to the preceding data sources without an AccessKey pair.

By default, only access to OSS is enabled. To read data from and write data to Log Service and MNS, log on to the RAM console and grant the required permissions to the AliyunEmrEcsDefaultRole role.

For more information about how to authorize a RAM role, see Grant permissions to a RAM role.

Use MetaService

MetaService allows you to access OSS, Log Service, and MNS without an AccessKey pair. MetaService provides the following benefits:
  • Reduces the risk of AccessKey pair leaks. To minimize the security risk, grant permissions to roles in the RAM console based on the principle of least privilege.
  • Improves user experience. MetaService shortens the OSS path that you need to enter during interactive access to OSS resources.
  • Brings the following benefits for services in your EMR cluster:

    The jobs that you run in the services can access Alibaba Cloud resources (OSS, Log Service, and MNS) without an AccessKey pair.

    The following examples compare the same operations without and with MetaService:
    • Run the hadoop fs -ls command to view OSS data.
      • MetaService is not used:
        hadoop fs -ls oss://ZaH******As1s:Ba23N**************sdaBj2@bucket.oss-cn-hangzhou-internal.aliyuncs.com/a/b/c
      • MetaService is used:
        hadoop fs -ls oss://bucket/a/b/c
    • Create an external table in Hive.
      • MetaService is not used:
        CREATE EXTERNAL TABLE test_table(id INT, name STRING)
                ROW FORMAT DELIMITED
                FIELDS TERMINATED BY '\t'
                LOCATION 'oss://ZaH******As1s:Ba23N**************sdaBj2@bucket.oss-cn-hangzhou-internal.aliyuncs.com/a/b/c';
      • MetaService is used:
        CREATE EXTERNAL TABLE test_table(id INT, name STRING)
                ROW FORMAT DELIMITED
                FIELDS TERMINATED BY '\t'
                LOCATION 'oss://bucket/a/b/c';
    • Use Spark to view OSS data.
      • MetaService is not used:
        val data = sc.textFile("oss://ZaH******As1s:Ba23N**************sdaBj2@bucket.oss-cn-hangzhou-internal.aliyuncs.com/a/b/c")
      • MetaService is used:
        val data = sc.textFile("oss://bucket/a/b/c")
  • Brings the following benefits for self-deployed services:
    MetaService is an HTTP service. You can access the URL of this HTTP service to obtain a Security Token Service (STS) temporary credential. Then, you can use the STS temporary credential to access Alibaba Cloud resources without an AccessKey pair in self-managed systems.
    Notice: A new STS temporary credential is generated 30 minutes before the current one expires. Both credentials are valid during these 30 minutes.

    For example, you can run curl http://localhost:10011/cluster-region to obtain the region where your cluster resides.

    You can use MetaService to obtain the following information:
    • Region: /cluster-region
    • Role name: /cluster-role-name
    • AccessKey ID: /role-access-key-id
    • AccessKey secret: /role-access-key-secret
    • Security token: /role-security-token
    • Network type: /cluster-network-type
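
The following Python sketch shows one way a self-deployed service might read the STS temporary credential from the endpoints listed above. It is a minimal illustration, not an official client. The base URL http://localhost:10011 is taken from the curl example, and the sketch assumes that each endpoint returns its value as plain text. The names http_get and fetch_sts_credential are hypothetical helpers introduced only for this example.

```python
import urllib.request

# Base URL from the curl example above; adjust it if your cluster differs.
METASERVICE_URL = "http://localhost:10011"

# Endpoint paths as listed above, keyed by the credential part they return.
CREDENTIAL_PATHS = {
    "access_key_id": "/role-access-key-id",
    "access_key_secret": "/role-access-key-secret",
    "security_token": "/role-security-token",
}


def http_get(url, timeout=5):
    """Send an HTTP GET request and return the body as stripped text.

    Assumption: MetaService answers with a plain-text body.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def fetch_sts_credential(base_url=METASERVICE_URL, get=http_get):
    """Fetch the three parts of the STS temporary credential.

    The `get` parameter is injectable so that the function can be
    exercised without a running MetaService.
    """
    return {name: get(base_url + path) for name, path in CREDENTIAL_PATHS.items()}
```

On a cluster node, calling fetch_sts_credential() would return a dictionary with the keys access_key_id, access_key_secret, and security_token. Because a new credential is generated before the current one expires, fetching the credential shortly before each use, instead of storing it for a long time, is one simple way to avoid working with an expired token.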