Virtual Private Cloud (VPC) helps you build an isolated network environment. You can customize Classless Inter-Domain Routing (CIDR) blocks, create one or more subnets, and configure route tables and gateways for VPCs. You can create E-MapReduce (EMR) clusters in different VPCs and use Express Connect to enable the VPCs to communicate with each other.

For more information about VPC, see What is a VPC. In addition, VPCs and Internet data centers (IDCs) can communicate with each other by using Express Connect. VPCs in different regions or under different accounts can communicate with each other by using Cloud Enterprise Network (CEN).
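
The CIDR planning mentioned above can be sketched with Python's standard ipaddress module. This is a minimal illustration, not part of the VPC or EMR tooling: the function names plan_vswitch_subnets and cidrs_overlap and the example CIDR blocks are assumptions chosen for this sketch. It also assumes that VPCs you intend to connect should use non-overlapping CIDR blocks so that their routes do not conflict.

```python
import ipaddress
from itertools import islice
from typing import List


def plan_vswitch_subnets(vpc_cidr: str, new_prefix: int, count: int) -> List[str]:
    """Return up to `count` subnets of size /new_prefix carved from vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields the child networks lazily, in address order.
    return [str(net) for net in islice(vpc.subnets(new_prefix=new_prefix), count)]


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

For example, plan_vswitch_subnets("192.168.0.0/16", 24, 2) returns the first two /24 blocks of the VPC, and cidrs_overlap("192.168.0.0/16", "172.16.0.0/12") returns False, which suggests the two blocks could be used for two VPCs that will later be connected.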

Create an EMR cluster in a VPC

You can create an EMR cluster in a classic network or a VPC. To create a cluster in a VPC, you must configure the following information:

  • VPC: Select the VPC to which the EMR cluster belongs. If no VPCs are created, create one in the VPC console. You can create a maximum of two VPCs under an account. To create more than two VPCs, submit a ticket.

  • VSwitch: Select the VSwitch over which Elastic Compute Service (ECS) instances in the EMR cluster can communicate with each other. If no VSwitches are created, log on to the VPC console and create a VSwitch on the VSwitches page. A VSwitch belongs to a specific zone. You must make sure that the VSwitch is in the zone where the EMR cluster resides.

  • Security group: Select the existing security group to which the EMR cluster belongs. For security purposes, only the security groups created in EMR are available in the drop-down list. To create a security group, you can directly enter a name in the Security Group Name field.
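
The zone constraint on the VSwitch can be checked before you create the cluster. The following sketch filters a list of VSwitch records down to those that belong to the selected VPC and sit in the cluster's zone. The record layout (id, vpc_id, zone_id) and all identifiers are hypothetical and chosen only for illustration; they do not reflect the output format of any Alibaba Cloud API.

```python
from typing import Dict, List


def vswitches_for_cluster(
    vswitches: List[Dict[str, str]], vpc_id: str, zone_id: str
) -> List[str]:
    """Return the IDs of VSwitches that are in the given VPC and zone."""
    return [
        vsw["id"]
        for vsw in vswitches
        if vsw["vpc_id"] == vpc_id and vsw["zone_id"] == zone_id
    ]


# Hypothetical inventory; the IDs and zone names are placeholders.
inventory = [
    {"id": "vsw-a", "vpc_id": "vpc-1", "zone_id": "zone-h"},
    {"id": "vsw-b", "vpc_id": "vpc-1", "zone_id": "zone-i"},
    {"id": "vsw-c", "vpc_id": "vpc-2", "zone_id": "zone-h"},
]
usable = vswitches_for_cluster(inventory, vpc_id="vpc-1", zone_id="zone-h")
```

An empty result means that a VSwitch must first be created in the cluster's zone.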

Connect EMR clusters that belong to different VPCs under the same account

This example creates a Hive cluster and an HBase cluster that belong to different VPCs and uses CEN to enable the Hive cluster to access the HBase cluster.
  1. Create clusters.

    In the EMR console, create Hive cluster C1 and HBase cluster C2 in the China (Hangzhou) region. C1 belongs to VPC 1 and C2 belongs to VPC 2.

  2. Connect the two VPCs.

    Create a CEN instance to connect VPC 1 and VPC 2. For more information, see Create a CEN instance. Make sure that the CEN instance resides in the same region as the clusters.

  3. Use SSH to log on to the HBase cluster and run the following command in the HBase shell:
    hbase(main):001:0> create 'testfromHbase','cf'
  4. Use SSH to log on to the Hive cluster and follow these steps:
    1. Append the following line to the hosts file. Replace $zk_ip with the IP address of the ZooKeeper node in the HBase cluster.
      $zk_ip emr-cluster
    2. Run the following commands in the Hive shell to connect to the HBase cluster:
      hive> set hbase.zookeeper.quorum=172.*.*.111,172.*.*.112,172.*.*.113;
      hive> CREATE EXTERNAL TABLE IF NOT EXISTS testfromHive (rowkey STRING, pageviews INT, bytes STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('hbase.table.name' = 'testfromHbase');
    3. If the java.net.SocketTimeoutException error appears, the connection failed. Add rules to the security group of the HBase cluster to open all ports that the Hive cluster requires to access the HBase cluster. The following figure shows an example.

      Figure: Security group rules

      By default, security groups created in EMR open only port 22, and the Hive cluster cannot access the HBase cluster through this port. Therefore, you must open all ports required for the Hive cluster to access the HBase cluster.
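
To narrow down which ports the security group still blocks, you can probe them from a node in the Hive cluster. The following is a minimal sketch that uses only Python's standard socket module. The function names are illustrative. The port numbers mentioned after the code are common open-source defaults, not values confirmed by this document, so check them against the actual configuration of your HBase cluster.

```python
import socket
from typing import Iterable, List


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


def blocked_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> List[int]:
    """Return the ports on host that could not be reached."""
    return [p for p in ports if not port_is_reachable(host, p, timeout)]
```

For example, calling blocked_ports with the ZooKeeper node's IP address and the ports 2181 (the usual ZooKeeper client port), 16000, and 16020 (the usual HBase master and RegionServer ports in recent HBase versions) lists the ports that still need a security group rule. A port can also appear as blocked because no service is listening on it, so treat the result as a hint and not as proof of a missing rule.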