This topic describes how to create an E-MapReduce (EMR) table.

Prerequisites

  • An Alibaba Cloud EMR cluster is created, and the security group to which the cluster belongs contains an inbound rule with the following settings:
    • Action: Allow
    • Protocol type: Custom TCP
    • Port range: 8898/8898
    • Authorization object: 100.104.0.0/16
  • An EMR compute engine instance is associated with the desired workspace. The EMR folder is displayed only after you associate an EMR compute engine instance with the workspace on the Workspace Management page. For more information, see Configure a workspace.
  • The metadata of an EMR data source is collected in Data Map so that you can select an EMR database when you create a table. For more information, see Collect metadata from an EMR data source.
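
The following Python sketch models the inbound rule from the first prerequisite as a small data structure. It then checks whether traffic from a given source address and port would match that rule. It is a local illustration only. `InboundRule`, `REQUIRED_RULE`, and `rule_allows` are hypothetical names and do not call any Alibaba Cloud API. You still need to configure the actual rule in the security group settings of your cluster.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical local model of a security group inbound rule."""
    action: str       # "Allow" for the rule in the prerequisites
    protocol: str     # "TCP" for a Custom TCP rule
    port_from: int    # first port in the range
    port_to: int      # last port in the range (inclusive)
    source_cidr: str  # authorization object, e.g. "100.104.0.0/16"


# The rule required by the prerequisites: allow TCP 8898/8898 from 100.104.0.0/16.
REQUIRED_RULE = InboundRule("Allow", "TCP", 8898, 8898, "100.104.0.0/16")


def rule_allows(rule: InboundRule, source_ip: str, protocol: str, port: int) -> bool:
    """Return True if the rule admits traffic from source_ip on protocol/port."""
    if rule.action != "Allow" or rule.protocol != protocol:
        return False
    if not (rule.port_from <= port <= rule.port_to):
        return False
    # The source address must fall inside the authorized CIDR block.
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source_cidr)


if __name__ == "__main__":
    print(rule_allows(REQUIRED_RULE, "100.104.12.34", "TCP", 8898))  # True
    print(rule_allows(REQUIRED_RULE, "100.105.0.1", "TCP", 8898))    # False: outside the /16
    print(rule_allows(REQUIRED_RULE, "100.104.12.34", "TCP", 8080))  # False: wrong port
```

Because the port range is 8898/8898, only port 8898 matches. A /16 block covers every address from 100.104.0.0 to 100.104.255.255.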

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and choose EMR > Table.
    You can also find the workflow in which you want to create an EMR table, right-click EMR, and then choose Create > Table.
  3. In the Create Table dialog box, set the parameters as required.
    • Engine type: The default value is EMR. You cannot change this value.
    • Table Name: The name of the EMR table.
    • Engine Instance: Select an engine instance from the drop-down list.
    • Database: Select a database of the engine instance from the drop-down list.
      Note: You must collect metadata before you can select a database.
  4. Click Create. The table configuration tab appears.
    The upper part of the tab shows the settings that you specified in the Create Table dialog box. You can change the database of the EMR engine instance here. To create a database, click Create a database, set the parameters in the Create a database dialog box, and then click OK.
  5. In the Basic attributes section, set the parameters as required.
    • Level 1 theme: The level-1 folder in which the table resides.
      Note: Level-1 and level-2 folders organize tables in DataWorks so that you can manage them more easily.
    • Level 2 theme: The level-2 folder in which the table resides.
    • Create a theme: Click Create a theme to go to the Folder Management tab, where you can create level-1 and level-2 folders.
    • Refresh: After you create a folder, click Refresh.
    • Description: The description of the table.
  6. In the Physical model design section, set the parameters as required.
    • Layer and Physical classification: Select a level and a category from the drop-down lists. To add levels or categories, ask the workspace administrator to click Create a level, which opens the Level Management tab. After the levels and categories are created, click Refresh.
    • Partition type: Valid values are Partition table and Non-partitioned table.
    • Table type: Valid values are Internal tables and External tables.
  7. In the Table structure design section, set the parameters as required.
    • Add fields: Click Add fields, configure the field information, and then click Save.
    • Move up and Move down: Adjust the order of fields in the table to be created. To change the order of fields in an existing table, you must delete the table and create another table with the same name. Deleting and re-creating tables is not allowed in the production environment.
    • Field name: The name of the field. The name can contain letters, digits, and underscores (_).
    • Field type: The data type of the field. EMR tables support the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, CHAR, STRING, BINARY, DATETIME, DATE, TIMESTAMP, BOOLEAN, ARRAY, MAP, and STRUCT.
    • Length/Settings: Required if the data type of the field has a length limit.
    • Description: The description of the field.
    • Primary key: Specifies whether the field is the primary key. The primary key is a business concept that ensures each record is unique for your business. DataWorks does not enforce any constraint on the primary key.
    • Edit: After you save a field, click Edit to modify the field, and then click Save.
    • Delete: Deletes a created field.
      Note: If you delete a field from an existing table and then commit the table, you must delete the table and create another table with the same name. Deleting and re-creating tables is not allowed in the production environment.
    • Add: Adds a partition to the current table. If you set the Partition type parameter to Partition table in the Physical model design section, you must configure a partition for the table.
      If you add a partition to an existing table, you must delete the table and create another table with the same name. Deleting and re-creating tables is not allowed in the production environment.

  8. Click the Submit icon in the top toolbar to commit the EMR table to the production environment.
    If your workspace is in standard mode, commit the table to the development environment first and then to the production environment.
    Notice
    • You cannot create an EMR table in data definition language (DDL) mode.
    • You must select a resource group for scheduling when you commit the EMR JAR resource. We recommend that you use an exclusive resource group for scheduling. If no exclusive resource groups for scheduling are available, you can purchase and configure one. For more information, see Create and use an exclusive resource group for scheduling.
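
The field and schema rules in step 7 can be expressed as a small local check. The following Python sketch is hypothetical. `Field`, `validate_field`, `validate_table`, and `changes_requiring_recreate` are illustrative names, not DataWorks APIs. The set of types that need Length/Settings is an assumption modeled on Hive's VARCHAR(n), CHAR(n), and DECIMAL(p,s). The sketch validates field names and types against the rules in this topic. It also lists the edits to an existing table that, according to this topic, require you to delete and re-create the table.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Data types that this topic lists for EMR tables.
SUPPORTED_TYPES = {
    "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL",
    "VARCHAR", "CHAR", "STRING", "BINARY", "DATETIME", "DATE", "TIMESTAMP",
    "BOOLEAN", "ARRAY", "MAP", "STRUCT",
}

# Assumption: the types that need Length/Settings, modeled on Hive's
# VARCHAR(n), CHAR(n), and DECIMAL(p,s). Check the DataWorks UI for the real list.
TYPES_WITH_LENGTH = {"VARCHAR", "CHAR", "DECIMAL"}


@dataclass(frozen=True)
class Field:
    """Hypothetical local model of one row in the Table structure design section."""
    name: str
    type: str
    length: Optional[str] = None  # e.g. "64" for VARCHAR or "10,2" for DECIMAL


def validate_field(field: Field) -> List[str]:
    """Return the problems found in a single field definition."""
    errors = []
    # Field names may contain only letters, digits, and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", field.name):
        errors.append(f"invalid field name: {field.name!r}")
    ftype = field.type.upper()
    if ftype not in SUPPORTED_TYPES:
        errors.append(f"unsupported type: {field.type}")
    elif ftype in TYPES_WITH_LENGTH and not field.length:
        errors.append(f"{field.name}: Length/Settings is required for {ftype}")
    return errors


def validate_table(fields: Sequence[Field], partitioned: bool,
                   partitions: Sequence[str]) -> List[str]:
    """Validate all fields; a Partition table must also define a partition."""
    errors = [e for f in fields for e in validate_field(f)]
    if partitioned and not partitions:
        errors.append("Partition table requires at least one partition")
    return errors


def changes_requiring_recreate(old_fields: Sequence[Field], new_fields: Sequence[Field],
                               old_partitions: Sequence[str],
                               new_partitions: Sequence[str]) -> List[str]:
    """List edits to an existing table that, per this topic, require deleting and
    re-creating the table, which is not allowed in the production environment."""
    reasons = []
    old_names = [f.name for f in old_fields]
    new_names = [f.name for f in new_fields]
    deleted = [n for n in old_names if n not in new_names]
    if deleted:
        reasons.append(f"fields deleted: {deleted}")
    # Compare the relative order of the fields that exist in both versions.
    kept_old_order = [n for n in old_names if n in new_names]
    kept_new_order = [n for n in new_names if n in old_names]
    if kept_old_order != kept_new_order:
        reasons.append("field order changed")
    added = [p for p in new_partitions if p not in old_partitions]
    if added:
        reasons.append(f"partitions added: {added}")
    return reasons


if __name__ == "__main__":
    old = [Field("id", "BIGINT"), Field("name", "VARCHAR", "64")]
    new = [Field("name", "VARCHAR", "64"), Field("id", "BIGINT")]
    print(validate_table(new, partitioned=True, partitions=["ds"]))  # []
    print(changes_requiring_recreate(old, new, ["ds"], ["ds"]))       # ['field order changed']
```

The sketch does not flag newly added fields, because this topic does not state that adding a field requires re-creating the table.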