DataWorks: Create an EMR table

Last Updated: Sep 21, 2023

This topic describes how to create an E-MapReduce (EMR) table.

Background information

After you associate an EMR cluster with your workspace as a compute engine instance, the Data Map service of DataWorks creates a crawler to collect the metadata of the cluster. If no database is available when you create an EMR table, go to the Data Map page and use the crawler to collect the metadata of the cluster. For more information, see Collect metadata from an EMR data source.

Procedure

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. Move the pointer over the Create icon and choose Create Table > EMR > Table.

    You can also find the workflow in which you want to create an EMR table, right-click EMR, and then select Create Table.

  3. In the Create Table dialog box, configure the parameters.

  4. Click Create. The configuration tab of the table appears.

  5. In the Basic attributes section, configure the parameters. The following table describes the parameters.

    | Parameter | Description |
    | --- | --- |
    | Level 1 theme | The name of the level-1 folder in which the table resides. Note: The level-1 and level-2 folders show the table locations in DataWorks to help you easily manage tables. |
    | Level 2 theme | The name of the level-2 folder in which the table resides. |
    | Create a theme | Click Create a theme to go to the Folder Management tab. On the Folder Management tab, you can create level-1 and level-2 folders. |
    | Refresh | After you create a folder, click Refresh. |
    | Description | The description of the table. |

  6. In the Physical model design section, configure the parameters. The following table describes the parameters.

    | Parameter | Description |
    | --- | --- |
    | Layer, Physical classification | Select a level and a category from the drop-down lists based on your business requirements. To create levels and categories, click Create Level to go to the Level Management tab and create levels and categories. You can perform this operation only if you are the workspace administrator. After you create levels and categories, click Refresh. |
    | Partition type | Valid values: Partition table and Non-partitioned table. |
    | Table type | Valid values: Internal tables and External tables. |
    | Select the storage format | Select a storage format for files in the table based on your business requirements. |

  7. In the Table structure design section, configure the parameters. The following table describes the parameters.

    | Parameter | Description |
    | --- | --- |
    | Add fields | To add a field, click Add fields, configure the field information, and then click Save. |
    | Move up, Move down | You can click the buttons to adjust the field sequence of the table. If you want to adjust the sequence of fields in an existing table, you must delete the table and create another table that has the same name. You are not allowed to adjust the sequence of fields in an existing table in the production environment. |
    | Field name | The name of a field. The name can contain letters, digits, and underscores (_). |
    | Field type | The data type of a field. EMR supports the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, CHAR, STRING, BINARY, DATETIME, DATE, TIMESTAMP, BOOLEAN, ARRAY, MAP, and STRUCT. |
    | Length/Settings | The length limit of a field. If the data type that you specified for a field requires a length limit, you must configure this parameter. |
    | Description | The description of a field. |
    | Primary key | Specifies whether a field serves as the primary key. The primary key is a business concept that ensures the uniqueness of a record for your business. DataWorks has no limits on the primary key. |
    | Edit | You can click this button for a field to edit the field and then click Save. |
    | Delete | You can click this button for a field to delete the field. Note: If you want to delete a field from an existing table and then commit the table, you must delete the table and create another table that has the same name. You are not allowed to perform this operation in the production environment. |
    | Add | If you set the Partition type parameter to Partition table in the Physical model design section, you must configure a partition for the table. You can click this button to add a partition to the current table. If you want to add a partition to an existing table and then commit the table, you must delete the table and create another table that has the same name. You are not allowed to add a partition to an existing table in the production environment. |

  8. Click the Commit icon in the top toolbar to commit the EMR table to the production environment.

    If you use a workspace in standard mode, commit the table to the development environment and the production environment in sequence.

    Important

    You must select a resource group for scheduling when you commit the table. If you use an exclusive resource group for scheduling to commit the table, DataWorks issues a table creation node to a compute engine instance and displays the run logs. If an error occurs when you commit the table, you can use the run logs to troubleshoot the issue. If no exclusive resource groups for scheduling are available, you can purchase and configure an exclusive resource group for scheduling. For more information, see Create and use an exclusive resource group for scheduling.
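
For orientation, the choices made in the Physical model design and Table structure design sections can be thought of as the parts of a Hive-style CREATE TABLE statement. The Python sketch below illustrates that mapping under several assumptions: that the EMR table is a Hive table, that VARCHAR and CHAR are the types that require a length (as in Hive), and that ORC is an acceptable default storage format. The names Field, validate_field, render_column, and render_create_table are hypothetical helpers written for this example. They are not part of DataWorks, and the statement that DataWorks actually issues may differ. Complex types (ARRAY, MAP, STRUCT) need element types in real Hive DDL, which this sketch does not model.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence

# Data types listed in the Field type description of this topic.
SUPPORTED_TYPES = {
    "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL",
    "VARCHAR", "CHAR", "STRING", "BINARY", "DATETIME", "DATE", "TIMESTAMP",
    "BOOLEAN", "ARRAY", "MAP", "STRUCT",
}
# Assumption: these types require a length, as they do in Hive.
LENGTH_REQUIRED = {"VARCHAR", "CHAR"}
# Field names can contain letters, digits, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


@dataclass
class Field:
    """Hypothetical stand-in for one row of the Table structure design section."""
    name: str
    type: str
    length: Optional[str] = None  # for example "64", or "10,2" for DECIMAL
    comment: str = ""


def validate_field(field: Field) -> None:
    """Raise ValueError if the field breaks one of the rules sketched above."""
    if not NAME_PATTERN.fullmatch(field.name):
        raise ValueError(f"invalid field name: {field.name!r}")
    if field.type.upper() not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported field type: {field.type!r}")
    if field.type.upper() in LENGTH_REQUIRED and not field.length:
        raise ValueError(f"{field.type.upper()} requires a length")


def render_column(field: Field) -> str:
    """Render one column definition after validating it."""
    validate_field(field)
    type_sql = field.type.upper()
    if field.length:
        type_sql += f"({field.length})"
    column = f"`{field.name}` {type_sql}"
    if field.comment:
        column += " COMMENT '" + field.comment.replace("'", "\\'") + "'"
    return column


def render_create_table(
    table: str,
    fields: Sequence[Field],
    partitions: Sequence[Field] = (),
    external: bool = False,
    storage_format: str = "ORC",
) -> str:
    """Build a Hive-style CREATE TABLE statement from the chosen settings."""
    if not fields:
        raise ValueError("a table needs at least one field")
    # Table type: Internal tables vs. External tables.
    keyword = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    columns = ",\n  ".join(render_column(f) for f in fields)
    lines = [f"{keyword} IF NOT EXISTS `{table}` (", f"  {columns}", ")"]
    # Partition type: a partitioned table needs at least one partition field.
    if partitions:
        part_cols = ", ".join(render_column(p) for p in partitions)
        lines.append(f"PARTITIONED BY ({part_cols})")
    lines.append(f"STORED AS {storage_format}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_create_table(
        "sales_orders",
        [Field("order_id", "BIGINT"), Field("buyer", "VARCHAR", "64")],
        partitions=[Field("ds", "STRING")],
    ))
```

Validating fields before rendering mirrors the order of the procedure: field names, types, and lengths are settled in the table structure step, and only then is the table committed.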