This topic describes how to create tables in MaxCompute to store raw data and processed data.

Prerequisites

  • MaxCompute is activated. A DataWorks workspace in basic mode is created. For more information, see Activate services.
  • You are granted the permissions to access data in Tablestore. If you use the same Alibaba Cloud account to activate MaxCompute and Tablestore, you can obtain the permissions with one click on the Cloud Resource Access Authorization page. If you use different Alibaba Cloud accounts to activate MaxCompute and Tablestore, you can obtain the permissions by using a custom policy. For more information, see Create a Tablestore external table.

Procedure

  1. Go to the DataStudio page in the DataWorks console.
    1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the page that appears, select China (Shanghai) in the top navigation bar.
    2. Find the workspace that you created and click Data Development in the Actions column.
  2. Create a workflow.
    1. On the page that appears, right-click Business Flow and select Create Workflow.
    2. In the Create Workflow dialog box, specify values for Workflow Name and Description, and click Create. In this example, Workflow Name is set to Workshop.
  3. Create tables.
    1. Create an external table named ots_user_trace_log.
      Notice In wizard mode, you cannot create external tables by executing DDL statements; you can only add fields to the table manually. Alternatively, you can create an ODPS SQL node in MaxCompute and execute a DDL statement on the node to create the external table. The following example uses an ODPS SQL node.
      1. On the Data Analytics tab of the DataStudio page, unfold the Workshop workflow, right-click MaxCompute, and then choose Create > ODPS SQL. In the Create Node dialog box, set Node Name to workshop, and click Commit.
      2. Double-click the ODPS SQL node that you created. On the tab that appears, enter the following DDL statement in the code editor to create an external table named ots_user_trace_log.
        Sample DDL statement:
        CREATE EXTERNAL TABLE ots_user_trace_log (
            md5 string COMMENT 'The first eight characters in the MD5 value of the user ID (UID)',
            uid string COMMENT 'The UID',
            ts bigint COMMENT 'The timestamp when the user performs the operation',
            ip string COMMENT 'The IP address',
            status bigint COMMENT 'The status code returned by the server',
            bytes bigint COMMENT 'The number of bytes returned to the client',
            device string COMMENT 'The model of the terminal on which the client runs',
            system string COMMENT 'The version of the operating system in which the client runs. The version must be in the ios xxx or android xxx format',
            customize_event string COMMENT 'The custom event. You can specify a custom event that is reported for logon, logoff, purchase, registration, click, background operations, account switching, browsing, or commenting',
            use_time bigint COMMENT 'The duration for which the app is used at a time. This field is required if you configure a custom event that is reported for logoff, background operations, or account switching',
            customize_event_content string COMMENT 'The content of the custom event. This field is required if you configure a custom event that is reported for browsing or commenting'
        )
        STORED BY 'com.aliyun.odps.TableStoreStorageHandler'
        WITH SERDEPROPERTIES (
            'tablestore.columns.mapping'=':md5,:uid,:ts,ip,status,bytes,device,system,customize_event,use_time,customize_event_content',
            'tablestore.table.name'='user_trace_log'
        )
        LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/';
        • STORED BY: required. Set the value to com.aliyun.odps.TableStoreStorageHandler. This value specifies the storage handler that is built into MaxCompute to process the data stored in Tablestore. This parameter defines the logic of interactions between MaxCompute and Tablestore.
        • SERDEPROPERTIES: required. The WITH SERDEPROPERTIES clause allows you to specify options. If you set the STORED BY parameter to com.aliyun.odps.TableStoreStorageHandler, you must specify the following options in the WITH SERDEPROPERTIES clause:
          • tablestore.columns.mapping: the columns of the Tablestore table that you want to access from MaxCompute. The columns include primary key columns and attribute columns.
            Note
            • Primary key columns are columns that begin with a colon (:). In this example, :md5, :uid, and :ts are primary key columns. The remaining columns are attribute columns.
            • When you configure column mappings, you must specify all primary key columns in the Tablestore table. For the attribute columns, you need only to specify the columns that you want to access from MaxCompute. The Tablestore table must contain the specified attribute columns. Otherwise, errors are returned when you query data by using the external table that you created.
          • tablestore.table.name: the name of the Tablestore table that you want to access from MaxCompute. If you specify an invalid table name or the table name that you specify does not exist, an error is returned. MaxCompute does not create a Tablestore table.
        • LOCATION: the endpoint of Tablestore. For more information about how to specify a value for this parameter, see Activate services.
          Note If you set this parameter to the public endpoint 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/', an error that indicates a network failure may be returned. In this case, you can set this parameter to the classic network endpoint, such as 'tablestore://workshop-bj-001.cn-beijing.ots-internal.aliyuncs.com/'.
      3. Click the Save icon. Then click the Run icon.
        If the operational log indicates that the preceding statement is successfully executed, the external table is created.
        Note The first time you click the Run icon, you must select a resource group for scheduling. For more information about the scenarios in which resource groups for scheduling are used and how to use resource groups for scheduling, see Overview.
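        After the external table is created, you can run a simple query on the same ODPS SQL node to confirm that MaxCompute can read data from Tablestore. The following statement is a minimal sketch; it assumes that the user_trace_log table in Tablestore already contains data.

        ```sql
        -- Read a few rows through the external table to verify connectivity.
        -- Returns no rows if the underlying Tablestore table is empty.
        SELECT md5, uid, ts, ip, status
        FROM ots_user_trace_log
        LIMIT 10;
        ```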
    2. Create a table named ods_user_trace_log.
      Use the same method to create the table. The ods_user_trace_log table is a table at the operational data store (ODS) layer.
      CREATE TABLE IF NOT EXISTS ods_user_trace_log (
          md5 STRING COMMENT 'The first eight characters in the MD5 value of the UID',
          uid STRING COMMENT 'The UID',
          ts BIGINT COMMENT 'The timestamp when the user performs the operation',
          ip STRING COMMENT 'The IP address',
          status BIGINT COMMENT 'The status code returned by the server',
          bytes BIGINT COMMENT 'The number of bytes returned to the client',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system STRING COMMENT 'The version of the operating system in which the client runs. The version must be in the ios xxx or android xxx format',
          customize_event STRING COMMENT 'The custom event. You can specify a custom event that is reported for logon, logoff, purchase, registration, click, background operations, account switching, browsing, or commenting',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is required if you configure a custom event that is reported for logoff, background operations, or account switching',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is required if you configure a custom event that is reported for browsing or commenting'
      )
      PARTITIONED BY (
          dt STRING
      );
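      In a typical workflow, raw data is later loaded from the external table into a partition of this ODS table. The following statement is a hedged sketch of such a load; the partition value 20230101 is a placeholder, not part of this procedure.

      ```sql
      -- Sketch: copy raw data from the external table into one ODS partition.
      -- The dt value is a placeholder; in scheduled jobs it is usually a
      -- scheduling parameter such as ${bdp.system.bizdate}.
      INSERT OVERWRITE TABLE ods_user_trace_log PARTITION (dt='20230101')
      SELECT md5, uid, ts, ip, status, bytes, device, system,
             customize_event, use_time, customize_event_content
      FROM ots_user_trace_log;
      ```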
    3. Create a table named dw_user_trace_log.
      Use the same method to create the table. The dw_user_trace_log table is a table at the data warehouse (DW) layer.
      CREATE TABLE IF NOT EXISTS dw_user_trace_log (
          uid STRING COMMENT 'The UID',
          region STRING COMMENT 'The region in which the user resides. The region is obtained based on the IP address of the user',
          device_brand STRING COMMENT 'The brand of the terminal on which the client runs',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system_type STRING COMMENT 'The type of the operating system in which the client runs. Valid values: Android, IOS, ipad, and Windows_phone',
          customize_event STRING COMMENT 'The custom event. You can specify a custom event that is reported for logon, logoff, purchase, registration, click, background operations, account switching, browsing, or commenting',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is required if you configure a custom event that is reported for logoff, background operations, or account switching',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is required if you configure a custom event that is reported for browsing or commenting'
      )
      PARTITIONED BY (
          dt STRING
      );
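      The DW table derives fields such as region and system_type from the ODS table. The following statement is only an illustrative sketch: getregion is a hypothetical IP-lookup UDF that you must create yourself (it is not built into MaxCompute), and the device_brand and system_type extractions are simplified examples.

      ```sql
      -- Sketch: populate the DW table from one ODS partition.
      -- getregion is a hypothetical user-defined function; the dt value
      -- is a placeholder.
      INSERT OVERWRITE TABLE dw_user_trace_log PARTITION (dt='20230101')
      SELECT uid,
             getregion(ip) AS region,                    -- hypothetical UDF
             SPLIT_PART(device, ' ', 1) AS device_brand, -- illustrative only
             device,
             SPLIT_PART(system, ' ', 1) AS system_type,  -- 'ios xxx' -> 'ios'
             customize_event,
             use_time,
             customize_event_content
      FROM ods_user_trace_log
      WHERE dt = '20230101';
      ```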
    4. Create a table named rpt_user_trace_log.
      Use the same method to create the table. The rpt_user_trace_log table is a table at the application data store (ADS) layer.
      CREATE TABLE IF NOT EXISTS rpt_user_trace_log (
          country STRING COMMENT 'The country in which the user resides',
          province STRING COMMENT 'The province in which the user resides',
          city STRING COMMENT 'The city in which the user resides',
          device_brand STRING COMMENT 'The brand of the terminal on which the client runs',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system_type STRING COMMENT 'The type of the operating system in which the client runs. Valid values: Android, IOS, ipad, and Windows_phone',
          customize_event STRING COMMENT 'The custom event. You can specify a custom event that is reported for logon, logoff, purchase, registration, click, background operations, account switching, browsing, or commenting',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is required if you configure a custom event that is reported for logoff, background operations, or account switching',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is required if you configure a custom event that is reported for browsing or commenting',
          pv BIGINT COMMENT 'The number of page views (PVs)',
          uv BIGINT COMMENT 'The number of unique visitors (UVs)'
      )
      PARTITIONED BY (
          dt STRING
      );
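      The ADS table aggregates the DW table into report metrics. The following statement is a hedged sketch under stated assumptions: it assumes the region field can be split into country, province, and city by commas, which depends on the format that your region lookup actually produces, and the dt value is a placeholder.

      ```sql
      -- Sketch: aggregate the DW layer into the report table.
      -- pv counts events; uv counts distinct users. The comma-based split
      -- of region is illustrative and depends on your region format.
      INSERT OVERWRITE TABLE rpt_user_trace_log PARTITION (dt='20230101')
      SELECT SPLIT_PART(region, ',', 1) AS country,
             SPLIT_PART(region, ',', 2) AS province,
             SPLIT_PART(region, ',', 3) AS city,
             device_brand, device, system_type,
             customize_event, use_time, customize_event_content,
             COUNT(uid) AS pv,
             COUNT(DISTINCT uid) AS uv
      FROM dw_user_trace_log
      WHERE dt = '20230101'
      GROUP BY SPLIT_PART(region, ',', 1), SPLIT_PART(region, ',', 2),
               SPLIT_PART(region, ',', 3), device_brand, device, system_type,
               customize_event, use_time, customize_event_content;
      ```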
    5. Optional. If your workspace is in standard mode, the external table that you created is committed only to the development environment. To commit the external table to the production environment, perform the following steps:
      1. On the Data Analytics tab of the DataStudio page, unfold your workflow below Business Flow, right-click Table, and then select Import Table.
      2. In the Import Table dialog box, select the external table that is created in the development environment from the Table drop-down list, and click OK.
        Note You can move the pointer over the table name to check whether the table is created in the production environment or in the development environment. If you select a table that is created in the production environment, the system displays a message indicating that the table does not exist.
      3. Double-click the external table that you imported. On the tab that appears, click Commit to Production Environment in the upper part of the tab.
  4. Verify the tables that you created.
    1. After you create the tables, unfold the Workshop workflow below Business Flow, and choose MaxCompute > Table to view all tables that you created.
    2. Unfold the Workshop workflow below Business Flow, right-click Data Analytics below MaxCompute, and then choose Create > ODPS SQL.
    3. In the Create Node dialog box, specify a value for Node Name, and click Commit to create an ODPS SQL node.
    4. Enter the following SQL statements in the code editor of the ODPS SQL node, and click the Run icon.
      DESCRIBE ots_user_trace_log;
      DESCRIBE ods_user_trace_log;
      DESCRIBE dw_user_trace_log;
      DESCRIBE rpt_user_trace_log;
      If the schema of each table is returned, the tables are created.