This topic describes how to create tables in MaxCompute to store the raw data and processed data.

Prerequisites

  • MaxCompute is activated. A DataWorks workspace is created. A workspace in basic mode is used in this topic. For more information, see Prepare the environment.
  • MaxCompute is authorized to access Tablestore. If you use the same Alibaba Cloud account to activate MaxCompute and Tablestore, go to the RAM console and authorize MaxCompute. If you use different Alibaba Cloud accounts to activate MaxCompute and Tablestore, authorize MaxCompute by customizing an authorization policy. For more information, see Access Tablestore data.

Procedure

  1. Go to the DataStudio page of DataWorks.
    1. Go to the Workspaces page in the DataWorks console. Select China (Shanghai).
    2. Find the workspace you created and click Data Analytics to go to the DataStudio page of the workspace.
  2. Create a workflow.
    1. Right-click Business Flow and select Create Workflow.
    2. Set the Workflow Name and Description parameters and click Create. In this example, set the Workflow Name parameter to Workshop.
  3. Create tables.
    1. Create a foreign table named ots_user_trace_log.
      1. Click the Workshop workflow. Right-click MaxCompute and choose Create > Table. Set the Table Name parameter to ots_user_trace_log and click Create.
      2. Set the Display Name parameter for the table and click DDL Statement.
      3. In the DDL Statement dialog box, enter the statement that is used to create the table and click Generate Table Schema.
        Use the following statement to create the ots_user_trace_log table:
        CREATE EXTERNAL TABLE ots_user_trace_log (
            md5 string COMMENT 'The first eight characters in the MD5 value of the user ID',
            uid string COMMENT 'The ID of the user',
            ts bigint COMMENT 'The timestamp when the user performs the operation',
            ip string COMMENT 'The IP address of the client',
            status bigint COMMENT 'The status code returned by the server',
            bytes bigint COMMENT 'The number of bytes returned to the client',
            device string COMMENT 'The model of the terminal on which the client runs',
            system string COMMENT 'The version of the operating system in which the client runs, in the format of ios xxx or android xxx',
            customize_event string COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
            use_time bigint COMMENT 'The duration for which the app is used at a time. This field is displayed when the custom event is set to logoff, background running, or user switching.',
            customize_event_content string COMMENT 'The content of the custom event. This field is displayed when the custom event is set to browsing or comment.'
        )
        STORED BY 'com.aliyun.odps.TableStoreStorageHandler'
        WITH SERDEPROPERTIES (
            'tablestore.columns.mapping'=':md5,:uid,:ts,ip,status,bytes,device,system,customize_event,use_time,customize_event_content',
            'tablestore.table.name'='user_trace_log'
        )
        LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/';
        • STORED BY: required. The value com.aliyun.odps.TableStoreStorageHandler specifies the storage handler that is built in MaxCompute to process the data in Tablestore. This clause defines the logic of interactions between MaxCompute and Tablestore.
        • SERDEPROPERTIES: required. This clause specifies options for the storage handler. If you specify com.aliyun.odps.TableStoreStorageHandler, you must also specify the following options:
          • tablestore.columns.mapping: the columns of the Tablestore table that you want to access from MaxCompute. These columns include primary key columns and property columns.
            Note
            • To specify primary key columns, add a colon (:) to the beginning of each column name. In this example, :md5 and :uid are primary key columns. The remaining columns are property columns.
            • When you specify column mapping, you must specify all the primary key columns in the Tablestore table. You can specify only the property columns that you want to access from MaxCompute. The specified property columns must exist in the Tablestore table. Otherwise, errors are returned when you query data in the foreign table.
          • tablestore.table.name: the name of the Tablestore table that you want to access from MaxCompute. If you specify an invalid table name or the specified table name does not exist, an error is returned and MaxCompute does not create a Tablestore table.
        • LOCATION: This clause specifies the endpoint of Tablestore. Specify the endpoint of your Tablestore instance in this clause. For more information, see Prepare the environment.
          Note If you specify the public endpoint of your Tablestore instance in the LOCATION clause, such as LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/', an error that indicates a network failure may be returned. In this case, specify the endpoint of the instance on the classic network, such as LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots-internal.aliyuncs.com/'.
      4. Click Commit to Production Environment. The table is created.
        Note If you use a workspace in standard mode, click Commit in Development Environment and then Commit to Production Environment.
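After the foreign table is committed, the mapped Tablestore data can be queried with standard SQL; no data is copied into MaxCompute. A minimal sketch (the rows returned depend on the data in your user_trace_log table):

```sql
-- Read a few rows through the foreign table; MaxCompute fetches the
-- data from Tablestore at query time.
SELECT md5, uid, ts, ip, customize_event
FROM ots_user_trace_log
LIMIT 10;
```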
    2. Create a table named ods_user_trace_log.
      Use the same method as in the preceding step and the following statement to create the table. The ods_user_trace_log table is a table at the operational data store (ODS) layer.
      CREATE TABLE IF NOT EXISTS ods_user_trace_log (
          md5 STRING COMMENT 'The first eight characters in the MD5 value of the user ID',
          uid STRING COMMENT 'The ID of the user',
          ts BIGINT COMMENT 'The timestamp when the user performs the operation',
          ip STRING COMMENT 'The IP address of the client',
          status BIGINT COMMENT 'The status code returned by the server',
          bytes BIGINT COMMENT 'The number of bytes returned to the client',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system STRING COMMENT 'The version of the operating system in which the client runs, in the format of ios xxx or android xxx',
          customize_event STRING COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is displayed when the custom event is set to logoff, background running, or user switching.',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is displayed when the custom event is set to browsing or comment.'
      )
      PARTITIONED BY (
          dt STRING
      );
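In later steps of this workshop, raw data is typically loaded from the foreign table into a dated partition of this ODS table. A hedged sketch of such a load; the partition value 20180801 is only an example, and scheduled jobs usually take it from a scheduling parameter such as ${bdp.system.bizdate}:

```sql
-- Copy the raw Tablestore data into one daily partition of the ODS table.
-- The literal partition value below is illustrative.
INSERT OVERWRITE TABLE ods_user_trace_log PARTITION (dt='20180801')
SELECT md5, uid, ts, ip, status, bytes, device, system,
       customize_event, use_time, customize_event_content
FROM ots_user_trace_log;
```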
    3. Create a table named dw_user_trace_log.
      Use the same method as in the preceding step and the following statement to create the table. The dw_user_trace_log table is a table at the data warehouse detail (DWD) layer.
      CREATE TABLE IF NOT EXISTS dw_user_trace_log (
          uid STRING COMMENT 'The ID of the user',
          region STRING COMMENT 'The region in which the user resides, which is obtained based on the IP address',
          device_brand STRING COMMENT 'The brand of the terminal on which the client runs',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system_type STRING COMMENT 'The operating system in which the client runs. Valid values: Android, IOS, ipad, and Windows_phone.',
          customize_event STRING COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is displayed when the custom event is set to logoff, background running, or user switching.',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is displayed when the custom event is set to browsing or comment.'
      )
      PARTITIONED BY (
          dt STRING
      );
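The DWD table is typically populated by cleansing the ODS data. The following is an illustrative sketch only: getregion is a hypothetical UDF that maps an IP address to a region, and the SPLIT_PART calls assume that device and system store a brand or platform name before the first space:

```sql
-- Illustrative ODS-to-DWD cleansing. getregion is a hypothetical
-- IP-to-region UDF, not a MaxCompute built-in function.
INSERT OVERWRITE TABLE dw_user_trace_log PARTITION (dt='20180801')
SELECT uid,
       getregion(ip)              AS region,
       SPLIT_PART(device, ' ', 1) AS device_brand,
       device,
       SPLIT_PART(system, ' ', 1) AS system_type,
       customize_event,
       use_time,
       customize_event_content
FROM ods_user_trace_log
WHERE dt = '20180801';
```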
    4. Create a table named rpt_user_trace_log.
      Use the same method as in the preceding step and the following statement to create the table. The rpt_user_trace_log table is a table at the application data store (ADS) layer.
      CREATE TABLE IF NOT EXISTS rpt_user_trace_log (
          country STRING COMMENT 'The country in which the user resides',
          province STRING COMMENT 'The province in which the user resides',
          city STRING COMMENT 'The city in which the user resides',
          device_brand STRING COMMENT 'The brand of the terminal on which the client runs',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system_type STRING COMMENT 'The operating system in which the client runs. Valid values: Android, IOS, ipad, and Windows_phone.',
          customize_event STRING COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is displayed when the custom event is set to logoff, background running, or user switching.',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is displayed when the custom event is set to browsing or comment.',
          pv BIGINT COMMENT 'The number of page views (PVs)',
          uv BIGINT COMMENT 'The number of unique visitors (UVs)'
      )
      PARTITIONED BY (
          dt STRING
      );
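The ADS table aggregates the DWD data into report metrics: pv counts all records and uv counts distinct user IDs. A hedged sketch, assuming the region column holds a comma-separated 'country,province,city' string (adjust the parsing to match your actual data):

```sql
-- Illustrative DWD-to-ADS roll-up. The region parsing is an assumption.
INSERT OVERWRITE TABLE rpt_user_trace_log PARTITION (dt='20180801')
SELECT SPLIT_PART(region, ',', 1) AS country,
       SPLIT_PART(region, ',', 2) AS province,
       SPLIT_PART(region, ',', 3) AS city,
       device_brand,
       device,
       system_type,
       customize_event,
       use_time,
       customize_event_content,
       COUNT(*)            AS pv,
       COUNT(DISTINCT uid) AS uv
FROM dw_user_trace_log
WHERE dt = '20180801'
GROUP BY SPLIT_PART(region, ',', 1),
         SPLIT_PART(region, ',', 2),
         SPLIT_PART(region, ',', 3),
         device_brand, device, system_type,
         customize_event, use_time, customize_event_content;
```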
  4. Verify the created tables.
    1. After you create the tables, click the Workshop workflow and choose MaxCompute > Table to view the created tables.
    2. Click MaxCompute in the workflow. Right-click Data Analytics and choose Create > ODPS SQL.
    3. In the Create Node dialog box, set the Node Name parameter and click Commit. An ODPS SQL node is created.
    4. Enter the following SQL statements in the code editor of the ODPS SQL node and click the Run icon:
      DESCRIBE ots_user_trace_log;
      DESCRIBE ods_user_trace_log;
      DESCRIBE dw_user_trace_log;
      DESCRIBE rpt_user_trace_log;
      The returned results show the schemas of the created tables.