Before you use MaxCompute to process data, you must create tables in MaxCompute to store both the raw data and processed data.

Prerequisites

  • MaxCompute is activated. A DataWorks workspace is created. A workspace in basic mode is used in this topic. For more information, see Prepare the environment.
  • MaxCompute is authorized to access Tablestore. If you use the same account to activate MaxCompute and Tablestore, go to the RAM console to authorize MaxCompute in one-click mode. If you use different accounts to activate MaxCompute and Tablestore, authorize MaxCompute by customizing an authorization policy. For more information, see Access Table Store (OTS) data.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console. Click Workspaces in the left-side navigation pane. On the page that appears, select China (Shanghai) in the top navigation bar.
    2. Find the target workspace and click Data Analytics in the Actions column.
  2. Create a workflow.
    1. On the Data Analytics tab of the DataStudio page, right-click Business Flow and select Create Workflow.
      Create Workflow
    2. In the dialog box that appears, specify Workflow Name and Description and click Create. For this example, set Workflow Name to Workshop.
  3. Create tables.
    1. Create an external table named ots_user_trace_log.
      1. Click the Workshop workflow. Right-click MaxCompute and choose Create > Table. In the dialog box that appears, set Table Name to ots_user_trace_log and click Commit.11
      2. On the tab that appears, specify Display Name in the General section and click DDL Statement in the upper-left corner. 22
      3. In the DDL Statement dialog box, enter the statement that is used to create the table and click Generate Table Schema.
        The following code describes the statement that is used to create the ots_user_trace_log table:
        CREATE EXTERNAL TABLE ots_user_trace_log (
            md5 string COMMENT 'The first eight characters in the MD5 value of the user ID',
            uid string COMMENT 'The ID of the user',
            ts bigint COMMENT 'The timestamp when the user performs the operation',
            ip string COMMENT 'The IP address of the client',
            status bigint COMMENT 'The status code returned by the server',
            bytes bigint COMMENT 'The number of bytes returned to the client',
            device string COMMENT 'The model of the terminal on which the client runs',
            system string COMMENT 'The version of the operating system in which the client runs, in the format of ios xxx or android xxx',
            customize_event string COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
            use_time bigint COMMENT 'The duration for which the app is used at a time. This field is available when the custom event is set to logoff, background running, or user switching.',
            customize_event_content string COMMENT 'The content of the custom event. This field is available when the custom event is set to browsing or comment.'
        )
        STORED BY 'com.aliyun.odps.TableStoreStorageHandler'
        WITH SERDEPROPERTIES (
            'tablestore.columns.mapping'=':md5,:uid,:ts, ip,status,bytes,device,system,customize_event,use_time,customize_event_content',
            'tablestore.table.name'='user_trace_log'
        )
        LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/';
        • STORED BY: required. The value com.aliyun.odps.TableStoreStorageHandler specifies the storage handler that is built in MaxCompute to process the data stored in Tablestore. This clause defines the logic of interactions between MaxCompute and Tablestore.
        • SERDEPROPERITES: required. The WITH SERDEPROPERITES clause allows you to specify options. If you specify com.aliyun.odps.TableStoreStorageHandler in the STORED BY clause, you must also specify the following options in the WITH SERDEPROPERITES clause:
          • tablestore.columns.mapping: the columns of the Tablestore table that you want to access from MaxCompute. These columns include primary key columns and property columns.
            Note
            • To specify primary key columns, add a colon (:) to the beginning of each column name. :md5 and :uid are primary key columns in this example. The remaining columns are property columns.
            • If you specify column mapping by using the tablestore.columns.mapping option, you must specify all the primary key columns in the Tablestore table. You only need to specify the property columns that you want to access from MaxCompute. The specified property columns must exist in the Tablestore table. Otherwise, errors are returned when you query data in the external table.
          • tablestore.table.name: the name of the Tablestore table that you want to access from MaxCompute. If you specify an invalid table name or the specified table name does not exist, an error is returned and MaxCompute does not create a Tablestore table.
        • LOCATION: This clause specifies the endpoint of Tablestore. Specify the endpoint of your Tablestore instance in this clause. For more information, see Prepare the environment.
          Note If you specify the public endpoint of your Tablestore instance in the LOCATION clause, such as LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/', an error that indicates a network failure may be returned. In this case, specify the endpoint of the instance in a classic network, for example, LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots-internal.aliyuncs.com/'.
      4. Click Commit to Production Environment. The external table is created.
        Note If you create the external table in a workspace that is in standard mode, click Commit in Development Environment and then Commit to Production Environment.
    2. Create a table named ods_user_trace_log.
      Use the same method as in the preceding example to create the table. The ods_user_trace_log table is a table at the operational data store (ODS) layer. The following code describes the statement that is used to create the table:
      CREATE TABLE IF NOT EXISTS ods_user_trace_log (
          md5 STRING COMMENT 'The first eight characters in the MD5 value of the user ID',
          uid STRING COMMENT 'The ID of the user',
          ts BIGINT COMMENT 'The timestamp when the user performs the operation',
          ip STRING COMMENT 'The IP address of the client',
          status BIGINT COMMENT 'The status code returned by the server',
          bytes BIGINT COMMENT 'The number of bytes returned to the client',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system STRING COMMENT 'The version of the operating system in which the client runs, in the format of ios xxx or android xxx',
          customize_event STRING COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is available when the custom event is set to logoff, background running, or user switching.',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is available when the custom event is set to browsing or comment.'
      )
      PARTITIONED BY (
          dt STRING
      );
    3. Create a table named dw_user_trace_log.
      Use the same method as in the preceding example to create the table. The dw_user_trace_log table is a table at the data warehouse detail (DWD) layer. The following code describes the statement that is used to create the table:
      CREATE TABLE IF NOT EXISTS dw_user_trace_log (
          uid STRING COMMENT 'The ID of the user',
          region STRING COMMENT 'The region in which the user resides, which is obtained based on the IP address',
          device_brand string comment 'The brand of the terminal on which the client runs',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system_type STRING COMMENT 'The operating system in which the client runs. Valid values: Android, IOS, ipad, and Windows_phone.',
          customize_event STRING COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is available when the custom event is set to logoff, background running, or user switching.',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is available when the custom event is set to browsing or comment.'
      )
      PARTITIONED BY (
          dt STRING
      );
    4. Create a table named rpt_user_trace_log.
      Use the same method as in the preceding example to create the table. The rpt_user_trace_log table is a table at the application data store (ADS) layer. The following code describes the statement that is used to create the table:
      CREATE TABLE IF NOT EXISTS rpt_user_trace_log (
          country STRING COMMENT 'The country in which the user resides',
          province STRING COMMENT 'The province in which the user resides',
          city STRING COMMENT 'The city in which the user resides',
          device_brand string comment 'The brand of the terminal on which the client runs',
          device STRING COMMENT 'The model of the terminal on which the client runs',
          system_type STRING COMMENT 'The operating system in which the client runs. Valid values: Android, IOS, ipad, and Windows_phone.',
          customize_event STRING COMMENT 'The custom event: logon, logoff, purchase, registration, click, background running, user switching, browsing, or comment',
          use_time BIGINT COMMENT 'The duration for which the app is used at a time. This field is available when the custom event is set to logoff, background running, or user switching.',
          customize_event_content STRING COMMENT 'The content of the custom event. This field is available when the custom event is set to browsing or comment.'
          pv bigint comment 'The number of page views (PVs)',
          uv bigint comment 'The number of unique visitors (UVs)'
      )
      PARTITIONED BY (
          dt STRING
      );
  4. Verify the created tables.
    1. After you create the tables, click the Workshop workflow, MaxCompute, and then Table to view the created tables.
    2. Click the Workshop workflow and then MaxCompute. Right-click Data Analytics and choose Create > ODPS SQL.
    3. In the Create Node dialog box, specify Node Name and click Commit to create an ODPS SQL node.
    4. Enter the following SQL statements in the SQL editor of the ODPS SQL node and click the Run icon icon:
      DESCRIBE ots_user_trace_log;
      DESCRIBE ods_user_trace_log;
      DESCRIBE dw_user_trace_log;
      DESCRIBE rpt_user_trace_log;
      The following figure shows the schemas of the created tables.Verify the settings