Before you use MaxCompute to process data, you must create tables in MaxCompute to store raw data and data after processing.

Prerequisites

  • MaxCompute is activated. A DataWorks workspace is created. This topic uses a workspace in the basic mode as an example. For more information, see Prepare the environment.
  • MaxCompute is authorized to access Tablestore data. If MaxCompute and Tablestore are activated by using the same Alibaba Cloud account, you can go to the Resource Access Management (RAM) console and authorize MaxCompute to access Tablestore data with one click. If MaxCompute and Tablestore are activated by using different Alibaba Cloud accounts, you can authorize MaxCompute to access Tablestore data by creating a custom authorization policy. For more information, see Access Table Store (OTS) data.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console. Click Workspaces in the left-side navigation pane. On the Workspaces page that appears, select China (Hangzhou) from the region drop-down list in the upper-left corner.
    2. Find the target workspace and click the workspace name or Data Analytics in the Actions column. The DataStudio page appears.
  2. Create tables. For more information, see Table.
    1. On the Data Analytics tab, right-click Business Flow and select Create Workflow.
    2. In the Create Workflow dialog box that appears, set Workflow Name and Description and click Create. In this example, set Workflow Name to Workshop.
    3. Create a foreign table named ots_user_trace_log.
      1. On the Data Analytics tab, click the Workshop workflow, right-click MaxCompute, and then choose Create > Table. In the Create Table dialog box that appears, set Table Name to ots_user_trace_log and click Commit.
      2. On the table editing tab that appears, set Display Name in the General section and click DDL Statement in the upper-left corner. The DDL Statement dialog box appears.
        Enter the following SQL statements for creating the ots_user_trace_log table in the DDL Statement dialog box and click Generate Table Schema:
        CREATE EXTERNAL TABLE ots_user_trace_log (
            md5 string COMMENT 'The MD5 value of the ID of the client user',
            uid string COMMENT 'The ID of the client user',
            ts bigint COMMENT 'The timestamp when the client user performs the operation',
            ip string COMMENT 'The IP address of the client that sends the request',
            status bigint COMMENT 'The status code returned by the server',
            bytes bigint COMMENT 'The number of bytes returned to the client',
            device string COMMENT 'The model of the terminal where the client runs',
            system string COMMENT 'The version of the operating system where the client runs, in the format of ios xxx or android xxx',
            customize_event string COMMENT 'The custom event, including logon, logoff, purchase, registration, click, background running, user switching, and browse',
            use_time bigint COMMENT 'The use duration of the app at a time. (This field is available when the custom event is logoff, background running, or user switching.)',
            customize_event_content string COMMENT 'The content of the custom event. (This field is available when the custom event is browse or comment.)'
        )
        STORED BY 'com.aliyun.odps.TableStoreStorageHandler'
        WITH SERDEPROPERTIES (
            'tablestore.columns.mapping'=':md5,:uid,:ts, ip,status,bytes,device,system,customize_event,use_time,customize_event_content',
            'tablestore.table.name'='user_trace_log'
        )
        LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/';
        The preceding SQL statements contain the following clauses:
        • STORED BY: required. The value com.aliyun.odps.TableStoreStorageHandler specifies the storage handler built in MaxCompute for processing data stored in Tablestore and defines the logic of interaction between MaxCompute and Tablestore.
        • WITH SERDEPROPERITES: required. This clause allows you to specify options. If you specify com.aliyun.odps.TableStoreStorageHandler in the STORED BY clause, you must also specify the following options in the WITH SERDEPROPERITES clause:
          • tablestore.columns.mapping: the columns of the Tablestore table that MaxCompute needs to access, including primary key columns and property columns.
            Note
            • To specify primary key columns, prefix each column name with a colon (:), for example, :md5 and :uid. Other columns are property columns.
            • When you specify column mappings by using the tablestore.columns.mapping option, you must specify all primary key columns of the target Tablestore table. You only need to specify the property columns that MaxCompute needs to access instead of specifying all property columns. The specified property columns must exist in the target Tablestore table. Otherwise, even though the foreign table is created, errors are returned when you query data in the foreign table.
          • tablestore.table.name: the name of the Tablestore table that MaxCompute needs to access. If the table name does not exist in Tablestore, an error is returned and MaxCompute does not create the foreign table.
        • LOCATION: This clause specifies the endpoint of your Tablestore instance. Specify the endpoint of the Tablestore instance that you purchased in this clause. For more information, see Prepare the environment.
          Note If you specify the public endpoint of the Tablestore instance in the LOCATION clause, for example, LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots.aliyuncs.com/', an error that indicates network failure may be returned. In this case, specify the internal endpoint of the instance, for example, LOCATION 'tablestore://workshop-bj-001.cn-beijing.ots-interna.aliyuncs.com/'.
      3. Click Commit to Production Environment. The foreign table is created.
        Note If you create the foreign table in a workspace in the standard mode, click Commit in Development Environment and then Commit to Production Environment.
    4. Create a table named ods_user_trace_log.
      Use the same method to create the table. The ods_user_trace_log table is a table at the operational data store (ODS) layer. Use the following SQL statements to create the table:
      CREATE TABLE IF NOT EXISTS ods_user_trace_log (
          md5 STRING COMMENT 'The MD5 value of the ID of the client user',
          uid STRING COMMENT 'The ID of the client user',
          ts BIGINT COMMENT 'The timestamp when the client user performs the operation',
          ip STRING COMMENT 'The IP address of the client that sends the request',
          status BIGINT COMMENT 'The status code returned by the server',
          bytes BIGINT COMMENT 'The number of bytes returned to the client',
          device STRING COMMENT 'The model of the terminal where the client runs',
          system STRING COMMENT 'The version of the operating system where the client runs, in the format of ios xxx or android xxx',
          customize_event STRING COMMENT 'The custom event, including logon, logoff, purchase, registration, click, background running, user switching, and browse',
          use_time BIGINT COMMENT 'The use duration of the app at a time. (This field is available when the custom event is logoff, background running, or user switching.)',
          customize_event_content STRING COMMENT 'The content of the custom event. (This field is available when the custom event is browse or comment.)'
      )
      PARTITIONED BY (
          dt STRING
      );
    5. Create a table named dw_user_trace_log.
      Use the same method to create the table. The dw_user_trace_log table is a table at the common data model (CDM) layer. Use the following SQL statements to create the table:
      CREATE TABLE IF NOT EXISTS dw_user_trace_log (
          uid STRING COMMENT 'The ID of the client user',
          region STRING COMMENT 'The region where the client user resides, which is obtained based on the IP address',
          device_brand string comment 'The brand of the terminal where the client runs',
          device STRING COMMENT 'The model of the terminal where the client runs',
          system_type STRING COMMENT 'The operating system where the client runs. Valid values: Android, IOS, ipad, and Windows_phone',
          customize_event STRING COMMENT 'The custom event, including logon, logoff, purchase, registration, click, background running, user switching, and browse',
          use_time BIGINT COMMENT 'The use duration of the app at a time. (This field is available when the custom event is logoff, background running, or user switching.)',
          customize_event_content STRING COMMENT 'The content of the custom event. (This field is available when the custom event is browse or comment.)'
      )
      PARTITIONED BY (
          dt STRING
      );
    6. Create a table named rpt_user_trace_log.
      Use the same method to create the table. The rpt_user_trace_log table is a table at the application data store (ADS) layer. Use the following SQL statements to create the table:
      CREATE TABLE IF NOT EXISTS rpt_user_trace_log (
          country STRING COMMENT 'The country where the user resides',
          province STRING COMMENT 'The province where the user resides',
          city STRING COMMENT 'The city where the user resides',
          device_brand string comment 'The brand of the terminal where the client runs',
          device STRING COMMENT 'The model of the terminal where the client runs',
          system_type STRING COMMENT 'The operating system where the client runs. Valid values: Android, IOS, ipad, and Windows_phone',
          customize_event STRING COMMENT 'The custom event, including logon, logoff, purchase, registration, click, background running, user switching, and browse',
          use_time BIGINT COMMENT 'The use duration of the app at a time. (This field is available when the custom event is logoff, background running, or user switching.)',
          customize_event_content STRING COMMENT 'The content of the custom event. (This field is available when the custom event is browse or comment.)'
          pv bigint comment 'The number of page views (PVs)',
          uv bigint comment 'The number of unique visitors (UVs)'
      )
      PARTITIONED BY (
          dt STRING
      );
  3. Verify the created tables.
    1. After you create the tables, click the Workshop workflow on the Data Analytics tab and choose MaxCompute > Table. The created tables appear.
    2. On the Data Analytics tab, move the pointer over the Create icon and choose MaxCompute > ODPS SQL. In the Create Node dialog box that appears, set Node Type to ODPS SQL, set Node Name and Location, and then click Commit. An ODPS SQL node is created.
    3. Enter the following SQL statements in the SQL editor of the ODPS SQL node and click the Run icon icon:
      DESCRIBE ots_user_trace_log;
      DESCRIBE ods_user_trace_log;
      DESCRIBE dw_user_trace_log;
      DESCRIBE rpt_user_trace_log;
      The schema information of the created tables appears.