This topic describes how to use DataHub to migrate log data to MaxCompute.

Prerequisites

The following permissions are granted to the account that is authorized to access MaxCompute:
  • CreateInstance permission on MaxCompute projects
  • Permissions to view, modify, and update MaxCompute tables

For more information, see Authorize users.

Background information

DataHub is a platform that is designed to process streaming data. After data is uploaded to DataHub, the data is stored in a table for real-time processing. DataHub executes scheduled tasks to synchronize the data to a MaxCompute table within five minutes for offline computing.

You only need to create and configure a DataConnector, which then periodically archives streaming data from DataHub to MaxCompute.
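
The archiving behavior can be pictured as a simple trigger rule: data is migrated at five-minute intervals, or earlier once 60 MB has accumulated (both values are the defaults stated in the Connector details step of the procedure). The following Python sketch only illustrates that rule. It is not DataHub code and does not call any DataHub API: the `ArchiveBuffer` class and its methods are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass

FLUSH_INTERVAL_SECONDS = 5 * 60       # five-minute interval described in this topic
FLUSH_SIZE_BYTES = 60 * 1024 * 1024   # 60 MB threshold; 1 MB = 1024 * 1024 bytes is an assumption


@dataclass
class ArchiveBuffer:
    """Hypothetical model of topic data that is waiting to be archived."""

    last_flush_at: float = 0.0  # time of the previous archive run, in seconds
    pending_bytes: int = 0      # bytes received since the previous archive run

    def append(self, size_bytes: int) -> None:
        """Record that size_bytes of new data arrived in the topic."""
        self.pending_bytes += size_bytes

    def should_flush(self, now: float) -> bool:
        """Return True when either trigger fires and there is data to archive."""
        interval_elapsed = now - self.last_flush_at >= FLUSH_INTERVAL_SECONDS
        size_reached = self.pending_bytes >= FLUSH_SIZE_BYTES
        return self.pending_bytes > 0 and (interval_elapsed or size_reached)

    def flush(self, now: float) -> int:
        """Reset the buffer and return the number of bytes that were archived."""
        flushed = self.pending_bytes
        self.pending_bytes = 0
        self.last_flush_at = now
        return flushed
```

In this model, a small write is archived only after the interval elapses, whereas a burst that reaches the size threshold is archived immediately. Skipping the run when the buffer is empty is a simplification of the sketch, not documented behavior.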

Procedure

  1. On the odpscmd client, create a table that is used to store the data synchronized from DataHub. Example:
     CREATE TABLE test (f1 STRING, f2 STRING, f3 DOUBLE) PARTITIONED BY (ds STRING);
  2. Create a project in the DataHub console.
    1. Log on to the DataHub console. In the left-side navigation pane, click Project Manager. On the page that appears, click Create Project in the upper-right corner.
    2. In the Create Project dialog box, specify Name and Comment and click Create.
  3. Create a topic.
    1. On the Project List page, find the target project and click View in the Operate column.
    2. On the details page of the project, click Create Topic in the upper-right corner. In the Create Topic dialog box, specify the required parameters.
      Figure: Create a topic
    3. Click Next Step to complete the topic configuration.
      Note
      • Schema corresponds to a MaxCompute table. The field names, data types, and field sequence specified by Schema must be consistent with those of the MaxCompute table. If any of these three conditions is not met, the DataConnector cannot be created.
      • Topics of the TUPLE and BLOB types can be migrated to MaxCompute tables.
      • A maximum of 20 topics can be created by default. If you require more topics, submit a ticket.
      • The owner of a DataHub topic or the Creator account has the permissions to manage the DataConnector, including creation and deletion.
  4. Create a DataConnector for the newly created topic.
    1. Find the target topic and click View in the Operate column.
    2. On the details page of the topic, click Connector in the upper-right corner.
    3. In the Create connector dialog box, click MaxCompute, specify the required parameters, and click Create.
  5. View details about the Connector.
    1. In the left-side navigation pane, click Project Manager.
    2. On the Project List page, find the target project and click View in the Operate column.
    3. On the details page of the project, find the target topic and click View in the Operate column.
    4. On the details page of the topic, click the Connector tab to view the created Connector.
    5. Find the target Connector and click View in the Operate column to view details about the Connector.
      By default, DataHub migrates data to MaxCompute tables at five-minute intervals or when the volume of data reaches 60 MB. Sync Offset indicates the number of migrated data entries.
      Figure: DataConnector details
  6. Execute the following statement to check whether the migration of log data to MaxCompute is successful:
    SELECT * FROM test;
    If the result shown in the following figure is displayed, data migration is successful.
    Figure: Test result
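
The note in step 3 requires the topic Schema to match the MaxCompute table in field names, data types, and field sequence. A quick local comparison before creating the DataConnector can catch such mismatches early. The following Python sketch is one way to do that, under stated assumptions: the function name `schema_mismatches` is hypothetical, both schemas are entered by hand as (name, type) pairs, names and types are compared case-insensitively as plain strings, and only the non-partition columns of the example table `test` are compared. The actual type mapping and the handling of the partition column `ds` depend on the connector configuration and are not modeled here.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, data type), entered by hand for this sketch


def schema_mismatches(topic_schema: List[Field], table_columns: List[Field]) -> List[str]:
    """Compare two schemas position by position and describe each difference.

    Assumption: names and types are compared as case-insensitive strings.
    """
    problems = []
    if len(topic_schema) != len(table_columns):
        problems.append(
            f"field count differs: topic has {len(topic_schema)}, "
            f"table has {len(table_columns)}"
        )
    # Comparing by position checks the field sequence as well as the names.
    pairs = zip(topic_schema, table_columns)
    for position, (topic_field, table_field) in enumerate(pairs, start=1):
        topic_name, topic_type = topic_field
        table_name, table_type = table_field
        if topic_name.lower() != table_name.lower():
            problems.append(
                f"position {position}: topic field '{topic_name}' "
                f"does not match table column '{table_name}'"
            )
        elif topic_type.lower() != table_type.lower():
            problems.append(
                f"position {position}: field '{topic_name}' has type "
                f"'{topic_type}' in the topic but '{table_type}' in the table"
            )
    return problems


# Non-partition columns of the example table created in step 1.
table_columns = [("f1", "string"), ("f2", "string"), ("f3", "double")]

# A topic schema in the same order, and one with the first two fields swapped.
matching_topic = [("f1", "STRING"), ("f2", "STRING"), ("f3", "DOUBLE")]
swapped_topic = [("f2", "STRING"), ("f1", "STRING"), ("f3", "DOUBLE")]

for problem in schema_mismatches(swapped_topic, table_columns):
    print(problem)
```

An empty list means the three conditions hold under the assumptions of the sketch. The swapped schema is reported at positions 1 and 2 because the comparison is positional.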