This topic describes how to use DataHub to migrate log data to MaxCompute.
Prerequisites
The following permissions are granted to the account authorized to access MaxCompute:
- CreateInstance permission on MaxCompute projects
- Permissions to view, modify, and update MaxCompute tables
For more information, see Authorize users.
Background information
DataHub is a platform designed to process streaming data. After data is uploaded
to DataHub, it is stored in a table for real-time processing. DataHub then runs
scheduled tasks that synchronize the data to a MaxCompute table within five minutes
for offline computing.
To periodically archive streaming data in DataHub to MaxCompute, you only need to
create and configure a DataConnector.
Procedure
- On the odpscmd client, create a table that is used to store the data synchronized
from DataHub. Example:
CREATE TABLE test (f1 STRING, f2 STRING, f3 DOUBLE) PARTITIONED BY (ds STRING);
- Create a project in the DataHub console.
- Log on to the DataHub console. In the upper-left corner, select a region.
- In the left-side navigation pane, click Project Manager.
- In the upper-right corner of the Project List page, click Create Project.
- In the Create Project dialog box, specify Name and Comment, and click Create.
- Create a topic.
- On the Project List page, find the project for which you want to create a topic and click View in the Operate column.
- In the upper-right corner of the project details page, click Create Topic. In the Create Topic dialog box, configure the parameters.
- Click Next Step to complete the topic configuration.
Note
- Schema corresponds to a MaxCompute table. The field names, data types, and field sequence
specified by Schema must be consistent with those of the MaxCompute table. You can
create a DataConnector only if all three are consistent.
- You can migrate topics of the TUPLE and BLOB types to MaxCompute tables.
- A maximum of 20 topics can be created by default. If you require more topics, submit
a ticket.
- The owner of a DataHub topic or the Creator account has the permissions to manage
a DataConnector, for example, to create or delete a DataConnector.
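The consistency requirement in the first note can be checked before you create a DataConnector. The following Python sketch is an illustration only and is not part of DataHub or MaxCompute: schema_mismatches is a hypothetical helper, and the field lists are typed in by hand from the example table test rather than read from either service. It assumes that names and types can be compared case-insensitively as plain strings, and it leaves out the partition column ds.

```python
from typing import List, Tuple

# A field is described as (name, data type). Both values are plain strings
# typed in by hand for this sketch; nothing is fetched from DataHub or MaxCompute.
Field = Tuple[str, str]


def schema_mismatches(topic_fields: List[Field], table_fields: List[Field]) -> List[str]:
    """Hypothetical helper: list differences between a topic schema and table columns.

    Compares the field count, then name and type position by position, so a
    reordered field is reported as a mismatch. Ignoring case is an assumption
    of this sketch.
    """
    problems: List[str] = []
    if len(topic_fields) != len(table_fields):
        problems.append(
            f"field count differs: topic has {len(topic_fields)}, table has {len(table_fields)}"
        )
    pairs = zip(topic_fields, table_fields)
    for position, ((topic_name, topic_type), (table_name, table_type)) in enumerate(pairs, start=1):
        if topic_name.lower() != table_name.lower():
            problems.append(f"position {position}: name {topic_name!r} != {table_name!r}")
        if topic_type.lower() != table_type.lower():
            problems.append(f"position {position}: type {topic_type!r} != {table_type!r}")
    return problems


# Regular columns of the example table test, in declaration order.
table_columns = [("f1", "string"), ("f2", "string"), ("f3", "double")]

# A topic schema with the same names, types, and order produces no findings.
matching_topic = [("f1", "STRING"), ("f2", "STRING"), ("f3", "DOUBLE")]
print(schema_mismatches(matching_topic, table_columns))

# Swapping f2 and f3 breaks the field sequence, so both positions are reported.
reordered_topic = [("f1", "STRING"), ("f3", "DOUBLE"), ("f2", "STRING")]
for problem in schema_mismatches(reordered_topic, table_columns):
    print(problem)
```

An empty result means the three properties line up for the fields that were listed. Any entry in the result points at the position to fix in the topic schema or the table definition.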
- Write data to the newly created topic.
- Click View in the Operate column of the newly created topic.
- On the topic details page, click Connector.
- In the Create connector dialog box, click MaxCompute, configure the parameters, and then click Create.
- View DataConnector details.
- In the left-side navigation pane, click Project Manager.
- On the Project List page, find the project whose DataConnector details you want to view and click
View in the Operate column.
- On the Topic List tab, find the topic of the project and click View in the Operate column.
- On the topic details page, click the Connector tab to view the created DataConnector.
- Click View to view DataConnector details.
By default, DataHub migrates data to MaxCompute tables at five-minute intervals or
when the amount of data reaches 60 MB.
Sync Offset indicates the number of migrated data entries.
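
The default trigger described above (five minutes elapsed, or 60 MB of data) amounts to an either-or condition. The sketch below models that condition in Python to make it concrete. It is not DataHub's implementation: should_migrate and both constants are hypothetical names, and treating 1 MB as 1,048,576 bytes and the thresholds as inclusive are assumptions of this sketch.

```python
# Default thresholds taken from the text: five minutes or 60 MB.
# Treating 1 MB as 1024 * 1024 bytes is an assumption of this sketch.
MIGRATION_INTERVAL_SECONDS = 5 * 60
MIGRATION_SIZE_BYTES = 60 * 1024 * 1024


def should_migrate(seconds_since_last_migration: float, pending_bytes: int) -> bool:
    """Hypothetical model: migrate when either default threshold is reached."""
    return (
        seconds_since_last_migration >= MIGRATION_INTERVAL_SECONDS
        or pending_bytes >= MIGRATION_SIZE_BYTES
    )


# Two minutes elapsed and 10 MB pending: neither threshold is reached.
print(should_migrate(120, 10 * 1024 * 1024))  # False

# Five minutes elapsed with no data pending: the time threshold is reached.
print(should_migrate(300, 0))  # True

# Thirty seconds elapsed but 64 MB pending: the size threshold is reached.
print(should_migrate(30, 64 * 1024 * 1024))  # True
```

Under this reading, a low-traffic topic is migrated about every five minutes, while a topic that accumulates 60 MB sooner is migrated before the interval ends.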

- Execute the following statement to check whether the log data is migrated to MaxCompute:
SELECT * FROM test;
If the result shown in the following figure is displayed, the log data is migrated
to MaxCompute.
