
Configure VPC environment to achieve data synchronization

Last Updated: Apr 25, 2018

This article explains how to synchronize MySQL data sources in VPC environments to MaxCompute. Databases built in a VPC environment are subject to network connectivity restrictions, so their data cannot be synchronized directly to the cloud.

The network of Finance Cloud is a VPC network. In this tutorial, we have two ECS servers in the same network segment:

  • ECS1 serves as the resource group, on which the synchronization task runs.

  • ECS2 is used to build a MySQL database.

The ECS1 server reads data of the database from the ECS2 server and then writes it to MaxCompute.

Note:
You must grant the ECS1 server permission to access the relevant database and read its data. The command for granting permissions is as follows:

    grant all privileges on *.* to 'demo_test'@'%' identified by 'Password';

Here, % grants the permissions to connections from any IP address.
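The % wildcard in the statement above lets the account connect from any address. As a minimal sketch, assuming ECS1 reaches MySQL from a known private IP address, the statement could be narrowed to that host and to read-only access on one database. The helper name build_grant and the address 172.31.46.10 are hypothetical and do not come from the tutorial. The identified by clause inside grant is MySQL 5.x syntax, which matches the original command.

```python
def build_grant(user, host, password, database="*", privileges="SELECT"):
    """Return a MySQL 5.x style GRANT statement limited to one client host.

    Illustrative only: the host value is an assumption, and MySQL 8.0 no
    longer accepts IDENTIFIED BY inside GRANT (create the user first there).
    """
    for value in (user, host, password):
        # Keep the sketch safe: refuse characters that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported here")
    return "grant {0} on {1}.* to '{2}'@'{3}' identified by '{4}';".format(
        privileges, database, user, host, password
    )


# Hypothetical private IP address of ECS1 inside the VPC.
print(build_grant("demo_test", "172.31.46.10", "Password", database="demo_test"))
```

Run the printed statement in the MySQL client on ECS2. Read-only access is enough here because the synchronization task only reads from MySQL.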

The following figure demonstrates the procedure mentioned in this tutorial:

[Figure: overview of the procedure in this tutorial]

Procedure

Purchase ECS servers

First, purchase two ECS servers with your Finance Cloud account.

  • The ECS1 server (with an elastic IP (EIP) of 120.76.2.21) serves as the scheduling resource group.

  • The ECS2 server (with an EIP of 120.76.1.119) is used to build a MySQL database with the database name of demo_test and the table name of mytest.

Note:

  • We recommend using CentOS 6, CentOS 7, or AliyunOS.

  • Make sure that the ECS server added to run synchronization tasks has Python 2.6.5 or later (the CentOS version is 6.5, 64-bit).

  • We recommend binding an EIP instead of assigning a public IP.
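The Python version requirement in the note above can be checked on the server with a short script. This is a minimal sketch. The function name meets_minimum is illustrative, and the threshold is the 2.6.5 stated in the note.

```python
import sys

# Minimum interpreter version stated in the note above.
MINIMUM = (2, 6, 5)


def meets_minimum(version, minimum=MINIMUM):
    """Return True when a (major, minor, micro, ...) tuple is at least `minimum`."""
    return tuple(version[:3]) >= tuple(minimum)


if __name__ == "__main__":
    found = tuple(sys.version_info[:3])
    status = "OK" if meets_minimum(found) else "too old"
    print("Python %d.%d.%d: %s" % (found + (status,)))
```

The script avoids newer syntax so that it also runs on the old interpreters it is meant to detect.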

Purchase public network bandwidth so that custom ECS servers can access the public network

Use either of the following options:

  • Directly assign a public IP address when purchasing the ECS servers.

  • If you choose not to assign a public IP, you must configure and bind an EIP. For more information, see Bind an EIP.

ECS instances in VPC environments must be able to access the public network. If an ECS instance is created in a VPC environment and has neither a system-assigned public IP address nor a bound EIP, its connection to DataWorks fails, because DataWorks relies on this public address for heartbeat packets.

Add a security group

A security group is a logical group and a virtual firewall. It consists of instances in the same region that have the same security requirements and trust each other. As an important means of network isolation, it sets the network access control for one or more ECS instances. Each instance belongs to at least one security group, which must be specified when the instance is created. Instances in the same security group can communicate over the network, but instances in different security groups cannot communicate over the intranet by default. You can authorize intercommunication between security groups. For more information, see Scenarios.

Note:

Add a security group rule for 172.31.46.0/24 and set the port range to -1/-1. This ensures connectivity across the entire 172.31.46.0 private IP segment.
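Before relying on the rule, it can help to confirm that the private addresses of both servers fall inside the authorized segment. The sketch below uses the ipaddress module from the Python 3 standard library, so run it on any machine with Python 3 rather than on an older interpreter. The two private addresses are hypothetical, because the tutorial lists only the EIPs.

```python
import ipaddress

# The segment authorized in the note above.
SEGMENT = ipaddress.ip_network("172.31.46.0/24")


def in_segment(address, network=SEGMENT):
    """Return True when `address` belongs to `network`."""
    return ipaddress.ip_address(address) in network


# Hypothetical private IP addresses; replace them with the real ones.
for name, ip in [("ECS1", "172.31.46.10"), ("ECS2", "172.31.46.11")]:
    print(name, ip, "inside" if in_segment(ip) else "outside", SEGMENT)
```

An EIP such as 120.76.2.21 is reported as outside the segment, which is expected: the rule covers private network traffic only.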

Add a scheduling resource

  1. Log on to your Alibaba Cloud account.
  2. Go to DataWorks as the project administrator and select Scheduling Resource List.
  3. Click New Scheduling Resource and enter a name for the new resource, as shown in the following figure.

    [Figure: adding a scheduling resource]

  4. After adding the scheduling resource, click Server Management in the Actions column of the new scheduling resource. On the server addition page that opens, add the purchased ECS server to the resource group, as shown in the following figure.

    [Figure: Server Management for the scheduling resource]

  5. Click Add a Server.

    [Figure]

    • Network Type: Select Classic Network or VPC based on your network type.
    • Server Name:
      • Obtain it on the classic network: Log on to ECS and run the hostname command, and the returned value is the server name.
      • Obtain it on the VPC: Log on to ECS and run the dmidecode | grep UUID command, and the returned value is the server name.
    • Device IP: Enter the VPC IP address of the ECS server.
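On a VPC, the server name comes from the output of dmidecode | grep UUID. As a sketch, and assuming the output is a single line of the form "UUID: value" and that the value alone is what the Server Name field expects, the value could be extracted as follows. The sample UUID is made up for illustration.

```python
import re

# A UUID is 32 hexadecimal digits plus 4 hyphens, 36 characters in total.
UUID_LINE = re.compile(r"UUID:\s*([0-9A-Fa-f-]{36})")


def extract_uuid(dmidecode_output):
    """Return the UUID found in `dmidecode | grep UUID` output, or None."""
    match = UUID_LINE.search(dmidecode_output)
    return match.group(1) if match else None


# Made-up sample of the command output, not taken from a real server.
sample = "\tUUID: 713F4718-8446-4433-A8EC-6B5B62D75A24\n"
print(extract_uuid(sample))
```

Copy the printed value into the Server Name field exactly as shown, including its letter case.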

After you complete the preceding steps, the newly purchased ECS server is registered with DataWorks, but it is not yet ready for use.

Log on to the remote server

  1. Go to the DTplus Console.
  2. Go to the ECS > Instance > Connect page, log on to the remote server, and perform the corresponding operations, as shown in the following figure.

    [Figure: connecting to the ECS instance]

  3. Click Connect.

    Note:
    The connection password is displayed only once and is required every time you log on, so save it in a secure place.

  4. Enter the remote logon password.

  5. Click Copy Command Input and enter related commands to perform corresponding operations.

    [Figure]

  6. Initialize the server.

    1. On the ECS server that you logged on to remotely, run the su root command to switch to the root user, and enter the password set when purchasing the server.

    2. Run the command (directly copy this command line from the interface prompt and run it): wget https://alisaproxy.shuju.aliyun.com/install.sh --no-check-certificate.

    3. Run the command (directly copy this command line from the interface prompt and run it): sh install.sh --user_name=zz_[Unique identifier of the scheduling resource] --password=[AK password] --enable_uuid=false.

    4. About 15 seconds later, click the Refresh button on the server addition page and check whether the service status becomes Normal. If it does, the newly created ECS server is registered successfully.

After you complete the preceding steps, the service status may remain Stopped. This can happen for the following reason:

The error shown in the preceding figure indicates that no host was bound. To fix the error, follow these steps.

  1. Switch to the admin account.

  2. Run the hostname -i command to view the host binding status.

  3. Run the vim /etc/hosts command to add an IP address and a host name.

  4. Refresh the service status on the page, and if the service status becomes Normal, the newly created ECS server is registered successfully.
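The host binding fix in the steps above amounts to making sure that /etc/hosts has a line mapping the server's IP address to its host name. The following sketch checks hosts-file text for such a line and builds the line to add. The function names, address, and host name are illustrative assumptions.

```python
def has_host_entry(hosts_text, hostname):
    """Return True when an uncommented hosts line already maps `hostname`."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into the IP and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return True
    return False


def host_entry(ip, hostname):
    """Return the line to append to /etc/hosts."""
    return "{0} {1}".format(ip, hostname)


# Hypothetical file contents, IP address, and host name.
current = "127.0.0.1 localhost\n"
if not has_host_entry(current, "ecs1"):
    print(host_entry("172.31.46.10", "ecs1"))
```

On the server, read the real file contents instead of the sample string, and append the printed line with vim as described in step 3.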

Note:
If the service status remains Stopped after refreshing, switch to the admin account and run the following command to restart alisa: /home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart.

Add a MySQL data source

  1. Log on to the DataWorks console.

  2. Select Data Integration > Offline Sync > Data Sources to open the data source management page.

  3. Click New Source, and select MySQL from the dialog box.

  4. Enter connection information, and click Test Connectivity. See the following figure.

    [Figure: MySQL data source connection information]

    JDBC URL: Enter the URL in the format jdbc:mysql://IP:Port/database, where IP is the VPC IP address of the ECS server.

  5. Click Complete to finish the data source connection.
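As a small illustration of the JDBC URL format used in step 4, the sketch below assembles the URL from its parts. The IP address is hypothetical, and port 3306 is the MySQL default, assumed here because the tutorial does not state the port.

```python
def build_jdbc_url(ip, database, port=3306):
    """Return a URL in the form jdbc:mysql://IP:Port/database."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return "jdbc:mysql://{0}:{1}/{2}".format(ip, port, database)


# Hypothetical VPC IP address of ECS2; demo_test is the database in this tutorial.
print(build_jdbc_url("172.31.46.11", "demo_test"))
# prints jdbc:mysql://172.31.46.11:3306/demo_test
```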

Configure a synchronization task (synchronization from the VPC is only supported in script mode)

The script configuration example is as follows:

    {
      "configuration": {
        "reader": {
          "plugin": "mysql",
          "parameter": {
            "datasource": "ecs_mysql",
            "column": [
              "id",
              "name",
              "sex",
              "age"
            ],
            "where": "",
            "splitPk": "",
            "table": "mytest"
          }
        },
        "writer": {
          "plugin": "odps",
          "parameter": {
            "odpsServer": "http://service.odps.aliyun.com/api",
            "tunnelServer": "http://dt.odps.aliyun.com",
            "partition": "",
            "truncate": false,
            "datasource": "odps_first",
            "column": [
              "id",
              "name",
              "sex",
              "age"
            ],
            "table": "mytest"
          }
        },
        "setting": {
          "errorLimit": {
            "record": "0"
          },
          "speed": {
            "concurrent": "1",
            "mbps": "1"
          }
        }
      },
      "type": "job",
      "version": "1.0"
    }
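Because script mode has no form validation, a quick structural check of the script before submitting it can catch simple mistakes. The sketch below is not part of DataWorks. It assumes that reader and writer columns are mapped by position, so their counts should match, and that both sides need a data source and a table. The embedded sample is a trimmed copy of the configuration above.

```python
import json


def check_job(config_text):
    """Return a list of problems found in a script-mode job (empty if none)."""
    conf = json.loads(config_text)["configuration"]
    reader = conf["reader"]["parameter"]
    writer = conf["writer"]["parameter"]
    problems = []
    # Assumption: columns are paired by position, so the counts must agree.
    if len(reader["column"]) != len(writer["column"]):
        problems.append("reader has %d columns, writer has %d"
                        % (len(reader["column"]), len(writer["column"])))
    for side, params in (("reader", reader), ("writer", writer)):
        for key in ("datasource", "table"):
            if not params.get(key):
                problems.append("%s %s is empty" % (side, key))
    return problems


SAMPLE = """
{"configuration": {
   "reader": {"plugin": "mysql", "parameter": {
       "datasource": "ecs_mysql", "table": "mytest",
       "column": ["id", "name", "sex", "age"]}},
   "writer": {"plugin": "odps", "parameter": {
       "datasource": "odps_first", "table": "mytest",
       "column": ["id", "name", "sex", "age"]}}},
 "type": "job", "version": "1.0"}
"""
print(check_job(SAMPLE))
```

An empty list means that none of these checks failed. It does not guarantee that the task runs successfully.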

The connection addresses of MaxCompute and its tunnel service

Currently, the MaxCompute service and the MaxCompute tunnel service of Finance Cloud can both be reached at their connection addresses. The following table lists the connection addresses for accessing MaxCompute and its tunnel service in different regions and network environments.

Using these services may incur charges in some cases. For more information, see Access domains and data centers. Configuring the synchronization task is not enough on its own: the scheduling setup is complete only when the task runs in the scheduling resource group that you added earlier.

Modify the scheduling resource group

  1. Go to the DataWorks > Operation Center > Cycle Task page.

  2. Select the synchronization task and click Modify Resource Group.

    [Figure]

  3. Select the resource group to be added and click OK.

    The task rerunning result is shown as follows.

    You can run the select * from mytest statement in both the MySQL database and MaxCompute to view the table contents.
