edit-icon download-icon

Database data sync (without public network IP)

Last Updated: Apr 11, 2018

For security reasons, you may build a database on a server with no public IP, and want to synchronize the data in such a database to the cloud. However, because of the network connectivity problems, data in this environment cannot be synchronized directly. This article explains how to synchronize MySQL data in environments with no public IP to MaxCompute.

Environment prerequisites

Generally, after purchasing the ECS to build MySQL database, you may encounter two scenarios: no public IP (elastic IP, EIP) assigned and local IDC without public IP.

Details are as follows.

Scenario 1: The ECS is purchased to build a MySQL database; no public IP (EIP) is assigned.

The following diagram explains the procedure of this scenario.

Scenario1

  • As ECS2 server cannot access the public network, an ECS1 server that is in the same network segment as ECS2 and has the ability to access the public network is required in the test.

  • ECS1 serves as the resource group, on which the synchronization task runs.

  • The data from MySQL database of ECS2 server is written to the MaxCompute.

    Note:

    You must grant database permissions to the ECS2 server to access relevant database and read the data of the database to ECS1. The command for granting permissions is as follows.

    1. grant all privileges on *.* to 'demo_test'@'%' identified by 'password'; --> "%" stands for authorizing all the IP addresses
  1. Purchase an ECS server with the Cloud account.

    • The ECS server (with an EIP: 120.76.2.21) serves as the scheduling resource group.

    • Build a MySQL database in the local data room database: demo_test; table name: mytest.

    Note:

    • CentOS 6, CentOS 7, and AliyunOS are recommended.

    • Check whether the Python version of the added ECS servers for running synchronization tasks is Python 2.6.5 or later (The CentOS version is 6.5 (64-bit)).

    • We recommend that you bind an EIP instead of assigning a public IP.

    • For information on ECS purchase, log on to the ECS console.

  2. Purchase a public network bandwidth for custom ECS servers to access the public network.

    You can customize the ECS by using any of the following methods.

    • Directly assign a public IP address when purchasing the ECS servers.

    • If you do not want to assign a public IP, you must configure and bind an EIP. For more information, see Bind EIPs.

    The ability to access the public network is necessary for ECS servers. If no public IP is assigned by the system nor an EIP is bound to the ECS server, the connection to DataWorks may fail because the EIP is critical for DataWorks to send heartbeat packets.

  3. Add a security group.

    A security group is a logical group and a virtual firewall. It consists of instances with the same security requirements and mutual trust in the same region. As an important means of network isolation, it can be used to set the network access control for one or more ECS instances. Each instance belongs to at least one security group, which must be specified at the time of instance creation. Instances in the same security group can communicate through the network, but instances in different security groups cannot communicate over intranet by default. You can authorize the intercommunication between security groups. For details, see Use Cases.

    Note:

    Add a security group of 172.31.46.0/24 and set the port range as -1/-1. Then, you can ensure the connectivity of the entire private network IP segment 172.31.46.0, and ECS can connect to the database of the local data room. 0.0.0.0/0 indicates that all the IP segments are allowed to be accessed by ECS.

Scenario 2: Local IDCs with no public IP.

Scenario2

As machine 1 cannot access the public network, a machine 2 that is in the same network segment as machine 1 and has the ability to access the public network is required .

  • The server on machine 2 (with an public IP address: 120.76.2.21) serves as the scheduling resource group.

  • Build a MySQL database in the local data room database: demo_test; table name: mytest.

Procedure

Add scheduling resources

  1. Navigate to DataWorks > Schedule Resource List as the project administrator.

    ScheduleResourceList

  2. click New Scheduling Resources, and enter a name for the added resource.

    NewScheduling

  3. After the name is added, click Manage Servers to enter the Server Addition page, and add the purchased ECS to the resource group.

  4. Click Add Server.

    AddAServer

    The configuration items are as follows.

    • Network Type: select Classic Network or VPC based on your network type.

    • Server Name: enter the real host name (non-cutomizable)

      Obtain it on the classic network: log on to the sever and run the hostname command, and the returned value is the server name.

    • Machine IP: the IP address of the VPC.

    After completing the operations, the information of the new server is successfully registered to DataWorks, but it cannot be effective immediately.

Log on to the remote server

  1. Go to ECS Console > Instance > Remote Connection page, log on to the remote server and perform corresponding operations.

  2. Click Remote Connection.

    Note: The remote connection password is displayed only once and is required for logon every time. Therefore, store your password carefully.

  3. Enter the remote logon password.

  4. Click Copy Command Input, and enter the hostname command to find the relevant results.

    Note: If you are using machine 2, you can directly enter the hostname command to find the relevant information, and remote logon is not required.

  5. Initialize the server.

  6. Log on to the server with the root permission, run the su root command, and enter the set password.

  7. Run the command (copy page prompts directly for the command line, and run it).wget https://alisaproxy.shuju.aliyun.com/install.sh --no-check-certificate;

  8. Run the command (copy page prompts directly for the command line, and run it).

    sh install.sh --user_name=zz_[ Unique identifier of scheduling resource] --password=[AK password] --enable_uuid=false;

  9. About 15s later, click Refresh button on the Add Server page and check whether the service status is Normal. If it is normal, the newly created ECS server has been registered successfully.

After the preceding steps are completed, the service status may remain as Stopped, and this may occur because of the following.

The error shown in the preceding figure indicates that no host was bound. To fix the error, follow these steps.

  1. Switch to the admin account.

  2. Run the hostname -i command to check the host binding status.

  3. Run the vim / etc / hosts command to add an IP address and a host name.

  4. Refresh the service status on the page, and if the service status is “Normal”, the newly created server has been registered successfully.

    Note: If the service status remains as Stopped after refreshing the page, restart the following alisa command to switch to the admin account: /home/admin/alisatatasknode/target/alisatatasknode/bin/serverct1 restart.

Configure the MySQL database

  1. Go to the DataWorks console as a developer and click Enter Project.

  2. Click Data Integration from the upper menu and navigate to the Data Sources page.

  3. Click New Source to show the supported data source types.

  4. Select MySQL as the data source type.

    Configuration item descriptions are shown as follows.

    • Type: data source without a public IP address.

    • Name: It is a combination of letters, numbers, and underscores. It must begin with a letter or underscore and must not exceed 60 characters.

    • Description: A brief description of the data source, which must not exceed 80 characters.

    • JDBCUrl: the JDBC URL. Format: jdbc:mysql://ServerIP:Port/database.

    • Username/Password: The user name and password of the database.

  5. Click Test Connectivity.

  6. In this case, the connectivity is Failed, and click Complete.

Configure a synchronization task

  1. An example of script configuration is as follows.
  2. {
  3. "configuration": {
  4. "reader": {
  5. "plugin": "mysql",
  6. "parameter": {
  7. "datasource": "ecs_mysql",
  8. "column": [
  9. "id",
  10. "name",
  11. "sex",
  12. "age"
  13. ],
  14. "where": "",
  15. "splitPk": "",
  16. "table": "mytest"
  17. }
  18. },
  19. "writer": {
  20. "plugin": "odps",
  21. "parameter": {
  22. "odpsServer": "http://service.odps.aliyun.com/api",
  23. "tunnelServer": "http://dt.odps.aliyun.com",
  24. "partition": "",
  25. "truncate": false,
  26. "datasource": "odps_first",
  27. "column": [
  28. "id",
  29. "name",
  30. "sex",
  31. "age"
  32. ],
  33. "table": "mytest"
  34. }
  35. },
  36. "setting": {
  37. "errorLimit": {
  38. "record": "0"
  39. },
  40. "speed": {
  41. "concurrent": "10",
  42. "mbps": "10"
  43. }
  44. }
  45. },
  46. "type": "job",
  47. "version": "1.0"
  48. }

Note:

The connection addresses of MaxCompute and its tunnel service:

Currently, MaxCompute service and the default MaxCompute data source address for the MaxCompute tunnel connection can be connected. For more information,see Endpoints and Data Center. This document lists the connection addresses for accessing MaxCompute and its tunnel service in different regions and network environments.

Modify the scheduling resource group

After you configure the synchronization, the scheduling task is not yet configured successfully until the configured synchronization task runs in the previously added scheduling resource group. Modify the scheduling resource group, and the steps are as follows.

  1. Go to DataWorks > Operation Center page, and modify the scheduling task in either the Cycle Task or Manual Task.

  2. Select the synchronization task and select More > Modify Resource Group.

  3. Select the resource group to be added and click OK.

    The task rerunning result is shown as follows.

    You can view the sync log by using select * from mytest command.

Thank you! We've received your feedback.