
MaxCompute: Migrate JSON-formatted data from MongoDB to MaxCompute

Last Updated: Oct 25, 2023

This topic describes how to use the Data Integration service of DataWorks to migrate JSON-formatted fields from MongoDB to MaxCompute.

Prerequisites

  • MaxCompute and DataWorks are activated. For more information, see Activate MaxCompute and DataWorks.

  • A workflow is created in DataWorks. In this example, a workflow is created in a workspace that is in basic mode. For more information, see Create a workflow.

Prepare test data in MongoDB

  1. Prepare an account.

    Create a user in your MongoDB database. The username and password are required when you create a MongoDB connection in DataWorks. In this example, run the following command:

    db.createUser({user:"bookuser",pwd:"123456",roles:["user1"]})

    In this example, a user named bookuser is created, and the password of the user is 123456. The user is assigned a role (user1 in this example) that has permissions to access the data that you want to migrate.

  2. Prepare data.

    Upload the test data to the MongoDB database. In this example, an ApsaraDB for MongoDB instance in a virtual private cloud (VPC) is used. You must apply for a public endpoint for the ApsaraDB for MongoDB instance so that the instance can communicate with the default resource group of DataWorks. The following test data is uploaded:

    {
        "store": {
            "book": [
                {
                    "category": "reference",
                    "author": "Nigel Rees",
                    "title": "Sayings of the Century",
                    "price": 8.95
                },
                {
                    "category": "fiction",
                    "author": "Evelyn Waugh",
                    "title": "Sword of Honour",
                    "price": 12.99
                },
                {
                    "category": "fiction",
                    "author": "J. R. R. Tolkien",
                    "title": "The Lord of the Rings",
                    "isbn": "0-395-19395-8",
                    "price": 22.99
                }
            ],
            "bicycle": {
                "color": "red",
                "price": 19.95
            }
        },
        "expensive": 10
    }
  3. Log on to the MongoDB database in the Data Management (DMS) console. In this example, the name of the database is admin, and the name of the collection is userlog. You can run the following command to view the uploaded data. For a programmatic alternative to these preparation steps, see the pymongo sketch after this procedure.

    db.userlog.find().limit(10)
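
If you prefer to prepare the same test data programmatically, the following Python sketch uses pymongo to create the user, upload the test document, and read it back. This is a minimal sketch, not part of the official procedure: the host, port, root password, and the readWrite role are assumptions, while the admin database, the userlog collection, and the bookuser account match this example.

    # Minimal sketch, assuming a reachable ApsaraDB for MongoDB endpoint.
    # The host, port, root password, and the readWrite role are assumptions;
    # adjust them to match your instance and your security policy.
    from pymongo import MongoClient

    client = MongoClient(
        host="<mongodb-host>",          # placeholder endpoint
        port=3717,                      # placeholder port
        username="root",
        password="<root-password>",     # placeholder password
        authSource="admin",
    )
    db = client["admin"]

    # Equivalent of db.createUser(...) in the mongo shell: extra keyword
    # arguments are appended to the createUser database command.
    db.command("createUser", "bookuser", pwd="123456",
               roles=[{"role": "readWrite", "db": "admin"}])

    # Upload the test document to the userlog collection.
    db.userlog.insert_one({
        "store": {
            "book": [
                {"category": "reference", "author": "Nigel Rees",
                 "title": "Sayings of the Century", "price": 8.95},
                {"category": "fiction", "author": "Evelyn Waugh",
                 "title": "Sword of Honour", "price": 12.99},
                {"category": "fiction", "author": "J. R. R. Tolkien",
                 "title": "The Lord of the Rings", "isbn": "0-395-19395-8",
                 "price": 22.99},
            ],
            "bicycle": {"color": "red", "price": 19.95},
        },
        "expensive": 10,
    })

    # Equivalent of db.userlog.find().limit(10) in the mongo shell.
    for doc in db.userlog.find().limit(10):
        print(doc)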

Migrate JSON-formatted data from MongoDB to MaxCompute by using DataWorks

  1. Log on to the DataWorks console.

  2. Create a table in DataWorks. This table is used to store the data that is migrated from MongoDB.

    1. Right-click the workflow that you created and choose Create > MaxCompute > Table.

    2. On the Create Table page, select the engine type and enter a table name.

    3. On the table editing page, click DDL Statement.

    4. In the DDL Statement dialog box, enter the following statement and click Generate Table Schema:

      Important

      The table name that you specify in the table creation statement must be the same as the value of the Name parameter that you configured in the Create Table dialog box.

      create table mqdata (mqdata string);
    5. Click Commit to Production Environment.

  3. Create a MongoDB connection. For more information, see Add a MongoDB data source.

  4. Create a batch synchronization node.

    1. Go to the Data Analytics page. Right-click the workflow and choose Create > Data Integration > Offline synchronization.

    2. In the Create Node dialog box, enter a node name and click Submit.

    3. In the toolbar, click the Conversion script icon to switch to the script mode.

    4. In the script mode, click the Import Template icon in the toolbar.

    5. In the Import Template dialog box, select the source type and data source, and the target type and data source. Then, click Confirm.

    6. Enter the following script. For an illustration of how the column path in the reader resolves against the test data, see the Python sketch after this procedure.

      {
          "type": "job",
          "steps": [
              {
                  "stepType": "mongodb",
                  "parameter": {
                      "datasource": "mongodb_userlog", // The name of the MongoDB connection. 
                      "column": [
                          {
                              "name": "store.bicycle.color", // The path of the JSON-formatted field. In this example, the color field is extracted. 
                              "type": "document.String" // For a field that is not a top-level field, specify the data type of the value that is finally obtained. If the specified JSON-formatted field is a top-level field, such as the expensive field in this example, enter string. 
                          }
                      ],
                      "collectionName": "userlog" // The name of the collection. 
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": "",
                      "isCompress": false,
                      "truncate": true,
                      "datasource": "odps_first", // The name of the MaxCompute connection. 
                      "column": [
                          "mqdata" // The name of the column in the MaxCompute table. 
                      ],
                      "emptyAsNull": false,
                      "table": "mqdata"
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": ""
              },
              "speed": {
                  "concurrent": 2,
                  "throttle": false
              }
          }
      }
    7. Click the Run icon in the toolbar to run the code.

    8. You can view the results on the Operation Log tab.
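
The key setting in the preceding script is the column path of the MongoDB Reader: "store.bicycle.color" with the document.String type tells the reader to drill into the nested JSON document and hand the leaf value to the single mqdata column of the writer. The following Python sketch only shows what that dotted path resolves to in the test document; it is an illustration of the path semantics, not the internal implementation of Data Integration.

    # Illustration only: resolve the dotted path "store.bicycle.color" against
    # the nested test document, the same path that the reader column specifies.
    # This is not the Data Integration implementation.
    from functools import reduce

    doc = {
        "store": {
            "book": [{"category": "reference", "price": 8.95}],
            "bicycle": {"color": "red", "price": 19.95},
        },
        "expensive": 10,
    }

    def resolve(document, dotted_path):
        """Walk a nested dict along a dotted path such as 'store.bicycle.color'."""
        return reduce(lambda node, key: node[key], dotted_path.split("."), document)

    print(resolve(doc, "store.bicycle.color"))  # -> red, the value written to the mqdata column
    print(doc["expensive"])                     # a top-level field; declare it as string in the reader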

Verify the migration result

  1. Right-click the workflow and choose Create > MaxCompute > ODPS SQL.

  2. In the Create Node dialog box, enter a node name and click Submit.

  3. On the configuration tab of the ODPS SQL node, enter the following statement:

    SELECT * FROM mqdata;
  4. Click the Run icon in the toolbar to run the code.

  5. You can view the results on the Operation Log tab. For a programmatic check outside the console, see the PyODPS sketch below.
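
If you want to verify the result outside the DataWorks console, a PyODPS sketch like the following runs the same query. The AccessKey pair, project name, and endpoint are placeholders that you must replace; the mqdata table name matches this example.

    # Minimal sketch, assuming that the PyODPS package is installed
    # (pip install pyodps) and that you have a valid AccessKey pair.
    # The AccessKey pair, project name, and endpoint are placeholders.
    from odps import ODPS

    o = ODPS(
        "<your-access-key-id>",
        "<your-access-key-secret>",
        project="<your-project-name>",
        endpoint="<your-maxcompute-endpoint>",
    )

    # Same query as the one in the ODPS SQL node.
    with o.execute_sql("SELECT * FROM mqdata;").open_reader() as reader:
        for record in reader:
            print(record["mqdata"])  # expected value from the test data: red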