
MaxCompute:Migrate JSON-formatted data from MongoDB to MaxCompute

Last Updated: Mar 26, 2026

Use the Data Integration feature of DataWorks to read JSON fields from a MongoDB collection and load them into a MaxCompute table as a batch sync job.

Prerequisites

Before you begin, make sure that MaxCompute is activated, that you have a DataWorks workspace with a business flow, and that you have an ApsaraDB for MongoDB instance.

Limitations

  • The MongoDB instance must have an Internet endpoint enabled. The default resource group for DataWorks communicates with MongoDB over the public internet, so an instance that is reachable only over a Virtual Private Cloud (VPC) cannot be used.

How it works

Data Integration maps MongoDB JSON fields to MaxCompute columns using dot-notation paths. For example, given this source document:

{
  "store": {
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

The field store.bicycle.color maps to a MaxCompute column. You specify this path in the sync job configuration along with a type annotation:

JSON field path       Type annotation   Note
store.bicycle.color   document.String   Nested field: use the document. prefix followed by the data type
expensive             string            Top-level field: specify the type directly
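Conceptually, resolving a dot-notation path is just a walk through the nested document, one key per path segment. The following Python sketch is an illustration of that lookup, not DataWorks code:

```python
# Sketch of dot-notation path resolution as Data Integration applies it
# to each source document (illustration only, not part of the sync job).
def resolve_path(doc: dict, path: str):
    """Walk a nested dict following a dot-separated path."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # a missing field becomes NULL in the destination
        value = value[key]
    return value

source = {
    "store": {"bicycle": {"color": "red", "price": 19.95}},
    "expensive": 10,
}

print(resolve_path(source, "store.bicycle.color"))  # red
print(resolve_path(source, "expensive"))            # 10
```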

Step 1: Prepare test data in MongoDB

  1. Create a database user for DataWorks to authenticate with. In the MongoDB shell, run:

    db.createUser({user:"bookuser",pwd:"123456",roles:["root"]})

    This creates a user named bookuser with password 123456 and the built-in root role. Use a stronger password in production, and grant only the privileges that the sync job needs.

  2. Load test data into the collection. This tutorial uses an ApsaraDB for MongoDB instance with a VPC network type. Before proceeding, request an Internet endpoint for the instance — without it, the default resource group for DataWorks cannot reach the instance. Insert the following JSON document into the userlog collection of the admin database:

    {
        "store": {
            "book": [
                {
                    "category": "reference",
                    "author": "Nigel Rees",
                    "title": "Sayings of the Century",
                    "price": 8.95
                },
                {
                    "category": "fiction",
                    "author": "Evelyn Waugh",
                    "title": "Sword of Honour",
                    "price": 12.99
                },
                {
                    "category": "fiction",
                    "author": "J. R. R. Tolkien",
                    "title": "The Lord of the Rings",
                    "isbn": "0-395-19395-8",
                    "price": 22.99
                }
            ],
            "bicycle": {
                "color": "red",
                "price": 19.95
            }
        },
        "expensive": 10
    }
  3. Verify that the data was inserted correctly. In the DMS console for MongoDB, run the following query against the admin database:

    db.userlog.find().limit(10)

Step 2: Create the destination table in MaxCompute

  1. Log on to the DataWorks console.

  2. In the business flow you created, right-click and choose New Table > MaxCompute > Table.

  3. On the Create Table page, select the engine type and enter a table name.

  4. On the table editing page, click DDL Statement.

  5. In the DDL Mode dialog box, enter the following statement and click Generate Table Schema.

    create table mqdata (mqdata string);

    Important: The table name in the DDL statement must match the table name you entered on the Create Table page.

  6. Click Commit To Production Environment.

Step 3: Add the MongoDB data source

Add MongoDB as a data source in DataWorks. See Configure a MongoDB data source.

Step 4: Create and run the sync job

  1. In the business flow, right-click the workflow and choose Create Node > Data Integration > Offline synchronization.

  2. In the Create Node dialog box, enter a name and click Confirm.

  3. In the top navigation bar, click the Conversion script icon to switch to script mode.

  4. In script mode, click the Import Template icon in the toolbar.

  5. In the Import Template dialog box, set the source type, data source, target type, and data source fields, then click Confirm.

  6. Replace the generated script with the following configuration. This script reads store.bicycle.color from the userlog collection and writes it to the mqdata column in MaxCompute.

    {
        "type": "job",
        "steps": [
            {
                "stepType": "mongodb",
                "parameter": {
                    "datasource": "mongodb_userlog",
                    "column": [
                        {
                            "name": "store.bicycle.color",
                            "type": "document.String"
                        }
                    ],
                    "collectionName": "userlog"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "",
                    "isCompress": false,
                    "truncate": true,
                    "datasource": "odps_first",
                    "column": [
                        "mqdata"
                    ],
                    "emptyAsNull": false,
                    "table": "mqdata"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        },
        "setting": {
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "concurrent": 2,
                "throttle": false
            }
        }
    }

    Key parameters:

    Parameter            Value                Description
    datasource (reader)  mongodb_userlog      Name of the MongoDB data source added in Step 3
    column[].name        store.bicycle.color  Dot-notation path to the target JSON field
    column[].type        document.String      Type annotation for nested fields; use string for top-level fields
    collectionName       userlog              MongoDB collection to read from
    datasource (writer)  odps_first           Name of the MaxCompute data source
    column (writer)      mqdata               MaxCompute column to write to
    table                mqdata               MaxCompute destination table
    speed.concurrent     2                    Number of concurrent read threads
  7. Click the Run icon to run the job.

  8. Check the Operation Log tab to confirm the job completed without errors.
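As an optional check, the script in step 6 can be sanity-checked offline before you run it. The Python sketch below is illustrative, not a DataWorks tool: it verifies that every hop connects declared steps and that the reader and writer column counts match.

```python
# Offline sanity check for a Data Integration script (illustration only;
# DataWorks performs its own validation when you run the job).
def check_sync_script(script: dict) -> list:
    """Return a list of problems found in the sync script, empty if none."""
    problems = []
    steps = {s["name"]: s for s in script.get("steps", [])}
    # Every hop must connect two declared steps.
    for hop in script.get("order", {}).get("hops", []):
        for end in (hop["from"], hop["to"]):
            if end not in steps:
                problems.append("hop references unknown step: " + end)
    # The reader should emit as many columns as the writer expects.
    readers = [s for s in steps.values() if s["category"] == "reader"]
    writers = [s for s in steps.values() if s["category"] == "writer"]
    if readers and writers:
        n_read = len(readers[0]["parameter"]["column"])
        n_write = len(writers[0]["parameter"]["column"])
        if n_read != n_write:
            problems.append("reader and writer column counts differ")
    return problems

# Trimmed version of the script shown in step 6.
script = {
    "steps": [
        {"name": "Reader", "category": "reader",
         "parameter": {"column": [{"name": "store.bicycle.color",
                                   "type": "document.String"}]}},
        {"name": "Writer", "category": "writer",
         "parameter": {"column": ["mqdata"]}},
    ],
    "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
}
print(check_sync_script(script))  # an empty list means no problems found
```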

Verify the result

  1. In the business flow, right-click the workflow and choose New > MaxCompute > ODPS SQL.

  2. In the Create Node dialog box, enter a node name and click Confirm.

  3. On the ODPS SQL node editing page, enter the following query:

    SELECT * FROM mqdata;
  4. Click the Run icon to run the query.

  5. Check the Operation Log tab to view the results. The output should contain the value red, which is the value of store.bicycle.color in the source document.

What's next

  • To sync additional JSON fields, add more entries to the column array in the sync job script.

  • To run the sync job on a schedule, configure a scheduling dependency in DataWorks.
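To illustrate the first point: syncing both store.bicycle.color and expensive would mean listing both fields in the reader's column array, with a matching two-column list in the writer (the destination column names and table schema would need to match; this fragment is an illustration of the shape, not a complete script):

```json
"column": [
    {
        "name": "store.bicycle.color",
        "type": "document.String"
    },
    {
        "name": "expensive",
        "type": "string"
    }
]
```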