
DataWorks:Migrate JSON-formatted data from MongoDB to MaxCompute

Last Updated: Mar 27, 2026

MongoDB stores data as nested JSON documents. When migrating this data to MaxCompute, you need to extract specific fields using dot-notation paths and map them to the correct column types. This topic shows you how to use the Data Integration service of DataWorks to configure a batch sync job that reads nested JSON fields from a MongoDB collection and loads them into a MaxCompute table.

The example uses an ApsaraDB for MongoDB instance in a virtual private cloud (VPC) and extracts the store.bicycle.color field from a nested JSON document.

Prerequisites

Before you begin, make sure you have:

  1. An ApsaraDB for MongoDB instance. This example uses an instance in a VPC.
  2. A DataWorks workspace associated with a MaxCompute project.

Step 1: Prepare test data in MongoDB

This step creates a database user, uploads sample JSON data to MongoDB, and confirms the data is accessible before you configure the sync job.

Create a database user

Create a user in your MongoDB database so DataWorks can connect to it. Run the following command:

db.createUser({user: "bookuser", pwd: "123456", roles: ["readWrite"]})

This creates a user named bookuser with the password 123456 and the built-in readWrite role, which grants read and write access to the data.

Upload test data

If you use an ApsaraDB for MongoDB instance in a VPC, apply for a public endpoint for the instance first. DataWorks shared resource groups connect to MongoDB over the public network.

Upload the following JSON document to the userlog collection in the admin database:

{
    "store": {
        "book": [
            {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
            },
            {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
            },
            {
                "category": "fiction",
                "author": "J. R. R. Tolkien",
                "title": "The Lord of the Rings",
                "isbn": "0-395-19395-8",
                "price": 22.99
            }
        ],
        "bicycle": {
            "color": "red",
            "price": 19.95
        }
    },
    "expensive": 10
}
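Conceptually, the Reader walks this dot-notation path through the nested document. The following Python sketch shows what extracting store.bicycle.color amounts to (extract_path is a hypothetical helper written for illustration, not part of DataWorks or MongoDB):

```python
def extract_path(document, path):
    """Walk a dot-notation path (e.g. "store.bicycle.color") through nested dicts."""
    value = document
    for key in path.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value

# A trimmed copy of the test document above.
doc = {
    "store": {
        "bicycle": {"color": "red", "price": 19.95},
    },
    "expensive": 10,
}

print(extract_path(doc, "store.bicycle.color"))  # prints: red
print(extract_path(doc, "expensive"))            # prints: 10
```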

Verify the uploaded data

Log on to the MongoDB database in the Data Management (DMS) console and run the following command to confirm the data was uploaded:

db.userlog.find().limit(10)

Step 2: Configure the sync job in DataWorks

This step creates the destination table in MaxCompute, adds MongoDB as a data source, and configures a batch sync node that maps the JSON field path to the MaxCompute column.

Create the destination table in MaxCompute

  1. Log on to the DataWorks console.

  2. Right-click your workflow and choose New > MaxCompute > Table.

  3. On the Create Table page, select the engine type and enter a table name.

  4. On the table editing page, click DDL Statement.

  5. In the DDL Statement dialog box, enter the following statement and click Generate Table Schema:

    Important

    The table name in the DDL statement must match the Name value you set on the Create Table page.

    create table mqdata (mqdata string);
  6. Click Commit to Production Environment.

Add a MongoDB data source

Add your MongoDB instance as a data source in DataWorks. For more information, see Add a MongoDB data source.

Create a batch sync node

  1. On the Data Analytics page, right-click your workflow and choose New > Data Integration > Offline synchronization.

  2. In the Create Node dialog box, enter a node name and click Submit.

  3. In the top navigation bar, click the Conversion script icon to switch to script mode.

  4. In script mode, click the Import Template icon.

  5. In the Import Template dialog box, select the source type, source data source, destination type, and destination data source, then click Confirm.

  6. Replace the template content with the following script:

    {
        "type": "job",
        "steps": [
            {
                "stepType": "mongodb",
                "parameter": {
                    "datasource": "mongodb_userlog",
                    "column": [
                        {
                            "name": "store.bicycle.color",
                            "type": "document.String"
                        }
                    ],
                    "collectionName": "userlog"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "",
                    "isCompress": false,
                    "truncate": true,
                    "datasource": "odps_source",
                    "column": [
                        "mqdata"
                    ],
                    "emptyAsNull": false,
                    "table": "mqdata"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        },
        "setting": {
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "concurrent": 2,
                "throttle": false
            }
        }
    }

    The following table describes the key parameters:

    | Parameter | Value | Description |
    | --- | --- | --- |
    | datasource (Reader) | mongodb_userlog | The name of your MongoDB data source |
    | name (column) | store.bicycle.color | The dot-notation path of the JSON field to extract |
    | type (column) | document.String | The field data type. Use document.String for nested fields; for top-level fields (such as expensive), use string |
    | collectionName | userlog | The MongoDB collection to read from |
    | datasource (Writer) | odps_source | The name of your MaxCompute data source |
    | column (Writer) | mqdata | The column in the MaxCompute table to write to |
    | table | mqdata | The MaxCompute table name |

    The following example shows how the JSON document is mapped to the MaxCompute table after migration:

    | MongoDB document field | Value | MaxCompute column | Stored value |
    | --- | --- | --- | --- |
    | store.bicycle.color | "red" | mqdata | red |
  7. Click the Run icon to run the job.

  8. Check the run log to confirm the job completed without errors.
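Before running, it can help to sanity-check the script's structure. The following Python sketch verifies that the Reader and Writer map the same number of columns and that the hop wires Reader to Writer (check_sync_job is a hypothetical helper written for illustration, not a DataWorks API):

```python
def check_sync_job(job):
    """Basic structural checks on a batch sync script loaded as a dict."""
    reader = next(s for s in job["steps"] if s["category"] == "reader")
    writer = next(s for s in job["steps"] if s["category"] == "writer")
    # Each extracted MongoDB field needs a matching MaxCompute column.
    if len(reader["parameter"]["column"]) != len(writer["parameter"]["column"]):
        return False
    # The hop must route data from the Reader step to the Writer step.
    hop = job["order"]["hops"][0]
    return hop["from"] == reader["name"] and hop["to"] == writer["name"]

# A minimal job dict mirroring the script above.
job = {
    "steps": [
        {"stepType": "mongodb", "category": "reader", "name": "Reader",
         "parameter": {"column": [{"name": "store.bicycle.color",
                                   "type": "document.String"}]}},
        {"stepType": "odps", "category": "writer", "name": "Writer",
         "parameter": {"column": ["mqdata"]}},
    ],
    "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
}

print(check_sync_job(job))  # prints: True
```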

Step 3: Verify the migration result

  1. Right-click your workflow and choose New > MaxCompute > ODPS SQL.

  2. In the Create Node dialog box, enter a node name and click Submit.

  3. On the node configuration tab, enter the following query:

    SELECT * FROM mqdata;
  4. Click the Run icon to run the query.

  5. Check the run log to confirm the query returns the migrated data. The result should contain the value red, extracted from store.bicycle.color in the source document.

JSON field type reference

The type parameter in the Reader column configuration controls how DataWorks interprets the extracted JSON value.

| Field location | Example field | type value | Notes |
| --- | --- | --- | --- |
| Nested (non-top-level) | store.bicycle.color | document.String | Use the final type of the extracted value |
| Top-level | expensive | string | Top-level fields use simple type names |
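As a rule of thumb, the prefix depends only on whether the path contains a dot. A small Python sketch of that rule (column_type_for is a hypothetical helper; the lowercase top-level form follows the examples above):

```python
def column_type_for(path, base_type="String"):
    """Pick the Reader column type for a JSON field path.

    Nested paths (containing a dot) take the document.* prefix;
    top-level fields use the plain lowercase type name.
    """
    if "." in path:
        return "document." + base_type
    return base_type.lower()

print(column_type_for("store.bicycle.color"))  # prints: document.String
print(column_type_for("expensive"))            # prints: string
```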