
MaxCompute:Migrate JSON-formatted data from MongoDB to MaxCompute

Last Updated:Jul 26, 2025

This topic describes how to use the Data Integration feature of DataWorks to migrate JSON fields from MongoDB to MaxCompute.

Prerequisites

Prepare test data in MongoDB

  1. Prepare an account.

    Create a user in the database. This user is used to add a data source in DataWorks. In this example, run the following command:

    db.createUser({user:"bookuser",pwd:"123456",roles:["root"]})

    A user named bookuser is created with the password 123456. The user is assigned the root role, which grants the database permissions required to access the data.

  2. Prepare data.

    Upload data to the MongoDB database. This example uses an ApsaraDB for MongoDB instance. The network type of the instance is VPC. You must request an Internet endpoint for the instance. Otherwise, the instance cannot communicate with the default resource group for DataWorks. The following code shows the test data.

    {
        "store": {
            "book": [
                {
                    "category": "reference",
                    "author": "Nigel Rees",
                    "title": "Sayings of the Century",
                    "price": 8.95
                },
                {
                    "category": "fiction",
                    "author": "Evelyn Waugh",
                    "title": "Sword of Honour",
                    "price": 12.99
                },
                {
                    "category": "fiction",
                    "author": "J. R. R. Tolkien",
                    "title": "The Lord of the Rings",
                    "isbn": "0-395-19395-8",
                    "price": 22.99
                }
            ],
            "bicycle": {
                "color": "red",
                "price": 19.95
            }
        },
        "expensive": 10
    }
  3. In the DMS console for the MongoDB instance, run the following command to view the uploaded data. The database used in this example is admin, and the collection is userlog.

    db.userlog.find().limit(10)
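Before uploading, you can sanity-check the test document locally. The following is a minimal Python sketch (not part of the migration itself) that parses the JSON and previews the nested fields that the synchronization node extracts later:

```python
import json

# The test document from the previous step.
raw = '''
{
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh",
             "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien",
             "title": "The Lord of the Rings", "isbn": "0-395-19395-8",
             "price": 22.99}
        ],
        "bicycle": {"color": "red", "price": 19.95}
    },
    "expensive": 10
}
'''

doc = json.loads(raw)  # Raises json.JSONDecodeError if the document is malformed.

# Preview the nested field that the sync node reads later.
print(doc["store"]["bicycle"]["color"])  # a sub-property at the third layer
print(doc["expensive"])                  # a first-layer field
```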

Migrate JSON data from MongoDB to MaxCompute using DataWorks

  1. Log on to the DataWorks console.

  2. Create a destination table in DataWorks. This table is used to receive the data migrated from MongoDB.

    1. Right-click the business flow that you created and choose New Table > MaxCompute > Table.

    2. On the Create Table page, select an engine type and enter a table name.

    3. On the table editing page, click DDL Statement.

    4. In the DDL Mode dialog box, enter the table creation statement and click Generate Table Schema.

      Important

      The table name in the DDL statement must be the same as the Table Name that you entered on the Create Table page.

      create table mqdata (mqdata string);
    5. Click Commit To Production Environment.

  3. Add a MongoDB data source. For more information, see Configure a MongoDB data source.
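When you add the data source, the instance is reached over its endpoint. A MongoDB connection string has the standard form shown below; the host is a placeholder for your instance's Internet endpoint, 3717 is shown only as an example port, and admin is the authentication database used in this example:

```
mongodb://bookuser:123456@<internet-endpoint>:3717/admin
```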

  4. Create a batch synchronization node.

    1. Go to the data analytics page. Right-click the specified workflow and choose Create Node > Data Integration > Offline synchronization.

    2. In the Create Node dialog box, enter a node name and click Submit.

    3. In the toolbar, click the Conversion Script icon to switch to the script mode.

    4. In the script mode, click the Import Template icon.

    5. In the Import Template dialog box, select the source type, the source data source, the destination type, and the destination data source, and then click Confirm.

    6. Enter the following script.

      {
          "type": "job",
          "steps": [
              {
                  "stepType": "mongodb",
                  "parameter": {
                      "datasource": "mongodb_userlog", // The name of the data source.
                      "column": [
                          {
                              "name": "store.bicycle.color", // The path of the JSON field. In this example, the value of the color field is fetched.
                              "type": "document.String" // For a sub-property that is not at the first layer, the type is the final type that you obtain. If the JSON field that you select is a first-layer field, such as the expensive field in this example, you can directly enter string.
                          }
                      ],
                      "collectionName": "userlog" // The name of the collection.
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "odps",
                  "parameter": {
                      "partition": "",
                      "isCompress": false,
                      "truncate": true,
                      "datasource": "odps_first",
                      "column": [
                          "mqdata" // The name of the column in the MaxCompute table.
                      ],
                      "emptyAsNull": false,
                      "table": "mqdata"
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "version": "2.0",
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          },
          "setting": {
              "errorLimit": {
                  "record": ""
              },
              "speed": {
                  "concurrent": 2,
                  "throttle": false
              }
          }
      }
    7. Click the Run icon to run the code.

    8. You can view the results in the operational log.
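The reader's name field is a dotted path into the JSON document. How such a path resolves can be sketched in Python (an illustration only; Data Integration performs this step internally):

```python
from functools import reduce

def resolve(doc: dict, path: str):
    """Walk a dotted path such as 'store.bicycle.color' through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), doc)

# A trimmed copy of the test document.
doc = {
    "store": {
        "book": [{"category": "reference", "price": 8.95}],
        "bicycle": {"color": "red", "price": 19.95},
    },
    "expensive": 10,
}

print(resolve(doc, "store.bicycle.color"))  # sub-property at the third layer -> red
print(resolve(doc, "expensive"))            # first-layer field -> 10
```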

Verify the result

  1. Right-click the workflow and choose Create Node > MaxCompute > ODPS SQL.

  2. In the Create Node dialog box, enter a node name and click Submit.

  3. On the ODPS SQL node editing page, enter the following statement.

    SELECT * FROM mqdata;
  4. Click the Run icon to run the code.

  5. You can view the results in the operational log.