Use DataWorks to batch-migrate offline data from ApsaraDB for MongoDB to LindormTable. DataWorks is a platform as a service (PaaS) provided by Alibaba Cloud that supports multiple computing engines and storage engines. For more information about DataWorks, see What is DataWorks?
Prerequisites
Before you begin, make sure you have:
An ApsaraDB for MongoDB instance with the source data
A LindormTable with the target schema created
Access to the Data Integration service of DataWorks to configure a DataX task (see Use DataWorks to configure synchronization tasks in DataX)
Nested field mapping
MongoDB documents can contain nested JSON objects. LindormTable stores data in a flat, row-oriented structure, so you must unnest all nested fields before or during migration.
Use dot notation to reference nested fields in the MongoDB Reader configuration. For example, a document field map.a maps to column a in LindormTable, with type document.string in the reader configuration. The following table shows how nested fields translate:
| MongoDB field | Dot notation in Reader | LindormTable column | Reader type |
|---|---|---|---|
| map.a | map.a | a | document.string |
| map.b | map.b | b | document.string |
Data type conversion is not required for other field types.
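The dot-notation lookup described above can be pictured with a minimal Python sketch. It is only an illustration of how a path such as map.a resolves to a value inside a nested document; it is not how MongoDB Reader is implemented. The helper names get_by_path and flatten and the FIELD_TO_COLUMN mapping are hypothetical.

```python
from typing import Any, Dict

def get_by_path(document: Dict[str, Any], dotted_path: str) -> Any:
    """Walk a nested dict along a dot-separated path; return None if a key is missing."""
    current: Any = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Hypothetical mapping for illustration: dot-notation path -> LindormTable column.
FIELD_TO_COLUMN = {"map.a": "a", "map.b": "b"}

def flatten(document: Dict[str, Any]) -> Dict[str, Any]:
    """Build a flat row that holds one column per mapped nested field."""
    return {column: get_by_path(document, path) for path, column in FIELD_TO_COLUMN.items()}

doc = {"title": "ApsaraDB for MongoDB tutorial", "map": {"a": "mapa", "b": "mapb"}}
row = flatten(doc)
print(row)  # {'a': 'mapa', 'b': 'mapb'}
```

A path that does not exist in the document, such as map.c, yields None in this sketch rather than raising an error.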
If you need to transform data during migration (for example, apply MD5 hashing to the primary key), use the following three-step approach instead:
Migrate data from ApsaraDB for MongoDB to MaxCompute.
Run SQL statements in MaxCompute to process the data.
Migrate the processed data from MaxCompute to LindormTable.
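To make the MD5 example concrete, the following Python sketch shows the effect such a transformation has on a primary key value. In the three-step approach the hashing itself would be done with SQL in MaxCompute; the function name md5_key is hypothetical and only illustrates the result.

```python
import hashlib

def md5_key(value: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of a UTF-8 encoded key."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

original_key = "ApsaraDB for MongoDB tutorial"
hashed_key = md5_key(original_key)

# The digest always has the same length and is stable for the same input.
print(len(hashed_key))                       # 32
print(hashed_key == md5_key(original_key))   # True
```

Hashing the key in this way is typically used to spread rows evenly across the key space, at the cost of losing range scans over the original key.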
Prepare the source and target data
Source document in ApsaraDB for MongoDB:
{
"id" : ObjectId("624573dd7c0e2eea4cc8****"),
"title" : "ApsaraDB for MongoDB tutorial",
"description" : "ApsaraDB for MongoDB is a NoSQL database",
"by" : "beginner tutorial",
"url" : "http://www.runoob.com",
"map" : {
"a" : "mapa",
"b" : "mapb"
},
"likes" : 100
}
Target schema in LindormTable:
CREATE TABLE t1(title varchar, desc varchar, by1 varchar, url varchar, a varchar, b varchar, likes int, primary key(title));
The nested fields map.a and map.b in MongoDB are flattened into columns a and b in LindormTable. The id field is omitted because title serves as the primary key.
Migrate data
Step 1: Add a MongoDB data source
In the DataWorks console, configure the source ApsaraDB for MongoDB instance as a data source. For detailed steps, see Add a MongoDB data source.
Step 2: Create a workflow
For more information about configuring a batch synchronization task using the code editor, see Configure a batch synchronization task by using the code editor.
Log on to the DataWorks console.
In the left-side navigation pane, click Workspace.
In the top navigation bar, select the region where your workspace resides. On the Workspaces page, find your workspace and choose Shortcuts > Data Development in the Actions column.
On the DataStudio page, hover over the Create icon and select Create Workflow.
In the Create Workflow dialog box, enter a Workflow Name and Description.
The name must be 1 to 128 characters and can contain letters, digits, underscores (_), and periods (.).
Click Create.
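The naming rule for workflows (the same rule is stated for node names) can be checked before you open the console. The sketch below assumes that "letters" means ASCII letters; the console may accept a wider character set. The function name is_valid_name is hypothetical.

```python
import re

# Assumption: "letters" is read as ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")

def is_valid_name(name: str) -> bool:
    """Return True if the name is 1 to 128 characters of letters, digits, '_' or '.'."""
    return NAME_PATTERN.fullmatch(name) is not None

print(is_valid_name("mongo_to_lindorm.v1"))  # True
print(is_valid_name("mongo to lindorm"))     # False, spaces are not allowed
```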
Step 3: Create a batch synchronization node
Click the new workflow, then right-click Data Integration.
Choose Create Node > Offline synchronization.
In the Create Node dialog box, enter the Name of the node.
The node name must be 1 to 128 characters and can contain letters, digits, underscores (_), and periods (.).
Click Submit.
Step 4: Configure the reader and writer
On the node configuration tab, click the Conversion script icon in the top toolbar.
In the Tips dialog box, click OK to open the code editor.
Replace the generated code with the following configuration. The job uses MongoDB Reader as the source and Lindorm Writer as the destination.
For MongoDB Reader parameters, see MongoDB Reader.
For Lindorm Writer parameters, see Lindorm Writer.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mongodb",
            "parameter": {
                "datasource": "test_mongo", // The name of the ApsaraDB for MongoDB data source.
                "column": [
                    { "name": "title", "type": "string" },
                    { "name": "description", "type": "string" },
                    { "name": "by", "type": "string" },
                    { "name": "url", "type": "string" },
                    { "name": "map.a", "type": "document.string" },
                    { "name": "map.b", "type": "document.string" },
                    { "name": "likes", "type": "int" }
                ],
                "collectionName": "testdatax"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "lindorm",
            "parameter": {
                "configuration": {
                    "lindorm.client.seedserver": "ld-xxxx-proxy-lindorm.lindorm.rds.aliyuncs.com:30020",
                    "lindorm.client.username": "root",
                    "lindorm.client.namespace": "test",
                    "lindorm.client.password": "root"
                },
                "nullMode": "skip",
                "datasource": "",
                "writeMode": "api",
                "envType": 1,
                "columns": [ "title", "desc", "by1", "url", "a", "b", "likes" ],
                "dynamicColumn": "false",
                "table": "t1",
                "encoding": "utf8"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "executeMode": null,
        "errorLimit": { "record": "" },
        "speed": { "concurrent": 2, "throttle": false }
    },
    "order": {
        "hops": [ { "from": "Reader", "to": "Writer" } ]
    }
}
Save the node configuration, then click the Run icon to run the job. Monitor progress on the Runtime Log tab.
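The reader column list and the writer columns list in the job configuration appear to pair up by position (for example, description in the reader lines up with desc in the writer). One way to keep the two lists in step is to generate both from a single mapping. The following Python sketch assumes positional pairing and takes the target column names from the CREATE TABLE statement for t1; FIELD_MAP and build_columns are hypothetical names.

```python
import json

# One tuple per migrated field: (MongoDB field path, reader type, LindormTable column).
FIELD_MAP = [
    ("title", "string", "title"),
    ("description", "string", "desc"),
    ("by", "string", "by1"),
    ("url", "string", "url"),
    ("map.a", "document.string", "a"),
    ("map.b", "document.string", "b"),
    ("likes", "int", "likes"),
]

def build_columns(field_map):
    """Return (reader column list, writer column list) in matching order."""
    reader_columns = [{"name": path, "type": rtype} for path, rtype, _ in field_map]
    writer_columns = [column for _, _, column in field_map]
    return reader_columns, writer_columns

reader_columns, writer_columns = build_columns(FIELD_MAP)
print(json.dumps(reader_columns, indent=2))  # value for the reader "column" parameter
print(json.dumps(writer_columns))            # value for the writer "columns" parameter
```

Because both lists come from the same tuples, adding or reordering a field cannot leave the reader and writer misaligned.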
Verify the migration
After the job completes, confirm that the data was migrated correctly:
On the Runtime Log tab, verify that the job status shows no errors and that the record count matches the number of documents in the source collection.
Query LindormTable to spot-check the migrated data. Confirm that the flattened nested fields (a, b) are populated correctly and that numeric fields such as likes contain accurate integer values.
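The spot-check can be sketched as a comparison between the rows you expect and the rows you read back. The example below works on plain Python dictionaries; how the actual rows are fetched from LindormTable is left out, and the sample actual_rows data is hypothetical. The function name find_mismatches is also hypothetical.

```python
def find_mismatches(expected_rows, actual_rows, key="title"):
    """Return (key value, column, expected, actual) tuples for every difference found."""
    actual_by_key = {row[key]: row for row in actual_rows}
    mismatches = []
    for expected in expected_rows:
        actual = actual_by_key.get(expected[key])
        if actual is None:
            mismatches.append((expected[key], None, "row present", "row missing"))
            continue
        for column, value in expected.items():
            if actual.get(column) != value:
                mismatches.append((expected[key], column, value, actual.get(column)))
    return mismatches

# Expected row derived from the sample source document.
expected_rows = [{
    "title": "ApsaraDB for MongoDB tutorial",
    "a": "mapa",
    "b": "mapb",
    "likes": 100,
}]

# Hypothetical rows, standing in for the result of a query against t1.
actual_rows = [{
    "title": "ApsaraDB for MongoDB tutorial",
    "a": "mapa",
    "b": "mapb",
    "likes": 100,
}]

print(find_mismatches(expected_rows, actual_rows))  # []
```

Because Python treats the integer 100 and the string "100" as different values, a likes column that came back as text would be reported as a mismatch.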