This topic describes the data types and parameters supported by MongoDB Writer and how to configure it by using the code editor.

MongoDB Writer connects to a remote MongoDB database by using MongoClient, the MongoDB Java client, and writes data to the database. Recent versions of MongoDB have refined the lock granularity from database-level locks to document-level locks. Combined with MongoDB indexes, this allows MongoDB Writer to write data to MongoDB databases efficiently. To update existing data instead of only inserting new data, specify the primary key.

Note
  • You must configure a connection before configuring MongoDB Writer. For more information, see Configure a MongoDB connection.
  • If you use ApsaraDB for MongoDB, the MongoDB database has a root account by default.
  • For security reasons, Data Integration supports access to a MongoDB database only through a dedicated MongoDB database account. When you add a MongoDB connection, do not use the root account.

MongoDB Writer obtains data from a Data Integration reader and converts the data to data types that MongoDB supports. Data Integration itself does not support arrays, whereas MongoDB supports arrays, and indexes on array fields are useful for queries.

To write MongoDB arrays, specify the splitter field of the column parameter. MongoDB Writer then splits each source string based on the specified delimiter and writes the resulting array to the MongoDB database.
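As a rough illustration of the string-to-array conversion described above, the following Python sketch splits a source string on a delimiter. The function name split_to_array is hypothetical and is not part of MongoDB Writer. How the writer itself treats empty strings or repeated delimiters is not stated here, so the sketch may differ from the actual behavior in those cases.

```python
from typing import List


def split_to_array(value: str, splitter: str) -> List[str]:
    """Split a source string into a list of strings on the given delimiter.

    Hypothetical helper that mimics what the splitter field is described
    as doing; it is not the writer's actual implementation.
    """
    if splitter == "":
        # str.split raises ValueError on an empty separator; fail with a clearer message.
        raise ValueError("splitter must not be empty")
    return value.split(splitter)


# A space-delimited string, as in a column configured with "splitter": " ".
hobbies = split_to_array("reading chess hiking", " ")
print(hobbies)
```

The resulting list corresponds to the array that would be stored in the MongoDB document field.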

Data types

MongoDB Writer supports most MongoDB data types. Make sure that your data types are supported.

The following table lists the data types supported by MongoDB Writer.
Category | MongoDB data type
Integer | Int and Long
Floating point | Double
String | String and Array
Date and time | Date
Boolean | Boolean
Binary | Bytes
Note: When data of the Date type is written to a MongoDB database, the type of the data is converted to Datetime.

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the added connection. You can add connections in the code editor. Yes None
collectionName The name of the MongoDB collection. Yes None
column The columns in MongoDB.
  • name: the name of the column.
  • type: the data type of the column.
  • splitter: the delimiter. Specify this field only when you want to convert the string to an array. The string is split based on the specified delimiter, and the split strings are saved in a MongoDB array.
Yes None
writeMode Specifies whether to overwrite data.
  • isReplace: If you set this field to true, MongoDB Writer overwrites the data in the destination collection that has the same primary key. If you set this field to false, the data is not overwritten.
  • replaceKey: the primary key for each record. Data is overwritten based on this primary key. The primary key must be unique.
No None
preSql The action to perform before the sync node is run. For example, you can clear outdated data before data synchronization. If the preSql parameter is left empty, no action is performed before data synchronization. Make sure that the value of the preSql parameter complies with the JSON syntax.

Data Integration performs the action specified by the preSql parameter before it runs the sync node, and starts to synchronize data only after the action is completed. The action does not affect the data that is synchronized. Because the action can clear outdated data first, the preSql parameter helps keep the synchronization idempotent: if the sync node fails, you only need to rerun it.

The format requirements for the preSql parameter are as follows:
  • type: the action type. Valid values: drop and remove. Example: "preSql":{"type":"remove"}.
    • drop: deletes the collection specified by the collectionName parameter, together with the data in the collection.
    • remove: deletes data based on the conditions specified in the json or item field.
  • json: the conditions for deleting data when the type field is set to remove. Example: "preSql":{"type":"remove", "json":"{'operationTime':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}"}. In the preceding JSON string, ${last_day} is a scheduling parameter of DataWorks. The format is $[yyyy-mm-dd]. You can use the comparison operators (such as $gt, $lt, $gte, and $lte), logical operators (such as $and and $or), and functions (such as max, min, sum, avg, and ISODate) that MongoDB supports. For more information, see the MongoDB query syntax.

    Data Integration uses the following standard MongoDB API to query and delete the specified data:

    BasicDBObject query = (BasicDBObject) com.mongodb.util.JSON.parse(json);
    col.deleteMany(query);
    Note: To delete data based on conditions, we recommend that you use the json field rather than the item field.
  • item: the name, condition, and value for filtering data when the type field is set to remove. Example: "preSql":{"type":"remove","item":[{"name":"pv","value":"100","condition":"$gt"},{"name":"pid","value":"10"}]}.

    Data Integration builds the query conditions from the value of the item field and deletes the matching data through the standard MongoDB API by calling col.deleteMany(query).


  • If the value of the preSql parameter cannot be recognized, no action is performed.
No None
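To make the item field of the preSql parameter more concrete, the following Python sketch shows one way such a list could be turned into a MongoDB-style query filter. It is an illustration only, not the code that Data Integration runs. The function name item_to_filter is hypothetical, and the sketch assumes that an entry without a condition means an equality match and that multiple entries are combined with $and. Values are passed through unchanged, because the type conversion applied by Data Integration is not described here.

```python
def item_to_filter(items):
    """Build a MongoDB-style filter dict from a preSql "item" list.

    Assumptions (not confirmed by the documentation):
    - an entry without "condition" is treated as an equality match;
    - multiple entries are combined with a logical AND ($and).
    """
    clauses = []
    for entry in items:
        name = entry["name"]
        value = entry["value"]  # kept as-is; no type conversion is attempted
        condition = entry.get("condition")
        if condition is None:
            clauses.append({name: value})
        else:
            clauses.append({name: {condition: value}})

    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


# The item list from the example above.
items = [
    {"name": "pv", "value": "100", "condition": "$gt"},
    {"name": "pid", "value": "10"},
]
query = item_to_filter(items)
print(query)
```

A filter of this shape is what a call such as col.deleteMany(query) would receive. The json field expresses the same conditions directly and avoids the guesswork about defaults.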

Configure MongoDB Writer by using the codeless UI

Currently, the codeless user interface (UI) is not supported for MongoDB Writer.

Configure MongoDB Writer by using the code editor

In the following code, a node is configured to write data to a MongoDB database. For more information about the parameters, see the preceding parameter description.
{
    "type": "job",
    "version": "2.0",// The version number.
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "mongodb",// The writer type.
            "parameter": {
                "datasource": "",// The connection name.
                "column": [
                    {
                        "name": "_id",// The name of the column to which data is written.
                        "type": "ObjectId"// The data type of the column to which data is written. If the replaceKey field is set to _id, set the type parameter to ObjectId. If you set the type parameter to String, the data cannot be overwritten.
                    },
                    {
                        "name": "age",
                        "type": "int"
                    },
                    {
                        "name": "id",
                        "type": "long"
                    },
                    {
                        "name": "wealth",
                        "type": "double"
                    },
                    {
                        "name": "hobby",
                        "type": "array",
                        "splitter": " "
                    },
                    {
                        "name": "valid",
                        "type": "boolean"
                    },
                    {
                        "name": "date_of_join",
                        "format": "yyyy-MM-dd HH:mm:ss",
                        "type": "date"
                    }
                ],
                "writeMode": {// The write mode.
                    "isReplace": "true",
                    "replaceKey": "_id"
                },
                "collectionName": "datax_test"// The name of the MongoDB collection.
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {// The maximum number of dirty data records allowed.
            "record": "0"
        },
        "speed": {
            "jvmOption": "-Xms1024m -Xmx1024m",
            "throttle": true,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent": 1,// The maximum number of concurrent threads.
            "mbps": "1"// The maximum transmission rate.
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
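
Before submitting a job like the preceding one, it can help to check that the parameters marked as required in the parameter table (datasource, collectionName, and column) are filled in. The following Python sketch is one possible check. The function name missing_writer_params is hypothetical, and the job is given as an already parsed dictionary, because the // comments in the preceding template are not valid JSON and must be removed before the text is passed to a strict parser such as json.loads.

```python
REQUIRED_PARAMS = ("datasource", "collectionName", "column")


def missing_writer_params(job):
    """Return the required MongoDB Writer parameters that are absent or empty."""
    writers = [s for s in job.get("steps", []) if s.get("category") == "writer"]
    if not writers:
        raise ValueError("job has no step with category 'writer'")
    params = writers[0].get("parameter", {})
    # Empty strings and empty lists count as missing.
    return [name for name in REQUIRED_PARAMS if not params.get(name)]


# A trimmed-down version of the job above, with the comments removed.
job = {
    "steps": [
        {"stepType": "stream", "parameter": {}, "name": "Reader", "category": "reader"},
        {
            "stepType": "mongodb",
            "parameter": {
                "datasource": "",  # left empty, as in the template
                "column": [{"name": "age", "type": "int"}],
                "collectionName": "datax_test",
            },
            "name": "Writer",
            "category": "writer",
        },
    ]
}
print(missing_writer_params(job))  # prints ['datasource']
```

The check reports datasource because the template leaves the connection name empty. Replace it with the name of an existing connection before running the node.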