This topic describes the data types and parameters that MongoDB Writer supports and how to configure it by using the code editor.

Background information

MongoDB Writer connects to a remote MongoDB database by using the Java client MongoClient and writes data to the database. Recent versions of MongoDB have refined lock granularity from database-level locks to document-level locks. Combined with the indexing capabilities of MongoDB, this allows MongoDB Writer to write data to MongoDB databases efficiently. To update existing data, specify the primary key.
Note
  • You must configure a MongoDB connection before you configure MongoDB Writer. For more information, see Configure a MongoDB connection.
  • If you use ApsaraDB for MongoDB, the MongoDB database has a root account by default.
  • For security reasons, Data Integration can connect to a MongoDB database only by using an account of that database. When you add a MongoDB connection, do not use the root account.

MongoDB Writer obtains data from a Data Integration reader and converts the data types to those supported by MongoDB. Data Integration does not have an array type, whereas MongoDB supports arrays and can index them.

To use MongoDB arrays, you can convert strings to MongoDB arrays by configuring a parameter, and write the arrays to a MongoDB database.
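The string-to-array conversion described above amounts to splitting a string on a delimiter. The following Python sketch only illustrates that behavior and is not the implementation used by MongoDB Writer. The function name split_to_array is hypothetical, the delimiter corresponds to the splitter field of a column, and the handling of an empty string is an assumption.

```python
from typing import List


def split_to_array(value: str, splitter: str) -> List[str]:
    """Split a source string into the list that would be stored as a MongoDB array."""
    # Assumption: an empty source string produces an empty array.
    if value == "":
        return []
    return value.split(splitter)


# A space-delimited string becomes a three-element array.
hobbies = split_to_array("reading hiking chess", " ")
print(hobbies)
```

With a splitter of " ", the source value "reading hiking chess" would be stored as an array of three strings instead of one string.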

Data types

MongoDB Writer supports most MongoDB data types. Make sure that your data types are supported.

The following table describes the data types that MongoDB Writer supports.
Category | MongoDB data type
--- | ---
Integer | INT and LONG
Floating point | DOUBLE
String | STRING and ARRAY
Date and time | DATE
Boolean | BOOL
Binary | BYTES
Note: When MongoDB Writer writes data of the DATE type to a MongoDB database, the data is stored as the DATETIME type.
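One way to picture the type handling is as a set of per-type converters applied to each incoming value. The following Python sketch works under stated assumptions: the lowercase type names and the date pattern mirror the configuration example in this topic, the names CONVERTERS and convert are hypothetical, and the actual conversion is performed inside MongoDB Writer. The array and binary types are left out.

```python
from datetime import datetime

# Hypothetical converters keyed by the column "type" value.
CONVERTERS = {
    "int": int,
    "long": int,
    "double": float,
    "string": str,
    # Assumption: only the text "true" (case-insensitive) maps to True.
    "boolean": lambda v: str(v).lower() == "true",
    # Assumption: source dates use the yyyy-MM-dd HH:mm:ss pattern.
    "date": lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S"),
}


def convert(value, column_type):
    """Convert a source value to the Python equivalent of the target column type."""
    try:
        converter = CONVERTERS[column_type.lower()]
    except KeyError:
        raise ValueError("unsupported column type: " + column_type)
    return converter(value)


print(convert("42", "int"))
print(convert("2024-01-02 03:04:05", "date"))
```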

Parameters

Parameter | Description | Required | Default value
--- | --- | --- | ---
datasource | The connection name. It must be the same as the name of the added connection. You can add connections in the code editor. | Yes | N/A
collectionName | The name of the MongoDB collection. | Yes | N/A
column | The columns in MongoDB. Each column supports the following fields: name (the name of the column); type (the data type of the column); splitter (the delimiter; specify this field only when you want to convert a string to an array, in which case the string is split by the delimiter and the resulting substrings are saved in a MongoDB array). | Yes | N/A
writeMode | Specifies whether to overwrite data. This parameter contains the following fields: isReplace (if set to true, MongoDB Writer overwrites the data in the destination collection that has the same primary key; if set to false, the data is not overwritten); replaceKey (the primary key for each record; data is overwritten based on this primary key, which must be unique). | No | N/A
preSql | The action to perform before the sync node is run. For example, you can clear outdated data before data synchronization. If the preSql parameter is left empty, no action is performed before data synchronization. Make sure that the value of the preSql parameter complies with the JSON syntax. | No | N/A

Before the sync node is run, Data Integration performs the action that is specified by the preSql parameter. After the action is completed, Data Integration starts to read or write data. The action does not affect the data that is read or written. You can ensure the idempotence of the read or write operation by specifying the preSql parameter. For example, you can specify the preSql parameter to clear outdated data before data synchronization based on your business requirements. If the sync node fails, you only need to rerun the sync node.
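The rerun behavior described above can be illustrated with a small in-memory simulation. This Python sketch is not how Data Integration is implemented. The list stands in for a MongoDB collection, and run_sync, is_current_batch, and the sample documents are hypothetical.

```python
def run_sync(collection, new_rows, pre_sql_filter=None):
    """Simulate one run of a sync node against an in-memory list of documents."""
    if pre_sql_filter is not None:
        # preSql step: remove the documents that match the filter before writing.
        collection[:] = [doc for doc in collection if not pre_sql_filter(doc)]
    collection.extend(new_rows)
    return collection


def is_current_batch(doc):
    """Filter that matches the data written by this sync node for one day."""
    return doc["day"] == "2024-01-02"


batch = [{"pid": 1, "day": "2024-01-02"}, {"pid": 2, "day": "2024-01-02"}]

# With a preSql-style cleanup, a rerun leaves the same three documents.
with_pre_sql = [{"pid": 0, "day": "2024-01-01"}]
run_sync(with_pre_sql, batch, is_current_batch)
run_sync(with_pre_sql, batch, is_current_batch)
print(len(with_pre_sql))  # 3

# Without it, the rerun appends the batch a second time.
without_pre_sql = [{"pid": 0, "day": "2024-01-01"}]
run_sync(without_pre_sql, batch)
run_sync(without_pre_sql, batch)
print(len(without_pre_sql))  # 5
```

Because the cleanup removes exactly what the previous attempt wrote, running the node twice gives the same result as running it once.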

Note the following format requirements for the preSql parameter:
  • Configure the type field to specify the action type. Valid values: drop and remove. Example: "preSql":{"type":"remove"}.
    • drop: deletes the collection specified by the collectionName parameter and the data in the collection.
    • remove: deletes data based on conditions.
    • json: the conditions for deleting data when type is set to remove. Example: "preSql":{"type":"remove", "json":"{'operationTime':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}"}. In the preceding JSON string, ${last_day} is a DataWorks scheduling parameter in the $[yyyy-mm-dd] format. Other operators and functions are also supported, such as the comparison operators $gt, $lt, $gte, and $lte, the logical operators $and and $or, and the functions max, min, sum, avg, and ISODate.
      Data Integration uses the following standard MongoDB API to query and delete the specified data:
      query=(BasicDBObject) com.mongodb.util.JSON.parse(json);
      col.deleteMany(query);
      Note: If you want to delete data based on conditions, we recommend that you specify the conditions by using the json field.
    • item: the name, condition, and value for filtering data. Example: "preSql":{"type":"remove","item":[{"name":"pv","value":"100","condition":"$gt"},{"name":"pid","value":"10"}]}.

      Data Integration builds the query conditions from the value of the item field and deletes the matching data by using the standard MongoDB API, that is, by calling col.deleteMany(query).

  • If the value of the preSql parameter cannot be recognized, no action is performed.
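The way an item list might map to a MongoDB filter can be sketched as follows. This is a plausible reading of the item format, not the confirmed logic of Data Integration. The function name item_to_query is hypothetical, and the sketch assumes that an item without a condition field is treated as an equality match and that values are passed through unchanged.

```python
def item_to_query(items):
    """Build a MongoDB-style filter dictionary from a preSql "item" list."""
    query = {}
    for item in items:
        condition = item.get("condition")
        if condition:
            # Example: {"name": "pv", "condition": "$gt", "value": "100"}
            # becomes {"pv": {"$gt": "100"}}.
            query[item["name"]] = {condition: item["value"]}
        else:
            # Assumption: no condition means an equality match.
            query[item["name"]] = item["value"]
    return query


items = [
    {"name": "pv", "value": "100", "condition": "$gt"},
    {"name": "pid", "value": "10"},
]
print(item_to_query(items))  # {'pv': {'$gt': '100'}, 'pid': '10'}
```

In this sketch, a later item with the same name replaces an earlier one, so each field can carry only one condition.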

Configure MongoDB Writer by using the codeless UI

The codeless user interface (UI) is not supported for MongoDB Writer.

Configure MongoDB Writer by using the code editor

You can configure MongoDB Writer by using the code editor. For more information, see Create a sync node by using the code editor.

The following example shows how to configure a sync node to write data to a MongoDB database. For more information about the parameters, see the preceding parameter description.
{
    "type": "job",
    "version": "2.0", // The version number.
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "mongodb",// The writer type.
            "parameter": {
                "datasource": "",// The connection name.
                "column": [
                    {
                        "name": "_id",// The name of the column to which data is written.
                        "type": "ObjectId"// The data type of the column to which data is written. If the replaceKey parameter is set to _id, set the type parameter to ObjectId. If you set the type parameter to String, the data cannot be overwritten.
                    },
                    {
                        "name": "age",
                        "type": "int"
                    },
                    {
                        "name": "id",
                        "type": "long"
                    },
                    {
                        "name": "wealth",
                        "type": "double"
                    },
                    {
                        "name": "hobby",
                        "type": "array",
                        "splitter": " "
                    },
                    {
                        "name": "valid",
                        "type": "boolean"
                    },
                    {
                        "name": "date_of_join",
                        "format": "yyyy-MM-dd HH:mm:ss",
                        "type": "date"
                    }
                ],
                "writeMode": {// The write mode.
                    "isReplace": "true",
                    "replaceKey": "_id"
                },
                "collectionName": "datax_test"// The name of the MongoDB collection.
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {// The maximum number of dirty data records allowed.
            "record": "0"
        },
        "speed": {
            "jvmOption": "-Xms1024m -Xmx1024m",
            "throttle": true,// Specifies whether to enable bandwidth throttling. true: the bandwidth is throttled and the maximum transmission rate takes effect. false: the bandwidth is not throttled.
            "concurrent": 1,// The maximum number of concurrent threads.
            "mbps": "1"// The maximum transmission rate.
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
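If you generate the configuration from a script, you can build the writer step as a dictionary and serialize it, which avoids hand-editing mistakes such as a missing comma. The following Python sketch assumes hypothetical names (writer_step and the connection name my_mongodb_connection) and covers only the writer step, not the whole job.

```python
import json


def writer_step(datasource, collection_name, columns, replace_key=None):
    """Build the writer step of a sync job as a plain dictionary."""
    parameter = {
        "datasource": datasource,
        "collectionName": collection_name,
        "column": columns,
    }
    if replace_key is not None:
        # Mirrors the example above, where isReplace is given as the string "true".
        parameter["writeMode"] = {"isReplace": "true", "replaceKey": replace_key}
    return {
        "stepType": "mongodb",
        "parameter": parameter,
        "name": "Writer",
        "category": "writer",
    }


step = writer_step(
    "my_mongodb_connection",  # hypothetical connection name
    "datax_test",
    [
        {"name": "_id", "type": "ObjectId"},
        {"name": "hobby", "type": "array", "splitter": " "},
    ],
    replace_key="_id",
)
print(json.dumps(step, indent=4))
```

json.dumps emits strict JSON without comments, so the annotations shown in the example above do not appear in the generated output.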