This topic describes the data types and parameters that MongoDB Reader supports and how to configure it by using the code editor.

MongoDB Reader connects to a remote MongoDB database by using the Java client named MongoClient and reads data from the database. Recent versions of MongoDB have refined the lock granularity from database-level locks to document-level locks. Combined with MongoDB indexes, this allows MongoDB Reader to read data from MongoDB databases efficiently.
Note
  • If you use ApsaraDB for MongoDB, the MongoDB database has a root account by default. For security reasons, Data Integration can access a MongoDB database only by using an account of that database. When you add a MongoDB connection, do not use the root account.
  • JavaScript syntax is not supported for queries.

MongoDB Reader shards data in the MongoDB database based on specified rules, reads data from the database with multiple threads, and then converts the data to a format readable by Data Integration.

Data types

MongoDB Reader supports most MongoDB data types. Make sure that your data types are supported.

The following table describes the data types that MongoDB Reader supports.
Category | MongoDB data type
LONG | INT, LONG, document.INT, and document.LONG
DOUBLE | DOUBLE and document.DOUBLE
STRING | STRING, ARRAY, document.STRING, document.ARRAY, and COMBINE
DATE | DATE and document.DATE
BOOLEAN | BOOL and document.BOOL
BYTES | BYTES and document.BYTES
Note The DOCUMENT data type is used to store embedded documents. It is also called the OBJECT data type.

The following content describes how to use the COMBINE data type:

When MongoDB Reader reads data from a MongoDB database, MongoDB Reader combines and converts multiple fields in MongoDB documents to a JSON string.

For example, doc1, doc2, and doc3 are three MongoDB documents that contain different fields. For brevity, each field is represented by its key only instead of a key-value pair. The keys a and b represent fields that are common to all three documents. The key x_n represents a field that is not fixed across documents.

doc1: a b x_1 x_2

doc2: a b x_2 x_3 x_4

doc3: a b x_5

To import the preceding three MongoDB documents to MaxCompute, specify in the configuration file the fields to retain, a name for each combined string, and COMBINE as the data type of each combined string. Make sure that the name of each combined string is different from the name of any existing field in the documents.
"column": [
    {
        "name": "a",
        "type": "string"
    },
    {
        "name": "b",
        "type": "string"
    },
    {
        "name": "doc",
        "type": "combine"
    }
]
The following table describes the output in MaxCompute.
odps_column1 | odps_column2 | odps_column3
a | b | {x_1,x_2}
a | b | {x_2,x_3,x_4}
a | b | {x_5}
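
The COMBINE behavior described above can be approximated in a few lines of Python. This is only an illustrative sketch, not the reader's actual implementation: the function name combine_row and the sample field values are hypothetical, and it assumes that every field not listed as a fixed column is folded into one JSON string.

```python
import json

def combine_row(doc, fixed_columns):
    """Return the fixed column values followed by one JSON string
    that holds every remaining field of the document (assumed behavior)."""
    fixed = [doc.get(name) for name in fixed_columns]
    rest = {key: value for key, value in doc.items() if key not in fixed_columns}
    return fixed + [json.dumps(rest, sort_keys=True)]

# Hypothetical sample documents that mirror doc1, doc2, and doc3.
docs = [
    {"a": "a1", "b": "b1", "x_1": 1, "x_2": 2},
    {"a": "a2", "b": "b2", "x_2": 2, "x_3": 3, "x_4": 4},
    {"a": "a3", "b": "b3", "x_5": 5},
]

rows = [combine_row(doc, ["a", "b"]) for doc in docs]
for row in rows:
    print(row)
```

Each resulting row has three values: the two fixed fields and one combined string, so documents with different sets of x_n fields still fit the same three-column table.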

Parameters

Parameter Description Required Default value
datasource The connection name. It must be the same as the name of the added connection. You can add connections in the code editor. Yes N/A
collectionName The name of the collection in MongoDB. Yes N/A
column The columns in MongoDB.
  • name: the name of the column.
  • type: the data type of the column. Valid values:
    • string: string
    • long: integer
    • double: floating point
    • date: date
    • bool: Boolean value
    • bytes: binary sequence
    • arrays: If the value of type is arrays, MongoDB Reader reads data from MongoDB documents as a JSON array, for example, ["a","b","c"].
    • array: If the value of type is array, MongoDB Reader reads data from MongoDB documents as a common array, in which elements are separated with commas (,), for example, a,b,c. We recommend that you use the value arrays.
    • combine: If the value of type is combine, MongoDB Reader combines and converts multiple fields in the MongoDB documents to a JSON string.
  • splitter: the delimiter. Specify this parameter only when you need to convert an array to a string. MongoDB supports arrays, but Data Integration does not. The array elements that are read by MongoDB Reader are joined into a single string by using this delimiter.
Yes N/A
query The filter condition that is used to read data from MongoDB. Only data of the time type is supported. For example, you can use the statement "query":"{'operationTime':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}" to obtain data where the time specified by operationTime is not earlier than 00:00 on the day specified by ${last_day}. In the preceding JSON string, ${last_day} is a scheduling parameter of DataWorks. The format is $[yyyy-mm-dd]. You can use comparison operators (such as $gt, $lt, $gte, and $lte), logical operators (such as $and and $or), and functions (such as max, min, sum, avg, and ISODate) that are supported by MongoDB as needed. No N/A
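
The difference between the arrays and array column types, and the role of splitter, can be sketched as follows. The function render_array is a hypothetical helper written for illustration only. The comma default is an assumption taken from the a,b,c example in the table, not a documented default of the reader.

```python
import json

def render_array(values, column_type, splitter=","):
    """Render a MongoDB array the way the two column types are described."""
    if column_type == "arrays":
        # JSON array form, for example ["a","b","c"].
        return json.dumps(values, separators=(",", ":"))
    if column_type == "array":
        # Elements joined into one string by the delimiter.
        return splitter.join(str(value) for value in values)
    raise ValueError("unsupported column type: " + column_type)

tags = ["a", "b", "c"]
print(render_array(tags, "arrays"))               # JSON array string
print(render_array(tags, "array"))                # comma-joined string
print(render_array(tags, "array", splitter=" "))  # space-joined string
```

The JSON form keeps element boundaries unambiguous even when an element contains the delimiter, which is one likely reason the arrays value is recommended.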

Configure MongoDB Reader by using the codeless UI

The codeless user interface (UI) is not supported for MongoDB Reader.

Configure MongoDB Reader by using the code editor

You can configure MongoDB Reader by using the code editor. For more information, see Create a sync node by using the code editor.

The following example shows how to configure a sync node to read data from a MongoDB database. For more information about the parameters, see the preceding parameter description.
Notice
  • Delete the comments from the following code before you run the code.
  • You cannot retrieve data elements from arrays.
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {
                "datasource": "datasourceName",// The connection name.
                "collectionName": "tag_data",// The name of the collection in MongoDB.
                "query": "",// The filter condition that is used to obtain data from MongoDB.
                "column": [
                    {
                        "name": "unique_id",// The name of the column.
                        "type": "string"// The data type of the column.
                    },
                    {
                        "name": "sid",
                        "type": "string"
                    },
                    {
                        "name": "user_id",
                        "type": "string"
                    },
                    {
                        "name": "auction_id",
                        "type": "string"
                    },
                    {
                        "name": "content_type",
                        "type": "string"
                    },
                    {
                        "name": "pool_type",
                        "type": "string"
                    },
                    {
                        "name": "frontcat_id",
                        "type": "array",
                        "splitter": ""
                    },
                    {
                        "name": "categoryid",
                        "type": "array",
                        "splitter": ""
                    },
                    {
                        "name": "gmt_create",
                        "type": "string"
                    },
                    {
                        "name": "taglist",
                        "type": "array",
                        "splitter": " "
                    },
                    {
                        "name": "property",
                        "type": "string"
                    },
                    {
                        "name": "scorea",
                        "type": "int"
                    },
                    {
                        "name": "scoreb",
                        "type": "int"
                    },
                    {
                        "name": "scorec",
                        "type": "int"
                    },
                    {
                        "name": "a.b",
                        "type": "document.int"
                    },
                    {
                        "name": "a.b.c",
                        "type": "document.array",
                        "splitter": " "
                    }
                ]
            },
            "stepType": "mongodb"
        },
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1// The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
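
Because the comments must be deleted before the code is run, it can help to strip them programmatically and confirm that the result is valid JSON. The following is a minimal sketch, assuming that comments always start with // and run to the end of the line. The helper strip_line_comments and the shortened sample configuration, including its note key, are hypothetical and are not part of DataWorks.

```python
import json

def strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip the comment but keep the newline that ends it.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Shortened, hypothetical configuration in the same commented style.
sample = '''{
    "version": "2.0",// The version number.
    "setting": {
        "speed": {
            "throttle": false,// Bandwidth is not throttled.
            "concurrent": 1// The maximum number of concurrent threads.
        }
    },
    "note": "http://example.com"
}'''

config = json.loads(strip_line_comments(sample))
print(config["setting"]["speed"])
```

Tracking whether the scanner is inside a string literal keeps values that contain //, such as URLs, intact. A plain regular expression that cuts at the first // would damage them.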