This topic describes the data types and parameters supported by MongoDB Reader and how to configure it by using the code editor.

MongoDB Reader connects to a remote MongoDB database by using the Java client MongoClient and reads data from the database. Recent versions of MongoDB have refined the locking granularity from database-level locks to document-level locks. Combined with MongoDB indexes, this allows MongoDB Reader to read data from MongoDB databases efficiently.

Note
  • If you use ApsaraDB for MongoDB, the MongoDB database has a root account by default. For security reasons, Data Integration supports access to a MongoDB database only by using a MongoDB database account other than the root account. When you add a MongoDB connection, do not use the root account.
  • JavaScript syntax is not supported for queries.

MongoDB Reader shards data in the MongoDB database according to specified rules, reads data from the database with multiple threads, and then converts the data to a format readable by Data Integration.
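
The source does not state which rules are used to shard the data. The following Python sketch only illustrates the general idea under an assumption: the collection is split into contiguous ranges of a sorted key (here `_id`), and each range is read by its own thread. The function names `split_ranges`, `read_shard`, and `read_all` are hypothetical, and an in-memory list stands in for the MongoDB collection.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(sorted_keys, shard_count):
    """Split sorted keys into at most shard_count contiguous (low, high) ranges.

    Both bounds of each range are inclusive.
    """
    if not sorted_keys:
        return []
    shard_count = max(1, min(shard_count, len(sorted_keys)))
    size, extra = divmod(len(sorted_keys), shard_count)
    ranges, start = [], 0
    for i in range(shard_count):
        # The first `extra` shards take one additional key each.
        end = start + size + (1 if i < extra else 0)
        ranges.append((sorted_keys[start], sorted_keys[end - 1]))
        start = end
    return ranges


def read_shard(collection, key_range):
    """Return the documents whose _id falls inside key_range (assumed rule)."""
    low, high = key_range
    return [doc for doc in collection if low <= doc["_id"] <= high]


def read_all(collection, shard_count):
    """Read every shard on its own thread and concatenate the results in order."""
    keys = sorted(doc["_id"] for doc in collection)
    ranges = split_ranges(keys, shard_count)
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        parts = list(pool.map(lambda r: read_shard(collection, r), ranges))
    return [doc for part in parts for doc in part]
```

In this sketch the shard count plays the role that the concurrent setting plays in a real job: it bounds how many ranges are read in parallel.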

Data types

MongoDB Reader supports most MongoDB data types. Make sure that your data types are supported.

The following table lists the data types supported by MongoDB Reader.

Category    MongoDB data type
Long        INT, LONG, DOCUMENT.INT, and DOCUMENT.LONG
Double      DOUBLE and DOCUMENT.DOUBLE
String      STRING, ARRAY, DOCUMENT.STRING, DOCUMENT.ARRAY, and COMBINE
Date        DATE and DOCUMENT.DATE
Boolean     BOOLEAN and DOCUMENT.BOOLEAN
Byte        BYTES and DOCUMENT.BYTES

Note
  • The DOCUMENT data type is used to store embedded documents. It is also called the OBJECT data type.
  • The following content describes how to use the COMBINE data type:

    When MongoDB Reader reads data from a MongoDB database, it combines and converts multiple fields in MongoDB documents to a JSON string.

    For example, doc1, doc2, and doc3 are three MongoDB documents with different fields. For brevity, each field is shown by its key only rather than as a key-value pair. The keys a and b are fields common to all three documents, and x_n denotes a field that varies from document to document.

    doc1: a b x_1 x_2

    doc2: a b x_2 x_3 x_4

    doc3: a b x_5

    To import the preceding three MongoDB documents to MaxCompute, you must specify the fields to retain, set a name for each combined string, and set the data type of each combined string to COMBINE in the configuration file. Make sure that the name of each combined string is unique among all existing fields in the documents.

    "column": [
        {
            "name": "a",
            "type": "string"
        },
        {
            "name": "b",
            "type": "string"
        },
        {
            "name": "doc",
            "type": "combine"
        }
    ]

    The following table lists the output in MaxCompute.

    odps_column1    odps_column2    odps_column3
    a               b               {x_1,x_2}
    a               b               {x_2,x_3,x_4}
    a               b               {x_5}
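
The COMBINE behavior described above can be sketched in Python as follows. This is a minimal illustration, not the reader's actual implementation: the function name `apply_combine` is hypothetical, documents are modeled as plain dictionaries, and the sketch assumes that every field not named by another column ends up in the combined JSON string.

```python
import json


def apply_combine(document, columns):
    """Build one output row; fields not named by any column go into the combine column."""
    named = {col["name"] for col in columns if col["type"] != "combine"}
    row = []
    for col in columns:
        if col["type"] == "combine":
            # Assumption: all remaining fields are serialized as one JSON string.
            rest = {k: v for k, v in document.items() if k not in named}
            row.append(json.dumps(rest, sort_keys=True))
        else:
            row.append(document.get(col["name"]))
    return row


columns = [
    {"name": "a", "type": "string"},
    {"name": "b", "type": "string"},
    {"name": "doc", "type": "combine"},
]
doc1 = {"a": "1", "b": "2", "x_1": 10, "x_2": 20}
row = apply_combine(doc1, columns)
```

Here `row` holds the values of a and b followed by a single JSON string that contains x_1 and x_2, which corresponds to the first row of the output table.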

Parameters

Parameter: datasource
Description: The connection name. It must be identical to the name of the added connection. You can add connections in the code editor.
Required: Yes
Default value: None

Parameter: collectionName
Description: The name of the MongoDB collection.
Required: Yes
Default value: None

Parameter: column
Description: The columns in MongoDB.
  • name: the name of the column.
  • type: the data type of the column.
  • splitter: the delimiter. Specify this parameter only when you need to convert an array to a string. MongoDB supports arrays, but Data Integration does not. The array elements read by MongoDB Reader are joined into a single string by using this delimiter.
Required: Yes
Default value: None

Parameter: query
Description: The filter condition for obtaining data from MongoDB. Only filtering on fields of the time type is supported. For example, you can use the statement "query":"{'operationTime':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}" to obtain data where the time specified by operationTime is not earlier than 00:00 on the day specified by ${last_day}. In the preceding JSON string, ${last_day} is a scheduling parameter of DataWorks. Its format is $[yyyy-mm-dd]. You can use conditional operators ($gt, $lt, $gte, $lte), logical operators (and, or), and functions (max, min, sum, avg, ISODate) supported by MongoDB as needed. For a complete configuration, see Configure MongoDB Reader by using the code editor.
Required: No
Default value: None
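
To make the role of ${last_day} in the query parameter concrete, the following Python sketch shows how the placeholder could be replaced with a date in yyyy-mm-dd format before the filter is sent to MongoDB. It only imitates the substitution: in DataWorks the scheduler supplies the value, and the function name `render_query` is hypothetical.

```python
from datetime import date

# The query string from the parameter description, with the placeholder intact.
QUERY_TEMPLATE = (
    "{'operationTime':{'$gte':ISODate('${last_day}T00:00:00.424+0800')}}"
)


def render_query(template, day):
    """Replace ${last_day} with the given date formatted as yyyy-mm-dd."""
    # Plain string replacement leaves MongoDB operators such as $gte untouched.
    return template.replace("${last_day}", day.isoformat())


rendered = render_query(QUERY_TEMPLATE, date(2024, 1, 15))
```

With the date used here, the rendered filter selects documents whose operationTime is on or after 2024-01-15 in the +0800 time zone.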

Configure MongoDB Reader by using the codeless UI

Currently, the codeless user interface (UI) is not supported for MongoDB Reader.

Configure MongoDB Reader by using the code editor

In the following code, a node is configured to read data from a MongoDB database. For more information about the parameters, see the preceding parameter description.

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {
                "datasource": "datasourceName", // The connection name.
                "collectionName": "tag_data", // The name of the MongoDB collection.
                "query": "", // The filter condition for obtaining data from MongoDB.
                "column": [
                    {
                        "name": "unique_id", // The field name.
                        "type": "string" // The data type.
                    },
                    {
                        "name": "sid",
                        "type": "string"
                    },
                    {
                        "name": "user_id",
                        "type": "string"
                    },
                    {
                        "name": "auction_id",
                        "type": "string"
                    },
                    {
                        "name": "content_type",
                        "type": "string"
                    },
                    {
                        "name": "pool_type",
                        "type": "string"
                    },
                    {
                        "name": "frontcat_id",
                        "type": "array",
                        "splitter": ""
                    },
                    {
                        "name": "categoryid",
                        "type": "array",
                        "splitter": ""
                    },
                    {
                        "name": "gmt_create",
                        "type": "string"
                    },
                    {
                        "name": "taglist",
                        "type": "array",
                        "splitter": " "
                    },
                    {
                        "name": "property",
                        "type": "string"
                    },
                    {
                        "name": "scorea",
                        "type": "int"
                    },
                    {
                        "name": "scoreb",
                        "type": "int"
                    },
                    {
                        "name": "scorec",
                        "type": "int"
                    },
                    {
                        "name": "a.b",
                        "type": "document.int"
                    },
                    {
                        "name": "a.b.c",
                        "type": "document.array",
                        "splitter": " "
                    }
                ]
            },
            "stepType": "mongodb"
        },
        {// The following template is used to configure Stream Writer. For more information, see the corresponding topic.
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1// The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
Note: Currently, MongoDB Reader cannot retrieve individual elements from arrays.
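
The following Python sketch illustrates how the array and document.array columns in the preceding configuration could be turned into strings. It rests on two assumptions that the source only implies: a dotted column name such as a.b.c addresses a field nested inside embedded documents, and array elements are joined by using the configured splitter. The function names `resolve_path` and `read_column` are hypothetical.

```python
def resolve_path(document, dotted_name):
    """Walk a dotted name such as "a.b.c" through nested dictionaries."""
    current = document
    for key in dotted_name.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # The field is missing in this document.
        current = current[key]
    return current


def read_column(document, column):
    """Return the column value; arrays are joined into one string."""
    value = resolve_path(document, column["name"])
    if value is None:
        return None
    if column["type"].endswith("array"):
        # The whole array becomes a single string; elements are not
        # exposed individually, in line with the note above.
        return column.get("splitter", "").join(str(item) for item in value)
    return value


doc = {"taglist": ["red", "blue"], "a": {"b": {"c": [1, 2, 3]}}}
tags = read_column(doc, {"name": "taglist", "type": "array", "splitter": " "})
nested = read_column(
    doc, {"name": "a.b.c", "type": "document.array", "splitter": " "}
)
```

In this sketch, an empty splitter such as the one configured for frontcat_id concatenates the elements with nothing between them.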