By creating a MongoDB data source, you can enable Dataphin to read business data from MongoDB or write data to MongoDB. This topic describes how to create a MongoDB data source.
Background information
MongoDB is an open-source document database that stores BSON documents (similar to JSON) with dynamic schemas, capable of handling large amounts of unstructured data. Its features include flexible data models, efficient indexing mechanisms, support for data replication and sharding, and easy-to-use APIs. MongoDB is suitable for application scenarios that require rapid iteration and storage of diverse data formats.
If you use MongoDB, you need to create a MongoDB data source before importing business data from MongoDB to Dataphin or exporting data from Dataphin to MongoDB. For more information about ApsaraDB for MongoDB, see What is ApsaraDB for MongoDB.
Permissions
Only custom global roles that have the Create Data Source permission and the Super Administrator, Data Source Administrator, Domain Architect, and Project Administrator system roles can create data sources.
Procedure
In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select MongoDB in the NoSQL section.
If you have recently used MongoDB, you can also select it in the Recently Used section, or enter a MongoDB keyword in the search box to find it quickly.
On the Create MongoDB Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
It cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can reference tables in the data source in Flink_SQL tasks by using the format data_source_code.table_name or data_source_code.schema.table_name. To automatically access the data source that corresponds to the current environment, use the variable format ${data_source_code}.table or ${data_source_code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured.
After the data source code is configured, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, referencing tables by data source code is currently supported only for MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources.
Version
Supports MongoDB 3.2 and MongoDB 3.4 or later.
Data Source Description
A brief description of the data source. It cannot exceed 128 characters.
Data Source Configuration
Select the data source to configure:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can use tags to categorize data sources. For information about how to create tags, see Manage data source tags.
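To check candidate values before entering them, the Datasource Name and Data Source Description rules in the table above can be expressed as a small Python sketch. The function and constant names are hypothetical and are not part of any Dataphin API. The character classes are an assumption: "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF) and "letters" by ASCII letters, so Dataphin's own validation may accept a slightly different set.

```python
import re

# Assumption: "Chinese characters" ~ CJK Unified Ideographs (U+4E00 to U+9FFF),
# "letters" ~ ASCII letters. Dataphin's real validator may differ.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 128


def is_valid_datasource_name(name: str) -> bool:
    """Allowed characters only, between 1 and 64 characters long."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """The description may be at most 128 characters long."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, is_valid_datasource_name("mongo_prod-01") returns True, while is_valid_datasource_name("mongo prod") returns False because spaces are not allowed.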
Configure the connection parameters between the data source and Dataphin.
If you selected Production + Development Data Source in the previous step, the configuration page appears as shown in the following figure. If you selected Production Data Source, only the production data source configuration page is displayed.
Note: Typically, production and development data sources should be configured as separate data sources to isolate the two environments and reduce the impact of development data sources on production data sources. However, Dataphin also allows you to configure them as the same data source with identical parameter values.
Parameter
Description
JDBC URL
Supports multi-replica mode, in which you can configure multiple addresses. The connection address format is mongodb://host1:port1;host2:port2...
Note: The JDBC URL can include the authSource parameter.
For example, for an ApsaraDB for MongoDB instance, you can view the connection address and port on the basic information page of the instance in the MongoDB console.
Login Method
Supports Username Login and Anonymous Login.
Username, Password
If the login method is Username Login, you need to enter the username and password for logging in to the MongoDB instance.
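The JDBC URL format in the table above separates hosts with semicolons, whereas the standard MongoDB connection string format separates hosts with commas. The following Python sketch parses a URL in the Dataphin format and rebuilds it in the standard form, which can help when checking the same addresses with a MongoDB client. The helper names are hypothetical, and the position of authSource (as a query string after an optional /database segment) is an assumption, because this topic only states that authSource is supported.

```python
from urllib.parse import parse_qs, quote


def parse_dataphin_mongodb_url(url: str) -> dict:
    """Parse mongodb://host1:port1;host2:port2[/db][?authSource=...].

    Assumption: the optional /db segment and ?authSource=... query follow
    the semicolon-separated host list, as in a standard MongoDB URI.
    """
    scheme = "mongodb://"
    if not url.startswith(scheme):
        raise ValueError("URL must start with mongodb://")
    rest, _, query = url[len(scheme):].partition("?")
    host_part, _, database = rest.partition("/")

    hosts = []
    for entry in host_part.split(";"):
        if not entry:
            continue  # tolerate a trailing semicolon
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        hosts.append((host, int(port)))
    if not hosts:
        raise ValueError("at least one host:port is required")

    params = parse_qs(query)
    auth_source = params["authSource"][-1] if "authSource" in params else None
    return {"hosts": hosts, "database": database or None, "auth_source": auth_source}


def to_standard_uri(parsed: dict) -> str:
    """Rebuild the address as a standard MongoDB URI with comma-separated hosts."""
    hosts = ",".join(f"{host}:{port}" for host, port in parsed["hosts"])
    uri = f"mongodb://{hosts}/{parsed['database'] or ''}"
    if parsed["auth_source"]:
        uri += "?authSource=" + quote(parsed["auth_source"], safe="")
    return uri
```

For example, parsing mongodb://10.0.0.1:3717;10.0.0.2:3717/?authSource=admin yields two hosts and an auth_source of "admin", and to_standard_uri turns it into mongodb://10.0.0.1:3717,10.0.0.2:3717/?authSource=admin.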
Select a Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.
Click Connection Test, or click OK to save the configuration and create the MongoDB data source.
Click Connection Test to test whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection for all selected clusters; the data source is still created even if all selected clusters fail to connect.
Connection Test checks connectivity from the Default Cluster and from Registered Scheduling Clusters, that is, scheduling clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. Connection testing is not supported for a Registered Scheduling Cluster that has no resource groups; create a resource group for it before you test the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
A connection test usually takes less than 2 minutes. If it times out, click the icon to view the reason, and then retry.
Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the time at which the final result was generated.
Note: Only test results for the Default Cluster can have three statuses: Succeeded With Warning, Connection Successful, and Connection Failed. Test results for Registered Scheduling Clusters in Dataphin have only two statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the icon to view the reason for the failure.
When the test result is Succeeded With Warning, the application cluster connected successfully but the scheduling cluster failed to connect. In this case, the current data source cannot be used for data development or integration. You can click the icon to view the log information.
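The connection test rules described above (which clusters are tested and which statuses each cluster type can report) can be summarized in a small Python model. This is a hypothetical sketch for illustration only: the class and function names are invented and do not correspond to any Dataphin API. It also assumes that an application-cluster failure on the Default Cluster is reported as Connection Failed, and that Connection Failed, like Succeeded With Warning, prevents data development and integration; this topic does not state either point explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    SUCCESSFUL = "Connection Successful"
    WARNING = "Succeeded With Warning"
    FAILED = "Connection Failed"


@dataclass
class Cluster:
    name: str
    is_default: bool = False
    resource_group_count: int = 0  # relevant for registered scheduling clusters


def clusters_to_test(default: Cluster, registered: List[Cluster]) -> List[Cluster]:
    """The Default Cluster is always tested; registered scheduling clusters
    without resource groups are skipped because they cannot be tested."""
    return [default] + [c for c in registered if c.resource_group_count > 0]


def default_cluster_status(application_ok: bool, scheduling_ok: bool) -> Status:
    """Three-state result for the Default Cluster.

    Assumption: an application-cluster failure is reported as Connection Failed
    regardless of the scheduling-cluster result.
    """
    if not application_ok:
        return Status.FAILED
    return Status.SUCCESSFUL if scheduling_ok else Status.WARNING


def registered_cluster_status(ok: bool) -> Status:
    """Registered scheduling clusters only report success or failure."""
    return Status.SUCCESSFUL if ok else Status.FAILED


def usable_for_development_and_integration(status: Status) -> bool:
    """Succeeded With Warning (and, by assumption, Connection Failed) blocks
    data development and integration."""
    return status is Status.SUCCESSFUL
```

For example, default_cluster_status(True, False) returns Status.WARNING, and usable_for_development_and_integration(Status.WARNING) returns False, matching the description of Succeeded With Warning above.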