DataWorks Data Integration allows you to use Doris Writer to write table data to Doris. This topic describes the capabilities of synchronizing data to Doris data sources.
Supported Doris versions
Doris Writer uses the MySQL driver 5.1.47. The following table describes the Doris kernel versions supported by the driver. For more information about the driver capabilities, see the official Doris documentation.
Doris kernel version | Supported or not |
0.x.x | Yes |
1.1.x | Yes |
1.2.x | Yes |
2.x | Yes |
Limits
You can use Data Integration to synchronize data to Doris only in offline mode.
Supported data types
Different Doris versions support different data types and aggregation models. For information about all data types supported in each Doris version, see the official Doris documentation. The following table describes the main data types that are supported.
Data type | Supported model | Doris version | Doris Writer for batch data write |
SMALLINT | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
INT | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
BIGINT | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
LARGEINT | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
FLOAT | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
DOUBLE | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
DECIMAL | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
DECIMALV3 | Aggregate,Unique,Duplicate | Versions later than 1.2.1, 2.x | Yes |
DATE | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
DATETIME | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
DATEV2 | Aggregate,Unique,Duplicate | 1.2.x, 2.x | Yes |
DATETIMEV2 | Aggregate,Unique,Duplicate | 1.2.x, 2.x | Yes |
CHAR | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
VARCHAR | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
STRING | Aggregate,Unique,Duplicate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
VARCHAR | Aggregate,Unique,Duplicate | 1.1.x, 1.2.x, 2.x | Yes |
ARRAY | Duplicate | 1.2.x, 2.x | Yes |
JSONB | Aggregate,Unique,Duplicate | 1.2.x, 2.x | Yes |
HLL | Aggregate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
BITMAP | Aggregate | 0.x.x, 1.1.x, 1.2.x, 2.x | Yes |
QUANTILE_STATE | Aggregate | 1.2.x, 2.x | Yes |
How it works
Doris Writer writes data by using the native StreamLoad method. Doris Writer caches the data that is read by a reader in memory, concatenates the data into text, and then writes the text to the Doris database in a single batch. For more information, see the official Doris documentation.
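The cache, concatenate, and write steps can be sketched as follows. This is a minimal illustration, not the actual Doris Writer implementation. The function name build_stream_load_request, the host, port, database, and table values, and the sketch_ label prefix are hypothetical. The sketch assumes the Stream Load HTTP endpoint of the FE node, the default tab column separator and newline line delimiter, and \N as the text representation of NULL. It only builds the HTTP PUT request and does not send it.

```python
import base64
import urllib.request
import uuid


def build_stream_load_request(fe_host, http_port, database, table,
                              user, password, rows):
    """Concatenate cached rows into one text body and wrap it in an HTTP PUT request."""
    # Assumption: tab-separated fields and newline-separated rows (Stream Load defaults).
    # Assumption: NULL values are written as \N.
    body = "\n".join(
        "\t".join("\\N" if value is None else str(value) for value in row)
        for row in rows
    ).encode("utf-8")

    # Stream Load endpoint on the FE node; all values here are illustrative.
    url = f"http://{fe_host}:{http_port}/api/{database}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # A unique label per batch; the prefix is a hypothetical naming choice.
        "label": f"sketch_{uuid.uuid4().hex}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Rows that a reader might have cached in memory (illustrative values).
cached_rows = [(1, "alice", None), (2, "bob", 3.5)]
request = build_stream_load_request(
    "10.0.0.1", 8030, "demo_db", "demo_table", "root", "Password", cached_rows
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the whole batch as one body mirrors the description above: the rows are sent in a single request rather than row by row. An HTTP client that follows the redirect from the FE node to a BE node would be needed to actually submit the request.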
Prepare a Doris environment before data synchronization
Before you use DataWorks to synchronize data to a Doris data source, you must prepare a Doris environment. This ensures that a data synchronization task can be configured and can synchronize data to the Doris data source as expected. The following information describes how to prepare a Doris environment for data synchronization to a Doris data source.
Preparation 1: Check the version of your Doris database
Data Integration supports only specific Doris versions. You can refer to the Supported Doris versions section in this topic to check whether the version of your Doris database meets the requirements. You can download the related Doris version from the official Doris website and install the Doris version.
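The version check can be sketched as a small helper that compares a version string with the patterns listed in the Supported Doris versions section. This is an illustrative sketch only. The function name is_supported_version is hypothetical, and the way you obtain the version string of your Doris database is not shown.

```python
# Version patterns copied from the "Supported Doris versions" table; "x" matches any value.
SUPPORTED_PATTERNS = ("0.x.x", "1.1.x", "1.2.x", "2.x")


def is_supported_version(version):
    """Return True if the version string matches one of the supported patterns."""
    parts = version.strip().split(".")
    for pattern in SUPPORTED_PATTERNS:
        pattern_parts = pattern.split(".")
        # The version must have at least as many components as the pattern.
        if len(parts) < len(pattern_parts):
            continue
        if all(p == "x" or p == v for p, v in zip(pattern_parts, parts)):
            return True
    return False


for candidate in ("0.15.0", "1.2.4", "2.0.3", "1.0.0"):
    print(candidate, is_supported_version(candidate))
```

A version such as 1.0.0 does not match any pattern in the table, so the helper reports it as unsupported.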
Preparation 2: Prepare an account that has the required permissions
You must create an account that is used to log on to the Doris database for subsequent operations. You must specify a password for the account for subsequent connections to the Doris database. If you want to use the default root user of Doris to log on to the Doris database, you must specify a password for the root user. By default, the root user does not have a password. You can execute an SQL statement in Doris to specify the password:
SET PASSWORD FOR 'root' = PASSWORD('Password')
Preparation 3: Establish a network connection between the Doris database and a resource group
To use the StreamLoad method to write data, you need to access the private IP address of an FE node. If you access the public IP address of the FE node, you are redirected to the private IP address of a BE node. For more information about the redirection, see Data operation issues. Because of this redirection, you must establish network connections between your data source and a serverless resource group or an exclusive resource group for Data Integration so that the resource group can access the data source over an internal network. For more information about how to establish a network connection between the Doris database and a resource group, see Network connectivity solutions.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Take note of the configuration requirements for the following configuration items of the Doris data source:
JdbcUrl: a Java Database Connectivity (JDBC) connection string that consists of an IP address, a port number, a database name, and connection parameters. Public and private IP addresses are supported. If you use a public IP address, make sure that the resource group for Data Integration can access the host on which the Doris database resides.
FE endpoint: the IP address and port number of the FE node. If your cluster contains multiple FE nodes, you can configure the IP addresses and port numbers of multiple FE nodes. Separate the IP address and port number pairs with a comma (,), such as ip1:port1,ip2:port2. A network connectivity test will be performed for all FE endpoints.
Username: the username that is used to log on to the Doris database.
Password: the password that is used to log on to the Doris database.
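To illustrate the FE endpoint format described above, the following sketch splits a value such as ip1:port1,ip2:port2 into host and port pairs and shows how a basic TCP reachability check could be run against each pair. The function names parse_fe_endpoints and can_connect and the sample addresses are hypothetical. This is not how DataWorks performs its network connectivity test. It is only a way to check a value before you enter it.

```python
import socket


def parse_fe_endpoints(value):
    """Split 'ip1:port1,ip2:port2' into a list of (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected ip:port, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Illustrative addresses; replace them with the FE nodes of your own cluster.
for host, port in parse_fe_endpoints("10.0.0.1:8030,10.0.0.2:8030"):
    print(host, port)
    # can_connect(host, port) could be called here from a machine in the same network.
```

The reachability check is left as a commented call because its result depends on the network from which the script runs.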
Develop a data synchronization task
For information about where to start and how to configure a synchronization task, see the following configuration guides.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.