This topic describes the data import methods that Doris provides, the data formats that each method supports, and the features that are common to all import methods.
Data import methods
Doris provides various data import methods. You can select a data import method based on the data source that you use.
Supported data formats
The supported data formats vary by data import method.
| Data import method | Supported data formats |
| --- | --- |
| Broker Load | Parquet, ORC, CSV, and GZIP |
| Stream Load | CSV, GZIP, and JSON |
| Routine Load | CSV and JSON |
Features
This section describes the common features of importing data by using Doris.
Atomicity
Each import job in Doris is a complete transaction, whether you use Broker Load to import many data records at a time or use the INSERT statement to import a single data record. The import transaction guarantees atomicity for the data in a batch: either all of the data in the job takes effect or none of it does, so a failed job does not leave partially imported data behind.
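The all-or-nothing behavior can be illustrated with a small in-memory model. This is only a sketch and not the Doris API: the ToyTable class, its import_batch method, and the rule that a None record is invalid are all hypothetical names and assumptions chosen for illustration.

```python
class ToyTable:
    """In-memory stand-in for a table (hypothetical; not the Doris API)."""

    def __init__(self):
        self.rows = []

    def import_batch(self, batch):
        # Stage and validate every record before any of them becomes visible.
        staged = []
        for record in batch:
            if record is None:  # assumed validation rule, for illustration only
                raise ValueError("invalid record; the whole batch is rejected")
            staged.append(record)
        # Commit point: the whole batch becomes visible in one step.
        self.rows.extend(staged)
        return len(staged)
```

If a batch such as [3, None, 4] is submitted, the call raises an error and the table keeps exactly the rows it had before the call, including no trace of the valid record 3.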
Labels are used to identify import jobs. Each import job has a label, and the label is unique within a database. You can specify the label yourself or let Doris generate one for the job.
The label ensures that the data in an import job can be successfully imported only once. After an import job succeeds, its label cannot be used for another import job. If you reuse the label, the request is denied and the error message Label already used is returned. In this way, Doris provides at-most-once semantics for data import. If the upstream system provides at-least-once semantics, the two together give exactly-once semantics for data import.
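The following sketch models how label deduplication and upstream retries combine into exactly-once import. It is a toy in-memory model under stated assumptions, not the Doris API: ToyDatabase, LabelAlreadyUsed, and deliver_at_least_once are hypothetical names, and the upstream system is assumed to resend a batch with the same label on every retry.

```python
class LabelAlreadyUsed(Exception):
    """Raised when the label of a successful job is reused."""


class ToyDatabase:
    """In-memory stand-in for one database (hypothetical; not the Doris API)."""

    def __init__(self):
        self.used_labels = set()
        self.rows = []

    def load(self, label, batch):
        # At-most-once: a label that already succeeded is rejected.
        if label in self.used_labels:
            raise LabelAlreadyUsed("Label already used")
        self.rows.extend(batch)
        # The label is recorded only after the import succeeds.
        self.used_labels.add(label)


def deliver_at_least_once(db, label, batch, attempts=3):
    """Resend the same batch with the same label, as a retrying upstream might."""
    accepted = 0
    for _ in range(attempts):
        try:
            db.load(label, batch)
            accepted += 1
        except LabelAlreadyUsed:
            pass  # an earlier attempt already imported this batch
    return accepted
```

In this model, sending the batch three times under one label stores its rows once, because the second and third attempts are rejected by the label check.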
Synchronous and asynchronous modes
You can import data in synchronous or asynchronous mode. In synchronous mode, Doris returns a result after the import job is complete, and you can determine from the result whether the data was successfully imported. In asynchronous mode, Doris returns Successful as soon as the import job is submitted. This result means only that the job was accepted, not that the data has been imported. To check the status of the import job, you must run the status query command for the import method, for example, SHOW LOAD for a Broker Load job.
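The difference between the submission result and the job status in asynchronous mode can be sketched as a polling loop. This is a toy in-memory model, not the Doris API: ToyAsyncLoader, wait_for_import, the state names LOADING and FINISHED, and the TIMEOUT return value are used here only for illustration.

```python
class ToyAsyncLoader:
    """In-memory stand-in for asynchronous import (hypothetical; not the Doris API)."""

    def __init__(self, polls_until_done=2):
        self._polls_until_done = polls_until_done
        self._progress = {}

    def submit(self, label):
        # Submission only acknowledges that the job was accepted.
        self._progress[label] = 0
        return "Successful"

    def status(self, label):
        # Each status check advances the simulated job by one step.
        self._progress[label] += 1
        if self._progress[label] >= self._polls_until_done:
            return "FINISHED"
        return "LOADING"


def wait_for_import(loader, label, max_polls=10):
    """Poll until the job leaves the in-progress state or the poll budget runs out."""
    for _ in range(max_polls):
        state = loader.status(label)
        if state != "LOADING":
            return state
    return "TIMEOUT"
```

A real client would typically wait between status checks and would also handle a failed or canceled job; both are omitted here to keep the sketch short.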