
ApsaraDB for SelectDB:SELECT INTO OUTFILE

Last Updated:Mar 30, 2026

Use the SELECT INTO OUTFILE statement to export query results from ApsaraDB for SelectDB to remote storage—Object Storage Service (OSS) or Hadoop Distributed File System (HDFS)—for backup or data migration.

How it works

  1. Write a SELECT query to define the data to export.

  2. Specify a destination path in OSS or HDFS and set the output format and properties.

  3. Run the statement. SelectDB executes the query and writes the results to the destination in a single synchronous operation.

  4. Check the returned row for file count, row count, and file size to confirm success.
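
The steps above can be sketched as a single statement. This is a minimal example; the table, bucket, and credentials are hypothetical placeholders:

```sql
-- Hypothetical end-to-end export: query results written to OSS as CSV.
-- sales_db.orders, the bucket name, and the credentials are placeholders.
SELECT order_id, amount, created_at
FROM sales_db.orders
WHERE created_at >= '2024-01-01'
INTO OUTFILE "s3://example-bucket/export/orders_"
FORMAT AS CSV
PROPERTIES
(
    "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "cn-hangzhou"
);
```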

Usage notes

Synchronous execution and result integrity

SELECT INTO OUTFILE is synchronous. The statement returns only after the export finishes—or fails. If the connection drops during export, no status is returned and you cannot tell whether the export completed.

To confirm export success reliably, add "success_file_name" = "SUCCESS" to the PROPERTIES clause. When the export succeeds, SelectDB writes a marker file named SUCCESS to the output directory.
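
For example, an export with the marker property might look like this (the table name, bucket, and credentials are placeholders):

```sql
-- If this export succeeds, a marker file named SUCCESS appears in
-- s3://example-bucket/export/ alongside the result files.
SELECT * FROM example_tbl
INTO OUTFILE "s3://example-bucket/export/result_"
FORMAT AS CSV
PROPERTIES
(
    "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "cn-hangzhou",
    "success_file_name" = "SUCCESS"
);
```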

Single-threaded execution

By default, the export uses one thread on a single BE node. Export duration equals query execution time plus result-set write time:

Total duration = Query execution time + Result write time

For large datasets, extend the timeout before running the export:

SET query_timeout = <seconds>;

For details on session variables, see Variable management.

File path and overwrite behavior

SELECT INTO OUTFILE does not check whether the destination path or files already exist. It does not create paths automatically or overwrite existing files. Path creation and overwrite behavior are controlled by the remote storage system (OSS or HDFS).

File management responsibility

SelectDB does not manage exported files. Clean up output files from successful jobs and residual files from failed jobs directly in your storage system.

Supported file formats

| Data type | Supported formats |
| --- | --- |
| Basic types | CSV, CSV_WITH_NAMES, CSV_WITH_NAMES_AND_TYPES, Parquet, ORC (Optimized Row Columnar) |
| Complex types (ARRAY, MAP, STRUCT) | CSV, CSV_WITH_NAMES, CSV_WITH_NAMES_AND_TYPES, ORC |
| Nested data | Not supported |

Special output values

  • BITMAP and HyperLogLog (HLL) functions: If the output contains invisible characters, \N is returned instead (indicating null).

  • Geospatial functions: Output is encoded binary. Use ST_AsText to get readable text. Affected functions: ST_CIRCLE, ST_POINT, ST_POLYGON, ST_GEOMETRYFROMTEXT, ST_LINEFROMTEXT, ST_GEOMETRYFROMWKB.

Result file behavior

  • An empty result set still produces an output file.

  • Segment files always contain complete rows, so actual file size may differ slightly from max_file_size.

Limitations

SELECT INTO OUTFILE supports the Amazon Simple Storage Service (Amazon S3) protocol and the HDFS protocol only.

Syntax

query_stmt
INTO OUTFILE "file_path"
[FORMAT AS format_name]
[PROPERTIES ("key"="value", ...)]

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| query_stmt | Yes | A SELECT statement that defines the data to export. |
| file_path | Yes | Destination path and optional filename prefix. Omit the prefix to use the directory only, for example "hdfs://path/to/". |
| FORMAT AS | No | Output format. Options: CSV (default), CSV_WITH_NAMES, CSV_WITH_NAMES_AND_TYPES, PARQUET, ORC. |
| PROPERTIES | No | File, HDFS, or S3 properties. See File properties, HDFS properties, and S3 properties. |

File naming

When file_path includes a prefix (for example "s3://bucket_name/to/my_file_"), output files are named using this pattern:

<prefix><instance-name>_<sequence>.csv

  • The sequence number starts from 0 and increments for each segment file.

  • For a single output file, the sequence number is omitted.

  • The default extension is .csv. Override it with file_suffix.

Example: With prefix my_file_ and FORMAT AS CSV, a multi-file export produces:

my_file_abcdefg_0.csv
my_file_abcdefg_1.csv
my_file_abcdefg_2.csv
...
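
To produce a different extension, the file_suffix property can be set. A sketch, using HDFS as the destination; the host, path, and table name are placeholders:

```sql
-- Hypothetical: CSV content written with a .txt extension, so output
-- files are named my_file_<instance-name>_0.txt, my_file_<instance-name>_1.txt, ...
SELECT * FROM example_tbl
INTO OUTFILE "hdfs://host:9000/path/to/my_file_"
FORMAT AS CSV
PROPERTIES
(
    "fs.defaultFS" = "hdfs://host:9000",
    "hadoop.username" = "work",
    "file_suffix" = "txt"
);
```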

File properties

| Property | Required | Description |
| --- | --- | --- |
| column_separator | No | Column delimiter. Applies to CSV formats only. |
| line_delimiter | No | Row delimiter. Applies to CSV formats only. |
| max_file_size | No | Maximum size per segment file. Valid range: 5 MB to 2 GB. Default: 1 GB. For ORC files, the actual size is rounded up to a multiple of 64 MB: ceil(max_file_size / 64 MB) × 64 MB. |
| delete_existing_files | No | Whether to delete all files in the destination directory before exporting. false (default): export without deleting existing files. true: first delete all files and subdirectories in the destination directory. |
| success_file_name | No | Name of the marker file written to the destination directory after a successful export, for example "SUCCESS". Use this to confirm export completion when the connection may be interrupted. |
| file_suffix | No | File extension for output files. Overrides the default extension. |
Warning

delete_existing_files = true permanently deletes data. Use this setting in test environments only. To enable it, submit a ticket to Alibaba Cloud technical support to set enable_delete_existing_files = true in fe.conf and restart the frontend.

`delete_existing_files` scope examples:

  • "file_path" = "/user/tmp" — deletes all files and directories under /user/.

  • "file_path" = "/user/tmp/" — deletes all files and directories under /user/tmp/.
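
Putting the file properties together, a CSV export with custom delimiters and a smaller segment size might look like this sketch (the table, bucket, credentials, and sizes are illustrative):

```sql
-- Rows are split into segment files of roughly 100 MB each;
-- columns are comma-separated and rows end with \n.
SELECT * FROM example_tbl
INTO OUTFILE "s3://example-bucket/export/result_"
FORMAT AS CSV
PROPERTIES
(
    "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "cn-hangzhou",
    "column_separator" = ",",
    "line_delimiter" = "\n",
    "max_file_size" = "100MB"
);
```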

HDFS properties

The following properties apply when exporting to HDFS. fs.defaultFS and hadoop.username are always required; the dfs.* properties are required for clusters with high availability (HA) enabled.

| Property | Required | Description |
| --- | --- | --- |
| fs.defaultFS | Yes | NameNode endpoint and port. |
| hadoop.username | Yes | Username for HDFS authentication. |
| dfs.nameservices | Yes (HA clusters) | Name service name. Must match the value in hdfs-site.xml. |
| dfs.ha.namenodes.[nameservice ID] | Yes (HA clusters) | IDs of the NameNodes in the name service. Must match hdfs-site.xml. |
| dfs.namenode.rpc-address.[nameservice ID].[name node ID] | Yes (HA clusters) | Remote procedure call (RPC) address of each NameNode. Must match hdfs-site.xml. |
| dfs.client.failover.proxy.provider.[nameservice ID] | Yes (HA clusters) | Java class that clients use to connect to the active NameNode. Default: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider. |

Kerberos authentication (if enabled)

| Property | Required | Description |
| --- | --- | --- |
| dfs.namenode.kerberos.principal | Yes | Kerberos principal name of the HDFS NameNode. |
| hadoop.security.authentication | Yes | Authentication method. Set to kerberos. |
| hadoop.kerberos.principal | Yes | Principal for Kerberos authentication. |
| hadoop.kerberos.keytab | Yes | Path to the Kerberos keytab file. |
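
For a Kerberos-enabled cluster, an export might be configured as in the following sketch. All principals, addresses, and the keytab path are hypothetical placeholders; the keytab must be readable by the service performing the export:

```sql
-- Hypothetical Kerberos-authenticated HDFS export.
SELECT * FROM example_tbl
INTO OUTFILE "hdfs://hacluster/path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "fs.defaultFS" = "hdfs://hacluster/",
    "hadoop.username" = "work",
    "hadoop.security.authentication" = "kerberos",
    "hadoop.kerberos.principal" = "doris@EXAMPLE.COM",
    "hadoop.kerberos.keytab" = "/etc/security/keytabs/doris.keytab",
    "dfs.namenode.kerberos.principal" = "nn/_HOST@EXAMPLE.COM"
);
```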

S3 properties

These properties apply to any storage system that supports the S3 protocol, including OSS.

| Property | Required | Description |
| --- | --- | --- |
| s3.endpoint | Yes | Endpoint of the destination storage system. |
| s3.access_key | Yes | Access key for authentication. |
| s3.secret_key | Yes | Secret key for authentication. |
| s3.region | Yes | Region of the destination storage system. |
| s3.session_token | Only with temporary credentials | Session token for temporary session authentication. |
| use_path_style | No | Access style. Default: false (virtual-hosted style). Set to true to force path-style access for storage systems that do not support virtual-hosted style. |

URI scheme: Use http://, https://, or s3:// in file_path.

  • http:// or https://: use_path_style determines the access style.

  • s3://: always uses virtual-hosted style, regardless of use_path_style.

Important

Always include a URI scheme (http://, https://, or s3://) in the destination URL. Omitting the scheme causes the error: ERROR 1105 (HY000): errCode = 2, detailMessage = Unknown properties: [s3.region, s3.endpoint, s3.secret_key, s3.access_key].
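
As an illustration, an S3-compatible system that supports only path-style access might be addressed with an http:// scheme and use_path_style enabled. This is a sketch; the endpoint, bucket, region, and table name are assumptions:

```sql
-- Hypothetical: http:// scheme plus "use_path_style" = "true"
-- forces path-style requests against an S3-compatible endpoint.
SELECT * FROM example_tbl
INTO OUTFILE "http://example-bucket/export/result_"
FORMAT AS CSV
PROPERTIES
(
    "s3.endpoint" = "storage.example.com:9000",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "us-east-1",
    "use_path_style" = "true"
);
```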

Response

SELECT INTO OUTFILE is synchronous. When the statement finishes, it returns one row per export thread.

Success response:

SELECT * FROM tbl1 LIMIT 10 INTO OUTFILE "file:///home/work/path/result_";
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                |
+------------+-----------+----------+--------------------------------------------------------------------+
|          1 |         2 |        8 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
1 row in set (0.05 sec)

| Field | Description |
| --- | --- |
| FileNumber | Number of files generated. |
| TotalRows | Number of rows in the result set. |
| FileSize | Total size of the exported files, in bytes. |
| URL | For exports to a local disk, the compute node to which the data was exported. |

Parallel export response (multiple rows, one per thread):

+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                |
+------------+-----------+----------+--------------------------------------------------------------------+
|          1 |         3 |        7 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
|          1 |         2 |        4 | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
2 rows in set (2.218 sec)

Error response:

ERROR 1064 (HY000): errCode = 2, detailMessage = ...

Enable parallel export

By default, export runs on a single BE node thread. Enable parallel export to distribute the workload across multiple BE nodes.

Step 1: Enable the session variable.

SET enable_parallel_outfile = true;

Step 2: Verify that your query supports parallel export.

Run EXPLAIN on your SELECT INTO OUTFILE statement:

EXPLAIN <select_into_outfile_statement>;

Check the output:

  • Parallel export supported: RESULT FILE SINK appears in PLAN FRAGMENT 1.

  • Parallel export not supported: RESULT FILE SINK appears in PLAN FRAGMENT 0.

A query with ORDER BY cannot run in parallel even when enable_parallel_outfile = true, because ordering requires a single-threaded merge step.

Example query plan (parallel export supported):

+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                             |
|  OUTPUT EXPRS:<slot 2> | <slot 3> | <slot 4> | <slot 5>                     |
|   PARTITION: UNPARTITIONED                                                  |
|                                                                             |
|   RESULT SINK                                                               |
|                                                                             |
|   1:EXCHANGE                                                                |
|                                                                             |
| PLAN FRAGMENT 1                                                             |
|  OUTPUT EXPRS:`k1` + `k2`                                                   |
|   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`multi_tablet`.`k1`   |
|                                                                             |
|   RESULT FILE SINK                                                          |
|   FILE PATH: s3://ml-bd-repo/bpit_test/outfile_1951_                        |
|   STORAGE TYPE: S3                                                          |
|                                                                             |
|   0:OlapScanNode                                                            |
|      TABLE: multi_tablet                                                    |
+-----------------------------------------------------------------------------+

Step 3: Calculate maximum concurrency (optional).

When parallel export is supported, the maximum number of concurrent requests is:

Maximum concurrency = be_instance_num × parallel_fragment_exec_instance_num

  • be_instance_num: number of BE nodes. Clusters with up to 32 CPU cores use one BE node; larger clusters use multiple BE nodes with 32 CPU cores each.

  • parallel_fragment_exec_instance_num: parallelism setting per node.

For example, 4 BE nodes with parallel_fragment_exec_instance_num = 8 allow at most 4 × 8 = 32 concurrent export instances.

To view the current parallelism setting:

SHOW VARIABLES LIKE '%parallel_fragment_exec_instance_num%';

To set it:

SET parallel_fragment_exec_instance_num = <value>;

Examples

Export to HDFS

The required HDFS properties differ depending on whether high availability (HA) is enabled for the Hadoop cluster.

Hadoop cluster without high availability

-- fileSystem_port default: 9000
SELECT * FROM tbl
INTO OUTFILE "hdfs://${host}:${fileSystem_port}/path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "fs.defaultFS" = "hdfs://ip:port",
    "hadoop.username" = "work"
);

Hadoop cluster with high availability

-- fileSystem_port default: 8020
SELECT * FROM tbl
INTO OUTFILE "hdfs:///path/to/result_"
FORMAT AS CSV
PROPERTIES
(
    "fs.defaultFS" = "hdfs://hacluster/",
    "dfs.nameservices" = "hacluster",
    "dfs.ha.namenodes.hacluster" = "n1,n2",
    "dfs.namenode.rpc-address.hacluster.n1" = "192.168.0.1:8020",
    "dfs.namenode.rpc-address.hacluster.n2" = "192.168.0.2:8020",
    "dfs.client.failover.proxy.provider.hacluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);

If the total result is 1 GB or less, SelectDB writes a single file: result_0.csv. If the total exceeds 1 GB, it writes multiple segment files: result_0.csv, result_1.csv, and so on.

Export to OSS over the S3 protocol

Export the result of a UNION query to Parquet format in an OSS bucket in the China (Hangzhou) region:

SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
INTO OUTFILE "s3://oss-bucket/result_"
FORMAT AS PARQUET
PROPERTIES
(
    "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "cn-hangzhou"
);

Export to OSS with parallel export enabled

Export data in parallel. Parallel export requires the enable_parallel_outfile session variable.

SET enable_parallel_outfile = true;

SELECT k1 FROM tb1 LIMIT 1000
INTO OUTFILE "s3://my_bucket/export/my_file_"
FORMAT AS CSV
PROPERTIES
(
    "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "cn-hangzhou"
);

If the query includes ORDER BY, parallel export is disabled automatically even with enable_parallel_outfile = true:

SET enable_parallel_outfile = true;

SELECT k1 FROM tb1 ORDER BY k1 LIMIT 1000
INTO OUTFILE "s3://my_bucket/export/my_file_"
FORMAT AS CSV
PROPERTIES
(
    "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
    "s3.access_key" = "****",
    "s3.secret_key" = "****",
    "s3.region" = "cn-hangzhou"
);

Data type mappings

When exporting to Parquet or ORC, SelectDB automatically maps its data types to the formats supported by each file type.

SelectDB to ORC

| SelectDB type | ORC type |
| --- | --- |
| BOOLEAN | BOOLEAN |
| TINYINT | TINYINT |
| SMALLINT | SMALLINT |
| INT | INT |
| BIGINT | BIGINT |
| LARGEINT | STRING |
| DATE | STRING |
| DATEV2 | STRING |
| DATETIME | STRING |
| DATETIMEV2 | TIMESTAMP |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| CHAR / VARCHAR / STRING | STRING |
| DECIMAL | DECIMAL |
| STRUCT | STRUCT |
| MAP | MAP |
| ARRAY | ARRAY |

SelectDB to Parquet (via Apache Arrow)

Parquet export routes data through Apache Arrow. SelectDB types map to Arrow types first, then Arrow writes them to Parquet.

| SelectDB type | Apache Arrow type |
| --- | --- |
| BOOLEAN | BOOLEAN |
| TINYINT | INT8 |
| SMALLINT | INT16 |
| INT | INT32 |
| BIGINT | INT64 |
| LARGEINT | UTF8 |
| DATE | UTF8 |
| DATEV2 | UTF8 |
| DATETIME | UTF8 |
| DATETIMEV2 | UTF8 |
| FLOAT | FLOAT32 |
| DOUBLE | FLOAT64 |
| CHAR / VARCHAR / STRING | UTF8 |
| DECIMAL | DECIMAL128 |
| STRUCT | STRUCT |
| MAP | MAP |
| ARRAY | LIST |