All Products
Search
Document Center

Tablestore:Download Tablestore data to a local CSV file by using DataX

Last Updated:Apr 29, 2026

This topic describes how to use DataX to export data from Tablestore to a local CSV file. You can export data from data tables and time series tables in Tablestore.

Background

DataX is an open-source offline data synchronization tool that supports data transfer between heterogeneous data sources, including MySQL, Oracle, HDFS, Hive, and Tablestore.

Before you start, note the following:

  • Exported file names: DataX appends a random suffix to exported CSV file names (for example, output.csv__d737aec2_c9e3_4489_a5d7_361f44c998ce). Remove the suffix manually after export to get a standard CSV file.

  • Network costs: If you run DataX on an Elastic Compute Service (ECS) instance, use a VPC endpoint to avoid outbound Internet traffic fees and improve network performance.

Prerequisites

  • The endpoint and name of the Tablestore instance and the information of the source table in Tablestore are obtained.

  • An AccessKey pair is created for your Alibaba Cloud account or a Resource Access Management (RAM) user. For more information, see Create an access key pair.

    Important

    For security purposes, use Tablestore as a RAM user. Create a RAM user, attach the AliyunOTSFullAccess policy to the RAM user, and create an AccessKey pair for the RAM user. For more information, see Use the AccessKey pair of a RAM user to access Tablestore.

Procedure

Note

The examples in this topic use ECS instances running Alibaba Cloud Linux 3.2104 LTS 64-bit or Ubuntu 22.04 64-bit. For more information, see What is ECS?

Step 1: Install dependencies

  1. Install Python 2 or Python 3.

    Python 3 is pre-installed on ECS instances running Alibaba Cloud Linux or Ubuntu. For other operating systems, install Python manually.

  2. Install JDK 1.8 or later. JDK 1.8 is recommended.

    The following commands install JDK 1.8 on Alibaba Cloud Linux or Ubuntu. For other operating systems, install JDK manually.

    Alibaba Cloud Linux

    yum -y install java-1.8.0-openjdk-devel.x86_64

    Ubuntu

    apt update && apt upgrade
    apt install openjdk-8-jdk

Step 2: Download DataX

  1. Download the DataX package.

    wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz
  2. Extract the package.

    tar -zxvf datax.tar.gz

To compile DataX from source, see the DataX installation guide.

Step 3: Create a configuration file

  1. Go to the bin directory of DataX.

    cd datax/bin
  2. Create a configuration file.

    vi tablestore_to_csv.json

    The following examples show the configuration file content for each table type. Modify the parameters based on your requirements.

    Export data from a data table

    For information of the schema of the source data table named orders , see Appendix 1: Data table .
    {
      "job": {
        "setting": {
          "speed": {
            "channel": 1
          },
          "errorLimit": {
            "record": 0,
            "percentage": 0
          }
        },
        "content": [
          {
            "reader": {
              "name": "otsreader",
              "parameter": {
                "endpoint": "https://<YOUR-INSTANCE>.<YOUR-REGION>.ots.aliyuncs.com",
                "accessId": "<YOUR-ACCESS-KEY-ID>",
                "accessKey": "<YOUR-ACCESS-KEY-SECRET>",
                "instanceName": "<YOUR-INSTANCE>",
                "table": "orders",
                "range": {
                  "begin": [
                    {
                      "type": "INF_MIN"
                    }
                  ],
                  "end": [
                    {
                      "type": "INF_MAX"
                    }
                  ]
                },
                "column": [
                  {
                    "name": "order_id"
                  },
                  {
                    "name": "user_id"
                  },
                  {
                    "name": "sku_id"
                  },
                  {
                    "name": "price"
                  },
                  {
                    "name": "num"
                  },
                  {
                    "name": "total_price"
                  },
                  {
                    "name": "order_status"
                  },
                  {
                    "name": "create_time"
                  },
                  {
                    "name": "modified_time"
                  }
                ]
              }
            },
            "writer": {
              "name": "txtfilewriter",
              "parameter": {
                "path": "/tmp/export/",
                "fileName": "output.csv",
                "writeMode": "truncate",
                "fileFormat": "csv"
              }
            }
          }
        ]
      }
    }

    Export data from a time series table

    For information of the schema of the source time series table named vehicles_timeseriesdata , see Appendix 2: Time series table .
    {
      "job": {
        "setting": {
          "speed": {
            "channel": 1
          }
        },
        "content": [
          {
            "reader": {
              "name": "otsreader",
              "parameter": {
                "endpoint": "https://<YOUR-INSTANCE>.<YOUR-REGION>.ots.aliyuncs.com",
                "accessId": "<YOUR-ACCESS-KEY-ID>",
                "accessKey": "<YOUR-ACCESS-KEY-SECRET>",
                "instanceName": "<YOUR-INSTANCE>",
                "table": "vehicles_timeseriesdata",
                "mode": "normal",
                "newVersion": "true",
                "isTimeseriesTable": "true",
                "measurementName": "measurement_1",
                "timeRange": {
                  "begin": 0,
                  "end": 1750000000000
                },
                "column": [
                  {
                    "name": "_m_name"
                  },
                  {
                    "name": "_data_source"
                  },
                  {
                    "name": "_tags"
                  },
                  {
                    "name": "_time"
                  },
                  {
                    "name": "vin_id",
                    "type": "STRING"
                  },
                  {
                    "name": "mfrs",
                    "type": "STRING"
                  },
                  {
                    "name": "model",
                    "type": "STRING"
                  },
                  {
                    "name": "speed",
                    "type": "DOUBLE"
                  },
                  {
                    "name": "gps",
                    "type": "STRING"
                  },
                  {
                    "name": "mileage",
                    "type": "DOUBLE"
                  },
                  {
                    "name": "emission",
                    "type": "DOUBLE"
                  }
                ]
              }
            },
            "writer": {
              "name": "txtfilewriter",
              "parameter": {
                "path": "/tmp/export/",
                "fileName": "output.csv",
                "writeMode": "truncate",
                "fileFormat": "csv"
              }
            }
          }
        ]
      }
    }
    • The following table describes the Tablestore Reader parameters that you must modify.

      Parameter

      Applies to

      Description

      channel

      Both

      The number of parallel read/write threads. Each channel is an independent thread. Increasing this value raises concurrency and resource consumption.

      endpoint

      Both

      The endpoint of the Tablestore instance. If you access Tablestore from an ECS instance, use a VPC endpoint to avoid outbound traffic fees and improve performance and security.

      accessId

      Both

      The AccessKey ID of your Alibaba Cloud account or RAM user.

      accessKey

      Both

      The AccessKey secret of your Alibaba Cloud account or RAM user.

      instanceName

      Both

      The name of the Tablestore instance.

      tableName

      Both

      The name of the source table in Tablestore.

      column

      Both

      The array of columns to export.

      range

      Data table

      The primary key range to read. The begin and end arrays each specify one value per primary key column. The range is left-closed and right-open. Use INF_MIN (negative infinity) and INF_MAX (positive infinity) to read the full table. The number of columns in a virtual point must be the same as that in a primary key. This parameter is optional; the default reads from negative infinity.

      measurementName

      Time series table

      The metric name of the time series to read. If not set, all data in the table is read.

      timeRange

      Time series table

      The timestamp range to read, in milliseconds. The range is left-closed and right-open. This parameter is optional; the default reads all versions.

    • The following table describes the TxtFileWriter parameters that you must modify.

      Parameter

      Description

      path

      The local directory where the exported file is saved.

      fileName

      The base name of the exported file, including the extension. Example: output.csv.

      writeMode

      How TxtFileWriter handles existing files before writing. Valid values:

      • truncate: deletes all files prefixed with the fileName value in the target directory before writing.

      • append: writes directly to the file without pre-processing, ensuring no file name conflicts.

      • nonConflict: reports an error and aborts if a file prefixed with the fileName value already exists in the directory.

      fileFormat

      The output file format. Valid values: csv and text.

Step 4: Run the DataX job

  1. Run the export job.

    python3 datax.py tablestore_to_csv.json

    After the job completes, a summary is printed:

    2025-03-19 17:21:05.146 [job-0] INFO  StandAloneJobContainerCommunicator - Total 200000 records, 23086634 bytes | Speed 1.10MB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.222s |  All Task WaitReaderTime 18.392s | Percentage 100.00%
    2025-03-19 17:21:05.147 [job-0] INFO  JobContainer - 
    Task Start Time                    : 2025-03-19 17:20:43
    Task End Time                    : 2025-03-19 17:21:05
    Task Duration                    :                 21s
    Average Transfer Speed                    :            1.10MB/s
    Record Write Speed                    :          10000rec/s
    Total Records Read                    :              200000
    Failed Records                    :                   0
  2. Verify the export result.

    Check the output directory (for example, /tmp/export/) to confirm the exported file. The file name includes a random suffix:

    output.csv__d737aec2_c9e3_4489_a5d7_361f44c998ce

    To preview the file contents, run:

    head -5 output.csv__d737aec2_c9e3_4489_a5d7_361f44c998ce

    The output is similar to:

    order_id,user_id,sku_id,price,num,total_price,order_status,create_time,modified_time
    ORD-001,USR-100,SKU-200,29.99,2,59.98,completed,2025-01-01 08:00:00,2025-01-02 10:00:00
    ORD-002,USR-101,SKU-201,49.99,1,49.99,pending,2025-01-02 09:00:00,2025-01-02 09:00:00

    Remove the suffix to get a standard CSV file name:

    mv output.csv__d737aec2_c9e3_4489_a5d7_361f44c998ce output.csv
    Note

    DataX appends a random suffix to distinguish files written by parallel threads. Remove the suffix to get a standard CSV file name.

Appendix: Sample source tables

Appendix 1: Data table

The name of the sample data table is orders. The following table describes the schema of the data table.

Field name

Type

Description

order_id (primary key column)

String

The ID of the order.

user_id

String

The ID of the user.

sku_id

String

The ID of the product.

price

Double

The unit price of the purchased products.

num

Integer

The quantity of the purchased products.

total_price

Double

The total price of the order.

order_status

String

The status of the order.

create_time

String

The time when the order was created.

modified_time

String

The time when the order was last modified.

Appendix 2: Time series table

The name of the sample time series table is vehicles_timeseriesdata. The following table describes the schema of the time series table.

Field name

Type

Description

_m_name

String

The name of the physical quantity or metric for data in the time series.

_data_source

String

The identifier of the data source for the time series. This field can be left blank.

_tags

String

The tags of the time series.

_time

Integer

The time when the data is reported.

vin_id

String

The vehicle identification number (VIN) of the vehicle.

mfrs

String

The manufacturer.

model

String

The model of the vehicle.

speed

Double

The current speed of the vehicle.

gps

String

The GPS coordinates of the vehicle in the format of latitude,longitude, with latitude preceding longitude.

mileage

Double

The current mileage of the vehicle.

emission

Double

The emission value.