
Tablestore:Download Tablestore data to a local CSV file by using DataX

Last Updated:Apr 02, 2025

This topic describes how to use DataX to export data from Tablestore to a local CSV file. You can export data from data tables and time series tables in Tablestore.

Background information

DataX is a tool for offline synchronization between disparate data sources. It supports efficient and stable synchronization between various data sources such as MySQL, Oracle, HDFS, Hive, and Tablestore.

Prerequisites

  • The endpoint and name of the Tablestore instance and the details of the source table are obtained.

  • An AccessKey pair is created for your Alibaba Cloud account or a Resource Access Management (RAM) user. For more information, see Create an AccessKey pair.

    Important

    For security purposes, we recommend that you use Tablestore features as a RAM user. You can create a RAM user, grant the user permissions to manage Tablestore by attaching the AliyunOTSFullAccess policy to the RAM user, and create an AccessKey pair for the RAM user. For more information, see Use the AccessKey pair of a RAM user to access Tablestore.

Procedure

Note

In the examples in this topic, Elastic Compute Service (ECS) instances that run Alibaba Cloud Linux 3.2104 LTS 64-bit or Ubuntu 22.04 64-bit are used. For more information, see What is ECS?

Step 1: Install dependencies

  1. Install Python 2 or Python 3.

    Python 3 is pre-installed on ECS instances that run Alibaba Cloud Linux or Ubuntu. For ECS instances that run other operating systems, you need to install Python yourself.

  2. Install JDK 1.8 or later. We recommend that you install JDK 1.8.

    This step describes how to install JDK 1.8 on an ECS instance that runs Alibaba Cloud Linux or Ubuntu. For ECS instances that run other operating systems, you need to install JDK yourself.

    Alibaba Cloud Linux

    yum -y install java-1.8.0-openjdk-devel.x86_64

    Ubuntu

    apt update && apt upgrade
    apt install openjdk-8-jdk
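After installation, it can be handy to confirm that both runtimes are on the PATH before moving on. The following Python sketch is our own illustration, not part of DataX; note that java prints its version banner to stderr rather than stdout.

```python
import shutil
import subprocess
from typing import Optional

def tool_version(cmd: str, flag: str = "-version") -> Optional[str]:
    """Return the first line of `cmd flag` output, or None if cmd is absent."""
    if shutil.which(cmd) is None:
        return None
    # java writes its version banner to stderr, so capture both streams.
    result = subprocess.run([cmd, flag], capture_output=True, text=True)
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else None

print("python3:", tool_version("python3", "--version"))
print("java   :", tool_version("java"))
```

If either line prints None, revisit the corresponding installation step above.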

Step 2: Download DataX

  1. Download the DataX package.

    wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz
  2. Extract the package.

    tar -zxvf datax.tar.gz

For information about how to compile DataX yourself, see DataX installation guide.

Step 3: Create a configuration file

  1. Go to the bin directory of DataX.

    cd datax/bin
  2. Create a configuration file. The following command uses vi; replace it if you prefer another editor, such as vim.

    vi tablestore_to_csv.json

    The following sample code shows the content of a configuration file. Modify the parameters based on your synchronization requirements and actual environment.

    Export data from a data table

    For information about the schema of the source data table named orders, see Appendix 1: Data table.
    {
      "job": {
        "setting": {
          "speed": {
            "channel": 1
          },
          "errorLimit": {
            "record": 0,
            "percentage": 0
          }
        },
        "content": [
          {
            "reader": {
              "name": "otsreader",
              "parameter": {
                "endpoint": "https://<YOUR-INSTANCE>.<YOUR-REGION>.ots.aliyuncs.com",
                "accessId": "<YOUR-ACCESS-KEY-ID>",
                "accessKey": "<YOUR-ACCESS-KEY-SECRET>",
                "instanceName": "<YOUR-INSTANCE>",
                "table": "orders",
                "range": {
                  "begin": [
                    {
                      "type": "INF_MIN"
                    }
                  ],
                  "end": [
                    {
                      "type": "INF_MAX"
                    }
                  ]
                },
                "column": [
                  {
                    "name": "order_id"
                  },
                  {
                    "name": "user_id"
                  },
                  {
                    "name": "sku_id"
                  },
                  {
                    "name": "price"
                  },
                  {
                    "name": "num"
                  },
                  {
                    "name": "total_price"
                  },
                  {
                    "name": "order_status"
                  },
                  {
                    "name": "create_time"
                  },
                  {
                    "name": "modified_time"
                  }
                ]
              }
            },
            "writer": {
              "name": "txtfilewriter",
              "parameter": {
                "path": "/tmp/export/",
                "fileName": "output.csv",
                "writeMode": "truncate",
                "fileFormat": "csv"
              }
            }
          }
        ]
      }
    }
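Before submitting the job, a quick structural check of the JSON file can catch a missing parameter early. The sketch below is illustrative, not part of DataX; it verifies only the otsreader keys used in the configuration above, plus the writer output path.

```python
import json

# Reader keys that the sample configuration above fills in.
REQUIRED_READER_KEYS = {"endpoint", "accessId", "accessKey", "instanceName", "table"}

def check_config(path):
    """Return a sorted list of missing otsreader parameters in a DataX job file."""
    with open(path) as f:
        job = json.load(f)
    content = job["job"]["content"][0]
    reader = content["reader"]["parameter"]
    missing = sorted(REQUIRED_READER_KEYS - reader.keys())
    # The writer must at least know where to put the exported file.
    if "path" not in content["writer"]["parameter"]:
        missing.append("writer.path")
    return missing
```

An empty result, for example from `check_config("tablestore_to_csv.json")`, means every checked key is present.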

    Export data from a time series table

    For information about the schema of the source time series table named vehicles_timeseriesdata, see Appendix 2: Time series table.
    {
      "job": {
        "setting": {
          "speed": {
            "channel": 1
          }
        },
        "content": [
          {
            "reader": {
              "name": "otsreader",
              "parameter": {
                "endpoint": "https://<YOUR-INSTANCE>.<YOUR-REGION>.ots.aliyuncs.com",
                "accessId": "<YOUR-ACCESS-KEY-ID>",
                "accessKey": "<YOUR-ACCESS-KEY-SECRET>",
                "instanceName": "<YOUR-INSTANCE>",
                "table": "vehicles_timeseriesdata",
                "mode": "normal",
                "newVersion": "true",
                "isTimeseriesTable": "true",
                "measurementName": "measurement_1",
                "timeRange": {
                  "begin": 0,
                  "end": 1750000000000
                },
                "column": [
                  {
                    "name": "_m_name"
                  },
                  {
                    "name": "_data_source"
                  },
                  {
                    "name": "_tags"
                  },
                  {
                    "name": "_time"
                  },
                  {
                    "name": "vin_id",
                    "type": "STRING"
                  },
                  {
                    "name": "mfrs",
                    "type": "STRING"
                  },
                  {
                    "name": "model",
                    "type": "STRING"
                  },
                  {
                    "name": "speed",
                    "type": "DOUBLE"
                  },
                  {
                    "name": "gps",
                    "type": "STRING"
                  },
                  {
                    "name": "mileage",
                    "type": "DOUBLE"
                  },
                  {
                    "name": "emission",
                    "type": "DOUBLE"
                  }
                ]
              }
            },
            "writer": {
              "name": "txtfilewriter",
              "parameter": {
                "path": "/tmp/export/",
                "fileName": "output.csv",
                "writeMode": "truncate",
                "fileFormat": "csv"
              }
            }
          }
        ]
      }
    }
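The begin and end values of timeRange are Unix timestamps in milliseconds. If it is easier to reason in calendar dates, a small conversion helper (our own, not part of DataX) can produce them:

```python
from datetime import datetime, timezone

def to_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds for timeRange."""
    return int(dt.timestamp() * 1000)

# Read everything reported in June 2025 (UTC); the end bound is exclusive.
time_range = {
    "begin": to_millis(datetime(2025, 6, 1, tzinfo=timezone.utc)),
    "end": to_millis(datetime(2025, 7, 1, tzinfo=timezone.utc)),
}
print(time_range)  # {'begin': 1748736000000, 'end': 1751328000000}
```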
    • The following table describes the parameters of Tablestore Reader that you must modify.

      Parameter

      Description

      channel

      The core parallel unit of the data synchronization task.

      Each channel corresponds to an independent thread for reading and writing data. You can control the concurrency of the task by modifying the number of channels, which affects system performance and resource consumption.

      endpoint

      The endpoint of the Tablestore instance.

      Note

      If you access Tablestore from an ECS instance, we recommend that you use a VPC endpoint, which does not generate fees for outbound traffic over the Internet and provides better network performance and security.

      accessId

      The AccessKey ID of your Alibaba Cloud account or RAM user.

      accessKey

      The AccessKey secret of your Alibaba Cloud account or RAM user.

      instanceName

      The name of the Tablestore instance.

      table

      The name of the source table in Tablestore.

      column

      The array of columns to export.

      range

      The begin and end parameters specify the start and end values for each primary key column when data is read from a Tablestore data table. The value of the begin and end parameters is a JSON array. The range of data to read from Tablestore is a left-closed, right-open interval.

      The start and end primary keys must be valid primary keys or virtual points composed of INF_MIN and INF_MAX types. The number of columns in a virtual point must be the same as that in a primary key. INF_MIN indicates negative infinity, which is smaller than any value. INF_MAX indicates positive infinity, which is larger than any value.

      Note

      This parameter is used when a data table is used as the source table. This parameter is optional. If it is not configured, the full primary key range is read, from negative infinity to positive infinity.

      measurementName

      The metric name of the time series that you want to read. If this parameter is not configured, all data in the table is read.

      Note

      This parameter is used when a time series table is used as the source table.

      timeRange

      The begin and end parameters specify the start and end timestamps, in milliseconds, for reading data. The range of data to read from Tablestore is a left-closed, right-open interval.

      Note

      This parameter is used when a time series table is used as the source table. This parameter is optional. If it is not configured, data in all time ranges is read.

    • The following table describes the parameters of TxtFileWriter that you must modify.

      Parameter

      Description

      path

      The path of the exported file in the system.

      fileName

      The name of the exported file, including the file name extension. Example: output.csv.

      writeMode

      The mode in which TxtFileWriter processes data before writing data. Valid values:

      • truncate: clears all files whose names are prefixed with the value of the fileName parameter in the specified directory before writing data.

      • append: does not process existing files before writing. Data is written to files whose names are based on the fileName parameter, with a suffix added so that file names do not conflict.

      • nonConflict: reports an error and aborts the operation if a file whose name is prefixed with the value of the fileName parameter exists in the directory.

      fileFormat

      The format of the output file. Valid values: csv and text.
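To read only a slice of a data table instead of the full primary key range, begin and end can hold concrete values rather than INF_MIN and INF_MAX. The snippet below builds such a clause for the sample orders table; the STRING type token and the boundary values are assumptions for illustration, so check them against your reader version before use.

```python
import json

# Orders with order_id >= "A" and < "B"; the interval is left-closed, right-open.
range_clause = {
    "range": {
        "begin": [{"type": "STRING", "value": "A"}],
        "end": [{"type": "STRING", "value": "B"}],
    }
}
print(json.dumps(range_clause, indent=2))
```

This fragment replaces the INF_MIN/INF_MAX range in the data table configuration shown earlier.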

Step 4: Execute the DataX task

  1. Run the following command to start the data export task:

    python3 datax.py tablestore_to_csv.json

    After the task is completed, the overall execution status is printed.

    2025-03-19 17:21:05.146 [job-0] INFO  StandAloneJobContainerCommunicator - Total 200000 records, 23086634 bytes | Speed 1.10MB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.222s |  All Task WaitReaderTime 18.392s | Percentage 100.00%
    2025-03-19 17:21:05.147 [job-0] INFO  JobContainer - 
    Task Start Time        : 2025-03-19 17:20:43
    Task End Time          : 2025-03-19 17:21:05
    Task Duration          : 21s
    Average Transfer Speed : 1.10MB/s
    Record Write Speed     : 10000rec/s
    Total Records Read     : 200000
    Failed Records         : 0
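For automation, the summary line can be parsed so that a pipeline fails when records were dropped. The following sketch assumes the log format shown above stays stable:

```python
import re

# A shortened copy of the summary line printed by DataX, as shown above.
SUMMARY = ("Total 200000 records, 23086634 bytes | Speed 1.10MB/s, "
           "10000 records/s | Error 0 records, 0 bytes | Percentage 100.00%")

def parse_summary(line):
    """Extract total and failed record counts from a DataX summary line."""
    total = int(re.search(r"Total (\d+) records", line).group(1))
    errors = int(re.search(r"Error (\d+) records", line).group(1))
    return {"total": total, "errors": errors}

stats = parse_summary(SUMMARY)
assert stats["errors"] == 0, "export finished with failed records"
print(stats)  # {'total': 200000, 'errors': 0}
```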
  2. Verify the export result.

    After the task is completed, check the specified path (such as /tmp/export/) to confirm whether the exported CSV file meets your expectations.

    output.csv__d737aec2_c9e3_4489_a5d7_361f44c998ce
    Note

    The exported CSV file name has a random suffix to distinguish files generated by different threads. Rename the file to remove this suffix and obtain a standard CSV file.
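A small helper (the names here are our own) can strip that suffix after the task finishes. When several channels produce several part files, plain renaming would collide, so this sketch numbers the results instead:

```python
import glob
import os

def strip_suffix(export_dir, base_name="output.csv"):
    """Rename output.csv__<random> part files to plain CSV names."""
    parts = sorted(glob.glob(os.path.join(export_dir, base_name + "__*")))
    stem, ext = os.path.splitext(base_name)
    renamed = []
    for i, part in enumerate(parts):
        # Keep the original name for a single part; number multiple parts.
        name = base_name if len(parts) == 1 else f"{stem}-{i}{ext}"
        target = os.path.join(export_dir, name)
        os.rename(part, target)
        renamed.append(target)
    return renamed
```

For example, `strip_suffix("/tmp/export/")` would turn a single part file back into /tmp/export/output.csv.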

Appendix: Sample source tables

Appendix 1: Data table

The name of the sample data table is orders. The following table describes the schema of the data table.

Field name                     Type     Description
order_id (primary key column)  String   The ID of the order.
user_id                        String   The ID of the user.
sku_id                         String   The ID of the product.
price                          Double   The unit price of the purchased product.
num                            Integer  The quantity of purchased products.
total_price                    Double   The total price of the order.
order_status                   String   The status of the order.
create_time                    String   The time when the order was created.
modified_time                  String   The time when the order was last modified.

Appendix 2: Time series table

The name of the sample time series table is vehicles_timeseriesdata. The following table describes the schema of the time series table.

Field name    Type     Description
_m_name       String   The name of the physical quantity or metric for data in the time series.
_data_source  String   The identifier of the data source for the time series. This field can be left empty.
_tags         String   The tags of the time series.
_time         Integer  The time when the data is reported.
vin_id        String   The vehicle identification number (VIN) of the vehicle.
mfrs          String   The manufacturer of the vehicle.
model         String   The model of the vehicle.
speed         Double   The current speed of the vehicle.
gps           String   The GPS coordinates of the vehicle, in the format latitude,longitude.
mileage       Double   The current mileage of the vehicle.
emission      Double   The emission value of the vehicle.