All Products
Search
Document Center

ApsaraDB for ClickHouse:Use clickhouse-client to import data

Last Updated:Jan 26, 2024

This topic describes how to use clickhouse-client to import data from a local file to ApsaraDB for ClickHouse. This method can help you quickly import data.

Usage notes

ApsaraDB for ClickHouse allows you to import data from files. The common file formats supported are TabSeparated, TabSeparatedWithNames, TabSeparatedWithNamesAndTypes, CSV, and CSVWithNames. For more information about common file formats, see Common file formats. If your file is in another format, the file must meet the standards of the ClickHouse community. For more information, see Formats for Input and Output Data.

Prerequisites

  • The IP address of the on-premises machine is added to the whitelist of the ApsaraDB for ClickHouse cluster. For more information, see Configure a whitelist.

  • clickhouse-client that corresponds to the version of the ApsaraDB for ClickHouse cluster is installed. For more information, see clickhouse-client.

Step 1: Prepare test data

In the directory in which clickhouse-client is installed, create a CSV file named testdata.csv and write the following data to the file:

1,yang,32,shanghai,http://example.com
2,wang,22,beijing,http://example.com
3,xiao,23,shenzhen,http://example.com
4,jess,45,hangzhou,http://example.com
5,jack,14,shanghai,http://example.com
6,tomy,25,hangzhou,http://example.com
7,lucy,45,shanghai,http://example.com
8,tengyin,26,shanghai,http://example.com
9,wangli,27,shenzhen,http://example.com
10,xiaohua,37,shanghai,http://example.com

Step 2: Create an ApsaraDB for ClickHouse table

  1. Log on to the ApsaraDB for ClickHouse console.

  1. In the top navigation bar, select the region where the cluster is deployed.

  1. On the Clusters page, find the cluster that you want to manage and click the cluster ID.

  1. In the upper-right corner of the cluster details page, click Log On to Database.

  1. In the Log on to Database Instance dialog box, enter the username and password of the database account and click Login.

  1. Create a table.

    Note

    The schema of an ApsaraDB for ClickHouse table must map to the structure of a local file, and the data type of the ApsaraDB for ClickHouse table must be the same as that of the local file.

    • Create a local table

      • If you use a cluster of Single-replica Edition, execute the following statement to create a local table:

        CREATE TABLE test_tbl_local ON cluster default
        (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
        )
        ENGINE = MergeTree()
        ORDER BY id;
      • If you use a cluster of Double-replica Edition, execute the following statement to create a local table:

        CREATE TABLE test_tbl_local ON cluster default
        (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
        )
        ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
        order by id;
    • Create a distributed table.

      Note

      If you want to import data only to the local table, skip this step.

      CREATE TABLE test_tbl_distributed ON cluster default
      (
      id UInt8,
      user_name String,
      age UInt16,
      city String,
      access_url String
      )
      ENGINE = Distributed(default, default, test_tbl_local, rand());

Step 3: Import data from the local file

Run the following command in the directory in which clickhouse-client is installed:

cat <Local file name> | ./clickhouse-client --host=<Cluster endpoint> --port=<TCP port number> --user=<Database account username> --password=<Database account password> --query="INSERT INTO <ApsaraDB for ClickHouse table name> FORMAT <Local file format>";
Note

Data import over the Internet is slow. If you want to import a large amount of data or test import performance, we recommend that you use one of the following methods:

  • Import data over an internal network: Purchase an Elastic Compute Service (ECS) instance that is deployed in the same virtual private cloud (VPC) as the ApsaraDB for ClickHouse cluster, use the VPC endpoint to connect to the ApsaraDB for ClickHouse cluster from the ECS instance, and then import data.

  • Import data in multiple concurrent processes: Divide the original file into multiple parts. Then, start multiple clients to concurrently import data from these parts.

Step 4: Query the ApsaraDB for ClickHouse table

  1. Connect to the ApsaraDB for ClickHouse cluster by using Data Management (DMS). For more information, see Connect to an ApsaraDB for ClickHouse cluster.

  1. Execute the following query statement:

    SELECT * FROM test_tbl_distributed; 
    Note

    If you import data to a local table, replace the name of the distributed table in the query statement with the name of the local table.

    The following result is returned.查询结果

Common file formats

The following table describes the common file formats that are supported by ApsaraDB for ClickHouse.

Common file format

Description

TabSeparated

Data is written to a text file in this format by row. Columns are separated by tabs. Escape sequences are required to represent tabs, line feeds, and backslashes. The escape sequence for a tab is \t. The escape sequence for a line feed is \n. The escape sequence for a backslash is \\. \N indicates NULL.

TabSeparatedWithNames

This file format is similar to the TabSeparated format, except that the values in the first row are the column names. The first row is ignored during parsing. Therefore, the column names in the first row cannot be used to determine the positions of columns.

TabSeparatedWithNamesAndTypes

This file format is similar to the TabSeparated format, except that the values in the first row are the column names and the values in the second row are the data types. The first and second rows are ignored during parsing.

CSV

Data is written to a file of this format by row. Strings are enclosed in a pair of double quotation marks (" "). Each double quotation mark (") that is contained in a string is escaped with a pair of double quotation marks (" "). Numeric values are not enclosed in a pair of double quotation marks (" "). The default column delimiter is a comma (,). You can use the --format_csv_delimiter parameter to specify other column delimiters. For example, if you want to separate columns with vertical bars (|), run the following command:

cat testdata.csv | ./clickhouse-client --format_csv_delimiter="|" --host=cc-bp163l724nkf8****.clickhouse.ads.aliyuncs.com --port=3306 --user=test --password=123456Aa --query="INSERT INTO test_tbl_distributed FORMAT CSV";

CSVWithNames

This format is similar to the CSV format, except that the values in the first row are the column names. The first row is ignored during parsing. Therefore, the column names in the first row cannot be used to determine the positions of columns.