This topic describes how to use clickhouse-client to import data from an on-premises file to ApsaraDB ClickHouse.

Usage notes

ApsaraDB ClickHouse allows you to import data from files in the following formats: TabSeparated, TabSeparatedWithNames, TabSeparatedWithNamesAndTypes, CSV, and CSVWithNames. For more information about other file formats that are supported, see Formats for Input and Output Data.

Prerequisites

  • The IP address of the on-premises machine is added to the whitelist of the ApsaraDB ClickHouse cluster.
  • clickhouse-client that is compatible with the version of the ApsaraDB ClickHouse cluster is installed. For more information, see clickhouse-client.

Procedure

  1. Prepare test data
    In the directory in which clickhouse-client is installed, create a CSV file named testdata.csv and write the following data to the file:
    1,yang,32,shanghai,http://example.com
    2,wang,22,beijing,http://example.com
    3,xiao,23,shenzhen,http://example.com
    4,jess,45,hangzhou,http://example.com
    5,jack,14,shanghai,http://example.com
    6,tomy,25,hangzhou,http://example.com
    7,lucy,45,shanghai,http://example.com
    8,tengyin,26,shanghai,http://example.com
    9,wangli,27,shenzhen,http://example.com
    10,xiaohua,37,shanghai,http://example.com
  2. Create an ApsaraDB ClickHouse table
    1. Log on to the ApsaraDB for ClickHouse console.
    2. On the Clusters page, find the cluster that you want to manage and click the cluster ID.
    3. In the upper-right corner of the cluster details page, click Log On to Database.
    4. In the Login instance dialog box, enter the username and password of the database account and click Login.
    5. Create a local table.
      Note The structure of the ApsaraDB ClickHouse table must be the same as the structure of the data that you want to import.
      • If you use a cluster of Single-replica Edition, execute the following statement to create a table:
        create table test_tbl_local on cluster default
        (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
        )
        engine = MergeTree()
        order by id;
      • If you use a cluster of Double-replica Edition, execute the following statement to create a table:
        create table test_tbl_local on cluster default
        (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
        )
        engine = ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
        order by id;
    6. Create a distributed table.
      Note If you want to import data only to the local table, skip this step.
      create table test_tbl_distributed on cluster default
      (
      id UInt8,
      user_name String,
      age UInt16,
      city String,
      access_url String
      )
      engine = Distributed(default, default, test_tbl_local, rand());
  3. Import data
    Run the following command in the directory in which clickhouse-client is installed:
    cat <On-premises file name> | ./clickhouse-client --host=<Cluster endpoint> --port=<TCP port number> --user=<Database account username> --password=<Database account password> --query="INSERT INTO <ApsaraDB for ClickHouse table name> FORMAT <On-premises file format>";
    Note Data from an external network is imported at a low speed. If you want to import a large amount of data or test import performance, we recommend that you use one of the following methods:
    • Import data from an internal network: Purchase an elastic compute service (ECS) instance that is deployed in the same virtual private cloud (VPC) as the ApsaraDB ClickHouse cluster, use the VPC endpoint to connect to the ApsaraDB ClickHouse cluster from the ECS instance, and then import data.
    • Import data in multiple concurrent processes: Divide the original file into multiple parts. Then, start multiple clients to concurrently import data from these parts.
    The following table describes common file types that are supported by ApsaraDB ClickHouse.
    Common file format Description
    TabSeparated Data is written to a text file in this format by row. Columns are separated by tabs. Escape sequences are required to represent tabs, line feeds, and backslashes. The escape sequence for a tab is \t. The escape sequence for a line feed is \n. The escape sequence for a backslash is \\. \N indicates NULL.
    TabSeparatedWithNames This file format is similar to the TabSeparated format, except that the values in the first row are the column names. The first row is ignored during parsing. Therefore, the column names in the first row cannot be used to determine the positions of columns.
    TabSeparatedWithNamesAndTypes This file format is similar to the TabSeparated format, except that the values in the first row are the column names and the values in the second row are the data types. The first and second rows are ignored during parsing.
    CSV Data is written to a file of this format by row. Strings are enclosed in a pair of double quotation marks (" "). Each double quotation mark (") that is contained in a string is escaped with a pair of double quotation marks (" "). Numeric values are not enclosed in a pair of double quotation marks (" "). The default delimiter for columns is a comma (,). You can use the -- format_csv_delimiter parameter to specify other column delimiters. For example, if you want to separate columns with vertical bars (|), run the following command:
    cat testdata.csv | ./clickhouse-client --format_csv_delimiter="|" --host=cc-bp163l724nkf8****.clickhouse.ads.aliyuncs.com --port=3306 --user=test --password=123456Aa --query="INSERT INTO test_tbl_distributed FORMAT CSV";
    CSVWithNames This format is similar to the CSV format, except that the values in the first row are the column names. The first row is ignored during parsing. Therefore, the column names in the first row cannot be used to determine the positions of columns.
    Note For more information about other file formats that are supported by ApsaraDB ClickHouse, see Formats for Input and Output Data.
  4. Query an ApsaraDB ClickHouse table
    Execute the following query statement:
    select * from test_tbl_distributed; 
    Note If you import data to a local table, replace the name of the distributed table in the query statement with the name of the local table.
    The following result is returned: Query result