Hologres Reader allows you to read data from Hologres. You can use Hologres Reader to read data from the tables of Hologres data stores and write the data to other types of data stores.

Notice Currently, you can only use exclusive resource groups for Data Integration for Hologres Reader. The default resource group and custom resource groups are not supported. For more information, see Use exclusive resource groups for data integration and Add a custom resource group.
Hologres Reader reads data from Hologres tables by using PostgreSQL statements. The number of concurrent threads is determined based on the number of shards in the source Hologres table. Each shard corresponds to one SELECT statement.
  • In a CREATE TABLE statement that is used to create a table in Hologres, the CALL set_table_property('table_name', 'shard_count', 'xx') command sets the number of table shards.

    By default, the shard_count field is set to the default number of table shards of the database created in Hologres. The amount of computing resources for the relevant Hologres instance determines the default number of table shards.

  • A SELECT statement uses the shard specifies by the built-in field hg_shard_id of the source Hologres table to query data.

Parameters

Parameter Description Required Default value
endpoint The endpoint used to connect to the source Hologres instance, in the format of instance-id-region-endpoint.hologres.aliyuncs.com:port. You can view the endpoints of a Hologres instance on the configuration page of the instance in the Hologres console.
The endpoint of a Hologres instance varies with the network types, including the classic network, Internet, and Virtual Private Cloud (VPC). Select an appropriate endpoint based on the network where the resource group for Data Integration and the Hologres instance reside. Otherwise, the connection may fail or the performance may be poor. The formats of these three types of endpoints are as follows:
  • Classic network endpoint: instance-id-region-endpoint-internal.hologres.aliyuncs.com:port
  • Public endpoint: instance-id-region-endpoint.hologres.aliyuncs.com:port
  • VPC endpoint: instance-id-region-endpoint-vpc.hologres.aliyuncs.com:port

We recommend that you deploy the resource group for Data Integration and the Hologres instance in the same zone of the same region to guarantee network connection and optimal performance.

Yes None
accessId The AccessKey ID of the account used to access Hologres. Yes None
accessKey The AccessKey secret of the account used to access Hologres. Specify an AccessKey secret of an account that is authorized to read data from the source table. Yes None
database The name of the source database in the Hologres instance. Yes None
table The name of the source Hologres table. If the table is a partitioned table, specify the name of the parent partitioned table. Yes None
column The columns in the source Hologres table from which data is read. The primary key columns of the source table must be included. Set the value to an asterisk (*) if data is read from all the columns in the source table. That is, set the column parameter as follows: "column":["*"]. Yes None
partition The partition key column and the corresponding value of the source Hologres table, in the format of column=value. This parameter is valid for partitioned tables.
Notice
  • Currently, Hologres only supports list partitioning and you can only specify a single column as the partition key column. The data type of the partition key column must be INT4 or TEXT.
  • The parameter value must match the partition expression in the data definition language (DDL) statements used to create the source table.
  • The corresponding child partitioned tables must have been created with data.
No Null, indicating a non-partitioned table.
fetchSize The number of data records to be read from the source Hologres table at a time by using the SELECT statement. No 1,000

Configure Hologres Reader by using the codeless UI

  1. Configure the connections.

    Configure the source and destination connections for the sync node.

    Parameter Description
    Connection The name of the connection.
    Table The table parameter in the preceding parameter description.
    Filter The filter condition for the data to be synchronized. Currently, the WHERE clause is not supported. The SQL syntax is determined by the selected connection.
  2. Configure field mapping, that is, the column parameter in the preceding parameter description.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field or move the pointer over a field and click the Delete icon to delete the field.Mappings section
    GUI Element Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish a mapping between fields with the same name. Note that the data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish a mapping for fields in the same row. Note that the data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove mappings that have been established.
    Auto Layout Click Auto Layout. The fields are automatically sorted based on specified rules.
    Change Fields Click the Change Fields icon. In the Change Fields dialog box that appears, you can manually edit fields in the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored.
    Add
    • Click Add to add a field. You can enter constants. Each constant must be enclosed in single quotation marks (' '), such as 'abc' and '123'.
    • You can use scheduling parameters, such as ${bizdate}.
    • You can enter functions supported by relational databases, such as now() and count(1).
    • Fields that cannot be parsed are indicated by Unidentified.
  3. Configure channel control policies.Channel section
    Parameter Description
    Expected Maximum Concurrency The maximum number of concurrent threads to read data from or write data to data storage within the sync node. You can configure the concurrency for a node on the codeless user interface (UI).
    Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workload of the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to a proper value.
    Dirty Data Records Allowed The maximum number of dirty data records allowed.

Configure Hologres Reader by using the code editor

  • Use a non-partitioned table as the source table
    • In the following code, a node is configured to read data from a non-partitioned Hologres table.
      {
          "type":"job",
          "version":"2.0",// The version number.
          "steps":[
              {
                  "stepType":"holo",// The reader type.
                  "parameter":{
                      "endpoint": "instance-id-region-endpoint.hologres.aliyuncs.com:port",
                      "accessId": "***************", // The AccessKey ID of the account used to access Hologres.
                      "accessKey": "*******************", // The AccessKey secret of the account used to access Hologres.
                      "database": "postgres",
                      "table": "holo_reader_****",
                      "column":[// The columns to be synchronized.
                          "tag",
                          "id",
                          "title"
                      ]
                  },
                  "name":"Reader",
                  "category":"reader"
              },
              {
                  "stepType":"stream",
                  "parameter":{},
                  "name":"Writer",
                  "category":"writer"
              }
          ],
          "setting":{
              "errorLimit":{
                  "record":"0"// The maximum number of dirty data records allowed.
              },
              "speed":{
                  "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
                  "concurrent":1,// The maximum number of concurrent threads.
              }
          },
          "order":{
              "hops":[
                  {
                      "from":"Reader",
                      "to":"Writer"
                  }
              ]
          }
      }
    • The following section provides the sample DDL statements used to create a non-partitioned Hologres table.
      begin;
      drop table if exists holo_reader_basic_src;
      create table holo_reader_basic_src(
        tag text not null, 
        id int not null, 
        title text not null, 
        body text, 
        primary key (tag, id));
        call set_table_property('holo_reader_basic_src', 'orientation', 'column');
        call set_table_property('holo_reader_basic_src', 'shard_count', '3');
      commit;
  • Use a partitioned table as the source table
    • In the following code, a node is configured to read data from a child partitioned table in Hologres in the SDK (Fast Write) mode.
      Note Exercise caution when you set the partition parameter.
      {
          "type":"job",
          "version":"2.0",// The version number.
          "steps":[
              {
                  "stepType":"holo",// The reader type.
                  "parameter":{
                      "endpoint": "instance-id-region-endpoint.hologres.aliyuncs.com:port",
                      "accessId": "***************", // The AccessKey ID of the account used to access Hologres.
                      "accessKey": "*******************", // The AccessKey secret of the account used to access Hologres.
                      "database": "postgres",
                      "table": "holo_reader_basic_****",
                      "partition": "tag=foo",
                      "column" : [
                          "*"
                      ],
                      "fetchSize": "100"
                  },
                  "name":"Reader",
                  "category":"reader"
              },
              {
                  "stepType":"stream",
                  "parameter":{},
                  "name":"Writer",
                  "category":"writer"
              }
          ],
          "setting":{
              "errorLimit":{
                  "record":"0"// The maximum number of dirty data records allowed.
              },
              "speed":{
                  "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
                  "concurrent":1,// The maximum number of concurrent threads.
              }
          },
          "order":{
              "hops":[
                  {
                      "from":"Reader",
                      "to":"Writer"
                  }
              ]
          }
      }
    • The following section provides the sample DDL statements used to create a partitioned Hologres table.
      begin;
      drop table if exists holo_reader_basic_part_src;
      create table holo_reader_basic_part_src(
        tag text not null, 
        id int not null, 
        title text not null, 
        body text, 
        primary key (tag, id))
        partition by list( tag );
        call set_table_property('holo_reader_basic_part_src', 'orientation', 'column');
        call set_table_property('holo_reader_basic_part_src', 'shard_count', '3');
      commit;
      
      create table holo_reader_basic_part_src_1583161774228 partition of holo_reader_basic_part_src for values in ('foo');
      
      # Make sure that the child partitioned table has been created with proper data.
      postgres=# \d+ holo_reader_basic_part_src
                               Table "public.holo_reader_basic_part_src"
       Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
      --------+---------+-----------+----------+---------+----------+--------------+-------------
       tag    | text    |           | not null |         | extended |              | 
       id     | integer |           | not null |         | plain    |              | 
       title  | text    |           | not null |         | extended |              | 
       body   | text    |           |          |         | extended |              | 
      Partition key: LIST (tag)
      Indexes:
          "holo_reader_basic_part_src_pkey" PRIMARY KEY, btree (tag, id)
      Partitions: holo_reader_basic_part_src_1583161774228 FOR VALUES IN ('foo')