This topic describes how to create a foreign table in Hologres to accelerate queries of Object Storage Service (OSS) data.

Prerequisites

  • A Hologres instance is created, and a development tool is connected to the instance. In this example, HoloWeb is used. For more information, see HoloWeb quick start.
  • OSS is activated. Data is uploaded to an OSS bucket. For more information, see Get started with OSS.
  • A bucket policy is configured to authorize you to access the data in the OSS bucket. If you do not have the required permissions, the foreign table that was created cannot be used to query the data in the OSS bucket. For more information about bucket policies in OSS, see Bucket policy.

Background information

OSS is a secure, cost-effective, and highly reliable cloud storage service that can store a large number of objects of all types. Hologres is integrated with OSS. You can create foreign tables in Hologres to accelerate queries of OSS data without the need to import and export data. In such scenarios, data is still stored in OSS. A foreign table in Hologres is used to map to the fields in the source table, but is not used to store data.

Hologres is compatible with PostgreSQL. The principles of accessing OSS data by creating foreign tables are the same as the principles of the foreign-data wrapper in PostgreSQL. For more information, see postgres_fdw.

Limits

When you create a foreign table in Hologres to access OSS data, take note of the following limits:
  • This feature is supported only in Hologres V0.10 and later. You can view the version of your Hologres instance on the instance details page in the Hologres console. If the version of your Hologres instance is earlier than V0.10, submit a ticket to update your instance.
  • Before you create a foreign table in Hologres, you must execute the following statement to install an extension as the superuser. Extensions are installed by superusers at the database level. An extension needs to be installed for each database only once.
    create extension oss_fdw;
  • Only specific data types are supported, including TEXT, BIGINT, INT, FLOAT, DOUBLE PRECISION, BOOLEAN, and DATE.
  • Partitioned tables in OSS cannot be accessed. Use non-partitioned tables instead.
  • When you query data in a table, the first N rows of the table cannot be skipped.
  • To import data from Hologres to OSS, we recommend that you use COPY commands. For more information, see Use COPY commands to export data from Hologres to OSS.

Data type mappings

The following tables describe the mappings between data types in Parquet and ORC and Hologres.

  • Data type in Parquet Data type in Hologres
    INT INT
    BIGINT BIGINT
    STRING TEXT
    BOOLEAN BOOLEAN
    DATE DATE
    TIMESTAMP If the source table contains columns of the TIMESTAMP type, you can create a Hologres table that contains columns of the TEXT type. In this case, the table can be created but cannot be queried.
    FLOAT FLOAT4
    DOUBLE FLOAT8
    DECIMAL If the source table contains columns of the DECIMAL type, you can create a Hologres table that contains columns of the TEXT type. In this case, the table can be created but cannot be queried.
    CHAR(n) TEXT
    VARCHAR(n) TEXT
  • Data type in ORC Data type in Hologres
    INT INT
    BIGINT BIGINT
    STRING TEXT
    BOOLEAN BOOLEAN
    DATE DATE
    TIMESTAMP If the source table contains columns of the TIMESTAMP type, an error is returned when you create a Hologres table.
    FLOAT FLOAT4
    DOUBLE FLOAT8
    DECIMAL If the source table contains columns of the DECIMAL type, an error is returned when you create a Hologres table.
    CHAR(n) TEXT
    VARCHAR(n) TEXT

Create a foreign table to access OSS data

To access OSS data by creating a foreign table in Hologres, perform the following steps:
  1. Install an extension.
    Before you create a foreign table in Hologres, you must execute the following statement to install an extension in a database as the superuser. Extensions are installed by superusers at the database level. An extension need to be installed for each database only once.
    create extension oss_fdw;
  2. Create a server.
    After the extension is installed, you must create a server to connect to OSS.
    • Syntax
      CREATE SERVER <server_name> FOREIGN DATA WRAPPER oss_fdw 
      OPTIONS (
        endpoint 'https://<Internal endpoint of the OSS bucket that is accessible over the classic network>'
      );
    • Parameters
      Parameter Description Example
      server_name The name of the server. oss_server
      endpoint The internal endpoint of the OSS bucket that is accessible over the classic network. The endpoint must start with http:// or https://. For more information, see the "Obtain the internal endpoint of OSS" section of the Access to OSS resources from ECS instances by using an internal endpoint of OSS topic. http://oss-cn-shanghai-internal.aliyuncs.com
    • Example
      CREATE SERVER oss_server FOREIGN DATA WRAPPER oss_fdw
      OPTIONS (
        endpoint 'https://oss-cn-shanghai-internal.aliyuncs.com'
      );
  3. Create a foreign table.
    After the server is created, you can create a foreign table in Hologres to query the data in the OSS bucket. You can create a foreign table in the Parquet or text format based on your business requirements. The created foreign table must be in the same file format as the source table in OSS. Otherwise, data may fail to be read from OSS.
    • Syntax
      Use the following statement to create a foreign table in the Parquet format:
      create foreign table <tablename>(
        col type, 
        col type
      ) server <server_name>
       options (
        dir './<dir_url>/',
        format 'parquet', 
      )
      Use the following statement to create a foreign table in the text format:
      create foreign table <tablename>(
        col type, 
        col type
      ) server <server_name>
       options (
        dir './<dir_url>/',
        format 'text',
        delimiter ','
      )
    • Parameters
      Parameter Description Example
      tablename The name of the foreign table. oss_table
      col The name of the field. id
      type The data type of the field. int
      server_name The name of the server. oss_server
      dir_url The OSS path of the source table. Specify the directory where the table is stored. Take note of the following items:
      • Hologres allows you to specify the directory where an object is stored in OSS, instead of the complete OSS path of the object that contains the name of the object.
      • Hologres reads all objects in the specified OSS path for parsing, such as the orc directory. Make sure that the objects in a directory are in the same file format. Otherwise, garbled characters may be read.
      • Do not include the name of an object in the OSS path that you specify. For example, if you specify ./oss-test/orc/region_zlib_dict.orc as the OSS path, the foreign table can be created but no data can be read.
      Example 1: A table in the ORC format is stored in the following OSS path:
      oss://oss-test/orc/region_zlib_dict.orc
      In this case, specify ./oss-test/orc/ as the OSS path of the table.

      Values must be in the following format: Bucket name + Directory name.

      Example 2: ./oss-holo-tmp/holotest/.

      1
      format The file format of the source table in OSS. Valid values:
      • parquet
      • orc
      • text
      parquet
      delimiter The delimiter of fields in the source table in the text format.
      Note This parameter is required when the format parameter is set to text.
      ,
    • Example

      For example, a directory named holotest exists in the OSS bucket named oss-holo-tmp. In this case, you can execute the following statement to create a foreign table to read data of the TEXT type from the directory:

      create foreign table oostest (
          a int, 
          b float,
          c text
          ) 
          server oss_server
       options (
        dir './oss-holo-tmp/holotest/', 
        format 'text',
      delimiter ','
      );
  4. Query data by using an external table
    After the foreign table is created, you can query data in the foreign table. Sample code:
    SELECT COUNT (*) FROM osstest; -- Query the number of rows in the osstest table. 
    
    SELECT * FROM osstest limit 10; -- Query data in the osstest table and return 10 data entries. 

Performance optimization

Hologres uses foreign tables to read OSS data. To be specific, Hologres reads all objects in a directory in OSS to create a foreign table. When you partition the foreign table, we recommend that you use each object in the directory in OSS as a partition. To balance the size of each partition, we recommend that you control the size of each object. This prevents a large source table from consuming an excess of computing resources and degrading overall performance.