All Products
Search
Document Center

Dataphin:Logical table DQL operations

Last Updated:Jan 21, 2025

This topic outlines the DQL (Data Query Language) operations for logical tables.

Limits

  • The SELECT statement can return a maximum of 10,000 rows and the result set must not exceed 10MB. However, when used as a subquery, the SELECT statement is not subject to this limitation and will return all results to the parent query.

  • By default, the SELECT statement is restricted from performing a full table scan on partitioned tables.

SELECT standard query

Dataphin supports the use of standard SELECT statements for querying logical tables, enabling the selection of data that meets specific criteria.

[with <cte>[, ...] ]
select [all | distinct] <select_expr>[, <except_expr>)][, <replace_expr>] ...
       from <ld_data_unit>.<logical_table_name>
       [where <where_condition>]
       [group by {<col_list>|rollup(<col_list>)}]
           [having <having_condition>]
       [order by <order_condition>]
       [distribute by <distribute_condition> [sort by <sort_condition>]|[ cluster by <cluster_condition>] ]
       [limit <number>]
       [window <window_clause>]
Important

Note: The data unit name is a required component and cannot be omitted.

Example explanation:

To demonstrate the SELECT statement's application, consider the following example. After creating the logical table 'user' in Dataphin, you can execute the command below to retrieve data from the logical table.

SELECT * FROM ld_test.dim_user WHERE ds>0;

SELECT special query (WHERE clause)

Dataphin also offers a specialized syntax for querying fields from associated logical tables, which significantly streamlines SQL coding and enhances efficiency.

select <table_alias>.<column_list>
      ,<table_alias>.<dim_role>[.<dim_role>].<dim_columnn_list>
  from <ld_data_unit>.<logical_table_name> <table_alias>
 where <where_condition>

Command explanation:

  • table_alias: The alias for the logical table being queried, which is mandatory in this context.

  • dim_role: The role name of the associated logical dimension table within the primary table.

  • column_list: The list of fields from the primary table.

  • dim_column_list: The list of fields from the associated logical dimension table.

  • ld_data_unit: The identifier for the data unit.

Example explanation:

In the following example, the order logical fact table ld_test.fct_crt_sale_order_id is linked with the customer logical dimension table ld_test.dim_customer and the address logical dimension table ld_test.dim_address. Additionally, the address logical dimension table ld_test.dim_address is connected to the city logical dimension table ld_test.dim_region_lvl3, which is a hierarchy dimension table with the role name 'dim_city' in 'dim_address'.

To query the customer name and delivery city for a specific order (order ID=123), use the following command:

select crt_sale_order_id
      ,ord.dim_customer.user_name
      ,ord.dim_address.dim_city.region_lvl3_name as city_name
  from ld_test.fct_crt_sale_order_di ord
 where ds = '20230101'
   and crt_sale_order_id = 123

This syntax is equivalent to the following standard SQL statement:

select crt_sale_order_id
      ,cust.user_name
      ,city.region_lvl3_name as city_name
  from (select * 
          from ld_test.fct_crt_sale_order_di
         where ds = '20230101'
           and crt_sale_order_id = 123) ord
       left outer join
       (select * 
          from ld_test.dim_customer
         where ds = '20230101') cust
    on (ord.customer_id = cust.customer_id)
       left outer join
       (select * 
          from ld_test.dim_address
         where ds = '20230101') addr
    on (ord.delivery_address_id = addr.address_id)
       left outer join
       (select * 
          from ld_test.dim_region_lvl3 --original city hierarchy dimension table name, role name is dim_city
         where ds = '20230101') city
    on (addr.city_id = city.region_lvl3_id)