This topic outlines the DQL (Data Query Language) operations for logical tables.
Limits
The SELECT statement can return a maximum of 10,000 rows and the result set must not exceed 10MB. However, when used as a subquery, the SELECT statement is not subject to this limitation and will return all results to the parent query.
By default, the SELECT statement is restricted from performing a full table scan on partitioned tables.
SELECT standard query
Dataphin supports the use of standard SELECT statements for querying logical tables, enabling the selection of data that meets specific criteria.
[with <cte>[, ...] ]
select [all | distinct] <select_expr>[, <except_expr>)][, <replace_expr>] ...
from <ld_data_unit>.<logical_table_name>
[where <where_condition>]
[group by {<col_list>|rollup(<col_list>)}]
[having <having_condition>]
[order by <order_condition>]
[distribute by <distribute_condition> [sort by <sort_condition>]|[ cluster by <cluster_condition>] ]
[limit <number>]
[window <window_clause>]Note: The data unit name is a required component and cannot be omitted.
Example explanation:
To demonstrate the SELECT statement's application, consider the following example. After creating the logical table 'user' in Dataphin, you can execute the command below to retrieve data from the logical table.
SELECT * FROM ld_test.dim_user WHERE ds>0;SELECT special query (WHERE clause)
Dataphin also offers a specialized syntax for querying fields from associated logical tables, which significantly streamlines SQL coding and enhances efficiency.
select <table_alias>.<column_list>
,<table_alias>.<dim_role>[.<dim_role>].<dim_columnn_list>
from <ld_data_unit>.<logical_table_name> <table_alias>
where <where_condition>Command explanation:
table_alias: The alias for the logical table being queried, which is mandatory in this context.
dim_role: The role name of the associated logical dimension table within the primary table.
column_list: The list of fields from the primary table.
dim_column_list: The list of fields from the associated logical dimension table.
ld_data_unit: The identifier for the data unit.
Example explanation:
In the following example, the order logical fact table ld_test.fct_crt_sale_order_id is linked with the customer logical dimension table ld_test.dim_customer and the address logical dimension table ld_test.dim_address. Additionally, the address logical dimension table ld_test.dim_address is connected to the city logical dimension table ld_test.dim_region_lvl3, which is a hierarchy dimension table with the role name 'dim_city' in 'dim_address'.
To query the customer name and delivery city for a specific order (order ID=123), use the following command:
select crt_sale_order_id
,ord.dim_customer.user_name
,ord.dim_address.dim_city.region_lvl3_name as city_name
from ld_test.fct_crt_sale_order_di ord
where ds = '20230101'
and crt_sale_order_id = 123
This syntax is equivalent to the following standard SQL statement:
select crt_sale_order_id
,cust.user_name
,city.region_lvl3_name as city_name
from (select *
from ld_test.fct_crt_sale_order_di
where ds = '20230101'
and crt_sale_order_id = 123) ord
left outer join
(select *
from ld_test.dim_customer
where ds = '20230101') cust
on (ord.customer_id = cust.customer_id)
left outer join
(select *
from ld_test.dim_address
where ds = '20230101') addr
on (ord.delivery_address_id = addr.address_id)
left outer join
(select *
from ld_test.dim_region_lvl3 --original city hierarchy dimension table name, role name is dim_city
where ds = '20230101') city
on (addr.city_id = city.region_lvl3_id)