This topic provides answers to some frequently asked questions about MaxCompute Lightning.
Is the number of MaxCompute Lightning query connections limited?
Yes, a maximum of 20 parallel MaxCompute Lightning query connections can be established for a single MaxCompute project.
Does MaxCompute Lightning limit the amount of data that can be queried? How much data can I query without affecting query performance?
How do I use MaxCompute Lightning? Can I use MaxCompute Lightning in a web console?
MaxCompute Lightning does not provide a web console. You can use SQL clients or business intelligence (BI) tools to connect to the Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) API of MaxCompute Lightning. Then, you can access the required MaxCompute projects.
What data can I query by using MaxCompute Lightning if no table is created?
You can use MaxCompute Lightning to access data tables in a specified MaxCompute project by using your Alibaba Cloud account. If no data table exists in the project, you must create a table in the project and load and process data in the table by using tools, such as DataWorks or MaxCompute client odpscmd. Then, you can use MaxCompute Lightning to connect to the project. This way, you can view the tables on which you are granted permissions in the project and query data from these tables.
How does MaxCompute Lightning process the fields whose names start with digits?
MaxCompute supports fields whose names start with digits, such as 1_day_active_users
. However, the PostgreSQL syntax that is used by MaxCompute Lightning does not support
fields whose names start with digits. If a field whose name starts with a digit is
involved in MaxCompute Lightning queries, enclose the field name in double quotation
marks (""). Example: select bizdate,"1_day_active_users","7_day_active_users" from t_active_users;
.
How do I configure the connection between the JDBC API and MaxCompute Lightning in an application?
If you use the JDBC API to connect to MaxCompute Lightning in an application, we recommend
that you set prepareThreshold
to 0 and autocommit
to True.
Do you recommend that I directly use MaxCompute Lightning to query all detailed data?
No, we recommend that you do not directly use MaxCompute Lightning to query all detailed
data. If you use an SQL client tool or BI tool to perform a full query (SELECT * FROM table;
) on a table that contains a large amount of data, a large amount of data needs to
be extracted from the server to the client. As a result, the query is prolonged.
We recommend that you analyze detailed data on the server. This way, the system filters out data and returns a small amount of data. This reduces end-to-end query latency.
Can I use psycopg2 to access MaxCompute Lightning?
#!/usr/bin/env python
# coding=utf-8
import psycopg2
import sys
def query_lightning(lightning_conf, sql):
"""Query data through Lightning by sql
Args:
lightning_conf: a map contains settings of 'dbname', 'user', 'password', 'host', 'port'
sql: query submit to Lightning
Returns:
result: the query result in format of list of rows
"""
result = None
conn = None
conn_str = None
try:
conn_str = ("dbname={dbname} "
"user={user} "
"password={password} "
"host={host} "
"port={port}").format(**lightning_conf)
except Exception, e:
print >> sys.stderr, ("Invalid Lightning' configuration "
"{}".format(e))
sys.exit(1)
try:
conn = psycopg2.connect(conn_str)
conn.set_session(autocommit=True) # This will disable transaction
# started with keyword BEGIN,
# which is currently not
# supported by Lightning' public service
cur = conn.cursor()
# execute Lightning' query
cur.execute(sql)
# get result
result = cur.fetchall()
except Exception, e:
print >> sys.stderr, ("Failed to query data through "
"Lightning: {}".format(e))
finally:
if conn:
conn.close()
return result
if __name__ == "__main__":
# step1. setup configuration
lightning_conf = {
"dbname": "project",
"user": "accessId", # AccessId
"password": "accessKey", # AccessKey
"host": "lightning.cn-shanghai.maxcompute.aliyun-inc.com", # Lightning Endpoint
"port": 443
}
# step2. issue a query
result = query_lightning(lightning_conf, "select count(1) from table_name")
# step3. print result
if result:
for i in xrange(0, len(result)):
print
"Got %d row from Lightning:%s" % (i + 1, result[i])
What methods can I use to optimize the performance of MaxCompute Lightning?
- When you query a partitioned table, specify partitions to reduce the amount of data that needs to be scanned.
- Query only the required columns. For example, change
select * from table;
toselect a,b from table;
. - If you want to query data from a table that contains a large amount of data by using
a query that contains an equality operator, we recommend that you create a hash-clustered
table or change the original table to a hash-clustered table. Then, use the CLUSTERED
field to filter data in a more efficient manner. For example,
select id,value from table where id=123;
is a query that contains an equality operator. - If a query on a table that contains a small amount of data is time-consuming, check whether data skew exists. You can run the EXPLAIN ANALYZE command to view the execution plan and obtain diagnostic information.
What do I do if the "AXF Exception" error message appears when I use a BI tool to select a partitioned table for analysis in drag-and-drop mode?
ERROR: AXF Exception: specified partitions count in odps table: <project_name.table_name> is: xxx, exceeds the limitation of xxx, please add stricter partition filter
To ensure query performance, MaxCompute Lightning limits the number of partitions that can be queried in a partitioned table. The maximum number of partitions that can be scanned for a single table at the same time is 1,024. Specific BI tools allow you to select a table for analysis in drag-and-drop mode. In this mode, you cannot specify partition conditions in the BI frontend. As a result, the number of partitions that need to be scanned exceeds the upper limit, and MaxCompute Lightning returns this error.
We recommend that you change the data table that you want to query to a non-partitioned table or a partitioned table that contains less than 1,024 partitions before you query and analyze data in the table.
The "ERROR: SSL required" error message appears, which indicates a connection failure. What do I do?
MaxCompute Lightning requires that a third-party client accesses data over SSL. If
the third-party client does not provide the SSL option, you can add an SSL parameter
to the JDBC URL connection string, such as jdbc:postgresql://lightning.cn-shanghai.maxcompute.aliyun.com:443/myproject?ssl=true
.
When I use the SQL Workbench/J client to query data, an error message that indicates transaction termination appears. What do I do?
- Problem description
When I use the SQL Workbench/J client to query data, the following error message appears:
rror:current transaction is aborted,commands ignored until end of transaction block.
- Solution
On the SQL Workbench/J client, select the Autocommit option.
What do I do if the "Failed to login the project" error message appears when I request a connection?
- Problem description
The following error message appears:
ERROR: Failed to login the project: <project name>. Lightning service has not been activated for your project, please contact the administrator
- Cause
MaxCompute Lightning is not activated in the region to which your MaxCompute project belongs. You must apply for activating MaxCompute Lightning and use this service.
- Solution
Activate MaxCompute Lightning in the region to which your MaxCompute project belongs.
What do I do if the "ConnectionException" error message appears when I request a connection?
- Problem description
The following error message appears:
ConnectionException: HikariPool-2 - Connection is not available, request timed out after 10018ms
- Cause
The error message indicates that a network connection times out. This may be caused by network fluctuations.
- Solution
Check the network connection. Establish the connection again when the network connection is stable.