This topic provides answers to some frequently asked questions about MaxCompute Lightning.

Category FAQ
Features
Connection to MaxCompute Lightning

Is the number of MaxCompute Lightning query connections limited?

Yes, a maximum of 20 parallel MaxCompute Lightning query connections can be established for a single MaxCompute project.

Does MaxCompute Lightning limit the amount of data that can be queried? How much data can I query without affecting query performance?

MaxCompute Lightning can scan a maximum of 1 TB of data in a table in a single query. The query performance increases with the reduction of the amount of data that is queried.
Note We recommend that you scan no more than 100 GB of table data in a query. If the amount of table data that needs to be scanned exceeds 100 GB, the query can still succeed. However, the query performance is low. If you want to scan more than 100 GB of data in a table, we recommend that you use MaxCompute SQL.

How do I use MaxCompute Lightning? Can I use MaxCompute Lightning in a web console?

MaxCompute Lightning does not provide a web console. You can use SQL clients or business intelligence (BI) tools to connect to the Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) API of MaxCompute Lightning. Then, you can access the required MaxCompute projects.

What data can I query by using MaxCompute Lightning if no table is created?

You can use MaxCompute Lightning to access data tables in a specified MaxCompute project by using your Alibaba Cloud account. If no data table exists in the project, you must create a table in the project and load and process data in the table by using tools, such as DataWorks or MaxCompute client odpscmd. Then, you can use MaxCompute Lightning to connect to the project. This way, you can view the tables on which you are granted permissions in the project and query data from these tables.

How does MaxCompute Lightning process the fields whose names start with digits?

MaxCompute supports fields whose names start with digits, such as 1_day_active_users. However, the PostgreSQL syntax that is used by MaxCompute Lightning does not support fields whose names start with digits. If a field whose name starts with a digit is involved in MaxCompute Lightning queries, enclose the field name in double quotation marks (""). Example: select bizdate,"1_day_active_users","7_day_active_users" from t_active_users;.

How do I configure the connection between the JDBC API and MaxCompute Lightning in an application?

If you use the JDBC API to connect to MaxCompute Lightning in an application, we recommend that you set prepareThreshold to 0 and autocommit to True.

Do you recommend that I directly use MaxCompute Lightning to query all detailed data?

No, we recommend that you do not directly use MaxCompute Lightning to query all detailed data. If you use an SQL client tool or BI tool to perform a full query (SELECT * FROM table;) on a table that contains a large amount of data, a large amount of data needs to be extracted from the server to the client. As a result, the query is prolonged.

We recommend that you analyze detailed data on the server. This way, the system filters out data and returns a small amount of data. This reduces end-to-end query latency.

Can I use psycopg2 to access MaxCompute Lightning?

Yes, you can use psycopg2 to access MaxCompute Lightning. The following sample code shows how to access MaxCompute Lightning by using psycopg2:
#!/usr/bin/env python
# coding=utf-8

import psycopg2
import sys

def query_lightning(lightning_conf, sql):
    """Query data through Lightning by sql

    Args:
        lightning_conf: a map contains settings of 'dbname', 'user', 'password', 'host', 'port'
        sql:  query submit to Lightning

    Returns:
        result: the query result in format of list of rows
    """
    result = None
    conn = None
    conn_str = None
    try:
        conn_str = ("dbname={dbname} "
                    "user={user} "
                    "password={password} "
                    "host={host} "
                    "port={port}").format(**lightning_conf)
    except Exception, e:
        print >> sys.stderr, ("Invalid Lightning' configuration "
                              "{}".format(e))
        sys.exit(1)

    try:
        conn = psycopg2.connect(conn_str)
        conn.set_session(autocommit=True)  # This will disable transaction
        # started with keyword BEGIN,
        # which is currently not
        # supported by Lightning' public service

        cur = conn.cursor()
        # execute Lightning' query
        cur.execute(sql)
        # get result
        result = cur.fetchall()
    except Exception, e:
        print >> sys.stderr, ("Failed to query data through "
                              "Lightning: {}".format(e))
    finally:
        if conn:
            conn.close()

    return result

if __name__ == "__main__":
    # step1. setup configuration
    lightning_conf = {
        "dbname": "project",
        "user": "accessId",  # AccessId
        "password": "accessKey",  # AccessKey
        "host": "lightning.cn-shanghai.maxcompute.aliyun-inc.com", # Lightning Endpoint
        "port": 443
    }

    # step2. issue a query
    result = query_lightning(lightning_conf, "select count(1) from table_name")
    # step3. print result
    if result:
        for i in xrange(0, len(result)):
            print
            "Got %d row from Lightning:%s" % (i + 1, result[i])

What methods can I use to optimize the performance of MaxCompute Lightning?

  • When you query a partitioned table, specify partitions to reduce the amount of data that needs to be scanned.
  • Query only the required columns. For example, change select * from table; to select a,b from table;.
  • If you want to query data from a table that contains a large amount of data by using a query that contains an equality operator, we recommend that you create a hash-clustered table or change the original table to a hash-clustered table. Then, use the CLUSTERED field to filter data in a more efficient manner. For example, select id,value from table where id=123; is a query that contains an equality operator.
  • If a query on a table that contains a small amount of data is time-consuming, check whether data skew exists. You can run the EXPLAIN ANALYZE command to view the execution plan and obtain diagnostic information.

What do I do if the "AXF Exception" error message appears when I use a BI tool to select a partitioned table for analysis in drag-and-drop mode?

The following error message appears:
ERROR: AXF Exception: specified partitions count in odps table: <project_name.table_name> is: xxx, exceeds the limitation of xxx, please add stricter partition filter

To ensure query performance, MaxCompute Lightning limits the number of partitions that can be queried in a partitioned table. The maximum number of partitions that can be scanned for a single table at the same time is 1,024. Specific BI tools allow you to select a table for analysis in drag-and-drop mode. In this mode, you cannot specify partition conditions in the BI frontend. As a result, the number of partitions that need to be scanned exceeds the upper limit, and MaxCompute Lightning returns this error.

We recommend that you change the data table that you want to query to a non-partitioned table or a partitioned table that contains less than 1,024 partitions before you query and analyze data in the table.

The "ERROR: SSL required" error message appears, which indicates a connection failure. What do I do?

MaxCompute Lightning requires that a third-party client accesses data over SSL. If the third-party client does not provide the SSL option, you can add an SSL parameter to the JDBC URL connection string, such as jdbc:postgresql://lightning.cn-shanghai.maxcompute.aliyun.com:443/myproject?ssl=true.

When I use the SQL Workbench/J client to query data, an error message that indicates transaction termination appears. What do I do?

  • Problem description

    When I use the SQL Workbench/J client to query data, the following error message appears:

    rror:current transaction is aborted,commands ignored until end of transaction block.
  • Solution

    On the SQL Workbench/J client, select the Autocommit option.

What do I do if the "Failed to login the project" error message appears when I request a connection?

  • Problem description

    The following error message appears:

    ERROR: Failed to login the project: <project name>. Lightning service has not been activated for your project, please contact the administrator
  • Cause

    MaxCompute Lightning is not activated in the region to which your MaxCompute project belongs. You must apply for activating MaxCompute Lightning and use this service.

  • Solution

    Activate MaxCompute Lightning in the region to which your MaxCompute project belongs.

What do I do if the "ConnectionException" error message appears when I request a connection?

  • Problem description

    The following error message appears:

    ConnectionException: HikariPool-2 - Connection is not available, request timed out after 10018ms
  • Cause

    The error message indicates that a network connection times out. This may be caused by network fluctuations.

  • Solution

    Check the network connection. Establish the connection again when the network connection is stable.